CN107133607B - People counting method and system based on video monitoring - Google Patents

People counting method and system based on video monitoring

Info

Publication number
CN107133607B
CN107133607B (application CN201710395675.4A / CN201710395675A)
Authority
CN
China
Prior art keywords
model
crowd
local feature
human body
texture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710395675.4A
Other languages
Chinese (zh)
Other versions
CN107133607A (en)
Inventor
黄良军
张亚妮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Technology
Original Assignee
Shanghai Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Technology filed Critical Shanghai Institute of Technology
Priority to CN201710395675.4A priority Critical patent/CN107133607B/en
Publication of CN107133607A publication Critical patent/CN107133607A/en
Application granted granted Critical
Publication of CN107133607B publication Critical patent/CN107133607B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/758 Involving statistics of pixels or of feature values, e.g. histogram matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a people counting method and system based on video monitoring. The method comprises the following steps: S1: acquiring real-time monitoring images of the area to be monitored; S2: performing foreground segmentation on the real-time monitoring images to segment out the crowd; S3: dividing a complete human body feature model into five local feature models and assigning a corresponding weight to each local feature model; S4: extracting features of the segmented crowd, performing matching detection against the five local feature models, and computing a comprehensive matching degree from the weights and the positional offsets between the extracted five local feature models and the complete human body feature model, a match being successful if the degree exceeds a specified threshold; S5: counting the number of people according to the number of successful matches, plotting a crowd-count curve, and issuing a warning when the curve or its trend triggers an abnormal event. The people counting method and system based on video monitoring of the present invention are applicable to different settings and achieve higher accuracy.

Description

People counting method and system based on video monitoring
Technical field
The present invention relates to the field of intelligent monitoring, and in particular to a people counting method and system based on video monitoring.
Background technique
Crowd congestion in public places such as transport hubs, large-scale event venues and large stores is becoming more and more frequent. People counting has become an important aspect of public safety and of the optimal allocation of resources. For business competition and similar purposes, analysis and management based on accurate data and the establishment of intelligent passenger counting systems have become a development trend.
There are currently two main kinds of existing people counting methods: 1) detecting people in video images using whole-body templates; 2) detecting people in video images using head or head-shoulder templates. Both methods have limitations: they can only be used in specific scenes and are poorly portable. Method 1) performs poorly in scenes with high crowd density, and method 2) has low accuracy in scenes with low crowd density.
Summary of the invention
The technical problem to be solved by the present invention is to provide a people counting method and system based on video monitoring that is applicable to different settings and achieves higher accuracy.
To solve the above problem, the present invention proposes a people counting method based on video monitoring, comprising the following steps:
S1: acquiring real-time monitoring images of the area to be monitored;
S2: performing foreground segmentation on the real-time monitoring images to segment out the crowd;
S3: dividing a complete human body feature model into five local feature models, and assigning a corresponding weight to each local feature model;
S4: extracting features of the segmented crowd, performing matching detection against the five local feature models, and computing a comprehensive matching degree from the weights and the positional offsets between the extracted five local feature models and the complete human body feature model; if the degree exceeds a specified threshold, the match is successful;
S5: counting the number of people according to the number of successful matches, plotting a crowd-count curve, and issuing a warning when the curve or its trend triggers an abnormal event.
According to one embodiment of the present invention, step S3 comprises the following steps:
S31: dividing the complete human body feature model into five local feature models, namely the head, the left and right halves of the upper body, and the upper and lower halves of the lower body, giving model M1, model M2, model M3, model M4 and model M5, with the complete human body feature model M = M1 + M2 + M3 + M4 + M5;
S32: assigning weights β = {β1, β2, β3, β4, β5} to the five local feature models such that β1 + β2 + β3 + β4 + β5 = 1;
S33: extracting HOG (histogram of oriented gradients) features from human body feature model samples and non-human samples, and obtaining the templates of the five local feature models through an SVM classifier.
According to one embodiment of the present invention, step S4 comprises the following steps:
S41: extracting the HOG features of the segmented crowd;
S42: matching the extracted features against the templates of the five local feature models, the comprehensive matching degree being S = β1M1 + β2M2 + β3M3 + β4M4 + β5M5 - b, where b denotes the positional offset between the actual local feature models and the ideal posture model, b = b2 + b3 + b4 + b5, and the centre coordinates of models M1, M2, M3, M4 and M5 are (e1, f1), (e2, f2), (e3, f3), (e4, f4) and (e5, f5), respectively;
S43: comparing the comprehensive matching degree of each human body in the crowd with the specified threshold; if it exceeds the threshold, one match is counted as successful, and the number of successful matches is counted.
According to one embodiment of the present invention, data of the area to be monitored are collected as a training set, and the training set is used to determine the mixed dynamic texture model and the weights of the five local feature models of the human body.
According to one embodiment of the present invention, step S5 comprises the following steps:
S51: counting the number of people according to the number of successful matches, plotting a crowd-count curve, and raising an alarm when an abnormal event is triggered;
S52: recording the crowd-count curve that triggered the abnormal event and placing it into a set of potential abnormal events;
S53: when the curve or trend appears again, giving an early warning according to the warning level; if an abnormal event is subsequently triggered, raising the danger level of that curve element in the set of potential abnormal events, and otherwise lowering its danger level;
S54: determining or adjusting the warning level according to the danger level.
According to one embodiment of the present invention, in step S2, foreground segmentation is performed on the real-time monitoring images using a mixed dynamic texture model, comprising the following steps:
S21: binarizing the real-time monitoring images and establishing a mixed dynamic texture model;
S22: solving the unknown parameters of the mixed dynamic texture model using the EM algorithm, thereby determining the mixed dynamic texture model;
S23: according to the mixed dynamic texture model, computing the maximum likelihood probability of the mixed dynamic textures of the monitoring images to obtain texture class labels, and clustering the mixed dynamic textures according to the texture class labels;
S24: scanning the clustered textures, taking the dynamic textures of the same class that characterize the crowd as foreground and the remaining textures as background, and segmenting out the dynamic textures of crowds walking in different directions.
According to one embodiment of the present invention, the acquired real-time monitoring images of the area to be monitored are divided into a training set and a test set; the training set is used to determine the mixed dynamic texture model and the weights of the five local feature models of the human body, and the test set is then fed into steps S1-S5.
The present invention also provides a people counting system based on video monitoring, comprising:
an image acquisition unit, which acquires real-time monitoring images of the area to be monitored;
a crowd segmentation unit, which performs foreground segmentation on the real-time monitoring images to segment out the crowd;
a feature division unit, which divides a complete human body feature model into five local feature models and assigns a corresponding weight to each local feature model;
a feature matching unit, which extracts features of the segmented crowd, performs matching detection against the five local feature models, and computes a comprehensive matching degree from the weights and the positional offsets between the extracted five local feature models and the complete human body feature model, a match being successful if the degree exceeds a specified threshold;
a counting and alarm unit, which counts the number of people according to the number of successful matches, plots a crowd-count curve, and issues a warning when the curve or its trend triggers an abnormal event.
By adopting the above technical solution, the present invention has the following advantages over the prior art:
Each local feature model has a corresponding weight, and the final matching result combines the matching results of all local feature models. By changing the weights of the local feature models, the method can be applied to different settings, which facilitates porting of the algorithm and ensures the accuracy of crowd counting. An abnormal-event feedback mechanism is established, which enhances the stability of the system and avoids false alarms;
Foreground segmentation is performed using mixed dynamic textures, so crowds moving in different directions are distinguished directly; no additional algorithm is needed to track the crowd and determine its direction, which saves time and improves the efficiency of the algorithm;
The positional offsets between the local feature models of the human body and the complete human body model are taken into account, which effectively guards against errors caused by deformation of the human body.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the people counting method based on video monitoring according to one embodiment of the present invention;
Fig. 2 is a schematic flow diagram of foreground segmentation of real-time monitoring images using a mixed dynamic texture model according to one embodiment of the present invention;
Fig. 3 is a schematic flow diagram of the EM algorithm solution according to one embodiment of the present invention;
Fig. 4a is a structural schematic diagram of the ideal posture model according to one embodiment of the present invention;
Fig. 4b is a structural schematic diagram of an actual local feature model with positional offsets according to one embodiment of the present invention;
Fig. 5 is a schematic flow diagram of the counting and warning stage according to one embodiment of the present invention.
Detailed description of the embodiments
In order to make the above objects, features and advantages of the present invention clearer and easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
In the following description, numerous specific details are set forth in order to facilitate a full understanding of the present invention. However, the present invention can be implemented in many other ways than those described herein, and those skilled in the art can make similar extensions without departing from its spirit; the present invention is therefore not limited to the specific embodiments disclosed below.
Referring to Fig. 1, in one embodiment, the people counting method based on video monitoring comprises the following steps:
S1: acquiring real-time monitoring images of the area to be monitored;
S2: performing foreground segmentation on the real-time monitoring images to segment out the crowd;
S3: dividing a complete human body feature model into five local feature models, and assigning a corresponding weight to each local feature model;
S4: extracting features of the segmented crowd, performing matching detection against the five local feature models, and computing a comprehensive matching degree from the weights and the positional offsets between the extracted five local feature models and the complete human body feature model; if the degree exceeds a specified threshold, the match is successful;
S5: counting the number of people according to the number of successful matches, plotting a crowd-count curve, and issuing a warning when the curve or its trend triggers an abnormal event.
The people counting method based on video monitoring is described in further detail below, but the following should not be taken as limiting.
In step S1, real-time monitoring images of the area to be monitored are acquired. The area to be monitored is the setting in which people counting is required; the images are captured by a device such as a monitoring camera, which transmits the monitoring picture back in real time for the execution of the subsequent steps.
In step S2, foreground segmentation is performed on the real-time monitoring images obtained in step S1 to segment out the crowd. The foreground segmented at this point contains the crowd information, but the number of people cannot yet be determined.
Referring to Fig. 2, in step S2, foreground segmentation is preferably performed on the real-time monitoring images using a mixed dynamic texture model, comprising the following steps:
S21: binarizing the real-time monitoring images and establishing a mixed dynamic texture model;
wherein yt denotes the binarized video image, i.e. the observation sequence, and xt denotes the component of the video image that evolves over time, i.e. the hidden state sequence, so that xt = Az·xt-1 + vt and yt = Cz·xt + ωt (1); Az is the state transition matrix; Cz is the observation matrix; vt is the noise generated in the state transition process, obeying a Gaussian distribution with zero mean and covariance Qz; ωt is the observation noise, obeying a Gaussian distribution with zero mean and covariance Rz; and z indexes a specific dynamic texture;
S22: solving the unknown parameters of the mixed dynamic texture model using the EM algorithm (Expectation-Maximization algorithm), thereby determining the mixed dynamic texture model;
S23: according to the mixed dynamic texture model, computing the maximum likelihood probability of the mixed dynamic textures of the monitoring images to obtain texture class labels, and clustering the mixed dynamic textures according to the texture class labels;
S24: scanning the clustered textures, taking the dynamic textures of the same class that characterize the crowd as foreground and the remaining textures as background, and segmenting out the dynamic textures of crowds walking in different directions.
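As an illustration of the dynamic texture model established in step S21, the following sketch (an assumption for illustration, not the patent's implementation) simulates one texture component z as the linear dynamical system xt = Az·xt-1 + vt, yt = Cz·xt + ωt; the dimensions, matrices and noise levels are arbitrary placeholder values.

```python
import numpy as np

def simulate_dynamic_texture(A_z, C_z, Q_z, R_z, mu_z, S_z, T, rng):
    """Sample T frames from one dynamic texture component:
    x_t = A_z x_{t-1} + v_t,  y_t = C_z x_t + w_t."""
    n = A_z.shape[0]          # hidden state dimension
    m = C_z.shape[0]          # observation (pixel patch) dimension
    x = rng.multivariate_normal(mu_z, S_z)      # x_1 ~ N(mu_z, S_z)
    xs, ys = [], []
    for _ in range(T):
        xs.append(x)
        y = C_z @ x + rng.multivariate_normal(np.zeros(m), R_z)  # observation noise w_t
        ys.append(y)
        x = A_z @ x + rng.multivariate_normal(np.zeros(n), Q_z)  # process noise v_t
    return np.array(xs), np.array(ys)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 2, 4                                  # toy sizes (assumed)
    A = 0.9 * np.eye(n)                          # stable state transition
    C = rng.standard_normal((m, n))              # observation matrix
    Q, R = 0.01 * np.eye(n), 0.05 * np.eye(m)    # noise covariances
    mu, S0 = np.zeros(n), np.eye(n)
    _, y = simulate_dynamic_texture(A, C, Q, R, mu, S0, T=50, rng=rng)
    print(y.shape)   # (50, 4): 50 frames of a 4-pixel patch
```

In the mixed model, K such components are combined with weights αj, as described next.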
More specifically, the observation sequence yt has length N and is conventionally written as {y1, ..., yN}. The initial hidden state variable x1 obeys a Gaussian distribution with mean μz and covariance Sz. The model parameters are θz = {Az, Qz, Cz, Rz, μz, Sz}. K denotes the number of texture types, each texture representing a crowd moving in one direction; α denotes the texture weights, with the αj summing to 1. The EM algorithm involves the following formulas:
1. Conditional probability distributions of the hidden state sequence and the observation sequence:
conditional probability distribution of the initial hidden state: p(x1 | z) = G(x1, μz, Sz) (2)
conditional probability distribution of the hidden state sequence: p(xt | xt-1, z) = G(xt, Az·xt-1, Qz) (3)
conditional probability distribution of the observation sequence: p(yt | xt, z) = G(yt, Cz·xt, Rz) (4)
Here t denotes time, t = 1 denotes the initial moment, and x1 denotes the initial hidden state variable; z indexes a specific dynamic texture, xt denotes the hidden state at time t, and yt denotes the observation at time t. The right-hand sides show that the conditional probabilities of the initial hidden state, of the hidden state sequence and of the observation sequence all follow Gaussian distributions. Az is the state transition matrix; Cz is the observation matrix; vt is the noise generated in the state transition process, obeying a Gaussian distribution with zero mean and covariance Qz; ωt is the observation noise, obeying a Gaussian distribution with zero mean and covariance Rz.
2. Expectations related to the hidden state sequence:
The expectations of the hidden state sequence, formulas (5)-(8) (i denotes the i-th observation sequence and j denotes a specific texture), are computed here. The two formulas (6) and (7) are the expectations of the hidden state sequence; they must be obtained from the hidden state sequence because the parameter update formula (10) below is expressed in terms of them.
3. Conditional probability related to texture:
The conditional probability related to texture is the probability that z(i) = j given the known observation sequence y(i), where j denotes the j-th texture, j ∈ {1, 2, ..., K}, and K is the number of texture types. At this point the dynamic textures in the video image can be classified; taking the dynamic textures of the same class that characterize the crowd as foreground, the foreground can be extracted according to the corresponding labels, i.e. foreground segmentation.
4. Texture clustering labels of the mixed dynamic textures:
li = argmaxj [log p(y(i) | z(i) = j) + log αj] (9)
argmaxj [log p(y(i) | z(i) = j) + log αj] takes the value of j that maximizes log p(y(i) | z(i) = j) + log αj; that value of j is the texture class to which the mixed dynamic texture belongs.
5. Expressions of the mixed dynamic texture model parameters:
Formula (10) is a simplified expression; the quantities on its right-hand side have already been obtained from formulas (5)-(8). Formula (11) then gives the parameter values computed from formula (10), where * denotes an estimated value.
Referring to Fig. 3, the steps for solving the mixed dynamic texture model parameters with the EM algorithm are as follows:
A. Input the observation sequences {y(i)} and initialize the model parameters {θj, αj, j = 1, 2, ..., K};
B. Compute the expectations of formulas (5)-(8) for i = 1, ..., N and j = 1, ..., K;
C. Compute formula (10) and update the model parameters using formula (11);
Return to step B and continue until convergence.
The maximum likelihood probability of the mixed dynamic textures is then computed with the above formulas to obtain the texture class label li, and the mixed dynamic textures are clustered according to li. The clustered textures are scanned; the dynamic textures of the same class that characterize the crowd are taken as foreground and the remaining textures as background, and the dynamic textures of crowds walking in different directions are segmented out.
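As an illustration of how the texture class label of formula (9) can be evaluated once the component parameters θj and weights αj have been estimated, the sketch below computes log p(y(i) | z = j) for each component with a standard Kalman filter and then takes the argmax of log p(y(i) | z = j) + log αj. This is a simplified sketch under assumed parameter containers (the DTParams class and the array shapes are illustrative assumptions), not the patent's own implementation.

```python
import numpy as np

class DTParams:
    """Parameters theta_j = {A, Q, C, R, mu, S} of one dynamic texture component (assumed container)."""
    def __init__(self, A, Q, C, R, mu, S):
        self.A, self.Q, self.C, self.R, self.mu, self.S = A, Q, C, R, mu, S

def kalman_loglik(Y, p):
    """Log-likelihood log p(y_1..y_T | theta) of an observation sequence Y (T x m) under one LDS."""
    n = p.A.shape[0]
    x, P = p.mu, p.S                       # predictive mean/cov of x_1
    ll = 0.0
    for y in Y:
        S_t = p.C @ P @ p.C.T + p.R        # innovation covariance
        v = y - p.C @ x                    # innovation
        _, logdet = np.linalg.slogdet(S_t)
        ll += -0.5 * (len(y) * np.log(2 * np.pi) + logdet + v @ np.linalg.solve(S_t, v))
        K = P @ p.C.T @ np.linalg.inv(S_t) # Kalman gain
        x = x + K @ v                      # filtered state mean
        P = (np.eye(n) - K @ p.C) @ P      # filtered state covariance
        x = p.A @ x                        # predict x_{t+1}
        P = p.A @ P @ p.A.T + p.Q
    return ll

def texture_label(Y, components, alphas):
    """Formula (9): l_i = argmax_j [ log p(y^(i) | z = j) + log alpha_j ]."""
    scores = [kalman_loglik(Y, c) + np.log(a) for c, a in zip(components, alphas)]
    return int(np.argmax(scores))
```

Sequences whose label corresponds to crowd-like components are then kept as foreground and the remainder as background, as in step S24.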
In step S3, the complete human body feature model is divided into five local feature models, and a corresponding weight is assigned to each local feature model. The weights of the different local feature models may be the same or different, and the desired values can be obtained by training on data from the actual scene.
Further, step S3 comprises the following steps:
S31: dividing the complete human body feature model into five local feature models, namely the head, the left and right halves of the upper body, and the upper and lower halves of the lower body, giving model M1, model M2, model M3, model M4 and model M5, with the complete human body feature model M = M1 + M2 + M3 + M4 + M5;
S32: assigning weights β = {β1, β2, β3, β4, β5} to the five local feature models such that β1 + β2 + β3 + β4 + β5 = 1;
S33: extracting HOG features from human body feature model samples and non-human samples, and obtaining the templates of the five local feature models through an SVM classifier.
Using the HOG-based local feature model detection method, the complete human body feature model M can be divided into five local feature models, namely model M1, model M2, model M3, model M4 and model M5, with M = M1 + M2 + M3 + M4 + M5, as shown in Fig. 4a.
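One plausible way to realize the HOG + SVM template training of step S33 is sketched below using scikit-image and scikit-learn; the part window sizes, the HOG parameters and the sample containers are illustrative assumptions rather than values specified by the patent.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

# Assumed part windows (pixels) for M1..M5: head, left/right upper body, upper/lower lower body.
PART_WINDOWS = {"M1": (32, 32), "M2": (48, 24), "M3": (48, 24), "M4": (40, 24), "M5": (40, 24)}

def hog_descriptor(patch):
    """HOG feature vector of a grayscale patch."""
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

def train_part_templates(pos_patches, neg_patches):
    """Train one linear SVM per local feature model.
    pos_patches / neg_patches: dict mapping part name -> list of grayscale patches
    cropped to that part's window (human samples vs non-human samples)."""
    templates = {}
    for part in PART_WINDOWS:
        X = [hog_descriptor(p) for p in pos_patches[part]] + \
            [hog_descriptor(p) for p in neg_patches[part]]
        y = [1] * len(pos_patches[part]) + [0] * len(neg_patches[part])
        clf = LinearSVC(C=1.0)
        clf.fit(np.asarray(X), np.asarray(y))
        templates[part] = clf        # the SVM decision function acts as the part template
    return templates
```

At detection time, the decision value of each part classifier on a candidate window can then serve as the per-part matching score Mi used in step S42.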
This breaks through the limitation of the prior art: the two methods described in the background have limitations, can only be used in specific scenes and are poorly portable. By changing the weights of the local feature models, the method can be applied to different settings; in other words, the weights of the local feature models are adjusted according to the demands of each setting. For example, in high-density scenes, because of occlusion and similar factors, the head features of the crowd are more prominent while the other features are relatively weak, so the local feature model representing the head is given a large weight and the other local feature models are given small weights; in low-density scenes, detecting the whole person works best for ensuring the accuracy of crowd counting, so every local feature model is given the same weight; and so on. The five local feature models are partitioned according to the structure of the human body; as can easily be seen from Figs. 4a and 4b, model M1 represents the head, models M2 and M3 represent the left and right halves of the upper body (excluding the head), and models M4 and M5 represent the upper and lower halves of the lower body.
In step S32, the initial values of the weights β = {β1, β2, β3, β4, β5} of the five local feature models are all 0.2; the values of the weights are adjusted according to the error between the crowd count estimated on the test set and the true statistics, until the error is smaller than a preset value.
In step S4, the features of the crowd (dynamic textures) after foreground segmentation are extracted and matched against the five local feature models; the comprehensive matching degree is computed from the weights and the positional offsets between the extracted five local feature models and the complete human body feature model, and if it exceeds the specified threshold the match is successful.
Referring to Figs. 4a and 4b, because the human body deforms while moving, its matching degree with the template decreases, so the corresponding positional offset is subtracted. When the positional offset exceeds a specific threshold, the comprehensive matching degree falls below the matching threshold and the match fails. (For example, the head of pedestrian A together with the upper and lower limbs of pedestrian B might otherwise be falsely detected as one person, which is clearly wrong; the positional offsets between the limbs of pedestrian B and the head of pedestrian A drive the comprehensive matching degree down, so subtracting the positional-offset variable b avoids such errors well.)
Further, step S4 comprises the following steps:
S41: extracting the HOG features of the segmented crowd;
S42: matching the extracted features against the templates of the five local feature models, the comprehensive matching degree being S = β1M1 + β2M2 + β3M3 + β4M4 + β5M5 - b, where b denotes the positional offset between the actual local feature models and the ideal posture model, b = b2 + b3 + b4 + b5, and the centre coordinates of models M1, M2, M3, M4 and M5 are (e1, f1), (e2, f2), (e3, f3), (e4, f4) and (e5, f5), respectively;
S43: comparing the comprehensive matching degree of each human body in the crowd with the specified threshold; if it exceeds the threshold, one match is counted as successful, and the number of successful matches is counted.
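To make the computation of S42-S43 concrete, the sketch below combines assumed per-part matching scores M1-M5 with the weights β and a positional-offset penalty b. The patent gives b = b2 + b3 + b4 + b5 and the part centre coordinates (ei, fi) but does not fix how each bi is measured, so the Euclidean-distance form and the scale factor used here are assumptions.

```python
import math

def positional_offset(detected_centers, expected_centers, scale=0.01):
    """b = b2 + b3 + b4 + b5: assumed Euclidean offsets of parts M2..M5 from the ideal posture."""
    b = 0.0
    for part in ("M2", "M3", "M4", "M5"):
        (e, f), (ee, fe) = detected_centers[part], expected_centers[part]
        b += scale * math.hypot(e - ee, f - fe)
    return b

def comprehensive_matching_degree(part_scores, betas, detected_centers, expected_centers):
    """S = beta1*M1 + beta2*M2 + beta3*M3 + beta4*M4 + beta5*M5 - b."""
    s = sum(betas[p] * part_scores[p] for p in ("M1", "M2", "M3", "M4", "M5"))
    return s - positional_offset(detected_centers, expected_centers)

def count_matches(candidates, betas, threshold):
    """S43: count one successful match per candidate whose degree exceeds the threshold.
    Each candidate is a dict with "scores", "centers" and "expected" entries (assumed layout)."""
    return sum(
        1 for c in candidates
        if comprehensive_matching_degree(c["scores"], betas, c["centers"], c["expected"]) > threshold
    )
```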
In step S5, the number of people is counted according to the number of successful matches, and a crowd-count curve is plotted; when the curve or its trend triggers an abnormal event, a warning is issued. Since the crowd in the scene is in motion, the number of people changes continuously; the change curve shows this intuitively and allows predictions to be made from the trend. Criteria for triggering an abnormal event can be set, and an event is triggered only when the corresponding criterion is met.
Further, step S5 comprises the following steps:
S51: counting the number of people according to the number of successful matches, plotting a crowd-count curve, and raising an alarm when an abnormal event is triggered;
S52: recording the crowd-count curve that triggered the abnormal event and placing it into a set of potential abnormal events;
S53: when the curve or trend appears again, giving an early warning according to the warning level; if an abnormal event is subsequently triggered, raising the danger level of that curve element in the set of potential abnormal events, and otherwise lowering its danger level;
S54: determining or adjusting the warning level according to the danger level.
Establishing an abnormal-event feedback mechanism enhances stability and avoids false alarms; abnormal situations can be warned of in advance, and the danger level can be determined according to how frequently the abnormal event occurs, so that timely guidance is provided.
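A minimal sketch of the feedback mechanism of steps S51-S54 follows; the curve-similarity test (correlation of the most recent samples) and the integer danger levels are assumptions, since the patent does not prescribe them.

```python
import numpy as np

class AbnormalEventFeedback:
    """Keeps the set of crowd-count curves that have triggered abnormal events (S52)
    and raises/lowers their danger level when they recur (S53-S54)."""

    def __init__(self, similarity_threshold=0.9):
        self.events = []                      # list of {"curve": ..., "danger": int}
        self.similarity_threshold = similarity_threshold

    def _similar(self, a, b):
        a, b = np.asarray(a, float), np.asarray(b, float)
        n = min(len(a), len(b))
        if n < 2:
            return False
        # assumed similarity measure: correlation of the two most recent curve segments
        return np.corrcoef(a[-n:], b[-n:])[0, 1] > self.similarity_threshold

    def record_trigger(self, curve):
        """S52: store the curve of a triggered abnormal event."""
        self.events.append({"curve": list(curve), "danger": 1})

    def check(self, current_curve):
        """S53: warn in advance if a stored curve (or trend) appears again.
        Returns the warning level, here taken to equal the danger level (S54 assumption)."""
        for ev in self.events:
            if self._similar(ev["curve"], current_curve):
                return ev["danger"]
        return 0

    def feedback(self, current_curve, event_did_trigger):
        """After the outcome is known, raise or lower the danger level of matching curves."""
        for ev in self.events:
            if self._similar(ev["curve"], current_curve):
                ev["danger"] = max(ev["danger"] + (1 if event_did_trigger else -1), 0)
```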
Preferably, data of the area to be monitored are collected as a training set, and the training set is used to determine the mixed dynamic texture model and the weights of the five local feature models of the human body.
Before the system is formally put into operation, data need to be collected and divided into a training set and a test set. The training set is used to obtain the parameters of the mixed dynamic texture model, and the weights of the local feature models are obtained at the same time (initially the weight of each of the five local feature models is taken as 0.2; since the manually collected data come with true statistics, the weights are adjusted according to the error between the estimated crowd count and the true statistics until convergence, i.e. until the error is smaller than a given value); the test set can be used for debugging.
Crowd flow data of a given public place are collected manually and randomly divided into a training set and a test set. The training set is used to learn the parameters of the mixed dynamic texture model and the weights {β1, β2, β3, β4, β5} of the local feature models of the human body, and the test set is used for data debugging to ensure normal operation. When the people counting method is ported to another public place, the crowd density will most likely change because the environment changes; crowd flow data are then manually collected at that place, and the above training and testing processes are repeated before the system is put into operation again.
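The weight-tuning procedure described above (all βi start at 0.2 and are adjusted until the counting error on the annotated data falls below a preset value) might be sketched as follows; the coordinate-wise search strategy and the count_people callback are assumptions, as the patent only specifies the initial values and the stopping criterion.

```python
import itertools
import numpy as np

def tune_weights(count_people, clips, true_counts, max_error=1.0, step=0.05, max_iters=100):
    """Adjust beta = {beta1..beta5} (initially 0.2 each, summing to 1) until the mean
    absolute counting error on the annotated clips drops below max_error.
    count_people(clip, betas) -> estimated count  (user-supplied callback, assumed)."""
    betas = np.full(5, 0.2)

    def error(b):
        est = [count_people(clip, b) for clip in clips]
        return float(np.mean(np.abs(np.asarray(est) - np.asarray(true_counts))))

    best = error(betas)
    for _ in range(max_iters):
        if best < max_error:
            break
        improved = False
        # move weight from one part to another while keeping the sum equal to 1
        for i, j in itertools.permutations(range(5), 2):
            cand = betas.copy()
            cand[i] += step
            cand[j] -= step
            if cand[j] < 0:
                continue
            e = error(cand)
            if e < best:
                best, betas, improved = e, cand, True
        if not improved:
            break
    return betas, best
```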
The present invention also provides a people counting system based on video monitoring, comprising:
an image acquisition unit, which acquires real-time monitoring images of the area to be monitored;
a crowd segmentation unit, which performs foreground segmentation on the real-time monitoring images to segment out the crowd;
a feature division unit, which divides a complete human body feature model into five local feature models and assigns a corresponding weight to each local feature model;
a feature matching unit, which extracts features of the segmented crowd, performs matching detection against the five local feature models, and computes a comprehensive matching degree from the weights and the positional offsets between the extracted five local feature models and the complete human body feature model, a match being successful if the degree exceeds a specified threshold;
a counting and alarm unit, which counts the number of people according to the number of successful matches, plots a crowd-count curve, and issues a warning when the curve or its trend triggers an abnormal event.
For details of the people counting system based on video monitoring of the present invention, reference may be made to the description of the people counting method based on video monitoring in the foregoing embodiments, which is not repeated here.
Although the present invention is disclosed above with preferred embodiments, they are not intended to limit the claims. Any person skilled in the art can make possible changes and modifications without departing from the spirit and scope of the present invention; the protection scope of the present invention shall therefore be subject to the scope defined by the claims of the present invention.

Claims (6)

1. A people counting method based on video monitoring, characterized by comprising the following steps:
S1: acquiring real-time monitoring images of the area to be monitored;
S2: performing foreground segmentation on the real-time monitoring images to segment out the crowd;
S3: dividing a complete human body feature model into five local feature models, and assigning a corresponding weight to each local feature model;
wherein step S3 comprises the following steps:
S31: dividing the complete human body feature model into five local feature models, namely the head, the left and right halves of the upper body, and the upper and lower halves of the lower body, giving model M1, model M2, model M3, model M4 and model M5, with the complete human body feature model M = M1 + M2 + M3 + M4 + M5;
S32: assigning weights β = {β1, β2, β3, β4, β5} to the five local feature models such that β1 + β2 + β3 + β4 + β5 = 1;
S33: extracting HOG features from human body feature model samples and non-human samples, and obtaining the templates of the five local feature models through an SVM classifier;
S4: extracting features of the segmented crowd, performing matching detection against the five local feature models, and computing a comprehensive matching degree from the weights and the positional offsets between the extracted five local feature models and the complete human body feature model; if the degree exceeds a specified threshold, the match is successful;
wherein step S4 comprises the following steps:
S41: extracting the HOG features of the segmented crowd;
S42: matching the extracted features against the templates of the five local feature models, the comprehensive matching degree being S = β1M1 + β2M2 + β3M3 + β4M4 + β5M5 - b, where b denotes the positional offset between the actual local feature models and the ideal posture model, b = b2 + b3 + b4 + b5, and the centre coordinates of models M1, M2, M3, M4 and M5 are (e1, f1), (e2, f2), (e3, f3), (e4, f4) and (e5, f5), respectively;
S43: comparing the comprehensive matching degree of each human body in the crowd with the specified threshold; if it exceeds the threshold, one match is counted as successful, and the number of successful matches is counted;
S5: counting the number of people according to the number of successful matches, plotting a crowd-count curve, and issuing a warning when the curve or its trend triggers an abnormal event.
2. The people counting method based on video monitoring according to claim 1, characterized in that in step S32 the initial values of the weights β = {β1, β2, β3, β4, β5} of the five local feature models are all 0.2, and the values of the weights are adjusted according to the error between the crowd count estimated on the test set and the true statistics until the error is smaller than a preset value.
3. The people counting method based on video monitoring according to any one of claims 1-2, characterized in that step S5 comprises the following steps:
S51: counting the number of people according to the number of successful matches, plotting a crowd-count curve, and raising an alarm when an abnormal event is triggered;
S52: recording the crowd-count curve that triggered the abnormal event and placing it into a set of potential abnormal events;
S53: when the curve or trend appears again, giving an early warning according to the warning level; if an abnormal event is subsequently triggered, raising the danger level of that curve element in the set of potential abnormal events, and otherwise lowering its danger level;
S54: determining or adjusting the warning level according to the danger level.
4. The people counting method based on video monitoring according to any one of claims 1-2, characterized in that in step S2 foreground segmentation is performed on the real-time monitoring images using a mixed dynamic texture model, comprising the following steps:
S21: binarizing the real-time monitoring images and establishing a mixed dynamic texture model;
S22: solving the unknown parameters of the mixed dynamic texture model using the EM algorithm, thereby determining the mixed dynamic texture model;
S23: according to the mixed dynamic texture model, computing the maximum likelihood probability of the mixed dynamic textures of the monitoring images to obtain texture class labels, and clustering the mixed dynamic textures according to the texture class labels;
S24: scanning the clustered textures, taking the dynamic textures of the same class that characterize the crowd as foreground and the remaining textures as background, and segmenting out the dynamic textures of crowds walking in different directions.
5. The people counting method based on video monitoring according to claim 4, characterized in that data of the area to be monitored are collected as a training set, and the training set is used to determine the mixed dynamic texture model and the weights of the five local feature models of the human body.
6. A people counting system based on video monitoring, characterized by comprising:
an image acquisition unit, which acquires real-time monitoring images of the area to be monitored;
a crowd segmentation unit, which performs foreground segmentation on the real-time monitoring images to segment out the crowd;
a feature division unit, which divides a complete human body feature model into five local feature models and assigns a corresponding weight to each local feature model;
a feature matching unit, which extracts features of the segmented crowd, performs matching detection against the five local feature models, and computes a comprehensive matching degree from the weights and the positional offsets between the extracted five local feature models and the complete human body feature model, a match being successful if the degree exceeds a specified threshold;
a counting and alarm unit, which counts the number of people according to the number of successful matches, plots a crowd-count curve, and issues a warning when the curve or its trend triggers an abnormal event.
CN201710395675.4A 2017-05-27 2017-05-27 People counting method and system based on video monitoring Active CN107133607B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710395675.4A CN107133607B (en) 2017-05-27 2017-05-27 People counting method and system based on video monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710395675.4A CN107133607B (en) 2017-05-27 2017-05-27 People counting method and system based on video monitoring

Publications (2)

Publication Number Publication Date
CN107133607A CN107133607A (en) 2017-09-05
CN107133607B true CN107133607B (en) 2019-10-11

Family

ID=59734106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710395675.4A Active CN107133607B (en) 2017-05-27 2017-05-27 People counting method and system based on video monitoring

Country Status (1)

Country Link
CN (1) CN107133607B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009477B (en) * 2017-11-10 2020-08-21 东软集团股份有限公司 Image people flow number detection method and device, storage medium and electronic equipment
CN110795998B (en) * 2019-09-19 2023-03-24 深圳云天励飞技术有限公司 People flow detection method and device, electronic equipment and readable storage medium
CN110693501A (en) * 2019-10-12 2020-01-17 上海应用技术大学 Wireless walking gait detection system based on multi-sensor fusion
CN112699265B (en) * 2019-10-22 2024-07-19 商汤国际私人有限公司 Image processing method and device, processor and storage medium
CN110929648B (en) * 2019-11-22 2021-03-16 广东睿盟计算机科技有限公司 Monitoring data processing method and device, computer equipment and storage medium
CN114900669A (en) * 2020-10-30 2022-08-12 深圳市商汤科技有限公司 Scene monitoring method and device, electronic equipment and storage medium
CN112528861B (en) * 2020-12-11 2024-05-31 中国铁道科学研究院集团有限公司电子计算技术研究所 Foreign matter detection method and device applied to ballast bed in railway tunnel
CN112597964B (en) * 2020-12-30 2021-10-22 上海应用技术大学 Method for counting layered multi-scale crowd

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831472A (en) * 2012-08-03 2012-12-19 无锡慧眼电子科技有限公司 People counting method based on video flowing image processing
CN104573811A (en) * 2015-01-08 2015-04-29 杭州天迈网络有限公司 Pedestrian flow counting method based on infrared image and color image fusion
CN105138983A (en) * 2015-08-21 2015-12-09 燕山大学 Pedestrian detection method based on weighted part model and selective search segmentation
CN105184245A (en) * 2015-08-28 2015-12-23 广东顺德中山大学卡内基梅隆大学国际联合研究院 Multi-characteristic fusion population density estimation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Modeling, Clustering, and Segmenting Video with Mixtures of Dynamic Textures";Antoni B. Chan,Nuno Vasconcelos;《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》;20080531;第30卷(第5期);第1-18页 *
"基于加权可变形部件模型的行人检测算法";李天朔;《中国优秀硕士学位论文全文数据库 信息科技辑》;20170315;第2、6、9、40-42页 *

Also Published As

Publication number Publication date
CN107133607A (en) 2017-09-05

Similar Documents

Publication Publication Date Title
CN107133607B (en) People counting method and system based on video monitoring
CN108053427A (en) A kind of modified multi-object tracking method, system and device based on KCF and Kalman
Li et al. In-field tea shoot detection and 3D localization using an RGB-D camera
US9928423B2 (en) Efficient retrieval of anomalous events with priority learning
CN110852283A (en) Helmet wearing detection and tracking method based on improved YOLOv3
CN108229509A (en) For identifying object type method for distinguishing and device, electronic equipment
US9805256B2 (en) Method for setting a tridimensional shape detection classifier and method for tridimensional shape detection using said shape detection classifier
CN109644255A (en) Mark includes the method and apparatus of the video flowing of a framing
KR102132722B1 (en) Tracking method and system multi-object in video
JP2023528126A (en) Analysis and selection in aquaculture
CN106778687A (en) Method for viewing points detecting based on local evaluation and global optimization
CN106952293B (en) Target tracking method based on nonparametric online clustering
CN105183758A (en) Content recognition method for continuously recorded video or image
Stahlschmidt et al. Applications for a people detection and tracking algorithm using a time-of-flight camera
CN109376584A (en) A kind of poultry quantity statistics system and method for animal husbandry
CN108537129A (en) The mask method of training sample, device and system
CN109858552A (en) A kind of object detection method and equipment for fine grit classification
WO2021190470A1 (en) Electronic device positioning method and apparatus, and server, system and storage medium
CN110111370A (en) A kind of vision object tracking methods based on TLD and the multiple dimensioned space-time characteristic of depth
CN109409294A (en) The classification method and system of trapping event based on object motion trajectory
CN106529470A (en) Gesture recognition method based on multistage depth convolution neural network
Jahangiri et al. Information pursuit: A Bayesian framework for sequential scene parsing
CN109492573A (en) A kind of pointer read method and device
CN116110586B (en) Elephant health management system based on YOLOv5 and SlowFast
CN102867214A (en) Counting management method for people within area range

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant