CN102360422A

CN102360422A - Violent behavior detecting method based on video analysis

Info

Publication number: CN102360422A
Application number: CN2011103182972A
Authority: CN
Inventors: 谢剑斌; 刘通; 闫玮; 李沛秦; 谢建华; 杨郴涟
Original assignee: HUNAN SHUNDE ELECTRONIC TECHNOLOGY Co Ltd
Current assignee: Beijing Guochuang Keshi Technology Co., Ltd.
Priority date: 2011-10-19
Filing date: 2011-10-19
Publication date: 2012-02-22

Abstract

The invention relates to a violent behavior detection method based on video analysis. The method mainly comprises four steps: obtaining ROI regions, calculating the motion field, extracting features, and classifying the features. Using computer-aided means and video analysis technology, the method intelligently detects violent behavior in surveillance video, quickly finds abnormal conditions, and gives early warnings, thereby effectively preventing violent behavior from escalating, with remarkable economic and social benefits.

Description

Violent behavior detection method based on video analysis

Technical field

The present invention relates to a violent behavior detection method based on video analysis.

Background art

Many venues carry latent safety hazards. In custodial facilities, for example, the inmate population is large and mixed and includes people with violent tendencies or extreme behavior, so violent incidents occur frequently; every year thousands of injuries and disabilities are caused by violence, which harms society and does irreparable damage to the victims. At present, most custodial facilities monitor violent behavior manually. However, such facilities have many cells, many inmates, and large activity areas; violent behavior is highly sudden and usually happens out of sight of the guards. Manual monitoring therefore consumes considerable manpower and material resources and still cannot effectively prevent violent behavior.

Applying computer-aided means and video analysis technology to surveillance video of custodial facilities makes it possible to detect violent behavior intelligently, find it promptly, and warn about it early. This can effectively prevent violent behavior from occurring and escalating, with remarkable economic and social benefits.

Automatic detection of violent behavior is still at an early stage, and research results at home and abroad are relatively few. Datta et al. extract behavioral features from the human contour and limbs; however, in real, complex surveillance scenes an accurate description of the human contour is very difficult, especially when multiple targets occlude one another, so such methods rarely work well, and they are also very slow. Nam et al. use sound as the feature for detecting violent behavior, but this method can only detect special scenes such as gunshots, bleeding, and explosions, and cannot detect common violent behavior such as fighting and brawling. Mecocci et al. define violent behavior through a local-complexity notion based on maximum color-variation energy; because this method relies on color features, it adapts poorly to complex and changing surveillance environments such as nighttime or weak illumination. Patent [CN101557506] analyzes violent behavior with an HMM, but it targets only simple scenes such as an elevator car and is unsuitable for complex custodial scenes. Patent [CN101464952] uses a contour-based abnormal-behavior recognition method, but it fails when several people occlude one another or when targets are far from the camera, because the complete human contour is then hard to obtain. Patent [CN101344966] detects abnormal behavior from extracted target trajectory features, but it can only detect abnormal behavior such as running or pacing in simple scenes, and its accuracy for violent behavior such as fighting and brawling is very low.

Summary of the invention

To this end, the present invention provides a fast violent-behavior detection method based on motion field extraction, which solves the problems of low accuracy and slow speed in detecting violent behavior in complex surveillance scenes.

The violent behavior detection method proposed by the present invention mainly comprises four steps: ROI (Region of Interest) acquisition, motion field calculation, feature extraction, and feature classification, as shown in Fig. 1. The details are as follows:

One, ROI region acquisition

A region where violent behavior occurs must contain complex motion. Obtaining the ROI regions (MR) of complex motion reduces the computation of the subsequent steps and also reduces false detections.

(1) Target detection

The present invention proposes an adjacent N-frame difference algorithm for target detection based on adaptive-threshold segmentation. The algorithm is robust to noise, fast, and efficient, and can effectively reduce the false alarm rate. Its detailed process is as follows (taking N = 5 as an example):

Step1: Take five adjacent frames I_{k-2}, I_{k-1}, I_k, I_{k+1}, I_{k+2} and compute the frame difference data Erro:

Erro = | I_{k} - α (\frac{I_{k - 1} + I_{k - 2}}{2}) - (1 - α) (\frac{I_{k + 1} + I_{k + 2}}{2}) |

where α is a weight whose initial value is 0.5.

Step2: Determine the adaptive threshold T: compute the mean of the frame difference data and multiply it by a weighting coefficient to obtain the adaptive threshold.

m = \frac{1}{M \times N} Σ_{i = 1}^{M} Σ_{j = 1}^{N} Erro (i, j)

T = β × m

where M × N is the image size and β is a weighting coefficient; here β = 10.

Step3: Update α and extract the moving region M_k:

α = e^{-2/m}

D_{p, q} (i, j) = \{\begin{matrix} 1, & if | I_{p} (i, j) - I_{q} (i, j) | \geq T \\ 0, & otherwise \end{matrix}

M_{k} (i, j) = \{\begin{matrix} 1, & if (D_{k, k - 2} (i, j) + D_{k, k - 1} (i, j) + D_{k, k + 1} (i, j) + D_{k, k + 2} (i, j) \geq 3) \\ 0, & otherwise \end{matrix}
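The three steps above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: frames are assumed to be equal-size grayscale images stored as lists of rows, the function name `frame_difference_mask` is hypothetical, and returning an empty mask when the mean difference m is zero (a static scene, where T = 0 would otherwise mark every pixel) is an added guard that the text does not specify.

```python
import math


def frame_difference_mask(frames, alpha=0.5, beta=10.0):
    """Five-frame difference with an adaptive threshold (N = 5).

    frames: [I_{k-2}, I_{k-1}, I_k, I_{k+1}, I_{k+2}], equal-size grayscale
    images given as lists of rows. Returns (mask M_k, threshold T, new alpha).
    """
    prev2, prev1, cur, next1, next2 = frames
    rows, cols = len(cur), len(cur[0])

    # Step1: Erro = |I_k - a*(I_{k-1}+I_{k-2})/2 - (1-a)*(I_{k+1}+I_{k+2})/2|
    erro = [[abs(cur[i][j]
                 - alpha * (prev1[i][j] + prev2[i][j]) / 2
                 - (1 - alpha) * (next1[i][j] + next2[i][j]) / 2)
             for j in range(cols)] for i in range(rows)]

    # Step2: adaptive threshold T = beta * mean(Erro)
    m = sum(map(sum, erro)) / (rows * cols)
    t = beta * m
    if m == 0:
        # Added guard (assumption): a static scene yields no motion pixels.
        return [[0] * cols for _ in range(rows)], t, alpha

    # Step3: update alpha for the next frame, then let the four binary
    # differences D_{k,q} vote; a pixel moves if at least 3 of 4 agree.
    new_alpha = math.exp(-2.0 / m)
    others = (prev2, prev1, next1, next2)
    mask = [[1 if sum(abs(cur[i][j] - q[i][j]) >= t for q in others) >= 3 else 0
             for j in range(cols)] for i in range(rows)]
    return mask, t, new_alpha
```

The returned `new_alpha` is the updated weight α = e^{-2/m}, intended for computing Erro on the next frame.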

(2) ROI region labeling

The present invention proposes an ROI region fusion algorithm that combines median filtering and mathematical morphology to merge and label the detected target regions. The steps are as follows:

Step1: Use a 3 × 3 median filter template to eliminate isolated motion points;

Step2: Merge target regions:

Apply the dilation and erosion operations of mathematical morphology to remove "holes" in the image;

Step3: Label ROI regions:

Label the binary image of the detected moving targets using 8-neighborhood connectivity. Because violent behavior can only occur in sufficiently large regions, a fixed threshold T_area is used to reject moving regions with too few motion pixels:

{MR}_{t}^{j} = \{\begin{matrix} 1, & {Num}_{t}^{j} > T_{area} \\ 0, & otherwise \end{matrix}

where MR_t^j denotes the j-th moving region at time t, Num_t^j denotes the number of motion pixels in that region, and T_area = 55.
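A sketch of the three ROI labeling steps on list-of-rows binary masks. Assumptions not stated in the text: the 3 × 3 windows are clipped at image borders, the morphological merge is a closing (dilation followed by erosion) with a 3 × 3 square structuring element, and each kept region is returned as a list of (row, col) pixels. All function names are hypothetical.

```python
from collections import deque


def _window(img, i, j):
    """Values of the 3 x 3 neighbourhood of (i, j), clipped at the borders."""
    rows, cols = len(img), len(img[0])
    return [img[y][x]
            for y in range(max(0, i - 1), min(rows, i + 2))
            for x in range(max(0, j - 1), min(cols, j + 2))]


def median3x3(img):
    """Step1: a 3 x 3 median filter removes isolated motion points."""
    out = []
    for i in range(len(img)):
        row = []
        for j in range(len(img[0])):
            w = sorted(_window(img, i, j))
            row.append(w[len(w) // 2])
        out.append(row)
    return out


def close3x3(img):
    """Step2: closing = dilation (max) then erosion (min), filling small holes."""
    rows, cols = len(img), len(img[0])
    dilated = [[max(_window(img, i, j)) for j in range(cols)] for i in range(rows)]
    return [[min(_window(dilated, i, j)) for j in range(cols)] for i in range(rows)]


def label_regions(mask, t_area=55):
    """Step3: 8-connected labeling; keep regions with more than t_area pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j] or seen[i][j]:
                continue
            seen[i][j] = True
            queue, pixels = deque([(i, j)]), []
            while queue:  # breadth-first flood fill over the 8 neighbours
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(pixels) > t_area:
                regions.append(pixels)
    return regions


def extract_rois(motion_mask, t_area=55):
    return label_regions(close3x3(median3x3(motion_mask)), t_area)
```

Note that the median filter also trims the convex corners of a blob; the closing fills interior holes but does not restore those corners.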

Two, motion field calculation

Complexity of motion is a key property of violent behavior; computing the motion field of the ROI region in which violent behavior occurs is therefore the basis of violent-behavior detection.

The present invention adopts an ROI motion field calculation method based on multiple diamond templates. The motion field of an ROI region is obtained as follows:

Step1: Search the 17 points labeled "1" (as shown in Fig. 2) and find the position of minimum block distortion (MBD). If the MBD point is at the center of the search window, the algorithm ends; if the MBD point lies on the large diamond template, go to Step2; otherwise go to Step3.

Step2: Centered on the MBD point found in Step1, search repeatedly with the small diamond template until the MBD point is at the center of the search window.

Step3: Halve the search step size and determine the new MBD point; repeat until the step size equals 1, at which point the algorithm ends.
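For illustration, here is the standard diamond search: the large diamond pattern is repeated until its best point is the center, then one small-diamond refinement is done, with the sum of absolute differences (SAD) as the block distortion measure. It is swapped in for clarity and is not the patent's exact 17-point, step-halving variant; the block size, the SAD criterion, and all names are assumptions.

```python
# Large and small diamond search patterns as (dy, dx) offsets; centre first.
LDSP = [(0, 0), (-2, 0), (2, 0), (0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]
SDSP = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def sad(cur, ref, top, left, dy, dx, size):
    """Block distortion: sum of absolute differences (inf if out of bounds)."""
    y0, x0 = top + dy, left + dx
    if y0 < 0 or x0 < 0 or y0 + size > len(ref) or x0 + size > len(ref[0]):
        return float("inf")
    return sum(abs(cur[top + i][left + j] - ref[y0 + i][x0 + j])
               for i in range(size) for j in range(size))


def best_point(cur, ref, top, left, cy, cx, pattern, size):
    """Return the pattern point around (cy, cx) with minimum distortion."""
    best_cost, by, bx = float("inf"), cy, cx
    for dy, dx in pattern:  # pattern[0] is the centre, so ties keep the centre
        cost = sad(cur, ref, top, left, cy + dy, cx + dx, size)
        if cost < best_cost:
            best_cost, by, bx = cost, cy + dy, cx + dx
    return by, bx


def diamond_search(cur, ref, top, left, size=8, max_iter=64):
    """Motion vector (dy, dx) of the block at (top, left) of cur within ref."""
    cy, cx = 0, 0
    for _ in range(max_iter):
        ny, nx = best_point(cur, ref, top, left, cy, cx, LDSP, size)
        if (ny, nx) == (cy, cx):  # minimum at the centre: switch to small diamond
            break
        cy, cx = ny, nx
    return best_point(cur, ref, top, left, cy, cx, SDSP, size)
```

Each macroblock of an ROI yields one vector (dy, dx); as in the feature extraction step, vectors of zero magnitude are excluded later.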

Because targets at different distances from the camera produce motion fields at different scales, the computed motion field must be normalized.

The present invention proposes a motion field normalization method based on the pinhole camera model, described as follows:

The projection of a real object onto the camera imaging plane is commonly described by the pinhole camera model, as shown in Fig. 3. To avoid an inverted image of the real object on the imaging plane, the imaging plane and the real object are placed on the same side of the focal point. F is the focal length of the lens and C is the focal point; targets T₁ and T₂ with heights h₁ and h₂, at distances D₁ and D₂ from the focal point, have image heights h₁′ and h₂′ on the imaging plane. From the geometry, h₁′ and h₂′ satisfy:

\frac{h_{1}^{'}}{h_{2}^{'}} = \frac{h_{1} D_{2}}{h_{2} D_{1}}

Let the field angle of the pinhole camera (measured across the image diagonal) be β and its focal length F, with the imaging plane located at the focal distance and an image size of m × n, as shown in Fig. 4. The geometry then gives:

F = \frac{\sqrt{m^{2} + n^{2}}}{2 \tan (β / 2)}

Take the center of the imaging plane as origin O and set up a Cartesian coordinate system. By the properties of the lens, the optical axis OC is perpendicular to the imaging plane, as shown in Fig. 5. Let the coordinates of the target reference point T′ be (x, y), and let α and γ be the angles that the lines joining the focal point C to the projections of T′ on the u and v axes, respectively, make with the optical axis. By trigonometry:

α = \arctan (\frac{y - \frac{m}{2}}{F})

γ = \arctan (\frac{x - \frac{n}{2}}{F})

where the coordinates (x, y) of the target reference point T′ are measured with the lower-left corner of the image as origin, and F is the focal length.
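The focal length and projection angles can be computed directly from the formulas above. A minimal sketch, assuming the field angle is given in degrees, that F comes out in pixel units, and that m is the image extent along y and n the extent along x (as the pairing of y with m and x with n in the formulas suggests); the function names are hypothetical.

```python
import math


def focal_length(m, n, field_angle_deg):
    """F = sqrt(m^2 + n^2) / (2 tan(beta / 2)), in pixel units."""
    return math.hypot(m, n) / (2.0 * math.tan(math.radians(field_angle_deg) / 2.0))


def projection_angles(x, y, m, n, f):
    """alpha and gamma (radians) for a reference point T'(x, y).

    (x, y) is measured from the lower-left image corner; m is assumed to be
    the image extent along y and n the extent along x.
    """
    alpha = math.atan((y - m / 2.0) / f)
    gamma = math.atan((x - n / 2.0) / f)
    return alpha, gamma
```

For example, a 3 × 4 image with a 90° field angle has a diagonal of 5 pixels, so F = 5 / (2 tan 45°) = 2.5, and the image center gives α = γ = 0.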

Fig. 6 shows the geometry of a surveillance system whose camera is located obliquely above the monitored area. C is the focal point, at height CA = H above the ground; θ is the angle between the optical axis and the ground; T is a target point whose position on the imaging plane is T′(x, y); and TB ⊥ OA. The geometry gives:

CT = \frac{CB}{\cos γ} = \frac{H}{\sin (θ + α) \cos γ}

The ratio k_n of the two image sizes is therefore:

k_{n} = \frac{{h_{t}}^{'}}{{h_{0}}^{'}} = \frac{h_{t} D_{0}}{h_{0} H \sin (θ + α) \cos γ} = \frac{h_{t}}{h_{0}} \cdot \frac{D_{0}}{H} \cdot \frac{1}{\sin (θ + α) \cos γ}

For the same object imaged at different distances, h_t/h₀ = 1, and D₀/H is exactly the scaling coefficient η sought. The formula therefore simplifies to:

k_{n} = \frac{η}{\sin (θ + α) \cos γ}

where η = D₀/H is the ratio of the reference distance D₀ to the camera height H.

Here k_n is called the scaling factor. Multiplying the motion field of each obtained ROI region by its corresponding scaling factor normalizes the motion field.
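A sketch of the scaling step as stated in the text: compute k_n = η / (sin(θ + α) cos γ) for an ROI's reference point and multiply every motion vector of that ROI by it. Angles are in radians; η, θ, α, and γ are assumed to be known from camera calibration and the preceding formulas, and the function names are hypothetical.

```python
import math


def scaling_factor(eta, theta, alpha, gamma):
    """k_n = eta / (sin(theta + alpha) * cos(gamma)); angles in radians."""
    return eta / (math.sin(theta + alpha) * math.cos(gamma))


def normalize_motion_field(vectors, k_n):
    """Multiply every motion vector (Vx, Vy) of one ROI by its factor k_n."""
    return [(k_n * vx, k_n * vy) for vx, vy in vectors]
```

For example, with η = 1, θ = 30°, and a reference point on the optical axis (α = γ = 0), k_n = 1/sin 30° = 2, so that ROI's motion vectors are doubled.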

Three, feature extraction

Statistically, when a scene contains violent behavior, the motion field of the corresponding ROI region has large magnitudes and disordered directions; the violent-behavior features are extracted on this basis.

(1) Stability factor f_U

When violent behavior occurs, people obstruct and resist one another, so the centroid of the moving target changes position relatively slowly. In the motion field this shows up as a small mean. The present invention proposes the stability factor f_U to describe this phenomenon, computed as follows:

For ease of description, suppose that block matching yields, for a given moving region, M motion vectors with non-zero magnitude, where the motion vector of the i-th macroblock is (Vx_i, Vy_i). Compute the means of the motion field in the x and y directions:

\overline{Vx} = \frac{1}{M} Σ_{i = 1}^{M} {Vx}_{i},

\overline{Vy} = \frac{1}{M} Σ_{i = 1}^{M} {Vy}_{i}

The stability factor f_U is computed by the following formula:

f_{U} = \exp (- λ \sqrt{{\overline{Vx}}^{2} + {\overline{Vy}}^{2}})

where λ is a fixed coefficient that can be determined experimentally; here λ = 0.5.
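The stability factor follows directly from the formula. A minimal sketch, assuming motion vectors are (Vx, Vy) tuples; zero vectors are dropped as the text specifies, and returning 1.0 when no non-zero vector exists (the value of the formula at a zero mean) is an added assumption. The function name is hypothetical.

```python
import math


def stability_factor(vectors, lam=0.5):
    """f_U = exp(-lambda * |mean motion vector|) over non-zero vectors."""
    moving = [(vx, vy) for vx, vy in vectors if vx or vy]
    if not moving:
        return 1.0  # assumption: no motion behaves like a zero mean
    mean_vx = sum(vx for vx, _ in moving) / len(moving)
    mean_vy = sum(vy for _, vy in moving) / len(moving)
    return math.exp(-lam * math.hypot(mean_vx, mean_vy))
```

Opposing vectors cancel in the mean, which is why mutual pushing and resisting keeps f_U close to 1, while a region translating as a whole drives it toward 0.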

(2) Mean motion field energy M_R and mean square deviation σ_R

When violent behavior occurs, some parts of the moving region (such as arms, feet, and weapons) inevitably move fast while other parts move relatively slowly. In the motion field this shows up as large fluctuations in motion energy. The present invention proposes the mean motion field energy M_R and the mean square deviation σ_R to describe this phenomenon, computed as follows:

First compute the energy R_i of each motion vector:

R_{i} = \sqrt{{Vx}_{i}^{2} + {Vy}_{i}^{2}}

Then compute the mean energy M_R and the mean square deviation σ_R of the motion field:

M_{R} = \frac{1}{M} Σ_{i = 1}^{M} R_{i}

σ_{R} = \frac{1}{M} \sqrt{Σ_{i = 1}^{M} {(R_{i} - M_{R})}^{2}}
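A sketch of M_R and σ_R. It follows the formula exactly as written, (1/M)·sqrt(Σ(R_i − M_R)²), which differs from the usual population standard deviation sqrt((1/M)·Σ(R_i − M_R)²); the function name and the (0.0, 0.0) result for a region without motion are assumptions.

```python
import math


def energy_statistics(vectors):
    """Return (M_R, sigma_R) for the non-zero motion vectors of one region."""
    energies = [math.hypot(vx, vy) for vx, vy in vectors if vx or vy]
    if not energies:
        return 0.0, 0.0  # assumption: an empty region has no energy
    count = len(energies)
    mean_r = sum(energies) / count
    # sigma_R exactly as written: (1/M) * sqrt(sum((R_i - M_R)^2))
    sigma_r = math.sqrt(sum((r - mean_r) ** 2 for r in energies)) / count
    return mean_r, sigma_r
```

For the vectors (3, 4) and (0, 1), the energies are 5 and 1, so M_R = 3 and σ_R = sqrt(8)/2 = sqrt(2).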

(3) Normalized direction entropy E_o and direction deviation M_o

When violent behavior occurs, behaviors such as mutual confrontation and dodging inevitably make the motion field look disordered in direction. The present invention proposes the normalized direction entropy E_o and the direction deviation M_o to characterize this phenomenon, implemented as follows:

Step1: Divide 0 to 360 degrees into N directions, N being a positive integer (experiments show that N is best taken between 10 and 30), labeled 0 to N-1. Quantize the direction of each motion vector to one of these labels and count the probability of motion vectors in each direction; the result is called the normalized motion direction histogram H(θ), shown in Fig. 7 with N = 16.

Step2: Compute the entropy E_o of the normalized direction histogram H(θ):

E_{o} = - Σ_{i = 0}^{N - 1} p_{i} \log p_{i}

where p_i is the probability that a motion vector falls in the i-th direction.
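A sketch of the direction histogram and its entropy. Assumptions: bin i covers [i·360/N, (i+1)·360/N) degrees of the angle given by atan2, the natural logarithm is used, empty bins contribute 0, and the entropy is taken with the usual minus sign, −Σ p_i log p_i, so that a more disordered histogram gives a larger E_o. The function names are hypothetical.

```python
import math


def direction_histogram(vectors, n_bins=16):
    """Normalized motion-direction histogram H(theta) over non-zero vectors."""
    counts = [0] * n_bins
    width = 360.0 / n_bins
    total = 0
    for vx, vy in vectors:
        if vx == 0 and vy == 0:
            continue  # only vectors with non-zero magnitude are counted
        angle = math.degrees(math.atan2(vy, vx)) % 360.0
        counts[int(angle // width) % n_bins] += 1  # % guards angle == 360.0
        total += 1
    return [c / total for c in counts] if total else [0.0] * n_bins


def direction_entropy(p):
    """E_o = -sum p_i log p_i (natural log); empty bins contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)
```

All vectors pointing the same way give E_o = 0, while four vectors spread over four directions give E_o = log 4.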

Step3: Compute the direction deviation M_o. For the i-th direction of histogram H(θ), its relative direction θ_ij to any direction j is computed as follows (for N = 16):

θ_{ij} = \{\begin{matrix} | θ_{i} - θ_{j} | & if | θ_{i} - θ_{j} | \leq 8 \\ 16 - | θ_{i} - θ_{j} | & else \end{matrix}

The mean relative direction of the i-th direction is then:

\overline{θ_{i}} = \frac{1}{N} Σ_{j = 0}^{N - 1} (| θ_{ij} | \times p_{i})

The minimum of these values is taken as the direction deviation M_o:

M_{o} = \min_{0 \leq i \leq 15} \overline{θ_{i}}
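The direction deviation can be sketched for a general N by replacing 8 and 16 with N/2 and N. This follows the formula as written, weighting by p_i. Because that weight does not depend on j, the inner sum of |θ_ij| is the same for every i (64 when N = 16), so M_o reduces to a constant multiple of the smallest bin probability. The function name is hypothetical.

```python
def direction_deviation(p):
    """M_o = min_i (1/N) * sum_j |theta_ij| * p_i, with circular bin distance."""
    n = len(p)
    means = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            d = abs(i - j)
            theta_ij = d if d <= n // 2 else n - d  # 8 and 16 when N = 16
            total += abs(theta_ij) * p[i]  # weight p_i, as in the formula
        means.append(total / n)
    return min(means)
```

For N = 16 this gives M_o = 4·min_i p_i: 0.25 for a uniform histogram and 0 whenever some direction is empty.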

Four, feature classification

In general, when violent behavior occurs in a moving region, the computed f_U, σ_R, E_o, and M_o are large, while M_R stays within a relatively stable range. Other behaviors do not have this statistical property. For slow motions such as walking and chatting, f_U and M_R may be close to those of violent behavior, but σ_R, E_o, and M_o are clearly smaller. For fast, uniform motions such as running, f_U becomes very small, E_o and M_o are small, and M_R is clearly larger. Based on these statistical properties, the present invention normalizes the feature parameters with a joint Gaussian membership function to reduce the differences in their numerical ranges:

f_{i} (x) = \{\begin{matrix} \exp (- {(x - c_{1})}^{2} / (2 {σ_{1}}^{2})) & if x < c_{1} \\ \exp (- {(x - c_{2})}^{2} / (2 {σ_{2}}^{2})) & if x > c_{2} \\ 1 & else \end{matrix}

where f_i stands for σ_R, E_o, M_o, or M_R; c₁ and c₂ are the means of the two Gaussian functions, and σ₁ and σ₂ are their mean square deviations, all of which can be determined experimentally.
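A sketch of the joint (two-sided) Gaussian membership function. It uses 2σ² in both branches, the common form of this function (for example MATLAB's `gauss2mf`); c₁ ≤ c₂ and positive σ values are assumed, and the parameter values in any call are placeholders to be fitted experimentally.

```python
import math


def joint_gaussian_mf(x, c1, sigma1, c2, sigma2):
    """Two-sided Gaussian membership: left tail below c1, right tail above c2, 1 between."""
    if x < c1:
        return math.exp(-((x - c1) ** 2) / (2.0 * sigma1 ** 2))
    if x > c2:
        return math.exp(-((x - c2) ** 2) / (2.0 * sigma2 ** 2))
    return 1.0
```

Values inside [c₁, c₂] map to 1; a value one σ₁ below c₁ maps to e^{-0.5}.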

Experiments show that after normalization the feature parameters have good statistical properties: when violent behavior occurs in a moving region, every feature parameter takes a large value, whereas for the various normal behaviors some of the feature parameters take small values. The algorithm therefore makes low demands on feature fusion, and the present invention fuses the computed feature parameters by an efficient weighted sum, introducing the concept of the violence level RVI:

{RVI}_{i} = Σ_{j = 1}^{5} w_{j} \times f_{j}

where 0 ≤ w_j ≤ 1 is the weight of the j-th feature parameter, which can be determined experimentally, and f_j ∈ {f_U, M_R, σ_R, E_o, M_o}.
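The fusion step is a plain weighted sum. A minimal sketch, assuming the five features have already been normalized (f_U already lies in (0, 1]) and are passed in the order f_U, M_R, σ_R, E_o, M_o; the weights in any call are placeholders to be set experimentally, and the function name is hypothetical.

```python
def violence_level(features, weights):
    """RVI = sum_j w_j * f_j; features ordered (f_U, M_R, sigma_R, E_o, M_o)."""
    if len(features) != len(weights):
        raise ValueError("one weight per feature is required")
    if not all(0.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie in [0, 1]")
    return sum(w * f for w, f in zip(weights, features))
```

With equal weights of 0.2, a region whose normalized features are (1, 0.5, 0.5, 1, 1) gets RVI = 0.8.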

The violence level RVI is a combined expression of the changes in motion trajectory, speed, and direction within a moving region and of their degree of disorder, and it characterizes violent behavior in the scene well. Because each frame usually contains several moving regions, the present invention uses the largest RVI among them to characterize the current frame, defined as the maximum violence level MVI:

MVI = \max_{i} \{ {RVI}_{i} \}

Because human behavior in real environments is highly variable and subject to other unpredictable factors, detecting violent behavior from the MVI of a single frame easily leads to a high false alarm rate. The present invention therefore proposes the average maximum violence level AMVI, which uses the mean MVI over N frames to characterize the likelihood that violent behavior is occurring in the surveillance scene:

AMVI = \frac{1}{N} Σ_{j = 1}^{N} {MVI}_{j}

The AMVI of the current frame is compared with a fixed threshold T:

flag = \{\begin{matrix} 1 & if AMVI \geq T \\ 0 & else \end{matrix}

If flag = 1, violent behavior is judged to be occurring in the scene and the current frame is a violent frame; an alarm can then be raised or other processing carried out.
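The frame-level decision can be sketched with a sliding window of MVI values. Assumptions: the window length N and threshold T are supplied by the caller, a frame without moving regions gets MVI = 0, and during the first N − 1 frames the mean is taken over the frames seen so far. The class name is hypothetical.

```python
from collections import deque


class ViolenceDetector:
    """MVI = max region RVI per frame; AMVI = mean MVI over the last n frames."""

    def __init__(self, n_frames, threshold):
        self.history = deque(maxlen=n_frames)  # oldest MVI drops out automatically
        self.threshold = threshold

    def update(self, region_levels):
        """Feed the RVI values of one frame; return (flag, AMVI)."""
        mvi = max(region_levels) if region_levels else 0.0  # assumption: no ROI -> 0
        self.history.append(mvi)
        amvi = sum(self.history) / len(self.history)
        return (1 if amvi >= self.threshold else 0), amvi
```

Averaging over several frames suppresses isolated single-frame spikes, which is the stated purpose of AMVI.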

The advantages of the method of the present invention are: (1) it can intelligently detect violent behavior in video such as fighting, brawling, and chasing, with high detection efficiency and low miss and false detection rates; (2) it does not rely on human color information to discriminate behavior, so it adapts well to the environment and can operate around the clock; (3) it does not rely on accurate human contour information, so it adapts to crowds of different densities; (4) feature normalization is performed automatically based on the pinhole camera model, so there are few requirements on how the camera is installed.

Description of drawings

Fig. 1 Violent behavior detection flow

Fig. 2 Motion field search procedure

Fig. 3 Perspective projection of the pinhole camera

Fig. 4 Focal length and field angle of the pinhole camera

Fig. 5 Projection angles of a target pixel on the imaging plane

Fig. 6 Geometry of the surveillance system

Fig. 7 Normalized motion directions and their histogram

Embodiment

The violent behavior detection method proposed by the present invention mainly comprises four steps:

One, ROI region acquisition;

Two, motion field calculation;

Three, feature extraction;

Four, feature classification.

Wherein,

One, ROI region acquisition comprises:

(1) Target detection, whose detailed process is:

Step1: Take five adjacent frames I_{k-2}, I_{k-1}, I_k, I_{k+1}, I_{k+2} and compute the frame difference data Erro;

Step2: Determine the adaptive threshold T: compute the mean of the frame difference data and multiply it by a weighting coefficient to obtain the adaptive threshold;

Step3: Update α and extract the moving region M_k.

(2) ROI region labeling, with the following steps:

Step1: Use a 3 × 3 median filter template to eliminate isolated motion points;

Step2: Merge target regions;

Step3: Label ROI regions:

Label the binary image of the detected moving targets using 8-neighborhood connectivity.

Two, motion field calculation. The motion field of an ROI region is obtained as follows:

Step1: Search the 17 points labeled "1" and find the position of minimum block distortion (MBD). If the MBD point is at the center of the search window, the algorithm ends; if the MBD point lies on the large diamond template, go to Step2; otherwise go to Step3;

Step2: Centered on the MBD point found in Step1, search repeatedly with the small diamond template until the MBD point is at the center of the search window;

The normalization method is specifically as follows: k_n is the scaling factor, and multiplying the motion field of each obtained ROI region by its corresponding scaling factor normalizes the motion field:

k_{n} = \frac{η}{\sin (θ + α) \cos γ}

where η = D₀/H is the ratio of the reference distance D₀ to the camera height H.

Three, feature extraction comprises:

(1) Stability factor f_U, computed as:

f_{U} = \exp (- λ \sqrt{{\overline{Vx}}^{2} + {\overline{Vy}}^{2}})

where λ is a fixed coefficient that can be determined experimentally (here λ = 0.5), and \overline{Vx} and \overline{Vy} denote the means of the motion field in the x and y directions, respectively;

(2) Mean motion field energy M_R and mean square deviation σ_R:

M_{R} = \frac{1}{M} Σ_{i = 1}^{M} R_{i}

σ_{R} = \frac{1}{M} \sqrt{Σ_{i = 1}^{M} {(R_{i} - M_{R})}^{2}}

where R_i is the energy of each motion vector:

R_{i} = \sqrt{{Vx}_{i}^{2} + {Vy}_{i}^{2}}

and (Vx_i, Vy_i) denotes the motion vector of the i-th macroblock;

(3) Normalized direction entropy E_o and direction deviation M_o, implemented as follows:

Step1: Divide 0 to 360 degrees into N directions, N being a positive integer;

Step2: Compute the entropy E_o of the normalized direction histogram H(θ).

Four, feature classification

A joint Gaussian membership function is used to normalize the feature parameters, reducing the differences in their numerical ranges:

f_{i} (x) = \{\begin{matrix} \exp (- {(x - c_{1})}^{2} / (2 {σ_{1}}^{2})) & if x < c_{1} \\ \exp (- {(x - c_{2})}^{2} / (2 {σ_{2}}^{2})) & if x > c_{2} \\ 1 & else \end{matrix}

Claims

1. A violent behavior detection method based on video analysis, characterized in that it mainly comprises four steps:

One, ROI region acquisition, comprising (1) target detection and (2) ROI region labeling;

Two, motion field calculation, using an ROI motion field calculation method based on multiple diamond templates;

Three, feature extraction;

Four, feature classification, using a joint Gaussian membership function to normalize the feature parameters so as to reduce the differences in their numerical ranges.

2. The violent behavior detection method based on video analysis according to claim 1, characterized in that ROI region acquisition comprises:

(1) Target detection, whose detailed process is:

Step3: Update α and extract the moving region M_k;

(2) ROI region labeling, with the following steps:

Step1: Use a 3 × 3 median filter template to eliminate isolated motion points;

Step2: Merge target regions;

Step3: Label ROI regions.

3. The violent behavior detection method based on video analysis according to claim 1, characterized in that, in motion field calculation, the motion field of an ROI region is obtained as follows:

4. The violent behavior detection method based on video analysis according to claim 1, characterized in that the normalization method is specifically: k_n is the scaling factor, and multiplying the motion field of each obtained ROI region by its corresponding scaling factor normalizes the motion field:

k_{n} = \frac{η}{\sin (θ + α) \cos γ}

where η = D₀/H is the ratio of the reference distance D₀ to the camera height H.

5. The violent behavior detection method based on video analysis according to claim 1, characterized in that feature extraction comprises:

(1) Stability factor f_U, computed as:

f_{U} = \exp (- λ \sqrt{{\overline{Vx}}^{2} + {\overline{Vy}}^{2}})

where λ is a fixed coefficient that can be determined experimentally (here λ = 0.5), and \overline{Vx} and \overline{Vy} denote the means of the motion field in the x and y directions, respectively;

(2) Mean motion field energy M_R and mean square deviation σ_R:

M_{R} = \frac{1}{M} Σ_{i = 1}^{M} R_{i}

σ_{R} = \frac{1}{M} \sqrt{Σ_{i = 1}^{M} {(R_{i} - M_{R})}^{2}}

where R_i is the energy of each motion vector:

R_{i} = \sqrt{{Vx}_{i}^{2} + {Vy}_{i}^{2}}

and (Vx_i, Vy_i) denotes the motion vector of the i-th macroblock;

Step1: Divide 0 to 360 degrees into N directions, N being a positive integer;

Step2: Compute the entropy E_o of the normalized direction histogram H(θ).

6. The violent behavior detection method based on video analysis according to claim 1, characterized in that feature classification uses a joint Gaussian membership function to normalize the feature parameters, reducing the differences in their numerical ranges:

f_{i} (x) = \{\begin{matrix} \exp (- {(x - c_{1})}^{2} / (2 {σ_{1}}^{2})) & if x < c_{1} \\ \exp (- {(x - c_{2})}^{2} / (2 {σ_{2}}^{2})) & if x > c_{2} \\ 1 & else \end{matrix}

7. The violent behavior detection method based on video analysis according to claim 1, characterized in that the computed feature parameters are fused by weighted summation, introducing the concept of the violence level RVI:

{RVI}_{i} = Σ_{j = 1}^{5} w_{j} \times f_{j}

where 0 ≤ w_j ≤ 1 is the weight of the j-th feature parameter, which can be determined experimentally, and f_j ∈ {f_U, M_R, σ_R, E_o, M_o}.