CN105046195A - Human behavior identification method based on asymmetric generalized Gaussian distribution model (AGGD) - Google Patents


Info

Publication number
CN105046195A
Authority
CN
China
Prior art keywords
video
formula
sigma
space
delta
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510313321.1A
Other languages
Chinese (zh)
Other versions
CN105046195B (en)
Inventor
李俊峰
方建良
张飞燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201510313321.1A priority Critical patent/CN105046195B/en
Publication of CN105046195A publication Critical patent/CN105046195A/en
Application granted granted Critical
Publication of CN105046195B publication Critical patent/CN105046195B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/23 - Recognition of whole body movements, e.g. for sport training

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human behavior identification method based on an asymmetric generalized Gaussian distribution model (AGGD). First, a training video is preprocessed and its spatio-temporal interest points are detected; a video block is then extracted around each interest point, its optical flow information and gradient information are computed, and the corresponding histograms are constructed from the optical flow and gradient data. Each histogram is fitted with an AGGD, and the AGGD parameters of the optical flow and gradient information are taken as features to form the feature matrix of the training video. The same processing is applied to a test video to obtain its feature matrix. Finally, the Mahalanobis distance between the feature matrix of the training video and that of the test video is computed, and the behavior in the test video is identified according to the nearest-neighbour rule. The method substantially improves the accuracy of identifying the behavior in the video to be recognized.

Description

Human body behavior recognition method based on an asymmetric generalized Gaussian model
Technical field
The present invention relates to a method of human body behavior recognition, belonging to the fields of computer vision and machine learning; specifically, it is a human body behavior recognition algorithm.
Background technology
In recent years, research on and application of intelligent video surveillance technology has attracted wide attention. As one of its basic processing components, behavior recognition is a very active research direction and an important research topic in the field of computer vision.
Current research methods can be divided into two classes: behavior recognition methods based on global features and those based on local features. Global features usually describe the whole detected human region of interest using information such as edges, optical flow and silhouette contours, and are relatively sensitive to noise, viewpoint change and partial occlusion. For example, one approach uses a Gaussian mixture model to adaptively update the background model; after the moving foreground is extracted from the video sequence, the foreground regions are labelled, the Lucas-Kanade optical flow method is used to obtain the optical flow in the moving regions, and an amplitude-weighted direction histogram describes the behavior. Another approach uses a double-background model for adaptive background updating; after the moving foreground of the video sequence is extracted, the Lucas-Kanade method computes optical flow over the minimum adjacent rectangular region of the foreground, and the per-unit weighted optical flow energy of the moving target is used for behavior recognition. Some scholars first extract the optical flow in the video, then use an empirical covariance matrix to obtain a covariance descriptor that reduces its dimensionality, map it to a vector space by taking its logarithm, and finally use the resulting log-covariance descriptor for behavior recognition. Another work extracts a histogram-of-optical-flow oriented feature to describe motion behavior, requiring neither human body segmentation nor background subtraction. A behavior recognition method based on a bag of 3D joint-point sampling features from depth images has been proposed, which describes human behavior by extracting 3D joint points characterizing human posture from range image sequences. Others propose a motion-pattern analysis method for abnormal behavior detection: a motion model is generated from the optical flow of the video and trajectories are defined, the trajectories are hierarchically clustered using spatio-temporal information to learn a statistical motion pattern, and this statistical model is finally used for anomaly detection.
Local features describe blocks or points of interest extracted from the human body; they do not require accurate localization and tracking of the body and are insensitive to partial occlusion, viewpoint change and so on. Local features are therefore used more frequently in behavior recognition. For example, quantization parameters and motion vectors have been extracted from compressed video sequences as features; 3D-HOG features combined with optical flow features have been used to describe video behaviors; some works extract 3D-SIFT features from the video sequence; others combine HOG and HOF features to jointly describe spatio-temporal cuboids extracted from the video sequence; spatio-temporal bag-of-words features have been extracted from videos and a labelled latent Dirichlet allocation model used as the classifier for behavior recognition; a fast dense-trajectory behavior recognition method has been proposed, which extracts dense trajectory features from regions of interest in video frames and then uses a temporal pyramid to realize a speed-adaptive mechanism for different actions; more accurate optical flow features have been computed after removing the image background in preprocessing and detecting image interest points with the Harris corner detection algorithm; and another method first uses optical flow detection to find the position and direction of motion, further locates and identifies the most salient motion in the frame by random sample consensus, then locates a small rectangular region of human motion from the mean difference and standard deviation of the interest points in the horizontal and vertical directions of the optical flow field, divides this small rectangular region into several blocks, computes optical flow frame by frame at the interest points, assembles it into a matrix, averages the summed matrices of identical behaviors to represent the behavior, and finally uses a simple classifier for behavior recognition.
How to obtain features that effectively express human motion information from the image sequence is the key to human body behavior recognition. Optical flow is a rather good spatio-temporal feature and is a motion feature frequently used in motion recognition. In the methods above, either the moving foreground of the video is extracted first, the foreground motion regions are labelled, and optical flow is then computed over them; or optical flow is computed over a regular partition of the region after the whole human motion region has been detected. For the various behaviors of the human body, the optical flow of body parts with inconspicuous motion is negligible, yet the above methods compute the optical flow of the whole human region, which not only increases the amount of computation but can also reduce recognition accuracy. Likewise, for spatio-temporal features, the spatio-temporal descriptors are first reduced by PCA and a bag-of-words codebook is then constructed, i.e. the training data are sampled and then clustered to generate a "dictionary"; this means the training samples are not fully utilized, and to guarantee a certain average recognition rate the amount of sample data remains too high even after dimensionality reduction, so clustering is slow. In addition, the feature data of different directions may be somewhat similar, and clustering all directions together weakens the descriptive power of the directional features for the behavior.
Summary of the invention
The technical problem to be solved by the present invention is to provide a human body behavior recognition method based on an asymmetric generalized Gaussian model.
In order to solve the above technical problem, the invention provides a human body behavior recognition method based on an asymmetric generalized Gaussian model, realized with a training video library and a test video, comprising the following steps: Step 1, interest point detection is performed for the given training video library and for the test video respectively; Step 2, video blocks are extracted centred on the interest points; Step 3, the video block information of the training videos and of the test video is computed respectively, obtaining the gradient feature data in the three directions X, Y, Z and the two optical flow components u, v; Step 4, the three gradient direction histograms and the two optical flow direction histograms are drawn from the above data; Step 5, each histogram is fitted with the asymmetric generalized Gaussian model; Step 6, the parameters of the asymmetric generalized Gaussian model are extracted as features to form the feature matrix of each behavior of the training videos and the feature matrix of the test video; Step 7, the Mahalanobis distance between the test video feature matrix and the feature matrix of each behavior of the training videos is computed; Step 8, behavior recognition is carried out according to the nearest-neighbour rule.
As an improvement of the human body behavior recognition method based on the asymmetric generalized Gaussian model of the present invention, in said steps the interest point detection is as follows: the video is regarded as an image sequence f(x, y, t) composed of multiple frames. A linear scale-space representation L of f is obtained by convolving the image sequence f with a Gaussian kernel that is separable in the spatial variable (scale σ_l²) and the temporal variable (scale τ_l²):
L(·; σ_l², τ_l²) = g(·; σ_l², τ_l²) * f(·)    (9)
The Gaussian window in the spatio-temporal domain is defined as:
g(x, y, t; σ_l², τ_l²) = 1/√((2π)³ σ_l⁴ τ_l²) · exp(−(x² + y²)/(2σ_l²) − t²/(2τ_l²))    (10)
In the formula, σ_l is the spatial scale variable, τ_l is the temporal scale variable and t is the time dimension. The response function R is defined as:
R(x, y, t) = (I * g * h_ev)² + (I * g * h_od)²    (11)
In the formula, * is the convolution operator, I is the video image, g is the two-dimensional Gaussian smoothing kernel, and h_ev and h_od are an orthogonal (quadrature) pair of one-dimensional Gabor filters applied along the temporal axis; h_ev and h_od are defined as:
h_ev(t; τ, ω) = −cos(2πtω) · e^(−t²/τ²)    (12)
h_od(t; τ, ω) = −sin(2πtω) · e^(−t²/τ²)    (13)
In formulas (12) and (13), σ and τ are the detection scales of the spatial and temporal domains respectively; σ = 2 and τ = 3 are taken, and the Gaussian smoothing filter scale is 2. The neighbourhood of each local maximum of the response function R contains the local human motion information in I(x, y, t).
As a further improvement of the human body behavior recognition method based on the asymmetric generalized Gaussian model of the present invention, in said steps the two optical flow components u, v are obtained as follows.
Extraction of spatio-temporal feature points: after interest point detection is performed on the image sequence, the spatio-temporal interest points are obtained; a spatio-temporal cuboid is defined centred on each spatio-temporal interest point, and the pixels of this cuboid are extracted to construct the spatio-temporal features. Let the spatio-temporal cuboid be I(x, y, t); then its gradients G_x, G_y, G_z along the X, Y and Z axes can be defined respectively as:
G_x(x, y, t) = L(x+1, y, t) − L(x−1, y, t)    (14)
G_y(x, y, t) = L(x, y+1, t) − L(x, y−1, t)    (15)
G_z(x, y, t) = L(x, y, t+1) − L(x, y, t−1)    (16)
In the formulas, L(x+1, y, t) is the value of the sequence after Gaussian convolution filtering at (x+1, y, t).
Extraction of optical flow features: the Lucas-Kanade method is adopted to compute the optical flow. At time t a pixel (x, y) is at position 1, its grey value being I(x, y, t); at time (t + Δt) the pixel has moved to position 2, its position change being (Δx, Δy) and its new grey value I(x + Δx, y + Δy, t + Δt). According to the image brightness constancy assumption, dI(x, y, t)/dt = 0, so:
I(x, y, t) = I(x + Δx, y + Δy, t + Δt)    (17)
Let u and v be the components of the optical flow vector of pixel (x, y) along the x and y directions respectively; the Taylor expansion of formula (17) is:
I(x + Δx, y + Δy, t + Δt) = I(x, y, t) + (∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt + ε    (18)
Ignoring the second- and higher-order terms ε, we have:
(∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt = 0    (19)
Since Δt → 0,
(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0
that is: I_x u + I_y v + I_t = 0    (20)
In formula (20), I_x, I_y, I_t are the partial derivatives of pixel (x, y) along the x, y and t directions; this can be expressed in the vector form:
∇I · U + I_t = 0    (21)
In formula (21), ∇I is the gradient direction and U = (u, v)^T represents the optical flow. Assuming that the optical flow is constant within a window of specified size, the optical flow constraint equations within this window can be solved to obtain the optical flow (u, v) of a feature window of size x × x, that is:
[I_x1 I_y1; I_x2 I_y2; … ; I_xi I_yi] [u; v] = −[I_t1; I_t2; … ; I_ti]    (22)
In formula (22), i is the number of pixels in the feature window, i = x × x; I_x and I_y are the spatial gradients of the image and I_t is the temporal gradient. Solving formula (22) gives:
[u; v] = [ΣI_xi²  ΣI_xi I_yi; ΣI_xi I_yi  ΣI_yi²]⁻¹ · [−ΣI_xi I_ti; −ΣI_yi I_ti]    (23)
As a further improvement of the human body behavior recognition method based on the asymmetric generalized Gaussian model of the present invention, in said steps the parameter feature extraction based on the asymmetric generalized Gaussian model is as follows. The expression of the asymmetric generalized Gaussian model is:
f(x; ν, σ_l², σ_r²) = ν / ((β_l + β_r) Γ(1/ν)) · exp(−(−(x − u)/β_l)^ν),  x < 0
f(x; ν, σ_l², σ_r²) = ν / ((β_l + β_r) Γ(1/ν)) · exp(−((x − u)/β_r)^ν),  x ≥ 0    (24)
In formula (24), β_l = σ_l √(Γ(1/ν)/Γ(3/ν)) and β_r = σ_r √(Γ(1/ν)/Γ(3/ν)), where Γ(·) is the gamma function with expression:
Γ(α) = ∫₀^∞ t^(α−1) e^(−t) dt,  α > 0    (25)
After the asymmetric generalized Gaussian model is fitted to the feature data, its five parameters (α, β_l, β_r, ν, u) are extracted and taken as the features.
In the present invention, a training video is first preprocessed and its spatio-temporal interest points are detected; video blocks are then extracted centred on the interest points and their optical flow information and gradient information are computed; the corresponding histograms are drawn from the optical flow and gradient information; the histograms are then fitted with the asymmetric generalized Gaussian model (AGGD), and the AGGD parameters of the optical flow and gradient information are taken as features to form the feature matrix of the training video. The same processing is applied to the test video to obtain its feature matrix. Finally, the Mahalanobis distance between the feature matrices of the training videos and of the test video is computed, and the behavior of the test video is identified according to the nearest-neighbour rule. The method of the present invention improves the accuracy of recognizing the behavior in the video to a great extent.
Description of the drawings
The specific embodiments of the present invention are described in further detail below in conjunction with the accompanying drawings.
Fig. 1 is the training video processing flow of the present invention;
Fig. 2 is the behavior recognition flow;
Fig. 3 corner points;
Fig. 4 schematic diagram of the elliptic function;
Fig. 5 relationship between eigenvalues and corner points;
Fig. 6 extraction of AGGD parameters of gradient features;
Fig. 7 behavior recognition with AGGD parameter features of gradient features;
Fig. 8 behavior recognition rate of gradient-feature AGGD parameter features on the Weizmann database;
Fig. 9 behavior recognition rate of gradient-feature AGGD parameter features on the KTH database;
Fig. 10 extraction of AGGD parameter features of optical flow features;
Fig. 11 behavior recognition with optical flow AGGD parameter features;
Fig. 12 behavior recognition rate of optical flow AGGD parameter features on the Weizmann database;
Fig. 13 behavior recognition rate of optical flow AGGD parameter features on the KTH database;
Fig. 14 extraction of fused gradient and optical flow parameter features;
Fig. 15 behavior recognition with fused gradient and optical flow AGGD parameter features;
Fig. 16 behavior recognition rate of fused gradient and optical flow AGGD parameter features on the Weizmann database;
Fig. 17 behavior recognition rate of fused gradient and optical flow AGGD parameter features on the KTH database.
Embodiments
Embodiment 1, Figs. 1 to 17, gives a human body behavior recognition method based on an asymmetric generalized Gaussian model, comprising the following steps:
Step 1, gradient and optical flow feature data are extracted from the training video library to form the feature data set of each feature direction (the feature directions here are the 3 directions of the gradient feature and the 2 components of the optical flow feature).
Step 2, the feature data of each of the above feature directions are described with a histogram (a minimal histogram sketch is given below).
Step 3, the above histograms are fitted with the asymmetric generalized Gaussian model (AGGD), and the AGGD parameters are taken as features to form the parameter feature matrix of each behavior.
Step 4, the gradient and optical flow features of the test video are extracted in the same way to form the feature data set of each feature direction.
Step 5, the feature data of step 4 are described with histograms.
Step 6, the histograms are likewise fitted with the AGGD, and the AGGD parameters are taken as features to form the parameter feature matrix of the test video.
Step 7, the Mahalanobis distance between the test video feature matrix and the feature matrix of each behavior in the training video library is computed.
Step 8, the behavior of the test video is judged according to the nearest-neighbour rule.
Steps 1 to 8 above consist mainly of interest point detection, feature point extraction and description, parameter feature extraction based on the asymmetric generalized Gaussian model (AGGD), and behavior recognition based on the AGGD parameters.
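Steps 2 and 5 describe each feature direction with a histogram before the AGGD fitting. The following is a minimal sketch of this histogram step in Python, assuming the feature data of one direction have already been pooled into a one-dimensional array; the bin count and the synthetic data are illustrative assumptions, not values specified by the patent.

```python
import numpy as np

def feature_histogram(values, n_bins=64):
    """Normalized histogram of one feature direction (e.g. all G_x values of the
    video blocks of one behavior); the bin heights are what the AGGD is later
    fitted to.  n_bins is an assumed value, not given in the patent."""
    hist, edges = np.histogram(values, bins=n_bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hist

# toy usage: stand-in for pooled gradient values of one behavior
gx_values = np.random.randn(10000) * 0.8 + 0.1
centers, hist = feature_histogram(gx_values)
```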
1. Interest point detection proceeds as follows:
In the present invention, in order to effectively detect the spatio-temporal interest points of the image sequence I(x, y, t), the following method is adopted.
We first define an image corner as the intersection of two edges, or equivalently as a feature point whose neighbourhood has two dominant directions, similar to the corner of a road or of a house. The neighbourhood of a corner is usually a stable, information-rich region of the image, and such regions have properties such as affine invariance, scale invariance and rotation invariance. Human vision usually recognizes a corner through a local region or a small window, as shown in Fig. 3. If moving this specific small window in any direction causes a large grey-level change inside the window, it can be judged that the window contains a corner. If moving the window in any direction causes no grey-level change, there is no corner in the window. If moving the window in some direction causes a large grey-level change, while moving it in another direction leaves the grey level unchanged, the window may contain a straight edge, as shown in Fig. 3.
According to the autocorrelation function, the self-similarity of the image I(x, y) after a translation (Δx, Δy) at point (x, y) can be expressed as:
c(x, y; Δx, Δy) = Σ_{(u,v)∈W(x,y)} ω(u, v) (I(u, v) − I(u + Δx, v + Δy))²    (1)
In formula (1), ω(u, v) is a weighting function, which can be a constant or a Gaussian weighting function, and W(x, y) is the window centred on point (x, y).
According to a Taylor expansion, a first-order approximation of the image I(x, y) after the translation (Δx, Δy) at point (x, y) gives:
I(u + Δx, v + Δy) ≈ I(u, v) + [I_x(u, v), I_y(u, v)] [Δx; Δy]    (2)
In formula (2), I_x and I_y are the partial derivatives of I(x, y).
Substituting formula (2), formula (1) can be approximated as:
c(x, y; Δx, Δy) ≈ [Δx, Δy] M(x, y) [Δx; Δy]    (3)
In formula (3), M(x, y) = [Σ_ω I_x(u, v)²  Σ_ω I_x(u, v)I_y(u, v); Σ_ω I_x(u, v)I_y(u, v)  Σ_ω I_y(u, v)²], i.e. the autocorrelation function of the image I(x, y) after the translation (Δx, Δy) at point (x, y) can be approximated by a quadratic form.
The quadratic form can in fact be viewed as an elliptic function, as shown in Fig. 4; the ellipticity and size of the ellipse are determined by the eigenvalues λ1 and λ2 of M(x, y), and its orientation is determined by the eigenvectors of M(x, y). Its equation is:
[Δx, Δy] M(x, y) [Δx; Δy] = 1    (4)
According to the magnitudes of the eigenvalues of the quadratic form, the corners, edges (straight lines) and flat regions inside the window can be distinguished, as shown in Fig. 5. When λ1 << λ2 or λ1 >> λ2, i.e. the value of the autocorrelation function is large only in some directions and small in the others, the window contains a straight edge; when λ1 ≈ λ2 and both are small, i.e. the autocorrelation function is small in all directions, the window contains a flat region; when λ1 ≈ λ2 and both are large, i.e. the autocorrelation function is large in all directions, the window contains a corner.
In practice, discriminating a corner does not require computing the eigenvalues explicitly; it is enough to define a corner response function and judge the corner from its value. The response function R is defined as:
R = det M − α (trace M)²    (5)
Simplifying M(x, y) in formula (3) to M(x, y) = [A D; D B], det M and trace M in formula (5) are respectively the determinant and the trace of M(x, y), and α is an empirical constant, usually taken as 0.04 to 0.06.
The Harris interest point detector is extended from the above definition of a corner; the idea of the Harris interest point detection method is to find positions in the image f_sp that show significant change in all directions. The Harris interest point detection can then be described as follows: define an image f_sp; after linear filtering, L_sp is obtained, its expression being:
L_sp(x, y; σ_l²) = g_sp(x, y; σ_l²) * f_sp(x, y)    (6)
In formula (6), g_sp is the Gaussian kernel convolved with the image f_sp, and σ_l² is its scale variation factor.
For the observation given by formula (6), interest points are found with a second-moment matrix integrated over a Gaussian window of scale σ_i²; its expression is:
μ_sp(x, y; σ_l², σ_i²) = g_sp(x, y; σ_i²) * [L_x,sp²  L_x,sp L_y,sp; L_x,sp L_y,sp  L_y,sp²]    (7)
In formula (7), * is the convolution symbol, and L_x,sp and L_y,sp are the x and y gradients of the Gaussian-smoothed image at scale σ_l².
μ_sp can be regarded as a second-moment descriptor, namely the covariance matrix of the two-dimensional distribution of the image in the neighbourhood of a point. Thus the eigenvalues λ1 and λ2 (λ1 ≤ λ2) of the matrix μ_sp describe the variation of f_sp along the two image directions, and an interest point exists when both λ1 and λ2 are large. Based on this, Harris and Stephens proposed a maximum-value computation of a corner detection function, with the expression:
H_sp = det(μ_sp) − k × trace²(μ_sp) = λ1 λ2 − k(λ1 + λ2)²    (8)
At a position where an interest point exists, the eigenvalue ratio α = λ2/λ1 is relatively large. From formula (8), for H_sp to take a positive maximum the eigenvalue ratio must satisfy k ≤ α/(1 + α)²; if k = 0.25 is defined, then α = 1, λ1 = λ2, H takes a positive maximum and the interest point has ideal isotropy.
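As an illustration of the two-dimensional corner response of formulas (5) and (8), the following is a minimal Python sketch of the Harris detector; the Sobel derivatives, the Gaussian window scale and the threshold are assumptions made for the example, not values fixed by the patent.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(image, sigma=1.0, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 of formula (5)/(8);
    M is the Gaussian-windowed second-moment matrix of the image gradients."""
    img = image.astype(np.float64)
    Ix = sobel(img, axis=1)                 # horizontal gradient
    Iy = sobel(img, axis=0)                 # vertical gradient
    # the Gaussian window plays the role of the weighting function w(u, v)
    Ixx = gaussian_filter(Ix * Ix, sigma)
    Iyy = gaussian_filter(Iy * Iy, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)
    det_M = Ixx * Iyy - Ixy ** 2
    trace_M = Ixx + Iyy
    return det_M - k * trace_M ** 2

# corners are local maxima of R above a threshold
frame = np.random.rand(120, 160)
R = harris_response(frame)
corners = np.argwhere(R > 0.01 * R.max())
```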
Since what this patent detects are interest points in a video (an image sequence), the video is regarded as an image sequence f(x, y, t) composed of multiple frames. A linear scale-space representation L of f is obtained by convolving the image sequence f with a Gaussian kernel that is separable in the spatial variable (scale σ_l²) and the temporal variable (scale τ_l²):
L(·; σ_l², τ_l²) = g(·; σ_l², τ_l²) * f(·)    (9)
The Gaussian window in the spatio-temporal domain is defined as:
g(x, y, t; σ_l², τ_l²) = 1/√((2π)³ σ_l⁴ τ_l²) · exp(−(x² + y²)/(2σ_l²) − t²/(2τ_l²))    (10)
In the formula, σ_l is the spatial scale variable, τ_l is the temporal scale variable and t is the time dimension.
In the interest point detection method used in this patent, the spatial dimensions retain the interest point method for images described above, while the temporal dimension adopts the Gabor filters proposed by Dollar; the response function R is then defined as:
R(x, y, t) = (I * g * h_ev)² + (I * g * h_od)²    (11)
In the formula, * is the convolution operator, I is the video image, g is the two-dimensional Gaussian smoothing kernel, and h_ev and h_od are an orthogonal (quadrature) pair of one-dimensional Gabor filters applied along the temporal axis.
h_ev and h_od are defined as:
h_ev(t; τ, ω) = −cos(2πtω) · e^(−t²/τ²)    (12)
h_od(t; τ, ω) = −sin(2πtω) · e^(−t²/τ²)    (13)
In formulas (12) and (13), σ and τ are the detection scales of the spatial and temporal domains respectively; in the present invention σ = 2 and τ = 3 are used, and the Gaussian smoothing filter scale is 2.
The neighbourhood of each local maximum of the response function R contains the local human motion information in I(x, y, t).
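A minimal sketch of the response function of formula (11) is given below: each frame is smoothed with a spatial Gaussian and then filtered along the time axis with the quadrature Gabor pair of formulas (12)-(13). The filter support and the frequency ω = 4/τ are common choices for this kind of detector and are assumptions here, since the patent does not state them.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, convolve1d

def cuboid_response(video, sigma=2.0, tau=3.0, omega=None):
    """Response R of formula (11) for a (T, H, W) video array: spatial Gaussian
    smoothing at scale sigma, then a quadrature pair of 1-D temporal Gabor
    filters h_ev, h_od (formulas (12)-(13)).  omega = 4/tau is an assumed value."""
    if omega is None:
        omega = 4.0 / tau
    smoothed = gaussian_filter(video.astype(np.float64), sigma=(0, sigma, sigma))
    t = np.arange(-2 * int(round(tau)), 2 * int(round(tau)) + 1, dtype=np.float64)
    h_ev = -np.cos(2 * np.pi * t * omega) * np.exp(-t ** 2 / tau ** 2)
    h_od = -np.sin(2 * np.pi * t * omega) * np.exp(-t ** 2 / tau ** 2)
    r_ev = convolve1d(smoothed, h_ev, axis=0, mode='nearest')
    r_od = convolve1d(smoothed, h_od, axis=0, mode='nearest')
    return r_ev ** 2 + r_od ** 2

video = np.random.rand(40, 120, 160)
R = cuboid_response(video)   # spatio-temporal interest points are local maxima of R
```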
2. Feature extraction and description:
2.1 Extraction of spatio-temporal features:
After interest point detection is performed on the image sequence, a series of spatio-temporal interest points is obtained, but these points alone cannot effectively describe human behavior. The present invention defines a spatio-temporal cuboid centred on each spatio-temporal interest point and extracts the pixels of this cuboid to construct the spatio-temporal feature; the side length of the cuboid is six times the scale at which it was detected. This cuboid contains most of the points that contribute to the response function attaining its local maximum.
Methods for describing a spatio-temporal cuboid include expanding the cuboid values into a vector, normalized pixel description, histogram description and so on. When the human body moves, the image brightness in the neighbourhood of an interest point changes violently, and the brightness change differs for different motion behaviors. Therefore, the image brightness change in the neighbourhood of an interest point can be used to describe interest points of different human behaviors. The brightness change near the interest points of different behaviors can be reflected by the gradients of the cuboid brightness along the X, Y and Z (i.e. time) axes, and this patent extracts these gradients as features for human body behavior recognition.
Let the spatio-temporal cuboid be I(x, y, t); then its gradients G_x, G_y, G_z along the X, Y and Z axes can be defined respectively as:
G_x(x, y, t) = L(x+1, y, t) − L(x−1, y, t)    (14)
G_y(x, y, t) = L(x, y+1, t) − L(x, y−1, t)    (15)
G_z(x, y, t) = L(x, y, t+1) − L(x, y, t−1)    (16)
In the formulas, L(x+1, y, t) is the value of the sequence after Gaussian convolution filtering at (x+1, y, t).
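The central differences of formulas (14)-(16) can be sketched as follows for a cuboid stored as a NumPy array; the cuboid shape used in the toy example is an assumption.

```python
import numpy as np

def cuboid_gradients(L):
    """Central-difference gradients G_x, G_y, G_z of formulas (14)-(16) for a
    Gaussian-smoothed spatio-temporal cuboid L of shape (X, Y, T)."""
    Gx = np.zeros_like(L)
    Gy = np.zeros_like(L)
    Gz = np.zeros_like(L)
    Gx[1:-1, :, :] = L[2:, :, :] - L[:-2, :, :]      # L(x+1,y,t) - L(x-1,y,t)
    Gy[:, 1:-1, :] = L[:, 2:, :] - L[:, :-2, :]      # L(x,y+1,t) - L(x,y-1,t)
    Gz[:, :, 1:-1] = L[:, :, 2:] - L[:, :, :-2]      # L(x,y,t+1) - L(x,y,t-1)
    return Gx, Gy, Gz

cube = np.random.rand(18, 18, 18)   # e.g. side length = 6 x scale 3 (assumed)
Gx, Gy, Gz = cuboid_gradients(cube)
```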
2.2 Extraction of optical flow features:
The optical flow field is a vector field that can describe how the image sequence changes over time; it contains the instantaneous motion velocity information of the pixels and is a rather good spatio-temporal feature. However, optical flow computation is relatively expensive; to reduce the amount of computation, optical flow is computed only for the extracted video blocks, and the Lucas-Kanade method is selected to compute it.
Principle of optical flow computation:
At time t a pixel (x, y) is at position 1, its grey value being I(x, y, t); at time (t + Δt) the pixel has moved to position 2, its position change being (Δx, Δy) and its new grey value I(x + Δx, y + Δy, t + Δt). According to the image brightness constancy assumption, dI(x, y, t)/dt = 0, so:
I(x, y, t) = I(x + Δx, y + Δy, t + Δt)    (17)
Let u and v be the components of the optical flow vector of pixel (x, y) along the x and y directions respectively; the Taylor expansion of formula (17) is:
I(x + Δx, y + Δy, t + Δt) = I(x, y, t) + (∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt + ε    (18)
Ignoring the second- and higher-order terms ε, we have:
(∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt = 0    (19)
Since Δt → 0,
(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0
that is: I_x u + I_y v + I_t = 0    (20)
In formula (20), I_x, I_y, I_t are the partial derivatives of pixel (x, y) along the x, y and t directions; this can be expressed in the vector form:
∇I · U + I_t = 0    (21)
In formula (21), ∇I is the gradient direction and U = (u, v)^T represents the optical flow.
Lucas-Kanade optical flow method: the Lucas-Kanade optical flow method [25] is selected to compute the optical flow. Assuming that the optical flow is constant within a window of specified size, the optical flow constraint equations within this window can be solved to obtain the optical flow (u, v) of a feature window of size x × x, that is:
[I_x1 I_y1; I_x2 I_y2; … ; I_xi I_yi] [u; v] = −[I_t1; I_t2; … ; I_ti]    (22)
In formula (22), i is the number of pixels in the feature window, i = x × x; I_x and I_y are the spatial gradients of the image and I_t is the temporal gradient. Solving formula (22) gives:
[u; v] = [ΣI_xi²  ΣI_xi I_yi; ΣI_xi I_yi  ΣI_yi²]⁻¹ · [−ΣI_xi I_ti; −ΣI_yi I_ti]    (23)
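A minimal sketch of the least-squares solve of formulas (22)-(23) for a single feature window follows; the window size and the synthetic derivatives are illustrative assumptions.

```python
import numpy as np

def lk_window_flow(Ix, Iy, It):
    """Least-squares solution of formulas (22)-(23) for one feature window.
    Ix, Iy, It are the spatial and temporal derivatives of all pixels in the
    window (any shape; they are flattened)."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)    # i x 2 coefficient matrix
    b = -It.ravel()
    AtA = A.T @ A                                     # 2 x 2 normal equations
    if np.linalg.matrix_rank(AtA) < 2:                # degenerate window (aperture problem)
        return np.zeros(2)
    return np.linalg.solve(AtA, A.T @ b)              # (u, v)

# toy window: derivatives of a patch translating by (1.0, 0.5) pixels per frame
Ix = np.random.randn(5, 5)
Iy = np.random.randn(5, 5)
It = -(Ix * 1.0 + Iy * 0.5)
u, v = lk_window_flow(Ix, Iy, It)    # recovers approximately (1.0, 0.5)
```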
3. Parameter feature extraction based on the asymmetric generalized Gaussian model (AGGD):
Although the curves of the feature data are close to a Gaussian distribution, they are not strictly symmetric; in view of this characteristic, this patent chooses the asymmetric generalized Gaussian distribution (AGGD) to fit the two classes of feature data.
The expression of the AGGD is as follows:
f(x; ν, σ_l², σ_r²) = ν / ((β_l + β_r) Γ(1/ν)) · exp(−(−(x − u)/β_l)^ν),  x < 0
f(x; ν, σ_l², σ_r²) = ν / ((β_l + β_r) Γ(1/ν)) · exp(−((x − u)/β_r)^ν),  x ≥ 0    (24)
In formula (24), β_l = σ_l √(Γ(1/ν)/Γ(3/ν)) and β_r = σ_r √(Γ(1/ν)/Γ(3/ν)), where Γ(·) is the gamma function with expression:
Γ(α) = ∫₀^∞ t^(α−1) e^(−t) dt,  α > 0    (25)
After AGGD model fitting of the feature data, its five parameters (α, β_l, β_r, ν, u) are extracted.
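The patent does not state how the AGGD of formula (24) is fitted; the sketch below uses the common moment-matching estimator (as used for BRISQUE-style AGGD features) as an assumed procedure, returning β_l, β_r, the shape ν and the centring value u.

```python
import numpy as np
from scipy.special import gamma

def fit_aggd(x):
    """Moment-matching fit of the AGGD of formula (24).  This estimator is an
    assumption (the patent does not specify one); it returns (beta_l, beta_r,
    nu, u), with u the sample mean used to centre the data."""
    u = x.mean()
    xc = x - u
    sigma_l = np.sqrt(np.mean(xc[xc < 0] ** 2))
    sigma_r = np.sqrt(np.mean(xc[xc >= 0] ** 2))
    gamma_hat = sigma_l / sigma_r
    r_hat = np.mean(np.abs(xc)) ** 2 / np.mean(xc ** 2)
    R_hat = r_hat * (gamma_hat ** 3 + 1) * (gamma_hat + 1) / (gamma_hat ** 2 + 1) ** 2
    # invert rho(nu) = Gamma(2/nu)^2 / (Gamma(1/nu) * Gamma(3/nu)) by grid search
    nu_grid = np.arange(0.2, 10.0, 0.001)
    rho = gamma(2 / nu_grid) ** 2 / (gamma(1 / nu_grid) * gamma(3 / nu_grid))
    nu = nu_grid[np.argmin((rho - R_hat) ** 2)]
    scale = np.sqrt(gamma(1 / nu) / gamma(3 / nu))   # beta as defined below formula (24)
    return sigma_l * scale, sigma_r * scale, nu, u

# toy usage: skewed data, heavier on the negative side
data = np.concatenate([-np.abs(np.random.randn(5000)) * 1.5,
                       np.abs(np.random.randn(5000)) * 0.8])
beta_l, beta_r, nu, u = fit_aggd(data)
```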
4. Behavior recognition based on the AGGD parameters:
4.1 Behavior recognition based on gradient-feature AGGD parameters:
4.1.1 Human body behavior recognition algorithm:
The gradient-feature AGGD parameters of most behaviors can spatially distinguish one behavior from the others; this patent therefore chooses the five parameters (α, β_l, β_r, ν, u) of the asymmetric generalized Gaussian distribution model of the gradient features as the features; Table 4.1 gives the feature interpretation.
Table 4.1 Feature interpretation table
According to the motion characteristics of the different behaviors, this patent extracts a corresponding number of spatio-temporal cuboids from each preprocessed behavior video after interest point detection, performs the gradient description on the cuboids and then extracts the features listed in Table 4.1 (the flow is shown in Fig. 6); the Mahalanobis distance between the feature matrices of the training set behavior videos and of the test set behavior videos is then computed, and finally the nearest-neighbour classifier is used for behavior recognition, as shown in Fig. 7.
The nearest-neighbour classifier uses every sample in the training set as the discrimination rule: it finds the training sample closest to the sample to be classified and classifies on that basis. Suppose the training set has N samples {x_1, x_2, …, x_N} divided into y classes; then the distance d(x, x_i) from the sample x to be classified to the training sample x_i is:
d(x, x_i) = ||x − x_i||    (26)
If d(x, x_k) satisfies d(x, x_k) = min_{i=1,2,…,N} d(x, x_i) and x_k ∈ ω_j, then x ∈ ω_j.
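The following is a minimal sketch of steps 7-8: a nearest-neighbour decision as in formula (26), but measured with the Mahalanobis distance between AGGD-parameter feature vectors as described above; the 5-dimensional vectors, the labels and the shared inverse covariance matrix are illustrative assumptions.

```python
import numpy as np

def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance between two AGGD-parameter feature vectors."""
    d = x - y
    return float(np.sqrt(d @ cov_inv @ d))

def nearest_neighbour_label(test_vec, train_vecs, train_labels, cov_inv):
    """1-NN rule of formula (26): the test video receives the label of the
    closest training feature vector, here under the Mahalanobis metric."""
    dists = [mahalanobis(test_vec, t, cov_inv) for t in train_vecs]
    return train_labels[int(np.argmin(dists))]

# toy usage with assumed 5-dimensional AGGD parameter vectors
rng = np.random.default_rng(0)
train = rng.normal(size=(20, 5))
labels = ['walk'] * 10 + ['jump'] * 10
cov_inv = np.linalg.inv(np.cov(train, rowvar=False))
print(nearest_neighbour_label(train[3] + 0.01, train, labels, cov_inv))
```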
4.1.2 Weizmann database experiment:
It can be observed from Fig. 8 that the recognition accuracies of the jump and side behaviors are relatively low compared with those of the other behaviors, being 0.82 and 0.844 respectively. The jump behavior is most often misclassified as the side behavior, with an error rate of 0.072; the main reason is that the leg movements of the jump and side behaviors are rather similar, the side behavior containing a jump-like action, so the gradient feature values of the two behaviors are close and cannot be accurately distinguished. The side behavior is mainly misclassified as the walk behavior, with an error rate of 0.048; the reason is that the side behavior is sideways walking while the walk behavior is forward walking, the footwork of the two being similar and only the motion amplitude and speed differing slightly, so the gradient feature values of the side and walk behaviors have a certain similarity and the probability of misjudgment is higher.
4.1.3 KTH database experiment:
From Fig. 9, compared with the other behaviors, the error rates of the handclap and handwave behaviors are higher. The handclap behavior is mainly misclassified as the box and handwave behaviors, with error rates of 0.056 and 0.048 respectively; the handwave behavior is mainly misclassified as the box behavior, with an error rate of 0.072. The main reason is that part of the hand movements of these three behaviors are rather similar: they all contain an action in which the hands stretch forward, only the speed and amplitude differing. Their gradient feature values are therefore somewhat similar, which leads to misjudgment.
4.2 Human body behavior recognition based on AGGD parameter features of the optical flow features:
4.2.1 Human body behavior recognition algorithm:
Similarly to the above, this patent uses the AGGD model to fit the optical flow feature data and then extracts its five parameters (α, β_l, β_r, ν, u) as features; the feature interpretation is given in Table 4.2.
Table 4.2 AGGD parameter feature interpretation table of the optical flow features
Consistent with the gradient-parameter behavior recognition experiments above, according to the motion characteristics of the different behaviors a corresponding number of spatio-temporal cuboids is extracted from each preprocessed behavior video after interest point detection, the optical flow description is performed on the cuboids and the features listed in Table 4.2 are extracted (as shown in Fig. 10); behavior recognition is then performed by computing the Mahalanobis distance between the feature matrices of the training set behavior videos and of the test set behavior videos, and finally the nearest-neighbour classifier is used, as shown in Fig. 11.
4.2.2 Weizmann database experiment:
It can be observed from Fig. 12 that the recognition rates of the side and wave1 behaviors are relatively low compared with the other behaviors, being 0.856 and 0.76 respectively. The side behavior is misclassified as the jump behavior with a probability as high as 0.06, which may be related to the partial similarity of the footwork of the side and jump behaviors, making part of the optical flow feature values of the two behaviors close and preventing accurate recognition. The wave1 behavior is mainly misclassified as the wave2 and jack behaviors, with error rates of 0.096 and 0.084 respectively; the reason is that these three behaviors all contain hand-swinging actions, only the swing amplitude and speed differing, so the optical flow feature values of the three behaviors have a certain similarity, which leads to misjudgment.
4.2.3 KTH database experiment:
Comparing the recognition accuracies of the behaviors in Fig. 13, the error rates of the jog and handwave behaviors are higher. The jog behavior is mainly misclassified as run, with an error rate of 0.096; the reason may be that both jog and run contain a running action and differ only in speed (jog is slow, run is fast), so the optical flow feature values of the two behaviors are rather similar and cannot be accurately distinguished. The handwave behavior is misclassified as walk with a probability as high as 0.048; the reason is that part of the hand movements of the two behaviors are similar, only the action and amplitude differing.
4.3 Human body behavior recognition based on the fused gradient and optical flow AGGD parameter features:
4.3.1 Human body behavior recognition algorithm:
From the behavior recognition rates of the AGGD parameter features of the two kinds of features on the two databases above, the recognition rates on the Weizmann database differ little between the two feature types, whereas on the KTH database the recognition rate of the optical flow AGGD parameter features is much higher than that of the gradient AGGD parameter features. Based on this, this patent proposes concatenating the AGGD parameter features of the two kinds of features in order into a fused AGGD parameter feature for behavior recognition. The interpretation of the fused gradient and optical flow AGGD parameter features is given in Table 4.3.
Table 4.3 Fused gradient and optical flow AGGD parameter feature interpretation table
Consistent with the above behavior recognition method, the features listed in Table 4.3 are extracted from the behavior videos of the training set and of the test set respectively (as shown in Fig. 14); behavior recognition is then performed by computing the Mahalanobis distance between the feature matrices of the training set behavior videos and of the test set behavior videos, and finally the nearest-neighbour classifier is used, as shown in Fig. 15.
4.3.2 Weizmann database experiment:
As can be seen from Fig. 16, the behaviors with the lowest recognition rates are jump and wave1, with recognition rates of 0.856 and 0.844 respectively. The jump behavior is misclassified as the walk behavior with a probability as high as 0.084; the reason is that the jump behavior is similar to part of the footwork of the walk behavior, so the AGGD parameters of the fused feature data differ little and cannot be accurately distinguished. The wave1 behavior is mainly misclassified as the wave2 and jack behaviors, with error rates of 0.084 and 0.072 respectively; the reason is that these three behaviors all contain hand-swinging actions, only the swing amplitude and speed differing, so the AGGD parameter values of the fused feature data of the wave1 behavior and of the wave2 and jack behaviors have a certain similarity, which leads to misjudgment.
4.3.3 KTH database experiment:
Testing with the same experimental grouping technique as above, the average behavior recognition rate on the KTH database is 95.2%. Fig. 17 is the confusion matrix of the behavior recognition rates on the KTH database; the recognition rate of the jog behavior is lower than that of the other behaviors, and its probability of being misclassified as the run behavior is as high as 0.088. The main reason may be that both jog and run are running behaviors, jog being a slow run and run a fast one; the two differ only in speed, so the fused feature parameter values of the two behaviors are rather similar and cannot be accurately distinguished.
4.3.4 Behavior recognition rates of the different AGGD parameter features on the two databases:
The following table gives the recognition rates of the three kinds of AGGD parameter features on the Weizmann and KTH databases. It can be observed from the table that on the Weizmann database the recognition rate of the optical flow AGGD parameter features is the lowest at 90.16%, and the recognition rate of the fused AGGD parameter features is the highest at 93.16%; on the KTH database the recognition rate of the gradient AGGD parameter features is the lowest at 88.40%, and the recognition rate of the fused AGGD parameter features is the highest at 95.20%.
As can be seen from the behavior recognition rates of the different parameters on the two databases, the recognition rate of the fused optical flow and gradient parameters is higher than the recognition rates based on the gradient or the optical flow parameters alone.
Table 4.4 Behavior recognition rates of the different AGGD parameter features on the two databases
Finally, it should also be noted that what is enumerated above is only a specific embodiment of the present invention. Obviously, the invention is not limited to the above embodiment and many variations are possible. All variations that a person of ordinary skill in the art can directly derive or associate from the disclosure of the present invention should be considered within the protection scope of the present invention.

Claims (4)

1., based on a Human bodys' response method for asymmetric generalized gaussian model, realized by training video storehouse and test video; It is characterized in that: comprise the following steps:
Step one, carries out point of interest detection respectively for given training video storehouse and test video;
Step 2, extracts video block centered by point of interest;
Step 3, respectively the video block information of calculation training video and test video, and obtain respective X, Y, Z tri-direction gradient feature data and Optical-flow Feature u, v two component data;
Step 4, draws gradient three direction histogram and light stream two direction histogram respectively to above-mentioned data;
Step 5, carrys out the corresponding histogram of matching with asymmetric generalized gaussian model;
Step 6, extracts h evthe parameter of asymmetric generalized gaussian model forms the eigenmatrix of each behavior of training video and the eigenmatrix of test video as feature;
Step 7, calculates the mahalanobis distance between test video eigenmatrix and each behavioural characteristic matrix of training video;
Step 8, carries out Activity recognition according to nearest neighbouring rule.
2. The human body behavior recognition method based on an asymmetric generalized Gaussian model according to claim 1, characterized in that in said steps the interest point detection is as follows:
the video is regarded as an image sequence f(x, y, t) composed of multiple frames;
a linear scale-space representation L of f is obtained by convolving the image sequence f with a Gaussian kernel that is separable in the spatial variable (scale σ_l²) and the temporal variable (scale τ_l²):
L(·; σ_l², τ_l²) = g(·; σ_l², τ_l²) * f(·)    (9)
the Gaussian window in the spatio-temporal domain is defined as:
g(x, y, t; σ_l², τ_l²) = 1/√((2π)³ σ_l⁴ τ_l²) · exp(−(x² + y²)/(2σ_l²) − t²/(2τ_l²))    (10)
in the formula, σ_l is the spatial scale variable, τ_l is the temporal scale variable and t is the time dimension;
the response function R is defined as:
R(x, y, t) = (I * g * h_ev)² + (I * g * h_od)²    (11)
in the formula, * is the convolution operator, I is the video image, g is the two-dimensional Gaussian smoothing kernel, and h_ev and h_od are an orthogonal (quadrature) pair of one-dimensional Gabor filters applied along the temporal axis;
h_ev and h_od are defined as:
h_ev(t; τ, ω) = −cos(2πtω) · e^(−t²/τ²)    (12)
h_od(t; τ, ω) = −sin(2πtω) · e^(−t²/τ²)    (13)
in formulas (12) and (13), σ and τ are the detection scales of the spatial and temporal domains respectively; σ = 2 and τ = 3 are taken, and the Gaussian smoothing filter scale is 2;
the neighbourhood of each local maximum of the response function R contains the local human motion information in I(x, y, t).
3. The human body behavior recognition method based on an asymmetric generalized Gaussian model according to claim 2, characterized in that in said steps the two optical flow components u, v are obtained as follows:
extraction of spatio-temporal feature points:
after interest point detection is performed on the image sequence, the spatio-temporal interest points are obtained; a spatio-temporal cuboid is defined centred on each spatio-temporal interest point, and the pixels of this cuboid are extracted to construct the spatio-temporal features;
let the spatio-temporal cuboid be I(x, y, t); then its gradients G_x, G_y, G_z along the X, Y and Z axes can be defined respectively as:
G_x(x, y, t) = L(x+1, y, t) − L(x−1, y, t)    (14)
G_y(x, y, t) = L(x, y+1, t) − L(x, y−1, t)    (15)
G_z(x, y, t) = L(x, y, t+1) − L(x, y, t−1)    (16)
in the formulas, L(x+1, y, t) is the value of the sequence after Gaussian convolution filtering at (x+1, y, t);
extraction of optical flow features:
the Lucas-Kanade method is adopted to compute the optical flow:
at time t a pixel (x, y) is at position 1, its grey value being I(x, y, t); at time (t + Δt) the pixel has moved to position 2, its position change being (Δx, Δy) and its new grey value I(x + Δx, y + Δy, t + Δt); according to the image brightness constancy assumption, dI(x, y, t)/dt = 0, so:
I(x, y, t) = I(x + Δx, y + Δy, t + Δt)    (17)
let u and v be the components of the optical flow vector of pixel (x, y) along the x and y directions respectively; the Taylor expansion of formula (17) is:
I(x + Δx, y + Δy, t + Δt) = I(x, y, t) + (∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt + ε    (18)
ignoring the second- and higher-order terms ε, we have:
(∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt = 0    (19)
since Δt → 0,
(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0
that is: I_x u + I_y v + I_t = 0    (20)
in formula (20), I_x, I_y, I_t are the partial derivatives of pixel (x, y) along the x, y and t directions; this can be expressed in the vector form:
∇I · U + I_t = 0    (21)
in formula (21), ∇I is the gradient direction and U = (u, v)^T represents the optical flow;
assuming that the optical flow is constant within a window of specified size, the optical flow constraint equations within this window can be solved to obtain the optical flow (u, v) of a feature window of size x × x, that is:
[I_x1 I_y1; I_x2 I_y2; … ; I_xi I_yi] [u; v] = −[I_t1; I_t2; … ; I_ti]    (22)
in formula (22), i is the number of pixels in the feature window, i = x × x; I_x and I_y are the spatial gradients of the image and I_t is the temporal gradient;
solving formula (22) gives:
[u; v] = [ΣI_xi²  ΣI_xi I_yi; ΣI_xi I_yi  ΣI_yi²]⁻¹ · [−ΣI_xi I_ti; −ΣI_yi I_ti]    (23)
4. The human body behavior recognition method based on an asymmetric generalized Gaussian model according to claim 3, characterized in that in said steps the parameter feature extraction based on the asymmetric generalized Gaussian model is as follows:
the expression of the asymmetric generalized Gaussian model is:
f(x; ν, σ_l², σ_r²) = ν / ((β_l + β_r) Γ(1/ν)) · exp(−(−(x − u)/β_l)^ν),  x < 0
f(x; ν, σ_l², σ_r²) = ν / ((β_l + β_r) Γ(1/ν)) · exp(−((x − u)/β_r)^ν),  x ≥ 0    (24)
in formula (24), β_l = σ_l √(Γ(1/ν)/Γ(3/ν)) and β_r = σ_r √(Γ(1/ν)/Γ(3/ν)), where Γ(·) is the gamma function with expression:
Γ(α) = ∫₀^∞ t^(α−1) e^(−t) dt,  α > 0    (25)
after the asymmetric generalized Gaussian model is fitted to the feature data, its five parameters (α, β_l, β_r, ν, u) are extracted and taken as the features.
CN201510313321.1A 2015-06-09 2015-06-09 Human body behavior recognition method based on asymmetric generalized Gaussian model Expired - Fee Related CN105046195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510313321.1A CN105046195B (en) 2015-06-09 2015-06-09 Human body behavior recognition method based on asymmetric generalized Gaussian model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510313321.1A CN105046195B (en) 2015-06-09 2015-06-09 Human body behavior recognition method based on asymmetric generalized Gaussian model

Publications (2)

Publication Number Publication Date
CN105046195A true CN105046195A (en) 2015-11-11
CN105046195B CN105046195B (en) 2018-11-02

Family

ID=54452724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510313321.1A Expired - Fee Related CN105046195B (en) 2015-06-09 2015-06-09 Human bodys' response method based on asymmetric generalized gaussian model

Country Status (1)

Country Link
CN (1) CN105046195B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956606A (en) * 2016-04-22 2016-09-21 中山大学 Method for re-identifying pedestrians on the basis of asymmetric transformation
CN106250870A (en) * 2016-08-16 2016-12-21 电子科技大学 A kind of pedestrian's recognition methods again combining local and overall situation similarity measurement study
CN106485253A (en) * 2016-09-14 2017-03-08 同济大学 A kind of pedestrian of maximum particle size structured descriptor discrimination method again
CN106778595A (en) * 2016-12-12 2017-05-31 河北工业大学 The detection method of abnormal behaviour in crowd based on gauss hybrid models
CN106815600A (en) * 2016-12-27 2017-06-09 浙江工业大学 For the depth co-ordinative construction and structural chemistry learning method of human behavior identification
CN107403182A (en) * 2017-05-26 2017-11-28 深圳大学 The detection method and device of space-time interest points based on 3D SIFT frameworks
CN108230375A (en) * 2017-12-27 2018-06-29 南京理工大学 Visible images and SAR image registration method based on structural similarity fast robust
CN108241849A (en) * 2017-08-28 2018-07-03 北方工业大学 Human body interactive action recognition methods based on video
CN110135352A (en) * 2019-05-16 2019-08-16 南京砺剑光电技术研究院有限公司 A kind of tactical operation appraisal procedure based on deep learning
CN110837770A (en) * 2019-08-30 2020-02-25 深圳大学 Video behavior self-adaptive segmentation method and device based on multiple Gaussian models
CN111507275A (en) * 2020-04-20 2020-08-07 北京理工大学 Video data time sequence information extraction method and device based on deep learning
CN112926514A (en) * 2021-03-26 2021-06-08 哈尔滨工业大学(威海) Multi-target detection and tracking method, system, storage medium and application
CN114299602A (en) * 2021-11-09 2022-04-08 北京九州安华信息安全技术有限公司 Micro-amplitude motion image processing method
CN114399539A (en) * 2022-01-14 2022-04-26 合肥英睿系统技术有限公司 Method, apparatus and storage medium for detecting moving object
CN114614797A (en) * 2022-05-12 2022-06-10 之江实验室 Adaptive filtering method and system based on generalized maximum asymmetric correlation entropy criterion

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719216A (en) * 2009-12-21 2010-06-02 西安电子科技大学 Movement human abnormal behavior identification method based on template matching
CN104036287A (en) * 2014-05-16 2014-09-10 同济大学 Human movement significant trajectory-based video classification method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719216A (en) * 2009-12-21 2010-06-02 西安电子科技大学 Movement human abnormal behavior identification method based on template matching
CN104036287A (en) * 2014-05-16 2014-09-10 同济大学 Human movement significant trajectory-based video classification method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG Feiyan et al.: "Human behavior recognition based on weighted optical-flow velocity components", Journal of Zhejiang Sci-Tech University *
LI Junfeng et al.: "Human behavior recognition based on direction-weighted local spatio-temporal features", Journal of Image and Graphics *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956606A (en) * 2016-04-22 2016-09-21 中山大学 Method for re-identifying pedestrians on the basis of asymmetric transformation
CN105956606B (en) * 2016-04-22 2019-09-10 中山大学 A kind of pedestrian's identification method again based on asymmetry transformation
CN106250870A (en) * 2016-08-16 2016-12-21 电子科技大学 A kind of pedestrian's recognition methods again combining local and overall situation similarity measurement study
CN106250870B (en) * 2016-08-16 2019-10-08 电子科技大学 A kind of pedestrian's recognition methods again of joint part and global similarity measurement study
CN106485253A (en) * 2016-09-14 2017-03-08 同济大学 A kind of pedestrian of maximum particle size structured descriptor discrimination method again
CN106485253B (en) * 2016-09-14 2019-05-14 同济大学 A kind of pedestrian of maximum particle size structured descriptor discrimination method again
CN106778595A (en) * 2016-12-12 2017-05-31 河北工业大学 The detection method of abnormal behaviour in crowd based on gauss hybrid models
CN106778595B (en) * 2016-12-12 2020-04-07 河北工业大学 Method for detecting abnormal behaviors in crowd based on Gaussian mixture model
CN106815600B (en) * 2016-12-27 2019-07-30 浙江工业大学 Depth co-ordinative construction and structural chemistry learning method for human behavior identification
CN106815600A (en) * 2016-12-27 2017-06-09 浙江工业大学 For the depth co-ordinative construction and structural chemistry learning method of human behavior identification
CN107403182A (en) * 2017-05-26 2017-11-28 深圳大学 The detection method and device of space-time interest points based on 3D SIFT frameworks
CN108241849A (en) * 2017-08-28 2018-07-03 北方工业大学 Human body interactive action recognition methods based on video
CN108241849B (en) * 2017-08-28 2021-09-07 北方工业大学 Human body interaction action recognition method based on video
CN108230375A (en) * 2017-12-27 2018-06-29 南京理工大学 Visible images and SAR image registration method based on structural similarity fast robust
CN108230375B (en) * 2017-12-27 2022-03-22 南京理工大学 Registration method of visible light image and SAR image based on structural similarity rapid robustness
CN110135352A (en) * 2019-05-16 2019-08-16 南京砺剑光电技术研究院有限公司 A kind of tactical operation appraisal procedure based on deep learning
CN110135352B (en) * 2019-05-16 2023-05-12 南京砺剑光电技术研究院有限公司 Tactical action evaluation method based on deep learning
CN110837770A (en) * 2019-08-30 2020-02-25 深圳大学 Video behavior self-adaptive segmentation method and device based on multiple Gaussian models
CN110837770B (en) * 2019-08-30 2022-11-04 深圳大学 Video behavior self-adaptive segmentation method and device based on multiple Gaussian models
CN111507275A (en) * 2020-04-20 2020-08-07 北京理工大学 Video data time sequence information extraction method and device based on deep learning
CN111507275B (en) * 2020-04-20 2023-10-10 北京理工大学 Video data time sequence information extraction method and device based on deep learning
CN112926514A (en) * 2021-03-26 2021-06-08 哈尔滨工业大学(威海) Multi-target detection and tracking method, system, storage medium and application
CN114299602A (en) * 2021-11-09 2022-04-08 北京九州安华信息安全技术有限公司 Micro-amplitude motion image processing method
CN114399539A (en) * 2022-01-14 2022-04-26 合肥英睿系统技术有限公司 Method, apparatus and storage medium for detecting moving object
CN114614797A (en) * 2022-05-12 2022-06-10 之江实验室 Adaptive filtering method and system based on generalized maximum asymmetric correlation entropy criterion
CN114614797B (en) * 2022-05-12 2022-09-30 之江实验室 Adaptive filtering method and system based on generalized maximum asymmetric correlation entropy criterion

Also Published As

Publication number Publication date
CN105046195B (en) 2018-11-02

Similar Documents

Publication Publication Date Title
CN105046195A (en) Human behavior identification method based on asymmetric generalized Gaussian distribution model (AGGD)
Jia et al. Visual tracking via adaptive structural local sparse appearance model
CN104616316B (en) Personage&#39;s Activity recognition method based on threshold matrix and Fusion Features vision word
CN104978561A (en) Gradient and light stream characteristics-fused video motion behavior identification method
CN105719285A (en) Pedestrian detection method based on directional chamfering distance characteristics
CN101388080B (en) Passerby gender classification method based on multi-angle information fusion
CN103279768B (en) A kind of video face identification method based on incremental learning face piecemeal visual characteristic
Xiao et al. Multimodal fusion based on LSTM and a couple conditional hidden Markov model for Chinese sign language recognition
CN105956560A (en) Vehicle model identification method based on pooling multi-scale depth convolution characteristics
CN101894276A (en) Training method of human action recognition and recognition method
CN103065158B (en) The behavior recognition methods of the ISA model based on relative gradient
CN106295532B (en) A kind of human motion recognition method in video image
CN103605986A (en) Human motion recognition method based on local features
Wang et al. Improving human action recognition by non-action classification
CN105138983B (en) The pedestrian detection method divided based on weighting block model and selective search
CN105825233B (en) A kind of pedestrian detection method based on on-line study random fern classifier
CN105654139A (en) Real-time online multi-target tracking method adopting temporal dynamic appearance model
CN111046732A (en) Pedestrian re-identification method based on multi-granularity semantic analysis and storage medium
Naghavi et al. Integrated real-time object detection for self-driving vehicles
CN106204651A (en) A kind of method for tracking target based on the judgement improved with generation conjunctive model
CN113850221A (en) Attitude tracking method based on key point screening
CN105809713A (en) Object tracing method based on online Fisher discrimination mechanism to enhance characteristic selection
CN103020614A (en) Human movement identification method based on spatio-temporal interest point detection
Zhang et al. Trajectory series analysis based event rule induction for visual surveillance
CN104978569A (en) Sparse representation based incremental face recognition method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20181102

Termination date: 20190609

CF01 Termination of patent right due to non-payment of annual fee