CN100570640C - A method for representing human motion - Google Patents

A method for representing human motion

Info

Publication number
CN100570640C
CN100570640C · CNB2007101753713A · CN200710175371A
Authority
CN
China
Prior art keywords
motion
people
model
dimensional
linear
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2007101753713A
Other languages
Chinese (zh)
Other versions
CN101123004A (en)
Inventor
陈峰
杜友田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CNB2007101753713A priority Critical patent/CN100570640C/en
Publication of CN101123004A publication Critical patent/CN101123004A/en
Application granted granted Critical
Publication of CN100570640C publication Critical patent/CN100570640C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for representing human motion: human motion is mapped onto a low-dimensional embedded space by nonlinear dimensionality reduction, and the dimension-reduced data are modeled with a linear time-series model. The mapping of human motion onto a low-dimensional embedded configuration space by nonlinear dimensionality reduction is embodied as z = (x_1 + jy_1, x_2 + jy_2, …, x_M + jy_M)^T. The modeling of the dimension-reduced data with a linear time-series model is implemented as motion modeling based on a linear time-series model: an order-p autoregressive model AR(p) is established with the following parameters: coefficient matrices A_k ∈ R^(m×m), a parameter v introduced to allow a non-zero mean of the dynamic process, and the covariance matrix Q of the Gaussian white noise; if A_k is assumed diagonal, the components of z(t) are mutually independent. Given two AR models A = [v, A_1, A_2, …, A_p] and A' = [v', A_1', A_2', …, A_p'], the distance metric is D(A, A') = ||A − A'||_F, where ||·||_F denotes the Frobenius norm of a matrix.

Description

A method for representing human motion
Technical field
The present invention relates to a method for representing human motion, and belongs to the fields of computer vision and intelligent video content analysis.
Background art
In the fields of computer vision and intelligent video content analysis, human motion analysis has become an extremely important, cutting-edge research topic [1-7]. Within human motion analysis, detecting and tracking people belong to low-level vision processing, while representing and understanding human motion belong to high-level processing. The representation and understanding of human motion play a crucial role in applications such as intelligent video surveillance.
Over the past decade or so, many methods for representing human motion have emerged [1-6]. Most of this work represents human motion by static information extracted from each frame or motion information extracted from consecutive frames. Efros et al. [1] use optical flow to express motion information and analyze motion at a distance. Arie et al. [2] model the human body with 3D cylinder models and, in every frame, extract the angles between the body parts and the horizontal axis together with their angular velocities. These representations reflect only the spatial characteristics of motion within each frame, do not capture the dynamic characteristics of motion over a period of time well, and require a large amount of data.
In recent years, a small amount of work has used linear time-series models to study the dynamic characteristics of motion. Liu and Ahuja [4] proposed a dynamic shape model to represent the variation of a target over time: the target contour is represented with Fourier descriptors (FD), and an autoregressive (AR) model is then used to describe the dynamics of the target's motion. Veeraraghavan et al. [5] and Jin and Mokhtarian [6] use Kendall shape theory to represent the human silhouette and then apply AR and autoregressive moving-average (ARMA) models to analyze human gait and actions. The shortcoming of these methods is that the data dimension is very high before the linear time-series model is applied, so more training data are needed for modeling and the parameter estimates are not accurate enough.
Summary of the invention
The main purpose of the present invention is to provide a method for representing human motion so as to solve the problems that currently exist in the fields of computer vision and intelligent video content analysis: the data dimension is very high before the linear time-series model analysis is applied, more training data are needed for modeling, and the parameter estimates are not accurate enough.
In human motion, each posture is represented by a high-dimensional vector, and the distribution of these vectors in feature space is severely nonlinear. For data that are nonlinearly distributed in a high-dimensional space, nonlinear dimensionality reduction methods are essential for learning the relations that exist within the data. This patent proposes a compact representation of human motion. We first apply a nonlinear dimensionality reduction method, the locally linear embedding (LLE) algorithm [8], to map the high-dimensional feature space onto a low-dimensional embedded space, which helps to discover the intrinsic structure of human motion. Motion is then analyzed in the low-dimensional space with the linear time-series models AR and ARMA.
The present invention is realized by the following technical means:
A method for representing human motion: human motion is mapped onto a low-dimensional embedded space by nonlinear dimensionality reduction, and the dimension-reduced data are modeled with a linear time-series model.
The mapping of human motion onto a low-dimensional embedded configuration space by nonlinear dimensionality reduction is realized by the following steps:
Representation of the configuration space, comprising the following steps:
A person's silhouette is described with M marker points P = {p_1, p_2, …, p_M}; each contour can then be represented by a complex vector z:
z = (x_1 + jy_1, x_2 + jy_2, …, x_M + jy_M)^T
where x_i and y_i are the abscissa and ordinate of the i-th marker point p_i. Since the representation of the person's posture must be invariant to position and to isotropic scale changes, the vector z is normalized to z'; the real and imaginary parts of z' constitute the contour.
The locally linear embedding (LLE) algorithm is used to discover the intrinsic structure hidden in the motion features in the high-dimensional feature space. Given an input sample set Z, the corresponding point set Y in the low-dimensional embedded space is obtained by the LLE algorithm.
The dimension e of the embedded space is determined. Modeling the dimension-reduced data with a linear time-series model is realized by the following steps: motion modeling based on a linear time-series model. An order-p autoregressive model AR(p) is established with the following parameters: coefficient matrices A_k ∈ R^(m×m), a parameter v introduced to allow a non-zero mean of the dynamic process, and the covariance matrix Q of the Gaussian white noise. If A_k is assumed diagonal, the components of z(t) are mutually independent. Given two AR models A = [v, A_1, A_2, …, A_p] and A' = [v', A_1', A_2', …, A_p'], the distance metric is:
D(A, A') = ||A − A'||_F, where ||·||_F denotes the Frobenius norm of a matrix.
The above motion modeling based on a linear time-series model can also be realized by the following steps:
An autoregressive moving-average (ARMA) model is established:
Given an observation sequence z(1), z(2), …, z(τ), the model parameters corresponding to the finite sample sequence are obtained by maximum-likelihood estimation;
The distance between ARMA models can be measured with subspace angles; the subspace angles between two ARMA models are θ_i (i = 1, 2, …, n); three distances are commonly used: the Martin distance (d_M), the gap distance (d_g), and the Frobenius distance (d_F).
Compared with the prior art, the present invention has the following remarkable advantages and beneficial effects:
The present invention both reduces the dimension of the data used to represent human motion and expresses the dynamic characteristics of the motion, which is of great significance for the classification and understanding of motion.
Embodiment
Specific embodiments of the invention are described below:
Step 1: Representation of the configuration space. The first part is the silhouette-based posture representation.
The silhouette is a good way to represent a person's posture: it is insensitive to variations in surface appearance such as the color and texture of clothing. A person's silhouette is described with M marker points P = {p_1, p_2, …, p_M}, so that each contour can be represented by a complex vector z: z = (x_1 + jy_1, x_2 + jy_2, …, x_M + jy_M)^T, where x_i and y_i are the abscissa and ordinate of the i-th marker point p_i. Since the representation of the person's posture must be invariant to position and to isotropic scale changes, z is normalized to z'. The contour is then represented by the real and imaginary parts of the complex vector z'.
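As an illustration of this step, the following sketch (Python with NumPy, an assumed dependency; the patent does not prescribe any implementation) builds the complex contour vector from M marker points and normalizes it for position and isotropic scale by subtracting the centroid and dividing by the vector norm; this particular normalization is one common choice, not one spelled out in the patent.

import numpy as np

def contour_to_complex(points):
    # points: (M, 2) array of (x_i, y_i) marker points on the silhouette.
    # Returns the complex contour vector z = (x_1 + j*y_1, ..., x_M + j*y_M)^T.
    points = np.asarray(points, dtype=float)
    return points[:, 0] + 1j * points[:, 1]

def normalize_contour(z):
    # Normalize z to z': subtract the centroid (position invariance) and divide by
    # the Euclidean norm (isotropic-scale invariance). One common choice; the patent
    # only states the required invariances.
    z_centered = z - z.mean()
    return z_centered / np.linalg.norm(z_centered)

# Illustrative usage with a synthetic contour of M = 100 marker points.
t = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
points = np.stack([50.0 + 10.0 * np.cos(t), 80.0 + 10.0 * np.sin(t)], axis=1)
z_prime = normalize_contour(contour_to_complex(points))
feature = np.concatenate([z_prime.real, z_prime.imag])  # real and imaginary parts form the frame feature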
The nonlinear mapping of the contour is described below:
Locally linear embedding (LLE) is a good nonlinear dimensionality reduction method: it keeps the neighborhood relations of the data unchanged before and after dimensionality reduction. In human motion, the posture in a given frame is usually closely related to the postures in the few adjacent frames and only loosely related to other postures, so the LLE algorithm is well suited to analyzing human motion.
This patent uses the LLE algorithm to discover the intrinsic structure hidden in the motion features in the high-dimensional feature space and to learn their manifold in the low-dimensional embedded space. Given an input sample set Z, the corresponding point set Y in the low-dimensional embedded space is learned by the LLE algorithm.
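A minimal sketch of this step, under stated assumptions: the per-frame contour features from Step 1 are stacked row-wise into the sample matrix Z, scikit-learn's LocallyLinearEmbedding is used purely as a stand-in for the LLE algorithm of reference [8], and the neighborhood size n_neighbors = 12 is an illustrative value rather than one specified by the patent.

import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

def embed_motion(Z, e, n_neighbors=12):
    # Z: (n_frames, 2*M) array, each row the real/imaginary contour feature of one frame.
    # Returns Y: (n_frames, e) coordinates in the low-dimensional embedded space.
    lle = LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=e)
    return lle.fit_transform(Z)

# Illustrative usage: embed a sequence of normalized contour features into e = 3 dimensions.
# Z = np.stack([np.concatenate([z.real, z.imag]) for z in normalized_contours])
# Y = embed_motion(Z, e=3)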
After the LLE result has been obtained on the training set, a mapping from the high-dimensional space to the low-dimensional space still has to be learned for new samples. Several methods currently exist to address this problem.
The LLE algorithm has an important parameter that must be determined: the dimension e of the embedded space. Many methods have been proposed for this; this patent uses an eigenvalue-analysis method.
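The patent does not detail its eigenvalue analysis, so the helper below is only an assumed, common variant: examine the eigenvalue spectrum of the data covariance and take e as the smallest number of leading eigenvalues whose cumulative sum reaches a chosen energy threshold.

import numpy as np

def estimate_dimension(Z, energy=0.95):
    # One possible eigenvalue analysis (an assumption, not the patent's exact method):
    # choose the smallest e whose leading covariance eigenvalues carry at least
    # `energy` of the total variance.
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]  # descending order
    ratio = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(ratio, energy) + 1)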
Step 2: Motion modeling based on a linear time-series model.
How a person's posture changes over time is very important. If we can learn a model that accurately reflects the variation of a person's posture, we obtain a good motion representation. Because human motion is complex and time-varying, it is difficult to build an exact, unified analytical model for it. This patent therefore approximates the motion with linear time-series models, whose parameters reflect the dynamic characteristics of human motion well. The following two kinds of models are used.
A) Autoregressive (AR) model
An order-p autoregressive model AR(p), z(t) = v + A_1 z(t−1) + … + A_p z(t−p) + ε(t), has the following parameters: the coefficient matrices A_k ∈ R^(m×m), a parameter v introduced to allow a non-zero mean of the dynamic process, and the covariance matrix Q of the Gaussian white noise ε(t). To reduce complexity, we assume that A_k is a diagonal matrix, i.e. that the components of z(t) are mutually independent.
This patent uses the algorithm of Neumaier and Schneider to learn the AR model; this algorithm guarantees the uniqueness of the parameter estimates. Given two AR models A = [v, A_1, A_2, …, A_p] and A' = [v', A_1', A_2', …, A_p'], this patent uses the following distance metric:
D(A, A') = ||A − A'||_F, where ||·||_F denotes the Frobenius norm of a matrix.
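A sketch of this modeling step under stated assumptions: the patent fits AR(p) with the Neumaier–Schneider estimator, whereas the helper below uses plain ordinary least squares per component (with diagonal A_k, each component of the embedded trajectory is an independent scalar AR(p)); the order p = 2 in the usage comment is illustrative only.

import numpy as np

def fit_diagonal_ar(Y, p):
    # Y: (T, e) embedded trajectory. Fits z(t) = v + sum_k A_k z(t-k) + noise with
    # diagonal A_k, so each component is an independent scalar AR(p).
    # Returns the parameters as an (e, p + 1) array: column 0 holds v and column k
    # holds the diagonal of A_k. Ordinary least squares is a simple stand-in for the
    # Neumaier-Schneider estimator cited in the patent.
    T, e = Y.shape
    params = np.zeros((e, p + 1))
    for i in range(e):
        y = Y[p:, i]
        X = np.column_stack([np.ones(T - p)] + [Y[p - k:T - k, i] for k in range(1, p + 1)])
        params[i], *_ = np.linalg.lstsq(X, y, rcond=None)
    return params

def ar_distance(A, A_prime):
    # D(A, A') = ||A - A'||_F; with diagonal A_k this equals the Frobenius norm of
    # the stacked parameter difference [v, A_1, ..., A_p].
    return np.linalg.norm(A - A_prime)

# Illustrative usage: compare two motion sequences Y1, Y2 already embedded by LLE.
# D = ar_distance(fit_diagonal_ar(Y1, p=2), fit_diagonal_ar(Y2, p=2))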
B) Autoregressive moving-average (ARMA) model
The ARMA model is similar to the AR model. Given an observation sequence z(1), z(2), …, z(τ), the model parameters corresponding to this finite sample sequence are obtained by maximum-likelihood estimation.
The distance between ARMA models can be measured with subspace angles. Let the subspace angles between two ARMA models be θ_i (i = 1, 2, …, n). Three distances are commonly used: the Martin distance (d_M), the gap distance (d_g), and the Frobenius distance (d_F).
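The following is only a rough sketch of this alternative, under explicit assumptions: it fits a state-space ARMA model by the common SVD-based estimator rather than by the maximum-likelihood procedure the patent mentions, approximates the subspace angles by principal angles between finite-horizon extended observability matrices, and shows only the Martin distance d_M² = −ln ∏ cos²θ_i; the state dimension n and the horizon length are illustrative values, and NumPy/SciPy are assumed dependencies.

import numpy as np
from scipy.linalg import subspace_angles

def fit_state_space(Y, n):
    # Y: (T, e) embedded trajectory. Fits y(t) = C x(t), x(t+1) = A x(t) + noise
    # with state dimension n via the usual SVD-based (suboptimal) estimator,
    # a stand-in for the maximum-likelihood fit mentioned in the patent.
    U, s, Vt = np.linalg.svd(Y.T, full_matrices=False)
    C = U[:, :n]                                # observation matrix, (e, n)
    X = np.diag(s[:n]) @ Vt[:n]                 # state sequence, (n, T)
    W, *_ = np.linalg.lstsq(X[:, :-1].T, X[:, 1:].T, rcond=None)
    return W.T, C                               # A = W.T is the (n, n) state-transition matrix

def extended_observability(A, C, horizon=20):
    # Stack [C; CA; CA^2; ...] up to a finite horizon, approximating the infinite
    # observability matrix used in exact subspace-angle computations.
    blocks, M = [], C
    for _ in range(horizon):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def martin_distance(model1, model2, horizon=20):
    # Returns d_M^2 = -ln prod cos^2(theta_i) over the principal (subspace) angles.
    O1 = extended_observability(*model1, horizon)
    O2 = extended_observability(*model2, horizon)
    theta = subspace_angles(O1, O2)
    return -2.0 * np.sum(np.log(np.cos(theta)))

# Illustrative usage: m1 = fit_state_space(Y1, n=3); m2 = fit_state_space(Y2, n=3)
# d_squared = martin_distance(m1, m2)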
Finally, it should be noted that the above embodiments are intended only to illustrate, and not to limit, the technical solution described in the invention. Although this specification describes the present invention in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the invention may still be modified or equivalently replaced; any technical solution and improvement that does not depart from the spirit and scope of the invention should be covered by the scope of the claims of the present invention.

Claims (2)

1. A method for representing human motion, characterized in that:
human motion is mapped onto a low-dimensional embedded configuration space by nonlinear dimensionality reduction;
the dimension-reduced data are modeled with a linear time-series model;
wherein the mapping of human motion onto a low-dimensional embedded configuration space by nonlinear dimensionality reduction is realized by the following two steps:
the 1st step is the silhouette-based posture representation, comprising the following two sub-steps:
first, a person's silhouette is described with M marker points P = {p_1, p_2, …, p_M}, so that each contour can be represented by a complex vector z:
z = (x_1 + jy_1, x_2 + jy_2, …, x_M + jy_M)^T
where x_i and y_i are the abscissa and ordinate of the i-th marker point p_i;
second, the vector z is normalized to a complex vector z' so that the representation of the person's posture is invariant to position and to isotropic scale changes; the real and imaginary parts of the complex vector z' constitute the contour;
the 2nd step is the nonlinear dimensionality reduction: the locally linear embedding (LLE) algorithm is used to discover the intrinsic structure hidden in the motion features in the high-dimensional feature space; given an input sample set Z, the corresponding point set Y in the low-dimensional embedded configuration space is obtained by the LLE algorithm; the dimension e of the low-dimensional embedded configuration space is determined;
wherein the modeling of the dimension-reduced data with a linear time-series model approximates the motion with a linear time-series model and is realized by the following steps:
an order-p autoregressive model is established, wherein the order-p autoregressive model AR(p) has the following parameters:
coefficient matrices A_k ∈ R^(m×m), a parameter v introduced to allow a non-zero mean of the dynamic process, and the covariance matrix Q of the Gaussian white noise;
if A_k is a diagonal matrix, the components of the observation sequence z(t) are mutually independent;
given two AR models A = [v, A_1, A_2, …, A_p] and A' = [v', A_1', A_2', …, A_p'], their distance is measured as:
D(A, A') = ||A − A'||_F, where ||·||_F denotes the Frobenius norm of a matrix; this distance measures the similarity between two motions and thereby determines the representation of the person's motion.
2. The method for representing human motion according to claim 1, characterized in that the approximation of the motion with a linear time-series model can also be realized by the following steps:
an autoregressive moving-average (ARMA) model is established:
given an observation sequence z(1), z(2), …, z(τ), the model parameters corresponding to the finite sample sequence are obtained by maximum-likelihood estimation;
the distance between ARMA models can be measured with subspace angles; the subspace angles between two ARMA models are θ_i (i = 1, 2, …, n); three distances are commonly used: the Martin distance (d_M), the gap distance (d_g), and the Frobenius distance (d_F).
CNB2007101753713A 2007-09-29 2007-09-29 A method for representing human motion Expired - Fee Related CN100570640C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007101753713A CN100570640C (en) 2007-09-29 2007-09-29 A method for representing human motion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2007101753713A CN100570640C (en) 2007-09-29 2007-09-29 A method for representing human motion

Publications (2)

Publication Number Publication Date
CN101123004A CN101123004A (en) 2008-02-13
CN100570640C true CN100570640C (en) 2009-12-16

Family

ID=39085329

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007101753713A Expired - Fee Related CN100570640C (en) 2007-09-29 2007-09-29 A method for representing human motion

Country Status (1)

Country Link
CN (1) CN100570640C (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598733A (en) * 2019-08-05 2019-12-20 南京智谷人工智能研究院有限公司 Multi-label distance measurement learning method based on interactive modeling

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A survey of vision-based human motion recognition. Du Youtian, Chen Feng, Xu Wenli, Li Yongbin. Acta Electronica Sinica, Vol. 35, No. 1, 2007 *
An overview of manifold learning. Xu Rong, Jiang Feng, Yao Hongxun. CAAI Transactions on Intelligent Systems, Vol. 1, No. 1, 2006 *

Also Published As

Publication number Publication date
CN101123004A (en) 2008-02-13

Similar Documents

Publication Publication Date Title
CN104182718B (en) A kind of man face characteristic point positioning method and device
Kishore et al. Optical flow hand tracking and active contour hand shape features for continuous sign language recognition with artificial neural networks
CN106778628A (en) A kind of facial expression method for catching based on TOF depth cameras
CN107229920B (en) Behavior identification method based on integration depth typical time warping and related correction
CN114463677B (en) Safety helmet wearing detection method based on global attention
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN104091350B (en) A kind of object tracking methods of utilization motion blur information
CN104821010A (en) Binocular-vision-based real-time extraction method and system for three-dimensional hand information
CN108470178B (en) Depth map significance detection method combined with depth credibility evaluation factor
CN110060286A (en) A kind of monocular depth estimation method
Amrutha et al. Human Body Pose Estimation and Applications
Zhang et al. A Gaussian mixture based hidden Markov model for motion recognition with 3D vision device
Ling et al. Stableface: Analyzing and improving motion stability for talking face generation
Kan et al. Background modeling method based on improved multi-Gaussian distribution
Kang et al. Event Camera-Based Pupil Localization: Facilitating Training With Event-Style Translation of RGB Faces
Nagori et al. Communication interface for deaf-mute people using microsoft kinect
CN100570640C (en) A method for representing human motion
Zhang et al. Tcdm: Transformational complexity based distortion metric for perceptual point cloud quality assessment
CN110852335B (en) Target tracking system based on multi-color feature fusion and depth network
Schachner et al. Extracting Quantitative Descriptions of Pedestrian Pre-crash Postures from Real-World Accident Videos
Jiang et al. Observation-oriented silhouette-aware fast full body tracking with Kinect
CN115220574A (en) Pose determination method and device, computer readable storage medium and electronic equipment
Tian et al. Robust facial marker tracking based on a synthetic analysis of optical flows and the YOLO network
An et al. MTAtrack: Multilevel transformer attention for visual tracking
Kumar Bhunia et al. Person Image Synthesis via Denoising Diffusion Model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091216

Termination date: 20110929