CN102682452A

CN102682452A - Human motion tracking method based on a combination of generative and discriminative approaches

Info

Publication number: CN102682452A
Application number: CN2012101048051A
Authority: CN
Inventors: 韩红; 冯光洁; 谢福强; 苟靖翔; 王瑞; 韩启强; 张红蕾; 顾建银; 李晓君
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2012-04-12
Filing date: 2012-04-12
Publication date: 2012-09-19

Abstract

The invention discloses a human motion tracking method that combines generative and discriminative approaches. The method mainly addresses the inaccurate human motion tracking results of the prior art. The implementation steps are: building a human skeleton model; preprocessing the video images to obtain detected joint points; extracting Bandlet2 features from the video images; feeding the extracted Bandlet2 features to a twin Gaussian process (TGP) to predict the human pose; initializing the human skeleton model from the detected joint points; constructing 2D and 3D similarity functions between the joint points predicted by the TGP and the detected joint points; minimizing the similarity functions under the skeleton length constraint to obtain a set of human poses; and selecting, from the obtained poses, the one with the minimum Euclidean distance to the skeleton of the previous frame as the best motion pose of the current frame. Compared with existing human tracking methods, the method gives more accurate and more stable tracking results, and can be applied to medical treatment, athletic training, animation production and intelligent monitoring systems.

Description

Human motion tracking method based on a combination of generative and discriminative approaches

Technical field

The invention belongs to the technical field of image processing and further relates to a method for human motion tracking in the field of computer vision. It uses multi-objective optimization to achieve human motion tracking and 3D pose estimation, and can be used in fields such as athletic training and animation production.

Background technology

The main task of human motion tracking is to detect the human body contour in video images, locate the joint points of the body, recognize the human motion pose on this basis, and finally reconstruct the 3D human motion pose. Because a video image is the projection of a 3D scene onto a 2D image, a large amount of depth information is lost. In addition, self-occlusion of the limbs occurs frequently during human motion, so the video image is ambiguous, which makes it difficult to recover the human motion pose from markerless monocular video. Nevertheless, because human motion tracking based on monocular video has potential applications and economic value in medical treatment, athletic training, animation production, intelligent monitoring systems and other areas, it has attracted the attention of many researchers. Video-based human motion tracking methods to date fall mainly into two categories: learning-based human motion tracking and model-based human motion tracking.

The first category is learning-based human motion tracking. This approach first extracts accurate image features from a database of training video images and from the target video images, then learns the mapping between the image features of the training video database and motion capture data, and finally recovers the 3D pose directly from human body features on the target video images. For example, Urtasun et al. (R. Urtasun and T. Darrell. Local Probabilistic Regression for Activity-Independent Human Pose Inference. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008) use a balanced Gaussian process dynamical model to guide the tracking of 3D human motion in monocular video sequences; the dynamical model is learned from a small amount of training data that contains multiple modes. Sigal et al. (L. Sigal and M. Black. Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006) propose a Bayesian framework that includes sequential importance sampling and annealed particle filtering, and that uses multiple motion models during tracking. To make the recovered 3D pose better satisfy anatomical joint constraints while reducing the dimensionality of the search space, the framework learns motion models from training data and uses the Euclidean distance difference of virtual markers as the measurement error. The shortcomings of this approach are that extracting accurate image features takes a great deal of time and that video tracking depends on the availability of a learning database: without one, tracking cannot be completed.

The second category is model-based human motion tracking. This approach needs no learning database: it extracts image information directly from the target video images, builds a similarity function between the target image and the model, and then optimizes the similarity function to search for the optimal state in a high-dimensional state space, thereby obtaining an accurate human pose. For example, C. Sminchisescu and A. Jepson of the Institut national de recherche en informatique et en automatique (INRIA) use this approach to realize motion tracking with multiple human body models (C. Sminchisescu and A. Jepson. Generative Modeling for Continuous Non-Linearly Embedded Visual Inference. International Conference on Machine Learning (ICML), 2004). Deutscher et al. (J. Deutscher and I. Reid. Articulated Body Motion Capture by Stochastic Search. International Journal of Computer Vision (IJCV), 61(2): 185-205, 2004) use edges and silhouettes as image features to build a weighted similarity function, and apply an annealed particle filter to achieve human motion tracking. Because this approach builds only one similarity function, and the method used to optimize it easily falls into a local optimum when searching for the best result, the tracked human pose is inaccurate and the time complexity of the algorithm is high.

Hunan University's patent application No. 200910043537.5, "Multi-human tracking method based on an attributed relational graph appearance model" (publication No. CN101561928), first builds an attributed relational graph appearance model for the human detection regions of the current frame, computes its similarity to the attributed relational graph appearance models of the bodies tracked in the previous frame, determines the inter-frame matching of bodies from the similarity, and thereby determines the tracking status and obtains the motion trajectories. The shortcomings of the method disclosed in that application are that it can only track performers in a fixed scene, and that the similarity of appearance models alone is insufficient to track the human pose accurately.

Summary of the invention

The object of the invention is to address the above shortcomings of the prior art by proposing a model-based, multi-objective-optimization human motion tracking method that accurately tracks the human pose of performers in different scenes.

The technical scheme for achieving this object is as follows: a human skeleton model is built with a model-based method; the positions and gray-level information of the joint points are extracted from the video images; two distance similarity functions are constructed; and these two functions are optimized with a multi-objective optimization algorithm under the skeleton length constraint to track the human motion pose. The implementation steps are as follows:

(1) Build a 3D human skeleton model by bone abstraction: the human skeleton is divided into 14 parts according to 15 joints, and each part is expressed by a rod-shaped bone model. The 14 rod-shaped bone models are represented by 14 straight line segments between joint points with 3D coordinates, and connecting the corresponding joint point coordinates forms the whole 3D human skeleton model. When the 3D coordinates of the 15 joint points of a moving human body are input, the skeleton model simulates the 3D motion pose of the body;

(2) Preprocess the human video images

2a) Input the human video images, obtain the human silhouette by background subtraction, extract the human contour, and thin the contour to its medial axis to form the human skeleton line;

2b) Search along the skeleton line for the coordinates of the head, abdomen, knee and foot nodes, and use particle filter prediction to detect the coordinates of the remaining human joint points;

(3) Extract the second-generation bandelet transform (Bandlet2) image features $r$ of the video images as the input of the twin Gaussian process, and use the twin Gaussian process (TGP) algorithm to predict the 3D joint point coordinates $v'_i$ of the human body in frame $i$, $i \in [1, N]$. The 3D joint point output for the video sequence is $V'$:

$$r = (r_1, r_2, r_3, \ldots, r_N)^T,$$

$$V' = (v'_1, v'_2, v'_3, \ldots, v'_N)^T,$$

where $r_i$ is the Bandlet2 image feature of frame $i$, $i \in [1, N]$, and $(\cdot)^T$ denotes matrix transposition;

(4) Initialize the human skeleton model

4a) Manually calibrate the joint point positions obtained in step 2b) for the video image at the initial time, and use the calibrated data to set the human skeleton corresponding to the initial human pose, denoted $v_0$, where $v_0$ is the set of human joint point positions of the first video frame detected in 2b);

4b) Use the human skeleton tracked at time $t-1$ as the initial human skeleton at time $t$, $t > 0$;

(5) Construct the similarity functions

5a) Denote the 3D joint points of the human body by $V$ and the 2D joint points by $V^q$, where $V^q$ is the projection of $V$ onto the 2D plane and $V$ is the quantity to be estimated:

$$V = (v_1, v_2, v_3, \ldots, v_N)^T,$$

$$V^q = (v_1^q, v_2^q, v_3^q, \ldots, v_N^q),$$

where $v_i$ is the set of 3D joint points of frame $i$, $v_i^q$ is the set of 2D joint points of frame $i$, $i \in [1, N]$, and $N$ is the number of video frames;

5b) Project the human 3D joint points $V'$ predicted by the TGP method onto the 2D plane to obtain the projected 2D joint point coordinates $V'_p$:

$$V'_p = (v'_{p_1}, v'_{p_2}, v'_{p_3}, \ldots, v'_{p_N})^T,$$

where $v'_{p_i}$ is the 2D projection of the predicted 3D joint points of frame $i$, $i \in [1, N]$;

5c) Build the distance similarity function $f_1(v_i, v'_i)$ in 3D and the distance similarity function $f_2(v_i^q, v'_{p_i})$ in 2D;

(6) Use the nondominated neighbor immune algorithm to optimize the two distance similarity functions $f_1(v_i, v'_i)$ and $f_2(v_i^q, v'_{p_i})$ at time $t$ under the bone length constraint, obtaining a set of human skeletons at time $t$ that are similar to the true human motion pose;

(7) For each human skeleton obtained in step (6) at time $t$, compute the Euclidean distance between its joint points and the joint points of the human skeleton tracked at time $t-1$, and select the skeleton with the minimum Euclidean distance as the most accurate human skeleton tracked at time $t$.

Compared with the prior art, the invention has the following advantages:

1. Because the invention uses particle filter prediction to obtain more accurate image positions of the human joint points, obtaining the joint point positions is simpler and has lower time complexity than in the prior art.

2. Because the invention combines the generative and discriminative approaches that are popular in current human tracking research and builds separate 2D and 3D distance similarity functions, it makes better use of the video image information.

3. Because the invention optimizes the objective functions with the nondominated neighbor immune algorithm, a multi-objective evolutionary algorithm, it can avoid the local optima that existing single-objective human tracking methods fall into, which improves the accuracy of human motion tracking.

Description of drawings

Fig. 1 is a general flow chart of the present invention;

Fig. 2 is a sub-flowchart of human joint point detection in the present invention;

Fig. 3 shows the 3D tracking results of the simulation experiment of the present invention on a walking pose;

Fig. 4 shows the 3D tracking results of the simulation experiment of the present invention on a boxing pose.

Embodiment

The invention is described further below with reference to the accompanying drawings.

With reference to Fig. 1, the specific implementation steps of the invention are as follows:

Step 1: Build the human skeleton model.

According to anatomical knowledge, although the human skeleton changes continuously with age and health, its composition is constant. The human body roughly comprises the tibia, femur, hip bone, trunk, radius, humerus, clavicle, neck and head. The invention therefore represents the human body as a skeleton model made up of 15 joint points and 14 rod-shaped bones. The 14 rod-shaped bone models are represented by 14 straight line segments between joint points with 3D coordinates in a virtual space.

Denote the coordinates of each joint point by $v_{n_i}$, $i \in [1, 15]$, $n \in [1, N]$, where $N$ is the number of frames of the human motion video to be tracked, and denote the human skeleton of frame $n$ by $v_n$. The length of the bone between two adjacent joint points $p$ and $q$, $p, q \in [1, 15]$, is denoted $l_i$. This gives the constraint on the human skeleton model, $\|L_i v_n\| = l_i$, $i = 1, 2, \ldots, m$, where $L_i$ is a 3 × 15 matrix, $l_i$ is the length of the $i$-th bone, and $m$ is the total number of bones;

Under the bone constraint $\|L_i v_n\| = l_i$, $i = 1, 2, \ldots, m$, adjacent joint points are connected to form the whole 3D human skeleton model. When the 3D coordinates of the 15 joint points of a human motion are input, the skeleton model simulates the 3D pose of the moving body.
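
As a rough illustration of the skeleton model in Step 1, the following Python sketch stores a pose as a 15 × 3 array and evaluates the 14 bone lengths. The joint names, their ordering and the bone list are assumptions made here for illustration (the text fixes only the counts, 15 joint points and 14 bones), and `bone_lengths` and `length_violation` are hypothetical helper names.

```python
import numpy as np

# Hypothetical joint ordering: the text specifies 15 joints but not their order.
JOINTS = [
    "root", "neck", "head",
    "l_shoulder", "l_elbow", "l_hand",
    "r_shoulder", "r_elbow", "r_hand",
    "l_hip", "l_knee", "l_foot",
    "r_hip", "r_knee", "r_foot",
]
IDX = {name: k for k, name in enumerate(JOINTS)}

# Assumed kinematic tree: 14 rod-shaped bones, each joining two adjacent joints.
BONES = [
    ("root", "neck"), ("neck", "head"),
    ("neck", "l_shoulder"), ("l_shoulder", "l_elbow"), ("l_elbow", "l_hand"),
    ("neck", "r_shoulder"), ("r_shoulder", "r_elbow"), ("r_elbow", "r_hand"),
    ("root", "l_hip"), ("l_hip", "l_knee"), ("l_knee", "l_foot"),
    ("root", "r_hip"), ("r_hip", "r_knee"), ("r_knee", "r_foot"),
]


def bone_lengths(pose):
    """Return the 14 bone lengths of a pose given as a (15, 3) array."""
    pose = np.asarray(pose, dtype=float)
    return np.array([np.linalg.norm(pose[IDX[p]] - pose[IDX[q]]) for p, q in BONES])


def length_violation(pose, ref_lengths):
    """Largest absolute deviation of the pose's bone lengths from reference lengths."""
    return float(np.max(np.abs(bone_lengths(pose) - np.asarray(ref_lengths, dtype=float))))
```

A candidate pose could then be treated as feasible when `length_violation` is below a small tolerance; the tolerance itself is a design choice that the text does not specify.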

Step 2: Preprocess the video images.

With reference to Fig. 2, this step is implemented as follows:

2a) Extract the skeleton line of the human silhouette:

2a1) Input the human video images and obtain the background image with the least median of squares (LMedS) method;

2a2) Take the pixel-wise difference between the background image and the human motion image to obtain the background-subtracted image;

2a3) Apply morphological operations to the background-subtracted image to remove segmentation noise and obtain a clean human silhouette;

2a4) Apply a boundary-following algorithm to the silhouette to obtain its outer contour, then thin the silhouette to the medial axis of the contour to obtain the skeleton line of the human silhouette;

2b) Search along the skeleton line of the human silhouette obtained in step 2a) for the coordinates of the head, root, knee and foot nodes:

2b1) Search along the skeleton line with a concentric-circle template, and take the circle center for which the most silhouette points fall inside the annulus as the head node;

2b2) Take the center of gravity of the human silhouette as the root node: the arithmetic mean of the x coordinates of all silhouette points is the x coordinate of the root node, and the arithmetic mean of their y coordinates is its y coordinate;

2b3) Project the 3D human skeleton model onto the video image with the root node as reference to obtain the trunk center point, the clavicle joint point and the left and right hip joint points;

2b4) Starting from the head and root joint points obtained above, use a particle filter to detect the coordinates of the hand, elbow, shoulder, knee and foot joint points.
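
Steps 2a1), 2a2) and 2b2) can be sketched as follows. This is a simplified illustration, not the patented procedure: a per-pixel temporal median stands in for the LMedS background estimate, the threshold value is an assumption, and the morphological cleanup, boundary following, thinning and particle filter stages are omitted. The function names are hypothetical.

```python
import numpy as np


def estimate_background(frames):
    """Per-pixel temporal median of a (T, H, W) grayscale stack.

    Assumption: a plain median is used here as a stand-in for LMedS.
    """
    return np.median(np.asarray(frames, dtype=float), axis=0)


def silhouette_mask(frame, background, threshold=30.0):
    """Boolean mask of pixels whose difference from the background exceeds
    the (assumed) threshold."""
    return np.abs(np.asarray(frame, dtype=float) - background) > threshold


def root_node(mask):
    """Center of gravity of the silhouette: mean x and mean y of its pixels."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("empty silhouette")
    return float(xs.mean()), float(ys.mean())
```

The root node returned by `root_node` follows step 2b2) directly: it is the arithmetic mean of the x coordinates and of the y coordinates of all silhouette points.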

Step 3: Extract the second-generation bandelet transform (Bandlet2) image features $r$ of the video images:

3a) Input the video image to be processed, extract the bounding-box region containing the human body in the image, and apply a 2D multi-scale wavelet transform to that region;

3b) On the image after the 2D multi-scale wavelet transform, find and quantize the optimal geometric flow direction using a quadtree partition algorithm and a bottom-up merging rule;

3c) Apply a 1D wavelet transform to the signal along the quantized optimal geometric flow direction and rearrange the result into 2D form to obtain the Bandlet2 coefficient matrix;

3d) Extract the maximum geometric flow statistical feature as the Bandlet2 feature $r$ of the image, $r = (r_1, \ldots, r_i, \ldots, r_N)^T$, where $r_i$ is the Bandlet2 image feature of frame $i$, $i \in [1, N]$, $N$ is the number of video frames, and $(\cdot)^T$ denotes matrix transposition.
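
Only the first stage of this step, the 2D multi-scale wavelet transform of 3a), is easy to show compactly. The sketch below assumes a Haar wavelet, which the text does not specify, and requires image sides divisible by 2 at every level; the quadtree search for the geometric flow and the remaining Bandlet2 stages are not shown. `haar2d` and `multiscale_haar` are hypothetical names.

```python
import numpy as np


def haar2d(img):
    """One level of an orthonormal 2D Haar transform (even-sized input)."""
    a = np.asarray(img, dtype=float)
    s = np.sqrt(2.0)
    lo = (a[:, 0::2] + a[:, 1::2]) / s   # average pairs of adjacent columns
    hi = (a[:, 0::2] - a[:, 1::2]) / s   # difference of adjacent columns
    ll = (lo[0::2] + lo[1::2]) / s       # approximation band
    lh = (lo[0::2] - lo[1::2]) / s       # detail across row pairs
    hl = (hi[0::2] + hi[1::2]) / s       # detail across column pairs
    hh = (hi[0::2] - hi[1::2]) / s       # diagonal detail
    return ll, (lh, hl, hh)


def multiscale_haar(img, levels):
    """Apply haar2d repeatedly to the approximation band."""
    approx = np.asarray(img, dtype=float)
    details = []
    for _ in range(levels):
        approx, bands = haar2d(approx)
        details.append(bands)
    return approx, details
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the input image, which is a convenient sanity check.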

Step 4: Use the second-generation bandelet transform (Bandlet2) image features $r$ extracted in step 3 as the input of the twin Gaussian process (TGP) method to predict the 3D joint point coordinates $v'_i$ of the human body in frame $i$:

$$\begin{bmatrix} ((V')^{(d)})^T \\ (v'_i)^{(d)} \end{bmatrix} \propto N_R\left(0, \begin{bmatrix} K_R & K_R^r \\ (K_R^r)^T & K_R(r, r) \end{bmatrix}\right),$$

where $N_R(\cdot)$ denotes a Gaussian process, $(\cdot)^T$ denotes matrix transposition, $r$ is the input Bandlet2 feature, $V'$ is the 3D joint point output of the human pose to be predicted, $V' = (v'_1, v'_2, v'_3, \ldots, v'_N)^T$, $N$ is the number of video frames, $((V')^{(d)})^T$ is the $d$-th row of the human pose $V'$ to be predicted, i.e. the human pose of frame $d$, $(v'_i)^{(d)}$ is the $d$-th joint point coordinate of the 3D joint points $v'_i$ to be predicted for frame $i$, $K_R(r, r)$ is zero, $K_R$ is an $N \times N$ matrix whose element in row $i$ and column $j$ is $(K_R)_{ij}$, and $K_R^r$ is an $N \times 1$ column vector whose $i$-th element is $(K_R^r)_i$:

$$(K_R^r)_i = K_R(r_i, r),$$

$$K_R(r_i, r) = \mathrm{cov}(f(r_i), f(r)),$$

$$(K_R)_{ij} = K_R(r_i, r_j), \quad K_R(r_i, r_j) = \mathrm{cov}(f(r_i), f(r_j)),$$

Here $\mathrm{cov}(f(r_i), f(r_j))$ is the covariance function between $f(r_i)$ and $f(r_j)$, $f(r_i)$ is the zero-mean Gaussian function of the Bandlet2 feature of frame $i$, $f(r_j)$ is the zero-mean Gaussian function of the Bandlet2 feature of frame $j$, and $f(r)$ is the zero-mean Gaussian function of the input Bandlet2 feature.
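
To make the roles of $K_R$ and $K_R^r$ concrete, the sketch below builds both from a Gaussian (RBF) kernel and uses them in an ordinary Gaussian process regression mean. Both choices are assumptions: the text does not name the covariance function, and the full TGP prediction is more involved than this single-GP stand-in. The parameters `gamma` and `noise` and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def gp_predict(R_train, V_train, r_query, gamma=1.0, noise=1e-6):
    """Predict a flattened pose for one query feature vector.

    R_train: (N, D) features, V_train: (N, 45) poses, r_query: (D,).
    Simplified stand-in for TGP: standard GP regression mean.
    """
    K = rbf_kernel(R_train, R_train, gamma)   # plays the role of K_R (N x N)
    k = rbf_kernel(R_train, r_query, gamma)   # plays the role of K_R^r (N x 1)
    weights = np.linalg.solve(K + noise * np.eye(K.shape[0]), k)
    return (weights.T @ np.asarray(V_train, dtype=float)).ravel()
```

The small `noise` term is added to the diagonal only to keep the linear solve well conditioned.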

Step 5: Initialize the human skeleton model

5a) Manually calibrate the joint point positions obtained in step 2b) for the video image at the initial time, and use the calibrated data to set the human skeleton corresponding to the initial human pose to $v_0$, where $v_0$ is the set of human joint point positions of the first video frame detected in 2b);

5b) Use the human skeleton tracked at time $t-1$ as the initial human skeleton at time $t$, $t > 0$.

Step 6: Build the similarity functions

6a) From the 3D joint points predicted by the TGP and the human joint points to be estimated, build the 3D distance similarity function $f_1(v_n, v'_n)$ for video frame $n$:

$$f_1(v_n, v'_n) = \sum_{i=1}^{15} \| v_{n_i} - v'_{n_i} \|_2, \quad n \in [1, N],$$

where $N$ is the number of video frames, $\|\cdot\|_2$ denotes the 2-norm, $v_{n_i}$ is a joint point to be estimated, and $v'_{n_i}$ is the corresponding joint point predicted by the TGP;

6b) From the projections onto the 2D plane of the joint points predicted by the TGP and of the human joint points to be estimated, build the 2D distance similarity function $f_2(v_n^q, v'_{p_n})$ for video frame $n$:

$$f_2(v_n^q, v'_{p_n}) = \sum_{i=1}^{15} \| v_{n_i}^q - v'_{p_{n_i}} \|_2, \quad n \in [1, N],$$

where $N$ is the number of video frames, $\|\cdot\|_2$ denotes the 2-norm, $v_{n_i}^q$ is the projection of the joint point to be estimated $v_{n_i}$, and $v'_{p_{n_i}}$ is the projection of the joint point $v'_{n_i}$ predicted by the TGP.
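
The two distance similarity functions translate almost directly into code. In the sketch below a pose is a 15 × 3 array; the orthographic projection that simply drops the depth coordinate is an assumption, since the text does not state the camera model, and any other projection can be passed in through the hypothetical `project` argument.

```python
import numpy as np


def project_orthographic(pose3d):
    """Assumed camera model: drop the depth coordinate of each joint."""
    return np.asarray(pose3d, dtype=float)[:, :2]


def f1(v, v_pred):
    """3D distance similarity: sum over joints of Euclidean distances."""
    diff = np.asarray(v, dtype=float) - np.asarray(v_pred, dtype=float)
    return float(np.linalg.norm(diff, axis=1).sum())


def f2(v, v_pred, project=project_orthographic):
    """2D distance similarity: the same sum, computed on projected joints."""
    diff = project(v) - project(v_pred)
    return float(np.linalg.norm(diff, axis=1).sum())
```

Under this assumed projection, two poses that differ only in depth give a zero 2D distance but a nonzero 3D distance, which illustrates why the two objectives carry different information.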

Step 7: Optimize the similarity functions

With the two similarity functions obtained in step 6 and the skeleton length constraint from step 1, set up the system for finding the minima of the two similarity functions $f_1(v_n, v'_n)$ and $f_2(v_n^q, v'_{p_n})$:

$$\begin{cases} \arg\min f_1(v_n, v'_n) = \sum_{i=1}^{15} \| v_{n_i} - v'_{n_i} \|_2, \\ \arg\min f_2(v_n^q, v'_{p_n}) = \sum_{i=1}^{15} \| v_{n_i}^q - v'_{p_{n_i}} \|_2, \\ \text{s.t. } \| L_i v_n \| = l_i, \quad i = 1, 2, \ldots, m, \end{cases}$$

where $l_i$ is the length of the $i$-th human bone, $m$ is the number of bones, $n \in [1, N]$, $N$ is the number of video frames, $\arg\min(\cdot)$ denotes minimization, and $\|\cdot\|_2$ denotes the 2-norm.

Use the nondominated neighbor immune algorithm to solve for the minima of this system at time $t$ under the bone length constraint, obtaining a set of human skeletons at time $t$ that are similar to the true human motion pose.
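
The immune algorithm itself is too long to reproduce here, but the notion of non-domination that it is built on can be shown briefly. The sketch below keeps, from a list of candidates scored on both objectives, those that no other candidate beats on one objective without being worse on the other. It is only the Pareto filter, not the full algorithm, and `non_dominated` is a hypothetical name.

```python
import numpy as np


def non_dominated(objectives):
    """Indices of candidates not dominated by any other (minimization).

    objectives: array-like of shape (K, 2), one (f1, f2) pair per candidate.
    """
    F = np.asarray(objectives, dtype=float)
    keep = []
    for i in range(len(F)):
        dominated = False
        for j in range(len(F)):
            # j dominates i if it is no worse everywhere and better somewhere.
            if j != i and np.all(F[j] <= F[i]) and np.any(F[j] < F[i]):
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return keep
```

The surviving candidates form the set of trade-off solutions from which the final pose is then chosen.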

Step 8: Select the optimal human motion pose

For each human skeleton obtained in step 7 at time $t$, compute the Euclidean distance between its joint points and the joint points of the human skeleton tracked at time $t-1$, and select the skeleton with the minimum Euclidean distance as the most accurate human skeleton tracked at time $t$.
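
This selection rule can be sketched in a few lines. The sketch assumes that the Euclidean distance between two skeletons is taken over the stacked joint coordinates (the text does not say whether distances are stacked or summed per joint), and `select_pose` is a hypothetical name.

```python
import numpy as np


def select_pose(candidates, previous):
    """Pick the candidate skeleton closest to the previous frame's skeleton.

    candidates: (K, 15, 3) array of poses at time t.
    previous:   (15, 3) pose tracked at time t-1.
    Returns the index of the chosen candidate and the pose itself.
    """
    cands = np.asarray(candidates, dtype=float)
    prev = np.asarray(previous, dtype=float)
    dists = np.linalg.norm((cands - prev).reshape(len(cands), -1), axis=1)
    best = int(np.argmin(dists))
    return best, cands[best]
```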

Simulation experiments

The effect of the invention can be verified by the following simulation experiments:

The simulation experiments were implemented in Matlab 2010a and run on an HP workstation under Windows. The video images used in the simulation experiments are from the HumanEva database of Brown University (USA), and the video image size is 320 × 240.

Simulation content

Simulation 1: the invention is used to track a walking motion; the result is shown in Fig. 3. The human body in Fig. 3 is the original video image, and the skeleton line drawn on the body is the optimal motion state obtained by tracking.

As can be seen from Fig. 3, no ambiguous pose appears in the tracking result and the human motion pose is recovered accurately, which shows that the invention can track simple motion poses accurately.

Simulation 2: the invention is used to track a boxing motion; the result is shown in Fig. 4. The human body image in Fig. 4 is the original video image, and the skeleton line drawn on the body is the optimal motion state obtained by tracking.

As can be seen from Fig. 4, no ambiguous pose appears in the tracking result and the human motion pose is recovered accurately, which shows that the method can also track complex human motion states accurately.

Analysis of simulation results: Figs. 3 and 4 also show that, for video images of different motion states, the tracking results of the invention are essentially the same as the true human motion poses. The method effectively resolves the ambiguity problem in human motion tracking and improves the accuracy and stability of tracking. The main reasons are that the method uses two similarity functions, which makes better use of the video image information, and that it adds the human skeleton length constraint when minimizing the two similarity functions, which limits the occurrence of ambiguous human poses.

Claims

1. A human motion tracking method combining generative and discriminative approaches, comprising the following steps:

(2) preprocessing the human video images

$$r = (r_1, r_2, r_3, \ldots, r_N)^T,$$

$$V' = (v'_1, v'_2, v'_3, \ldots, v'_N)^T,$$

where $r_i$ is the Bandlet2 image feature of frame $i$, $i \in [1, N]$, and $(\cdot)^T$ denotes matrix transposition;

(4) initializing the human skeleton model

(5) constructing the similarity functions

$$V = (v_1, v_2, v_3, \ldots, v_N)^T,$$

$$V^q = (v_1^q, v_2^q, v_3^q, \ldots, v_N^q),$$

where $v_i$ is the set of 3D joint points of frame $i$, $v_i^q$ is the set of 2D joint points of frame $i$, $i \in [1, N]$, and $N$ is the number of video frames;

$$V'_p = (v'_{p_1}, v'_{p_2}, v'_{p_3}, \ldots, v'_{p_N})^T,$$

where $v'_{p_i}$ is the 2D projection of the predicted 3D joint points of frame $i$, $i \in [1, N]$;

(6) using the nondominated neighbor immune algorithm to find the minima of the two distance similarity functions $f_1(v_i, v'_i)$ and $f_2(v_i^q, v'_{p_i})$ at time $t$ under the bone length constraint, obtaining a set of human skeletons at time $t$ that are similar to the true human motion pose;

2. The human motion tracking method according to claim 1, wherein the prediction of $v'_i$ with the twin Gaussian process (TGP) algorithm in step (3) is carried out according to the following formula:

$$\begin{bmatrix} ((V')^{(d)})^T \\ (v'_i)^{(d)} \end{bmatrix} \propto N_R\left(0, \begin{bmatrix} K_R & K_R^r \\ (K_R^r)^T & K_R(r, r) \end{bmatrix}\right)$$

where $N_R(\cdot)$ denotes a Gaussian process, $r$ is the input Bandlet2 feature, $((V')^{(d)})^T$ is the $d$-th row of the human pose $V'$ to be predicted, i.e. the human pose of frame $d$, $(v'_i)^{(d)}$ is the $d$-th joint point coordinate of the 3D joint points $v'_i$ to be predicted for frame $i$, $K_R(r, r)$ is zero, $K_R$ is an $N \times N$ matrix whose element in row $i$ and column $j$ is $(K_R)_{ij}$, and $K_R^r$ is an $N \times 1$ column vector whose $i$-th element is $(K_R^r)_i$:

$$(K_R^r)_i = K_R(r_i, r),$$

$$K_R(r_i, r) = \mathrm{cov}(f(r_i), f(r)),$$

$$(K_R)_{ij} = K_R(r_i, r_j), \quad K_R(r_i, r_j) = \mathrm{cov}(f(r_i), f(r_j)),$$

3. The human motion tracking method according to claim 1, wherein the 2-norm 3D distance in step 5c) is:

$$f_1(v_n, v'_n) = \sum_{i=1}^{15} \| v_{n_i} - v'_{n_i} \|_2,$$

where $v_{n_i}$ is the 3D coordinate of the $i$-th joint point of the human skeleton joint points $v_n$ to be estimated for frame $n$, and $v'_{n_i}$ is the 3D coordinate of the $i$-th joint point of the human skeleton joint points $v'_n$ predicted by the TGP for frame $n$, $n \in [1, N]$.

4. The human motion tracking method according to claim 1, wherein the 2-norm 2D distance in step 5c) is:

$$f_2(v_n^q, v'_{p_n}) = \sum_{i=1}^{15} \| v_{n_i}^q - v'_{p_{n_i}} \|_2,$$

where $v_{n_i}^q$ is the 2D coordinate of the $i$-th joint point of the projection onto the 2D plane of the human skeleton joint points $v_n$ to be estimated for frame $n$, and $v'_{p_{n_i}}$ is the 2D coordinate of the $i$-th joint point of the projection onto the 2D plane of the skeleton joint points $v'_n$ predicted by the TGP for frame $n$, $n \in [1, N]$.

5. The human motion tracking method according to claim 1, wherein, in step 2a), extracting the human contour and thinning it to its medial axis to form the human skeleton line comprises: first obtaining the background image with the least median of squares (LMedS) method, and taking the pixel-wise difference between the human motion image and the background image to obtain the background-subtracted image; then removing segmentation noise from the background-subtracted image with morphological operations to obtain a clean human silhouette; and finally obtaining the outer contour of the silhouette with a boundary-following algorithm and thinning the silhouette to the medial axis of the contour to obtain the skeleton line of the human silhouette.

6. The human motion tracking method according to claim 1, wherein extracting the second-generation bandelet transform (Bandlet2) image features of the video images in step (3) is carried out according to the following steps:

3a) first inputting the video image to be processed, extracting the bounding-box region containing the human body in the image, and applying a 2D multi-scale wavelet transform to that region;

3d) extracting the maximum geometric flow statistical feature as the final image feature representation.

7. The human motion tracking method according to claim 1, wherein using the nondominated neighbor immune algorithm in step (6) to find the minima of the two distance similarity functions $f_1(v_n, v'_n)$ and $f_2(v_n^q, v'_{p_n})$ at time $t$ under the bone length constraint is carried out according to the following formula:

$$\begin{cases} \arg\min f_1(v_n, v'_n) = \sum_{i=1}^{15} \| v_{n_i} - v'_{n_i} \|_2, \\ \arg\min f_2(v_n^q, v'_{p_n}) = \sum_{i=1}^{15} \| v_{n_i}^q - v'_{p_{n_i}} \|_2, \\ \text{s.t. } \| L_i v_n \| = l_i, \quad i = 1, 2, \ldots \end{cases}$$