CN102930516B

CN102930516B - Data driven and sparsely represented three-dimensional human motion denoising method

Info

Publication number: CN102930516B
Application number: CN201210462761.XA
Authority: CN
Inventors: 肖俊; 林海; 冯银付
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2012-11-16
Filing date: 2012-11-16
Publication date: 2015-04-29
Anticipated expiration: 2032-11-16
Also published as: CN102930516A

Abstract

The invention discloses a data-driven, sparse-representation-based method for denoising three-dimensional human motion. The human pose is divided into body parts, and a sliding window extracts temporal segments for each part, yielding fine-grained spatio-temporal semantic features of the motion. For each noisy segment, the k nearest segments are retrieved from a motion database containing clean 3D poses, and these k candidate segments are used within a sparse-representation optimization framework to recover and reconstruct the motion. The method denoises well for the two most common kinds of noise in human motion data: Gaussian noise and outlier noise.

Description

Data-driven, sparse-representation-based 3D human motion denoising method

Technical field

The present invention relates to the fields of 3D graphics and virtual reality, and in particular to a data-driven method for denoising human motion data.

Background technology

Human motion capture is widely used in animation, film visual effects, computer games and other entertainment applications, where it brings substantial economic benefit, and all of these applications depend on high-precision human motion data. However, even today's expensive commercial motion capture systems frequently produce data containing noise and outliers, caused for example by markers occluding one another or being mislabeled, so the motion data must be denoised before use.

The earliest denoising methods relied on manual inspection and correction: a professional animator checked the motion data frame by frame for noise and dragged each abnormal captured point back into place with the mouse. Now that applications must process large volumes of data, this is extremely time-consuming and labor-intensive. Automatic denoising techniques based on Gaussian or Kalman filtering were later proposed, but these filters operate on each dimension of the motion data independently and ignore the correlations between dimensions, so the denoised results often look unnatural.

Moreover, data cleaned through such painstaking processing is not reusable: it usually suits only one specific application and is then set aside as useless.

In recent years, data-driven denoising methods have emerged. They use previously captured clean motion data as a database, take the current input motion as a query, retrieve similar neighboring data, and reconstruct the input from these neighbors; the methods differ mainly in how the reconstruction is performed. This approach makes motion data reusable, further lowering industrial cost and improving the cost-effectiveness of motion capture. However, because these methods rely on matrix operations that treat all dimensions of the motion data as a single whole, they ignore the different kinematic characteristics of different limbs (the left and right arms and legs).

Summary of the invention

The object of the invention is to overcome these shortcomings of the prior art by providing a data-driven, sparse-representation-based 3D human motion denoising method that accounts for the correlations between the dimensions of human motion data without treating them as a single whole, and instead obtains a finer-grained representation for each limb part.

This object is achieved by the following technical solution: a data-driven, sparse-representation-based 3D human motion denoising method, comprising the following steps:

(1) Download a clean human motion database;

(2) Capture human motion data with a motion capture device: use a Vicon motion capture system to capture the trajectories of markers attached at the human joints;

(3) Preprocess the data obtained in steps (1) and (2): convert both the database data from step (1) and the captured input data from step (2) from BVH format to TRC format, and apply a translation and a rotation to each pose. The translation moves the root node of the motion data, located at the hips, to the origin of the global coordinate system; the rotation aligns the normal vector of the plane containing the trunk (the plane fitted to the markers attached to the body) with the x axis of the global coordinate system. This ensures that all processed poses share the same position and orientation. Record the transformation matrix $M_{trans}$; its inverse is $M_{trans}^{-1}$;

(4) Divide the human pose into 5 parts: the preprocessed motion data is a d×T two-dimensional matrix, where d is the number of joints multiplied by 3 (each column is a vector of the x, y, z coordinates of all joints) and T is the number of frames. Split this matrix into five submatrices containing the data of the trunk (including the head), left arm, right arm, left leg and right leg, each submatrix containing only the information of the joints belonging to that part. For each part, slide a window of width $M$ frames ($M$ a natural number) sequentially along the motion sequence to obtain that part's motion segments, and vectorize each motion segment into a column vector, which serves as a processing primitive $y_i$, $i = 1, \dots, 5$;

(5) Search the database for segments close to the input motion segment: for the processing primitive $y_i$ vectorized from an $M$-frame input motion segment of a part, traverse every motion segment in the database and compute its similarity to the input segment, measured by the mean frame-to-frame Euclidean distance. Take the 1200 most similar segments (those with the smallest distance) as reconstruction primitives, vectorize each into a column vector, and concatenate these columns into the matrix $D_i$, the reconstruction dictionary;

(6) Perform linear regression on the noisy motion segment using its neighboring segments: given the vectorized processing primitive $y_i$ of the input segment and its corresponding reconstruction dictionary $D_i$, use a sparse-representation framework to compute the reconstruction coefficients $\omega_i$ that express $y_i$ in terms of the reconstruction primitives in $D_i$. For motion data containing Gaussian noise, use l1-ls to solve the sparse representation formula $\omega_i = \arg\min_{\omega_i} \|y_i - D_i\omega_i\|_2^2 + \lambda\|\omega_i\|_1$; for data containing outliers, use convex programming to solve the sparse representation formula $\omega_i = \arg\min_{\omega_i} \|y_i - D_i\omega_i\|_1 + \lambda\|\omega_i\|_1$, where $\lambda$ is the sparse regularization parameter. Multiplying the reconstruction dictionary $D_i$ by the solved coefficients $\omega_i$ gives the denoised motion data $z_i = D_i\omega_i$;

(7) Map the computed motion segments $z_i$ of the five submatrices back to the human joints to obtain the denoised 3D coordinates of every joint in every frame. These coordinates are still in the transformed frame, so multiply them by the inverse transformation matrix $M_{trans}^{-1}$ recorded in step (3) to restore the original orientation and offset of the pose. This yields clean human motion data.
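The outlier objective of step (6), $\|y_i - D_i\omega_i\|_1 + \lambda\|\omega_i\|_1$, is non-smooth. One standard way to hand it to a convex (here, linear) programming solver is to introduce slack vectors $u$ and $v$; this reformulation is offered only as an illustration, since the text does not say which convex program or solver was used:

```latex
\min_{\omega_i,\,u,\,v}\; \mathbf{1}^\top u + \lambda\,\mathbf{1}^\top v
\quad \text{s.t.} \quad
-u \le y_i - D_i\,\omega_i \le u, \qquad -v \le \omega_i \le v
```

At an optimum $u = |y_i - D_i\omega_i|$ and $v = |\omega_i|$ elementwise, so the optimal $\omega_i$ and objective value coincide with those of the original problem.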

Beneficial effects of the invention: by retrieving clean neighboring motion segments from a database, the invention avoids tedious, time-consuming manual correction and allows previously captured motion data to be reused, reducing industrial cost. The proposed sparse-representation optimization framework avoids the complex model-training stage of typical data-driven denoising methods, and slightly different optimization objectives are proposed for the two most common kinds of noise in motion data, Gaussian noise and outliers, which gives better denoising results.

Brief description of the drawings

Fig. 1 is a flowchart of the human motion data denoising method based on data-driven retrieval and sparse representation.

Embodiment

The present invention is described in detail below with reference to the accompanying drawing, which will make its objects and effects clearer.

As shown in Fig. 1, the human motion data denoising method based on data-driven retrieval and sparse representation according to the present invention comprises the following steps:

Step 1: Download a clean human motion database.

The database contains clean, noise-free human motion data collected with precise commercial equipment. The 3D human motion database provided by CMU (downloadable from http://mocap.cs.cmu.edu) can be used directly; it contains many types of motion captured from different people, such as running, walking, jumping and various sports.

Step 2: Capture human motion data with a motion capture device.

A Vicon motion capture system (http://www.vicon.com/) is used to capture the trajectories of markers attached at the human joints. To stay consistent with the database, we also use the CMU (Carnegie Mellon University) marker set configuration (see http://mocap.cs.cmu.edu/markerPlacementGuide.pdf). This ensures that the database and the data to be processed have the same dimensionality and joint correspondence.

Step 3: Preprocess the data obtained in Steps 1 and 2.

MotionBuilder is used to convert both the database data from Step 1 and the captured input data from Step 2 from BVH format to TRC format, and a translation and a rotation are applied to each pose. The translation moves the root node of the motion data, located at the hips (with the marker configuration of Step 2, the root marker is attached at the hips), to the origin of the global coordinate system; the rotation aligns the normal vector of the plane containing the trunk (the plane fitted to the markers attached to the body) with the x axis of the global coordinate system. This ensures that all processed poses share the same position and orientation. The transformation matrix $M_{trans}$ is recorded; its inverse is $M_{trans}^{-1}$.
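The per-pose normalization of this step can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the patented implementation: one pose is an (n, 3) array of marker positions, `root_idx` and `trunk_idx` (hypothetical names) index the hip root marker and the trunk markers, the trunk plane is fitted by least squares via SVD, and the rotation is built with the Rodrigues formula. The fitted normal's sign is ambiguous, so a real implementation would also need an orientation convention that the source does not specify.

```python
import numpy as np


def fit_plane_normal(points):
    """Unit normal of the least-squares plane through an (n, 3) array of points."""
    centered = points - points.mean(axis=0)
    # The right-singular vector with the smallest singular value is the plane normal.
    # Its sign is arbitrary; the source does not say how to orient it.
    _, _, vt = np.linalg.svd(centered)
    return vt[-1]


def rotation_to_x(n):
    """Rotation matrix R with R @ n == [1, 0, 0] for a unit vector n (Rodrigues formula)."""
    x = np.array([1.0, 0.0, 0.0])
    v = np.cross(n, x)            # rotation axis (unnormalized)
    s = np.linalg.norm(v)         # sine of the rotation angle
    c = float(np.dot(n, x))       # cosine of the rotation angle
    if s < 1e-12:
        # n is already parallel (identity) or antiparallel (180 degrees about z) to x.
        return np.eye(3) if c > 0 else np.diag([-1.0, -1.0, 1.0])
    k = v / s
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)


def normalize_pose(markers, root_idx, trunk_idx):
    """Move the root marker to the origin and align the trunk-plane normal with x.

    Returns the normalized (n, 3) markers, the 4x4 matrix M_trans and its inverse.
    """
    R = rotation_to_x(fit_plane_normal(markers[trunk_idx]))
    M_trans = np.eye(4)
    M_trans[:3, :3] = R
    M_trans[:3, 3] = -R @ markers[root_idx]   # p' = R (p - root)
    homog = np.hstack([markers, np.ones((len(markers), 1))])
    normalized = (M_trans @ homog.T).T[:, :3]
    return normalized, M_trans, np.linalg.inv(M_trans)
```

Keeping the full 4×4 homogeneous matrix per pose makes Step 7 a single matrix product with the stored inverse.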

Step 4: Divide the human pose into 5 parts.

The preprocessed motion data is a d×T two-dimensional matrix, where d is the number of joints multiplied by 3 (each column is a vector of the x, y, z coordinates of all joints) and T is the number of frames. This matrix is split into five submatrices containing the data of the trunk (including the head), left arm, right arm, left leg and right leg, each containing only the information of the joints belonging to that part. For each part, a window of width $M$ frames ($M$ a natural number, set to 25 in this embodiment) slides sequentially along the motion sequence to obtain that part's motion segments, and each segment is vectorized into a column vector that serves as a processing primitive $y_i$, $i = 1, \dots, 5$.
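A sketch of the part split and sliding-window vectorization, under assumptions the patent leaves open: the rows of the d×T matrix are ordered x, y, z per joint, the joint-to-part grouping in `PARTS` is a hypothetical 16-joint example (the real grouping depends on the CMU marker set), and the window advances one frame at a time (`stride=1`, also an assumption). Each window is flattened frame by frame into one column.

```python
import numpy as np

# Hypothetical joint-to-part grouping for a 16-joint skeleton; the real grouping
# depends on the marker set and is not listed in the source.
PARTS = {
    "trunk": [0, 1, 2, 3],        # includes the head
    "left_arm": [4, 5, 6],
    "right_arm": [7, 8, 9],
    "left_leg": [10, 11, 12],
    "right_leg": [13, 14, 15],
}


def part_rows(joints):
    """Row indices of a part, assuming rows are ordered x, y, z per joint."""
    return [3 * j + c for j in joints for c in range(3)]


def split_parts(X, parts=PARTS):
    """Split a d x T motion matrix into one submatrix per body part."""
    return {name: X[part_rows(joints)] for name, joints in parts.items()}


def sliding_segments(sub, M=25, stride=1):
    """Vectorize every M-frame window of a part submatrix.

    Returns (Y, starts): column k of Y is the window starting at frame starts[k],
    flattened frame by frame (order="F" stacks the M columns one after another).
    """
    _, T = sub.shape
    starts = list(range(0, T - M + 1, stride))
    Y = np.stack([sub[:, s:s + M].reshape(-1, order="F") for s in starts], axis=1)
    return Y, starts
```

Each column of `Y` plays the role of one $y_i$; the same function would be applied to the database motions so that query and database segments share one layout.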

Step 5: Search the database for segments close to the input motion segment.

For the processing primitive $y_i$ ($i = 1, \dots, 5$) vectorized from an $M$-frame input motion segment of a part, every motion segment in the database is traversed and its similarity to the input segment is computed, measured by the mean frame-to-frame Euclidean distance. The 1200 most similar segments (those with the smallest distance) are taken as reconstruction primitives; each is vectorized into a column vector, and these columns are concatenated into the matrix $D_i$, the reconstruction dictionary.
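The nearest-neighbour retrieval can be sketched as follows. The distance is the mean frame-to-frame Euclidean distance described above; `build_dictionary` and its arguments are hypothetical names, the database is assumed to be a matrix whose columns are segments of the same part vectorized the same way as $y_i$ (frame by frame, `n_coords` values per frame), and an exhaustive NumPy scan stands in for whatever search the original implementation used.

```python
import numpy as np


def mean_frame_distances(y, database, n_coords):
    """Mean frame-to-frame Euclidean distance from y to every database column.

    y: vectorized segment of length M * n_coords (frame by frame).
    database: (M * n_coords, N) matrix of segments vectorized the same way.
    """
    M = y.size // n_coords
    frames_y = y.reshape(M, n_coords)                                # (M, n_coords)
    frames_db = database.T.reshape(database.shape[1], M, n_coords)  # (N, M, n_coords)
    return np.linalg.norm(frames_db - frames_y, axis=2).mean(axis=1)


def build_dictionary(y, database, n_coords, k=1200):
    """Return the reconstruction dictionary D (the k most similar segments) and their indices."""
    dists = mean_frame_distances(y, database, n_coords)
    k = min(k, database.shape[1])
    idx = np.argsort(dists)[:k]        # smallest distance = most similar
    return database[:, idx], idx
```

Because a smaller distance means a higher similarity, "the 1200 most similar segments" corresponds to the first 1200 indices of the ascending sort.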

Step 6: Perform linear regression on the noisy motion segment using its neighboring segments.

Given the vectorized processing primitive $y_i$ of the input motion segment and its corresponding reconstruction dictionary $D_i$, a sparse-representation framework is used to compute the reconstruction coefficients $\omega_i$ that express $y_i$ in terms of the reconstruction primitives in $D_i$. The goal is for the recovered motion segment to be a linear combination of the reconstruction primitives whose coefficients are as sparse as possible, which adapts well to a database containing many motion types and styles; at the same time, the difference $y_i - D_i\omega_i$ between the input and the recovered motion segment should follow the distribution of the noise. $\lambda$ is the sparse regularization parameter, which controls the sparsity of the coefficients. For the two different noise distributions, Gaussian noise and outliers, the method uses the following two formulas:

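$$\omega_i = \arg\min_{\omega_i} \|y_i - D_i\omega_i\|_2^2 + \lambda\|\omega_i\|_1 \qquad (1)$$

$$\omega_i = \arg\min_{\omega_i} \|y_i - D_i\omega_i\|_1 + \lambda\|\omega_i\|_1 \qquad (2)$$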

For motion data containing Gaussian noise, formula (1) is solved with the l1-ls solver, with the sparse regularization parameter $\lambda$ set to 0.1 in this embodiment; for data containing outliers, formula (2) is solved by convex programming, with $\lambda$ set to 10. Multiplying the reconstruction dictionary $D_i$ by the solved coefficients $\omega_i$ gives the denoised motion data $z_i = D_i\omega_i$.
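The two solvers can be sketched with NumPy and SciPy. These are stand-ins, not the solvers named in the text: formula (1) is solved here with plain ISTA (iterative soft-thresholding) instead of the l1-ls package, and formula (2) is rewritten as a linear program (slack vectors bound $|y_i - D_i\omega_i|$ and $|\omega_i|$ elementwise) and passed to `scipy.optimize.linprog` with the HiGHS backend. The function names and the iteration count are assumptions.

```python
import numpy as np
from scipy.optimize import linprog


def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def solve_gaussian(y, D, lam=0.1, n_iter=500):
    """Formula (1): min_w ||y - D w||_2^2 + lam ||w||_1, solved with ISTA."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2     # Lipschitz constant of the gradient of ||y - D w||_2^2
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ w - y)
        w = soft_threshold(w - grad / L, lam / L)
    return w


def solve_outlier(y, D, lam=10.0):
    """Formula (2): min_w ||y - D w||_1 + lam ||w||_1, posed as a linear program.

    Variables are [w, u, v]; minimize sum(u) + lam * sum(v)
    subject to -u <= y - D w <= u and -v <= w <= v.
    """
    m, n = D.shape
    I_m, I_n = np.eye(m), np.eye(n)
    Z_mn, Z_nm = np.zeros((m, n)), np.zeros((n, m))
    c = np.concatenate([np.zeros(n), np.ones(m), lam * np.ones(n)])
    A_ub = np.block([
        [-D, -I_m, Z_mn],     # y - D w <= u   <=>  -D w - u <= -y
        [D, -I_m, Z_mn],      # D w - y <= u   <=>   D w - u <=  y
        [I_n, Z_nm, -I_n],    #  w <= v
        [-I_n, Z_nm, -I_n],   # -w <= v
    ])
    b_ub = np.concatenate([-y, y, np.zeros(2 * n)])
    bounds = [(None, None)] * n + [(0, None)] * (m + n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]
```

With the embodiment's parameters one would call `solve_gaussian(y, D, lam=0.1)` or `solve_outlier(y, D, lam=10.0)` and form the denoised segment as `D @ w`. The dense LP has $m + 2n$ variables for $m$ coordinates and $n = 1200$ atoms, so for large parts a sparse constraint matrix would likely be more practical.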

Step 7: Map the computed motion segments $z_i$ of the five submatrices back to the human joints to obtain the denoised 3D coordinates of every joint in every frame. These coordinates are still in the transformed frame, so they are multiplied by the inverse transformation matrix $M_{trans}^{-1}$ recorded in Step 3 to restore the original orientation and offset of the pose. This yields clean human motion data.
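This step leaves two details open: how overlapping windows are merged, and how the per-pose inverse transforms are applied. The sketch below assumes that frames covered by several windows are simply averaged and that each frame $t$ has its own 4×4 inverse matrix (the inverse of that pose's $M_{trans}$ from Step 3); the function names and the x, y, z-per-joint row layout are assumptions for illustration.

```python
import numpy as np


def overlap_average(Z, starts, n_coords, M, T):
    """Rebuild an (n_coords, T) part matrix from denoised window columns z_i.

    Frames covered by several windows are averaged (an assumption; the source
    does not say how overlapping windows are merged).
    """
    acc = np.zeros((n_coords, T))
    counts = np.zeros(T)
    for col, s in enumerate(starts):
        acc[:, s:s + M] += Z[:, col].reshape(n_coords, M, order="F")
        counts[s:s + M] += 1
    if np.any(counts == 0):
        raise ValueError("some frames are not covered by any window")
    return acc / counts


def merge_parts(part_mats, parts, n_joints):
    """Write each part's rows back into a full (3 * n_joints, T) matrix (x, y, z per joint)."""
    T = next(iter(part_mats.values())).shape[1]
    X = np.zeros((3 * n_joints, T))
    for name, joints in parts.items():
        rows = [3 * j + c for j in joints for c in range(3)]
        X[rows] = part_mats[name]
    return X


def undo_normalization(X, inverse_transforms):
    """Apply frame t's 4x4 inverse transform (inverse of its M_trans) to that frame's joints."""
    out = np.empty_like(X)
    for t in range(X.shape[1]):
        pts = X[:, t].reshape(-1, 3)                       # (n_joints, 3)
        homog = np.hstack([pts, np.ones((len(pts), 1))])
        out[:, t] = (inverse_transforms[t] @ homog.T).T[:, :3].reshape(-1)
    return out
```

Applied to each of the five parts in turn, followed by `merge_parts` and `undo_normalization`, this yields a full d×T matrix in the original coordinate frame.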


Claims

1. A data-driven, sparse-representation-based 3D human motion denoising method, characterized in that it comprises the following steps:

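(1) Download a clean human motion database;

(2) Capture human motion data with a motion capture device: use a Vicon motion capture system to capture the trajectories of markers attached at the human joints;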

(3) Preprocess the data obtained in steps (1) and (2): convert both the database data from step (1) and the captured input data from step (2) from BVH format to TRC format, and apply a translation and a rotation to each pose; the translation moves the root node of the motion data, located at the hips, to the origin of the global coordinate system, and the rotation aligns the normal vector of the plane fitted to the markers attached to the trunk with the x axis of the global coordinate system, ensuring that all processed poses share the same position and orientation; record the transformation matrix $M_{trans}$, whose inverse is $M_{trans}^{-1}$;

(4) Divide the human pose into 5 parts: the preprocessed motion data is a d×T two-dimensional matrix, where d is the number of joints multiplied by 3 (each column is a vector of the x, y, z coordinates of all joints) and T is the number of frames; split this matrix into five submatrices containing the data of the trunk, left arm, right arm, left leg and right leg, the trunk including the head; each submatrix contains only the information of the joints belonging to that part; for each part, slide a window of width $M$ frames, $M$ being a natural number, sequentially along the motion sequence to obtain that part's motion segments, and vectorize each motion segment into a column vector serving as a processing primitive $y_i$, $i = 1, \dots, 5$;

(5) Search the database for segments close to the input motion segment: for the processing primitive $y_i$ vectorized from an $M$-frame input motion segment of a part, traverse every motion segment in the database and compute its similarity to the input segment, the similarity being measured by the mean frame-to-frame Euclidean distance; take the 1200 most similar segments as reconstruction primitives, vectorize each into a column vector, and concatenate these columns into the matrix $D_i$, $D_i$ being the reconstruction dictionary;

(6) Perform linear regression on the noisy motion segment using its neighboring segments: given the vectorized processing primitive $y_i$ of the input segment and its corresponding reconstruction dictionary $D_i$, use a sparse-representation framework to compute the reconstruction coefficients $\omega_i$ of the reconstruction primitives in $D_i$ with respect to $y_i$; for motion data containing Gaussian noise, use l1-ls to solve the sparse representation formula $\omega_i = \arg\min_{\omega_i} \|y_i - D_i\omega_i\|_2^2 + \lambda\|\omega_i\|_1$; for data containing outliers, use convex programming to solve the sparse representation formula $\omega_i = \arg\min_{\omega_i} \|y_i - D_i\omega_i\|_1 + \lambda\|\omega_i\|_1$, where $\lambda$ is the sparse regularization parameter; multiply the reconstruction dictionary $D_i$ by the solved coefficients $\omega_i$ to obtain the denoised motion data $z_i$;

(7) Map the computed motion segments $z_i$ of the five submatrices back to the human joints to obtain the denoised 3D coordinates of every joint in every frame; since these coordinates are in the transformed frame, multiply them by the inverse transformation matrix $M_{trans}^{-1}$ recorded in step (3) to restore the original orientation and offset of the pose; this yields clean human motion data.