CN104268138B - Human body motion capture method fusing a depth map and a three-dimensional model - Google Patents

Human body motion capture method fusing a depth map and a three-dimensional model

Info

Publication number
CN104268138B
CN104268138B CN201410205213.8A CN201410205213A CN 104268138 B
Authority
CN
China
Prior art keywords
human action
action
human
depth
depth information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410205213.8A
Other languages
Chinese (zh)
Other versions
CN104268138A (en
Inventor
肖秦琨
谢艳梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Technological University
Original Assignee
Xian Technological University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Technological University filed Critical Xian Technological University
Priority to CN201410205213.8A priority Critical patent/CN104268138B/en
Publication of CN104268138A publication Critical patent/CN104268138A/en
Application granted granted Critical
Publication of CN104268138B publication Critical patent/CN104268138B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/285Analysis of motion using a sequence of stereo image pairs

Abstract

The present invention relates to a human body motion capture method that fuses a depth map and a three-dimensional model. Optical motion capture methods require marker points to be attached, interaction is inconvenient, and the marker points are easily confused or occluded. The present invention collects the depth information of human actions, removes the moving-target background to obtain complete human action depth information, converts it into human body three-dimensional point cloud information, obtains three-dimensional models of the human actions, and builds a database whose entries correspond one-to-one with an accompanying human action skeleton database. The depth information of the human action to be identified is extracted and used to build its three-dimensional model, which is then matched for similarity against the human actions in the three-dimensional model database; the human action skeleton output by similarity ranking serves as the motion capture result. The present invention requires no sensors or marker points on the human body and is convenient and easy to realize; canonical time warping is used to match the motion sequences, which improves the precision of matching two sequences and greatly reduces the matching time, guaranteeing both the speed and the precision of motion capture.

Description

Human body motion capture method fusing a depth map and a three-dimensional model
Technical field
The invention belongs to the technical field of multimedia information retrieval, and in particular relates to a human body motion capture method that fuses a depth map and a three-dimensional model.
Background technology
Human motion capture technology is a hot issue in the field of multimedia information retrieval and has broad application prospects, particularly in the development of film and television animation, games, and related fields; many research institutions at home and abroad are devoted to research in this direction. In recent years, with the rapid development of motion capture technology and the rise of fields such as three-dimensional film and television animation, games, and new-generation human-computer interaction, applications increasingly need complex and lifelike human actions to be captured quickly, so a fast and effective human motion capture method is required. The optical motion capture methods proposed so far are based mainly on computer vision principles and complete the motion capture task by monitoring and tracking specific luminous points on the target. However, these motion capture methods have the following shortcomings:
(1) Optical motion capture requires marker points to be bound to the performer and requires the performer to wear special performance clothing, so interaction is inconvenient. During complicated movements, the marker points of different body parts are liable to be confused or occluded, producing erroneous results that require manual intervention in post-processing.
(2) Although it can capture motion in real time, the post-processing workload (including marker identification, tracking, and calculation of spatial coordinates) is large, there are certain requirements on the lighting and reflection conditions of the performance venue, and device calibration is rather cumbersome.
Summary of the invention
The object of the present invention is to provide a human body motion capture method fusing a depth map and a three-dimensional model, which can effectively overcome the technical defects of existing motion capture methods, namely restricted movement range, a high degree of distortion in the captured result, and large error.
The technical solution adopted in the present invention is:
A human body motion capture method fusing a depth map and a three-dimensional model, characterised in that:
it is realized by the following steps:
Step 1: collect the depth information of a human action and remove the moving-target background to obtain complete human action depth information;
Step 2: convert the extracted human action depth information into human body three-dimensional point cloud information, perform three-dimensional reconstruction of the human body, and obtain the three-dimensional model of the human action;
Step 3: repeat Step 1 and Step 2, and, based on the large quantity of collected human action depth information, build the human action three-dimensional model database M = {Y_1, Y_2, ..., Y_n}, where M denotes the three-dimensional model database and Y_n denotes the n-th three-dimensional model;
Step 4: construct the human action skeleton according to the human body structure and the three-dimensional model of the human action, and build the human action skeleton database G = {S_1, S_2, ..., S_n}, where G denotes the human skeleton database and S_n denotes the n-th set of human skeleton data; the skeleton data of each human action corresponds one-to-one with its three-dimensional model data;
Step 5: extract the depth information of the human action to be identified, build the three-dimensional model of the human action to be identified based on that depth information, then perform similarity matching against the human actions in the three-dimensional model database, output the human action skeletons ranked by similarity, take the skeleton with the smallest similarity distance as the optimal skeleton, and use the optimal skeleton as the motion capture result;
In Step 1, the specific steps of removing the moving-target background from the collected human action depth information to obtain complete human action depth information are:
(1) a collected human action depth map is represented as F(x, y, d_p), where x and y are the abscissa and ordinate in the pixel coordinate system and d_p is the depth information; the initial threshold for separating the background region from the target region based on the depth information is assumed to be:
T = (maxDepthValue + minDepthValue)/2
where maxDepthValue and minDepthValue are the maximum and minimum of the image depth values; T is recorded in T_0; according to the threshold T, F(x, y, d_p) is divided into a background region and a target region, and the average depth values du_1 and du_2 of the background region and the target region are obtained;
(2) recalculate T = (du_1 + du_2)/2 and judge whether T and T_0 are equal; if they are not equal, record T in T_0 and repeat the above steps until T = T_0 holds, then terminate the algorithm; segment F(x, y, d_p) with the finally obtained T as the optimal threshold, remove the background, and obtain the complete human action depth information d_0;
In Step 2, the specific steps of converting the extracted human action depth information into human body three-dimensional point cloud information, performing three-dimensional reconstruction of the human body, and obtaining the three-dimensional model of the human action are:
(1) normalize the extracted depth information d_0; assuming the number of depth values is N, compute the maximum max d_0(k) and the minimum min d_0(k) of the depth values; the normalized depth value is:
z(n) = (d_0(n) - min d_0(k)) / (max d_0(k) - min d_0(k)),  k, n = 1, 2, ..., N
(2) let Z = z(n); the world coordinate system takes the camera as its origin, and after calibration the camera can be regarded as an ideal imaging model; according to the simple similar-triangles transformation,
the X and Y values of the world coordinate system can be calculated, where x and y are the abscissa and ordinate of the pixel coordinate system and f is the focal length of the camera; the three-dimensional point cloud information (X, Y, Z) of the human action is finally obtained;
In Step 4, the specific steps of constructing the human action skeleton according to the human body structure and the three-dimensional model of the human action are:
(1) the human trunk is approximated as a quadrilateral in the depth map, denoted Q;
the two upper vertices of Q are the positions of the shoulder joint points a_1 and a_2;
the neck joint point b is the midpoint of the line a_1a_2;
moving upward from point b, the topmost point is the position of the head node c;
the two lower vertices of Q are the positions of the hip joints d_1 and d_2;
the hip joint point e is located at the midpoint of d_1 and d_2;
(2) to determine the hand points and elbow joint positions, search starting from a_1 and a_2; if the arm is straight, determine the elbow joint points f_1, f_2 and the hand points g_1, g_2 according to the length ratio of the upper and lower arm; if the arm is bent, there is an inflection point at the elbow joint, whose positions are f_1, f_2, and continuing the search from f_1, f_2 to the end points gives the hand point positions g_1, g_2;
similarly, the knee joint points h_1, h_2 and the foot joint points i_1, i_2 can be determined by the same method used for the hand points and elbow joints;
(3) connect the coordinates of the joint points with straight lines in the order of the human body structure to obtain the human action skeleton;
In Step 5, the specific steps of extracting the depth information of the human action to be identified, building the three-dimensional model of the human action to be identified based on the depth information, and then performing similarity matching against the human actions in the three-dimensional model database are:
(1) cluster the human action represented by the three-dimensional model with a hierarchical clustering algorithm:
1. assume a parameter m denotes the number of final clusters; treat each input datum as a single data cluster D_0 and find the nearest cluster D_x adjacent to each cluster, where the distance between data clusters can be taken as the distance between the cluster centres;
2. merge the two nearest data clusters D_p and D_q to generate a new data cluster D_n, then calculate the distance between D_n and the other clusters; if the number of current data clusters is greater than m, branch back to sub-step 2 and continue merging data clusters; otherwise the algorithm terminates and the clustering result D = {u_1, u_2, ..., u_m} is obtained, where u denotes the value of each class;
(2) calculate the distance between each action frame of the human action sequence to be identified and each action frame of the human action sequences in the three-dimensional model database:
1. assume that any clustering result of a three-dimensional model of the human action to be identified and any clustering result from the database are expressed as:
D_i = {u'_1, u'_2, ..., u'_m}
D_j = {v_1, v_2, ..., v_m}
starting from the root node of the clustering tree, perform a depth-first search down to the leaf nodes;
2. during the traversal search, compute the distance between corresponding classes of D_i and D_j and sum the distances of all the classes to obtain the Euclidean distance between the two clustering trees;
(3) compute the optimal matching path with the canonical time warping method to obtain the optimal matching sequence:
1. assume the three-dimensional model action sequence to be identified is denoted X and each action sequence in the three-dimensional model database is denoted Y, where m and n denote the lengths of the two action sequences; the distance between each pair of action frames in the two sequences can be calculated by the method above;
2. since m and n may be unequal, there may be many possible matching relationships between them; the canonical time warping method is used to compute the minimum distance J between the sequences while obtaining the optimal matching paths P_x and P_y;
here, W_x = W(p_x) ∈ {0,1}^(m×l) and W_y = W(p_y) ∈ {0,1}^(n×l) are two binary warping matrices; for each step t ∈ {1, ..., l}, w_(p_t, t) = 1, and the entries are 0 in all other cases, where l is the number of action frames to be matched, selected automatically by the canonical time warping algorithm, with l ≥ max(m, n); φ(·) is the regularization term;
P_x and P_y denote all possible alignment paths; the constraints that must be satisfied are:
Ψ = {{P_x, P_y} | P_x ∈ (1:m)^l, P_y ∈ (1:n)^l}
Boundary:
Monotonicity:
Continuity:
the two linear transformation matrices, with d ≤ min(d_x, d_y), must satisfy the constraint:
wherein λ ∈ [0,1];
3. repeat the above steps and, using the canonical time warping method, compute for each action sequence Y in the three-dimensional model database its minimum distance J from X together with the optimal matching paths P_x and P_y; then take the minimum of all the J values and its corresponding P_x and P_y, thereby obtaining the corresponding best-match sequence.
The present invention has the following advantages:
(1) The method of the present invention uses a suitable device to collect depth information; no sensors or marker points need to be installed on the performer's body, the range of movement is large, the requirements on the performance venue are few, and the method is convenient and easy to realize.
(2) Conventional markerless motion capture techniques are mainly video-based and achieve motion capture by shooting two-dimensional image sequences. The present invention fuses depth information, which solves the problem that, in a two-dimensional image without depth information, self-occlusion of the body causes body-part information to be lost, guaranteeing the completeness and reliability of information acquisition.
(3) The present invention uses hierarchical clustering, in which the distance between data clusters and the similarity rule are easy to define; the large volume of three-dimensional point cloud data is reduced to several classes so that the subsequent depth-first search can compute the Euclidean distance between two frames of three-dimensional models, reducing the amount of calculation.
(4) The present invention matches motion sequences with canonical time warping (Canonical Time Warping, CTW). This method is robust: even if the time scales of the test pattern sequence and of the reference sequences in the database are not completely consistent, it can still complete the pattern matching between the test sequence and the reference sequences well. CTW is a dynamic-programming-based method; the objective function in sub-step 2 of Step 5(3) adds a regularization term and two binary warping matrices to the CTW objective function, which makes a unique solution possible, improves the precision of matching the two sequences, greatly reduces the matching time, and guarantees the speed and precision of motion capture.
Brief description of the drawings
Fig. 1 is the overall flow chart of the method of the present invention.
Fig. 2 is the detailed flow block diagram of Step 1 of the present invention.
Fig. 3 is the detailed flow block diagram of Step 5 of the present invention.
Embodiment
The present invention is described in detail below with reference to the embodiments.
The human body motion capture method of the present invention, which fuses a depth map and a three-dimensional model, is realized by the following steps:
Step 1: use a suitable device to collect the depth information of a human action and remove the moving-target background to obtain complete human action depth information.
The specific steps of removing the moving-target background from the collected human action depth information to obtain complete human action depth information are:
(1) a collected human action depth map is represented as F(x, y, d_p), where x and y are the abscissa and ordinate in the pixel coordinate system and d_p is the depth information; the initial threshold for separating the background region from the target region based on the depth information is assumed to be:
T = (maxDepthValue + minDepthValue)/2
where maxDepthValue and minDepthValue are the maximum and minimum of the image depth values; T is recorded in T_0; according to the threshold T, F(x, y, d_p) is divided into a background region and a target region, and the average depth values du_1 and du_2 of the background region and the target region are obtained;
(2) recalculate T = (du_1 + du_2)/2 and judge whether T and T_0 are equal; if they are not equal, record T in T_0 and repeat the above steps until T = T_0 holds, then terminate the algorithm; segment F(x, y, d_p) with the finally obtained T as the optimal threshold, remove the background, and obtain the complete human action depth information d_0.
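The iterative thresholding above can be illustrated with the short Python sketch below. It is a minimal sketch, not the patent's implementation: the array name depth_map, the use of NumPy, the convention that a depth of 0 marks invalid pixels, and the assumption that the subject lies on the nearer side of the final threshold are all assumptions introduced here for illustration.

```python
import numpy as np

def segment_foreground(depth_map, eps=1e-6):
    """Iterative threshold segmentation of a depth map (sketch of Step 1).

    depth_map: 2-D NumPy array of depth values d_p; pixels equal to 0 are
    treated as invalid (an assumption of this sketch).
    Returns the depth map with the background removed and the final threshold.
    """
    valid = depth_map[depth_map > 0].astype(np.float64)
    # Initial threshold T = (maxDepthValue + minDepthValue) / 2
    t = (valid.max() + valid.min()) / 2.0
    while True:
        t0 = t
        target = valid[valid <= t0]      # assumed: subject is nearer than the background
        background = valid[valid > t0]
        du1, du2 = background.mean(), target.mean()
        # Recompute T = (du1 + du2) / 2 and stop once T no longer changes
        t = (du1 + du2) / 2.0
        if abs(t - t0) < eps:
            break
    # Keep only foreground depths d_0; everything beyond T is removed as background
    d0 = np.where((depth_map > 0) & (depth_map <= t), depth_map, 0)
    return d0, t
```

Whether the subject lies on the near or far side of the threshold depends on the capture setup, so the comparison direction in the sketch may need to be reversed.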
Step 2: convert the extracted human action depth information into human body three-dimensional point cloud information, perform three-dimensional reconstruction of the human body, and obtain the three-dimensional model of the human action. The specific steps are:
(1) normalize the extracted depth information d_0; assuming the number of depth values is N, compute the maximum max d_0(k) and the minimum min d_0(k) of the depth values; the normalized depth value is:
z(n) = (d_0(n) - min d_0(k)) / (max d_0(k) - min d_0(k)),  k, n = 1, 2, ..., N
(2) let Z = z(n); the world coordinate system takes the camera as its origin, and after calibration the camera can be regarded as an ideal imaging model; according to the simple similar-triangles transformation,
the X and Y values of the world coordinate system can be calculated, where x and y are the abscissa and ordinate of the pixel coordinate system and f is the focal length of the camera. The three-dimensional point cloud information (X, Y, Z) of the human action is finally obtained.
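A minimal sketch of this back-projection step is given below. The pinhole-model relations X = x·Z/f and Y = y·Z/f follow the similar-triangles argument referred to above (whose formula is not reproduced in this text); the absence of a principal-point offset, the function name, and the NumPy representation are assumptions of the sketch.

```python
import numpy as np

def depth_to_point_cloud(d0, focal_length):
    """Convert the segmented depth map d_0 into a 3-D point cloud (sketch of Step 2)."""
    ys, xs = np.nonzero(d0)                    # pixel coordinates of the foreground points
    d = d0[ys, xs].astype(np.float64)
    # Normalization: Z(n) = (d_0(n) - min d_0(k)) / (max d_0(k) - min d_0(k))
    Z = (d - d.min()) / (d.max() - d.min())
    # Similar triangles of an ideal pinhole camera (assumed): X = x*Z/f, Y = y*Z/f
    X = xs * Z / focal_length
    Y = ys * Z / focal_length
    return np.column_stack([X, Y, Z])          # N x 3 array of points (X, Y, Z)
```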
Step 3: repeat Step 1 and Step 2, and, based on as much collected human action depth information as possible, build the human action three-dimensional model database M = {Y_1, Y_2, ..., Y_n}, where M denotes the three-dimensional model database and Y_n denotes the n-th three-dimensional model.
Step 4: construct the human action skeleton according to the human body structure and the three-dimensional model of the human action, and build the human action skeleton database G = {S_1, S_2, ..., S_n}, where G denotes the human skeleton database and S_n denotes the n-th set of human skeleton data; the skeleton data of each human action corresponds one-to-one with its three-dimensional model data.
The specific steps of constructing the human action skeleton according to the human body structure and the three-dimensional model of the human action are:
(1) the human trunk is approximated as a quadrilateral in the depth map, denoted Q (a geometric sketch of the fixed joints derived from Q is given after sub-step (3));
the two upper vertices of Q are the positions of the shoulder joint points a_1 and a_2;
the neck joint point b is the midpoint of the line a_1a_2;
moving upward from point b, the topmost point is the position of the head node c;
the two lower vertices of Q are the positions of the hip joints d_1 and d_2;
the hip joint point e is located at the midpoint of d_1 and d_2;
(2) to determine the hand points and elbow joint positions, search starting from a_1 and a_2; if the arm is straight, determine the elbow joint points f_1, f_2 and the hand points g_1, g_2 according to the length ratio of the upper and lower arm; if the arm is bent, there is an inflection point at the elbow joint, whose positions are f_1, f_2, and continuing the search from f_1, f_2 to the end points gives the hand point positions g_1, g_2;
similarly, the knee joint points h_1, h_2 and the foot joint points i_1, i_2 can be determined by the same method used for the hand points and elbow joints;
(3) connect the coordinates of the joint points with straight lines in the order of the human body structure to obtain the human action skeleton.
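The fixed joints derived from the trunk quadrilateral can be illustrated with the small geometric sketch below; the corner ordering of Q, the function name, and the dictionary of joint names are assumptions used only for illustration, and the limb search of sub-step (2) is not reproduced.

```python
import numpy as np

def torso_joints(q_corners):
    """Fixed joints of Step 4(1) from the trunk quadrilateral Q.

    q_corners: the four corners of Q as 2-D image points, assumed ordered
    (top-left, top-right, bottom-right, bottom-left).
    """
    a1, a2, d2, d1 = (np.asarray(c, dtype=float) for c in q_corners)
    b = (a1 + a2) / 2.0              # neck joint: midpoint of the shoulder line a1-a2
    e = (d1 + d2) / 2.0              # hip joint point: midpoint of d1 and d2
    return {"shoulder_left": a1, "shoulder_right": a2, "neck": b,
            "hip_left": d1, "hip_right": d2, "hip_centre": e}
```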
Step 5: extract the depth information of the human action to be identified, build the three-dimensional model of the human action to be identified based on that depth information, then perform similarity matching against the human actions in the three-dimensional model database, output the human action skeletons ranked by similarity, take the skeleton with the smallest similarity distance as the optimal skeleton, and use the optimal skeleton as the motion capture result.
The specific steps of extracting the depth information of the human action to be identified, building its three-dimensional model based on the depth information, and then performing similarity matching against the human actions in the three-dimensional model database are:
(1) cluster the human actions represented by the three-dimensional models with a hierarchical clustering algorithm (a code sketch of this clustering step is given after the following sub-steps):
1. assume a parameter m denotes the number of final clusters; treat each input datum as a single data cluster D_0 and find the nearest cluster D_x adjacent to each cluster, where the distance between data clusters can be taken as the distance between the cluster centres;
2. merge the two nearest data clusters D_p and D_q to generate a new data cluster D_n, then calculate the distance between D_n and the other clusters; if the number of current data clusters is greater than m, branch back to sub-step 2 and continue merging data clusters; otherwise the algorithm terminates and the clustering result D = {u_1, u_2, ..., u_m} is obtained, where u denotes the value of each class;
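The clustering sub-steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: SciPy's agglomerative linkage with centroid distances stands in for the explicit merge loop, and the function and variable names are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_frame(points, m):
    """Agglomerative clustering of one frame's point cloud into m classes
    (sketch of Step 5(1)); the cluster distance is the distance between centroids."""
    tree = linkage(points, method="centroid")           # repeatedly merge the nearest clusters
    labels = fcluster(tree, t=m, criterion="maxclust")  # stop once m clusters remain
    # u_k: the value of each class, taken here as the class centroid (assumed)
    centroids = np.array([points[labels == k].mean(axis=0) for k in range(1, m + 1)])
    return labels, centroids
```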
(2) calculate the distance between each action frame of the human action sequence to be identified and each action frame of the human action sequences in the three-dimensional model database (a simplified sketch follows these sub-steps):
1. assume that any clustering result of a three-dimensional model of the human action to be identified and any clustering result from the database are expressed as:
D_i = {u'_1, u'_2, ..., u'_m}
D_j = {v_1, v_2, ..., v_m}
starting from the root node of the clustering tree, perform a depth-first search down to the leaf nodes;
2. during the traversal search, compute the distance between corresponding classes of D_i and D_j and sum the distances of all the classes to obtain the Euclidean distance between the two clustering trees;
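A simplified sketch of this frame-to-frame distance follows. It represents each clustering result directly by its m class values rather than by an explicit tree traversal, and summing the per-class Euclidean distances is only one plausible reading of the formula, which is not reproduced in this text; the function names are assumptions.

```python
import numpy as np

def frame_distance(Di, Dj):
    """Distance between two clustered frames Di = {u'_1..u'_m}, Dj = {v_1..v_m}
    (sketch of Step 5(2)); each argument is an (m, 3) array of class values."""
    return float(np.sum(np.linalg.norm(Di - Dj, axis=1)))

def distance_matrix(frames_x, frames_y):
    """Pairwise distances between every frame of sequence X and every frame of sequence Y."""
    return np.array([[frame_distance(a, b) for b in frames_y] for a in frames_x])
```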
(3) compute the optimal matching path with the canonical time warping (Canonical Time Warping, CTW) method to obtain the optimal matching sequence (a sketch of the alignment core is given after these sub-steps):
1. assume the three-dimensional model action sequence to be identified is denoted X and each action sequence in the three-dimensional model database is denoted Y, where m and n denote the lengths of the two action sequences; the distance between each pair of action frames in the two sequences can be calculated by the method above;
2. since m and n may be unequal, there may be many possible matching relationships between them; the canonical time warping method is used to compute the minimum distance J between the sequences while obtaining the optimal matching paths P_x and P_y;
here, W_x = W(p_x) ∈ {0,1}^(m×l) and W_y = W(p_y) ∈ {0,1}^(n×l) are two binary warping matrices; for each step t ∈ {1, ..., l}, w_(p_t, t) = 1, and the entries are 0 in all other cases, where l is the number of action frames to be matched, selected automatically by the canonical time warping algorithm, with l ≥ max(m, n). φ(·) is the regularization term;
P_x and P_y denote all possible alignment paths; the constraints that must be satisfied are:
Ψ = {{P_x, P_y} | P_x ∈ (1:m)^l, P_y ∈ (1:n)^l}
Boundary:
Monotonicity:
Continuity:
the two linear transformation matrices, with d ≤ min(d_x, d_y), must satisfy the constraint:
wherein λ ∈ [0,1];
3. repeat the above steps and, using the canonical time warping method, compute for each action sequence Y in the three-dimensional model database its minimum distance J from X together with the optimal matching paths P_x and P_y. Then take the minimum of all the J values and its corresponding P_x and P_y, thereby obtaining the corresponding best-match sequence.
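Full canonical time warping alternates between a temporal alignment and a linear (CCA-style) spatial transform of the two sequences. The sketch below gives only the dynamic-programming alignment core applied to the frame-distance matrix of Step 5(2), which yields the accumulated distance J and the matching paths P_x, P_y under the boundary, monotonicity, and continuity constraints; the omission of the spatial transform and the regularization term, and the function name, are assumptions of this sketch.

```python
import numpy as np

def align_sequences(D):
    """Dynamic-programming temporal alignment (the DTW core inside CTW).

    D: (m, n) matrix of frame distances between sequence X (length m)
    and sequence Y (length n).
    Returns the accumulated minimum distance J and the matching paths P_x, P_y.
    """
    m, n = D.shape
    cost = np.full((m, n), np.inf)
    cost[0, 0] = D[0, 0]
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            prev = min(cost[i - 1, j] if i > 0 else np.inf,
                       cost[i, j - 1] if j > 0 else np.inf,
                       cost[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            cost[i, j] = D[i, j] + prev
    # Backtrack the optimal alignment path from (m-1, n-1) to (0, 0)
    Px, Py = [m - 1], [n - 1]
    i, j = m - 1, n - 1
    while i > 0 or j > 0:
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((cost[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        Px.append(i)
        Py.append(j)
    Px.reverse()
    Py.reverse()
    return float(cost[m - 1, n - 1]), Px, Py
```

The motion capture result of Step 5 is then taken from the database sequence whose J value is smallest, with its skeleton read off through the one-to-one correspondence established in Step 4.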
The content of the present invention is not limited to the examples cited in the embodiments; any equivalent transformation of the technical solution of the present invention adopted by a person of ordinary skill in the art upon reading the description of the invention is covered by the claims of the invention.

Claims (1)

1. A human body motion capture method fusing a depth map and a three-dimensional model, characterised in that:
it is realized by the following steps:
Step 1: collect the depth information of a human action and remove the moving-target background to obtain complete human action depth information;
Step 2: convert the extracted human action depth information into human body three-dimensional point cloud information, perform three-dimensional reconstruction of the human body, and obtain the three-dimensional model of the human action;
Step 3: repeat Step 1 and Step 2, and, based on the large quantity of collected human action depth information, build the human action three-dimensional model database M = {Y_1, Y_2, ..., Y_n}, where M denotes the three-dimensional model database and Y_n denotes the n-th three-dimensional model;
Step 4: construct the human action skeleton according to the human body structure and the three-dimensional model of the human action, and build the human action skeleton database G = {S_1, S_2, ..., S_n}, where G denotes the human skeleton database and S_n denotes the n-th set of human skeleton data; the skeleton data of each human action corresponds one-to-one with its three-dimensional model data;
Step 5: extract the depth information of the human action to be identified, build the three-dimensional model of the human action to be identified based on that depth information, then perform similarity matching against the human actions in the three-dimensional model database, output the human action skeletons ranked by similarity, take the skeleton with the smallest similarity distance as the optimal skeleton, and use the optimal skeleton as the motion capture result;
In Step 1, the specific steps of removing the moving-target background from the collected human action depth information to obtain complete human action depth information are:
(1) a collected human action depth map is represented as F(x, y, d_p), where x and y are the abscissa and ordinate in the pixel coordinate system and d_p is the depth information; the initial threshold for separating the background region from the target region based on the depth information is assumed to be:
T = (maxDepthValue + minDepthValue)/2
where maxDepthValue and minDepthValue are the maximum and minimum of the image depth values; T is recorded in T_0; according to the threshold T, F(x, y, d_p) is divided into a background region and a target region, and the average depth values du_1 and du_2 of the background region and the target region are obtained;
(2) recalculate T = (du_1 + du_2)/2 and judge whether T and T_0 are equal; if they are not equal, record T in T_0 and repeat the above steps until T = T_0 holds, then terminate the algorithm; segment F(x, y, d_p) with the finally obtained T as the optimal threshold, remove the background, and obtain the complete human action depth information d_0;
In Step 2, the specific steps of converting the extracted human action depth information into human body three-dimensional point cloud information, performing three-dimensional reconstruction of the human body, and obtaining the three-dimensional model of the human action are:
(1) normalize the extracted depth information d_0; assuming the number of depth values is N, compute the maximum max d_0(k) and the minimum min d_0(k) of the depth values; the normalized depth value is:
z(n) = (d_0(n) - min d_0(k)) / (max d_0(k) - min d_0(k)),  k, n = 1, 2, ..., N
(2) let Z = z(n); the world coordinate system takes the camera as its origin, and after calibration the camera can be regarded as an ideal imaging model; according to the simple similar-triangles transformation,
the X and Y values of the world coordinate system can be calculated, where x and y are the abscissa and ordinate of the pixel coordinate system and f is the focal length of the camera; the three-dimensional point cloud information (X, Y, Z) of the human action is finally obtained;
In Step 4, the specific steps of constructing the human action skeleton according to the human body structure and the three-dimensional model of the human action are:
(1) the human trunk is approximated as a quadrilateral in the depth map, denoted Q;
the two upper vertices of Q are the positions of the shoulder joint points a_1 and a_2;
the neck joint point b is the midpoint of the line a_1a_2;
moving upward from point b, the topmost point is the position of the head node c;
the two lower vertices of Q are the positions of the hip joints d_1 and d_2;
the hip joint point e is located at the midpoint of d_1 and d_2;
(2) to determine the hand points and elbow joint positions, search starting from a_1 and a_2; if the arm is straight, determine the elbow joint points f_1, f_2 and the hand points g_1, g_2 according to the length ratio of the upper and lower arm; if the arm is bent, there is an inflection point at the elbow joint, whose positions are f_1, f_2, and continuing the search from f_1, f_2 to the end points gives the hand point positions g_1, g_2;
similarly, the knee joint points h_1, h_2 and the foot joint points i_1, i_2 can be determined by the same method used for the hand points and elbow joints;
(3) connect the coordinates of the joint points with straight lines in the order of the human body structure to obtain the human action skeleton;
In Step 5, the specific steps of extracting the depth information of the human action to be identified, building the three-dimensional model of the human action to be identified based on the depth information, and then performing similarity matching against the human actions in the three-dimensional model database are:
(1) cluster the human action represented by the three-dimensional model with a hierarchical clustering algorithm:
1. assume a parameter m denotes the number of final clusters; treat each input datum as a single data cluster D_0 and find the nearest cluster D_x adjacent to each cluster, where the distance between data clusters can be taken as the distance between the cluster centres;
2. merge the two nearest data clusters D_p and D_q to generate a new data cluster D_n, then calculate the distance between D_n and the other clusters; if the number of current data clusters is greater than m, branch back to sub-step 2 and continue merging data clusters; otherwise the algorithm terminates and the clustering result D = {u_1, u_2, ..., u_m} is obtained, where u denotes the value of each class;
(2) calculate the distance between each action frame of the human action sequence to be identified and each action frame of the human action sequences in the three-dimensional model database:
1. assume that any clustering result of a three-dimensional model of the human action to be identified and any clustering result from the database are expressed as:
D_i = {u'_1, u'_2, ..., u'_m}
D_j = {v_1, v_2, ..., v_m}
starting from the root node of the clustering tree, perform a depth-first search down to the leaf nodes;
2. during the traversal search, compute the distance between corresponding classes of D_i and D_j and sum the distances of all the classes to obtain the Euclidean distance between the two clustering trees;
(3) compute the optimal matching path with the canonical time warping method to obtain the optimal matching sequence:
1. assume the three-dimensional model action sequence to be identified is denoted X and each action sequence in the three-dimensional model database is denoted Y, where m and n denote the lengths of the two action sequences; the distance between each pair of action frames in the two sequences can be calculated by the method above;
2. since m and n may be unequal, there may be many possible matching relationships between them; the canonical time warping method is used to compute the minimum distance J between the sequences while obtaining the optimal matching paths P_x and P_y;
here, W_x = W(p_x) ∈ {0,1}^(m×l) and W_y = W(p_y) ∈ {0,1}^(n×l) are two binary warping matrices; for each step t ∈ {1, ..., l}, w_(p_t, t) = 1, and the entries are 0 in all other cases, where l is the number of action frames to be matched, selected automatically by the canonical time warping algorithm, with l ≥ max(m, n); φ(·) is the regularization term;
P_x and P_y denote all possible alignment paths; the constraints that must be satisfied are:
Ψ = {{P_x, P_y} | P_x ∈ (1:m)^l, P_y ∈ (1:n)^l}
Boundary:
Monotonicity:
Continuity:
the two linear transformation matrices, with d ≤ min(d_x, d_y), must satisfy the constraint:
wherein λ ∈ [0,1];
3. repeat the above steps and, using the canonical time warping method, compute for each action sequence Y in the three-dimensional model database its minimum distance J from X together with the optimal matching paths P_x and P_y; then take the minimum of all the J values and its corresponding P_x and P_y, thereby obtaining the corresponding best-match sequence.
CN201410205213.8A 2014-05-15 2014-05-15 Human body motion capture method fusing a depth map and a three-dimensional model Expired - Fee Related CN104268138B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410205213.8A CN104268138B (en) 2014-05-15 2014-05-15 Human body motion capture method fusing a depth map and a three-dimensional model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410205213.8A CN104268138B (en) 2014-05-15 2014-05-15 Human body motion capture method fusing a depth map and a three-dimensional model

Publications (2)

Publication Number Publication Date
CN104268138A CN104268138A (en) 2015-01-07
CN104268138B true CN104268138B (en) 2017-08-15

Family

ID=52159660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410205213.8A Expired - Fee Related CN104268138B (en) 2014-05-15 2014-05-15 Human body motion capture method fusing a depth map and a three-dimensional model

Country Status (1)

Country Link
CN (1) CN104268138B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104700452B (en) * 2015-03-24 2016-03-02 中国人民解放军国防科学技术大学 A kind of 3 D human body attitude mode matching process towards any attitude
CN104680582B (en) * 2015-03-24 2016-02-24 中国人民解放军国防科学技术大学 A kind of three-dimensional (3 D) manikin creation method of object-oriented customization
CN105069423B (en) * 2015-07-29 2018-11-09 北京格灵深瞳信息技术有限公司 A kind of human body attitude detection method and device
CN105335722B (en) * 2015-10-30 2021-02-02 商汤集团有限公司 Detection system and method based on depth image information
JP6596309B2 (en) * 2015-11-11 2019-10-23 株式会社東芝 Analysis apparatus and analysis method
CN105930773A (en) * 2016-04-13 2016-09-07 中国农业大学 Motion identification method and device
CN106441275A (en) * 2016-09-23 2017-02-22 深圳大学 Method and device for updating planned path of robot
CN106599806A (en) * 2016-12-01 2017-04-26 西安理工大学 Local curved-surface geometric feature-based human body action recognition method
US20180295338A1 (en) * 2017-04-10 2018-10-11 Eys3D Microelectronics, Co. Depth processing system capable of capturing depth information from multiple viewing points
CN106998430B (en) * 2017-04-28 2020-07-21 北京瑞盖科技股份有限公司 Multi-camera-based 360-degree video playback method
CN107212975A (en) * 2017-07-17 2017-09-29 徐彬 Wheelchair and method with intelligent rescue function
CN107551551B (en) * 2017-08-09 2021-03-26 Oppo广东移动通信有限公司 Game effect construction method and device
CN108096836B (en) * 2017-12-20 2021-05-04 深圳市百恩互动娱乐有限公司 Method for making game by real-person real shooting
CN108121963B (en) * 2017-12-21 2021-08-24 北京奇虎科技有限公司 Video data processing method and device and computing equipment
CN108510577B (en) * 2018-01-31 2021-03-23 中国科学院软件研究所 Realistic motion migration and generation method and system based on existing motion data
CN108392207B (en) * 2018-02-09 2020-12-11 西北大学 Gesture tag-based action recognition method
CN108563329B (en) * 2018-03-23 2021-04-27 上海数迹智能科技有限公司 Human body arm position parameter extraction algorithm based on depth map
CN109215128B (en) * 2018-08-09 2019-12-24 北京华捷艾米科技有限公司 Object motion attitude image synthesis method and system
CN110020611B (en) * 2019-03-17 2020-12-08 浙江大学 Multi-person motion capture method based on three-dimensional hypothesis space clustering
CN110276266B (en) * 2019-05-28 2021-09-10 暗物智能科技(广州)有限公司 Rotation-based point cloud data processing method and device and terminal equipment
CN111354075A (en) * 2020-02-27 2020-06-30 青岛联合创智科技有限公司 Foreground reduction interference extraction method in three-dimensional reconstruction
CN112330815A (en) * 2020-11-26 2021-02-05 北京百度网讯科技有限公司 Three-dimensional point cloud data processing method, device and equipment based on obstacle fusion
CN117769713A (en) * 2021-09-14 2024-03-26 日本电气株式会社 Method and apparatus for determining abnormal behavior during a period
CN114694263B (en) * 2022-05-30 2022-09-02 深圳智华科技发展有限公司 Action recognition method, device, equipment and storage medium
CN116385663B (en) * 2023-05-26 2023-08-29 北京七维视觉传媒科技有限公司 Action data generation method and device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103294996A (en) * 2013-05-09 2013-09-11 电子科技大学 3D gesture recognition method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103294996A (en) * 2013-05-09 2013-09-11 电子科技大学 3D gesture recognition method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Canonical time warping for alignment of human behavior; Feng Zhou et al.; Carnegie Mellon University; 2009-12-31; entire document *
Human-like character animation of maize driven by motion capture data; Wang Xue et al.; Information and Computational Science; 2011-12-31; entire document *
Dynamic depth data matching and its applications; 陈炉军; China Masters' Theses Full-text Database, Information Science and Technology; 2014-04-15; main text pp. 10-11, 14, 20-21, Figures 2-3 and 2-8 *
Research on skeleton localization based on the Kinect sensor; 罗鸣; China Masters' Theses Full-text Database, Information Science and Technology; 2014-04-15; main text p. 10, Figure 2.5 *

Also Published As

Publication number Publication date
CN104268138A (en) 2015-01-07

Similar Documents

Publication Publication Date Title
CN104268138B (en) Human body motion capture method fusing a depth map and a three-dimensional model
CN102682302B (en) Human body posture identification method based on multi-characteristic fusion of key frame
Schonberger et al. Structure-from-motion revisited
CN102855470B (en) Estimation method of human posture based on depth image
CN108537191B (en) Three-dimensional face recognition method based on structured light camera
JP6025845B2 (en) Object posture search apparatus and method
CN108154104B (en) Human body posture estimation method based on depth image super-pixel combined features
WO2015149302A1 (en) Method for rebuilding tree model on the basis of point cloud and data driving
CN105930770B (en) A kind of human motion recognition method based on Gaussian process latent variable model
Ran et al. Applications of a simple characterization of human gait in surveillance
Zhang et al. Data-driven synthetic modeling of trees
CN105931283B (en) A kind of 3-dimensional digital content intelligence production cloud platform based on motion capture big data
WO2022213612A1 (en) Non-contact three-dimensional human body size measurement method
CN108305283A (en) Human bodys' response method and device based on depth camera and basic form
CN102467753A (en) Method and system for reconstructing time-varying point cloud based on framework registration
Uddin et al. Human Activity Recognition via 3-D joint angle features and Hidden Markov models
WO2017079918A1 (en) Indoor scene scanning reconstruction method and apparatus
Ma et al. Automatic branch detection of jujube trees based on 3D reconstruction for dormant pruning using the deep learning-based method
CN109101864A (en) The upper half of human body action identification method returned based on key frame and random forest
CN108648194A (en) Based on the segmentation of CAD model Three-dimensional target recognition and pose measuring method and device
CN109766873A (en) A kind of pedestrian mixing deformable convolution recognition methods again
Zhao et al. Character‐object interaction retrieval using the interaction bisector surface
Song et al. Attention-oriented action recognition for real-time human-robot interaction
Wang et al. Action recognition using edge trajectories and motion acceleration descriptor
CN114689038A (en) Fruit detection positioning and orchard map construction method based on machine vision

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170815

Termination date: 20190515

CF01 Termination of patent right due to non-payment of annual fee