CN100369064C - Human body posture deforming method based on video content - Google Patents


Info

Publication number
CN100369064C
CN100369064C · CNB200510012176XA · CN200510012176A
Authority
CN
China
Prior art keywords
video
human body
point
model
contour
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB200510012176XA
Other languages
Chinese (zh)
Other versions
CN1725246A (en)
Inventor
邱显杰
王兆其
夏时洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DEQING ZHONGKE FINANCE INFORMATION TECHNOLOGY Co Ltd
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CNB200510012176XA
Publication of CN1725246A
Application granted
Publication of CN100369064C
Legal status: Expired - Fee Related

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The present invention relates to the field of computer applications, and in particular to a human body posture deformation method based on video content, used to recover the 3D human body structure corresponding to a video from the video content, given known initial three-dimensional posture information of the body. The method comprises the following steps: a 3D human body model is customized according to the video content, and the 3D human body posture is projected to generate a model skeleton; human body contour information in the video is extracted, and a point-set correspondence between the video contour and the model contour is established; the 2D skeleton in the model contour is transplanted to the video contour, and the 3D structural parameters of the human body in the video are recovered. The invention places low demands on the quality of video contour extraction and is therefore robust; it can be applied to recover the 3D structural parameters of different types of moving objects whose initial 3D information is known, and thus generalizes well; and it requires only simple, efficient two-dimensional operations, achieving a real-time effect.

Description

Human body posture deformation method based on video content
Technical Field
The invention relates to the technical field of computer application, in particular to a human body posture deformation method based on video content.
Background
Acquiring three-dimensional human body posture information from two-dimensional video is a hot and difficult problem in computer vision, pattern recognition, virtual reality, and intelligent human-computer interfaces. Herein, by convention, two-dimensional is uniformly abbreviated as 2D and three-dimensional as 3D.
Given the 3D information of the initial human body posture, deforming that initial posture according to the video content to obtain the corresponding 3D information in the video is a very important sub-problem, with significant research value and wide application prospects in games, keyframe 3D animation, and example-based 3D information acquisition. For example, in video-based human motion analysis, only a small sample library of 3D human body postures is needed to recover the corresponding posture information from the video images. Likewise, in animation, given only a small sample of 2D keyframes and their corresponding 3D poses, the 3D information of all 2D animation frames can be recovered, easily achieving the effect of 3D animation.
The human body posture deformation technique based on video content therefore has substantial theoretical significance as well as broad application fields and important practical value. However, existing video-based motion analysis and reconstruction software, domestic or foreign, does not provide this capability, and a search of related patents found no relevant prior filings.
Disclosure of Invention
The invention aims to provide a human body posture deformation method based on video content, which realizes that the 3D posture is deformed according to the video content under the condition that the initial 3D posture is known, so that the corresponding human body 3D posture information in the video is recovered.
In order to achieve the above object, the present invention provides a human body posture deformation method based on video content, which is used for deforming an initial human body 3D posture according to video content based on the known initial human body three-dimensional posture information and according to the human body information content in the video, so as to find out corresponding 3D human body structure information in the video; the method comprises the following steps:
1) Customizing a 3D human body model according to the video content;
2) Projecting the 3D human body posture (surface geometric model description) customized in the step 1 on a 2D plane to generate a 2D model outline, and representing the 2D model outline by using a sampling point set;
3) Projecting the 3D human body posture (skeleton model description) on a 2D plane to generate a 2D model skeleton;
4) Extracting human body contour information in the video and expressing the human body contour information by using a sampling point set;
5) Establishing a point set corresponding relation between a human body contour and a model contour in a video;
6) Transplanting the 2D skeleton in the model outline to the human body outline in the video;
7) Recovering the 3D structural parameters of the human body in the video.
In the above technical solution, the establishing of the point set corresponding relationship between the human body contour and the model contour in the video in step 5) is realized as follows:
calculating the shape context of each point in the contour, namely establishing the measurement of the point and all other points in the contour by distance and angle;
and taking shape context as a standard for measuring the similarity of the feature points, wherein the two feature points with the most similar shape context in the two contours are matched feature points, so that the point set corresponding relation between the model contour and the video contour is established.
In the above technical solution, the step 6) of transplanting the 2D skeleton in the model contour to the video contour is implemented as follows:
determining a support set of each joint point of the 2D skeleton;
determining affine transformation relations of all support sets of the 2D joint points through the established point set corresponding relations between the two contours;
and performing on the joint point positions of the 2D skeleton of the model contour the same affine transformation determined from the support set, the transformation result being the 2D skeleton joint point positions of the video contour.
In the above technical solution, the restoring of the 3D structure parameters of the human body in the video in step 7) is implemented as follows:
only the known initial 3D human skeleton needs to be transformed.
The transformation of the known initial 3D human skeleton is realized by the following steps:
keeping the data in the depth (Z) direction of the original data unchanged, and
applying to the data in the X and Y directions the same affine transformation previously determined for the corresponding projected 2D joint points.
The invention has the advantages that:
1. the method realizes the human body posture deformation method based on the video content, not only has important theoretical significance, but also has wide application range and important use value.
2. The method can be used for various types of moving objects and has good universality.
3. The invention only needs simple and efficient two-dimensional operation and can achieve the real-time effect.
Drawings
Fig. 1 is a flow chart of a human body posture deformation technology based on video content.
Detailed Description
The method of the present invention will be further described with reference to the accompanying drawings.
As shown in fig. 1, it is a flowchart of the method of this embodiment, in which the dashed boxes indicate operations, and the solid boxes indicate results obtained by related operations.
The human body posture deformation technology based on the video content mainly comprises the following steps:
step 1, customizing a 3D human body model according to video content;
step 2, projecting the 3D human body posture (surface geometric model description) customized in the step 1 on a 2D plane to generate a 2D model outline, and representing the 2D model outline by using a sampling point set;
a1, using the 3D human body model customized in the previous step, describing the 3D posture data with a surface geometric model;
b1, determining a visual angle of human body posture display from a given video, and projecting a 3D posture described by a surface geometric model on a 2D plane by using the visual angle to generate a 2D model outline;
c1, representing the generated 2D model contour by using sampling points of a contour boundary (for example, sampling 200 points);
Step 3, projecting the 3D human body posture (skeleton model description) onto a 2D plane to generate a 2D model skeleton. Using the view angle obtained from the video in the previous step, the 3D posture described by the skeleton model is projected onto a 2D plane to generate a 2D model skeleton represented by 2D joint point positions;
Step 4, extracting the human body contour information in the video and representing it with a set of sampling points. There are many methods for extracting the human body contour from video; a simple background subtraction method is adopted here. The video contour is likewise represented by sampling points on the contour boundary (for example, 200 points), with the same number of sampling points as the model contour;
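Steps 2 through 4 represent both the model contour and the video contour as point sets of equal size sampled along the contour boundary. A minimal sketch of such arc-length resampling follows; the function name and the 200-point count are illustrative, and the patent does not prescribe this particular interpolation:

```python
import numpy as np

def resample_contour(points, n_samples=200):
    """Resample a closed 2-D contour (M x 2 array of ordered boundary
    points) to n_samples points evenly spaced by arc length."""
    pts = np.asarray(points, dtype=float)
    closed = np.vstack([pts, pts[:1]])             # close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])  # cumulative arc length
    targets = np.linspace(0.0, cum[-1], n_samples, endpoint=False)
    # Linearly interpolate x and y along arc length
    x = np.interp(targets, cum, closed[:, 0])
    y = np.interp(targets, cum, closed[:, 1])
    return np.stack([x, y], axis=1)

# Example: a square contour resampled to 200 boundary points
square = np.array([[0, 0], [10, 0], [10, 10], [0, 10]])
samples = resample_contour(square, 200)
print(samples.shape)  # (200, 2)
```

Sampling both contours to the same count, as step 4 requires, keeps the later point-set correspondence one-to-one.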
Step 5, establishing the point-set correspondence between the human body contour in the video and the model contour. The correspondence between the contour point sets is established using the Shape Context method;
the specific steps of establishing the corresponding relation of the feature points in the point set by using a Shape Context (Shape Context) method are as follows:
a2, for each feature point, establishing a vector set which takes the point as an origin and all other feature points in the contour as an end point;
b2, dividing the vectors into 12 bins by angle and 5 bins by magnitude, counting the number of vectors falling into each bin, and forming a 12 × 5 histogram from these counts. This histogram is called the Shape Context;
c2, establishing the shape context of each characteristic point in the sampling point set by the step b 2;
d2, representing the shape context of the feature point by a 60-dimensional vector, and then using Euclidean distance as a distance measure between the shape context of the feature point in the set and the shape context of the given feature point:
for example, let $(x_1, x_2, \ldots, x_{60})$ and $(y_1, y_2, \ldots, y_{60})$ be the shape contexts of the two points; the Euclidean distance between them is:

$$d = \sqrt{\sum_{i=1}^{60} (x_i - y_i)^2}$$
The feature point whose shape context is at the smallest distance is the closest match to the given feature point. In this way, the point correspondence between the two feature point sets is established.
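Steps a2–d2 can be sketched as follows, under the stated binning (12 angle sectors × 5 radial shells giving a 60-dimensional histogram) and Euclidean matching. The radial bin edges and the mean-distance scale normalization are illustrative assumptions; the patent does not fix them:

```python
import numpy as np

def shape_context(points, idx, n_theta=12, n_r=5):
    """60-dim shape context of point `idx`: a histogram of the vectors from
    that point to all other contour points, binned by angle and magnitude."""
    pts = np.asarray(points, dtype=float)
    vec = np.delete(pts, idx, axis=0) - pts[idx]   # vectors to all other points
    r = np.linalg.norm(vec, axis=1)
    theta = np.arctan2(vec[:, 1], vec[:, 0])       # angle in [-pi, pi]
    t_bin = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    # Log-spaced radial shells, normalized by mean distance (assumed choice)
    r = r / r.mean()
    edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r)
    r_bin = np.clip(np.searchsorted(edges, r), 0, n_r - 1)
    hist = np.zeros((n_theta, n_r))
    np.add.at(hist, (t_bin, r_bin), 1)             # count vectors per bin
    return hist.ravel()                            # 12 * 5 = 60 entries

def match_points(contour_a, contour_b):
    """Match each point of contour_a to the contour_b point whose shape
    context is closest in Euclidean distance."""
    sc_a = np.array([shape_context(contour_a, i) for i in range(len(contour_a))])
    sc_b = np.array([shape_context(contour_b, i) for i in range(len(contour_b))])
    d = np.linalg.norm(sc_a[:, None, :] - sc_b[None, :, :], axis=2)
    return d.argmin(axis=1)
```

Matching a contour against itself pairs each point with one of identical shape context, which is the sanity check for the correspondence step.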
Step 6, transplanting the 2D skeleton in the model contour to the human body contour in the video, by the following steps:
a3, determining the support set of each joint point of the 2D skeleton; the support set is the set of feature points within a circular area centered at the joint point with a user-defined radius R, and is a subset of the feature point set;
b3, solving the two-dimensional affine transformation relation of each support set corresponding to the 2D joint points between the model outline and the video outline;
A coordinate transformation of the form $x' = a_{xx}x + a_{xy}y + b_x$, $\;y' = a_{yx}x + a_{yy}y + b_y$ is called a two-dimensional affine transformation. The transformed coordinates $x'$ and $y'$ are both linear functions of the original coordinates $x$ and $y$. The parameters $a_{ij}$ and $b_k$ are constants determined by the transformation.
Given the known corresponding point sets $[(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)]$ and $[(x'_1,y'_1),(x'_2,y'_2),\ldots,(x'_n,y'_n)]$, the corresponding affine transformation is obtained by solving the following overdetermined system by the least-squares method:

$$\begin{pmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ \vdots & \vdots & \vdots \\ x_n & y_n & 1 \end{pmatrix} \begin{pmatrix} a_{xx} & a_{yx} \\ a_{xy} & a_{yy} \\ b_x & b_y \end{pmatrix} = \begin{pmatrix} x'_1 & y'_1 \\ x'_2 & y'_2 \\ \vdots & \vdots \\ x'_n & y'_n \end{pmatrix}$$
and c3, performing on the joint point positions of the 2D skeleton of the model contour the same affine transformation determined from the support set, the transformation result being the 2D skeleton joint point positions of the video contour.
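The least-squares solution of the overdetermined system in step b3 can be sketched with NumPy; the function names and the column convention (parameters as a 3 × 2 matrix applied as `[x y 1] @ P`) are illustrative choices:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2-D affine transform mapping src points to dst points.
    Solves the overdetermined system [x y 1] @ P = [x' y'] for the six
    parameters a_xx, a_xy, b_x, a_yx, a_yy, b_y (P is 3 x 2)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    A = np.hstack([src, np.ones((len(src), 1))])   # n x 3 design matrix
    P, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return P

def apply_affine(P, pts):
    """Apply the fitted affine transform to an n x 2 point array."""
    pts = np.asarray(pts, float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ P

# Recover a known transform from exact correspondences
true_P = np.array([[1.2, 0.1], [-0.3, 0.9], [5.0, -2.0]])   # 3 x 2
src = np.random.default_rng(1).random((50, 2))
dst = apply_affine(true_P, src)
est_P = fit_affine(src, dst)
```

With more correspondences than the six unknowns, noisy point matches are averaged out, which is why the support set uses all feature points near a joint rather than a minimal triple.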
Step 7, recovering the 3D structural parameters of the human body in the video, by transforming the known initial 3D human body (model) pose:
a4, keeping the depth (Z) direction data of the initial 3D (model) pose unchanged;
b4, applying to the X and Y direction data of the initial 3D (model) pose the affine transformation determined in the previous step for the corresponding projected 2D joint points.
The specific operation is as follows: let $(x_o, y_o, z_o)$ be the position of a joint point in the initial pose, and $(x'_o, y'_o)$ the projection position (parallel projection) of the joint point on the 2D plane, corresponding to the joint point position in the model skeleton; $(x'_t, y'_t)$ are the coordinates of $(x'_o, y'_o)$ after the affine transformation, corresponding to the joint point position of the estimated video skeleton.
Namely: $x'_t = a_{xx} x'_o + a_{xy} y'_o + b_x$, $\;y'_t = a_{yx} x'_o + a_{yy} y'_o + b_y$.
Since an affine projection model (parallel projection) is used, the following relations hold between $(x_o, y_o, z_o)$ and $(x'_o, y'_o)$, and between $(x_t, y_t, z_t)$ and $(x'_t, y'_t)$:

$$x'_o = k x_o, \quad y'_o = k y_o$$
$$x'_t = k x_t, \quad y'_t = k y_t$$

where $k$ is a known scaling factor. The three-dimensional coordinates $(x_t, y_t, z_t)$ corresponding to $(x'_t, y'_t)$ are therefore

$$(x_t, y_t, z_t) = \left(\frac{x'_t}{k},\; \frac{y'_t}{k},\; z_o\right),$$

which are the joint point positions of the corresponding 3D skeleton in the video.
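Step 7 then amounts to keeping Z fixed and applying the fitted 2-D affine transform to the projected X, Y of each joint. A minimal sketch, assuming a 3 × 2 affine parameter matrix `P` in the convention `[x y 1] @ P` and a known parallel-projection scale `k` (both names are illustrative):

```python
import numpy as np

def recover_3d_joints(joints_3d, P, k):
    """Deform an initial 3-D skeleton (n x 3 joint positions): project X, Y
    with parallel-projection scale k, apply the 2-D affine transform P,
    divide by k again, and keep the depth Z unchanged."""
    joints_3d = np.asarray(joints_3d, float)
    xy_proj = k * joints_3d[:, :2]             # (x'_o, y'_o) = k * (x_o, y_o)
    ones = np.ones((len(joints_3d), 1))
    xy_new = np.hstack([xy_proj, ones]) @ P    # (x'_t, y'_t) after the affine map
    return np.column_stack([xy_new / k, joints_3d[:, 2]])  # Z kept as z_o

# Sanity check: the identity affine transform leaves the skeleton unchanged
identity_P = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
joints = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])
out = recover_3d_joints(joints, identity_P, k=2.0)
```

Keeping $z_o$ unchanged is what makes the whole pipeline a 2-D operation: only the projected X, Y of each joint are ever re-estimated.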

Claims (7)

1. A human body posture deformation method based on video content is used for deforming an initial human body 3D posture according to video content on the basis of known initial human body three-dimensional posture information and according to the human body information content in a video, and therefore corresponding 3D human body structure information in the video is obtained; the method comprises the following steps:
1) Customizing the 3D human body model according to the video content;
2) Projecting the 3D human body posture which is customized in the step 1 and described by a surface geometric model on a 2D plane to generate a 2D model outline, and representing the 2D model outline by using a sampling point set;
3) Projecting the 3D human body posture described by the skeleton model on a 2D plane to generate a 2D model skeleton;
4) Extracting human body contour information in a video, and representing the human body contour information by using a sampling point set;
5) Establishing a point set corresponding relation between a human body contour and a model contour in a video;
6) Transplanting the 2D skeleton in the model outline to the human body outline in the video;
7) And recovering the 3D structural parameters of the human body in the video.
2. The method for human body pose deformation based on video content according to claim 1, wherein the establishing of the point set corresponding relation between the human body contour and the model contour in the video in the step 5) is realized by:
calculating the shape context of each point in the contour, namely establishing the distance and angle measurement of the point and all other points in the contour;
and taking the shape context as a standard for measuring the similarity of the feature points, wherein the two feature points with the most similar shape context in the two contours are matched feature points, so that the point set corresponding relation between the model contour and the video contour is established.
3. The method for human body pose deformation based on video content of claim 2, wherein the specific steps of establishing the corresponding relation of the feature points in the point set by using the shape context method are as follows:
a2, for each feature point, establishing a vector set which takes the point as an origin and takes all other feature points in the contour as an end point;
b2, dividing the vector set into 12 bins by angle and 5 bins by magnitude, counting the number of vectors falling into each bin, and forming a 12 × 5 histogram from this information;
c2, establishing the shape context of each characteristic point in the sampling point set by the step b 2;
d2, representing the shape context of the feature point by a 60-dimensional vector, and then using the Euclidean distance as a distance measure between the shape context of the feature point in the set and the shape context of the given feature point:
let $(x_1, x_2, \ldots, x_{60})$ and $(y_1, y_2, \ldots, y_{60})$ be the shape contexts of the two feature points; the Euclidean distance between the two feature points is:

$$d = \sqrt{\sum_{i=1}^{60} (x_i - y_i)^2}$$
the feature point with the closest distance between the shape contexts is the feature point closest to the given feature point, and according to the method, the point correspondence of the two feature point sets can be established.
4. The method for human pose deformation based on video content according to claim 1, wherein the step 6) of transplanting the 2D skeleton in the model outline to the video outline is implemented by:
determining a support set of each joint point of the 2D skeleton;
determining affine transformation relations of all support sets of the 2D joint points through the established point set corresponding relations between the two contours;
and carrying out on the joint point positions of the 2D skeleton of the model contour the same transformation as the affine transformation determined from the support set, the transformation result being the 2D skeleton joint point positions of the video contour.
5. The method for human body pose deformation based on video content according to claim 1, wherein the step 6) of transplanting the 2D skeleton in the model contour to the video contour is implemented by:
a3, determining a support set of each joint point of the 2D skeleton; the support set is a set of characteristic points in a circular area with a joint point as a circle center and R as a radius, and the support set is a subset of the characteristic point set;
b3, solving the two-dimensional affine transformation relation of the support sets corresponding to the 2D joint points between the model outline and the video outline;
a coordinate transformation of the form $x' = a_{xx}x + a_{xy}y + b_x$, $\;y' = a_{yx}x + a_{yy}y + b_y$ is called a two-dimensional affine transformation; the transformed coordinates $x'$ and $y'$ are linear functions of the original coordinates $x$ and $y$, and the parameters $a_{xx}$, $a_{xy}$, $a_{yx}$, $a_{yy}$, $b_x$ and $b_y$ are constants determined by the transformation,
given the known corresponding point sets $[(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)]$ and $[(x'_1,y'_1),(x'_2,y'_2),\ldots,(x'_n,y'_n)]$, the corresponding affine transformation can be obtained by solving the following overdetermined system by the least-squares method:

$$\begin{pmatrix} x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots \\ x_n & y_n & 1 \end{pmatrix} \begin{pmatrix} a_{xx} & a_{yx} \\ a_{xy} & a_{yy} \\ b_x & b_y \end{pmatrix} = \begin{pmatrix} x'_1 & y'_1 \\ \vdots & \vdots \\ x'_n & y'_n \end{pmatrix}$$
and c3, performing on the joint point positions of the 2D skeleton of the model contour the same affine transformation determined from the support set, the transformation result being the 2D skeleton joint point positions of the video contour.
6. The method for human body pose deformation based on video content according to claim 1, wherein the step 7) of restoring the 3D structural parameters of the human body in the video is implemented by transforming the known initial 3D human body skeleton.
7. The method for deforming human body posture based on video contents as claimed in claim 1, wherein the 3D structure parameters of the human body in the video are recovered in the step 7), and the method specifically comprises the following operations by transforming the known initial 3D human body posture:
let $(x_o, y_o, z_o)$ be the position of a joint point in the initial pose, $(x'_o, y'_o)$ the projection position of the joint point on the 2D plane, corresponding to the joint point position in the model skeleton, and $(x'_t, y'_t)$ the coordinates of $(x'_o, y'_o)$ after the affine transformation, corresponding to the joint point position of the estimated video skeleton; that is: $x'_t = a_{xx} x'_o + a_{xy} y'_o + b_x$, $\;y'_t = a_{yx} x'_o + a_{yy} y'_o + b_y$;
since parallel projection is used, the following relations hold between $(x_o, y_o, z_o)$ and $(x'_o, y'_o)$, and between $(x_t, y_t, z_t)$ and $(x'_t, y'_t)$:

$$x'_o = k x_o, \quad y'_o = k y_o$$
$$x'_t = k x_t, \quad y'_t = k y_t$$

where $k$ is a known scaling factor; then the three-dimensional coordinates $(x_t, y_t, z_t)$ corresponding to $(x'_t, y'_t)$ are

$$(x_t, y_t, z_t) = \left(\frac{x'_t}{k},\; \frac{y'_t}{k},\; z_o\right).$$
CNB200510012176XA 2005-07-14 2005-07-14 Human body posture deforming method based on video content Expired - Fee Related CN100369064C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB200510012176XA CN100369064C (en) 2005-07-14 2005-07-14 Human body posture deforming method based on video content


Publications (2)

Publication Number Publication Date
CN1725246A CN1725246A (en) 2006-01-25
CN100369064C true CN100369064C (en) 2008-02-13

Family

ID=35924705

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB200510012176XA Expired - Fee Related CN100369064C (en) 2005-07-14 2005-07-14 Human body posture deforming method based on video content

Country Status (1)

Country Link
CN (1) CN100369064C (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101591779B1 (en) * 2009-03-17 2016-02-05 삼성전자주식회사 Apparatus and method for generating skeleton model using motion data and image data
CN102486816A (en) * 2010-12-02 2012-06-06 三星电子株式会社 Device and method for calculating human body shape parameters
CN102855470B (en) * 2012-07-31 2015-04-08 中国科学院自动化研究所 Estimation method of human posture based on depth image
CN104658022B (en) * 2013-11-20 2019-02-26 中国电信股份有限公司 Three-dimensional animation manufacturing method and device
CN104463945A (en) * 2013-11-25 2015-03-25 安徽寰智信息科技股份有限公司 Human body posture rebuilding method
CN104700452B (en) * 2015-03-24 2016-03-02 中国人民解放军国防科学技术大学 A kind of 3 D human body attitude mode matching process towards any attitude
CN106650554A (en) * 2015-10-30 2017-05-10 成都理想境界科技有限公司 Static hand gesture identification method
CN106228590B (en) * 2016-07-19 2018-11-20 中国电子科技集团公司第二十八研究所 A kind of human body attitude edit methods in image
CN107122746B (en) * 2017-04-28 2020-01-21 清华大学 Video analysis apparatus, method, and computer-readable storage medium
CN111429338B (en) * 2020-03-18 2023-08-01 百度在线网络技术(北京)有限公司 Method, apparatus, device and computer readable storage medium for processing video
CN111724483B (en) * 2020-04-16 2024-07-19 北京诺亦腾科技有限公司 Image transplanting method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6240198B1 (en) * 1998-04-13 2001-05-29 Compaq Computer Corporation Method for figure tracking using 2-D registration
US6269172B1 (en) * 1998-04-13 2001-07-31 Compaq Computer Corporation Method for tracking the motion of a 3-D figure
JP2002269580A (en) * 2001-03-07 2002-09-20 Japan Science & Technology Corp Animation creating system
US20040119716A1 (en) * 2002-12-20 2004-06-24 Chang Joon Park Apparatus and method for high-speed marker-free motion capture


Also Published As

Publication number Publication date
CN1725246A (en) 2006-01-25


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: DEQING SINO FINANCIAL INFORMATION TECHNOLOGY INSTI

Free format text: FORMER OWNER: INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES

Effective date: 20130401

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100080 HAIDIAN, BEIJING TO: 313299 HUZHOU, ZHEJIANG PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20130401

Address after: 313299 Zhejiang city of Huzhou province Deqing County Wukang Chiaki Street No. 1 administrative center building B208

Patentee after: DEQING ZHONGKE FINANCE INFORMATION TECHNOLOGY CO., LTD.

Address before: 100080 Haidian District, Zhongguancun Academy of Sciences, South Road, No. 6, No.

Patentee before: Institute of Computing Technology, Chinese Academy of Sciences

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20080213

Termination date: 20160714