CN113205605A - Method for acquiring hand three-dimensional parametric model from depth image - Google Patents

Method for acquiring hand three-dimensional parametric model from depth image

Info

Publication number
CN113205605A
CN113205605A (application number CN202110595988.0A)
Authority
CN
China
Prior art keywords
dimensional
hand
point cloud
model
depth image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110595988.0A
Other languages
Chinese (zh)
Other versions
CN113205605B (en)
Inventor
耿卫东
梁秀波
厉向东
金文光
戴青锋
刘帅
姬源智
周洲
韩晨晨
毋从周
朱俊威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110595988.0A priority Critical patent/CN113205605B/en
Publication of CN113205605A publication Critical patent/CN113205605A/en
Application granted granted Critical
Publication of CN113205605B publication Critical patent/CN113205605B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/04 Indexing scheme for image data processing or generation, in general involving 3D image data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Abstract

The invention discloses a method for acquiring a three-dimensional hand parametric model from a depth image, which comprises the following steps: acquiring a depth image sequence and the depth camera intrinsics; reconstructing a rough three-dimensional hand point cloud from the hand depth image sequence and the corresponding camera intrinsics; manually removing non-hand points and noise points from the rough point cloud to obtain a fine three-dimensional hand point cloud; and obtaining the user-personalized three-dimensional parametric hand model by a three-stage iterative optimization over the fine point cloud. With this method, a user-personalized three-dimensional parametric hand model can be obtained from a depth image sequence alone. Compared with a traditional model-free approach or a generic hand model, the user-personalized parametric model provides more user-specific prior information, so it achieves higher accuracy and adaptability in gesture pose estimation and has application prospects in specific scenarios such as human-computer interaction, rehabilitation, and medical treatment.

Description

Method for acquiring hand three-dimensional parametric model from depth image
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a method for acquiring a three-dimensional hand parametric model from a depth image.
Background
Traditional model-free gesture pose estimation based on computer vision often suffers from occlusion (including self-occlusion), low resolution, noise, and similar problems, all of which strongly affect the final pose estimation result. Compared with traditional model-free methods, gesture pose estimation based on a parametric hand model supplies strong prior knowledge to the estimation task, so that it can retain good performance even under occlusion, low resolution, and similar conditions. The most widely used parametric hand model at present is the MANO model (hand Model with Articulated and Non-rigid defOrmations), published by Javier Romero et al. in 2017. Related research on hand pose estimation at home and abroad since 2018 shows that using the MANO parametric model plays a vital role in estimating reasonable and accurate hand poses.
The MANO model uses a set of parameters (β, θ) to define a triangulated three-dimensional hand mesh M(β, θ), where β ∈ R^10 is the hand shape parameter and θ ∈ R^(16×3) is the hand pose parameter, representing the rotations of 16 hand joints in axis-angle form. Specifically, the MANO model starts from an average template T̄ with 778 vertices. The shape blend function B_S(β) takes β as input and outputs the blend shapes describing the hand model; the pose blend function B_P(θ) takes θ as input and outputs the model deformation caused by the hand pose. The outputs B_S(β) and B_P(θ) of the two blend functions are added to the average template T̄ to obtain the final hand mesh M(β, θ): the model applies the linear blend skinning function W(·), which rotates the vertices of each finger part about the joint locations J(β) by the rotations in θ, weighted by the per-vertex blend weights 𝒲:

    M(β, θ) = W(T̄ + B_S(β) + B_P(θ), J(β), θ, 𝒲)

The shape blend offsets B_S of a specific model are themselves a linear combination of principal component vectors S_n extracted by principal component analysis from a shape space of hands laid flat in a relaxed pose. The principal component vectors are scaled by linear coefficients and accumulated to obtain the personalized shape blend offsets B_S; the corresponding linear coefficients are the shape parameters β:

    B_S(β) = Σ_{n=1}^{|β|} β_n · S_n
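The linear blend-shape construction B_S(β) = Σ_n β_n S_n can be sketched in a few lines of NumPy. This is a toy illustration only; the array shapes and values below are assumptions for demonstration, not MANO's actual basis data:

```python
import numpy as np

def shape_blend(beta, S):
    """B_S(beta) = sum_n beta_n * S_n.

    beta: (n_components,) shape coefficients.
    S:    (n_components, n_vertices, 3) PCA basis of per-vertex offsets.
    Returns the (n_vertices, 3) blend-shape offset added to the template.
    """
    return np.tensordot(beta, S, axes=1)

# Toy example: 2 principal components over a 4-vertex "mesh".
S = np.stack([np.ones((4, 3)), 2 * np.ones((4, 3))])
beta = np.array([2.0, 3.0])
offsets = shape_blend(beta, S)   # every entry is 2*1 + 3*2 = 8
```

The personalized template is then simply `T_bar + offsets` before skinning is applied.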
Disclosure of Invention
The invention aims to provide a method for acquiring a three-dimensional hand parametric model from a depth image aiming at the requirement of acquiring the hand parametric model.
The purpose of the invention is realized by the following technical scheme: a method of obtaining a three-dimensional parametric model of a hand from a depth image, the method comprising the steps of:
(1) acquiring a depth image sequence and depth camera internal parameters, comprising the following sub-steps:
(1.1) shooting the two hands of a user, which are horizontally placed on a flat desktop, by using a structured light depth camera to obtain a hand depth image sequence;
(1.2) reading camera parameters of the depth camera, including focal length and center point offset;
(2) reconstructing a rough three-dimensional point cloud of the hand by the hand depth image sequence obtained in the step (1) and the corresponding depth camera internal reference, and comprising the following substeps:
(2.1) obtaining a three-dimensional point cloud according to the depth image of the first frame and the internal parameters of the depth camera, constructing a three-dimensional grid model by taking the camera coordinate system of the first frame as a world coordinate system, and executing the step (2.2) for the subsequent depth image sequence;
(2.2) obtaining a three-dimensional point cloud according to the single-frame depth image and the camera internal parameters and calculating a normal vector of each point in the point cloud;
(2.3) registering the three-dimensional point cloud of the current frame with the point cloud projected from the three-dimensional grid model through light projection according to the position of the previous frame of camera, and calculating the position of the current frame of camera;
(2.4) fusing the point cloud of the current frame with the three-dimensional grid model according to the calculated camera pose, and updating the three-dimensional grid model;
(2.5) projecting from the updated three-dimensional grid model by light projection according to the camera pose of the current frame to obtain point cloud under the current view angle, and calculating a normal vector of each point in the point cloud for registering the input depth image of the next frame; repeatedly executing the steps (2.2) - (2.5) until the processing of the depth image sequence is completed;
(2.6) converting the three-dimensional mesh model into a point cloud to obtain a rough hand three-dimensional point cloud (the hand three-dimensional point cloud usually contains the surrounding environment and noise);
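Step (2.2), back-projecting a single depth frame into a camera-frame point cloud with the pinhole intrinsics, can be sketched as follows. This is a minimal NumPy version with illustrative variable names; the patent does not prescribe an implementation:

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in metres) into an N x 3 point cloud
    in the camera coordinate system using the pinhole model:
        x = (u - cx) * z / fx,   y = (v - cy) * z / fy.
    Pixels with zero depth are treated as invalid and dropped."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]

# A flat depth plane 0.6 m away maps the principal point to (0, 0, 0.6).
depth = np.full((480, 640), 0.6)
cloud = depth_to_pointcloud(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

Per-point normals (also needed in step 2.2) are typically estimated afterwards from cross-products of neighbouring back-projected points.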
(3) obtaining rough hand three-dimensional point cloud in the step (2), manually removing non-hand point cloud and noise point cloud to obtain fine hand three-dimensional point cloud, and comprising the following substeps:
(3.1) eliminating point clouds which are not connected with the hand three-dimensional point cloud;
(3.2) selecting three non-collinear points belonging to the desktop plane in the three-dimensional point cloud processed in the step (3.1), and calculating the planar representation of the desktop according to the three points;
(3.3) according to the plane representation of the desktop, removing the three-dimensional point cloud far away from the space on one side of the hand point cloud;
(3.4) eliminating point clouds which are not connected with the hand three-dimensional point cloud in the three-dimensional point cloud processed in the step (3.3) to obtain fine hand three-dimensional point cloud;
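Steps (3.2)-(3.3), fitting the desktop plane through three selected points and discarding everything on the far side, can be sketched as below. The reference point `hand_ref` (any point known to lie on the hand side, e.g. the cloud centroid after step 3.1) is an illustrative assumption:

```python
import numpy as np

def plane_from_points(p1, p2, p3):
    """Plane n.x + d = 0 through three non-collinear points."""
    n = np.cross(p2 - p1, p3 - p1)
    n = n / np.linalg.norm(n)
    return n, -np.dot(n, p1)

def keep_hand_side(points, n, d, hand_ref):
    """Keep only the points on the same side of the plane as hand_ref."""
    side = np.sign(np.dot(n, hand_ref) + d)
    return points[(points @ n + d) * side >= 0]

# Desktop z = 0; the hand lies above it.
n, d = plane_from_points(np.array([0.0, 0, 0]),
                         np.array([1.0, 0, 0]),
                         np.array([0.0, 1, 0]))
pts = np.array([[0, 0, 0.05], [0, 0, -0.3], [0.2, 0.1, 0.02]])
hand = keep_hand_side(pts, n, d, hand_ref=np.array([0.0, 0, 0.1]))
```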
(4) the method for acquiring the personalized hand three-dimensional parameterized model of the user through the fine hand three-dimensional point cloud comprises the following substeps:
(4.1) based on the fine hand three-dimensional point cloud and the fingertip two-dimensional position projected to the image plane, optimally solving the global rotation and translation parameters of the hand three-dimensional parameterized model;
(4.2) fixing the global rotation and translation parameters obtained in the step (4.1), and optimizing and solving the shape and posture parameters of the hand three-dimensional parametric model based on the fine hand three-dimensional point cloud and the fingertip position projected to the image plane;
and (4.3) actively removing a part of point clouds of the finger parts in order to balance the influence caused by the dense point clouds of the finger parts, further optimizing and solving the shape parameters of the hand three-dimensional parameterized model based on the processed hand three-dimensional point clouds, and finally obtaining the hand three-dimensional parameterized model personalized by the user.
Further, the three-dimensional mesh model is constructed and updated by a three-dimensional reconstruction algorithm named TSDF (Truncated Signed Distance Function).
Further, the three-dimensional point clouds are registered by a point cloud matching algorithm named ICP (Iterative Closest Point), and the camera pose is calculated from the registered point clouds.
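A single point-to-point ICP iteration (nearest-neighbour matching followed by a closed-form SVD/Kabsch alignment) can be sketched as below; it uses SciPy's `cKDTree` for correspondence search. Real pipelines like the one described here usually prefer a point-to-plane variant and iterate to convergence, so treat this as an illustration:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(src, dst):
    """One point-to-point ICP iteration: match each source point to its
    nearest destination point, then solve the rigid transform (R, t)
    minimizing the squared correspondence distances via SVD."""
    matched = dst[cKDTree(dst).query(src)[1]]
    mu_s, mu_d = src.mean(axis=0), matched.mean(axis=0)
    H = (src - mu_s).T @ (matched - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Synthetic check: a small rigid motion should be (largely) recovered.
rng = np.random.default_rng(0)
src = rng.random((200, 3))
a = np.deg2rad(3.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0],
               [np.sin(a),  np.cos(a), 0],
               [0, 0, 1]])
dst = src @ Rz.T + np.array([0.01, 0.0, 0.0])
R, t = icp_step(src, dst)
err_before = cKDTree(dst).query(src)[0].mean()
err_after = cKDTree(dst).query(src @ R.T + t)[0].mean()
```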
Further, in the step (3), the selection and removal of three-dimensional points are performed in three-dimensional mesh visualization software that supports interactive display, such as MeshLab or OpenSCAD.
Further, in the steps (4.1) and (4.2), the three-dimensional hand point cloud is projected into an image under fixed camera intrinsics, and the two-dimensional fingertip positions are obtained by a fingertip detection algorithm, which can be implemented with the contour detection methods of the OpenCV toolkit.
Further, in the step (4), the objective function for optimizing and solving the optimal parameters includes three parts: a point cloud matching error measuring the overall agreement between the three-dimensional mesh patches q of the hand three-dimensional parametric model and the points p of the three-dimensional point cloud; a fingertip projection error measuring the distance between the two-dimensional fingertip projections ft_{q,j} of the hand parametric model and the two-dimensional fingertip projections ft_{p,j} of the point cloud; and a prior error measuring the distance between the resulting shape parameter β of the hand parametric model and the shape parameter of the average hand. The point cloud matching error, fingertip projection error, and prior error are computed respectively as:

    E_pointcloud = Σ_p min_q ‖p − q‖² + Σ_q min_p ‖q − p‖²
    E_fingertip = Σ_{j=1}^{5} ‖ft_{q,j} − ft_{p,j}‖²
    E_prior = ‖β‖²

where j in the fingertip projection error ranges over the 5 fingertips.
Further, in the step (4.3), the shape parameter of the hand three-dimensional parameterized model comes from a shape space of hands laid flat in a relaxed pose: principal component analysis extracts principal component vectors S_n, the customized hand shape can be regarded as a linear combination of the S_n, and the corresponding linear coefficients are the shape parameters β.
Further, in the step (4), the optimization algorithm for solving is Adam gradient descent algorithm.
Further, in the step (4.3), the point cloud removal rule for the finger parts is to remove the points whose projection onto the two-dimensional plane lies more than 80 pixels away from the projected two-dimensional position of the root joint of the hand three-dimensional parameterized model.
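The 80-pixel pruning rule can be sketched as follows; the intrinsics `K` and the root joint position are illustrative assumptions:

```python
import numpy as np

def prune_far_points(points, K, root, max_px=80.0):
    """Remove 3-D points whose image-plane projection lies more than
    max_px pixels from the projection of the hand root joint."""
    def project(p):
        uvw = np.atleast_2d(p) @ K.T
        return uvw[:, :2] / uvw[:, 2:3]
    uv, uv_root = project(points), project(root)[0]
    keep = np.linalg.norm(uv - uv_root, axis=1) <= max_px
    return points[keep]

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
root = np.array([0.0, 0.0, 0.5])
pts = np.array([[0.0, 0.0, 0.5],     # projects onto the root: kept
                [0.1, 0.0, 0.5]])    # 100 px from the root: removed
kept = prune_far_points(pts, K, root)
```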
Further, the user-personalized hand three-dimensional parameterized model obtained in step (4) provides more user-specific prior information than a traditional model-free approach or a generic hand model, so it achieves higher accuracy and adaptability in gesture pose estimation and has application prospects in specific scenarios such as human-computer interaction and rehabilitation.
The invention has the beneficial effects that: the invention provides a method for acquiring a user-personalized three-dimensional parametric hand model from a depth image sequence. The method captures a three-dimensional point cloud of the user's hand with a structured-light depth camera and fits a standard parametric hand model to it, yielding the user-personalized model. Compared with a traditional model-free approach or a generic hand model, the user-personalized parametric model provides more user-specific prior information, so it achieves higher accuracy and adaptability in gesture pose estimation and has application prospects in specific scenarios such as human-computer interaction, rehabilitation, and medical treatment.
Drawings
Fig. 1 is a flowchart of a method for obtaining a three-dimensional hand parametric model from a depth image according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, a method for acquiring a three-dimensional hand parametric model from a depth image according to an embodiment of the present invention includes the following specific steps:
(1) Acquire a depth image sequence and the depth camera intrinsics from a depth camera. The depth camera uses the structured-light principle; an Intel RealSense D435i can be adopted. The hand depth image sequence is captured by moving the camera around the hand at a distance of about 0.6 m, and the camera intrinsics, including the focal length and the principal point offset, are read. During shooting it must be ensured that the hand always stays within the field of view of the depth camera;
(2) Obtain a rough three-dimensional hand reconstruction point cloud from the depth image sequence of step (1) and the camera intrinsics. A three-dimensional point cloud is computed from the first depth frame and the camera intrinsics, and a three-dimensional mesh model is constructed with the camera coordinate system of the first frame as the world coordinate system, using a three-dimensional reconstruction algorithm named TSDF (Truncated Signed Distance Function). For each subsequent depth frame, a three-dimensional point cloud in the camera coordinate system is computed from the frame and the intrinsics, together with the normal vector of every point; the current frame's point cloud is then registered, using a point cloud matching algorithm named ICP (Iterative Closest Point), against the point cloud obtained by ray casting the mesh model from the previous frame's camera pose.
After point cloud registration, the camera pose of the current frame is computed from that of the previous frame; the current point cloud is fused into the mesh model with the TSDF algorithm according to the computed pose, updating the model; meanwhile, the updated mesh model is ray cast from the current pose to obtain the point cloud under the current view, and the normal vector of every point is computed for registering the next input depth frame. These steps are repeated until the whole depth image sequence has been processed, yielding the rough three-dimensional hand point cloud, which usually still contains surrounding-environment points and noise.
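The TSDF fusion step can be sketched as a dense-grid version of the Curless-Levoy running-average update below. Production systems (KinectFusion-style) apply the same per-voxel update on a scalable or voxel-hashed volume; the grid dimensions and camera setup in the example are assumptions:

```python
import numpy as np

def tsdf_integrate(tsdf, weight, origin, voxel_size, depth, K, cam_pose,
                   trunc=0.03):
    """Fuse one depth frame into a dense TSDF volume via a weighted
    running average. tsdf is initialised to 1.0 ("far"), weight to 0.
    origin: world position of voxel (0,0,0); cam_pose: 4x4 camera-to-world."""
    X, Y, Z = tsdf.shape
    ii, jj, kk = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z),
                             indexing="ij")
    pts_w = origin + voxel_size * np.stack([ii, jj, kk], axis=-1).reshape(-1, 3)
    T = np.linalg.inv(cam_pose)                      # world -> camera
    pts_c = pts_w @ T[:3, :3].T + T[:3, 3]
    z = pts_c[:, 2]
    h, w = depth.shape
    valid = z > 1e-6
    u = np.zeros_like(z, dtype=int)
    v = np.zeros_like(z, dtype=int)
    u[valid] = np.round(pts_c[valid, 0] * K[0, 0] / z[valid] + K[0, 2]).astype(int)
    v[valid] = np.round(pts_c[valid, 1] * K[1, 1] / z[valid] + K[1, 2]).astype(int)
    valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]
    sdf = d - z                                      # signed distance along ray
    valid &= (d > 0) & (sdf > -trunc)                # skip occluded voxels
    t = np.clip(sdf / trunc, -1.0, 1.0)
    ft, fw = tsdf.reshape(-1), weight.reshape(-1)    # views into the volumes
    ft[valid] = (ft[valid] * fw[valid] + t[valid]) / (fw[valid] + 1.0)
    fw[valid] += 1.0
    return tsdf, weight

# One frame: camera at the origin looking down +z, flat surface at z = 0.5 m.
K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
depth = np.full((64, 64), 0.5)
tsdf = np.ones((10, 10, 20))
weight = np.zeros((10, 10, 20))
tsdf, weight = tsdf_integrate(tsdf, weight,
                              origin=np.array([-0.05, -0.05, 0.40]),
                              voxel_size=0.01, depth=depth,
                              K=K, cam_pose=np.eye(4))
```

The mesh is later extracted from the zero level set of the fused volume (e.g. with marching cubes) and converted to a point cloud as in step (2.6).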
(3) Manually process the rough three-dimensional hand point cloud obtained in step (2). The manual selection and removal of points is performed in three-dimensional mesh visualization software that can display point cloud files, such as MeshLab or OpenSCAD. First, the points not connected to the hand point cloud are selected and removed, leaving a point cloud consisting of the hand and the desktop; then three non-collinear points on the desktop plane are selected, the plane representation of the desktop is computed from them, and the points in the half-space on the far side of the plane from the hand are removed, leaving the hand point cloud plus a few fragments not connected to the hand; finally, the points not connected to the hand point cloud are removed once more, yielding the fine three-dimensional hand point cloud.
(4) The method for acquiring the personalized hand three-dimensional parameterized model of the user through the fine hand three-dimensional point cloud comprises the following substeps:
(4.1) based on the fine hand three-dimensional point cloud and the fingertip two-dimensional position projected to the image plane, optimally solving the global rotation and translation parameters of the hand three-dimensional parameterized model;
(4.2) fixing the global rotation and translation parameters obtained in the step (4.1), and optimizing and solving the shape and posture parameters of the hand three-dimensional parametric model based on the fine hand three-dimensional point cloud and the fingertip position projected to the image plane;
and (4.3) removing a part of point cloud of the finger part, further optimizing and solving the shape parameter of the hand three-dimensional parameterized model based on the processed hand three-dimensional point cloud, and finally obtaining the user-personalized hand three-dimensional parameterized model.
And (4) acquiring a hand three-dimensional parameterized model personalized by the user through the fine hand three-dimensional point cloud. The step is divided into three stages, and each stage adopts the following objective function in an iterative optimization mode:
E(θ, β, R, T) = w·E_pointcloud + α·E_fingertip + γ·E_prior

where the point cloud matching loss E_pointcloud is computed as follows: for each point p of the hand three-dimensional point cloud, find the nearest patch q on the mesh of the hand three-dimensional parameterized model and compute their distance; and, for each patch q on the mesh, find the nearest point p in the point cloud and compute their distance.

The point cloud matching error measures the overall agreement between the mesh patches of the hand three-dimensional parameterized model and the three-dimensional point cloud; minimizing it brings the point cloud as close as possible to the model as a whole. The loss function E_pointcloud is:

    E_pointcloud = Σ_p min_q ‖p − q‖² + Σ_q min_p ‖q − p‖²
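Using a point-to-vertex distance as a stand-in for the point-to-patch distance, this symmetric (chamfer-style) matching error can be sketched as:

```python
import numpy as np
from scipy.spatial import cKDTree

def e_pointcloud(cloud, verts):
    """Symmetric matching error: squared distance from every scan point
    to its nearest model vertex, plus from every model vertex to its
    nearest scan point. Point-to-vertex is a simplification of the
    point-to-patch distance described in the text."""
    d_cloud = cKDTree(verts).query(cloud)[0]
    d_verts = cKDTree(cloud).query(verts)[0]
    return float(np.sum(d_cloud ** 2) + np.sum(d_verts ** 2))

a = np.array([[0.0, 0, 0], [1.0, 0, 0]])
b = a + np.array([0.0, 0.1, 0.0])   # "model" shifted by 0.1 along y
err = e_pointcloud(a, b)            # 4 * 0.1^2 = 0.04
```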
The fingertip joint projection error E_fingertip computes, for each fingertip in the five-fingertip set j, the two-dimensional distance between the fingertip projection ft_{q,j} of the hand three-dimensional parameterized model and the fingertip projection ft_{p,j} of the point cloud. The loss function E_fingertip is:

    E_fingertip = Σ_{j=1}^{5} ‖ft_{q,j} − ft_{p,j}‖²
The prior error on the shape parameter of the hand three-dimensional parametric model measures the Euclidean distance between the resulting shape parameter β and the shape parameter of the "average hand", where the "average hand" is defined as the special case in which all elements of the shape parameter are zero. The loss function E_prior is:

    E_prior = ‖β‖²
The shape parameter of the hand three-dimensional parametric model comes from a shape space of hands laid flat in a relaxed pose: principal component analysis extracts principal component vectors S_n, and the shape blend offsets B_S of a specific model are a linear combination of these vectors, with the corresponding linear coefficients being the shape parameters β. The calculation formula is:

    B_S(β) = Σ_{n=1}^{|β|} β_n · S_n
In the first stage, the global rotation and translation parameters of the hand three-dimensional parameterized model are solved by iterative optimization based on the fine hand three-dimensional point cloud and the two-dimensional fingertip positions projected onto the image plane. The fingertip detection can be implemented with the contour detection methods of the OpenCV toolkit. In this stage the hand pose parameters default to a standard five-finger spread and are held fixed at that pose. The weight w of the point cloud matching error is set to 300, the weight α of the fingertip matching error to 1, and the weight γ of the hand shape prior error to 2; point cloud registration uses the ICP (Iterative Closest Point) matching algorithm, and the objective function is iteratively optimized with the Adam gradient descent algorithm. This stage estimates the rotation and translation between the hand three-dimensional parameterized model and the world coordinate system of the point cloud, which serve as the initial values of R and T.
In the second stage, the shape and pose parameters of the hand three-dimensional parametric model are solved by iterative optimization based on the fine hand three-dimensional point cloud and the two-dimensional fingertip positions projected onto the image plane. The fingertip detection is the same as in the first stage. In this stage the hand pose is no longer assumed; instead, the R and T obtained in the previous stage are held fixed, so that global displacement and rotation of the model do not disturb the point cloud fitting. The weight w of the point cloud matching error is set to 5000, the weight α of the fingertip matching error to 1, and the weight γ of the hand shape prior error to 2; the point cloud registration and iterative optimization algorithms are the same as in the first stage. This stage estimates initial values for the shape and pose parameters of the hand three-dimensional parametric model.
In the third stage, to balance the influence of the dense finger point cloud on model fitting, part of the finger points are first actively removed: a point is removed if its projection onto the two-dimensional plane lies more than 80 pixels from the projected two-dimensional position of the root joint of the hand three-dimensional parametric model. The shape parameters of the model are then solved by iterative optimization on the processed point cloud, finally yielding the user-personalized hand three-dimensional parameterized model. The weight w of the point cloud matching error is set to 6000, the weight α of the fingertip matching error to 1, and the weight γ of the hand shape prior error to 2; the point cloud registration and iterative optimization algorithms are the same as in the first stage.
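The staged optimization uses Adam; its update rule can be illustrated on a toy version of the weighted objective, here a data term plus a shape prior with w = 300 and γ = 2 as in the first stage. The closed-form minimizer w·p/(w + γ) lets us check convergence; the learning rate and step count are assumptions, not values from the patent:

```python
import numpy as np

def adam(grad_fn, x0, lr=0.01, b1=0.9, b2=0.999, eps=1e-8, steps=4000):
    """Adam gradient descent: bias-corrected first/second moment estimates."""
    x = np.asarray(x0, dtype=float).copy()
    m, v = np.zeros_like(x), np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        x -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
    return x

# Toy objective E(x) = w*||x - p||^2 + gamma*||x||^2  (data term + prior).
w, gamma, p = 300.0, 2.0, np.array([1.0, 2.0])
grad = lambda x: 2 * w * (x - p) + 2 * gamma * x
x_opt = adam(grad, np.zeros(2))
x_star = w * p / (w + gamma)       # closed-form minimizer for comparison
```

With w much larger than γ, the prior only slightly shrinks the solution towards the "average hand", which mirrors the role of E_prior in the full objective.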
The above description is only a preferred embodiment; the invention is not limited to this embodiment, and any technical solution that achieves the technical effects of the invention by equivalent means falls within its scope of protection. Within that scope, various modifications and variations of the technical solution and/or its embodiments are possible.

Claims (10)

1. A method for acquiring a three-dimensional hand parametric model from a depth image is characterized by comprising the following steps:
(1) acquiring a depth image sequence and depth camera internal parameters, comprising the following sub-steps:
(1.1) shooting the two hands of a user, which are horizontally placed on a flat desktop, by using a structured light depth camera to obtain a hand depth image sequence;
(1.2) reading camera parameters of the depth camera, including focal length and center point offset;
(2) reconstructing a rough three-dimensional point cloud of the hand by the hand depth image sequence obtained in the step (1) and the corresponding depth camera internal reference, and comprising the following substeps:
(2.1) obtaining a three-dimensional point cloud according to the depth image of the first frame and the internal parameters of the depth camera, constructing a three-dimensional grid model by taking the camera coordinate system of the first frame as a world coordinate system, and executing the step (2.2) for the subsequent depth image sequence;
(2.2) obtaining a three-dimensional point cloud according to the single-frame depth image and the camera internal parameters and calculating a normal vector of each point in the point cloud;
(2.3) registering the three-dimensional point cloud of the current frame with the point cloud projected from the three-dimensional grid model through light projection according to the position of the previous frame of camera, and calculating the position of the current frame of camera;
(2.4) fusing the point cloud of the current frame with the three-dimensional grid model according to the calculated camera pose, and updating the three-dimensional grid model;
(2.5) projecting from the updated three-dimensional grid model by light projection according to the camera pose of the current frame to obtain point cloud under the current view angle, and calculating a normal vector of each point in the point cloud for registering the input depth image of the next frame; repeatedly executing the steps (2.2) - (2.5) until the processing of the depth image sequence is completed;
(2.6) converting the three-dimensional grid model into point cloud to obtain rough hand three-dimensional point cloud;
(3) taking the coarse hand three-dimensional point cloud obtained in step (2) and manually removing non-hand points and noise points to obtain a fine hand three-dimensional point cloud, comprising the following substeps:
(3.1) removing point cloud clusters not connected to the hand point cloud;
(3.2) selecting three non-collinear points belonging to the desktop plane from the point cloud processed in step (3.1), and computing the plane equation of the desktop from these three points;
(3.3) using the desktop plane equation, removing the points lying on the side of the plane away from the hand;
(3.4) removing, from the point cloud processed in step (3.3), any remaining clusters not connected to the hand point cloud, obtaining the fine hand three-dimensional point cloud;
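Steps (3.2)-(3.3) amount to fitting a plane through three picked points and culling one half-space. A minimal sketch, where the function names and the optional margin parameter are assumptions, and a known on-hand point fixes which side of the desktop plane to keep:

```python
import numpy as np

def plane_from_points(p0, p1, p2):
    """Plane (n, d) with n.x + d = 0 through three non-collinear points,
    as in step (3.2) where the points are picked on the desktop."""
    n = np.cross(p1 - p0, p2 - p0)
    n = n / np.linalg.norm(n)
    return n, -np.dot(n, p0)

def keep_hand_side(points, n, d, hand_point, margin=0.0):
    """Discard points on the far side of the desktop plane (step 3.3).
    The sign of the hand side is fixed by a known point on the hand."""
    sign = np.sign(np.dot(n, hand_point) + d)
    return points[sign * (points @ n + d) > margin]
```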
(4) obtaining the user-personalized hand three-dimensional parametric model from the fine hand three-dimensional point cloud, comprising the following substeps:
(4.1) optimally solving the global rotation and translation parameters of the hand three-dimensional parametric model based on the fine hand three-dimensional point cloud and the two-dimensional fingertip positions projected onto the image plane;
(4.2) fixing the global rotation and translation parameters obtained in step (4.1), and optimally solving the shape and pose parameters of the hand three-dimensional parametric model based on the fine hand three-dimensional point cloud and the projected fingertip positions;
(4.3) removing part of the finger point cloud, further optimizing the shape parameters of the hand three-dimensional parametric model based on the processed hand point cloud, and finally obtaining the user-personalized hand three-dimensional parametric model.
2. The method of claim 1, wherein the three-dimensional mesh model is constructed and updated by the TSDF (truncated signed distance function) three-dimensional reconstruction algorithm.
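TSDF reconstruction maintains, per voxel, a truncated signed distance to the observed surface fused as a weighted running average across frames. A per-voxel sketch of that update rule; the truncation distance and the unit frame weight are generic choices, not values from the patent:

```python
import numpy as np

def tsdf_update(tsdf, weight, voxel_depth, surface_depth, trunc=0.05):
    """One TSDF fusion step (weighted running average). `voxel_depth` is
    each voxel's depth along its camera ray; `surface_depth` is the
    observed depth at the pixel the voxel projects to."""
    sdf = surface_depth - voxel_depth       # signed distance along the ray
    valid = sdf > -trunc                    # skip voxels far behind the surface
    d = np.clip(sdf / trunc, -1.0, 1.0)     # truncate and normalize to [-1, 1]
    new_tsdf = np.where(valid, (tsdf * weight + d) / (weight + 1), tsdf)
    new_weight = np.where(valid, weight + 1, weight)
    return new_tsdf, new_weight
```

The mesh of claim 1 would then be extracted from the zero level set of the fused TSDF volume (e.g. by marching cubes).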
3. The method for obtaining a hand three-dimensional parametric model from a depth image as claimed in claim 1, characterized in that the three-dimensional point clouds are registered by the ICP (iterative closest point) algorithm, and the camera pose is computed from the registered point clouds.
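One ICP iteration can be sketched as a nearest-neighbor matching step followed by the closed-form (Kabsch/SVD) rigid alignment. In a full system this would iterate to convergence and yield the camera pose of step (2.3); the brute-force matching below is for clarity only:

```python
import numpy as np

def icp_step(src, dst):
    """One ICP iteration: match each source point to its nearest
    destination point, then solve the best rigid transform (R, t)
    in closed form via the Kabsch/SVD method."""
    # nearest-neighbor correspondences (brute force for clarity)
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    matched = dst[d2.argmin(axis=1)]
    # closed-form rigid alignment of src onto matched
    mu_s, mu_m = src.mean(0), matched.mean(0)
    H = (src - mu_s).T @ (matched - mu_m)
    U, _, Vt = np.linalg.svd(H)
    # reflection guard keeps R a proper rotation
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = mu_m - R @ mu_s
    return R, t
```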
4. The method for obtaining a hand three-dimensional parametric model from a depth image as claimed in claim 1, wherein in step (3) the selection and removal of points is performed interactively in three-dimensional mesh visualization software such as MeshLab or OpenSCAD.
5. The method according to claim 1, wherein in steps (4.1) and (4.2) the hand three-dimensional point cloud is projected into an image under fixed camera parameters, and the two-dimensional fingertip positions are obtained by a fingertip detection algorithm, which may be implemented with the contour detection functionality of the OpenCV library.
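The claim leaves the fingertip detector unspecified beyond contour detection. The sketch below is a pure-NumPy stand-in (boundary pixels of the hand mask farthest from its centroid); an OpenCV implementation would instead use cv2.findContours plus convex-hull analysis, and the exact heuristic in the patent is not stated:

```python
import numpy as np

def fingertip_candidates(mask, k=5):
    """Fingertip candidates: boundary pixels of the binary hand mask
    farthest from the mask centroid. A simple stand-in for the
    OpenCV contour-based detection named in claim 5."""
    mask = mask.astype(bool)
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    # boundary: mask pixels with at least one background 4-neighbor
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    boundary = mask & ~interior
    by, bx = np.nonzero(boundary)
    d = (by - cy) ** 2 + (bx - cx) ** 2
    order = np.argsort(-d)
    return np.stack([bx[order[:k]], by[order[:k]]], axis=1)  # (x, y) pairs
```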
6. The method for obtaining a hand three-dimensional parametric model from a depth image as claimed in claim 1, wherein in step (4) the objective function for solving the optimal parameters comprises three parts: a point cloud matching error measuring the overall degree of match between the mesh facets q of the hand three-dimensional parametric model and the points p of the three-dimensional point cloud; a fingertip projection error measuring the distance between the model's two-dimensional fingertip projections ft_{q,j} and the point cloud's two-dimensional fingertip projections ft_{p,j}; and a prior error measuring the distance between the final shape parameters β and the shape parameters of the average hand β̄. The point cloud matching error, fingertip projection error, and prior error are computed respectively as:

E_cloud = Σ_p min_q ‖p − q‖²

E_tip = Σ_{j=1}^{5} ‖ft_{q,j} − ft_{p,j}‖²

E_prior = ‖β − β̄‖²

wherein j in the fingertip projection error formula indexes the 5 fingertips.
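The three-term objective of claim 6 can be written out directly. The weights w_tip and w_prior are hypothetical balancing coefficients not given in the claim, and the mesh facets q are approximated here by sampled model surface points:

```python
import numpy as np

def fitting_objective(model_pts, cloud_pts, model_tips2d, cloud_tips2d,
                      beta, beta_mean, w_tip=1.0, w_prior=0.1):
    """Three-term objective of claim 6: nearest-neighbor point cloud
    matching error, fingertip 2D projection error over the 5 fingertips,
    and a prior pulling shape parameters toward the average hand."""
    # E_cloud: each observed point to its nearest model point
    d2 = ((cloud_pts[:, None, :] - model_pts[None, :, :]) ** 2).sum(-1)
    e_cloud = d2.min(axis=1).sum()
    # E_tip: squared distance between projected fingertips, j = 1..5
    e_tip = ((model_tips2d - cloud_tips2d) ** 2).sum()
    # E_prior: squared distance of beta from the mean hand shape
    e_prior = ((beta - beta_mean) ** 2).sum()
    return e_cloud + w_tip * e_tip + w_prior * e_prior
```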
7. The method for obtaining a hand three-dimensional parametric model from a depth image as claimed in claim 1, wherein the shape parameters of the hand three-dimensional parametric model in step (4.3) are defined by applying principal component analysis to a shape space S built from a set of flat-pose hand shapes to extract the principal component vectors S_n; a personalized hand shape is expressed as a linear combination of the principal component vectors S_n, and the corresponding linear coefficients are the shape parameters β.
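The PCA shape space of claim 7 can be sketched with an SVD of the centered shape matrix. The data layout (one flat-pose hand mesh flattened per row) and the function name are assumptions; the patent does not state how the shape exemplars are collected:

```python
import numpy as np

def build_shape_space(hand_shapes, n_components):
    """PCA shape space: hand_shapes is (m, 3k), each row one flat-pose
    hand mesh flattened to a vector. Returns the mean shape, the
    principal component vectors S_n, and per-sample shape parameters;
    a personalized hand is reconstructed as mean + beta @ S."""
    mean = hand_shapes.mean(axis=0)
    centered = hand_shapes - mean
    # principal directions via SVD of the centered data matrix
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    S = Vt[:n_components]        # (n_components, 3k) component vectors S_n
    beta = centered @ S.T        # (m, n_components) shape parameters
    return mean, S, beta
```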
8. The method for obtaining a hand three-dimensional parametric model from a depth image as claimed in claim 1, wherein in step (4) the optimization is solved by the Adam gradient descent algorithm.
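A plain implementation of the Adam gradient descent named in claim 8; the learning rate, step count, and moment decay rates are the usual generic defaults, not values from the patent:

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.02, steps=2000,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: first/second moment estimates with bias correction,
    applied to a user-supplied gradient function."""
    x = x0.astype(float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g          # first moment
        v = beta2 * v + (1 - beta2) * g * g      # second moment
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x
```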
9. The method for obtaining a hand three-dimensional parametric model from a depth image as claimed in claim 1, wherein in step (4.3) the finger point cloud rejection rule removes those points whose two-dimensional projections lie more than 80 pixels from the two-dimensional projection of the root node of the hand three-dimensional parametric model.
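Claim 9's 80-pixel rejection rule is a single distance threshold in image space. A sketch, assuming points2d and root2d hold pixel coordinates after projection (the function name is illustrative):

```python
import numpy as np

def reject_distant_points(points2d, root2d, thresh=80.0):
    """Drop points whose 2D projections lie more than `thresh` pixels
    from the projected root node of the hand parametric model."""
    keep = np.linalg.norm(points2d - root2d, axis=1) <= thresh
    return points2d[keep]
```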
10. The method for obtaining a hand three-dimensional parametric model from a depth image as claimed in claim 1, wherein the hand three-dimensional parametric model obtained in step (4) achieves higher accuracy and adaptability in hand pose estimation, since it provides more user-specific prior information than a conventional model-free approach or a generic hand model.
CN202110595988.0A 2021-05-29 2021-05-29 Method for acquiring hand three-dimensional parametric model from depth image Active CN113205605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110595988.0A CN113205605B (en) 2021-05-29 2021-05-29 Method for acquiring hand three-dimensional parametric model from depth image


Publications (2)

Publication Number Publication Date
CN113205605A true CN113205605A (en) 2021-08-03
CN113205605B CN113205605B (en) 2022-04-19

Family

ID=77023610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110595988.0A Active CN113205605B (en) 2021-05-29 2021-05-29 Method for acquiring hand three-dimensional parametric model from depth image

Country Status (1)

Country Link
CN (1) CN113205605B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013029675A1 (en) * 2011-08-31 2013-03-07 Metaio Gmbh Method for estimating a camera motion and for determining a three-dimensional model of a real environment
CN107833270A (en) * 2017-09-28 2018-03-23 浙江大学 Real-time object dimensional method for reconstructing based on depth camera
CN109636831A (en) * 2018-12-19 2019-04-16 安徽大学 A method of estimation 3 D human body posture and hand information
CN111882659A (en) * 2020-07-21 2020-11-03 浙江大学 High-precision human body foot shape reconstruction method integrating human body foot shape rule and visual shell
CN112509117A (en) * 2020-11-30 2021-03-16 清华大学 Hand three-dimensional model reconstruction method and device, electronic equipment and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WENTAO WEI et al., "Surface-Electromyography-Based Gesture Recognition by Multi-View Deep Learning", IEEE Transactions on Biomedical Engineering *
WANG Liping et al., "A Survey of 3D Hand Gesture Pose Estimation Methods in Depth Images", Journal of Chinese Computer Systems *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363161A (en) * 2022-01-11 2022-04-15 中国工商银行股份有限公司 Abnormal equipment positioning method, device, equipment and medium
CN114363161B (en) * 2022-01-11 2024-03-22 中国工商银行股份有限公司 Abnormal equipment positioning method, device, equipment and medium
CN114463409A (en) * 2022-02-11 2022-05-10 北京百度网讯科技有限公司 Method and device for determining image depth information, electronic equipment and medium
CN114463409B (en) * 2022-02-11 2023-09-26 北京百度网讯科技有限公司 Image depth information determining method and device, electronic equipment and medium
US11783501B2 (en) 2022-02-11 2023-10-10 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for determining image depth information, electronic device, and media
CN117315092A (en) * 2023-10-08 2023-12-29 玩出梦想(上海)科技有限公司 Automatic labeling method and data processing equipment
CN117315092B (en) * 2023-10-08 2024-05-14 玩出梦想(上海)科技有限公司 Automatic labeling method and data processing equipment

Also Published As

Publication number Publication date
CN113205605B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
Sharp et al. ICP registration using invariant features
KR101902702B1 (en) Tooth axis estimation program, tooth axis estimation device and method of the same, tooth profile data creation program, tooth profile data creation device and method of the same
CN113205605B (en) Method for acquiring hand three-dimensional parametric model from depth image
CN110675487B (en) Three-dimensional face modeling and recognition method and device based on multi-angle two-dimensional face
CN110634161B (en) Rapid high-precision estimation method and device for workpiece pose based on point cloud data
CN109544606B (en) Rapid automatic registration method and system based on multiple Kinects
JPWO2009147904A1 (en) Finger shape estimation device, finger shape estimation method and program
JP2009020761A (en) Image processing apparatus and method thereof
JP2016099982A (en) Behavior recognition device, behaviour learning device, method, and program
JP6487642B2 (en) A method of detecting a finger shape, a program thereof, a storage medium of the program, and a system for detecting a shape of a finger.
CN110603570B (en) Object recognition method, device, system, and program
JP2017532695A (en) Method and system for scanning an object using an RGB-D sensor
CN111178170B (en) Gesture recognition method and electronic equipment
CN113555083B (en) Massage track generation method
CN108693958A (en) A kind of gesture identification method, apparatus and system
Pan et al. Automatic rigging for animation characters with 3D silhouette
CN111627043A (en) Simple human body curve acquisition method based on marker and feature filter
CN108010002A (en) A kind of structuring point cloud denoising method based on adaptive implicit Moving Least Squares
WO2019234293A1 (en) Measuring surface distances on human bodies
Schröder et al. Design and evaluation of reduced marker layouts for hand motion capture
KR20020073890A (en) Three - Dimensional Modeling System Using Hand-Fumble and Modeling Method
CN113674395B (en) 3D hand lightweight real-time capturing and reconstructing system based on monocular RGB camera
EP4155036A1 (en) A method for controlling a grasping robot through a learning phase and a grasping phase
JP2016162425A (en) Body posture estimation device, method, and program
JP2015114762A (en) Finger operation detection device, finger operation detection method, finger operation detection program, and virtual object processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant