WO2016145625A1 - 3D hand pose recovery from binocular imaging system - Google Patents

3D hand pose recovery from binocular imaging system

Info

Publication number
WO2016145625A1
Authority
WO
WIPO (PCT)
Prior art keywords
hand
image
matched
parts
hand part
Prior art date
Application number
PCT/CN2015/074447
Other languages
French (fr)
Inventor
Xiaoou Tang
Chen QIAN
Tak Wai HUI
Chen Change Loy
Original Assignee
Xiaoou Tang
Priority date
Filing date
Publication date
Application filed by Xiaoou Tang
Priority to PCT/CN2015/074447
Priority to CN201580077259.9A (CN108140243B)
Publication of WO2016145625A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G06T7/55 - Depth or shape recovery from multiple images
    • G06T7/593 - Depth or shape recovery from multiple images from stereo images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30196 - Human being; Person


Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are an apparatus, a method and a system for constructing a 3D hand model from a binocular imaging system. The apparatus may comprise a retrieving device configured to retrieve a hand region from a stereo frame comprising at least a first image and a second image; a segmenting device in electrical communication with the retrieving device and configured to segment one or more hand parts, each having feature points, from the retrieved hand region; an acquiring device electrically coupled with the segmenting device and configured to, for each segmented hand part, acquire a plurality of matched feature point pairs in which the feature points in the first image are matched with corresponding feature points in the second image; and a generating device in electrical communication with the acquiring device and configured to generate a 3D model of each hand part based on the matched feature point pairs of the hand part to construct the 3D hand model.

Description

3D HAND POSE RECOVERY FROM A BINOCULAR IMAGING SYSTEM
Technical Field
The present application generally relates to the field of body pose recognition and, more particularly, to an apparatus for constructing a 3D hand model from a binocular imaging system. The present application further relates to a method and a system for constructing a 3D hand model from a binocular imaging system.
Background
Recently, body pose recognition systems, especially hand pose recognition systems, have been applied in several applications, such as hand gesture control in human-computer interfaces (HCI) and sign language recognition. Conventional recovery of a 3D model from a stereo image is generally divided into two steps: extracting a 3D point cloud from the stereo image and then fitting the 3D point cloud to a 3D model.
However, traditional methods generally face the following problems. Firstly, the 2D features of one finger are hardly distinguishable from those of the other fingers; the resulting ambiguity in establishing the correspondence of the same 3D point across two or more images of a stereo pair affects the accuracy of the 3D reconstruction. Secondly, distinctive feature extraction and feature matching can hardly meet real-time requirements. Thirdly, a hand is a multi-body (articulated) object, so hand pose recovery is an ill-posed task when traditional single-model fitting is used. Fourthly, even if complex multi-body model fitting is used instead of a single model, it is a computationally intensive task.
Traditional methods that do not consider the unique characteristics of the human hand can hardly overcome these difficulties.
Summary
In view of the above, an apparatus, a system and a method are proposed to solve the aforementioned problems. With the apparatus, system and method, properties of the human hand are exploited by introducing the concept of hand parts to overcome the above difficulties. Therefore, the hand pose, including the 3D positions and directions of the fingers and palm, can be recovered in real time.
According to an embodiment of the present application, disclosed is an apparatus for constructing a 3D hand model. The apparatus may comprise a retrieving device configured to retrieve a hand region from a stereo frame comprising at least a first image and a second image; a segmenting device in electrical communication with the retrieving device and configured to segment one or more hand parts each consisting of a number of feature points from the retrieved hand region; an acquiring device electrically coupled with the segmenting device and configured to, for each segmented hand part, acquire a plurality of matched feature point pairs in which the feature points in the first image are matched with corresponding feature points in the second image; and a generating device in electrical communication with the acquiring device and configured to generate a 3D model of each hand part based on the matched feature point pairs of the hand parts to construct the 3D hand model.
According to an embodiment of the present application, disclosed is a method for constructing a 3D hand model. The method may comprise the following steps: retrieving a hand region from a stereo frame comprising at least a first image and a second image; segmenting, from the retrieved hand region, one or more hand parts each consisting of a number of feature points; acquiring, for each hand part, a plurality of matched feature point pairs in which the feature points in the first image are matched with the corresponding feature points in the second image; and generating a 3D model of each hand part based on the matched feature point pairs of the hand parts to construct the 3D hand model.
According to an embodiment of the present application, disclosed is a system for constructing a 3D hand model. The system may comprise a memory that stores executable components and a processor, electrically coupled to the memory to execute the executable components to retrieve a hand region from a stereo frame comprising at least a first image and a second image; segment one or more hand parts each consisting of a number of feature points from the retrieved hand region; acquire, for each hand part, a plurality of matched feature point pairs in which the feature points in the first image are matched with corresponding  feature points in the second image; and generate a 3D model of each hand part based on the matched feature point pairs of the hand parts to construct the 3D hand model.
The following description and the annexed drawings set forth certain illustrative aspects of the disclosure. These aspects are indicative, however, of but a few of the various ways in which the principles of the disclosure may be employed. Other aspects of the disclosure will become apparent from the following detailed description of the disclosure when considered in conjunction with the drawings.
Brief Description of the Drawing
Exemplary non-limiting embodiments of the present invention are described below with reference to the attached drawings. The drawings are illustrative and generally not to an exact scale. The same or similar elements on different figures are referenced with the same reference numbers.
Fig. 1 is a schematic diagram illustrating an apparatus for constructing a 3D hand model consistent with an embodiment of the present application.
Fig. 2 is a schematic diagram illustrating a segmenting device of the apparatus for constructing a 3D hand model consistent with some disclosed embodiments.
Fig. 3 is a schematic diagram illustrating a generating device of the apparatus for constructing a 3D hand model consistent with one embodiment of the present application.
Fig. 4 is a schematic diagram illustrating an example of a constructed 3D hand model consistent with one embodiment of the present application.
Fig. 5 is a schematic flowchart illustrating a method for constructing a 3D hand model consistent with some disclosed embodiments.
Fig. 6 is a schematic flowchart illustrating a step of segmenting of the method for constructing a 3D hand model consistent with some other disclosed embodiments.
Fig. 7 is a schematic flowchart illustrating a step of generating of the method for constructing a 3D hand model consistent with some other disclosed embodiments.
Fig. 8 is a schematic diagram illustrating a system for constructing a 3D hand model consistent with an embodiment of the present application.
Detailed Description
Reference will now be made in detail to some specific embodiments of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In other instances, well-known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a" , "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising, " when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc. ) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit, ” “module” or “system. ” Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium.
It is further understood that the use of relational terms such as first and second, and the like, if any, are used solely to distinguish one from another entity, item, or action without necessarily requiring or implying any actual such relationship or order between such entities, items or actions.
Much of the inventive functionality and many of the inventive principles when implemented, are best supported with or in software or integrated circuits (ICs) , such as a digital signal processor and software therefore or application specific ICs. It is expected that one of ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein will be readily capable of generating such software instructions or ICs with minimal experimentation. Therefore, in the interest of brevity and minimization of any risk of obscuring the principles and concepts according to the present invention, further discussion of such software and ICs, if any, will be limited to the essentials with respect to the principles and concepts used by the preferred embodiments.
Fig. 1 is a schematic diagram illustrating an exemplary apparatus 1000 for constructing a 3D hand model of a user from a binocular imaging system consistent with some disclosed embodiments. As shown, the apparatus 1000 may comprise a retrieving device 100, a segmenting device 200, an acquiring device 300 and a generating device 400.
In the embodiment shown in Fig. 1, the retrieving device 100 may retrieve a hand region from a stereo frame comprising at least a first image and a second image. In an embodiment, the retrieving device 100 may capture the stereo frame of the user’s hand from the binocular imaging system and retrieve the largest connected component of each of the images in the stereo frame as the hand region. Herein, a connected component refers to a region consisting of a set of adjacently located image points.
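As a rough illustration of this retrieval step (a minimal sketch only, not the patent's implementation; the OpenCV routines, the helper name and the threshold value are assumptions):

    import cv2
    import numpy as np

    def retrieve_hand_region(ir_image, threshold=40):
        # Binarize the IR image; only the illuminated hand should survive.
        _, binary = cv2.threshold(ir_image, threshold, 255, cv2.THRESH_BINARY)
        # Label connected components and keep the largest foreground one.
        num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
        if num_labels < 2:  # label 0 is the background
            return np.zeros_like(binary)
        largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
        return (labels == largest).astype(np.uint8) * 255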
The segmenting device 200 may be in communication with the retrieving device 100 and may segment one or more hand parts from the retrieved hand region, wherein each of the hand parts consists of a number of feature points, as will be described later in detail with reference to Fig. 2.
The acquiring device 300 may be electrically coupled with the segmenting  device 200. For each hand part, the acquiring device 300 may acquire a plurality of matched feature point pairs in which the feature points in the first image are matched with the corresponding feature points in the second image.
The generating device 400 may be in electrical communication with the acquiring device 300 and may generate a 3D model of each hand part based on the matched feature point pairs of the hand parts to construct the 3D hand model, as will be described later in detail with reference to Fig. 3.
With the apparatus 1000, the 3D positions and orientations of the fingers and palm of the user’s hand can be recovered in real time. Fig. 4 illustrates an example of a 3D hand model constructed according to one embodiment of the present application, wherein five circles and an ellipse represent the detected fingertips and palm of the user’s hand, respectively.
The binocular imaging system (also known as a stereo camera) may be, for example, an infra-red stereo camera. Hereinafter, each component of the apparatus 1000 will be described in detail in an exemplary embodiment in which an infra-red (IR) stereo camera with a brightness-adjustable IR light source is used to capture stereo images. In this way, only objects which are illuminated by the light source will be captured by the binocular imaging system. Note that the images may be captured by any other kind of imaging system and the present application is not limited thereto. For simplicity, the binocular imaging system is assumed to be calibrated; that is, image rectification is performed for every stereo image frame.
For the binocular imaging system, the stereo frame has at least two images, namely a left image I1 captured by the left camera and a right image I2 captured by the right camera. Hereinafter, the first and second images may refer to either of the left and right images in the stereo image frame (I1, I2), unless otherwise specifically stated.
Referring to Fig. 2, the segmenting device 200 may further comprise a chooser 201, an extractor 202 and a detector 203. In particular, the chooser 201 may choose a representative point for identifying each of the hand parts from the hand region, the extractor 202 may extract a connected component of each hand part according to the chosen representative point, and the detector 203 may detect the corresponding feature points of each hand part according to the extracted connected component, so as to segment at least one hand part with the detected feature points.
The segmenting device 200 may segment the hand region into a plurality of hand parts at least comprising, for example, five finger parts and a palm part. In order to identify a hand part, each hand part is assigned a representative point so as to distinguish it from the other hand parts. In an embodiment, the chooser 201 may use a geometric method, choosing the most protruding point in the hand region as the representative point for identifying a finger part. In another embodiment, the chooser 201 may use an intensity approach, choosing the point with the highest intensity (i.e., the brightest point) in the hand region as the representative point of a finger part. For the palm part, the chooser 201 may choose a center of the hand region as its representative point. Note that other properties of the hand can also be used to identify finger or palm parts according to different imaging systems and the present application is not limited thereto.
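For illustration, the two choosing strategies might be sketched as follows; the distance-transform definition of the hand-region center is an assumption (the patent only speaks of "a center of the hand region"):

    import cv2
    import numpy as np

    def brightest_point(ir_image, hand_mask):
        # Intensity approach: highest-intensity point inside the hand region.
        masked = np.where(hand_mask > 0, ir_image, 0)
        y, x = np.unravel_index(np.argmax(masked), masked.shape)
        return np.array([x, y])

    def palm_center(hand_mask):
        # One plausible hand-region center: the point deepest inside the
        # mask, i.e. the maximum of the distance transform.
        dist = cv2.distanceTransform(hand_mask, cv2.DIST_L2, 5)
        y, x = np.unravel_index(np.argmax(dist), dist.shape)
        return np.array([x, y])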
The extractor 202 may extract the connected component of each hand part according to its representative point. The connected component consists of a set of image points around the representative point of the hand part. For the palm part, the connected component is a set of image points around its representative point within the average palm radius. For the geometric approach, each connected component of a finger part is a set of image points around the protruding point within a distance not exceeding an average finger length. In another embodiment, the connected component of the finger part is a set of image points such that the image points at the same distance to the protruding point do not span more than an average finger width. In another embodiment, for finger parts identified using the intensity-based approach, the connected component is a set of image points around the brightest point such that the intensities of the image points at contour lines of the distance map (i.e., at equal distances from the brightest point) are lower than a certain threshold. The average palm radius, the average finger length and the average finger width are pre-determined.
In an embodiment, the segmenting device 200 may further comprise a remover 204 configured to remove the hand part, comprising the chosen representative point and the extracted connected component, from the hand region, such that the choosing, the extracting and the removing are performed repeatedly in the remaining hand region until no hand part needs to be removed. In this way, hand parts are segmented iteratively, with a single new hand part recovered from the remaining hand region in each iteration. In an embodiment, the chooser 201 may first choose the most protruding point as the representative point of a hand part. Then, the extractor 202 may extract the connected component around the protruding point within a distance not exceeding the predetermined average finger length. The currently searched hand part, comprising the chosen representative point and the extracted connected component, is then removed from the hand region. The processes of the chooser 201, the extractor 202 and the remover 204 are performed repeatedly in the remaining hand region until the whole hand region has been searched.
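A sketch of this iterative segment-and-remove loop is given below. Realizing the "within a distance" constraint as a breadth-first geodesic growth, and all function names, are assumptions rather than details given in the patent:

    from collections import deque
    import numpy as np

    def extract_component(mask, seed, max_dist):
        # Grow a connected set of mask points around `seed`, stopping at a
        # geodesic distance of `max_dist` pixels (e.g. the average finger length).
        h, w = mask.shape
        comp = np.zeros_like(mask, dtype=bool)
        comp[seed[1], seed[0]] = True
        queue = deque([(seed, 0)])
        while queue:
            (x, y), d = queue.popleft()
            if d >= max_dist:
                continue
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < w and 0 <= ny < h and mask[ny, nx] and not comp[ny, nx]:
                    comp[ny, nx] = True
                    queue.append(((nx, ny), d + 1))
        return comp

    def segment_hand_parts(hand_mask, choose_point, avg_finger_len):
        # Repeat choose -> extract -> remove until the region is exhausted.
        remaining = hand_mask.astype(bool).copy()
        parts = []
        while remaining.any():
            seed = choose_point(remaining)  # e.g. the most protruding point
            if seed is None:
                break
            comp = extract_component(remaining, seed, avg_finger_len)
            parts.append((seed, comp))
            remaining &= ~comp              # remove the searched hand part
        return parts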
The detector 203 may detect the corresponding feature points of each of the hand parts according to each connected component extracted by the extractor 202. The feature points should be distributed widely enough to cover the whole hand part and be discriminative, so that the 2D image projections of different 3D points are distinguishable from each other. In an embodiment, the image points located on the boundary of the connected component of a hand part are used as the feature points of the hand part.
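As one possible realization (the use of cv2.findContours is an assumption; the patent does not name a routine), the boundary feature points can be read off the component's contour:

    import cv2
    import numpy as np

    def boundary_feature_points(component_mask):
        # Return the 2D image points on the boundary of a hand-part mask.
        contours, _ = cv2.findContours(component_mask.astype(np.uint8),
                                       cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        if not contours:
            return np.empty((0, 2), dtype=np.int32)
        return np.vstack([c.reshape(-1, 2) for c in contours])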
In an embodiment, the segmenting device 200 may further comprise a validator 205. To validate the correctness of a segmented hand part, the validator 205 is configured to validate whether the segmented hand part is a finger part according to the extracted connected component of the hand part. If it does not belong to a finger part, the segmented hand part is considered as a part of the palm. A length-to-width ratio, defined as the ratio of the length to the width of the hand part, is used to determine whether the current hand part is a valid finger. The length and the width of the hand part are provided by the connected component extracted by the extractor 202. Note that other properties relating to the representative point can also be used to provide useful cues to facilitate the validation of the hand part and the present application is not limited thereto.
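A minimal sketch of this validation; the ratio threshold of 1.5 is a placeholder, since the patent does not give a value:

    def is_valid_finger(length, width, min_ratio=1.5):
        # A segmented part counts as a finger if it is sufficiently elongated;
        # otherwise it is merged into the palm part.
        if width <= 0:
            return False
        return (length / width) >= min_ratio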
Referring to Fig. 1 again, the acquiring device 300 is configured to acquire a matched hand part pair in which each hand part in the first image is matched with a hand part in the second image. In an embodiment, the acquiring device 300 may further acquire the matched feature point pairs in each matched hand part.
For a set of hand parts H = (F1, F2, ..., F5, P) segmented by the segmenting device 200, the first five components represent the five finger parts and the last one represents the palm part. Herein, for the palm part P, the center of the hand region is chosen as the representative point p_p. The acquiring device 300 may acquire only the matched finger part pairs, in which each finger part in the first image is matched with a finger part in the second image according to the representative point of the hand part.
In particular, for the stereo frame $(I_1, I_2)$, each finger part $(F_1)_i$ in the first image $I_1$ is matched to a finger part $(F_2)_j$ in the second image $I_2$ by measuring the difference between the distance of the representative point of $(F_1)_i$ relative to the palm center $p_{p1}$ and the distance of the representative point of $(F_2)_j$ relative to the palm center $p_{p2}$, as follows:

$$ j^{*} = \arg\min_{j} \left| \, \left\| (p_{f1})_i - p_{p1} \right\| - \left\| (p_{f2})_j - p_{p2} \right\| \, \right| \qquad (1) $$

where $(p_{f1})_i$ and $(p_{f2})_j$ represent the $i$-th and the $j$-th representative points of the finger parts $(F_1)_i$ and $(F_2)_j$ in $I_1$ and $I_2$, respectively, and $i, j = 1, 2, \ldots, 5$.
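A sketch of this matching step under the reconstruction of formula (1) above; the greedy one-to-one assignment is an implementation assumption:

    import numpy as np

    def match_finger_parts(reps1, palm1, reps2, palm2):
        # reps1, reps2: (N, 2) arrays of finger representative points in I1, I2;
        # palm1, palm2: the palm centers p_p1 and p_p2.
        d1 = np.linalg.norm(reps1 - palm1, axis=1)  # ||(p_f1)_i - p_p1||
        d2 = np.linalg.norm(reps2 - palm2, axis=1)  # ||(p_f2)_j - p_p2||
        cost = np.abs(d1[:, None] - d2[None, :])    # the measure in formula (1)
        pairs, used = [], set()
        for i in range(len(d1)):
            candidates = [j for j in range(len(d2)) if j not in used]
            if not candidates:
                break
            j = min(candidates, key=lambda k: cost[i, k])
            used.add(j)
            pairs.append((i, j))
        return pairs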
From the matched hand parts, the acquiring device 300 may further acquire the matched feature point pairs, that is, the 2D image points and the hand part labels associated with the 2D image points. Herein, the disparities $d(p_H)$ of all the feature points $x = (x, y)^T$ in the same hand part $H$ are assumed to be the same as that of the associated representative point $p_H$. In other words, for the stereo frame $(I_1, I_2)$, the correspondence of a feature point $x_1$ in the first image $I_1$ is defined to be

$$ \hat{x}_2 = x_1 + \left( d(p_H), \, 0 \right)^T \qquad (2) $$

in the second image $I_2$. The disparity may be provided by the generating device 400, which will be described later. After rejecting some of the impossible correspondences, the optimal matched feature point $x_2$ for $x_1$ is defined as follows:

$$ x_2 = \arg\min_{x'_2} \left\| x'_2 - \hat{x}_2 \right\| \qquad (3) $$

where the $x'_2$'s are the 2D image positions of the feature points around $\hat{x}_2$. Then, the acquiring device 300 acquires the matched feature point pair $(x_1, x_2)$.
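Illustratively, formulas (2) and (3) amount to predicting each correspondence with the part's shared disparity and snapping it to the nearest feature point in the second image; the rejection radius below is an assumed parameter:

    import numpy as np

    def match_feature_points(feats1, feats2, part_disparity, max_offset=5.0):
        # feats1, feats2: (N, 2) / (M, 2) feature points of a matched part in I1 / I2;
        # part_disparity: the disparity d(p_H) of the part's representative point,
        # assumed shared by all feature points of the part.
        matches = []
        for x1 in feats1:
            x2_hat = x1 + np.array([part_disparity, 0.0])   # formula (2)
            dists = np.linalg.norm(feats2 - x2_hat, axis=1)
            j = int(np.argmin(dists))
            if dists[j] <= max_offset:   # reject impossible correspondences
                matches.append((x1, feats2[j]))             # formula (3)
        return matches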
Referring to Fig. 3, the generating device 400 may comprise an establisher 401, a determiner 402 and a fitter 403. The establisher 401 may establish a 3D point cloud for the first image and the second image from the matched feature point pairs of each hand part. Each of the matched feature point pairs may comprise hand part labels associated with the 2D coordinates of the feature points. The determiner 402 may determine whether the established 3D point cloud of a hand part belongs to a finger part or not according to the hand part label. The fitter 403 may fit each established 3D point cloud with a specific 3D model according to the hand part label associated with the 3D point cloud.
For the matched feature point pair $(x_1, x_2)$, the depth $Z(x_1, x_2)$ is defined as follows:

$$ Z(x_1, x_2) = \frac{f \, b}{d} \qquad (4) $$

where $d = x_2 - x_1$ represents the disparity for the matched feature point pair $(x_1, x_2)$, and $f$ and $b$ represent the focal length and the baseline of the stereo camera after rectification, respectively.

Therefore, the establisher 401 establishes the 3D point cloud, such that the 3D position $X_1$ with respect to the camera center associated with $I_1$ is defined as follows:

$$ X_1 = \frac{Z(x_1, x_2)}{f} \left( x_1, \, y_1, \, f \right)^T $$
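A sketch of formula (4) and the back-projection; image coordinates are assumed to be measured relative to the principal point, and taking the absolute disparity is an assumption about sign handling:

    import numpy as np

    def triangulate(x1, x2, f, b):
        # x1, x2: matched 2D points (relative to the principal point);
        # f: focal length in pixels; b: baseline after rectification.
        d = x2[0] - x1[0]                 # disparity d = x2 - x1
        Z = f * b / abs(d)                # formula (4)
        X1 = (Z / f) * np.array([x1[0], x1[1], f])  # 3D point w.r.t. camera of I1
        return Z, X1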
Then, the determiner 402 may determine whether the established 3D point cloud of one hand part belongs to a finger part or not according to the hand part label, such that the fitter 403 may fit the established 3D point cloud with a specific 3D model.
If the established 3D point cloud of a hand part is determined as belonging to a finger, a 3D finger model fitting is performed by the fitter 403. Herein, a finger is modeled as a cylinder in 3D space, which is further simplified to a line segment. The line segment can be parameterized by the finger length $L$, the 3D coordinates of the fingertip $P_f$ and a unit direction vector $\hat{v}$ of the finger, wherein $L$ may be pre-determined by the segmenting device 200. The parameters $P_f$ and $\hat{v}$ may be initialized, and their optimal values can be obtained by using a gradient descent optimization to minimize the total distance from all 3D feature points of the finger part to the line segment. Therefore, a cost function $E(P_f, \hat{v})$ is defined as follows:

$$ E(P_f, \hat{v}) = \sum_{i} \left\| \left( (X_f)_i - P_f \right) - \left( \left( (X_f)_i - P_f \right) \cdot \hat{v} \right) \hat{v} \right\|^2 \qquad (5) $$

where $P_f$ represents the 3D coordinates of the fingertip of the finger part; $\hat{v}$ represents the unit direction vector of the finger part; and $(X_f)_i$ represents the $i$-th point of the 3D point cloud for the finger part. From this, the 3D finger model of the finger part is constructed accordingly.
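The fitting of formula (5) could be sketched as plain gradient descent on the fingertip and direction; the central-difference gradients, step size and iteration count are implementation assumptions:

    import numpy as np

    def finger_cost(params, points):
        # params = [P_f (3 values), v (3 values)]; v is normalized inside.
        p_f, v = params[:3], params[3:]
        v = v / np.linalg.norm(v)
        r = points - p_f                      # vectors from the fingertip
        perp = r - np.outer(r @ v, v)         # residuals perpendicular to the line
        return np.sum(perp ** 2)              # formula (5)

    def fit_finger(points, p0, v0, lr=1e-3, iters=200, eps=1e-5):
        params = np.concatenate([p0, v0 / np.linalg.norm(v0)])
        for _ in range(iters):
            grad = np.zeros(6)
            for k in range(6):                # central-difference gradient
                dp = np.zeros(6)
                dp[k] = eps
                grad[k] = (finger_cost(params + dp, points)
                           - finger_cost(params - dp, points)) / (2 * eps)
            params -= lr * grad
        p_f, v = params[:3], params[3:]
        return p_f, v / np.linalg.norm(v)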
On the other hand, if the established 3D point cloud of a hand part is determined as belonging to the palm, a 3D palm model fitting is performed by the fitter 403. Herein, the palm is modeled as a 3D circle, parameterized by a palm center $C_p$, a radius $r$ and a surface unit normal $\hat{n}$. After the palm center $C_p$, the radius $r$ and the unit normal $\hat{n}$ are initialized, a gradient descent optimization is performed on $E(C_p, \hat{n})$ to minimize the total distance from all 3D points to the 3D circle and its variance. A cost function $E(C_p, \hat{n})$ is defined as follows:

$$ E(C_p, \hat{n}) = \sum_{i} \left( \left( (X_p)_i - C_p \right) \cdot \hat{n} \right)^2 + \lambda \sum_{i} \left( \left\| (X_p)_i - C_p \right\| - \bar{r} \right)^2 \qquad (6) $$

where $(X_p)_i$ represents the $i$-th point of the 3D point cloud for the palm part; $\bar{r}$ represents the mean distance of the $(X_p)_i$'s to the palm center $C_p$; and $\lambda$ represents an adjustment factor. After that, the radius $r$ is re-estimated according to the calculated $C_p$.
Then, the above two steps are performed iteratively such that a final 3D hand model can be obtained.
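Under the reconstruction of formula (6) given above, the palm fitting and the radius re-estimation could be sketched as follows; the cost form and the numerical gradient descent follow the surrounding text but remain assumptions:

    import numpy as np

    def palm_cost(params, points, lam=1.0):
        # params = [C_p (3 values), n (3 values)]; n is normalized inside.
        c_p, n = params[:3], params[3:]
        n = n / np.linalg.norm(n)
        r_vec = points - c_p
        out_of_plane = r_vec @ n                  # distance to the palm plane
        radial = np.linalg.norm(r_vec, axis=1)    # distances to the palm center
        return (np.sum(out_of_plane ** 2)
                + lam * np.sum((radial - radial.mean()) ** 2))  # formula (6)

    def fit_palm(points, c0, n0, lr=1e-3, iters=200, eps=1e-5):
        params = np.concatenate([c0, n0 / np.linalg.norm(n0)])
        for _ in range(iters):
            grad = np.zeros(6)
            for k in range(6):                    # central-difference gradient
                dp = np.zeros(6)
                dp[k] = eps
                grad[k] = (palm_cost(params + dp, points)
                           - palm_cost(params - dp, points)) / (2 * eps)
            params -= lr * grad
        c_p, n = params[:3], params[3:]
        radius = np.linalg.norm(points - c_p, axis=1).mean()  # re-estimate r
        return c_p, n / np.linalg.norm(n), radius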
Fig. 5 is a flowchart illustrating a method 2000 for constructing a 3D hand model, and Figs. 6 and 7 are flowcharts respectively illustrating the segmenting step S502 and the generating step S504 of the method shown in Fig. 5. Hereinafter, the method 2000 will be described in detail with reference to Figs. 5-7.
As shown in Fig. 5, at step S501, a hand region may be retrieved from a stereo frame comprising at least a first image and a second image. At step S502, one or more hand parts, each consisting of a number of feature points, may be segmented from the hand region retrieved at step S501. Then, at step S503, for each hand part, a plurality of matched feature point pairs, in which the feature points in the first image are matched with the corresponding feature points in the second image, are acquired. After that, at step S504, a 3D model of each hand part may be generated based on the matched feature point pairs of the hand parts to construct the 3D hand model.
In an embodiment, the step S502 shown in Fig. 5 may further comprise steps S5021 to S5023 as shown in Fig. 6. Referring to Fig. 6, at S5021, a representative point for identifying each of the hand parts is chosen from the hand region. At S5022, the connected component of each of the hand parts is extracted according to the chosen representative point. Then, at S5023, the corresponding feature points of each of the hand parts are detected according to the extracted connected component, so as to segment at least one hand part with the detected feature points.
In an embodiment, the step S503 further comprises a step of acquiring a matched hand part pair, in which each hand part in the first image is matched with a hand part in the second image according to the representative point of the hand part, and a step of acquiring the matched feature point pairs in each matched hand part.
In an embodiment, the step S504 shown in Fig. 5 further comprises steps S5041 to S5044 as shown in Fig. 7. Each matched feature point pair of a hand part acquired at step S503 may comprise hand part labels associated with the 2D coordinates of the feature points. At step S5041, a 3D point cloud is established for the first image and the second image from the matched feature point pairs of each hand part. At step S5042, whether the established 3D point cloud of a hand part belongs to a finger part or not is determined according to the hand part label. If it is determined that the hand part is a finger part, then, at step S5043, a 3D finger model fitting process is performed, which may be governed by the above-mentioned formula (5). If not, at step S5044, a 3D palm model fitting process is performed, which may be governed by the above-mentioned formula (6).
Fig. 8 illustrates a system 3000 for constructing a 3D hand model consistent with an embodiment of the present application. Referring to Fig. 8, the system 3000 comprises a memory 3001 that stores executable components and a processor 3002, electrically coupled to the memory 3001 to execute the executable components to perform operations of the system 3000. The executable components may comprise: a retrieving component 3003 configured to retrieve a hand region from a stereo frame comprising at least a first image and a second image; a segmenting component 3004 configured to segment one or more hand parts each consisting of a number of feature points from the retrieved hand region; an acquiring component 3005 configured to, for each hand part, acquire a plurality of matched feature point pairs in which the feature points in the first image are matched with corresponding feature points in the second image; and a generating component 3006 configured to generate a 3D model of each hand part based on the matched feature point pairs of the hand parts to construct the 3D hand model. The functions of the components 3003 to 3006 are similar to those of the devices 100 to 400, respectively, and thus the detailed descriptions thereof are omitted herein.
Although the preferred examples of the present invention have been described, those skilled in the art can make variations or modifications to these examples upon learning the basic inventive concept. The appended claims are intended to be construed as covering the preferred examples and all the variations or modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make variations or modifications to the present invention without departing from the spirit and scope of the present invention. As such, if these variations or modifications belong to the scope of the claims and equivalent techniques, they also fall within the scope of the present invention.

Claims (19)

  1. An apparatus for constructing a 3D hand model comprising:
    a retrieving device configured to retrieve a hand region from a stereo frame comprising at least a first image and a second image;
    a segmenting device in electrical communication with the retrieving device and configured to segment one or more hand parts each consisting of a number of feature points from the retrieved hand region;
    an acquiring device electrically coupled with the segmenting device and configured to, for each segmented hand part, acquire a plurality of matched feature point pairs in which the feature points in the first image are matched with corresponding feature points in the second image; and
    a generating device in electrical communication with the acquiring device and configured to generate a 3D model of each hand part based on the matched feature point pairs of the hand parts to construct the 3D hand model.
  2. The apparatus of claim 1, wherein the segmenting device further comprises:
    a chooser configured for choosing a representative point for identifying each of the hand parts from the hand region;
    an extractor configured for extracting a connected component of each of the hand parts according to the chosen representative point;
    a detector configured for detecting feature points of each of the hand parts according to the extracted connected component to segment at least one hand part with the detected feature points.
  3. The apparatus of claim 2, wherein the segmenting device further comprises:
    a remover configured for removing the hand part comprising the chosen representative point and the extracted connected component from the hand region, wherein the choosing, the extracting and the removing are performed repeatedly in the remaining hand region until no hand part needs to be removed from the hand region.
  4. The apparatus of claim 2, wherein the segmenting device further comprises:
    a validator configured to validate whether the segmented hand part is a finger part according to the extracted connected component.
  5. The apparatus of claim 2, wherein the hand parts at least comprise a plurality of finger parts, and wherein
    the most protruding point in the hand region is chosen as the representative point for identifying a finger part, and
    the connected component of one of the finger parts is a set of image points around the representative point within a distance not exceeding a predetermined finger length.
  6. The apparatus of claim 2, wherein the hand parts at least comprise a palm part, and wherein
    a center of the hand region is chosen as the representative point for identifying the palm part, and
    the connected component of the palm part is a set of image points around the representative point within a predetermined palm radius.
  7. The apparatus of claim 2, wherein the acquiring device is further configured to acquire a matched hand part pair in which each hand part in the first image is matched with a hand part in the second image, according to the representative point of each hand part; and acquire the matched feature point pairs in each matched hand part.
  8. The apparatus of claim 1, wherein each of the matched feature point pairs comprises hand part labels associated with 2D coordinates of the feature points, and the generating device further comprises:
    an establisher configured to establish a 3D point cloud for the first image and the second image from the matched feature point pairs of each hand part;
    a determiner configured to determine whether the established 3D point cloud of a hand part belongs to a finger part or not according to the hand part label; and
    a fitter configured to fit the established 3D point cloud with a specific 3D model according to the hand part label associated with the 3D point cloud.
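The establisher of claim 8 reduces to standard stereo triangulation once the pairs are known. The sketch below assumes a rectified rig with the principal point at the image origin, so depth follows Z = fB/d; the focal length and baseline are calibration inputs, not claim elements. The determiner and fitter would then branch on the hand part label, for instance fitting an axis to a finger cloud and a plane to the palm cloud, a choice this sketch does not fix.

```python
import numpy as np

def triangulate_part(pairs, focal_px, baseline):
    # Establisher of claim 8: lift matched feature point pairs, given as
    # ((u1, v1), (u2, v2)) 2D coordinates, into a 3D point cloud.
    cloud = []
    for (u1, v1), (u2, v2) in pairs:
        disparity = u1 - u2
        if disparity <= 0:          # discard degenerate or mismatched pairs
            continue
        z = focal_px * baseline / disparity
        cloud.append((u1 * z / focal_px, v1 * z / focal_px, z))
    return np.asarray(cloud)
```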
  9. The apparatus of claim 1, wherein the retrieving device is further configured to capture the stereo frame of a user’s hand from a binocular image system and retrieve the largest connected component of each of the first and the second images as the hand region.
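The retrieval step of claim 9 is a standard largest-connected-component pass. A sketch using OpenCV's connected-components routine is given below; the claim does not prescribe a library, and binarisation of the captured frame is assumed to have happened upstream.

```python
import cv2
import numpy as np

def retrieve_hand_region(binary_image):
    # Claim 9: keep the largest connected component of the binarised
    # (8-bit, single-channel) frame as the hand region.
    count, labels, stats, _ = cv2.connectedComponentsWithStats(binary_image)
    if count <= 1:                  # label 0 is the background
        return np.zeros_like(binary_image)
    # Skip the background row when searching for the largest area.
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return (labels == largest).astype(np.uint8)
```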
  10. A method for constructing a 3D hand model comprising:
    retrieving a hand region from a stereo frame comprising at least a first image and a second image;
    segmenting, from the retrieved hand region, one or more hand parts each consisting of a number of feature points;
    acquiring, for each hand part, a plurality of matched feature point pairs in which the feature points in the first image are matched with the corresponding feature points in the second image; and
    generating a 3D model of each hand part based on the matched feature point pairs of the hand parts to construct the 3D hand model.
  11. The method of claim 10, wherein the segmenting further comprises:
    choosing a representative point for identifying each of the hand parts from the hand region;
    extracting a connected component of each of the hand parts according to the chosen representative point; and
    detecting the feature points of each of the hand parts according to the extracted connected component to segment at least one hand part with the detected feature points.
  12. The method of claim 11, wherein the segmenting further comprises:
    removing the hand part comprising the chosen representative point and the extracted connected component from the hand region, wherein the choosing, the extracting and the removing are performed repeatedly on the remaining hand region until no hand part needs to be removed from the hand region.
  13. The method of claim 11, wherein the segmenting further comprises:
    validating whether the segmented hand part is a finger part according to the extracted connected component.
  14. The method of claim 11, wherein the hand parts at least comprise a plurality of finger parts, and wherein
    the most protruding point in the hand region is chosen as the representative point for identifying a finger part, and
    the connected component of one of the finger parts is a set of image points around the representative point within a distance not exceeding a predetermined finger length.
  15. The method of claim 11, wherein the hand parts at least comprise a palm part, and wherein
    a center of the hand region is chosen as the representative point for identifying the palm part, and
    the connected component of the palm part is a set of image points around the representative point within a predetermined palm radius.
  16. The method of claim 11, wherein the acquiring further comprises:
    acquiring a matched hand part pair in which each hand part in the first image is matched with a hand part in the second image, according to the representative point of each hand part; and
    acquiring the matched feature point pairs in each matched hand part.
  17. The method of claim 10, wherein each of the matched feature point pairs comprises hand part labels associated with 2D coordinates of the feature points, and the generating further comprises:
    establishing a 3D point cloud for the first image and the second image from the matched feature point pairs of each hand part;
    determining whether the established 3D point cloud of a hand part belongs to a finger part or not according to the hand part label; and
    fitting the established 3D point cloud with a specific 3D model according to the hand part label associated with the 3D point cloud.
  18. The method of claim 10, further comprising:
    capturing the stereo frame of a user’s hand from a binocular image system; and
    retrieving the largest connected component of each of the first and the second images as the hand region.
  19. A system for constructing a 3D hand model, comprising:
    a memory that stores executable components; and
    a processor, electrically coupled to the memory to execute the executable components, to:
    retrieve a hand region from a stereo frame comprising at least a first image and a second image;
    segment one or more hand parts each consisting of a number of feature points from the retrieved hand region;
    acquire, for each hand part, a plurality of matched feature point pairs in which the feature points in the first image are matched with corresponding feature points in the second image; and
    generate a 3D model of each hand part based on the matched feature point pairs of the hand parts to construct the 3D hand model.

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
PCT/CN2015/074447 (WO2016145625A1, en) | 2015-03-18 | 2015-03-18 | 3d hand pose recovery from binocular imaging system
CN201580077259.9A (CN108140243B, en) | 2015-03-18 | 2015-03-18 | Method, device and system for constructing 3D hand model

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
PCT/CN2015/074447 (WO2016145625A1, en) | 2015-03-18 | 2015-03-18 | 3d hand pose recovery from binocular imaging system

Publications (1)

Publication Number | Publication Date
WO2016145625A1 (en) | 2016-09-22

Family

ID=56919552

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
PCT/CN2015/074447 (WO2016145625A1, en) | 3d hand pose recovery from binocular imaging system | 2015-03-18 | 2015-03-18

Country Status (2)

Country | Link
CN (1) | CN108140243B (en)
WO (1) | WO2016145625A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party

Publication Number | Priority Date | Publication Date | Assignee | Title
WO2007130122A2 * | 2006-05-05 | 2007-11-15 | Thomson Licensing | System and method for three-dimensional object reconstruction from two-dimensional images
US20110316963A1 * | 2008-12-30 | 2011-12-29 | Huawei Device Co., Ltd. | Method and device for generating 3d panoramic video streams, and videoconference method and device
CN102164233A * | 2009-12-25 | 2011-08-24 | 卡西欧计算机株式会社 | Imaging device and 3d modeling data creation method
CN102208116A * | 2010-03-29 | 2011-10-05 | 卡西欧计算机株式会社 | 3D modeling apparatus and 3D modeling method

Family Cites Families (6)

* Cited by examiner, † Cited by third party

Publication Number | Priority Date | Publication Date | Assignee | Title
US6873723B1 * | 1999-06-30 | 2005-03-29 | Intel Corporation | Segmenting three-dimensional video images using stereo
CN101038671A * | 2007-04-25 | 2007-09-19 | 上海大学 | Tracking method of three-dimensional finger motion locus based on stereo vision
CN101763636B * | 2009-09-23 | 2012-07-04 | 中国科学院自动化研究所 | Method for tracing position and pose of 3D human face in video sequence
CN101720047B * | 2009-11-03 | 2011-12-21 | 上海大学 | Method for acquiring range image by stereo matching of multi-aperture photographing based on color segmentation
CN102982557B * | 2012-11-06 | 2015-03-25 | 桂林电子科技大学 | Method for processing space hand signal gesture command based on depth camera
CN103714345B * | 2013-12-27 | 2018-04-06 | Tcl集团股份有限公司 | Method and system for detecting the spatial position of a fingertip by binocular stereo vision

Also Published As

Publication Number | Publication Date
CN108140243A (en) | 2018-06-08
CN108140243B (en) | 2022-01-11

Similar Documents

Publication Publication Date Title
US9286694B2 (en) Apparatus and method for detecting multiple arms and hands by using three-dimensional image
JP6125188B2 (en) Video processing method and apparatus
WO2015161816A1 (en) Three-dimensional facial recognition method and system
CN104933389B (en) Identity recognition method and device based on finger veins
US9020251B2 (en) Image processing apparatus and method
JP2010176380A (en) Information processing device and method, program, and recording medium
EP3345123B1 (en) Fast and robust identification of extremities of an object within a scene
JP2007249592A (en) Three-dimensional object recognition system
JP2016014954A (en) Method for detecting finger shape, program thereof, storage medium of program thereof, and system for detecting finger shape
Park et al. Hand detection and tracking using depth and color information
JP2014170368A (en) Image processing device, method and program and movable body
Raheja et al. Hand gesture pointing location detection
KR20170053807A (en) A method of detecting objects in the image with moving background
JP2018081402A5 (en)
KR20170023565A (en) method for finger counting by using image processing and apparatus adopting the method
CN106406507B (en) Image processing method and electronic device
JP6393495B2 (en) Image processing apparatus and object recognition method
KR101339616B1 (en) Object detection and tracking method and device
Wang et al. Skin Color Weighted Disparity Competition for Hand Segmentation from Stereo Camera.
WO2016145625A1 (en) 3d hand pose recovery from binocular imaging system
JP2012003724A (en) Three-dimensional fingertip position detection method, three-dimensional fingertip position detector and program
Dehankar et al. Using AEPI method for hand gesture recognition in varying background and blurred images
Li et al. Algorithm of fingertip detection and its improvement based on kinect
KR101706674B1 (en) Method and computing device for gender recognition based on long distance visible light image and thermal image
KR20160062913A (en) System and Method for Translating Sign Language for Improving the Accuracy of Lip Motion Device

Legal Events

Code | Description | Reference data
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15885008; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
122 | Ep: pct application non-entry in european phase | Ref document number: 15885008; Country of ref document: EP; Kind code of ref document: A1