CN110348359A - Method, apparatus and system for hand gesture tracking - Google Patents

Method, apparatus and system for hand gesture tracking Download PDF

Info

Publication number
CN110348359A
CN110348359A CN201910599290.9A
Authority
CN
China
Prior art keywords
hand
target
hand gestures
depth image
gestures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910599290.9A
Other languages
Chinese (zh)
Other versions
CN110348359B (en)
Inventor
齐越
车云龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Beijing University of Aeronautics and Astronautics
Original Assignee
Beijing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Aeronautics and Astronautics filed Critical Beijing University of Aeronautics and Astronautics
Priority to CN201910599290.9A priority Critical patent/CN110348359B/en
Publication of CN110348359A publication Critical patent/CN110348359A/en
Application granted granted Critical
Publication of CN110348359B publication Critical patent/CN110348359B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm

Abstract

The present invention provides a method, apparatus and system for hand gesture tracking. The method includes obtaining a depth image containing a hand region, and obtaining a target hand pose from the depth image through a target network model. The target network model is a pre-trained learning model used to perform feature extraction and feature recognition on the depth image to obtain an initial target hand pose, and to optimize the initial target hand pose in order to output the target hand pose, thereby improving the accuracy and efficiency of pose tracking. Real-time hand pose tracking can be achieved without high-end computing resources such as a GPU (Graphics Processing Unit).

Description

Method, apparatus and system for hand gesture tracking
Technical field
The present invention relates to the technical field of computer vision, and in particular to a method, apparatus and system for hand gesture tracking.
Background art
With the popularization of depth sensors and the demands of the field of human-computer interaction, hand pose recognition and tracking based on depth data has become an indispensable part of novel human-computer interaction technology, with a wide range of applications such as gesture-based remote control of unmanned aerial vehicles, gesture control of domestic robots, motion-sensing games and assisted medical surgery. Compared with traditional hand pose estimation based on RGB images, depth data can provide three-dimensional distance information of the hand.
Hand pose recognition in existing human-computer interaction requires the user's hand to be parallel to the camera imaging plane; however, a person's hand often forms a certain angle with the horizontal plane, so the accuracy of hand pose recognition and tracking is not high.
Moreover, most existing hand tracking methods use a fixed hand model. Different users have hands of different sizes, and using a hand model of fixed shape reduces tracking accuracy; sometimes hand shape calibration must be performed in advance for different users, which is cumbersome, makes pose tracking inefficient, and degrades the user experience.
Summary of the invention
The present invention provides a method, apparatus and system for hand gesture tracking, so as to improve the accuracy and efficiency of pose tracking; real-time pose tracking can be achieved without high-end computing resources such as a GPU (Graphics Processing Unit).
In a first aspect, an embodiment of the present invention provides a method for hand gesture tracking, comprising:
obtaining a depth image containing a hand region;
obtaining a target hand pose through a target network model according to the depth image; wherein the target network model is a pre-trained learning model used to perform feature extraction and feature recognition on the depth image to obtain an initial target hand pose, and to optimize the initial target hand pose to output the target hand pose.
In one possible design, obtaining the depth image containing the hand region comprises: capturing the depth image containing the hand region with a depth camera.
In one possible design, before obtaining the target hand pose through the target network model according to the depth image, the method further comprises:
constructing a pose initialization network model, wherein the pose initialization network model comprises a hand global localization branch and a hand pose classification branch; the hand global localization branch is used to extract feature points of the hand in the depth image and output a global hand pose according to the feature points; the hand pose classification branch is used to extract feature points of the depth image and perform matching classification according to feature points of preset hand reference poses and the feature points of the depth image, to obtain a current local hand pose; and training a pose refinement module with a training dataset to obtain the target network model.
In one possible design, the pose refinement module is specifically used to match and fuse the global hand pose with the current local hand pose to obtain the initial target hand pose, and to optimize the initial target hand pose using a target equation to obtain the target hand pose.
In one possible design, the pose refinement module contains a target equation composed of multiple optimization function terms; the optimization function terms of the pose refinement module are constructed from preset constraint conditions, wherein the optimization function terms include multiple optimization function terms corresponding to hand pose optimization and an optimization function term corresponding to hand shape optimization.
In a second aspect, an embodiment of the present invention provides an apparatus for hand gesture tracking, which uses the method of any one of the first aspect. The target network model comprises a hand global localization branch and a hand pose classification branch, respectively composed of multiple convolutional layers, normalization layers and ReLU activation layers, pooling layers, unpooling layers, a heatmap layer, fully connected layers with ReLU activation, and a softmax layer.
In one possible design, the hand global localization branch comprises: multiple convolutional layers, normalization layers and ReLU activation layers, pooling layers and unpooling layers, wherein the convolutional layers are used to extract feature points of the hand in the depth image; the normalization layers are used to constrain the numerical range of the feature points; the ReLU activation layers output the feature points as feature maps with enhanced expression; the pooling layers compress the feature maps of enhanced expression to obtain smaller compressed feature maps; the unpooling layers enlarge the compressed feature maps in scale, and their output is related to the maximum-likelihood distance density maps of the joints. The heatmap layer is used to generate the initial global position of the hand.
In one possible design, the hand pose classification branch comprises:
multiple sequentially combined convolutional layers, normalization layers and ReLU activation layers, pooling layers, fully connected layers with ReLU activation, and a softmax layer. The convolutional layers are used to extract feature points of the depth image; the normalization layers are used to constrain the numerical range of the feature points; the ReLU activation layers output the feature points as feature maps with enhanced expression; the pooling layers compress the feature maps of enhanced expression to obtain smaller compressed feature maps; the fully connected layers connect the compressed feature maps to obtain local feature maps; and the softmax layer classifies the local feature maps by outputting the probabilities of hand poses, to obtain the current local hand pose.
In a third aspect, an embodiment of the present invention provides a system for hand gesture tracking, comprising: a memory and a processor, the memory storing instructions executable by the processor; wherein the processor is configured to execute the executable instructions to perform the method of hand gesture tracking according to any one of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method of hand gesture tracking according to any one of the first aspect is implemented.
The present invention provides a method, apparatus and system for hand gesture tracking. The method includes obtaining a depth image containing a hand region, and obtaining a target hand pose through a target network model according to the depth image; wherein the target network model is a pre-trained learning model used to perform feature extraction and feature recognition on the depth image to output the target hand pose. This improves the accuracy and efficiency of pose tracking, and real-time pose tracking can be achieved without high-end computing resources such as a GPU (Graphics Processing Unit). Through the target network model, hierarchical processing such as hand pose optimization, hand shape optimization and joint hand pose and shape optimization can be realized, which can simultaneously satisfy the accuracy and real-time requirements of pose tracking.
Brief description of the drawings
In order to explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow chart of the method for hand gesture tracking provided by Embodiment 1 of the present invention;
Fig. 2 is a flow chart of the method for hand gesture tracking provided by Embodiment 2 of the present invention;
Fig. 3 is a structural schematic diagram of the network model in the method for hand gesture tracking provided by Embodiment 2 of the present invention;
Fig. 4 is a flow chart of the method for hand gesture tracking provided by Embodiment 3 of the present invention;
Fig. 5 is a schematic diagram of partial results of the method for hand gesture tracking provided by Embodiment 3 of the present invention;
Fig. 6 is a structural schematic diagram of the system for hand gesture tracking provided by Embodiment 4 of the present invention.
Specific embodiment
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", etc. (if present) in the description, claims and the above drawings are used to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present invention described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product or device.
The technical solution of the present invention and how it solves the above technical problems are described in detail below with specific embodiments. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present invention are described below with reference to the drawings.
In the prior art, hand pose tracking is mostly realized with a fixed hand model, which reduces the accuracy of hand pose tracking and degrades the user experience.
Fig. 1 is a flow chart of the method for hand gesture tracking provided by Embodiment 1 of the present invention. As shown in Fig. 1, the method in this embodiment may include:
S101, obtaining a depth image containing a hand region.
Specifically, the depth image containing the hand region is captured with a depth camera.
A depth image is an image or image channel containing information related to the distance from the viewpoint to the surfaces of scene objects. Each pixel value of a depth image records the actual distance from the sensor to the object, there is a one-to-one correspondence between the pixels, and the number of bits used to store each pixel can correspond to the color resolution of the measured image. Methods for obtaining depth images include passive ranging sensing and active depth sensing. The most common method of passive ranging sensing is binocular stereo vision, in which two cameras at a certain distance from each other simultaneously capture two images of the same scene; corresponding pixels in the two images are found by a stereo matching algorithm, the disparity information is then computed according to the triangulation principle, and the disparity can be converted to characterize the depth of objects in the scene. Compared with passive ranging sensing, the most distinctive feature of active ranging sensing is that the device itself emits energy to acquire the depth information, which ensures that the acquisition of the depth image is independent of the acquisition of the color image. Active depth sensing methods mainly include TOF (Time of Flight), structured light, laser scanning and the like.
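For the binocular stereo case just described, the depth recovered by triangulation follows the standard relation below (textbook stereo geometry, not a formula quoted from this patent), where f is the focal length, B the baseline between the two cameras and d the disparity between corresponding pixels:

% Standard stereo triangulation relation (background knowledge, not quoted from the patent)
Z = \frac{f\,B}{d}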
In this embodiment, the depth image contains the three-dimensional representation information of the hand region, and the gray value of each pixel can be used to characterize the distance of a point in the hand region from the camera. The depth image is generally obtained with a stereo camera or a TOF camera. If the intrinsic calibration parameters of the camera are available, the depth image can be converted into a point cloud. The principle by which a TOF camera obtains the depth image is as follows: continuous near-infrared pulses are emitted toward the target scene, and the light pulses reflected back by the objects are received with a sensor; by comparing the phase difference between the emitted light pulses and the light pulses reflected by the objects, the transmission delay between the light pulses can be calculated, from which the distance of the object relative to the emitter is obtained, finally yielding a depth image.
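Where the camera intrinsics are available, converting the depth image to a point cloud amounts to back-projecting each pixel through a pinhole camera model. The sketch below illustrates this step only and is not taken from the patent; the intrinsic values fx, fy, cx, cy and the image size are assumed.

# Illustrative sketch (not from the patent): back-projecting a depth image to a
# point cloud with known pinhole intrinsics fx, fy, cx, cy (assumed values).
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """depth: HxW array of distances in meters; returns Nx3 points in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep only pixels with valid (non-zero) depth

# Example with synthetic data and assumed intrinsics
depth = np.random.uniform(0.3, 0.8, size=(240, 320))
cloud = depth_to_point_cloud(depth, fx=365.0, fy=365.0, cx=160.0, cy=120.0)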
S102, obtaining a target hand pose through the target network model according to the depth image; wherein the target network model is a pre-trained learning model used to perform feature extraction and feature recognition on the depth image to obtain an initial target hand pose, and to optimize the initial target hand pose to output the target hand pose.
In this embodiment, the target network model differs from a classical CNN; it comprises multiple sequentially combined convolutional layers, normalization layers and ReLU activation layers, at least one pooling layer and unpooling layer, and multiple sequentially combined fully connected layers with ReLU activation. It is a learning model that performs feature extraction and feature recognition on the depth image containing the hand region to obtain an initial target hand pose, and optimizes the initial target hand pose to output the target hand pose.
In this embodiment, a depth image containing a hand region is obtained; according to the depth image, the target network model performs feature extraction and feature recognition on the depth image to obtain an initial target hand pose, and optimizes the initial target hand pose to output the target hand pose. This improves the accuracy and efficiency of pose tracking, and real-time hand pose tracking can be achieved without high-end computing resources such as a GPU (Graphics Processing Unit).
Based on the above embodiment, refer to Fig. 2, which is a flow chart of the method for hand gesture tracking provided by Embodiment 2 of the present invention. In this embodiment, before obtaining the target hand pose through the target network model according to the depth image, the method further includes steps S201 and S202; that is, the method in this embodiment includes:
S201, constructing a pose initialization network model, wherein the pose initialization network model includes a hand global localization branch and a hand pose classification branch; the hand global localization branch is used to extract feature points of the hand in the depth image and output a global hand pose (i.e. the world coordinates and orientation of the hand) according to the feature points; the hand pose classification branch is used to extract feature points of the depth image and perform matching classification according to feature points of preset hand reference poses and the feature points of the depth image, to obtain the current local hand pose (i.e. the rotation angle of each joint of the hand). The global hand pose is the 3D coordinates of the hand as a whole in space together with its 3 rotational degrees of freedom, and the hand shape is contained in the global hand pose. The current local hand pose refers to the rotation angles of the individual joints of the hand.
Referring to Fig. 3, which is a structural schematic diagram of the network model in the method for hand gesture tracking provided by Embodiment 2 of the present invention: as shown in Fig. 3, 1 denotes a convolutional layer, normalization layer and ReLU activation layer, 2 denotes a pooling layer, 3 denotes an unpooling layer, 4 denotes a heatmap layer, 5 denotes a fully connected layer and ReLU activation layer, and 6 denotes a softmax layer. The constructed pose initialization network model includes a hand global localization branch, a hand pose classification branch and a pose refinement module. The hand global localization branch may successively include multiple combined convolutional layers, normalization layers and ReLU activation layers, pooling layers and unpooling layers, where the convolutional layers are used to extract feature points of the hand in the depth image, the normalization layers constrain the values of these feature points to a certain range so as to eliminate the adverse effects of singular feature points, the ReLU activation layers output the feature points as feature maps with enhanced expression, the pooling layers compress the feature maps to obtain smaller compressed feature maps, and the unpooling layers enlarge the compressed feature maps in scale, with output related to the maximum-likelihood distance density maps of the joints. Finally, the heatmap layer is used to generate the initial global position of the hand.
The hand pose classification branch may successively include multiple combined convolutional layers, normalization layers and ReLU activation layers, pooling layers, fully connected layers with ReLU activation, and a softmax layer, where the convolutional layers extract feature points of the depth image, the normalization layers constrain the values of these feature points to a certain range so as to eliminate the adverse effects of singular feature points, the ReLU activation layers output the feature points as feature maps with enhanced expression, the pooling layers compress the feature maps to obtain smaller compressed feature maps, the fully connected layers connect the compressed feature maps to obtain local feature maps, and the softmax layer classifies the local feature maps by outputting the probabilities of hand poses, to obtain the current local hand pose.
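As an illustration of how the two branches described above could be assembled, the following PyTorch-style sketch combines convolution + normalization + ReLU blocks with pooling/unpooling and a heatmap head for the global localization branch, and with fully connected layers and softmax for the pose classification branch. The channel widths, the 96x96 input crop and the number of reference poses are assumptions made for the sketch, not values disclosed by the patent.

# Illustrative sketch only; layer counts and sizes are assumptions, not the patent's exact design.
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class GlobalLocalizationBranch(nn.Module):
    """Conv/BN/ReLU + pooling encoder, unpooling decoder, heatmap head for the hand root."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(conv_bn_relu(1, 32), nn.MaxPool2d(2),
                                 conv_bn_relu(32, 64), nn.MaxPool2d(2))
        self.dec = nn.Sequential(nn.Upsample(scale_factor=2), conv_bn_relu(64, 32),
                                 nn.Upsample(scale_factor=2), conv_bn_relu(32, 32))
        self.heatmap = nn.Conv2d(32, 1, 1)  # heatmap of the root-node position

    def forward(self, depth):
        return self.heatmap(self.dec(self.enc(depth)))

class PoseClassificationBranch(nn.Module):
    """Conv/BN/ReLU + pooling, then fully connected layers and softmax over reference poses."""
    def __init__(self, num_reference_poses=30, in_hw=96):
        super().__init__()
        self.features = nn.Sequential(conv_bn_relu(1, 32), nn.MaxPool2d(2),
                                      conv_bn_relu(32, 64), nn.MaxPool2d(2))
        flat = 64 * (in_hw // 4) * (in_hw // 4)
        self.classifier = nn.Sequential(nn.Flatten(),
                                        nn.Linear(flat, 256), nn.ReLU(inplace=True),
                                        nn.Linear(256, num_reference_poses),
                                        nn.Softmax(dim=1))

    def forward(self, depth):
        return self.classifier(self.features(depth))

# Usage on a single 96x96 depth crop of the hand region
depth = torch.rand(1, 1, 96, 96)
heatmap = GlobalLocalizationBranch()(depth)     # initial global position of the hand
pose_probs = PoseClassificationBranch()(depth)  # probabilities over reference poses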
The pose refinement module is used to match and fuse the global hand pose with the current local hand pose as an initial value, obtain the initial target hand pose and generate an initial hand model; the initial target hand pose is then optimized using the target equation, finally obtaining the target hand pose. S202, training the pose refinement module with a training dataset to obtain the target network model.
Specifically, the pose refinement module matches and fuses the global hand pose with the current local hand pose to obtain the initial target hand pose, and optimizes the initial target hand pose using the target equation, finally obtaining the target hand pose.
In this embodiment, the training dataset includes depth images of a large number of users performing hand movements such as opening, closing, pinching, making a fist and making a scissors gesture. These depth images of the training dataset are input into the pose initialization network model, and the pose refinement module is trained through multiple iterations to obtain the target network model.
S203, obtaining a depth image containing a hand region.
S204, obtaining the target hand pose through the target network model according to the depth image.
In this embodiment, for the specific implementation process and technical principle of steps S203 to S204, see the related description of steps S101 to S102 in the method shown in Fig. 1, which will not be repeated here.
In this embodiment, a depth image containing a hand region is obtained, the target network model is obtained by training the pose initialization network model, and the target hand pose is then obtained through the target network model according to the depth image. This improves the accuracy and efficiency of pose tracking, and real-time pose tracking can be achieved without high-end computing resources such as a GPU (Graphics Processing Unit).
Referring to Fig. 4, which is a flow chart of the method for hand gesture tracking provided by Embodiment 3 of the present invention: as shown in Fig. 4, before the step of obtaining the target hand pose through the target network model, the method for hand gesture tracking in this embodiment further includes step S300, obtaining a depth image containing a hand region. For the specific implementation process and technical principle of step S300, see the related description of step S101 in the method shown in Fig. 1, which will not be repeated here.
S301, extracting feature points of the hand in the depth image, and outputting the global hand pose according to the feature points.
Specifically, a heatmap of the position of the hand root node is output according to the hand root node (including the three-dimensional coordinates of the joint) and its position; the spatial coordinates of the root node (i.e. the 3 spatial-coordinate degrees of freedom of the hand) are then obtained by fitting a Gaussian probability model, and the orientation of the feature point cloud, i.e. the 3 global orientation degrees of freedom of the hand, is computed using principal component analysis (PCA). The root node is the coordinate at the wrist of the hand.
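A minimal numpy sketch of these two estimates, assuming the hand pixels have already been segmented and back-projected to a point cloud: the root position is taken as the heatmap-weighted mean of the points (the mean of a fitted Gaussian reduces to this), and the three global orientation degrees of freedom are read off the principal axes of the point cloud via PCA. The function names and the toy data are illustrative only.

# Illustrative sketch of root localization and PCA orientation (simplified assumptions).
import numpy as np

def fit_root_position(points, heat_weights):
    """Gaussian-model fit of the root position, reduced here to its weighted mean."""
    w = heat_weights / heat_weights.sum()
    return (points * w[:, None]).sum(axis=0)           # 3 spatial degrees of freedom

def global_orientation(points):
    """Principal axes of the hand point cloud -> 3 global orientation degrees of freedom."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)
    axes = eigvecs[:, ::-1]                             # columns sorted by decreasing variance
    if np.linalg.det(axes) < 0:                         # keep a right-handed frame
        axes[:, -1] *= -1
    return axes                                         # 3x3 rotation-like matrix

points = np.random.randn(500, 3) * [0.08, 0.04, 0.01]  # toy hand-like point cloud
root = fit_root_position(points, heat_weights=np.random.rand(500))
R = global_orientation(points)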
S302, extracting feature points of the depth image, and performing matching classification according to feature points of preset hand reference poses and the feature points of the depth image, to obtain the current local hand pose. Specifically, the feature points extracted from the depth image are matched against the feature points of the preset hand reference poses, and classification yields the current local hand pose. The hand reference poses may include movement poses such as an open hand, a closed hand and a pinching hand, but do not include the global hand pose.
The order of steps S301 and S302 is not limited.
S303, matching and fusing the global hand pose with the current local hand pose to obtain the initial target hand pose, and optimizing the initial target hand pose using the target equation to obtain the target hand pose.
Specifically, in step S303, after hand pose optimization is performed on the initial target hand pose, hand shape optimization and joint hand pose and shape optimization are performed, yielding the optimized hand pose, the target hand shape and the target hand pose.
The pose refinement module contains a target equation composed of multiple optimization function terms; the optimization function terms of the pose refinement module are constructed from preset constraint conditions, wherein the optimization function terms include multiple optimization function terms corresponding to hand pose optimization and an optimization function term corresponding to hand shape optimization; the pose refinement module is trained iteratively to obtain the target network model.
In this embodiment, the global hand pose and the current local hand pose are fused by the pose refinement module into an estimated initial target hand pose, which is combined with a variable-size hand model to generate an initial hand model; hand pose optimization is then performed, followed by hand shape optimization and joint hand pose and shape optimization, yielding the optimized hand pose, the target hand shape and the target hand pose, i.e. the target hand model.
Referring to Fig. 5, which is a schematic diagram of partial results of the method for hand gesture tracking provided by Embodiment 3 of the present invention: as shown in Fig. 5, solid black regions indicate feature points where the depth image lies in front of the initial hand model (i.e. the initial hand model does not cover the depth image), dotted black regions indicate feature points where the initial hand model lies in front of the depth image, and white regions indicate feature points where the initial hand model and the depth image match and fuse, with an error within ±5 millimeters. A good optimization result is therefore one in which the initial hand model matches the depth image (i.e. the more white regions, the better).
Specifically, from left to right: the first figure shows the initial target hand pose output by the target network model; a rough hand pose is estimated and the initial hand model is constructed from it, and it can be seen from the figure that this initial hand model has a large registration error with the depth image. The second figure shows the result of pose optimization only; it can be seen that the thumb of the initial hand model matches the depth image after optimization, yielding an optimized hand pose, but the other fingers do not match because of their lengths. The third figure shows the result of hand shape optimization only; it can be seen that the finger lengths of the initial hand model match the depth image after optimization, yielding the target hand shape. The last figure shows the result after joint hand pose and shape optimization, in which the target hand pose is obtained and the target hand model is generated.
At the same time, various hand priors, such as hand collisions, joint rotations and timing information, are introduced into the optimization process. During hand pose optimization and joint hand pose and shape optimization, weights are set for the optimization function terms included in each optimization stage, and various different hand conditions are considered (such as whether there is severe self-occlusion, whether tracking is lost, and whether motion blur is large). The optimization function terms of the pose refinement module are constructed from the above prior constraint conditions, and the pose refinement module is trained iteratively to obtain the target network model.
Given a continuous depth image sequence It, where t is the corresponding time index, the hand model is expressed as H(θt, β), where θt is the pose parameter of the hand at time t and β is the shape parameter of the hand. The goal of the hand pose estimation algorithm is to find a θt that satisfies the following target equation (formula (1)):
Here the hand region is denoted by a point set and x denotes a point in it; an initial solution of the equation can be obtained using the pose initialization network model, and by substituting this initial solution the equation can be rewritten as formula (2):
Formula (2) can be solved by an optimization method to obtain the current hand pose. However, because of the high degrees of freedom of the hand, the resulting self-occlusion and the noise of the depth image, the obtained solution may not conform to the true situation in the actual environment. Therefore, various hand priors are used to constrain the pose, mainly the joint rotation limit of the hand, the collision limit of the hand and the temporal constraint limit of the hand, whose formulas are as follows:
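Formulas (1) and (2) appear only as images in the original publication and are not reproduced in this text. A data term of the kind described, fitting the hand model H(θt, β) to the observed hand points, would commonly take a form such as the following (an assumed reconstruction, not the patent's verbatim equation):

% Plausible data term, assuming an ICP-style point-to-model distance (not the patent's exact formula).
E_{\mathrm{data}}(\theta_t,\beta)=\sum_{x\in\mathcal{D}}\min_{p\in H(\theta_t,\beta)}\lVert x-p\rVert^{2},
\qquad
\theta_t=\arg\min_{\theta} E_{\mathrm{data}}(\theta,\beta)\ \text{initialized at the pose-initialization output }\theta_t^{0}.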
The joint rotation limit of the hand, Ebound:
where the lower and upper bounds of the rotation angle of the i-th joint define its allowed range, ω1 denotes the weight coefficient corresponding to the minimum angle of the joint rotation, ω2 denotes the weight coefficient corresponding to the maximum angle of the joint rotation, and λ3 denotes the weight coefficient corresponding to this objective function term.
The collision limit of the hand, Ecol:
where λ4 denotes the weight coefficient corresponding to this objective function term, δ denotes the corresponding integration sign, d denotes the distance between Xi and the fingertip Xj, χ(i, j) indicates whether the i-th joint collides with the j-th joint, and Jskel(xi) denotes the Jacobian matrix corresponding to the joint. The Jacobian matrix is the matrix of first-order partial derivatives arranged in a certain way, and its determinant is called the Jacobian determinant. In an alternative embodiment, the first three rows of the Jacobian matrix are called the position Jacobian and represent the global hand pose, and the last three rows are called the orientation matrix and represent the current local hand pose; they can be computed using related IK (inverse kinematics) solver techniques.
The temporal constraint limit of the hand, Etemp:
where ki denotes the current coordinate of the i-th joint, and the corresponding term denotes its position estimated in the previous frame.
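The printed formulas for Ebound, Ecol and Etemp are likewise not reproduced here. Terms consistent with the descriptions above — penalizing joint angles outside their allowed range, penetrating joint pairs, and deviation from the previous frame — could be sketched as follows; the squared-penalty forms, the collision radii r_i, r_j and the weight λ5 are assumptions, not the patent's exact expressions:

% Hedged reconstruction of the three prior terms from their textual descriptions (assumed forms).
E_{\mathrm{bound}} = \lambda_3 \sum_i \Big( \omega_1\max\big(0,\,\theta_i^{\min}-\theta_i\big)^{2}
                   + \omega_2\max\big(0,\,\theta_i-\theta_i^{\max}\big)^{2} \Big)
\\
E_{\mathrm{col}}   = \lambda_4 \sum_{i\neq j} \chi(i,j)\,\max\big(0,\,r_i+r_j-d(X_i,X_j)\big)^{2}
\\
E_{\mathrm{temp}}  = \lambda_5 \sum_i \lVert k_i-\tilde{k}_i\rVert^{2}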
With reference to Table 1 below, the weights of the optimization function terms are set for the different stages of hand pose optimization, hand shape optimization and joint hand pose and shape optimization.
Table 1
In this embodiment, the hand pose (mainly the joint rotation angles of the hand) and the hand shape (mainly the lengths of the individual bones of the hand) have different dimensions; optimizing them directly and simultaneously easily falls into a local minimum. By setting the weights of the optimization function terms in each stage of the hierarchical optimization through the target network model, hand pose optimization and hand shape optimization can both be estimated; for example, a joint or bone that is estimated to be occluded does not need to be optimized. In this way the accuracy and real-time performance of pose tracking are achieved at the same time.
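Since Table 1 is reproduced only as an image in the source, the concrete stage weights are not available in this text. The hierarchical idea it conveys can nonetheless be sketched as a per-stage weight schedule over the optimization function terms; every number below is a placeholder, not a value from the patent:

# Illustrative staged weight schedule (placeholder values; Table 1 is not reproduced in the text).
STAGES = {
    "pose_only":        {"data": 1.0, "bound": 1.0, "collision": 1.0, "temporal": 0.5, "shape": 0.0},
    "shape_only":       {"data": 1.0, "bound": 0.0, "collision": 0.0, "temporal": 0.0, "shape": 1.0},
    "joint_pose_shape": {"data": 1.0, "bound": 1.0, "collision": 1.0, "temporal": 0.5, "shape": 1.0},
}

def total_energy(terms, stage):
    """Weighted sum of the optimization function terms for one optimization stage."""
    weights = STAGES[stage]
    return sum(weights[name] * value for name, value in terms.items())

energy = total_energy({"data": 3.2, "bound": 0.1, "collision": 0.0, "temporal": 0.4, "shape": 0.8},
                      stage="pose_only")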
Fig. 6 is a structural schematic diagram of the system for hand gesture tracking provided by Embodiment 4 of the present invention. As shown in Fig. 6, the system 40 for hand gesture tracking in this embodiment may include: a processor 41 and a memory 42.
The memory 42 is used to store computer programs (such as application programs and functional modules implementing the above hand gesture tracking method), computer instructions and the like;
the above computer programs, computer instructions, data and the like may be stored in partitions in one or more memories 42 and may be called by the processor 41.
The processor 41 is used to execute the computer program stored in the memory 42, so as to implement the steps of the method in the above embodiments.
For details, refer to the related description in the foregoing method embodiments.
The processor 41 and the memory 42 may be separate structures or may be an integrated structure. When the processor 41 and the memory 42 are separate structures, the memory 42 and the processor 41 may be coupled through a bus 43.
The server of this embodiment can execute the technical solutions of the methods shown in Fig. 1, Fig. 2 and Fig. 4; for the implementation process and technical principle, refer to the related descriptions of the methods shown in Fig. 1, Fig. 2 and Fig. 4, which will not be repeated here.
The present invention provides a method, apparatus and system for hand gesture tracking. The method includes obtaining a depth image containing a hand region, and obtaining a target hand pose through a target network model according to the depth image; wherein the target network model is a pre-trained learning model used to perform feature extraction and feature recognition on the depth image to obtain an initial target hand pose, and to optimize the initial target hand pose to output the target hand pose. This improves the accuracy and efficiency of pose tracking, and real-time pose tracking can be achieved without high-end computing resources such as a GPU (Graphics Processing Unit). Through the target network model, hierarchical processing such as hand pose optimization, hand shape optimization and joint hand pose and shape optimization can be realized, which can simultaneously satisfy the accuracy and real-time requirements of pose tracking.
In addition, an embodiment of the present application further provides a computer-readable storage medium in which computer-executable instructions are stored; when at least one processor of a user device executes the computer-executable instructions, the user device performs the above various possible methods.
Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and write information to the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC. In addition, the ASIC may be located in the user device. Of course, the processor and the storage medium may also exist as discrete components in a communication device.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium; when the program is executed, the steps of the above method embodiments are executed. The aforementioned storage media include various media that can store program code, such as ROM, RAM, magnetic disks or optical discs.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or replace some or all of the technical features with equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for hand gesture tracking, characterized by comprising:
obtaining a depth image containing a hand region;
obtaining a target hand pose through a target network model according to the depth image; wherein the target network model is a pre-trained learning model used to perform feature extraction and feature recognition on the depth image to obtain an initial target hand pose, and to optimize the initial target hand pose to output the target hand pose.
2. The method according to claim 1, characterized in that obtaining the depth image containing the hand region comprises:
capturing the depth image containing the hand region with a depth camera.
3. The method according to claim 1, characterized in that before obtaining the target hand pose through the target network model according to the depth image, the method further comprises:
constructing a pose initialization network model, wherein the pose initialization network model comprises a hand global localization branch and a hand pose classification branch; the hand global localization branch is used to extract feature points of the hand in the depth image and output a global hand pose according to the feature points; the hand pose classification branch is used to extract feature points of the depth image and perform matching classification according to feature points of preset hand reference poses and the feature points of the depth image, to obtain a current local hand pose; and training a pose refinement module with a training dataset to obtain the target network model.
4. The method according to claim 3, characterized in that the pose refinement module is specifically used to match and fuse the global hand pose with the current local hand pose to obtain the initial target hand pose, and to optimize the initial target hand pose using a target equation to obtain the target hand pose.
5. The method according to claim 3, characterized in that the pose refinement module contains a target equation composed of multiple optimization function terms; the optimization function terms of the pose refinement module are constructed from preset constraint conditions, wherein the optimization function terms include multiple optimization function terms corresponding to hand pose optimization and an optimization function term corresponding to hand shape optimization.
6. An apparatus for hand gesture tracking, using the method according to any one of claims 3-5, characterized in that the target network model comprises a hand global localization branch and a hand pose classification branch, respectively composed of multiple convolutional layers, normalization layers and ReLU activation layers, pooling layers, unpooling layers, a heatmap layer, fully connected layers with ReLU activation, and a softmax layer.
7. The apparatus according to claim 6, characterized in that the hand global localization branch comprises: multiple convolutional layers, normalization layers and ReLU activation layers, pooling layers and unpooling layers, wherein the convolutional layers are used to extract feature points of the hand in the depth image; the normalization layers are used to constrain the numerical range of the feature points; the ReLU activation layers output the feature points as feature maps with enhanced expression; the pooling layers compress the feature maps of enhanced expression to obtain smaller compressed feature maps; the unpooling layers enlarge the compressed feature maps in scale, and their output is related to the maximum-likelihood distance density maps of the joints; and the heatmap layer is used to generate the initial global position of the hand.
8. The apparatus according to claim 6, characterized in that the hand pose classification branch comprises:
multiple sequentially combined convolutional layers, normalization layers and ReLU activation layers, pooling layers, fully connected layers with ReLU activation, and a softmax layer; wherein the convolutional layers are used to extract feature points of the depth image; the normalization layers are used to constrain the numerical range of the feature points; the ReLU activation layers output the feature points as feature maps with enhanced expression; the pooling layers compress the feature maps of enhanced expression to obtain smaller compressed feature maps; the fully connected layers connect the compressed feature maps to obtain local feature maps; and the softmax layer classifies the local feature maps by outputting the probabilities of hand poses, to obtain the current local hand pose.
9. A system for hand gesture tracking, characterized by comprising: a memory and a processor, the memory storing instructions executable by the processor; wherein the processor is configured to execute the executable instructions to perform the method of hand gesture tracking according to any one of claims 1-5.
10. A computer-readable storage medium on which a computer program is stored, characterized in that when the program is executed by a processor, the method of hand gesture tracking according to any one of claims 1-5 is implemented.
CN201910599290.9A 2019-07-04 2019-07-04 Hand gesture tracking method, device and system Active CN110348359B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910599290.9A CN110348359B (en) 2019-07-04 2019-07-04 Hand gesture tracking method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910599290.9A CN110348359B (en) 2019-07-04 2019-07-04 Hand gesture tracking method, device and system

Publications (2)

Publication Number Publication Date
CN110348359A true CN110348359A (en) 2019-10-18
CN110348359B CN110348359B (en) 2022-01-04

Family

ID=68178370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910599290.9A Active CN110348359B (en) 2019-07-04 2019-07-04 Hand gesture tracking method, device and system

Country Status (1)

Country Link
CN (1) CN110348359B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368733A (en) * 2020-03-04 2020-07-03 电子科技大学 Three-dimensional hand posture estimation method based on label distribution learning, storage medium and terminal
CN115082266A (en) * 2022-06-27 2022-09-20 武汉市汉阳区爱萌托育服务有限公司 Student education subject comprehensive development analysis and evaluation system based on deep learning
CN116189308A (en) * 2023-03-09 2023-05-30 杰能科世智能安全科技(杭州)有限公司 Unmanned aerial vehicle flight hand detection method, unmanned aerial vehicle flight hand detection system and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105389539A (en) * 2015-10-15 2016-03-09 电子科技大学 Three-dimensional gesture estimation method and three-dimensional gesture estimation system based on depth data
CN107066935A (en) * 2017-01-25 2017-08-18 网易(杭州)网络有限公司 Hand gestures method of estimation and device based on deep learning
CN109214282A (en) * 2018-08-01 2019-01-15 中南民族大学 A kind of three-dimension gesture critical point detection method and system neural network based
US20190147340A1 (en) * 2017-11-16 2019-05-16 Mitusbishi Electric Research Laboratories, Inc. Machine Learning via Double Layer Optimization
CN109948498A (en) * 2019-03-13 2019-06-28 中南大学 A kind of dynamic gesture identification method based on 3D convolutional neural networks algorithm

Also Published As

Publication number Publication date
CN110348359B (en) 2022-01-04

Similar Documents

Publication Publication Date Title
US11360570B2 (en) System for hand pose detection
CN109636831B (en) Method for estimating three-dimensional human body posture and hand information
CN104317391B (en) A kind of three-dimensional palm gesture recognition exchange method and system based on stereoscopic vision
CN105096377B (en) A kind of image processing method and device
WO2020208359A1 (en) Using Iterative 3D Model Fitting for Domain Adaption of a Hand Pose Estimation Neural Network
JP2022126809A (en) Deep machine learning system for cuboid detection
EP2391988B1 (en) Visual target tracking
JP2019536170A (en) Virtually extended visual simultaneous localization and mapping system and method
CN110348359A (en) The method, apparatus and system of hand gestures tracking
CN106896925A (en) The device that a kind of virtual reality is merged with real scene
CN111596767B (en) Gesture capturing method and device based on virtual reality
CN110136202A (en) A kind of multi-targets recognition and localization method based on SSD and dual camera
EP4307233A1 (en) Data processing method and apparatus, and electronic device and computer-readable storage medium
CN110555408B (en) Single-camera real-time three-dimensional human body posture detection method based on self-adaptive mapping relation
CN109255749A (en) From the map structuring optimization in non-autonomous platform of advocating peace
JP2022508103A (en) Systems and methods for implementing self-improving visual odometry
WO2021098545A1 (en) Pose determination method, apparatus, and device, storage medium, chip and product
CN114581571A (en) Monocular human body reconstruction method and device based on IMU and forward deformation field
CN114036969A (en) 3D human body action recognition algorithm under multi-view condition
Neverova Deep learning for human motion analysis
Li et al. FC-SLAM: Federated learning enhanced distributed visual-LiDAR SLAM in cloud robotic system
CN117036612A (en) Three-dimensional reconstruction method based on nerve radiation field
CN115562499A (en) Intelligent ring-based accurate interaction control method and system and storage medium
Schröder et al. Design and evaluation of reduced marker layouts for hand motion capture
CN109215128A (en) The synthetic method and system of object motion attitude image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant