CN102855488A

CN102855488A - Three-dimensional gesture recognition method and system

Info

Publication number: CN102855488A
Application number: CN2011101865359A
Authority: CN
Inventors: 王西颖; 任海兵; 张帆
Original assignee: Beijing Samsung Telecommunications Technology Research Co Ltd; Samsung Electronics Co Ltd
Current assignee: Beijing Samsung Telecommunications Technology Research Co Ltd; Samsung Electronics Co Ltd
Priority date: 2011-06-30
Filing date: 2011-06-30
Publication date: 2013-01-02

Abstract

The invention provides a 3D (Three-dimensional) gesture recognition method and system. The 3D gesture recognition system comprises a feature extracting unit, a matching unit and a support vector machine (SVM) classifier, wherein the feature extracting unit is used for extracting a grid depth feature (GDF) from a present frame of an input video sequence, and extracting level setup moment (LSM) features and/or curvature histogram (HOC) features; the matching unit is used for matching the GDF feature extracted by the feature extracting unit with GDF features of a plurality of clustering templates obtained from off-line view clustering, thus obtaining orientation information of a gesture in the present frame; and the SVM classifier is used for recognizing the gesture in the present frame based on the LSM features extracted by the feature extracting unit and/or the HOC features and the orientation information obtained by the matching unit. According to the 3D gesture recognition method and system, the orientation of a hand is not limited, and the problem that the hand of an actor may be shielded by the actor is solved successfully.

Description

Three-dimension gesture recognition methods and system

Technical field

The present invention relates to the Intelligent Recognition field, specifically, relate to a kind of three-dimensional (3D) gesture identification method and system.

Background technology

Immersion (immersive) giant display based on virtual reality needs the people to be undertaken alternately by traditional approach.But mostly current computing machine/user interactions is simple interactive mode if adopting, and have mutual obstacle, so the consumer more is ready to be undertaken alternately by the mode of multimedia or virtual reality.For example, computer keyboard provides the keyboard mutuality performance, but this keyboard mutuality is not intuition, and TV remote controller can allow the user feel more intuition, but the interactive performance that TV remote controller provides is limited.In addition, not only heaviness but also expensive of some compliant interfaces (for example, instrument protective clothing).

Current, most of existing gesture recognition systems only are two dimension (2D) gesture identification, and are limited to some certain viewing angles, in addition, also be difficult to difference towards the gesture perception also very difficult.Therefore, need a kind ofly to be not limited to certain viewing angles and not to be subjected to towards the 3D gesture identification method and the system that affect.

Summary of the invention

According to an exemplary embodiment of the present invention on the one hand, a kind of 3D gesture recognition system is provided, described 3D gesture recognition system comprises: feature extraction unit, from the present frame of video sequence of input, extract trellis depth feature GDF feature, and extract and be horizontally disposed with moment LSM feature and/or curvature histogram HOC feature; Matching unit, the GDF feature that feature extraction unit is extracted is mated with the GDF feature of being undertaken by off-line in a plurality of cluster templates that the view cluster obtains, to obtain the orientation information of the gesture in the present frame; The support vector machines sorter, the orientation information that the LSM feature of extracting based on feature extraction unit and/or HOC feature and matching unit obtain is identified the gesture in the present frame.

Described 3D gesture recognition system also can comprise: the time domain confirmation unit, calculate the probability that present frame belongs to certain gestures according to the recognition result of a plurality of previous frames of video sequence, and the gesture of maximum probability be defined as the gesture of present frame.

The time domain confirmation unit can be determined by following formula the gesture of present frame:

c＝arg max(p(c _i))

p(c _i)＝prob((r _i＝c _i)/(r _i-1＝c ₁，...，r _i-n＝c _n))；c _i＝1，2，...，N

Wherein, r _I-1... r _I-nThe recognition result that represents a plurality of previous frames, r _iThe current recognition result of expression present frame, c _iRepresent i kind gesture, N represents the sum of gesture, and n represents the quantity of previous frame, p (c _i) probability of expression i kind gesture, prob () expression is used for obtaining the function of probability, c represent will with the gesture of the corresponding gesture of maximum probability as present frame.

Matching unit can calculate by following formula the GDF feature f of expression extraction and the GDF feature T of template _iBetween the distance of similarity:

dis (f, T_{i}) = Σ_{n = 1}^{N} w_{n} \cdot | f_{n} - T_{in} |

Wherein, N is intrinsic dimensionality, w _nThe weight that represents the n dimensional feature; f _nRepresent the n dimensional feature; T _InThe n dimensional feature that represents i template.

Matching unit can with apart from the template institute mark of minimum in being defined as present frame gesture towards, thereby obtain described orientation information.

Feature extraction singly can be divided into a plurality of with present frame, comes the depth value of computing block by the mean value that calculates the degree of depth of pixel in each piece, and the depth value of each piece is carried out normalization, to obtain the GDF feature of present frame.

Feature extraction unit can be calculated not bending moment (region invariant the moments) feature of image-region of each flat seam of present frame, then makes up the feature of all flat seams, to consist of the LSM feature of present frame.

But the curvature value of feature extraction unit computation bound pixel carries out histogram analysis to the curvature value of boundary pixel, thereby obtains the HOC feature of present frame.

Can obtain the svm classifier device by following operation: each sample image from the view sample database extracts the GDF feature; Based on the GDF feature each sample image in the view sample database is carried out cluster by the K-medoids method, to obtain orientation information; By orientation information the view sample is made marks; The view sample of having done mark is carried out the SVM training, thereby obtain the svm classifier device.

According to another exemplary embodiment of the present invention, a kind of 3D gesture identification method is provided, described 3D gesture identification method may further comprise the steps: extract trellis depth feature GDF feature from present frame to be identified, and extraction is horizontally disposed with moment LSM feature and/or curvature histogram HOC feature; The GDF feature extracted and the GDF feature of being undertaken by off-line in a plurality of cluster templates that the view cluster obtains are mated, to obtain the orientation information of the gesture in the present frame; The svm classifier device is identified gesture in the present frame based on the orientation information of the LSM feature of extracting and/or HOC feature and acquisition.

Described 3D gesture identification method also can comprise: the recognition result according to a plurality of previous frames calculates the probability that present frame belongs to certain gestures, and the gesture of maximum probability is defined as the gesture of present frame.

Determine the gesture of present frame by following formula:

c＝arg max(p(c _i))

p(c _i)＝prob((r _i＝c _i)/(r _i-1＝c ₁，..，r _i-n＝c _n))；c _i＝1，2，...，N

Can calculate by following formula the GDF feature f of expression extraction and the GDF feature T of template _iBetween the distance of similarity:

dis (f, T_{i}) = Σ_{n = 1}^{N} w_{n} \cdot | f_{n} - T_{in} |

Wherein, N is intrinsic dimensionality, w _nThe weight that represents the n dimensional feature; f _nRepresent the n dimensional feature; T _InThe n dimensional feature that represents i template.Will apart from the template institute mark of minimum in being defined as present frame gesture towards, thereby obtain described orientation information.

The step of the trellis depth feature GDF feature that can extract from the present frame of input comprises: present frame is divided into a plurality of, come the depth value of computing block by the mean value that calculates the degree of depth of pixel in each piece, and the depth value of each piece carried out normalization, to obtain the GDF feature of present frame.

The step of the trellis depth feature LSM feature extracted can comprise from the present frame of input: calculate the invariant moment features of image-region of each flat seam of present frame, then make up the feature of all flat seams, to consist of the LSM feature of present frame.

The step of the trellis depth feature HOC feature extracted can comprise from the present frame of input: the curvature value of computation bound pixel, carry out histogram analysis to the curvature value of boundary pixel, thereby obtain the HOC feature of present frame.

According to another exemplary embodiment of the present invention, a kind of 3D gesture recognition system is provided, described 3D gesture recognition system can comprise: feature extraction unit, from present frame to be identified, extract trellis depth feature GDF feature; The actual tree RBT sorter that strengthens is based on the gesture in the GDF feature identification present frame of feature extraction unit extraction.

c＝arg max(p(c _i))

Feature extraction unit can be divided into a plurality of with present frame, comes the depth value of computing block by the mean value that calculates the degree of depth of pixel in each piece, and the depth value of each piece is carried out normalization, to obtain the GDF feature of present frame.

Can obtain the RBT sorter by following operation: extract the GDF feature in each sample image from the view sample database, the GDF feature of extracting is added that class label comes the sample image in the view sample database is trained, to obtain the RBT sorter.

According to another exemplary embodiment of the present invention, a kind of 3D gesture identification method is provided, described 3D gesture identification method can may further comprise the steps: extract trellis depth feature GDF feature from present frame to be identified; The actual tree RBT sorter that strengthens is based on the gesture in the GDF feature identification present frame that extracts.

Described 3D gesture identification method also can comprise step: the recognition result according to a plurality of previous frames of video sequence calculates the probability that present frame belongs to certain gestures, and the gesture of maximum probability is defined as the gesture of present frame.

Can determine by following formula the gesture of present frame:

c＝arg max(p(c _i))

p(c _i)＝prob((r _i＝c _i)/(r _i-1＝c ₁，...，r _i-n＝c _n))c _i＝1，2，...，N

The step of extracting the GDF feature can comprise: present frame is divided into a plurality of, comes the depth value of computing block by the mean value that calculates the degree of depth of pixel in each piece, and the depth value of each piece is carried out normalization, to obtain the GDF feature of present frame.

According to an exemplary embodiment of the present invention 3D gesture identification method and system do not have hand towards restriction, and can successfully solve self occlusion issue.

Description of drawings

By the detailed description of carrying out below in conjunction with accompanying drawing, above and other objects of the present invention, characteristics and advantage will become apparent, wherein:

Fig. 1 illustrates the according to an exemplary embodiment of the present invention block diagram of the structure of 3D gesture recognition system;

Fig. 2 illustrates the according to an exemplary embodiment of the present invention diagram of LSM feature;

Fig. 3 illustrates the according to an exemplary embodiment of the present invention diagram of HOC feature;

Fig. 4 illustrates the process flow diagram that how to carry out according to an exemplary embodiment of the present invention the view cluster and how to train the method for svm classifier device;

Fig. 5 shows the different views of a gesture in the view sample database;

Fig. 6 is the block diagram that illustrates according to the structure of the 3D gesture recognition system of another exemplary embodiment of the present invention;

Fig. 7 is the diagram that the principle of RBT sorter is shown;

Fig. 8 is the process flow diagram that illustrates according to the 3D gesture identification method of another exemplary embodiment of the present invention;

Fig. 9 is the process flow diagram that illustrates according to the 3D gesture identification method of another exemplary embodiment of the present invention;

Figure 10 illustrates 8 kinds of gesture-type and the schematic diagram of a certain gesture under different angles.

Embodiment

Now, describe more fully according to example embodiment of the present invention with reference to the accompanying drawings.

Here need to prove, 3D gesture recognition system and method are for video sequence and real time execution for convenience of description, thinks that with the present frame of video sequence depth image is described according to an exemplary embodiment of the present invention.That is to say, for Real time identification, below the depth image mentioned corresponding to the present frame in the video sequence.

Fig. 1 illustrates the according to an exemplary embodiment of the present invention block diagram of the structure of 3D gesture recognition system.

As shown in Figure 1, the 3D gesture recognition system can comprise feature extraction unit 101, matching unit 102, support vector machine (SVM) sorter 103 according to an exemplary embodiment of the present invention.

The below is described in detail the unit among Fig. 1 with reference to Fig. 1.

Feature extraction unit 101 is extracted the hand feature from the depth image to be identified of input, here, be that the depth image of inputting only comprises that hand is that example is described.According to another exemplary embodiment of the present invention, if the depth image of input is the image that also comprises other features except comprising hand, then the 3D gesture recognition system also can comprise pretreatment unit according to an exemplary embodiment of the present invention, described pretreatment unit can carry out pre-service to depth image before carrying out feature extraction depth image being input to feature extraction unit 101, from depth image, detecting the image section of the pre-sizing that only comprises hand, and this image section is input to feature extraction unit 101.For example, described pretreatment unit can be hand detecting unit and hand tracking cell.

Feature extraction unit 101 hand feature of extracting from the depth image of input can comprise trellis depth feature (GDF), and can comprise and be horizontally disposed with moment (LSM) feature and/or curvature histogram (HOC) feature.The below will be described in detail above-mentioned feature.

Here how Expressive Features extraction unit 101 extracts the GDF feature of depth image.Feature extraction unit 101 is divided into a plurality of with depth image, comes the depth value of computing block by the mean value that calculates the degree of depth of pixel in each piece, and the depth value of each piece is carried out normalization, to obtain the GDF feature of depth image.

Specifically, feature extraction unit 101 can be divided into a plurality of with depth image, for example, depth image is marked off the individual piece of 4 * 4 (only for for example, for example, can mark off 8 * 8,16 * 16 according to large young pathbreaker's depth image of depth image here).In each piece, come the depth value of computing block by the mean value of pixel in the piece, this depth value is normalized to [0,100], then depth value is set to-100 less than the GDF value of the background piece of certain threshold level, depth value is that the GDF value of the remarkable piece of maximal value (100) is set to 150, and the GDF value of the piece of depth value between certain threshold level and maximal value is set in normalized mode, thereby obtains the GDF value of this depth image.

Here how Expressive Features extraction unit 101 extracts the LSM feature of depth image.The LSM feature is a kind of for the different depth value of extracting hand and the 3D depth level feature of describing the depth profile in each flat seam (level) zone.The combination of all flat seams can represent the depth profile in whole hand zone in the depth image: the relative size of each flat seam, relative position and provincial characteristics.But then the invariant moment features of the image-region of each flat seam of feature extraction unit 101 compute depth images makes up the feature of all flat seams, to consist of the feature of depth image.

Fig. 2 illustrates the according to an exemplary embodiment of the present invention diagram of LSM feature.

Shown in Fig. 2 (a), at first, along Z-direction Range Image Segmentation is become several parts (level) according to the depth value of pixel.That is the scope of, dividing equably depth value z from foremost (baseline Fig. 2 (a)) to terminal (wrist).Shown in Fig. 2 (b), there is the hand zone of three levels, the hand zone of these three levels is respectively the zone that represents with different gray scales.

Here how Expressive Features extraction unit 101 extracts the HOC feature of depth image.

The HOC feature is a kind of 2D shape facility, and this 2D shape facility is described the curvature distribution of boundary pixel with statistical.The curvature value of the boundary pixel of feature extraction unit 101 compute depth images carries out histogram analysis to the curvature value of boundary pixel, thereby obtains the HOC feature of depth image.

Specifically, the curvature value of the boundary pixel of feature extraction unit 101 compute depth images, curvature value to boundary pixel carries out statistics with histogram, the curvature range of histogrammic each interval representative is set, the number of the mid point that each of statistic histogram is interval, then statistical value is carried out normalization, thereby obtain the HOC feature of depth image.

More particularly, feature extraction unit 101 can be extracted by carrying out following operation the HOC feature of depth image:

(1) curvature of computation bound pixel is at first calculated at a p with following formula _iThe curvature value at place:

c_{s} = \arccos (\overset{&RightArrow;}{p_{(i - s)} p_{i}} \cdot \overset{&RightArrow;}{p_{i} p_{(i + s)}})

Wherein, p _(i-s)And p _(i+s)For a p _iHave two end points of step-length s, wherein, i-s and i+s are in image range.By a p _(i-s)With a p _iBetween vector

And some p _iAnd p _(i+s)Between vector

Vector product determine the symbol of curvature; Calculate a plurality of curvature values for a plurality of step-length s, then these a plurality of curvature values are averaged to obtain about a p _iCurvature value; Then all curvature values are carried out statistics with histogram;

(2) curvature range of histogrammic each interval representative is set;

(3) then the number of each interval mid point of statistic histogram carries out normalization to statistical value, thereby obtains the HOC feature of depth image.

Fig. 3 illustrates the according to an exemplary embodiment of the present invention diagram of HOC feature.

As shown in Figure 3, transverse axis has been divided into 14 parts with curvature range, and the longitudinal axis is the statistics ratio value, and four curves are respectively HOC curve corresponding to four kinds of gestures shown in the left side with Fig. 3, and wherein, the HOC eigenwert is exactly the ratio value sequences of 14 dimensions.

Feature extraction unit 101 sends to matching unit 102 with the GDF feature of extracting, and LSM feature and the HOC feature of extracting sent to svm classifier device 103.

Here the operation of profile matching unit 102.Matching unit 102 receives the GDF feature of being extracted by feature extraction unit 101, and receive by off-line and carry out a plurality of cluster templates that the view cluster obtains (of the corresponding gesture of each cluster template towards, with reference to Fig. 5 the view cluster is described in detail after a while).Matching unit 102 mates the GDF feature and the cluster template that receive.Specifically, the GDF feature of the depth image that matching unit 102 will be extracted by feature extraction unit 101 (here, for the ease of statement, be called sample characteristics f) with carry out GDF feature in a plurality of cluster templates that the view cluster obtains by off-line (for the ease of statement, individual features in the template is called T) mate, can obtain by following formula for expression sample characteristics f and template characteristic T _iBetween the distance of similarity measurement:

dis (f, T_{i}) = Σ_{n = 1}^{N} w_{n} \cdot | f_{n} - T_{in} |

Distance is less, represents more similar.Therefore, at last will apart from the template institute mark of minimum in being defined as depth image gesture towards.

The operation of svm classifier device 103 is described here.

Because the current gesture-type that will identify belongs to the multiclass identification problem, and traditional SVM is binary classifier, so need to convert single multiclass problem to a plurality of binary class problems.By mode (but not one-to-many manner) one to one, each class is made up a binary classifier.Classify by maximum stratagem ensuring success ballot (MWV) strategy, in described MWV strategy, each sorter distributes sample to a class in two classes, then the class of distributing is increased by 1 ballot, and final, the class that will have maximum ballots is defined as sample classification.

Svm classifier device 103 receives LSM feature and/or HOC feature from feature extraction unit 101, receive orientation information from matching unit 102, and the LSM feature that utilize to receive and/or HOC feature and orientation information classify, thereby identify the type of gesture in the depth image.

According to another exemplary embodiment of the present invention, in order to improve the accuracy of identification, the 3D gesture recognition system also can comprise time domain confirmation unit 104 according to an exemplary embodiment of the present invention.

Because the input of system is video sequence (for example, can come capture video sequences by depth camera), so there is the time domain continuity in the positional information between the consecutive frame.The gesture identification result who supposes the front n frame (a plurality of previous frame) of present frame is r _I-1... r _I-n, time domain confirmation unit 104 can calculate the probability that present frame belongs to certain gestures according to the recognition result of the front n frame of present frame, and the gesture of maximum probability is defined as the gesture of present frame.Specifically, the current recognition result of supposing present frame is r _i, time domain confirmation unit 104 can following formula by utilizing the time domain continuity to improve gesture identification result's correctness:

c＝arg max(p(c _i))

Time domain is confirmed to be the effective means that solves noise or data loss problem.Because in some frames, there is serious data loss problem in camera in the zone of hand (especially closing on finger position).By the operation of time domain confirmation unit 104, even have the vicious identification label of frame tool that some depth datas are lost, but the net result that is closed on frame and affect may be correct.

Fig. 4 illustrates the process flow diagram that how to carry out according to an exemplary embodiment of the present invention the view cluster and how to train the method for svm classifier device.

Here need to prove, the training of view cluster and svm classifier device all is that off-line is finished, and is asynchronous with the on-line operation of describing before.

In the view sample database, there is in advance the different views of the different gesture-type of catching from different people.By non-supervisory view cluster, automatically with similar towards view be grouped together.Then for each group (cluster), according to gesture-type the view sample is marked and to train for SVM.

With reference to Fig. 4, carry out feature extraction at step S401, each sample image from the view sample database extracts the GDF feature.

Fig. 5 shows the different views of a gesture in the view sample database.

As shown in Figure 5, will divide according to level and vertical direction towards the space, the left side two figure of Fig. 5 have provided the angular range schematic diagram of level and vertical direction; The rightest figure of Fig. 5 is the diagram that the images of gestures under the various visual angles is shown.

Here, two kinds of different methods that are used for making up the viewdata storehouse are arranged:

(1) collects by carry out view from 3D hand model composograph: at first, produce the 3D hand model; Different produce different views towards parameter by inputting.

(2) the actual images of gestures from piece of video is carried out the view collection: in video, catch a plurality of views of gesture, and manually select different views from video.

After step S401, based on the GDF feature each sample image in the view sample database is carried out cluster by the K-medoids method, to obtain orientation information (step S402), here, because it is strong that the GDF feature is described the performance of 3D orientation information of gesture, so in the K-medoids method, use the GDF feature.Can all gesture views be divided into a plurality of clusters based on the GDF feature, in each cluster, the view sample have similar towards with different gesture-type.All cluster centres consist of template set.Specifically, all there is the gesture sample of a cluster centre in each cluster, the gesture of this cluster centre towards can be used as this cluster towards typical value.Be divided in the assorting process this cluster sample towards being considered to cluster centre towards identical, thereby all cluster centres are consisted of template set.

At step S403, can be based on coming the view sample is made marks by the orientation information that cluster obtains at step S402.

At step S404, the view sample of having done mark is carried out the SVM training, thereby obtain the svm classifier device.The svm classifier device is the supervised classifier for statistical classification and regretional analysis, and wherein, SVM is illustrated in the lineoid in the higher dimensional space, realizes good separation by the lineoid apart from the distance maximum of the nearest training data point of any class.

By above description as can be known, adopted the combination of cluster and SVM according to the training of the svm classifier device of this exemplary embodiment, its basic thought is the combination of unsupervised learning method and supervised learning method, by non-supervisory clustering method, automatically with similar towards view be grouped together, then for each group (cluster), according to gesture-type the view sample is marked for the SVM training of supervision.By the combination of unsupervised learning method and supervised learning method, can obtain high performance svm classifier device.

Here need to prove, can change according to actual needs the structure of 3D gesture recognition system.For example, if only need to know orientation information for a gesture, then according to the present invention the 3D gesture recognition system of another exemplary embodiment only needs comprise feature extraction unit 101 and matching unit 102; Only do not need to know gesture-type information if do not need to know orientation information, then can adopt the 3D gesture recognition system of describing below with reference to Fig. 6.

Fig. 6 is the block diagram that illustrates according to the structure of the 3D gesture recognition system of another exemplary embodiment of the present invention.

As shown in Figure 6, the 3D gesture recognition system according to another exemplary embodiment of the present invention can comprise feature extraction unit 201 and actual tree (RBT) sorter 202 that strengthens.

Feature extraction unit 201 is extracted the GDF feature from depth image to be identified, and the GDF feature of extracting is sent to RBT sorter 202 as input.

The feature extraction unit 201 here is similar with the feature extraction unit 101 shown in Fig. 1, is not described in detail.Below described the GDF feature in detail, also be not described in detail here.

The GDF feature that RBT sorter 202 extracts based on feature extraction unit 201 is carried out gesture-type identification.

Here need to prove, RBT sorter 202 also obtains through off-line training.Specifically, extract the GDF feature in each sample image from the view sample database, the GDF feature of extracting is added that class label comes the sample image in the view sample database is trained, to obtain RBT sorter 202.

Fig. 7 is the diagram that the principle of RBT sorter is shown.

As shown in Figure 7, the left figure of Fig. 7 represents that RBT is the decision tree (DT) of cascade, and right figure is the structure of a decision tree.

One of advantage of existing method based on strengthening is: both be used for feature selecting because strengthen, fork is used for the sorter training, so training process saves time very much.Yet RBT sorter 202 has cascade structure, and this cascade structure has a plurality of decision trees as Weak Classifier, and described Weak Classifier relatively easily is used for training.In addition, the simple feature of using in decision tree and the existing method (for example, the Haar feature) is compared, and decision tree is stronger sorter.

Strengthening algorithm is a kind of method of seeking the high supposition of accuracy (classifying rules) by a plurality of weak supposition of combination (sorter), and wherein, the accuracy of each in described a plurality of weak supposition (sorter) only is medium.Call the enhancing algorithm to seek a small amount of Weak Classifier h by iteration, then described a small amount of Weak Classifier h is combined into a strong classifier H, improve the performance of Weak Classifier.Actual Adaboost is a kind of method that scale-of-two Adaboost method is extended to the multicategory classification problem.

The Weak Classifier that uses in RBT is decision tree.Decision tree is a kind of binary number, and each nonleaf node of this binary tree just in time has 2 child nodes, and the output of decision tree is not the class mark, but actual value.Begin recursively to make up decision tree from root node.Separate root node with all training datas.In each node, seek optimal decision rule (that is, optimal separation) based on the Geordie purity rubric.Then, recursively separate left sibling and right node, until satisfy stop condition, for example, reach the depth capacity of tree branch, perhaps all samples in the node belong to same class etc.

In the process of classification, be in the situation of m (m ∈ [1, N]) at the supposition sample labeling, calculate the output of RBT, and select to have the RBT of maximum output.Obtain final classification results shown in the following formula 3.

m = \arg \max (Σ_{i = 1}^{N} v_{mi}); m = 1,2, . . ., N

Wherein, m is possible classification results (m ∈ [1, N]), v _MiIt is the output valve of i decision tree when the expectation mark of sample is m.

Because the RBT sorter is known for a person skilled in the art, therefore no longer RBT sorter 202 is conducted further description here.

According to another exemplary embodiment of the present invention, in order to improve the accuracy of identification, the 3D gesture recognition system also can comprise time domain confirmation unit 203 according to an exemplary embodiment of the present invention.

Because the input of system is video sequence (for example, can come capture video sequences by depth camera), so there is the time domain continuity in the positional information between the consecutive frame.The gesture identification result who supposes the front n frame (a plurality of previous frame) of present frame is r _I-1... r _I-n, time domain confirmation unit 203 can calculate the probability that present frame belongs to certain gestures according to the recognition result of the front n frame of present frame, and the gesture of maximum probability is defined as the gesture of present frame.Specifically, the current recognition result of supposing present frame is r _i, time domain confirmation unit 203 can following formula by utilizing the time domain continuity to improve gesture identification result's correctness:

c＝arg max(p(c _i))

Time domain is confirmed to be the effective means that solves noise or data loss problem.Because in some frames, there is serious data loss problem in camera in the zone of hand (especially closing on finger position).By the operation of time domain confirmation unit 203, even have the vicious identification label of frame tool that some depth datas are lost, but the net result that is closed on frame and affect may be correct.

Fig. 8 is the process flow diagram that illustrates according to the 3D gesture identification method of another exemplary embodiment of the present invention.

At step S801, from the depth image to be identified of input, extract the hand feature.The hand feature can comprise the GDF feature, and can comprise LSM feature and/or HOC feature, above-mentioned feature is described in detail, no longer is repeated in this description here.

At step S802, the GDF in the hand feature of extracting is mated with the view template that obtains in advance, to obtain the orientation information of hand.Below when the matching unit 102 among Fig. 1 is described, described in detail and how to have mated, no longer be repeated in this description here.

At step S803, the orientation information that obtains based on the LSM feature of extracting at step S801 and/or HOC feature and at step S802 is by using svm classifier device identification gesture-type.Describe in detail with reference to Fig. 1 and Fig. 4 about training and the use svm classifier device identification gesture-type of svm classifier device, no longer be repeated in this description here.

According to another exemplary embodiment of the present invention, domain validation step when the 3D gesture identification method also can comprise.Domain validation the time domain confirmation unit 104 among Fig. 1 is described in detail, no longer is repeated in this description here when relevant.

Fig. 9 is the process flow diagram that illustrates according to the 3D gesture identification method of another exemplary embodiment of the present invention.

At step S901, from the depth image to be identified of input, extract the hand feature.

At step S902, the RBT sorter is identified gesture-type based on the feature of extracting.

Step S901 and the step 801 among Fig. 8 among Fig. 9 are similar, are not described in detail here.By having been carried out, the operation of the RBT sorter 202 among Fig. 6 is described in detail here in detail.

In this application, use two databases: tranining database (the view sample database of mentioning namely) and rating database.The gesture view of the different people of being caught by depth camera from different visual angles during tranining database.Current, have the 8 kinds of different gesture-type (referring to (a) of Figure 10) from 10 people, everyone each gesture has 200 views, and 200 * 8 * 10=16000 view photo namely arranged in tranining database.For rating database, for among other 5 people everyone, from different visual angles for different gestures (with reference to (b) of Figure 10, (b) of Figure 10 shows for the view of a certain gesture under different angles, wherein, angular region is divided according to grid position) arbitrarily take 1000 views.In 2 meters distance, catch training data and evaluating data by the PrimeSense camera.The resolution of depth image is 640 * 480 pixels.

Owing to having the gesture identification of two kinds of situations, so estimate respectively two kinds of methods.Being configured to of PC computing machine that is used for experiment: Pentium processor IV, in save as 2GB, CPU is 3.2GHZ.For the VC+SVM method, be set to 9 for the group quantity of view cluster, here, 9 with (b) of Figure 10 in 9 zones corresponding.Use all view sample training.In the view cluster, produce 9 bunches based on the GDF feature by the K-medoids method, and number of iterations is set to 20.The quantity of the training sample of each svm classifier device is 1800.For SVM, use the LSM+HOC assemblage characteristic.It is 30 that the initial parameter gamma is set to 0.0068, C, upgrades these values after 10 iteration are confirmed.In the SVM training process, according to the type of training sample with towards coming the mark training sample.So in each bunch, the mark of gesture is as follows: for example, " towards upper left ", " towards the lower-left " or " fist is towards a left side ".

For 5 evaluating data collection, recognition result following (referring to table 1):

The result of table 1VC+SVM method

The evaluating data collection	Towards accuracy (%)	Gesture accuracy (%)
			DS_1	83.2	88.4
DS_2	88.5	92.2
			DS_3	85.2	83.9
DS_4	79.5	86.6
			DS_5	85.9	86.1
On average	84.4	87.4

For the RBT method, when beginning, the degree of depth of decision tree is 5, and the quantity of decision tree is 100.Some different parameters of performance evaluation have been attempted being used for: at first, attempted the varying number (table 2 has provided the result who obtains by a half-sample and all samples) of training sample; Then, attempted the varying number of Weak Classifier (decision tree): 100,200 and 400, and be displayed in Table 3 the result.

Table 2 has the result of the RBT of different training samples

Table 3 has the result of the RBT of different Weak Classifiers

Can find out from above evaluation result, can obtain preferably classifying quality according to 3D gesture identification method of the present invention and system.

In addition, according to multidimensional gesture identification method of the present invention and system solved hand towards restriction, and can successfully solve self occlusion issue.

Claims

1. 3D gesture recognition system, described 3D gesture recognition system comprises:

Feature extraction unit is extracted trellis depth feature GDF feature from the present frame of video sequence of input, and extracts and be horizontally disposed with moment LSM feature and/or curvature histogram HOC feature;

Matching unit, the GDF feature that feature extraction unit is extracted is mated with the GDF feature of being undertaken by off-line in a plurality of cluster templates that the view cluster obtains, to obtain the orientation information of the gesture in the present frame;

The support vector machines sorter, the orientation information that the LSM feature of extracting based on feature extraction unit and/or HOC feature and matching unit obtain is identified the gesture in the present frame.

2. 3D gesture recognition system as claimed in claim 1 also comprises: the time domain confirmation unit, calculate the probability that present frame belongs to certain gestures according to the recognition result of a plurality of previous frames of video sequence, and the gesture of maximum probability be defined as the gesture of present frame.

3. 3D gesture recognition system as claimed in claim 2, wherein, the time domain confirmation unit is determined the gesture of present frame by following formula:

c＝arg max(p(c _i))

p(c _i)＝prob((r _i＝c _i)/(r _i-1＝c ₁，...，r _i-n＝c _n))；c _i＝1，2，....，N

4. 3D gesture recognition system as claimed in claim 1, wherein, matching unit calculates GDF feature f that expression extracts and the GDF feature T of template by following formula _iBetween the distance of similarity:

dis (f, T_{i}) = Σ_{n = 1}^{N} w_{n} \cdot | f_{n} - T_{in} |

Matching unit will apart from the template institute mark of minimum in being defined as present frame gesture towards, thereby obtain described orientation information.

5. 3D gesture recognition system as claimed in claim 1, wherein, feature extraction unit is divided into a plurality of with present frame, comes the depth value of computing block by the mean value that calculates the degree of depth of pixel in each piece, and the depth value of each piece carried out normalization, to obtain the GDF feature of present frame.

6. 3D gesture recognition system as claimed in claim 1, wherein, feature extraction unit is calculated the invariant moment features of image-region of each flat seam of present frame, then makes up the feature of all flat seams, to consist of the LSM feature of present frame.

7. 3D gesture recognition system as claimed in claim 1, wherein, the curvature value of feature extraction unit computation bound pixel carries out histogram analysis to the curvature value of boundary pixel, thereby obtains the HOC feature of present frame.

8. 3D gesture recognition system as claimed in claim 1 wherein, obtains the svm classifier device by following operation:

Each sample image from the view sample database extracts the GDF feature;

Based on the GDF feature each sample image in the view sample database is carried out cluster by the K-medoids method, to obtain orientation information;

By orientation information the view sample is made marks;

The view sample of having done mark is carried out the SVM training, thereby obtain the svm classifier device.

9. 3D gesture identification method, described 3D gesture identification method may further comprise the steps:

From present frame to be identified, extract trellis depth feature GDF feature, and extraction is horizontally disposed with moment LSM feature and/or curvature histogram HOC feature;

The GDF feature extracted and the GDF feature of being undertaken by off-line in a plurality of cluster templates that the view cluster obtains are mated, to obtain the orientation information of the gesture in the present frame;

The svm classifier device is identified gesture in the present frame based on the orientation information of the LSM feature of extracting and/or HOC feature and acquisition.

10. 3D gesture recognition system, described 3D gesture recognition system comprises:

Feature extraction unit is extracted trellis depth feature GDF feature from present frame to be identified;

The actual tree RBT sorter that strengthens is based on the gesture in the GDF feature identification present frame of feature extraction unit extraction.

11. a 3D gesture identification method, described 3D gesture identification method may further comprise the steps:

From present frame to be identified, extract trellis depth feature GDF feature;

The actual tree RBT sorter that strengthens is based on the gesture in the GDF feature identification present frame that extracts.

12. 3D gesture identification method as claimed in claim 11, described 3D gesture identification method also comprises step: the recognition result according to a plurality of previous frames of video sequence calculates the probability that present frame belongs to certain gestures, and the gesture of maximum probability is defined as the gesture of present frame.

13. 3D gesture identification method as claimed in claim 11, wherein, obtain the RBT sorter by following operation: extract the GDF feature in each sample image from the view sample database, the GDF feature of extracting is added that class label comes the sample image in the view sample database is trained, to obtain the RBT sorter.