CN109033978A

CN109033978A - A CNN-SVM hybrid model gesture recognition method based on an error correction strategy

Info

Publication number: CN109033978A
Application number: CN201810684333.9A
Authority: CN
Inventors: 冯志全; 李健
Original assignee: University of Jinan
Current assignee: University of Jinan
Priority date: 2018-06-28
Filing date: 2018-06-28
Publication date: 2018-12-18
Anticipated expiration: 2038-06-28
Also published as: CN109033978B

Abstract

The present invention provides a CNN-SVM hybrid model gesture recognition method based on an error correction strategy, belonging to the field of human-computer interaction. The method first preprocesses the collected gesture data, then automatically extracts features and performs predictive classification to obtain classification results, and finally corrects the classification results using an error correction strategy. The method reduces the misrecognition rate between easily confused gestures and improves the recognition rate of static gestures.

Description

A CNN-SVM hybrid model gesture recognition method based on an error correction strategy

Technical field

The invention belongs to the field of human-computer interaction, and in particular relates to a CNN-SVM hybrid model gesture recognition method based on an error correction strategy.

Background technique

As computers become increasingly widespread in today's society, a convenient and natural mode of human-computer interaction (HCI) is particularly important for users. Among the many modes of human-computer interaction, gestures, being natural, concise, and intuitive, have attracted growing attention and can play an important role in a variety of real-world scenarios, such as motion-sensing games, sign language recognition, smart wearable devices, and intelligent teaching. The purpose of gesture recognition is to design an algorithm that enables a computer to recognize gestures in images or made by the human body and to understand their meaning, thereby realizing interaction between people and computers. During gesture recognition, gestures typically occur in complex environments; to support accurate human-computer interaction, a gesture recognition algorithm should therefore perform well under varying lighting, viewing angles, backgrounds, and other complex conditions.

Traditional gesture recognition algorithms are mainly based on hidden Markov models (HMM) and template matching. A hidden Markov model can express a Markov process with hidden, unknown parameters, and the gesture recognition process can be regarded as a Markov chain over a time series, so the model can be applied to gesture recognition. Template-matching-based methods take information such as the contour, edges, and spatial distribution of a gesture as features to build gesture templates, and then apply a template matching algorithm to recognize gestures. Both approaches require manual feature extraction. Manually extracted gesture features depend on extensive prior experience and are subjective and limited, so salient features are easily overlooked; as a result, traditional methods tend to have limited recognition capability and low efficiency.

Convolutional neural networks (CNN) are among the most widely used models in machine vision and image processing. Through training, a CNN can learn both local and global features of the input image, which addresses the insufficient feature extraction associated with manually designed features. In recent years, CNNs have been successfully applied to image retrieval, face recognition, facial expression recognition, and object detection. Some researchers have applied CNNs to gesture recognition: Jawad Nagi et al. combined max-pooling layers with a convolutional neural network (MPCNN) for gesture recognition and obtained good results, and Takayoushi et al. proposed an end-to-end deep convolutional network for gesture recognition that also improved recognition accuracy. In gesture recognition applications, however, the networks generally used are relatively shallow, and among traditional static gesture recognition methods, those based on manual feature extraction are time-consuming and have low recognition rates.

Summary of the invention

The object of the present invention is to solve the above problems in the prior art by providing a CNN-SVM hybrid model gesture recognition method based on an error correction strategy. The network used is deeper and can learn deeper-level features, the model's misrecognition rate on easily confused gestures is reduced, and static gestures are ultimately recognized.

The present invention is achieved by the following technical solutions:

A CNN-SVM hybrid model gesture recognition method based on an error correction strategy first preprocesses the collected gesture data, then automatically extracts features and performs predictive classification to obtain classification results, and finally corrects the classification results using an error correction strategy.

The method comprises:

Step 1: preprocessing the collected data to obtain training samples and test samples;

Step 2: obtaining the CNN-SVM hybrid model;

Step 3: inputting the test samples into the CNN-SVM hybrid model obtained in the second step to obtain the classification results, together with the probability estimates and confusion matrix of the classification results;

Step 4: deriving the error correction strategy from the probability estimates and confusion matrix obtained in the third step, and then correcting the classification results using the error correction strategy.

The operation of the first step comprises:

(11) acquiring static gestures to obtain a depth image and a color image of the hand;

(12) processing the depth image to obtain a mask image;

(13) performing an AND operation on the color image and the mask image to obtain a coarse gesture region image;

(14) performing skin color segmentation on the coarse gesture region image using a Bayesian skin color model to obtain segmented images, and dividing the segmented images into two parts, one used as training samples and the other as test samples.

In step (11), the static gestures are acquired using a Kinect.

The second step is implemented by replacing the last output layer of the CNN classifier with an SVM classifier.

The operation of the second step comprises:

(21) inputting the training samples into the input layer of the CNN classifier and training the CNN classifier until the training process converges or the maximum number of iterations is reached, to obtain a trained CNN model;

(22) inputting the training samples into the trained CNN model for automatic feature extraction, to obtain the feature vectors of the training samples;

(23) inputting the feature vectors of the training samples into the SVM classifier for a second round of training; the CNN-SVM hybrid model is obtained once training is complete.

The error correction strategy is as follows: a threshold is defined, classification results that may be wrong are screened out according to the threshold, and the final classification results are then corrected according to statistics obtained from experiments.

The operation of the fourth step comprises:

In an N-class problem, let M_i be the threshold used for error correction of all test samples whose classification result is i. M_i is described as follows:

where M_{i,j} denotes the mean value computed over the samples whose predicted result is i but whose true class is j, and M_i is a vector whose components are indexed by j; S_{i,j} denotes the number of samples whose predicted result is i but whose true class is j; S_i denotes the number of all test samples predicted as class i; P_n(i) denotes the largest value in the probability estimates of the n-th test sample among all test samples predicted as class i, and P_n(j) denotes the second-largest value; i denotes the class to which the largest probability estimate belongs, and j the class to which the second-largest belongs.

When the probability estimates satisfy the following condition, the class corresponding to the largest probability estimate is changed to the class corresponding to the second-largest value:

where w_n(i) denotes the distance between the largest and second-largest probability estimates of a sample predicted as class i, i.e., numerically equal to P_n(i) - P_n(j), and p_ij denotes the probability, taken from the confusion matrix, that the classification result is i but the true class is j.
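The threshold formula and the correction condition themselves do not appear in the text. One reading consistent with the definitions given, offered here only as an assumption and not as the exact formula of the invention, is that M_{i,j} averages the gap between the largest and second-largest probability estimates over the samples predicted as i whose true class is j:

```latex
M_{i,j} = \frac{1}{S_{i,j}} \sum_{n=1}^{S_{i,j}} \bigl( P_n(i) - P_n(j) \bigr),
\qquad
w_n(i) = P_n(i) - P_n(j)
```

Under this reading, a test sample predicted as class i with runner-up class j becomes a candidate for correction when w_n(i) is small relative to M_{i,j}. How p_ij enters the condition cannot be recovered from the surrounding definitions alone.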

Compared with the prior art, the beneficial effects of the present invention are that the method reduces the misrecognition rate between easily confused gestures and improves the recognition rate of static gestures.

Description of the drawings

Fig. 1-1: Photos of nine different types of gestures.

Fig. 1-2: Depth images of the nine gestures in Fig. 1-1.

Fig. 2: Block diagram of the image preprocessing steps in the method of the invention.

Fig. 3: Images from the preprocessing process.

Fig. 4: CNN network structure used in the method of the invention.

Fig. 5: Test accuracy curves on different data sets.

Fig. 6: Block diagram of the steps of the method of the invention.

Detailed description of the embodiments

The invention is described in further detail below with reference to the accompanying drawings:

The present invention combines the advantages of convolutional neural networks and support vector machines, proposing a hybrid model that extracts features automatically and improves the generalization ability of the model, together with an error correction strategy based on probability estimates that reduces the misrecognition rate of easily confused gestures.

As shown in Fig. 6, the method of the invention proceeds as follows. First, the gesture data acquired by the Kinect are segmented in preprocessing to reduce interference from complex backgrounds and other parts of the human body. Then, the hybrid model automatically extracts features and performs predictive classification. Finally, the classification decision is adjusted using the error correction strategy. In tests on the established database, the recognition rate without the error correction strategy is 95.81%, and the average accuracy with the error correction strategy is 97.32%.

Data acquisition in the method for the present invention is as follows:

The system acquires static gestures using a Kinect 2.0, obtaining a depth image and a color image of the hand, and then builds the corresponding gesture database. The gesture database contains 17 classes of gestures in total and consists of still images collected from 300 enrolled students under different lighting and backgrounds. Nine gestures commonly used by people are selected in the present invention, with 3300 images per gesture. Fig. 1-1 and Fig. 1-2 show, respectively, the photos and depth images of the 9 gestures performed by an operator.

Data preprocessing is as follows:

It is easy to see from the collected gesture images that, although the hand gesture is clearly distinguishable in the color image, accurate recognition remains very difficult, because the collected gesture is affected by viewing angle, appearance, shape, other parts of the human body, and complex backgrounds. In the acquired depth image, on the one hand, the depth information is unaffected by the color and texture of the hand itself or by illumination, so it is robust and precise; on the other hand, the depth information reflects the distance from the hand to the acquisition device, so depth varies little within the gesture region. Because the depth image has already been segmented during acquisition, it can be used to help segment the gesture region of interest in the color image, thereby reducing interference from other parts of the human body and complex backgrounds in the color image. The segmentation preprocessing steps are shown in Fig. 2.

During preprocessing, the collected depth image is binarized. During acquisition the depth image has already been converted into a grayscale depth image, i.e., the range of depth values has been mapped to gray values in [0, 255]. Since the gesture region has already been segmented in the depth map during acquisition, the gray values can be used to obtain a binary image of the gesture region. A threshold is set on the grayscale image to obtain the mask image (in the present invention the threshold is 128: pixels greater than 128 are assigned 1 and pixels less than 128 are assigned 0). Directly performing a logical AND of the mask image and the color image yields only a coarse gesture region image: because the depth image and the color image have different resolutions in Kinect acquisition, non-gesture pixels remain around the gesture. Skin color segmentation is therefore applied to the coarse gesture region using a Bayesian skin color model (see "M. J. Jones, et al. Statistical color models with application to skin detection [J]. International Journal of Computer Vision (IJCV), 2002, 46(1): 81-96") to obtain an accurate gesture region image.
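The binarization and AND steps can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: it assumes the depth and color images have already been brought to the same resolution, represents images as nested Python lists, and maps a gray value of exactly 128 to 0 (the text does not specify that case). The function names `depth_to_mask` and `apply_mask` are hypothetical, and the skin color segmentation step is not shown.

```python
def depth_to_mask(gray_depth, threshold=128):
    """Binarize a grayscale depth image: 1 where the gray value exceeds the threshold, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in gray_depth]


def apply_mask(color, mask):
    """Keep color pixels where the mask is 1 and zero out the rest (AND with the mask)."""
    return [
        [pixel if m == 1 else (0, 0, 0) for pixel, m in zip(color_row, mask_row)]
        for color_row, mask_row in zip(color, mask)
    ]


# Toy 2 x 3 example: gray depth values and the matching RGB pixels.
gray_depth = [[200, 90, 128],
              [130, 255, 0]]
color = [[(10, 20, 30), (40, 50, 60), (70, 80, 90)],
         [(1, 2, 3), (4, 5, 6), (7, 8, 9)]]

mask = depth_to_mask(gray_depth)
coarse = apply_mask(color, mask)  # coarse gesture region image
```

Only the pixels whose depth gray value exceeds the threshold survive in `coarse`; everything else is set to black, which is why a further skin color segmentation pass is needed to remove the remaining non-gesture pixels.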

To verify the effectiveness of the segmentation preprocessing, one image is selected at random; its color image, depth image, mask image, coarse gesture region, and segmented image are shown in Fig. 3-1 to Fig. 3-5, respectively.

It can be clearly seen that the segmentation preprocessing in the method of the invention effectively removes the influence of complex backgrounds and other parts of the human body, and that the Bayesian skin color model accurately retains the useful information in the gesture region, providing good data for subsequent training.

The hybrid CNN-SVM model is as follows:

SVM classifier: by choosing different kernel functions, a support vector machine maps samples that are linearly inseparable in the low-dimensional input space into a high-dimensional feature space where they become linearly separable, and constructs the optimal hyperplane in the feature space based on the structural risk minimization principle. This yields a structured description of the data distribution, which relaxes the requirements on data scale and data distribution and effectively reduces the error on an independent test set. The SVM is regarded as one of the most commonly used and most effective classifiers.

In the experiments, SVMs are constructed using LIBSVM (see "Chih-Chung Chang, Chih-Jen Lin. LIBSVM: A library for support vector machines [J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2011, 2(3): 1-27"). LIBSVM is a fast and effective software package for classification and regression that solves multi-class problems using the one-versus-one strategy. LIBSVM can not only predict the classification result but also provide class probability information for each test sample. For a k-class problem, the goal is to estimate the probability that a sample x belongs to each class:

$$p_i = P(y = i \mid x), \quad i = 1, \ldots, k$$

For the one-versus-one strategy, p_i is obtained by solving the following optimization problem:

$$\min_{p} \; \frac{1}{2} \sum_{i=1}^{k} \sum_{j: j \neq i} (r_{ji} p_i - r_{ij} p_j)^2 \quad \text{subject to} \quad p_i \geq 0 \;\; \forall i, \quad \sum_{i=1}^{k} p_i = 1$$

where r_ij is a pairwise probability, defined as:

$$r_{ij} \approx P(y = i \mid y = i \text{ or } j, \; x)$$
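As an illustration of how class probabilities can be recovered from pairwise probabilities, the sketch below minimizes the sum over class pairs of (r_ji p_i - r_ij p_j)^2 subject to the p_i being non-negative and summing to 1, using a fixed-point iteration of the kind used for pairwise coupling in LIBSVM. It is a simplified pure-Python sketch, not LIBSVM's own code; the function name `couple_pairwise` and the tolerance values are assumptions made for the example.

```python
def couple_pairwise(r, eps=1e-10, max_iter=1000):
    """Estimate class probabilities p from pairwise probabilities r[i][j].

    r[i][j] approximates P(y = i | y = i or j, x); diagonal entries are ignored.
    """
    k = len(r)
    # Q is the matrix of the quadratic form: the objective equals p^T Q p.
    Q = [[0.0] * k for _ in range(k)]
    for t in range(k):
        for j in range(k):
            if j != t:
                Q[t][t] += r[j][t] ** 2
                Q[t][j] = -r[j][t] * r[t][j]
    p = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(max_iter):
        Qp = [sum(Q[t][j] * p[j] for j in range(k)) for t in range(k)]
        pQp = sum(p[t] * Qp[t] for t in range(k))
        # At the optimum every component of Qp equals p^T Q p.
        if max(abs(Qp[t] - pQp) for t in range(k)) < eps:
            break
        for t in range(k):
            diff = (pQp - Qp[t]) / Q[t][t]
            p[t] += diff
            pQp = (pQp + diff * (diff * Q[t][t] + 2 * Qp[t])) / (1 + diff) ** 2
            for j in range(k):
                Qp[j] = (Qp[j] + diff * Q[t][j]) / (1 + diff)
                p[j] /= (1 + diff)  # renormalize so the estimates sum to 1
    return p


# Two-class check: with r_01 = 0.7 and r_10 = 0.3 the minimizer is p = (0.7, 0.3).
p2 = couple_pairwise([[0.0, 0.7], [0.3, 0.0]])

# Three-class check with pairwise values that are consistent with p = (0.5, 0.3, 0.2).
true_p = [0.5, 0.3, 0.2]
r3 = [[0.0 if i == j else true_p[i] / (true_p[i] + true_p[j]) for j in range(3)]
      for i in range(3)]
p3 = couple_pairwise(r3)
```

When the pairwise values are mutually consistent, as in the three-class check, the objective reaches zero at the underlying probabilities, so the iteration should recover them up to the tolerance.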

In the experiments of the present invention, the SVMs are trained to predict classification results with probabilities. These probability values are used in the error correction of easily confused gestures, to decide whether a classification result is used directly or is reclassified by the strategy adopted in the present invention.

CNN classifier: a convolutional neural network is a deep feed-forward neural network that takes the image directly as network input. It requires no manual feature definition or selection, avoiding the feature selection and feature extraction stages of traditional recognition algorithms, and it also has good fault tolerance, parallel processing capability, and self-learning capability.

Instead of the MPCNN (see "Chih-Chung Chang, Chih-Jen Lin. LIBSVM: A library for support vector machines [J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2011, 2(3): 1-27"), the present invention trains the more complex CNN described in "A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks [C] // Advances in Neural Information Processing Systems 2 (NIPS), 2012: 1106-1114"; the network structure is shown in Fig. 4. The network has 8 layers in total, comprising 5 convolutional layers and 3 fully connected layers; the last fully connected layer outputs a 9-dimensional softmax that expresses the prediction over the 9 classes. The first convolutional layer convolves the 224 × 224 × 3 input image with 96 kernels of size 11 × 11 × 3 at a stride of 4. The second convolutional layer convolves the response-normalized and pooled output of the first layer with 256 kernels of size 5 × 5 × 48. The third convolutional layer convolves the normalized and pooled output of the second layer with 384 kernels of size 3 × 3 × 256. The fourth convolutional layer has 384 kernels of size 3 × 3 × 192, and the fifth convolutional layer has 256 kernels of size 3 × 3 × 192. Each fully connected layer has 4096 neurons. Because the network structure is complex, the present invention counters overfitting by augmenting the data set: 224 × 224 patches and their horizontal mirror images are randomly extracted from 256 × 256 images, and the network is trained on these extracted patches. Without this scheme, the network would overfit severely, which would force the use of a smaller network and leave deep features unavailable for SVM training.
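The augmentation described above can be sketched in a few lines. This is an illustrative version only: images are nested Python lists, the function name `random_crop_and_mirror` is hypothetical, and mirroring each patch with probability 0.5 is an assumption (the text states only that patches and their horizontal mirror images are used).

```python
import random


def random_crop_and_mirror(image, crop=224, rng=random):
    """Return a random crop x crop patch of `image`, mirrored horizontally half the time."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop)    # randint includes both endpoints
    left = rng.randint(0, width - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # horizontal mirror image
    return patch


# A 256 x 256 stand-in image whose "pixels" are their own (row, column) coordinates.
image = [[(y, x) for x in range(256)] for y in range(256)]
patch = random_crop_and_mirror(image, rng=random.Random(0))
```

Each call yields a different 224 × 224 view of the same source image, so the network sees many more distinct training inputs than there are stored images.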

CNN-SVM hybrid model: in the hybrid CNN-SVM model of the present invention, the last output layer of the CNN is replaced with an SVM. First, the processed images are passed to the input layer, and the original CNN is trained over many iterations until training converges or the maximum number of iterations is reached. The training samples are then fed into the trained CNN model to obtain their feature vectors, which are input to the SVM classifier for a second round of training. Once training is complete, the CNN-SVM model is obtained, and test samples are input to the model to obtain classification results.
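The order of operations in the hybrid model can be sketched as below. The sketch shows only the control flow: `train_hybrid` is a hypothetical helper, and `toy_cnn` and `toy_svm` are deliberately trivial stand-ins (a mean-pixel "feature" and a nearest-neighbour rule) that are neither a CNN nor an SVM. In the experiments these roles are played by a Caffe-trained network and a LIBSVM classifier. The labels "fist" and "palm" are made up for the example.

```python
def train_hybrid(train_images, train_labels, train_cnn, train_svm):
    """Train the CNN first, then train an SVM on the CNN's feature vectors."""
    extract_features = train_cnn(train_images, train_labels)    # stage 1: CNN training
    features = [extract_features(img) for img in train_images]  # stage 2: feature vectors
    svm_predict = train_svm(features, train_labels)             # stage 3: second training

    def predict(image):
        # At test time the SVM takes the place of the CNN's last output layer.
        return svm_predict(extract_features(image))

    return predict


def toy_cnn(images, labels):
    # Stand-in for CNN training: the "feature vector" is just the mean pixel value.
    return lambda img: [sum(img) / len(img)]


def toy_svm(features, labels):
    # Stand-in for SVM training: label of the nearest training feature (not an SVM).
    def predict(f):
        nearest = min(range(len(features)), key=lambda n: abs(features[n][0] - f[0]))
        return labels[nearest]
    return predict


model = train_hybrid([[0, 0, 1], [9, 8, 9]], ["fist", "palm"], toy_cnn, toy_svm)
```

Passing the two training routines in as arguments keeps the sketch independent of any particular deep learning or SVM library.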

Error correction strategy based on probability estimates (HECS): in its final prediction, LIBSVM assigns each sample a probability estimate for every class, and the class finally chosen is the one with the largest probability. Table 1 lists the final probability distributions of some misclassified test samples. The first column of the table gives the true class number of the test sample, the second column gives its predicted class number, and the remaining columns give the probability that the sample belongs to the class of that column. It can be observed that, in the probability estimates of misclassified test samples, the largest estimated probability corresponds to the predicted class, while the second-largest corresponds to the true class.

Table 1

From LIBSVM's final decision rule and the experimental results, it can be seen that for misclassified samples the gap between the probability estimates of the predicted class and the true class is very small, whereas for correctly classified samples the gap between the probability estimate of the predicted class and those of all other classes is comparatively large. Based on this observation, the present invention proposes an error correction strategy based on probability estimates to reduce the classification errors arising in such cases. In an N-class problem, the present invention uses M_i as the threshold for error correction of all test samples whose predicted result is i. M_i is described as follows:

where S_i denotes the number of all test samples predicted as class i, P_n(i) denotes the largest probability estimate of the n-th test sample among all images predicted as class i, and P_n(j) denotes the second-largest value. Here i is the class to which the largest probability estimate belongs, and j is the class to which the second-largest belongs.

When the probability estimates satisfy the following condition, the class corresponding to the largest value is changed to the class corresponding to the second-largest value:

where w_n(i) denotes the distance between the largest and second-largest probability estimates of a sample predicted as class i, i.e., numerically equal to P_n(i) - P_n(j), and p_ij denotes the probability, in the confusion matrix, that the predicted result is i but the true class is j.
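A possible shape for this strategy is sketched below. Because the threshold formula and the correction condition are not reproduced in the text, the sketch rests on two explicit assumptions: the threshold for a class pair (i, j) is the mean gap w = P(i) - P(j) over misclassified samples predicted as i whose true class j is the runner-up, and a prediction is switched to j when its gap falls below that threshold and a uniform random draw falls below p_ij. All function names are hypothetical.

```python
import random


def top_two(probs):
    """Return (i, j): the classes with the largest and second-largest estimates."""
    order = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    return order[0], order[1]


def gap_thresholds(samples):
    """Assumed threshold: mean top-two gap over samples predicted i whose true class is j.

    Each sample is (true_class, probs), with one probability estimate per class.
    """
    gaps = {}
    for true_class, probs in samples:
        i, j = top_two(probs)
        if true_class == j:  # misclassified, and the runner-up is the true class
            gaps.setdefault((i, j), []).append(probs[i] - probs[j])
    return {key: sum(vals) / len(vals) for key, vals in gaps.items()}


def correct(probs, thresholds, confusion_prob, rng=random):
    """Assumed rule: switch to the runner-up when the gap is small, with probability p_ij."""
    i, j = top_two(probs)
    w = probs[i] - probs[j]
    if (i, j) in thresholds and w < thresholds[(i, j)]:
        if rng.random() < confusion_prob.get((i, j), 0.0):
            return j
    return i


# Statistics gathered from labeled samples; the two misclassified ones have gaps 0.04 and 0.10.
samples = [(1, [0.48, 0.44, 0.08]), (1, [0.50, 0.40, 0.10]), (0, [0.90, 0.06, 0.04])]
thresholds = gap_thresholds(samples)
```

The random draw reflects the statement elsewhere in the description that the final decision is corrected "with a certain probability"; a deterministic variant would simply drop that check.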

The advantages of the model of the present invention are as follows:

The present invention constructs the CNN-SVM model to compensate for the limitations of the CNN and SVM classifiers by combining the advantages of both. The learning method of a convolutional neural network is the same as that of a multilayer perceptron (MLP) (see "E. A. Zanaty. Support Vector Machines (SVMs) versus Multilayer Perception (MLP) in data classification [J]. Egyptian Informatics Journal, 2012, 13(3): 177-183"), so a CNN is essentially an extension of the MLP. MLP learning is based on empirical risk minimization, which continually reduces the training error during training. When backpropagation finds a minimum, training converges to that point whether or not it is the global minimum, and the solution is not improved further. Given a fixed distribution of the training set, an SVM instead uses the structural risk minimization principle to find an optimal hyperplane that minimizes the generalization error on the data, so the generalization ability of an SVM is better than that of an MLP.

The advantage of a CNN is that it automatically extracts deep features of the input image, and these features remain invariant when the input image is shifted or distorted to some degree. Manual feature extraction, by contrast, requires meticulous design. Traditional gesture recognition methods based on manually extracted features (such as those in "Jiang Y. An HMM based approach for video action recognition using motion trajectories [C] // IEEE International Conference on Intelligent Control and Information Processing, 2010: 359-464." and "Liu Jie, Huang Jin, Han Dongqi, Tian Feng, et al. Template Matching Algorithm for 3D Gesture Recognition [J]. Journal of Computer-Aided Design & Computer Graphics, 2016, 28(8): 1365-1372") ignore the local visual features of the hand and observe only the contour and color information of the gesture, whereas features such as the bending of the fingers and the distances between fingers are all very important for gesture recognition. Manually designed feature extraction easily overlooks and loses some features. Extracting features with a CNN can therefore gather more representative and relevant information than traditional methods.

The error correction strategy in effect defines a threshold, screens out predicted classification results that may be wrong, and then corrects the final classification decision with a certain probability according to statistics obtained from experiments. Classifying samples with the CNN-SVM model gives good results, but it cannot make an accurate judgment where two samples are hard to tell apart because of occlusion or poor image quality. The error correction strategy proposed in the present invention applies a degree of correction to the classification results of easily confused samples in the final decision, improving the overall accuracy.

The method of the invention is tested and analyzed as follows:

Experimental environment: in this experiment, the gesture recognition model runs on the Windows operating system, with the following hardware configuration: an Intel(R) Core(TM) i5-6500 processor, an NVIDIA GeForce GT 730 GPU, 8 GB of memory, and 2 GB of video memory. The CNN is built with Caffe; the SVM classifier uses a radial basis function (Gaussian RBF) kernel and is implemented with the LIBSVM software package. All algorithms in the experiments run on the Matlab 2014a platform.

The experimental results are analyzed as follows:

In the experiments, the color images and depth images are first segmented in preprocessing, and the resulting 29700 segmented gesture images form the data set of the present invention, of which 27000 images are used for model training and 2700 for testing. The maximum number of iterations in CNN training is 30000. As can be seen from Fig. 5, the system converges at about 10000 iterations; the model obtained after 30000 iterations is used for the final test and reaches an accuracy of 88.35% on the test set. The CNN-SVM model is then built by replacing the last fully connected layer with an SVM classifier, and the 4096-dimensional feature vectors are fed into the SVM for training and testing. In the experiments the SVM uses an RBF kernel; to find the optimal penalty coefficient C and the optimal kernel parameter g, 5-fold cross-validation is applied on the training set. The search ranges of the two parameters are g = [2^3, 2^1, ..., 2^-15] and C = [2^15, 2^13, ..., 2^-5], giving 11 × 10 = 110 different combinations in total, and the final choice is C = 64, g = 0.00024414. These two parameters are then used to train the hybrid model; the final training accuracy is 99.94%, and the accuracy on the 2700 test images is 95.81%. Table 2 lists the training and test accuracies of the CNN and of the CNN-SVM on the data set prepared for the present invention.
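The parameter search can be sketched as an exhaustive scan over the two exponent ranges. In the sketch, `cv_accuracy` is a hypothetical callable standing in for the 5-fold cross-validation accuracy of an RBF-kernel SVM trained with a given (C, g); the toy scoring function used to exercise it is made up and carries no experimental meaning.

```python
import math
from itertools import product


def grid_search(cv_accuracy):
    """Return the best (C, g) pair under `cv_accuracy`, plus the full grid that was scanned."""
    c_values = [2.0 ** e for e in range(15, -6, -2)]   # 2^15, 2^13, ..., 2^-5 (11 values)
    g_values = [2.0 ** e for e in range(3, -16, -2)]   # 2^3, 2^1, ..., 2^-15 (10 values)
    grid = list(product(c_values, g_values))           # 11 x 10 = 110 combinations
    best = max(grid, key=lambda cg: cv_accuracy(*cg))
    return best, grid


def toy_score(C, g):
    # Made-up stand-in for cross-validation accuracy, peaked at C = 2^7, g = 2^-11.
    return -abs(math.log2(C) - 7) - abs(math.log2(g) + 11)


best, grid = grid_search(toy_score)
```

With a real scoring function, each of the 110 evaluations would itself involve five SVM training runs, which is why the grid is kept coarse.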

As can be seen from Fig. 5, with a maximum of 30000 iterations in every case, the color images give the lowest accuracy, reaching at most 37.92%; the depth images improve markedly on the color images, reaching 79.07%; and the preprocessed images give the highest accuracy, reaching 88.35%. When training directly on unprocessed color images, the training samples themselves contain a large amount of noise (complex background information and other parts of the human body). The segmented depth images are free of interference from the background and other body parts, but because the acquired depth information has been projected onto gray values in [0, 255], some information is lost. The gestures segmented by the preprocessing of the present invention both remove most of the interference from complex backgrounds and other body parts and retain the complete color information of the gesture region, so that richer features can be extracted for classification during CNN training. By feeding the test samples into the hybrid model for classification, a confusion matrix can be computed, as shown in Table 2:
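A confusion matrix of this kind, and the probabilities p_ij derived from it, can be computed as in the sketch below. The orientation (rows indexed by predicted class, columns by true class) and the normalization of each row by the number of samples predicted as that class are assumptions chosen to match the description of p_ij as "predicted i but true class j"; the function names and the toy labels are hypothetical.

```python
def confusion_counts(predicted, true, num_classes):
    """counts[i][j] = number of test samples predicted as class i whose true class is j."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for i, j in zip(predicted, true):
        counts[i][j] += 1
    return counts


def confusion_probabilities(counts):
    """p[i][j] = fraction of the samples predicted as class i whose true class is j."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


# Toy example with 3 classes and 8 test samples.
predicted = [0, 0, 0, 1, 1, 2, 2, 2]
true = [0, 0, 1, 1, 1, 2, 2, 0]

counts = confusion_counts(predicted, true, 3)
p = confusion_probabilities(counts)
accuracy = sum(counts[c][c] for c in range(3)) / len(true)
```

The off-diagonal entries of `p` indicate which pairs of gestures are most often confused, which is the information the error correction strategy draws on.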

Table 2

Over 100 tests, the error correction rate falls mainly within [3%, 5%] and the accuracy is most concentrated within [97%, 98%]; the average error correction rate is 4.12% and the average accuracy is 97.32%.

Table 3 gives the gesture recognition accuracy of the method of the invention and of other methods on the provided data set. Unlike the method of the invention, "Yamashita T, Watasue T. Hand posture recognition based on bottom-up structured deep convolutional neural network with curriculum learning [C] // Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014: 853-857" uses a relatively simple convolutional neural network, an MPCNN built from max-pooling layers and a convolutional neural network, and obtains a recognition accuracy of 68.89% on the test set. "Shao-Zi Li, Bin Yu, Wei Wu, Song-Zhi Su, Rong-Rong Ji. Feature learning based on SAE-PCA network for human gesture recognition in RGBD images [J]. Neurocomputing, 2015, 151(2): 565-573" uses an end-to-end convolutional neural network and obtains a gesture recognition accuracy of 85.43%. "Xiao-Xiao Niu, Ching Y. Suen. A novel hybrid CNN-SVM classifier for recognizing handwritten digits [J]. Pattern Recognition, 2012, 45(4): 1318-1325" first segments the gesture using depth information and skin color information, then extracts features with an SAE-PCA model based on feature learning, and finally classifies with an SVM classifier, reaching a final gesture recognition accuracy of 93.32%. The accuracies of the different gesture recognition methods on the data set of the present invention are shown in Table 3:

Table 3

As can be seen, the method of the invention is clearly better than the other methods in recognition accuracy.

The method of the invention first applies segmentation preprocessing to the depth data and color data of the gesture, eliminating the influence of other body parts and complex backgrounds in the color data. It then extracts gesture features with a convolutional neural network, avoiding the complicated process of hand-designing features from the contour and geometric properties of the gesture, and obtains probability estimates of the gesture with a support vector machine. Finally, combining the probability estimates with the confusion matrix obtained from experiments, it proposes an error correction strategy to correct the classification results of the model. Extensive experimental results show that the method recognizes static gestures effectively, improves to some extent the ability of the CNN-SVM model to classify easily confused gestures, and raises the final overall recognition accuracy.

The above technical solution is one embodiment of the present invention. Based on the methods and principles disclosed herein, those skilled in the art can readily make various improvements or variations that are not limited to the method described in the above specific embodiment; the embodiment described above is therefore merely preferred and is not limiting.

Claims

1. A CNN-SVM mixed model gesture identification method based on error correction strategies, characterized in that: the method first preprocesses the collected gesture data, then automatically extracts features and performs predictive classification to obtain classification results, and finally corrects the classification results using error correction strategies.

2. The CNN-SVM mixed model gesture identification method based on error correction strategies according to claim 1, characterized in that the method comprises:

Step 2: obtaining a CNN-SVM mixed model;

Step 3: inputting the test samples into the CNN-SVM mixed model obtained in the second step for training, to obtain the classification results together with the probability estimates and the confusion matrix of the classification results;

Step 4: obtaining error correction strategies based on the probability estimates and the confusion matrix obtained in the third step, and then correcting the classification results using the error correction strategies.
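
The confusion matrix named in the third step can be illustrated with a minimal sketch. The function names and the toy labels below are hypothetical, and the row normalisation (dividing each count by the number of samples predicted as class i) is an assumption about how the probability p_ij is obtained; the claims do not state it.

```python
def confusion_counts(predicted, actual, n_classes):
    """counts[i][j] = number of samples predicted as class i whose true class is j."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for i, j in zip(predicted, actual):
        counts[i][j] += 1
    return counts


def confusion_probabilities(counts):
    """ASSUMPTION: p[i][j] = counts[i][j] / (number of samples predicted as i)."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


# Toy data (hypothetical): six test samples, three gesture classes.
predicted = [0, 0, 1, 1, 1, 2]
actual = [0, 1, 1, 1, 2, 2]
S = confusion_counts(predicted, actual, 3)
p = confusion_probabilities(S)
```

Indexing rows by the predicted class matches the wording "the classification result is i but the true value is j"; a library that indexes rows by the true class would need its matrix transposed first.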

3. The CNN-SVM mixed model gesture identification method based on error correction strategies according to claim 2, characterized in that the operation of the first step comprises:

(12) processing the depth image to obtain a mask image;

(14) performing skin color segmentation on the coarse gesture area image using a Bayesian skin color model to obtain segmented images, and dividing the segmented images into two parts, one part serving as training samples and the other part as test samples.
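
A Bayesian skin color model of the kind named in step (14) can be sketched as follows. This is only an illustration under stated assumptions: the colour-histogram likelihoods, the 8-bin quantisation, the 0.5 decision threshold, and all function names and pixel values are hypothetical and are not taken from the claims.

```python
from collections import Counter


def quantize(pixel, bins=8):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse colour bin."""
    step = 256 // bins
    return tuple(channel // step for channel in pixel)


def fit_skin_model(skin_pixels, other_pixels):
    """Collect the histograms for P(colour | skin), P(colour | non-skin) and the prior P(skin)."""
    return {
        "skin": Counter(quantize(px) for px in skin_pixels),
        "other": Counter(quantize(px) for px in other_pixels),
        "n_skin": len(skin_pixels),
        "n_other": len(other_pixels),
        "prior": len(skin_pixels) / (len(skin_pixels) + len(other_pixels)),
    }


def skin_posterior(pixel, model):
    """Bayes rule: P(skin | colour) = P(colour | skin) P(skin) / P(colour)."""
    key = quantize(pixel)
    like_skin = model["skin"][key] / model["n_skin"]
    like_other = model["other"][key] / model["n_other"]
    evidence = like_skin * model["prior"] + like_other * (1 - model["prior"])
    return like_skin * model["prior"] / evidence if evidence else 0.0


def segment(image, model, threshold=0.5):
    """Keep pixels whose skin posterior reaches the threshold; blank the rest."""
    return [[px if skin_posterior(px, model) >= threshold else (0, 0, 0)
             for px in row] for row in image]


# Hypothetical labelled pixels and a one-row image.
model = fit_skin_model(
    skin_pixels=[(220, 170, 140), (210, 160, 130), (225, 175, 150)],
    other_pixels=[(30, 30, 30), (40, 90, 200), (215, 165, 135)],
)
image = [[(222, 172, 141), (20, 20, 20)]]
segmented = segment(image, model)
```

In practice the two histograms would be estimated from many labelled pixels, and the segmentation would be applied only inside the coarse gesture area.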

4. The CNN-SVM mixed model gesture identification method based on error correction strategies according to claim 3, characterized in that: in step (11), static gestures are acquired using a Kinect.

5. The CNN-SVM mixed model gesture identification method based on error correction strategies according to claim 3, characterized in that: the second step is implemented by replacing the last output layer of the CNN classifier with an SVM classifier.

6. The CNN-SVM mixed model gesture identification method based on error correction strategies according to claim 5, characterized in that the operation of the second step comprises:

(21) inputting the training samples into the input layer of the CNN classifier and training the CNN classifier until the training process converges or reaches the maximum number of iterations, to obtain a trained CNN model;

(22) inputting the training samples into the trained CNN model for automatic feature extraction, to obtain the feature vectors of the training samples;

(23) inputting the feature vectors of the training samples into the SVM classifier for a second round of training, the CNN-SVM mixed model being obtained once training is complete.

7. The CNN-SVM mixed model gesture identification method based on error correction strategies according to claim 2, characterized in that the error correction strategies are: specifying a threshold, screening out erroneous classification results according to the threshold, and then correcting the final classification results according to statistical data obtained from experiments.

8. The CNN-SVM mixed model gesture identification method based on error correction strategies according to claim 1, characterized in that the operation of the fourth step comprises:

In an N-class classification problem, let M_i be a threshold used to perform error correction on all test samples whose classification result is i; M_i is described as follows:

wherein M_i,j denotes the mean calculated over the samples whose prediction result is i but whose true value is j, and M_i is a j-dimensional vector; S_i,j denotes the number of all samples whose prediction result is i but whose true value is j; S_i denotes the number of all test samples predicted as class i; P_n(i) denotes the maximum value of the probability estimate of the n-th test sample among all test samples predicted as class i, and P_n(j) denotes the second-largest value; i denotes the class to which the maximum value of the classification estimate belongs, and j denotes the class to which the second-largest value belongs;

wherein w_n(i) denotes the distance between the maximum and the second-largest probability estimate of a sample whose prediction result is class i, i.e. it is numerically equal to P_n(i) - P_n(j), and p_ij denotes the probability in the confusion matrix that the classification result is i but the true value is j.
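
The quantities defined in this claim can be illustrated with a minimal sketch. Only w_n(i) = P_n(i) - P_n(j) is taken directly from the text. Everything else is an assumption made for illustration, because the formula for M_i is not reproduced here: M_i,j is taken to be the plain mean of w_n(i) over samples predicted as i whose true class is j, the confusion-matrix weight p_ij is left out, and a sample is switched to its runner-up class j when its margin does not exceed M_i,j. All function names and probability values are hypothetical.

```python
def top_two(probs):
    """Return (i, j, w): the best class, the runner-up class, and w = P(i) - P(j)."""
    order = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    i, j = order[0], order[1]
    return i, j, probs[i] - probs[j]


def fit_thresholds(prob_rows, true_labels):
    """ASSUMPTION: M[(i, j)] is the mean margin of samples predicted i whose true class is j."""
    sums, counts = {}, {}
    for probs, truth in zip(prob_rows, true_labels):
        i, _, w = top_two(probs)
        if truth != i:  # only misclassified samples contribute
            sums[(i, truth)] = sums.get((i, truth), 0.0) + w
            counts[(i, truth)] = counts.get((i, truth), 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def correct(probs, thresholds):
    """ASSUMPTION: switch to the runner-up j when the margin is at or below M[(i, j)]."""
    i, j, w = top_two(probs)
    limit = thresholds.get((i, j))
    return j if limit is not None and w <= limit else i


# Hypothetical probability estimates with known true labels.
prob_rows = [[0.50, 0.45, 0.05], [0.48, 0.42, 0.10], [0.90, 0.05, 0.05]]
true_labels = [1, 1, 0]
M = fit_thresholds(prob_rows, true_labels)

ambiguous = correct([0.52, 0.47, 0.01], M)   # small margin between classes 0 and 1
confident = correct([0.80, 0.15, 0.05], M)   # large margin, prediction kept
```

The design choice illustrated is that correction is applied only to low-margin predictions between a pair of classes that the experiments have shown to be confused, so confident predictions are never altered.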