CN110046544A

CN110046544A - Digital gesture identification method based on convolutional neural networks

Info

Publication number: CN110046544A
Application number: CN201910147442.1A
Authority: CN
Inventors: 张国山; 赵阳
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2019-02-27
Filing date: 2019-02-27
Publication date: 2019-07-23

Abstract

The invention relates to a digit gesture recognition method based on convolutional neural networks, comprising the following steps: acquire gesture images for the 10 digit classes with a Kinect depth camera and filter them; build a sample set for each digit gesture from the filtered images by applying morphological preprocessing to the filtered gesture images, labeling the images by class, and dividing the resulting samples into a training set and a test set; construct a convolutional neural network (CNN); input the training sample set, extract image features, and carry out classification training; and recognize the images in the test data set with the trained convolutional neural network.

Description

Digital gesture identification method based on convolutional neural networks

Technical field

The invention relates to the fields of deep learning and image processing, and in particular to gesture recognition based on convolutional neural networks (CNN).

Background technique

Gesture recognition has long been a popular research topic. Digit gesture recognition must address data acquisition, image processing and selection, the choice of input sample representation, the choice of pattern recognition classifier, and the supervised training of the recognizer on a sample set.

Gestures are an indispensable part of communication between people, and gesture recognition technology has opened up a new mode of interaction between humans and machines, devices, or computers. With the development of science and technology, gesture recognition has progressed from the data-glove era, which relied on external auxiliary equipment, to the pattern classification stage based on computer vision. Currently popular vision-based gesture recognition is divided into three stages: segmentation, feature extraction, and recognition. Gesture segmentation is the basis of gesture recognition; its goal is to segment the gesture from images with complex backgrounds. Because skin color shows certain clustering characteristics in color space, most current gesture segmentation methods rely on skin color features (YUV, HSV, YCbCr, etc.) or geometric features (such as ellipse models or graph models). Current research work carries out gesture detection and gesture recognition separately, and the main research direction is how to apply new techniques such as mathematical morphology, neural network algorithms, and genetic algorithms to gesture recognition as recognition technology continues to be refined.

The greatest difficulty in gesture recognition research lies in the following. In the data processing stage, the system splits the video captured by the camera into frames, separates single gesture images from the video frames, and applies preprocessing such as smoothing and sharpening. It then detects whether a gesture image is present and, if so, separates the gesture from the background. In the gesture analysis stage, features of the gesture are detected and the corresponding feature parameters are estimated with the selected gesture model. In the recognition and classification stage, after feature extraction and model parameter estimation, a classification algorithm assigns the points or trajectories in parameter space to different subspaces, and the recognition result is finally converted into a specific meaning for the practical application. Factors such as illumination and pixel resolution all affect the accuracy of the recognition system to varying degrees. Kinect depth images, by contrast, are independent of ambient lighting and shadow, and their pixels clearly express the shape of the scene.

The Kinect depth camera is a motion-sensing input device made by Microsoft for its Xbox 360 game console and for Windows PCs. As a motion-sensing peripheral, it is in effect a 3D motion-sensing camera that uses a space positioning technology called Light Coding. The Kinect has three lenses: the middle one is an RGB color camera, and the left and right ones are an infrared emitter and an infrared CMOS camera that together form the 3D depth sensor. It provides real-time motion capture, image recognition, microphone input, speech recognition, and community interaction, letting players interact with the Xbox 360 through a natural user interface using body gestures and voice commands.

As an important part of intelligent computer interfaces, digit gesture recognition is of great significance. Improving this technology can greatly increase the efficiency of computer use, and it is the best future input method for fields such as office automation, smart homes, and robot interaction control. At present, the problems in gesture recognition fall mainly into three areas: 1) acquiring the data set; 2) preprocessing the gesture images, where accurately separating the gesture pose from the picture is the main issue in image detection; and 3) combining digit gesture recognition with neural networks so as to achieve the best recognition effect.

Summary of the invention

The object of the invention is to provide an automatic digit gesture recognition method based on convolutional neural networks with a better recognition effect. The technical solution is as follows:

A digit gesture recognition method based on convolutional neural networks, comprising the following steps:

(1) Acquire gesture images for the 10 digit classes with a Kinect depth camera and filter the gesture images; build a sample set for each digit gesture from the filtered images as follows: apply morphological preprocessing to the filtered gesture images; label the images by class to obtain the sample set for each digit gesture, and divide it into a training set and a test set;

(2) Construct the convolutional neural network (CNN):

(2a) Import the digit gesture images of each class into the convolutional neural network as the input layer (inputs), with size [320, 320, 3, 59];

(2b) Construct an 8-layer convolutional neural network and apply operations such as convolution, down-sampling, and pooling to each pixel of the input image to obtain the feature maps of every layer;

(2c) Use the output of each layer as the input of the next, so that after the 8 layers the result converges at the fully connected (fc) layer and is output by the softmax classifier of the output layer;

(3) Input the training sample set, extract image features, and carry out classification training:

(3a) Classify the image feature vectors with a softmax classifier;

(3b) Train on the training sample set with the convolutional neural network algorithm to obtain the trained model .mat file;

(4) Recognize the images in the test data set with the trained convolutional neural network.

The filtering in step (1) is preferably carried out as follows, using a depth image filtering algorithm based on the joint bilateral filter: the depth image and the color image of the gesture captured by the Kinect at the same instant are taken as input; Gaussian kernel functions are used to compute the spatial distance weight of the depth image and the gray-level weight of the RGB color image; the two weights are multiplied to obtain the joint filtering weight, which defines the joint bilateral filter; and the noisy image is convolved with this filter to obtain the filtered Kinect depth image.
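The weighting described above can be sketched as follows, in pure Python for readability. This is a minimal illustration under stated assumptions rather than the filter used in the invention: the function name, the parameters `radius`, `sigma_s`, and `sigma_r`, and the rule that zero-valued depth pixels are treated as holes and skipped are all hypothetical, and a direct Gaussian is evaluated for both weights.

```python
import math

def joint_bilateral_filter(depth, guide, radius=2, sigma_s=2.0, sigma_r=10.0):
    """Smooth `depth` with weights taken from pixel distance and from `guide`.

    depth, guide: 2-D lists of numbers with the same shape. `guide` stands for
    the gray-level version of the color frame captured at the same instant.
    """
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = 0.0
            den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue
                    if depth[yy][xx] <= 0:
                        # Assumption: a depth of 0 marks a hole, so it is skipped.
                        continue
                    # Spatial distance weight (Gaussian on pixel offset).
                    w_s = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                    # Gray-level weight (Gaussian on guide intensity difference).
                    diff = guide[yy][xx] - guide[y][x]
                    w_r = math.exp(-(diff * diff) / (2 * sigma_r ** 2))
                    # Joint weight is the product of the two weights.
                    weight = w_s * w_r
                    num += weight * depth[yy][xx]
                    den += weight
            out[y][x] = num / den if den > 0 else 0.0
    return out
```

Because the gray-level weight collapses across strong intensity edges in the color image, depth values are averaged mainly with neighbors on the same side of an edge, which is why this kind of filter tends to keep object boundaries sharp while filling small holes.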

The CNN-based gesture recognition of the invention efficiently handles the acquisition and further denoising of gesture depth images and the automatic recognition and output of digit gesture images, with a recognition accuracy of about 93%. The algorithm has a degree of robustness to illumination changes, simple geometric deformation, and additive noise, and can be used in fields related to digit gesture recognition; with extension, it can also be used for the automatic recognition of other gestures.

Detailed description of the invention

Fig. 1 is algorithm flow chart of the invention.

Fig. 2 shows the preprocessed data set.

Fig. 3 is the entirety training figure of network.

Fig. 4 is the recognition result of test set picture.

Specific embodiment

The purpose of the invention is to acquire a data set with a Kinect depth camera and, after morphological image preprocessing and denoising, to recognize digit gestures automatically with the constructed convolutional neural network so as to meet practical requirements. It mainly comprises the following steps:

(1) Obtain the sample set for each digit gesture:

(1a) Acquire the data set for the 10 digit classes;

(1b) Apply morphological preprocessing to the acquired images;

(1c) Label the images by class and divide the collected digit gesture data set into a training set and a test set;

(2) Construct the convolutional neural network (CNN):

(3a) Classify the image feature vectors with a softmax classifier;

(4) Automatically recognize the pictures in the test data set with the trained convolutional neural network.

The test sample set is input into the trained convolutional neural network; that is, the trained model .mat file is loaded and run on the test sample set to recognize each digit gesture picture automatically and output the test results.
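Step (1b) above names morphological preprocessing without specifying the operation. As one hedged illustration, the sketch below applies a binary opening (erosion followed by dilation) with a 3x3 structuring element, which removes isolated specks from a hand mask. The choice of opening, the 3x3 element, and the function names are assumptions and are not taken from the text.

```python
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def erode(img):
    """Keep a pixel only if its whole 3x3 neighborhood is foreground."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy, dx in OFFSETS))
             for x in range(w)] for y in range(h)]

def dilate(img):
    """Set a pixel if any pixel in its 3x3 neighborhood is foreground."""
    h, w = len(img), len(img[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy, dx in OFFSETS))
             for x in range(w)] for y in range(h)]

def opening(img):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(img))
```

Pixels outside the image are treated as background during erosion, so foreground touching the border is eroded there; a different border rule may suit real hand masks better.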

The specific steps of the invention are described below with reference to Fig. 1:

(1) sample set of each digital gesture feature is obtained；

Acquiring 10 classes includes the image of digital gesture feature as data set, wherein including expression 0,1,2,3,4,5,6, 7, each 1050 of 8,9 digital gesture, 10500 training datasets, the digital gesture characteristic pattern of each classification are to utilize in total Kinect depth camera respectively from different angles be acquired under light.For the depth image of Kinect camera lens acquisition Generally there are noise and black hole phenomenon, directly applies to and identify that its effect is poor, we utilize the depth based on joint two-sided filter Image filter arithmetic is spent, using the depth image of Kinect camera lens synchronization acquisition and color image as input.Firstly, with Gaussian kernel function calculates the space length weight of depth image and the gray scale weight of RGB color image, then weighs the two Value multiplication obtains Federated filter weight, and designs joint two-sided filter using fast Gaussian transform replacement gaussian kernel function. Finally, carrying out convolution algorithm with the filter result of this filter and noise image realizes Kinect depth image filtering.Then with Machine extraction is wherein used as training sample set for 10000, carries out manual sort's label, is left 500 and is used as test sample collection；Most Training sample set 10000, test sample collection 500 are obtained eventually.
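The random 10000/500 split described above can be sketched as follows. The file names, the fixed seed, and the helper name `split_dataset` are hypothetical and serve only to make the example self-contained.

```python
import random

def split_dataset(samples, n_train, seed=0):
    """Shuffle a copy of `samples` and split it into training and test parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# 10 digit classes x 1050 images each = 10500 labelled samples.
# The file names are placeholders, not names used in the patent.
samples = [(f"digit{d}_{i:04d}.png", d) for d in range(10) for i in range(1050)]
train_set, test_set = split_dataset(samples, n_train=10000)
print(len(train_set), len(test_set))  # 10000 500
```

A purely random draw does not guarantee that each digit appears equally often in the 500 test images; a per-class split would be one way to enforce that, though the text does not say whether it was done.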

(2) Construct the convolutional neural network (CNN):

The digit gesture images of each class are imported into the convolutional neural network as the input layer (inputs), with size [320, 320, 3, 59]. An 8-layer convolutional neural network is constructed, and operations such as convolution, down-sampling, and pooling are applied to each pixel of the input image to obtain the feature maps of every layer. The output of each layer is used as the input of the next, so that after the 8 layers the result converges at the fully connected (fc) layer and is output by the softmax classifier of the output layer.
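To make the tensor sizes concrete, the sketch below counts the elements in the [320, 320, 3, 59] input and shows how a feature map shrinks through one convolution and one pooling stage. Reading the four dimensions as (height, width, channels, images per batch) is an assumption, and the kernel sizes and strides are hypothetical because the text does not give the layer parameters.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

# Input batch from the text, read here as (height, width, channels, images).
height, width, channels, batch = 320, 320, 3, 59
n_elements = height * width * channels * batch
print(n_elements)  # 18124800

# Hypothetical first stage: 11x11 convolution with stride 4,
# followed by 3x3 pooling with stride 2.
after_conv = conv_output_size(height, kernel=11, stride=4)
after_pool = conv_output_size(after_conv, kernel=3, stride=2)
print(after_conv, after_pool)  # 78 38
```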

The training sample set is input, image features are extracted, and classification training is carried out. The image feature vectors are classified with a softmax classifier. The training sample set is trained with the convolutional neural network algorithm to obtain the trained model .mat file.
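The softmax classification step can be illustrated with a short sketch: the scores produced by the last layer for the 10 digit classes are turned into probabilities, and the most probable class is reported. The function names and the example scores are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_digit(scores):
    """Return the index (digit 0-9) of the most probable class."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)

# Example scores for the digits 0..9; the largest score is at index 7.
example_scores = [0.2, 1.1, -0.5, 0.0, 0.3, 2.0, 0.1, 4.2, 1.5, -1.0]
predicted = predict_digit(example_scores)
```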

The basic flow of the convolutional neural network algorithm is as follows: randomly initialize the network weights and neuron thresholds, then carry out forward propagation according to formula (1):

The inputs and outputs of the hidden-layer neurons and output-layer neurons are computed layer by layer. Here E denotes the output error, d the true value, and w_jk and v_ij the weights and thresholds of each layer.

Error back propagation is carried out according to formula (2):

Here θ is the learning rate of the back-propagation algorithm (θ = 0.001 in the invention); n is the number of input vectors (n = 320*320*3*59 in the invention); m is the number of hidden-layer output vectors (in the invention m changes with the output vector of each convolutional layer); and l is the number of output-layer output vectors (l = 1*1*4096*59 in the invention). The negative sign in the formula indicates gradient descent in weight space, that is, the weights change in the direction that decreases E. The weights and thresholds are corrected by the formula until the termination condition is met.
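As a hedged illustration only, a standard back-propagation formulation consistent with the symbols defined above would read as follows. It is not a reproduction of the patent's formulas (1) and (2). The symbol o_k for the k-th network output is an assumption, as is the reading of v_ij as the input-to-hidden parameters and w_jk as the hidden-to-output parameters.

```latex
\begin{align}
E &= \frac{1}{2} \sum_{k=1}^{l} \left( d_k - o_k \right)^2 \\
\Delta w_{jk} &= -\theta \, \frac{\partial E}{\partial w_{jk}},
  \qquad j = 1, \dots, m, \quad k = 1, \dots, l \\
\Delta v_{ij} &= -\theta \, \frac{\partial E}{\partial v_{ij}},
  \qquad i = 1, \dots, n, \quad j = 1, \dots, m
\end{align}
```

With θ = 0.001, each update moves a parameter by one thousandth of the corresponding partial derivative, in the direction that lowers E.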

(3) Automatically recognize the digit gesture images with the trained convolutional neural network.

The test sample set is input into the trained convolutional neural network; that is, the trained model .mat file is loaded and run on the test sample set to recognize each digit gesture picture automatically and output the test results.

Compared with the prior art, the invention has the following features and advantages:

First, the invention applies convolutional neural networks to digit gesture recognition. The data set consists of images of the 10 digit gesture classes: 1050 images for each of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, for 10500 images in total. The images of each class are captured with the Kinect depth camera from different angles and under different lighting, and morphological filtering and denoising are then applied to the 10500 collected pictures. Experimental results on the test set show that most gesture images are recognized correctly, as shown in Fig. 4. Table 2 compares the recognition accuracy of current traditional algorithms with that of the method of the invention; as the table shows, the method of the invention achieves comparatively better recognition accuracy.

Second, the invention constructs parallel pooling layers. The benefit of this structure is that, when the training data set produces outputs of the same dimension, it effectively reduces the top-1 error (where top-1 means the highest-probability class is the correct answer) and the top-5 error (where top-5 means the correct answer is among the five highest-probability classes). In the CNN structure, each neuron of a feature extraction layer is connected to a local receptive field of the preceding layer and extracts the features of that local region. Once a local feature has been extracted, its positional relationship to the other features is also determined, which facilitates the extraction of feature vectors.

Third, when acquiring digit gesture images the invention uses the Kinect depth image filtering algorithm based on the joint bilateral filter, which better preserves the relevant features of the original image and helps improve the recognition accuracy.
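The top-1 and top-5 measures mentioned in the second point can be computed with a sketch like the one below, which counts how often the correct label is missing from the k highest-scoring classes. The function name and the toy scores are hypothetical and are not results from the invention.

```python
def top_k_error(score_rows, labels, k):
    """Fraction of samples whose true label is not among the k best-scoring classes."""
    misses = 0
    for scores, label in zip(score_rows, labels):
        # Class indices sorted from highest score to lowest.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)

# Toy example with 3 classes and 3 samples (made-up scores).
toy_scores = [
    [0.1, 0.7, 0.2],  # true class 1: ranked first
    [0.5, 0.3, 0.2],  # true class 1: ranked second
    [0.6, 0.3, 0.1],  # true class 2: ranked third
]
toy_labels = [1, 1, 2]
top1 = top_k_error(toy_scores, toy_labels, k=1)
top2 = top_k_error(toy_scores, toy_labels, k=2)
```

For the 10-class digit problem, the same function would be called with k=1 and k=5 on the score vectors produced for the test set.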

Claims

1. A digit gesture recognition method based on convolutional neural networks, comprising the following steps:

(1) Acquire gesture images for the 10 digit classes with a Kinect depth camera and filter the gesture images; build a sample set for each digit gesture from the filtered images as follows: apply morphological preprocessing to the filtered gesture images; label the images by class to obtain the sample set for each digit gesture, and divide it into a training set and a test set;

(2) Construct the convolutional neural network (CNN):

(2a) Import the digit gesture images of each class into the convolutional neural network as the input layer (inputs), with size [320, 320, 3, 59];

(2b) Construct an 8-layer convolutional neural network and apply operations such as convolution, down-sampling, and pooling to each pixel of the input image to obtain the feature maps of every layer;

(2c) Use the output of each layer as the input of the next, so that after the 8 layers the result converges at the fully connected (fc) layer and is output by the softmax classifier of the output layer;

(3a) Classify the image feature vectors with a softmax classifier;

(3b) Train on the training sample set with the convolutional neural network algorithm to obtain the trained model .mat file;

2. The method according to claim 1, wherein the filtering in step (1) is carried out as follows, using a depth image filtering algorithm based on the joint bilateral filter: the depth image and the color image of the gesture captured by the Kinect at the same instant are taken as input; Gaussian kernel functions are used to compute the spatial distance weight of the depth image and the gray-level weight of the RGB color image; the two weights are multiplied to obtain the joint filtering weight, which defines the joint bilateral filter; and the noisy image is convolved with this filter to obtain the filtered Kinect depth image.