CN108537147A - Gesture recognition method based on deep learning - Google Patents

Gesture recognition method based on deep learning

Info

Publication number
CN108537147A
CN108537147A
Authority
CN
China
Prior art keywords
gesture
profile
dynamic
binaryzation
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810242638.4A
Other languages
Chinese (zh)
Other versions
CN108537147B (en)
Inventor
董训锋
陈镜超
李国振
马啸天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
National Dong Hwa University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University filed Critical Donghua University
Priority to CN201810242638.4A priority Critical patent/CN108537147B/en
Publication of CN108537147A publication Critical patent/CN108537147A/en
Application granted granted Critical
Publication of CN108537147B publication Critical patent/CN108537147B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/113 Recognition of static hand signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image

Abstract

The present invention provides a gesture recognition method based on deep learning, characterized by comprising the following steps: training a binarized convolutional neural network using a gesture training set and test set; segmenting the preprocessed original image on the basis of the color information reflected by skin color and extracting the gesture contour; judging the gesture instruction corresponding to the gesture contour using the trained binarized convolutional neural network; locating the start and stop of the dynamic gesture corresponding to a series of gesture contours, tracking the gesture trajectory using the TLD algorithm, correcting deviations during tracking using a Haar classifier, and then recognizing the dynamic gesture using an HMM algorithm. The method provided by the invention addresses the low recognition accuracy, poor stability, poor real-time performance and limited gesture vocabulary common in traditional gesture recognition.

Description

Gesture recognition method based on deep learning
Technical field
The present invention relates to a gesture recognition method based on deep learning, and belongs to the technical field of gesture recognition.
Background technology
The advent of the computer has profoundly influenced human production and daily life: on the one hand it has greatly improved the efficiency of information processing, and on the other hand it has driven the development of intelligent living. How to interact with computers efficiently and conveniently has therefore become a focus of research.
With the development of information technology, human-computer interaction (HCI) has become an important part of daily life. As an emerging mode of human-computer interaction, gesture recognition technology has broad application prospects in many fields: (1) Digital life and entertainment. For example, in 2008 Ericsson launched the smartphone R520m, which captures the user's gesture information through its built-in camera so that gestures can serve as the keyboard or touch screen of the phone interface, enabling control of the alarm clock and incoming calls. (2) Scientific and technological innovation. Space exploration and military engineering frequently involve hazardous environments, or special environments unsuitable for direct human control; in such cases a robot can be remotely operated by gestures to obtain the relevant information. (3) Intelligent transportation, for example autonomous driving. As early as 2010, Google publicly unveiled its driverless car, opening a new era of intelligent transportation.
Gesture recognition technology plays the following roles in the field of human-computer interaction:
(1) For the user, it makes products easier to use, saves the user time, and improves the user experience;
(2) For the product, it eliminates redundant operating instructions; the product need only provide brief guidance on the relevant common gestures.
Summary of the invention
The technical problem to be solved by the present invention is that traditional gesture recognition algorithms generally suffer from low recognition accuracy, poor stability, poor real-time performance and a limited gesture vocabulary.
To solve the above technical problem, the present invention provides a gesture recognition method based on deep learning, characterized by comprising the following steps:
Step 1: train a binarized convolutional neural network using a gesture training set and test set;
Step 2: after acquiring an original gesture image, preprocess the original gesture image to remove the influence of illumination on the original image;
Step 3: using the color information reflected by skin color, segment the preprocessed original image on the basis of that color information and extract the gesture contour;
Step 4: judge whether the gesture contour extracted in step 3 is the start or stop of a dynamic gesture; if so, the gesture contours extracted from the subsequent series of images form a dynamic gesture, go to step 6; if not, the gesture contour is a static gesture, go to step 5;
Step 5: judge the gesture instruction corresponding to the gesture contour using the trained binarized convolutional neural network;
Step 6: locate the start and stop of the dynamic gesture corresponding to the series of gesture contours, track the gesture trajectory using the TLD algorithm, correct deviations during tracking using a Haar classifier, and then recognize the dynamic gesture using an HMM algorithm.
Preferably, in step 2, the preprocessing comprises brightness correction and light compensation;
during brightness correction, highlighted regions in the original gesture image are corrected with a modified exponential transform, dark regions in the original gesture image are corrected with a parameterized logarithmic transform, and other regions are left uncorrected;
light compensation is performed on the basis of a dynamic threshold: the original gesture image is transformed into the YCbCr color space using the total-reflection (perfect reflector) theory, and the set of points with the largest Y components in the YCbCr color space image is taken as the white reference points.
Preferably, in step 3, the original image is segmented using a skin color segmentation algorithm based on the YCbCr color space.
The method provided by the invention addresses the low recognition accuracy, poor stability, poor real-time performance and limited gesture vocabulary common in traditional gesture recognition.
Owing to the above technical solution, the present invention has the following advantages and positive effects compared with the prior art:
The present invention improves on conventional gesture recognition algorithms. It uses an improved illumination compensation strategy so that the original image is easier to process, an improved skin color model to segment gestures and improve segmentation accuracy, an improved deep convolutional network to classify static gestures and raise the static gesture recognition rate, and improved TLD and HMM algorithms to track and recognize dynamic gestures, improving the robustness, real-time performance and recognition rate of the gesture system.
Description of the drawings
Fig. 1 is a schematic diagram of the system structure of the gesture recognition system based on deep learning according to the present invention;
Fig. 2 is a structural diagram of the binarized convolutional neural network of the present invention;
Fig. 3 is a framework diagram of the TLD algorithm;
Fig. 4 is a detailed flowchart of the TLD algorithm;
Fig. 5 is a flowchart of the improved TLD algorithm;
Fig. 6 is a flowchart of the system software design.
Detailed description of the embodiments
The present invention is further explained below with reference to specific embodiments. It should be understood that these embodiments are merely illustrative of the present invention and do not limit its scope. In addition, after reading the teachings of the present invention, those skilled in the art may make various changes or modifications to the present invention, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.
An embodiment of the present invention relates to a gesture recognition method based on deep learning which, as shown in Fig. 1, comprises the following steps:
Step 1: train a binarized convolutional neural network using a gesture training set and test set;
Step 2: after acquiring an original gesture image, preprocess the original gesture image to remove the influence of illumination on the original image;
Step 3: using the color information reflected by skin color, segment the preprocessed original image on the basis of that color information and extract the gesture contour;
Step 4: judge whether the gesture contour extracted in step 3 is the start or stop of a dynamic gesture; if so, the gesture contours extracted from the subsequent series of images form a dynamic gesture, go to step 6; if not, the gesture contour is a static gesture, go to step 5;
Step 5: judge the gesture instruction corresponding to the gesture contour using the trained binarized convolutional neural network;
Step 6: locate the start and stop of the dynamic gesture corresponding to the series of gesture contours, track the gesture trajectory using the TLD algorithm, correct deviations during tracking using a Haar classifier, and then recognize the dynamic gesture using an HMM algorithm.
Each of the above steps is described in further detail below with reference to the embodiment:
1. The preprocessing of the original gesture image in step 2 mainly comprises brightness correction based on exponential and logarithmic transforms, and light compensation based on a dynamic threshold. Specifically:
(1) Brightness correction based on exponential and logarithmic transforms.
The exponential transform corrects only the overly bright regions of an image well, while the logarithmic transform corrects the dark regions well. The two are combined into a light compensation strategy for the human hand, as shown in formula (1): highlighted regions are corrected with a modified exponential transform, dark regions are corrected with a parameterized logarithmic transform, and other regions are left uncorrected.
The parameters in formula (1) are as follows:
g(x, y) denotes the corrected image; f(x, y) denotes the original gesture image; a denotes the highlight adjustment coefficient, a = 0 in this embodiment; b denotes the average brightness of the image, b = 120/log T1 in this embodiment; c denotes a positive constant obtained by experimental debugging, c = T2 in this embodiment; d denotes a positive constant obtained by experimental debugging, d = 1/(255 − T2) in this embodiment; T1 denotes the lower light threshold under dark illumination, T1 = 115 in this embodiment; T2 denotes the upper light threshold under bright illumination, T2 = 135 in this embodiment.
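This piecewise correction can be sketched in Python as follows. The exact exponential and logarithmic expressions of formula (1) are not reproduced above, so the curve shapes below are assumptions; the thresholds T1, T2 and the coefficients b, c, d follow the values given in this embodiment (the highlight adjustment coefficient a = 0 is omitted since it vanishes here).

```python
import numpy as np

# Hedged sketch of the brightness correction of formula (1): log correction
# for dark pixels, a compressing exponential correction for bright pixels,
# mid-range pixels unchanged. The curve shapes are assumed; the constants
# follow the embodiment (T1 = 115, T2 = 135, b = 120/log T1, c = T2,
# d = 1/(255 - T2)).
T1, T2 = 115.0, 135.0
b = 120.0 / np.log(T1)
c = T2
d = 1.0 / (255.0 - T2)

def correct_brightness(f: np.ndarray) -> np.ndarray:
    """f: grayscale image as a uint8 array; returns the corrected image g."""
    f = f.astype(np.float64)
    g = f.copy()
    dark, bright = f < T1, f > T2
    g[dark] = b * np.log(1.0 + f[dark])          # parameterized log transform
    g[bright] = c + (255.0 - c) * (1.0 - np.exp(-d * (f[bright] - T2)))
    return np.clip(g, 0, 255).astype(np.uint8)
```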
(2) Light compensation based on a dynamic threshold.
Based on the total-reflection (perfect reflector) theory, the image is transformed into the YCbCr color space, and the set of points with the largest Y components in the YCbCr color space is taken as the white reference points. The detailed flow is as follows:
Assume the original gesture image is f(x, y) with size m × n. Then:
Step 1: transform the original gesture image f(x, y) from the RGB color space to the YCbCr color space using formula (2), the standard conversion:
Y = 0.299R + 0.587G + 0.114B
Cb = −0.1687R − 0.3313G + 0.5B + 128
Cr = 0.5R − 0.4187G − 0.0813B + 128 (2)
Step 2: obtain the white reference points.
(a) Divide the transformed image into M × N blocks; in this embodiment, M = 3, N = 4;
(b) For each block, compute the average values M_b and M_r of the C_b and C_r components in the YCbCr space;
(c) Use M_b and M_r to compute the mean absolute errors D_b and D_r of the C_b and C_r components of each block, using formula (3):
D_b = (1/sum) Σ |C_b(i, j) − M_b|, D_r = (1/sum) Σ |C_r(i, j) − M_r| (3)
In formula (3), C_b(i, j) denotes the offset of the B component of each pixel relative to the brightness, C_r(i, j) denotes the offset of the R component relative to the brightness, and sum denotes the total number of pixels in the current block.
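The flow up to the block statistics can be sketched as follows. Since the text stops after step (c), the selection of the white reference points and the per-channel gains in this Python sketch follow the usual dynamic-threshold white-balance procedure and are assumptions.

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Standard BT.601 full-range RGB-to-YCbCr conversion (formula (2))."""
    img = img.astype(np.float64)
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    Cb = -0.1687 * R - 0.3313 * G + 0.5 * B + 128.0
    Cr = 0.5 * R - 0.4187 * G - 0.0813 * B + 128.0
    return Y, Cb, Cr

def light_compensate(img, M=3, N=4):
    """Dynamic-threshold light compensation; img is an (h, w, 3) RGB array."""
    Y, Cb, Cr = rgb_to_ycbcr(img)
    h, w = Y.shape
    mask = np.zeros((h, w), dtype=bool)
    for bi in range(M):
        for bj in range(N):
            ys = slice(bi * h // M, (bi + 1) * h // M)
            xs = slice(bj * w // N, (bj + 1) * w // N)
            cb, cr = Cb[ys, xs] - 128.0, Cr[ys, xs] - 128.0
            Mb, Mr = cb.mean(), cr.mean()          # per-block means
            Db = np.abs(cb - Mb).mean()            # mean absolute errors,
            Dr = np.abs(cr - Mr).mean()            # formula (3)
            # near-neutral pixels become white-point candidates (assumed rule)
            mask[ys, xs] = (np.abs(cb - Mb) < 1.5 * Db) & \
                           (np.abs(cr - Mr) < 1.5 * Dr)
    thr = np.percentile(Y[mask], 90)               # brightest 10% (assumed)
    white = mask & (Y >= thr)
    gains = 255.0 / img[white].mean(axis=0)        # per-channel RGB gains
    return np.clip(img.astype(np.float64) * gains, 0, 255).astype(np.uint8)
```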
2. When the preprocessed original image is segmented on the basis of color information in step 3, a skin color segmentation algorithm based on the YCbCr color space is used. Specifically:
The YCbCr color space is also known as the YUV color space: Y denotes luminance, and Cr and Cb denote chrominance and saturation, where Cr reflects the difference between the red part of the RGB input signal and the brightness value of the RGB signal, and Cb reflects the difference between the blue part of the RGB input signal and the brightness value of the RGB signal. The conversion from the RGB color space to YCrCb is given by formula (4) (the same standard conversion as formula (2)).
Through repeated experiments, the basic parameter ranges are as follows:
77 ≤ Cb ≤ 127 AND 132 ≤ Cr ≤ 172 (5)
However, formula (5) covers too wide a skin color range; the interval given is too large, and therefore readily admits interference from orange or brown objects. Aiming at the distinctive features of yellow skin tones, the present invention adjusts these ranges through repeated debugging to obtain the tightened values of formula (6), which effectively exclude the interference of skin-colored objects.
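A minimal Python sketch of the segmentation follows. It uses the published base ranges of formula (5); the tightened ranges of formula (6) are not reproduced above, so in practice the constants below would be narrowed accordingly.

```python
import numpy as np

def skin_mask(img):
    """Binary skin mask from the Cb/Cr thresholds of formula (5);
    img is an (h, w, 3) RGB array."""
    img = img.astype(np.float64)
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    Cb = -0.1687 * R - 0.3313 * G + 0.5 * B + 128.0   # formula (4)
    Cr = 0.5 * R - 0.4187 * G - 0.0813 * B + 128.0
    return (Cb >= 77) & (Cb <= 127) & (Cr >= 132) & (Cr <= 172)
```

The gesture contour can then be extracted from this binary mask, for example with a standard contour-following routine such as OpenCV's findContours.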
3. The binarized convolutional neural network in step 1 is a binary convolutional neural network (BCNN) built on the basis of MPCNN. Specifically:
Current deep convolutional neural network algorithms share a common defect: enormous computational cost. Optimization of network computation therefore focuses on two aspects. Here, on the basis of the MPCNN gesture classification method, a gesture classification method using a binary convolutional neural network (BCNN) is proposed: the neural network is modified with a binarization approximation strategy to reduce its consumption of computing resources. A binarized network reduces computing resource consumption in two main ways: first, the original double-precision weights are represented by binarized approximate weights, reducing the memory occupied by the network during computation; second, the inputs and weights involved in the multiplications that dominate the computation of each layer are replaced by their binarized approximations, so that these multiplications can be simplified to additions and subtractions or even bit operations. The transformation covers both the convolution blocks and the fully connected blocks.
(1) Binarization of the convolution block.
The binarization approximation of the convolutional neural network is carried out as follows:
First, during forward propagation, the weight matrix w of the convolutional network is binarized according to formula (7) to obtain w_b, while the original weight matrix w is retained:
w_b = sign(w) (7)
In formula (7), w_b is the weight matrix obtained after the binarization approximation; c_f, w_f and h_f denote the number, width and height of the convolution kernels, with w ∈ R^(c_f × w_f × h_f). In the standard sign function, sign(w) = 0 when w = 0; here, to achieve binarization, no third value is allowed, so sign(w) = 1 is stipulated when w = 0.
Second, a binarization activation layer is added before each layer to binarize the node values, replacing the original ReLU activation layer, as shown in formula (8):
X_b^(i) = L(X^(i−1)) = sign(X^(i−1)) (8)
In formula (8), X_b^(i) is the binarized input of the i-th layer of the network, with X ∈ R^(c × w × h), where c, w and h denote the number of channels, width and height of the input image; L(X^(i−1)) is the value produced by the binarization activation layer of the i-th layer; X^(i−1) denotes the input of the (i−1)-th layer of the binarized network. The sign function in formula (8) is the same modified sign function as in formula (7). Finally, the weights w_b are used for the convolution operation in the binarized convolutional layer, as shown in formula (9):
L_b(X_b) = X_b ⊛ w_b (9)
In formula (9), L_b(X_b) is the layer function of the binarized network; ⊛ denotes the convolution operation; X_b and w_b are obtained from formulas (8) and (7) respectively.
The structure of the convolution block also requires some adjustment: the BatchNorm normalization layer and the binarization activation layer are placed before the convolution operation, to prevent the output of the binarization activation layer from becoming mostly 1 after passing through the max-pooling layer. The specific network structure is shown in Fig. 2.
The back-propagation process during training is as follows: the last layer computes the gradient, and from the penultimate layer down to the first layer the gradients of the node values and of the weights are propagated backwards layer by layer; the floating-point weights w retained before binarization are then updated to obtain w_u, and the relaxation (clipping) operation of formula (10) is applied.
In formula (10), w_u denotes the floating-point weights after the update during forward propagation; σ(w_u) denotes the probability that the weight w_u > 0; clip() denotes the max/min clipping function.
(2) Binarization of the fully connected block.
The binarization of the fully connected block is almost identical to that of the convolution block, except that the binarized convolutional layer is replaced by a binarized fully connected layer and the max-pooling layer is removed. The binarized fully connected layer is computed as shown in formula (11):
L_b(X_b) = w_b X_b (11)
In formula (11), L_b(X_b) is the layer function of the binarized fully connected layer; X_b and w_b are obtained from formulas (8) and (7) respectively. The binarized fully connected layer omits the bias b.
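One convolution block of the structure just described can be sketched in PyTorch as follows. The BatchNorm-before-activation ordering and the sign(0) := 1 convention follow the text; the straight-through gradient and the layer sizes are assumptions. After each optimizer step, the retained real-valued weights would additionally be clipped to [−1, 1] in the spirit of formula (10).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinActive(torch.autograd.Function):
    """Binarization activation of formula (8), with sign(0) defined as 1."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        out = torch.sign(x)
        out[out == 0] = 1.0                      # the text stipulates sign(0) = 1
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # straight-through estimator (assumed): pass gradients where |x| <= 1
        return grad_out * (x.abs() <= 1).float()

class BinConvBlock(nn.Module):
    """BatchNorm -> binarization activation -> binarized convolution."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.bn = nn.BatchNorm2d(c_in)           # BN placed before activation
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)

    def forward(self, x):
        xb = BinActive.apply(self.bn(x))         # formula (8)
        wb = BinActive.apply(self.conv.weight)   # formula (7); real w retained
        return F.conv2d(xb, wb, padding=self.conv.padding[0])  # formula (9)
```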
4. In step 6, the gesture trajectory is tracked with the TLD algorithm, deviations during tracking are corrected with a Haar classifier, and the dynamic gesture is then recognized with the HMM algorithm. Specifically:
4.1. The TLD algorithm framework consists of three parts: tracking, learning and detection, as shown in Fig. 3.
In the algorithm framework, the three parts cooperate and compensate for one another to accomplish object tracking. The tracking module presupposes that the moving object is not fast, that the object does not undergo large displacements between adjacent frames, and that the tracked target always remains within the camera's field of view; the moving target is estimated on this basis, and if the target disappears from the field of view, tracking fails. The detection module presupposes that the video frames do not interfere with each other; using the model detected and learned previously, the detection algorithm searches for the target in every frame and marks the regions where the target may appear. When the detection module makes an error, the learning module evaluates the error according to the results obtained by the tracking module, generates training samples, and updates the target model of the detection module and the key feature points of the tracking module so as to avoid similar errors. The detailed flowchart of the TLD algorithm is shown in Fig. 4.
The TLD algorithm offers good real-time target tracking, and when the target is occluded or leaves the camera's field of view and later reappears, it can still be recognized and tracked. However, the algorithm requires the tracked target to be selected manually with the mouse at initialization, which hinders automated target tracking; meanwhile, the LBP features used in the detection module, although simple to compute and therefore easy to run in real time, produce position deviations during tracking that cause tracking to fail. Therefore, on the basis of the original TLD algorithm and in view of the characteristics of static gesture recognition and gesture tracking, this system improves the algorithm as follows:
To solve the problem that a target region must be selected manually at algorithm initialization, the static gesture recognition database is added to the detection module: when a gesture matching the gesture database appears in a video frame, the TLD tracking algorithm is initialized automatically. Meanwhile, since a trained static gesture database is used, the learning module of the original TLD algorithm can be removed; when the user's gesture changes, it is only necessary to search the video frames again for a gesture present in the gesture database and then reinitialize the TLD algorithm. The improved TLD algorithm flow is shown in Fig. 5.
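A minimal Python sketch of this auto-initialization follows. Here detect_gesture() is a hypothetical stand-in for the static gesture detector backed by the gesture database, and the TLD tracker is the one shipped in opencv-contrib (cv2.legacy); both the function name and the re-detection policy are assumptions.

```python
import cv2

def track_gesture(video_path, detect_gesture):
    """Yield gesture bounding boxes; detect_gesture(frame) returns an
    (x, y, w, h) box when a database gesture is found, else None."""
    cap = cv2.VideoCapture(video_path)
    tracker = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if tracker is None:
            box = detect_gesture(frame)          # match against gesture database
            if box is not None:
                tracker = cv2.legacy.TrackerTLD_create()
                tracker.init(frame, box)         # auto-initialization, no mouse
            continue
        ok, box = tracker.update(frame)
        if ok:
            yield box                            # trajectory point for the HMM
        else:
            tracker = None                       # gesture changed or lost: re-detect
    cap.release()
```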
4.2. Correcting deviations during tracking with a Haar classifier.
Building a Haar classifier involves two main steps: extracting Haar features and training the classifier. Haar features mainly include center features, linear features, edge features and diagonal features. To obtain the final Haar classifier, the present invention trains with an improved Adaboost algorithm: different weak classifiers are first trained from the Haar features extracted from the samples, and these weak classifiers are then combined into the final strong classifier, i.e. the Haar classifier needed here.
The implementation of the improved Adaboost algorithm is as follows:
Assume X is the sample space and Y is the set of sample class labels. For a typical two-class problem, Y = {0, 1}. Let S = {(x_i, y_i) | i = 1, 2, 3, ..., m} be the labeled training sample set, where x_i ∈ X and y_i ∈ Y, and assume a total of T iterations are performed to reach the final target.
Step 1: initialize the weights of the m samples:
D_1(i) = 1/m, i = 1, 2, ..., m (12)
where D_t(i) denotes the weight of sample (x_i, y_i) in the t-th iteration.
Step 2: for each t = 1, 2, 3, ..., T, compute:
(a) For each feature f of sample x, train a weak classifier h(x, f, p, θ):
h(x, f, p, θ) = 1 if p·f(x) < p·θ, and 0 otherwise (13)
In formula (13), θ denotes the threshold of the weak classifier corresponding to f, and p adjusts the direction of the inequality. Compute the classification error rate ε_f of the weak classifier for every feature, weighted by q_i:
ε_f = Σ_i q_i |h_t(x, f, p, θ) − y_i| (14)
In formula (14), y_i denotes an element of the sample class label space and q_i denotes the weight of the i-th training sample.
(b) Pick the best weak classifier h_t, the one with the minimum error rate ε_t:
ε_t = min_{f, p, θ} Σ_i q_i |h_t(x, f, p, θ) − y_i| (15)
(c) Update the sample weights using the best weak classifier:
D_{t+1}(i) = D_t(i) β_t^(1 − e_i) (16)
β_t = ε_t/(1 − ε_t) (17)
In formula (16), D_{t+1}(i) denotes the probability value of training sample i in round t + 1; D_{t+1} is related to D_t by iteration, so D_{t+1} can be updated from D_t.
In formula (17), β_t denotes the normalization constant.
If sample x_i is correctly classified, then e_i = 0; otherwise e_i = 1.
Step 3: form the final Haar classifier C(x):
C(x) = 1 if Σ_{t=1}^{T} α_t h_t(x) ≥ (1/2) Σ_{t=1}^{T} α_t, and 0 otherwise (18)
α_t = log(1/β_t) (19)
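One boosting round of this procedure can be sketched in numpy as follows; the exhaustive threshold search is the standard Viola-Jones construction and is assumed here, while the error, β_t and weight-update rules follow formulas (14)-(17).

```python
import numpy as np

def adaboost_round(Fv, y, q):
    """One round: Fv is an (m, n_features) matrix of Haar feature values,
    y the {0, 1} labels, q the current sample weights (summing to 1)."""
    m, n = Fv.shape
    best = (None, None, None, np.inf)            # (f, p, theta, eps)
    for f in range(n):
        for theta in np.unique(Fv[:, f]):
            for p in (+1, -1):
                h = (p * Fv[:, f] < p * theta).astype(float)  # formula (13)
                eps = np.sum(q * np.abs(h - y))               # formula (14)
                if eps < best[3]:
                    best = (f, p, theta, eps)                 # formula (15)
    f, p, theta, eps = best
    beta = eps / (1.0 - eps)                                  # formula (17)
    h = (p * Fv[:, f] < p * theta).astype(float)
    e = (h != y).astype(float)                   # e_i = 0 iff classified correctly
    q = q * beta ** (1.0 - e)                    # formula (16)
    return (f, p, theta, np.log(1.0 / beta)), q / q.sum()     # alpha_t, new q
```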
4.3. Dynamic gesture trajectory recognition based on HMM.
In the present invention, dynamic gesture trajectories are recognized with a Hidden Markov Model; the recognition process corresponds to the three problems solved with Hidden Markov Models:
(1) The estimation problem.
Given a Hidden Markov Model λ = (π, A, B) and an observation sequence O = (o_1, o_2, ..., o_T) generated by the model, compute the likelihood P(O | λ) of the observation sequence O. An efficient algorithm for this problem is the forward-backward recursion algorithm.
The forward variable is defined as:
α_t(i) = P(o_1, o_2, ..., o_t, q_t = θ_i | λ), 1 ≤ t ≤ T (19)
In formula (19), P(·) denotes the likelihood of the observation sequence; o_1, o_2, ..., o_t denotes the observation sequence; q_t denotes the state at time t; θ_i denotes a system state value; λ denotes the Hidden Markov Model; T denotes the total observation time; t denotes the time index, taking values from 1 to T.
Write b_j(o_t) = b_jk when o_t = v_k, where b_j(o_t) denotes an entry of the observation probability matrix, b_jk denotes the system observation matrix at any time t, and v_k denotes the k-th observation symbol. The steps of the forward algorithm are:
Initialization:
α_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N (20)
In formula (20), α_1(i) denotes the probability of observing o_1 at time 1 with the hidden state being θ_i; π_i denotes the initial probability distribution.
Recursion:
α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1}), 1 ≤ t ≤ T − 1, 1 ≤ j ≤ N (21)
In formula (21), α_{t+1}(j) denotes the probability of being in hidden state θ_j at time t + 1, and a_ij denotes the system state transition matrix at any time t.
Termination, computing P(O | λ):
P(O | λ) = Σ_{i=1}^{N} α_T(i)
where P(O | λ) denotes the likelihood of generating the observation sequence O under the current model λ. The backward variable is defined as:
β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = θ_i, λ), 1 ≤ t ≤ T (22)
In formula (22), β_t(i) denotes the backward probability at time t used in computing P(O | λ).
The steps of the backward algorithm are:
Initialization:
β_T(i) = 1, 1 ≤ i ≤ N (23)
Recursion:
β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j), t = T − 1, T − 2, ..., 1, 1 ≤ i ≤ N (24)
Computing P(O | λ): using the forward algorithm for the first half of the computation, say the period 0 to t, and the backward algorithm for the second half, the period t to T, the probability can be obtained as:
P(O | λ) = Σ_{i=1}^{N} α_t(i) β_t(i)
(2) The decoding problem.
For a Hidden Markov Model λ = (π, A, B), first take an observation sequence O = (o_1, o_2, ..., o_T) generated by the model; on the basis of this observation sequence, compute the optimal state sequence Q* that the model passes through while generating the observation sequence. The Viterbi algorithm is used here.
(3) The learning problem.
With the parameters of the Hidden Markov Model unknown, and given an observation sequence O = (o_1, o_2, ..., o_T) generated by the model, adjust the model parameters so that the likelihood P(O | λ) is maximized. In this system, the learning problem is solved with the usual Baum-Welch algorithm.
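For the estimation problem, the forward recursion above amounts to a few lines of numpy; this is a generic sketch of formulas (20)-(21), not code from the patent.

```python
import numpy as np

def forward_likelihood(pi, A, B, O):
    """P(O | lambda) by the forward algorithm. pi: (N,) initial distribution,
    A: (N, N) state transition matrix, B: (N, K) observation matrix,
    O: observation sequence as symbol indices."""
    alpha = pi * B[:, O[0]]                      # initialization, formula (20)
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o]            # recursion, formula (21)
    return alpha.sum()                           # termination: sum of alpha_T(i)
```

For example, with pi = np.array([0.6, 0.4]), A = np.array([[0.7, 0.3], [0.4, 0.6]]), B = np.array([[0.5, 0.5], [0.1, 0.9]]) and O = [0, 1, 1], the call returns P(O | λ) for that toy model.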
The gesture recognition platform captures gesture images through a camera and converts the gesture commands in them into instructions the computer can execute. A sample database is needed first; static gesture recognition and dynamic gesture trajectory recognition are both performed on the basis of this database. Gesture images can be obtained from the camera or directly from a local video file; after a gesture image is obtained, gesture segmentation, image binarization, feature extraction and similar operations are performed on it; finally, gesture recognition is carried out and the recognition result is returned for convenient observation of the process. The system software design flow is shown in Fig. 6. The system is developed with multiple threads: image preprocessing and gesture segmentation are completed in sub-thread 1, while dynamic gesture tracking and recognition are completed in sub-thread 3.
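A minimal Python sketch of this multithreaded flow follows; the stage functions are hypothetical stubs standing in for the preprocessing, segmentation and tracking/recognition modules described above.

```python
import queue
import threading
import time

def preprocess(frame):  return frame                 # stub: brightness/light compensation
def segment(frame):     return frame                 # stub: skin-color segmentation
def recognize(contour): print("gesture:", contour)   # stub: TLD tracking + HMM

frames, contours = queue.Queue(), queue.Queue()

def seg_thread():                                    # "sub-thread 1" in the text
    while True:
        contours.put(segment(preprocess(frames.get())))

def rec_thread():                                    # "sub-thread 3" in the text
    while True:
        recognize(contours.get())

threading.Thread(target=seg_thread, daemon=True).start()
threading.Thread(target=rec_thread, daemon=True).start()
frames.put("frame-0")                                # feed one dummy frame
time.sleep(0.1)                                      # let the pipeline drain
```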

Claims (3)

1. A gesture recognition method based on deep learning, characterized by comprising the following steps:
Step 1: train a binarized convolutional neural network using a gesture training set and test set;
Step 2: after acquiring an original gesture image, preprocess the original gesture image to remove the influence of illumination on the original image;
Step 3: using the color information reflected by skin color, segment the preprocessed original image on the basis of that color information and extract the gesture contour;
Step 4: judge whether the gesture contour extracted in step 3 is the start or stop of a dynamic gesture; if so, the gesture contours extracted from the subsequent series of images form a dynamic gesture, go to step 6; if not, the gesture contour is a static gesture, go to step 5;
Step 5: judge the gesture instruction corresponding to the gesture contour using the trained binarized convolutional neural network;
Step 6: locate the start and stop of the dynamic gesture corresponding to the series of gesture contours, track the gesture trajectory using the TLD algorithm, correct deviations during tracking using a Haar classifier, and then recognize the dynamic gesture using an HMM algorithm.
2. The gesture recognition method based on deep learning according to claim 1, characterized in that, in step 2, the preprocessing comprises brightness correction and light compensation;
during brightness correction, highlighted regions in the original gesture image are corrected with a modified exponential transform, dark regions in the original gesture image are corrected with a parameterized logarithmic transform, and other regions are left uncorrected;
light compensation is performed on the basis of a dynamic threshold: the original gesture image is transformed into the YCbCr color space using the total-reflection theory, and the set of points with the largest Y components in the YCbCr color space image is taken as the white reference points.
3. The gesture recognition method based on deep learning according to claim 1, characterized in that, in step 3, the original image is segmented using a skin color segmentation algorithm based on the YCbCr color space.
CN201810242638.4A 2018-03-22 2018-03-22 Gesture recognition method based on deep learning Active CN108537147B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810242638.4A CN108537147B (en) 2018-03-22 2018-03-22 Gesture recognition method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810242638.4A CN108537147B (en) 2018-03-22 2018-03-22 Gesture recognition method based on deep learning

Publications (2)

Publication Number Publication Date
CN108537147A true CN108537147A (en) 2018-09-14
CN108537147B CN108537147B (en) 2021-12-10

Family

ID=63483626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810242638.4A Active CN108537147B (en) 2018-03-22 2018-03-22 Gesture recognition method based on deep learning

Country Status (1)

Country Link
CN (1) CN108537147B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508670A (en) * 2018-11-12 2019-03-22 东南大学 A kind of static gesture identification method based on infrared camera
CN109614922A (en) * 2018-12-07 2019-04-12 南京富士通南大软件技术有限公司 A kind of dynamic static gesture identification method and system
CN109634415A (en) * 2018-12-11 2019-04-16 哈尔滨拓博科技有限公司 It is a kind of for controlling the gesture identification control method of analog quantity
CN109684959A (en) * 2018-12-14 2019-04-26 武汉大学 The recognition methods of video gesture based on Face Detection and deep learning and device
CN110908581A (en) * 2019-11-20 2020-03-24 网易(杭州)网络有限公司 Gesture recognition method and device, computer storage medium and electronic equipment
CN111753764A (en) * 2020-06-29 2020-10-09 济南浪潮高新科技投资发展有限公司 Gesture recognition method of edge terminal based on attitude estimation
CN112183639A (en) * 2020-09-30 2021-01-05 四川大学 Mineral image identification and classification method
CN112270220A (en) * 2020-10-14 2021-01-26 西安工程大学 Sewing gesture recognition method based on deep learning
CN112784812A (en) * 2021-02-08 2021-05-11 安徽工程大学 Deep squatting action recognition method
CN113449573A (en) * 2020-03-27 2021-09-28 华为技术有限公司 Dynamic gesture recognition method and device
CN114049539A (en) * 2022-01-10 2022-02-15 杭州海康威视数字技术股份有限公司 Collaborative target identification method, system and device based on decorrelation binary network
CN114627561A (en) * 2022-05-16 2022-06-14 南昌虚拟现实研究院股份有限公司 Dynamic gesture recognition method and device, readable storage medium and electronic equipment
US20230107097A1 (en) * 2021-10-06 2023-04-06 Fotonation Limited Method for identifying a gesture


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170220122A1 (en) * 2010-07-13 2017-08-03 Intel Corporation Efficient Gesture Processing
CN106502570A (en) * 2016-10-25 2017-03-15 科世达(上海)管理有限公司 A kind of method of gesture identification, device and onboard system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
田淑芳 et al., Remote Sensing Geology (2nd edition), Beijing: Geological Publishing House, 31 January 2014 *
胡骏飞 et al., "Research on gesture classification methods based on binarized convolutional neural networks", Journal of Hunan University of Technology *
范文兵 et al., "A gesture detection and recognition method based on skin color feature extraction", Modern Electronics Technique *
韦艳柳 et al., "Research on face detection algorithms using skin color information and geometric features", Wireless Internet Technology *
齐静 et al., "Research progress on robot visual gesture interaction technology", Robot *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508670A (en) * 2018-11-12 2019-03-22 东南大学 A kind of static gesture identification method based on infrared camera
CN109508670B (en) * 2018-11-12 2021-10-12 东南大学 Static gesture recognition method based on infrared camera
CN109614922A (en) * 2018-12-07 2019-04-12 南京富士通南大软件技术有限公司 A kind of dynamic static gesture identification method and system
CN109614922B (en) * 2018-12-07 2023-05-02 南京富士通南大软件技术有限公司 Dynamic and static gesture recognition method and system
CN109634415A (en) * 2018-12-11 2019-04-16 哈尔滨拓博科技有限公司 It is a kind of for controlling the gesture identification control method of analog quantity
CN109684959A (en) * 2018-12-14 2019-04-26 武汉大学 The recognition methods of video gesture based on Face Detection and deep learning and device
CN110908581B (en) * 2019-11-20 2021-04-23 网易(杭州)网络有限公司 Gesture recognition method and device, computer storage medium and electronic equipment
CN110908581A (en) * 2019-11-20 2020-03-24 网易(杭州)网络有限公司 Gesture recognition method and device, computer storage medium and electronic equipment
CN113449573A (en) * 2020-03-27 2021-09-28 华为技术有限公司 Dynamic gesture recognition method and device
CN111753764A (en) * 2020-06-29 2020-10-09 济南浪潮高新科技投资发展有限公司 Gesture recognition method of edge terminal based on attitude estimation
CN112183639A (en) * 2020-09-30 2021-01-05 四川大学 Mineral image identification and classification method
CN112270220A (en) * 2020-10-14 2021-01-26 西安工程大学 Sewing gesture recognition method based on deep learning
CN112784812A (en) * 2021-02-08 2021-05-11 安徽工程大学 Deep squatting action recognition method
US20230107097A1 (en) * 2021-10-06 2023-04-06 Fotonation Limited Method for identifying a gesture
CN114049539A (en) * 2022-01-10 2022-02-15 杭州海康威视数字技术股份有限公司 Collaborative target identification method, system and device based on decorrelation binary network
CN114627561A (en) * 2022-05-16 2022-06-14 南昌虚拟现实研究院股份有限公司 Dynamic gesture recognition method and device, readable storage medium and electronic equipment

Also Published As

Publication number Publication date
CN108537147B (en) 2021-12-10

Similar Documents

Publication Publication Date Title
CN108537147A (en) A kind of gesture identification method based on deep learning
CN109977918B (en) Target detection positioning optimization method based on unsupervised domain adaptation
US20200285896A1 (en) Method for person re-identification based on deep model with multi-loss fusion training strategy
Wang et al. Research on face recognition based on deep learning
CN104601964B (en) Pedestrian target tracking and system in non-overlapping across the video camera room of the ken
Zhang et al. Robust visual tracking based on online learning sparse representation
Ioannou et al. Emotion recognition through facial expression analysis based on a neurofuzzy network
CN109584248A (en) Infrared surface object instance dividing method based on Fusion Features and dense connection network
CN108629288B (en) Gesture recognition model training method, gesture recognition method and system
CN108268859A (en) A kind of facial expression recognizing method based on deep learning
CN106897670A (en) A kind of express delivery violence sorting recognition methods based on computer vision
CN108256421A (en) A kind of dynamic gesture sequence real-time identification method, system and device
CN111666843A (en) Pedestrian re-identification method based on global feature and local feature splicing
CN112949647B (en) Three-dimensional scene description method and device, electronic equipment and storage medium
CN103049751A (en) Improved weighting region matching high-altitude video pedestrian recognizing method
CN109543632A (en) A kind of deep layer network pedestrian detection method based on the guidance of shallow-layer Fusion Features
CN106778852A (en) A kind of picture material recognition methods for correcting erroneous judgement
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN111158491A (en) Gesture recognition man-machine interaction method applied to vehicle-mounted HUD
CN109344822A (en) A kind of scene text detection method based on shot and long term memory network
Zheng et al. Static Hand Gesture Recognition Based on Gaussian Mixture Model and Partial Differential Equation.
CN115690152A (en) Target tracking method based on attention mechanism
CN114821764A (en) Gesture image recognition method and system based on KCF tracking detection
CN111158457A (en) Vehicle-mounted HUD (head Up display) human-computer interaction system based on gesture recognition
Fan Research and realization of video target detection system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant