CN107102727B - Dynamic gesture learning and recognition method based on ELM neural network - Google Patents
Dynamic gesture learning and recognition method based on ELM neural network
- Publication number
- CN107102727B (application number CN201710160089.1A)
- Authority
- CN
- China
- Prior art keywords
- vector
- shoulder
- gesture
- neural network
- elbow
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Abstract
The invention discloses a dynamic gesture learning and recognition method based on an ELM neural network, which comprises the following steps: 1) collecting structure vectors of the upper limbs of a human body; 2) computing gesture included angle information from the structure vectors; 3) describing the gesture included angle information as a static gesture feature sequence; 4) establishing an ELM neural network with the static gesture feature sequence as the input layer; 5) inputting sample data of the static gesture feature sequence to train the ELM neural network and computing the weights from the hidden layer to the output layer; 6) obtaining the weights from the hidden layer to the output layer, which completes the ELM neural network training; 7) inputting static gesture feature sequence data into the trained ELM neural network for recognition. The invention adopts a learning method based on a feedforward network, the extreme learning machine, and applies it to gesture recognition for human-computer interaction. Compared with a BP neural network, the ELM algorithm has a higher learning speed and a better recognition effect. The method is easier to operate, the network has better generalization capability, and the gesture recognition success rate is higher.
Description
Technical Field
The invention relates to the technical field of human-computer interaction, in particular to a dynamic gesture learning and recognition method based on an ELM neural network.
Background
As a simple and direct human-computer interaction mode, gesture-based interaction is applied in fields such as remote control, home care, motion-sensing games, smart homes and daily teaching, which makes gestures an important research object in the human-computer interaction field. Gesture interaction based on computer vision is a main way of helping humans communicate with machines naturally and without contact, so rapid progress in recognition technology is the key to the flourishing of the human-computer interaction field. At present, gesture-based human-computer interaction technology cannot fully meet people's needs, and the application market urgently requires better techniques to improve existing interaction modes; research on vision-based gesture recognition methods is therefore of great significance.
Gesture recognition research is, at its core, pattern recognition, and pattern recognition in turn extends into artificial intelligence. The two key components of a gesture recognition system are feature extraction and pattern classification, and the performance of the pattern classifier directly affects the performance of the whole recognition system; in other words, the quality of the gesture recognition algorithm directly determines the final classification and recognition effect. According to the characteristics of the algorithms, gesture recognition methods can be divided into template matching methods and state space methods. Template matching compares the extracted gesture features with reference template features one by one and classifies and matches gestures according to a given similarity measure; it mainly includes the dynamic time warping algorithm and the optical flow method. The principle of the state space method is different: each static gesture is treated as a node in a space, and a moving gesture sequence is represented as a traversal among different nodes; it mainly includes hidden Markov models, dynamic Bayesian networks and neural networks.
At present, gesture recognition usually adopts the BP (Back-Propagation) neural network algorithm, which has notable defects. Within a certain range, the gesture recognition rate of the BP algorithm increases with the number of hidden layer nodes and then gradually stabilizes after reaching a certain value; however, when the number of hidden layer nodes becomes too large, the recognition rate may decrease as the number of nodes increases. Moreover, the BP algorithm requires setting many parameters; changes in the weights of each layer, the thresholds, the number of hidden layer nodes, the learning rate and other parameters all affect network performance, and the algorithm easily falls into a local optimum, so the generalization capability of the network is poor.
Disclosure of Invention
Aiming at the defects of the prior art, the dynamic gesture learning and recognition method based on the ELM neural network provided by the invention solves the problems of slow learning and low recognition rates for dynamic gestures in the prior art.
In order to achieve the above object, the present invention provides a dynamic gesture learning and recognition method based on an ELM neural network, which comprises the following steps:
1) collecting structural vectors of upper limbs of a human body;
2) calculating gesture included angle information from the structure vectors;
3) describing the gesture included angle information as a static gesture feature sequence;
4) establishing an ELM neural network by taking the static gesture feature sequence as an input layer;
5) inputting sample data of the static gesture feature sequence to train the ELM neural network, and calculating a weight from a hidden layer to an output layer;
6) obtaining the weight from the hidden layer to the output layer, namely completing ELM neural network training;
7) and inputting the data of the static gesture feature sequence into an ELM neural network for recognition.
Preferably, the structure vectors of the upper limbs of the human body in step 1) comprise the shoulder-center-to-left-shoulder vector, the left-shoulder-to-left-elbow vector, the left-elbow-to-left-wrist vector, the left-wrist-to-left-hand vector, the shoulder-center-to-right-shoulder vector, the right-shoulder-to-right-elbow vector, the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector.
Preferably, the gesture included angle information calculated from the structure vectors in step 2) comprises the angle β_1 between the shoulder-center-to-left-shoulder vector and the left-shoulder-to-left-elbow vector, the angle β_2 between the left-shoulder-to-left-elbow vector and the left-elbow-to-left-wrist vector, the angle β_3 between the left-elbow-to-left-wrist vector and the left-wrist-to-left-hand vector, the angle β_4 between the shoulder-center-to-right-shoulder vector and the right-shoulder-to-right-elbow vector, the angle β_5 between the right-shoulder-to-right-elbow vector and the right-elbow-to-right-wrist vector, and the angle β_6 between the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector.
Preferably, the ELM neural network model in the step 4) is a single hidden layer feedforward neural network.
Preferably, the specific steps of step 4) include:
41) taking sample data of the static gesture feature sequence as an input layer;
42) initializing the network, randomly generating the weight matrix W from the input layer to the hidden layer and the threshold vector b of the hidden layer, and determining the number l of hidden layer nodes and the excitation function g(x);
43) calculating the hidden layer response matrix H according to the definition of the response h(x) of the hidden layer to the input sample data;
44) calculating the weight from the hidden layer to the output layer from the hidden layer response matrix H and the desired output.
Preferably, the weight β̂ from the hidden layer to the output layer in step 44) is calculated as β̂ = H⁺A, where A is the desired output of the system and H⁺ is the generalized inverse (Moore-Penrose pseudoinverse) of the hidden layer response matrix H.
Preferably, the included angles β_4, β_5 and β_6 of the right arm and the included angles β_1, β_2 and β_3 of the left arm are obtained according to the symmetric structure of the human body, and the value ranges of the six included angles are all [0, π].
Preferably, the number l of hidden layer nodes is 39.
The invention adopts a learning method based on a feedforward network, namely the extreme learning machine (ELM), and applies the extreme learning machine to gesture recognition for human-computer interaction. The experimental results show that, compared with a BP neural network, the ELM algorithm has a higher learning speed and a better recognition effect. The method is easier to operate, the network has better generalization capability, and the gesture recognition success rate is higher.
Drawings
FIG. 1 is a schematic diagram of a gesture structure vector according to the present invention.
FIG. 2 is a schematic diagram of a gesture structure vector included angle according to the present invention.
FIG. 3 is a flow chart of the gesture sequence description vector construction in the present invention.
FIG. 4 is a diagram of an ELM neural network mathematical model in the present invention.
FIG. 5 is a flow chart of the ELM algorithm of the present invention.
FIG. 6 is a flow chart of gesture recognition in the present invention.
FIG. 7 is a diagram illustrating a relationship between the number of hidden layer nodes and the gesture recognition rate of the BP algorithm.
FIG. 8 is a diagram illustrating a relationship between the number of hidden layer nodes and the gesture recognition rate of the ELM algorithm.
FIG. 9 is a comparison graph of the ELM algorithm and the BP algorithm.
Detailed Description
In order to make the technical solution and the achievement effect of the present invention clearer, the present invention is further described in detail with reference to the accompanying drawings and the specific embodiments.
The invention provides a dynamic gesture learning and recognition method based on an ELM neural network, which comprises the following steps:
1) collecting the structural vector of the upper limb of the human body.
The static gesture description vector is first constructed from two aspects: the selection of joint points and the angles between the joint vectors. Nine joint points closely related to gesture change are selected to describe the gesture features, namely the left hand, left wrist, left elbow, left shoulder, shoulder center, right hand, right wrist, right elbow and right shoulder.
Constructing gesture structure vectors according to the structural characteristics of the human body in the skeleton model is the basis for describing hand angle information. A total of 8 structure vectors are constructed from the upper-limb part of the human skeleton model; the construction is shown in figure 1. Four of the vectors are formed by the joint points on the left arm: the structure vectors from the shoulder center to the left shoulder, from the left shoulder to the left elbow, from the left elbow to the left wrist, and from the left wrist to the left hand. Correspondingly, the 4 structure vectors formed by the joint points on the right arm are the shoulder-center-to-right-shoulder vector, the right-shoulder-to-right-elbow vector, the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector. The specific positions are shown in figure 1.
2) Calculating gesture included angle information from the structure vectors.
The included angles between adjacent structure vectors are selected to represent the joint angle information, and the values of these included angles are used to construct the static gesture description vector, thereby representing the static gesture features. The invention selects 6 pieces of angle information to construct the static gesture description vector; figure 2 shows the included angles between different structure vectors on the left arm of the human body.
Combining figures 1 and 2: β_1 denotes the angle between the shoulder-center-to-left-shoulder vector and the left-shoulder-to-left-elbow vector, and its value reflects the angle change of the left shoulder joint; β_2 denotes the angle between the left-shoulder-to-left-elbow vector and the left-elbow-to-left-wrist vector; β_3 denotes the angle between the left-elbow-to-left-wrist vector and the left-wrist-to-left-hand vector; their values reflect the angle changes of the left elbow and left wrist joints, respectively. Figure 2 marks the 3 included angles between the left-arm structure vectors; the 3 included angles of the right-arm structure vectors follow from the symmetric structure of the human body: β_4 between the shoulder-center-to-right-shoulder vector and the right-shoulder-to-right-elbow vector, β_5 between the right-shoulder-to-right-elbow vector and the right-elbow-to-right-wrist vector, and β_6 between the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector. The value ranges of the 6 included angles are all [0, π].
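For concreteness, the following is a minimal sketch of how the structure vectors and the six included angles could be computed from 3-D joint coordinates; the joint names, the dictionary-based frame format and the use of numpy are illustrative assumptions, not part of the patent.

```python
import numpy as np

# Hypothetical joint order; any skeleton source providing 3-D coordinates works.
STRUCTURE_VECTORS = [
    ("shoulder_center", "left_shoulder"), ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_wrist"), ("left_wrist", "left_hand"),
    ("shoulder_center", "right_shoulder"), ("right_shoulder", "right_elbow"),
    ("right_elbow", "right_wrist"), ("right_wrist", "right_hand"),
]

# The 6 included angles beta_1..beta_6 are taken between adjacent structure vectors.
ADJACENT_PAIRS = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7)]

def static_gesture_vector(frame):
    """frame: dict mapping joint name -> np.array of shape (3,)."""
    vecs = [frame[end] - frame[start] for start, end in STRUCTURE_VECTORS]
    angles = []
    for i, j in ADJACENT_PAIRS:
        cos = np.dot(vecs[i], vecs[j]) / (np.linalg.norm(vecs[i]) * np.linalg.norm(vecs[j]))
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))  # each angle lies in [0, pi]
    return np.array(angles)  # G = (beta_1, ..., beta_6)
```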
3) The gesture angle information is described as a static gesture feature sequence. The flow of the gesture sequence description vector construction is shown in fig. 3.
Let G denote the static gesture description vector, G = (β_1, β_2, β_3, β_4, β_5, β_6); that is, a static gesture description vector consisting of 6 angle values represents one static gesture feature.
A gesture sequence is composed of several frames of static gestures. Let G_S denote a gesture sequence with N frames of data; then
G_S = (G_1, G_2, ···, G_N)
where G_i denotes the static gesture description vector of the i-th frame, 1 ≤ i ≤ N. That is, a gesture sequence containing N frames of data represents the dynamic gesture feature with N static gesture description vectors.
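A sketch of how the per-frame descriptors could be stacked into one sequence descriptor follows; it reuses static_gesture_vector from the sketch above. The choice of 60 frames (so that 60 × 6 = 360 matches the input dimension reported in the experiments) and the padding strategy are assumptions.

```python
import numpy as np

def gesture_sequence_vector(frames, n_frames=60):
    """Stack N per-frame descriptors G_i into one sequence descriptor G_S.

    n_frames = 60 is an assumption: with 6 angles per frame it yields the
    360-dimensional input vector used in the experiments.
    """
    Gs = [static_gesture_vector(f) for f in frames[:n_frames]]
    while len(Gs) < n_frames:          # pad short sequences by repeating the last frame
        Gs.append(Gs[-1])
    return np.concatenate(Gs)          # shape (6 * n_frames,)
```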
4) And establishing an ELM neural network by taking the static gesture feature sequence as an input layer.
The extreme learning machine (ELM) algorithm adopted by the invention is a learning algorithm for single-hidden-layer feedforward neural networks (SLFNs) and belongs to the family of fast machine learning methods, as shown in figures 4 and 5. In feedforward neural networks, the common learning method is back-propagation, i.e., the BP algorithm. However, the BP algorithm requires setting various parameters and easily converges to a local minimum; moreover, the choice of initial weights and thresholds of the hidden and output layers affects the stability and generalization capability of the network and therefore the final recognition effect. The ELM algorithm, by contrast, only needs the number of hidden layer neurons to be set and does not modify the weights and thresholds, which shortens the training time. In addition, the solution obtained by the ELM algorithm is a global optimum, which avoids the tendency of the BP solution process to fall into a local optimum.
The learning steps of the extreme learning machine are as follows:
5) and inputting sample data of the static gesture feature sequence to train the ELM neural network, calculating weights from a hidden layer to an output layer, and inputting a training sample set to form a gesture sequence G (β) according to an included angle1,β2,β3,β4,β5,β6) And obtaining a plurality of groups of gesture sequences to form a sample set Gs. 6 gesture vector included angles are calculated by 8 gesture vectors, each 6 gesture vector included angles form a G, and a plurality of groups of G form a sample Gs. Gs is input into an extreme learning machine for operation, and finally the weight from the hidden layer to the output layer can be calculated
6) Obtain the weight from the hidden layer to the output layer, which completes ELM neural network training. Training the extreme learning machine neural network is the process of solving for the hidden-to-output weight β̂; once β̂ is solved, training is complete.
Assume the sample size is N and the gesture sequence is G_S = (G_1, G_2, ···, G_N). Let the number of input layer nodes be n and the input vector be x_i = (x_i1, x_i2, ···, x_in)^T (the values fed into the input layer nodes, i.e., part of the gesture sequence in G_S), 1 ≤ i ≤ N, with i and N natural numbers. Let the hidden layer excitation function be g(x), the number of hidden neuron nodes be l, and the hidden layer thresholds be b_j (randomly generated by the system), 1 ≤ j ≤ l, with j and l natural numbers. Let the excitation function of the output layer be f(x), generally set to f(x) = x, the number of output layer nodes be m, and the output node thresholds be bo_s (randomly generated by the system), 1 ≤ s ≤ m, with s and m natural numbers. The output vector is y_i = (y_i1, y_i2, ···, y_im) (the actual output) and the desired output vector is a_i = (a_i1, a_i2, ···, a_im) (the ideal output), 1 ≤ i ≤ N. The weight matrix between the input layer and the hidden layer is W = (w_1, w_2, ···, w_j, ···, w_l) (randomly generated by the extreme learning machine neural network), where w_j is an n-dimensional column vector of the weights from all input nodes to the j-th hidden node. The weight matrix between the hidden layer and the output layer is β = (β_1, β_2, ···, β_s, ···, β_m) (the matrix determined in the ELM training phase; determining β completes training), where β_s is an l-dimensional column vector of the weights from all hidden nodes to the s-th output node. Then, for an ELM neural network containing only one hidden layer, the output of the i-th sample is:
y_i = f[g(W^T x_i + b)^T β + bo] = g(W^T x_i + b)^T β + bo
    = [g(w_1^T x_i + b_1), g(w_2^T x_i + b_2), ···, g(w_l^T x_i + b_l)] β + bo
where the hidden layer threshold vector is b = (b_1, b_2, ···, b_j, ···, b_l)^T and the output node threshold vector is bo = (bo_1, bo_2, ···, bo_s, ···, bo_m). Denote the response of the hidden layer to an input sample vector as h(x_i) = [g(w_1^T x_i + b_1), g(w_2^T x_i + b_2), ···, g(w_l^T x_i + b_l)]; then the output is
y_i = h(x_i) β + bo
For a network with N samples, let the hidden layer response matrix be H = [h(x_1), h(x_2), ···, h(x_N)]^T and Bo^T = (bo, ···, bo)_{1×N}; then the system output can be expressed as Y = Hβ + Bo.
If A = (a_1, a_2, ···, a_N)^T denotes the desired output of the system, the error function of the system can be expressed as E = ||A - Y|| = ||A - (Hβ + Bo)||.
For a feedforward neural network with n input nodes, m output nodes and l hidden nodes, if the excitation function g(x): R → R is infinitely differentiable on any interval, then for randomly generated weight vectors w_j and thresholds b_j the hidden layer response matrix H is invertible and the error function ||A - Hβ|| = 0.
According to the above theorem, as long as the excitation function g(x): R → R is infinitely differentiable on any interval, the weights w_j and thresholds b_j can be assigned at random, so the network does not need to adjust these two parameters during training; and since ||A - Hβ|| = 0, the output node thresholds bo_s also need no adjustment, so the whole network only needs to determine the output layer weight matrix β.
In the ideal case, the N output vectors y_i of the feedforward neural network equal the corresponding desired output vectors a_i, i.e., A = Hβ. If the matrix β in this equation can be found, a neural network with zero output error can be constructed; the error function E is then zero and the output layer weight matrix is β = H^(-1)A. As long as the number of hidden layer nodes satisfies l = N, the error function E is zero and an invertible N × N matrix H must exist.
In general, however, the number of hidden layer neurons is smaller than the number of input samples, i.e., l < N. The hidden layer response matrix H is then not square and its ordinary inverse H^(-1) cannot be obtained, so we instead minimize the system error function E. Let β̂ denote the solution that minimizes the error function E, i.e., ||A - Hβ̂|| = min_β ||A - Hβ||.
If β̂ satisfies this equation and also has the smallest norm, i.e., it minimizes ||β||, then β̂ is the minimum-norm least-squares solution of the equation, and β̂ = H⁺A, where H⁺ is the Moore-Penrose generalized inverse of H.
From this formula the minimum-norm least-squares solution β̂ can be calculated; β̂ is exactly the weight from the hidden layer to the output layer. Once the weight β̂ has been calculated, extreme learning machine training is finished and gesture recognition can be performed.
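The training procedure above reduces to a few matrix operations. The following is a minimal numpy sketch of it, not the patent's own code; the sigmoid excitation, the uniform [-1, 1] initialization and the one-hot desired outputs are assumptions consistent with, but not mandated by, the description.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, A, n_hidden, seed=0):
    """Train a single-hidden-layer ELM.

    X: (N, n) input samples; A: (N, m) desired outputs (e.g. one-hot class labels).
    Returns the random input weights W, hidden thresholds b, and the
    hidden-to-output weights beta_hat = H+ A (minimum-norm least-squares solution).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n, n_hidden))  # input-to-hidden weights, never adjusted
    b = rng.uniform(-1.0, 1.0, size=n_hidden)       # hidden-layer thresholds, never adjusted
    H = sigmoid(X @ W + b)                          # hidden layer response matrix, shape (N, l)
    beta_hat = np.linalg.pinv(H) @ A                # Moore-Penrose generalized inverse solution
    return W, b, beta_hat

def elm_predict(X, W, b, beta_hat):
    """Network output Y = H beta_hat for inputs X."""
    return sigmoid(X @ W + b) @ beta_hat
```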
7) And inputting the data of the static gesture feature sequence into an ELM neural network for recognition.
The desired output A = (a_1, a_2, ···, a_N)^T is chosen according to the actual application; that is, this matrix is used to label the classes of the static gesture feature sequence sample data. After it is chosen, a large amount of sample data is input into the ELM for training. After training, inputting any gesture feature sequence data yields an output value; comparing this output value with the preset desired output values in A determines the meaning the gesture represents. If the output value cannot be matched to any value in the pre-established A standard, recognition fails.
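Continuing the sketch above, recognition could then look like the following; the one-hot convention and the rejection threshold used to report a failed recognition are illustrative assumptions.

```python
import numpy as np

def recognize(sequence_vector, W, b, beta_hat, class_names, reject_threshold=0.5):
    """Classify one gesture sequence descriptor with a trained ELM.

    Assumes the desired outputs A were one-hot class labels; reject_threshold is
    an illustrative way of reporting a failed recognition when no output matches.
    """
    y = elm_predict(sequence_vector[None, :], W, b, beta_hat)[0]
    best = int(np.argmax(y))
    if y[best] < reject_threshold:
        return None  # no acceptable match among the pre-established A values
    return class_names[best]
```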
Test and result analysis
For the experiments, 810 vector samples are selected and divided into two groups, a training set and a test set, each containing 405 samples; the samples cover 45 gestures, the gesture types are labeled, and the experimental data are thus determined. A neural-network-based gesture recognition experiment first trains on a set of samples and then classifies the test samples; in this respect the gesture recognition processes based on the BP algorithm and the ELM algorithm are the same, and the process is shown in fig. 6.
(1) BP algorithm experimental design
The invention uses a BP algorithm with a standard three-layer structure for the experiments on the gesture data. A three-layer BP network has many parameters to determine. The number of input layer nodes i' and the number of output layer nodes o' are fixed by the dimension of the input vector samples and the total number of sample classes, i.e., i' = 360 and o' = 9. The weights and thresholds of the BP network are initialized with values in [-1, 1], which determines some of the training parameters; however, two other important parameters cannot be determined directly, namely the number of hidden layer nodes l' and the learning step η'. At present no established theory directly gives the number of hidden layer nodes or the learning step, but the range of l' can be estimated from the empirical formula l' = sqrt(i' + o') + a', where i', l' and o' denote the numbers of input, hidden and output layer nodes of the BP algorithm and a' is a constant term with a value between 1 and 8. From this formula the empirical range of the number of hidden layer nodes is 20 to 27.
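As a quick check of that empirical range (a worked computation, assuming the rule-of-thumb formula above):

```python
import math

i_prime, o_prime = 360, 9
base = math.sqrt(i_prime + o_prime)          # sqrt(369) is about 19.2
low, high = base + 1, base + 8               # a' ranges over 1..8
print(round(low), "to", round(high))         # 20 to 27 hidden layer nodes
```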
When a BP neural network is used to train the samples, both the undetermined number of hidden layer nodes l' and the learning step η' affect the final gesture recognition result. To obtain the optimal recognition rate of the BP algorithm, the way the recognition result changes with the number of hidden layer nodes and with the learning step must be analyzed first. To make the experimental results reasonable, one parameter is fixed while the influence of the other is studied. Since the range of the hidden layer node number is given by the empirical formula, the learning step is first fixed at η' = 0.2 and the influence of the hidden layer node number on the BP recognition result is studied, with l' set in turn to each value in the empirical range. This yields 8 groups of experimental results on the 405 gesture test samples.
TABLE 1 influence of hidden layer node number on BP Algorithm
According to the values given in table 1, we can study the change rule of the gesture recognition rate of the BP algorithm along with the number of nodes of the hidden layer. FIG. 7 shows the relationship between the number of hidden layer nodes and the gesture recognition rate of the BP algorithm. As can be seen from fig. 7, when the number of hidden layer nodes is within the range of the empirical value, the gesture recognition rate increases with the increase of the number of nodes, and when the number of hidden layer nodes reaches a certain value, the gesture recognition rate may be reduced. This is because increasing the number of hidden layer nodes increases the number of iterations, which causes the neural network to over-fit, resulting in a reduced gesture recognition rate. Therefore, selecting the appropriate number of hidden layer nodes is crucial to the BP neural network model.
As can be seen from table 1, when the number of hidden layer nodes is l' = 25 the gesture recognition rate of the BP algorithm is highest; at this point, after 436 iterations, the mean square error of the BP algorithm reaches the preset minimum and network training ends. The initial value range of the learning step η' of the network is [0.1, 0.7]. With the number of hidden layer nodes fixed at l' = 25, the gesture recognition results for different values of η' within this range are shown in table 2.
TABLE 2 Effect of learning step size η' on BP Algorithm
As can be seen from table 2, the gesture recognition rate of the BP algorithm is highest when the learning step is η' = 0.45, so η' = 0.45 is selected as the optimal value. All parameters giving the BP algorithm its optimal recognition rate are therefore determined: with l' = 25 hidden layer nodes and learning step η' = 0.45, the gesture recognition effect of the BP algorithm is best.
(2) ELM algorithm experimental design
When the ELM algorithm is used to classify and recognize the 405 gesture test samples, the number of input layer nodes is 360 (the dimension of the gesture description vector), the number of output layer nodes is 9 (the number of gesture sample classes), and the excitation function is a sigmoid function. Because the ELM algorithm does not require the weights and thresholds to be determined, once the gesture sample vectors are input the samples can be trained and recognized as soon as the number of hidden layer nodes is fixed. There is no empirical formula for the number of hidden layer nodes of the ELM algorithm in gesture recognition, so the number of hidden layer nodes is set in turn to 1 through 50 and the relationship between the ELM recognition result and the node number is studied, as shown in fig. 8. When the number of hidden layer nodes takes values in [1, 39], the gesture recognition rate of the ELM algorithm rises, with fluctuation, as the number of hidden layer nodes increases, until the node number reaches 39 and the recognition rate reaches its highest value; when the number of hidden layer nodes exceeds 39, the recognition rate decreases as the node number increases. Therefore, the ELM algorithm reaches its optimal gesture recognition rate of 84.2% with 39 hidden layer nodes.
According to the two sets of designed experiments, the invention studies how the gesture recognition results of the BP algorithm and the ELM algorithm change with their respective parameters, and determines the parameters at which each algorithm reaches its highest recognition rate. In the ELM algorithm, because the weight matrix from the input layer to the hidden layer and the thresholds of the hidden layer nodes are randomly generated, the final gesture recognition result fluctuates. To make the results more reasonable, when the ELM algorithm is used to recognize gestures the experiment is repeated 20 times after the number of hidden layer nodes is set, and the average of the 20 results is taken as the gesture recognition rate of the ELM algorithm; the training time of each run is also recorded, and the average over the 20 runs is taken as the training time of the ELM algorithm. The experiments show that when the number of hidden layer nodes is 39, the average gesture recognition rate of the ELM algorithm is 83.3% and the average training time is 0.03 s. The BP experiments show that the BP algorithm reaches its highest gesture recognition rate when the number of hidden layer nodes is l' = 25 and the learning step is η' = 0.45; the corresponding recognition rate and training time are compared in table 3.
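The node-count sweep and the 20-run averaging described above could be scripted as follows, reusing the elm_train and elm_predict sketches; the seeding scheme and the accuracy metric are assumptions.

```python
import numpy as np

def sweep_hidden_nodes(X_train, A_train, X_test, labels_test, max_nodes=50, repeats=20):
    """Vary the hidden node count from 1 to max_nodes; average the test recognition
    rate over several runs, since the random W and b make single runs fluctuate."""
    mean_rates = []
    for l in range(1, max_nodes + 1):
        accs = []
        for seed in range(repeats):
            W, b, beta = elm_train(X_train, A_train, l, seed=seed)
            pred = np.argmax(elm_predict(X_test, W, b, beta), axis=1)
            accs.append(np.mean(pred == labels_test))
        mean_rates.append(np.mean(accs))
    best = 1 + int(np.argmax(mean_rates))      # node count with the highest mean rate
    return best, mean_rates
```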
Table 3 shows the comparison of the best recognition results of the BP algorithm and the ELM algorithm. As can be seen from table 3, both the gesture recognition effect and the training time of the ELM algorithm are better than those of the BP algorithm. The gesture recognition results and training times of the two algorithms are compared in fig. 9.
TABLE 3 optimal identification result comparison of BP algorithm and ELM algorithm
From the above experimental results, the following conclusions can be drawn:
(1) as can be seen from fig. 7, in a certain range, the gesture recognition rate of the BP algorithm becomes higher as the number of hidden layer nodes increases, and then becomes gradually stable after reaching a certain value. However, when the number of hidden layer nodes is too large, the gesture recognition rate may decrease as the number of nodes increases. This is because too many hidden layer nodes lead to an over-training problem, and thus the recognition rate is reduced. As can be seen from table 1, increasing the number of hidden layer nodes easily increases the number of iterations of the network. Therefore, the determination of the number of hidden layer neuron nodes is crucial to the BP network model.
(2) As can be seen from fig. 8, the gesture recognition rate of the ELM algorithm increases with the number of hidden layer nodes within a certain range. When the number of hidden layer nodes reaches a certain value, the gesture recognition rate is reduced along with the increase of the number of hidden layer nodes. Therefore, the number of hidden layer nodes is an important factor influencing the gesture recognition effect of the ELM algorithm, and the determination of the number of hidden layer neuron nodes is a key point of the ELM algorithm gesture recognition.
(3) Comparing the experimental results of the BP algorithm and the ELM algorithm in fig. 9, it can be seen that the gesture recognition effect of the ELM algorithm is superior to that of the BP neural network algorithm, and the training duration of the ELM algorithm is significantly shorter than that of the BP algorithm. The ELM algorithm searches for a global optimal solution, and the input sample can be trained only by determining the number of nodes of the hidden layer without setting a plurality of parameters, so that the network training time is shortened. The BP algorithm needs to set a plurality of parameters, the performance of the network is affected by the change of the parameters such as the weight of each layer, the threshold, the number of nodes of the hidden layer, the learning rate and the like, and the BP algorithm is easy to fall into a local optimal solution, so that the generalization capability of the network is poor. Compared with a BP neural network, the ELM algorithm has the advantages of stronger operability, better generalization capability of the network and higher successful recognition rate of gestures.
Those not described in detail in this specification are within the skill of the art.
Claims (5)
1. A dynamic gesture learning and recognition method based on an ELM neural network is characterized in that: the method comprises the following steps:
1) collecting structure vectors of the upper limbs of a human body; a static gesture description vector is first constructed from two aspects, the selection of joint points and the angles between the joint vectors, and 9 joint points closely related to gesture change are selected to describe the features of the gesture, namely the left hand, left wrist, left elbow, left shoulder, shoulder center, right hand, right wrist, right elbow and right shoulder; constructing gesture structure vectors according to the structural characteristics of the human body in the skeleton model is the basis of the hand angle information, and 8 groups of structure vectors are constructed according to the upper limb part of the human skeleton model; the structure vectors of the upper limbs of the human body comprise the shoulder-center-to-left-shoulder vector, the left-shoulder-to-left-elbow vector, the left-elbow-to-left-wrist vector, the left-wrist-to-left-hand vector, the shoulder-center-to-right-shoulder vector, the right-shoulder-to-right-elbow vector, the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector;
2) calculating gesture included angle information from the structure vectors; the gesture included angle information calculated from the structure vectors comprises the angle β_1 between the shoulder-center-to-left-shoulder vector and the left-shoulder-to-left-elbow vector, the angle β_2 between the left-shoulder-to-left-elbow vector and the left-elbow-to-left-wrist vector, the angle β_3 between the left-elbow-to-left-wrist vector and the left-wrist-to-left-hand vector, the angle β_4 between the shoulder-center-to-right-shoulder vector and the right-shoulder-to-right-elbow vector, the angle β_5 between the right-shoulder-to-right-elbow vector and the right-elbow-to-right-wrist vector, and the angle β_6 between the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector;
The included angles β_4, β_5 and β_6 of the right arm and the included angles β_1, β_2 and β_3 of the left arm are obtained according to the symmetric structure of the human body, and the value ranges of the six included angles are all [0, π];
3) Describing the gesture included angle information as a static gesture feature sequence;
4) establishing an ELM neural network by taking the static gesture feature sequence as an input layer;
5) inputting sample data of the static gesture feature sequence to train the ELM neural network, and calculating a weight from a hidden layer to an output layer;
6) obtaining the weight from the hidden layer to the output layer, namely completing ELM neural network training;
7) and inputting the data of the static gesture feature sequence into an ELM neural network for recognition.
2. The ELM neural network-based dynamic gesture learning and recognition method of claim 1, wherein: the ELM neural network model in the step 4) is a single hidden layer feedforward neural network.
3. The ELM neural network-based dynamic gesture learning and recognition method of claim 2, wherein: the specific steps of the step 4) comprise:
41) taking sample data of the static gesture feature sequence as an input layer;
42) initializing the network, randomly generating the weight matrix W from the input layer to the hidden layer and the threshold vector b of the hidden layer, and determining the number l of hidden layer nodes and the excitation function g(x);
43) calculating the hidden layer response matrix H according to the definition of the response h(x) of the hidden layer to the input sample data;
44) calculating the weight from the hidden layer to the output layer from the hidden layer response matrix H and the desired output.
4. The ELM neural network-based dynamic gesture learning and recognition method of claim 3, wherein: the weight β̂ from the hidden layer to the output layer in step 44) is calculated as β̂ = H⁺A, where A is the desired output of the system and H⁺ is the generalized inverse of the hidden layer response matrix H.
5. The ELM neural network-based dynamic gesture learning and recognition method of claim 4, wherein: the number of hidden layer nodes l is 39.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710160089.1A CN107102727B (en) | 2017-03-17 | 2017-03-17 | Dynamic gesture learning and recognition method based on ELM neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107102727A CN107102727A (en) | 2017-08-29 |
CN107102727B true CN107102727B (en) | 2020-04-07 |
Family
ID=59675073
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710160089.1A Expired - Fee Related CN107102727B (en) | 2017-03-17 | 2017-03-17 | Dynamic gesture learning and recognition method based on ELM neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107102727B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110325965B (en) * | 2018-01-25 | 2021-01-01 | 腾讯科技(深圳)有限公司 | Object processing method, device and storage medium in virtual scene |
CN108509839A (en) * | 2018-02-02 | 2018-09-07 | 东华大学 | One kind being based on the efficient gestures detection recognition methods of region convolutional neural networks |
CN108647292A (en) * | 2018-05-07 | 2018-10-12 | 前海梧桐(深圳)数据有限公司 | Enterprise's property sort computational methods based on neural network algorithm and system |
CN108960171B (en) * | 2018-07-12 | 2021-03-02 | 安徽工业大学 | Method for converting gesture recognition into identity recognition based on feature transfer learning |
CN109271947A (en) * | 2018-09-28 | 2019-01-25 | 合肥工业大学 | A kind of night real-time hand language identifying system based on thermal imaging |
CN110443167B (en) * | 2019-07-23 | 2022-05-17 | 中国建设银行股份有限公司 | Intelligent recognition method and intelligent interaction method for traditional culture gestures and related devices |
CN110390303B (en) * | 2019-07-24 | 2022-04-08 | 达闼机器人有限公司 | Tumble alarm method, electronic device, and computer-readable storage medium |
CN110674747A (en) * | 2019-09-24 | 2020-01-10 | 上海眼控科技股份有限公司 | Behavior judging method and device, computer equipment and readable storage medium |
CN111796519B (en) * | 2020-06-14 | 2022-05-06 | 武汉理工大学 | Automatic control method of multi-input multi-output system based on extreme learning machine |
WO2022007879A1 (en) | 2020-07-09 | 2022-01-13 | 北京灵汐科技有限公司 | Weight precision configuration method and apparatus, computer device, and storage medium |
CN111831356B (en) * | 2020-07-09 | 2023-04-07 | 北京灵汐科技有限公司 | Weight precision configuration method, device, equipment and storage medium |
CN114777771B (en) * | 2022-04-13 | 2024-08-20 | 西安电子科技大学 | Outdoor unmanned vehicle combined navigation positioning method |
CN114997295B (en) * | 2022-05-25 | 2024-09-10 | 吉林大学 | LR-ELM-based lower limb artificial limb movement identification method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105005769A (en) * | 2015-07-08 | 2015-10-28 | 山东大学 | Deep information based sign language recognition method |
CN105807926A (en) * | 2016-03-08 | 2016-07-27 | 中山大学 | Unmanned aerial vehicle man-machine interaction method based on three-dimensional continuous gesture recognition |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105005769A (en) * | 2015-07-08 | 2015-10-28 | 山东大学 | Deep information based sign language recognition method |
CN105807926A (en) * | 2016-03-08 | 2016-07-27 | 中山大学 | Unmanned aerial vehicle man-machine interaction method based on three-dimensional continuous gesture recognition |
Non-Patent Citations (3)
Title |
---|
"Constructive, Robust and Adaptive OS-ELM in Human Action Recognition";Arif Budiman等;《IAICT 2014》;20140830;全文 * |
"基于神经网络的手势识别研究";冯桐;《中国优秀硕士论文全文数据库》;20150715;50-59页 * |
"应用Kinect的人体行为识别方法研究与系统设计";韩旭;《中国优秀硕士论文全文数据库》;20131015;21-30页 * |
Also Published As
Publication number | Publication date |
---|---|
CN107102727A (en) | 2017-08-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107102727B (en) | Dynamic gesture learning and recognition method based on ELM neural network | |
Pranav et al. | Facial emotion recognition using deep convolutional neural network | |
Chen et al. | A novel ensemble ELM for human activity recognition using smartphone sensors | |
Song et al. | An efficient initialization approach of Q-learning for mobile robots | |
Zuo et al. | Deterministic generative adversarial imitation learning | |
Lu et al. | A hybrid wavelet neural network and switching particle swarm optimization algorithm for face direction recognition | |
Su et al. | HDL: Hierarchical deep learning model based human activity recognition using smartphone sensors | |
Zeng et al. | CNN model design of gesture recognition based on tensorflow framework | |
Guo et al. | A deep reinforcement learning method for multimodal data fusion in action recognition | |
CN110009108A (en) | A kind of completely new quantum transfinites learning machine | |
CN104408470A (en) | Gender detection method based on average face preliminary learning | |
Wan | Deep learning: Neural network, optimizing method and libraries review | |
Hu et al. | An optimization strategy for weighted extreme learning machine based on PSO | |
Zhai et al. | Facial beauty prediction via local feature fusion and broad learning system | |
Tan et al. | Two-phase switching optimization strategy in deep neural networks | |
Patel et al. | Quantum inspired binary neural network algorithm | |
Soltani et al. | Newman-Watts-Strogatz topology in deep echo state networks for speech emotion recognition | |
Yang et al. | AM-SGCN: Tactile object recognition for adaptive multichannel spiking graph convolutional neural networks | |
Petluru et al. | Transfer Learning-based Facial Expression Recognition with modified ResNet50 | |
Guo et al. | Exploiting LSTM-RNNs and 3D skeleton features for hand gesture recognition | |
Aleotti et al. | Arm gesture recognition and humanoid imitation using functional principal component analysis | |
Li et al. | Multimodal information-based broad and deep learning model for emotion understanding | |
Tonchev et al. | Human Skeleton Motion Prediction Using Graph Convolution Optimized GRU Network | |
CN114863548A (en) | Emotion recognition method and device based on human motion posture nonlinear spatial features | |
Kasabov et al. | Incremental learning in autonomous systems: evolving connectionist systems for on-line image and speech recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | |
Granted publication date: 20200407 |