CN107102727B - Dynamic gesture learning and recognition method based on ELM neural network

Publication number: CN107102727B (granted; published as application CN107102727A)
Application number: CN201710160089.1A
Authority: CN (China)
Inventors: 郭志强, 李博闻, 黄晶
Assignee: Wuhan University of Technology (WUT)
Legal status: Expired - Fee Related
Classifications

    • G06F 3/017: Gesture-based interaction, e.g. based on a set of recognized hand gestures (electric digital data processing; input arrangements for interaction between user and computer)
    • G06N 3/084: Backpropagation, e.g. using gradient descent (computing arrangements based on biological models; neural networks; learning methods)
    • G06V 40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language (image or video recognition; recognition of movements or behaviour)
Abstract

The invention discloses a dynamic gesture learning and recognition method based on an ELM neural network, which comprises the following steps: 1) collecting structure vectors of the upper limbs of a human body; 2) calculating gesture included-angle information from the structure vectors; 3) describing the gesture included-angle information as a static gesture feature sequence; 4) establishing an ELM neural network with the static gesture feature sequence as the input layer; 5) inputting sample data of the static gesture feature sequence to train the ELM neural network and calculating the weights from the hidden layer to the output layer; 6) obtaining the hidden-to-output weights, which completes ELM neural network training; 7) inputting static gesture feature sequence data into the ELM neural network for recognition. The invention adopts a learning method based on a feedforward network, the extreme learning machine, and applies it to gesture recognition for human-computer interaction; compared with a BP neural network, the ELM algorithm has a higher learning speed and a better recognition effect. The method offers stronger operability, better network generalization ability and a higher gesture recognition success rate.

Description

Dynamic gesture learning and recognition method based on ELM neural network
Technical Field
The invention relates to the technical field of human-computer interaction, in particular to a dynamic gesture learning and recognition method based on an ELM neural network.
Background
As a simple and direct human-computer interaction mode, gesture-based interaction finds application in remote control, home care, motion-sensing games, smart homes, daily teaching and other fields, making gestures an important research object in the human-computer interaction field. Computer-vision-based gesture interaction is a principal way of enabling natural, contactless communication between humans and machines, so rapid progress in recognition technology is key to the flourishing of the field. At present, gesture-based human-computer interaction technology cannot fully meet users' needs, and the interaction market urgently requires better techniques to improve existing interaction modes; research on vision-based gesture recognition methods is therefore of great significance.
Gesture recognition research belongs to pattern recognition, which in turn extends into artificial intelligence. The two key components of a gesture recognition system are feature extraction and pattern classification, and the performance of the pattern classifier directly affects that of the whole recognition system. In other words, the quality of the gesture recognition algorithm directly determines the final classification and recognition effect. By algorithmic characteristics, gesture recognition methods fall into template matching methods and state-space methods. Template matching compares the extracted gesture features one by one against reference template features and classifies and matches gestures by a given similarity algorithm; its main representatives are the dynamic time warping algorithm and the optical flow method. State-space methods work on a different principle: each static gesture is taken as a node in a space, a moving gesture sequence is represented as a traversal among different nodes, and the main representatives are hidden Markov models, dynamic Bayesian networks, neural networks and the like.
At present, gesture recognition usually adopts the BP (back-propagation) neural network algorithm, which has drawbacks: within a certain range the gesture recognition rate rises as the number of hidden-layer nodes increases, then gradually stabilizes after reaching a certain value; but when the number of hidden-layer nodes becomes too large, the recognition rate can fall as the node count increases. Moreover, the BP algorithm requires setting many parameters; network performance is affected by changes in the per-layer weights, thresholds, number of hidden-layer nodes, learning rate and so on; and the algorithm easily falls into a local optimum, giving the network poor generalization ability.
Disclosure of Invention
Aiming at the defects of the prior art, the dynamic gesture learning and recognition method based on the ELM neural network provided by the invention solves the prior art's problems of slow learning and low recognition rates for dynamic gestures.
In order to achieve the above object, the present invention provides a dynamic gesture learning and recognition method based on an ELM neural network, which comprises the following steps:
1) collecting structural vectors of upper limbs of a human body;
2) calculating gesture included-angle information from the structure vectors;
3) describing the gesture included angle information as a static gesture feature sequence;
4) establishing an ELM neural network by taking the static gesture feature sequence as an input layer;
5) inputting sample data of the static gesture feature sequence to train the ELM neural network, and calculating a weight from a hidden layer to an output layer;
6) obtaining the weight from the hidden layer to the output layer, namely completing ELM neural network training;
7) and inputting the data of the static gesture feature sequence into an ELM neural network for recognition.
Preferably, the structure vectors of the upper limbs of the human body in step 1) comprise: the vector from the shoulder center to the left shoulder, the vector from the left shoulder to the left elbow, the vector from the left elbow to the left wrist, the vector from the left wrist to the left hand, the vector from the shoulder center to the right shoulder, the vector from the right shoulder to the right elbow, the vector from the right elbow to the right wrist, and the vector from the right wrist to the right hand.
Preferably, the gesture included-angle information calculated from the structure vectors in step 2) comprises: the angle β1 between the shoulder-center-to-left-shoulder vector and the left-shoulder-to-left-elbow vector; the angle β2 between the left-shoulder-to-left-elbow vector and the left-elbow-to-left-wrist vector; the angle β3 between the left-elbow-to-left-wrist vector and the left-wrist-to-left-hand vector; the angle β4 between the shoulder-center-to-right-shoulder vector and the right-shoulder-to-right-elbow vector; the angle β5 between the right-shoulder-to-right-elbow vector and the right-elbow-to-right-wrist vector; and the angle β6 between the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector.
Preferably, the ELM neural network model in the step 4) is a single hidden layer feedforward neural network.
Preferably, the specific steps of step 4) include:
41) taking sample data of the static gesture feature sequence as the input layer;
42) initializing the network: randomly generating the weight matrix W from the input layer to the hidden layer and the threshold vector b of the hidden layer, and determining the number of hidden-layer nodes l and the excitation function g(x);
43) calculating the hidden-layer response matrix H according to the definition of the hidden layer's response h(x) to the input sample data;
44) calculating the hidden-to-output weight β̂;
45) outputting the hidden-to-output weight β̂.
Preferably, the hidden-to-output weight β̂ in step 44) is calculated as

β̂ = H^+ A

where A is the desired output of the system and H^+ is the generalized inverse of the hidden-layer response matrix H.
Preferably, the angle β4 between the shoulder-center-to-right-shoulder vector and the right-shoulder-to-right-elbow vector, the angle β5 between the right-shoulder-to-right-elbow vector and the right-elbow-to-right-wrist vector, and the angle β6 between the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector are obtained, according to the symmetric structure of the human body, analogously to the angle β1 between the shoulder-center-to-left-shoulder vector and the left-shoulder-to-left-elbow vector, the angle β2 between the left-shoulder-to-left-elbow vector and the left-elbow-to-left-wrist vector, and the angle β3 between the left-elbow-to-left-wrist vector and the left-wrist-to-left-hand vector; the value ranges of all six included angles are [0, π].
Optimally, the number of hidden-layer nodes l is 39.
The invention adopts a learning method based on a feedforward network, the extreme learning machine (ELM), and applies the extreme learning machine to gesture recognition for human-computer interaction. Experimental results show that, compared with a BP neural network, the ELM algorithm has a higher learning speed and a better recognition effect. The method offers stronger operability, better network generalization ability and a higher gesture recognition success rate.
Drawings
FIG. 1 is a schematic diagram of a gesture structure vector according to the present invention.
FIG. 2 is a schematic diagram of a gesture structure vector included angle according to the present invention.
FIG. 3 is a flow chart of the gesture sequence description vector construction in the present invention.
FIG. 4 is a diagram of an ELM neural network mathematical model in the present invention.
FIG. 5 is a flow chart of the ELM algorithm of the present invention.
FIG. 6 is a flow chart of gesture recognition in the present invention.
FIG. 7 is a diagram illustrating a relationship between the number of hidden layer nodes and the gesture recognition rate of the BP algorithm.
FIG. 8 is a diagram illustrating a relationship between the number of hidden layer nodes and the gesture recognition rate of the ELM algorithm.
FIG. 9 is a comparison graph of the ELM algorithm and the BP algorithm.
Detailed Description
In order to make the technical solution and achieved effects of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a dynamic gesture learning and recognition method based on an ELM neural network, which comprises the following steps:
1) collecting the structural vector of the upper limb of the human body.
The static gesture description vector is constructed from two aspects: the selection of joint points and the included angles between joint vectors. Nine joint points closely related to gesture change are selected to describe gesture features: the left hand, left wrist, left elbow, left shoulder, shoulder center, right hand, right wrist, right elbow and right shoulder.
Constructing gesture structure vectors from the structural characteristics of the human body in the skeleton model is the basis for describing hand angle information. A total of 8 groups of structure vectors are constructed from the upper-limb part of the human skeleton model; the construction method is shown in figure 1. Four vectors are formed by the joint points on the left arm: the structure vectors from the shoulder center to the left shoulder, the left shoulder to the left elbow, the left elbow to the left wrist, and the left wrist to the left hand. Correspondingly, the 4 groups of structure vectors formed by the joint points on the right arm are the shoulder-center-to-right-shoulder, right-shoulder-to-right-elbow, right-elbow-to-right-wrist and right-wrist-to-right-hand vectors. The specific corresponding positions are shown in figure 1.
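As a sketch of step 1), each structure vector is simply the difference of two adjacent joint positions. The joint coordinates and names below are illustrative assumptions (the patent does not give numeric examples), but the 9 joints and 8 vector definitions follow the text:

```python
import numpy as np

# Hypothetical 3D joint positions; the 9 joint names follow the description above.
joints = {
    "shoulder_center": np.array([0.0, 1.4, 0.0]),
    "left_shoulder":   np.array([-0.2, 1.4, 0.0]),
    "left_elbow":      np.array([-0.4, 1.2, 0.1]),
    "left_wrist":      np.array([-0.5, 1.0, 0.2]),
    "left_hand":       np.array([-0.55, 0.9, 0.25]),
    "right_shoulder":  np.array([0.2, 1.4, 0.0]),
    "right_elbow":     np.array([0.4, 1.2, 0.1]),
    "right_wrist":     np.array([0.5, 1.0, 0.2]),
    "right_hand":      np.array([0.55, 0.9, 0.25]),
}

# The 8 structure vectors of the upper limbs: (end joint) - (start joint).
VECTOR_DEFS = [
    ("shoulder_center", "left_shoulder"),
    ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_wrist"),
    ("left_wrist", "left_hand"),
    ("shoulder_center", "right_shoulder"),
    ("right_shoulder", "right_elbow"),
    ("right_elbow", "right_wrist"),
    ("right_wrist", "right_hand"),
]

def structure_vectors(joints):
    """Return the 8 upper-limb structure vectors for one skeleton frame."""
    return [joints[end] - joints[start] for start, end in VECTOR_DEFS]

vectors = structure_vectors(joints)
```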
2) Calculating gesture included-angle information from the structure vectors.
Included angles between adjacent structure vectors are selected to represent joint-point angle information, and the static gesture description vector is constructed from the values of these included angles to represent the static gesture features. The invention selects 6 pieces of angle information to construct the static gesture description vector; figure 2 shows the included-angle information between different structure vectors on the left arm of the human body.
As can be seen from figures 1 and 2, β1 is the angle between the shoulder-center-to-left-shoulder vector and the left-shoulder-to-left-elbow vector, and its value reflects the angle change information of the left shoulder node. β2 is the angle between the left-shoulder-to-left-elbow vector and the left-elbow-to-left-wrist vector, and β3 is the angle between the left-elbow-to-left-wrist vector and the left-wrist-to-left-hand vector; their values reflect the angle change information of the left elbow and left wrist nodes respectively. The 3 included angles between the left-arm structure vectors are marked in fig. 2, and the 3 included angles of the right-arm structure vectors follow from the symmetric structure of the human body. They are: the angle β4 between the shoulder-center-to-right-shoulder vector and the right-shoulder-to-right-elbow vector; the angle β5 between the right-shoulder-to-right-elbow vector and the right-elbow-to-right-wrist vector; and the angle β6 between the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector. The value ranges of all 6 included angles are [0, π].
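The included angle between two adjacent structure vectors follows from the standard dot-product formula, which is also what guarantees the [0, π] range stated above. A minimal sketch (the sample vectors are made up for illustration):

```python
import numpy as np

def angle_between(u, v):
    """Included angle in [0, pi] between two structure vectors via the dot product."""
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against rounding pushing cos slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example: shoulder-center-to-left-shoulder vs. left-shoulder-to-left-elbow gives beta1.
u = np.array([-0.2, 0.0, 0.0])
v = np.array([-0.2, -0.2, 0.0])
beta1 = angle_between(u, v)   # 45 degrees for these illustrative vectors
```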
3) Describing the gesture included-angle information as a static gesture feature sequence. The flow of gesture sequence description vector construction is shown in fig. 3.
Let G denote the static gesture description vector, with G = (β1, β2, β3, β4, β5, β6); that is, a static gesture description vector consisting of 6 angle values represents one static gesture feature.
A gesture sequence consists of several frames of static gestures. Let GS denote a gesture sequence with N frames of data; then for the gesture sequence GS we have

GS = (G_1, G_2, ···, G_N)

where G_i is the static gesture description vector corresponding to the i-th frame of data, 1 ≤ i ≤ N. That is, a gesture sequence containing N frames of data represents dynamic gesture features with N static gesture description vectors.
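The assembly of GS from per-frame description vectors can be sketched as follows. The 60-frame count used here is an inference from the 360-dimensional input vectors reported in the experiments (360 = 60 × 6); it is not stated explicitly in the text:

```python
import numpy as np

# A static gesture G is 6 angles in [0, pi]; a sequence GS stacks N frames of G.
# Assumption: N = 60 frames, so that the flattened sequence matches the
# 360-dimensional network input used in the experiments (60 * 6 = 360).
N_FRAMES = 60

rng = np.random.default_rng(1)
frames = rng.uniform(0.0, np.pi, size=(N_FRAMES, 6))  # each row is one G = (b1..b6)

gs = frames.flatten()  # dynamic gesture feature vector fed to the network
```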
4) And establishing an ELM neural network by taking the static gesture feature sequence as an input layer.
The extreme learning machine (ELM) algorithm adopted by the invention is a learning algorithm for single-hidden-layer feedforward neural networks (SLFNs) and belongs to the fast machine learning methods, as shown in figures 4 and 5. In feedforward neural networks the common learning method is back-propagation, i.e. the BP algorithm. However, the BP algorithm requires setting many parameters and easily converges to a local minimum. Furthermore, the choice of initial weights and thresholds for the hidden and output layers affects the stability and generalization ability of the network, and thereby the final recognition effect. The ELM algorithm, by contrast, only requires setting the number of hidden-layer neurons and does not modify weights and thresholds during training, which shortens training time. In addition, the solution obtained by the ELM algorithm is a global optimum, avoiding the BP network's tendency to fall into local optima during solving.
The learning steps of the extreme learning machine are as follows:
(a) randomly generate the input-to-hidden weight matrix W and the hidden-layer threshold vector b, and determine the number of hidden-layer nodes l and the excitation function g(x);
(b) calculate the hidden-layer response matrix H for the input training samples;
(c) calculate and output the hidden-to-output weight β̂ = H^+ A, where A is the desired output and H^+ is the generalized inverse of H.
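The learning steps above can be sketched in NumPy. The sigmoid excitation, the uniform [-1, 1] initialization range, and the toy data shapes are assumptions; only the overall procedure (random W and b, response matrix H, β̂ = H⁺A via the pseudoinverse) is taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, A, l=39):
    """Train a single-hidden-layer ELM.
    X: (N, n) input samples; A: (N, m) desired outputs; l: hidden-node count.
    Returns the randomly drawn (W, b) and the solved output weights beta_hat."""
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n, l))   # input-to-hidden weights, never retrained
    b = rng.uniform(-1.0, 1.0, size=l)        # hidden-layer thresholds
    H = sigmoid(X @ W + b)                    # hidden-layer response matrix
    beta_hat = np.linalg.pinv(H) @ A          # beta_hat = H^+ A (min-norm least squares)
    return W, b, beta_hat

def elm_predict(X, W, b, beta_hat):
    return sigmoid(X @ W + b) @ beta_hat

# Toy run shaped like the paper's setup: 360-dim inputs, 9 one-hot output classes.
X = rng.normal(size=(40, 360))
A = np.eye(9)[rng.integers(0, 9, size=40)]
W, b, beta = elm_train(X, A, l=39)
Y = elm_predict(X, W, b, beta)
```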
5) Inputting sample data of the static gesture feature sequence to train the ELM neural network and calculating the hidden-to-output weights. The training sample set is formed from the included angles: the 6 gesture vector included angles are calculated from the 8 gesture structure vectors, every group of 6 included angles forms one gesture description vector G = (β1, β2, β3, β4, β5, β6), and several groups of G form the sample set GS. GS is input into the extreme learning machine for computation, and finally the hidden-to-output weight β̂ is calculated.
6) Obtaining the hidden-to-output weights completes the ELM neural network training. Training the extreme learning machine neural network is the process of solving the hidden-to-output weight β̂; once β̂ is solved, training is complete.
Assume a sample size of N, with gesture sequence GS = (G_1, G_2, ···, G_N). The number of input-layer nodes is n and the input vectors are x_i = (x_i1, x_i2, ···, x_in)^T (the values fed to the input-layer nodes, i.e. part of the gesture sequences in GS), 1 ≤ i ≤ N, with i and N natural numbers. The hidden-layer excitation function is g(x), the number of hidden neuron nodes is l, and the hidden thresholds are b_j (an offset applied to each datum at the hidden layer, generated randomly by the system), 1 ≤ j ≤ l, with j and l natural numbers. The output-layer excitation function is f(x), generally set to f(x) = x; the number of output-layer nodes is m and the output node thresholds are bo_s (the offset applied to each datum reaching the output layer, generated randomly by the system), 1 ≤ s ≤ m, with s and m natural numbers. The output vectors are y_i = (y_i1, y_i2, ···, y_im) (the actual output) and the desired output vectors are a_i = (a_i1, a_i2, ···, a_im) (the ideal output), 1 ≤ i ≤ N. The weight matrix between the input layer and the hidden layer is W = (w_1, w_2, ··· w_j, ··· w_l) (generated randomly by the extreme learning machine network), where w_j is an n-dimensional column vector holding the weights from all input-layer nodes to the j-th hidden node. The weight matrix between the hidden layer and the output layer is β = (β_1, β_2, ··· β_s, ··· β_m) (determined by the extreme learning machine training phase; finding β completes training), where β_s is an l-dimensional column vector holding the weights from all hidden-layer nodes to the s-th output node.
Then, for an ELM neural network containing only one hidden layer, the output for the i-th sample is:

y_i = f[g(W^T x_i + b)^T β + bo] = g(W^T x_i + b)^T β + bo
    = [g(w_1·x_i + b_1)  g(w_2·x_i + b_2)  ···  g(w_l·x_i + b_l)] β + bo

where the threshold sequence is b = (b_1, b_2, ···, b_j, ···, b_l)^T and the output-node threshold sequence is bo^T = (bo_1, bo_2, ···, bo_s, ···, bo_m). Writing the response of the hidden layer to an input sample vector as h(x_i) = [g(w_1·x_i + b_1)  g(w_2·x_i + b_2)  ···  g(w_l·x_i + b_l)], the output is

y_i = h(x_i) β + bo

For a network with N samples, let the hidden-layer response matrix be H = [h(x_1) h(x_2) ··· h(x_N)]^T and Bo^T = (bo, ···, bo)_{1×N}; the system output can then be expressed as Y = Hβ + Bo.
If A = (a_1, a_2, ···, a_N)^T denotes the desired output of the system, the error function of the system can be expressed as E = ||A − Y|| = ||A − (Hβ + Bo)||.
For a feedforward neural network with n input-layer, m output-layer and l hidden-layer nodes, if the excitation function g(x): R → R is infinitely differentiable on any interval, then for randomly generated weight vectors w_j and thresholds b_j, whenever the hidden-layer response matrix H is invertible the error function satisfies ||A − Hβ|| = 0.
According to this theorem, as long as the excitation function g(x) is infinitely differentiable on any interval, the weights w_j and thresholds b_j may be specified arbitrarily, so the network need not adjust these two parameters during training; and since ||A − Hβ|| = 0, the output-layer thresholds bo_s need no adjustment either. The entire network therefore only needs to determine the output-layer weight matrix β.
In the ideal case the N output vectors y_i of the feedforward neural network equal the corresponding desired output vectors a_i; that is, if a β satisfying A = Hβ exists (i.e. the matrix β in this equation can be found), a neural network with zero output error can be constructed: the error function E is a zero matrix and the output-layer weight matrix is β = H^{-1}A. As long as the number of hidden-layer nodes satisfies l = N, the error function E is a zero matrix, since an invertible N × N matrix H must then exist.
However, in general the number of hidden-layer neurons is smaller than the number of input samples of the neural network, i.e. l < N. The hidden-layer response matrix H is then not square, the inverse H^{-1} in the usual sense does not exist, and we instead solve for the β minimizing the system error function E:

β̂ = argmin_β ||A − Hβ||

A solution β̂ that satisfies this minimization and additionally has minimum norm, i.e. also minimizes ||β||, is the least-norm least-squares solution of A = Hβ, and it is given by

β̂ = H^+ A

where H^+ is the generalized inverse (Moore-Penrose pseudoinverse) of the hidden-layer response matrix H. From this formula the least-norm least-squares solution β̂ can be calculated; this β̂ is exactly the hidden-to-output weight. Once the weight β̂ is calculated, the extreme learning machine training is finished and gesture recognition can be carried out.
7) Inputting static gesture feature sequence data into the ELM neural network for recognition.
The values of the desired output A = (a_1, a_2, ···, a_N)^T are chosen according to the actual situation; that is, this matrix is used to label the classes of the static gesture feature sequence sample data. After the selection, a large amount of sample data is input into the ELM for training. After training, inputting any gesture feature sequence data yields an output value. Comparing the output value against the preset desired-output values in A determines the meaning the gesture represents; if the output value cannot be matched to any pre-established value in A, the recognition fails.
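The comparison of an output value against the preset desired outputs in A can be sketched as a nearest-code match. The distance threshold is an assumption introduced here for illustration; the patent only states that an output with no match in A fails recognition:

```python
import numpy as np

def classify(y, class_codes, threshold=0.5):
    """Match a network output vector against the preset desired outputs.
    Returns the index of the closest class code, or -1 (recognition failure)
    when no code is close enough. The threshold value is an assumption."""
    distances = np.linalg.norm(class_codes - y, axis=1)
    best = int(np.argmin(distances))
    return best if distances[best] < threshold else -1

codes = np.eye(3)  # one-hot desired outputs for 3 hypothetical gesture classes
matched = classify(np.array([0.1, 0.9, 0.05]), codes)   # close to class 1
failed = classify(np.array([0.5, 0.5, 0.5]), codes)     # close to nothing
```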
Test and result analysis
The invention selects 810 vector samples and divides them into two groups, one of training vector samples and one of testing vector samples, each group containing 405 samples with 45 samples per gesture; the gesture types are labeled and the experimental data thereby determined. A neural-network gesture recognition experiment performs learning and training on many samples and then classifies and recognizes the test samples; in this respect the gesture recognition processes based on the BP algorithm and the ELM algorithm are the same, and the process is shown in fig. 6.
(1) BP algorithm experimental design
The invention adopts a BP algorithm with a standard three-layer structure for the experiments on gesture data. A three-layer BP network has many parameters to determine. The number of input-layer nodes i' and output-layer nodes o' are fixed by the dimension of the input sample vectors and the total number of sample classes: i' = 360 and o' = 9, respectively. The weights and thresholds of the BP network take values in [−1, 1], which fixes part of the training parameters. Two further important parameters cannot be determined directly: the number of hidden-layer nodes l' and the learning step size η'. At present no established theory directly determines the hidden-layer node count and learning step size, but the value range of the hidden-layer node count l' can be set by the empirical formula

l' = √(i' + o') + a'

where i', l' and o' are the node counts of the BP input, hidden and output layers respectively, and a' is a constant term ranging from 1 to 8. From this formula the empirical range of the hidden-layer node count is calculated to be 20 to 27.
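A quick check of the empirical formula with the paper's i' = 360 and o' = 9 reproduces the stated range of 20 to 27:

```python
import math

def hidden_node_range(n_in, n_out, a_min=1, a_max=8):
    """Empirical hidden-node range: l' = sqrt(i' + o') + a', with a' in [1, 8]."""
    base = math.sqrt(n_in + n_out)
    return math.floor(base + a_min), math.floor(base + a_max)

lo, hi = hidden_node_range(360, 9)  # the paper's input/output node counts
```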
When a BP neural network is used to train the samples, both the undetermined hidden-layer node count l' and the learning step size η' affect the final gesture recognition result. To obtain the BP algorithm's optimal recognition rate, the variation of the recognition result with the node count and the step size must first be analyzed. To keep the experimental results reasonable, one parameter is fixed while the variation with the other is studied. Since the hidden-layer node range is given by the empirical formula, the learning step size is first fixed at η' = 0.2 and the hidden-layer node count l' is set in turn to each value in the empirical range, giving 8 groups of experimental results on the 405 gesture test samples.
TABLE 1 Influence of the number of hidden-layer nodes on the BP algorithm
According to the values given in table 1, we can study the change rule of the gesture recognition rate of the BP algorithm along with the number of nodes of the hidden layer. FIG. 7 shows the relationship between the number of hidden layer nodes and the gesture recognition rate of the BP algorithm. As can be seen from fig. 7, when the number of hidden layer nodes is within the range of the empirical value, the gesture recognition rate increases with the increase of the number of nodes, and when the number of hidden layer nodes reaches a certain value, the gesture recognition rate may be reduced. This is because increasing the number of hidden layer nodes increases the number of iterations, which causes the neural network to over-fit, resulting in a reduced gesture recognition rate. Therefore, selecting the appropriate number of hidden layer nodes is crucial to the BP neural network model.
As can be seen from Table 1, the gesture recognition rate of the BP algorithm is highest when the number of hidden layer nodes is l' = 25; at that setting the mean square error of the BP algorithm reaches the preset minimum after 436 iterations and network training ends. The initial value range of the learning step η' of the network is [0.1, 0.7]. With the hidden layer node count fixed at l' = 25, the gesture recognition results for different values of η' within this range are shown in Table 2.
TABLE 2 Effect of learning step size η' on BP Algorithm
[Table 2 appears only as an image in the original publication.]
As can be seen from Table 2, the gesture recognition rate of the BP algorithm is highest when the learning step is η' = 0.45, so η' = 0.45 is selected as the optimal value. All parameters of the BP algorithm for the optimal recognition rate are thus determined: with l' = 25 hidden layer nodes and learning step η' = 0.45, the BP algorithm achieves its best gesture recognition result.
(2) ELM algorithm experimental design
When the ELM algorithm is selected to classify and recognize the 405 gesture test samples, the number of input layer nodes is 360 (the dimension of the gesture description vector), the number of output layer nodes is 9 (the number of gesture classes), and the excitation function is a sigmoid function. Because the ELM algorithm does not require the input weights and thresholds to be tuned, once a gesture sample vector is fed to the ELM neural network only the number of hidden layer nodes must be chosen before the samples can be trained and recognized. No empirical formula is available for this node count in gesture recognition, so it is set in turn to 1-50 and the relationship between the recognition result and the node count is studied. According to the gesture recognition results in FIG. 8, when the node count lies in [1, 39] the recognition rate of the ELM algorithm rises, with fluctuation, as nodes are added, reaching its maximum at 39 nodes; when the node count exceeds 39, the recognition rate falls as more nodes are added. The ELM algorithm therefore achieves its optimal gesture recognition rate of 84.2% with 39 hidden layer nodes.
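The ELM configuration described above can be sketched in NumPy: random input-to-hidden weights and biases, a sigmoid activation, and output weights solved in closed form by pseudo-inverse. The data here is synthetic (360-dimensional inputs, 9 classes) as a stand-in for the gesture samples; the sweep over 1-50 hidden nodes mirrors the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_in, n_classes = 600, 200, 360, 9

# Synthetic stand-in data: class-dependent means plus Gaussian noise.
means = rng.normal(size=(n_classes, n_in))
y_train = rng.integers(0, n_classes, n_train)
y_test = rng.integers(0, n_classes, n_test)
X_train = means[y_train] + rng.normal(scale=2.0, size=(n_train, n_in))
X_test = means[y_test] + rng.normal(scale=2.0, size=(n_test, n_in))
T = np.eye(n_classes)[y_train]          # one-hot desired outputs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_accuracy(l):
    W = rng.normal(size=(n_in, l))      # input-to-hidden weights, random
    b = rng.normal(size=l)              # hidden thresholds, random
    H = sigmoid(X_train @ W + b)        # hidden layer response matrix
    beta = np.linalg.pinv(H) @ T        # output weights via pseudo-inverse
    pred = np.argmax(sigmoid(X_test @ W + b) @ beta, axis=1)
    return float(np.mean(pred == y_test))

best_l = max(range(1, 51), key=elm_accuracy)
print("best hidden-node count:", best_l)
```

Because W and b are drawn randomly, the accuracy at a given node count fluctuates from run to run, which is exactly why the experiment below averages over 20 repetitions.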
With the two designed sets of experiments, the invention studies how the gesture recognition results of the BP and ELM algorithms vary with their respective parameters and determines the parameters at which each algorithm reaches its highest recognition rate. In the ELM algorithm, because the input-to-hidden weight matrix and the hidden node thresholds are randomly generated, the final gesture recognition result fluctuates. To make the results more reliable, each ELM experiment is repeated 20 times after the hidden layer node count is set; the mean of the 20 recognition results is taken as the ELM gesture recognition rate, and the mean of the 20 recorded training durations as the ELM training time. The experiments show that with 39 hidden layer nodes the average gesture recognition rate of the ELM algorithm is 83.3% and the average training time is 0.03 second. The BP experiments show that the BP algorithm reaches its highest gesture recognition rate with l' = 25 hidden layer nodes and learning step η' = 0.45, with a training time of 10.8 seconds.
Table 3 compares the best recognition results of the BP and ELM algorithms. As can be seen from Table 3, both the gesture recognition effect and the training duration of the ELM algorithm are better than those of the BP algorithm. The gesture recognition results and training durations of the two algorithms are compared in FIG. 9.
TABLE 3 optimal identification result comparison of BP algorithm and ELM algorithm
[Table 3 appears only as an image in the original publication.]
From the above experimental results, the following conclusions can be drawn:
(1) As can be seen from FIG. 7, within a certain range the gesture recognition rate of the BP algorithm rises as the number of hidden layer nodes increases, then levels off after a certain value. When the node count becomes too large, however, the recognition rate falls as more nodes are added, because too many hidden layer nodes cause over-training and thus reduce the recognition rate. As can be seen from Table 1, increasing the number of hidden layer nodes also tends to increase the number of iterations of the network. Determining the number of hidden layer neuron nodes is therefore crucial for the BP network model.
(2) As can be seen from FIG. 8, the gesture recognition rate of the ELM algorithm increases with the number of hidden layer nodes within a certain range; once the count passes a certain value, the recognition rate falls as more nodes are added. The number of hidden layer nodes is thus an important factor in the ELM algorithm's gesture recognition effect, and determining it is a key point of ELM-based gesture recognition.
(3) Comparing the experimental results of the BP and ELM algorithms in FIG. 9 shows that the gesture recognition effect of the ELM algorithm is superior to that of the BP neural network, and the training duration of the ELM algorithm is significantly shorter. The ELM algorithm seeks a global optimal solution; the input samples can be trained once the hidden layer node count is determined, with no other parameters to set, which shortens network training time. The BP algorithm, by contrast, requires many parameters to be set (the layer weights, thresholds, hidden layer node count, learning rate, and so on all affect network performance) and easily falls into a local optimal solution, giving the network poor generalization ability. Compared with the BP neural network, the ELM algorithm is easier to operate, generalizes better, and recognizes gestures with a higher success rate.
Matters not described in detail in this specification are within the common knowledge of those skilled in the art.

Claims (5)

1. A dynamic gesture learning and recognition method based on an ELM neural network is characterized in that: the method comprises the following steps:
1) collecting structural vectors of the upper limbs of the human body: firstly, a static gesture description vector is constructed from two aspects, the selection of joint points and the angles between the joint vectors; 9 joint points closely related to gesture change are selected to describe the features of gestures, namely the left hand, left wrist, left elbow, left shoulder, shoulder center, right hand, right wrist, right elbow, and right shoulder; on the basis of the structural characteristics of the human body in the skeleton model and the gesture angle information, 8 groups of structural vectors are constructed from the upper-limb part of the human skeleton model; the structural vectors of the upper limbs comprise the shoulder-center-to-left-shoulder vector, the left-shoulder-to-left-elbow vector, the left-elbow-to-left-wrist vector, the left-wrist-to-left-hand vector, the shoulder-center-to-right-shoulder vector, the right-shoulder-to-right-elbow vector, the right-elbow-to-right-wrist vector, and the right-wrist-to-right-hand vector (the vector symbols appear as images in the original publication);
2) calculating gesture included-angle information from the structural vectors, comprising: the included angle β1 between the shoulder-center-to-left-shoulder vector and the left-shoulder-to-left-elbow vector; the included angle β2 between the left-shoulder-to-left-elbow vector and the left-elbow-to-left-wrist vector; the included angle β3 between the left-elbow-to-left-wrist vector and the left-wrist-to-left-hand vector; the included angle β4 between the shoulder-center-to-right-shoulder vector and the right-shoulder-to-right-elbow vector; the included angle β5 between the right-shoulder-to-right-elbow vector and the right-elbow-to-right-wrist vector; and the included angle β6 between the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector; owing to the symmetrical structure of the human body, each of the six included angles takes values in the range [0, π];
3) Describing the gesture included angle information as a static gesture feature sequence;
4) establishing an ELM neural network by taking the static gesture feature sequence as an input layer;
5) inputting sample data of the static gesture feature sequence to train the ELM neural network, and calculating a weight from a hidden layer to an output layer;
6) obtaining the weight from the hidden layer to the output layer, namely completing ELM neural network training;
7) inputting the data of the static gesture feature sequence into the ELM neural network for recognition.
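The angle computation of step 2) can be sketched as the arccos of normalized dot products between consecutive limb vectors. The joint coordinates below are invented purely for illustration and are not part of the claimed method; only the left-arm chain (β1-β3) is shown for brevity.

```python
import numpy as np

def angle(u, v):
    """Included angle between vectors u and v, guaranteed to lie in [0, pi]."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))  # clip guards rounding

# Hypothetical 3D joint positions (x, y, z) for the left-arm chain.
joints = {
    "shoulder_center": np.array([0.0, 1.5, 0.0]),
    "left_shoulder":   np.array([-0.2, 1.5, 0.0]),
    "left_elbow":      np.array([-0.3, 1.2, 0.1]),
    "left_wrist":      np.array([-0.3, 0.9, 0.3]),
    "left_hand":       np.array([-0.3, 0.8, 0.4]),
}

chain = ["shoulder_center", "left_shoulder", "left_elbow",
         "left_wrist", "left_hand"]
vecs = [joints[b] - joints[a] for a, b in zip(chain, chain[1:])]
betas = [angle(u, v) for u, v in zip(vecs, vecs[1:])]   # beta1, beta2, beta3
print([round(b, 3) for b in betas])
```

The right-arm angles β4-β6 follow the same pattern over the mirrored joint chain, which is why all six angles share the [0, π] range.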
2. The ELM neural network-based dynamic gesture learning and recognition method of claim 1, wherein: the ELM neural network model in the step 4) is a single hidden layer feedforward neural network.
3. The ELM neural network-based dynamic gesture learning and recognition method of claim 2, wherein: the specific steps of the step 4) comprise:
41) taking sample data of the static gesture feature sequence as an input layer;
42) initializing a network, randomly generating a weight matrix W from an input layer to a hidden layer and a threshold vector b of the hidden layer, and determining the number l of nodes of the hidden layer and an excitation function g (x);
43) calculating a hidden layer response matrix H according to the definition of the response H (x) of the hidden layer to the input sample data;
44) calculating the hidden-layer-to-output-layer weight β;
45) outputting the hidden-layer-to-output-layer weight β.
4. The ELM neural network-based dynamic gesture learning and recognition method of claim 3, wherein: the hidden-layer-to-output-layer weight β in step 44) is calculated as β = H⁺A, where A is the desired output of the system and H⁺ is the generalized inverse of the hidden layer response matrix H.
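The weight formula of claim 4, β = H⁺A, can be checked numerically with NumPy's Moore-Penrose pseudo-inverse. H and A below are small made-up matrices; in the method itself, H is the hidden layer's response to the training samples and A the desired one-hot output.

```python
import numpy as np

H = np.array([[0.2, 0.7, 0.1],
              [0.9, 0.4, 0.3],
              [0.5, 0.8, 0.6],
              [0.1, 0.2, 0.9]])        # 4 samples x 3 hidden nodes
A = np.eye(2)[[0, 1, 0, 1]]            # desired one-hot outputs, 4 x 2

beta = np.linalg.pinv(H) @ A           # least-squares solution of H beta = A
print(beta.shape)                      # prints (3, 2)
```

Because H generally has more rows (samples) than columns (hidden nodes), β is the least-squares fit rather than an exact solution, which is what makes ELM training a single closed-form step.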
5. The ELM neural network-based dynamic gesture learning and recognition method of claim 4, wherein: the number of hidden layer nodes l is 39.
CN201710160089.1A 2017-03-17 2017-03-17 Dynamic gesture learning and recognition method based on ELM neural network Expired - Fee Related CN107102727B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710160089.1A CN107102727B (en) 2017-03-17 2017-03-17 Dynamic gesture learning and recognition method based on ELM neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710160089.1A CN107102727B (en) 2017-03-17 2017-03-17 Dynamic gesture learning and recognition method based on ELM neural network

Publications (2)

Publication Number Publication Date
CN107102727A CN107102727A (en) 2017-08-29
CN107102727B true CN107102727B (en) 2020-04-07

Family

ID=59675073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710160089.1A Expired - Fee Related CN107102727B (en) 2017-03-17 2017-03-17 Dynamic gesture learning and recognition method based on ELM neural network

Country Status (1)

Country Link
CN (1) CN107102727B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110325965B (en) * 2018-01-25 2021-01-01 腾讯科技(深圳)有限公司 Object processing method, device and storage medium in virtual scene
CN108509839A (en) * 2018-02-02 2018-09-07 东华大学 One kind being based on the efficient gestures detection recognition methods of region convolutional neural networks
CN108647292A (en) * 2018-05-07 2018-10-12 前海梧桐(深圳)数据有限公司 Enterprise's property sort computational methods based on neural network algorithm and system
CN108960171B (en) * 2018-07-12 2021-03-02 安徽工业大学 Method for converting gesture recognition into identity recognition based on feature transfer learning
CN109271947A (en) * 2018-09-28 2019-01-25 合肥工业大学 A kind of night real-time hand language identifying system based on thermal imaging
CN110443167B (en) * 2019-07-23 2022-05-17 中国建设银行股份有限公司 Intelligent recognition method and intelligent interaction method for traditional culture gestures and related devices
CN110390303B (en) * 2019-07-24 2022-04-08 达闼机器人有限公司 Tumble alarm method, electronic device, and computer-readable storage medium
CN110674747A (en) * 2019-09-24 2020-01-10 上海眼控科技股份有限公司 Behavior judging method and device, computer equipment and readable storage medium
CN111796519B (en) * 2020-06-14 2022-05-06 武汉理工大学 Automatic control method of multi-input multi-output system based on extreme learning machine
WO2022007879A1 (en) 2020-07-09 2022-01-13 北京灵汐科技有限公司 Weight precision configuration method and apparatus, computer device, and storage medium
CN111831356B (en) * 2020-07-09 2023-04-07 北京灵汐科技有限公司 Weight precision configuration method, device, equipment and storage medium
CN114777771B (en) * 2022-04-13 2024-08-20 西安电子科技大学 Outdoor unmanned vehicle combined navigation positioning method
CN114997295B (en) * 2022-05-25 2024-09-10 吉林大学 LR-ELM-based lower limb artificial limb movement identification method

Citations (2)

Publication number Priority date Publication date Assignee Title
CN105005769A (en) * 2015-07-08 2015-10-28 山东大学 Deep information based sign language recognition method
CN105807926A (en) * 2016-03-08 2016-07-27 中山大学 Unmanned aerial vehicle man-machine interaction method based on three-dimensional continuous gesture recognition

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN105005769A (en) * 2015-07-08 2015-10-28 山东大学 Deep information based sign language recognition method
CN105807926A (en) * 2016-03-08 2016-07-27 中山大学 Unmanned aerial vehicle man-machine interaction method based on three-dimensional continuous gesture recognition

Non-Patent Citations (3)

Title
"Constructive, Robust and Adaptive OS-ELM in Human Action Recognition"; Arif Budiman et al.; IAICT 2014; 2014-08-30; full text *
"Research on Gesture Recognition Based on Neural Networks"; Feng Tong; China Excellent Master's Theses Full-text Database; 2015-07-15; pp. 50-59 *
"Research on Human Behavior Recognition Methods Using Kinect and System Design"; Han Xu; China Excellent Master's Theses Full-text Database; 2013-10-15; pp. 21-30 *

Also Published As

Publication number Publication date
CN107102727A (en) 2017-08-29

Similar Documents

Publication Publication Date Title
CN107102727B (en) Dynamic gesture learning and recognition method based on ELM neural network
Pranav et al. Facial emotion recognition using deep convolutional neural network
Chen et al. A novel ensemble ELM for human activity recognition using smartphone sensors
Song et al. An efficient initialization approach of Q-learning for mobile robots
Zuo et al. Deterministic generative adversarial imitation learning
Lu et al. A hybrid wavelet neural network and switching particle swarm optimization algorithm for face direction recognition
Su et al. HDL: Hierarchical deep learning model based human activity recognition using smartphone sensors
Zeng et al. CNN model design of gesture recognition based on tensorflow framework
Guo et al. A deep reinforcement learning method for multimodal data fusion in action recognition
CN110009108A (en) A kind of completely new quantum transfinites learning machine
CN104408470A (en) Gender detection method based on average face preliminary learning
Wan Deep learning: Neural network, optimizing method and libraries review
Hu et al. An optimization strategy for weighted extreme learning machine based on PSO
Zhai et al. Facial beauty prediction via local feature fusion and broad learning system
Tan et al. Two-phase switching optimization strategy in deep neural networks
Patel et al. Quantum inspired binary neural network algorithm
Soltani et al. Newman-Watts-Strogatz topology in deep echo state networks for speech emotion recognition
Yang et al. AM-SGCN: Tactile object recognition for adaptive multichannel spiking graph convolutional neural networks
Petluru et al. Transfer Learning-based Facial Expression Recognition with modified ResNet50
Guo et al. Exploiting LSTM-RNNs and 3D skeleton features for hand gesture recognition
Aleotti et al. Arm gesture recognition and humanoid imitation using functional principal component analysis
Li et al. Multimodal information-based broad and deep learning model for emotion understanding
Tonchev et al. Human Skeleton Motion Prediction Using Graph Convolution Optimized GRU Network
CN114863548A (en) Emotion recognition method and device based on human motion posture nonlinear spatial features
Kasabov et al. Incremental learning in autonomous systems: evolving connectionist systems for on-line image and speech recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200407