CN107102727B - Dynamic gesture learning and recognition method based on ELM neural network

Publication number: CN107102727B (granted; published as application CN107102727A)
Application number: CN201710160089.1A
Authority: CN (China)
Inventors: 郭志强, 李博闻, 黄晶
Assignee: Wuhan University of Technology (WUT)
Legal status: Expired - Fee Related
Classifications

    • G06F 3/017: Gesture-based interaction, e.g. based on a set of recognized hand gestures (electric digital data processing; input arrangements for interaction between user and computer)
    • G06N 3/084: Backpropagation, e.g. using gradient descent (computing arrangements based on biological models; neural networks; learning methods)
    • G06V 40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language (image or video recognition; recognition of movements or behaviour)
Abstract

The invention discloses a dynamic gesture learning and recognition method based on an ELM neural network, which comprises the following steps: 1) collecting structure vectors of the upper limbs of a human body; 2) calculating gesture included-angle information from the structure vectors; 3) describing the gesture included-angle information as a static gesture feature sequence; 4) establishing an ELM neural network with the static gesture feature sequence as the input layer; 5) inputting sample data of the static gesture feature sequence to train the ELM neural network and calculating the weights from the hidden layer to the output layer; 6) obtaining the hidden-to-output weights, which completes ELM neural network training; 7) inputting static gesture feature sequence data into the ELM neural network for recognition. The invention adopts a learning method based on a feedforward network, the extreme learning machine, and applies it to gesture recognition for human-computer interaction; compared with a BP neural network, the ELM algorithm has a higher learning speed and a better recognition effect. The method offers stronger operability, better network generalization ability and a higher gesture recognition success rate.

Description

Dynamic gesture learning and recognition method based on ELM neural network
Technical Field
The invention relates to the technical field of human-computer interaction, in particular to a dynamic gesture learning and recognition method based on an ELM neural network.
Background
As a simple and direct human-computer interaction mode, gesture-based interaction finds application in remote control, home care, motion-sensing games, smart homes, daily teaching and other fields, making gestures an important research object in the human-computer interaction field. Computer-vision-based gesture interaction is a principal way of enabling natural, contactless communication between humans and machines, so rapid progress in recognition technology is key to the flourishing of the field. At present, gesture-based human-computer interaction technology cannot fully meet users' needs, and the interaction market urgently requires better techniques to improve existing interaction modes; research on vision-based gesture recognition methods is therefore of great significance.
Gesture recognition research belongs to pattern recognition, which in turn extends into artificial intelligence. The two key components of a gesture recognition system are feature extraction and pattern classification, and the performance of the pattern classifier directly affects that of the whole recognition system. In other words, the quality of the gesture recognition algorithm directly determines the final classification and recognition effect. By algorithmic characteristics, gesture recognition methods fall into template matching methods and state-space methods. Template matching compares the extracted gesture features one by one against reference template features and classifies and matches gestures by a given similarity algorithm; its main representatives are the dynamic time warping algorithm and the optical flow method. State-space methods work on a different principle: each static gesture is taken as a node in a space, a moving gesture sequence is represented as a traversal among different nodes, and the main representatives are hidden Markov models, dynamic Bayesian networks, neural networks and the like.
At present, gesture recognition usually adopts the BP (back-propagation) neural network algorithm, which has drawbacks: within a certain range the gesture recognition rate rises as the number of hidden-layer nodes increases, then gradually stabilizes after reaching a certain value; but when the number of hidden-layer nodes becomes too large, the recognition rate can fall as the node count increases. Moreover, the BP algorithm requires setting many parameters; network performance is affected by changes in the per-layer weights, thresholds, number of hidden-layer nodes, learning rate and so on; and the algorithm easily falls into a local optimum, giving the network poor generalization ability.
Disclosure of Invention
Aiming at the defects of the prior art, the dynamic gesture learning and recognition method based on the ELM neural network provided by the invention solves the prior art's problems of slow learning and low recognition rates for dynamic gestures.
In order to achieve the above object, the present invention provides a dynamic gesture learning and recognition method based on an ELM neural network, which comprises the following steps:
1) collecting structural vectors of upper limbs of a human body;
2) calculating gesture included-angle information from the structure vectors;
3) describing the gesture included angle information as a static gesture feature sequence;
4) establishing an ELM neural network by taking the static gesture feature sequence as an input layer;
5) inputting sample data of the static gesture feature sequence to train the ELM neural network, and calculating a weight from a hidden layer to an output layer;
6) obtaining the weight from the hidden layer to the output layer, namely completing ELM neural network training;
7) and inputting the data of the static gesture feature sequence into an ELM neural network for recognition.
Preferably, the structure vectors of the upper limbs of the human body in step 1) comprise: the vector from the shoulder center to the left shoulder, the vector from the left shoulder to the left elbow, the vector from the left elbow to the left wrist, the vector from the left wrist to the left hand, the vector from the shoulder center to the right shoulder, the vector from the right shoulder to the right elbow, the vector from the right elbow to the right wrist, and the vector from the right wrist to the right hand.
Preferably, the gesture included-angle information calculated from the structure vectors in step 2) comprises: the angle β1 between the shoulder-center-to-left-shoulder vector and the left-shoulder-to-left-elbow vector; the angle β2 between the left-shoulder-to-left-elbow vector and the left-elbow-to-left-wrist vector; the angle β3 between the left-elbow-to-left-wrist vector and the left-wrist-to-left-hand vector; the angle β4 between the shoulder-center-to-right-shoulder vector and the right-shoulder-to-right-elbow vector; the angle β5 between the right-shoulder-to-right-elbow vector and the right-elbow-to-right-wrist vector; and the angle β6 between the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector.
Preferably, the ELM neural network model in the step 4) is a single hidden layer feedforward neural network.
Preferably, the specific steps of step 4) include:
41) taking sample data of the static gesture feature sequence as the input layer;
42) initializing the network: randomly generating the weight matrix W from the input layer to the hidden layer and the threshold vector b of the hidden layer, and determining the number of hidden-layer nodes l and the excitation function g(x);
43) calculating the hidden-layer response matrix H according to the definition of the hidden layer's response h(x) to the input sample data;
44) calculating the hidden-to-output weight β̂;
45) outputting the hidden-to-output weight β̂.
Preferably, the hidden-to-output weight β̂ in step 44) is calculated as

β̂ = H^+ A

where A is the desired output of the system and H^+ is the generalized inverse of the hidden-layer response matrix H.
Preferably, the angle β4 between the shoulder-center-to-right-shoulder vector and the right-shoulder-to-right-elbow vector, the angle β5 between the right-shoulder-to-right-elbow vector and the right-elbow-to-right-wrist vector, and the angle β6 between the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector are obtained, according to the symmetric structure of the human body, analogously to the angle β1 between the shoulder-center-to-left-shoulder vector and the left-shoulder-to-left-elbow vector, the angle β2 between the left-shoulder-to-left-elbow vector and the left-elbow-to-left-wrist vector, and the angle β3 between the left-elbow-to-left-wrist vector and the left-wrist-to-left-hand vector; the value ranges of all six included angles are [0, π].
Optimally, the number of hidden-layer nodes l is 39.
The invention adopts a learning method based on a feedforward network, the extreme learning machine (ELM), and applies the extreme learning machine to gesture recognition for human-computer interaction. Experimental results show that, compared with a BP neural network, the ELM algorithm has a higher learning speed and a better recognition effect. The method offers stronger operability, better network generalization ability and a higher gesture recognition success rate.
Drawings
FIG. 1 is a schematic diagram of a gesture structure vector according to the present invention.
FIG. 2 is a schematic diagram of a gesture structure vector included angle according to the present invention.
FIG. 3 is a flow chart of the gesture sequence description vector construction in the present invention.
FIG. 4 is a diagram of an ELM neural network mathematical model in the present invention.
FIG. 5 is a flow chart of the ELM algorithm of the present invention.
FIG. 6 is a flow chart of gesture recognition in the present invention.
FIG. 7 is a diagram illustrating a relationship between the number of hidden layer nodes and the gesture recognition rate of the BP algorithm.
FIG. 8 is a diagram illustrating a relationship between the number of hidden layer nodes and the gesture recognition rate of the ELM algorithm.
FIG. 9 is a comparison graph of the ELM algorithm and the BP algorithm.
Detailed Description
In order to make the technical solution and achieved effects of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a dynamic gesture learning and recognition method based on an ELM neural network, which comprises the following steps:
1) collecting the structural vector of the upper limb of the human body.
The static gesture description vector is constructed from two aspects: the selection of joint points and the included angles between joint vectors. Nine joint points closely related to gesture change are selected to describe gesture features: the left hand, left wrist, left elbow, left shoulder, shoulder center, right hand, right wrist, right elbow and right shoulder.
Constructing gesture structure vectors from the structural characteristics of the human body in the skeleton model is the basis for describing hand angle information. A total of 8 groups of structure vectors are constructed from the upper-limb part of the human skeleton model; the construction method is shown in figure 1. Four vectors are formed by the joint points on the left arm: the structure vectors from the shoulder center to the left shoulder, the left shoulder to the left elbow, the left elbow to the left wrist, and the left wrist to the left hand. Correspondingly, the 4 groups of structure vectors formed by the joint points on the right arm are the shoulder-center-to-right-shoulder, right-shoulder-to-right-elbow, right-elbow-to-right-wrist and right-wrist-to-right-hand vectors. The specific corresponding positions are shown in figure 1.
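As a sketch of step 1), each structure vector is simply the difference of two adjacent joint positions. The joint coordinates and names below are illustrative assumptions (the patent does not give numeric examples), but the 9 joints and 8 vector definitions follow the text:

```python
import numpy as np

# Hypothetical 3D joint positions; the 9 joint names follow the description above.
joints = {
    "shoulder_center": np.array([0.0, 1.4, 0.0]),
    "left_shoulder":   np.array([-0.2, 1.4, 0.0]),
    "left_elbow":      np.array([-0.4, 1.2, 0.1]),
    "left_wrist":      np.array([-0.5, 1.0, 0.2]),
    "left_hand":       np.array([-0.55, 0.9, 0.25]),
    "right_shoulder":  np.array([0.2, 1.4, 0.0]),
    "right_elbow":     np.array([0.4, 1.2, 0.1]),
    "right_wrist":     np.array([0.5, 1.0, 0.2]),
    "right_hand":      np.array([0.55, 0.9, 0.25]),
}

# The 8 structure vectors of the upper limbs: (end joint) - (start joint).
VECTOR_DEFS = [
    ("shoulder_center", "left_shoulder"),
    ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_wrist"),
    ("left_wrist", "left_hand"),
    ("shoulder_center", "right_shoulder"),
    ("right_shoulder", "right_elbow"),
    ("right_elbow", "right_wrist"),
    ("right_wrist", "right_hand"),
]

def structure_vectors(joints):
    """Return the 8 upper-limb structure vectors for one skeleton frame."""
    return [joints[end] - joints[start] for start, end in VECTOR_DEFS]

vectors = structure_vectors(joints)
```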
2) Calculating gesture included-angle information from the structure vectors.
Included angles between adjacent structure vectors are selected to represent joint-point angle information, and the static gesture description vector is constructed from the values of these included angles to represent the static gesture features. The invention selects 6 pieces of angle information to construct the static gesture description vector; figure 2 shows the included-angle information between different structure vectors on the left arm of the human body.
As can be seen from figures 1 and 2, β1 is the angle between the shoulder-center-to-left-shoulder vector and the left-shoulder-to-left-elbow vector, and its value reflects the angle change information of the left shoulder node. β2 is the angle between the left-shoulder-to-left-elbow vector and the left-elbow-to-left-wrist vector, and β3 is the angle between the left-elbow-to-left-wrist vector and the left-wrist-to-left-hand vector; their values reflect the angle change information of the left elbow and left wrist nodes respectively. The 3 included angles between the left-arm structure vectors are marked in fig. 2, and the 3 included angles of the right-arm structure vectors follow from the symmetric structure of the human body. They are: the angle β4 between the shoulder-center-to-right-shoulder vector and the right-shoulder-to-right-elbow vector; the angle β5 between the right-shoulder-to-right-elbow vector and the right-elbow-to-right-wrist vector; and the angle β6 between the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector. The value ranges of all 6 included angles are [0, π].
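The included angle between two adjacent structure vectors follows from the standard dot-product formula, which is also what guarantees the [0, π] range stated above. A minimal sketch (the sample vectors are made up for illustration):

```python
import numpy as np

def angle_between(u, v):
    """Included angle in [0, pi] between two structure vectors via the dot product."""
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against rounding pushing cos slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example: shoulder-center-to-left-shoulder vs. left-shoulder-to-left-elbow gives beta1.
u = np.array([-0.2, 0.0, 0.0])
v = np.array([-0.2, -0.2, 0.0])
beta1 = angle_between(u, v)   # 45 degrees for these illustrative vectors
```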
3) Describing the gesture included-angle information as a static gesture feature sequence. The flow of gesture sequence description vector construction is shown in fig. 3.
Let G denote the static gesture description vector, with G = (β1, β2, β3, β4, β5, β6); that is, a static gesture description vector consisting of 6 angle values represents one static gesture feature.
A gesture sequence consists of several frames of static gestures. Let GS denote a gesture sequence with N frames of data; then for the gesture sequence GS we have

GS = (G_1, G_2, ···, G_N)

where G_i is the static gesture description vector corresponding to the i-th frame of data, 1 ≤ i ≤ N. That is, a gesture sequence containing N frames of data represents dynamic gesture features with N static gesture description vectors.
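The assembly of GS from per-frame description vectors can be sketched as follows. The 60-frame count used here is an inference from the 360-dimensional input vectors reported in the experiments (360 = 60 × 6); it is not stated explicitly in the text:

```python
import numpy as np

# A static gesture G is 6 angles in [0, pi]; a sequence GS stacks N frames of G.
# Assumption: N = 60 frames, so that the flattened sequence matches the
# 360-dimensional network input used in the experiments (60 * 6 = 360).
N_FRAMES = 60

rng = np.random.default_rng(1)
frames = rng.uniform(0.0, np.pi, size=(N_FRAMES, 6))  # each row is one G = (b1..b6)

gs = frames.flatten()  # dynamic gesture feature vector fed to the network
```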
4) And establishing an ELM neural network by taking the static gesture feature sequence as an input layer.
The extreme learning machine (ELM) algorithm adopted by the invention is a learning algorithm for single-hidden-layer feedforward neural networks (SLFNs) and belongs to the fast machine learning methods, as shown in figures 4 and 5. In feedforward neural networks the common learning method is back-propagation, i.e. the BP algorithm. However, the BP algorithm requires setting many parameters and easily converges to a local minimum. Furthermore, the choice of initial weights and thresholds for the hidden and output layers affects the stability and generalization ability of the network, and thereby the final recognition effect. The ELM algorithm, by contrast, only requires setting the number of hidden-layer neurons and does not modify weights and thresholds during training, which shortens training time. In addition, the solution obtained by the ELM algorithm is a global optimum, avoiding the BP network's tendency to fall into local optima during solving.
The learning steps of the extreme learning machine are as follows:
(a) randomly generate the input-to-hidden weight matrix W and the hidden-layer threshold vector b, and determine the number of hidden-layer nodes l and the excitation function g(x);
(b) calculate the hidden-layer response matrix H for the input training samples;
(c) calculate and output the hidden-to-output weight β̂ = H^+ A, where A is the desired output and H^+ is the generalized inverse of H.
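The learning steps above can be sketched in NumPy. The sigmoid excitation, the uniform [-1, 1] initialization range, and the toy data shapes are assumptions; only the overall procedure (random W and b, response matrix H, β̂ = H⁺A via the pseudoinverse) is taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, A, l=39):
    """Train a single-hidden-layer ELM.
    X: (N, n) input samples; A: (N, m) desired outputs; l: hidden-node count.
    Returns the randomly drawn (W, b) and the solved output weights beta_hat."""
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n, l))   # input-to-hidden weights, never retrained
    b = rng.uniform(-1.0, 1.0, size=l)        # hidden-layer thresholds
    H = sigmoid(X @ W + b)                    # hidden-layer response matrix
    beta_hat = np.linalg.pinv(H) @ A          # beta_hat = H^+ A (min-norm least squares)
    return W, b, beta_hat

def elm_predict(X, W, b, beta_hat):
    return sigmoid(X @ W + b) @ beta_hat

# Toy run shaped like the paper's setup: 360-dim inputs, 9 one-hot output classes.
X = rng.normal(size=(40, 360))
A = np.eye(9)[rng.integers(0, 9, size=40)]
W, b, beta = elm_train(X, A, l=39)
Y = elm_predict(X, W, b, beta)
```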
5) Inputting sample data of the static gesture feature sequence to train the ELM neural network and calculating the hidden-to-output weights. The training sample set is formed from the included angles: the 6 gesture vector included angles are calculated from the 8 gesture structure vectors, every group of 6 included angles forms one gesture description vector G = (β1, β2, β3, β4, β5, β6), and several groups of G form the sample set GS. GS is input into the extreme learning machine for computation, and finally the hidden-to-output weight β̂ is calculated.
6) Obtaining the hidden-to-output weights completes the ELM neural network training. Training the extreme learning machine neural network is the process of solving the hidden-to-output weight β̂; once β̂ is solved, training is complete.
Assume a sample size of N, with gesture sequence GS = (G_1, G_2, ···, G_N). The number of input-layer nodes is n and the input vectors are x_i = (x_i1, x_i2, ···, x_in)^T (the values fed to the input-layer nodes, i.e. part of the gesture sequences in GS), 1 ≤ i ≤ N, with i and N natural numbers. The hidden-layer excitation function is g(x), the number of hidden neuron nodes is l, and the hidden thresholds are b_j (an offset applied to each datum at the hidden layer, generated randomly by the system), 1 ≤ j ≤ l, with j and l natural numbers. The output-layer excitation function is f(x), generally set to f(x) = x; the number of output-layer nodes is m and the output node thresholds are bo_s (the offset applied to each datum reaching the output layer, generated randomly by the system), 1 ≤ s ≤ m, with s and m natural numbers. The output vectors are y_i = (y_i1, y_i2, ···, y_im) (the actual output) and the desired output vectors are a_i = (a_i1, a_i2, ···, a_im) (the ideal output), 1 ≤ i ≤ N. The weight matrix between the input layer and the hidden layer is W = (w_1, w_2, ··· w_j, ··· w_l) (generated randomly by the extreme learning machine network), where w_j is an n-dimensional column vector holding the weights from all input-layer nodes to the j-th hidden node. The weight matrix between the hidden layer and the output layer is β = (β_1, β_2, ··· β_s, ··· β_m) (determined by the extreme learning machine training phase; finding β completes training), where β_s is an l-dimensional column vector holding the weights from all hidden-layer nodes to the s-th output node.
Then, for an ELM neural network containing only one hidden layer, the output for the i-th sample is:

y_i = f[g(W^T x_i + b)^T β + bo] = g(W^T x_i + b)^T β + bo
    = [g(w_1·x_i + b_1)  g(w_2·x_i + b_2)  ···  g(w_l·x_i + b_l)] β + bo

where the threshold sequence is b = (b_1, b_2, ···, b_j, ···, b_l)^T and the output-node threshold sequence is bo^T = (bo_1, bo_2, ···, bo_s, ···, bo_m). Writing the response of the hidden layer to an input sample vector as h(x_i) = [g(w_1·x_i + b_1)  g(w_2·x_i + b_2)  ···  g(w_l·x_i + b_l)], the output is

y_i = h(x_i) β + bo

For a network with N samples, let the hidden-layer response matrix be H = [h(x_1) h(x_2) ··· h(x_N)]^T and Bo^T = (bo, ···, bo)_{1×N}; the system output can then be expressed as Y = Hβ + Bo.
If A = (a_1, a_2, ···, a_N)^T denotes the desired output of the system, the error function of the system can be expressed as E = ||A − Y|| = ||A − (Hβ + Bo)||.
For a feedforward neural network with n input-layer, m output-layer and l hidden-layer nodes, if the excitation function g(x): R → R is infinitely differentiable on any interval, then for randomly generated weight vectors w_j and thresholds b_j, whenever the hidden-layer response matrix H is invertible the error function satisfies ||A − Hβ|| = 0.
According to this theorem, as long as the excitation function g(x) is infinitely differentiable on any interval, the weights w_j and thresholds b_j may be specified arbitrarily, so the network need not adjust these two parameters during training; and since ||A − Hβ|| = 0, the output-layer thresholds bo_s need no adjustment either. The entire network therefore only needs to determine the output-layer weight matrix β.
In the ideal case the N output vectors y_i of the feedforward neural network equal the corresponding desired output vectors a_i; that is, if a β satisfying A = Hβ exists (i.e. the matrix β in this equation can be found), a neural network with zero output error can be constructed: the error function E is a zero matrix and the output-layer weight matrix is β = H^{-1}A. As long as the number of hidden-layer nodes satisfies l = N, the error function E is a zero matrix, since an invertible N × N matrix H must then exist.
However, in general the number of hidden-layer neurons is smaller than the number of input samples of the neural network, i.e. l < N. The hidden-layer response matrix H is then not square, the inverse H^{-1} in the usual sense does not exist, and we instead solve for the β minimizing the system error function E:

β̂ = argmin_β ||A − Hβ||

A solution β̂ that satisfies this minimization and additionally has minimum norm, i.e. also minimizes ||β||, is the least-norm least-squares solution of A = Hβ, and it is given by

β̂ = H^+ A

where H^+ is the generalized inverse (Moore-Penrose pseudoinverse) of the hidden-layer response matrix H. From this formula the least-norm least-squares solution β̂ can be calculated; this β̂ is exactly the hidden-to-output weight. Once the weight β̂ is calculated, the extreme learning machine training is finished and gesture recognition can be carried out.
7) Inputting static gesture feature sequence data into the ELM neural network for recognition.
The values of the desired output A = (a_1, a_2, ···, a_N)^T are chosen according to the actual situation; that is, this matrix is used to label the classes of the static gesture feature sequence sample data. After the selection, a large amount of sample data is input into the ELM for training. After training, inputting any gesture feature sequence data yields an output value. Comparing the output value against the preset desired-output values in A determines the meaning the gesture represents; if the output value cannot be matched to any pre-established value in A, the recognition fails.
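The comparison of an output value against the preset desired outputs in A can be sketched as a nearest-code match. The distance threshold is an assumption introduced here for illustration; the patent only states that an output with no match in A fails recognition:

```python
import numpy as np

def classify(y, class_codes, threshold=0.5):
    """Match a network output vector against the preset desired outputs.
    Returns the index of the closest class code, or -1 (recognition failure)
    when no code is close enough. The threshold value is an assumption."""
    distances = np.linalg.norm(class_codes - y, axis=1)
    best = int(np.argmin(distances))
    return best if distances[best] < threshold else -1

codes = np.eye(3)  # one-hot desired outputs for 3 hypothetical gesture classes
matched = classify(np.array([0.1, 0.9, 0.05]), codes)   # close to class 1
failed = classify(np.array([0.5, 0.5, 0.5]), codes)     # close to nothing
```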
Test and result analysis
The invention selects 810 vector samples and divides them into two groups, one of training vector samples and one of testing vector samples, each group containing 405 samples with 45 samples per gesture; the gesture types are labeled and the experimental data thereby determined. A neural-network gesture recognition experiment performs learning and training on many samples and then classifies and recognizes the test samples; in this respect the gesture recognition processes based on the BP algorithm and the ELM algorithm are the same, and the process is shown in fig. 6.
(1) BP algorithm experimental design
The invention adopts a BP algorithm with a standard three-layer structure for the experiments on gesture data. A three-layer BP network has many parameters to determine. The number of input-layer nodes i' and output-layer nodes o' are fixed by the dimension of the input sample vectors and the total number of sample classes: i' = 360 and o' = 9, respectively. The weights and thresholds of the BP network take values in [−1, 1], which fixes part of the training parameters. Two further important parameters cannot be determined directly: the number of hidden-layer nodes l' and the learning step size η'. At present no established theory directly determines the hidden-layer node count and learning step size, but the value range of the hidden-layer node count l' can be set by the empirical formula

l' = √(i' + o') + a'

where i', l' and o' are the node counts of the BP input, hidden and output layers respectively, and a' is a constant term ranging from 1 to 8. From this formula the empirical range of the hidden-layer node count is calculated to be 20 to 27.
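A quick check of the empirical formula with the paper's i' = 360 and o' = 9 reproduces the stated range of 20 to 27:

```python
import math

def hidden_node_range(n_in, n_out, a_min=1, a_max=8):
    """Empirical hidden-node range: l' = sqrt(i' + o') + a', with a' in [1, 8]."""
    base = math.sqrt(n_in + n_out)
    return math.floor(base + a_min), math.floor(base + a_max)

lo, hi = hidden_node_range(360, 9)  # the paper's input/output node counts
```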
When a BP neural network is used to train the samples, both the undetermined hidden-layer node count l' and the learning step size η' affect the final gesture recognition result. To obtain the BP algorithm's optimal recognition rate, the variation of the recognition result with the node count and the step size must first be analyzed. To keep the experimental results reasonable, one parameter is fixed while the variation with the other is studied. Since the hidden-layer node range is given by the empirical formula, the learning step size is first fixed at η' = 0.2 and the hidden-layer node count l' is set in turn to each value in the empirical range, giving 8 groups of experimental results on the 405 gesture test samples.
TABLE 1 Influence of the number of hidden-layer nodes on the BP algorithm
According to the values given in table 1, we can study the change rule of the gesture recognition rate of the BP algorithm along with the number of nodes of the hidden layer. FIG. 7 shows the relationship between the number of hidden layer nodes and the gesture recognition rate of the BP algorithm. As can be seen from fig. 7, when the number of hidden layer nodes is within the range of the empirical value, the gesture recognition rate increases with the increase of the number of nodes, and when the number of hidden layer nodes reaches a certain value, the gesture recognition rate may be reduced. This is because increasing the number of hidden layer nodes increases the number of iterations, which causes the neural network to over-fit, resulting in a reduced gesture recognition rate. Therefore, selecting the appropriate number of hidden layer nodes is crucial to the BP neural network model.
As can be seen from Table 1, the gesture recognition rate of the BP algorithm is highest when the number of hidden layer nodes is l' = 25; at that setting the mean square error of the BP algorithm reaches the preset minimum after 436 iterations and network training ends. The initial value range of the learning step η' of the network is [0.1, 0.7]. With the hidden layer node count fixed at l' = 25, the gesture recognition results for different values of η' within this range are shown in Table 2.
TABLE 2 Effect of learning step size η' on BP Algorithm
[Table 2 appears only as an image in the original publication.]
As can be seen from Table 2, the gesture recognition rate of the BP algorithm is highest when the learning step is η' = 0.45, so η' = 0.45 is selected as the optimal value. All parameters of the BP algorithm for the optimal recognition rate are thus determined: with l' = 25 hidden layer nodes and learning step η' = 0.45, the BP algorithm achieves its best gesture recognition result.
(2) ELM algorithm experimental design
When the ELM algorithm is selected to classify and recognize the 405 gesture test samples, the number of input layer nodes is 360 (the dimension of the gesture description vector), the number of output layer nodes is 9 (the number of gesture classes), and the excitation function is a sigmoid function. Because the ELM algorithm does not require the input weights and thresholds to be tuned, once a gesture sample vector is fed to the ELM neural network only the number of hidden layer nodes must be chosen before the samples can be trained and recognized. No empirical formula is available for this node count in gesture recognition, so it is set in turn to 1-50 and the relationship between the recognition result and the node count is studied. According to the gesture recognition results in FIG. 8, when the node count lies in [1, 39] the recognition rate of the ELM algorithm rises, with fluctuation, as nodes are added, reaching its maximum at 39 nodes; when the node count exceeds 39, the recognition rate falls as more nodes are added. The ELM algorithm therefore achieves its optimal gesture recognition rate of 84.2% with 39 hidden layer nodes.
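The ELM configuration described above can be sketched in NumPy: random input-to-hidden weights and biases, a sigmoid activation, and output weights solved in closed form by pseudo-inverse. The data here is synthetic (360-dimensional inputs, 9 classes) as a stand-in for the gesture samples; the sweep over 1-50 hidden nodes mirrors the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_in, n_classes = 600, 200, 360, 9

# Synthetic stand-in data: class-dependent means plus Gaussian noise.
means = rng.normal(size=(n_classes, n_in))
y_train = rng.integers(0, n_classes, n_train)
y_test = rng.integers(0, n_classes, n_test)
X_train = means[y_train] + rng.normal(scale=2.0, size=(n_train, n_in))
X_test = means[y_test] + rng.normal(scale=2.0, size=(n_test, n_in))
T = np.eye(n_classes)[y_train]          # one-hot desired outputs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_accuracy(l):
    W = rng.normal(size=(n_in, l))      # input-to-hidden weights, random
    b = rng.normal(size=l)              # hidden thresholds, random
    H = sigmoid(X_train @ W + b)        # hidden layer response matrix
    beta = np.linalg.pinv(H) @ T        # output weights via pseudo-inverse
    pred = np.argmax(sigmoid(X_test @ W + b) @ beta, axis=1)
    return float(np.mean(pred == y_test))

best_l = max(range(1, 51), key=elm_accuracy)
print("best hidden-node count:", best_l)
```

Because W and b are drawn randomly, the accuracy at a given node count fluctuates from run to run, which is exactly why the experiment below averages over 20 repetitions.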
With the two designed sets of experiments, the invention studies how the gesture recognition results of the BP and ELM algorithms vary with their respective parameters and determines the parameters at which each algorithm reaches its highest recognition rate. In the ELM algorithm, because the input-to-hidden weight matrix and the hidden node thresholds are randomly generated, the final gesture recognition result fluctuates. To make the results more reliable, each ELM experiment is repeated 20 times after the hidden layer node count is set; the mean of the 20 recognition results is taken as the ELM gesture recognition rate, and the mean of the 20 recorded training durations as the ELM training time. The experiments show that with 39 hidden layer nodes the average gesture recognition rate of the ELM algorithm is 83.3% and the average training time is 0.03 second. The BP experiments show that the BP algorithm reaches its highest gesture recognition rate with l' = 25 hidden layer nodes and learning step η' = 0.45, with a training time of 10.8 seconds.
Table 3 compares the best recognition results of the BP and ELM algorithms. As can be seen from Table 3, both the gesture recognition effect and the training duration of the ELM algorithm are better than those of the BP algorithm. The gesture recognition results and training durations of the two algorithms are compared in FIG. 9.
TABLE 3 optimal identification result comparison of BP algorithm and ELM algorithm
[Table 3 appears only as an image in the original publication.]
From the above experimental results, the following conclusions can be drawn:
(1) As can be seen from FIG. 7, within a certain range the gesture recognition rate of the BP algorithm rises as the number of hidden layer nodes increases, then levels off after a certain value. When the node count becomes too large, however, the recognition rate falls as more nodes are added, because too many hidden layer nodes cause over-training and thus reduce the recognition rate. As can be seen from Table 1, increasing the number of hidden layer nodes also tends to increase the number of iterations of the network. Determining the number of hidden layer neuron nodes is therefore crucial for the BP network model.
(2) As can be seen from FIG. 8, the gesture recognition rate of the ELM algorithm increases with the number of hidden layer nodes within a certain range; once the count passes a certain value, the recognition rate falls as more nodes are added. The number of hidden layer nodes is thus an important factor in the ELM algorithm's gesture recognition effect, and determining it is a key point of ELM-based gesture recognition.
(3) Comparing the experimental results of the BP and ELM algorithms in FIG. 9 shows that the gesture recognition effect of the ELM algorithm is superior to that of the BP neural network, and the training duration of the ELM algorithm is significantly shorter. The ELM algorithm seeks a global optimal solution; the input samples can be trained once the hidden layer node count is determined, with no other parameters to set, which shortens network training time. The BP algorithm, by contrast, requires many parameters to be set (the layer weights, thresholds, hidden layer node count, learning rate, and so on all affect network performance) and easily falls into a local optimal solution, giving the network poor generalization ability. Compared with the BP neural network, the ELM algorithm is easier to operate, generalizes better, and recognizes gestures with a higher success rate.
Matters not described in detail in this specification are within the common knowledge of those skilled in the art.

Claims (5)

1. A dynamic gesture learning and recognition method based on an ELM neural network is characterized in that: the method comprises the following steps:
1) collecting structural vectors of the upper limbs of the human body: firstly, a static gesture description vector is constructed from two aspects, the selection of joint points and the angles between the joint vectors; 9 joint points closely related to gesture change are selected to describe the features of gestures, namely the left hand, left wrist, left elbow, left shoulder, shoulder center, right hand, right wrist, right elbow, and right shoulder; on the basis of the structural characteristics of the human body in the skeleton model and the gesture angle information, 8 groups of structural vectors are constructed from the upper-limb part of the human skeleton model; the structural vectors of the upper limbs comprise the shoulder-center-to-left-shoulder vector, the left-shoulder-to-left-elbow vector, the left-elbow-to-left-wrist vector, the left-wrist-to-left-hand vector, the shoulder-center-to-right-shoulder vector, the right-shoulder-to-right-elbow vector, the right-elbow-to-right-wrist vector, and the right-wrist-to-right-hand vector (the vector symbols appear as images in the original publication);
2) calculating gesture included-angle information from the structural vectors, comprising: the included angle β1 between the shoulder-center-to-left-shoulder vector and the left-shoulder-to-left-elbow vector; the included angle β2 between the left-shoulder-to-left-elbow vector and the left-elbow-to-left-wrist vector; the included angle β3 between the left-elbow-to-left-wrist vector and the left-wrist-to-left-hand vector; the included angle β4 between the shoulder-center-to-right-shoulder vector and the right-shoulder-to-right-elbow vector; the included angle β5 between the right-shoulder-to-right-elbow vector and the right-elbow-to-right-wrist vector; and the included angle β6 between the right-elbow-to-right-wrist vector and the right-wrist-to-right-hand vector; owing to the symmetrical structure of the human body, each of the six included angles takes values in the range [0, π];
3) Describing the gesture included angle information as a static gesture feature sequence;
4) establishing an ELM neural network by taking the static gesture feature sequence as an input layer;
5) inputting sample data of the static gesture feature sequence to train the ELM neural network, and calculating a weight from a hidden layer to an output layer;
6) obtaining the weight from the hidden layer to the output layer, namely completing ELM neural network training;
7) inputting the data of the static gesture feature sequence into the ELM neural network for recognition.
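The angle computation of step 2) can be sketched as the arccos of normalized dot products between consecutive limb vectors. The joint coordinates below are invented purely for illustration and are not part of the claimed method; only the left-arm chain (β1-β3) is shown for brevity.

```python
import numpy as np

def angle(u, v):
    """Included angle between vectors u and v, guaranteed to lie in [0, pi]."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))  # clip guards rounding

# Hypothetical 3D joint positions (x, y, z) for the left-arm chain.
joints = {
    "shoulder_center": np.array([0.0, 1.5, 0.0]),
    "left_shoulder":   np.array([-0.2, 1.5, 0.0]),
    "left_elbow":      np.array([-0.3, 1.2, 0.1]),
    "left_wrist":      np.array([-0.3, 0.9, 0.3]),
    "left_hand":       np.array([-0.3, 0.8, 0.4]),
}

chain = ["shoulder_center", "left_shoulder", "left_elbow",
         "left_wrist", "left_hand"]
vecs = [joints[b] - joints[a] for a, b in zip(chain, chain[1:])]
betas = [angle(u, v) for u, v in zip(vecs, vecs[1:])]   # beta1, beta2, beta3
print([round(b, 3) for b in betas])
```

The right-arm angles β4-β6 follow the same pattern over the mirrored joint chain, which is why all six angles share the [0, π] range.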
2. The ELM neural network-based dynamic gesture learning and recognition method of claim 1, wherein: the ELM neural network model in the step 4) is a single hidden layer feedforward neural network.
3. The ELM neural network-based dynamic gesture learning and recognition method of claim 2, wherein: the specific steps of the step 4) comprise:
41) taking sample data of the static gesture feature sequence as an input layer;
42) initializing a network, randomly generating a weight matrix W from an input layer to a hidden layer and a threshold vector b of the hidden layer, and determining the number l of nodes of the hidden layer and an excitation function g (x);
43) calculating a hidden layer response matrix H according to the definition of the response H (x) of the hidden layer to the input sample data;
44) calculating the hidden-layer-to-output-layer weight β;
45) outputting the hidden-layer-to-output-layer weight β.
4. The ELM neural network-based dynamic gesture learning and recognition method of claim 3, wherein: the hidden-layer-to-output-layer weight β in step 44) is calculated as β = H⁺A, where A is the desired output of the system and H⁺ is the generalized inverse of the hidden layer response matrix H.
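The weight formula of claim 4, β = H⁺A, can be checked numerically with NumPy's Moore-Penrose pseudo-inverse. H and A below are small made-up matrices; in the method itself, H is the hidden layer's response to the training samples and A the desired one-hot output.

```python
import numpy as np

H = np.array([[0.2, 0.7, 0.1],
              [0.9, 0.4, 0.3],
              [0.5, 0.8, 0.6],
              [0.1, 0.2, 0.9]])        # 4 samples x 3 hidden nodes
A = np.eye(2)[[0, 1, 0, 1]]            # desired one-hot outputs, 4 x 2

beta = np.linalg.pinv(H) @ A           # least-squares solution of H beta = A
print(beta.shape)                      # prints (3, 2)
```

Because H generally has more rows (samples) than columns (hidden nodes), β is the least-squares fit rather than an exact solution, which is what makes ELM training a single closed-form step.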
5. The ELM neural network-based dynamic gesture learning and recognition method of claim 4, wherein: the number of hidden layer nodes l is 39.
CN201710160089.1A 2017-03-17 2017-03-17 Dynamic gesture learning and recognition method based on ELM neural network Expired - Fee Related CN107102727B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710160089.1A CN107102727B (en) 2017-03-17 2017-03-17 Dynamic gesture learning and recognition method based on ELM neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710160089.1A CN107102727B (en) 2017-03-17 2017-03-17 Dynamic gesture learning and recognition method based on ELM neural network

Publications (2)

Publication Number Publication Date
CN107102727A CN107102727A (en) 2017-08-29
CN107102727B true CN107102727B (en) 2020-04-07

Family

ID=59675073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710160089.1A Expired - Fee Related CN107102727B (en) 2017-03-17 2017-03-17 Dynamic gesture learning and recognition method based on ELM neural network

Country Status (1)

Country Link
CN (1) CN107102727B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110325965B (en) * 2018-01-25 2021-01-01 腾讯科技(深圳)有限公司 Object processing method, device and storage medium in virtual scene
CN108509839A (en) * 2018-02-02 2018-09-07 东华大学 One kind being based on the efficient gestures detection recognition methods of region convolutional neural networks
CN108647292A (en) * 2018-05-07 2018-10-12 前海梧桐(深圳)数据有限公司 Enterprise's property sort computational methods based on neural network algorithm and system
CN108960171B (en) * 2018-07-12 2021-03-02 安徽工业大学 Method for converting gesture recognition into identity recognition based on feature transfer learning
CN109271947A (en) * 2018-09-28 2019-01-25 合肥工业大学 A kind of night real-time hand language identifying system based on thermal imaging
CN110443167B (en) * 2019-07-23 2022-05-17 中国建设银行股份有限公司 Intelligent recognition method and intelligent interaction method for traditional culture gestures and related devices
CN110390303B (en) * 2019-07-24 2022-04-08 达闼机器人有限公司 Tumble alarm method, electronic device, and computer-readable storage medium
CN110674747A (en) * 2019-09-24 2020-01-10 上海眼控科技股份有限公司 Behavior judging method and device, computer equipment and readable storage medium
CN111796519B (en) * 2020-06-14 2022-05-06 武汉理工大学 Automatic control method of multi-input multi-output system based on extreme learning machine
WO2022007879A1 (en) 2020-07-09 2022-01-13 北京灵汐科技有限公司 Weight precision configuration method and apparatus, computer device, and storage medium
CN111831356B (en) * 2020-07-09 2023-04-07 北京灵汐科技有限公司 Weight precision configuration method, device, equipment and storage medium
CN114777771B (en) * 2022-04-13 2024-08-20 西安电子科技大学 Outdoor unmanned vehicle combined navigation positioning method
CN114997295B (en) * 2022-05-25 2024-09-10 吉林大学 LR-ELM-based lower limb artificial limb movement identification method

Citations (2)

Publication number Priority date Publication date Assignee Title
CN105005769A (en) * 2015-07-08 2015-10-28 山东大学 Deep information based sign language recognition method
CN105807926A (en) * 2016-03-08 2016-07-27 中山大学 Unmanned aerial vehicle man-machine interaction method based on three-dimensional continuous gesture recognition

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN105005769A (en) * 2015-07-08 2015-10-28 山东大学 Deep information based sign language recognition method
CN105807926A (en) * 2016-03-08 2016-07-27 中山大学 Unmanned aerial vehicle man-machine interaction method based on three-dimensional continuous gesture recognition

Non-Patent Citations (3)

Title
"Constructive, Robust and Adaptive OS-ELM in Human Action Recognition"; Arif Budiman et al.; IAICT 2014; 2014-08-30; full text *
"Research on Gesture Recognition Based on Neural Networks"; Feng Tong; China Excellent Master's Theses Full-text Database; 2015-07-15; pp. 50-59 *
"Research on Human Behavior Recognition Methods Using Kinect and System Design"; Han Xu; China Excellent Master's Theses Full-text Database; 2013-10-15; pp. 21-30 *

Also Published As

Publication number Publication date
CN107102727A (en) 2017-08-29

Similar Documents

Publication Publication Date Title
CN107102727B (en) Dynamic gesture learning and recognition method based on ELM neural network
Pranav et al. Facial emotion recognition using deep convolutional neural network
Chen et al. A novel ensemble ELM for human activity recognition using smartphone sensors
Song et al. An efficient initialization approach of Q-learning for mobile robots
Zuo et al. Deterministic generative adversarial imitation learning
Lu et al. A hybrid wavelet neural network and switching particle swarm optimization algorithm for face direction recognition
Su et al. HDL: Hierarchical deep learning model based human activity recognition using smartphone sensors
Zeng et al. CNN model design of gesture recognition based on tensorflow framework
Guo et al. A deep reinforcement learning method for multimodal data fusion in action recognition
CN110009108A (en) A kind of completely new quantum transfinites learning machine
CN104408470A (en) Gender detection method based on average face preliminary learning
Wan Deep learning: Neural network, optimizing method and libraries review
Hu et al. An optimization strategy for weighted extreme learning machine based on PSO
Zhai et al. Facial beauty prediction via local feature fusion and broad learning system
Tan et al. Two-phase switching optimization strategy in deep neural networks
Patel et al. Quantum inspired binary neural network algorithm
Soltani et al. Newman-Watts-Strogatz topology in deep echo state networks for speech emotion recognition
Yang et al. AM-SGCN: Tactile object recognition for adaptive multichannel spiking graph convolutional neural networks
Petluru et al. Transfer Learning-based Facial Expression Recognition with modified ResNet50
Guo et al. Exploiting LSTM-RNNs and 3D skeleton features for hand gesture recognition
Aleotti et al. Arm gesture recognition and humanoid imitation using functional principal component analysis
Li et al. Multimodal information-based broad and deep learning model for emotion understanding
Tonchev et al. Human Skeleton Motion Prediction Using Graph Convolution Optimized GRU Network
CN114863548A (en) Emotion recognition method and device based on human motion posture nonlinear spatial features
Kasabov et al. Incremental learning in autonomous systems: evolving connectionist systems for on-line image and speech recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200407