CN113111831A - Gesture recognition technology based on multi-mode information fusion - Google Patents


Info

Publication number
CN113111831A
CN113111831A (application CN202110441452.3A)
Authority
CN
China
Prior art keywords
signals
gesture
signal
neural network
gesture recognition
Prior art date
Legal status
Withdrawn
Application number
CN202110441452.3A
Other languages
Chinese (zh)
Inventor
杨庆华
金圣权
王志恒
毛传波
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110441452.3A priority Critical patent/CN113111831A/en
Publication of CN113111831A publication Critical patent/CN113111831A/en
Withdrawn legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • G06V40/113Recognition of static hand signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Abstract

A gesture recognition technology based on multi-modal information fusion, belonging to the technical field of gesture recognition, comprises the following steps. Step one: collect signals while a subject executes gesture actions. Step two: denoise the electromyographic signals and the raw triaxial acceleration and angle data, extract the signal active segments, and mark labels. Step three: extract time-domain features and AR model parameters from the electromyographic signals, and extract absolute mean value and waveform length features from the triaxial acceleration and angle signals. Step four: fuse and reduce the dimensionality of the extracted features, and input the reduced data into a classifier. Step five: train to obtain an optimized prediction model. Step six: input the acquired gesture signals to be predicted into a back-propagation neural network optimized by a genetic algorithm for recognition, and output the result. The invention extracts information from multiple angles, overcomes the incomplete information extraction of a single electromyographic signal, increases the diversity of gesture information, and helps improve gesture recognition accuracy.

Description

Gesture recognition technology based on multi-mode information fusion
Technical Field
The invention belongs to the technical field of gesture recognition, and particularly relates to a gesture recognition technology based on multi-mode information fusion.
Background
With the development of artificial intelligence, human-computer interaction, as a core of AI development, is receiving more and more attention. The essence of human-computer interaction is the exchange of information between a person and a computer, in some interaction mode, to complete a task. The hand is an important organ of the human body: it not only performs work but is also an important medium for communicating information. Gestures, as an effective means of conveying information, are non-contact and diverse and are a common control means in human-computer interaction; gesture recognition has accordingly become a hot topic in fields such as intelligent prostheses, rehabilitation training, sign language recognition, and smart homes.
According to the current total population of China and the data of the second national sampling survey of disabled people, disabled people account for 6.34% of the national population, about 88.76 million people in total, of whom people with physical disabilities form the largest group at 29.07%, about 25.8 million people. Fitting intelligent prostheses can help disabled patients relieve inconvenience and recover some missing functions. Traditional gesture recognition methods cannot be applied to prosthesis control, whereas gesture recognition based on surface electromyographic signals can meet the requirements of prosthesis use and control.
At present, China's population is aging seriously: more and more middle-aged and elderly people face stroke, and labor costs are rising steadily, so the cost of rehabilitation for patients with impaired limb function will keep increasing; using machines instead of manual labor to help patients with rehabilitation training is therefore a trend. The human nervous system has a capacity for recovery, so limbs paralyzed by stroke can regain partial or even full function through rehabilitation training. Clinical practice shows that rehabilitation training matched to the patient's active intention has a very positive effect on nervous-system recovery. Surface electromyographic signals reflect the active movement intention of the human body well, so gesture recognition based on electromyographic signals allows patients to carry out more effective rehabilitation training.
Sign language is a structured language that is not limited by the vocal organs and is the main mode of daily communication for deaf-mute people; most hearing people are unfamiliar with its expression, which creates a communication barrier between them and deaf-mute people.
With the development of virtual reality and smart-home technology, users can control devices through information-interaction means; for groups with limited mobility this enables a more convenient lifestyle, such as controlling lights, household appliances, wheelchairs, and even small aircraft.
Researching hand-motion recognition based on surface electromyographic signals and realizing fast, reliable, low-cost recognition can, in medical rehabilitation, satisfy disabled people's control and use of intelligent prostheses and support rehabilitation training of motor-neuropathy patients driven by their active intention, and can also provide a more convenient and more universal information output in human-computer interaction.
The surface electromyogram (sEMG), as a bioelectric signal generated by human activity, can be acquired non-invasively and without wounds, placing no burden on the subject's mind or body. However, a single sensor collects one-sided information: sEMG effectively reflects the flexion and extension of the fingers but not the pose of the hand, so the number of gestures that can be covered by sEMG alone is small, and finger actions that resemble each other are easily misrecognized.
Kernel principal component analysis (KPCA) is a nonlinear generalization of principal component analysis (PCA): sample data are mapped from the input space to a high-dimensional feature space through a nonlinear transformation, and PCA is then applied for feature extraction in that space. PCA is typically used for dimensionality reduction of linear features and loses its advantage on nonlinear data, whereas sEMG signals are nonlinear. Fusing and reducing the dimensionality of the electromyographic and pose features with KPCA is therefore more faithful to the original signals. The fused, dimension-reduced features are then fed to a back-propagation neural network optimized by a genetic algorithm, which accelerates the recognition process.
The genetic algorithm (GA) is an evolutionary algorithm designed after the rules of biological evolution in nature, first proposed by John Holland in the United States in the 1970s. Its basic principle is to simulate Darwinian evolution, that is, natural selection and survival of the fittest. The algorithm encodes the problem parameters into chromosomes and then iteratively performs selection, crossover, mutation, and similar operations to exchange chromosome information within the population, finally producing chromosomes that meet the optimization target. For complex combinatorial optimization problems, it can reach a good optimum faster than some conventional optimization algorithms. Genetic algorithms have been widely used in combinatorial optimization, machine learning, signal processing, adaptive control, artificial life, and other fields.
The BP (back-propagation) neural network, proposed in 1986 by scientists including Rumelhart and McClelland, is a multi-layer feed-forward neural network trained by the error back-propagation algorithm and the most widely applied neural network. Its structure consists of three layers, an input layer, a hidden layer, and an output layer, and a single hidden layer is sufficient for the present use. The input layer receives data and the output layer emits data; each neuron gathers the information transmitted by the neurons of the previous layer and passes a value to the next layer through an activation function. Although the BP network is the most widely applied artificial neural network, its network structure, initial connection weights, and thresholds greatly affect training yet cannot be determined exactly in advance; a genetic algorithm can improve the BP neural network in this respect.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a gesture recognition technology: a method in which sensors collect the electromyographic signals and pose signals of the hand to recognize hand actions.
The invention provides the following technical scheme. A gesture recognition technology based on multi-modal information fusion, characterized in that it comprises the following steps:
step one: according to the muscles involved in the gestures to be recognized, arrange the electromyography sensors at suitable positions and place a pose sensor on the back of the hand; the subject executes the gesture actions while signals are collected;
step two: denoising original data of electromyographic signals, three-axis acceleration signals of gesture actions and angle signals by a filtering method, extracting signal active segments by a dual-threshold segmentation method, and marking labels;
step three: extracting time domain characteristics and AR model parameters of the electromyographic signals, and extracting absolute average values and waveform length characteristics of the triaxial acceleration signals and the angle signals;
step four: performing fusion dimensionality reduction on the features extracted in the third step by a Kernel Principal Component Analysis (KPCA) method, and inputting dimensionality-reduced data into a classifier;
step five: optimizing an initial weight threshold of the BP neural network through a genetic algorithm, and training to obtain an optimized prediction model;
step six: input the dimension-reduced, fused features of the collected gesture-action signals to be predicted into the back-propagation neural network GA-BPNN optimized by the genetic algorithm for recognition, and output the result.
The gesture recognition technology based on multi-modal information fusion is characterized in that, in step two, a low-pass filtering approach is used: the signal passed through a Butterworth low-pass filter with a 5 Hz cut-off frequency is subtracted from the original signal to remove the baseline drift of the electromyographic signals; a Butterworth band-pass filter with a 20-460 Hz pass band removes low-frequency motion-artifact interference and high-frequency noise interference; a comb notch filter removes the 50 Hz power-line frequency and its harmonic interference; and moving-average filtering removes the noise mixed into the acceleration and angle signals during acquisition.
The gesture recognition technology based on multi-modal information fusion is characterized in that, in step two, the data are divided by overlapping sliding windows. A first threshold σ is set from the mean energy of the multi-channel electromyographic signals, equal to 1.3 times their mean energy in the relaxed state, and a second threshold μ is set to 6. The value computed in each window is compared with σ; when it exceeds σ and the run of consecutive such windows is longer than μ, the segment is judged to be an active segment, and the active segments of the pose signals are divided according to the active-segment times of the electromyographic signals.
The gesture recognition technology based on multi-modal information fusion is characterized in that, in step four, the kernel function of the kernel principal component analysis is a Gaussian radial basis kernel, and the first n principal components whose cumulative contribution rate exceeds 90% are selected as the dimension-reduced data.
The gesture recognition technology based on multi-modal information fusion is characterized in that, in step five, the features obtained after processing the data through steps one to four are input into a BP neural network; the absolute value of the prediction error is used as the fitness of the genetic algorithm; the individual length is determined by the BP network structure: if the numbers of input, hidden, and output nodes are l, m, and n respectively, the individual length is (l + n + 1) × m + n; the optimal initial weights and thresholds obtained by the genetic algorithm are used as the initial weights and thresholds of the BP network, and the trained network model is used for prediction.
The gesture recognition technology based on multi-modal information fusion is characterized in that, in step six, the signals to be predicted are processed through steps one to four, the features are input into the trained neural network, and the class with the highest probability in the result is output as the recognized gesture class.
By adopting this technology, the invention has the following beneficial effects compared with the prior art:
1) the invention extracts both electromyographic signals and pose signals as raw data, gathering information from multiple angles; this compensates for the incomplete information extracted from electromyographic signals alone, increases the diversity of gesture information, and helps improve gesture recognition accuracy;
2) the invention segments the active data segments with a dual threshold on the mean energy of the electromyographic signals, constraining both signal amplitude and duration; this reduces the effect of short-lived noise on active-segment division, improves the accuracy with which the effective information segments are obtained, and aligning the acceleration and angle signals to the time segments of the electromyographic division aids the time synchronization of the multi-modal signals;
3) the invention uses kernel principal component analysis for fused dimensionality reduction of the features extracted from the multi-modal information, which matches the nonlinear character of electromyographic features and helps retain the nonlinear characteristics of the signals; it preserves more of the information in the original signals while fusing and reducing the features, and lowering the feature dimensionality fed into the classifier improves recognition efficiency and reduces recognition time;
4) the invention optimizes the initial weights and thresholds of the BP neural network with a genetic algorithm, solving the problem that these values, which strongly influence network performance, could otherwise only be set randomly; this reduces training time and improves recognition accuracy.
Drawings
FIG. 1 is a flow chart of the gesture recognition technology based on multi-modal information fusion according to the present invention;
FIG. 2 is a flowchart of the training algorithm of the GA-optimized BP neural network model according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover alternatives, modifications and equivalents that may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description, certain specific details are set forth to provide a better understanding of the invention; it will be apparent to one skilled in the art that the invention may be practiced without these specific details.
Referring to fig. 1-2, a gesture recognition technology based on multimodal information fusion specifically includes the following steps:
the method comprises the following steps: as shown in fig. 1, signals from both electromyogram signals and position signals are selected as gesture recognition samples. Arranging myoelectric sensors at arm muscles related to gestures of a subject, arranging IMU sensors at the back of hands, executing the action gestures of the subject according to a cyclic mode of 'relaxation-action-relaxation', wherein the action and relaxation time lasts for about 1s, each group of actions is performed for 30s, a rest interval is not less than 2min before the next group of actions is performed, and each group of actions is repeated for 3 times.
Step two: as shown in fig. 1, the raw surface electromyographic signal is easily disturbed by physiological and environmental noise such as motion artifacts and power-line noise, while the acceleration and angle signals are easily disturbed by high-frequency environmental noise, so the signals must be preprocessed to improve their signal-to-noise ratio. First, baseline drift is removed by subtracting the low-pass-filtered signal from the original signal; then a 20-460 Hz Butterworth band-pass filter removes low-frequency motion-artifact interference and high-frequency noise interference; finally, a comb notch filter removes the 50 Hz power-line noise and its harmonics. Fluctuations of the acceleration and angle signals due to noise are removed by sliding-window mean filtering.
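The preprocessing chain just described can be sketched with standard SciPy filter designs. The 1 kHz sampling rate, filter orders, and the choice of cascaded notches as the comb are illustrative assumptions, not the patent's exact parameters:

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 1000  # Hz, assumed sEMG sampling rate

def remove_baseline(emg, cutoff=5.0):
    """Subtract the 5 Hz Butterworth low-pass output to remove baseline drift."""
    b, a = butter(4, cutoff / (FS / 2), btype="low")
    return emg - filtfilt(b, a, emg)

def bandpass(emg, lo=20.0, hi=460.0):
    """20-460 Hz Butterworth band-pass: motion artifacts and high-freq noise."""
    b, a = butter(4, [lo / (FS / 2), hi / (FS / 2)], btype="band")
    return filtfilt(b, a, emg)

def notch_50hz_harmonics(emg, q=30.0):
    """Comb-like removal of 50 Hz mains interference and its harmonics."""
    out = emg
    for f0 in range(50, FS // 2, 50):
        b, a = iirnotch(f0 / (FS / 2), q)
        out = filtfilt(b, a, out)
    return out

def moving_average(sig, win=5):
    """Sliding-mean filter for the acceleration / angle channels."""
    kernel = np.ones(win) / win
    return np.convolve(sig, kernel, mode="same")

emg = np.random.randn(2000)  # one sEMG channel, 2 s
clean = notch_50hz_harmonics(bandpass(remove_baseline(emg)))
```

Zero-phase `filtfilt` is used so the filters add no delay that would misalign the sEMG with the pose signals.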
Then the active segments of the signals are extracted by the dual-threshold method on the mean energy of the electromyographic signals. The mean energy of the electromyographic signals is

E(t) = \frac{1}{C L} \sum_{a=1}^{C} \sum_{i=t}^{t+L-1} \mathrm{semg}_a(i)^2

where t denotes a time instant, L the window length, C the number of electromyographic channels, and semg_a(i) the electromyographic signal value at time i on channel a; the window length is 50 ms. A value is extracted by the averaging window with a sliding distance of 10 ms. The threshold σ is set to 1.3 times the mean energy of the electromyographic signals in the relaxed state, and the threshold μ is set to 6. The value computed in each window is compared with σ; when it exceeds σ and the length of the run of consecutive such windows exceeds μ, that segment is judged to be an active segment. The starting time of the active segment is the start of the first window satisfying the threshold condition, and the ending time is the start of the first subsequent window that does not. The active segments of the pose signals are divided according to the active-segment times of the electromyographic signals.
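The dual-threshold extraction above can be sketched as follows. The channel count and the synthetic rest/burst data are illustrative assumptions:

```python
import numpy as np

WIN, STEP = 50, 10  # 50 ms window, 10 ms slide, in samples at 1 kHz

def window_energy(emg):
    """Mean energy per sliding window, averaged over all channels.
    emg has shape (channels, samples)."""
    _, T = emg.shape
    return np.array([np.mean(emg[:, s:s + WIN] ** 2)
                     for s in range(0, T - WIN + 1, STEP)])

def active_segments(energy, sigma, mu=6):
    """(start, end) sample indices of runs of consecutive windows whose
    energy exceeds sigma, kept only when the run is longer than mu windows."""
    segs, start = [], None
    for i, above in enumerate(energy > sigma):
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start > mu:
                segs.append((start * STEP, i * STEP))
            start = None
    if start is not None and len(energy) - start > mu:
        segs.append((start * STEP, len(energy) * STEP))
    return segs

rng = np.random.default_rng(0)
rest = rng.standard_normal((4, 1000)) * 0.05   # relaxed baseline, 4 channels
burst = rng.standard_normal((4, 1000))         # gesture burst
emg = np.concatenate([rest, burst, rest], axis=1)
sigma = 1.3 * window_energy(rest).mean()       # 1.3 x relaxed mean energy
segs = active_segments(window_energy(emg), sigma)
```

The run-length condition (μ windows) is what keeps a short noise spike from being labeled an active segment.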
Step three: features are extracted with a 200 ms window and a 50 ms window slide. Three time-domain features are extracted from the electromyographic signals: the absolute mean value (MAV), the root-mean-square value (RMS), and the variance (VAR).
The absolute mean value is

\mathrm{MAV} = \frac{1}{N} \sum_{i=1}^{N} |x_i|

where N is the window length and x_i is the electromyographic signal at time i.
The root-mean-square value is

\mathrm{RMS} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2}
the variance expression is:
Figure BDA0003035197440000074
for an AR parameter model, p parameters are used for representing sEMG signals in the method, and the invention adopts an AR parameter model with 4-order extraction, wherein the expression is as follows:
Figure BDA0003035197440000081
wherein u isiIs white noise residual, p is AR model order, anIs the nth coefficient in the AR model.
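The fourth-order AR coefficients can be estimated per window by ordinary least squares, as a sketch. The patent does not name an estimator (Burg or Yule-Walker would serve equally well), and the synthetic signal below is only a sanity check:

```python
import numpy as np

def ar_coefficients(x, p=4):
    """Least-squares fit of x[i] = sum_{n=1..p} a_n x[i-n] + u[i]."""
    # Design matrix: the row for sample i holds (x[i-1], ..., x[i-p]).
    X = np.column_stack([x[p - n:len(x) - n] for n in range(1, p + 1)])
    y = x[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

# Synthetic AR(4) signal with known coefficients to check recovery.
rng = np.random.default_rng(0)
true_a = np.array([0.4, -0.2, 0.1, -0.05])
x = np.zeros(5000)
for i in range(4, 5000):
    x[i] = true_a @ x[i - 4:i][::-1] + 0.1 * rng.standard_normal()
a_hat = ar_coefficients(x)
```

The four estimated coefficients are the AR features fed, alongside the time-domain features, into the fusion step.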
For the acceleration and angle signals, the extraction window is the same as for the electromyographic signals, and the absolute mean value is defined as above.
The waveform length is

\mathrm{WL} = \sum_{i=1}^{N-1} |x_{i+1} - x_i|
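The four windowed features of step three can be sketched together; the sine stand-in for a signal channel is an illustrative assumption:

```python
import numpy as np

def mav(x): return np.mean(np.abs(x))             # absolute mean value
def rms(x): return np.sqrt(np.mean(x ** 2))       # root mean square
def var(x): return np.sum(x ** 2) / (len(x) - 1)  # variance (zero-mean convention)
def wl(x):  return np.sum(np.abs(np.diff(x)))     # waveform length

WIN, STEP = 200, 50  # 200 ms window, 50 ms slide, at 1 kHz
sig = np.sin(np.linspace(0, 20 * np.pi, 1000))    # stand-in for one channel
feats = np.array([[f(sig[s:s + WIN]) for f in (mav, rms, var, wl)]
                  for s in range(0, len(sig) - WIN + 1, STEP)])
```

Each window yields one feature row; rows from all channels and feature types are concatenated before the KPCA fusion of step four.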
step four: and (5) performing fusion dimensionality reduction on the features extracted in the step three by using a KPCA (kernel principal component analysis) method. Because different feature scales of signal extraction are different greatly, normalization processing needs to be performed on feature values first, and the feature values are normalized into data x with a mean value of 0 and a variance of 10The formula is as follows:
Figure BDA0003035197440000083
where x is the raw feature data, μ is its mean, and σ is its standard deviation. The normalized data are mapped out of the data space with a Gaussian radial basis kernel to obtain the kernel matrix K; the Gaussian radial basis kernel is

K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)
where σ is the kernel parameter. The kernel matrix is then centered:

K_c = K - l_N K - K l_N + l_N K l_N

where l_N is an N × N matrix in which every element is 1/N and N is the number of samples. The eigenvalues λ_1, ..., λ_n of K_c are then computed, together with the corresponding eigenvectors ν_1, ..., ν_n. Since the eigenvalues determine the magnitude of the variance (the larger the eigenvalue, the more effective information it carries), they are sorted in descending order and the corresponding eigenvectors are reordered accordingly. The eigenvectors are orthogonalized and normalized by the Schmidt orthogonalization method to obtain a_1, ..., a_n. The cumulative contribution rates r_1, ..., r_n of the reordered eigenvalues are computed, and the part with cumulative contribution rate above 90% is taken:

r_t = \frac{\sum_{i=1}^{t} \lambda_i}{\sum_{i=1}^{n} \lambda_i}

Selecting a_1, ..., a_t gives the data after dimensionality reduction and fusion.
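The whole KPCA fusion step can be sketched with NumPy alone: z-score the features, build the Gaussian-RBF kernel matrix, center it, eigendecompose, and keep the leading components covering 90% of the eigenvalue mass. The kernel width and the toy feature matrix are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 26))             # feature rows from step three

Xn = (X - X.mean(axis=0)) / X.std(axis=0)      # zero mean, unit variance

sq = np.sum(Xn ** 2, axis=1)
d2 = sq[:, None] + sq[None, :] - 2 * Xn @ Xn.T # pairwise squared distances
sigma_k = np.sqrt(X.shape[1])                  # assumed kernel width
K = np.exp(-d2 / (2 * sigma_k ** 2))           # Gaussian RBF kernel matrix

N = K.shape[0]
lN = np.full((N, N), 1.0 / N)
Kc = K - lN @ K - K @ lN + lN @ K @ lN         # centered kernel matrix

lam, vec = np.linalg.eigh(Kc)                  # eigh returns ascending order
lam, vec = lam[::-1], vec[:, ::-1]             # sort descending
lam = np.clip(lam, 0, None)                    # numerical negatives -> 0
ratio = np.cumsum(lam) / lam.sum()
t = int(np.searchsorted(ratio, 0.90)) + 1      # components covering 90 %
Z = Kc @ vec[:, :t] / np.sqrt(lam[:t] + 1e-12) # projected (reduced) features
```

`Z` (one row per feature vector, `t` columns) is what would be fed to the classifier in step five.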
Step five: as shown in fig. 2, first determine the BP neural network structure, i.e. the numbers of input, hidden, and output nodes. The number of input nodes is the dimensionality t after the dimension reduction and fusion of step four, the number of output nodes is the number n of gesture classes to be recognized, and the number of hidden nodes m may follow one of the following three formulas:

m < t - 1

m = \sqrt{t + n} + a

m = \log_2 t
where a is a constant between 0 and 10, inclusive. Once the network structure is determined, the total number of weights and thresholds in the neural network is (t + n + 1) × m + n, and this length is used as the chromosome length for encoding. Each individual then provides the initial weights and thresholds for constructing a BP neural network, which is trained on the preprocessed training data, the preprocessing comprising steps one to four. The trained network is used for prediction, and the absolute error between the predicted output and the expected output serves as the individual fitness of the genetic algorithm:

F = \sum_{i=1}^{n} |y_i - y'_i|

where n is the number of output nodes of the neural network, y_i is the expected output of the ith node, and y'_i is the predicted output of the ith node.
Next comes the selection operation, which, in the spirit of natural selection and inheritance, picks good individuals from the old population with a certain probability to breed the next generation. Because the invention uses the sum of absolute errors as the fitness function, better individuals have smaller errors. There are many selection methods; the invention uses the roulette wheel, in which each individual i has selection probability p_i:

F_i = \frac{k}{S_i}

p_i = \frac{F_i}{\sum_{j=1}^{N} F_j}

where S_i is the fitness (error) of individual i, whose reciprocal is taken because a smaller error is better, k is a coefficient, and N is the number of individuals in the population.
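A minimal sketch of the roulette wheel with reciprocal-error fitness (the error values below are made up):

```python
import numpy as np

def roulette_select(errors, k, rng):
    """Select k population indices with probability proportional to 1/error,
    so individuals with smaller prediction errors get larger wheel slices."""
    fit = 1.0 / np.asarray(errors, dtype=float)
    return rng.choice(len(errors), size=k, p=fit / fit.sum())

rng = np.random.default_rng(0)
errors = np.array([0.2, 1.0, 5.0, 10.0])  # per-individual error sums
picks = roulette_select(errors, 1000, rng)
```

Selection is with replacement, so a strong individual can be drawn many times into the next generation.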
A crossover operation is then performed on the selected individuals: the jth genes of the kth chromosome dna_k and the lth chromosome dna_l are crossed as

dna_{kj} = dna_{kj} (1 - r) + dna_{lj} r
dna_{lj} = dna_{lj} (1 - r) + dna_{kj} r

where r is a random number in [0, 1].
Organisms mutate during inheritance, so the genetic algorithm also has a mutation operation. The jth gene dna_{ij} of the ith individual is mutated as

dna_{ij} = dna_{ij} + (dna_{ij} - dna_{max}) \cdot f(g), \quad r \ge 0.5
dna_{ij} = dna_{ij} + (dna_{min} - dna_{ij}) \cdot f(g), \quad r < 0.5

with

f(g) = x \left(1 - \frac{g}{g_{max}}\right)^2

where dna_{max} and dna_{min} are the upper and lower bounds of dna_{ij}, x and r are two random numbers in [0, 1], g is the current iteration number, and g_{max} is the maximum number of iterations.
If, after mutation, the iteration count is below the maximum, return to the selection step and continue selection, crossover, and mutation until the specified number of iterations is reached. The generation with the minimum error provides the optimal initial weights and thresholds, which are substituted into the BP neural network; training this network yields the optimal BP network model.
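The GA wrapper around the BP initial weights can be sketched end to end. The network sizes, population size, rates, simplified mutation, and the toy data are all illustrative assumptions, and the final backpropagation training that would follow is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
t_in, m_hid, n_out = 4, 6, 3                  # input / hidden / output nodes
IND_LEN = (t_in + n_out + 1) * m_hid + n_out  # chromosome length, as in step five

X = rng.standard_normal((60, t_in))             # toy feature rows
y = np.eye(n_out)[rng.integers(0, n_out, 60)]   # toy one-hot labels

def decode(ind):
    """Slice a flat chromosome into W1, b1, W2, b2."""
    i = 0
    W1 = ind[i:i + t_in * m_hid].reshape(t_in, m_hid); i += t_in * m_hid
    b1 = ind[i:i + m_hid]; i += m_hid
    W2 = ind[i:i + m_hid * n_out].reshape(m_hid, n_out); i += m_hid * n_out
    b2 = ind[i:i + n_out]
    return W1, b1, W2, b2

def forward(ind, X):
    W1, b1, W2, b2 = decode(ind)
    h = np.tanh(X @ W1 + b1)
    return np.tanh(h @ W2 + b2)

def fitness_error(ind):
    """Sum of absolute prediction errors, the GA fitness of step five."""
    return np.abs(forward(ind, X) - y).sum()

pop = rng.uniform(-1, 1, (30, IND_LEN))
for gen in range(20):
    err = np.array([fitness_error(ind) for ind in pop])
    p = (1 / err) / (1 / err).sum()                  # roulette wheel
    parents = pop[rng.choice(len(pop), len(pop), p=p)]
    for k in range(0, len(parents) - 1, 2):          # arithmetic crossover
        r = rng.uniform()
        a, b = parents[k].copy(), parents[k + 1].copy()
        parents[k], parents[k + 1] = a * (1 - r) + b * r, b * (1 - r) + a * r
    mask = rng.uniform(size=parents.shape) < 0.05    # simplified mutation
    parents[mask] += 0.1 * rng.standard_normal(mask.sum())
    pop = parents

best = pop[np.argmin([fitness_error(ind) for ind in pop])]
# `best` would seed the BP network's initial weights and thresholds
# before ordinary backpropagation training.
```

Decoding the best chromosome and continuing with gradient training is what produces the "optimal BP network model" used in step six.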
Step six: acquire in real time the electromyographic and pose signals of the subject while a gesture is executed, process them through steps one to four, feed the result into the optimal BP neural network trained in step five, and output the class with the highest probability as the final recognition result.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. A gesture recognition technology based on multi-modal information fusion, characterized in that the method comprises the following steps:
step one: according to the muscles involved in the gestures to be recognized, arrange electromyography sensors at appropriate positions and place a pose sensor on the back of the hand; the subject performs the gesture actions while the signals are acquired;
step two: denoise the raw electromyographic signals and the three-axis acceleration and angle signals of the gesture actions by filtering, extract the signal active segments with a dual-threshold segmentation method, and attach labels;
step three: extract time-domain features and AR model parameters from the electromyographic signals, and extract mean-absolute-value and waveform-length features from the three-axis acceleration and angle signals;
step four: fuse and reduce the dimensionality of the features extracted in step three by kernel principal component analysis (KPCA), and input the dimension-reduced data into the classifier;
step five: optimize the initial weights and thresholds of the BP neural network with a genetic algorithm, and train to obtain the optimized prediction model;
step six: input the dimension-reduced, fused features of the collected gesture-action signals to be predicted into the genetic-algorithm-optimized back-propagation neural network (GA-BPNN) for recognition, and output the result.
2. The gesture recognition technology based on multi-modal information fusion as claimed in claim 1, wherein in the second step, the baseline drift of the electromyographic signal is removed by subtracting from the original signal the signal passed through a Butterworth low-pass filter with a cut-off frequency of 5 Hz; a Butterworth band-pass filter with a 20 Hz-460 Hz pass band removes low-frequency motion-artifact interference and high-frequency noise; a comb notch filter removes the 50 Hz power-frequency interference and its harmonics; and moving-average filtering removes the noise mixed into the acceleration and angle signals during acquisition.
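Assuming a 1 kHz sEMG sampling rate (not stated in the claim) and a single 50 Hz notch standing in for the comb trap, the filtering chain of claim 2 might be sketched as:

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 1000  # assumed sEMG sampling rate in Hz; not specified in the claim

def preprocess_emg(emg):
    """Denoise one sEMG channel along the lines of claim 2 (sketch)."""
    # 1) baseline drift: subtract the 5 Hz low-pass component
    b, a = butter(4, 5 / (FS / 2), btype='low')
    emg = emg - filtfilt(b, a, emg)
    # 2) 20-460 Hz Butterworth band-pass against motion artifacts / HF noise
    b, a = butter(4, [20 / (FS / 2), 460 / (FS / 2)], btype='band')
    emg = filtfilt(b, a, emg)
    # 3) 50 Hz power-line notch (a single notch stands in for the comb trap)
    b, a = iirnotch(50, Q=30, fs=FS)
    return filtfilt(b, a, emg)

def moving_average(x, w=5):
    """Sliding-mean smoothing for the acceleration/angle channels."""
    return np.convolve(x, np.ones(w) / w, mode='same')
```

A full comb trap would also notch the 50 Hz harmonics (100 Hz, 150 Hz, ...), e.g. with `scipy.signal.iircomb` or a cascade of `iirnotch` filters.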
3. The gesture recognition technology based on multi-modal information fusion as claimed in claim 1, wherein in the second step, the data is divided with overlapping sliding windows; a first threshold σ, equal to 1.3 times the mean energy of the multi-channel electromyographic signals in the relaxed state, and a second threshold μ, set to 6, are defined; the energy value computed in each window is compared with σ, and when it exceeds σ over more than μ consecutive windows the signal is judged to be an active segment; the active segments of the pose signals are divided according to the active-segment times of the electromyographic signals.
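A minimal sketch of this dual-threshold detection, assuming the per-window energies and the two thresholds are already computed (names are illustrative):

```python
def active_segments(energy, sigma, mu=6):
    """Dual-threshold active-segment detection (sketch).

    `energy` holds the mean multi-channel sEMG energy per sliding window,
    `sigma` the energy threshold (1.3x the resting mean) and `mu` the minimum
    number of consecutive super-threshold windows. Returns (start, end)
    window-index pairs, end exclusive.
    """
    segments, start = [], None
    for i, e in enumerate(energy):
        if e > sigma and start is None:
            start = i                     # run of super-threshold windows begins
        elif e <= sigma and start is not None:
            if i - start > mu:            # keep only runs longer than mu windows
                segments.append((start, i))
            start = None
    if start is not None and len(energy) - start > mu:
        segments.append((start, len(energy)))
    return segments
```

The returned window indices would then be mapped back to sample times and reused to cut the same time spans out of the pose signals.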
4. The gesture recognition technology based on multi-modal information fusion as claimed in claim 1, wherein in the fourth step, the kernel function in the kernel principal component analysis is a Gaussian radial basis kernel function, and the first n principal components whose cumulative contribution rate reaches 90% are selected as the dimension-reduced data.
5. The gesture recognition technology based on multi-modal information fusion as claimed in claim 1, wherein in the fifth step, the features obtained by processing the test data through steps one to four are input into a BP neural network, the absolute value of the prediction error is used as the fitness of the genetic algorithm, and the individual length is determined by the BP network structure: if the numbers of nodes in the input, hidden and output layers are l, m and n respectively, the individual length is (l + n + 1) × m + n; the optimal initial weights and thresholds obtained by the genetic algorithm are used as the initial weights and thresholds of the BP neural network, and the trained network model is used for prediction.
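The chromosome-length bookkeeping in this claim, together with the corresponding decoding of a flat chromosome into the BP network's parameters, can be sketched as (helper names are illustrative):

```python
import numpy as np

def individual_length(l, m, n):
    """Chromosome length for a BP network with l input, m hidden and n output
    nodes: l*m input-hidden weights + m hidden thresholds + m*n hidden-output
    weights + n output thresholds = (l + n + 1)*m + n."""
    return (l + n + 1) * m + n

def decode(individual, l, m, n):
    """Split a flat chromosome into the BP network's weights and thresholds."""
    ind = np.asarray(individual)
    i = 0
    w1 = ind[i:i + l * m].reshape(l, m); i += l * m   # input-hidden weights
    b1 = ind[i:i + m];                   i += m       # hidden thresholds
    w2 = ind[i:i + m * n].reshape(m, n); i += m * n   # hidden-output weights
    b2 = ind[i:i + n]                                 # output thresholds
    return w1, b1, w2, b2
```

Counting term by term: l·m + m + m·n + n = (l + n + 1)·m + n, matching the claim's formula.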
6. The gesture recognition technology based on multi-modal information fusion as claimed in claim 1, wherein in the sixth step, the signal to be predicted is processed through steps one to four, the features are input into the trained neural network, and the class with the highest probability in the output is taken as the recognized gesture class.
CN202110441452.3A 2021-04-23 2021-04-23 Gesture recognition technology based on multi-mode information fusion Withdrawn CN113111831A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110441452.3A CN113111831A (en) 2021-04-23 2021-04-23 Gesture recognition technology based on multi-mode information fusion


Publications (1)

Publication Number Publication Date
CN113111831A true CN113111831A (en) 2021-07-13

Family

ID=76719565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110441452.3A Withdrawn CN113111831A (en) 2021-04-23 2021-04-23 Gesture recognition technology based on multi-mode information fusion

Country Status (1)

Country Link
CN (1) CN113111831A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114098768A (en) * 2021-11-25 2022-03-01 哈尔滨工业大学 Cross-individual surface electromyographic signal gesture recognition method based on dynamic threshold and easy TL
CN114098768B (en) * 2021-11-25 2024-05-03 哈尔滨工业大学 Cross-individual surface electromyographic signal gesture recognition method based on dynamic threshold and EasyTL
CN114626412A (en) * 2022-02-28 2022-06-14 长沙融创智胜电子科技有限公司 Multi-class target identification method and system for unattended sensor system
CN114626412B (en) * 2022-02-28 2024-04-02 长沙融创智胜电子科技有限公司 Multi-class target identification method and system for unattended sensor system
CN115719514A (en) * 2022-11-23 2023-02-28 南京理工大学 Gesture recognition-oriented field self-adaptive method and system
CN115719514B (en) * 2022-11-23 2023-06-30 南京理工大学 Gesture recognition-oriented field self-adaption method and system
CN115778408A (en) * 2022-12-10 2023-03-14 福州大学 Step-by-step electromyographic signal activity segment detection method
CN116725556A (en) * 2023-07-12 2023-09-12 中国科学院苏州生物医学工程技术研究所 Motion intention recognition method and device based on surface electromyographic signals
CN117238037A (en) * 2023-11-13 2023-12-15 中国科学技术大学 Dynamic action recognition method, device, equipment and storage medium
CN117238037B (en) * 2023-11-13 2024-03-29 中国科学技术大学 Dynamic action recognition method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210713