CN112183375A - Gesture recognition method fusing electromyographic signals and visual images - Google Patents
- Publication number
- CN112183375A (application CN202011053444.3A)
- Authority
- CN
- China
- Legal status: Pending
Classifications
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24—Classification techniques
- G06F18/25—Fusion techniques
- G06N3/045—Combinations of networks
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06V40/113—Recognition of static hand signs
- G06F2218/02—Preprocessing (pattern recognition adapted for signal processing)
- G06F2218/08—Feature extraction (pattern recognition adapted for signal processing)
- G06F2218/12—Classification; Matching (pattern recognition adapted for signal processing)
Abstract
The invention discloses a gesture recognition method that fuses electromyographic signals and visual images, comprising the following steps: acquiring an electromyographic signal with a high-density electromyographic acquisition armband and a visual image with a camera; converting the electromyographic signal into an electromyographic image and fusing it with the visual image into a single image; and inputting the fused surface-electromyogram and visual image into an image classification module for recognition and classification. The invention does not require two different network structures to recognize and classify the surface electromyographic signals and the visual images: both are fed into the same network architecture simultaneously, so a single network performs recognition and classification, which effectively reduces network complexity and saves limited resources. The invention improves on the traditional approach in which each network handles a single input modality, and mitigates the respective limitations of surface electromyographic signals and visual images for gesture recognition.
Description
Technical Field
The invention relates to the field of gesture recognition, and in particular to a gesture recognition technique that fuses electromyographic signals and visual images.
Background
The invention arises from the problem of gesture recognition from electromyographic signals and visual images. In daily communication, gestures are a widely and frequently used mode of interaction, and sign language is an important and convenient way for people who are deaf or mute to communicate. Over the years, researchers have explored gesture recognition with growing intensity, and it has long been one of the hot spots in the field of human-computer interaction.
Many techniques implement gesture recognition; among them, vision-based gesture recognition is the most widely and conveniently researched and applied. This non-contact acquisition of gesture motion causes the user no physical discomfort or movement obstruction, is low in cost, and is favored in practical applications. However, image data acquired for vision-based gesture recognition carries many uncertainties and is susceptible to interference: different backgrounds, different lighting, and different angles of the acquisition device (camera) can all constrain recognition accuracy. Vision-based gesture recognition therefore performs well under ideal conditions, but its recognition effect in practical applications still leaves considerable room for improvement. Another technique is gesture recognition based on electromyographic signals. Its greatest advantage over vision-based recognition is immunity to external conditions such as background and lighting, which remedies the main weakness of the vision-based approach. However, electromyographic signals differ between individuals, and when a sparse-channel acquisition scheme is used, the placement of the collecting electrodes strongly affects the signal; these factors increase the difficulty of electromyography-based gesture recognition.
Considering the respective advantages and disadvantages of electromyographic signals and visual images, fusing the two can yield better system performance. At the same time, because their data forms differ, two separate networks are usually employed when they are fused, which inevitably increases computation and storage. To solve these problems, the invention provides a gesture recognition method that fuses an electromyographic signal and a visual image.
Disclosure of Invention
To solve the problems that visual-image gesture recognition is limited in application scenarios and that electromyographic signals differ greatly between individuals, and to reduce computation and storage, the invention provides a neural-network-based gesture recognition method using electromyographic signals and visual images. Electromyographic signals are collected with high-density non-invasive or invasive equipment and converted into electromyographic images while visual images are collected simultaneously; the two kinds of images are processed according to a selected image fusion method and fused into one image, and the fused image is input into an image classification module for recognition and classification.
A gesture recognition method fusing electromyographic signals and visual images comprises the following specific steps:
S1. Acquire an electromyographic signal with an electromyographic acquisition device and a visual image with a camera, and convert them respectively into an electromyographic image and a visual image of the same size suitable for subsequent fusion;
S2. Fuse the electromyographic image and the visual image into one image using an image fusion method;
S3. Input the fused image into an image recognition and classification module and perform classification and recognition to obtain the gesture recognition result.
The electromyographic signal acquisition comprises the steps of acquiring a high-density surface electromyographic signal by using a non-invasive device or acquiring a high-density electromyographic signal by using an invasive device;
the electromyographic signal acquisition equipment is high-density electromyographic signal acquisition armband equipment;
the image fusion method adopts a space domain fusion method or a transform domain fusion method;
the image recognition and classification module is a neural network-based classification and identification module.
The step S1 includes the following steps:
S11. Set the size of the fused image to be passed into the image classification module, determine the high-density electrode device and its placement according to that size, and collect the electromyographic signals with the device;
S12. Select a moment, collect the instantaneous value of the electromyographic signal at that moment, and arrange the instantaneous values into a two-dimensional grid matrix according to the positions of the electrode grid array;
S13. Convert the two-dimensional grid matrix obtained in step S12 into a grayscale image by linear transformation; the resulting grayscale image is the electromyographic image;
S14. Collect a visual image at the same moment as the electromyographic signal, shrink it by equidistant sampling or enlarge it by interpolation, and convert it into a grayscale image of the same size as the electromyographic image.
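Steps S12 and S13 can be sketched as follows. This is a minimal NumPy sketch, not the patent's implementation: the 8×16 grid size and the min-max linear mapping to 8-bit intensities are illustrative assumptions.

```python
import numpy as np

def emg_to_image(instant_values, h, w):
    """Arrange instantaneous EMG amplitudes into an h x w grid (S12) and
    linearly map them to 8-bit grayscale intensities (S13)."""
    grid = np.asarray(instant_values, dtype=np.float64).reshape(h, w)
    lo, hi = grid.min(), grid.max()
    if hi == lo:  # flat signal: map everything to black
        return np.zeros((h, w), dtype=np.uint8)
    # Linear transformation of amplitude to color intensity in [0, 255].
    gray = (grid - lo) / (hi - lo) * 255.0
    return np.rint(gray).astype(np.uint8)

# Example: 8 x 16 = 128 electrodes sampled at one instant t.
samples = np.random.randn(128)
emg_img = emg_to_image(samples, 8, 16)
print(emg_img.shape)  # (8, 16)
```

Any monotone linear map would satisfy S13; min-max scaling is used here only because it needs no assumed amplitude range.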
The step S2 includes the following steps:
Stack the two equally sized two-dimensional images obtained in step S1 along the channel dimension to obtain a three-dimensional image, i.e. the fused image, which serves as the input of the subsequent neural network.
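The stacking step above can be sketched in a few lines; stacking along the last (channel) axis is one reasonable reading of "superpose according to the dimension direction", and the 8×16 size is an illustrative assumption.

```python
import numpy as np

# Spatial-domain fusion by stacking (step S2): two equally sized
# grayscale images become one two-channel three-dimensional array.
emg_img = np.zeros((8, 16), dtype=np.uint8)   # electromyographic image
vis_img = np.ones((8, 16), dtype=np.uint8)    # resized visual image
fused = np.stack([emg_img, vis_img], axis=-1)
print(fused.shape)  # (8, 16, 2)
```

The fused array can then be fed to a convolutional network exactly like a two-channel image.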
The step S3 of inputting the fused image into an image recognition and classification module for image classification and recognition includes the following steps:
The image recognition and classification module comprises a feature extraction module and a classification module. The fused image first enters the feature extraction module for feature extraction, and the output of the feature extraction module then serves as the input of the classification module.
S31. Input the data set into the feature extraction module to extract feature vectors;
S32. Input the feature vectors extracted in step S31 into the classification module for recognition and classification.
When a deep convolutional neural network is adopted, the network extracts features automatically. The forward propagation of a convolutional layer is computed as

$$x^{(l)}_{i,j} = f\Big(\sum_{u}\sum_{v} w^{(l)}_{u,v}\, x^{(l-1)}_{i+u,\,j+v} + b^{(l)}\Big)$$

where $x^{(l)}_{i,j}$ is the output of the $l$-th layer of the gesture recognition convolutional neural network; $i$ and $j$ are the row and column indices of the convolutional layer's output image; $f$ is the ReLU activation function; $S$ is the size of the $l$-th layer's input image; $x^{(l-1)}$ is the input of the $l$-th layer; $w^{(l)}$ is the convolution kernel of the $l$-th convolutional layer; and $b^{(l)}$ is the bias term of the $l$-th layer.
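The convolutional forward pass can be illustrated with a direct single-channel sketch. The valid-convolution boundary handling, the 3×3 averaging kernel, and the 4×4 input are assumptions for illustration; the patent does not fix these choices.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_forward(x_prev, w, b):
    """Single-channel valid convolution plus ReLU, matching
    x[i,j] = f(sum_{u,v} w[u,v] * x_prev[i+u, j+v] + b)."""
    k = w.shape[0]
    out_h = x_prev.shape[0] - k + 1
    out_w = x_prev.shape[1] - k + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(w * x_prev[i:i + k, j:j + k]) + b
    return relu(out)

x = np.arange(16.0).reshape(4, 4)
w = np.ones((3, 3)) / 9.0   # 3x3 averaging kernel (illustrative)
y = conv_forward(x, w, 0.0)
print(y.shape)  # (2, 2)
```

Each output pixel here is the mean of one 3×3 window, so `y` equals `[[5, 6], [9, 10]]` for this input.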
In the image recognition and classification module, the loss function used is the multi-class cross-entropy

$$L_{\log}(Y,P) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{C} Y_{i,k}\,\log P_{i,k}$$

where $N$ is the total number of samples; $C$ is the number of gesture categories; $Y_{i,k}$ is the $k$-th true gesture label value of the $i$-th sample; and $P_{i,k}$ is the predicted probability of the $k$-th gesture label for the $i$-th sample.
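The loss can be computed directly from its definition. The two-sample, three-class data below are invented purely for illustration.

```python
import numpy as np

def log_loss(Y, P):
    """Multi-class cross-entropy: -(1/N) * sum_i sum_k Y[i,k] * log(P[i,k])."""
    N = Y.shape[0]
    return -np.sum(Y * np.log(P)) / N

# Two samples, three gesture classes; Y holds one-hot true labels,
# P holds predicted class probabilities (rows sum to 1).
Y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1]])
loss = log_loss(Y, P)
```

With one-hot labels only the log-probability of each true class contributes, so here the loss reduces to `-(log 0.8 + log 0.7) / 2`.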
Assuming the number of gesture categories is $C$, the output layer classifies the $C \times 1$ input vector with the softmax function, whose output is

$$S_j = \frac{e^{a_j}}{\sum_{k=1}^{C} e^{a_k}}$$

where $S_j$ is the $j$-th value of the softmax output vector $S$; $a_j$ is the $j$-th value in the $C \times 1$ vector; and $a_k$ is the $k$-th value in the $C \times 1$ vector.
The image recognition and classification module trains the model with the back-propagation algorithm according to the loss function $L_{\log}(Y,P)$.
At prediction time, the image recognition and classification module takes the index of the largest value in the $C \times 1$ vector that a sample produces at the softmax layer as the sample's predicted gesture label.
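The softmax and argmax prediction steps can be sketched together; the max-subtraction for numerical stability and the three-class logit values are assumptions added for illustration.

```python
import numpy as np

def softmax(a):
    """S_j = exp(a_j) / sum_k exp(a_k), computed stably."""
    e = np.exp(a - np.max(a))   # shift invariance keeps exp() in range
    return e / e.sum()

# C x 1 output vector of the classifier for one sample (illustrative).
logits = np.array([1.0, 3.0, 0.5])
probs = softmax(logits)
predicted_label = int(np.argmax(probs))  # index of the largest value
print(predicted_label)  # 1
```

Because softmax is monotone, the argmax of `probs` equals the argmax of `logits`; the probabilities are only needed for the training loss.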
Acquiring an electromyographic signal with the electromyographic acquisition device, acquiring a visual image with the camera, and converting them respectively into an electromyographic image and a visual image of the same size for subsequent fusion specifically comprises:
According to the set fused-image resolution H × W, determine that the number of high-density electrodes for acquiring the high-density electromyographic signal is H × W, and place the electrodes in a uniform H × W grid, so that the signal amplitude acquired by each electrode is later converted into the pixel value of the electromyographic image at that position. Acquire the electromyographic signal with the high-density electrodes, select a moment t, and collect the instantaneous value of the electromyographic signal at that moment. Arrange the instantaneous values at moment t into a two-dimensional H × W grid-point matrix according to the grid of the electrode array, each grid point taking the instantaneous amplitude acquired by the corresponding electrode at moment t. Convert the resulting matrix into a grayscale image by linear transformation, i.e. linearly map the instantaneous amplitude of the electromyographic signal to color intensity; the resulting grayscale image is the electromyographic image.
The invention has the following beneficial effects:
(1) Greatly reduces the number of parameters, saves limited resources, and improves efficiency. After the electromyographic signal and the visual image are processed, the fused image is input into one image recognition and classification module: a single network replaces the two different networks that would otherwise recognize and classify the electromyographic signal and the visual image separately. The parameter count drops to that of the image recognition and classification module alone, which to some degree weakens overfitting caused by excessive parameters; the large reduction in parameters also saves computing resources and improves computational efficiency.
(2) Achieves high accuracy on both dynamic and static gestures. The method fuses electromyographic and visual information and combines their advantages, so both dynamic and static gestures are recognized with high accuracy.
(3) Combines the advantages of electromyographic signals and visual images, giving strong adaptability to different external environments.
Drawings
Fig. 1 is a flowchart illustrating a gesture recognition method based on fusion of an electromyographic signal and a visual image according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a high-density electromyographic signal acquisition armband device disclosed in the embodiment of the present invention.
Fig. 3 is a flowchart illustrating the electromyographic signal branch processing and conversion into an electromyographic image according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a visual image branching process according to an embodiment of the present invention.
Fig. 5 is a schematic diagram illustrating a fusion mode of an electromyogram and a visual image according to an embodiment of the present invention.
Detailed Description
The embodiments of the invention are described below with reference to specific examples; those skilled in the art will easily understand other advantages and effects of the invention from the disclosure of this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification may be modified in various respects without departing from the spirit and scope of the invention. Note that, absent conflict, the features of the following embodiments and examples may be combined with each other.
The invention provides a gesture recognition method fusing an electromyographic signal and a visual image; the processing flow is shown in Fig. 1. This embodiment selects only one of several possible methods of transforming the electromyographic image and fusing the images for detailed explanation. The high-density electromyographic acquisition armband used is shown in Fig. 2, where 101 is the high-density electrode array and 102 is the fixing band.
Embodiment 1. gesture recognition method fusing electromyographic signals and visual images
A gesture recognition method fusing electromyographic signals and visual images comprises the following specific steps:
S1. Acquire an electromyographic signal with an electromyographic acquisition device and a visual image with a camera, and convert them respectively into an electromyographic image and a visual image of the same size suitable for subsequent fusion;
S2. Fuse the electromyographic image and the visual image into one image using an image fusion method;
S3. Input the fused image into an image recognition and classification module and perform classification and recognition to obtain the gesture recognition result.
The electromyographic signal acquisition comprises the steps of acquiring a high-density surface electromyographic signal by using a non-invasive device or acquiring a high-density electromyographic signal by using an invasive device;
the electromyographic signal acquisition equipment is high-density electromyographic signal acquisition armband equipment;
the image fusion method adopts a space domain fusion method or a transform domain fusion method;
the image recognition and classification module is a neural network-based classification and identification module.
The step S1 includes the following steps:
S11. Set the size of the fused image to be passed into the image classification module, determine the high-density electrode device and its placement according to that size, and collect the electromyographic signals with the device;
S12. Select a moment, collect the instantaneous value of the electromyographic signal at that moment, and arrange the instantaneous values into a two-dimensional grid matrix according to the positions of the electrode grid array;
S13. Convert the two-dimensional grid matrix obtained in step S12 into a grayscale image by linear transformation; the resulting grayscale image is the electromyographic image;
S14. Collect a visual image at the same moment as the electromyographic signal, shrink it by equidistant sampling or enlarge it by interpolation, and convert it into a grayscale image of the same size as the electromyographic image.
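The shrinking branch of S14 can be sketched with plain index selection; the 8×8 input and 4×4 target size are illustrative assumptions, and only the equidistant-sampling (shrinking) case is shown, since enlargement by interpolation would use a separate routine.

```python
import numpy as np

def shrink_by_sampling(img, h, w):
    """Shrink a grayscale image to h x w by equidistant sampling (S14):
    pick evenly spaced rows and columns of the source image."""
    rows = np.round(np.linspace(0, img.shape[0] - 1, h)).astype(int)
    cols = np.round(np.linspace(0, img.shape[1] - 1, w)).astype(int)
    return img[np.ix_(rows, cols)]

vis = np.arange(64, dtype=np.uint8).reshape(8, 8)
small = shrink_by_sampling(vis, 4, 4)
print(small.shape)  # (4, 4)
```

The sampled image keeps the source's corners, so it stays aligned with the electromyographic image grid of the same size.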
The step S2 includes the following steps:
Stack the two equally sized two-dimensional images obtained in step S1 along the channel dimension to obtain a three-dimensional image, i.e. the fused image.
The step S3 of inputting the fused image into an image recognition and classification module for image classification and recognition includes the following steps:
The image recognition and classification module comprises a feature extraction module and a classification module. The fused image first enters the feature extraction module for feature extraction, and the output of the feature extraction module then serves as the input of the classification module.
S31. Input the data set into the feature extraction module to extract feature vectors;
S32. Input the feature vectors extracted in step S31 into the classification module for recognition and classification.
If the deep convolutional neural network is adopted, the features can be automatically extracted by the network, and the step of manually extracting the features is omitted.
The related calculation implementation process of the deep convolutional neural network is as follows:
The mapping of convolutional-layer forward propagation is expressed as

$$x^{(l)}_{i,j} = f\Big(\sum_{u}\sum_{v} w^{(l)}_{u,v}\, x^{(l-1)}_{i+u,\,j+v} + b^{(l)}\Big)$$

where $x^{(l)}_{i,j}$ is the output of the $l$-th layer of the gesture recognition convolutional neural network; $i$ and $j$ are the row and column indices of the convolutional layer's output image; $f$ is the ReLU activation function; $S$ is the size of the $l$-th layer's input image; $x^{(l-1)}$ is the input of the $l$-th layer; $w^{(l)}$ is the convolution kernel of the $l$-th convolutional layer; and $b^{(l)}$ is the bias term of the $l$-th layer.
In the image recognition and classification module, the loss function used is the multi-class cross-entropy

$$L_{\log}(Y,P) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{C} Y_{i,k}\,\log P_{i,k}$$

where $N$ is the total number of samples; $C$ is the number of gesture categories; $Y_{i,k}$ is the $k$-th true gesture label value of the $i$-th sample; and $P_{i,k}$ is the predicted probability of the $k$-th gesture label for the $i$-th sample.
Assuming the number of gesture categories is $C$, the output layer classifies the $C \times 1$ input vector with the softmax function, whose output is

$$S_j = \frac{e^{a_j}}{\sum_{k=1}^{C} e^{a_k}}$$

where $S_j$ is the $j$-th value of the softmax output vector $S$; $a_j$ is the $j$-th value in the $C \times 1$ vector; and $a_k$ is the $k$-th value in the $C \times 1$ vector.
The image recognition and classification module trains the model with the back-propagation algorithm according to the loss function $L_{\log}(Y,P)$.
At prediction time, the image recognition and classification module takes the index of the largest value in the $C \times 1$ vector that a sample produces at the softmax layer as the sample's predicted gesture label.
In step S1, the invention obtains two images of the same size from the collected electromyographic signal and visual image; the flow for obtaining the electromyographic image from the electromyographic signal is shown in Fig. 3, and the visual image processing is shown in Fig. 4.
In step S2, the invention fuses the two equally sized images obtained in step S1 into the image passed to the subsequent image classification module; the processing is shown in Fig. 5.
Through steps S1 and S2 the fusion of the electromyographic signal and the visual image is completed, and the resulting electromyographic-visual image contains the characteristics of both the electromyographic signal and the visual image.
The method comprises two key steps: converting the preprocessed high-density electromyographic signal into an electromyographic image, and fusing the electromyographic image with the visual image. These two key steps are described in detail below.
The high-density electromyographic signal is converted into an electromyographic image after filtering and normalization preprocessing; Fig. 3 shows the flow of this preprocessing and conversion. To mitigate the influence of electrode-placement differences on the final recognition result, this example collects the electromyographic signal with high-density electrodes, and the high-density electrode device and its placement are determined by the set image size. The specific steps are as follows:
According to the set fused-image resolution H × W, determine that the number of high-density electrodes for acquiring the high-density electromyographic signal is H × W, and place the electrodes in a uniform H × W grid, so that the signal amplitude acquired by each electrode is later converted into the pixel value of the electromyographic image at that position. Acquire the electromyographic signal with the high-density electrodes, select a moment t, and collect the instantaneous value of the electromyographic signal at that moment. Arrange the instantaneous values at moment t into a two-dimensional H × W grid-point matrix according to the grid of the electrode array, each grid point taking the instantaneous amplitude acquired by the corresponding electrode at moment t. Convert the resulting matrix into a grayscale image by linear transformation, i.e. linearly map the instantaneous amplitude of the electromyographic signal to color intensity; the resulting grayscale image is the electromyographic image.
for the fusion method of the electromyogram and the visual image, fig. 4 and 5 show a flow chart for converting the visual image into the gray scale and a schematic diagram of a fusion mode of the electromyogram and the visual image with the same size. Collecting a visual image at the same time of t moment as the electromyographic signal while collecting the electromyographic signal, and converting the visual image into a gray scale image with H multiplied by W size; and superposing the two obtained two-dimensional images with the same size according to the dimension direction to obtain a three-dimensional image, namely the input of the subsequent neural network.
Implementations include but are not limited to the methods described above. The foregoing embodiments merely illustrate the principles and utility of the invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention; accordingly, all equivalent modifications or changes made without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the invention.
Claims (9)
1. A gesture recognition method fusing an electromyographic signal and a visual image is characterized by comprising the following specific steps:
S1, acquiring an electromyographic signal by using an electromyographic signal acquisition device, acquiring a visual image by using a camera, and converting the electromyographic signal and the visual image respectively into an electromyographic image and a visual image of the same size that can be used for subsequent fusion;
S2, fusing the electromyographic image and the visual image by an image fusion method to form a fused image;
and S3, inputting the fused image into an image recognition and classification module and performing image classification and recognition to obtain a gesture recognition result.
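The three claimed steps can be sketched end to end; the following is a toy illustration only, in which the classifier, the signal values, and the threshold are hypothetical placeholders rather than anything specified by the claims:

```python
import numpy as np

def recognize_gesture(emg_grid, visual_gray, classifier):
    """S1: normalize the EMG grid into a grayscale image;
    S2: fuse it with the same-size visual grayscale image;
    S3: hand the fused image to a classifier."""
    emg = np.asarray(emg_grid, dtype=float)
    rng = emg.max() - emg.min()
    emg_img = (emg - emg.min()) / rng * 255 if rng else np.zeros_like(emg)
    fused = np.stack([emg_img, np.asarray(visual_gray, float)], axis=-1)
    return classifier(fused)

# toy stand-in classifier: thresholds the mean fused intensity
label = recognize_gesture([[0, 255], [0, 255]],
                          np.full((2, 2), 10.0),
                          lambda x: int(x.mean() >= 128))
```

In the patent the classifier is a deep convolutional neural network (claim 8); the lambda above is only a stand-in to keep the sketch runnable.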
2. The method for gesture recognition based on fusion of electromyographic signals and visual images according to claim 1, wherein the acquiring of the electromyographic signals comprises acquiring high-density surface electromyographic signals by using a non-invasive device or acquiring high-density electromyographic signals by using an invasive device.
3. The method for recognizing the gesture based on the fusion of the electromyographic signal and the visual image according to claim 1, wherein the electromyographic signal acquisition device is a high-density electromyographic signal acquisition armband device.
4. A gesture recognition method according to claim 1, wherein the image fusion method is a spatial domain fusion method or a transform domain fusion method.
5. The method for recognizing the gesture based on the fusion of the electromyographic signal and the visual image according to claim 1, wherein the image recognition and classification module is a neural network-based classification and recognition module.
6. The method for recognizing a gesture based on a fusion of an electromyographic signal and a visual image according to claim 1, wherein the step S1 comprises the following steps:
s11, setting the size of the image transmitted into the image classification module after fusion, determining the high-density electrode device and the placement position thereof according to the set size of the image, and collecting the electromyographic signals by using the device;
S12, selecting a moment, acquiring the instantaneous value of the electromyographic signal at that moment, and arranging the instantaneous values into a two-dimensional grid matrix according to the positions of the electrode grid array;
S13, converting the two-dimensional grid matrix obtained in step S12 into a grayscale image by linear transformation, the obtained grayscale image being the electromyographic image;
and S14, acquiring a visual image at the same moment as the electromyographic signal, reducing it by equidistant sampling or enlarging it by interpolation, and converting it into a grayscale image of the same size as the electromyographic image.
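Step S14's size normalization can be illustrated as follows. Only reduction by equidistant sampling is shown (enlargement would use interpolation), and the luminance weights in `to_gray` are a common convention assumed here, not fixed by the claim:

```python
import numpy as np

def to_gray(rgb):
    """Grayscale conversion of an H x W x 3 image using common
    luminance weights (an assumption; the claim does not fix them)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def shrink_equidistant(gray, out_h, out_w):
    """Reduce a 2-D image by picking equally spaced rows and columns;
    enlargement would instead use interpolation (e.g. bilinear)."""
    h, w = gray.shape
    rows = np.linspace(0, h - 1, out_h).round().astype(int)
    cols = np.linspace(0, w - 1, out_w).round().astype(int)
    return gray[np.ix_(rows, cols)]

small = shrink_equidistant(np.arange(64, dtype=float).reshape(8, 8), 4, 4)
white = to_gray(np.full((2, 2, 3), 255.0))   # uniform white stays 255
```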
7. The method for recognizing a gesture based on a fusion of an electromyographic signal and a visual image according to claim 1, wherein the step S2 comprises the following steps:
superposing the two same-size two-dimensional images obtained in step S1 along the channel dimension to obtain a three-dimensional image, namely the fused image, which serves as the input of the subsequent neural network.
8. The method for gesture recognition based on fusion of electromyographic signals and visual images according to claim 1, wherein the step S3 is performed by inputting the fused images into an image recognition and classification module for image classification and recognition, and specifically comprises the following steps:
the image recognition and classification module comprises a feature extraction module and a classification module; the fused image first enters the feature extraction module for feature extraction, and the output of the feature extraction module then serves as the input of the classification module;
S31, firstly, inputting the data set into the feature extraction module to extract feature vectors;
S32, inputting the feature vectors extracted in step S31 into the classification module for recognition and classification;
the method adopts a deep convolutional neural network so that features are extracted autonomously by the network; in the computation of the deep convolutional neural network, the forward-propagation mapping of a convolutional layer is expressed as:

y^{(l)}_{i,j} = f( Σ_{m=0}^{S−1} Σ_{n=0}^{S−1} w^{(l)}_{m,n} · x^{(l)}_{i+m, j+n} + b^{(l)} )

wherein y^{(l)}_{i,j} is the output of the l-th layer of the gesture-recognition convolutional neural network; i and j are the row and column indices of the convolutional layer's output image; f is the ReLU activation function; S is the size of the input image of the l-th layer; x^{(l)} is the input of the l-th layer; w^{(l)} is the convolution kernel of the l-th convolutional layer; b^{(l)} is the bias term of the l-th layer;
in the image recognition and classification module, the loss function used is:

L_log(Y, P) = −(1/N) Σ_{i=1}^{N} Σ_{k=1}^{C} y_{i,k} · log p_{i,k}

wherein N is the total number of samples; C is the number of gesture categories; y_{i,k} is the k-th true gesture-label value of the i-th sample; p_{i,k} is the predicted probability of the k-th gesture label for the i-th sample;
assuming the number of gesture categories is C, the output layer applies the softmax function to the C × 1 input vector and outputs:

S_j = e^{a_j} / Σ_{k=1}^{C} e^{a_k}

wherein S_j is the j-th value of the softmax output vector S; a_j is the j-th value of the C × 1 vector; a_k is the k-th value of the C × 1 vector;
the image recognition and classification module trains the model by the back-propagation algorithm according to the loss function L_log(Y, P);
at prediction time, the image recognition and classification module takes the index of the largest value in the C × 1 vector output by the softmax layer for a sample as the predicted gesture label of that sample.
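The softmax output, the log loss, and the argmax prediction rule of claim 8 can be sketched together. This is a minimal numpy illustration under the standard definitions of those formulas, not an implementation from the patent:

```python
import numpy as np

def softmax(a):
    """Softmax over a C x 1 score vector (numerically stabilized)."""
    e = np.exp(a - a.max())
    return e / e.sum()

def log_loss(Y, P, eps=1e-12):
    """Mean cross-entropy over N samples of one-hot labels Y and
    predicted probabilities P (both N x C)."""
    P = np.clip(P, eps, 1.0)
    return -np.mean(np.sum(Y * np.log(P), axis=1))

scores = np.array([1.0, 3.0, 0.5])           # C = 3 gesture classes
probs = softmax(scores)
pred_label = int(np.argmax(probs))            # index of the largest value
loss = log_loss(np.eye(3)[[1]], probs.reshape(1, 3))
```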
9. The method for gesture recognition based on fusion of electromyographic signals and visual images according to claim 1, wherein the electromyographic signal is acquired by an electromyographic signal acquisition device, the visual image is acquired by a camera, and the electromyographic signal and the visual image are converted respectively into an electromyographic image and a visual image of the same size that can be used for subsequent fusion, specifically comprising:
determining, according to the set fused-image resolution H × W, that the number of high-density electrodes used to acquire the high-density electromyographic signal is H × W, and placing the electrodes in a uniform grid of H electrodes in each row and W electrodes in each column, the signal amplitude acquired by each electrode being subsequently converted into the pixel value of the electromyographic image at that position; acquiring the electromyographic signal with the high-density electrodes, selecting a moment t, and acquiring the instantaneous value of the electromyographic signal at that moment; arranging the instantaneous values at moment t into a two-dimensional H × W grid-point matrix according to the grid of the electrode array, each grid point in the matrix taking the instantaneous amplitude acquired by the corresponding electrode at moment t; and converting the obtained two-dimensional grid-point matrix into a grayscale image by linear transformation, i.e., linearly mapping the instantaneous signal amplitude to pixel intensity, the obtained grayscale image being the electromyographic image.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011053444.3A CN112183375A (en) | 2020-09-29 | 2020-09-29 | Gesture recognition method fusing electromyographic signals and visual images |
CN202110773623.2A CN113269159A (en) | 2020-09-29 | 2021-07-08 | Gesture recognition method fusing electromyographic signals and visual images |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011053444.3A CN112183375A (en) | 2020-09-29 | 2020-09-29 | Gesture recognition method fusing electromyographic signals and visual images |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112183375A true CN112183375A (en) | 2021-01-05 |
Family
ID=73946697
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011053444.3A Pending CN112183375A (en) | 2020-09-29 | 2020-09-29 | Gesture recognition method fusing electromyographic signals and visual images |
CN202110773623.2A Pending CN113269159A (en) | 2020-09-29 | 2021-07-08 | Gesture recognition method fusing electromyographic signals and visual images |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110773623.2A Pending CN113269159A (en) | 2020-09-29 | 2021-07-08 | Gesture recognition method fusing electromyographic signals and visual images |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN112183375A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116449967A (en) * | 2023-06-20 | 2023-07-18 | 浙江强脑科技有限公司 | Bionic hand teaching aid, control method thereof and main control equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9486332B2 (en) * | 2011-04-15 | 2016-11-08 | The Johns Hopkins University | Multi-modal neural interfacing for prosthetic devices |
CN109498362A (en) * | 2018-09-10 | 2019-03-22 | 南京航空航天大学 | A kind of hemiplegic patient's hand movement function device for healing and training and model training method |
CN110298286B (en) * | 2019-06-24 | 2021-04-30 | 中国科学院深圳先进技术研究院 | Virtual reality rehabilitation training method and system based on surface myoelectricity and depth image |
CN111475024B (en) * | 2019-12-25 | 2024-04-16 | 山东中科先进技术有限公司 | Human motion capturing system and method |
CN111476161A (en) * | 2020-04-07 | 2020-07-31 | 金陵科技学院 | Somatosensory dynamic gesture recognition method fusing image and physiological signal dual channels |
Also Published As
Publication number | Publication date |
---|---|
CN113269159A (en) | 2021-08-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111339903B (en) | Multi-person human body posture estimation method | |
CN109344701B (en) | Kinect-based dynamic gesture recognition method | |
CN111401384B (en) | Transformer equipment defect image matching method | |
CN112949565A (en) | Single-sample partially-shielded face recognition method and system based on attention mechanism | |
CN109146831A (en) | Remote sensing image fusion method and system based on double branch deep learning networks | |
CN111814661A (en) | Human behavior identification method based on residual error-recurrent neural network | |
CN112184734B (en) | Animal long-time gesture recognition system based on infrared image and wearable optical fiber | |
CN112163508A (en) | Character recognition method and system based on real scene and OCR terminal | |
CN112244878A (en) | Method for identifying key frequency band image sequence by using parallel multi-module CNN and LSTM | |
CN110929685A (en) | Pedestrian detection network structure based on mixed feature pyramid and mixed expansion convolution | |
CN113010013A (en) | Wasserstein distance-based motor imagery electroencephalogram migration learning method | |
CN111209873A (en) | High-precision face key point positioning method and system based on deep learning | |
Bose et al. | In-situ recognition of hand gesture via Enhanced Xception based single-stage deep convolutional neural network | |
CN115471757A (en) | Hyperspectral image classification method based on convolutional neural network and attention mechanism | |
Yang et al. | DGLT-Fusion: A decoupled global–local infrared and visible image fusion transformer | |
CN110135435B (en) | Saliency detection method and device based on breadth learning system | |
Bachay et al. | Hybrid Deep Learning Model Based on Autoencoder and CNN for Palmprint Authentication. | |
CN113792807B (en) | Skin disease classification model training method, system, medium and electronic equipment | |
CN112884062B (en) | Motor imagery classification method and system based on CNN classification model and generated countermeasure network | |
CN113269159A (en) | Gesture recognition method fusing electromyographic signals and visual images | |
CN114120359A (en) | Method for measuring body size of group-fed pigs based on stacked hourglass network | |
CN113627391A (en) | Cross-mode electroencephalogram signal identification method considering individual difference | |
CN116580289A (en) | Fine granularity image recognition method based on attention | |
CN116469172A (en) | Bone behavior recognition video frame extraction method and system under multiple time scales | |
CN113963427B (en) | Method and system for rapid in-vivo detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | | Application publication date: 20210105