CN111562842B - Virtual keyboard design method based on electromyographic signals - Google Patents

Virtual keyboard design method based on electromyographic signals

Info

Publication number
CN111562842B
CN111562842B (application number CN202010352231.4A)
Authority
CN
China
Prior art keywords
gesture
time
signals
sequence
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010352231.4A
Other languages
Chinese (zh)
Other versions
CN111562842A (en)
Inventor
李辉勇
牛建伟
符宗恺
刘雪峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Publication of CN111562842A publication Critical patent/CN111562842A/en
Application granted granted Critical
Publication of CN111562842B publication Critical patent/CN111562842B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/015Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/011Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02Preprocessing
    • G06F2218/04Denoising
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Dermatology (AREA)
  • Neurology (AREA)
  • Neurosurgery (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a virtual keyboard design method based on electromyographic signals and relates to the technical fields of human-computer interaction and intelligent computing. The method comprises the following steps: collecting training samples of the electromyographic signals generated when a user taps a keyboard, each sample labeled with a gesture action sequence; denoising the electromyographic signals and converting them into time-frequency graphs; training the constructed gesture recognition model with the time-frequency graphs of the training samples; recognizing the electromyographic signals collected in real time with the trained gesture recognition model and obtaining the gesture action sequence with CTC (Connectionist Temporal Classification); and converting the recognized gesture action sequence into key codes. The designed keyboard layout is a T9 keyboard: wearing the MYO armband, the user can imagine any plane as a keyboard with three rows and three columns and perform tapping gestures on that plane to input characters. Experiments show that text input with a virtual keyboard realized by this method can reach 15.7 characters per minute.

Description

Virtual keyboard design method based on electromyographic signals
Technical Field
The invention discloses a virtual keyboard design method based on electromyographic signals, and belongs to the technical field of human-computer interaction and intelligent computing.
Background
The traditional physical keyboard is the most common human-computer interaction device for text input, but a physical keyboard is usually installed in a fixed place and is too large to carry around easily. In the field of mobile human-computer interaction, devices should be small, portable, and capable of computing and communicating, which motivated the concept of the virtual keyboard: a means for a user to input text and commands to a computer or similar device through different devices and methods. Today the most common virtual keyboards are those of smartphones and tablets, which are drawn on the device screen so that the user can input text by touching the screen area corresponding to each key. However, such virtual keyboards are typically small, making it easy for the user to touch the wrong area, and they occupy additional display area on the device.
Electromyographic signals are the superposition in time and space of the action potentials of motor units in many muscle fibers, and can reflect neuromuscular activity to a certain extent. At present, applications based on electromyographic signals fall mainly into two fields: the medical field, for diagnosing diseases and helping disabled people control prostheses, and muscle-computer interaction (MCI) based on electromyographic signals. Previous studies have demonstrated the feasibility of using electromyographic signals for human-computer interaction, and many researchers have proposed that gestures recognized from electromyographic signals can be used to communicate with computers. In earlier research on character input based on electromyographic signals, a recognition algorithm converts the electromyographic signals into the words of the corresponding sign language. However, this approach requires the user to be familiar with sign-language gestures in advance, and it cannot give the user the experience of using a real keyboard.
Disclosure of Invention
Considering that people are already familiar with input habits under a keyboard layout, the invention provides a virtual keyboard design method based on electromyographic signals.
In the virtual keyboard design method based on electromyographic signals, the designed keyboard layout is a T9 keyboard, and the whole keyboard is divided into 9 areas. The invention defines 9 groups of actions, each group corresponding to tapping one area of the keyboard. By performing finger-tap gestures, the user can treat any surface as a keyboard for text and command input. The method comprises the following implementation steps:
Step 1: collecting the electromyographic signals generated when the user performs keyboard-tapping gestures and transmitting them to an intelligent terminal device.
When labeled data are collected and during real-time detection, the user wears the MYO armband on the right forearm near the elbow; the armband establishes a wireless Bluetooth connection with the intelligent terminal device and transmits the collected electromyographic signals to it over Bluetooth.
During training and test data acquisition, the collector performs gesture actions by following a gesture action sequence shown on a screen. Each collected training sample is a segment of electromyographic signal, and its label is the key code sequence corresponding to an English word, i.e., a gesture action sequence. Preferably, the length of each training sample is set to a preset fixed length; when the collected electromyographic signal is shorter than this length, zeros are appended at the end of the signal.
Step 2: the intelligent terminal device preprocesses the electromyographic signals: the collected signals are denoised, and the time-domain signals are converted into a time-frequency graph.
Step 3: constructing the gesture recognition model. The input of the gesture recognition model is the time-frequency graph of an electromyographic signal; the model first extracts a feature map from the time-frequency graph through CNN (convolutional neural network) layers, then converts the feature map into a feature sequence that is fed into an LSTM network, which learns the temporal relationships within the signal and outputs a time probability graph. The hidden-layer size of the LSTM network is set to the number of gesture action categories. The time probability graph is a two-dimensional matrix in which each row represents a gesture action category, each column represents a time point, and each element represents the probability of that gesture action category at the corresponding time point.
Step 4: training the gesture recognition model. The training samples are preprocessed as in step 2 to obtain the time-frequency graphs of the electromyographic signals, which are input into the gesture recognition model to output time probability graphs; the time probability graph is decoded with CTC (Connectionist Temporal Classification) to generate a gesture action sequence of indefinite length. Letting l denote a gesture action sequence and x a time probability graph, p(l|x) denotes the probability of outputting l given input x; during training, the output time probability graph is adjusted by updating the network weights of the gesture recognition model so that p(l|x) is maximized.
Step 5: deploying the trained gesture recognition model on the intelligent terminal device. The electromyographic signals collected in real time are preprocessed as in step 2 and input into the trained gesture recognition model for recognition, and a gesture action sequence is obtained from the output time probability graph using CTC; the recognition result of the gesture recognition model is combined with a language model to determine the final gesture action sequence.
Step 6: converting the gesture action sequence into key codes. Each gesture corresponds to one key code, and since the invention adopts a T9 keyboard, each key code corresponds to three or four characters. Words are first extracted from a text corpus and a dictionary tree (trie) is built over all extracted words; during conversion, the trie is traversed with a depth-first search to generate all character sequences that correspond to the key code sequence and exist in the trie, and the user selects the character sequence to be input, completing character input.
Compared with the prior art, the invention has the advantages and positive effects that:
(1) The electromyography-based virtual keyboard of the invention uses the MYO armband, a wearable device integrating electromyographic sensors and Bluetooth, as the data acquisition device. It collects 8-channel electromyographic signals and interacts with a mobile phone or computer over Bluetooth, so it is portable and can be worn anytime and anywhere; the user can use the designed virtual keyboard to input characters much like a real physical keyboard, and the rich electromyographic signals it collects support more accurate gesture classification in the subsequent steps.
(2) The method realizes a gesture recognition model based on CNN and RNN: the CNN learns the features of the signal and the RNN learns the contextual relationships within the electromyographic signal sequence. The input of the gesture recognition model is the time-frequency graph of the signal and the output is the gesture labels, so the whole recognition process is end-to-end. This replaces the previous recognition approach, in which practitioners used signal-processing knowledge and experience to select different features of signal segments to form feature sequences, and then compared different classifiers and chose the final one by classification accuracy. The gesture recognition model of the invention learns the features and internal relationships of the input time-frequency graph by itself through adjusting the weights of each network layer, runs in real time, and achieves high gesture recognition accuracy. Experiments show that text input with the virtual keyboard designed by the invention can reach 15.7 words per minute (WPM).
(3) When the character input speed increases, the gesture action frequency increases and the signal segment corresponding to a single gesture action can no longer be accurately segmented by short-time signal energy. The method therefore uses CTC to align the electromyographic signal sequence with the gesture action label sequence, which avoids manually aligning a label to each signal frame and enables the designed gesture recognition model to recognize gesture sequences, improving the accuracy of gesture sequence recognition.
Drawings
FIG. 1 is a schematic flow chart of a method for designing a virtual keyboard based on an electromyographic signal according to the present invention;
FIG. 2 is an example of a time-frequency diagram of an extracted electromyographic signal of the present invention;
FIG. 3 is a schematic diagram of a gesture recognition model according to the present invention.
Detailed Description
The technical solution of the present invention will be further described in detail with reference to the accompanying drawings and examples.
According to the virtual keyboard design method based on the electromyographic signals, gesture actions of equipment users are recognized by collecting the electromyographic signals, each recognized action is converted into a key code, and the key codes are converted into character text sequences, so that text input is achieved.
In order to bring the experience of using a real physical keyboard to a user, the virtual keyboard realized by the invention adopts a T9 keyboard layout commonly used by a smart phone, the keyboard is divided into 3 rows and 3 columns, 9 key areas are totally formed, and each key corresponds to 3 or 4 characters. In order to make the user experience close to using a real physical keyboard, the invention defines 9 groups of gesture actions, wherein each group of gesture actions corresponds to 9 areas on a plane hit by a finger.
The biggest difficulty of such a virtual keyboard is recognizing the finger gesture sequence at high frequency. For tasks of this type, previous work usually extracts active-segment signals from the continuous electromyographic signal and recognizes a single gesture; the method instead designs an end-to-end model that maps the electromyographic signal directly to the final result.
Since a character input device must guarantee input speed, the gesture action frequency in this gesture recognition scenario is high, so a model that detects gesture action sequences is needed: its input is a fixed-length signal sequence and its output is a gesture action sequence of indefinite length. During model training, CTC (Connectionist Temporal Classification) is adopted so that each frame of the input signal is associated with a gesture label; during prediction, a CTC decoding algorithm fused with a language model produces the predicted gesture sequence. The model structure combines CNN and RNN and can learn to extract the features and internal relationships of the signal by itself. The gesture sequence recognition model designed by the invention is end-to-end: the time-frequency graph obtained from the raw signal by the short-time Fourier transform is input into the model to obtain the key code sequence; the key codes are then converted, and all possible text character sequences are searched on a dictionary tree and displayed to the user for selection.
The physical equipment required to realize the virtual keyboard comprises a MYO armband and an intelligent terminal device such as a computer or smartphone; the functional modules contained in the intelligent terminal device include an electromyographic signal preprocessing module, a gesture recognition module and a key code conversion module.
A communication program is written according to the Bluetooth protocol provided in the MYO development manual, so that the intelligent terminal device communicates with the MYO armband over Bluetooth and forms a wearable electromyographic signal acquisition system.
the virtual keyboard design method for character input based on electromyographic signals provided by the invention is realized by a flow shown in fig. 1, which is specifically described by dividing into 6 steps.
Step 1: electromyographic signal collection and labeling using the MYO armband.
In the embodiment of the invention, 9 groups of actions are defined for the 9 key areas, each group corresponding to one area of the keyboard. The user operates the keyboard with 3 fingers of the right hand (index, middle and ring finger): the index finger controls the leftmost column, the middle finger the middle column, and the ring finger the rightmost column. Each column is divided into 3 areas, and the 9 groups of gestures correspond to the index, middle and ring fingers each tapping with a stretched, neutral or bent posture.
In the embodiment of the invention, both when labeled data are collected and when gestures are detected in real time, the MYO armband is connected to a mobile phone or computer via Bluetooth and is worn near the elbow of the user's right arm. The MYO armband has 8 sensor channels; the sampling frequency of the electromyographic signals is set to 200 Hz with 8-bit resolution, and the armband collects the electromyographic signals of all 8 channels.
When training data are collected, the collector observes a label sequence shown on a screen and makes the corresponding gesture actions; the start and end times of each collection are controlled manually, timestamps are recorded at the start and end of acquiring the electromyographic signals of the gesture actions, and the signal segment corresponding to the gesture actions is cut from the collected electromyographic signals using these timestamps. In a character-input scenario it is difficult to extract a single gesture directly from the signal during real-time detection, so the data collected by the method are signal segments corresponding to gesture action sequences. Because collectors receive no real-time feedback on their gestures during collection and may be unfamiliar with the keyboard layout and fingering, they often perform wrong gestures. To increase the accuracy of data collection, the invention draws a keyboard on a plane, divides it into 9 areas, writes the corresponding key code on each area, and converts the English character sequences into key code sequences, i.e., gesture action sequences, in advance. The collected training data are divided into a test set and a training set. In the embodiment of the invention, the two parts of the data were collected on different days, and the collectors rested for 10 minutes after every 50 words. The input of a training sample is a segment of electromyographic signal and its label is a gesture action sequence of indefinite length; the gesture actions fall into 9 categories corresponding to the 9 areas of the keyboard.
When data are collected, the gesture action label sequence is not fixed, so the length of the corresponding electromyographic signal is not fixed either; the signal length is therefore set to a uniform value, and when a collected electromyographic signal is too short, zeros are appended at the end of the signal to pad it to the fixed length.
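As a concrete illustration of this padding step, the following is a minimal sketch assuming each sample is a NumPy array of shape (samples, 8 channels); the target length of 4000 samples is an illustrative assumption, not a value given in the patent.

```python
import numpy as np

def pad_to_fixed_length(emg: np.ndarray, target_len: int = 4000) -> np.ndarray:
    """Pad an 8-channel EMG segment (shape: [samples, 8]) with trailing zeros.

    target_len is a hypothetical value chosen for illustration only.
    """
    n = emg.shape[0]
    if n >= target_len:
        return emg[:target_len]          # truncate if the segment is longer
    pad = np.zeros((target_len - n, emg.shape[1]), dtype=emg.dtype)
    return np.vstack([emg, pad])         # append zeros at the end of the signal
```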
Step 2: the electromyographic signals collected by the MYO armband are preprocessed on the computer or mobile phone. This step is performed by the signal preprocessing module.
The electromyographic signals collected by the MYO armband contain a certain amount of noise and cannot be used directly for model training, so the raw signals must first be denoised. Since the frequency content of the electromyographic signals is concentrated between 10 Hz and 200 Hz, the method first filters out the noise components below 10 Hz.
The time-domain representation of a signal describes it as a function of time, while the frequency-domain representation describes its characteristics in terms of frequency. Physically, frequency is a distinct and essential attribute of a signal, and frequency-domain characteristics are more stable than time-domain ones. The method therefore uses the short-time Fourier transform to convert the electromyographic signals into a time-frequency graph. Specifically, the sliding window size is set to 500 ms and the window overlap ratio to 10%. To prevent spectral leakage, a Hanning window is applied to the signal inside each window before the short-time Fourier transform, yielding the frequency-domain signal of that window. Finally, the frequency-domain signals obtained from all windows are concatenated horizontally to obtain the time-frequency graph of the electromyographic signal.
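The following is a minimal sketch of this preprocessing under stated assumptions: the sampling rate is 200 Hz as above, the sub-10 Hz noise is removed with a 4th-order Butterworth high-pass filter (the filter order and type are assumptions, since the text only specifies the 10 Hz cutoff), and SciPy's stft is used with a 500 ms Hanning window and 10% overlap.

```python
import numpy as np
from scipy.signal import butter, filtfilt, stft

FS = 200                    # MYO armband sampling rate (Hz)
WIN = int(0.5 * FS)         # 500 ms sliding window -> 100 samples
OVERLAP = int(0.1 * WIN)    # 10% window overlap    -> 10 samples

def emg_to_spectrogram(emg: np.ndarray) -> np.ndarray:
    """Convert one EMG channel (1-D array) into a time-frequency image."""
    # Remove components below 10 Hz (assumed 4th-order Butterworth high-pass).
    b, a = butter(4, 10 / (FS / 2), btype="highpass")
    clean = filtfilt(b, a, emg)
    # Hanning-windowed STFT, 500 ms windows with 10% overlap.
    _, _, z = stft(clean, fs=FS, window="hann", nperseg=WIN, noverlap=OVERLAP)
    return np.abs(z)         # magnitude spectrogram (freq bins x time frames)

def preprocess(emg_8ch: np.ndarray) -> np.ndarray:
    """Stack the spectrograms of all 8 channels: shape (8, freq, time)."""
    return np.stack([emg_to_spectrogram(emg_8ch[:, c]) for c in range(emg_8ch.shape[1])])
```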
Fig. 2 is a time-frequency diagram of myoelectric signals of 8 channels of the Myo bracelet, each graph corresponding to a myoelectric signal of one channel. Wherein the horizontal axis of the time-frequency diagram represents time, the vertical axis represents frequency, and the value of a pixel represents the intensity of a signal. It can be seen from the figure that there are four darker regions in the time-frequency diagram of 8 channels, and the regions correspond to four gesture actions.
Step 3: building the gesture recognition model.
The common approach to gesture recognition is to extract features from the signal to form feature vectors, which are then fed to a classifier. This requires solid signal-processing knowledge and relies on experience to construct feature vectors and select classifiers manually; obtaining an accurate gesture recognition model takes many attempts and is time-consuming and labor-intensive. The method instead uses deep learning to recognize and classify the gesture actions.
The gesture recognition model built with deep learning combines CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network) layers: the CNN extracts features from the electromyographic time-frequency graph, the RNN produces the gesture action sequence, and the model can learn the characteristics of the electromyographic signals corresponding to gesture actions in different time periods.
The gesture recognition model is end-to-end: the 8-channel electromyographic time-frequency graph is input into the model and a gesture action sequence of indefinite length is output. There are nine gesture action categories, corresponding to tapping the nine areas of the virtual keyboard. The network structure of the gesture recognition model is shown in Fig. 3: convolutional layers (Conv1, Conv2) and a pooling layer (Maxpooling) extract a feature map from the time-frequency graph, an im2col module converts the features into sequence form (Feature sequence), and the sequence is input into an LSTM (long short-term memory) network that learns the temporal relationships within the signal. To let the network recognize gesture action sequences of indefinite length, the hidden-layer size of the LSTM is set to the number of gesture action categories, the output of the LSTM is used as a time probability graph (probability graph), and the time probability graph is decoded with the CTC algorithm to obtain the gesture action sequence of indefinite length. The im2col module is an optimization for the convolution operations.
The time probability graph is a two-dimensional probability matrix in which each row represents a gesture action category and each column a time point; the value at point (t, l) is the probability that the gesture action belongs to category l at time t.
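The sketch below illustrates this CNN-plus-LSTM layout in PyTorch under stated assumptions: the filter counts, kernel sizes, and the extra CTC blank class are illustrative choices not given in the patent, while the hidden size equal to the number of gesture classes and the Conv1/Conv2/Maxpooling/LSTM ordering follow the description above.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 9 + 1        # 9 tap gestures + 1 CTC blank (the blank is an assumption)

class GestureNet(nn.Module):
    """Hypothetical CNN front-end + LSTM following the Fig. 3 layout."""
    def __init__(self, freq_bins: int = 51):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),   # Conv1
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # Conv2
            nn.MaxPool2d(kernel_size=(2, 1)),                        # pool over frequency only
        )
        # Hidden size = number of output classes, as the patent specifies.
        self.lstm = nn.LSTM(input_size=32 * (freq_bins // 2),
                            hidden_size=NUM_CLASSES, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 8 channels, freq_bins, time_frames)
        f = self.cnn(x)                          # (batch, 32, freq', time)
        f = f.permute(0, 3, 1, 2).flatten(2)     # feature sequence: (batch, time, features)
        y, _ = self.lstm(f)                      # (batch, time, NUM_CLASSES)
        return y.log_softmax(dim=-1)             # time probability map (log-probabilities)
```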
Step 4: training the gesture recognition model.
In the traditional gesture recognition pipeline, the signal segment corresponding to a gesture action is extracted from the raw signal and a single gesture is recognized by a trained model. In a character-input scenario based on electromyographic gesture recognition, the gesture frequency is very high and the interval between two gestures very small, so the signal segment of a single gesture cannot be extracted accurately from the raw signal. The method therefore adopts CTC to handle the alignment between the input electromyographic signals and the output gesture action sequences, which makes gesture sequence recognition possible.
The time-frequency graph of the electromyographic signal is input into the gesture recognition model to obtain a probability matrix. The method uses the CTC algorithm to relate this probability matrix to the labeled gesture sequence, i.e., the probability matrix output by the LSTM network is decoded into a gesture action sequence of indefinite length.
The loss function of the gesture recognition model is defined as follows. First, p(l|x) is computed, the probability of outputting l given input x, where l is the gesture action label sequence of the collected data and x is the probability matrix, i.e., the time probability graph. p(l|x) is computed as:

p(l|x) = Σ_{π ∈ B(π_{1:T} = l_{1:T})} ∏_{t=1}^{T} y^t_{π_t}

This formula sums over all paths that the CTC algorithm can collapse into the sequence l, where π_t denotes the state of path π at time t, T denotes the length of the probability matrix, and B(π_{1:T} = l_{1:T}) denotes the set of all paths π that collapse to the sequence l. The term y^t_{π_t} denotes the probability of the state of path π at time t, read directly from the probability matrix, so p(l|x) can be computed once the probability matrix and the gesture label sequence are given.
The goal of model training is to maximize p(l|x), i.e., the training objective is:

loss = maximize(p(l|x))

The probability matrix output by the model is adjusted by updating the weights of the gesture recognition network so that p(l|x) is maximized.
The established gesture recognition model is trained with this objective on the training set to obtain the trained model.
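As an illustration of this training objective, the following sketch uses PyTorch's CTCLoss, which minimizes -log p(l|x) and is therefore equivalent to maximizing p(l|x) as described above. It builds on the hypothetical GestureNet sketch given earlier; the batch shapes, dummy data, optimizer and learning rate are assumptions.

```python
import torch
import torch.nn as nn

model = GestureNet(freq_bins=51)                 # hypothetical model from the earlier sketch
ctc = nn.CTCLoss(blank=0, zero_infinity=True)    # minimizes -log p(l|x), i.e. maximizes p(l|x)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch illustrating the shapes: 4 samples, 8 channels, 51 freq bins, 40 time frames.
spectrograms = torch.randn(4, 8, 51, 40)
labels = torch.randint(1, 10, (4, 6))            # gesture classes 1..9, length-6 label sequences
label_lens = torch.full((4,), 6, dtype=torch.long)

log_probs = model(spectrograms).permute(1, 0, 2)  # CTCLoss expects (T, batch, classes)
input_lens = torch.full((4,), log_probs.shape[0], dtype=torch.long)

loss = ctc(log_probs, labels, input_lens, label_lens)  # CTC aligns frames to the label sequence
optim.zero_grad()
loss.backward()                                   # weight update that maximizes p(l|x)
optim.step()
```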
Step 5: training the language model.
The language model judges whether a character string is reasonable. Given a string S of length m, S = (c_1, c_2, c_3, …, c_m), where c_i denotes the i-th character of the string, the likelihood of the string is computed with an N-gram model; in view of computation speed and model size, N is set to 2, and the probability of occurrence P(S) of the string S is computed as:

P(S) = P(c_1) · P(c_2|c_1) · P(c_3|c_2) · … · P(c_m|c_{m-1})

The embodiment of the invention uses the "1-billion-word-language-modeling-benchmark-r13output" corpus to build the language model: each word in the corpus is first converted into a T9 key code sequence, and for a given key code c_i the number of times the next key code is c_j is counted, giving the conditional probability of each key code bigram. When a key code sequence is given, P(S) is computed with the formula above to judge how reasonable the key code sequence is.
Step 6: recognizing key codes with the trained gesture recognition model.
The trained gesture recognition model obtained in step 4 serves as the gesture recognition module; it is deployed on the intelligent terminal and detects the user's gesture actions in real time. The raw electromyographic signals collected in real time are preprocessed with the method of step 2 to extract the time-frequency graph, which is input into the gesture recognition model to obtain the predicted probability graph. The final recognition result is the path with the highest score from left to right across the probability graph; to improve prediction accuracy, the score of the language model is added to the path score, and the final predicted gesture action sequence is:

C = argmax(α·P_em(C|x) + β·P_lm(C|x))

where C is the finally predicted gesture action sequence and x is the probability matrix predicted by the gesture recognition model. P_em(C|x) is the probability that the gesture action sequence output by the gesture recognition model is C, obtained with the formula for p(l|x) in step 4; P_lm(C|x) is the probability that the gesture action sequence is C according to the language model, obtained with the formula for P(S) in step 5. α and β are two weights controlling the relative contribution of the gesture recognition model and the language model to the joint prediction. The argmax selects the gesture action sequence that maximizes the expression in brackets.
The invention uses a beam search to find the path with the highest score. At each frame, every class under the current frame is appended to each existing path from left to right over x to form new paths, the n highest-scoring paths under that frame are computed, and the current set of paths is updated with these n paths.
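The following is a simplified sketch of this frame-by-frame beam search over a (T, C) probability matrix. CTC blank/duplicate collapsing and the language-model term are omitted for brevity, so it illustrates only the path-expansion and pruning described above; adding the language model would amount to adding β·log P_lm to each candidate's score.

```python
import numpy as np

def beam_search(prob_map: np.ndarray, beam_width: int = 5):
    """Keep the beam_width highest-scoring paths at every frame of prob_map (T, C)."""
    beams = [([], 0.0)]                        # (path, log-score)
    for t in range(prob_map.shape[0]):
        candidates = []
        for path, score in beams:
            for c in range(prob_map.shape[1]):
                # Append every class of the current frame to every existing path.
                candidates.append((path + [c], score + np.log(prob_map[t, c] + 1e-12)))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]        # keep the n best paths for this frame
    return beams

# Example: beam_search(np.random.dirichlet(np.ones(10), size=40), beam_width=5)
```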
Step 7: converting the recognized gesture action sequence into key codes and outputting a character sequence to realize character input.
Step 6 produces a gesture label sequence, and since each gesture corresponds to one key code, the gesture sequence is equivalent to a key code sequence. In the designed keyboard layout each key code corresponds to 3 or 4 characters, so the key code sequence must be converted further. Generating all candidate character sequences by brute-force enumeration and comparing them against a text corpus is feasible but consumes a great deal of computing time and resources and is not real-time. To speed up real-time key code conversion, the method builds a dictionary tree (trie) over all text sequences into which key code sequences may be converted; during lookup, a depth-first search (DFS) traverses the possible solutions, and all results of the traversal are sorted by the score accumulated along each path and displayed visually to the user for selection.
Compared with a traditional trie, the method adds one feature: the value of each node contains not only the character but also the probability of the character string corresponding to the path from the root, which is computed from the frequency of that string in the text corpus. The trie is traversed with the DFS algorithm, and all generated candidate strings are sorted from the largest to the smallest probability for the user to choose from; the user selects the string that is finally entered with a number gesture from 1 to 10.
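A minimal sketch of this trie-plus-DFS conversion is shown below; word frequencies stand in for the probability values stored at the nodes, and the 9-key letter assignment is the same hypothetical one used in the bigram sketch above.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.freq = 0         # > 0 marks the end of a word; value = corpus frequency

def build_trie(word_freqs: dict) -> TrieNode:
    root = TrieNode()
    for word, freq in word_freqs.items():
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.freq = freq
    return root

# Hypothetical 9-key layout, identical to the bigram sketch above.
KEY_TO_LETTERS = {1: "abc", 2: "def", 3: "ghi", 4: "jkl", 5: "mno",
                  6: "pqr", 7: "stu", 8: "vwx", 9: "yz"}

def keys_to_words(root: TrieNode, keys: list) -> list:
    """DFS over the trie, restricted at each step to the letters of that key code."""
    results = []

    def dfs(node, i, prefix):
        if i == len(keys):
            if node.freq > 0:
                results.append((prefix, node.freq))
            return
        for ch in KEY_TO_LETTERS[keys[i]]:
            child = node.children.get(ch)
            if child is not None:
                dfs(child, i + 1, prefix + ch)

    dfs(root, 0, "")
    # Sort candidates from the highest to the lowest frequency for the user to pick.
    return [w for w, _ in sorted(results, key=lambda item: item[1], reverse=True)]

# Example: keys_to_words(build_trie({"cat": 120, "act": 30, "bat": 75}), [1, 1, 7])
# -> ["cat", "bat", "act"]
```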
The implementation of the virtual keyboard for character input has been described above. The electromyographic signals are collected with the MYO armband, which is simple and portable: the user only needs to wear the MYO armband on the forearm to input characters with the character input device of the invention.

Claims (2)

1. A virtual keyboard design method based on electromyographic signals is characterized in that a virtual keyboard realized by the method adopts a T9 keyboard layout, and a group of gesture actions are defined for each key area, and the method comprises the following steps:
step 1, collecting electromyographic signals generated when a user performs keyboard knocking operation by fingers as training samples, and setting labels for the training samples, wherein the labels are gesture action sequences with indefinite lengths;
the method for collecting the training sample comprises the following steps: pre-converting an English character sequence into a key code sequence, namely a gesture action sequence, making corresponding gesture actions by an acquisition person according to the given gesture action sequence, and stamping time stamps at the beginning and the end of electromyographic signal acquisition;
when data are collected, the gesture action sequence is not fixed, the length of the corresponding electromyographic signals is not fixed, the length of the electromyographic signals is set to be a uniform value, and when the collected electromyographic signals are insufficient in length, zero is filled at the tail of the signals and the signals are uniform to be a fixed length;
step 2, denoising preprocessing is carried out on the myoelectric signals, and the myoelectric signals are expressed into a time-frequency graph;
step 3, constructing a gesture recognition model;
the input of the gesture recognition model is a time-frequency graph of an electromyographic signal, the output of the gesture recognition model is a time probability graph, the time probability graph is a two-dimensional matrix, each row of the matrix represents a gesture action category, each column represents a time point, and element values in the matrix represent probability values of the gesture action categories at corresponding time points;
the gesture recognition model extracts a characteristic diagram from a time-frequency diagram of an electromyographic signal through a convolutional layer and a pooling layer, converts the characteristic diagram into a characteristic sequence, inputs the characteristic sequence into an LSTM network and outputs a time probability diagram; the size of the hidden layer of the LSTM network is set as the number of the types of the gesture actions;
step 4, training a gesture recognition model;
inputting a time-frequency graph of a training sample into the gesture recognition model, and outputting a time probability graph; decoding the time probability graph by using the CTC (Connectionist Temporal Classification) technique to generate a gesture action sequence with an indefinite length; if l represents a gesture action sequence and x represents a time probability graph, p(l|x) represents the probability that the output is l when the input is x, and the output time probability graph is adjusted by adjusting the network weights in the gesture recognition model during training so that the probability p(l|x) is maximized;
step 5, performing real-time gesture recognition by using the trained gesture recognition model;
preprocessing the electromyographic signals acquired in real time as in step 2, inputting the preprocessed electromyographic signals into the trained gesture recognition model for recognition, and obtaining a gesture action sequence by utilizing the CTC technique; combining the recognition result of the gesture recognition model with the language model to finally determine a gesture action sequence;
if a time probability graph x is obtained by the gesture recognition model for the collected electromyographic signals, the final predicted gesture action sequence is computed as:

C = argmax(α·P_em(C|x) + β·P_lm(C|x))

where C is the final gesture action sequence output, P_em(C|x) represents the probability that the gesture action sequence predicted by the gesture recognition model is C, P_lm(C|x) represents the probability that the gesture action sequence predicted by the language model is C, and α and β are two weight values;
step 6, in the gesture action sequence, each gesture action corresponds to one key code;
and establishing a dictionary tree of words, traversing the dictionary tree by using a depth-first search algorithm, and converting the gesture action sequence into all character sequences corresponding to the key code sequence and existing in the dictionary tree.
2. The method according to claim 1, characterized in that in step 2 the electromyographic signals are converted into a time-frequency diagram using the short-time Fourier transform, specifically: the size of the sliding window is set to 500 ms and the window overlap ratio to 10%; a Hanning window is applied to the signal in each window, and the short-time Fourier transform is then performed to obtain the frequency-domain signal; finally, the frequency-domain signals obtained from all sliding windows are concatenated horizontally to obtain the time-frequency diagram of the electromyographic signals.
CN202010352231.4A 2020-01-21 2020-04-28 Virtual keyboard design method based on electromyographic signals Active CN111562842B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010070633 2020-01-21
CN2020100706335 2020-01-21

Publications (2)

Publication Number Publication Date
CN111562842A CN111562842A (en) 2020-08-21
CN111562842B true CN111562842B (en) 2021-11-30

Family

ID=72070642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010352231.4A Active CN111562842B (en) 2020-01-21 2020-04-28 Virtual keyboard design method based on electromyographic signals

Country Status (1)

Country Link
CN (1) CN111562842B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112558775A (en) * 2020-12-11 2021-03-26 深圳大学 Wireless keyboard input method and device based on surface electromyogram signal recognition
CN112804119B (en) * 2021-01-05 2022-06-14 南京航空航天大学 MAC protocol identification method based on convolutional neural network
CN114153317A (en) * 2022-02-07 2022-03-08 深圳市心流科技有限公司 Information processing method, device and equipment based on electromyographic signals and storage medium
CN116069168B (en) * 2023-03-06 2023-08-29 浙江强脑科技有限公司 Facial muscle movement-based input method and related device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103750836A (en) * 2014-01-14 2014-04-30 优尔美有限公司 Wearable myoelectricity instrument
CN108958620A (en) * 2018-05-04 2018-12-07 天津大学 A kind of dummy keyboard design method based on forearm surface myoelectric
CN108829252A (en) * 2018-06-14 2018-11-16 吉林大学 Gesture input computer character device and method based on electromyography signal

Also Published As

Publication number Publication date
CN111562842A (en) 2020-08-21

Similar Documents

Publication Publication Date Title
CN111562842B (en) Virtual keyboard design method based on electromyographic signals
Hou et al. Signspeaker: A real-time, high-precision smartwatch-based sign language translator
JP7292657B2 (en) DATA PROCESSING METHOD, DATA PROCESSING DEVICE, COMPUTER PROGRAM AND ELECTRONIC DEVICE
CN108703824B (en) Bionic hand control system and control method based on myoelectricity bracelet
Alrubayi et al. A pattern recognition model for static gestures in malaysian sign language based on machine learning techniques
CN104504390A (en) On-line user state recognition method and device based on eye movement data
CN106097835B (en) Deaf-mute communication intelligent auxiliary system and communication method
CN106502390B (en) A kind of visual human's interactive system and method based on dynamic 3D Handwritten Digit Recognition
CN112148128B (en) Real-time gesture recognition method and device and man-machine interaction system
CN111582342B (en) Image identification method, device, equipment and readable storage medium
CN108364662A (en) Based on the pairs of speech-emotion recognition method and system for differentiating task
CN113849068B (en) Understanding and interaction method and system for multi-modal information fusion of gestures
CN110443113A (en) A kind of virtual reality Writing method, system and storage medium
CN104866164A (en) Human-machine interaction system and method based on blink signal mode detection
CN110516035A (en) A kind of man-machine interaction method and system of mixing module
CN111753683A (en) Human body posture identification method based on multi-expert convolutional neural network
Li et al. Hand gesture recognition and real-time game control based on a wearable band with 6-axis sensors
CN107894834A (en) Gesture identification method and system are controlled under augmented reality environment
CN114153317A (en) Information processing method, device and equipment based on electromyographic signals and storage medium
Rishan et al. Translation of sri lankan sign language to sinhala text: A leap motion technology-based approach
CN110413106B (en) Augmented reality input method and system based on voice and gestures
CN105046193B (en) A kind of human motion recognition method based on fusion rarefaction representation matrix
CN112183430A (en) Sign language identification method and device based on double neural network
CN115438691A (en) Small sample gesture recognition method based on wireless signals
CN108259503A (en) A kind of is the system and method for website and application division machine and mankind's access

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant