CN113397572A - Surface electromyographic signal classification method and system based on Transformer model - Google Patents


Info

Publication number: CN113397572A
Authority: CN (China)
Prior art keywords: data, model, matrix, window, head
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202110839308.5A
Other languages: Chinese (zh)
Inventors: 李智军, 程琦云, 李国欣
Current and original assignee: University of Science and Technology of China (USTC) (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by University of Science and Technology of China (USTC)
Priority to CN202110839308.5A

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B 5/24: Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B 5/316: Modalities, i.e. specific diagnostic methods
    • A61B 5/389: Electromyography [EMG]
    • A61B 5/72: Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B 5/7203: Signal processing for noise prevention, reduction or removal
    • A61B 5/7235: Details of waveform analysis
    • A61B 5/725: Details of waveform analysis using specific filters therefor, e.g. Kalman or adaptive filters
    • A61B 5/7264: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems


Abstract

The invention provides a surface electromyogram signal classification method and system based on a Transformer model, comprising the following steps: step S1: collecting corresponding multi-channel surface electromyographic signals according to preset actions, and filtering the collected signals to remove noise; step S2: converting the filtered signals into a data-label sequence using a sliding-window technique, and preprocessing the data of each window; step S3: applying position encoding to the preprocessed window data and inputting it into the encoder layer of a Transformer model to extract features; step S4: integrating the features extracted by the encoder through global pooling, and then obtaining the final classification result through one layer of fully-connected network. The method is based on a deep learning algorithm; it avoids the low computational efficiency of sequence models such as RNN, LSTM and GRU, which cannot be computed in parallel, while also improving the accuracy of the model.

Description

Surface electromyographic signal classification method and system based on Transformer model
Technical Field
The invention relates to the field of machine learning, in particular to a surface electromyographic signal classification method and system based on a Transformer model.
Background
Surface electromyographic signals are electrical signals collected on human skin by surface electrodes; these electrical signals are potential differences generated near muscle fibers by muscle movement. When a human body produces a movement intention, the intention is generated and encoded in nerve signals by the brain and transmitted to the spinal cord; after secondary encoding, the nerve signals are transmitted through nerve pathways to the corresponding limbs (such as the lower limbs), the nerve signals cause muscle fibers to contract and generate potential differences, and the muscles pull the skeleton to complete the movement. In this process, the movement intention is ultimately encoded in the electrical signals generated by the contraction of muscle fibers. By decoding this signal, the original movement intention can be recovered and used to control an external machine. Compared with directly decoding brain signals or nerve signals, the muscle electrical signals are closer to the action-execution stage, contain more accurate information, have a higher signal-to-noise ratio, and are more convenient to acquire.
Machine learning is one of the primary methods of decoding surface muscle electrical signals. The method comprises two stages: feature extraction and classification in feature space. Existing manual feature-extraction methods fall well short of RNN neural networks in accuracy, but RNN networks cannot be computed in parallel and are therefore slow.
Patent document CN112466326A (application number: 202011470115.9) discloses a method for extracting speech emotion features based on a Transformer model encoder, applicable to the fields of artificial intelligence and speech emotion recognition. First, low-level speech emotion features are extracted from the original speech waveform by a SincNet filter, and then further learned by a multilayer Transformer model encoder. The improved Transformer model encoder adds a layer of SincNet filters, i.e., a set of parameterized sinc functions acting as band-pass filters, in front of the conventional Transformer model encoder; the SincNet filter performs the low-level feature extraction on the raw speech waveform, allowing the network to better capture important narrow-band emotional features and thereby obtain deeper frame-level emotional features containing global context information.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a surface electromyogram signal classification method and system based on a Transformer model.
The surface electromyogram signal classification method based on the Transformer model provided by the invention comprises the following steps:
step S1: collecting corresponding multi-channel surface electromyographic signals according to preset actions, and filtering and removing noise of the collected signals;
step S2: converting the filtered signals into a data-tag sequence by using a sliding window technology, and preprocessing the data of each window;
step S3: carrying out position coding on the preprocessed window data, and inputting the window data into a coder layer of a Transformer model to extract characteristics;
step S4: and integrating the features extracted by the encoder through global pooling, and then obtaining a final classification result through a layer of fully-connected network.
Preferably, the step S1 includes: collecting corresponding multichannel surface electromyographic signals according to preset types of movement actions and preset types of rest and relaxation actions, and filtering noise of the collected signals through a band-pass filter.
Preferably, the sliding window technique in step S2 includes: preset group data overlapping is formed between adjacent windows;
the pretreatment comprises the following steps: each channel of each window is individually normalized.
Preferably, the position encoding in step S3 includes:

PE(pos, 2i) = sin(pos / 10000^(2i/d_input))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_input))

wherein pos represents the location of the data within the time window; i represents the location of the data feature within the current set of data; d_input represents the dimension of the input data features.
Preferably, the step S3 of inputting into the encoder layer of the Transformer model to extract features includes:

the encoder module of the Transformer used comprises a multi-head attention network and a feedforward network;

the multi-head attention network includes: extracting internal features of the input sequence through a multi-head attention mechanism, wherein the formula is as follows:

MultiHead(X) = Concat(head_1, …, head_h) W^O
head_i = Attention(X W_i^Q, X W_i^K, X W_i^V)
Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

wherein X represents the input time window; Concat represents the concatenation function; the parameter matrices W_i^Q, W_i^K ∈ R^(d_input × d_k), W_i^V ∈ R^(d_input × d_v) and W^O ∈ R^(h·d_v × d_model) are all learnable matrices; d_model represents the dimension of the Transformer model output features; if the number of heads is h, the parameter matrix shape satisfies d_k = d_v = d_model / h, with the requirement that d_k be a perfect square; Q represents the query matrix; K represents the matrix of the relevance of the queried information to other information; V represents the matrix of queried information; superscript T represents matrix transposition; R represents a vector space; d_input represents the feature dimension of the input information and d_k represents one dimension of K;

the feedforward network comprises a two-layer fully-connected network with ReLU as the activation function, and the calculation formula is as follows:

FFN(x) = max(0, x W_1 + b_1) W_2 + b_2

where the parameter matrices W_1 ∈ R^(d_model × d_hidden) and W_2 ∈ R^(d_hidden × d_model) are learnable; d_hidden represents the hidden layer dimension; b_1, b_2 represent bias terms.
Preferably, the step S4 includes: features extracted by the encoder are integrated using global average pooling, and then finally classified via the softmax function of the fully connected network.
The invention provides a surface electromyogram signal classification system based on a Transformer model, which comprises the following components:
module M1: collecting corresponding multi-channel surface electromyographic signals according to preset actions, and filtering and removing noise of the collected signals;
module M2: converting the filtered signals into a data-tag sequence by using a sliding window technology, and preprocessing the data of each window;
module M3: carrying out position coding on the preprocessed window data, and inputting the window data into a coder layer of a Transformer model to extract characteristics;
module M4: and integrating the features extracted by the encoder through global pooling, and then obtaining a final classification result through a layer of fully-connected network.
Preferably, said module M1 comprises: collecting corresponding multichannel surface electromyographic signals according to preset types of movement actions and preset types of rest and relaxation actions, and filtering noise of the collected signals through a band-pass filter.
Preferably, the position encoding in the module M3 includes:

PE(pos, 2i) = sin(pos / 10000^(2i/d_input))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_input))

wherein pos represents the location of the data within the time window; i represents the location of the data feature within the current set of data; d_input represents the dimension of the input data features;

the encoder layer of the Transformer model in the module M3 extracting features includes:

the encoder module of the Transformer used comprises a multi-head attention network and a feedforward network;

the multi-head attention network includes: extracting internal features of the input sequence through a multi-head attention mechanism, wherein the formula is as follows:

MultiHead(X) = Concat(head_1, …, head_h) W^O
head_i = Attention(X W_i^Q, X W_i^K, X W_i^V)
Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

wherein X represents the input time window; Concat represents the concatenation function; the parameter matrices W_i^Q, W_i^K ∈ R^(d_input × d_k), W_i^V ∈ R^(d_input × d_v) and W^O ∈ R^(h·d_v × d_model) are all learnable matrices; d_model represents the dimension of the Transformer model output features; if the number of heads is h, the parameter matrix shape satisfies d_k = d_v = d_model / h, with the requirement that d_k be a perfect square; Q represents the query matrix; K represents the matrix of the relevance of the queried information to other information; V represents the matrix of queried information; superscript T represents matrix transposition; R represents a vector space; d_input represents the feature dimension of the input information and d_k represents one dimension of K;

the feedforward network comprises a two-layer fully-connected network with ReLU as the activation function, and the calculation formula is as follows:

FFN(x) = max(0, x W_1 + b_1) W_2 + b_2

where the parameter matrices W_1 ∈ R^(d_model × d_hidden) and W_2 ∈ R^(d_hidden × d_model) are learnable; d_hidden represents the hidden layer dimension; b_1, b_2 represent bias terms.
Preferably, said module M4 comprises: features extracted by the encoder are integrated using global average pooling, and then finally classified via the softmax function of the fully connected network.
Compared with the prior art, the invention has the following beneficial effects:
1. According to the method, the forearm electromyographic signals are classified using a Transformer algorithm; features do not need to be designed manually, the process of feature selection is omitted, the quality of the extracted features is better, and the recognition accuracy is improved;
2. The invention utilizes the Multi-Head Attention network mechanism in the Transformer to extract sequence information from the time series, achieving higher accuracy than traditional RNN-type networks;
3. The Multi-Head Attention network mechanism used in the invention can compute multiple groups of data in parallel, giving higher computational efficiency and speed than traditional RNN-type networks, which can only compute serially;
4. The method is based on a deep learning algorithm; it avoids the low computational efficiency of sequence models such as RNN, LSTM and GRU, which cannot be computed in parallel, while also improving the accuracy of the model.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a process framework diagram of the present invention;
FIG. 2 is a network framework of the present invention;
FIG. 3 is a flow chart of a self-attention mechanism operation;
FIG. 4 is a flow chart of the multi-head self-attention mechanism operation.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that various changes and modifications, obvious to those skilled in the art, can be made without departing from the spirit of the invention, and all of these fall within the scope of the present invention.
Example 1
The surface electromyogram signal classification method based on the Transformer model provided by the invention comprises the following steps:
step S1: collecting corresponding multi-channel surface electromyographic signals according to preset actions, and filtering and removing noise of the collected signals;
step S2: converting the filtered signals into a data-tag sequence by using a sliding window technology, and preprocessing the data of each window;
step S3: carrying out position coding on the preprocessed window data, and inputting the window data into a coder layer of a Transformer model to extract characteristics;
step S4: and integrating the features extracted by the encoder through global pooling, and then obtaining a final classification result through a layer of fully-connected network.
Specifically, the step S1 includes: according to six types of movement actions and one type of rest/relaxation action, the corresponding multi-channel surface electromyographic signals are collected as six-channel signals acquired from six different positions of the forearm, and the collected signals are passed through a band-pass filter to remove artifact noise (low frequency) and unnecessary high-frequency noise.
Specifically, the sliding window technique in step S2 includes: a sliding window length of 125, a step size of 35, and an overlap of 90 groups of data between adjacent windows;
the pretreatment comprises the following steps: each channel of each window is individually normalized.
Specifically, the position encoding in step S3 includes:

PE(pos, 2i) = sin(pos / 10000^(2i/d_input))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_input))

wherein pos represents the location of the data within the time window; i represents the location of the data feature within the current set of data; d_input represents the dimension of the input data features.
Specifically, the step S3 of inputting into the encoder layer of the Transformer model to extract features includes:

the Encoder module of the Transformer used includes a Multi-Head Attention network and a Feed-Forward Network;

the multi-head attention network includes: extracting internal features of the input sequence through a multi-head attention mechanism, wherein the formula is as follows:

MultiHead(X) = Concat(head_1, …, head_h) W^O
head_i = Attention(X W_i^Q, X W_i^K, X W_i^V)
Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

wherein X represents the input time window; Concat represents the concatenation function; the parameter matrices W_i^Q, W_i^K ∈ R^(d_input × d_k), W_i^V ∈ R^(d_input × d_v) and W^O ∈ R^(h·d_v × d_model) are all learnable matrices; d_model represents the dimension of the Transformer model output features; if the number of heads is h, the parameter matrix shape satisfies d_k = d_v = d_model / h, with the requirement that d_k be a perfect square; Q represents the query matrix; K represents the matrix of the relevance of the queried information to other information; V represents the matrix of queried information; superscript T represents matrix transposition; R represents a vector space; d_input represents the feature dimension of the input information and d_k represents one dimension of K;

the feedforward network comprises a two-layer fully-connected network with ReLU as the activation function, and the calculation formula is as follows:

FFN(x) = max(0, x W_1 + b_1) W_2 + b_2

where the parameter matrices W_1 ∈ R^(d_model × d_hidden) and W_2 ∈ R^(d_hidden × d_model) are learnable; d_hidden represents the hidden layer dimension; b_1, b_2 represent bias terms.
Specifically, the step S4 includes: features extracted by the encoder are integrated using global average pooling, and then finally classified via the softmax function of the fully connected network.
The invention provides a surface electromyogram signal classification system based on a Transformer model, which comprises the following components:
module M1: collecting corresponding multi-channel surface electromyographic signals according to preset actions, and filtering and removing noise of the collected signals;
module M2: converting the filtered signals into a data-tag sequence by using a sliding window technology, and preprocessing the data of each window;
module M3: carrying out position coding on the preprocessed window data, and inputting the window data into a coder layer of a Transformer model to extract characteristics;
module M4: and integrating the features extracted by the encoder through global pooling, and then obtaining a final classification result through a layer of fully-connected network.
Specifically, the module M1 includes: according to six types of movement actions and one type of rest/relaxation action, the corresponding multi-channel surface electromyographic signals are collected as six-channel signals acquired from six different positions of the forearm, and the collected signals are passed through a band-pass filter to remove artifact noise (low frequency) and unnecessary high-frequency noise.
Specifically, the sliding window technique in the module M2 includes: a sliding window length of 125, a step size of 35, and an overlap of 90 groups of data between adjacent windows;
the pretreatment comprises the following steps: each channel of each window is individually normalized.
Specifically, the position encoding in the module M3 includes:

PE(pos, 2i) = sin(pos / 10000^(2i/d_input))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_input))

wherein pos represents the location of the data within the time window; i represents the location of the data feature within the current set of data; d_input represents the dimension of the input data features.
Specifically, the step of inputting into the encoder layer of the Transformer model to extract features in the module M3 includes:

the Encoder module of the Transformer used includes a Multi-Head Attention network and a Feed-Forward Network;

the multi-head attention network includes: extracting internal features of the input sequence through a multi-head attention mechanism, wherein the formula is as follows:

MultiHead(X) = Concat(head_1, …, head_h) W^O
head_i = Attention(X W_i^Q, X W_i^K, X W_i^V)
Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

wherein X represents the input time window; Concat represents the concatenation function; the parameter matrices W_i^Q, W_i^K ∈ R^(d_input × d_k), W_i^V ∈ R^(d_input × d_v) and W^O ∈ R^(h·d_v × d_model) are all learnable matrices; d_model represents the dimension of the Transformer model output features; if the number of heads is h, the parameter matrix shape satisfies d_k = d_v = d_model / h, with the requirement that d_k be a perfect square; Q represents the query matrix; K represents the matrix of the relevance of the queried information to other information; V represents the matrix of queried information; superscript T represents matrix transposition; R represents a vector space; d_input represents the feature dimension of the input information and d_k represents one dimension of K;

the feedforward network comprises a two-layer fully-connected network with ReLU as the activation function, and the calculation formula is as follows:

FFN(x) = max(0, x W_1 + b_1) W_2 + b_2

where the parameter matrices W_1 ∈ R^(d_model × d_hidden) and W_2 ∈ R^(d_hidden × d_model) are learnable; d_hidden represents the hidden layer dimension; b_1, b_2 represent bias terms.
Specifically, the module M4 includes: features extracted by the encoder are integrated using global average pooling, and then finally classified via the softmax function of the fully connected network.
Example 2
Example 2 is a preferred example of Example 1.
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
As shown in fig. 1 to 4, in a surface electromyographic signal classification method based on a Transformer model, electromyographic signals of the forearm are first collected according to seven designed actions: making a fist, stretching the hand, folding the forearm inward, folding the wrist outward toward the forearm, folding the elbow inward, folding the elbow outward, and natural relaxation. Through these seven actions, the corresponding forearm surface electromyographic signals are collected at a sampling frequency of 1024 Hz. A band-pass filter is applied to the collected signals to eliminate artifact noise and low-information high-frequency noise; since artifact noise generally does not exceed 20 Hz and high-frequency noise generally exceeds 500 Hz, the pass band of the filter is set to 20-500 Hz.
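As an illustration of this filtering step, the 20-500 Hz band-pass can be sketched with SciPy. The Butterworth design, the filter order, and the zero-phase filtfilt call are assumptions; the text only specifies the sampling rate (1024 Hz) and the pass band:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_semg(x, fs=1024.0, low=20.0, high=500.0, order=4):
    """Band-pass an sEMG array of shape (samples, channels) to 20-500 Hz.

    The 20 Hz lower edge suppresses motion-artifact noise and the 500 Hz
    upper edge removes high-frequency noise, matching the cutoffs in the
    text; the Butterworth/filtfilt choice is an illustrative stand-in.
    """
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, x, axis=0)  # zero-phase filtering along time
```

Note that with fs = 1024 Hz the Nyquist frequency is 512 Hz, so a 500 Hz upper edge is still a valid normalized cutoff.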
For the filtered data, a sliding window technique is used to convert the collected time series into "data-label" pairs, and each channel within each time window is normalized according to the following formula:

x_norm = (x - min) / (max - min)

where min is the minimum value taken by the channel and max is the maximum value taken by the channel.
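The windowing and per-channel normalization can be sketched as follows. The window length 125 and step 35 come from the text (giving the stated 90-sample overlap); taking each window's label from its last sample is an assumption, since the text only says the series becomes data-label pairs:

```python
import numpy as np

def sliding_windows(signal, labels, win=125, step=35):
    """Cut a (T, C) signal into overlapping windows paired with labels.

    Each window is min-max normalized per channel to [0, 1]; the window's
    label is taken from its last sample (an illustrative assumption).
    """
    X, y = [], []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win].astype(float)
        mn, mx = w.min(axis=0), w.max(axis=0)
        # guard against constant channels to avoid division by zero
        w = (w - mn) / np.where(mx > mn, mx - mn, 1.0)
        X.append(w)
        y.append(labels[start + win - 1])
    return np.stack(X), np.array(y)
```

With 1024 samples of six-channel data this yields 26 windows, adjacent windows sharing 125 - 35 = 90 samples as described.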
For the windowed and normalized data, a position code is added so that the subsequent model can learn the position information in the sequence. The position encoding formula is:

PE(pos, 2i) = sin(pos / 10000^(2i/d_input))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_input))

where pos is the location of the data within the time window, i is the location of the data feature within the set of data, and d_input is the input feature dimension; from the data we collected, d_input = 6.
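A minimal sketch of this position code with d_input = 6, assuming the standard sinusoidal Transformer formulation that the formula above describes:

```python
import numpy as np

def positional_encoding(win=125, d=6):
    """Sinusoidal position code: PE[pos, 2i] = sin(pos / 10000^(2i/d)),
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d)), for a (win, d) window."""
    pe = np.zeros((win, d))
    pos = np.arange(win)[:, None]                   # (win, 1) positions
    div = np.power(10000.0, np.arange(0, d, 2) / d) # per-pair wavelengths
    pe[:, 0::2] = np.sin(pos / div)
    pe[:, 1::2] = np.cos(pos / div)
    return pe
```

The code is simply added element-wise to each normalized window before it enters the encoder.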
The data with position information added is input into the Encoder module of the Transformer, passing first through the Multi-Head Attention layer, whose calculation formula is as follows:

MultiHead(X) = Concat(head_1, …, head_h) W^O
head_i = Attention(X W_i^Q, X W_i^K, X W_i^V)
Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

where X is the input time window, Concat is the concatenation function, and the parameter matrices W_i^Q, W_i^K ∈ R^(d_input × d_k), W_i^V ∈ R^(d_input × d_v) and W^O ∈ R^(h·d_v × d_model) are all learnable; d_model is the dimension of the model output features, here set to d_model = 256. With the number of heads h = 4, the parameter matrix shape is d_k = d_v = d_model / h = 64, which satisfies the requirement that d_k be a perfect square.
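The multi-head attention computation can be sketched in NumPy as follows. This is a minimal stand-in: random matrices replace the learned parameters W_i^Q, W_i^K, W_i^V, W^O, and a single window of shape (125, 6) is used:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    """MultiHead(X) = Concat(head_1..head_h) W^O, where each head is
    softmax(Q K^T / sqrt(d_k)) V with Q = X Wq_i, K = X Wk_i, V = X Wv_i."""
    heads = []
    for wq, wk, wv in zip(Wq, Wk, Wv):
        Q, K, V = X @ wq, X @ wk, X @ wv
        dk = wk.shape[1]
        A = softmax(Q @ K.T / np.sqrt(dk))  # (win, win) attention weights
        heads.append(A @ V)                 # (win, d_v) per head
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
d_input, d_model, h = 6, 256, 4
dk = d_model // h                            # d_k = d_v = d_model / h = 64
Wq = [rng.normal(size=(d_input, dk)) for _ in range(h)]
Wk = [rng.normal(size=(d_input, dk)) for _ in range(h)]
Wv = [rng.normal(size=(d_input, dk)) for _ in range(h)]
Wo = rng.normal(size=(h * dk, d_model))
out = multi_head_attention(rng.normal(size=(125, d_input)), Wq, Wk, Wv, Wo)
```

Since each head's matrix products are independent, the h heads (and multiple windows) can be computed in parallel, which is the efficiency advantage over serial RNN-type models that the text emphasizes.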
The features extracted by the Multi-Head Attention layer are input into the Feed-Forward Network layer, which consists of two fully-connected layers with ReLU as the activation function, calculated as follows:

FFN(x) = max(0, x W_1 + b_1) W_2 + b_2

where the parameter matrices W_1 ∈ R^(d_model × d_hidden) and W_2 ∈ R^(d_hidden × d_model) are learnable; here d_hidden = 256.
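The feed-forward sublayer is just two dense layers; a minimal NumPy sketch with the embodiment's shapes d_model = d_hidden = 256 (the random weights are stand-ins for learned parameters):

```python
import numpy as np

def ffn(x, W1, b1, W2, b2):
    """Position-wise feed-forward network: FFN(x) = max(0, x W1 + b1) W2 + b2."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
d_model, d_hidden = 256, 256
W1 = rng.normal(size=(d_model, d_hidden)) * 0.02  # R^(d_model x d_hidden)
b1 = np.zeros(d_hidden)
W2 = rng.normal(size=(d_hidden, d_model)) * 0.02  # R^(d_hidden x d_model)
b2 = np.zeros(d_model)
out = ffn(rng.normal(size=(125, d_model)), W1, b1, W2, b2)
```

The same two matrices are applied at every position in the window, so this layer also parallelizes trivially across time steps.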
We use global average pooling to integrate the features extracted by the Encoder layer, followed finally by a fully-connected network layer with the softmax function as the activation function to complete the final classification.
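The classification head can be sketched as global average pooling over the time axis followed by one dense layer and softmax. The weights below are random stand-ins, and the 7 output classes correspond to the seven actions:

```python
import numpy as np

def classify_window(features, W, b):
    """Average encoder features over time, then apply one fully-connected
    layer with softmax to obtain class probabilities."""
    pooled = features.mean(axis=0)        # (d_model,) global average pooling
    logits = pooled @ W + b               # (n_classes,)
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
d_model, n_classes = 256, 7               # 7 = six movements + relaxation
probs = classify_window(rng.normal(size=(125, d_model)),
                        rng.normal(size=(d_model, n_classes)) * 0.02,
                        np.zeros(n_classes))
```

The predicted gesture is then simply the argmax of the probability vector.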
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A surface electromyogram signal classification method based on a Transformer model is characterized by comprising the following steps:
step S1: collecting corresponding multi-channel surface electromyographic signals according to preset actions, and filtering and removing noise of the collected signals;
step S2: converting the filtered signals into a data-tag sequence by using a sliding window technology, and preprocessing the data of each window;
step S3: carrying out position coding on the preprocessed window data, and inputting the window data into a coder layer of a Transformer model to extract characteristics;
step S4: and integrating the features extracted by the encoder through global pooling, and then obtaining a final classification result through a layer of fully-connected network.
2. The method for classifying surface electromyogram signals based on a Transformer model according to claim 1, wherein the step S1 comprises: collecting corresponding multichannel surface electromyographic signals according to preset types of movement actions and preset types of rest and relaxation actions, and filtering noise of the collected signals through a band-pass filter.
3. The method for classifying surface electromyographic signals based on a Transformer model according to claim 1, wherein the sliding window technique in the step S2 comprises: preset group data overlapping is formed between adjacent windows;
the pretreatment comprises the following steps: each channel of each window is individually normalized.
4. The method for classifying surface electromyographic signals based on a Transformer model according to claim 1, wherein the position encoding in the step S3 comprises:

PE(pos, 2i) = sin(pos / 10000^(2i/d_input))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_input))

wherein pos represents the location of the data within the time window; i represents the location of the data feature within the current set of data; d_input represents the dimension of the input data features.
5. The method for classifying surface electromyogram signals based on a Transformer model according to claim 1, wherein the step S3 of inputting into the encoder layer of the Transformer model to extract features comprises:
the encoder module of the used Transformer comprises a multi-head attention network and a feedforward network;
the multi-head attention network includes: extracting internal features of the input sequence through a multi-head attention mechanism, wherein the formula is as follows:
MultiHead(X) = Concat(head_1, …, head_h)W^O
head_i = Attention(XW_i^Q, XW_i^K, XW_i^V)
Attention(Q, K, V) = softmax(QK^T / √d_k)V
wherein X represents the input time window; Concat represents the concatenation function; the parameter matrices W_i^Q ∈ R^(d_input×d_k), W_i^K ∈ R^(d_input×d_k), W_i^V ∈ R^(d_input×d_v) and W^O ∈ R^(h·d_v×d_model) are all learnable matrices; d_model represents the dimension of the Transformer model's output features; if the number of heads is h, the parameter matrices are shaped so that d_k = d_v = d_model/h, with √d_k serving as the scaling factor of the attention; Q represents the query matrix; K represents the matrix relating the queried information to other information; V represents the matrix of queried information; the superscript T represents matrix transposition; R represents a vector space; d_input represents the feature dimension of the input information, and d_k represents one dimension of K;
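The multi-head attention formulas above can be illustrated with a minimal NumPy sketch; the per-head list-of-matrices parameterization and all names here are assumptions of the sketch, not the patent's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    """Scaled dot-product attention per head; heads are concatenated
    and mixed by Wo. Wq/Wk/Wv: lists of h matrices of shape
    (d_input, d_k) / (d_input, d_k) / (d_input, d_v); Wo: (h*d_v, d_model)."""
    heads = []
    for i in range(h):
        Q, K, V = X @ Wq[i], X @ Wk[i], X @ Wv[i]
        d_k = K.shape[-1]
        A = softmax(Q @ K.T / np.sqrt(d_k))   # (T, T) attention weights
        heads.append(A @ V)                   # (T, d_v)
    return np.concatenate(heads, axis=-1) @ Wo
```

For a window of T time steps the output keeps shape (T, d_model), so encoder layers can be stacked.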
the feedforward network comprises a two-layer fully-connected network with ReLU as the activation function, calculated as:
FFN(x) = max(0, xW_1 + b_1)W_2 + b_2
wherein the parameter matrices W_1 ∈ R^(d_model×d_hidden) and W_2 ∈ R^(d_hidden×d_model) are learnable matrices; d_hidden represents the hidden layer dimension; b_1 and b_2 represent bias terms.
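The two-layer feedforward network reduces to a single expression; this sketch only mirrors the formula FFN(x) = max(0, xW1 + b1)W2 + b2 and assumes nothing beyond it.

```python
import numpy as np

def ffn(x, W1, b1, W2, b2):
    """Position-wise feedforward network: first layer expands to the
    hidden dimension and applies ReLU, second layer projects back."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2
```

The ReLU zeroes negative pre-activations, so only the positive branch of each hidden unit contributes to the output.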
6. The method for classifying surface electromyogram signals based on a Transformer model according to claim 1, wherein the step S4 comprises: the features extracted by the encoder are integrated using global average pooling, and the final classification is then obtained via the softmax function of the fully-connected network.
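The classification head of claim 6 (global average pooling followed by one fully-connected softmax layer) can be sketched as below; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def classify(features, W, b):
    """features: (T, d_model) encoder output for one window;
    W: (d_model, n_classes), b: (n_classes,).
    Returns the softmax class-probability vector."""
    pooled = features.mean(axis=0)        # global average pooling over time
    logits = pooled @ W + b               # one fully-connected layer
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()
```

The predicted gesture is simply the argmax of the returned probability vector.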
7. A surface electromyogram signal classification system based on a Transformer model is characterized by comprising:
module M1: collecting the corresponding multi-channel surface electromyographic signals according to preset actions, and filtering noise from the collected signals;
module M2: converting the filtered signals into a data-label sequence using a sliding window technique, and preprocessing the data of each window;
module M3: applying position coding to the preprocessed window data, and inputting the window data into the encoder layer of a Transformer model to extract features;
module M4: integrating the features extracted by the encoder through global pooling, and then obtaining the final classification result through one layer of fully-connected network.
8. The Transformer model-based surface electromyographic signal classification system according to claim 7, wherein the module M1 comprises: collecting the corresponding multichannel surface electromyographic signals according to preset types of movement actions and preset types of rest and relaxation actions, and removing noise from the collected signals with a band-pass filter.
9. The Transformer model-based surface electromyogram signal classification system according to claim 7, wherein the position coding in the module M3 comprises:
PE(pos, 2i) = sin(pos / 10000^(2i/d_input))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_input))
wherein pos represents the location of the data within the time window; i represents the location of the data feature within the current set of data; d_input represents the dimension of the input data features;
extracting features through the encoder layer of the Transformer model in the module M3 comprises:
the encoder module of the Transformer used comprises a multi-head attention network and a feedforward network;
the multi-head attention network comprises: extracting internal features of the input sequence through a multi-head attention mechanism, according to the formula:
MultiHead(X) = Concat(head_1, …, head_h)W^O
head_i = Attention(XW_i^Q, XW_i^K, XW_i^V)
Attention(Q, K, V) = softmax(QK^T / √d_k)V
wherein X represents the input time window; Concat represents the concatenation function; the parameter matrices W_i^Q ∈ R^(d_input×d_k), W_i^K ∈ R^(d_input×d_k), W_i^V ∈ R^(d_input×d_v) and W^O ∈ R^(h·d_v×d_model) are all learnable matrices; d_model represents the dimension of the Transformer model's output features; if the number of heads is h, the parameter matrices are shaped so that d_k = d_v = d_model/h, with √d_k serving as the scaling factor of the attention; Q represents the query matrix; K represents the matrix relating the queried information to other information; V represents the matrix of queried information; the superscript T represents matrix transposition; R represents a vector space; d_input represents the feature dimension of the input information, and d_k represents one dimension of K;
the feedforward network comprises a two-layer fully-connected network with ReLU as the activation function, calculated as:
FFN(x) = max(0, xW_1 + b_1)W_2 + b_2
wherein the parameter matrices W_1 ∈ R^(d_model×d_hidden) and W_2 ∈ R^(d_hidden×d_model) are learnable matrices; d_hidden represents the hidden layer dimension; b_1 and b_2 represent bias terms.
10. The Transformer model-based surface electromyographic signal classification system according to claim 7, wherein the module M4 comprises: the features extracted by the encoder are integrated using global average pooling, and the final classification is then obtained via the softmax function of the fully-connected network.
CN202110839308.5A 2021-07-23 2021-07-23 Surface electromyographic signal classification method and system based on Transformer model Pending CN113397572A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110839308.5A CN113397572A (en) 2021-07-23 2021-07-23 Surface electromyographic signal classification method and system based on Transformer model


Publications (1)

Publication Number Publication Date
CN113397572A true CN113397572A (en) 2021-09-17

Family

ID=77687607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110839308.5A Pending CN113397572A (en) 2021-07-23 2021-07-23 Surface electromyographic signal classification method and system based on Transformer model

Country Status (1)

Country Link
CN (1) CN113397572A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109924977A (en) * 2019-03-21 2019-06-25 西安交通大学 A kind of surface electromyogram signal classification method based on CNN and LSTM
US10743809B1 (en) * 2019-09-20 2020-08-18 CeriBell, Inc. Systems and methods for seizure prediction and detection
CN111616706A (en) * 2020-05-20 2020-09-04 山东中科先进技术研究院有限公司 Surface electromyogram signal classification method and system based on convolutional neural network
CN112466326A (en) * 2020-12-14 2021-03-09 江苏师范大学 Speech emotion feature extraction method based on transform model encoder
CN113033657A (en) * 2021-03-24 2021-06-25 武汉理工大学 Multi-user behavior identification method based on Transformer network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Qizheng Gu et al., "Automatic Generation of Electromyogram Diagnosis Report," 2020 IEEE International Conference on Bioinformatics and Biomedicine.

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901893A (en) * 2021-09-22 2022-01-07 西安交通大学 Electrocardiosignal identification and classification method based on multiple cascade deep neural network
CN113901893B (en) * 2021-09-22 2023-09-15 西安交通大学 Electrocardiosignal identification and classification method based on multi-cascade deep neural network
CN114863912A (en) * 2022-05-05 2022-08-05 中国科学技术大学 Silent voice decoding method based on surface electromyogram signals
CN114863912B (en) * 2022-05-05 2024-05-10 中国科学技术大学 Silent voice decoding method based on surface electromyographic signals
CN114626424A (en) * 2022-05-16 2022-06-14 天津大学 Data enhancement-based silent speech recognition method and device
CN114626424B (en) * 2022-05-16 2022-09-13 天津大学 Data enhancement-based silent speech recognition method and device
CN116485729A (en) * 2023-04-03 2023-07-25 兰州大学 Multistage bridge defect detection method based on transformer
CN116485729B (en) * 2023-04-03 2024-01-12 兰州大学 Multistage bridge defect detection method based on transformer
CN116070985A (en) * 2023-04-06 2023-05-05 江苏华溯大数据有限公司 Dangerous chemical vehicle loading and unloading process identification method
CN116127364A (en) * 2023-04-12 2023-05-16 上海术理智能科技有限公司 Integrated transducer-based motor imagery decoding method and system
CN116434343A (en) * 2023-04-25 2023-07-14 天津大学 Video motion recognition method based on high-low frequency double branches
CN116434343B (en) * 2023-04-25 2023-09-19 天津大学 Video motion recognition method based on high-low frequency double branches

Similar Documents

Publication Publication Date Title
CN113397572A (en) Surface electromyographic signal classification method and system based on Transformer model
CN108491077B (en) Surface electromyographic signal gesture recognition method based on multi-stream divide-and-conquer convolutional neural network
CN110610172B (en) Myoelectric gesture recognition method based on RNN-CNN architecture
CN109620651B (en) Intelligent auxiliary rehabilitation equipment based on synchronous brain and muscle electricity
CN107736894A (en) A kind of electrocardiosignal Emotion identification method based on deep learning
CN109598222B (en) EEMD data enhancement-based wavelet neural network motor imagery electroencephalogram classification method
CN112466326A (en) Speech emotion feature extraction method based on transform model encoder
Montazerin et al. ViT-HGR: Vision transformer-based hand gesture recognition from high density surface EMG signals
Godoy et al. Electromyography-based, robust hand motion classification employing temporal multi-channel vision transformers
CN113158964A (en) Sleep staging method based on residual learning and multi-granularity feature fusion
CN113180659A (en) Electroencephalogram emotion recognition system based on three-dimensional features and cavity full convolution network
CN113128353B (en) Emotion perception method and system oriented to natural man-machine interaction
Huang et al. Classify motor imagery by a novel CNN with data augmentation
CN110598628A (en) Electromyographic signal hand motion recognition method based on integrated deep learning
Roy et al. Hand movement recognition using cross spectrum image analysis of EMG signals-A deep learning approach
Ye et al. Attention bidirectional LSTM networks based mime speech recognition using sEMG data
CN111950460A (en) Muscle strength self-adaptive stroke patient hand rehabilitation training action recognition method
CN116628420A (en) Brain wave signal processing method based on LSTM neural network element learning
CN116225222A (en) Brain-computer interaction intention recognition method and system based on lightweight gradient lifting decision tree
CN114743569A (en) Speech emotion recognition method based on double-layer fusion deep network
CN113642528B (en) Hand movement intention classification method based on convolutional neural network
Ye et al. Upper Limb Motion Recognition Using Gated Convolution Neural Network via Multi-Channel sEMG
CN111883178B (en) Double-channel voice-to-image-based emotion recognition method
CN114343679A (en) Surface electromyogram signal upper limb action recognition method and system based on transfer learning
Bo et al. Hand gesture recognition using semg signals based on cnn

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210917