CN115083566A - Motion intention identification method based on dual-stream Transformer encoder and multi-head attention mechanism


Info

Publication number: CN115083566A
Application number: CN202210762625.6A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 张文利, 赵庭松, 王宇飞, 张健一, 刘嘉铭, 王天语
Current and original assignee: Beijing University of Technology
Application filed by Beijing University of Technology; priority to CN202210762625.6A
Legal status: Pending

Classifications

    • G16H 20/30: ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, steering therapy or monitoring patient compliance, relating to physical therapies or activities, e.g. physiotherapy, acupressure or exercising
    • A61B 5/369: Electroencephalography [EEG]
    • A61B 5/389: Electromyography [EMG]
    • A61B 5/7203: Signal processing specially adapted for physiological signals or for diagnostic purposes, for noise prevention, reduction or removal
    • A61B 5/7207: Removal of noise induced by motion artifacts
    • A61B 5/7264: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B 5/7267: Classification involving training the classification device
    • G06N 3/08: Learning methods for neural networks


Abstract

The invention discloses a motion intention identification method based on a dual-stream Transformer encoder and a multi-head attention mechanism, comprising the following steps: acquiring multiple groups of sample information collected by a wearable device worn by a stroke patient, wherein each group of sample information comprises a sample electromyographic signal, an inertial measurement signal and/or a sample electroencephalographic signal; establishing a motion intention recognition model based on a dual-stream Transformer encoder and a multi-head attention mechanism from each group of sample information; and determining the motion intention of the stroke patient based on the motion intention recognition model. The motion intention recognition network comprises a dual-stream Transformer encoder, a long-short sequence feature cross-attention module, a multi-scale feature fusion module and a motion intention classification module; the dual-stream Transformer encoder comprises a multi-head attention mechanism. Also disclosed are a motion intention recognition system based on a dual-stream Transformer encoder and a multi-head attention mechanism, an application of the motion intention recognition method in mirror therapy and/or assisted therapy of stroke patients, an electronic device, and a computer-readable storage medium.

Description

Motion intention identification method based on dual-stream Transformer encoder and multi-head attention mechanism
Technical Field
The invention relates to the technical fields of computer virtual reality and intelligent rehabilitation, and in particular to a motion intention identification method based on a dual-stream Transformer encoder and a multi-head attention mechanism.
Background
Stroke, also called apoplexy or cerebrovascular accident, is an acute cerebrovascular disease: a group of diseases in which brain tissue is damaged because a cerebral vessel suddenly ruptures or becomes blocked so that blood cannot reach the brain. It comprises ischemic stroke and hemorrhagic stroke; the incidence of ischemic stroke is the higher of the two, accounting for 60%-70% of all strokes. Occlusion and stenosis of the internal carotid and vertebral arteries can cause ischemic stroke, mostly in people over 40 years old and more often in men than in women, and severe cases can be fatal; the mortality of hemorrhagic stroke is higher still. Surveys show that stroke has become the leading cause of death in both urban and rural China, as well as the leading cause of adult disability in China. Stroke is thus characterized by high morbidity, high mortality and high disability rates. Wrist varus is a common clinical manifestation of stroke: the patient's arm muscles atrophy and the hand loses its grasping function, making everyday operation very inconvenient.
Motion intention is the key to accurately tracking the movement of the human upper limb and ultimately realizing mirror therapy for the upper limb. Although the exoskeleton field has made some progress on motion intention in recent years, the technology is still not mature. The core of motion intention recognition is to acquire, from a stroke patient, the time series of signals such as myoelectricity produced by the incomplete action at the current moment, and to infer the action the patient expects to perform, so as to guide the affected hand through rehabilitation movement according to that intention. Current methods for recognizing human motion intention fall mainly into intention recognition based on mechanical information and intention recognition based on bioelectrical information. Recognition from mechanical information, however, suffers from relatively serious hysteresis, since the information is only available after the user starts moving; it cannot directly reflect the person's intention, and flexible control is difficult to achieve. Recognition from bioelectrical information must comprehensively account for the influence of the user's muscle state on the myoelectric signal after long use, since continuous movement causes decreased muscle contractility, sweating on the skin surface and similar problems that lower the accuracy of intention prediction; for this reason, researchers have begun applying machine learning methods to motion intention recognition. For recognizing whole-body motion intention, sensors for acceleration, angular velocity, pressure and the like are usually worn on the human body and the exoskeleton to acquire physiological signals, from which the coming form of movement is predicted so as to control the exoskeleton robot. In research on rehabilitation exercise of the human lower limb there is, for example, the literature "Mundt Marion, Koeppe Arnd, David Sina, Bamer Franz, Potthast Wolfgang, Markert Bernd. Prediction of ground reaction force and joint moments based on optical motion capture data during gait [J]". However, the network structure of such methods is simple and extracts features poorly where the bioelectric signal changes sharply, so the prediction accuracy is generally low.
Therefore, the prior art lacks a mature, applicable technical scheme that applies a recognition model to identify motion intention in mirror rehabilitation therapy for severe stroke patients and in assisted therapy for moderate and mild patients.
Disclosure of Invention
In order to solve the above problems in the prior art, the invention provides a motion intention identification method based on a dual-stream Transformer encoder and a multi-head attention mechanism.
A first aspect of the invention provides a motion intention identification method based on a dual-stream Transformer encoder and a multi-head attention mechanism, comprising the following steps:
S1, acquiring multiple groups of sample information collected by a wearable device worn by a stroke patient, wherein each group of sample information comprises a sample electromyographic signal, an inertial measurement signal and/or a sample electroencephalographic signal;
S2, establishing a motion intention recognition model based on a dual-stream Transformer encoder and a multi-head attention mechanism from each group of sample information;
S3, determining the motion intention of the stroke patient based on the motion intention recognition model;
wherein the establishing in S2 of a motion intention recognition model based on a dual-stream Transformer encoder and a multi-head attention mechanism from each group of sample information comprises:
S21, preprocessing the sample information to obtain the first part of data set data required for establishing the motion intention recognition model;
s22, performing data set expansion on the first part of data set data to obtain second part of data set data, and combining the first part of data set data and the second part of data set data to form sample data set data;
S23, establishing a motion intention recognition network based on a dual-stream Transformer encoder and a multi-head attention mechanism, the motion intention recognition network comprising the dual-stream Transformer encoder, a long-short sequence feature cross-attention module, a multi-scale feature fusion module and a motion intention classification module, wherein the dual-stream Transformer encoder comprises a multi-head attention mechanism;
and S24, inputting the sample training data set into the motion intention recognition network for training and learning to obtain the motion intention recognition model.
Preferably, the wearable device is a myoelectric acquisition sensor, an inertial measurement sensor and/or an electroencephalogram acquisition sensor.
Preferably, the preprocessing of S21 comprises noise reduction, normalization, taking absolute values, and data segmentation, wherein:
S211, the noise reduction comprises filtering out the noise caused by power-frequency interference, motion artifacts and/or multi-channel crosstalk in the original electromyographic signal, thereby obtaining noise-filtered sample information;
S212, the normalization comprises scaling the noise-filtered sample information to a range conducive to model training, obtaining a normalized electromyographic signal;
S213, the taking absolute values comprises: taking the absolute value of every sequence of the normalized electromyographic signal;
S214, the data segmentation comprises: cutting the whole sequence of the normalized, absolute-valued electromyographic signal into a plurality of sample timing windows and using these sample timing windows as data set data.
Preferably, performing data set expansion on the first part of data set data in S22 to obtain the second part of data set data comprises:
S221, random windowing: performing random window extraction on the first part of data set data to obtain the random-window sample data of the second part of data set data, comprising: randomly selecting the starting point of a window within each class of action sequence and determining the end point from the window length, thereby obtaining an electromyographic timing window; and performing random window sampling on all sequences of the normalized, absolute-valued sample information based on the electromyographic timing window to obtain the random-window sample data of the second part of data set data;
S222, time-delay signal enhancement: performing time-delay signal enhancement on the first part of data set data to obtain the delay-enhanced sample data of the second part of data set data, comprising: randomly selecting and deleting a run of sampling points within one of the sample timing windows of S214; selecting the same number of sampling points from the moments immediately following that window and appending them at the window's tail to form a delay-enhanced timing window; and performing time-delay signal enhancement on all sequences of the normalized, absolute-valued sample information based on the delay-enhanced timing window to obtain the delay-enhanced sample data of the second part of data set data;
S223, merging the random-window sample data and the delay-enhanced sample data to obtain the second part of data set data; and combining the first part of data set data and the second part of data set data into the sample data set data through this data enhancement and merging, thereby effectively expanding the data volume of the sample data set.
Preferably, S22 may instead include only the random windowing of S221 or only the time-delay signal enhancement of S222, in which case S223 is not performed.
Preferably, the dual-stream Transformer encoder comprises a channel attention module, a long sequence slice transformation module, a short sequence slice transformation module, a multi-head attention mechanism module and a feedforward neural network module;
the long-short sequence feature cross-attention module is used to simultaneously learn the identification information of the long-sequence branch and the short-sequence branch obtained by the long sequence slice transformation module and the short sequence slice transformation module;
the multi-scale feature fusion module is used to fuse the identification information of the long-sequence branch and the short-sequence branch output by the long-short sequence feature cross-attention module and to output multi-scale fusion features;
and the motion intention classification module is used to classify the multi-scale fusion features into motion intentions using full connection, obtaining the motion intention output result.
Preferably, establishing the dual-stream Transformer encoder comprises:
S231, establishing the channel attention module, comprising: computing the relations among the channels from the timing characteristics of the sample information and the spatial characteristics of the multi-channel electromyographic signal, learning the importance of each channel's spatially distributed signal features for rehabilitation action recognition, and adaptively adjusting each channel's recognition weight so that the sample timing windows passing through the channel attention module form channel attention;
S232, establishing the long sequence slice transformation module and the short sequence slice transformation module, comprising: slicing the sample timing windows that formed channel attention according to the number of sampling points within a certain time, forming long-sequence slices with more sampling points and short-sequence slices with fewer sampling points; and converting each long-sequence slice and each short-sequence slice into a long-sequence slice one-dimensional vector and a short-sequence slice one-dimensional vector through the long sequence slice module and the short sequence slice module respectively;
S233, establishing the multi-head attention mechanism module;
S234, establishing the feedforward neural network, which consists of several fully connected layers; a first residual connection and normalization module is arranged between the feedforward neural network and the multi-head attention mechanism module, and a second residual connection and normalization module is arranged between the feedforward neural network and the long-short sequence feature cross-attention module.
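As an illustration of S231-S234, the following is a minimal PyTorch sketch of a channel attention module and one encoder layer. The layer sizes, head count, reduction ratio and the squeeze-and-excitation style of the channel attention are assumptions of this sketch, not the patent's reference implementation.

    # Illustrative sketch only; module designs and hyperparameters are assumed.
    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        # Squeeze-and-excitation style reweighting of EMG channels (assumed design).
        def __init__(self, channels, reduction=2):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels), nn.Sigmoid())

        def forward(self, x):              # x: (batch, channels, time)
            w = self.fc(x.mean(dim=2))     # per-channel importance weights
            return x * w.unsqueeze(2)      # adaptively reweight each channel

    class EncoderLayer(nn.Module):
        # One encoder layer: multi-head attention and a feedforward network,
        # each followed by a residual connection and normalization (S234).
        def __init__(self, dim, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                    nn.Linear(4 * dim, dim))
            self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

        def forward(self, x):              # x: (batch, n_slices, dim)
            x = self.norm1(x + self.attn(x, x, x)[0])   # first Add & Norm
            return self.norm2(x + self.ff(x))           # second Add & Norm

Under these assumptions, the dual-stream design would instantiate two such stacks, one over the long-sequence slices and one over the short-sequence slices.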
Preferably, each head of the multi-head attention mechanism in S233 is computed as in formula (1), and through this attention computation the correlation between slices in the same layer can be learned:

Attention(Q, K, V) = softmax(Q·K^T / √d)·V (1)

wherein Attention(Q, K, V) is the attention transformation, Q, K, V are respectively the query matrix, key matrix and value matrix, d is the row-vector dimension of the matrices, and Softmax is the normalized exponential function. The Softmax function is a single-layer neural network, the generalization of the two-class sigmoid function to multiple classes, used to present the multi-class result in probability form. Softmax converts prediction scores ranging over (-∞, +∞) into probabilities in two steps: first, each score is passed through the exponential function, which guarantees the non-negativity of the probabilities; second, to make the probabilities sum to 1, the exponentiated results are normalized, each being divided by the sum of all of them, so that each result's share of the total is taken as its approximate probability.
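For concreteness, a small NumPy sketch of formula (1) and the two-step Softmax computation described above; the matrix sizes in the toy example are assumptions:

    import numpy as np

    def softmax(z):
        # Step 1: exponentiate, so scores in (-inf, +inf) become non-negative
        # (the max is subtracted only for numerical stability).
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        # Step 2: normalize so each row sums to 1, giving approximate probabilities.
        return e / e.sum(axis=-1, keepdims=True)

    def attention(Q, K, V):
        # Formula (1): Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V
        d = Q.shape[-1]                    # row-vector dimension of the matrices
        scores = Q @ K.T / np.sqrt(d)      # scaled slice-to-slice correlations
        return softmax(scores) @ V

    # Toy example: 5 slices of dimension 8 (sizes assumed).
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
    print(attention(Q, K, V).shape)        # (5, 8)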
Preferably, the multi-head attention mechanism module comprises:
the multi-head slice forming module, used to construct a long-sequence slice matrix and a short-sequence slice matrix from the long-sequence and short-sequence slice one-dimensional vectors and to input them into the slice matrix transformation module;
the slice matrix transformation module: used to linearly transform the long-sequence and short-sequence slice matrices into a query matrix Q, a key matrix K and a value matrix V, obtain the converted query matrix Q', key matrix K' and value matrix V' through a fully connected layer, and input them into the single-head attention transformation module;
the single-head attention transformation module: used to obtain several single-head attention matrices from the converted query matrix Q', key matrix K' and value matrix V';
the multi-head attention fusion module: used to splice the several single-head attention matrices into a multi-head attention matrix, then compress the multi-head attention matrix and output the compressed multi-head attention matrix X' or Y'.
The multi-head slice forming module is used to receive the n short-sequence slice one-dimensional vectors a_1, a_2, …, a_n output by the short sequence slice module, or the L long-sequence slice one-dimensional vectors b_1, b_2, …, b_L output by the long sequence slice module; to construct the short-sequence slice matrix X = [a_1, a_2, …, a_n] or the long-sequence slice matrix Y = [b_1, b_2, …, b_L]; and to output the short-sequence slice matrix X or the long-sequence slice matrix Y to the slice matrix transformation module.
The slice matrix transformation module is used to receive the short-sequence slice matrix X or the long-sequence slice matrix Y and to obtain the query matrix Q, key matrix K and value matrix V through linear transformation, as shown in formulas (11), (12) and (13);
for the short-sequence slice matrix X:

Q = W_q·X + b_q (11);
K = W_k·X + b_k (12);
V = W_v·X + b_v (13);

or the same operations as formulas (11), (12), (13) are performed on the long-sequence slice matrix Y;
wherein W_q, W_k, W_v are the learnable parameter matrices of each attention mechanism and b_q, b_k, b_v are the matrix biases; the parameter matrices and matrix biases are updated by optimization during model training;
after Q, K and V are obtained, the converted query matrix Q', key matrix K' and value matrix V' are obtained through a fully connected layer and output to the single-head attention transformation module for constructing a single head of the multi-head attention mechanism.
The single-head attention transformation module is used to receive the converted query matrix Q', key matrix K' and value matrix V' output by the slice matrix transformation module, and then processes them as follows: first, the converted key matrix K' is transposed and dot-multiplied with the converted query matrix Q'; the dot product is then divided by √d, the square root of the matrix row-vector dimension d; finally, the result is normalized by the Softmax function and multiplied by the converted value matrix V' to obtain the output matrix head containing single-head attention information;
the single-head attention is computed as in formula (1'):

head = Attention(Q', K', V') = softmax(Q'·K'^T / √d)·V' (1');

wherein head is the output matrix containing single-head attention information, Attention(Q', K', V') is the single-head attention transformation, Q', K', V' are respectively the converted query matrix, key matrix and value matrix with identical dimensions, and d is the row-vector dimension of the matrices;
when there are h single-head attention modules, h single-head attention output matrices head_1, head_2, …, head_i, …, head_h are obtained, as in formula (14):

head_i = Attention(Q'_i, K'_i, V'_i), i = 1, …, h (14);

wherein Q'_i, K'_i, V'_i are the converted query, key and value matrices of the i-th head, each obtained from that head's own parameter matrices as in formulas (11), (12), (13);
the h single-head attention matrices are output to the multi-head attention fusion module.
The multi-head attention fusion module is used to receive the h single-head attention output matrices head_1, head_2, …, head_i, …, head_h and to splice them into a matrix containing multi-head attention information, as shown in formula (15):

MultiHead(Q', K', V') = Concat(head_1, …, head_h) (15);

the Concat function connects several matrices along a specified axis to form a spliced matrix;
the multi-head attention matrix MultiHead(Q', K', V') is then compressed to obtain the compressed multi-head attention matrix X' or Y', which is output to the first residual connection and normalization module for processing; the compression consists of compressing the multi-head attention matrix with a fully connected layer so that its dimensions match those of a single-head attention matrix.
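Putting the above together, here is a hedged PyTorch sketch of the multi-head pipeline: the linear Q/K/V transforms of formulas (11)-(13), h single-head attentions per formula (1'), splicing per formula (15), and compression back to the single-head dimension. The head count and dimensions are illustrative assumptions, not values fixed by the patent.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiHeadAttention(nn.Module):
        def __init__(self, dim, heads=4):
            super().__init__()
            self.h, self.d = heads, dim
            # One set of learnable maps per head, realizing Q = Wq X + bq,
            # K = Wk X + bk, V = Wv X + bv (formulas (11)-(13)).
            self.Wq = nn.ModuleList(nn.Linear(dim, dim) for _ in range(heads))
            self.Wk = nn.ModuleList(nn.Linear(dim, dim) for _ in range(heads))
            self.Wv = nn.ModuleList(nn.Linear(dim, dim) for _ in range(heads))
            # Compression layer: maps the spliced heads back to dimension dim.
            self.compress = nn.Linear(heads * dim, dim)

        def forward(self, X):              # X: (n_slices, dim) slice matrix
            heads = []
            for i in range(self.h):
                Q, K, V = self.Wq[i](X), self.Wk[i](X), self.Wv[i](X)
                # Formula (1'): head = softmax(Q' K'^T / sqrt(d)) V'
                A = F.softmax(Q @ K.transpose(-2, -1) / self.d ** 0.5, dim=-1)
                heads.append(A @ V)
            multi = torch.cat(heads, dim=-1)  # formula (15): Concat(head_1..head_h)
            return self.compress(multi)       # compressed matrix X' (or Y')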
A second aspect of the invention provides a motion intention recognition system based on a dual-stream Transformer encoder and a multi-head attention mechanism, comprising:
the sample acquisition module, used to acquire multiple groups of sample information collected by a wearable device worn by a stroke patient, each group comprising a sample electromyographic signal, an inertial measurement signal and/or a sample electroencephalographic signal;
the model establishing module, used to establish a motion intention recognition model based on a dual-stream Transformer encoder and a multi-head attention mechanism from each group of sample information;
and the motion intention recognition module, used to determine the motion intention of the stroke patient based on the motion intention recognition model.
A third aspect of the invention provides an application of the motion intention identification method based on a dual-stream Transformer encoder and a multi-head attention mechanism in mirror therapy and/or assisted therapy of stroke patients.
A fourth aspect of the invention provides an electronic device comprising a processor and a memory, the memory storing a plurality of instructions, the processor being configured to read the instructions and to perform the method according to the first aspect.
A fifth aspect of the invention provides a computer readable storage medium storing a plurality of instructions readable by a processor and performing the method of the first aspect.
The motion intention identification method, system, application, electronic device and computer-readable storage medium based on a dual-stream Transformer encoder and a multi-head attention mechanism have the following beneficial technical effects:
sample signals from the patient's wearable device are collected; after preprocessing, every signal is cut into sample timing windows used as data set data, and data enhancement is applied to the data set to expand the training samples. The sample signals are sliced into long-term and short-term sequences and input separately into the dual-stream Transformer encoder, fully extracting the information of both the long and the short sequence of the signal. Finally, the intention classification module yields the patient's motion intention, achieving motion intention recognition with high accuracy.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in describing them are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1(a) is a flowchart of the motion intention identification method based on a dual-stream Transformer encoder and a multi-head attention mechanism according to the invention; FIG. 1(b) is a flowchart of establishing the motion intention recognition model based on a dual-stream Transformer encoder and a multi-head attention mechanism according to the invention.
FIG. 2 is a schematic diagram of the structure of the motion intention recognition network based on a dual-stream Transformer encoder and a multi-head attention mechanism according to the invention.
FIG. 3 is a schematic diagram of the multi-head attention mechanism module according to the invention.
FIG. 4 is a structural diagram of the multi-head attention mechanism according to the invention.
FIG. 5 is a schematic architecture diagram of the motion intention recognition system based on a dual-stream Transformer encoder and a multi-head attention mechanism according to the invention.
Fig. 6 is a schematic structural diagram of an electronic device according to the present invention.
Detailed Description
In order to better understand the technical solution, it is described in detail below with reference to the drawings and specific embodiments.
The method provided by the invention can be implemented in the following terminal environment, where the terminal can comprise one or more of the following components: a processor, a memory, and a display screen. The memory stores at least one instruction that is loaded and executed by the processor to implement the methods described in the embodiments below.
A processor may include one or more processing cores. The processor connects the parts of the terminal using various interfaces and lines, and performs the terminal's functions and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory and calling the data stored in the memory.
The memory may include Random Access Memory (RAM) or Read-Only Memory (ROM). The memory may be used to store instructions, programs, code sets or instruction sets.
The display screen is used for displaying user interfaces of all the application programs.
In addition, those skilled in the art will appreciate that the above-described terminal configurations are not intended to be limiting, and that the terminal may include more or fewer components, or some components may be combined, or a different arrangement of components. For example, the terminal further includes a radio frequency circuit, an input unit, a sensor, an audio circuit, a power supply, and other components, which are not described herein again.
Example one
Referring to fig. 1(a), a motion intention identification method based on a dual-stream Transformer encoder and a multi-head attention mechanism includes:
S1, acquiring multiple groups of sample information collected by a wearable device worn by a stroke patient, wherein each group of sample information comprises a sample electromyographic signal, an inertial measurement signal and/or a sample electroencephalographic signal;
S2, establishing a motion intention recognition model based on a dual-stream Transformer encoder and a multi-head attention mechanism from each group of sample information;
S3, determining the motion intention of the stroke patient based on the motion intention recognition model.
Referring to fig. 1(b) and fig. 2, establishing in S2 a motion intention recognition model based on a dual-stream Transformer encoder and a multi-head attention mechanism from each group of sample information includes:
S21, preprocessing the sample information to obtain the first part of data set data required for establishing the motion intention recognition model;
S22, performing data set expansion on the first part of data set data to obtain the second part of data set data, and combining the first part of data set data and the second part of data set data to form the sample data set data;
S23, establishing a motion intention recognition network based on a dual-stream Transformer encoder and a multi-head attention mechanism, the motion intention recognition network comprising the dual-stream Transformer encoder, a long-short sequence feature cross-attention module, a multi-scale feature fusion module and a motion intention classification module, wherein the dual-stream Transformer encoder comprises a multi-head attention mechanism;
and S24, forming a sample training data set from the first part of data set data and the second part of data set data, and inputting the sample training data set into the motion intention recognition network for training and learning to obtain the motion intention recognition model.
In a preferred embodiment, the wearable device is a myoelectric acquisition sensor, an inertial measurement sensor and/or an electroencephalogram acquisition sensor. In this embodiment, the wearable device is a myoelectric acquisition sensor, fixed at the corresponding positions on the healthy-side hand of the stroke patient.
As a preferred embodiment, the preprocessing of S21 includes noise reduction, normalization, taking absolute values and data segmentation, wherein:
S211, the noise reduction includes: setting the filter type and coefficients and a blind source separation method according to the type of the sample information; and filtering out the noise caused by power-frequency interference, motion artifacts and/or multi-channel crosstalk in the original electromyographic signal based on the filter and the blind source separation method, thereby obtaining noise-filtered sample information;
S212, the normalization includes: scaling the noise-filtered sample information to a range conducive to model training based on Z-score or max-min normalization, obtaining a normalized electromyographic signal;
S213, the taking absolute values includes: taking the absolute value of every sequence of the normalized electromyographic signal. Step S213 is performed because the amplitude of each action segment of the normalized myoelectric signal may be positive or negative, yet muscle contraction is expressed regardless of sign; without absolute-value processing, useful information could cancel out.
S214, the data segmentation includes: cutting the whole sequence of the normalized, absolute-valued sample information into several sample timing windows and using them as data set data. In this embodiment the sample data is an electromyographic signal: a time series collected as one long sequence at the sampling rate of the acquisition device. Because over-long data cannot be trained and recognized directly, the whole normalized, absolute-valued electromyographic sequence is cut into several electromyographic timing windows, which are taken as the original electromyographic timing windows and used as data set data. The data set data is subsequently enhanced to obtain the training data for training the motion intention recognition model, the test data for testing the model, and so on.
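A minimal NumPy/SciPy sketch of the S211-S214 chain follows; the notch-filter parameters, sampling rate and window length are assumptions for illustration, not values taken from the patent:

    import numpy as np
    from scipy.signal import iirnotch, filtfilt

    def preprocess(emg, fs=1000, win=200):
        # S211 noise reduction: 50 Hz notch filter against power-frequency
        # interference (filter choice and parameters are assumed).
        b, a = iirnotch(w0=50, Q=30, fs=fs)
        emg = filtfilt(b, a, emg, axis=-1)
        # S212 normalization: Z-score per channel.
        mu = emg.mean(axis=-1, keepdims=True)
        sigma = emg.std(axis=-1, keepdims=True)
        emg = (emg - mu) / (sigma + 1e-8)
        # S213 absolute value: muscle contraction is expressed regardless of sign.
        emg = np.abs(emg)
        # S214 data segmentation: cut the whole sequence into fixed-length windows.
        n = emg.shape[-1] // win
        return np.stack([emg[..., i * win:(i + 1) * win] for i in range(n)])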
As a preferred embodiment, performing data set expansion on the first part of data set data in S22 to obtain the second part of data set data includes performing random window extraction and time-delay signal enhancement on the first part of data set data to obtain, respectively, the random-window sample data and the delay-enhanced sample data of the second part of data set data, and merging the two to obtain the second part of data set data.
In a preferred embodiment, the data set expansion of S22 includes:
S221, random windowing, which includes: randomly selecting the starting point of a window within each class of action sequence and determining the end point from the window length, thereby obtaining an electromyographic timing window; and performing random window sampling on all sequences of the normalized, absolute-valued sample information based on that window to obtain the random-window sample data of the second part of data set data. For the electromyographic signals of this embodiment, the purpose of S221 is to obtain electromyographic timing windows that cannot be produced in S214, increasing the sample diversity of the data set.
S222, time-delay signal enhancement, which includes: randomly selecting and deleting a run of sampling points within one of the sample timing windows of S214; selecting the same number of sampling points from the moments immediately following that window and appending them at the window's tail to form a delay-enhanced timing window; and performing time-delay signal enhancement on the whole sequence of the normalized, absolute-valued sample information based on that window to obtain the delay-enhanced sample data of the second part of data set data. S222 is implemented because data omission can occur inside the myoelectric acquisition sensor's components or during transmission and reception; the enhancement therefore increases the sample size and the robustness of the system.
S223, merging the random-window sample data and the delay-enhanced sample data of the second part of data set data to obtain the second part of data set data.
Merging the first part of data set data and the second part of data set data into the sample data set data is realized through this data enhancement and merging, effectively expanding the data volume of the sample data set.
It should be noted that S22 may instead include only the random windowing of S221 or only the time-delay signal enhancement of S222, in which case S223 is not performed. All three data expansion schemes (S221 alone, S222 alone, and S221-S223 together) therefore fall within the protection scope of the invention and expand the data volume of the sample data set to different degrees.
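An illustrative NumPy sketch of the two expansion operations of S221 and S222; the window and deletion lengths are assumptions:

    import numpy as np

    rng = np.random.default_rng()

    def random_window(sequence, win=200):
        # S221: randomly choose a window start within the action sequence;
        # the end point follows from the window length.
        start = rng.integers(0, sequence.shape[-1] - win)
        return sequence[..., start:start + win]

    def delay_enhance(sequence, win_start, win=200, drop=20):
        # S222: delete a random run of samples inside one window, then append
        # the same number of samples taken just after the window's end,
        # imitating data omission during acquisition or transmission.
        window = sequence[..., win_start:win_start + win]
        cut = rng.integers(0, win - drop)
        kept = np.concatenate([window[..., :cut], window[..., cut + drop:]],
                              axis=-1)
        tail = sequence[..., win_start + win:win_start + win + drop]
        return np.concatenate([kept, tail], axis=-1)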
Referring again to fig. 2, "×M" and "×N" indicate that the structure inside the dotted line is repeated M and N times to build a deep encoder that extracts deeper features. On this basis, the long-short sequence feature cross-attention module is used to simultaneously learn the identification information of the long-sequence branch and the short-sequence branch obtained by the long sequence slice transformation module and the short sequence slice transformation module. Specifically, in this embodiment the identification information corresponds to the short-sequence features and long-sequence features shown in fig. 2, so the long-short sequence feature cross-attention module simultaneously learns the long-sequence features output by the long-sequence branch and the short-sequence features output by the short-sequence branch constructed by the two slice transformation modules.
The multi-scale feature fusion module is used to fuse the identification information learned by the long-sequence branch and by the short-sequence branch of the cross-attention module and to output multi-scale fusion features. In this embodiment, the specific method is to use the CLS token (classification token) of each branch as a proxy that first exchanges information with the patch tokens (sequence slices) of the other branch and is then projected back to its own branch. Since the CLS token has already learned abstract information across all patch tokens of its own branch, interacting with the patch tokens of the other branch helps fuse information of different scales. After fusing with the other branch's tokens, the CLS token interacts again with its own patch tokens in the next Transformer encoder layer; in this step it passes the information learned from the other branch to its own patch tokens, enriching the feature representation of each patch token.
The motion intention classification module: classifies the multi-scale fusion features into motion intentions using full connection, obtaining the motion intention output result.
As a preferred embodiment, the dual-stream Transformer encoder includes a channel attention module, a long sequence slice transformation module, a short sequence slice transformation module, a multi-head attention mechanism module and a feedforward neural network module.
Establishing the dual-stream Transformer encoder includes the following steps:
S231, establishing the channel attention module, including: computing the relations among the channels from the timing characteristics of the sample information and the spatial characteristics of the multi-channel electromyographic signal, where one channel corresponds to one electromyographic sensor and the sensors are distributed over different muscle groups; learning the importance of each channel's spatially distributed signal features for rehabilitation action recognition; and adaptively adjusting each channel's recognition weight so that the sample timing windows passing through the channel attention module form channel attention. In this way the motion intention recognition network can better extract the action information contained in the myoelectricity.
S232, establishing the long sequence slice transformation module and the short sequence slice transformation module, including: slicing the sample timing windows that formed channel attention according to the number of sampling points within a certain time, forming long-sequence slices with more sampling points and short-sequence slices with fewer; and converting each long-sequence and short-sequence slice into a long-sequence slice one-dimensional vector and a short-sequence slice one-dimensional vector through the long sequence slice module and the short sequence slice module respectively. In this embodiment, the collected electromyographic signals cover rehabilitation gestures set to different difficulties for different degrees of mobility of the affected hand. Complex rehabilitation gestures depend more on the variation features of long-duration electromyographic sequences, while simple gestures depend more on those of short-duration sequences; extracting long-sequence features is redundant for recognizing simple gestures, and short-sequence features alone are insufficient for recognizing complex gestures, so slicing long and short sequences simultaneously benefits the recognition of all gestures.
S233, establishing the multi-head attention mechanism module.
Referring to fig. 3, the multi-head attention mechanism module includes the following components:
Multi-head slice forming module S1: used to receive the n short-sequence slice one-dimensional vectors a_1, a_2, …, a_n output by the short sequence slice module, or the L long-sequence slice one-dimensional vectors b_1, b_2, …, b_L output by the long sequence slice module; to construct the short-sequence slice matrix X = [a_1, a_2, …, a_n] or the long-sequence slice matrix Y = [b_1, b_2, …, b_L]; and to output the short-sequence slice matrix X or the long-sequence slice matrix Y to the slice matrix transformation module S2.
Slice matrix transformation module S2: receives the short-sequence slice matrix X or the long-sequence slice matrix Y and obtains the query matrix Q, key matrix K and value matrix V through linear transformation, as shown in formulas (11), (12) and (13);
for the short-sequence slice matrix X:

Q = W_q·X + b_q (11);
K = W_k·X + b_k (12);
V = W_v·X + b_v (13);

the same operations as formulas (11), (12), (13) are performed on the long-sequence slice matrix Y;
wherein W_q, W_k, W_v are the learnable parameter matrices of each attention mechanism and b_q, b_k, b_v are the matrix biases; the parameter matrices and biases are updated by optimization during model training so that the model's output approaches the correct motion intention;
after Q, K and V are obtained, the converted query matrix Q', key matrix K' and value matrix V' are obtained through a fully connected layer and output to the single-head attention transformation module S3 for constructing a single head of the multi-head attention mechanism.
Single-head attention transformation module S3: used to receive the converted query matrix Q', key matrix K' and value matrix V' output by the slice matrix transformation module S2. First, the converted key matrix K' is transposed and dot-multiplied with the converted query matrix Q'; the dot product is then divided by √d, the square root of the matrix row-vector dimension d, in order to scale the matrix values down to a size easy for the model to compute; finally, the result is normalized by the Softmax function and multiplied by the converted value matrix V' to obtain the output matrix head containing single-head attention information.
The single-head attention module S3 computes formula (1); its specific structure is the dashed portion S3 in fig. 3. Through this computation the correlation between the slices can be learned:

head = Attention(Q', K', V') = softmax(Q'·K'^T / √d)·V' (1);

wherein head is the output matrix containing single-head attention information, Attention(Q', K', V') is the single-head attention transformation, Q', K', V' are respectively the converted query matrix, key matrix and value matrix with identical dimensions, and d is the row-vector dimension of the matrices.
When there are h single-head attention modules, h single-head attention output matrices head_1, head_2, …, head_i, …, head_h are obtained, as in formula (14):

head_i = Attention(Q'_i, K'_i, V'_i), i = 1, …, h (14);

wherein Q'_i, K'_i, V'_i are the converted query, key and value matrices of the i-th head, each obtained from that head's own parameter matrices as in formulas (11), (12), (13);
the h single-head attention matrices are output to the multi-head attention fusion module S4 to obtain the short-sequence slices X' and long-sequence slices Y' carrying attention information.
The multi-head attention fusion module S4 includes the following functions:
(1) multi-head attention splicing: receiving the h single-head attention output matrices head_1, head_2, …, head_i, …, head_h and splicing them into a matrix containing multi-head attention information, as shown in formula (15); multi-head splicing effectively integrates the attention information of multiple dimensions:

MultiHead(Q', K', V') = Concat(head_1, …, head_h) (15);

the Concat function connects several matrices along a specified axis to form a spliced matrix;
(2) compressing the multi-head attention matrix MultiHead(Q', K', V') to obtain the compressed multi-head attention matrix X' or Y'; the compression consists of compressing the multi-head attention matrix with a fully connected layer so that its dimensions match those of a single-head attention matrix.
When building a model with attention, increasing the network depth allows more detailed features to be extracted, which is an effective way to improve model performance. As detailed above, each head of the multi-head attention mechanism in the Transformer is computed by formula (1):

Attention(Q, K, V) = softmax(Q·K^T / √d)·V (1);

wherein Attention(Q, K, V) is the attention transformation, Q, K, V are respectively the query matrix, key matrix and value matrix, d is the matrix row-vector dimension, and Softmax is the normalized exponential function: a single-layer neural network generalizing the two-class sigmoid to multiple classes, used to present the multi-class result in probability form. Softmax converts prediction scores ranging over (-∞, +∞) into probabilities in two steps: first, each score is passed through the exponential function, guaranteeing non-negativity; second, the exponentiated results are normalized, each being divided by the sum of all of them, so that the probabilities sum to 1 and each result's share of the total is its approximate probability.
(3) outputting the compressed multi-head attention matrix X' or Y' to the first residual connection and normalization module for processing; the establishment of this module is explained in detail below.
S234, establishing the feedforward neural network, which consists of several fully connected layers; a first residual connection and normalization module is arranged between the feedforward neural network and the multi-head attention mechanism module, and a second residual connection and normalization module is arranged between the feedforward neural network and the long-short sequence feature cross-attention module.
The two residual connection and normalization modules address the difficulties of training multilayer neural networks: each applies a weighted connection between the input and output of the preceding module and then normalizes the result, passing shallow information effectively to deeper layers and thereby countering the vanishing-gradient problem. Each comprises two parts, the residual connection Add and the normalization Norm:
(1) The residual connection Add passes part of the previous layer's information unchanged to the next layer, improving model performance and easing the training of multilayer networks. For some layers it cannot be determined whether their effect is positive; after adding a residual connection, the previous layer's information splits into two paths, one transformed by the layer and one passed directly to the next layer, and the two results are added as the lower layer's input, so at least the previous layer's information is preserved.
(2) The normalization Norm is layer normalization, which normalizes the layer's activation values to speed up the training process and obtain faster convergence. Normalization in this embodiment includes two methods: normalizing the same feature of different samples within a batch, or normalizing the different features of the same sample along the channel direction.
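A small sketch of the Add & Norm behavior described above; the sublayer stands in for either the multi-head attention or the feedforward network, and the sizes are assumed:

    import torch
    import torch.nn as nn

    dim = 64
    sublayer = nn.Linear(dim, dim)    # placeholder for attention or feedforward
    norm = nn.LayerNorm(dim)          # normalizes the features of each sample

    def add_and_norm(x):
        # Residual connection Add: part of the input bypasses the sublayer
        # unchanged, so shallow information still reaches the deeper layers.
        return norm(x + sublayer(x))

    x = torch.randn(8, 10, dim)       # (batch, slices, features), sizes assumed
    print(add_and_norm(x).shape)      # torch.Size([8, 10, 64])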
Example two
Referring to fig. 5, the second embodiment provides a motion intention recognition system based on a dual-stream Transformer encoder and a multi-head attention mechanism, including:
the sample acquisition module 101, used to acquire multiple groups of sample information collected by a wearable device worn by a stroke patient, each group comprising a sample electromyographic signal, an inertial measurement signal and/or a sample electroencephalographic signal;
the model establishing module 102, used to establish a motion intention recognition model based on a dual-stream Transformer encoder and a multi-head attention mechanism from each group of sample information;
and the motion intention recognition module 103, used to determine the motion intention of the stroke patient based on the motion intention recognition model.
A third aspect of the invention provides an application of the motion intention identification method based on a dual-stream Transformer encoder and a multi-head attention mechanism in mirror therapy and/or assisted therapy of stroke patients, wherein:
(I) Severe patients
Applicable objects are as follows: brunnstorm stage I (no voluntary movements (lag phase)), II (minimal flexion only) patients.
The characteristics of the patients are as follows: the affected hand has no random movement or only slight bending, and has no reliable myoelectric signal for expressing the movement intention.
The measures are as follows: mirror therapy, in which the electromyographic signals of the hand muscles on the healthy side are collected to recognize the action, thereby controlling the movement of the rehabilitation assistive device on the affected-side hand.
(II) moderate patients
Applicable objects are as follows: Brunnstrom stage III (mass grasp, hook grasp possible but without release, no finger extension), stage IV (lateral pinch and release of the thumb, semi-voluntary small-range finger extension) and stage V (spherical and cylindrical grasp, fingers extend together but not individually) patients.
The characteristics of the patients are as follows: the affected hand has only partial voluntary movement, but produces reliable electromyographic signals expressing motion intention.
The measures are as follows: assisted movement therapy, in which the electromyographic signals of the hand muscles on the affected side are collected according to specific rehabilitation actions to recognize gesture actions, thereby controlling the rehabilitation assistive device to perform the rehabilitation movement.
(III) mild patients
Applicable objects are as follows: Brunnstrom stage VI patients.
The characteristics of the patients are as follows: all grasping actions can be completed, but their speed and accuracy are inferior to those of the healthy side; at this stage the patient performs the actions independently.
The measures are as follows: assisted movement therapy, in which the electromyographic signals of the hand muscles on the affected side are collected according to specific rehabilitation actions to recognize gesture actions, thereby controlling the rehabilitation assistive device to perform the rehabilitation movement.
The present invention also provides a memory storing a plurality of instructions for implementing the method according to embodiment one.
As shown in fig. 6, the present invention further provides an electronic device, which includes a processor 501 and a memory 502 connected to the processor 501, where the memory 502 stores a plurality of instructions, and the instructions can be loaded and executed by the processor, so that the processor can execute the method according to the first embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (14)

1. A motion intention identification method based on a dual-stream Transformer encoder and a multi-head attention mechanism, characterized by comprising the following steps:
s1, acquiring multiple groups of sample information acquired by wearable equipment worn by a stroke patient, wherein each group of sample information comprises a sample electromyographic signal, an inertia measurement signal and/or a sample electroencephalographic signal;
s2, establishing a motion intention identification model based on a dual-stream Transformer encoder and a multi-head attention mechanism based on each group of sample information;
s3, determining the motion intention of the stroke patient based on the motion intention recognition model.
2. The method for recognizing motion intention based on the dual-stream Transformer encoder and the multi-head attention mechanism as claimed in claim 1, wherein the step S2 of building a motion intention recognition model based on the dual-stream Transformer encoder and the multi-head attention mechanism based on each set of sample information includes:
s21, preprocessing the sample information and obtaining first part data set data required by establishing a movement intention recognition model;
s22, performing data set expansion on the first part of data set data to obtain second part of data set data, and combining the first part of data set data and the second part of data set data to form sample data set data;
s23, establishing a motion intention identification network based on a dual-stream Transformer encoder and a multi-head attention mechanism; the motion intention identification network comprises the dual-stream Transformer encoder, a long and short sequence feature cross attention module, a multi-scale feature fusion module and a motion intention classification module; the dual-stream Transformer encoder comprises a multi-head attention mechanism;
and S24, inputting the sample data set into the movement intention recognition network for training and learning to obtain the movement intention recognition model.
3. The method for recognizing motion intention based on the dual-stream Transformer encoder and the multi-head attention mechanism according to claim 1, wherein the wearable device is an electromyography acquisition sensor, an inertial measurement sensor and/or an electroencephalography acquisition sensor.
4. The method of claim 2, wherein the preprocessing of the step S21 includes denoising, normalization, taking the absolute value, and data segmentation, wherein:
s211, the denoising comprises filtering noise caused by power-frequency interference, motion artifacts and/or multi-channel crosstalk in the original electromyographic signals, so as to obtain noise-filtered sample information;
s212, the normalization comprises scaling the noise-filtered sample information into a range suitable for model training, thereby obtaining a normalized electromyographic signal;
s213, taking the absolute value includes: taking the absolute values of all sequences of the normalized electromyographic signals;
s214, data segmentation: cutting the whole sequence of the normalized electromyographic signals after taking the absolute values into a plurality of sample timing windows, and taking the plurality of sample timing windows as the data set data (a code sketch of S211-S214 follows this claim).
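A minimal sketch of the S211-S214 preprocessing chain, assuming NumPy/SciPy, a 1 kHz sampling rate, a 50 Hz power-frequency notch filter, and a 200-point window advanced in 100-point steps; all of these parameter values are illustrative assumptions rather than values fixed by the claim:

import numpy as np
from scipy.signal import iirnotch, filtfilt

def preprocess_emg(emg: np.ndarray, fs: float = 1000.0,
                   win_len: int = 200, step: int = 100) -> list:
    # S211: suppress 50 Hz power-frequency interference with a notch filter.
    b, a = iirnotch(w0=50.0, Q=30.0, fs=fs)
    denoised = filtfilt(b, a, emg)
    # S212: scale the signal so its magnitude suits model training.
    normalized = denoised / (np.max(np.abs(denoised)) + 1e-8)
    # S213: take the absolute value of the whole sequence.
    rectified = np.abs(normalized)
    # S214: cut the sequence into sample timing windows (the data set data).
    return [rectified[s:s + win_len]
            for s in range(0, len(rectified) - win_len + 1, step)]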
5. The method of claim 4, wherein the step S22 of performing data set expansion on the first partial data set data to obtain second partial data set data comprises:
s221, random windowing: performing random window sampling on the first part of data set data to obtain random window sample data in the second part of data set data, including: randomly selecting a starting point of a window in each type of action sequence, and determining the termination point from the window length, so as to obtain an electromyographic timing window; and performing random window sampling on all sequences of the normalized sample information after taking the absolute value, based on the electromyographic timing window, to obtain the random window sample data in the second part of data set data;
s222, time delay signal enhancement: performing time delay signal enhancement on the first part of data set data to obtain time delay signal enhancement sample data in the second part of data set data, including: randomly selecting and deleting a section of sampling points of one of the plurality of sample timing windows in S214; selecting sampling points with the same number as the deleted sampling points at the next moment of one of the sample timing windows, and putting the sampling points into a window tail to form a time delay signal enhanced timing window; performing time delay signal enhancement on all sequences of the normalized sample information after the absolute value is taken based on the time delay signal enhancement time sequence window to obtain time delay signal enhancement sample data in the second part of data set data;
s223, merging the random window sample data and the time delay signal enhancement sample data to obtain the second part of data set data; and combining the first part of data set data and the second part of data set data to form the sample data set data based on data enhancement and merging, thereby effectively expanding the data volume of the sample data set (a sketch of the two strategies follows this claim).
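A minimal sketch of the two expansion strategies of S221-S222; the window and deletion lengths and the use of NumPy's random generator are assumptions of the sketch:

import numpy as np

def random_window(seq: np.ndarray, win_len: int,
                  rng: np.random.Generator) -> np.ndarray:
    # S221: randomly select the starting point; the termination point
    # follows from the window length.
    start = rng.integers(0, len(seq) - win_len + 1)
    return seq[start:start + win_len]

def delay_enhance(seq: np.ndarray, win_start: int, win_len: int,
                  drop_len: int, rng: np.random.Generator) -> np.ndarray:
    # S222: delete a randomly chosen span inside the window, then append
    # the same number of sampling points taken just after the window's end
    # (seq must extend at least drop_len points past the window).
    window = seq[win_start:win_start + win_len]
    cut = rng.integers(0, win_len - drop_len + 1)
    kept = np.concatenate([window[:cut], window[cut + drop_len:]])
    tail = seq[win_start + win_len:win_start + win_len + drop_len]
    return np.concatenate([kept, tail])

# Usage: rng = np.random.default_rng(0); w = random_window(signal, 200, rng)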
6. The method of claim 5, wherein the step S22 comprises performing only the random windowing of step S221 or only the time delay signal enhancement of step S222, and accordingly the step S223 is not performed.
7. The method of claim 4 for motion intention recognition based on the dual-stream Transformer encoder and the multi-head attention mechanism, wherein:
the dual-stream Transformer encoder comprises a channel attention module, a long sequence slice conversion module, a short sequence slice conversion module, a multi-head attention mechanism module and a feedforward neural network module;
the long and short sequence feature cross attention module is used for simultaneously learning the identification information of the long sequence branch and the short sequence branch obtained by the long sequence slice conversion module and the short sequence slice conversion module;
the multi-scale feature fusion module is used for fusing the identification information of the long sequence branch and the identification information of the short sequence branch which pass through the long and short sequence feature cross attention module, and outputting the multi-scale fusion features;
the motion intention classification module is used for performing motion intention classification on the multi-scale fusion features using a fully connected layer to obtain the motion intention output result.
8. The method of claim 7, wherein the establishing of the dual-stream Transformer encoder comprises:
s231, establishing a channel attention module, including: according to the timing characteristics of the sample information and the spatial characteristics of the multi-channel electromyographic signals, calculating the relations among the channels, learning the importance of each channel's signal features in the spatial distribution for rehabilitation action recognition, and adaptively adjusting the recognition weight of each channel, so that the sample timing window passing through the channel attention module forms channel attention;
s232, establishing a long sequence slice conversion module and a short sequence slice conversion module, including: slicing the sample timing window forming the channel attention according to the number of sampling points within a certain time, so as to form a long sequence with more sampling points and a short sequence with fewer sampling points; and converting each long sequence slice and short sequence slice into a long sequence slice one-dimensional vector and a short sequence slice one-dimensional vector through the long sequence slice module and the short sequence slice module, respectively (see the sketch after this claim);
s233, establishing a multi-head attention mechanism module;
and S234, establishing a feedforward neural network, wherein the feedforward neural network is composed of a plurality of fully connected layers, a first residual connection and normalization module is arranged between the feedforward neural network and the multi-head attention mechanism module, and a second residual connection and normalization module is arranged between the feedforward neural network and the long and short sequence feature cross attention module.
9. The method of claim 8, wherein each head of the multi-head attention mechanism in S233 is calculated according to formula (1), through which the correlation between the slices in the same layer can be learned:
Attention(Q, K, V) = Softmax(QK^T / √d) · V    (1)
wherein Attention(Q, K, V) is the multi-head attention mechanism, Q, K and V are respectively the query matrix, the key matrix and the value matrix, d is the row vector dimension of the matrices, and Softmax is the normalized exponential function. The Softmax function, a single-layer neural network, generalizes the two-class sigmoid function to multiple classes and presents the multi-class result in the form of probabilities. Softmax converts prediction results ranging from negative infinity to positive infinity into probabilities in two steps: first, the model's prediction results are passed through an exponential function, which guarantees the non-negativity of the probabilities; second, to ensure that the probabilities sum to 1, the converted results are normalized, i.e. each converted result is divided by the sum of all converted results, and its share of the total is taken as the approximate probability (see the sketch below).
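Formula (1) and the two-step Softmax described above can be sketched as follows; the batched tensor layout is an assumption of the sketch:

import torch

def attention(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    d = Q.size(-1)                               # row vector dimension d
    scores = Q @ K.transpose(-2, -1) / d ** 0.5  # QK^T divided by sqrt(d)
    # Softmax in two steps: exponentiate (non-negativity), then divide by
    # the sum so the weights form a probability distribution over slices.
    weights = torch.softmax(scores, dim=-1)
    return weights @ V                           # weighted sum of the values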
10. The method of claim 9, wherein the multi-head attention mechanism module comprises:
a multi-head slice forming module (S1) for constructing a long sequence slice matrix and a short sequence slice matrix based on the long sequence slice one-dimensional vector and the short sequence slice one-dimensional vector and inputting the long sequence slice matrix and the short sequence slice matrix to the slice matrix transformation module (S2);
the slice matrix transformation module (S2): for linearly transforming the long sequence slice matrix and the short sequence slice matrix to obtain a query matrix Q, a key matrix K and a value matrix V, obtaining the transformed query matrix Q', key matrix K' and value matrix V' through the fully connected layer, and outputting them to the single-head attention transformation module (S3);
the single-head attention transformation module (S3): for obtaining a plurality of single-head attention matrices based on the transformed query matrix Q', key matrix K' and value matrix V';
the multi-head attention fusion module (S4): for splicing the plurality of single-head attention matrices to construct a multi-head attention matrix, then compressing the multi-head attention matrix and outputting the compressed multi-head attention matrix X′ or Y′;
the multi-head slice forming module (S1) is used for receiving the n short sequence slice one-dimensional vectors a1, a2, …, an output by the short sequence slice module, or receiving the L long sequence slice one-dimensional vectors b1, b2, …, bL output by the long sequence slice module; constructing a short sequence slice matrix X = [a1, a2, …, an] or a long sequence slice matrix Y = [b1, b2, …, bL]; and outputting the short sequence slice matrix X or the long sequence slice matrix Y to the slice matrix transformation module (S2);
the slice matrix transformation module (S2) is configured to receive the short sequence slice matrix X or the long sequence slice matrix Y, and obtain a query matrix Q, a key matrix K, and a value matrix V through linear transformation, as shown in equations (11), (12), and (13);
for a short sequence slice matrix X, then:
Q = Wq·X + bq    (11);
K = Wk·X + bk    (12);
V = Wv·X + bv    (13);
or the same operations as the equations (11), (12), (13) are performed for the long-sequence slice matrix Y;
wherein Wq, Wk and Wv are the learnable parameter matrices of each attention mechanism and bq, bk and bv are the matrix biases; the parameter matrices and matrix biases are updated by optimization during model training;
after the values of Q, K and V are obtained, the transformed query matrix Q', key matrix K' and value matrix V' are obtained through a fully connected layer and output to the single-head attention transformation module (S3) for constructing a single head of the multi-head attention mechanism;
the single-head attention transformation module (S3) is configured to receive the transformed query matrix Q', key matrix K' and value matrix V' output by the slice matrix transformation module (S2), and then performs the following processing: first, the transformed key matrix K' is transposed and dot-multiplied with the transformed query matrix Q'; the dot product is then divided by √d, the square root of the row vector dimension d of the matrix; finally, the result is normalized by the Softmax function and multiplied by the transformed value matrix V' to obtain the output matrix head containing the single-head attention information;
the calculation formula of the single-head attention is formula (1′):
head = Attention(Q', K', V') = Softmax(Q'K'^T / √d) · V'    (1′)
wherein head is the output matrix containing the single-head attention information, Attention(Q', K', V') is the single-head attention transformation, Q', K' and V' are respectively the transformed query matrix, key matrix and value matrix, the dimensions of Q', K' and V' are the same, and d is the row vector dimension of the matrices;
when there are h single-head attention modules, the h single-head attention output matrices head1, head2, …, headi, …, headh are obtained respectively, as shown in formula (14):
headi = Attention(Qi', Ki', Vi')    (14);
the h single-head attention matrices are output to the multi-head attention fusion module (S4);
the multi-head attention fusion module (S4) is used for receiving the h single-head attention output matrices head1, head2, …, headi, …, headh and splicing them to construct a matrix containing the multi-head attention information, as shown in formula (15):
MultiHead(Q', K', V') = concat(head1, …, headh)    (15);
the concat function connects a plurality of matrices along a designated axis to form a spliced matrix;
the multi-head attention matrix MultiHead(Q', K', V') is compressed to obtain the compressed multi-head attention matrix X′ or Y′, which is output to the first residual connection and normalization module for processing, wherein the compression comprises: compressing the multi-head attention matrix with a fully connected layer, so that the dimension of the compressed multi-head attention matrix is consistent with that of the single-head attention matrix (a sketch of the whole S1-S4 pipeline follows this claim).
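A minimal PyTorch sketch of the S1-S4 pipeline of this claim: linear Q/K/V transformations with biases (formulas (11)-(13)), per-head single-head attention (formula (1′)), concatenation of the h heads (formula (15)) and a fully connected compression; the head count, the batched layout and the use of nn.Linear to realize the learnable matrices Wq, Wk, Wv and biases bq, bk, bv are assumptions of the sketch:

import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d_head = n_heads, d_model // n_heads
        # Wq, Wk, Wv with biases bq, bk, bv, learned during training (S2).
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.compress = nn.Linear(d_model, d_model)  # S4: compression layer

    def forward(self, slices: torch.Tensor) -> torch.Tensor:
        # slices: (batch, n_slices, d_model), the slice matrix X or Y (S1).
        b, n, _ = slices.shape
        # S2: linear transformation, then split into h single heads.
        q = self.w_q(slices).view(b, n, self.h, self.d_head).transpose(1, 2)
        k = self.w_k(slices).view(b, n, self.h, self.d_head).transpose(1, 2)
        v = self.w_v(slices).view(b, n, self.h, self.d_head).transpose(1, 2)
        # S3: per-head scaled dot-product attention, formula (1').
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        heads = torch.softmax(scores, dim=-1) @ v
        # S4: concatenate head1..headh and compress with a fully connected
        # layer so the output dimension matches the single-head dimension
        # convention of the surrounding modules.
        concat = heads.transpose(1, 2).reshape(b, n, self.h * self.d_head)
        return self.compress(concat)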
11. A motion intention recognition system based on a dual-stream Transformer encoder and a multi-head attention mechanism, for implementing the motion intention recognition method according to any one of claims 1 to 10, comprising:
the system comprises a sample acquisition module (101) for acquiring multiple groups of sample information acquired by wearable equipment worn by a stroke patient, wherein each group of sample information comprises a sample electromyographic signal, an inertial measurement signal and/or a sample electroencephalographic signal;
the model establishing module (102) for establishing a motion intention recognition model based on a dual-stream Transformer encoder and a multi-head attention mechanism based on each group of sample information; and
a motion intention recognition module (103) for determining the motion intention of the stroke patient based on the motion intention recognition model.
12. Use of the motion intention recognition method based on a dual-stream Transformer encoder and a multi-head attention mechanism according to any one of claims 1-10 in mirror therapy and/or assisted therapy of stroke patients.
13. An electronic device comprising a processor and a memory, the memory storing a plurality of instructions, the processor being configured to read the instructions and perform the method of any one of claims 1-10.
14. A computer-readable storage medium storing a plurality of instructions, the instructions being readable by a processor to perform the method of any one of claims 1-10.
CN202210762625.6A 2022-06-30 2022-06-30 Motion intention identification method based on double-flow Transformer encoder and multi-head attention mechanism Pending CN115083566A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210762625.6A CN115083566A (en) 2022-06-30 2022-06-30 Motion intention identification method based on double-flow Transformer encoder and multi-head attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210762625.6A CN115083566A (en) 2022-06-30 2022-06-30 Motion intention identification method based on double-flow Transformer encoder and multi-head attention mechanism

Publications (1)

Publication Number Publication Date
CN115083566A true CN115083566A (en) 2022-09-20

Family

ID=83257590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210762625.6A Pending CN115083566A (en) 2022-06-30 2022-06-30 Motion intention identification method based on double-flow Transformer encoder and multi-head attention mechanism

Country Status (1)

Country Link
CN (1) CN115083566A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115438705A (en) * 2022-11-09 2022-12-06 武昌理工学院 Human body action prediction method based on wearable equipment
CN115953665A (en) * 2023-03-09 2023-04-11 武汉人工智能研究院 Target detection method, device, equipment and storage medium
CN116705306A (en) * 2023-08-03 2023-09-05 首都医科大学附属北京天坛医院 Method for monitoring cerebral apoplexy, device for monitoring cerebral apoplexy and storage medium
CN116705306B (en) * 2023-08-03 2023-10-31 首都医科大学附属北京天坛医院 Method for monitoring cerebral apoplexy, device for monitoring cerebral apoplexy and storage medium
CN117290684A (en) * 2023-09-27 2023-12-26 南京拓恒航空科技有限公司 Transformer-based high-temperature drought weather early warning method and electronic equipment
CN118490406A (en) * 2024-07-18 2024-08-16 四川大学华西医院 Novel method for constructing animal model of acute ischemic cerebrovascular disease

Similar Documents

Publication Publication Date Title
CN115083566A (en) Motion intention identification method based on double-flow Transformer encoder and multi-head attention mechanism
CN107378944B (en) Multidimensional surface electromyographic signal artificial hand control method based on principal component analysis method
CN115177273B (en) Multi-head re-attention mechanism-based movement intention recognition method and system
CN111544855B (en) Pure idea control intelligent rehabilitation method based on distillation learning and deep learning and application
CN109645989B (en) Anesthesia depth estimation system
CN111584029B (en) Electroencephalogram self-adaptive model based on discriminant confrontation network and application of electroencephalogram self-adaptive model in rehabilitation
CN110658915A (en) Electromyographic signal gesture recognition method based on double-current network
CN112043473B (en) Parallel nested and autonomous preferred classifier for brain-myoelectricity fusion perception of intelligent artificial limb
CN110555468A (en) Electroencephalogram signal identification method and system combining recursion graph and CNN
CN110619322A (en) Multi-lead electrocardio abnormal signal identification method and system based on multi-flow convolution cyclic neural network
CN107958213A (en) A kind of cospace pattern based on the medical treatment of brain-computer interface recovering aid and deep learning method
CN111584030A (en) Idea control intelligent rehabilitation system based on deep learning and complex network and application
CN102521505A (en) Brain electric and eye electric signal decision fusion method for identifying control intention
CN111544856A (en) Brain-myoelectricity intelligent full limb rehabilitation method based on novel transfer learning model
CN111544256A (en) Brain-controlled intelligent full limb rehabilitation method based on graph convolution and transfer learning
CN115050452A (en) Method and system for constructing universal myoelectric movement intention recognition model
CN116225222A (en) Brain-computer interaction intention recognition method and system based on lightweight gradient lifting decision tree
CN116522106A (en) Motor imagery electroencephalogram signal classification method based on transfer learning parallel multi-scale filter bank time domain convolution
CN109144277B (en) Method for constructing intelligent vehicle controlled by brain based on machine learning
CN114548165A (en) Electromyographic mode classification method capable of crossing users
CN115024735B (en) Cerebral apoplexy patient rehabilitation method and system based on movement intention recognition model
CN113408397A (en) Domain-adaptive cross-subject motor imagery electroencephalogram signal identification system and method
CN112998725A (en) Rehabilitation method and system of brain-computer interface technology based on motion observation
CN110321856B (en) Time-frequency multi-scale divergence CSP brain-computer interface method and device
Wang et al. Research on the key technologies of motor imagery EEG signal based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination