CN116597824A - Imagination voice classification method and system based on attention-guided tensor network - Google Patents

Imagination voice classification method and system based on attention-guided tensor network

Info

Publication number
CN116597824A
Authority
CN
China
Prior art keywords
layer
data
attention
tensor
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310580969.XA
Other languages
Chinese (zh)
Inventor
孔万增
李昌盛
周文晖
王宇涵
莫良言
金宣妤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202310580969.XA
Publication of CN116597824A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/015 Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Dermatology (AREA)
  • Neurology (AREA)
  • Neurosurgery (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an imagined speech classification method and system based on an attention-guided tensor network. Imagined-speech EEG data and their corresponding labels are acquired; data augmentation is applied to the imagined-speech EEG data to construct a training data set; an attention-guided tensor network is constructed, trained with the augmented training set of the data set, and tested with the non-augmented test set of the data set; the trained and validated attention-guided tensor network is then used to classify imagined speech from the EEG. The method combines data augmentation with a tensor network guided by the classification token of an attention mechanism, achieving high-accuracy imagined-speech EEG classification.

Description

Imagination voice classification method and system based on attention-guided tensor network
Technical Field
The invention belongs to the field of brain-computer interfaces (BCIs) and relates to a method and system for imagined speech classification based on an attention-guided tensor network, in particular to a method for determining the imagined speech category by applying data augmentation and feature extraction to imagined-speech EEG data, using a tensor network guided by the classification token of an attention mechanism.
Background
It is hoped that future BCIs will be able to decode human imagined speech and output it into the real-world environment. Once an imagined word or dialogue is decoded by the BCI system, it can serve as a neural command: the imagined word can be output through speech synthesis, or robots and devices can be controlled based on it. The effectiveness and practicality of imagined speech decoding is therefore a non-negligible issue. To realize such BCIs, research into extracting relevant features of the imagined speech paradigm can improve how effectively speech-related brain activity is captured. Recently, researchers have studied various methods, in particular deep learning methods that have matured alongside natural language processing, to accurately capture phoneme-level speech from brain signals.
Imagined speech can be a key paradigm for developing intuitive systems that are easy for users to operate. Recognizing the user's intuitive intent and translating it into commands for the outside world is one of the key functions of a BCI. The imagined speech paradigm can significantly improve BCI communication, because it conveys the user's intent directly through the imagined speech or word itself rather than through the spelling of individual letters; the decoded result can likewise be applied to control external devices. Imagined speech is thus an emerging paradigm for transferring a user's intent to an external device, and it offers vital advantages over traditional BCI paradigms such as motor imagery (MI). For example, increasing the number of classes in MI depends on movements of body parts, which naturally overlap as more classes are needed, whereas the speech properties of different classes allow greater inter-class variation without such overlap. Furthermore, decoded imagined speech can directly match the interaction between the user's intent and device feedback in a real-world environment. Finally, these properties of the imagined speech paradigm can help develop more practical BCI systems that give the user a high degree of freedom, which is why BCI research increasingly leans toward decoding human imagined intent. However, the multi-class classification performance of imagined speech remains at a relatively low level compared with traditional BCI paradigms such as MI or ERP. Efficient feature selection or classification methods for imagined speech can help improve decoding performance, raising its multi-class performance toward the level of conventional BCI paradigms and thereby enabling simple communication through inner speech or control of the external environment.
Disclosure of Invention
The invention aims to address a deficiency of the prior art, namely that the multi-class classification performance of imagined speech remains at a relatively low level, and provides a method and system for imagined speech classification based on an attention-guided tensor network. Data augmentation introduces prior knowledge into the model so that it learns more robust features, mitigating the small number of samples in existing data sets. A multi-head attention mechanism efficiently extracts feature information of the data in the time dimension. A tensor network addresses the small-sample problem of the data set and improves the classification performance of the model.
In a first aspect, the present invention provides a method for imagined speech classification based on an attention-guided tensor network, comprising the following steps:
step S1: acquiring imagined-speech EEG data and their corresponding labels;
step S2: during model training, applying data augmentation to the imagined-speech EEG data and constructing a training data set;
step S3: constructing an attention-guided tensor network, training it with the augmented training set of the data set, and testing it with the non-augmented test set of the data set;
step S4: using the trained and validated attention-guided tensor network to classify imagined speech from the EEG.
In a second aspect, the present invention provides an imagined speech classification system comprising a trained and validated attention-guided tensor network.
The beneficial effects of the invention are as follows:
the invention uses a multi-headed self-care mechanism to focus on the information of different time steps and different channels in the EEG signal at the same time, thereby better capturing the time sequence and the spatial correlation in the EEG signal. The importance of each time step and channel is then calculated using these correlations as weights. This may help the model automatically learn important features in the EEG signal, thereby improving the performance of the model. Different feature representations may also be learned by a multi-headed self-care mechanism and combined to form the final representation. This may help the model better process different EEG signals, thereby improving the generalization ability of the model.
One potential problem with deep learning is the large number of parameters: fitting them requires many samples, training the model takes considerable time, and EEG samples are often scarce. The invention converts the weight matrix of the fully connected layer into a tensor format within a tensor learning network, greatly reducing the number of parameters while preserving the expressive power of the layer.
In summary, the invention provides an imagined speech classification method based on an attention-guided tensor network, combining data augmentation with a tensor network guided by the classification token of an attention mechanism. The network comprises a data acquisition module, a data augmentation module, a multi-head attention module and a classification module, responsible respectively for data acquisition, data augmentation, feature extraction and classification. The network also adopts random position encoding, adds a classification token, and uses a tensor network to achieve high-accuracy imagined-speech EEG classification.
Drawings
FIG. 1 is a flow chart of the imagined speech classification method;
FIG. 2 is a diagram of the experimental paradigm of the data set used;
FIG. 3 is a schematic diagram of feature extraction based on the multi-head attention mechanism;
FIG. 4 is a schematic diagram of the classification module based on the tensor network.
Detailed Description
The process according to the invention is described in detail below with reference to the accompanying drawings.
As shown in FIG. 1, the method for imagined speech classification based on an attention-guided tensor network comprises the following steps:
Step S1: acquiring imagined-speech EEG data and their corresponding labels.
The data used in this experiment come from a public data set in which EEG was recorded from 15 subjects (S1-S15; ages 20-30), as shown in FIG. 2. During the experiment, each subject sat in a comfortable chair facing a 24-inch LCD monitor. Subjects were asked to imagine silently uttering a given word or phrase, as if actually speaking but without moving the articulators or producing any sound, and were instructed not to perform any mental activity other than the given task. They were required not to move or blink while imagining or receiving cues. All imagination trials used a black screen so that subjects received no stimulus, avoiding other factors that might affect brain activity. An auditory cue representing one of the five words/phrases was presented randomly for 2 s, followed by a cross fixation mark for 0.8 s to 1.2 s. Subjects were asked to imagine the given cue immediately after the cross disappeared from the screen. Each random cue was followed by four successive cycles of a cross-fixation phase (0.8-1.2 s) and an imagined-speech phase (2 s). After the four imagined-speech phases, a 3 s relaxation phase let the subject clear their mind for the next word/phrase. EEG data were recorded with a signal amplifier (BrainAmp, Brain Products GmbH, Germany). Raw data were recorded using BrainVision (Brain Products GmbH, Germany) and MATLAB 2019a (The MathWorks Inc., USA), with 64 EEG electrodes following the international 10-20 configuration. The ground and reference channels were placed at Fpz and FCz, respectively. The impedance between every electrode and the scalp skin was kept below 15 kΩ.
The experiment recorded EEG for 5 classes of imagined words/phrases. The data are denoted $X \in \mathbb{R}^{T \times C}$, where T is the time dimension with size 795 and C is the channel dimension with size 64; i.e., each raw trial has size 795 × 64.
Step S2: during model training, data augmentation is applied to the imagined-speech EEG data and a training data set is constructed;
the method for enhancing the data by using Mixup linear interpolation specifically comprises the following steps:
wherein (X) i ,Y i ) And (X) j ,Y j ) Is two samples randomly extracted from training data, X i ,X j Is the original data input, Y i ,Y j For the single thermal coding of the corresponding class, lambda E [0,1 ]];
Mixup is a data augmentation technique used to improve model performance. It creates new training examples by randomly combining pairs of examples from different classes, making the model more robust, reducing over-fitting and improving generalization. Specifically, Mixup linearly interpolates two input samples $X_i, X_j$ with a random ratio $\lambda \in [0,1]$ to generate a new sample $\tilde{X}$, and interpolates their labels $Y_i, Y_j$ with the same ratio to generate a new label $\tilde{Y}$. The model can thus learn more features and the similarities between different classes, improving its generalization ability. After augmentation, the data size remains 795 × 64.
This approach introduces prior knowledge into the model: with this prior knowledge, the augmented data let the model learn more robust features and improve the generalization of the deep learning model.
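For illustration, a minimal NumPy sketch of the Mixup step described above follows; the Beta(α, α) sampling of λ is an assumption (the description only requires λ ∈ [0, 1]), and the function and argument names are illustrative rather than taken from the patent:

```python
import numpy as np

def mixup(x_i, y_i, x_j, y_j, alpha=1.0):
    """Mixup a random pair of EEG trials and their one-hot labels.

    x_i, x_j : np.ndarray of shape (795, 64) -- raw EEG trials (T x C)
    y_i, y_j : np.ndarray of shape (5,)      -- one-hot class labels
    alpha    : Beta-distribution parameter (assumed; not stated in the text)
    """
    lam = np.random.beta(alpha, alpha)       # lambda in [0, 1]
    x_new = lam * x_i + (1.0 - lam) * x_j    # interpolated trial, still 795 x 64
    y_new = lam * y_i + (1.0 - lam) * y_j    # interpolated soft label
    return x_new, y_new, lam
```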
Step S3: constructing an attention-guided tensor network, training it with the augmented training set of the data set, and testing it with the non-augmented test set of the data set;
the attention guiding tensor network comprises a feature extraction module of a cascade multi-head attention mechanism and a classification module of a tensor learning network;
1) The feature extraction module of the cascade multi-head attention mechanism as shown in fig. 3 comprises an embedded layer, a classification identification bit Class Token layer, a position coding layer, a first LN regularization layer, a multi-head self-attention layer, a first residual error connection layer, a second LN regularization layer, a feedforward network layer, a second residual error connection layer and a third LN regularization layer which are sequentially connected in series;
1.1 The embedding layer up-samples the channel dimension of the 795 × 64 EEG data through a fully connected layer to increase the data dimension and extract finer-grained information, yielding 795 × 1024 data;
1.2 The Class Token layer generates a 1 × 1024 vector by random initialization and concatenates it to the head of the embedding layer output, so as to collect global feature information and reduce interference from local features; the data size then becomes 796 × 1024. The Class Token encodes statistics of the entire imagined-speech datum and is updated continuously as the network trains. It aggregates information from all other tokens (global feature aggregation) without depending on the data content, avoiding bias toward any specific token in the data. In addition, a fixed position code is used for the Class Token, effectively avoiding interference of the position encoding with the output.
1.3 The position encoding layer adopts random position encoding, specifically: a random number matrix with the same format as the input data is generated and added to the input data as the output of the position encoding layer. Random position encoding addresses the model's inability to capture positional relationships along the time dimension of the input sequence. The layer assigns a position code to each vector in the time dimension of each input sequence; adding this code to the time-dimension vector lets that vector carry information about its position in the input sequence, so the neural network can better understand the order of and relationships among time steps, further improving model performance. Concretely, the position encoding layer outputs a random number matrix in the same format as its input data and adds it to the input data, which then serves as the input of the multi-head self-attention layer.
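As a concrete illustration of steps 1.1-1.3, the following PyTorch sketch combines the channel up-sampling, the prepended Class Token and the random position encoding. The module name is illustrative, and fixing the random code at construction time (so that the Class Token's code stays constant) is an assumption based on the statement that a fixed position code is used for the Class Token:

```python
import torch
import torch.nn as nn

class EEGEmbedding(nn.Module):
    """Steps 1.1-1.3: channel up-sampling, prepended Class Token,
    and random position encoding (a minimal sketch)."""

    def __init__(self, n_channels=64, d_model=1024, seq_len=795):
        super().__init__()
        self.up = nn.Linear(n_channels, d_model)                   # 1.1: 64 -> 1024 per time step
        self.cls_token = nn.Parameter(torch.randn(1, 1, d_model))  # 1.2: randomly initialised token
        # 1.3: random position codes for the 796 x 1024 sequence,
        # created once so the Class Token's code stays fixed
        self.register_buffer("pos", torch.randn(1, seq_len + 1, d_model))

    def forward(self, x):                    # x: (batch, 795, 64)
        x = self.up(x)                       # (batch, 795, 1024)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)       # prepend token -> (batch, 796, 1024)
        return x + self.pos                  # add random position encoding
```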
1.4 The first LN regularization layer normalizes the output data of the position encoding layer;
1.5 The multi-head self-attention layer maps the LN regularization layer output into different subspaces and performs dot-product operations in each subspace to compute an attention vector; finally, the attention vectors computed in all subspaces are concatenated and mapped back to the original input space to obtain the final attention vector, capturing the feature correlations of the imagined speech data along the time dimension;
the multi-head self-attention mechanism layer has the functions of enhancing the understanding and expressing ability of the model to input data and improving the accuracy and generalization ability of the model. In particular, the multi-headed attention layer may multi-headed divide the input data, and each head may focus on a different portion of the input data, thereby extracting different characteristic information. These heads can be computed in parallel, thereby speeding up the training of the model. And finally, combining the calculation results of the plurality of heads to obtain a final output result. The expression of the multi-headed self-focusing layer is as follows (3):
wherein MultiHead (Q, K, V) represents the resulting output attention vector; concat represents a splice operation; wheree head i Representing the attention vector calculated in the ith subspace;
i represents different subspaces, a query vector Q, a key vector K and a value vector V are obtained from the output data of the first LN regularization layer through a full connection layer and serve as the input of the multi-head self-attention module, W i Q Mapping matrix for Q in different subspaces, W i K Mapping matrix for K in different subspaces, W i V Mapping matrix for V in different subspaces, W O By W in all subspaces i V Splicing to obtain the final product;
the calculation mode of the attention vector on the independent subspace is as follows in sequence: firstly, carrying out dot multiplication operation on a query vector Q and a key vector K, and dividing the query vector Q and the key vector K by the square root of the dimension of the key vector KObtaining a score matrix of the query vector Q, finally transmitting the result into a Softmax function, normalizing the result by using the Softmax function to obtain a weight matrix, multiplying the weight matrix by a value vector V to obtain a subspace attention vector, wherein the expression is as follows (4):
wherein the parameter matrix dimensions d of Q, K, V q ,d k And d v All 128, the number of attention head heads is 8, d model 1024.
By linear transformation, query vector Q is derived from d model Dimension map d q * head, key vector K from d model Dimension map d k * head, value vector V from d model Dimension map d v *head;
Implicitly increasing the number of attention heads without reducing the hidden dimension assigned to each head effectively extracts global features and improves classification accuracy.
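The following PyTorch sketch illustrates formulas (3)-(4) with the stated sizes ($d_{model}$ = 1024, 8 heads of dimension 128). It is a sketch, not the patent's implementation; in particular, $W^O$ is realized here as a learned output projection, the standard Transformer choice:

```python
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Formulas (3)-(4): project the input into h = 8 subspaces, apply
    scaled dot-product attention in each, concatenate, and map back."""

    def __init__(self, d_model=1024, n_heads=8):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d_k = n_heads, d_model // n_heads   # 8 heads x 128 dims
        self.w_q = nn.Linear(d_model, d_model)           # W^Q for all heads at once
        self.w_k = nn.Linear(d_model, d_model)           # W^K
        self.w_v = nn.Linear(d_model, d_model)           # W^V
        self.w_o = nn.Linear(d_model, d_model)           # W^O output projection

    def forward(self, x):                                # x: (batch, 796, 1024)
        b, t, _ = x.shape
        # project and split into heads: (batch, heads, time, d_k)
        q = self.w_q(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        k = self.w_k(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        v = self.w_v(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        # formula (4): Softmax(Q K^T / sqrt(d_k)) V
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        attn = torch.softmax(scores, dim=-1) @ v
        # formula (3): concatenate heads and map back to the input space
        attn = attn.transpose(1, 2).reshape(b, t, self.h * self.d_k)
        return self.w_o(attn)
```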
1.6 The first residual connection layer applies a residual connection to the multi-head self-attention layer output, improving the network's ability to represent imagined speech data and effectively alleviating vanishing and exploding gradients;
1.7 The second LN regularization layer normalizes the output data of the first residual connection layer;
1.8 The feed-forward network layer (Feed-Forward Network, FFN) consists of two feed-forward layers: the first maps the output of the second LN regularization layer from $d_{model}$ dimensions to $4 d_{model}$ dimensions with a GELU activation function, and the second maps the $4 d_{model}$ dimensions back to $d_{model}$ without an activation function;
The expression of the feed-forward network is given by formula (5):

$\mathrm{FFN}(x) = \mathrm{GELU}(x W_1 + b_1)\, W_2 + b_2$   (5)

where $W_1$ and $W_2$ are randomly initialized weight matrices, $b_1$ and $b_2$ are randomly initialized biases, and x is the output of the second LN regularization layer;
1.9 The second residual connection layer applies a residual connection to the feed-forward network layer output, improving the network's representation of imagined speech data;
1.10 The third LN regularization layer normalizes the output data of the second residual connection layer;
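Steps 1.4-1.10 can be summarized in one sketch of the cascaded block, reusing the MultiHeadSelfAttention module sketched above; the placement of the LN layers follows the ordering described in steps 1.4-1.10, and the module name is illustrative:

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Steps 1.4-1.10: LN -> multi-head self-attention -> residual,
    LN -> feed-forward (1024 -> 4096, GELU, -> 1024) -> residual -> LN."""

    def __init__(self, d_model=1024):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)               # 1.4 first LN layer
        self.attn = MultiHeadSelfAttention(d_model)    # 1.5 (sketched above)
        self.ln2 = nn.LayerNorm(d_model)               # 1.7 second LN layer
        self.ffn = nn.Sequential(                      # 1.8 feed-forward network
            nn.Linear(d_model, 4 * d_model),           # d_model -> 4*d_model
            nn.GELU(),                                 # GELU on the first layer only
            nn.Linear(4 * d_model, d_model),           # back to d_model, no activation
        )
        self.ln3 = nn.LayerNorm(d_model)               # 1.10 third LN layer

    def forward(self, x):                              # x: (batch, 796, 1024)
        x = x + self.attn(self.ln1(x))                 # 1.6 first residual connection
        x = x + self.ffn(self.ln2(x))                  # 1.9 second residual connection
        return self.ln3(x)
```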
2) The classification module of the tensor learning network takes the data carrying the Class Token from the output of the feature extraction module of the cascaded multi-head attention mechanism and performs prediction and classification on it;
As shown in FIG. 4, the classification module of the tensor learning network comprises a tensor network, an activation layer and a fully connected layer connected in series;
The tensor network tensorizes the 1 × 1024 input data so as to extract linear-relationship features from the high-dimensional imagined speech data, specifically:
linearly transforming an N-dimensional input vector to obtain a mathematical expression represented by the formula (6)
y 1 =Wx 1 +b (6)
Wherein the method comprises the steps ofIs a weight matrix>For inputting data +.>Is biased;
wherein the element y (i) in y is represented by formula (7):
according to tensor learning thought, all y, W, x, b are converted into tensor representation, and marked as y, W, x, b; the method specifically comprises the following steps:
first, x ε R N*S Conversion to a 5-dimensional tensorDenoted as x (j) 1 ,...,j 5 ) Wherein N x S = S 1 *S 2 *S 3 *S 4 *S 5 I.e. converting the input Class token vector 1 x 1024 into a five-dimensional tensor with the size of 4 x 4;
via bijective function F (i) = (F) 1 (i),f 2 (i),f 3 (i),f 4 (i),f 5 (i))=(i 1 ,i 2 ,i 3 ,i 4 ,i 5 ) Vectors y, b and y (i) are indexed by index i 1 ,i 2 ,i 3 ,i 4 ,i 5 ),b(i 1 ,i 2 ,i 3 ,i 4 ,i 5 ) The five-dimensional tensor representation is related as shown in equation (8):
y(F(i))=y(i 1 ,i 2 ,i 3 ,i 4 ,i 5 )=y(i)
wherein y, b.epsilon.R Md= 5,i e 1,2,; y (i), b (i) is an element in y, b, y (i) 1 ,i 2 ,i 3 ,i 4 ,i 5 ) Five-dimensional tensors also of size 4 x 4;
the same applies to the weight matrixSee (9):
F(i)=(f 1 (i),f 2 (i),f 3 (i),f 4 (i),f 5 (i))=(i 1 ,i 2 ,i 3 ,i 4 ,i 5 )
G(j)=(g 1 (j),g 2 (j),g 3 (j),g 4 (j),g 5 (j))=(j 1 ,j 2 ,j 3 ,j 4 ,j 5 ) (9)
the weight matrix W may be associated with its corresponding tensor W and converted into a tensor column format Tensor Train Format (TT-format) as shown in equation (10):
wherein each g [ i ] k ,j k ]Denoted as i in the case of the same k k *j k *r k-1 *r k K e 1, 2..5, where r 0 =r 5 =1, where r k-1 *r k Called tensor rank TT-rank, TT-rank is [1,8,8,8,8,1 ]];
Finally, formula (6) can be converted into the tensor form of formula (11):

$Y(i_1, \ldots, i_5) = \sum_{j_1, \ldots, j_5} G_1[i_1, j_1] \cdots G_5[i_5, j_5]\, X(j_1, \ldots, j_5) + B(i_1, \ldots, i_5)$   (11)
the activation layer uses RELU activation function to transmit tensor network output data into full connection through the activation layer to obtain classification result;
the corresponding classification labels are output through the tensor network classification module, and the loss function is calculated by comparing the classification labels with the real labels, and the cross entropy loss function is adopted, and the specific formula is as follows:
wherein M is 1 Is the number of tests, N 1 Is the number of categories to be counted,represents the mth 1 True label of secondary test->Representing the representation class n 1 Mth m 1 Predictive probability of secondary trial. When trained in conjunction with the model, it was noted as criterion, and the loss was calculated as follows:
loss=λ*criterion(pred,Y i )+(1-λ)criterion(pred,Y j ) (13)
wherein the method comprises the steps of
In specific use of the invention, Adam, which converges quickly, is used as the optimizer; the initial learning rate is set to 8e-5 and the batch size to 8.
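A sketch of one Mixup-aware training step with these settings is given below; `model` stands for the full attention-guided tensor network and is assumed to be defined elsewhere, and integer class labels are used since nn.CrossEntropyLoss treats them equivalently to the one-hot form in the text:

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, x_i, x_j, y_i, y_j, lam):
    """One optimisation step on a mixed batch, formula (13)."""
    criterion = nn.CrossEntropyLoss()              # formula (12)
    x = lam * x_i + (1.0 - lam) * x_j              # mixed EEG batch (batch, 795, 64)
    pred = model(x)                                # (batch, 5) class logits
    loss = lam * criterion(pred, y_i) + (1.0 - lam) * criterion(pred, y_j)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# settings from the description: Adam optimizer, initial lr 8e-5, batch size 8
# optimizer = torch.optim.Adam(model.parameters(), lr=8e-5)
```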
Step S4: using the trained and validated attention-guided tensor network to classify imagined speech from the EEG.
Table 1: Imagined speech classification accuracy of the different subjects using the above method.
Claims (10)

1. A method for imagined speech classification based on an attention-guided tensor network, characterized in that the method comprises the following steps:
step S1: acquiring imagined-speech EEG data and their corresponding labels;
step S2: applying data augmentation to the imagined-speech EEG data to construct a training data set;
step S3: constructing an attention-guided tensor network, training it with the augmented training set of the data set, and testing it with the non-augmented test set of the data set;
the attention-guided tensor network comprises a feature extraction module based on a cascaded multi-head attention mechanism and a classification module based on a tensor learning network;
1) the feature extraction module of the cascaded multi-head attention mechanism comprises, connected in series: an embedding layer, a classification-token (Class Token) layer, a position encoding layer, a first LN regularization layer, a multi-head self-attention layer, a first residual connection layer, a second LN regularization layer, a feed-forward network layer, a second residual connection layer and a third LN regularization layer;
the multi-head self-attention layer maps the LN regularization layer output into different subspaces and performs dot-product operations in each subspace to compute an attention vector; finally, the attention vectors computed in all subspaces are concatenated and mapped back to the original input space to obtain the final attention vector, capturing the feature correlations of the imagined speech data along the time dimension;
2) the classification module of the tensor learning network takes the data carrying the Class Token from the output of the feature extraction module of the cascaded multi-head attention mechanism and performs prediction and classification on it;
the classification module of the tensor learning network comprises a tensor network, an activation layer and a fully connected layer connected in series;
the tensor network tensorizes the input data to extract linear-relationship features from the high-dimensional imagined speech data;
step S4: using the trained and validated attention-guided tensor network to classify imagined speech from the EEG.
2. The method according to claim 1, characterized in that the data augmentation in step S2 is specifically:

$\tilde{X} = \lambda X_i + (1-\lambda) X_j, \qquad \tilde{Y} = \lambda Y_i + (1-\lambda) Y_j$

where $(X_i, Y_i)$ and $(X_j, Y_j)$ are two samples randomly drawn from the training data, $X_i, X_j$ are the raw data inputs, $Y_i, Y_j$ are the one-hot encodings of the corresponding classes, and $\lambda \in [0,1]$.
3. The method according to claim 1, characterized in that in step S3, the embedding layer in the feature extraction module of the cascaded multi-head attention mechanism up-samples the channel dimension of the EEG data through a fully connected layer to increase the data dimension and extract finer-grained information, yielding 795 × 1024 data;
the Class Token layer generates a 1 × 1024 vector by random initialization and concatenates it to the head of the embedding layer output so as to collect global feature information and reduce interference from local features, the data size then being 796 × 1024;
the position encoding layer adopts random position encoding, specifically: generating a random number matrix with the same format as the input data and adding it to the input data as the output of the position encoding layer;
the first LN regularization layer normalizes the output data of the position encoding layer.
4. The method according to claim 1 or 3, characterized in that in step S3, the expression of the multi-head self-attention layer is given by formula (3):

$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q}, K W_i^{K}, V W_i^{V})$   (3)

where MultiHead(Q, K, V) is the resulting output attention vector; Concat denotes the concatenation operation; $\mathrm{head}_i$ is the attention vector computed in the i-th subspace, and i indexes the different subspaces; the query vector Q, key vector K and value vector V are obtained from the output data of the first LN regularization layer through fully connected layers and serve as the input of the multi-head self-attention module; $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$ are the mapping matrices of Q, K and V in the different subspaces, and $W^{O}$ is obtained by concatenating the $W_i^{V}$ over all subspaces;
the attention vector in each independent subspace is computed as follows: first, the query vector Q and the key vector K undergo a dot-product operation, divided by the square root of the key vector dimension, $\sqrt{d_k}$, to obtain the score matrix of the query vector Q; the result is then passed into a Softmax function and normalized to obtain the weight matrix, which is multiplied by the value vector V to obtain the subspace attention vector, as in formula (4):

$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$   (4)

where the parameter matrix dimensions $d_q$, $d_k$ and $d_v$ of Q, K, V are all 128, the number of attention heads is 8, and $d_{model}$ is 1024;
through linear transformations, the query vector Q is mapped from $d_{model}$ dimensions to $d_q \times \mathrm{head}$, the key vector K from $d_{model}$ to $d_k \times \mathrm{head}$, and the value vector V from $d_{model}$ to $d_v \times \mathrm{head}$.
5. The method according to claim 4, characterized in that in step S3, the first residual connection layer applies a residual connection to the multi-head self-attention layer output so as to improve the network's representation of imagined speech data;
the second LN regularization layer normalizes the output data of the first residual connection layer;
the feed-forward network layer consists of two feed-forward layers: the first maps the output of the second LN regularization layer from $d_{model}$ dimensions to $4 d_{model}$ dimensions with a GELU activation function, and the second maps the $4 d_{model}$ dimensions back to $d_{model}$ without an activation function;
the second residual connection layer applies a residual connection to the feed-forward network layer output so as to improve the network's representation of imagined speech data;
the third LN regularization layer normalizes the output data of the second residual connection layer.
6. The method according to claim 5, characterized in that in step S3, the expression of the feed-forward network is given by formula (5):

$\mathrm{FFN}(x) = \mathrm{GELU}(x W_1 + b_1)\, W_2 + b_2$   (5)

where $W_1$ and $W_2$ are randomly initialized weight matrices, $b_1$ and $b_2$ are randomly initialized biases, and x is the output of the second LN regularization layer.
7. The method according to claim 1, characterized in that in step S3, the tensor network is specifically:
an N-dimensional input vector is transformed linearly, giving the mathematical expression of formula (6):

$y = W x + b$   (6)

where $W \in \mathbb{R}^{M \times N}$ is the weight matrix, $x \in \mathbb{R}^{N}$ is the input data, and $b \in \mathbb{R}^{M}$ is the bias;
the element y(i) of y is given by formula (7):

$y(i) = \sum_{j=1}^{N} W(i, j)\, x(j) + b(i)$   (7)

following the tensor learning idea, y, W, x, b are all converted into tensor representations, denoted Y, W, X, B, specifically:
first, $x \in \mathbb{R}^{N}$ is converted into a 5-dimensional tensor $X \in \mathbb{R}^{S_1 \times S_2 \times S_3 \times S_4 \times S_5}$, written $X(j_1, \ldots, j_5)$, where $N = S_1 S_2 S_3 S_4 S_5$; i.e., the input 1 × 1024 Class Token vector is converted into a five-dimensional tensor of size 4 × 4 × 4 × 4 × 4;
via the bijective function $F(i) = (f_1(i), f_2(i), f_3(i), f_4(i), f_5(i)) = (i_1, i_2, i_3, i_4, i_5)$, the vectors y and b and their elements y(i), b(i) are related to the five-dimensional tensor representations $Y(i_1, \ldots, i_5)$, $B(i_1, \ldots, i_5)$ through the index i, as shown in formula (8):

$y(F(i)) = Y(i_1, i_2, i_3, i_4, i_5) = y(i)$
$b(F(i)) = B(i_1, i_2, i_3, i_4, i_5) = b(i)$   (8)

where $y, b \in \mathbb{R}^{M}$; y(i) and b(i) are elements of y and b, and $Y(i_1, \ldots, i_5)$, $B(i_1, \ldots, i_5)$ are likewise five-dimensional tensors of size 4 × 4 × 4 × 4 × 4;
the same applies to the weight matrix $W \in \mathbb{R}^{M \times N}$, see formula (9):

$F(i) = (f_1(i), f_2(i), f_3(i), f_4(i), f_5(i)) = (i_1, i_2, i_3, i_4, i_5)$
$G(j) = (g_1(j), g_2(j), g_3(j), g_4(j), g_5(j)) = (j_1, j_2, j_3, j_4, j_5)$   (9)

the weight matrix W is associated with its corresponding tensor $\mathcal{W}$ and converted into the tensor-train format (TT-format), as shown in formula (10):

$\mathcal{W}((i_1, j_1), \ldots, (i_5, j_5)) = G_1[i_1, j_1]\, G_2[i_2, j_2]\, G_3[i_3, j_3]\, G_4[i_4, j_4]\, G_5[i_5, j_5]$   (10)

where each $G_k[i_k, j_k]$, for fixed k, is an $r_{k-1} \times r_k$ matrix (each core $G_k$ has size $i_k \times j_k \times r_{k-1} \times r_k$), $k \in \{1, 2, \ldots, 5\}$, with $r_0 = r_5 = 1$; the $r_k$ are called the tensor ranks (TT-rank), and the TT-rank here is [1, 8, 8, 8, 8, 1];
finally, formula (6) is converted into the tensor form of formula (11):

$Y(i_1, \ldots, i_5) = \sum_{j_1, \ldots, j_5} G_1[i_1, j_1] \cdots G_5[i_5, j_5]\, X(j_1, \ldots, j_5) + B(i_1, \ldots, i_5)$   (11)
8. The method according to claim 1 or 7, characterized in that the activation layer uses a ReLU activation function.
9. The method according to claim 1, characterized in that the loss function of the attention-guided tensor network adopts a cross-entropy loss function, with the following formula:

$\mathrm{CE} = -\frac{1}{M_1} \sum_{m_1=1}^{M_1} \sum_{n_1=1}^{N_1} y_{m_1 n_1} \log(p_{m_1 n_1})$   (12)

where $M_1$ is the number of trials, $N_1$ is the number of classes, $y_{m_1 n_1}$ is the true label indicator of class $n_1$ for the $m_1$-th trial, and $p_{m_1 n_1}$ is the predicted probability of class $n_1$ for the $m_1$-th trial; when trained in combination with the Mixup model, this criterion is denoted criterion, and the loss is computed as:

$\mathrm{loss} = \lambda \cdot \mathrm{criterion}(\mathrm{pred}, Y_i) + (1 - \lambda)\, \mathrm{criterion}(\mathrm{pred}, Y_j)$   (13)

where $\lambda \in [0, 1]$ is the Mixup interpolation ratio.
10. A classification system implementing the method according to any of claims 1-9, characterized by comprising a trained and validated attention-directed tensor network.
CN202310580969.XA 2023-05-19 2023-05-19 Imagination voice classification method and system based on attention-guided tensor network Pending CN116597824A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310580969.XA CN116597824A (en) 2023-05-19 2023-05-19 Imagination voice classification method and system based on attention-guided tensor network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310580969.XA CN116597824A (en) 2023-05-19 2023-05-19 Imagination voice classification method and system based on attention-guided tensor network

Publications (1)

Publication Number Publication Date
CN116597824A true CN116597824A (en) 2023-08-15

Family

ID=87605895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310580969.XA Pending CN116597824A (en) 2023-05-19 2023-05-19 Imagination voice classification method and system based on attention-guided tensor network

Country Status (1)

Country Link
CN (1) CN116597824A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116821776A (en) * 2023-08-30 2023-09-29 福建理工大学 Heterogeneous graph network node classification method based on graph self-attention mechanism
CN116821776B (en) * 2023-08-30 2023-11-28 福建理工大学 Heterogeneous graph network node classification method based on graph self-attention mechanism
CN117473303A (en) * 2023-12-27 2024-01-30 小舟科技有限公司 Personalized dynamic intention feature extraction method and related device based on electroencephalogram signals
CN117473303B (en) * 2023-12-27 2024-03-19 小舟科技有限公司 Personalized dynamic intention feature extraction method and related device based on electroencephalogram signals
CN117851897A (en) * 2024-03-08 2024-04-09 国网山西省电力公司晋城供电公司 Multi-dimensional feature fusion oil immersed transformer online fault diagnosis method

Similar Documents

Publication Publication Date Title
Latif et al. Deep representation learning in speech processing: Challenges, recent advances, and future trends
CN116597824A (en) Imagination voice classification method and system based on attention-guided tensor network
CN111134666B (en) Emotion recognition method of multi-channel electroencephalogram data and electronic device
CN110060690B (en) Many-to-many speaker conversion method based on STARGAN and ResNet
Goodfellow et al. Deep learning
CN111461176B (en) Multi-mode fusion method, device, medium and equipment based on normalized mutual information
CN112800998B (en) Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA
CN110060657B (en) SN-based many-to-many speaker conversion method
US20230101539A1 (en) Physiological electric signal classification processing method and apparatus, computer device and storage medium
Boloukian et al. Recognition of words from brain-generated signals of speech-impaired people: Application of autoencoders as a neural Turing machine controller in deep neural networks
CN111584069B (en) Psychosis recognition system based on speech deep-shallow feature stack sparse automatic coding
Kim et al. Automatic classification of the Korean triage acuity scale in simulated emergency rooms using speech recognition and natural language processing: a proof of concept study
CN115393933A (en) Video face emotion recognition method based on frame attention mechanism
Kim et al. Cross-modal distillation with audio–text fusion for fine-grained emotion classification using BERT and Wav2vec 2.0
Poncelet et al. Low resource end-to-end spoken language understanding with capsule networks
CN117608402B (en) Hidden Chinese language processing system and method based on Chinese character writing imagination
Duan et al. Dewave: Discrete encoding of eeg waves for eeg to text translation
CN116108856B (en) Emotion recognition method and system based on long and short loop cognition and latent emotion display interaction
Sunder et al. Handling class imbalance in low-resource dialogue systems by combining few-shot classification and interpolation
CN116701996A (en) Multi-modal emotion analysis method, system, equipment and medium based on multiple loss functions
CN116484885A (en) Visual language translation method and system based on contrast learning and word granularity weight
Ye et al. Attention bidirectional LSTM networks based mime speech recognition using sEMG data
CN115588486A (en) Traditional Chinese medicine diagnosis generating device based on Transformer and application thereof
Alrumiah et al. A Deep Diacritics-Based Recognition Model for Arabic Speech: Quranic Verses as Case Study
Jiang et al. Dual memory network for medical dialogue generation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination