CN115153563A - Mandarin auditory attention decoding method and device based on EEG


Info

Publication number
CN115153563A
CN115153563A (Application CN202210527156.XA)
Authority
CN
China
Prior art keywords
attention
eeg
voice
envelope
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210527156.XA
Other languages
Chinese (zh)
Inventor
倪广健
许淄豪
白艳茹
于韩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202210527156.XA priority Critical patent/CN115153563A/en
Publication of CN115153563A publication Critical patent/CN115153563A/en
Pending legal-status Critical Current


Classifications

    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B - DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 - Measuring for diagnostic purposes; Identification of persons
    • A61B5/24 - Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/25 - Bioelectric electrodes therefor
    • A61B5/279 - Bioelectric electrodes therefor specially adapted for particular uses
    • A61B5/291 - Bioelectric electrodes therefor specially adapted for particular uses for electroencephalography [EEG]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters


Abstract

The invention discloses an EEG-based Mandarin auditory attention decoding method and device, wherein the method comprises the following steps: establishing a nonlinear model between the EEG and the speech envelope through a deep learning architecture, and taking the extracted speech envelope and the collected EEG signal as the input of the nonlinear model; based on the nonlinear model, constructing a speech envelope reconstruction model from electroencephalogram signals by means of a long short-term memory (LSTM) artificial neural network and a self-attention-based deep learning model; calculating the Pearson correlation coefficient between the reconstructed speech envelope and each candidate speech envelope, and taking the candidate speech stream whose envelope has the largest correlation coefficient with the reconstructed envelope as the tested subject's auditory attention object; and decoding the speech content online or offline and outputting the reconstructed speech envelope. The device comprises an electroencephalogram cap, an amplifier and a processor. The invention addresses the current lack of an auditory attention decoding model specific to a tonal language and improves the auditory attention decoding effect for non-tonal languages.

Description

Mandarin auditory attention decoding method and device based on EEG
Technical Field
The invention relates to the field of brain-computer interfaces, in particular to a Mandarin auditory attention decoding method and device based on EEG (electroencephalography).
Background
In real life, the speech signal of interest is usually accompanied by complex acoustic conditions such as background noise and interference from irrelevant speakers, which seriously impair the understanding of speech information and reduce the accuracy of speech recognition. Listeners with normal hearing have a certain ability to separate and recognize speech: in face-to-face conversation, people can usually concentrate on the speaker of interest by raising the volume, moving closer to the speaker and adjusting their attention, achieving unimpeded communication. Through long-term learning and everyday experience, the human ear continuously adapts to the influence of noise: it can perceive and distinguish noise from target speech, separate the target speech and pass it on to higher cognitive brain areas for further processing and recognition. This is the cocktail party effect proposed by Cherry in 1953: at a cocktail party many voices sound simultaneously, mixed with music, the clinking of glasses and reverberation reflected back to the ears from the surfaces of the room, yet a listener can pick out the voice of a speaker of interest from the complex mixture and communicate without difficulty [1]. However, patients with hearing impairment cannot isolate the target sounds of interest at such a cocktail party, which affects their quality of life. For more than half a century, researchers in computer science and medicine have attempted to design intelligent speech recognition systems that mimic the human auditory organs to solve the cocktail party problem, but the desired results have not yet been achieved.
Informally, the cocktail party problem concerns the auditory selection ability of humans in complex auditory environments, where a normal listener can easily focus on certain sound stimuli of interest and ignore other interfering sounds. How to design a system model that accurately detects the sound stimuli a person attends to and separates those target stimuli from the complex sound environment is an important problem in the auditory field. Information processing between multiple modalities is not independent: by integrating multisensory input organized across different modalities, multimodal brain regions obtain a target signal that is less noisy and more robust, so that the separation between background noise and target, and the segmentation of continuous time, become easier [2]. Studies have shown that visual input has a very strong influence on the information processing of other modalities; the McGurk effect, for example, shows that the movement of the lips and the surrounding facial area plays a key role in speech processing: when a video of lips articulating the syllable 'ga' is presented together with audio of the syllable 'ba', subjects report hearing neither 'ga' nor 'ba' but 'da'. Moreover, the motion of the lips and chin is related to the acoustic envelope of the speech, and watching the face of the talking speaker can enhance the auditory cortex's tracking of the speech and the attentional selection of the target speaker [4]. There is as yet no consensus on the stage at which multisensory integration occurs; three possibilities exist: first, early integration, in which fusion occurs at a relatively early processing stage as a pre-attentive, perceptually driven process [5-7]; second, late integration, which requires attention during the integration process; and third, parallel integration, in which whether early or late integration occurs depends on the resources available to the task at hand.
Currently, auditory attention decoding mostly relies on electrophysiological approaches such as electroencephalography (EEG), electrocorticography (ECoG) and stereoelectroencephalography (SEEG). The speech materials studied are essentially non-tonal languages such as English and Dutch, while research on the tonal language (Mandarin) is scarce.
Speakers of tonal languages account for more than half of the world's population, with Mandarin Chinese alone having more than 1.5 billion users. Current research shows that the brain activation states and neural coding of tonal and non-tonal languages differ, so it is necessary to study an auditory attention decoding architecture for Mandarin.
Disclosure of Invention
The invention provides an EEG-based Mandarin auditory attention decoding method and device. Taking Mandarin as the speech material, the invention establishes a nonlinear model between the EEG and the speech envelope through a deep learning architecture and constructs an auditory attention decoding device. The invention mainly addresses the current lack of an auditory attention decoding model specific to a tonal language (Mandarin) and shows that the nonlinear auditory attention decoding model can also improve the auditory attention decoding effect for non-tonal languages, as detailed below:
an EEG-based Mandarin auditory attention decoding method, the method comprising:
establishing a nonlinear model between an EEG and a speech envelope through a deep learning architecture, and taking the extracted speech envelope and the collected EEG signal as the input of the nonlinear model;
based on the nonlinear model, constructing a speech envelope reconstruction model from electroencephalogram signals by means of a long short-term memory (LSTM) artificial neural network and a self-attention-based deep learning model (Transformer);
calculating the Pearson correlation coefficient between the reconstructed speech envelope and each candidate speech envelope, and taking the candidate speech stream whose envelope has the largest correlation coefficient with the reconstructed envelope as the tested subject's auditory attention object; the speech content is decoded online or offline, and the reconstructed speech envelope and the auditory attention object are output.
Wherein the acquired EEG signals are:
calculating the power spectral density of each channel from three segments of attention-state electroencephalogram data and drawing a scalp topographic map, wherein channels with a power spectral density greater than 15 μV²/Hz are related to auditory-attention brain regions located in the prefrontal lobe, the temporal lobe and the parietal lobe; 17-channel EEG data covering the STG and the prefrontal lobe are used as the input of the nonlinear model, and the EEG signals are filtered with a 0.1-30 Hz band-pass filter to obtain the EEG information related to the speech signal and attention;
the brain activation state related to auditory attention is obtained by calculating the power spectral density of the electroencephalogram signal, and it is found that the Mandarin-activated brain regions show left lateralization in the frontal lobe.
Further, the speech envelope is:
before the experiment begins, outlines of the two biographical stories used in the experiment appear on a computer screen; the tested subject selects the story of interest according to personal preference and responds by key press; the experimenter sets the audio of the biography selected by the subject to the left or right channel according to the subject's feedback, and the unselected biography is set to the opposite channel and played as interference;
during the experiment, 8 single-choice questions appear at random and the tested subject answers according to the audio heard; each answer is a number or a proper noun, so the subject can make a choice immediately upon hearing it;
EEG data from -300 ms to 0 ms are used for EEG baseline correction; the EEG data are down-sampled to 128 Hz in EEGLAB so that their sampling rate is consistent with the speech envelope, and filtered to 0.1-35 Hz; the collected speech signals are divided into 128 frequency bands distributed linearly on a logarithmic scale over 100-8000 Hz, the frequency distribution range of Mandarin Chinese, and the envelope of each band is extracted by the Hilbert-Huang transform; every eight bands are then averaged to obtain 16 speech envelopes, and the 16 envelopes are linearly combined to obtain the overall speech envelope of Mandarin Chinese.
The speech envelope reconstruction model specifically comprises:
1) Preprocessing data and initializing a weight matrix and a bias vector;
2) Training a neural network using a back-propagation and gradient-based optimizer, and updating weight coefficients and bias vectors to minimize a loss function;
3) Determining a decoding accuracy by performing a Pearson's correlation calculation between the reconstructed and candidate speech envelopes;
4) Optimizing the hyper-parameters of the LSTM model by repeating the steps 2) to 3) until a model with the highest precision is obtained;
5) Dividing the voice envelopes of all the subjects and corresponding electroencephalogram data into a training set, a verification set and a test set for model training and testing, wherein data repetition does not exist among the data sets;
6) According to the Pearson correlation coefficient, the network is trained between the target and the reconstructed speech envelope; the mini-batch size is set to 20, and model training uses stochastic gradient updates with the Adam optimization algorithm, with initial values set as follows: learning rate 0.01, input dimension 18, number of hidden layers 10, dropout 0.5, number of hidden units 64; the total training time is 300 epochs, and model parameters are saved every 10 epochs, as sketched below.
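For illustration only, training steps 1)-6) can be sketched in PyTorch as follows; this is a minimal sketch in which the dummy tensors, the linear read-out head and the exact combination of the two loss terms are assumptions rather than the patented implementation:

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Dummy tensors standing in for preprocessed data: (segments, time, features)
eeg = torch.randn(200, 64, 18)        # 18 = input dimension from step 6)
env = torch.randn(200, 64)            # target speech envelope per segment
train_loader = DataLoader(TensorDataset(eeg, env), batch_size=20, shuffle=True)

lstm = nn.LSTM(input_size=18, hidden_size=64, num_layers=10,
               dropout=0.5, batch_first=True)
head = nn.Linear(64, 1)               # assumed read-out to one envelope sample
optimizer = optim.Adam(list(lstm.parameters()) + list(head.parameters()),
                       lr=0.01)
mse = nn.MSELoss()

def pearson(x, y):
    # sample-wise Pearson correlation along the time axis
    x = x - x.mean(dim=-1, keepdim=True)
    y = y - y.mean(dim=-1, keepdim=True)
    return (x * y).sum(-1) / (x.norm(dim=-1) * y.norm(dim=-1) + 1e-8)

for epoch in range(300):
    for x, target in train_loader:
        pred = head(lstm(x)[0]).squeeze(-1)
        # assumed loss: minimize MSE while maximizing Pearson correlation
        loss = mse(pred, target) - pearson(pred, target).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if (epoch + 1) % 10 == 0:          # save model parameters every 10 epochs
        torch.save({'lstm': lstm.state_dict(), 'head': head.state_dict()},
                   f'lstm_epoch{epoch + 1}.pt')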
Further, the self-attention mechanism is: the importance of each channel is studied by calculating its attention contribution, the influence of different brain regions on auditory attention is evaluated, and a numerical-model basis is provided for optimizing the electroencephalogram channels; the function is:
CR = Sigmoid(W_2 ReLU(W_1 X))
where X represents the original EEG signal, W_1 and W_2 represent learned matrices, and CR represents the channel attention output; two fully connected layers are used to obtain the channel attention, the first fully connected layer being followed by a ReLU layer and the second by a Sigmoid layer.
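By way of illustration, such a gating block could be written as follows; a minimal PyTorch sketch in which the hidden width and the application of the weights along the channel axis are assumptions:

import torch
from torch import nn

class ChannelAttention(nn.Module):
    # Computes CR = Sigmoid(W2 * ReLU(W1 * X)) and re-weights the channels.
    def __init__(self, n_channels: int = 17, hidden: int = 8):
        super().__init__()
        self.fc1 = nn.Linear(n_channels, hidden)      # W1
        self.fc2 = nn.Linear(hidden, n_channels)      # W2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels); CR scores each channel in [0, 1]
        cr = torch.sigmoid(self.fc2(torch.relu(self.fc1(x))))
        return x * cr

# Example: 20 segments of 128 samples over 17 EEG channels
att = ChannelAttention(n_channels=17)
weighted = att(torch.randn(20, 128, 17))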
The decoding employs a standard decoder structure, i.e. one consisting of a multi-head attention layer and a masked multi-head attention module;
in the masked multi-head attention module, the position information is multiplied by an upper triangular matrix after passing through a Softmax function, preventing information leakage from subsequent positions; the predicted speech envelope is then produced by a fully connected layer;
a linear combination of the mean square error and the Pearson correlation is chosen as the loss function for target speech envelope prediction, and the loss is propagated back from the decoder output through the entire model.
An EEG-based Mandarin auditory attention decoding apparatus, the apparatus comprising: an electroencephalogram cap, an amplifier and a processor,
the electroencephalogram cap is used for collecting the electroencephalogram signals, which are processed by the amplifier and transmitted to the processor; the processor calls stored program instructions to make the device execute the above method steps.
The technical scheme provided by the invention has the beneficial effects that:
1. the invention provides an electrode layout for electroencephalogram signal acquisition and a Mandarin experimental paradigm for the field of auditory attention decoding; human-computer interaction can be completed by means of auditory attention decoding, and auditory attention objects or speech content can be decoded online or offline;
2. whereas existing auditory attention decoding models are mainly linear, the invention constructs a nonlinear model between the electroencephalogram signal and the speech envelope; a speech envelope reconstruction model based on the electroencephalogram signal can be completed by means of an LSTM (long short-term memory artificial neural network) model and a Transformer (a deep learning model based on the self-attention mechanism), further improving the accuracy of auditory attention decoding, and can be used in the research and development of a new generation of neural hearing devices (cochlear implants and the like based on auditory attention decoding).
Drawings
FIG. 1 is a Mandarin auditory attention decoding experimental paradigm;
FIG. 2 is a graph of a speech envelope for Mandarin;
FIG. 3 is a flowchart of an experimental paradigm process based on an LSTM decoding architecture;
FIG. 4 is a LSTM architecture diagram;
FIG. 5 is a graph of Mandarin results based on the LSTM decoding architecture;
FIG. 6 is an AAD (auditory attention decoding)-Transformer architecture diagram;
FIG. 7 is a graph of Mandarin results based on AAD-Transformer;
FIG. 8 is a schematic diagram of a Mandarin auditory attention decoding brain electrical acquisition channel.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
The embodiment of the invention provides a Mandarin auditory attention decoding method based on EEG, which is based on an auditory attention decoding architecture model of EEG, and mainly comprises the following steps:
101: executing an experiment task;
the tested person is prompted to execute an auditory attention task with specific duration and autonomously selecting an attention object according to an experimental paradigm design, and the voice envelope characteristic is indirectly extracted by acquiring an EEG signal and is subsequently utilized by utilizing the EEG signal and the voice envelope characteristic.
102: extracting voice envelopes;
the method comprises the following steps: converting the voice signal into envelope characteristic, carrying out 100-8000Hz logarithmic space linear distribution on the voice signal, and then obtaining voice envelope data through Hilbert-Huang transformation.
103: preprocessing an electroencephalogram signal;
the method comprises the following steps: preprocessing an EEG signal acquired by an experiment, which mainly comprises the following steps: re-referencing, eliminating useless leads, band-pass filtering, etc.
104: an auditory attention decoding architecture;
The extracted speech envelope and the collected EEG signal are taken as the input of the auditory attention decoding framework, and the decoded speech envelope is output as the reconstructed speech envelope. The auditory attention object is then decoded from the reconstructed speech envelope and the candidate speech envelopes using the Pearson correlation coefficient.
Wherein the auditory attention decoding architecture is: 1) the inputs are the 17-channel electroencephalogram signals and the candidate speech envelopes; 2) the nonlinear model between the electroencephalogram signal and the speech envelope consists of an LSTM (long short-term memory unit) or AAD-Transformer (auditory-attention-decoding Transformer) model structure; 3) the Pearson correlation coefficient between the reconstructed speech envelope and each candidate speech envelope is calculated, and the candidate speech stream whose envelope has the largest correlation coefficient is taken as the tested subject's auditory attention object, as sketched below.
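Item 3) amounts to a simple correlation comparison; a NumPy sketch (variable names are illustrative only):

import numpy as np

def decode_attention(reconstructed: np.ndarray, candidates: list) -> int:
    # Return the index of the candidate envelope whose Pearson correlation
    # with the EEG-reconstructed envelope is largest.
    def pearson(a, b):
        a = a - a.mean()
        b = b - b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return int(np.argmax([pearson(reconstructed, c) for c in candidates]))

# e.g. attended = decode_attention(env_hat, [env_left, env_right])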
Wherein, the EEG signal acquisition in step 101 is implemented by the following means:
neuroscan brain cap: the collecting cap is made of black elastic cloth and is used for collecting brain scalp electroencephalogram signals of a human. Holes are arranged on the temporal lobe, the forehead lobe and the occipital lobe at two sides of the collecting cap, and buckles are arranged to facilitate the installation and the disassembly of the electroencephalogram signal collector.
The Neuroscan data acquisition and control device: connected to the electroencephalogram cap, it performs data acquisition and amplification, synchronizes event labels, and stores the recorded digital signals and event labels on a computer (facilitating later data analysis and model training).
The preprocessing of the electroencephalogram signals in the step 103 is realized by the following devices:
GPU graphics card: the model is trained on the acquired electroencephalogram signals and speech envelopes on a GTX3090 graphics card, and the trained model parameters are saved.
In summary, through the above steps 101 to 104, the embodiment of the present invention completes human-machine interaction by means of auditory attention decoding and can be used to decode auditory attention objects or speech content online or offline.
Example 2
The scheme of example 1 is further described below with reference to specific calculation formulas and examples, which are described in detail below:
the experimental data acquisition process of the embodiment of the invention is shown in fig. 1. The experiment paradigm is classic binaural hearing experiment paradigm, and the headphone is tried to carry out the experiment. Before the experiment begins, two character biographical story outlines of the experiment module appear on a computer screen, and a story which is interested is selected according to personal interests and is subjected to key response. The experiment main test sets the biographical audio of the person selected by the test to the left or right sound channel according to the feedback of the test, and the biographical audio which is not selected is set to the opposite sound channel for noise playing. Before the audio starts, the tested object is kept at rest for 30s, and the aim is to stabilize the electroencephalogram signal. After each experimental module is started, the subject needs to pay attention to listen to the selected biographical story audio. During the experiment, 8 single choice questions appeared randomly and were tested to answer based on the heard audio. The answers of the questions are basically numbers or proper nouns, the choices can be made after the questions are audited, advanced cognitive functions are not needed, and the randomly-occurring questions are used for ensuring the attention of the whole process of the testee and improving the quality of collected data.
EEG data were recorded using a NeuroScan SynAmps2 with 17 electrodes placed according to the international 10-20 system (the 17 electrode positions used in this experiment are shown in Fig. 8) at a digital sampling rate of 1000 Hz. The EEG data were pre-processed with a 0-150 Hz band-pass filter and a 50 Hz notch filter, all referenced to the mean of all scalp channels. The start of speech playback is taken as '0 ms' of the electroencephalogram recording, and a total of 40 minutes of electroencephalogram data was obtained per subject. EEG data from -300 ms to 0 ms were used for baseline correction. The electroencephalogram data were down-sampled to 128 Hz in EEGLAB so that their sampling rate is consistent with the speech envelope, and filtered to 0.1-35 Hz. The speech signal was first passed through 128 Mel filter bank components linearly distributed in 100-8000 Hz logarithmic space; the Hilbert-Huang transform was performed to obtain the envelope of each band, and every 8 envelopes were averaged to obtain a 16-dimensional envelope signal. All band-wise 16-dimensional speech envelopes were linearly combined to obtain the complete speech envelope, which was down-sampled to 128 Hz to match the EEG signal.
The embodiment of the invention randomly selects three segments of attention-state electroencephalogram data, calculates the power spectral density of each channel and draws the scalp topographic map. Channels with a power spectral density greater than 15 μV²/Hz are associated with auditory-attention brain regions located in the prefrontal, temporal and parietal lobes. As shown in Fig. 8, brain activation is mainly in the STG (superior temporal gyrus), the prefrontal lobe and the parietal lobe, which means that auditory attention requires the participation of higher brain functions such as cognition in addition to auditory function. Embodiments of the present invention use 17-channel EEG data, primarily covering the STG and prefrontal lobe, as the input to the decoding model.
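This channel screening can be reproduced, for example, with Welch's method; in the sketch below the 15 μV²/Hz threshold comes from the description, while the window length and the averaging band are assumptions:

import numpy as np
from scipy.signal import welch

def attention_channels(eeg_uv: np.ndarray, fs: float = 1000.0,
                       threshold: float = 15.0) -> np.ndarray:
    # eeg_uv: (n_channels, n_samples) EEG in microvolts. Returns indices
    # of channels whose mean PSD exceeds the threshold in uV^2/Hz.
    freqs, psd = welch(eeg_uv, fs=fs, nperseg=int(2 * fs), axis=-1)
    band = (freqs >= 0.1) & (freqs <= 35.0)   # assumed band of interest
    return np.where(psd[:, band].mean(axis=1) > threshold)[0]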
To capture the temporal dependencies required by a continuous-speech auditory attention decoding model, the embodiment of the invention uses an LSTM, which maintains the state of each unit to allow dynamic adjustment of the information flow. The LSTM architecture is shown in Fig. 4: the memory cell state c_{t-1} and intermediate output h_{t-1} interact with the subsequent input x_t to determine, based on the output at the previous time and the input at the current time, which elements of the internal state vector should be updated, maintained or forgotten.
The LSTM model with the forgetting factor can be expressed as the following equations (1)-(6):

e_t = σ(x_t U_e + h_{t-1} W_e)   (1)
f_t = σ(x_t U_f + h_{t-1} W_f)   (2)
a_t = σ(x_t U_a + h_{t-1} W_a)   (3)
c̃_t = tanh(x_t U_c + h_{t-1} W_c)   (4)
c_t = f_t ∘ c_{t-1} + e_t ∘ c̃_t   (5)
h_t = tanh(c_t) ∘ a_t   (6)

where the operator ∘ denotes element-wise multiplication; σ denotes the sigmoid activation function; e, f and a represent the input gate (EEG and target speech envelope), the forget gate and the output gate (reconstructed speech envelope), respectively; t denotes the time instant; W_e, W_f, W_a and W_c represent the weight matrices to be learned in LSTM training; U_e, U_f, U_a and U_c represent coefficient matrices; c̃_t is the candidate hidden state calculated from the current input and the previous hidden state; c_t is the internal memory of the LSTM unit; h_t represents the final output of the internal storage unit of the LSTM unit; and tanh is the nonlinear function of the hidden layer. Through its input, forget and output gates, the LSTM unit can capture complex nonlinear features in short-term and long-term data sequences.
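One update step of this cell, written directly from equations (1)-(6); a PyTorch sketch in which the matrix shapes and the random initialization are assumptions:

import torch

D_in, D_h = 18, 64                    # assumed input and hidden sizes
U = {g: 0.1 * torch.randn(D_in, D_h) for g in 'efac'}   # U_e, U_f, U_a, U_c
W = {g: 0.1 * torch.randn(D_h, D_h) for g in 'efac'}    # W_e, W_f, W_a, W_c

def lstm_step(x_t, h_prev, c_prev):
    e_t = torch.sigmoid(x_t @ U['e'] + h_prev @ W['e'])  # input gate, eq. (1)
    f_t = torch.sigmoid(x_t @ U['f'] + h_prev @ W['f'])  # forget gate, eq. (2)
    a_t = torch.sigmoid(x_t @ U['a'] + h_prev @ W['a'])  # output gate, eq. (3)
    c_tilde = torch.tanh(x_t @ U['c'] + h_prev @ W['c']) # candidate, eq. (4)
    c_t = f_t * c_prev + e_t * c_tilde                   # internal memory, eq. (5)
    h_t = torch.tanh(c_t) * a_t                          # final output, eq. (6)
    return h_t, c_t

h, c = torch.zeros(1, D_h), torch.zeros(1, D_h)
h, c = lstm_step(torch.randn(1, D_in), h, c)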
In this work, a new method of speech envelope reconstruction from EEG data is proposed, using a nonlinear model based on an LSTM network. Combined with the Adam optimization method, an LSTM nonlinear dynamic model between the electroencephalogram data and the speech envelope is obtained and used for auditory attention detection. Fig. 4 illustrates the process of reconstructing the speech envelope from electroencephalogram signals with the LSTM model, where the electroencephalogram signals can be considered prior knowledge serving as the input of the LSTM model and the target speech envelope serves as the target (output). The training process is as follows:
step 1: preprocess data and initialize weight matrix and offset vector to include W e ,W f ,W a ,W c ,U e ,U f ,U a And U is c
Step 2: train the neural network using the back-propagation method and a gradient-based optimizer, and update the weight coefficients and bias vectors to minimize the loss function;
Step 3: determine the decoding accuracy by performing a Pearson correlation calculation between the reconstructed and candidate speech envelopes;
Step 4: optimize the hyper-parameters of the LSTM model (such as the number of hidden units, the learning rate, the input dimension and the number of hidden layers) by repeating steps 2 to 3 until the model with the highest accuracy is obtained;
Step 5: the data set, comprising the speech envelopes and corresponding electroencephalogram data of all subjects, is divided into a training set, a validation set and a test set for model training and testing, with no data repetition between the sets.
To prevent overfitting of the LSTM model and improve generalization and portability, embodiments of the invention randomly draw a 6-minute data segment from each subject's entire 40-minute EEG recording for training, 2 minutes for validation, and the rest (32 minutes) for the test data set.
The network is trained between the target and the reconstructed speech envelope according to the Pearson correlation coefficient. The mini-batch size is set to 20, and model training uses stochastic gradient updates with the Adam optimization algorithm. The initial values are set as follows: learning rate 0.01, input dimension 18, number of hidden layers 10, dropout 0.5, number of hidden units 64. The total training time is 300 epochs, and model parameters are saved every 10 epochs. Finally, the model checkpoint with the best loss (root mean square error of the speech envelope plus the Pearson coefficient) is selected as the optimal model for subsequent use. The software platform used to extract the speech envelope in this study was MATLAB 2018b; the LSTM algorithm was implemented in PyTorch under Python 3.8.
A linear TRF (temporal response function) model is used for comparison of the decoding results. In this model, a spatio-temporal filter is trained and applied to the electroencephalographic data to reconstruct the envelope of the attended stimulus. The reconstructed envelope is then correlated (Pearson correlation coefficient) with each candidate envelope over the decision window. For an auditory system monitored by N recording channels, it can be assumed that the instantaneous neural response r(t, n), sampled at times t = 1…T and channels n = 1…N, is composed from a stimulus property s(t) by convolution with an unknown channel-specific TRF w(τ, n). The response model can be expressed in discrete time as:

r(t, n) = Σ_τ w(τ, n) s(t - τ) + ε(t, n)   (7)

Here the TRF can be thought of as a filter that describes the linear transformation of the ongoing stimulus into the ongoing neural response; w(τ, n) describes this transformation over a range of time lags τ relative to the occurrence of the stimulus feature s(t).
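The description does not spell out how the spatio-temporal filter is estimated; a common choice is ridge regression on time-lagged EEG, sketched below with an assumed lag range and regularization strength:

import numpy as np

def lagged(eeg: np.ndarray, lags: range) -> np.ndarray:
    # Stack time-lagged copies of the EEG; eeg is (T, n_channels).
    T = eeg.shape[0]
    X = np.hstack([np.roll(eeg, -l, axis=0) for l in lags])
    return X[: T - max(lags)]          # drop wrapped-around samples

def train_decoder(eeg, env, lags=range(0, 32), lam=1e3):
    # Backward TRF: reconstruct the attended envelope from lagged EEG
    # via the ridge solution w = (X'X + lam I)^-1 X'y.
    X = lagged(eeg, lags)
    y = env[: X.shape[0]]
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# reconstruction over a decision window: env_hat = lagged(eeg, lags) @ w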
The TRF model was trained on the same dataset for a fair comparison with the LSTM; however, the decision windows of the LSTM and TRF models are 0.15 s and 3 s, respectively. Fig. 5 shows the results of EEG-based Mandarin auditory attention decoding, where panel (a) shows the decoding accuracy of the two models on the test data, panel (b) shows partial results of the Pearson coefficients between the reconstructed speech envelope and the target and non-target speech, and panel (c) shows the reconstructed speech envelope against the target and non-target speech envelopes.
Example 3
The embodiment of the invention is a binaural listening experiment based on the cocktail party problem: two auditory stimuli are played simultaneously through the left and right channels. Before the experiment began, the subjects were asked to select the speech stimulus of interest based on the subject matter of the story, and they were required to focus on the channel playing the story they selected. To maintain attention, subjects completed selection questions related to the content of the attended biography that appeared randomly on the screen during the experiment; eight such questions related to the selected story were presented randomly to ensure that the subjects listened attentively throughout the trial. Each question has four candidate answers, only one of which is correct; the correct answer is a number or a proper noun. Subjects were instructed to respond immediately upon hearing the answer. This process does not require higher cognitive functions, such as calculation and memory, ensuring that the EEG signals are instinctive responses to the attended speech. Each trial lasted 9-11 minutes, with a 5-minute interval between trials.
The auditory stimuli consisted of eight biographical stories read aloud in Mandarin by native male or female speakers and recorded using a Sennheiser e845S at a sampling frequency of 48000 Hz. The speech signal is first band-pass filtered by 128 Mel filter banks linearly distributed in 100-8000 Hz logarithmic space, and the speech envelope is then obtained by the Hilbert-Huang transform.
According to the international 10-20 system, EEG data were recorded using a NeuroScan SynAmps2 with 64 electrodes and pre-processed by a 0.1-150Hz band pass filter and a 50Hz notch filter. EEG data is referenced to the average of all scalp channels.
The EEG data and speech envelope are down-sampled to 128Hz and used as raw signal input for the decoding model in subsequent steps. A 6-minute continuous piece of data was randomly drawn from the entire EEG data (40 minutes) for each subject for training, 2 minutes for validation, and the rest (32 minutes) for testing.
The AAD-Transformer provided by the embodiment of the invention has an encoder-decoder framework that introduces a temporal self-attention module and an electroencephalogram-based channel attention module for auditory attention decoding. Fig. 6 shows a block diagram of the overall model and the details of the temporal self-attention and channel attention modules.
The Transformer decoder is an autoregressive generative model, mainly using the self-attention mechanism and sinusoidal position information. Each layer consists of a temporal self-attention sublayer, a feed-forward network sublayer, a residual sublayer and a dropout layer.
The self-attention layer first transforms the EEG sequence, a set of D-dimensional vectors X = (x_1, x_2, …, x_L), into queries Q = X W_Q, keys K = X W_K and values V = X W_V, where W_Q, W_K and W_V are all D×D square matrices. Each L×D query, key and value matrix is then split into H attention heads of dimension D_h = D/H, indexed by h. This allows the model to focus on different parts of the history, where H represents the number of attention heads, L the length of the input electroencephalogram data, and D the dimensionality of the electroencephalogram data. The output sequence of each head is calculated by scaled dot-product attention:

Z_h = Softmax(Q_h K_h^T / √D_h) V_h   (8)

The outputs of all attention heads are concatenated and passed through the feed-forward network (FF) to obtain the L×D matrix Z. In the decoder, an upper triangular matrix ensures that target speech envelope features after the current time instant do not appear in the query values. The FF network sublayer then takes the output Z of the preceding attention sublayer and applies two point-wise dense layers along the depth dimension D:
FF(Z) = Softmax(Z W_1 + b_1) W_2 + b_2   (9)
where W_1, W_2 and b_1, b_2 are the weights and biases of the two layers, respectively.
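For illustration, the head splitting and equation (8) can be written as follows; only the shapes given above are taken from the text, the rest is an assumed minimal sketch:

import math
import torch

def multi_head_self_attention(X, W_Q, W_K, W_V, H):
    # X: (L, D) EEG sequence; W_Q, W_K, W_V: (D, D); H attention heads.
    L, D = X.shape
    d_h = D // H                                       # D_h = D / H
    def split(M):                                      # (L, D) -> (H, L, d_h)
        return M.view(L, H, d_h).transpose(0, 1)
    Q, K, V = split(X @ W_Q), split(X @ W_K), split(X @ W_V)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_h)  # scaled dot product
    Z = torch.softmax(scores, dim=-1) @ V              # eq. (8), per head
    return Z.transpose(0, 1).reshape(L, D)             # concatenate the heads

L, D, H = 128, 64, 8
Z = multi_head_self_attention(torch.randn(L, D), torch.randn(D, D),
                              torch.randn(D, D), torch.randn(D, D), H)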
Embodiments of the present invention provide a computational gating block, i.e. the channel attention module, which learns the nonlinear interaction between the multiple EEG channels and the speech envelope. The function can be written as:
CR = Sigmoid(W_2 ReLU(W_1 X))   (10)
where X represents the original EEG signal, W_1 and W_2 represent the learning matrices, and CR represents the channel attention output. Two fully connected layers are used to obtain the channel attention: the first fully connected layer is followed by a ReLU layer, and the second fully connected layer is followed by a Sigmoid layer. The purpose of the EEG channel attention module is to study the importance of each channel by calculating its attention contribution; the module can evaluate the influence of different brain regions on auditory attention and provide a numerical-model basis for optimizing the electroencephalogram channels.
The encoder aims to extract a robust representation of the EEG input. The dominant feature vectors are passed on preferentially as the self-attention feature map of the next layer, which comprises multi-head temporal self-attention, a feed-forward network, residual connections and layer normalization. A one-dimensional convolution filter (kernel width = 3, padding = 1) with the ReLU activation function is first applied along the time dimension, and the conventional self-attention operation is then performed. Embodiments of the present invention use 6 such blocks (N×) as the basic structure of the entire encoder to obtain its final hidden representation.
Embodiments of the present invention use a standard decoder structure as shown in Fig. 6, consisting of a multi-head attention layer and a masked multi-head attention module. In the masked multi-head attention module of the decoder, the position information is multiplied by an upper triangular matrix directly after the Softmax function, which prevents information leakage at subsequent positions. Finally, the predicted speech envelope is produced by the fully connected layer. The embodiment of the invention selects a linear combination of the mean square error (MSE) and the Pearson correlation as the loss function for target speech envelope prediction and propagates the loss from the output of the decoder back through the entire model.
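The masking and the combined loss can be sketched as follows (PyTorch); the equal weighting of the two loss terms is an assumption, since the text only states that a linear combination is used:

import torch

def causal_mask(L: int) -> torch.Tensor:
    # upper-triangular mask marking the future positions to be blocked
    return torch.triu(torch.ones(L, L), diagonal=1).bool()

def envelope_loss(pred: torch.Tensor, target: torch.Tensor,
                  alpha: float = 0.5) -> torch.Tensor:
    # Linear combination of MSE and (one minus) the Pearson correlation.
    mse = torch.mean((pred - target) ** 2)
    p, t = pred - pred.mean(), target - target.mean()
    r = (p * t).sum() / (p.norm() * t.norm() + 1e-8)
    return alpha * mse + (1 - alpha) * (1 - r)

# loss = envelope_loss(decoder_output, target_envelope); loss.backward()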
The Mandarin auditory attention decoding results based on the AAD-Transformer are shown in Fig. 7, where panel (a) shows the auditory attention decoding accuracy for each subject and panel (b) shows the mean decoding accuracy of the linear model (TRF), the AAD-Transformer codec model (including only the temporal self-attention module) and the AAD-Transformer2 codec model (including both the temporal self-attention and channel attention modules).
An EEG-based Mandarin auditory attention decoding device, the device comprising: an electroencephalogram cap, an amplifier and a processor,
the electroencephalogram cap is used for collecting electroencephalogram signals, the electroencephalogram signals are transmitted to the processor after being processed by the amplifier, and the processor calls stored program instructions to enable the device to execute the method steps:
establishing a nonlinear model between EEG and voice envelope through a deep learning architecture, and taking the extracted voice envelope and the collected EEG signal as the input of the nonlinear model;
based on the nonlinear model, constructing a speech envelope reconstruction model from electroencephalogram signals by means of a long short-term memory (LSTM) artificial neural network and a self-attention-based deep learning model (Transformer);
calculating the Pearson correlation coefficient between the reconstructed speech envelope and each candidate speech envelope, and taking the candidate speech stream whose envelope has the largest correlation coefficient with the reconstructed envelope as the tested subject's auditory attention object; decoding the speech content online or offline,
the reconstructed speech envelope and the auditory attention object are output.
Wherein the collected EEG signals are:
calculating the power spectral density of each channel from three segments of attention-state electroencephalogram data and drawing a scalp topographic map, wherein channels with a power spectral density greater than 15 μV²/Hz are related to auditory-attention brain regions located in the prefrontal lobe, the temporal lobe and the parietal lobe; 17-channel EEG data covering the STG and the prefrontal lobe are used as the input of the nonlinear model, and the EEG signals are filtered with a 0.1-30 Hz band-pass filter to obtain the EEG information related to the speech signal and attention;
the brain activation state related to auditory attention is obtained by calculating the power spectral density of the electroencephalogram signal, and it is found that the Mandarin-activated brain regions show left lateralization in the frontal lobe.
Further, the speech envelope is:
before the experiment begins, outlines of the two biographical stories used in the experiment appear on a computer screen; the tested subject selects the story of interest according to personal preference and responds by key press; the experimenter sets the audio of the biography selected by the subject to the left or right channel according to the subject's feedback, and the unselected biography is set to the opposite channel and played as interference;
during the experiment, 8 single-choice questions appear at random and the tested subject answers according to the audio heard; each answer is a number or a proper noun, so the subject can make a choice immediately upon hearing it;
EEG data from -300 ms to 0 ms are used for EEG baseline correction; the EEG data are down-sampled to 128 Hz in EEGLAB so that their sampling rate is consistent with the speech envelope, and filtered to 0.1-35 Hz; the collected speech signals are divided into 128 frequency bands distributed linearly on a logarithmic scale over 100-8000 Hz, the frequency distribution range of Mandarin Chinese, and the envelope of each band is extracted by the Hilbert-Huang transform; every eight bands are then averaged to obtain 16 speech envelopes, and the 16 envelopes are linearly combined to obtain the overall speech envelope of Mandarin Chinese.
The voice envelope reconstruction model specifically comprises the following steps:
1) Preprocessing data and initializing a weight matrix and a bias vector;
2) Training a neural network using a back-propagation and gradient-based optimizer and updating weight coefficients and bias vectors to minimize a loss function;
3) Determining a decoding accuracy by performing Pearson's correlation calculations between the reconstructed and candidate speech envelopes;
4) Optimizing the hyper-parameters of the LSTM model by repeating the steps 2) to 3) until a model with the highest precision is obtained;
5) Dividing the voice envelopes of all the subjects and corresponding electroencephalogram data into a training set, a verification set and a test set for model training and testing, wherein data repetition does not exist among the data sets;
6) According to the Pearson correlation coefficient, the network is trained between the target and the reconstructed speech envelope; the mini-batch size is set to 20, and model training uses stochastic gradient updates with the Adam optimization algorithm, with initial values set as follows: learning rate 0.01, input dimension 18, number of hidden layers 10, dropout 0.5, number of hidden units 64; the total training time is 300 epochs, and model parameters are saved every 10 epochs.
Further, the self-attention mechanism is: the importance of each channel is studied by calculating its attention contribution, the influence of different brain regions on auditory attention is evaluated, and a numerical-model basis is provided for optimizing the electroencephalogram channels; the function is:
CR = Sigmoid(W_2 ReLU(W_1 X))
where X represents the original EEG signal, W_1 and W_2 represent learned matrices, and CR represents the channel attention output; two fully connected layers are used to obtain the channel attention, the first fully connected layer being followed by a ReLU layer and the second by a Sigmoid layer.
Wherein, the decoding employs a standard decoder structure, i.e. one consisting of a multi-head attention layer and a masked multi-head attention module;
in the masked multi-head attention module, the position information is multiplied by an upper triangular matrix after passing through a Softmax function, preventing information leakage from subsequent positions; the predicted speech envelope is then produced by a fully connected layer;
a linear combination of the mean square error and the Pearson correlation is chosen as the loss function for target speech envelope prediction, and the loss is propagated back from the decoder output through the entire model.
It should be noted that the device description in the above embodiments corresponds to the method description in the embodiments, and the embodiments of the present invention are not described herein again.
The processor can be a computer, a single-chip microcomputer, a microcontroller or another device with computing functions; in specific implementation, the embodiment of the invention does not limit the choice, which is made according to the needs of the practical application.
In the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. An EEG-based Mandarin auditory attention decoding method, the method comprising:
establishing a nonlinear model between an EEG and a speech envelope through a deep learning architecture, and taking the extracted speech envelope and the collected EEG signal as the input of the nonlinear model;
based on the nonlinear model, constructing a speech envelope reconstruction model from electroencephalogram signals by means of a long short-term memory (LSTM) artificial neural network and a self-attention-based deep learning model (Transformer);
calculating the Pearson correlation coefficient between the reconstructed speech envelope and each candidate speech envelope, and taking the candidate speech stream whose envelope has the largest correlation coefficient with the reconstructed envelope as the tested subject's auditory attention object; the speech content is decoded online or offline, and the reconstructed speech envelope and the auditory attention object are output.
2. An EEG-based Mandarin auditory attention decoding method according to claim 1, wherein said acquired EEG signals are:
calculating the power spectral density of each channel from three segments of attention-state electroencephalogram data and drawing a scalp topographic map, wherein channels with a power spectral density greater than 15 μV²/Hz are related to auditory-attention brain regions located in the prefrontal lobe, the temporal lobe and the parietal lobe; 17-channel EEG data covering the STG and the prefrontal lobe are used as the input of the nonlinear model, and the EEG signals are filtered with a 0.1-30 Hz band-pass filter to obtain the EEG information related to the speech signal and attention;
the brain activation state related to auditory attention is obtained by calculating the power spectral density of the electroencephalogram signal, and it is found that the Mandarin-activated brain regions show left lateralization in the frontal lobe.
3. An EEG-based Mandarin auditory attention decoding method as claimed in claim 1, wherein said speech envelope is:
before the experiment begins, outlines of the two biographical stories used in the experiment appear on a computer screen; the tested subject selects the story of interest according to personal preference and responds by key press; the experimenter sets the audio of the biography selected by the subject to the left or right channel according to the subject's feedback, and the unselected biography is set to the opposite channel and played as interference;
during the experiment, 8 single-choice questions appear at random and the tested subject answers according to the audio heard; each answer is a number or a proper noun, so the subject can make a choice immediately upon hearing it;
EEG data from -300 ms to 0 ms are used for EEG baseline correction; the EEG data are down-sampled to 128 Hz in EEGLAB so that their sampling rate is consistent with the speech envelope, and filtered to 0.1-35 Hz;
the collected speech signals are divided into 128 frequency bands distributed linearly on a logarithmic scale over 100-8000 Hz, the frequency distribution range of Mandarin Chinese, and the envelope of each band is extracted by the Hilbert-Huang transform; every eight bands are then averaged to obtain 16 speech envelopes, and the 16 envelopes are linearly combined to obtain the overall speech envelope of Mandarin Chinese.
4. The EEG-based Mandarin auditory attention decoding method of claim 1, wherein said speech envelope reconstruction model is specifically:
1) Preprocessing data and initializing a weight matrix and a bias vector;
2) Training a neural network using a back-propagation and gradient-based optimizer, and updating weight coefficients and bias vectors to minimize a loss function;
3) Determining a decoding accuracy by performing a Pearson's correlation calculation between the reconstructed and candidate speech envelopes;
4) Optimizing the hyper-parameters of the LSTM model by repeating the steps 2) to 3) until a model with the highest precision is obtained;
5) Dividing the voice envelopes of all the subjects and corresponding electroencephalogram data into a training set, a verification set and a test set for model training and testing, wherein data repetition does not exist among the data sets;
6) According to the Pearson correlation coefficient, the network is trained between the target and the reconstructed speech envelope; the mini-batch size is set to 20, and model training uses stochastic gradient updates with the Adam optimization algorithm, with initial values set as follows: learning rate 0.01, input dimension 18, number of hidden layers 10, dropout 0.5, number of hidden units 64; the total training time is 300 epochs, and model parameters are saved every 10 epochs.
5. An EEG-based Mandarin auditory attention decoding method as claimed in claim 1, wherein said self-attention mechanism is: the importance of each channel is studied by calculating its attention contribution, the influence of different brain regions on auditory attention is evaluated, and a numerical-model basis is provided for optimizing the electroencephalogram channels; the function is:
CR = Sigmoid(W_2 ReLU(W_1 X))
where X represents the original EEG signal, W_1 and W_2 represent learned matrices, and CR represents the channel attention output; two fully connected layers are used to obtain the channel attention, the first fully connected layer being followed by a ReLU layer and the second by a Sigmoid layer.
6. An EEG-based Mandarin auditory attention decoding method according to claim 1, wherein said decoding employs a standard decoder structure, i.e. one consisting of a multi-head attention layer and a masked multi-head attention module,
in the masked multi-head attention module, the position information is multiplied by an upper triangular matrix after passing through a Softmax function, preventing information leakage from subsequent positions; the predicted speech envelope is then produced by a fully connected layer;
a linear combination of the mean square error and the Pearson correlation is chosen as the loss function for target speech envelope prediction, and the loss is propagated back from the decoder output through the entire model.
7. An EEG-based Mandarin auditory attention decoding method as claimed in claim 1, wherein said decoding time window varies from 60 s down to 0.15 s, the shortest decoding time window having a length of 0.15 s and containing a smallest phonetic unit.
8. An EEG-based Mandarin auditory attention decoding apparatus, the apparatus comprising: an electroencephalogram cap, an amplifier and a processor,
the electroencephalogram cap is used for collecting electroencephalogram signals, the electroencephalogram signals are transmitted to the processor after being processed by the amplifier, and the processor calls stored program instructions to enable the device to execute the method steps of any one of claims 1-7.
CN202210527156.XA 2022-05-16 2022-05-16 Mandarin auditory attention decoding method and device based on EEG Pending CN115153563A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210527156.XA CN115153563A (en) 2022-05-16 2022-05-16 Mandarin auditory attention decoding method and device based on EEG

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210527156.XA CN115153563A (en) 2022-05-16 2022-05-16 Mandarin auditory attention decoding method and device based on EEG

Publications (1)

Publication Number Publication Date
CN115153563A true CN115153563A (en) 2022-10-11

Family

ID=83482928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210527156.XA Pending CN115153563A (en) 2022-05-16 2022-05-16 Mandarin auditory attention decoding method and device based on EEG

Country Status (1)

Country Link
CN (1) CN115153563A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017218492A1 (en) * 2016-06-14 2017-12-21 The Trustees Of Columbia University In The City Of New York Neural decoding of attentional selection in multi-speaker environments
CN107864440A (en) * 2016-07-08 2018-03-30 奥迪康有限公司 Hearing assistance system including EEG records and analysis system
CN110830898A (en) * 2018-08-08 2020-02-21 斯达克实验室公司 Electroencephalogram-assisted beamformer, method of beamforming, and ear-worn hearing system
US20210397952A1 (en) * 2018-10-17 2021-12-23 Georgia Tech Research Corporation Systems and methods for decoding code-multiplexed coulter signals using machine learning
US20200201435A1 (en) * 2018-12-20 2020-06-25 Massachusetts Institute Of Technology End-To-End Deep Neural Network For Auditory Attention Decoding
CN113192504A (en) * 2021-04-29 2021-07-30 浙江大学 Domain-adaptation-based silent voice attack detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Haoheng Kuang; Jun Qu: "LSTM Model with Self-Attention Mechanism for EEG Based Cross-Subject Fatigue Detection", 2021 IEEE 3rd International Conference on Frontiers Technology of Information and Computer (ICFTIC), 24 December 2021, pp. 148-153 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116011506A (en) * 2023-03-28 2023-04-25 同心智医科技(北京)有限公司 Method for constructing stereo electroencephalogram electrode signal decoding model and application thereof
CN116172580A (en) * 2023-04-20 2023-05-30 华南理工大学 Auditory attention object decoding method suitable for multi-sound source scene
CN116172580B (en) * 2023-04-20 2023-08-22 华南理工大学 Auditory attention object decoding method suitable for multi-sound source scene
CN116269447A (en) * 2023-05-17 2023-06-23 之江实验室 Speech recognition evaluation system based on voice modulation and electroencephalogram signals
CN116269447B (en) * 2023-05-17 2023-08-29 之江实验室 Speech recognition evaluation system based on voice modulation and electroencephalogram signals
CN117130490A (en) * 2023-10-26 2023-11-28 天津大学 Brain-computer interface control system, control method and implementation method thereof
CN117130490B (en) * 2023-10-26 2024-01-26 天津大学 Brain-computer interface control system, control method and implementation method thereof
CN117437367A (en) * 2023-12-22 2024-01-23 天津大学 Early warning earphone sliding and dynamic correction method based on auricle correlation function
CN117437367B (en) * 2023-12-22 2024-02-23 天津大学 Early warning earphone sliding and dynamic correction method based on auricle correlation function

Similar Documents

Publication Publication Date Title
US11961533B2 (en) Systems and methods for speech separation and neural decoding of attentional selection in multi-speaker environments
CN115153563A (en) Mandarin auditory attention decoding method and device based on EEG
Geirnaert et al. Electroencephalography-based auditory attention decoding: Toward neurosteered hearing devices
CA3145254A1 (en) Method of contextual speech decoding from the brain
Kello et al. A neural network model of the articulatory-acoustic forward mapping trained on recordings of articulatory parameters
Severini et al. Automatic detection of cry sounds in neonatal intensive care units by using deep learning and acoustic scene simulation
Mini et al. EEG based direct speech BCI system using a fusion of SMRT and MFCC/LPCC features with ANN classifier
Xu et al. Decoding selective auditory attention with EEG using a transformer model
Krishna et al. Improving eeg based continuous speech recognition
WO2021035067A1 (en) Measuring language proficiency from electroencephelography data
Koctúrová et al. EEG-based speech activity detection
Zakeri et al. Supervised binaural source separation using auditory attention detection in realistic scenarios
Ribeiro et al. Silent versus modal multi-speaker speech recognition from ultrasound and video
Xu et al. Auditory attention decoding from eeg-based mandarin speech envelope reconstruction
Krishna et al. Continuous Silent Speech Recognition using EEG
Anumanchipalli et al. Intelligible speech synthesis from neural decoding of spoken sentences
Geirnaert et al. EEG-based auditory attention decoding: Towards neuro-steered hearing devices
Bollens et al. SparrKULee: A Speech-evoked Auditory Response Repository of the KU Leuven, containing EEG of 85 participants
Sharon et al. The "Sound of Silence" in EEG - Cognitive voice activity detection
Sharma et al. Human-Computer Interaction with Special Emphasis on Converting Brain Signals to Speech
Gaddy Voicing Silent Speech
Lai et al. Plastic multi-resolution auditory model based neural network for speech enhancement
Li et al. Esaa: An Eeg-Speech Auditory Attention Detection Database
Datta Brain signal recognition using deep learning
CN117130490B (en) Brain-computer interface control system, control method and implementation method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination