AU2021105454A4 - Acoustic attention decoding from electroencephalography based on long short-term memory network - Google Patents

Acoustic attention decoding from electroencephalography based on long short-term memory network

Info

Publication number
AU2021105454A4
Authority
AU
Australia
Prior art keywords
model
subject
short
decoding
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021105454A
Inventor
Aman Joshi
Vijay Mohan
Bharti Panjwani
Ankur Singh Rana
Abhay Sharma
Bharat Bhushan Sharma
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mohan Vijay Dr
Panjwani Bharti Ms
Rana Ankur Singh Dr
Sharma Bharat Bhushan Dr
Original Assignee
Mohan Vijay Dr
Panjwani Bharti Ms
Rana Ankur Singh Dr
Sharma Abhay Mr
Sharma Bharat Bhushan Dr
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mohan Vijay Dr, Panjwani Bharti Ms, Rana Ankur Singh Dr, Sharma Abhay Mr, Sharma Bharat Bhushan Dr filed Critical Mohan Vijay Dr
Priority to AU2021105454A priority Critical patent/AU2021105454A4/en
Application granted granted Critical
Publication of AU2021105454A4 publication Critical patent/AU2021105454A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235Details of waveform analysis
    • A61B5/7264Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/24Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316Modalities, i.e. specific diagnostic methods
    • A61B5/369Electroencephalography [EEG]
    • A61B5/372Analysis of electroencephalograms
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001Codebooks
    • G10L2019/0011Long term prediction filters, i.e. pitch estimation

Abstract

A healthy person can attend to one speech stream in a multi-speaker scenario; however, this ability is not available to some people suffering from hearing impairments. Research on acoustic attention detection based on electroencephalography (EEG) is therefore a possible way to help hearing-impaired listeners detect the attended speech. Many previous studies used linear models or deep learning to decode the attended speech, but the cross-subject decoding accuracy is low, especially within short time durations. In this study, we propose a multi-task learning model based on a long short-term memory network that simultaneously performs attention decoding and reconstructs the attended temporal amplitude envelopes under a 2 s time condition. The experimental results show that, compared with the traditional linear method, both the subject-specific and cross-subject decoding performance improve greatly. In particular, the cross-subject decoding accuracy was improved from 56% to 82% under the 2 s condition in the dichotic listening experiment. Furthermore, analysis of the channel contribution map showed that the frontal and temporal regions of the brain are more important for the detection of acoustic attention. In summary, the proposed method is promising for neuro-steered hearing aids, which can help hearing-impaired listeners make faster and more accurate attention detection.

Description

Fig 1: proposed model R+D (layers Conv-1, Pool-1, Conv-2, Pool-2)
Fig 2: Listening experiment
TITLE OF THE INVENTION: ACOUSTIC ATTENTION DECODING FROM ELECTROENCEPHALOGRAPHY BASED ON LONG SHORT-TERM MEMORY NETWORK
Field and background of the invention
Understanding speech in a noisy environment requires not only the auditory system to separate simultaneous speech streams but also the ability to focus on a specific speech stream while suppressing irrelevant information. Although normal-hearing people can easily analyze complex speech scenes, some people with hearing impairments encounter difficulties in understanding speech in a noisy environment, even if they wear acoustic prostheses.
Previous studies proposed reconstructing the temporal amplitude envelope of speech from neural signals and comparing the correlation of the reconstructed envelope with the presented attended and unattended speech in order to detect acoustic attention; a minimal sketch of this decision rule is given at the end of this section. They assumed that if one attends to a speech stream, the reconstruction correlation is higher for the attended side than for the unattended side. This idea is based on the hypothesis that the cortical signal can track the temporal amplitude envelope of the acoustic speech signal.
Numerous electrophysiological studies have shown that temporal fluctuations below 20 Hz in speech signals are synchronized with low-frequency cortical activity in the delta and theta bands.
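The correlation-comparison rule described above can be made concrete with a short illustrative sketch. It is an assumption-laden example rather than the patent's method: the function and variable names (decide_attended, env_left, env_right) and the 64 Hz envelope rate with a 2 s window are illustrative choices.

    # Minimal sketch of the correlation-based attention decision described above.
    # All names and the synthetic data are illustrative assumptions.
    import numpy as np

    def decide_attended(reconstructed_env, env_left, env_right):
        """Return 'left' or 'right' depending on which presented envelope
        correlates more strongly with the envelope reconstructed from EEG."""
        r_left = np.corrcoef(reconstructed_env, env_left)[0, 1]
        r_right = np.corrcoef(reconstructed_env, env_right)[0, 1]
        return "left" if r_left > r_right else "right"

    # Example with synthetic data (64 Hz envelope, 2 s window = 128 samples).
    rng = np.random.default_rng(0)
    env_left = rng.standard_normal(128)
    env_right = rng.standard_normal(128)
    reconstructed = 0.7 * env_left + 0.3 * rng.standard_normal(128)  # noisy copy of the left envelope
    print(decide_attended(reconstructed, env_left, env_right))        # expected: 'left'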
Summary of Invention
Forty subjects participated in the dichotic listening experiment. Participants reported no history of hearing impairment or neurological disease and were right-handed. In the experiment, each subject listened to two classic works of fiction that were played simultaneously, with a different story presented to the left and right ear, respectively, through Sennheiser HD650 headphones. The story played in each trial continued from the previous trial. A different male speaker read each story. Participants were evenly divided into two groups, and each group attended to one of the stories in the left or right ear and ignored the other. After each trial, the subjects were required to answer 5 to 7 multiple-choice questions about the story to check their attention. The amplitude of each speech stream was normalized to the same root mean square intensity (see the sketch after this paragraph). All silent intervals in the audio were shortened to 0.4 s to avoid distracting the subjects. Moreover, subjects were asked to fixate on the crosshair on the screen during each trial and to minimize blinks, head movements, and all other motor activity.
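As an illustration of the stimulus preparation step mentioned above, the following is a minimal sketch of equalizing two waveforms to the same root-mean-square intensity; the function name, target RMS value, and placeholder waveforms are assumptions made only for illustration.

    # Illustrative sketch (not the patent's code) of equalizing two speech
    # streams to the same root-mean-square intensity before presentation.
    import numpy as np

    def normalize_rms(signal, target_rms=0.1):
        """Scale a waveform so that its RMS amplitude equals target_rms."""
        rms = np.sqrt(np.mean(signal ** 2))
        return signal * (target_rms / rms) if rms > 0 else signal

    rng = np.random.default_rng(1)
    left_story = rng.standard_normal(44100) * 0.3    # placeholder waveforms
    right_story = rng.standard_normal(44100) * 0.05
    left_story, right_story = normalize_rms(left_story), normalize_rms(right_story)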
Multi-task learning model
The long short-term memory network can automatically extract abundant features through layer-by-layer convolution. The lower layers extract local spatial features, such as boundaries, while the higher layers extract global features, such as complex shapes and complete objects. Electroencephalography spatial maps can be regarded as a two-dimensional image: one dimension is the temporal context and the other is the electrode (spatial) dimension. Therefore, our study employed a long short-term memory network to perform automatic feature extraction on electroencephalography signals. The proposed multi-task learning model (the R+D model) innovatively combines the task of attended temporal amplitude envelope reconstruction with the classification task for the left and right direction by sharing hidden convolution layers. The model uses the stimulus reconstruction task to facilitate the learning of the direction classification task. A sketch of this shared-trunk architecture follows.
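The following is a minimal PyTorch sketch of such a shared-trunk multi-task model, not the patent's actual implementation. The 3 x 3 kernels, 16 and 32 filters, dropout of 0.4, and two-way output follow the claims; the input shape (128 time samples x 64 EEG channels), the pooling layers, the hidden layer width, and the envelope output length are illustrative assumptions.

    # Hypothetical sketch of the shared-trunk R+D model; shapes are assumptions.
    import torch
    import torch.nn as nn

    class RDModel(nn.Module):
        def __init__(self, n_times=128, n_channels=64, env_len=128):
            super().__init__()
            # Shared convolutional trunk over the (time x channel) EEG "image".
            self.trunk = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Flatten(),
            )
            feat_dim = 32 * (n_times // 4) * (n_channels // 4)
            # Classification head: fully connected layer, dropout 0.4, 2-way output.
            self.classifier = nn.Sequential(
                nn.Linear(feat_dim, 64), nn.ReLU(), nn.Dropout(0.4),
                nn.Linear(64, 2),
            )
            # Reconstruction head: predicts the attended temporal amplitude envelope.
            self.reconstructor = nn.Linear(feat_dim, env_len)

        def forward(self, eeg):                          # eeg: (batch, 1, n_times, n_channels)
            features = self.trunk(eeg)
            direction_logits = self.classifier(features)  # softmax is applied inside the loss
            envelope = self.reconstructor(features)
            return direction_logits, envelope

    model = RDModel()
    logits, env = model(torch.randn(8, 1, 128, 64))
    print(logits.shape, env.shape)   # torch.Size([8, 2]) torch.Size([8, 128])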
Brief description of the system
In contrast, the long short-term memory network based nonlinear models achieve strong classification results under the 2 s duration. Among them, the classification accuracy of the D model is about 89.73%, and the average accuracy of the proposed R+D model reaches 90.78%. Since the nonlinear classification model can already obtain very high classification accuracy, the accuracy of the proposed R+D model does not show a large improvement in the subject-specific condition. However, for each subject, the classification accuracy of our proposed R+D model is improved compared with the D model.
Due to the large differences in electroencephalography signals between subjects, the classification accuracy of these three models under the cross-subject condition is lower than under the subject-specific condition (a sketch of one possible cross-subject evaluation scheme follows this section). The results show that the mTRF linear model performed poorly, at about 55.66%, whereas the cross-subject decoding accuracy of the R+D model can reach 82.48% under the 2 s condition. Compared with the mTRF model and the D model, the R+D model has higher classification accuracy in the 2 s cross-subject condition, which also indicates that the long short-term memory network is effective in extracting electroencephalography features. It is worth noting that, for the same network structure, the R model, which only performs the reconstruction task, yields classification accuracy close to chance level under both the subject-specific and cross-subject conditions. The main reason for this result may be that the duration of a single sample is short while the output dimension is large. Perhaps an RNN model, which is more sensitive to time series, would be more effective for the reconstruction task; this requires further research in the future.
Description of the system
We focus on achieving continuous electroencephalography signal classification within a short time window of 2 s under cross-subject conditions to detect acoustic attention. We propose a multi-task learning model based on a long short-term memory network, which combines the binary classification task for direction and the temporal amplitude envelope reconstruction task by sharing hidden layers to co-train the model parameters and improve the classification accuracy; a sketch of such a joint training objective follows this paragraph. To that end, we compared three types of models: the linear mTRF model, the long short-term memory network based D model, and the proposed R+D multi-task learning model. Comparing and analyzing our experimental results shows that the proposed model achieves better classification accuracy under both subject-specific and cross-subject conditions than the other two models within a short time of 2 s. Compared with the traditional mTRF linear model, we increased the classification accuracy from 55.66% to 82.38% under the cross-subject condition. Moreover, we obtained the contribution of all the channels by calculating the layer-wise relevance of the proposed model using the layer-wise relevance propagation (LRP) algorithm and found that the temporal lobe and part of the frontal lobe channels contributed greatly to this auditory attention detection (AAD) problem.
However, this study also has some shortcomings. First, the stimulus data of this public data set are temporal amplitude envelopes that were already extracted with the Hilbert transform rather than the original audio, so we could not make further improvements in the envelope-extraction process; a sketch of such extraction is given below. Second, in order to be closer to the actual application, we did not specifically remove blink, eye-movement, and other motor artifacts when preprocessing the electroencephalography signals. As a result, the LRP results may be affected by blinks and eye movements, so the channel contribution results may be less accurate.
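For reference, the following is a minimal sketch of Hilbert-transform envelope extraction of the kind mentioned above; the 8 Hz low-pass cut-off, the filter order, and the 64 Hz output rate are illustrative assumptions rather than parameters stated in the patent.

    # Hedged sketch of temporal amplitude envelope extraction via the Hilbert transform.
    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt, resample

    def amplitude_envelope(audio, fs, lowpass_hz=8.0, env_fs=64):
        envelope = np.abs(hilbert(audio))                      # analytic-signal magnitude
        b, a = butter(4, lowpass_hz / (fs / 2), btype="low")   # smooth the envelope
        envelope = filtfilt(b, a, envelope)
        n_out = int(len(envelope) * env_fs / fs)
        return resample(envelope, n_out)                       # downsample for EEG alignment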

Claims (6)

CLAIMS

We claim:
1. An R+D model wherein, in addition to training the mTRF model on the training and testing sets, a long short-term memory network based binary classification model for direction is also added
2. Electroencephalography spatial maps are regarded as a two-dimensional image, in which one dimension is the temporal context and the other is the electrode (spatial) dimension
3. The convolution kernel size of the first convolution layer is 3 x 3 with a total of 16 filters, and the kernel size of the second layer is 3 x 3 with a total of 32 filters
4. The classification task is followed by a fully connected layer with a dropout of 0.4 and then a softmax layer, which outputs a 1 x 2 matrix representing the direction
5. The classification accuracy of the D model is about 89.73%, and the
average accuracy of the proposed R+D model reaches 90.78%
6. A multi-task learning model based on a long short-term memory network, which combines the binary classification task for direction with the attended temporal amplitude envelope reconstruction task
Fig 1: proposed model R+D
Fig 2: Listening experiment
AU2021105454A 2021-08-13 2021-08-13 Acoustic attention decoding from electroencephalography based on long short-term memory network Ceased AU2021105454A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021105454A AU2021105454A4 (en) 2021-08-13 2021-08-13 Acoustic attention decoding from electroencephalography based on long short-term memory network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2021105454A AU2021105454A4 (en) 2021-08-13 2021-08-13 Acoustic attention decoding from electroencephalography based on long short-term memory network

Publications (1)

Publication Number Publication Date
AU2021105454A4 true AU2021105454A4 (en) 2021-11-25

Family

ID=78610520

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021105454A Ceased AU2021105454A4 (en) 2021-08-13 2021-08-13 Acoustic attention decoding from electroencephalography based on long short-term memory network

Country Status (1)

Country Link
AU (1) AU2021105454A4 (en)

Similar Documents

Publication Publication Date Title
US11961533B2 (en) Systems and methods for speech separation and neural decoding of attentional selection in multi-speaker environments
EP3469584B1 (en) Neural decoding of attentional selection in multi-speaker environments
Vandecappelle et al. EEG-based detection of the locus of auditory attention with convolutional neural networks
CN110353702A (en) A kind of emotion identification method and system based on shallow-layer convolutional neural networks
O'Sullivan et al. Look at me when I'm talking to you: Selective attention at a multisensory cocktail party can be decoded using stimulus reconstruction and alpha power modulations
Scharinger et al. You had me at “Hello”: Rapid extraction of dialect information from spoken words
Alain et al. Hearing two things at once: neurophysiological indices of speech segregation and identification
Das et al. Stimulus-aware spatial filtering for single-trial neural response and temporal response function estimation in high-density EEG with applications in auditory research
Haghighi et al. EEG-assisted modulation of sound sources in the auditory scene
Andermann et al. Neuromagnetic correlates of voice pitch, vowel type, and speaker size in auditory cortex
CN115153563A (en) Mandarin auditory attention decoding method and device based on EEG
Cai et al. EEG-based auditory attention detection via frequency and channel neural attention
Zhang et al. A learnable spatial mapping for decoding the directional focus of auditory attention using eeg
AU2021105454A4 (en) Acoustic attention decoding from electroencephalography based on long short-term memory network
Robson et al. Mismatch negativity (MMN) reveals inefficient auditory ventral stream function in chronic auditory comprehension impairments
Zakeri et al. Supervised binaural source separation using auditory attention detection in realistic scenarios
Pahuja et al. XAnet: cross-attention between EEG of left and right brain for auditory attention decoding
Zhang et al. EEG-Based Short-Time Auditory Attention Detection Using Multi-Task Deep Learning.
Simon et al. Optimal time lags for linear cortical auditory attention detection: differences between speech and music listening
Yang et al. Stimulus reconstruction based auditory attention detection using EEG in multi-speaker environments without access to clean sources
Jiang et al. Detecting the locus of auditory attention based on the spectro-spatial-temporal analysis of EEG
Han et al. Improved decoding of attentional selection in multi-talker environments with self-supervised learned speech representation
Oda et al. EEG data analysis for intellectual developmental disorder
Van Brenk et al. The relationship between acoustic indices of speech motor control variability and other measures of speech performance in dysarthria
Lai et al. Plastic multi-resolution auditory model based neural network for speech enhancement

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry