WO2021035067A1 - Measuring language proficiency from electroencephalography data - Google Patents

Measuring language proficiency from electroencephalography data

Info

Publication number
WO2021035067A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
eeg
data
linguistic
features
Prior art date
Application number
PCT/US2020/047232
Other languages
English (en)
Inventor
Nima Mesgarani
Giovanni Di LIBERTO
Jingping NIE
Original Assignee
The Trustees Of Columbia University In The City Of New York
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Trustees Of Columbia University In The City Of New York filed Critical The Trustees Of Columbia University In The City Of New York
Publication of WO2021035067A1

Links

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B 5/316 Modalities, i.e. specific diagnostic methods
    • A61B 5/369 Electroencephalography [EEG]
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/12 Audiometering
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B 5/7235 Details of waveform analysis
    • A61B 5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems

Definitions

  • the human brain can process a variety of acoustic and linguistic properties for speech comprehension.
  • Speech perception can be underpinned by a hierarchical neural network that processes different linguistic properties in distinct interconnected cortical layers. For example, a person can attend to one speaker in a multi-speaker and noisy environment while inhibiting and masking out the unattended speech (e.g., the so-called cocktail party phenomenon).
  • the language acquisition process is dependent on the pre-existing knowledge of a person. For example, learning a second language can be a challenging process that differs from native language acquisition.
  • An example device can include one or more processors adapted to implement a regression model.
  • the regression model can be configured to receive electroencephalogram (EEG) data of the subject, wherein the subject is exposed to a target sound, estimate a linguistic feature from the target sound, the EEG data, or a combination thereof, and decode the linguistic features from the EEG data.
  • the linguistic feature can be selected from a phonemic feature, a phonotactic feature, a semantic feature, and combinations thereof.
  • the phonemic feature can include a cohort size at each phoneme, a cohort reduction variable, or a combination thereof.
  • the phonotactic feature can include a phonotactic probability.
  • the semantic feature can include a semantic vector.
  • the linguistic feature can further include envelopes and/or an auditory spectrogram of the target sound.
  • the device can further include one or more sensor components for obtaining the EEG data of the subject.
  • the sensor components can be coupled to the one or more processors via a wired connection or a wireless connection.
  • the device can be configured to assess the accuracy of the decoding by predicting a brain response of the subject exposed to the target sound.
  • the processors can be configured to be trained by receiving training data.
  • the training data can include electroencephalogram (EEG) data, linguistic feature data, or a combination thereof.
  • the regression model can be configured to estimate a temporal response function (TRF) for the linguistic features.
  • the device can be configured to assess language proficiency and/or native status of the subject based on the TRF.
  • the device can further include one or more sound input components for collecting sounds and one or more sound output components.
  • the sound input and output components can be coupled to the processors via a wired connection or a wireless connection.
  • the processors can be configured to amplify the target sound and/or decrease non-target sounds through the sound output components to facilitate hearing of the subject based on the predicted responses.
  • An example method can include receiving EEG data of the subject, wherein the subject is exposed to a target sound, estimating a linguistic feature of the target sound using a regression model, and decoding the linguistic features from the EEG data.
  • the linguistic feature can be selected from a phonemic feature, a phonotactic feature, a semantic feature, and combinations thereof.
  • the phonemic feature can include a cohort size at each phoneme, a cohort reduction variable, or a combination thereof.
  • the phonotactic feature can include a phonotactic probability.
  • the semantic property can include a semantic vector.
  • the linguistic feature can be estimated from the target sound, the EEG data, or a combination thereof.
  • the method can include training the regression model by providing training data.
  • the training data includes electroencephalogram (EEG) data, linguistic feature data, or a combination thereof.
  • the method can include measuring EEG data from the subject exposed to the target sound. In non-limiting embodiments, the method can further include assessing accuracy of the decoding by predicting a brain response of the subject exposed to the target sound.
  • the method can include estimating a temporal response function (TRF) for linguistic features.
  • the method can further include assessing language proficiency and/or native status of the subject based on the TRF for the linguistic features.
  • FIG. 1 is a diagram showing an example device in accordance with the disclosed subject matter.
  • FIG. 2 is a block diagram illustrating an example method in accordance with the disclosed subject matter.
  • FIG. 3 is a block diagram illustrating an example method in accordance with the disclosed subject matter.
  • FIG. 4 is a block diagram illustrating an example method in accordance with the disclosed subject matter.
  • FIG. 5A is a block diagram illustrating one or more elements of the presently disclosed techniques in accordance with the disclosed subject matter.
  • FIG. 5B is a graph showing example speech descriptors extracted from the audio waveform in accordance with the disclosed subject matter.
  • FIGs. 6A-6E are exemplary stimulus-EEG temporal response function (TRF) assessments for acoustics and phonetic features in accordance with the disclosed subject matter.
  • FIGs. 7A-7C provide images showing TRFs for various linguistic features.
  • Fig. 7A provides graphs and images showing TRF analysis for phonotactic features.
  • Fig. 7B provides graphs and images showing TRF analysis for semantic features.
  • Fig. 7C provides graphs showing predicted language proficiency in accordance with the disclosed subject matter.
  • FIG. 8A provides an example speech hierarchy in accordance with the disclosed subject matter.
  • FIG. 8B provides example electrophysiological recordings in accordance with the disclosed subject matter.
  • FIG. 8C provides an example regression model for predicting the brain signal in accordance with the disclosed subject matter.
  • FIG. 8D provides an example decision model in accordance with the disclosed subject matter.
  • FIG. 9A provides an exemplary graph showing the average AAD accuracy in accordance with the disclosed subject matter.
  • FIG. 9B provides an exemplary graph showing the AAD performance in accordance with the disclosed subject matter.
  • FIG. 9C provides an exemplary graph showing the AAD improvements in accordance with the disclosed subject matter.
  • FIGs. 10A-10E provide exemplary topoplots of correlations between the estimated and actual brain response in accordance with the disclosed subject matter.
  • FIG. 11 provides exemplary topoplots of attended and unattended linear model’s coefficients at various time intervals in accordance with the disclosed subject matter.
  • FIG. 12A provides a map showing an example correlation between the disclosed features at various linguistic levels in accordance with the disclosed subject matter.
  • FIG. 12B provides a map showing the correlation between shuffled in time features at various linguistic levels.
  • the disclosed subject matter provides techniques for improving the intelligibility of a subject.
  • the disclosed subject matter provides methods and devices for assessing the language proficiency of the subject and improving the intelligibility of the subject to a target sound.
  • the disclosed subject matter can be used for reconstructing intelligible speech from the human brain.
  • the term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 3 or more than 3 standard deviations, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.
  • a "subject” herein can be a human or a non-human animal, for example, but not by limitation, rodents such as mice, rats, hamsters, and guinea pigs; rabbits; dogs; cats; sheep; pigs; goats; cattle; horses; and non-human primates such as apes and monkeys, etc.
  • the term “intelligibility” refers to a measure of how comprehensible speech is in given conditions or the proportion of a speaker's output that a listener can readily understand.
  • an example device 100 for improving intelligibility of a subject can include one or more processors 101.
  • the one or more processors 101 can include a regression model 102.
  • the regression model 102 can be a software and/or an instruction operable when executed by the one or more processors to operate the device 100.
  • the regression model 102 can be configured to cause the device to receive and analyze an electroencephalogram (EEG) data 103 and/or sound data 104.
  • the regression model 102 can estimate temporal response functions (TRFs) 105 that can map a stimulus to the subject’s brain.
  • Temporal response functions can describe the speech-EEG mapping within that latency-window for each EEG channel.
  • the TRF can be obtained by a regression model that estimates a filter to optimally predict the neural response from the stimulus features (e.g., a forward model).
  • the input of the regression can include time-shifted versions of the stimulus features, so that the time-lags in the latency-window of interest can be simultaneously considered.
  • the regression weights can reflect the relative importance between time-latencies to the stimulus-EEG mapping.
  • the disclosed subject matter can infer the temporal dynamics of the speech responses using the regression model.
  • the reliability of the TRF models can be assessed using leave-one-out cross-validation across trials, which quantifies the EEG prediction correlation (Pearson’s r) on unseen data while controlling for overfitting.
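To make the forward-model idea above concrete, here is a minimal, hypothetical sketch (not the patent's implementation) of TRF estimation via ridge regression over time-lagged stimulus features; the function names, toy data, and regularization value are illustrative assumptions.

```python
import numpy as np

def lag_matrix(stimulus, n_lags):
    """Stack time-shifted copies of a (time x features) stimulus into one design matrix."""
    n_t, n_f = stimulus.shape
    X = np.zeros((n_t, n_f * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * n_f:(lag + 1) * n_f] = stimulus[:n_t - lag]
    return X

def fit_forward_trf(stimulus, eeg, n_lags, ridge=1.0):
    """Forward model: ridge regression from lagged stimulus features to every EEG channel."""
    X = lag_matrix(stimulus, n_lags)
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ eeg)
    return w.reshape(n_lags, stimulus.shape[1], eeg.shape[1])  # lags x features x channels

# Toy data: 60 s of a 1-D envelope and 62-channel EEG at 64 Hz; 0-600 ms of lags.
fs = 64
stim = np.random.randn(fs * 60, 1)
eeg = np.random.randn(fs * 60, 62)
trf = fit_forward_trf(stim, eeg, n_lags=int(0.6 * fs) + 1)
```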
  • the regression model 102 can analyze the EEG data 103 and extract various features to identify the relationship (e.g., coupling or TRFs) between the EEG signals 103 and the sound source 104. For example, high-level and/or low-level linguistic features can be extracted from the EEG signals and sound source by comparing the EEG signals and the sound sources.
  • the high-level linguistic features can include a phonetic feature, a phonotactic feature, a semantic feature, and a combination thereof.
  • the low-level linguistic features can include an envelope and/or a spectrogram of the sound source.
  • the regression model can be a linear regression model or a non-linear regression model.
  • machine learning techniques, e.g., convolutional neural network (CNN) models, can also be used to implement the regression model.
  • the disclosed models can capture nonlinear relationships between the signals in addition to estimating prior probabilities, which can improve prediction accuracy.
  • the phonetic feature can include a cohort size at each phoneme, a cohort reduction variable, or a combination thereof.
  • the cohort of a speech can be defined as the set of words or lexical units that match the acoustic input of a word at any time point during the expression of the word.
  • the cohort of each phoneme can be defined by selecting all the lexical items in the dictionary that have a similar phoneme sequence, starting at the beginning of the lexical unit, to that of the phoneme sequence from the beginning of the word to the current phoneme.
  • the cohort size can be estimated for each phoneme by calculating the log of the number of words in the defined cohort set.
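For illustration only, the following toy sketch computes cohort sizes and the cohort reduction variable as described above, assuming a hypothetical mini-lexicon stored as a dict of phoneme tuples; names and data are invented for the example.

```python
import math

def cohort_sizes(word_phonemes, lexicon):
    """For each phoneme position, log of the number of lexicon words sharing the prefix."""
    sizes = []
    for k in range(1, len(word_phonemes) + 1):
        prefix = tuple(word_phonemes[:k])
        cohort = [w for w, ph in lexicon.items() if tuple(ph[:k]) == prefix]
        sizes.append(math.log(max(len(cohort), 1)))
    return sizes

def cohort_reduction(sizes, lexicon_size):
    """Cohort size at the current phoneme minus the cohort size at the previous phoneme;
    for the initial phoneme, the previous value is the log of the full lexicon size."""
    prev = [math.log(lexicon_size)] + sizes[:-1]
    return [s - p for s, p in zip(sizes, prev)]

# Toy example with a hypothetical mini-lexicon.
lexicon = {"bad": ("b", "ae", "d"), "bat": ("b", "ae", "t"), "dad": ("d", "ae", "d")}
sizes = cohort_sizes(("b", "ae", "d"), lexicon)   # cohort shrinks as phonemes arrive
print(sizes, cohort_reduction(sizes, len(lexicon)))
```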
  • the phonotactic feature can include phonotactic probabilities.
  • the regression model can be configured to estimate the likelihood that each phoneme sequence p1..k composing a word p1..n, where 1 ≤ k ≤ n, belongs to the sound source (e.g., language). Then, the regression model can estimate the phonotactic score (i.e., inverse phonotactic likelihood) corresponding to the negative logarithm of the likelihood of a sequence.
  • the phonotactic probabilities can be described by the cohort reduction variable.
  • the phonotactic feature can include prosody, phonological neighborhood, semantic neighborhood, or combinations thereof.
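The BLICK-style model itself is not reproduced here; as a hedged illustration of the idea of a phonotactic score (negative log-likelihood of a phoneme prefix), the sketch below substitutes a toy phoneme-bigram model. The corpus, function names, and probability floor are invented for the example.

```python
import math
from collections import Counter

def train_phoneme_bigrams(phoneme_corpus):
    """Estimate phoneme bigram probabilities from a corpus of phoneme sequences
    (a toy stand-in for a phonotactic model such as BLICK)."""
    bigrams, contexts = Counter(), Counter()
    for seq in phoneme_corpus:
        padded = ("#",) + tuple(seq)
        contexts.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    return {bg: bigrams[bg] / contexts[bg[0]] for bg in bigrams}

def phonotactic_score(prefix, probs, floor=1e-6):
    """Negative log-likelihood of a phoneme prefix; larger = more surprising."""
    padded = ("#",) + tuple(prefix)
    return -sum(math.log(probs.get(bg, floor)) for bg in zip(padded[:-1], padded[1:]))

corpus = [("b", "ae", "d"), ("b", "ae", "t"), ("d", "ae", "d")]
probs = train_phoneme_bigrams(corpus)
print([phonotactic_score(("b", "ae", "d")[:k], probs) for k in (1, 2, 3)])
```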
  • the semantic feature can include a semantic vector, which represents semantic dissimilarity.
  • Semantic dissimilarity can be quantified as the distance of a word from the preceding semantic context, resulting in a sparse vector that marks all content words, with larger values for more dissimilar words.
  • the semantic dissimilarity TRFs can be estimated.
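A minimal sketch of the semantic-dissimilarity computation described above (1 minus the Pearson correlation between a word's embedding and the average embedding of the preceding context words); the 400-dimensional random vectors are placeholders for real word embeddings.

```python
import numpy as np

def semantic_dissimilarity(word_vec, context_vecs):
    """1 minus Pearson's correlation between a word's embedding and the mean
    embedding of the preceding content words (the semantic context)."""
    context = np.mean(context_vecs, axis=0)
    r = np.corrcoef(word_vec, context)[0, 1]
    return 1.0 - r

rng = np.random.default_rng(0)
vecs = rng.standard_normal((5, 400))               # hypothetical 400-dim word embeddings
print(semantic_dissimilarity(vecs[4], vecs[:4]))   # dissimilarity of the 5th word to its context
```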
  • the regression model can extract low-level linguistic features from the EEG signals.
  • the low-level linguistic features can include an envelope and/or a spectrogram of the sound source.
  • TRFs between the amplitude envelope and the EEG signals can be estimated to determine the time dependencies.
  • the regression model can extract and analyze a spectrogram, which can provide additional information for the EEG signal.
  • the spectrograms can be estimated using a model of cochlear frequency analysis, and the envelope of the sound source can be estimated by averaging over the amplitude envelope of each frequency in the spectrogram for each time point.
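The patent relies on a cochlear model for the auditory spectrogram; the following simplified stand-in (log-spaced band-pass filters followed by Hilbert envelopes) only illustrates the general idea of a band-wise envelope representation and a broadband envelope obtained by averaging across bands. The band edges, filter order, and sampling rate are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def auditory_spectrogram(audio, fs, n_bands=16, fmin=100.0, fmax=6000.0):
    """Crude stand-in for a cochlear-model spectrogram: log-spaced band-pass filters
    followed by Hilbert envelopes (the patent's auditory model is more elaborate)."""
    edges = np.geomspace(fmin, fmax, n_bands + 1)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
        bands.append(np.abs(hilbert(sosfiltfilt(sos, audio))))
    S = np.stack(bands, axis=1)          # time x bands
    E = S.mean(axis=1)                   # broadband envelope: average across bands
    return S, E

fs = 16000
audio = np.random.randn(fs * 2)          # 2 s of synthetic audio for illustration
S, E = auditory_spectrogram(audio, fs)
```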
  • the disclosed device can include one or more sensor components. As shown in FIG. 1, the sensor component 106 and the processor 101 can be coupled via a wired connection and/or a wireless connection.
  • the sensor component 106 can include a plurality of electrodes 107 for measuring an electroencephalogram (EEG) data of the subject when the subject is exposed to at least one sound source (e.g., speech) 108.
  • the electrodes 107 responsive to the sound source can be identified by comparing the measured EEG responses 103 to the sound source with EEG signals recorded during silence.
  • the regression model can be trained for each linguistic feature.
  • the regression model can receive training data.
  • the training data can include EEG data of the subject exposed to various sound sources.
  • the sound sources can include natural speech stories spoken by several male and female speakers to account for the natural variability of speech.
  • the regression model can be trained through machine-learning techniques. For example, a deep learning technique can be used for training the regression model.
  • the device can be configured to assess the language proficiency and nativeness level of the subject.
  • the device can be configured to decode the language proficiency and the nativeness from the brain data (e.g., EEG data) and the extracted linguistic features.
  • the TRFs for the envelope, the spectrogram, the phonetic feature, the phonotactic feature, the semantic feature, or a combination thereof can be used for assessing the language proficiency and nativeness level of the subject.
  • the disclosed device can perform a multilinear principal component analysis (MPCA) on the TRF of each sound descriptor independently.
  • the language proficiency and nativeness level of the subject can be determined by comparing the value of the TRF with control data (e.g., TRFs of native speakers, non-native speakers, low-proficiency level speakers, high-proficiency level speakers, etc.).
  • the disclosed device can calculate Pearson’s correlations for determining the language proficiency and nativeness level of the subject.
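As one possible (hypothetical) realization of comparing a subject's TRF with control data, the sketch below correlates a flattened TRF with group-average templates and picks the best-matching group; the array shapes, group labels, and function name are placeholders rather than the patent's implementation.

```python
import numpy as np

def nativeness_similarity(subject_trf, group_trfs):
    """Correlate a subject's (flattened) TRF with the average TRF of each reference group;
    the best-matching group is taken as the predicted proficiency/nativeness level."""
    scores = {}
    for name, trfs in group_trfs.items():
        template = np.mean(trfs, axis=0).ravel()
        scores[name] = np.corrcoef(subject_trf.ravel(), template)[0, 1]
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(1)
groups = {g: rng.standard_normal((10, 39, 62)) for g in ("A", "B", "C", "L1")}  # subjects x lags x channels
label, scores = nativeness_similarity(rng.standard_normal((39, 62)), groups)
```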
  • the device can be configured to perform auditory attention decoding (AAD) to distinguish a target sound from non-target sounds.
  • the device can be used for distinguishing the attended speaker in a multi-speaker environment from the unattended speaker using the EEG data of a user.
  • the disclosed device can further adjust the volume of the target and the non-target sounds for facilitating hearing.
  • the device 100 can include one or more sound input components 109 for collecting sounds 108 and one or more sound output components 110.
  • the device 100 can collect sounds 108 through the sound input components 109 and amplify the target sound and/or decrease non-target sounds to facilitate hearing through the output components 110.
  • the one or more sound input 109 and output components 110 can be coupled to the one or more processors 101 via a wired connection or a wireless connection.
  • the disclosed device can be trained with various sound stimuli/sources and/or EEG data. For example, the disclosed device can analyze the stimuli and the brain responses to concatenate them using the training data. In non-limiting embodiments, the disclosed device can estimate TRFs for the linguistic features to concatenate the stimuli to the brain responses.
  • the linguistic features can include the envelope, the spectrogram, the phonetic feature, the phonotactic feature, the semantic feature, or a combination thereof.
  • the trained device can map the concatenated stimuli to the brain responses.
  • the trained device can be configured to predict a sound stimulus, a sound feature, or a combination thereof based on a measured EEG data of the subject.
  • instead of using encoding models to predict the neural responses from the stimulus features (e.g., forward modeling), the disclosed device can perform the opposite (e.g., backward modeling). In this approach, the disclosed device can decode the features from the neural signals using linear or nonlinear regression models.
  • the disclosed decoding approaches can have the advantage that the correlational structure between EEG recordings can be modeled, and the decoding approaches can result in improved decoding accuracy.
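A rough sketch of such a backward (decoding) model, assuming ridge regression from time-lagged EEG channels back to the stimulus envelope; this is illustrative, not the patent's exact formulation, and the lag range and regularization value are assumptions.

```python
import numpy as np

def fit_backward_decoder(eeg, envelope, n_lags, ridge=1.0):
    """Backward model: reconstruct the stimulus envelope from time-lagged EEG channels."""
    n_t, n_ch = eeg.shape
    X = np.zeros((n_t, n_ch * n_lags))
    for lag in range(n_lags):
        X[:n_t - lag, lag * n_ch:(lag + 1) * n_ch] = eeg[lag:]   # EEG lags follow the stimulus
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ envelope)
    return w, X @ w                                               # weights and reconstruction

fs = 64
eeg = np.random.randn(fs * 60, 62)
env = np.random.randn(fs * 60)
w, reconstruction = fit_backward_decoder(eeg, env, n_lags=int(0.6 * fs) + 1)
```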
  • an exemplary method 200 can include receiving electroencephalogram (EEG) data of the subject, wherein the subject is exposed to a target sound 201, estimating a linguistic feature of the target sound using a regression model 202, and decoding the linguistic features from the EEG data 203.
  • the linguistic feature can be selected from the group consisting of a phonemic feature, a phonotactic feature, a semantic feature, and combinations thereof.
  • the method can include training the regression model by providing training data 301, wherein the training data comprises electroencephalogram (EEG) data, linguistic feature data, or a combination thereof.
  • the training data can be obtained from the subject and/or others.
  • the training data can be used for machine learning (e.g., deep learning) to concatenate the sound inputs/stimuli and the brain responses using the training data.
  • the method can include measuring electroencephalogram (EEG) data from the subject 304 when the subject is exposed to the target sound.
  • EEG data can be obtained through one or more sensor components (e.g., electrodes).
  • the measured EEG data can be used for assessing the accuracy of the decoding 407 by predicting a brain response of the subject exposed to the target sound.
  • the method can further include estimating a temporal response function (TRF) for the linguistic features.
  • the regression model can be utilized for estimating the TRF for the envelope, the spectrogram, the phonetic feature, the phonotactic feature, the semantic feature, or a combination thereof.
  • the TRFs for the linguistic features can be used for assessing the language proficiency and nativeness level of the subject 305. The language proficiency and nativeness level of the subject can be determined by comparing the value of the TRF with control data (e.g., TRFs of native speakers, non-native speakers, low-proficiency level speakers, high-proficiency level speakers, etc.).
  • Pearson’s correlations can be performed to compare the TRF values of the subject to the control group.
  • the term “Pearson’s correlation” refers to a measure of the strength of the association between the two variables. It can be calculated by dividing the covariance of the two variables by the product of their standard deviations.
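For reference, a direct transcription of that definition as a small illustrative helper (not part of the patent):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation: covariance of x and y divided by the product of their standard deviations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.cov(x, y, bias=True)[0, 1] / (x.std() * y.std())

print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))
```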
  • the method can further include measuring additional linguistic features. For example, envelopes and/or an auditory spectrogram of the target sound can be estimated and used for improving the accuracy of the predicted responses.
  • the additional linguistic features can be used in combination with the phonemic feature, the phonotactic feature, and the semantic property to improve the accuracy.
  • the phonemic feature can include a cohort size at each phoneme, a cohort reduction variable, or a combination thereof.
  • the phonotactic feature can include a phonotactic probability.
  • the semantic property can include a semantic vector.
  • the method can further include assessing accuracy of the decoding 407 by predicting a brain response of the subject exposed to the target sound 406.
  • This technique can be used for performing auditory attention decoding (AAD) to distinguish a target sound from non-target sounds.
  • Examples are provided as merely illustrative of the disclosed methods and systems and should not be considered as a limitation in any way.
  • Learning a second language (L2) can be a challenging process that differs from native language (L1) acquisition.
  • Age at the time of learning, frequency of exposure to the language, and interest can be factors that can make a person more or less successful in mastering a second language.
  • Younger learners can generally achieve proficiency levels that are more native-like than older learners. Even in the presence of optimal conditions, adult learners rarely fully master a second language with native-like proficiency.
  • One typical difference can be the “foreign accent” that can characterize L2 speakers, who tend to carry L1 features to their L2 (e.g., phonetic).
  • Similar considerations can be made for speech listening, which can elicit different cortical patterns in L1 and L2 users and for different L2 proficiency levels.
  • the precise neural underpinnings of L2 listening and the differences with L1 processing remain poorly understood and somewhat controversial.
  • One issue that can shed light on L2 acquisition can be determining how increased proficiency shapes the cortical correlates of speech listening. Furthermore, the ability to objectively assess such a transformation can provide information regarding whether and how closely the brain dynamics for increasing L2 proficiency-levels converge to L1 processing.
  • the disclosed subject matter can utilize a framework, which includes a linear modeling approach that allows deriving objective measures of speech perception at multiple linguistic levels from a single electrophysiological recording during which participants listen to continuous natural speech, to investigate how proficiency shapes the hierarchical cortical encoding of L2 speech and how that differs from L1 speech encoding.
  • the disclosed subject matter can be used for analyzing speech processing at the levels of sound acoustics, phonemes, phonotactics, and semantics.
  • the disclosed subject matter was utilized to assess how proficiency modulates brain responses to all the investigated linguistic properties.
  • Phoneme encoding was expected to become more native-like with proficiency, but not to fully converge.
  • the processing of phonotactics can occur, at least in part, as a form of implicit learning. Therefore, the encoding of phonotactics was expected to gradually become more native-like with proficiency, a change that would occur across all proficiency levels, even with no speech comprehension.
  • the disclosed subject matter was also utilized for identifying whether L2 phonotactics is encoded separately from L1 or whether it remains influenced by L1 statistics.
  • the disclosed subject matter also includes EEG analyses on word-pair listening, which can indicate that the encoding of semantic dissimilarity can be molded by proficiency.
  • the disclosed subject matter was also utilized to combine cortical measures of acoustic and linguistic speech processing to assess the differences between L1 and L2 encoding during natural speech listening, showing effects of nativeness, proficiency, and a difference between L1 and L2 perception that goes beyond proficiency.
  • the subjects who got C2 and C1 scores were classified as Proficient users, the subjects who got B-level scores were defined as Independent users, those who got A1 and A2 were categorized as Basic users, and those who got A0 were characterized as No English.
  • Each subject reported no history of hearing impairment or neurological disorder, provided written informed consent, and was paid for their participation.
  • the experiment was carried out in a single session for each participant. EEG data were recorded from 62 electrode positions, digitized at 512 Hz using a BioSemi Active Two system. Audio stimuli were presented at a sampling rate of 44,100 Hz using Sennheiser HD650 headphones and Presentation software. Testing was carried out in a dark room, and subjects were instructed to maintain visual fixation on a crosshair centered on the screen and to minimize motor activities while the audio stimuli were presented.
  • EEG signals were digitally filtered between 1 and 15 Hz using a Butterworth zero-phase filter (low- and high-pass filters both with order 2 and implemented with the function), and down-sampled to 64 Hz.
  • EEG channels with a variance exceeding three times that of the surrounding ones were replaced by an estimate calculated using spherical spline interpolation. All channels were then re-referenced to the average of the two mastoid channels with the goal of maximizing the EEG responses to the auditory stimuli.
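A simplified, hypothetical preprocessing sketch along these lines (band-pass, downsample, mastoid re-reference). The bad-channel interpolation step is omitted, a single order-2 band-pass approximates the separate low- and high-pass filters described above, and the mastoid channel indices are placeholders.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, resample_poly

def preprocess_eeg(eeg, fs=512, fs_out=64, band=(1.0, 15.0), mastoid_idx=(0, 1)):
    """Zero-phase band-pass (order-2 Butterworth), downsample, and re-reference to the
    average of two mastoid channels; channel indices here are placeholders."""
    sos = butter(2, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, eeg, axis=0)
    down = resample_poly(filtered, fs_out, fs, axis=0)
    reference = down[:, list(mastoid_idx)].mean(axis=1, keepdims=True)
    return down - reference

eeg = np.random.randn(512 * 10, 64)      # 10 s of synthetic 64-channel EEG at 512 Hz
clean = preprocess_eeg(eeg)
```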
  • Fig. 5A shows an example design of the disclosed subject matter. Participants 501 listened to natural speech sentences 502 while 64-channel EEG signals 503 were recorded. Speech descriptors 504 were extracted from the audio waveform capturing various acoustic and linguistic properties. Multivariate linear regression models were fit between each speech descriptor and the preprocessed EEG signal to describe the temporal response function at various linguistic levels. As shown in Fig. 5B, speech descriptors were extracted from the audio waveform to capture the speech acoustics, phonemes, phonotactics, and semantic dissimilarity.
  • Stimuli and experimental procedure. EEG data were collected in a sound-proof, electrically shielded booth in dim light conditions. Participants listened to short stories narrated by two speakers (1 male) while minimizing motor movements and maintaining visual fixation on a crosshair at the center of the screen. The male and the female narrators were alternated to minimize speaker-specific electrical effects. Stimuli were presented at a sampling rate of 44,100 Hz, monophonically, and at a comfortable volume from loudspeakers in front of the participant. Each session consisted of 20 experimental blocks (3 min each), divided into five sections that were interleaved by short breaks. Participants were asked to attend to speech material from seven audio-stories that were presented in a random order.
  • the engagement with the speech material was assessed by means of behavioural tasks.
  • L2 participants were asked three questions at the end of each block. First, participants were asked whether the last sentence of the section was spoken by a male or female speaker. Next, participants were asked to identify 3-5 high-frequency words in the sentence from a list of eight words. Third, participants performed a phrase-repetition detection task. Specifically, the last two to four words were repeated immediately after the end of some of the sentences (1-5 per block). Given that the target was monitoring attention, a finger-tip clicker was used to count the repetitions so that participants would be engaged in a detection rather than a counting task, which would instead require additional memory resources and, potentially, reduce their engagement with the main listening task. Participants were asked to indicate, at the end of each block, how many sentences in the story presented these repetitions. To assess attention in L1 participants, three questions about the content of the story were asked after each block. All L1 participants were attentive and able to answer correctly at least 60% of the questions.
  • Speech features. The coupling between the EEG data and various properties of the speech stimuli was assessed. These properties were extracted from the stimulus data based on previous research.
  • a set of descriptors summarizing low-level acoustic properties of the speech stimuli was defined. Specifically, a time-frequency representation of the speech sounds was calculated using a model of the peripheral auditory system consisting of three stages: (1) a cochlear filterbank with 128 asymmetric filters equally spaced on a logarithmic axis, (2) a hair cell stage consisting of a low-pass filter and a nonlinear compression function, and (3) a lateral inhibitory network consisting of a first-order derivative along the spectral axis.
  • the envelope was estimated for each frequency band, resulting in a two-dimensional representation simulating the pattern of activity on the auditory nerve (the relevant Matlab code is available at https://isr.umd.edu/Labs/NSL/Software.htm).
  • This acoustic spectrogram (S) was then resampled to 16 bands.
  • a broadband envelope descriptor (E) was also obtained by averaging all envelopes across the frequency dimension.
  • the half-way rectified first derivative was used as an additional descriptor, which was shown to contribute to the speech-EEG mapping and was used here to regress out as much of the acoustic-related response as possible. Additional speech descriptors were defined to capture neural signatures of higher-order speech processing.
  • the speech material was segmented into time-aligned sequences of phonemes using the Penn Phonetics Lab Forced Aligner Toolkit, and the phoneme alignments were then manually corrected using Praat software.
  • Phoneme onset times were then encoded in an appropriate univariate descriptor (Pon), where ones indicate an onset and all other time samples were marked with zeros.
  • An additional descriptor was also defined to distinguish between vowels and consonants (Pvc). Specifically, this regressor consisted of two vectors, similar to Pon, but marking either vowels or consonants only. While this information was shown to be particularly relevant when describing the cortical responses to speech, there remains additional information on phoneme categories that contributes to those signals.
  • This information was encoded in a 19-dimensional descriptor indicating the phonetic articulatory features corresponding to each phoneme (Phn).
  • Phn The Phn descriptor encoded this categorical information as step functions, with steps corresponding to the starting and ending time points for each phoneme.
  • phonotactic probability information was encoded in an appropriate two-dimensional vector (Pt).
  • Probabilities were derived by means of the BLICK computational model, which estimates the probability of a phoneme sequence belonging to the English language. This model is based on a combination of explicit theoretical rules from traditional phonology and a maxent grammar, which finds optimal weights for such constraints to best match the phonotactic intuitions of native speakers.
  • the phonotactic probability was derived for all phoneme sub-sequences within a word (p1..k, 1 ≤ k ≤ n, where n is the word length) and used to modulate the magnitude of a phoneme onset vector (Pt1).
  • a second vector was produced to encode the change in phonotactic probability due to the addition of each phoneme.
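A toy sketch of how such a two-column phonotactic regressor (phoneme onsets weighted by the score and by its change) could be assembled; the onset times, scores, and function name are made up for illustration and are not taken from the patent.

```python
import numpy as np

def phonotactic_regressor(onset_times, scores, deltas, fs, duration):
    """Two-column regressor: phoneme onsets weighted by the phonotactic score of the
    phoneme prefix (column 0) and by the change in score from the previous prefix (column 1)."""
    n = int(round(duration * fs))
    Pt = np.zeros((n, 2))
    for t, s, d in zip(onset_times, scores, deltas):
        idx = int(round(t * fs))
        Pt[idx, 0] = s
        Pt[idx, 1] = d
    return Pt

Pt = phonotactic_regressor([0.10, 0.25, 0.41], [1.2, 2.0, 2.6], [1.2, 0.8, 0.6], fs=64, duration=1.0)
```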
  • the weights of this layer are the features used to describe each word in a 400-dimensional space capturing the co-occurrence of a content word with all others. In this space, words that share similar meanings have closer proximity.
  • the semantic dissimilarity indices are calculated by subtracting from 1 the Pearson’s correlation between a word’s feature vector and the average feature vector across all previous words in that particular sentence (the first word in a sentence was instead correlated with the average feature vector for all words in the previous sentence). Thus, if a word is not likely to co-occur with the other words in the sentence, it does not correlate with the context, resulting in a higher semantic dissimilarity value.
  • the semantic dissimilarity vector (Sem) marks the onset of content words with their semantic dissimilarity index.
  • Temporal response functions (TRF).
  • the regression weights reflect the relative importance between time-latencies to the stimulus-EEG mapping and were here studied to infer the temporal dynamics of the speech responses.
  • a time-lag window of 0-600 ms was used to fit the TRF models, which is thought to contain most of the EEG responses to speech of interest.
  • the reliability of the TRF models was assessed using a leave-one-out cross-validation procedure (across trials), which quantified the EEG prediction correlation (Pearson’s r) on unseen data while controlling for overfitting. Note that the correlation values are calculated with noisy EEG signal, therefore the r-scores can be highly significant even though they have low absolute values (r < 0.1 for sensor-space low-frequency EEG).
  • Stimulus descriptors at the levels of acoustics, phonemes, phonotactics, and semantics were combined in a single TRF model fit procedure. This strategy was adopted with the goal of discerning EEG responses at different processing stages. In fact, larger weights are assigned to regressors that are most relevant for predicting the EEG. For example, a TRF derived with Pt alone can reflect EEG responses to phonotactics and phoneme onset. A TRF based on the combination of Pt and Pon would instead discern their respective EEG contributions, namely by assigning larger weights to Pt for latencies that are most relevant to phonotactics.
  • TRF weights constitute good features to study the spatio-temporal relationship between a stimulus feature and the neural signal.
  • studying this relationship for a multivariate speech descriptor, such as Phn, requires the identification of criteria to combine multiple dimensions of TRF weights.
  • the relative enhancement in EEG prediction correlation was considered when Phn was included in the model, thus allowing the relative contribution of phonetic features to the neural signal to be discerned.
  • This isolated index of phoneme-level processing was also shown to correlate with psychometric measures of phonological skills. Additional analyses were conducted with a generic modelling approach. Specifically, one generic TRF model was derived for each of the groups A, B, C, and L1 by averaging the regression weights from all subjects within the group. Then, EEG data from each left-out subject (whose data was not included in the generic models) was predicted with the four models. The four prediction correlations were used as indicators of how similar the EEG signal from a subject was to the one expected for each of the four groups, providing a simple classifier.
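One hypothetical way to implement this generic-model classification step: predict a left-out subject's EEG with each group's averaged TRF weights and compare the per-group prediction correlations. The array shapes, group labels, and synthetic data are assumptions, not the patent's code.

```python
import numpy as np

def generic_model_scores(subject_stim_design, subject_eeg, group_weights):
    """Predict a left-out subject's EEG with each group's generic (averaged) TRF weights
    and return the EEG prediction correlation per group, averaged over channels."""
    scores = {}
    for name, W in group_weights.items():            # W: (lagged features x channels)
        pred = subject_stim_design @ W
        r = [np.corrcoef(pred[:, c], subject_eeg[:, c])[0, 1] for c in range(subject_eeg.shape[1])]
        scores[name] = float(np.mean(r))
    return scores

rng = np.random.default_rng(3)
design = rng.standard_normal((640, 16 * 39))          # lagged stimulus features
eeg = rng.standard_normal((640, 62))
groups = {g: rng.standard_normal((16 * 39, 62)) for g in ("A", "B", "C", "L1")}
print(generic_model_scores(design, eeg, groups))
```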
  • Proficiency-level decoding. Linear regression, gradient boosting regression, and classification were performed, with grid search for hyperparameter selection (number of trees, tree depth) and backward elimination. Nested-loop cross-validation was used, and feature selection was performed by means of factor analysis.
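A sketch of such a decoding pipeline using scikit-learn's gradient boosting with grid search inside a nested cross-validation; the feature matrix, proficiency labels, and parameter grid are synthetic placeholders, not the patent's data or settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict

# X: one row of TRF-derived features per non-native participant; y: proficiency level (synthetic here).
rng = np.random.default_rng(2)
X, y = rng.standard_normal((52, 8)), rng.integers(0, 4, 52).astype(float)

param_grid = {"n_estimators": [50, 100, 200], "max_depth": [2, 3, 4]}   # number of trees, tree depth
inner = GridSearchCV(GradientBoostingRegressor(random_state=0), param_grid, cv=KFold(5))
predicted = cross_val_predict(inner, X, y, cv=KFold(5))                 # nested cross-validation
```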
  • 62-channel EEG was recorded from 74 participants as they listened to continuous speech sentences in the English language. 52 of them were native Chinese speakers with English as a non-native language. English proficiency was assessed by means of a standardized test of receptive skills that assigned participants to seven different CEFR levels (Common European Framework of Reference for Languages): No English (A0 level), Basic user (A1 and A2 levels), Independent user (B1 and B2 levels), and Proficient user (C1 and C2 levels). The remaining 22 participants were native English speakers. To investigate low- versus higher-level brain processing of speech, linear regression models were used to measure the coupling between the low-frequency cortical signals (1-8 Hz) and progressively more abstract properties of the linguistic input. This procedure allowed, for the first time, the simultaneous assessment of L2 speech processing at several distinct linguistic levels within a single listening experiment based on ecologically-valid speech stimuli.
  • Envelope TRF. Forward TRF models were fit between E and the EEG signals (TRFE) for a broad time-latency window from 0 to 600 ms that can capture the time dependencies of interest. Leave-one-out cross-validation indicated that the resulting TRF models can reliably predict the EEG signal for all subjects (renv > 0, permutation test with p < 0.01, where renv indicates the average EEG prediction correlation across all electrodes when using TRFE).
  • Fig. 6A shows the model weights of TRFE after averaging across subjects for L1 and for each of the L2 proficiency groups A 601, B 602, and C 603 (averaged across all electrodes).
  • TRFs for all groups appear strongly temporally synchronized, which was expected for cortical responses to low-level acoustic properties. Furthermore, significant correlations between proficiency and the TRFE magnitude emerged for the negative components at speech-EEG latencies between 60 and 80 ms (p < 0.05, FDR-corrected Pearson’s correlation). TRFE for L1 604 participants was more strongly correlated with A than with B and C participants, showing significant differences with the TRFs for B and C. The topographical distribution of these TRFs did not show differences for distinct participant groups.
  • the low-level auditory responses were modeled by considering the acoustic spectrogram (S), which was shown to be a better predictor of the EEG signal.
  • while TRFE and TRFS were promising in that they showed specific speech-EEG latencies sensitive to the effects of proficiency and nativeness, the ability to assess the functional origins of those effects is hindered by the inherent strong correlation between the acoustic spectrogram and higher-order properties such as phonemes.
  • additional analyses were performed by investigating the TRFs in response to higher-order properties of speech.
  • Phoneme TRF. Phonetic feature information was represented by the categorical descriptor F, where the occurrence of a phoneme is marked by a rectangular pulse for each corresponding phonetic feature. Because of the correlation between phoneme and acoustic descriptors, TRFs were fit between their concatenation and the EEG signal. Here, the acoustic descriptor played the role of a nuisance regressor, meaning that it reduced the impact of acoustic-only responses on the TRF model for phonetic features (TRFF).
  • the acoustic descriptor A, which consists of the concatenation of the acoustic spectrogram with the half-way rectified first derivative of the envelope, was used for this purpose.
  • Figs. 6B and 6D show the resulting TRFs for phonetic features (nuisance regressor weights are not shown) and their linear projection to phoneme TRFs (TRFPh), respectively.
  • Phonemes in A-, B-, and C- multidimensional scaling (MDS) space were then projected to the L1-MDS space by means of a Procrustes analysis.
  • the effect of proficiency on individual phonemic contrasts can also be assessed by means of 2-dimensional plots summarising the position of each phoneme in the L1-MDS space.
  • Fig. 6C shows the distance between L1 and L2 phonemes for each language proficiency group, calculated at the electrode Cz in the L1-MDS space.
  • FIG. 6D shows topographies depicting the TRF weights for selected speech-EEG time-latencies after averaging the weights across all phonetic features. The weights reflect the relative importance between time-latencies to the stimulus-EEG mapping and allow the temporal dynamics of the speech responses to be inferred.
  • Fig. 6E shows this information, with different grey scales indicating phonemes for L1 and L2 participants, respectively.
  • Phonotactics TRF. In a given language, certain phoneme sequences are more likely to be valid speech tokens than others. This is due to language-specific regularities that can be captured, for example, by means of statistical models.
  • One such model, called BLICK, was used to estimate the likelihood that each phoneme sequence p1..k composing a word p1..n, where 1 ≤ k ≤ n, belongs to the language.
  • This model returns the phonotactic score (inverse phonotactic likelihood) corresponding to the negative logarithm of the likelihood of a sequence. Larger values correspond to less likely (more surprising) sequences. These values were used to mark phoneme onsets in a vector, where all other time points were assigned to zero.
  • TRF for L1 participants 701 was qualitatively similar to the ones described in the literature, with a significant negative component at time-latencies between about 300 and 500 ms.
  • TRF patterns for L2 participants (e.g., A 702, B 703, and C 704) are also shown.
  • the topographical patterns in Fig. 7A further clarify that the effect of proficiency begins to emerge at about 80 ms, while previous time latencies showed similar responses for LI and all L2 proficiency groups.
  • Semantic dissimilarity TRF. A similar analysis was conducted based on semantic dissimilarity rather than phonotactic scores. Specifically, a 300-dimensional feature space was defined according to the Word2Vec algorithm. Then, semantic dissimilarity was quantified as the distance of a word from the preceding semantic context, thus resulting in a sparse vector that marks all content words with larger values for more dissimilar words. Fig. 7B shows the semantic dissimilarity TRFs for five selected scalp channels. TRFs for L1 participants 705 were consistent with the results shown by Broderick and colleagues, with a centro-parietal negativity peaking at peri-stimulus latencies of 360-380 ms.
  • Decoding language proficiency and nativeness. The results indicate that both language proficiency and nativeness shape the cortical responses to speech at various processing stages.
  • information at distinct hierarchical levels was combined to decode the language proficiency and nativeness of the participants.
  • the first set of features for the decoding was extracted from the TRFs for E, S, F, Pt, and Sd. Because these information spaces span three dimensions, namely EEG channels, time latencies, and stimulus features (e.g., phonetic features), a multilinear principal component analysis (MPCA) was performed on the TRF of each speech descriptor independently, and the first component was retained.
  • similarities of a participant’s EEG signal with the ones of all other subjects were calculated by means of a generic modeling approach. Specifically, a generic TRF was calculated for the A, B, C, and L1 groups by averaging the TRFs of all subjects within each group. Then, Pearson’s correlations were calculated between the EEG signal of a left-out subject and the EEG predictions for each proficiency group. Finally, the last feature included in the decoding was the enhancement in EEG prediction correlation due to phonetic features (FA-A), which was previously suggested to isolate phoneme-level responses. A gradient boosting regression (based on decision trees) was run on the combination of the resulting features to decode the English proficiency of the 52 non-native speakers.
  • the attended and unattended speech can be represented in the electrophysiological recordings of a user at different strengths and latencies.
  • This representation of the multi-speaker speech provides the opportunity of decoding the attended speech from the electrophysiological signal of a user, an example of the applications of which is in designing hearing aids for helping patients with hearing disabilities.
  • Forward auditory attention decoding can be performed by using a regularized linear regression model to reconstruct the brain signal from the attended and unattended stimuli, comparing the reconstruction result with the original brain signal, and determining the attended speaker. This can be performed by extracting the envelope and acoustic features of the attended and unattended speech.
  • a speech can be broken down into and represented in multiple levels. Low-level and high-level linguistic features can be represented in the electrophysiological recordings. At the lowest level, a speech can be represented by its acoustic features such as its frequency content and envelope. Moving into higher levels, a speech can be categorized into distinct units of sounds called phonemes (e.g., /b/ and /d/ as in /bad/ versus /dad/). Phonotactic probabilities have to do with the chance of occurrence of a specific sequence of phonemes in a specific language.
  • Combinations of syllables form high-level linguistic components that are words, conveying semantic information (Fig. 8A).
  • the addition of high-level features to acoustic features has been proven to enhance the reconstruction of brain signals in a single speaker task. This raises the question of whether these high-level features can be used to identify the attention of a subject.
  • Various features, at different linguistic levels that are extracted from the same speech share some commonalities while having their own unique pattern.
  • the representation of these features in the electrophysiological recordings raises the question of whether a model based on the integration of all these features for AAD can achieve better performance compared to models that only use the envelope of the speech.
  • Stimuli and procedure. Subjects were placed in a sound-proof, electrically shielded booth, where their EEG data were collected. The stimuli, multiple short stories, were played to subjects monophonically with a loudspeaker placed in front of the participants. The loudspeaker volume was set to a comfortable, sufficiently loud, and constant level. Participants were instructed to listen to a multi-speaker speech stream containing multiple short stories told simultaneously by two speakers (one male and one female voice actor). The experiment consisted of 16 blocks. Between each block, a short break was given to the participants. During each block, participants were instructed to attend to only one speaker, as specified before the block. Subjects were asked to attend to the male speaker in the initial block and to switch their attention after each block subsequently.
  • EEG data were acquired using a g.HIamp bio-signal amplifier (Guger Technologies) with 62 active electrodes assembled on an elastic cap (10-20 enhanced montage) at a sampling rate of 2 kHz.
  • the ground and the reference were subsequently fixed by using a separate frontal electrode (AFz) and taking the average of two earlobe electrodes.
  • the earlobe electrodes were chosen as a reference because of the highly correlated activity across electrodes, which can make common average referencing ineffective.
  • the channel impedances were maintained below 20 kΩ at all times.
  • an online fourth-order high-pass Butterworth filter at 0.01 Hz was applied to the EEG data.
  • the cohort of a speech can be defined as the set of words or lexical units that match the acoustic input of a word at any time point during the expression of the word.
  • the cohort of each phoneme was defined by selecting all the lexical items in the dictionary that had a similar phoneme sequence, starting at the beginning of the lexical unit, to that of the phoneme sequence from the beginning of the word to the current phoneme. Then, for each phoneme, the cohort size was calculated by taking the log of the number of words in the cohort set.
  • the cohort reduction variable was defined to be equal to the cohort size at the current phoneme minus the cohort size at the previous phoneme or, in the case of the initial phoneme, minus the log of the number of words in the dictionary.
  • the AAD performance improves if the phonotactic probabilities are described by the cohort reduction variable.
  • each word was quantified by using its equivalent 25-dimensional global vectors for word representation (GloVe) from a pre-trained dictionary based on Twitter texts.
  • Temporal response functions (TRFs).
  • a GLM with regularization was employed, using the MATLAB mTRF toolbox.
  • the leave-one-out method was employed.
  • a GLM can map the concatenated stimuli (at different lags ranging from 0 to 650 ms) to the concatenated brain responses. Having the GLM parameters, the brain response for the target trial was predicted, and the prediction was compared with the actual brain response.
  • the topoplot of the GLM coefficients (averaged across all trials) was examined for attended and unattended speech during various time windows.
  • Figures 8B-8D show the method used to find the attended speaker from the linguistic features and the envelope.
  • a generalized linear model (Fig. 8C) was trained for each trial based on all the other trials to estimate and predict the EEG response of the brain based on the attended and unattended stimuli of that trial (each linguistic feature and addition of all linguistic features to the envelope).
  • the predicted brain responses were compared to the actual brain responses and the speaker with a higher correlation was selected as the attended speaker (Fig. 8D).
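A minimal sketch of that decision rule: correlate the recorded EEG with the EEG predicted from each speaker's features and pick the speaker yielding the higher correlation. The synthetic predictions and function name are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def decode_attention(eeg, pred_from_spk1, pred_from_spk2):
    """Correlate the recorded EEG with the EEG predicted from each speaker's features
    (averaged over channels); the speaker with the higher correlation is deemed attended."""
    def mean_r(pred):
        return np.mean([np.corrcoef(pred[:, c], eeg[:, c])[0, 1] for c in range(eeg.shape[1])])
    r1, r2 = mean_r(pred_from_spk1), mean_r(pred_from_spk2)
    return ("speaker 1" if r1 >= r2 else "speaker 2"), (r1, r2)

rng = np.random.default_rng(4)
eeg = rng.standard_normal((640, 62))
winner, (r1, r2) = decode_attention(eeg, eeg + rng.standard_normal(eeg.shape),
                                    rng.standard_normal(eeg.shape))
```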
  • the AAD was done using the envelope, phonemic, phonotactic, and semantic features, and the combination of all features, to examine whether models that use higher-level linguistic features for AAD are more advantageous.
  • Fig. 9A shows the results for AAD for each stimulus.
  • the accuracy of auditory attention decoding was evaluated by predicting the brain’s response at each time point with a forward linear regression decoder from the envelope of the speech (env), its phonetic vectors (phn), phonotactic vectors (phon), semantic features (sem), and their combination.
  • Fig. 9A also shows the accuracy of auditory attention decoding based on the integration of all linguistic features and envelope (env+phn+phon+sem).
  • Fig. 9B compares the accuracy of AAD when combining all features versus only using the envelope of speech. As can be seen from Fig. 9B, AAD accuracies improved in the majority of subjects when higher linguistic features were included in the regression model.
  • Fig. 9C compares the relative improvement in AAD to the AAD accuracy using the envelope. On average, the AAD performance was improved by 20.7 percent. According to Fig. 9, there is a strong negative correlation (-0.81) between the relative improvement and the envelope accuracy, suggesting that including higher-level linguistic features can be especially helpful in improving the performance of AAD for subjects that have poor AAD performance based on the envelope of the stimulus.
  • relative improvement = (multi-feature accuracy - envelope accuracy) / envelope accuracy (1)
  • Figs. 10A-10E compare the correlation between reconstructed EEG and actual EEG for both attended and unattended models for each EEG Channel (normalized for each subject and then averaged across all subjects).
  • the topoplots of the attended and unattended model weights, trained on the combination of all features for various time intervals can be found in Fig. 11.
  • the disclosed subject matter was used for showing that higher-level linguistic features such as phonemic, phonotactic, and semantic features in the AAD models can improve the performance of auditory attention decoding.
  • the disclosed subject matter provided the following: A) higher linguistic features on their own are capable of decoding the attention from the neural data, and their accuracies are above chance and comparable to that of using the envelope in AAD; B) a model was established based on the addition of all the low-level and high-level linguistic features and the envelope, and it achieved higher accuracy in the attention decoding task (on average a 5.6% improvement, from 58.12% when using the envelope stimulus to 63.75% when using the combination of linguistic stimuli), showing that adding higher-level linguistic features to the envelope of the speech can improve the AAD performance.
  • the disclosed subject matter shows that the primary auditory cortex (AC) encodes both the attended and the unattended speech in a multi-speaker task regardless of attention, while the nonprimary auditory cortex selectively encodes the attended speech and masks out the unattended one.
  • the higher areas in the brain mostly encode higher-level linguistic features, especially of the attended speech; including high-level linguistic cues therefore allows the AAD analysis to focus on these higher areas and the nonprimary AC.
  • Linear AAD models using high-level linguistic features are not capable of extracting the nonlinear encoding of these features in the neural data.
  • the linguistic features are inherently correlated, and this correlation prevents the linear models from achieving any further improvement in the performance.
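The inherent correlation between stimulus features noted above can be checked directly, for example by computing pairwise correlations between time-aligned feature streams; the feature names and the reduction of each feature to a single time series in this sketch are assumptions for illustration.

```python
import numpy as np

def pairwise_feature_correlation(feature_streams):
    """feature_streams: dict mapping a feature name to a 1-D time series sampled
    at the same rate; returns the names and their pairwise correlation matrix."""
    names = list(feature_streams)
    data = np.vstack([np.asarray(feature_streams[n]) for n in names])  # features x time
    return names, np.corrcoef(data)

# Example call with hypothetical streams (e.g., speech envelope vs. phoneme-onset rate):
# names, corr = pairwise_feature_correlation({"env": env, "phn_rate": phn_rate})
```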

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Surgery (AREA)
  • Veterinary Medicine (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Public Health (AREA)
  • Animal Behavior & Ethology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Psychiatry (AREA)
  • Psychology (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The invention relates to a system and methods for improving the intelligibility of a subject. The system can include one or more processors comprising a regression model. The regression model can be configured to receive electroencephalogram (EEG) data of the subject, estimate a linguistic feature from a target sound, the EEG data, or a combination thereof, and decode the linguistic features from the EEG data. The linguistic feature can be selected from the group consisting of a phonemic feature, a phonotactic feature, a semantic feature, and combinations thereof.
PCT/US2020/047232 2019-08-20 2020-08-20 Mesure de la compétence linguistique à partir de données d'électroencéphalographie WO2021035067A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962889478P 2019-08-20 2019-08-20
US62/889,478 2019-08-20

Publications (1)

Publication Number Publication Date
WO2021035067A1 true WO2021035067A1 (fr) 2021-02-25

Family

ID=74660350

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/047232 WO2021035067A1 (fr) 2019-08-20 2020-08-20 Mesure de la compétence linguistique à partir de données d'électroencéphalographie

Country Status (1)

Country Link
WO (1) WO2021035067A1 (fr)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060074664A1 (en) * 2000-01-10 2006-04-06 Lam Kwok L System and method for utterance verification of chinese long and short keywords
US20060074667A1 (en) * 2002-11-22 2006-04-06 Koninklijke Philips Electronics N.V. Speech recognition device and method
US20110159467A1 (en) * 2009-12-31 2011-06-30 Mark Peot Eeg-based acceleration of second language learning
US20140148724A1 (en) * 2011-08-03 2014-05-29 Widex A/S Hearing aid with self fitting capabilities
US20180092567A1 (en) * 2015-04-06 2018-04-05 National Institute Of Information And Communications Technology Method for estimating perceptual semantic content by analysis of brain activity
WO2017068414A2 (fr) * 2015-10-23 2017-04-27 Siemens Medical Solutions Usa, Inc. Génération de représentations en langage naturel d'un contenu mental à partir d'images cérébrales fonctionnelles

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WONG DANIEL D.E., FUGLSANG SØREN A., HJORTKJÆR JENS, CEOLINI ENEA, SLANEY MALCOLM, CHEVEIGNÉ ALAIN DE: "A Comparison of Temporal Response Function Estimation Methods for Auditory Attention Decoding", BIORXIV, 13 March 2018 (2018-03-13), XP055795286, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/281345v2.full.pdf> [retrieved on 20201019] *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113031766A (zh) * 2021-03-15 2021-06-25 哈尔滨工业大学 一种通过脑电解码汉语发音的方法
CN114328939A (zh) * 2022-03-17 2022-04-12 天津思睿信息技术有限公司 基于大数据的自然语言处理模型构建方法
CN114328939B (zh) * 2022-03-17 2022-05-27 天津思睿信息技术有限公司 基于大数据的自然语言处理模型构建方法
CN117475182A (zh) * 2023-09-13 2024-01-30 江南大学 基于多特征聚合的立体匹配方法
CN117475182B (zh) * 2023-09-13 2024-06-04 江南大学 基于多特征聚合的立体匹配方法

Similar Documents

Publication Publication Date Title
US11961533B2 (en) Systems and methods for speech separation and neural decoding of attentional selection in multi-speaker environments
Van Bezooijen Characteristics and recognizability of vocal expressions of emotion
US9149202B2 (en) Device, method, and program for adjustment of hearing aid
Di Liberto et al. Neural representation of linguistic feature hierarchy reflects second-language proficiency
US20220301563A1 (en) Method of Contextual Speech Decoding from the Brain
Gillis et al. Neural tracking as a diagnostic tool to assess the auditory pathway
WO2021035067A1 (fr) 2021-02-25 Mesure de la compétence linguistique à partir de données d'électroencéphalographie
Oganian et al. Vowel and formant representation in the human auditory speech cortex
Hunter Is the time course of lexical activation and competition in spoken word recognition affected by adult aging? An event-related potential (ERP) study
Bilibajkić et al. Automatic detection of stridence in speech using the auditory model
Ifukube Sound-based assistive technology
Ooster et al. Self-conducted speech audiometry using automatic speech recognition: Simulation results for listeners with hearing loss
Anumanchipalli et al. Intelligible speech synthesis from neural decoding of spoken sentences
Arias-Vergara Analysis of Pathological Speech Signals
Gaddy Voicing Silent Speech
Schuerman et al. Speaker statistical averageness modulates word recognition in adverse listening conditions
Rotman Rapid perceptual learning of time-compressed speech and the perception of natural fast speech in older adults with presbycusis
Zhang The cognitive mechanisms underlying the extrinsic perceptual normalization of vowels
Trevino Techniques for understanding hearing-impaired perception of consonant cues
Raghavan Decoding auditory attention from neural representations of glimpsed and masked speech
Di Dona et al. Early differentiation of memory retrieval processes for newly learned voices and phonemes as indexed by the MMN
Gillis Heard or understood? Neural markers of speech understanding
Venezia et al. The Role of Multisensory Temporal Covariation in Audiovisual Speech Recognition in Noise
Teoh Investigating the Neural Correlates of Speech Processing and Selective Auditory Attention using Electroencephalography (EEG)
O’Sullivan The impact of visual speech on neural processing of auditory speech

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20854410

Country of ref document: EP

Kind code of ref document: A1