WO2022251472A1 - Methods and devices for real-time word and speech decoding from neural activity
- Publication number
- WO2022251472A1 (application PCT/US2022/031101)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- attempted
- speech
- subject
- word
- electrical signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/24—Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
- A61B5/316—Modalities, i.e. specific diagnostic methods
- A61B5/369—Electroencephalography [EEG]
- A61B5/37—Intracranial electroencephalography [IC-EEG], e.g. electrocorticography [ECoG]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/015—Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/24—Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/24—Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
- A61B5/25—Bioelectric electrodes therefor
- A61B5/279—Bioelectric electrodes therefor specially adapted for particular uses
- A61B5/291—Bioelectric electrodes therefor specially adapted for particular uses for electroencephalography [EEG]
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/24—Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
- A61B5/25—Bioelectric electrodes therefor
- A61B5/279—Bioelectric electrodes therefor specially adapted for particular uses
- A61B5/291—Bioelectric electrodes therefor specially adapted for particular uses for electroencephalography [EEG]
- A61B5/293—Invasive
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/24—Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
- A61B5/316—Modalities, i.e. specific diagnostic methods
- A61B5/369—Electroencephalography [EEG]
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/24—Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
- A61B5/316—Modalities, i.e. specific diagnostic methods
- A61B5/369—Electroencephalography [EEG]
- A61B5/372—Analysis of electroencephalograms
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/40—Detecting, measuring or recording for evaluating the nervous system
- A61B5/4058—Detecting, measuring or recording for evaluating the nervous system for evaluating the central nervous system
- A61B5/4064—Evaluating the brain
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/48—Other medical applications
- A61B5/4803—Speech analysis specially adapted for diagnostic purposes
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/7264—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/7264—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- A61B5/7267—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7271—Specific aspects of physiological measurement analysis
- A61B5/7278—Artificial waveform generation or derivation, e.g. synthesizing signals from measured signals
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/74—Details of notification to user or communication with user or patient; User input means
- A61B5/7405—Details of notification to user or communication with user or patient; User input means using sound
- A61B5/741—Details of notification to user or communication with user or patient; User input means using sound using synthesised speech
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/74—Details of notification to user or communication with user or patient; User input means
- A61B5/742—Details of notification to user or communication with user or patient; User input means using visual displays
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
Definitions
- Methods, devices, and systems for assisting individuals with communication are provided.
- methods, devices, and systems are provided for decoding words and sentences directly from neural activity of an individual.
- cortical activity from a region of the brain involved in speech processing is recorded while an individual attempts to say or spell out words (even if the words or spelled letters are not vocalized).
- Deep learning computational models are used to detect and classify words from the recorded brain activity. Decoding of speech from brain activity is aided by use of a language model that predicts how likely certain sequences of words are to occur.
- a method of assisting a subject with communication comprising: positioning a neural recording device comprising an electrode at a location in a sensorimotor cortex region of the brain of the subject to record brain electrical signal data associated with attempted speech by the subject; positioning an interface in communication with a computing device at a location on the head of the subject, wherein the interface is connected to the neural recording device; recording the brain electrical signal data associated with attempted speech by the subject using the neural recording device, wherein the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to a processor; and decoding a word, a phrase, or a sentence from the recorded brain electrical signal data using the processor.
- the subject has difficulty with communication because of anarthria, a stroke, a traumatic brain injury, a brain tumor, or amyotrophic lateral sclerosis.
- the subject is paralyzed.
- the location of the neural recording device is in the ventral sensorimotor cortex.
- the electrode can be positioned on a surface of the sensorimotor cortex region or within the sensorimotor cortex region. In some embodiments, the electrode is positioned on a surface of the sensorimotor cortex region of the brain in a subdural space.
- the method comprises recording brain electrical signal data from a sensorimotor cortex region selected from a precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus region, or any combination thereof.
- the neural recording device comprises a brain-penetrating electrode array or an electrocorticography (ECoG) electrode array.
- the electrode is a depth electrode or a surface electrode.
- the features used by the processor are high-gamma frequency content features contained in the electrical signal data.
- the high-gamma frequency electrical signal data may comprise neural oscillations in a range from 70 Hz to 150 Hz.
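As an illustration of what a high-gamma feature in the 70 Hz to 150 Hz range could look like, the following minimal sketch estimates the mean spectral power of a signal inside that band with a naive discrete Fourier transform. The function name `band_power`, the 1000 Hz sampling rate, and the synthetic test tones are assumptions made for this example; the source does not state how the features are computed.

```python
import cmath
import math

def band_power(samples, fs, f_lo=70.0, f_hi=150.0):
    """Mean spectral power of `samples` between f_lo and f_hi (Hz).

    Uses a naive DFT for clarity. This is an illustrative stand-in,
    not the feature extraction described in the source.
    """
    n = len(samples)
    total = 0.0
    count = 0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n  # centre frequency of DFT bin k
        if f_lo <= freq <= f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(samples))
            total += abs(coeff) ** 2 / n
            count += 1
    return total / count if count else 0.0

fs = 1000  # hypothetical sampling rate in Hz
t = [i / fs for i in range(200)]
gamma_tone = [math.sin(2 * math.pi * 100 * x) for x in t]  # inside the band
slow_tone = [math.sin(2 * math.pi * 10 * x) for x in t]    # outside the band

print(band_power(gamma_tone, fs) > band_power(slow_tone, fs))
```

A practical pipeline would more likely band-pass filter each channel and take an amplitude envelope; the DFT version is used here only because it needs no external libraries.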
- the method further comprises mapping the brain of the subject to identify an optimal location for positioning the electrode for recording the brain electrical signals associated with the attempted speech by the subject.
- the interface comprises a percutaneous pedestal connector attached to the subject's cranium.
- the interface further comprises a removable headstage connected to the percutaneous pedestal connector.
- the processor is provided by a computer or a handheld device (e.g., a cell phone or tablet).
- the processor is programmed to automate speech detection, word classification, and sentence decoding using a machine learning algorithm based on identification of a neural activity pattern of electrical signals in the recorded brain electrical signal data associated with attempted word production by the subject.
- the machine learning algorithm uses artificial neural network (ANN) models for the speech detection and the word classification and natural language processing techniques such as, but not limited to, a hidden Markov model (HMM) or a Viterbi decoding model for the sentence decoding.
- the processor is programmed to automate detection of onset and offset of word production during the attempted speech by the subject.
- the method further comprises assigning speech event labels for preparation, speech, and rest to time points during the recording of the brain electrical signal data.
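The labeling of time points as preparation, speech, or rest, and the detection of word onset and offset, can be sketched as follows. The sketch assumes that a speech detection model has already produced per-time-point probabilities for the three labels; the probability values, the function names, and the `min_len` parameter are hypothetical.

```python
def label_time_points(prob_frames):
    """Pick the most probable event label for each time point."""
    return [max(frame, key=frame.get) for frame in prob_frames]

def find_speech_events(labels, min_len=2):
    """Return (onset, offset) index pairs for runs of 'speech'.

    Runs shorter than `min_len` time points are ignored.
    The offset index is exclusive.
    """
    events = []
    start = None
    # A trailing 'rest' sentinel closes a run that reaches the end.
    for i, label in enumerate(labels + ["rest"]):
        if label == "speech" and start is None:
            start = i
        elif label != "speech" and start is not None:
            if i - start >= min_len:
                events.append((start, i))
            start = None
    return events

# Hypothetical detector output for six consecutive time points.
frames = [
    {"rest": 0.8, "preparation": 0.1, "speech": 0.1},
    {"rest": 0.2, "preparation": 0.7, "speech": 0.1},
    {"rest": 0.1, "preparation": 0.2, "speech": 0.7},
    {"rest": 0.1, "preparation": 0.1, "speech": 0.8},
    {"rest": 0.1, "preparation": 0.1, "speech": 0.8},
    {"rest": 0.9, "preparation": 0.05, "speech": 0.05},
]
labels = label_time_points(frames)
events = find_speech_events(labels)
print(labels)
print(events)
```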
- the processor is programmed to use the recorded brain electrical signal data within a time window around the detected onset of word production for word classification.
- the subject is limited to a specified word set for the attempted speech.
- the processor is programmed to calculate a probability that a word of the word set is an intended word that the subject tried to produce during the attempted speech.
- the processor is programmed to calculate the probability that a word of the word set is an intended word that the subject tried to produce during the attempted speech for every word of the word set, and select the word of the word set having the highest probability of being the intended word that the subject tried to produce during the attempted speech.
- the word set comprises: am, are, bad, bring, clean, closer, comfortable, coming, computer, do, faith, family, feel, glasses, going, good, goodbye, have, hello, help, here, hope, how, hungry, I, is, it, like, music, my, need, no, not, nurse, okay, outside, please, right, success, tell, that, they, thirsty, tired, up, very, what, where, yes, and you.
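Selecting the intended word as the one with the highest probability over the word set can be sketched as below. The raw scores stand in for the output of a word classifier and are invented for the example, as are the function names; only a six-word subset of the word set is used to keep the example short.

```python
import math

# Subset of the word set listed above, for brevity.
WORD_SET = ["hello", "help", "hungry", "thirsty", "yes", "no"]

def softmax(scores):
    """Convert raw classifier scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_word(scores, words=WORD_SET):
    """Return the most probable word and the full probability table."""
    probs = softmax(scores)
    best = max(range(len(words)), key=lambda i: probs[i])
    return words[best], dict(zip(words, probs))

# Hypothetical classifier scores for one detected word event.
word, probs = select_word([0.2, 1.5, 0.1, 3.0, -1.0, 0.4])
print(word)
```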
- the subject may use the words of the word set without limitation to create sentences. In other embodiments, the subject is limited to a specified sentence set for the attempted speech.
- the processor is programmed to calculate a probability that a sequence of words is an intended sentence that the subject tried to produce during the attempted speech. In some embodiments, the processor is programmed to calculate the probability that a sentence of the sentence set is an intended sentence that the subject tried to produce during the attempted speech for every sentence of the sentence set. In some embodiments, the processor is programmed to calculate the probability of many possible sentences composed entirely of words from the specified word set as being the intended sentence that the subject tried to produce during the attempted speech.
- the processor is programmed to maintain the most likely sentence as well as other, less likely sentences composed entirely of words from the specified word set that the subject tried to produce during the attempted speech. In some embodiments, the processor is programmed to track the first, second, and third most likely sentence possibilities at any given point in time. When a new word event is processed, the most likely sentence may change. For example, the second most likely sentence based on processing of a word event could then become the most likely sentence after one or more additional word events are processed.
- In certain embodiments, the sentence set includes sentences that can be selected to communicate with a caregiver regarding tasks the subject wishes the caregiver to perform.
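Tracking the first, second, and third most likely sentences, and the way the ranking can change when a new word event arrives, can be illustrated with a small beam-style sketch. The toy language model, the classifier probabilities, and the function names are assumptions made for this example and are not taken from the source.

```python
import math

# Hypothetical toy language model: P(word | previous word).
# None marks the start of a sentence.
TOY_LM = {
    (None, "I"): 0.5, (None, "it"): 0.5,
    ("I", "am"): 0.9, ("I", "is"): 0.1,
    ("it", "am"): 0.1, ("it", "is"): 0.9,
}

def next_word_prob(prev, word):
    """Language model lookup with a small floor for unseen pairs."""
    return TOY_LM.get((prev, word), 1e-6)

def extend_hypotheses(hypotheses, word_probs, keep=3):
    """Extend every kept sentence with every candidate word.

    hypotheses: list of (word tuple, log probability).
    word_probs: classifier probabilities for the new word event.
    Returns the `keep` best extended sentences, best first.
    """
    candidates = []
    for words, logp in hypotheses:
        prev = words[-1] if words else None
        for word, p in word_probs.items():
            score = logp + math.log(p) + math.log(next_word_prob(prev, word))
            candidates.append((words + (word,), score))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:keep]

start = [((), 0.0)]
after_first = extend_hypotheses(start, {"I": 0.55, "it": 0.45})
after_second = extend_hypotheses(after_first, {"am": 0.2, "is": 0.8})
print([" ".join(words) for words, _ in after_first])
print([" ".join(words) for words, _ in after_second])
```

In this toy run the sentence starting with "it" is ranked second after the first word event and becomes the most likely sentence once the second word event is processed.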
- the sentences that can be composed entirely of words from the specified word set include sentences that can be used to communicate with a caregiver regarding the tasks the subject wishes the caregiver to perform.
- the sentence set comprises: Are you going outside; Are you tired; Bring my glasses here; Bring my glasses please; Do not feel bad; Do you feel comfortable; Faith is good; Hello how are you; Here is my computer; How do you feel; How do you like my music; I am going outside; I am not going; I am not hungry; I am not okay; I am okay; I am outside; I am thirsty; I do not feel comfortable; I feel very comfortable; I feel very hungry; I hope it is clean; I like my nurse; I need my glasses; I need you; It is comfortable; It is good; It is okay; It is right here; My computer is clean; My family is here; My family is outside; My family is very comfortable; My glasses are clean; My glasses are comfortable; My nurse is outside; My nurse is right outside; No; Please bring my glasses here; Please clean it; Please tell my family; That is very clean; They are coming here; They are coming outside; They are going outside; They have faith; What do you do; Where is it; Yes; and You are not right.
- the processor is programmed to use a language model that provides next-word probabilities given a previous word or phrase in a sequence of words to aid the decoding by determining predicted word sequence probabilities. For example, words that occur more frequently are assigned more weight than words that occur less frequently according to the language model.
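A language model that gives next-word probabilities from a previous word can be sketched as a bigram model with add-one smoothing. The four training sentences are taken from the sentence set listed in this document; the choice of a bigram model, the smoothing, and the function names are assumptions for illustration only, since the source does not specify the type of language model.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Build a smoothed bigram model from example sentences.

    Returns a function next_word_prob(prev, word) and the vocabulary.
    '<s>' marks the start of a sentence.
    """
    counts = defaultdict(Counter)
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1

    def next_word_prob(prev, word):
        # Add-one smoothing keeps unseen pairs above zero.
        following = counts[prev]
        return (following[word] + 1) / (sum(following.values()) + len(vocab))

    return next_word_prob, vocab

sentences = [
    "I am thirsty",
    "I am okay",
    "I need my glasses",
    "Bring my glasses please",
]
prob, vocab = train_bigram(sentences)
print(prob("i", "am"), prob("i", "need"), prob("i", "glasses"))
```

Words that follow "i" more often in the training sentences receive a higher next-word probability, which matches the weighting by frequency described above.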
- the processor is programmed to use a hidden Markov model (HMM) or a Viterbi decoding model to determine the most likely sequence of words in the intended speech of the subject given the brain electrical signal data associated with the attempted speech, the predicted word probabilities from the word classification using the machine learning algorithm, and the word sequence probabilities using the language model.
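The combination of word classification probabilities with language model probabilities in a Viterbi search can be sketched as follows. Each word event is treated as one step, the candidate words are the hidden states, the classifier probabilities act as emission scores, and a bigram table supplies the transitions. All probability values, the `floor` parameter, and the function name are hypothetical.

```python
import math

def viterbi(event_probs, bigram, start, floor=1e-6):
    """Most likely word sequence given per-event classifier probabilities.

    event_probs: list of dicts word -> classifier probability (all > 0).
    bigram: dict (previous word, word) -> language model probability.
    start: dict word -> probability of starting a sentence.
    """
    words = list(event_probs[0])
    # best[w] holds (log score, path) for the best path ending in w.
    best = {
        w: (math.log(start.get(w, floor)) + math.log(event_probs[0][w]), [w])
        for w in words
    }
    for probs in event_probs[1:]:
        new_best = {}
        for w in words:
            score, path = max(
                ((s + math.log(bigram.get((prev, w), floor)), p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[w] = (score + math.log(probs[w]), path + [w])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

# Hypothetical inputs: the classifier alone would pick "it" first.
event_probs = [
    {"I": 0.4, "it": 0.5, "am": 0.05, "thirsty": 0.05},
    {"I": 0.1, "it": 0.1, "am": 0.6, "thirsty": 0.2},
    {"I": 0.1, "it": 0.1, "am": 0.3, "thirsty": 0.5},
]
start = {"I": 0.6, "it": 0.4}
bigram = {("I", "am"): 0.9, ("am", "thirsty"): 0.8,
          ("it", "am"): 0.05, ("I", "thirsty"): 0.05}

decoded = viterbi(event_probs, bigram, start)
print(" ".join(decoded))
```

Here the language model overrides the classifier's slight preference for "it" in the first word event, because "I am thirsty" is a far more probable sequence overall.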
- the method further comprises: recording brain electrical signal data associated with an attempted non-speech motor movement of the subject, wherein the subject performs the attempted non-speech motor movement to indicate the initiation or termination of the attempted speech or to control an external device; and analyzing the brain electrical signal data using a non-speech motor movement classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with the attempted non-speech motor movement and calculates a probability that the subject attempted the non-speech motor movement.
- the attempted non-speech motor movement comprises an attempted head, arm, hand, foot, or leg movement.
- the processor is further programmed to automate detection of an attempted non-speech motor movement of the subject based on identification of a neural activity pattern of electrical signals in the recorded brain electrical signal data associated with the attempted non-speech motor movement. In some embodiments, the processor is further programmed to assign event labels for the attempted non-speech motor movement to time points during the recording of the brain electrical signal data.
- In certain embodiments, the method further comprises assessing accuracy of the decoding.
- a computer implemented method for decoding a sentence from recorded brain electrical signal data associated with attempted speech by a subject comprising: a) receiving the recorded brain electrical signal data from the subject; b) analyzing the recorded brain electrical signal data using a speech detection model to calculate the probability that attempted speech is occurring at any time point during recording of the brain electrical signal data and detect onset and offset of word production during the attempted speech by the subject; c) analyzing the brain electrical signal data using a word classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with attempted word production by the subject and calculates predicted word probabilities; d) performing sentence decoding by using the calculated word probabilities from the word classification model in combination with predicted word sequence probabilities in the sentence using a language model that provides next-word probabilities given a previous word or phrase in a sequence of words to calculate predicted word sequence probabilities and determining the most likely sequence of words in the sentence based on the predicted word probabilities determined using the word classification model and the language model; and e
- the processor is programmed to automate speech detection, word classification, and sentence decoding using a machine learning algorithm based on identification of a neural activity pattern of electrical signals in the recorded brain electrical signal data associated with attempted word production by the subject.
- the machine learning algorithm uses artificial neural network (ANN) models for the speech detection and the word classification and natural language processing techniques such as, but not limited to, a hidden Markov model (HMM) or a Viterbi decoding model for the sentence decoding.
- the subject is limited to a specified word set for the attempted speech.
- the processor is further programmed to calculate a probability that a word of the word set is an intended word that the subject tried to produce during the attempted speech for every word of the word set and select the word of the word set having the highest probability of being the intended word that the subject tried to produce during the attempted speech.
- the subject may use the words of the word set without limitation to create sentences.
- the subject is limited to a specified sentence set for the attempted speech.
- the processor is further programmed to calculate a probability that a sequence of words is an intended sentence that the subject tried to produce during the attempted speech.
- the processor is further programmed to calculate a probability that a sentence of the sentence set is an intended sentence that the subject tried to produce during the attempted speech.
- the computer implemented method further comprises assigning speech event labels for preparation, speech, and rest to time points during the recording of the brain electrical signal data.
- the computer implemented method further comprises analyzing the recorded brain electrical signal data within a time window around the detected onset of word production for word classification (e.g., from 1 second before the detected onset up to 3 seconds after the detected onset).
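Cutting out the analysis window around a detected onset, using the example bounds of 1 second before and 3 seconds after, can be sketched as below. The 200 Hz sampling rate, the stand-in recording, and the function name are assumptions for illustration.

```python
def extract_window(samples, onset_index, fs, pre_s=1.0, post_s=3.0):
    """Return samples from pre_s before to post_s after the onset.

    Returns None when the recording does not cover the full window.
    """
    start = onset_index - int(round(pre_s * fs))
    stop = onset_index + int(round(post_s * fs))
    if start < 0 or stop > len(samples):
        return None
    return samples[start:stop]

fs = 200                       # hypothetical feature sampling rate in Hz
recording = list(range(2000))  # stand-in for 10 s of one channel
window = extract_window(recording, onset_index=600, fs=fs)
print(len(window), window[0], window[-1])
```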
- the computer implemented method further comprises assigning more weight to words that occur more frequently than words that occur less frequently according to the language model.
- the computer implemented method further comprises: receiving recorded brain electrical signal data associated with an attempted non-speech motor movement of the subject, wherein the subject performs the attempted non-speech motor movement to indicate the initiation or termination of the attempted speech or to control an external device; and analyzing the brain electrical signal data using a non-speech motor movement classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with the attempted non-speech motor movement and calculates a probability that the subject attempted the non-speech motor movement.
- the attempted non-speech motor movement comprises an attempted head, arm, hand, foot, or leg movement.
- the computer implemented method further comprises assigning event labels for the attempted non-speech motor movement to time points during the recording of the brain electrical signal data.
- the computer implemented method further comprises storing a user profile for the subject comprising information regarding the patterns of electrical signals in the recorded brain electrical signal data associated with attempted word production by the subject.
- a non-transitory computer-readable medium comprising program instructions that, when executed by a processor in a computer, cause the processor to perform a computer implemented method described herein for decoding a sentence from recorded brain electrical signal data associated with attempted speech by a subject.
- a kit comprising the non-transitory computer-readable medium and instructions for decoding brain electrical signal data associated with attempted speech by a subject.
- a system for assisting a subject with communication comprising: a neural recording device comprising an electrode adapted for positioning at a location in a sensorimotor cortex region of the brain of the subject to record brain electrical signal data associated with attempted speech by the subject; a processor programmed to decode a sentence from the recorded brain electrical signal data according to a computer implemented method described herein; an interface in communication with a computing device adapted for positioning at a location on the head of the subject, wherein the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to the processor; and a display component for displaying the sentence decoded from the recorded brain electrical signal data.
- the subject has difficulty with communication because of anarthria, a stroke, a traumatic brain injury, a brain tumor, or amyotrophic lateral sclerosis.
- the location of the neural recording device is in the ventral sensorimotor cortex.
- the electrode is adapted for positioning on a surface of the sensorimotor cortex region or within the sensorimotor cortex region. In some embodiments, the electrode is adapted for positioning on a surface of the sensorimotor cortex region of the brain in a subdural space.
- the neural recording device comprises a brain-penetrating electrode array or an electrocorticography (ECoG) electrode array.
- the electrode is a depth electrode or a surface electrode.
- the electrical signal data comprises high-gamma frequency content features.
- the high-gamma frequency electrical signal data comprises neural oscillations in a range from 70 Hz to 150 Hz.
- the interface comprises a percutaneous pedestal connector attached to the subject's cranium.
- the interface further comprises a headstage that is connectable to the percutaneous pedestal connector.
- the processor is provided by a computer or handheld device (e.g., a cell phone or tablet).
- the processor is programmed to automate speech detection, word classification, and sentence decoding using a machine learning algorithm based on identification of a neural activity pattern of electrical signals in the recorded brain electrical signal data associated with attempted word production by the subject.
- the machine learning algorithm uses artificial neural network (ANN) models for the speech detection and the word classification and natural language processing techniques such as, but not limited to, a hidden Markov model (HMM) or a Viterbi decoding model for the sentence decoding.
- the processor is further programmed to assign speech event labels for preparation, speech, and rest to time points during the recording of the brain electrical signal data.
- the processor is further programmed to use the recorded brain electrical signal data within a time window around the detected onset of word production for word classification.
- the subject is limited to a specified word set for the attempted speech.
- the processor is further programmed to calculate a probability that a word of the word set is an intended word that the subject tried to produce during the attempted speech for every word of the word set, and select the word of the word set having the highest probability of being the intended word that the subject tried to produce during the attempted speech.
- the word set comprises: am, are, bad, bring, clean, closer, comfortable, coming, computer, do, faith, family, feel, glasses, going, good, goodbye, have, hello, help, here, hope, how, hungry, I, is, it, like, music, my, need, no, not, nurse, okay, outside, please, right, success, tell, that, they, thirsty, tired, up, very, what, where, yes, and you.
- the subject may use the words of the word set without limitation to create sentences. In other embodiments, the subject is limited to a specified sentence set for the attempted speech.
- the processor is further programmed to calculate a probability that a sequence of words is an intended sentence that the subject tried to produce during the attempted speech. In some embodiments, the processor is further programmed to calculate a probability that a sentence of the sentence set is an intended sentence that the subject tried to produce during the attempted speech. In some embodiments, the sentence set includes sentences that can be selected to communicate with a caregiver regarding tasks the subject wishes the caregiver to perform.
- the sentence set comprises: Are you going outside; Are you tired; Bring my glasses here; Bring my glasses please; Do not feel bad; Do you feel comfortable; Faith is good; Hello how are you; Here is my computer; How do you feel; How do you like my music; I am going outside; I am not going; I am not hungry; I am not okay; I am okay; I am outside; I am thirsty; I do not feel comfortable; I feel very comfortable; I feel very hungry; I hope it is clean; I like my nurse; I need my glasses; I need you; It is comfortable; It is good; It is okay; It is right here; My computer is clean; My family is here; My family is outside; My family is very comfortable; My glasses are clean; My glasses are comfortable; My nurse is outside; My nurse is right outside; No; Please bring my glasses here; Please clean it; Please tell my family; That is very clean; They are coming here; They are coming outside; They are going outside; They have faith; What do you do; Where is it; Yes; and You are not right.
- the processor is further programmed to automate detection of an attempted non-speech motor movement of the subject based on identification of a neural activity pattern of electrical signals in the recorded brain electrical signal data associated with the attempted non-speech motor movement. In some embodiments, the processor is further programmed to assign event labels for the attempted non-speech motor movement to time points during the recording of the brain electrical signal data.
- a kit comprising a system described herein for assisting a subject with communication and instructions for using the system for recording and decoding brain electrical signal data associated with attempted speech by a subject is provided.
- a method of assisting a subject with communication comprising: positioning a neural recording device comprising an electrode at a location in a sensorimotor cortex region of the brain of the subject to record brain electrical signal data associated with attempted spelling of letters of words of an intended sentence by the subject; positioning an interface in communication with a computing device at a location on the head of the subject, wherein the interface is connected to the neural recording device; recording the brain electrical signal data associated with said attempted spelling by the subject using the neural recording device, wherein the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to a processor of the computing device; and decoding the spelled words of the intended sentence from the recorded brain electrical signal data using the processor.
- the electrical signal data comprises high-gamma frequency content features (e.g., 70 Hz to 150 Hz) and low frequency content features (e.g., 0.3 Hz to 100 Hz).
- recording the brain electrical signal data comprises recording the brain electrical signal data from a sensorimotor cortex region selected from a precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus region, or any combination thereof.
- the method further comprising mapping the brain of the subject to identify an optimal location for positioning the electrode for recording the brain electrical signals associated with the attempted spelling of words by the subject.
- the processor is programmed to automate detection of brain activity associated with the attempted spelling, letter classification, word classification, and sentence decoding based on identification of a neural activity pattern of electrical signals in the recorded brain electrical signal data associated with the attempted spelling of words by the subject.
- the processor is programmed to use a machine learning algorithm for the speech detection, letter classification, word classification, and sentence decoding.
- the machine learning algorithm may use natural language processing techniques.
- the processor is further programmed to constrain word classification from sequences of letters decoded from neural activity associated with attempted spelling of words by the subject to only words within a vocabulary of a language used by the subject.
- the processor is programmed to automate detection of onset and offset of letter production during the attempted spelling by the subject.
- the processor is further programmed to assign speech event labels for preparation, speech, and rest to time points during the recording of the brain electrical signal data.
- the processor is programmed to use the recorded brain electrical signal data within a time window around the detected onset of attempted spelling of a letter by the subject.
- the method further comprises providing a series of go cues to the subject indicating when the subject should initiate attempted spelling of each letter of the words of the intended sentence. In some embodiments, the series of go cues are provided visually on a display.
- each go cue is preceded by a countdown to the presentation of the go cue, wherein the countdown for the next spelled letter is provided visually on the display and automatically started after each go cue.
- the series of go cues are provided with a set interval of time between each go cue.
- the subject can control the set interval of time between each go cue.
- the processor is programmed to use the recorded brain electrical signal data within a time window following the go cue.
- the processor is programmed to calculate a probability that a sequence of decoded words from a sequence of decoded letters is an intended sentence that the subject tried to produce during the attempted spelling of letters of words of an intended sentence by the subject.
- the processor is programmed to use a language model that provides next-word probabilities given a previous word or phrase in a sequence of words to aid the decoding by determining predicted word sequence probabilities. In some embodiments, words that occur more frequently are assigned more weight than words that occur less frequently according to the language model.
- the processor is further programmed to use a sequence of predicted letter probabilities to compute potential sentence candidates and automatically insert spaces into letter sequences between predicted words in the sentence candidates.
- the method further comprises: recording brain electrical signal data associated with an attempted non-speech motor movement of the subject, wherein the subject performs the attempted non-speech motor movement to indicate the initiation or termination of the attempted spelling of words of the intended sentence or to control an external device; and analyzing the brain electrical signal data using a classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with the attempted non-speech motor movement and calculates a probability that the subject attempted non-speech motor movement.
- the attempted non-speech motor movement comprises an attempted head, arm, hand, foot, or leg movement.
- the attempted hand movement comprises an imagined hand gesture or an imagined hand squeeze.
- the processor is programmed to automate detection of an attempted non-speech motor movement of the subject signaling the end of the attempted spelling by the subject based on identification of a neural activity pattern of electrical signals in the recorded brain electrical signal data associated with the attempted non-speech motor movement.
- the processor is further programmed to assign event labels for the attempted non-speech motor movement to time points during the recording of the brain electrical signal data.
- the method further comprises: recording brain electrical signal data associated with attempted speech by the subject using the neural recording device, wherein the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to the processor of the computing device; and decoding a word, a phrase, or a sentence from the recorded brain electrical signal data associated with attempted speech by the subject using the processor, as described herein.
- the method further comprises assessing accuracy of the decoding.
- a computer implemented method for decoding a sentence from recorded brain electrical signal data associated with attempted spelling of letters of words of an intended sentence by a subject, the computer performing steps comprising: a) receiving the recorded brain electrical signal data associated with the attempted spelling of letters of words of an intended sentence by the subject; b) analyzing the recorded brain electrical signal data using a speech detection model to calculate the probability that attempted spelling is occurring at any time point and detect onset and offset of letter production during the attempted spelling by the subject; c) analyzing the brain electrical signal data using a letter classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with attempted letter production by the subject and calculates a sequence of predicted letter probabilities; d) computing potential sentence candidates based on the sequence of predicted letter probabilities and automatically inserting spaces into the letter sequences between predicted words in the sentence candidates, wherein decoded words in the letter sequences are constrained to only words within a vocabulary of a language used by the subject; e) analyzing the potential sentence candidates using a language model
- the recorded brain electrical signal data is only used within a time window around the detected onset of attempted spelling of a letter by the subject.
- the method further comprises displaying a series of go cues to the subject indicating when the subject should initiate attempted spelling of each letter of the words of the intended sentence.
- each go cue is preceded by displaying a countdown to the presentation of the go cue, wherein the countdown for the next spelled letter is automatically started after each go cue.
- the series of go cues are provided with a set interval of time between each go cue.
- the subject can control the set interval of time between each go cue.
- the recorded brain electrical signal data within a time window following the go cue is used for letter classification.
- the computer implemented method further comprises receiving recorded brain electrical signal data associated with an attempted non-speech motor movement of the subject, wherein the subject performs the attempted non-speech motor movement to indicate the initiation or termination of the attempted spelling of words of the intended sentence or to control an external device; and analyzing the brain electrical signal data using a motor movement classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with the attempted non-speech motor movement and calculates a probability that the subject attempted the non-speech motor movement.
- the attempted non-speech motor movement comprises an attempted head, arm, hand, foot, or leg movement.
- the attempted hand movement comprises an imagined hand gesture or an imagined hand squeeze.
- a machine learning algorithm is used for speech detection and letter classification.
- the computer implemented method further comprises assigning more weight to words that occur more frequently than words that occur less frequently according to the language model.
- the computer implemented method further comprises storing a user profile for the subject comprising information regarding the patterns of electrical signals in the recorded brain electrical signal data associated with letter production during attempted spelling by the subject.
- the electrical signal data comprises high-gamma frequency content features (e.g., 70 Hz to 150 Hz) and low frequency content features (e.g., 0.3 Hz to 100 Hz).
- the computer implemented method further comprises assessing accuracy of the decoding.
- the computer implemented method further comprises decoding a sentence from recorded brain electrical signal data associated with attempted speech by the subject, the computer further performing steps comprising: a) receiving the recorded brain electrical signal data associated with the attempted speech by the subject; b) analyzing the recorded brain electrical signal data using a speech detection model to calculate the probability that attempted speech is occurring at any time point and detect onset and offset of word production during the attempted speech by the subject; c) analyzing the brain electrical signal data using a word classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with attempted word production by the subject and calculates predicted word probabilities; d) performing sentence decoding by using the calculated word probabilities from the word classification model in combination with predicted word sequence probabilities in the sentence using a language model that provides next-word probabilities given a previous word or phrase in a sequence of words to calculate predicted word sequence probabilities and determining the most likely sequence of words in the sentence based on
- a machine learning algorithm is used for speech detection, word classification, and sentence decoding.
- artificial neural network (ANN) models are used for the speech detection and the word classification and a hidden Markov model (HMM), a Viterbi decoding model, or other natural language processing techniques are used for the sentence decoding.
- a non-transitory computer-readable medium comprising program instructions that, when executed by a processor in a computer, cause the processor to perform a computer implemented method described herein.
- a kit comprising the non-transitory computer-readable medium and instructions for decoding brain electrical signal data associated with attempted spelling of letters of words of an intended sentence by a subject.
- a system for assisting a subject with communication comprising: a neural recording device comprising an electrode adapted for positioning at a location in a sensorimotor cortex region of the brain of the subject to record brain electrical signal data associated with attempted speech, attempted spelling of letters of words of an intended sentence, or attempted non-speech motor movement by the subject, or a combination thereof; a processor programmed to decode a sentence from the recorded brain electrical signal data according to a computer implemented method described herein; an interface in communication with a computing device, said interface adapted for positioning at a location on the head of the subject, wherein the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to the processor; and a display component for displaying the sentence decoded from the recorded brain electrical
- the electrode is adapted for positioning on a surface of the sensorimotor cortex region or within the sensorimotor cortex region.
- the electrode is adapted for positioning on a surface of the sensorimotor cortex region of the brain in a subdural space.
- the neural recording device comprises a brain-penetrating electrode array.
- the neural recording device comprises an electrocorticography (ECoG) electrode array.
- the electrode is a depth electrode or a surface electrode.
- the electrical signal data comprises high-gamma frequency content features (e.g., 70 Hz to 150 Hz) and low frequency content features (e.g., 0.3 Hz to 100 Hz).
- the interface comprises a percutaneous pedestal connector attached to the subject's cranium.
- the interface further comprises a headstage that is connectable to the percutaneous pedestal connector.
- the processor is provided by a computer or handheld device (e.g., a cell phone or tablet).
- kits comprising a system described herein and instructions for using the system for recording and decoding brain electrical signal data associated with attempted speech, attempted spelling of words, or attempted non-speech motor movement by a subject, or a combination thereof.
- the methods of assisting a subject with communication through decoding of neural activity associated with attempted speech, attempted spelling of words, or attempted non-speech motor movement can be combined.
- the techniques are complementary.
- decoding of attempted spelling may enable a larger vocabulary to be used than for decoding of attempted speech.
- decoding of attempted speech may be easier and more convenient for the subject, as it allows faster, direct word decoding, which may be preferred to express frequently used words.
- FIG.1 Schematic overview of the direct speech BCI.
- Neural activity acquired from an investigational electrocorticography (ECoG) electrode array implanted in a clinical trial participant with severe paralysis is used to directly decode words and sentences in real time.
- the participant is visually prompted with a question (A) and is instructed to attempt to respond using words from a predefined 50-word vocabulary.
- cortical signals are acquired from the surface of the brain via the ECoG device (B) and processed in real time (C).
- a speech detection model analyzes the processed neural signals sample-by-sample to detect the participant’s attempts to speak (D).
- a classifier computes word probabilities (across the 50 possible words) from each detected window of relevant neural activity (E).
- a Viterbi decoding algorithm uses these probabilities in conjunction with word sequence probabilities from a separately trained language model to decode the most likely sentence given the ECoG data (F).
- the predicted sentence, which is updated each time a word is decoded, is displayed as feedback to the participant (G).
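The combination of classifier word probabilities with language-model word sequence probabilities described for panels (E) and (F) can be sketched as follows. This is a minimal illustration, not the decoder used with the participant: it assumes a toy bigram language model, and the names `viterbi_decode`, `lm_start`, and `lm_next`, as well as the three-word vocabulary and all probabilities, are hypothetical.

```python
import math

FLOOR = 1e-12  # assumed floor that avoids log(0) for unseen words or word pairs


def viterbi_decode(word_probs, lm_start, lm_next, vocab):
    """Return the most likely word sequence for a list of detected speech events.

    word_probs: one dict per detected event, mapping word -> classifier probability.
    lm_start:   dict mapping word -> probability of starting a sentence (toy LM).
    lm_next:    dict mapping (previous word, word) -> bigram probability (toy LM).
    """
    # Log-score of the best path ending in each word after the first event.
    score = {
        w: math.log(lm_start.get(w, FLOOR)) + math.log(word_probs[0].get(w, FLOOR))
        for w in vocab
    }
    backpointers = []
    for probs in word_probs[1:]:
        new_score, pointer = {}, {}
        for w in vocab:
            # Best previous word for reaching w at this event.
            best_prev = max(
                vocab, key=lambda p: score[p] + math.log(lm_next.get((p, w), FLOOR))
            )
            new_score[w] = (
                score[best_prev]
                + math.log(lm_next.get((best_prev, w), FLOOR))
                + math.log(probs.get(w, FLOOR))
            )
            pointer[w] = best_prev
        backpointers.append(pointer)
        score = new_score
    # Trace the best path backwards from the best final word.
    path = [max(vocab, key=lambda w: score[w])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


vocab = ["i", "am", "good"]
lm_start = {"i": 0.8, "am": 0.1, "good": 0.1}
lm_next = {("i", "am"): 0.9, ("am", "good"): 0.9}
word_probs = [
    {"i": 0.5, "am": 0.3, "good": 0.2},
    {"i": 0.45, "am": 0.4, "good": 0.15},  # classifier alone would pick "i" here
    {"i": 0.1, "am": 0.3, "good": 0.6},
]
decoded = viterbi_decode(word_probs, lm_start, lm_next, vocab)
```

In this toy case the classifier alone would output "i i good", while the language model steers the second word to "am". Recomputing the path after each new detected event would mirror the way the displayed sentence is updated each time a word is decoded.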
- FIGS. 2A-2E Neural signal processing and language modeling enable decoding of a variety of sentences in real time.
- FIG.2A shows word error rates of the word sequences decoded from the participant’s cortical activity during sentence task blocks.
- FIG.2B shows decoded words per minute values across all trials when either including or excluding words that were incorrectly decoded.
- Each violin distribution was created using kernel density estimation with Scott bandwidth estimation, accompanied by a thick horizontal line depicting the median and smaller horizontal lines depicting the range (excluding outliers that were more than 4 standard deviations below or above the mean).
- FIG.2C shows a summary of the differences between the number of detected and actual words in each trial, with the percent of trials with correct sentence lengths shown in black and incorrect sentence lengths shown in dark red.
- FIG. 2D shows the edit distances (the number of decoding errors made) for the decoded sentences with and without the LM across all trials and all 50 sentence targets, sorted by ascending edit distance for the predictions with the LM (lower edit distance indicates better performance).
- Each small vertical dash represents the edit distance for a single trial (there are 3 trials per target sentence; marks for identical edit distances are staggered horizontally for visualization purposes).
- Each dot represents the mean edit distance for that target sentence.
- the histogram on the bottom shows the edit distance counts across all of the trials.
- FIG.2E shows the target sentence and the decoded sentence with and without use of the LM for seven different trials. Correctly decoded words are shown in black and incorrect words are shown in red.
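The edit distance and word error rate reported in FIGS. 2A and 2D can be illustrated with a short sketch. It assumes the conventional Levenshtein definition (insertions, deletions, and substitutions each count as one error) and the usual normalization of word error rate by the number of target words; the function names are hypothetical and the sentences are made-up examples.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(target, decoded):
    """Word-level edit distance divided by the number of target words."""
    ref = target.lower().split()
    hyp = decoded.lower().split()
    return edit_distance(ref, hyp) / len(ref)


example_wer = word_error_rate("i am very good", "i am good")
```

Passing strings instead of word lists to `edit_distance` gives a character-level distance, which is the basis of a character error rate.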
- FIGS. 3A-3C Distinct neural activity patterns underlie word production attempts.
- FIG.3A shows the effect of the amount of training data on word classification accuracy using cortical activity recorded during the participant’s isolated word production attempts. Each point depicts mean ± standard deviation across 10 cross-validation folds. Chance accuracy is depicted as a horizontal dashed line.
- FIG.3B shows the participant’s brain reconstruction overlaid with the locations of the implanted electrodes and their contributions to the speech detection and word classification models.
- FIG.3C shows word confusions from the classification results, depicting how often the classifier predicted each of the 50 words given the identity of the target word that the participant was attempting to say (values along the diagonal correspond to correct classifications).
- FIGS. 4A-4B Neural activity recorded during attempted speech exhibits long-term stability.
- FIG.4A shows neural activity from a single electrode across all of the participant’s attempts to say the word “Goodbye” during the isolated word task, spanning over 18 months of recording.
- FIG.4B shows word classification outcomes from training and testing the detector and classifier on subsets of isolated word data sampled from four non-overlapping date ranges. Each subset contains data from 20 attempted productions of each word. Each solid bar depicts results from cross-validated evaluation within a single subset, and each dotted bar depicts results from training on data from all of the subsets except for the one that is being evaluated. Each bar depicts mean ± standard error across 10 evaluation folds. Chance accuracy is depicted as a horizontal dashed line.
- Electrode contributions computed during cross-validated evaluation within a single subset are shown on top (oriented with the most dorsal and posterior electrode in the upper-right corner). Plotted electrode size (area) and opacity are scaled by relative contribution. Each set of contributions is normalized to sum to 1.
- FIGS. 5A-5B MRI results for the participant.
- FIG.5A shows a sagittal MRI for the participant, who has encephalomalacia and brainstem atrophy (labeled in blue) caused by pontine stroke (labeled in red).
- FIG.5B shows two additional MRI scans that indicate the absence of cerebral atrophy, suggesting that cortical neuron populations (including those recorded from in this study) should be relatively unaffected by the participant’s pathology.
- FIG.6 Real-time neural data acquisition hardware infrastructure. Electrocorticography (ECoG) data acquired from the implanted array and percutaneous pedestal connector are processed and transmitted to the Neuroport digital signal processor (DSP). Simultaneously, microphone data are acquired, amplified, and transmitted to the DSP.
- Signals from the DSP are transmitted to the real-time computer.
- the real-time computer controls the task displayed to the participant, including any decoded sentences that are provided in real time as feedback. Speaker output from the real-time computer is also sent to the DSP and synchronized with the neural signals (not depicted).
- a human patient cable connected to the pedestal acquired the ECoG signals, which were then processed by a front-end amplifier before being transmitted to the DSP (the human patient cable and front-end amplifier are not shown here, but they replaced the digital headstage and digital hub in this pipeline when they were used).
- FIG.7 Real-time neural signal processing pipeline.
- using the data acquisition headstage and rig, the participant’s electrocorticography (ECoG) signals were acquired at 30 kHz, filtered with a wide-band filter, conditioned with a software-based line noise cancellation technique, low-pass filtered at 500 Hz, and streamed to the real-time computer at 1 kHz.
- custom software was used to perform common average referencing, multi-band high gamma band-pass filtering, analytic amplitude estimation, multi-band averaging, and running z-scoring on the ECoG signals.
- the resulting signals were then used as the measure of high gamma activity for the remaining analyses.
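Two of the processing steps named above, common average referencing and running z-scoring, can be sketched in a few lines. This is an illustration under stated assumptions rather than the custom software itself: the band-pass filtering, analytic amplitude estimation, and multi-band averaging steps are omitted, the running statistics are assumed to accumulate over all samples seen so far (the text does not specify a window length), and the names `common_average_reference` and `RunningZScore` are hypothetical.

```python
import math


def common_average_reference(sample):
    """Subtract the across-channel mean from one multichannel sample."""
    mean = sum(sample) / len(sample)
    return [x - mean for x in sample]


class RunningZScore:
    """Z-score each new value against all values seen so far (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        if self.n < 2:
            return 0.0  # not enough data for a standard deviation yet
        std = math.sqrt(self.m2 / (self.n - 1))
        return (x - self.mean) / std if std > 0 else 0.0


referenced = common_average_reference([1.0, 2.0, 3.0])
zscorer = RunningZScore()  # one instance per electrode channel
z_first = zscorer.update(1.0)
z_second = zscorer.update(3.0)
```

An online update of this kind suits sample-by-sample streaming because it needs no buffer of past samples; a sliding-window variant would be an equally plausible reading of "running z-scoring".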
- FIG.8 Data collection timeline.
- FIG.9 Speech detection model schematic.
- the z-scored high gamma activity across all electrodes is processed time point by time point by an artificial neural network consisting of a stack of three long short-term memory layers (LSTMs) and a single dense (fully connected) layer.
- the dense layer projects the latent dimensions of the last LSTM layer into probability space for three event classes: speech, preparation, and rest.
- the predicted speech event probability time series is smoothed and then thresholded with probability and time thresholds to yield onset (t * ) and offset times of detected speech events.
- each time a speech event was detected, the window of neural activity spanning from −1 to 3 seconds relative to the detected onset (t * ) was passed to the word classifier.
- the neural activity, predicted speech probability time series (upper right), and detected speech event (lower right) shown are the actual neural data and detection results across a 7-second time window for an isolated word trial in which the participant attempted to produce the word “family”.
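The smoothing and thresholding of the speech probability time series described for FIG.9 can be sketched as follows. The moving-average smoother, the threshold values, and the function names are assumptions chosen for illustration; the text only states that the series is smoothed and then thresholded with probability and time thresholds.

```python
def detect_speech_events(speech_probs, prob_threshold=0.5, min_samples=3,
                         smooth_width=3):
    """Return (onset, offset) sample indices of detected speech events.

    The probabilities are smoothed with a causal moving average, then a run of
    samples at or above prob_threshold counts as an event only if it lasts at
    least min_samples samples (the time threshold).
    """
    smoothed = []
    for i in range(len(speech_probs)):
        window = speech_probs[max(0, i - smooth_width + 1): i + 1]
        smoothed.append(sum(window) / len(window))

    events, start = [], None
    for i, p in enumerate(smoothed):
        if p >= prob_threshold and start is None:
            start = i  # candidate onset
        elif p < prob_threshold and start is not None:
            if i - start >= min_samples:
                events.append((start, i))  # offset is the first sample below threshold
            start = None
    if start is not None and len(smoothed) - start >= min_samples:
        events.append((start, len(smoothed)))
    return events


def classifier_window(onset_s, before_s=1.0, after_s=3.0):
    """Time window (in seconds) passed to the word classifier around an onset."""
    return (onset_s - before_s, onset_s + after_s)


probs = [0.0, 0.0, 0.9, 0.9, 0.9, 0.9, 0.0, 0.0, 0.0, 0.9, 0.0, 0.0]
events = detect_speech_events(probs)
```

In this example the isolated spike at index 9 is rejected because it never reaches the probability threshold after smoothing, which is the purpose of combining the two thresholds.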
- FIG.10 Word classification model schematic.
- a 4-second time window of high gamma activity is processed by an ensemble of 10 artificial neural network (ANN) models.
- the high gamma activity is processed by a temporal convolution followed by two bidirectional gated recurrent unit (GRU) layers.
- a dense layer projects the latent dimension from the final GRU layer into probability space, which contains the probability of each of the words from the 50-word set being the target word during the speech production attempt associated with the neural time window.
- the 10 probability distributions from the ensembled ANN models are averaged together to obtain the final vector of predicted word probabilities.
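The ensemble averaging step amounts to an element-wise mean over the models' probability vectors. A minimal sketch, using two hypothetical models and two classes instead of 10 models and 50 words:

```python
def average_ensemble(distributions):
    """Element-wise mean of per-model probability vectors."""
    n_models = len(distributions)
    n_classes = len(distributions[0])
    return [sum(d[k] for d in distributions) / n_models for k in range(n_classes)]


# Hypothetical outputs from two models over a two-word vocabulary.
final_probs = average_ensemble([[0.2, 0.8], [0.4, 0.6]])
```

Because each input vector sums to 1, the average also sums to 1 and can be used directly as the final vector of predicted word probabilities.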
- FIG.11 Sentence decoding hidden Markov model.
- This hidden Markov model describes the relationship between the words that the participant attempts to produce (the hidden states q_i) and the associated detected time windows of neural activity (the observed states y_i).
- q 0 ) can be simplified to p ( ⁇ i
- FIGS. 12A-12C Auxiliary modeling results with isolated word data.
- FIG.12A shows the effect of the amount of training data on word classification accuracy (left) and cross-entropy loss (right) using cortical activity recorded during the participant’s isolated word production attempts.
- Lower cross entropy indicates better performance.
- Each point depicts mean ± standard deviation across 10 cross-validation folds (the error bars in the cross-entropy plot were typically too small to be seen alongside the circular markers).
- Chance performance is depicted as a horizontal dashed line in each plot (chance cross-entropy loss is computed as the negative log (base 2) of the reciprocal of the number of word targets). Performance improved more rapidly for the first four hours of training data and then less rapidly for the next 5 hours, although it did not plateau.
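As a worked example of the chance levels described above, the following sketch evaluates the stated formula for the 50-word vocabulary. The variable names are illustrative only.

```python
import math

n_words = 50  # size of the word vocabulary

# A classifier guessing uniformly assigns probability 1/50 to every word.
chance_accuracy = 1 / n_words

# Negative log (base 2) of the reciprocal of the number of word targets.
chance_cross_entropy_bits = -math.log2(1 / n_words)
```

This gives a chance accuracy of 2% and a chance cross-entropy of about 5.64 bits.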
- FIG.12B shows the effect of the amount of training data on the frequency of detection errors during speech detection and detected event curation with the isolated word data. Lower error rates indicate better performance. False positives are detected events that were not associated with a word production attempt and false negatives are word production attempts that were not associated with a detected event. Each point depicts mean ± standard deviation across 10 cross-validation folds. Not all of the available training data was used to fit each speech detection model, but each model always used between 47 and 83 minutes of data (not depicted).
- FIG.12C shows the distribution of onsets detected from neural activity across 9000 isolated word trials relative to the go cue (100 ms histogram bin size).
- FIG.13 Acoustic contamination investigation. Each blue curve depicts the average correlations between the spectrograms from a single electrode and the corresponding spectrograms from the time-aligned microphone signal as a function of frequency.
- the red curve depicts the average power spectral density (PSD) of the microphone signal.
- Vertical dashed lines mark the 60 Hz line noise frequency and its harmonics.
- Highlighted in green is the high gamma frequency band (70-150 Hz), which was the frequency band from which we extracted the neural features used during decoding.
- correlations between the electrode and microphone signals are small. There is a slight increase in correlation in the lower end of the high gamma frequency range, but this increase in correlation occurs as the microphone PSD decreases. Because the correlations are low and do not increase or decrease with the microphone PSD, the observed correlations are likely due to factors other than acoustic contamination, such as shared electrical noise.
- FIGS. 14A-14C Long-term stability of speech-evoked signals.
- FIG.14A shows neural activity from a single electrode across all of the participant’s attempts to say the word “Goodbye” during the isolated word task, spanning 81 weeks of recording.
- FIG.14B shows the participant’s brain reconstruction overlaid with electrode locations. The electrode shown in Panel A is filled in with black. For anatomical reference, the precentral gyrus is highlighted in light blue.
- FIG.14C shows word classification outcomes from training and testing the detector and classifier on subsets of isolated word data sampled from four non-overlapping date ranges. Each subset contains data from 20 attempted productions of each word. Each solid bar depicts results from cross-validated evaluation within a single subset, and each dotted bar depicts results from training on data from all of the subsets except for the one that is being evaluated. Each error bar shows the 95% confidence interval of the mean, computed across cross-validation folds. Chance accuracy is depicted as a horizontal dashed line. Electrode contributions computed during cross-validated evaluation within a single subset are shown on top (oriented with the most dorsal and posterior electrode in the upper-right corner).
- FIG.15 Schematic depiction of the spelling pipeline.
- (A) At the start of a sentence-spelling trial, the participant attempts to silently say a word to volitionally activate the speller.
- (B) Neural features (high-gamma activity and low-frequency signals) are extracted in real time from the recorded cortical data throughout the task. The features from a single electrode (electrode 0 as shown in FIG.19A) are depicted.
- the speech-detection model, consisting of a recurrent neural network (RNN) and thresholding operations, processes the neural features sample-by-sample to detect a silent-speech attempt. Once an attempt is detected, the detection model becomes inactive and the spelling procedure begins.
- the participant spells out the intended message throughout letter-decoding cycles that occur every 2.5 seconds. Each cycle, the participant is visually presented with a countdown and eventually a go cue. At the go cue, the participant attempts to silently say the code word that represents the desired letter.
- High-gamma activity and low-frequency signals are computed throughout the spelling procedure for all electrode channels and divided into 2.5-second non-overlapping time windows corresponding to the letter-decoding cycles.
- An RNN-based letter-classification model processes each of these neural time windows to predict the probability that the participant was attempting to silently say each of the 26 possible code words or attempting to perform a hand-motor command (see G). If the classifier predicts that the participant was performing the hand-motor command with at least 80% probability, the spelling procedure ends and the sentence is finalized (see I). Otherwise, the predicted letter probabilities are processed by a beam-search algorithm in real time and the most likely sentence is displayed to the participant.
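The letter-decoding cycle described above can be sketched as a vocabulary-constrained beam search with automatic space insertion and a hand-motor stop condition. This is a simplified illustration under several assumptions: language-model scoring is omitted entirely, the three-word vocabulary and all probabilities are made up, and the names `beam_search_speller` and the `"hand"` key are hypothetical.

```python
import math


def beam_search_speller(cycles, vocab, beam_width=10, stop_threshold=0.8):
    """Decode a sentence from per-cycle letter probabilities.

    cycles: one dict per letter-decoding cycle mapping each letter to its
            probability, plus a "hand" key for the hand-motor command.
    vocab:  set of allowed words; hypotheses must always be spellable from it.
    """
    prefixes = {w[:i] for w in vocab for i in range(1, len(w) + 1)}
    beams = [((), "", 0.0)]  # (completed words, partial word, log probability)
    for probs in cycles:
        if probs.get("hand", 0.0) >= stop_threshold:
            break  # hand-motor command detected: finalize the sentence
        candidates = {}
        for words, partial, logp in beams:
            for letter, p in probs.items():
                if letter == "hand" or p <= 0.0:
                    continue
                new_logp = logp + math.log(p)
                options = []
                if partial + letter in prefixes:
                    # Keep spelling the current word.
                    options.append((words, partial + letter))
                if partial in vocab and letter in prefixes:
                    # Insert a space: close the current word and start a new one.
                    options.append((words + (partial,), letter))
                for key in options:
                    if new_logp > candidates.get(key, -math.inf):
                        candidates[key] = new_logp
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        beams = [(w, part, lp) for (w, part), lp in ranked[:beam_width]]
    # Keep only sentences composed solely of complete words.
    complete = [(w + (part,), lp) for w, part, lp in beams if part in vocab]
    if not complete:
        return ""
    best_words, _ = max(complete, key=lambda item: item[1])
    return " ".join(best_words)


vocab = {"i", "am", "in"}
cycles = [
    {"i": 0.9, "a": 0.1, "hand": 0.0},
    {"a": 0.6, "n": 0.4, "hand": 0.0},
    {"m": 0.7, "n": 0.3, "hand": 0.0},
    {"a": 0.05, "hand": 0.95},  # participant signals the end of the sentence
]
sentence = beam_search_speller(cycles, vocab)
```

In this toy run the hypothesis "in" is still alive after the second cycle but cannot be extended by the third cycle's letters, so the search settles on "i am". A fuller version would add language-model probabilities to each hypothesis score during the search and again after the sentence is finalized.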
- FIGS. 16A-16F Performance summary of the spelling system during the copy-typing task.
- FIG.16A The neural network-based language model (“DistilGPT-2”) rescores the sentences composed solely of complete words, and the system uses the most likely sentence after rescoring as the final prediction.
- Character error rates (CERs) observed during real-time sentence spelling (denoted as ‘+LM (Real-time results)’) and offline simulations in which portions of the spelling system were omitted.
- ‘Chance’ condition sentences were created by replacing the outputs from the neural classifier with randomly generated letter probabilities without altering the remainder of the spelling pipeline.
- ‘Only neural decoding’ sentences were created solely by concatenating together the most likely character from each of the classifier’s predictions during a sentence trial (which did not include any whitespace characters).
- the predicted letter probabilities from the neural classifier were used with a beam search that constrained the predicted character sequences to form words from within the 1,152-word vocabulary.
- the final condition, labeled ‘+LM (Real-time results)’, shows the real-time results during testing with the participant, incorporating language modeling during the beam search and after the sentence is finalized.
- the sentences decoded with the full system in real time exhibited lower CERs than sentences decoded in the other conditions (*** P < 0.0001, two-sided Wilcoxon Rank-Sum test with 6-way Holm-Bonferroni correction).
- FIG. 16B Word error rates (WERs) for real-time results and corresponding offline omission simulations from FIG.16A.
- FIG.16C The decoded characters per minute during real-time testing.
- FIG.16D The decoded words per minute during real-time testing.
- FIG. 16E Number of excess characters in each decoded sentence.
- FIGS. 17A-17H Characterization of high-gamma activity (HGA) and low-frequency signals (LFS) during silent-speech attempts.
- FIG.17A 10-fold cross-validated classification accuracy on silently attempted NATO code words when using HGA alone, LFS alone, and both HGA+LFS simultaneously.
- Classification accuracy using only LFS is significantly higher than using only HGA, and using both HGA+LFS results in significantly higher accuracy than using either feature type alone (** P < 0.001, two-sided Wilcoxon Rank-Sum test with 3-way Holm-Bonferroni correction).
- Chance accuracy is 3.7%.
- Each boxplot depicts the quartiles of the data with whiskers extending to show the remainder of the distribution except for data points that are 1.5 times the interquartile range.
- FIG.17B Electrode contributions from a classification model trained using only HGA features.
- FIG.17C Electrode contributions associated with HGA features from a classification model trained using the combined HGA+LFS feature set.
- FIG.17D Electrode contributions from a classification model trained using only LFS features.
- FIG.17E Electrode contributions associated with LFS features from a classification model trained using the combined HGA+LFS feature set.
- plotted electrode size and opacity are scaled by relative contribution; electrodes that appear larger and more opaque provided more important features to the classification model.
- FIG.17F plotted electrode size and opacity are scaled by relative contribution; electrodes that appear larger and more opaque provided more important features to the classification model.
- FIG.17F and FIG.17G The number of principal components (PCs) required for each feature set is depicted as a histogram, where the x-axis is the percent of the bootstrap iterations that required a certain number of PCs.
- FIG.17H Effect of temporal smoothing on classification accuracy. Each point represents the median and error bars represent the 99% confidence interval around bootstrapped estimations of the median.
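- The bootstrapped estimate of the median and its confidence interval mentioned for FIG.17H can be sketched with a percentile bootstrap. The source does not specify the resampling procedure, so the percentile method, the number of resamples, the fixed seed, and the accuracy values below are all assumptions made for illustration.

```python
import random
import statistics

def bootstrap_median_ci(values, n_boot=2000, confidence=0.99, seed=0):
    """Percentile-bootstrap confidence interval around the sample median."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    alpha = (1 - confidence) / 2
    lower = medians[int(alpha * n_boot)]
    upper = medians[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return statistics.median(values), lower, upper

# Hypothetical cross-validation accuracies (fractions correct).
accuracies = [0.52, 0.55, 0.58, 0.60, 0.61, 0.63, 0.65, 0.66, 0.70, 0.72]
median, lower, upper = bootstrap_median_ci(accuracies)
print(f"median = {median:.3f}, 99% CI = [{lower:.3f}, {upper:.3f}]")
```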
- FIGS. 18A-18C Comparison of neural signals during attempts to silently say English letters and NATO code words.
- each boxplot depicts the quartiles of the data with whiskers extending to show the rest of the distribution except for data points that are 1.5 times the interquartile range.
- FIG.18C The nearest-class distance is greater for the majority of code words than for the corresponding letters.
- nearest-class distances are computed as the Frobenius norm between trial-averaged HGA+LFS features.
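- The nearest-class distance described for FIG.18C can be sketched as follows: average the feature matrices over trials for each class, then take, for each class, the smallest Frobenius norm of the difference to any other class average. The feature layout (rows and columns of a small matrix), the class labels, and the values are hypothetical.

```python
import math

def trial_average(trials):
    """Element-wise mean of equally shaped 2-D feature matrices."""
    n = len(trials)
    rows, cols = len(trials[0]), len(trials[0][0])
    return [[sum(t[r][c] for t in trials) / n for c in range(cols)]
            for r in range(rows)]

def frobenius_distance(a, b):
    """Frobenius norm of the difference between two matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

def nearest_class_distances(trials_by_class):
    """Distance from each class average to its closest other class average."""
    means = {label: trial_average(t) for label, t in trials_by_class.items()}
    return {
        label: min(frobenius_distance(mean, other_mean)
                   for other, other_mean in means.items() if other != label)
        for label, mean in means.items()
    }

# Hypothetical 2x2 feature matrices (e.g., time samples x features).
trials_by_class = {
    "alpha":   [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 0.0], [0.0, 0.0]]],
    "bravo":   [[[4.0, 4.0], [0.0, 0.0]]],
    "charlie": [[[1.0, 0.0], [0.0, 2.0]]],
}
distances = nearest_class_distances(trials_by_class)
print(distances)
```

- A larger nearest-class distance means the class average lies farther from its closest competitor in feature space, which is the sense in which code words are compared with letters in FIG.18C.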
- FIGS. 19A-19D Differences in neural signals and classification performance between overt- and silent-speech attempts.
- FIG.19A MRI reconstruction of the participant’s brain overlaid with implanted electrode locations. The locations of the electrodes used in FIG. 19B and FIG.19C are bolded and numbered in the overlay.
- FIG.19B High-gamma activity (HGA) event-related potentials during silent (orange) and overt (green) attempts to say the NATO code word “kilo”.
- FIG.19C High-gamma activity (HGA) event-related potentials during silent (orange) and overt (green) attempts to say the NATO code word “tango”.
- FIGS. 20A-20D The spelling approach can generalize to larger vocabularies and conversational settings.
- FIG.20A Simulated character error rates from the copy-typing task with different vocabularies, including the original vocabulary used during real-time decoding.
- FIG.20B Word error rates from the corresponding simulations in FIG.20A.
- FIG.20C Character and word error rates across the volitionally chosen responses and messages decoded in real time during the conversational task condition.
- each boxplot depicts the quartiles of the data with whiskers extending to show the rest of the distribution except for data points that are 1.5 times the interquartile range.
- FIG.20D Examples of presented questions from trials of the conversational task condition (left) along with corresponding responses decoded from the participant’s brain activity (right).
- FIG.21 Data collection timeline. Each bar depicts the total number of trials collected on each day of recording. The participant and implant date are the same as in our previous work [2]. If more than one type of dataset was collected in a single day, the bar is colored by the proportion of each dataset collected. Each color represents a specific dataset (as specified in the legend). Datasets vary in task type (isolated-target or real-time sentence spelling), utterance set (English letters, NATO code words (which included the attempted hand squeeze), copy-typing sentences, or conversational sentences), and, for the real-time sentence-spelling datasets, the purpose of the data (for hyperparameter optimization or for performance evaluation).
- FIG.22 Real-time signal-processing pipeline.
- a detachable data-acquisition headstage (CerePlex E, Blackrock Microsystems) attached to the percutaneous pedestal connector applied a hardware-based wide-band Butterworth filter (between 0.3 Hz and 7.5 kHz) to the ECoG signals, digitized them with 16-bit, 250-nV per bit resolution, and transmitted them at 30 kHz through additional connections to a Neuroport system (Blackrock Microsystems), which processed the signals using software-based line noise cancellation and an anti-aliasing low-pass filter (at 500 Hz). Afterwards, the processed signals were streamed at 1 kHz to a separate computer for further real-time processing and analysis, where we applied a common average reference (across all electrode channels) to each time sample of the ECoG data.
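- The common average reference applied to each time sample can be sketched as subtracting the mean across all electrode channels from every channel at that sample. The data layout (a list of time samples, each a list of channel values) and the numbers are hypothetical.

```python
def common_average_reference(samples):
    """Subtract the across-channel mean from each channel, per time sample."""
    referenced = []
    for sample in samples:
        mean = sum(sample) / len(sample)
        referenced.append([value - mean for value in sample])
    return referenced

# Two hypothetical time samples from three channels (arbitrary units).
ecog = [[1.0, 2.0, 3.0],
        [10.0, 10.0, 40.0]]
car = common_average_reference(ecog)
print(car)
```

- After re-referencing, the channel values at each time sample sum to zero, so activity shared by all channels (such as common noise) is removed.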
- the re-referenced signals were then processed in two parallel streams to extract high-gamma activity (HGA) and low-frequency signal (LFS) features.
- To compute the HGA features we applied eight 390th-order band-pass finite impulse response (FIR) filters to the re-referenced signals (filter center frequencies were within the high-gamma band at 72.0, 79.5, 87.8, 96.9, 107.0, 118.1, 130.4, and 144.0 Hz). Then, for each channel and band, we used a 170th-order FIR filter to approximate the Hilbert transform.
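- As an observation from the numbers rather than a statement in the source, the eight listed center frequencies match logarithmic spacing over one octave, from 72 Hz to 144 Hz. The short check below reproduces the list to one decimal place; the helper name `log_spaced_centers` is hypothetical.

```python
def log_spaced_centers(low_hz, high_hz, n_bands):
    """Center frequencies spaced evenly on a logarithmic axis."""
    ratio = high_hz / low_hz
    return [low_hz * ratio ** (k / (n_bands - 1)) for k in range(n_bands)]

# Center frequencies as listed in the text.
listed_centers_hz = [72.0, 79.5, 87.8, 96.9, 107.0, 118.1, 130.4, 144.0]

# Assumption being checked: f_k = 72 * 2**(k/7) for k = 0..7.
computed_centers_hz = [round(f, 1) for f in log_spaced_centers(72.0, 144.0, 8)]
print(computed_centers_hz)
```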
- FIG.23 Speech-detection model schematic. To detect silent-speech attempts from the participant’s neural activity during real-time sentence spelling, first the z-scored low-frequency signals (LFS) and high-gamma activity (HGA) for each electrode are processed continuously by a stack of 3 long short-term memory (LSTM) layers. Next, a single dense (fully connected) layer projects the latent dimensions of the final LSTM onto the 4 possible classes: speech, speech preparation, rest, and motor.
- FIGS. 24A-24B Effects of feature selection on code-word classification accuracy.
- FIG.24A Effects of feature selection on code-word classification accuracy.
- FIG.25 Confusion matrix from isolated-target trial classification. Confusion values, computed during offline classification of neural data (using both high-gamma activity and low-frequency signals) recorded during isolated-target trials, are shown for each NATO code word and the attempted hand squeeze.
- Each row corresponds to a target code word or the attempted hand squeeze, and the value in each column for that row corresponds to the percent of isolated-target task trials that were correctly classified as the target (if the value is along the diagonal) or misclassified (“confused”) as another potential target (if the value is not along the diagonal).
- the values in each row sum to 100%.
- silent-speech and hand-squeeze attempts were reliably classified.
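- The row-normalized confusion values described for FIG.25 can be sketched as follows: count how often each target was classified as each possible class, then convert each row to percentages so that it sums to 100%. The three class labels and the trial outcomes are hypothetical stand-ins for the code words and the attempted hand squeeze.

```python
def confusion_percentages(true_labels, predicted_labels, classes):
    """Row-normalized confusion matrix: rows are targets, values are percents."""
    counts = {t: {p: 0 for p in classes} for t in classes}
    for target, predicted in zip(true_labels, predicted_labels):
        counts[target][predicted] += 1
    percents = {}
    for target in classes:
        total = sum(counts[target].values())
        percents[target] = {
            p: (100.0 * counts[target][p] / total if total else 0.0)
            for p in classes
        }
    return percents

# Hypothetical isolated-target trials.
classes = ["alpha", "bravo", "squeeze"]
targets = ["alpha"] * 4 + ["bravo"] * 2 + ["squeeze"] * 2
predictions = ["alpha", "alpha", "alpha", "bravo",
               "bravo", "alpha",
               "squeeze", "squeeze"]
confusion = confusion_percentages(targets, predictions, classes)
for target in classes:
    print(target, confusion[target])
```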
- FIGS. 26A-26B Neural-activation characteristics during overt- and silent-speech attempts.
- FIG.26A Neural-activation characteristics during overt- and silent-speech attempts.
- Each image shows an MRI reconstruction of the participant’s brain overlaid with electrode locations and the maximum neural activations for each electrode, type of speech attempt (overt or silent), and feature type (high-gamma activity (HGA) or low-frequency signals (LFS)), measured as maximum peak code-word average magnitudes.
- FIG.26B The standard deviation of peak code-word average magnitudes.
- the standard deviation (instead of the maximum used in FIG.26A) of the peak average magnitudes across the code words for each electrode, type of speech attempt, and feature type is computed and plotted, depicting how much the magnitudes varied across speech targets for that combination.
- each plotted electrode indicates the true associated value for that electrode
- the size of each electrode depicts the associated value for that electrode relative to the values for the other electrodes (for a given type of speech attempt and feature type).
- DETAILED DESCRIPTION
- Methods, devices, and systems for assisting a subject with communication are provided. In particular, methods, devices, and systems are provided for decoding words and sentences directly from neural activity of an individual. In the disclosed methods, cortical activity from a region of the brain involved in speech processing is recorded while an individual attempts to say or spell out words of a sentence. Deep learning computational models are used to detect and classify words from the recorded brain activity.
- Decoding of speech from brain activity is aided by use of a language model that predicts how likely certain sequences of words are to occur.
- decoding of attempted non-speech motor movements from neural activity can be used to further assist communication.
- the methods, devices, and systems disclosed herein may be used to assist individuals who have difficulty with communication caused by conditions and diseases including, without limitation, strokes, traumatic brain injuries, brain tumors, amyotrophic lateral sclerosis, multiple sclerosis, Huntington's disease, Niemann-Pick disease, Friedreich's ataxia, Wilson's disease, cerebral palsy, Guillain-Barré syndrome, Tay-Sachs disease, encephalopathy, central pontine myelinolysis, and other conditions causing dysfunction or paralysis of the muscles of the head, neck, or chest resulting in anarthria.
- The term “communication disorders” is used herein to refer to a group of conditions that affect the ability of a subject to speak. Communication disorders include, without limitation, anarthria, strokes, traumatic brain injuries, brain tumors, amyotrophic lateral sclerosis, multiple sclerosis, Huntington's disease, Niemann-Pick disease, Friedreich's ataxia, Wilson's disease, cerebral palsy, Guillain-Barré syndrome, Tay-Sachs disease, encephalopathy, central pontine myelinolysis, and other conditions causing dysfunction or paralysis of the muscles of the head, neck, or chest resulting in anarthria.
- the term “communication” includes word-based communication such as verbal communication including spoken speech, spelling of words, and production of text (e.g., controlling a personal device to generate email or text via attempts to speak) as well as action-based communication such as through attempted non-speech motor movement.
- Attempted speech may include vocalized speech, which may or may not be intelligible, or non-vocalized speech.
- Silent-speech attempts are volitional attempts to articulate speech without vocalizing.
- Silent-spelling attempts are volitional attempts to spell alphabetical characters or numbers without vocalizing.
- Attempted non-speech motor movement may include imagined movement without any detectable physical movement. Attempted non-speech motor movements may include, without limitation, imagined head, arm, hand, foot, and leg movements.
- Attempted non-speech motor movements may be used to indicate the initiation or termination of attempted speech or spelling or to control an external device (e.g., for communication with a personal device or software applications or to turn on or off a device).
- neural activity is recorded during attempts to communicate whether or not the individual produces any vocal output or detectable motor movement.
- the terms “subject”, “individual”, “patient”, and “participant” are used interchangeably herein and refer to a patient having a communication disorder.
- the patient is preferably human, e.g., a child, an adolescent, an adult, such as a young, middle-aged, or elderly human who may benefit from the systems, devices, and methods disclosed herein for restoring communication.
- the patient may have been diagnosed as having anarthria.
- the term “user” as used herein refers to a person that interacts with a device and/or system disclosed herein for performing one or more steps of the presently disclosed methods.
- the user may be the patient receiving treatment.
- the user may be a health care practitioner, such as the patient’s physician.
- METHODS
- the present disclosure provides methods for assisting a subject with communication. Methods are provided for decoding words and sentences directly from neural activity of an individual. In the disclosed methods, cortical activity from a region of the brain involved in speech processing is recorded while an individual attempts to say or spell out words of a sentence. Attempts to say or spell out words can include or exclude vocalizations.
- neural activity is recorded during attempts to say or spell out words whether or not the individual produces any vocal output.
- the vocal output may be unintelligible when the individual attempts to say or spell out words.
- Deep learning computational models are used to detect and classify words and/or spelled letters from the recorded brain activity. Decoding of speech from brain activity is aided by use of a language model that predicts how likely certain sequences of words are to occur.
- the neurotechnology described herein can be used to restore communication to patients who have lost the ability to speak and has the potential to improve autonomy and quality of life.
- the method includes positioning a neural recording device comprising one or more electrodes at a location in a sensorimotor cortex region of the brain of the subject to record brain electrical signal data associated with attempted speech and/or attempted spelling by the subject; and positioning an interface in communication with a computing device at a location on the head of the subject.
- Brain electrical signal data associated with attempted speech and/or attempted spelling by the subject is recorded using the neural recording device, wherein the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to a processor programmed to detect attempted speech and/or spelling by the subject and decode spelled letters, words, phrases, or sentences from the recorded brain electrical signal data.
- the recording device may comprise non-brain penetrating surface electrodes or brain-penetrating depth electrodes.
- the electrical signals may be recorded using a single electrode, electrode pairs, or an electrode array.
- the brain activity is recorded from more than one site.
- brain electrical signal data is recorded from a sensorimotor cortex region of the brain involved in speech processing such as the precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus region, or any combination thereof.
- the electrode is positioned on a surface of the sensorimotor cortex region of the brain in a subdural space.
- Positioning an electrode for recording brain activity at specified region(s) of the brain may be carried out using standard surgical procedures for placement of intra-cranial electrodes.
- the phrases “an electrode” or “the electrode” refer to a single electrode or multiple electrodes such as an electrode array.
- the term “contact” as used in the context of an electrode in contact with a region of the brain refers to a physical association between the electrode and the region. In other words, an electrode that is in contact with a region of the brain is physically touching the region of the brain. An electrode in contact with a region of the brain can be used to detect electrical signals corresponding to neural activity associated with attempted speech and/or spelling.
- Electrodes used in the methods disclosed herein may be monopolar (cathode or anode) or bipolar (e.g., having an anode and a cathode).
- one or more electrodes are used to record electrical signals for neural activity associated with attempted speech and/or attempted spelling in one or more brain regions.
- An electrode may be placed, for example, in a region of the sensorimotor cortex involved in speech processing such as the precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus region of the brain. In certain cases, placing the electrode may involve positioning the electrode on the surface of the specified region(s) of the brain.
- electrodes may be placed on the surface of the brain at the precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus region, or any combination thereof.
- the electrode may contact at least a portion of the surface of the brain at the precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus regions.
- the electrode may contact substantially the entire surface area at the precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus regions.
- the electrode may additionally contact area(s) adjacent to the precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus regions.
- an electrode array arranged on a planar support substrate may be used for detecting electrical signals for neural activity from one or more of the brain regions specified herein.
- the surface area of the electrode array may be determined by the desired area of contact between the electrode array and the brain.
- An electrode for implanting on a brain surface, such as a surface electrode or a surface electrode array, may be obtained from a commercial supplier. A commercially obtained electrode/electrode array may be modified to achieve a desired contact area.
- the non-brain penetrating electrode (also referred to as a surface electrode) that may be used in the methods disclosed herein may be an electrocorticography (ECoG) electrode or an electroencephalography (EEG) electrode.
- placing the electrode at a target area or site may involve positioning a brain penetrating electrode (also referred to as depth electrode) in the specified region(s) of the brain.
- a depth electrode may be placed in a selected region of the sensorimotor cortex involved in speech processing (e.g., the precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus region).
- the electrode may additionally contact area(s) adjacent to the selected region of the sensorimotor cortex involved in speech processing (e.g., adjacent to the precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus region).
- an electrode array may be used for recording electrical signals at the selected region of the sensorimotor cortex involved in speech processing (e.g., the precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus region) as specified herein.
- the depth to which an electrode is inserted into the brain may be determined by the desired level of contact between the electrode array and the brain and the types of neural populations that the electrode would have access to for recording electrical signals.
- a brain-penetrating electrode array may be obtained from a commercial supplier.
- a commercially obtained electrode array may be modified to achieve a desired depth of insertion into the brain tissue.
- an electrode array may include two or more electrodes, such as 3 or more, 10 or more, 50 or more, 100 or more, 200 or more, 500 or more, including 4 or more, e.g., about 3 to 6 electrodes, about 6 to 12 electrodes, about 12 to 18 electrodes, about 18 to 24 electrodes, about 24 to 30 electrodes, about 30 to 48 electrodes, about 48 to 72 electrodes, about 72 to 96 electrodes, about 96 to 128 electrodes, about 128 to 196 electrodes, about 196 to 294 electrodes, or more electrodes.
- the electrodes may be arranged into a regular repeating pattern (e.g., a grid, such as a grid with about 1 cm spacing between electrodes), or no pattern.
- An electrode that conforms to the target site for optimal recording of electrical signals from neural activity associated with attempted speech and/or spelling by a subject may be used.
- One such example is a single multi-contact electrode with eight contacts separated by 2½ mm. Each contact would have a span of approximately 2 mm.
- Another example is an electrode with two 1 cm contacts with a 2 mm intervening gap.
- another example of an electrode that can be used in the present methods is a 2 or 3 branched electrode to cover the target site.
- a high-density ECoG electrode array is used to record electrical signals from neural activity associated with attempted speech and/or spelling by a subject.
- a high-density ECoG electrode array may comprise at least 100 electrodes, at least 128 electrodes, at least 196 electrodes, at least 256 electrodes, at least 294 electrodes, at least 500 electrodes, or at least 1000 electrodes, or more.
- the electrode center-to-center spacing in a high-density ECoG electrode array ranges from 250 µm to 4 mm, including any electrode center-to-center spacing within this range such as 250 µm, 300 µm, 350 µm, 400 µm, 500 µm, 550 µm, 600 µm, 650 µm, 700 µm, 800 µm, 900 µm, 1 mm, 1.5 mm, 2 mm, 2.5 mm, 3 mm, 3.5 mm, or 4 mm.
- a high-density ECoG micro-electrode array is used.
- ECoG micro-electrode arrays may comprise electrodes having a diameter of 250 µm or less, 230 µm or less, or 200 µm or less, including electrodes having a diameter ranging from 150 µm to 250 µm, including any diameter within this range such as 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, or 250 µm.
- the size of each electrode may also vary depending upon such factors as the number of electrodes in the array, the location of the electrodes, the material, the age of the patient, and other factors.
- each electrode has a size (e.g., a diameter) of about 5 mm or less, such as about 4 mm or less, including 4 mm-0.25 mm, 3 mm-0.25 mm, 2 mm-0.25 mm, 1 mm-0.25 mm, or about 3 mm, about 2 mm, about 1 mm, about 0.5 mm, or about 0.25 mm.
- the method further comprises mapping the brain of the subject to optimize positioning of an electrode.
- Positioning of an electrode is optimized to detect brain activity features associated with attempted speech by the subject and to achieve optimal decoding of attempted speech. For example, patterns of electrical signals in specific frequency ranges (e.g., alpha, delta, beta, gamma, and/or high gamma) may be used for detecting attempted speech and/or spelling and decoding words, phrases, or sentences intended by the subject.
- electrodes may be positioned to optimize detection and/or decoding of brain activity in specific frequency ranges to restore communication to a subject who has a communication disorder.
- the methods and systems of the present disclosure may include recording brain activity, for example, electrical activity in the ventral sensorimotor cortex, where patterns of gamma-frequency neural activity associated with words, phrases, and sentences of attempted speech may be detected.
- electrical activity from a plurality of locations in the ventral sensorimotor cortex may be measured.
- electrical activity in the high gamma frequency range (such as 70 Hz to 150 Hz) and the low frequency range (such as 0.3 Hz to 100 Hz) may be measured from the precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus region, or any combination thereof.
- Detection of brain activity may be performed by any method known in the art.
- functional brain imaging of neural activity may be carried out by electrical methods such as electrocorticography (ECoG), electroencephalography (EEG), stereoelectroencephalography (sEEG), magnetoencephalography (MEG), single photon emission computed tomography (SPECT), as well as metabolic and blood flow studies such as functional magnetic resonance imaging (fMRI), positron emission tomography (PET), functional near-infrared spectroscopy (fNIRS), and time-domain functional near-infrared spectroscopy.
- the precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus region are mapped to determine optimal positioning for electrodes to detect neural activity associated with attempted speech and/or attempted spelling.
- One or more of these regions may be implanted with a neural recording device comprising electrodes to measure electrical signals from neural activity associated with attempted speech and/or attempted spelling.
- electrical activity in one or more locations in the brain may be measured not only during attempted speech or attempted spelling but also during a period extending from just prior to attempted speech or attempted spelling (i.e., period of preparation for speech or spelling) to a period just after attempted speech or spelling (i.e., rest period after attempted speech or spelling).
- Assessment of the accuracy of the decoding of speech or spelling from neural activity at a particular site may be determined by comparing decoded words to the intended words of the patient. For example, the patient may communicate the correct intended words using an assistive typing device. Both detection of the onset and offset of speech events and word/letter classification accuracy from decoding neural activity may be evaluated.
- Application of the method may include a prior step of selecting a patient for implantation with a neural recording device based on need as determined by clinical assessment of the severity of the communication disorder and the desire for assistance with communication, and may also include cognitive assessment, anatomical assessment, behavioral assessment and/or neurophysiological assessment.
- Patients who have difficulty with communication may be implanted with a neural recording device to assist communication, as described herein.
- An interface capable of communication with a computing device is implanted in the cranium or placed on the head of the subject to provide an externally accessible platform through which brain electrical signals can be acquired from the neural recording device and transmitted to a data processor for decoding.
- the interface comprises a percutaneous pedestal connector anchored in the cranium of the subject.
- the interface can be connected, for example, to a computing device such as a computer or a handheld computing device (e.g., cell phone or tablet) with a detachable digital connector and cable.
- the interface may be connected to a computing device wirelessly.
- the interface comprises a first wireless communication unit in communication with a computing device comprising a second wireless communication unit.
- the first wireless communication unit utilizes a wireless communication protocol using an electromagnetic carrier wave (e.g., a radio wave, microwave, or an infrared carrier wave) or ultrasound to transfer data from the interface to the computing device comprising the second wireless communication unit.
- the processor may be provided by a computer or a handheld computing device (e.g., cell phone or tablet) programmed to decode the attempted speech and/or attempted spelling from the recorded brain electrical signal data.
- Analyzing the recorded brain electrical activity may comprise the use of an algorithm or classifier.
- a machine learning algorithm is used to automate speech detection, letter classification (in the case of attempted spelling), word classification, and sentence decoding from analysis of recorded brain activity during attempted speech or spelling.
- the machine learning algorithm may comprise a supervised learning algorithm.
- supervised learning algorithms may include Average One-Dependence Estimators (AODE), Artificial neural network (e.g., artificial neural network comprising a stack of long short-term memory (LSTM) layers), Bayesian statistics (e.g., Naive Bayes classifier, Bayesian network, Bayesian knowledge base), Case-based reasoning, Decision trees, Inductive logic programming, Gaussian process regression, Group method of data handling (GMDH), Learning Automata, Learning Vector Quantization, Minimum message length (decision trees, decision graphs, etc.), Lazy learning, Instance-based learning, Nearest Neighbor Algorithm, Analogical modeling, Probably approximately correct (PAC) learning, Ripple down rules, a knowledge acquisition methodology, Symbolic machine learning algorithms, Subsymbolic machine learning algorithms, Support vector machines, Random Forests, Ensembles of classifiers, Bootstrap aggregating (bagging), and Boosting.
- Supervised learning may comprise ordinal classification such as regression analysis and Information fuzzy networks (IFN).
- supervised learning methods may comprise statistical classification, such as AODE, Linear classifiers (e.g., Fisher's linear discriminant, Logistic regression, Naive Bayes classifier, Perceptron, and Support vector machine), quadratic classifiers, k-nearest neighbor, Boosting, Decision trees (e.g., C4.5, Random forests), Bayesian networks, and Hidden Markov models.
- the machine learning algorithms may also comprise an unsupervised learning algorithm.
- unsupervised learning algorithms may include artificial neural network, Data clustering, Expectation-maximization algorithm, Self-organizing map, Radial basis function network, Vector Quantization, Generative topographic map, Information bottleneck method, and IBSEAD.
- Unsupervised learning may also comprise association rule learning algorithms such as Apriori algorithm, Eclat algorithm and FP-growth algorithm.
- Hierarchical clustering such as Single-linkage clustering and Conceptual clustering, may also be used.
- unsupervised learning may comprise partitional clustering such as K-means algorithm and Fuzzy clustering.
- the machine learning algorithms comprise a reinforcement learning algorithm. Examples of reinforcement learning algorithms include, but are not limited to, temporal difference learning, Q-learning and Learning Automata.
- the machine learning algorithm may comprise Data Pre-processing.
- the machine learning algorithm may use deep learning. Deep learning (e.g., deep neural networks, deep belief networks, graph neural networks, recurrent neural networks and convolutional neural networks) may be supervised, semi-supervised or unsupervised.
- the machine learning algorithm uses artificial neural network (ANN) models for the speech detection and the word/letter classification and natural language processing techniques such as, but not limited to, a hidden Markov model (HMM) or a Viterbi decoding model for the sentence decoding.
- ANN artificial neural network
- the processor is programmed to use a speech detection model to determine the probability that attempted speech or spelling is occurring at any time point during recording of neural activity and/or detect onset and offset of attempted speech or spelling during recording of the neural activity.
- Linear models or non-linear (e.g., artificial neural network (ANN)) models may be used to automate speech detection.
- a deep learning model is used for speech detection, in particular, to automate detection of onset and offset of word production during attempted speech by the subject or letter production during attempted spelling by the subject.
- the processor may be programmed to further assign speech event labels for preparation, speech/spelling, and rest to time points during the recording of the brain electrical signal data.
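The onset and offset detection described above can be illustrated with a minimal sketch. It assumes the speech detection model has already produced one speech probability per time point; the function name `detect_events`, the 0.5 threshold, and the minimum event length are hypothetical choices for illustration and are not taken from the disclosure.

```python
def detect_events(speech_probs, threshold=0.5, min_len=3):
    """Return (onset, offset) index pairs for runs of time points whose
    speech probability stays at or above `threshold`.

    `threshold` and `min_len` are illustrative assumptions, not values
    from the source. `offset` is the first index after the run.
    """
    events = []
    start = None
    for i, p in enumerate(speech_probs):
        if p >= threshold and start is None:
            start = i                      # candidate onset
        elif p < threshold and start is not None:
            if i - start >= min_len:       # discard very short blips
                events.append((start, i))
            start = None
    # Close an event that is still open at the end of the recording.
    if start is not None and len(speech_probs) - start >= min_len:
        events.append((start, len(speech_probs)))
    return events


probs = [0.1, 0.2, 0.8, 0.9, 0.7, 0.2, 0.6, 0.1, 0.9, 0.9, 0.9]
print(detect_events(probs))
```

Time points inside a returned pair could then be labeled as speech/spelling and the remaining points as rest or preparation, depending on how the labels are defined.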
- the recorded brain electrical signal data within a time window around the detected onset of attempted speech/spelling is used for word classification or letter classification.
- Word classification may utilize a machine learning algorithm to automate identification of neural activity patterns of electrical signals in the recorded brain electrical signal data associated with attempted word production during attempted speech by the subject.
- Letter classification may utilize a machine learning algorithm to automate identification of neural activity patterns of electrical signals in the recorded brain electrical signal data associated with attempted letter production during attempted spelling by the subject.
- a series of go cues is provided to the subject indicating when the subject should initiate attempted spelling of each letter of the words of an intended sentence.
- the series of go cues are provided visually on a display. Each go cue may be preceded by a countdown to the presentation of the go cue, wherein the countdown for the next spelled letter is provided visually on the display and automatically started after each go cue. For example, during the spelling procedure, the participant spells out the intended message throughout letter-decoding cycles. In each cycle, the participant is visually presented with a countdown and eventually a go cue. At the go cue, the participant attempts to silently say a desired letter.
- the series of go cues are provided with a set interval of time between each go cue, which may be adjustable by the user.
- the processor is programmed to use the recorded brain electrical signal data within a time window following a go cue.
- the processor is programmed to use a word classification model to decode words in a detected time window of neural activity (e.g., time window identified by the speech detection model as occurring during attempted speech or spelling).
- the word classification model is used to determine the probability that the subject intended a particular word in the attempted speech across possible speech/text targets. For example, for each word in a vocabulary of possible words that the user can say, the word classification model determines probabilities that the neural activity was collected as the user attempted to say that word.
- the word classification model may use linear models or non-linear (e.g., ANN) models.
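As a rough illustration of how a classifier output can be turned into a probability for every word in the vocabulary, the sketch below applies a softmax to one raw score per word. The scores, the small vocabulary, and the function names are assumptions made for the example; the disclosure does not specify this particular normalization.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_probabilities(vocab, scores):
    """Map each vocabulary word to its probability (hypothetical helper)."""
    return dict(zip(vocab, softmax(scores)))


vocab = ["hello", "help", "hungry", "thirsty"]
scores = [0.2, 2.1, 0.4, 1.3]                # made-up classifier outputs
probs = word_probabilities(vocab, scores)
print(max(probs, key=probs.get))
```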
- the processor is programmed to use a letter classification model to determine the probability that the subject intended a particular letter during the attempted spelling across all possible characters (i.e., letters of an alphabet or numbers) of the language used by a subject.
- the processor is further programmed to constrain word classification from sequences of letters decoded from neural activity associated with attempted spelling of words by the subject to only words within a vocabulary of a language used by the subject.
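One simple way to picture the vocabulary constraint is to score every in-vocabulary word against the per-letter probabilities instead of taking the most likely letter at each position. The sketch below is an assumption-laden illustration: it handles a single word of known length, and the function name, probability floor, and example values are hypothetical.

```python
import math


def best_vocab_word(letter_probs, vocabulary, floor=1e-6):
    """Pick the vocabulary word with the highest summed log-probability.

    `letter_probs` is a list with one dict per spelled letter, mapping
    characters to probabilities. `floor` (an assumption) avoids log(0)
    for characters the classifier did not score.
    """
    best_word, best_score = None, -math.inf
    for word in vocabulary:
        if len(word) != len(letter_probs):
            continue                         # only words of matching length
        score = sum(
            math.log(max(probs.get(ch, 0.0), floor))
            for ch, probs in zip(word, letter_probs)
        )
        if score > best_score:
            best_word, best_score = word, score
    return best_word


letter_probs = [
    {"h": 0.4, "n": 0.6},
    {"e": 0.9},
    {"l": 0.5, "e": 0.5},
    {"p": 0.8, "d": 0.2},
]
print(best_vocab_word(letter_probs, ["help", "need", "hello"]))
```

In this made-up example the most likely first letter is "n", yet the constrained search still settles on a real word because the remaining letters fit it better.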
- the processor is programmed to use a word sequence decoding model to decode sentences based on word-sequence probabilities to determine the most likely sequence of words associated with detected speech events from the corresponding neural activity of the subject during attempted speech or spelling.
- the word sequence decoding model uses the sequence of probabilities from the classification model to construct a decoded sequence. This can involve using language models to incorporate a priori character-sequence or word-sequence probabilities into the neural decoding pipeline. It can also involve hidden Markov modeling (HMM) or Viterbi decoding models to handle incorporation of probabilities from the language model(s). This can use linear models or non-linear (e.g. ANN) models.
- the processor is also programmed to use a language model that provides next-word probabilities given a previous word or phrase in a sequence of words to aid the decoding by determining predicted word sequence probabilities, wherein words that occur more frequently are assigned more weight than words that occur less frequently according to the language model.
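The combination of classifier probabilities with language-model probabilities can be sketched as a small Viterbi search. This is a minimal illustration assuming a bigram language model stored as a dictionary, a `"<s>"` start token, and a probability floor for unseen pairs; all of these, along with the function name and the numbers, are assumptions rather than details from the disclosure.

```python
import math


def viterbi_decode(event_probs, bigram, vocab, floor=1e-6):
    """Return the most likely word sequence for a list of detected events.

    event_probs: one dict per event mapping word -> classifier probability
                 (at least one event is assumed).
    bigram:      dict mapping (previous_word, word) -> language-model
                 probability; "<s>" marks the sentence start (assumption).
    """
    def lp(p):
        return math.log(max(p, floor))       # floored log-probability

    scores = {w: lp(bigram.get(("<s>", w), 0.0)) + lp(event_probs[0].get(w, 0.0))
              for w in vocab}
    paths = {w: [w] for w in vocab}

    for probs in event_probs[1:]:
        new_scores, new_paths = {}, {}
        for w in vocab:
            # Best predecessor for word w at this event.
            prev = max(vocab, key=lambda p: scores[p] + lp(bigram.get((p, w), 0.0)))
            new_scores[w] = (scores[prev] + lp(bigram.get((prev, w), 0.0))
                             + lp(probs.get(w, 0.0)))
            new_paths[w] = paths[prev] + [w]
        scores, paths = new_scores, new_paths

    return paths[max(vocab, key=lambda w: scores[w])]


vocab = ["I", "am", "thirsty", "hungry"]
events = [{"I": 0.7, "am": 0.3},
          {"I": 0.55, "am": 0.45},           # classifier alone slightly prefers "I"
          {"thirsty": 0.6, "hungry": 0.4}]
bigram = {("<s>", "I"): 0.9, ("I", "am"): 0.9,
          ("am", "thirsty"): 0.5, ("am", "hungry"): 0.5}
print(viterbi_decode(events, bigram, vocab))
```

In the second event the classifier on its own slightly favors the wrong word, and the language-model term is what tips the decoded sequence toward a plausible sentence.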
- decoded information from previous detected speech events may be used to aid decoding. See Examples for a detailed discussion of the speech detection model, word classification model, and language model used to decode attempted speech from neural activity.
- the subject may be instructed to limit attempted speech to words from a predefined vocabulary (i.e., word set). The number of words included is preferably large enough to create a meaningful variety of sentences but small enough to enable satisfactory neural-based classification performance.
- For word classification from neural activity, the subject is instructed to attempt to produce each word contained in the word set to determine the pattern of electrical signals associated with each word. Exploratory, preliminary assessments with the subject following device implantation may be used to evaluate the selection of words and the size of the word set that can be readily decoded and used to assist communication by the methods described herein.
- the word set comprises up to 50 words, up to 100 words, up to 200 words, up to 300 words, up to 400 words, or up to 500 words, or more.
- the word set may include 50 words, 55 words, 60 words, 65 words, 70 words, 75 words, 80 words, 85 words, 90 words, 95 words, 100 words, 125 words, 150 words, 175 words, 200 words, 225 words, 250 words, 275 words, 300 words, 325 words, 350 words, 375 words, 400 words, 500 words, 600 words, 700 words, 800 words, 900 words, 1000 words, or any number of words in between.
- the word set comprises: am, are, bad, bring, clean, closer, comfortable, coming, computer, do, faith, family, feel, glasses, going, good, goodbye, have, hello, help, here, hope, how, hungry, I, is, it, like, music, my, need, no, not, nurse, okay, outside, please, right, success, tell, that, they, thirsty, tired, up, very, what, where, yes, and you.
- the attempted speech of the subject may include any chosen sequence of words of the selected word set. In other embodiments, the attempted speech of the subject is further limited to a predefined sentence set that uses only words of the selected word set.
- the word set and sentence set may be selected to include sentences that can be used to communicate with a caregiver regarding tasks the subject wishes the caregiver to perform.
- For sentence classification from neural activity, the subject is instructed to attempt to produce each sentence contained in the sentence set while the neural activity of the subject is processed and decoded into text.
- a processor connected to the interface is programmed to calculate the probability that a sequence of words is an intended sentence that the subject tried to produce during the attempted speech.
- the processor is programmed to calculate the probability of many possible sentences composed entirely of words from the specified word set as being the intended sentence that the subject tried to produce during the attempted speech.
- the processor is programmed to maintain the most likely sentence as well as other, less likely sentences composed entirely of words from the specified word set that the subject tried to produce during the attempted speech.
- the processor is programmed to maintain the first, second, and third most likely sentence possibilities at any given point in time.
- the most likely sentence may change.
- the second most likely sentence based on processing of a word event could then become the most likely sentence after one or more additional word events are processed.
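Keeping several sentence hypotheses alive, and letting their ranking change as new word events arrive, resembles a beam search. The sketch below keeps the three best hypotheses after each event. The bigram dictionary, `"<s>"` start token, probability floor, function name, and example numbers are all assumptions made for illustration.

```python
import math


def beam_step(beams, word_probs, bigram, beam_width=3, floor=1e-6):
    """Extend every hypothesis with every candidate word and keep the best.

    beams:      list of (log_score, word_list) hypotheses.
    word_probs: dict word -> classifier probability for the new word event.
    bigram:     dict (previous_word, word) -> language-model probability.
    """
    candidates = []
    for score, words in beams:
        prev = words[-1] if words else "<s>"
        for word, p in word_probs.items():
            new_score = (score
                         + math.log(max(p, floor))
                         + math.log(max(bigram.get((prev, word), 0.0), floor)))
            candidates.append((new_score, words + [word]))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:beam_width]


bigram = {("<s>", "it"): 0.6, ("<s>", "I"): 0.4,
          ("I", "am"): 0.9, ("it", "is"): 0.9}
beams = [(0.0, [])]
beams = beam_step(beams, {"it": 0.5, "I": 0.5}, bigram)
first_guess = beams[0][1]                    # best hypothesis after one event
beams = beam_step(beams, {"am": 0.9, "is": 0.1}, bigram)
print(first_guess, "->", beams[0][1])
```

With these made-up numbers, the hypothesis ranked second after the first word event becomes the top hypothesis once the second event is processed.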
- the sentence set comprises up to 25 sentences, up to 50 sentences, up to 100 sentences, up to 200 sentences, up to 300 sentences, up to 400 sentences, or up to 500 sentences, or more.
- the sentence set may include 50 sentences, 100 sentences, 200 sentences, 300 sentences, 400 sentences, 500 sentences, 600 sentences, 700 sentences, 800 sentences, 900 sentences, 1000 sentences, or any number of sentences in between.
- the sentence set comprises: Are you going outside; Are you tired; Bring my glasses here; Bring my glasses please; Do not feel bad; Do you feel comfortable; Faith is good; Hello how are you; Here is my computer; How do you feel; How do you like my music; I am going outside; I am not going; I am not hungry; I am not okay; I am okay; I am outside; I am thirsty; I do not feel comfortable; I feel very comfortable; I feel very hungry; I hope it is clean; I like my nurse; I need my glasses; I need you; It is comfortable; It is good; It is okay; It is right here; My computer is clean; My family is here; My family is outside; My family is very comfortable; My glasses are clean; My glasses are comfortable; My nurse is outside; My nurse is right outside; No; Please bring my glasses here; Please clean it; Please tell my family; That is very clean; They are coming here; They are coming outside; They are going outside; They have faith; What do you do; Where is it; Yes; and You are not right.
- the attempted speech of the subject comprises spelling out words of intended messages.
- the attempted speech targets may include the alphabet of any language (such as English) and/or code words representing letters of the alphabet (e.g. NATO code words such as alpha, bravo, etc.).
- Character probabilities can be determined by classification of the speech targets (which can use linear or non-linear (e.g., ANN) models) and processed using sequence decoding techniques (e.g., language modeling, hidden Markov modeling, Viterbi decoding, etc.) to decode full sentences from the brain activity.
- Non-speech motor movements may include, without limitation, imagined head, arm, hand, foot, and leg movements.
- Non-speech motor movements can be used in any fashion that is beneficial to the user. For example, decoding of non-speech motor movements from neural activity could be used to control a mouse cursor or otherwise interact with other devices, control error correction methods in a text decoding interface, or select high-level commands to control the system (such as “end-of-sentence” or “return to main menu” commands).
- a classification model may be used to identify a motor command (e.g., an imagined hand movement), which could be used to indicate to the system that the user is initiating or ending attempted speech or spelling out of an intended message.
- decoding of attempted spelling may enable a larger vocabulary to be used than for decoding of attempted speech.
- decoding of attempted speech may be easier and more convenient for the subject, as it allows faster, direct word decoding, which may be preferred to express frequently used words.
- attempted non-speech motor movements may be used to signal a subject is initiating or ending attempted speech or spelling out of an intended message.
- the system may include a) a neural recording device comprising an electrode adapted for positioning at a location in a sensorimotor cortex region of the brain of the subject to record brain electrical signal data associated with attempted speech and/or attempted spelling and/or attempted non-speech motor movement by the subject; b) a processor programmed to decode a sentence from the recorded brain electrical signal data; c) an interface in communication with a computing device, said interface adapted for positioning at a location on the head of the subject, wherein the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to the processor; and d) a display component for displaying the sentence decoded from the recorded brain electrical signal data.
- electrical activity in the high gamma frequency range such as 70 Hz to 150 Hz
- low frequency range e.g., 0.3 Hz to 100 Hz
- the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to a processor.
- the processor may run programming for decoding letters, words, phrases, or sentences from the recorded brain electrical signal data using one or more algorithms, as described herein.
- a computer implemented method is used for decoding a sentence from recorded brain electrical signal data associated with attempted speech by a subject.
- the processor may be programmed to perform steps of the computer implemented method comprising: a) receiving the recorded brain electrical signal data associated with the attempted speech by the subject; b) analyzing the recorded brain electrical signal data using a speech detection model to calculate the probability that attempted speech is occurring at any time point and detect onset and offset of word production during the attempted speech by the subject; c) analyzing the brain electrical signal data using a word classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with attempted word production by the subject and calculates predicted word probabilities; d) performing sentence decoding by using the calculated word probabilities from the word classification model in combination with predicted word sequence probabilities in the sentence using a language model that provides next-word probabilities given a previous word or phrase in a sequence of words to calculate predicted word sequence probabilities and determining the most likely sequence of words in the sentence based on the predicted word probabilities determined using the word classification model
- a computer implemented method is used for decoding a sentence from recorded brain electrical signal data associated with attempted spelling of letters of words of an intended sentence by a subject.
- the processor may be programmed to perform steps of the computer implemented method comprising: a) receiving the recorded brain electrical signal data associated with the attempted spelling of letters of words of an intended sentence by the subject; b) analyzing the recorded brain electrical signal data using a speech detection model to calculate the probability that attempted spelling is occurring at any time point and detect onset and offset of letter production during the attempted spelling by the subject; c) analyzing the brain electrical signal data using a letter classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with attempted letter production by the subject and calculates a sequence of predicted letter probabilities; d) computing potential sentence candidates based on the sequence of predicted letter probabilities and automatically inserting spaces into the letter sequences between predicted words in the sentence candidates, wherein decoded words in the letter sequences are constrained to only words within a vocabulary of a language used by the subject; e)
- a computer implemented method is used for decoding a sentence from recorded brain electrical signal data associated with attempted speech and attempted spelling by a subject.
- the system may be used not only for decoding speech or spelling information from neural activity collected during attempted speech or attempted spelling, but also for decoding attempted non-speech motor movements from recorded neural activity.
- Non-speech motor movements may include, without limitation, imagined head, arm, hand, foot, and leg movements. Non-speech motor movements can be used in any fashion that is beneficial to the user.
- decoding of non-speech motor movements from neural activity could be used to control a mouse cursor or otherwise interact with other devices, control error correction methods in a text decoding interface, or select high-level commands to control the system (such as “end-of-sentence” or “return to main menu” commands).
- a classification model may be used to identify a motor command (e.g., an imagined hand movement), which could be used to indicate to the system that the user is initiating or ending attempted speech or spelling out of an intended message.
- the computer implemented method further comprises: receiving recorded brain electrical signal data associated with an attempted non-speech motor movement of the subject, wherein the subject performs the attempted non-speech motor movement to indicate the initiation or termination of the attempted speech or attempted spelling of words of an intended sentence or to control an external device; and analyzing the brain electrical signal data using a classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with the attempted non-speech motor movement and calculates a probability that the subject attempted the non-speech motor movement.
- the computer implemented method further comprises storing a user profile for the subject comprising information regarding the patterns of electrical signals in the recorded brain electrical signal data associated with attempted word production by the subject.
- HMM hidden Markov model
- Viterbi decoding model a hidden Markov model
- the subject is limited to a specified word set for the attempted speech.
- the processor is further programmed to calculate a probability that a word of the word set is an intended word that the subject tried to produce during the attempted speech for every word of the word set, and select the word of the word set having the highest probability of being the intended word that the subject tried to produce during the attempted speech.
- the attempted speech of the subject may include any chosen sequence of words of the selected word set.
- the subject is limited to a specified sentence set for the attempted speech.
- the processor is further programmed to calculate a probability that a sequence of words is an intended sentence that the subject tried to produce during the attempted speech.
- the processor is programmed to calculate the probability of many possible sentences composed entirely of words from the specified word set as being the intended sentence that the subject tried to produce during the attempted speech.
- the processor is programmed to maintain the most likely sentence as well as one or more less likely sentences composed entirely of words from the specified word set that the subject tried to produce during the attempted speech.
- the processor is programmed to track the first, second, and third most likely sentence possibilities at any given point in time.
- the processor is further programmed to assign event labels for preparation, speech/spelling (full words, letters, or any other speech target), non-speech motor movement, and rest to time points during the recording of the brain electrical signal data.
- the processor is further programmed to use the recorded brain electrical signal data within a time window around the detected onset of attempted speech or spelling for word or letter classification. For example, the processor may be programmed to use the recorded brain electrical signal data from 1 second before the detected onset up to 3 seconds after the detected onset for word or letter classification.
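The window from 1 second before to 3 seconds after a detected onset can be cut out of a sampled recording as sketched below. The sampling rate, the function name, and the choice to clip the window at the recording boundaries are assumptions for the example; only the 1 second and 3 second values come from the text.

```python
def extract_window(samples, onset_index, sampling_rate_hz,
                   pre_seconds=1.0, post_seconds=3.0):
    """Return the samples from `pre_seconds` before the onset up to
    `post_seconds` after it, clipped to the recording boundaries."""
    start = max(0, onset_index - int(round(pre_seconds * sampling_rate_hz)))
    stop = min(len(samples), onset_index + int(round(post_seconds * sampling_rate_hz)))
    return samples[start:stop]


recording = list(range(1000))                # stand-in for one channel of data
window = extract_window(recording, onset_index=400, sampling_rate_hz=100)
print(len(window), window[0], window[-1])
```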
- the processor is further programmed to assign more weight to words that occur more frequently than words that occur less frequently according to the language model.
- the recorded brain electrical signal data may be processed in various ways before decoding.
- data processing may include, without limitation, real-time sample-by-sample processing of neural feature streams, the use of common-average referencing across individual electrode channels, the use of finite impulse response (FIR) filters to perform digital signal filtering, a running sliding-window normalization procedure, e.g., using Welford’s method, automatic artifact rejection, and parallelization and linear pipelining to improve computational efficiency.
- Processing of neural features may be performed in real-time to extract one or more feature streams for use during speech/text decoding.
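Two of the processing steps named above, common-average referencing and running normalization with Welford's method, can be sketched as follows. This is an illustrative simplification: the normalizer accumulates statistics over all samples seen so far and does not discard old samples as a true sliding window would, and the class and function names are hypothetical.

```python
import math


def common_average_reference(sample):
    """Subtract the mean across channels from each channel of one time sample."""
    mean = sum(sample) / len(sample)
    return [x - mean for x in sample]


class RunningNormalizer:
    """Running z-score for one feature stream using Welford's online update."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                        # sum of squared deviations from the mean

    def update(self, x):
        """Fold in a new value and return its z-score under current statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        if self.n < 2:
            return 0.0                       # variance undefined for a single sample
        std = math.sqrt(self.m2 / (self.n - 1))
        return (x - self.mean) / std if std > 0 else 0.0


print(common_average_reference([1.0, 2.0, 3.0]))
norm = RunningNormalizer()
z_scores = [norm.update(x) for x in [2, 4, 4, 4, 5, 5, 7, 9]]
print(norm.mean)
```

Processing one sample at a time in this way keeps memory use constant, which suits a real-time feature stream.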
- the disclosed and other embodiments can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, a data processing apparatus.
- the computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or any combination thereof.
- a computer program also known as a program, software, software application, script, or code
- a computer program does not necessarily correspond to a file in a file system.
- a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
- a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
- the system for performing the computer implemented method, as described, may include a computer containing a processor, a storage component (i.e., memory), a display component, and other components typically present in general purpose computers.
- the storage component stores information accessible by the processor, including instructions that may be executed by the processor and data that may be retrieved, manipulated or stored by the processor.
- the storage component includes instructions.
- the storage component includes instructions for decoding a sentence from recorded brain electrical signal data associated with attempted speech and/or attempted spelling by a subject.
- the computer processor is coupled to the storage component and configured to execute the instructions stored in the storage component in order to receive brain electrical signal data associated with attempted speech by the subject and analyze the data according to one or more algorithms, as described herein.
- the display component displays the sentence decoded from the recorded brain electrical signal data.
- the storage component may be of any type capable of storing information accessible by the processor, such as a hard-drive, memory card, ROM, RAM, DVD, CD-ROM, USB Flash drive, write-capable, and read-only memories.
- the processor may be any well-known processor, such as processors from Intel Corporation. Alternatively, the processor may be a dedicated controller such as an ASIC or an FPGA.
- the instructions may be any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by the processor. In that regard, the terms "instructions," "steps" and "programs" may be used interchangeably herein.
- the instructions may be stored in object code form for direct processing by the processor, or in any other computer language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance.
- Data may be retrieved, stored or modified by the processor in accordance with the instructions.
- the data may be stored in computer registers, in a relational database as a table having a plurality of different fields and records, XML documents, or flat files.
- the data may also be formatted in any computer-readable format such as, but not limited to, binary values, ASCII or Unicode.
- the data may comprise any information sufficient to identify the relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories (including other network locations) or information which is used by a function to calculate the relevant data.
- the processor and storage component may comprise multiple processors and storage components that may or may not be stored within the same physical housing. For example, some of the instructions and data may be stored on removable CD-ROM and others within a read-only computer chip. Some or all of the instructions and data may be stored in a location physically remote from, yet still accessible by, the processor. Similarly, the processor may comprise a collection of processors which may or may not operate in parallel.
- the system also includes an interface capable of communication with a computing device.
- the interface may be implanted in the cranium or placed on the head of the subject to provide an externally accessible platform through which brain electrical signals can be acquired from the neural recording device and transmitted to a computing device for decoding.
- the interface comprises a percutaneous pedestal connector anchored in the cranium of the subject.
- the interface can be connected, for example, to a computing device such as a computer or a handheld computing device (e.g., cell phone or tablet) with a detachable digital connector and cable.
- the interface may be connected to a computing device wirelessly.
- the interface comprises a first wireless communication unit in communication with a computing device comprising a second wireless communication unit.
- the first wireless communication unit utilizes a wireless communication protocol using an electromagnetic carrier wave (e.g., a radio wave, microwave, or an infrared carrier wave) or ultrasound to transfer data from the interface to the computing device comprising the second wireless communication unit.
- Brain-computer interfaces are commercially available, including the Neuroport™ system from Blackrock Microsystems (Salt Lake City, Utah). See also, e.g., Weiss et al. (2019) Brain-Computer Interfaces 6:106-117; herein incorporated by reference.
- Components of systems for carrying out the presently disclosed methods are further described in the examples below.
KITS
- Kits are also provided for carrying out the methods described herein.
- the kit comprises software for carrying out the computer implemented methods for decoding a sentence from recorded brain electrical signal data associated with attempted speech and/or attempted spelling by a subject, as described herein.
- the kit comprises a system for assisting a subject with communication as described herein.
- Such a system may comprise: a neural recording device comprising an electrode adapted for positioning at a location in a sensorimotor cortex region of the subject to record brain electrical signal data associated with attempted speech and/or attempted spelling and/or non-speech motor movement by the subject; a processor programmed to decode a sentence from the recorded brain electrical signal data according to a computer implemented method described herein; an interface capable of communication with a computing device, said interface adapted for positioning at a location on the head of the subject, wherein the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to the processor; and a display component for displaying the sentence decoded from the recorded brain electrical signal data.
- kits may further include (in certain embodiments) instructions for practicing the subject methods.
- These instructions may be present in the subject kits in a variety of forms, one or more of which may be present in the kit.
- instructions may be present as printed information on a suitable medium or substrate, e.g., a piece or pieces of paper on which the information is printed, in the packaging of the kit, in a package insert, and the like.
- Another form of these instructions is a computer readable medium, e.g., diskette, compact disk (CD), flash drive, and the like, on which the information has been recorded.
- Yet another form of these instructions that may be present is a website address which may be used via the internet to access the information at a removed site.
- the methods, devices, and systems of the present disclosure find use in assisting individuals with communication.
- methods, devices, and systems are provided for decoding words and sentences directly from neural activity of an individual.
- cortical activity from a region of the brain involved in speech processing is recorded while an individual attempts to say or spell out words of an intended sentence.
- Deep learning computational models are used to detect and classify letters/words from the recorded brain activity.
- Decoding of speech from brain activity is aided by use of a language model that predicts how likely certain sequences of words are to occur.
- decoding of attempted non-speech motor movements from neural activity can be used to further assist communication.
- the methods, devices, and systems disclosed herein may be used to assist individuals who have difficulty with communication caused by conditions and diseases including, without limitation, anarthria, strokes, traumatic brain injuries, brain tumors, amyotrophic lateral sclerosis, multiple sclerosis, Huntington's disease, Niemann-Pick disease, Friedreich's ataxia, Wilson's disease, cerebral palsy, Guillain-Barré syndrome, Tay-Sachs disease, encephalopathy, central pontine myelinolysis, and other conditions causing dysfunction or paralysis of the muscles of the head, neck, or chest resulting in anarthria.
- the methods disclosed herein may be used to restore communication to such individuals and improve autonomy and quality of life.
- a method of assisting a subject with communication comprising: positioning a neural recording device comprising an electrode at a location in a sensorimotor cortex region of the brain of the subject to record brain electrical signal data associated with attempted speech by the subject; positioning an interface in communication with a computing device at a location on the head of the subject, wherein the interface is connected to the neural recording device; recording the brain electrical signal data associated with attempted speech by the subject using the neural recording device, wherein the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to a processor of the computing device; and decoding a word, a phrase, or a sentence from the recorded brain electrical signal data using the processor.
- 5. The method of any one of aspects 1-4, wherein the electrode is positioned on a surface of the sensorimotor cortex region or within the sensorimotor cortex region.
- the neural recording device comprises a brain-penetrating electrode array.
- the neural recording device comprises an electrocorticography (ECoG) electrode array.
- ECoG electrocorticography
- the electrode is a depth electrode or a surface electrode.
- the electrical signal data comprises high-gamma frequency content features.
- the electrical signal data comprises neural oscillations in a range from 70 Hz to 150 Hz.
- any one of aspects 1-11 wherein said recording the brain electrical signal data comprises recording the brain electrical signal data from a sensorimotor cortex region selected from a precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, posterior superior frontal gyrus, or posterior inferior frontal gyrus region, or any combination thereof.
- the method of any one of aspects 1-12 further comprising mapping the brain of the subject to identify an optimal location for positioning the electrode for recording the brain electrical signals associated with the attempted speech by the subject.
- the interface comprises a percutaneous pedestal connector attached to the subject's cranium.
- the interface further comprises a headstage connected to the percutaneous pedestal connector.
- the processor is provided by a computer or handheld device.
- the handheld device is a cell phone or a tablet.
- the processor is programmed to automate speech detection, word classification, and sentence decoding based on identification of a neural activity pattern of electrical signals in the recorded brain electrical signal data associated with attempted word production.
- the processor is programmed to use a machine learning algorithm for speech detection, word classification, and sentence decoding.
- ANN artificial neural network
- HMM hidden Markov model
- Viterbi decoding model a natural language processing technique
- any one of aspects 24-26 wherein the word set comprises am, are, bad, bring, clean, closer, comfortable, coming, computer, do, faith, family, feel, glasses, going, good, goodbye, have, hello, help, here, hope, how, hungry, I, is, it, like, music, my, need, no, not, nurse, okay, outside, please, right, success, tell, that, they, thirsty, tired, up, very, what, where, yes, and you.
- the method of any one of aspects 1-27 wherein the subject may use the words of the word set without limitation to create sentences.
- the processor is programmed to calculate a probability that a sequence of words is an intended sentence that the subject tried to produce during the attempted speech.
- any one of aspects 1-29 wherein the processor is programmed to use a language model that provides next-word probabilities given a previous word or phrase in a sequence of words to aid the decoding by determining predicted word sequence probabilities.
- 31. The method of aspect 30, wherein words that occur more frequently are assigned more weight than words that occur less frequently according to the language model.
- 32. The processor is programmed to use a Viterbi decoding model to determine the most likely sequence of words in the intended speech of the subject given the brain electrical signal data associated with the attempted speech, the predicted word probabilities from the word classification model using the machine learning algorithm, and the word sequence probabilities using the language model.
- the attempted non-speech motor movement comprises an attempted head, arm, hand, foot, or leg movement.
- the attempted hand movement comprises an imagined hand gesture or an imagined hand squeeze.
- a computer implemented method for decoding a sentence from recorded brain electrical signal data associated with attempted speech by a subject comprising: a) receiving the recorded brain electrical signal data associated with the attempted speech by the subject; b) analyzing the recorded brain electrical signal data using a speech detection model to calculate the probability that attempted speech is occurring at any time point during recording of the brain electrical signal data and detect onset and offset of word production during the attempted speech by the subject; c) analyzing the brain electrical signal data using a word classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with attempted word production by the subject and calculates predicted word probabilities; d) performing sentence decoding by using the calculated word probabilities from the word classification model in combination with predicted word sequence probabilities in the sentence using a language model that provides next-word probabilities given a previous word or phrase in a sequence of words to calculate predicted word sequence probabilities and determining the most likely sequence of words in the sentence based on the predicted word probabilities determined using the word classification model and the language model; and e) displaying the sentence de
- the computer implemented method of aspect 42 further comprising calculating a probability that a word of the word set is an intended word that the subject tried to produce during the attempted speech for every word of the word set and selecting the word of the word set having the highest probability of being the intended word that the subject tried to produce during the attempted speech.
- the computer implemented method of any one of aspects 39-43 wherein the subject may use the words of the word set without limitation to create sentences or is limited to a specified sentence set for the attempted speech.
- the computer implemented method of any one of aspects 39-44 further comprising calculating a probability that a sequence of words is an intended sentence that the subject tried to produce during the attempted speech.
- the computer implemented method of aspect 45 further comprising maintaining the most likely sentence and one or more less likely sentences and recalculating the probability that a sequence of words is an intended sentence that the subject tried to produce during the attempted speech after decoding of each word.
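The idea of maintaining the most likely sentence together with one or more less likely sentences, and rescoring after each decoded word, can be illustrated with a small beam update. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function name `beam_step`, the bigram dictionary format, the `<s>` start token, the beam width, and the probability floor are all hypothetical choices made for illustration.

```python
import math


def beam_step(beams, word_probs, bigram, beam_width=3, floor=1e-6):
    """Extend every kept sentence by one word and keep the best few.

    beams      -- list of (words, log_score) pairs kept so far
    word_probs -- classifier probabilities for the newly decoded word
    bigram     -- assumed language model: {(previous_word, word): probability}
    """
    candidates = []
    for words, score in beams:
        prev = words[-1] if words else "<s>"  # "<s>" marks sentence start
        for word, p in word_probs.items():
            lm = bigram.get((prev, word), floor)
            new_score = score + math.log(max(p, floor)) + math.log(max(lm, floor))
            candidates.append((words + [word], new_score))
    # Most likely sentence first, followed by the less likely alternatives.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_width]


bigram = {("<s>", "I"): 0.5, ("<s>", "it"): 0.5,
          ("I", "am"): 0.7, ("I", "family"): 0.01}
beams = [([], 0.0)]
beams = beam_step(beams, {"I": 0.9, "it": 0.1}, bigram, beam_width=2)
beams = beam_step(beams, {"am": 0.45, "family": 0.55}, bigram, beam_width=2)
```

In this toy run the classifier alone slightly prefers "family" as the second word, but the rescored top hypothesis is "I am" because the assumed language model makes "I family" very unlikely.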
- the computer implemented method of aspect 46 wherein the most likely sentence and the one or more less likely sentences are composed only of words from the word set used by the subject for the attempted speech.
- 48. The computer implemented method of any one of aspects 39-47, further comprising assigning speech event labels for preparation, speech, and rest to time points during the recording of the brain electrical signal data.
- 49. The computer implemented method of aspect 48 wherein only the recorded brain electrical signal data within a time window around the detected onset of word classification is used.
- the computer implemented method of any one of aspects 39-49 wherein more weight is assigned to words that occur more frequently than words that occur less frequently according to the language model.
- 51. The computer implemented method of any one of aspects 39-50, further comprising storing a user profile for the subject comprising information regarding the patterns of electrical signals in the recorded brain electrical signal data associated with attempted word production by the subject.
- the computer implemented method of any one of aspects 39-51 further comprising: receiving recorded brain electrical signal data associated with an attempted non-speech motor movement of the subject, wherein the subject performs the attempted non-speech motor movement to indicate the initiation or termination of the attempted speech or to control an external device; and analyzing the brain electrical signal data using a classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with the attempted non-speech motor movement and calculates a probability that the subject attempted the non-speech motor movement.
- the attempted non-speech motor movement comprises an attempted head, arm, hand, foot, or leg movement.
- the attempted hand movement comprises an imagined hand gesture or an imagined hand squeeze.
- a non-transitory computer-readable medium comprising program instructions that, when executed by a processor in a computer, cause the processor to perform the method of any one of aspects 39-55.
- a kit comprising the non-transitory computer-readable medium of aspect 56 and instructions for decoding brain electrical signal data associated with attempted speech by a subject.
- a system for assisting a subject with communication comprising: a neural recording device comprising an electrode adapted for positioning at a location in a sensorimotor cortex region of the brain of the subject to record brain electrical signal data associated with attempted speech or an attempted non-speech motor movement by the subject; a processor programmed to decode a sentence from the recorded brain electrical signal data according to the computer implemented method of any one of aspects 39-55; an interface in communication with a computing device, said interface adapted for positioning at a location on the head of the subject, wherein the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to the processor; and a display component for displaying the sentence decoded from the recorded brain electrical signal data.
- the system of aspect 58 wherein the subject has difficulty with said communication because of anarthria, a stroke, a traumatic brain injury, a brain tumor, or amyotrophic lateral sclerosis.
- 60. The system of aspect 58 or 59, wherein the location of the neural recording device is in the ventral sensorimotor cortex.
- 61. The system of any one of aspects 58-60, wherein the electrode is adapted for positioning on a surface of the sensorimotor cortex region or within the sensorimotor cortex region.
- the electrode is adapted for positioning on a surface of the sensorimotor cortex region of the brain in a subdural space.
- the neural recording device comprises a brain-penetrating electrode array.
- the neural recording device comprises an electrocorticography (ECoG) electrode array.
- 65. The system of any one of aspects 58-64, wherein the electrode is a depth electrode or a surface electrode.
- the electrical signal data comprises high-gamma frequency content features.
- 67. The system of aspect 66, wherein the electrical signal data comprises neural oscillations in a range from 70 Hz to 150 Hz.
- the interface comprises a percutaneous pedestal connector attached to the subject's cranium.
- the interface further comprises a headstage that is connectable to the percutaneous pedestal connector.
- the processor is provided by a computer or handheld device.
- the handheld device is a cell phone or tablet.
- a natural language processing technique is used for the sentence decoding.
- the processor is further programmed to assign speech event labels for preparation, speech, and rest to time points during the recording of the brain electrical signal data.
- the processor is further programmed to use the recorded brain electrical signal data within a time window around the detected onset of word classification.
- the subject is limited to a specified word set for the attempted speech.
- the processor is further programmed to calculate a probability that a word of the word set is an intended word that the subject tried to produce during the attempted speech for every word of the word set and select the word of the word set having the highest probability of being the intended word that the subject tried to produce during the attempted speech.
- the system of aspect 76 or 77, wherein the word set comprises: am, are, bad, bring, clean, closer, comfortable, coming, computer, do, faith, family, feel, glasses, going, good, goodbye, have, hello, help, here, hope, how, hungry, I, is, it, like, music, my, need, no, not, nurse, okay, outside, please, right, success, tell, that, they, thirsty, tired, up, very, what, where, yes, and you.
- the system of any one of aspects 76-78, wherein the subject may use any chosen sequence of words of the selected word set.
- the system of aspect 79 wherein the processor is programmed to calculate a probability that a sequence of words is an intended sentence that the subject tried to produce during the attempted speech.
- the system of aspect 80 wherein the processor is programmed to maintain the most likely sentence and one or more less likely sentences and recalculate the probability that a sequence of words is an intended sentence that the subject tried to produce during the attempted speech after decoding of each word.
- the most likely sentence and the one or more less likely sentences are composed only of words from the word set used by the subject for the attempted speech.
- any one of aspects 58-82 wherein the processor is further programmed to automate detection of an attempted non-speech motor movement of the subject signaling the initiation or termination of the attempted speech by the subject based on identification of a neural activity pattern of electrical signals in the recorded brain electrical signal data associated with the attempted non-speech motor movement.
- the processor is further programmed to assign event labels for the attempted non-speech motor movement to time points during the recording of the brain electrical signal data.
- a kit comprising the system of any one of aspects 58-84 and instructions for using the system for recording and decoding brain electrical signal data associated with attempted speech by a subject.
- a method of assisting a subject with communication comprising: positioning a neural recording device comprising an electrode at a location in a sensorimotor cortex region of the brain of the subject to record brain electrical signal data associated with attempted spelling of letters of words of an intended sentence by the subject; positioning an interface in communication with a computing device at a location on the head of the subject, wherein the interface is connected to the neural recording device; recording the brain electrical signal data associated with said attempted spelling by the subject using the neural recording device, wherein the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to a processor of the computing device; and decoding the spelled words of the intended sentence from the recorded brain electrical signal data using the processor.
- the method of aspect 86 wherein the subject has difficulty with said communication because of anarthria, a stroke, a traumatic brain injury, a brain tumor, or amyotrophic lateral sclerosis.
- 88. The method of aspect 86 or 87, wherein the subject is paralyzed.
- 89. The method of any one of aspects 86-88, wherein the location of the neural recording device is in the ventral sensorimotor cortex.
- 90. The method of any one of aspects 86-89, wherein the electrode is positioned on a surface of the sensorimotor cortex region or within the sensorimotor cortex region.
- the neural recording device comprises a brain-penetrating electrode array.
- the neural recording device comprises an electrocorticography (ECoG) electrode array.
- the electrode is a depth electrode or a surface electrode.
- the electrical signal data comprises high-gamma frequency content features and low frequency content features.
- the electrical signal data comprises neural oscillations in a high-gamma frequency range from 70 Hz to 150 Hz and in a low frequency range from 0.3 Hz to 100 Hz.
- any one of aspects 86-96 wherein said recording of the brain electrical signal data comprises recording the brain electrical signal data from a sensorimotor cortex region selected from a precentral gyrus region, a postcentral gyrus region, a posterior middle frontal gyrus region, a posterior superior frontal gyrus region, or a posterior inferior frontal gyrus region, or any combination thereof.
- any one of aspects 86-98 wherein the interface comprises a percutaneous pedestal connector attached to the subject's cranium.
- 100. The method of aspect 99, wherein the interface further comprises a headstage connected to the percutaneous pedestal connector.
- 101. The method of any one of aspects 86-100, wherein the processor is provided by a computer or handheld device.
- 102. The method of aspect 101, wherein the handheld device is a cell phone or a tablet.
- 103. The method of any one of aspects 86-102, wherein the processor is programmed to automate detection of the attempted spelling, letter classification, word classification, and sentence decoding based on identification of a neural activity pattern of electrical signals in the recorded brain electrical signal data associated with the attempted spelling of words by the subject.
- the method of aspect 103 wherein the processor is programmed to use a machine learning algorithm for the speech detection, letter classification, word classification, and sentence decoding.
- the processor is further programmed to constrain word classification from sequences of letters decoded from neural activity associated with attempted spelling of words by the subject to only words within a vocabulary of a language used by the subject.
- the processor is further programmed to assign event labels for preparation, attempted spelling, and rest to time points during the recording of the brain electrical signal data.
- the method of aspect 106 wherein the processor is programmed to use the recorded brain electrical signal data within a time window around the detected onset of attempted spelling of a letter by the subject.
- any one of aspects 86-107 further comprising providing a series of go cues to the subject indicating when the subject should initiate attempted spelling of each letter of the words of the intended sentence.
- the method of aspect 108 wherein the series of go cues are provided visually on a display.
- each go cue is preceded by a countdown to the presentation of the go cue, wherein the countdown for the next spelled letter is provided visually on the display and automatically started after each go cue.
- the method of aspect 111 wherein the subject can control the set interval of time between each go cue.
- the processor is programmed to calculate a probability that a sequence of decoded words from a sequence of decoded letters is an intended sentence that the subject tried to produce during the attempted spelling of letters of words of an intended sentence by the subject.
- any one of aspects 86-114 wherein the processor is programmed to use a language model that provides next-word probabilities given a previous word or phrase in a sequence of words to aid the decoding by determining predicted word sequence probabilities.
- 116. The method of aspect 115, wherein words that occur more frequently are assigned more weight than words that occur less frequently according to the language model.
- the processor is further programmed to use a sequence of predicted letter probabilities to compute potential sentence candidates and automatically insert spaces into letter sequences between predicted words in the sentence candidates.
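As an illustration of how a sequence of predicted letter probabilities might be turned into a spaced sentence candidate whose words all come from a vocabulary, the following is a minimal sketch under stated assumptions. It uses a single-best dynamic program over segmentations rather than several candidates, and it omits the language model rescoring; the function name `decode_spelled_sentence`, the dictionary-per-letter input format, and the probability floor are hypothetical.

```python
import math


def decode_spelled_sentence(letter_probs, vocab, floor=1e-6):
    """Segment a letter-probability sequence into vocabulary words.

    letter_probs -- one {letter: probability} dict per attempted letter
    vocab        -- iterable of allowed words
    Returns the highest-scoring sentence with spaces inserted.
    """
    n = len(letter_probs)
    # best[i] holds (log score, words) for the first i letters.
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for word in vocab:
            start = end - len(word)
            if start < 0 or best[start][0] == -math.inf:
                continue
            score = best[start][0] + sum(
                math.log(max(letter_probs[start + k].get(ch, floor), floor))
                for k, ch in enumerate(word)
            )
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return " ".join(best[n][1])


letter_probs = [
    {"i": 0.9}, {"a": 0.9}, {"n": 0.55, "m": 0.45},
    {"g": 0.9}, {"o": 0.9}, {"o": 0.9}, {"d": 0.9},
]
sentence = decode_spelled_sentence(letter_probs, ["i", "am", "good", "go", "do"])
```

Here the third letter is individually most likely "n", but because every decoded word must be in the vocabulary, the sketch recovers "i am good" and inserts the spaces automatically.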
- any one of aspects 86-117 further comprising: recording brain electrical signal data associated with an attempted non-speech motor movement of the subject, wherein the subject performs the attempted non-speech motor movement to indicate the initiation or termination of the attempted spelling of words of the intended sentence or to control an external device; and analyzing the brain electrical signal data using a classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with the attempted non-speech motor movement and calculates a probability that the subject attempted the non-speech motor movement.
- the attempted non-speech motor movement comprises an attempted head, arm, hand, foot, or leg movement.
- the method of aspect 119 wherein the attempted hand movement comprises an imagined hand gesture or an imagined hand squeeze.
- the method of any one of aspects 118-120 further comprising assigning event labels for the attempted non-speech motor movement to time points during the recording of the brain electrical signal data.
- 122. The method of any one of aspects 86-121, further comprising assessing accuracy of the decoding.
- any one of aspects 86-122 further comprising: recording brain electrical signal data associated with attempted speech by the subject using the neural recording device, wherein the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to the processor of the computing device; and decoding a word, a phrase, or a sentence from the recorded brain electrical signal data associated with attempted speech by the subject using the processor.
- a computer implemented method for decoding a sentence from recorded brain electrical signal data associated with attempted spelling of letters of words of an intended sentence by a subject performing steps comprising: a) receiving the recorded brain electrical signal data associated with the attempted spelling of letters of words of an intended sentence by the subject; b) analyzing the recorded brain electrical signal data using a speech detection model to calculate the probability that attempted spelling is occurring at any time point during the recording of the electrical signal data and detect onset and offset of letter production during the attempted spelling by the subject; c) analyzing the brain electrical signal data using a letter classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with attempted letter production by the subject and calculates a sequence of predicted letter probabilities; d) computing potential sentence candidates based on the sequence of predicted letter probabilities and automatically inserting spaces into the letter sequences between predicted words in the sentence candidates, wherein decoded words in the letter sequences are constrained to only words within a vocabulary of a language used by the subject; e) analyzing the potential sentence candidates using a language model that provides next
- the computer implemented method of aspect 124 wherein the recorded brain electrical signal data is only used within a time window around the detected onset of attempted spelling of a letter by the subject.
- the computer implemented method of aspect 124 or 125 further comprising displaying a series of go cues to the subject indicating when the subject should initiate attempted spelling of each letter of the words of the intended sentence.
- the computer implemented method of any one of aspects 124-130 further comprising: receiving recorded brain electrical signal data associated with an attempted non-speech motor movement of the subject, wherein the subject performs the attempted non-speech motor movement to indicate the initiation or termination of the attempted spelling of words of the intended sentence or to control an external device; and analyzing the brain electrical signal data using a classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with the attempted non-speech motor movement and calculates a probability that the subject attempted the non-speech motor movement.
- the attempted non-speech motor movement comprises an attempted head, arm, hand, foot, or leg movement.
- the method of aspect 132 wherein the attempted hand movement comprises an imagined hand gesture or an imagined hand squeeze.
- the computer implemented method of any one of aspects 124-133 wherein a machine learning algorithm is used for detection of attempted spelling or attempted non-speech motor movement or letter classification.
- the computer implemented method of any one of aspects 124-134 further comprising assigning more weight to words that occur more frequently than words that occur less frequently according to the language model.
- the computer implemented method of any one of aspects 124-139 further comprising decoding a sentence from recorded brain electrical signal data associated with attempted speech by the subject, the computer further performing steps comprising: a) receiving the recorded brain electrical signal data associated with the attempted speech by the subject; b) analyzing the recorded brain electrical signal data using a speech detection model to calculate the probability that attempted speech is occurring at any time point and detect onset and offset of word production during the attempted speech by the subject; c) analyzing the brain electrical signal data using a word classification model that identifies patterns of electrical signals in the recorded brain electrical signal data associated with attempted word production by the subject and calculates predicted word probabilities; d) performing sentence decoding by using the calculated word probabilities from the word classification model in combination with predicted word sequence probabilities in the sentence using a language model that provides next-word probabilities given a previous word or phrase in a sequence of words to calculate predicted word sequence probabilities and determining the most likely sequence of words in the sentence based on the predicted word probabilities determined using the word classification model and the language model; and e)
- 141. The computer implemented method of aspect 140, wherein a machine learning algorithm is used for speech detection, word classification, and sentence decoding.
- 142. The computer implemented method of aspect 141, wherein artificial neural network (ANN) models are used for the speech detection and the word classification, and a hidden Markov model (HMM), a Viterbi decoding model, or a natural language processing technique is used for the sentence decoding.
- a non-transitory computer-readable medium comprising program instructions that, when executed by a processor in a computer, cause the processor to perform the method of any one of aspects 124-142.
- a kit comprising the non-transitory computer-readable medium of aspect 143 and instructions for decoding brain electrical signal data associated with attempted spelling of letters of words of an intended sentence by a subject.
- a system for assisting a subject with communication comprising: a neural recording device comprising an electrode adapted for positioning at a location in a sensorimotor cortex region of the brain of the subject to record brain electrical signal data associated with attempted speech, attempted spelling of letters of words of an intended sentence, or attempted non-speech motor movement by the subject, or a combination thereof; a processor programmed to decode a sentence from the recorded brain electrical signal data according to the computer implemented method of any one of aspects 124-142; an interface in communication with a computing device, said interface adapted for positioning at a location on the head of the subject, wherein the interface receives the brain electrical signal data from the neural recording device and transmits the brain electrical signal data to the processor; and a display component for displaying the sentence decoded from the recorded brain electrical signal data.
- the system of aspect 145 wherein the subject has difficulty with said communication because of anarthria, a stroke, a traumatic brain injury, a brain tumor, or amyotrophic lateral sclerosis.
- the electrode is adapted for positioning on a surface of the sensorimotor cortex region or within the sensorimotor cortex region.
- the electrode is adapted for positioning on a surface of the sensorimotor cortex region of the brain in a subdural space.
- the neural recording device comprises a brain-penetrating electrode array.
- the neural recording device comprises an electrocorticography (ECoG) electrode array.
- the electrode is a depth electrode or a surface electrode.
- the electrical signal data comprises high-gamma frequency content features and low frequency content features.
- the electrical signal data comprises neural oscillations in a high-gamma frequency range from 70 Hz to 150 Hz and in a low frequency range from 0.3 Hz to 100 Hz.
- EXAMPLE 1: A SPEECH NEUROPROSTHESIS FOR DECODING WORDS IN A PERSON WITH SEVERE PARALYSIS
- Introduction
- Anarthria is the loss of the ability to articulate speech. It can result from a variety of conditions, including stroke, traumatic brain injury, and amyotrophic lateral sclerosis [1]. For paralyzed individuals with severe movement impairment, it hinders communication with family, friends, and caregivers, reducing self-reported quality of life [2].
- Advances have been made with typing-based brain-computer interfaces that allow impaired individuals to spell out intended messages using cursor control [3–7]. However, letter-by-letter selection interfaces driven by neural signal recordings can be relatively slow and tedious.
- The neural implant used to acquire brain signals from the participant is a customized hybrid of a high-density ECoG electrode array (PMT Corporation, MN, USA) with a pedestal connector (Blackrock Microsystems, UT, USA).
- The ECoG array consists of 128 flat, disc-shaped electrodes with 4-mm center-to-center spacing.
- The speech sensorimotor cortex was exposed via craniotomy and the array was laid on the surface of the brain in the subdural space.
- The dura was sutured closed, and the cranial bone flap was replaced.
- The percutaneous pedestal connector was placed at a separate site and anchored to the cranium with small titanium screws.
- This pedestal connector is an externally accessible platform through which brain signals can be acquired and transmitted to a computer via a detachable digital connector and cable (FIG. 1).
- The participant underwent surgical implantation of the device in early 2019. The procedure was successful, and his recovery was uneventful.
- The electrode coverage enabled sampling from multiple cortical regions that have been implicated in speech processing, including portions of the left precentral gyrus, postcentral gyrus, posterior middle frontal gyrus, and posterior inferior frontal gyrus [8, 10–12].
- Neural data acquisition and real-time processing
- [00212] Using a digital signal processing unit and peripheral hardware (NeuroPort System, Blackrock Microsystems), signals from all 128 channels of the implant device were acquired and transmitted to a separate computer running custom software for real-time analysis (Supplementary Method S2; FIGS. 6 and 7) [16, 21]. On this computer, we measured high gamma activity (neural oscillations in the 70–150 Hz frequency range) for each channel, which we then used in all subsequent analyses and during real-time decoding.
- Task design
- [00213] The participant engaged in two tasks: an isolated word task and a sentence task (Supplementary Method S3).
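To make the high gamma measurement described above (activity in the 70–150 Hz range on each channel) more concrete, the sketch below estimates what fraction of one channel's signal power falls in that band. It is an illustration only and swaps in a naive discrete Fourier transform for whatever filtering the real-time software used, which this section does not specify; the function name `band_power_fraction` and the 1000 Hz sampling rate in the usage lines are assumptions.

```python
import cmath
import math


def band_power_fraction(samples, fs, lo=70.0, hi=150.0):
    """Fraction of (mean-removed) signal power between lo and hi Hz.

    samples -- list of voltage samples from one channel
    fs      -- sampling rate in Hz (an assumed value in the usage below)
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    total = 0.0
    in_band = 0.0
    # Naive DFT over the positive frequencies; fine for short windows.
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        total += power
        if lo <= freq <= hi:
            in_band += power
    return in_band / total if total > 0 else 0.0


fs = 1000  # hypothetical sampling rate
gamma_like = [math.sin(2 * math.pi * 100 * t / fs) for t in range(200)]
slow_wave = [math.sin(2 * math.pi * 10 * t / fs) for t in range(200)]
gamma_fraction = band_power_fraction(gamma_like, fs)
slow_fraction = band_power_fraction(slow_wave, fs)
```

A 100 Hz test tone lands almost entirely inside the band, while a 10 Hz tone contributes almost nothing, which is the kind of separation a per-channel high gamma feature relies on.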
- In each trial of each task, the participant was visually presented with a text target and then attempted to produce (say aloud) that target.
- In the isolated word task, the participant attempted to produce individual words from a set of 50 English words. This word set contained common English words that can be used to create a variety of sentences, including words that are relevant to caregiving and words requested by the participant. In each trial, the participant was presented with one of these 50 words, and, after a brief delay, he attempted to produce that word when presented with a visual go cue.
- In the sentence task, the participant attempted to produce word sequences from a set of 50 English sentences consisting only of words from the 50-word set (Supplementary Methods S4 and S5).
- the speech detector processed each time point of neural activity during a task and detected onsets and offsets of attempted word production events in real time (Supplementary Method S8; FIG.9). We fit this model using only neural data and task timing information from the isolated word task.
- the word classifier predicted a set of word probabilities by processing the neural activity spanning from 1 second before to 3 seconds after the detected onset (Supplementary Method S9; FIG.10). The predicted probability associated with each word in the 50-word set quantified how likely it was that the participant was attempting to say that word during the detected event. We fit this model using neural data from the isolated word task.
- the Viterbi decoder was capable of decoding more plausible sentences than what would result from simply stringing together the predicted words from the word classifier. Evaluations [00221] To evaluate the performance of our decoding pipeline, we analyzed the sentences that were decoded in real time using two metrics: word error rate and words per minute (Supplementary Method S12).
- word error rate of a decoded sentence is defined as the edit distance (the number of word errors in that sentence) divided by the number of words in the target sentence.
- the words per minute metric measures how many words were decoded per minute of neural data. We also measured the latency of our system during real-time decoding.
- the number of detected words was equal to the number of words in the target sentence (FIG. 2C).
- The detected sentence length was at least one word too short in 2.67% of trials and at least one word too long in 5.33% of trials.
- 5 speech events were erroneously detected before the first trial in the block and were excluded from real-time decoding and analysis (all other detected speech events were included).
- mean edit distance decreased when the language model was used (FIG.2D).
- Over half of the sentences were decoded without error, as indicated by an edit distance of zero (80 out of 150 trials with language modeling).
- Electrodes contributing to word classification performance were primarily localized to the ventral-most aspect of the ventral sensorimotor cortex (vSMC), with electrodes in the dorsal aspect of the vSMC contributing to both speech detection and word classification performance (FIG.3B). Overall, electrode contributions were more distributed for speech detection than for word classification, with over 50% of the total contributions coming from the top 37 electrodes for the word classifier and the top 50 electrodes for the speech detector.
- Word confusion analysis revealed consistent classification accuracy across the majority of the word targets (FIG.3C; 47.1% mean and 14.5% standard deviation of the classification accuracy along the diagonal of the row-normalized confusion matrix).
- EXAMPLE 2: SUPPLEMENTARY METHODS FOR WORD DECODING
- Method S1. The participant's assistive typing device. Assistive typing device description [00273]
- The participant often uses a commercially available touch-screen typing interface (Tobii Dynavox) to communicate with others, which he controls with a long (approximately 18-inch) plastic stylus attached to a baseball cap by using residual head and neck movement.
- the device displays letters, words, and other options (such as punctuation) that the participant can select with his stylus, enabling him to construct a text string.
- the participant can use his stylus to press an icon that synthesizes the text string into an audible speech waveform.
- This process of spelling out a desired message and having the device synthesize it is the participant’s typical method of communication with his caregivers and visitors.
- Typing rate assessment task design [00274] To compare with the neural-based decoding rates achieved with our system, we measured the participant’s typing rate while he used his typing interface in a custom task. In each trial of this task, we presented a word or sentence on the screen and the participant typed out that word or sentence using his typing interface. We instructed the participant to not use any of the word suggestion or completion options in his interface, but use of correction features (such as backspace or undo options) was permitted.
- the implanted electrocorticography (ECoG) array (PMT Corporation) contains electrodes arranged in a 16-by-8 lattice formation with 4-mm center-to-center spacing.
- the rectangular ECoG array has a length of 6.7 cm, a width of 3.5 cm, and a thickness of 0.51 mm, and the electrode contacts are disc-shaped with 2-mm contact diameters.
- signals were acquired from the ECoG array and processed in several steps involving multiple hardware devices (FIG.6 and FIG.7).
- a headstage (a detachable digital link; Blackrock Microsystems) connected to the percutaneous pedestal connector (Blackrock Microsystems) acquired electrical potentials from the implanted electrode array.
- the pedestal is a male connector and the headstage is a female connector.
- This headstage performed band-pass filtering on the signals using a hardware-based Butterworth filter between 0.3 Hz and 7.5 kHz.
- the digitized signals (with 16-bit, 250-nV per bit resolution) were then transmitted through an HDMI cable to a digital hub (Blackrock Microsystems), which then sent the data through an optical fiber cable to a Neuroport system (Blackrock Microsystems).
- a human patient cable Blackrock Microsystems
- a front-end amplifier Blackrock Microsystems
- This Neuroport system sampled all 128 channels of ECoG data at 30 kHz, applied software-based line noise cancellation, performed anti-aliasing low-pass filtering at 500 Hz, and then streamed the processed signals at 1 kHz to a separate real-time processing machine (Colfax International).
- the Neuroport system also acquired, streamed, and stored synchronized recordings of the relevant acoustics at 30 kHz (microphone input and speaker output from the real-time processing computer).
- The real-time processing computer, which is a Linux machine (64-bit Ubuntu 18.04, 48 Intel Xeon Gold 6146 3.20 GHz processors, 500 GB of RAM), used a custom software package called real-time Neural Speech Recognition (rtNSR) [1, 2] to analyze and process the incoming neural data, run the tasks, perform real-time decoding, and store task data and metadata to disk.
- We divided this word set into three disjoint subsets, with two subsets containing 20 words each and the third subset containing the remaining 10 words.
- the participant attempted to produce each word contained in one of these subsets twice, resulting in a total of either 40 or 20 attempted word productions per block (depending on the size of the word subset).
- the participant attempted to produce the 10 words in that subset four times each (instead of the usual two).
- Each trial in a block of this task started with a blank screen with a black background.
- Sentence task [00292] In the sentence task, the participant attempted to produce sentences from a 50-sentence set while his neural activity was processed and decoded into text. These sentences were composed only of words from the 50-word set. These 50 sentences were selected in a semi-random fashion from a corpus of potential sentences (see Method S5). A list of the sentences contained in this 50-sentence set is provided at the end of this section.
- We divided this sentence set into five disjoint subsets, each containing 10 sentences.
- the participant attempted to produce each sentence contained in one of these subsets once, resulting in a total of 10 attempted sentence productions per block.
- Each trial in a block of this task started with a blank screen divided horizontally into top and bottom halves, both with black backgrounds. After two seconds, one of the sentences in the current sentence subset was shown in the top half of the screen in white text. The participant was instructed to attempt to produce the words in the sentence as soon as the text appeared on the screen at the fastest rate that he was comfortably able to.
- A set of cycling ellipses (a text string that cycled each second between one, two, and three period characters).
- The Viterbi decoding model, which maintained the most likely word sequence in a trial given the observed neural activity, often updated its predictions for previous speech events given a new speech event, causing previously decoded words in the feedback text string to change as new information became available.
- the sentence target text turned from white to blue, indicating that the decoding portion of the trial had ended and that the decoded sentence had been finalized for that trial.
- This pre-determined amount of time was either 9 or 11 seconds depending on the block type (see the following paragraph). After 3 seconds, the task continued to the next trial. [00294]
- We collected two types of blocks of the sentence task: optimization blocks and testing blocks.
- In this variant of the task, instead of being prompted with a target sentence to attempt to repeat, the participant was prompted with a question or statement that mimicked a conversation partner and was instructed to attempt to produce a response to the prompt. Other than the conversational prompts and this change in task instructions to the participant, this variant of the task was identical to the regular version. We did not perform any analyses with data collected from this variant of the sentence task; it was used for demonstration purposes only.
- This variant of the task is shown in FIG.1 in the main text. Word and sentence lists [00296] The 50-word set used in this work is: 1. Am 2. Are 3. Bad 4. Bring 5. Clean 6. Closer 7. Comfortable 8. Coming 9. Computer 10. Do 11. Faith 12. Family 13. Feel 14. Glasses 15. Going 16. Good 17. Goodbye 18. Have 19. Hello 20.
- Method S6 Data organization Isolated word data: Subset creation [00304] In total, we collected 22 hours and 30 minutes of the isolated word task in 291 task blocks across 48 days of recording, with 196 trials (attempted productions) per word (9800 trials total). We split these blocks into 11 disjoint subsets: a single optimization subset and 10 cross-validation subsets. The optimization subset contained a total of 16 trials per word, and each cross-validation subset contained 18 trials per word.
- The speech detector and word classifier were assessed using cross-validation with nine different amounts of training data. Specifically, for each integer value of $N \in [1, 9]$, we performed 10-fold cross-validated evaluation with the isolated word data while only training on N randomly selected subsets in each fold. Through this approach, all of the available trials were evaluated for each value of N even though the amount of training data varied, and there was no overlap between training and testing data in any individual assessment.
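The fold construction described above can be sketched as follows. This is a minimal illustration, assuming the 10 cross-validation subsets are identified by integer indices; the function name `training_subsets_per_fold` and the seeding are hypothetical and not taken from the source.

```python
import random

def training_subsets_per_fold(num_subsets=10, n_train=3, seed=0):
    """For each held-out subset, randomly pick n_train of the other subsets for training.

    Hypothetical helper: subsets are represented only by their indices.
    """
    rng = random.Random(seed)
    folds = []
    for test_idx in range(num_subsets):
        # Candidates exclude the held-out subset, so training and testing never overlap.
        candidates = [i for i in range(num_subsets) if i != test_idx]
        train_idx = sorted(rng.sample(candidates, n_train))
        folds.append((test_idx, train_idx))
    return folds

# One set of folds for each training-set size N in [1, 9].
all_folds = {n: training_subsets_per_fold(n_train=n) for n in range(1, 10)}
```

Every subset is held out exactly once for each value of N, so all trials are evaluated regardless of how much training data is used.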
- Isolated word data Stability subsets [00309] To assess how stable the signals driving word detection and classification were throughout the study period, we used the isolated word data to define four date-range subsets containing data collected during different date ranges.
- date-range subsets contained data collected 9–18, 18–30, 33–41, and 88–90 weeks post-implantation, respectively. Data collected on the day of the exact 18-week mark was considered to be part of the “Early” subset, not the “Middle” subset.
- Each of these subsets contained 20 trials for each word, randomly drawn (without replacement) from the available data in the corresponding date range. Trials were only sampled from the isolated word cross-validation subsets (not from the optimization subset). In FIG.4 in the main text, the date ranges for these subsets are expressed relative to the start of data collection for this study (instead of being expressed relative to the device implantation date).
- each of these subsets we further split the data into 10 disjoint subsets (referred to in this section as “pieces” to disambiguate these subsets from the four date-range subsets), each containing 2 trials of each word.
- Using these four date-range subsets, we defined three evaluation schemes: a within-subset scheme, an across-subset scheme, and a cumulative-subset scheme.
- the within-subset scheme involved performing 10-fold cross-validation using the 10 pieces within each date-range subset. Specifically, each piece in a date-range subset was evaluated using models fit on all of the data from the remaining pieces of that date-range subset.
- the training data used within each individual cross-validation fold for each date-range subset always consisted of 18 trials per word.
- the across-subset scheme involved evaluating the data in a date-range subset using models fit on data from other date-range subsets. In this scheme, the within-subset scheme was replicated, except that each piece in a date-range subset was evaluated using models fit on 6 trials per word randomly sampled (without replacement) from each of the other date-range subsets.
- The training data used within each individual cross-validation fold for each date-range subset always consisted of 18 trials per word (6 trials per word from each of the three other date-range subsets).
- the cumulative-subset scheme involved evaluating the data from the “Very late” subset using models fit with varying amounts of data.
- four cross-validated evaluations were performed (using the 10 pieces defined for each date-range subset).
- data from the “Very late” subset were analyzed by the word classifier using 10-fold cross-validation (this was identical to the “Very late” within-subset evaluation).
- the cross-validated analysis from the first evaluation was repeated, except that all of the data from the “Late” subset was added to the training dataset for each cross-validation fold.
- Sentence data [00314] In total, we collected 2 hours and 4 minutes of the sentence task in 25 task blocks across 7 days of recording, with 5 trials (attempted productions) per sentence (250 trials total). We split these blocks into two disjoint subsets: A sentence optimization subset and a sentence testing subset.
- Hyperparameter optimization To find optimal values for the model hyperparameters used during performance evaluation, we used hyperparameter optimization procedures to evaluate many possible combinations of hyperparameter values, which were sampled from custom search spaces, with objective functions that we designed to measure model performance. During each hyperparameter optimization procedure, a desired number of combinations were tested, and the combination associated with the lowest (best) objective function value across all combinations was chosen as the optimal hyperparameter value combination for that model and evaluation type. The data used to measure the associated objective function values were distinct from the data that the optimal hyperparameter values would be used to evaluate (hyperparameter values used during evaluation of a test set were never chosen by optimizing on data in that test set).
- We used the objective function given in Equation S5 to measure the model performance with each hyperparameter value combination. In each detection hyperparameter optimization procedure, we evaluated 1000 hyperparameter value combinations before stopping. [00319] As described in Method S6, we computed speech probabilities for isolated word blocks in each of the 10 cross-validation data subsets using a speech detection model trained on the data from the other 9 cross-validation subsets. To compute speech probabilities for the blocks in the optimization subset, we used a speech detection model trained on data from all 10 of the cross-validation subsets.
- the optimization subset was used as the held-out set while training on data from all 10 cross-validation subsets.
- In the second optimization, we created a held-out set by randomly selecting (without replacement) 4 trials of each word from blocks collected within three weeks of the online sentence decoding test blocks.
- The training set for this optimization contained all of the isolated word data (from the cross-validation and optimization subsets) except for the trials in this held-out set.
- Speech detection model architecture and training [00327] We used the PyTorch 1.6.0 Python package to create and train the speech detection model [12]. [00328] The speech detection architecture was a stack of three long short-term memory (LSTM) layers with decreasing latent dimension sizes (150, 100, and 50) and a dropout of 0.5 applied at each layer. Recurrent layers are capable of maintaining an internal state through time that can be updated with new individual time samples of input data, making them well suited for real-time inference with temporally dynamic processes [13].
- The speech detection model outputs a distribution of probabilities $Q(l_n \mid y_n)$ over the three possible values of $l_n$ from the set of state labels $L = \{\text{rest}, \text{preparation}, \text{speech}\}$.
- the predicted distribution Q implicitly depends on the model parameters.
- $P$: The true distribution of the states, determined by the assigned state labels $l$.
- $N$: The number of samples.
- $H_{P,Q}(l \mid y)$: The cross entropy of the predicted distribution with respect to the true distribution for $l$.
- $\log$: The natural logarithm.
- LSTM models are trained with backpropagation through time (BPTT), which unrolls the backpropagation through each time step of processing [15]. Due to the periodicity of our isolated word task structure, it is possible that relying only on BPTT would cause the model to learn this structure and predict events at every go cue instead of trying to learn neural indications of speech events.
- the detection score is a weighted average of frame-level and event-level accuracies for each block.
- the frame-level accuracy measures the speech detector’s ability to predict whether or not a neural time point occurred during speech. Ideally, the speech detector would detect events that spanned the duration of the actual attempted speech event (as opposed to detecting small subsets of each actual speech event, for example).
- frame-level accuracy ⁇ frame as: with the following variable definitions: The positive weight fraction, which we used to control the importance of correctly detecting positive frames (correctly identifying which neural time points occurred during attempted speech) relative to negative frames (correctly identifying which neural time points did not occur during attempted speech).
- $F_P$: The number of actual positive frames (the number of time points that were assigned the speech label during data preparation).
- $F_{TP}$: The number of detected true positive frames (the number of time points that were correctly identified as occurring during an attempted speech event).
- $F_N$: The number of actual negative frames (the number of time points that were labeled as preparation or rest during data preparation).
- $F_{TN}$: The number of detected true negative frames (the number of time points that were correctly identified as not occurring during an attempted speech event).
- Event-level accuracy ⁇ event as: with the following variable definitions: ETP: The number of true positive detected events (the number of detected speech events that corresponded to an actual word production attempt). EFP: The number of false positive detected events (the number of detected speech events that did not correspond to an actual word production attempt). EFN: The number of false negative events (the number of actual word production attempts that were not associated with any detected event). EP: The number of actual word production attempts (the number of trials). [00340] We calculated event-level accuracy after curating the detected events, which involved matching each trial with a detected event (or the absence of a detected event; see the following section for more details).
- the event-level accuracy ranges from 0 to 1, with a value of 1 indicating that there were no false positive or false negative detected events.
- We set the frame-level accuracy weight to 0.4 to assign more weight to the event-level accuracy than the frame-level accuracy.
- auxiliary goal to select small values for the time threshold duration hyperparameter.
- We included this auxiliary goal because a large time threshold duration increases the chance of missing shorter utterances and, if the duration is large enough, adds delays to real-time speech detection.
- The objective function used during this hyperparameter optimization procedure can be expressed as: with the following variable definitions: $c_{hp}(\Theta)$: The value of the objective function using the hyperparameter value combination $\Theta$.
- $\lambda_{time}$: The penalty applied to the time threshold duration.
- $\theta_{time}$: The time threshold duration value, which is one of the three parameters contained in $\Theta$.
- $\lambda_{time} = 0.00025$.
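The detection score and the penalized objective described above can be sketched as follows. The weighted average with a frame-level weight of 0.4 and the penalty value of 0.00025 come from the text. The exact form of the objective is not reproduced here, so the combination used below (one minus the detection score, plus the penalty times the time threshold duration) is an assumption chosen so that lower values are better. The function names are hypothetical, and the frame-level and event-level accuracies are taken as precomputed inputs.

```python
def detection_score(frame_accuracy, event_accuracy, frame_weight=0.4):
    """Weighted average of frame-level and event-level accuracies.

    frame_weight=0.4 gives the event-level accuracy the larger weight (0.6).
    """
    return frame_weight * frame_accuracy + (1.0 - frame_weight) * event_accuracy

def detection_objective(frame_accuracy, event_accuracy, time_threshold,
                        time_penalty=0.00025):
    """ASSUMED objective form: (1 - detection score) + penalty * time threshold.

    Lower is better; the penalty term favors small time threshold durations.
    """
    score = detection_score(frame_accuracy, event_accuracy)
    return (1.0 - score) + time_penalty * time_threshold

# Two candidates with equal accuracies: the shorter time threshold is preferred.
short = detection_objective(0.9, 0.95, time_threshold=50)
long_ = detection_objective(0.9, 0.95, time_threshold=200)
```

Under this assumed form, the penalty only breaks ties between combinations with similar accuracies, since its magnitude is small relative to the accuracy term.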
- Word classification model Data preparation for offline training and evaluation [00354] During training and evaluation of the word classifier with the isolated word data, for each trial we obtained the time of the detected onset (if available; determined by the detection curation procedure described in Method S8). During evaluation with each trial, the word classifier predicted the probability of each of the 50 words being the target word that the participant was attempting to produce given the time window of high gamma activity spanning from −1 to 3 seconds relative to the detected onset.
- Word classification model architecture and training [00357] We used the TensorFlow 1.14 Python package to create and train the word classification model [22]. [00358] Within the word classification ANN architecture, the neural data was processed by a temporal convolution with a two-sample stride and two-sample kernel size, which further downsampled the neural activity in time while creating a higher-dimensional representation of the data. Temporal convolution is a common approach for extracting robust features from time series data [23]. This representation was then processed by a stack of two bidirectional gated recurrent unit (GRU) layers, which are often used for nonlinear classification of time series data [24].
- a fully connected (dense) layer with a softmax activation projects the latent dimension from the final GRU layer to probability values across the 50 words.
- Dropout layers are used between each intermediate representation for regularization.
- A schematic depiction of this architecture is given in FIG.10.
- Let $y$ denote a series of high gamma time windows and $w$ denote a series of corresponding target word labels for those windows, with $y_n$ as the time window at index $n$ in the data series and $w_n$ as the corresponding label at index $n$ in the label series.
- the word classifier outputs a distribution of probabilities Q(w n
- the predicted distribution Q implicitly depends on the model parameters.
- each word classifier contained an ensemble of 10 ANN models, each with identical architectures and hyperparameter values but with different parameter values (weights) [26].
- each ANN was initialized with random model parameter values and was individually fit using the same training samples, although each ANN processed the samples in a different order during stochastic gradient updates. This process yielded 10 different sets of model parameters.
- the word occurrence frequency weighting function is defined as: where ⁇ is the number of times the target word label ⁇ n occurred in the reference corpus, is the total number of words in the reference corpus, and W is the 50 word set. [00367] We define as: where W denotes the cardinality of the 50-word set (which is equal to 50). Therefore, acts to scale each word frequency in Equation S8 so that the mean word occurrence frequency is 1, which scales the objective function such that the loss value is comparable with the loss value resulting from Equation S6. Method S10.
- the n-grams (represented as tuples) extracted from the sentence “I hope my family is coming” in this approach would be: 1. (I) 2. (Hope) 3. (My) 4. (Family) 5. (Is) 6. (Coming) 7. (I, Hope) 8. (Hope, My) 9. (My, Family) 10. (Family, Is) 11. (Is, Coming) 12. (I, Hope, My) 13. (Hope, My, Family) 14. (My, Family, Is) 15. (Family, Is, Coming) 16. (I, Hope, My, Family) 17. (Hope, My, Family, Is) 18. (My, Family, Is, Coming) 19. (I, Hope, My, Family, Is) 20.
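The extraction in the listing above can be sketched as follows. This is a minimal illustration that assumes a maximum n-gram order of 5, which is what the listing suggests; the function name `extract_ngrams` and the word capitalization step are illustrative choices, not part of the source.

```python
def extract_ngrams(sentence, max_order=5):
    """Return all n-grams of order 1..max_order as tuples, shortest orders first.

    ASSUMPTION: max_order=5 is inferred from the example listing.
    """
    # Capitalize each word only to match the presentation in the listing.
    words = [word.capitalize() for word in sentence.split()]
    ngrams = []
    for order in range(1, max_order + 1):
        # Slide a window of the current order across the sentence.
        for start in range(len(words) - order + 1):
            ngrams.append(tuple(words[start:start + order]))
    return ngrams

ngrams = extract_ngrams("I hope my family is coming")
```

The returned list follows the same ordering as the listing: all unigrams first, then bigrams, and so on.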
- The language model was trained to yield the conditional probability of any word occurring given the context of that word, which is the sequence of (n − 1) or fewer words that precede it.
- These probabilities can be expressed as p ( ⁇ i
- the additive smoothing factor is a value that is added to all of the counts k ⁇ 0 prior to normalization, which smooths (reduces the variance of) the probability distribution [27].
- N 3415
- Equation S14 is used to re-normalize the smoothed probabilities so that they sum to 1.
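Additive smoothing followed by re-normalization can be sketched as follows. The equations themselves are not reproduced in this text, so this is the standard additive (Lidstone) form and may differ in detail from Equation S14. The function name, the toy counts, and the smoothing value `delta` are hypothetical.

```python
def smoothed_probabilities(counts, vocabulary, delta):
    """Add delta to every word count for one context, then normalize to sum to 1.

    counts: dict mapping word -> number of times it followed the context.
    vocabulary: iterable of all allowed words (unseen words get a count of 0).
    """
    smoothed = {word: counts.get(word, 0) + delta for word in vocabulary}
    total = sum(smoothed.values())
    return {word: value / total for word, value in smoothed.items()}

# Toy example: "good" was never observed after this context but still gets mass.
vocab = ["am", "are", "good"]
probs = smoothed_probabilities({"am": 3, "are": 1}, vocab, delta=0.5)
```

Larger values of `delta` push the distribution toward uniform, which is the variance reduction that the smoothing factor is described as providing.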
- LMSF language model scaling factor
- each observed state y i is the time window of neural activity at index i within the sequence of detected time windows for any particular trial
- each hidden state qi is the n-gram containing the words that the participant had attempted to produce from the first word to the word at index i in the sequence (FIG.11).
- $q_i = \{w_i, c_i\}$, where $w_i$ is the word at index $i$ in the sequence and $c_i$ is the context of that word (defined in Equation S12; see Method S10).
- the emission probabilities for this HMM are p (yi
- the word classifier provided the probabilities p ( ⁇ i
- transition probabilities for this HMM are p (qi
- $q_{-1}$ can be defined as an empty set, indicating that $q_0$ is the first word in the sequence.
- Viterbi decoding implementation To predict the words that the participant attempted to produce during the sentence task, we implemented a Viterbi decoding algorithm with this underlying HMM structure.
- the Viterbi decoding algorithm uses dynamic programming to compute the most likely sequence of hidden states given hidden-state prior transition probabilities and observed-state emission likelihoods [33, 34]. To determine the most likely hidden-state sequence, this algorithm iteratively computes the probabilities of various “paths” through the hidden-state sequence space (various combinations of qi values). Here, each of these Viterbi paths was parameterized by a particular path through the hidden states (a particular word sequence) and the probability associated with that path given the neural activity.
- this algorithm created a set of new Viterbi paths by computing, for each existing Viterbi path, the probability of transitioning to each valid new word given the detected time window of neural activity and the preceding words in the associated existing Viterbi path.
- the creation of new Viterbi paths from existing paths can be expressed using the following recursive formula: with the following variable definitions: Vj: The set of all Viterbi paths created after the word production attempt at index j within a sentence trial. v j : A Viterbi path within Vj.
- Each of these Viterbi paths was parameterized by the n-grams $(q_0, \ldots, q_j)$ (or, equivalently, the words $(w_0, \ldots, w_j)$) and the log probability of that sequence of words occurring given the neural activity, although these equations only describe the recursive computation of the log probability values (the tracking of the words associated with each Viterbi path is implicitly assumed).
- $q_{j,v_k}$: The n-gram $q_j$, containing the word $w_j$ and the context of that word. This context is determined from the most recent words within the hidden state sequence of Viterbi path $v_k$.
- $p(y_j \mid q_{j,v_{j-1}})$: The emission probability specifying the likelihood of the observed neural activity $y_j$ given the n-gram $q_{j,v_{j-1}}$.
- $p(q_{i,v_{i-1}} \mid q_{i-1,v_{i-1}})$: The transition probability specifying the prior probability of transitioning to the n-gram $q_{i,v_{i-1}}$ from the n-gram $q_{i-1,v_{i-1}}$.
- $L$: The language model scaling factor, which is a hyperparameter that we used to control the weight of the transition probabilities from the language model relative to the emission probabilities from the word classifier (see Method S7 and Table S1 for a description of the hyperparameter optimization procedure).
- $W$: The 50-word set.
- $\log$: The natural logarithm.
- Equation S15 can be simplified to the following equation: where cj,vk is the context of word wj determined from the Viterbi path vk, p ( ⁇ i
- The index $i$ was reset to zero (the first word in each trial was denoted $w_0$), and any existing Viterbi paths from a previous trial were discarded.
- $V_{-1}$: A singleton set containing a single Viterbi path with the empty set as its hidden state sequence and an associated log probability of zero.
- log probabilities in practice for numerical stability and computational efficiency.
- Viterbi path pruning via beam search [00383] As specified in Equation S16, when new emission probabilities p ( ⁇ i
- Vi ' is the set of all Viterbi paths created after the word production attempt at index i within a sentence trial (before pruning) and ⁇ i,j is the element at index j of a vector created by sorting the Viterbi paths in Vi ' in order of descending log probability (ties are broken arbitrarily during sorting).
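One step of Viterbi path expansion followed by beam pruning can be sketched as follows. The recursive equations are not reproduced in this text, so the score update below (previous log probability, plus the log of the word classifier probability, plus the scaling factor times the log of the language model probability) is an assumption based on the variable definitions above. The function `viterbi_step`, the callable `lm_prob`, and the toy probabilities are hypothetical.

```python
import math

def viterbi_step(paths, word_probs, lm_prob, lm_scale, beam_width):
    """Extend every path by every word, then keep the beam_width most likely paths.

    paths: list of (word_tuple, log_prob) pairs.
    word_probs: dict word -> classifier probability for the detected time window.
    lm_prob: callable (word, preceding_words) -> language model probability.
    """
    new_paths = []
    for words, log_prob in paths:
        for word, p_word in word_probs.items():
            p_lm = lm_prob(word, words)
            if p_word <= 0.0 or p_lm <= 0.0:
                continue  # skip impossible extensions to avoid log(0)
            # ASSUMED update: the scaling factor weights the language model term.
            score = log_prob + math.log(p_word) + lm_scale * math.log(p_lm)
            new_paths.append((words + (word,), score))
    # Sort by descending log probability; ties keep their existing order.
    new_paths.sort(key=lambda path: path[1], reverse=True)
    return new_paths[:beam_width]

def uniform_lm(word, preceding_words):
    """Toy language model: every one of three words is equally likely."""
    return 1.0 / 3.0

# Start each trial from a single empty path with a log probability of zero.
paths = [((), 0.0)]
paths = viterbi_step(paths, {"I": 0.7, "am": 0.2, "good": 0.1}, uniform_lm, 1.0, 2)
paths = viterbi_step(paths, {"I": 0.1, "am": 0.8, "good": 0.1}, uniform_lm, 1.0, 2)
best_words, best_log_prob = paths[0]
```

Because each path stores its full word sequence, a later speech event can change which path is most likely, which is how previously decoded words can be revised as new information arrives.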
- Sentence decoding evaluations [00385] We evaluated the performance of our decoding pipeline (speech detector, word classifier, language model, and Viterbi decoder) using the online predictions made during the sentence task blocks (in the testing subset; see Method S6).
- WER is a commonly used metric to measure the quality of predicted word sequences, computed by calculating the edit (Levenshtein) distance between a reference (target) and decoded sentence and then dividing the edit distance by the number of words in the reference sentence.
- the edit distance measurement can be interpreted as the number of word errors in the decoded sentence (in FIG.2 in the main text, the edit distance is referred to as the “number of word errors” or the “error count”). It is computed as the minimum number of insertions, deletions, and substitutions required to transform the decoded sentence into the reference sentence.
- the example decoded sentence has an edit distance of 1 to the target sentence.
- Insertion: I good → I am good
- Deletion: I am very good → I am good
- Substitution: I am going → I am good
- Lower edit distances and WERs indicate better performance.
- We computed block-level WER as the sum of the edit distances across all of the trials in a test block divided by the sum of the target-sentence word lengths across all trials.
- This approach to measure block-level WER was preferred to simply averaging trial-level WER values because it does not overvalue short sentences compared to long ones. For example, if we simply averaged trial-level WERs to compute a block-level WER, then one error in a trial with the target sentence “I am thirsty” would cause a greater impact on WER than one error in a trial with the target sentence “My family is very comfortable”, which was not a desired aspect of our block-level WER measurement.
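The word-level edit distance and the block-level WER described above can be sketched as follows. This is a minimal illustration using the standard Levenshtein dynamic program over whitespace-separated words; the function names and the two-trial example block are hypothetical.

```python
def edit_distance(reference, decoded):
    """Minimum number of word insertions, deletions, and substitutions."""
    ref, hyp = reference.split(), decoded.split()
    # previous[j] is the distance between the processed reference prefix and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,        # reference word missing from hyp
                               current[j - 1] + 1,     # extra word in hyp
                               previous[j - 1] + substitution_cost))
        previous = current
    return previous[-1]

def block_wer(trials):
    """Sum of edit distances divided by the total number of target words.

    trials: list of (target_sentence, decoded_sentence) pairs.
    """
    total_errors = sum(edit_distance(target, decoded) for target, decoded in trials)
    total_words = sum(len(target.split()) for target, _ in trials)
    return total_errors / total_words

block = [("I am thirsty", "I am good"),
         ("My family is very comfortable", "My family is very comfortable")]
wer = block_wer(block)
```

In this example block there is one error across eight target words, so the block-level WER is 1/8, whereas averaging the two trial-level WERs (1/3 and 0) would give 1/6 and overweight the short sentence.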
- Step 1 Start with an empty word sequence.
- Step 2 Acquire the word probabilities from the language model using the current word sequence as context.
- Step 3 Randomly sample a word from the 50-word set, using the word probabilities in step 2 as weights for the sampling.
- Step 4 Add the word from step 3 to the current word sequence.
- Step 5 Repeat steps 2–4 until the length of the current word sequence is equal to the length of the target sentence for the trial.
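The sampling steps above can be sketched as follows. This is a minimal illustration only: `next_word_probs` is a hypothetical callable standing in for the language model, and the three-word toy model replaces the 50-word set for brevity.

```python
import random
from typing import Callable, Dict, List


def sample_chance_sentence(
    next_word_probs: Callable[[List[str]], Dict[str, float]],
    target_length: int,
    rng: random.Random,
) -> List[str]:
    """Sample a word sequence from the language model alone (no neural data)."""
    sequence: List[str] = []                       # Step 1: empty sequence
    while len(sequence) < target_length:           # Step 5: stop at target length
        probs = next_word_probs(sequence)          # Step 2: LM probabilities
        words = list(probs)
        weights = [probs[w] for w in words]
        word = rng.choices(words, weights=weights, k=1)[0]  # Step 3: weighted draw
        sequence.append(word)                      # Step 4: extend the sequence
    return sequence


def toy_language_model(context: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in; a real model would condition fully on `context`."""
    if not context:
        return {"I": 0.6, "am": 0.1, "good": 0.3}
    return {"I": 0.1, "am": 0.5, "good": 0.4}


sentence = sample_chance_sentence(toy_language_model, 3, random.Random(0))
print(sentence)
```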
- Step 4 Let ⁇ equal this WER value and increment n by 1.
- Step 5 Repeat steps 2–4 until each word in the decoded sentence has been deemed correct or incorrect.
- System latency calculation [00401] To estimate the latency of the decoding pipeline during real-time sentence decoding, we first randomly selected one of the sentence testing blocks to use to compute latencies.
- the computed latencies represented the amount of time the system required to predict the next word in the sequence after obtaining all of the associated neural data that would be required to make that prediction.
- The timing between the video and the result file timestamps was synchronized using a short beep that was played at the start of every block (speaker output was also acquired and stored in the result file during each block; see Method S2). Across all trials, there were 42 decoded words in this block. [00402] Using this approach, we found that the mean latency associated with the real-time word predictions was 4.0 s (with a standard deviation of 0.91 s). Method S13.
- Isolated word evaluations Classification accuracy, cross entropy, and detection errors [00403]
- the word classifier to predict word probabilities from the neural data associated with the word production attempt in each trial.
- We computed these word probabilities using time windows of neural activity associated with curated detected events from the speech detector (see Method S8).
- We computed classification accuracy as the fraction of trials in which the target word was equal to the word with the highest predicted probability.
- cross entropy measures the amount of additional information that would be required to determine the target word identities from the predicted probabilities.
- To compute cross entropy, we first obtained the predicted probability of the target word in each trial.
- the cross entropy (in bits) was then calculated as the mean of the negative log (base 2) across all of these probabilities.
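The two isolated-word metrics described above can be sketched as follows. The data layout (a target word paired with a dictionary of predicted probabilities) and the example values are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Trial = Tuple[str, Dict[str, float]]  # (target word, predicted probabilities)


def classification_accuracy(trials: List[Trial]) -> float:
    """Fraction of trials whose most probable word equals the target word."""
    correct = sum(1 for target, probs in trials
                  if max(probs, key=probs.get) == target)
    return correct / len(trials)


def cross_entropy_bits(trials: List[Trial]) -> float:
    """Mean negative log (base 2) of the probability assigned to each target."""
    return sum(-math.log2(probs[target]) for target, probs in trials) / len(trials)


# Hypothetical predictions for two trials.
trials = [
    ("hello", {"hello": 0.5, "thirsty": 0.25, "family": 0.25}),
    ("family", {"hello": 0.5, "thirsty": 0.25, "family": 0.25}),
]
print(classification_accuracy(trials))  # 0.5
print(cross_entropy_bits(trials))       # (1 bit + 2 bits) / 2 = 1.5
```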
- the final set of analyses in this learning curve scheme was equivalent to using all of the available data.
- the speech detector provided curated detected speech events.
- Measuring training data quantities for the learning curve scheme [00405] Because the speech detection and word classification models used different training procedures, we measured the amount of neural data used by each type of model separately for each set of analyses in the learning curve scheme. For each word classifier, we multiplied the number of detected events used to fit the model by 4 seconds (the size of the neural time window used by the classifier).
- each speech detection model was fit with sliding windows to predict individual time points of neural activity, resulting in many more training samples per task block than trials.
- each training sample was a single window from the sliding window training procedure, which corresponded to an individual time point in the task block. Because we used early stopping to prevent overfitting, in practice each speech detector never used all of the data available during model fitting.
- increasing the amount of data available can increase the diversity of the training data (for example, by having data from blocks that were collected across long time periods), which can also affect the number of epochs that the detector is trained for and the robustness of the trained detection model.
- To measure the amount of data available to each speech detector during training we simply divided the number of available training samples by the sampling rate (200 Hz).
- To measure the amount of data that was actually used by each speech detector during training we divided the number of training samples used by the sampling rate. By computing the mean across the 10 folds, we measured the average amount of data available and the average amount that was actually used to fit the speech detector for each set of analyses.
- Electrode contributions (saliences) [00407] To measure how much each electrode contributed to detection and classification performance, we computed electrode contributions (saliences) with the artificial neural networks (ANNs) driving the speech detection and word classification models, respectively. We used a salience calculation method that has been demonstrated with convolutional ANNs during identification of image regions that were most useful for image classification [35]. We have also used this method in our previous work to measure which electrodes were most useful for speech decoding with a recurrent and convolutional ANN [20]. [00408] To compute electrode saliences for each type of ANN, we first calculated the gradient of the loss function for the ANN with respect to the input features.
- the input features were individual time samples of high gamma activity across entire blocks for the speech detector or across detected time windows for the word classifier.
- the mean across blocks or trials of the Euclidean norm values yielding a single salience value for each electrode.
- we normalized each set of electrode saliences so that they summed to 1.
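The aggregation of gradients into saliences can be sketched as follows. This sketch assumes the gradients of the loss with respect to the input features have already been computed (for example, by an automatic differentiation library) and that the Euclidean norm is taken across time for each electrode; the array layout and function name are hypothetical.

```python
import math
from typing import List


def electrode_saliences(gradients: List[List[List[float]]]) -> List[float]:
    """gradients[trial][time][electrode] holds d(loss)/d(input feature)."""
    n_trials = len(gradients)
    n_electrodes = len(gradients[0][0])
    saliences = [0.0] * n_electrodes
    for trial in gradients:
        for e in range(n_electrodes):
            # Euclidean norm of this electrode's gradient across time.
            norm = math.sqrt(sum(sample[e] ** 2 for sample in trial))
            saliences[e] += norm / n_trials  # accumulate the mean across trials
    total = sum(saliences)
    return [s / total for s in saliences]    # normalize so the values sum to 1


# Hypothetical gradients: 2 trials, 2 time points, 2 electrodes.
grads = [
    [[3.0, 0.0], [4.0, 0.0]],  # electrode 0 norm = 5, electrode 1 norm = 0
    [[0.0, 5.0], [0.0, 0.0]],  # electrode 0 norm = 0, electrode 1 norm = 5
]
print(electrode_saliences(grads))  # [0.5, 0.5]
```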
- ITR information transfer rate
- P the mean classification accuracy for the full cross-validation analysis with the isolated word data (from the final set of analyses in the learning curve scheme). This formula makes the following assumptions: [00411] On average, all possible word targets had the same prior probability (that is, the probability independent of the neural data) of being the actual word target in any trial. This is reasonable because there was an equal number of isolated word trials collected for each word target. [00412] The classification accuracy used for P was representative of the overall accuracy of the word classifier (given the amount of training data) and is consistent across trials.
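The ITR formula itself is not reproduced in this passage. As a hedged illustration, the sketch below assumes the commonly used bits-per-trial expression, $\log_2 N + P \log_2 P + (1 - P) \log_2\!\left(\frac{1 - P}{N - 1}\right)$, which relies on equal target priors and a single representative accuracy $P$ as in the assumptions listed above. The function name and the choice of N = 50 targets are illustrative assumptions, and converting to a rate per unit time is not shown.

```python
import math


def bits_per_trial(n_targets: int, accuracy: float) -> float:
    """Information per selection under the assumed equal-prior ITR formula."""
    n, p = n_targets, accuracy
    bits = math.log2(n)
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        # Errors are assumed to be spread evenly over the n - 1 wrong targets.
        bits += (1.0 - p) * math.log2((1.0 - p) / (n - 1))
    return bits


# Hypothetical accuracy value, used only to exercise the function.
print(bits_per_trial(50, 0.5))
```

Under this expression, perfect accuracy yields $\log_2 N$ bits per trial and chance-level accuracy ($P = 1/N$) yields zero bits.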
- the resulting correlations are more likely (but not guaranteed) to be indicative of acoustic contamination; for example, spectral power at 300 Hz in the acoustic signal would not be expected to correlate strongly with neural oscillations at that frequency in electrophysiological signals.
- To measure training data quantities for this evaluation scheme, we used the same method as the one described in Method S13 to measure training data quantities for the word classifier in the learning curve analyses. Method S15. Statistical testing Word error rate confidence intervals [00426] To compute 95% confidence intervals for the word error rates (WERs), we performed the following steps for each set of results (chance, without language model, and with language model):
  [00427] 1. Compile the block-level WERs into a single array (with 15 elements, one for each block).
  [00428] 2. Randomly sample (with replacement) 15 WER values from this array and then compute and store the median WER from these values.
  [00429] 3. Repeat step 2 until one million median WER values have been computed.
  [00430] 4.
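The bootstrap resampling of block-level WERs can be sketched as follows. This is a minimal illustration that assumes the interval is read off as the 2.5 and 97.5 percentiles of the resampled medians; the percentile interpolation, the WER values, and the reduced number of resamples (10,000 rather than one million, for speed) are assumptions.

```python
import random
import statistics
from typing import List, Tuple


def percentile(sorted_values: List[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of pre-sorted data."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def bootstrap_median_ci(values: List[float], n_resamples: int,
                        rng: random.Random) -> Tuple[float, float]:
    """95% confidence interval of the median via resampling with replacement."""
    medians = []
    for _ in range(n_resamples):
        resample = rng.choices(values, k=len(values))  # sample with replacement
        medians.append(statistics.median(resample))
    medians.sort()
    return percentile(medians, 2.5), percentile(medians, 97.5)


# Hypothetical block-level WERs, one per block (15 blocks).
block_wers = [0.10, 0.25, 0.30, 0.15, 0.40, 0.20, 0.35, 0.25,
              0.05, 0.30, 0.45, 0.20, 0.25, 0.15, 0.30]
low, high = bootstrap_median_ci(block_wers, 10_000, random.Random(0))
print(low, high)
```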
- 3. Repeat step 2 until one million mean classification accuracies have been computed.
- 4. Compute the confidence interval as the 2.5 and 97.5 percentiles of the collection of mean classification accuracies from step 3.
- Supplementary Table S1. Hyperparameter definitions and values.
- the first is the optimal value found when optimizing the detector on the isolated word optimization subset (used to detect word production attempts in the cross-validation subsets for evaluation by the word classifier)
- the second is the optimal value found when optimizing the detector on a subset of the pooled cross-validation subsets (used to detect word production attempts in the isolated word optimization subset for use during hyperparameter optimization of the word classifier)
- the third is the optimal value found during hyperparameter optimization of the decoding pipeline with the sentence optimization subset (the value used during online sentence decoding).
- the optimal value listed was found when optimizing the decoding pipeline with the sentence optimization subset (the value used for online sentence decoding).
- EXAMPLE 3 GENERALIZABLE SPELLING USING A SPEECH NEUROPROSTHESIS IN A PARALYZED PERSON Introduction
- Devastating neurological conditions such as stroke and amyotrophic lateral sclerosis can lead to anarthria, the loss of ability to communicate through speech 1 .
- Anarthric patients can have intact language skills and cognition, but paralysis may inhibit their ability to operate assistive devices, severely restricting communication with family, friends, and caregivers and reducing self-reported quality of life 2 .
- Brain-computer interfaces (BCIs) have the potential to restore communication to such patients by decoding neural activity into intended messages 3,4 .
- the participant performed spelling tasks in which he spelled out sentences in real time with a 1,152-word vocabulary using attempts to silently say the corresponding alphabetic code words.
- a beam-search algorithm used predicted code-word probabilities from a classification model to find the most likely sentence given the neural activity while automatically inserting spaces between decoded words.
- the participant silently attempted to speak, and a speech-detection model identified this start signal directly from ECoG activity.
- the participant attempted the hand-motor movement to disengage the speller.
- Once the classification model identified this hand-motor command from ECoG activity, a large neural network-based language model rescored the potential sentence candidates from the beam search and finalized the sentence.
- the participant’s neural activity was recorded from each electrode and processed to simultaneously extract high-gamma activity (HGA; between 70–150 Hz) and low-frequency signals (LFS; between 0.3–100 Hz; FIG.15B).
- A speech-detection model processed each time point of data in the combined feature stream (containing HGA+LFS features; FIG.15C) to detect this initial silent-speech attempt.
- the paced spelling procedure began (FIG. 15D). In this procedure, an underline followed by three dots appeared on the screen in white text. The dots disappeared one by one, representing a countdown.
- the underline turned green to indicate a go cue, at which time the participant attempted to silently say the NATO code word corresponding to the first letter in the sentence.
- the time window of neural features from the combined feature stream obtained during the 2.5-second interval immediately following the go cue was passed to a neural classifier (FIG.15E). Shortly after the go cue, the countdown for the next letter automatically started. This procedure was then repeated until the participant volitionally disengaged it (described later in this section). [00482]
- the neural classifier processed each time window of neural features to predict probabilities across the 26 alphabetic code words (FIG.15F).
- a beam-search algorithm used the sequence of predicted letter probabilities to compute potential sentence candidates, automatically inserting spaces into the letter sequences where appropriate and using a language model to prioritize linguistically plausible sentences.
- the beam search only considered sentences composed of words from a predefined 1,152-word vocabulary, which contained common words that are relevant for assistive-communication applications.
- the most likely sentence at any point in the task was always visible to the participant (FIG.15D).
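To make the vocabulary-constrained search described above more concrete, the following is a simplified sketch. It is not the algorithm used in the study: the unigram log-probability table stands in for the language model, and the names `vocab_logp`, `lm_weight`, and `beam_width` are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Beam = Tuple[Tuple[str, ...], str, float]  # (finished words, partial word, log prob)


def constrained_beam_search(letter_probs: List[Dict[str, float]],
                            vocab_logp: Dict[str, float],
                            beam_width: int,
                            lm_weight: float = 1.0) -> List[Beam]:
    """Keep the most likely letter sequences that only spell vocabulary words."""
    prefixes = {w[:k] for w in vocab_logp for k in range(1, len(w) + 1)}
    beams: List[Beam] = [((), "", 0.0)]
    for probs in letter_probs:
        candidates: List[Beam] = []
        for words, partial, score in beams:
            for letter, p in probs.items():
                new_partial = partial + letter
                if p <= 0.0 or new_partial not in prefixes:
                    continue  # impossible letter, or it would leave the vocabulary
                new_score = score + math.log(p)
                # Option 1: keep spelling the current word.
                candidates.append((words, new_partial, new_score))
                # Option 2: a full word was spelled, so close it (insert a space)
                # and add the weighted language-model log probability.
                if new_partial in vocab_logp:
                    candidates.append((words + (new_partial,), "",
                                       new_score + lm_weight * vocab_logp[new_partial]))
        beams = sorted(candidates, key=lambda b: b[2], reverse=True)[:beam_width]
    return beams


def render(beam: Beam) -> str:
    """Join finished words and any partial word into a display string."""
    words, partial, _ = beam
    return " ".join(words + ((partial,) if partial else ()))


# Hypothetical unigram prior and per-window letter probabilities.
vocab_logp = {"i": math.log(0.4), "am": math.log(0.3),
              "a": math.log(0.2), "good": math.log(0.1)}
letter_probs = [{"i": 0.9, "a": 0.1}, {"a": 0.8, "i": 0.2}, {"m": 0.9, "n": 0.1}]
beams = constrained_beam_search(letter_probs, vocab_logp, beam_width=4)
print(render(beams[0]))  # i am
```

In this toy example the letter sequence "iam" is split into "i am" because "ia" is not a prefix of any vocabulary word, which mirrors how spaces can be inserted without an explicit space command.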
- The participant was instructed to attempt to squeeze his right hand to disengage the spelling procedure (FIG.15H).
- the neural classifier predicted the probability of this attempted hand-motor movement from each 2.5-second window of neural features, and if this probability was greater than 80%, the spelling procedure was stopped and the decoded sentence was finalized (FIG.15I). To finalize the sentence, sentences with incomplete words were first removed from the list of potential candidates, and then the remaining sentences were rescored with a separate language model. The most likely sentence was then updated on the participant’s screen (FIG.15G). After a brief delay, the screen was cleared and the task continued to the next trial. [00484] To train the detection and classification models prior to real-time testing, we collected data as the participant performed an isolated-target task.
- WER word error rate
- CER character error rate
- WPM words per minute
- CPM characters per minute
- Electrode contributions for the HGA model were primarily localized to the ventral portion of the grid, corresponding to the ventral sensorimotor cortex (vSMC), pars opercularis, and pars triangularis (FIG.17B).
- Contributions for the LFS model were much more diffuse, covering more dorsal and posterior parts of the grid corresponding to dorsal aspects of the vSMC in the pre- and postcentral gyri (FIG.17D).
- PCA principal component analysis
- each utterance represented a single class, and distances were only computed between utterances of the same type.
- a larger nearest-class distance for a code word or letter indicates that that utterance is more discriminable in neural feature space because the neural activation patterns associated with silent attempts to produce it are more distinct from other code words or letters, respectively.
- Median word error rates (WERs) were 12.4% (99% CI [8.01, 22.7]), 11.1% (99% CI [8.01, 23.1]), and 13.3% (99% CI [7.69, 28.3]), respectively (FIG.20B; WER was 10.53% (99% CI [5.76, 24.8]) for the original vocabulary).
- BCIs spelling brain-computer interfaces
- In addition to enabling spatial coverage over the lateral speech-motor cortical brain regions, the implanted ECoG array also provided simultaneous access to neural populations in the hand-motor (“hand knob”) cortical area that is typically implicated during executed or attempted hand movements 37 . Our approach is the first to combine the two cortical areas to control a BCI.
- The participant, who was 36 years old at the start of the study, was diagnosed with severe spastic quadriparesis and anarthria by neurologists and a speech-language pathologist after experiencing an extensive pontine stroke. He is fully cognitively intact. Although he retains the ability to vocalize grunts and moans, he is unable to produce intelligible speech, and his attempts to speak aloud are abnormally effortful due to his condition (according to self-reported descriptions). He typically relies on assistive computer-based interfaces that he controls with residual head movements to communicate.
- the neural implant device consisted of a high-density electrocorticography (ECoG) array (PMT) and a percutaneous connector (Blackrock Microsystems).
- the ECoG array contained 128 disk-shaped electrodes arranged in a lattice formation with 4-mm center-to-center spacing.
- the array was surgically implanted on the pial surface of the left hemisphere of the brain over cortical regions associated with speech production, including the dorsal posterior aspect of the inferior frontal gyrus, the posterior aspect of the middle frontal gyrus, the precentral gyrus, and the anterior aspect of the postcentral gyrus 8,10,32 .
- the percutaneous connector was implanted in the skull to conduct electrical signals from the ECoG array to a detachable digital headstage and cable (NeuroPlex E; Blackrock Microsystems), minimally processing and digitizing the acquired brain activity and transmitting the data to a computer.
- the device was implanted in February 2019 without any surgical complications.
- The digital hub then transmitted the digitized signals through an optical fiber cable to a Neuroport system (Blackrock Microsystems), which applied noise cancellation and an anti-aliasing filter to the signals before streaming them at 1 kHz through an Ethernet connection to a separate real-time computer (Colfax International).
- rtNSR custom Python software package
- the neural classifier further downsampled these feature streams by a factor of 6 before using them for inference (using an anti-aliasing filter with a cutoff frequency at 16.67 Hz), but the speech detector did not.
- [00514] We performed all data collection and real-time decoding tasks in a small office room near the participant’s residence. We uploaded data to our lab’s server infrastructure and trained the decoding models using NVIDIA V100 GPUs hosted on this infrastructure. Additional information regarding the recording hardware, task-setup procedures with the participant, and clinical trial protocol are provided in our previous work 16 .
- Task design [00515] We recorded neural data with the participant during two general types of tasks: an isolated-target task and a sentence-spelling task (FIG.21).
- The sentence-spelling task is described in the start of the Results section and in FIG.15. Briefly, the participant used the full spelling pipeline (described in the following subsection) to either spell sentences presented to him as targets in a copy-typing task condition or to spell arbitrary sentences in a conversational task condition. We did not implement functionality to allow the participant to retroactively alter the predicted sentence, although the language model could alter previously predicted words in a sentence after receiving additional character predictions. Data collected during the sentence-spelling task were used to optimize beam-search hyperparameters and evaluate the full spelling pipeline. Modeling [00517] We fit detection and classification models using data collected during the isolated-target task as the participant attempted to produce code words and the hand-motor command.
- The speech-detection model was trained using supervised learning and truncated backpropagation through time. For training, we labeled each time point in the neural data as one of four classes depending on the current state of the task at that time: ‘rest’, ‘speech preparation’, ‘motor’, and ‘speech.’ Though only the speech probabilities were used during real-time evaluation to engage the spelling system, the other labels were included during training to help the detection model disambiguate attempts to speak from other behavior. See Method S2 and FIG.23 for further details about the speech-detection model.
- Classifier ensembling for sentence spelling During sentence spelling, we used model ensembling to improve classification performance by reducing overfitting and unwanted modeling variance caused by random parameter initializations 40 . Specifically, we trained 10 separate classification models using the same training dataset and model architecture but with different random parameter initializations. Then, for each time window of neural activity $x_i$, we averaged the predictions from these 10 different models together to produce the final prediction $\hat{y}_i$. Incremental classifier recalibration for sentence spelling [00524] To improve sentence-spelling performance, we trained the classifiers used during sentence spelling on data recorded during sentence-spelling tasks from preceding sessions (in addition to data from the isolated-target task).
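The prediction averaging used for ensembling can be sketched as follows. The function name is hypothetical, and the example uses three models over three classes for brevity, whereas the text above describes 10 models.

```python
from typing import List


def ensemble_average(model_predictions: List[List[float]]) -> List[float]:
    """Average class-probability vectors from several independently trained models."""
    n_models = len(model_predictions)
    n_classes = len(model_predictions[0])
    return [sum(pred[c] for pred in model_predictions) / n_models
            for c in range(n_classes)]


# Hypothetical outputs of three models for one window of neural activity.
predictions = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.2, 0.1],
]
final_prediction = ensemble_average(predictions)
print(final_prediction)
```

Because each input vector sums to 1, the averaged vector also sums to 1 and can be passed on as a probability distribution.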
- Performance evaluation Character error rate (CER) and word error rate (WER): [00533] Because CER and WER are overly influenced by short sentences, as in previous studies 6,16 we reported CER and WER as the sum of the character or word edit distances between each of the predicted and target sentences in a sentence-spelling block divided by the total number of characters or words across all target sentences in the block. Each block contained between two and five sentence trials.
- Characters and words per minute [00535] We calculated the characters per minute and words per minute rates for each sentence-spelling (copy-typing) block as follows: $$\text{rate} = \frac{\sum_i N_i}{\sum_i D_i}$$ [00536] Here, $i$ indexes each trial, $N_i$ denotes the number of words or characters (including whitespace characters) decoded for trial $i$, and $D_i$ denotes the duration of trial $i$ (in minutes; computed as the difference between the time at which the window of neural activity corresponding to the final code word in trial $i$ ended and the time of the go cue of the first code word in trial $i$). Electrode contributions [00537] To compute electrode contributions using data recorded during the isolated-target task, we computed the derivative of the classifier’s loss function with respect to the input features across time as in Simonyan et al.
- N is the number of features (128 for HGA and for LFS; 256 for HGA+LFS), T is the number of time points in each 2.5-second window, and C is the number of NATO code words (26), by concatenating the trial-averaged activity for each feature.
- Brain2Char a deep architecture for decoding text from brain recordings. J. Neural Eng.17, 066015 (2020). [00561] 16. Moses, D. A. et al. Neuroprosthesis for Decoding Speech in a Paralyzed Person with Anarthria. N. Engl. J. Med.385, 217–227 (2021). [00562] 17. Adolphs, S. & Schmitt. Lexical Coverage of Spoken Discourse. Appl. Linguist.24, 425–438 (2003). [00563] 18. van Tilborg, A. & Deckers, S. R. J. M. Vocabulary Selection in AAC: Application of Core Vocabulary in Atypical Populations. Perspect. ASHA Spec.
- EXAMPLE 4 PARTICIPANT SURVEY ON OVERT- VERSUS SILENT-SPEECH ATTEMPTS [00589] We asked the participant the following questions about controlling the spelling system using either silent or overt attempts to speak. The participant’s responses are provided after each question. [00590] 1. How long do you think you could comfortably use the spelling system for communication with overt-speech attempts? Response: 15 minutes [00591] 2. How long do you think you could comfortably use the spelling system for communication with silent-speech attempts? Response: 30 minutes [00592] 3. Can you please rank your comfort using the spelling system with overt-speech attempts on a scale from 1–10? Response: 5 [00593] 4.
- EXAMPLE 5 DATA RE-NORMALIZATION [00598]
- a running 30-second z-score on all neural features (see FIG.22).
- the neural activity recorded during the participant’s attempts to squeeze his right hand typically differed in signal magnitude when compared to activity recorded during silent-speech attempts.
- some isolated-target task blocks with only speech content letter and NATO code-word trials
- only attempted hand-movement trials had different neural-feature baselines than isolated-target blocks with both speech and hand-movement trials.
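A running z-score of the kind mentioned above can be sketched for a single feature channel as follows. This is an illustrative sketch only: it assumes a causal window that includes the current sample, and at a 200 Hz feature rate a 30-second window would correspond to `window_size = 6000` samples. The function name and the short example are hypothetical.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator


def running_zscore(samples: Iterable[float], window_size: int) -> Iterator[float]:
    """Z-score each sample against the most recent `window_size` samples."""
    window: Deque[float] = deque()
    total = 0.0
    total_sq = 0.0
    for x in samples:
        window.append(x)
        total += x
        total_sq += x * x
        if len(window) > window_size:
            old = window.popleft()     # drop the sample that left the window
            total -= old
            total_sq -= old * old
        n = len(window)
        mean = total / n
        variance = max(total_sq / n - mean * mean, 0.0)  # guard against round-off
        std = math.sqrt(variance)
        yield (x - mean) / std if std > 0 else 0.0


# Tiny example with a 2-sample window (a real stream would use 6000 samples).
print(list(running_zscore([1.0, 2.0, 3.0, 4.0], window_size=2)))
```

Because the statistics follow the recent signal, slow baseline shifts between blocks with different content are absorbed by the normalization rather than passed to the models.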
- Isolated-target task [00600] We recorded the participant’s neural activity as he silently (or sometimes overtly) attempted to say prompted utterances or perform prompted motor movements during an isolated-target task. As described in the Methods section of the main text, each trial of the isolated-target task began with the textual presentation of a single speech or motor target on the participant’s screen with 4 dots on either side of the text. These dots disappeared one at a time (simultaneously on each side of the text) at a constant rate, providing task timing to the participant. As the final dot disappeared, the text target turned green, representing a go cue. At this go cue, the participant was instructed to attempt to produce the target.
- Time points between a go cue and 2 seconds after that go cue for attempted hand squeezes were labeled as motor. Time points between the end of the allotted time period for an attempt (1 second after the go cue for speech or 2 seconds for hand squeezes) and the end of that trial (when the screen cleared for an inter-trial interval) were not trained on. Training data for the speech detector included blocks of the attempted motor isolated-target task. For blocks containing only attempted motor movements, time points during attempted motor trials that were not the attempted hand squeeze were ignored. All other time points were labeled as rest. [00605] The speech detector used both low-frequency signals (LFS) and high-gamma activity (HGA) as features at 200 Hz.
- Model architecture and training [00606] We used Python 3.6.6 and PyTorch 1.6.0 to create and train the speech detector [1].
- the speech detector contained a stack of 3 long short-term memory (LSTM) layers with 100, 50, and 50 nodes, respectively.
- the LSTM layers were followed by a single fully connected layer that projected the latent dimensions to probabilities across the four classes (speech preparation, speech, rest, and motor).
- the model processed each time point continuously from the feature stream, outputting a continuous stream of probabilities (one predicted probability vector per neural-feature time point at 200 Hz).
- a schematic of the model is shown in FIG.23.
- Cross-entropy loss is originally defined as: where: • P: The true distribution of the classes, determined by the assigned class labels l. • N: The number of samples. • H P,Q (l
- The cross-entropy loss defined in Equation S1 is redefined as: where $w_n$ is the penalty weight for sample $n$ and is defined as: [00609]
- BPTT truncated backpropagation through time
- the speech probabilities were first temporally smoothed using a moving window average. Then, we binarized the smoothed probabilities using a probability threshold. Finally, we “de-bounced” these binarized values by requiring that a change in binary state (from absence of speech to presence of speech, or vice versa) must last for longer than a certain duration of time before the change is deemed a speech onset or offset.
- These 3 parameter values were chosen via hyperparameter optimization and are listed in Table S2.
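The three post-processing stages (smoothing, thresholding, and de-bouncing) can be sketched as follows. This is a hedged illustration: the function names, the assumption that the stream starts in the non-speech state, and the choice to report an event at the sample where the change is confirmed are all assumptions, and the parameter values are placeholders rather than the optimized values.

```python
from typing import List, Tuple


def smooth(probs: List[float], window: int) -> List[float]:
    """Causal moving-window average of the speech probabilities."""
    out, total = [], 0.0
    for i, p in enumerate(probs):
        total += p
        if i >= window:
            total -= probs[i - window]
        out.append(total / min(i + 1, window))
    return out


def debounce(binary: List[bool], min_samples: int) -> List[bool]:
    """Accept a state change only after it persists for more than min_samples."""
    state, run, stable = False, 0, []
    for b in binary:
        if b != state:
            run += 1
            if run > min_samples:
                state, run = b, 0
        else:
            run = 0
        stable.append(state)
    return stable


def detect_events(probs: List[float], window: int, threshold: float,
                  min_samples: int) -> List[Tuple[int, str]]:
    """Return (sample index, 'onset' or 'offset') for each confirmed change."""
    binary = [p > threshold for p in smooth(probs, window)]
    stable = debounce(binary, min_samples)
    events, previous = [], False
    for i, s in enumerate(stable):
        if s != previous:
            events.append((i, "onset" if s else "offset"))
        previous = s
    return events


# Hypothetical probability stream with one brief blip and one sustained attempt.
probs = [0.1, 0.9, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
print(detect_events(probs, window=1, threshold=0.5, min_samples=2))
```

With these placeholder settings the single-sample blip at index 1 is ignored, and only the sustained run produces an onset and an offset.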
- Hyperparameter optimization [00612] The hyperparameter optimization process is identical to our previous work [2]. In brief, we used the hyperopt Python package [4] to optimize the 3 detection hyperparameters by minimizing a cost function based on a detection score.
- the detection score is a measure encompassing both how accurately individual time points were predicted as speech or non-speech and how accurately the detector identified attempted-speech events in general.
- The cost function used to optimize the hyperparameters seeks to maximize the detection score while minimizing the time-threshold parameter (because we wanted to minimize the amount of time required to detect a silent-speech attempt).
- Classification model Data preparation [00614] We trained the classifier using data from isolated-target task blocks containing trials of the 26 NATO code words, blocks containing trials of the 26 NATO code words and the attempted right-hand squeeze, and blocks containing a variety of attempted motor movements including the attempted hand squeeze (from which we only used the attempted hand squeeze). For the classifiers used during the feature-type, speech-type, and utterance-set comparisons, only data from isolated-target task blocks were used. [00615] During training of the classifiers for real-time sentence spelling (and associated offline analyses), we also included sentence-spelling (copy-typing) trials in which the decoded sentence had a 0.0 character error rate (CER).
- sentence-spelling trials constituted 3.06% of the data for overt-speech attempts (preliminary sentence-spelling trials with overt-speech attempts were collected but not used during evaluation) and 22.7% of the data for silent-speech attempts.
- We never included sentence-spelling trials during classifier training that were recorded during the same session as (or, for associated offline analyses, a proceeding session of ) any trials that were used during testing; classifiers were not recalibrated or updated during an evaluation session.
- Modeling Architecture and training [00618] To model the temporal and spatial dynamics of the participant’s neural activity during silent-speech attempts, we trained artificial neural networks to classify which NATO code word (or the imagined hand squeeze) the participant had produced given a 2.5-second window of neural features after the associated go cue. We used gated recurrent unit (GRU) layers [5], which have been shown to outperform other recurrent architectures (such as long short-term memory networks) [6] on sequence tasks [7]. [00619] In the classifier, neural features were first processed by a 1-dimensional convolutional layer parameterized by weights W and bias term b.
- h n the output of hidden layer n
- h1,j element j of the output of hidden layer 1
- $\star$ denotes the valid cross-correlation operator
- C refers to the number of neural features in the input matrix x i .
- This representation was then passed into a stack of n GRU layers. Each unit was parameterized by Wi, bi, Wh, and bh, which are weights and biases that acted on the input and hidden states, respectively. Portions of each matrix were dedicated to a reset gate r t , an update gate zt, and a new gate nt.
- the GRU decided at each time point how much to update the hidden state from its previous value given the new activity (with the reset function incorporated) using z t .
- Each layer’s output h n is used as the input to the next layer.
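The gating described above can be illustrated with a single GRU time step. The equations are not reproduced in this passage, so this sketch assumes the widely used formulation in which $r_t$ and $z_t$ are sigmoid gates, $n_t$ is a tanh candidate with the reset gate applied to the hidden-state term, and $h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}$. Storing each gate's weights under separate dictionary keys (rather than as portions of stacked matrices) is a readability choice, and all names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def _affine(W: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W @ x + b with plain lists."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def _sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x: Vector, h: Vector, params: Dict[str, list]) -> Vector:
    """One GRU update: returns the new hidden state given input x and state h."""
    gi = {g: _affine(params["W_i" + g], x, params["b_i" + g]) for g in "rzn"}
    gh = {g: _affine(params["W_h" + g], h, params["b_h" + g]) for g in "rzn"}
    r = [_sigmoid(a + b) for a, b in zip(gi["r"], gh["r"])]   # reset gate
    z = [_sigmoid(a + b) for a, b in zip(gi["z"], gh["z"])]   # update gate
    n = [math.tanh(a + ri * b)                                # new (candidate) gate
         for a, ri, b in zip(gi["n"], r, gh["n"])]
    # z decides how much of the previous hidden state is kept.
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


# Toy example: 1 input feature, 1 hidden unit, all-zero parameters.
zero_params = {name: [[0.0]] for name in
               ("W_ir", "W_iz", "W_in", "W_hr", "W_hz", "W_hn")}
zero_params.update({name: [0.0] for name in
                    ("b_ir", "b_iz", "b_in", "b_hr", "b_hz", "b_hn")})
print(gru_step([1.0], [0.5], zero_params))  # [0.25]
```

With all-zero parameters both gates equal 0.5 and the candidate is 0, so the new state is half of the previous one, which makes the role of $z_t$ easy to verify by hand.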
- We used dropout [8] to randomly set elements of $h_n$ to 0.0 with probability $p_{dropout}$, which we determined through hyperparameter optimization.
- We used bidirectional GRU layers. This means that at each GRU, the input was copied, flipped backwards, and then used as an input to the network. This enabled us to learn forward and backward representations and use them as context when predicting class probabilities.
- Augmentations [00629] To bolster classifier performance, we used data augmentations, which have been shown to improve generalization and reduce overfitting for both images [11, 12] and neural activity [13, 14]. The following augmentations were applied sequentially to each trial of neural activity x i during training (but not testing), without changing the associated label y i : 1.
- Time jittering shift the neural features by a time shift ⁇ , such that: where j is a hyperparameter.
- Temporal masking set some time points of the neural features to 0, such that: where t0 is a randomly drawn time point within xi and p is the probability of ⁇ p being one, and the time points being set to 0. Both b and p are hyperparameters. 3.
- Scaling scale the magnitude of the neural features, such that: where ⁇ min and ⁇ max are hyperparameters. 4.
- Additive noise add a matrix of random Gaussian noise to the neural features xi, such that: where ⁇ n is a hyperparameter. 5.
- Channel-wise noise offset the neural features by a value randomly sampled from a Gaussian distribution to each channel c, such that: where ⁇ ch is a hyperparameter and is shared across all features.
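The five augmentations listed above can be sketched as follows. The defining equations are not reproduced in this passage, so the details here are assumptions: time shifts are drawn uniformly in samples with zero padding, a single masked span of fixed length is used, the scale factor is drawn uniformly between its bounds, and the noise terms are zero-mean Gaussians. All function and parameter names are hypothetical.

```python
import random
from typing import List

Trial = List[List[float]]  # trial[time][channel]


def time_jitter(x: Trial, max_shift: int, rng: random.Random) -> Trial:
    """Shift in time by a random number of samples, zero-padding the gap."""
    shift = rng.randint(-max_shift, max_shift)
    n_t, n_c = len(x), len(x[0])
    return [list(x[t - shift]) if 0 <= t - shift < n_t else [0.0] * n_c
            for t in range(n_t)]


def temporal_mask(x: Trial, mask_len: int, p: float, rng: random.Random) -> Trial:
    """With probability p, zero mask_len consecutive time points from a random t0."""
    out = [list(row) for row in x]
    if rng.random() < p:
        t0 = rng.randrange(len(x))
        for t in range(t0, min(t0 + mask_len, len(x))):
            out[t] = [0.0] * len(x[0])
    return out


def scale(x: Trial, a_min: float, a_max: float, rng: random.Random) -> Trial:
    """Multiply every value by one random factor drawn from [a_min, a_max]."""
    a = rng.uniform(a_min, a_max)
    return [[a * v for v in row] for row in x]


def additive_noise(x: Trial, sigma: float, rng: random.Random) -> Trial:
    """Add independent Gaussian noise to every time point and channel."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in x]


def channel_noise(x: Trial, sigma: float, rng: random.Random) -> Trial:
    """Add one Gaussian offset per channel, constant across time."""
    offsets = [rng.gauss(0.0, sigma) for _ in x[0]]
    return [[v + o for v, o in zip(row, offsets)] for row in x]


def augment(x: Trial, rng: random.Random, max_shift: int = 2, mask_len: int = 2,
            p_mask: float = 0.5, a_min: float = 0.9, a_max: float = 1.1,
            sigma_n: float = 0.01, sigma_ch: float = 0.01) -> Trial:
    """Apply the augmentations sequentially; the label is left unchanged."""
    x = time_jitter(x, max_shift, rng)
    x = temporal_mask(x, mask_len, p_mask, rng)
    x = scale(x, a_min, a_max, rng)
    x = additive_noise(x, sigma_n, rng)
    return channel_noise(x, sigma_ch, rng)


trial = [[float(t + c) for c in range(3)] for t in range(5)]  # 5 time points, 3 channels
augmented = augment(trial, random.Random(0))
print(len(augmented), len(augmented[0]))  # 5 3
```

The default parameter values are placeholders; in the described procedure these values are hyperparameters found by optimization.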
- Model pre-training and fine-tuning When training the ensemble of classifiers used for real-time sentence spelling, which were also subsequently used during offline analyses to evaluate the effect of the beam search, the language model, and different vocabulary sizes on the real-time copy-typing results, we first pre-trained models on overt-speech attempts and then fine-tuned them on silent-speech attempts. Specifically, we trained classifiers on an initial dataset containing overt-speech attempts with a learning rate of $10^{-3}$. We split this initial dataset into training and validation sets, and we early stopped models after the accuracy on the validation set did not improve for 5 epochs in a row and reset the model parameters to those corresponding to the highest validation accuracy.
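The early-stopping rule described above (stop after 5 epochs without improvement, then restore the best parameters) can be sketched as follows. The `run_epoch` callable is a hypothetical stand-in for one epoch of training followed by validation, and the accuracy values are invented for illustration.

```python
from typing import Any, Callable, Tuple


def fit_with_early_stopping(run_epoch: Callable[[int], Tuple[Any, float]],
                            patience: int = 5,
                            max_epochs: int = 1000) -> Tuple[Any, float]:
    """Train until validation accuracy fails to improve `patience` times in a row."""
    best_params, best_acc, epochs_without_gain = None, float("-inf"), 0
    for epoch in range(max_epochs):
        params, val_acc = run_epoch(epoch)  # train one epoch, then validate
        if val_acc > best_acc:
            best_params, best_acc, epochs_without_gain = params, val_acc, 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break
    # Return the snapshot with the highest validation accuracy.
    return best_params, best_acc


# Hypothetical validation accuracies per epoch; the parameter snapshot is a label.
accuracies = [0.20, 0.40, 0.50, 0.45, 0.50, 0.48, 0.49, 0.47, 0.60]
best = fit_with_early_stopping(lambda e: ("epoch-%d" % e, accuracies[e]),
                               max_epochs=len(accuracies))
print(best)  # ('epoch-2', 0.5)
```

In this example training stops after epoch 7, so the later value of 0.60 is never reached, which illustrates the trade-off inherent in a fixed patience.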
- Hyperparameter optimization For the classifiers, we optimized the number of layers, number of hidden nodes in each layer, kernel size, stride, dropout rate, and augmentation hyperparameters using the Asynchronous Hyperband (ASH) method [15] with the Ray software package. We used the Hyperopt software package to suggest the next set of hyperparameters after each evaluation run [16]. The search space and final values are detailed in S2, and we searched 300 possible sets of hyperparameters.
- $X$ is the set of windows of neural activity $x_1, \ldots, x_T$, $p_{nc}(l \mid X)$ is the probability under the neural classifier of $l$ given $X$, and $p_{lm}(l)$ is the probability of transcription $l$ under a language-model prior.
- Sentence finalization If the probability of the attempted hand movement (the sentence-finalization command) was greater than 80%, the predicted sentence was finalized. Specifically, we pruned the current list of candidate sentences (from the beam search) to remove sentences that contained incomplete or out-of-vocabulary words.
- $p_{\text{finalized}}(l) = p(l)\, p_{\text{gpt2}}(l)^{\alpha_{\text{gpt2}}}$, (S11)
- $p_{\text{finalized}}(l)$ is the finalized probability of sentence $l$
- $p(l)$ is the probability of the sentence $l$ under equation S10
- $p_{\text{gpt2}}(l)$ is the probability of $l$ using DistilGPT-2 [18]
- $\alpha_{\text{gpt2}}$ is a scaling parameter found through hyperparameter optimization.
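- A minimal sketch of the finalization step, working in log-probabilities so that the rescoring becomes a weighted sum. The function `gpt2_logprob` is a hypothetical stand-in for a sentence-level log-probability from a neural language model, and the candidate format (sentence, log-probability) is an assumption.

```python
def finalize_sentence(candidates, p_finalize, gpt2_logprob, vocab,
                      alpha_gpt2=1.0, threshold=0.8):
    """Return the best finalized sentence, or None if finalization is not triggered.

    candidates: list of (sentence, log_p) pairs from the beam search.
    p_finalize: probability of the sentence-finalization command.
    """
    if p_finalize <= threshold:
        return None  # command not detected with enough confidence
    rescored = []
    for sentence, log_p in candidates:
        words = sentence.split()
        # Prune candidates with incomplete or out-of-vocabulary words.
        if words and all(w in vocab for w in words):
            score = log_p + alpha_gpt2 * gpt2_logprob(sentence)
            rescored.append((sentence.strip(), score))
    if not rescored:
        return None
    return max(rescored, key=lambda c: c[1])[0]
```

Adding `alpha_gpt2 * gpt2_logprob(sentence)` to `log_p` is the log-domain form of multiplying the beam-search probability by the language-model probability raised to the power `alpha_gpt2`.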
- Hyperparameter optimization To find the optimal hyperparameters $\alpha$, $\beta$, $\alpha_{\text{gpt2}}$, and $B$, we collected an optimization dataset of copy-typing sentence-spelling data recorded across 3 sessions, which we used to tune these parameters before evaluating the performance of the spelling system. During these 3 sessions, the participant attempted to spell 35 of the 75 copy-typing sentences. Of these 35 sentences, there were 15 randomly selected sentences that the participant attempted 10 times, 5 sentences that the participant attempted 9 times, and 15 sentences that the participant attempted once. The remaining 40 sentences were unseen by the participant prior to real-time evaluation. We then used these sentences offline to optimize $\alpha$, $\beta$, $\alpha_{\text{gpt2}}$, and $B$.
- [00639] Algorithm 1 Constrained beam search.
- $A_{\text{space}}$ is the same set as $A$ but with the whitespace character appended after each letter (“a ”, “b ”, “c ”, ... , “z ”).
- $W(l)$ segments the sequence of characters $l$ at each space and truncates any characters trailing the last space, yielding a list of completed words in $l$.
- The language-model term conditioned on $W(l)$ gives the probability of the last word in $l^{+}$ given the $n - 1$ preceding words, enabling the use of an n-gram language model.
- The probability threshold for characters to be considered in the beam search was set to $10^{-3}$.
- B is the beam width (the number of beams used in the beam search).
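- A simplified sketch of a vocabulary-constrained beam search over per-window letter probabilities is given below. It is not Algorithm 1 itself: the scoring omits any word-insertion term, the optional `word_logprob` callable and its weight `alpha` are hypothetical stand-ins for a language-model score, and the function and parameter names are assumptions.

```python
import math


def constrained_beam_search(letter_probs_seq, vocab, beam_width=3,
                            threshold=1e-3, word_logprob=None, alpha=1.0):
    """Return up to `beam_width` (text, log_score) beams, best first.

    letter_probs_seq: one dict per time window mapping a letter to its probability.
    vocab: set of allowed words; partial words must be prefixes of a vocabulary word.
    """
    prefixes = {w[:i] for w in vocab for i in range(1, len(w) + 1)}
    beams = [("", 0.0)]
    for probs in letter_probs_seq:
        extended = []
        for text, score in beams:
            partial = text.rsplit(" ", 1)[-1]  # letters after the last space
            for ch, p in probs.items():
                if p <= threshold:
                    continue  # letter not considered by the search
                word = partial + ch
                if word in prefixes:
                    # Letter without a space: the word is still being spelled.
                    extended.append((text + ch, score + math.log(p)))
                if word in vocab:
                    # Letter with a trailing space: the word is completed.
                    bonus = 0.0
                    if word_logprob is not None:
                        context = text.split(" ")[:-1]  # completed words so far
                        bonus = alpha * word_logprob(word, context)
                    extended.append((text + ch + " ", score + math.log(p) + bonus))
        if not extended:
            break  # no valid continuation: keep the last valid beams
        extended.sort(key=lambda b: b[1], reverse=True)
        beams = extended[:beam_width]
    return beams
```

Each letter is considered both with and without a trailing space, which corresponds to drawing extensions from the letter set and from its space-appended counterpart.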
- No-beam edge case For 3 of the copy-typing sentence-spelling trials recorded during the real-time evaluation sessions, the beam search ran out of valid sentences. This occurred if the participant made a mistake such that no letter sequence that could make valid sentence candidates surpassed the threshold for consideration by the beam search. [00642] On the first day of the real-time evaluation sessions, if this occurred, we would simply output the most likely letters obtained from the neural classifier (without any spaces). Before the second day of real-time evaluation, we modified the beam-search algorithm to output the most likely sentence candidate at that point (immediately before the beam search contained no valid sentence candidates) and then subsequently output the most likely letters obtained from the neural classifier for the remainder of the trial.
- We modified the beam-search algorithm so that if fewer than 3 letters (and their counterparts with spaces) had probability > $10^{-3}$, the probability threshold for a letter to be considered in the beam search, we considered the 13 most likely letters (and their counterparts with spaces) to avoid running out of valid beams.
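- The thresholding rule with its fallback can be sketched as a small helper. The function name `candidate_letters` and the dictionary input format are assumptions made for illustration.

```python
def candidate_letters(letter_probs, threshold=1e-3, min_letters=3, fallback_k=13):
    """Select letters for the beam search, each with and without a trailing space.

    letter_probs: dict mapping each letter to its classifier probability.
    """
    above = [ch for ch, p in letter_probs.items() if p > threshold]
    if len(above) < min_letters:
        # Too few letters pass the threshold: fall back to the most likely ones.
        ranked = sorted(letter_probs, key=letter_probs.get, reverse=True)
        above = ranked[:fallback_k]
    # Every selected letter is offered alone and with a space appended.
    return [c for ch in above for c in (ch, ch + " ")]
```

When the classifier is very confident in a single letter, almost no letters pass the threshold, and the fallback keeps alternative spellings alive in case that confident prediction was a mistake.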
- Section S5. Language modeling n-gram modeling [00643] During the beam-search process, as we were updating each beam with a new character, we used a trigram language model because it was reliable while also being capable of producing predictions more quickly than a large neural network-based language model.
- The basic n-gram formulation defines the probability of a word $w_k$ in position $k$ as $p(w_k \mid w_{k-n+1}, \ldots, w_{k-1}) = \dfrac{C(w_{k-n+1}, \ldots, w_k)}{C(w_{k-n+1}, \ldots, w_{k-1})}$, where $C$ is a function that counts the number of times each n-gram happens in a corpus.
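- The count-based estimate can be made concrete with a short sketch: the probability of a word given its context is the count of the full n-gram divided by the count of the context. The helper names and the toy corpus are illustrative only.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def mle_ngram_prob(word, context, counts_n, counts_context):
    """Maximum-likelihood estimate: C(context + word) / C(context)."""
    ctx = tuple(context)
    denom = counts_context[ctx]
    return counts_n[ctx + (word,)] / denom if denom else 0.0


tokens = "i am good i am fine i am good".split()
trigrams = ngram_counts(tokens, 3)
bigrams = ngram_counts(tokens, 2)
p_good = mle_ngram_prob("good", ["i", "am"], trigrams, bigrams)
```

Here "i am" occurs 3 times and "i am good" occurs twice, so the estimate is 2/3. An unseen trigram gets probability 0, which is the sparsity problem that back-off and discounting address.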
- Improved n-gram modeling can be achieved with back-off and discounting [19]. Back-off refers to using lower-order n-gram models to estimate the probability of higher-order n-grams, since high-order n-grams can be sparse.
- With back-off, the higher-order n-gram probability directly depends on the lower-order n-gram probability p(w_i | …).
- Kneser-Ney smoothing [21] improves the unigram model implicit in S13 by replacing it with word fertility, which represents the number of distinct context types that a word occurs in.
- Using word context fertility, we can write the following proportion: $p(w) \propto \dfrac{w' + \delta_{kn}}{\sum_{v \in V} v' + N \delta_{kn}}$, where $w'$ is the word fertility of $w$, $V$ is the set of words in the training vocabulary, $N$ is the total number of words in the vocabulary, and $\delta_{kn}$ is a smoothing hyperparameter that prevents unseen words from having a probability of 0 and infrequent words from being penalized too heavily.
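- A fertility-based unigram estimate can be sketched as follows. The additive-smoothing form used here, (fertility + delta) divided by (total fertility + N times delta), is an assumption inferred from the symbol definitions above, and the function name is hypothetical.

```python
from collections import defaultdict


def fertility_unigram(bigrams, vocab, delta_kn):
    """Unigram probabilities from word fertility with additive smoothing.

    bigrams: iterable of (previous_word, word) pairs seen in the corpus.
    vocab: iterable of all words in the training vocabulary.
    delta_kn: smoothing constant added to every word's fertility.
    """
    vocab = list(vocab)
    contexts = defaultdict(set)
    for prev, word in bigrams:
        contexts[word].add(prev)
    # Fertility: the number of distinct preceding words a word follows.
    fertility = {w: len(contexts[w]) for w in vocab}
    total = sum(fertility.values()) + len(vocab) * delta_kn
    return {w: (fertility[w] + delta_kn) / total for w in vocab}
```

A word that is frequent but always follows the same context (low fertility) receives less mass than a word that appears after many different contexts, and a word with no observed contexts still receives a non-zero probability through `delta_kn`.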
- Sentence-finalization language model [00651] To score sentences after finalization during sentence spelling, we used the DistilGPT-2 neural network-based language model [18], which is based on OpenAI’s GPT-2 language model [24] but has fewer parameters.
- Supplementary References
- [00652] 1. Paszke A, Gross S, Massa F, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In: Advances in Neural Information Processing Systems 32. Ed. by Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, and Garnett R. Curran Associates, Inc., 2019:8024–35.
- [00653] 2. Kingma DP and Ba J. Adam: A method for stochastic optimization.
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Surgery (AREA)
- Molecular Biology (AREA)
- Animal Behavior & Ethology (AREA)
- Medical Informatics (AREA)
- Heart & Thoracic Surgery (AREA)
- Biophysics (AREA)
- Pathology (AREA)
- Veterinary Medicine (AREA)
- Psychiatry (AREA)
- Artificial Intelligence (AREA)
- Psychology (AREA)
- Physiology (AREA)
- Neurosurgery (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Fuzzy Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Neurology (AREA)
- Human Computer Interaction (AREA)
- Theoretical Computer Science (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Dermatology (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Electrically Operated Instructional Devices (AREA)
Priority Applications (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020237043726A KR20240024095A (ko) | 2021-05-26 | 2022-05-26 | 신경 활동으로부터의 실시간 단어 및 음성 디코딩을 위한 방법들 및 디바이스들 |
| EP22812144.8A EP4329615A4 (en) | 2021-05-26 | 2022-05-26 | Methods and devices for real-time word and speech decoding from neural activity |
| JP2023572722A JP2024521768A (ja) | 2021-05-26 | 2022-05-26 | 神経活動からのリアルタイム単語及び発話復号のための方法及びデバイス |
| US18/561,981 US20240366157A1 (en) | 2021-05-26 | 2022-05-26 | Methods And Devices For Real-Time Word And Speech Decoding From Neural Activity |
| AU2022282378A AU2022282378A1 (en) | 2021-05-26 | 2022-05-26 | Methods and devices for real-time word and speech decoding from neural activity |
| CN202280052326.1A CN117693315A (zh) | 2021-05-26 | 2022-05-26 | 用于从神经活动进行实时单词和语音解码的方法和装置 |
| CA3220064A CA3220064A1 (en) | 2021-05-26 | 2022-05-26 | Methods and devices for real-time word and speech decoding from neural activity |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163193351P | 2021-05-26 | 2021-05-26 | |
| US63/193,351 | 2021-05-26 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2022251472A1 (en) | 2022-12-01 |
| WO2022251472A9 (en) | 2023-11-09 |
Family
ID=84229189
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2022/031101 Ceased WO2022251472A1 (en) | 2021-05-26 | 2022-05-26 | Methods and devices for real-time word and speech decoding from neural activity |
Country Status (8)
| Country | Link |
|---|---|
| US (1) | US20240366157A1 |
| EP (1) | EP4329615A4 |
| JP (1) | JP2024521768A |
| KR (1) | KR20240024095A |
| CN (1) | CN117693315A |
| AU (1) | AU2022282378A1 |
| CA (1) | CA3220064A1 |
| WO (1) | WO2022251472A1 |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12469507B2 (en) | 2023-06-14 | 2025-11-11 | Microsoft Technology Licensing, Llc | Predictive context-based decoder correction |
| US20240419895A1 (en) * | 2023-06-14 | 2024-12-19 | Microsoft Technology Licensing, Llc | Context-based decoder correction |
| US12588859B2 (en) * | 2024-03-21 | 2026-03-31 | The Education University Of Hong Kong | System and method for interacting with human brain activities using EEG-fNIRS neurofeedback |
| CN118766414B (zh) * | 2024-04-30 | 2025-05-09 | 中国科学院心理研究所 | 一种基于跨模态脑功能连接图谱的书写能力脑指纹构建方法及系统 |
| CN119691418B (zh) * | 2024-11-26 | 2025-11-14 | 神州数码(中国)有限公司 | 模型评估方法、装置、电子设备和计算机可读存储介质 |
| CN121096350A (zh) * | 2025-09-03 | 2025-12-09 | 哈尔滨工业大学 | 一种基于脑电信号的语音解码方法、系统、设备及介质 |
2022
- 2022-05-26 WO PCT/US2022/031101 patent/WO2022251472A1/en not_active Ceased
- 2022-05-26 CA CA3220064A patent/CA3220064A1/en active Pending
- 2022-05-26 AU AU2022282378A patent/AU2022282378A1/en active Pending
- 2022-05-26 CN CN202280052326.1A patent/CN117693315A/zh active Pending
- 2022-05-26 US US18/561,981 patent/US20240366157A1/en active Pending
- 2022-05-26 KR KR1020237043726A patent/KR20240024095A/ko active Pending
- 2022-05-26 JP JP2023572722A patent/JP2024521768A/ja active Pending
- 2022-05-26 EP EP22812144.8A patent/EP4329615A4/en active Pending
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150338917A1 (en) * | 2012-12-26 | 2015-11-26 | Sia Technology Ltd. | Device, system, and method of controlling electronic devices via thought |
| US20190351230A1 (en) * | 2014-06-13 | 2019-11-21 | Neuvana, Llc | Transcutaneous electrostimulator and methods for electric stimulation |
| US20190333505A1 (en) | 2018-04-30 | 2019-10-31 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and Methods for Decoding Intended Speech from Neuronal Activity |
| WO2021021714A1 (en) | 2019-07-29 | 2021-02-04 | The Regents Of The University Of California | Method of contextual speech decoding from the brain |
Non-Patent Citations (4)
| Title |
|---|
| BEUKELMAN, AUGMENTATIVE AND ALTERNATIVE COMMUNICATION, vol. 23, no. 3, 2007, pages 230 - 242 |
| CHRISTIAN HERFF ET AL., BRAIN-TO-TEXT: DECODING SPOKEN PHRASES FROM PHONE REPRESENTATIONS IN THE BRAIN |
| FELGOISE ET AL., AMYOTROPHIC LATERAL SCLEROSIS AND FRONTOTEMPORAL DEGENERATION, vol. 17, no. 3-4, 2016, pages 179 - 183 |
| See also references of EP4329615A4 |
Cited By (29)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11790169B2 (en) * | 2021-04-02 | 2023-10-17 | Salesforce, Inc. | Methods and systems of answering frequently asked questions (FAQs) |
| US20220318501A1 (en) * | 2021-04-02 | 2022-10-06 | Salesforce.Com, Inc. | Methods and systems of answering frequently asked questions (faqs) |
| WO2024036213A1 (en) * | 2022-08-09 | 2024-02-15 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and methods for decoding speech from neural activity |
| US12530535B2 (en) * | 2022-10-31 | 2026-01-20 | Zoom Communications, Inc. | Intelligent prediction of next step sentences from a communication session |
| US20240143936A1 (en) * | 2022-10-31 | 2024-05-02 | Zoom Video Communications, Inc. | Intelligent prediction of next step sentences from a communication session |
| CN116206150A (zh) * | 2023-01-09 | 2023-06-02 | 阿里巴巴(中国)有限公司 | 任务处理方法、地物分类方法及任务模型训练方法 |
| US20240264670A1 (en) * | 2023-02-03 | 2024-08-08 | Georgia Tech Research Corporation | Systems and Methods for Determining the Coupling Response of a Non-Linear Variant System |
| CN116225222A (zh) * | 2023-02-26 | 2023-06-06 | 北京航空航天大学 | 基于轻量级梯度提升决策树的脑机交互意图识别方法及系统 |
| US20240398317A1 (en) * | 2023-06-05 | 2024-12-05 | Northwestern University | Method and system to decode speech production from non-frontal, non-post-central brain cortices |
| WO2024254360A1 (en) * | 2023-06-06 | 2024-12-12 | The Regents Of The University Of California | Methods and systems for translation of neural activity into embodied digital-avatar animation |
| WO2025076530A1 (en) * | 2023-10-06 | 2025-04-10 | Precision Neuroscience Corporation | Systems and methods for visualizing brain activity in real time at high spatial and temporal resolution |
| CN117058514A (zh) * | 2023-10-12 | 2023-11-14 | 之江实验室 | 基于图神经网络的多模态脑影像数据融合解码方法和装置 |
| WO2025080841A1 (en) | 2023-10-12 | 2025-04-17 | The Regents Of The University Of California | Methods for inside-out deployment of microelectrode arrays and devices and systems for same |
| CN117058514B (zh) * | 2023-10-12 | 2024-04-02 | 之江实验室 | 基于图神经网络的多模态脑影像数据融合解码方法和装置 |
| CN117130490B (zh) * | 2023-10-26 | 2024-01-26 | 天津大学 | 一种脑机接口控制系统及其控制方法和实现方法 |
| CN117131426B (zh) * | 2023-10-26 | 2024-01-19 | 一网互通(北京)科技有限公司 | 基于预训练的品牌识别方法、装置及电子设备 |
| US12170081B1 (en) | 2023-10-26 | 2024-12-17 | Tianjin University | Speech brain-computer interface neural decoding systems based on chinese language and implementation methods thereof |
| CN117130490A (zh) * | 2023-10-26 | 2023-11-28 | 天津大学 | 一种脑机接口控制系统及其控制方法和实现方法 |
| CN117131426A (zh) * | 2023-10-26 | 2023-11-28 | 一网互通(北京)科技有限公司 | 基于预训练的品牌识别方法、装置及电子设备 |
| CN117238277A (zh) * | 2023-11-09 | 2023-12-15 | 北京水滴科技集团有限公司 | 意图识别方法、装置、存储介质及计算机设备 |
| CN117238277B (zh) * | 2023-11-09 | 2024-01-19 | 北京水滴科技集团有限公司 | 意图识别方法、装置、存储介质及计算机设备 |
| CN117851769A (zh) * | 2023-11-30 | 2024-04-09 | 浙江大学 | 一种面向侵入式脑机接口的汉字书写解码方法 |
| US20250218434A1 (en) * | 2023-12-29 | 2025-07-03 | Cx360, Inc. | Automated prompt finder |
| CN117708546B (zh) * | 2024-02-05 | 2024-05-10 | 北京智冉医疗科技有限公司 | 基于侵入式脑机接口的高通量神经信号的解码方法及装置 |
| CN117708546A (zh) * | 2024-02-05 | 2024-03-15 | 北京智冉医疗科技有限公司 | 基于侵入式脑机接口的高通量神经信号的解码方法及装置 |
| CN118095447A (zh) * | 2024-04-12 | 2024-05-28 | 清华大学 | 大语言模型分布式推理方法及装置、介质 |
| CN118095295A (zh) * | 2024-04-28 | 2024-05-28 | 昆明理工大学 | 渐进式预训练和提示增强低资源语言的跨语言摘要方法 |
| WO2025235580A1 (en) * | 2024-05-08 | 2025-11-13 | The Regents Of The University Of California | Systems and methods for decoding biosignals of a person indicative of speech |
| CN119402342A (zh) * | 2024-09-24 | 2025-02-07 | 中国南方电网有限责任公司 | 电力通信网的故障定位方法、系统及电子设备 |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20240024095A (ko) | 2024-02-23 |
| CA3220064A1 (en) | 2022-12-01 |
| CN117693315A (zh) | 2024-03-12 |
| AU2022282378A1 (en) | 2023-12-14 |
| JP2024521768A (ja) | 2024-06-04 |
| US20240366157A1 (en) | 2024-11-07 |
| WO2022251472A9 (en) | 2023-11-09 |
| EP4329615A1 (en) | 2024-03-06 |
| EP4329615A4 (en) | 2025-01-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20240366157A1 (en) | Methods And Devices For Real-Time Word And Speech Decoding From Neural Activity | |
| Metzger et al. | Generalizable spelling using a speech neuroprosthesis in an individual with severe limb and vocal paralysis | |
| Makin et al. | Machine translation of cortical activity to text with an encoder–decoder framework | |
| Moses et al. | Neuroprosthesis for decoding speech in a paralyzed person with anarthria | |
| US20260060621A1 (en) | Detection of disease conditions and comorbidities | |
| Moses et al. | Real-time decoding of question-and-answer speech dialogue using human cortical activity | |
| Livezey et al. | Deep learning as a tool for neural data analysis: speech classification and cross-frequency coupling in human sensorimotor cortex | |
| Bouchard et al. | Functional organization of human sensorimotor cortex for speech articulation | |
| Mora-Cortes et al. | Language model applications to spelling with brain-computer interfaces | |
| Kunz et al. | Inner speech in motor cortex and implications for speech neuroprostheses | |
| WO2024254360A1 (en) | Methods and systems for translation of neural activity into embodied digital-avatar animation | |
| Wu et al. | Adaptive LDA classifier enhances real-time control of an EEG brain–computer interface for decoding imagined syllables | |
| Mendes Junior et al. | Analysis of influence of segmentation, features, and classification in sEMG processing: A case study of recognition of brazilian sign language alphabet | |
| Zhang et al. | A brain-to-text framework for decoding natural tonal sentences | |
| Li et al. | Multimodal brain-computer interfaces: Ai-powered decoding methodologies | |
| Feng et al. | Acoustic inspired brain-to-sentence decoder for logosyllabic language | |
| Li et al. | Brain-to-text decoding with context-aware neural representations and large language models | |
| Tan et al. | Effective phoneme decoding with hyperbolic neural networks for high-performance speech BCIs | |
| Card et al. | Long-term independent use of an intracortical brain-computer interface for speech and cursor control | |
| Alonso-Vázquez et al. | EEG-based classification of spoken words using machine learning approaches | |
| Jiang et al. | Decoding covert speech from EEG using a functional areas spatio-temporal transformer | |
| Wu et al. | Adaptive LDA classifier enhances real-time control of an EEG Brain-computer interface for imagined-speech decoding | |
| Jude et al. | Decoding intended speech with an intracortical brain-computer interface in a person with longstanding anarthria and locked-in syndrome | |
| Wang et al. | Decoding linguistic representations of human brain | |
| Metzger | AI-Driven Brain-Computer Interfaces for Speech |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22812144; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023572722; Country of ref document: JP. Ref document number: 3220064; Country of ref document: CA |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022282378; Country of ref document: AU. Ref document number: AU2022282378; Country of ref document: AU |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022812144; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2022812144; Country of ref document: EP; Effective date: 20231130 |
| | ENP | Entry into the national phase | Ref document number: 2022282378; Country of ref document: AU; Date of ref document: 20220526; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 202280052326.1; Country of ref document: CN |