System and Method for Characterizing and Tracking Aging, Resilience, Cognitive Decline, and Disorders using Brain Dynamic Biomarkers

Cross Reference to Related Applications

[0001] The present application is based on, claims priority to, and incorporates herein by reference in its entirety for all purposes, U.S. Provisional Application Serial No. 63/381,304, filed October 28, 2022.

Statement of Government Support

[0002] This invention was made with government support under R01 AG056015, R01 AG079345, and R01 AG080678 awarded by the National Institutes of Health. The U.S. government has certain rights in the invention.

Background

[0003] There is significant interest in facilitating research on the use of digital signals and data as ‘biomarkers’ that may flag or signal early changes within individuals at risk of Alzheimer’s Disease (AD) and Related Dementias (ADRD) before cognitive symptoms are evidenced by current cognitive assessment and/or brain imaging biomarkers. Electroencephalography (EEG) is a compelling candidate for an early “digital biomarker” of AD. Numerous EEG features are now known to be correlated with AD progression and fundamental biomarkers such as beta-amyloid and tau. Compared to cognitively normal older adults, individuals with mild cognitive impairment (MCI) or AD have smaller resting state alpha (8-12 Hz) and beta (14-30 Hz) rhythms and larger delta (< 4 Hz) and theta (4-7 Hz) oscillations. Slow wave (< 1 Hz) amplitude during non-rapid eye movement (NREM) sleep correlates with beta-amyloid levels, and phase amplitude modulation of spindles by slow waves is correlated with tau levels. Changes in event-related potentials such as the P300 oddball response are also correlated with MCI and AD.

[0004] Unfortunately, there is limited evidence that these same EEG measures, as currently constructed to describe population-level data, can accurately track or predict AD progression in individuals.
One reason for this is that EEG signals have many sources of within- and between-subject variation that are not accounted for in current analysis methods, leading to imprecise markers that only have sufficient statistical power at the population level. These sources of error include: 1) significant and inherent temporal fluctuations in the signal markers of interest; 2) significant overlap between brain oscillations at different frequencies
and with noise sources such as electromyogram (EMG) artifact; and 3) between-subject variation in oscillatory frequencies that do not necessarily conform to canonically defined EEG frequency bands.

[0005] In another aspect, until recently, AD drug development has been focused largely on compounds that can reduce cerebral amyloid. Unfortunately, the majority of these compounds have failed to reduce cognitive decline in clinical trials. The exception, aducanumab, which recently received FDA approval, was only able to improve cognition in a small subset of patients who were exposed to the highest dose of the drug. This lack of efficacy, combined with significant side effects such as brain swelling that would seem to preclude even higher dosing, has dampened enthusiasm to continue pursuit of beta-amyloid as a drug target.

[0006] AD is defined by pathophysiologic processes including accumulation of amyloid-beta, neurofibrillary tau tangles, and neurodegeneration. However, in the early stages of preclinical disease progression, patients with significant AD pathology may not show clinically significant changes in cognitive performance. Accordingly, therapies that reduce amyloid levels may not improve cognition. One potential reason for this is that neural circuit function is an intermediary between AD pathology and changes in cognitive performance. Individual patients may have different levels of resilience or functional reserve that can modulate the degree to which AD pathology ultimately leads to a decline in cognitive performance.

[0007] Thus, there is a continuing need to facilitate early clinical diagnosis of AD and allow AD-related symptoms and conditions to be correlated with pathophysiology for an individual.
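To make the limitation concrete, the canonical fixed-band convention described above can be sketched in a few lines (an illustrative sketch only; the `band_powers` helper and all parameter choices are hypothetical, with band edges taken from the canonical definitions cited above):

```python
import numpy as np
from scipy.signal import welch

# Canonical EEG frequency bands cited above (Hz); the present disclosure
# argues these fixed edges are imprecise for individual subjects.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 7.0),
         "alpha": (8.0, 12.0), "beta": (14.0, 30.0)}

def band_powers(eeg, fs):
    """Integrate a Welch PSD over fixed canonical bands (the conventional
    population-level approach)."""
    freqs, psd = welch(eeg, fs=fs, nperseg=min(len(eeg), 2 * int(fs)))
    df = freqs[1] - freqs[0]
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum() * df
            for name, (lo, hi) in BANDS.items()}

# Example: a 10 Hz alpha-band sinusoid embedded in white noise
fs = 100
t = np.arange(0, 30, 1 / fs)
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)
powers = band_powers(x, fs)
```

Because the band edges are fixed a priori, a subject whose alpha peak drifts below 8 Hz would be scored as having lost alpha power, which is precisely the between-subject imprecision described in item 3) above.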
Summary

[0008] The present disclosure provides systems and methods that overcome the aforementioned drawbacks by employing advanced signal processing methods to enhance the precision and quality of information coming from the EEG, “optimizing,” “repurposing,” and “refining” the EEG compared to previous efforts to use EEG as a tool to detect, characterize, or track conditions, such as ADRD. These digital markers are relevant across stages of AD/ADRD spanning normal cognition, mild cognitive impairment, and AD dementia. Furthermore, systems and methods are provided to track aging, cognition, and resilience, with or without correlation to any neurodegenerative disease or other pathology.

[0009] In accordance with one aspect of the present disclosure, a computer-implemented method is provided. The method comprises receiving electroencephalography (EEG) signals from a patient, extracting at least one feature from the EEG signals using a plurality of state-space models (SSMs), determining an indicator of a neurodegenerative disease using the plurality of SSMs, and generating a report including the indicator of the neurodegenerative disease.

[0010] In accordance with another aspect of the present disclosure, a brain dynamic biomarker system is provided. The system comprises a processor configured to receive electroencephalography (EEG) signals from a patient, extract at least one feature from the EEG signals using a plurality of state-space models (SSMs), and determine an indicator of a neurodegenerative disease, and a display configured to communicate the indicator of the neurodegenerative disease.

[0011] These aspects are nonlimiting. Other aspects and features of the systems and methods described herein will be provided below.

Brief Description of the Drawings

[0012] The foregoing features of embodiments will be more readily understood by reference to the following detailed description, taken with reference to the accompanying drawings, in which:

[0013] FIG. 1 is a block diagram of an example brain dynamic biomarker system, according to aspects of the present disclosure.

[0014] FIG. 2 is a block diagram of example components that can implement the system of FIG. 1.

[0015] FIG. 3A is a flow chart setting forth a non-limiting example of steps of an oscillator model or state-space model process in accordance with the present disclosure.

[0016] FIG. 3B is a flow chart setting forth a non-limiting example of steps of a switching state-space model process in accordance with the present disclosure.

[0017] FIG. 3C is a flow chart setting forth a non-limiting example of steps of a state-space ERP (SS-ERP) model process in accordance with the present disclosure.

[0018] FIG. 4 is a graphical model illustrating how EEG and ERP can assess neural circuit function, a key intermediary between AD pathology and cognitive performance.
[0019] FIG. 5A shows individualized decompositions from the oscillator models of resting-state eyes-closed spectra that differ qualitatively and quantitatively across a young adult (top row), an amyloid negative (Aβ-) elderly adult (middle row), and an amyloid positive (Aβ+) elderly adult (bottom row).
[0020] FIG. 5B shows that alpha spectral power computed with a fixed 8-12 Hz frequency band gives misleading results (left), while individualized oscillators better characterize the PSD values of the underlying alpha oscillations across subjects (right).

[0021] FIG. 6A shows that switching oscillator models can detect the presence of alpha oscillations and reduce variance in the estimation of alpha power in two subjects.

[0022] FIG. 6B shows how the minimal detectable effect (MDE) on alpha power estimation changes with alpha stationarity in the two subjects of FIG. 6A.

[0023] FIG. 7A shows that individualized alpha power characterized with switching oscillator models exhibits a trend of decreased alpha in amyloid positive subjects.

[0024] FIG. 7B shows that the fraction of recording containing alpha oscillations appears reduced in amyloid positive subjects.

[0025] FIG. 8A shows that oscillator models can facilitate the decomposition of noisy data into oscillations and time-locked responses in ERP extraction.

[0026] FIG. 8B shows that SS-ERP follows the ground truth much better than the average ERP, with a narrower confidence interval (shaded areas represent 90% confidence intervals).

[0027] FIG. 9A shows the P300 Oddball response in healthy vs. amyloid positive individuals from 42 frequent and 10 oddball trials. The traditional average ERP is confounded by the background oscillation.

[0028] FIG. 9B shows the P300 Oddball response in healthy vs. amyloid positive individuals from 42 frequent and 10 oddball trials. SS-ERP shows the difference in ERPs, possibly due to underlying pathology. The shaded regions show 90% confidence intervals constructed from posterior variances in SS-ERP and sample variances in the average ERP.

[0029] FIG. 9C shows the P300 Oddball response in healthy vs. amyloid positive individuals from 42 frequent and 10 oddball trials. An analysis using Gaussian kernels to represent the ERP is confounded by the background oscillation.
[0030] FIG.9D shows the P300 Oddball response in healthy vs. amyloid positive individuals from 42 frequent and 10 oddball trials. KE-ERP shows the difference in ERPs, possibly due to underlying pathology. The shaded regions show 90% confidence intervals constructed from posterior variances in KE-ERP and sample variances in avg. ERP. [0031] FIG.10A shows extracting the 1 Hz slow oscillation in the simulated time-series with oscillator models and bandpass filtering. [0032] FIG.10B shows extracting the 10 Hz alpha oscillation in the simulated time-series with oscillator models and bandpass filtering.
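FIGS. 10A and 10B compare oscillator-model extraction against conventional bandpass filtering. The bandpass baseline can be sketched as follows (a minimal illustration; the simulated signal, filter order, and band edges are assumed for demonstration and are not prescribed by the present disclosure):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 100  # Hz, illustrative sampling rate
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(1)

# Simulated time series: a 1 Hz slow oscillation plus a 10 Hz alpha
# oscillation in noise, as in the simulated example of FIGS. 10A-10B
slow = np.sin(2 * np.pi * 1 * t)
alpha = 0.5 * np.sin(2 * np.pi * 10 * t)
y = slow + alpha + 0.3 * rng.standard_normal(t.size)

def bandpass(x, lo, hi, fs, order=4):
    """Zero-phase Butterworth bandpass: the conventional baseline that
    the oscillator models are compared against."""
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

slow_hat = bandpass(y, 0.5, 2.0, fs)    # recover the 1 Hz component
alpha_hat = bandpass(y, 8.0, 12.0, fs)  # recover the 10 Hz component
```

Unlike the oscillator models, the filter's band edges must be chosen in advance, so any between-subject variation in oscillatory frequency falls outside the fixed passband.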
[0033] FIG. 11A shows multitaper spectra of oscillators identified using the iterative oscillator algorithm: alpha oscillations during baseline awake (top) and REM sleep (bottom).

[0034] FIG. 11B shows multitaper spectra of oscillators identified using the iterative oscillator algorithm: sleep spindles during early and late NREM stages. Center frequency parameters of oscillator 1 as learned by the algorithm are marked by the gray vertical line. Slow spindles (top) were extracted at a frontal electrode R3 and the rest (bottom) at a posterior electrode Z9.

[0035] FIG. 12 shows spectra of the switching oscillator models, which extract sleep spindles with tight margins and provide large improvements over traditional switching methods.

[0036] FIG. 13A shows Phase-Amplitude Coupling (PAC) from a patient with low cognitive performance under anesthesia: standard approach modulogram. MoCA: Montreal Cognitive Assessment.

[0037] FIG. 13B shows Phase-Amplitude Coupling (PAC) from a patient with low cognitive performance under anesthesia: state-space modulogram. MoCA: Montreal Cognitive Assessment.

[0038] FIG. 13C is a plot of the phase modulation calculated every 6 seconds. The standard deviation for the phase associated with the higher alpha amplitude is larger with the usual PAC technique compared to the SS PAC.

[0039] FIG. 14A shows sleep spindle detection in real NREM recordings. Spindles detected from an example 30-s epoch: Top, multitaper spectrogram computed with 1-s windows at 0.05-s steps, a time-half-bandwidth product of 2, and 3 Slepian tapers; Top-middle, original time series; Bottom-middle, time trace filtered to 10 to 16 Hz. The posterior model probabilities after variational learning are marked in red. S = Spindle; NS = No spindle. Bottom, spindles detected by the two benchmark algorithms shown as boxes.

[0040] FIG. 14B shows Venn diagrams showing numbers of spindles overlapping among the three methods.
F1 scores (bottom right) display values for the two nights delimited by vertical lines. [0041] FIG.15 shows that state-space oscillator modeling identified increased delta oscillation (1-4 Hz) amplitude as a strong predictor of cortical amyloid (left), while a proportional measure of slow wave activity failed to identify the same correlation that was reported previously (right). This result clarifies this relation and goes against the hypothesis that amyloid burden correlates with slow oscillations during sleep due to metabolic clearance. The results are more consistent with an abnormal hyperactivation as a result of cortical amyloid impacting neural activity.
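For reference, the multitaper spectrogram quoted for FIG. 14A (1-s windows at 0.05-s steps, a time-half-bandwidth product of 2, and 3 Slepian tapers) can be sketched as follows (a simplified illustration; the function name and the spindle-like test signal are hypothetical):

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_spectrogram(x, fs, win_s=1.0, step_s=0.05, nw=2, n_tapers=3):
    """Multitaper spectrogram with the FIG. 14A parameters: 1-s windows,
    0.05-s steps, time-half-bandwidth product 2, and 3 Slepian tapers.
    Returns (window center times, frequencies, power array)."""
    n_win, n_step = int(win_s * fs), int(step_s * fs)
    tapers = dpss(n_win, nw, n_tapers)              # (n_tapers, n_win)
    freqs = np.fft.rfftfreq(n_win, 1 / fs)
    starts = np.arange(0, len(x) - n_win + 1, n_step)
    spec = np.empty((starts.size, freqs.size))
    for i, s0 in enumerate(starts):
        seg = x[s0:s0 + n_win]
        ft = np.fft.rfft(tapers * seg, axis=-1)     # one FFT per taper
        spec[i] = np.mean(np.abs(ft) ** 2, axis=0)  # average across tapers
    return (starts + n_win / 2) / fs, freqs, spec

# Hypothetical test signal: white noise with a 1-s, 13 Hz spindle-like burst
fs = 100
t = np.arange(0, 5, 1 / fs)
rng = np.random.default_rng(2)
x = 0.2 * rng.standard_normal(t.size)
burst = (t > 2) & (t < 3)
x[burst] += np.sin(2 * np.pi * 13 * t[burst])
times, freqs, spec = multitaper_spectrogram(x, fs)
mid = np.argmin(np.abs(times - 2.5))
peak = freqs[np.argmax(spec[mid])]   # frequency of peak power mid-burst
```

Averaging periodograms across the orthogonal Slepian tapers reduces the variance of the spectral estimate relative to a single-window spectrogram.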
[0042] FIG. 16A shows that oscillator states can be viewed as phasors, undergoing rotation (1), scaling (2), and addition of noise (3) at every time point, while the real coordinate is observed via a noisy measurement.

[0043] FIG. 16B shows the Expectation Maximization (EM) algorithm and the iterative oscillator model search.

[0044] FIG. 16C shows the distributional priors on oscillator parameters.

[0045] FIG. 16D is a demonstration of the iterative oscillator model search.

[0046] FIG. 17A is a schematic of the switching state space model whereby a discrete hidden Markov chain gates one of the two independent oscillator models to the observation.

[0047] FIG. 17B is a schematic of the switching state space with a graphical representation of the distributional constraint for Variational Bayes inference.

[0048] FIG. 17C shows the modified EM iterations for variational Bayes.

[0049] FIG. 18 is a flowchart of the variational Bayesian learning as an instance of the generalized EM algorithm. Multiple Gaussian SSMs, each parameterized by its own model parameters, are augmented with an HMM parameterized by a transition matrix. The HMM determines the switching among the Gaussian SSMs to produce the observed data. The observations are corrupted by observation noise and indexed by time. Two variational summary statistics are introduced to approximate the true posterior distribution with a different, tractable distribution. The E-step requires inference of the hidden states, achieved through fixed-point iterations that improve the variational approximation incrementally. Once the E-step has stabilized, model parameters are updated in the M-step. The convergence of the inner (E-step) iterations and outer (EM) iterations is checked using the negative variational free energy. This algorithm outputs: 1) posterior estimates of model parameters; 2) state inferences, i.e., the means and covariances of the Gaussian SSM hidden states; and 3) estimates of the probability of each model generating the observation at each time point.
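The oscillator building block of FIG. 16A, and the hidden-state inference performed in the E-step of FIGS. 16B and 18, can be sketched with a single oscillator and a standard Kalman filter (a minimal illustration with assumed parameter values; the full iterative oscillator search, parameter priors, and M-step updates of FIGS. 16B-16D are omitted):

```python
import numpy as np

def rotation(f, fs):
    """2-D rotation matrix advancing a phasor by one sample at frequency f."""
    w = 2 * np.pi * f / fs
    return np.array([[np.cos(w), -np.sin(w)], [np.sin(w), np.cos(w)]])

# Oscillator state-space model of FIG. 16A: at every time point the phasor
# state is rotated (1), scaled by a < 1 (2), and perturbed by state noise (3);
# only the real coordinate is observed, through measurement noise.
fs, f, a, q, r = 100.0, 10.0, 0.99, 0.01, 0.1   # assumed, illustrative values
F = a * rotation(f, fs)
H = np.array([[1.0, 0.0]])

rng = np.random.default_rng(3)
T = 500
x = np.zeros((T, 2))
y = np.zeros(T)
for t in range(1, T):
    x[t] = F @ x[t - 1] + rng.normal(0.0, np.sqrt(q), 2)
    y[t] = x[t, 0] + rng.normal(0.0, np.sqrt(r))

def kalman_filter(y, F, H, q, r):
    """Standard Kalman filter: the hidden-state (E-step) inference used
    inside the EM iterations of FIGS. 16B and 18; M-step updates omitted."""
    m, P = np.zeros(2), np.eye(2)
    means = np.zeros((len(y), 2))
    for t in range(len(y)):
        m, P = F @ m, F @ P @ F.T + q * np.eye(2)   # predict
        S = H @ P @ H.T + r                          # innovation variance
        K = (P @ H.T) / S                            # Kalman gain
        m = m + (K * (y[t] - H @ m)).ravel()         # update mean
        P = (np.eye(2) - K @ H) @ P                  # update covariance
        means[t] = m
    return means

means = kalman_filter(y, F, H, q, r)
```

At each step the two-dimensional phasor state is rotated, scaled by a damping factor less than one, and perturbed by noise, which are the three operations labeled (1)-(3) in FIG. 16A.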
[0050] FIG. 19A is a plot of simulation results: segmentation performance when true parameters are known. An example simulated sequence switching between two AR1 models with different dynamics. The top panel shows the time trace with the two states marked in different colors. The bottom panel shows switching inference results on this example given true parameters of the underlying generative model. Time points estimated to be in the first model (s_t = 1) are marked in colored dots for each inference method, with accuracy shown in parentheses.

[0051] FIG. 19B shows histograms of segmentation accuracy across 200 repetitions. The mean segmentation accuracy for each method is displayed and marked by the dashed red line. True = ground truth; Random = random segmentation with a Bernoulli process; Static = static switching method; IMM = interacting multiple models method; VI-A = variational inference with deterministic annealing (orange color); VI-I = variational inference with interpolated densities (blue color).

[0052] FIG. 20A is a plot of simulation results: segmentation performance when true parameters are unknown. Histograms of segmentation accuracy across 200 repetitions of sequences of 200 time points. Since the true parameters were not known, they were estimated using an instance of the EM algorithm for VI-A and VI-I inference starting from a random initialization. Static switching and IMM treated those random initializations as true model parameters. The mean segmentation accuracy for each method is displayed and marked by the dashed red line. The mean accuracy previously obtained using true parameters is shown in gray. Random = random segmentation with a Bernoulli process; Static = static switching method; IMM = interacting multiple models method; VI-A EM = variational EM learning with deterministic annealing (orange color); VI-I EM = variational EM learning with interpolated densities (blue color).

[0053] FIG. 20B shows swarm plots showing the distributions of model parameters learned by the variational learning algorithms for sequences of 200 time points. Uniform distributions used to sample initial parameters are marked in bold fonts on the y-axes, as well as using solid black lines (true values) and dotted gray lines (upper and lower bounds of the ranges).

[0054] FIG. 20C is a plot of changes of mean segmentation accuracy over sequences of varying data lengths.
Shaded bounds denote the standard error of the mean around the average accuracy values. [0055] FIG.20D shows plots of mean parameter estimation errors from the true values across
10 EM iterations for two different data lengths. For transition matrix F and state noise variance Q, normalized error is defined as abs(estimated - true)/true, and averaged across the two switching models. Absolute error is defined as abs(estimated - true). Shaded bounds denote the standard error of the mean around the average error values. [0056] FIG. 21 shows simulation results: example segmentation on data generated from a different switching model class when true parameters are known. The top two panels show the
two sequences, y_1 and y_2, recorded as a bivariate observation y. Sequence y_2 has a non-zero influence on sequence y_1, as shown by the upward arrows, according to a switching state s_t. The time traces are also marked with different colors for the two switching states. The bottom panel shows the results of switching inference on the example data given true parameters. Time points estimated to be in the first model are marked in colored dots for each inference method, with accuracy shown in parentheses. True = ground truth; Random = random segmentation with a Bernoulli process; Static = static switching method; IMM = interacting multiple models method; VI-A = variational inference with deterministic annealing (orange color); VI-I = variational inference with interpolated densities (blue color).

[0057] FIG. 22A shows simulation results: segmentation performance on data generated from a different switching model class. Histograms of segmentation accuracy given true parameters across 200 repetitions.

[0058] FIG. 22B shows simulation results: segmentation performance on data generated from a different switching model class. Histograms of segmentation accuracy when model parameters were unknown across 200 repetitions. In both FIGS. 22A and 22B, the mean segmentation accuracy for each method is displayed and marked by the dashed red line. Random = random segmentation with a Bernoulli process; Static = static switching method; IMM = interacting multiple models method; VI-A = variational inference with deterministic annealing (orange color); VI-I = variational inference with interpolated densities (blue color). VI-A/VI-I EM denote the EM learning algorithms with the corresponding initialization procedure during E-steps.

[0059] FIG. 23A shows a plot of simulation results: segmentation performance on switching state-space oscillator models when true model parameters are known. Power spectra of simulated oscillations at different frequencies; spectral estimation obtained via the multitaper method utilizing 6 tapers, corresponding to a time half-bandwidth product of 4.

[0060] FIG. 23B shows a plot of simulation results: segmentation performance on switching state-space oscillator models when true model parameters are known. Changes of mean segmentation accuracy over the number of switching states, as the number of underlying oscillations varies between 2 and 5.
Shaded bounds denote the standard error of the mean around the average accuracy values across 200 repetitions. [0061] FIG. 23C shows a plot of simulation results: segmentation performance on switching state-space oscillator models when true model parameters are known. An example switching state path with 5 underlying oscillations and 31 possible switching states.
[0062] FIG. 23D shows a plot of simulation results: segmentation performance on switching state-space oscillator models when true model parameters are known. Estimated switching states with variational inference. VI-A = variational inference with deterministic annealing (orange color); VI-I = variational inference with interpolated densities (blue color).

[0063] FIG. 24A is a real-world example of comparisons of switching inference algorithms for sleep spindle detection. In the top three panels, the spindle activity is visualized using a spectrogram, the original time trace, and after being bandpass filtered within 10 Hz–16 Hz. The fourth panel shows the posterior model probabilities of s_t = 1 estimated by variational EM learning with interpolated densities (VI-I EM, blue color). The margins of spindle events identified by VI-I EM are also marked with vertical dashed lines. The last two panels display the estimated real (blue) and imaginary (magenta) spindle waveforms with 95% confidence intervals from posterior covariances. The learned spindle center frequency is displayed in blue in parentheses.

[0064] FIG. 24B is a real-world example of comparisons of switching inference algorithms for sleep spindle detection. Estimated posterior model probabilities of s_t = 1 by other algorithms in comparison. A model probability closer to 1 suggests the presence of spindles (S = Spindle), while being closer to 0 indicates no spindle (NS = No Spindle). The 0.5 model probability is marked with gray horizontal dashed lines. VI-A EM = variational EM learning with deterministic annealing (orange color); Static = static switching method; IMM = interacting multiple models method; S&S 1991 = the Gaussian merging method in Shumway and Stoffer (1991).

[0065] FIG. 25A shows the results of an automatic segmentation of sleep spindles using the VI-I EM method. Three 30 s EEG recordings of NREM-2 sleep were segmented with the variational EM learning method with interpolated densities (VI-I EM) to identify spindles in an unsupervised manner.

[0066] FIG. 25B shows the results of an automatic segmentation of sleep spindles using the VI-I EM method. Three 30 s EEG recordings of NREM-2 sleep were segmented with the variational EM learning method with interpolated densities (VI-I EM) to identify spindles in an unsupervised manner.

[0067] FIG. 25C shows the results of an automatic segmentation of sleep spindles using the VI-I EM method. Three 30 s EEG recordings of NREM-2 sleep were segmented with the variational EM learning method with interpolated densities (VI-I EM) to identify spindles in an unsupervised manner. In FIGS. 25A-25C, the three sub-panels visualize spindle activity using a spectrogram, the original time trace, and the estimated real part of the spindle waveform
with 95% confidence intervals from posterior covariances. The learned spindle center frequencies are displayed in blue in parentheses. The estimated posterior model probabilities for the candidate model with both slow oscillations and spindles are overlaid on the time traces in blue lines. Shaded pink bars indicate spindles identified by a wavelet-based method for comparison. S = Spindle; NS = No Spindle.

[0068] FIG. 26 is a schematic of a generative structure with parallel switching state-space models. A directed acyclic graph is shown to represent the conditional independence structure between the real-valued Gaussian hidden state sequences, a discrete-valued hidden Markov chain, and the observed data up to time t. In this generative structure, the observation at a given time point depends only on the hidden states of the Gaussian models at that point, with the discrete-valued state selecting one of the models to produce the observation, hence the name switching.

[0069] FIG. 27A is a schematic of a true posterior distribution of switching state-space models. This is the resultant graph encoding the conditional independence relation after conditioning on the observed data up to time t. The new edges between the hidden states make exact inference intractable.

[0070] FIG. 27B is a schematic of an approximate posterior distribution of switching state-space models. Compared to the true posterior, a structured approximation decouples the hidden Gaussian state-space models from each other and from the switching state. On this approximate distribution, efficient closed-form inference can be performed. The marginal distributions of the Gaussian hidden states and the discrete-valued switching state are now inter-dependent through variational summary statistics.
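The gating idea of FIGS. 17A, 26, and 27, in which a discrete hidden Markov chain selects which Gaussian model generates each observation, can be illustrated with a simple forward filter over per-model predictive likelihoods (a simplified baseline in the spirit of the static switching and IMM comparators discussed above, not the disclosed variational Bayes algorithm; the AR1 regimes and all parameter values are assumed for demonstration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# Two AR1 regimes with different dynamics, switched by a hidden Markov
# chain, as in the simulation of FIG. 19A (parameters illustrative)
phis, qs = [0.95, 0.2], [0.1, 1.0]
A = np.array([[0.98, 0.02], [0.02, 0.98]])   # HMM transition matrix

T = 400
s = np.zeros(T, dtype=int)
y = np.zeros(T)
for t in range(1, T):
    s[t] = rng.choice(2, p=A[s[t - 1]])
    y[t] = phis[s[t]] * y[t - 1] + rng.normal(0.0, np.sqrt(qs[s[t]]))

def forward_model_probs(y, phis, qs, A):
    """Forward-filtered probability of each regime at every time point,
    using one-step AR1 predictive likelihoods. A probability near 1
    selects that Gaussian model as the generator of the observation,
    mirroring the gating structure of FIGS. 17A and 26."""
    p = np.full(2, 0.5)
    probs = np.zeros((len(y), 2))
    probs[0] = p
    for t in range(1, len(y)):
        lik = np.array([norm.pdf(y[t], phis[k] * y[t - 1], np.sqrt(qs[k]))
                        for k in range(2)])
        p = lik * (A.T @ p)      # predict with HMM, weight by likelihood
        p /= p.sum()             # normalize to a probability
        probs[t] = p
    return probs

probs = forward_model_probs(y, phis, qs, A)
acc = np.mean((probs[:, 1] > 0.5) == (s == 1))   # segmentation accuracy
```

This forward filter requires the model parameters to be known; the variational EM algorithms described above additionally learn the parameters while handling the intractable coupling between the hidden chains shown in FIG. 27A.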
[0071] FIG. 28A shows a simulation study, where SS-ERP captures the time-locked responses while simultaneously extracting the background oscillations and thus removing their effects on the ERP waveforms in a data-driven manner.

[0072] FIG. 28B shows a simulation study, where SS-ERP performance remains unchanged as the number of trials drops, unlike classical average ERPs. The background oscillations produce spurious peaks in the average ERPs with small numbers of trials, while SS-ERP is relatively immune to such interference due to the explicit modeling of background oscillations.

[0073] FIG. 28C shows a simulation study, where SS-ERP captures the time-locked responses while simultaneously extracting the background oscillations and thus removing their effects on the ERP waveforms in a data-driven manner.
[0074] FIG. 28D shows a simulation study, where SS-ERP performance remains unchanged as the number of trials drops, unlike classical average ERPs. The background oscillations produce spurious peaks in the average ERPs with small numbers of trials, while SS-ERP is relatively immune to such interference due to the explicit modeling of background oscillations.

Detailed Description

[0075] The present disclosure provides systems and methods that can be used to improve the precision of EEG-derived quantities to enhance sensitivity and specificity for detection of conditions, including aging, cognition, resilience, and pathologies, such as early detection of AD/ADRD. The present disclosure provides systems and methods that can employ signal processing methods to analyze high-frequency time-series data produced by a single individual to provide accurate personalized diagnostic information. Such methods are distinct from but complementary to “black box” machine learning approaches: because the advanced signal processing methods employ mechanistically inspired models, it is possible to interpret the underlying physiological or pathophysiological processes that are driving a diagnosis, making it possible to identify the most appropriate treatment options for the individual patient in question.

[0076] In one non-limiting example, Bayesian methods may be utilized in the systems and methods described herein. The Bayesian methods described herein enable principled, straightforward procedures for statistical inference that enhance the reproducibility and reliability of EEG-derived digital biomarkers and facilitate longitudinal tracking and prediction analyses. Formal statistical inference can be a challenge in signal processing analyses.
Resampling methods such as the bootstrap are often used when the likelihood or posterior distribution is unavailable, but these require large amounts of data, which may not be practical to acquire in individual subjects, particularly in clinical or clinical research settings. The Bayesian approach leads to principled, straightforward statistical inference procedures. This feature, particularly when combined with the higher precision and lower data requirements of the methods described herein, can significantly enhance the utility of EEG-based markers in higher level longitudinal tracking or prediction analyses. These principled inference procedures also enhance the reproducibility and reliability of EEG-based digital biomarkers. Furthermore, the EEG- and signal processing-based methods described herein are highly cost-effective and well-suited for rapid commercial development into tools usable in primary care, home, or residential care facilities.
[0077] As described above, EEG-based methods can often fail due to a multitude of error sources that, in real-world, clinical settings, undermine analysis that is otherwise theoretically possible. In a non-limiting aspect, the systems and methods herein can account for these many sources of error using state space models, switching state space models, and hierarchical state space models. Application of these systems and methods can enhance the precision of EEG oscillation, ERP, and source localization analyses by, for example, 1.8- to 150-fold. These improvements in precision reduce the minimum detectable effects to enable estimation of EEG-derived markers in individual subjects with far less data than conventional approaches. This facilitates deployment of highly sensitive EEG-based longitudinal tracking and prediction tools for AD progression with a substantially lower time and data burden for patients and caregivers. This, in turn, facilitates deployment of high-precision clinical diagnostic systems to detect and track early disease progression using EEG and ERP.

[0078] In another non-limiting aspect, a better understanding of AD-related changes in brain dynamics measured by EEG/ERP provides improved biomarkers to track the efficacy of novel AD therapies. On the other hand, growing knowledge of synaptic mechanisms for AD suggests numerous alternative therapeutic approaches. Moreover, there is a growing appreciation that earlier interventions may be more effective. In this context, the systems and methods described here make it possible to employ EEG/ERP dynamics as neurophysiological biomarkers to assess the efficacy of novel AD therapies. Such biomarkers can significantly aid drug and therapeutic development for AD, and can do so in a cost-effective manner because EEG/ERP is inexpensive compared to other neuroimaging modalities, such as PET.
[0079] Further, neural circuit function mediates the relationship between AD pathology and cognitive performance. The EEG and ERPs provide a way of measuring neural circuit function that could explain the “gap” between AD pathology and cognitive symptoms.

[0080] The EEG and ERPs provide a way of assessing neural circuit function non-invasively and can do so during the specific cognitive performance tasks used to characterize clinical AD symptoms (e.g., the UDS 3.0 test battery, as described in the example below). The systems and methods provided herein for acquiring and processing EEG and ERP biomarkers provide access to a crucial intermediary between AD pathology and cognitive performance, making it possible to identify factors that modulate resilience to AD, which in turn could aid in the development of novel therapies to slow AD progression. Furthermore, these markers can be used as clinical diagnostics to detect and track early disease progression.

[0081] In another non-limiting aspect, the present disclosure recognizes that sleep orchestrates a symphony of brain rhythms that are directly related to AD pathology. Thus, systems and
methods are provided that can accurately measure and report sleep brain dynamics that relate to underlying AD pathology, for use in numerous clinical and therapeutic applications. During sleep, the brain engages in highly stereotyped dynamics that can be readily observed in the EEG data. Distinct components of these stereotyped dynamics are thought to be related to different aspects of sleep, including synaptic homeostasis, memory consolidation, and clearance of metabolic waste, to name a few. During non-rapid eye movement (NREM) sleep, slow wave (0.5 to 4 Hz) oscillations reflect alternating cortical up and down states of neuronal activity that are thought to play a role in re-balancing energy-consuming synaptic potentiation that develops while awake. Slow waves also play a central role in the function of the glymphatic system, signaling a pulsatile wave of metabolic-clearing lymphatic flow with every slow wave cycle. Sleep spindles are transient oscillations between 10 to 16 Hz occurring on the rising phase of slow oscillations that are thought to play a role in transferring information from the hippocampus to the cortex during memory consolidation. As described herein, changes in these sleep brain dynamics are related, in some cases causally, to the clinical and pathologic trajectory of AD. As such, given that the relationship between sleep brain dynamics and AD pathology is highly significant, and given the growing understanding that specific sleep brain dynamics appear to be related to specific AD mechanisms, systems and methods that can accurately measure and report these dynamic patterns can provide an informative array of AD biomarkers usable for numerous purposes, including clinical diagnosis, AD prognostication, and development of novel therapeutics.
Additionally, the systems and methods provided herein can be used to analyze and generate reports regarding sleep and health as determined from sleep, even apart from any known pathology or neurodegenerative disease, including AD. [0082] In yet another non-limiting aspect, while slow wave activity measured in the scalp EEG correlates with AD pathology, accurate measurement and reporting of slow wave activity and its frequency-dependent characteristics can be difficult. The systems and methods provided herein can overcome such challenges to provide diagnostically and therapeutically important information on AD pathology at the regional, network, and local levels. Slow wave activity, defined as the amount of power in the slow wave band, decreases significantly in proportion to increasing amyloid. There is also evidence of a regional relationship: increased amyloid in prefrontal regions of interest seems most highly correlated with decreased slow wave activity in midline frontal EEG channels. There are multiple mechanistic links between slow wave activity and beta-amyloid. Amyloid increases during wakefulness as a result of increased neuronal activity and decreases during sleep. Slow wave sleep disruption increases CSF beta-amyloid levels. Clearance of beta-amyloid is dependent on local glymphatic system function that is in turn dependent on slow waves. There is also evidence that slow wave activity decreases in relation to tau. Therefore, there is a local relationship between slow wave amplitude and underlying brain circuit properties, including local AD pathology such as beta-amyloid and tau levels. There is evidence that slow wave activity has distinct frequency-dependent mechanisms: the slow oscillation (< 1 Hz) component appears to play a specific role in memory consolidation, whereas the higher frequency delta (1 to 4 Hz) component relates to forgetting. Accordingly, slow and delta wave activity have opposing correlations with beta-amyloid levels: slow oscillation power decreases in relation to amyloid, whereas delta power increases in relation to amyloid. Accordingly, specific brain regions or networks may be involved in this frequency-dependent slow wave phenomenon, and that frequency dependence may relate to underlying local or regional AD pathology. NREM sleep is mediated by noradrenergic activity in the locus coeruleus (LC), which is also known to develop tau pathology at the earliest stages of AD. There may be a relationship between LC AD pathology and changes in NREM sleep. If so, aspects of NREM sleep slow wave activity could be used to make inferences about AD pathology in the LC. The systems and methods provided herein, recognizing the foregoing, provide tools for analyzing and determining information that was otherwise unavailable or lost in this regard. [0083] In another non-limiting aspect, sleep spindle activity, frequency, and slow wave coupling appear to be fundamental dynamic indicators of memory consolidation. Recognizing this, the present disclosure provides systems and methods to characterize these spindle dynamics, which can be used to provide insight into underlying AD pathology, or other conditions.
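The frequency-dependent slow wave measures described in paragraph [0082] above require computing band power separately for the slow oscillation (< 1 Hz) and delta (1 to 4 Hz) sub-bands. The following is a minimal, non-limiting sketch of such a band power computation on a synthetic NREM-like trace; the `band_power` helper, signal amplitudes, and Welch settings are illustrative assumptions, not the estimators claimed herein.

```python
import numpy as np
from scipy.signal import welch

def band_power(x, fs, f_lo, f_hi):
    """Average PSD of x within [f_lo, f_hi) Hz, estimated with Welch's method."""
    freqs, psd = welch(x, fs=fs, nperseg=4 * fs)
    band = (freqs >= f_lo) & (freqs < f_hi)
    return psd[band].mean()

# Synthetic NREM-like trace: a dominant 0.8 Hz slow oscillation plus a weaker
# 2.5 Hz delta component and broadband noise (amplitudes are arbitrary).
fs = 100
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(0)
x = (40 * np.sin(2 * np.pi * 0.8 * t)
     + 10 * np.sin(2 * np.pi * 2.5 * t)
     + rng.normal(0, 5, t.size))

so_power = band_power(x, fs, 0.5, 1.0)     # slow oscillation component (< 1 Hz)
delta_power = band_power(x, fs, 1.0, 4.0)  # delta component (1 to 4 Hz)
```

On this synthetic trace the slow oscillation band dominates by construction; on real EEG the two sub-band measures would be tracked separately given their opposing relationships with amyloid.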
Sleep spindles are transient oscillations between 10 to 16 Hz occurring on the rising phase of slow oscillations that are thought to play a role in transferring information from hippocampus to cortex during memory consolidation. Patients with MCI and AD have fewer and smaller spindles. Spindles can take on “fast” and “slow” forms that are associated with different scalp distributions that seem to dissociate aging from AD. AD seems to diminish fast spindles (13 to 16 Hz) over parietal scalp EEG channels more than slow spindles (11 to 13 Hz), which occur over frontal EEG channels and diminish with age. Changes in the cross-frequency coupling relationship (phase-amplitude modulation) between slow waves and spindles are associated with declining memory function in aging adults. Reduced spindle-slow wave coupling is associated with increasing medial temporal lobe tau. Previous studies of the relationship between sleep spindle dynamics and AD pathology have focused on scalp EEG. Given the fundamental role that spindles play in memory consolidation, systems and methods
described herein can be used to characterize spindle activity and make determinations about AD pathology. Furthermore, the systems and methods may characterize spindle and slow wave coupling related to AD pathology, given the importance of spindle-slow wave coupling (i.e., cross-frequency or phase-amplitude coupling between slow wave phase and spindle amplitude). The systems and methods described herein make it possible to, for example: 1) identify individual spindles with extreme accuracy using switching state space models; 2) characterize the properties of slow vs. fast spindles in individual spindles; and 3) estimate phase-amplitude modulation (cross-frequency coupling) relationships with 4-fold higher precision compared to conventional methods. The systems and methods provided herein, therefore, provide unique inferences about underlying AD pathology based on sleep spindle dynamics. [0084] In another non-limiting aspect, the present disclosure recognizes that REM sleep brain dynamics are altered during AD and are mediated by cholinergic systems that show AD pathology. The systems and methods described herein can accurately and precisely describe REM dynamics. In one non-limiting example, such REM dynamics can be used to describe underlying AD pathology for diagnostic and therapeutic purposes. The EEG during REM sleep is characterized by “desynchronized” activity that lacks the slow-wave activity seen in NREM sleep. In addition, intermittent bursts of alpha waves (8 to 12 Hz) may be appreciated during REM that appear to be related to the recall of dreaming. During AD, these brain dynamics are disrupted: during MCI, REM desynchronization is impaired and the EEG shows increased low frequency power. In addition, REM alpha activity is reduced in rodent AD and tauopathy models.
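As context for the spindle-slow wave coupling described in paragraph [0083] above, a conventional baseline estimate of phase-amplitude coupling is the Hilbert-transform mean-vector-length modulation index; the switching state space estimators described herein are reported to improve substantially on the precision of such conventional estimates. The sketch below is a non-limiting illustration on synthetic data; the filter bands, signal parameters, and the `modulation_index` helper are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def modulation_index(x, fs, phase_band=(0.5, 1.5), amp_band=(10.0, 16.0)):
    """Mean-vector-length phase-amplitude coupling index between slow-wave
    phase and spindle-band amplitude (conventional Hilbert-transform estimate)."""
    def bandpass(sig, lo, hi):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        return sosfiltfilt(sos, sig)
    phase = np.angle(hilbert(bandpass(x, *phase_band)))
    amp = np.abs(hilbert(bandpass(x, *amp_band)))
    return np.abs(np.mean(amp * np.exp(1j * phase))) / np.mean(amp)

# Synthetic signals: a 12 Hz "spindle" whose amplitude is (or is not)
# modulated by the phase of a 1 Hz slow wave.
fs = 200
t = np.arange(0, 30, 1 / fs)
rng = np.random.default_rng(1)
slow = np.sin(2 * np.pi * 1.0 * t)
coupled = slow + (1 + slow) * 0.5 * np.sin(2 * np.pi * 12 * t) + rng.normal(0, 0.1, t.size)
uncoupled = slow + 0.5 * np.sin(2 * np.pi * 12 * t) + rng.normal(0, 0.1, t.size)

mi_coupled = modulation_index(coupled, fs)
mi_uncoupled = modulation_index(uncoupled, fs)
```

The index lies in [0, 1] and is larger when spindle-band amplitude systematically varies with slow-wave phase.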
REM sleep is governed by subcortical cholinergic brainstem structures including the basal forebrain. Notably, basal forebrain tau as well as basal forebrain neurodegeneration both occur in AD. Thus, changes in REM activity could be used to make inferences about basal forebrain function and tau pathology. [0085] In another non-limiting aspect, a better understanding of AD-related changes in sleep EEG dynamics could provide improved biomarkers to track the efficacy of novel AD therapies. Until recently, AD drug development has been focused largely on compounds that can reduce cerebral amyloid. Unfortunately, the majority of these compounds have failed to reduce cognitive decline in clinical trials. The exception, aducanumab, which recently received FDA approval, was only able to improve cognition in a small subset of patients who were exposed to the highest dose of the drug. This lack of efficacy, combined with significant side effects such as brain swelling that would seem to preclude even higher dosing, has dampened enthusiasm to continue pursuit of beta-amyloid as a drug target. On the other hand, growing
knowledge of synaptic mechanisms for AD suggests numerous alternative therapeutic approaches. Moreover, there is a growing appreciation that earlier interventions may be more effective. In this context, the systems and methods described herein facilitate the use of sleep EEG dynamics as neurophysiological biomarkers to assess the efficacy of novel AD therapies. Such biomarkers can be used to significantly aid drug and therapeutic development for AD, and can do so in a cost-effective manner since the EEG is inexpensive compared to other neuroimaging modalities such as positron emission tomography (PET). [0086] Referring now to FIG. 1, an example of a system 100 for determining aging, resilience, dementia, and cognitive decline of a patient using brain dynamic biomarkers obtained from clinical data in accordance with some embodiments of the systems and methods described in the present disclosure is shown. As shown in FIG. 1, a computing device 150 can receive one or more types of data (e.g., EEG) from data source 102. In some embodiments, computing device 150 can execute at least a portion of a brain dynamic biomarker system 104 to determine a brain state from the clinical data source 102. [0087] Additionally or alternatively, in some embodiments, the computing device 150 can communicate information about data received from the data source 102 to a server 152 over a communication network 154, which can execute at least a portion of the brain dynamic biomarker system 104. In such embodiments, the server 152 can return information to the computing device 150 (and/or any other suitable computing device) indicative of an output of the brain dynamic biomarker system 104.
[0088] In some embodiments, computing device 150 and/or server 152 can be any suitable computing device or combination of devices, such as a desktop computer, a laptop computer, a smartphone, a tablet computer, a wearable computer, a server computer, a virtual machine being executed by a physical computing device, and so on. [0089] In some embodiments, data source 102 can be any suitable source of clinical data, such as another computing device (e.g., a server storing clinical data). In some embodiments, data source 102 can be local to computing device 150. For example, data source 102 can be incorporated with computing device 150. As another example, data source 102 can be connected to computing device 150 by a cable, a direct wireless link, and so on. Additionally or alternatively, in some embodiments, data source 102 can be located locally and/or remotely from computing device 150, and can communicate data to computing device 150 (and/or server 152) via a communication network (e.g., communication network 154). [0090] In some embodiments, communication network 154 can be any suitable communication network or combination of communication networks. For
example, communication network 154 can include a Wi-Fi network (which can include one or more wireless routers, one or more switches, etc.), a peer-to-peer network (e.g., a Bluetooth network), a cellular network (e.g., a 3G network, a 4G network, etc., complying with any suitable standard, such as CDMA, GSM, LTE, LTE Advanced, WiMAX, etc.), a wired network, and so on. In some embodiments, communication network 154 can be a local area network, a wide area network, a public network (e.g., the Internet), a private or semi-private network (e.g., a corporate or university intranet), any other suitable type of network, or any suitable combination of networks. Communications links shown in FIG. 1 can each be any suitable communications link or combination of communications links, such as wired links, fiber optic links, Wi-Fi links, Bluetooth links, cellular links, and so on. [0091] Referring now to FIG. 2, an example of hardware 200 that can be used to implement data source 102, computing device 150, and server 152 in accordance with some embodiments of the systems and methods described in the present disclosure is shown. As shown in FIG.2, in some embodiments, computing device 150 can include a processor 202, a display 204, one or more inputs 206, one or more communication systems 208, and/or memory 210. In some embodiments, processor 202 can be any suitable hardware processor or combination of processors, such as a central processing unit (“CPU”), a graphics processing unit (“GPU”), and so on. In some embodiments, display 204 can include any suitable display devices, such as a computer monitor, a touchscreen, a television, and so on. In some embodiments, inputs 206 can include any suitable input devices and/or sensors that can be used to receive user input, such as a keyboard, a mouse, a touchscreen, a microphone, and so on. 
[0092] In some embodiments, communications systems 208 can include any suitable hardware, firmware, and/or software for communicating information over communication network 154 and/or any other suitable communication networks. For example, communications systems 208 can include one or more transceivers, one or more communication chips and/or chip sets, and so on. In a more particular example, communications systems 208 can include hardware, firmware and/or software that can be used to establish a Wi-Fi connection, a Bluetooth connection, a cellular connection, an Ethernet connection, and so on. [0093] In some embodiments, memory 210 can include any suitable storage device or devices that can be used to store instructions, values, data, or the like, that can be used, for example, by processor 202 to present content using display 204, to communicate with server 152 via communications system(s) 208, and so on. Memory 210 can include any suitable volatile memory, non-volatile memory, storage, or any suitable combination thereof. For
example, memory 210 can include RAM, ROM, EEPROM, one or more flash drives, one or more hard disks, one or more solid state drives, one or more optical drives, and so on. In some embodiments, memory 210 can have encoded thereon, or otherwise stored therein, a computer program for controlling operation of computing device 150. In such embodiments, processor 202 can execute at least a portion of the computer program to present content (e.g., images, user interfaces, graphics, tables), receive content from server 152, transmit information to server 152, and so on. [0094] In some embodiments, server 152 can include a processor 212, a display 214, one or more inputs 216, one or more communications systems 218, and/or memory 220. In some embodiments, processor 212 can be any suitable hardware processor or combination of processors, such as a CPU, a GPU, and so on. In some embodiments, display 214 can include any suitable display devices, such as a computer monitor, a touchscreen, a television, and so on. In some embodiments, inputs 216 can include any suitable input devices and/or sensors that can be used to receive user input, such as a keyboard, a mouse, a touchscreen, a microphone, and so on. [0095] In some embodiments, communications systems 218 can include any suitable hardware, firmware, and/or software for communicating information over communication network 154 and/or any other suitable communication networks. For example, communications systems 218 can include one or more transceivers, one or more communication chips and/or chip sets, and so on. In a more particular example, communications systems 218 can include hardware, firmware and/or software that can be used to establish a Wi-Fi connection, a Bluetooth connection, a cellular connection, an Ethernet connection, and so on.
[0096] In some embodiments, memory 220 can include any suitable storage device or devices that can be used to store instructions, values, data, or the like, that can be used, for example, by processor 212 to present content using display 214, to communicate with one or more computing devices 150, and so on. Memory 220 can include any suitable volatile memory, non-volatile memory, storage, or any suitable combination thereof. For example, memory 220 can include RAM, ROM, EEPROM, one or more flash drives, one or more hard disks, one or more solid state drives, one or more optical drives, and so on. In some embodiments, memory 220 can have encoded thereon a server program for controlling operation of server 152. In such embodiments, processor 212 can execute at least a portion of the server program to transmit information and/or content (e.g., data, images, a user interface) to one or more computing devices 150, receive information and/or content from one or more
computing devices 150, receive instructions from one or more devices (e.g., a personal computer, a laptop computer, a tablet computer, a smartphone), and so on. [0097] In some embodiments, data source 102 can include a processor 222, one or more data acquisition system(s) 224, one or more communications systems 226, and/or memory 228. In some embodiments, processor 222 can be any suitable hardware processor or combination of processors, such as a CPU, a GPU, and so on. [0098] Note that, although not shown, data source 102 can include any suitable inputs and/or outputs. For example, data source 102 can include input devices and/or sensors that can be used to receive user input, such as a keyboard, a mouse, a touchscreen, a microphone, a trackpad, a trackball, and so on. As another example, data source 102 can include any suitable display devices, such as a computer monitor, a touchscreen, a television, etc., one or more speakers, and so on. [0099] In some embodiments, communications systems 226 can include any suitable hardware, firmware, and/or software for communicating information to computing device 150 (and, in some embodiments, over communication network 154 and/or any other suitable communication networks). For example, communications systems 226 can include one or more transceivers, one or more communication chips and/or chip sets, and so on. In a more particular example, communications systems 226 can include hardware, firmware and/or software that can be used to establish a wired connection using any suitable port and/or communication standard (e.g., VGA, DVI video, USB, RS-232, etc.), Wi-Fi connection, a Bluetooth connection, a cellular connection, an Ethernet connection, and so on. 
[0100] In some embodiments, memory 228 can include any suitable storage device or devices that can be used to store instructions, values, data, or the like, that can be used, for example, by processor 222 to control the one or more data acquisition system(s) 224, and/or receive data from the one or more data acquisition system(s) 224; to generate images from data; present content (e.g., images, a user interface) using a display; communicate with one or more computing devices 150; and so on. Memory 228 can include any suitable volatile memory, non-volatile memory, storage, or any suitable combination thereof. For example, memory 228 can include RAM, ROM, EEPROM, one or more flash drives, one or more hard disks, one or more solid state drives, one or more optical drives, and so on. In some embodiments, memory 228 can have encoded thereon, or otherwise stored therein, a program for controlling operation of data source 102. In such embodiments, processor 222 can execute at least a portion of the program to generate images, transmit information and/or content (e.g., data, images) to one or more computing devices 150, receive information and/or content from one or more computing
devices 150, receive instructions from one or more devices (e.g., a personal computer, a laptop computer, a tablet computer, a smartphone, etc.), and so on. [0101] Referring now to FIG. 3A, a method 300 for characterizing and tracking neurodegenerative diseases using brain dynamic biomarkers is shown. The method may be performed by computing device 150 and its processor 202 as shown in FIG. 2. At step 302, EEG signals are received from a patient, such as from data source 102 from FIG. 1. The EEG signals may be received from real time acquisition or retrieved from a storage medium. In a non-limiting example, the EEG signals are obtained from the patient in a conscious or sleep state. In another non-limiting example, the EEG signals are obtained while the patient is presented with one or more stimuli. For example, the one or more stimuli may include an electrical shock, or an audible or visual stimulus. Next, at least one feature is extracted from the EEG signals at step 304 using a plurality of state space models (SSMs). In a non-limiting example, the feature includes a power, amplitude, phase, oscillatory frequency, probability of an oscillatory state, probability of a candidate state space model, slow wave amplitude or power during sleep, sleep spindles, or a sleep state. [0102] In a non-limiting example, the one or more spectral features are decomposed from the EEG signals using an oscillator model in the form of a Gaussian linear SSM, defined by
$$
\begin{bmatrix} x^{Re}_t \\ x^{Im}_t \end{bmatrix}
= a \begin{bmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{bmatrix}
\begin{bmatrix} x^{Re}_{t-1} \\ x^{Im}_{t-1} \end{bmatrix}
+ \begin{bmatrix} u^{Re}_t \\ u^{Im}_t \end{bmatrix},
\qquad \omega = \frac{2\pi f}{f_s}, \qquad u_t \sim \mathcal{N}(0, \sigma^2 I),
$$

where $x^{Re}_t$ is the real component and $x^{Im}_t$ is the imaginary component of a generalized phasor that admits time-varying amplitude, phase and frequency, $f$ is the frequency, $f_s$ is the sampling frequency, $a$ is the scaling factor, $\omega$ is the phase, $u_t$ is system noise, and $\sigma^2$ is the noise variance. The Gaussian linear state space model is described in further detail in the example below. [0103] In a non-limiting example, the plurality of SSMs is used to estimate an amplitude and phase. [0104] At step 306, an indicator of a neurodegenerative disease is determined using the plurality of SSMs. In a non-limiting example, an indicator of neurodegenerative disease includes determining an early biomarker of dementia, determining a link between a pathology and cognitive performance of dementia (FIG. 4), or an effect of one or more therapeutics on dementia. In a non-limiting example, an early biomarker could include changes in the shape or
amplitude of ERPs, or could include changes in the properties of oscillatory dynamics including the oscillatory frequency, damping, power or amplitude of the oscillation, or the phase of the oscillation, interactions among any of these properties, as well as the probability or frequency of occurrence of any neural states represented by such models (e.g., frequency of spindles during NREM sleep, duration of NREM or REM sleep). [0105] In a non-limiting aspect, neurodegenerative disease includes Alzheimer’s disease (AD) and related dementias (ADRD) that impair memory, thought processes, and functioning, primarily among older adults. Such related dementias may include vascular dementia, dementia with Lewy Bodies, mixed dementia, Parkinson’s disease, and frontotemporal dementia. [0106] At step 308, a report including the indicator of the neurodegenerative disease is provided. [0107] Referring to FIG. 3B, an alternative method 310 for determining and characterizing neurodegenerative disease is shown. In steps 312-314, EEG signals are received from a patient and at least one feature is extracted from the EEG signals using a plurality of SSMs as described previously in steps 302 and 304. [0108] At step 316, the plurality of SSMs is input into a switching state-space model (SSSM). In a non-limiting example, the SSSM includes the plurality of SSMs and a hidden state Markov chain. The discrete hidden state Markov chain gates one of the plurality of independent SSMs to an observation. In a non-limiting example, an observation, $y_t$, is at least a portion of the received EEG signal. The joint probability of the hidden states and the observations is defined by

$$
p(\{s_t\}, \{x_t\}, \{y_t\}) = p(s_1)\,p(x_1) \prod_{t=2}^{T} p(s_t \mid s_{t-1})\,p(x_t \mid x_{t-1}) \prod_{t=1}^{T} p(y_t \mid x_t, s_t),
$$

where $s_t \in \{1, \ldots, M\}$ is the discrete hidden Markov chain and $M$ equals the number of the one or more oscillator models. Further, unknown parameters of the plurality of SSMs and the hidden Markov chain are learned using a variational Bayes learning algorithm, wherein the posterior distributions over the hidden states and the discrete switching states are constrained to be independent. [0109] At step 318, an indicator of a neurodegenerative disease is determined using the SSSM. [0110] At step 320, a report including the indicator of the neurodegenerative disease is provided as in step 308 described previously.
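For illustration, the generative structure of the oscillator SSM of paragraph [0102] and the gating Markov chain of paragraph [0108] can be sketched as a toy simulation. All parameters below (the two regime frequencies, the transition probabilities, and the noise levels) are illustrative assumptions, and no variational learning or inference is performed; this sketch only shows how the discrete chain gates which oscillator drives the observation.

```python
import numpy as np

def rotation(f, fs):
    """2x2 rotation matrix advancing a phasor by omega = 2*pi*f/fs per sample."""
    w = 2 * np.pi * f / fs
    return np.array([[np.cos(w), -np.sin(w)], [np.sin(w), np.cos(w)]])

def simulate_switching(T=2000, fs=100.0, seed=0):
    """Toy generative sketch of a switching SSM: a discrete, sticky Markov
    chain s_t selects which of two oscillator SSMs is observed at time t."""
    rng = np.random.default_rng(seed)
    a, sigma = 0.99, 0.5                              # scaling factor, system-noise std (assumed)
    R = [rotation(1.0, fs), rotation(10.0, fs)]       # slow (1 Hz) vs alpha (10 Hz) regimes
    P = np.array([[0.995, 0.005], [0.005, 0.995]])    # sticky transition matrix
    x = [np.zeros(2), np.zeros(2)]
    s = np.zeros(T, dtype=int)
    y = np.zeros(T)
    for t in range(1, T):
        s[t] = rng.choice(2, p=P[s[t - 1]])
        for j in range(2):                            # both oscillators evolve in parallel;
            x[j] = a * R[j] @ x[j] + rng.normal(0, sigma, 2)
        y[t] = x[s[t]][0] + rng.normal(0, 0.1)        # the chain gates the observed real component
    return s, y

s, y = simulate_switching()
```

In the methods described herein, the reverse problem is solved: given only $y_t$, the oscillator parameters and the posterior over $s_t$ are estimated by variational Bayes.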
[0111] FIG. 3C shows another method 322 of characterizing and tracking neurodegenerative disease using a SS-ERP model. At steps 324-326, EEG signals are received from a patient and at least one feature is extracted from the EEG signals using a plurality of SSMs as described previously in steps 302 and 304. Next, at step 328 an ERP associated with one or more stimulus presentations is extracted using an SS-ERP model. In a non-limiting example, the ERPs may include the P300 oddball response. [0112] In a non-limiting example, the ERPs are represented as a convolution between ERP waveforms, $h$, and one or more stimulus presentations, $s$:

$$
e_t = (h * s)_t = \sum_{\tau} h_\tau\, s_{t-\tau}.
$$

[0113] Furthermore, the ERPs are extracted using a SS-ERP model represented by,

$$
y_t = \sum_{j=1}^{M} x^{Re,(j)}_t + (h * s)_t + v_t, \qquad v_t \sim \mathcal{N}(0, \sigma_v^2),
$$

where the $x^{(j)}_t$ are the oscillator states representing the background oscillations. [0114] The SS-ERP model is described in further detail in the example below. [0115] Alternatively, the evoked component can be represented as an expansion on a dictionary consisting of a given basis function. As a representative of such basis functions, a Gabor dictionary is considered whose atoms are given by Gaussian kernels of fixed variance, normalized to have maximum 1. A zero-mean Gaussian prior is considered on the expansion coefficients, $\beta \sim \mathcal{N}(0, \Lambda)$, with a diagonal covariance matrix $\Lambda$. The resulting ERPs are referred to as Kernel Expansion ERPs (KE-ERP). [0116] The KE-ERP model is described in further detail in the example below. [0117] At step 330, an indicator of a neurodegenerative disease is determined using the SS-ERP model. [0118] At step 332, a report including the indicator of the neurodegenerative disease is provided as in step 308 described previously.
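As a non-limiting illustration of the kernel expansion idea in paragraph [0115], the sketch below builds a dictionary of Gaussian kernels (fixed variance, normalized to have maximum 1) and computes MAP expansion coefficients under an isotropic special case (Λ = λI) of the zero-mean Gaussian prior. The dictionary size, kernel width, and synthetic ERP are assumptions; the full KE-ERP model additionally accounts for background oscillations.

```python
import numpy as np

def gaussian_dictionary(n, n_atoms, width):
    """Dictionary whose atoms are Gaussian kernels of fixed variance,
    normalized to have maximum 1, centered on an even grid of time points."""
    t = np.arange(n)[:, None]
    centers = np.linspace(0, n - 1, n_atoms)[None, :]
    return np.exp(-0.5 * ((t - centers) / width) ** 2)

def ke_erp_map(y, G, lam=1.0, noise_var=1.0):
    """MAP expansion coefficients under beta ~ N(0, lam*I) and Gaussian
    observation noise: a ridge-regularized least-squares solution."""
    n_atoms = G.shape[1]
    A = G.T @ G / noise_var + np.eye(n_atoms) / lam
    return np.linalg.solve(A, G.T @ y / noise_var)

# Synthetic ERP: a smooth bump at sample 120 observed in noise.
rng = np.random.default_rng(2)
n = 200
true_erp = np.exp(-0.5 * ((np.arange(n) - 120) / 12.0) ** 2)
y = true_erp + rng.normal(0, 0.05, n)
G = gaussian_dictionary(n, n_atoms=40, width=8.0)
beta = ke_erp_map(y, G, lam=1.0, noise_var=0.05 ** 2)
erp_hat = G @ beta  # reconstructed ERP waveform
```

The reconstruction `erp_hat` is smooth by construction of the dictionary, which is the motivation for expanding the evoked component on Gaussian kernels rather than estimating each time point independently.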
[0119] As used in this specification and the claims, the singular forms “a,” “an,” and “the” include plural forms unless the context clearly dictates otherwise. [0120] As used herein, “about”, “approximately,” “substantially,” and “significantly” will be understood by persons of ordinary skill in the art and will vary to some extent on the context in which they are used. If there are uses of the term which are not clear to persons of ordinary skill in the art given the context in which it is used, “about” and “approximately” will mean up to plus or minus 10% of the particular term and “substantially” and “significantly” will mean more than plus or minus 10% of the particular term. [0121] As used herein, the terms “include” and “including” have the same meaning as the terms “comprise” and “comprising.” The terms “comprise” and “comprising” should be interpreted as being “open” transitional terms that permit the inclusion of additional components further to those components recited in the claims. The terms “consist” and “consisting of” should be interpreted as being “closed” transitional terms that do not permit the inclusion of additional components other than the components recited in the claims. The term “consisting essentially of” should be interpreted to be partially closed and allowing the inclusion only of additional components that do not fundamentally alter the nature of the claimed subject matter. [0122] The phrase “such as” should be interpreted as “for example, including.” Moreover, the use of any and all exemplary language, including but not limited to “such as”, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. 
[0123] Furthermore, in those instances where a convention analogous to “at least one of A, B and C, etc.” is used, in general such a construction is intended in the sense that one having ordinary skill in the art would understand the convention (e.g., “a system having at least one of A, B and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description or figures, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.” [0124] All language such as “up to,” “at least,” “greater than,” “less than,” and the like, include the number recited and refer to ranges which can subsequently be broken down into ranges and subranges. A range includes each individual member. Thus, for example, a group having 1-3
members refers to groups having 1, 2, or 3 members. Similarly, a group having 1-6 members refers to groups having 1, 2, 3, 4, 5, or 6 members, and so forth. [0125] The modal verb “may” refers to the preferred use or selection of one or more options or choices among the several described embodiments or features contained within the same. Where no options or choices are disclosed regarding a particular embodiment or feature contained in the same, the modal verb “may” refers to an affirmative act regarding how to make or use an aspect of a described embodiment or feature contained in the same, or a definitive decision to use a specific skill regarding a described embodiment or feature contained in the same. In this latter context, the modal verb “may” has the same meaning and connotation as the auxiliary verb “can.” [0126] The following example provides non-limiting examples of the development of the methods and systems that implement that which is described herein. Example [0127] Results [0128] Neural oscillator models provide individualized characterization of neural oscillations crucial for informed uses of digital AD biomarkers based on EEG rhythms. Neural oscillations are conventionally defined with a fixed frequency band, e.g., 1-4 Hz for delta oscillation, 8-12 Hz for alpha rhythm, etc., despite evidence for a substantial amount of inter-individual variation in the dominant frequencies of these oscillations. Ignoring these variabilities during analysis could lead to confounded and sometimes completely inaccurate results. For example, the alpha oscillation has been intensely studied across the AD spectrum from preclinical individuals to moderate AD patients, and it is a pioneering example of EEG biomarkers in AD clinical trials. Still, alpha activity is most frequently investigated using power spectral density (PSD) within a predefined frequency band.
FIG. 5A demonstrates such analysis among a young adult and two cognitively normal elderly individuals, one amyloid negative (Aβ-) and one amyloid positive (Aβ+), with multitaper spectrograms and average spectra. Amyloid positivity is defined using a cortical composite measure of PiB-PET retention in the frontal, lateral, and retrosplenial regions (FLR). Naturally, one-size-fits-all band-pass filtering with a fixed, pre-defined frequency range between 8 to 12 Hz performed poorly in separating the relevant oscillations from background noise and confounded frequency changes with power differences, leading to average 8-12 Hz PSD values of 17.8, 12.3, and 16.0 dB for the young, Aβ-, and Aβ+ individuals respectively (FIG. 5B). In contrast, the novel approaches to learning oscillator models described herein can identify the underlying oscillations in the resting state recordings without the constraint of pre-specified fixed frequency bands. As an
example, focusing on the alpha oscillation, the unsupervised method adjusted to different, individualized or personalized alpha center frequencies and spectral peak shapes in each subject. Extracting PSD values (+/- 2 Hz from the central frequency) in the alpha components gave rise to 17.7, 16.2, and 15.9 dB for the young, Aβ-, and Aβ+ individuals respectively. Characterizing the alpha oscillation with individualized oscillator models revealed the significantly lower alpha power in both older subjects compared to the young subject. Furthermore, the alpha power in the Aβ+ subject is noticeably lower than that of the Aβ- subject. Considering that most studies of resting state rhythms employ fixed frequency bands, many of the metrics computed and related to AD pathology are likely corrupted by the individual variations highlighted here, inflating the within-group variability of EEG measures and reducing the statistical power to detect meaningful differences that reflect the AD disease process in the early stage. In contrast, the systems and methods disclosed here introduce more principled and individualized or personalized measures of neural oscillations to advance EEG-based biomarkers in AD research, and the neural oscillator models can accomplish this task. [0129] The novel neural oscillator models within a novel switching state space framework can track and extract time-varying oscillatory dynamics with high precision. Standard analysis methods treat neural oscillations as stationary rhythms despite strong evidence to the contrary. An important benefit of the state-space oscillator models is the use of explicit generative models that can be extended to model time-varying patterns that vary according to a hidden Markov model (HMM)—i.e., a switching state space model. These switching state space models can therefore be used to infer when oscillations of interest are statistically present.
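The contrast drawn in paragraph [0128] between a fixed 8-12 Hz band and an individualized band centered on the subject's own alpha peak can be sketched simply. Here the peak is located as a spectral maximum on synthetic data rather than by the oscillator-model decomposition described herein, and the search range, half-width, and signal parameters are illustrative assumptions.

```python
import numpy as np
from scipy.signal import welch

def alpha_psd_db(x, fs, search=(6.0, 14.0), halfwidth=2.0):
    """Average PSD (dB) in a +/- halfwidth band around the individual alpha
    peak located within `search`, and in the fixed canonical 8-12 Hz band."""
    freqs, psd = welch(x, fs=fs, nperseg=int(4 * fs))
    sel = (freqs >= search[0]) & (freqs <= search[1])
    peak = freqs[sel][np.argmax(psd[sel])]
    indiv = (freqs >= peak - halfwidth) & (freqs <= peak + halfwidth)
    fixed = (freqs >= 8.0) & (freqs <= 12.0)
    return 10 * np.log10(psd[indiv].mean()), 10 * np.log10(psd[fixed].mean()), peak

# Subject whose alpha peak sits at 7.5 Hz, just below the canonical band.
fs = 100
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(3)
x = 20 * np.sin(2 * np.pi * 7.5 * t) + rng.normal(0, 5, t.size)
indiv_db, fixed_db, peak = alpha_psd_db(x, fs)
```

For this subject the fixed 8-12 Hz band largely misses the alpha rhythm, while the peak-centered band captures it, illustrating why fixed-band measures inflate within-group variability.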
A novel variational Bayesian learning method was developed for these switching state-space problems. As an example, when applied to the case of non-stationary alpha oscillations, the method identifies segments when alpha oscillations are indistinguishable from noise (FIG.6A). This allows a more accurate measure of alpha power by conditioning on when the alpha oscillation is actually present. Therefore, the variance of average alpha power spectral density estimation within an individual can be significantly reduced, achieving smaller minimal detectable effects (MDE = 2.485 × std), improving from 11.75 dB using standard methods to 6.39 dB using the novel switching oscillator methods (FIG.6B). The empirically observed power difference between normal aging and MCI subjects is approximately 6 dB (FIG.6B, dotted horizontal lines). Clearly, inferences on individual cognitive status and longitudinal tracking along the AD spectrum are only feasible using methods such as those presented herein that can achieve this level of precision.
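The MDE arithmetic above is simple to illustrate numerically. The sketch below assumes only the stated convention MDE = 2.485 × the standard deviation of repeated within-subject power estimates; the sample values are hypothetical, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)

def minimal_detectable_effect(power_estimates_db):
    """MDE = 2.485 * std of repeated power estimates (in dB),
    per the convention stated in the text."""
    return 2.485 * np.std(power_estimates_db, ddof=1)

# Hypothetical repeated alpha-power estimates (dB) from one subject:
standard_method = rng.normal(17.0, 4.7, size=50)   # noisier standard estimates
switching_method = rng.normal(17.0, 2.5, size=50)  # conditioned on alpha presence

mde_standard = minimal_detectable_effect(standard_method)
mde_switching = minimal_detectable_effect(switching_method)
assert mde_switching < mde_standard  # smaller MDE -> finer detectable changes
```

A smaller MDE means a smaller true change in alpha power can be distinguished from estimation noise within a single subject.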
[0130] To explore this direction of analysis, switching oscillator models were applied to characterize individualized alpha power in two groups (n=7 each) of cognitively normal elderly subjects, who are amyloid positive and negative respectively according to a cutoff score of PiB-PET FLR regional distribution volume ratio (DVR) at 1.2. FIG.7A shows that amyloid positive individuals, despite being cognitively normal, exhibited a trend of decreased alpha power, computed by averaging PSD values within 2 Hz around the central frequency of the individually characterized alpha oscillator, conditioned on when the alpha oscillation is statistically present based on switching inference. In addition, as can be seen from FIG.5A, amyloid positive individuals appear to have a less stable alpha oscillation that varies over time. Using the switching inference model to identify alpha segments in the two groups of 14 subjects, it was also found that amyloid positive individuals tend to have a reduced fraction of eyes-closed recording containing alpha oscillations, as shown in FIG.7B. This is a novel finding in contrast to existing reports of alpha changes in AD that only focus on stationary spectral estimates. Stability of alpha may present a promising biomarker to characterize subtle changes in neural circuit dynamics during the preclinical stage of AD before substantial neuronal loss occurs and power reductions become detectable on EEG recordings. [0131] The novel state-space models can separate ERPs from background oscillations to provide ERP waveform estimates with orders-of-magnitude lower error than traditional averaging, making it possible to differentiate oddball P300 responses between healthy vs. amyloid positive individuals. Event related potentials (ERPs) are time-locked EEG responses to sensory, cognitive, or motor events, characterized by the prominent peaks and troughs of the response waveforms.
In AD studies, the earlier, scalp-positive P300 peak is the most extensively used ERP component. However, the major drawback of current approaches is that they rely on averaging over hundreds of trials to extract these time-locked elicited responses from a backdrop of persistent slow and/or alpha wave oscillations, and the averaged ERPs can easily lose the fine structure of the elicited response. Instead, the oscillator model in conjunction with a reverse correlation model can facilitate estimation of ERPs by explicitly modeling and effectively removing the persistent, spontaneous activities. In addition, a temporal continuity prior can be imposed on the waveform to recover a denoised ERP, and an iterative VB-EM method can be developed and jointly used to solve for oscillator parameters and state-space ERPs (SS-ERP). [0132] FIGS.8A-8B illustrate the SS-ERP extracted from simulated auditory tone responses (10 trials), contaminated by strong slow and alpha oscillations. FIG.8A shows the posterior
mean of the oscillatory components and auditory evoked responses, while FIG.8B compares the SS-ERP to the traditional average ERP (“Average ERP”) alongside the ground truth. SS-ERP follows the ground truth closely with a narrower confidence interval, while the average ERP is dominated by oscillations in the background, demonstrating that the SS-ERP is superior at capturing the true latencies and amplitudes of the ERP peaks. [0133] SS-ERP performance in detecting P300 in EEG data (single channel, parietal lobe) recorded from two cognitively normal older adults was also benchmarked: one amyloid negative and one positive. They were asked to respond to a target tone (2000 Hz) that occurred 20% of the time intermixed with a regular tone (1000 Hz). Only 52 trials were included in the analysis to demonstrate the improved ERP extraction with the SS-ERP method. With only 52 trials, at least 10-fold fewer than typically required by traditional averaging methods to identify the P300 response (effect size: 3.57, Cohen’s d; precision: 0.56; MDE: 3.75 µV), SS-ERP captures the same response by removing the slow wave and persistent alpha from the background with high confidence (effect size: 21.27, Cohen’s d; precision: 85.33; MDE: 0.303 µV). This shows an impressive improvement in effect size of 5.96-fold, in MDE of 12.37-fold, and an increase in precision of 152-fold for a single subject from only 52 trials. The conventional averaging approach shows no discernible difference between the Aβ- and Aβ+ subjects (FIG.9A). In comparison, the SS-ERP analysis shows a large and significant difference in the P300 response between the two subjects (FIG.9B). Hence, SS-ERP makes it possible to observe individual differences in P300 ERP waveforms that may reflect preclinical AD pathology. More generally, the SS-ERP method could be used to estimate other ERP waveforms that reflect preclinical AD pathology.
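The deconvolution idea underlying SS-ERP, estimating an impulse-response waveform in the presence of background oscillations rather than averaging trials, can be sketched with plain least squares. This sketch omits the oscillator model and continuity prior of the actual method; all signals, dimensions, and noise levels below are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

fs, T, L = 100, 3000, 60              # sampling rate (Hz), samples, ERP length (0.6 s)
t = np.arange(L) / fs
h_true = 1.5 * np.exp(-((t - 0.3) ** 2) / 0.003)  # synthetic "P300-like" bump

u = np.zeros(T)                        # sparse stimulus train (40 random events)
u[rng.choice(np.arange(L, T - L), 40, replace=False)] = 1.0

background = np.sin(2 * np.pi * 10 * np.arange(T) / fs)  # persistent alpha-like rhythm
y = np.convolve(u, h_true)[:T] + 0.5 * background + rng.normal(0, 0.2, T)

# Build the lagged stimulus design matrix and solve for the impulse response h
U = np.column_stack([np.concatenate([np.zeros(k), u[:T - k]]) for k in range(L)])
h_hat = np.linalg.lstsq(U, y, rcond=None)[0]
```

Because the stimulus times are random relative to the background rhythm, the regression largely averages the oscillation out of `h_hat`; the state-space method goes further by modeling that background explicitly.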
[0134] We ran the same benchmark for KE-ERP, but with different numbers of trials (118 frequent, 30 oddball) in a single subject. Like SS-ERP, KE-ERP (Cohen’s d: 167.68; precision: 178.98; MDE: 0.2 µV) shows impressive improvements over classical averaging with the same kernels (Cohen’s d: 5.10; precision: 0.112; MDE: 7 µV). KE-ERP again shows a diminished P300 response in the Aβ+ individual compared to the healthy Aβ- individual (FIG.9D), while the strong background oscillatory activity confounds this effect in traditional averaging ERPs (FIG.9C). [0135] Neural oscillator models can characterize individualized (subject- and state-specific) models that represent sleep oscillations more accurately than bandpass filtering. The success of state-space dynamic source localization methods depends on the ability of a model class to represent the signals being estimated. A novel state space model was employed that represents the instantaneous amplitude and phase of multiple parallel oscillations using
“phasors” with real and imaginary components.
This model is uniquely well-suited for neural oscillatory signals during sleep. FIGS.10A-10B show a simulation study analyzing a synthetic EEG signal composed of a 1 Hz slow oscillation with a 10 Hz alpha wave. The state space neural oscillator method was compared with conventional bandpass filtering (0-2 Hz for slow, 9-11 Hz for alpha). The oscillator models provide much more accurate recovery of the underlying waveforms and phase (FIGS.10A-10B). [0136] Neural oscillations are conventionally defined with a fixed frequency band, e.g., 1-4 Hz for the delta oscillation, 8-13 Hz for the alpha rhythm, etc., despite evidence for a substantial amount of inter-individual variation in the dominant frequencies of these oscillations. Naturally, one-size-fits-all band-pass filtering with hard cutoffs performs poorly at separating the relevant oscillations from the background noise even within the same individual. The novel iterative oscillator algorithm automatically identifies the number of underlying oscillators and learns the individualized model parameters for the oscillations. FIGS.11A-11B show the identified neural oscillators under different sleep stages for an elderly subject. The oscillator algorithm decomposes observed EEG signals into slow waves and higher frequency oscillations. The learned parameters reveal an elevated center frequency for alpha oscillations under REM compared to baseline awake (FIG.11A). The algorithm also automatically highlights a clear distinction between fast and slow spindles (FIG.11B) in the parameterized center frequency and the spectra of extracted spindle components. These results showcase the sensitivity of the algorithm to different sleep oscillations and the ability to learn and extract subject- and state-specific parameters.
Given existing evidence reporting group-level changes in AD in resting state alpha, cholinergic signaling, and spindle activity in relation to memory consolidation and their individual differences, the ability to apply neural oscillator models to analyze an individual patient’s EEG with high precision could provide unique insights into underlying AD pathology. [0137] Neural oscillator models within a novel switching state space framework can track and extract time-varying oscillatory dynamics with high precision. Standard analysis methods analyze neural oscillations as stationary rhythms despite strong evidence against this assumption. The stationarity assumption is violated by inherently transient bursting activity during sleep, notably sleep spindles, creating substantial challenges for detecting the presence of spindles, which still relies on human visual inspection or automatic methods with heuristic cutoff thresholds. An important benefit of the state-space oscillator models is the use of explicit generative models that can be extended to model time-varying patterns governed by a discrete-state HMM, which can infer when oscillations of interest are statistically present. A
novel variational Bayesian learning algorithm was developed for these switching state-space problems. When applied to the spindle detection problem, the algorithm successfully extracts spindles using inference of the discrete hidden Markovian switching, which gives tight margins around spindles (FIG.12, red lines) even when the pattern is difficult to discern in the original time-series, outperforming traditional switching inference methods (FIG.12, blue lines). Therefore, changes in sleep spindle activity can be measured in a rigorous data-driven manner and related to AD molecular biomarkers. [0138] State space phase amplitude modulation methods are significantly more precise than conventional methods. EEG phase amplitude modulation of spindles by slow waves during NREM sleep has shown an aging-related phase shift that is correlated with tau levels. It is well known, however, that existing phase amplitude modulation tools are imprecise and require substantial amounts of data, making such methods impractical for diagnostic or therapeutic applications in individual subjects or patients. Consequently, at present, phase amplitude analyses of EEG to study AD pathology have been limited to population-level studies. A novel state space method was recently developed that greatly improves the precision of phase amplitude coupling (PAC) analyses and is incorporated by reference in its entirety. The oscillator models described herein are used to obtain more accurate estimates of instantaneous phases and amplitudes, and further represent cross-frequency coupling in parametric (SSP) and time-varying forms (dSSP) to improve statistical efficiency during PAC estimation. [0139] To demonstrate the improved performance of the estimation method, EEG data from an elderly patient during general anesthesia was analyzed, which is known to exhibit a stereotyped modulation where alpha amplitude is maximal at the peak phase of the delta oscillations. FIG.13A shows how phase amplitude modulation cannot be well estimated with conventional methods in the older patient with cognitive impairment. In comparison, the method described herein extracts a clear modulation phase $\phi_{mod}$ (FIG.13B), with 4.6-fold better precision than the conventional approach (FIG.13C). [0140] It is now understood that phase-amplitude coupling relationships between slow waves and spindles during sleep are disrupted in normal aging as well as in early AD pathology related to the deposition of tau. However, the evidence for this is only visible in large numbers of subjects or patients and is not discernible in individual patients due to the lack of precision of the methods, as illustrated above. In contrast, with the state space methods described herein, it is possible to make precise inferences about phase-amplitude modulation, for instance
between slow waves and spindles during sleep, providing precise diagnostic information in individual patients that can be tracked longitudinally over time. [0141] Herein it is further described how these methods can be used to detect sleep spindles in a manner superior to existing methodologies. Sleep EEG was collected during overnight sleep from a healthy young adult at the Massachusetts General Hospital Sleep Center. Written informed consent was obtained from the subject, and the study protocol was approved by the Massachusetts General Hospital Human Research Committee. EEG recordings from two consecutive nights were acquired using the 128-channel eego™ mylab EEG amplifier system with an equidistant Waveguard™ montage (ANT Neuro, Enschede, The Netherlands) at a 1 kHz sampling rate. The ground and online reference electrodes were placed at the left mastoid and a central parietal electrode (Z3), respectively. A trained polysomnography technician marked sleep stages based on the current AASM guidelines [1]. [0142] We analyzed recordings from a left parietal electrode (LA2), which is analogous to C3 in the International 10-20 system, re-referenced against the right mastoid (M2) and downsampled to 100 Hz. Spindles were detected using the proposed method and the two benchmark algorithms from all periods scored as NREM stage 2/3 sleep. For the variational switching SSM method, spindles were segmented from non-overlapping 30-s epochs that were analyzed independently of each other. This window length agrees with the clinical consensus of studying sleep EEG in 30-s epochs and allows fast latent state estimation and reliable model parameter learning. To quantify the detection overlaps among the three methods, the confusion matrix statistic F1 score was computed: F1 = 2 × precision × recall / (precision + recall), which is the harmonic mean of precision and recall.
Precision and recall measures correspond to the fractions of events detected by one method also detected by the other. After artifact rejection based on outliers in spectral power, the two nights yielded 239 and 257 mins of NREM sleep (excluding stage 1), respectively. Average rates of spindle events detected by the proposed method and by the two benchmark algorithms are shown in Table 1.
![Figure imgf000032_0001](https://patentimages.storage.googleapis.com/3a/31/90/4b15190ff89244/imgf000032_0001.png)
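The overlap F1 computation described above can be sketched as follows; the event times and the matching tolerance are illustrative assumptions, since the text does not specify how two detections are declared to overlap:

```python
def overlap_f1(events_a, events_b, tol=0.5):
    """F1 agreement between two spindle detectors' event lists (times in s).
    An event in one list counts as detected by the other if any event there
    falls within `tol` seconds. The tolerance value is illustrative."""
    def frac_matched(src, ref):
        if not src:
            return 0.0
        return sum(any(abs(s - r) <= tol for r in ref) for s in src) / len(src)

    precision = frac_matched(events_a, events_b)  # A's events confirmed by B
    recall = frac_matched(events_b, events_a)     # B's events found by A
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: detector A finds 4 events, detector B finds 3; they agree on 3
f1 = overlap_f1([1.0, 5.2, 9.8, 14.0], [1.1, 5.0, 9.9])  # precision 0.75, recall 1.0
```

Here precision = 3/4 and recall = 3/3, giving F1 = 6/7 ≈ 0.857.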
[0143] The variational Bayesian switching (VBS) method detected many more spindles than the Wamsley (WAM) spindle detector that uses wavelet transform, but was similar to the
method that identifies time-frequency peaks (TFP) on spectrograms. FIG.14A shows an example 30-s epoch comparing the three spindle detection methods on NREM sleep EEG. It can be observed that all spindles visible on the spectrogram and on the band-pass filtered signal were identified by the proposed method (in red, overlaid on the filtered signal) across the whole window. In comparison, the Wamsley detector selected only the strongest spindles, which is a known bias originating from human scoring. The time-frequency peak method missed one spindle detected by the proposed method at around 7 s while identifying an extra event at around 27 s in the epoch. In the absence of a ground truth, the spindles detected by any one method were not assumed to be more accurate than the others. Instead, the detection overlaps among the methods were analyzed, as shown in FIG.14B. The proposed method captured almost all spindles detected by the Wamsley detector, while having around 30% non-overlapping events with the time-frequency peak method. Spindle detection patterns were highly stable across the two nights in the same subject. [0144] This analysis presented a novel sleep spindle detection algorithm using switching linear Gaussian state-space models. The findings strongly support the framework that switching models can be used for automatic segmentation of sleep EEG into regimes exhibiting different dynamics. Compared to existing spindle detectors using band-pass filtering and thresholding, the method has several important advantages. First, the parametric generative model facilitates inferences that can adapt to instantaneous fluctuations in spindle amplitude and frequency, eliminating the need for pre-defined frequency cutoffs and amplitude thresholds. Second, spindles are segmented in a rigorous probabilistic framework with full statistical characterizations, including confidence intervals.
Third, denoised estimates of spindle waveforms are also obtained, providing greater statistical power to detect potential changes in spindle activity. Lastly, the method provides completely data-driven and individualized spindle detection requiring no a priori learning on labeled spindles. This feature allows unbiased detection on short recordings without relying on full-night population statistics to select spindle events. The proposed spindle detector is therefore applicable across different settings from short napping to at-home sleep testing that have become increasingly important in sleep research. [0145] A critical advantage of the modeling approach to study neural oscillations is the ability to rigorously analyze electrophysiological recordings and extract measures of underlying activity to reflect the brain functional status. This is achieved in a data-driven and unsupervised manner, avoiding biases introduced by human imposed cutoff thresholds and offering clearer interpretations. An example to demonstrate this feature is to analyze slow
oscillations during sleep in relation to amyloid deposition in cognitively normal older adults. There is a solid base of literature relating slow wave activity during sleep to cortical amyloid, supported by the recently identified glymphatic system for amyloid clearance and empirical data correlating slow wave activity with cortical amyloid measures. However, slow wave activity is notoriously difficult to quantify from a signal processing standpoint. This is in part because Fourier-based spectral estimation has insufficient spectral resolution to disambiguate between different forms of activity in the narrow slow frequency range. Furthermore, previously disclosed methods provide no definitive evidence of a distinction between slow waves (<1 Hz) and delta oscillations (1-4 Hz) using 1 Hz as the cutoff frequency. [0146] A highly influential paper [Winer et al. 2019] showed a selective correlation between cortical amyloid and a proportional measure of slow wave activity, defined as the ratio of spectral power within 0.6-1 Hz over the total power within 0.6-4 Hz. This same measure has been used for almost a decade to demonstrate various associations of human slow wave activity, including its disruption in normal aging. However, from a signal processing standpoint this is a highly ambiguous and uninterpretable ratio measure, because a reduction in this measure could be driven by an actual reduction in sub-1 Hz slow wave activity, a change in the aperiodic component of the EEG, an increase in delta oscillations in 1-4 Hz, or a combination of any of the above factors. Each factor implies a different mechanistic interpretation of how amyloid is related to altered neural oscillations during sleep and could lead to very different targeted therapeutic interventions in the future. A direct comparison was performed of this ratio measure with the state-space modeling method on overnight sleep recordings collected from cognitively normal older adults at the Mass General Hospital Sleep Center.
As shown in FIG.15, using oscillators to decompose sleep recordings, it was clarified that it is in fact the amplitude of delta oscillations (1-4 Hz), which normally decreases with aging, that tended to be elevated in individuals harboring increased amyloid pathology. In comparison, using the proportional measure of slow wave activity described in Winer et al. 2019, no relation could be identified. This example illustrates the importance of studying neural oscillations in an unsupervised manner and extracting rigorous measures of oscillatory activity, compared to relying on ad hoc derived measures that may correlate with the actual neural sources but are vulnerable to multiple sources of noise and suffer from ambiguous interpretations. [0147] Signal Processing Methods Overview [0148] Oscillator Model.
[0149] A method for decomposing the EEG time series into oscillation components using a Gaussian linear state space model was considered, developed based on Wiener’s random frequency modulation:

$$x_t = a R(\omega)\, x_{t-1} + v_t, \qquad v_t \sim N(0, \sigma_v^2 I)$$
$$y_t = x_{t,1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2)$$

where $R(\omega)$ is the $2\times 2$ rotation matrix through angle $\omega$. [0150] The current state of the oscillator, $x_t$, is determined by rotating the previous state counterclockwise through an angle of $\omega$ about the origin, scaling by $a$, and lastly adding system noise $v_t$. At every time point, a noisy observation of the first coordinate is made. In other words, $x_t$ can be seen as a generalization of a phasor ($x_{t,1}$ and $x_{t,2}$ are the real and imaginary components, respectively) rotating in time, while a noisy version of its projection onto the real line is observed (see FIGS.16A-16C). Naturally, the instantaneous phase of the oscillator is defined as $\phi_t = \arctan(x_{t,2}/x_{t,1})$. The model can be easily generalized to multiple oscillation components by treating the oscillation components independently and the observed time series as a noisy superposition of the first coordinates of all the oscillators. [0151] Learning the Oscillator Components. The oscillator parameters $(a, \omega, \sigma_v^2)$ and the observation noise variance $\sigma_\varepsilon^2$ are learned using an instance of the Expectation Maximization (EM) algorithm. EM alternates between optimizing the distribution over the hidden oscillator states given the current parameters (the E-step) and updating the parameters given the distribution of hidden states (the M-step). The updates are stopped when the value of the likelihood of the observations stabilizes. To determine the number of oscillation components present in the neural time series data, a novel greedy algorithm was developed that starts from a slow wave oscillator and iteratively keeps adding additional oscillators one at a time (see FIG.16B). The algorithm stops, based on the Akaike Information Criterion (AIC), when adding another oscillation no longer reduces the AIC. In summary, the iterative algorithm decomposes EEG data into a set of oscillations in an unsupervised manner that explicitly characterizes the underlying neural oscillations with stationary parameters (see FIG.16D).
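The generative model just described, a damped 2-D rotation plus state noise observed through its noisy first coordinate, can be sketched numerically as follows; the parameter values are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def simulate_oscillator(T, freq_hz, fs, a=0.99, sigma_v=0.1, sigma_eps=0.2, seed=0):
    """Simulate one state-space oscillator: rotate the 2-D state by omega,
    scale by a, add Gaussian state noise; observe the noisy first coordinate."""
    rng = np.random.default_rng(seed)
    omega = 2 * np.pi * freq_hz / fs
    R = np.array([[np.cos(omega), -np.sin(omega)],
                  [np.sin(omega),  np.cos(omega)]])
    x = np.zeros((T, 2))
    y = np.zeros(T)
    for t in range(1, T):
        x[t] = a * R @ x[t - 1] + rng.normal(0, sigma_v, 2)
        y[t] = x[t, 0] + rng.normal(0, sigma_eps)
    phase = np.arctan2(x[:, 1], x[:, 0])  # instantaneous phase of the phasor
    return x, y, phase

# 10 s of a ~10 Hz "alpha-like" oscillator sampled at 100 Hz
x, y, phase = simulate_oscillator(T=1000, freq_hz=10, fs=100)
```

Because the state noise perturbs both coordinates at every step, the realized frequency and amplitude wander randomly around their nominal values, which is the "random frequency modulation" character the model is designed to capture.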
[0152] Switching State Space Model. One elegant way of modeling transient neural processes that manifest momentary variations, i.e., ‘bursts’ of high amplitude oscillations in EEG such as sleep spindles, is the switching state space model (SSSM): a set of oscillator models with different parameters augmented with a discrete hidden Markov chain. FIG.17A provides a schematic of a simple case with two oscillator models in the state space set, $x_t^{(1)}$ and $x_t^{(2)}$, and a bi-state Markov chain, $s_t$. Each oscillator model evolves independently according to its dynamics, while the Markov chain $s_t$ (also evolving independently) merely gates one of the oscillator model states to the observation. The joint probability of the hidden states and observations can therefore be factored as:

$$p\!\left(x_{1:T}^{(1)}, x_{1:T}^{(2)}, s_{1:T}, y_{1:T}\right) = p(s_1) \prod_{t=2}^{T} p(s_t \mid s_{t-1}) \prod_{j=1}^{2} \left[ p\!\left(x_1^{(j)}\right) \prod_{t=2}^{T} p\!\left(x_t^{(j)} \mid x_{t-1}^{(j)}\right) \right] \prod_{t=1}^{T} p\!\left(y_t \mid x_t^{(s_t)}\right)$$

[0153] Variational Bayes Learning of Switching States. Again, as with the oscillator model, the model parameters (i.e., the oscillator parameters for the different models, the transition probability matrix of the hidden Markov chain, and the observation noise) can be learned using an instance of the EM algorithm. However, in the SSSM exact inference of the hidden states is difficult, since conditioning on the observations $y_{1:T}$ (i.e., holding them fixed) introduces inter-dependency between the oscillator states $x_{1:T}^{(j)}$ and the switch $s_{1:T}$, making their posterior distributions intractable. Thus, the E-step was modified to search over a limited space of posterior distributions over hidden states that have a tractable form. This generalized approach is called Variational Bayes (VB). FIG.17B shows one such tractable form, i.e., the posterior distributions over each of the hidden oscillator states and the discrete switching states are constrained to be independent, and the learning approach under this constraint is summarized in FIG.17C. In summary, VB takes a divide-and-conquer approach in the E-step: the constraint allows learning the posteriors of $x_{1:T}^{(j)}$ and $s_{1:T}$ cyclically, fixing the posteriors of the other hidden states.
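The switching generative process described in [0152] can be sketched as follows: two illustrative oscillators evolve in parallel while a sticky two-state Markov chain gates which one reaches the observation. All parameter values (frequencies, damping, transition probabilities) are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def rotation(freq_hz, fs):
    w = 2 * np.pi * freq_hz / fs
    return np.array([[np.cos(w), -np.sin(w)], [np.sin(w), np.cos(w)]])

fs, T = 100, 2000
models = [                      # (damping a, rotation R, state noise std)
    (0.98, rotation(1.0, fs), 0.30),   # slow background-like oscillator
    (0.97, rotation(13.0, fs), 0.30),  # spindle-band-like oscillator
]
P = np.array([[0.99, 0.01],     # sticky transition matrix of the hidden switch
              [0.05, 0.95]])

s = 0
xs = [np.zeros(2), np.zeros(2)]
y = np.zeros(T)
switch_path = np.zeros(T, dtype=int)
for t in range(T):
    s = rng.choice(2, p=P[s])                  # the Markov switch evolves
    for j, (a, R, q) in enumerate(models):     # both oscillators evolve regardless
        xs[j] = a * R @ xs[j] + rng.normal(0, q, 2)
    y[t] = xs[s][0] + rng.normal(0, 0.1)       # the switch gates the observation
    switch_path[t] = s
```

Inference in the actual method runs this logic in reverse: given only `y`, the variational algorithm recovers a posterior over `switch_path`, i.e., when the burst-like regime is statistically present.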
[0154] State-Space Phase Amplitude Modulation (SS-PAC). With the oscillator decomposition model, the amplitude and phase of the underlying oscillation components can be directly estimated from the estimated oscillator states. A parametric representation of the cross-frequency modulation between the slow oscillation’s instantaneous phase $\phi_t$ and the fast oscillation’s amplitude $A_t$ is introduced, which is readily computed from the estimated oscillator states:

$$A_t \propto 1 + K \cos(\phi_t - \phi_0)$$

where, of the modulation parameters, $K$ controls the strength of the modulation while $\phi_0$ is the preferred phase around which the amplitude of the fast oscillation is maximal. The representation can be cast as a constrained linear regression problem using a different set of parameters, $\beta$. These estimates can be incorporated into a second state space model to capture their time-varying counterpart, $\beta_T$, evolving as an auto-regressive process of order $p$ (where $T$ denotes the time windows). [0155] Learning the SS-PAC. The constrained linear regression problem can be solved to derive the posterior distribution of $\beta$ in closed form. This addresses a major source of statistical inefficiency by enabling direct construction of associated credible intervals to determine statistical significance, instead of relying on surrogate data. In summary, SS-PAC is a novel method that integrates the state space model of oscillations with a parametric formulation of PAC, addressing many limitations associated with standard PAC methods. [0156] State Space Evoked Response Potential (SS-ERP). The ubiquitous oscillator model can also facilitate estimation of evoked response potentials (ERPs) by explicitly modeling background oscillatory “noise” that would otherwise interfere with the smaller underlying ERP waveform. This can be achieved by adding an extra term describing the ERP, $h$, as the impulse response of the neural population to the stimulus sequence $u_t$:

$$y_t = \sum_j x_{t,1}^{(j)} + \sum_k h_k\, u_{t-k} + \varepsilon_t$$
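The parametric PAC idea of [0154]-[0155] can be sketched as a plain least-squares fit. This is a simplification of the constrained-regression formulation: the synthetic phase/amplitude series and the cos/sin parameterization of β used here are illustrative assumptions, and the actual method derives full posteriors rather than point estimates:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic slow phase and modulated fast amplitude: A = b0 * (1 + K*cos(phi - phi0))
phi = rng.uniform(-np.pi, np.pi, 5000)
K_true, phi0_true, b0_true = 0.4, np.pi / 3, 2.0
amp = b0_true * (1 + K_true * np.cos(phi - phi0_true)) + rng.normal(0, 0.05, phi.size)

# Regress amplitude on [1, cos(phi), sin(phi)]; expanding the cosine shows
# b1 = b0*K*cos(phi0) and b2 = b0*K*sin(phi0)
X = np.column_stack([np.ones_like(phi), np.cos(phi), np.sin(phi)])
b0, b1, b2 = np.linalg.lstsq(X, amp, rcond=None)[0]

K_hat = np.hypot(b1, b2) / b0          # recovered modulation strength
phi0_hat = np.arctan2(b2, b1)          # recovered preferred phase
```

With the phases and amplitudes taken from the oscillator posteriors instead of a Hilbert-transform pipeline, the same three-coefficient regression summarizes the entire coupling relationship.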
[0157] A continuity prior was also imposed on the ERP, in the form of a random walk, i.e., $h_k = h_{k-1} + w_k$ with $w_k \sim N(0, \sigma_w^2)$, to ensure robust recovery and further remove noise from the estimates. [0158] Learning the ERP. An instance of the VB EM algorithm was also employed, constraining the posterior distributions of the ERP, $h$, and the oscillation parameters to be statistically independent. That allows the oscillator state and the ERP distribution to be updated cyclically in the E-step, while the oscillator parameters can be re-estimated in closed form. This framework provides the posterior distribution of the ERP, as opposed to a point estimate in the classical averaging technique. In summary, the state space ERP readily captures the time-locked responses while simultaneously extracting the background oscillations and thus removing their effects on the waveforms. [0159] Signal Processing Method Details [0160] Learning Parameters for State Space Oscillator Models [0161] Given univariate time series data $y$ from neural signal recordings, a key objective is to determine what, if any, neural oscillations are present in the data $y$. A related question is to characterize the properties or parameters of those neural oscillations. The iterative oscillator method (iOsc) addresses these questions. [0162] iOsc is a greedy search algorithm that attempts to represent $y$ by adding neural oscillation components one at a time until a pre-specified stopping number. Model selection is then performed based on some metric of how well different numbers of oscillations can represent the original data. The output of the iOsc algorithm is this selected set of neural oscillations, inclusive of the number of oscillations and all of their parameters, which are regarded as the most representative oscillator components underlying $y$. [0163] The way we try to represent $y$ in iOsc is via time domain modeling using the following class of parametric oscillator state-space models:

$$x_t = a R(\omega)\, x_{t-1} + v_t, \qquad v_t \sim N(0, \sigma_v^2 I) \qquad (6)$$
$$y_t = x_{t,1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2) \qquad (7)$$
[0164] This model class is chosen as the building block for individual oscillations because it provides an interpretable and efficient model representing a stochastic oscillatory process with random frequency modulation, which is ubiquitously observed in human EEG recordings, distinct from the pure sinusoidal waves encountered in physical science problems such as radar. When multiple neural oscillations are present at the same time, we can easily concatenate multiple independent oscillators in block-diagonal matrix form in (6) and arrive at an equally interpretable model with simple summations in (7) to produce $y_t$. An additional benefit of time domain modeling is that log likelihoods provide a natural metric to evaluate how well different models with different numbers of oscillators can represent $y$, allowing for a straightforward model selection scheme. [0165] Computation and challenge
[0166] At each iteration, iOsc performs time domain modeling of
using one or more oscillator components in the form of (6). The unknown parameters are
for each oscillator and a single observation noise variance
If the values of these parameters are known, state estimation (also called exact inference or belief propagation) can be done efficiently using Kalman filtering and fixed-interval smoothing algorithms to estimate the hidden variables
which represent the time series of the underlying oscillation components. Conversely, given values of one can update the parameter values in this linear Gaussian
setting using closed-form equations. Alternating between these two steps is an instance of expectation-maximization (EM) algorithm with oscillators and data Detailed equations
for exact inference and parameters are outlined below. [0167] While each iteration of iOsc simply carries out EM learning, and the computations are known and efficiently implemented, a crucial challenge is how to initialize EM in iOsc. The complete data log likelihood landscape in this linear Gaussian case is concave with a single global maximum; however, there are non-trivial numbers of unknown parameters and only limited numbers of time points in Given poor initialization, EM may require infinite
iterations or data length to converge to the appropriate values and render poor results of iOsc. Thus, a crucial step to successfully identify underlying oscillator components using iOsc is to choose informed starting points throughout the greedy search. In other words: when we use a single oscillator to describe the data
what should be the initial guess of model parameters provided to EM? When we would like to add an additional oscillator, what should be the initial guess of that additional oscillator as well as of all existing oscillators? [0168] Oscillator initialization [0169] The algorithm design of iOsc adds one oscillator at a time; therefore, we need to construct a scheme to initialize model parameters for a single oscillator. Consider that we have observed data y under sampling rate Fs, and the pre-specified stopping number of oscillations is K, meaning we believe there are at most K oscillations present in the recording. For example, K = 7 could be a reasonable but non-limiting choice in human EEG that typically contains 2-4 distinct oscillations. We have the following steps to initialize an oscillator. [0170] Step 1: fit an AR model
[0171] We first fit an AR model of order 2K − 1. The order is chosen as 2K − 1 because the complex roots of AR models appear as pairs in frequency. For instance, when K = 7, we attempt to identify a zero at DC and 6 frequencies above 0 Hz for oscillatory poles using an AR(13) model. This AR fitting could be done using the Yule-Walker method or the Burg algorithm, for example, and we denote the fitted AR coefficients as {φi}, i = 1, ⋯, 2K − 1.
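Step 1 can be sketched in code as follows. This is a minimal Yule-Walker implementation for illustration only; the variable names (y, K, Fs) and the sample signal are assumptions, not the literal implementation disclosed herein.

```python
import numpy as np
from scipy.linalg import toeplitz, solve

def fit_ar_yule_walker(y, order):
    """Fit AR coefficients phi_1..phi_order and the driving-noise variance
    to data y by solving the Yule-Walker equations."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = len(y)
    # Biased autocovariance estimates r[0..order]
    r = np.array([y[: n - k] @ y[k:] / n for k in range(order + 1)])
    phi = solve(toeplitz(r[:order]), r[1 : order + 1])
    sigma_u2 = r[0] - phi @ r[1 : order + 1]   # white-noise variance of the AR process
    return phi, sigma_u2

# Example: at most K = 7 oscillations -> fit AR(2K - 1) = AR(13)
rng = np.random.default_rng(0)
Fs = 100.0
t = np.arange(int(30 * Fs)) / Fs
y = np.sin(2 * np.pi * 10.0 * t) + 0.5 * rng.standard_normal(t.size)
phi, sigma_u2 = fit_ar_yule_walker(y, order=13)
```

The Burg algorithm mentioned above would be a drop-in alternative for estimating the same coefficients.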
[0172] Step 2: select the largest pole/zero [0173] The complex roots of the fitted AR model can be identified by solving for the roots of the polynomial with coefficients {φi}. We can also obtain the theoretical PSD spectrum S(f) of the AR process. To initialize the oscillator parameters {a, f}, we first find the root corresponding to the highest power. We then estimate the oscillator strength at a pole by weighting the theoretical PSD S(fj) at the pole frequency fj by the pole radius rj, where j comes from the set of all poles/zeros with unique oscillatory frequencies among the complex roots. Once the pole/zero with the largest power is identified, the oscillator parameters can be initialized as a ← rj and f ← fj, i.e., the radius and rotation frequency of the selected root.
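Step 2 can then be sketched as follows, continuing the illustrative notation above; the radius-weighted PSD criterion is one plausible rendering of the selection rule, and the 0.1 Hz lower bound mirrors paragraph [0174].

```python
import numpy as np

def ar_psd(phi, sigma_u2, freqs, Fs):
    """Theoretical PSD of an AR process at the given frequencies (Hz)."""
    w = 2 * np.pi * freqs / Fs
    k = np.arange(1, len(phi) + 1)
    denom = np.abs(1.0 - np.exp(-1j * np.outer(w, k)) @ phi) ** 2
    return sigma_u2 / (Fs * denom)

def select_strongest_root(phi, sigma_u2, Fs):
    """Pick the AR root with the largest radius-weighted PSD; return (a, f) init."""
    roots = np.roots(np.r_[1.0, -phi])        # poles/zeros of the fitted AR model
    roots = roots[np.imag(roots) >= 0]        # one root per conjugate pair (+ real roots)
    f_j = np.angle(roots) * Fs / (2 * np.pi)  # rotation frequencies in Hz
    r_j = np.abs(roots)                       # radii (damping)
    strength = r_j * ar_psd(phi, sigma_u2, np.abs(f_j), Fs)
    j = int(np.argmax(strength))
    a, f = r_j[j], max(abs(f_j[j]), 0.1)      # lower-bound the frequency at 0.1 Hz
    return a, f

# Example: AR(2) with a pole pair of radius 0.95 at 10 Hz (Fs = 100 Hz)
phi = np.array([2 * 0.95 * np.cos(0.2 * np.pi), -0.95 ** 2])
a0, f0 = select_strongest_root(phi, 1.0, Fs=100.0)   # a0 ~ 0.95, f0 ~ 10 Hz
```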
[0174] When a zero is selected as the largest root, the frequency 0 Hz is not a desired choice because these oscillator parameters are initialized for EM learning. Kalman filtering starting from this value prevents updates of the oscillator frequency away from 0 Hz. Thus, we place a lower bound on f at 0.1 Hz during initialization, although subsequent EM learning can update it to be closer to 0 Hz. [0175] Step 3: initialize oscillator state noise covariance [0176] After fitting an AR model, there is no obviously apparent way to initialize the state noise variance parameter σ² for a single oscillator. This is because the estimated white noise variance excites all poles and zeros in the AR process at the same time, instead of being specific to one or multiple poles at the frequency where we might wish to place an oscillator. We can solve this technical challenge by transforming the AR process and the identified noise variance into distinct eigenmodes and reading off noise variances only from the poles located at the selected frequency.
[0177] The key idea here is that we will derive an informed estimate of σ² using spectral decomposition of the fitted AR process. We first write the AR(2K − 1) model in the following state-space form: the state vector stacks the 2K − 1 most recent observations, x_t = [y_t, y_{t−1}, ⋯, y_{t−2K+2}]ᵀ. We also define the following shorthand notations: the companion matrix A, with first row [φ₁, ⋯, φ_{2K−1}] and ones on the subdiagonal, and the selection vectors e_1 = [1, 0, ⋯, 0]ᵀ and b = e_1ᵀ. We can therefore rewrite the AR(2K − 1) equations in the following matrix form:

x_t = A x_{t−1} + e_1 u_t, (11)

y_t = b x_t, (12)

where u_t is the AR driving white noise with variance σ_u². Consider an eigendecomposition of the transition matrix, A = VΛV⁻¹. We can rearrange equation (11) by multiplying V⁻¹, the inverse of the eigenvector matrix, to both sides:

x̃_t = Λ x̃_{t−1} + V⁻¹e_1 u_t, with x̃_t = V⁻¹x_t. (14)
[0178] The diagonal matrix Λ contains complex eigenvalues, which are actually the same poles and zeros of the characteristic function with AR coefficients {φi}. This equivalence relation exists because A is the companion matrix of the polynomial equation with coefficients {φi}. In fact, modern computing software solves the polynomial via eigendecomposition of the companion matrix. Therefore, not only do the complex eigenvalues in Λ match the complex roots in step 2, they are in the exact same order. [0179] Notice that Λ can be viewed as the transition matrix of a set of parametric oscillators. Each complex eigenvalue can be written in complex exponential form, which coincides with a rotation matrix with frequency f multiplied by a damping parameter a that is the magnitude of the complex eigenvalue (or equivalently, the radius of the complex root). In other words, except for a slight difference in the state noise processes not being diagonal, equation (14) is also describing a set of block-diagonally concatenated oscillators in the space rotated by V⁻¹, just like equation (6). The only mathematical distinction is that the noise processes driving the real and imaginary components of each oscillator are now correlated, as well as correlated among different oscillators. While we will not further utilize this oscillator interpretation of eigenmodes, it provides the motivation for initializing the oscillator parameter σ² using
variances of the noise processes driving these eigenmodes. [0180] We also modify the observation equation (12) to be in the same rotated space: y_t = bV x̃_t, with the new observation vector b̃ = bV, which is simply the first row of V. [0181] Since b̃ takes a linear combination of the state values from the different oscillators (poles and zeros) in the rotated space to form the observed data y_t, we can mask out all entries of b̃ as 0 except for the pole selected by equation (8) in step 2.
[0182] Observe that the only noise process in this rotated space is V⁻¹e_1 u_t. Multiplying by the masked observation vector scales the noise variance back in scale with the observed data y and keeps only the contribution from a single pole. Thus, we can derive an estimate of the pole-specific noise variance by applying the masking observation vector to the rotated noise covariance matrix. Because the state noise covariance e_1 σ_u² e_1ᵀ is only non-zero in the first diagonal entry, we can rewrite the rotated noise covariance matrix as Q̃ = σ_u² v_1 v_1ᴴ, where v_1 is the first column of the inverse V⁻¹ and v_1ᴴ is the conjugate transpose of v_1. [0183] Recall that the complex roots (or equivalently, the complex eigenvalues) appear as pairs in frequency except at DC. This means two poles in the AR process contribute to the spectral decomposition at any frequency greater than 0 Hz. This can be taken into account by summing variances across the poles with the same rotation frequency, which ignores the cross-correlation between them. Finally, we arrive at the desired estimate of the oscillator state noise variance parameter σ².
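The eigenmode computation of step 3 can be sketched as follows. Names are illustrative; the mask keeps only the conjugate pole pair at the selected frequency and sums their variance contributions as described, ignoring their cross-correlation.

```python
import numpy as np

def init_state_noise_variance(phi, sigma_u2, selected_freq, Fs):
    """Estimate an oscillator state noise variance sigma^2 from the eigenmodes
    of a fitted AR process (a sketch of step 3)."""
    p = len(phi)
    # Companion matrix A of the AR polynomial
    A = np.zeros((p, p))
    A[0, :] = phi
    A[1:, :-1] = np.eye(p - 1)
    lam, V = np.linalg.eig(A)                    # eigenvalues = AR poles/zeros
    v1 = np.linalg.inv(V)[:, 0]                  # first column of V^{-1}
    Q_rot = sigma_u2 * np.outer(v1, v1.conj())   # rotated noise covariance
    b_rot = V[0, :]                              # rotated observation vector (first row of V)
    freqs = np.abs(np.angle(lam)) * Fs / (2 * np.pi)
    # Keep only the poles at the selected rotation frequency (the conjugate pair)
    sigma2 = 0.0
    for j in np.where(np.isclose(freqs, selected_freq, atol=0.1))[0]:
        m = np.zeros(p, dtype=complex)
        m[j] = b_rot[j]                          # masked observation vector
        sigma2 += float(np.real(m.conj() @ Q_rot @ m))  # per-pole variance
    return sigma2

# Example: AR(2) pole pair of radius 0.95 at 10 Hz (Fs = 100 Hz)
phi = np.array([2 * 0.95 * np.cos(0.2 * np.pi), -0.95 ** 2])
sigma2_init = init_state_noise_variance(phi, sigma_u2=1.0, selected_freq=10.0, Fs=100.0)
```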
[0184] In summary, the key insights of using this eigendecomposition described here in iOsc are: 1) to relate the complex eigenvalues to the poles and zeros of an AR process; 2) to recognize the complex eigenvalues under spectral decomposition as oscillators after rotation; and 3) to apply masking observation vectors in the rotated space to combine variances from relevant poles and zeros to initialize the oscillator parameter σ².
[0185] Step 4: repeat for additional oscillators [0186] Steps 1-3 provide informed starting points of parameters for the first oscillator when the observed data y is used. This first added oscillator is most often a slow oscillator (< 1 Hz) in human EEG recordings, since such slow oscillations tend to be large in amplitude and thus in power. Recall that the search algorithm of iOsc will now attempt to add an additional oscillator on top of the first oscillator that has gone through EM learning. We can use the outcomes of EM learning for existing oscillators, but we need to initialize the parameters for the new oscillator to be added. To achieve this, we can reapply the AR fitting procedure to the innovations, i.e., the one-step prediction error (OSPE): ε_t = y_t − ŷ_{t|t−1}. The OSPE ε is a reasonable time series on which to repeat steps 1-3 because the current model only includes existing oscillators in the transition matrix A; therefore it only predicts oscillatory activity characterized by existing oscillators, but not additional oscillations present in the observed data y that are yet to be characterized. Therefore, the next oscillator to be added by iOsc can be identified by selecting the strongest pole/zero of an AR model fitted to this OSPE ε.
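The OSPE itself is the innovation sequence of a standard Kalman filter. The sketch below assumes a single oscillator in damped-rotation form, as in equation (6); the function names and noise values are illustrative.

```python
import numpy as np

def oscillator_matrix(a, f, Fs):
    """2x2 damped-rotation transition matrix for one oscillator."""
    th = 2 * np.pi * f / Fs
    return a * np.array([[np.cos(th), -np.sin(th)],
                         [np.sin(th),  np.cos(th)]])

def one_step_prediction_error(y, A, Q, R, b):
    """Run a Kalman filter and return the innovations eps_t = y_t - y_hat_{t|t-1}."""
    d = A.shape[0]
    x, P = np.zeros(d), np.eye(d)
    eps = np.empty(len(y))
    for t, yt in enumerate(y):
        x, P = A @ x, A @ P @ A.T + Q            # one-step prediction
        eps[t] = yt - b @ x                      # innovation (OSPE)
        S = b @ P @ b + R                        # innovation variance
        K = P @ b / S                            # Kalman gain
        x, P = x + K * eps[t], P - np.outer(K, b @ P)
    return eps

# Example: innovations of a correctly specified single-oscillator model
rng = np.random.default_rng(0)
Fs = 100.0
A = oscillator_matrix(0.95, 10.0, Fs)
x = np.zeros(2)
ys = []
for _ in range(500):
    x = A @ x + 0.3 * rng.standard_normal(2)
    ys.append(x[0] + 0.5 * rng.standard_normal())
eps = one_step_prediction_error(np.array(ys), A, 0.09 * np.eye(2), 0.25,
                                np.array([1.0, 0.0]))
```

Fitting an AR model to `eps` (rather than to the filtered or smoothed residual) is what exposes oscillations not yet captured by the model.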
[0187] In contrast, if the residual error based on the filtered estimate, i.e., y_t − ŷ_{t|t}, were employed to discover additional oscillations, those additional oscillations would not be visible, since the Kalman filter adjusts the state estimates based on the most recent observations, absorbing the error from any model mis-specification into the state estimate. The same would be true for smoothers such as the fixed-interval smoother. Thus the use of the OSPE for discovery of additional oscillators as described above is an essential feature of the disclosed approach. [0188] This algorithm design offers an efficient implementation because the same routines in steps 1-3 can be repeated identically after replacing y with ε. However, there are two additional technical details that require attention. First, when there are only a small number of oscillators present in a model, strong oscillations in the low frequency range may not be explained fully, because an EM-learned oscillator can have a skewed frequency and/or damping parameter to accommodate spectral content at higher frequencies. As a result, the next strongest pole identified by an AR process fitted to the residual ε might still be in the low frequency range. This is problematic because at the next iteration there would be two oscillators competing with each other in this frequency range. To circumvent this issue, we impose a minimal frequency resolution (typically 1 Hz) when initializing new oscillators, meaning poles and zeros are only considered if they are more than 1 Hz away from existing oscillators. This is a mild constraint applied only during initialization; if there are indeed oscillations closer together, EM learning is able to and will update oscillator frequencies to be less than 1 Hz apart. This frequency resolution is only used to prevent erroneously duplicated oscillators during search. [0189] Second, the choice of AR model order as 2K − 1 in step 1 assumes at most K oscillators. During subsequent search iterations after the first oscillator has been added, fitting AR(2K − 1) will again identify K additional frequencies.
Since there are already oscillators present in the model, a fixed AR model order does not strictly respect the upper bound on the number of oscillators. We chose to keep the AR model order constant in order to ensure comparable initialization of the oscillator parameters {a, f, σ²} across iterations. When there are no further oscillations remaining in the residual during later search iterations, fitting AR(2K − 1) is similar to fitting a high-order AR model to a white noise process. Diffuse poles with small radii will be identified, which are still compatible with iOsc and the use of log likelihoods for model selection, since only a single oscillator is added at a time. [0190] Decomposed oscillator modeling
[0191] A non-iterative variant of the iOsc algorithm can be easily derived from the above routine with slight modifications. Since the spectral decomposition using complex eigenvalues in step 3 is not limited to just the largest pole/zero, we can initialize the desired state noise variance parameter σ² for any of the rotated oscillators after AR modeling by applying separate masking observation vectors. Similarly, we have access to the oscillator parameters {a, f} for all poles and zeros in step 2. Therefore, a non-iterative approach to model y can be constructed by initializing all K oscillators at once after an initial fit of an AR model of order 2K − 1 to y. This removes the need for step 4, which examines the OSPE, because all oscillators to be added have been initialized appropriately. [0192] We refer to this approach as decomposed oscillator modeling (dOsc) because it relies on AR-based decomposition of a time series into complex eigenmodes, from which oscillators are initialized. The rest of the processing in dOsc is identical to iOsc and will add all initialized oscillators one at a time in descending order of σ². The only difference is that each oscillator is no longer initialized after preceding oscillators have been added and learned with EM; initial parameters of all added oscillators are determined in one step with a single AR model fitting. [0193] As expected, iOsc and dOsc perform similarly in many cases given the overlap in the underlying computations. However, a key distinction lies in how the two approaches extract oscillations underlying recordings. iOsc is better geared to uncover masked oscillations that are only clearly visible after other oscillations have been explained away, since it operates on the innovation time series for every oscillator except the first. This comes at the cost of potentially biasing subsequently added oscillators, which can be understood from the maximum likelihood nature of Kalman filtering and smoothing. When fewer oscillators than the true number of underlying oscillations are employed in a model, the oscillators at lower frequencies tend to get skewed away from their central frequencies in order to account for unexplained activity at other frequencies. The innovation spectrum may then contain more power in frequency ranges with existing oscillators and less power in frequency ranges without existing oscillators. Since oscillator parameters in iOsc are initialized from AR fitting of the innovation spectrum, this results in biased initial estimates and parameter values after EM learning. [0194] In comparison, dOsc does not suffer from biases due to oscillator fitting in previous iterations because oscillator parameters are initialized at once on the original data y. Thus, when there are clear and separable oscillations in a recording, dOsc often recovers more precise parameter estimates and state estimation results. However, dOsc is not capable of distinguishing between oscillations that are very close together, especially if there is a weaker oscillation nested under a stronger oscillation that can only be modelled after the stronger oscillation is captured by its own oscillator. In other words, the performance of dOsc is limited by how well AR modeling can separate out eigenmodes; dOsc is still much superior to AR modeling because it employs interpretable and efficient model structures, allows more accurate representation of neural oscillations, and adds oscillators sequentially for model selection. Nevertheless, if no oscillator gets initialized at a given frequency after AR fitting to y, dOsc does not have a mechanism to adapt and add an oscillator in frequency regions not well captured by previous oscillators (which requires examining the OSPE, as in iOsc), since it simply adds the set of oscillators initialized at the beginning in descending order of their initialized values of σ². [0195] When it comes to studying real recordings of neural activity with a variable number of oscillations as well as uncertain oscillation properties, dOsc and iOsc nicely complement each other. A clearer understanding of neural oscillations is often achieved by applying both approaches to the recording and contrasting their modeling results. [0196] Observation noise initialization [0197] The only remaining parameter that needs to be initialized before iOsc and dOsc can proceed is the observation noise variance R. In different recording modalities, there are different ways to estimate the observation noise. For instance, MEG can utilize empty room recordings to directly measure the observation noise process. In EEG, however, there is no equivalent empty room recording available. Thus, we derive an empirical estimate of the assumed white observation noise after high-pass filtering above a selected frequency. This heuristic estimate is motivated by Pirondini et al. (2017) and the idea that EEG recordings above a certain frequency are predominated by noise processes. For example, in observed data y sampled at Fs = 100 Hz, we use the variance of y high-pass filtered above 30 Hz, scaled by 5/2 to account for the full frequency range of white noise, to obtain an initial estimate of R. [0198] We employ an informed inverse gamma prior when updating the observation noise variance R. The motivation is that oscillator state noise processes are additive with the observation noise in the formulation of (7). Thus, during Kalman filtering and fixed-interval smoothing, some of the observation noise gets explained away by the hidden oscillator processes, resulting in a biased estimate of R during parameter learning. This can also be understood by noting that the theoretical spectra of oscillators have non-zero values above 30 Hz, and the additive effect of multiple oscillators in this frequency range can account for a portion of a white observation noise process during state estimation, resulting in an underestimated R. Thus, we constrain parameter updates of R during the iOsc and dOsc search by imposing a prior whose mode is placed at the empirically estimated R described above using the high-pass filtered data variance. [0199] To ensure comparable evaluation of log likelihoods for model selection, we must use
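The empirical estimate of R described above can be sketched as follows, assuming Fs = 100 Hz, a Butterworth high-pass at 30 Hz, and a 5/2 scaling (the ratio of the full 0-50 Hz band to the retained 30-50 Hz band); the filter choice and function name are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def init_observation_noise(y, Fs=100.0, cutoff=30.0):
    """Initial estimate of the observation noise variance R: variance of the
    high-passed data, rescaled to the full white-noise bandwidth Fs/2."""
    sos = butter(8, cutoff / (Fs / 2), btype="highpass", output="sos")
    resid = sosfiltfilt(sos, np.asarray(y, dtype=float))
    return np.var(resid) * (Fs / 2) / (Fs / 2 - cutoff)  # 50/20 = 5/2 here

# For pure white noise, the estimate should approximately recover the full variance
rng = np.random.default_rng(0)
noise = 2.0 * rng.standard_normal(200_000)   # white noise with variance 4
R0 = init_observation_noise(noise)
```

Note that the zero-phase filter's transition band removes a little power near the cutoff, so the estimate is slightly conservative for truly white noise.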
the same prior throughout iOsc iterations, since the log likelihood expression involves R:

log L = Σ_t [ −(1/2) log(2π σ²_{t|t−1}) − (y_t − ŷ_{t|t−1})² / (2σ²_{t|t−1}) ],

where ŷ_{t|t−1} and σ²_{t|t−1} are the conditional mean and variance of y_t given y_1, ⋯, y_{t−1}.
[0200] Model selection [0201] Now that the oscillators can be iteratively added to model the observed data y with proper initialization, each iteration of iOsc and dOsc simply performs EM learning for some number of iterations to update model parameters and estimate hidden oscillator states. Model selection is based on which model, i.e., which combination of oscillators, provides the best representation of y using log likelihoods. Since more oscillators provide more degrees of freedom and can over-fit y, we select the knee point on the log likelihood curve spanning 1 to K oscillators to account for diminishing returns of additional model complexity. The model and its learned oscillators at the knee point are the outputs of iOsc and dOsc as a set of oscillatory components underlying the observed data y. [0202] Switching State-Space Modeling of Neural Signal Dynamics [0203] State-space modeling is a promising and versatile analytic approach for characterizing neural signals that has received substantial attention over many decades. State-space models can be highly effective for analyzing neural time series data, particularly if the model formulation and parameterization are able to accurately represent the dynamics of the data
generating process. Powerful neural data analysis methods have been developed using stationary linear models, for which inference and learning algorithms can be readily derived using well-established methods. However, real neural signal dynamics are more often time-varying, due to either intrinsic fluctuations of physiological states or extrinsic influences from cognitive and environmental processes. Accordingly, mismatches between chosen state-space models and true underlying dynamical processes can be a major source of error in such algorithms. [0204] In practice, researchers can manage variations in temporal dynamics by allowing the model parameters to vary across time or fitting different model parameters to consecutive windows of data. Thresholds can also be used to exclude outliers and limit the range of time series data within some neighborhood where a locally time-invariant approximation remains valid. These approaches can be effective when the time-varying activity can be tracked by changing parameter values within a given model class, or when the temporal dynamics evolve slowly compared to the window length. However, they fall short when the neural signals of interest are by definition transient. Examples of such transient activity are abundant in neuroscience, including epileptic bursts, hippocampal ripples, and event-related potentials. In other scenarios, neural dynamics can change abruptly at unpredictable times, for instance, during anesthesia-induced burst-suppression, different stages of sleep, and modulations from active learning and behavioral responses. Time-varying parameters or windowed approximations would perform sub-optimally at best in these cases and could miss rapid transitions in the dynamics if the scale of temporal variation or window length is mis-specified.
[0205] State-space models with switching have been proposed to represent signals that are composed of multiple segments with different dynamics, under the assumption that each segment is approximately linear and stationary. Two distinct approaches have been taken to develop corresponding algorithms for these switching state-space models. In one approach, approximate solutions were developed based on existing solutions to stationary Gaussian state-space models, or Gaussian SSM for short. These early studies recognized the intractability of accounting for all possible switching transitions whose combinations grow exponentially with the length of the time series. Nevertheless, efficient inference algorithms have been developed by assuming that the transitions between multiple state-space dynamics follow a hidden Markov model (HMM) process and by approximating the conditional means and covariances of Gaussian hidden states. These methods have focused on the filtered estimates of the switching model probabilities and therefore on the ability to forecast a
transition in dynamics. While this inference is relevant in control systems and econometrics, data analysis questions in neuroscience are more often centered on segmenting a given time series into periods with distinct neural dynamics and subsequently characterizing the properties of those dynamics. Likely for this reason, these inference algorithms have not been extensively applied in neural signal processing. [0206] In another more recent approach, advances in Bayesian methods have enabled the development of several new algorithms for switching state-space models. These methods assume similar switching dynamics undergoing HMM transitions but leverage Markov chain Monte Carlo sampling techniques to perform state estimation and parameter learning. The adoption of blocked Gibbs sampling for switching state-space models is particularly powerful, as it allows extensions with non-parametric and generalized linear models as well as with traditionally intractable structures such as recurrent dynamics. These state-of-the-art methods have been successfully applied to the analysis of neural signals. Empirical studies so far using these algorithms have employed less structured hidden state-spaces with dense transition matrices. A limitation of this general-purpose approach is that the candidate models may be less interpretable in relation to the neural mechanisms being studied compared to simpler and physiologically motivated models. So far, the Bayesian sampling methods have been successfully applied to segmentation problems using high-dimensional embedding of neural signals. Meanwhile, state-space models employing quasi-stationary sliding windows with closed-form solutions have been used to characterize time-varying signals. The performance of such models in time-varying scenarios could benefit from extensions to incorporate switching dynamics. 
[0207] In 2000, Ghahramani and Hinton proposed an insightful solution for switching state-space models inspired by variational approximations from graphical models. Their generative models were similar to the hybrid models combining Gaussian SSMs and HMM, and they derived inference and learning computations in the recent Bayesian framework. But instead of relying on Gibbs sampling, their approximate inference and learning solution utilized traditional exact inference algorithms as building blocks. The Ghahramani and Hinton algorithm is uniquely situated between the above two eras of research on switching state-space models, and it provides an opportunity to combine strengths from both approaches to study neural activity with transient dynamics using interpretable models and optimized approximations. This algorithm provides an accessible entry point to switching state-space models and facilitates construction of time-varying models of neural activity that could be further developed using more recent Bayesian methods. Despite the important conceptual advance described by Ghahramani and Hinton, its applications in neuroscience have also been limited. Overall, the likelihood function for switching state-space models is non-concave, and solutions are therefore sensitive to the initialization conditions. To address this issue, Ghahramani and Hinton used deterministic annealing, enabling the algorithm to perform comparably to past inference methods, but with little improvement. Moreover, the complexity of the algorithm and its computational requirements may have limited its adoption. [0208] In this disclosure we introduce methods to significantly improve switching state-space model inference and learning under the framework initially proposed by Ghahramani and Hinton. First, we present a simple but rigorous derivation of the inference algorithm under the variational Bayes approximation and expectation-maximization (EM) frameworks, complete with all closed-form equations involved along with practical considerations when applying this method to neuroscience problems. We then describe a novel initialization procedure that addresses the crucial challenge of navigating the non-concave log-likelihood function to find relevant optimal solutions. We show results on how this method significantly improves over deterministic annealing, enabling variational inference to outperform past inference algorithms. It is also shown that the approach herein improves parameter recovery, which makes it possible to more accurately characterize time-varying dynamics. We then extend the generative model structure to accommodate more complicated switching state-space models including nested models, which are frequently encountered in many applications including neuroscience.
Finally, we apply the variational Bayesian learning algorithm to a long-standing problem of sleep spindle detection from electroencephalography (EEG) recordings during sleep in humans and show compelling results that switching state-space models can reliably identify transient neural activity. [0209] Results [0210] Throughout this work, we use regular and boldface lowercase letters to denote scalars and vectors, respectively. Matrices are denoted by boldface uppercase letters or by regular uppercase letters when the matrices are one-dimensional. The transpose of a matrix A is denoted by Aᵀ, and A_ij indicates the element of the matrix at the i-th row and j-th column position. A variable indexed with another discrete variable m taking values in {1, ⋯, M}, e.g., x^(m), refers to the quantity associated with the m-th value of that discrete variable.
[0211] We consider switching state-space models to consist of a set of M uncorrelated linear Gaussian SSMs encoding arbitrarily distinct dynamics and an HMM of a discrete variable s taking values in {1, ⋯, M}. The hidden states of the Gaussian SSMs evolve in parallel and are allowed to be of different dimensions with appropriate mapping to observations. At every time point, the HMM selects one of the Gaussian SSMs to generate the observed data, giving rise to the switching behavior of this generative model. However, this flexible switching structure comes with high computational complexity: exact inference of the hidden states from observed data quickly becomes intractable, even for M = 2 with moderately long time series. [0212] It has been noted that when the switching states are known, the hidden states of the Gaussian SSMs can be efficiently estimated. Conversely, one can infer the hidden HMM states given the Gaussian SSM states. Based on this insight, the intractability can be circumvented by using a surrogate distribution, q, which approximates the intractable posterior, p. Specifically, we introduce two auxiliary variables: the first acts as the model evidence for each Gaussian SSM to produce the observed data in the absence of known Gaussian SSM states, while the second represents the model responsibility for each Gaussian SSM to explain the observed data when the switching states are unknown. Therefore, alternately updating these auxiliary variables allows us to efficiently estimate posterior distributions of all hidden states in closed form. The functional forms of these variables are obtained by maximizing a closeness metric between the two distributions q and p. This procedure is known as variational approximation. We can also use this approximate variational inference within an instance of the generalized EM algorithm to learn (fine-tune) the Gaussian SSM and HMM parameters when the model parameters are unknown (or unsatisfactory). FIG.18 outlines this variational Bayesian learning algorithm for switching state-space models. [0213] The variational inference procedure requires a good initialization of the auxiliary variables so that they can be iteratively updated to drive the surrogate posterior q closer to the intractable true posterior p, as described in Materials and methods. Given the non-convex nature of the problem, a good initialization should lead to a good local optimum. In practice, having an informed initialization is often difficult since no prior information on the discrete variable is available. One interesting quantity available from closed-form state inference is an interpolated density, defined as the conditional probability distribution of any particular observation given all past and future observations. This interpolated density allows us to devise an informative initialization of the iterative variational inference procedure, instead of using deterministic annealing (see details in the section Initialization of fixed-point iterations of Materials and methods). Concretely, we use this density to compare between the Gaussian SSMs and establish the initial weights for the HMM, which enables us to achieve superior performance in both segmentation and parameter estimation. [0214] In the following results sections, we first show simulation studies to assess the performance of such variational inference and learning. As performance metrics, we evaluate segmentation accuracy of a time series by learned switching states against the ground truth, and parameter estimation errors where applicable, to compare the proposed algorithm with a few existing switching inference algorithms. In addition, we investigate the effects of varying data length on segmentation and parameter learning metrics. Lastly, we model real-world human sleep EEG using switching state-space models and apply the proposed algorithm to detect the occurrence of sleep spindles in an unsupervised manner. [0215] Segmentation with posterior inference [0216] We first focus on the variational inference part (E-step) of the algorithm that approximates the true posterior distribution p with a structured approximate posterior distribution q via fixed-point iterations (see the Variational approximation of hidden state posterior section of Materials and methods). To demonstrate improvements with the novel initialization procedure using the interpolated density, we repeated the simulations from Ghahramani and Hinton (2000) using the following switching autoregressive (AR) models of order 1:
[equations for the two switching AR(1) generative models, per Ghahramani and Hinton (2000)]
[0217] The starting points for each AR model were drawn from the same distribution as their respective state noises. The switching state followed a binary HMM process with initial priors and a symmetric state-transition probability matrix. We generated 200 sequences of 200 time points from the above
generative model and analyzed the segmentation accuracy of inference algorithms given the true parameters. [0218] Four inference algorithms were compared: static switching, interacting multiple models (IMM), variational inference with deterministic annealing (VI-A), and the proposed variational inference with interpolated densities (VI-I). Static switching assumes the switching state to be independent across time points and applies the Bayes rule directly for switching inference. IMM utilizes Gaussian merging to approximate the posterior distribution at each time point to estimate the switching state. VI-A was initialized with a temperature parameter that decreased to its final value over 12 fixed-point iterations with
![Figure imgf000053_0002](https://patentimages.storage.googleapis.com/1b/36/ac/53230e08a51c6d/imgf000053_0002.png)
For comparison, VI-I also ran for 12 fixed-point iterations. Switching states were labelled with a 0.5 threshold on the posterior model probabilities. [0219] An example simulated time series is shown in FIG.19A along with the switching state estimated by random segmentation and by the four inference algorithms. Histograms of percentage correct segmentation (FIG.19B) verify that VI-A reproduces the results of Ghahramani and Hinton (2000), with a mean accuracy of 0.810. Surprisingly, both static switching and IMM were more accurate, with means at 0.827 and 0.864, respectively. VI-I achieved the best mean accuracy at 0.890, surpassing the other algorithms. Notably, in terms of the precision of correct segmentation, i.e., the width of the histograms in FIG.19B, static switching, IMM, and VI-I all showed superior performance over VI-A. [0220] Segmentation with parameter learning [0221] An important advantage of the variational Bayesian framework over other approximate inference algorithms is the ability to learn model parameters using a generalized version of EM. When the underlying parameters are unknown, an accurate learning algorithm can improve segmentation performance over an arbitrary selection of parameters; conversely, if the learning algorithm is inaccurate, segmentation could become unreliable. [0222] To investigate switching segmentation when the model parameters are unknown and must be learned from data, we conducted simulations similar to the study above using the following generative model:
![Figure imgf000053_0001](https://patentimages.storage.googleapis.com/7f/43/dc/eead5847d5c45f/imgf000053_0001.png)
[0223] Initial state values followed their model-specific state noise distributions. An HMM process identical to before was used for the switching state, and we again generated 200 sequences of 200 time points. We initialized all algorithms with random parameters drawn from uniform distributions centered around the true values for each sequence:
![Figure imgf000054_0002](https://patentimages.storage.googleapis.com/01/cb/2b/16601cf2ab2396/imgf000054_0002.png)
where the notations for model parameters are described in the Real- and discrete-valued state-space models section of Materials and methods. [0224] We compared the same four algorithms: for the static switching and IMM inference algorithms, we estimated the segmentation of sequences using the randomly initialized parameters, since parameter learning is not part of these algorithms; for the VI-A EM and VI-I EM learning algorithms, we ran both the fixed-point iterations during the E-step and generalized EM iterations until convergence (see the Fixed-point iterations section of Materials and methods). The same temperature decay as in the variational inference analysis was used for VI-A EM learning at every E-step. Switching states were labelled with a 0.5 threshold on the posterior model probabilities. [0225] Percentage correct segmentation histograms show performance degradation, as expected, across all algorithms compared to inference with true parameters (FIG.20A). Despite the use of incorrect parameters, static switching and IMM achieved reasonable segmentation accuracies of 0.750 and 0.809, respectively, which decreased by 6-7% compared to the previous section where the true parameters were used. Notably, VI-A EM was much less accurate, with a mean accuracy of 0.705. In contrast, VI-I EM maintained excellent segmentation accuracy, outperforming the other algorithms, with a mean of 0.849 that decreased by only 4.1% compared to the previous section. This pattern of better segmentation accuracy of VI-I EM over the others was replicated when the same simulation study was repeated across a wide range of data lengths (see FIG.20C). Unlike static switching and IMM, both EM methods achieved greater segmentation accuracy as data length increased, followed by eventual plateauing. However, VI-I EM reached its plateau at a much higher accuracy than VI-A EM.
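The simulation protocol above (two AR1 models selected by a binary HMM, scored by percentage correct segmentation) can be sketched as follows. The AR coefficients, noise scales, and stay probability below are illustrative placeholders, since the actual generative equations appear in the figures; observation noise is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_switching_ar1(T=200, phis=(0.9, 0.2), sigmas=(1.0, 1.0),
                           p_stay=0.95):
    """Simulate two parallel AR(1) processes and a binary HMM switch
    that selects which process is observed at each time point."""
    s = np.zeros(T, dtype=int)
    x = np.zeros((2, T))
    s[0] = rng.integers(2)
    for m in range(2):
        x[m, 0] = rng.normal(0, sigmas[m])
    for t in range(1, T):
        # binary HMM with symmetric transition matrix
        s[t] = s[t - 1] if rng.random() < p_stay else 1 - s[t - 1]
        for m in range(2):
            x[m, t] = phis[m] * x[m, t - 1] + rng.normal(0, sigmas[m])
    y = x[s, np.arange(T)]  # observe the active model's state
    return y, s

def segmentation_accuracy(estimated, truth):
    """Fraction of correctly labelled time points, taking the better of
    the two label permutations (labels 0/1 are arbitrary)."""
    acc = np.mean(estimated == truth)
    return max(acc, 1.0 - acc)

y, s_true = simulate_switching_ar1()
baseline = segmentation_accuracy(rng.integers(2, size=len(s_true)), s_true)
```

Under this permutation-invariant score, a random labelling (`baseline`) hovers near chance, which is the reference level against which the inference algorithms in [0218] are compared.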
[0226] To characterize the robustness of variational learning and the ability to recover generative model parameters via EM, we analyzed the converged model parameters produced by the two variational learning algorithms relative to the true values (FIG.20B). Across
model parameters, VI-A EM failed to identify the true values in most cases, suggesting the algorithm got trapped in local maxima of the log-likelihood, which explains the poor segmentation accuracy in FIG.20A. VI-I EM estimated model parameters with distributional modes at the true values, indicating more successful parameter recovery than VI-A EM. [0227] Tracking updated model parameters through 10 EM iterations reveals that estimated parameters converged quickly (∼5 EM iterations) to their stationary values (FIG.20D). On average, the stationary values obtained by VI-I EM were closer to the true values than those of VI-A EM, which is consistent with FIG.20B. Additionally, some VI-A EM estimated parameters converged away from the true values. Increasing the data length reduced estimation errors in both methods while requiring slightly more EM iterations to reach the desired convergence. The observation noise variance did not vary substantially during EM iterations, since its magnitude was much smaller compared to the state noise variances. Lastly, both methods tended to overestimate the probability of the HMM staying in the same model, a bias due to finite sampling that was attenuated for longer data lengths (FIG.20D). [0228] All of the simulations produced similar results when they were repeated with less informative distributions for model parameter initialization, albeit with slightly less accurate segmentation. Overall, the variational learning method with interpolated densities showed robustness under uncertain model parameters. [0229] Extensions of model structure and parameter estimation [0230] The simulations studied so far employ observations that switch between two AR1 models. While the findings are useful as a proof of principle for variational Bayesian learning, they are not representative of neural signals in real applications that require switching inference, as such signals usually have more complex dynamics. In addition, a binary HMM switching between two parallel processes introduces large change-point deviations, as can be seen in FIG.19A, which might be trivially detected by looking at the first derivative of the time series. To address these gaps, we conducted simulation analyses using a single bivariate AR1 model that switches between two state-transition matrices. Specifically, the generative model is as follows:
![Figure imgf000055_0001](https://patentimages.storage.googleapis.com/58/c6/e9/804367cb387b1a/imgf000055_0001.png)
where the model parameters are as defined above. The switching state followed a binary HMM process as before, with initial priors and a state-transition probability matrix given by
![Figure imgf000056_0004](https://patentimages.storage.googleapis.com/f6/11/0f/f349dcb267b652/imgf000056_0004.png)
![Figure imgf000056_0005](https://patentimages.storage.googleapis.com/d5/3a/70/db0c2dd430daf5/imgf000056_0005.png)
Therefore, this generative model consists of two identical AR1 models where the second model has a time-varying influence on the first. An example simulated time series is shown in FIG.21 along with the resultant segmentation by the four inference algorithms using true parameters. We generated 200 sequences of 200 time points and analyzed the segmentation accuracy of the inference algorithms given the true parameters. The same temperature parameters as before were used for variational inference with deterministic annealing. Both VI-A and VI-I ran until convergence of the fixed-point iterations. Switching states were labelled with a 0.5 threshold on the posterior model probabilities. [0231] It is evident from both the generative model equations and the example time series that this is a more difficult inference problem than the previous studies using two AR1 models with uncoupled dynamics. Nevertheless, all inference algorithms provided informative estimates compared to random segmentation, with the variational inference algorithms obtaining better accuracy (FIG.22A). VI-I produced segmentation with the best mean accuracy at 0.741, followed by VI-A at 0.719. Static switching and IMM had worse performance at 0.634 and 0.704, respectively. We note that the true generative model in this simulation is different from the assumed generative model employed during inference. Nonetheless, we see that VI-I still provides useful switching inference. This feature is discussed in more detail in the forthcoming section. [0232] Next, we proceed to variational learning when model parameters are unknown. We initialized all algorithms for each sequence with parameters drawn from uniform distributions centered around the true values:
[0233] Multivariate static switching and IMM were applied using the randomly initialized parameters. For the VI-A EM and VI-I EM learning algorithms, we ran both the fixed-point iterations (E-step) and generalized EM iterations until convergence. The same annealing schedule and switching state decision threshold were used as before. [0234] With the M-step of variational learning in this analysis, we demonstrate an important extension of the structured variational approximation to support jointly estimated parameters
across models. Specifically, the structured variational algorithms assume the following generative model structure in this bivariate AR1 example:
where the two candidate models have distinct state-transition matrices that could be updated separately during the M-step, and likewise for the state noise covariances. However, the true generative model suggests that, if two parallel models are used to approximate the switching dynamics, they should share all parameters except the off-diagonal element of the state-transition matrix that gates the effect of the second sequence on the first sequence. Simple manipulations of the update equations can exploit this shared structure to reduce the set size of estimated parameters and pool information across candidate models. [0235] As expected, the segmentation results in FIG.22B show slightly inferior performance compared to when the true parameters are used. The VI-I EM algorithm achieved the best mean accuracy at 0.732. Distributions of converged parameters were comparable between VI-I EM and VI-A EM, suggesting that better segmentation could still be obtained in the absence of obvious improvement in parameter recovery. In addition, when compared against several variational algorithms that are able to assume the accurate generative model structure using a single hidden state, VI-I EM also showed consistently improved segmentation. [0236] This simulation study suggests that the method described herein could be effective in analyzing neural signals that exhibit subtle switching dynamics, and in time-varying Granger causality problems, particularly when the causal structure is changing rapidly. In the current setting, the switching state can provide a statistical inference on the improvement of temporal forecasting of the first sequence by the second sequence at every time point, without relying on log-likelihood ratio tests and windowing. The method also estimates parameters shared between the “full” and “reduced” models, while taking the observation noise into account. This application can be extended to higher-order models in a straightforward manner. [0237] Switching state-space oscillator models [0238] The performance of methods based on parametric state-space models, such as the switching inference algorithms studied here, is ultimately dependent on how well the chosen model class represents the signal of interest.
Oscillations are ubiquitous in neural signals, and they therefore need to be modeled with high sensitivity and specificity if the methods are to capture transient oscillations. Here, we employ a recently developed state-space model that is uniquely suited for modeling oscillations:
![Figure imgf000058_0001](https://patentimages.storage.googleapis.com/b4/ea/d8/837a58b1ce0672/imgf000058_0001.png)
[0239] This model can be viewed as a generalized phasor rotating around the origin at a set frequency in the real and imaginary plane, whose projection onto the real line produces the observed oscillation. We refer to this model as an oscillator hereafter. [0240] In many neuroscience applications, neural systems may exhibit different states defined by distinct combinations of oscillations, and the system may transition between these neural states over time. To test the applicability of the switching inference method for analyzing neural oscillations, a simulation study was conducted using hidden states composed of oscillators. We simulated up to 5 simultaneous oscillations at different frequencies for 10 seconds under a sampling rate of 100 Hz. The frequencies were set at 1, 10, 20, 30, and 40 Hz, respectively. With n total underlying oscillations, one can generate 2^n − 1 possible states, each with a different combination of the oscillations. We used a multinomial switching variable taking one of these 2^n − 1 values to select one of the states at each time point, determining which oscillations were observed. The switching states followed an HMM process with uniform initial priors and a symmetric state-transition probability matrix with 0.98 on the diagonal. We repeated this data generation process 200 times for each number of underlying oscillations with the above oscillators. FIG.23A shows one such simulation instance.
[0241] For example, if there are two oscillations with one at 1 Hz and the other at 10 Hz, there are three possible switching states, and the observation equation takes the form:
where the hidden states of the first candidate model consist of only the 1 Hz oscillation, those of the second of only the 10 Hz oscillation, and those of the third of both oscillations. The switching state therefore takes values in {1, 2, 3}. We note that candidate models with more oscillators will inherently be favored by log-likelihood measures, owing to their higher numbers of degrees of freedom. To account for this, we mean-centered the model evidence of each candidate model when initializing the E-step.
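A minimal sketch of this oscillator construction, assuming the rotation-based state-space form described in [0239]; the damping factor and noise scale below are illustrative assumptions, not values from the disclosure:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

def rotation(f, fs):
    """2x2 rotation matrix advancing a phasor by frequency f (Hz)."""
    w = 2 * np.pi * f / fs
    return np.array([[np.cos(w), -np.sin(w)],
                     [np.sin(w),  np.cos(w)]])

def simulate_oscillator(f, fs=100, T=1000, a=0.99, sigma=1.0):
    """State-space oscillator: a damped rotating phasor driven by state
    noise; the observed oscillation is the real (first) coordinate."""
    x = np.zeros((T, 2))
    R = a * rotation(f, fs)
    for t in range(1, T):
        x[t] = R @ x[t - 1] + rng.normal(0, sigma, size=2)
    return x[:, 0]

freqs = [1, 10, 20, 30, 40]
oscs = np.stack([simulate_oscillator(f) for f in freqs])

# With n oscillators there are 2^n - 1 non-empty combinations, each a
# candidate switching state selecting which oscillations are observed.
states = [c for r in range(1, len(freqs) + 1)
          for c in combinations(range(len(freqs)), r)]
assert len(states) == 2 ** len(freqs) - 1  # 31 states for n = 5

def observe(k, t):
    """Observation under state k: the sum of the selected oscillations."""
    return sum(oscs[i, t] for i in states[k])
```

Enumerating the non-empty subsets makes explicit why the number of candidate models grows exponentially with the number of oscillators, which is the scaling challenge addressed in [0242].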
[0242] We compared the segmentation accuracy of VI-A and VI-I given true parameters as we increased the number of underlying oscillations from 2 to 5. For simplicity, we considered the scenario where the model parameters are known, but as in the previous examples, variational learning of the model parameters via EM is certainly possible. In practice, target neural oscillations are often characterized with stationary methods first; these models could be used as-is for inference, or to initialize model parameters for variational learning. FIG.23B shows that as the number of switching states increased, the accuracy of VI-A dropped precipitously while VI-I maintained good segmentation accuracy. VI-I was able to segment complicated switching trajectories close to the ground truth, such as in the example displayed in FIG.23C with 31 possible switching states from 5 oscillations, while VI-A failed to do so as shown in FIG.23D. These results demonstrate that the switching inference method is capable of modeling transient oscillations in a manner that scales well to higher-dimensional problems. [0243] Real-world application: spindle detection [0244] Sleep spindles are transient oscillatory bursts of neural activity observed in EEG recordings during non-rapid eye movement (NREM) sleep, with waxing and waning waveforms in the sigma frequency range (12 Hz–16 Hz) that last 0.5 s–3 s and that occur on a background of slow waves (0.5 Hz–2 Hz). In this section, we describe a novel spindle detection algorithm to illustrate how the developed switching method can be easily applied in the context of neural signal processing. As we will show, switching state-space models provide a rigorous and probabilistic framework for detecting and extracting spindles from sleep EEG recordings in an unsupervised way. [0245] To model slow oscillations and spindles during sleep, we consider two independent state-space oscillators within a set of two candidate models:
![Figure imgf000059_0001](https://patentimages.storage.googleapis.com/fb/c8/17/03a76e539c310f/imgf000059_0001.png)
An HMM process determines the switching state, taking values in {1, 2} across time. This model represents transient sleep spindles by switching between two “neural states”: one with both slow oscillations and spindles, and the other with only slow oscillations. Furthermore, the slow oscillator parameters are shared between the two candidate models to reflect the stationary dynamics of slow oscillations in the background.
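As a concrete illustration, the two candidate models can be sketched with hypothetical oscillator parameters; the 1 Hz slow and 13.5 Hz spindle frequencies, damping factors, and observation vectors below are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def rot(f, fs, a):
    """Damped 2x2 rotation matrix for an oscillator at frequency f (Hz)."""
    w = 2 * np.pi * f / fs
    return a * np.array([[np.cos(w), -np.sin(w)],
                         [np.sin(w),  np.cos(w)]])

fs = 100.0
A_slow = rot(1.0, fs, 0.99)    # shared slow-oscillation dynamics
A_spin = rot(13.5, fs, 0.95)   # spindle oscillator (sigma band)

# Candidate model 1: slow oscillation + spindle (block-diagonal dynamics)
A1 = np.zeros((4, 4))
A1[:2, :2] = A_slow
A1[2:, 2:] = A_spin
C1 = np.array([[1.0, 0.0, 1.0, 0.0]])  # observe the sum of real parts

# Candidate model 2: slow oscillation only
A2 = A_slow.copy()
C2 = np.array([[1.0, 0.0]])

# During the M-step, the slow blocks of A1 and A2 would be updated
# jointly so that both candidates share one set of slow-wave parameters.
```

The block-diagonal structure keeps the two oscillators independent within a candidate model, while the shared slow block implements the parameter pooling described above.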
[0246] To perform spindle segmentation, we initialized oscillator parameters as described in the Initialization of Gaussian SSM parameters section of Materials and methods. Briefly, the entire time series was modeled with a stationary state-space model including both slow oscillation and spindle oscillators, and parameters were learned through a standard EM algorithm. For variational learning, we derived M-step equations that considered the slow oscillators jointly across both candidate models, to update a single set of parameters for slow waves. In other words, the dynamics of the slow oscillations were modeled by pooling together segments regardless of the presence of sleep spindles. This inference model structure can be viewed as a close approximation to a range of other possible generative models for spindles. Lastly, the HMM switching process was initialized with initial state priors
![Figure imgf000060_0002](https://patentimages.storage.googleapis.com/17/b2/b7/f816ab4eed01fd/imgf000060_0002.png)
and a state-transition probability matrix with switching probability 0.01, both of which were updated during the M-steps. As mentioned in the earlier section, models with more hidden oscillation components will be favored by log-likelihood measures during learning. Thus, we again matched the ensemble averages of the model evidence when initializing each E-step. For VI-A EM, we used the same temperature decay as before. [0247] We first compared the variational learning methods with other inference algorithms on a short segment of EEG data recorded during sleep. In addition to static switching and IMM, we also analyzed a special Gaussian merging algorithm derived for Gaussian SSMs in which only the observation matrix is allowed to switch. FIG.24B shows that these inference-only algorithms performed poorly, even in this case with strong spindle activity: their posterior model probabilities were biased toward selecting spindles due to the nested structure of the candidate models. On the other hand, VI-A EM labelled spindles that appeared reasonable given the EEG waveform and spectrogram. In comparison, VI-I EM produced segmentation results that were more accurate, with tighter margins around spindles (FIG.24A). We note that the posterior model probabilities for VI-A EM and VI-I EM were more polarized (i.e., closer to either 0 or 1) compared to the other inference methods. This pushing of the model responsibility away from 0.5 is a feature of the fixed-point iterations. A softer segmentation can be obtained using interpolated densities, while a harder segmentation is also possible via the Viterbi algorithm. [0248] Besides estimating posterior model probabilities for spindle detection, this approach provides other parametric characterizations of spindle activity, such as the center frequency. For example, in FIG.24A, the algorithm learns that the spindles are centered around 12.7 Hz. In addition, the underlying spindle (and slow oscillation) waveform can be extracted via the
inference procedure without any bandpass filtering (see FIG.24A). While the spindle waveform appears similar to the bandpass filtered signal, unlike bandpass filtering we can construct confidence intervals around the estimated spindle activity. We also emphasize that bandpass filtering requires a pre-defined frequency band, which could introduce serious biases if the spindle frequency approaches the band boundaries. Furthermore, the current method naturally limited spindle activity to the statistically relevant periods without arbitrary thresholding. Finally, these estimated hidden oscillator states (i.e., the real and imaginary parts) can be readily used to compute instantaneous amplitude and phase for downstream hypothesis testing. [0249] Next, we applied the best performing VI-I EM method to three randomly selected 30 s NREM stage 2 sleep recordings, robustly extracting sleep spindles across all time points (FIGS.25A-25C). In addition, we compared the segmentation results to spindles identified by a wavelet-based automatic spindle detector, which has been shown to correspond well to human expert scoring. Every spindle identified by the conventional method was also captured by VI-I EM. However, many other spindle events labelled by VI-I EM were missed by the current “clinical standard”. These events nevertheless showed obvious sigma-frequency power, suggesting potentially inadequate detection by expert scoring. Furthermore, the wavelet-based detection relied on arbitrary thresholds that selected only the strongest portion of spindle activity. In contrast, actual spindles likely lasted for a longer duration, as evident both from the spindle waveforms estimated by VI-I EM and from the original signal time traces (FIGS.25A-25C). [0250] These results demonstrate that switching oscillator models with variational learning could detect spindles in an unsupervised manner that may be more reliable than expert scoring.
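The amplitude and phase computation mentioned in [0248] follows directly from the two coordinates of the oscillator hidden state; a minimal sketch, using a noiseless 13 Hz phasor as a stand-in for the estimated states:

```python
import numpy as np

def amplitude_phase(x_real, x_imag):
    """Instantaneous amplitude and phase from the real and imaginary
    coordinates of an estimated oscillator hidden state, with no
    Hilbert transform or bandpass filtering required."""
    analytic = x_real + 1j * x_imag
    return np.abs(analytic), np.angle(analytic)

# Example: a noiseless 13 Hz phasor sampled at 100 Hz
fs, f, T = 100.0, 13.0, 200
t = np.arange(T) / fs
x_real = np.cos(2 * np.pi * f * t)
x_imag = np.sin(2 * np.pi * f * t)
amp, phase = amplitude_phase(x_real, x_imag)
# amp is constant at 1; phase advances by 2*pi*f/fs per sample
```

With real estimated states, the same computation applies per time point, and the smoother's covariance output could be propagated to attach uncertainty to the amplitude and phase estimates.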
In each recording, VI-I EM learned individualized parameters for the oscillators to best capture the time-varying amplitudes, frequencies, and waveform shapes of the underlying spindles. The posterior model probabilities of the HMM process provide a probabilistic criterion for spindle detection, which is not achievable with a simple fixed threshold on bandpass filtered data as in the de facto practice in sleep research. [0251] In this paper we presented an algorithm for inference and learning with switching state-space models. This method holds promise in modeling time-varying dynamics of neural signals, in particular, neural time series data such as EEG. It takes a Variational Bayes approach to approximate the otherwise intractable posterior and enables state estimation and system identification via a generalized EM algorithm. We showed extensive simulation results on how the method can provide accurate segmentation of piece-wise linear dynamic
regions, even when the true model parameters were unknown. We also extended the generative model structure to higher dimensions and to more subtle switching transitions, instead of jumping between independent state-space models with distinct dynamics. Accordingly, we derived parameter learning rules that were modified to encode this structure and to provide more efficient parameter updates with joint estimation across switching models. Finally, we applied this learning algorithm to a real-world neural signal processing problem of sleep spindle detection, providing excellent detection and extraction of spindles with a rigorous statistical characterization. Taken together, the results add to a growing literature on how switching state-space models can provide powerful statistical inferences to study neural signals. [0252] We were inspired by an earlier, influential paper by Ghahramani and Hinton that applied the variational approximation to switching state-space models, but we introduced several important innovations that significantly improved the practical performance of the variational EM algorithm for these models. In addition, we described in detail a simpler derivation of variational algorithms for such models. Ghahramani and Hinton utilized the same basic form of generative models presented here, but postulated the functional forms of the marginal distributions by introducing additional variational parameters and solved for their optimal values by minimizing the KL divergence between the exact and approximate Hamiltonian functions. In contrast, we derived the optimal functional forms of the hidden state marginals to maximize the negative variational free energy, which is mathematically equivalent to minimizing the KL divergence. In this way, we obtained expressions for these variational parameters as summary statistics of the hidden state marginals directly from the variational posterior (see the Variational approximation of hidden state posterior section of Materials and methods). [0253] Given the non-convexity of the problem, robust initialization of the generalized EM iterations, in particular of the fixed-point iterations during the E-step, is crucial for achieving reliable segmentation and avoiding trivial local maxima in the negative free energy. Ghahramani and Hinton proposed a deterministic annealing approach, which can be viewed as a form of variational tempering. As discussed in Ghahramani and Hinton (2000) and analyzed in this paper, the annealed learning does not improve performance over traditional inference-only algorithms. We proposed a novel initialization strategy based on the insight that interpolated densities provide a local statistical comparison between the candidate models, agnostic of the switching dynamics. The interpolated densities therefore provide natural starting values for the surrogate observation probabilities in the HMM switching process. The results showed that this technique significantly improved the segmentation accuracy and system identification of the algorithm described herein compared to the deterministic annealing approach. We found that this more informed initialization was essential in more complicated scenarios, such as time-varying bivariate AR processes and switching oscillator models, or in practical neural signal processing applications, such as sleep spindle segmentation. [0254] This algorithm provides a unique take on inference and learning for switching state-space models. Compared to traditional inference-only algorithms, also known as assumed density filtering (ADF), the variational learning method allows iterative parameter tuning that improves segmentation. These learned parameters also allow us to characterize the properties of the underlying physical system, which may be important in scientific applications such as neural signal processing. Recent algorithms have explored smoothed inference by extending ADF algorithms. In contrast, the present approach provides an elegant solution for smoothed inference using familiar message-passing algorithms such as the Kalman filter, RTS smoother, and forward-backward algorithm, which become applicable after the conditional dependencies between the real- and discrete-valued hidden states are ignored under the parallel-model variational approximation. This approach is an instance of structured variational inference for time series data. [0255] Other variational learning methods have been developed for switching linear dynamical systems based on the same principle of approximating an intractable posterior with factorized distributions. A notable difference is that these methods assume a single multi-modal hidden state that is modulated by the switching state, instead of the multiple parallel state-space models in the generative process described herein.
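The role of interpolated densities as surrogate observation probabilities can be illustrated with a standard HMM forward-backward pass; the transition matrix and toy per-time evidence values below are hypothetical:

```python
import numpy as np

def forward_backward(obs_prob, trans, prior):
    """Scaled HMM forward-backward pass. obs_prob[t, m] is a surrogate
    observation probability for candidate model m at time t (e.g.
    derived from interpolated densities); returns posterior model
    probabilities of shape (T, M)."""
    T, M = obs_prob.shape
    alpha = np.zeros((T, M))
    beta = np.ones((T, M))
    alpha[0] = prior * obs_prob[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = obs_prob[t] * (alpha[t - 1] @ trans)
        alpha[t] /= alpha[t].sum()  # rescale for numerical stability
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (obs_prob[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# Toy check: strong local evidence for model 0 early and model 1 late
obs = np.vstack([np.tile([0.9, 0.1], (5, 1)),
                 np.tile([0.1, 0.9], (5, 1))])
trans = np.array([[0.95, 0.05], [0.05, 0.95]])
gamma = forward_backward(obs, trans, np.array([0.5, 0.5]))
```

Because the per-time scaling constants cancel in the final row normalization, this scaled recursion returns the same posteriors as the unscaled algorithm while avoiding underflow on long sequences.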
As we explored in the bivariate AR1 simulation and spindle analyses, multiple parallel models can be an effective way to approximate multi-modality: the hidden states in parallel models are suppressed when they are not contributing to the observed data, since the Kalman filtering and smoothing within each model are weighted by the model responsibility. In addition, when multiple models contain hidden states reflected in the observed data, the smoothed estimates across parallel models are closely linked, since they are all conditioned on the same observed data. A future study could compare these two approaches and characterize their computational costs and learning performance. Nevertheless, the initialization strategy using interpolated densities could still be helpful in these multi-modal variational learning algorithms. Given that the observed data have a sufficiently high sampling rate relative to the underlying state-transition
dynamics, interpolated densities can provide local comparisons among switching states to warm start the E-steps of variational learning algorithms close to the desired optimal solution. [0256] More recent algorithms for switching state-space models embrace a Bayesian approach by using sampling techniques to obtain posterior distributions. Some of these methods can simultaneously learn the number of switching states and more complicated recurrent structures. Recurrent neural networks can also be adapted to support Kalman updates in a deep state-space model and to characterize switching dynamics after training. These methods have varying computational cost and complexity, which require careful adaptation for neural signal processing or biomedical applications. A few studies have applied variational inference on more elaborate model structures and showed meaningful segmentation with neural signals. [0257] In comparison to the existing applications of switching state-space models, we focus on interpretable linear Gaussian SSMs that can faithfully capture the distinct dynamics observed in neural signals. By applying an intuitive variational approximation on several candidate models constructed from clinical and neuroscience domain knowledge, the variational learning method we described here provides a general solution for segmenting a given time series into neural dynamic states of interest. This method, however, still has limitations encountered in other switching state-space modeling algorithms. First, many of the methodological considerations we described here, including the bias of log-likelihoods in nested structures, apply to other methods. In addition, a single phenomenon or physical process may have multiple switching state-space representations. 
For example, the process described in the univariate AR1 simulations can have an alternative generative process: two univariate hidden states may be concatenated into a single bivariate hidden state with fixed dynamics, while an observation matrix gets modulated by the switching state to choose particular indices of the bivariate hidden state. Parameter learning and interpretation can be difficult in this scenario if the parameters for the SSM dynamics and the corresponding observation matrices are both learned from the observed data. Thus, we opted to use fixed observation matrices as a constraint to facilitate recovering unique solutions. We nonetheless provide expressions to update the observation matrices, as in some neuroscience applications, switching observation matrices may best encode neural dynamics of interest. In this scenario, they can be learned in a data-driven way with the approach described herein after sharing all SSM parameters across candidate models, analogous to constraining to a stationary SSM dynamic in a single multi-modal hidden state.
[0258] Our assumed generative structure, consisting of multiple independent dynamical models, permits efficient closed-form computations on the approximate posterior using existing state-space inference algorithms. While the fixed-point iterations and parameter updates can be rigorously derived and familiar methods can be easily plugged in, the non-concave negative free energy poses challenges for inference, including the frequent local maxima with polarized model probability estimates that we report here. An alternative to the generalized EM algorithm is blocked Gibbs sampling, which has been successfully applied to switching state-space models. A future direction for further development could be to apply blocked Gibbs sampling to the class of neural oscillator models, in a manner that uses the interpolated density to improve computational efficiency. [0259] In the same vein of simplifying state estimation using sampling techniques, black-box variational inference is also applicable to switching state-space models. Black-box variational approximations are attractive because they provide a general-purpose tool requiring almost no distributional assumptions on the dynamical systems. However, they might not perform as well in cases where we can identify candidate models and their switching structure from prior domain knowledge. In addition, closed-form inferences tend to have greater computational efficiency than sampling techniques despite the requirement for fixed-point iterations, which converge quickly when the parallel models are reasonably initialized. One clear advantage of sampling techniques is that they are readily applicable in the case of non-linear observation models, such as observations generated by counting processes (e.g., point processes to model neural spiking activity with history dependence). 
In this context, we note that the approach described herein can also be extended to such non-linear observation models with relatively small effort. For example, point process observations can be handled by replacing Kalman filtering with extended Kalman filtering or unscented Kalman filtering, with the rest of the switching inference procedure unaltered. [0260] In general, the variational inference algorithm described here is best suited for scenarios where an informed guess of model parameters can be used for initialization. From a neural signal processing perspective, we are equally concerned about characterizing interpretable hidden dynamics of different neural states and achieving accurate segmentation of time-varying activity due to transitions among them. This intention is reflected in the use of multiple models in parallel, which compete with each other to explain observed data, rather than using a larger array of models with arbitrary structures and parameterizations that are intended to approximate neural dynamics of interest. In contrast, the parallel models we use favor a more parsimonious and informed set of candidate models to obtain interpretable
characterization of dynamics and accurate segmentation. The use of the interpolated density is consistent with this premise, since it better exploits prior domain knowledge and the carefully constructed candidate models. In cases where there is an unknown number of switching states, the dynamics are less well-characterized, or the goal is to fit the data as well as possible, a number of sampling-based algorithms, such as those discussed above, could be highly effective. [0261] The sleep spindles analyzed here serve as a good example of scenarios where switching state-space models can be directly applied to achieve much more powerful analyses than the standard methods employed in the field. For decades, spindles were studied with visual scoring by trained human experts, until recent efforts to develop reliable detection algorithms in light of their integral roles in normal sleep physiology, memory consolidation, and psychiatric and neurodegenerative diseases. While they differ in exact implementation, almost all spindle detection algorithms are fundamentally based on setting an arbitrary threshold to pick out strong transient signals in predefined frequency ranges, an approach shown to be severely biased in a recent study. With informed prior knowledge of sleep oscillations, state-space oscillator models can be constructed. The exact parameters of these models should be learned from data since they vary across individuals and different stages of sleep. But one can decide on the number of switching models and their compositions given domain knowledge of sleep and the target spindle detection problem. In this study, we show that the variational learning method can robustly extract sleep spindles from NREM2 sleep in an unsupervised manner, which obviates the need for thresholding. At the same time, variational learning can better characterize spindle properties such as frequency and amplitude by focusing on periods when they are statistically detectable. 
As mentioned earlier, the use of state-space oscillator models allows one to directly estimate instantaneous amplitude, phase, and frequency without relying on traditional bandpass filtering and the Hilbert transform, as well as to easily obtain derived measures such as phase-amplitude coupling. In addition, the state-space formulation also allows for filtered estimates of posterior switching state probabilities for online detection of sleep spindles, making it potentially suitable for real-time applications. Thus, there are numerous analytical advantages to a switching state-space modeling approach to detect and extract spindles. [0262] Similar applications can be widely found across neuroscience: burst-suppression induced by anesthetic drugs, epileptic bursts, event-related potentials during cognitive tasks, spike train recordings, or simply behavioral responses over a block of trials. Switching state-space models provide a statistical framework to investigate temporal variations and extract distinct states that are ubiquitous in different datasets. Thus, we advocate for a broad adoption
of switching methods in place of previous windowing techniques that limit the analysis of neural signals.
[0263] Materials and methods
[0264] Definitions
[0265] We denote a sequence of values or vectors defined for time points t = 1, …, T by {x_t}. We denote the matrix trace operator by tr(·). By N(μ, Σ) we mean the Gaussian distribution with mean μ and covariance Σ.
[0266] Real- and discrete-valued state-space models
[0267] A state-space model (SSM) defines a probability density of real-valued time series data, {y_t}, by relating them to a sequence of hidden state vectors, {x_t}, which follow the Markov independence property, so that the joint probability for the state sequence and observations can be factored as:

p({x_t}, {y_t}) = p(x_1) ∏_{t=2}^{T} p(x_t | x_{t-1}) ∏_{t=1}^{T} p(y_t | x_t).

[0268] In its simplest form, the transition and output processes are linear and time-invariant, incorporating multivariate Gaussian uncertainties:

x_t = F x_{t-1} + w_t,  w_t ~ N(0, Q),
y_t = G x_t + v_t,  v_t ~ N(0, R),

where F, G are called the state-transition and observation matrices. Provided the initial state x_1 is Gaussian, this set of equations defines a joint Gaussian density on {x_t} and {y_t}; we use the term Gaussian SSM to denote a model of this form.
[0269] A closely related type of state-space model has hidden states, {s_t}, which are discrete-valued, i.e., multinomial variables that can take one of M values. Using the Markov independence property, the joint probability for the discrete-valued state sequence and observations can again be factored as above, with s_t taking the place of x_t.
[0270] The state-transition probabilities, p(s_t | s_{t-1}), are specified by an M × M state-transition matrix. If the observed data are also discrete symbols taking one of L values, the observation probabilities can be fully specified by an M × L observation matrix. This type of model is known as a hidden Markov model (HMM).
[0271] Next, we outline the various existing exact inference and learning algorithms associated with the Gaussian SSM and HMM, in order to define the terminology used throughout this paper. These algorithms will serve as the building blocks for the inference and learning computations of the switching state-space models described later.
[0272] State estimation
[0273] Given the observations and model parameters, inferring the hidden states of a linear Gaussian SSM with parameters {F, G, Q, R} has well-established solutions in the literature.
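As a concrete illustration of a linear Gaussian SSM, the two defining equations can be simulated directly (the damped-rotation dynamics and all parameter values below are illustrative assumptions, chosen to mimic a stochastic 10 Hz oscillation):

```python
import numpy as np

rng = np.random.default_rng(1)
T, fs = 200, 100.0

# State-transition matrix F: a damped 2-D rotation, i.e., a stochastic
# oscillator at 10 Hz sampled at fs Hz (illustrative choice of dynamics).
theta = 2 * np.pi * 10.0 / fs
F = 0.98 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
G = np.array([[1.0, 0.0]])   # observation matrix: observe first coordinate
Q = 0.1 * np.eye(2)          # state noise covariance
R = np.array([[0.5]])        # observation noise covariance

# Sample x_t = F x_{t-1} + w_t and y_t = G x_t + v_t
x = np.zeros((T, 2))
y = np.zeros(T)
for t in range(1, T):
    x[t] = F @ x[t - 1] + rng.multivariate_normal(np.zeros(2), Q)
    y[t] = (G @ x[t])[0] + rng.normal(0, np.sqrt(R[0, 0]))
```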
[0274] For the Gaussian SSM, the joint distribution of the observations and hidden states is Gaussian, so inference on the hidden states also induces Gaussian posteriors. Different application scenarios call for three kinds of inference problems: filtering, smoothing, and prediction. Filtering deals with computing the posterior distribution p(x_t | y_1, …, y_t) of the current hidden state, x_t, conditioned on the observations up to time t. The recursive algorithm that carries out this computation is known as the Kalman-Bucy filter. The smoothing problem addresses finding the posterior probabilities p(x_t | y_1, …, y_T) of the hidden states x_t given also future observations, i.e., up to time T. A similar recursive algorithm running backward from T to 1 implements the computation. This backward recursion, combined with the Kalman filter running forward recursions from time 1 to T, is called the Rauch-Tung-Striebel (RTS) smoother. Finally, prediction computes the posterior predictive distribution p(x_{t+τ} | y_1, …, y_t) of the future hidden states, x_{t+τ}, conditioned on observations up to time t, as well as the predictive distribution of future observations given the observation matrix that relates future hidden states to observations.
[0275] For the HMM, there are two similar inference problems of interest. First, the recursive forward-backward algorithm computes the posterior probabilities of the discrete hidden states given observations from time 1 to T. As the name suggests, the forward pass computation steps are analogous to the Kalman filter, while the backward pass steps are analogous to the RTS smoother. The second form of inference deals with decoding the most likely sequence of hidden states that could have generated the observations. A well-known solution is given by the Viterbi algorithm, which relies on similar forward and backward passes through all time points.
[0276] System identification
[0277] The problem of finding the model parameters
is known as system identification in the engineering literature. In the most general form, these problems assume that only the observed data sequence is accessible. Given the probabilistic nature of the models, one can choose to either seek a single locally optimal point estimate of the parameters (maximum likelihood (ML) or maximum a posteriori (MAP) learning) or follow a Bayesian approach to compute or approximate posterior distributions of the model parameters given the data. In the system identification context, estimates from the former category (i.e., ML or MAP learning) are more popular and make use of the EM algorithm. The EM algorithm alternates between optimizing the posterior distribution of the hidden states given current parameter estimates (the E-step) and updating the parameter estimates from the optimized posterior distribution of the hidden states (the M-step). This general procedure is guaranteed to increase the log-likelihood of the observed data sequence w.r.t. the model parameters. [0278] For the Gaussian SSM, the E-step is realized by the RTS smoother, and the M-step takes the form of a linear regression problem. For M-steps with ML estimates, the linear regression problem remains unconstrained. On the other hand, for M-steps with MAP estimates, priors on the model parameters put constraints on the log-likelihood. Update equations can be derived in closed form after taking derivatives with respect to each of the model parameters. [0279] In the case of HMM parameters, the E-step utilizes the forward-backward algorithm to infer the posterior probabilities of the hidden states (Rabiner 1989). The M-step uses the expected counts of the discrete-valued state transitions and observations to update the state-transition and observation matrices with ML or MAP estimates. This procedure, also known as the Baum-Welch algorithm, predates the EM algorithm above. 
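The filtering and smoothing recursions that serve as E-step building blocks can be sketched as follows (a minimal NumPy implementation of the standard Kalman filter and RTS smoother, not the authors' code; variable names are illustrative):

```python
import numpy as np

def kalman_filter(y, F, G, Q, R, x0, P0):
    """Forward pass: filtered/predicted moments and the log-likelihood."""
    T = len(y)
    n = F.shape[0]
    xf = np.zeros((T, n)); Pf = np.zeros((T, n, n))
    xp = np.zeros((T, n)); Pp = np.zeros((T, n, n))
    ll = 0.0
    x, P = x0, P0
    for t in range(T):
        # One-step prediction
        x = F @ x
        P = F @ P @ F.T + Q
        xp[t], Pp[t] = x, P
        # Update with the innovation
        S = G @ P @ G.T + R
        K = P @ G.T @ np.linalg.inv(S)
        e = y[t] - G @ x
        ll += -0.5 * (np.log(np.linalg.det(2 * np.pi * S))
                      + e @ np.linalg.solve(S, e))
        x = x + K @ e
        P = P - K @ G @ P
        xf[t], Pf[t] = x, P
    return xf, Pf, xp, Pp, ll

def rts_smoother(xf, Pf, xp, Pp, F):
    """Backward pass: smoothed moments (Rauch-Tung-Striebel)."""
    T, n = xf.shape
    xs = xf.copy(); Ps = Pf.copy()
    for t in range(T - 2, -1, -1):
        J = Pf[t] @ F.T @ np.linalg.inv(Pp[t + 1])
        xs[t] = xf[t] + J @ (xs[t + 1] - xp[t + 1])
        Ps[t] = Pf[t] + J @ (Ps[t + 1] - Pp[t + 1]) @ J.T
    return xs, Ps
```

The smoothed covariances are never larger than the filtered ones, since the backward pass folds in future observations.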
[0280] Switching state-space models
[0281] We employ one particular form of switching state-space models that allows time-varying dynamics among arbitrary real-valued state-space models with stationary parameters. In addition to its more general formulation compared to modeling time-varying parameters within a single model, this construction offers an elegant solution under the variational Bayesian framework, as detailed later. The generative model consists of M linear Gaussian SSMs, indexed by numbers from 1 to M, which each contain continuous real-valued states, and one HMM whose states take on discrete integer values from 1 to M. Furthermore, the states within each of the Gaussian SSMs are assumed to evolve independently from the other models. The discrete HMM is likewise independent of the Gaussian SSMs and decides which one of the M state-space models is generating the current observation data point. The directed acyclic graph corresponding to this conditional independence relation is shown in FIG. 26.
[0282] We label the real-valued hidden states in the M Gaussian SSMs as {x_t^(m)} with m ∈ {1, …, M}, the discrete-valued hidden states of the HMM as {s_t} where s_t ∈ {1, …, M}, and the real-valued observations as {y_t}. The joint probability for the observations and hidden states therefore factors nicely as:

p({s_t}, {x_t^(1)}, …, {x_t^(M)}, {y_t}) = p({s_t}) ∏_{m=1}^{M} p({x_t^(m)}) ∏_{t=1}^{T} p(y_t | x_t^(1), …, x_t^(M), s_t),

where all three parts have the familiar forms from the classical Gaussian SSM and HMM literature. We provide each of these expressions below.
[0283] First of all, conditioned on the discrete hidden state, s_t = m, the observation is simply a multivariate Gaussian variable from the observation equation of the m-th Gaussian SSM. The probability distribution of the observation vector is given by:

p(y_t | x_t^(m), s_t = m) = N(G^(m) x_t^(m), R),

where R is the observation noise covariance matrix, and G^(m) is the observation matrix of the linear Gaussian SSM indexed by m.
[0284] Secondly, the real-valued states of the M Gaussian SSMs evolve independently and in parallel, with dynamics specified by the model-specific state-transition matrix, F^(m), and state noise covariance, Q^(m), starting from different initial states, x_1^(m):

x_t^(m) = F^(m) x_{t-1}^(m) + w_t^(m),  w_t^(m) ~ N(0, Q^(m)).

[0285] Lastly, the discrete-valued switching state evolves according to the HMM specified by the M × M state-transition matrix A and initial state probabilities π, independent of the M Gaussian SSMs with real-valued states.
[0286] Intractable posterior of hidden states
[0287] With the generative model defined, the state estimation and system identification problems need to be solved. As reviewed above, both problems are encompassed in the EM algorithm, which alternates between 1) finding the posterior distribution of the hidden states given the current values of the model parameters, and 2) optimizing the model parameters given expectations of the hidden states under the posterior distribution from (1). However, the hidden state posterior here cannot be computed explicitly. To illustrate this difficulty, we start with the complete log-likelihood of observations and hidden states, written in terms of the parameters of the distributions that fully characterize the switching state-space generative model in FIG. 26. Note that the HMM process does not have a separate observation probability parameter. Once the switching state is fixed to s_t = m, the observation equation of the corresponding m-th Gaussian SSM specifies the likelihood of generating the observation at that time point.
[0288] State estimation, i.e., the E-step, is computationally difficult because of the products between the discrete switching states of the HMM and the real-valued states of the Gaussian SSMs. This conditional dependence between the hidden states (the 'two' causes) when conditioning on the observations (the 'common' effect) has been described in the causal inference literature. Graphically, conditioning on the children of v-structures results in a graph with the parents connected by undirected edges under the new conditional (in)dependence relation (FIG. 27A). The new edges introduced make it challenging to express the exact hidden state posterior in a factorized form. In particular, the individual Markov independence property within each Gaussian SSM is no longer present, since conditioning on x_{t-1}^(m) alone does not make x_t^(m) conditionally independent from the history, due to the new viable paths through the other Gaussian SSMs. Exact inference can then only be carried out by considering the combinations of states across all Gaussian SSMs and the HMM. For example, the optimal filtering problem alone requires a bank of elemental estimators whose size increases exponentially with time, so the smoothing problem, i.e., state estimation of the full posterior, naturally becomes intractable. Without the hidden state posterior distribution, system identification, i.e., parameter learning, cannot be performed either under the classical EM algorithm.
[0289] Variational approximation of hidden state posterior
[0290] One possible solution is to use the variational Bayes technique to approximate the
true posterior of the hidden states. The idea is to work within a subspace of tractable posterior probability distributions q defined over the hidden states, and to choose the optimal approximation based on a lower bound on the marginal log-likelihood of observations, known as the negative variational free energy. Since the distribution q approximates the true posterior, the conditioning on {y_t} is already implied and is therefore omitted in all q notations.
[0291] The choice of q equal to the true posterior maximizes the negative free energy so that it reaches the true log-likelihood. However, since the exact E-step is intractable due to the structure of the posterior distribution, we consider the following family of distributions that factor over the HMM and the Gaussian SSMs, i.e.,

q({s_t}, {x_t^(1)}, …, {x_t^(M)}) = q({s_t}) ∏_{m=1}^{M} q({x_t^(m)}).

[0292] In other words, we remove the dependence between the switching state and the real-valued states by limiting ourselves to distributions within this functional subspace. The graphical model corresponding to this conditional independence relation is shown in FIG. 27B. It only remains to find the optimal approximate posterior that is closest to the true posterior. Since the negative free energy is a lower bound on the marginal log-likelihood, we use the negative free energy as a measure of one-sided closeness and choose the optimal approximation accordingly.
[0293] We proceed to a direct derivation of the optimal functional forms of the approximate posterior. Within the functional subspace of factorized distributions, the negative free energy can be simplified.
[0294] Following the standard variational development, the optimal functional forms for the variational log-posteriors that maximize the negative free energy are obtained up to normalizing constants, so that the posteriors sum or integrate to 1.
[0295] Substituting these expressions reveals the form of the variational log-posterior of the switching state.
[0297] This form is identical to the log-posterior density of the discrete-valued states in an HMM with observation probabilities proportional to exp(q_t^(m)). In other words, q_t^(m) functions as the model log-evidence of the m-th state-space model in generating the data at time t in the HMM process. Thus, the desired posterior distribution of the switching state at individual time points can be expressed in terms of the forward-backward variables from the forward-backward algorithm and computed efficiently.
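The forward-backward computation on the model evidences can be sketched as follows, treating exp(q_t^(m)) as unnormalized HMM observation probabilities (a generic scaled implementation; variable names are illustrative):

```python
import numpy as np

def forward_backward(log_ev, A, pi):
    """Posterior switching-state probabilities h[t, m] from per-model
    log-evidences log_ev[t, m], HMM transition matrix A, and initial
    probabilities pi."""
    T, M = log_ev.shape
    # Shift by the per-time maximum for numerical stability
    b = np.exp(log_ev - log_ev.max(axis=1, keepdims=True))
    alpha = np.zeros((T, M)); beta = np.ones((T, M))
    c = np.zeros(T)                      # per-step normalizers
    alpha[0] = pi * b[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):                # forward pass
        alpha[t] = (alpha[t - 1] @ A) * b[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):       # backward pass
        beta[t] = (A @ (b[t + 1] * beta[t + 1])) / c[t + 1]
    h = alpha * beta
    return h / h.sum(axis=1, keepdims=True)
```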
[0298] Similarly, we obtain the variational log-posterior of the real-valued states in the Gaussian SSMs, which factorizes over the M parallel state-space models:

q({x_t^(1)}, …, {x_t^(M)}) = ∏_{m=1}^{M} q({x_t^(m)}).

[0299] Each of the log-posteriors within this product takes the familiar Gaussian density form and can be computed using a forward pass through the Kalman filter, followed by a backward pass through the RTS smoother, to obtain the posterior mean and covariance matrices.
[0300] Kalman smoothing here should be modified to use R / h_t^(m) as the observation noise covariance. Scaling the observation noise leads to an intuitive interpretation: when h_t^(m) → 0, resulting in high effective observation noise, the hidden state update that follows the one-step prediction during Kalman filtering is effectively skipped for that time point; when h_t^(m) → 1, the hidden state is updated as in the recursions without switching. In this sense, h_t^(m) represents the model responsibility of the m-th state-space model in generating the observed data at time point t.
[0301] Fixed-point iterations
[0302] The variables q_t^(m) and h_t^(m) are the bridges between the distributions q({s_t}) and q({x_t^(m)}), replacing the stochastic dependencies with deterministic quantities. The variational log-posterior of the discrete switching states {s_t} involves summary statistics of the real-valued states under the approximate distribution q, and vice versa. This motivates an iterative procedure to update the forward-backward variables and the posterior means and covariances in a cyclic fashion to find the optimal variational parameters:
1. Run the forward-backward algorithm to update the forward-backward variables.
2. Compute h_t^(m) from the updated forward-backward variables.
3. Run the Kalman filter and RTS smoother to update the posterior means and covariances.
4. Compute q_t^(m) from the updated posterior estimates.
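The four steps can be sketched end-to-end for univariate candidate models (a simplified, self-contained illustration: the evidence computation below uses the expected log-likelihood under the smoothed posterior, and the responsibility floor is an illustrative numerical safeguard, not a detail specified here):

```python
import numpy as np

def smooth_1d(y, a, q, r_t):
    """Scalar Kalman filter + RTS smoother with time-varying observation
    noise r_t (the variational E-step uses r / h_t for each model m)."""
    T = len(y)
    xf = np.zeros(T); pf = np.zeros(T); xp = np.zeros(T); pp = np.zeros(T)
    x, p = 0.0, 1.0
    for t in range(T):
        x, p = a * x, a * a * p + q            # one-step prediction
        xp[t], pp[t] = x, p
        k = p / (p + r_t[t])                   # Kalman gain
        x, p = x + k * (y[t] - x), (1 - k) * p
        xf[t], pf[t] = x, p
    xs, ps = xf.copy(), pf.copy()
    for t in range(T - 2, -1, -1):             # RTS backward pass
        j = pf[t] * a / pp[t + 1]
        xs[t] = xf[t] + j * (xs[t + 1] - xp[t + 1])
        ps[t] = pf[t] + j * j * (ps[t + 1] - pp[t + 1])
    return xs, ps

def forward_backward(logq, A, pi):
    """Switching-state responsibilities h[t, m] from model log-evidences."""
    T, M = logq.shape
    b = np.exp(logq - logq.max(axis=1, keepdims=True))
    alpha = np.zeros((T, M)); beta = np.ones((T, M)); c = np.zeros(T)
    alpha[0] = pi * b[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * b[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (b[t + 1] * beta[t + 1])) / c[t + 1]
    h = alpha * beta
    return h / h.sum(axis=1, keepdims=True)

def fixed_point_iterations(y, models, r, A, pi, n_iter=10, floor=1e-6):
    """Alternate the two updates: forward-backward -> h, Kalman/RTS -> q."""
    T, M = len(y), len(models)
    h = np.full((T, M), 1.0 / M)               # equal initial responsibilities
    for _ in range(n_iter):
        logq = np.zeros((T, M))
        for m, (a, qv) in enumerate(models):
            # Steps 3-4: smooth with observation noise scaled by 1/h,
            # then form the model evidence from the smoothed posterior
            xs, ps = smooth_1d(y, a, qv, r / np.maximum(h[:, m], floor))
            logq[:, m] = (-0.5 * ((y - xs) ** 2 + ps) / r
                          - 0.5 * np.log(2 * np.pi * r))
        # Steps 1-2: forward-backward on the evidences updates h
        h = forward_backward(logq, A, pi)
    return h
```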
[0303] During these iterations, the negative variational free energy can be computed using log-likelihood values from the Kalman filter and the forward-backward algorithm. We stop the fixed-point iterations when the model evidence and model responsibility, q_t^(m) and h_t^(m), have stabilized or when the negative free energy stops increasing.
[0304] Under the assumed generative structure in FIG. 26, the true posterior is expected to contain polarized values (e.g., with model probabilities close to 1 or 0) due to a few factors. First, each transition between candidate models introduces a discontinuity in the observation sequence at the switch point, as the realization trajectories from different SSMs are distinct
from each other. Second, since the SSM hidden state trajectories consist of multiple time points, they reside in a high-dimensional space. The dimensionality further increases as the SSM state dimension increases. Because of this high dimensionality, the separation between any two candidate models becomes very substantial. Third, real recordings often have infrequent transitions between distinct dynamics. This empirical skewness leads to a high probability of staying within the same candidate model. Such probability propagates as a strong prior across time points, amplifying the effect of the other factors in polarizing the values of the posterior estimates.
[0305] Correspondingly, the fixed-point iterations can polarize the values of h_t^(m) through the reciprocal interdependence between q_t^(m) and h_t^(m). The observation noise covariance during Kalman smoothing is scaled by the model responsibility as R / h_t^(m) to produce the model log-evidence. This log-evidence is then used by the forward-backward algorithm to compute the model responsibility for the next pass of fixed-point iterations. Thus, this cycle can amplify the h_t^(m) for the best candidate model toward 1 while pushing the others close to 0, effectively assigning each time point to one of the candidate models instead of maintaining a mixture of models. This empirical behavior of the fixed-point iterations appears similar to automatic relevance determination (Wipf and Nagarajan 2007), with the model evidences q_t^(m) treated as features weighted by h_t^(m) at each time point. However, the log-likelihood expressions, and therefore the pruning mechanisms of the respective solutions, are different.
[0306] It should be emphasized that the negative variational free energy is not jointly concave w.r.t. the values of q_t^(m) and h_t^(m). In fact, the free energy landscape contains abundant local maxima, and the fixed-point iterations stop updating whenever a local maximum is reached. Empirically, the fixed-point iterations indeed recover polarized h_t^(m) even when the segmentation accuracy is low, i.e., when sub-optimal local maxima are reached. This behavior makes proper initialization the most important factor in the performance of the current variational inference approach. We address this technical challenge below in the section Initialization of fixed-point iterations.
[0307] Parameter learning using generalized EM algorithm
[0308] Once the approximation of the hidden state posterior can be computed efficiently, one can employ an EM-like algorithm to learn the model parameters of the Gaussian SSMs and HMM. To this end, starting from an initial guess of model parameters, we first repeat the
fixed-point iterations until convergence to find the optimal variational parameters, which parameterize the approximate hidden state
posterior (E-step). Next, we update the model parameters of the individual Gaussian SSMs and HMM to maximize the negative variational free energy using the converged variational parameters (M-step). Iterating between these two steps constitutes an instance of the generalized EM algorithm, where, in lieu of the data log-likelihood, a lower bound on the log-likelihood is maximized over successive iterations. It should be noted that, while the derivations differ, the overall form of this algorithm is the same as in Ghahramani and Hinton (2000). A flowchart of this variational learning algorithm is shown in FIG. 18.
[0309] In the following section, we derive closed-form update equations for the model parameters of the Gaussian SSMs and HMM, given the approximate hidden state posterior. We note that these M-step equations can be derived from the complete log-likelihood expression under the expectation w.r.t. the variational approximate distribution q, because only the first term depends on the model parameters. It is therefore sufficient to maximize this term, yielding an optimization problem analogous to the exact M-step but taken over q.
[0310] Specifically, at each EM iteration, the converged variational parameters are used to form the expectation of the complete log-likelihood.
[0311] The optimal parameters that maximize this expected complete log-likelihood can be obtained in closed form by taking individual partial derivatives. The state-dynamics parameters can be updated for each Gaussian SSM through the usual equations described in Shumway and Stoffer (1982), with a slight exception for the observation noise covariance due to its product with the switching state.
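For a scalar AR(1) SSM, the corresponding M-step updates from smoothed moments can be sketched as follows (the lag-one cross-moment uses the standard RTS smoother-gain identity; function and variable names are illustrative):

```python
import numpy as np

def m_step_ar1(xs, ps, j):
    """Closed-form ML updates for a scalar AR(1) SSM (Shumway-Stoffer
    style), given smoothed means xs, smoothed variances ps, and RTS
    smoother gains j produced by the E-step."""
    T = len(xs)
    # Smoothed second moments over pairs (x_{t-1}, x_t)
    m11 = xs[1:] ** 2 + ps[1:]                 # E[x_t^2]
    m00 = xs[:-1] ** 2 + ps[:-1]               # E[x_{t-1}^2]
    m01 = xs[1:] * xs[:-1] + j[:-1] * ps[1:]   # E[x_t x_{t-1}]
    # Regression of x_t on x_{t-1} under the smoothed posterior
    a_new = m01.sum() / m00.sum()
    # Residual state-noise variance (simplified using the a_new update)
    q_new = (m11.sum() - a_new * m01.sum()) / (T - 1)
    return a_new, q_new
```

With perfectly observed states (zero smoothed variance), this reduces to ordinary least-squares estimation of the AR coefficient.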
[0312] We note that since the M Gaussian SSMs have independent dynamics, these updates can be completed efficiently in parallel.
[0313] Similarly, the usual update equations for an HMM can be used to update the state-transition matrix A and the initial state probabilities π.
[0314] As noted earlier, the observation probabilities of the HMM are not explicitly updated as a parameter, since the variational model evidence, q_t^(m), is converged through the fixed-point iterations and used as a point estimate of the (log) observation probability to relate the hidden states of the Gaussian SSMs to the observation at each time point.
[0315] Finally, the update equation for R pools the posterior estimates of hidden states across the M Gaussian SSMs, where each term captures the contribution from the m-th model.
[0316] This update equation for R is an instance of joint estimation of parameters shared among the Gaussian SSMs. In the more general case where each state-space model has its individual observation noise covariance R^(m), the update takes an analogous model-specific form.
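The pooled update for a shared scalar observation noise can be sketched as follows (residual second moments of each model, weighted by its responsibility and averaged over time; the exact moment terms follow the Gaussian SSM regression updates, and all names are illustrative):

```python
import numpy as np

def update_shared_r(y, xs, ps, g, h):
    """Pooled scalar observation-noise update. For each model m,
    accumulate the expected squared residual of y under the smoothed
    posterior, weighted by the responsibility h[t, m], then average
    over the T time points.
    xs[m], ps[m]: smoothed means/variances; g[m]: fixed observation gain.
    """
    T, M = h.shape
    r_new = 0.0
    for m in range(M):
        # E[(y_t - g x_t)^2] = (y_t - g xs_t)^2 + g^2 ps_t
        resid2 = (y - g[m] * xs[m]) ** 2 + g[m] ** 2 * ps[m]
        r_new += np.sum(h[:, m] * resid2)
    return r_new / T
```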
[0317] It is also possible for other Gaussian SSM parameters to be partially shared among different models. Closed-form update equations in these cases can also be derived. This extension exploits the flexibility of the variational Bayesian method to accommodate different generative models.
[0318] Initialization of fixed-point iterations
[0319] As shown above, the M-step equations provide optimal updates of the parameters in a convex optimization setting, given the multivariate Gaussian distribution q and the posterior hidden state estimates. In contrast, the fixed-point iterations in the E-step are not guaranteed to achieve the globally maximal negative free energy, due to the non-concavity of the negative free energy w.r.t. the variational parameters. This makes the final approximate distribution at the end of the fixed-point iterations very sensitive to initialization. Moreover, while the fixed-point iterations are defined explicitly step-by-step, it is not obvious how to initialize them. Thus, here we describe two practical initialization approaches.
[0320] Deterministic annealing.
[0321] This entropy-motivated initialization technique was proposed along with the variational framework to alleviate getting trapped in local maxima during the fixed-point iterations. Specifically, equal model responsibilities are used as the initial h_t^(m) at the onset of
the first E-step, followed by RTS smoothing and the forward-backward algorithm to compute the variational parameters, with definitions modified by a temperature parameter 𝒯.
[0322] The temperature 𝒯 cools over successive fixed-point iterations via the decay function 𝒯_{k+1} = (𝒯_k + 1)/2. In essence, a larger 𝒯 allows the iterations to explore a bigger subspace with high negative free energy entropy, which gradually decreases while the algorithm tries to identify an optimal approximate distribution. This initialization technique was necessary for the variational approximation to produce any reasonable results.
[0323] After the first E-step, h_t^(m) is not re-initialized with equal model responsibilities, to allow a warm start in subsequent iterations. However, this makes the algorithm severely limited by the results of the first round of fixed-point iterations. Under this initialization scheme, once the h_t^(m) for the m-th model gets close to zero, that model cannot regain responsibility for that time point in subsequent iterations. One may choose to reset the h_t^(m) to be equal across models at every E-step, but that tends to select only one of the Gaussian SSMs for the entire duration as a local maximum. Regardless, with the model parameters fixed during the E-step, potential local maxima are likely to be selected, since the annealing is equivalent across the M Gaussian SSMs. Also, initializing h_t^(m) with equal responsibilities assumes all models explain all time points equally well, which is certainly far from the true posterior. Thus, a better initialization should discern among the Gaussian SSMs and try to initialize closer to the global maximum.
[0324] Interpolated densities.
[0325] Here we propose a different initialization technique that statistically compares the M Gaussian SSMs for their likelihoods of generating the observation at each time point. Specifically, we initialize q_t^(m) using the interpolated density, i.e., the probability density of the observation at a given time point conditioned on all other time points under the m-th model.
[0326] In other words, we attempt to initialize based on how well each Gaussian SSM predicts the current observation, y_t, from all the past and future observations. We expect this informative initialization of q_t^(m) to be close to the global maximum in general. Since conventional Kalman filtering and RTS smoothing cannot compute the interpolated density easily, we utilize a different implementation.
[0327] This new initialization technique is well-grounded in the view of q_t^(m) as the log-evidence of generating the observation at each time point within the HMM of a discrete-valued state s_t = m, with m ∈ {1, …, M}. In the absence of known Gaussian SSM states, the next best choice is to evaluate which model dynamic provides the closest interpolation to the current observation from all other time points. It can also be seen as a "smoothing" extension of using filtered densities in place of the HMM observation probabilities in the early switching state-space model inference literature.
[0328] As extensively analyzed in the Segmentation with posterior inference and Segmentation with parameter learning sections in Results, variational inference and learning with interpolated densities substantially improve over deterministic annealing and offer greater segmentation accuracy compared to the other switching inference methods. However, unlike the definition of q_t^(m), which is comparable across arbitrary Gaussian SSMs due to the identical R, interpolated densities are bona fide normalized probability density functions. Therefore, the model-specific parameters (e.g., Q^(m)), and especially the hidden state dimensionality, could bias the interpolated densities during the initialization. A simple heuristic is suggested to address this bias, if present, and shows robust performance in the spindle detection problem.
[0329] Sleep spindle detection application
[0330] To demonstrate the utility of switching state-space models in neural signal processing, we analyzed EEG recorded during overnight sleep from a healthy young adult. The sleep EEG data were collected at the Massachusetts General Hospital Sleep Center after obtaining written informed consent from the subject, and the study protocol was approved by the Massachusetts General Hospital Human Research Committee. The recording was acquired using the 128-channel eego mylab EEG amplifier system (ANT Neuro, Enschede, The Netherlands) at a sampling rate of 1 kHz. Ag/AgCl electrodes were arranged in an equidistant Waveguard montage (ANT Neuro). The ground and online reference electrodes were placed at the left mastoid and a central electrode Z3, respectively. Sleep staging was scored by a trained polysomnography technician following AASM guidelines.
[0331] EEG data segments were extracted from the periods scored as non-rapid eye movement (NREM) stage 2 sleep, re-referenced to the common average reference, and then downsampled to 100 Hz. We analyzed single-channel segments from a left parietal electrode
LA2 (analogous to C3 in the International 10-20 system). EEG spectrograms were computed using the multitaper method with 1 s window length and 95 % overlap between adjacent windows (3 discrete prolate spheroidal sequences tapers, corresponding to a time-half- bandwidth product of 2, and 2
^^ minimum number of fast Fourier transform samples) after constant detrending within the sliding window. [0332] Initialization of Gaussian SSM parameters [0333] As mentioned in the section in Results, we modeled slow oscillations ( ^^) and sleep spindles (ς) observed in EEG recordings during NREM stage 2 sleep using Gaussian SSMs of the following form:
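The equation referenced here appears as an image in the original filing; a hedged reconstruction of the standard state-space oscillator form used in this line of work is given below, where the damping factor a^(j), rotation frequency ω^(j), and noise symbols are our notational assumptions:

```latex
x_t^{(j)} = a^{(j)}
\begin{bmatrix}
\cos\omega^{(j)} & -\sin\omega^{(j)}\\
\sin\omega^{(j)} & \cos\omega^{(j)}
\end{bmatrix}
x_{t-1}^{(j)} + w_t^{(j)},
\qquad w_t^{(j)} \sim \mathcal{N}\!\big(0,\ \sigma_{(j)}^{2} I_2\big),
\qquad
y_t = \begin{bmatrix} 1 & 0 \end{bmatrix} x_t^{(j)} + v_t,
\qquad v_t \sim \mathcal{N}(0, R),
```

with j indexing the slow and spindle oscillators, 0 < a^(j) < 1 a damping factor, and ω^(j) the rotation per sample set by the oscillation's center frequency.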
[0334] The Gaussian SSM parameters and the observation noise variance R need to be initialized in order to learn the optimal model parameters using the generalized EM algorithm. We accomplished this by fitting two oscillators (one in the slow frequency range, the other in the spindle frequency range) to the EEG time series, assuming that both oscillations are present for the entire duration. This fitting was done using a standard EM algorithm with the parameters initialized based on prior knowledge of the typical frequencies of these sleep oscillations.
[0335] Initial states were taken as zero-mean white noise with a variance of 3 and were not updated. We ran the EM algorithm for 50 iterations and used the resultant parameters as initial guesses for the Gaussian SSMs in switching state-space models. For the switching inference algorithms that do not have mechanisms to update model parameters, the parameters obtained after the 50 EM iterations were used directly to infer segmentation. [0336] Priors for MAP estimates [0337] Parameters in state-space models of the form above can be subjected to prior distributions to yield MAP instead of ML estimation. We followed Amanda M. Beck et al. (2022) to impose priors on the rotation frequency, ω, and the state- and observation-noise variances, σ² and R. We used these MAP estimates throughout all the M-steps that involve updating the Gaussian SSM parameters and the observation noise variance.
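The frequency-based initialization described above can be sketched in a few lines of Python: the typical center frequency of each oscillation is converted into a rotation per sample and combined with a damping factor to form the 2 × 2 transition matrix. The specific center frequencies and damping value below are illustrative assumptions, not the filing's exact initial values.

```python
import numpy as np

def init_oscillator(f_hz, fs, damping=0.98):
    """Build the 2x2 damped-rotation transition matrix for a state-space
    oscillator centered at f_hz (Hz), sampled at fs (Hz)."""
    w = 2.0 * np.pi * f_hz / fs            # rotation per sample, in radians
    rot = np.array([[np.cos(w), -np.sin(w)],
                    [np.sin(w),  np.cos(w)]])
    return damping * rot

fs = 100.0                                 # Hz, after downsampling
F_slow    = init_oscillator(1.0,  fs)      # assumed slow-oscillation center
F_spindle = init_oscillator(13.5, fs)      # assumed spindle-band center
# Eigenvalues lie on a circle of radius `damping` (< 1): a stable oscillator.
print(np.abs(np.linalg.eigvals(F_spindle)))
```

Because the transition matrix is a scaled rotation, its complex-conjugate eigenvalue pair has magnitude equal to the damping factor, which guarantees a stable, narrowband oscillation at the chosen center frequency.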
[0338] Multilevel SSMs for High-Precision ERP Analysis [0339] Event-related potentials (ERPs) provide a non-invasive method to study psychophysiological correlates of sensory and cognitive processes, with components that are informative of the course of sensory ('exogenous') and cognitive ('endogenous') processes at millisecond temporal resolution. ERPs are tiny, ~1 μV signals that are embedded in background spontaneous oscillations that may be 10 to 100 times larger. Classically, combining a large number of single-trial waveforms into an averaged ERP waveform is considered the "gold standard" for averaging out event-independent activities. However, it is often difficult to obtain enough trials to extract these time-locked elicited responses, since a large number of trials is required to effectively average or cancel out the background of persistent oscillations. Also, excessive averaging could easily obscure fine structures of the ERPs, since intrinsic factors like mental workload, fatigue, and attention are known to modulate time-locked responses. To reduce this reliance on a large number of trials, we propose to explicitly model the background oscillations using a novel linear state-space model, so that the persistent, spontaneous activities can be effectively removed. [0340] To this end, we model the human brain as a linear time-invariant system, consistent with the classical ERP literature, so the stimulus-evoked response manifested in the EEG can be written as a convolution between the ERP waveform, {c_τ}, and an impulse train pertaining to the discrete events of stimulus presentation, u_t:
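The convolution referenced above appears as an image in the original filing; a hedged reconstruction is given below, where L (the assumed ERP support length) and the symbols are our notational choices:

```latex
y^{\mathrm{ERP}}_t = (c * u)_t = \sum_{\tau = 0}^{L-1} c_\tau\, u_{t-\tau},
\qquad u_t \in \{0, 1\},
```

with u_t = 1 exactly at the stimulus-onset samples and 0 elsewhere, so each stimulus event injects a copy of the ERP waveform {c_τ} into the recording.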
[0341] But unlike traditional analyses, we model the spontaneous, stimulus-agnostic activities in the EEG as a superposition of multiple oscillators, modeled after state-space oscillators, where the oscillators are assumed independent, f_s denotes the sampling frequency, and the damping factors satisfy a^(j) < 1. In other words, we explicitly impose oscillatory temporal dynamics on the background activity rather than considering it as white or otherwise unstructured noise. In summary, we use the oscillator models to work in tandem with the ERP model to explain the EEG recording, effectively removing the contamination coming from strong oscillations:
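Combining the background oscillators with the stimulus-locked convolution, the full observation model can be sketched as follows (a hedged reconstruction of the equation image in the original filing; J, L, and the symbols are our notational assumptions):

```latex
y_t \;=\; \underbrace{\sum_{j=1}^{J} \begin{bmatrix} 1 & 0 \end{bmatrix} x_t^{(j)}}_{\text{background oscillators}}
\;+\; \underbrace{\sum_{\tau=0}^{L-1} c_\tau\, u_{t-\tau}}_{\text{stimulus-locked ERP}}
\;+\; v_t,
\qquad v_t \sim \mathcal{N}(0, R).
```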
[0342] We further include a prior on the ERP waveform to ensure robust recovery and reduce the variability of the estimates. We chose to impose a temporal continuity constraint in the form of the following state-space model:
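One simple form such a temporal-continuity constraint can take is a first-order random walk over the ERP waveform samples (a hedged sketch; the equation image in the original filing may use a different, e.g. higher-order, difference):

```latex
c_{\tau+1} = c_\tau + \epsilon_\tau,
\qquad \epsilon_\tau \sim \mathcal{N}(0, \lambda^{2}),
\qquad c_0 \sim \mathcal{N}(0, \lambda^{2}),
```

which penalizes large sample-to-sample jumps in c_τ, with the smoothness parameter λ² controlling how strongly continuity is enforced.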
resulting in a multilevel state-space framework. We hereby refer to this ERP estimate as the state-space ERP, or SS-ERP for short. [0343] Learning Algorithm [0344] We learn the oscillator parameters, the observation noise variance, and the smoothness parameter of the continuity prior using an instance of the Expectation-Maximization (EM) algorithm. EM alternates between estimating the joint posterior distribution of the hidden oscillator states and the ERP waveform given the current parameters (the E-step) and updating the model parameters given the distribution of hidden states (the M-step). We modify the E-step by constraining the posterior distributions of the ERP waveform and the oscillation states to be independent:
![Figure imgf000084_0007](https://patentimages.storage.googleapis.com/a8/a0/19/c776c7cc326153/imgf000084_0007.png)
[0345] so that we can cyclically update the oscillator state and the ERP distribution. This approach is known as the variational Bayes approximation, and the modified algorithm is known as generalized EM. Specifically, the ERP update step resembles ridge regression, with the estimated oscillations removed from the EEG signal. It is noted here that, in contrast to the point estimates of the classical averaging technique, this framework provides the posterior distribution of the ERP. [0346] Alternatively, the model in (1) can be utilized with a different assumption: that the ERP admits the following expansion on a dictionary consisting of given basis functions. As a representative of such basis functions, we considered a Gabor dictionary whose atoms are given by Gaussian kernels of fixed variance, normalized to have a maximum of 1.
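A minimal Python sketch of such a dictionary, together with a ridge-regression-like MAP estimate of the expansion coefficients under a zero-mean Gaussian prior, is shown below. The dictionary size, kernel width, noise level, and the sparse "ERP" used for the demonstration are illustrative assumptions, not the filing's implementation.

```python
import numpy as np

def gaussian_dictionary(n_samples, n_atoms, width):
    """Dictionary whose atoms are Gaussian kernels of fixed variance,
    centered on an even grid and normalized to have maximum 1."""
    t = np.arange(n_samples)[:, None]
    centers = np.linspace(0, n_samples - 1, n_atoms)[None, :]
    G = np.exp(-0.5 * ((t - centers) / width) ** 2)
    return G / G.max(axis=0)                 # peak of each atom is exactly 1

def map_coefficients(G, y, noise_var, lam_diag):
    """Posterior mean of coefficients under prior N(0, diag(lam_diag)) and
    observation noise variance noise_var: a ridge-like closed form."""
    A = G.T @ G / noise_var + np.diag(1.0 / lam_diag)
    return np.linalg.solve(A, G.T @ y / noise_var)

rng = np.random.default_rng(1)
n, k = 200, 25
G = gaussian_dictionary(n, k, width=6.0)
w_true = np.zeros(k)
w_true[[8, 12]] = [1.0, -0.7]                # sparse "ERP" toy example
y = G @ w_true + rng.normal(0, 0.05, size=n)
w_hat = map_coefficients(G, y, noise_var=0.05 ** 2, lam_diag=np.full(k, 1.0))
print(np.abs(w_hat - w_true).max())
```

With a diagonal prior covariance, each coefficient is shrunk independently; learning that covariance (as in the M-step described below) is what allows irrelevant atoms to be pruned automatically.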
We again consider a zero-mean Gaussian prior on the expansion coefficients, with a diagonal covariance matrix Λ. We hereby refer to this ERP estimate as the Kernel Expansion ERP, or KE-ERP for short. [0347] Learning the ERP. Similar to SS-ERP, the oscillator parameters, the noise variance, and the expansion-coefficient prior covariance Λ are learned using an instance of the Expectation-Maximization (EM) algorithm. EM alternates between optimizing the distribution over the hidden oscillator states and the ERP waveform given the current parameters (the E-step) and updating the parameters given the distribution of hidden states (the M-step). The E-step is further simplified by constraining the posterior distributions of the ERP waveform expansion coefficients and the oscillation states to be statistically independent:
![Figure imgf000085_0004](https://patentimages.storage.googleapis.com/d0/42/c8/fc861216140b9b/imgf000085_0004.png)
[0348] so that we can cyclically update the oscillator state and the ERP distribution. The different model parameters can then be re-estimated in closed form in the M-step. This learning of Λ, in fact, drives the coefficients of irrelevant lags to zero, thereby performing automatic relevance determination (ARD) in a completely data-driven manner. [0349] Simulation Results [0350] FIGS. 28A-28B illustrate the SS-ERP extraction from simulated auditory tone responses contaminated by strong slow and alpha oscillations. FIG. 28A depicts how the oscillatory components are effectively removed to retrieve the auditory evoked responses by explicit modeling of the oscillatory activities. FIG. 28B compares the SS-ERP to the traditional average ERP alongside the ground truth for an increasing number of trials (N = 25, 50, 100). Clearly, the SS-ERP follows the ground truth very closely in all three cases, with the narrow confidence intervals deteriorating only slightly as N decreases. On the contrary, the average ERP is dominated by the large oscillatory components present in the background: meaningful results are only obtained after averaging 100 trials. This demonstrates that the SS-ERP is superior at capturing the true latencies and amplitudes of the ERP peaks from a small number of trials.
[0351] FIGS. 28C-28D illustrate the KE-ERP extraction from a similar example; they correspond to FIGS. 28A-28B, respectively. Clearly, the observations made for the SS-ERP also apply here. [0352] Thus, while significantly reducing the number of trials required, the proposed method will potentially allow tracking of short-term changes in ERPs due to various intrinsic and extrinsic factors, which would have significant implications for neuroscience studies and clinical applications.