SIGNAL PROCESSING METHOD USING A FINITE-DIMENSIONAL FILTER
The present invention relates to a signal processing method and a finite-dimensional filter for use in deriving parameters of a linear system.
A number of signal processing procedures involve attempts to determine a system model which can be used to define and predict a received signal of interest. A received signal or observed data may include additional components, such as Gaussian noise, which need to be taken into account when forming the system model. Signal processing filters are used to process the received data and provide estimates which can be used to derive the parameters of the system. Processing of the received data is executed on the basis of an iterative process to determine the parameters and a filter may be used to execute part of the process. Once the parameters have been determined or converged to an acceptable value after executing the iterative process, the system model can be used to determine the signal of interest with the additional components removed. This is applicable to a number of signal processing applications, including multi-sensor speech enhancement, high resolution source localisation in multi-sensor signal processing and linear predictive coding of speech. The process can also be applied to dynamic shock error models used in economic forecasting.
One iterative process, known as the expectation maximisation (EM) algorithm, is used to obtain maximum likelihood (ML) estimates of system parameters. ML parameter estimation for linear Gaussian models and other related time series models using the EM algorithm was studied in the 1980s, as discussed in R.H. Shumway and D.S. Stoffer, "An Approach to Time Series Smoothing and Forecasting using the EM Algorithm", J. Time Series Analysis, Vol. 3, No. 4, pp. 253-264, 1982 and D. Ghosh, "Maximum Likelihood Estimation of the Dynamic Shock-Error Model", Journal of Econometrics, Vol. 41, No. 1, pp. 121-143, May 1989, and developed more recently for signal processing applications, as discussed in E. Weinstein, A.V. Oppenheim, M. Feder and J.R. Buck, "Iterative and Sequential Algorithms for Multisensor Signal Enhancement", IEEE Trans. Signal Processing, Vol. 42, No. 4, pp. 846-859, April 1994 and V. Krishnamurthy, "On-line Estimation of Dynamic Shock-Error Models", IEEE Trans. Auto. Control, Vol. 35, No. 5, pp. 1129-1134, 1994. Each iteration of the process comprises two steps, the expectation (E-step) and the maximisation (M-step). The E-step for linear Gaussian models involves determining the following two conditional expectations based on all of the observed or received data: (i) the sum over time of the state of the signal of interest; (ii) the sum over time of the covariance of the state of the signal of interest.
From these two expectations or estimations, the maximisation step can be used to determine the ML estimates of the parameters of a system model for the signal of interest.
Determination of the expectations in the E-step has in the past involved, for linear Gaussian system models, approximating the sums over the signal distribution by fixed interval smoothing. Implemented in this manner the E-step is non-causal. To perform the fixed interval smoothing a Kalman smoother has been used, which requires processing the time series data obtained in one time direction and then also processing the data in the reverse direction. This is referred to as performing both forward and backward passes over the time series data, and during each iteration pass the state data obtained has to be stored for use in the next iteration until the E-step is completed. This significantly affects the memory requirements for executing the E-step, and the necessity to perform both a forward and backward pass using all of the time series data obtained reduces the speed of the process.
In accordance with the present invention there is provided a signal processing method for determining parameters of a linear Gaussian system to remove a Gaussian noise component, comprising: processing received data by a finite-dimensional filter to obtain expectation data; processing said expectation data to derive parameter data representative of said parameters.
The present invention also provides a finite-dimensional filter for processing received data to obtain state data for deriving parameter data of a linear Gaussian system associated with said received data.
Preferred embodiments of the present invention are hereinafter described, by way of example only, with reference to the accompanying drawings, wherein:
Figure 1 is a block diagram of a preferred embodiment of a system for executing an expectation and maximisation process;
Figure 2 is a block diagram of a finite dimensional filter for the expectation and maximisation process;
Figure 3 is a block diagram of another preferred embodiment of a system for executing an expectation and maximisation process;
Figure 4 is a block diagram of a two sensor speech enhancement system;
Figure 5 is a graph of parameter estimates against number of data passes;
Figure 6 is a block diagram of a system for localisation of narrowband sources;
Figure 7 is a block diagram of a chemical process plant;
Figure 8 is a graph of output data obtained from the plant against time; and
Figure 9 is a graph of parameter estimates for a model of a subsystem of the plant versus iterations of an expectation and maximisation process.
A linear Gaussian system model can be applied to time signals and time series data, such as uncoded speech, which includes a random component that can be considered to vary in accordance with a Gaussian distribution. The system model for such a signal x_{k+1}, for k = 0, 1, …, is given by the dynamics:
    x_{k+1} = A_{k+1} x_k + B_{k+1} w_{k+1}    (1)
Here x_k ∈ R^m and x_0 ∈ R^m is a Gaussian random variable with zero mean and covariance matrix B_0^2 ∈ R^{m×m}.
At time k+1, k = 0, 1, …, the noise in equation 1 is modelled by an independent Gaussian random variable with zero mean and covariance matrix B_{k+1}^2. It is known that such a Gaussian random variable can be represented as B_{k+1} w_{k+1}, where w_{k+1} is an m-vector of N(0, 1) random variables.
For k = 0, 1, …, the state x_k may be observed indirectly via the vector observations y_k, where
    y_k = C_k x_k + D_k v_k    (2)
Here for each k, y_k ∈ R^d and v_k is a vector of independent N(0, 1) random variables. We assume that D_k is a symmetric non-singular d × d matrix.
The vector observations y_k represent the received data or signal and constitute the information which is available on the state of the signal of interest x_k. For a system which includes a Gaussian noise component, the variable v_k and the parameter D_k represent that noise component, which needs to be determined so that the state x_k can be determined.
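As an illustration, the state and observation dynamics of equations 1 and 2 can be simulated directly. The sketch below uses scalar (m = d = 1) time-invariant parameters; the particular values of A, B, C, D and the horizon are illustrative assumptions, not values taken from the description:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative time-invariant scalar parameters (m = d = 1).
A, B, C, D = 0.9, 0.5, 1.0, 0.3
T = 200

x = np.zeros(T)                              # state x_k
y = np.zeros(T)                              # observations y_k
x[0] = B * rng.standard_normal()             # x_0 ~ N(0, B^2)
y[0] = C * x[0] + D * rng.standard_normal()
for k in range(T - 1):
    # Equation 1: x_{k+1} = A x_k + B w_{k+1}, with w ~ N(0, 1)
    x[k + 1] = A * x[k] + B * rng.standard_normal()
    # Equation 2: y_k = C x_k + D v_k, with v ~ N(0, 1)
    y[k + 1] = C * x[k + 1] + D * rng.standard_normal()
```

The task addressed by the invention is the inverse one: given only y, recover the parameters A, B, C, D and hence the state x.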
To execute the E-step, instead of using a Kalman smoother, a filter is used to derive the sum of the state data and the sum of the covariance of the state data. If we let e_i, e_j denote the unit vectors in R^m with 1 in the i-th and j-th position respectively, and let f_n be the unit vector in R^d with 1 in the n-th position, for i, j ∈ {1, …, m} and n ∈ {1, …, d}, we can define the scalar processes

    H_k^{(ij)} = Σ_{l=0}^{k} ⟨x_l, e_i⟩ ⟨x_l, e_j⟩,    J_k^{(in)} = Σ_{l=0}^{k} ⟨x_l, e_i⟩ ⟨y_l, f_n⟩    (3)

where ⟨·, ·⟩ denotes the inner product.
The H_k represent covariance sums and the J_k represent the sum of the state multiplied by the observed data y_k, hereinafter referred to as the state sum. The filter derives the conditional expectations
recursively, and the data produced by the filter is then used in the maximisation step of the EM process to derive or estimate the parameters A_k, B_k, C_k and D_k of the system, from which the state of the signal of interest x_k can be determined. The expectations are related to densities of the distribution of the state and can be determined from those densities. The accompanying Appendix A describes the relationship between the densities and the required expectations, and explains the derivation of filter equations for the expectations. The filter for determining the expectations is configured to process the observed data and other data as described by the equations.
The densities α_k, β_k^{(0)}, β_k^{(1)}, β_k^{(2)} and γ_k are defined by equation 11 in Section 4.1 of Appendix A. They can be determined by the integrals specified in equations 13 to 17, yet the integrals are not closed and normally require a fixed interval smoothing process, such as that executed by the Kalman smoother with all its inherent disadvantages, to provide estimates of the densities. Theorems 5.1, 5.2 and 5.3 however specify filter
equations for the densities, and Theorem 5.4 uses the relationship between the densities and the expectations to derive finite-dimensional filter equations 47 and 48 for the expectations. The coefficients of the filter equations a_k^{ij(M)}, b_k^{ij(M)} and d_k^{ij(M)} can be determined from equations 71 to 79 using the observed data and the data obtained from a Kalman filter, being μ_k, R_k and R_{k|k-1}, as recited in equation 70, where A_k, C_k and D_k are assigned initial parameter estimates. The filter equations 47, 48 and 71 to 79 each stipulate the configuration of a respective finite-dimensional filter which together form a finite-dimensional filter 14 to generate the expectation data.
The expectation data is then used in the EM process to obtain the parameters A, C, B and D of the system, as recited in equations 84 to 87 of Appendix A. The expectation data is obtained by retaining the summations recited in equations 47, 48 and 70 to 79 as a forward pass is made across the observed data, without requiring data to be retained at each discrete time interval for each iteration of the forward pass. Also no backward iteration is required to obtain the expectations.
The EM process executed using the new filters, as shown in Figure 1, involves the observed data y_k 10 being applied to a Kalman filter 12 and the finite-dimensional recursive filter 14. The Kalman filter 12 processes the observed data y_k 10 to generate conditional mean data and conditional covariance data of the state, μ_k and R_k. This is applied with the observed data y_k to the finite-dimensional recursive filter 14, which generates the conditional expectations in accordance with equations 47 and 48 of Appendix A. The expectations are then passed to an M-step process module 16 to generate estimates of the parameters in accordance with equations 84 to 87. The parameter estimates generated by the M-step process module 16 are also used by the Kalman filter 12 and the finite-dimensional filter 14 for further iterations. Although the finite-dimensional filter 14 normally executes a number of forward iterations, parameter estimates can be obtained using one forward or real time pass of the observed data.
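As a concrete sketch of the first stage of Figure 1, the standard scalar Kalman filter recursion below plays the role of module 12, producing the conditional mean μ_k and covariance R_k that are fed, together with y_k, into the finite-dimensional filter 14. The finite-dimensional filter and M-step equations (47, 48, 71 to 79 and 84 to 87 of Appendix A) are not reproduced here, and all numerical values are illustrative:

```python
import numpy as np

def kalman_filter(y, A, B2, C, D2, mu0=0.0, R0=1.0):
    """Standard scalar Kalman filter (the role of module 12): returns the
    conditional mean mu_k and covariance R_k of the state given y_0..y_k."""
    T = len(y)
    mu = np.zeros(T)
    R = np.zeros(T)
    mu_pred, R_pred = mu0, R0                        # predicted mean / covariance
    for k in range(T):
        S = C * R_pred * C + D2                      # innovation variance
        K = R_pred * C / S                           # Kalman gain
        mu[k] = mu_pred + K * (y[k] - C * mu_pred)   # measurement update
        R[k] = (1.0 - K * C) * R_pred
        mu_pred = A * mu[k]                          # time update for step k+1
        R_pred = A * R[k] * A + B2
    return mu, R

# Illustrative use on data generated from the model of equations 1 and 2.
rng = np.random.default_rng(1)
A, B2, C, D2 = 0.9, 0.25, 1.0, 0.25
x = np.zeros(500)
for k in range(499):
    x[k + 1] = A * x[k] + np.sqrt(B2) * rng.standard_normal()
y = C * x + np.sqrt(D2) * rng.standard_normal(500)
mu, R = kalman_filter(y, A, B2, C, D2)
```

In each EM iteration, μ_k and R_k would be passed to the finite-dimensional filter 14 and the resulting expectations to the M-step, which re-estimates A, B, C and D for the next iteration.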
The finite-dimensional filter 14 includes first, second and third coefficient modules 18, 20 and 22, and first, second and third delay units 24, 26 and 28 for the coefficient modules, respectively. The coefficient modules 18 to 22 receive the conditional mean and conditional covariance data of the state, μ_k and R_k, from the Kalman filter 12. The first coefficient module 18 generates and outputs updates of the coefficients a_{k+1}^{(M)} according to equation 71 based on the data received from the Kalman filter 12 and the coefficients a_k^{(M)}, b_k^{(M)}, d_k^{(M)} output from the delay units 24, 26 and 28, respectively. The second coefficient module 20 generates and outputs the coefficients b_{k+1}^{(M)} according to equations 72, 74 and 76 based on the inputs from the Kalman filter, the previous b_k^{(M)} provided by the delay unit 26 and the coefficient d_k^{(M)} output by the delay unit 28. The third coefficient module 22 generates and outputs the coefficients d_{k+1}^{(M)} according to equations 73, 75 and 77 on the basis of the data from the Kalman filter 12 and the previous coefficient d_k^{(M)} provided by the delay unit 28. The coefficients a_{k+1}^{(M)}, b_{k+1}^{(M)} and d_{k+1}^{(M)} generated by the modules 18 to 22 are provided to a filter covariance and sum module 30 which generates the covariance and state sum expectations according to equations 47 and 48.
The filters of the filter 14 are finite-dimensional because of the following two properties that hold at each time instant:
1. The filtered density of the current time sum of the state, γ_k, is given by an affine function in x times the filtered state density α_k, as per equation 44.
The filtered state density is a Gaussian in x with mean and covariance given by the Kalman filter equations 24 and 25.
2. The filtered density of the current time sum of the state covariance, β_k^{(M)}, is a quadratic in x times the filtered state estimate α_k, as per equation 28.
The filtered density of the state sum is given in terms of four sufficient statistics, namely the two coefficients of the affine function in x and the Kalman mean μ_k and covariance R_k. Similarly the filtered density of the covariance sum is given by five sufficient statistics.
The above "closure" property also holds for higher order statistics. The filtered density of the current time sum of the p-th order statistic of the state is a p-th order polynomial in x times the filtered state estimate. Thus in general finite-dimensional filters can be derived for the time sum of p-th order statistics of the state. For the filtered E-step only the filters for the first and second order statistics are used.
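The closure property can be stated compactly as follows; the coefficient names follow the a, b, d notation used for the filter modules, and the exact coefficient recursions are equations 71 to 79 of Appendix A:

```latex
% Affine closure for the filtered state-sum density (cf. equation 44):
\gamma_k(x) \;=\; \bigl(a_k + b_k^{\prime} x\bigr)\,\alpha_k(x),
\qquad \alpha_k(x) \sim N(\mu_k, R_k) \ \text{(Kalman filter)}.
% Quadratic closure for the filtered covariance-sum density (cf. equation 28):
\beta_k^{(M)}(x) \;=\; \bigl(a_k^{(M)} + b_k^{(M)\prime} x
    + x^{\prime} d_k^{(M)} x\bigr)\,\alpha_k(x).
```

The four (respectively five) sufficient statistics referred to above are the polynomial coefficients together with the Kalman mean and covariance.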
The EM process and the filters used can be implemented by an application specific signal processing circuit, but normally would be implemented using one or more microprocessors and a number of iterative software routines to process the observed data or signal. The EM process based on the use of the filters 12 and 14 has significant advantages over the EM process which uses an integral smoother, such as the Kalman smoother. The filter-based EM process provides significant savings in memory requirements, as it eliminates the need for a backward iteration and only summations need to be obtained, therefore removing the requirement for estimates to be maintained for each time interval for successive iterations.
The filter based process is also twice as fast as the previous smoother based EM process, as no forward-backward iteration is required. Parameter estimates for the process also converge faster, which is particularly advantageous when processing observed data as it is received. The finite-dimensional filters of the filter 14 are also decoupled, in that one does not depend on the other, and therefore can be implemented in parallel on a multiprocessor system to further enhance the speed of the process, as discussed in Section 9 of Appendix A.
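The storage difference can be illustrated schematically. In the sketch below the per-time-step statistic is a generic placeholder, not one of the actual filter quantities; the point is only the O(T) versus O(1) memory pattern:

```python
import numpy as np

y = np.random.default_rng(3).standard_normal(10_000)

# Smoother-style E-step: a forward pass stores a per-time-step quantity
# so that a backward pass can revisit it -- O(T) memory.
forward_store = [0.5 * yk for yk in y]
smoother_total = sum(reversed(forward_store))   # backward pass over storage

# Filter-style E-step: the same total is accumulated as a running sum in a
# single forward pass -- O(1) memory beyond local variables.
running_sum = 0.0
for yk in y:
    running_sum += 0.5 * yk
```

Both approaches yield the same sum; only the filter-style pass avoids retaining per-step data between iterations.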
A recursive version of the EM process can be implemented as shown in Figure 3.
Here at each time instant, with each new data point, the Kalman filter 12 and the finite-dimensional filter 14 are used to compute the required statistical data, and the model parameter estimates are updated by an M-step process module 32. The resulting process is recursive in that the parameters are updated by the M-step module 32 at each time instant, rather than at the end of the pass of the available data points as in an off-line case. This real
time joint parameter and state estimation can be used to track time varying parameters in real time multi-sensor signal enhancement, and high resolution localisation of narrowband sources, as discussed hereinafter. Other applications of the filter-based EM process include linear predictive coding of speech and estimation of dynamic shock error models as discussed hereinafter.
When enhancing a desired signal in the presence of noise, multiple sensor measurements will typically have components from both the signal and the noise sources. Since the systems that couple the signal and noise to the sensors are unknown, one must deal with the more difficult problem of joint signal estimation and system identification.
A frequency domain approach to the two-sensor signal enhancement problem is discussed in M. Feder, A.V. Oppenheim and E. Weinstein, "Maximum Likelihood Noise Cancellation using the EM Algorithm", IEEE Trans. Acoustics Speech and Signal Processing, Vol. 37, No. 2, pp. 204-216, Feb. 1989. In that approach, the desired (speech) signal is modelled as an auto-regressive (AR) Gaussian process, the noise is modelled as a white Gaussian process, and the coupling systems are modelled as linear time-invariant finite impulse response filters. In order to deal with the non-stationarity of the signal, the noise, and the coupling systems, it is suggested in the paper that an algorithm be applied on consecutive time frames using a sliding window. This approach involves two conflicting requirements. The window should be short enough so that the algorithm will respond to non-stationary changes in the signal and noise statistics. However, the window should be long in order to improve the statistical stability of the resulting signal and parameter estimates and in order to obtain a computationally tractable algorithm in which non-causal frequency domain Wiener filtering can be applied.
Using the filter-based EM process described herein, a time domain approach can be adopted, whereby the process jointly estimates the signal, the noise, the coupling systems and the unknown signal and noise spectral parameters. The advantage of this time domain approach is that many of the computational and conceptual difficulties associated
with the frequency domain approach are avoided. An example of the time domain approach for an observed signal s_k, where v_k represents white Gaussian noise, is described on pages 31 and 32 of Appendix A.
A two sensor speech enhancement system 32, as shown in Figure 4, uses two noisy sensors (microphones) 34 and 36 to receive a speech signal in ambient noise. The noise received at the two sensors 34 and 36 affects the received speech signals and has different correlations as the sensors 34 and 36 are in different locations. The aim is to optimally combine the two sensor outputs to extract optimal estimates of the speech signal (state) and speech and noise parameters. This is achieved by using the filter-based EM process in a module 38 which receives the digitised outputs of the sensors 34 and 36. Source code in Matlab for the module 38 is given in Appendix B. For a simulation using the source code of Appendix B, the signal source was assumed to be an AR(1) process with AR coefficient a_1 = 0.75 and σ_v² = 0.01 for the observed signal s_k of equation 88 of Appendix A. The sensor (observation noise) variances were assumed to be σ_ε² = 0.0025. The filter-based EM module 38 was run on 1000 data points and the initial AR parameter estimate was randomly initialised to 0.941. Figure 5 shows the AR parameter estimates versus successive passes; the parameter estimates are very close to the true parameter value. A recursive version of the process can also be used to achieve real time (on-line) speech signal (state) and parameter estimates. This can be used to track time varying signal and noise statistics. The smoother based EM process requires of the order of N²T memory (where N is the order of the AR process and T is the number of data points), whereas the filter based process requires no storage memory apart from local variables. Also the various filters are decoupled and hence can be implemented in parallel. A systolic implementation is described in Section 9 of Appendix A.
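The AR(1) source model used in the simulation can be cast in the state-space form of equations 1 and 2. The sketch below generates synthetic two-sensor data with the stated values a_1 = 0.75, σ_v² = 0.01 and per-sensor noise variance 0.0025; the unit coupling gains are an assumption, since the actual coupling systems of the Appendix B code are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(42)

a1 = 0.75                 # AR(1) coefficient of the speech source
var_v = 0.01              # process (source) noise variance
var_eps = 0.0025          # per-sensor observation noise variance
T = 1000                  # number of data points, as in the simulation

# Source signal s_k: AR(1) process driven by white Gaussian noise.
s = np.zeros(T)
for k in range(T - 1):
    s[k + 1] = a1 * s[k] + np.sqrt(var_v) * rng.standard_normal()

# Two sensors observe the source with independent noise (unit gains assumed).
y1 = s + np.sqrt(var_eps) * rng.standard_normal(T)
y2 = s + np.sqrt(var_eps) * rng.standard_normal(T)
```

Data of this form would be supplied to the filter-based EM module 38, which jointly estimates the source signal and the AR and noise parameters.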
For high resolution localisation of narrowband sources, the use of the EM process is described in I. Ziskind and D. Hertz, "Maximum Likelihood Localization of Narrow-Band Autoregressive Sources via the EM Algorithm", IEEE Trans. Signal Processing, Vol. 41, No. 8, pp. 2719-2723, August 1993. The filter-based EM process described above can
be used on a Gaussian state space model to estimate the parameters of model signal sources and their direction of arrival. In our notation the multi-sensor multi-source signal model in Ziskind and Hertz can be expressed as:

    x_{k+1} = A(θ^{(1)}) x_k + B w_{k+1}
    y_k = C(θ^{(2)}) x_k + D v_k

The state equation relates to the narrowband sources. In particular, θ^{(1)} are parameters that determine the bandwidth of the sources. In the observation equation, θ^{(2)} denotes the angles of arrival to the multi-sensor array. C(θ^{(2)}) is called the steering matrix of the array towards direction θ^{(2)}. A Kalman smoother is used to obtain maximum likelihood estimates of the parameters θ^{(1)}, θ^{(2)} and noise variances B and D. The filter-based EM process can be used instead of the Kalman smoother by using a filter-based EM process module 40 to receive the signal data observed by a sensor array 42 in response to the signals generated by a plurality of narrowband sources 44, as shown in Figure 6. The module 40 is able to provide estimates of direction of arrival θ^{(2)}, bandwidth θ^{(1)} and source signal x_{k+1} for each narrowband source 44.
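For illustration, a steering matrix C(θ^{(2)}) can be constructed for the common special case of a uniform linear array with half-wavelength spacing. The array geometry and the complex baseband representation are assumptions of this sketch; Ziskind and Hertz treat more general arrays:

```python
import numpy as np

def steering_matrix(thetas, num_sensors, spacing=0.5):
    """Steering matrix for a uniform linear array (an assumed geometry).
    `spacing` is the sensor spacing in wavelengths; each column steers the
    array towards one source's arrival angle (in radians)."""
    n = np.arange(num_sensors)[:, None]        # sensor index, column vector
    thetas = np.atleast_1d(thetas)[None, :]    # arrival angles, row vector
    # Phase delay of source at angle theta across the array aperture.
    return np.exp(-2j * np.pi * spacing * n * np.sin(thetas))

# Two sources at illustrative angles, observed by an 8-sensor array.
C = steering_matrix([0.1, -0.4], num_sensors=8)
```

Each column of C is the array response to one source, so the observation equation stacks the source states through C before the sensor noise D v_k is added.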
The filter based EM process can be used, for example, in a chemical process plant application. A plant 50 may include a three-tank level and temperature system, as shown in Figure 7, which is versatile and representative of typical chemical process plants. The plant 50 has two cylindrical glass tanks 52 and 54 and one conical glass tank 56, each having an internal diameter of 16 cm and height of 50 cm. The system is equipped with several transducers: three differential pressure cell transducers 58, 60 and 62 for measuring the flow rates of steam and water; and five thermocouples 64, 66, 68, 70 and 72 for measuring the temperature of the water at various points in the system. In addition there are three control valves 74, 76 and 78 for manipulating the flow rates of water and steam into the tanks 52, 54 and 56, and three tank level sensors 80, 82 and 84 for the tanks 52, 54 and 56. Further details of the plant are given in O.O. Badmus, D. Grant Fisher and Sirish L. Shah, "Real-time, Sensor-based Computing in the Laboratory", Chemical Engineering Education, Fall 1996, pages 280 to 289. The process equipment of
the plant 50 can readily be configured to demonstrate the performance of different process systems and control schemes. Only a subsystem of the plant 50, being the bold portion of Figure 7 for the first tank 52, is utilised for the description below. To generate data suitable for identification of a dynamic process model, a uniformly distributed random input signal of maximum amplitude 1 (0 < M < 1) was generated and applied every 5 seconds as a perturbation of the nominal value of the position of the cold water valve 74 (set at 21% open) with an amplification gain of 10. The input signal is bounded by the interval 16% to 26% open. As described in the O.O. Badmus et al article, a noisy Gaussian AR series adequately describes the process dynamics when the input signal is assumed to be unknown and modelled as Gaussian white noise. Figure 8 shows a 2000 point output data segment obtained from the level sensor 80 for the first tank 52 when the input signal is white Gaussian noise with variance 8.2687. The output series was detrended by subtracting the mean, resulting in a zero-mean time series. Assuming an AR(2) model for the subsystem, the filter-based EM process was executed on the zero-mean data. Initial AR parameter estimates were arbitrarily chosen as 0.4 and 0.0. The parameter estimates obtained from the filter-based EM process are plotted in Figure 9. The estimates converge after about 10 iterations. After 20 iterations the estimates are 0.5970 and 0.4027. The parameter estimates, after only 6 iterations, are very close to the final values which stabilise after 10 iterations. This rapid convergence, combined with the fact that no memory storage of previous values is required, demonstrates the advantages of the filter-based EM process.
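As a simple cross-check on estimates of this kind, the AR(2) coefficients of a detrended zero-mean series can also be fitted by ordinary least squares on lagged values. This sketch uses synthetic data with illustrative coefficients, since the plant data of Figure 8 is not reproduced here, and OLS is only a baseline comparison, not the filter-based EM process:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic zero-mean AR(2) series standing in for the detrended level data.
a_true = (0.5, 0.3)       # illustrative coefficients (assumed, not the plant's)
T = 2000
z = np.zeros(T)
for k in range(2, T):
    z[k] = a_true[0] * z[k - 1] + a_true[1] * z[k - 2] + rng.standard_normal()

# Detrend by subtracting the mean, as described for the plant output.
z = z - z.mean()

# Least-squares fit of z_k on (z_{k-1}, z_{k-2}).
X = np.column_stack([z[1:-1], z[:-2]])
a_hat, *_ = np.linalg.lstsq(X, z[2:], rcond=None)
```

Unlike this batch fit, the filter-based EM process also accounts for observation noise and produces state estimates along with the parameters.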
In the linear predictive coding of speech, unvoiced speech is modelled as an auto-regressive process driven by white Gaussian noise. The auto-regressive parameters correspond to those of the vocal tract resonant cavities and change with the flow of speech according to movement of the mouth, tongue and chest. These changes are relatively slow compared to the speech signal variations. The speech signal itself is embedded in white Gaussian noise. The aim is to obtain ML estimates of the AR parameters, which can be done using the filter-based EM process.
The filter-based EM process can also be used as a substitute for the Kalman smoother based EM algorithm described in D. Ghosh, "Maximum Likelihood Estimation of the Dynamic Shock-Error Model", Journal of Econometrics, Vol. 41, No. 1, pp. 121-143, May 1989. Here the EM process is used to attain ML estimates of dynamic shock error model parameters for economies, such as the shadow economy in the United States.
The filter-based EM process can also be used to determine the parameters for a system to predict the spot price of oil, as described in detail in accompanying Appendix C.
The filter-based EM process has been described above in relation to discrete time systems but it can also be applied to continuous time linear Gaussian systems.
Various discretisations of the filters (e.g. first order, second order or robust discretisation; refer to A. Bensoussan, "Stochastic Control of Partially Observable Systems", Cambridge University Press, Cambridge, 1992) result in different discrete time filters that can be applied to the above examples.
Many modifications will be apparent to those skilled in the art without departing from the scope of the present invention as hereinbefore described with reference to the accompanying drawings.
Appendix A
There are very few estimation problems for which finite dimensional optimal filters exist, i.e., filters given in terms of finite-dimensional sufficient statistics. Indeed the only two cases that are widely used are the Kalman filter for linear Gaussian models and the Wonham filter (Hidden Markov Model filter) for finite state Markov chains in white noise.
In this paper we derive new finite-dimensional filters for linear Gaussian state space models in discrete time. Indeed, the Kalman filter is merely a special case of our general filter. Our filters compute all the statistics required to obtain maximum likelihood (ML) estimates of the model parameters via the Expectation Maximization (EM) algorithm.
Maximum likelihood parameter estimation of linear Gaussian models and other related time-series models using the EM algorithm was studied in the 1980s in [1] and [2] and more recently in the electrical engineering literature in [3], [4]. The EM algorithm is a general iterative numerical algorithm for obtaining ML estimates. Each iteration consists of two steps: the Expectation (E-step) and the Maximization (M-step). The E-step for linear Gaussian models involves computing the following two conditional expectations based on all the observations: (i) the sum over time of the state; (ii) the sum over time of the state covariance.
In all the existing literature on parameter estimation of linear Gaussian models via the EM algorithm, the E-step is non-causal, involving fixed-interval smoothing via a Kalman smoother (i.e. a forward pass and a backward pass).
In this paper we derive a filter-based EM algorithm for linear Gaussian models. That is, the E-step is implemented using filters (i.e. only a forward pass) rather than smoothers. The main contribution of this paper is to show that these filters are finite-dimensional. Few finite dimensional filters are known, so the result is remarkable. It was certainly a surprise to the authors.
The filtered E-step has the following advantages:
1. The memory costs are significantly reduced compared to the standard (smoother-based) EM algorithm.
2. Our filters are decoupled and hence easy to implement in parallel on a multiprocessor system.
3. Our algorithm is at least twice as fast as the standard smoother based EM algorithm because no forward-backward scheme is required.
Filter-based EM algorithms have recently been proposed for Hidden Markov Models. These HMM filters are finite dimensional because of the idempotent property of the state indicator function of a finite state Markov chain. In linear Gaussian models, unlike the HMM case, the state indicator vector is no longer idempotent. Instead our filters are finite dimensional because of the following two remarkable algebraic properties that hold at each time instant:
1. The filtered density of the current time sum of the state is given by an affine function in x times the filtered state density. The filtered state density is a Gaussian in x with mean and variance given by the Kalman filter equations.
2. The filtered density of the current time sum of the state covariance is a quadratic in x times the filtered state estimate.
So the filtered density of the state sum is given in terms of 4 sufficient statistics, namely the 2 coefficients of the affine function in x and the Kalman mean and covariance. Similarly the filtered density of the covariance sum is given by 5 sufficient statistics.
Actually this remarkable algebraic "closure" property holds for higher order statistics as well. We prove that the filtered density of the current time sum of the p-th order statistic of the state is a p-th order polynomial in x times the filtered state estimate. So in general we can derive finite dimensional filters for the time sum of p-th order statistics of the state. Of course for the filtered E-step we only use the filters for the first and second order statistics. Also for p = 0, our filters reduce to the Kalman filter.
Applications: Our filter-based EM algorithm for linear Gaussian models can be applied to all the applications where the standard EM algorithm has been applied. In particular these include:
• Multi-sensor signal enhancement algorithms for estimation of speech signals in room acoustic environments [3].
• High Resolution Localization of Narrowband Sources using multiple sensors and Direction of Arrival Estimation [6].
• Linear Predictive coding of speech (see Chapter 6 of [5]).
• Forecasting and prediction of the "shadow economy" in market cycles using linear errors-in-variables models [2].
In all these applications the advantages of our filter-based EM algorithm can be exploited.
This paper is organized as follows. In Sec 2 we present our signal model. In Sec 3 we propose a measure change which facilitates easy derivation of the filters. In Sec 4, recursions are derived for the filtered densities of the variables of interest. In Sec 5 we derive our finite dimensional filters. In Sec 6 a general finite dimensional filter is proposed. Sec 7 re-expresses the filters to allow for singular state noise as long as a certain controllability condition is satisfied. In Sec 8, the filters derived in Sec 4 and 6 are used in a filter-based EM algorithm to obtain maximum likelihood parameter estimates. In Sec 9 we evaluate the computational complexity of the filters and propose a parallel implementation. Finally, conclusions are presented in Sec 10.
2 Signal Model
To derive our filters with maximum generality we consider a multi-input multi-output state space model with time varying parameters and noise variances, as follows.
All processes are defined initially on the probability space (Ω, F, P̄). We shall consider the classical linear-Gaussian model for the signal and observation processes. That is, for k = 0, 1, …, the signal is given by the dynamics
    x_{k+1} = A_{k+1} x_k + B_{k+1} w_{k+1}    (1)
Here x_k ∈ R^m and x_0 ∈ R^m is a Gaussian random variable with zero mean and covariance matrix B_0^2 ∈ R^{m×m}.
At time k+1, k = 0, 1, …, the noise in (1) is modelled by an independent Gaussian random variable with zero mean and covariance matrix B_{k+1}^2. It is known [7] that such a Gaussian random variable can be represented as B_{k+1} w_{k+1}, where w_{k+1} is an m-vector of N(0, 1) random variables.
Assumption 2.1 For the time being, we assume that the matrices B_k ∈ R^{m×m}, k = 0, 1, …, are nonsingular and symmetric. The symmetry follows from the construction in [7]. The case when B_k is singular is discussed in Section 7.
For k = 0, 1, …, we assume that the state process x_k is observed indirectly via the vector observations y_k, where

    y_k = C_k x_k + D_k v_k    (2)

Here for each k, y_k ∈ R^d and v_k is a vector of independent N(0, 1) random variables. We assume that D_k is a nonsingular d × d matrix.
Remark: We assume B_k to be a covariance matrix, and hence symmetric, for notational convenience. The results in this paper also hold for non-symmetric B_k: simply replace B_k^2 by B_k B_k' and B_k^{-2} by (B_k B_k')^{-1} below.
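As a concrete illustration, the signal model (1)-(2) can be simulated directly in the scalar case m = d = 1. The following Python sketch is not part of the method itself; all parameter values in it are illustrative assumptions:

```python
import random

def simulate(T, A, B, C, D, x0=0.0, seed=0):
    """Simulate x_{k+1} = A x_k + B w_{k+1}, y_k = C x_k + D v_k (m = d = 1)."""
    rng = random.Random(seed)
    xs, ys = [x0], []
    for _ in range(T):
        # observation of the current state, then state transition
        ys.append(C * xs[-1] + D * rng.gauss(0.0, 1.0))
        xs.append(A * xs[-1] + B * rng.gauss(0.0, 1.0))
    return xs[:T], ys

xs, ys = simulate(200, A=0.9, B=0.5, C=1.0, D=0.2)
```

The returned lists contain the hidden states x_0, ..., x_{T-1} and the corresponding noisy observations y_0, ..., y_{T-1}.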
3 Measure Change Construction and Dynamics
We shall adapt the techniques in [10] and show how the dynamics (1) and (2) can be modelled starting with an initial reference probability measure P̄.
Suppose on a probability space (Ω, F, P̄) we are given two sequences of independent, identically distributed random variables x_k ∈ R^m, y_k ∈ R^d. Under the probability measure P̄, the x_k are a sequence of independent m-dimensional N(0, I_m) random variables, and the y_k are a sequence of independent d-dimensional N(0, I_d) random variables. Here, I_m (resp. I_d) denotes the m × m (resp. d × d) identity matrix.
For x ∈ R^m and y ∈ R^d write

φ(x) = (2π)^{-m/2} exp(-x'x/2),   ψ(y) = (2π)^{-d/2} exp(-y'y/2)   (3)
Define the sigma-fields

G_k = σ{x_0, x_1, ..., x_k, y_0, y_1, ..., y_k},   Y_k = σ{y_0, y_1, ..., y_k}   (4)

Thus {G_k} is the complete filtration generated by the x and y sequences, and {Y_k} is the complete filtration generated by the observations.
For any matrix B, let |B| denote its determinant.
Write

λ_0 = ψ(D_0^{-1}(y_0 − C_0 x_0)) / (|D_0| ψ(y_0))

and for l ≥ 1

λ_l = [ψ(D_l^{-1}(y_l − C_l x_l)) φ(B_l^{-1}(x_l − A_l x_{l−1}))] / [|D_l| ψ(y_l) |B_l| φ(x_l)]   (5)

For k ≥ 0 set

Λ_k = ∏_{l=0}^{k} λ_l   (6)

A new probability measure P can be defined on (Ω, ∨_k G_k) by setting the restriction to G_k of the Radon–Nikodym derivative of P with respect to P̄ equal to Λ_k:

dP/dP̄ |_{G_k} = Λ_k
Definition 3.1 For l = 0, 1, ..., define

v_l = D_l^{-1}(y_l − C_l x_l)

For l = 1, 2, ..., define

w_l = B_l^{-1}(x_l − A_l x_{l−1})
Lemma 3.2 Under the measure P, {v_l} and {w_l} are sequences of independent N(0, I_d) and N(0, I_m) random variables, respectively.
PROOF. Suppose f : R^d → R and g : R^m → R are arbitrary measurable 'test' functions. Then, with E (resp. Ē) denoting expectation under P (resp. P̄),
E{g(w_k) f(v_k) | G_{k−1}} = Ē{Λ_k g(w_k) f(v_k) | G_{k−1}} / Ē{Λ_k | G_{k−1}}   (7)

using a version of Bayes' theorem [10].
Now Λ_{k−1} is G_{k−1} measurable; therefore

E{g(w_k) f(v_k) | G_{k−1}} = Ē{λ_k g(w_k) f(v_k) | G_{k−1}} / Ē{λ_k | G_{k−1}}

However,

Ē{λ_k | G_{k−1}} = Ē{ ψ(D_k^{-1}(y_k − C_k x_k)) φ(B_k^{-1}(x_k − A_k x_{k−1})) / (|D_k| ψ(y_k) |B_k| φ(x_k)) | G_{k−1} }   (8)

Notice that the inner conditional expectation over y_k is

∫_{R^d} [ψ(D_k^{-1}(y − C_k x_k)) / (|D_k| ψ(y))] ψ(y) dy = 1

Hence

Ē{λ_k | G_{k−1}} = ∫_{R^m} (1/|B_k|) φ(B_k^{-1}(x − A_k x_{k−1})) dx = 1

Consequently,

E{g(w_k) f(v_k) | G_{k−1}} = Ē{ λ_k g(B_k^{-1}(x_k − A_k x_{k−1})) f(D_k^{-1}(y_k − C_k x_k)) | G_{k−1} }   (9)

Since under P̄ the variables x_k and y_k are independent N(0, I_m) and N(0, I_d), independent of G_{k−1}, the changes of variables w = B_k^{-1}(x − A_k x_{k−1}) and v = D_k^{-1}(y − C_k x) show that the right-hand side of (9) equals

∫_{R^m} g(w) φ(w) dw ∫_{R^d} f(v) ψ(v) dv

Then, for any measurable 'test' functions g and f, the conditional expectation factorizes into the product of an N(0, I_m) expectation of g and an N(0, I_d) expectation of f, which proves the lemma. □
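The key normalization Ē{λ_k | G_{k−1}} = 1 used in the proof can be checked numerically in the scalar case m = d = 1. The Python sketch below (all parameter values are illustrative assumptions) integrates λ_k against the N(0,1) × N(0,1) reference law of (x_k, y_k) under P̄ by midpoint quadrature:

```python
import math

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def lam(x, y, x_prev, A, B, C, D):
    """One factor lambda_k of the Radon-Nikodym derivative (scalar case)."""
    num = phi((y - C * x) / D) * phi((x - A * x_prev) / B)
    return num / (D * phi(y) * B * phi(x))

# integrate lambda_k * phi(x) * phi(y) over a grid; the result should be 1
A, B, C, D, x_prev = 0.7, 0.9, 1.0, 0.8, 0.3
h, L = 0.05, 8.0
n = int(2 * L / h)
total = 0.0
for i in range(n):
    x = -L + (i + 0.5) * h
    for j in range(n):
        y = -L + (j + 0.5) * h
        total += lam(x, y, x_prev, A, B, C, D) * phi(x) * phi(y) * h * h
# total is very close to 1
```

The cancellation of φ(x) and ψ(y) against the reference densities is exactly why the conditional expectation of λ_k equals one.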
The following theorem gives recursive expressions for the β_k, γ_k and α_k. The recursions are derived under the measure P̄, where {x_l} and {y_l}, l ∈ Z^+, are independent sequences of random variables.
Theorem 4.1 For k ∈ Z^+, the un-normalized densities β_k^{ij(M)}, M = 0, 1, 2, γ_k^{in} and α_k defined in (11) are given by the following recursions:
β_k^{ij(0)}(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] [ ∫_{R^m} β_{k−1}^{ij(0)}(z) φ(B_k^{-1}(x − A_k z)) dz
  + (x, e_i)(x, e_j) ∫_{R^m} α_{k−1}(z) φ(B_k^{-1}(x − A_k z)) dz ]   (13)

β_k^{ij(1)}(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] [ ∫_{R^m} β_{k−1}^{ij(1)}(z) φ(B_k^{-1}(x − A_k z)) dz
  + (x, e_i) ∫_{R^m} (z, e_j) α_{k−1}(z) φ(B_k^{-1}(x − A_k z)) dz ]   (14)

β_k^{ij(2)}(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] [ ∫_{R^m} β_{k−1}^{ij(2)}(z) φ(B_k^{-1}(x − A_k z)) dz
  + ∫_{R^m} (z, e_i)(z, e_j) α_{k−1}(z) φ(B_k^{-1}(x − A_k z)) dz ]   (15)

γ_k^{in}(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] [ ∫_{R^m} γ_{k−1}^{in}(z) φ(B_k^{-1}(x − A_k z)) dz
  + (x, e_i)(y_k, e_n) ∫_{R^m} α_{k−1}(z) φ(B_k^{-1}(x − A_k z)) dz ]   (16)

α_k(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] ∫_{R^m} α_{k−1}(z) φ(B_k^{-1}(x − A_k z)) dz   (17)
PROOF. We prove (13); the proofs of (14), (15), (16) and (17) are very similar and hence omitted.
Since H_k^{ij(0)} = H_{k−1}^{ij(0)} + (x_k, e_i)(x_k, e_j), using the definition of β_k^{ij(0)} in (11) we have, for any Borel test function g,

∫ g(x) β_k^{ij(0)}(x) dx = Ē{ Λ_{k−1} H_{k−1}^{ij(0)} [ψ(D_k^{-1}(y_k − C_k x_k)) φ(B_k^{-1}(x_k − A_k x_{k−1})) / (|D_k| ψ(y_k) |B_k| φ(x_k))] g(x_k) | Y_k }
 + Ē{ Λ_{k−1} (x_k, e_i)(x_k, e_j) [ψ(D_k^{-1}(y_k − C_k x_k)) φ(B_k^{-1}(x_k − A_k x_{k−1})) / (|D_k| ψ(y_k) |B_k| φ(x_k))] g(x_k) | Y_k }

Under P̄, x_k is N(0, I_m) and independent of G_{k−1} and y_k; integrating out x_k against its density, the φ(x_k) factors cancel and the right-hand side becomes

[1/(|B_k||D_k|ψ(y_k))] [ ∫∫ β_{k−1}^{ij(0)}(z) ψ(D_k^{-1}(y_k − C_k x)) φ(B_k^{-1}(x − A_k z)) g(x) dx dz
 + ∫∫ α_{k−1}(z) (x, e_i)(x, e_j) ψ(D_k^{-1}(y_k − C_k x)) φ(B_k^{-1}(x − A_k z)) g(x) dx dz ]   (18)

where this equality follows from the independence of the x_k's and y_k's under P̄. Since g is an arbitrary Borel test function, equating the RHS of (12) with (18) proves (13).
□
Remarks:
1. By virtue of (17), we can rewrite (13) and (16) as:

β_k^{ij(0)}(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] ∫_{R^m} β_{k−1}^{ij(0)}(z) φ(B_k^{-1}(x − A_k z)) dz + (x, e_i)(x, e_j) α_k(x)   (19)

γ_k^{in}(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] ∫_{R^m} γ_{k−1}^{in}(z) φ(B_k^{-1}(x − A_k z)) dz + (x, e_i)(y_k, e_n) α_k(x)   (20)
2. The above theorem does not require the v_l and w_l to be Gaussian. The recursions (17), (13) and (16) hold for arbitrary densities ψ and φ, as long as φ is strictly positive. We use the Gaussian assumption to derive the finite-dimensional filters in Sec. 5.
3. Initial conditions: Note that at k = 0, we have, for any arbitrary Borel test function g(x),

∫ g(x) α_0(x) dx = Ē{ λ_0 g(x_0) | Y_0 }   (21)

Equating (11) and (21) yields

α_0(x) = [ψ(D_0^{-1}(y_0 − C_0 x)) / (|D_0| ψ(y_0))] φ(x)   (22)

Similarly, the initial conditions for β_0^{ij(M)}, M = 0, 1, 2, and γ_0^{in} are

β_0^{ij(0)}(x) = (x, e_i)(x, e_j) α_0(x),   β_0^{ij(1)}(x) = 0,   β_0^{ij(2)}(x) = 0,   γ_0^{in}(x) = (x, e_i)(y_0, e_n) α_0(x)   (23)
5 Finite Dimensional Filters
In this section, finite-dimensional filters are derived for H_k^{ij(M)}, M = 0, 1, 2, and J_k^{in} defined in (10). In particular, we characterize the densities β_k^{ij(M)} and γ_k^{in} in terms of a finite number of sufficient statistics. Recursions are then derived for these statistics.
Define the conditional mean and covariance matrix of x_k, respectively, as μ_k = E{x_k | Y_k} and R_k = E{(x_k − μ_k)(x_k − μ_k)' | Y_k}.
The linearity of (1) and (2) implies that α_k(x) is an un-normalized normal density with mean and variance given by the standard Kalman filter equations.
Theorem 5.1 (Kalman Filter) For k = 0, 1, ..., α_k is an un-normalized Gaussian density with mean μ_k and covariance R_k given via the Kalman filter equations

μ_k = R_k B_k^{-2} A_k σ_k^{-1} R_{k−1}^{-1} μ_{k−1} + R_k C_k' (D_k D_k')^{-1} y_k   (24)

R_k = [ (A_k R_{k−1} A_k' + B_k^2)^{-1} + C_k' (D_k D_k')^{-1} C_k ]^{-1}   (25)

where μ_k ∈ R^m, R_k ∈ R^{m×m} and

σ_k = A_k' B_k^{-2} A_k + R_{k−1}^{-1}   (26)

Here μ_k = E{x_k | Y_k} and R_k = E{(x_k − μ_k)(x_k − μ_k)' | Y_k}.
PROOF. See [9]. □
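In the scalar case, the information-form equations (24)-(26) can be checked against the familiar covariance-form Kalman filter: both recursions should produce identical conditional means and covariances. A Python sketch (the parameter values are illustrative assumptions):

```python
def kf_standard(mu, R, y, A, B, C, D):
    """One step of the covariance-form Kalman filter (scalar model)."""
    Rp = A * A * R + B * B             # predicted covariance R_{k|k-1}
    mp = A * mu                        # predicted mean
    S = C * C * Rp + D * D             # innovation variance
    K = Rp * C / S                     # Kalman gain
    return mp + K * (y - C * mp), Rp - K * C * Rp

def kf_information(mu, R, y, A, B, C, D):
    """One step in the form of Theorem 5.1, eqs. (24)-(26), scalar case."""
    sig = A * A / (B * B) + 1.0 / R                             # eq. (26)
    Rn = 1.0 / (1.0 / (A * A * R + B * B) + C * C / (D * D))    # eq. (25)
    mn = Rn * (A / (B * B)) * (1.0 / sig) * (mu / R) + Rn * C * y / (D * D)  # eq. (24)
    return mn, Rn

mu1, R1 = 0.0, 1.0
mu2, R2 = 0.0, 1.0
for y in [0.5, -0.2, 1.1, 0.3]:
    mu1, R1 = kf_standard(mu1, R1, y, 0.8, 0.6, 1.0, 0.4)
    mu2, R2 = kf_information(mu2, R2, y, 0.8, 0.6, 1.0, 0.4)
# (mu1, R1) and (mu2, R2) agree at every step
```

The agreement follows from the scalar version of identity (67) below: (A/B^2) σ_k^{-1} = A R_{k−1} / R_{k|k−1}.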
Remark: Consequently, α_k has the form

α_k(x) = ᾱ_k (2π)^{-m/2} |R_k|^{-1/2} exp( −(x − μ_k)' R_k^{-1} (x − μ_k)/2 )   (27)

where ᾱ_k = ∫_{R^m} α_k(x) dx.
Due to the presence of the quadratic term (x, e_i)(x, e_j), the density β_k^{ij(0)} in (19) is not Gaussian. Is it possible to characterize the density β_k^{ij(0)} in terms of a finite number of sufficient statistics? Amazingly enough, the answer is "yes". As we prove below, it is possible to express β_k^{ij(0)} as a quadratic expression in x multiplied by α_k(x) for all k. The important conclusion then is that, by updating the coefficients of the quadratic expression together with the Kalman filter above, we have finite-dimensional filters for computing H_k^{ij(0)}. A similar result also holds for β_k^{ij(1)}, β_k^{ij(2)} and γ_k^{in}.
Theorems 5.2 and 5.3 that follow derive finite-dimensional sufficient statistics for the densities β_k^{ij(M)}, M = 0, 1, 2, and γ_k^{in}.
Theorem 5.2 At time k, the density β_k^{ij(M)}(x) (initialized according to (23)) is completely defined by the 5 statistics a_k^{ij(M)}, b_k^{ij(M)}, d_k^{ij(M)}, R_k and μ_k as follows:

β_k^{ij(M)}(x) = [ a_k^{ij(M)} + b_k^{ij(M)'} x + x' d_k^{ij(M)} x ] α_k(x)   (28)

where a_k^{ij(M)} ∈ R, b_k^{ij(M)} ∈ R^m and d_k^{ij(M)} ∈ R^{m×m} is a symmetric matrix with elements d_k^{ij(M)}(p, q), p = 1, ..., m, q = 1, ..., m.
Furthermore, a_k^{ij(M)}, b_k^{ij(M)} and d_k^{ij(M)} are given by the following recursions:

a_{k+1}^{ij(M)} = a_k^{ij(M)} + b_k^{ij(M)'} σ_{k+1}^{-1} R_k^{-1} μ_k + Σ_{p=1}^{m} Σ_{q=1}^{m} d_k^{ij(M)}(p, q) σ_{k+1}^{-1}(p, q)
  + μ_k' R_k^{-1} σ_{k+1}^{-1} d_k^{ij(M)} σ_{k+1}^{-1} R_k^{-1} μ_k,   a_0^{ij(M)} = 0   (29)

b_{k+1}^{ij(0)} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} ( b_k^{ij(0)} + 2 d_k^{ij(0)} σ_{k+1}^{-1} R_k^{-1} μ_k ),   b_0^{ij(0)} = 0_{m×1}   (30)

d_{k+1}^{ij(0)} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} d_k^{ij(0)} σ_{k+1}^{-1} A_{k+1}' B_{k+1}^{-2} + (e_i e_j' + e_j e_i')/2,   d_0^{ij(0)} = (e_i e_j' + e_j e_i')/2   (31)

b_{k+1}^{ij(1)} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} ( b_k^{ij(1)} + 2 d_k^{ij(1)} σ_{k+1}^{-1} R_k^{-1} μ_k ) + e_i e_j' σ_{k+1}^{-1} R_k^{-1} μ_k,   b_0^{ij(1)} = 0_{m×1}   (32)

d_{k+1}^{ij(1)} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} d_k^{ij(1)} σ_{k+1}^{-1} A_{k+1}' B_{k+1}^{-2}
  + (1/2)( e_i e_j' σ_{k+1}^{-1} A_{k+1}' B_{k+1}^{-2} + B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} e_j e_i' ),   d_0^{ij(1)} = 0_{m×m}   (33)

b_{k+1}^{ij(2)} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} ( b_k^{ij(2)} + (2 d_k^{ij(2)} + e_i e_j' + e_j e_i') σ_{k+1}^{-1} R_k^{-1} μ_k ),   b_0^{ij(2)} = 0_{m×1}   (34)

d_{k+1}^{ij(2)} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} ( d_k^{ij(2)} + (1/2)(e_i e_j' + e_j e_i') ) σ_{k+1}^{-1} A_{k+1}' B_{k+1}^{-2},   d_0^{ij(2)} = 0_{m×m}   (35)

(For M = 2, the recursion (29) holds with d_k^{ij(2)} replaced by d_k^{ij(2)} + (e_i e_j' + e_j e_i')/2, exactly as in (34) and (35).) Here σ_k is defined in (26), and μ_k, R_k are obtained from the Kalman filter (24) and (25).
PROOF. We only prove the theorem for M = 0; the proofs for M = 1, 2 are very similar and hence omitted.
We prove (28) by induction.
From (23), at time k = 0, β_0^{ij(0)}(x) is of the form (28) with a_0^{ij(0)} = 0, b_0^{ij(0)} = 0 and d_0^{ij(0)} = (e_i e_j' + e_j e_i')/2.
For convenience we drop the superscripts ij(0) on a_k, b_k, d_k and β_k. Assume that (28) holds at time k. Then at time k + 1, using (28) and the recursion (19), we have

β_{k+1}(x) = [ψ(D_{k+1}^{-1}(y_{k+1} − C_{k+1} x)) / (|B_{k+1}||D_{k+1}|ψ(y_{k+1}))] ∫_{R^m} φ(B_{k+1}^{-1}(x − A_{k+1} z)) (a_k + b_k' z + z' d_k z) α_k(z) dz + (x, e_i)(x, e_j) α_{k+1}(x)   (36)

Let us concentrate on the first term on the RHS, which we shall denote I_1. Substituting (27) for α_k(z) and collecting the terms in z in the exponent yields

I_1 = K_1(x) ∫_{R^m} exp( −(1/2)(z' σ_{k+1} z − 2 δ_{k+1}' z) ) (a_k + b_k' z + z' d_k z) dz   (37)

where σ_{k+1} is defined in (26),

δ_{k+1} = A_{k+1}' B_{k+1}^{-2} x + R_k^{-1} μ_k   (38)

and K_1(x) collects all factors not involving z. Completing the square in the exponential term in (37) yields

I_1 = K_1(x) exp( (1/2) δ_{k+1}' σ_{k+1}^{-1} δ_{k+1} ) ∫_{R^m} (a_k + b_k' z + z' d_k z) exp( −(1/2)(z − σ_{k+1}^{-1} δ_{k+1})' σ_{k+1} (z − σ_{k+1}^{-1} δ_{k+1}) ) dz   (39)

Now consider the integral in (39). Since the exp(·) term is an un-normalized Gaussian density in z with normalization constant (2π)^{m/2} |σ_{k+1}|^{-1/2}, the integral equals

(2π)^{m/2} |σ_{k+1}|^{-1/2} ( a_k + b_k' E{z} + E{z' d_k z} )   (40)

where

E{z} = σ_{k+1}^{-1} δ_{k+1}   (41)

E{z' d_k z} = E{(z − E{z})' d_k (z − E{z})} + E{z}' d_k E{z} = Σ_{p=1}^{m} Σ_{q=1}^{m} d_k(p, q) σ_{k+1}^{-1}(p, q) + E{z}' d_k E{z}   (42)

Therefore, from (39), (40), (41), (42) and (36), identifying the remaining factors with α_{k+1}(x) via (17) and (27), we have

β_{k+1}(x) = [ a_k + b_k' σ_{k+1}^{-1} δ_{k+1} + Σ_{p,q} d_k(p, q) σ_{k+1}^{-1}(p, q) + δ_{k+1}' σ_{k+1}^{-1} d_k σ_{k+1}^{-1} δ_{k+1} + (x, e_i)(x, e_j) ] α_{k+1}(x)   (43)

Substituting for δ_{k+1} (which is affine in x) in (43), we obtain

β_{k+1}(x) = [ a_{k+1} + b_{k+1}' x + x' d_{k+1} x ] α_{k+1}(x)

where a_{k+1}, b_{k+1} and d_{k+1} are given by (29), (30) and (31). □
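The recursions (29)-(31) can be sanity-checked in the scalar case m = d = 1 (so i = j = 1) by comparing the closed-form statistics against a brute-force numerical evaluation of the density recursions (17) and (19) on a grid. The following Python sketch (parameter values and observations are illustrative assumptions) computes E{H_k | Y_k} = ∫β_k dx / ∫α_k dx both ways:

```python
import math

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

A, B, C, D = 0.8, 0.7, 1.0, 0.5
ys = [0.4, -0.1, 0.9]

# closed-form statistics: Kalman filter plus scalar versions of (29)-(31)
R = 1.0 / (1.0 + C * C / (D * D))        # posterior variance after y_0 (N(0,1) prior)
mu = R * C * ys[0] / (D * D)             # posterior mean after y_0
a, b, d = 0.0, 0.0, 1.0                  # initial conditions from (23)
for y in ys[1:]:
    sig = A * A / (B * B) + 1.0 / R      # sigma_{k+1}, eq. (26)
    a = a + b * mu / (R * sig) + d / sig + d * (mu / (R * sig)) ** 2
    b = (A / (B * B)) / sig * (b + 2.0 * d * mu / (R * sig))
    d = d * (A / (B * B * sig)) ** 2 + 1.0
    Rp = A * A * R + B * B               # Kalman prediction/update
    K = Rp * C / (C * C * Rp + D * D)
    mu = A * mu + K * (y - C * A * mu)
    R = Rp - K * C * Rp
closed = a + b * mu + d * (R + mu * mu)  # eq. (47), scalar case

# brute force: recursions (17)/(19) on a grid (constants dropped; they
# cancel in the ratio of integrals)
h, L = 0.05, 8.0
xs = [-L + (i + 0.5) * h for i in range(int(2 * L / h))]
alpha = [phi((ys[0] - C * x) / D) * phi(x) for x in xs]
beta = [x * x * al for x, al in zip(xs, alpha)]
for y in ys[1:]:
    pa = [sum(al * phi((x - A * z) / B) for z, al in zip(xs, alpha)) * h for x in xs]
    pb = [sum(be * phi((x - A * z) / B) for z, be in zip(xs, beta)) * h for x in xs]
    alpha = [phi((y - C * x) / D) * v for x, v in zip(xs, pa)]
    beta = [phi((y - C * x) / D) * v + x * x * al
            for x, v, al in zip(xs, pb, alpha)]
grid = sum(beta) / sum(alpha)            # E{H_k | Y_k} as a ratio of integrals
# closed and grid agree closely
```

Agreement between `closed` and `grid` confirms that three scalars (a, b, d) plus the Kalman statistics really do summarize the non-Gaussian density β_k.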
The proof of the following theorem is very similar and hence omitted.
Theorem 5.3 The density γ_k^{in}(x) is completely determined by the 4 statistics ā_k^{in}, b̄_k^{in}, μ_k ∈ R^m and R_k ∈ R^{m×m} as follows:

γ_k^{in}(x) = [ ā_k^{in} + b̄_k^{in'} x ] α_k(x),   γ_0^{in}(x) = (x, e_i)(y_0, e_n) α_0(x)   (44)

where ā_k^{in} ∈ R, b̄_k^{in} ∈ R^m are given by the following recursions:

ā_{k+1}^{in} = ā_k^{in} + b̄_k^{in'} σ_{k+1}^{-1} R_k^{-1} μ_k,   ā_0^{in} = 0   (45)

b̄_{k+1}^{in} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} b̄_k^{in} + e_i (y_{k+1}, e_n),   b̄_0^{in} = e_i (y_0, e_n)   (46)
Having characterized the densities β_k^{ij(M)}(x), M = 0, 1, 2, and γ_k^{in}(x) by their finite sufficient statistics, we now derive finite-dimensional filters for H_k^{ij(M)} and J_k^{in}.
Theorem 5.4 Finite-dimensional filters for H_k^{ij(M)}, M = 0, 1, 2, and J_k^{in} are given by

E{H_k^{ij(M)} | Y_k} = a_k^{ij(M)} + b_k^{ij(M)'} μ_k + Σ_{p=1}^{m} Σ_{q=1}^{m} d_k^{ij(M)}(p, q) R_k(p, q) + μ_k' d_k^{ij(M)} μ_k   (47)

E{J_k^{in} | Y_k} = ā_k^{in} + b̄_k^{in'} μ_k   (48)

PROOF. Using the abstract Bayes rule (7), we have

E{H_k^{ij(M)} | Y_k} = (1/ᾱ_k) ∫_{R^m} β_k^{ij(M)}(x) dx   (49)

where the constant ᾱ_k = ∫_{R^m} α_k(x) dx. But since α_k(x) is an un-normalized Gaussian density with mean μ_k and covariance R_k, from (28)

∫_{R^m} β_k^{ij(M)}(x) dx = ᾱ_k E_{N(μ_k, R_k)}{ a_k^{ij(M)} + b_k^{ij(M)'} x + x' d_k^{ij(M)} x }   (50)

Substituting in (49) proves (47).
The proof of (48) is similar and hence omitted. □
6 General Filter for Higher Order Moments
Theorem 5.4 gives finite-dimensional filters for the time sum of the states, J_k^{in}, and the time sum of the squares of the states, H_k^{ij(M)}. In this section we show that finite-dimensional filters exist for the time sum of any arbitrary integral power of the states.
Assumption 6.1 For notational simplicity, in this section we assume that the state and observation processes are scalar valued, i.e. m = d = 1 in (1) and (2).
Let H_k be the time sum of the p-th power of the state:

H_k = Σ_{l=0}^{k} x_l^p   (51)

Our aim is to derive a finite-dimensional filter for H_k. Define the un-normalized density

β_k(x) = Ē{ Λ_k H_k I(x_k ∈ dx) | Y_k }   (52)

Our first step is to obtain a recursion for β_k(x). Using a proof similar to that of Theorem 4.1, we can show

β_k(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k||D_k|ψ(y_k))] ∫ β_{k−1}(z) φ(B_k^{-1}(x − A_k z)) dz + x^p α_k(x)

Our task now is to characterize β_k(x) in terms of finite sufficient statistics. Recall that for p = 0, the Kalman filter state and covariance are sufficient statistics, as shown in Theorem 5.1. Also, for p = 1 and 2, Theorems 5.3 and 5.2 give finite-dimensional sufficient statistics. We now show that β_k can be characterized in terms of finite-dimensional statistics for any p ∈ Z^+.
Theorem 6.2 At time k, the density β_k(x) in (52) is completely defined by the p + 3 statistics a_k(0), a_k(1), ..., a_k(p), R_k and μ_k as follows:

β_k(x) = [ Σ_{i=0}^{p} a_k(i) x^i ] α_k(x)   (53)

where, with C(i, j) denoting the binomial coefficient,

a_{k+1}(n) = Σ_{i=n}^{p} Σ_{j=n}^{i} a_k(i) η_{ij} C(j, n) σ_{k+1}^{-j} (R_k^{-1} μ_k)^{j−n} (A_{k+1} B_{k+1}^{-2})^n,   0 ≤ n < p

a_{k+1}(p) = 1 + a_k(p) η_{pp} σ_{k+1}^{-p} (A_{k+1} B_{k+1}^{-2})^p   (54)

and

η_{ij} = C(i, j) · 1·3···(i − j − 1) · σ_{k+1}^{-(i−j)/2} if i − j is even, i ≥ j;   η_{ij} = 0 if i − j is odd, i > j   (55)
PROOF. As in Theorem 5.2, we give an inductive proof. At k = 0, β_0(x) = x^p α_0(x), and so satisfies (53). (These new definitions of H_k in (51) and β_k(x) in (52) are only used in this section.)
Assume that (53) holds at time k. Then at time k + 1, using arguments similar to those of Theorem 5.2, the first term on the RHS of the recursion for β_{k+1} equals

α_{k+1}(x) Σ_{i=0}^{p} a_k(i) E{z^i}

where the expectation is with respect to a scalar Gaussian distribution with mean E{z} = σ_{k+1}^{-1} δ_{k+1} and variance σ_{k+1}^{-1}, with δ_{k+1} = A_{k+1} B_{k+1}^{-2} x + R_k^{-1} μ_k. Expanding z^i about its mean,

E{z^i} = Σ_{j=0}^{i} C(i, j) E{(z − E{z})^{i−j}} (E{z})^j

Now recall from (41) that E{z} is affine in x. Also, E{(z − E{z})^{i−j}} is independent of x. Indeed ([11], pg. 111),

E{(z − E{z})^{i−j}} = 1·3···(i − j − 1) σ_{k+1}^{-(i−j)/2} if i − j is even, i ≥ j, and 0 if i − j is odd   (60)

Thus

β_{k+1}(x) = α_{k+1}(x) [ Σ_{i=0}^{p} Σ_{j=0}^{i} Σ_{n=0}^{j} a_k(i) η_{ij} C(j, n) σ_{k+1}^{-j} (R_k^{-1} μ_k)^{j−n} (A_{k+1} B_{k+1}^{-2})^n x^n + x^p ]   (61)

Eq. (61) is of the form (53) with a_{k+1}(i), i = 0, ..., p, given by (54). □
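The Gaussian central-moment formula (60) underpinning the recursion can be verified directly. The sketch below compares the double-factorial expression against numerical integration of a Gaussian density (the variance value is an illustrative assumption):

```python
import math

def central_moment(q, v):
    """E{(z - E z)^q} for Gaussian z with variance v:
    (q-1)!! * v^(q/2) for even q, 0 for odd q (cf. eq. (60))."""
    if q % 2 == 1:
        return 0.0
    out = float(v) ** (q // 2)
    for r in range(q - 1, 0, -2):      # double factorial (q-1)!! = 1*3*...*(q-1)
        out *= r
    return out

# numerical check against direct integration of the N(0, v) density
v, h, L = 0.6, 0.002, 8.0
n = int(2 * L / h)
numeric = {}
for q in (2, 3, 4, 6):
    acc = 0.0
    for i in range(n):
        z = -L + (i + 0.5) * h
        acc += (z ** q) * math.exp(-z * z / (2.0 * v))
    numeric[q] = acc * h / math.sqrt(2.0 * math.pi * v)
```

For v = 0.6 the formula gives 0.6, 0, 1.08 and 3.24 for q = 2, 3, 4, 6, and the numerical integrals match.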
7 Singular State Noise
The filters derived in Theorems 5.1, 5.2 and 5.3 have one major problem: they require B_k to be invertible. In practice (e.g. see Sec. 8), B_k is often not invertible.
In this section, we use a simple transformation that expresses our filters in terms of the inverse of the predicted Kalman covariance matrix. This inverse exists, even if B_k is singular, as long as a certain uniform controllability condition holds. Both the uniform controllability condition and the transformation we use are well known in the Kalman filter literature ([14], Chapter 7).
Define the Kalman predicted state covariance as R_{k|k−1} = E{(x_k − A_k μ_{k−1})(x_k − A_k μ_{k−1})' | Y_{k−1}}. It is straightforward to show that

R_{k|k−1} = B_k^2 + A_k R_{k−1} A_k'   (62)

Our first step is to provide a sufficient condition for R_{k|k−1} to be non-singular.
Definition 7.1 ([14], Chapter 7) The state-space model (1), (2) is said to be uniformly completely controllable if there exist a positive integer N_1 and positive constants α, β such that

α I ≤ C(k, k − N_1) ≤ β I for all k ≥ N_1   (63)

Here

C(k, k − N_1) = Σ_{l=k−N_1}^{k−1} Φ(k, l + 1) B_l B_l' Φ'(k, l + 1)   (64)

Φ(k_2, k_1) = A_{k_2} A_{k_2−1} ··· A_{k_1+1} if k_2 > k_1,   Φ(k_2, k_1) = I if k_2 = k_1   (65)

Lemma 7.2 If the dynamical system (1), (2) is uniformly completely controllable and R_0 > 0, then R_k and R_{k|k−1} are positive definite matrices (and hence nonsingular) for all k ≥ N_1.
PROOF. See [14], pg. 238, Lemma 7.3. □
Our aim now is to re-express the filters in Sec. 5 in terms of R_{k|k−1}. The following lemma is used in the sequel.

Lemma 7.3 Assume R_{k|k−1}^{-1} exists. Then, with σ_k defined in (26),

σ_k^{-1} = R_{k−1} − R_{k−1} A_k' R_{k|k−1}^{-1} A_k R_{k−1}   (66)

B_k^{-2} A_k σ_k^{-1} = R_{k|k−1}^{-1} A_k R_{k−1}   (67)
PROOF. Straightforward use of the matrix inversion lemma on (26) yields

σ_k^{-1} = R_{k−1} − R_{k−1} A_k' [B_k^2 + A_k R_{k−1} A_k']^{-1} A_k R_{k−1}   (68)

Substituting (62) in (68) proves (66). To prove (67), first note that

B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} = B_{k+1}^{-2} A_{k+1} R_k − B_{k+1}^{-2} A_{k+1} R_k A_{k+1}' R_{k+1|k}^{-1} A_{k+1} R_k

because A_{k+1} R_k A_{k+1}' = R_{k+1|k} − B_{k+1}^2 from (62). So

B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} = B_{k+1}^{-2} A_{k+1} R_k − B_{k+1}^{-2} (R_{k+1|k} − B_{k+1}^2) R_{k+1|k}^{-1} A_{k+1} R_k = R_{k+1|k}^{-1} A_{k+1} R_k   (69)

□
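The identities (66) and (67) are easy to confirm numerically. The following Python sketch checks them on small 2 × 2 matrices (the values of A, B and R_{k−1} are illustrative assumptions, with B symmetric and nonsingular):

```python
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_inv(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]

def mat_add(X, Y, s=1.0):
    return [[X[i][j] + s * Y[i][j] for j in range(2)] for i in range(2)]

def mat_t(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

A = [[0.9, 0.2], [0.1, 0.7]]
B = [[1.0, 0.3], [0.3, 0.8]]          # symmetric, nonsingular
Rprev = [[0.5, 0.1], [0.1, 0.4]]      # R_{k-1}, positive definite

B2inv = mat_inv(mat_mul(B, B))                                         # B^{-2}
sigma = mat_add(mat_mul(mat_mul(mat_t(A), B2inv), A), mat_inv(Rprev))  # (26)
Rpred = mat_add(mat_mul(B, B), mat_mul(mat_mul(A, Rprev), mat_t(A)))   # (62)

# (66): sigma^{-1} = R_{k-1} - R_{k-1} A' R_{k|k-1}^{-1} A R_{k-1}
lhs66 = mat_inv(sigma)
rhs66 = mat_add(Rprev,
                mat_mul(mat_mul(mat_mul(mat_mul(Rprev, mat_t(A)),
                                        mat_inv(Rpred)), A), Rprev), s=-1.0)

# (67): B^{-2} A sigma^{-1} = R_{k|k-1}^{-1} A R_{k-1}
lhs67 = mat_mul(mat_mul(B2inv, A), mat_inv(sigma))
rhs67 = mat_mul(mat_mul(mat_inv(Rpred), A), Rprev)
```

Both pairs of matrices agree to machine precision, which is what allows B_k^{-2} to be eliminated from the filter recursions.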
Applying the above lemma to the filters derived in Sec. 5, we now express them in terms of R_{k|k−1} instead of B_k^{-2}. The following theorem summarizes the main result of this paper in terms of the new finite-dimensional filters:
Theorem 7.4 Assume that the linear dynamical system given by (1) and (2) is uniformly completely controllable, i.e. (63) holds. At time k, the density β_k^{ij(M)}(x) (defined in (11) and initialized according to (23)) is completely defined by the 5 statistics a_k^{ij(M)}, b_k^{ij(M)}, d_k^{ij(M)}, R_k and μ_k as follows:

β_k^{ij(M)}(x) = [ a_k^{ij(M)} + b_k^{ij(M)'} x + x' d_k^{ij(M)} x ] α_k(x),   M = 0, 1, 2

where a_k^{ij(M)} ∈ R, b_k^{ij(M)} ∈ R^m, d_k^{ij(M)} ∈ R^{m×m} is a symmetric matrix with elements d_k^{ij(M)}(p, q), p = 1, ..., m, q = 1, ..., m, and α_k(x) is an un-normalized Gaussian density with mean μ_k ∈ R^m and covariance R_k ∈ R^{m×m}.
Also, the density γ_k^{in}(x) (defined in (11)) is completely determined by the 4 statistics ā_k^{in}, b̄_k^{in}, μ_k and R_k as follows:

γ_k^{in}(x) = [ ā_k^{in} + b̄_k^{in'} x ] α_k(x),   γ_0^{in}(x) = (x, e_i)(y_0, e_n) α_0(x)

where ā_k^{in} ∈ R, b̄_k^{in} ∈ R^m.
Furthermore, μ_k, R_k, a_k^{ij(M)}, b_k^{ij(M)}, d_k^{ij(M)}, M = 0, 1, 2, ā_k^{in} and b̄_k^{in} are given by the following recursions (where σ_{k+1}^{-1} is given by (66)):

Kalman filter equations:

μ_k = A_k μ_{k−1} + R_{k|k−1} C_k' [C_k R_{k|k−1} C_k' + D_k D_k']^{-1} (y_k − C_k A_k μ_{k−1})

R_k = R_{k|k−1} − R_{k|k−1} C_k' [C_k R_{k|k−1} C_k' + D_k D_k']^{-1} C_k R_{k|k−1}

R_{k|k−1} = B_k^2 + A_k R_{k−1} A_k'   (70)

Finite-dimensional filters for a_k^{ij(M)}, b_k^{ij(M)}, d_k^{ij(M)}, M = 0, 1, 2, ā_k^{in} and b̄_k^{in}:

a_{k+1}^{ij(M)} = a_k^{ij(M)} + b_k^{ij(M)'} σ_{k+1}^{-1} R_k^{-1} μ_k + Σ_{p=1}^{m} Σ_{q=1}^{m} d_k^{ij(M)}(p, q) σ_{k+1}^{-1}(p, q)
  + μ_k' R_k^{-1} σ_{k+1}^{-1} d_k^{ij(M)} σ_{k+1}^{-1} R_k^{-1} μ_k,   a_0^{ij(M)} = 0   (71)

b_{k+1}^{ij(0)} = R_{k+1|k}^{-1} A_{k+1} R_k ( b_k^{ij(0)} + 2 d_k^{ij(0)} σ_{k+1}^{-1} R_k^{-1} μ_k ),   b_0^{ij(0)} = 0_{m×1}   (72)

d_{k+1}^{ij(0)} = R_{k+1|k}^{-1} A_{k+1} R_k d_k^{ij(0)} R_k A_{k+1}' R_{k+1|k}^{-1} + (e_i e_j' + e_j e_i')/2,   d_0^{ij(0)} = (e_i e_j' + e_j e_i')/2   (73)

b_{k+1}^{ij(1)} = R_{k+1|k}^{-1} A_{k+1} R_k ( b_k^{ij(1)} + 2 d_k^{ij(1)} σ_{k+1}^{-1} R_k^{-1} μ_k ) + e_i e_j' σ_{k+1}^{-1} R_k^{-1} μ_k,   b_0^{ij(1)} = 0_{m×1}   (74)

d_{k+1}^{ij(1)} = R_{k+1|k}^{-1} A_{k+1} R_k d_k^{ij(1)} R_k A_{k+1}' R_{k+1|k}^{-1}
  + (1/2)( e_i e_j' R_k A_{k+1}' R_{k+1|k}^{-1} + R_{k+1|k}^{-1} A_{k+1} R_k e_j e_i' ),   d_0^{ij(1)} = 0_{m×m}   (75)

b_{k+1}^{ij(2)} = R_{k+1|k}^{-1} A_{k+1} R_k ( b_k^{ij(2)} + (2 d_k^{ij(2)} + e_i e_j' + e_j e_i') σ_{k+1}^{-1} R_k^{-1} μ_k ),   b_0^{ij(2)} = 0_{m×1}   (76)

d_{k+1}^{ij(2)} = R_{k+1|k}^{-1} A_{k+1} R_k ( d_k^{ij(2)} + (1/2)(e_i e_j' + e_j e_i') ) R_k A_{k+1}' R_{k+1|k}^{-1},   d_0^{ij(2)} = 0_{m×m}   (77)

ā_{k+1}^{in} = ā_k^{in} + b̄_k^{in'} σ_{k+1}^{-1} R_k^{-1} μ_k,   ā_0^{in} = 0   (78)

b̄_{k+1}^{in} = R_{k+1|k}^{-1} A_{k+1} R_k b̄_k^{in} + e_i (y_{k+1}, e_n),   b̄_0^{in} = e_i (y_0, e_n)   (79)

(For M = 2, the recursion (71) holds with d_k^{ij(2)} replaced by d_k^{ij(2)} + (e_i e_j' + e_j e_i')/2, exactly as in (76) and (77).)
Finally, finite-dimensional filters for H_k^{ij(M)} and J_k^{in} (defined in (10)) are given by (47) and (48).
PROOF. First consider the Kalman filter equations (24) and (25). Using Lemma 7.3 on (25) gives

R_k = ( R_{k|k−1}^{-1} + C_k' (D_k D_k')^{-1} C_k )^{-1}   (80)

Using the matrix inversion lemma on (80), and applying (67) to the first term on the RHS of (24), yields the "standard" Kalman filter equations (70).
Similarly, applying (67) to the RHS of (30), (31) and (46) yields (72), (73) and (79). □
Singular State Noise
Subject to the uniform complete controllability condition (63), the filtering equations in Theorem 7.4 hold even if the matrices B_{k+1}
are singular. The argument proceeds in four steps:
1. Add ε N(0,1) noise to each component of x_{k+1}. This is done by replacing B_{k+1} in (1) with the non-singular matrix B_{k+1}^ε = B_{k+1} + ε I_m, where ε ∈ R. Denote the resulting state process x_k^ε.
2. Define R_{k+1|k}^ε as in (62), with B_{k+1} replaced by B_{k+1}^ε. Express the filters in terms of R_{k+1|k}^ε, as in Theorem 7.4.
3. As ε → 0, R_{k+1|k}^ε → R_{k+1|k}.
4. Then, using the bounded conditional convergence theorem (pg. 214, [15]), the conditional estimates of H_k^{ij(M)}(x^ε) and J_k^{in}(x^ε) converge to the conditional estimates of H_k^{ij(M)}(x) and J_k^{in}(x), respectively.
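Step 3 can be illustrated in the scalar case: the perturbed predicted covariance converges to the singular-noise predicted covariance as ε → 0. A minimal Python sketch (A and R are assumed values):

```python
def r_pred(B, A, R):
    """Scalar predicted covariance R_{k+1|k} = B^2 + A^2 R, eq. (62)."""
    return B * B + A * A * R

A, R = 0.9, 0.5
exact = r_pred(0.0, A, R)                       # singular state noise, B = 0
approx = [r_pred(eps, A, R) for eps in (0.1, 0.01, 0.001)]
# approx decreases toward exact = 0.405 as eps -> 0
```

Even with B = 0, the predicted covariance A^2 R remains strictly positive here, which is the scalar analogue of the controllability condition keeping R_{k+1|k} invertible.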
8 Filtered EM Algorithm for Gaussian State-Space Models
The aim of this section is to derive a filter-based EM algorithm for computing maximum likelihood parameter estimates of a linear Gaussian state-space system. The finite-dimensional filters of Section 7 are used in implementing the E-step, resulting in a filter-based EM algorithm. Consider the time-invariant version of the state-space model (1), (2):

x_{k+1} = A x_k + B w_{k+1}   (81)

y_k = C x_k + D v_k   (82)

Our aim is to compute ML estimates of the parameters θ = (A, B^2, C, DD') given the observation sequence y_0, y_1, ..., y_T. (Identifiability and consistency in special parametrized cases are discussed at the end of this section.) We do this via the EM algorithm.
EM Algorithm
Suppose we have observations {y_1, ..., y_T} available, where T is a fixed positive integer. Let {P_θ, θ ∈ Θ} be a family of probability measures on (Ω, F), all absolutely continuous with respect to a fixed probability measure P_0. The likelihood function for computing an estimate of the parameter θ based on the information available in Y_T is

L(θ) = E_0 { dP_θ / dP_0 | Y_T }

and the maximum likelihood estimate (MLE) is defined by

θ̂ ∈ argmax_{θ∈Θ} L(θ)

The EM algorithm is an iterative numerical method for computing the MLE. Let θ_0 be the initial parameter estimate. The EM algorithm generates a sequence of parameter estimates as follows. Each iteration of the EM algorithm consists of two steps:

Step 1 (E-step): Set θ̃ = θ_j, and compute Q(θ, θ̃), where

Q(θ, θ̃) = E_{θ̃} [ log (dP_θ / dP_{θ̃}) | Y_T ]

Step 2 (M-step): Find θ_{j+1} ∈ argmax_{θ∈Θ} Q(θ, θ̃)
The sequence generated {θ_j, j ≥ 0} gives non-decreasing values of L(θ_j), with equality if and only if θ_{j+1} = θ_j.
It is shown in the appendix that
Q(θ, θ̃) = −T log|B| − (T + 1) log|D| − (1/2) E{ Σ_{l=1}^{T} (x_l − A x_{l−1})' B^{-2} (x_l − A x_{l−1}) | Y_T }
 − (1/2) E{ Σ_{l=0}^{T} (y_l − C x_l)' (D D')^{-1} (y_l − C x_l) | Y_T } + E{R(θ̃) | Y_T}   (83)

where R(θ̃) does not involve θ, and the conditional expectations are evaluated under P_{θ̃}.
To implement the M-step, we set the derivatives ∂Q/∂θ = 0. This yields (using the identity ∂log|M|/∂M = M^{-1} for any non-singular matrix M):

A = H_T^{(1)} [H_T^{(2)}]^{-1}   (84)

B^2 = (1/T) E{ Σ_{l=1}^{T} (x_l − A x_{l−1})(x_l − A x_{l−1})' | Y_T } = (1/T) [ E{ Σ_{l=1}^{T} x_l x_l' | Y_T } − A H_T^{(1)'} − H_T^{(1)} A' + A H_T^{(2)} A' ]   (85)

C = J_T' [H_T^{(0)}]^{-1}   (86)

DD' = (1/(T + 1)) [ Σ_{l=0}^{T} y_l y_l' − (J_T' C' + C J_T) + C H_T^{(0)} C' ]   (87)

where, for M = 0, 1, 2, H_T^{(M)} ∈ R^{m×m} denotes the matrix with elements H̄_T^{ij(M)} = E{H_T^{ij(M)} | Y_T}, i, j ∈ {1, ..., m}. Also, J_T ∈ R^{m×d} denotes the matrix with elements J̄_T^{in} = E{J_T^{in} | Y_T}, i = 1, ..., m, n = 1, ..., d. The terms H̄_T^{ij(M)} and J̄_T^{in} are computed using (47) and (48), together with the filters in Theorem 7.4. Thus we have a filter-based EM algorithm.
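In the scalar case m = d = 1, the M-step updates (84)-(87) reduce to simple ratios of the sufficient statistics. The following Python sketch is a scalar reduction in which deterministic "statistics" built from an assumed toy sequence stand in for the smoothed conditional expectations; it illustrates the updates and the fact that the A-update minimizes the residual sum of squares:

```python
def m_step(H0, H1, H2, sum_xx, J, sum_yy, T):
    """Scalar-model M-step sketch of (84)-(87).
    H0 = E{sum_{l=0}^T x_l^2 | Y_T},   H1 = E{sum_{l=1}^T x_{l-1} x_l | Y_T},
    H2 = E{sum_{l=1}^T x_{l-1}^2 | Y_T}, sum_xx = E{sum_{l=1}^T x_l^2 | Y_T},
    J = E{sum_{l=0}^T x_l y_l | Y_T},  sum_yy = sum_{l=0}^T y_l^2."""
    A = H1 / H2                                         # (84)
    B2 = (sum_xx - 2.0 * A * H1 + A * A * H2) / T       # (85)
    C = J / H0                                          # (86)
    DD = (sum_yy - 2.0 * C * J + C * C * H0) / (T + 1)  # (87)
    return A, B2, C, DD

# deterministic sanity check on an assumed toy sequence
xs = [1.0, 0.8, 0.9, 0.5]
ys = [1.1, 0.7, 1.0, 0.4]
T = len(xs) - 1
H0 = sum(x * x for x in xs)
H1 = sum(xs[l - 1] * xs[l] for l in range(1, T + 1))
H2 = sum(xs[l - 1] ** 2 for l in range(1, T + 1))
sum_xx = sum(xs[l] ** 2 for l in range(1, T + 1))
J = sum(x * y for x, y in zip(xs, ys))
sum_yy = sum(y * y for y in ys)
A, B2, C, DD = m_step(H0, H1, H2, sum_xx, J, sum_yy, T)
```

With exact statistics, B2 equals the mean residual (1/T) Σ (x_l − A x_{l−1})^2, and no perturbation of A can make that residual smaller.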
Example: Errors-in-Variables Time Series
We use the filtered EM algorithm to estimate the parameters of the errors-in-variables time series example considered in [2] and [3].
Consider the scalar-valued AR(p) process s_k, k ∈ Z^+, defined as

s_k = Σ_{i=1}^{p} a_i s_{k−i} + w_k   (88)

where w_k is a white Gaussian process. Assume that we observe s_k indirectly via the scalar process

y_k = s_k + v_k,   v_k ~ N(0, σ_v^2)   (89)

where v_k is a white Gaussian process independent of w_k. The aim is to compute the ML estimate of the parameter vector θ = (a_1, ..., a_p, σ_w^2, σ_v^2).
We first re-express (88) and (89) in state-space form (81), (82), with d = 1, m = p + 1 and a = [a_1, ..., a_p].
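The companion-form embedding used here can be sketched as follows. The layout (state x_k = (s_k, ..., s_{k−p})', noise entering only the first component) mirrors the construction in the Matlab listing of Appendix B; the numeric values are illustrative assumptions:

```python
def ar_state_space(a, s, sv):
    """Companion-form embedding of a scalar AR(p) with observation noise.
    Assumed layout: state x_k = (s_k, ..., s_{k-p})', so m = p + 1, d = 1."""
    p = len(a)
    m = p + 1
    A = [[0.0] * m for _ in range(m)]
    A[0][:p] = list(a)                 # first row: AR coefficients
    for i in range(1, m):
        A[i][i - 1] = 1.0              # shift register for past values
    B = [[0.0] * m for _ in range(m)]
    B[0][0] = s                        # process noise enters first component only
    C = [[1.0] + [0.0] * p]            # y_k = s_k + observation noise
    D = [[sv]]
    return A, B, C, D

A, B, C, D = ar_state_space([0.5, -0.3], s=0.1, sv=0.05)
# with noise w in the first slot, x' = A x + B w reproduces
# s_k = a1 s_{k-1} + a2 s_{k-2} + w_k
x = [1.0, 2.0, 3.0]
w = 0.7
xn = [sum(A[i][j] * x[j] for j in range(3)) + B[i][0] * w for i in range(3)]
```

Here xn[0] equals 0.5·1.0 − 0.3·2.0 + 0.1·0.7 and the remaining components are the shifted past values, confirming the embedding. Note that B is singular, which is exactly why the singular-state-noise form of the filters (Sec. 7) is needed in practice.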
Using a similar procedure as above, it can be shown [2] that

Q(θ, θ̃) = −((T + 1)/2) log σ_v^2 − (1/(2 σ_v^2)) Σ_{l=0}^{T} E{ (y_l − s_l)^2 | Y_T } + (analogous terms in a and σ_w^2) + E{R(θ̃) | Y_T}   (90)

where R(θ̃) does not involve θ. So the M-step again yields closed-form parameter updates.
As shown in [2], Theorem 1 in [12] implies that the sequence of model estimates {θ_j}, j ∈ Z^+, from the EM algorithm is such that the sequence of likelihoods {L(θ_j)}, j ∈ Z^+, is monotonically increasing. It is straightforward to compactify the Euclidean parameter space Θ. Since L(θ) is continuous on this compact space, it is bounded from above; therefore the sequence {L(θ_j)} converges to some L*. Since the function Q(θ, θ_j) is continuous in both θ and θ_j, by Theorem 2 in [13], L* is a stationary value of L. To make sure that L* is a maximum value of the likelihood, it is necessary to try different initial values θ_0. Weak consistency of the maximum likelihood estimate is proved in [2] for this particular model (also see Chapter 4 in [16]).
Similar parametrized models are used in [1] and [6], and can be estimated via our filter-based EM algorithm.
9 Parallel Implementation of Filters
In this section we discuss the computational complexity of our filters and the resulting filter-based EM algorithm. In particular, we describe why our filter-based algorithm is suitable for parallel implementation, and propose a systolic processor implementation of the algorithm.
Sequential Complexity
We evaluate the computational cost and memory requirements of our filter-based algorithm and compare them with those of the standard smoother-based EM algorithm.
Computational Cost: The filter-based E-step requires computation, at each time k, of E{H_k^{ij(M)} | Y_k}, M = 0, 1, 2, and E{J_k^{in} | Y_k} for all pairs (i, j):
• a_{k+1}^{ij(M)}: Consider the RHS of the update equation (71). The computational costs for each (i, j) pair at each time instant k are:
2nd term: O(m) multiplications (inner product of two m-vectors);
3rd term: O(m^2) multiplications (matrix-vector multiplication);
4th term: O(m^2) multiplications (matrix-vector multiplication).
Since there are m^2 (i, j) pairs, the total complexity at each time instant is O(m^4).
• Similarly, the total complexity for evaluating b_{k+1}^{ij(M)} for all m^2 (i, j) pairs is O(m^4) multiplications.
• Evaluating d_{k+1}^{ij(M)} for each (i, j) pair requires m × m matrix-matrix multiplications. This involves O(m^3) complexity, so the total complexity for all m^2 (i, j) pairs is O(m^5) at each time instant.
In comparison, the Kalman smoother-based E-step in [2], [3] requires O(m^3) complexity at each time instant to compute E{H_T^{ij(M)} | Y_T}, M = 0, 1, 2, and E{J_T^{in} | Y_T} for all pairs (i, j).
Memory Requirements: In our filter-based EM algorithm, only the filtered variables at each time instant need to be stored in order to compute the variables at the next time instant; they can then be discarded. Hence the memory required is independent of the number of observations T.
In comparison, the Kalman smoother-based EM scheme in [2], [3] requires O(m^2 T) memory, since all the Kalman filter covariance matrices R_k, k ≤ T, need to be stored before the smoothed covariance matrices can be computed; see Eq. (2.12) in [2]. This also involves significant memory read-write overhead costs.
Parallel Implementation on Systolic Array Architecture
The following properties of our algorithm make it suitable for vector-processor or systolic-processor implementation:
1. The computation of a_k^{ij}, b_k^{ij} and d_k^{ij} for any pair (i, j) is independent of that for any other pair (i', j'), for all time k = 0, 1, .... So all the m^2 (i, j) components of these variables can be computed in parallel on m^2 processors.
Similarly, the computations of all (i, n) components of ā_k^{in} and b̄_k^{in} are mutually independent and can be done in parallel.
2. The recursions for a_k^{ij}, b_k^{ij}, d_k^{ij}, ā_k^{in} and b̄_k^{in} do not explicitly involve the observations. They only involve the Kalman filter variables μ_{k−1}, σ_k and R_{k|k−1}. Moreover, d_k^{ij} only involves R_k (and so R_{k|k−1}), which itself is independent of the observations and can be computed off-line for a given parameter set θ.
All the processor blocks used above are required to do a synchronous matrix-vector multiplication at every time instant k. An N × N matrix can be multiplied by an N-vector in N time units on an N-processor systolic array; see [17], pp. 216-220 for details. (It can also be done in unit time on N^2 processors.)
If τ is the time required for this matrix-vector multiplication, then for a T-point data sequence our filtered algorithm requires a total of τT time units per EM iteration. In comparison, a parallel implementation of the forward-backward smoother-based EM algorithm requires 2τT time units per EM iteration, because we need a minimum of τT time units to compute the forward variables and another τT units for the backward variables. For large T and a large number of EM iterations, this saving in time is quite considerable.
In addition, unlike our filtering method, which has negligible memory requirements, the forward-backward algorithm incurs significant memory read-write overhead costs, requiring m^2 T memory locations to be accessed for the stored forward variables while computing the backward variables.
Finally, we may add that our algorithm can easily be implemented in Single Instruction Multiple Data (SIMD) mode on a supercomputer in vectorization mode, or on the Connection Machine using FORTRAN 8X. Typically this requires a total of 10000 processor units on a Connection Machine, which typically has 2^16 = 65536 processors.
10 Conclusions and Future work
We have presented a new class of finite-dimensional filters for linear Gauss-Markov models that includes the Kalman filter as a special case. These filters were then used to derive a filter-based Expectation Maximization algorithm for computing maximum likelihood estimates of the parameters. It is possible to derive the new filters in continuous time using similar techniques; this is the subject of a companion paper [18]. Also, in practical applications it is worthwhile developing risk-sensitive versions of the filter, i.e. where the estimates are computed to minimize an exponential cost function rather than a quadratic cost function.
References
[1] R.H. Shumway and D.S. Stoffer, "An Approach to Time Series Smoothing and Forecasting using the EM Algorithm", J. Time Series Analysis, Vol. 3, No. 4, pp. 253-264, 1982.
[2] D. Ghosh, "Maximum Likelihood Estimation of the Dynamic Shock-Error Model", Journal of Econometrics, Vol. 41, No. 1, pp. 121-143, May 1989.
[3] E. Weinstein, A.V. Oppenheim, M. Feder and J.R. Buck, "Iterative and Sequential Algorithms for Multisensor Signal Enhancement", IEEE Trans. Signal Processing, Vol. 42, No. 4, pp. 846-859, April 1994.
[4] V. Krishnamurthy, "On-line Estimation of Dynamic Shock-Error Models", IEEE Trans. Automatic Control, Vol. 35, No. 5, pp. 1129-1134, 1994.
[5] N.S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice Hall, 1984.
[6] I. Ziskind and D. Hertz, "Maximum Likelihood Localization of Narrow-Band Autoregressive Sources via the EM Algorithm", IEEE Trans. Signal Processing, Vol. 41, No. 8, pp. 2719-2723, August 1993.
[7] L. Breiman, Probability, Classics in Applied Mathematics, Vol. 7, SIAM, Philadelphia, 1992.
[8] R.J. Elliott, "Exact Adaptive Filters for Markov Chains Observed in Gaussian Noise", Automatica, Vol. 30, No. 9, pp. 1399-1408, September 1994.
[9] L. Aggoun, R.J. Elliott and J.B. Moore, "A Measure Change Derivation of Continuous State Baum-Welch Estimators", 1995.
[10] R.J. Elliott, L. Aggoun and J.B. Moore, Hidden Markov Models: Estimation and Control, Springer-Verlag, 1995.
[11] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, 1984.
[12] A.P. Dempster, N.M. Laird and D.B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm", Journal of the Royal Statistical Society, Series B, 39, pp. 1-38, 1977.
[13] C.F.J. Wu, "On the Convergence Properties of the EM Algorithm", The Annals of Statistics, 11, pp. 95-103, 1983.
[14] A.H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, 1970.
[15] P. Billingsley, Probability and Measure, Second Edition, John Wiley, New York, 1986.
[16] E.J. Hannan and M. Deistler, The Statistical Theory of Linear Systems, Wiley, 1988.
[17] E.V. Krishnamurthy, Parallel Processing: Principles and Practice, Addison-Wesley, 1989.
[18] R.J. Elliott and V. Krishnamurthy, "New Finite Dimensional Filters for Estimation of Continuous-time Linear Gaussian Systems", submitted to SIAM Journal on Control and Optimization, 1995.
A Derivation of Q(θ, θ̃) in (83)
Consider the time-invariant state-space model given by (81), (82), with θ̃ = (Ã, B̃, C̃, D̃) denoting the current set of parameters.
We have shown in Sec. 3 how, starting from a measure P̄ under which the x_l and y_l are independent and normal, we can construct the measure P = P(θ̃) such that, under P(θ̃), the x and y sequences satisfy the dynamics (1) and (2). In fact,

dP(θ̃)/dP̄ |_{G_k} = Λ_k(θ̃)

Suppose θ = (A, B, C, D) is a second set of parameters. To change from one set of parameters θ̃ to θ, we must introduce the densities

Λ̄_k(θ̃, θ) = ∏_{l=0}^{k} λ̄_l

where

λ̄_0 = [ |D̃| ψ(D^{-1}(y_0 − C x_0)) ] / [ |D| ψ(D̃^{-1}(y_0 − C̃ x_0)) ]

λ̄_l = [ |B̃||D̃| φ(B^{-1}(x_l − A x_{l−1})) ψ(D^{-1}(y_l − C x_l)) ] / [ |B||D| φ(B̃^{-1}(x_l − Ã x_{l−1})) ψ(D̃^{-1}(y_l − C̃ x_l)) ],   l ≥ 1

The parameters of our model will be changed from θ̃ to θ if we set

dP(θ)/dP(θ̃) |_{G_k} = Λ̄_k(θ̃, θ)

In this case,

log dP(θ)/dP(θ̃) |_{G_k} = −k log|B| − (k + 1) log|D| − (1/2) Σ_{l=1}^{k} (x_l − A x_{l−1})' B^{-2} (x_l − A x_{l−1})
 − (1/2) Σ_{l=0}^{k} (y_l − C x_l)' (D D')^{-1} (y_l − C x_l) + R(θ̃)

where R(θ̃) does not involve any of the parameters θ.
Then evaluating Q(θ, θ̃) = E_{θ̃}{ log dP(θ)/dP(θ̃) | Y_T } for a fixed positive integer T yields (83).
Appendix B
%%%%%%%%%%%%%%%%
%% Matlab source code implementation of Filter-based EM algorithm for
%% multi-sensor signal enhancement
%%%%%%%%%%%%%%%%
%% Important variables:
%% most variables are named the same as in the patent document.
%% Exceptions are:
%% est : parameter estimates
%%
%%%%%%%%%%%%%%%%
%% External Functions used (source code listed below main program):
%% R.m, R_set.m, a.m, a_set.m, b.m, b_set.m, d.m, d_set.m, e.m, mu.m,
%% mu_set.m, x.m, x_set.m, y.m, y_set.m
%%%%%%%%%%%%%%%%%%
% data length
global T
T = 1000;
% passes
global PASSES
PASSES = 10;
% AR coefficients
arc = [.8];
% order of AR = p
p = length(arc);
global m
m = p+1;
% process noise std
s = .1;
% measurement noise std
sv = .05;
A = [arc, 0; eye(p), zeros(p,1)];
B = [s, zeros(1,p); zeros(p,1), zeros(p,p)];
BB = B*B';
C = [1, zeros(1,p); ones(1,p), 0];
D = sv*eye(2,2);
DD = D*D';
% initial x
x0 = [s*randn; zeros(p,1)];
% define x
global x_
x_ = zeros(m,T);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% read data y
load y;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% define Kalman Filter variables
global mu_ R_
mu_ = zeros(m,T);
R_ = zeros(m,T*m);   % R_{k|k}
% a
global a_
a_ = zeros(1,T*m*m);
% b
global b_
b_ = zeros(m,T*m*m);
% d
global d_
d_ = zeros(m,T*m*m*m);
% initial parameter estimate: perturb the true AR coefficients
A = [arc - (0.5 - rand(1,p))*.5, 0; eye(p), zeros(p,1)];
est(:,1) = A(1,1:p)';
for pass = 1:PASSES
  pass
  % Initialise Kalman Filter
  mu_set(1, y(1));
  R_set(1, ones(m) + eye(m)*1);
  % Initialise New Filter
  for i=1:m
    for j=1:m
      a_set(1,i,j,0);
    end
  end
  for i=1:m
    for j=1:m
      b_set(1,i,j,zeros(m,1));
    end
  end
  for i=1:m
    for j=1:m
      d_set(1,i,j,(e(i)*e(j)'+e(j)*e(i)')/2);
    end
  end
  % Kalman Filter & New finite dimensional Filters
  for k=2:T
    % Eq.70
    Rk = BB + A*R(k-1)*A';
    % Eq.66
    InvSig = R(k-1) - R(k-1)*A'*inv(Rk)*A*R(k-1);
    % Eq.70
    mu_set(k, A*mu(k-1) + Rk*C'*inv(C*Rk*C'+DD)*(y(k)-C*A*mu(k-1)));
    R_set(k, Rk - Rk*C'*inv(C*Rk*C'+DD)*C*Rk);
    InvR = inv(R(k-1));
    % Eq.71-73
    for i=1:m
      for j=1:m
        a_set(k,i,j, a(k-1,i,j) + b(k-1,i,j)'*InvSig*InvR*mu(k-1) ...
          + sum(sum(d(k-1,i,j).*InvSig)) ...
          + mu(k-1)'*InvR*InvSig*d(k-1,i,j)*InvSig*InvR*mu(k-1));
        b_set(k,i,j, inv(Rk)*A*R(k-1)*(b(k-1,i,j) + 2*d(k-1,i,j)*InvSig*InvR*mu(k-1)));
        d_set(k,i,j, inv(Rk)*A*R(k-1)*d(k-1,i,j)*R(k-1)*A'*inv(Rk) ...
          + 1/2*(e(i)*e(j)'+e(j)*e(i)'));
      end
    end
  end
  H = zeros(m,m);
  % Eq.47
  for i=1:m
    for j=1:m
      H(i,j) = a(T,i,j) + b(T,i,j)'*mu(T) + sum(sum(d(T,i,j).*R(T))) ...
        + mu(T)'*d(T,i,j)*mu(T);
    end
  end
  for i=1:m-1
    for j=i+1:m
      H(j,i) = H(i,j);
    end
  end
  E = zeros(p,p);
  F = zeros(p,1);
  for i=1:p
    for j=1:p
      E(i,j) = H(i+1,j+1);
    end
  end
  for i=1:p
    F(i,1) = H(1,i+1);
  end
  G = inv(E)*F;
  % re-estimated AR coefficients (M-step)
  A = [G', 0; eye(p), zeros(p,1)];
  %%%%%%%%%%%%%%%%
  %% est denotes the parameter estimates
  %%%%%%%%%%%%%%%
  est(:,pass+1) = G;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Matlab source code listing of functions used in main program
% for Filter-based EM algorithm
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% R.m
%%%%%%%%%%%%%
function [y] = R(t)
global R_ m;
y = R_(:,(t-1)*m+1:t*m);
%%%%%%%%%%%%%
% R_set.m
%%%%%%%%%%%%
function [] = R_set(t,z)
global R_ m;
R_(:,(t-1)*m+1:t*m) = z;
%%%%%%%%%%%%
% a.m
%%%%%%%%%%%%
function [y] = a(t,i,j)
global a_ m T;
y = a_(1,(t-1)*m*m + (i-1)*m + j);
%%%%%%%%%%%%
% a_set.m
%%%%%%%%%%%%%
function [] = a_set(t,i,j,z)
global a_ T m;
a_(1,(t-1)*m*m + (i-1)*m + j) = z;
%%%%%%%%%%%%%
% b.m
%%%%%%%%%%%%
function [y] = b(t,i,j)
global b_ m T;
y = b_(:,(t-1)*m*m + (i-1)*m + j);
%%%%%%%%%%%
% b_set.m
%%%%%%%%%%%
function [] = b_set(t,i,j,z)
global b_ T m;
b_(:,(t-1)*m*m + (i-1)*m + j) = z;
%%%%%%%%%%
% d.m
%%%%%%%%%%
function [y] = d(t,i,j)
global d_ m T;
y = d_(:,(t-1)*m*m*m + (i-1)*m*m + (j-1)*m+1 : (t-1)*m*m*m + (i-1)*m*m + j*m);
%%%%%%%%%%
% d_set.m
%%%%%%%%%%
function [] = d_set(t,i,j,z)
global d_ T m;
d_(:,(t-1)*m*m*m + (i-1)*m*m + (j-1)*m+1 : (t-1)*m*m*m + (i-1)*m*m + j*m) = z;
%%%%%%%%%%
% e.m
%%%%%%%%%%
function [y] = e(i)
global m;
y = zeros(m,1);
y(i) = 1;
%%%%%%%%%%%
% mu.m
%%%%%%%%%%%
function [y] = mu(t)
global mu_;
y = mu_(:,t);
%%%%%%%%%%%
% mu_set.m
%%%%%%%%%%%
function [] = mu_set(t,z)
global mu_;
mu_(1:length(z),t) = z;
%%%%%%%%%%
% x.m
%%%%%%%%%%
function [y] = x(t)
global x_;
y = x_(:,t);
%%%%%%%%%
% x_set.m
%%%%%%%%%
function [] = x_set(t,z)
global x_;
x_(:,t) = z;
%%%%%%%%%%
% y.m
%%%%%%%%%%
function [z] = y(t)
global y_;
z = y_(:,t);
%%%%%%%%%%%
% y_set.m
%%%%%%%%%%%
function [] = y_set(t,z)
global y_;
y_(:,t) = z;
APPENDIX C
The following model for the spot price of oil, S, was proposed by Eduardo Schwartz in his Presidential Address to the American Finance Association in January 1997:

dS_t = (μ − δ_t) S_t dt + σ₁ S_t dz₁(t)   (1)
Here, z₁ is a standard Brownian motion, and δ_t represents the "convenience yield" (which models the value of holding amounts of the commodity). In fact δ_t follows similar stochastic dynamics of the form
dδ_t = κ(α − δ_t) dt + σ₂ dz₂(t)   (2)
Here, z₂ is a second standard Brownian motion with ⟨z₁(t), z₂(t)⟩ = ρt.
It is convenient to consider the logarithm of the spot price, x_t = ln S_t. Then, by Itô's rule, x satisfies

dx_t = (μ − δ_t − σ₁²/2) dt + σ₁ dz₁(t)   (3)
If r is the risk-free interest rate (taken to be constant here) and λ is the market price of convenience yield risk (also assumed constant), S and δ follow similar processes under an equivalent martingale measure.
However, it is equations (2) and (3) which give discrete dynamics for the state vector (x_t, δ_t)' as:

(x_t, δ_t)' = c_t + Q (x_{t−1}, δ_{t−1})' + η_t   (4)

Here Q is the 2×2 matrix

Q = [ 1    −Δt
      0    1 − κΔt ]
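As an illustration (not part of the patent's listing, and with hypothetical parameter values), the discretised dynamics (4) can be stepped forward directly. In the sketch below, `mu`, `kappa`, `alpha`, `s1`, `s2` and `rho` stand for the model constants μ, κ, α, σ₁, σ₂ and ρ, and each step applies the Euler discretisation of (1) and (2) to the log price x_t = ln S_t:

```python
import math
import random

def simulate(x0, delta0, mu, kappa, alpha, s1, s2, rho, dt, n, rng):
    """Euler discretisation of the Schwartz two-factor dynamics:
    x_t     = x_{t-1} + (mu - delta_{t-1} - s1^2/2)*dt + s1*sqrt(dt)*w1
    delta_t = delta_{t-1} + kappa*(alpha - delta_{t-1})*dt + s2*sqrt(dt)*w2
    with corr(w1, w2) = rho, matching Q = [[1, -dt], [0, 1 - kappa*dt]]."""
    x, d = x0, delta0
    path = [(x, d)]
    for _ in range(n):
        w1 = rng.gauss(0.0, 1.0)
        w2 = rho * w1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        x, d = (x + (mu - d - 0.5 * s1 * s1) * dt + s1 * math.sqrt(dt) * w1,
                d + kappa * (alpha - d) * dt + s2 * math.sqrt(dt) * w2)
        path.append((x, d))
    return path
```

With the volatilities set to zero the recursion is deterministic, and the convenience yield decays geometrically towards α at rate (1 − κΔt) per step, as the Q matrix above indicates.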
The futures price for oil delivery at time T > 0 is given by:

F(S, δ, T) = S exp( −δ (1 − e^{−κT}) / κ + A(T) )

where A(T) is a known deterministic function of the model parameters. Here S is the spot price today (T = 0) and δ is the value of the convenience yield today (T = 0).
Consequently:

ln F(S, δ, T) = ln S − δ (1 − e^{−κT}) / κ + A(T)   (5)
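Equation (5) is straightforward to evaluate. The sketch below treats A(T), which this appendix does not reproduce, as a precomputed number `A_of_T` supplied by the caller:

```python
import math

def log_futures(lnS, delta, T, kappa, A_of_T):
    """Equation (5): ln F(S, delta, T) = ln S - delta*(1 - e^{-kappa*T})/kappa + A(T)."""
    return lnS - delta * (1.0 - math.exp(-kappa * T)) / kappa + A_of_T
```

At T = 0 the convenience-yield term vanishes, so ln F = ln S + A(0); for long maturities the loading on δ tends to 1/κ.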
It is these futures prices, for various maturities T, which are quoted in the market. That is, for different dates T₁, T₂, ..., T_N we have observations

y_t^i = ln F(S_t, δ_t, t + T_i),   i = 1, ..., N
These observations give the right-hand side of (5), plus some "noise" term ε_t ∈ ℝ^N, where ε_t = (ε_t¹, ε_t², ..., ε_t^N)', t = 0, 1, 2, ..., is a sequence of independent Gaussian random variables with

E[ε_t] = 0 ∈ ℝ^N   and   var ε_t = E[ε_t ε_t'] = H ∈ ℝ^{N×N}
The observation equation (5), plus the ε_t noise on the right side, therefore has the form:

y_t = d_t + Z_t (x_t, δ_t)' + ε_t   for t = 1, 2, ..., T   (6)

where the components of y_t are the (log) futures prices at time t, for delivery at times t + T₁, t + T₂, ..., t + T_N.
The model in summary has dynamics (4) for the "signal" (x_t, δ_t)':

(x_t, δ_t)' = c + Q (x_{t−1}, δ_{t−1})' + η_t   (7)

and dynamics (6) for the observations:

y_t = d + Z (x_t, δ_t)' + ε_t   (8)
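Matching (5) to the linear observation equation (8), row i of Z has entries [1, −(1 − e^{−κT_i})/κ] and the i-th entry of d is A(T_i). A minimal sketch (the A(T_i) values are assumed precomputed and passed in):

```python
import math

def obs_matrices(maturities, kappa, A_values):
    """Build the d vector and Z matrix of equation (8) from equation (5)."""
    Z = [[1.0, -(1.0 - math.exp(-kappa * T)) / kappa] for T in maturities]
    d = list(A_values)  # d_i = A(T_i), supplied precomputed
    return d, Z
```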
In spite of Schwartz's notation, c, Q, d and Z do not depend on t. They do depend on Δt, the time increment of fixed size.
Equations (7) and (8) are of the form to which the Kalman filter can be applied. This considers linear dynamics for the signal

x_{t+1} = Ā + A x_t + B ω_{t+1},   Ā ∈ ℝⁿ,   t = 0, 1, ...   (9)

and observations

y_t = C̄ + C x_t + D ν_t,   t = 0, 1, ...   (10)
One observes y_t, t = 0, 1, ..., T, ... and wishes to make the best estimate of x_t. This is the quantity

x̂_t = E[x_t | y₀, ..., y_t]

In fact x_t is also a Gaussian random variable given y₀, ..., y_t, with conditional mean μ_t and conditional covariance R_t.
In fact, the formulas are better written in terms of the one-step predictions:

μ_{k+1|k} = Ā + A μ_k
R_{k+1|k} = E[(x_{k+1} − μ_{k+1|k})(x_{k+1} − μ_{k+1|k})' | y₀, ..., y_k]

Then

R_{k+1|k} = BB' + A R_k A'

The Kalman filter then gives recursive up-dates:

μ_{k+1} = Ā + A μ_k + R_{k+1|k} C' (C R_{k+1|k} C' + DD')⁻¹ (y_{k+1} − C̄ − C Ā − C A μ_k)   (11)

R_{k+1} = R_{k+1|k} − R_{k+1|k} C' (C R_{k+1|k} C' + DD')⁻¹ C R_{k+1|k}   (12)
As stated, μ_k = E[x_k | y₀, ..., y_k] is the conditional mean, or best estimate, of x_k given y₀, ..., y_k. Similarly, R_k = E[(x_k − μ_k)(x_k − μ_k)' | y₀, ..., y_k].
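In the scalar case (one-dimensional state and observation) the update (11)-(12) reduces to a few lines. The following is a minimal sketch, not the multidimensional implementation of the main program:

```python
def kalman_step(mu, R, y, A, Abar, B, C, Cbar, D):
    """One scalar Kalman update, equations (11) and (12)."""
    Rp = B * B + A * A * R            # R_{k+1|k} = BB' + A R_k A'
    S = C * C * Rp + D * D            # innovation variance C R_{k+1|k} C' + DD'
    K = Rp * C / S                    # Kalman gain
    mu_new = Abar + A * mu + K * (y - Cbar - C * Abar - C * A * mu)
    R_new = Rp - Rp * C * C * Rp / S  # equation (12)
    return mu_new, R_new
```

With C = 1 and D = 0 the update reproduces the observation exactly and the conditional variance collapses to zero, as expected for noiseless measurements.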
However, to implement the Kalman filter, knowledge of the parameters A, Ā, B, C, C̄, D is required.
The filter-based EM process, using the finite-dimensional filter, can advantageously be used to estimate these parameters, as set out below.
Consider the following recursions for 1 ≤ i, j ≤ m. The quantities a_k^{ij} ∈ ℝ, b_k^{ij} ∈ ℝᵐ and d_k^{ij} ∈ ℝ^{m×m} are initialised by a₁^{ij} = 0, b₁^{ij} = 0 and d₁^{ij} = ½(e_i e_j' + e_j e_i'), where e_i denotes the i-th unit vector; analogous recursions hold for the related quantities a_k^{(M)}, b_k^{(M)}, d_k^{(M)}, M = 0, 1, 2, and u_k, v_k.

a_k^{ij} = a_{k−1}^{ij} + b_{k−1}^{ij'} Σ_{k−1} R_{k−1}⁻¹ μ_{k−1} + tr(d_{k−1}^{ij} Σ_{k−1}) + μ_{k−1}' R_{k−1}⁻¹ Σ_{k−1} d_{k−1}^{ij} Σ_{k−1} R_{k−1}⁻¹ μ_{k−1}

b_k^{ij} = R_{k|k−1}⁻¹ A R_{k−1} (b_{k−1}^{ij} + 2 d_{k−1}^{ij} Σ_{k−1} R_{k−1}⁻¹ μ_{k−1})

d_k^{ij} = R_{k|k−1}⁻¹ A R_{k−1} d_{k−1}^{ij} R_{k−1} A' R_{k|k−1}⁻¹ + ½(e_i e_j' + e_j e_i')

where Σ_{k−1} = R_{k−1} − R_{k−1} A' R_{k|k−1}⁻¹ A R_{k−1}.
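A scalar (m = 1, i = j = 1) sketch of these recursions may help. Below, `Rp` is the one-step predicted variance R_{k|k−1} = B² + A²R_{k−1} and `Sig` is Σ_{k−1}; the quantity H_k = a_k + b_k μ_k + d_k(R_k + μ_k²) then recovers E[Σ_{l≤k} x_l² | y] (a sketch only; the patent works with the full matrix case):

```python
def abd_step(a, b, d, mu, R, A, B):
    """Advance the finite-dimensional filter quantities a_k, b_k, d_k
    given the filtered mean mu and variance R at time k-1 (scalar state)."""
    Rp = B * B + A * A * R                 # R_{k|k-1}
    Sig = R - R * A * (1.0 / Rp) * A * R   # Sigma_{k-1}
    g = Sig / R                            # Sigma_{k-1} * R_{k-1}^{-1}
    a_new = a + b * g * mu + d * Sig + (g * mu) * d * (g * mu)
    b_new = (A * R / Rp) * (b + 2.0 * d * g * mu)
    d_new = (A * R / Rp) ** 2 * d + 1.0    # + (e_i e_j' + e_j e_i')/2 = 1 for m = 1
    return a_new, b_new, d_new

def H_of(a, b, d, mu, R):
    """Scalar form of the H entries: a + b*mu + tr(d*R) + mu'*d*mu."""
    return a + b * mu + d * (R + mu * mu)
```

One step of this filter, started from the initialisation a₁ = 0, b₁ = 0, d₁ = 1, agrees with the sum of smoothed second moments computed by a fixed-interval smoother, which is the defining property of H_k.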
The expectation values required by the E-step are then given directly by the filtered quantities at time T. For example, the (i, j) entry of H_T = E[Σ_k x_k x_k' | y₀, ..., y_T] is

H_T(i, j) = a_T^{ij} + b_T^{ij'} μ_T + tr(d_T^{ij} R_T) + μ_T' d_T^{ij} μ_T

These equations represent the recursive finite-dimensional filters for estimating the matrices and vectors H_k^{(M)}, M = 0, 1, 2, J_k, L_k and L̄_k given the observations. The revised estimates for the parameters A, B, C, D, Ā, C̄ are then obtained from these expectations (once given y₀, ..., y_T).
Given observations y₀, ..., y_T the parameters are initialised and the above filters are used to re-estimate the parameters one at a time. With the same data y₀, ..., y_T this process is iterated until some convergence is observed.
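For comparison only: the classical E-step computes the same expectations with a forward Kalman pass followed by a backward (Rauch-Tung-Striebel) smoother, which is precisely the smoothing the filter-based process avoids. A scalar sketch of one smoother-based EM re-estimate of the transition parameter (here called `a`, with `q` and `r` the state and observation noise variances; all names are chosen for this sketch):

```python
import random

def kalman_forward(ys, a, q, c, r, mu0, P0):
    """Forward Kalman pass: filtered means/variances and predicted variances."""
    mus, Ps, Pps = [], [], []
    mu, P = mu0, P0
    for y in ys:
        mup, Pp = a * mu, a * a * P + q      # predict
        S = c * c * Pp + r
        K = Pp * c / S
        mu = mup + K * (y - c * mup)         # update
        P = Pp - K * c * Pp
        mus.append(mu); Ps.append(P); Pps.append(Pp)
    return mus, Ps, Pps

def rts_smooth(mus, Ps, Pps, a):
    """Rauch-Tung-Striebel smoother; also returns lag-one covariances."""
    n = len(mus)
    ms, Vs = mus[:], Ps[:]
    Js = [Ps[k] * a / Pps[k + 1] for k in range(n - 1)]
    for k in range(n - 2, -1, -1):
        ms[k] = mus[k] + Js[k] * (ms[k + 1] - a * mus[k])
        Vs[k] = Ps[k] + Js[k] ** 2 * (Vs[k + 1] - Pps[k + 1])
    V1 = [Js[k - 1] * Vs[k] for k in range(1, n)]   # Cov(x_k, x_{k-1} | all data)
    return ms, Vs, V1

def em_update_a(ys, a, q, c, r, mu0, P0):
    """One EM re-estimate of the state-transition parameter a."""
    mus, Ps, Pps = kalman_forward(ys, a, q, c, r, mu0, P0)
    ms, Vs, V1 = rts_smooth(mus, Ps, Pps, a)
    S10 = sum(V1[k - 1] + ms[k] * ms[k - 1] for k in range(1, len(ys)))
    S00 = sum(Vs[k] + ms[k] ** 2 for k in range(len(ys) - 1))
    return S10 / S00
```

Iterating `em_update_a` on the same data drives `a` towards its maximum likelihood value, mirroring the pass loop of the main program.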