SIGNAL PROCESSING METHOD USING A FINITE-DIMENSIONAL FILTER
The present invention relates to a signal processing method and a finite-dimensional filter for use in deriving parameters of a linear system.
A number of signal processing procedures involve attempts to determine a system model which can be used to define and predict a received signal of interest. A received signal or observed data may include additional components, such as Gaussian noise, which need to be taken into account when forming the system model. Signal processing filters are used to process the received data and provide estimates which can be used to derive the parameters of the system. Processing of the received data is executed on the basis of an iterative process to determine the parameters and a filter may be used to execute part of the process. Once the parameters have been determined or converged to an acceptable value after executing the iterative process, the system model can be used to determine the signal of interest with the additional components removed. This is applicable to a number of signal processing applications, including multi-sensor speech enhancement, high resolution source localisation in multi-sensor signal processing and linear predictive coding of speech. The process can also be applied to dynamic shock error models used in economic forecasting.
One iterative process, known as the expectation maximisation (EM) algorithm, is used to obtain maximum likelihood (ML) estimates of system parameters. ML parameter estimation for linear Gaussian models and other related time series models using the EM algorithm was studied in the 1980s, as discussed in R.H. Shumway and D.S. Stoffer, "An Approach to Time Series Smoothing and Forecasting using the EM Algorithm", J. Time Series Analysis, Vol. 3, No. 4, pp. 253-264, 1982 and D. Ghosh, "Maximum Likelihood Estimation of the Dynamic Shock-Error Model", Journal of Econometrics, Vol. 41, No. 1, pp. 121-143, May 1989, and developed more recently for signal processing applications, as discussed in E. Weinstein, A.V. Oppenheim, M. Feder and J.R. Buck, "Iterative and Sequential Algorithms for Multisensor Signal Enhancement", IEEE Trans. Signal Processing, Vol. 42, No. 4, pp. 846-859, April 1994 and V. Krishnamurthy, "On-line Estimation of Dynamic Shock-Error Models", IEEE Trans. Auto. Control, Vol. 35, No. 5, pp. 1129-1134, 1994. Each iteration of the process comprises two steps, the expectation (E-step) and the maximisation (M-step). The E-step for linear Gaussian models involves determining the following two conditional expectations based on all of the observed or received data: (i) the sum over time of the state of the signal of interest; (ii) the sum over time of the covariance of the state of the signal of interest.
From these two expectations or estimations, the maximisation step can be used to determine the ML estimates of the parameters of a system model for the signal of interest.
Determination of the expectations in the E-step has in the past involved, for linear Gaussian system models, approximating the sums over the signal distribution by fixed interval smoothing. Implemented in this manner the E-step is non-causal. To perform the fixed interval smoothing a Kalman smoother has been used, which requires processing the time series data obtained in one time direction and then also processing the data in the reverse direction. This is referred to as performing both forward and backward passes over the time series data, and during each iteration pass the state data obtained has to be stored for use in the next iteration until the E-step is completed. This significantly affects the memory requirements for executing the E-step, and the necessity to perform both a forward and backward pass using all of the time series data obtained reduces the speed of the process.
In accordance with the present invention there is provided a signal processing method for determining parameters of a linear Gaussian system to remove a Gaussian noise component, comprising: processing received data by a finite-dimensional filter to obtain expectation data; processing said expectation data to derive parameter data representative of said parameters.
The present invention also provides a finite-dimensional filter for processing received data to obtain state data for deriving parameter data of a linear Gaussian system associated with said received data.
Preferred embodiments of the present invention are hereinafter described, by way of example only, with reference to the accompanying drawings, wherein:
Figure 1 is a block diagram of a preferred embodiment of a system for executing an expectation and maximisation process;
Figure 2 is a block diagram of a finite dimensional filter for the expectation and maximisation process;
Figure 3 is a block diagram of another preferred embodiment of a system for executing an expectation and maximisation process;
Figure 4 is a block diagram of a two sensor speech enhancement system;
Figure 5 is a graph of parameter estimates against number of data passes;
Figure 6 is a block diagram of a system for localisation of narrowband sources;
Figure 7 is a block diagram of a chemical process plant;
Figure 8 is a graph of output data obtained from the plant against time; and
Figure 9 is a graph of parameter estimates for a model of a subsystem of the plant versus iterations of an expectation and maximisation process.
A linear Gaussian system model can be applied to time signals and time series data, such as uncoded speech, which includes a random component that can be considered to vary in accordance with a Gaussian distribution. The system model for such a signal x_{k+1}, for k = 0, 1, …, is given by the dynamics:
    x_{k+1} = A_{k+1} x_k + B_{k+1} w_{k+1}    (1)
Here x_k ∈ R^m and x_0 ∈ R^m is a Gaussian random variable with zero mean and covariance matrix B_0^2 ∈ R^{m×m}.
At time k+1, k = 0, 1, …, the noise in equation 1 is modelled by an independent Gaussian random variable with zero mean and covariance matrix B_{k+1}^2. It is known that such a Gaussian random variable can be represented as B_{k+1} w_{k+1}, where w_{k+1} is an m-vector of N(0, 1) random variables.
For k = 0, 1, …, the state x_k may be observed indirectly via the vector observations y_k, where
    y_k = C_k x_k + D_k v_k    (2)
Here for each k, y_k ∈ R^d and v_k is a vector of independent N(0, 1) random variables. We assume that D_k is a symmetric non-singular d × d matrix.
The vector observations y_k represent the received data or signal and constitute the information which is available on the state of the signal of interest x_k. For a system which includes a Gaussian noise component, the variable v_k and the parameter D_k represent that noise component, which needs to be determined so that the state x_k can be determined.
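As an illustration, the state and observation dynamics of equations 1 and 2 can be simulated directly. The sketch below uses scalar (m = d = 1) time-invariant parameters; the particular values of A, B, C, D and the horizon are illustrative assumptions, not values taken from the description:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative time-invariant scalar parameters (m = d = 1).
A, B, C, D = 0.9, 0.5, 1.0, 0.3
T = 200

x = np.zeros(T)                              # state x_k
y = np.zeros(T)                              # observations y_k
x[0] = B * rng.standard_normal()             # x_0 ~ N(0, B^2)
y[0] = C * x[0] + D * rng.standard_normal()
for k in range(T - 1):
    # Equation 1: x_{k+1} = A x_k + B w_{k+1}, with w ~ N(0, 1)
    x[k + 1] = A * x[k] + B * rng.standard_normal()
    # Equation 2: y_k = C x_k + D v_k, with v ~ N(0, 1)
    y[k + 1] = C * x[k + 1] + D * rng.standard_normal()
```

The task addressed by the invention is the inverse one: given only y, recover the parameters A, B, C, D and hence the state x.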
To execute the E-step, instead of using a Kalman smoother, a filter is used to derive the sum of the state data and the sum of the covariance of the state data. If we let e_i, e_j denote the unit vectors in R^m with 1 in the i-th and j-th position respectively, and let f_n be the unit vector in R^d with 1 in the n-th position, for i, j ∈ {1, …, m} and n ∈ {1, …, d}, we can define the scalar processes

    H_k^{(ij)} = Σ_{l=0}^{k} ⟨x_l, e_i⟩ ⟨x_l, e_j⟩,    J_k^{(in)} = Σ_{l=0}^{k} ⟨x_l, e_i⟩ ⟨y_l, f_n⟩    (3)

where ⟨·, ·⟩ denotes the inner product.
The H_k represent covariance sums and the J_k represent the sum of the state multiplied by the observed data y_k, hereinafter referred to as the state sum. The filter derives the conditional expectations
recursively, and the data produced by the filter is then used in the maximisation step of the EM process to derive or estimate the parameters A_k, B_k, C_k and D_k of the system, from which the state of the signal of interest x_k can be determined. The expectations are related to densities of the distribution of the state and can be determined from those densities. The accompanying Appendix A describes the relationship between the densities and the required expectations, and explains the derivation of filter equations for the expectations. The filter for determining the expectations is configured to process the observed data and other data as described by the equations.
The densities α_k, β_k^{(0)}, β_k^{(1)}, β_k^{(2)} and γ_k are defined by equation 11 in Section 4.1 of Appendix A. They can be determined by the integrals specified in equations 13 to 17, yet the integrals are not closed and normally require a fixed interval smoothing process, such as that executed by the Kalman smoother with all its inherent disadvantages, to provide estimates of the densities. Theorems 5.1, 5.2 and 5.3 however specify filter
equations for the densities, and Theorem 5.4 uses the relationship between the densities and the expectations to derive finite-dimensional filter equations 47 and 48 for the expectations. The coefficients of the filter equations a_k^{ij(M)}, b_k^{ij(M)} and d_k^{ij(M)} can be determined from equations 71 to 79 using the observed data and the data obtained from a Kalman filter, being μ_k, R_k and R_{k|k-1}, as recited in equation 70, where A_k, C_k and D_k are assigned initial parameter estimates. The filter equations 47, 48 and 71 to 79 each stipulate the configuration of a respective finite-dimensional filter which together form a finite-dimensional filter 14 to generate the expectation data.
The expectation data is then used in the EM process to obtain the parameters A, C, B and D of the system, as recited in equations 84 to 87 of Appendix A. The expectation data is obtained by retaining the summations recited in equations 47, 48 and 70 to 79 as a forward pass is made across the observed data, without requiring data to be retained at each discrete time interval for each iteration of the forward pass. Also no backward iteration is required to obtain the expectations.
The EM process executed using the new filters, as shown in Figure 1, involves the observed data y_k 10 being applied to a Kalman filter 12 and the finite-dimensional recursive filter 14. The Kalman filter 12 processes the observed data y_k 10 to generate conditional mean data and conditional covariance data of the state, μ_k and R_k. This is applied with the observed data y_k to the finite-dimensional recursive filter 14, which generates the conditional expectations in accordance with equations 47 and 48 of Appendix A. The expectations are then passed to an M-step process module 16 to generate estimates of the parameters in accordance with equations 84 to 87. The parameter estimates generated by the M-step process module 16 are also used by the Kalman filter 12 and the finite-dimensional filter 14 for further iterations. Although the finite-dimensional filter 14 normally executes a number of forward iterations, parameter estimates can be obtained using one forward or real time pass of the observed data.
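As a concrete sketch of the first stage of Figure 1, the standard scalar Kalman filter recursion below plays the role of module 12, producing the conditional mean μ_k and covariance R_k that are fed, together with y_k, into the finite-dimensional filter 14. The finite-dimensional filter and M-step equations (47, 48, 71 to 79 and 84 to 87 of Appendix A) are not reproduced here, and all numerical values are illustrative:

```python
import numpy as np

def kalman_filter(y, A, B2, C, D2, mu0=0.0, R0=1.0):
    """Standard scalar Kalman filter (the role of module 12): returns the
    conditional mean mu_k and covariance R_k of the state given y_0..y_k."""
    T = len(y)
    mu = np.zeros(T)
    R = np.zeros(T)
    mu_pred, R_pred = mu0, R0                        # predicted mean / covariance
    for k in range(T):
        S = C * R_pred * C + D2                      # innovation variance
        K = R_pred * C / S                           # Kalman gain
        mu[k] = mu_pred + K * (y[k] - C * mu_pred)   # measurement update
        R[k] = (1.0 - K * C) * R_pred
        mu_pred = A * mu[k]                          # time update for step k+1
        R_pred = A * R[k] * A + B2
    return mu, R

# Illustrative use on data generated from the model of equations 1 and 2.
rng = np.random.default_rng(1)
A, B2, C, D2 = 0.9, 0.25, 1.0, 0.25
x = np.zeros(500)
for k in range(499):
    x[k + 1] = A * x[k] + np.sqrt(B2) * rng.standard_normal()
y = C * x + np.sqrt(D2) * rng.standard_normal(500)
mu, R = kalman_filter(y, A, B2, C, D2)
```

In each EM iteration, μ_k and R_k would be passed to the finite-dimensional filter 14 and the resulting expectations to the M-step, which re-estimates A, B, C and D for the next iteration.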
The finite-dimensional filter 14 includes first, second and third coefficient modules 18, 20 and 22, and first, second and third delay units 24, 26 and 28 for the coefficient modules, respectively. The coefficient modules 18 to 22 receive the conditional mean and conditional covariance data of the state, μ_k and R_k, from the Kalman filter 12. The first coefficient module 18 generates and outputs updates of the coefficients a_{k+1}^{(M)} according to equation 71 based on the data received from the Kalman filter 12 and the coefficients a_k^{(M)}, b_k^{(M)}, d_k^{(M)} output from the delay units 24, 26 and 28, respectively. The second coefficient module 20 generates and outputs the coefficients b_{k+1}^{(M)} according to equations 72, 74 and 76 based on the inputs from the Kalman filter, the previous b_k^{(M)} provided by the delay unit 26 and the coefficient d_k^{(M)} output by the delay unit 28. The third coefficient module 22 generates and outputs the coefficients d_{k+1}^{(M)} according to equations 73, 75 and 77 on the basis of the data from the Kalman filter 12 and the previous coefficient d_k^{(M)} provided by the delay unit 28. The coefficients a_{k+1}^{(M)}, b_{k+1}^{(M)} and d_{k+1}^{(M)} generated by the modules 18 to 22 are provided to a filter covariance and sum module 30 which generates the covariance and state sum expectations according to equations 47 and 48.
The filters of the filter 14 are finite-dimensional because of the following two properties that hold at each time instant:
1. The filtered density of the current time sum of the state, γ_k, is given by an affine function in x times the filtered state density α_k, as per equation 44.
The filtered state density is a Gaussian in x with mean and covariance given by the Kalman filter equations 24 and 25.
2. The filtered density of the current time sum of the state covariance, β_k^{(M)}, is a quadratic in x times the filtered state estimate α_k, as per equation 28.
The filtered density of the state sum is given in terms of four sufficient statistics, namely the two coefficients of the affine function in x and the Kalman mean μ_k and covariance R_k. Similarly the filtered density of the covariance sum is given by five sufficient statistics.
The above "closure" property also holds for higher order statistics. The filtered density of the current time sum of the p-th order statistic of the state is a p-th order polynomial in x times the filtered state estimate. Thus in general finite-dimensional filters can be derived for the time sum of p-th order statistics of the state. For the filtered E-step only the filters for the first and second order statistics are used.
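The closure property can be stated compactly as follows; the coefficient names follow the a, b, d notation used for the filter modules, and the exact coefficient recursions are equations 71 to 79 of Appendix A:

```latex
% Affine closure for the filtered state-sum density (cf. equation 44):
\gamma_k(x) \;=\; \bigl(a_k + b_k^{\prime} x\bigr)\,\alpha_k(x),
\qquad \alpha_k(x) \sim N(\mu_k, R_k) \ \text{(Kalman filter)}.
% Quadratic closure for the filtered covariance-sum density (cf. equation 28):
\beta_k^{(M)}(x) \;=\; \bigl(a_k^{(M)} + b_k^{(M)\prime} x
    + x^{\prime} d_k^{(M)} x\bigr)\,\alpha_k(x).
```

The four (respectively five) sufficient statistics referred to above are the polynomial coefficients together with the Kalman mean and covariance.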
The EM process and the filters used can be implemented by an application specific signal processing circuit, but normally would be implemented using one or more microprocessors and a number of iterative software routines to process the observed data or signal. The EM process based on the use of the filters 12 and 14 has significant advantages over the EM process which uses an integral smoother, such as the Kalman smoother. The filter-based EM process provides significant savings in memory requirements, as it eliminates the need for a backward iteration and only summations need to be obtained, therefore removing the requirement for estimates to be maintained for each time interval for successive iterations.
The filter based process is also twice as fast as the previous smoother based EM process, as no forward-backward iteration is required. Parameter estimates for the process also converge faster, which is particularly advantageous when processing observed data as it is received. The finite-dimensional filters of the filter 14 are also decoupled, in that one does not depend on the other, and therefore can be implemented in parallel on a multiprocessor system to further enhance the speed of the process, as discussed in Section 9 of Appendix A.
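The storage difference can be illustrated schematically. In the sketch below the per-time-step statistic is a generic placeholder, not one of the actual filter quantities; the point is only the O(T) versus O(1) memory pattern:

```python
import numpy as np

y = np.random.default_rng(3).standard_normal(10_000)

# Smoother-style E-step: a forward pass stores a per-time-step quantity
# so that a backward pass can revisit it -- O(T) memory.
forward_store = [0.5 * yk for yk in y]
smoother_total = sum(reversed(forward_store))   # backward pass over storage

# Filter-style E-step: the same total is accumulated as a running sum in a
# single forward pass -- O(1) memory beyond local variables.
running_sum = 0.0
for yk in y:
    running_sum += 0.5 * yk
```

Both approaches yield the same sum; only the filter-style pass avoids retaining per-step data between iterations.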
A recursive version of the EM process can be implemented as shown in Figure 3.
Here at each time instant, with each new data point, the Kalman filter 12 and the finite-dimensional filter 14 are used to compute the required statistical data, and the model parameter estimates are updated by an M-step process module 32. The resulting process is recursive in that the parameters are updated by the M-step module 32 at each time instant, rather than at the end of the pass of the available data points as in an off-line case. This real
time joint parameter and state estimation can be used to track time varying parameters in real time multi-sensor signal enhancement, and high resolution localisation of narrowband sources, as discussed hereinafter. Other applications of the filter-based EM process include linear predictive coding of speech and estimation of dynamic shock error models as discussed hereinafter.
When enhancing a desired signal in the presence of noise, multiple sensor measurements will typically have components from both the signal and the noise sources. Since the systems that couple the signal and noise to the sensors are unknown, one must deal with the more difficult problem of joint signal estimation and system identification.
A frequency domain approach to the two-sensor signal enhancement problem is discussed in M. Feder, A.V. Oppenheim and E. Weinstein, "Maximum Likelihood Noise Cancellation using the EM Algorithm", IEEE Trans. Acoustics Speech and Signal Processing, Vol. 37, No. 2, pp. 204-216, Feb. 1989. In that approach, the desired (speech) signal is modelled as an auto-regressive (AR) Gaussian process, the noise is modelled as a white Gaussian process, and the coupling systems are modelled as linear time-invariant finite impulse response filters. In order to deal with the non-stationarity of the signal, the noise, and the coupling systems, it is suggested in the paper that an algorithm be applied on consecutive time frames using a sliding window. This approach involves two conflicting requirements. The window should be short enough so that the algorithm will respond to non-stationary changes in the signal and noise statistics. However, the window should be long in order to improve the statistical stability of the resulting signal and parameter estimates and in order to obtain a computationally tractable algorithm in which non-causal frequency domain Wiener filtering can be applied.
Using the filter-based EM process described herein, a time domain approach can be adopted, whereby the process jointly estimates the signal, the noise, the coupling systems and the unknown signal and noise spectral parameters. The advantage of this time domain approach is that many of the computational and conceptual difficulties associated
with the frequency domain approach are avoided. An example of the time domain approach for an observed signal s_k, where v_k represents white Gaussian noise, is described on pages 31 and 32 of Appendix A.
A two sensor speech enhancement system 32, as shown in Figure 4, uses two noisy sensors (microphones) 34 and 36 to receive a speech signal in ambient noise. The noise received at the two sensors 34 and 36 affects the received speech signals and has different correlations as the sensors 34 and 36 are in different locations. The aim is to optimally combine the two sensor outputs to extract optimal estimates of the speech signal (state) and speech and noise parameters. This is achieved by using the filter-based EM process in a module 38 which receives the digitised outputs of the sensors 34 and 36. Source code in Matlab for the module 38 is given in Appendix B. For a simulation using the source code of Appendix B, the signal source was assumed to be an AR(1) process with AR coefficient a_1 = 0.75 and σ_v² = 0.01 for the observed signal s_k of equation 88 of Appendix A. The sensor (observation noise) variances were assumed to be σ_ε² = 0.0025. The filter-based EM module 38 was run on 1000 data points and the initial AR parameter estimate was randomly initialised to 0.941. Figure 5 shows the AR parameter estimates versus successive passes; the parameter estimates are very close to the true parameter value. A recursive version of the process can also be used to achieve real time (on-line) speech signal (state) and parameter estimates. This can be used to track time varying signal and noise statistics. The smoother based EM process requires of the order of N²T memory (where N is the order of the AR process and T is the number of data points), whereas the filter based process requires no storage memory apart from local variables. Also the various filters are decoupled and hence can be implemented in parallel. A systolic implementation is described in Section 9 of Appendix A.
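The AR(1) source model used in the simulation can be cast in the state-space form of equations 1 and 2. The sketch below generates synthetic two-sensor data with the stated values a_1 = 0.75, σ_v² = 0.01 and per-sensor noise variance 0.0025; the unit coupling gains are an assumption, since the actual coupling systems of the Appendix B code are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(42)

a1 = 0.75                 # AR(1) coefficient of the speech source
var_v = 0.01              # process (source) noise variance
var_eps = 0.0025          # per-sensor observation noise variance
T = 1000                  # number of data points, as in the simulation

# Source signal s_k: AR(1) process driven by white Gaussian noise.
s = np.zeros(T)
for k in range(T - 1):
    s[k + 1] = a1 * s[k] + np.sqrt(var_v) * rng.standard_normal()

# Two sensors observe the source with independent noise (unit gains assumed).
y1 = s + np.sqrt(var_eps) * rng.standard_normal(T)
y2 = s + np.sqrt(var_eps) * rng.standard_normal(T)
```

Data of this form would be supplied to the filter-based EM module 38, which jointly estimates the source signal and the AR and noise parameters.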
For high resolution localisation of narrowband sources, the use of the EM process is described in I. Ziskind and D. Hertz, "Maximum Likelihood Localization of Narrow-Band Autoregressive Sources via the EM Algorithm", IEEE Trans. Signal Processing, Vol. 41, No. 8, pp. 2719-2723, August 1993. The filter-based EM process described above can
be used on a Gaussian state space model to estimate the parameters of model signal sources and their direction of arrival. In our notation the multi-sensor multi-source signal model in Ziskind and Hertz can be expressed as:

    x_{k+1} = A(θ^{(1)}) x_k + B w_{k+1}
    y_k = C(θ^{(2)}) x_k + D v_k

The state equation relates to the narrowband sources. In particular, θ^{(1)} are parameters that determine the bandwidth of the sources. In the observation equation, θ^{(2)} denotes the angles of arrival to the multi-sensor array. C(θ^{(2)}) is called the steering matrix of the array towards direction θ^{(2)}. A Kalman smoother is used to obtain maximum likelihood estimates of the parameters θ^{(1)}, θ^{(2)} and noise variances B and D. The filter-based EM process can be used instead of the Kalman smoother by using a filter-based EM process module 40 to receive the signal data observed by a sensor array 42 in response to the signals generated by a plurality of narrowband sources 44, as shown in Figure 6. The module 40 is able to provide estimates of direction of arrival θ^{(2)}, bandwidth θ^{(1)} and source signal x_{k+1} for each narrowband source 44.
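For illustration, a steering matrix C(θ^{(2)}) can be constructed for the common special case of a uniform linear array with half-wavelength spacing. The array geometry and the complex baseband representation are assumptions of this sketch; Ziskind and Hertz treat more general arrays:

```python
import numpy as np

def steering_matrix(thetas, num_sensors, spacing=0.5):
    """Steering matrix for a uniform linear array (an assumed geometry).
    `spacing` is the sensor spacing in wavelengths; each column steers the
    array towards one source's arrival angle (in radians)."""
    n = np.arange(num_sensors)[:, None]        # sensor index, column vector
    thetas = np.atleast_1d(thetas)[None, :]    # arrival angles, row vector
    # Phase delay of source at angle theta across the array aperture.
    return np.exp(-2j * np.pi * spacing * n * np.sin(thetas))

# Two sources at illustrative angles, observed by an 8-sensor array.
C = steering_matrix([0.1, -0.4], num_sensors=8)
```

Each column of C is the array response to one source, so the observation equation stacks the source states through C before the sensor noise D v_k is added.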
The filter based EM process can be used, for example, in a chemical process plant application. A plant 50 may include a three-tank level and temperature system, as shown in Figure 7, which is versatile and representative of typical chemical process plants. The plant 50 has two cylindrical glass tanks 52 and 54 and one conical glass tank 56, each having an internal diameter of 16 cm and height of 50 cm. The system is equipped with several transducers: three differential pressure cell transducers 58, 60 and 62 for measuring the flow rates of steam and water; and five thermocouples 64, 66, 68, 70 and 72 for measuring the temperature of the water at various points in the system. In addition there are three control valves 74, 76 and 78 for manipulating the flow rates of water and steam into the tanks 52, 54 and 56, and three tank level sensors 80, 82 and 84 for the tanks 52, 54 and 56. Further details of the plant are given in O.O. Badmus, D. Grant Fisher and Sirish L. Shah, "Real-time, Sensor-based Computing in the Laboratory", Chemical Engineering Education, Fall 1996, pages 280 to 289. The process equipment of
the plant 50 can readily be configured to demonstrate the performance of different process systems and control schemes. Only a subsystem of the plant 50, being the bold portion of Figure 7 for the first tank 52, is utilised for the description below. To generate data suitable for identification of a dynamic process model, a uniformly distributed random input signal of maximum amplitude 1 (0 < M < 1) was generated and applied every 5 seconds as a perturbation of the nominal value of the position of the cold water valve 74 (set at 21% open) with an amplification gain of 10. The input signal is bounded by the interval 16% to 26% open. As described in the O.O. Badmus et al article, a noisy Gaussian AR series adequately describes the process dynamics when the input signal is assumed to be unknown and modelled as Gaussian white noise. Figure 8 shows a 2000 point output data segment obtained from the level sensor 80 for the first tank 52 when the input signal is white Gaussian noise with variance 8.2687. The output series was detrended by subtracting the mean, resulting in a zero-mean time series. Assuming an AR(2) model for the subsystem, the filter-based EM process was executed on the zero-mean data. Initial AR parameter estimates were arbitrarily chosen as 0.4 and 0.0. The parameter estimates obtained from the filter-based EM process are plotted in Figure 9. The estimates converge after about 10 iterations. After 20 iterations the estimates are 0.5970 and 0.4027. The parameter estimates, after only 6 iterations, are very close to the final values which stabilise after 10 iterations. This rapid convergence, combined with the fact that no memory storage of previous values is required, demonstrates the advantages of the filter-based EM process.
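As a simple cross-check on estimates of this kind, the AR(2) coefficients of a detrended zero-mean series can also be fitted by ordinary least squares on lagged values. This sketch uses synthetic data with illustrative coefficients, since the plant data of Figure 8 is not reproduced here, and OLS is only a baseline comparison, not the filter-based EM process:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic zero-mean AR(2) series standing in for the detrended level data.
a_true = (0.5, 0.3)       # illustrative coefficients (assumed, not the plant's)
T = 2000
z = np.zeros(T)
for k in range(2, T):
    z[k] = a_true[0] * z[k - 1] + a_true[1] * z[k - 2] + rng.standard_normal()

# Detrend by subtracting the mean, as described for the plant output.
z = z - z.mean()

# Least-squares fit of z_k on (z_{k-1}, z_{k-2}).
X = np.column_stack([z[1:-1], z[:-2]])
a_hat, *_ = np.linalg.lstsq(X, z[2:], rcond=None)
```

Unlike this batch fit, the filter-based EM process also accounts for observation noise and produces state estimates along with the parameters.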
In the linear predictive coding of speech, unvoiced speech is modelled as an auto-regressive process driven by white Gaussian noise. The auto-regressive parameters correspond to those of the vocal tract resonant cavities and change with the flow of speech according to movement of the mouth, tongue and chest. These changes are relatively slow compared to the speech signal variations. The speech signal itself is embedded in white Gaussian noise. The aim is to obtain ML estimates of the AR parameters, which can be done using the filter-based EM process.
The filter-based EM process can also be used as a substitute for the Kalman smoother based EM algorithm described in D. Ghosh, "Maximum Likelihood Estimation of the Dynamic Shock-Error Model", Journal of Econometrics, Vol. 41, No. 1, pp. 121-143, May 1989. Here the EM process is used to attain ML estimates of dynamic shock error model parameters for economies, such as the shadow economy in the United States.
The filter-based EM process can also be used to determine the parameters for a system to predict the spot price of oil, as described in detail in accompanying Appendix C.
The filter-based EM process has been described above in relation to discrete time systems but it can also be applied to continuous time linear Gaussian systems.
Various discretisations of the filters (e.g. first order, second order or robust discretisation; refer to A. Bensoussan, "Stochastic Control of Partially Observable Systems", Cambridge University Press, Cambridge, 1992) result in different discrete time filters that can be applied to the above examples.
Many modifications will be apparent to those skilled in the art without departing from the scope of the present invention as hereinbefore described with reference to the accompanying drawings.
Appendix A
There are very few estimation problems for which finite dimensional optimal filters exist, i.e., filters given in terms of finite-dimensional sufficient statistics. Indeed the only two cases that are widely used are the Kalman filter for linear Gaussian models and the Wonham filter (Hidden Markov Model filter) for finite state Markov chains in white noise.
In this paper we derive new finite-dimensional filters for linear Gaussian state space models in discrete time. Indeed, the Kalman filter is merely a special case of our general filter. Our filters compute all the statistics required to obtain maximum likelihood (ML) estimates of the model parameters via the Expectation Maximization (EM) algorithm.
Maximum likelihood parameter estimation of linear Gaussian models and other related time-series models using the EM algorithm was studied in the 1980s in [1] and [2] and more recently in the electrical engineering literature in [3], [4]. The EM algorithm is a general iterative numerical algorithm for obtaining ML estimates. Each iteration consists of two steps: the Expectation (E-step) and the Maximization (M-step). The E-step for linear Gaussian models involves computing the following two conditional expectations based on all the observations: (i) the sum over time of the state; (ii) the sum over time of the state covariance.
In all the existing literature on parameter estimation of linear Gaussian models via the EM algorithm, the E-step is non-causal, involving fixed-interval smoothing via a Kalman smoother (i.e. a forward pass and a backward pass).
In this paper we derive a filter-based EM algorithm for linear Gaussian models. That is, the E-step is implemented using filters (i.e. only a forward pass) rather than smoothers. The main contribution of this paper is to show that these filters are finite-dimensional. Few finite dimensional filters are known, so the result is remarkable. It was certainly a surprise to the authors.
The filtered E-step has the following advantages:
1. The memory costs are significantly reduced compared to the standard (smoother-based) EM algorithm.
2. Our filters are decoupled and hence easy to implement in parallel on a multiprocessor system.
3. Our algorithm is at least twice as fast as the standard smoother based EM algorithm because no forward-backward scheme is required.
Filter-based EM algorithms have recently been proposed for Hidden Markov Models. These HMM filters are finite dimensional because of the idempotent property of the state indicator function of a finite state Markov chain. In linear Gaussian models, unlike the HMM case, the state indicator vector is no longer idempotent. Instead our filters are finite dimensional because of the following two remarkable algebraic properties that hold at each time instant:
1. The filtered density of the current time sum of the state is given by an affine function in x times the filtered state density. The filtered state density is a Gaussian in x with mean and variance given by the Kalman filter equations.
2. The filtered density of the current time sum of the state covariance is a quadratic in x times the filtered state estimate.
So the filtered density of the state sum is given in terms of 4 sufficient statistics, namely the 2 coefficients of the affine function in x and the Kalman mean and covariance. Similarly the filtered density of the covariance sum is given by 5 sufficient statistics.
Actually this remarkable algebraic "closure" property holds for higher order statistics as well. We prove that the filtered density of the current time sum of the p-th order statistic of the state is a p-th order polynomial in x times the filtered state estimate. So in general we can derive finite dimensional filters for the time sum of p-th order statistics of the state. Of course for the filtered E-step we only use the filters for the first and second order statistics. Also for p = 0, our filters reduce to the Kalman filter.
Applications: Our filter-based EM algorithm for linear Gaussian models can be applied to all the applications where the standard EM algorithm has been applied. In particular these include:
• Multi-sensor signal enhancement algorithms for estimation of speech signals in room acoustic environments [3].
• High Resolution Localization of Narrowband Sources using multiple sensors and Direction of Arrival Estimation [6].
• Linear Predictive coding of speech (see Chapter 6 of [5]).
• Forecasting and prediction of the "shadow economy" in market cycles using linear errors-in-variables models [2].
In all these applications the advantages of our filter-based EM algorithm can be exploited.
This paper is organized as follows. In Sec 2 we present our signal model. In Sec 3 we propose a measure change which facilitates easy derivation of the filters. In Sec 4, recursions are derived for the filtered densities of the variables of interest. In Sec 5 we derive our finite dimensional filters. In Sec 6 a general finite dimensional filter is proposed. Sec 7 re-expresses the filters to allow for singular state noise as long as a certain controllability condition is satisfied. In Sec 8, the filters derived in Sec 4 and 6 are used in a filter-based EM algorithm to obtain maximum likelihood parameter estimates. In Sec 9 we evaluate the computational complexity of the filters and propose a parallel implementation. Finally, conclusions are presented in Sec 10.
2 Signal Model
To derive our filters with maximum generality we consider a multi-input multi-output state space model with time varying parameters and noise variances, as follows.
All processes are defined initially on the probability space (Ω, F, P̄). We shall consider the classical linear-Gaussian model for the signal and observation processes. That is, for k = 0, 1, …, the signal is given by the dynamics
    x_{k+1} = A_{k+1} x_k + B_{k+1} w_{k+1}    (1)
Here x_k ∈ R^m and x_0 ∈ R^m is a Gaussian random variable with zero mean and covariance matrix B_0^2 ∈ R^{m×m}.
At time k+1, k = 0, 1, …, the noise in (1) is modelled by an independent Gaussian random variable with zero mean and covariance matrix B_{k+1}^2. It is known [7] that such a Gaussian random variable can be represented as B_{k+1} w_{k+1}, where w_{k+1} is an m-vector of N(0, 1) random variables.
Assumption 2.1 For the time being, we assume that the matrices B_k ∈ R^{m×m}, k = 0, 1, …, are nonsingular and symmetric. The symmetry follows from the construction in [7]. The case when B_k is singular is discussed in Section 7.
For k = 0, 1, …, we assume that the state process x_k is observed indirectly via the vector observations y_k, where

    y_k = C_k x_k + D_k v_k    (2)

Here for each k, y_k ∈ R^d and v_k is a vector of independent N(0, 1) random variables. We assume that D_k is a nonsingular d × d matrix.
Remark: We assume B_k to be a covariance matrix, and hence symmetric, for notational convenience. The results in this paper also hold for non-symmetric B_k: simply replace B_k^2 by B_k B_k' and B_k^{-2} by (B_k B_k')^{-1} below.
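As a concrete illustration, the signal model (1)-(2) can be simulated directly in the scalar case m = d = 1. The following Python sketch is not part of the method itself; all parameter values in it are illustrative assumptions:

```python
import random

def simulate(T, A, B, C, D, x0=0.0, seed=0):
    """Simulate x_{k+1} = A x_k + B w_{k+1}, y_k = C x_k + D v_k (m = d = 1)."""
    rng = random.Random(seed)
    xs, ys = [x0], []
    for _ in range(T):
        # observation of the current state, then state transition
        ys.append(C * xs[-1] + D * rng.gauss(0.0, 1.0))
        xs.append(A * xs[-1] + B * rng.gauss(0.0, 1.0))
    return xs[:T], ys

xs, ys = simulate(200, A=0.9, B=0.5, C=1.0, D=0.2)
```

The returned lists contain the hidden states x_0, ..., x_{T-1} and the corresponding noisy observations y_0, ..., y_{T-1}.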
3 Measure Change Construction and Dynamics
We shall adapt the techniques in [10] and show how the dynamics (1) and (2) can be modelled starting with an initial reference probability measure P̄.
Suppose on a probability space (Ω, F, P̄) we are given two sequences of independent, identically distributed random variables x_k ∈ R^m, y_k ∈ R^d. Under the probability measure P̄, the x_k are a sequence of independent m-dimensional N(0, I_m) random variables, and the y_k are a sequence of independent d-dimensional N(0, I_d) random variables. Here, I_m (resp. I_d) denotes the m × m (resp. d × d) identity matrix.
For x ∈ R^m and y ∈ R^d write

φ(x) = (2π)^{-m/2} exp(-x'x/2),   ψ(y) = (2π)^{-d/2} exp(-y'y/2)   (3)
Define the sigma-fields

G_k = σ{x_0, x_1, ..., x_k, y_0, y_1, ..., y_k},   Y_k = σ{y_0, y_1, ..., y_k}   (4)

Thus {G_k} is the complete filtration generated by the x and y sequences, and {Y_k} is the complete filtration generated by the observations.
For any matrix B, let |B| denote its determinant.
Write

λ_0 = ψ(D_0^{-1}(y_0 − C_0 x_0)) / (|D_0| ψ(y_0))

and for l ≥ 1

λ_l = [ψ(D_l^{-1}(y_l − C_l x_l)) φ(B_l^{-1}(x_l − A_l x_{l−1}))] / [|D_l| ψ(y_l) |B_l| φ(x_l)]   (5)

For k ≥ 0 set

Λ_k = ∏_{l=0}^{k} λ_l   (6)

A new probability measure P can be defined on (Ω, ∨_k G_k) by setting the restriction to G_k of the Radon–Nikodym derivative of P with respect to P̄ equal to Λ_k:

dP/dP̄ |_{G_k} = Λ_k
Definition 3.1 For l = 0, 1, ..., define

v_l = D_l^{-1}(y_l − C_l x_l)

For l = 1, 2, ..., define

w_l = B_l^{-1}(x_l − A_l x_{l−1})
Lemma 3.2 Under the measure P, {v_l} and {w_l} are sequences of independent N(0, I_d) and N(0, I_m) random variables, respectively.
PROOF. Suppose f : R^d → R and g : R^m → R are arbitrary measurable 'test' functions. Then, with E (resp. Ē) denoting expectation under P (resp. P̄),
E{g(w_k) f(v_k) | G_{k−1}} = Ē{Λ_k g(w_k) f(v_k) | G_{k−1}} / Ē{Λ_k | G_{k−1}}   (7)

using a version of Bayes' theorem [10].
Now Λ_{k−1} is G_{k−1} measurable; therefore

E{g(w_k) f(v_k) | G_{k−1}} = Ē{λ_k g(w_k) f(v_k) | G_{k−1}} / Ē{λ_k | G_{k−1}}

However,

Ē{λ_k | G_{k−1}} = Ē{ ψ(D_k^{-1}(y_k − C_k x_k)) φ(B_k^{-1}(x_k − A_k x_{k−1})) / (|D_k| ψ(y_k) |B_k| φ(x_k)) | G_{k−1} }   (8)

Notice that the inner conditional expectation over y_k is

∫_{R^d} [ψ(D_k^{-1}(y − C_k x_k)) / (|D_k| ψ(y))] ψ(y) dy = 1

Hence

Ē{λ_k | G_{k−1}} = ∫_{R^m} (1/|B_k|) φ(B_k^{-1}(x − A_k x_{k−1})) dx = 1

Consequently,

E{g(w_k) f(v_k) | G_{k−1}} = Ē{ λ_k g(B_k^{-1}(x_k − A_k x_{k−1})) f(D_k^{-1}(y_k − C_k x_k)) | G_{k−1} }   (9)

Since under P̄ the variables x_k and y_k are independent N(0, I_m) and N(0, I_d), independent of G_{k−1}, the changes of variables w = B_k^{-1}(x − A_k x_{k−1}) and v = D_k^{-1}(y − C_k x) show that the right-hand side of (9) equals

∫_{R^m} g(w) φ(w) dw ∫_{R^d} f(v) ψ(v) dv

Then, for any measurable 'test' functions g and f, the conditional expectation factorizes into the product of an N(0, I_m) expectation of g and an N(0, I_d) expectation of f, which proves the lemma. □
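The key normalization Ē{λ_k | G_{k−1}} = 1 used in the proof can be checked numerically in the scalar case m = d = 1. The Python sketch below (all parameter values are illustrative assumptions) integrates λ_k against the N(0,1) × N(0,1) reference law of (x_k, y_k) under P̄ by midpoint quadrature:

```python
import math

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def lam(x, y, x_prev, A, B, C, D):
    """One factor lambda_k of the Radon-Nikodym derivative (scalar case)."""
    num = phi((y - C * x) / D) * phi((x - A * x_prev) / B)
    return num / (D * phi(y) * B * phi(x))

# integrate lambda_k * phi(x) * phi(y) over a grid; the result should be 1
A, B, C, D, x_prev = 0.7, 0.9, 1.0, 0.8, 0.3
h, L = 0.05, 8.0
n = int(2 * L / h)
total = 0.0
for i in range(n):
    x = -L + (i + 0.5) * h
    for j in range(n):
        y = -L + (j + 0.5) * h
        total += lam(x, y, x_prev, A, B, C, D) * phi(x) * phi(y) * h * h
# total is very close to 1
```

The cancellation of φ(x) and ψ(y) against the reference densities is exactly why the conditional expectation of λ_k equals one.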
The following theorem gives recursive expressions for the β_k, γ_k and α_k. The recursions are derived under the measure P̄, where {x_l} and {y_l}, l ∈ Z^+, are independent sequences of random variables.
Theorem 4.1 For k ∈ Z^+, the un-normalized densities β_k^{ij(M)}, M = 0, 1, 2, γ_k^{in} and α_k defined in (11) are given by the following recursions:
β_k^{ij(0)}(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] [ ∫_{R^m} β_{k−1}^{ij(0)}(z) φ(B_k^{-1}(x − A_k z)) dz
  + (x, e_i)(x, e_j) ∫_{R^m} α_{k−1}(z) φ(B_k^{-1}(x − A_k z)) dz ]   (13)

β_k^{ij(1)}(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] [ ∫_{R^m} β_{k−1}^{ij(1)}(z) φ(B_k^{-1}(x − A_k z)) dz
  + (x, e_i) ∫_{R^m} (z, e_j) α_{k−1}(z) φ(B_k^{-1}(x − A_k z)) dz ]   (14)

β_k^{ij(2)}(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] [ ∫_{R^m} β_{k−1}^{ij(2)}(z) φ(B_k^{-1}(x − A_k z)) dz
  + ∫_{R^m} (z, e_i)(z, e_j) α_{k−1}(z) φ(B_k^{-1}(x − A_k z)) dz ]   (15)

γ_k^{in}(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] [ ∫_{R^m} γ_{k−1}^{in}(z) φ(B_k^{-1}(x − A_k z)) dz
  + (x, e_i)(y_k, e_n) ∫_{R^m} α_{k−1}(z) φ(B_k^{-1}(x − A_k z)) dz ]   (16)

α_k(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] ∫_{R^m} α_{k−1}(z) φ(B_k^{-1}(x − A_k z)) dz   (17)
PROOF. We prove (13); the proofs of (14), (15), (16) and (17) are very similar and hence omitted.
Since H_k^{ij(0)} = H_{k−1}^{ij(0)} + (x_k, e_i)(x_k, e_j), using the definition of β_k^{ij(0)} in (11) we have, for any Borel test function g,

∫ g(x) β_k^{ij(0)}(x) dx = Ē{ Λ_{k−1} H_{k−1}^{ij(0)} [ψ(D_k^{-1}(y_k − C_k x_k)) φ(B_k^{-1}(x_k − A_k x_{k−1})) / (|D_k| ψ(y_k) |B_k| φ(x_k))] g(x_k) | Y_k }
 + Ē{ Λ_{k−1} (x_k, e_i)(x_k, e_j) [ψ(D_k^{-1}(y_k − C_k x_k)) φ(B_k^{-1}(x_k − A_k x_{k−1})) / (|D_k| ψ(y_k) |B_k| φ(x_k))] g(x_k) | Y_k }

Under P̄, x_k is N(0, I_m) and independent of G_{k−1} and y_k; integrating out x_k against its density, the φ(x_k) factors cancel and the right-hand side becomes

[1/(|B_k||D_k|ψ(y_k))] [ ∫∫ β_{k−1}^{ij(0)}(z) ψ(D_k^{-1}(y_k − C_k x)) φ(B_k^{-1}(x − A_k z)) g(x) dx dz
 + ∫∫ α_{k−1}(z) (x, e_i)(x, e_j) ψ(D_k^{-1}(y_k − C_k x)) φ(B_k^{-1}(x − A_k z)) g(x) dx dz ]   (18)

where this equality follows from the independence of the x_k's and y_k's under P̄. Since g is an arbitrary Borel test function, equating the RHS of (12) with (18) proves (13).
□
Remarks:
1. By virtue of (17), we can rewrite (13) and (16) as:

β_k^{ij(0)}(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] ∫_{R^m} β_{k−1}^{ij(0)}(z) φ(B_k^{-1}(x − A_k z)) dz + (x, e_i)(x, e_j) α_k(x)   (19)

γ_k^{in}(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k| |D_k| ψ(y_k))] ∫_{R^m} γ_{k−1}^{in}(z) φ(B_k^{-1}(x − A_k z)) dz + (x, e_i)(y_k, e_n) α_k(x)   (20)
2. The above theorem does not require the v_l and w_l to be Gaussian. The recursions (17), (13) and (16) hold for arbitrary densities ψ and φ, as long as φ is strictly positive. We use the Gaussian assumption to derive the finite-dimensional filters in Sec. 5.
3. Initial conditions: Note that at k = 0, we have, for any arbitrary Borel test function g(x),

∫ g(x) α_0(x) dx = Ē{ λ_0 g(x_0) | Y_0 }   (21)

Equating (11) and (21) yields

α_0(x) = [ψ(D_0^{-1}(y_0 − C_0 x)) / (|D_0| ψ(y_0))] φ(x)   (22)

Similarly, the initial conditions for β_0^{ij(M)}, M = 0, 1, 2, and γ_0^{in} are

β_0^{ij(0)}(x) = (x, e_i)(x, e_j) α_0(x),   β_0^{ij(1)}(x) = 0,   β_0^{ij(2)}(x) = 0,   γ_0^{in}(x) = (x, e_i)(y_0, e_n) α_0(x)   (23)
5 Finite Dimensional Filters
In this section, finite-dimensional filters are derived for H_k^{ij(M)}, M = 0, 1, 2, and J_k^{in} defined in (10). In particular, we characterize the densities β_k^{ij(M)} and γ_k^{in} in terms of a finite number of sufficient statistics. Recursions are then derived for these statistics.
Define the conditional mean and covariance matrix of x_k, respectively, as μ_k = E{x_k | Y_k} and R_k = E{(x_k − μ_k)(x_k − μ_k)' | Y_k}.
The linearity of (1) and (2) implies that α_k(x) is an un-normalized normal density with mean and variance given by the standard Kalman filter equations.
Theorem 5.1 (Kalman Filter) For k = 0, 1, ..., α_k is an un-normalized Gaussian density with mean μ_k and covariance R_k given via the Kalman filter equations

μ_k = R_k B_k^{-2} A_k σ_k^{-1} R_{k−1}^{-1} μ_{k−1} + R_k C_k' (D_k D_k')^{-1} y_k   (24)

R_k = [ (A_k R_{k−1} A_k' + B_k^2)^{-1} + C_k' (D_k D_k')^{-1} C_k ]^{-1}   (25)

where μ_k ∈ R^m, R_k ∈ R^{m×m} and

σ_k = A_k' B_k^{-2} A_k + R_{k−1}^{-1}   (26)

Here μ_k = E{x_k | Y_k} and R_k = E{(x_k − μ_k)(x_k − μ_k)' | Y_k}.
PROOF. See [9]. □
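In the scalar case, the information-form equations (24)-(26) can be checked against the familiar covariance-form Kalman filter: both recursions should produce identical conditional means and covariances. A Python sketch (the parameter values are illustrative assumptions):

```python
def kf_standard(mu, R, y, A, B, C, D):
    """One step of the covariance-form Kalman filter (scalar model)."""
    Rp = A * A * R + B * B             # predicted covariance R_{k|k-1}
    mp = A * mu                        # predicted mean
    S = C * C * Rp + D * D             # innovation variance
    K = Rp * C / S                     # Kalman gain
    return mp + K * (y - C * mp), Rp - K * C * Rp

def kf_information(mu, R, y, A, B, C, D):
    """One step in the form of Theorem 5.1, eqs. (24)-(26), scalar case."""
    sig = A * A / (B * B) + 1.0 / R                             # eq. (26)
    Rn = 1.0 / (1.0 / (A * A * R + B * B) + C * C / (D * D))    # eq. (25)
    mn = Rn * (A / (B * B)) * (1.0 / sig) * (mu / R) + Rn * C * y / (D * D)  # eq. (24)
    return mn, Rn

mu1, R1 = 0.0, 1.0
mu2, R2 = 0.0, 1.0
for y in [0.5, -0.2, 1.1, 0.3]:
    mu1, R1 = kf_standard(mu1, R1, y, 0.8, 0.6, 1.0, 0.4)
    mu2, R2 = kf_information(mu2, R2, y, 0.8, 0.6, 1.0, 0.4)
# (mu1, R1) and (mu2, R2) agree at every step
```

The agreement follows from the scalar version of identity (67) below: (A/B^2) σ_k^{-1} = A R_{k−1} / R_{k|k−1}.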
Remark: Consequently, α_k has the form

α_k(x) = ᾱ_k (2π)^{-m/2} |R_k|^{-1/2} exp( −(x − μ_k)' R_k^{-1} (x − μ_k)/2 )   (27)

where ᾱ_k = ∫_{R^m} α_k(x) dx.
Due to the presence of the quadratic term (x, e_i)(x, e_j), the density β_k^{ij(0)} in (19) is not Gaussian. Is it possible to characterize the density β_k^{ij(0)} in terms of a finite number of sufficient statistics? Amazingly enough, the answer is "yes". As we prove below, it is possible to express β_k^{ij(0)} as a quadratic expression in x multiplied by α_k(x) for all k. The important conclusion then is that, by updating the coefficients of the quadratic expression together with the Kalman filter above, we have finite-dimensional filters for computing H_k^{ij(0)}. A similar result also holds for β_k^{ij(1)}, β_k^{ij(2)} and γ_k^{in}.
Theorems 5.2 and 5.3 that follow derive finite-dimensional sufficient statistics for the densities β_k^{ij(M)}, M = 0, 1, 2, and γ_k^{in}.
Theorem 5.2 At time k, the density β_k^{ij(M)}(x) (initialized according to (23)) is completely defined by the 5 statistics a_k^{ij(M)}, b_k^{ij(M)}, d_k^{ij(M)}, R_k and μ_k as follows:

β_k^{ij(M)}(x) = [ a_k^{ij(M)} + b_k^{ij(M)'} x + x' d_k^{ij(M)} x ] α_k(x)   (28)

where a_k^{ij(M)} ∈ R, b_k^{ij(M)} ∈ R^m and d_k^{ij(M)} ∈ R^{m×m} is a symmetric matrix with elements d_k^{ij(M)}(p, q), p = 1, ..., m, q = 1, ..., m.
Furthermore, a_k^{ij(M)}, b_k^{ij(M)} and d_k^{ij(M)} are given by the following recursions:

a_{k+1}^{ij(M)} = a_k^{ij(M)} + b_k^{ij(M)'} σ_{k+1}^{-1} R_k^{-1} μ_k + Σ_{p=1}^{m} Σ_{q=1}^{m} d_k^{ij(M)}(p, q) σ_{k+1}^{-1}(p, q)
  + μ_k' R_k^{-1} σ_{k+1}^{-1} d_k^{ij(M)} σ_{k+1}^{-1} R_k^{-1} μ_k,   a_0^{ij(M)} = 0   (29)

b_{k+1}^{ij(0)} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} ( b_k^{ij(0)} + 2 d_k^{ij(0)} σ_{k+1}^{-1} R_k^{-1} μ_k ),   b_0^{ij(0)} = 0_{m×1}   (30)

d_{k+1}^{ij(0)} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} d_k^{ij(0)} σ_{k+1}^{-1} A_{k+1}' B_{k+1}^{-2} + (e_i e_j' + e_j e_i')/2,   d_0^{ij(0)} = (e_i e_j' + e_j e_i')/2   (31)

b_{k+1}^{ij(1)} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} ( b_k^{ij(1)} + 2 d_k^{ij(1)} σ_{k+1}^{-1} R_k^{-1} μ_k ) + e_i e_j' σ_{k+1}^{-1} R_k^{-1} μ_k,   b_0^{ij(1)} = 0_{m×1}   (32)

d_{k+1}^{ij(1)} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} d_k^{ij(1)} σ_{k+1}^{-1} A_{k+1}' B_{k+1}^{-2}
  + (1/2)( e_i e_j' σ_{k+1}^{-1} A_{k+1}' B_{k+1}^{-2} + B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} e_j e_i' ),   d_0^{ij(1)} = 0_{m×m}   (33)

b_{k+1}^{ij(2)} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} ( b_k^{ij(2)} + (2 d_k^{ij(2)} + e_i e_j' + e_j e_i') σ_{k+1}^{-1} R_k^{-1} μ_k ),   b_0^{ij(2)} = 0_{m×1}   (34)

d_{k+1}^{ij(2)} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} ( d_k^{ij(2)} + (1/2)(e_i e_j' + e_j e_i') ) σ_{k+1}^{-1} A_{k+1}' B_{k+1}^{-2},   d_0^{ij(2)} = 0_{m×m}   (35)

(For M = 2, the recursion (29) holds with d_k^{ij(2)} replaced by d_k^{ij(2)} + (e_i e_j' + e_j e_i')/2, exactly as in (34) and (35).) Here σ_k is defined in (26), and μ_k, R_k are obtained from the Kalman filter (24) and (25).
PROOF. We only prove the theorem for M = 0; the proofs for M = 1, 2 are very similar and hence omitted.
We prove (28) by induction.
From (23), at time k = 0, β_0^{ij(0)}(x) is of the form (28) with a_0^{ij(0)} = 0, b_0^{ij(0)} = 0 and d_0^{ij(0)} = (e_i e_j' + e_j e_i')/2.
For convenience we drop the superscripts ij(0) on a_k, b_k, d_k and β_k. Assume that (28) holds at time k. Then at time k + 1, using (28) and the recursion (19), we have

β_{k+1}(x) = [ψ(D_{k+1}^{-1}(y_{k+1} − C_{k+1} x)) / (|B_{k+1}||D_{k+1}|ψ(y_{k+1}))] ∫_{R^m} φ(B_{k+1}^{-1}(x − A_{k+1} z)) (a_k + b_k' z + z' d_k z) α_k(z) dz + (x, e_i)(x, e_j) α_{k+1}(x)   (36)

Let us concentrate on the first term on the RHS, which we shall denote I_1. Substituting (27) for α_k(z) and collecting the terms in z in the exponent yields

I_1 = K_1(x) ∫_{R^m} exp( −(1/2)(z' σ_{k+1} z − 2 δ_{k+1}' z) ) (a_k + b_k' z + z' d_k z) dz   (37)

where σ_{k+1} is defined in (26),

δ_{k+1} = A_{k+1}' B_{k+1}^{-2} x + R_k^{-1} μ_k   (38)

and K_1(x) collects all factors not involving z. Completing the square in the exponential term in (37) yields

I_1 = K_1(x) exp( (1/2) δ_{k+1}' σ_{k+1}^{-1} δ_{k+1} ) ∫_{R^m} (a_k + b_k' z + z' d_k z) exp( −(1/2)(z − σ_{k+1}^{-1} δ_{k+1})' σ_{k+1} (z − σ_{k+1}^{-1} δ_{k+1}) ) dz   (39)

Now consider the integral in (39). Since the exp(·) term is an un-normalized Gaussian density in z with normalization constant (2π)^{m/2} |σ_{k+1}|^{-1/2}, the integral equals

(2π)^{m/2} |σ_{k+1}|^{-1/2} ( a_k + b_k' E{z} + E{z' d_k z} )   (40)

where

E{z} = σ_{k+1}^{-1} δ_{k+1}   (41)

E{z' d_k z} = E{(z − E{z})' d_k (z − E{z})} + E{z}' d_k E{z} = Σ_{p=1}^{m} Σ_{q=1}^{m} d_k(p, q) σ_{k+1}^{-1}(p, q) + E{z}' d_k E{z}   (42)

Therefore, from (39), (40), (41), (42) and (36), identifying the remaining factors with α_{k+1}(x) via (17) and (27), we have

β_{k+1}(x) = [ a_k + b_k' σ_{k+1}^{-1} δ_{k+1} + Σ_{p,q} d_k(p, q) σ_{k+1}^{-1}(p, q) + δ_{k+1}' σ_{k+1}^{-1} d_k σ_{k+1}^{-1} δ_{k+1} + (x, e_i)(x, e_j) ] α_{k+1}(x)   (43)

Substituting for δ_{k+1} (which is affine in x) in (43), we obtain

β_{k+1}(x) = [ a_{k+1} + b_{k+1}' x + x' d_{k+1} x ] α_{k+1}(x)

where a_{k+1}, b_{k+1} and d_{k+1} are given by (29), (30) and (31). □
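The recursions (29)-(31) can be sanity-checked in the scalar case m = d = 1 (so i = j = 1) by comparing the closed-form statistics against a brute-force numerical evaluation of the density recursions (17) and (19) on a grid. The following Python sketch (parameter values and observations are illustrative assumptions) computes E{H_k | Y_k} = ∫β_k dx / ∫α_k dx both ways:

```python
import math

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

A, B, C, D = 0.8, 0.7, 1.0, 0.5
ys = [0.4, -0.1, 0.9]

# closed-form statistics: Kalman filter plus scalar versions of (29)-(31)
R = 1.0 / (1.0 + C * C / (D * D))        # posterior variance after y_0 (N(0,1) prior)
mu = R * C * ys[0] / (D * D)             # posterior mean after y_0
a, b, d = 0.0, 0.0, 1.0                  # initial conditions from (23)
for y in ys[1:]:
    sig = A * A / (B * B) + 1.0 / R      # sigma_{k+1}, eq. (26)
    a = a + b * mu / (R * sig) + d / sig + d * (mu / (R * sig)) ** 2
    b = (A / (B * B)) / sig * (b + 2.0 * d * mu / (R * sig))
    d = d * (A / (B * B * sig)) ** 2 + 1.0
    Rp = A * A * R + B * B               # Kalman prediction/update
    K = Rp * C / (C * C * Rp + D * D)
    mu = A * mu + K * (y - C * A * mu)
    R = Rp - K * C * Rp
closed = a + b * mu + d * (R + mu * mu)  # eq. (47), scalar case

# brute force: recursions (17)/(19) on a grid (constants dropped; they
# cancel in the ratio of integrals)
h, L = 0.05, 8.0
xs = [-L + (i + 0.5) * h for i in range(int(2 * L / h))]
alpha = [phi((ys[0] - C * x) / D) * phi(x) for x in xs]
beta = [x * x * al for x, al in zip(xs, alpha)]
for y in ys[1:]:
    pa = [sum(al * phi((x - A * z) / B) for z, al in zip(xs, alpha)) * h for x in xs]
    pb = [sum(be * phi((x - A * z) / B) for z, be in zip(xs, beta)) * h for x in xs]
    alpha = [phi((y - C * x) / D) * v for x, v in zip(xs, pa)]
    beta = [phi((y - C * x) / D) * v + x * x * al
            for x, v, al in zip(xs, pb, alpha)]
grid = sum(beta) / sum(alpha)            # E{H_k | Y_k} as a ratio of integrals
# closed and grid agree closely
```

Agreement between `closed` and `grid` confirms that three scalars (a, b, d) plus the Kalman statistics really do summarize the non-Gaussian density β_k.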
The proof of the following theorem is very similar and hence omitted.
Theorem 5.3 The density γ_k^{in}(x) is completely determined by the 4 statistics ā_k^{in}, b̄_k^{in}, μ_k ∈ R^m and R_k ∈ R^{m×m} as follows:

γ_k^{in}(x) = [ ā_k^{in} + b̄_k^{in'} x ] α_k(x),   γ_0^{in}(x) = (x, e_i)(y_0, e_n) α_0(x)   (44)

where ā_k^{in} ∈ R, b̄_k^{in} ∈ R^m are given by the following recursions:

ā_{k+1}^{in} = ā_k^{in} + b̄_k^{in'} σ_{k+1}^{-1} R_k^{-1} μ_k,   ā_0^{in} = 0   (45)

b̄_{k+1}^{in} = B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} b̄_k^{in} + e_i (y_{k+1}, e_n),   b̄_0^{in} = e_i (y_0, e_n)   (46)
Having characterized the densities β_k^{ij(M)}(x), M = 0, 1, 2, and γ_k^{in}(x) by their finite sufficient statistics, we now derive finite-dimensional filters for H_k^{ij(M)} and J_k^{in}.
Theorem 5.4 Finite-dimensional filters for H_k^{ij(M)}, M = 0, 1, 2, and J_k^{in} are given by

E{H_k^{ij(M)} | Y_k} = a_k^{ij(M)} + b_k^{ij(M)'} μ_k + Σ_{p=1}^{m} Σ_{q=1}^{m} d_k^{ij(M)}(p, q) R_k(p, q) + μ_k' d_k^{ij(M)} μ_k   (47)

E{J_k^{in} | Y_k} = ā_k^{in} + b̄_k^{in'} μ_k   (48)

PROOF. Using the abstract Bayes rule (7), we have

E{H_k^{ij(M)} | Y_k} = (1/ᾱ_k) ∫_{R^m} β_k^{ij(M)}(x) dx   (49)

where the constant ᾱ_k = ∫_{R^m} α_k(x) dx. But since α_k(x) is an un-normalized Gaussian density with mean μ_k and covariance R_k, from (28)

∫_{R^m} β_k^{ij(M)}(x) dx = ᾱ_k E_{N(μ_k, R_k)}{ a_k^{ij(M)} + b_k^{ij(M)'} x + x' d_k^{ij(M)} x }   (50)

Substituting in (49) proves (47).
The proof of (48) is similar and hence omitted. □
6 General Filter for Higher Order Moments
Theorem 5.4 gives finite-dimensional filters for the time sum of the states, J_k^{in}, and the time sum of the squares of the states, H_k^{ij(M)}. In this section we show that finite-dimensional filters exist for the time sum of any arbitrary integral power of the states.
Assumption 6.1 For notational simplicity, in this section we assume that the state and observation processes are scalar valued, i.e. m = d = 1 in (1) and (2).
Let H_k be the time sum of the p-th power of the state:

H_k = Σ_{l=0}^{k} x_l^p   (51)

Our aim is to derive a finite-dimensional filter for H_k. Define the un-normalized density

β_k(x) = Ē{ Λ_k H_k I(x_k ∈ dx) | Y_k }   (52)

Our first step is to obtain a recursion for β_k(x). Using a proof similar to that of Theorem 4.1, we can show

β_k(x) = [ψ(D_k^{-1}(y_k − C_k x)) / (|B_k||D_k|ψ(y_k))] ∫ β_{k−1}(z) φ(B_k^{-1}(x − A_k z)) dz + x^p α_k(x)

Our task now is to characterize β_k(x) in terms of finite sufficient statistics. Recall that for p = 0, the Kalman filter state and covariance are sufficient statistics, as shown in Theorem 5.1. Also, for p = 1 and 2, Theorems 5.3 and 5.2 give finite-dimensional sufficient statistics. We now show that β_k can be characterized in terms of finite-dimensional statistics for any p ∈ Z^+.
Theorem 6.2 At time k, the density β_k(x) in (52) is completely defined by the p + 3 statistics a_k(0), a_k(1), ..., a_k(p), R_k and μ_k as follows:

β_k(x) = [ Σ_{i=0}^{p} a_k(i) x^i ] α_k(x)   (53)

where, with C(i, j) denoting the binomial coefficient,

a_{k+1}(n) = Σ_{i=n}^{p} Σ_{j=n}^{i} a_k(i) η_{ij} C(j, n) σ_{k+1}^{-j} (R_k^{-1} μ_k)^{j−n} (A_{k+1} B_{k+1}^{-2})^n,   0 ≤ n < p

a_{k+1}(p) = 1 + a_k(p) η_{pp} σ_{k+1}^{-p} (A_{k+1} B_{k+1}^{-2})^p   (54)

and

η_{ij} = C(i, j) · 1·3···(i − j − 1) · σ_{k+1}^{-(i−j)/2} if i − j is even, i ≥ j;   η_{ij} = 0 if i − j is odd, i > j   (55)
PROOF. As in Theorem 5.2, we give an inductive proof. At k = 0, β_0(x) = x^p α_0(x), and so satisfies (53). (These new definitions of H_k in (51) and β_k(x) in (52) are only used in this section.)
Assume that (53) holds at time k. Then at time k + 1, using arguments similar to those of Theorem 5.2, the first term on the RHS of the recursion for β_{k+1} equals

α_{k+1}(x) Σ_{i=0}^{p} a_k(i) E{z^i}

where the expectation is with respect to a scalar Gaussian distribution with mean E{z} = σ_{k+1}^{-1} δ_{k+1} and variance σ_{k+1}^{-1}, with δ_{k+1} = A_{k+1} B_{k+1}^{-2} x + R_k^{-1} μ_k. Expanding z^i about its mean,

E{z^i} = Σ_{j=0}^{i} C(i, j) E{(z − E{z})^{i−j}} (E{z})^j

Now recall from (41) that E{z} is affine in x. Also, E{(z − E{z})^{i−j}} is independent of x. Indeed ([11], pg. 111),

E{(z − E{z})^{i−j}} = 1·3···(i − j − 1) σ_{k+1}^{-(i−j)/2} if i − j is even, i ≥ j, and 0 if i − j is odd   (60)

Thus

β_{k+1}(x) = α_{k+1}(x) [ Σ_{i=0}^{p} Σ_{j=0}^{i} Σ_{n=0}^{j} a_k(i) η_{ij} C(j, n) σ_{k+1}^{-j} (R_k^{-1} μ_k)^{j−n} (A_{k+1} B_{k+1}^{-2})^n x^n + x^p ]   (61)

Eq. (61) is of the form (53) with a_{k+1}(i), i = 0, ..., p, given by (54). □
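The Gaussian central-moment formula (60) underpinning the recursion can be verified directly. The sketch below compares the double-factorial expression against numerical integration of a Gaussian density (the variance value is an illustrative assumption):

```python
import math

def central_moment(q, v):
    """E{(z - E z)^q} for Gaussian z with variance v:
    (q-1)!! * v^(q/2) for even q, 0 for odd q (cf. eq. (60))."""
    if q % 2 == 1:
        return 0.0
    out = float(v) ** (q // 2)
    for r in range(q - 1, 0, -2):      # double factorial (q-1)!! = 1*3*...*(q-1)
        out *= r
    return out

# numerical check against direct integration of the N(0, v) density
v, h, L = 0.6, 0.002, 8.0
n = int(2 * L / h)
numeric = {}
for q in (2, 3, 4, 6):
    acc = 0.0
    for i in range(n):
        z = -L + (i + 0.5) * h
        acc += (z ** q) * math.exp(-z * z / (2.0 * v))
    numeric[q] = acc * h / math.sqrt(2.0 * math.pi * v)
```

For v = 0.6 the formula gives 0.6, 0, 1.08 and 3.24 for q = 2, 3, 4, 6, and the numerical integrals match.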
7 Singular State Noise
The filters derived in Theorems 5.1, 5.2 and 5.3 have one major problem: they require B_k to be invertible. In practice (e.g. see Sec. 8), B_k is often not invertible.
In this section, we use a simple transformation that expresses our filters in terms of the inverse of the predicted Kalman covariance matrix. This inverse exists, even if B_k is singular, as long as a certain uniform controllability condition holds. Both the uniform controllability condition and the transformation we use are well known in the Kalman filter literature ([14], Chapter 7).
Define the Kalman predicted state covariance as R_{k|k−1} = E{(x_k − A_k μ_{k−1})(x_k − A_k μ_{k−1})' | Y_{k−1}}. It is straightforward to show that

R_{k|k−1} = B_k^2 + A_k R_{k−1} A_k'   (62)

Our first step is to provide a sufficient condition for R_{k|k−1} to be non-singular.
Definition 7.1 ([14], Chapter 7) The state-space model (1), (2) is said to be uniformly completely controllable if there exist a positive integer N_1 and positive constants α, β such that

α I ≤ C(k, k − N_1) ≤ β I for all k ≥ N_1   (63)

Here

C(k, k − N_1) = Σ_{l=k−N_1}^{k−1} Φ(k, l + 1) B_l B_l' Φ'(k, l + 1)   (64)

Φ(k_2, k_1) = A_{k_2} A_{k_2−1} ··· A_{k_1+1} if k_2 > k_1,   Φ(k_2, k_1) = I if k_2 = k_1   (65)

Lemma 7.2 If the dynamical system (1), (2) is uniformly completely controllable and R_0 > 0, then R_k and R_{k|k−1} are positive definite matrices (and hence nonsingular) for all k ≥ N_1.
PROOF. See [14], pg. 238, Lemma 7.3. □
Our aim now is to re-express the filters in Sec. 5 in terms of R_{k|k−1}. The following lemma is used in the sequel.

Lemma 7.3 Assume R_{k|k−1}^{-1} exists. Then, with σ_k defined in (26),

σ_k^{-1} = R_{k−1} − R_{k−1} A_k' R_{k|k−1}^{-1} A_k R_{k−1}   (66)

B_k^{-2} A_k σ_k^{-1} = R_{k|k−1}^{-1} A_k R_{k−1}   (67)
PROOF. Straightforward use of the matrix inversion lemma on (26) yields

σ_k^{-1} = R_{k−1} − R_{k−1} A_k' [B_k^2 + A_k R_{k−1} A_k']^{-1} A_k R_{k−1}   (68)

Substituting (62) in (68) proves (66). To prove (67), first note that

B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} = B_{k+1}^{-2} A_{k+1} R_k − B_{k+1}^{-2} A_{k+1} R_k A_{k+1}' R_{k+1|k}^{-1} A_{k+1} R_k

because A_{k+1} R_k A_{k+1}' = R_{k+1|k} − B_{k+1}^2 from (62). So

B_{k+1}^{-2} A_{k+1} σ_{k+1}^{-1} = B_{k+1}^{-2} A_{k+1} R_k − B_{k+1}^{-2} (R_{k+1|k} − B_{k+1}^2) R_{k+1|k}^{-1} A_{k+1} R_k = R_{k+1|k}^{-1} A_{k+1} R_k   (69)

□
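The identities (66) and (67) are easy to confirm numerically. The following Python sketch checks them on small 2 × 2 matrices (the values of A, B and R_{k−1} are illustrative assumptions, with B symmetric and nonsingular):

```python
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_inv(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]

def mat_add(X, Y, s=1.0):
    return [[X[i][j] + s * Y[i][j] for j in range(2)] for i in range(2)]

def mat_t(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

A = [[0.9, 0.2], [0.1, 0.7]]
B = [[1.0, 0.3], [0.3, 0.8]]          # symmetric, nonsingular
Rprev = [[0.5, 0.1], [0.1, 0.4]]      # R_{k-1}, positive definite

B2inv = mat_inv(mat_mul(B, B))                                         # B^{-2}
sigma = mat_add(mat_mul(mat_mul(mat_t(A), B2inv), A), mat_inv(Rprev))  # (26)
Rpred = mat_add(mat_mul(B, B), mat_mul(mat_mul(A, Rprev), mat_t(A)))   # (62)

# (66): sigma^{-1} = R_{k-1} - R_{k-1} A' R_{k|k-1}^{-1} A R_{k-1}
lhs66 = mat_inv(sigma)
rhs66 = mat_add(Rprev,
                mat_mul(mat_mul(mat_mul(mat_mul(Rprev, mat_t(A)),
                                        mat_inv(Rpred)), A), Rprev), s=-1.0)

# (67): B^{-2} A sigma^{-1} = R_{k|k-1}^{-1} A R_{k-1}
lhs67 = mat_mul(mat_mul(B2inv, A), mat_inv(sigma))
rhs67 = mat_mul(mat_mul(mat_inv(Rpred), A), Rprev)
```

Both pairs of matrices agree to machine precision, which is what allows B_k^{-2} to be eliminated from the filter recursions.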
Applying the above lemma to the filters derived in Sec. 5, we now express them in terms of R_{k|k−1} instead of B_k^{-2}. The following theorem summarizes the main result of this paper in terms of the new finite-dimensional filters:
Theorem 7.4 Assume that the linear dynamical system given by (1) and (2) is uniformly completely controllable, i.e. (63) holds. At time k, the density β_k^{ij(M)}(x) (defined in (11) and initialized according to (23)) is completely defined by the 5 statistics a_k^{ij(M)}, b_k^{ij(M)}, d_k^{ij(M)}, R_k and μ_k as follows:

β_k^{ij(M)}(x) = [ a_k^{ij(M)} + b_k^{ij(M)'} x + x' d_k^{ij(M)} x ] α_k(x),   M = 0, 1, 2

where a_k^{ij(M)} ∈ R, b_k^{ij(M)} ∈ R^m, d_k^{ij(M)} ∈ R^{m×m} is a symmetric matrix with elements d_k^{ij(M)}(p, q), p = 1, ..., m, q = 1, ..., m, and α_k(x) is an un-normalized Gaussian density with mean μ_k ∈ R^m and covariance R_k ∈ R^{m×m}.
Also, the density γ_k^{in}(x) (defined in (11)) is completely determined by the 4 statistics ā_k^{in}, b̄_k^{in}, μ_k and R_k as follows:

γ_k^{in}(x) = [ ā_k^{in} + b̄_k^{in'} x ] α_k(x),   γ_0^{in}(x) = (x, e_i)(y_0, e_n) α_0(x)

where ā_k^{in} ∈ R, b̄_k^{in} ∈ R^m.
Furthermore, μ_k, R_k, a_k^{ij(M)}, b_k^{ij(M)}, d_k^{ij(M)}, M = 0, 1, 2, ā_k^{in} and b̄_k^{in} are given by the following recursions (where σ_{k+1}^{-1} is given by (66)):

Kalman filter equations:

μ_k = A_k μ_{k−1} + R_{k|k−1} C_k' [C_k R_{k|k−1} C_k' + D_k D_k']^{-1} (y_k − C_k A_k μ_{k−1})

R_k = R_{k|k−1} − R_{k|k−1} C_k' [C_k R_{k|k−1} C_k' + D_k D_k']^{-1} C_k R_{k|k−1}

R_{k|k−1} = B_k^2 + A_k R_{k−1} A_k'   (70)

Finite-dimensional filters for a_k^{ij(M)}, b_k^{ij(M)}, d_k^{ij(M)}, M = 0, 1, 2, ā_k^{in} and b̄_k^{in}:

a_{k+1}^{ij(M)} = a_k^{ij(M)} + b_k^{ij(M)'} σ_{k+1}^{-1} R_k^{-1} μ_k + Σ_{p=1}^{m} Σ_{q=1}^{m} d_k^{ij(M)}(p, q) σ_{k+1}^{-1}(p, q)
  + μ_k' R_k^{-1} σ_{k+1}^{-1} d_k^{ij(M)} σ_{k+1}^{-1} R_k^{-1} μ_k,   a_0^{ij(M)} = 0   (71)

b_{k+1}^{ij(0)} = R_{k+1|k}^{-1} A_{k+1} R_k ( b_k^{ij(0)} + 2 d_k^{ij(0)} σ_{k+1}^{-1} R_k^{-1} μ_k ),   b_0^{ij(0)} = 0_{m×1}   (72)

d_{k+1}^{ij(0)} = R_{k+1|k}^{-1} A_{k+1} R_k d_k^{ij(0)} R_k A_{k+1}' R_{k+1|k}^{-1} + (e_i e_j' + e_j e_i')/2,   d_0^{ij(0)} = (e_i e_j' + e_j e_i')/2   (73)

b_{k+1}^{ij(1)} = R_{k+1|k}^{-1} A_{k+1} R_k ( b_k^{ij(1)} + 2 d_k^{ij(1)} σ_{k+1}^{-1} R_k^{-1} μ_k ) + e_i e_j' σ_{k+1}^{-1} R_k^{-1} μ_k,   b_0^{ij(1)} = 0_{m×1}   (74)

d_{k+1}^{ij(1)} = R_{k+1|k}^{-1} A_{k+1} R_k d_k^{ij(1)} R_k A_{k+1}' R_{k+1|k}^{-1}
  + (1/2)( e_i e_j' R_k A_{k+1}' R_{k+1|k}^{-1} + R_{k+1|k}^{-1} A_{k+1} R_k e_j e_i' ),   d_0^{ij(1)} = 0_{m×m}   (75)

b_{k+1}^{ij(2)} = R_{k+1|k}^{-1} A_{k+1} R_k ( b_k^{ij(2)} + (2 d_k^{ij(2)} + e_i e_j' + e_j e_i') σ_{k+1}^{-1} R_k^{-1} μ_k ),   b_0^{ij(2)} = 0_{m×1}   (76)

d_{k+1}^{ij(2)} = R_{k+1|k}^{-1} A_{k+1} R_k ( d_k^{ij(2)} + (1/2)(e_i e_j' + e_j e_i') ) R_k A_{k+1}' R_{k+1|k}^{-1},   d_0^{ij(2)} = 0_{m×m}   (77)

ā_{k+1}^{in} = ā_k^{in} + b̄_k^{in'} σ_{k+1}^{-1} R_k^{-1} μ_k,   ā_0^{in} = 0   (78)

b̄_{k+1}^{in} = R_{k+1|k}^{-1} A_{k+1} R_k b̄_k^{in} + e_i (y_{k+1}, e_n),   b̄_0^{in} = e_i (y_0, e_n)   (79)

(For M = 2, the recursion (71) holds with d_k^{ij(2)} replaced by d_k^{ij(2)} + (e_i e_j' + e_j e_i')/2, exactly as in (76) and (77).)
Finally, finite-dimensional filters for H_k^{ij(M)} and J_k^{in} (defined in (10)) are given by (47) and (48).
PROOF. First consider the Kalman filter equations (24) and (25). Using Lemma 7.3 on (25) gives

R_k = ( R_{k|k−1}^{-1} + C_k' (D_k D_k')^{-1} C_k )^{-1}   (80)

Using the matrix inversion lemma on (80), and applying (67) to the first term on the RHS of (24), yields the "standard" Kalman filter equations (70).
Similarly, applying (67) to the RHS of (30), (31) and (46) yields (72), (73) and (79). □
Singular State Noise
Subject to the uniform complete controllability condition (63), the filtering equations in Theorem 7.4 hold even if the matrices B_{k+1}
are singular. The argument proceeds in four steps:
1. Add ε N(0,1) noise to each component of x_{k+1}. This is done by replacing B_{k+1} in (1) with the non-singular matrix B_{k+1}^ε = B_{k+1} + ε I_m, where ε ∈ R. Denote the resulting state process x_k^ε.
2. Define R_{k+1|k}^ε as in (62), with B_{k+1} replaced by B_{k+1}^ε. Express the filters in terms of R_{k+1|k}^ε, as in Theorem 7.4.
3. As ε → 0, R_{k+1|k}^ε → R_{k+1|k}.
4. Then, using the bounded conditional convergence theorem (pg. 214, [15]), the conditional estimates of H_k^{ij(M)}(x^ε) and J_k^{in}(x^ε) converge to the conditional estimates of H_k^{ij(M)}(x) and J_k^{in}(x), respectively.
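Step 3 can be illustrated in the scalar case: the perturbed predicted covariance converges to the singular-noise predicted covariance as ε → 0. A minimal Python sketch (A and R are assumed values):

```python
def r_pred(B, A, R):
    """Scalar predicted covariance R_{k+1|k} = B^2 + A^2 R, eq. (62)."""
    return B * B + A * A * R

A, R = 0.9, 0.5
exact = r_pred(0.0, A, R)                       # singular state noise, B = 0
approx = [r_pred(eps, A, R) for eps in (0.1, 0.01, 0.001)]
# approx decreases toward exact = 0.405 as eps -> 0
```

Even with B = 0, the predicted covariance A^2 R remains strictly positive here, which is the scalar analogue of the controllability condition keeping R_{k+1|k} invertible.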
8 Filtered EM Algorithm for Gaussian State-Space Models
The aim of this section is to derive a filter-based EM algorithm for computing maximum likelihood parameter estimates of a linear Gaussian state-space system. The finite-dimensional filters of Section 7 are used in implementing the E-step, resulting in a filter-based EM algorithm. Consider the time-invariant version of the state-space model (1), (2):

x_{k+1} = A x_k + B w_{k+1}   (81)

y_k = C x_k + D v_k   (82)

Our aim is to compute ML estimates of the parameters θ = (A, B^2, C, DD') given the observation sequence y_0, y_1, ..., y_T. (Identifiability and consistency in special parametrized cases are discussed at the end of this section.) We do this via the EM algorithm.
EM Algorithm
Suppose we have observations {y_1, ..., y_T} available, where T is a fixed positive integer. Let {P_θ, θ ∈ Θ} be a family of probability measures on (Ω, F), all absolutely continuous with respect to a fixed probability measure P_0. The likelihood function for computing an estimate of the parameter θ based on the information available in Y_T is

L(θ) = E_0 { dP_θ / dP_0 | Y_T }

and the maximum likelihood estimate (MLE) is defined by

θ̂ ∈ argmax_{θ∈Θ} L(θ)

The EM algorithm is an iterative numerical method for computing the MLE. Let θ_0 be the initial parameter estimate. The EM algorithm generates a sequence of parameter estimates as follows. Each iteration of the EM algorithm consists of two steps:

Step 1 (E-step): Set θ̃ = θ_j, and compute Q(θ, θ̃), where

Q(θ, θ̃) = E_{θ̃} [ log (dP_θ / dP_{θ̃}) | Y_T ]

Step 2 (M-step): Find θ_{j+1} ∈ argmax_{θ∈Θ} Q(θ, θ̃)
The sequence generated {θ_j, j ≥ 0} gives non-decreasing values of L(θ_j), with equality if and only if θ_{j+1} = θ_j.
It is shown in the appendix that
Q(θ, θ̃) = −T log|B| − (T + 1) log|D| − (1/2) E{ Σ_{l=1}^{T} (x_l − A x_{l−1})' B^{-2} (x_l − A x_{l−1}) | Y_T }
 − (1/2) E{ Σ_{l=0}^{T} (y_l − C x_l)' (D D')^{-1} (y_l − C x_l) | Y_T } + E{R(θ̃) | Y_T}   (83)

where R(θ̃) does not involve θ, and the conditional expectations are evaluated under P_{θ̃}.
To implement the M-step, we set the derivatives ∂Q/∂θ = 0. This yields (using the identity ∂log|M|/∂M = M^{-1} for any non-singular matrix M):

A = H_T^{(1)} [H_T^{(2)}]^{-1}   (84)

B^2 = (1/T) E{ Σ_{l=1}^{T} (x_l − A x_{l−1})(x_l − A x_{l−1})' | Y_T } = (1/T) [ E{ Σ_{l=1}^{T} x_l x_l' | Y_T } − A H_T^{(1)'} − H_T^{(1)} A' + A H_T^{(2)} A' ]   (85)

C = J_T' [H_T^{(0)}]^{-1}   (86)

DD' = (1/(T + 1)) [ Σ_{l=0}^{T} y_l y_l' − (J_T' C' + C J_T) + C H_T^{(0)} C' ]   (87)

where, for M = 0, 1, 2, H_T^{(M)} ∈ R^{m×m} denotes the matrix with elements H̄_T^{ij(M)} = E{H_T^{ij(M)} | Y_T}, i, j ∈ {1, ..., m}. Also, J_T ∈ R^{m×d} denotes the matrix with elements J̄_T^{in} = E{J_T^{in} | Y_T}, i = 1, ..., m, n = 1, ..., d. The terms H̄_T^{ij(M)} and J̄_T^{in} are computed using (47) and (48), together with the filters in Theorem 7.4. Thus we have a filter-based EM algorithm.
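In the scalar case m = d = 1, the M-step updates (84)-(87) reduce to simple ratios of the sufficient statistics. The following Python sketch is a scalar reduction in which deterministic "statistics" built from an assumed toy sequence stand in for the smoothed conditional expectations; it illustrates the updates and the fact that the A-update minimizes the residual sum of squares:

```python
def m_step(H0, H1, H2, sum_xx, J, sum_yy, T):
    """Scalar-model M-step sketch of (84)-(87).
    H0 = E{sum_{l=0}^T x_l^2 | Y_T},   H1 = E{sum_{l=1}^T x_{l-1} x_l | Y_T},
    H2 = E{sum_{l=1}^T x_{l-1}^2 | Y_T}, sum_xx = E{sum_{l=1}^T x_l^2 | Y_T},
    J = E{sum_{l=0}^T x_l y_l | Y_T},  sum_yy = sum_{l=0}^T y_l^2."""
    A = H1 / H2                                         # (84)
    B2 = (sum_xx - 2.0 * A * H1 + A * A * H2) / T       # (85)
    C = J / H0                                          # (86)
    DD = (sum_yy - 2.0 * C * J + C * C * H0) / (T + 1)  # (87)
    return A, B2, C, DD

# deterministic sanity check on an assumed toy sequence
xs = [1.0, 0.8, 0.9, 0.5]
ys = [1.1, 0.7, 1.0, 0.4]
T = len(xs) - 1
H0 = sum(x * x for x in xs)
H1 = sum(xs[l - 1] * xs[l] for l in range(1, T + 1))
H2 = sum(xs[l - 1] ** 2 for l in range(1, T + 1))
sum_xx = sum(xs[l] ** 2 for l in range(1, T + 1))
J = sum(x * y for x, y in zip(xs, ys))
sum_yy = sum(y * y for y in ys)
A, B2, C, DD = m_step(H0, H1, H2, sum_xx, J, sum_yy, T)
```

With exact statistics, B2 equals the mean residual (1/T) Σ (x_l − A x_{l−1})^2, and no perturbation of A can make that residual smaller.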
Example: Errors-in-Variables Time Series
We use the filtered EM algorithm to estimate the parameters of the errors-in-variables time series example considered in [2] and [3].
Consider the scalar-valued AR(p) process s_k, k ∈ Z^+, defined as

s_k = Σ_{i=1}^{p} a_i s_{k−i} + w_k   (88)

where w_k is a white Gaussian process. Assume that we observe s_k indirectly via the scalar process

y_k = s_k + v_k,   v_k ~ N(0, σ_v^2)   (89)

where v_k is a white Gaussian process independent of w_k. The aim is to compute the ML estimate of the parameter vector θ = (a_1, ..., a_p, σ_w^2, σ_v^2).
We first re-express (88) and (89) in state-space form (81), (82), with d = 1, m = p + 1 and a = [a_1, ..., a_p].
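The companion-form embedding used here can be sketched as follows. The layout (state x_k = (s_k, ..., s_{k−p})', noise entering only the first component) mirrors the construction in the Matlab listing of Appendix B; the numeric values are illustrative assumptions:

```python
def ar_state_space(a, s, sv):
    """Companion-form embedding of a scalar AR(p) with observation noise.
    Assumed layout: state x_k = (s_k, ..., s_{k-p})', so m = p + 1, d = 1."""
    p = len(a)
    m = p + 1
    A = [[0.0] * m for _ in range(m)]
    A[0][:p] = list(a)                 # first row: AR coefficients
    for i in range(1, m):
        A[i][i - 1] = 1.0              # shift register for past values
    B = [[0.0] * m for _ in range(m)]
    B[0][0] = s                        # process noise enters first component only
    C = [[1.0] + [0.0] * p]            # y_k = s_k + observation noise
    D = [[sv]]
    return A, B, C, D

A, B, C, D = ar_state_space([0.5, -0.3], s=0.1, sv=0.05)
# with noise w in the first slot, x' = A x + B w reproduces
# s_k = a1 s_{k-1} + a2 s_{k-2} + w_k
x = [1.0, 2.0, 3.0]
w = 0.7
xn = [sum(A[i][j] * x[j] for j in range(3)) + B[i][0] * w for i in range(3)]
```

Here xn[0] equals 0.5·1.0 − 0.3·2.0 + 0.1·0.7 and the remaining components are the shifted past values, confirming the embedding. Note that B is singular, which is exactly why the singular-state-noise form of the filters (Sec. 7) is needed in practice.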
Using a similar procedure as above, it can be shown [2] that

Q(θ, θ̃) = −((T + 1)/2) log σ_v^2 − (1/(2 σ_v^2)) Σ_{l=0}^{T} E{ (y_l − s_l)^2 | Y_T } + (analogous terms in a and σ_w^2) + E{R(θ̃) | Y_T}   (90)

where R(θ̃) does not involve θ. So the M-step again yields closed-form parameter updates.
As shown in [2], Theorem 1 in [12] implies that the sequence of model estimates {θ_j}, j ∈ Z^+, from the EM algorithm is such that the sequence of likelihoods {L(θ_j)}, j ∈ Z^+, is monotonically increasing. It is straightforward to compactify the Euclidean parameter space Θ. Since L(θ) is continuous on this compact space, it is bounded from above; therefore the sequence {L(θ_j)} converges to some L*. Since the function Q(θ, θ_j) is continuous in both θ and θ_j, by Theorem 2 in [13], L* is a stationary value of L. To make sure that L* is a maximum value of the likelihood, it is necessary to try different initial values θ_0. Weak consistency of the maximum likelihood estimate is proved in [2] for this particular model (also see Chapter 4 in [16]).
Similar parametrized models are used in [1] and [6], and can be estimated via our filter-based EM algorithm.
9 Parallel Implementation of Filters
In this section we discuss the computational complexity of our filters and the resulting filter-based EM algorithm. In particular, we describe why our filter-based algorithm is suitable for parallel implementation, and propose a systolic processor implementation of the algorithm.
Sequential Complexity
We evaluate the computational cost and memory requirements of our filter-based algorithm and compare them with those of the standard smoother-based EM algorithm.
Computational Cost: The filter-based E-step requires computation, at each time k, of E{H_k^{ij(M)} | Y_k}, M = 0, 1, 2, and E{J_k^{in} | Y_k} for all pairs (i, j):
• a_{k+1}^{ij(M)}: Consider the RHS of the update equation (71). The computational costs for each (i, j) pair at each time instant k are:
2nd term: O(m) multiplications (inner product of two m-vectors);
3rd term: O(m^2) multiplications (matrix-vector multiplication);
4th term: O(m^2) multiplications (matrix-vector multiplication).
Since there are m^2 (i, j) pairs, the total complexity at each time instant is O(m^4).
• Similarly, the total complexity for evaluating b_{k+1}^{ij(M)} for all m^2 (i, j) pairs is O(m^4) multiplications.
• Evaluating d_{k+1}^{ij(M)} for each (i, j) pair requires m × m matrix-matrix multiplications. This involves O(m^3) complexity, so the total complexity for all m^2 (i, j) pairs is O(m^5) at each time instant.
In comparison, the Kalman smoother-based E-step in [2], [3] requires O(m^3) complexity at each time instant to compute E{H_T^{ij(M)} | Y_T}, M = 0, 1, 2, and E{J_T^{in} | Y_T} for all pairs (i, j).
Memory Requirements: In our filter-based EM algorithm, only the filtered variables at each time instant need to be stored in order to compute the variables at the next time instant; they can then be discarded. Hence the memory required is independent of the number of observations T.
In comparison, the Kalman smoother-based EM scheme in [2], [3] requires O(m^2 T) memory, since all the Kalman filter covariance matrices R_k, k ≤ T, need to be stored before the smoothed covariance matrices can be computed; see Eq. (2.12) in [2]. This also involves significant memory read-write overhead costs.
Parallel Implementation on Systolic Array Architecture
The following properties of our algorithm make it suitable for vector-processor or systolic-processor implementation:
1. The computation of a_k^{ij}, b_k^{ij} and d_k^{ij} for any pair (i, j) is independent of that for any other pair (i', j'), for all time k = 0, 1, .... So all the m^2 (i, j) components of these variables can be computed in parallel on m^2 processors.
Similarly, the computations of all (i, n) components of ā_k^{in} and b̄_k^{in} are mutually independent and can be done in parallel.
2. The recursions for a_k^{ij}, b_k^{ij}, d_k^{ij}, ā_k^{in} and b̄_k^{in} do not explicitly involve the observations. They only involve the Kalman filter variables μ_{k−1}, σ_k and R_{k|k−1}. Moreover, d_k^{ij} only involves R_k (and so R_{k|k−1}), which itself is independent of the observations and can be computed off-line for a given parameter set θ.
All the processor blocks used above are required to do a synchronous matrix-vector multiplication at every time instant k. An N × N matrix can be multiplied by an N-vector in N time units on an N-processor systolic array; see [17], pp. 216-220 for details. (It can also be done in unit time on N^2 processors.)
If τ is the time required for this matrix-vector multiplication, then for a T-point data sequence our filtered algorithm requires a total of τT time units per EM iteration. In comparison, a parallel implementation of the forward-backward smoother-based EM algorithm requires 2τT time units per EM iteration, because we need a minimum of τT time units to compute the forward variables and another τT units for the backward variables. For large T and a large number of EM iterations, this saving in time is quite considerable.
In addition, unlike our filtering method, which has negligible memory requirements, the forward-backward algorithm incurs significant memory read-write overhead costs, requiring m^2 T memory locations to be accessed for the stored forward variables while computing the backward variables.
Finally, we may add that our algorithm can easily be implemented in Single Instruction Multiple Data (SIMD) mode on a supercomputer in vectorization mode, or on the Connection Machine using FORTRAN 8X. Typically this requires a total of 10000 processor units on a Connection Machine, which typically has 2^16 = 65536 processors.
10 Conclusions and Future work
We have presented a new class of finite-dimensional filters for linear Gauss-Markov models that includes the Kalman filter as a special case. These filters were then used to derive a filter-based Expectation Maximization algorithm for computing maximum likelihood estimates of the parameters. It is possible to derive the new filters in continuous time using similar techniques; this is the subject of a companion paper [18]. Also, in practical applications it is worthwhile developing risk-sensitive versions of the filter, i.e. where the estimates are computed to minimize an exponential cost function rather than a quadratic cost function.
References
[1] R.H. Shumway and D.S. Stoffer, "An Approach to Time Series Smoothing and Forecasting using the EM Algorithm", J. Time Series Analysis, Vol. 3, No. 4, pp. 253-264, 1982.
[2] D. Ghosh, "Maximum Likelihood Estimation of the Dynamic Shock-Error Model", Journal of Econometrics, Vol. 41, No. 1, pp. 121-143, May 1989.
[3] E. Weinstein, A.V. Oppenheim, M. Feder and J.R. Buck, "Iterative and Sequential Algorithms for Multisensor Signal Enhancement", IEEE Trans. Signal Processing, Vol. 42, No. 4, pp. 846-859, April 1994.
[4] V. Krishnamurthy, "On-line Estimation of Dynamic Shock-Error Models", IEEE Trans. Automatic Control, Vol. 35, No. 5, pp. 1129-1134, 1994.
[5] N.S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice Hall, 1984.
[6] I. Ziskind and D. Hertz, "Maximum Likelihood Localization of Narrow-Band Autoregressive Sources via the EM Algorithm", IEEE Trans. Signal Processing, Vol. 41, No. 8, pp. 2719-2723, August 1993.
[7] L. Breiman, Probability, Classics in Applied Mathematics, Vol. 7, SIAM, Philadelphia, 1992.
[8] R.J. Elliott, "Exact Adaptive Filters for Markov Chains Observed in Gaussian Noise", Automatica, Vol. 30, No. 9, pp. 1399-1408, September 1994.
[9] L. Aggoun, R.J. Elliott and J.B. Moore, "A Measure Change Derivation of Continuous State Baum-Welch Estimators", 1995.
[10] R.J. Elliott, L. Aggoun and J.B. Moore, Hidden Markov Models: Estimation and Control, Springer-Verlag, 1995.
[11] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, 1984.
[12] A.P. Dempster, N.M. Laird and D.B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm", Journal of the Royal Statistical Society, Series B, 39, pp. 1-38, 1977.
[13] C.F.J. Wu, "On the Convergence Properties of the EM Algorithm", The Annals of Statistics, 11, pp. 95-103, 1983.
[14] A.H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, 1970.
[15] P. Billingsley, Probability and Measure, Second Edition, John Wiley, New York, 1986.
[16] E.J. Hannan and M. Deistler, The Statistical Theory of Linear Systems, Wiley, 1988.
[17] E.V. Krishnamurthy, Parallel Processing: Principles and Practice, Addison-Wesley, 1989.
[18] R.J. Elliott and V. Krishnamurthy, "New Finite Dimensional Filters for Estimation of Continuous-time Linear Gaussian Systems", submitted to SIAM Journal on Control and Optimization, 1995.
A Derivation of Q(θ, θ̃) in (83)
Consider the time-invariant state-space model given by (81), (82), with θ̃ = (Ã, B̃, C̃, D̃) denoting the current set of parameters.
We have shown in Sec. 3 how, starting from a measure P̄ under which the x_l and y_l are independent and normal, we can construct the measure P = P(θ̃) such that, under P(θ̃), the x and y sequences satisfy the dynamics (1) and (2). In fact,

dP(θ̃)/dP̄ |_{G_k} = Λ_k(θ̃)

Suppose θ = (A, B, C, D) is a second set of parameters. To change from one set of parameters θ̃ to θ, we must introduce the densities

Λ̄_k(θ̃, θ) = ∏_{l=0}^{k} λ̄_l

where

λ̄_0 = [ |D̃| ψ(D^{-1}(y_0 − C x_0)) ] / [ |D| ψ(D̃^{-1}(y_0 − C̃ x_0)) ]

λ̄_l = [ |B̃||D̃| φ(B^{-1}(x_l − A x_{l−1})) ψ(D^{-1}(y_l − C x_l)) ] / [ |B||D| φ(B̃^{-1}(x_l − Ã x_{l−1})) ψ(D̃^{-1}(y_l − C̃ x_l)) ],   l ≥ 1

The parameters of our model will be changed from θ̃ to θ if we set

dP(θ)/dP(θ̃) |_{G_k} = Λ̄_k(θ̃, θ)

In this case,

log dP(θ)/dP(θ̃) |_{G_k} = −k log|B| − (k + 1) log|D| − (1/2) Σ_{l=1}^{k} (x_l − A x_{l−1})' B^{-2} (x_l − A x_{l−1})
 − (1/2) Σ_{l=0}^{k} (y_l − C x_l)' (D D')^{-1} (y_l − C x_l) + R(θ̃)

where R(θ̃) does not involve any of the parameters θ.
Then evaluating Q(θ, θ̃) = E_{θ̃}{ log dP(θ)/dP(θ̃) | Y_T } for a fixed positive integer T yields (83).
Appendix B
%%%%%%%%%%%%%%%%
%% Matlab source code implementation of Filter-based EM algorithm for
%% multi-sensor signal enhancement
%%%%%%%%%%%%%%%%
%% Important variables:
%% most variables are named the same as in the patent document.
%% Exceptions are:
%% est : parameter estimates
%%
%%%%%%%%%%%%%%%%
%% External Functions used (source code listed below main program):
%% R.m, R_set.m, a.m, a_set.m, b.m, b_set.m, d.m, d_set.m, e.m, mu.m,
%% mu_set.m, x.m, x_set.m, y.m, y_set.m
%%%%%%%%%%%%%%%%%%
% data length
global T
T = 1000;
% passes
global PASSES
PASSES = 10;
% AR coefficients
arc = [.8];
% order of AR = p
p = length(arc);
global m
m = p+1;
% process noise std
s = .1;
% measurement noise std
sv = .05;
A = [arc, 0; eye(p), zeros(p,1)];
B = [s, zeros(1,p); zeros(p,1), zeros(p,p)];
BB = B*B';
C = [1, zeros(1,p); ones(1,p), 0];
D = sv*eye(2,2);
DD = D*D';
% initial x
x0 = [s*randn; zeros(p,1)];
% define x
global x_
x_ = zeros(m,T);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% read data y
load y;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% define Kalman Filter variables
global mu_ R_
mu_ = zeros(m,T);
R_ = zeros(m,T*m);   % R_{k|k}
% a
global a_
a_ = zeros(1,T*m*m);
% b
global b_
b_ = zeros(m,T*m*m);
% d
global d_
d_ = zeros(m,T*m*m*m);
% initial parameter estimate: perturb the true AR coefficients
A = [arc - (0.5 - rand(1,p))*.5, 0; eye(p), zeros(p,1)];
est(:,1) = A(1,1:p)';
for pass = 1:PASSES
  pass
  % Initialise Kalman Filter
  mu_set(1, y(1));
  R_set(1, ones(m) + eye(m)*1);
  % Initialise New Filter
  for i=1:m
    for j=1:m
      a_set(1,i,j,0);
    end
  end
  for i=1:m
    for j=1:m
      b_set(1,i,j,zeros(m,1));
    end
  end
  for i=1:m
    for j=1:m
      d_set(1,i,j,(e(i)*e(j)'+e(j)*e(i)')/2);
    end
  end
  % Kalman Filter & New finite dimensional Filters
  for k=2:T
    % Eq.70
    Rk = BB + A*R(k-1)*A';
    % Eq.66
    InvSig = R(k-1) - R(k-1)*A'*inv(Rk)*A*R(k-1);
    % Eq.70
    mu_set(k, A*mu(k-1) + Rk*C'*inv(C*Rk*C'+DD)*(y(k)-C*A*mu(k-1)));
    R_set(k, Rk - Rk*C'*inv(C*Rk*C'+DD)*C*Rk);
    InvR = inv(R(k-1));
    % Eq.71-73
    for i=1:m
      for j=1:m
        a_set(k,i,j, a(k-1,i,j) + b(k-1,i,j)'*InvSig*InvR*mu(k-1) ...
          + sum(sum(d(k-1,i,j).*InvSig)) ...
          + mu(k-1)'*InvR*InvSig*d(k-1,i,j)*InvSig*InvR*mu(k-1));
        b_set(k,i,j, inv(Rk)*A*R(k-1)*(b(k-1,i,j) + 2*d(k-1,i,j)*InvSig*InvR*mu(k-1)));
        d_set(k,i,j, inv(Rk)*A*R(k-1)*d(k-1,i,j)*R(k-1)*A'*inv(Rk) ...
          + 1/2*(e(i)*e(j)'+e(j)*e(i)'));
      end
    end
  end
  H = zeros(m,m);
  % Eq.47
  for i=1:m
    for j=1:m
      H(i,j) = a(T,i,j) + b(T,i,j)'*mu(T) + sum(sum(d(T,i,j).*R(T))) ...
        + mu(T)'*d(T,i,j)*mu(T);
    end
  end
  for i=1:m-1
    for j=i+1:m
      H(j,i) = H(i,j);
    end
  end
  E = zeros(p,p);
  F = zeros(p,1);
  for i=1:p
    for j=1:p
      E(i,j) = H(i+1,j+1);
    end
  end
  for i=1:p
    F(i,1) = H(1,i+1);
  end
  G = inv(E)*F;
  % re-estimated AR coefficients (M-step)
  A = [G', 0; eye(p), zeros(p,1)];
  %%%%%%%%%%%%%%%%
  %% est denotes the parameter estimates
  %%%%%%%%%%%%%%%
  est(:,pass+1) = G;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Matlab source code listing of functions used in main program
% for Filter-based EM algorithm
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% R.m
%%%%%%%%%%%%%
function [y] = R(t)
global R_ m;
y = R_(:,(t-1)*m+1:t*m);
%%%%%%%%%%%%%
% R_set.m
%%%%%%%%%%%%
function [] = R_set(t,z)
global R_ m;
R_(:,(t-1)*m+1:t*m) = z;
%%%%%%%%%%%%
% a.m
%%%%%%%%%%%%
function [y] = a(t,i,j)
global a_ m T;
y = a_(1,(t-1)*m*m + (i-1)*m + j);
%%%%%%%%%%%%
% a_set.m
%%%%%%%%%%%%%
function [] = a_set(t,i,j,z)
global a_ T m;
a_(1,(t-1)*m*m + (i-1)*m + j) = z;
%%%%%%%%%%%%%
% b.m
%%%%%%%%%%%%
function [y] = b(t,i,j)
global b_ m T;
y = b_(:,(t-1)*m*m + (i-1)*m + j);
%%%%%%%%%%%
% b_set.m
%%%%%%%%%%%
function [] = b_set(t,i,j,z)
global b_ T m;
b_(:,(t-1)*m*m + (i-1)*m + j) = z;
%%%%%%%%%%
% d.m
%%%%%%%%%%
function [y] = d(t,i,j)
global d_ m T;
y = d_(:,(t-1)*m*m*m + (i-1)*m*m + (j-1)*m+1 : (t-1)*m*m*m + (i-1)*m*m + j*m);
%%%%%%%%%%
% d_set.m
%%%%%%%%%%
function [] = d_set(t,i,j,z)
global d_ T m;
d_(:,(t-1)*m*m*m + (i-1)*m*m + (j-1)*m+1 : (t-1)*m*m*m + (i-1)*m*m + j*m) = z;
%%%%%%%%%%
% e.m
%%%%%%%%%%
function [y] = e(i)
global m;
y = zeros(m,1);
y(i) = 1;
%%%%%%%%%%%
% mu.m
%%%%%%%%%%%
function [y] = mu(t)
global mu_;
y = mu_(:,t);
%%%%%%%%%%%
% mu_set.m
%%%%%%%%%%%
function [] = mu_set(t,z)
global mu_;
mu_(1:length(z),t) = z;
%%%%%%%%%%
% x.m
%%%%%%%%%%
function [y] = x(t)
global x_;
y = x_(:,t);
%%%%%%%%%
% x_set.m
%%%%%%%%%
function [] = x_set(t,z)
global x_;
x_(:,t) = z;
%%%%%%%%%%
% y.m
%%%%%%%%%%
function [z] = y(t)
global y_;
z = y_(:,t);
%%%%%%%%%%%
% y_set.m
%%%%%%%%%%%
function [] = y_set(t,z)
global y_;
y_(:,t) = z;
APPENDIX C
The following model for the spot price of oil, S, was proposed by Eduardo Schwartz in his Presidential Address to the American Finance Association in January 1997:

dS_t = (μ − δ_t) S_t dt + σ₁ S_t dz₁(t)   (1)
Here, z₁ is a standard Brownian motion, and δ_t represents the "convenience yield" (which models the value of holding amounts of the commodity). In fact δ_t follows similar stochastic dynamics of the form
dδ_t = κ(α − δ_t) dt + σ₂ dz₂(t)   (2)
Here, z₂ is a second standard Brownian motion with ⟨z₁(t), z₂(t)⟩ = ρt.
It is convenient to consider the logarithm of the spot price, x_t = ln S_t. Then, by Itô's rule, x satisfies

dx_t = (μ − δ_t − σ₁²/2) dt + σ₁ dz₁(t)   (3)
If r is the risk-free interest rate (taken to be constant here) and λ is the market price of convenience yield risk (also assumed constant), S and δ follow similar processes under an equivalent martingale measure.
However, it is equations (2) and (3) which give discrete dynamics for the state vector (x_t, δ_t)' as:

(x_t, δ_t)' = c_t + Q (x_{t−1}, δ_{t−1})' + η_t   (4)

Here Q is the 2×2 matrix

Q = [ 1    −Δt
      0    1 − κΔt ]
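As an illustration (not part of the patent's listing, and with hypothetical parameter values), the discretised dynamics (4) can be stepped forward directly. In the sketch below, `mu`, `kappa`, `alpha`, `s1`, `s2` and `rho` stand for the model constants μ, κ, α, σ₁, σ₂ and ρ, and each step applies the Euler discretisation of (1) and (2) to the log price x_t = ln S_t:

```python
import math
import random

def simulate(x0, delta0, mu, kappa, alpha, s1, s2, rho, dt, n, rng):
    """Euler discretisation of the Schwartz two-factor dynamics:
    x_t     = x_{t-1} + (mu - delta_{t-1} - s1^2/2)*dt + s1*sqrt(dt)*w1
    delta_t = delta_{t-1} + kappa*(alpha - delta_{t-1})*dt + s2*sqrt(dt)*w2
    with corr(w1, w2) = rho, matching Q = [[1, -dt], [0, 1 - kappa*dt]]."""
    x, d = x0, delta0
    path = [(x, d)]
    for _ in range(n):
        w1 = rng.gauss(0.0, 1.0)
        w2 = rho * w1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        x, d = (x + (mu - d - 0.5 * s1 * s1) * dt + s1 * math.sqrt(dt) * w1,
                d + kappa * (alpha - d) * dt + s2 * math.sqrt(dt) * w2)
        path.append((x, d))
    return path
```

With the volatilities set to zero the recursion is deterministic, and the convenience yield decays geometrically towards α at rate (1 − κΔt) per step, as the Q matrix above indicates.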
The futures price for oil delivery at time T > 0 is given by:

F(S, δ, T) = S exp( −δ (1 − e^{−κT}) / κ + A(T) )

where A(T) is a known deterministic function of the model parameters. Here S is the spot price today (T = 0) and δ is the value of the convenience yield today (T = 0).
Consequently:

ln F(S, δ, T) = ln S − δ (1 − e^{−κT}) / κ + A(T)   (5)
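Equation (5) is straightforward to evaluate. The sketch below treats A(T), which this appendix does not reproduce, as a precomputed number `A_of_T` supplied by the caller:

```python
import math

def log_futures(lnS, delta, T, kappa, A_of_T):
    """Equation (5): ln F(S, delta, T) = ln S - delta*(1 - e^{-kappa*T})/kappa + A(T)."""
    return lnS - delta * (1.0 - math.exp(-kappa * T)) / kappa + A_of_T
```

At T = 0 the convenience-yield term vanishes, so ln F = ln S + A(0); for long maturities the loading on δ tends to 1/κ.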
It is these futures prices, for various maturities T, which are quoted in the market. That is, for different dates T₁, T₂, ..., T_N we have observations

y_t^i = ln F(S_t, δ_t, t + T_i),   i = 1, ..., N
These observations give the right-hand side of (5), plus some "noise" term ε_t ∈ ℝ^N, where ε_t = (ε_t¹, ε_t², ..., ε_t^N)', t = 0, 1, 2, ..., is a sequence of independent Gaussian random variables with

E[ε_t] = 0 ∈ ℝ^N   and   var ε_t = E[ε_t ε_t'] = H ∈ ℝ^{N×N}
The observation equation (5), plus the ε_t noise on the right side, therefore has the form:

y_t = d_t + Z_t (x_t, δ_t)' + ε_t   for t = 1, 2, ..., T   (6)

where the components of y_t are the (log) futures prices at time t, for delivery at times t + T₁, t + T₂, ..., t + T_N.
The model in summary has dynamics (4) for the "signal" (x_t, δ_t)':

(x_t, δ_t)' = c + Q (x_{t−1}, δ_{t−1})' + η_t   (7)

and dynamics (6) for the observations:

y_t = d + Z (x_t, δ_t)' + ε_t   (8)
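Matching (5) to the linear observation equation (8), row i of Z has entries [1, −(1 − e^{−κT_i})/κ] and the i-th entry of d is A(T_i). A minimal sketch (the A(T_i) values are assumed precomputed and passed in):

```python
import math

def obs_matrices(maturities, kappa, A_values):
    """Build the d vector and Z matrix of equation (8) from equation (5)."""
    Z = [[1.0, -(1.0 - math.exp(-kappa * T)) / kappa] for T in maturities]
    d = list(A_values)  # d_i = A(T_i), supplied precomputed
    return d, Z
```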
In spite of Schwartz's notation, c, Q, d and Z do not depend on t. They do depend on Δt, the time increment of fixed size.
Equations (7) and (8) are of the form to which the Kalman filter can be applied. This considers linear dynamics for the signal

x_{t+1} = Ā + A x_t + B ω_{t+1},   Ā ∈ ℝⁿ,   t = 0, 1, ...   (9)

and observations

y_t = C̄ + C x_t + D ν_t,   t = 0, 1, ...   (10)
One observes y_t, t = 0, 1, ..., T, ... and wishes to make the best estimate of x_t. This is the quantity

x̂_t = E[x_t | y₀, ..., y_t]

In fact x_t is also a Gaussian random variable given y₀, ..., y_t, with conditional mean μ_t and conditional covariance R_t.
In fact, the formulas are better written in terms of the one-step predictions:

μ_{k+1|k} = Ā + A μ_k
R_{k+1|k} = E[(x_{k+1} − μ_{k+1|k})(x_{k+1} − μ_{k+1|k})' | y₀, ..., y_k]

Then

R_{k+1|k} = BB' + A R_k A'

The Kalman filter then gives recursive up-dates:

μ_{k+1} = Ā + A μ_k + R_{k+1|k} C' (C R_{k+1|k} C' + DD')⁻¹ (y_{k+1} − C̄ − C Ā − C A μ_k)   (11)

R_{k+1} = R_{k+1|k} − R_{k+1|k} C' (C R_{k+1|k} C' + DD')⁻¹ C R_{k+1|k}   (12)
As stated, μ_k = E[x_k | y₀, ..., y_k] is the conditional mean, or best estimate, of x_k given y₀, ..., y_k. Similarly, R_k = E[(x_k − μ_k)(x_k − μ_k)' | y₀, ..., y_k].
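In the scalar case (one-dimensional state and observation) the update (11)-(12) reduces to a few lines. The following is a minimal sketch, not the multidimensional implementation of the main program:

```python
def kalman_step(mu, R, y, A, Abar, B, C, Cbar, D):
    """One scalar Kalman update, equations (11) and (12)."""
    Rp = B * B + A * A * R            # R_{k+1|k} = BB' + A R_k A'
    S = C * C * Rp + D * D            # innovation variance C R_{k+1|k} C' + DD'
    K = Rp * C / S                    # Kalman gain
    mu_new = Abar + A * mu + K * (y - Cbar - C * Abar - C * A * mu)
    R_new = Rp - Rp * C * C * Rp / S  # equation (12)
    return mu_new, R_new
```

With C = 1 and D = 0 the update reproduces the observation exactly and the conditional variance collapses to zero, as expected for noiseless measurements.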
However, to implement the Kalman filter, knowledge of the parameters A, Ā, B, C, C̄, D is required.
The filter-based EM process, using the finite-dimensional filter, can advantageously be used to estimate these parameters, as set out below.
Consider the following recursions for 1 ≤ i, j ≤ m. The quantities a_k^{ij} ∈ ℝ, b_k^{ij} ∈ ℝᵐ and d_k^{ij} ∈ ℝ^{m×m} are initialised by a₁^{ij} = 0, b₁^{ij} = 0 and d₁^{ij} = ½(e_i e_j' + e_j e_i'), where e_i denotes the i-th unit vector; analogous recursions hold for the related quantities a_k^{(M)}, b_k^{(M)}, d_k^{(M)}, M = 0, 1, 2, and u_k, v_k.

a_k^{ij} = a_{k−1}^{ij} + b_{k−1}^{ij'} Σ_{k−1} R_{k−1}⁻¹ μ_{k−1} + tr(d_{k−1}^{ij} Σ_{k−1}) + μ_{k−1}' R_{k−1}⁻¹ Σ_{k−1} d_{k−1}^{ij} Σ_{k−1} R_{k−1}⁻¹ μ_{k−1}

b_k^{ij} = R_{k|k−1}⁻¹ A R_{k−1} (b_{k−1}^{ij} + 2 d_{k−1}^{ij} Σ_{k−1} R_{k−1}⁻¹ μ_{k−1})

d_k^{ij} = R_{k|k−1}⁻¹ A R_{k−1} d_{k−1}^{ij} R_{k−1} A' R_{k|k−1}⁻¹ + ½(e_i e_j' + e_j e_i')

where Σ_{k−1} = R_{k−1} − R_{k−1} A' R_{k|k−1}⁻¹ A R_{k−1}.
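A scalar (m = 1, i = j = 1) sketch of these recursions may help. Below, `Rp` is the one-step predicted variance R_{k|k−1} = B² + A²R_{k−1} and `Sig` is Σ_{k−1}; the quantity H_k = a_k + b_k μ_k + d_k(R_k + μ_k²) then recovers E[Σ_{l≤k} x_l² | y] (a sketch only; the patent works with the full matrix case):

```python
def abd_step(a, b, d, mu, R, A, B):
    """Advance the finite-dimensional filter quantities a_k, b_k, d_k
    given the filtered mean mu and variance R at time k-1 (scalar state)."""
    Rp = B * B + A * A * R                 # R_{k|k-1}
    Sig = R - R * A * (1.0 / Rp) * A * R   # Sigma_{k-1}
    g = Sig / R                            # Sigma_{k-1} * R_{k-1}^{-1}
    a_new = a + b * g * mu + d * Sig + (g * mu) * d * (g * mu)
    b_new = (A * R / Rp) * (b + 2.0 * d * g * mu)
    d_new = (A * R / Rp) ** 2 * d + 1.0    # + (e_i e_j' + e_j e_i')/2 = 1 for m = 1
    return a_new, b_new, d_new

def H_of(a, b, d, mu, R):
    """Scalar form of the H entries: a + b*mu + tr(d*R) + mu'*d*mu."""
    return a + b * mu + d * (R + mu * mu)
```

One step of this filter, started from the initialisation a₁ = 0, b₁ = 0, d₁ = 1, agrees with the sum of smoothed second moments computed by a fixed-interval smoother, which is the defining property of H_k.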
The expectation values required by the E-step are then given directly by the filtered quantities at time T. For example, the (i, j) entry of H_T = E[Σ_k x_k x_k' | y₀, ..., y_T] is

H_T(i, j) = a_T^{ij} + b_T^{ij'} μ_T + tr(d_T^{ij} R_T) + μ_T' d_T^{ij} μ_T

These equations represent the recursive finite-dimensional filters for estimating the matrices and vectors H_k^{(M)}, M = 0, 1, 2, J_k, L_k and L̄_k given the observations. The revised estimates for the parameters A, B, C, D, Ā, C̄ are then obtained from these expectations (once given y₀, ..., y_T).
Given observations y₀, ..., y_T the parameters are initialised and the above filters are used to re-estimate the parameters one at a time. With the same data y₀, ..., y_T this process is iterated until some convergence is observed.
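For comparison only: the classical E-step computes the same expectations with a forward Kalman pass followed by a backward (Rauch-Tung-Striebel) smoother, which is precisely the smoothing the filter-based process avoids. A scalar sketch of one smoother-based EM re-estimate of the transition parameter (here called `a`, with `q` and `r` the state and observation noise variances; all names are chosen for this sketch):

```python
import random

def kalman_forward(ys, a, q, c, r, mu0, P0):
    """Forward Kalman pass: filtered means/variances and predicted variances."""
    mus, Ps, Pps = [], [], []
    mu, P = mu0, P0
    for y in ys:
        mup, Pp = a * mu, a * a * P + q      # predict
        S = c * c * Pp + r
        K = Pp * c / S
        mu = mup + K * (y - c * mup)         # update
        P = Pp - K * c * Pp
        mus.append(mu); Ps.append(P); Pps.append(Pp)
    return mus, Ps, Pps

def rts_smooth(mus, Ps, Pps, a):
    """Rauch-Tung-Striebel smoother; also returns lag-one covariances."""
    n = len(mus)
    ms, Vs = mus[:], Ps[:]
    Js = [Ps[k] * a / Pps[k + 1] for k in range(n - 1)]
    for k in range(n - 2, -1, -1):
        ms[k] = mus[k] + Js[k] * (ms[k + 1] - a * mus[k])
        Vs[k] = Ps[k] + Js[k] ** 2 * (Vs[k + 1] - Pps[k + 1])
    V1 = [Js[k - 1] * Vs[k] for k in range(1, n)]   # Cov(x_k, x_{k-1} | all data)
    return ms, Vs, V1

def em_update_a(ys, a, q, c, r, mu0, P0):
    """One EM re-estimate of the state-transition parameter a."""
    mus, Ps, Pps = kalman_forward(ys, a, q, c, r, mu0, P0)
    ms, Vs, V1 = rts_smooth(mus, Ps, Pps, a)
    S10 = sum(V1[k - 1] + ms[k] * ms[k - 1] for k in range(1, len(ys)))
    S00 = sum(Vs[k] + ms[k] ** 2 for k in range(len(ys) - 1))
    return S10 / S00
```

Iterating `em_update_a` on the same data drives `a` towards its maximum likelihood value, mirroring the pass loop of the main program.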