EP2011114A1 - Signal analysis method with a non-Gaussian autoregressive model - Google Patents

Signal analysis method with a non-Gaussian autoregressive model

Info

Publication number
EP2011114A1
EP2011114A1 (application EP07722544A)
Authority
EP
European Patent Office
Prior art keywords
model
signal
gaussian
state
analysis method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP07722544A
Other languages
English (en)
French (fr)
Inventor
Li Chunjian
Søren Vang ANDERSEN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aalborg Universitet AAU
Original Assignee
Aalborg Universitet AAU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aalborg Universitet AAU filed Critical Aalborg Universitet AAU
Publication of EP2011114A1
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]

Definitions

  • the present invention relates to the field of signal processing, more specifically the invention relates to signal modelling and system identification, e.g. blind system identification as used within speech analysis or communication channels with inter-symbol interference.
  • a wide range of signal processing applications and methods perform signal analysis as an important step.
  • a parametric analysis of a signal consists of designing a signal model and identifying the model parameters.
  • Blind system identification of non-Gaussian auto-regressive (AR) models is known in the art, e.g. within speech analysis. Examples of such methods are:
  • the known methods have a number of disadvantages.
  • the EMAX method as described in 1), tries to solve the parameter estimation problem using a Maximum Likelihood (ML) criterion.
  • ML Maximum Likelihood
  • the SAR model as described in 2), does not exploit the finite state structure in the excitation since it assumes the mean of the excitation process to be constantly zero. Also, the way it updates the AR filter coefficients is inefficient because in many signals the AR filter coefficients change more slowly than the excitation.
  • the invention provides a signal analysis method including a non-Gaussian auto-regressive model, wherein an input to the auto-regressive model is modelled as a sequence of symbols from a finite alphabet by a finite state stochastic model (FSSM), where the probability density functions of the input at each time instant are Gaussian probability density functions having the same variance for each symbol and with their means representing the values of the symbols.
  • a preferred embodiment of the FSSM is the Hidden Markov Model (HMM).
  • HMM Hidden Markov Model
  • GMM Gaussian Mixture Model
  • HMARM Hidden Markov-Auto Regressive Model
  • GMARM Gaussian Mixture-Auto Regressive Model
  • This method is advantageous for implementation in algorithms for many applications that involve signal analysis using blind system identification of a source-filter type model, e.g. within speech analysis/processing or telecommunication. Due to the design of the HMARM/GMARM with the constraint on the variance of the emission pdfs, an exact ML parameter estimation can be done by solving a set of linear equations iteratively. Thus, the method is computationally efficient and therefore also suited for miniature equipment and real-time applications, e.g. hearing devices and mobile communication devices, where only limited processing power is available and real-time performance is critical.
  • the FSSM may in principle have any finite number of states, i.e. the number being an integer larger than one.
  • the FSSM is a two-state model.
  • the method performs an expectation-maximization (EM) algorithm for system identification.
  • the EM algorithm may involve performing an optimal smoothing on the signal with a multi-state minimum mean-square error smoother as the E-step.
  • a multi-state minimum mean-square error smoother is a soft-decision switching Kalman filter. The smoothing reduces noise in the signal and gives all the necessary statistics that are needed in the M-step.
  • the signal analysis method according to the first aspect may form part of an algorithm for applications such as: speech analysis, speech enhancement, blind source separation, blind channel equalization, blind channel estimation, or blind system identification.
  • the invention provides a device including a processor arranged to perform the method according to the first aspect.
  • the device may be one of: a mobile phone, a communication device, an internet telephony system, sound recording equipment, sound processing equipment, sound editing equipment, broadcasting sound equipment, and a monitoring system.
  • the device may also be one of: a hearing aid, a headset, an assistive listening device, an electronic hearing protector, and a headphone with a built-in microphone.
  • the device may be arranged to receive a telecommunication signal, wherein the device includes a channel equalizer arranged to perform a blind channel equalization according to a method of the first aspect.
  • the invention provides a computer executable program code adapted to perform the method according to the first aspect.
  • the program may be present in a computer or processor memory or be represented as data on a data carrier, e.g. a hard disk or mobile data carrier (CD, DVD, memory card or memory stick, etc.).
  • FIG. 1 illustrating a generative data structure of a HMARM embodiment
  • FIG. 2 illustrating a generative data structure of an E-HMARM embodiment
  • Fig. 3 illustrating the shift invariance property of the HMARM method of the invention compared to the LPC method as the prior art.
  • the two graphs of Fig. 3 show: (a) log-spectral distortion (LSD) of AR spectra of 50 shifted frames, (b) the synthetic signal waveform used in the experiment resulting in (a). Figs. 4a and 4b illustrate four panels of graphs with AR spectra estimated for different vowels, comparing the result of prior-art LPC analysis with the result of the HMARM according to the invention, each panel illustrating: HMARM (top), LPC analysis (middle), and the original signal waveform (bottom),
  • Fig. 5 illustrating prediction residuals by the HMARM (top), by LPC analysis (middle), the example signal waveform being shown in the bottom graph
  • Fig. 7 illustrating an example of recovered symbol sequences, where dots indicate transmitted symbols, circles indicate recovered symbols by a HMARM embodiment, while stars indicate recovered symbols by a least-squares (LS) method,
  • Fig. 8 illustrating true and estimated spectra with the HMARM and the LS method for the example shown in Fig. 7, where it is noted that the HMARM result is overlapping with the true spectrum and is therefore hardly visible,
  • FIG. 9 illustrating another example of recovered symbol sequences with the same type of indications as in Fig. 7,
  • Fig. 10 illustrating true and estimated spectra for the example of Fig. 9, where the difference between the HMARM result and the true spectrum is only marginal (SNR is 15 dB),
  • FIG. 11 illustrating yet another example of recovered symbol sequences with the same type of indications as in Fig. 7,
  • Fig. 12 illustrating true and estimated spectra for the example of Fig. 11, and again the difference between the HMARM result and the true spectrum is hardly visible (SNR is 18 dB)
  • Fig. 13 illustrating a block diagram of a generative signal model used in a preferred signal analysis embodiment
  • FIG. 14 illustrating a preferred headset with speech enhancement.
  • the finite-state stochastic model is embodied as a hidden Markov model (HMM), and an exact expectation-maximization (EM) algorithm is proposed that incorporates a switching Kalman smoother which provides optimum nonlinear MMSE estimates of the system output based on the HMM.
  • exact EM algorithms can only be obtained by appropriate constraints in the model design, and they have better convergence properties than algorithms employing generalized EM algorithms or empirical iterative schemes.
  • the method embodiments also provide good data efficiency since only second-order statistics are involved in the computation.
  • the signal models are general and suitable to numerous important signals, e.g. speech signals and base-band communication signals. Two system identification algorithm embodiments will be described, and experiment results will be given for embodiments with applications within speech analysis and channel equalization.
  • Embodiments of the signal analysis method according to the invention will be described applied to speech analysis, blind channel equalization of a communication channel, spectrum estimation for voiced speech in background noise, and blind noisy channel equalization.
  • the first system model consists of a linear time-invariant AR filter excited by a first- order discrete-state Hidden Markov process.
  • the AR filter models the resonant property of the vocal tract
  • a two-state Hidden Markov process models the excitation to the filter as a noisy impulse train.
  • the task of system identification here is to jointly estimate the AR coefficients and the excitation dynamics, which contains information about the impulse position, the impulse amplitude, and the noise variance, under a certain optimum criterion. By the joint estimation, the highly non-Gaussian impulse train structure of the excitation no longer affects the AR estimation as it does in the classic Least Squares (LS) solution.
  • LS Least Squares
  • LS methods such as the auto-correlation method, also known as the LPC analysis, assume a Gaussian signal model.
  • the consequence of the mismatch of a Gaussian model to non-Gaussian signals is an unnecessarily large variation in the estimates. This is supported by the fact that the Cramer-Rao bound for the variances of the AR estimators is lower in the non-Gaussian case than in the Gaussian case [1].
  • Estimating the AR parameters taking into account the impulse structure of the excitation can also reduce bias. This bias is present in the LPC analysis because of the spectral sampling effect of the impulse train. We will show that the AR spectra estimated by our method have smaller variance and bias and a better shift invariance property than the LPC analysis.
  • the algorithm exploits the underlying dynamics and non-Gaussianity of the finite alphabet symbol sequence to accomplish system identification.
  • An example of equalizing an MA channel is also demonstrated.
  • observation noise is taken into account.
  • the model consists of a linear time-invariant AR filter excited by a first-order discrete-state Hidden Markov process, and the measurements of the system output are perturbed by white Gaussian noise.
  • the identification algorithm must jointly estimate the AR parameters, the excitation dynamics, and the measurement noise variance.
  • the introduction of measurement noise complicates the problem significantly. This is because the simplicity of the first algorithm partly comes from the fact that the AR model aggregates the state information in the most recent system output samples, which are now hidden due to the presence of measurement noise.
  • the EM algorithm thus involves a nonlinear MMSE smoother, which provides estimates of the conditional first and second moments of the system output needed in the parameter estimations.
  • a nonlinear MMSE smoother that can be seen as a variant of the soft-decision Switching Kalman Filter [4], where the states control the discrete inputs to the AR filter, and the switching relies on the a posteriori probability of states estimated by a forward-backward algorithm.
  • the EM algorithm thus iterates between the nonlinear MMSE smoothing and the ML parameter estimations.
  • the introduction of measurement noise modeling in the second system model is a major extension to the first system model.
  • the second method is thus noise robust and applicable in adverse environments, although with a price of higher computational complexity.
  • the algorithm gives better estimates of the signal spectra than reference methods do, under moderate noise conditions.
  • Established iterative estimators based on Gaussian AR models are known to have convergence problems, thus an empirical termination is required [5] [6]. They also require prior knowledge of measurement noise statistics.
  • the proposed algorithm does not require prior knowledge of the noise statistics, and its convergence is guaranteed. Applications to channel equalization under moderate noise conditions are also demonstrated. Simulations show that the proposed algorithm has better estimates of the channel response and the transmitted symbols than the Least Squares method.
  • SAR Switching Auto Regression
  • An EM algorithm is derived for parameter estimation. This is a very general formulation, since its dynamics has vector states and all parameters are time dependent. But if the system dynamics takes the specific form of all-pole filtering, as is the focus of this work, the SAR becomes less efficient since it is hard to impose any structure on the state transition matrix. Besides, the system noise in the SAR is assumed to be zero mean, while in our models the mean of the system noise is also controlled by the state, which is found particularly useful in the applications discussed in this paper.
  • the following Section introduces the two signal models and derives the EM algorithms for blind system identification.
  • the proposed algorithms are applied to solving problems in speech analysis, noise robust spectrum estimation, and blind channel equalizations with and without measurement noise.
  • This noisy impulse train structure can be characterized by a two-state symbol sequence, while an M-ary Pulse Amplitude Modulation (PAM) signal can be characterized by an M-state symbol sequence.
  • the probability distribution functions (pdfs) of these discrete state excitations are thus multi-modal, and possibly asymmetric (as is the case for the impulse train).
  • a Gaussian Mixture Model (GMM) or a Hidden Markov Model (HMM) is suitable to characterize the statistics of such excitations.
  • GMM Gaussian Mixture Model
  • HMM Hidden Markov Model
  • HMARM Hidden Markov-Auto Regressive Model
  • GMARM Gaussian Mixture-Auto Regressive Model
  • the HMM is preferable in modeling the excitation because of its capability of modeling the underlying dynamics that is not captured by the GMM, which is a static model. Therefore, the following presentation will mainly focus on the HMARM with a brief discussion on the advantage of the HMM over the GMM in modeling temporal structure.
  • in this section we present the HMARM and its identification without measurement noise.
  • the subsequent section deals with the identification of HMARM with its output perturbed by white Gaussian noise, which is termed the Extended-HMARM.
  • x(t) is the observed signal (system output)
  • g(k) is the k-th AR coefficient
  • r(t) is the excitation.
  • the excitation is a Hidden Markov process, i.e., a first-order Markov chain v(t) plus white Gaussian noise u(t) with zero mean and variance σ².
  • a diagram of the data structure of the HMARM is shown in Fig. 1.
  • the state q_t at time t selects one of M states according to the state transition probabilities a_ij.
  • the emission pdfs of the states are Gaussian pdfs with the same variance σ² and means m_r(j), j ∈ {1, ..., M}, respectively.
  • the emission outcome constitutes the excitation sequence r(t), which is independent of r(l) for l ≠ t and depends only on the state q_t.
  • the excitation r(t) is then convolved with an AR(p) filter with coefficients [g(1), ..., g(p)] to produce the observation x(t).
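The generative structure just described can be sketched in a few lines of Python. All numeric values below (transition probabilities, emission means, AR coefficients) are illustrative assumptions, not values from the description:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)

# illustrative two-state HMARM parameters (assumed values)
a = np.array([[0.98, 0.02],        # state transition probabilities a_ij
              [0.90, 0.10]])
m_r = np.array([0.0, 5.0])         # emission means: noise floor vs. impulse
sigma_u = 0.1                      # shared emission standard deviation
g = np.array([1.3, -0.6])          # AR(2) coefficients g(1), g(2)

T = 400
q = np.zeros(T, dtype=int)
for t in range(1, T):              # first-order Markov chain over M = 2 states
    q[t] = rng.choice(2, p=a[q[t - 1]])

# excitation r(t) = m_r(q_t) + u(t): state-dependent mean, fixed variance
r = m_r[q] + sigma_u * rng.standard_normal(T)

# observation x(t) = sum_k g(k) x(t-k) + r(t), i.e. all-pole filtering of r
x = lfilter([1.0], np.concatenate(([1.0], -g)), r)
```

With the rare second state carrying a large mean, the excitation is a noisy impulse train, matching the voiced-speech interpretation given above.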
  • from (1), (2) and (3), b_x(j, t) can be shown to be a Gaussian pdf with a time-varying mean m_x(j, t),
  • the quantity Σ_t γ(i, t) represents the expected number of transitions made from state i, and Σ_t ξ(i, j, t) represents the expected number of transitions from state i to state j [11].
  • the first term in (9) concerns only a_ij and the second term concerns the rest of the parameters. Thus the optimization can be done on the two terms separately.
  • the re-estimation equation of a_ij is found by the Lagrange multiplier method, and is identical to the standard Baum-Welch re-estimation algorithm [13]:
  • Equations (12) and (13) form p + M coupled linear equations which can be solved analytically, wherein m_x(j, t) is calculated by (5). Then (14) can be solved by inserting the estimated g(k) and m_r(j).
  • Equation (12) is a multi-state version of the orthogonality principle
  • Equation (13) states that the prediction error weighted by the state posterior has zero mean
  • (14) calculates the mean of the prediction error power weighted by the state posterior as the variance of the stochastic element of the signal.
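For illustration, the M-step described above can be sketched as one symmetric linear system in the unknowns [g(1..p), m_r(1..M)], followed by the posterior-weighted variance update. This is a plausible reading of the posterior-weighted least-squares interpretation given in the text; the function name and array layout are our own, and the exact equations in the patent may differ in detail:

```python
import numpy as np

def m_step(x, gamma, p):
    """Sketch of one M-step: solve the p+M coupled linear equations for the
    AR coefficients g(1..p) and state means m_r(1..M), given state
    posteriors gamma[j, t] (summing to 1 over j at each t)."""
    T = len(x)
    M = gamma.shape[0]
    # lagged regressors: Xlag[t, k-1] = x(t-k), valid samples t = p..T-1
    Xlag = np.column_stack([x[p - k: T - k] for k in range(1, p + 1)])
    xt = x[p:]
    G = gamma[:, p:]                      # align posteriors with valid samples
    # unknown vector theta = [g(1..p), m_r(1..M)]
    A = np.zeros((p + M, p + M))
    b = np.zeros(p + M)
    A[:p, :p] = Xlag.T @ Xlag             # sum_t x(t-k') x(t-k)
    A[:p, p:] = Xlag.T @ G.T              # sum_t gamma(j,t) x(t-k')
    A[p:, :p] = G @ Xlag                  # sum_t gamma(j,t) x(t-k)
    A[p:, p:] = np.diag(G.sum(axis=1))    # sum_t gamma(j,t)
    b[:p] = Xlag.T @ xt
    b[p:] = G @ xt
    theta = np.linalg.solve(A, b)
    g, m_r = theta[:p], theta[p:]
    # (14): posterior-weighted mean prediction-error power as the variance
    err = xt[None, :] - (Xlag @ g)[None, :] - m_r[:, None]
    sigma2 = (G * err ** 2).sum() / G.sum()
    return g, m_r, sigma2
```

With one-hot posteriors this reduces to an exact weighted least-squares fit, which is why the estimation stays a matter of straightforward linear algebra.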
  • a GMM with a similar constraint can be used in place of the HMM in our signal model, and the EM equations can be derived in the same way as shown above with proper changes in the definitions of the posteriors (the ξ(i, j, t) used in the HMM is not needed in the GMM).
  • the derivation of the GMARM is briefly described in Appendix 0.4.
  • the advantage of the GMARM is a lighter computational load than that of the HMARM. However, the lack of dynamic modeling makes the GMARM converge more slowly and estimate less accurately than the HMARM whenever there is a temporal dependency in the excitation that is invisible to the GMM, since the GMM is a static model.
  • the complexity of the HMARM identification is similar to that of the standard HMM, since the forward-backward evaluations of state posterior and the re-estimation equations are analogous to that of the HMM.
  • the extra complexity lies in solving the p+M linear equations (that is, (12) and (13)) for the AR coefficients in every iteration, which accounts for only a mild increase in complexity.
  • y(t) is the observation
  • z(t) is the measurement noise
  • g(k) is the k-th AR coefficient
  • r(t) is the non-Gaussian process noise, or, the filter excitation.
  • r(t) is the sum of v(t), a sequence of M-state symbols, and a white Gaussian noise sequence u(t) with zero mean and variance σ²_u
  • the excitation r(t) is actually a Hidden Markov process with M states.
  • these states have Gaussian emission pdfs with means m_r(j), j ∈ [1, ..., M], and identical variance σ²_u
  • the observation noise is assumed to be white Gaussian noise with zero mean and variance σ²_z
  • the mean of y(t) should be x(t) if x(t) was known. But since x(t) is not available, a proper choice of the mean of y(t) will be the mean of x(t) given y.
  • So m_y(j, t) can be obtained by calculating the smoothing estimate of x(t) using the observations y and the current state q_t.
  • the variance of the emission pdf is therefore the sum of the smoothing error variance and the measurement noise variance.
  • the smoothing estimates and the error variance can be calculated with a nonlinear MMSE smoother, which will be described later. It can be summarized as follows:
  • Equation (21) follows from the first order Markovian property of the layered data model:
  • Q T involves only the top hidden layer parameters
  • Q B involves only the bottom hidden layer parameters
  • Qy involves only the visible (observation) layer parameters.
  • the maximization of the Q function can now be done by maximizing the three terms in (21) separately.
  • according to the Gaussian assumption on the observation noise, Q_Y can be written as:
  • the transition probability can be estimated in the same way as in the standard HMM:
  • Equations (26), (29), and (30) form a set of 1 + p + M linear equations with the same number of unknowns, and can be solved by straightforward linear algebra. Then (32) can be solved by inserting the newly updated parameter estimates.
  • the quantities needed in these equations include: the state posteriors ξ(i, j, t) and γ(i, t), which are calculated by the forward-backward algorithm; and the first and second moments of x(t), which are estimated by a nonlinear MMSE fixed-interval smoother.
  • the nonlinear MMSE smoother consists of a forward sweep and a backward sweep.
  • the forward sweep combines M state-conditional estimates weighted by the state a posteriori probabilities γ(i, t) to get an MMSE filtering estimate conditioned only on y.
  • the backward sweep calculates the smoothing estimates and MSE matrices using the filtering estimates and MSE matrices obtained in the forward sweep. The backward sweep equations are identical to those of the two-pass Kalman smoother, and can be found in, e.g., [14, p.572].
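A condensed sketch of such a soft-decision forward sweep in a companion-matrix state-space form is given below. It assumes the state posteriors gamma[j, t] are supplied externally (in the described algorithm they come from the forward-backward pass, and the whole procedure is iterated); the function name and all parameter names are our own:

```python
import numpy as np

def switching_kalman_forward(y, g, m_r, sigma_u2, sigma_z2, gamma):
    """Forward sweep of a soft-decision switching Kalman filter (sketch).
    The state vector stacks the last p AR outputs (newest first); for each
    HMM state j the predict step uses that state's excitation mean m_r(j),
    and the M filtered estimates are combined with gamma[j, t]."""
    p, M, T = len(g), len(m_r), len(y)
    F = np.zeros((p, p))                 # companion form of the AR(p) filter
    F[0, :] = g
    F[1:, :-1] = np.eye(p - 1)
    h = np.zeros(p); h[0] = 1.0          # observe the newest sample (+ noise)
    e = np.zeros(p); e[0] = 1.0          # excitation enters the newest sample
    Q = sigma_u2 * np.outer(e, e)

    m = np.zeros(p)                      # combined filtered mean
    P = np.eye(p)                        # combined filtered covariance
    out = np.zeros(T)
    for t in range(T):
        means, covs = [], []
        for j in range(M):
            m_pred = F @ m + m_r[j] * e          # state-dependent input mean
            P_pred = F @ P @ F.T + Q
            S = h @ P_pred @ h + sigma_z2        # innovation variance
            K = P_pred @ h / S                   # Kalman gain
            means.append(m_pred + K * (y[t] - h @ m_pred))
            covs.append(P_pred - np.outer(K, h) @ P_pred)
        # soft-decision combination over the M discrete states
        m = sum(gamma[j, t] * means[j] for j in range(M))
        P = sum(gamma[j, t] * (covs[j] + np.outer(means[j] - m, means[j] - m))
                for j in range(M))
        out[t] = m[0]
    return out
```

With M = 1 this collapses to a plain Kalman filter; the backward sweep (not shown) would then apply the standard two-pass smoother recursions to the stored means and covariances.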
  • the algorithm thus iterates between the nonlinear MMSE smoother and the estimation of ξ(i, j, t) and γ(i, t).
  • the algorithm stacks two dynamic state estimators together, i.e., the nonlinear MMSE smoother and the HMM estimator.
  • a unifying view of the Kalman-type state estimator and the HMM state estimator can be found in [15].
  • the nonlinear smoother uses a continuous state model, where the state vector is the output of the AR(p) filter, x_{t-p+1:t}, and the state transition is ruled by the auto-regressive property of the AR(p) filter.
  • the HMM uses a discrete state model, where the states are the input symbols, and the state transition is ruled by the underlying mechanism that produces the symbols.
  • the proposed nonlinear MMSE smoother falls in the category of Switching Kalman Filter (SKF) with soft-decision, as is defined in [4]. But there are major differences between the model and identification proposed in [4] and the ones proposed here.
  • the parameters of a vector AR model (named Switching AR model, or SAR) switch their values over time, and the switching is modeled by an HMM.
  • the parameter estimation is done by an EM algorithm.
  • the major differences between the SAR and the E-HMARM are: 1) When modeling an AR(p) signal, the state transition matrix has a sparse structure, but the transition matrix in the SAR has no structure.
  • in the SAR, the zero-valued elements of the matrix are set to zero after the estimation of the matrix. This is analogous to projecting the estimates to the correct parameter space.
  • in the E-HMARM, the AR model is explicitly expressed in terms of AR coefficients, thus neither matrix estimation nor projection is needed.
  • 2) In the SAR, the system noise process is modeled as a zero-mean process with a varying variance;
  • in the E-HMARM, the system noise is modeled as a process with a varying mean and a constant variance.
  • the SAR assumes no observation noise.
  • the complexity of the E-HMARM identification is significantly higher than that of the HMARM. This is due to the need for Kalman smoothing in each iteration.
  • the Kalman smoothing has a complexity of O(Tp³), whereas the complexity of the HMARM is O(TM²). Therefore, the complexity of the E-HMARM is O(Tp³).
  • Least Squares methods such as the LPC analysis (implemented as an autocorrelation method), have been the standard methods of analyzing AR models.
  • the Gaussian assumption taken by the LS method results in simple analytic solutions. But when applied to non-Gaussian signals such as voiced speech signals, the mismatch of assumption brings in undesirably large variance and bias.
  • the large variance implies a bad shift-invariance property of the LPC analysis. This means that, when a sustained vowel is segmented into several frames, the LPC estimates of the AR parameters for each frame can be very different.
  • the synthetic signal is made by filtering a noisy impulse train with an AR(10) filter. 50 realizations of this signal are analyzed. To get the 50 realizations we shift a rectangular window along the signal one sample at a time, 50 times. The window length is 320 samples. The estimated AR spectra of the 50 realizations are compared to the true AR spectrum, and the difference is measured by the Log-Spectral Distortion (LSD) measure.
  • LSD Log-Spectral Distortion
  • L is the number of spectral bins.
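One common definition of the LSD, the root-mean-square difference of the two log-spectra over L bins, can be sketched as follows (the exact formula used in the description may differ; the function names are our own):

```python
import numpy as np

def ar_spectrum(a, L=256):
    """Power spectrum of an all-pole model 1/A(z) on L frequency bins.
    `a` is the full polynomial [1, a1, ..., ap], i.e. [1, -g(1), ..., -g(p)]."""
    a = np.asarray(a, dtype=float)
    w = np.linspace(0, np.pi, L, endpoint=False)
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a  # A(e^{jw}) per bin
    return 1.0 / np.abs(A) ** 2

def log_spectral_distortion(a_true, a_est, L=256):
    """RMS difference of the two log-spectra over L bins, in dB."""
    S1 = 10 * np.log10(ar_spectrum(a_true, L))
    S2 = 10 * np.log10(ar_spectrum(a_est, L))
    return np.sqrt(np.mean((S1 - S2) ** 2))
```

Averaging this quantity over the 50 shifted frames yields the distortion-versus-shift curves discussed next.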
  • the LSD versus the shift is shown in Fig. 3. It is clear that the proposed method has a flat distortion surface and this surface is lower than that of the LPC. It is important to note that the LPC estimates encounter huge deviations from the true values in the second half of the plot. This is where a large "hump" in the signal comes into the analysis frame. The large humps in the signal are caused by the impulses in the excitation, which represent the non-Gaussian/nonlinear structure of the signal.
  • the bias of the estimates is also calculated by taking the difference between the true parameters and the sample mean of the estimates, and the variance of the estimates is calculated using the sample variance of the estimates. Table 1 compares the biases and variances of the HMARM and LPC estimates.
  • the spectra are plotted in Fig. 4.
  • the estimates by the HMARM show good consistency, while the consistency of the LPC analysis appears to be poor.
  • the residual of the HMARM analysis also has different properties than the LPC analysis.
  • in Fig. 5 we show the prediction residuals of a voiced speech signal using the AR parameters estimated by the HMARM and the LPC, respectively. It is clear that the residual of the HMARM has more prominent impulses, and the noise between the impulses appears to be less correlated.
  • the residual of the HMARM has a smaller L1 norm than that of the LPC analysis.
  • the proposed method provides a sparser representation of the voice signal than the one given by LPC analysis.
  • sparse representation is achieved by minimizing the L1-norm with numerical optimizations (see [16] for a review, and [17] for an application in speech analysis), or by using Bayesian inference with a super-Gaussian pdf as prior [18].
  • the HMARM method proposed here provides a computationally simple alternative to the sparse coding of voiced speech signals.
  • Another known LS method is the covariance method [19, Ch. 5.3].
  • the covariance method is known to give more accurate estimates of the AR coefficients than the autocorrelation method when the data length is small. In our experiments, this is the case when the analysis window is rectangular. When a Hamming window is used, the covariance method gives similar results to the autocorrelation method.
  • the channel response includes the response of the transmitter filter, the medium, the receiver filter, and the symbol-rate sampler.
  • the channel can be well characterized by an AR model, and no measurement noise is present (or, the channel has a very high SNR).
  • the transmitted symbols are quaternary PAM symbols.
  • the channel distortion is compensated and the transmitted symbols are decoded.
  • the receiver has no prior knowledge about the channel, the alphabet of the transmitted symbols, and the probability distribution of the symbols.
  • the equalization and decoding are done jointly.
  • the channel is AR(10) with coefficients
  • A = [1, -1.223, -0.120, 1.016, 0.031, -0.542, -0.229, 0.659, 0.307, -0.756, 0.387].
  • the equalizer output and the estimated channel spectra are shown in Fig. 7 and Fig. 8, respectively.
  • we use the LS method as the reference method. It is clear from the figures that the recovered symbol sequence by the HMARM method coincides with the transmitted symbols very well, and the spectrum estimated by the HMARM method completely overlaps with the true channel spectrum, whereas the LS method has a much larger estimation error on both the recovered symbols and the channel spectrum. More precisely, the estimation error variance of the recovered symbol sequence is 1.06 × 10⁻²⁶ for the HMARM method and 0.36 for the LS method, which represents a 255 dB gain of the HMARM method over the LS method.
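The structure of this experiment can be sketched as follows. Since the channel is all-pole, a receiver that knows (or, as in the HMARM method, has blindly estimated) the polynomial A(z) equalizes by plain FIR filtering with A(z). The quaternary alphabet {-3, -1, 1, 3} and the noise-free setting are assumptions for the sketch:

```python
import numpy as np
from scipy.signal import lfilter

# AR(10) channel polynomial A(z) from the example above
A = np.array([1, -1.223, -0.120, 1.016, 0.031, -0.542, -0.229, 0.659,
              0.307, -0.756, 0.387])

rng = np.random.default_rng(7)
alphabet = np.array([-3.0, -1.0, 1.0, 3.0])   # assumed 4-PAM alphabet
s = rng.choice(alphabet, size=200)            # transmitted symbols

# AR channel: received signal y = symbols filtered through 1/A(z)
y = lfilter([1.0], A, s)

# equalizer for an all-pole channel: the FIR filter A(z) inverts 1/A(z)
s_hat = lfilter(A, [1.0], y)

# decode by quantizing to the nearest alphabet point
decoded = alphabet[np.argmin(np.abs(s_hat[:, None] - alphabet[None, :]), axis=1)]
```

The hard part solved by the HMARM method is of course obtaining A(z) blindly from y alone; the sketch only shows why an accurate channel estimate makes the subsequent equalization trivial.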
  • the alphabet A is the same as before, and the 3rd order MA channel coefficients are B = [1.0, -0.65, 0.06, 0.41].
  • the recovered symbol sequences are shown in Fig. 9.
  • the estimation error variance of the recovered symbol sequence is 0.0023 for the HMARM method, and 0.4212 for the LS method.
  • the gain of the HMARM method over the LS method is 22.6 dB.
  • the performance of the HMARM method degrades.
  • the gains of the HMARM method over the LS method are 27.5 dB, 17.5 dB, and 8 dB, respectively. From 30 dB down, the performance of the HMARM is similar to that of the LS method.
  • Fig. 10 shows the signal spectrum and its estimates given by the E-HMARM and LS, respectively, at 15 dB input SNR.
  • Table 2 shows the averaged values of parameters of 50 estimations. The results show that the E-HMARM algorithm gives much better estimates of the signal spectra than the LS method. The estimates of the impulse amplitude and measurement noise variance are also quite accurate.
  • the estimated process noise variance is always larger than the true value, especially when the SNR is low. This is because in the E-HMARM algorithm, the modeling error is included as part of the process noise.
  • like all EM-type algorithms, the E-HMARM algorithm may converge towards a local maximum. A good initialization can prevent convergence to a local maximum.
  • the LS estimates of the AR coefficients are used as initial values. The convergence criterion is set such that the iteration stops when the norm of the difference in the parameter vectors is smaller than 10⁻⁴. No divergence has been observed in extensive experiments.
  • the E-HMARM algorithm works best at SNRs above 15 dB. From 10 dB and below, the algorithm converges to the LS solution.
  • PPM Pulse Position Modulation
  • UWB ultra-wide-band
  • the spectrum of a PPM symbol sequence is high-pass and has a strong DC component. (Instead of defining the whole frame as a symbol, here we treat the pulse duration as the symbol duration. A time frame consists of M symbols, and the sampler at the receiver samples M times per frame. This is why the received symbol sequence has a strong DC component and a high-pass spectrum.)
  • a signal frame thus has 8 time slots, each corresponding to one symbol in the alphabet.
  • a pulse is put at the k-th time slot, and zeros elsewhere.
  • the transmitted signal is modeled as a "1" at the symbol position and "0" at the other 7 positions.
  • the channel is modeled as an AR(10) filter.
  • White Gaussian noise is added to the output of the AR(10) filter.
  • the E-HMARM equalizer estimates the channel response and the noise variance, and does inverse filtering to recover the transmitted symbols.
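The simulation chain described in these bullets can be sketched as follows. This is a hedged illustration, not the patent's code: an AR(2) filter stands in for the AR(10) channel to keep the sketch short, and the inverse filter uses the true coefficients, whereas the E-HMARM equalizer would first have to estimate them from the received signal.

```python
import numpy as np

rng = np.random.default_rng(2)
M, n_frames = 8, 200
sym = rng.integers(0, M, n_frames)         # transmitted PPM symbols
s = np.zeros(M * n_frames)
s[np.arange(n_frames) * M + sym] = 1.0     # "1" at the pulse slot, "0" elsewhere

# All-pole (AR) channel: y[n] = s[n] + sum_k a[k] * y[n-k-1]
a = np.array([0.6, -0.3])                  # toy AR(2) channel coefficients
y = np.zeros_like(s)
for n in range(len(s)):
    y[n] = s[n] + sum(a[k] * y[n - k - 1]
                      for k in range(len(a)) if n - k - 1 >= 0)

# Additive white Gaussian noise at roughly 18 dB channel SNR.
snr_db = 18.0
noise_var = y.var() / 10 ** (snr_db / 10)
r = y + rng.normal(0.0, np.sqrt(noise_var), len(y))

# Inverse filtering with the (here: known) AR coefficients:
# s_hat[n] = r[n] - sum_k a[k] * r[n-k-1]
s_hat = r.copy()
for k in range(len(a)):
    s_hat[k + 1:] -= a[k] * r[:len(r) - k - 1]

# Symbol decision: the slot with the largest value in each frame.
sym_hat = s_hat.reshape(n_frames, M).argmax(axis=1)
error_rate = np.mean(sym_hat != sym)
```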
  • the standard LS method is used as a reference method. It is shown in Fig. 11 that the recovered symbol sequence by the E-HMARM method has much smaller error variance than that of the LS method. In Fig. 12 it is shown that the E-HMARM gives a very good estimate of the channel spectrum, while the LS estimate is far off.
  • the channel SNR in this example is 18 dB, and the signal length is 400 samples.
  • the E-HMARM equalizer works best at SNRs above 18 dB. At SNRs below 18 dB its performance degrades fast. At SNRs below 15 dB the E-HMARM algorithm converges to the LS solution.
  • This GMARM algorithm has a lighter computational load than the HMARM presented in Section 0.2.1, since the calculation of the state posterior probability has a simpler form.
  • Table 1: A comparison of the biases and variances of the estimates by the HMARM and LPC. The improvement of the HMARM over the LPC is listed under Imprv.
  • Table 2: The true and estimated parameters. Results are the average of 50 estimations.
  • Fig. 13 illustrates in block diagram form a preferred embodiment of the signal model used in system identification.
  • the signal is modelled as the output of an AR filter, here illustrated as an AR(10) filter, excited by the input X.
  • the input X is modelled by a Finite State Stochastic Model FSSM.
  • the constraint on the variance is due to the fact that the input X is modelled as a sequence of symbols I, selected from a finite alphabet, with stationary white Gaussian noise added.
  • the parameters of this model are identified by an EM algorithm and are used in a later stage for processing of the signal.
  • the FSSM uses Gaussian probability density functions in which the variance value σ₁² for the first symbol equals the variance value σ₂² for the second symbol.
  • Mean values μ₁, μ₂ in the Gaussian probability density functions are determined by the symbols, and in this example the values 0 and 1 are selected.
  • the algorithm is suited e.g. for input signals X such as speech, which can be modelled as a train of impulses; hence only a sequence of symbols indicating "pulse" and "no pulse" is needed as input to the AR filter.
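A minimal sketch of this signal model follows, under assumed toy parameters: an AR(2) filter in place of the AR(10) of the figure, and an arbitrary pulse period, noise variance, and coefficient set chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, period = 400, 40

# Finite-state excitation: symbols from the alphabet {0, 1}
# ("no pulse", "pulse"), each carrying additive white Gaussian noise
# with the same variance for both states (mu1 = 0, mu2 = 1).
symbols = np.zeros(N)
symbols[::period] = 1.0                   # impulse-train pattern, as for speech
sigma = 0.05                              # shared state variance (toy value)
x = symbols + rng.normal(0.0, sigma, N)   # FSSM output: symbol + noise

# The observed signal is this excitation passed through an AR filter:
# y[n] = x[n] + sum_k a[k] * y[n-k-1]
a = np.array([1.3, -0.4])                 # stable toy AR(2) coefficients
y = np.zeros(N)
for n in range(N):
    y[n] = x[n] + sum(a[k] * y[n - k - 1]
                      for k in range(len(a)) if n - k - 1 >= 0)
```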
  • Fig. 14 illustrates in block diagram form an example of a device according to the invention, namely a headset.
  • the headset includes a digital signal processor arranged to receive a signal representing noisy speech, e.g. the voice of a remote speaker distorted by background acoustic noise in a noisy environment.
  • System identification of a signal model, e.g. the one described in Fig. 13, is then performed on this input signal to estimate the parameters of the model.
  • a speech enhancement procedure filters out the noise from the speech using the estimated parameters.
  • the loudspeaker produces an acoustic signal representing an enhanced version of the noisy input speech signal and thus provides the headset user with an improved speech quality.
  • the headset may include many additional signal processing features and additional electronic equipment. E.g. analog-to-digital and digital-to-analog converters are not illustrated.
  • the input noisy speech signal is analyzed using a 2-state E-HMARM model where the emission probability density functions for the two states have the same variance.
  • the noise reduction block reduces noise by performing a multi-state Kalman smoothing, given the estimated model parameters.
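As a simplified illustration of this principle, a scalar Kalman filter for an AR(1) signal in white measurement noise is sketched below. This is not the patent's multi-state smoother: the actual block switches parameters with the excitation state and runs a smoother, while this sketch runs a single-state forward filter with assumed toy parameters.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 500
a, q, r = 0.9, 1.0, 4.0   # AR(1) coeff, process and measurement noise variances (toy)

# Clean AR(1) "speech" and its noisy observation.
s = np.zeros(N)
for n in range(1, N):
    s[n] = a * s[n - 1] + rng.normal(0.0, np.sqrt(q))
y = s + rng.normal(0.0, np.sqrt(r), N)

# Scalar Kalman filter (the forward half of a smoother).
s_hat = np.zeros(N)
P = q
for n in range(1, N):
    s_pred = a * s_hat[n - 1]                 # time update of the state
    P_pred = a * a * P + q                    # time update of the error variance
    K = P_pred / (P_pred + r)                 # Kalman gain
    s_hat[n] = s_pred + K * (y[n] - s_pred)   # measurement update
    P = (1.0 - K) * P_pred

mse_noisy = np.mean((y - s) ** 2)
mse_filtered = np.mean((s_hat - s) ** 2)
```

Even this single-state filter reduces the mean squared error well below that of the raw noisy observation; the multi-state smoother of the embodiment refines this further using the estimated model parameters.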
  • this enhanced speech signal is applied, e.g. after digital-to-analog conversion, to a loudspeaker in the headset that converts the signal to an enhanced acoustic speech signal.
  • the illustrated speech analysis and enhancement procedure may additionally or alternatively be applied to the speech picked up by the microphone in the headset before it is transmitted from the headset, so as to clean the speech picked up by the microphone, e.g. if the user is located in a noisy environment.
  • the individual elements of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way such as in a single unit, in a plurality of units or as part of separate functional units.
  • the invention may be implemented in a single unit, or be both physically and functionally distributed between different units and processors.
EP07722544A 2006-04-04 2007-03-30 Signalanalyseverfahren mit einem nichtgausschen autoregressiven modell Withdrawn EP2011114A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DKPA200600476 2006-04-04
PCT/DK2007/000163 WO2007112749A1 (en) 2006-04-04 2007-03-30 Signal analysis method with non-gaussian auto-regressive model

Publications (1)

Publication Number Publication Date
EP2011114A1 true EP2011114A1 (de) 2009-01-07

Family

ID=38134633

Family Applications (1)

Application Number Title Priority Date Filing Date
EP07722544A Withdrawn EP2011114A1 (de) 2006-04-04 2007-03-30 Signalanalyseverfahren mit einem nichtgausschen autoregressiven modell

Country Status (2)

Country Link
EP (1) EP2011114A1 (de)
WO (1) WO2007112749A1 (de)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109859467A (zh) * 2019-01-30 2019-06-07 银江股份有限公司 一种交通模型中环境影响因子的挖掘分析方法

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8185480B2 (en) 2008-04-02 2012-05-22 International Business Machines Corporation System and method for optimizing pattern recognition of non-gaussian parameters
TWI465122B (zh) 2009-01-30 2014-12-11 Dolby Lab Licensing Corp 自帶狀脈衝響應資料測定反向濾波器之方法
US10394985B2 (en) * 2017-01-11 2019-08-27 Samsung Electronics Co., Ltd. Apparatus and method for modeling random process using reduced length least-squares autoregressive parameter estimation
CN109389979B (zh) * 2018-12-05 2022-05-20 广东美的制冷设备有限公司 语音交互方法、语音交互系统以及家用电器
US11428839B2 (en) * 2018-12-28 2022-08-30 Carbo Ceramics Inc. Systems and methods for detecting a proppant in a wellbore
CN116866124A (zh) * 2023-07-13 2023-10-10 中国人民解放军战略支援部队航天工程大学 一种基于基带信号时间结构的盲分离方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2007112749A1 *

Also Published As

Publication number Publication date
WO2007112749A1 (en) 2007-10-11

Similar Documents

Publication Publication Date Title
EP0689194B1 (de) Verfahren und Vorrichtung zur Signalerkennung unter Kompensation von Fehlzusammensetzungen
Sankar et al. A maximum-likelihood approach to stochastic matching for robust speech recognition
EP0886263B1 (de) An Umgebungsgeräusche angepasste Sprachverarbeitung
US9881631B2 (en) Method for enhancing audio signal using phase information
KR100549133B1 (ko) 노이즈 감소 방법 및 장치
JP3919287B2 (ja) 連続する入力音声フレームの観測されたシーケンスによって構成される音声信号を等化するための方法および装置
CN101647061B (zh) 用于语音增强的噪声方差估计器
EP2011114A1 (de) Signalanalyseverfahren mit einem nichtgausschen autoregressiven modell
WO2011037587A1 (en) Downsampling schemes in a hierarchical neural network structure for phoneme recognition
US8737460B2 (en) Equalizer and detector arrangement employing joint entropy-based calibration
González et al. MMSE-based missing-feature reconstruction with temporal modeling for robust speech recognition
US20020010578A1 (en) Determination and use of spectral peak information and incremental information in pattern recognition
JP4891805B2 (ja) 残響除去装置、残響除去方法、残響除去プログラム、記録媒体
KR20110024969A (ko) 음성신호에서 통계적 모델을 이용한 잡음 제거 장치 및 방법
Li et al. Efficient blind system identification of non-Gaussian autoregressive models with HMM modeling of the excitation
Akarsh et al. Speech enhancement using non negative matrix factorization and enhanced NMF
Zhao An EM algorithm for linear distortion channel estimation based on observations from a mixture of gaussian sources
Tang et al. Speech Recognition in High Noise Environment.
White et al. Reduced computation blind equalization for FIR channel input Markov models
Nedel et al. Duration normalization for improved recognition of spontaneous and read speech via missing feature methods
Li Non-Gaussian, Non-stationary and Nonlinear Signal Processing Methods-with Applications to Speech Processing and Channel Estimation
Li et al. Paper F
Niazadeh et al. ISI sparse channel estimation based on SL0 and its application in ML sequence-by-sequence equalization
Wang et al. RVAE-EM: Generative speech dereverberation based on recurrent variational auto-encoder and convolutive transfer function
Yang et al. HB-DTW: Hyperdimensional Bayesian dynamic time warping for non-uniform Doppler

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20081104

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR MK RS

RIN1 Information on inventor provided before grant (corrected)

Inventor name: ANDERSEN, SOREN, VANG

Inventor name: CHUNJIAN, LI

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20110226