EP2363853A1 - A method for estimating the clean spectrum of a signal - Google Patents

Publication number
EP2363853A1
Authority
EP
European Patent Office
Prior art keywords
signal
spectrum
coefficients
noise
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP10450036A
Other languages
German (de)
French (fr)
Inventor
Luis Weruaga
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovationsagentur GmbH
Osterreichische Akademie der Wissenschaften
Original Assignee
Innovationsagentur GmbH
Osterreichische Akademie der Wissenschaften
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2010-03-04
Filing date
2010-03-04
Publication date
2011-09-07
Application filed by Innovationsagentur GmbH, Osterreichische Akademie der Wissenschaften

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/12: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients

Abstract

The invention proposes a method for estimating the clean spectrum of a signal degraded by additive noise, in particular a speech signal, by determining the coefficients of a predictive model of said clean spectrum, comprising:
computing the spectrum of said signal;
estimating the power spectrum of said noise; and
determining said coefficients by minimizing the cost function
$$\int_{2\pi} \left[ \frac{|X(\omega)|^2}{|H(\omega)|^2 + S_V(\omega)} - \log \frac{|X(\omega)|^2}{|H(\omega)|^2 + S_V(\omega)} \right] \mathrm{d}\omega$$
with respect to said coefficients, with
X(ω) being the spectrum of said signal,
S_V(ω) being the power spectrum of said noise, and
H(ω) being the transfer function of said model based on said coefficients.

Description

  • The present invention relates to a method for estimating the clean spectrum of a signal degraded by additive noise, in particular a speech signal, by determining the coefficients of a predictive model of said clean spectrum. The invention further relates to a method for enhancing a signal based on this clean spectrum estimation.
  • Restoration of single-channel digital audio recordings degraded by additive noise is a technical problem that currently attracts great interest from both scientific and commercial points of view. The enhancement of speech by digital signal processing means improves the quality and intelligibility of voice communication for a wide range of applications, such as mobile telephony, hearing aids, teleconference systems, dictation systems, voice coders and automatic speech recognition systems.
  • Among the different solutions proposed for the enhancement of noisy speech, restoration of the short-time speech spectrum has been extensively studied, see e.g. Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator", IEEE Trans. Acoust., Speech, Signal Processing, Vol. 32, No. 6, pp. 1109-1121, 1984; B. Sim, Y. Tong, J. Chang, and C. Tan, "A parametric formulation of the generalized spectral subtraction method", IEEE Transactions on Speech and Audio Processing, Vol. 6, No. 4, pp. 328-337, Jul. 1998; P. J. Wolfe and S. J. Godsill, "Efficient alternatives to the Ephraim and Malah suppression rule for audio signal enhancement", EURASIP J. Applied Signal Processing, Vol. 2003, No. 10, pp. 1043-1051, 2003; and P. C. Loizou, "Speech enhancement: Theory and practice", CRC Press, 2007. This approach is based on estimating the short-time spectral amplitude of the clean speech from an estimate of the signal-to-noise ratio (SNR) at each frequency. In other cases, the clean speech is assumed to follow a parametric model, such as an autoregressive (AR) model; once that model has been estimated, an enhancement filter, such as the Wiener filter, is employed to enhance the noisy signal, see J. H. L. Hansen and M. A. Clements, "Constrained iterative speech enhancement with application to speech recognition", IEEE Trans. Signal Processing, Vol. 39, No. 4, pp. 795-805, Apr. 1991. In all cases, an accurate estimate of the power spectrum of the noise is required. This can be obtained by several techniques, such as minimum statistics, tracking of the spectral floor, or detecting silences in the speech activity (P. C. Loizou, loc.cit.).
  • The biggest technical challenge in this problem is thus to obtain the a priori signal-to-noise ratio at each frequency. Since the noise spectrum is assumed to be available through state-of-the-art techniques, this challenge is equivalent to estimating the clean speech spectrum from the available noisy spectrum. This problem has occupied many researchers over the last twenty-five years: a decision-directed method (Y. Ephraim and D. Malah, loc.cit.), subspace methods (P. C. Loizou, loc.cit.), the iterative Wiener filter (J. H. L. Hansen and M. A. Clements, loc.cit.), and Kalman filters (see e.g. E. Zavarehei, S. Vaseghi, and Q. Yan, "Speech enhancement using Kalman filters for restoration of short-time DFT trajectories," IEEE Workshop Automatic Speech Recognition and Understanding, 2005, pp. 313-318) are some of the most popular techniques. Of these techniques, the iterative Wiener filter (IWF) is particularly interesting because it aims to estimate the clean speech spectrum from the current noisy spectrum alone, iteratively combining Wiener filtering with autoregressive analysis. The problem with that technique is its tendency to generate high resonance peaks, which introduce an unpleasant distortion in the enhanced speech. Further attempts at stabilizing the IWF have been made in T. V. Sreenivas and P. Kirnapure, "Codebook constrained Wiener filtering for speech enhancement," IEEE Trans. Speech, Audio Processing, Vol. 4, No. 5, pp. 383-389, Sep. 1996, but the performance of this technique and its variants is still clearly insufficient.
  • One possible solution to the problem of estimating parameters of a predictive clean speech model has been disclosed in the earlier application WO 2008/109904 A1 of the same applicant. This prior solution fails to estimate the clean speech spectrum in cases where the signal-to-noise ratio (SNR) of the signal is low. Likewise, the mentioned IWF method is not appropriate in such applications.
  • It is therefore an object of the invention to provide a method for estimating the clean spectrum of a noise-corrupted signal with improved accuracy.
  • This object is achieved by means of a method for estimating the clean spectrum of a signal degraded by additive noise, in particular a speech signal, by determining the coefficients of a predictive model of said clean spectrum, comprising:
    • computing the spectrum of said signal;
    • estimating the power spectrum of said noise; and
    • determining said coefficients by minimizing the cost function
      $$\int_{2\pi} \left[ \frac{|X(\omega)|^2}{|H(\omega)|^2 + S_V(\omega)} - \log \frac{|X(\omega)|^2}{|H(\omega)|^2 + S_V(\omega)} \right] \mathrm{d}\omega$$
      with respect to said coefficients, with
    • X(ω) being the spectrum of said signal,
    • S_V(ω) being the power spectrum of said noise, and
    • H(ω) being the transfer function of said model based on said coefficients.
  • In the present disclosure the term "minimizing" is intended to comprise both, making the cost function minimal as well as making the cost function at least a sufficiently low value, i.e. a value within a given or acceptable tolerance interval from that minimum.
  • From the field of bioacoustics it is known that the biological hearing sense responds to the logarithm of the sound intensity. The invention is based on the insight that this bioacoustic principle of logarithmic perception can be introduced into a novel cost function as stated above, which takes into account the actual signal-to-noise ratio in each portion of the signal spectrum. Loosely speaking, the proposed cost function fits the model to the data in those regions with high SNR, and, as will be detailed later on, in low-SNR areas the fitting process is driven by the good fit achieved in adjacent high-SNR areas. The inventive method thus leads to an interpolation effect from high-SNR to low-SNR spectral regions.
  • According to a preferred embodiment of the invention said cost function is minimized by solving the equation
    $$\int_{-\pi}^{\pi} M(\omega)\, E(\omega)\, A(\omega)\, e^{j\omega\ell}\, \mathrm{d}\omega = 0$$
    with A(ω) being the predictive model based on its coefficients a_m according to
    $$A(\omega) = \sum_{m=0}^{M} a_m e^{-j\omega m},$$
    E(ω) being the prediction error according to
    $$E(\omega) = |X(\omega)|^2 - |H(\omega)|^2 - S_V(\omega),$$
    and M(ω) being a spectral mask defined as
    $$M(\omega) = \left( \frac{\mathrm{SNR}(\omega)}{\mathrm{SNR}(\omega) + 1} \right)^2 \quad \text{with} \quad \mathrm{SNR}(\omega) = \frac{|H(\omega)|^2}{S_V(\omega)}.$$
  • In particular said equation can be solved by holding E(ω) and M(ω) constant, solving the remaining linear problem, using the solution to re-evaluate the previous constant terms, and proceeding further iteratively.
  • In general, the method of the invention is suited for any predictive model known in the art. Preferably, a parametric all-pole filter model, an autoregressive coefficients filter (ARC) model, a reflection coefficients filter (RC) model, and/or a line spectral frequencies (LSF) model is used.
  • In a second aspect of the invention a method for enhancing a digital signal, in particular a speech signal, with increased quality is provided. The inventive method comprises the further steps of
  • calculating a spectral signal-to-noise ratio on the basis of the clean spectrum and the noise spectrum, and
  • using the spectral signal-to-noise ratio to enhance the signal.
  • Preferably, the signal is enhanced by means of a Wiener filter, a MMSE-based enhancement, or variants thereof, using said spectral signal-to-noise ratio.
  • Further details and advantages of the invention will become apparent from the appended claims and the following detailed description of a preferred embodiment under reference to the enclosed drawings in which
    • Fig. 1 shows in block diagram form an apparatus for enhancing a digital speech signal, the blocks concurrently illustrating the steps of the method of the invention, and
    • Fig. 2 shows the function of the clean speech estimation block and step of Fig. 1 in detail.
  • As a first basis of the present invention, the inventor has found analytically that the IWF method is equivalent to a method that results from the following minimization problem
    $$\arg\min_{\mathbf{a}} \int_{2\pi} \frac{|X(\omega)|^2}{|H(\omega)|^2 + S_V(\omega)}\, \mathrm{d}\omega \tag{1}$$
    where ω is frequency, X(ω) is the Fourier transform of a short-time segment of the input noisy signal, S_V(ω) is the estimate of the noise power spectral density, and H(ω) is the transfer function of the autoregressive model which relates to the clean speech spectrum. Said transfer function of the autoregressive model is equal to
    $$H(\omega) = \frac{1}{\sum_{m=0}^{M} a_m e^{-j\omega m}} \tag{2}$$
    where a = [a_0, a_1, ..., a_M] are the autoregressive coefficients, and M is the autoregressive model order.
  • As a second basis of the invention, the inventor has found that a functional built on the ratio between the samples and the model, such as in (1), does not possess the property of frequency selectivity, although such a property is desirable when not all spectral samples are available: in the case of the spectrum of the noisy signal X(ω), the spectral samples at which the a priori SNR is low or very low do not represent a trustworthy reference for the estimation of the autoregressive model.
  • To this end, the method of the present invention for estimating the clean speech spectrum is based on maximum likelihood (ML) estimation applied to the ratio between the input noisy spectrum X(ω) and the model of clean speech corrupted by additive noise. Assuming that X(ω) is modelled by a Gaussian distribution, said maximum likelihood estimation leads to
    $$\arg\min_{\mathbf{a}} \int_{2\pi} \left[ \frac{|X(\omega)|^2}{|H(\omega)|^2 + S_V(\omega)} - \log \frac{|X(\omega)|^2}{|H(\omega)|^2 + S_V(\omega)} \right] \mathrm{d}\omega \tag{3}$$
    where the clean speech follows the autoregressive model defined in (2), a is the vector containing the autoregressive coefficients, and S_V(ω) is the power spectral density of the noise which is available a priori.
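As a rough numerical illustration of the functional (3), the sketch below evaluates it on a uniform frequency grid, approximating the integral by a Riemann sum. The function names (`ar_power_spectrum`, `log_ratio_cost`), the grid size, the flat noise spectrum and the example coefficients are assumptions made for this illustration and do not appear in the original disclosure.

```python
import numpy as np

def ar_power_spectrum(a, omega):
    """|H(w)|^2 = 1 / |sum_m a[m] exp(-j w m)|^2, following equation (2)."""
    m = np.arange(len(a))
    A = np.exp(-1j * np.outer(omega, m)) @ np.asarray(a, dtype=float)
    return 1.0 / np.abs(A) ** 2

def log_ratio_cost(a, px, sv, omega):
    """Riemann-sum approximation of functional (3) over one period of 2*pi."""
    ratio = px / (ar_power_spectrum(a, omega) + sv)
    return 2.0 * np.pi * np.mean(ratio - np.log(ratio))

# Hypothetical example: a noisy power spectrum that matches the model exactly.
omega = 2.0 * np.pi * np.arange(256) / 256
a_true = np.array([2.0, -1.8, 0.9])   # illustrative stable AR(2) coefficients
sv = 0.05 * np.ones_like(omega)       # assumed flat noise power spectrum
px = ar_power_spectrum(a_true, omega) + sv

cost_at_truth = log_ratio_cost(a_true, px, sv, omega)
cost_perturbed = log_ratio_cost(1.2 * a_true, px, sv, omega)
```

Since x - log x is never smaller than 1 and equals 1 only at x = 1, the integrand is smallest where the model plus noise matches the observed power spectrum, so in this example `cost_at_truth` sits at the lower bound 2π and any mismatch, such as the scaled coefficients, raises the cost.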
  • By computing the gradient of the functional (3) with respect to the autoregressive coefficients a, one arrives at the solution of this problem, given by the following equation
    $$\int_{-\pi}^{\pi} M(\omega)\, E(\omega)\, A(\omega)\, e^{j\omega\ell}\, \mathrm{d}\omega = 0 \tag{4}$$
    where A(ω) is the linear prediction error filter, defined in terms of the autoregressive coefficients as
    $$A(\omega) = \sum_{m=0}^{M} a_m e^{-j\omega m} \tag{5}$$
    E(ω) is the prediction error of the model according to
    $$E(\omega) = |X(\omega)|^2 - |H(\omega)|^2 - S_V(\omega) \tag{6}$$
    and M(ω) is a so-called spectral mask defined as
    $$M(\omega) = \left( \frac{\mathrm{SNR}(\omega)}{\mathrm{SNR}(\omega) + 1} \right)^2 \tag{7}$$
  • Here, the spectral mask is defined in terms of the a-priori signal-to-noise ratio for each frequency ("spectral" signal-to-noise ratio), SNR(ω). The a-priori (spectral) SNR is defined as the ratio between the clean speech power spectrum and the noise power spectrum,
    $$\mathrm{SNR}(\omega) = \frac{|H(\omega)|^2}{S_V(\omega)}$$
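The step from the functional (3) to the stationarity condition (4) can be retraced as follows. This is only a sketch, assuming real coefficients a_ℓ and using the shorthand P = |H|² + S_V and J for the functional, neither of which appears in the original text; the frequency argument ω is dropped for brevity.

```latex
\begin{aligned}
J(\mathbf{a}) &= \int_{2\pi} \left[ \frac{|X|^2}{P} - \log\frac{|X|^2}{P} \right] \mathrm{d}\omega,
\qquad P = |H|^2 + S_V, \qquad E = |X|^2 - P, \\
\frac{\partial J}{\partial a_\ell}
&= -\int_{2\pi} \frac{E}{P^2}\, \frac{\partial |H|^2}{\partial a_\ell}\, \mathrm{d}\omega
 = \int_{2\pi} \frac{E\,|H|^4}{P^2}\, \frac{\partial |A|^2}{\partial a_\ell}\, \mathrm{d}\omega
 \qquad \text{since } |H|^2 = \frac{1}{|A|^2}, \\
\frac{|H|^4}{P^2} &= \left( \frac{\mathrm{SNR}}{\mathrm{SNR}+1} \right)^2 = M,
\qquad
\frac{\partial |A|^2}{\partial a_\ell} = 2\,\mathrm{Re}\!\left\{ A\, e^{j\omega\ell} \right\}, \\
\frac{\partial J}{\partial a_\ell}
&= 2 \int_{2\pi} M\, E\; \mathrm{Re}\!\left\{ A\, e^{j\omega\ell} \right\} \mathrm{d}\omega = 0 .
\end{aligned}
```

For a real-valued signal the spectra are symmetric in ω, so the imaginary part of the integrand integrates to zero and the last line coincides with equation (4).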
  • Since equation (4) is nonlinear with respect to the autoregressive coefficients, its solution must and can be obtained by means of an iterative procedure, in which at each iteration a positive-definite Toeplitz linear system must be solved. Several techniques are available to solve Toeplitz systems, such as the well-known Levinson algorithm. One skilled in the art will immediately recognize that this choice does not affect the essence of the present invention. It is important to mention that the spectral mask (7) weights the importance of the spectral error between the noisy samples and the model of clean speech plus additive noise. This weight at each frequency depends on the respective signal-to-noise ratio. Thus, if the SNR is high at a given frequency, the spectral mask is close to 1 at that frequency, and the information at that frequency is valuable in the estimation. Conversely, if the SNR is low, the spectral mask tends to zero, which implies that the relevance of the information at that frequency is low.
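The weighting behaviour of the spectral mask (7) can be made concrete with a few values. The snippet below is a minimal sketch; the function name `spectral_mask` and the chosen SNR levels in dB are assumptions made for illustration only.

```python
def spectral_mask(snr):
    """Spectral mask of equation (7): (SNR / (SNR + 1))^2 for a linear-scale SNR."""
    return (snr / (snr + 1.0)) ** 2

# Mask values for a few illustrative a-priori SNRs given in dB (assumed values).
for snr_db in (-20, -10, 0, 10, 20):
    snr = 10.0 ** (snr_db / 10.0)
    print(f"{snr_db:+4d} dB -> mask {spectral_mask(snr):.4f}")
```

At an SNR of 0 dB (linear SNR of 1) the mask equals 0.25; it approaches 1 as the SNR grows and approaches 0 as the SNR falls, which is the frequency selectivity described above.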
  • The spectral mask, the signal-to-noise ratio, and therewith the clean speech model are estimated in an iterative fashion. The final solution is obtained either after several iterations or when successive partial solutions do not differ from each other substantially.
  • One iterative approach to solving equation (4) will now be discussed in detail. This approach holds E(ω) and M(ω) constant and solves the remaining linear problem; this partial solution is used to re-evaluate the terms previously held constant, and the procedure is repeated iteratively. Thus, the linear residue filter A(ω) is obtained with the following iterative algorithm
    $$S_{X,\omega}^{\kappa} = \frac{1}{\left| A_{\omega}^{\kappa} \right|^2} \tag{8a}$$
    $$\xi_{\omega}^{\kappa} = \frac{S_{X,\omega}^{\kappa}}{S_{V,\omega}} \tag{8b}$$
    $$M_{\omega}^{\kappa} = \left( \frac{\xi_{\omega}^{\kappa}}{\xi_{\omega}^{\kappa} + 1} \right)^2 \tag{8c}$$
    $$h_{\ell}^{\kappa} = \int_{M^{\kappa}} \frac{e^{j\omega\ell}}{\left( A_{\omega}^{\kappa} \right)^{*}}\, \mathrm{d}\omega \tag{8d}$$
    $$g_{\ell}^{\kappa} = \int_{M^{\kappa}} S_{V,\omega}\, A_{\omega}^{\kappa}\, e^{j\omega\ell}\, \mathrm{d}\omega \tag{8e}$$
    $$\int_{M^{\kappa}} |X_{\omega}|^2\, A_{\omega}^{\kappa+1}\, e^{j\omega\ell}\, \mathrm{d}\omega = h_{\ell}^{\kappa} + g_{\ell}^{\kappa} \tag{8f}$$
    for ℓ = 0, 1, ..., M, where the index κ denotes the iteration and the superscript * the complex conjugate. The noise-subtracted power spectrum can be used as initial seed, i.e.,
    $$S_{X,\omega}^{1} = |X_{\omega}|^2 - S_{V,\omega}$$
  • The notation in the integrals refers to
    $$\int_{M^{\kappa}} (\cdot)\, \mathrm{d}\omega \equiv \int_{-\pi}^{\pi} M_{\omega}^{\kappa}\, (\cdot)\, \mathrm{d}\omega$$
    where M_ω^κ is the spectral weight (mask) M(ω) at the κ-th iteration. Since the spectral weight is present in all terms of the inverse problem (8f), its effect is that of weighting the relevance of the spectral samples. The magnitude of the weight depends on the local SNR ξ_ω, such that in areas with high SNR (ξ_ω ≫ 1) the spectral weight tends to one, while in low-SNR areas (ξ_ω ≪ 1) it tends to zero. Note for comparison that in the noiseless case the spectral weight equals one at all frequencies, meaning that the noiseless case does not require spectral selectivity.
  • Finally, the step (8f) is a linear inverse problem involving a positive-semidefinite symmetric Toeplitz system. Thus, it can be efficiently solved with the Levinson algorithm or any other algorithm to solve Toeplitz systems.
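The iteration (8a) to (8f) can be sketched in discrete frequency as follows. This is an illustrative reading, not the applicant's implementation: the integrals are replaced by averages over a uniform grid, the Toeplitz system is solved with a general dense solver instead of the Levinson algorithm, and the first filter A is obtained by ordinary autocorrelation-based linear prediction on the floored seed spectrum, which is an assumption because the text only specifies the seed itself. All function names are hypothetical, and convergence is not guaranteed by this sketch.

```python
import numpy as np

def ar_response(a, omega):
    """A(w) = sum_m a[m] exp(-j w m), equation (5)."""
    m = np.arange(len(a))
    return np.exp(-1j * np.outer(omega, m)) @ a

def toeplitz_from(r):
    """Symmetric Toeplitz matrix whose first column is r."""
    n = np.arange(len(r))
    return r[np.abs(np.subtract.outer(n, n))]

def update_step(a, px, sv, omega):
    """One pass of steps (8a)-(8f); px = |X|^2 and sv = S_V on the grid omega."""
    A = ar_response(a, omega)
    xi = (1.0 / np.abs(A) ** 2) / sv                          # (8a), (8b)
    mask = (xi / (xi + 1.0)) ** 2                             # (8c)
    basis = np.exp(1j * np.outer(np.arange(len(a)), omega))   # e^{j w l}
    h = (basis @ (mask / np.conj(A))).real / len(omega)       # (8d)
    g = (basis @ (mask * sv * A)).real / len(omega)           # (8e)
    r = (basis @ (mask * px)).real / len(omega)               # weighted autocorrelation
    return np.linalg.solve(toeplitz_from(r), h + g)           # (8f)

def initial_coefficients(px, sv, order, omega):
    """Assumed start: linear prediction on the floored noise-subtracted spectrum."""
    seed = np.maximum(px - sv, 1e-3 * sv)                     # floor is an assumption
    r = (np.exp(1j * np.outer(np.arange(order + 1), omega)) @ seed).real / len(omega)
    u = np.linalg.solve(toeplitz_from(r), np.eye(order + 1)[0])
    return u / np.sqrt(u[0])                                  # scale so that H = 1/A carries the gain

def estimate_ar(px, sv, order, n_iter=10):
    omega = 2.0 * np.pi * np.arange(len(px)) / len(px)
    a = initial_coefficients(px, sv, order, omega)
    for _ in range(n_iter):
        a = update_step(a, px, sv, omega)
    return a

# Hypothetical check: a spectrum that matches the model is left unchanged by one update.
omega = 2.0 * np.pi * np.arange(256) / 256
a_true = np.array([2.0, -1.8, 0.9])
sv = 0.05
px = 1.0 / np.abs(ar_response(a_true, omega)) ** 2 + sv
a_next = update_step(a_true, px, sv, omega)
a_est = estimate_ar(px, sv, order=2, n_iter=3)
```

When the observed power spectrum equals the model spectrum plus the noise spectrum, the prediction error E vanishes and the true coefficients are a fixed point of `update_step`. In practice one would stop when successive coefficient vectors differ by less than a tolerance, and a Levinson-type routine such as `scipy.linalg.solve_toeplitz` could replace the dense solve.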
  • Fig. 1 shows in a simplified fashion the processing-block diagram of a speech enhancement front-end (apparatus 100) that uses the method of the present invention. Fig. 2 shows the function of the clean speech estimation step (block 40) of Fig. 1 in detail.
  • Block 10 performs the usual segmentation of the input digital signal into segments.
  • Block 20 performs the spectral transformation of said segment. Said spectral transformation corresponds to the "Discrete Fourier Transform", "Discrete Sine Transform" and/or to the "Fan-Chirp Transform", among other popular choices.
  • Block 30 carries out the estimation of the power spectrum of the noise according to known ad-hoc techniques. It is assumed that this block has memory facilities such that the spectra of the previous segments are stored therein. Therefore, if required, the estimation of the noise power spectrum can be performed by statistical methods over spectral data covering a reasonably long time span.
  • Block 40 carries out the estimation of the clean speech model from the spectrum of the segment and the estimation of the noise power spectrum. The estimation of the clean speech model is based on the numerical implementation of the minimization problem (3), which represents the core method of the present invention.
  • Block 50 computes numerically the signal-to-noise ratio for each frequency (spectral signal-to-noise ratio) from the estimated clean speech model and noise model.
  • Block 60 enhances the spectrum of the input signal by means of state-of-the-art techniques that require the signal-to-noise ratio for each frequency. Among these techniques, we can cite the Wiener filter and its variants, e.g. the square root of the Wiener filter, and the minimum-mean-square-error (MMSE)-based enhancement (see Y. Ephraim and D. Malah, loc.cit.) and its variants, e.g. the log-MMSE, etc. (see P. J. Wolfe and S. J. Godsill, loc.cit.).
  • Block 70 performs the inverse spectral transformation to block 20. The output of block 70 is the enhanced segment of the audio signal.
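The chain of blocks 10 to 70 can be sketched as a simple frame-by-frame loop. The example below is a simplified illustration under several assumptions: non-overlapping rectangular frames, the real FFT as the spectral transformation, a noise power spectrum supplied by the caller in place of block 30, and plain power spectral subtraction as a stand-in for the clean speech estimation of blocks 40 and 50. The function names and parameters are hypothetical.

```python
import numpy as np

def wiener_gain(snr):
    """Wiener gain for a linear-scale spectral SNR."""
    return snr / (snr + 1.0)

def sqrt_wiener_gain(snr):
    """Square-root variant of the Wiener gain."""
    return np.sqrt(wiener_gain(snr))

def subtraction_snr(px, noise_psd):
    """Stand-in for blocks 40-50: SNR from plain power spectral subtraction."""
    return np.maximum(px - noise_psd, 0.0) / noise_psd

def enhance(signal, noise_psd, snr_estimator=subtraction_snr,
            gain_rule=wiener_gain, frame_len=256):
    """Enhance a signal frame by frame; trailing samples short of a frame are dropped."""
    n_frames = len(signal) // frame_len
    out = np.zeros(n_frames * frame_len)
    for i in range(n_frames):
        start = i * frame_len
        segment = signal[start:start + frame_len]                 # block 10
        spectrum = np.fft.rfft(segment)                           # block 20
        snr = snr_estimator(np.abs(spectrum) ** 2, noise_psd)     # blocks 40, 50
        enhanced = gain_rule(snr) * spectrum                      # block 60
        out[start:start + frame_len] = np.fft.irfft(enhanced, n=frame_len)  # block 70
    return out

# Hypothetical usage with white noise as input and a flat noise spectrum.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
noise_psd = np.ones(256 // 2 + 1)
y = enhance(x, noise_psd)
y_clean = enhance(x, noise_psd, snr_estimator=lambda px, sv: np.full_like(px, 1e12))
```

The `snr_estimator` argument is the place where a spectral SNR derived from the estimated clean speech model would be plugged in; with a very large SNR at every frequency the gain is close to one and the signal passes through essentially unchanged.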
  • Although all processor blocks of the apparatus 100 operate with time-discrete and frequency-discrete samples, for the sake of clarity the mathematical description of the invention has been given in continuous frequency. One skilled in the art will immediately recognize that this choice does not affect the essence of the present invention.

Claims (6)

  1. A method for estimating the clean spectrum of a signal degraded by additive noise, in particular a speech signal, by determining the coefficients of a predictive model of said clean spectrum, comprising:
    computing the spectrum of said signal;
    estimating the power spectrum of said noise; and
    determining said coefficients by minimizing the cost function
    $$\int_{2\pi} \left[ \frac{|X(\omega)|^2}{|H(\omega)|^2 + S_V(\omega)} - \log \frac{|X(\omega)|^2}{|H(\omega)|^2 + S_V(\omega)} \right] \mathrm{d}\omega$$
    with respect to said coefficients, with
    X(ω) being the spectrum of said signal,
    S_V(ω) being the power spectrum of said noise, and
    H(ω) being the transfer function of said model based on said coefficients.
  2. The method of claim 1, wherein said cost function is minimized by solving the equation
    $$\int_{-\pi}^{\pi} M(\omega)\, E(\omega)\, A(\omega)\, e^{j\omega\ell}\, \mathrm{d}\omega = 0$$
    with A(ω) being the predictive model based on its coefficients a_m according to
    $$A(\omega) = \sum_{m=0}^{M} a_m e^{-j\omega m},$$
    E(ω) being the prediction error according to
    $$E(\omega) = |X(\omega)|^2 - |H(\omega)|^2 - S_V(\omega),$$
    and M(ω) being a spectral mask defined as
    $$M(\omega) = \left( \frac{\mathrm{SNR}(\omega)}{\mathrm{SNR}(\omega) + 1} \right)^2 \quad \text{with} \quad \mathrm{SNR}(\omega) = \frac{|H(\omega)|^2}{S_V(\omega)}.$$
  3. The method of claim 2, wherein said equation is solved by holding E(ω) and M(ω) constant, solving the remaining linear problem, using the solution to re-evaluate the previous constant terms, and proceeding further iteratively.
  4. The method of any of the claims 1 to 3, wherein said predictive model is a parametric all-pole filter model, an autoregressive coefficients filter (ARC) model, a reflection coefficients filter (RC) model, and/or a line spectral frequencies (LSF) model.
  5. The method of any of the claims 1 to 4, further for enhancing the signal, comprising the further steps of
    calculating a spectral signal-to-noise ratio on the basis of the clean spectrum and the noise spectrum, and
    using the spectral signal-to-noise ratio to enhance the signal.
  6. The method of claim 5, wherein the signal is enhanced by means of a Wiener filter, a MMSE-based enhancement, or variants thereof, using said spectral signal-to-noise ratio.

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP10450036A EP2363853A1 (en) 2010-03-04 2010-03-04 A method for estimating the clean spectrum of a signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP10450036A EP2363853A1 (en) 2010-03-04 2010-03-04 A method for estimating the clean spectrum of a signal

Publications (1)

Publication Number Publication Date
EP2363853A1 true EP2363853A1 (en) 2011-09-07

Family

ID=42316009

Family Applications (1)

Application Number Title Priority Date Filing Date
EP10450036A Withdrawn EP2363853A1 (en) 2010-03-04 2010-03-04 A method for estimating the clean spectrum of a signal

Country Status (1)

Country Link
EP (1) EP2363853A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013061232A1 (en) * 2011-10-24 2013-05-02 Koninklijke Philips Electronics N.V. Audio signal noise attenuation
CN112562701A (en) * 2020-11-16 2021-03-26 华南理工大学 Heart sound signal double-channel self-adaptive noise reduction algorithm, device, medium and equipment
US20220358904A1 (en) * 2019-03-20 2022-11-10 Research Foundation Of The City University Of New York Method for extracting speech from degraded signals by predicting the inputs to a speech vocoder

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008031124A1 (en) * 2006-09-15 2008-03-20 Technische Universität Graz Apparatus for noise suppression in an audio signal
EP1970893A1 (en) * 2007-03-13 2008-09-17 Österreichische Akademie der Wissenschaften A method for estimating signal coding parameters
WO2008109904A1 (en) 2007-03-13 2008-09-18 Österreichische Akademie der Wissenschaften A method for estimating signal coding parameters

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
B. SIM; Y. TONG; J. CHANG; C. TAN: "A parametric formulation of the generalized spectral subtraction method", IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, vol. 6, no. 4, July 1998 (1998-07-01), pages 328 - 337
E. ZAVAREHEI; S. VASEGHI; Q. YAN: "Speech enhancement using Kalman filters for restoration of short-time DFT trajectories", IEEE WORKSHOP AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, 2005, pages 313 - 318
J. H. L. HANSEN; M. A. CLEMENTS: "Constrained iterative speech enhancement with application to speech recognition", IEEE TRANS. SIGNAL PROCESSING, vol. 39, no. 4, April 1991 (1991-04-01), pages 795 - 805
K. FUNAKI: "Speech enhancement based on iterative Wiener filter using complex speech analysis", PROC. EUSIPCO 2008, 29 August 2008 (2008-08-29), Lausanne, Switzerland, pages 1 - 5, XP002593133, Retrieved from the Internet <URL:http://www.eurasip.org/Proceedings/Eusipco/Eusipco2008/papers/1569105040.pdf> [retrieved on 20100722] *
P. C. LOIZOU: "Speech enhancement: Theory and practice", 2007, CRC PRESS
P. J. WOLFE; S. J. GODSILL: "Efficient alternatives to the Ephraim and Malah suppression rule for audio signal enhancement", EURASIP J. APPLIED SIGNAL PROCESSING, vol. 2003, no. 10, 2003, pages 1043 - 1051
T. V. SREENIVAS; P. KIRNAPURE: "Codebook constrained Wiener filtering for speech enhancement", IEEE TRANS. SPEECH, AUDIO PROCESSING, vol. 4, no. 5, September 1996 (1996-09-01), pages 383 - 389
Y. EPHRAIM; D. MALAH: "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator", IEEE TRANS. ACOUST., SPEECH, SIGNAL PROCESSING, vol. 32, no. 6, 1984, pages 1109 - 1121

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013061232A1 (en) * 2011-10-24 2013-05-02 Koninklijke Philips Electronics N.V. Audio signal noise attenuation
US9875748B2 (en) 2011-10-24 2018-01-23 Koninklijke Philips N.V. Audio signal noise attenuation
US20220358904A1 (en) * 2019-03-20 2022-11-10 Research Foundation Of The City University Of New York Method for extracting speech from degraded signals by predicting the inputs to a speech vocoder
CN112562701A (en) * 2020-11-16 2021-03-26 华南理工大学 Heart sound signal double-channel self-adaptive noise reduction algorithm, device, medium and equipment

Similar Documents

Publication Publication Date Title
JP5068653B2 (en) Method for processing a noisy speech signal and apparatus for performing the method
US7313518B2 (en) Noise reduction method and device using two pass filtering
TWI420509B (en) Noise variance estimator for speech enhancement
EP0807305B1 (en) Spectral subtraction noise suppression method
Soon et al. Speech enhancement using 2-D Fourier transform
US20100023327A1 (en) Method for improving speech signal non-linear overweighting gain in wavelet packet transform domain
CN110767244B (en) Speech enhancement method
WO2000017855A1 (en) Noise suppression for low bitrate speech coder
US7016839B2 (en) MVDR based feature extraction for speech recognition
US20130138437A1 (en) Speech recognition apparatus based on cepstrum feature vector and method thereof
Daqrouq et al. An investigation of speech enhancement using wavelet filtering method
Mellahi et al. LPC-based formant enhancement method in Kalman filtering for speech enhancement
EP2363853A1 (en) A method for estimating the clean spectrum of a signal
Lei et al. Speech enhancement for nonstationary noises by wavelet packet transform and adaptive noise estimation
Poovarasan et al. Speech enhancement using sliding window empirical mode decomposition and hurst-based technique
Gupta et al. Speech enhancement using MMSE estimation and spectral subtraction methods
Batina et al. Noise power spectrum estimation for speech enhancement using an autoregressive model for speech power spectrum dynamics
Hong et al. Independent component analysis based single channel speech enhancement
EP1635331A1 (en) Method for estimating a signal to noise ratio
Tran et al. Speech enhancement using modified IMCRA and OMLSA methods
Bolisetty et al. Speech enhancement using modified Wiener filter based MMSE and speech presence probability estimation
Erkelens et al. Speech enhancement based on Rayleigh mixture modeling of speech spectral amplitude distributions
Gui et al. Adaptive subband Wiener filtering for speech enhancement using critical-band gammatone filterbank
Zavarehei et al. Speech enhancement in temporal DFT trajectories using Kalman filters.
Funaki Speech enhancement based on iterative Wiener filter using complex speech analysis

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR

AX Request for extension of the european patent

Extension state: AL BA ME RS

STAA Information on the status of an EP patent application or granted EP patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20120308