WO2006097886A1 - Noise power estimation - Google Patents

Noise power estimation

Info

Publication number
WO2006097886A1
Authority
WO
WIPO (PCT)
Prior art keywords
power
noise
speech
estimate
speech signal
Application number
PCT/IB2006/050771
Other languages
French (fr)
Inventor
Ivo Batina
Jesper Jensen
Richard Heusdens
Original Assignee
Koninklijke Philips Electronics N.V.
Application filed by Koninklijke Philips Electronics N.V.
Publication of WO2006097886A1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise

Abstract

A method of estimating the power spectral density of noise in a speech signal uses filtering in the frequency domain. The current power of the speech signal is determined per time segment and per frequency range, the difference of the current power and a current estimate of the power is determined, and then a new estimate of the noise power is provided on the basis of said difference and said current estimate. A speech enhancement method may include these method steps. A device (1) for estimating the power spectral density of noise in the speech signal may be comprised in portable consumer apparatus.

Description

Noise power estimation
The present invention relates to noise estimation. More particularly, the present invention relates to a method of and a device for estimating the power spectral density of noise in a noisy speech signal.
It is well known to estimate the amount of noise in speech signals. As it is very difficult to separate noise from speech, conventional methods estimate noise properties in the pauses between speech, that is, between words. In the (assumed) absence of speech, noise properties such as the amplitude or power spectrum can be determined relatively easily. If it is assumed that the noise in the pauses between speech is identical to the noise during speech (stationary noise), an accurate noise estimate may be obtained. However, experiments have shown that natural noise, for example the noise heard in a moving vehicle, is not necessarily stationary, and that noise estimates obtained during pauses are therefore unreliable predictors of the noise during speech. It is therefore desirable to estimate noise properties during speech.
The paper by Rainer Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics", IEEE Transactions on Speech and Audio Processing, 9(5): 504-512, July 2001, discloses a method for noise estimation in which no distinction is made between speech activity and speech pauses. This known method, commonly referred to as the Minimum Statistics (MS) method, is designed to be combined with speech enhancement schemes. It is based on tracking the spectral minima of the noisy speech power spectrum. The method is computationally complex, and its noise estimate is updated slowly when the noise energy level rises suddenly. Consequently, it is mainly applicable to slowly varying noise sources.
Speech enhancement schemes typically require an estimate of the power spectral density of noise in order to extract a "clean" speech signal estimate from the noisy input speech signal. In many applications, it is desirable to enhance speech in real time, which requires methods of relatively low computational complexity. This is particularly the case when the speech enhancement is to be carried out by a relatively small device with limited computing power, such as a portable consumer device. Typical portable consumer devices in which speech enhancement may be used include, for example, mobile (cellular) telephones and electronic hearing aids. Prior art methods and devices do not allow effective speech enhancement or noise estimation to be carried out in portable consumer devices.
It is an object of the present invention to overcome these and other problems of the prior art and to provide a method and a device which allow noise properties to be estimated from a noisy speech signal, that is, during speech activity. It is a further object of the present invention to provide a method and a device for noise power estimation which have a low computational complexity and are therefore suitable for real-time applications.
Accordingly, the present invention provides a method of estimating the power spectral density of noise in a speech signal, the method comprising the steps of:
- determining the total power of the speech signal,
- determining the difference of said total power and a current estimate of said total power,
- providing a new estimate of the noise power and the speech power using said difference and a current estimate of the noise power and the speech power, and
- deriving a new estimate of said total power from the new estimate of the noise power and the speech power,
wherein the above steps are carried out per time segment and per frequency range.
By providing a new estimate of the noise power and the speech power using said difference and a current estimate of the noise power and the speech power, new estimates of the noise power can be obtained very efficiently, requiring a relatively small number of calculations. Accordingly, the method of the present invention can be carried out even by devices having relatively little computing power, such as portable consumer devices.
The type of calculations made in the method of the present invention is known as Kalman filtering. Several efficient algorithms are known for these calculations, enabling efficient computation. In particular, the recursive nature of these calculations enhances the efficiency.
By carrying out the method per time segment and per frequency range, the noise power estimates are made per frequency band (also called frequency bin). In this way, the power spectral density is obtained. By carrying out the method per time segment, for example per time frame, the calculations can be very efficient.
It is noted that the total power of the speech signal is the combined power of the noise and the "clean" speech signal. The estimate of the noise power and the speech power comprises both a noise power estimate and a separate speech power estimate.
It is preferred that the step of providing a new estimate of the noise power and the speech power involves multiplying the current estimate by a first gain, said first gain preferably being determined using a priori knowledge. In this way, a priori knowledge may be used to improve the estimates. This a priori knowledge may be obtained during (off-line) test sessions or may be based on scientific assumptions on the noise properties. Alternatively, or additionally, the first gain may be updated on-line, while the method is being executed, to track the local statistics of the underlying "clean" speech.
It is further preferred that the step of providing a new estimate of the noise power and the speech power involves multiplying the difference by a second gain, said second gain preferably being determined using a priori knowledge.
As stated above, the method of the present invention is carried out per time segment and per frequency band. To achieve this, it is preferred that the speech signal is transformed using a short-time Fourier transform (STFT), which provides an efficient transformation. In addition, the method may then be carried out in the frequency domain. This preferred embodiment may be summarized as Kalman filtering in the frequency domain. It is noted that Kalman filtering is conventionally carried out in the time domain, not in the frequency domain. By using Kalman filtering in the frequency domain, a more efficient and effective estimation of noise properties in a speech signal may be obtained. In particular, no assumptions have to be made on noise properties, such as the typical (but often unrealistic) assumption that the noise is autoregressive.
The present invention further provides a speech enhancement method, comprising the steps of: estimating the power spectral density of noise in a speech signal as defined above, and using the estimated power spectral density to remove noise from the speech signal.
In an alternative embodiment, the amplitude of the noise is derived from the power spectral density. That is, the noise amplitude (the absolute value of the spectrum) instead of the power spectral density is estimated. Those skilled in the art will appreciate that this amplitude is equal to the square root of the power spectral density.
The present invention additionally provides a computer program product for carrying out the method as defined above. A computer program product may comprise a set of computer executable instructions stored on a data carrier, such as a CD or a DVD. The set of computer executable instructions, which allow a programmable computer to carry out the method as defined above, may also be available for downloading from a remote server, for example via the Internet.
The present invention also provides a device for estimating the power spectral density of noise in a speech signal, the device comprising: means for determining the total power of the speech signal, means for determining the difference of said total power and a current estimate of said total power, means for providing a new estimate of the noise power and the speech power using said difference and a current estimate of the noise power and the speech power, means for deriving a new estimate of said total power from the new estimate of the noise power and the speech power, wherein said means are arranged for estimating the power spectral density of noise per time segment and per frequency range. In addition, the present invention provides a speech enhancement device, comprising: means for estimating the power spectral density of noise in a speech signal as defined above, and means for using the estimated power spectral density to remove noise from the speech signal.
The present invention will further be explained below with reference to exemplary embodiments illustrated in the accompanying drawings, in which: Fig. 1 schematically shows a noise power estimating device according to the present invention.
Fig. 2 schematically shows a speech enhancement device according to the present invention.
The noise power estimating device 1 shown merely by way of non-limiting example in Fig. 1 comprises a squaring unit 11, a first combination unit 12, a (Kalman) gain unit 13, a second combination unit 14, a delay unit 15, a power factor unit 16, and a further gain unit 17.
The squaring unit 11 receives the absolute value (magnitude) |Y(k,m)| of the frequency spectrum Y(k,m) of a noisy speech signal y. The frequency spectrum Y(k,m) is a function of the frequency range (that is, frequency band or bin) represented by the frequency range index k and the time segment (that is, time frame or other time segment) represented by the time segment index m. That is, a frequency spectrum value Y(k,m) is made available for each frequency range k and each time segment m. The frequency spectrum value Y(k, m) or its absolute value |Y(k,m)| may be produced by a short-time Fourier transform (STFT) which is well known in the art.
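The STFT front end is not specified further in the text. As a minimal sketch, assuming a Hann window, a frame length of 256 samples and 50 % overlap (all illustrative choices, not values from this application), the per-bin, per-frame power |Y(k,m)|² could be computed with NumPy as follows. The helper name `stft_power` is hypothetical, and bins are indexed from 0 here rather than from 1.

```python
import numpy as np


def stft_power(y, frame_len=256, hop=128):
    """Return |Y(k, m)|^2 as an array of shape (frame_len // 2 + 1, n_frames).

    Hypothetical helper: Hann-windowed frames, one-sided FFT per frame.
    Window, frame length and hop size are illustrative assumptions.
    """
    y = np.asarray(y, dtype=float)
    if len(y) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    # Column m holds the windowed time segment m.
    frames = np.stack(
        [y[m * hop : m * hop + frame_len] * window for m in range(n_frames)], axis=1
    )
    Y = np.fft.rfft(frames, axis=0)  # rows: frequency bins k, columns: frames m
    return np.abs(Y) ** 2


# Usage: a 1 kHz tone sampled at 8 kHz falls exactly in bin 1000 / 8000 * 256 = 32.
fs = 8000
t = np.arange(fs) / fs
P = stft_power(np.sin(2 * np.pi * 1000 * t))
print(P.shape, P.argmax(axis=0)[:3])
```

Each column of `P` is one time segment m and each row one frequency range k, which is the layout the estimator of Fig. 1 consumes.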
The squaring unit 11 outputs the total power |Y(k,m)|² of the frequency spectrum of the noisy speech signal, that is, the power of the speech signal including noise, for each frequency range and each time segment. The first combination unit 12 determines the difference of this power and the current estimate of the total power. This current estimate is produced by the power factor unit 16 on the basis of the noise power and the speech power, as will be explained later in more detail. The difference of the total power and its estimate is fed to the gain unit 13, where it is multiplied by a gain K. This gain K is the so-called Kalman gain, as will be explained later in more detail, and may be based upon a priori knowledge. The result of this multiplication is fed to the second combination unit 14, where it is added to the output of the gain unit 17. The result of this addition is the new estimate of the noise power and the speech power: x(k, m+1) or, more accurately, x̂(k, m+1), where the circumflex (^) indicates that the value is an estimate.
It is noted that x(k,m) is a vector which is indicative of the power spectrum of the noise and of the "clean" speech in a frequency range (or bin) k and in a time segment (or frame) m. This will later be explained in more detail. The delay unit 15 is coupled to the second combination unit 14 so as to produce a delayed version of the new estimate x̂(k, m+1), that is, the current estimate x̂(k, m). This current estimate is fed to the power factor unit 16, where it is multiplied by a factor C so as to produce the current estimate of the total power (spectrum) Cx̂(k,m). The current estimate x̂(k, m) of the noise power and the speech power is also fed to the gain unit 17, where it is multiplied by a factor A. The result of this multiplication is also fed to the second combination unit 14.
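The signal flow of Fig. 1 can be sketched as a small streaming estimator. This is a minimal illustration, not the applicant's implementation: it assumes that the diagonal entries of A(k) and the gain K(k,m) are pre-computed and held constant over time (one of the options the description mentions), and the class and argument names are hypothetical. The state is a 2 × L array whose rows hold the speech and noise power estimates for all L bins, so one call to `step` processes one time segment for every frequency range at once.

```python
import numpy as np


class NoisePowerEstimator:
    """Hypothetical sketch of the Fig. 1 loop, vectorized over L frequency bins.

    State x_hat has shape (2, L): row 0 = speech power, row 1 = noise power,
    i.e. x(k, m) = [lambda_s, lambda_n]^T for every bin k.
    """

    def __init__(self, a, K, x0):
        self.a = np.asarray(a, dtype=float)  # (2, L): diagonal of A(k) (gain unit 17)
        self.K = np.asarray(K, dtype=float)  # (2, L): fixed Kalman gain (gain unit 13)
        self.x_hat = np.asarray(x0, dtype=float).copy()  # contents of delay unit 15

    def step(self, Y_mag):
        """Process one time segment m; return the new noise power estimate per bin."""
        power = np.abs(Y_mag) ** 2  # squaring unit 11: |Y(k,m)|^2
        total_est = self.x_hat.sum(axis=0)  # power factor unit 16: C x_hat, C = [1 1]
        diff = power - total_est  # first combination unit 12
        # Second combination unit 14: A x_hat + K * diff; delay unit 15 keeps it.
        self.x_hat = self.a * self.x_hat + self.K * diff
        return self.x_hat[1].copy()


# Usage with one bin, a_s = a_n = 1 and a hypothetical gain that updates only the
# noise row: the noise estimate moves halfway toward |Y|^2 = 4 in each frame.
est = NoisePowerEstimator(a=[[1.0], [1.0]], K=[[0.0], [0.5]], x0=[[0.0], [0.0]])
print([float(est.step(np.array([2.0]))[0]) for _ in range(3)])  # → [2.0, 3.0, 3.5]
```

Holding K fixed keeps the per-frame cost to a few multiply-adds per bin; computing K(k,m) on-line via equation (4) would add a small matrix computation per bin.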
The method of the present invention, as carried out by the device 1 of Fig. 1, can be expressed mathematically as follows. Assuming that the noisy speech signal y(n) is transformed by the short-time Fourier transform (STFT) to the "short-time" frequency domain (also called "time-frequency domain"), the resulting (total or combined) frequency spectrum may be written as:
$$Y(k,m) = S(k,m) + N(k,m) \qquad (1)$$
where Y, S and N denote STFT coefficients of the noisy speech signal, the speech and the noise respectively. As before, k denotes the frequency range index and m represents the time segment index.
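The model introduced next treats |Y(k,m)|² as its expected value multiplied by an exponentially distributed random variable. This is consistent with a common modelling assumption that the text does not spell out: if S(k,m) and N(k,m) are independent, zero-mean, circularly symmetric complex Gaussian coefficients with variances λ_s and λ_n, then |Y(k,m)|² is exponentially distributed with mean λ_s + λ_n. The Monte Carlo sketch below, with arbitrarily chosen variances, checks that the ratio of |Y|² to λ_s + λ_n has the unit mean and unit variance of a standard exponential variable.

```python
import numpy as np

rng = np.random.default_rng(0)
lam_s, lam_n, n = 3.0, 1.0, 200_000  # arbitrary speech/noise variances, sample count


def complex_gaussian(var, size):
    """Zero-mean circular complex Gaussian samples with E|z|^2 = var."""
    return np.sqrt(var / 2) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))


Y = complex_gaussian(lam_s, n) + complex_gaussian(lam_n, n)  # equation (1) for one bin
e = np.abs(Y) ** 2 / (lam_s + lam_n)  # |Y|^2 divided by C x = lam_s + lam_n
print(e.mean(), e.var())  # both should be close to 1
```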
Starting from equation (1), the power spectrum of the signal y(n) may be modeled as:
$$|Y(k,m)|^2 = C\,x(k,m)\,e(k,m), \qquad x(k,m) = \begin{bmatrix} \lambda_s(k,m) \\ \lambda_n(k,m) \end{bmatrix} \qquad (2)$$
where C is the vector [1 1], x(k,m) is a vector in which λ_s(k,m) and λ_n(k,m) are the speech variance and the noise variance respectively, and e(k,m) is an exponentially distributed random variable. Accordingly, the Kalman filtering equations for the device 1 of Fig. 1 can be written as:
$$\hat{x}(k,m+1) = A(k)\,\hat{x}(k,m) + K(k,m)\left(|Y(k,m)|^2 - C\,\hat{x}(k,m)\right) \qquad (3)$$
where x̂ denotes the estimate of x, A(k) is the matrix
$$A(k) = \begin{bmatrix} a_s(k) & 0 \\ 0 & a_n(k) \end{bmatrix}$$
and K(k,m) is the Kalman gain for frequency range k and time segment m. The Kalman gain K(k,m) can be written as:
$$K(k,m) = A(k)\,Q_e(k,m)\,C^T \left( C\left(2Q_e(k,m) + Q(k,m)\right)C^T \right)^{-1} \qquad (4)$$
where Q_e(k,m) and Q(k,m) define the variance of the estimation error. As noted above, however, the Kalman gain may be pre-computed, for example using a priori knowledge, or may be determined experimentally. The non-zero coefficients of A(k) may be determined experimentally, for example by numerical optimization for a large, "clean" speech sample. It has been found that a suitable value of a_n is 1, while suitable values for a_s(k) may be given by:
$$a_s(k) = \begin{cases} 6.265\cdot 10^{-6}\,k^2 - 1.9163\cdot 10^{-3}\,k + 0.87941, & 1 \le k \le L/2 \\ 6.265\cdot 10^{-6}\,\eta(k)^2 - 1.9163\cdot 10^{-3}\,\eta(k) + 0.87941, & L/2 < k \le L \end{cases}$$
where η(k) = L - k + 1, with L indicating the number of frequency ranges, typically the number of FFT bins.
It will be understood that the above numerical values are exemplary only and that other numerical values can be used without departing from the scope of the present invention.
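As a sketch of how the exemplary values and equation (4) might be evaluated, the NumPy code below computes a_s(k) for all L bins, using the mirrored index η(k) = L - k + 1 for the upper half, and evaluates equation (4) for a single bin exactly as written. The function names are hypothetical, and the matrices Q_e and Q in the usage lines are arbitrary placeholders, not values from the application.

```python
import numpy as np


def a_speech(L):
    """Exemplary a_s(k) for k = 1..L, returned at 0-based index k - 1."""
    k = np.arange(1, L + 1, dtype=float)
    eta = np.where(k <= L / 2, k, L - k + 1)  # mirrored index for L/2 < k <= L
    return 6.265e-6 * eta**2 - 1.9163e-3 * eta + 0.87941


def kalman_gain(a_s, a_n, Q_e, Q):
    """Equation (4) for one bin: K = A Q_e C^T (C (2 Q_e + Q) C^T)^{-1}."""
    A = np.diag([a_s, a_n])  # state order x = [lambda_s, lambda_n]^T
    C = np.array([[1.0, 1.0]])  # C = [1 1]
    innovation = C @ (2.0 * Q_e + Q) @ C.T  # 1x1 matrix
    return A @ Q_e @ C.T @ np.linalg.inv(innovation)  # 2x1 gain


# Usage with L = 256 bins, a_n = 1 and placeholder error covariances.
a_s = a_speech(256)
K = kalman_gain(a_s[0], 1.0, Q_e=np.eye(2), Q=np.zeros((2, 2)))
print(a_s[0], a_s[127], K.ravel())
```

Because the curve is mirrored about L/2, the coefficients are symmetric in k, and with the placeholder Q_e = I and Q = 0 the gain reduces to [a_s/4, a_n/4].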
The speech enhancement device 10 of Fig. 2 comprises a short-time Fourier transform (STFT) unit 2, a noise power estimating unit 1 and a speech enhancement unit 3. The STFT unit 2 receives a noisy signal y(n), which is a time signal, n being the sample number. The unit 2 transforms this time signal y(n) into a frequency spectrum Y(k,m) and its absolute value |Y(k,m)|, which are dependent on both time (the time segment or frame index m) and frequency (the frequency range or frequency bin index k). The absolute value |Y(k,m)| is then fed to the noise power estimating (NPE) unit 1, which preferably corresponds to the noise power estimating device 1 of Fig. 1. Both the estimated noise power and the speech signal y(n) are fed to the speech enhancement (SE) unit 3, which may apply a known speech enhancement algorithm, such as a short-time spectral amplitude (STSA) algorithm. This unit 3 outputs a spectral amplitude |S(k,m)| representing a speech signal from which the noise has been substantially removed. The spectral amplitude |S(k,m)| may be transformed into a speech time signal s(n) using means well known in the art, such as an inverse STFT.
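The text leaves the enhancement algorithm of unit 3 open and names STSA only as an example. As a deliberately simpler stand-in (power spectral subtraction, not an STSA estimator), the sketch below shows how a per-bin noise power estimate could be turned into an estimate of |S(k,m)|. The function name and the gain floor are illustrative assumptions.

```python
import numpy as np


def enhance_magnitude(Y_mag, noise_psd, floor=0.01):
    """Power spectral subtraction: a simple stand-in for the SE unit 3.

    The squared gain 1 - noise_psd / |Y|^2 is clipped from below at `floor`
    to avoid negative powers, so the amplitude gain never drops under sqrt(floor).
    """
    Y_pow = np.maximum(np.abs(Y_mag) ** 2, 1e-12)  # guard against division by zero
    gain = np.sqrt(np.maximum(1.0 - noise_psd / Y_pow, floor))
    return gain * np.abs(Y_mag)  # estimated clean-speech amplitude per bin


S_mag = enhance_magnitude(np.array([2.0, 1.0]), np.array([3.0, 2.0]))
print(S_mag)
```

To return to the time domain, such an amplitude estimate is typically combined with the phase of Y(k,m) and passed through an inverse STFT.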
It is noted that instead of the signal y(n), the absolute value |Y(k,m)| of the frequency spectrum may be fed to the speech enhancement (SE) unit 3, in which case the output of the STFT unit 2 is connected to the SE unit 3. The squaring (SQR) unit 11 shown in Fig. 1 may be omitted from the device 1 if the power spectrum |Y(k,m)|² is available. Instead, the squaring unit 11 may for example be part of the device 10 of Fig. 2.
It will be clear from the above description that the present invention provides both devices, as illustrated in Figs. 1 and 2, and methods, as carried out by the exemplary devices of Figs. 1 and 2. Accordingly, the present invention provides a method of estimating the power spectral density of noise in a speech signal, the method comprising the steps of: determining the total power of the speech signal, as determined in Fig. 1 by the squaring unit 11, determining the difference of said total power and a current estimate of said total power, as determined in Fig. 1 by the combination unit 12, providing a new estimate of the noise power and the speech power using said difference and a current estimate of the noise power and the speech power, as provided in Fig. 1 by the units 13, 14, 15 and 17, and deriving a new estimate of said total power from the new estimate of the noise power and the speech power, as derived in Fig. 1 by the unit 16, wherein the above steps are carried out per time segment and per frequency range, as illustrated by the indices k and m.
The present invention is based upon the insight that Kalman filtering in the frequency domain provides an efficient way to estimate noise properties in a noisy speech signal. The present invention benefits from the further insight that a short-time Fourier transform is particularly suitable for pre-processing speech samples for Kalman filtering in the frequency domain.
It is noted that any terms used in this document should not be construed so as to limit the scope of the present invention. In particular, the words "comprise(s)" and "comprising" are not meant to exclude any elements not specifically stated. Single (circuit) elements may be substituted with multiple (circuit) elements or with their equivalents.
It will be understood by those skilled in the art that the present invention is not limited to the embodiments illustrated above and that many modifications and additions may be made without departing from the scope of the invention as defined in the appended claims.

Claims

1. A method of estimating the power spectral density of noise in a speech signal, the method comprising the steps of: determining the total power of the speech signal, determining the difference of said total power and a current estimate of said total power, providing a new estimate of the noise power and the speech power using said difference and a current estimate of the noise power and the speech power, deriving a new estimate of said total power from the new estimate of the noise power and the speech power, wherein the above steps are carried out per time segment and per frequency range.
2. The method according to claim 1, wherein the step of providing a new estimate of the noise power and the speech power involves multiplying the current estimate by a first gain (A), said first gain preferably being determined using a priori knowledge.
3. The method according to claim 1, wherein the step of providing a new estimate of the noise power and the speech power involves multiplying said difference by a second gain (K), said second gain preferably being determined using a priori knowledge.
4. The method according to claim 1, wherein the speech signal is transformed using a short-time Fourier transform.
5. A speech enhancement method, comprising the steps of: estimating the power spectral density of noise in a speech signal in accordance with claim 1, and using the estimated power spectral density to remove noise from the speech signal.
6. The speech enhancement method according to claim 5, comprising the additional step of deriving the amplitude of the speech signal from its power spectral density.
7. A computer program product for carrying out the method according to claim 1 or claim 5.
8. A device for estimating the power spectral density of noise in a speech signal, the device comprising: means (11) for determining the total power of the speech signal, means (12) for determining the difference of said total power and a current estimate of said total power, means (13, 14, 15, 17) for providing a new estimate of the noise power and the speech power using said difference and a current estimate of the noise power and the speech power, means (16) for deriving a new estimate of said total power from the new estimate of the noise power and the speech power, wherein said means are arranged for estimating the power spectral density of noise per time segment and per frequency range.
9. The device according to claim 8, wherein the means for providing a new estimate of the noise power and the speech power are arranged for multiplying the current estimate by a first gain (A), said first gain preferably being determined using a priori knowledge.
10. The device according to claim 8, wherein the means for providing a new estimate of the noise power and the speech power are arranged for multiplying the difference by a second gain (K), said second gain preferably being determined using a priori knowledge.
11. The device according to claim 8, arranged for transforming the speech signal using a short-time Fourier transform.
12. A speech enhancement device (10), comprising: means (1) for estimating the power spectral density of noise in a speech signal in accordance with claim 1, and means (3) for using the estimated power spectral density to remove noise from the speech signal.
13. The device according to claim 12, arranged for deriving the amplitude of the noise from its power spectral density.
14. A consumer device, comprising a device (1; 10) according to claim 8 or claim 12.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP05102067.5 2005-03-16
EP05102067 2005-03-16

Publications (1)

Publication Number Publication Date
WO2006097886A1 (en) 2006-09-21

Family

ID=36593632

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/050771 WO2006097886A1 (en) 2005-03-16 2006-03-13 Noise power estimation

Country Status (1)

Country Link
WO (1) WO2006097886A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5659622A (en) * 1995-11-13 1997-08-19 Motorola, Inc. Method and apparatus for suppressing noise in a communication system
WO2005050623A1 (en) * 2003-11-12 2005-06-02 Telecom Italia S.P.A. Method and circuit for noise estimation, related filter, terminal and communication network using same, and computer program product therefor

Non-Patent Citations (6)

Title
"Speech processing, Transmission and Quality aspects (STQ); Distributed speech recognition; Advanced front-end feature extraction algorithm; Compression algorithms; ETSI ES 202 050", ETSI STANDARDS, EUROPEAN TELECOMMUNICATIONS STANDARDS INSTITUTE, SOPHIA-ANTIPO, FR, vol. STQ-AURORA, no. V111, October 2002 (2002-10-01), XP014004538, ISSN: 0000-0001 *
CHUNJIAN LI ET AL: "Integrating Kalman filtering and multi-pulse coding for speech enhancement with a non-stationary model of the speech signal", SIGNALS, SYSTEMS AND COMPUTERS, 2004. CONFERENCE RECORD OF THE THIRTY-EIGHTH ASILOMAR CONFERENCE ON PACIFIC GROVE, CA, USA NOV. 7-10, 2004, PISCATAWAY, NJ, USA, IEEE, 7 November 2004 (2004-11-07), pages 2300 - 2304, XP010781136, ISBN: 0-7803-8622-1 *
I. BATINA, J. JENSEN, R. HEUSDENS: "KALMAN FILTERING BASED NOISE POWER SPECTRAL DENSITY ESTIMATION FOR SPEECH ENHANCEMENT", THIRD EUROPEAN SIGNAL PROCESSING CONFERENCE, ANTALYA, TURKEY, 4 September 2005 (2005-09-04), XP007900788, Retrieved from the Internet <URL:http://www.ee.bilkent.edu.tr/~signal/defevent/papers/cr1834.pdf> [retrieved on 20060704] *
NEHORAI A ET AL: "A mapping result between Wiener theory and Kalman filtering for nonstationary processes", IEEE TRANSACTIONS ON AUTOMATIC CONTROL USA, vol. AC-30, no. 2, February 1985 (1985-02-01), pages 175 - 177, XP002392936, ISSN: 0018-9286 *
RAINER MARTIN: "Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics", IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 9, no. 5, July 2001 (2001-07-01), XP011054118, ISSN: 1063-6676 *
WU W-R ET AL: "SUBBAND KALMAN FILTERING FOR SPEECH ENHANCEMENT", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, IEEE INC. NEW YORK, US, vol. 45, no. 8, August 1998 (1998-08-01), pages 1072 - 1083, XP000848629, ISSN: 1057-7130 *

Cited By (2)

Publication number Priority date Publication date Assignee Title
EP2164066A1 (en) * 2008-09-15 2010-03-17 Oticon A/S Noise spectrum tracking in noisy acoustical signals
US8712074B2 (en) 2008-09-15 2014-04-29 Oticon A/S Noise spectrum tracking in noisy acoustical signals

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase (ref country code: DE)
NENP Non-entry into the national phase (ref country code: RU)
WWW WIPO information: withdrawn in national office (country of ref document: RU)
122 EP: PCT application non-entry in European phase (ref document number: 06711083; country of ref document: EP; kind code of ref document: A1)
WWW WIPO information: withdrawn in national office (ref document number: 6711083; country of ref document: EP)