EP0994463A2 - Post filter - Google Patents

Post filter

Info

Publication number
EP0994463A2
Authority
EP
European Patent Office
Prior art keywords
frequency
formant
max
postfilter
spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP99307954A
Other languages
German (de)
French (fr)
Inventor
Alastair Black
Jacek Horos
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Oyj
Original Assignee
Nokia Mobile Phones Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Mobile Phones Ltd
Publication of EP0994463A2
Legal status: Withdrawn (current)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26: Pre-filtering or post-filtering
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/15: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information


Abstract

A method for calculating a postfilter frequency response for filtering digitally processed speech, the method comprising identifying at least one formant of a speech spectrum of the digitally processed speech; and normalising points of the speech spectrum with respect to an identified formant.

Description

  • This invention relates to a method and apparatus for postfiltering a digitally processed signal.
  • To enable transmission of speech at low bit rates, various types of speech encoder have been developed which compress a speech signal before it is transmitted. On receipt of the compressed signal, the receiver decompresses it before the signal is finally converted back into an audio signal.
  • Even though, over the same bandwidth, a compressed speech signal allows more information to be transmitted than an uncompressed signal, the quality of digitally compressed speech signals is often degraded by, for example, background noise, coding noise and by noise due to transmission over a channel.
  • In particular, as the encoding rate of the processed signal is reduced, the SNR also drops and the noise floor of the coding noise rises. At low encoding rates it can become impossible to keep the noise below the audible masking threshold and hence the noise can contribute to the overall roughness of the speech signal.
  • Two techniques have been developed to deal with this problem. The first technique uses noise spectral shaping at the speech encoder. The idea behind spectral shaping is to shape the spectrum of the coding noise so that it follows the speech spectrum, otherwise known as the speech spectral envelope. Spectrally shaped noise, when coded, is less audible to the human ear due to the noise masking effect of the human auditory system. However, at low encoding rates noise spectral shaping alone is not sufficient to make the coding noise inaudible. For example, even with noise spectral shaping, the quality of a Code Excited Linear Prediction (CELP) coder having an encoding rate of 4.8 kb/s is still perceived as rough or noisy. The second technique uses an adaptive postfilter at the speech decoder output, which typically comprises a short term postfilter element and a long term postfilter element. The purpose of the long term postfilter is to attenuate frequency components between pitch harmonic peaks, whereas the purpose of the short term postfilter is to accurately track the time-varying nature of the speech signal and suppress the noise residing in the spectral valleys. The frequency response of the short term postfilter typically corresponds to a modified version of the speech spectrum in which the postfilter has local minimums in the regions corresponding to the spectral valleys and local maximums at the spectral peaks, otherwise known as formant frequencies. The dips in the regions corresponding to the spectral valleys (i.e. the local minimums) suppress the noise, thereby accomplishing noise reduction and removing noise from the perceived speech signal. The local maximums allow for more noise in the formant regions, where it is masked by the speech signal. However, some speech distortion is introduced because the relative signal levels in the formant regions are altered by the postfiltering.
  • Most speech codecs use a time domain postfilter based on US Patent Number 4,969,192, in which the postfiltering is implemented temporally as a difference equation. As such, the postfilter can be described by a transfer function, and it is consequently not possible to independently control the different portions of the frequency spectrum; noise reduction obtained by suppressing the noise around the spectral valleys therefore distorts the speech signal by sharpening the formant peaks.
  • Consequently, most current short term postfilters shape the spectrum such that the formants become narrower and more peaky. Whilst this reduces the noise in the valleys, it has the side effect of altering the spectral shape such that the speech becomes boomy and less natural. This effect is especially prevalent when large amounts of postfiltering are applied to the signal, as is the case for Pitch Synchronous Innovation-CELP (PSI-CELP).
  • In accordance with one aspect of the present invention there is provided a method for calculating a short term postfilter frequency response for filtering digitally processed speech, the method comprising identifying at least one formant of the speech spectrum; and normalising points of the speech spectrum with respect to the magnitude of an identified formant.
  • Using this method it is possible to independently control different portions of the frequency spectrum.
  • Preferably the points of the speech spectrum are normalised with respect to the magnitude of the nearest formant.
  • Most preferably the points of the speech spectrum are normalised according to a function of the form Rpost(k) = (R(k)/Rform(k))^β, where R(k) is the amplitude of the spectrum at a frequency k, Rform(k) is the amplitude of the spectrum at a frequency k which corresponds to an identified formant frequency, and β controls the degree of postfiltering, with β = ((k - kmax)/(kmin - kmax))·γ for kmax < k ≤ kmin and β = ((kmax - k)/(kmax - kmin))·γ for kmin < k ≤ kmax,
    where k is a point in frequency, kmin is the frequency of a spectral valley, kmax is the frequency of a formant and γ controls the degree of postfiltering, i.e. the depth of the postfilter valleys.
  • Preferably the at least one formant is identified by finding a first derivative of the speech spectrum.
  • In accordance with a second aspect of the present invention there is provided a postfiltering method for enhancing a digitally processed speech signal, the method comprising obtaining a speech spectrum of the digitally processed signal; identifying at least one formant of the speech spectrum; normalising points of the speech spectrum with respect to the magnitude of an identified formant to produce a postfilter frequency response; and filtering the speech spectrum of the digitally processed signal with the postfilter frequency response.
  • In accordance with a third aspect of the present invention there is provided a postfilter comprising identifying means for identifying at least one formant of a digitally processed speech spectrum; normalising means for normalising points of the speech spectrum with respect to the magnitude of an identified formant to produce a postfilter frequency response; and means for filtering the digitally processed speech spectrum with the postfilter frequency response.
  • In accordance with a fourth aspect of the present invention there is provided a radiotelephone comprising a postfilter, the postfilter having identifying means for identifying at least one formant of a digitally processed speech spectrum; normalising means for normalising points of the speech spectrum with respect to the magnitude of an identified formant to produce a postfilter frequency response; and means for filtering the digitally processed speech spectrum with the postfilter frequency response.
  • The invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
  • Figure 1 is a schematic block diagram of a radiotelephone incorporating a postfilter according to the present invention;
  • Figure 2 is a schematic block diagram of a postfilter according to the present invention;
  • Figures 3a and 3b illustrate an example of a frequency response of a postfilter according to the present invention compared with the corresponding postfiltered speech spectrum.
  • The embodiment of the invention described below is based on the postfiltering of a signal that has been digitally processed by a time domain adaptive predictive coder, for example a Residual Excited Linear Prediction (RELP) or CELP coder/decoder. However, this invention is equally applicable to the postfiltering of a speech signal that has been digitally processed by a frequency domain coder/decoder, for example an SBC or MBE coder/decoder.
  • Figure 1 shows a digital radiotelephone 1 having an antenna 2 for transmitting signals to and receiving signals from a base station (not shown). During reception of a call the antenna 2 supplies an encoded digital radio signal, which represents an audio signal transmitted from a calling party, to the receiver 3, which converts the low power radio frequency signal into a low frequency signal that is then demodulated. The demodulated signal is then supplied to a decoder 4, which decodes the signal before passing it to the postfilter 5. The postfilter 5 modifies the signal, as described in detail below, before passing the modified signal to a digital to analogue converter 6. The analogue signal is then passed to a speaker 7 for conversion into an audio signal.
  • As stated above, after the signal has been decoded it is passed to the postfilter 5. Referring to Figure 2, on receipt of the signal by the postfilter, the signal is passed to a windowing function 8 which divides the signal into frames. The frame size determines how often the frequency response of the postfilter is updated; that is to say, a larger frame size will result in a longer time between recalculations of the postfilter frequency response than a shorter frame size. In this embodiment a frame size of 80 samples is used, windowed with a trapezoidal window function (i.e. a quadrilateral having only one pair of parallel sides). The 80 samples correspond to 10 ms at an 8 kHz sampling rate. The process uses an overlap of 18 samples to remove the effect of the shape of the window function from the time domain signal. Once the encoded speech has been windowed, the frame is padded with zeroes to give 128 data points. The speech signal frames are then supplied to a Fast Fourier Transform function 9, which converts the time domain signal into the frequency domain using a 128 point Fast Fourier Transform.
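  • By way of illustration, a minimal sketch of this framing and transform stage is given below (assuming NumPy; the exact trapezoidal window shape and the helper names are illustrative assumptions rather than details taken from the patent):

    import numpy as np

    FRAME_LEN = 80   # 10 ms frame at the 8 kHz sampling rate
    OVERLAP = 18     # samples overlapped with the neighbouring frame
    FFT_LEN = 128    # frame is zero padded to 128 points before the FFT

    def trapezoidal_window(length, ramp):
        # Flat-topped window with linear ramps of `ramp` samples at each end.
        w = np.ones(length)
        ramp_up = np.linspace(0.0, 1.0, ramp, endpoint=False)
        w[:ramp] = ramp_up
        w[-ramp:] = ramp_up[::-1]
        return w

    def frame_spectrum(frame):
        # Window one 80-sample decoded frame and take its 128-point FFT.
        padded = np.zeros(FFT_LEN)
        padded[:FRAME_LEN] = frame * trapezoidal_window(FRAME_LEN, OVERLAP)
        return np.fft.fft(padded)   # complex spectrum; magnitude and phase are used later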
  • The postfilter 5 has a Linear Prediction Coefficient (LPC) filter 10, which typically has the same characteristics as the synthesis filter in the decoder 4. An approximation of the speech signal is obtained by finding the impulse response of the LPC synthesis filter 10 using the transmitted LPC coefficients 19 and the pulse train 18. The impulse response of the LPC filter 10 is then supplied to a Fast Fourier Transform function 11, which converts the impulse response into the frequency domain using a 128 point Fast Fourier Transform in the same manner as described above. The frequency transform of the impulse response provides an approximation of the spectral envelope of the speech signal.
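  • A minimal sketch of how this spectral envelope approximation could be obtained from the LPC coefficients is shown below (assuming SciPy; the all-pole form 1/A(z) with A(z) = 1 - a1·z^-1 - ... - ap·z^-p and the function name are assumptions, not the patent's exact implementation):

    import numpy as np
    from scipy.signal import lfilter

    def lpc_envelope(lpc_coeffs, fft_len=128):
        # Impulse response of the LPC synthesis filter 1/A(z), then its
        # magnitude spectrum as an approximation of the spectral envelope.
        a = np.concatenate(([1.0], -np.asarray(lpc_coeffs, dtype=float)))
        impulse = np.zeros(fft_len)
        impulse[0] = 1.0
        h = lfilter([1.0], a, impulse)           # truncated impulse response
        return np.abs(np.fft.fft(h, fft_len))    # 128-point magnitude spectrum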
  • The above describes how a time domain signal is converted into the frequency domain. This is relevant for time domain coders such as CELP and RELP. Frequency domain coders, however, need no such conversion.
  • The approximation of the spectral envelope of the speech signal is passed to a spectral envelope modifying function 13 and a formant identifying function 12. The formant identifying function 12 uses the FFT output to identify the turning points of the spectral envelope by finding the first derivative on a bin-by-bin basis, i.e. for each output point of the FFT function 11. This provides the positions of the maximums and minimums of the spectral envelope, which correspond to the formants and spectral valleys respectively.
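  • A minimal sketch of such a turning-point search is given below (a simple sign change of the first difference of the envelope; the function and variable names are illustrative):

    import numpy as np

    def find_turning_points(envelope):
        # Return the bin indices of formants (local maximums) and
        # spectral valleys (local minimums) of the envelope.
        d = np.diff(envelope)                   # first derivative, bin by bin
        formants, valleys = [], []
        for k in range(1, len(d)):
            if d[k - 1] > 0 and d[k] <= 0:      # rising then falling: spectral peak
                formants.append(k)
            elif d[k - 1] < 0 and d[k] >= 0:    # falling then rising: spectral valley
                valleys.append(k)
        return formants, valleys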
  • The formant identifying function 12 passes the positions of the identified formants to the spectral envelope modifying function 13. The modifying function 13 calculates the postfilter frequency response by normalising each point of the spectral envelope with respect to the magnitude of its nearest formant. If more than one formant has been identified, each point of the spectral envelope can be normalised with reference to any one of the formants; preferably, however, each point is normalised with respect to its nearest formant.
  • A preferred normalisation equation is shown in Equation 1: Rpost(k) = (R(k)/Rform(k))^β where 0 ≤ k < 64    (Equation 1)
  • As the FFT output is symmetrical, the upper value of k is typically chosen to be half the Fast Fourier Transform length. Therefore, in this embodiment the upper limit of k is 64.
  • R(k) is a point on the spectral envelope, Rform(k) is the magnitude of the nearest formant, and k is a point in frequency.
  • For kmax < k ≤ kmin, β is given by Equation 2: β = ((k - kmax)/(kmin - kmax))·γ    (Equation 2). For kmin < k ≤ kmax, β is given by Equation 3: β = ((kmax - k)/(kmax - kmin))·γ    (Equation 3), where k is a point in frequency, kmin is the frequency of a spectral valley and kmax is the frequency of a formant.
  • γ controls the degree of postfiltering (i.e. controls the depth of the postfilter valleys) and is preferably chosen to lie between 0.7 and 1.0. Equations 2 and 3 ensure that there is a gradual de-emphasis of the spectral valleys such that maximum attenuation occurs at the bottom of the valley.
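  • A minimal sketch of this normalisation (Equations 1 to 3) is given below, assuming the formant and valley bins have already been identified; pairing each bin with its nearest formant and nearest valley, and the default value of γ, are illustrative choices:

    import numpy as np

    def postfilter_response(envelope, formants, valleys, gamma=0.8):
        # Rpost(k) = (R(k) / Rform(k)) ** beta, with beta ramping from 0 at the
        # nearest formant up to gamma at the nearest spectral valley.
        r_post = np.ones(len(envelope))
        for k in range(len(envelope)):
            k_max = min(formants, key=lambda f: abs(f - k))   # nearest formant bin
            k_min = min(valleys, key=lambda v: abs(v - k))    # nearest valley bin
            if k_min == k_max:
                continue
            beta = gamma * abs(k - k_max) / abs(k_min - k_max)
            beta = min(beta, gamma)          # clamp outside the formant-to-valley ramp
            r_post[k] = (envelope[k] / envelope[k_max]) ** beta
        return r_post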
  • Figure 3b shows a representation of the postfilter frequency response according to Equation 1, while Figure 3a shows the corresponding spectral envelope of the received signal. As point A is a maximum (i.e. a formant), it is normalised to one at point D on the postfilter frequency response. The sample positions between points A and B are correspondingly normalised with reference to point A, and the sample positions between points B and C are normalised with reference to point C. Point B can be normalised with reference to either point A or point C.
  • To increase the brightness of the speech, the modified spectrum can be passed to a high pass filter (not shown) which adds a slight high frequency tilt to the speech. In the frequency domain this is given by Equation 4: 1 - µ·cos(πk/64) + µ²    (Equation 4)
  • Once the postfilter frequency response has been calculated it is passed to a multiplier 14, which multiplies the modified spectrum with the original noisy speech spectrum to give the postfiltered speech magnitude spectrum, as shown in Equation 5: Spost(k) = S(k)·Rpost(k)·(1 - µ·cos(πk/64) + µ²)    (Equation 5)
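  • A minimal sketch of this multiplication stage (Equation 5) is given below, taking the tilt term of Equation 4 as reconstructed above; the value of µ and the function name are illustrative assumptions:

    import numpy as np

    def apply_postfilter(noisy_spectrum, r_post, mu=0.3, half_len=64):
        # Spost(k) = S(k) * Rpost(k) * tilt(k) over the lower half of the spectrum.
        k = np.arange(half_len)
        tilt = 1.0 - mu * np.cos(np.pi * k / half_len) + mu ** 2   # Equation 4 (assumed reading)
        return np.abs(noisy_spectrum[:half_len]) * r_post[:half_len] * tilt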
  • Power normalisation can also be carried out in the frequency domain, to scale the postfiltered speech such that it has roughly the same power as the unfiltered noisy speech. One technique used to normalise the output signal power is for a power normalisation function 15 to estimate the power of the unfiltered and filtered speech separately, using inputs from the noisy speech spectrum and the postfiltered spectrum, and then to determine an appropriate scaling factor based on the ratio of the two estimated power values. One example of a possible gain factor g is given by
     [equation image: Figure 00100001]
  • Therefore, the normalised postfilter speech spectrum Snp is given by Snp(k) = g·Spost(k).
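  • The exact expression for g is not reproduced here; a minimal sketch consistent with the description above, using the square root of the ratio of the two estimated powers as a purely illustrative choice of gain, might look as follows:

    import numpy as np

    def power_normalise(noisy_spectrum, post_spectrum, eps=1e-12):
        # Scale the postfiltered spectrum so its power roughly matches the
        # unfiltered noisy speech (illustrative gain, not the patent's exact g).
        p_noisy = np.sum(np.abs(noisy_spectrum) ** 2)
        p_post = np.sum(np.abs(post_spectrum) ** 2) + eps
        g = np.sqrt(p_noisy / p_post)    # based on the ratio of the estimated powers
        return g * post_spectrum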
  • The postfiltered spectrum is passed to an inverse Fast Fourier Transform function 16, which performs an inverse FFT on the spectrum in order to bring the signal back into the time domain. The phase components for the inverse FFT are those of the original speech spectrum. Finally, the overlap and add function 17 is used to remove the effect of the window function.
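  • A minimal sketch of this reconstruction step is given below (re-attaching the original phase, mirroring the lower half of the magnitude spectrum so that the inverse FFT yields a real signal, then returning the frame for overlap-add; the mirroring details and buffer handling are illustrative assumptions):

    import numpy as np

    def to_time_domain(post_magnitude, noisy_spectrum, frame_len=80):
        # Combine the postfiltered magnitude (lower 64 bins) with the original
        # phase, rebuild a conjugate-symmetric 128-point spectrum and invert it.
        fft_len = len(noisy_spectrum)     # 128 in this embodiment
        half = len(post_magnitude)        # 64 bins, 0 <= k < 64
        mag = np.zeros(fft_len)
        mag[:half] = post_magnitude
        mag[fft_len - half + 1:] = post_magnitude[1:][::-1]   # mirror bins 1..63 (Nyquist bin left at zero)
        spectrum = mag * np.exp(1j * np.angle(noisy_spectrum))
        frame = np.real(np.fft.ifft(spectrum))[:frame_len]
        return frame   # the caller overlap-adds the 18-sample ramps of successive frames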
  • The present invention may include any novel feature or combination of features disclosed herein either explicitly or implicitly or any generalisation thereof irrespective of whether or not it relates to the presently claimed invention or mitigates any or all of the problems addressed. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention. For example, it will be appreciated that the postfilter may also include a long term postfilter in series with the short term postfilter.

Claims (16)

  1. A method for calculating a postfilter frequency response for filtering digitally processed speech, the method comprising identifying at least one formant of a speech spectrum of the digitally processed speech; and normalising points of the speech spectrum with respect to the magnitude of an identified formant.
  2. A method according to claim 1, wherein the points of the speech spectrum are normalised with respect to the magnitude of the nearest formant.
  3. A method according to claim 1 or 2, wherein the points of the speech spectrum are normalised according to a function of the form Rpost(k) = (R(k)/Rform(k))^β, where R(k) is the amplitude of the spectrum at a frequency k, Rform(k) is the amplitude of the spectrum at a frequency k which corresponds to an identified formant frequency, and β controls the degree of postfiltering.
  4. A method according to claim 3, wherein β = ((k - kmax)/(kmin - kmax))·γ for kmax < k ≤ kmin and β = ((kmax - k)/(kmax - kmin))·γ for kmin < k ≤ kmax,
    where k is a point in frequency, kmin is the frequency of a spectral valley, kmax is the frequency of a formant and γ controls the degree of postfiltering.
  5. A method according to any one of the preceding claims wherein the at least one formant is identified by finding a first derivative of the speech spectrum.
  6. A postfiltering method for enhancing a digitally processed speech signal, the method comprising obtaining a speech spectrum of the digitally processed signal; identifying at least one formant of the speech spectrum; normalising points of the speech spectrum with respect to the magnitude of an identified formant to produce a postfilter frequency response; and filtering the speech spectrum of the digitally processed signal with the postfilter frequency response.
  7. A method according to claim 6 wherein the points of the speech spectrum are normalised with respect to the magnitude of the nearest formant.
  8. A method according to claim 6 or 7, wherein the points of the speech spectrum are normalised according to a function of the form Rpost(k) = (R(k)/Rform(k))^β, where R(k) is the amplitude of the spectrum at a frequency k, Rform(k) is the amplitude of the spectrum at a frequency k which corresponds to an identified formant frequency, and β controls the degree of postfiltering.
  9. A method according to claim 8, wherein β = ((k - kmax)/(kmin - kmax))·γ for kmax < k ≤ kmin and β = ((kmax - k)/(kmax - kmin))·γ for kmin < k ≤ kmax,
    where k is a point in frequency, kmin is the frequency of a spectral valley, kmax is the frequency of a formant and γ controls the degree of postfiltering.
  10. A method according to any one of the claims 6 to 9 wherein at least one formant is identified by finding a first derivative of the speech spectrum.
  11. A postfilter comprising identifying means for identifying at least one formant of a digitally processed speech spectrum; normalising means for normalising points of the speech spectrum with respect to the magnitude of an identified formant to produce a postfilter frequency response; and means for filtering the digitally processed speech spectrum with the postfilter frequency response.
  12. A postfilter according to claim 11, wherein the normalising means normalises points of the speech spectrum with respect to the magnitude of the nearest formant.
  13. A postfilter according to claim 11 or 12, wherein the normalising means normalises points of the speech spectrum according to a function of the form Rpost(k) = (R(k)/Rform(k))^β, where R(k) is the amplitude of the spectrum at a frequency k, Rform(k) is the amplitude of the spectrum at a frequency k which corresponds to an identified formant frequency, and β controls the degree of postfiltering.
  14. A postfilter according to claim 13, wherein β = ((k - kmax)/(kmin - kmax))·γ for kmax < k ≤ kmin and β = ((kmax - k)/(kmax - kmin))·γ for kmin < k ≤ kmax,
    where k is a point in frequency, kmin is the frequency of a spectral valley, kmax is the frequency of a formant and γ controls the degree of postfiltering.
  15. A postfilter according to any one of claims 11 to 14 wherein the identifying means identifies at least one formant by finding a first derivative of the speech spectrum.
  16. A radiotelephone comprising a postfilter, the postfilter having identifying means for identifying at least one formant of a digitally processed speech spectrum; normalising means for normalising points of the speech spectrum with respect to the magnitude of an identified formant to produce a postfilter frequency response; and means for filtering the digitally processed speech spectrum with the postfilter frequency response.
EP99307954A 1998-10-13 1999-10-08 Post filter Withdrawn EP0994463A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB9822347A GB2342829B (en) 1998-10-13 1998-10-13 Postfilter
GB9822347 1998-10-13

Publications (1)

Publication Number Publication Date
EP0994463A2 2000-04-19

Family

ID=10840505

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99307954A Withdrawn EP0994463A2 (en) 1998-10-13 1999-10-08 Post filter

Country Status (4)

Country Link
US (1) US6629068B1 (en)
EP (1) EP0994463A2 (en)
JP (1) JP2000122695A (en)
GB (1) GB2342829B (en)


Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI116643B (en) * 1999-11-15 2006-01-13 Nokia Corp Noise reduction
US7512535B2 (en) * 2001-10-03 2009-03-31 Broadcom Corporation Adaptive postfiltering methods and systems for decoding speech
KR100434723B1 (en) * 2001-12-24 2004-06-07 주식회사 케이티 Sporadic noise cancellation apparatus and method utilizing a speech characteristics
US7707034B2 (en) * 2005-05-31 2010-04-27 Microsoft Corporation Audio codec post-filter
JP4738213B2 (en) * 2006-03-09 2011-08-03 富士通株式会社 Gain adjusting method and gain adjusting apparatus
DE102006022346B4 (en) * 2006-05-12 2008-02-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Information signal coding
CN105324982B (en) * 2013-05-06 2018-10-12 波音频有限公司 Method and apparatus for inhibiting unwanted audio signal
US9620134B2 (en) 2013-10-10 2017-04-11 Qualcomm Incorporated Gain shape estimation for improved tracking of high-band temporal characteristics
US10614816B2 (en) 2013-10-11 2020-04-07 Qualcomm Incorporated Systems and methods of communicating redundant frame information
US10083708B2 (en) 2013-10-11 2018-09-25 Qualcomm Incorporated Estimation of mixing factors to generate high-band excitation signal
US9384746B2 (en) 2013-10-14 2016-07-05 Qualcomm Incorporated Systems and methods of energy-scaled signal processing
US10163447B2 (en) 2013-12-16 2018-12-25 Qualcomm Incorporated High-band signal modeling
EP3107097B1 (en) * 2015-06-17 2017-11-15 Nxp B.V. Improved speech intelligilibility

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1986003873A1 (en) * 1984-12-20 1986-07-03 Gte Laboratories Incorporated Method and apparatus for encoding speech
US4827516A (en) * 1985-10-16 1989-05-02 Toppan Printing Co., Ltd. Method of analyzing input speech and speech analysis apparatus therefor
US4969192A (en) 1987-04-06 1990-11-06 Voicecraft, Inc. Vector adaptive predictive coder for speech and audio
DE69428119T2 (en) * 1993-07-07 2002-03-21 Picturetel Corp REDUCING BACKGROUND NOISE FOR LANGUAGE ENHANCEMENT
US5784532A (en) * 1994-02-16 1998-07-21 Qualcomm Incorporated Application specific integrated circuit (ASIC) for performing rapid speech compression in a mobile telephone system
JP3321971B2 (en) * 1994-03-10 2002-09-09 ソニー株式会社 Audio signal processing method
US5706395A (en) * 1995-04-19 1998-01-06 Texas Instruments Incorporated Adaptive weiner filtering using a dynamic suppression factor
US5774837A (en) * 1995-09-13 1998-06-30 Voxware, Inc. Speech coding system and method using voicing probability determination
US5673361A (en) * 1995-11-13 1997-09-30 Advanced Micro Devices, Inc. System and method for performing predictive scaling in computing LPC speech coding coefficients
SE9700772D0 (en) * 1997-03-03 1997-03-03 Ericsson Telefon Ab L M A high resolution post processing method for a speech decoder
US6098036A (en) * 1998-07-13 2000-08-01 Lockheed Martin Corp. Speech coding system and method including spectral formant enhancer

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2375028A (en) * 2001-04-24 2002-10-30 Motorola Inc Processing speech signals
GB2375028B (en) * 2001-04-24 2003-05-28 Motorola Inc Processing speech signals
EP1557827A1 (en) * 2002-10-31 2005-07-27 Fujitsu Limited Voice intensifier
EP1557827A4 (en) * 2002-10-31 2008-05-14 Fujitsu Ltd Voice intensifier
US10141001B2 (en) 2013-01-29 2018-11-27 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for adaptive formant sharpening in linear prediction coding

Also Published As

Publication number Publication date
US6629068B1 (en) 2003-09-30
JP2000122695A (en) 2000-04-28
GB2342829A (en) 2000-04-19
GB2342829B (en) 2003-03-26
GB9822347D0 (en) 1998-12-09


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: NOKIA CORPORATION

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20020113

R18W Application withdrawn (corrected)

Effective date: 20030113