WO2000075919A1 - Methods and apparatus for generating comfort noise using parametric noise model statistics - Google Patents

Info

Publication number
WO2000075919A1
Authority
WO
WIPO (PCT)
Prior art keywords
noise
statistic
modeling parameter
updates
comfort noise
Prior art date
Application number
PCT/US2000/013829
Other languages
French (fr)
Inventor
Phillip Marc Johnson
Leland Scott Bloebaum
Original Assignee
Ericsson, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ericsson, Inc. filed Critical Ericsson, Inc.
Priority to AU50320/00A priority Critical patent/AU5032000A/en
Priority to JP2001502113A priority patent/JP2003501925A/en
Priority to DE10084675T priority patent/DE10084675T1/en
Publication of WO2000075919A1 publication Critical patent/WO2000075919A1/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012: Comfort noise or silence coding

Definitions

  • the present invention relates to communications systems, and more particularly, to the generation of comfort noise in communications systems.
  • In digital wireless communications systems (e.g., systems including cellular phones, land mobile radios, satellite phones, air phones, etc.), it is sometimes necessary that a receiving radio generate low-volume audio noise. For example, during a digital wireless radio call, there can exist periods in which a receiving radio temporarily does not receive valid speech information from a transmitting radio. During such periods, it is desirable that the receiving radio generate audible noise so that the user of the receiving radio does not falsely believe that the call transmission has ceased. Such noise is referred to in the art and hereinafter as comfort noise.
  • comfort noise is particularly advantageous.
  • When a communications link becomes significantly degraded but is still operable, it is sometimes preferable that the speech path at the receiving radio be muted to prevent badly distorted speech from being passed through to the receiving radio user.
  • In such cases, the receiving radio can instead generate and play comfort noise. Doing so informs the receiving user that the receiver is still operational without subjecting him or her to the loud pops and artifacts which typically accompany corrupted speech.
  • Comfort noise is also quite useful in the context of Discontinuous Transmission, or DTX, communications systems.
  • In a DTX system, a transmitter detects whether an outgoing signal includes voice, and ceases or reduces the rate of transmission of the outgoing signal when it does not include voice.
  • It is therefore desirable that the receiver play some type of comfort noise so that the receiving user perceives that the communications path between the transmitter and the receiver is still open and operable.
  • Ideally, the comfort noise generated at the receiver should match, as closely as possible, the background noise existing at the transmitter.
  • the comfort noise generation process should be transparent to the receiving user.
  • the background noise existing at a transmitter can be sampled, and one or more parameters representing characteristics of the sampled noise can be periodically transmitted to a receiver for use in generating matching comfort noise.
  • conventional techniques for doing so still lead to perceptible differences between the artificial comfort noise and the naturally occurring background noise. Consequently, there is a need for improved methods and apparatus for generating comfort noise in communications systems.
  • the present invention fulfills the above-described and other needs by providing techniques wherein one or more higher order statistics relating to a parametric background noise model are used, in conjunction with the noise model itself, to realize high-quality, natural-sounding comfort noise.
  • whereas conventional systems generate comfort noise based solely on periodically estimated noise model parameters, embodiments of the invention supplement the noise model parameters with suitable statistics so that more accurate and better sounding comfort noise can be generated.
  • the noise model parameters can, according to the invention, be averaged, smoothed or otherwise filtered at the transmitting and/or receiving sides of a communications link in order to further enhance the sound quality of the resulting comfort noise.
  • mean values of a number of background noise spectrum magnitudes are periodically estimated at a DTX transmitter and then transmitted, along with a single standard deviation value also estimated at the DTX transmitter, to a DTX receiver.
  • the periodically received mean spectrum magnitudes are smoothed across DTX frames, and the resulting smoothed values are dithered using the received standard deviation.
  • the dithered mean values are then used to generate comfort noise at the DTX receiver.
  • the exemplary embodiment prevents spectral randomness at the transmitter from introducing sharp spectral deviations at the receiver. Additionally, smoothing the received mean values across frames at the receiver reduces sharp, often perceptible spectral transitions which can result from the fact that comfort noise updates occur relatively infrequently. Moreover, using the estimated standard deviation to dither the smoothed mean values slightly changes the comfort noise characteristics on a frame by frame basis, resulting in a more random spectrum and therefore a more natural-sounding comfort noise.
  • An exemplary radio transmitter includes an encoder configured to sample an input noise signal and to provide as output a parametric model of the sampled noise signal, the parametric model including at least one modeling parameter representative of the sampled noise signal.
  • the encoder also provides as output a statistic relating to the at least one modeling parameter, an order of the statistic being greater than an order of each modeling parameter.
  • the encoder can, for example, be a multi-band excitation coder, a homomorphic coder, or a sinusoidal transform coder.
  • the parametric model can include a number of estimated mean spectral magnitudes, and the statistic can be an estimated standard deviation for the estimated mean spectral magnitudes. To enhance signal reconstruction, the encoder can periodically update and filter the at least one modeling parameter and the statistic.
  • An exemplary radio receiver includes a comfort noise generator configured to receive at least one noise modeling parameter representative of a noise signal and a statistic relating to the at least one noise modeling parameter. An order of the statistic is greater than an order of each noise modeling parameter, and the comfort noise generator decodes the at least one noise modeling parameter and the statistic to provide comfort noise to a user of the radio receiver.
  • Each noise modeling parameter can, for example, be an estimated mean spectral magnitude, and the statistic can be an estimated standard deviation for the at least one estimated mean spectral magnitude.
  • the comfort noise generator can periodically receive and filter updates of the at least one noise modeling parameter and the statistic. Further, the comfort noise generator can process the filtered updates of the at least one noise modeling parameter in accordance with the statistic to provide the comfort noise. For example, the comfort noise generator can use an estimated standard deviation to dither filtered updates of received mean spectral magnitudes.
  • Figure 1 is a block diagram of an exemplary DTX transmitter in which background noise modeling techniques of the invention can be implemented.
  • Figure 2 is a block diagram of an exemplary DTX receiver in which comfort noise generation techniques according to the invention can be implemented.
  • Figure 3 depicts an exemplary speech signal and corresponding timing of exemplary DTX frames for a DTX communications system in which the techniques of the invention can be implemented.
  • Figure 4 is a flow diagram depicting steps in an exemplary method of comfort noise generation according to the invention.
  • Figure 5 is a block diagram of an exemplary comfort noise frame generator according to the invention.
  • Figure 6 is a time plot of a number of spectral magnitudes representing typical background noise at a DTX transmitter.
  • Figure 7 is a time plot of a number of spectral magnitudes representing comfort noise produced at a DTX receiver, the spectral magnitudes having been generated based on the spectral magnitudes of Figure 6 using prior art techniques.
  • Figure 8 is a time plot of a number of estimated mean spectral magnitudes representing background noise at a DTX transmitter, the estimated mean spectral magnitudes having been derived by filtering or smoothing the spectral magnitudes of Figure 6 in accordance with the invention.
  • Figure 9 is a time plot of a number of spectral magnitudes representing background noise at a DTX transmitter, the spectral magnitudes having been derived by receiving the spectral magnitudes of Figure 8 at a DTX receiver and thereafter filtering the received spectral magnitudes according to the invention.
  • Figure 10 is a time plot of a number of spectral magnitudes representing comfort noise generated at a DTX receiver, the spectral magnitudes having been derived by randomizing or dithering the spectral magnitudes of Figure 9 according to the invention.
  • Figure 11 is a time plot of a number of spectral magnitudes representing enhanced comfort noise generated at a DTX receiver, the spectral magnitudes having been derived by filtering or smoothing the spectral magnitudes of Figure 10 according to the invention.
  • DTX Discontinuous Transmission
  • PDC Pacific Digital Cellular
  • D-AMPS Digital Advanced Mobile Phone System
  • IS641A Interim Standard 641-A (TIA/EIA enhanced full-rate speech codec standard for D-AMPS)
  • GSM Global System for Mobile Communications
  • ACeS Asian Cellular Satellite
  • MBE™ speech coding algorithm originally developed at the Massachusetts Institute of Technology.
  • the MBE algorithm and, more recently, the well known successor algorithms IMBE™ and AMBE™ are very popular in digital communications systems requiring a low bit rate (i.e., under 4.8 kbps).
  • some form of MBE is used in each of the well known Iridium, INMARSAT M, INMARSAT Mini-M, ICO (INMARSAT-P), Optus, and ACeS systems.
  • MBE-based algorithms have also been used in Land Mobile Radio (e.g. APCO-25) and air phone applications.
  • the disclosed parametric and statistical signal modeling techniques are readily applied, not only to the frequency-domain MBE speech coding algorithm, but to any signal coding algorithm.
  • the disclosed techniques are directly applicable to other frequency-domain algorithms (such as those used in homomorphic vocoders and sinusoidal transform coders) and to time-domain algorithms (such as the well known Code Excited Linear Prediction, or CELP, algorithm and the also well known Vector Sum Excited Linear Prediction, or VSELP, algorithm).
  • Figures 1 and 2 depict, respectively, a DTX transmitter 100 and a compatible DTX receiver 200.
  • the exemplary DTX transmitter 100 includes a voice activity detector (VAD) 110, a speech encoder 120, a silence description (SID) encoder 130, a channel encoder 140, and first and second transmit switches 150, 155.
  • VAD voice activity detector
  • SID silence description
  • the exemplary DTX receiver 200 includes a channel decoder 210, a frame validation processor 220, a speech frame buffer 230, a comfort noise frame buffer 240, a speech decoder 250 and a receive switch 260.
  • Those of skill in the art will appreciate that the below described functionality of the components of Figures 1 and 2 can be implemented using a variety of hardware configurations including, for example, a general purpose digital computer, standard digital signal processing components, and one or more application specific integrated circuits.
  • an audio frame (e.g., a collection of successive pulse code modulated samples of a user speech signal) is provided to the voice activity detector 110, the speech encoder 120, and the SID encoder 130 of the DTX transmitter 100.
  • the voice activity detector 110 analyzes the audio frame and determines whether the frame contains voice information. If so, then the first transmit switch 150 is set to couple an output of the speech encoder 120 to an input of the channel encoder 140, and the speech encoder 120 is instructed to encode a speech frame (using techniques described below) for input to the channel encoder 140. Otherwise, the first transmit switch 150 is set to couple an output of the SID encoder 130 to the input of the channel encoder 140, and the SID encoder 130 is instructed to encode a SID frame (using techniques also described below) for input to the channel encoder 140.
  • operation of the speech encoder 120 and the SID encoder 130 can be combined in a single encoding device.
  • the channel encoder 140 uses known channel coding techniques to prepare the frame for transmission across the communications channel (e.g., the air interface). During periods in which the speech signal includes voice, the second transmit switch 155 remains closed, and successive speech frames are encoded and transmitted. However, when the voice activity detector 110 determines that voice activity has just stopped, only a limited number (typically one or two) of SID frames are encoded and transmitted. Subsequently, SID update frames are periodically (e.g., every 250 ms to 1.0 sec) encoded and transmitted until the voice activity detector 110 indicates that voice has restarted. At that time, the speech encoder 120 resumes generating speech frames for transmission until the voice again ceases.
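The transmit-side behavior described above (speech frames while voice is present, one or two SID frames when voice stops, then periodic SID updates) can be sketched as a small scheduler. This is only an illustration: the names `TxFrame` and `schedule_frames`, the default of one initial SID frame, and the 25-frame update interval (500 ms at 20 ms per frame) are assumptions, and the voice-activity flags are assumed to already include any hangover period.

```python
from enum import Enum


class TxFrame(Enum):
    SPEECH = "speech"
    SID = "sid"
    NONE = "none"  # nothing is transmitted for this frame


def schedule_frames(vad_flags, initial_sid_frames=1, sid_update_interval=25):
    """Map per-frame voice-activity decisions to the frame type sent.

    Both defaults are assumptions: one initial SID frame, then a SID
    update every 25 frames (500 ms at 20 ms per frame).
    """
    if initial_sid_frames < 1 or sid_update_interval < 1:
        raise ValueError("counts must be positive")
    schedule = []
    silent_run = 0  # consecutive non-voice frames seen so far
    for voiced in vad_flags:
        if voiced:
            silent_run = 0
            schedule.append(TxFrame.SPEECH)
            continue
        since_initial = silent_run - initial_sid_frames + 1
        if silent_run < initial_sid_frames:
            schedule.append(TxFrame.SID)   # SID frame(s) right after voice stops
        elif since_initial % sid_update_interval == 0:
            schedule.append(TxFrame.SID)   # periodic SID update
        else:
            schedule.append(TxFrame.NONE)  # transmitter stays quiet
        silent_run += 1
    return schedule
```

Frames marked `TxFrame.NONE` correspond to the intervals in which the transmitter can be idle, which is where the power and interference savings of DTX come from.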
  • the channel decoder 210 receives and decodes incoming frames (i.e., the channel decoder 210 performs the inverse of the coding process implemented by the channel coder 140) and provides the decoded frames to the validation processor 220, the speech frame buffer 230, and the comfort noise frame buffer 240.
  • During DTX periods, most of the received frames are invalid and are therefore filled with random data created by RF interference and receiver noise.
  • Periodically, however, a valid SID update frame is transmitted during DTX, and valid speech frame transmissions can resume at any time.
  • the validation processor 220 analyzes each received frame for content. If the received frame is not valid, the receive switch 260 is set to couple the comfort noise frame buffer 240 to an input of the speech decoder 250, and the comfort noise frame buffer 240 is instructed to provide a noise frame to the speech decoder 250 for comfort noise generation. In the event the received frame is a valid SID update, the received frame is used to update the contents of the comfort noise frame buffer 240 before a noise frame is provided to the speech decoder 250 for comfort noise generation. Lastly, if the received frame represents a valid voice frame, the receive switch 260 is set to couple the speech frame buffer 230 to the speech decoder, and the received frame is sent to the speech decoder 250 to be synthesized for presentation to the receiver user.
  • Figure 3 is a timing diagram illustrating the above described DTX operation.
  • a speech signal includes first and second speech bursts 310, 320, separated by a period of silence.
  • During the first speech burst 310, valid speech frames 315 are continually transmitted.
  • During the period of silence, valid SID frames 330 are transmitted instead of speech frames.
  • During the second speech burst 320, valid speech frames 325 are again continually transmitted.
  • Such DTX operation provides significant advantages over conventional continuous transmission, and DTX is therefore a common feature in digital wireless systems of today. For example, DTX enables the transmitting radio to save power since it is not required to transmit as often.
  • Since the transmitter power amplifier typically consumes the majority of the transmitter power, significant power savings can be realized by turning the power amplifier off during DTX mode.
  • Additionally, DTX results in less RF energy being transmitted into the air interface spectrum. Consequently, the average RF interference seen by other users in a multiple-access system is reduced, and the Carrier-to-Interference (C/I) ratio seen by those users is commensurately enhanced. Increased C/I improves the performance of the radio terminals or, conversely, increases the capacity of the system (i.e., the number of users that can be supported in a given frequency allocation is increased).
  • In a DTX system, the speech signal is sampled and encoded (e.g., by the speech encoder 120), and thereafter the encoded values are decoded (e.g., by the speech decoder 250) to synthesize or reconstruct the speech signal.
  • the combination of an encoder and a decoder is often referred to in the art as a codec or a vocoder, and any of a number of known techniques can be implemented within a vocoder to accomplish the functions of speech coding and decoding.
  • Waveform vocoders attempt to quantize and encode the speech signal itself, while parametric vocoders assume a model for the speech signal, the model consisting of a number of parameters.
  • a parametric vocoder receives samples of the speech signal, groups the samples into frames, fits the frame of samples to the model, then quantizes and encodes the values for the model parameters. In this manner, parametric vocoders are able to produce a desired speech quality at lower information (i.e., bit) rates than are waveform vocoders.
  • a robust and popular parametric vocoder is the above noted MBE vocoder.
  • the MBE vocoder divides a sampled speech signal into 20-ms frames. For each voice frame, a set of MBE model parameters is calculated.
  • the model parameters (e.g., including a fundamental pitch frequency and a number of voicing decisions)
  • For frames containing no voice, the MBE model produces a set of spectral magnitudes which can be used to recreate the frames (e.g., to synthesize comfort noise at a DTX receiver).
  • the most recent SID update is used directly and repeatedly during DTX periods to generate comfort noise.
  • the latest SID frame (e.g., an MBE frame containing spectral magnitudes) is passed repeatedly to the speech decoder 250 for synthesis.
  • the DTX receiver forces the comfort noise characteristic at the receiver to match that of the background noise at the transmitter each time a SID update is received.
  • the comfort noise spectrum remains static between SID updates.
  • the invention provides methods and apparatus for capturing both the loudness and the liveliness of the transmitter background noise.
  • the invention provides techniques for capturing the perceptible characteristics of any signal of interest.
  • a parametric model of the signal (e.g., a set of MBE spectral magnitudes representing transmitter background noise)
  • one or more higher order statistics relating to the parametric model. For example, in the case of DTX transmissions, the MBE spectral magnitudes of a SID frame (which can be thought of as a crude estimate of the mean noise spectrum) can be supplemented with an estimate of the variance of the background noise spectrum.
  • the one or more higher order statistics (e.g., the variance estimate)
  • the model parameters (e.g., the spectral magnitudes)
  • model parameters can be smoothed, averaged, or otherwise filtered to further enhance the reconstructed signal. Such filtering can be performed when the model parameters are generated
  • the MBE spectral magnitudes in a DTX SID frame can be thought of as an estimate of the mean noise spectrum. According to the invention, however, a superior estimate of the mean spectrum can be obtained by filtering successive spectral magnitude frames. For example, at the beginning of each period of voice inactivity, a DTX voice activity detector (e.g., the detector 110 of Figure 1) typically waits for a period of time before declaring that voice is inactive. The waiting period (typically lasting approximately 4 to 6 frames) is known in the art as a hangover period and provides opportunity to average successive frames. In other words, a set of spectral means can be calculated by averaging MBE spectral magnitudes within the hangover period as: $\bar{M}(k) = \frac{1}{N}\sum_{i=1}^{N} M_i(k)$, for $k = 0, \ldots, P-1$, where
  • $M_i(k)$ represents an instantaneous spectral magnitude for vocoder frame number i
  • P is the number of spectral magnitudes in each frame
  • N is the number of frames in the hangover period.
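The hangover-period averaging can be sketched as follows, assuming each frame is simply a list of P spectral magnitudes (on whatever scale the vocoder uses). The function name `hangover_mean` is hypothetical.

```python
def hangover_mean(frames):
    """Average N frames of P spectral magnitudes, bin by bin.

    frames: list of N lists, each holding P magnitudes M_i(k).
    Returns a list of P per-bin means.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    p = len(frames[0])
    if any(len(frame) != p for frame in frames):
        raise ValueError("all frames must have the same number of magnitudes")
    return [sum(frame[k] for frame in frames) / n for k in range(p)]
```

With a hangover period of 4 to 6 frames, `frames` would hold 4 to 6 entries, and the result would be the set of spectral means sent in the first SID frame.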
  • the spectral means can then be transmitted as a SID frame update at the beginning of the period of voice inactivity.
  • the instantaneous spectral magnitudes $M_i$ are quantized on a logarithmic scale and all computations involving the instantaneous spectral magnitudes are performed using the resulting logarithmic values. Since quantization of spectral magnitudes is not critical to an understanding of the present invention, however, a detailed description of such quantization is omitted here for sake of brevity. For details regarding quantization of MBE model parameters, see the above cited International Publication No. WO 9412972.
  • the mean estimates are also refined during DTX periods so that each SID frame update accurately characterizes the prevailing transmitter background noise.
  • a running average of the mean magnitudes can be calculated as:
  • $\alpha$ is the filter averaging coefficient or memory.
  • the AR filter is applied to the spectral magnitudes so that a continuously updating estimate of the mean is obtained.
  • the AR process has the advantage of providing good filtering while requiring few memory resources.
  • the output of the AR filter weights the current frame more heavily than previous frames so that an excessive delay is not introduced.
  • all spectral magnitudes occurring between SID updates can be averaged as described above with respect to the initial hangover period. Doing so, however, is more computationally complex and requires significantly more memory than does the above described AR filtering approach. Moreover, such a continuous averaging tends to introduce a more noticeable delay as compared to the first order AR method.
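A first-order AR (recursive) running average of the kind described can be sketched as below. The exact form of the filter is not reproduced in the text, so the update rule `alpha * previous + (1 - alpha) * current`, the function name `ar_update`, and the default `alpha = 0.25` are assumptions; a value of `alpha` below 0.5 weights the current frame more heavily than the accumulated past, as the text requires.

```python
def ar_update(prev_mean, current, alpha=0.25):
    """One first-order AR step per spectral bin.

    prev_mean: previous mean estimates, one per bin.
    current:   instantaneous magnitudes for the current frame.
    alpha:     filter memory in [0, 1); assumed value, not from the source.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if len(prev_mean) != len(current):
        raise ValueError("frames must have the same number of magnitudes")
    return [alpha * m + (1.0 - alpha) * x for m, x in zip(prev_mean, current)]
```

Only the previous estimate is stored between frames, which is why this approach needs far less memory than averaging every frame between SID updates.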
  • the MBE spectral magnitudes are not only filtered to provide superior spectral mean estimates, the MBE spectral magnitudes are also supplemented with an estimate of the variance of the noise spectrum.
  • the variance quantifies the distribution of the instantaneous spectral magnitudes about the spectrum mean and thus provides an indication of the liveliness of the modeled noise.
  • the variance of a random variable x is computed as: $\sigma_x^2 = E\left[(x - \mu_x)^2\right]$, where $\mu_x = E[x]$ is the mean of x.
  • the standard deviation of x is then defined as the square root of the variance and, like the variance, provides information regarding the liveliness of x.
  • a single standard deviation parameter is calculated and used to characterize all of the spectral magnitudes within a SID frame.
  • the instantaneous standard deviation for a particular SID frame i can be estimated as:
  • $M_i(k)$ is an instantaneous spectral magnitude
  • $\bar{M}_i(k)$ is a filtered or mean spectral magnitude estimated as described above.
  • the instantaneous standard deviation estimate can be transmitted in a SID frame, along with the filtered MBE spectral magnitudes, and then used at the receiver to generate high-quality comfort noise (as is described hereinafter).
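A single per-frame standard deviation of this kind can be sketched as the root-mean-square deviation of the instantaneous magnitudes from their filtered means, taken across all P bins. The estimator in the source is not reproduced in the text, so the 1/P normalization and the name `instantaneous_std` are assumptions.

```python
import math


def instantaneous_std(magnitudes, means):
    """Single standard deviation value for one frame.

    magnitudes: instantaneous spectral magnitudes M_i(k), k = 0..P-1.
    means:      filtered (mean) spectral magnitudes for the same bins.
    Uses a 1/P normalization, which is an assumption.
    """
    p = len(magnitudes)
    if p == 0 or p != len(means):
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((m - mu) ** 2 for m, mu in zip(magnitudes, means))
    return math.sqrt(squared / p)
```

Because one value summarizes the whole frame, the SID frame grows by only a single parameter.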
  • successive instantaneous standard deviation estimates can be filtered or smoothed, and the filtered standard deviation estimates can be transmitted with the filtered spectral magnitudes.
  • the instantaneous standard deviation estimates can be smoothed using a first-order AR process as:
  • $\alpha_\sigma$ is a per-frame update coefficient or filter memory. Filtering the instantaneous standard deviation values reduces the effects of abnormal, or outlier, spectral magnitude samples.
  • the first standard deviation estimate can be set equal to the instantaneous standard deviation value.
  • the first estimate can be equated with a last filtered estimate from a previous DTX period.
  • a weighted combination of the previous estimate and the current instantaneous value can be used to provide the first estimate.
  • the update coefficient $\alpha_\sigma$ is not fixed and is instead adapted for each frame. This results from the fact that a fixed update coefficient can provide poor variance estimates in certain cases. For example, suppose the transmitter background noise is experiencing a volume increase across most or all frequencies of interest. In other words, suppose that the noise is non-stationary. Since the mean spectral magnitude estimates are derived by filtering the actual spectral magnitudes, changes in the actual spectral magnitudes show up in the estimated mean spectral magnitudes after some delay. For example, a volume increase in the actual spectral magnitudes typically will not appear in the mean spectral magnitudes until a few frames have passed. During this delay period, the difference between the actual spectral magnitudes and the estimated mean spectral magnitudes can be quite significant.
  • the update coefficient is dynamically adapted from frame to frame. To do so, a quality variable $q_i$ can be calculated for each frame i as: $q_i = 1 - \frac{1}{P}\left|\sum_{k=0}^{P-1} \operatorname{sign}\{M_i(k) - \bar{M}_i(k)\}\right|$
  • the above defined quality variable is indicative of the stationarity of the spectrum. Whenever there is a general volume change, all of the magnitude differences will tend to have the same sign, thus leading to a high sum and a low value for the variable $q_i$. However, when the spectrum is fairly stationary, there are generally as many magnitude differences in the positive direction as there are in the negative direction, thus leading to a low sum and a high value for $q_i$.
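The stationarity check and the smoothing of the standard deviation can be sketched together. The quality variable below follows the behavior described (same-sign differences give a low value, balanced signs give a high value), but its exact normalization is an assumption. The text does not state how the quality variable maps to the update coefficient, so `adaptive_coeff` is purely hypothetical: it holds the previous estimate when the spectrum looks non-stationary and falls back to a base memory when it looks stationary.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def quality(magnitudes, means):
    """Stationarity indicator in [0, 1]; 1 means balanced deviations."""
    p = len(magnitudes)
    if p == 0 or p != len(means):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(sign(m - mu) for m, mu in zip(magnitudes, means))
    return 1.0 - abs(total) / p


def adaptive_coeff(q, base=0.9):
    """Hypothetical mapping from quality to filter memory (not from the source).

    q = 0 (volume change) -> memory 1.0, so the old estimate is kept.
    q = 1 (stationary)    -> memory equals `base`.
    """
    return 1.0 - q * (1.0 - base)


def smooth_std(prev_std, inst_std, coeff):
    """First-order AR smoothing of the standard deviation estimate."""
    return coeff * prev_std + (1.0 - coeff) * inst_std
```

In use, each frame would compute `quality`, derive a coefficient, and then call `smooth_std` with the previous filtered estimate and the new instantaneous value.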
  • each SID frame includes a set of estimated mean spectral magnitudes and a single estimated standard deviation or variance.
  • the mean spectral magnitudes are processed, in accordance with the standard deviation value, to provide enhanced spectral magnitudes for input to a speech decoder (e.g., the decoder 250 of Figure 2).
  • the enhanced spectral magnitudes result in synthesized comfort noise which closely matches the background noise at the transmitter.
  • a random factor based on the standard deviation estimate is added to each ramped spectral magnitude.
  • the added random numbers are created using a pseudo-random number generator having a normally-distributed output.
  • the pseudo-random numbers are scaled according to the standard deviation estimate, and the randomized spectral magnitudes for a given frame are given by:
  • $\sigma$ is the standard deviation estimate computed at the transmitter and sent within a SID frame.
  • the standard deviation $\sigma$ could be fixed at the receiver so that standard deviation estimates would not have to be computed at the transmitter and sent to the receiver. However, doing so would also fix the amount of liveliness in the generated comfort noise and would not track the liveliness of the background noise existing at the transmitter. Such an embodiment would, however, perform better than current methods which include no random factor at all.
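The randomization step can be sketched as adding a normally distributed pseudo-random number, scaled by the standard deviation estimate, to each ramped magnitude. The additive form `ramped + sigma * n` with unit-variance `n` is an assumption consistent with the description, and the name `dither` is hypothetical.

```python
import random


def dither(ramped, sigma, rng=None):
    """Add zero-mean Gaussian noise with standard deviation sigma to each bin.

    ramped: ramped mean spectral magnitudes for the current frame.
    sigma:  standard deviation estimate received in the SID frame
            (or a fixed receiver-side constant).
    rng:    optional random.Random instance, useful for repeatable runs.
    """
    if sigma < 0.0:
        raise ValueError("sigma must be non-negative")
    if rng is None:
        rng = random.Random()
    return [m + sigma * rng.gauss(0.0, 1.0) for m in ramped]
```

Calling `dither` once per frame with the same `ramped` values yields a slightly different spectrum each time, which is the source of the added liveliness; with `sigma = 0` the output reduces to the static spectrum of a conventional receiver.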
  • the randomized spectral magnitudes can be sent to the speech decoder for generating quality comfort noise.
  • the character of the comfort noise can be further improved by filtering the randomized spectral magnitudes across frames.
  • the above described addition of random noise to the ramped spectral magnitudes assumes that the background noise process at the transmitter is independent or uncorrelated from frame to frame.
  • the randomness that dithers the spectral magnitudes about their means has some correlation between frames. This is the spectral equivalent of colored noise in the time-domain.
  • the present invention accounts for this phenomenon by smoothing the randomized spectral magnitudes from frame to frame as:
  • $M_i^{\text{final}}(k) = \beta\, M_i^{\text{randomized}}(k) + (1 - \beta)\, M_{i-1}^{\text{final}}(k)$.
  • Figure 4 is a flow chart 400 depicting steps in the above described comfort noise generation method.
  • the steps of Figure 4 can be implemented, for example, within the DTX receiver 200 of Figure 2.
  • a determination is made as to whether a valid MBE frame has been received. If the received frame is not valid, then a comfort noise frame (i.e., a frame of enhanced spectral magnitudes) is computed at step 420 (based in part on a previously received SID update), and the resulting comfort noise frame is synthesized at step 430. If the received frame is valid, then a determination is made at step 440 as to whether the received frame is a speech frame. If so, then the speech frame is synthesized at step 430. Otherwise, the received frame is presumed to be a valid SID update and is stored as such at step 450. Additionally, the SID update is synthesized at step 430.
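The decision flow of Figure 4 can be sketched as a small dispatch function. The names `FrameKind` and `handle_frame`, the dictionary used to hold the stored SID update, and the `make_comfort_noise` callback are all hypothetical; frame validation itself is assumed to have been done upstream.

```python
from enum import Enum


class FrameKind(Enum):
    INVALID = 0
    SPEECH = 1
    SID = 2


def handle_frame(kind, payload, state, make_comfort_noise):
    """Return the frame that should be passed to the synthesizer (step 430).

    state: dict holding the most recent SID update under the key "sid".
    make_comfort_noise: callable that builds a comfort noise frame from the
        stored SID update (step 420).
    """
    if kind is FrameKind.INVALID:
        return make_comfort_noise(state["sid"])  # step 420, then 430
    if kind is FrameKind.SPEECH:
        return payload                           # step 440 -> 430
    state["sid"] = payload                       # step 450: store the update
    return payload                               # the SID update is synthesized
```

A caller would typically invoke `handle_frame` once per 20-ms frame and hand the result to the speech decoder.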
  • Figure 5 depicts an exemplary comfort noise frame generator 500 according to the invention.
  • the exemplary generator can be used, for example, to implement the comfort noise frame generation step 420 of Figure 4.
  • the exemplary generator 500 includes an old comfort noise frame buffer 510, a new comfort noise frame buffer 520, a pseudo-random number generator 530, a delay buffer 540, first through fifth multipliers 550, 552, 554, 556, 558, and first and second summing devices 560, 562.
  • Those of skill in the art will appreciate that the below described functionality of the components of Figure 5 can be implemented using a variety of hardware configurations including, for example, a general purpose digital computer, standard digital signal processing components, and one or more application specific integrated circuits (ASICs).
  • ASICs application specific integrated circuits
  • outputs of the old comfort noise frame buffer 510, the new comfort noise frame buffer 520 and the pseudo-random number generator 530 are weighted, respectively, via the first, second and third multipliers 550, 552, 554, and the weighted output frames are summed via the first summing device 560.
  • Frames output by the first summing device are thus ramped and randomized as described above.
  • the ramped and randomized frames are then filtered via the fourth and fifth multipliers 556, 558, the second summing device 562, and the delay buffer 540 to provide the enhanced comfort noise frames.
  • the enhanced comfort noise frames (each including a set of enhanced spectral magnitudes) can be input to the speech decoder 250 for synthesis.
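The signal path of Figure 5 can be sketched as a small class: the old and new frame buffers are blended (ramping), a scaled pseudo-random term is added (randomizing), and the result is passed through a one-tap recursive filter built from a delay buffer. The ramp details are not given in the text, so the linear ramp over `ramp_frames` frames, the filter weight `beta`, and all names here are assumptions.

```python
import random


class ComfortNoiseFrameGenerator:
    """Illustrative model of generator 500; parameter values are assumed."""

    def __init__(self, num_bins, ramp_frames=5, beta=0.5, seed=None):
        if num_bins < 1 or ramp_frames < 1:
            raise ValueError("num_bins and ramp_frames must be positive")
        self.old = [0.0] * num_bins   # old comfort noise frame buffer 510
        self.new = [0.0] * num_bins   # new comfort noise frame buffer 520
        self.sigma = 0.0              # scale applied to the PRNG output 530
        self.ramp_frames = ramp_frames
        self.step = ramp_frames       # ramp already complete at start
        self.beta = beta
        self.delayed = None           # delay buffer 540
        self.rng = random.Random(seed)

    def _ramped(self):
        # Weights of multipliers 550 and 552: linear cross-fade old -> new.
        w = min(self.step / self.ramp_frames, 1.0)
        return [(1.0 - w) * o + w * n for o, n in zip(self.old, self.new)]

    def update(self, means, sigma):
        """Load a newly received SID update and restart the ramp."""
        if len(means) != len(self.new):
            raise ValueError("unexpected number of spectral magnitudes")
        self.old = self._ramped()     # start from the current ramp position
        self.new = list(means)
        self.sigma = sigma
        self.step = 0

    def next_frame(self):
        """Produce one enhanced comfort noise frame."""
        self.step = min(self.step + 1, self.ramp_frames)
        randomized = [m + self.sigma * self.rng.gauss(0.0, 1.0)
                      for m in self._ramped()]          # multiplier 554, sum 560
        if self.delayed is None:
            final = randomized
        else:                                           # multipliers 556, 558, sum 562
            final = [self.beta * r + (1.0 - self.beta) * d
                     for r, d in zip(randomized, self.delayed)]
        self.delayed = final
        return final
```

For example, after `gen.update(means, sigma)` on each valid SID update, calling `gen.next_frame()` once per invalid frame would yield the sets of enhanced spectral magnitudes to hand to the speech decoder.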
  • Figures 6-11 demonstrate the advantages of the present invention as compared to prior art comfort noise generation techniques. Specifically, Figure 6 depicts an exemplary time sequence (i.e. , successive frames) of spectral magnitudes associated with typical background noise at a DTX transmitter. Figure 7 then depicts a time sequence of comfort noise frames generated by using conventional techniques to process the spectral magnitudes of Figure 6, and Figures 8-11 depict time sequences of frames generated using the above described embodiment of the invention to process the same spectral magnitudes.
  • Figure 8 depicts smoothing of the spectral magnitudes of Figure 6 (e.g., at a DTX transmitter), and Figure 9 depicts ramping of the smoothed spectral magnitudes of Figure 8 (e.g. , upon receipt at a DTX receiver).
  • Figure 10 then depicts randomization of the ramped spectral magnitudes of Figure 9, and
  • Figure 11 depicts final filtering or enhancement of the randomized spectral magnitudes of Figure 10.
  • the spectral characteristic of Figure 11 is clearly closer to that of Figure 6 as compared to that of Figure 7.
  • the present invention provides improved methods and apparatus for characterizing a noise or other signal and for thereafter using the characterization to reconstruct the signal.
  • a parametric model of the signal is supplemented with at least one higher order statistic relating to the parameters of the model.
  • transmitter background noise is characterized by successive frames of estimated mean spectral magnitudes, each frame being accompanied by a single estimated standard deviation value.
  • the estimated standard deviation value is used to randomize the estimated mean spectral magnitudes and to thereby improve the sound quality of the reconstructed noise.
  • the quality of the reconstructed noise is further enhanced by averaging, smoothing or otherwise filtering the spectral magnitudes prior to transmission and/or upon receipt.
  • the spectral characteristic of the reconstructed noise very closely resembles that of the original noise.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Noise Elimination (AREA)
  • Transmitters (AREA)
  • Circuits Of Receivers In General (AREA)

Abstract

In methods and apparatus for characterizing a noise or information signal and for thereafter using the characterization to reconstruct the signal, a parametric model of the signal is supplemented with at least one higher order statistic relating to the parameters of the model. For example, in the context of DTX communications, transmitter background noise is characterized by successive frames of estimated mean spectral magnitudes, each frame being accompanied by an estimated standard deviation for the spectral magnitudes. Upon reconstruction, the estimated standard deviation is used to randomize the spectral magnitudes and to thereby improve the sound quality of the reconstructed noise. The quality of the reconstructed noise is further enhanced by averaging, smoothing or otherwise filtering the mean spectral magnitudes prior to transmission and/or upon receipt.

Description

METHODS AND APPARATUS FOR
GENERATING COMFORT NOISE USING
PARAMETRIC NOISE MODEL STATISTICS
Field of the Invention
The present invention relates to communications systems, and more particularly, to the generation of comfort noise in communications systems.
Background of the Invention
In digital wireless communications systems (e.g., systems including cellular phones, land mobile radios, satellite phones, air phones, etc.), it is sometimes necessary that a receiving radio generate low-volume audio noise. For example, during a digital wireless radio call, there can exist periods in which a receiving radio temporarily does not receive valid speech information from a transmitting radio. During such periods, it is desirable that the receiving radio generate audible noise so that the user of the receiving radio does not falsely believe that the call transmission has ceased. Such noise is referred to in the art and hereinafter as comfort noise.
There are at least two primary contexts in which the generation of comfort noise is particularly advantageous. First, when a communications link becomes significantly degraded but is still operable, it is sometimes preferable that the speech path at the receiving radio be muted to prevent badly distorted speech from being passed through to the receiving radio user. However, since a complete muting of the receiver loudspeaker can lead the receiving user to incorrectly conclude that the link is completely inoperable with no possibility of recovery, the receiving radio can instead generate and play comfort noise. Doing so informs the receiving user that the receiver is still operational without subjecting him or her to the loud pops and artifacts which typically accompany corrupted speech. Comfort noise is also quite useful in the context of Discontinuous Transmission, or DTX, communications systems. In DTX systems, a transmitter detects whether an outgoing signal includes voice, and ceases or reduces the rate of transmission of the outgoing signal when it does not include voice. During such DTX periods, it is desirable that the receiver play some type of comfort noise so that the receiving user perceives that the communications path between the transmitter and the receiver is still open and operable.
In either of the above described contexts, it is generally desirable that the comfort noise generated at the receiver match, as closely as possible, the background noise existing at the transmitter. In other words, the comfort noise generation process should be transparent to the receiving user. Toward this end, the background noise existing at a transmitter can be sampled, and one or more parameters representing characteristics of the sampled noise can be periodically transmitted to a receiver for use in generating matching comfort noise. However, conventional techniques for doing so still lead to perceptible differences between the artificial comfort noise and the naturally occurring background noise. Consequently, there is a need for improved methods and apparatus for generating comfort noise in communications systems.
Summary of the Invention
The present invention fulfills the above-described and other needs by providing techniques wherein one or more higher order statistics relating to a parametric background noise model are used, in conjunction with the noise model itself, to realize high-quality, natural-sounding comfort noise. Whereas conventional systems generate comfort noise based solely on periodically estimated noise model parameters, embodiments of the invention supplement the noise model parameters with suitable statistics so that more accurate and better sounding comfort noise can be generated. Moreover, the noise model parameters can, according to the invention, be averaged, smoothed or otherwise filtered at the transmitting and/or receiving sides of a communications link in order to further enhance the sound quality of the resulting comfort noise.
In an exemplary embodiment, mean values of a number of background noise spectrum magnitudes are periodically estimated at a DTX transmitter and then transmitted, along with a single standard deviation value also estimated at the DTX transmitter, to a DTX receiver. At the DTX receiver, the periodically received mean spectrum magnitudes are smoothed across DTX frames, and the resulting smoothed values are dithered using the received standard deviation. The dithered mean values are then used to generate comfort noise at the DTX receiver.
By transmitting mean spectrum magnitudes, rather than instantaneous spectrum magnitudes, the exemplary embodiment prevents spectral randomness at the transmitter from introducing sharp spectral deviations at the receiver. Additionally, smoothing the received mean values across frames at the receiver reduces sharp, often perceptible spectral transitions which can result from the fact that comfort noise updates occur relatively infrequently. Moreover, using the estimated standard deviation to dither the smoothed mean values slightly changes the comfort noise characteristics on a frame by frame basis, resulting in a more random spectrum and therefore a more natural-sounding comfort noise.
An exemplary radio transmitter according to the invention includes an encoder configured to sample an input noise signal and to provide as output a parametric model of the sampled noise signal, the parametric model including at least one modeling parameter representative of the sampled noise signal. The encoder also provides as output a statistic relating to the at least one modeling parameter, an order of the statistic being greater than an order of each modeling parameter. The encoder can, for example, be a multi-band excitation coder, a homomorphic coder, or a sinusoidal transform coder. Additionally, the parametric model can include a number of estimated mean spectral magnitudes, and the statistic can be an estimated standard deviation for the estimated mean spectral magnitudes. To enhance signal reconstruction, the encoder can periodically update and filter the at least one modeling parameter and the statistic.
An exemplary radio receiver according to the invention includes a comfort noise generator configured to receive at least one noise modeling parameter representative of a noise signal and a statistic relating to the at least one noise modeling parameter. An order of the statistic is greater than an order of each noise modeling parameter, and the comfort noise generator decodes the at least one noise modeling parameter and the statistic to provide comfort noise to a user of the radio receiver. Each noise modeling parameter can, for example, be an estimated mean spectral magnitude, and the statistic can be an estimated standard deviation for the at least one estimated mean spectral magnitude. Additionally, the comfort noise generator can periodically receive and filter updates of the at least one noise modeling parameter and the statistic. Further, the comfort noise generator can process the filtered updates of the at least one noise modeling parameter in accordance with the statistic to provide the comfort noise. For example, the comfort noise generator can use an estimated standard deviation to dither filtered updates of received mean spectral magnitudes.
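By way of illustration only, the dithering of filtered mean spectral magnitudes by a single standard deviation value might be sketched as follows. The function name `dither_frame` and the choice of a zero-mean Gaussian perturbation are assumptions made for this sketch; the text above does not specify the distribution used for dithering.

```python
import random


def dither_frame(mean_magnitudes, std_dev, rng=random):
    """Perturb each mean spectral magnitude by zero-mean noise.

    Assumption: Gaussian dither scaled by the single received
    standard deviation; the source does not name a distribution.
    """
    return [m + rng.gauss(0.0, std_dev) for m in mean_magnitudes]


# Usage: a fresh dithered frame is drawn for every comfort noise frame,
# so the spectrum varies slightly from frame to frame.
rng = random.Random(1234)  # seeded only to make the sketch repeatable
frame_a = dither_frame([10.0, 12.0, 9.5], 0.5, rng)
frame_b = dither_frame([10.0, 12.0, 9.5], 0.5, rng)
```

With a standard deviation of zero the sketch returns the mean magnitudes unchanged, which corresponds to the static spectrum of a conventional generator.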
The above-described and other features and advantages of the invention are explained in detail hereinafter with reference to the illustrative examples shown in the accompanying drawings. Those of skill in the art will appreciate that the described embodiments are provided for purposes of illustration and understanding and that numerous equivalent embodiments are contemplated herein.
Brief Description of the Drawings
Figure 1 is a block diagram of an exemplary DTX transmitter in which background noise modeling techniques of the invention can be implemented.
Figure 2 is a block diagram of an exemplary DTX receiver in which comfort noise generation techniques according to the invention can be implemented.
Figure 3 depicts an exemplary speech signal and corresponding timing of exemplary DTX frames for a DTX communications system in which the techniques of the invention can be implemented.
Figure 4 is a flow diagram depicting steps in an exemplary method of comfort noise generation according to the invention.
Figure 5 is a block diagram of an exemplary comfort noise frame generator according to the invention.
Figure 6 is a time plot of a number of spectral magnitudes representing typical background noise at a DTX transmitter.
Figure 7 is a time plot of a number of spectral magnitudes representing comfort noise produced at a DTX receiver, the spectral magnitudes having been generated based on the spectral magnitudes of Figure 6 using prior art techniques.
Figure 8 is a time plot of a number of estimated mean spectral magnitudes representing background noise at a DTX transmitter, the estimated mean spectral magnitudes having been derived by filtering or smoothing the spectral magnitudes of Figure 6 in accordance with the invention.
Figure 9 is a time plot of a number of spectral magnitudes representing background noise at a DTX transmitter, the spectral magnitudes having been derived by receiving the spectral magnitudes of Figure 8 at a DTX receiver and thereafter filtering the received spectral magnitudes according to the invention.
Figure 10 is a time plot of a number of spectral magnitudes representing comfort noise generated at a DTX receiver, the spectral magnitudes having been derived by randomizing or dithering the spectral magnitudes of Figure 9 according to the invention.
Figure 11 is a time plot of a number of spectral magnitudes representing enhanced comfort noise generated at a DTX receiver, the spectral magnitudes having been derived by filtering or smoothing the spectral magnitudes of Figure 10 according to the invention.
Detailed Description of the Invention
Exemplary embodiments of the invention are hereinafter described with respect to Discontinuous Transmission (DTX) communications systems. DTX is utilized, for example, in the well known Pacific Digital Cellular (PDC), Digital Advanced Mobile Phone System (D-AMPS, including IS641A), Global System for Mobile Communications (GSM), and Asian Cellular Satellite (ACeS) standards. Publicly available specifications for each of the above described standards provide detailed descriptions of the standard-specific use of DTX.
Within the context of DTX, the exemplary embodiments are also hereinafter described with reference to the well known Multi-Band Excitation (MBE™) speech coding algorithm originally developed at the Massachusetts Institute of Technology. The MBE algorithm and, more recently, the well known successor algorithms IMBE™ and AMBE™ are very popular in digital communications systems requiring a low bit rate (i.e., under 4.8 kbps). For example, in the field of satellite phone communications, some form of MBE is used in each of the well known Iridium, INMARSAT M, INMARSAT Mini-M, ICO (INMARSAT-P), Optus, and ACeS systems. MBE-based algorithms have also been used in Land Mobile Radio (e.g., APCO-25) and air phone applications. For a detailed description of MBE algorithms, see, for example, B.S. Atal et al., Advances in Speech Coding, Kluwer Academic Publishers, 1991; A.M. Kondoz, Digital Speech: Coding for Low Bit Rate Systems, Wiley & Sons, 1994; and WIPO Publication WO 9412972, 06/1994, Method and Apparatus for Quantization of Harmonic Amplitudes.
Although the exemplary embodiments are clearly useful in the DTX and MBE contexts, those of skill in the art will appreciate that certain aspects of the invention are equally applicable to other communications and digital signal processing applications. For example, the disclosed methods for describing or modeling characteristics of a signal and thereafter using the model parameters to generate or simulate the signal can be used, not only in providing comfort noise within a DTX system, but also in recording and/or playing back any signal of interest. Moreover, the disclosed parametric and statistical signal modeling techniques are readily applied, not only to the frequency-domain MBE speech coding algorithm, but to any signal coding algorithm. For example, the disclosed techniques are directly applicable to other frequency-domain algorithms (such as those used in homomorphic vocoders and sinusoidal transform coders) and to time-domain algorithms (such as the well known Code Excited Linear Prediction, or CELP, algorithm and the also well known Vector Sum Excited Linear Prediction, or VSELP, algorithm).
Turning now to exemplary embodiments of the invention, Figures 1 and 2 depict, respectively, a DTX transmitter 100 and a compatible DTX receiver 200. As shown in Figure 1, the exemplary DTX transmitter 100 includes a voice activity detector (VAD) 110, a speech encoder 120, a silence description (SID) encoder 130, a channel encoder 140, and first and second transmit switches 150, 155. In Figure 2, the exemplary DTX receiver 200 includes a channel decoder 210, a frame validation processor 220, a speech frame buffer 230, a comfort noise frame buffer 240, a speech decoder 250 and a receive switch 260. Those of skill in the art will appreciate that the below described functionality of the components of Figures 1 and 2 can be implemented using a variety of hardware configurations including, for example, a general purpose digital computer, standard digital signal processing components, and one or more application specific integrated circuits.
In operation, an audio frame (e.g., a collection of successive pulse code modulated samples of a user speech signal) is provided to the voice activity detector 110, the speech encoder 120, and the SID encoder 130 of the DTX transmitter 100. The voice activity detector 110 analyzes the audio frame and determines whether the frame contains voice information. If so, then the first transmit switch 150 is set to couple an output of the speech encoder 120 to an input of the channel encoder 140, and the speech encoder 120 is instructed to encode a speech frame (using techniques described below) for input to the channel encoder 140. Otherwise, the first transmit switch 150 is set to couple an output of the SID encoder 130 to the input of the channel encoder 140, and the SID encoder 130 is instructed to encode a SID frame (using techniques also described below) for input to the channel encoder 140. In practice, operation of the speech encoder 120 and the SID encoder 130 can be combined in a single encoding device.
Having received either a speech frame from the speech encoder 120 or a SID frame from the SID encoder 130, the channel encoder 140 uses known channel coding techniques to prepare the frame for transmission across the communications channel (e.g., the air interface). During periods in which the speech signal includes voice, the second transmit switch 155 remains closed, and successive speech frames are encoded and transmitted. However, when the voice activity detector 110 determines that voice activity has just stopped, only a limited number (typically one or two) of SID frames are encoded and transmitted. Subsequently, SID update frames are periodically (e.g., every 250 ms to 1.0 sec) encoded and transmitted until the voice activity detector 110 indicates that voice has restarted. At that time, the speech encoder 120 resumes generating speech frames for transmission until the voice again ceases.
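The transmit-side behavior just described can be pictured with the following minimal sketch, which maps a sequence of per-frame voice activity decisions to what is sent on the channel. The names `FrameType` and `schedule_frames`, the default of two initial SID frames, and the update interval of 25 frames (500 ms if 20-ms frames are assumed) are illustrative assumptions chosen within the ranges given above, not values mandated by the text.

```python
from enum import Enum


class FrameType(Enum):
    SPEECH = "speech"   # speech encoder output is transmitted
    SID = "sid"         # SID (silence description) frame is transmitted
    NONE = "none"       # nothing is transmitted for this frame


def schedule_frames(vad_flags, initial_sid_frames=2, sid_update_interval=25):
    """Decide, frame by frame, what the DTX transmitter sends.

    vad_flags: iterable of booleans, True where the VAD reports voice.
    """
    decisions = []
    silent_run = 0  # consecutive frames without voice so far
    for voice in vad_flags:
        if voice:
            silent_run = 0
            decisions.append(FrameType.SPEECH)
            continue
        silent_run += 1
        if silent_run <= initial_sid_frames:
            # Voice has just stopped: send the first few SID frames.
            decisions.append(FrameType.SID)
        elif (silent_run - initial_sid_frames) % sid_update_interval == 0:
            # Periodic SID update during the DTX period.
            decisions.append(FrameType.SID)
        else:
            decisions.append(FrameType.NONE)
    return decisions


# Usage: two voiced frames, thirty silent frames, then voice resumes.
plan = schedule_frames([True] * 2 + [False] * 30 + [True])
```

The sketch ignores any VAD hangover period; it only shows that transmission falls to a small number of SID frames while voice is absent.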
At the receiver 200, the channel decoder 210 receives and decodes incoming frames (i.e., the channel decoder 210 performs the inverse of the coding process implemented by the channel encoder 140) and provides the decoded frames to the validation processor 220, the speech frame buffer 230, and the comfort noise frame buffer 240. During DTX periods, most of the received frames are invalid and are therefore filled with random data created by RF interference and receiver noise. Occasionally, however, a valid SID update frame is transmitted during DTX, and valid speech frame transmissions can resume at any time.
To handle this ambiguity, the validation processor 220 analyzes each received frame for content. If the received frame is not valid, the receive switch 260 is set to couple the comfort noise frame buffer 240 to an input of the speech decoder 250, and the comfort noise frame buffer 240 is instructed to provide a noise frame to the speech decoder 250 for comfort noise generation. In the event the received frame is a valid SID update, the received frame is used to update the contents of the comfort noise frame buffer 240 before a noise frame is provided to the speech decoder 250 for comfort noise generation. Lastly, if the received frame represents a valid voice frame, the receive switch 260 is set to couple the speech frame buffer 230 to the speech decoder, and the received frame is sent to the speech decoder 250 to be synthesized for presentation to the receiver user.
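The three cases handled by the validation processor 220 can be summarized in a short sketch. The function name `route_frame`, the string labels for frame kinds, and the use of a one-element list to stand in for the comfort noise frame buffer 240 are assumptions introduced purely for illustration.

```python
def route_frame(kind, payload, noise_buffer):
    """Return (source, frame) handed to the speech decoder for one frame.

    kind: "speech", "sid" or "invalid" (labels assumed for this sketch).
    noise_buffer: one-element list holding the current comfort noise frame.
    """
    if kind == "speech":
        # Valid voice frame: decode it directly for the receiving user.
        return ("speech_frame_buffer", payload)
    if kind == "sid":
        # Valid SID update: refresh the stored noise description first.
        noise_buffer[0] = payload
    # Invalid frame or SID update: play comfort noise from the buffer.
    return ("comfort_noise_frame_buffer", noise_buffer[0])


# Usage: an invalid frame replays the stored noise frame, a SID update
# replaces it, and a speech frame bypasses the noise buffer entirely.
buffer_240 = [[1.0, 1.0]]
first = route_frame("invalid", None, buffer_240)
second = route_frame("sid", [2.0, 2.0], buffer_240)
third = route_frame("speech", "encoded-speech", buffer_240)
```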
Figure 3 is a timing diagram illustrating the above described DTX operation. In the figure, a speech signal includes first and second speech bursts 310, 320, separated by a period of silence. During the first speech burst 310, valid speech frames 315 are continually transmitted. However, immediately following the end of the first speech burst 310, and periodically throughout the period of silence between speech bursts, valid SID frames 330 are transmitted instead of speech frames. Then, at the start of the second speech burst 320, valid speech frames 325 are again continually transmitted.
Such DTX operation provides significant advantages over conventional continuous transmission, and DTX is therefore a common feature in digital wireless systems of today. For example, DTX enables the transmitting radio to save power since it is not required to transmit as often. More specifically, since the transmitter power amplifier (PA) typically consumes the majority of the transmitter power, significant power savings can be realized by turning the power amplifier off during DTX mode. Additionally, DTX results in less RF energy being transmitted into the air interface spectrum. Consequently, the average RF interference seen by other users in a multiple-access system is reduced, and the Carrier-to-Interference (C/I) ratio seen by those users is commensurately enhanced. Increased C/I improves the performance of the radio terminals or, conversely, increases the capacity of the system (i.e., the number of users that can be supported in a given frequency allocation is increased).
As described above with respect to Figures 1 and 2, the speech signal in a DTX system is sampled and encoded (e.g., by the speech encoder 120), and thereafter the encoded values are decoded (e.g., by the speech decoder 250) to synthesize or reconstruct the speech signal. The combination of an encoder and a decoder is often referred to in the art as a codec or a vocoder, and any of a number of known techniques can be implemented within a vocoder to accomplish the functions of speech coding and decoding.
Such techniques can generally be classified as one of two types. Namely, there are waveform coding techniques and parametric coding techniques. Waveform vocoders attempt to quantize and encode the speech signal itself, while parametric vocoders assume a model for the speech signal, the model consisting of a number of parameters. Typically, a parametric vocoder receives samples of the speech signal, groups the samples into frames, fits the frame of samples to the model, then quantizes and encodes the values for the model parameters. In this manner, parametric vocoders are able to produce a desired speech quality at lower information (i.e., bit) rates than are waveform vocoders.
A robust and popular parametric vocoder is the above noted MBE vocoder. Like many speech coders, the MBE vocoder divides a sampled speech signal into 20-ms frames. For each voice frame, a set of MBE model parameters is calculated. The model parameters (e.g., including a fundamental pitch frequency and a number of voicing decisions) describe the perceptual content of the frame and can therefore be used to later generate a synthesized speech signal which is perceptually similar to the original speech signal. For frames containing no voice (e.g., for frames containing only background noise sampled at a DTX transmitter), the MBE model produces a set of spectral magnitudes which can be used to recreate the frames (e.g., to synthesize comfort noise at a DTX receiver).
In conventional DTX systems, the most recent SID update is used directly and repeatedly during DTX periods to generate comfort noise. In other words, the latest SID frame (e.g., an MBE frame containing spectral magnitudes) is sent, over and over, to the speech decoder 250 for synthesis. As a result, the DTX receiver forces the comfort noise characteristic at the receiver to match that of the background noise at the transmitter each time a SID update is received. Moreover, the comfort noise spectrum remains static between SID updates. There are at least two significant disadvantages to such an approach.
First, consider a case in which the background noise at the transmitter is stationary. Then, by definition, the mean noise spectrum is constant over time. However, this says nothing of the variance of the spectrum. In most realistic noise environments, the instantaneous spectral values are continually changing and form a random distribution around a mean value. The human listener perceives both the spectral mean and the spectral variance. Whereas the spectral mean indicates the loudness of the background noise, the spectral variance indicates noise liveliness. Since conventional comfort noise generation schemes only account for the mean spectrum (e.g., by fixing the MBE spectral magnitudes between comfort noise updates), such schemes often result in a perceptible mismatch between the comfort noise generated during DTX periods and the background noise encoded during periods of continuous speech transmission.
Next, consider a case in which the transmitter background noise is non-stationary between comfort noise updates. In such case, sharp transitions can occur when a comfort noise update is received (e.g., when a prevailing set of MBE spectral magnitudes is replaced with an updated set of spectral magnitudes). At the DTX transmitter, changes in the volume and/or spectral characteristic of the background noise typically occur over a period of several frames. However, since the DTX receiver gets relatively few SID updates, such changes are reflected very abruptly at the receiver and can therefore make the DTX functionality less transparent and perceptually displeasing to the receiving user.
Advantageously, the invention provides methods and apparatus for capturing both the loudness and the liveliness of the transmitter background noise. More generally, the invention provides techniques for capturing the perceptible characteristics of any signal of interest. To do so, a parametric model of the signal (e.g., a set of MBE spectral magnitudes representing transmitter background noise) is supplemented with one or more higher order statistics relating to the parametric model. For example, in the case of DTX transmissions, the MBE spectral magnitudes of a SID frame (which can be thought of as a crude estimate of the mean noise spectrum) can be supplemented with an estimate of the variance of the background noise spectrum. By using the one or more higher order statistics (e.g., the variance estimate), in conjunction with the model parameters (e.g., the spectral magnitudes), to reconstruct the original signal (e.g., to generate comfort noise), a more accurate and perceptually pleasing result is achieved.
Moreover, the invention discloses that the model parameters can be smoothed, averaged, or otherwise filtered to further enhance the reconstructed signal. Such filtering can be performed when the model parameters are generated (e.g., prior to DTX transmission or prior to recording on a storage medium) and/or when the parameters are used for signal reconstruction (e.g., upon DTX reception or upon playback from a storage medium). Without loss of generality, the various features and advantages of the invention are described hereinafter with respect to comfort noise generation in a DTX communications system utilizing the above described MBE speech coding model.
As noted above, the MBE spectral magnitudes in a DTX SID frame can be thought of as an estimate of the mean noise spectrum. According to the invention, however, a superior estimate of the mean spectrum can be obtained by filtering successive spectral magnitude frames. For example, at the beginning of each period of voice inactivity, a DTX voice activity detector (e.g., the detector 110 of Figure 1) typically waits for a period of time before declaring that voice is inactive. The waiting period (typically lasting approximately 4 to 6 frames) is known in the art as a hangover period and provides opportunity to average successive frames. In other words, a set of spectral means can be calculated by averaging MBE spectral magnitudes within the hangover period as:
$$\bar{M}_i(k) = \frac{1}{N} \sum_{j=0}^{N-1} M_{i-j}(k), \quad \text{for } k = 0 \text{ to } P-1,$$
where $M_i(k)$ represents an instantaneous spectral magnitude for vocoder frame number i, P is the number of spectral magnitudes in each frame, and N is the number of frames in the hangover period. The spectral means can then be transmitted as a SID frame update at the beginning of the period of voice inactivity. In practice, the instantaneous spectral magnitudes $M_i$ are quantized on a logarithmic scale and all computations involving the instantaneous spectral magnitudes are performed using the resulting logarithmic values. Since quantization of spectral magnitudes is not critical to an understanding of the present invention, however, a detailed description of such quantization is omitted here for sake of brevity. For details regarding quantization of MBE model parameters, see the above cited International Publication No. WO 9412972.
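As a minimal sketch of the hangover-period averaging, each of the P spectral magnitudes can be averaged across the N frames collected during the hangover period. The function name `hangover_mean` and the representation of a frame as a plain list of (logarithmic) magnitudes are assumptions made for illustration.

```python
def hangover_mean(frames):
    """Average N frames of P spectral magnitudes, bin by bin.

    frames: list of N frames, each a list of P magnitudes
            (assumed to be already on the logarithmic scale).
    """
    n = len(frames)
    p = len(frames[0])
    return [sum(frame[k] for frame in frames) / n for k in range(p)]


# Usage: four hangover frames of three magnitudes each.
means = hangover_mean([
    [10.0, 20.0, 30.0],
    [12.0, 18.0, 30.0],
    [11.0, 22.0, 30.0],
    [11.0, 20.0, 30.0],
])
```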
According to the invention, the mean estimates are also refined during DTX periods so that each SID frame update accurately characterizes the prevailing transmitter background noise. For example, a running average of the mean magnitudes can be calculated as:
$$\bar{M}_i(k) = \alpha\, M_i(k) + (1 - \alpha)\, \bar{M}_{i-1}(k), \quad \text{for } k = 0 \text{ to } P-1.$$
Those of skill in the art will recognize this is a first order auto-regressive (AR) filter applied to each spectral magnitude, where α is the filter averaging coefficient or memory. The AR filter is applied to the spectral magnitudes so that a continuously updating estimate of the mean is obtained. The AR process has the advantage of providing good filtering while requiring few memory resources. Furthermore, the output of the AR filter weights the current frame more heavily than previous frames so that an excessive delay is not introduced. Empirical studies have shown that a filter memory of α = 1/16 provides quality results. Alternatively, all spectral magnitudes occurring between SID updates can be averaged as described above with respect to the initial hangover period. Doing so, however, is more computationally complex and requires significantly more memory than does the above described AR filtering approach. Moreover, such a continuous averaging tends to introduce a more noticeable delay as compared to the first order AR method.
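A first order AR update of the mean estimate might be sketched as below. The sketch assumes that the coefficient α weights the newly received frame and that (1 − α) weights the previous estimate; the function name `ar_update_mean` is hypothetical. Only one frame of state is retained, which reflects the low memory requirement noted above.

```python
def ar_update_mean(previous_mean, frame, alpha=1.0 / 16.0):
    """One step of a first order AR (exponential) average per magnitude.

    Assumption: alpha multiplies the new frame and (1 - alpha) the
    previous estimate.
    """
    return [
        alpha * x + (1.0 - alpha) * m
        for m, x in zip(previous_mean, frame)
    ]


# Usage: a step change in the input is absorbed gradually.
estimate = [0.0, 16.0]
estimate = ar_update_mean(estimate, [16.0, 0.0])
```

Under this assumption the weight given to a frame decays geometrically with its age, so the most recent frame always carries the largest single weight.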
According to the invention, the MBE spectral magnitudes are not only filtered to provide superior spectral mean estimates, the MBE spectral magnitudes are also supplemented with an estimate of the variance of the noise spectrum. The variance quantifies the distribution of the instantaneous spectral magnitudes about the spectrum mean and thus provides an indication of the liveliness of the modeled noise. Mathematically, the variance of a random variable x is computed as:
$$\sigma^2 = E\{(x - \mu)^2\}$$
where E{ } is the expected value operator and μ = E{x} is the mean of x. The standard deviation of x is then defined as the square root of the variance and, like the variance, provides information regarding the liveliness of x.
In exemplary embodiments, a single standard deviation parameter is calculated and used to characterize all of the spectral magnitudes within a SID frame. For example, the instantaneous standard deviation for a particular SID frame i can be estimated as:
$$\sigma_i = \sqrt{\frac{1}{P} \sum_{k=0}^{P-1} \left( M_i(k) - \bar{M}_i(k) \right)^2}$$
where P is the number of spectral magnitudes in each frame, $M_i(k)$ is an instantaneous spectral magnitude, and $\bar{M}_i(k)$ is a filtered or mean spectral magnitude estimated as described above.
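For illustration, a single standard deviation value for one frame could be computed from the deviations of the P instantaneous magnitudes about their filtered means. The function name `instantaneous_std` and the normalization by P (rather than P − 1) are assumptions of this sketch.

```python
import math


def instantaneous_std(frame, mean_frame):
    """Root-mean-square deviation of a frame about its mean estimate.

    Assumption: the squared deviations are averaged over all P bins.
    """
    p = len(frame)
    squared = sum((x - m) ** 2 for x, m in zip(frame, mean_frame))
    return math.sqrt(squared / p)


# Usage: every bin deviates by 1 from its mean, so the result is 1.
sigma = instantaneous_std([9.0, 11.0, 21.0, 19.0], [10.0, 10.0, 20.0, 20.0])
```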
Advantageously, the instantaneous standard deviation estimate can be transmitted in a SID frame, along with the filtered MBE spectral magnitudes, and then used at the receiver to generate high-quality comfort noise (as is described hereinafter). Alternatively, successive instantaneous standard deviation estimates can be filtered or smoothed, and the filtered standard deviation estimates can be transmitted with the filtered spectral magnitudes. For example, the instantaneous standard deviation estimates can be smoothed using a first-order AR process as:
Figure imgf000017_0002
where α_i is a per-frame update coefficient or filter memory. Filtering the instantaneous standard deviation values reduces the effects of abnormal, or outlier, spectral magnitude samples.
At the beginning of each DTX period, the first standard deviation estimate can be set equal to the instantaneous standard deviation value. Alternatively, the first estimate can be equated with a last filtered estimate from a previous DTX period. Additionally, a weighted combination of the previous estimate and the current instantaneous value can be used to provide the first estimate.
According to exemplary embodiments, the update coefficient α_i is not fixed and is instead adapted for each frame. This results from the fact that a fixed update coefficient can provide poor variance estimates in certain cases. For example, suppose the transmitter background noise is experiencing a volume increase across most or all frequencies of interest. In other words, suppose that the noise is non-stationary. Since the mean spectral magnitude estimates are derived by filtering the actual spectral magnitudes, changes in the actual spectral magnitudes show up in the estimated mean spectral magnitudes after some delay. For example, a volume increase in the actual spectral magnitudes typically will not appear in the mean spectral magnitudes until a few frames have passed. During this delay period, the difference between the actual spectral magnitudes and the estimated mean spectral magnitudes can be quite significant.
However, such difference is caused by the mean estimator and not by true randomness in the spectra. Since these differences are summed and input to the variance estimator, the variance estimate will be artificially inflated when a fixed update coefficient is used. Thus, according to the invention, the update coefficient is dynamically adapted from frame to frame. To do so, a quality variable q_i can be calculated for each frame i as:
$$q_i = 1 - \left| \frac{1}{P} \sum_{k=0}^{P-1} \operatorname{sign}\{ M(k) - \bar{M}(k) \} \right|$$
The above defined quality variable is indicative of the stationarity of the spectrum. Whenever there is a general volume change, all of the magnitude differences will tend to have the same sign, thus leading to a high sum and a low value for the variable q_i. However, when the spectrum is fairly stationary, there are generally as many magnitude differences in the positive direction as there are in the negative direction, thus leading to a low sum and a high value for q_i.
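A minimal sketch of the quality variable follows, assuming it is one minus the absolute value of the average sign of the differences between instantaneous and mean magnitudes, which matches the behavior described above (near 0 for a general volume change, near 1 for a stationary spectrum). The names `sign` and `quality` are hypothetical, and treating a zero difference as sign 0 is an assumption.

```python
def sign(x):
    """Return -1, 0 or +1 (a zero difference contributes nothing; assumed)."""
    return (x > 0) - (x < 0)

def quality(inst_mags, mean_mags):
    """Stationarity measure in the range 0 to 1 for one frame."""
    if len(inst_mags) != len(mean_mags) or not inst_mags:
        raise ValueError("need two equal-length, non-empty magnitude lists")
    P = len(inst_mags)
    # Signs cancel for a stationary spectrum and pile up for a volume change.
    total = sum(sign(m - mb) for m, mb in zip(inst_mags, mean_mags))
    return 1.0 - abs(total) / P
```

When every magnitude exceeds its mean, the signs all agree and the result is 0; when half lie above and half below, the result is 1.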
Therefore, when the quality factor q_i is high, one can be confident in the instantaneous variance estimate, and it is reasonable to use the instantaneous estimate in updating the smoothed standard deviation estimate. However, when the quality factor is low, the instantaneous variance estimate is questionable, and it is preferable that the instantaneous estimate not be used to update the smoothed estimate. This idea can be quantified by applying an adaptive update coefficient α_i, which is controlled by the quality factor q_i according to:
$$\alpha_i = q_i \, \alpha_{\max}$$
where α_max is a constant representing the maximum possible update coefficient (i.e., the value α_i takes when q_i = 1, since q_i is by definition in the range 0 to 1). Empirical studies have shown that a maximum of α_max = 1/32 provides quality results. As noted above, the smoothed standard deviation estimates are transmitted within SID frames, along with the mean spectral magnitudes, for comfort noise generation at the DTX receiver. Of course, the variance estimate, rather than the standard deviation estimate, can be smoothed and transmitted instead. Whether variance or standard deviation estimates are used is a matter of design choice. Note also that a separate standard deviation (or variance) estimate could be computed for each spectral magnitude. Doing so, however, would result in transmission of many additional parameters. Furthermore, experimentation has shown that most noise sources of interest tend to have similar variances across their spectra. Thus, a single term is adequate for most cases.
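The adaptive update can be illustrated with a short sketch. It assumes the smoothed standard deviation follows the usual first-order AR form, new = (1 − α)·previous + α·instantaneous, with the per-frame coefficient α equal to the quality factor times a maximum of 1/32. The smoothing equation itself appears only as an image in this text, so that form, the function name `update_smoothed_std` and the constant name `ALPHA_MAX` are assumptions.

```python
ALPHA_MAX = 1.0 / 32.0  # maximum update coefficient reported in the text

def update_smoothed_std(prev_smoothed, inst_std, q, alpha_max=ALPHA_MAX):
    """One frame of quality-controlled smoothing (assumed first-order AR form)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("quality factor must lie in the range 0 to 1")
    alpha = q * alpha_max  # low quality -> little or no update
    return (1.0 - alpha) * prev_smoothed + alpha * inst_std
```

With `q = 0` the previous estimate is kept unchanged, so frames dominated by a volume change do not inflate the smoothed value; with `q = 1` the estimate moves 1/32 of the way toward the instantaneous value.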
At the DTX receiver, the SID frames (which, according to exemplary embodiments, are transmitted every 48 MBE frames, or every 960 ms) form the basis for the spectrum of the comfort noise to be generated. As described above, each SID frame includes a set of estimated mean spectral magnitudes and a single estimated standard deviation or variance. According to the invention, the mean spectral magnitudes are processed, in accordance with the standard deviation value, to provide enhanced spectral magnitudes for input to a speech decoder (e.g., the decoder 250 of Figure 2). Advantageously, the enhanced spectral magnitudes result in synthesized comfort noise which closely matches the background noise at the transmitter.
First, abrupt spectral changes are avoided at SID updates by filtering the mean spectral magnitudes from update to update. For example, suppose M_old(k) (for k = 1 to P as above) represent mean spectral magnitudes from a previously received SID frame and M_new(k) represent mean spectral magnitudes from a just-received SID frame. Rather than transitioning immediately from M_old(k) to M_new(k), the spectral magnitudes are transitioned over a number of frames N. For example, linear ramping functions or other transition functions involving polynomials or exponentials are possible. An exemplary linear ramp is given by:
Figure imgf000020_0001
with the ramped spectral magnitudes equal to M_new(k) for i ≥ N.
After the ramping, the updated mean spectral magnitudes M_new(k) are used until a next SID frame update is received. Empirical studies have shown that a ramping or transition period of N = 16 frames provides good results.
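A linear ramp of this kind can be sketched as follows. The exact ramp expression appears only as an image in this text, so the sketch assumes the weights (1 − i/N) on the old magnitudes and i/N on the new ones, with the frame index i counted from 0 at the SID update. The function name `ramp_magnitudes` is hypothetical.

```python
def ramp_magnitudes(old_mags, new_mags, i, n_ramp=16):
    """Mean spectral magnitudes for frame i after a SID update (assumed linear ramp)."""
    if len(old_mags) != len(new_mags):
        raise ValueError("old and new frames must have the same length")
    if i >= n_ramp:
        # Ramp finished: use the new magnitudes until the next SID update.
        return list(new_mags)
    w = i / n_ramp  # fraction of the transition completed
    return [(1.0 - w) * o + w * n for o, n in zip(old_mags, new_mags)]
```

With the reported transition period of 16 frames, a magnitude moving from 0 to 16 would pass through 4 at frame 4 and hold at 16 from frame 16 onward.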
To make the comfort noise characteristic less static, a random factor based on the standard deviation estimate is added to each ramped spectral magnitude. According to exemplary embodiments, the added random numbers are created using a pseudo-random number generator having a normally-distributed output. The pseudo-random numbers are scaled according to the standard deviation estimate, and the randomized spectral magnitudes for a given frame are given by:
Figure imgf000021_0001
where x(k) is the output of a normally-distributed pseudo-random number generator with var(x) = 1, and σ is the standard deviation estimate computed at the transmitter and sent within a SID frame.
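The randomization step can be sketched as below, assuming the random factor is simply added to each ramped magnitude after scaling by σ, as the description suggests. The function name `randomize_magnitudes` and the optional `rng` argument are hypothetical, and Python's `random.Random.gauss` stands in for whatever pseudo-random generator an implementation would use.

```python
import random

def randomize_magnitudes(ramped_mags, sigma, rng=None):
    """Add zero-mean, unit-variance Gaussian noise scaled by sigma to each magnitude."""
    if rng is None:
        rng = random.Random()
    # One independent draw x(k) per spectral magnitude.
    return [m + sigma * rng.gauss(0.0, 1.0) for m in ramped_mags]
```

The sketch does not clamp the result, so a large σ relative to a magnitude can yield a negative value; how the source handles that case is not stated here.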
Note that the standard deviation σ could be fixed at the receiver so that standard deviation estimates would not have to be computed at the transmitter and sent to the receiver. However, doing so would also fix the amount of liveliness in the generated comfort noise and would not track the liveliness of the background noise existing at the transmitter. Such an embodiment would, however, perform better than current methods which include no random factor at all.
Advantageously, the randomized spectral magnitudes can be sent to the speech decoder for generating quality comfort noise. According to the invention, however, the character of the comfort noise can be further improved by filtering the randomized spectral magnitudes across frames. Note that the above described addition of random noise to the ramped spectral magnitudes assumes that the background noise process at the transmitter is independent or uncorrelated from frame to frame. In reality, the randomness that dithers the spectral magnitudes about their means has some correlation between frames. This is the spectral equivalent of colored noise in the time-domain. The present invention accounts for this phenomenon by smoothing the randomized spectral magnitudes from frame to frame as:
$$M_i^{\text{final}}(k) = \beta \, M_i^{\text{randomized}}(k) + (1 - \beta) \, M_{i-1}^{\text{final}}(k)$$
Those of skill in the art will recognize this as a first-order AR filter applied to each randomized spectral magnitude, where β is the filter update coefficient or memory. Empirical studies have shown that an update coefficient of β = 0.5 provides good results. Note also that alternative smoothing methods (e.g., higher-order AR filters or moving average filters) can be implemented as well.
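A minimal sketch of this frame-to-frame smoothing follows, assuming each final magnitude is β times the current randomized magnitude plus (1 − β) times the previous frame's final magnitude. The function name `smooth_frame` is hypothetical, and how the very first frame is initialized is left to the caller.

```python
def smooth_frame(randomized_mags, prev_final_mags, beta=0.5):
    """First-order AR smoothing of randomized magnitudes across frames."""
    if len(randomized_mags) != len(prev_final_mags):
        raise ValueError("frames must have the same length")
    return [beta * r + (1.0 - beta) * p
            for r, p in zip(randomized_mags, prev_final_mags)]
```

With β = 0.5 each output magnitude is the midpoint between the new randomized value and the previous output, which introduces the inter-frame correlation discussed above.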
Figure 4 is a flow chart 400 depicting steps in the above described comfort noise generation method. The steps of Figure 4 can be implemented, for example, within the DTX receiver 200 of Figure 2. At step 410, a determination is made as to whether a valid MBE frame has been received. If the received frame is not valid, then a comfort noise frame (i.e., a frame of enhanced spectral magnitudes) is computed at step 420 (based in part on a previously received SID update), and the resulting comfort noise frame is synthesized at step 430. If the received frame is valid, then a determination is made at step 440 as to whether the received frame is a speech frame. If so, then the speech frame is synthesized at step 430. Otherwise, the received frame is presumed to be a valid SID update and is stored as such at step 450. Additionally, the SID update is synthesized at step 430.
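The decision flow of Figure 4 can be sketched as a small dispatch function. Everything named here (`FrameKind`, `handle_frame`, the `state` dictionary and the two callables) is hypothetical scaffolding for illustration; only the step numbers and the branching order come from the description above.

```python
from enum import Enum

class FrameKind(Enum):
    INVALID = 0  # no valid MBE frame received
    SPEECH = 1
    SID = 2

def handle_frame(kind, payload, state, make_comfort_noise, synthesize):
    """Route one received frame through the steps of Figure 4."""
    if kind is FrameKind.INVALID:            # step 410: frame not valid
        frame = make_comfort_noise(state)    # step 420: uses the stored SID update
    elif kind is FrameKind.SPEECH:           # step 440: valid speech frame
        frame = payload
    else:                                    # otherwise presumed a valid SID update
        state["last_sid"] = payload          # step 450: store the update
        frame = payload
    return synthesize(frame)                 # step 430: synthesis in every branch
```

All three branches end at the same synthesis call, mirroring the way steps 420, 440 and 450 each lead to step 430.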
Figure 5 depicts an exemplary comfort noise frame generator 500 according to the invention. The exemplary generator can be used, for example, to implement the comfort noise frame generation step 420 of Figure 4. As shown, the exemplary generator 500 includes an old comfort noise frame buffer 510, a new comfort noise frame buffer 520, a pseudo-random number generator 530, a delay buffer 540, first through fifth multipliers 550, 552, 554, 556, 558, and first and second summing devices 560, 562. Those of skill in the art will appreciate that the below described functionality of the components of Figure 5 can be implemented using a variety of hardware configurations including, for example, a general purpose digital computer, standard digital signal processing components, and one or more application specific integrated circuits (ASICs).
In operation, outputs of the old comfort noise frame buffer 510, the new comfort noise frame buffer 520 and the pseudo-random number generator 530 are weighted, respectively, via the first, second and third multipliers 550, 552, 554, and the weighted output frames are summed via the first summing device 560. Frames output by the first summing device are thus ramped and randomized as described above. The ramped and randomized frames are then filtered via the fourth and fifth multipliers 556, 558, the second summing device 562, and the delay buffer 540 to provide the enhanced comfort noise frames. As shown, the enhanced comfort noise frames (each including a set of enhanced spectral magnitudes) can be input to the speech decoder 250 for synthesis.
Figures 6-11 demonstrate the advantages of the present invention as compared to prior art comfort noise generation techniques. Specifically, Figure 6 depicts an exemplary time sequence (i.e., successive frames) of spectral magnitudes associated with typical background noise at a DTX transmitter. Figure 7 then depicts a time sequence of comfort noise frames generated by using conventional techniques to process the spectral magnitudes of Figure 6, and Figures 8-11 depict time sequences of frames generated using the above described embodiment of the invention to process the same spectral magnitudes.
Specifically, Figure 8 depicts smoothing of the spectral magnitudes of Figure 6 (e.g., at a DTX transmitter), and Figure 9 depicts ramping of the smoothed spectral magnitudes of Figure 8 (e.g., upon receipt at a DTX receiver). Figure 10 then depicts randomization of the ramped spectral magnitudes of Figure 9, and Figure 11 depicts final filtering or enhancement of the randomized spectral magnitudes of Figure 10. Advantageously, the spectral characteristic of Figure 11 is clearly closer to that of Figure 6 as compared to that of Figure 7.
Generally, the present invention provides improved methods and apparatus for characterizing a noise or other signal and for thereafter using the characterization to reconstruct the signal. According to the invention, a parametric model of the signal is supplemented with at least one higher order statistic relating to the parameters of the model. In the context of DTX communications, transmitter background noise is characterized by successive frames of estimated mean spectral magnitudes, each frame being accompanied by a single estimated standard deviation value. Upon reconstruction, the estimated standard deviation value is used to randomize the estimated mean spectral magnitudes and to thereby improve the sound quality of the reconstructed noise. The quality of the reconstructed noise is further enhanced by averaging, smoothing or otherwise filtering the spectral magnitudes prior to transmission and/or upon receipt. Advantageously, the spectral characteristic of the reconstructed noise very closely resembles that of the original noise.
Those skilled in the art will appreciate that the present invention is not limited to the specific exemplary embodiments which have been described herein for purposes of illustration and that numerous alternative embodiments are also contemplated. The scope of the invention is therefore defined by the claims appended hereto, rather than the foregoing description, and all equivalents which are consistent with the meaning of the claims are intended to be embraced therein.

Claims

We Claim:
1. A radio transmitter, comprising: an encoder configured to sample an input noise signal and to provide as output a parametric model of the sampled noise signal, the parametric model including at least one modeling parameter representative of the sampled noise signal, wherein said encoder also provides as output a statistic relating to the at least one modeling parameter, an order of the statistic being greater than an order of each modeling parameter.
2. A radio transmitter according to claim 1, wherein said encoder is one of a multi-band excitation coder, a homomorphic coder and a sinusoidal transform coder.
3. A radio transmitter according to claim 1, wherein the parametric model includes a number of estimated mean spectral magnitudes.
4. A radio transmitter according to claim 3, wherein the statistic is an estimated standard deviation for the spectral magnitudes.
5. A radio transmitter according to claim 1, wherein said encoder periodically updates the at least one modeling parameter and the statistic.
6. A radio transmitter according to claim 5, wherein said encoder filters successive updates of the at least one modeling parameter.
7. A radio transmitter according to claim 6, wherein each modeling parameter update is an estimated mean spectral magnitude.
8. A radio transmitter according to claim 5, wherein said encoder filters successive updates of the statistic.
9. A radio transmitter according to claim 5, wherein said encoder filters successive updates of the at least one modeling parameter and the statistic, and wherein said transmitter sends the filtered updates to a radio receiver.
10. A radio transmitter according to claim 9, wherein said radio transmitter is a Discontinuous Transmission (DTX) device, and wherein the filtered updates are transmitted to said radio receiver within Silence Description (SID) frames.
11. A radio receiver, comprising: a comfort noise generator configured to receive at least one noise modeling parameter representative of a noise signal and a statistic relating to the at least one noise modeling parameter, an order of the statistic being greater than an order of each noise modeling parameter, wherein said comfort noise generator decodes the at least one noise modeling parameter and the statistic to provide comfort noise to a user of said radio receiver.
12. A radio receiver according to claim 11, wherein each noise modeling parameter is an estimated mean spectral magnitude.
13. A radio receiver according to claim 12, wherein the statistic is an estimated standard deviation for the at least one spectral magnitude.
14. A radio receiver according to claim 11, wherein said comfort noise generator periodically receives updates of the at least one noise modeling parameter and the statistic.
15. A radio receiver according to claim 14, wherein said comfort noise generator filters successive updates of the at least one noise modeling parameter.
16. A radio receiver according to claim 15, wherein said comfort noise generator applies a ramping function in filtering successive updates of the at least one noise modeling parameter.
17. A radio receiver according to claim 15, wherein said comfort noise generator processes the filtered updates of the at least one noise modeling parameter in accordance with the statistic to provide the comfort noise.
18. A radio receiver according to claim 17, wherein each noise modeling parameter is an estimated mean spectral magnitude, wherein the statistic is an estimated standard deviation for the at least one estimated mean spectral magnitude, and wherein said comfort noise generator dithers the filtered updates of the at least one estimated mean spectral magnitude in accordance with the estimated standard deviation.
19. A radio receiver according to claim 15, wherein said comfort noise generator filters the dithered updates of the at least one spectral magnitude to provide correlation between successive dithered updates.
20. A radio receiver according to claim 14, wherein said comfort noise generator receives the periodic updates of the at least one noise modeling parameter and the statistic from a radio transmitter.
21. A radio according to claim 20, wherein said radio receiver is a Discontinuous Transmission (DTX) device, and wherein said comfort noise generator receives the at least one noise modeling parameter and the statistic within Silence Description (SID) frames sent by said transmitter.
PCT/US2000/013829 1999-06-07 2000-05-19 Methods and apparatus for generating comfort noise using parametric noise model statistics WO2000075919A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU50320/00A AU5032000A (en) 1999-06-07 2000-05-19 Methods and apparatus for generating comfort noise using parametric noise model statistics
JP2001502113A JP2003501925A (en) 1999-06-07 2000-05-19 Comfort noise generation method and apparatus using parametric noise model statistics
DE10084675T DE10084675T1 (en) 1999-06-07 2000-05-19 Method and device for generating artificial noise using parametric noise model measures

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US32668099A 1999-06-07 1999-06-07
US09/326,680 1999-06-07

Publications (1)

Publication Number Publication Date
WO2000075919A1 true WO2000075919A1 (en) 2000-12-14

Family

ID=23273227

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/013829 WO2000075919A1 (en) 1999-06-07 2000-05-19 Methods and apparatus for generating comfort noise using parametric noise model statistics

Country Status (6)

Country Link
JP (1) JP2003501925A (en)
CN (1) CN1145928C (en)
AU (1) AU5032000A (en)
DE (1) DE10084675T1 (en)
MY (1) MY133505A (en)
WO (1) WO2000075919A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7499856B2 (en) * 2002-12-25 2009-03-03 Nippon Telegraph And Telephone Corporation Estimation method and apparatus of overall conversational quality taking into account the interaction between quality factors
US7610197B2 (en) * 2005-08-31 2009-10-27 Motorola, Inc. Method and apparatus for comfort noise generation in speech communication systems
CN101303855B (en) * 2007-05-11 2011-06-22 华为技术有限公司 Method and device for generating comfortable noise parameter
CN102760441B (en) * 2007-06-05 2014-03-12 华为技术有限公司 Background noise coding/decoding device and method as well as communication equipment
CN101320563B (en) * 2007-06-05 2012-06-27 华为技术有限公司 Background noise encoding/decoding device, method and communication equipment
CN101453517B (en) * 2007-09-28 2013-08-07 华为技术有限公司 Noise generating apparatus and method
DE102008009718A1 (en) 2008-02-19 2009-08-20 Siemens Enterprise Communications Gmbh & Co. Kg Method and means for encoding background noise information
AR085895A1 (en) * 2011-02-14 2013-11-06 Fraunhofer Ges Forschung NOISE GENERATION IN AUDIO CODECS
KR101589038B1 (en) * 2014-03-14 2016-01-27 국방과학연구소 Method and device for generating random noise data preserving the correlation on privacy preserving time-series databases
EP2980790A1 (en) * 2014-07-28 2016-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for comfort noise generation mode selection
CN106328151B (en) * 2015-06-30 2020-01-31 芋头科技(杭州)有限公司 ring noise eliminating system and application method thereof
JP7385381B2 (en) * 2019-06-21 2023-11-22 株式会社日立製作所 Abnormal sound detection system, pseudo sound generation system, and pseudo sound generation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0786760A2 (en) * 1996-01-29 1997-07-30 Texas Instruments Incorporated Speech coding
US5706394A (en) * 1993-11-30 1998-01-06 At&T Telecommunications speech signal improvement by reduction of residual noise
US5893056A (en) * 1997-04-17 1999-04-06 Northern Telecom Limited Methods and apparatus for generating noise signals from speech signals

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1120775A1 (en) * 1999-06-15 2001-08-01 Matsushita Electric Industrial Co., Ltd. Noise signal encoder and voice signal encoder
EP1120775A4 (en) * 1999-06-15 2001-09-26 Matsushita Electric Ind Co Ltd Noise signal encoder and voice signal encoder
SG102694A1 (en) * 2002-09-06 2004-03-26 Building And Construction Auth Facade integrity testing apparatus and method
EP2202725A1 (en) * 2007-09-28 2010-06-30 Huawei Technologies Co., Ltd. Apparatus and method for noise generation
EP2202725A4 (en) * 2007-09-28 2010-09-22 Huawei Tech Co Ltd Apparatus and method for noise generation
JP2010540992A (en) * 2007-09-28 2010-12-24 華為技術有限公司 Noise generating apparatus and method
US8296132B2 (en) 2007-09-28 2012-10-23 Huawei Technologies Co., Ltd. Apparatus and method for comfort noise generation
US7890322B2 (en) 2008-03-20 2011-02-15 Huawei Technologies Co., Ltd. Method and apparatus for speech signal processing
US8380497B2 (en) 2008-10-15 2013-02-19 Qualcomm Incorporated Methods and apparatus for noise estimation
US9037457B2 (en) 2011-02-14 2015-05-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio codec supporting time-domain and frequency-domain coding modes
US9047859B2 (en) 2011-02-14 2015-06-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for encoding and decoding an audio signal using an aligned look-ahead portion
US9153236B2 (en) 2011-02-14 2015-10-06 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio codec using noise synthesis during inactive phases
US9384739B2 (en) 2011-02-14 2016-07-05 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for error concealment in low-delay unified speech and audio coding
US9536530B2 (en) 2011-02-14 2017-01-03 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Information signal representation using lapped transform
US9583110B2 (en) 2011-02-14 2017-02-28 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for processing a decoded audio signal in a spectral domain
US9595262B2 (en) 2011-02-14 2017-03-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Linear prediction based coding scheme using spectral domain noise shaping
US9595263B2 (en) 2011-02-14 2017-03-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoding and decoding of pulse positions of tracks of an audio signal
US9620129B2 (en) 2011-02-14 2017-04-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for coding a portion of an audio signal using a transient detection and a quality result

Also Published As

Publication number Publication date
MY133505A (en) 2007-11-30
AU5032000A (en) 2000-12-28
CN1367918A (en) 2002-09-04
JP2003501925A (en) 2003-01-14
CN1145928C (en) 2004-04-14
DE10084675T1 (en) 2002-06-06

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
ENP Entry into the national phase

Ref document number: 2001 502113

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 008112266

Country of ref document: CN

RET De translation (de og part 6b)

Ref document number: 10084675

Country of ref document: DE

Date of ref document: 20020606

WWE Wipo information: entry into national phase

Ref document number: 10084675

Country of ref document: DE

122 Ep: pct application non-entry in european phase