EP1287521A1 - Perceptual spectral weighting of frequency bands for adaptive noise cancellation


Info

Publication number
EP1287521A1
Authority
EP
European Patent Office
Prior art keywords
speech
power
weighting
values
signal
Legal status
Withdrawn
Application number
EP01918328A
Other languages
German (de)
French (fr)
Other versions
EP1287521A4 (en)
Inventor
Ravi Chandran
Bruce E. Dunne
Daniel J. Marchok
Current Assignee
Coriant Operations Inc
Original Assignee
Tellabs Operations Inc
Application filed by Tellabs Operations Inc
Publication of EP1287521A1
Publication of EP1287521A4


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition

Definitions

  • This invention relates to communication system noise cancellation techniques, and more particularly relates to weighting calculations used in such techniques
  • the need for speech quality enhancement in single-channel speech communication systems has increased in importance, especially due to the tremendous growth in cellular telephony. Cellular telephones are often operated in the presence of high levels of environmental background noise, such as in moving vehicles. Such high levels of noise cause significant degradation of the speech quality at the far-end receiver.
  • speech enhancement techniques may be employed to improve the quality of the received speech so as to increase customer satisfaction and encourage longer talk times
  • FIG. 1A shows an example of a typical prior noise suppression system that uses spectral subtraction.
  • a spectral decomposition of the input noisy speech-containing signal is first performed using the Filter Bank.
  • the Filter Bank may be a bank of bandpass filters (such as in reference [1], which is identified at the end of the description of the preferred embodiments).
  • the Filter Bank decomposes the signal into separate frequency bands. For each band, power measurements are performed and continuously updated over time in the Noisy Signal Power & Noise Power Estimation block.
  • SNR: signal-to-noise ratio
  • the Voice Activity Detector is used to distinguish periods of speech activity from periods of silence.
  • the noise power in each band is updated primarily during silence while the noisy signal power is tracked at all times.
  • a gain (attenuation) factor is computed based on the SNR of the band and is used to attenuate the signal in the band.
  • each frequency band of the noisy input speech signal is attenuated based on its SNR.
  • Figure 1B illustrates another, more sophisticated prior approach that uses an overall SNR level in addition to the individual SNR values to compute the gain factors for each band.
  • the overall SNR is estimated in the Overall SNR Estimation block.
  • the gain factor computations for each band are performed in the Gain Computation block.
  • the attenuation of the signals in different bands is accomplished by multiplying the signal in each band by the corresponding gain factor in the Gain Multiplication block.
  • Low SNR bands are attenuated more than the high SNR bands.
  • the amount of attenuation is also greater if the overall SNR is low.
  • the signals in the different bands are recombined into a single, clean output signal. The resulting output signal will have an improved overall perceived quality.
  • the decomposition of the input noisy speech-containing signal can also be performed using Fourier transform techniques or wavelet transform techniques.
  • Figure 2 shows the use of discrete Fourier transform techniques (shown as the Windowing & FFT block).
  • a block of input samples is transformed to the frequency domain.
  • the magnitudes of the complex frequency domain elements are attenuated based on the spectral subtraction principles described earlier.
  • the phases of the complex frequency domain elements are left unchanged.
  • the complex frequency domain elements are then transformed back to the time domain via an inverse discrete Fourier transform in the IFFT block, producing the output signal.
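The Figure 2 processing chain can be sketched for a single block as follows. This is a minimal illustration, not the method of any cited system: it assumes NumPy, a Hann window, a simple power-subtraction gain with a 0.05 floor, and a per-bin noise power estimate supplied by the caller. The function name `spectral_subtract_frame` is hypothetical, and the overlap-add needed to rebuild a continuous signal from successive windowed blocks is omitted.

```python
import numpy as np


def spectral_subtract_frame(frame, noise_psd, floor=0.05):
    """Windowing & FFT, magnitude attenuation, IFFT for one block.

    frame     : 1-D array of time-domain samples (one block)
    noise_psd : per-bin noise power estimate, length len(frame) // 2 + 1
    floor     : minimum gain (assumed value)
    """
    window = np.hanning(len(frame))
    spectrum = np.fft.rfft(frame * window)          # Windowing & FFT block
    power = np.abs(spectrum) ** 2
    # Assumed power-subtraction rule; zero-power bins are handled below.
    with np.errstate(divide="ignore", invalid="ignore"):
        gain = np.sqrt(np.clip(1.0 - noise_psd / power, 0.0, 1.0))
    gain = np.where(power > 0, gain, 0.0)
    gain = np.maximum(gain, floor)
    # A real, non-negative gain rescales each bin's magnitude; phase is kept.
    return np.fft.irfft(gain * spectrum, n=len(frame))  # IFFT block
```

Because the gain is real and non-negative, multiplying each complex bin by it changes only the magnitude, which is the "attenuate magnitude, keep phase" rule described above.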
  • wavelet transform techniques may be used for decomposing the input signal.
  • a Voice Activity Detector is part of many noise suppression systems. Generally, the power of the input signal is compared to a variable threshold level. Whenever the threshold is exceeded, speech is assumed to be present. Otherwise, the signal is assumed to contain only background noise. Such two-state voice activity detectors do not perform robustly under adverse conditions such as in cellular telephony environments. An example of a voice activity detector is described in reference [5].
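A two-state detector of the kind just described can be sketched in a few lines of plain Python. The "variable threshold" here is an assumed multiple (`margin`) of a running noise-floor estimate that drops immediately and creeps upward slowly; the rule, the function name and the parameter values are illustrative choices, not taken from this patent or from reference [5].

```python
def two_state_vad(powers, margin=2.0, rise=1.001):
    """Return one boolean per power sample: True = speech, False = noise.

    The threshold is margin * noise_floor. The noise floor follows the
    power down at once and rises slowly (factor `rise`) only while the
    decision is "noise". Both parameters are illustrative assumptions.
    """
    decisions = []
    noise_floor = None
    for p in powers:
        if noise_floor is None:
            noise_floor = p                      # initialise on first sample
        speech = p > margin * noise_floor        # compare with variable threshold
        decisions.append(speech)
        if p < noise_floor:
            noise_floor = p                      # fast downward tracking
        elif not speech:
            noise_floor = min(p, noise_floor * rise)  # slow upward tracking
    return decisions
```

A single burst above the threshold flips the decision at once, and there are only two outcomes, which illustrates why such detectors can be fragile under cellular conditions.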
  • Various implementations of noise suppression systems utilizing spectral subtraction differ mainly in the methods used for power estimation, gain factor determination, spectral decomposition of the input signal and voice activity detection. A broad overview of spectral subtraction techniques can be found in reference [3].
  • Several other approaches to speech enhancement, as well as spectral subtraction, are reviewed in reference [4].
  • Perceptual spectral weighting can improve the performance of some adaptive noise cancellation systems.
  • deficiencies in weighting functions have limited the effectiveness of known noise cancellation systems.
  • This invention addresses and provides one solution for such problems.
BRIEF SUMMARY OF THE INVENTION
  • the preferred embodiment is useful in a communication system for processing a communication signal including a speech component derived from speech and a noise component derived from noise.
  • the quality of the communication signal can be enhanced by dividing the communication signal into a plurality of frequency band signals representing the communication signal in a plurality of frequency bands.
  • the dividing may be accomplished with a filter or a calculator employing, for example, a Fourier transform.
  • a control signal is generated in response to the speech component.
  • the control signal indicates one or more characteristics of the frequency distribution of the speech component corresponding to at least some of the frequency bands.
  • Weighting values are assigned to the frequency band signals in response to the values of the control signal.
  • the frequency band signals are altered in response to the weighting values to generate weighted frequency band signals.
  • the weighted frequency band signals are combined to generate a communication signal with enhanced quality.
  • the foregoing signal generation and manipulation of signals and values preferably is accomplished with a calculator.
  • Figures 1A and 1B are schematic block diagrams of known noise cancellation systems.
  • Figure 2 is a schematic block diagram of another form of a known noise cancellation system.
  • Figure 3 is a functional and schematic block diagram illustrating a preferred form of adaptive noise cancellation system made in accordance with the invention.
  • Figure 4 is a schematic block diagram illustrating one embodiment of the invention implemented by a digital signal processor.
  • Figure 5 is a graph of relative noise ratio versus weight illustrating a preferred assignment of weight for various ranges of values of relative noise ratios.
  • Figure 6 is a graph plotting power versus Hz illustrating a typical power spectral density of background noise recorded from a cellular telephone in a moving vehicle.
  • Figure 7 is a curve plotting Hz versus weight obtained from a preferred form of adaptive weighting function in accordance with the invention.
  • Figure 8 is a graph plotting Hz versus weight for a family of weighting curves calculated according to a preferred embodiment of the invention.
  • Figure 9 is a graph plotting Hz versus decibels of the broad spectral shape of a typical voiced speech segment.
  • Figure 10 is a graph plotting Hz versus decibels of the broad spectral shape of a typical unvoiced speech segment.
  • the preferred form of ANC system shown in Figure 3 is robust under adverse conditions often present in cellular telephony and packet voice networks. Such adverse conditions include signal dropouts and fast changing background noise conditions with wide dynamic ranges.
  • the Figure 3 embodiment focuses on attaining high perceptual quality in the processed speech signal under a wide variety of such channel impairments.
  • SPM: Speech Presence Measure
  • the SPM is capable of detecting signal dropouts as well as new environments. Dropouts are temporary losses of the signal that occur commonly in cellular telephony and in voice over packet networks. New environment detection is the ability to detect the start of new calls as well as sudden changes in the background noise environment of an ongoing call.
  • the SPM can be beneficial to any noise reduction function, including the preferred embodiment of this invention.
  • Accurate noisy signal and noise power measures, which are performed for each frequency band, improve the performance of the preferred embodiment.
  • the measurement for each band is optimized based on its frequency and the state information from the SPM.
  • the frequency dependence is due to the optimization of power measurement time constants based on the statistical distribution of power across the spectrum in typical speech and environmental background noise.
  • this spectrally based optimization of the power measures has taken into consideration the non-linear nature of the human auditory system.
  • the SPM state information provides additional information for the optimization of the time constants as well as ensuring stability and speed of the power measurements under adverse conditions. For instance, the indication of a new environment by the SPM allows the power measures to react quickly to the new environment.
  • weighting functions are based on (1) the overall noise-to-signal ratio (NSR), (2) the relative noise ratio, and (3) a perceptual spectral weighting model.
  • the first function is based on the fact that over-suppression under heavier overall noise conditions provides better perceived quality.
  • the second function utilizes the noise contribution of a band relative to the overall noise to appropriately weight the band, hence providing a fine structure to the spectral weighting.
  • the third weighting function is based on a model of the power-frequency relationship in typical environmental background noise. The power and frequency are approximately inversely related, from which the name of the model is derived.
  • the inverse spectral weighting model parameters can be adapted to match the actual environment of an ongoing call.
  • the weights are conveniently applied to the NSR values computed for each frequency band, although such weighting could just as well be applied to other parameters with appropriate modifications.
  • because the weighting functions are independent, some or all of them can be utilized jointly.
  • the preferred embodiment preserves the natural spectral shape of the speech signal, which is important to perceived speech quality. This is attained by careful spectrally interdependent gain adjustment achieved through the attenuation factors.
  • An additional advantage of such spectrally interdependent gain adjustment is the variance reduction of the attenuation factors.
  • a preferred form of adaptive noise cancellation system 10 comprises an input voice channel 20 transmitting a communication signal comprising a plurality of frequency bands derived from speech and noise to an input terminal 22.
  • a speech signal component of the communication signal is due to speech and a noise signal component of the communication signal is due to noise.
  • a filter function 50 filters the communication signal into a plurality of frequency band signals on a signal path 51.
  • a DTMF tone detection function 60 and a speech presence measure function 70 also receive the communication signal on input channel 20.
  • the frequency band signals on path 51 are processed by a noisy signal power and noise power estimation function 80 to produce various forms of power signals.
  • the power signals provide inputs to a perceptual spectral weighting function 90, a relative noise ratio based weighting function 100 and an overall noise to signal ratio based weighting function 110.
  • Functions 90, 100 and 110 also receive inputs from speech presence measure function 70, which is an improved voice activity detector.
  • Functions 90, 100 and 110 generate preferred forms of weighting signals having weighting factors for each of the frequency bands generated by filter function 50.
  • the weighting signals provide inputs to a noise to signal ratio computation and weighting function 120 which multiplies the weighting factors from functions 90, 100 and 110 for each frequency band together and computes an NSR value for each frequency band signal generated by the filter function 50.
  • Some of the power signals calculated by function 80 also provide inputs to function 120 for calculating the NSR value.
  • a gain computation and interdependent gain adjustment function 130 calculates preferred forms of initial gain signals and preferred forms of modified gain signals with initial and modified gain values for each of the frequency bands and modifies the initial gain values for each frequency band by, for example, smoothing so as to reduce the variance of the gain.
  • the value of the modified gain signal for each frequency band generated by function 130 is multiplied by the value of every sample of the frequency band signal in a gain multiplication function 140 to generate preferred forms of weighted frequency band signals.
  • the weighted frequency band signals are summed in a combiner function 160 to generate a communication signal which is transmitted through an output terminal 172 to a channel 170 with enhanced quality.
  • a DTMF tone extension or regeneration function 150 also can place a DTMF tone on channel 170 through the operation of combiner function 160.
  • the function blocks shown in Figure 3 may be implemented by a variety of well known calculators, including one or more digital signal processors (DSP) including a program memory storing programs which are executed to perform the functions associated with the blocks (described later in more detail) and a data memory for storing the variables and other data described in connection with the blocks.
  • Figure 4 illustrates a calculator in the form of a digital signal processor 12 which communicates with a memory 14 over a bus 16.
  • Processor 12 performs each of the functions identified in connection with the blocks of Figure 3.
  • any of the function blocks may be implemented by dedicated hardware implemented by application specific integrated circuits (ASICs), including memory, which are well known in the art.
  • Figure 3 also illustrates an ANC 10 comprising a separate ASIC for each block capable of performing the function indicated by the block.
Filtering
  • the noisy speech-containing input signal on channel 20 occupies a 4kHz bandwidth.
  • This communication signal may be spectrally decomposed by filter 50 using a filter bank or other means for dividing the communication signal into a plurality of frequency band signals.
  • the filter function could be implemented with block-processing methods, such as a Fast Fourier Transform (FFT).
  • the resulting frequency band signals typically represent a magnitude value (or its square) and a phase value.
  • the techniques disclosed in this specification typically are applied to the magnitude values of the frequency band signals.
  • Filter 50 decomposes the input signal into N frequency band signals representing N frequency bands on path 51.
  • the input to filter 50 will be denoted x(n) while the output of the k-th filter
  • the input, x(n), to filter 50 is high-pass filtered to remove DC components by
  • the gain (or attenuation) factor for the k-th frequency band is computed by function 130 once every T samples as
  • a suitable value for T is 10 when the sampling rate is 8kHz.
  • the gain factor will range between a small positive floor value and 1 because the weighted NSR values are limited to lie in the range [0, 1 minus that floor]. Setting the lower limit of the gain to this floor reduces the effects of "musical noise" (described in reference [2]) and permits limited background signal transparency. In the preferred embodiment, the floor is set to 0.05.
  • $W_k(n)$ is used for over-suppression and under-suppression purposes of the
  • the overall weighting factor is computed by function 120 as
  • $u_k(n)$ is the weight factor or value based on overall NSR as calculated by
  • $w_k(n)$ is the weight factor or value based on the relative noise ratio
  • function 140 by multiplying $x_k(n)$ by its corresponding gain factor, $G_k(n)$, every
  • Combiner 160 sums the resulting attenuated signals to generate the enhanced output signal on channel 170.
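The gain path through functions 120, 130 and 140 and combiner 160 can be sketched as below. The overall-weight and gain equations are not reproduced in this text, so the sketch assumes (a) that the overall weight is the product of the three weighting factors, as the description of function 120 states, (b) that the gain is one minus the weighted NSR, which is consistent with the stated gain range and NSR limit, and (c) a 0.05 floor as in the preferred embodiment. The name `v` for the perceptual spectral weight and both function names are hypothetical. With T = 10 at 8 kHz, the gains would be recomputed every 1.25 ms and held in between.

```python
def band_gains(nsr, u, w, v, floor=0.05):
    """Per-band gains G_k from NSR_k and three weighting factors.

    Hedged reconstruction (not the patent's equations):
      * overall weight W_k = u_k * w_k * v_k;
      * weighted NSR W_k * NSR_k is clipped to [0, 1 - floor];
      * G_k = 1 - clipped weighted NSR, so G_k lies in [floor, 1].
    """
    gains = []
    for nsr_k, u_k, w_k, v_k in zip(nsr, u, w, v):
        weighted = u_k * w_k * v_k * nsr_k
        weighted = min(max(weighted, 0.0), 1.0 - floor)
        gains.append(1.0 - weighted)
    return gains


def apply_gains_and_combine(band_samples, gains):
    """Gain multiplication (x_k(n) * G_k) followed by the combiner sum."""
    return sum(g * x for g, x in zip(gains, band_samples))
```

The clipping step is what guarantees that even a band with a very large weighted NSR keeps the floor gain rather than being muted completely.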
  • noisy signal power and noise power estimation function 80 calculates power estimates and generates preferred forms of corresponding power band signals having power band values, as identified in Table 1 below.
  • the power, P(n) at sample n, of a discrete-time signal u(n) is estimated approximately by either (a) lowpass filtering the full-wave rectified signal or (b) lowpass filtering an even power of the signal, such as the square of the signal.
  • a first order IIR filter can be used for the lowpass filter for both cases as follows:
  • the lowpass filtering of the full-wave rectified signal or an even power of a signal is an averaging process.
  • the power estimation (e.g., averaging) has an effective time window or time period during which the filter coefficients are large, whereas outside this window the coefficients are close to zero.
  • the coefficients of the lowpass filter determine the size of this window or time period.
  • power estimation (e.g., averaging) over different effective window sizes or time periods can be achieved by using different filter coefficients.
  • when the rate of averaging is said to be increased, it is meant that a shorter time period is used.
  • the power estimates then react more quickly to the newer samples and "forget" the effect of older samples more readily.
  • when the rate of averaging is said to be reduced, it is meant that a longer time period is used.
  • the first order IIR filter has the following transfer function:
  • the filter coefficient is a decay constant.
  • the decay constant also represents how fast the old power value is forgotten and how quickly the power of the newer input samples is incorporated.
  • larger values of the decay constant result in longer effective averaging windows.
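The first order IIR power estimator can be sketched as follows. Since the transfer function itself is not reproduced here, the sketch assumes the common unity-DC-gain form P(n) = beta * P(n-1) + (1 - beta) * e(n), where `beta` (a name chosen for this sketch) is the decay constant and e(n) is either |u(n)| or u(n)^2. The effective averaging window is then roughly 1 / (1 - beta) samples, so larger values give longer windows, matching the text.

```python
def power_estimate(samples, beta, mode="square"):
    """First order IIR lowpass power estimate of a sample sequence.

    P(n) = beta * P(n-1) + (1 - beta) * e(n)  (assumed form), where
    e(n) = |u(n)| for mode="rectify" or u(n)**2 for mode="square".
    Returns the estimate after every sample.
    """
    p = 0.0
    out = []
    for u in samples:
        e = abs(u) if mode == "rectify" else u * u   # rectified or squared input
        p = beta * p + (1.0 - beta) * e             # leaky average
        out.append(p)
    return out
```

With beta = 0.99 the window is about 100 samples (12.5 ms at 8 kHz); with beta = 0.5 it is about 2 samples, so the estimate reacts almost immediately to new input.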
  • Such first order lowpass IIR filters may be used for estimation of the various power measures listed in Table 1 below:
  • Function 80 generates a signal for each of the foregoing variables.
  • Each of the signals in Table 1 is calculated using the estimations described in this Power Estimation section.
  • the Speech Presence Measure, which will be discussed later, utilizes short-term and long-term power measures in the first formant region. To perform the first formant power measurements, the input signal, x(n), is lowpass
  • the filter has a cut-off frequency at 850 Hz and has coefficients
  • time constants are examples of the parameters used to analyze a communication signal and enhance its quality.
  • $\mathrm{NSR}_{overall}(n)$ at sample n is defined as
  • the NSR for the k-th frequency band may be computed as
  • Speech presence measure (SPM) 70 may utilize any known DTMF detection method if DTMF tone extension or regeneration functions 150 are to be performed.
  • SPM 70 primarily performs a measure of the likelihood that the signal activity is due to the presence of speech. This can be quantized to a discrete number of decision levels depending on the application. In the preferred embodiment, we use five levels.
  • the SPM performs its decision based on the DTMF flag and the LEVEL value.
  • the SPM also outputs two flags or signals, DROPOUT and NEWENV, which will be described in the following sections.
  • the novel multi-level decisions made by the SPM are achieved by using a speech likelihood related comparison signal and multiple variable thresholds.
  • we derive a speech likelihood related comparison signal by comparing the values of the first formant short-term noisy signal power estimate, $P_{1st,ST}(n)$, and the first formant long-term power estimate, $P_{1st,LT}(n)$. Multiple comparisons are performed using expressions involving $P_{1st,ST}(n)$ and $P_{1st,LT}(n)$, as given in the preferred embodiment of equation (11) below.
  • the result of these comparisons is used to update the speech likelihood related comparison signal.
  • the speech likelihood related comparison signal is a
  • the hangover counter, $h_{var}$, can be assigned a variable hangover period that is
  • the inequalities of (11) determine whether $P_{1st,ST}(n)$ exceeds $P_{1st,LT}(n)$ by more
  • $h_{var}$ represents a preferred form of comparison signal resulting from the comparisons defined in (11) and having a value representing differing degrees of likelihood that a portion of the input communication signal results from at least some speech.
  • the hangover period length can be considered as a measure that is directly proportional to the probability of speech presence. Since the SPM decision is required to reflect the likelihood that the signal activity is due to the presence of speech, and the SPM decision is based partly on the LEVEL value according to Table 1, we determine the value for LEVEL based on the hangover counter as tabulated below.
  • SPM 70 generates a preferred form of a speech likelihood signal having values corresponding to LEVELs 0-3.
  • LEVEL depends indirectly on the power measures and represents varying likelihood that the input communication signal results from at least some speech. Basing LEVEL on the hangover counter is advantageous because a certain amount of hysteresis is provided. That is, once the count enters one of the ranges defined in the preceding table, the count is constrained to stay in the range for variable periods of time. This hysteresis prevents the LEVEL value, and hence the SPM decision, from changing too often due to momentary changes in the signal power. If LEVEL were based solely on the power measures, the SPM decision would tend to flutter between adjacent levels when the power measures lie near decision boundaries.
  • a dropout is a situation where the input signal power has a defined attribute, such as suddenly dropping to a very low level or even zero for short durations of time.
  • dropouts are often experienced, especially in a cellular telephony environment. For example, dropouts can occur due to loss of speech frames in cellular telephony or due to the user suddenly moving from a noisy environment to a quiet environment. During dropouts, the ANC system operates differently, as will be explained later.
  • Equation (8) shows the use of a DROPOUT signal in the long-term (noise) power measure. During dropouts, the adaptation of the long-term power for the SPM is stopped or slowed significantly. This prevents the long-term power measure from being reduced drastically during dropouts, which could potentially lead to incorrect speech presence measures later.
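The hysteresis argument above can be illustrated with a small hangover-counter sketch. Equation (11) and the LEVEL table are not reproduced in this text, so the power-ratio tests, hangover periods and count ranges below (`RATIOS_AND_HANGOVERS`, `LEVEL_BOUNDS`) are hypothetical values chosen only to show the mechanism: a strong first-formant power excess loads a long hangover, and LEVEL then steps down only as the counter drains through each range.

```python
# Hypothetical values; equation (11) and the LEVEL table are not shown here.
RATIOS_AND_HANGOVERS = [(8.0, 800), (4.0, 400), (2.0, 200)]  # (P_ST/P_LT ratio, hangover)
LEVEL_BOUNDS = [(400, 3), (200, 2), (1, 1)]                  # (minimum count, LEVEL)


def update_hangover(h, p_st, p_lt):
    """Reload the counter when the short-term first-formant power exceeds
    the long-term power by a large enough factor, otherwise count down.
    A reload never shortens a longer pending hangover."""
    for ratio, period in RATIOS_AND_HANGOVERS:
        if p_st > ratio * p_lt:
            return max(h, period)
    return max(h - 1, 0)


def level_from_count(h):
    """Map the hangover count to a LEVEL value (0 = silence)."""
    for min_count, level in LEVEL_BOUNDS:
        if h >= min_count:
            return level
    return 0
```

After a burst loads the counter to 800, LEVEL stays at 3 for 400 further samples even if the power drops immediately, which is the flutter-suppressing behaviour described above.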
  • the SPM dropout detection utilizes the DROPOUT signal or flag and a
  • the counter is updated as follows every sample time.
  • the attribute of $c_{dropout}$ determines at least in part the
  • the dropout comparison factor is 0.2.
  • the background noise environment would not be known by ANC system 10.
  • the background noise environment can also change suddenly when the user moves from a noisy environment to a quieter environment, e.g., moving from a busy street to an indoor environment with windows and doors closed. In both these cases, it would be advantageous to adapt the noise power measures quickly for a short period of time.
  • the SPM outputs a signal or flag called NEWENV to the ANC system
  • the detection of a new environment at the beginning of a call will depend on the system under question. Usually, there is some form of indication that a new call has been initiated. For instance, when there is no call on a particular line in some networks, an idle code may be transmitted. In such systems, a new call can be detected by checking for the absence of idle codes. Thus, the method for inferring that a new call has begun will depend on the particular system.
  • a pitch estimator is used to monitor whether voiced speech is present in the input signal. If voiced speech is present, the pitch period (i.e., the inverse of pitch frequency) would be relatively steady over a period of about 20 ms. If only background noise is present, then the pitch period would change in a random manner. If a cellular handset is moved from a quiet room to a noisy outdoor environment, the input signal would suddenly be much louder and may be incorrectly detected as speech. The pitch detector can be used to avoid such incorrect detection and to set the new environment signal so that the new noise environment can be quickly measured.
  • any of the numerous known pitch period estimation devices may be used, such as device 74 shown in Fig. 3.
  • the following method is used. Denoting $K(n-T)$ as the pitch period estimate from $T$ samples ago and $K(n)$ as the current pitch period estimate, if $|K(n) - K(n-40)| > 3$, $|K(n-40) - K(n-80)| > 3$, and $|K(n-80) - K(n-120)| > 3$, then the pitch period is not steady and it is unlikely that the input signal contains voiced speech. If these conditions are true and yet the SPM says that LEVEL > 1, which normally implies that significant speech is present, then it can be inferred that a sudden increase in the background noise has occurred.
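The pitch-steadiness test above translates directly into code. The sketch assumes the pitch period estimates are stored in a list indexed by sample number (`pitch`, a hypothetical name) and that at least 120 past samples are available; at the 8 kHz rate used elsewhere in this description, the 40-sample lags are 5 ms apart. The function names are hypothetical.

```python
def pitch_unsteady(pitch, n, tol=3):
    """True when the pitch period changed by more than `tol` samples
    across each of the three 40-sample steps ending at sample n."""
    return (abs(pitch[n] - pitch[n - 40]) > tol
            and abs(pitch[n - 40] - pitch[n - 80]) > tol
            and abs(pitch[n - 80] - pitch[n - 120]) > tol)


def sudden_noise_increase(pitch, n, level):
    """Infer a sudden background-noise increase: the SPM reports
    LEVEL > 1 even though the pitch period is not steady."""
    return level > 1 and pitch_unsteady(pitch, n)
```

A true result is the condition under which the new environment signal would be set so that the noise power measures can adapt quickly.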
  • the following table specifies a method of updating NEWENV and $c_{newenv}$.
  • the NEWENV flag is set to 1 for a period of time specified by
  • the NEWENV flag is set to 1 in response to
  • a suitable value for $c_{newenv,max}$ is 2000, which corresponds to 0.25 seconds at an 8 kHz sampling rate
  • the multi-level SPM decision and the flags DROPOUT and NEWENV are generated on path 72 by SPM 70. With these signals, the ANC system is able to perform noise cancellation more effectively under adverse conditions. Furthermore, as previously described, the power measurement function has been significantly enhanced compared to prior known systems. Additionally, the three independent weighting functions carried out by functions 90, 100 and 110 can
  • SPM 70 is indicating that there is a new environment due to either a new call or that it is a post-dropout environment. If there is no speech activity, i.e. the SPM indicates that there is silence, then it would be advantageous for the ANC system to measure the noise spectrum quickly. This quick reaction allows a shorter adaptation time for the ANC system to a new noise
  • the time constants of the noise and signal power measures for each band k are based on
  • the time constants are also based on the multi-level decisions of the SPM
  • there are four possible SPM decisions (i.e., Silence, Low Speech, Medium Speech, High Speech).
  • when the SPM decision is Silence, it would be beneficial to speed up the tracking of the noise in all the bands.
  • when the SPM decision is Low Speech, the likelihood of speech is higher and the noise power measurements are slowed down accordingly. The likelihood of speech is considered too high in the remaining speech states, and thus the noise power measurements are turned off in these states.
  • the time constants for the signal power measurements are modified so as to slow down the tracking when the likelihood of speech is low. This reduces the variance of the signal power measures during low speech levels and silent periods. This is especially beneficial during silent periods, as it prevents short-duration noise spikes from causing the gain factors to rise.
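The dependence of the band power measures on the SPM decision can be sketched as a lookup of decay constants feeding a first order averaging update. All numeric values, the unity-DC-gain update form, and the use of a single decay constant per measure are hypothetical; the description only fixes the direction of each change: faster noise tracking in Silence, slower in Low Speech, no noise update in the higher speech states, and slower noisy-signal tracking when speech is unlikely. The dictionary and function names are also hypothetical.

```python
# Hypothetical decay constants: closer to 1 means slower tracking.
NOISE_BETA = {
    "Silence": 0.99,         # speed up noise tracking
    "Low Speech": 0.999,     # slow noise tracking down
    "Medium Speech": None,   # noise measurement turned off
    "High Speech": None,     # noise measurement turned off
}
SIGNAL_BETA = {
    "Silence": 0.995,        # slower: lower variance, ignores short noise spikes
    "Low Speech": 0.995,
    "Medium Speech": 0.9,
    "High Speech": 0.9,
}


def update_band_powers(p_noise, p_signal, e, decision):
    """One-sample update of a band's noise and noisy-signal power,
    where e is the band's instantaneous power (e.g. squared sample)."""
    b_n = NOISE_BETA[decision]
    if b_n is not None:
        p_noise = b_n * p_noise + (1.0 - b_n) * e
    b_s = SIGNAL_BETA[decision]
    p_signal = b_s * p_signal + (1.0 - b_s) * e
    return p_noise, p_signal
```

Freezing the noise estimate (rather than merely slowing it) in the higher speech states keeps speech energy from leaking into the noise measure.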
  • over-suppression is achieved by weighting the NSR according
  • $u_k(n) = 0.5 + NSR_{overall}(n)$ (14)
  • a suitable update rate is once per 2T samples.
  • the relative noise ratio in a frequency band can be defined as
  • the goal is to assign a higher weight for a band when the ratio, R k (n) , for that
  • Function 80 (Figure 3) generates preferred forms of band power signals corresponding to the terms on the right side of equation (15), and function 100 generates preferred forms of weighting signals with weighting values corresponding to the term on the left side of equation (15).
  • Figure 6 shows the typical power spectral density of background noise recorded from a cellular telephone in a moving vehicle.
  • Typical environmental background noise has a power spectrum that corresponds to pink or brown noise.
  • Pink noise has power inversely proportional to the frequency.
  • Brown noise has power inversely proportional to the square of the frequency.
  • the weight, $w_f$, for a particular frequency, $f$, can be modeled as a function
  • This model has three parameters $\{b, f_0, c\}$
  • the Figure 7 curve varies monotonically with decreasing values of weight from 0 Hz to about 3000 Hz, and also varies monotonically with increasing values of weight from about 3000 Hz to about 4000 Hz.
  • the ideal weights, w k may be obtained as a function of the measured noise
  • the ideal weights are equal to the noise power measures normalized by the largest noise power measure.
  • the normalized power of a noise component in a particular frequency band is defined as the ratio of the power of the noise component in that frequency band to a function of some or all of the powers of the noise components in the frequency band or outside the frequency band. Equations (15) and (18) are examples of such normalized power of a noise component. If all the power values are zero, the ideal weight is set to unity. This ideal weight is actually an alternative definition of RNR.
  • the normalized power may be calculated according to (18). Accordingly, function 100 (Figure 3) may generate a preferred form of weighting signals having weighting values approximating equation (18).
  • the approximate model in (17) attempts to mimic the ideal weights computed
  • the iterations may be performed every sample time or slower, if desired, for economy.
  • the weights are adapted efficiently using a simpler adaptation technique for economical reasons. We fix the value of the weighting
  • the weighting values so that they vary monotonically between two frequencies separated by a factor of 2 (e.g., the weighting values vary monotonically between 1000-2000 Hz and/or between 1500-3000 Hz).
  • the determination of c n is performed by comparing the total noise power in
  • lowpass and highpass filter could be used to filter x(n) followed by
  • the min and max functions restrict $c_n$ to lie within [0.1, 1.0].
  • a curve such as Figure 7, could be stored as a weighting signal or table in memory 14 and used as static weighting values for each of the frequency band signals generated by filter 50.
  • the curve could vary monotonically, as previously explained, or could vary according to the estimated
  • the power spectral density shown in Figure 6 could be thought of as defining the spectral shape of the noise component of the communication signal received on channel 20.
  • the value of c is altered according to the spectral shape in
  • weighting values determined according to the spectral shape of the noise component of the communication signal on channel 20 are derived in part from the likelihood that the communication signal is derived at least in part from speech.
  • the weighting values could be determined from the overall background noise power.
  • equation (17) is determined by the value of $P_{BN}(n)$
  • the weighting values may vary in accordance with at least an approximation of one or more characteristics (e.g., spectral shape of noise or overall background power) of the noise signal component of the communication signal on channel 20.
  • the perceptual importance of different frequency bands changes depending on characteristics of the frequency distribution of the speech component of the communication signal being processed. Determining perceptual importance from such characteristics may be accomplished by a variety of methods. For example, the characteristics may be determined by the likelihood that a communication signal is derived from speech. As explained previously, this type of classification can be
  • the type of signal can be further classified by determining whether the speech is voiced or unvoiced.
  • Voiced speech results from vibration of vocal cords and is illustrated by utterance of a vowel sound.
  • Unvoiced speech does not require vibration of vocal cords and is illustrated by utterance of a consonant sound.
  • the actual implementation of the perceptual spectral weighting may be performed directly on the gain factors for the individual frequency bands.
  • Another alternative is to weight the power measures appropriately. In our preferred method, the weighting is incorporated into the NSR measures.
  • the PSW technique may be implemented independently or in any combination with the overall NSR based weighting and RNR based weighting methods.
  • the weights in the PSW technique are selected to vary between zero and one. Larger weights correspond to greater suppression.
  • the basic idea of PSW is to adapt the weighting curve in response to changes in the characteristics of the frequency distribution of at least some components of the communication signal on channel 20.
  • the weighting curve may be changed as the speech spectrum changes when the speech signal transitions from one type of communication signal to another, e.g., from voiced to unvoiced and vice versa.
  • the weighting curve may be adapted to changes in the speech component of the communication signal.
  • the regions that are most critical to perceived quality are weighted less so that they are suppressed less. However, if these perceptually important regions contain a significant amount of noise, then their weights will be adapted closer to one.
  • $v_k = b(k - k_0)^2 + c$ (30)
  • $v_k$ is the weight for frequency band $k$. In this method, we will vary only $k_0$
  • This weighting curve is generally U-shaped and has a minimum value of $c$ at
  • the lowest weight frequency band, $k_0$, is adapted based on the likelihood of
  • $k_0$ is allowed to be in the
  • midband frequencies are weighted less in general.
  • lowest weight frequency band $k_0$ is placed closer to 4000 Hz so that the mid to high
  • the lowest weight frequency band is varied with the speech likelihood related comparison signal as follows:
  • the minimum weight c could be fixed to a small value such as 0.25.
  • the regional NSR, $NSR_{regional}(k)$, is defined with respect to the minimum weight
  • the regional NSR is the ratio of the noise power to the noisy signal
  • when the regional NSR is -15 dB or lower, we set the minimum weight c to 0.25 (about -12 dB). As the regional NSR approaches its maximum value of 0 dB, the minimum weight is increased towards unity. This can be achieved by adapting the minimum weight c at sample time n as
  • $c(n) = \begin{cases} 0.25, & NSR_{overall}(n) \le 0.1778 \ (-15\,\text{dB}) \\ 0.912\,NSR_{overall}(n) + 0.088, & 0.1778 < NSR_{overall}(n) \le 1 \end{cases}$
  • the $v_k$ curves are plotted for a range of values of $c$ and $k_0$ in Figures 11-13 to
  • processor 12 generates a control signal from
  • the likelihood signal can also be used as a measure of whether the speech is voiced or unvoiced. Determining whether the speech is voiced or unvoiced can be accomplished by means other than the likelihood signal. Such means are known to those skilled in the field of communications.
  • the characteristics of the frequency distribution of the speech component of the channel 20 signal needed for PSW also can be determined from the output of pitch estimator 74.
  • the pitch estimate is used as a control signal which indicates the characteristics of the frequency distribution of the speech component of the channel 20 signal needed for PSW.
  • the pitch estimate, or more specifically the rate of change of the pitch, can be used to solve for $k_0$ in equation (32). A slow rate of change would correspond to smaller $k_0$ values, and vice versa.
  • the calculated weights for the different bands are based on an approximation of the broad spectral shape or envelope of the speech component of the communication signal on channel 20.
  • the calculated weighting curve has a generally inverse relationship to the broad spectral shape of the speech component of the channel 20 signal.
  • An example of such an inverse relationship is to calculate the weighting curve to be inversely proportional to the speech spectrum, such that when the broad spectral shape of the speech spectrum is multiplied by the weighting curve, the resulting broad spectral shape is approximately flat or constant at all frequencies in the frequency bands of interest. This is different from the standard spectral subtraction weighting which is based on the noise-to-signal ratio of individual bands.
  • In PSW, we take into consideration the entire speech signal (or a significant portion of it) to determine the weighting curve for all the frequency bands.
  • the weights are determined based only on the individual bands. Even in a spectral subtraction implementation such as in Figure 1B, only the overall SNR or NSR is considered, not the broad spectral shape.
  • the total power, P k (n) may be used to approximate the speech power
  • band power values together provide the broad spectral shape estimate or envelope estimate.
  • the number of band power values in the set will vary depending on the desired accuracy of the estimate. Smoothing these band power values with moving average techniques is also beneficial to remove jaggedness in the envelope estimate.
  • the perceptual weighting curve may be determined to be inversely proportional to the broad spectral shape
  • the weight for the $k$-th band, $v_k$, may be determined as
  • $v_k(n) = \gamma / P_k(n)$, where $\gamma$ is a predetermined value.
  • a set of speech power values such as a set of R (n) values, is used as a control signal
  • the variation of the power signals used for the estimate is reduced across the N frequency bands. For instance, the spectrum shape of the speech component of the channel 20 signal is made more nearly flat across the N frequency bands, and the variation in the spectrum shape is reduced.
  • a parametric technique in our preferred implementation, which also has the advantage that the weighting curve is always smooth across frequencies.
  • a parametric weighting curve, i.e., the weighting curve is formed from a few parameters that are adapted based on the spectral shape. The number of parameters is less than the number of weighting factors.
  • the parametric weighting function in our economical implementation is given by equation (30), which is a quadratic curve with three parameters.
  • a noise cancellation system will benefit from the implementation of only one or various combinations of the functions.
  • the bandpass filters of the filter bank used to separate the speech signal into different frequency band components have little overlap. Specifically, the magnitude frequency response of one filter does not significantly overlap the magnitude frequency response of any other filter in the filter bank. This is also usually true for discrete Fourier or fast Fourier transform based implementations. In such cases, we have discovered that improved noise cancellation can be achieved by interdependent gain adjustment. Such adjustment is effected by smoothing of the input signal spectrum and reduction in variance of gain factors across the frequency bands according to the techniques described below.
  • the splitting of the speech signal into different frequency bands and applying independently determined gain factors on each band can sometimes destroy the natural spectral shape of the speech signal. Smoothing the gain factors across the bands can help to preserve the natural spectral shape of the speech signal. Furthermore, it also reduces the variance of the gain factors.
  • the initial gain factors preferably are generated in the form of signals with initial gain values in function block 130 (Figure 3) according to equation (1).
  • the initial gain factors or values are modified using a weighted moving average
  • the gain factors corresponding to the low and high values of k must be handled slightly differently to prevent edge effects.
  • the initial gain factors are modified by recalculating equation (1) in function 130 to produce a preferred form of modified gain signals having modified gain values or factors. The modified gain factors are then used for gain multiplication by equation (3) in function block 140 (Figure 3).
  • the M k are the moving average coefficients tabulated below for our preferred
  • coefficients selected from the following ranges of values are in the range of 10 to 50 times the value of the sum of the other coefficients.
  • the coefficient 0.95 is in the range of 10 to 50 times the value of the sum of the other coefficients shown in each line of the preceding table. More specifically, the coefficient 0.95 is in the range from 0.90 to 0.98.
  • the coefficient 0.05 is in the range 0.02 to 0.09.
  • the gain for frequency band k depends on NSR k (n) which in turn
  • $G_k(n)$ is computed as a function of noise power and noisy signal power values from
  • G k (n) may be computed
  • Equations (1.1)-(1.4) all provide smoothing of the input signal spectrum and reduction in variance of the gain factors across the frequency bands. Each method has its own particular advantages and trade-offs.
  • the first method (1.1) is simply an alternative to smoothing the gains directly.
  • the method of (1.2) provides smoothing across the noise spectrum only while (1.3) provides smoothing across the noisy signal spectrum only.
  • Each method has the advantage that the average spectral shape of the corresponding signal is maintained. By performing the averaging in (1.2), sudden bursts of noise occurring in a particular band for very short periods do not adversely affect the estimate of the noise spectrum. Similarly, in method (1.3), the broad spectral shape of the speech spectrum, which is generally smooth, does not become too jagged in the noisy signal power estimates due to, for instance, changing pitch of the speaker.
  • the method of (1.4) combines the advantages of both (1.2) and (1.3).
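The PSW weighting pieces above (the quadratic curve of equation (30), the adapted minimum weight c, and the alternative inverse-envelope weights) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The scale factor `b` in equation (30) is not given in the text, so `psw_quadratic` hypothetically chooses it so that the farther edge band reaches a weight of 1. `min_weight` assumes c rises along a straight line from 0.25 at -15 dB (0.1778) to 1 at 0 dB. `psw_inverse` uses a hypothetical 3-band moving average for the envelope and sets the predetermined constant (called `gamma` here) to the smallest smoothed power, so the weights stay in (0, 1]. All function names are hypothetical.

```python
def min_weight(nsr_overall):
    """Minimum PSW weight c from the NSR (linear segment assumed, see lead-in)."""
    if nsr_overall <= 0.1778:  # -15 dB or lower
        return 0.25
    # Straight line through (0.1778, 0.25) and (1.0, 1.0), clipped at unity.
    return min(1.0, 0.912 * nsr_overall + 0.088)


def psw_quadratic(num_bands, k0, c):
    """Equation (30): v_k = b (k - k0)^2 + c, a U-shaped curve with minimum c at k0.

    b is a hypothetical choice that makes the farther edge band reach weight 1.
    """
    span = max(k0, num_bands - 1 - k0)
    b = (1.0 - c) / (span * span) if span > 0 else 0.0
    return [min(1.0, b * (k - k0) ** 2 + c) for k in range(num_bands)]


def psw_inverse(band_powers, taps=3):
    """Weights inversely proportional to a smoothed envelope: v_k = gamma / P_k.

    Assumes positive band powers. gamma is a hypothetical choice (the smallest
    smoothed power) so that the weights fall in (0, 1].
    """
    n = len(band_powers)
    half = taps // 2
    envelope = []
    for k in range(n):
        lo, hi = max(0, k - half), min(n, k + half + 1)
        # A moving average over neighbouring bands removes jaggedness in the envelope.
        envelope.append(sum(band_powers[lo:hi]) / (hi - lo))
    gamma = min(envelope)
    return [gamma / e for e in envelope]
```

For example, `psw_quadratic(5, 2, min_weight(0.1))` gives `[1.0, 0.4375, 0.25, 0.4375, 1.0]`. The middle band has the lowest weight, so it is suppressed least. In `psw_inverse`, bands with more speech power receive smaller weights, which matches the goal of flattening the broad spectral shape.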

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Noise Elimination (AREA)

Abstract

A communication system for processing a communication signal derived from speech and noise enhances the quality of the communication signal by providing a filter (50) dividing the communication signal into a plurality of frequency band signals representing the communication signal in a plurality of frequency bands. A calculator generates a likelihood signal having values representing the likelihoods that the communication signal is derived from speech (70). The calculator assigns weighting values (120) to the frequency band signals in response to the values of the likelihood signal. The calculator also alters the frequency band signals in response to the weighting values to generate weighted frequency band signals, and combines (160) the weighted frequency band signals to generate a communication signal with enhanced quality (170).

Description

TITLE OF INVENTION
PERCEPTUAL SPECTRAL WEIGHTING OF FREQUENCY BANDS FOR ADAPTIVE NOISE CANCELLATION
BACKGROUND OF THE INVENTION
This invention relates to communication system noise cancellation techniques, and more particularly relates to weighting calculations used in such techniques. The need for speech quality enhancement in single-channel speech communication systems has increased in importance, especially due to the tremendous growth in cellular telephony. Cellular telephones are often operated in the presence of high levels of environmental background noise, such as in moving vehicles. Such high levels of noise cause significant degradation of the speech quality at the far end receiver. In such circumstances, speech enhancement techniques may be employed to improve the quality of the received speech so as to increase customer satisfaction and encourage longer talk times.
Most noise suppression systems utilize some variation of spectral subtraction. Figure 1A shows an example of a typical prior noise suppression system that uses spectral subtraction. A spectral decomposition of the input noisy speech-containing signal is first performed using the Filter Bank. The Filter Bank may be a bank of bandpass filters (such as in reference [1], which is identified at the end of the description of the preferred embodiments). The Filter Bank decomposes the signal into separate frequency bands. For each band, power measurements are performed and continuously updated over time in the Noisy Signal Power & Noise Power Estimation block. These power measures are used to determine the signal-to-noise ratio (SNR) in each band. The Voice Activity Detector is used to distinguish periods of speech activity from periods of silence. The noise power in each band is updated primarily during silence, while the noisy signal power is tracked at all times. For each frequency band, a gain (attenuation) factor is computed based on the SNR of the band and is used to attenuate the signal in the band. Thus, each frequency band of the noisy input speech signal is attenuated based on its SNR. Figure 1B illustrates another, more sophisticated prior approach that uses an overall SNR level in addition to the individual SNR values to compute the gain factors for each band. (See also reference [2].) The overall SNR is estimated in the Overall SNR Estimation block. The gain factor computations for each band are performed in the Gain Computation block. The attenuation of the signals in different bands is accomplished by multiplying the signal in each band by the corresponding gain factor in the Gain Multiplication block. Low SNR bands are attenuated more than high SNR bands. The amount of attenuation is also greater if the overall SNR is low. After the attenuation process, the signals in the different bands are recombined into a single, clean output signal.
The resulting output signal will have an improved overall perceived quality.
The decomposition of the input noisy speech-containing signal can also be performed using Fourier transform techniques or wavelet transform techniques. Figure 2 shows the use of discrete Fourier transform techniques (shown as the Windowing & FFT block). Here a block of input samples is transformed to the frequency domain. The magnitudes of the complex frequency domain elements are attenuated based on the spectral subtraction principles described earlier. The phases of the complex frequency domain elements are left unchanged. The complex frequency domain elements are then transformed back to the time domain via an inverse discrete Fourier transform in the IFFT block, producing the output signal. Instead of Fourier transform techniques, wavelet transform techniques may be used for decomposing the input signal.
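The Figure 2 flow can be sketched in Python as follows. The sketch transforms one block, scales each bin's magnitude by a gain while keeping its phase, and transforms the result back. It is a sketch under assumptions, not the specific system in the figure. A naive DFT stands in for the FFT and IFFT blocks, and windowing and overlap-add are omitted. The per-bin gain rule `max(floor, 1 - N/|X|)` and the floor value are hypothetical illustrations of spectral subtraction, not the gain rule of any cited system.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT standing in for the FFT block."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT standing in for the IFFT block; keeps the real part."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract_block(block, noise_mag, floor=0.05):
    """Attenuate each bin's magnitude and leave its phase unchanged.

    noise_mag[k] is an estimated (non-negative) noise magnitude for bin k.
    The gain rule max(floor, 1 - noise/|X|) and the floor are assumptions.
    """
    spectrum = dft(block)
    attenuated = []
    for X_k, N_k in zip(spectrum, noise_mag):
        mag = abs(X_k)
        gain = 1.0 if mag == 0 else max(floor, 1.0 - N_k / mag)
        # A real, non-negative gain scales the magnitude but not the phase.
        attenuated.append(gain * X_k)
    return idft(attenuated)
```

With a zero noise estimate, every gain is 1 and the block passes through unchanged (up to rounding). With an overwhelming noise estimate, every bin falls to the floor, and the output is the input scaled by 0.05.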
A Voice Activity Detector is part of many noise suppression systems. Generally, the power of the input signal is compared to a variable threshold level. Whenever the threshold is exceeded, speech is assumed to be present. Otherwise, the signal is assumed to contain only background noise. Such two-state voice activity detectors do not perform robustly under adverse conditions such as in cellular telephony environments. An example of a voice activity detector is described in reference [5]. Various implementations of noise suppression systems utilizing spectral subtraction differ mainly in the methods used for power estimation, gain factor determination, spectral decomposition of the input signal and voice activity detection. A broad overview of spectral subtraction techniques can be found in reference [3]. Several other approaches to speech enhancement, as well as spectral subtraction, are overviewed in reference [4].
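A two-state detector of the kind described can be sketched as follows. The threshold rule and parameter values are assumptions for illustration: the threshold is a fixed margin above a noise floor that is re-estimated only on frames judged to be noise. The sketch also exposes a weakness of this design. A sudden, sustained rise in background noise is labeled speech indefinitely, because the floor is only updated on noise frames.

```python
def two_state_vad(powers, margin=2.0, alpha=0.9):
    """Return one boolean per frame power (True = speech assumed present).

    The threshold is `margin` times a running noise-floor estimate, which is
    updated only on frames classified as noise (hypothetical rule).
    """
    decisions = []
    noise_floor = powers[0] if powers else 0.0
    for p in powers:
        speech = p > margin * noise_floor
        decisions.append(speech)
        if not speech:
            # Exponential averaging of the floor during assumed silence.
            noise_floor = alpha * noise_floor + (1.0 - alpha) * p
    return decisions
```

For example, `two_state_vad([1.0, 1.0, 1.0, 10.0, 10.0, 1.0])` returns `[False, False, False, True, True, False]`.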
Perceptual spectral weighting can improve the performance of some adaptive noise cancellation systems. In the past, deficiencies in weighting functions have limited the effectiveness of known noise cancellation systems. This invention addresses and provides one solution for such problems.
BRIEF SUMMARY OF THE INVENTION
The preferred embodiment is useful in a communication system for processing a communication signal including a speech component derived from speech and a noise component derived from noise. In such an environment, the quality of the communication signal can be enhanced by dividing the communication signal into a plurality of frequency band signals representing the communication signal in a plurality of frequency bands. The dividing may be accomplished with a filter or a calculator employing, for example, a Fourier transform. A control signal is generated in response to the speech component. The control signal indicates one or more characteristics of the frequency distribution of the speech component corresponding to at least some of the frequency bands. Weighting values are assigned to the frequency band signals in response to the values of the control signal. The frequency band signals are altered in response to the weighting values to generate weighted frequency band signals. The weighted frequency band signals are combined to generate a communication signal with enhanced quality.
The foregoing signal generation and manipulation of signals and values preferably is accomplished with a calculator.
By using the foregoing techniques, a perceptual weighting function needed to improve communication signal quality can be generated with a degree of ease and accuracy unattained by the known prior techniques.
BRIEF DESCRIPTION OF THE DRAWINGS
Figures 1A and 1B are schematic block diagrams of known noise cancellation systems. Figure 2 is a schematic block diagram of another form of a known noise cancellation system.
Figure 3 is a functional and schematic block diagram illustrating a preferred form of adaptive noise cancellation system made in accordance with the invention. Figure 4 is a schematic block diagram illustrating one embodiment of the invention implemented by a digital signal processor.
Figure 5 is a graph of relative noise ratio versus weight illustrating a preferred assignment of weight for various ranges of values of relative noise ratios. Figure 6 is a graph plotting power versus Hz illustrating a typical power spectral density of background noise recorded from a cellular telephone in a moving vehicle.
Figure 7 is a curve plotting Hz versus weight obtained from a preferred form of adaptive weighting function in accordance with the invention. Figure 8 is a graph plotting Hz versus weight for a family of weighting curves calculated according to a preferred embodiment of the invention.
Figure 9 is a graph plotting Hz versus decibels of the broad spectral shape of a typical voiced speech segment.
Figure 10 is a graph plotting Hz versus decibels of the broad spectral shape of a typical unvoiced speech segment.
Figure 11 is a graph plotting Hz versus decibels of perceptual spectral weighting curves for $k_0 = 25$.
Figure 12 is a graph plotting Hz versus decibels of perceptual spectral weighting curves for $k_0 = 38$.
Figure 13 is a graph plotting Hz versus decibels of perceptual spectral weighting curves for $k_0 = 50$.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The preferred form of ANC system shown in Figure 3 is robust under adverse conditions often present in cellular telephony and packet voice networks. Such adverse conditions include signal dropouts and fast changing background noise conditions with wide dynamic ranges. The Figure 3 embodiment focuses on attaining high perceptual quality in the processed speech signal under a wide variety of such channel impairments.
The performance limitation imposed by commonly used two-state voice activity detection functions is overcome in the preferred embodiment by using a probabilistic speech presence measure. This new measure of speech is called the Speech Presence Measure (SPM), and it provides multiple signal activity states and allows more accurate handling of the input signal during different states. The SPM is capable of detecting signal dropouts as well as new environments. Dropouts are temporary losses of the signal that occur commonly in cellular telephony and in voice over packet networks. New environment detection is the ability to detect the start of new calls as well as sudden changes in the background noise environment of an ongoing call. The SPM can be beneficial to any noise reduction function, including the preferred embodiment of this invention.
Accurate noisy signal and noise power measures, which are performed for each frequency band, improve the performance of the preferred embodiment. The measurement for each band is optimized based on its frequency and the state information from the SPM. The frequency dependence is due to the optimization of power measurement time constants based on the statistical distribution of power across the spectrum in typical speech and environmental background noise. Furthermore, this spectrally based optimization of the power measures takes into consideration the non-linear nature of the human auditory system. The SPM state information provides additional information for optimizing the time constants, as well as ensuring stability and speed of the power measurements under adverse conditions. For instance, the indication of a new environment by the SPM allows the power measures to react quickly to the new environment.
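The state-dependent power tracking can be sketched with first-order recursive averages for one band. The SPM bullets earlier describe the rules: faster noise tracking in Silence, slower in Low Speech, none in Medium and High Speech, and slower signal tracking when speech is unlikely. The `SPM` enum, the smoothing rates, and the faster noise rate used while NEWENV is set are hypothetical values chosen only to show the structure. The patent's own time constants are not reproduced here, and a real system would also vary the rates by frequency band.

```python
from enum import Enum


class SPM(Enum):
    """The four SPM decisions named in the text."""
    SILENCE = 0
    LOW_SPEECH = 1
    MEDIUM_SPEECH = 2
    HIGH_SPEECH = 3


# Hypothetical update rates (weight given to the newest sample's power).
NOISE_RATE = {SPM.SILENCE: 0.05, SPM.LOW_SPEECH: 0.01,
              SPM.MEDIUM_SPEECH: 0.0, SPM.HIGH_SPEECH: 0.0}  # 0.0 = frozen
SIGNAL_RATE = {SPM.SILENCE: 0.02, SPM.LOW_SPEECH: 0.05,
               SPM.MEDIUM_SPEECH: 0.2, SPM.HIGH_SPEECH: 0.2}
NEWENV_NOISE_RATE = 0.2  # faster noise tracking while NEWENV is set in silence


def update_band_powers(noise_p, signal_p, sample, state, newenv=False):
    """One recursive-averaging step for a single band.

    Returns (noise_power, noisy_signal_power) after including `sample`.
    """
    inst = sample * sample  # instantaneous power of the band sample
    a_n = NOISE_RATE[state]
    if newenv and state is SPM.SILENCE:
        a_n = NEWENV_NOISE_RATE
    a_s = SIGNAL_RATE[state]
    noise_p = (1.0 - a_n) * noise_p + a_n * inst
    signal_p = (1.0 - a_s) * signal_p + a_s * inst
    return noise_p, signal_p
```

In the High Speech state the noise estimate is left untouched. In Silence with NEWENV set, it moves four times faster than in ordinary Silence under these illustrative rates.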
According to the preferred embodiment, significant enhancements to perceived quality, especially under severe noise conditions, are achieved via three novel spectral weighting functions. The weighting functions are based on (1) the overall noise-to-signal ratio (NSR), (2) the relative noise ratio, and (3) a perceptual spectral weighting model. The first function is based on the fact that over-suppression under heavier overall noise conditions provides better perceived quality. The second function uses the noise contribution of a band relative to the overall noise to weight the band appropriately, hence providing a fine structure to the spectral weighting. The third weighting function is based on a model of the power-frequency relationship in typical environmental background noise. The power and frequency are approximately inversely related, from which the name of the model is derived. The inverse spectral weighting model parameters can be adapted to match the actual environment of an ongoing call. The weights are conveniently applied to the NSR values computed for each frequency band, although such weighting could just as well be applied to other parameters with appropriate modifications. Furthermore, since the weighting functions are independent, any one of them, or any combination of them, can be used.
The preferred embodiment preserves the natural spectral shape of the speech signal, which is important to perceived speech quality. This is attained by careful spectrally interdependent gain adjustment achieved through the attenuation factors. An additional advantage of such spectrally interdependent gain adjustment is the variance reduction of the attenuation factors.
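Interdependent gain adjustment can be sketched as a weighted moving average across neighbouring bands. The coefficient layout is an assumption built from the ranges quoted in the bullets above. Interior bands use a centre coefficient of 0.95 with 0.025 on each neighbour (0.95 is 19 times the 0.05 sum of the others). The two edge bands use 0.95 and 0.05 to avoid edge effects. `gains_method_1_4` reads method (1.4) as smoothing both the noise and noisy-signal power spectra before forming each band's NSR and gain. Equations (1.1)-(1.4) themselves are not reproduced here, and the function names are hypothetical.

```python
def smooth_across_bands(values, center=0.95):
    """Weighted moving average across bands (hypothetical coefficient layout).

    Interior bands: `center` on the band itself, half of (1 - center) on each
    neighbour. Edge bands: `center` on the band, (1 - center) on its neighbour.
    """
    n = len(values)
    if n < 2:
        return list(values)
    side = 1.0 - center
    out = []
    for k in range(n):
        if k == 0:
            out.append(center * values[0] + side * values[1])
        elif k == n - 1:
            out.append(center * values[-1] + side * values[-2])
        else:
            out.append(center * values[k] + 0.5 * side * (values[k - 1] + values[k + 1]))
    return out


def gains_method_1_4(noise_powers, signal_powers, weights, eps=0.05):
    """Smooth both power spectra across bands, then form 1 - W*NSR per band."""
    p_n = smooth_across_bands(noise_powers)
    p_s = smooth_across_bands(signal_powers)
    gains = []
    for n_k, s_k, w_k in zip(p_n, p_s, weights):
        nsr = n_k / s_k if s_k > 0 else 1.0
        # Limit the weighted NSR to [0, 1 - eps] so the gain stays in [eps, 1].
        gains.append(1.0 - min(max(w_k * nsr, 0.0), 1.0 - eps))
    return gains
```

Passing the initial gains from equation (1) directly to `smooth_across_bands` corresponds to smoothing the gains themselves, the approach that method (1.1) is described as an alternative to.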
Referring to Figure 3, a preferred form of adaptive noise cancellation system 10 made in accordance with the invention comprises an input voice channel 20 that transmits a communication signal, comprising a plurality of frequency bands derived from speech and noise, to an input terminal 22. A speech signal component of the communication signal is due to speech, and a noise signal component of the communication signal is due to noise. A filter function 50 filters the communication signal into a plurality of frequency band signals on a signal path 51. A DTMF tone detection function 60 and a speech presence measure function 70 also receive the communication signal on input channel 20. The frequency band signals on path 51 are processed by a noisy signal power and noise power estimation function 80 to produce various forms of power signals.
The power signals provide inputs to a perceptual spectral weighting function 90, a relative noise ratio based weighting function 100 and an overall noise to signal ratio based weighting function 110. Functions 90, 100 and 110 also receive inputs from speech presence measure function 70, which is an improved voice activity detector. Functions 90, 100 and 110 generate preferred forms of weighting signals having weighting factors for each of the frequency bands generated by filter function 50. The weighting signals provide inputs to a noise to signal ratio computation and weighting function 120, which multiplies the weighting factors from functions 90, 100 and 110 for each frequency band together and computes an NSR value for each frequency band signal generated by the filter function 50. Some of the power signals calculated by function 80 also provide inputs to function 120 for calculating the NSR value.
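Function 120's combination step can be sketched as follows, under several assumptions. The overall-NSR weight uses equation (14), read as u(n) = 0.5 + NSR_overall(n). The relative-noise-ratio weight uses the ideal weights of equation (18) directly, instead of the approximate model (17). Each band's NSR is taken as noise power over noisy-signal power. The product is clipped to [0, 1 - ε] with ε = 0.05, as the gain computation requires. Function names are hypothetical.

```python
EPSILON = 0.05  # lower gain limit stated in the text


def overall_nsr_weight(nsr_overall):
    """Equation (14), read as u(n) = 0.5 + NSR_overall(n)."""
    return 0.5 + nsr_overall


def ideal_rnr_weights(noise_powers):
    """Equation (18): each band's noise power over the largest noise power.

    If every noise power is zero, every ideal weight is set to unity.
    """
    peak = max(noise_powers)
    if peak == 0:
        return [1.0] * len(noise_powers)
    return [p / peak for p in noise_powers]


def weighted_nsr(noise_powers, signal_powers, psw_weights, nsr_overall):
    """Multiply the three weights for each band and apply them to the band NSR.

    Band NSR is taken as noise power / noisy-signal power (an assumption);
    the weighted result is clipped to [0, 1 - EPSILON].
    """
    u = overall_nsr_weight(nsr_overall)
    rnr = ideal_rnr_weights(noise_powers)
    result = []
    for p_n, p_s, v, r in zip(noise_powers, signal_powers, psw_weights, rnr):
        nsr = p_n / p_s if p_s > 0 else 1.0  # treat an empty band as all noise
        w = u * r * v                          # combined weight W_k(n)
        result.append(min(max(w * nsr, 0.0), 1.0 - EPSILON))
    return result
```

Because the weights multiply, a band with little noise relative to the noisiest band (small RNR weight) or a perceptually important band (small PSW weight) ends up with a smaller weighted NSR and therefore less attenuation.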
Based on the combined weighting values and NSR value input from function 120, a gain computation and interdependent gain adjustment function 130 calculates preferred forms of initial gain signals and preferred forms of modified gain signals with initial and modified gain values for each of the frequency bands and modifies the initial gain values for each frequency band by, for example, smoothing so as to reduce the variance of the gain. The value of the modified gain signal for each frequency band generated by function 130 is multiplied by the value of every sample of the frequency band signal in a gain multiplication function 140 to generate preferred forms of weighted frequency band signals. The weighted frequency band signals are summed in a combiner function 160 to generate a communication signal which is transmitted through an output terminal 172 to a channel 170 with enhanced quality. A DTMF tone extension or regeneration function 150 also can place a DTMF tone on channel 170 through the operation of combiner function 160.
The function blocks shown in Figure 3 may be implemented by a variety of well known calculators, including one or more digital signal processors (DSP) including a program memory storing programs which are executed to perform the functions associated with the blocks (described later in more detail) and a data memory for storing the variables and other data described in connection with the blocks. One such embodiment is shown in Figure 4, which illustrates a calculator in the form of a digital signal processor 12 which communicates with a memory 14 over a bus 16. Processor 12 performs each of the functions identified in connection with the blocks of Figure 3. Alternatively, any of the function blocks may be implemented by dedicated hardware implemented by application specific integrated circuits (ASICs), including memory, which are well known in the art. Of course, a combination of one or more DSPs and one or more ASICs also may be used to implement the preferred embodiment. Thus, Figure 3 also illustrates an ANC 10 comprising a separate ASIC for each block capable of performing the function indicated by the block.
Filtering
In typical telephony applications, the noisy speech-containing input signal on channel 20 occupies a 4 kHz bandwidth. This communication signal may be spectrally decomposed by filter 50 using a filter bank or other means for dividing the communication signal into a plurality of frequency band signals. For example, the filter function could be implemented with block-processing methods, such as a Fast Fourier Transform (FFT). In the case of an FFT implementation of filter function 50, the resulting frequency band signals typically represent a magnitude value (or its square) and a phase value. The techniques disclosed in this specification typically are applied to the magnitude values of the frequency band signals. Filter 50 decomposes the input signal into $N$ frequency band signals representing $N$ frequency bands on path 51. The input to filter 50 will be denoted $x(n)$, while the output of the $k$-th filter in filter 50 will be denoted $x_k(n)$, where $n$ is the sample time.
The input, $x(n)$, to filter 50 is high-pass filtered by conventional means (not shown) to remove DC components.
Gain Computation
We first will discuss one form of gain computation. Later, we will discuss an interdependent gain adjustment technique. The gain (or attenuation) factor for the $k$-th frequency band is computed by function 130 once every $T$ samples as

$$G_k(n) = \begin{cases} 1 - W_k(n)\,NSR_k(n), & n = 0, T, 2T, \ldots \\ G_k(n-1), & n = 1, 2, \ldots, T-1, T+1, \ldots, 2T-1, \ldots \end{cases} \quad (1)$$
A suitable value for $T$ is 10 when the sampling rate is 8 kHz. The gain factor ranges between a small positive value, $\varepsilon$, and 1 because the weighted NSR values are limited to lie in the range $[0, 1-\varepsilon]$. Setting the lower limit of the gain to $\varepsilon$ reduces the effects of "musical noise" (described in reference [2]) and permits limited background signal transparency. In the preferred embodiment, $\varepsilon$ is set to 0.05. The weighting factor, $W_k(n)$, is used for over-suppression and under-suppression of the signal in the $k$-th frequency band. The overall weighting factor is computed by function 120 as
$$W_k(n) = u_k(n)\,v_k(n)\,w_k(n) \quad (2)$$
where $u_k(n)$ is the weight factor or value based on overall NSR as calculated by function 110, $w_k(n)$ is the weight factor or value based on the relative noise ratio weighting as calculated by function 100, and $v_k(n)$ is the weight factor or value based on perceptual spectral weighting as calculated by function 90. As previously described, each of the weight factors may be used separately or in various combinations.
Gain Multiplication
The signal $x_k(n)$ from the $k$-th frequency band is attenuated by function 140, which multiplies $x_k(n)$ by its corresponding gain factor, $G_k(n)$, every sample to generate weighted frequency band signals. Combiner 160 sums the resulting attenuated signals to generate the enhanced output signal, $y(n)$, on channel 170. This can be expressed mathematically as:

$$y(n) = \sum_{k} G_k(n)\,x_k(n) \quad (3)$$
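The following Python sketch ties equations (1) to (3) together under stated assumptions: the per-band NSR values and the weight factors are taken as given inputs (expression (10) is not reproduced here), and the function names and list-of-lists data layout are hypothetical. $T = 10$ and $\varepsilon = 0.05$ follow the values given above.

```python
EPS = 0.05  # lower gain limit (epsilon) quoted in the text


def overall_weights(u, v, w):
    """Equation (2): W_k = u_k * v_k * w_k for every band k."""
    return [uk * vk * wk for uk, vk, wk in zip(u, v, w)]


def band_gains(W, nsr, eps=EPS):
    """Equation (1) at an update instant: G_k = 1 - W_k * NSR_k.

    The weighted NSR is clipped to [0, 1 - eps], so each gain lies in [eps, 1].
    """
    return [1.0 - min(max(wk * nk, 0.0), 1.0 - eps) for wk, nk in zip(W, nsr)]


def enhance(band_samples, W_updates, nsr_updates, T=10):
    """Equations (1) and (3) over a block of samples.

    band_samples[n][k] holds x_k(n); W_updates[i] and nsr_updates[i] hold the
    per-band values available at update instant n = i*T. Gains are held
    constant between updates, and the weighted bands are summed into y(n).
    """
    y, gains = [], None
    for n, x in enumerate(band_samples):
        if n % T == 0:
            gains = band_gains(W_updates[n // T], nsr_updates[n // T])
        y.append(sum(g * xk for g, xk in zip(gains, x)))
    return y
```

Note how the clipping step is what bounds every gain to $[\varepsilon, 1]$: a band whose weighted NSR exceeds $1-\varepsilon$ is attenuated to $\varepsilon$ rather than muted.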
Power Estimation
The operations of noisy signal power and noise power estimation function 80 include the calculation of power estimates and generating preferred forms of corresponding power band signals having power band values as identified in Table 1 below. The power, $P(n)$ at sample $n$, of a discrete-time signal $u(n)$ is estimated approximately by either (a) lowpass filtering the full-wave rectified signal or (b) lowpass filtering an even power of the signal, such as the square of the signal. A first-order IIR filter can be used as the lowpass filter in both cases as follows:
$$P(n) = \beta P(n-1) + \alpha\,|u(n)| \quad (4a)$$
$$P(n) = \beta P(n-1) + \alpha\,[u(n)]^2 \quad (4b)$$
The lowpass filtering of the full-wave rectified signal or of an even power of a signal is an averaging process. The power estimation (e.g., averaging) has an effective time window or time period during which the filter coefficients are large, whereas outside this window the coefficients are close to zero. The coefficients of the lowpass filter determine the size of this window or time period. Thus, power estimation (e.g., averaging) over different effective window sizes or time periods can be achieved by using different filter coefficients. When the rate of averaging is said to be increased, it is meant that a shorter time period is used. By using a shorter time period, the power estimates react more quickly to the newer samples and "forget" the effect of older samples more readily. When the rate of averaging is said to be reduced, it is meant that a longer time period is used.
The first-order IIR filter has the following transfer function:

$$H(z) = \frac{\alpha}{1 - \beta z^{-1}} \quad (5)$$

The DC gain of this filter is $H(1) = \alpha/(1-\beta)$. The coefficient, $\beta$, is a decay constant.
The decay constant represents how long it would take for the present (non-zero) value of the power to decay to a small fraction of that value if the input is zero, i.e., $u(n) = 0$. If the decay constant, $\beta$, is close to unity, the power value takes a longer time to decay. If $\beta$ is close to zero, the power value decays in a shorter time. Thus, the decay constant also represents how fast the old power value is forgotten and how quickly the power of the newer input samples is incorporated: larger values of $\beta$ result in longer effective averaging windows or time periods.
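The decay behaviour can be made concrete with a small worked calculation. With $u(n) = 0$, equation (4a) gives $P(n) = \beta^n P(0)$, so the number of filter updates needed to fall to a fraction $\rho$ of the present value is $\ln\rho / \ln\beta$. The sketch below is illustrative only; the function names and the choice of $\rho = 0.01$ are assumptions, not values from the text.

```python
import math


def dc_gain(alpha, beta):
    """DC gain of H(z) = alpha / (1 - beta z^-1), i.e. H(1) = alpha / (1 - beta)."""
    return alpha / (1.0 - beta)


def updates_to_decay(beta, fraction=0.01):
    """Filter updates for a zero-input estimate to fall to `fraction` of its value.

    With u(n) = 0 the recursion gives P(n) = beta**n * P(0), so
    n = ln(fraction) / ln(beta).
    """
    return math.log(fraction) / math.log(beta)


for beta in (0.9, 0.99, 0.999):
    print(beta, round(updates_to_decay(beta), 1))
```

For $\beta = 0.9$, $0.99$ and $0.999$ this gives roughly 44, 458 and 4600 updates, which illustrates why values of $\beta$ close to unity give long effective averaging windows.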
Depending on the signal of interest, effectively averaging over a shorter or longer time period may be appropriate for power estimation. Speech power, which has a rapidly changing profile, is suitably estimated using a smaller $\beta$. Noise can be considered stationary for longer periods of time than speech, so noise power is more accurately estimated by using a longer averaging window (larger $\beta$).
The preferred form of power estimation significantly reduces computational complexity by undersampling the input signal for power estimation purposes. This means that only one sample out of every $T$ samples is used for updating the power $P(n)$ in (4). Between these updates, the power estimate is held constant. This procedure can be mathematically expressed as

$$P(n) = \begin{cases} \beta P(n-1) + \alpha\,|u(n)|, & n = 0, T, 2T, \ldots \\ P(n-1), & n = 1, 2, \ldots, T-1, T+1, \ldots, 2T-1, \ldots \end{cases} \quad (6)$$
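Equation (6) can be sketched in a few lines of Python. This is a minimal illustration, assuming the full-wave rectified form (4a) and a zero initial estimate; the function name and the `p_init` parameter are hypothetical, and the $\alpha$, $\beta$ values used in practice would come from the tables referenced in this section.

```python
def undersampled_power(u, alpha, beta, T=10, p_init=0.0):
    """Equation (6): first-order IIR power estimate updated once every T samples.

    Uses the full-wave rectified input of (4a); between update instants
    n = 0, T, 2T, ... the estimate is simply held. With T = 1 this reduces to (4a).
    """
    estimates, p = [], p_init
    for n, sample in enumerate(u):
        if n % T == 0:
            p = beta * p + alpha * abs(sample)
        estimates.append(p)
    return estimates
```

Because only one multiply-add pair runs per $T$ samples, the arithmetic cost of each power measure drops by roughly a factor of $T$ compared with updating every sample.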
Such first-order lowpass IIR filters may be used to estimate the various power measures listed in Table 1 below:
Table 1
Function 80 generates a signal for each of the foregoing variables. Each of the signals in Table 1 is calculated using the estimations described in this Power Estimation section. The Speech Presence Measure, which will be discussed later, utilizes short-term and long-term power measures in the first formant region. To perform the first formant power measurements, the input signal, $x(n)$, is lowpass filtered using an IIR filter, $H(z)$. In the preferred implementation, the filter has a cut-off frequency of 850 Hz and has coefficients $b_0 = 0.1027$, $b_1 = 0.2053$, $a_1 = -0.9754$ and $a_2 = 0.4103$. Denoting the output of this filter as $x_1(n)$, the short-term and long-term first formant power measures can be obtained as follows:
$$P_{1st,ST}(n) = \beta_{1st,ST}\,P_{1st,ST}(n-1) + \alpha_{1st,ST}\,|x_1(n)| \quad (7)$$

$$P_{1st,LT}(n) = \begin{cases} \beta_{1st,LT}\,P_{1st,LT}(n-1) + \alpha_{1st,LT}\,|x_1(n)|, & \text{if } P_{1st,LT}(n-1) < P_{1st,ST}(n) \text{ and DROPOUT} = 0 \\ P_{1st,LT}(n-1), & \text{if DROPOUT} = 1 \end{cases} \quad (8)$$

DROPOUT in (8) will be explained later. The time constants used in the above difference equations are the same as those described in (6) and are tabulated below:
One effect of these time constants is that the short term first formant power measure is effectively averaged over a shorter time period than the long term first formant power measure. These time constants are examples of the parameters used to analyze a communication signal and enhance its quality.
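The sketch below shows the first formant lowpass filter and the short-term measure of equation (7). Only $b_0$, $b_1$, $a_1$ and $a_2$ are given in the text; the sketch ASSUMES a symmetric numerator ($b_2 = b_0$, consistent with $b_1$ being almost exactly $2b_0$), under which the response is about 0.94 at DC, as expected for a second-order lowpass section with a small passband ripple. The denominator convention $1 + a_1 z^{-1} + a_2 z^{-2}$ and all function names are also assumptions, and the $\alpha$, $\beta$ arguments are placeholders because the time-constant table is not reproduced here. The gated long-term update of (8) is not coded.

```python
import cmath
import math

# b0, b1, a1, a2 are the values quoted in the text. b2 = b0 is an ASSUMPTION
# (a symmetric numerator, consistent with b1 being almost exactly 2 * b0).
B0, B1, B2 = 0.1027, 0.2053, 0.1027
A1, A2 = -0.9754, 0.4103


def first_formant_filter(x):
    """Direct-form I biquad producing x_1(n) from x(n)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = B0 * xn + B1 * x1 + B2 * x2 - A1 * y1 - A2 * y2
        out.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return out


def magnitude(f_hz, fs=8000.0):
    """|H(e^{jw})| of the assumed biquad at f_hz for sampling rate fs."""
    z_inv = cmath.exp(-2j * math.pi * f_hz / fs)
    num = B0 + B1 * z_inv + B2 * z_inv ** 2
    den = 1.0 + A1 * z_inv + A2 * z_inv ** 2
    return abs(num / den)


def short_term_power(x1, alpha, beta):
    """Equation (7), updated every sample: P(n) = beta P(n-1) + alpha |x_1(n)|."""
    p, out = 0.0, []
    for s in x1:
        p = beta * p + alpha * abs(s)
        out.append(p)
    return out
```

A quick check such as `magnitude(3000.0)` shows strong attenuation well above the 850 Hz cut-off, which is what confines these power measures to the first formant region.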
Noise-to-Signal Ratio (NSR) Estimation
Regarding overall NSR based weighting function 110, the overall NSR, $NSR_{overall}(n)$ at sample $n$, is defined as

$$NSR_{overall}(n) = \frac{P_{BN}(n)}{P_{SN}(n)} \quad (9)$$

The overall NSR is used to influence the amount of over-suppression of the signal in each frequency band and will be discussed later. The NSR for the $k$-th frequency band may be computed as
Those skilled in the art recognize that other algorithms may be used to compute the
NSR values instead of expression (10).
Speech Presence Measure (SPM)
Speech presence measure (SPM) 70 may utilize any known DTMF detection method if DTMF tone extension or regeneration functions 150 are to be performed. In the preferred embodiment, the DTMF flag will be 1 when DTMF activity is detected and 0 otherwise. If DTMF tone extension or regeneration is unnecessary, then the following can be understood by always assuming that DTMF=0.
SPM 70 primarily measures the likelihood that the signal activity is due to the presence of speech. This measure can be quantized to a discrete number of decision levels depending on the application. In the preferred embodiment, we use five levels.
The SPM performs its decision based on the DTMF flag and the LEVEL value. The
DTMF flag has been described previously. The LEVEL value will be described shortly. The decisions, as quantized, are tabulated below. The lower four decisions
(Silence to High Speech) will be referred to as SPM decisions. Table 1: Joint Speech Presence Measure and DTMF Activity decisions
In addition to the above multi-level decisions, the SPM also outputs two flags or signals, DROPOUT and NEWENV, which will be described in the following sections.
Power Measurement in the SPM
The novel multi-level decisions made by the SPM are achieved by using a speech likelihood related comparison signal and multiple variable thresholds. In our preferred embodiment, we derive such a speech likelihood related comparison signal by comparing the values of the first formant short-term noisy signal power estimate, $P_{1st,ST}(n)$, and the first formant long-term noisy signal power estimate, $P_{1st,LT}(n)$. Multiple comparisons are performed using expressions involving $P_{1st,ST}(n)$ and $P_{1st,LT}(n)$, as given in the preferred embodiment of equation (11) below. The result of these comparisons is used to update the speech likelihood related comparison signal. In our preferred embodiment, the speech likelihood related comparison signal is a hangover counter, $h_{var}$. Each of the inequalities involving $P_{1st,ST}(n)$ and $P_{1st,LT}(n)$ uses different scaling values (i.e., the $\mu_i$'s). They also may use different additive constants, although we use the same constant, $P_0$, for all of them.
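The hangover counter, $h_{var}$, can be assigned a variable hangover period that is updated every sample based on multiple threshold levels, which, in the preferred embodiment, have been limited to 3 levels as follows:

$$h_{var}(n) = \begin{cases} h_{max,3}, & \text{if } P_{1st,ST}(n) > \mu_3 P_{1st,LT}(n) + P_0 \\ h_{max,2}, & \text{else if } P_{1st,ST}(n) > \mu_2 P_{1st,LT}(n) + P_0 \\ h_{max,1}, & \text{else if } P_{1st,ST}(n) > \mu_1 P_{1st,LT}(n) + P_0 \\ \max[0,\, h_{var}(n-1) - 1], & \text{otherwise} \end{cases} \quad (11)$$

where $h_{max,3} > h_{max,2} > h_{max,1}$ and $\mu_3 > \mu_2 > \mu_1$.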
Suitable values for the maximum values of $h_{var}$ are $h_{max,3} = 2000$, $h_{max,2} = 1400$ and $h_{max,1} = 800$. Suitable scaling values for the threshold comparison factors are $\mu_3 = 3.0$, $\mu_2 = 2.0$ and $\mu_1 = 1.6$. The choice of these scaling values is based on the desire to provide longer hangover periods following higher power speech segments. Thus, the inequalities of (11) determine whether $P_{1st,ST}(n)$ exceeds $P_{1st,LT}(n)$ by more than a predetermined factor. Therefore, $h_{var}$ represents a preferred form of comparison signal resulting from the comparisons defined in (11) and having a value representing differing degrees of likelihood that a portion of the input communication signal results from at least some speech.
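A per-sample update of the hangover counter can be sketched as follows, assuming each threshold crossing reloads the counter with the corresponding $h_{max,i}$ and that the highest threshold is tested first. The threshold and hangover values are the ones quoted above; the additive constant $P_0$ is passed in as a parameter because its value is not given legibly here, and the function name is hypothetical.

```python
H_MAX = {3: 2000, 2: 1400, 1: 800}   # h_max,i from the text
MU = {3: 3.0, 2: 2.0, 1: 1.6}        # mu_i from the text


def update_hangover(h_var, p_st, p_lt, p0):
    """One sample of the three-level hangover counter update.

    The highest threshold is tested first; a crossing reloads the counter with
    h_max,i, otherwise the counter decays by one toward zero.
    """
    for i in (3, 2, 1):
        if p_st > MU[i] * p_lt + p0:
            return H_MAX[i]
    return max(0, h_var - 1)
```

At 8 kHz, the three hangover lengths correspond to 0.25 s, 0.175 s and 0.1 s, so louder speech segments keep the counter (and hence LEVEL) elevated for longer.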
Since longer hangover periods are assigned for higher power signal segments, the hangover period length can be considered as a measure that is directly proportional to the probability of speech presence. Since the SPM decision is required to reflect the likelihood that the signal activity is due to the presence of speech, and the SPM decision is based partly on the LEVEL value according to Table 1, we determine the value for LEVEL based on the hangover counter as tabulated below.
SPM 70 generates a preferred form of a speech likelihood signal having values corresponding to LEVELs 0-3. Thus, LEVEL depends indirectly on the power measures and represents varying likelihood that the input communication signal results from at least some speech. Basing LEVEL on the hangover counter is advantageous because a certain amount of hysteresis is provided. That is, once the count enters one of the ranges defined in the preceding table, the count is constrained to stay in the range for variable periods of time. This hysteresis prevents the LEVEL value, and hence the SPM decision, from changing too often due to momentary changes in the signal power. If LEVEL were based solely on the power measures, the SPM decision would tend to flutter between adjacent levels when the power measures lie near decision boundaries.
Dropout Detection in the SPM
Another novel feature of the SPM is the ability to detect 'dropouts' in the signal. A dropout is a situation where the input signal power has a defined attribute, such as suddenly dropping to a very low level or even zero for short durations of time (usually less than a second). Such dropouts are often experienced, especially in a cellular telephony environment. For example, dropouts can occur due to loss of speech frames in cellular telephony or due to the user suddenly moving from a noisy environment to a quiet environment. During dropouts, the ANC system operates differently, as will be explained later.
Dropout detection is incorporated into the SPM. Equation (8) shows the use of a DROPOUT signal in the long-term (noise) power measure. During dropouts, the adaptation of the long-term power for the SPM is stopped or slowed significantly. This prevents the long-term power measure from being reduced drastically during dropouts, which could potentially lead to incorrect speech presence measures later.
The SPM dropout detection utilizes the DROPOUT signal or flag and a counter, $c_{dropout}$. The counter is updated every sample time as follows.
The following table shows how DROPOUT should be updated.
As shown in the foregoing table, the attribute of $c_{dropout}$ determines at least in part the condition of the DROPOUT signal. A suitable value for the power threshold comparison factor, $\mu_{dropout}$, is 0.2. Suitable values for $c_1$ and $c_2$ are $c_1 = 4000$ and $c_2 = 8000$, which correspond to 0.5 and 1 second, respectively, at the 8 kHz sampling rate. The logic presented here prevents the SPM from indicating the dropout condition for more than $c_1$ samples.
Limiting of Long-term (Noise) Power Measure in the SPM
In addition to the above enhancements to the long-term (noise) power measure, $P_{1st,LT}(n)$, it is further constrained from exceeding a certain threshold, $P_{1st,LT,max}$; i.e., if the value of $P_{1st,LT}(n)$ computed according to equation (8) is greater than $P_{1st,LT,max}$, then we set $P_{1st,LT}(n) = P_{1st,LT,max}$. This enhancement makes the SPM more robust, because the long-term power measure cannot rise to the level of the short-term power measure during a long and continuous period of loud speech. This prevents the SPM from providing an incorrect speech presence measure in such situations. A suitable value is $P_{1st,LT,max} = 500/8159$, assuming that the maximum absolute value of the input signal $x(n)$ is normalized to unity.
New Environment Detection in the SPM
At the beginning of a call, the background noise environment would not be known by ANC system 10. The background noise environment can also change suddenly when the user moves from a noisy environment to a quieter environment, e.g., moving from a busy street to an indoor environment with windows and doors closed. In both these cases, it would be advantageous to adapt the noise power measures quickly for a short period of time. In order to indicate such changes in the environment, the SPM outputs a signal or flag called NEWENV to the ANC system.
The detection of a new environment at the beginning of a call will depend on the system in question. Usually, there is some form of indication that a new call has been initiated. For instance, when there is no call on a particular line in some networks, an idle code may be transmitted. In such systems, a new call can be detected by checking for the absence of idle codes. Thus, the method for inferring that a new call has begun will depend on the particular system.
In the preferred embodiment of the SPM, we use the flag NEWENV together with a counter, $c_{newenv}$, and a flag, OLDDROPOUT. The OLDDROPOUT flag contains the value of DROPOUT from the previous sample time.
A pitch estimator is used to monitor whether voiced speech is present in the input signal. If voiced speech is present, the pitch period (i.e., the inverse of pitch frequency) would be relatively steady over a period of about 20 ms. If only background noise is present, then the pitch period would change in a random manner. If a cellular handset is moved from a quiet room to a noisy outdoor environment, the input signal would suddenly be much louder and may be incorrectly detected as speech. The pitch detector can be used to avoid such incorrect detection and to set the new environment signal so that the new noise environment can be quickly measured.
To implement this function, any of the numerous known pitch period estimation devices may be used, such as device 74 shown in Fig. 3. In our preferred implementation, the following method is used. Denoting $K(n-m)$ as the pitch period estimate from $m$ samples ago and $K(n)$ as the current pitch period estimate, if $|K(n) - K(n-40)| > 3$, $|K(n-40) - K(n-80)| > 3$ and $|K(n-80) - K(n-120)| > 3$, then the pitch period is not steady and it is unlikely that the input signal contains voiced speech. If these conditions are true and yet the SPM indicates LEVEL > 1, which normally implies that significant speech is present, then it can be inferred that a sudden increase in the background noise has occurred.
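The pitch-steadiness test above reduces to three absolute-difference comparisons. In this minimal sketch, `K` is assumed to be a list of pitch period estimates indexed by sample time (the pitch estimator itself is not shown), `n` must be at least 120, and the function names are hypothetical.

```python
def pitch_unsteady(K, n, lag=40, tol=3):
    """True if |K(n)-K(n-40)|, |K(n-40)-K(n-80)| and |K(n-80)-K(n-120)| all exceed tol."""
    return all(abs(K[n - i * lag] - K[n - (i + 1) * lag]) > tol for i in range(3))


def noise_increase_suspected(K, n, level):
    """Unsteady pitch while LEVEL > 1 suggests louder noise rather than speech."""
    return level > 1 and pitch_unsteady(K, n)
```

Requiring all three consecutive differences to be large makes the test conservative: a single pitch-tracking glitch during genuine voiced speech is not enough to trigger the new-environment logic.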
The following table specifies a method of updating NEWENV and $c_{newenv}$.
In the above method, the NEWENV flag is set to 1 for a period of time specified by $c_{newenv,max}$, after which it is cleared. The NEWENV flag is set to 1 in response to various events or attributes:
(1) at the beginning of a new call;
(2) at the end of a dropout period;
(3) in response to an increase in background noise (for example, pitch detector 74 may reveal that a new high-amplitude signal is due to noise rather than speech); or
(4) in response to a sudden decrease in background noise to a lower level of sufficient amplitude to avoid being a dropout condition.
A suitable value for $c_{newenv,max}$ is 2000, which corresponds to 0.25 seconds.
Operation of the ANC System
Referring to Figure 3, the multi-level SPM decision and the flags DROPOUT and NEWENV are generated on path 72 by SPM 70. With these signals, the ANC system is able to perform noise cancellation more effectively under adverse conditions. Furthermore, as previously described, the power measurement function has been significantly enhanced compared to prior known systems. Additionally, the three independent weighting functions carried out by functions 90, 100 and 110 can be used to achieve over-suppression or under-suppression. Finally, gain computation and interdependent gain adjustment function 130 offers enhanced performance.
Use of Dropout Signals
When the flag DROPOUT = 1, SPM 70 is indicating that there is a temporary loss of signal. Under such conditions, continuing the adaptation of the signal and noise power measures could result in poor behavior of a noise suppression system. One solution is to slow down the power measurements by using very long time constants. In the preferred embodiment, we freeze the adaptation of both signal and noise power measures for the individual frequency bands, i.e., we set $P_k^N(n) = P_k^N(n-1)$ and $P_k^S(n) = P_k^S(n-1)$ when DROPOUT = 1. Since DROPOUT remains at 1 only for a short time (at most 0.5 s in our implementation), an erroneous dropout detection may only affect ANC system 10 momentarily. The improvement in speech quality gained by our robust dropout detection outweighs the low risk of incorrect detection.
Use of New Environment Signals
When the flag NEWENV = 1, SPM 70 is indicating that there is a new environment, due either to a new call or to a post-dropout environment. If there is no speech activity, i.e., the SPM indicates silence, then it would be advantageous for the ANC system to measure the noise spectrum quickly. This quick reaction allows a shorter adaptation time for the ANC system to a new noise environment. Under normal operation, the time constants, $\alpha_k^N$ and $\beta_k^N$, used for the noise power measurements would be as given in Table 2 below. When NEWENV = 1, we force the time constants to correspond to those specified for the Silence state in Table 2. These Silence-state time constants result in fast adaptation to the background noise power. SPM 70 will only hold NEWENV at 1 for a short period of time. Thus, the ANC system will automatically revert to using the normal Table 2 values after this time.
Table 2: Power measurement time constants
Frequency-Dependent and Speech Presence Measure-Based Time Constants for Power Measurement
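The noise and signal power measurements for the different frequency bands are given by

$$P_k^N(n) = \begin{cases} \beta_k^N P_k^N(n-1) + \alpha_k^N\,|x_k(n)|, & n = 0, T, 2T, \ldots \\ P_k^N(n-1), & n = 1, 2, \ldots, T-1, T+1, \ldots, 2T-1, \ldots \end{cases} \quad (12)$$

$$P_k^S(n) = \begin{cases} \beta_k^S P_k^S(n-1) + \alpha_k^S\,|x_k(n)|, & n = 0, T, 2T, \ldots \\ P_k^S(n-1), & n = 1, 2, \ldots, T-1, T+1, \ldots, 2T-1, \ldots \end{cases} \quad (13)$$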
In the preferred embodiment, the time constants $\beta_k^N$, $\beta_k^S$, $\alpha_k^N$ and $\alpha_k^S$ are based on both the frequency band and the SPM decisions. The frequency dependence will be explained first, followed by the dependence on the SPM decisions.
The use of different time constants for power measurements in different frequency bands offers advantages. Frequency bands in the middle of the 4 kHz speech bandwidth naturally tend to have higher average power levels and variance during speech than other bands. To track the faster variations, it is useful to have relatively faster time constants for the signal power measures in this region. Relatively slower signal power time constants are suitable for the low and high frequency regions. The reverse is true for the noise power time constants, i.e., faster time constants in the low and high frequencies and slower time constants in the middle frequencies. We have discovered that it is better to track the noise at a higher speed in regions where speech power is usually low. This results in earlier suppression of noise, especially at the end of speech bursts.
In addition to the variation of time constants with frequency, the time constants are also based on the multi-level decisions of the SPM. In our preferred implementation of the SPM, there are four possible SPM decisions (i.e., Silence, Low Speech, Medium Speech, High Speech). When the SPM decision is Silence, it would be beneficial to speed up the tracking of the noise in all the bands. When the SPM decision is Low Speech, the likelihood of speech is higher and the noise power measurements are slowed down accordingly. The likelihood of speech is considered too high in the remaining speech states, and thus the noise power measurements are turned off in these states. In contrast to the noise power measurement, the time constants for the signal power measurements are modified so as to slow down the tracking when the likelihood of speech is low. This reduces the variance of the signal power measures during low speech levels and silent periods. This is especially beneficial during silent periods, as it prevents short-duration noise spikes from causing the gain factors to rise.
In the preferred embodiment, we have selected the time constants as shown in Table 2 above. For simplicity, the DC gains of the IIR filters used for power measurements remain fixed across all frequencies in our preferred embodiment, although they could be varied as well.
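The SPM-dependent band noise update of equation (12) can be sketched as below, including the DROPOUT freeze and the NEWENV override described earlier in this section. The decay constants are HYPOTHETICAL placeholders (Table 2's values are not reproduced here), the frequency dependence of the time constants is omitted for brevity (a per-band $\beta$ would replace the scalar), and $\alpha$ is derived from $\beta$ so that the DC gain stays fixed, as the text specifies. All names are illustrative.

```python
# HYPOTHETICAL decay constants per SPM decision; Table 2's actual values are not
# reproduced here. None means the noise power measure is not updated.
NOISE_BETA = {"silence": 0.90, "low": 0.99, "medium": None, "high": None}
DC_GAIN = 1.0  # alpha = DC_GAIN * (1 - beta) keeps the filter DC gain fixed


def update_noise_powers(p_noise, x_bands, spm, dropout=False, newenv=False):
    """Equation (12) at an update instant, for all bands at once.

    DROPOUT = 1 freezes the estimates; NEWENV = 1 forces the Silence constants.
    """
    if dropout:
        return list(p_noise)
    beta = NOISE_BETA["silence" if newenv else spm]
    if beta is None:
        return list(p_noise)
    alpha = DC_GAIN * (1.0 - beta)
    return [beta * p + alpha * abs(x) for p, x in zip(p_noise, x_bands)]
```

The signal power update of equation (13) would follow the same pattern with its own table, except that its tracking is slowed, rather than sped up, when the SPM reports a low likelihood of speech.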
Weighting based on Overall NSR
In reference [2], it is explained that the perceived quality of speech is improved by over-suppression of frequency bands based on the overall SNR. In the preferred embodiment, over-suppression is achieved by weighting the NSR according to (2) using the weight, $u_k(n)$, given by

$$u_k(n) = 0.5 + NSR_{overall}(n) \quad (14)$$

Here, we have limited the weight to range from 0.5 to 1.5. This weight computation may be performed more slowly than the sampling rate for economical reasons. A suitable update rate is once per $2T$ samples.
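Equations (9) and (14) together amount to a ratio and a clipped offset. The sketch below assumes the stated 0.5 to 1.5 limit is applied by clipping, and returns 1.0 when the noisy signal power is zero; both choices, and the function names, are assumptions rather than details from the text.

```python
def overall_nsr(p_bn, p_sn):
    """Equation (9): NSR_overall = P_BN / P_SN (1.0 if P_SN is zero; an assumption)."""
    return p_bn / p_sn if p_sn > 0 else 1.0


def overall_nsr_weight(nsr_overall):
    """Equation (14), clipped to the stated range [0.5, 1.5]."""
    return min(max(0.5 + nsr_overall, 0.5), 1.5)
```

For example, an overall NSR of 0.2 gives $u_k = 0.7$, which under-suppresses relative to the unweighted case, while any overall NSR of 1 or more saturates at $u_k = 1.5$.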
Weighting Based on Relative Noise Ratios
We have discovered that weighting based on relative noise ratios improves noise cancellation. According to the preferred embodiment, the weighting, denoted by $w_k$, based on the values of noise power signals in each frequency band, has a nominal value of unity for all frequency bands. This weight is higher for a frequency band that contributes relatively more to the total noise than other bands. Thus, greater suppression is achieved in bands that have relatively more noise. For bands that contribute little to the overall noise, the weight is reduced below unity to reduce the amount of suppression. This is especially important when both the speech and noise power in a band are very low and of the same order. In the past, in such situations, power has been severely suppressed, which has resulted in hollow-sounding speech. However, with this weighting function, the amount of suppression is reduced, preserving the richness of the signal, especially in the high frequency region.
There are many ways to determine suitable values for $w_k$. First, we note that the average background noise power is the sum of the background noise powers in the $N$ frequency bands divided by $N$, and is represented by $P_{BN}(n)/N$. The relative noise ratio in a frequency band can be defined as

$$R_k(n) = \frac{P_k^N(n)}{P_{BN}(n)/N} \quad (15)$$
The goal is to assign a higher weight for a band when the ratio, $R_k(n)$, for that band is high, and lower weights when the ratio is low. In the preferred embodiment, we assign these weights as shown in Figure 5, where the weights are allowed to range between 0.5 and 2. To save on computational time and cost, we perform the update of (15) once per $2T$ samples. Function 80 (Figure 3) generates preferred forms of band power signals corresponding to the terms on the right side of equation (15), and function 100 generates preferred forms of weighting signals with weighting values corresponding to the term on the left side of equation (15).
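Equation (15) can be sketched directly. The mapping from $R_k$ to a weight is defined by Figure 5, which is not reproduced here, so `rnr_weights` uses a HYPOTHETICAL stand-in that simply clips $R_k$ to the stated $[0.5, 2]$ range; treating an all-zero noise spectrum as flat is also an assumption.

```python
def relative_noise_ratios(p_noise):
    """Equation (15): R_k = P_k^N / (P_BN / N), where P_BN = sum of band noise powers."""
    n_bands = len(p_noise)
    p_bn = sum(p_noise)
    if p_bn == 0:
        return [1.0] * n_bands  # assumption: treat an all-zero spectrum as flat
    mean = p_bn / n_bands
    return [p / mean for p in p_noise]


def rnr_weights(p_noise, lo=0.5, hi=2.0):
    """HYPOTHETICAL stand-in for the Figure 5 mapping: clip R_k to [0.5, 2]."""
    return [min(max(r, lo), hi) for r in relative_noise_ratios(p_noise)]
```

A band carrying four times the average noise power is capped at weight 2, while a nearly noise-free band is floored at 0.5, so suppression is moderated rather than removed in quiet bands.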
If approximate knowledge of the nature of the environmental noise is available, then the RNR weighting technique can be extended to incorporate this knowledge. Figure 6 shows the typical power spectral density of background noise recorded from a cellular telephone in a moving vehicle. Typical environmental background noise has a power spectrum that corresponds to pink or brown noise. (Pink noise has power inversely proportional to the frequency. Brown noise has power inversely proportional to the square of the frequency.) Based on this approximate knowledge of the relative noise ratio profile across the frequency bands, the perceived quality of speech is improved by weighting the lower frequencies more heavily so that greater suppression is achieved at these frequencies.
We take advantage of the knowledge of the typical noise power spectrum profile (or equivalently, the RNR profile) to obtain an adaptive weighting function. In general, the weight, $w_f$, for a particular frequency, $f$, can be modeled as a function of frequency in many ways. One such model is

$$w_f = b(f - f_0)^2 + c \quad (16)$$

This model has three parameters, $\{b, f_0, c\}$. An example of a weighting curve obtained from this model is shown in Figure 7 for $b = 5.6 \times 10^{-8}$, $f_0 = 3000$ and $c = 0.5$.
The Figure 7 curve varies monotonically with decreasing values of weight from 0 Hz to about 3000 Hz, and varies monotonically with increasing values of weight from about 3000 Hz to about 4000 Hz. In practice, we could use the frequency band index, $k$, corresponding to the actual frequency $f$. This provides the following practical and efficient model with parameters $\{b, k_0, c\}$:

$$w_k = b(k - k_0)^2 + c \quad (17)$$
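The quadratic models of equations (16) and (17) are simple to evaluate. The sketch below uses the Figure 7 parameters as defaults for (16); the function names are illustrative only.

```python
def weight_at_frequency(f_hz, b=5.6e-8, f0=3000.0, c=0.5):
    """Equation (16) with the Figure 7 parameters: w_f = b (f - f0)^2 + c."""
    return b * (f_hz - f0) ** 2 + c


def weight_at_band(k, b, k0, c):
    """Equation (17): the same model indexed by frequency band k."""
    return b * (k - k0) ** 2 + c


for f in (0.0, 1000.0, 3000.0, 4000.0):
    print(f, round(weight_at_frequency(f), 3))
```

With these parameters the weight falls from about 1.004 at 0 Hz through 0.724 at 1000 Hz to its minimum of 0.5 at 3000 Hz, then rises to about 0.556 at 4000 Hz, matching the shape described for Figure 7.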
In general, the ideal weights, $\hat{w}_k$, may be obtained as a function of the measured noise power estimates, $P_k^N$, at each frequency band as follows:

$$\hat{w}_k(n) = \frac{P_k^N(n)}{\max_j P_j^N(n)} \quad (18)$$
Basically, the ideal weights are equal to the noise power measures normalized by the largest noise power measure. In general, the normalized power of a noise component in a particular frequency band is defined as a ratio of the power of the noise component in that frequency band to a function of some or all of the powers of the noise components in the frequency band or outside the frequency band. Equations (15) and (18) are examples of such normalized power of a noise component. In case all the power values are zero, the ideal weight is set to unity. This ideal weight is actually an alternative definition of RNR. We have discovered that noise cancellation can be improved by providing weighting which at least approximates normalized power of the noise signal component of the input communication signal. In the preferred embodiment, the normalized power may be calculated according to (18). Accordingly, function 100 (Figure 3) may generate a preferred form of weighting signals having weighting values approximating equation (18).
The approximate model in (17) attempts to mimic the ideal weights computed using (18). To obtain the model parameters $\{b, k_0, c\}$, a least-squares approach may be used. An efficient way to perform this is to use the method of steepest descent to adapt the model parameters $\{b, k_0, c\}$.
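We derive here the general method of adapting the model parameters using the steepest descent technique. First, the total squared error between the weights generated by the model and the ideal weights is defined, summed over all frequency bands, as follows:

$$e^2 = \sum_{\text{all } k} \left[ b(k - k_0)^2 + c - \hat{w}_k \right]^2 \quad (19)$$

Taking the partial derivative of the total squared error, $e^2$, with respect to each of the model parameters in turn and dropping constant factors, we obtain

$$\frac{\partial e^2}{\partial b} = \sum_{\text{all } k} \left[ b(k - k_0)^2 + c - \hat{w}_k \right](k - k_0)^2 \quad (20)$$

$$\frac{\partial e^2}{\partial k_0} = -b \sum_{\text{all } k} \left[ b(k - k_0)^2 + c - \hat{w}_k \right](k - k_0) \quad (21)$$

$$\frac{\partial e^2}{\partial c} = \sum_{\text{all } k} \left[ b(k - k_0)^2 + c - \hat{w}_k \right] \quad (22)$$

Denoting the model parameters and the error at the $n$th sample time as $\{b_n, k_{0,n}, c_n\}$ and $e_n(k)$, respectively, the model parameters at the $(n+1)$th sample can be estimated as

$$b_{n+1} = b_n - \lambda_b \frac{\partial e_n^2}{\partial b} \quad (23)$$

$$k_{0,n+1} = k_{0,n} - \lambda_k \frac{\partial e_n^2}{\partial k_0} \quad (24)$$

$$c_{n+1} = c_n - \lambda_c \frac{\partial e_n^2}{\partial c} \quad (25)$$

Here $\{\lambda_b, \lambda_k, \lambda_c\}$ are appropriate step-size parameters. The model definition in (17) can then be used to obtain the weights for use in noise suppression, as well as for the next iteration of the algorithm. The iterations may be performed every sample time or, for economy, more slowly.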
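The steepest-descent fit of the quadratic model to the ideal weights can be sketched as follows. This is an illustration under assumptions: the ideal weights follow the normalize-by-largest rule with unity for an all-zero spectrum, the common factor of 2 is dropped from every gradient as in the text, and the function names and step sizes are hypothetical (the step sizes must be chosen small enough for the band range in use, since the $b$ gradient scales with $(k - k_0)^4$).

```python
def ideal_weights(p_noise):
    """Noise powers normalized by the largest one; unity if all are zero."""
    peak = max(p_noise)
    if peak == 0:
        return [1.0] * len(p_noise)
    return [p / peak for p in p_noise]


def steepest_descent_step(b, k0, c, bands, w_ideal, lam_b, lam_k, lam_c):
    """One steepest-descent update of {b, k0, c}, with the factor 2 dropped."""
    err = [b * (k - k0) ** 2 + c - w for k, w in zip(bands, w_ideal)]
    g_b = sum(e * (k - k0) ** 2 for e, k in zip(err, bands))
    g_k0 = -b * sum(e * (k - k0) for e, k in zip(err, bands))
    g_c = sum(err)
    return b - lam_b * g_b, k0 - lam_k * g_k0, c - lam_c * g_c
```

Separate step sizes are natural here because the three gradients differ by orders of magnitude: with bands 3 to 42, steps of roughly $10^{-7}$ for $b$ and $10^{-2}$ for $c$ keep the iteration stable in this sketch.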
We have described the alternative preferred RNR weight adaptation technique above. The weights obtained by this technique can be used to directly multiply the corresponding NSR values. These are then used to compute the gain factors for attenuation of the respective frequency bands.
In another embodiment, the weights are adapted efficiently using a simpler adaptation technique for economical reasons. We fix the value of the weighting model parameter $k_0$ to $k_0 = 36$, which corresponds to $f_0 = 2880$ Hz in (16). Furthermore, we set the model parameter $b_n$ at sample time $n$ to be a function of $k_0$ and the remaining model parameter $c_n$ as follows:

$$b_n = \frac{1 - c_n}{k_0^2} \quad (26)$$

Equation (26) is obtained by setting $k = 0$ and $w_k = 1$ in (17). We adapt only $c_n$ to determine the curvature of the relative noise ratio weighting curve. The range of $c_n$ is restricted to $[0.1, 1.0]$. Several weighting curves corresponding to these specifications are shown in Figure 8. Lower values of $c_n$ correspond to the lower curves. When $c_n = 1$, no spectral weighting is performed, as shown by the uppermost line. For all other values of $c_n$, the curves vary monotonically in the same manner described in connection with Figure 7. The greatest amount of curvature is obtained when $c_n = 0.1$, as shown by the lowest curve. The applicants have found it advantageous to arrange the weighting values so that they vary monotonically between two frequencies separated by a factor of 2 (e.g., the weighting values vary monotonically between 1000 and 2000 Hz and/or between 1500 and 3000 Hz).
The determination of c_n is performed by comparing the total noise power in the lower half of the signal bandwidth to the total noise power in the upper half. We define the total noise power in the lower and upper half bands as:

$$P_{\text{total,lower}}(n) = \sum_{k \in F_{\text{lower}}} P_{N_k}(n) \qquad (27)$$

$$P_{\text{total,upper}}(n) = \sum_{k \in F_{\text{upper}}} P_{N_k}(n) \qquad (28)$$
Alternatively, lowpass and highpass filters could be used to filter x(n), followed by appropriate power measurement using (6), to obtain these noise powers. In our filter bank implementation, k ∈ {3, 4, ..., 42}, and hence F_lower = {3, 4, ..., 22} and F_upper = {23, 24, ..., 42}. Although these power measures may be updated every sample, they are updated once every 2T samples for economical reasons. Hence the value of c_n needs to be updated only as often as the power measures. It is defined as follows:
The min and max functions restrict c_n to lie within [0.1, 1.0].
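The half-band sums (27) and (28) can be sketched as below, using the band sets F_lower and F_upper given above. Because the defining equation for c_n is not reproduced in this text, `adapt_c_n` uses a HYPOTHETICAL upper-to-lower noise power ratio, clipped to [0.1, 1.0] with min and max as described; it is an illustrative assumption, not the patented formula. In practice these values would be refreshed once every 2T samples.

```python
from typing import Sequence, Tuple

F_LOWER = range(3, 23)   # F_lower = {3, 4, ..., 22}
F_UPPER = range(23, 43)  # F_upper = {23, 24, ..., 42}


def half_band_noise_powers(noise_power: Sequence[float]) -> Tuple[float, float]:
    """Equations (27) and (28); noise_power[k] holds P_Nk(n) for band k."""
    lower = sum(noise_power[k] for k in F_LOWER)
    upper = sum(noise_power[k] for k in F_UPPER)
    return lower, upper


def adapt_c_n(lower: float, upper: float, eps: float = 1e-12) -> float:
    """HYPOTHETICAL stand-in for the c_n definition: the upper/lower noise
    power ratio, restricted to [0.1, 1.0] by min and max."""
    ratio = upper / max(lower, eps)
    return min(max(ratio, 0.1), 1.0)


if __name__ == "__main__":
    # Noise concentrated in the lower half band gives a strongly curved weighting.
    low_heavy = [0.0] * 3 + [1.0] * 20 + [0.05] * 20
    print(adapt_c_n(*half_band_noise_powers(low_heavy)))
```

With this assumed mapping, equal noise in both halves yields c_n = 1 (no spectral weighting), while noise concentrated at low frequencies drives c_n toward its 0.1 floor.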
According to another embodiment, a curve such as that of Figure 7 could be stored as a weighting signal or table in memory 14 and used as static weighting values for each of the frequency band signals generated by filter 50. The curve could vary monotonically, as previously explained, or could vary according to the estimated spectral shape of the noise or the estimated overall noise power, P_BN(n), as explained in the next paragraphs.
Alternatively, the power spectral density shown in Figure 6 could be thought of as defining the spectral shape of the noise component of the communication signal received on channel 20. The value of c is altered according to the spectral shape in order to determine the value of w_k in equation (17). The spectral shape depends on the power of the noise component of the communication signal received on channel 20. As shown in equations (12) and (13), power is measured using time constants α_{N_k} and β_{N_k}, which vary according to the likelihood of speech as shown in Table 2. Thus, the weighting values determined according to the spectral shape of the noise component of the communication signal on channel 20 are derived in part from the likelihood that the communication signal is derived at least in part from speech. According to another embodiment, the weighting values could be determined from the overall background noise power. In this embodiment, the value of c in equation (17) is determined by the value of P_BN(n).
In general, according to the preceding paragraphs, the weighting values may vary in accordance with at least an approximation of one or more characteristics (e.g., the spectral shape of the noise or the overall background noise power) of the noise component of the communication signal on channel 20.
Perceptual Spectral Weighting
We have discovered that improved noise cancellation results from perceptual spectral weighting (PSW), in which different frequency bands are weighted differently based on their perceptual importance. Heavier weighting results in greater suppression in a frequency band. For a given SNR (or NSR), frequency bands where speech signals are more important to the perceptual quality are weighted less and hence suppressed less. Without such weighting, noisy speech may sometimes sound 'hollow' after noise reduction. Hollow sound has been a problem in previous noise reduction techniques because these systems tended to oversuppress the perceptually important parts of speech. Such oversuppression was partly due to not taking into account the perceptually important spectral interdependence of the speech signal.
The perceptual importance of different frequency bands changes depending on characteristics of the frequency distribution of the speech component of the communication signal being processed. Determining perceptual importance from such characteristics may be accomplished by a variety of methods. For example, the characteristics may be determined by the likelihood that a communication signal is derived from speech. As explained previously, this type of classification can be implemented by using a speech likelihood related signal, such as h_vdr. Assuming a signal was derived from speech, the type of signal can be further classified by determining whether the speech is voiced or unvoiced. Voiced speech results from vibration of the vocal cords and is illustrated by the utterance of a vowel sound. Unvoiced speech does not require vibration of the vocal cords and is illustrated by the utterance of a consonant sound.
The broad spectral shapes of typical voiced and unvoiced speech segments are shown in Figures 9 and 10, respectively. Typically, the 1000Hz to 3000Hz regions contain most of the power in voiced speech. For unvoiced speech, the higher frequencies (>2500Hz) tend to have greater overall power than the lower frequencies. The weighting in the PSW technique is adapted to maximize the perceived quality as the speech spectrum changes.
As in the RNR weighting technique, the perceptual spectral weighting may be implemented directly on the gain factors for the individual frequency bands. Another alternative is to weight the power measures appropriately. In our preferred method, the weighting is incorporated into the NSR measures.
The PSW technique may be implemented independently or in any combination with the overall NSR based weighting and RNR based weighting methods. In our preferred implementation, we implement PSW together with the other two techniques, as given in equation (2).
The weights in the PSW technique are selected to vary between zero and one. Larger weights correspond to greater suppression. The basic idea of PSW is to adapt the weighting curve in response to changes in the characteristics of the frequency distribution of at least some components of the communication signal on channel 20. For example, the weighting curve may be changed as the speech spectrum changes when the speech signal transitions from one type to another, e.g., from voiced to unvoiced and vice versa. In some embodiments, the weighting curve may be adapted to changes in the speech component of the communication signal. The regions that are most critical to perceived quality (and which are usually oversuppressed when using previous methods) are weighted less so that they are suppressed less. However, if these perceptually important regions contain a significant amount of noise, then their weights will be adapted closer to one.
Many weighting models can be devised to achieve PSW. In a manner similar to the RNR technique's weighting scheme given by equation (17), we utilize the practical and efficient model with parameters {b, k_0, c}:

$$v_k = b\,(k - k_0)^2 + c \qquad (30)$$

Here v_k is the weight for frequency band k. In this method, we vary only k_0 and c. This weighting curve is generally U-shaped and has a minimum value of c at frequency band k_0. For simplicity, we fix the weight at k = 0 to unity. This gives the following equation for b as a function of k_0 and c:

$$b = \frac{1 - c}{k_0^2} \qquad (31)$$
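A minimal sketch of the U-shaped PSW curve follows, evaluating (31) and then (30). The default of 51 bands (k = 0 to 50) is an assumption inferred from the stated correspondence of k_0 = 25 to 2000 Hz and k_0 = 50 to 4000 Hz (about 80 Hz per band); `psw_weights` is a hypothetical name.

```python
from typing import List


def psw_weights(k0: int, c: float, num_bands: int = 51) -> List[float]:
    """Perceptual spectral weights v_k for bands k = 0 .. num_bands - 1.

    b follows from fixing v_0 = 1 (equation (31)); the curve is equation (30).
    """
    if k0 <= 0:
        raise ValueError("k0 must be positive")
    b = (1.0 - c) / k0 ** 2                                   # equation (31)
    return [b * (k - k0) ** 2 + c for k in range(num_bands)]  # equation (30)


if __name__ == "__main__":
    # Minimum weight c sits at band k0; weights rise back toward 1 on either side.
    print([round(v, 3) for v in psw_weights(k0=30, c=0.25)[::5]])
```

Because k_0 is kept in [25, 50], every band k in 0..50 satisfies |k - k_0| <= k_0, so under these assumptions the weights stay between c and 1, consistent with PSW weights lying between zero and one.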
The lowest weight frequency band, k_0, is adapted based on the likelihood of the speech being voiced or unvoiced. In our preferred method, k_0 is allowed to be in the range [25, 50], which corresponds to the frequency range [2000 Hz, 4000 Hz]. During strong voiced speech, it is desirable for the lowest weight frequency band k_0 of the U-shaped weighting curve v_k to be near 2000 Hz. This ensures that the midband frequencies are weighted less in general. During unvoiced speech, the lowest weight frequency band k_0 is placed closer to 4000 Hz so that the mid to high frequencies are weighted less, since these frequencies contain most of the perceptually important parts of unvoiced speech. To achieve this, the lowest weight frequency band k_0 is varied with the speech likelihood related comparison signal, which is the hangover counter, h_vdr, in our preferred method. Recall that h_vdr is always in the range [0, h_max3 = 2000]. Larger values of h_vdr indicate higher likelihoods of speech and also a higher likelihood of voiced speech. Thus, in our preferred method, the lowest weight frequency band is varied with the speech likelihood related comparison signal as follows:
Since k_0 is an integer, the floor function ⌊·⌋ is used for rounding.
Next, the method for adapting the minimum weight c is presented. In one approach, the minimum weight c could be fixed to a small value such as 0.25. However, this would always keep the weights in the neighborhood of the lowest weight frequency band k_0 at this minimum value, even if there is a strong noise component in that neighborhood. This could result in insufficient noise attenuation. Hence we use the novel concept of a regional NSR to adapt the minimum weight.
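The placement of k_0 from the hangover counter h_vdr, described above, can be sketched as follows. Equation (32) itself is not reproduced in this text, so `lowest_weight_band` uses a HYPOTHETICAL linear mapping that only honours the stated constraints: k_0 is an integer in [25, 50], the floor function is used, and larger h_vdr (more likely voiced speech) moves k_0 toward 2000 Hz.

```python
import math

H_MAX3 = 2000            # upper limit of the hangover counter h_vdr stated in the text
K0_MIN, K0_MAX = 25, 50  # about 2000 Hz and 4000 Hz


def lowest_weight_band(h_vdr: int) -> int:
    """HYPOTHETICAL linear stand-in for equation (32), not the patented formula."""
    h = min(max(h_vdr, 0), H_MAX3)  # keep the counter inside its documented range
    # Larger h (more likely voiced) moves k0 down toward K0_MIN (about 2000 Hz).
    return K0_MAX - math.floor((K0_MAX - K0_MIN) * h / H_MAX3)


if __name__ == "__main__":
    for h in (0, 500, 1000, 1500, 2000):
        print(h, lowest_weight_band(h))
```

A pitch-based control signal, as mentioned later for pitch estimator 74, could replace h_vdr as the input to such a mapping.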
The regional NSR, NSR_regional(n), is defined with respect to the minimum weight frequency band k_0 and is given by:

$$\text{NSR}_{\text{regional}}(n) = \frac{\sum_{k=k_0-2}^{k_0+2} P_{N_k}(n)}{\sum_{k=k_0-2}^{k_0+2} P_k(n)} \qquad (33)$$

Basically, the regional NSR is the ratio of the noise power to the noisy signal power in a neighborhood of the minimum weight frequency band k_0. In our preferred method, we use up to 5 bands centered at k_0, as given in the above equation.
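A minimal sketch of the regional NSR: summed noise power over summed noisy-signal power in up to five bands centred at k_0. Clipping the neighbourhood at the ends of the band range (which is how "up to 5 bands" is read here) and returning 1.0 when there is no signal power are both assumptions; `regional_nsr` is a hypothetical name.

```python
from typing import Sequence


def regional_nsr(noise_power: Sequence[float], noisy_power: Sequence[float],
                 k0: int, half_width: int = 2) -> float:
    """Ratio of summed noise power to summed noisy-signal power over up to
    2 * half_width + 1 bands centred at k0, clipped at the band-range edges."""
    lo = max(k0 - half_width, 0)
    hi = min(k0 + half_width, len(noisy_power) - 1)
    noise = sum(noise_power[lo:hi + 1])
    noisy = sum(noisy_power[lo:hi + 1])
    # Assumption: with no measurable signal power, treat the region as all noise.
    return noise / noisy if noisy > 0 else 1.0
```

At k_0 = 50, the top of the assumed band range, only bands 48 to 50 contribute, which is one way the "up to 5 bands" wording can arise.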
In our preferred implementation, when the regional NSR is -15 dB or lower, we set the minimum weight c to 0.25 (which is about -12 dB). As the regional NSR approaches its maximum value of 0 dB, the minimum weight is increased towards unity. This can be achieved by adapting the minimum weight c at sample time n as

$$c(n) = \begin{cases} 0.25, & \text{NSR}_{\text{regional}}(n) \le 0.1778 = -15\ \text{dB} \\ 0.912\,\text{NSR}_{\text{regional}}(n) + 0.088, & 0.1778 < \text{NSR}_{\text{regional}}(n) \le 1 \end{cases} \qquad (34)$$

The v_k curves are plotted for a range of values of c and k_0 in Figures 11-13 to illustrate the flexibility that this technique provides in adapting the weighting curves. Regardless of k_0, the curves are flat when c = 1, which corresponds to the situation where the regional NSR is unity (0 dB). The curves shown in Figures 11-13 have the same monotonic properties and may be stored in memory 14 as a weighting signal or table in the same manner previously described in connection with Figure 7.
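The minimum-weight rule can be sketched as a clamped piecewise-linear function. The two anchor points come from the text (c = 0.25 at or below -15 dB, c rising to 1 at 0 dB); the slope 0.912 and intercept 0.088 are reconstructed here as the straight line through (0.1778, 0.25) and (1.0, 1.0), so treat them as an assumption. The extra clip for NSR values above 1 is also an added safeguard.

```python
NSR_LOW = 0.1778  # -15 dB in the 20*log10 convention used in the text


def minimum_weight(nsr_regional: float) -> float:
    """Minimum PSW weight c from the regional NSR (reconstructed equation (34))."""
    if nsr_regional <= NSR_LOW:
        return 0.25
    # Line through (0.1778, 0.25) and (1.0, 1.0); clip in case the NSR exceeds 1.
    return min(0.912 * nsr_regional + 0.088, 1.0)


if __name__ == "__main__":
    for db in (-30, -15, -6, 0):
        nsr = 10 ** (db / 20)
        print(db, round(minimum_weight(nsr), 3))
```

Feeding this c, together with a k_0 from the voicing decision, into the quadratic model (30) and (31) gives the family of adaptive curves that Figures 11-13 illustrate.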
As can be seen from equation (32), processor 12 generates a control signal from the speech likelihood signal h_vdr, which represents a characteristic of the speech and noise components of the communication signal on channel 20. As previously explained, the likelihood signal can also be used as a measure of whether the speech is voiced or unvoiced. Determining whether the speech is voiced or unvoiced can also be accomplished by other means, which are known to those skilled in the field of communications.
The characteristics of the frequency distribution of the speech component of the channel 20 signal needed for PSW can also be determined from the output of pitch estimator 74. In this embodiment, the pitch estimate is used as a control signal that indicates these characteristics. The pitch estimate, or more specifically the rate of change of the pitch, can be used to solve for k_0 in equation (32). A slow rate of change would correspond to smaller k_0 values, and vice versa.
In one embodiment of PSW, the calculated weights for the different bands are based on an approximation of the broad spectral shape, or envelope, of the speech component of the communication signal on channel 20. More specifically, the calculated weighting curve has a generally inverse relationship to the broad spectral shape of the speech component of the channel 20 signal. An example of such an inverse relationship is to make the weighting curve inversely proportional to the speech spectrum, so that when the broad spectral shape of the speech spectrum is multiplied by the weighting curve, the resulting broad spectral shape is approximately flat (constant) at all frequencies in the frequency bands of interest. This differs from standard spectral subtraction weighting, which is based on the noise-to-signal ratio of individual bands. In this embodiment of PSW, we take into consideration the entire speech signal (or a significant portion of it) to determine the weighting curve for all the frequency bands. In spectral subtraction, the weights are determined based only on the individual bands. Even in a spectral subtraction implementation such as that of Figure 1B, only the overall SNR or NSR is considered, not the broad spectral shape.
Computation of Broad Spectral Shape or Envelope of Speech
There are many methods available to approximate the broad spectral shape of the speech component of the channel 20 signal. For instance, linear prediction analysis techniques, commonly used in speech coding, can be used to determine the spectral shape.
Alternatively, if the noise and signal powers of individual frequency bands are tracked using equations such as (12) and (13), the speech spectrum power in the kth band can be estimated as P_k(n) - P_{N_k}(n). Since the goal is to obtain the broad spectral shape, the total power, P_k(n), may be used to approximate the speech power in the band. This is reasonable since, when speech is present, the signal spectrum shape is usually dominated by the speech spectrum shape. The set of band power values together provides the broad spectral shape estimate, or envelope estimate. The number of band power values in the set will vary depending on the desired accuracy of the estimate. Smoothing these band power values with moving average techniques is also beneficial to remove jaggedness in the envelope estimate.
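As a minimal sketch of the envelope estimate, the per-band speech power P_k(n) - P_{N_k}(n) can be formed and then smoothed across bands with a centred moving average. Flooring the difference at zero, the 3-band default window, and the shrinking window at the edges are illustrative choices, not values from the text.

```python
from typing import List, Sequence


def speech_power_estimate(noisy_power: Sequence[float],
                          noise_power: Sequence[float]) -> List[float]:
    """Per-band speech power P_k(n) - P_Nk(n), floored at zero (floor is an assumption)."""
    return [max(p - pn, 0.0) for p, pn in zip(noisy_power, noise_power)]


def broad_envelope(band_power: Sequence[float], window: int = 3) -> List[float]:
    """Centred moving average across bands; the window shrinks at the band edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(band_power)
    env = []
    for k in range(n):
        seg = band_power[max(0, k - half):min(n, k + half + 1)]
        env.append(sum(seg) / len(seg))
    return env
```

Passing the total band powers P_k(n) directly to `broad_envelope`, instead of the output of `speech_power_estimate`, corresponds to the simpler approximation described above.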
Computation of Perceptual Spectral Weighting Curve
After the broad spectral shape is approximated, the perceptual weighting curve may be determined to be inversely proportional to the broad spectral shape approximation. For instance, if P_k(n) is used as the broad spectral shape estimate in the kth band, then the weight for the kth band, v_k, may be determined as v_k(n) = ψ / P_k(n), where ψ is a predetermined value. In this embodiment, a set of speech power values, such as a set of P_k(n) values, is used as a control signal indicating the characteristics of the frequency distribution of the speech component of the channel 20 signal needed for PSW. By using the foregoing spectral shape estimate and weighting curve, the variation of the power signals used for the estimate is reduced across the N frequency bands. For instance, the spectrum shape of the speech component of the channel 20 signal is made more nearly flat across the N frequency bands, and the variation in the spectrum shape is reduced.
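The inverse-proportional rule v_k(n) = ψ / P_k(n) can be checked with a short sketch: multiplying the envelope by the weights yields a constant ψ in every band, i.e. zero spread. The `eps` guard against empty bands and the max/min `spread` measure are illustrative assumptions.

```python
from typing import List, Sequence


def inverse_weights(envelope: Sequence[float], psi: float,
                    eps: float = 1e-12) -> List[float]:
    """v_k(n) = psi / P_k(n); eps guards against empty bands (an assumption)."""
    return [psi / max(p, eps) for p in envelope]


def spread(values: Sequence[float]) -> float:
    """Max/min ratio, used here as a simple measure of spectral variation."""
    return max(values) / min(values)


if __name__ == "__main__":
    env = [2.0, 4.0, 8.0, 4.0]
    v = inverse_weights(env, psi=min(env))
    flattened = [p * w for p, w in zip(env, v)]
    print(spread(env), spread(flattened))
```

Choosing ψ equal to the smallest envelope value, as in the usage lines, is one way to keep every weight at or below one; the text leaves ψ as a predetermined value.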
For economical reasons, our preferred implementation uses a parametric technique, which also has the advantage that the weighting curve is always smooth across frequencies. The weighting curve is formed from a few parameters that are adapted based on the spectral shape; the number of parameters is less than the number of weighting factors. The parametric weighting function in our economical implementation is given by equation (30), which is a quadratic curve with three parameters.
Use of Weighting Functions
Although we have implemented weighting functions based on overall NSR (u_k), perceptual spectral weighting (v_k) and relative noise ratio weighting (w_k) jointly, a noise cancellation system will benefit from the implementation of only one of these functions or of various combinations of them.
In our preferred embodiment, we implement the weighting on the NSR values for the different frequency bands. One could just as well implement these weighting functions, after appropriate modifications, directly on the gain factors.
Alternatively, one could apply the weights directly to the power measures prior to computation of the noise-to-signal values or the gain factors. A further possibility is to apply the different weighting functions to different variables in the ANC system, as appropriate. Thus, the novel weighting techniques described are not restricted to specific implementations.
Spectral Smoothing and Gain Variance Reduction Across Frequency Bands
In some noise cancellation applications, the bandpass filters of the filter bank used to separate the speech signal into different frequency band components have little overlap. Specifically, the magnitude frequency response of one filter does not significantly overlap the magnitude frequency response of any other filter in the filter bank. This is also usually true for discrete Fourier transform or fast Fourier transform based implementations. In such cases, we have discovered that improved noise cancellation can be achieved by interdependent gain adjustment. Such adjustment is effected by smoothing the input signal spectrum and reducing the variance of the gain factors across the frequency bands, according to the techniques described below. Splitting the speech signal into different frequency bands and applying independently determined gain factors to each band can sometimes destroy the natural spectral shape of the speech signal. Smoothing the gain factors across the bands can help to preserve the natural spectral shape of the speech signal. Furthermore, it also reduces the variance of the gain factors.
This smoothing of the gain factors, G_k(n) (equation (1)), can be performed by modifying each of the initial gain factors as a function of at least two of the initial gain factors. The initial gain factors preferably are generated in the form of signals with initial gain values in function block 130 (Figure 3) according to equation (1). According to the preferred embodiment, the initial gain factors or values are modified using a weighted moving average. The gain factors corresponding to the low and high values of k must be handled slightly differently to prevent edge effects. The initial gain factors are modified by recalculating equation (1) in function block 130 to a preferred form of modified gain signals having modified gain values or factors. Then the modified gain factors are used for gain multiplication by equation (3) in function block 140 (Figure 3).
More specifically, we compute the modified gains by first computing a set of initial gain values, G'_k(n). We then perform a moving average weighting of these initial gain factors with neighboring gain values to obtain a new set of gain values, G_k(n). The modified gain values derived from the initial gain values are given by

$$G_k(n) = \sum_{j} M_{k,j}\, G'_j(n) \qquad (35)$$

where the sum runs over band k and its neighboring bands. The coefficients M_{k,j} are the moving average coefficients tabulated below for our preferred embodiment.
We have discovered that improved noise cancellation is possible with coefficients selected from the following ranges of values. One of the coefficients is in the range of 10 to 50 times the value of the sum of the other coefficients. For example, the coefficient 0.95 is in the range of 10 to 50 times the value of the sum of the other coefficients shown in each line of the preceding table. More specifically, the coefficient 0.95 is in the range from 0.90 to 0.98, and the coefficient 0.05 is in the range from 0.02 to 0.09.
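A minimal sketch of the moving-average gain smoothing of (35) follows. Since the coefficient table is not reproduced here, the rows are HYPOTHETICAL: interior bands use (0.025, 0.95, 0.025) and the two edge bands use (0.95, 0.05) toward their single neighbour. These values respect the stated ranges (0.95 is 19 times the sum of the other coefficients in each row) and each row sums to one, so flat gains stay flat.

```python
from typing import List, Sequence


def smooth_gains(initial: Sequence[float], centre: float = 0.95) -> List[float]:
    """G_k(n) = sum_j M_kj * G'_j(n) with HYPOTHETICAL coefficient rows:
    interior (side/2, centre, side/2); edges (centre, side), side = 1 - centre."""
    n = len(initial)
    if n < 2:
        return list(initial)
    side = 1.0 - centre
    out = []
    for k in range(n):
        if k == 0:                      # lowest band: only an upper neighbour
            out.append(centre * initial[0] + side * initial[1])
        elif k == n - 1:                # highest band: only a lower neighbour
            out.append(centre * initial[-1] + side * initial[-2])
        else:                           # interior band: split side weight evenly
            out.append(centre * initial[k]
                       + 0.5 * side * (initial[k - 1] + initial[k + 1]))
    return out
```

Treating the first and last bands separately is one simple way to handle the edge effects mentioned above.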
In another embodiment, we compute the gain factor for a particular frequency band as a function not only of the corresponding noisy signal and noise powers, but also of the neighboring noisy signal and noise powers. Recall equation (1):
In this equation, the gain for frequency band k depends on NSR_k(n), which in turn depends on the noise power, P_{N_k}(n), and the noisy signal power, P_k(n), of the same frequency band. We have discovered an improvement on this concept whereby G_k(n) is computed as a function of noise power and noisy signal power values from multiple frequency bands. According to this improvement, G_k(n) may be computed at n = 0, T, 2T, ... using one of the following methods, with G_k(n) = G_k(n - 1) for n = 1, 2, ..., T - 1, T + 1, ..., 2T - 1, ...:

$$G_k(n) = 1 - \sum_{j} M_{k,j}\, \text{NSR}_j(n) \qquad (1.1)$$

$$G_k(n) = 1 - \frac{\sum_{j} M_{k,j}\, P_{N_j}(n)}{P_k(n)} \qquad (1.2)$$

$$G_k(n) = 1 - \frac{P_{N_k}(n)}{\sum_{j} M_{k,j}\, P_j(n)} \qquad (1.3)$$

$$G_k(n) = 1 - \frac{\sum_{j} M_{k,j}\, P_{N_j}(n)}{\sum_{j} M_{k,j}\, P_j(n)} \qquad (1.4)$$
Our preferred embodiment uses equation (1.4), with the coefficients M_{k,j} determined using the same table given above.
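The four gain variants can be compared in a short sketch. It assumes the gain has the form 1 - (noise)/(noisy) suggested by the surviving fragments of (1.2), and it reuses the HYPOTHETICAL smoothing rows (0.025, 0.95, 0.025) with (0.95, 0.05) at the edges, since the coefficient table is not reproduced. Gains are computed only at update instants n = 0, T, 2T, ...; holding them in between is left to the caller, and a practical system would normally also limit the gain range.

```python
from typing import List, Sequence

CENTRE = 0.95  # hypothetical; the coefficient table is not reproduced here


def smooth(values: Sequence[float]) -> List[float]:
    """Weighted moving average across bands (same hypothetical rows as for (35))."""
    n, side = len(values), 1.0 - CENTRE
    if n < 2:
        return list(values)
    out = [CENTRE * values[0] + side * values[1]]
    for k in range(1, n - 1):
        out.append(CENTRE * values[k] + 0.5 * side * (values[k - 1] + values[k + 1]))
    out.append(CENTRE * values[-1] + side * values[-2])
    return out


def band_gains(noise: Sequence[float], noisy: Sequence[float],
               method: str) -> List[float]:
    """Gains at an update instant; G_k(n) is held between updates."""
    if method == "1.1":   # average the per-band NSRs, then form the gain
        return [1.0 - r for r in smooth([pn / p for pn, p in zip(noise, noisy)])]
    if method == "1.2":   # smooth the noise spectrum only
        return [1.0 - pn / p for pn, p in zip(smooth(noise), noisy)]
    if method == "1.3":   # smooth the noisy-signal spectrum only
        return [1.0 - pn / p for pn, p in zip(noise, smooth(noisy))]
    if method == "1.4":   # smooth both spectra, then take the ratio
        return [1.0 - pn / p for pn, p in zip(smooth(noise), smooth(noisy))]
    raise ValueError(f"unknown method {method!r}")
```

With flat spectra all four methods agree. With a single strong band they differ, because (1.4) averages powers before forming the ratio while (1.1) averages ratios.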
Methods described by equations (1.1)-(1.4) all provide smoothing of the input signal spectrum and reduction in variance of the gain factors across the frequency bands. Each method has its own particular advantages and trade-offs. The first method (1.1) is simply an alternative to smoothing the gains directly.
The method of (1.2) provides smoothing across the noise spectrum only, while (1.3) provides smoothing across the noisy signal spectrum only. Each of these has the advantage of maintaining the average spectral shape of the corresponding signal. By performing the averaging in (1.2), sudden bursts of noise occurring in a particular band for very short periods do not adversely affect the estimate of the noise spectrum. Similarly, in method (1.3), the broad spectral shape of the speech spectrum, which is generally smooth, does not become too jagged in the noisy signal power estimates due to, for instance, the changing pitch of the speaker. The method of (1.4) combines the advantages of both (1.2) and (1.3).
There is a subtle difference between (1.4) and (1.1). In (1.4), the averaging is performed prior to determining the NSR. In (1.1), the NSR values are computed first and then averaged. Method (1.4) is computationally more expensive than (1.1) but performs better.
References
[1] IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 28, no. 2, Apr. 1980, pp. 137-145, "Speech Enhancement Using a Soft-Decision Noise Suppression Filter", Robert J. McAulay and Marilyn L. Malpass.
[2] IEEE Conference on Acoustics, Speech and Signal Processing, Apr. 1979, pp. 208-211, "Enhancement of Speech Corrupted by Acoustic Noise", M. Berouti, R. Schwartz and J. Makhoul.
[3] Advanced Signal Processing and Digital Noise Reduction, 1996, Chapter 9, pp. 242-260, Saeed V. Vaseghi (Wiley, ISBN 0471958751).
[4] Proceedings of the IEEE, vol. 67, no. 12, Dec. 1979, pp. 1586-1604, "Enhancement and Bandwidth Compression of Noisy Speech", Jae S. Lim and Alan V. Oppenheim.
[5] U.S. Patent 4,351,983, Sep. 28, 1982, "Speech Detector with Variable Threshold", William G. Crouse and Charles R. Knox.
Those skilled in the art will recognize that the preceding detailed description discloses the preferred embodiments and that those embodiments may be altered and modified without departing from the true spirit and scope of the invention as defined by the accompanying claims. For example, the numerators and denominators of the ratios shown in this specification could be reversed, and the shape of the curves shown in Figures 5, 7 and 8 could be reversed by making other suitable changes in the algorithms. In addition, the function blocks shown in Figure 3 could be implemented in whole or in part by application specific integrated circuits or other forms of logic circuits capable of performing logical and arithmetic operations.

Claims

What is claimed is:
1. In a communication system for processing a communication signal including a speech component derived from speech and a noise component derived from noise, a method for enhancing the quality of the communication signal comprising: dividing said communication signal into a plurality of frequency band signals representing said communication signal in a plurality of frequency bands; generating in response to said speech component a control signal indicating one or more characteristics of the frequency distribution of said speech component corresponding to at least some of said frequency bands; assigning weighting values to said frequency band signals in response to said control signal; altering the frequency band signals in response to said weighting values to generate weighted frequency band signals; and combining said weighted frequency bands signals to generate a communication signal with enhanced quality.
2. A method, as claimed in claim 1, wherein said generating further is responsive to said noise component and wherein said control signal represents said one or more characteristics by a plurality of likelihood values representing likelihoods that said communication signal is derived from speech.
3. A method, as claimed in claim 2, wherein said likelihood values represent voiced speech and unvoiced speech.
4. A method, as claimed in claim 1, wherein said one or more characteristics comprise voiced speech and unvoiced speech.
5. A method, as claimed in claim 1, wherein said one or more characteristics comprise at least an approximation of the spectrum shape of said speech component within at least some of said frequency bands.
6. A method, as claimed in claim 5, wherein said weighting values represent a weighting curve at least approximating inverse proportionality to the spectrum shape.
7. A method, as claimed in claim 5, and further comprising the step of generating power signals having power values derived from the power of at least some of said frequency band signals and wherein said control signal is derived from said power values.
8. A method, as claimed in claim 7, wherein said weighting values are assigned so that the variation of said power signals across at least some of said frequency bands is reduced.
9. A method, as claimed in claim 1, wherein said one or more characteristics comprise the pitch of said speech component.
10. A method, as claimed in claim 1, wherein said weighting values vary monotonically from a first value at a first frequency to a second value different from said first value at a second frequency greater than said first frequency by at least a factor of 2.
11. A method, as claimed in claim 10, wherein said weighting values also vary monotonically from said second value to a third value between said first value and second value at a frequency greater than said second frequency.
12. A method, as claimed in claim 1, wherein said assigning weighting values comprises assigning a weighting value resulting in a minimum of suppression to one of the frequency band signals selected in response to said one or more characteristics.
13. A method, as claimed in claim 12, and further comprising the step of determining a ratio of properties of said signal component and said noise component within at least said one frequency band and wherein said weighting values comprise a range of said weighting values assigned to said one frequency band in response to said ratio.
14. A method, as claimed in claim 13, wherein said one or more characteristics comprise a plurality of values representing likelihoods that said communication signal is derived from speech and wherein said properties comprise the power of said signal component and the power of said noise component.
15. A method, as claimed in claim 1, wherein the weighting values are derived from a table of weighting values.
16. A method, as claimed in claim 1, wherein the weighting values are derived from a function parameterized by a number of parameters fewer than the number of the weighting values.
17. In a communication system for processing a communication signal including a speech component derived from speech and a noise component derived from noise, apparatus for enhancing the quality of the communication signal comprising: means for dividing said communication signal into a plurality of frequency band signals representing said communication signal in a plurality of frequency bands; and a calculator generating in response to said speech component a control signal indicating one or more characteristics of the frequency distribution of said speech component in at least some of said frequency bands, assigning weighting values to said frequency band signals in response to said control signal, altering the frequency band signals in response to said weighting values to generate weighted frequency band signals and combining said weighted frequency bands signals to generate a communication signal with enhanced quality.
18. Apparatus, as claimed in claim 17, wherein said calculator further is responsive to said noise component and wherein said control signal represents said one or more characteristics by a plurality of likelihood values representing likelihoods that said communication signal is derived from speech.
19. Apparatus, as claimed in claim 18, wherein said likelihood values represent voiced speech and unvoiced speech.
20. Apparatus, as claimed in claim 17, wherein said one or more characteristics comprise voiced speech and unvoiced speech.
21. Apparatus, as claimed in claim 17, wherein said one or more characteristics comprise at least an approximation of the spectrum shape of said speech component within at least some of said frequency bands.
22. Apparatus, as claimed in claim 21, wherein said weighting values represent a weighting curve at least approximating inverse proportionality to the spectrum shape.
23. Apparatus, as claimed in claim 21, wherein said calculator generates power signals having power values derived from the power of at least some of said frequency band signals and wherein said control signal is derived from said power values.
24. Apparatus, as claimed in claim 23, wherein said weighting values are assigned so that the variation of said power values across at least some of said frequency bands is reduced.
25. Apparatus, as claimed in claim 17, wherein said one or more characteristics comprise the pitch of said speech component.
26. Apparatus, as claimed in claim 17, wherein said weighting values vary monotonically from a first value at a first frequency to a second value different from said first value at a second frequency greater than said first frequency by at least a factor of 2.
27. Apparatus, as claimed in claim 26, wherein said weighting values also vary monotonically from said second value to a third value between said first value and second value at a frequency greater than said second frequency.
28. Apparatus, as claimed in claim 17, wherein said weighting values comprise a weighting value resulting in a minimum of suppression to one of the frequency band signals selected in response to said one or more characteristics.
29. Apparatus, as claimed in claim 28, wherein said calculator determines a ratio of properties of said signal component and said noise component within at least said one frequency band and wherein said weighting values comprise a range of said weighting values assigned to said one frequency band in response to said ratio.
30. Apparatus, as claimed in claim 29, wherein said one or more characteristics comprise a plurality of values representing likelihoods that said communication signal is derived from speech and wherein said properties comprise the power of said signal component and the power of said noise component.
31. Apparatus, as claimed in claim 17, wherein the weighting values are derived from a table of weighting values.
32. Apparatus, as claimed in claim 17, wherein the weighting values are derived from a function parameterized by a number of parameters fewer than the number of the weighting values.
EP01918328A 2000-03-28 2001-03-02 Perceptual spectral weighting of frequency bands for adaptive noise cancellation Withdrawn EP1287521A4 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US53781400A 2000-03-28 2000-03-28
US537814 2000-03-28
PCT/US2001/006888 WO2001073759A1 (en) 2000-03-28 2001-03-02 Perceptual spectral weighting of frequency bands for adaptive noise cancellation

Publications (2)

Publication Number Publication Date
EP1287521A1 2003-03-05
EP1287521A4 2005-11-16

Family

ID=24144210

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01918328A Withdrawn EP1287521A4 (en) 2000-03-28 2001-03-02 Perceptual spectral weighting of frequency bands for adaptive noise cancellation

Country Status (4)

Country Link
EP (1) EP1287521A4 (en)
AU (1) AU2001245418A1 (en)
CA (1) CA2401672A1 (en)
WO (1) WO2001073759A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101385079B (en) * 2006-02-14 2012-08-29 法国电信公司 Device for perceptual weighting in audio encoding/decoding
US8194873B2 (en) 2006-06-26 2012-06-05 Davis Pan Active noise reduction adaptive filter leakage adjusting
GB0725113D0 (en) * 2007-12-21 2008-01-30 Wolfson Microelectronics Plc SNR dependent gain
GB0725110D0 (en) * 2007-12-21 2008-01-30 Wolfson Microelectronics Plc Gain control based on noise level
US8204242B2 (en) 2008-02-29 2012-06-19 Bose Corporation Active noise reduction adaptive filter leakage adjusting
US8306240B2 (en) 2008-10-20 2012-11-06 Bose Corporation Active noise reduction adaptive filter adaptation rate adjusting
US8355512B2 (en) 2008-10-20 2013-01-15 Bose Corporation Active noise reduction adaptive filter leakage adjusting
US8160271B2 (en) * 2008-10-23 2012-04-17 Continental Automotive Systems, Inc. Variable noise masking during periods of substantial silence
US9173025B2 (en) 2012-02-08 2015-10-27 Dolby Laboratories Licensing Corporation Combined suppression of noise, echo, and out-of-location signals
CN103325380B (en) 2012-03-23 2017-09-12 Dolby Laboratories Licensing Corporation Gain post-processing for signal enhancement

Citations (5)

Publication number Priority date Publication date Assignee Title
US4630305A (en) * 1985-07-01 1986-12-16 Motorola, Inc. Automatic gain selector for a noise suppression system
US5757937A (en) * 1996-01-31 1998-05-26 Nippon Telegraph And Telephone Corporation Acoustic noise suppressor
US5768473A (en) * 1995-01-30 1998-06-16 Noise Cancellation Technologies, Inc. Adaptive speech filter
US5806025A (en) * 1996-08-07 1998-09-08 U S West, Inc. Method and system for adaptive filtering of speech signals using signal-to-noise ratio to choose subband filter bank
US5812970A (en) * 1995-06-30 1998-09-22 Sony Corporation Method based on pitch-strength for reducing noise in predetermined subbands of a speech signal

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US4185168A (en) * 1976-05-04 1980-01-22 Causey G Donald Method and means for adaptively filtering near-stationary noise from an information bearing signal
WO1995015550A1 (en) * 1993-11-30 1995-06-08 At & T Corp. Transmitted noise reduction in communications systems
US5706395A (en) * 1995-04-19 1998-01-06 Texas Instruments Incorporated Adaptive weiner filtering using a dynamic suppression factor

Non-Patent Citations (3)

Title
MCAULAY R J ET AL: "SPEECH ENHANCEMENT USING A SOFT-DECISION NOISE SUPPRESSION FILTER" IEEE TRANSACTIONS ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, IEEE INC. NEW YORK, US, vol. ASSP-28, no. 2, 1 April 1980 (1980-04-01), pages 137-145, XP000647165 ISSN: 0096-3518 *
See also references of WO0173759A1 *
YANG J ED - INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS: "Frequency domain noise suppression approaches in mobile telephone systems" STATISTICAL SIGNAL AND ARRAY PROCESSING. MINNEAPOLIS, APR. 27 - 30, 1993, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), NEW YORK, IEEE, US, vol. 4, 27 April 1993 (1993-04-27), pages 363-366, XP010110469 ISBN: 0-7803-0946-4 *

Also Published As

Publication number Publication date
CA2401672A1 (en) 2001-10-04
WO2001073759A1 (en) 2001-10-04
EP1287521A4 (en) 2005-11-16
AU2001245418A1 (en) 2001-10-08

Similar Documents

Publication Publication Date Title
US6529868B1 (en) Communication system noise cancellation power signal calculation techniques
US6766292B1 (en) Relative noise ratio weighting techniques for adaptive noise cancellation
US6839666B2 (en) Spectrally interdependent gain adjustment techniques
US6671667B1 (en) Speech presence measurement detection techniques
US6023674A (en) Non-parametric voice activity detection
RU2329550C2 (en) Method and device for enhancement of voice signal in presence of background noise
US5839101A (en) Noise suppressor and method for suppressing background noise in noisy speech, and a mobile station
US7058572B1 (en) Reducing acoustic noise in wireless and landline based telephony
US6415253B1 (en) Method and apparatus for enhancing noise-corrupted speech
US7873114B2 (en) Method and apparatus for quickly detecting a presence of abrupt noise and updating a noise estimate
US8352257B2 (en) Spectro-temporal varying approach for speech enhancement
US20050240401A1 (en) Noise suppression based on Bark band weiner filtering and modified doblinger noise estimate
WO2000017855A1 (en) Noise suppression for low bitrate speech coder
MX2011001339A (en) Apparatus and method for processing an audio signal for speech enhancement using a feature extraction.
US8712768B2 (en) System and method for enhanced artificial bandwidth expansion
EP1287521A1 (en) Perceptual spectral weighting of frequency bands for adaptive noise cancellation
Upadhyay et al. Spectral subtractive-type algorithms for enhancement of noisy speech: an integrative review
Nemer Acoustic Noise Reduction for Mobile Telephony
Upadhyay et al. Spectral Subtractive-Type Algorithms for Enhancement of Noisy Speech: An Integrative

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20021028

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the European patent

Extension state: AL LT LV MK RO SI

RIN1 Information on inventor provided before grant (corrected)

Inventor name: MARCHOK, DANIEL, J.

Inventor name: DUNNE, BRUCE, E.

Inventor name: CHANDRAN, RAVI

A4 Supplementary search report drawn up and despatched

Effective date: 20050930

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: TELLABS OPERATIONS, INC.

STAA Information on the status of an EP patent application or granted EP patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20091001