EP1016072B1 - Method and apparatus for suppressing noise in a digital speech signal - Google Patents

Method and apparatus for suppressing noise in a digital speech signal

Info

Publication number
EP1016072B1
EP1016072B1 (application EP98943999A)
Authority
EP
European Patent Office
Prior art keywords
signal
speech signal
noise
frame
spectral
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP98943999A
Other languages
English (en)
French (fr)
Other versions
EP1016072A1 (de)
Inventor
Philip Lockwood
Stéphane LUBIARZ
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nortel Networks France SAS
Original Assignee
Matra Nortel Communications SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matra Nortel Communications SAS
Publication of EP1016072A1
Application granted
Publication of EP1016072B1
Anticipated expiration
Expired - Lifetime (current status)

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0232Processing in the frequency domain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0264Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

Definitions

  • The present invention relates to techniques for the digital denoising of speech signals. It relates more particularly to denoising by non-linear spectral subtraction.
  • This technique makes it possible to obtain acceptable denoising for strongly voiced signals, but it completely distorts the speech signal. In the presence of relatively coherent noise, such as that caused by the contact of car tires on the road or the clicking of an engine, the noise may be more easily predictable than the unvoiced speech signal. The speech signal then tends to be projected onto part of the noise vector space.
  • The method thus disregards the nature of the speech signal, especially in unvoiced speech areas where predictability is reduced.
  • Predicting the speech signal from a reduced set of parameters does not make it possible to account for all the intrinsic richness of speech. This shows the limits of techniques based only on mathematical considerations that overlook the special character of speech.
  • A main object of the present invention is to propose a new denoising technique that takes into account the characteristics of speech perception by the human ear, allowing effective denoising without degrading the perception of the speech.
  • A method as set out in claim 1 and a device as set out in claim 19 are provided.
  • The second subtracted quantity can in particular be limited to the fraction of the increased estimate of the corresponding spectral component of the noise which exceeds the masking curve. This way of proceeding is based on the observation that it is enough to remove the audible noise frequencies; conversely, there is no point in removing noise that is masked by the speech.
  • Overestimating the spectral envelope of the noise is generally desirable so that the increased estimate thus obtained is robust to sudden variations of the noise.
  • However, this overestimation usually has the downside of distorting the speech signal when it becomes too large. It affects the voiced character of the speech signal by removing some of its predictability.
  • This disadvantage is particularly troublesome under telephony conditions, because it is in the voiced areas that the speech signal is most energetic.
  • The invention makes it possible to greatly reduce this drawback.
  • The denoising system shown in FIG. 1 processes a digital speech signal s.
  • The signal frame is transformed into the frequency domain by a module 11 applying a conventional fast Fourier transform (TFR) algorithm to compute the modulus of the signal spectrum.
  • The frequency resolution available at the output of the fast Fourier transform is not used as such; instead, a lower resolution is used, determined by a number I of frequency bands covering the band [0, F_e/2] of the signal.
  • A module 12 computes the respective averages S_n,i of the spectral components S_n,f of the speech signal in these bands, for example with a uniform weighting such that:
  • This averaging reduces the fluctuations between bands by averaging the noise contributions within them, which decreases the variance of the noise estimator. In addition, it allows a large reduction in the complexity of the system. A sketch of this band averaging is given below.
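  • As an illustration, the following Python sketch shows one way such a uniform band averaging could be implemented; the helper name band_average and the band_edges layout are illustrative assumptions, and the exact weighting formula of the patent is not reproduced here.

      import numpy as np

      def band_average(spectrum_mag, band_edges):
          """Average FFT magnitude components |S_n,f| into I frequency bands.

          spectrum_mag : magnitudes of one frame, f = 0 .. N/2
          band_edges   : I + 1 bin indices delimiting the bands (hypothetical layout)
          Returns one averaged component S_n,i per band.
          """
          averages = np.empty(len(band_edges) - 1)
          for i in range(len(band_edges) - 1):
              lo, hi = band_edges[i], band_edges[i + 1]
              averages[i] = spectrum_mag[lo:hi].mean()  # uniform weighting over the band
          return averages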
  • The averaged spectral components S_n,i are supplied to a voice activity detection module 15 and to a noise estimation module 16. These two modules 15, 16 operate jointly, in the sense that the degrees of vocal activity γ_n,i measured for the different bands by module 15 are used by module 16 to estimate the long-term energy of the noise in the different bands, while these long-term estimates B̂_n,i are used by module 15 to carry out an a priori denoising of the speech signal in the different bands in order to determine the degrees of vocal activity γ_n,i.
  • Modules 15 and 16 may correspond to the flowcharts shown in Figures 2 and 3.
  • Module 15 first carries out the a priori denoising of the speech signal in the different bands i for the signal frame n.
  • This a priori denoising is carried out according to a conventional non-linear spectral subtraction process, based on noise estimates obtained during one or more previous frames.
  • Here τ1 and τ2 are delays expressed in numbers of frames (τ1 ≥ 1, τ2 ≥ 0), and α'_n,i is a noise overestimation coefficient whose determination will be explained later.
  • The spectral components of the a priori denoised signal are computed according to formula (3), where βp_i is a floor coefficient close to 0, conventionally used to prevent the spectrum of the denoised signal from taking negative or excessively low values that would give rise to musical noise.
  • Steps 17 to 20 therefore essentially consist in subtracting from the spectrum of the signal an estimate of the noise spectrum, estimated a priori and increased by the coefficient α'_{n−τ1,i}. A sketch of this a priori subtraction is given below.
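  • The following Python sketch illustrates this kind of a priori non-linear spectral subtraction, assuming the common form "subtract the overestimated noise, then apply a spectral floor"; the function name, the default floor value and the handling of the delays τ1, τ2 are illustrative assumptions.

      import numpy as np

      def a_priori_denoise(S_band, B_prev, alpha_prev, beta_floor=0.01):
          """A priori non-linear spectral subtraction per band (sketch).

          S_band     : averaged components S_n,i of the current frame
          B_prev     : noise estimates B_hat taken from a previous frame (delays tau1/tau2)
          alpha_prev : overestimation coefficients alpha'_{n-tau1,i}
          beta_floor : floor coefficient close to 0 (limits musical noise)
          """
          subtracted = S_band - alpha_prev * B_prev  # overestimated subtraction
          floor = beta_floor * B_prev                # spectral floor (assumed form)
          return np.maximum(subtracted, floor)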
  • Module 15 computes, for each band i (0 ≤ i ≤ I), a quantity ΔE_n,i representing the short-term variation of the energy of the denoised signal in band i, as well as a long-term value Ē_n,i of the energy of the denoised signal in band i.
  • In step 25, the quantity ΔE_n,i is compared with a first threshold. If this threshold is not reached, the counter b_i is incremented by one unit in step 26.
  • In step 27, the long-term estimator ba_i is compared with the smoothed energy value Ē_n,i. If ba_i ≥ Ē_n,i, the estimator ba_i is taken equal to the smoothed value Ē_n,i in step 28, and the counter b_i is reset to zero.
  • The quantity ρ_i, which is taken equal to the ratio ba_i / Ē_n,i (step 36), is then equal to 1.
  • If step 27 shows that ba_i < Ē_n,i, the counter b_i is compared with a limit value bmax in step 29. If b_i > bmax, the signal is considered too stationary to support vocal activity.
  • Bm represents an update coefficient between 0.90 and 1. Its value differs depending on the state of a voice activity detection automaton (steps 30 to 32). This state δ_{n−1} is the one determined during the processing of the previous frame.
  • In the presence of speech, the coefficient Bm takes a value Bmp very close to 1, so that the noise estimator is only very slightly updated. Otherwise, the coefficient Bm takes a lower value Bms, to allow a more significant update of the noise estimator during silence phases.
  • The difference ba_i − bi_i between the long-term estimator and the internal noise estimator is compared with a second threshold. If this threshold is not reached, the long-term estimator ba_i is updated with the value of the internal estimator bi_i in step 35. Otherwise, the long-term estimator ba_i remains unchanged. This prevents sudden variations due to a speech signal from leading to an update of the noise estimator.
  • After having obtained the quantities ρ_i, module 15 makes the voice activity decisions in step 37.
  • Module 15 first updates the state of the detection automaton according to the quantity ρ_0 calculated for the whole of the signal band.
  • The new state δ_n of the automaton depends on the previous state δ_{n−1} and on ρ_0, as shown in Figure 4.
  • Module 15 also calculates the degrees of vocal activity γ_n,i in each band i ≥ 1.
  • This function g() has, for example, the form shown in FIG. 5.
  • Module 16 calculates the per-band noise estimates, which will be used in the denoising process, from the successive values of the components S_n,i and the degrees of voice activity γ_n,i. This corresponds to steps 40 to 42 of FIG. 3.
  • In step 40, it is determined whether the voice activity detection automaton has just gone from the voice-rise state to the speech state. If so, the last two estimates B̂_{n−1,i} and B̂_{n−2,i} previously calculated for each band i ≥ 1 are corrected according to the value of the earlier estimate B̂_{n−3,i}.
  • In step 42, module 16 updates the per-band noise estimates according to the update formulas, where λ_B denotes a forgetting factor such that 0 < λ_B < 1.
  • Formula (6) shows how the non-binary degree of vocal activity γ_n,i is taken into account. A sketch of such a soft-decision update is given below.
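  • Since formula (6) itself is not reproduced above, the following Python sketch only illustrates one plausible soft-decision update of this kind, in which the estimate follows S_n,i to the extent that the degree of vocal activity γ_n,i indicates noise; the exact expression used in the patent may differ.

      def update_noise_estimate(B_prev, S_band, gamma, lam_B=0.95):
          """Soft-decision update of a per-band noise estimate (assumed form).

          B_prev : previous long-term noise estimate B_hat_{n-1,i}
          S_band : current averaged component S_n,i
          gamma  : degree of vocal activity in [0, 1] for band i
          lam_B  : forgetting factor, 0 < lam_B < 1
          """
          # Freeze the update where speech dominates, track S_n,i where the band is noise-only.
          target = (1.0 - gamma) * S_band + gamma * B_prev
          return lam_B * B_prev + (1.0 - lam_B) * target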
  • The long-term noise estimates B̂_n,i are overestimated by a module 45 (FIG. 1) before the denoising by non-linear spectral subtraction is carried out.
  • Module 45 calculates the previously mentioned overestimation coefficient α'_n,i, as well as an increased estimate B̂'_n,i which essentially corresponds to α'_n,i · B̂_n,i.
  • The organization of the overestimation module 45 is shown in FIG. 6.
  • The increased estimate B̂'_n,i is obtained by combining the long-term estimate B̂_n,i and a measure ΔB^max_n,i of the variability of the noise component in band i around its long-term estimate.
  • This combination is essentially a simple sum performed by an adder 46. It could also be a weighted sum.
  • The measure ΔB^max_n,i of the noise variability reflects the variance of the noise estimator. It is obtained as a function of the values of S_n,i and of B̂_n,i calculated for a certain number of previous frames over which the speech signal exhibits no vocal activity in band i. It is a function of the deviations S_{n−k,i} − B̂_{n−k,i} calculated for a number K of silence frames. In the example shown, this function is simply the maximum (block 50).
  • The degree of voice activity γ_n,i is compared with a threshold (block 51) to decide whether the difference S_n,i − B̂_n,i, calculated in 52-53, should or should not be loaded into a queue 54 with K locations organized in first-in-first-out (FIFO) mode. If γ_n,i exceeds the threshold (which can be equal to 0 if the function g() has the form of FIG. 5), the FIFO 54 is not fed; otherwise it is. The maximum value contained in the FIFO 54 is then provided as the variability measure ΔB^max_n,i (see the sketch below).
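  • The following Python sketch illustrates this FIFO-based variability measure; the class name, the queue length K and the threshold value are illustrative assumptions.

      from collections import deque

      class NoiseVariability:
          """Track Delta_B_max_{n,i}: the largest deviation S - B_hat over the last K silence frames."""

          def __init__(self, K=10):
              self.fifo = deque(maxlen=K)  # FIFO 54 with K locations

          def update(self, S_band_i, B_hat_i, gamma_i, threshold=0.0):
              if gamma_i <= threshold:     # band judged noise-only: feed the FIFO
                  self.fifo.append(S_band_i - B_hat_i)
              return max(self.fifo) if self.fifo else 0.0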
  • As a variant, the measure of variability ΔB^max_n,i can be obtained as a function of the values S_n,f (rather than S_n,i) and B̂_n,i.
  • In this variant, the FIFO 54 contains, for each band i, not the deviations S_{n−k,i} − B̂_{n−k,i} but deviations computed from the individual spectral components S_{n−k,f} of the band.
  • The increased estimate B̂'_n,i provides excellent robustness against the musical noise produced by the denoising process.
  • A first phase of the spectral subtraction is carried out by the module 55 shown in FIG. 1.
  • This phase provides, with the resolution of the bands i (1 ≤ i ≤ I), the frequency response H1_n,f of a first denoising filter, as a function of the components S_n,i and B̂_n,i and of the overestimation coefficients α'_n,i.
  • The coefficient β1_i represents, like the coefficient βp_i of formula (3), a floor conventionally used to avoid negative or excessively low values of the denoised signal.
  • The overestimation coefficient α'_n,i could be replaced in formula (7) by another coefficient equal to a function of α'_n,i and of an estimate of the signal-to-noise ratio (for example S_n,i / B̂_n,i), this function being decreasing in the estimated value of the signal-to-noise ratio.
  • This function is equal to α'_n,i for the lowest values of the signal-to-noise ratio: when the signal is very noisy, there is a priori no benefit in reducing the overestimation factor.
  • This function decreases towards zero for the highest values of the signal-to-noise ratio. This protects the most energetic areas of the spectrum, where the speech signal is most significant, the quantity subtracted from the signal then tending towards zero (see the sketch below).
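  • The following Python sketch shows one possible decreasing function of the estimated signal-to-noise ratio; the linear ramp and the breakpoint values are assumptions made only for illustration.

      import numpy as np

      def effective_overestimation(alpha_prime, snr_estimate, snr_low=1.0, snr_high=8.0):
          """Scale the overestimation coefficient down as the estimated SNR grows (sketch).

          Equal to alpha_prime for the noisiest bands (SNR <= snr_low) and decreasing
          to 0 for the most energetic ones (SNR >= snr_high); the ramp and breakpoints
          are illustrative, not the patent's exact function.
          """
          t = np.clip((snr_estimate - snr_low) / (snr_high - snr_low), 0.0, 1.0)
          return alpha_prime * (1.0 - t)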
  • This strategy can be refined by applying it selectively to the harmonics of the tone (pitch) frequency of the speech signal when it exhibits voice activity.
  • A second denoising phase is carried out by a harmonic-protection module 56.
  • Module 57 can apply any known method of analysis of the speech signal of the frame, for example a linear prediction method, to determine the period T_p, expressed as an integer or fractional number of samples.
  • The protection provided by module 56 may consist in carrying out, for each frequency f belonging to a band i:
  • H2_n,f = 1, in which case the quantity subtracted from the component S_n,f will be zero.
  • The floor coefficients β2_i express the fact that certain harmonics of the tone frequency f_p can be masked by noise, so that protecting them is pointless.
  • This protection strategy is preferably applied for each of the frequencies closest to the harmonics η × f_p of the tone frequency, that is to say for any integer η.
  • Under condition (9), the difference between the η-th harmonic of the real tone frequency and its estimate η × f_p grows with the harmonic rank η.
  • For high-order harmonics, this difference can therefore be greater than the spectral half-resolution Δf/2 of the Fourier transform.
  • The corrected frequency response H2_n,f can be equal to 1 as indicated above, which corresponds to the subtraction of a zero quantity in the spectral subtraction, that is to say full protection of the frequency in question. More generally, this corrected frequency response H2_n,f could be taken equal to a value between H1_n,f and 1, depending on the degree of protection desired, which corresponds to the subtraction of a quantity smaller than the one that would be subtracted if the frequency in question were not protected. A sketch of this harmonic protection is given below.
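  • The following Python sketch illustrates such a harmonic protection applied to the first filter response; the bin-selection rule, the function name and the use of full protection (response equal to 1) are illustrative assumptions.

      import numpy as np

      def protect_harmonics(H1, freqs, f_p, delta_f, protected_value=1.0):
          """Protect the bins closest to harmonics of the pitch f_p (sketch of module 56).

          H1      : per-bin response of the first denoising filter (numpy array)
          freqs   : centre frequency of each bin (numpy array)
          f_p     : estimated tone (pitch) frequency
          delta_f : spectral resolution of the transform
          Bins within half a resolution step of a harmonic eta * f_p keep response 1
          (nothing is subtracted there); any value between H1 and 1 could be used
          for partial protection.
          """
          H2 = H1.copy()
          eta = np.round(freqs / f_p)  # nearest harmonic rank for each bin
          near = (eta >= 1) & (np.abs(freqs - eta * f_p) <= delta_f / 2.0)
          H2[near] = protected_value
          return H2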
  • The spectral components obtained after this second denoising phase are given by S2_n,f = H2_n,f · S_n,f.
  • This signal S2_n,f is supplied to a module 60 which calculates, for each frame n, a masking curve by applying a psychoacoustic model of auditory perception by the human ear.
  • The masking phenomenon is a well-known principle of the functioning of the human ear: when two frequencies are heard simultaneously, one of the two may no longer be audible; it is then said to be masked.
  • The masking curve is seen as the convolution, in the Bark domain, of the spectral spreading function of the basilar membrane with the exciting signal, constituted in the present application by the signal S2_n,f.
  • The spectral spreading function can be modeled as shown in Figure 7.
  • The term R_q depends on the more or less voiced character of the signal.
  • Here χ designates a degree of voicing of the speech signal, varying between 0 (no voicing) and 1 (strongly voiced signal). A sketch of the masking-curve computation is given below.
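  • The following Python sketch illustrates a masking curve obtained by convolving a two-sided spreading function with the per-Bark-band excitation; the slope values are textbook-style assumptions and the voicing-dependent term R_q of the patent is not modeled here.

      import numpy as np

      def masking_curve(excitation_bark, upper_slope_db=10.0, lower_slope_db=25.0):
          """Masking curve as the convolution of a spreading function with the excitation (sketch).

          excitation_bark : energy of the denoised signal S2 summed per Bark band q
          The spreading function is modeled as a two-sided exponential with the given
          slopes in dB per Bark (illustrative values only).
          """
          Q = len(excitation_bark)
          M = np.zeros(Q)
          for q in range(Q):          # band being masked
              for v in range(Q):      # masker band
                  dz = q - v          # distance in Bark
                  att_db = upper_slope_db * dz if dz >= 0 else lower_slope_db * (-dz)
                  M[q] += excitation_bark[v] * 10.0 ** (-att_db / 10.0)
          return M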
  • The denoising system also includes a module 62 which corrects the frequency response of the denoising filter as a function of the masking curve M_n,q calculated by module 60 and of the increased estimates B̂'_n,i calculated by module 45.
  • Module 62 decides on the level of denoising that actually has to be achieved.
  • The new response H3_n,f, for a frequency f belonging to the band i defined by module 12 and to the Bark band q, thus depends on the relative difference between the increased estimate B̂'_n,i of the corresponding spectral component of the noise and the masking curve M_n,q, as follows:
  • In other words, the quantity subtracted from a spectral component S_n,f in the spectral subtraction process having the frequency response H3_n,f is substantially equal to the minimum of, on the one hand, the quantity subtracted from this spectral component in the spectral subtraction process having the frequency response H2_n,f and, on the other hand, the fraction of the increased estimate B̂'_n,i of the corresponding spectral component of the noise which, where applicable, exceeds the masking curve M_n,q (see the sketch below).
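  • A minimal Python sketch of this correction is given below; it assumes that the increased noise estimate and the masking curve have already been mapped from the bands i and Bark bands q onto the frequency bins f.

      import numpy as np

      def correct_response(H2, S_mag, B_over, M_mask):
          """Limit the subtracted quantity to the audible part of the noise (sketch of module 62).

          H2     : response after harmonic protection, per bin f
          S_mag  : spectral components S_n,f of the speech signal
          B_over : increased noise estimate B_hat'_{n,i} mapped to each bin f
          M_mask : masking curve M_n,q mapped to each bin f
          """
          removable = (1.0 - H2) * S_mag                    # quantity H2 would subtract
          audible_noise = np.maximum(B_over - M_mask, 0.0)  # noise not masked by the speech
          subtracted = np.minimum(removable, audible_noise)
          return 1.0 - subtracted / np.maximum(S_mag, 1e-12)  # corrected response H3_n,f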
  • FIG. 8 illustrates the principle of the correction applied by module 62. It schematically shows an example of the masking curve M_n,q calculated on the basis of the spectral components S2_n,f of the denoised signal, as well as the increased estimate B̂'_n,i of the noise spectrum.
  • The quantity finally subtracted from the components S_n,f is that represented by the hatched areas, that is to say limited to the fraction of the increased estimate B̂'_n,i of the spectral components of the noise which exceeds the masking curve.
  • This subtraction is carried out by multiplying the spectral components S_n,f of the speech signal by the frequency response H3_n,f of the denoising filter (multiplier 64).
  • The resulting spectral components are then brought back into the time domain by an inverse fast Fourier transform (TFRI).
  • FIG. 9 shows a preferred embodiment of a denoising system implementing the invention.
  • This system comprises a certain number of elements similar to corresponding elements of the system of FIG. 1, for which the same reference numbers have been used.
  • Modules 10, 11, 12, 15, 16, 45 and 55 provide in particular the quantities S_n,i, B̂_n,i, α'_n,i, B̂'_n,i and H1_n,f used to perform the selective denoising.
  • The frequency resolution of the fast Fourier transform 11 is a limitation of the system of FIG. 1.
  • The frequency on which the protection by module 56 is based is not necessarily the precise tone frequency f_p, but the frequency closest to it in the discrete spectrum. In some cases, harmonics relatively far from those of the tone frequency may then be protected.
  • The system of FIG. 9 overcomes this drawback thanks to an appropriate conditioning of the speech signal.
  • The sampling frequency of the signal is modified so that the period 1/f_p covers exactly an integer number of sample times of the conditioned signal.
  • This size N is usually a power of 2 for the implementation of the TFR; it is 256 in the example considered.
  • This choice is made by a module 70 according to the value of the delay T_p supplied by the harmonic analysis module 57.
  • Module 70 provides the ratio K between the sampling frequencies to three frequency-change modules 71, 72, 73 (a sketch of one possible choice of K is given below).
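  • Table I itself is not reproduced above, so the following Python sketch only illustrates a hypothetical rule for choosing the oversampling ratio K so that the pitch period spans an integer number p of samples, with p a divisor of N as required by claim 8.

      def conditioning_ratio(T_p, N=256):
          """Choose an oversampling ratio K so the pitch period spans an integer number
          of samples of the conditioned signal (sketch of the choice made by module 70).

          T_p : estimated pitch period in samples at the original rate (may be fractional)
          N   : frame/transform size (256 in the example)
          Hypothetical rule: take the smallest divisor p of N with p >= T_p and
          oversample by K = p / T_p; the patent instead uses a table (Table I).
          """
          divisors = [d for d in range(1, N + 1) if N % d == 0]
          p = min(d for d in divisors if d >= T_p)  # samples per pitch period, divides N
          K = p / T_p                               # ratio between the sampling frequencies
          return p, K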
  • Module 71 is used to transform the values S_n,i, B̂_n,i, α'_n,i, B̂'_n,i and H1_n,f, relating to the bands i defined by module 12, to the scale of modified frequencies (sampling frequency f_e). This transformation simply consists in dilating the bands i by the factor K. The values thus transformed are supplied to the harmonic-protection module 56.
  • Module 72 performs the oversampling of the frame of N samples provided by the windowing module 10.
  • The conditioned signal frame supplied by module 72 comprises KN samples at the frequency f_e. These samples are sent to a module 75 which calculates their Fourier transform.
  • The two blocks therefore have an overlap of (2 − K) × 100%.
  • The autocorrelations A(k) are calculated by a module 76, for example according to the formula:
  • A module 77 then calculates the normalized entropy H and supplies it to module 60 for the calculation of the masking curve (see S.A. McClellan et al.: "Spectral Entropy: an Alternative Indicator for Rate Allocation?", Proc. ICASSP'94, pages 201-204):
  • The normalized entropy H constitutes a measure of voicing that is very robust to noise and to variations of the tone frequency. A sketch of such an entropy-based voicing measure is given below.
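  • The following Python sketch illustrates an entropy-based voicing measure of this kind; the exact normalizations of claims 9 to 11 are not reproduced, and the mapping from the entropy H to the voicing degree χ is an assumption.

      import numpy as np

      def voicing_from_entropy(S2_mag):
          """Degree of voicing from the normalized entropy of the spectral autocorrelation (sketch).

          S2_mag : spectral components S2_n,f computed on the conditioned signal.
          A flat autocorrelation (unvoiced) gives an entropy close to 1, a peaky one
          (voiced) gives an entropy close to 0; the voicing degree is taken here as
          1 - H, which is only one possible mapping.
          """
          N = len(S2_mag)
          A = np.correlate(S2_mag, S2_mag, mode="full")[N - 1:]  # autocorrelation, lags 0..N-1
          p = A / A.sum()                                        # normalized to a distribution
          H = -np.sum(p * np.log(np.maximum(p, 1e-12))) / np.log(N)
          return 1.0 - H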
  • The correction module 62 operates in the same way as that of the system of FIG. 1, taking into account the overestimated noise B̂'_n,i rescaled by the frequency-change module 71. It provides the frequency response H3_n,f of the final denoising filter, which is multiplied by the spectral components S_n,f of the conditioned signal by the multiplier 64. The resulting components S3_n,f are brought back into the time domain by the TFRI module 65. At the output of this TFRI 65, a module 80 combines, for each frame, the two signal blocks resulting from the processing of the two overlapping blocks delivered by the TFR 75. This combination can consist of a Hamming-weighted sum of the samples, to form a denoised conditioned signal frame of KN samples.
  • The management module 82 controls the windowing module 10 so that the overlap between the current frame and the next one corresponds to N − M samples. This overlap of N − M samples will be used in the overlap-add carried out by module 66 during the processing of the next frame.
  • The tone frequency is estimated as an average over the frame.
  • The tone frequency may, however, vary somewhat over this period. These variations can be taken into account in the context of the present invention by conditioning the signal so as to obtain an artificially constant tone frequency within the frame.
  • For this, the harmonic analysis module 57 provides the time intervals between the consecutive breaks of the speech signal due to closures of the speaker's glottis occurring during the frame.
  • Methods that can be used to detect such micro-breaks are well known in the field of harmonic analysis of speech signals.
  • The principle of these methods is to perform a statistical test between two models, one short-term and the other long-term, both being adaptive linear prediction models.
  • The value of this statistical test, w_m, is the cumulative sum of the a posteriori likelihood ratio of the two distributions, corrected by the Kullback divergence. For a distribution of residuals having Gaussian statistics, this value w_m is given by a formula in which e0_m and σ²_0 represent the residual computed at sample m of the frame and the variance of the long-term model, while e1_m and σ²_1 likewise represent the residual and the variance of the short-term model. The closer the two models are to each other, the closer the value w_m of the statistical test is to 0; conversely, when the two models differ, this value w_m becomes negative, which indicates a break R of the signal.
  • FIG. 10 thus shows a possible example of the evolution of the value w_m, showing the breaks R of the speech signal.
  • FIG. 11 shows the means used to calculate the conditioning of the signal in the latter case.
  • The harmonic analysis module 57 is designed to implement the above analysis method and to provide the intervals t_r relating to the signal frame produced by module 10.
  • These oversampling ratios K_r are supplied to the frequency-change modules 72 and 73, so that the interpolations are carried out with the sampling ratio K_r over the corresponding time interval t_r.
  • The largest, T_p, of the time intervals t_r supplied by module 57 for a frame is selected by module 70 (block 91 in FIG. 11) to obtain a pair of values as indicated in Table I.
  • This embodiment of the invention also involves an adaptation of the window management module 82.
  • The number M of samples of the denoised signal to be retained for the current frame here corresponds to an integer number of consecutive time intervals t_r between two glottal breaks (see FIG. 10). This arrangement avoids phase discontinuity problems between frames, while taking into account the possible variations of the time intervals t_r within a frame.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Claims (19)

  1. Method for noise suppression of a digital speech signal (s) processed in successive frames, wherein:
    spectral components (Sn,f, Sn,i) of the speech signal are computed for each frame;
    increased estimates (B̂'n,i) of spectral components of noise contained in the speech signal are computed for each frame;
    a spectral subtraction is carried out which comprises at least a first subtraction step in which a first quantity is subtracted from each spectral component (Sn,f) of the speech signal for the frame, this first quantity depending on parameters which include the increased estimate (B̂'n,i) of the spectral component of the noise corresponding to the frame, so as to obtain spectral components (S2 n,f) of a first denoised signal,
    characterised in that the spectral subtraction further comprises the following steps:
    computing a masking curve (Mn,q) by applying a model of auditory perception on the basis of the spectral components (S2 n,f) of the first denoised signal;
    comparing the increased estimates (B̂'n,i) of the spectral components of the noise for the frame with the computed masking curve (Mn,q); and
    a second subtraction step in which a second quantity is subtracted from each spectral component (Sn,f) of the speech signal for the frame, this second quantity depending on parameters which include a difference between the increased estimate of the corresponding spectral component of the noise and the computed masking curve.
  2. Method according to claim 1, in which the second quantity relating to a spectral component (Sn,f) of the speech signal for the frame is substantially equal to the minimum between the corresponding first quantity and the fraction of the increased estimate (B̂'n,i) of the corresponding spectral component of the noise which exceeds the masking curve (Mn,q).
  3. Method according to claim 1 or 2, in which a harmonic analysis of the speech signal is carried out in order to estimate a tone frequency (fp) of the speech signal for each frame in which it exhibits vocal activity.
  4. Method according to claim 3, in which the parameters on which the first quantities to be subtracted depend include the estimated tone frequency (fp).
  5. Method according to claim 4, in which the first quantity to be subtracted from a given spectral component (Sn,f) of the speech signal is smaller when that spectral component corresponds to the frequency closest to an integer multiple of the estimated tone frequency (fp) than when it does not correspond to the frequency closest to an integer multiple of the estimated tone frequency.
  6. Method according to claim 4 or 5, in which the quantities to be subtracted from those spectral components (Sn,f) of the speech signal which correspond to the frequencies closest to the integer multiples of the estimated tone frequency (fp) are substantially zero.
  7. Method according to any one of claims 3 to 6, in which, after the tone frequency (fp) of the speech signal has been estimated for a frame, the speech signal of the frame is conditioned by oversampling it at an oversampling frequency (fe) which is a multiple of the estimated tone frequency, and the spectral components (Sn,f) of the speech signal for the frame, from which said quantities are subtracted, are computed on the basis of the conditioned signal (s').
  8. Method according to claim 7, in which the spectral components (Sn,f) of the speech signal are computed by distributing the conditioned signal (s') into frames of N samples which are subjected to a transform into the frequency domain, and in which the ratio (p) between the oversampling frequency (fe) and the estimated tone frequency is a divisor of the number N.
  9. Method according to claim 7 or 8, in which a degree of voicing (χ) of the speech signal for the frame is estimated from a computation of the entropy (H) of the autocorrelation of the spectral components computed on the basis of the conditioned signal.
  10. Method according to claim 9, in which the spectral components (S2 n,f) whose autocorrelation is computed are those computed on the basis of the conditioned signal (s') after subtraction of the first quantities.
  11. Method according to claim 9 or 10, in which the degree of voicing (χ) is measured from a normalized entropy H of the form
    Figure 00420001
    where N is the number of samples used to compute the spectral components (Sn,f) on the basis of the conditioned signal (s'), and A(k) is the normalized autocorrelation defined by:
    Figure 00420002
    where S2 n,f is the spectral component of rank f computed on the basis of the conditioned signal.
  12. Method according to claim 11, in which the computation of the masking curve (Mn,q) uses the degree of voicing (χ) measured by means of the normalized entropy H.
  13. Method according to any one of claims 3 to 12, in which, after the processing of each frame, a number (M) of the samples of the denoised speech signal made available by this processing is retained, this number being equal to an integer number of times the ratio (Tp) between the sampling frequency (Fe) and the estimated tone frequency (fp).
  14. Method according to any one of claims 3 to 12, in which the estimation of the tone frequency of the speech signal for a frame comprises the following steps:
    estimating the time intervals (tr) between two consecutive breaks (R) of the signal occurring during the frame and attributable to closures of the speaker's glottis, the estimated tone frequency being inversely proportional to these time intervals;
    interpolating the speech signal within the time intervals so that the conditioned signal (s') resulting from this interpolation exhibits a constant time interval between two consecutive breaks.
  15. Method according to claim 14, in which, after the processing of each frame, a number (M) of the samples of the denoised speech signal made available by this processing is retained which corresponds to an integer number of estimated time intervals (tr).
  16. Method according to any one of the preceding claims, in which values of a signal-to-noise ratio exhibited by the speech signal (s) for each frame are estimated in the spectral domain, and in which the parameters on which the first quantities to be subtracted depend include the estimated values of the signal-to-noise ratio, the first quantity to be subtracted from each spectral component (Sn,f) of the speech signal for the frame being a decreasing function of the corresponding estimated value of the signal-to-noise ratio.
  17. Method according to claim 16, in which said function decreases towards zero for the highest values of the signal-to-noise ratio.
  18. Method according to any one of the preceding claims, in which a transform into the time domain is applied to the result of the spectral subtraction in order to produce a denoised speech signal (s3).
  19. Device for noise suppression of a speech signal, comprising processing means designed to carry out a method according to any one of the preceding claims.
EP98943999A 1997-09-18 1998-09-16 Method and apparatus for suppressing noise in a digital speech signal Expired - Lifetime EP1016072B1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
FR9711643A FR2768547B1 (fr) 1997-09-18 1997-09-18 Procede de debruitage d'un signal de parole numerique
FR9711643 1997-09-18
PCT/FR1998/001980 WO1999014738A1 (fr) 1997-09-18 1998-09-16 Procede de debruitage d'un signal de parole numerique

Publications (2)

Publication Number Publication Date
EP1016072A1 EP1016072A1 (de) 2000-07-05
EP1016072B1 true EP1016072B1 (de) 2002-01-16

Family

ID=9511230

Family Applications (1)

Application Number Title Priority Date Filing Date
EP98943999A Expired - Lifetime EP1016072B1 (de) 1998-09-16 Method and apparatus for suppressing noise in a digital speech signal

Country Status (7)

Country Link
US (1) US6477489B1 (de)
EP (1) EP1016072B1 (de)
AU (1) AU9168998A (de)
CA (1) CA2304571A1 (de)
DE (1) DE69803203T2 (de)
FR (1) FR2768547B1 (de)
WO (1) WO1999014738A1 (de)

Families Citing this family (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0997003A2 (de) * 1997-07-01 2000-05-03 Partran APS Verfahren und schaltung zum rauschereduktion in sprachsignalen
US6549586B2 (en) 1999-04-12 2003-04-15 Telefonaktiebolaget L M Ericsson System and method for dual microphone signal noise reduction using spectral subtraction
US6717991B1 (en) * 1998-05-27 2004-04-06 Telefonaktiebolaget Lm Ericsson (Publ) System and method for dual microphone signal noise reduction using spectral subtraction
FR2797343B1 (fr) * 1999-08-04 2001-10-05 Matra Nortel Communications Procede et dispositif de detection d'activite vocale
JP3454206B2 (ja) 1999-11-10 2003-10-06 三菱電機株式会社 雑音抑圧装置及び雑音抑圧方法
US6804640B1 (en) * 2000-02-29 2004-10-12 Nuance Communications Signal noise reduction using magnitude-domain spectral subtraction
US6766292B1 (en) * 2000-03-28 2004-07-20 Tellabs Operations, Inc. Relative noise ratio weighting techniques for adaptive noise cancellation
JP2002221988A (ja) * 2001-01-25 2002-08-09 Toshiba Corp 音声信号の雑音抑圧方法と装置及び音声認識装置
AU4627801A (en) * 2001-04-11 2001-07-09 Phonak Ag Method for the elimination of noise signal components in an input signal for an auditory system, use of said method and hearing aid
US6985709B2 (en) * 2001-06-22 2006-01-10 Intel Corporation Noise dependent filter
DE10150519B4 (de) * 2001-10-12 2014-01-09 Hewlett-Packard Development Co., L.P. Verfahren und Anordnung zur Sprachverarbeitung
US7103539B2 (en) * 2001-11-08 2006-09-05 Global Ip Sound Europe Ab Enhanced coded speech
US20040078199A1 (en) * 2002-08-20 2004-04-22 Hanoh Kremer Method for auditory based noise reduction and an apparatus for auditory based noise reduction
US7398204B2 (en) * 2002-08-27 2008-07-08 Her Majesty In Right Of Canada As Represented By The Minister Of Industry Bit rate reduction in audio encoders by exploiting inharmonicity effects and auditory temporal masking
EP1554716A1 (de) * 2002-10-14 2005-07-20 Koninklijke Philips Electronics N.V. Signalfilterung
DE602004013031T2 (de) * 2003-10-10 2009-05-14 Agency For Science, Technology And Research Verfahren zum codieren eines digitalen signals in einen skalierbaren bitstrom, verfahren zum decodieren eines skalierbaren bitstroms
US7725314B2 (en) * 2004-02-16 2010-05-25 Microsoft Corporation Method and apparatus for constructing a speech filter using estimates of clean speech and noise
US7729908B2 (en) * 2005-03-04 2010-06-01 Panasonic Corporation Joint signal and model based noise matching noise robustness method for automatic speech recognition
US20060206320A1 (en) * 2005-03-14 2006-09-14 Li Qi P Apparatus and method for noise reduction and speech enhancement with microphones and loudspeakers
KR100927897B1 (ko) * 2005-09-02 2009-11-23 닛본 덴끼 가부시끼가이샤 잡음억제방법과 장치, 및 컴퓨터프로그램
US8126706B2 (en) * 2005-12-09 2012-02-28 Acoustic Technologies, Inc. Music detector for echo cancellation and noise reduction
JP4592623B2 (ja) * 2006-03-14 2010-12-01 富士通株式会社 通信システム
US8949120B1 (en) 2006-05-25 2015-02-03 Audience, Inc. Adaptive noise cancelation
JP4757158B2 (ja) * 2006-09-20 2011-08-24 富士通株式会社 音信号処理方法、音信号処理装置及びコンピュータプログラム
US20080162119A1 (en) * 2007-01-03 2008-07-03 Lenhardt Martin L Discourse Non-Speech Sound Identification and Elimination
BRPI0807703B1 (pt) 2007-02-26 2020-09-24 Dolby Laboratories Licensing Corporation Método para aperfeiçoar a fala em áudio de entretenimento e meio de armazenamento não-transitório legível por computador
JP5260561B2 (ja) * 2007-03-19 2013-08-14 ドルビー ラボラトリーズ ライセンシング コーポレイション 知覚モデルを使用した音声の強調
WO2009035614A1 (en) * 2007-09-12 2009-03-19 Dolby Laboratories Licensing Corporation Speech enhancement with voice clarity
WO2009035613A1 (en) * 2007-09-12 2009-03-19 Dolby Laboratories Licensing Corporation Speech enhancement with noise level estimation adjustment
EP2192579A4 (de) * 2007-09-19 2016-06-08 Nec Corp Rauschunterdrückungsvorrichtung sowie entsprechendes verfahren und programm
JP5056654B2 (ja) * 2008-07-29 2012-10-24 株式会社Jvcケンウッド 雑音抑制装置、及び雑音抑制方法
US20110257978A1 (en) * 2009-10-23 2011-10-20 Brainlike, Inc. Time Series Filtering, Data Reduction and Voice Recognition in Communication Device
US9838784B2 (en) 2009-12-02 2017-12-05 Knowles Electronics, Llc Directional audio capture
US8423357B2 (en) * 2010-06-18 2013-04-16 Alon Konchitsky System and method for biometric acoustic noise reduction
US9640194B1 (en) 2012-10-04 2017-05-02 Knowles Electronics, Llc Noise suppression for speech processing based on machine-learning mask estimation
US9536540B2 (en) * 2013-07-19 2017-01-03 Knowles Electronics, Llc Speech signal separation and synthesis based on auditory scene analysis and speech modeling
CN103824562B (zh) * 2014-02-10 2016-08-17 太原理工大学 基于心理声学模型的语音后置感知滤波器
DE102014009689A1 (de) * 2014-06-30 2015-12-31 Airbus Operations Gmbh Intelligentes Soundsystem/-modul zur Kabinenkommunikation
DE112015003945T5 (de) 2014-08-28 2017-05-11 Knowles Electronics, Llc Mehrquellen-Rauschunterdrückung
WO2016040885A1 (en) 2014-09-12 2016-03-17 Audience, Inc. Systems and methods for restoration of speech components
CN105869652B (zh) * 2015-01-21 2020-02-18 北京大学深圳研究院 心理声学模型计算方法和装置
US9820042B1 (en) 2016-05-02 2017-11-14 Knowles Electronics, Llc Stereo separation and directional suppression with omni-directional microphones
CN110168640B (zh) * 2017-01-23 2021-08-03 华为技术有限公司 用于增强信号中需要分量的装置和方法
US11017798B2 (en) * 2017-12-29 2021-05-25 Harman Becker Automotive Systems Gmbh Dynamic noise suppression and operations for noisy speech signals

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03117919A (ja) * 1989-09-30 1991-05-20 Sony Corp ディジタル信号符号化装置
AU633673B2 (en) 1990-01-18 1993-02-04 Matsushita Electric Industrial Co., Ltd. Signal processing device
EP0459362B1 (de) 1990-05-28 1997-01-08 Matsushita Electric Industrial Co., Ltd. Sprachsignalverarbeitungsvorrichtung
US5450522A (en) * 1991-08-19 1995-09-12 U S West Advanced Technologies, Inc. Auditory model for parametrization of speech
US5469087A (en) 1992-06-25 1995-11-21 Noise Cancellation Technologies, Inc. Control system using harmonic filters
US5400409A (en) * 1992-12-23 1995-03-21 Daimler-Benz Ag Noise-reduction method for noise-affected voice channels
EP0683916B1 (de) * 1993-02-12 1999-08-11 BRITISH TELECOMMUNICATIONS public limited company Rauschverminderung
US5623577A (en) * 1993-07-16 1997-04-22 Dolby Laboratories Licensing Corporation Computationally efficient adaptive bit allocation for encoding method and apparatus with allowance for decoder spectral distortions
JP3131542B2 (ja) * 1993-11-25 2001-02-05 シャープ株式会社 符号化復号化装置
US5555190A (en) 1995-07-12 1996-09-10 Micro Motion, Inc. Method and apparatus for adaptive line enhancement in Coriolis mass flow meter measurement
FR2739736B1 (fr) * 1995-10-05 1997-12-05 Jean Laroche Procede de reduction des pre-echos ou post-echos affectant des enregistrements audio
FI100840B (fi) * 1995-12-12 1998-02-27 Nokia Mobile Phones Ltd Kohinanvaimennin ja menetelmä taustakohinan vaimentamiseksi kohinaises ta puheesta sekä matkaviestin
US6144937A (en) * 1997-07-23 2000-11-07 Texas Instruments Incorporated Noise suppression of speech by signal processing including applying a transform to time domain input sequences of digital signals representing audio information

Also Published As

Publication number Publication date
FR2768547A1 (fr) 1999-03-19
WO1999014738A1 (fr) 1999-03-25
US6477489B1 (en) 2002-11-05
AU9168998A (en) 1999-04-05
CA2304571A1 (fr) 1999-03-25
DE69803203D1 (de) 2002-02-21
EP1016072A1 (de) 2000-07-05
DE69803203T2 (de) 2002-08-29
FR2768547B1 (fr) 1999-11-19

Similar Documents

Publication Publication Date Title
EP1016072B1 (de) Verfahren und vorrichtung zur rauschunterdrückung eines digitalen sprachsignals
EP2002428B1 (de) Verfahren zur trainierten diskrimination und dämpfung von echos eines digitalsignals in einem decoder und entsprechende einrichtung
EP1789956B1 (de) Verfahren zum verarbeiten eines rauschbehafteten tonsignals und einrichtung zur implementierung des verfahrens
EP1356461B1 (de) Rauschverminderungsverfahren und -einrichtung
EP1320087B1 (de) Synthese eines Anregungssignales zur Verwendung in einem Generator von Komfortrauschen
EP1016071B1 (de) Verfahren und vorrichtung zur sprachdetektion
EP1395981B1 (de) Einrichtung und verfahren zur verarbeitung eines audiosignals
EP1016073B1 (de) Verfahren und vorrichtung zur rauschunterdrückung eines digitalen sprachsignals
EP0490740A1 (de) Verfahren und Einrichtung zum Bestimmen der Sprachgrundfrequenz in Vocodern mit sehr niedriger Datenrate
JP2003280696A (ja) 音声強調装置及び音声強調方法
EP1021805B1 (de) Verfahren und vorrichtung zur verbesserung eines digitalen sprachsignals
EP1429316B1 (de) Verfahren und Vorrichtung zur multi-referenz Korrektur der durch ein Kommunikationsnetzwerk verursachten spektralen Sprachverzerrungen
EP3192073B1 (de) Unterscheidung und dämpfung von vorechos in einem digitalen audiosignal
EP2515300B1 (de) Verfahren und System für die Geräuschunterdrückung
FR2888704A1 (de)
EP4287648A1 (de) Elektronische vorrichtung und verarbeitungsverfahren, akustische vorrichtung und computerprogramm dafür
FR2885462A1 (fr) Procede d'attenuation des pre-et post-echos d'un signal numerique audio et dispositif correspondant

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20000316

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB

17Q First examination report despatched

Effective date: 20001004

RIC1 Information provided on ipc code assigned before grant

Free format text: 7G 10L 21/02 A

RTI1 Title (correction)

Free format text: METHOD AND APPARATUS FOR SUPPRESSING NOISE IN A DIGITAL SPEECH SIGNAL

RIC1 Information provided on ipc code assigned before grant

Free format text: 7G 10L 21/02 A

RTI1 Title (correction)

Free format text: METHOD AND APPARATUS FOR SUPPRESSING NOISE IN A DIGITAL SPEECH SIGNAL

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REF Corresponds to:

Ref document number: 69803203

Country of ref document: DE

Date of ref document: 20020221

GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

Effective date: 20020407

RAP2 Party data changed (patent owner data changed or rights of a patent transferred)

Owner name: NORTEL NETWORKS FRANCE

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
REG Reference to a national code

Ref country code: FR

Ref legal event code: CD

Ref country code: FR

Ref legal event code: CA

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20031127

Year of fee payment: 6

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20050401

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20050817

Year of fee payment: 8

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20050902

Year of fee payment: 8

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20060916

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20070531

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20060916

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20061002