EP1016071B1 - Method and apparatus for detecting speech activity - Google Patents


Info

Publication number
EP1016071B1
EP1016071B1 (application EP98943998A)
Authority
EP
European Patent Office
Prior art keywords
frame
noise
signal
band
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP98943998A
Other languages
German (de)
French (fr)
Other versions
EP1016071A1 (en)
Inventor
Philip Lockwood
Stéphane LUBIARZ
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
EADS Defence and Security Networks SAS
Nortel Networks France SAS
Original Assignee
EADS Defence and Security Networks SAS
Matra Nortel Communications SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by EADS Defence and Security Networks SAS and Matra Nortel Communications SAS
Publication of EP1016071A1
Application granted
Publication of EP1016071B1
Anticipated expiration
Legal status: Expired - Lifetime (current)

Classifications

    • G: Physics
    • G10: Musical instruments; Acoustics
    • G10L: Speech analysis or synthesis; Speech recognition; Speech or voice processing; Speech or audio coding or decoding
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G10L25/93: Discriminating between voiced and unvoiced parts of speech signals
    • G10L2025/932: Decision in previous or following frames
    • G10L2025/935: Mixed voiced class; Transitions
    • G10L2025/937: Signal energy in various frequency bands

Definitions

  • The output of this second filter is the signal S2n,f = H2n,f · Sn,f.
  • This signal S2n,f is supplied to a module 60 which calculates, for each frame n, a masking curve by applying a psychoacoustic model of auditory perception by the human ear.
  • The masking phenomenon is a known principle of the operation of the human ear: when two frequencies are heard simultaneously, it is possible that one of the two is no longer audible; it is then said to be masked.
  • The masking curve is obtained as the convolution, in the Bark domain, of the spectral spreading function of the basilar membrane with the excitation signal, constituted in the present application by the signal S2n,f. The spectral spreading function can be modelled as shown in Figure 7. The quantity Rq depends on the more or less voiced character of the signal, expressed by a degree of voicing of the speech signal varying between 0 (no voicing) and 1 (strongly voiced signal).
  • The denoising system also includes a module 62 which corrects the frequency response of the denoising filter as a function of the masking curve Mn,q calculated by the module 60 and of the overestimates B̂'n,i calculated by the module 45. The module 62 thus decides the level of denoising which must actually be reached.
  • The new response H3n,f, for a frequency f belonging to the band i defined by the module 12 and to the Bark band q, depends on the relative difference between the overestimate B̂'n,i of the corresponding spectral component of the noise and the masking curve Mn,q.
  • The quantity subtracted from a spectral component Sn,f in the spectral subtraction process having the frequency response H3n,f is substantially equal to the minimum of, on the one hand, the quantity subtracted from this spectral component in the process having the frequency response H2n,f and, on the other hand, the fraction of the overestimate B̂'n,i of the corresponding noise component which, if any, exceeds the masking curve Mn,q (see the sketch after this list).
  • Figure 8 illustrates the principle of this correction: it schematically shows a masking curve Mn,q calculated from the spectral components S2n,f of the denoised signal, together with the overestimate B̂'n,i of the noise spectrum. The quantity finally subtracted from the components Sn,f is the one represented by the hatched areas, i.e. it is limited to the fraction of the overestimated noise components which exceeds the masking curve.
  • This subtraction is carried out by multiplying the frequency response H3n,f of the denoising filter by the spectral components Sn,f of the speech signal (multiplier 64); the result is returned to the time domain by an inverse fast Fourier transform (TFRI).
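
Purely as an illustration of the min rule described above (the bullet on H3n,f refers to this sketch), the following Python fragment shows one way such a correction could be computed. The array layout and the expansion of band and Bark values to individual frequency bins are assumptions of mine, not a transcription of the patent's formula.

```python
import numpy as np

def correct_for_masking(H2, S, B_over, M):
    """Sketch of module 62: limit the subtracted quantity to the part of the
    overestimated noise B'_{n,i} that emerges above the masking curve M_{n,q}.
    All arrays are per frequency bin (band/Bark values already expanded)."""
    subtracted_h2 = (1.0 - H2) * S                 # quantity removed by H2_{n,f}
    above_mask = np.maximum(B_over - M, 0.0)       # fraction of B' exceeding M
    subtracted = np.minimum(subtracted_h2, above_mask)
    H3 = 1.0 - subtracted / np.maximum(S, 1e-12)   # corrected response H3_{n,f}
    return H3
```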

Description

The present invention relates to digital speech signal processing techniques. It relates more particularly to techniques relying on voice activity detection in order to apply differentiated processing depending on whether or not the signal carries voice activity.

The digital techniques in question belong to various fields: speech coding for transmission or storage, speech recognition, noise reduction, echo cancellation, etc.

The main difficulty of voice activity detection methods is distinguishing between voice activity and the noise that accompanies it. Resorting to a conventional denoising technique does not overcome this difficulty, since such techniques themselves rely on noise estimates that depend on the degree of voice activity of the signal. This problem has been described, for example, in document US-A-5659622.

A main object of the present invention is to improve the noise robustness of voice activity detection methods. To achieve this object, a method as set out in claim 1 is provided.

The invention thus provides a method for detecting voice activity in a digital speech signal processed in successive frames, in which the speech signal is subjected to denoising that takes into account estimates of the noise contained in the signal, updated for each frame in a manner dependent on at least one degree of voice activity determined for said frame. According to the invention, an a priori denoising of the speech signal of each frame is carried out on the basis of noise estimates obtained during the processing of at least one previous frame, and the energy variations of the a priori denoised signal are analysed to detect the degree of voice activity of said frame.

Performing the voice activity detection (by a method which may, in general, be any known method) on the basis of an a priori denoised signal appreciably improves the performance of this detection when the surrounding noise is relatively high.

In the remainder of this description, the voice activity detection method according to the invention is illustrated in a system for denoising a speech signal. It will be understood that this method can find applications in many other types of digital speech processing in which information on the degree of voice activity of the processed signal is desired: coding, recognition, echo cancellation, etc.

Other features and advantages of the present invention will become apparent from the following description of non-limiting exemplary embodiments, with reference to the appended drawings, in which:

  • Figure 1 is a block diagram of a denoising system implementing the present invention;
  • Figures 2 and 3 are flowcharts of procedures used by a voice activity detector of the system of Figure 1;
  • Figure 4 is a diagram representing the states of a voice activity detection automaton;
  • Figure 5 is a graph illustrating the variations of a degree of voice activity;
  • Figure 6 is a block diagram of a noise overestimation module of the system of Figure 1;
  • Figure 7 is a graph illustrating the calculation of a masking curve; and
  • Figure 8 is a graph illustrating the use of the masking curves in the system of Figure 1.

The denoising system shown in Figure 1 processes a digital speech signal s. A windowing module 10 puts this signal s into the form of successive windows or frames, each consisting of a number N of digital signal samples. Conventionally, these frames may overlap one another. In the remainder of this description it will be considered, without this being limiting, that the frames consist of N = 256 samples at a sampling frequency Fe of 8 kHz, with Hamming weighting in each window and a 50% overlap between consecutive windows.

The signal frame is transformed into the frequency domain by a module 11 applying a conventional fast Fourier transform (FFT) algorithm to compute the modulus of the signal spectrum. The module 11 then delivers a set of N = 256 frequency components of the speech signal, denoted Sn,f, where n denotes the number of the current frame and f a frequency of the discrete spectrum. Owing to the properties of digital signals in the frequency domain, only the first N/2 = 128 samples are used.
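
As an illustration of this front end, here is a minimal Python sketch (not part of the patent; names such as spectral_frames and frame_len are mine) of splitting a signal into 50%-overlapping, Hamming-weighted frames of N = 256 samples and keeping the magnitudes Sn,f of the first N/2 bins:

```python
import numpy as np

def spectral_frames(s, frame_len=256, overlap=0.5):
    """Split signal s into Hamming-weighted, 50%-overlapping frames and
    return the magnitude spectrum of the first frame_len/2 FFT bins."""
    hop = int(frame_len * (1.0 - overlap))                 # 128 samples for 50% overlap
    window = np.hamming(frame_len)
    n_frames = max(0, 1 + (len(s) - frame_len) // hop)
    spectra = np.empty((n_frames, frame_len // 2))
    for n in range(n_frames):
        frame = s[n * hop : n * hop + frame_len] * window
        spectra[n] = np.abs(np.fft.fft(frame))[: frame_len // 2]   # S_{n,f}
    return spectra
```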

To calculate the estimates of the noise contained in the signal s, the frequency resolution available at the output of the fast Fourier transform is not used, but a lower resolution, determined by a number I of frequency bands covering the band [0, Fe/2] of the signal. Each band i (1 ≤ i ≤ I) extends between a lower frequency f(i-1) and an upper frequency f(i), with f(0) = 0 and f(I) = Fe/2. This division into frequency bands may be uniform (f(i) - f(i-1) = Fe/2I). It may also be non-uniform (for example according to a Bark scale). A module 12 calculates the respective averages of the spectral components Sn,f of the speech signal over the bands, for example with a uniform weighting such that the averaged component Sn,i is the arithmetic mean of the components Sn,f over the frequencies f of band i (formula (1)).
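
A minimal sketch of the band averaging of module 12, assuming uniform band edges (the helper name band_average and the default of 16 bands are illustrative):

```python
import numpy as np

def band_average(spectrum, n_bands=16):
    """Average the magnitude spectrum S_{n,f} over I uniform frequency bands,
    returning the band components S_{n,i} (uniform weighting, formula (1))."""
    edges = np.linspace(0, len(spectrum), n_bands + 1, dtype=int)   # f(0)..f(I)
    return np.array([spectrum[edges[i]:edges[i + 1]].mean() for i in range(n_bands)])
```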

This averaging reduces the fluctuations between bands by averaging the noise contributions within them, which reduces the variance of the noise estimator. In addition, it allows a large reduction in the complexity of the system.

The averaged spectral components Sn,i are fed to a voice activity detection module 15 and to a noise estimation module 16. These two modules 15, 16 operate jointly, in the sense that the degrees of voice activity γn,i measured for the different bands by the module 15 are used by the module 16 to estimate the long-term energy of the noise in the different bands, while these long-term estimates B̂n,i are used by the module 15 to carry out an a priori denoising of the speech signal in the different bands in order to determine the degrees of voice activity γn,i.

The operation of the modules 15 and 16 may correspond to the flowcharts shown in Figures 2 and 3.

In steps 17 to 20, the module 15 performs the a priori denoising of the speech signal in the different bands i for the signal frame n. This a priori denoising is carried out according to a conventional non-linear spectral subtraction process, based on noise estimates obtained during one or more previous frames. In step 17, the module 15 calculates, with the resolution of the bands i, the frequency response Hpn,i of the a priori denoising filter, according to the formula:

  Hpn,i = (Sn,i - α'n-τ1,i · B̂n-τ1,i) / Sn-τ2,i   (2)

where τ1 and τ2 are delays expressed as numbers of frames (τ1 ≥ 1, τ2 ≥ 0), and α'n,i is a noise overestimation coefficient whose determination is explained below. The delay τ1 may be fixed (for example τ1 = 1) or variable; it is smaller the greater the confidence in the voice activity detection.

In steps 18 to 20, the spectral components Êpn,i of the a priori denoised signal are calculated according to formula (3), in which the response Hpn,i is applied to the components Sn,i subject to a floor coefficient βpi close to 0, conventionally used to prevent the spectrum of the denoised signal from taking negative or excessively low values which would give rise to musical noise.

Steps 17 to 20 therefore essentially consist in subtracting from the signal spectrum an estimate of the a priori estimated noise spectrum, increased by the coefficient α'n-τ1,i.
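
A sketch of this a priori spectral subtraction for one frame, computed per band. The flooring is applied here as max(Hpn,i, βpi)·Sn,i, which is an assumption on my part since formula (3) is given only as an image in the original:

```python
import numpy as np

def a_priori_denoise(S_band, S_band_prev, B_prev, alpha_prev, beta_floor=0.01):
    """A priori non-linear spectral subtraction (steps 17 to 20).
    S_band      : averaged components S_{n,i} of the current frame
    S_band_prev : averaged components S_{n-tau2,i} of a previous frame
    B_prev      : noise estimates B_{n-tau1,i} from a previous frame
    alpha_prev  : overestimation coefficients alpha'_{n-tau1,i}"""
    Hp = (S_band - alpha_prev * B_prev) / np.maximum(S_band_prev, 1e-12)   # formula (2)
    Ep = np.maximum(Hp * S_band, beta_floor * S_band)   # floored denoised components (assumed floor)
    return Hp, Ep
```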

In step 21, the module 15 calculates the energy of the a priori denoised signal in the different bands i for frame n: En,i = (Êpn,i)². It also calculates a global average En,0 of the energy of the a priori denoised signal, as a sum of the per-band energies En,i weighted by the widths of those bands. In the notations below, the index i = 0 is used to designate the whole band of the signal.

In steps 22 and 23, the module 15 calculates, for each band i (0 ≤ i ≤ I), a quantity ΔEn,i representing the short-term variation of the energy of the denoised signal in band i, as well as a long-term value Ēn,i of the energy of the denoised signal in band i. The quantity ΔEn,i can be calculated by a simplified differentiation formula:

  ΔEn,i = (Ēn-4,i + Ēn-3,i - Ēn-1,i - En,i) / 10

As for the long-term energy Ēn,i, it can be calculated using a forgetting factor B1 such that 0 < B1 < 1, namely Ēn,i = B1·Ēn-1,i + (1 - B1)·En,i.
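
These two quantities could be tracked per band roughly as follows (a sketch; the initialisation and the buffer bookkeeping are mine):

```python
import numpy as np
from collections import deque

class EnergyTracker:
    """Track, for each band, the short-term variation dE_{n,i} and the
    long-term energy Ebar_{n,i} of the a priori denoised signal (steps 22-23)."""
    def __init__(self, n_bands, b1=0.95):
        self.b1 = b1                               # forgetting factor B1, 0 < B1 < 1
        self.e_bar = np.zeros(n_bands)             # Ebar_{n,i}
        self.history = deque(maxlen=4)             # Ebar_{n-4,i} ... Ebar_{n-1,i}

    def update(self, energy):                      # energy = E_{n,i} = Ep_{n,i}**2
        self.history.append(self.e_bar.copy())     # store Ebar_{n-1,i}
        if len(self.history) == 4:
            h = list(self.history)
            d_e = (h[0] + h[1] - h[3] - energy) / 10.0
        else:
            d_e = np.zeros_like(energy)            # not enough history yet
        self.e_bar = self.b1 * self.e_bar + (1.0 - self.b1) * energy
        return d_e, self.e_bar
```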

After calculating the energies En,i of the denoised signal, its short-term variations ΔEn,i and its long-term values Ēn,i in the manner indicated in Figure 2, the module 15 calculates, for each band i (0 ≤ i ≤ I), a value ρi representative of the evolution of the energy of the denoised signal. This calculation is carried out in steps 25 to 36 of Figure 3, executed for each band i from i = 0 to i = I. It makes use of a long-term estimator of the noise envelope bai, of an internal estimator bii and of a counter of noisy frames bi.

In step 25, the quantity ΔEn,i is compared with a threshold ε1. If the threshold ε1 is not reached, the counter bi is incremented by one unit in step 26. In step 27, the long-term estimator bai is compared with the value of the smoothed energy Ēn,i. If bai ≥ Ēn,i, the estimator bai is taken equal to the smoothed value Ēn,i in step 28, and the counter bi is reset to zero. The quantity ρi, which is taken equal to the ratio bai/Ēn,i (step 36), is then equal to 1.

If step 27 shows that bai < Ēn,i, the counter bi is compared with a limit value bmax in step 29. If bi > bmax, the signal is considered too stationary to carry voice activity, and the aforementioned step 28, which amounts to considering that the frame contains only noise, is executed. If bi ≤ bmax in step 29, the internal estimator bii is calculated in step 33 according to:

  bii = (1 - Bm)·Ēn,i + Bm·bai

In this formula, Bm represents an update coefficient between 0.90 and 1. Its value differs according to the state of a voice activity detection automaton (steps 30 to 32). This state δn-1 is the one determined during the processing of the previous frame. If the automaton is in a speech detection state (δn-1 = 2 in step 30), the coefficient Bm takes a value Bmp very close to 1, so that the noise estimator is updated only very slightly in the presence of speech. Otherwise, the coefficient Bm takes a lower value Bms, to allow a more significant update of the noise estimator during silence. In step 34, the difference bai - bii between the long-term estimator and the internal noise estimator is compared with a threshold ε2. If the threshold ε2 is not reached, the long-term estimator bai is updated with the value of the internal estimator bii in step 35. Otherwise, the long-term estimator bai remains unchanged. This prevents sudden variations due to a speech signal from leading to an update of the noise estimator.
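
Steps 25 to 36 can be sketched as follows for a single band. The control flow when the threshold ε1 is exceeded is not detailed in the text, so this sketch simply continues to step 27; the deviation test of step 34 is taken as an absolute value, and all threshold values are placeholders:

```python
def update_rho(d_e, e_bar, state, band, eps1=0.1, eps2=0.2, bmax=50,
               bmp=0.999, bms=0.98):
    """One pass of steps 25 to 36 for a single band. `band` is a dict holding
    the per-band variables: ba (long-term noise envelope estimator),
    bi (internal estimator) and b (noisy-frame counter)."""
    if d_e < eps1:                                    # step 25: small energy variation
        band['b'] += 1                                # step 26
    if band['ba'] >= e_bar:                           # step 27
        band['ba'], band['b'] = e_bar, 0              # step 28: frame treated as noise only
    elif band['b'] > bmax:                            # step 29: too stationary for speech
        band['ba'], band['b'] = e_bar, 0              # step 28 again
    else:
        bm = bmp if state == 2 else bms               # steps 30-32: slow update during speech
        band['bi'] = (1 - bm) * e_bar + bm * band['ba']   # step 33
        if abs(band['ba'] - band['bi']) < eps2:       # step 34 (deviation test)
            band['ba'] = band['bi']                   # step 35
    return band['ba'] / max(e_bar, 1e-12)             # step 36: rho_i
```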

After obtaining the quantities ρi, the module 15 takes the voice activity decisions in step 37. The module 15 first updates the state of the detection automaton according to the quantity ρ0 calculated for the whole band of the signal. The new state δn of the automaton depends on the previous state δn-1 and on ρ0, as shown in Figure 4.

Four states are possible: δ = 0 denotes silence, or absence of speech; δ = 2 denotes the presence of voice activity; and the states δ = 1 and δ = 3 are intermediate rising and falling states. When the automaton is in the silence state (δn-1 = 0), it remains there if ρ0 does not exceed a first threshold SE1, and goes to the rising state otherwise. In the rising state (δn-1 = 1), it returns to the silence state if ρ0 is smaller than the threshold SE1, goes to the speech state if ρ0 is greater than a second threshold SE2 greater than SE1, and remains in the rising state if SE1 ≤ ρ0 ≤ SE2. When the automaton is in the speech state (δn-1 = 2), it remains there if ρ0 exceeds a third threshold SE3 smaller than SE2, and goes to the falling state otherwise. In the falling state (δn-1 = 3), the automaton returns to the speech state if ρ0 is greater than the threshold SE2, returns to the silence state if ρ0 is below a fourth threshold SE4 smaller than SE2, and remains in the falling state if SE4 ≤ ρ0 ≤ SE2.
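
A direct transcription of this four-state automaton; the default threshold values are placeholders, only their ordering (SE1 < SE2, SE3 < SE2, SE4 < SE2) is taken from the text:

```python
def next_state(prev_state, rho0, se1=1.5, se2=3.0, se3=2.0, se4=1.2):
    """Update of the voice activity detection automaton (Figure 4).
    States: 0 = silence, 1 = rising, 2 = speech, 3 = falling."""
    if prev_state == 0:                        # silence
        return 0 if rho0 <= se1 else 1
    if prev_state == 1:                        # rising
        if rho0 < se1:
            return 0
        return 2 if rho0 > se2 else 1
    if prev_state == 2:                        # speech
        return 2 if rho0 > se3 else 3
    # falling
    if rho0 > se2:
        return 2
    return 0 if rho0 < se4 else 3
```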

In step 37, the module 15 also calculates the degrees of voice activity γn,i in each band i ≥ 1. This degree γn,i is preferably a non-binary parameter, i.e. the function γn,i = g(ρi) varies continuously between 0 and 1 as a function of the values taken by the quantity ρi. This function has, for example, the shape shown in Figure 5.
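
The shape of Figure 5 is not reproduced in this text, so the following ramp is purely illustrative of a non-binary degree of activity g(ρ) between 0 and 1:

```python
def degree_of_activity(rho, lo=1.0, hi=3.0):
    """Illustrative non-binary degree of voice activity g(rho) in [0, 1];
    the actual shape of Figure 5 is not reproduced here."""
    return min(1.0, max(0.0, (rho - lo) / (hi - lo)))
```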

The module 16 calculates the noise estimates per band, which will be used in the denoising process, using the successive values of the components Sn,i and of the degrees of voice activity γn,i. This corresponds to steps 40 to 42 of Figure 3. In step 40, it is determined whether the voice activity detection automaton has just gone from the rising state to the speech state. If so, the last two estimates B̂n-1,i and B̂n-2,i previously calculated for each band i ≥ 1 are corrected according to the value of the previous estimate B̂n-3,i. This correction is made to take account of the fact that, in the rising phase (δ = 1), the long-term estimates of the noise energy in the voice activity detection process (steps 30 to 33) may have been calculated as if the signal contained only noise (Bm = Bms), so that they may be affected by error.

In step 42, the module 16 updates the noise estimates per band, using a forgetting factor λB such that 0 < λB < 1 and the degree of voice activity γn,i, according to formulas of the form:

  B̃n,i = λB·B̂n-1,i + (1 - λB)·Sn,i   (5)
  B̂n,i = γn,i·B̂n-1,i + (1 - γn,i)·B̃n,i   (6)

Formula (6) shows that the non-binary degree of voice activity γn,i is taken into account.
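
A sketch of this per-band update; it follows the hedged reconstruction of formulas (5) and (6) given above, so the exact weighting should be checked against the original formulas:

```python
def update_noise_estimate(B_prev, S_band, gamma, lambda_b=0.95):
    """Per-band noise update (step 42): exponential smoothing of S_{n,i},
    blended with the previous estimate according to the degree of voice
    activity gamma_{n,i} (little update where speech is present)."""
    B_tilde = lambda_b * B_prev + (1.0 - lambda_b) * S_band    # formula (5), as reconstructed
    return gamma * B_prev + (1.0 - gamma) * B_tilde            # formula (6), as reconstructed
```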

As indicated above, the long-term noise estimates B̂n,i are overestimated by a module 45 (Figure 1) before the denoising by non-linear spectral subtraction is carried out. The module 45 calculates the overestimation coefficient α'n,i mentioned above, as well as an overestimate B̂'n,i which essentially corresponds to α'n,i·B̂n,i.

The organisation of the overestimation module 45 is shown in Figure 6. The overestimate B̂'n,i is obtained by combining the long-term estimate B̂n,i and a measure ΔBmax n,i of the variability of the noise component in band i around its long-term estimate. In the example considered, this combination is essentially a simple sum performed by an adder 46. It could also be a weighted sum.

The overestimation coefficient α'n,i is equal to the ratio between the sum B̂n,i + ΔBmax n,i delivered by the adder 46 and the delayed long-term estimate B̂n-τ3,i (divider 47), capped at a limit value αmax, for example αmax = 4 (block 48). The delay τ3 serves to correct, where appropriate, the value of the overestimation coefficient α'n,i in the rising phases (δ = 1), before the long-term estimates have been corrected by steps 40 and 41 of Figure 3 (for example τ3 = 3).

The overestimate B̂'n,i is finally taken equal to α'n,i·B̂n-τ3,i (multiplier 49).

The measure ΔBmax n,i of the noise variability reflects the variance of the noise estimator. It is obtained as a function of the values of Sn,i and of B̂n,i calculated for a certain number of previous frames over which the speech signal exhibits no voice activity in band i. It is a function of the deviations |Sn-k,i - B̂n-k,i| calculated for a number K of silence frames (n-k ≤ n). In the example shown, this function is simply the maximum (block 50). For each frame n, the degree of voice activity γn,i is compared with a threshold (block 51) to decide whether the deviation |Sn,i - B̂n,i|, calculated in 52-53, should or should not be loaded into a queue 54 of K locations organised in first-in-first-out (FIFO) mode. If γn,i does not exceed the threshold (which may be equal to 0 if the function g() has the shape of Figure 5), the FIFO 54 is not fed; otherwise it is. The maximum value contained in the FIFO 54 is then supplied as the variability measure ΔBmax n,i.
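
The overestimation module 45 could be sketched as follows for one band; the FIFO length K, the activity threshold and the small constants guarding the divisions are assumptions:

```python
from collections import deque

class NoiseOverestimator:
    """Sketch of module 45: overestimation coefficient alpha'_{n,i} and
    overestimate B'_{n,i} for one band, built from the long-term estimate
    B_{n,i} and a FIFO of |S_{n,i} - B_{n,i}| deviations over silence frames."""
    def __init__(self, k=10, activity_threshold=0.0, alpha_max=4.0, tau3=3):
        self.fifo = deque(maxlen=k)                       # queue 54 (K locations)
        self.threshold = activity_threshold               # block 51
        self.alpha_max = alpha_max                        # cap of block 48
        self.delayed = deque([1e-12] * tau3, maxlen=tau3)  # delay line for B_{n-tau3,i}

    def update(self, S_band, B_est, gamma):
        if gamma <= self.threshold:                       # no voice activity in this band
            self.fifo.append(abs(S_band - B_est))         # blocks 52-53
        delta_b_max = max(self.fifo) if self.fifo else 0.0   # block 50
        B_delayed = self.delayed[0]                       # B_{n-tau3,i}
        alpha = min((B_est + delta_b_max) / max(B_delayed, 1e-12), self.alpha_max)
        self.delayed.append(B_est)                        # shift the delay line
        return alpha, alpha * B_delayed                   # alpha'_{n,i}, B'_{n,i}
```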

As a variant, the variability measure ΔBmax n,i may be obtained as a function of the values Sn,f (and not Sn,i) and B̂n,i. The procedure is then the same, except that the FIFO 54 contains, for each band i, not |Sn-k,i - B̂n-k,i| but a deviation computed from the individual frequency components Sn-k,f of the band and the estimate B̂n-k,i.

Thanks to the independent estimates of the long-term fluctuations of the noise B̂n,i and of its short-term variability ΔBmax n,i, the overestimate B̂'n,i gives the denoising process excellent robustness against musical noise.

A first phase of the spectral subtraction is carried out by the module 55 shown in Figure 1. This phase supplies, with the resolution of the bands i (1 ≤ i ≤ I), the frequency response H1n,i of a first denoising filter, as a function of the components Sn,i and B̂n,i and of the overestimation coefficients α'n,i. This calculation can be performed for each band i according to the formula:

  H1n,i = max{ Sn,i - α'n,i·B̂n,i , β1i·B̂n,i } / Sn-τ4,i   (7)

where τ4 is an integer delay such that τ4 ≥ 0 (for example τ4 = 0). In expression (7), the coefficient β1i represents, like the coefficient βpi of formula (3), a floor conventionally used to avoid negative or excessively low values of the denoised signal.
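
A sketch of this first subtraction phase, computed per band with τ4 = 0 as in the example of the text:

```python
import numpy as np

def first_denoising_filter(S_band, B_est, alpha, beta1=0.05):
    """Frequency response H1_{n,i} of the first denoising filter (formula (7)),
    with tau4 = 0 so the denominator is the current band component S_{n,i}."""
    numerator = np.maximum(S_band - alpha * B_est, beta1 * B_est)
    return numerator / np.maximum(S_band, 1e-12)
```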

In a known manner (EP-A-0 534 837), the overestimation coefficient α'n,i could be replaced in formula (7) by another coefficient equal to a function of α'n,i and of an estimate of the signal-to-noise ratio (for example Sn,i/B̂n,i), this function decreasing with the estimated value of the signal-to-noise ratio. This function is then equal to α'n,i for the lowest values of the signal-to-noise ratio. Indeed, when the signal is very noisy, there is a priori no point in reducing the overestimation factor. Advantageously, this function decreases towards zero for the highest values of the signal-to-noise ratio. This protects the most energetic areas of the spectrum, where the speech signal is most significant, the quantity subtracted from the signal then tending towards zero.
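
One possible shape for such a decreasing function (the breakpoints snr_low and snr_high are illustrative and do not come from the patent or from EP-A-0 534 837):

```python
def snr_dependent_coefficient(alpha, snr, snr_low=1.0, snr_high=10.0):
    """Replacement coefficient for formula (7): equal to alpha' at low SNR,
    decreasing linearly to zero at high SNR (illustrative breakpoints)."""
    if snr <= snr_low:
        return alpha
    if snr >= snr_high:
        return 0.0
    return alpha * (snr_high - snr) / (snr_high - snr_low)
```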

This strategy can be refined by applying it selectively to the harmonics of the pitch frequency of the speech signal when the latter exhibits voice activity.

Thus, in the embodiment shown in Figure 1, a second denoising phase is carried out by a harmonic protection module 56. This module calculates, with the resolution of the Fourier transform, the frequency response H2n,f of a second denoising filter as a function of the parameters H1n,i, α'n,i, B̂n,i, δn, Sn,i and of the pitch frequency fp = Fe/Tp calculated outside the silence phases by a harmonic analysis module 57. In a silence phase (δn = 0), the module 56 is not in service, i.e. H2n,f = H1n,i for every frequency f of a band i. The module 57 can apply any known method of analysing the speech signal of the frame to determine the period Tp, expressed as an integer or fractional number of samples, for example a linear prediction method.

The protection provided by module 56 may consist in carrying out, for each frequency f belonging to a band i:

[formula (8), defining the corrected response H²n,f, is not reproduced in the source]

Δf = Fe/N represents the spectral resolution of the Fourier transform. When H²n,f = 1, the quantity subtracted from the component Sn,f is zero. In this computation, the floor coefficients β²i (for example β²i = β¹i) express the fact that some harmonics of the tone frequency fp may be masked by noise, so that it is not useful to protect them.

This protection strategy is preferably applied to each of the frequencies closest to the harmonics of fp, that is to say for any integer η.

If δfp denotes the frequency resolution with which the analysis module 57 produces the estimated tone frequency fp, i.e. the real tone frequency lies between fp − δfp/2 and fp + δfp/2, then the gap between the η-th harmonic of the real tone frequency and its estimate η×fp (condition (9)) can be as large as ±η×δfp/2. For high values of η, this gap can exceed half the spectral resolution Δf/2 of the Fourier transform. To take this uncertainty into account and guarantee good protection of the harmonics of the real tone frequency, each of the frequencies of the interval [η×fp − η×δfp/2, η×fp + η×δfp/2] can be protected, i.e. condition (9) above is replaced by:

∃ integer η such that |f − η·fp| ≤ (η·δfp + Δf)/2   (9')

This way of proceeding (condition (9')) is of particular interest when the values of η can be large, in particular when the method is used in a wideband system.
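A minimal sketch of this bin-selection rule (the helper and its names are assumptions for illustration): it returns the FFT bin indices that satisfy the widened condition (9'); passing δfp = 0 reduces it to the basic condition (9).

```python
import numpy as np

def protected_bins(fp, delta_fp, fs, n_fft):
    """FFT bin indices k (frequency k*fs/n_fft) lying within
    (eta*delta_fp + delta_f)/2 of some harmonic eta*fp -- condition (9').
    With delta_fp = 0 the tolerance falls back to delta_f/2, condition (9)."""
    delta_f = fs / n_fft                     # spectral resolution of the FFT
    bins = set()
    if fp <= 0.0:
        return bins
    eta = 1
    while eta * fp <= fs / 2.0 + delta_f:    # scan harmonics up to Nyquist
        tol = (eta * delta_fp + delta_f) / 2.0
        lo = max(0, int(np.ceil((eta * fp - tol) / delta_f)))
        hi = min(n_fft // 2, int(np.floor((eta * fp + tol) / delta_f)))
        bins.update(range(lo, hi + 1))
        eta += 1
    return bins
```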

For each protected frequency, the corrected frequency response H²n,f may be equal to 1 as indicated above, which corresponds to subtracting a zero quantity in the spectral subtraction, i.e. to full protection of the frequency in question. More generally, this corrected frequency response H²n,f could be taken equal to a value between 1 and H¹n,f depending on the degree of protection desired, which corresponds to subtracting a smaller quantity than would be subtracted if the frequency in question were not protected.

The spectral components S²n,f of a denoised signal are computed by a multiplier 58:

S²n,f = H²n,f · Sn,f

This signal S²n,f is supplied to a module 60 which computes, for each frame n, a masking curve by applying a psychoacoustic model of auditory perception by the human ear.

The masking phenomenon is a known principle of the operation of the human ear. When two frequencies are heard simultaneously, it is possible for one of them to no longer be audible; it is then said to be masked.

There are various methods for computing masking curves. One may for example use the one developed by J.D. Johnston ("Transform Coding of Audio Signals Using Perceptual Noise Criteria", IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988). In this method, the computation is carried out in the bark frequency scale. The masking curve is seen as the convolution of the spectral spreading function of the basilar membrane in the bark domain with the exciting signal, formed in the present application by the signal S²n,f. The spectral spreading function can be modelled as shown in figure 7. For each bark band, the contribution of the lower and upper bands convolved with the spreading function of the basilar membrane is computed:

[formula defining Cn,q not reproduced in the source]

where the indices q and q' denote the bark bands (0 ≤ q, q' ≤ Q), and S²n,q' represents the mean of the components S²n,f of the denoised exciting signal for the discrete frequencies f belonging to the bark band q'.
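A sketch of that convolution in the bark domain, assuming a simplified two-sided exponential spreading function (a steeper 25 dB/bark slope towards lower bands and a 10 dB/bark slope towards higher bands) in place of the exact curve of figure 7, which is not reproduced here.

```python
import numpy as np

def spread_bark_energies(s2_bark):
    """C[q] = sum over q' of B(q - q') * S2[q'], where B is a simplified
    spreading function of the basilar membrane: 25 dB/bark attenuation
    towards lower bands, 10 dB/bark towards higher bands (an assumption
    standing in for the spreading function of figure 7)."""
    s2_bark = np.asarray(s2_bark, dtype=float)
    q_count = len(s2_bark)
    c = np.zeros(q_count)
    for q in range(q_count):
        for q_prime in range(q_count):
            dq = q - q_prime
            att_db = 10.0 * dq if dq >= 0 else 25.0 * (-dq)   # attenuation in dB
            c[q] += s2_bark[q_prime] * 10.0 ** (-att_db / 10.0)
    return c
```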

The masking threshold Mn,q is obtained by module 60 for each bark band q according to the formula:

Mn,q = Cn,q / Rq

where Rq depends on whether the signal is more or less voiced. In a known manner, one possible form of Rq is:

10·log10(Rq) = (A + q)·χ + B·(1 − χ)

with A = 14.5 and B = 5.5. χ denotes a degree of voicing of the speech signal, varying between zero (no voicing) and 1 (strongly voiced signal). The parameter χ may take the known form:

χ = min(SFM/SFMmax, 1)

where SFM represents, in decibels, the ratio between the arithmetic mean and the geometric mean of the energy of the bark bands, and SFMmax = −60 dB.
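A sketch of the threshold computation under the assumption that χ takes the conventional spectral-flatness form given above; the clipping of χ to [0, 1] is added so the stated range is respected.

```python
import numpy as np

def masking_thresholds(c_bark, sfm_db, a=14.5, b=5.5, sfm_max_db=-60.0):
    """M[q] = C[q] / R[q] with 10*log10(R[q]) = (a + q)*chi + b*(1 - chi).
    chi = min(sfm_db / sfm_max_db, 1), clipped to [0, 1], is the assumed
    voicing degree (0 = unvoiced, 1 = strongly voiced)."""
    c_bark = np.asarray(c_bark, dtype=float)
    chi = float(np.clip(sfm_db / sfm_max_db, 0.0, 1.0))
    q = np.arange(len(c_bark))
    r_db = (a + q) * chi + b * (1.0 - chi)       # offset in dB for each bark band
    return c_bark / 10.0 ** (r_db / 10.0)
```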

The denoising system further comprises a module 62 which corrects the frequency response of the denoising filter, as a function of the masking curve Mn,q computed by module 60 and of the overestimates B̂'n,i computed by module 45. Module 62 decides on the level of denoising which must actually be achieved.

By comparing the envelope of the noise overestimate with the envelope formed by the masking thresholds Mn,q, it is decided to denoise the signal only insofar as the overestimate B̂'n,i exceeds the masking curve. This avoids needlessly removing noise that is masked by speech.

The new response H³n,f, for a frequency f belonging to the band i defined by module 12 and to the bark band q, thus depends on the relative gap between the overestimate B̂'n,i of the corresponding spectral component of the noise and the masking curve Mn,q, in the following manner:

[formula defining H³n,f not reproduced in the source; the relation it expresses is stated in words in the next paragraph]

In other words, the quantity subtracted from a spectral component Sn,f in the spectral subtraction process having the frequency response H³n,f is substantially equal to the minimum of, on the one hand, the quantity subtracted from this spectral component in the spectral subtraction process having the frequency response H²n,f and, on the other hand, the fraction of the overestimate B̂'n,i of the corresponding spectral component of the noise which, where applicable, exceeds the masking curve Mn,q.
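A minimal per-bin sketch of that rule (array shapes and the band-to-bin expansion of B̂'n,i and Mn,q are simplified assumptions): the amount subtracted is capped by the part of the noise overestimate that exceeds the masking threshold.

```python
import numpy as np

def corrected_response(h2, spectrum, noise_over, mask):
    """H3 per bin, from: subtracted = min((1 - H2) * |S|, max(B' - M, 0)),
    then H3 = 1 - subtracted / |S|. `noise_over` and `mask` are already
    expanded to one value per frequency bin (a simplification)."""
    h2 = np.asarray(h2, dtype=float)
    noise_over = np.asarray(noise_over, dtype=float)
    mask = np.asarray(mask, dtype=float)
    mag = np.maximum(np.abs(np.asarray(spectrum)), 1e-12)    # avoid division by zero
    wanted = (1.0 - h2) * mag                     # what the H2 filter would subtract
    allowed = np.maximum(noise_over - mask, 0.0)  # noise exceeding the masking curve
    return 1.0 - np.minimum(wanted, allowed) / mag
```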

Figure 8 illustrates the principle of the correction applied by module 62. It schematically shows an example of a masking curve Mn,q computed on the basis of the spectral components S²n,f of the denoised signal, together with the overestimate B̂'n,i of the noise spectrum. The quantity finally subtracted from the components Sn,f is the one represented by the hatched areas, i.e. it is limited to the fraction of the overestimate B̂'n,i of the spectral components of the noise which exceeds the masking curve.

This subtraction is performed by multiplying the frequency response H³n,f of the denoising filter by the spectral components Sn,f of the speech signal (multiplier 64). A module 65 then reconstructs the denoised signal in the time domain by applying the inverse fast Fourier transform (IFFT) to the frequency samples S³n,f delivered by the multiplier 64. For each frame, only the first N/2 = 128 samples of the signal produced by module 65 are delivered as the final denoised signal s3, after reconstruction by overlap-add with the last N/2 = 128 samples of the previous frame (module 66).
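A structural sketch of that synthesis step (windowing details are omitted and the half-spectrum layout is an assumption): each frame's corrected spectrum is inverse-transformed and its first N/2 output samples are combined with the last N/2 samples of the previous frame.

```python
import numpy as np

def overlap_add(corrected_spectra, n_fft=256):
    """Rebuild the time-domain signal from per-frame spectra S3[n, f]
    (non-negative-frequency halves of length n_fft // 2 + 1) by inverse FFT
    and 50% overlap-add, delivering N/2 new samples per frame."""
    hop = n_fft // 2
    out = np.zeros(hop * len(corrected_spectra))
    tail = np.zeros(hop)                          # last N/2 samples of the previous frame
    for n, spectrum in enumerate(corrected_spectra):
        frame = np.fft.irfft(spectrum, n=n_fft)   # back to the time domain
        out[n * hop:(n + 1) * hop] = frame[:hop] + tail
        tail = frame[hop:]                        # kept for the next frame
    return out
```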

Claims (8)

  1. Method of detecting vocal activity in a digital speech signal (s) processed by successive frames, comprising the step of subjecting the speech signal to noise suppression taking account of estimates of the noise included in the signal, updated for each frame in a manner depending on at least one degree of vocal activity (γ n,i ) determined for said frame, characterized in that a priori noise suppression is applied to the speech signal of each frame on the basis of estimates of the noise obtained on processing at least one preceding frame, and energy variations of the a priori noise-suppressed signal are analyzed to detect the degree of vocal activity of said frame.
  2. Method according to claim 1, wherein the degree of vocal activity (γ n,i ) is a non-binary parameter.
  3. Method according to claim 2, wherein the degree of vocal activity (γn,i) is a function which varies in a continuous manner in the range from 0 to 1.
  4. Method according to any one of the preceding claims, wherein the estimates of the noise are obtained in different frequency bands of the signal, the a priori noise suppression is effected band by band, and a degree of vocal activity (γn,i) is determined for each band.
  5. Method according to any one of the preceding claims, wherein an estimate of the noise B̂n,i is obtained for the frame n in a band of frequencies i in the form:
    [formula not reproduced in the source]
    where
    [formula not reproduced in the source]
    where λB is a forgetting factor in the range from 0 to 1, γn,i is the degree of vocal activity determined for the frame n in the band of frequencies i, and Sn,i is an average speech signal amplitude in frame n in band i.
  6. Method according to claim 5, in which the a priori noise-suppressed signal Êpn,i relative to a frame n and a band of frequencies i is of the form:
    [formula not reproduced in the source]
    where Hpn,i = (Sn,i − α'n−τ1,i · B̂n−τ1,i) / Sn−τ2,i, τ1 is an integer at least equal to 1, τ2 is an integer at least equal to 0, α'n−τ1,i is an overestimation coefficient determined for the frame n−τ1 and the band i, and βpi is a positive coefficient.
  7. Method according to any one of the preceding claims, wherein a long-term estimate (Ēn,i) of the energy of the a priori noise-suppressed signal (Êpn,i) is computed and said long-term estimate is compared with an instantaneous estimate (ba) of said energy, computed over the current frame, to obtain the degree of vocal activity (γn,i) of said frame.
  8. A vocal activity detector, comprising processing means adapted to implement a method according to any one of the preceding claims.
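Purely as an illustration of the structure described by claims 1, 4 and 5 (the literal claimed equations are in formulas not reproduced above, so the particular blend below is an assumption), a per-band noise update driven by a non-binary degree of vocal activity γn,i and a forgetting factor λB:

```python
import numpy as np

def update_noise_estimate(b_prev, s_band, gamma, lambda_b=0.9):
    """Per-band noise update controlled by the degree of vocal activity
    gamma in [0, 1] (0 = noise only, 1 = full speech). The current band
    amplitude contributes only to the extent that activity is low, and the
    result is smoothed with forgetting factor lambda_b. Illustrative only."""
    b_prev = np.asarray(b_prev, dtype=float)
    s_band = np.asarray(s_band, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    candidate = gamma * b_prev + (1.0 - gamma) * s_band   # activity-weighted mix
    return lambda_b * b_prev + (1.0 - lambda_b) * candidate
```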
EP98943998A 1997-09-18 1998-09-16 Method and apparatus for detecting speech activity Expired - Lifetime EP1016071B1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
FR9711640A FR2768544B1 (en) 1997-09-18 1997-09-18 VOICE ACTIVITY DETECTION METHOD
FR9711640 1997-12-22
PCT/FR1998/001979 WO1999014737A1 (en) 1997-09-18 1998-09-16 Method for detecting speech activity

Publications (2)

Publication Number Publication Date
EP1016071A1 EP1016071A1 (en) 2000-07-05
EP1016071B1 true EP1016071B1 (en) 2002-01-16

Family

ID=9511227

Family Applications (1)

Application Number Title Priority Date Filing Date
EP98943998A Expired - Lifetime EP1016071B1 (en) 1997-09-18 1998-09-16 Method and apparatus for detecting speech activity

Country Status (7)

Country Link
US (1) US6658380B1 (en)
EP (1) EP1016071B1 (en)
AU (1) AU9168898A (en)
CA (1) CA2304012A1 (en)
DE (1) DE69803202T2 (en)
FR (1) FR2768544B1 (en)
WO (1) WO1999014737A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2797343B1 (en) * 1999-08-04 2001-10-05 Matra Nortel Communications VOICE ACTIVITY DETECTION METHOD AND DEVICE
GB2367467B (en) 2000-09-30 2004-12-15 Mitel Corp Noise level calculator for echo canceller
GB2384670B (en) * 2002-01-24 2004-02-18 Motorola Inc Voice activity detector and validator for noisy environments
AUPS102902A0 (en) * 2002-03-13 2002-04-11 Hearworks Pty Ltd A method and system for reducing potentially harmful noise in a signal arranged to convey speech
JP4601970B2 (en) * 2004-01-28 2010-12-22 株式会社エヌ・ティ・ティ・ドコモ Sound / silence determination device and sound / silence determination method
JP4490090B2 (en) * 2003-12-25 2010-06-23 株式会社エヌ・ティ・ティ・ドコモ Sound / silence determination device and sound / silence determination method
US8788265B2 (en) * 2004-05-25 2014-07-22 Nokia Solutions And Networks Oy System and method for babble noise detection
KR100714721B1 (en) * 2005-02-04 2007-05-04 삼성전자주식회사 Method and apparatus for detecting voice region
US7983906B2 (en) * 2005-03-24 2011-07-19 Mindspeed Technologies, Inc. Adaptive voice mode extension for a voice activity detector
US20060241937A1 (en) * 2005-04-21 2006-10-26 Ma Changxue C Method and apparatus for automatically discriminating information bearing audio segments and background noise audio segments
US8126706B2 (en) * 2005-12-09 2012-02-28 Acoustic Technologies, Inc. Music detector for echo cancellation and noise reduction
US7366658B2 (en) * 2005-12-09 2008-04-29 Texas Instruments Incorporated Noise pre-processor for enhanced variable rate speech codec
GB0703275D0 (en) * 2007-02-20 2007-03-28 Skype Ltd Method of estimating noise levels in a communication system
JP4490507B2 (en) * 2008-09-26 2010-06-30 パナソニック株式会社 Speech analysis apparatus and speech analysis method
US20130282373A1 (en) * 2012-04-23 2013-10-24 Qualcomm Incorporated Systems and methods for audio signal processing
US9363603B1 (en) 2013-02-26 2016-06-07 Xfrm Incorporated Surround audio dialog balance assessment

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3840708A (en) * 1973-07-09 1974-10-08 Itt Arrangement to test a tasi communication system
US4281218A (en) * 1979-10-26 1981-07-28 Bell Telephone Laboratories, Incorporated Speech-nonspeech detector-classifier
US4277645A (en) * 1980-01-25 1981-07-07 Bell Telephone Laboratories, Incorporated Multiple variable threshold speech detector
US5212764A (en) 1989-04-19 1993-05-18 Ricoh Company, Ltd. Noise eliminating apparatus and speech recognition apparatus using the same
DE4012349A1 (en) * 1989-04-19 1990-10-25 Ricoh Kk Noise elimination device for speech recognition system - uses spectral subtraction of sampled noise values from sampled speech values
AU633673B2 (en) 1990-01-18 1993-02-04 Matsushita Electric Industrial Co., Ltd. Signal processing device
DE69124005T2 (en) 1990-05-28 1997-07-31 Matsushita Electric Ind Co Ltd Speech signal processing device
US5469087A (en) 1992-06-25 1995-11-21 Noise Cancellation Technologies, Inc. Control system using harmonic filters
ES2137355T3 (en) * 1993-02-12 1999-12-16 British Telecomm NOISE REDUCTION.
JP3685812B2 (en) * 1993-06-29 2005-08-24 ソニー株式会社 Audio signal transmitter / receiver
US5657422A (en) * 1994-01-28 1997-08-12 Lucent Technologies Inc. Voice activity detection driven noise remediator
US5555190A (en) 1995-07-12 1996-09-10 Micro Motion, Inc. Method and apparatus for adaptive line enhancement in Coriolis mass flow meter measurement
US5774837A (en) * 1995-09-13 1998-06-30 Voxware, Inc. Speech coding system and method using voicing probability determination
US5659622A (en) * 1995-11-13 1997-08-19 Motorola, Inc. Method and apparatus for suppressing noise in a communication system
FI100840B (en) * 1995-12-12 1998-02-27 Nokia Mobile Phones Ltd Noise attenuator and method for attenuating background noise from noisy speech and a mobile station

Also Published As

Publication number Publication date
FR2768544A1 (en) 1999-03-19
DE69803202T2 (en) 2002-08-29
CA2304012A1 (en) 1999-03-25
WO1999014737A1 (en) 1999-03-25
FR2768544B1 (en) 1999-11-19
AU9168898A (en) 1999-04-05
EP1016071A1 (en) 2000-07-05
US6658380B1 (en) 2003-12-02
DE69803202D1 (en) 2002-02-21

Similar Documents

Publication Publication Date Title
EP1016072B1 (en) Method and apparatus for suppressing noise in a digital speech signal
EP1016071B1 (en) Method and apparatus for detecting speech activity
EP1356461B1 (en) Noise reduction method and device
EP1789956B1 (en) Method of processing a noisy sound signal and device for implementing said method
US7286980B2 (en) Speech processing apparatus and method for enhancing speech information and suppressing noise in spectral divisions of a speech signal
FR2771542A1 (en) FREQUENTIAL FILTERING METHOD APPLIED TO NOISE NOISE OF SOUND SIGNALS USING A WIENER FILTER
EP2936488B1 (en) Effective attenuation of pre-echos in a digital audio signal
EP0490740A1 (en) Method and apparatus for pitch period determination of the speech signal in very low bitrate vocoders
EP2772916A1 (en) Method for suppressing noise in an audio signal by an algorithm with variable spectral gain with dynamically scalable hardness
EP3192073B1 (en) Discrimination and attenuation of pre-echoes in a digital audio signal
EP1016073B1 (en) Method and apparatus for suppressing noise in a digital speech signal
EP1021805B1 (en) Method and apparatus for conditioning a digital speech signal
EP2515300B1 (en) Method and system for noise reduction
EP4287648A1 (en) Electronic device and associated processing method, acoustic apparatus and computer program
FR3051958A1 (en) METHOD AND DEVICE FOR ESTIMATING A DEREVERBERE SIGNAL
WO1999027523A1 (en) Method for reconstructing sound signals after noise abatement
EP1192618B1 (en) Audio coding with adaptive liftering

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20000316

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): CH DE FI GB LI SE

RIC1 Information provided on ipc code assigned before grant

Free format text: 7G 10L 11/02 A, 7G 10L 11/06 B

RTI1 Title (correction)

Free format text: METHOD AND APPARATUS FOR DETECTING SPEECH ACTIVITY

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

17Q First examination report despatched

Effective date: 20001130

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: EADS DEFENCE AND SECURITY NETWORKS

Owner name: MATRA NORTEL COMMUNICATIONS

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): CH DE FI GB LI SE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20020116

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REF Corresponds to:

Ref document number: 69803202

Country of ref document: DE

Date of ref document: 20020221

REG Reference to a national code

Ref country code: CH

Ref legal event code: NV

Representative=s name: KELLER & PARTNER PATENTANWAELTE AG

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20020416

GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

Effective date: 20020501

RAP2 Party data changed (patent owner data changed or rights of a patent transferred)

Owner name: EADS DEFENCE AND SECURITY NETWORKS

Owner name: NORTEL NETWORKS FRANCE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20020930

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20020930

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20030930

Year of fee payment: 6

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20050401

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20050817

Year of fee payment: 8

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20060916

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20060916