EP0714088A1 - Voice activity detection - Google Patents

Publication number
EP0714088A1
Authority
EP
European Patent Office
Prior art keywords
vector
autocorrelation
value
standard
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP95402589A
Other languages
German (de)
French (fr)
Other versions
EP0714088B1 (en)
Inventor
Jamil Chaqui
Ivan Bourmeyster
François Robbe
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alcatel Lucent SAS
Original Assignee
Alcatel SA
Alcatel Mobile Communication France SA
Alcatel Mobile Phones SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alcatel SA, Alcatel Mobile Communication France SA, Alcatel Mobile Phones SA
Publication of EP0714088A1
Application granted
Publication of EP0714088B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Abstract

The detector calculates the autocorrelation coefficients R(k) of a signal. A first vector (R₀) is formed from a first series of coefficients, k = 0, ..., N-q, and a second vector (Rq) from a second series, k = q, ..., N, which is offset by q relative to the first. The first vector is subtracted from the second to yield a differentiation vector ΔR, whose norm provides a first voice activity indicator. A reduced norm, obtained by dividing this norm by a reduction value, gives a second indicator. The reduction value is the energy of the audio signal, or the sum of that energy and a floor value C. A linear combination of the present and previous values of an indicator gives a third indicator. The indicators are compared against a threshold.

Description

The field of the invention is the detection of voice activity in an audio signal.

Given an audio signal, which often comes from a microphone, it is sometimes necessary to know whether the signal contains speech or only noise.

Indeed, voice activity detection often determines which processing the audio signal should undergo. Typical applications that should be activated when a speech signal is present include speech recognition, echo cancellation and recording.

Conversely, for a telephony signal in which only speech carries useful information, it is now common in radiocommunications not to transmit the signal when it contains only noise; this is commonly known as discontinuous transmission.

Solutions have therefore already been proposed to detect voice activity in an audio signal.

A first solution consists in monitoring the evolution of the signal energy. A rapid increase may correspond to the onset of voice activity, but it may equally correspond to a variation in the ambient noise. It follows that this method, although very simple to implement, is not very reliable in relatively noisy environments such as a motor vehicle.
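As a rough illustration of why energy tracking alone is fragile, the sketch below flags any frame whose energy jumps relative to the previous frame. The function names, the `ratio` value and the sample frames are hypothetical and chosen only for illustration; the text does not specify how the energy increase is measured.

```python
def frame_energy(samples):
    """Energy of one frame: the sum of the squared sample values."""
    return sum(s * s for s in samples)


def energy_rise_detected(prev_energy, cur_energy, ratio=4.0):
    """Flag a frame whose energy exceeds `ratio` times the previous energy.

    `ratio` is an illustrative value, not one taken from the text.
    """
    return cur_energy > ratio * prev_energy


quiet = [1, -1, 1, -1]
speech_onset = [10, -12, 11, -9]   # stands in for the start of speech
noise_burst = [9, 11, -10, 12]     # stands in for a sudden ambient noise

print(energy_rise_detected(frame_energy(quiet), frame_energy(speech_onset)))  # True
print(energy_rise_detected(frame_energy(quiet), frame_energy(noise_burst)))   # True
```

Both calls print `True`: this rule cannot tell the onset of speech from a burst of ambient noise with the same energy.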

Many other solutions have been developed to overcome the lack of reliability of the preceding one. These include methods that apply a Fourier transform to the audio signal in order to measure the spectral distance between it and an averaged noise signal, which is updated in the absence of any voice activity. They also include methods based on sub-band analysis of the signal, which are close to those using a Fourier transform, and methods based on cepstral analysis.

These are much more complex techniques which, although they do improve reliability, are still not entirely satisfactory in that respect.

Other known solutions exploit a certain periodicity of speech, among them the one described in patent application EP 0 123 349. Voiced sounds all exhibit a definite periodicity, whereas noise is normally aperiodic or has a periodicity distinct from that of speech.

The value of this periodicity (the "pitch") can therefore be sought in order to recognize the presence of voiced sounds.

To do this, the autocorrelation coefficients of the audio signal are generally calculated and their second maximum is sought, the first maximum representing the energy. This is again a relatively complex technique, and it is not entirely satisfactory in terms of reliability.
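The search for the second autocorrelation maximum can be sketched as follows. This is a minimal illustration of the prior-art idea, not the method of EP 0 123 349: the unnormalised autocorrelation formula, the function names and the lag search range are all assumptions made for the example.

```python
import math


def autocorrelation(samples, max_lag):
    """Unnormalised autocorrelation for lags 0..max_lag (assumed definition)."""
    n = len(samples)
    return [
        sum(samples[i] * samples[i - k] for i in range(k, n))
        for k in range(max_lag + 1)
    ]


def pitch_lag(samples, min_lag, max_lag):
    """Lag of the largest coefficient in [min_lag, max_lag].

    Lag 0 (the energy) is excluded by choosing min_lag > 0, so the result
    is the position of the second maximum within the search range.
    """
    r = autocorrelation(samples, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda k: r[k])


# Synthetic periodic signal with a period of 40 samples.
signal = [math.sin(2 * math.pi * i / 40) for i in range(160)]
print(pitch_lag(signal, min_lag=20, max_lag=100))  # 40
```

The cost is a full autocorrelation plus a peak search per frame, and the result depends on a search range that has to be tuned to the expected pitch.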

The present invention therefore proposes a solution for detecting voice activity that offers acceptable reliability at reduced complexity.

According to the invention, a device for detecting voice activity in an audio signal comprises:

  • means for calculating the autocorrelation coefficients of this signal,
  • means for identifying a first autocorrelation vector having as components a first series of autocorrelation coefficients,
  • means for identifying a second autocorrelation vector having as components a second series of autocorrelation coefficients offset from the first series by a predetermined offset value,
  • means for subtracting the first autocorrelation vector from the second autocorrelation vector in order to obtain a differentiation vector,
  • means for calculating a norm of this differentiation vector, this norm representing a first voice activity indicator.

In addition, the device comprises reduction means for establishing a reduced norm by dividing the norm of the differentiation vector by a reduction value, this reduced norm representing a second voice activity indicator.

By way of example, the reduction value is equal to the energy of the signal, or to the sum of the energy of the signal and a compression constant.

According to an additional feature, the device comprises means for smoothing one of these voice activity indicators to produce a linear combination of the present value of this indicator and its previous value, this linear combination representing a third voice activity indicator.

Furthermore, the device comprises decision means for producing a voice activity signal if one of these indicators exceeds a detection threshold.

It may be advantageous to establish this detection threshold from the energy of the audio signal in the absence of a voice activity signal.

In addition, an advantageous solution consists in choosing the sum of the absolute values of the components of the differentiation vector as the norm of this vector.

The invention also relates to a method for detecting voice activity in an audio signal, comprising the following operations:

  • calculation of the autocorrelation coefficients of this signal,
  • identification of a first autocorrelation vector having as components a first series of autocorrelation coefficients,
  • identification of a second autocorrelation vector having as components a second series of autocorrelation coefficients offset with respect to the first series by a predetermined offset value,
  • subtraction of the first autocorrelation vector from the second autocorrelation vector in order to obtain a differentiation vector,
  • calculation of a norm of the differentiation vector, this norm representing a first voice activity indicator.

The present invention will now be explained more clearly by means of an exemplary embodiment, given by way of illustration with reference to the appended figure, which shows the sequence of operations carried out by the voice activity detection device.

We consider the case where the audio signal is digital, that is, it takes the form of a series of samples corresponding to the value of the signal at successive instants that recur at a sampling frequency.

When the signal to be analyzed is analog, for example when it comes from a microphone, it is first passed through an analog-to-digital converter operating at this sampling frequency to produce the audio signal.

Since the audio signal is digital, it is natural to implement the voice activity detection device by means of a digital signal processor. This processor can of course also be used for other purposes.

The structure of the detection device is therefore not described, because it implements elementary operations well known to those skilled in the art, such as additions, multiplications and comparisons. A functional description has been adopted instead, as it seems by far the clearest way to explain how the invention is implemented.

With reference to the single figure, the device receives the audio signal, and a series of samples S(i) is considered, where i varies from 0 to N.

The first operation performed by the device is the calculation of the autocorrelation coefficients R(k) of the signal for all values of k between 0 and N:

[Equation imgb0001: definition of the autocorrelation coefficients R(k)]
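The equation image defining R(k) is not reproduced in this text. The sketch below therefore assumes the conventional unnormalised definition, R(k) = Σ S(i)·S(i-k) for i from k to N, under which R(0) is the energy of the series, as the text states elsewhere. The function name is hypothetical.

```python
def autocorrelation(samples):
    """Return [R(0), ..., R(N)] for the samples S(0), ..., S(N).

    Assumed definition: R(k) = sum of S(i) * S(i - k) for i = k..N.
    """
    n = len(samples)  # n = N + 1 samples
    return [
        sum(samples[i] * samples[i - k] for i in range(k, n))
        for k in range(n)
    ]


S = [1, 2, 3, 4]
R = autocorrelation(S)
print(R)  # [30, 20, 11, 4]
```

Here R(0) = 1 + 4 + 9 + 16 = 30 is the energy of the series, and each further lag sums fewer products because fewer sample pairs overlap.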

From these autocorrelation coefficients R(k), a first autocorrelation vector R₀ and a second autocorrelation vector Rq can be defined by additionally considering an offset value q, which is a strictly positive integer. The first autocorrelation vector R₀ has as components the first (N-q+1) autocorrelation coefficients R(k):

$$R_0 = (R(0), R(1), \ldots, R(N-q))$$

The second autocorrelation vector Rq has as components the last (N-q+1) autocorrelation coefficients R(k):

$$R_q = (R(q), R(q+1), \ldots, R(N))$$

The detection device then calculates a differentiation vector ΔR by subtracting the first autocorrelation vector R₀ from the second autocorrelation vector Rq:

$$\Delta R = R_q - R_0$$

If ΔR(k) denotes the (k+1)th component of this differentiation vector, then for every k between 0 and N-q:

$$\Delta R(k) = R(k+q) - R(k)$$
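The component formula ΔR(k) = R(k+q) - R(k) translates directly into code. In this minimal sketch the function name and the example coefficients are hypothetical; `R` is assumed to hold R(0), ..., R(N) in order.

```python
def differentiation_vector(R, q):
    """Return [ΔR(0), ..., ΔR(N-q)] with ΔR(k) = R(k+q) - R(k).

    R holds the N + 1 coefficients R(0)..R(N); q must satisfy 1 <= q <= N.
    """
    if not 1 <= q < len(R):
        raise ValueError("q must be a strictly positive integer no larger than N")
    return [R[k + q] - R[k] for k in range(len(R) - q)]


R = [30, 20, 11, 4]                   # example coefficients, N = 3
print(differentiation_vector(R, 1))   # [-10, -9, -7]
print(differentiation_vector(R, 2))   # [-19, -16]
```

The result always has N-q+1 components, matching the length of the two autocorrelation vectors being subtracted.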

Note that the first and second autocorrelation vectors R₀ and Rq have no use in themselves; they were introduced solely to clarify the presentation. The important point is the calculation of the differentiation vector, which is fully defined by the value of its components as given above.

The detection device then calculates a norm ∥ΔR∥ of the differentiation vector ΔR. Advantageously, this norm is equal to the sum of the absolute values of the components of the vector:

$$\|\Delta R\| = \sum_{k=0}^{N-q} |\Delta R(k)|$$

It goes without saying that the invention also applies if another norm is chosen, such as the Euclidean norm or the maximum of the absolute values of the components.
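The three norms mentioned above can be compared on a small example. The function names and the example vector are hypothetical.

```python
import math


def sum_abs_norm(v):
    """Sum of the absolute values of the components (the preferred norm)."""
    return sum(abs(x) for x in v)


def euclidean_norm(v):
    """Square root of the sum of the squared components."""
    return math.sqrt(sum(x * x for x in v))


def max_abs_norm(v):
    """Largest absolute value among the components."""
    return max(abs(x) for x in v)


delta_R = [-10, -9, -7]
print(sum_abs_norm(delta_R))    # 26
print(max_abs_norm(delta_R))    # 10
print(euclidean_norm(delta_R))  # square root of 230
```

The sum of absolute values needs only additions and sign changes, which fits the stated aim of reduced complexity on a digital signal processor.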

This norm, whichever is chosen, constitutes a first voice activity indicator.

A first option is to compare this indicator with a threshold and to conclude that voice activity is present in the audio signal if the indicator is greater than the threshold.

According to a second option, the detection device calculates a reduced norm P by dividing the norm ∥ΔR∥ of the differentiation vector by a reduction value. By way of example, this reduction value can be chosen equal to the energy R(0) of the audio signal, which tends to compress the dynamic range of the norm ∥ΔR∥. Another solution, which has its own advantages, is to set this reduction value to the sum of the energy R(0) of the audio signal and a constant called the floor value C.

In any event, this reduced norm P constitutes a second voice activity indicator, which can likewise be compared with a threshold to establish the absence or presence of voice activity in the signal.
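Both choices of reduction value amount to P = ∥ΔR∥ / (R(0) + C), with C = 0 for the first. A minimal sketch follows; the function name, the example values and the handling of a zero denominator are assumptions, since the text does not say how an all-zero series is treated.

```python
def reduced_norm(norm, energy, floor=0.0):
    """Return P = norm / (energy + floor).

    floor = 0 gives the first choice of reduction value (energy alone);
    floor = C > 0 gives the second (energy plus floor value C).
    Assumption: a zero denominator is reported as P = 0.
    """
    denominator = energy + floor
    if denominator == 0:
        return 0.0
    return norm / denominator


print(reduced_norm(26, 30))            # 26 / 30
print(reduced_norm(26, 30, floor=22))  # 0.5
```

One consequence of a positive floor value, which the text does not spell out, is that the denominator can never be zero, so silent series need no special handling.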

According to a third option, the detection device smooths this reduced norm. If several successive series of N samples of the audio signal are considered, a reduced norm $P_i$ corresponds to the i-th series. The smoothed value $\bar{P}_i$ of this reduced norm is a linear combination of the smoothed value $\bar{P}_{i-1}$ associated with the previous series and of the reduced norm $P_i$:

$$\bar{P}_i = \alpha \bar{P}_{i-1} + \beta P_i$$

α and β can be chosen so that their sum is equal to unity.

In addition, $\bar{P}_0$ must be initialized with some constant, 0 for example.
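The recursion can be sketched in a few lines. The function name and the values α = 0.75 and β = 0.25 are illustrative assumptions; the text only suggests that α and β may sum to one and that the initial smoothed value may be 0.

```python
def smooth(reduced_norms, alpha=0.75, beta=0.25, initial=0.0):
    """Apply smoothed_i = alpha * smoothed_(i-1) + beta * P_i to each series."""
    smoothed = []
    previous = initial  # initial smoothed value, 0 by default
    for p in reduced_norms:
        previous = alpha * previous + beta * p
        smoothed.append(previous)
    return smoothed


print(smooth([1.0, 1.0, 0.0]))  # [0.25, 0.4375, 0.328125]
```

With α + β = 1 the smoothed value moves toward a constant input without overshooting it; a larger α gives heavier smoothing and a slower reaction to changes.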

This smoothed value $\bar{P}_i$ constitutes a third voice activity indicator, which can also be compared with a threshold to establish whether or not the audio signal contains voice activity.

Whichever voice activity indicator is selected, the detection device therefore compares it with a detection threshold T. The simplest solution is to assign a constant value to this detection threshold.

However, an advantageous solution consists in adapting this threshold to the level of the reduced norm P when the audio signal is devoid of voice activity.

The mean value of the reduced norm can thus be calculated over several successive series of samples of the audio signal for which no voice activity has been detected, and this mean value multiplied by a constant coefficient to obtain the detection threshold T. This technique is analogous to the smoothing technique, is well known to those skilled in the art, and is therefore not described in further detail.
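One possible reading of this adaptive threshold is sketched below. The class name, the initial threshold, the coefficient of 2 and the eight-series averaging window are all hypothetical; the text specifies only that the mean over several series without voice activity is multiplied by a constant coefficient.

```python
from collections import deque


class AdaptiveThreshold:
    """Threshold T = coefficient * mean of recent reduced norms without voice."""

    def __init__(self, initial_threshold, coefficient=2.0, history=8):
        self.threshold = initial_threshold
        self.coefficient = coefficient
        self.noise_levels = deque(maxlen=history)  # recent non-voice values

    def update(self, reduced_norm):
        """Return True if voice activity is detected for this series."""
        active = reduced_norm > self.threshold
        if not active:
            # Only series without detected voice activity feed the average.
            self.noise_levels.append(reduced_norm)
            mean = sum(self.noise_levels) / len(self.noise_levels)
            self.threshold = self.coefficient * mean
        return active


detector = AdaptiveThreshold(initial_threshold=1.0)
decisions = [detector.update(p) for p in [0.2, 0.2, 0.9, 0.2]]
print(decisions)  # [False, False, True, False]
```

In this example the threshold settles near 0.4 after the first quiet series, so the value 0.9 is flagged while the surrounding values are not.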

In addition to the detection device itself, the invention naturally also relates to the voice activity detection method implemented by this device.

As a numerical example, and to present a concrete use of the invention, consider the pan-European digital cellular radiocommunication system known as GSM. In this system, the analog signal to be processed is sampled at a frequency of 8 kHz. The resulting samples are grouped into series of 160, each of which therefore corresponds to 20 ms.

Thus N, the number of samples, equals 160, and the offset value q is advantageously set to one.

The components of the differentiation vector are then written, for all k between 1 and 160:

ΔR(k) = R(k+1) - R(k)

The norm of this vector can therefore be written:

Figure imgb0013
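
The numerical example can be sketched in code as follows. This is a rough illustration, not the patented implementation. The helper `autocorrelation` assumes a plain unnormalized definition of R(k), and `differentiation_norm` uses the sum of absolute values, which is the norm named in claim 8. Both function names are hypothetical. The range of lags is left to the caller, who passes in whichever coefficients are to be differenced.

```python
SAMPLE_RATE_HZ = 8000   # GSM sampling frequency
FRAME_LENGTH = 160      # N, samples per series
FRAME_DURATION_MS = 1000.0 * FRAME_LENGTH / SAMPLE_RATE_HZ  # 20 ms per series


def autocorrelation(samples, max_lag):
    """Return [R(0), ..., R(max_lag)] for one frame.

    Assumed definition (not taken from the text):
        R(k) = sum over i of samples[i] * samples[i + k]
    """
    n = len(samples)
    return [
        sum(samples[i] * samples[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]


def differentiation_norm(coefficients, q=1):
    """Sum of |R(k + q) - R(k)| over every pair available in `coefficients`.

    With q = 1 each term is the difference of two consecutive coefficients.
    """
    return sum(
        abs(coefficients[k + q] - coefficients[k])
        for k in range(len(coefficients) - q)
    )
```

For a GSM frame, `differentiation_norm(autocorrelation(frame, FRAME_LENGTH), q=1)` would give the first voice activity indicator under these assumptions.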

Claims (9)

1. Device for detecting voice activity in an audio signal, comprising:
- means for calculating the autocorrelation coefficients (R(k)) of this signal,
- means for identifying a first autocorrelation vector (R₀) having as components a first series (k = 0, ..., N-q) of autocorrelation coefficients (R(k)),
- means for identifying a second autocorrelation vector (Rq) having as components a second series (k = q, ..., N) of autocorrelation coefficients (R(k)), offset with respect to said first series by a predetermined offset value (q),
- means for subtracting said first autocorrelation vector (R₀) from said second autocorrelation vector (Rq) in order to obtain a differentiation vector (ΔR),
- means for calculating a norm (∥ΔR∥) of said differentiation vector, this norm representing a first voice activity indicator.

2. Device according to claim 1, characterized in that it further comprises reduction means for establishing a reduced norm by dividing said norm (∥ΔR∥) of the differentiation vector by a reduction value, this reduced norm representing a second voice activity indicator.

3. Device according to claim 2, characterized in that said reduction value is equal to the energy of the audio signal.

4. Device according to claim 2, characterized in that said reduction value is equal to the sum of the energy of the audio signal and of a floor value (C).

5. Device according to any one of claims 1 to 4, characterized in that it comprises means for smoothing one of said voice activity indicators to produce a linear combination of the present value of this indicator and its previous value, said linear combination representing a third voice activity indicator.

6. Device according to any one of claims 1 to 5, characterized in that it comprises decision means for producing a voice activity signal if one of said indicators exceeds a detection threshold.

7. Device according to claim 6, characterized in that said detection threshold is established from the value of the reduced norm of said audio signal in the absence of said voice activity signal.

8. Device according to any one of claims 1 to 7, characterized in that said norm (∥ΔR∥) of the differentiation vector is equal to the sum of the absolute values of the components of this vector.

9. Method for detecting voice activity in an audio signal, comprising the following operations:
- calculation of the autocorrelation coefficients (R(k)) of this signal,
- identification of a first autocorrelation vector (R₀) having as components a first series (k = 0, ..., N-q) of autocorrelation coefficients (R(k)),
- identification of a second autocorrelation vector (Rq) having as components a second series (k = q, ..., N) of autocorrelation coefficients (R(k)), offset with respect to said first series by a predetermined offset value (q),
- subtraction of said first autocorrelation vector (R₀) from said second autocorrelation vector (Rq) in order to obtain a differentiation vector (ΔR),
- calculation of a norm (∥ΔR∥) of said differentiation vector, this norm representing a first voice activity indicator.

EP95402589A 1994-11-22 1995-11-17 Voice activity detection Expired - Lifetime EP0714088B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR9413962 1994-11-22
FR9413962A FR2727236B1 (en) 1994-11-22 1994-11-22 DETECTION OF VOICE ACTIVITY

Publications (2)

Publication Number Publication Date
EP0714088A1 (en) 1996-05-29
EP0714088B1 (en) 1999-08-18

Family

ID=9469024

Family Applications (1)

Application Number Title Priority Date Filing Date
EP95402589A Expired - Lifetime EP0714088B1 (en) 1994-11-22 1995-11-17 Voice activity detection

Country Status (10)

Country Link
US (1) US5732141A (en)
EP (1) EP0714088B1 (en)
JP (1) JPH08221097A (en)
AT (1) ATE183598T1 (en)
AU (1) AU698712B2 (en)
CA (1) CA2163295A1 (en)
DE (1) DE69511508T2 (en)
ES (1) ES2136815T3 (en)
FI (1) FI955584A (en)
FR (1) FR2727236B1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6556967B1 (en) 1999-03-12 2003-04-29 The United States Of America As Represented By The National Security Agency Voice activity detector
US6381568B1 (en) 1999-05-05 2002-04-30 The United States Of America As Represented By The National Security Agency Method of transmitting speech using discontinuous transmission and comfort noise
EP1304682A1 (en) * 2000-07-05 2003-04-23 Alcatel Distributed speech recognition system
EP1170728A1 (en) * 2000-07-05 2002-01-09 Alcatel System for adaptively reducing noise in speech signals
EP1175058A1 (en) * 2000-07-21 2002-01-23 Alcatel Processor system, and terminal, and network-unit, and method
US7305099B2 (en) * 2003-08-12 2007-12-04 Sony Ericsson Mobile Communications Ab Electronic devices, methods, and computer program products for detecting noise in a signal based on autocorrelation coefficient gradients
EP1729410A1 (en) * 2005-06-02 2006-12-06 Sony Ericsson Mobile Communications AB Device and method for audio signal gain control
JP4516157B2 (en) * 2008-09-16 2010-08-04 パナソニック株式会社 Speech analysis device, speech analysis / synthesis device, correction rule information generation device, speech analysis system, speech analysis method, correction rule information generation method, and program
US9002030B2 (en) 2012-05-01 2015-04-07 Audyssey Laboratories, Inc. System and method for performing voice activity detection

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0123349A1 (en) * 1983-04-20 1984-10-31 Philips Electronics Uk Limited Apparatus for distinguishing between speech and certain other signals
EP0335521A1 (en) * 1988-03-11 1989-10-04 BRITISH TELECOMMUNICATIONS public limited company Voice activity detection

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3919479A (en) * 1972-09-21 1975-11-11 First National Bank Of Boston Broadcast signal identification system
JPS597120B2 (en) * 1978-11-24 1984-02-16 日本電気株式会社 speech analysis device
JPS5672499A (en) * 1979-11-19 1981-06-16 Hitachi Ltd Pretreatment for voice identifier
US4720802A (en) * 1983-07-26 1988-01-19 Lear Siegler Noise compensation arrangement
JPS62204652A (en) * 1986-03-04 1987-09-09 Nec Corp Audible frequency signal identification system
US4815137A (en) * 1986-11-06 1989-03-21 American Telephone And Telegraph Company Voiceband signal classification
FR2623382B1 (en) * 1987-11-24 1991-05-03 Peugeot Cycles DEVICE FOR FIXING A COVERING, IN PARTICULAR A SEAT COVERING
US5276765A (en) * 1988-03-11 1994-01-04 British Telecommunications Public Limited Company Voice activity detection
US5410632A (en) * 1991-12-23 1995-04-25 Motorola, Inc. Variable hangover time in a voice activity detector

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
K.S. RAFILA ET AL.: "Voiced/Unvoiced/Mixed excitation classification of speech using the autocorrelation of the output of an adpcm system", IEEE INTERNATIONAL CONFERENCE ON SYSTEMS ENGINEERING, 24 August 1989 (1989-08-24), FAIRBORN,OHIO, pages 537 - 540, XP000089110 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19716862A1 (en) * 1997-04-22 1998-10-29 Deutsche Telekom Ag Voice activity detection
US6374211B2 (en) 1997-04-22 2002-04-16 Deutsche Telekom Ag Voice activity detection method and device

Also Published As

Publication number Publication date
DE69511508D1 (en) 1999-09-23
FR2727236B1 (en) 1996-12-27
FI955584A (en) 1996-05-23
DE69511508T2 (en) 2000-07-06
ES2136815T3 (en) 1999-12-01
AU3793795A (en) 1996-05-30
EP0714088B1 (en) 1999-08-18
FI955584A0 (en) 1995-11-20
US5732141A (en) 1998-03-24
CA2163295A1 (en) 1996-05-23
FR2727236A1 (en) 1996-05-24
ATE183598T1 (en) 1999-09-15
AU698712B2 (en) 1998-11-05
JPH08221097A (en) 1996-08-30

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB IT LI NL SE

17P Request for examination filed

Effective date: 19961011

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

17Q First examination report despatched

Effective date: 19981203

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: ALCATEL

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE CH DE DK ES FR GB IT LI NL SE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 19990818

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: ALCATEL

REF Corresponds to:

Ref document number: 183598

Country of ref document: AT

Date of ref document: 19990915

Kind code of ref document: T

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REF Corresponds to:

Ref document number: 69511508

Country of ref document: DE

Date of ref document: 19990923

REG Reference to a national code

Ref country code: CH

Ref legal event code: NV

Representative's name: CABINET ROLAND NITHARDT CONSEILS EN PROPRIETE INDU

ITF It: translation for a ep patent filed

Owner name: JACOBACCI & PERANI S.P.A.

GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

Effective date: 19990928

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 19991118

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 19991130

REG Reference to a national code

Ref country code: ES

Ref legal event code: FG2A

Ref document number: 2136815

Country of ref document: ES

Kind code of ref document: T3

NLV1 Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act
BERE Be: lapsed

Owner name: ALCATEL

Effective date: 19991130

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: CH

Payment date: 20011016

Year of fee payment: 7

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: AT

Payment date: 20011026

Year of fee payment: 7

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: SE

Payment date: 20011105

Year of fee payment: 7

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: ES

Payment date: 20011119

Year of fee payment: 7

REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20021117

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20021118

Ref country code: ES

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20021118

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20021130

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20021130

EUG Se: european patent has lapsed
REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

REG Reference to a national code

Ref country code: ES

Ref legal event code: FD2A

Effective date: 20031213

REG Reference to a national code

Ref country code: FR

Ref legal event code: CD

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20071123

Year of fee payment: 13

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: IT

Payment date: 20071126

Year of fee payment: 13

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20071120

Year of fee payment: 13

Ref country code: FR

Payment date: 20071122

Year of fee payment: 13

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20081117

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20081117

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20090731

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20090603

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20081117

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20081130