EP0490740A1 - Method and apparatus for pitch period determination of the speech signal in very low bitrate vocoders - Google Patents

Method and apparatus for pitch period determination of the speech signal in very low bitrate vocoders

Info

Publication number
EP0490740A1
EP0490740A1 EP91403309A EP91403309A EP0490740A1 EP 0490740 A1 EP0490740 A1 EP 0490740A1 EP 91403309 A EP91403309 A EP 91403309A EP 91403309 A EP91403309 A EP 91403309A EP 0490740 A1 EP0490740 A1 EP 0490740A1
Authority
EP
European Patent Office
Prior art keywords
signal
autocorrelation
pitch
values
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP91403309A
Other languages
German (de)
French (fr)
Inventor
Pierre-André Laurent
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thales SA
Original Assignee
Thomson CSF SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson CSF SA

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90: Pitch determination of speech signals

Definitions

  • Processing then consists in keeping up to date a score table associated with the different possible values of the pitch M.
  • The pitch value M retained for position No is the one corresponding to the maximum of the score table, ScoreMax, located at index Imax in this table.
  • ScoreMax, namely Score(Imax), Score(Imax+1), Score(Imax+dI)
  • The value retained for the pitch is the one corresponding to Imax + [dI/2], where [dI/2] is the integer part of dI divided by 2, as indicated in Figure 4.
  • The final pitch value is the one obtained at the last iteration, it being understood that there are between 2 and 4 iterations per frame.
  • The pitch value M thus obtained corresponds to the most likely periodicity of the speech signal centered around position No, with a resolution of 1, 2 or 4 depending on the range in which M lies. The voicing is then calculated by performing a normalized autocorrelation, for a delay equal to M and possibly for neighboring values if the resolution is greater than 1, on the original speech signal S(n) and not on the preprocessed signal Scc(n) used for the pitch calculation.
  • Roo = S(No)**2 + S(No+1)**2 + ... + S(No+160)**2
  • The signal S(n) is not subsampled.
  • The quantity Roo does not depend on M and is calculated only once. RMM need only be calculated in full for the nominal value of M, namely the one provided by the pitch calculation method described above; for values close to M, RMM can be calculated by iteration if necessary. The quantity ROM, however, must be calculated for each value of M.
  • Rf(P) = [Rm(P-1) + 2.Rm(P) + Rm(P+1)] / 4
  • As shown in Figure 5, the quantity Rf(P) is compared with two thresholds SV and SNV, called respectively the voicing threshold and the non-voicing threshold, with SV greater than SNV, to obtain a binary voicing indicator IV.
  • These thresholds are adjustable, to give the decision a certain inertia that is not perceptible to the ear and so avoid local errors in the assessment of the voicing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The method consists in splitting the speech signal, after sampling, into frames of defined duration, in carrying out a first self-adaptive filtering (1) of the sampled signal (Sn) obtained in each frame in order to limit the influence of the first formant, in carrying out a second filtering (2) in order to keep only a minimum of harmonics of the fundamental frequency, and in comparing (3) the signal obtained with two adaptive thresholds SfMax(n) and SfMin(n), respectively positive and negative, which evolve over time according to a predetermined law, in order to retain only the signal portions which are respectively greater and less than the two thresholds. It next consists in calculating, over a predetermined number of possible values of the "Pitch" M (or fundamental frequency), the autocorrelation of the signal obtained at the end of the preceding processing from a defined sampling instant No, in adopting as candidate values of "Pitch" M or of fundamental frequency a predetermined number n of values which correspond to maxima of the autocorrelation, and in entering the corresponding autocorrelation values into a table of scores updated on each new autocorrelation, in order to retain as the "Pitch" value only the one which corresponds to a maximum score. Application: low bit rate vocoders.

Description

The present invention relates to a method and a device for evaluating the periodicity and the voicing of the speech signal in very low bit rate vocoders.

In known low bit rate vocoders, the speech signal is divided into frames of 20 and 30 ms in order to determine, within each frame, the periodicity of the speech signal, known as the "pitch". During transitions, however, this period is not stable, and errors occur in the estimation of the pitch and consequently of the voicing in those parts. Moreover, if the speech signal is heavily corrupted by ambient noise, the evaluation of the pitch is strongly disturbed or even erroneous.

The object of the invention is to overcome the aforementioned drawbacks.

To this end, the subject of the invention is a method for evaluating the periodicity and the voicing of the speech signal in very low bit rate vocoders, characterized in that it consists in carrying out a first processing, which consists in dividing the signal, after sampling, into frames of fixed duration, in performing a first self-adaptive filtering of the sampled signal (Sn) obtained in each frame to limit the influence of the first formant, in performing a second filtering to keep only a minimum of harmonics of the fundamental frequency, and in comparing the signal obtained with two adaptive thresholds SfMax(n) and SfMin(n), respectively positive and negative, which evolve over time according to a predetermined law, so as to retain only the signal portions that are respectively greater and less than these two thresholds; and in carrying out a second processing on the signal Scc(n) obtained at the end of the first processing, which consists in calculating, over a predetermined number of possible values of the "Pitch" M (or fundamental frequency), the autocorrelation of the signal obtained at the end of the first processing from a given sampling instant No, in retaining as candidate values of "Pitch" M or of fundamental frequency a predetermined number n of values that correspond to maxima of the autocorrelation, and in recording the corresponding autocorrelation values in a score table updated at each new autocorrelation, so as to retain as the "Pitch" value only the one that corresponds to a maximum score.

Other characteristics and advantages of the invention will become apparent from the following description, given with reference to the appended drawings, which represent:

  • Figure 1: a flowchart of the preprocessing of the speech signal implemented by the invention;
  • Figure 2: examples of the evolution of the filtered signal and of the final signal obtained at the end of the preprocessing chain of Figure 1;
  • Figure 3: a flowchart for the calculation of K candidate values for the determination of the "Pitch" according to the invention;
  • Figure 4: a diagram illustrating one way of determining the "Pitch" from a table of coefficients representing different possible values of the "Pitch";
  • Figure 5: a diagram illustrating the operation of a voicing indicator.

In principle, the invention consists in making, within a given frame, several pitch estimates at regular intervals and in favoring successive estimates that have neighboring values, by assigning a quality factor to each estimate. The quality factor has a maximum value when the signal is perfectly periodic and a lower value when its periodicity is less marked. Since voicing is directly linked to the autocorrelation of the speech signal for a delay equal to the retained pitch value, the autocorrelation is maximal for a voiced sound and weak for an unvoiced sound. The voicing indication is obtained by comparing the autocorrelation with thresholds after temporal smoothing and hysteresis operations, to avoid erroneous transitions from the voiced state to the unvoiced state or vice versa.

The pitch determination method comprises two main processing steps, a preprocessing step represented by the flowchart of Figure 1 and an autocorrelation calculation step; both steps can easily be programmed on any known signal processor.

As shown in Figure 1, the preprocessing step breaks down into a self-adaptive filtering step 1, followed by a low-pass filtering step 2 and a self-adaptive clipping step 3.

In the self-adaptive filtering step 1, the sampled speech signal is first whitened by a self-adaptive filter of moderate order, for example 4, so as to limit the influence of the first formant. If S(n) denotes the n-th speech sample and Ai(n) the value of the i-th coefficient, the signal Sb(n) obtained at the output of the self-adaptive filter has the form:

    Sb(n) = S(n) - A1(n).S(n-1) - A2(n).S(n-2) - A3(n).S(n-3) - A4(n).S(n-4)

and the coefficients Ai(n) are adapted by applying a relation of the form:

    Ai(n+1) = Ai(n) + Eps.Sign(Sb(n).S(n-i))

where Eps is a constant of small value, for example 1/128.
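
The two relations above can be sketched in a few lines of Python. This is only an illustration, not the patent's implementation: the function name `whiten`, the zero initial coefficients, the zero signal history before the first sample, and the convention Sign(0) = 0 are all assumptions that the text does not specify.

```python
def whiten(s, order=4, eps=1.0 / 128):
    """Sign-adapted prediction-error filter (illustrative sketch).

    Assumptions not stated in the source: coefficients start at zero,
    samples before the start of `s` are zero, and Sign(0) is taken as 0.
    """
    a = [0.0] * order  # A1..A4
    sb = []
    for n, x in enumerate(s):
        # Past samples S(n-1)..S(n-order).
        past = [s[n - i] if n - i >= 0 else 0.0 for i in range(1, order + 1)]
        # Sb(n) = S(n) - sum_i Ai(n).S(n-i)
        e = x - sum(ai * p for ai, p in zip(a, past))
        sb.append(e)
        # Ai(n+1) = Ai(n) + Eps.Sign(Sb(n).S(n-i))
        for i, p in enumerate(past):
            prod = e * p
            if prod > 0:
                a[i] += eps
            elif prod < 0:
                a[i] -= eps
    return sb
```

Because only the sign of the product is used, each coefficient moves by exactly Eps per sample, which suits fixed-point arithmetic.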

The signal Sb(n) is then applied in step 2 to the input of a low-pass filter, whose role is both to keep only a minimum of harmonics of the fundamental and to reduce the frequency band of the signal so that it can then be subsampled, in order to reduce the execution time of the autocorrelation calculations described below.

The filtered signal Sf(n) thus obtained can be given by an equation of the form:

    Sf(n) = [Sb(n) + Sb(n-9) + 3.(Sb(n-1) + Sb(n-8)) + 6.(Sb(n-2) + Sb(n-7)) + 9.(Sb(n-3) + Sb(n-6)) + 11.(Sb(n-4) + Sb(n-5))] / 64

or by any similar form capable of giving the low-pass filter a cut-off frequency of the order of 800 Hz and sufficient attenuation of frequencies above 1000 Hz.
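
As an illustration, the filter equation above is a symmetric 10-tap FIR filter and can be written directly. The names `LP_TAPS` and `lowpass` are hypothetical, and treating samples before the start of the signal as zero is an assumption.

```python
# Weights applied to Sb(n), Sb(n-1), ..., Sb(n-9), before division by 64.
LP_TAPS = [1, 3, 6, 9, 11, 11, 9, 6, 3, 1]


def lowpass(sb):
    """Apply the 10-tap low-pass filter; history before sb[0] is assumed zero."""
    out = []
    for n in range(len(sb)):
        acc = 0.0
        for k, w in enumerate(LP_TAPS):
            if n - k >= 0:
                acc += w * sb[n - k]
        out.append(acc / 64.0)
    return out
```

The weights sum to 60, so with the division by 64 a constant input is scaled by 60/64 once the filter has filled.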

The last preprocessing operation, carried out in step 3, transforms the signal Sf(n) into a signal Scc(n) by a self-adaptive clipping method of the type known as "center clipping". Its effect is to reinforce the temporal differences of the filtered signal.

For example, when the signal Sf(n) contains very little fundamental at a frequency Fo and a large amount of second harmonic, the waveform obtained at the end of step 3 is close to a sinusoid of frequency 2.Fo with a slight distortion every two periods. The preprocessing of step 3 then further reinforces this distortion, which makes the subsequent pitch calculation easier. As shown in Figures 2A and 2B, this preprocessing consists in calculating two adaptive thresholds, SfMin(n) and SfMax(n), which evolve over time, so as to retain only the signal portions that are respectively lower and greater than these two thresholds.

The thresholds SfMin(n) and SfMax(n) satisfy the relations:

    SfMin(n) = E.SfMin(n-1)
    SfMax(n) = E.SfMax(n-1)

with

    E = exp(-Te/Tau)

where Te is the sampling period and Tau is a time constant of the order of 5 to 10 ms.

It follows from the above that the signal Scc(n) obtained at the end of step 3 always has zero amplitude, except when Sf(n) lies outside the interval bounded by the two thresholds, that is, when:

    Sf(n) > SfMax(n) or Sf(n) < SfMin(n)

If Sf(n) > SfMax(n), the difference Sf(n) - SfMax(n) is amplified to give a signal Scc(n) defined by the relation:

    Scc(n) = G[Sf(n) - SfMax(n)]

In this case the old value of SfMax(n) is updated, that is, SfMax(n) is set equal to the new value Sf(n). If, on the other hand, Sf(n) < SfMin(n), it is the difference Sf(n) - SfMin(n) that is amplified to give a signal Scc(n) defined by the relation:

    Scc(n) = G[Sf(n) - SfMin(n)]

and the old value of SfMin(n) is updated, that is, SfMin(n) is set equal to the new value Sf(n).
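
A minimal sketch of this adaptive center clipping follows. It is not taken from the patent: the function name `center_clip`, the zero initial thresholds, the unit gain, the 7.5 ms time constant (chosen inside the stated 5 to 10 ms range) and the sampling period of 1/8000 s are all illustrative assumptions.

```python
import math


def center_clip(sf, te=1.0 / 8000, tau=0.0075, gain=1.0):
    """Adaptive center clipping (sketch). Requires tau > 0.

    Assumed, not given in the source: thresholds start at zero,
    te = 1/8000 s, tau = 7.5 ms, gain G = 1.
    """
    e = math.exp(-te / tau)  # E = exp(-Te/Tau)
    sf_max = 0.0
    sf_min = 0.0
    scc = []
    for x in sf:
        # Both thresholds decay toward zero at every sample.
        sf_max *= e
        sf_min *= e
        if x > sf_max:
            scc.append(gain * (x - sf_max))
            sf_max = x  # threshold jumps to the new peak
        elif x < sf_min:
            scc.append(gain * (x - sf_min))
            sf_min = x
        else:
            scc.append(0.0)  # inside the two thresholds: clipped to zero
    return scc
```

Only the samples that push past the decaying envelopes survive, so the output is a sparse train of pulses at the dominant positive and negative peaks.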

In relations (7) and (8), which define Scc(n), G represents a gain value, preferably chosen constant, which improves the calculation accuracy when a fixed-point signal processor is used.

If the time constant Tau in the preceding relations is chosen equal to zero, it follows that the signal Scc(n) is identical to the signal Sf(n).

The autocorrelation calculation step that follows is carried out for each pitch value M, for a given sampling position No. In the description that follows, the calculation uses subsampling by a factor of 4 over a time range of 160 samples, corresponding to the maximum value that can be accepted for the pitch. The same principle obviously applies for a different subsampling factor and for a different range.

As represented by steps 4 to 6 of the flowchart of Figure 3, the calculation consists in computing three quantities Roo, RMM and ROM defined as follows, in which the sign ** denotes raising to a power:

    Roo = Scc(No)**2 + Scc(No+4)**2 + Scc(No+8)**2 + ... + Scc(No+160)**2
    RMM = Scc(No-M)**2 + Scc(No+4-M)**2 + Scc(No+8-M)**2 + ... + Scc(No+160-M)**2
    ROM = Scc(No).Scc(No-M) + Scc(No+4).Scc(No+4-M) + ... + Scc(No+160).Scc(No+160-M)
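
The three sums can be sketched as a single loop over the subsampled window. The function name `correlation_terms` and its parameter names are hypothetical; the caller is assumed to guarantee that n0 - m >= 0 and that n0 + span is a valid index.

```python
def correlation_terms(scc, n0, m, span=160, step=4):
    """Return (Roo, RMM, ROM) for lag m at position n0 (illustrative sketch).

    Sums run over k = 0, 4, 8, ..., 160, i.e. 41 terms with the defaults.
    """
    roo = rmm = rom = 0.0
    for k in range(0, span + 1, step):
        cur = scc[n0 + k]        # Scc(No + k)
        lag = scc[n0 + k - m]    # Scc(No + k - M)
        roo += cur * cur
        rmm += lag * lag
        rom += cur * lag
    return roo, rmm, rom
```

For a signal that is exactly periodic with period m, the current and lagged samples coincide and the three quantities are equal.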

For each chosen position No, the quantity Roo is calculated only once, in step 4; the quantity RMM is calculated in full in step 5 only for certain values of M and by iteration for the others; and the quantity ROM is calculated in full in step 5 for each value of M.

The values of M for which the autocorrelation is calculated correspond to a fundamental frequency of the speech signal that can vary between 50 Hz and 400 Hz. They are determined over three ranges defined as follows:
   Range 1 M = 20, 21, 22, ..., 40, i.e. 21 values in steps of 1
   Range 2 M = 42, 44, 46, ..., 80, i.e. 20 values in steps of 2
   Range 3 M = 84, 88, 92, ..., 160, i.e. 20 values in steps of 4
that is, a total of 61 different values, which can be coded for example on 6 bits with a precision of 5 % or better, corresponding to about one semitone of the chromatic scale.
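The lag grid can be enumerated with a short Python sketch. The helper names `candidate_lags` and `lag_step` are hypothetical and only restate the three ranges given above.

```python
def candidate_lags():
    """Return the 61 candidate pitch lags M, in increasing order."""
    lags = list(range(20, 41, 1))     # range 1: 21 values, step 1
    lags += list(range(42, 81, 2))    # range 2: 20 values, step 2
    lags += list(range(84, 161, 4))   # range 3: 20 values, step 4
    return lags


def lag_step(m):
    """Return the step dM (1, 2 or 4) of the range containing lag m."""
    if m <= 40:
        return 1
    if m <= 80:
        return 2
    return 4
```

The relative resolution dM/M is largest at the start of each range (1/20, 2/42, 4/84), which is where the 5 % bound comes from.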

The iteration formula used for calculating RMM is as follows:
   RMM(M) = RMM(M-4) + Scc(No-M)**2 - Scc(No+164-M)**2

Furthermore, to improve the accuracy of the search for the autocorrelation maxima, a parabolic interpolation formula is used which, for a given value M, uses the values of the preceding quantities at M-dM, M and M+dM, dM being a step equal to 1, 2 or 4 depending on the range considered. It follows that only the values RMM(19), RMM(20), RMM(21) and RMM(22) need to be calculated in full; the others are obtained by iteration, up to and including M = 164.
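The recurrence follows from the fact that the windows for M and M-4 share all but one term at each end. A minimal Python sketch, with hypothetical function names, checks it against the direct sum; it assumes `n0` is at least 164 so that every index is valid.

```python
def rmm_direct(scc, n0, m, span=160, step=4):
    """RMM(M) computed in full over offsets 0, 4, ..., 160."""
    return sum(scc[n0 + k - m] ** 2 for k in range(0, span + 1, step))


def rmm_iterative(scc, n0, m_max=164):
    """Return {M: RMM(M)} for M = 19..m_max.

    RMM(19)..RMM(22) are computed in full; every later value reuses
    RMM(M-4), adding the sample that enters the window and removing
    the one that leaves it.
    """
    rmm = {m: rmm_direct(scc, n0, m) for m in (19, 20, 21, 22)}
    for m in range(23, m_max + 1):
        rmm[m] = rmm[m - 4] + scc[n0 - m] ** 2 - scc[n0 + 164 - m] ** 2
    return rmm
```

With integer samples the two computations agree exactly; with floating-point samples a small accumulated rounding difference is to be expected.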

On the basis of the above, a value Rau(M) is calculated, defined as follows:
   Rau(M) = 0 if ROM(M) <= 0
   Rau(M) = ROM(M)**2 / [Roo.RMM(M)] if ROM(M) > 0
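As a brief illustration, the definition of Rau translates directly into Python. The function name `rau` is hypothetical. By the Cauchy-Schwarz inequality the result lies between 0 and 1, and a positive ROM implies that both energies are non-zero, so the division is safe.

```python
def rau(roo, rmm, rom):
    """Squared normalized correlation, clipped to 0 for non-positive ROM."""
    if rom <= 0:
        return 0.0
    return rom * rom / (roo * rmm)
```

Squaring ROM avoids a square root, and the test on the sign of ROM discards anti-correlated lags that squaring alone would let through.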

Only the values of M for which a local maximum is obtained, namely those for which Rau(M) satisfies the inequalities:
   Rau(M) > Rau(M-dM) and Rau(M) >= Rau(M+dM)
are taken into consideration in step 6. For these values of M only, a parabolically interpolated value Rint is then calculated according to the relation:
   Rint = Rau(M) + 1/8 [Rau(M+dM) - Rau(M-dM)]**2 / [2.Rau(M) - Rau(M-dM) - Rau(M+dM)]
Subsequent processing retains only the K highest values of Rint (and the associated values of M), for example the K = 2 largest maxima, denoted Rmax(1), ..., Rmax(K) (and Mmax(1), ..., Mmax(K)).
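The peak selection can be sketched in Python as below. The function name `interpolated_peaks` and its data layout (a dictionary from lag to Rau value, which must also contain M-dM and M+dM for every candidate M) are assumptions made for the illustration.

```python
def interpolated_peaks(rau, lags, steps, k_best=2):
    """Return up to k_best (Rint, M) pairs, largest Rint first.

    rau   : dict mapping a lag to its Rau value
    lags  : candidate lags M
    steps : step dM associated with each candidate lag
    """
    peaks = []
    for m, dm in zip(lags, steps):
        left, mid, right = rau[m - dm], rau[m], rau[m + dm]
        if mid > left and mid >= right:            # local maximum test
            curvature = 2 * mid - left - right     # strictly positive here
            rint = mid + (right - left) ** 2 / (8 * curvature)
            peaks.append((rint, m))
    peaks.sort(reverse=True)
    return peaks[:k_best]
```

The strict inequality on the left side guarantees that the denominator is positive, so no separate division check is needed.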

Processing continues by keeping up to date a table of scores associated with the different possible values of the "Pitch" M.

This table, denoted Score(i) in FIG. 4, contains, for the i = 1 to 61 values M of the "Pitch" (from 20 to 160), a quantity that is an increasing function of the likelihood of the associated "Pitch". It is updated at each new evaluation of the autocorrelations (typically every 5 to 10 ms), taking into account the fact that, from one evaluation to the next, the positions of the maxima can vary by plus one unit, remain stationary, or vary by minus one unit, depending on whether the "Pitch" is increasing, stationary or decreasing, respectively.

The score table is transferred to a temporary table, denoted ExScore(i) (not shown). This table is defined according to the values of i as follows:
   ExScore(0) = 0
   ExScore(i) = Score(i) for i = 1, ..., 61
   ExScore(62) = 0

Periodically (if not systematically), the minimum value is subtracted to avoid possible overflows, such that:
   ExScore(i) = ExScore(i) - ScoreMin
with
   ScoreMin = MIN [Score(20), Score(21), ..., Score(61)]

The scores are then initialized to allow for a possible drift of the "Pitch", which gives:
   Score(i) = MAX [ExScore(i-1), ExScore(i), ExScore(i+1)] for i = 20, ..., 61

Finally, for the values I(1), ..., I(K) of i corresponding to the K pitches Mmax(1), ..., Mmax(K) at which maxima were found, the scores are increased by an amount equal to the autocorrelation maxima found, such that:
   Score(I(k)) = Score(I(k)) + Rmax(k) for k = 1, 2, ..., K
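One update of the score table can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented procedure: the function name `update_scores` is hypothetical, the table is taken to hold entries i = 1 to 61, the minimum is taken over the whole table and subtracted at every update, and the two padding entries ExScore(0) and ExScore(62) are kept at zero.

```python
def update_scores(score, peaks):
    """Return the updated score table.

    score : list of table values, score[0] holding table entry i = 1
    peaks : list of (i, Rmax) pairs, i being a 1-based table index
    """
    n = len(score)
    floor = min(score)                                   # ScoreMin
    ex = [0.0] + [s - floor for s in score] + [0.0]      # ExScore(0) .. ExScore(n+1)
    # Each entry takes the best of itself and its two neighbours, so a
    # pitch that drifts by one table position keeps its accumulated score.
    new = [max(ex[i - 1], ex[i], ex[i + 1]) for i in range(1, n + 1)]
    for i, rmax in peaks:
        new[i - 1] += rmax                               # reward the current maxima
    return new
```

A candidate that is rewarded once and then never again does not lose its score immediately; the score instead spreads by one position per update and is overtaken only as other entries accumulate.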

Finally, the value M of the Pitch retained for position No is the one corresponding to the maximum of the score table, ScoreMax, located at index Imax in that table.

If, for reasons of calculation precision and/or of the algorithm, several successive values of the score are equal to the maximum ScoreMax, namely Score(Imax), Score(Imax+1), ..., Score(Imax+dI), the value retained for the Pitch is the one corresponding to Imax+[dI/2], where [dI/2] is the integer part of the division of dI by 2, as shown in FIG. 4.
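The tie-breaking rule can be illustrated with a short Python sketch. The function name `pick_pitch_index` is hypothetical, indices are 0-based, and only the first run of maximal values is considered.

```python
def pick_pitch_index(score):
    """Return the index retained for the pitch.

    When the maximum is repeated over consecutive entries
    Imax .. Imax+dI, the entry at Imax + dI // 2 is returned.
    """
    best = max(score)
    imax = score.index(best)          # first occurrence of ScoreMax
    d = 0
    while imax + d + 1 < len(score) and score[imax + d + 1] == best:
        d += 1                        # extend the plateau to the right
    return imax + d // 2
```

For the table `[0, 1, 3, 3, 3, 2]` the plateau runs from index 2 to index 4, so index 3 is retained.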

For a given frame, in which the calculations described above are carried out several times, the final value of the Pitch is the one obtained at the last iteration, it being understood that there are between 2 and 4 iterations per frame.

The value M of the Pitch thus obtained corresponds to the most likely periodicity of the speech signal centered on position No, with a resolution of 1, 2 or 4 depending on the range in which the value of M lies. The voicing rate is then calculated by performing a normalized autocorrelation, for a delay equal to M and, where the resolution is greater than 1, for neighboring values as well, on the original speech signal S(n) and not on the signal Scc(n) preprocessed for the Pitch calculation.

For example, for M = 30 the normalized autocorrelation is calculated only for a delay of 30; for M = 40 it is calculated for delays of 40 and 41; and for M = 100 it is calculated for a delay of 100 but also for delays of 98, 99, 101 and 102 (the resolution being 4 for M = 100).
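The text gives only these three examples. One rule that reproduces all of them is to extend the delay range halfway towards each neighboring value of the lag grid; the Python sketch below implements that rule as an assumption, with a hypothetical function name, and with no extension beyond the ends of the grid.

```python
def neighbour_delays(m, lags):
    """Return the delays examined for the retained lag m.

    lags is the increasing list of candidate lags. The range reaches
    halfway (integer division) towards the previous and the next
    candidate lag.
    """
    i = lags.index(m)
    below = (m - lags[i - 1]) // 2 if i > 0 else 0
    above = (lags[i + 1] - m) // 2 if i + 1 < len(lags) else 0
    return list(range(m - below, m + above + 1))
```

With the 61-value grid, this gives `[30]` for M = 30, `[40, 41]` for M = 40 and `[98, 99, 100, 101, 102]` for M = 100.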

In all cases, the value Rm retained is the largest of the values thus calculated, an elementary value for a given M being defined by the relations:
   R = ROM**2 / (Roo.RMM) if ROM is positive
   R = 0 if ROM is less than or equal to zero
with
   Roo = S(No)**2 + S(No+1)**2 + ... + S(No+160)**2
   RMM = S(No-M)**2 + S(No+1-M)**2 + ... + S(No+160-M)**2
   ROM = S(No).S(No-M) + S(No+1).S(No+1-M) + ... + S(No+160).S(No+160-M)

Unlike in the preceding calculation, which was carried out on the signal Scc(n), the signal S(n) is not subsampled.

The quantity Roo does not depend on M and is calculated only once. RMM need only be calculated in full for the nominal value of M, namely the one supplied by the Pitch calculation described above; for values close to M, RMM can be calculated by iteration if necessary. The quantity ROM, on the other hand, must be calculated for each of the values of M.
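A minimal Python sketch of the voicing measure Rm is given below. The function name `voicing_measure` is hypothetical, and for simplicity RMM is recomputed in full for every delay instead of being updated by iteration.

```python
def voicing_measure(s, n0, delays, span=160):
    """Return Rm, the largest elementary value R over the given delays.

    s is the original, non-subsampled speech signal; every sample of
    the window n0 .. n0+span contributes to the sums.
    """
    window = range(span + 1)
    roo = sum(s[n0 + k] ** 2 for k in window)      # independent of the delay
    best = 0.0
    for m in delays:
        rmm = sum(s[n0 + k - m] ** 2 for k in window)
        rom = sum(s[n0 + k] * s[n0 + k - m] for k in window)
        r = rom * rom / (roo * rmm) if rom > 0 else 0.0
        best = max(best, r)
    return best
```

A signal that repeats exactly at one of the delays gives Rm = 1, while a signal that is anti-correlated at every delay gives Rm = 0.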

To limit fluctuations of the maximum value Rm thus obtained, especially in a noisy environment, this value is filtered by a low-pass filter between two successive passes (corresponding to two successive values of the reference position No), to obtain a filtered value Rf(P) defined at each iteration P by the relation:
   Rf(P) = (1-a).Rf(P-1) + a.Rm
where a is a constant preferably equal to ¼ or ½ for the performance to be satisfactory.

If a coding delay can be tolerated, an even more satisfactory expression may be:
   Rf(P) = [Rm(P-1) + 2.Rm(P) + Rm(P+1)] / 4
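Both smoothing relations can be sketched in Python as follows. The function names are hypothetical; the initial state `rf0 = 0.0` of the recursive filter and the choice to skip the first and last passes in the look-ahead form are assumptions, since the text does not specify boundary handling.

```python
def smooth_recursive(rm_values, a=0.25, rf0=0.0):
    """First-order low-pass: Rf(P) = (1-a).Rf(P-1) + a.Rm(P)."""
    out, rf = [], rf0
    for rm in rm_values:
        rf = (1 - a) * rf + a * rm
        out.append(rf)
    return out


def smooth_lookahead(rm_values):
    """Rf(P) = [Rm(P-1) + 2.Rm(P) + Rm(P+1)] / 4 for interior passes.

    Needs Rm(P+1), hence one pass of delay; the result is two
    entries shorter than the input.
    """
    return [(rm_values[p - 1] + 2 * rm_values[p] + rm_values[p + 1]) / 4
            for p in range(1, len(rm_values) - 1)]
```

The recursive form needs no look-ahead but lags behind changes in Rm; the three-point form is symmetric around P, at the cost of waiting for the next pass.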

Finally, as shown in FIG. 5, the quantity Rf(P) is compared with two thresholds SV and SNV, called the voicing threshold and the non-voicing threshold respectively, with SV greater than SNV, to obtain a binary voicing indicator IV.

In FIG. 5, state IV = 1 corresponds to a voiced sound and state IV = 0 corresponds to an unvoiced sound.

Starting from state IV = 1, IV goes to state 0 when Rf(P) falls below SNV; starting from state IV = 0, IV goes to state 1 when Rf(P) rises above SV.

Typical values for the two thresholds are, for example, SV = 0.2 and SNV = 0.05, taking 1 as the maximum value of Rf(P) and 0 as its minimum value.
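The two-threshold decision is a hysteresis comparator, which can be sketched in Python as below. The function name `voicing_decisions` and the initial state `iv = 0` are assumptions; the default thresholds are the typical values quoted above.

```python
def voicing_decisions(rf_values, sv=0.2, snv=0.05, iv=0):
    """Return the voicing indicator IV after each filtered value Rf(P).

    IV switches to 1 only when Rf rises above sv and back to 0 only
    when Rf falls below snv; between the thresholds it keeps its state.
    """
    out = []
    for rf in rf_values:
        if iv == 1 and rf < snv:
            iv = 0
        elif iv == 0 and rf > sv:
            iv = 1
        out.append(iv)
    return out
```

For the sequence 0.1, 0.3, 0.1, 0.04, 0.1, 0.25 the indicator is 0, 1, 1, 0, 0, 1: the value 0.1 leaves the state unchanged in both directions.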

To optimize the performance of the voicing decision, it is preferable for these thresholds to be adjustable, so as to give the decision a certain inertia that is not perceptible to the ear and that avoids local errors in the assessment of voicing.

Claims (6)

1. Method for evaluating the periodicity and the voicing of the speech signal in very low bit rate vocoders, characterized in that it consists in carrying out a first processing consisting in dividing the signal, after sampling, into frames of determined duration, in performing a first self-adaptive filtering (1) of the sampled signal (Sn) obtained in each frame to limit the influence of the first formant, in performing a second filtering (2) to keep only a minimum of harmonics of the fundamental frequency, and in comparing (3) the signal obtained with two adaptive thresholds SfMin(n) and SfMax(n), respectively positive and negative and varying as a function of time according to a predetermined law, so as to retain only the signal portions that are respectively above and below the two thresholds; and in carrying out a second processing (4, 5, 6) on the signal Scc(n) obtained at the end of the first processing, consisting in calculating, over a predetermined number of possible fundamental frequencies or "Pitch" values M, the autocorrelation of the signal obtained at the end of the first processing starting from a determined sampling instant No, in retaining as candidate "Pitch" values M or fundamental frequencies a predetermined number n of those that correspond to maxima of the autocorrelation, and in recording the corresponding values of the autocorrelation in a score table updated at each new autocorrelation, so as to retain as the "Pitch" value only the one that corresponds to a maximum score.
2. Method according to claim 1, characterized in that the autocorrelation of the signal Scc(n) obtained at the end of the first processing is calculated from the sampling instant No over a determined number of samples following it, by performing:
- a first addition (ROO) over a first series of samples separated from one another by a determined number of samples;
- a second addition (RMM) over a second series of samples, each corresponding to a sample of the first series delayed by the "Pitch" value M;
- a third addition (ROM) of the products of each sample of the first series with its counterpart in the second series,
so as to form the quotient Rau(M) of the result (ROM) of the third addition by the product of the other two (ROO x RMM), and to consider only a determined number K of values of M for which the quotient Rau(M) is a local maximum.
3. Method according to claims 1 and 2, characterized in that, to evaluate the voicing, it consists in calculating the autocorrelation of the sampled speech signal for a delay equal to the retained "Pitch" value M and for the neighboring values, retaining only the largest of the values thus calculated, low-pass filtering this value, and comparing it, with hysteresis, with two thresholds, a voicing threshold and a non-voicing threshold, to decide whether the speech signal is voiced or unvoiced.
4. Method according to any one of claims 1 to 3, characterized in that the first self-adaptive filtering consists in subtracting from each current sample Sn the sum, weighted by coefficients Ai(n), of a determined number i of previous samples, the adapted coefficient Ai(n+1) being obtained by adding to the current coefficient Ai(n) a quantity EPS whose sign is the product of the sign of the result of the subtraction and the sign of the sample S(n-i).
5. Method according to claim 4, characterized in that the two adaptive thresholds SfMin(n) and SfMax(n) are determined for each current sample at instant n from the previous sample at instant n-1 by the relations:
   SfMin(n) = E.SfMin(n-1)
   SfMax(n) = E.SfMax(n-1)
where E is an exponential function of the ratio between the sampling period Te and a constant Tau of between 5 and 10 ms.
6. Device for evaluating the periodicity and the voicing of the speech signal in very low bit rate vocoders, characterized in that it comprises a signal processor programmed according to any one of claims 1 to 5.
EP91403309A 1990-12-11 1991-12-06 Method and apparatus for pitch period determination of the speech signal in very low bitrate vocoders Ceased EP0490740A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR9015477 1990-12-11
FR9015477A FR2670313A1 (en) 1990-12-11 1990-12-11 METHOD AND DEVICE FOR EVALUATING THE PERIODICITY AND VOICE SIGNAL VOICE IN VOCODERS AT VERY LOW SPEED.

Publications (1)

Publication Number Publication Date
EP0490740A1 true EP0490740A1 (en) 1992-06-17

Family

ID=9403105

Family Applications (1)

Application Number Title Priority Date Filing Date
EP91403309A Ceased EP0490740A1 (en) 1990-12-11 1991-12-06 Method and apparatus for pitch period determination of the speech signal in very low bitrate vocoders

Country Status (4)

Country Link
US (1) US5313553A (en)
EP (1) EP0490740A1 (en)
CA (1) CA2057139A1 (en)
FR (1) FR2670313A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2739482A1 (en) * 1995-10-03 1997-04-04 Thomson Csf Speech signal analysis method e.g. for low rate vocoder
WO1998001848A1 (en) * 1996-07-05 1998-01-15 The Victoria University Of Manchester Speech synthesis system
WO1999010879A1 (en) * 1997-08-25 1999-03-04 Telefonaktiebolaget Lm Ericsson Waveform-based periodicity detector

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT1263050B (en) * 1993-02-03 1996-07-24 Alcatel Italia METHOD FOR ESTIMATING THE PITCH OF A SPEAKING ACOUSTIC SIGNAL AND SYSTEM FOR THE RECOGNITION OF SPOKEN USING THE SAME
JP3601074B2 (en) * 1994-05-31 2004-12-15 ソニー株式会社 Signal processing method and signal processing device
US5704000A (en) * 1994-11-10 1997-12-30 Hughes Electronics Robust pitch estimation method and device for telephone speech
FR2738383B1 (en) * 1995-09-05 1997-10-03 Thomson Csf METHOD FOR VECTOR QUANTIFICATION OF LOW FLOW VOCODERS
IL115697A (en) * 1995-10-19 1999-09-22 Audiocodes Ltd Pitch determination preprocessor based on correlation techniques
US6026357A (en) * 1996-05-15 2000-02-15 Advanced Micro Devices, Inc. First formant location determination and removal from speech correlation information for pitch detection
FR2778041A1 (en) * 1998-04-24 1999-10-29 Thomson Csf Power transmitter tube dynamic compensation method
FR2788390B1 (en) 1999-01-12 2003-05-30 Thomson Csf HIGH EFFICIENCY SHORTWAVE BROADCAST TRANSMITTER OPTIMIZED FOR DIGITAL TYPE TRANSMISSIONS
FR2790343B1 (en) * 1999-02-26 2001-06-01 Thomson Csf SYSTEM FOR ESTIMATING THE COMPLEX GAIN OF A TRANSMISSION CHANNEL
FR2799592B1 (en) 1999-10-12 2003-09-26 Thomson Csf SIMPLE AND SYSTEMATIC CONSTRUCTION AND CODING METHOD OF LDPC CODES
GB2375028B (en) * 2001-04-24 2003-05-28 Motorola Inc Processing speech signals
AU2003901538A0 (en) * 2003-03-28 2003-05-01 Cochlear Limited Maxima search method for sensed signals
US7421298B2 (en) * 2004-09-07 2008-09-02 Cochlear Limited Multiple channel-electrode mapping
CN113327601B (en) * 2021-05-26 2024-02-13 清华大学 Method, device, computer equipment and storage medium for identifying harmful voice

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2145501A1 (en) * 1971-07-09 1973-02-23 Western Electric Co
FR2321738A1 (en) * 1975-08-22 1977-03-18 Nippon Telegraph & Telephone CIRCUIT FOR DETERMINING THE FUNDAMENTAL PERIOD OF A SPEECH SIGNAL FOR SPEECH ANALYZER
EP0125423A1 (en) * 1983-04-13 1984-11-21 Texas Instruments Incorporated Voice messaging system with pitch tracking based on adaptively filtered LPC residual signal
US4653098A (en) * 1982-02-15 1987-03-24 Hitachi, Ltd. Method and apparatus for extracting speech pitch
EP0345675A2 (en) * 1988-06-09 1989-12-13 National Semiconductor Corporation Hybrid stochastic gradient for convergence of adaptive filters

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3603738A (en) * 1969-07-07 1971-09-07 Philco Ford Corp Time-domain pitch detector and circuits for extracting a signal representative of pitch-pulse spacing regularity in a speech wave
US4015088A (en) * 1975-10-31 1977-03-29 Bell Telephone Laboratories, Incorporated Real-time speech analyzer

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2145501A1 (en) * 1971-07-09 1973-02-23 Western Electric Co
FR2321738A1 (en) * 1975-08-22 1977-03-18 Nippon Telegraph & Telephone CIRCUIT FOR DETERMINING THE FUNDAMENTAL PERIOD OF A SPEECH SIGNAL FOR SPEECH ANALYZER
US4653098A (en) * 1982-02-15 1987-03-24 Hitachi, Ltd. Method and apparatus for extracting speech pitch
EP0125423A1 (en) * 1983-04-13 1984-11-21 Texas Instruments Incorporated Voice messaging system with pitch tracking based on adaptively filtered LPC residual signal
EP0345675A2 (en) * 1988-06-09 1989-12-13 National Semiconductor Corporation Hybrid stochastic gradient for convergence of adaptive filters

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
IEEE JOURNAL OF SOLID STATE CIRCUITS vol. 22, no. 3, June 1987, N. YORK USA pages 479 - 487; POPE ET AL: 'A single chip linear predictive coding vocoder' *
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING. vol. 24, no. 5, October 1976, NEW YORK US pages 399 - 418; RABINER ET AL: 'A comparative performance of several pitch detection algorithms' *
INTERNATIONAL CONFERENCE ON ACOUSTICS SPEECH AND SIGNAL PROCESSING vol. 1, 26 March 1985, TAMPA FLORIDA USA pages 403 - 406; KWON ET AL: 'A robust real time pitch extraction from the ACF of LPC residual error signals' *
INTERNATIONAL CONFERENCE ON ACOUSTICS SPEECH AND SIGNAL PROCESSING vol. 1, 7 April 1986, TOKYO JAPAN pages 121 - 124; VERHELST ET AL: 'An adaptive non uniform sign clipping preprocessor (ANUSC) for real-time autocorrelative pitch detection' *
RABINER, SCHAFER 'Digital processing of speech signals' 1978 , PRENTICE HALL , ENGLEWOOD CLIFFS USA *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2739482A1 (en) * 1995-10-03 1997-04-04 Thomson Csf Speech signal analysis method e.g. for low rate vocoder
WO1998001848A1 (en) * 1996-07-05 1998-01-15 The Victoria University Of Manchester Speech synthesis system
WO1999010879A1 (en) * 1997-08-25 1999-03-04 Telefonaktiebolaget Lm Ericsson Waveform-based periodicity detector
US5970441A (en) * 1997-08-25 1999-10-19 Telefonaktiebolaget Lm Ericsson Detection of periodicity information from an audio signal

Also Published As

Publication number Publication date
CA2057139A1 (en) 1992-06-12
US5313553A (en) 1994-05-17
FR2670313A1 (en) 1992-06-12

Similar Documents

Publication Publication Date Title
EP1356461B1 (en) Noise reduction method and device
EP0490740A1 (en) Method and apparatus for pitch period determination of the speech signal in very low bitrate vocoders
EP1016072B1 (en) Method and apparatus for suppressing noise in a digital speech signal
EP0782128B1 (en) Method of analysing by linear prediction an audio frequency signal, and its application to a method of coding and decoding an audio frequency signal
FR2734389A1 (en) METHOD FOR ADAPTING THE NOISE MASKING LEVEL IN A SYNTHETIC ANALYSIS ANALYTICAL ENCODER USING A SHORT-TERM PERCEPTUAL WEIGHING FILTER
WO1996021220A1 (en) Speech coding method using synthesis analysis
EP1016071B1 (en) Method and apparatus for detecting speech activity
EP1016073B1 (en) Method and apparatus for suppressing noise in a digital speech signal
EP1021805B1 (en) Method and apparatus for conditioning a digital speech signal
EP0573358B1 (en) Variable speed voice synthesizer method and apparatus
FR2797343A1 (en) METHOD AND DEVICE FOR DETECTING VOICE ACTIVITY
EP1192619B1 (en) Audio coding and decoding by interpolation
EP0596785A1 (en) Method for the discrimination of speech in presence of ambient noise and low bit-rate vocoder to implement the method
EP1192621B1 (en) Audio encoding with harmonic components
FR2739482A1 (en) Speech signal analysis method e.g. for low rate vocoder
EP1194923B1 (en) Methods and device for audio analysis and synthesis
EP0543719A1 (en) Method and arrangement for voiced-unvoiced decision applied in a very low rate vocoder
EP0454552A2 (en) Method and apparatus for low bitrate speech coding
WO1999027523A1 (en) Method for reconstructing sound signals after noise abatement
WO2001003121A1 (en) Encoding and decoding with harmonic components and minimum phase
FR2741743A1 (en) Speech intelligibility improvement method for low bit rate vocoder
WO2001003117A1 (en) Audio coding with adaptive liftering
WO2002093553A1 (en) Estimation of fundamental periods of multiple concurrent sources in particular of sound

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE ES GB IT

17P Request for examination filed

Effective date: 19921008

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: THOMSON-CSF

17Q First examination report despatched

Effective date: 19950519

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 19960725