EP2080194B1

EP2080194B1 - Attenuation of overvoicing, in particular for generating an excitation at a decoder, in the absence of information

Info

Publication number: EP2080194B1
Application number: EP07858612A
Authority: EP
Inventors: David Virette; Balazs Kovesi
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2006-10-20
Filing date: 2007-10-17
Publication date: 2011-12-07
Anticipated expiration: 2027-10-17
Also published as: RU2437170C2; ATE536613T1; BRPI0718423B1; BRPI0718423A2; US20100324907A1; ES2378972T3; JP2010507120A; KR20090090312A; CN101573751A; KR101409305B1; US8417520B2; JP5289319B2; WO2008047051A2; EP2080194A2; MX2009004212A; CN101573751B; WO2008047051A3; RU2009118918A

Abstract

The invention concerns the synthesis of a signal made up of consecutive blocks. More particularly, on receipt of such a signal, it proposes replacing lost or erroneous blocks of this signal by synthesis. To this end, it proposes attenuating overvoicing when generating a signal synthesis. More specifically, a voiced excitation is generated on the basis of the pitch period (T) estimated or transmitted in the previous block, optionally applying a correction of plus or minus one sample to the duration of this period (counted as a number of samples), by forming groups (A′,B′,C′,D′) of at least two samples and inverting the positions of samples within the groups, either randomly (B′,C′) or systematically. The over-harmonicity of the generated excitation is thus broken, which attenuates the overvoicing effect in the synthesis of the generated signal.

Description

The present invention relates to the processing of digital audio signals, such as speech signals in telecommunications, and in particular to the decoding of such signals.

Recall briefly that a speech signal can be predicted from its recent past (for example from 8 to 12 samples at 8 kHz) using parameters evaluated over short windows (10 to 20 ms in this example). These short-term prediction parameters, which represent the transfer function of the vocal tract (for example when pronouncing consonants), are obtained by LPC (Linear Prediction Coding) analysis methods. A longer-term correlation is also used to determine the periodicities of voiced sounds (e.g. vowels) caused by the vibration of the vocal cords. The aim is therefore to determine at least the fundamental frequency of the voiced signal, which typically ranges from 60 Hz (deep voice) to 600 Hz (high voice) depending on the speaker. An LTP (Long Term Prediction) analysis then determines the LTP parameters of a long-term predictor, in particular the inverse of the fundamental frequency, often called the pitch period. The number of samples in a pitch period is then defined by the ratio $F_e/F_0$ (or its integer part), where:

$F_e$ is the sampling rate, and
$F_0$ is the fundamental frequency.
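The ratio above can be checked numerically. The following Python sketch (the helper name `pitch_period_samples` is hypothetical) keeps the integer part of $F_e/F_0$:

```python
def pitch_period_samples(fs_hz: float, f0_hz: float) -> int:
    """Number of samples in one pitch period: integer part of F_e / F_0."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    return int(fs_hz // f0_hz)


# At F_e = 8 kHz, the 60 Hz to 600 Hz range of F_0 spans roughly 133 down to 13 samples.
for f0 in (60, 100, 600):
    print(f0, "Hz ->", pitch_period_samples(8000, f0), "samples")
```

At 8 kHz, a deep voice therefore gives a pitch period of more than a hundred samples, while a high voice gives only about a dozen.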

Note therefore that the LTP long-term prediction parameters, including the pitch period, represent the fundamental vibration of the speech signal (when it is voiced), while the LPC short-term prediction parameters represent the spectral envelope of this signal.

All of these LPC and LTP parameters, which result from speech coding, are transmitted in blocks to a peer decoder via one or more telecommunication networks, so that the original speech signal can then be restored.

When such signals are communicated in blocks, one or more consecutive blocks may be lost. The term "block" means a succession of signal data, which may be, for example, a frame in mobile radio communication, a packet in communication over IP (Internet Protocol), or other units.

In mobile radio communication, for example, most predictive synthesis coding techniques, in particular CELP (Code Excited Linear Predictive) coding, provide solutions for recovering erased frames. The decoder is informed that a frame has been erased, for example by frame erasure information sent by the channel decoder. The purpose of erased-frame recovery is to extrapolate the parameters of the erased frame from one or more previous frames considered valid. Some parameters manipulated or coded by predictive coders are strongly correlated from one frame to the next, typically the LTP long-term prediction parameters (for voiced sounds, for example) and the LPC short-term prediction parameters. Because of this correlation, it is much more advantageous to reuse the parameters of the last valid frame to synthesize the erased frame than to use random or even erroneous parameters.

In CELP excitation generation, the parameters of the erased frame are conventionally obtained as follows.

The LPC parameters of a frame to be reconstructed are obtained from the LPC parameters of the last valid frame, either by simply copying them or by introducing some damping (a technique used, for example, in the standardized G.723.1 coder). Then, voicing or its absence is detected in the speech signal to determine a degree of harmonicity of the signal in the erased frame.
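The text does not say how the copied LPC parameters are damped. One common way to damp an all-pole LPC filter, assumed in this sketch and not necessarily the G.723.1 procedure, is bandwidth expansion: each coefficient $a_i$ is replaced by $\gamma^i a_i$ with $0 < \gamma \le 1$, which pulls the poles towards the origin and widens the formant peaks. The function name and the factor 0.98 are illustrative choices.

```python
from typing import List


def damp_lpc(a: List[float], gamma: float = 0.98) -> List[float]:
    """Bandwidth expansion of LPC coefficients a_1..a_p (a_0 = 1 omitted).

    Replacing a_i by gamma**i * a_i scales every pole of 1/A(z) by gamma,
    so applying it again for each consecutive lost frame progressively
    flattens the spectral envelope. gamma = 1 is a plain copy.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1]")
    return [coef * gamma ** (i + 1) for i, coef in enumerate(a)]


last_valid = [-1.6, 0.9, -0.2]   # hypothetical a_1..a_3 of the last valid frame
frame1 = damp_lpc(last_valid)    # first erased frame
frame2 = damp_lpc(frame1)        # second erased frame: damped once more
```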

If the signal is unvoiced, an excitation signal can be generated randomly (by drawing a codeword from the past excitation, by slightly damping the gain of the past excitation, by random selection within the past excitation, or even by using transmitted codes that may be totally erroneous).
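Two of the options listed above (random selection within the past excitation and slight damping of its gain) can be combined in a few lines. This is a minimal sketch; the function name and the damping factor are assumptions.

```python
import random
from typing import List, Optional


def unvoiced_excitation(past_exc: List[float], length: int,
                        damping: float = 0.9,
                        rng: Optional[random.Random] = None) -> List[float]:
    """Fill a lost frame with a randomly chosen segment of the past
    excitation, attenuated by a slight damping factor."""
    if len(past_exc) < length:
        raise ValueError("not enough past excitation")
    rng = rng or random.Random()
    start = rng.randrange(len(past_exc) - length + 1)  # random selection
    return [damping * x for x in past_exc[start:start + length]]
```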

If the signal is voiced, the pitch period (also called the "LTP delay") is generally the one calculated for the previous frame, possibly with a slight "jitter" (the LTP delay is increased for consecutive erroneous frames, with an LTP gain very close to or equal to 1). The excitation signal is therefore limited to the long-term prediction performed from a past excitation.
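The conventional voiced concealment described above reduces to the long-term prediction $e[n] = g \, e[n-T]$. The sketch below (function name hypothetical) shows why it tends to overvoice: with $g$ close to 1, the lost frame is an almost exact repetition of the last pitch period.

```python
from typing import List


def ltp_concealment(past_exc: List[float], pitch: int, length: int,
                    gain: float = 1.0) -> List[float]:
    """Classic voiced concealment: e[n] = gain * e[n - pitch]."""
    if pitch < 1 or len(past_exc) < pitch:
        raise ValueError("need at least one pitch period of past excitation")
    exc = list(past_exc)
    for _ in range(length):
        exc.append(gain * exc[-pitch])  # sample located one pitch period back
    return exc[len(past_exc):]
```

The slight jitter mentioned above could be emulated by calling it with `pitch + 1` for each further lost frame; this is an assumption about how the jitter is applied.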

The means for concealing erased frames at decoding are generally closely tied to the structure of the decoder and may share modules with it, such as the signal synthesis module. These means also use intermediate signals available within the decoder, such as the past excitation signal stored while processing the valid frames that precede the erased frames.

Some techniques used to conceal the errors caused by packets lost while transporting data coded with a time-domain coder rely on waveform substitution. Such techniques seek to reconstruct the signal by selecting portions of the signal decoded before the lost period and do not use synthesis models. Smoothing techniques are also used to avoid the artifacts caused by concatenating the different signals.

For decoders operating on signals coded by transform coding, the techniques for reconstructing erased frames generally rely on the coding structure used. Some techniques aim to regenerate the lost transform coefficients from the values these coefficients took before the erasure.

Other erased-frame concealment techniques have been developed jointly with channel coding. They use information supplied by the channel decoder, for example information on the reliability of the received parameters. By contrast, the subject of the present invention does not presuppose the existence of a channel coder.

Combescure et al.:

"A 16,24,32 kbit/s Wideband Speech Codec Based on ATCELP", P. Combescure, J. Schnitzler, K. Ficher, R. Kirchherr, C. Lamblin, A. Le Guyader, D. Massaloux, C. Quinquis, J. Stegmann, P. Vary, Proceedings of the ICASSP Conference (1998), proposed using, for a transform coder, an erased-frame concealment method equivalent to the one used in CELP coders.

The drawbacks of this method were the introduction of audible spectral distortions ("synthetic" voice, spurious resonances, etc.). These drawbacks were due in particular to the use of poorly controlled long-term synthesis filters (a single harmonic component for voiced sounds, use of portions of the past residual signal for unvoiced sounds). In addition, the energy is controlled here at the level of the excitation signal, and the energy target of this signal is kept constant throughout the erasure, which also generates audible and annoying artifacts.

Document FR-2813722 proposed an erased-frame concealment technique that does not generate more distortion at higher error rates and/or for longer erased intervals. This technique aims to avoid excessive periodicity for voiced sounds and to better control the generation of the unvoiced excitation. To do so, the excitation signal (if voiced) is considered as the sum of two signals:

a strongly harmonic component, band-limited to the low frequencies of the total spectrum, and
another, less harmonic component, limited to the higher frequencies.

The strongly harmonic component is obtained by LTP filtering. The second component is also obtained by LTP filtering, made non-periodic by randomly modifying its fundamental period.
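The two-component idea can be illustrated as follows. The filters actually used in FR-2813722 are not described here, so this sketch substitutes a complementary pair of two-tap filters (low-pass $(x[n]+x[n-1])/2$, high-pass $(y[n]-y[n-1])/2$) and a uniform random lag perturbation of at most `max_jitter` samples. These choices and the function name are assumptions.

```python
import random
from typing import List, Optional


def two_band_excitation(past_exc: List[float], pitch: int, length: int,
                        max_jitter: int = 2,
                        rng: Optional[random.Random] = None) -> List[float]:
    """Voiced excitation as the sum of a low-band periodic component and a
    high-band component whose period is randomly perturbed (illustrative)."""
    if pitch <= max_jitter or len(past_exc) < pitch + max_jitter:
        raise ValueError("pitch too short or not enough past excitation")
    rng = rng or random.Random()
    harmonic = list(past_exc)   # strictly periodic LTP copy
    aperiodic = list(past_exc)  # LTP copy with a random lag for every sample
    for _ in range(length):
        harmonic.append(harmonic[-pitch])
        lag = pitch + rng.randint(-max_jitter, max_jitter)
        aperiodic.append(aperiodic[-lag])
    start = len(past_exc)
    out = []
    for n in range(start, start + length):
        low = 0.5 * (harmonic[n] + harmonic[n - 1])     # crude low-pass
        high = 0.5 * (aperiodic[n] - aperiodic[n - 1])  # complementary high-pass
        out.append(low + high)
    return out
```

Because the two filters sum to the identity, setting `max_jitter=0` gives back the plain periodic LTP copy; the random lag only affects the upper band.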

The main problem of the error concealment techniques used so far in CELP coders lies in the generation of the voiced excitation which, when several consecutive frames have been lost, can cause an overvoicing effect due to the repetition of the same pitch period over several frames. WO 2006/079348 therefore proposes varying the samples in the successive frames.

The present invention, as defined by claims 1, 7 and 8, improves the situation.

To this end, it proposes a method for synthesizing a digital audio signal represented by consecutive blocks of samples, in which, on receipt of such a signal, a replacement block for at least one invalid block is generated from the samples of at least one valid block preceding the invalid block.

The method according to the invention comprises the following steps:

a) selecting a chosen number of samples forming a succession in at least the last valid block preceding the invalid block,
b) splitting the succession of samples into groups of samples and, in at least some of the groups, inverting samples according to predetermined rules,
c) re-concatenating the groups, at least some of whose samples were inverted in step b), to form at least part of the replacement block, and
d) if the part obtained in step c) does not fill the whole replacement block, copying said part into the replacement block and applying steps a), b) and c) again to the copied part.
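Steps a) to d) can be sketched in Python as below. This is an illustrative reading, not the patented implementation: the function name, the `p_swap` parameter and the layout of the groups are assumptions. Groups of two samples are used; the pairs lie on a grid that starts at the first replaced sample and stays fixed along the replacement block, and a pair that would straddle two consecutive copies is left unchanged. `p_swap = 1.0` gives a forced inversion in every group, a smaller value a random one.

```python
import random
from typing import List, Optional


def synthesize_replacement(history: List[float], period: int, block_len: int,
                           p_swap: float = 1.0,
                           rng: Optional[random.Random] = None) -> List[float]:
    """Build a replacement block from the last `period` valid samples by
    swapping samples within groups of two (steps a to d, illustrative)."""
    if period < 2 or len(history) < period:
        raise ValueError("need at least one full period of valid samples")
    rng = rng or random.Random()
    out: List[float] = []
    segment = list(history[-period:])            # a) last `period` valid samples
    while len(out) < block_len:
        offset = len(out) % 2                    # where the fixed pair grid falls in this copy
        for i in range(offset, period - 1, 2):   # b) groups of two samples
            if rng.random() < p_swap:            # p_swap = 1.0: forced inversion
                segment[i], segment[i + 1] = segment[i + 1], segment[i]
        out.extend(segment)                      # c) re-concatenate into the block
        # d) the copy just written is the selection processed on the next pass
    return out[:block_len]


if __name__ == "__main__":
    print(synthesize_replacement([0.0, 1.0, 2.0, 3.0, 4.0], 5, 10))
    # [1.0, 0.0, 3.0, 2.0, 4.0, 1.0, 3.0, 0.0, 4.0, 2.0]
```

Each copy is a permutation of the previous one rather than an identical repetition; with `p_swap = 0.0` the sketch degenerates into the plain pitch-period copy that causes overvoicing.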

This sample inversion, a very simple manipulation of samples that is inexpensive in computation and processing resources, is intended to "break" the over-harmonicity that would be present if the pitch period were simply copied.

Thus, among the advantages of the present invention, its implementation requires only a very low computational cost.

The invention applies advantageously when the digital audio signal is a voiced speech signal, and more particularly a weakly voiced one, because simply copying the pitch period gives poor results in that case. Thus, according to one advantageous feature, a degree of voicing is detected in the speech signal and steps a) to d) are applied if the signal is at least weakly voiced.

The present invention advantageously relies on the fundamental frequency of the digital audio signal to form the groups in step b). Thus, advantageously, in step a):

a1) a tone is detected in the digital audio signal, and
a2) said chosen number of samples selected in step a) corresponds to the number of samples in a period equal to the inverse of a fundamental frequency of the detected tone.

Of course, in the case of a speech signal, operation a1) may consist in detecting voicing, and operation a2) then consists, if the speech signal is voiced, in selecting a number of samples spanning a whole pitch period (the inverse of the fundamental frequency of a voice tone). Note, however, that this embodiment may also apply to a signal other than speech, in particular a music signal, if a fundamental frequency specific to an overall musical tone can be detected in it.
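Operations a1) and a2), together with the voicing decision, can be illustrated with a normalized autocorrelation, a common way to estimate a pitch period that is assumed here rather than taken from the text. The search range and the two voicing thresholds are hypothetical values, and the function names are illustrative.

```python
import math
from typing import List, Tuple


def normalized_corr(x: List[float], lag: int) -> float:
    """Normalized correlation between x[n] and x[n - lag] over the buffer."""
    a, b = x[lag:], x[:len(x) - lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def estimate_pitch(x: List[float], min_lag: int, max_lag: int) -> Tuple[int, float]:
    """a1)/a2): lag (in samples) maximizing the normalized correlation."""
    if max_lag >= len(x):
        raise ValueError("buffer shorter than the largest lag searched")
    best = max(range(min_lag, max_lag + 1), key=lambda k: normalized_corr(x, k))
    return best, normalized_corr(x, best)


def voicing_degree(rho: float, weak: float = 0.3, strong: float = 0.7) -> str:
    """Hypothetical thresholds, for illustration only."""
    if rho >= strong:
        return "strongly voiced"
    if rho >= weak:
        return "weakly voiced"
    return "unvoiced"
```

The returned lag is then the number of samples selected in step a), and steps a) to d) would be applied when `voicing_degree` is not "unvoiced".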

In one embodiment, the fragmentation of step b) is performed in groups of two samples, and the positions of the two samples of each group are swapped.
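The pair inversion of this embodiment takes only a few lines. In this sketch (function name hypothetical), an odd leftover sample is kept in place, which is an assumption. Note that the operation is its own inverse: applying it twice restores the original order.

```python
from typing import List, Sequence


def swap_pairs(samples: Sequence[float]) -> List[float]:
    """Split `samples` into groups of two and swap the two samples of each
    group. With an odd length, the last sample has no partner and stays put."""
    out = list(samples)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return out
```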

In this embodiment, however, a distinction must be made according to whether the pitch period (or, more generally, the period that is the inverse of the fundamental frequency) contains an even or an odd number of samples. In particular, if the period of the detected tone contains an even number of samples, an odd number of samples (preferably a single sample) is advantageously added to or removed from the samples of that period to form the selection of step a).
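The following sketch illustrates why an even length is corrected. It assumes, as one possible reading, that the pairs lie on a grid fixed along the generated signal, so that with an odd length the grid shifts by one sample from one copy to the next. With an even length, two forced swaps cancel out and the original period reappears every second copy; with an odd length, successive copies keep changing. The function names are hypothetical.

```python
from typing import List


def odd_selection_length(period: int, add: bool = True) -> int:
    """Number of samples to select in step a): the pitch period, made odd
    by adding one sample (or, with add=False, removing one) if it is even."""
    if period % 2 == 0:
        return period + 1 if add else period - 1
    return period


def forced_swap_copies(segment: List[float], n_copies: int) -> List[List[float]]:
    """Successive copies with every pair swapped, on a pair grid fixed along
    the output stream (assumption), so the grid shifts for odd lengths."""
    copies, seg, pos = [], list(segment), 0
    for _ in range(n_copies):
        for i in range(pos % 2, len(seg) - 1, 2):
            seg[i], seg[i + 1] = seg[i + 1], seg[i]
        copies.append(list(seg))
        pos += len(seg)
    return copies
```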

It is also necessary to specify what is meant by the "predetermined inversion rules". These rules, which can be chosen according to the characteristics of the received signal, set in particular the number of samples per group in step b) and the way the samples within a group are inverted. In the embodiment above, groups of two samples are used and the positions of these two samples are simply swapped. Other configurations are possible, however (groups of more than two samples, with all the samples of such groups permuted). The inversion rules may also set the number of groups in which inversion is performed. In one particular embodiment, the occurrence of sample inversion in each group is made random, with a probability threshold deciding whether or not the samples of a group are inverted. This probability threshold may have a fixed value or a variable value, advantageously depending on a correlation function relating to the pitch period. In that case, the pitch period itself does not need to be formally determined. More generally, the processing according to the invention can also be carried out when the valid received signal is simply not voiced, in which case there is no really detectable pitch period. An arbitrary number of samples (for example two hundred samples) can then be set, and the processing according to the invention performed on that number of samples. It is also possible to take the value corresponding to the maximum of the correlation function, limiting the search to a range of values (for example between MAX_PITCH/2 and MAX_PITCH, where MAX_PITCH is the maximum value in the pitch period search).
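The random variant and the fallback for a non-voiced signal can be sketched as follows. The function names, the mapping `1 - rho` used for a variable threshold, and the value 143 used for `max_pitch` in the check are assumptions for illustration; the text only states that the threshold may be fixed or depend on a correlation function.

```python
import math
import random
from typing import List, Optional


def invert_pairs_randomly(segment: List[float], p: float,
                          rng: Optional[random.Random] = None) -> List[float]:
    """Swap the two samples of each group of two with probability p."""
    rng = rng or random.Random()
    out = list(segment)
    for i in range(0, len(out) - 1, 2):
        if rng.random() < p:  # the probability threshold decides each group
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


def probability_from_correlation(rho: float) -> float:
    """Hypothetical variable threshold: invert more often when the
    correlation at the pitch lag is low (weakly voiced signal)."""
    return min(1.0, max(0.0, 1.0 - rho))


def selection_length(x: List[float], max_pitch: int, default: int = 200) -> int:
    """Samples to process without a pitch period: the lag maximizing the
    normalized correlation in [max_pitch // 2, max_pitch], or a fixed
    default when the buffer is too short for that search."""
    if len(x) <= max_pitch:
        return default

    def corr(k: int) -> float:
        a, b = x[k:], x[:len(x) - k]
        den = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
        return sum(p * q for p, q in zip(a, b)) / den if den > 0 else 0.0

    return max(range(max_pitch // 2, max_pitch + 1), key=corr)
```

A fixed threshold of 0.5 corresponds to the 50% example of Figure 1.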

The present invention, which thus provides attenuation of overvoicing, offers the following advantages:

the speech synthesized during a block loss shows practically no over-harmonicity or overvoicing, and
the complexity needed to generate a voiced excitation is very low, as will be seen in the embodiment described in detail below.

Other advantages and features of the invention will become apparent on examining the detailed description below, given by way of example, and the appended drawings, in which:

Figure 1 illustrates the principle of an excitation generation that attenuates the overvoicing effect by incorporating a random inversion of samples, in groups of two samples and with a probability of 50% in the example shown, over a whole pitch period,
Figure 2 illustrates the principle of an excitation generation incorporating a sample inversion, here systematic, in groups of two samples in the example shown and over a whole pitch period,
Figure 3a illustrates the application of the systematic inversion of Figure 2 to a signal whose estimated pitch period contains an odd number of samples,
Figure 3b shows, purely for illustration, the application of the systematic inversion of Figure 2 to a signal whose estimated pitch period contains an even number of samples,
Figure 3c illustrates the application of the systematic inversion of Figure 2, here with a correction that adds one sample to the duration corresponding to the pitch period so that this duration contains an odd number of samples,
Figure 4 schematically illustrates the main steps of a method according to the invention, at decoding,
Figure 5 very schematically illustrates the structure of an apparatus for receiving a digital audio signal, comprising a synthesis device for implementing the method according to the invention.

We first refer to Figure 4 to illustrate the context in which the present invention is implemented. On receipt of an input signal Se at decoding, the loss of one or more consecutive blocks is detected (test 50). If no block loss is found (arrow O at the output of test 50), no problem arises, of course, and the processing of Figure 4 ends.

On the other hand, if the loss of one or more consecutive blocks is detected (arrow N at the output of test 50), the degree of voicing of the signal is then determined (test 51).

If the signal is not voiced (arrow N at the output of test 51), the lost blocks are replaced, for example, by audible white noise known as "comfort noise" 52, and the gain 61 of the samples of the blocks thus reconstructed is adjusted. For example, the energy of the reconstructed signal Ss can be controlled, with adaptation of the evolution law, and/or the model parameters can be made to evolve towards a rest signal such as the comfort noise 52.

In a variant of the present invention, only two classes of signals are considered: voiced signals on the one hand, and weakly voiced or unvoiced signals on the other. The advantage of this variant is that unvoiced signals are generated in exactly the same way as in the weakly voiced synthesis. As indicated above, the "pitch period" used for unvoiced signals is a random value, preferably fairly large (for example two hundred samples). In an unvoiced block, the preceding signal is non-harmonic; by applying the processing according to the invention over a sufficiently long period, the signal thus generated is guaranteed to remain non-harmonic. The nature of the signal is thus advantageously preserved, which would not be the case with a randomly generated signal (for example white noise).

If the signal is strongly voiced (arrow O at the output of test 51), the lost blocks are replaced by copying the pitch period T. The pitch period T identified in the last still-valid part of the received signal Se is therefore determined (by any technique 53, which may be known per se). The samples of this pitch period T are then copied into the lost blocks (reference 54). An appropriate gain 61 is then applied to the samples thus replaced (for example to perform an attenuation, or "fading").

In the example described, if the signal is moderately voiced (or, in a less sophisticated but more general variant, if the signal is simply voiced), the method according to the invention is applied (arrow M at the output of test 51 on the degree of voicing).
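The branching of Figure 4 (tests 50 and 51) can be summarised in a few lines. This is a minimal sketch under stated assumptions: test 51 is modelled as two thresholds on a voicing measure in [0, 1], and the names `select_branch`, `STRONG_VOICING` and `WEAK_VOICING` (and their values) are hypothetical; the text does not specify how the degree of voicing is measured or where the class boundaries lie.

```python
from enum import Enum

class Branch(Enum):
    """Processing branches of Figure 4 (labels are illustrative)."""
    NO_LOSS = "keep received block"        # arrow O of test 50
    COMFORT_NOISE = "comfort noise 52"     # arrow N of test 51
    PITCH_COPY = "pitch copy 53/54"        # arrow O of test 51
    PAIR_INVERSION = "inversion 56-59"     # arrow M of test 51

# Hypothetical thresholds on a voicing measure assumed to lie in [0, 1].
STRONG_VOICING = 0.7
WEAK_VOICING = 0.3

def select_branch(block_lost: bool, voicing: float) -> Branch:
    """Return the Figure 4 branch for one block (sketch only)."""
    if not block_lost:              # test 50: nothing to conceal
        return Branch.NO_LOSS
    if voicing < WEAK_VOICING:      # test 51: not voiced
        return Branch.COMFORT_NOISE
    if voicing >= STRONG_VOICING:   # test 51: strongly voiced
        return Branch.PITCH_COPY
    return Branch.PAIR_INVERSION    # test 51: moderately voiced
```

Whichever concealment branch is taken, gain 61 is applied afterwards to the reconstructed samples.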

Referring to Figures 1 and 2, the principle of the invention consists in gathering the samples of the last valid blocks received into groups of at least two samples. In the example of Figures 1 and 2, these samples are grouped in pairs. They could nevertheless be grouped in larger groups, in which case the rules for inverting samples within a group and for taking into account the parity of the number of samples in the pitch period T, described in detail below, would be slightly adapted.

Referring in particular to Figure 2, the groups A, B, C, D of two samples in the last valid blocks received are copied and concatenated after the last samples received. However, in these copied groups, referenced A', B', C', D', the values of the two samples of each group are swapped (or, equivalently, their values are kept and their respective positions are exchanged). Thus group A becomes group A', with its two samples inverted relative to group A (as shown by the two arrows of group A' in Figure 2). Group B becomes group B', with its two samples inverted relative to group B, and so on. The groups A', B', C', D' are advantageously copied and concatenated so as to respect the pitch period T. Thus group A', consisting of the inverted samples of group A, is separated from group A by a number of samples corresponding to the duration of the pitch period T. Similarly, group B' is separated from group B by a duration corresponding to the pitch period T, and so on.

In Figure 2, the inversion of the samples within each group is systematic. In a variant, shown in Figure 1, the occurrence of this inversion can be made random. A probability threshold p can even be set to decide whether or not the samples of a group are inverted. In the example shown in Figure 1, the threshold p is set to 50%, so that only two groups, B' and C', out of four have their samples inverted. The probability threshold p can also be made variable, in particular dependent on a correlation function relating to the pitch period T, as will be seen below.

Returning to the embodiment illustrated in Figure 2, in which the samples of each group are systematically inverted, we obtain, referring now to Figure 3a, a new succession of samples T' whose duration corresponds to the pitch period T, but with the samples inverted in pairs. Figure 3a shows the last samples of the last valid blocks received in the signal Se, which have been stored in a decoder. Here, since the inversion is systematic rather than random with estimation of a correlation, the pitch period T of the voiced signal has been determined (by means known per se) and the last samples 10, 11, ..., 22 of the signal Se, which span the pitch period T, have been collected. The first two samples 10 and 11 are inverted in the signal to be reconstructed, denoted Ss. The third and fourth samples 12 and 13 are also inverted, and so on. This gives a succession T' of samples 11, 10, 13, 12, ... spanning the same duration as the pitch period. If several blocks spanning several pitch periods are missing at decoding, the reconstruction of the signal Ss continues by taking the succession T' and again inverting its samples in pairs, to obtain a new succession T", and so on.
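The systematic inversion of Figures 2 and 3a can be sketched as a recurrence at lag T: each new pair of samples copies the pair located one pitch period earlier, in swapped order (the form written as equations (2) and (3) further on). Because generated samples are appended to the same buffer, the second generated period T" is obtained by re-inverting T', as described above. This is a minimal sketch, assuming the pairs are aligned on the first generated sample and the gain is a constant; the function name `invert_pairs_extend` is hypothetical.

```python
from typing import List, Sequence

def invert_pairs_extend(memory: Sequence[float], T: int, n_out: int,
                        gain: float = 1.0) -> List[float]:
    """Append n_out samples, each new pair being the pair one period back, swapped.

    For each new pair (n, n+1):
        s(n)   = gain * s(n - T + 1)
        s(n+1) = gain * s(n - T)
    Generated samples are appended to the buffer, so once the first period
    T' is complete the recurrence re-inverts T' to build T'', and so on.
    """
    if T < 2 or len(memory) < T:
        raise ValueError("need T >= 2 and at least T samples of memory")
    buf = list(memory)
    start = len(buf)
    for n in range(start, start + n_out, 2):
        buf.append(gain * buf[n - T + 1])  # first sample of the new pair
        buf.append(gain * buf[n - T])      # second sample of the new pair
    return buf[start:start + n_out]        # drop the extra sample if n_out is odd

# Figure 3a: the last pitch period holds samples labelled 10, 11, ..., 22 (T = 13).
t_prime = invert_pairs_extend(list(range(10, 23)), T=13, n_out=13)
print(t_prime[:4])  # [11.0, 10.0, 13.0, 12.0]
```

Since the recurrence reads its own output, a gain below 1 is applied again at every period and therefore acts as a per-period decay.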

In the case of Figure 3a, the number of samples per period T, T', T" is the same odd number (thirteen samples in the example shown), which makes it possible to mix the samples progressively as the signal Ss is reconstructed, and hence to attenuate the over-harmonicity effectively (in other words, the overvoicing of the reconstructed signal).

On the other hand, in the case illustrated in Figure 3b, where the number of samples per period T, T', T" is even (twelve samples in the example shown), inverting the samples of the pitch period T in pairs twice (from period T to period T', then from period T' to period T") yields in T" exactly the same succession as in the pitch period T, which then generates over-harmonicity.
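The parity argument of Figures 3a and 3b can be checked by labelling each sample of the last pitch period with its index and generating two periods with the pairwise-inversion recurrence (unit gain, pairs aligned on the first generated sample). This is a sketch only; `invert_pairs_extend` and `second_period_repeats` are hypothetical helpers.

```python
def invert_pairs_extend(memory, T, n_out):
    """Pairwise inversion at lag T with unit gain (sketch)."""
    buf = list(memory)
    start = len(buf)
    for n in range(start, start + n_out, 2):
        buf.append(buf[n - T + 1])  # s(n)   = s(n - T + 1)
        buf.append(buf[n - T])      # s(n+1) = s(n - T)
    return buf[start:start + n_out]

def second_period_repeats(T):
    """True if T'' (second generated period) equals the original period T."""
    period = list(range(T))                     # label the samples 0 .. T-1
    out = invert_pairs_extend(period, T, 2 * T)
    return out[T:2 * T] == period

for T in (12, 13):
    print(T, second_period_repeats(T))
# prints "12 True" then "13 False"
```

With an even T the pairs stay aligned on the period, so two inversions cancel out; with an odd T the pair boundaries shift by one sample at every period, so the samples keep being mixed.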

This problem can be overcome by changing the number of samples inverted per group (for example by taking an odd number of samples per group).

However, another embodiment is illustrated in Figure 3c. When the pitch period comprises an even number of samples and the inversions operate on an even number of samples per group, this embodiment simply consists in adding an odd number of samples to the pitch period of the signal to be reconstructed. In Figure 3c, the last detected pitch period T comprises twelve samples 31, 32, ..., 42. One sample is then added to the pitch period, giving a period T+1 with an odd number of samples. Thus, in the example shown in Figure 3c, sample 30 becomes the first sample of the memory from which the pairwise sample inversion is applied, as illustrated in Figure 2 (or Figure 3a). This yields a period T' of the reconstructed signal Ss comprising an odd number of samples, to which pairwise inversion is again applied to obtain the period T", which again comprises an odd number of samples, and so on. Note that, this time, the succession of samples 33, 30, 35, 32, 34, ... of the period T" is quite different from the succession of samples 30, 31, 32, 33, ... of the initial pitch period T.

Referring again to Figure 4, which in the example shown implements the embodiment illustrated in Figures 2, 3a and 3c: when the signal Se is moderately voiced (arrow M at the output of test 51), the pitch period T is determined on the last validly received samples of the signal Se (by a technique 56, which may be known per se). It is then detected whether the number of samples in the pitch period T is even or odd. If it is odd (arrow N at the output of test 57), pairwise inversion of the samples is applied directly (step 58), as described above with reference to Figure 3a. If the number of samples in the pitch period T is even (arrow O at the output of test 57), one sample is added to the pitch period T (step 59) and pairwise inversion of the samples is then applied (step 58), in accordance with the processing described above with reference to Figure 3c. A chosen gain 61 may then be applied to the succession of samples thus obtained to form the final reconstructed signal Ss.
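The moderately voiced branch of Figure 4 (test 57, steps 59 and 58, gain 61) can be sketched as below. Assumptions: the pitch period T has already been estimated (technique 56 is not shown), the memory holds at least T + 1 samples when T is even, pairs are aligned on the first generated sample, and gain 61 is a constant scale factor; `conceal_moderately_voiced` and `gain_61` are hypothetical names.

```python
from typing import List, Sequence

def invert_pairs_extend(memory: Sequence[float], T: int, n_out: int) -> List[float]:
    """Step 58: pairwise inversion at lag T (unit gain)."""
    buf = list(memory)
    start = len(buf)
    for n in range(start, start + n_out, 2):
        buf.append(buf[n - T + 1])   # s(n)   = s(n - T + 1)
        buf.append(buf[n - T])       # s(n+1) = s(n - T)
    return buf[start:start + n_out]

def conceal_moderately_voiced(memory: Sequence[float], T: int, n_out: int,
                              gain_61: float = 1.0) -> List[float]:
    """Moderately voiced branch of Figure 4 (arrow M of test 51), sketch only."""
    if T % 2 == 0:   # test 57: even number of samples in the period (arrow O)
        T += 1       # step 59: add one sample to the pitch period
    if T < 2 or len(memory) < T:
        raise ValueError("memory must hold at least T (+1 if T was even) samples")
    excitation = invert_pairs_extend(memory, T, n_out)   # step 58
    return [gain_61 * s for s in excitation]             # gain 61

# Figure 3c: detected T = 12 (samples 31..42), preceded in memory by sample 30.
out = conceal_moderately_voiced(list(range(30, 43)), T=12, n_out=26)
print(out[13:20])  # [42.0, 33.0, 30.0, 35.0, 32.0, 37.0, 34.0]
```

In this sketch the second generated period begins 42, 33, 30, 35, 32, 37, 34, ...; the exact ordering depends on how the pair boundaries are aligned across periods, which is an assumption here, but in every case T" no longer reproduces 30, 31, 32, ...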

As indicated above with reference to Figure 4, the pitch period is first calculated from one or a few previous frames. The reduced-harmonicity excitation is then generated as illustrated in Figure 2, with systematic inversion. However, in the variant illustrated in Figure 1, it can be generated with random inversion. This irregular inversion of the samples of the voiced excitation advantageously attenuates the over-harmonicity. This advantageous embodiment is detailed below.

Usually, with simple copying of the pitch period, the voiced excitation is calculated according to a formula of the type:

$$s(n) = g_{ltp} \cdot s(n - T) \qquad (1)$$

where T is the estimated pitch period and g_ltp is a chosen LTP gain.
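For comparison, equation (1), plain pitch-period copying as in the strongly voiced branch, can be sketched as a recurrence that reads its own output. A minimal sketch; `pitch_copy_extend` is a hypothetical name.

```python
from typing import List, Sequence

def pitch_copy_extend(memory: Sequence[float], T: int, n_out: int,
                      g_ltp: float) -> List[float]:
    """Append n_out samples with s(n) = g_ltp * s(n - T), equation (1)."""
    if T < 1 or len(memory) < T:
        raise ValueError("need T >= 1 and at least T samples of memory")
    buf = list(memory)
    start = len(buf)
    for n in range(start, start + n_out):
        buf.append(g_ltp * buf[n - T])   # copy the sample one period back
    return buf[start:]

print(pitch_copy_extend([1.0, 2.0, 3.0], T=3, n_out=6, g_ltp=0.5))
# [0.5, 1.0, 1.5, 0.25, 0.5, 0.75]
```

Every generated period is an exact scaled replica of the previous one: this strict periodicity is what produces overvoicing. With g_ltp < 1, each successive period is attenuated by a further factor g_ltp.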

In one embodiment of the invention, the voiced excitation is calculated in groups of two samples with random inversion, according to the following processing.

First, a random number x is generated in the interval [0, 1]. Then, depending on the value of x:

if x < p, s(n) and s(n+1) are calculated from equation (1);
if x ≥ p, s(n) and s(n+1) are calculated according to the following equations (2) and (3):

$$s(n) = g_{ltp} \cdot s(n - T + 1) \qquad (2)$$

$$s(n + 1) = g_{ltp} \cdot s(n - T) \qquad (3)$$

The threshold p thus governs the inversion of the two samples s(n) and s(n+1): since x is drawn uniformly, the pair is copied in order (equation (1)) with probability p and inverted (equations (2) and (3)) with probability 1 - p. For example, p can be set to 50%.
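The random-inversion rule (equation (1) when x < p, equations (2) and (3) otherwise) can be sketched as follows. Assumptions: x is drawn with Python's `random.Random.random()`, which returns values in [0, 1) rather than the closed interval [0, 1]; one draw is made per pair of output samples; `excitation_random_inversion` is a hypothetical name. Passing a seeded `random.Random` makes the output reproducible.

```python
import random
from typing import List, Optional, Sequence

def excitation_random_inversion(memory: Sequence[float], T: int, n_out: int,
                                p: float, g_ltp: float = 1.0,
                                rng: Optional[random.Random] = None) -> List[float]:
    """Voiced excitation built pair by pair, inverting each pair when x >= p."""
    if T < 2 or len(memory) < T:
        raise ValueError("need T >= 2 and at least T samples of memory")
    rng = rng if rng is not None else random.Random()
    buf = list(memory)
    start = len(buf)
    for n in range(start, start + n_out, 2):
        x = rng.random()                        # uniform draw in [0, 1)
        if x < p:                               # equation (1) for both samples
            buf.append(g_ltp * buf[n - T])
            buf.append(g_ltp * buf[n + 1 - T])
        else:                                   # equations (2) and (3)
            buf.append(g_ltp * buf[n - T + 1])
            buf.append(g_ltp * buf[n - T])
    return buf[start:start + n_out]

excitation = excitation_random_inversion(list(range(13)), T=13, n_out=26,
                                         p=0.5, rng=random.Random(0))
```

The two extremes recover the earlier schemes: p = 1 gives plain pitch copying, and p = 0 (or any p ≤ 0) gives the systematic inversion of Figure 2.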

In an advantageous variant, a variable probability can also be chosen, for example of the form:

$$p = \mathrm{corr} \qquad (4)$$

where the variable corr is the maximum value of the correlation function over the pitch period, denoted Corr(T). For a pitch period T, the correlation function Corr(T) is calculated using only the last 2·T_m samples of the stored signal (T_m denoting the largest pitch period considered):

$$\mathrm{Corr}(T) = \frac{2\sum_{i=L_{mem}-2T_m+T}^{L_{mem}-1} m_i\, m_{i-T}}{\sum_{i=L_{mem}-2T_m}^{L_{mem}-1} m_i^{2} + \sum_{i=L_{mem}-2T_m+T}^{L_{mem}-1-T} m_i^{2}} \qquad (5)$$

where m_0, ..., m_{L_mem-1} are the last samples of the previously decoded signal, which are still available in the decoder memory.

From this formula, it will be understood that the length L_mem of this memory (in number of stored samples) must be at least twice the maximum pitch period (in number of samples). To take account of the lowest-pitched voices (lowest fundamental frequency of the order of 50 Hz), the number of samples to be stored can be of the order of 300 at a low, narrowband sampling rate, and more than 300 at higher sampling rates.
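The memory-size rule is simple arithmetic: with a sampling rate F_e and a lowest fundamental of about 50 Hz, the longest pitch period is F_e/50 samples, and L_mem must be at least twice that. The 8 kHz and 16 kHz rates below are illustrative assumptions, and `min_memory_length` is a hypothetical helper.

```python
import math

def min_memory_length(fs_hz: float, f0_min_hz: float = 50.0) -> int:
    """Smallest L_mem equal to twice the longest pitch period, in samples."""
    max_pitch_period = math.ceil(fs_hz / f0_min_hz)   # T_m, in samples
    return 2 * max_pitch_period

for fs in (8000, 16000):
    print(fs, min_memory_length(fs))
# 8000 320
# 16000 640
```

At 8 kHz this gives 320 samples, consistent with the figure of about 300 quoted above for narrowband.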

The correlation function Corr(T) given by formula (5) reaches its maximum when the variable T corresponds to the pitch period T₀, and this maximum value gives an indication of the degree of voicing. Typically, if this maximum is very close to 1, the signal is strongly voiced; if it is close to 0, the signal is not voiced.

It will thus be understood that, in this embodiment, the pitch period does not need to be determined beforehand in order to build the groups of samples to be inverted. In particular, the pitch period T₀ can be determined jointly with the formation of the groups according to the invention, by applying formula (5) above.
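Formula (5) and the joint pitch estimation can be sketched as below: Corr(T) is evaluated for every candidate lag, the maximising lag is taken as T₀, and p = Corr(T₀). Assumptions: `corr` and `estimate_pitch_and_p` are hypothetical names, the candidate lags run from a chosen `T_min` up to `T_m`, and no protection against pitch multiples (a periodic signal also correlates fully at 2T₀, 3T₀, ...) is included, which a practical estimator would need.

```python
import math
from typing import Sequence, Tuple

def corr(m: Sequence[float], T: int, T_m: int) -> float:
    """Normalised correlation of formula (5) on the last 2*T_m samples of m."""
    L = len(m)
    if not 1 <= T <= T_m or L < 2 * T_m:
        raise ValueError("need 1 <= T <= T_m and len(m) >= 2 * T_m")
    lo = L - 2 * T_m
    num = 2.0 * sum(m[i] * m[i - T] for i in range(lo + T, L))
    den = (sum(m[i] ** 2 for i in range(lo, L))
           + sum(m[i] ** 2 for i in range(lo + T, L - T)))
    return num / den if den > 0.0 else 0.0

def estimate_pitch_and_p(m: Sequence[float], T_min: int, T_m: int) -> Tuple[int, float]:
    """Pick T0 maximising Corr(T) over [T_min, T_m]; return (T0, p = Corr(T0))."""
    T0 = max(range(T_min, T_m + 1), key=lambda T: corr(m, T, T_m))
    return T0, corr(m, T0, T_m)

# Example: a 200 Hz sinusoid sampled at 8 kHz has a period of 40 samples.
m = [math.sin(2 * math.pi * n / 40) for n in range(400)]
T0, p = estimate_pitch_and_p(m, T_min=20, T_m=160)
```

For this pure sinusoid, T0 is 40 or one of its multiples and p is 1 up to rounding. Corr(T) can also be negative; with p ≤ 0, every pair is inverted.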

If the signal is strongly voiced, the probability p will be very high, and the voicing will be preserved by computing according to formula (1). If, on the other hand, the voicing of the signal Se is not very marked, the probability p will be lower and equations (2) and (3) will advantageously be used.

Of course, other correlation calculations can also be used.

For example, it is also possible to calculate the harmonic excitation according to predefined classes. For strongly voiced classes, equation (1) is preferably used. For moderately or weakly voiced classes, equations (2) and (3) are preferably used. For unvoiced classes, no harmonic excitation is generated and the excitation can then be generated from white noise. However, in the variant described above, equations (2) and (3) are also used for unvoiced classes, with a sufficiently large arbitrary pitch period.

More generally, the present invention is not limited to the embodiments described above by way of example; it extends to other variants.

In the embodiment described in detail above, excitation generation in CELP predictive synthesis coding aims to avoid overvoicing in the context of concealing frame transmission errors. The principles of the invention can nevertheless also be used for bandwidth extension. The generation of a wideband excitation can then be used in a bandwidth-extension system (with or without transmission of side information) based on a CELP-type model (or subband CELP). The high-band excitation can then be calculated as described above, which limits its over-harmonicity.

Moreover, the invention is particularly suited to the transmission of signals over frame-based or packet-based networks, for example "voice over IP" packets (IP standing for "Internet Protocol"), so as to provide acceptable quality when such packets are lost, while keeping complexity limited.

Of course, the samples can also be inverted within groups of more than two samples.

Furthermore, the description above covers generating a block to replace an invalid block from the samples of a valid block preceding the invalid block. In a variant, a valid block following the invalid block can be used instead to synthesize the invalid block (a posteriori synthesis). This embodiment can be advantageous in particular for synthesizing several successive invalid blocks and, in particular, for synthesizing:

invalid blocks immediately following previous valid blocks, from those previous blocks,
then invalid blocks immediately preceding subsequent valid blocks, from those subsequent blocks.

The present invention also relates to a computer program intended to be stored in the memory of a device for synthesizing a digital audio signal. This program comprises instructions for implementing the method according to the invention when it is executed by a processor of such a synthesis device. Figure 4, described above, can moreover illustrate a flowchart of such a computer program.

The present invention further provides a device for synthesizing a digital audio signal consisting of a succession of blocks. This device may also include a memory storing the aforementioned computer program. With reference to Figure 5, this device SYN comprises:

an input E for receiving blocks of the signal Se that precede at least one current block to be synthesized, and
an output S for delivering the synthesized signal Ss, which includes at least this current block.

The synthesis device SYN according to the invention comprises means such as a working memory MEM (or a memory storing the aforementioned computer program) and a processor PROC cooperating with this memory MEM. These means implement the method according to the invention and thereby synthesize the current block from at least one of the preceding blocks of the signal Se.

The present invention also provides an apparatus for receiving a digital audio signal consisting of a succession of blocks, such as a decoder of such a signal. Still referring to Figure 5, this apparatus may advantageously comprise an invalid-block detector DET, together with the device SYN according to the invention, for synthesizing the invalid blocks detected by the detector DET.
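The receiving apparatus of Figure 5 can be modelled as a loop that routes each incoming block either straight to the output or through the synthesizer. This sketch is an illustration under stated assumptions and not the patent's design:

- `is_invalid` stands in for the detector DET.
- `synthesize` stands in for the device SYN.
- Lost blocks are represented as `None`.
- Silence is used as the history until the first valid block arrives.

All of these names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Block = List[float]


def receive(
    blocks: Iterable[Optional[Block]],
    is_invalid: Callable[[Optional[Block]], bool],
    synthesize: Callable[[Block, int], Block],
    block_size: int,
) -> Iterator[Block]:
    """Turn the received stream Se into the output stream Ss.

    `is_invalid` plays the role of DET and `synthesize` the role of SYN;
    both are hypothetical hooks, not interfaces defined in the patent.
    """
    # Assumption: the history is silence until the first valid block arrives.
    last_valid: Block = [0.0] * block_size
    for block in blocks:
        if is_invalid(block):
            # SYN builds a replacement from the last valid block received.
            yield synthesize(last_valid, block_size)
        else:
            last_valid = block  # only genuinely valid blocks update the history
            yield block
```

Only valid blocks update `last_valid`, which matches the claims: replacements are generated from the samples of a valid block preceding the invalid one. A real decoder might also feed replacements back into its internal state.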

Claims

Method for synthesizing a digital audio signal represented by consecutive blocks of samples, in which, on receipt of such a signal, to replace at least one invalid block, a replacement block is generated on the basis of the samples of at least one valid block preceding the invalid block,
characterized in that it comprises the following steps:
a) estimating a correlation making it possible to detect a period, if any, corresponding to the inverse of a fundamental frequency of a tone in the digital audio signal, and selecting, in at least the last valid block preceding the invalid block, a succession of a number (T) of samples that depends on this estimation,

b) fragmenting the succession of samples into groups (A,B,C,D) of two samples, and, in at least some of the groups, inverting or not inverting the two samples as a function of said estimation of correlation,

c) re-concatenating the groups (A',B',C',D'), the samples of at least some of which were inverted in step b), so as to form at least a part (T') of the replacement block, and

d) if said part obtained in step c) does not fill the whole of the replacement block, copying said part (T') into the replacement block and re-applying steps b) and c) to said copied part.
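Steps b) to d) of Claim 1 can be sketched in Python as follows. The sketch assumes that the inversion decision is random with a probability `p`, as in Claims 5 and 6. It also assumes that the number T has already been selected in step a). The helpers `pair_swap` and `replacement_block` and the `seed` parameter are hypothetical.

```python
import random
from typing import List, Sequence


def pair_swap(segment: Sequence[float], p: float, rng: random.Random) -> List[float]:
    """Steps b) and c): split `segment` into consecutive pairs, swap each pair
    with probability p, and re-concatenate. With an odd length, the last
    sample stays ungrouped and in place."""
    out = list(segment)
    for i in range(0, len(out) - 1, 2):
        if rng.random() < p:  # random() is in [0, 1), so p=1.0 always swaps
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


def replacement_block(history: Sequence[float], T: int, block_len: int,
                      p: float = 0.5, seed: int = 0) -> List[float]:
    """Build a replacement block from the last T samples of the valid history."""
    if not 1 <= T <= len(history):
        raise ValueError("T must be between 1 and len(history)")
    rng = random.Random(seed)
    part = pair_swap(history[-T:], p, rng)  # steps b) and c) on the succession
    block: List[float] = []
    while len(block) < block_len:
        block.extend(part)                  # step d): copy the part into the block
        part = pair_swap(part, p, rng)      # re-apply b) and c) to get the next copy
    return block[:block_len]
```

With `p=0.0` the sketch degenerates to plain repetition of the last T samples, which is the over-harmonic excitation that the inversion is meant to break. Any `p > 0` reorders samples within pairs in each copy while keeping the same samples.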
Method according to Claim 1, in which the digital audio signal is a speech signal, characterized in that the estimation of correlation comprises the detection of a degree of voicing (51) in the speech signal and steps b) to d) are applied if the signal is weakly voiced or unvoiced.
Method according to one of the preceding claims, characterized in that, to carry out step a):
a1) a search is undertaken for a correlation so as to detect a period, if any, corresponding to the inverse of a fundamental frequency of a tone in the digital audio signal (56), and

a2) said number of samples selected in step a) corresponds:
• to the number of samples contained in a period corresponding to the inverse of a fundamental frequency of the tone if the correlation search detects said period, and

• to a predetermined fixed number of samples, otherwise.
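Steps a1) and a2) of Claim 3 can be illustrated with a normalized autocorrelation search. This is a generic pitch-search sketch, not the patent's estimator. The lag range, the decision `threshold` and the fallback `fixed_len` are assumed parameters, and `select_length` is a hypothetical name.

```python
import math
from typing import Sequence


def select_length(x: Sequence[float], min_lag: int, max_lag: int,
                  threshold: float, fixed_len: int) -> int:
    """Return the lag with the highest normalized autocorrelation if that
    correlation reaches `threshold` (period detected, first case of a2);
    otherwise return the predetermined `fixed_len` (second case of a2)."""
    n = len(x)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        a, b = x[lag:], x[:n - lag]            # signal and its lagged copy
        num = sum(u * v for u, v in zip(a, b))
        den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
        corr = num / den if den > 0 else 0.0
        if corr > best_corr:                   # strict: ties keep the shorter lag
            best_lag, best_corr = lag, corr
    return best_lag if best_lag > 0 and best_corr >= threshold else fixed_len
```

At F_e = 8 kHz, a fundamental between 60 Hz and 600 Hz corresponds to a period of roughly 13 to 133 samples. That range suggests values for `min_lag` and `max_lag`.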
Method according to Claim 3, characterized in that, if the number of samples contained in the period of the detected tone is even, an odd number of samples (30) is added to or subtracted from the samples of said period to form the selection of step a), one sample of the succession (T') thus formed in step a) not being grouped, in step b), with any other sample of said succession (T').
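The parity adjustment of Claim 4 can be shown on sample indices. The sketch assumes a correction of ±1 sample, since the abstract mentions a correction of plus or minus one sample. It also assumes that the ungrouped sample is the last one, because the claim does not say which sample stays ungrouped. The name `adjust_and_group` is hypothetical.

```python
from typing import List, Tuple


def adjust_and_group(T: int, correction: int = 1) -> Tuple[int, List[Tuple[int, int]], int]:
    """Claim 4 sketch: make an even period length odd, then pair sample indices.

    Returns (length of the succession T', index pairs used in step b),
    index of the single sample left ungrouped).
    """
    if correction % 2 == 0:            # Python's % gives 1 for -1, so -1 is accepted
        raise ValueError("correction must be an odd number of samples")
    length = T + correction if T % 2 == 0 else T
    pairs = [(i, i + 1) for i in range(0, length - 1, 2)]
    return length, pairs, length - 1   # odd length: the last index has no partner
```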
Method according to one of the preceding claims, characterized in that said predetermined rules require the occurrences of inversion of the samples in each group to be random, and said rules fix a probability threshold (p) for inverting or not inverting the samples of a group.
Method according to Claim 5, characterized in that the probability threshold (p) is variable and depends on the estimation of correlation.
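Claims 2, 5 and 6 tie the inversion to the degree of voicing. One possible mapping is sketched below, and it is entirely an assumption. There is no inversion when the correlation indicates a strongly voiced signal. Otherwise, the swap probability rises linearly toward `p_max` as the correlation falls. The names `inversion_probability`, `voiced_threshold` and `p_max`, and their default values, are hypothetical.

```python
def inversion_probability(corr: float, voiced_threshold: float = 0.7,
                          p_max: float = 0.5) -> float:
    """Map a normalized correlation to a pair-swap probability p.

    corr >= voiced_threshold: strongly voiced, p = 0 (steps b) to d) skipped).
    Otherwise p grows linearly as corr decreases, reaching p_max at corr = 0.
    """
    corr = min(max(corr, 0.0), 1.0)    # clip to [0, 1]
    if corr >= voiced_threshold:
        return 0.0
    return p_max * (1.0 - corr / voiced_threshold)
```

Capping `p` at 0.5 is a design choice. At that value, each pair is equally likely to be swapped or kept, which maximizes the randomness of the inversion pattern.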
Computer program intended to be stored in memory of a device for synthesizing a digital audio signal, characterized in that it comprises instructions for the implementation of the method according to one of Claims 1 to 6 when it is executed by a processor of such a synthesizing device.
Device for synthesizing a digital audio signal consisting of a succession of blocks, comprising:
- an input for receiving blocks of the signal (Se) preceding at least one current block to be synthesized, and
- an output for delivering the synthesized signal (Ss), comprising at least said current block,
characterized in that it comprises means (MEM, PROC) adapted for implementing the method according to one of Claims 1 to 6, for synthesizing the current block on the basis of at least one of said preceding blocks.
Apparatus for receiving a digital audio signal consisting of a succession of blocks, comprising a detector of invalid blocks (DET),
characterized in that it furthermore comprises a device (SYN) according to Claim 8, for synthesizing replacement blocks for the invalid blocks.