FR2830970A1

FR2830970A1 - Processing of erroneous speech-signal samples on a telephone transmission channel: errors are identified, the preceding and succeeding valid frames are found, blocks of samples are formed according to the speech signal period, and part of the blocks forms a synthesised frame.

Info

Publication number: FR2830970A1
Application number: FR0113181A
Authority: FR
Inventors: Franck Bouteille
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2001-10-12
Filing date: 2001-10-12
Publication date: 2003-04-18
Anticipated expiration: 2021-10-12
Also published as: FR2830970B1

Abstract

The error correction system identifies an erroneous frame, a first valid frame (t-1) preceding the erroneous frame and a second valid frame (t+1) following it. A succession of synthesised blocks is formed from the preceding and succeeding frames, each block having a number of samples (TBL) calculated from a fundamental period of the speech signal in the valid frames (P0, P1). Part of the block samples is used to form a synthesised frame (tr).

Description

Method and device for synthesizing substitution frames in a succession of frames representing a speech signal

The invention relates to the field of processing speech signals received as samples.

The coding of a speech signal produces a bit stream which is conveyed over a transmission channel. At the other end of the channel, a decoder restores the speech signal as a succession of frames of samples, ready for use.

Several types of coders can be distinguished, among them:
- so-called "time-domain" coders, which compress the samples of the digitized signal, such as the PCM or ADPCM coders described in particular in: W. R. Daumer, P. Mermelstein, X. Maitre and I. Tokizawa, "Overview of the ADPCM coding algorithm", Proc. of GLOBECOM 1984, pp. 23.1.1-23.1.4, and X. Maitre, "7 kHz audio coding within 64 kbit/s", IEEE Journal on Selected Areas on Communications, Vol. 6-2, February 1988, pp. 283-298,
- and parametric coders, which analyze successive frames of samples of the signal to be coded in order to extract, for each of these frames, a number of parameters which are then coded and transmitted.

In this category, the following are known:
- vocoders, described in particular in: T. E. Tremain, "The government standard linear predictive coding algorithm: LPC 10", Speech Technology, April 1982, pp. 40-49,
- IMBE coders, described in particular in: J. C. Hardwick and J. S. Lim, "The application of the IMBE speech coder to mobile communications", Proc. of ICASSP conference, 1991, pp. 249-252,
- and so-called "transform" coders, described in particular in: K. H. Brandenburg and M. Bossi, "Overview of MPEG audio: current and future standards for low-bit-rate audio coding", Journal of Audio Eng. Soc., Vol. 45-1/2, January/February 1997, pp. 4-21.

Coders are also known which supplement the coding of the representative parameters of parametric coders with the coding of a residual time-domain waveform.

This category includes predictive coders, and in particular the family of analysis-by-synthesis coders such as:
- the RPE-LTP coder, described in particular in: K. Hellwig, P. Vary, D. Massaloux, J. P. Petit, C. Goland and M. Rosso, "Speech codec for the European mobile radio system", GLOBECOM conference, 1989, pp. 1065-1069,
- and the CELP coder, described in particular in: M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates", Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 3, pp. 937-940, March 1985.

For all these coders, the coded values are then converted into a bit stream which is transmitted over a transmission channel. Depending on the quality of this channel and the type of transport, disturbances may affect the transmitted signal and produce errors in the bit stream received by the decoder. Errors may occur in isolation in the bit stream, or in bursts. In the latter case, a packet of binary data, corresponding to a complete portion of the signal, contains errors or is not received. Such problems arise, for example, in frame-based transmission over mobile networks. They also arise in transmission over packet networks, in particular Internet-type networks.

The term "erroneous frame" will denote a frame or packet which contains errors on reception, which is not received, or which is received too late to be processed.
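The three cases covered by this definition can be sketched as a simple predicate. This is only an illustration: the `ReceivedFrame` container, its field names, and the use of an integrity flag and a playout deadline are assumptions, not elements of the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReceivedFrame:
    """Hypothetical container for one received frame (illustrative names)."""
    payload: Optional[bytes]  # None when the packet never arrived
    integrity_ok: bool        # result of an assumed error-detection check
    arrival_ms: float         # arrival time, in milliseconds
    deadline_ms: float        # latest time at which the frame is still usable


def is_erroneous(frame: ReceivedFrame) -> bool:
    """Return True if the frame must be treated as erroneous."""
    if frame.payload is None:       # frame or packet not received
        return True
    if not frame.integrity_ok:      # received, but with errors
        return True
    return frame.arrival_ms > frame.deadline_ms  # received too late
```

A frame flagged this way would then be handed to a concealment procedure rather than decoded normally.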

When the transmission system or the modules responsible for reception make it possible to identify erroneous frames, by detecting that the received data contain errors (for example on mobile networks) or that a block of data has not been received (as in packet transmission systems, for example), error concealment procedures are implemented. These procedures allow the decoder to extrapolate the samples of the missing signal from the signals and data available from the frames preceding the erased areas.

Such techniques have been implemented mainly for parametric coders, using techniques for synthesizing erased frames. They greatly limit the subjective degradation of the signal perceived at the decoder in the presence of erased frames. Most of the processing methods developed rely on the technique used for the coder and the decoder, and in fact constitute an extension of the decoder.

Most predictive coding algorithms offer techniques for recovering erased frames, described in particular in:
- GSM Recommendation 06.11, "Substitution and muting of lost frames for full rate speech traffic channels", ETSI/TC SMG, ver. 3.0.1, February 1992,
- ITU-T Annex A to Recommendation G.723.1, "Silence compression scheme for dual rate speech coder for multimedia communications transmitting at 5.3 & 6.3 kbit/s",
- R. Salami, C. Laflamme, J. P. Adoul, A. Kataoka, S. Hayashi, T. Moriya, C. Lamblin, D. Massaloux, S. Proust, P. Kroon and Y. Shoham, "Design and description of CS-ACELP: a toll quality 8 kb/s speech coder", IEEE Trans. on Speech and Audio Processing, Vol. 6-2, March 1998, pp. 116-130,
- T. Honkanen, J. Vainio, P. Kapanen, P. Haavisto, R. Salami, C. Laflamme and J. P. Adoul, "GSM enhanced full rate speech codec", Proc. of ICASSP conference, 1997, pp. 771-774,
- R. V. Cox, "An improved frame erasure concealment method for ITU-T Rec. G728", Delayed contribution D.107 (WP 3/16), ITU-T, January 1998,
- patents US-5574825, US-5550543, US-5615298, US-5717822 and published application EP0673015, and
- C. R. Watkins, J. H. Chen, "Improving 16 kb/s G.728 LD-CELP Speech Coder for Frame Erasure Channels", Proc. of ICASSP conference, 1995, pp. 241-244.

The decoder is informed of the occurrence of an erased (or erroneous) frame, for example, in mobile radio systems, by transmission of the frame erasure information from the channel decoder. Devices that recover erased frames by synthesis aim to extrapolate the parameters of the erased frame from at least the last preceding frame considered valid. Some parameters manipulated or coded by predictive coders exhibit strong inter-frame correlation (this is the case of the LPC parameters, which represent the spectral envelope, and of the long-term prediction parameters for voiced sounds, for example). Because of this correlation, it is much more advantageous to reuse the parameters of the last valid frame to synthesize the erased frame than to use erroneous or random parameters.

For the CELP coding algorithm, the parameters of the erased frame are generally obtained as follows.

The LPC filter is obtained from the LPC parameters of the last valid frame, either by copying the parameters or by introducing a certain damping (as in the G.723.1 coder, described in: ITU-T Annex A to Recommendation G.723.1, "Silence compression scheme for dual rate speech coder for multimedia communications transmitting at 5.3 & 6.3 kbit/s").
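As a rough illustration of the two options just mentioned (plain copy, or copy with damping), the sketch below scales the k-th coefficient by a power of a damping factor. The function name `conceal_lpc` and this particular form of damping are assumptions made for the example; the text does not say how the cited coder applies its damping.

```python
def conceal_lpc(last_valid_lpc, damping=1.0):
    """Derive LPC coefficients for an erased frame from the last valid frame.

    damping == 1.0 copies the coefficients unchanged. damping < 1.0 scales
    the k-th coefficient (k = 1, 2, ...) by damping**k, which is an assumed
    form of damping chosen purely for illustration.
    """
    if not 0.0 < damping <= 1.0:
        raise ValueError("damping must be in (0, 1]")
    return [a * damping ** (k + 1) for k, a in enumerate(last_valid_lpc)]
```

With the default `damping=1.0` the function is a straight copy of the last valid parameters.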

Voicing is also detected in order to determine the degree of harmonicity of the signal at the erased frame, as described in particular in: R. Salami, C. Laflamme, J. P. Adoul, A. Kataoka, S. Hayashi, T. Moriya, C. Lamblin, D. Massaloux, S. Proust, P. Kroon and Y. Shoham, "Design and description of CS-ACELP: a toll quality 8 kb/s speech coder", IEEE Trans. on Speech and Audio Processing, Vol. 6-2, March 1998, pp. 116-130.

Thus, for an unvoiced signal, an excitation signal is generated randomly. In the technique described in the preceding reference, a codeword is drawn at random, with the gain of the past excitation slightly damped. Another technique, in which a random selection is made from the past excitation, is described in: J. H. Chen, R. V. Cox, Y. C. Lin, N. Jayant and M. J. Melchner, "A low-delay CELP coder for the CCITT 16 kb/s speech coding standard", IEEE Journal on Selected Areas on Communications, Vol. 10-5, June 1992, pp. 830-849.
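The first of these techniques (random codeword, slightly damped gain) might look like the following minimal sketch. The codebook layout, the function name `conceal_unvoiced` and the default attenuation value are all hypothetical.

```python
import random


def conceal_unvoiced(codebook, last_gain, attenuation=0.9, rng=None):
    """Build a substitute excitation for an unvoiced erased frame.

    codebook    : list of codewords, each a list of samples (assumed layout)
    last_gain   : excitation gain of the last valid frame
    attenuation : assumed damping factor applied to that gain
    Returns the excitation samples and the damped gain.
    """
    if rng is None:
        rng = random.Random()
    codeword = rng.choice(codebook)   # codeword drawn at random
    gain = attenuation * last_gain    # slightly damped past gain
    return [gain * s for s in codeword], gain
```

Returning the damped gain lets a caller feed it back in if several consecutive frames are erased, so that the output fades progressively.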

Other techniques make use of the transmitted codes, even if they are totally erroneous, as in: T. Honkanen, J. Vainio, P. Kapanen, P. Haavisto, R. Salami, C. Laflamme and J. P. Adoul, "GSM enhanced full rate speech codec", Proc. of ICASSP conference, 1997, pp. 771-774.

For a voiced signal, the LTP delay is generally the delay calculated for the preceding frame, possibly with a slight "jitter" (as in R. Salami, C. Laflamme, J. P. Adoul, A. Kataoka, S. Hayashi, T. Moriya, C. Lamblin, D. Massaloux, S. Proust, P. Kroon and Y. Shoham, "Design and description of CS-ACELP: a toll quality 8 kb/s speech coder", IEEE Trans. on Speech and Audio Processing, Vol. 6-2, March 1998, pp. 116-130). The LTP gain is taken very close to 1 or equal to 1. The excitation signal is limited to the long-term prediction performed on the past excitation. However, these techniques, which rely on a model of the signal preceding the loss, generally do not allow the model to evolve over one or more consecutive lost frames up to the next valid frame.
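The voiced case described above amounts to extending the past excitation periodically, one LTP delay at a time. The sketch below assumes an integer delay and omits the optional jitter; the function and parameter names are illustrative only.

```python
def conceal_voiced(past_excitation, ltp_delay, frame_len, ltp_gain=1.0):
    """Extrapolate the excitation of a voiced erased frame.

    Each new sample is ltp_gain times the sample located ltp_delay samples
    earlier, i.e. a long-term prediction run on the past excitation alone.
    """
    if ltp_delay <= 0 or ltp_delay > len(past_excitation):
        raise ValueError("ltp_delay must lie within the stored excitation")
    buf = list(past_excitation)
    start = len(buf)
    for _ in range(frame_len):
        buf.append(ltp_gain * buf[-ltp_delay])  # predict from one delay back
    return buf[start:]
```

Because only past samples feed the prediction, nothing in this scheme can steer the signal towards the next valid frame, which is the limitation noted at the end of the paragraph.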

In all the examples cited above, the methods for recovering erased frames are closely tied to the decoder and use modules of that decoder, such as the signal synthesis module. They also use intermediate signals available within the decoder, such as the past excitation signal stored while processing the valid frames preceding the erased frames.

The methods used to conceal errors caused by packets lost during the transport of data coded by time-domain coders generally rely on waveform substitution techniques such as those presented in particular in:
- D. J. Goodman, G. B. Lockhart, O. J. Wasem, W. C. Wong, "Waveform Substitution Techniques for Recovering Missing Speech Segments in Packet Voice Communications", IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-34, December 1986, pp. 1440-1448,
- N. Erdöl, C. Castelluccia, A. Zilouchian, "Recovery of Missing Speech Packets Using the Short-Time Energy and Zero-Crossing Measurements", IEEE Trans. on Speech and Audio Processing, Vol. 1-3, July 1993, pp. 295-303, and
- AT&T (D. A. Kapilow, R. V. Cox), "A high quality low-complexity algorithm for frame erasure concealment (FEC) with G.711", Delayed Contribution D.249 (WP 3/16), ITU, May 1999.

These techniques reconstruct the signal by selecting portions of the signal decoded before the lost portion, and do not use synthesis models. Smoothing techniques are also applied to avoid the artifacts produced by concatenating the different signals.
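One common way to smooth the junction between two concatenated segments is a linear cross-fade over an overlap region. The sketch below is offered only as an example of such smoothing; the text does not specify which smoothing the cited techniques use, and the function name is hypothetical.

```python
def crossfade(tail, head):
    """Blend two equal-length overlapping segments with a linear cross-fade.

    tail : end of the signal that is fading out
    head : start of the signal that is fading in
    """
    n = len(tail)
    if n != len(head):
        raise ValueError("segments must have the same length")
    if n == 1:
        return [0.5 * (tail[0] + head[0])]
    # Weight of `tail` goes from 1 down to 0, weight of `head` from 0 up to 1.
    return [((n - 1 - i) * t + i * h) / (n - 1)
            for i, (t, h) in enumerate(zip(tail, head))]
```

The output starts exactly on the outgoing segment and ends exactly on the incoming one, which avoids a step discontinuity at the splice.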

Other techniques use knowledge of the frame following the lost frame, which provides both a priori knowledge (preceding frame) and a posteriori knowledge (following frame) of the signal. This information can therefore be used to reconstruct the missing portion of the signal, as in the DSPM ("Double Side Pattern Matching") technique described in particular in: D. J. Goodman, G. B. Lockhart, O. J. Wasem, W. C. Wong, "Waveform Substitution Techniques for Recovering Missing Speech Segments in Packet Voice Communications", IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-34, December 1986, pp. 1440-1448.

This is also the case of the "DSPS" (Double Side Periodic Substitution) techniques, which copy fundamental periods of the speech signal ("pitch replication"). The fundamental period of the speech signal is the inverse of its fundamental frequency and is an important parameter, since it represents the pitch of the voice.
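To make the notion of fundamental period concrete, the sketch below estimates it, in samples, as the lag that maximizes the normalized autocorrelation of a segment. This estimator is a generic textbook approach chosen for illustration, not a method taken from the cited references; the function name and lag bounds are assumptions.

```python
import math


def estimate_pitch_period(samples, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        a = samples[lag:]                  # segment shifted by `lag`
        b = samples[:len(samples) - lag]   # reference segment
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

For a 200 Hz tone sampled at 8 kHz (assumed figures), the fundamental period is 1/200 s, that is 8000/200 = 40 samples, and that is the lag at which the tone best matches a shifted copy of itself.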

Such techniques are described in particular in:
- J. TANG, F. ITAKURA, "Double Side Periodic Substitution (DSPS) Method for Recovering Missing Speech", ISSPA, pp. 544-549, August 1987, and
- J. TANG, "Evaluation of Double Sided Periodic Substitution (DSPS) Method for Recovering Missing Speech in Packet Voice Communications", IEEE Computers and Communications, pp. 454-458, 1991.

However, the number of samples that the frame to be synthesized must contain is not, in principle, a multiple of the number of samples in a segment representing one pitch period, which generally results in audible artifacts in the reconstructed signal. Moreover, these "pitch replication" techniques apply only when the lost signal lies in a voiced region. In other cases (unvoiced region), the preceding frame is simply copied.
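The mismatch between frame length and pitch period can be seen in a bare-bones pitch replication sketch. The function name is hypothetical and no smoothing is applied.

```python
def replicate_pitch(last_period, frame_len):
    """Fill a frame by repeating one pitch period.

    Returns the synthesized frame and the number of samples belonging to an
    incomplete final period (0 when frame_len is a multiple of the period).
    """
    period = len(last_period)
    if period == 0:
        raise ValueError("last_period must not be empty")
    frame = [last_period[i % period] for i in range(frame_len)]
    leftover = frame_len % period
    return frame, leftover
```

With, say, a 160-sample frame and a 57-sample pitch period (illustrative figures), two full periods fit and 46 samples of a third remain, so the frame ends in mid-period and the waveform is out of phase with the frame that follows.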

For transform coders, the techniques for reconstructing erased frames also rely on the coding structure used. Some techniques aim to regenerate the lost transform coefficients from the values taken by these coefficients before the erasure. A technique of this type is described in particular in: PictureTel Corporation, "Detailed Description of the PTC (PictureTel Transform Coder)", ITU-T Contribution, SG15/WP2/Q6, 8-9 October 1996 Baltimore meeting, TD7. Another technique is based on building a sinusoidal model of the decoded signal, which is used to regenerate the lost part of the signal. It is described in particular in: V. N. Parikh, J. H. Chen, G. Aguilar, "Frame Erasure Concealment Using Sinusoidal Analysis Synthesis and its Application to MDCT-Based Codecs", Proc. of ICASSP conference, 2000.

Some techniques for concealing erroneous frames by synthesis have been developed jointly with channel coding. These techniques use information supplied by the channel decoder, for example information about the reliability of the received parameters.

However, these approaches differ substantially from that of the present invention, which does not presuppose the existence of a channel coder, as will be seen below. An example of this type of technique is described in particular in: T. Fingscheidt, P. Vary, "Robust speech decoding: a universal approach to bit error concealment", Proc. of ICASSP conference, 1997, pp. 1667-1670.

The techniques used to conceal erased frames in CELP-type coders have proved effective, in particular on speech signals. Other types of coders, such as transform coders, which use extrapolation techniques tied to their representation of the signal, generally give poorer results on speech signals.

It also appears that the "pitch replication" techniques described above (which use an amplitude-modulated copy of a fundamental period of the speech signal), as well as the techniques based on a model of the signal (LPC, LTP, or other), are those that give the most promising results.

However, with techniques that model the signal preceding the loss, it is difficult to make the model evolve over one or more consecutive lost frames.

As for the "pitch replication" techniques, they generally introduce artifacts that appear as an audible frequency jump, an energy contrast and/or edge effects caused by joining together the segments representing the fundamental periods (or "pitch periods").

The present invention improves the situation.

To this end, it proposes a method for synthesizing a speech signal represented by successive frames of samples, in which:
a) at least one erroneous frame is identified, together with at least a first valid frame preceding the erroneous frame and a second valid frame following the erroneous frame,
b) a succession of synthesized blocks is formed as a function of the valid frames preceding and following the erroneous frame, each block comprising a number of samples calculated as a function of a fundamental period of the speech signal in at least one of the valid frames, and
c) a synthesis frame is constructed using at least part of the samples of the blocks.
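Steps a) to c) can be sketched as follows, under strong simplifying assumptions. The text above fixes neither how block lengths are derived from the fundamental periods nor how block samples are synthesized. Here the block length drifts linearly from the period P0 before the loss to the period P1 after it, and each block is a weighted mix of the last period before the loss and the first period after it. Both choices, and all function names, are hypothetical.

```python
import math


def find_neighbours(valid, t):
    """Step a): nearest valid frame indices before and after erroneous frame t."""
    before = next((i for i in range(t - 1, -1, -1) if valid[i]), None)
    after = next((i for i in range(t + 1, len(valid)) if valid[i]), None)
    return before, after


def block_lengths(p0, p1, frame_len):
    """Step b), sizes: block lengths drifting linearly from p0 towards p1."""
    if p0 <= 0 or p1 <= 0:
        raise ValueError("pitch periods must be positive")
    n_blocks = max(1, math.ceil(frame_len / ((p0 + p1) / 2.0)))
    lengths = []
    for k in range(n_blocks):
        w = (k + 1) / (n_blocks + 1)          # 0 < w < 1, grows with k
        lengths.append(max(1, round((1 - w) * p0 + w * p1)))
    while sum(lengths) < frame_len:           # guard against rounding shortfall
        lengths.append(p1)
    return lengths


def synthesize_frame(prev_frame, next_frame, p0, p1, frame_len):
    """Steps b) and c): build blocks, then keep the first frame_len samples."""
    if len(prev_frame) < p0 or len(next_frame) < p1:
        raise ValueError("each valid frame must hold at least one pitch period")
    lengths = block_lengths(p0, p1, frame_len)
    past = prev_frame[-p0:]       # last pitch period before the loss
    future = next_frame[:p1]      # first pitch period after the loss
    out = []
    for k, n in enumerate(lengths):
        w = (k + 1) / (len(lengths) + 1)
        # Assumed block model: mix of the two periods, weighted towards
        # the following frame as the block index increases.
        out.extend((1 - w) * past[i % p0] + w * future[i % p1]
                   for i in range(n))
    return out[:frame_len], lengths           # step c): part of the samples
```

For example, with P0 = 40, P1 = 50 and a 160-sample frame, this sketch produces four blocks of 42, 44, 46 and 48 samples (180 in total), of which only the first 160 are used to build the synthesis frame.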

According to one of the advantages provided by the present invention, the processing within the meaning of the present invention does not depend on the type of digital coding initially applied to the speech signal.

According to another advantage provided by the present invention, the valid signal both before and after the erroneous frame is used to construct the synthesis frame.

According to an advantage that follows from the present invention, models associated with the synthesis blocks can be made to evolve from the preceding valid frame to the following valid frame, the respective durations of these blocks corresponding to at least one fundamental period of the speech signal in the synthesis frame.

According to an advantage that follows from the present invention, more than one substitution frame can be synthesized, given knowledge of the signal before and after the erroneous frame or frames.

In a preferred embodiment, the blocks formed are speech signal synthesis blocks.

According to an advantageous characteristic of the invention, provision is made for: - obtaining values of first and second parameters of a model applied to first and second portions of the first frame and of the second frame respectively, and

- forming the samples of a block by applying the aforementioned model to this block, with parameters intermediate between the first and second parameters.

According to another advantageous characteristic of the invention, provision is made for obtaining respective information on the degree of voicing of the speech signal in the first frame and in the second frame. In this embodiment, the samples of the synthesis blocks are advantageously formed taking into account whether or not the preceding and following frames are substantially voiced.

Advantageously, samples are added to or deleted from the blocks so that the total number of samples in the blocks corresponds substantially to the number of samples required in the synthesis frame.

Preferably, the binary values of the samples preceding and following a deleted sample are weighted as a function of those of the deleted sample, and/or the binary values of an added sample are weighted as a function of those of the samples preceding and following the added sample.

Preferably, smoothing is carried out: between the respective binary values of at least the last sample of a block and at least the first sample of the next block, and/or

between the respective binary values of at least the last sample of a valid frame preceding the erroneous frame and at least the first sample of the synthesis frame, and/or between the respective binary values of at least the last sample of the synthesis frame and at least the first sample of a valid frame following the erroneous frame.

The present invention also relates to a device for synthesizing a speech signal represented by successive frames of samples, which comprises means for implementing the above method.

In a particular embodiment, the device further comprises connection means intended to be connected to a decoder of a stream of coded speech signal data, so that the device receives successive decoded frames.

As a variant, the device comprises means for receiving a stream of coded speech signal data, as well as means for at least partially decoding this stream.

Other characteristics and advantages of the invention will become apparent on examining the detailed description below and the accompanying drawings, in which:
- Figure 1 illustrates a system for transmitting a speech signal via a transmission channel;
- Figure 2 schematically represents modules of a synthesizer device within the meaning of the present invention;
- Figure 3 illustrates the steps of a method for identifying a valid frame preceding one or more erroneous frames and a valid frame following one or more erroneous frames;
- Figure 4A schematically represents a sampled speech signal comprising an erroneous frame Te;
- Figure 4B represents a succession of synthesis blocks, at least some of whose samples are intended to be used to replace the erroneous frame Te;
- Figure 4C represents the synthesized speech signal, comprising a reconstructed frame tr replacing the erroneous frame Te;
- Figure 5 illustrates the steps of a method for estimating the fundamental period of the speech signal for a reconstructed frame tr, as a function of the degree of voicing of the preceding and following valid frames;
- Figure 6 represents the steps of a method for synthesizing the blocks, followed by a smoothing of the first and last samples of these blocks;
- Figure 7 schematically represents parts of the signal intended to be smoothed, in particular at the beginning and at the end of the reconstructed frame tr;
- Figure 8A represents a speech signal in which a central frame has been reconstructed, with a voiced frame preceding the reconstructed part and a voiced frame following the reconstructed frame;
- Figure 8B represents the original speech signal corresponding to Figure 8A;
- Figure 9A represents a speech signal in which a central frame has been reconstructed, with a voiced frame preceding the reconstructed part and an unvoiced frame following the reconstructed frame;
- Figure 9B represents the original speech signal corresponding to Figure 9A;
- Figure 10A represents a speech signal in which two central frames have been reconstructed, with an unvoiced frame preceding the reconstructed part and an unvoiced frame following the reconstructed part; and
- Figure 10B represents the original speech signal corresponding to Figure 10A.

The drawings and the description below essentially contain elements of a definite nature. They may therefore serve not only to give a better understanding of the invention but also to contribute to its definition, where appropriate.

Reference is first made to Figure 1, in which a digital speech signal S is first coded by a coder CO, which converts the speech signal S into a bit stream of coded data FC. This bit stream FC is then carried by a transmission channel CL.

The channel may equally be a GSM-type network, in which case the speech signal is represented by a succession of consecutive frames, or an Internet-type network, in which case the signal S is represented by a succession of bit packets. Hereinafter, the term "frame" designates a frame transmitted over a GSM network, a packet transmitted over the Internet, or any other type of quantum of samples.

A block for receiving and decoding a bit stream FC' output from the transmission channel CL is described below. The bit stream FC' is not always identical to the bit stream FC, since some frames or packets may have been corrupted during transmission over the channel CL, lost, or received too late to be used.

In what follows, the term "erroneous frame" means a frame or packet that is corrupted, received late, or lost during transmission.

By way of non-limiting example, the receiving and decoding block of Figure 1 corresponds to an audio-conferencing terminal over IP (Internet Protocol) using G711-type digital coding/decoding, equipped however with a synthesizer device 1 within the meaning of the present invention. G711 is a telephone-band coder (300-3400 Hz) with a sampling frequency of 8 kHz, operating at 64 kbit/s. In the example, the coded frames have a duration of 30 milliseconds (which corresponds to 240 samples at the 8 kHz sampling frequency).

On reception, a module 3 for detecting erroneous data receives the bit stream FC' output from the transmission channel CL. In particular, the module 3 detects the erroneous frames Te and transmits an erroneous-frame indication signal ITE to a synthesizer device 1, for error concealment. The valid frames Tv (non-erroneous frames) are transmitted to a decoding device 2. The synthesizer device 1 also receives at least portions of the decoded signal SD output by the decoding device 2 and uses these signal portions to synthesize a reconstruction signal SR. The reconstruction signal SR is, where appropriate, added to the decoded signal (at the output of the decoding device 2) so as to deliver a speech signal S' ready to be used. It should be noted that the signal S' is not always identical to the signal S, since portions of the signal S' may have been synthesized.

With reference to Figure 2, the synthesizer device 1 comprises a memory 10 intended to store a chosen number of valid frames output by the decoder 2. This memory 10 cooperates with a block synthesizer 11, which receives the erroneous-frame indication signal ITE in order to start the synthesis of the blocks BL. The synthesizer device 1 further comprises a module 12 for smoothing the synthesis samples produced to replace the samples of the erroneous frame. The synthesizer device 1 delivers the reconstructed signal SR at the output of the module 12.

Thus, after the valid data have been decoded, decoded samples are stored in the memory 10 in sufficient number to synthesize erased or erroneous frames subsequently. Preferably, the memory 10 stores two valid frames preceding the frame to be reconstructed (that is, 60 milliseconds of signal before the synthesized frame to be played at time t), as well as one valid frame following the frame to be synthesized (that is, 30 milliseconds of signal after the synthesized frame to be played at time t). The temporary content of the memory 10 therefore comprises 960 samples (480 samples before time t and 240 samples after time t at a sampling frequency of 8 kHz, the remaining 240 samples presumably being those of the frame at time t itself). In this embodiment, a delay of one frame, that is 30 milliseconds, is accepted in order to obtain a posteriori information on the future signal. The samples of the last frame received are placed at the end of the temporary content of the memory 10 and the samples of the oldest frames are shifted, in the conventional manner of a shift-register memory.
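The shift-register behaviour of the memory 10 can be sketched as follows. This is only an illustration under stated assumptions: the class name `FrameMemory` and its methods are hypothetical, and the sizes (240-sample frames, 960-sample memory) are those of the example embodiment at 8 kHz.

```python
from collections import deque

SAMPLE_RATE_HZ = 8000
FRAME_MS = 30
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 240 samples per frame
MEMORY_LEN = 960                               # samples held by the memory


class FrameMemory:
    """Hypothetical model of memory 10: a fixed-size shift register."""

    def __init__(self, size=MEMORY_LEN):
        # Start from an all-zero content.
        self.samples = deque([0] * size, maxlen=size)

    def push_frame(self, frame):
        """Append the newest decoded frame; the oldest samples drop out."""
        if len(frame) != FRAME_LEN:
            raise ValueError("expected a frame of %d samples" % FRAME_LEN)
        self.samples.extend(frame)

    def reset(self):
        """Set every stored sample back to zero."""
        size = self.samples.maxlen
        self.samples = deque([0] * size, maxlen=size)


memory = FrameMemory()
for value in range(1, 6):                 # push five constant-valued frames
    memory.push_frame([value] * FRAME_LEN)
content = list(memory.samples)            # frames 2, 3, 4, 5 remain
```

A `deque` with `maxlen` discards the oldest entries automatically, which mirrors the shifting of the oldest frames described above.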

Thus, with reference to Figure 2, the synthesizer device 1 receives two items of information: - the first corresponds to the samples of the previously received frames, and - the second corresponds to the number of successive lost or erroneous frames to be synthesized.

In the example described, the number of successive frames that the device is able to synthesize is limited to two successive erroneous frames. Of course, by choosing a memory 10 with sufficient storage capacity, more than two successive erroneous frames can be synthesized, depending on the desired quality of the restored signal.

With reference to Figure 3, the synthesizer device 1 receives an erroneous-frame indication relating to the frame Ti at step 30. Following tests 31 and 33, if the next two frames are also erroneous, the three consecutive frames Ti, Ti+1 and Ti+2 are preferably replaced by comfort noise BCO at step 35. On the other hand, if test 31 shows that the next frame Ti+1 is valid, the preceding frame Ti-1 and the following frame Ti+1 are designated, at step 32, as the valid frames that will serve as a basis for reconstructing the erroneous frame Ti. Furthermore, if the two consecutive frames Ti and Ti+1 are erroneous while the next frame Ti+2 is valid, this frame Ti+2 and the frame Ti-1 preceding the first erroneous frame Ti are chosen at step 34 as the valid base frames for reconstructing the two erroneous frames Ti and Ti+1.
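The selection logic of Figure 3 can be sketched as a small decision function. The function name, the return convention and the representation of the frame status as a list of booleans are assumptions made for illustration, and test 33 is presumed to be the check on frame Ti+2.

```python
def plan_concealment(erroneous, i):
    """Decide how to conceal a loss starting at frame index i.

    erroneous: list of booleans, True where a frame is erroneous.
    Returns (action, previous_valid_index, next_valid_index, frames_lost).
    Assumes frame i-1 is valid and frames i+1 and i+2 are known.
    """
    if not erroneous[i]:
        raise ValueError("frame i must be erroneous (step 30)")
    if not erroneous[i + 1]:                      # test 31: Ti+1 valid
        return ("reconstruct", i - 1, i + 1, 1)   # step 32
    if not erroneous[i + 2]:                      # test 33 (presumed): Ti+2 valid
        return ("reconstruct", i - 1, i + 2, 2)   # step 34
    return ("comfort_noise", None, None, 3)       # step 35


single = plan_concealment([False, True, False, False], 1)
double = plan_concealment([False, True, True, False], 1)
triple = plan_concealment([False, True, True, True], 1)
```

Here `single` and `double` select the frames on either side of the gap, while `triple` falls back to comfort noise.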

As an alternative to replacing the three successive erroneous frames Ti, Ti+1 and Ti+2 with comfort noise BCO, the temporary content of the memory 10 may be reset (the binary values of the samples being set to zero). However, replacing long runs of erroneous frames with comfort noise may be better perceived, in terms of communication quality, by the recipient of the speech signal.

A particular embodiment is described below in which the number of samples assigned to each synthesis block is substantially constant from one synthesis block to the next.

In the example of Figure 5, a first estimate of a fundamental period P of the speech signal in the frame to be synthesized is calculated as a function of the valid frames preceding (t-1) and following (t+1) the erroneous frame. As will be seen below, the number of samples per block depends on this first estimate P.

In a preferred embodiment, the first estimate of the fundamental period P is made as a function of the voicing of the valid frames preceding (t-1) and following (t+1) the erroneous frame.

Processing starts with a detection 50 of voiced sounds VO or unvoiced sounds NV in the most recent data stored in the memory 10. To perform this detection, it is advantageous to use a normalized correlation, for example of the type described in: W. B. KLEIJN, K. K. PALIWAL, "Speech Coding and Synthesis", ELSEVIER, 1995.

Preferably, the technique used here for detecting voicing in the frames and determining their fundamental periods (or "pitches" P0 and P1) is the one described in the reference: H. Kobayashi, T. Shimamura, "A weighted autocorrelation method for pitch extraction of noisy speech", Proc. of ICASSP conference, 2000, which jointly uses the normalized autocorrelation function and the Average Magnitude Difference Function (AMDF) in order to improve pitch estimation in the presence of noisy speech signals. Preferably, the estimates of the pitches P0 and P1 are constrained between a minimum of 20 samples and a maximum of 120 samples, which correspond, at a sampling frequency of 8 kHz, to fundamental frequencies of the speech signal of 400 Hz and approximately 66 Hz respectively.

Of course, any other technique for obtaining the voicing period in a frame of speech signal may be used here.
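As an illustration of such an alternative technique, the sketch below estimates the pitch with a plain normalized correlation searched over lags of 20 to 120 samples. It is not the weighted autocorrelation/AMDF method of the cited reference; the function names, the voicing threshold of 0.5 and the tie-breaking tolerance are assumptions chosen for the example.

```python
import math

MIN_LAG = 20    # 400 Hz at 8 kHz sampling
MAX_LAG = 120   # about 66 Hz at 8 kHz sampling


def normalized_correlation(x, lag):
    """Correlation between x and x delayed by lag, normalized to [-1, 1]."""
    n = len(x) - lag
    num = sum(x[k] * x[k + lag] for k in range(n))
    e0 = sum(x[k] ** 2 for k in range(n))
    e1 = sum(x[k + lag] ** 2 for k in range(n))
    den = math.sqrt(e0 * e1)
    return num / den if den > 0 else 0.0


def estimate_pitch(frame, voicing_threshold=0.5, tol=1e-6):
    """Return (lag, correlation, voiced) for one frame.

    The smallest lag is kept when several lags tie within tol, which
    avoids picking a multiple of the true period.
    """
    best_lag, best_r = MIN_LAG, float("-inf")
    for lag in range(MIN_LAG, MAX_LAG + 1):
        r = normalized_correlation(frame, lag)
        if r > best_r + tol:
            best_lag, best_r = lag, r
    return best_lag, best_r, best_r >= voicing_threshold


# Synthetic "voiced" frame: a 200 Hz sinusoid, i.e. a 40-sample period.
frame = [math.sin(2 * math.pi * n / 40) for n in range(240)]
lag, corr, voiced = estimate_pitch(frame)
silent = estimate_pitch([0.0] * 240)
```

On the synthetic sinusoid the search settles on the 40-sample period, and the all-zero frame is classified as unvoiced. Real speech would call for a more robust estimator such as the one cited above.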

Depending on the degree of voicing of the frames (t-1) and (t+1), decisions (which follow tests 51, 52 and 55) are taken as to the frame synthesis technique to be used. In the example described, four cases are distinguished:
- if the frames (t-1) and (t+1) are both voiced (output O, "yes", of test 52), a first estimate of the fundamental period P of the speech signal in the frame to be synthesized is calculated (step 53) as a function of the fundamental period P0 of the valid frame preceding the erroneous frame and the fundamental period P1 of the valid frame following the erroneous frame. The function giving the period P may be the mean of the respective pitches P0 and P1, or the maximum of these two pitches, or the minimum of these two pitches, or any combination of P0 and P1;
- if the frame (t-1) is voiced while the frame (t+1) is unvoiced (output N, "no", of test 52), the frame to be synthesized probably corresponds to the end of a word (speech/noise transition) and the estimate of P preferably corresponds to the period P0 of the voiced valid frame (step 54);
- if the frame (t-1) is unvoiced while the frame (t+1) is voiced (output O of test 55), the frame to be synthesized probably corresponds to the beginning of a word (noise/speech transition) and the estimate of P preferably corresponds to the pitch P1 of the voiced valid frame (step 57); and
- if the frames (t-1) and (t+1) are both unvoiced (output N of test 55), the frame to be synthesized probably corresponds to a period of noise and, preferably, this frame is synthesized by copying the preceding frame (t-1) (step 56). As a variant, the erroneous frame may be synthesized by copying the following frame (t+1).
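The four cases above can be summarized in a short decision function. The function name, the return convention and the default choice of the mean for the voiced/voiced case are assumptions for illustration; the description allows the maximum, the minimum or any other combination of P0 and P1.

```python
def estimate_synthesis_pitch(voiced_prev, p0, voiced_next, p1, combine=None):
    """Return (action, P) for the frame to be synthesized.

    voiced_prev, voiced_next: voicing flags of frames (t-1) and (t+1).
    p0, p1: pitch periods (in samples) of frames (t-1) and (t+1).
    combine: optional function of (p0, p1); defaults to the mean.
    """
    if voiced_prev and voiced_next:               # step 53
        if combine is None:
            combine = lambda a, b: (a + b) / 2
        return ("synthesize", combine(p0, p1))
    if voiced_prev:                               # step 54: end of word
        return ("synthesize", p0)
    if voiced_next:                               # step 57: start of word
        return ("synthesize", p1)
    return ("copy_previous_frame", None)          # step 56: noise


both_voiced = estimate_synthesis_pitch(True, 40, True, 60)
with_max = estimate_synthesis_pitch(True, 40, True, 60, combine=max)
word_end = estimate_synthesis_pitch(True, 40, False, 60)
word_start = estimate_synthesis_pitch(False, 40, True, 60)
noise = estimate_synthesis_pitch(False, 40, False, 60)
```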

In what follows, a preferred embodiment is described for estimating the number NBL of blocks to be synthesized, as well as the number TBL of samples to be provided in each block, assumed here to be constant from one block to the next.

En se référant à la figure 6, après la détection et l'estimation, à l'étape 60, des périodes PO et/ou PI des trames valides encadrant la trame erronée, on calcule une première estimation de la période P dans la trame de Referring to FIG. 6, after detection and estimation, at step 60, PO and / or PI periods of the valid frames surrounding the erroneous frame, a first estimate of the period P in the frame of the frame is calculated.

synthèse, en fonction des périodes PO et/ou PI comme décrit ci-avant.

synthesis, according to the periods PO and / or PI as described above.

Le nombre de blocs NBL à synthétiser est fonction de cette première estimation de la période P et du nombre NTR d'échantillons perdus dans la trame erronée (obtenu à l'étape 62). The number of NBL blocks to be synthesized is a function of this first estimate of the period P and of the number NTR of samples lost in the erroneous frame (obtained in step 62).

The number of blocks NBL is estimated in step 61 as a function of the ratio NTR/P.

However, the periods P0 and P1 (if both valid frames are voiced) are not necessarily identical, and a suitable number of samples per block TBL must be defined. Advantageously, the number of blocks NBL, in relation to the number of samples per block TBL as will be seen below, is optimized by minimizing the absolute value of a quantity ΔN [expression not reproduced in this text].

Depending on the value of i for which ΔN, calculated in step 72, is minimal, the number of blocks NBL is incremented by this value of i (NBL = NBL + i in step 74). In the end, the value of NBL corresponds to the integer closest to the ratio NTR/P. The value of ΔN thus estimated corresponds to the number of samples to be added to the blocks to form the synthesis frame (if the difference ΔN is positive), or to be removed to form the synthesis frame (if the difference ΔN is negative).

In step 63 of FIG. 6, the number of samples per block TBL to be synthesized is then estimated. This number is given by a ratio [expression not reproduced in this text], whose value is rounded to the nearest integer.

Finally, the duration of the blocks (given by the ratio of the number of samples per block TBL thus estimated to the sampling frequency) is representative of a fundamental period of the speech signal in the synthesis frame tr.
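The block planning described above can be sketched as follows. The description states that NBL ends up as the integer nearest to NTR/P; the two other formulas used here (TBL taken as NTR/NBL rounded to the nearest integer, and the adjustment taken as NTR minus NBL times TBL) are assumptions introduced for illustration, since the corresponding expressions are not available in this text. The search over the increment i is not reproduced.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest integer, halves rounded up (avoids banker's rounding)."""
    return math.floor(x + 0.5)


def plan_blocks(ntr: int, p: float):
    """Return (nbl, tbl, delta_n) for ntr lost samples and a period estimate p.

    nbl     : number of blocks, nearest integer to ntr / p (at least 1)
    tbl     : samples per block, ASSUMED to be ntr / nbl rounded
    delta_n : ASSUMED to be ntr - nbl * tbl; positive means samples must be
              added, negative means samples must be removed
    """
    nbl = max(1, round_half_up(ntr / p))
    tbl = round_half_up(ntr / nbl)
    delta_n = ntr - nbl * tbl
    return nbl, tbl, delta_n
```

With 160 lost samples and a period estimate of 47 samples, this sketch gives three blocks of 53 samples and one sample to add; with 240 lost samples and a period of 57 samples, it gives four blocks of 60 samples and no adjustment.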

In what follows, the synthesis of the blocks, whose number NBL and number of samples per block TBL have been calculated beforehand, is described in a preferred embodiment.

In the example shown in FIG. 4A, the received and decoded speech signal has an erroneous frame Te, preceded by a valid, voiced frame (t-1) of pitch P0 and followed by a valid, voiced frame (t+1) of pitch P1. After the pitches P0 and P1 have been detected and estimated, the samples of a segment SEG(P0), whose duration corresponds to the period P0 in frame (t-1), and the samples of a segment SEG(P1), whose duration corresponds to the period P1 in frame (t+1), are stored in memory 10 as vectors X0 and X1.

Preferably, these segments SEG(P0) and SEG(P1) comprise, respectively, the last samples of the preceding valid frame (t-1) and the first samples of the following valid frame (t+1). These signal elements are then modeled, for example by using a spectral transform (Fourier transform, discrete cosine transform, discrete sine transform, or other), or by applying LPC modeling (of the type used in CELP coding). The models resulting from this modeling are denoted below Mod(P0) and Mod(P1).

More particularly, referring again to FIG. 5, three cases are preferably distinguished:

- if both the preceding and following valid frames are voiced, step 53 selects a segment in the preceding valid frame whose duration corresponds to a period P0 of the preceding valid frame, as well as a segment in the following valid frame whose duration corresponds to the period P1 in the following valid frame;

- if only the preceding frame is voiced, step 54 selects a segment in the preceding valid frame whose duration corresponds to the period P0 of the preceding valid frame, as well as a segment in the following valid frame (unvoiced) whose duration corresponds to the period P0 of the preceding valid frame (voiced);

- similarly, if only the following valid frame is voiced, step 57 selects a segment comprising the last samples of the preceding valid frame (unvoiced) whose duration corresponds to the period P1 in the following frame (voiced), as well as a segment comprising the first samples of the following valid frame whose duration corresponds to the period P1 of the following valid frame.

The modeling described above is then applied to these segments. The resulting models are denoted Mod(P0) and Mod(P1), respectively, for the segments of the preceding frame (t-1) and of the following frame (t+1).

In a preferred embodiment, the modeling applied is a discrete cosine transform (DCT). For each of the vectors X0 and X1, 120 coefficients are then calculated from its spectral transform. These 120 coefficients are stored as the vectors Mod(P0) and Mod(P1), respectively.
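As an illustration of this modeling step, the sketch below computes 120 DCT coefficients from a segment of one pitch period. It assumes an orthonormal DCT-II and assumes that a segment shorter than 120 samples is zero-padded to 120 points; the description states the number of coefficients but not the transform variant or how segment length is handled, so both choices are hypothetical.

```python
import math
from typing import List, Sequence


def dct_model(segment: Sequence[float], n: int = 120) -> List[float]:
    """Orthonormal DCT-II of the segment, evaluated on n points.

    The segment is zero-padded (or truncated) to n samples first; this
    padding convention is an assumption made for the sketch.
    """
    padded = (list(segment) + [0.0] * n)[:n]
    coeffs = []
    for k in range(n):
        # Scaling that makes the transform matrix orthogonal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, v in enumerate(padded))
        coeffs.append(scale * acc)
    return coeffs
```

Because the transform is orthonormal, the energy of the coefficient vector equals the energy of the segment, which gives a quick sanity check on an implementation.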

In the example shown in FIG. 4B, the first estimate of the period P of the synthesis frame corresponds to the average of the period P0 of the preceding valid frame and the period P1 of the following valid frame. The optimization criterion on the value ΔN imposes the number of blocks NBL (six blocks in the example shown in FIG. 4B), as well as the number of samples per block TBL (five samples per block in the example shown in FIG. 4B, which corresponds to the average of the number of samples in a period P0 and the number of samples in a period P1).

Of course, FIGS. 4A to 4C illustrate the signals only as explanatory diagrams. The segments SEG(P0) and SEG(P1) comprise seven and five samples. In practice, they should comprise at least twenty samples, which corresponds to a fundamental frequency of 400 Hz, the highest in a speech signal.
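As a rough check of the figure of twenty samples, and assuming a sampling frequency of 8 kHz (typical of telephone-band speech, but not stated in this passage), one period of a 400 Hz fundamental spans:

$$N_{\min} = \frac{f_s}{f_{0,\max}} = \frac{8000\ \text{Hz}}{400\ \text{Hz}} = 20\ \text{samples}$$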

A preferred embodiment for forming the samples of the synthesis blocks is described below.

Referring again to FIG. 6, after the parameters of the models Mod(P0) and Mod(P1) have been obtained in step 66 from the segments SEG(P0) and SEG(P1) (steps 64 and 65), an index n (0 ≤ n ≤ NBL-1) is assigned to each block in the succession of NBL blocks. In step 67, the parameters S(n)_k of a block are synthesized as a function of the models Mod(P0) and Mod(P1), of the index n of this block, and of the number of blocks to be synthesized NBL. In an advantageous embodiment, the function S(n) corresponds to an interpolation of the parameters of the models Mod(P0) and Mod(P1), as described below.

For a block of index n, intermediate parameters Mod[S(n)] that should be assigned to this block are estimated, by a preferably linear interpolation, as a function of the models Mod(P0) and Mod(P1).

Thus, for all k ∈ [0, TBL-1]: [interpolation expression not reproduced in this text]

If a spectral transform, such as a discrete cosine transform, was used beforehand to estimate the models Mod(P0) and Mod(P1), then, after the intermediate parameters Mod[S(n)] have been estimated, an inverse spectral transform (such as an inverse discrete cosine transform, IDCT) is applied to obtain the binary values of the samples S(n)_k that each synthesis block must comprise.

Thus, for all k ∈ [0, TBL-1], $S(n)_k = \mathrm{IDCT}\left(\mathrm{Mod}[S(n)]_k\right)$.
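The interpolation and inverse transform can be sketched as below. The interpolation weight (n + 1) / (NBL + 1) is an assumption chosen so that the first and last synthesized blocks sit strictly between the two models; the actual weighting used in the description is not available in this text. The inverse transform is the inverse of an orthonormal DCT-II, and keeping only the first TBL output samples is likewise a hypothetical convention.

```python
import math
from typing import List, Sequence


def interpolate_model(mod0: Sequence[float], mod1: Sequence[float],
                      n: int, nbl: int) -> List[float]:
    """Linear interpolation between two model vectors for block index n."""
    w = (n + 1) / (nbl + 1)  # ASSUMED weight, grows from mod0 toward mod1
    return [(1.0 - w) * a + w * b for a, b in zip(mod0, mod1)]


def idct(coeffs: Sequence[float]) -> List[float]:
    """Inverse of the orthonormal DCT-II (that is, an orthonormal DCT-III)."""
    size = len(coeffs)
    out = []
    for i in range(size):
        acc = coeffs[0] * math.sqrt(1.0 / size)
        for k in range(1, size):
            acc += (coeffs[k] * math.sqrt(2.0 / size)
                    * math.cos(math.pi * (i + 0.5) * k / size))
        out.append(acc)
    return out


def synthesize_block(mod0, mod1, n: int, nbl: int, tbl: int) -> List[float]:
    """Samples of block n: interpolate the models, invert, keep tbl samples."""
    return idct(interpolate_model(mod0, mod1, n, nbl))[:tbl]
```

When both models are identical, every synthesized block reproduces the same waveform, which is the expected behavior for a perfectly stationary voiced signal.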

Once all the blocks have been synthesized (NBL blocks in total), the number of samples defined by the optimization criterion ΔN must be removed or added. The positions of the samples to be added or removed are chosen here at random among all the samples already synthesized. When a sample is added, its binary value is a weighting (preferably the average) of the binary values of the samples on either side of it. When a sample is removed, the following sample in the frame to be synthesized is preferably assigned a binary value that is a weighting (preferably the average) of the binary value of the removed sample and of its own binary value before the removal.
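A minimal sketch of this length adjustment follows, using the plain average for both the insertion and the removal rule. The function name and the convention that a positive count means "add" and a negative count means "remove" are assumptions made for the example.

```python
import random
from typing import List, Optional, Sequence


def adjust_length(samples: Sequence[float], delta_n: int,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Add (delta_n > 0) or remove (delta_n < 0) samples at random positions.

    Requires at least two samples so that every position has a neighbor.
    """
    if rng is None:
        rng = random.Random()
    out = list(samples)
    for _ in range(abs(delta_n)):
        if delta_n > 0:
            # Insert between out[pos - 1] and out[pos]: mean of the neighbors.
            pos = rng.randrange(1, len(out))
            out.insert(pos, 0.5 * (out[pos - 1] + out[pos]))
        else:
            # Remove out[pos]; the next sample becomes the mean of both.
            pos = rng.randrange(0, len(out) - 1)
            out[pos + 1] = 0.5 * (out[pos] + out[pos + 1])
            del out[pos]
    return out
```

Passing a seeded `random.Random` makes the choice of positions reproducible, which is convenient when comparing reconstructions.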

In what follows, a preferred embodiment is described in which the binary values of the samples at the beginning and end of each block are smoothed in order to join the blocks together.

So that no artifact degrades the quality of the reconstructed signal, smoothing is performed between the binary values of the last samples of a block and the binary values of the first samples of the following block. Preferably, the smoothing performed is of the so-called "variable forgetting factor" type. Referring to FIG. 6, step 68 replaces the binary values of the samples S(n)_i with the smoothed values S'(n)_i, in particular for the samples at the beginning and end of a block. Preferably, this smoothing is carried out according to the following equations:

For all n ∈ [1, NBL-1]: [equations not reproduced in this text]

In these equations, NLIS is the number of samples over which each smoothing between two blocks is performed. It is an integer smaller than the block size TBL. The term a(i) is the forgetting factor of the smoothing performed. Advantageously, good results are obtained with smoothing over five samples (NLIS = 5).
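Since the smoothing equations themselves are not available in this text, the sketch below only illustrates the general idea of a smoothing whose forgetting factor varies with the sample index. It is a recursive first-order smoother in which the weight given to the previous output decreases linearly over the first NLIS samples of the following block. The weight formula and the function name are assumptions, not the equations of the description.

```python
from typing import List, Sequence


def smooth_junction(prev_block: Sequence[float], next_block: Sequence[float],
                    nlis: int = 5) -> List[float]:
    """Smooth the first nlis samples of next_block toward the end of prev_block."""
    if nlis >= len(next_block):
        raise ValueError("nlis must be smaller than the block size")
    out = list(next_block)
    previous = prev_block[-1]
    for i in range(nlis):
        # ASSUMED forgetting factor: large just after the junction, then fading.
        a = (nlis - i) / (nlis + 1)
        out[i] = a * previous + (1.0 - a) * next_block[i]
        previous = out[i]
    return out
```

Samples beyond the first `nlis` positions are left untouched, so two blocks that already join smoothly pass through essentially unchanged.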

It should be noted that the first synthesis block (n = 0) is not smoothed at the beginning of the block. Indeed, smoothing of the same type is performed between the frames preceding and following the erroneous frame, as shown in FIG. 7.

Referring to FIG. 6, step 69 uses the last samples of the preceding frame (t-1), from step 70, and the first samples of the following frame (t+1), from step 71, to smooth the transition with the values of the first and last samples of the synthesis frame. In step 69, the forgetting factor a(i) is then calculated, along with the smoothed values SR'(k)_i that replace the old values SR(k)_i of these samples.

Like the smoothing between blocks, the smoothing between reconstructed frames and valid frames is carried out according to the following equations, given by way of example: [equations not reproduced in this text]

The number of samples to be smoothed, NLIS, is strictly less than the number of samples per frame NTR. In the preceding equations, it is the first points of the following valid frame (t+1) that are smoothed. As a variant, the last samples of the synthesis frame tr may of course be smoothed instead.

Advantageously, if the smoothing between frames is performed on the values of five samples (NLIS = 5), no artifact degrades the quality of the reconstructed signal SR, finally obtained in step 73 of FIG. 6. The variable-forgetting-factor smoothing applied to join the synthesis frame tr to the frames surrounding it, as shown in FIG. 7, therefore gives a satisfactory result.

In the example shown in FIG. 4C, the synthesis frame tr was constructed from the TBL samples in each of the NBL blocks of FIG. 4B, in particular by selecting only some of these samples (two samples are removed in the example shown).

The values of the intermediate samples between the valid frames and the synthesis frame were smoothed, as described above.

Reference is now made to FIG. 8A, which shows a reconstructed female speech signal, to be compared with the same speech signal without reconstruction (FIG. 8B).

In the example shown, the reconstructed frame is preceded and followed by valid voiced frames. Referring to FIG. 8A, seven successive blocks had to be synthesized to reconstruct the frame tr. Comparing FIGS. 8A and 8B shows satisfactory regularity of the speech signal and no non-linearity. In particular, very few differences in waveform are noticeable between the synthesis frame and the corresponding frame of the original, lossless signal.

In the example shown in FIG. 9A, the reconstructed frame follows a valid voiced frame and precedes a valid unvoiced frame. This is therefore a frame loss during a speech/noise transition (end of a word).

Here, the speech is male. Referring to FIG. 9A, the transition takes place smoothly over the four synthesis blocks. The change from the voiced signal to the unvoiced signal is achieved without any non-linearity. Nevertheless, a difference in how long the voicing is sustained in the synthesis frame can be seen in FIG. 9A: whereas the original signal is voiced over a longer period, the voicing is progressively attenuated in the synthesis frame.

An example of the synthesis of a single substitution frame tr has been described in detail above. If two successively received frames are erroneous, the total number of lost samples corresponds to two frame sizes (2*NTR), and the blocks are synthesized as described above, using the first preceding valid frame (t-1) and the first following valid frame (t+2), as shown in step 34 of FIG. 3.

FIGS. 10A and 10B show the case of a loss of two successive frames, surrounded by an unvoiced frame and a voiced frame. This is therefore a noise/speech transition (beginning of a word). Here, the speech is male. Referring to FIG. 10A, the transition between the unvoiced signal and the voiced signal is regular over the duration of the two reconstructed frames. However, voicing in the synthesis starts as early as the second block (sample 300) of the first synthesis frame, whereas in the original signal voicing only begins at sample 500 (corresponding to the fifth synthesis block). Nevertheless, the evolution of the signal seems satisfactory on listening.

Thus, more generally, the present invention offers excellent synthesis quality between two voiced frames. It also offers a satisfactory synthesis of a transition signal between a voiced zone and an unvoiced zone. According to another advantage of the present invention, several synthesis frames can be generated (two successive frames, as shown in FIG. 10A, or more).

Of course, the present invention is not limited to the embodiment described above by way of example; it extends to other variants.

Thus, it will be understood that the synthesizer device within the meaning of the present invention may equally well be separate from the decoder, as shown in FIG. 1, or form an integral part of a decoder, as a module for synthesizing missing or erroneous frames.

In the foregoing, the synthesizer device operates on decoded frames. As a variant, in particular when CELP coding is used, the synthesizer device can operate on partially coded data, in particular between two intermediate decoding steps. For example, the synthesizer device can obtain the fundamental periods in the preceding and following valid frames (pitches P0 and P1), as well as LPC parameters (Mod(P0) and Mod(P1)) associated with these periods or with segments whose durations are multiples of these periods, this information generally being available in the bit stream coded by CELP-type coding (for example of the G.723.1 type). The synthesis can therefore be carried out by estimating suitable block sizes from the pitch periods P0 and P1 given in the coded bit stream, the blocks then being synthesized by interpolation from Mod(P0) to Mod(P1).

The present invention is advantageously applicable to any other type of coding. For example, it finds an interesting application, without being limited thereto, in the synthesis of frames initially coded by time-domain coders, whose structure a priori lends itself less well to the concealment of error bursts.

A particular embodiment has been described above in which the number of samples assigned to each synthesis block is substantially constant from one synthesis block to the next. In a variant, the first synthesis block and the last synthesis block comprise a number of samples corresponding respectively to a first period P0 (if the preceding valid frame is voiced) and to a second period P1 (if the following valid frame is voiced), while the number of samples per block varies, preferably continuously, along the succession of blocks. This embodiment finds an advantageous application in IMBE coding.
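The variable-block-size variant can be illustrated with a linear progression of block lengths from the first period to the second. The linear law is an assumption; the description only asks for a preferably continuous variation. The resulting sizes will not in general sum to the number of lost samples, so a length adjustment of the kind described earlier would still be needed.

```python
import math
from typing import List


def block_sizes(p0: int, p1: int, nbl: int) -> List[int]:
    """Block lengths going from p0 (first block) to p1 (last block)."""
    if nbl < 1:
        raise ValueError("nbl must be at least 1")
    if nbl == 1:
        # Single block: fall back to the mean period (assumed convention).
        return [math.floor((p0 + p1) / 2 + 0.5)]
    # ASSUMED linear progression, each size rounded to the nearest integer.
    return [math.floor(p0 + (p1 - p0) * n / (nbl - 1) + 0.5)
            for n in range(nbl)]
```

For example, with periods of 40 and 60 samples and five blocks, the sizes step through 40, 45, 50, 55 and 60 samples.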

In the example described above, the samples to be added or deleted, so that the number of samples in the synthesis frame matches the number of samples lost in the erroneous frame, are chosen randomly. In a more sophisticated variant, the detection of voiced or unvoiced valid frames is used as follows:
- if both the preceding and following valid frames are voiced, samples are deleted or added randomly;
- if only one of the frames is voiced (start or end of a word), samples are preferably deleted or added in the part of the synthesis frame that is adjacent to the unvoiced valid frame.
Nevertheless, the addition or deletion of samples advantageously remains random. Preferably, samples are not added or deleted at regular intervals of the signal, since this could introduce an audible over-periodicity.
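The selection rule above can be sketched as follows. The sketch assumes that "the part adjacent to the unvoiced frame" is the corresponding half of the synthesis frame, and that the whole frame is used whenever both frames have the same voicing state (the description states this only for two voiced frames). Both choices, and the function name, are illustrative assumptions.

```python
import random


def pick_positions(frame_len, count, prev_voiced, next_voiced, rng=None):
    """Randomly pick `count` distinct sample positions to add or delete."""
    if rng is None:
        rng = random.Random()
    if prev_voiced == next_voiced:
        lo, hi = 0, frame_len                  # draw over the whole frame
    elif prev_voiced:
        lo, hi = frame_len // 2, frame_len     # next frame unvoiced: second half
    else:
        lo, hi = 0, frame_len // 2             # previous frame unvoiced: first half
    # Random draw without replacement, so positions are not regularly spaced
    # by construction (count must not exceed hi - lo).
    return sorted(rng.sample(range(lo, hi), count))


positions = pick_positions(240, 10, prev_voiced=True, next_voiced=False)
```

Drawing without replacement keeps the positions distinct. A stricter implementation could also reject draws that happen to be evenly spaced.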

In the example described above, a linear interpolation between Mod(P0) and Mod(P1) for synthesizing the blocks gives satisfactory listening results when the valid frames on either side of the erroneous frame are voiced. Nevertheless, in particular if one of the frames is unvoiced, the synthesis of the blocks can be based on a model other than linear interpolation. At the end of a word (or at the start of a word), there is no need to continue the voicing for too long (or to start it too early) when synthesizing the substitution frame. Thus, in a more sophisticated variant, the degree of voicing of the preceding and following valid frames is taken into account when assigning the parameters Mod[S(n)] used to synthesize the blocks.

Typically, if one of the frames is unvoiced, the coefficients of its model will be predominant in the estimation of the parameters associated with the blocks.
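One possible way of making the unvoiced frame's model predominant is sketched below. The description does not specify the weighting law, so the power-law shaping of the linear weights, the `bias` parameter and the function name are purely hypothetical and serve only to illustrate the idea.

```python
def block_weights(nbl, prev_voiced, next_voiced, bias=2.0):
    """Weight given to the following frame's model for blocks n = 1..NBL.

    The parameters of block n would then be
    (1 - w) * (preceding frame's model) + w * (following frame's model).
    """
    weights = []
    for n in range(1, nbl + 1):
        w = n / (nbl + 1)                 # linear weight, as in the basic example
        if prev_voiced and not next_voiced:
            w = w ** (1.0 / bias)         # w grows: following (unvoiced) model dominates
        elif next_voiced and not prev_voiced:
            w = w ** bias                 # w shrinks: preceding (unvoiced) model dominates
        weights.append(w)
    return weights


end_of_word = block_weights(3, prev_voiced=True, next_voiced=False)
```

When both frames are voiced the weights reduce to the linear case, which matches the behaviour reported as satisfactory above.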

The present invention finds a useful, but non-limiting, application in the processing of speech signals transmitted by telephony.

Claims

1. A method of synthesizing a speech signal represented by successive frames of samples, wherein: a) at least one erroneous frame (Te) is identified, together with at least one first valid frame (t-1) preceding the erroneous frame and a second valid frame (t+1) succeeding the erroneous frame, b) a succession of synthesized blocks (NBL) is formed as a function of the valid frames preceding (t-1) and succeeding (t+1) the erroneous frame, each block comprising a number of samples (TBL) calculated according to a fundamental period of the speech signal in at least one of the valid frames (P0, P1), and c) a synthesis frame (tr) is constructed using at least a portion of the samples of the blocks.

2. The method of claim 1, wherein the blocks formed are speech signal synthesis blocks.

3. Method according to one of claims 1 and 2, wherein: - values are obtained for first parameters (Mod(P0)) and second parameters (Mod(P1)) of a model applied to first and second portions (SEG(P0), SEG(P1)) of the first frame and of the second frame respectively, and - the samples of a block (S(n)) are formed by applying to this block said model with intermediate parameters (Mod[S(n)]) lying between the first and second parameters.

4. Method according to claim 3, in which each block of the succession is identifiable by a block index (n), and in which the intermediate parameters of a block are estimated as a function of the index of the block (n), the total number of blocks in the succession (NBL) and said first and second parameters (Mod(P0), Mod(P1)).

5. Method according to claim 4, wherein the intermediate parameters of a block are estimated by means of an interpolation between said first and second parameters.

6. Method according to one of claims 3 to 5, wherein said model comprises the assignment of spectral parameters (DCT) associated with a portion of the speech signal.

7. Method according to one of claims 3 to 6, wherein binary values of the samples of a block are obtained by means of an inverse spectral transform (IDCT) applied to the intermediate parameters of this block.

8. Method according to one of the preceding claims, wherein respective information on the degree of voicing (VO/NV) of the speech signal is obtained for the first frame and for the second frame.

9. The method of claim 8, wherein, if the speech signal is substantially unvoiced in the first and second frames, the synthesis frame substantially corresponds to one of the first and second valid frames (56).

10. Method according to one of the preceding claims, wherein the number of samples per block (TBL) is substantially constant in said succession of blocks, and wherein an estimate (P) of a fundamental period of the speech signal in the synthesis frame is calculated according to a fundamental period of the speech signal in at least one of the valid frames (P0, P1).

11. The method of claim 10, taken in combination with one of claims 8 and 9, wherein, if the speech signal is substantially voiced in only one of the first and second valid frames, the estimate of the period (P) in the synthesis frame substantially corresponds to a period value of the speech signal in the valid frame that is voiced (54; 57).

12. The method according to one of claims 8 to 11, taken in combination with one of claims 3 to 7, wherein, if the speech signal is substantially voiced in only one of the first and second valid frames, the first and second portions are of durations corresponding to said period of the speech signal in the voiced valid frame (54; 57).

13. The method of claim 10, taken in combination with one of claims 8, 9, 11 and 12, wherein, if the speech signal is substantially voiced in the first and second frames, with a first fundamental period (P0) in the first frame (t-1) and a second fundamental period (P1) in the second frame (t+1), the estimate of the period (P) in the synthesis frame corresponds substantially to a value selected from a set including the average of the first and second periods, the maximum of the first and second periods and the minimum of the first and second periods (53).

14. The method according to one of claims 8 to 13, taken in combination with one of claims 3 to 7, wherein the first and second portions are of respective durations corresponding to the first and second periods (53).

15. Method according to one of claims 10 to 14, wherein a number of blocks (NBL) to be formed is determined as the integer closest to the ratio of the number of samples to be provided in the synthesis frame (NTR) to the number of samples in the estimated period (P) of the synthesis frame (tr).

16. The method of claim 15, wherein the number of samples provided per block (TBL) corresponds to the integer closest to the ratio of the number of samples to be provided in the synthesis frame (NTR) to the number of blocks to be formed (NBL).

17. Method according to one of the preceding claims, wherein samples are added to or deleted from the blocks so as to obtain a total number of samples in the blocks corresponding substantially to the number of samples (NTR) to be provided in the synthesis frame.

18. The method of claim 17, taken in combination with one of claims 15 and 16, wherein a difference (AN) is estimated between the number of samples to be provided in the synthesis frame (NTR) and the number of blocks to be formed (NBL) multiplied by the number of samples in the estimated period (P) for the synthesis frame, and in which: - only a part of the samples of the blocks is kept to construct the synthesis frame if said difference (AN) is negative, or - all the samples of the blocks are kept, and additional samples are added to them, to construct the synthesis frame if the difference (AN) is positive.

19. Method according to one of claims 17 and 18, wherein the samples to be added or removed are selected substantially randomly.

20. The method according to one of claims 17 to 19, wherein each additional sample is assigned binary values corresponding to a weighting of respective binary values of samples preceding and succeeding the additional sample in the synthesis frame.

21. Method according to one of claims 17 to 20, wherein neighboring samples preceding and/or succeeding a deleted sample are assigned binary values corresponding to a weighting of the binary values of the neighboring samples before deletion and the binary values of the deleted sample.

22. Method according to one of the preceding claims, wherein at least one smoothing is performed between the respective binary values of the last sample of a block and the first sample of the next block.

23. Method according to one of the preceding claims, wherein smoothing is carried out between the respective binary values of at least the last sample of a valid frame (t-1) preceding the erroneous frame and at least the first sample of the synthesis frame (tr), and/or between the respective binary values of at least the last sample of the synthesis frame (tr) and at least the first sample of a valid frame (t+1) succeeding the erroneous frame.

24. Method according to one of claims 22 and 23, wherein smoothing with a variable forgetting factor is performed.

25. The method according to one of claims 22 to 24, wherein the smoothed binary values of a sample are calculated according to a weighting (a (i)) with the binary values of at least one preceding sample.

26. Device for synthesizing a speech signal represented by successive frames of samples, characterized in that it comprises means for implementing the method according to one of claims 1 to 25.

27. Device according to claim 26, characterized in that it further comprises connection means (SD) intended to be connected to a decoder (2) of an encoded speech signal data stream (FC'), and in that said connection means are adapted to receive a succession of decoded frames.

28. Device according to claim 26, characterized in that it further comprises means for receiving a stream of coded speech signal data (FC'), and means (COD) for at least partially decoding this stream.

29. Device according to one of claims 26 to 28, characterized in that it comprises a memory (10) capable of storing at least three frames, which makes it possible to synthesize at least two successive erroneous frames.