EP2951813B1

EP2951813B1 - Improved correction of frame loss when decoding a signal

Info

Publication number: EP2951813B1
Application number: EP14705848.1A
Authority: EP
Inventors: Julien Faure; Stéphane RAGOT
Original assignee: Orange SA
Current assignee: Orange SA
Priority date: 2013-01-31
Filing date: 2014-01-30
Publication date: 2016-12-07
Anticipated expiration: 2034-01-30
Also published as: US9613629B2; BR112015018102B1; FR3001593A1; JP2016511432A; MX350634B; JP6426626B2; CN105122356A; CA2899438C; US20150371647A1; CA2899438A1; CN105122356B; RU2015136540A; RU2652464C2; KR102398818B1; BR112015018102A2; EP2951813A1; KR20150113161A; MX2015009964A; WO2014118468A1

Description

The present invention relates to signal correction, in particular in a decoder, in the event of frame loss upon reception of the signal by this decoder.

The signal takes the form of a succession of samples divided into successive frames, and a "frame" is then understood to mean a signal segment composed of one or more samples (an embodiment in which a frame comprises a single sample is possible if the signal takes the form of a succession of samples, as for example in codecs according to ITU-T Recommendation G.711).

The invention lies in the field of digital signal processing, in particular but not exclusively in the field of coding/decoding of an audio signal. Frame losses occur when a communication (either by real-time transmission or by storage for later transmission) using an encoder and a decoder is disturbed by channel conditions (because of radio problems, access network congestion, etc.).

In this case, the decoder uses frame loss correction (or "concealment") mechanisms to try to replace the missing signal with a reconstructed signal, using the information available within the decoder (for example the already decoded signal or the parameters received in previous frames). This technique makes it possible to maintain a good quality of service despite degraded channel performance.

Frame loss correction techniques are most often highly dependent on the type of coding used.

In the case of coding a speech signal based on CELP ("Code Excited Linear Prediction") technologies, frame loss correction exploits the CELP model in particular. For example, in coding according to ITU-T Recommendation G.722.2, the solution for replacing a lost frame (or "packet") consists in continuing to use a long-term prediction gain while attenuating it, and in continuing to use each ISF ("Immittance Spectral Frequency") parameter while making it tend towards its respective mean. The pitch period of the speech signal (the parameter designated "LTP lag") is also repeated. In addition, random values are supplied to the decoder for the parameters characterizing the "innovation" (the excitation in CELP coding).

It should already be noted that applying this type of method to transform coding or to waveform coding of the "PCM" or "ADPCM" type requires a CELP-type parametric analysis of the past signal at the decoder, which introduces additional complexity.

In ITU-T Recommendation G.711, which corresponds to a waveform coder, an informative example of frame loss correction processing (given in Appendix I of the text of this recommendation) consists in finding a pitch period in the already decoded speech signal and repeating the last pitch period, with overlap-add between the already decoded signal and the repeated signal (reconstructed by concealment). This processing makes it possible to smooth out audio artifacts but requires an additional delay at the decoder (a delay corresponding to the duration of the overlap).

The most widely used technique for correcting frame loss in the case of transform coding consists in repeating the spectrum decoded in the last received frame. For example, in the case of coding according to ITU-T Recommendation G.722.1, the MLT ("modulated lapped transform"), which is equivalent to a modified discrete cosine transform (MDCT) with 50% overlap and sinusoidal analysis/synthesis windows, ensures a transition (between the last lost frame and the repeated frame) that is slow enough to smooth out the artifacts caused by simply repeating the spectrum; typically, if more than one frame is lost, the repeated spectrum is set to zero.

Advantageously, this concealment method requires no additional delay, since it exploits the overlap-add between the reconstructed signal and the past signal to produce a kind of "crossfade" (with time-domain aliasing due to the MLT transform). It is a very inexpensive technique in terms of resources.

However, it has a drawback related to the temporal inconsistency between the signal just before the frame loss and the repeated signal. This results in a phase discontinuity (or inconsistency), which can produce significant audio artifacts if the overlap duration between the signals associated with two frames is short (as is the case in particular when so-called "low-delay" MDCT windows are used). This short-overlap situation is illustrated in Figure 1B for a low-delay MLT transform, in comparison with the usual situation of Figure 1A, in which long sine windows are used according to Recommendation G.722.1 (which then offers a long overlap duration ZRA, with very gradual modulation). It appears that modulation by a low-delay window produces a phase shift that is audible because of the short overlap zone ZRB, as shown in Figure 1B.

In this case, even if a solution combining a pitch search (as in decoding according to Recommendation G.711 Appendix I) and an overlap-add produced by the window of an MDCT transform were implemented, it would not be sufficient to remove the audio artifacts related in particular to the phase shift between different frequency components.

Another example of a solution to the problem of correcting frame loss when decoding an audio signal is proposed in "Frame Erasure Concealment Using Sinusoidal Analysis-Synthesis and Its Application to MDCT-Based Codecs" by Parikh et al., in Acoustics, Speech, and Signal Processing, 2000, ICASSP '00.

The present invention improves the situation.

To this end, it proposes a method of processing a signal comprising a succession of samples distributed in successive frames, the method being implemented during decoding of said signal in order to replace at least one signal frame lost at decoding. In particular, the method comprises the steps of:

a) searching, in a valid signal available at decoding, for a signal segment whose duration corresponds to a period determined as a function of said valid signal,
b) spectral analysis of the segment, to determine spectral components of the segment,
c) synthesis of at least one frame replacing the lost frame, by constructing a synthesis signal from at least some of the spectral components.

Here, "frame" is understood to mean a block of at least one sample. In most codecs, these frames consist of several samples. However, in codecs of the PCM ("Pulse Code Modulation") type in particular, for example according to Recommendation G.711, the signal simply consists of a succession of samples (a "frame" within the meaning of the invention then comprising only a single sample). The invention can then also be applied to this type of codec.

For example, the valid signal may consist of the last valid frames received before the frame loss. Optionally, one or a few subsequent valid frames, received after the lost frame, may also be used (although such an embodiment introduces a decoding delay). The samples of the valid signal that are used may be those of the frames directly, and possibly those corresponding to the memory of the transform, which typically contain aliasing in the case of decoding by a lapped transform of the MLT or MDCT type.

The invention thus provides an advantageous solution for correcting the loss of one or more frames, in particular where additional delay at the decoder is ruled out, for example when using a transform decoder whose windows do not allow a sufficiently large overlap between the substitution signal and the signal resulting from the temporal unfolding (the typical case of low-delay windows for an MDCT or an MLT, as shown in Figure 1B). The invention offers a particular advantage for an overlap, owing to the use of the spectral components of the last valid frames received to construct a synthesis signal that carries the spectral coloration of those last valid frames. Nevertheless, the invention of course applies to any type of coding/decoding (transform, CELP, PCM, or other).

In one embodiment, the method comprises a search, by correlation in the valid signal, for a repetition period, the duration of the aforementioned segment then comprising at least one repetition period.

Such a "repetition period" corresponds, for example, to a pitch period in the case of a voiced speech signal (the inverse of the fundamental frequency of the signal). However, the signal may also come from a music signal, for example, having an overall tonality with which a fundamental frequency is associated, as well as a fundamental period that could correspond to the aforementioned repetition period.

For example, a search for a repetition period related to the tonality of the signal may be used. For instance, a first buffer can be formed from the last few validly received samples, and a correlation search can be carried out in a second, larger buffer for the samples of the second buffer whose sequence best matches that of the first buffer. The time offset between these identified samples of the second buffer and those of the first buffer may constitute one repetition period or a multiple of this period (depending on the resolution of the correlation search). Note that taking a multiple of the repetition period does not degrade the implementation of the invention because, in this case, the spectral analysis is simply carried out over a length covering several periods instead of one, which helps to increase the resolution of the analysis.
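The two-buffer correlation search described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of a normalized cross-correlation, and the search bounds `min_lag` and `max_lag` are all assumptions introduced here.

```python
import math

def normalized_correlation(a, b):
    """Normalized cross-correlation of two equal-length sequences, in [-1, 1]."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0

def find_repetition_lag(history, target_len, min_lag, max_lag):
    """Return (lag, correlation) of the best match for the most recent samples.

    history    -- valid decoded samples, oldest first (plays the role of the larger buffer)
    target_len -- size of the first buffer (the last few valid samples)
    min_lag, max_lag -- hypothetical search bounds, in samples
    """
    target = history[-target_len:]
    best_lag, best_corr = None, -1.0
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - target_len - lag
        if start < 0:
            break  # not enough past signal for this lag
        candidate = history[start:start + target_len]
        corr = normalized_correlation(target, candidate)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr

# Usage: a sinusoid with a 40-sample period.
signal = [math.sin(2 * math.pi * n / 40) for n in range(400)]
lag, corr = find_repetition_lag(signal, target_len=64, min_lag=20, max_lag=120)
```

On this perfectly periodic test signal, lags of 40, 80 and 120 samples all correlate almost perfectly, so the returned lag may be the period or one of its multiples, which the text notes is acceptable.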

Thus, the signal duration over which the spectral analysis is performed can be determined as being:

a duration corresponding to one repetition period (if a tonality of the signal is clearly identifiable),
a duration corresponding to several repetition periods (pitch cycles, for example), if the correlation gives a first correlation result greater than a predetermined threshold, as explained in an optional embodiment below,
an arbitrary signal duration (for example a few tens of samples), if no such tonality is identifiable (signal consisting essentially of noise).

In a particular embodiment, the aforementioned repetition period corresponds to a duration for which the correlation exceeds a predetermined threshold value. Thus, in this embodiment, the duration of the signal is identified as soon as the correlation exceeds a predetermined threshold value for that duration. The duration thus identified corresponds to one or more periods associated with a frequency of the aforementioned overall tonality. Such an embodiment advantageously limits the complexity of the correlation search (for example, by setting a correlation threshold at 60 or 70%), even if in reality not one but several pitch periods are detected (for example between two and five pitch periods). On the one hand, the complexity of the correlation search is then lower. On the other hand, the spectral analysis over several periods has finer resolution, and the spectral components obtained are analyzed more finely.
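A hedged sketch of this threshold-based variant is given below: the search stops at the first lag whose correlation exceeds the threshold instead of scanning every lag for the maximum. The function name, the 0.7 default, and the choice to scan from the longest lag downwards (which tends to return several periods) are assumptions made for illustration; the text above does not fix the scan direction.

```python
import math

def first_lag_above_threshold(history, target_len, min_lag, max_lag, threshold=0.7):
    """Return (lag, correlation) for the first lag exceeding threshold, else None."""
    target = history[-target_len:]
    target_energy = sum(x * x for x in target)
    # Assumption: scan from the longest lag to the shortest.
    for lag in range(max_lag, min_lag - 1, -1):
        start = len(history) - target_len - lag
        if start < 0:
            continue  # not enough past signal for this lag
        candidate = history[start:start + target_len]
        den = math.sqrt(target_energy * sum(y * y for y in candidate))
        corr = sum(x * y for x, y in zip(target, candidate)) / den if den > 0.0 else 0.0
        if corr > threshold:
            return lag, corr  # early exit: no exhaustive search
    return None  # no identifiable tonality (e.g. noise-like signal)

# Usage: a harmonic signal with a 40-sample period.
def sample(n):
    return (math.sin(2 * math.pi * n / 40)
            + 0.5 * math.sin(4 * math.pi * n / 40)
            + 0.3 * math.sin(6 * math.pi * n / 40))

signal = [sample(n) for n in range(400)]
result = first_lag_above_threshold(signal, target_len=64, min_lag=20, max_lag=130)
```

When `None` is returned, a caller could fall back to an arbitrary analysis duration, in line with the noise case listed earlier.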

As regards obtaining the spectral components by analysis of the segment (for example by fast Fourier transform, or "FFT"), the method further comprises determining the respective phases associated with these spectral components, and the construction of the synthesis signal then includes the phases of the spectral components. The construction of the signal thus incorporates these phases, as will be seen below, in order to optimize the joining of the synthesis signal to the last valid frames and, in most natural cases, to the subsequent valid frames.

Also in a particular embodiment, the method further comprises determining respective amplitudes associated with the spectral components, and the construction of the synthesis signal includes these amplitudes of the spectral components (so that they are taken into account in the construction of the synthesis signal).
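To illustrate how amplitudes and phases might enter the construction of the synthesis signal, the sketch below analyzes a segment of P samples with a discrete Fourier transform and then evaluates a sum of sinusoids at sample indices beyond the segment, so that each sinusoid continues in phase. It is an illustration under assumptions: a naive DFT stands in for an FFT, the helper names are hypothetical, and the Nyquist bin is ignored for brevity.

```python
import cmath
import math

def dft(segment):
    """Naive O(P^2) DFT of a real segment; an FFT would be used in practice."""
    P = len(segment)
    return [sum(x * cmath.exp(-2j * math.pi * k * n / P)
                for n, x in enumerate(segment))
            for k in range(P)]

def synthesize(spectrum, bins, start, length):
    """Sum of sinusoids built from the amplitude and phase of the chosen bins.

    bins  -- bin indices k with 0 < k < P/2 (Nyquist bin omitted in this sketch)
    start -- first sample index, counted from the start of the analyzed segment
    """
    P = len(spectrum)
    dc = spectrum[0].real / P
    out = []
    for n in range(start, start + length):
        s = dc
        for k in bins:
            amplitude = 2.0 * abs(spectrum[k]) / P
            phase = cmath.phase(spectrum[k])
            s += amplitude * math.cos(2.0 * math.pi * k * n / P + phase)
        out.append(s)
    return out

# Usage: the segment holds exactly one period (P = 40) of a two-component signal.
P = 40

def tone(n):
    return (math.cos(2 * math.pi * n / P + 0.3)
            + 0.5 * math.cos(2 * math.pi * 3 * n / P - 1.0))

segment = [tone(n) for n in range(P)]
spectrum = dft(segment)
# Starting at n = P makes the synthesized samples follow the segment without a phase jump.
replacement = synthesize(spectrum, bins=[1, 3], start=P, length=2 * P)
```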

In a particular embodiment, components resulting from the analysis can be selected for the construction of the synthesis signal. For example, in an embodiment in which the method comprises determining respective amplitudes associated with the spectral components, the spectral components with the highest amplitudes may be those selected for the construction of the synthesis signal. In addition or as a variant, those whose amplitude forms a peak in the frequency spectrum may also be selected.
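One possible way to combine the two selection criteria (local peaks, then highest amplitudes) is sketched below. The function name and the `max_components` parameter are hypothetical, and the exact peak definition is an assumption.

```python
def select_peaks(magnitudes, max_components):
    """Return indices of the largest local maxima of a magnitude spectrum.

    A bin counts as a peak when it is strictly above its left neighbour and
    at least as large as its right neighbour (an illustrative definition).
    """
    peaks = [k for k in range(1, len(magnitudes) - 1)
             if magnitudes[k] > magnitudes[k - 1] and magnitudes[k] >= magnitudes[k + 1]]
    # Keep only the strongest peaks, then restore ascending frequency order.
    peaks.sort(key=lambda k: magnitudes[k], reverse=True)
    return sorted(peaks[:max_components])

# Usage: three peaks exist (bins 2, 5 and 7); the two strongest are kept.
magnitudes = [0.1, 0.5, 2.0, 0.4, 0.3, 1.0, 0.2, 0.6, 0.1]
selected = select_peaks(magnitudes, max_components=2)
```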

Where only some of the spectral components are selected, in a particular embodiment, noise is added to the synthesis signal to compensate for the energy loss due to the spectral components not selected for the construction of the synthesis signal.

In one embodiment, the aforementioned noise is obtained from a (temporally) weighted residual between the signal of the segment and the synthesis signal. It may, for example, be weighted by overlap windows, as in the context of coding/decoding by lapped transform.

The spectral analysis of the segment comprises a sinusoidal analysis by fast Fourier transform (FFT), preferably of length 2^k, where k is greater than or equal to log₂(P), P being the number of samples in the signal segment. Such an embodiment reduces the complexity of the processing, as detailed below. Note that other transforms are possible, for example a transform of the Modulated Complex Lapped Transform (MCLT) type as a possible alternative to the FFT.

In particular, the spectral analysis step may provide for:

an interpolation of the samples of the segment to obtain a second segment comprising 2^ceil(log₂(P)) samples, where ceil(x) is the smallest integer greater than or equal to x,
a calculation of the Fourier transform of the second segment; and
after determination of the spectral components, identification of the frequencies associated with the components, and construction of the synthesis signal by resampling, with modification of said frequencies as a function of the resampling.
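The length computation and the interpolation step can be sketched as follows. This is only an illustration: linear interpolation, the periodic wrap-around at the end of the segment, and the helper names are assumptions, and the frequency mapping shown is one reading of "modification of said frequencies as a function of the resampling".

```python
def fft_length(P):
    """Smallest power of two >= P, i.e. 2^ceil(log2(P)), for P >= 1."""
    return 1 << (P - 1).bit_length()

def resample_linear(segment, new_len):
    """Stretch a segment of P samples to new_len samples by linear interpolation.

    Assumption: the segment is treated as one period, so the sample after the
    last one wraps around to the first.
    """
    P = len(segment)
    out = []
    for i in range(new_len):
        pos = i * P / new_len          # position on the original sample grid
        i0 = int(pos)
        frac = pos - i0
        i1 = (i0 + 1) % P
        out.append((1.0 - frac) * segment[i0] + frac * segment[i1])
    return out

def bin_frequency_hz(k, P, fs):
    """Frequency of bin k on the original time scale.

    Bin k of the transform of the stretched segment spans k cycles over the
    original P samples, hence k * fs / P rather than k * fs / fft_length(P).
    """
    return k * fs / P

# Usage: a 40-sample segment is stretched to 64 samples before the FFT.
segment = [float(n % 10) for n in range(40)]
second_segment = resample_linear(segment, fft_length(len(segment)))
```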

The present invention finds an advantageous, but in no way limiting, application in the context of decoding by lapped transform. In such a context, it may be advantageous for the synthesis signal to be constructed (repeated) over a duration of at least two frames, so as also to cover the parts containing time-domain aliasing beyond a single frame.

In a particular embodiment, the synthesis signal may be constructed over two frame durations plus an additional duration corresponding to a delay introduced by a resampling filter (in particular in the embodiment set out above in which resampling is provided).

It may be advantageous to manage a jitter buffer in some embodiments. Where frame loss correction is carried out jointly with the management of a jitter buffer, the invention can then be applied under these conditions by adapting the duration of the synthesis signal.

In one embodiment, the method further comprises separating the signal from the valid frame(s) into a high-frequency band and a low-frequency band, and the spectral components are selected in the low-frequency band. Such an embodiment makes it possible to confine the complexity of the processing essentially to the low-frequency band, since the high frequencies contribute little spectral richness to the synthesis signal and can be repeated in a simpler way.

In this embodiment, the replacement frame can be synthesized by adding:

a first signal constructed from spectral components selected in the low frequency band, and
a second signal resulting from filtering in the high frequency band,

the second signal being obtained by successive duplication of at least one valid half-frame and of its time-reversed version.

The present invention also relates to a computer program comprising instructions for implementing the method (for which, by way of example, a general flowchart may be the general diagram of Figure 2, and possibly the specific flowcharts of Figures 5 and/or 8 in certain embodiments).

The present invention also relates to a device for decoding a signal comprising a succession of samples distributed in successive frames, the device comprising means for replacing at least one lost signal frame, comprising:

a) means for searching, in a valid signal available at decoding, for a signal segment whose duration corresponds to a period determined as a function of said valid signal,
b) means for spectral analysis of the segment, to determine spectral components of the segment,
c) means for synthesizing at least one replacement frame for the lost frame, by constructing a synthesis signal from at least some of the spectral components.

Such a device can take the hardware form of, for example, a processor and possibly a working memory, typically in a communication terminal.

Other advantages and characteristics of the invention will become apparent on reading the following detailed description of exemplary embodiments of the invention and on examining the drawings, in which:

Figure 1A illustrates an overlap with conventional windows in the context of an MLT transform,
Figure 1B illustrates an overlap with low-delay windows, in comparison with the representation of Figure 1A,
Figure 2 shows an example of general processing within the meaning of the invention,
Figure 3 illustrates the determination of a signal segment corresponding to a fundamental period,
Figure 4 illustrates the determination of a signal segment corresponding to a fundamental period, with, in this exemplary embodiment, an offset of the correlation search,
Figure 5 shows an embodiment of a spectral analysis of the signal segment,
Figure 6 illustrates an exemplary embodiment for copying, in the high frequencies, a valid frame to replace several lost frames,
Figure 7 illustrates the reconstruction of the signal of the lost frames, with the weighting by the synthesis windows,
Figure 8 illustrates an example of application of the method within the meaning of the present invention to the decoding of a signal,
Figure 9 schematically shows a device comprising means for implementing the method within the meaning of the invention.

Processing within the meaning of the invention is illustrated in Figure 2. It is implemented at a decoder. The decoder can be of any type, the processing being on the whole independent of the nature of the coding/decoding. In the example described, the processing applies to a received audio signal. It can, however, apply more generally to any type of signal analyzed by time windowing and transformation, where continuity has to be ensured with one or more replacement frames during an overlap-add synthesis.

During a first step S1 of the processing of Figure 2, N audio samples are stored successively in a buffer memory (for example of FIFO type). The audio buffer b(n) can thus hold, for example, 47 ms of signal, that is 47/20 = 2.35 audio frames of 20 ms each, at a given sampling frequency Fe, for example Fe = 32 kHz. These samples are samples that have already been decoded and are therefore accessible when the frame loss correction is performed. If the first sample to be synthesized is the sample of time index N (of one or more consecutive lost frames), the audio buffer b(n) then corresponds to the N preceding samples, of time indices 0 to N-1. In the case of a transform coder, the audio buffer corresponds to the samples already decoded in the past frame (which therefore cannot be modified). If an additional delay can be added at the decoder (for example of D samples), the buffer may contain only part of the samples available at the decoder, leaving for example the last D samples for the overlap-add (of step S10 of Figure 2).
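As a quick check of the example figures for step S1, the following minimal sketch converts the durations into sample counts. The helper name `ms_to_samples` and the variable names are illustrative assumptions, not taken from the patent.

```python
def ms_to_samples(duration_ms: float, fe_hz: int) -> int:
    """Number of samples covering duration_ms at sampling frequency fe_hz."""
    return round(duration_ms * fe_hz / 1000)

FE_HZ = 32_000                     # example sampling frequency Fe from the text
N = ms_to_samples(47, FE_HZ)       # audio buffer b(n) holding 47 ms of signal
FRAME = ms_to_samples(20, FE_HZ)   # one 20 ms audio frame

print(N, FRAME, N / FRAME)         # buffer size, frame size, frames per buffer
```

With these example values the buffer spans a non-integer number of frames, which is why the text speaks of 2.35 frames rather than a whole number.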

In the filtering step S2, the audio buffer b(n) is then separated into two frequency bands, a low frequency band BB and a high frequency band BH, with a separation frequency denoted Fc below, for example Fc = 4 kHz. This filtering is preferably a delay-free filtering. With this frequency Fc, the size of the audio buffer defined above now preferably becomes N' = N·Fc/Fe.

Step S3, applied to the low frequency band, then consists in searching for a loop point and a segment P corresponding to the fundamental period (or "pitch" period) within the buffer b(n) resampled at the frequency Fc. To this end, in one exemplary embodiment, a normalized correlation corr(n) is computed between:

a target segment of the buffer (reference CIB in Figure 3), of size Ns and lying between samples N'-Ns and N'-1 (with a duration of, for example, 6 ms), and
a sliding segment of size Ns that starts at a sample located between sample 0 and sample Nc (with Nc>N'-Ns; Nc corresponding, for example, to a duration of 35 ms),

with:

$corr(n) = \frac{\sum_{k=0}^{Ns} b(n+k)\, b(N'-Ns+k)}{\sqrt{\sum_{k=0}^{Ns} b(n+k)^{2}}\, \sqrt{\sum_{k=0}^{Ns} b(N'-Ns+k)^{2}}}, \quad n \in [0, Nc]$

With reference to Figure 3, if the correlation maximum is found for the sample of time index n=mc, the loop point with one pitch period, of index n=pb, corresponds to sample mc+Ns, and the segment denoted p(n) that follows in Figure 3 corresponds to a pitch period of size P=N'-Ns-mc, defined between samples n=pb and n=N'-1.
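The search of step S3 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the sums run over exactly Ns samples (k = 0 to Ns-1) so that the target segment stays inside the buffer, and a zero-energy segment is given a correlation of 0, all of which are assumptions.

```python
import math

def normalized_correlation(b, n, n_prime, ns):
    """corr(n) between the sliding segment starting at n and the target
    segment starting at n_prime - ns, each of ns samples (assumption)."""
    t0 = n_prime - ns
    num = sum(b[n + k] * b[t0 + k] for k in range(ns))
    den = (math.sqrt(sum(b[n + k] ** 2 for k in range(ns)))
           * math.sqrt(sum(b[t0 + k] ** 2 for k in range(ns))))
    return num / den if den > 0 else 0.0

def find_pitch_segment(b, ns, nc):
    """Return (mc, pb, P, p) where mc maximizes corr(n) over n in [0, nc],
    pb = mc + ns is the loop point and p = b[pb:] has size P = N' - ns - mc."""
    n_prime = len(b)
    mc = max(range(nc + 1),
             key=lambda n: normalized_correlation(b, n, n_prime, ns))
    pb = mc + ns
    period = n_prime - ns - mc
    return mc, pb, period, b[pb:]

# Toy buffer: a sinusoid with a period of 40 samples.
buffer_lb = [math.sin(2 * math.pi * n / 40) for n in range(100)]
mc, pb, period, segment = find_pitch_segment(buffer_lb, ns=24, nc=60)
print(mc, pb, period, len(segment))
```

On this toy buffer the only sliding position that matches the target exactly lies one period before it, so the extracted segment spans one period of the sinusoid.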

The sliding (search) segment precedes the target segment, as shown in Figure 3. In particular, the first sample of the target segment corresponds to the last sample of the search segment. If the maximum correlation with the target segment CIB is found earlier in the search segment at a point of index mc, then at least one pitch period (with the same sinusoid intensity, for example) elapses between the point of time index mc and the sample of time index mc+P. In the same way, at least one pitch period elapses between the sample of index mc+Ns (the loop point, of index pb) and the last sample of the buffer N'.

It should be noted that a variant of this embodiment consists of an autocorrelation over the buffer, which amounts to finding an average period P identified in the buffer. In this case, the segment used for the synthesis comprises the last P samples of the buffer. However, an autocorrelation computed over a long segment can be complex and require more computing resources than a simple correlation of the type described above.

Moreover, another variant of this embodiment consists in not necessarily searching for the correlation maximum over the entire search segment, but simply in searching for a segment where the correlation with the target segment exceeds a chosen threshold (for example 70%). Such an embodiment does not yield exactly one pitch period P (but possibly several successive periods); nevertheless, the complexity of processing a long synthesis segment (of several pitch periods) requires as many resources as, or even fewer than, searching for a correlation maximum over the entire search segment.

In what follows, it is assumed that a single pitch period P is used for the synthesis of the signal, but it should be noted that the principle of the processing applies equally well to a segment extending over several fundamental periods. The results even prove better with several pitch periods, in terms of the accuracy of the FFT transform and the richness of the spectral components obtained.

Where transients are present in the audio signal contained in the buffer (intensity peaks of very short duration in the audio signal), the correlation search zone can be adapted, for example by offsetting the correlation search (typically by starting it 20 ms after the start of the audio buffer, as illustrated by way of example in Figure 4, or by performing the correlation search in a time zone beginning after the end of a transient).

The next step S4 consists in decomposing the segment p(n) into a sum of sinusoids. A conventional way of decomposing a signal into a sum of sinusoids is to compute the discrete Fourier transform (DFT) of the signal over a duration corresponding to the length of the signal. This yields the frequency, phase and amplitude of each of the sinusoidal components that make up the signal. In a particular embodiment of the invention, in order to reduce complexity, this analysis is performed by a fast Fourier transform (FFT) of size 2^k (with k greater than or equal to log₂(P)).

In this particular embodiment, step S4 is broken down into three operations, with reference to Figure 5:

operation S41, in which the samples of the segment p(n) are interpolated so as to obtain a segment p'(n) composed of P' samples with $P' = 2^{ceil(\log_2(P))} \geq P$, where ceil(x) is the smallest integer greater than or equal to x (for example, and without limitation, a linear interpolation or a "cubic spline" interpolation can be used);
operation S42, with the computation of the FFT transform of p'(n): Π(k) = FFT(p'(n)); and
operation S43, in which the phases ϕ(k) and amplitudes A(k) of the sinusoidal components are obtained directly from the FFT transform, the normalized frequencies between 0 and 1 being given by: $f(k) = \frac{2 k P'}{P^{2}}, \quad k \in [0; \frac{P'}{2} - 1]$
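Operations S41 to S43 can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, the interpolation is linear and maps the first and last samples of p(n) onto those of p'(n), the recursive radix-2 FFT stands in for any FFT routine, and A(k) is taken as the raw magnitude |Π(k)| without rescaling.

```python
import cmath
import math

def next_pow2(p):
    """Smallest power of two >= p, i.e. 2**ceil(log2(p)) for p >= 1."""
    return 1 << (p - 1).bit_length()

def interpolate_linear(p, new_len):
    """S41: resample p (at least 2 samples) to new_len samples."""
    old_len = len(p)
    out = []
    for i in range(new_len):
        x = i * (old_len - 1) / (new_len - 1)   # position in the old grid
        j = min(int(x), old_len - 2)
        frac = x - j
        out.append((1 - frac) * p[j] + frac * p[j + 1])
    return out

def fft(x):
    """S42: recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)]
            + [even[k] - tw[k] for k in range(n // 2)])

def analyse_segment(p):
    """S43: amplitudes and phases of the first P'/2 bins of FFT(p')."""
    p_prime = next_pow2(len(p))
    spectrum = fft(interpolate_linear(p, p_prime))
    half = p_prime // 2
    amps = [abs(spectrum[k]) for k in range(half)]
    phases = [cmath.phase(spectrum[k]) for k in range(half)]
    return amps, phases

# Toy segment: 3 cycles of a cosine over 32 samples (already a power of two).
toy = [math.cos(2 * math.pi * 3 * n / 32) for n in range(32)]
amps, phases = analyse_segment(toy)
print(max(range(len(amps)), key=amps.__getitem__))   # index of the strongest bin
```

Padding by interpolation rather than by zeros keeps the analysed segment periodic over the FFT length, which is presumably why the text interpolates to P' samples.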

In step S5 of Figure 2, the sinusoidal components are selected so as to keep only the most significant ones. In a particular embodiment, the selection of the components amounts to:

first selecting the amplitudes A(k) for which A(k)>A(k-1) and A(k)>A(k+1), with $k \in [0; \frac{P'}{2} - 1]$,
then, among the amplitudes of this first selection, selecting the components, for example in descending order of amplitude, such that the cumulative amplitude of the selected peaks is at least x% (for example x=70%) of the cumulative amplitude of the half-spectrum.

In addition, the number of components can be limited (for example to 20) so as to make the synthesis less complex. Alternatively, a search for a predetermined number of the largest peaks can be used.
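The selection of step S5, including the optional cap on the number of components, can be sketched as follows. The function name and parameter names are hypothetical, and restricting the peak test to interior bins (so that both neighbours exist) is an assumption, since the text does not say how the first and last bins are handled.

```python
def select_components(amps, fraction=0.70, max_components=20):
    """Return the sorted bin indices of the selected spectral peaks."""
    # First pass: local maxima, A(k) > A(k-1) and A(k) > A(k+1).
    peaks = [k for k in range(1, len(amps) - 1)
             if amps[k] > amps[k - 1] and amps[k] > amps[k + 1]]
    # Second pass: take peaks in descending order of amplitude until they
    # account for `fraction` of the cumulative half-spectrum amplitude.
    peaks.sort(key=lambda k: amps[k], reverse=True)
    target = fraction * sum(amps)
    selected, cumulative = [], 0.0
    for k in peaks:
        if cumulative >= target or len(selected) >= max_components:
            break
        selected.append(k)
        cumulative += amps[k]
    return sorted(selected)

half_spectrum = [0.0, 1.0, 0.0, 5.0, 0.0, 2.0, 0.0, 0.5, 0.0]
print(select_components(half_spectrum))
```

If the peaks alone cannot reach the target fraction, this sketch simply returns all of them, up to the cap.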

Of course, the method of selecting the spectral components is not limited to the examples presented above. Variants are possible. It can in particular be based on any criterion that identifies spectral components useful for the synthesis of the signal (for example subjective criteria related to masking, criteria related to the harmonicity of the signal, or others).

The next step S6 is a sinusoidal synthesis. In one exemplary embodiment, it consists in generating a segment s(n) of length at least equal to the size of a lost frame (T). In a particular embodiment, a length equal to 2 frames (for example 40 ms) is generated so that a cross-fade (as a transition) can be performed between the synthesized signal (from the frame loss correction) and the signal decoded in the next valid frame, once such a frame is again correctly received.

To anticipate the resampling of the frame (the length of the resampling filter, in samples, being denoted LF), the number of samples to be synthesized can be increased by half the size of the resampling filter (LF). The synthesis signal s(n) is computed as a sum of the selected sinusoidal components:

$s(n) = \sum_{k=0}^{K} A(k) \sin(\pi f(k)\, n + \phi(k)), \quad n \in [0; 2T + \frac{LF}{2}]$

where k is the index of the K components selected in step S5. Several conventional methods can be used to carry out this sinusoidal synthesis.
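The sum above can be evaluated directly, as in the following minimal sketch. The function name, the `(A, f, phi)` tuple layout and the numeric values of T and LF are illustrative assumptions; the sketch generates 2T + LF/2 samples starting at n = 0.

```python
import math

def synthesize(components, length):
    """s(n) = sum of A * sin(pi * f * n + phi) over the selected components.
    components: iterable of (A, f, phi), with f normalized between 0 and 1."""
    return [sum(a * math.sin(math.pi * f * n + phi) for a, f, phi in components)
            for n in range(length)]

T = 80     # hypothetical frame length, in samples
LF = 16    # hypothetical resampling filter length, in samples
s = synthesize([(1.0, 0.05, 0.0), (0.5, 0.10, math.pi / 2)], 2 * T + LF // 2)
print(len(s))
```

Direct evaluation costs one sine per component and per sample; a recursive oscillator or an inverse FFT are among the conventional alternatives the text alludes to.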

Step S7 of Figure 2 consists in injecting noise so as to compensate for the loss of energy due to the omission of certain frequency components in the low frequency band. A particular embodiment consists in computing the residual r(n)=p(n)-s(n) between the segment corresponding to the pitch p(n) and the synthesized signal s(n), with n ∈ [0; P - 1].

This residual of size P is repeated until it reaches a size $2T + \frac{LF}{2}$.

The signal s(n) is then mixed (added, possibly with a weighting) with the signal r(n).
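The residual computation, its periodic repetition and the mixing can be sketched together as follows. The function name and the scalar `weight` are assumptions (the text only mentions a possible weighting), and the sketch assumes s(n) is at least as long as p(n).

```python
def add_residual_noise(p, s, weight=1.0):
    """Return s(n) + weight * r(n), where r(n) = p(n) - s(n) on [0, P-1]
    is repeated periodically up to the length of s."""
    period = len(p)
    residual = [p[n] - s[n] for n in range(period)]
    return [s[n] + weight * residual[n % period] for n in range(len(s))]

pitch_segment = [1.0, 2.0, 3.0]
synth = [0.5, 1.5, 2.5, 0.5, 1.5, 2.5, 0.5]
print(add_residual_noise(pitch_segment, synth))
```

With a weight of 1 the first P output samples reproduce p(n) exactly, which shows that the residual carries precisely what the selected sinusoids left out.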

Of course, the noise generation method (to obtain a natural background noise) is not limited to the example above, and variants are possible. For example, the residual can also be computed in the frequency domain (by removing the selected spectral components from the original spectrum), and a background noise can be obtained by inverse transform.

In parallel, step S8 consists in processing the high frequency band simply by repeating the signal. For example, this may involve repeating a frame length T. In a more sophisticated embodiment, the synthesis of the band BH is obtained by taking the last T' samples before the frame loss (with, for example, T'=N/2), time-reversing them, then repeating them without reversal, and so on, as illustrated in Figure 6. Such an embodiment advantageously avoids audible artifacts by matching the intensity levels at the start and end of the frames.
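The alternating repetition described for the band BH can be sketched as follows. The function and parameter names are hypothetical, and the weighting W of Figure 6 is not included.

```python
def repeat_high_band(bh, t_prime, length):
    """Fill `length` samples by alternating the time-reversed and the
    forward copy of the last t_prime samples of the high-band signal bh."""
    block = bh[-t_prime:]
    out = []
    reverse = True                     # the first copy is time-reversed
    while len(out) < length:
        out.extend(block[::-1] if reverse else block)
        reverse = not reverse
    return out[:length]

print(repeat_high_band([0, 1, 2, 3, 4, 5], t_prime=3, length=8))
# -> [5, 4, 3, 3, 4, 5, 5, 4]
```

Each junction joins two identical sample values (5 with 5, 3 with 3), which is how the mirroring keeps the levels matched at the boundaries.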

In a particular embodiment, the frame of size T' can be weighted so as to avoid certain artifacts when the content is particularly energetic in the high frequency band. The weighting (denoted W in Figure 6) can, for example, take the form of a sinusoidal half-window of 1 ms at the start and at the end of the frame of size T/2. Successive frames can also overlap.

In a step S9, the signal is synthesized by resampling the low frequency band at its original frequency Fc and adding it to the signal obtained from the repetition of step S8 in the high frequency band.

In step S10, an overlap-add is carried out to ensure continuity between the signal before the frame loss and the synthesized signal. For example, in the case of low-delay transform coding, this step S10 uses the L samples located between the start of the "aliased" part (the remaining folded part) of the MDCT transform and three quarters of the window size (with, for example, a time-folding axis of the windows as is usual for an MDCT transform). With reference to Figure 7, these samples are already weighted by the synthesis window W1 of the MDCT transform. So that an overlap window W2 can be applied to them, the samples are divided by the window W1 (which is already known to the decoder), then multiplied by the window W2. The signal S(n) synthesized by implementing steps S1 to S9 described above is thus expressed as:

$S(n) = L(n) \frac{W_3(n)}{W_1(n)} + S(n)\, W_2(n), \quad n \in [0, L - 1]$

with, for example and without limitation, overlap functions defined by:

$W_2(n) = \sin\left(\frac{\pi (n + 0.5)}{2L}\right)^{2}$ and $W_3(n) = 1 - W_2(n), \quad n \in [0; L - 1]$
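The overlap-add of step S10 with the example windows W2 and W3 can be sketched as follows. The function names are hypothetical, and the sketch assumes that the first term of the formula refers to the L already-decoded samples still weighted by W1, with W1(n) non-zero over the overlap.

```python
import math

def crossfade_windows(length):
    """W2(n) = sin^2(pi (n + 0.5) / (2 L)) and W3(n) = 1 - W2(n)."""
    w2 = [math.sin(math.pi * (n + 0.5) / (2 * length)) ** 2
          for n in range(length)]
    w3 = [1.0 - w for w in w2]
    return w2, w3

def overlap_add(decoded, w1, synthesized):
    """decoded: L samples still weighted by the synthesis window w1;
    synthesized: the first L samples of the concealment signal."""
    length = len(decoded)
    w2, w3 = crossfade_windows(length)
    return [decoded[n] * w3[n] / w1[n] + synthesized[n] * w2[n]
            for n in range(length)]

# A constant signal of value 2.0 weighted by w1 = 0.5 should stay at 2.0.
out = overlap_add([1.0] * 4, [0.5] * 4, [2.0] * 4)
print(out)
```

Because W2 and W3 sum to 1 at every sample, the cross-fade leaves a signal unchanged wherever the decoded and synthesized parts already agree.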

As described above, if a delay is allowed at the decoder, this delay can be used to perform an overlap with the synthesized part, using any weighting suitable for the overlap-add.

Of course, the present invention is not limited to the embodiment described above; it extends to other variants.

Thus, for example, the separation into high and low frequency bands in step S2 is optional. In a variant embodiment, the signal from the buffer (step S1) is not separated into two sub-bands, and steps S3 to S10 remain identical to those described above. Nevertheless, processing the spectral components in the low frequencies only advantageously limits the complexity.

The invention can be implemented in a conversational decoder, in the event of a frame loss. In hardware terms, it can be implemented in a decoding circuit, typically in a telephony terminal. For this purpose, such a circuit CIR may comprise or be connected to a processor PROC, as illustrated in figure 9, and may comprise a working memory MEM, programmed with computer program instructions according to the invention to execute the above method.

For example, the invention can be implemented in a real-time transform decoder. With reference to figure 8, the decoder sends requests to obtain an audio frame from a frame buffer (step S81). If the frame is available (OK output of the test), the decoder decodes the frame (S82) to obtain a signal in the transformed domain, performs an inverse transform IMDCT (S83), which yields "aliased" time samples, and proceeds to a final step S84 of windowing (by a synthesis window) and overlap to obtain time samples free of aliasing, which are then sent to a digital-to-analog converter for playback.

When a frame is missing (KO output of the test), the decoder then uses the already decoded signal as well as the "aliased" part of the previous frame (step S85) in the frame loss correction method within the meaning of the invention.
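The control flow of figure 8 can be sketched as a simple loop. This is only an illustrative outline: `decode`, `imdct`, `window_overlap` and `conceal` are hypothetical callables standing in for steps S82 to S85, and a lost frame is assumed to be represented by `None` in the frame buffer.

```python
def decode_stream(frame_buffer, decode, imdct, window_overlap, conceal):
    """Decode frames one by one, concealing any frame that is missing."""
    history = []    # time samples already output (the valid decoded signal)
    aliased = None  # "aliased" samples from the last valid frame
    output = []
    for frame in frame_buffer:              # S81: request the next frame
        if frame is not None:               # test OK
            coeffs = decode(frame)          # S82: transformed-domain signal
            aliased = imdct(coeffs)         # S83: "aliased" time samples
            samples = window_overlap(aliased)  # S84: windowing and overlap
        else:                               # test KO
            # S85: conceal using the decoded signal and the last aliased part
            samples = conceal(history, aliased)
        history.extend(samples)
        output.append(samples)
    return output
```

A real decoder would also need to handle a loss on the very first frame, when no history or aliased part is available yet; that case is left out of this sketch.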

Claims

1. Method for processing a signal comprising a succession of samples distributed into successive frames, the method being implemented during a decoding of said signal so as to replace at least one lost signal frame on decoding,
the method comprising the steps:
a) searching (S3), in a valid signal available on decoding, for a signal segment of duration corresponding to a period determined as a function of said valid signal,
b) spectral analysis of the segment (S4), for a determination of spectral components of the segment,
c) synthesis (S6) of at least one replacement frame to replace the lost frame, by construction of a synthesis signal on the basis of part at least of the spectral components.
2. Method according to Claim 1, comprising a search, by correlation in said valid signal, for a repetition period, the duration of the segment comprising at least one repetition.
3. Method according to Claim 2, in which the repetition period corresponds to a duration for which the correlation exceeds a predetermined threshold value.
4. Method according to one of the preceding claims, furthermore comprising a determination of respective phases associated with the spectral components, and in which the construction of the synthesis signal comprises said phases of the spectral components.
5. Method according to one of the preceding claims, furthermore comprising a determination of respective amplitudes associated with the spectral components, and in which the construction of the synthesis signal comprises said amplitudes of the spectral components.
6. Method according to one of the preceding claims, furthermore comprising a determination of respective amplitudes associated with the spectral components, and in which the spectral components of highest amplitudes are selected (S5) for the construction of the synthesis signal.
7. Method according to one of the preceding claims, in which noise (S7) is added to the synthesis signal to compensate a loss of energy relative to spectral components not selected for the construction of the synthesis signal.
8. Method according to Claim 7, in which the noise is obtained by a residual weighted between the signal of the segment and the synthesis signal.
9. Method according to one of the preceding claims, in which the spectral analysis of the segment comprises a sinusoidal analysis by fast Fourier transform of length 2^k, where k is greater than or equal to log₂(P), P being the number of samples in the signal segment.
10. Method according to Claim 9, in which the spectral analysis comprises:
- an interpolation (S41) of the samples of the segment so as to obtain a second segment comprising a number of samples 2^ceil(log₂(P)), where ceil(x) is the integer greater than or equal to x,
- a calculation (S42) of the Fourier transform of the second segment; and
- after determination of the spectral components, identification of frequencies associated with the components, and construction of the synthesis signal by resampling with modification of said frequencies as a function of the resampling.
11. Method according to one of the preceding claims, applied in a context of transform decoding with overlap, in which the synthesis signal is constructed over at least two frame durations.
12. Method according to Claims 10 and 11, in which the synthesis signal is constructed over two frame durations and an additional duration corresponding to a lag introduced by a resampling filter.
13. Method according to one of the preceding claims, furthermore comprising a separation (S2) into a high-frequency band and a low-frequency band, of a signal arising from said valid frame or frames, and in which the spectral components are selected from the low-frequency band.
14. Method according to Claim 13, in which the replacement frame is synthesized by addition:
- of a first signal constructed on the basis of spectral components selected from the low-frequency band,
- of a second signal arising from a filtering in the high-frequency band,
the second signal being obtained by successive duplication (S8) of at least one valid half-frame and its time-reversed version.
15. Computer program comprising instructions which, when executed by a processor, allow the implementation of the method according to one of Claims 1 to 14.
16. Device for decoding a signal comprising a succession of samples distributed into successive frames, the device comprising means (MEM, PROC) for replacing at least one lost signal frame, comprising:
a) means of searching, in a valid signal available on decoding, for a signal segment of duration corresponding to a period determined as a function of said valid signal,
b) means of spectral analysis of the segment, for a determination of spectral components of the segment,
c) means of synthesis of at least one replacement frame to replace the lost frame, by construction of a synthesis signal on the basis of part at least of the spectral components.
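By way of illustration only, the three steps common to the method and device claims (period search, spectral analysis, sinusoidal synthesis) can be sketched in Python. The sketch rests on several simplifying assumptions that are not taken from the claims: the function names are hypothetical, the period is taken as the lag maximizing a normalized correlation rather than the first lag exceeding a threshold, a plain DFT replaces the interpolated power-of-two FFT, and neither the noise addition nor the band separation is modelled.

```python
import cmath
import math

def find_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest normalized correlation.

    Assumption: len(x) >= 2 * max_lag so that both compared blocks exist.
    """
    best_lag, best_corr = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        a, b = x[-lag:], x[-2 * lag:-lag]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b)) or 1.0
        if num / den > best_corr:
            best_lag, best_corr = lag, num / den
    return best_lag

def dft(seg):
    """Plain O(P^2) discrete Fourier transform of a real segment."""
    P = len(seg)
    return [sum(seg[n] * cmath.exp(-2j * math.pi * k * n / P) for n in range(P))
            for k in range(P)]

def synthesize(seg, n_keep, length):
    """Rebuild `length` samples from the n_keep highest-amplitude components."""
    P = len(seg)
    X = dft(seg)
    # Keep only the strongest bins among the non-redundant half of the spectrum.
    bins = sorted(range(P // 2 + 1), key=lambda k: abs(X[k]), reverse=True)[:n_keep]
    out = []
    for n in range(length):
        v = 0.0
        for k in bins:
            # DC and Nyquist bins appear once; the others pair with their conjugate.
            scale = 1.0 if k == 0 or 2 * k == P else 2.0
            v += scale * abs(X[k]) / P * math.cos(
                2 * math.pi * k * n / P + cmath.phase(X[k]))
        out.append(v)
    return out
```

In use, `find_period` would be applied to the valid decoded signal, the last `period` samples passed to `synthesize`, and the result used as the periodic continuation that replaces the lost frame.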