EP1907812B1

EP1907812B1 - Method for switching rate- and bandwidth-scalable audio decoding rate

Info

Publication number: EP1907812B1
Application number: EP06779036A
Authority: EP
Inventors: Stéphane RAGOT; David Virette; Balazs Kovesi
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2005-07-22
Filing date: 2006-07-10
Publication date: 2010-12-01
Anticipated expiration: 2026-07-10
Also published as: ES2356492T3; KR20080033997A; RU2008106750A; KR101295729B1; RU2419171C2; JP2009503559A; DE602006018618D1; US8630864B2; EP1907812A2; US20090306992A1; JP5009910B2; CN101263554A; WO2007010158A2; WO2007010158A3; ATE490454T1; CN101263554B

Abstract

A method of bitrate switching on decoding an audio signal coded by an audio coding system, said decoding comprising a post-processing step depending on the bitrate. On switching from an initial bitrate to a final bitrate, said method includes a transition step of continuous change from a signal at the initial bitrate to a signal at the final bitrate, one or both of said signals being post-processed. Application to transmission of VoIP speech and/or audio signals in data packet networks.

Description

The present invention relates to a method of bit rate switching on decoding an audio signal coded by a multi-rate audio coding system, and more particularly by an audio coding system that is scalable in bit rate and possibly in bandwidth. It also relates to an application of said method to a bit rate and bandwidth scalable audio decoding system, and to a bit rate and bandwidth scalable audio decoder.

The invention finds a particularly advantageous application in the field of transmitting speech and/or audio signals over packet networks of the voice over IP type, in order to provide a quality that can be adjusted according to the capacity of the transmission channel.

The method according to the invention makes it possible to obtain artifact-free transitions between the different bit rates of a bit rate and bandwidth scalable audio coder/decoder (codec), more especially in the case of transitions between the telephone band and the wideband, in the context of bit rate and bandwidth scalable audio coding with a telephone-band core with bit-rate-dependent post-processing and one or more wideband enhancement layers.

Conventionally, the term "telephone band" or "narrowband" refers to the frequency band from 300 to 3400 Hz, while the term "wideband" is reserved for the band extending from 50 to 7000 Hz.

Many techniques exist today for converting an audio-frequency signal (speech and/or audio) into a digital signal and for processing the signals thus digitized.

The most common techniques are "waveform coding" methods, such as PCM or ADPCM coding, "parametric analysis-by-synthesis coding" methods, such as CELP ("Code Excited Linear Prediction") coding, and "perceptual sub-band or transform coding" methods. Recall that narrowband CELP coding generally uses post-processing to improve quality. This post-processing typically comprises adaptive post-filtering and high-pass filtering. These conventional techniques for coding audio-frequency signals are described, for example, in W.B. Kleijn and K.K. Paliwal (editors), Speech Coding and Synthesis, Elsevier, 1995. Only the techniques used in two-way transmission of audio-frequency signals are considered here.

In conventional speech coding, the coder generates a bit stream at a fixed bit rate. This fixed-rate constraint simplifies the implementation and use of the coder and the decoder. Examples of such systems are G.711 coding at 64 kbit/s and G.729 coding at 8 kbit/s.

In some applications, such as mobile telephony, voice over IP, or communications over ad hoc networks, it is preferable to generate a variable-rate bit stream, the bit rate values being taken from a predefined set. Several multi-rate coding techniques can be distinguished:

Multi-mode coding controlled by the source and/or the channel, as implemented in the AMR-NB, AMR-WB, SMV, or VMR-WB systems.
Hierarchical coding, also called "scalable" coding, which generates a bit stream said to be hierarchical because it comprises a core bit rate and one or more enhancement layers. The G.722 system at 48, 56 and 64 kbit/s is a simple example of bit rate scalable coding. The MPEG-4 CELP codec is scalable in both bit rate and bandwidth (T. Nomura et al., A bitrate and bandwidth scalable CELP coder, ICASSP 1998).
Multiple description coding (A. Gersho, J.D. Gibson, V. Cuperman, H. Dong, A multiple description speech coder based on AMR-WB for mobile ad hoc networks, ICASSP 2004).

In multi-rate coding, it is necessary to ensure that switching from one coding bit rate to another does not cause any defect, or artifact.

Bit rate switching is easy to achieve if, at all bit rates, the coding relies on representing an audio signal in the same bandwidth with the same coding model. For example, in the AMR-NB system, the signal is defined in the telephone band (300-3400 Hz) and the coding relies on the ACELP ("Algebraic Code Excited Linear Prediction") model, except for comfort noise generation, which is nevertheless performed by an LPC ("Linear Predictive Coding") type model compatible with the ACELP model. Note that AMR-NB coding conventionally uses post-processing in the form of adaptive post-filtering and high-pass filtering, the coefficients of the adaptive post-filtering depending on the decoding bit rate. However, no precaution is taken to handle potential problems linked to the use of post-processing parameters that vary with the bit rate. By contrast, wideband CELP coding of the AMR-WB type does not use post-processing, essentially for reasons of complexity.

Bit rate switching is even more problematic in bit rate and bandwidth scalable audio coding. In this case the coding relies on different models and different bandwidths depending on the bit rate.

The basic concept of hierarchical audio coding is illustrated, for example, in Y. Hiwasaki, T. Mori, H. Ohmuro, J. Ikedo, D. Tokumoto, and A. Kataoka, Scalable Speech Coding Technology for High-Quality Ubiquitous Communications, NTT Technical Review, March 2004. In this type of coding, the bit stream comprises a base layer and one or more enhancement layers. The base layer is generated by a fixed low bit rate codec, called the "core codec", which guarantees the minimum coding quality. This layer must be received by the decoder to maintain an acceptable level of quality. The enhancement layers serve to improve quality. Although they are all sent by the coder, the decoder may not receive all of them. The main advantage of hierarchical coding is that it allows the bit rate to be adapted simply by truncating the bit stream. The number of layers, i.e. the number of possible truncations of the bit stream, defines the granularity of the coding. The coding is said to have coarse granularity if the bit stream comprises few layers, of the order of 2 to 4, whereas fine granularity coding allows a step of the order of 1 kbit/s.
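The truncation principle described above can be sketched as follows. This is a minimal illustration, not the codec's actual bit stream handling: the function name `truncate_frame` and the layer sizes (160, 80 and 40 bits per frame) are hypothetical values chosen for the example.

```python
def truncate_frame(frame_bits, layer_sizes, bit_budget):
    """Keep only the whole layers of one frame that fit in the bit budget.

    frame_bits  : sequence of bits for one frame, core layer first
    layer_sizes : number of bits in each layer, in order (hypothetical values)
    bit_budget  : number of bits that the channel lets through for this frame
    """
    kept = 0
    for size in layer_sizes:
        if kept + size > bit_budget:
            break  # this layer and all higher ones are dropped
        kept += size
    # An empty result means that even the core layer was lost.
    return frame_bits[:kept]


# Illustrative layout: core layer plus two enhancement layers.
LAYER_SIZES = [160, 80, 40]
frame = [0] * sum(LAYER_SIZES)
decodable = truncate_frame(frame, LAYER_SIZES, bit_budget=250)
```

With a budget of 250 bits, the core layer and the first enhancement layer are kept (240 bits) and the last layer is discarded, which is the rate adaptation "by simple truncation" that the paragraph refers to.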

Of particular interest here are hierarchical coding techniques that are scalable in bit rate and bandwidth, with a telephone-band CELP core coder and one or more wideband enhancement layers. Examples of such systems are given in H. Taddei et al., A Scalable Three Bitrate (8, 14.2 and 24 kbit/s) Audio Coder, 107th AES Convention, 1999, with a coarse granularity of 8, 14.2 and 24 kbit/s, and in B. Kovesi, D. Massaloux, A. Sollaud, A scalable speech and audio coding scheme with continuous bitrate flexibility, ICASSP 2004, with a fine granularity from 6.4 to 32 kbit/s, as well as in MPEG-4 CELP coding.

Among the most relevant references on the problem of bit rate switching in the context of bit rate and bandwidth scalable audio coding are international applications WO 01/48931 and WO 02/060075.

However, the techniques described in these two documents deal only with interoperability problems between communication networks using telephone-band and wideband coding.

In particular, international application WO 02/060075 describes an optimized decimation system for converting from the wideband to the telephone band.

The method proposed in international application WO 01/48931 is in fact a band extension technique that consists in generating a pseudo-wideband signal from a telephone-band signal, in particular by extracting a "spectral profile". Similar techniques known from the prior art mainly address the problems of switching from the wideband to the telephone band, seeking to avoid the bandwidth reduction by using a band extension technique, without transmitted information, that generates a wideband signal from the received telephone-band signal. Note that these methods do not seek to truly control the transition between bandwidths, and that they also have the drawback of relying on band extension techniques whose quality is highly variable and which therefore cannot ensure a stable output quality. In US 2001/044712, post-processing is performed on decoding during transitions in order to simulate a continuous variation of the bandwidth.

Accordingly, the technical problem to be solved by the subject matter of the present invention is to propose a method of bit rate switching on decoding an audio signal coded by a multi-rate audio coding system, said decoding comprising at least one bit-rate-dependent post-processing step, which would make it possible to handle transitions between different bit rates for which post-processing operations depending on the decoding bit rate are used, so as to eliminate the artifacts that are particularly noticeable during rapid variations of the decoding bit rate. This is because post-processing introduces a phase shift into the signal, and using two different post-processing operations causes phase continuity problems during transitions.
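As an illustration of what a continuous change from one signal to another can look like, the sketch below cross-fades two time-aligned versions of the same frame. It is only an assumption made for the example: the linear ramp, the function name `crossfade` and the one-frame transition length are hypothetical and are not taken from the claims.

```python
def crossfade(initial, final):
    """Blend two equally long, time-aligned signals with a linear ramp.

    initial : samples decoded with the initial-rate processing
    final   : samples decoded with the final-rate processing
    The weight of `final` rises from 1/n to 1 across the n samples, so the
    output ends exactly on the final-rate signal.
    """
    n = len(initial)
    if n != len(final):
        raise ValueError("both signals must have the same length")
    out = []
    for i in range(n):
        w = (i + 1) / n  # hypothetical linear weighting
        out.append((1.0 - w) * initial[i] + w * final[i])
    return out


blended = crossfade([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])
```

Because both signals contribute during the transition, an abrupt jump between two differently post-processed outputs is replaced by a gradual one.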

According to the present invention, the solution to the technical problem posed is as described in claim 1.

The invention also relates to a computer program comprising code instructions for implementing the method according to the invention when said program is executed by a computer.

The invention further relates to an application of the method according to the invention to a bit rate scalable audio decoding system.

The invention further relates to an application of the method according to the invention to a bit rate and bandwidth scalable audio decoding system in which the initial bit rate is obtained by at least a first decoding layer in a first frequency band, and the final bit rate is obtained by at least a second decoding layer, called the extension layer, extending said first frequency band into a second frequency band, the post-processing step being applied to the decoding performed at the initial bit rate.

The invention further relates to an application of the method according to the invention to a bit rate and bandwidth scalable audio decoding system in which the final bit rate is obtained by at least a first decoding layer in a first frequency band, and the initial bit rate is obtained by at least a second decoding layer, called the extension layer, extending said first frequency band into a second frequency band, the post-processing step being applied to the decoding performed at the final bit rate.

A particular example of an "extended band" is the "wideband" defined above, said first band then being the telephone band.

The invention also relates to a multi-rate audio decoder as claimed in claim 10.

The following description, given with reference to the accompanying drawings, which are provided as non-limiting examples, explains what the invention consists of and how it can be implemented.

Figure 1 is a diagram of a four-layer bit rate and bandwidth scalable coder.
Figure 2 is a diagram of a decoder according to the invention associated with the coder of Figure 1.
Figure 3 gives a structure of the bit stream associated with the coder of Figure 1.
Figure 4 is a flowchart of a method of switching between a post-processed signal and a non-post-processed signal in the telephone band of the decoder according to the invention.
Figure 5 is a flowchart of the switching method according to the invention between a telephone band and a wideband with band extension.
Figure 6 is a flowchart of the switching method according to the invention between a telephone band and a wideband with a predictive transform decoding layer.
Figure 7 is a flowchart of the management of the count of frames received in wideband, for switching between bit rates and between bands in accordance with the method according to the invention.
Figure 8 is a table summarizing the operation of the flowchart of Figure 7.
Figure 9 is a table giving the adaptive attenuation coefficients used when switching from the telephone band to the wideband.

The invention is now described in the context of a bit rate and bandwidth scalable audio codec. The bit rate and bandwidth scalable coding structure considered here has as its core coder a telephone-band CELP coder, a particular case of which uses the G.729A coder as described in ITU-T Recommendation G.729, Coding of Speech at 8 kbit/s using Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP), March 1996, and in R. Salami et al., Description of ITU-T Recommendation G.729 Annex A: Reduced complexity 8 kbit/s CS-ACELP codec, ICASSP 1997.

Three enhancement stages are added to the CELP core coding, namely an enhancement of the telephone-band CELP coding, a band extension, and predictive transform coding.

The bit rate switching operations considered here are switches between the telephone band and the wideband, and vice versa.

Figure 1 gives a diagram of the coder used.

An audio signal with a useful band of 50-7000 Hz, sampled at 16 kHz, is divided into frames of 320 samples, i.e. 20 ms. High-pass filtering 101 with a cut-off frequency of 50 Hz is applied to the input signal. The resulting signal, called S^WB, is reused in several branches of the coder.
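The framing figures above (320 samples at 16 kHz giving 20 ms) can be checked with a short sketch. The helper name `split_into_frames` is hypothetical, and dropping a trailing partial frame is an assumption made for simplicity.

```python
SAMPLE_RATE_HZ = 16000  # wideband sampling rate given in the text
FRAME_LEN = 320         # samples per frame given in the text


def split_into_frames(samples, frame_len=FRAME_LEN):
    """Cut a sample sequence into consecutive full frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    n_full = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_full)]


# Frame duration in milliseconds implied by the two constants.
frame_ms = 1000 * FRAME_LEN / SAMPLE_RATE_HZ

frames = split_into_frames([0.0] * 1000)
```

One second of input at 16 kHz therefore yields 50 frames.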

First, in a first branch, low-pass filtering and downsampling by two, 102, from 16 to 8 kHz are applied to the signal S^WB. This operation yields a telephone-band signal sampled at 8 kHz. This signal is processed by the core coder 103 using CELP coding. This coding corresponds here to the G.729A coder, which generates the core of the bit stream at a bit rate of 8 kbit/s.
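The low-pass filtering and downsampling by two can be sketched as below. The filter design is an assumption: a 31-tap Hamming-windowed sinc with a cut-off at one quarter of the input sampling rate is used purely for illustration, and the text does not specify the filter of block 102.

```python
import math


def lowpass_taps(num_taps=31, cutoff=0.25):
    """Windowed-sinc FIR low-pass filter (hypothetical design).

    cutoff is a fraction of the input sampling rate; 0.25 means 4 kHz at 16 kHz.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def decimate_by_two(samples, taps):
    """Filter, then keep one output sample out of two (16 kHz to 8 kHz)."""
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, h in enumerate(taps):
            j = i - k
            if j >= 0:  # samples before the start are taken as zero
                acc += h * samples[j]
        out.append(acc)
    return out


taps = lowpass_taps()
narrowband = decimate_by_two([1.0] * 320, taps)
```

A 320-sample wideband frame becomes a 160-sample frame at 8 kHz, which still covers 20 ms.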

Next, a first enhancement layer introduces a second CELP coding stage 103. This second stage consists of an innovation codebook that enriches the CELP excitation and improves quality, particularly on unvoiced sounds. The bit rate of this second coding stage is 4 kbit/s, and the associated parameters are the positions and signs of the pulses and the gain of the associated innovation codebook for each subframe of 40 samples (5 ms at 8 kHz).

The core coder and the first enhancement layer are decoded to obtain the telephone-band synthesis signal 104 at 12 kbit/s. Oversampling by two from 8 to 16 kHz and low-pass filtering 105 yield the version of the first two stages of the coder sampled at 16 kHz.

The third enhancement layer provides the extension to wideband 106. The input signal S^WB can be pre-processed by a pre-emphasis filter. This filter makes it possible to represent the high frequencies better with the wideband linear prediction filter. To compensate for the effect of the pre-emphasis filter, an inverse de-emphasis filter is then used at synthesis. An alternative to this coding and decoding structure uses no pre-emphasis or de-emphasis filter.
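The pre-emphasis and de-emphasis pair can be illustrated with a first-order filter and its inverse. This is a sketch under assumptions: the first-order form and the coefficient value 0.68 are hypothetical choices for the example, since the text gives neither.

```python
def pre_emphasis(x, mu=0.68):
    """y[n] = x[n] - mu * x[n-1]; boosts high frequencies (mu is hypothetical)."""
    out, prev = [], 0.0
    for s in x:
        out.append(s - mu * prev)
        prev = s
    return out


def de_emphasis(y, mu=0.68):
    """x[n] = y[n] + mu * x[n-1]; exact inverse of pre_emphasis."""
    out, prev = [], 0.0
    for s in y:
        prev = s + mu * prev
        out.append(prev)
    return out


signal = [1.0, 0.5, -0.25, 0.0]
restored = de_emphasis(pre_emphasis(signal))
```

Applying the two filters in sequence returns the original samples up to rounding error, which is the compensation that the paragraph describes.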

The next step consists in calculating and quantizing the wideband linear prediction filters. The order of the linear prediction filter is 18, but in a variant a lower prediction order is chosen, for example 16. The linear prediction filter can be calculated by the autocorrelation method and the Levinson-Durbin algorithm.
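The autocorrelation method with the Levinson-Durbin recursion can be sketched as follows. This is a textbook form of the algorithm given for illustration; windowing, lag windowing and the other conditioning steps that a real codec applies are omitted, and the function names are hypothetical.

```python
LPC_ORDER = 18  # prediction order given in the text (16 in the variant)


def autocorrelation(x, order):
    """r[k] = sum over n of x[n] * x[n-k], for k = 0..order."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + sum_i a[i] z^-i.

    Returns (a, err) where a[0] = 1 and err is the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input (for example silence): stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


# Autocorrelation of a first-order process with coefficient 0.5.
coeffs, pred_err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this input the recursion recovers a first-order predictor: the second coefficient is zero and the prediction error is 0.75. With speech, `autocorrelation(frame, LPC_ORDER)` would supply the input values.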

This wideband linear prediction filter A_WB(z) is quantized using a prediction of its coefficients from the filter Â_NB(z) obtained from the telephone-band core coder. The coefficients can then be quantized using, for example, multistage vector quantization and the dequantized LSF (Line Spectrum Frequency) parameters of the telephone-band core coder, as described in H. Ehara, T. Morii, M. Oshikiri and K. Yoshida, Predictive VQ for bandwidth scalable LSP quantization, ICASSP 2005.

The wideband excitation is obtained from the telephone-band excitation parameters of the core coder: the pitch delay, the associated gain, and the algebraic excitations of the core coder and of the first CELP excitation enrichment layer, with their associated gains. This excitation is generated using an oversampled version of the excitation parameters of the telephone-band stages.

This wideband excitation is then shaped by the synthesis filter Â_WB(z) calculated previously. If pre-emphasis was applied to the input signal, the de-emphasis filter is applied to the output of the synthesis filter. The resulting signal is a wideband signal whose energy has not been adjusted. To calculate the gain that sets the energy of the high band (3400-7000 Hz), high-pass filtering is applied to the wideband synthesis signal. In parallel, the same high-pass filter is applied to the error signal corresponding to the difference between the delayed original signal and the synthesis signal of the two preceding stages. These two signals are then used to calculate the gain to be applied to the high-band synthesis signal. This gain is calculated as an energy ratio between the two signals. The quantized gain g_WB is then applied to the signal S₁₄^WB per subframe of 80 samples (5 ms at 16 kHz), and the resulting signal is added to the synthesis signal of the preceding stage to create the wideband signal corresponding to the bit rate of 14 kbit/s.
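The gain calculation per subframe can be illustrated by the following minimal sketch. It assumes that the gain is the square root of the energy ratio, taken as the high-pass-filtered error signal over the high-pass-filtered synthesis signal; the description only states that an energy ratio is used. The quantization of the gain is omitted, and the function names are hypothetical.

```python
import math

SUBFRAME = 80  # 5 ms at 16 kHz, as stated in the description


def subframe_gains(target_hb, synth_hb, eps=1e-12):
    """One gain per subframe from the energy ratio target / synthesis.

    target_hb: high-pass-filtered error signal (hypothetical name)
    synth_hb:  high-pass-filtered wideband synthesis signal (hypothetical name)
    """
    gains = []
    for start in range(0, len(synth_hb), SUBFRAME):
        e_target = sum(v * v for v in target_hb[start:start + SUBFRAME])
        e_synth = sum(v * v for v in synth_hb[start:start + SUBFRAME])
        # Assumption: amplitude gain = sqrt of the energy ratio.
        gains.append(math.sqrt(e_target / (e_synth + eps)))
    return gains


def apply_gains(synth_hb, gains):
    """Scale each subframe of the high-band synthesis by its gain."""
    return [v * gains[i // SUBFRAME] for i, v in enumerate(synth_hb)]
```

A 20 ms frame at 16 kHz contains four such subframes, so four gains are produced per frame.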

The rest of the coding is performed in the frequency domain using a predictive transform coding scheme. The delayed input signal 108 and the 14 kbit/s synthesis signal 107 are filtered by a perceptual weighting filter 109, 111 of the type A_WB(z/γ)*(1-µz), typically with γ=0.92 and µ=0.68. These signals are then coded by a TDAC (Time Domain Aliasing Cancellation) lapped transform coding scheme (Y. Mahieux and J.P. Petit, Transform coding of audio signals at 64 kbit/s, IEEE GLOBECOM 1990).
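As an illustration of the weighting filter, the sketch below builds the coefficients of the cascade from a linear prediction polynomial. It relies on the usual identity that A(z/γ) has coefficients a[k]·γ^k, and it reads the tilt term as 1 - µ·z^-1, which is an assumption since the description writes it as (1-µz). The function name is hypothetical.

```python
def weighting_filter_coeffs(a, gamma=0.92, mu=0.68):
    """Coefficients (in powers of z^-1) of A(z/gamma) * (1 - mu z^-1).

    a: linear prediction coefficients with a[0] == 1.0.
    """
    # Bandwidth expansion: replacing z by z/gamma scales a[k] by gamma**k.
    expanded = [ak * gamma ** k for k, ak in enumerate(a)]
    # Assumed first-order tilt term 1 - mu z^-1.
    tilt = [1.0, -mu]
    out = [0.0] * (len(expanded) + 1)
    for i, e in enumerate(expanded):
        for j, t in enumerate(tilt):
            out[i + j] += e * t  # polynomial product
    return out
```

With an order-18 filter the result has 20 coefficients, which can be applied to the signal as an ordinary FIR filter.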

A modified discrete cosine transform (MDCT) is applied, on the one hand, 110, to blocks of 640 samples of the weighted input signal with an overlap of 50% (the MDCT analysis is refreshed every 20 ms) and, on the other hand, 112, to the weighted synthesis signal from the preceding 14 kbit/s band extension stage (same block length and same overlap). The MDCT spectrum to be coded, 113, corresponds to the difference between the weighted input signal and the 14 kbit/s synthesis signal for the band from 0 to 3400 Hz, and to the weighted input signal from 3400 Hz to 7000 Hz. The spectrum is limited to 7000 Hz by setting the last 40 coefficients to zero (only the first 280 coefficients are coded). The spectrum is divided into 18 bands: one band of 8 coefficients and 17 bands of 16 coefficients. For each band of the spectrum, the energy of the MDCT coefficients is calculated (scale factors). The 18 scale factors constitute the spectral envelope of the weighted signal, which is then quantized, coded and transmitted in the frame. Figure 3 shows the format of the bit stream.
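The band layout can be checked with a short sketch. A 640-sample block gives 320 MDCT coefficients, each covering 25 Hz at a 16 kHz sampling rate, so the 280 coded coefficients end at 7000 Hz and 8 + 17 × 16 = 280. The sketch assumes that the 8-coefficient band comes first and that the scale factor is the plain sum of squares per band; neither detail is stated in the description, and the names are hypothetical.

```python
BAND_SIZES = [8] + [16] * 17  # assumed order: the 8-coefficient band comes first
N_COEFFS = 320                # MDCT coefficients per 640-sample block
N_CODED = 280                 # coefficients kept, i.e. up to 7000 Hz


def limit_to_7khz(mdct):
    """Zero the last 40 coefficients so that the spectrum stops at 7000 Hz."""
    return list(mdct[:N_CODED]) + [0.0] * (len(mdct) - N_CODED)


def band_energies(mdct):
    """Energy of the MDCT coefficients in each of the 18 bands."""
    energies, start = [], 0
    for size in BAND_SIZES:
        energies.append(sum(c * c for c in mdct[start:start + size]))
        start += size
    return energies
```

The 18 values returned by `band_energies` play the role of the spectral envelope before quantization.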

The dynamic bit allocation is based on the energy of the spectral bands, taken from the dequantized version of the spectral envelope. This ensures that the bit allocation is identical at the coder and at the decoder. The normalized MDCT coefficients (fine structure) in each band are then quantized by vector quantizers using dictionaries nested in size and in dimension, the dictionaries being composed of a union of permutation codes as described in C. Lamblin et al., Quantification vectorielle en dimension et résolution variables (vector quantization with variable dimension and resolution), PCT patent FR 04 00219, 2004. Finally, the information on the core coder, the telephone-band CELP enrichment stage, the wideband CELP stage and, lastly, the spectral envelope and the coded normalized coefficients is multiplexed and transmitted in frames.

Figure 2 is a block diagram of the decoder associated with the coder of figure 1.

The module 201 demultiplexes the parameters contained in the bit stream. There are several decoding cases, depending on the number of bits received for a frame. The four cases are described with reference to figure 2:

1. The first case concerns the reception of the minimum number of bits by the decoder, for a received bit rate of 8 kbit/s. In this case, only the first stage is decoded. Thus only the bit stream relating to the CELP core decoder 202 (G.729A+) is received and decoded. This synthesis can be processed by the adaptive post-filtering 203 and the high-pass filtering post-processing 204 of the G.729 decoder. In this embodiment, the combination of these two operations is called "post-processing". However, it is clear that the term "post-processing" can also refer to the adaptive post-filtering alone or to the high-pass filtering post-processing alone. This signal is upsampled, 206, and filtered, 207, to produce a signal sampled at 16 kHz.
2. The second case concerns the reception of the number of bits relating to the first and second decoding stages only, for a received bit rate of 12 kbit/s. In this case, the core decoder and the first CELP excitation enrichment stage are decoded. This synthesis can be processed by the post-processing 203, 204 of the G.729 decoder. As before, this signal is then upsampled, 206, and filtered, 207, to produce a signal sampled at 16 kHz.
3. The third case corresponds to the reception of the number of bits relating to the first three decoding stages, for a received bit rate of 14 kbit/s. In this case, the first two decoding stages are first performed as in case 2, except that the post-processing applied to the CELP decoding output is not performed. The band extension module then generates a signal sampled at 16 kHz after decoding of the wideband line spectral pair parameters (WB-LSF), 209, and of the gains associated with the excitation, 213. The wideband excitation is generated from the parameters of the core coder and of the first CELP excitation enrichment stage 208. This excitation is then filtered by the synthesis filter 210 and, if a pre-emphasis filter was used at the coder, by the de-emphasis filter 211. A high-pass filter 212 is applied to the resulting signal, and the energy of the band extension signal is adapted using the associated gains 214 every 5 ms. This signal is then added to the telephone-band signal sampled at 16 kHz obtained from the first two decoding stages 215. In order to obtain a signal limited to 7000 Hz, this signal is filtered in the transform domain by setting the last 40 MDCT coefficients to 0 before passing through the inverse MDCT 220 and the weighted synthesis filter 221.
4. The last case corresponds to the decoding of all the stages of the decoder, for a received bit rate greater than or equal to 16 kbit/s. The last stage consists of a predictive transform decoder. Case 3 described above is performed first. Then, depending on the number of additional bits received, the predictive transform decoding scheme is adapted:
   - If the number of bits corresponds to only part or all of the spectral envelope, and the fine structure is not received, the partial or complete spectral envelope is used to adjust the energy of the bands of MDCT coefficients, 216 and 217, between 3400 Hz and 7000 Hz, 218, corresponding to the signal generated by the band extension stage 215. This system provides a progressive improvement in audio quality as a function of the number of bits received.
   - If the number of bits corresponds to all of the spectral envelope and to part or all of the fine structure, the bit allocation is performed in the same way as at the coder. In the bands where the fine structure is received, the decoded MDCT coefficients are calculated from the dequantized spectral envelope and fine structure. In the spectral bands between 3400 Hz and 7000 Hz where the fine structure has not been received, the procedure of the preceding paragraph is used, that is, the MDCT coefficients calculated on the signal obtained by band extension, 216 and 217, are adjusted in energy from the received spectral envelope 218. The MDCT spectrum used for the synthesis is therefore made up, on the one hand, of the synthesis signal of the first two decoding stages added to the decoded error signal in the bands between 0 and 3400 Hz and, on the other hand, for the bands between 3400 Hz and 7000 Hz, of the decoded MDCT coefficients in the bands where the fine structure has been received and of the energy-adjusted MDCT coefficients of the band extension stage for the other spectral bands.
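The four decoding cases above can be summarized by the following sketch, which maps a received bit rate to the stages that are run. It is a simplification under stated assumptions: intermediate rates are assigned to the highest layer they fully contain, the transition handling of the switching method is not modeled, and the function and key names are hypothetical.

```python
def decoding_plan(rate_kbps):
    """Stages run for a frame received at rate_kbps, following cases 1 to 4."""
    if rate_kbps < 8:
        raise ValueError("fewer bits than the 8 kbit/s core layer")
    plan = {
        "core_celp": True,              # case 1: G.729A+ core decoder 202
        "celp_enrichment": False,       # case 2: first enrichment stage
        "band_extension": False,        # case 3: wideband CELP stage
        "transform_decoder": False,     # case 4: predictive transform decoder
        "g729_post_processing": True,   # 203, 204: telephone-band output only
    }
    if rate_kbps >= 12:
        plan["celp_enrichment"] = True
    if rate_kbps >= 14:
        plan["band_extension"] = True
        # Post-processing is skipped once wideband stages are decoded.
        plan["g729_post_processing"] = False
    if rate_kbps >= 16:
        plan["transform_decoder"] = True
    return plan
```

For example, `decoding_plan(14)` enables the band extension and disables the G.729 post-processing, which is the situation that makes a transition step necessary when the rate changes between frames.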

An inverse MDCT is then applied to the decoded MDCT coefficients, 220, and filtering by the weighted synthesis filter 221 yields the output signal.

The switching method according to the invention will now be described in the context of the decoder of figure 2.

Block 205 represents a cross-fade module. When the number of bits received by the decoder allows only the first stage, or the first and second stages, to be decoded, that is, for a received bit rate of 8 or 12 kbit/s, the effective bandwidth of the final decoder output is the telephone band. In these cases, to improve the quality of the synthesized signal, the post-processing 203, 204 in the broad sense, which is part of the G.729A decoder, is applied in the telephone band before upsampling.

On the other hand, if the wideband stages are also decoded, for a received bit rate greater than or equal to 14 kbit/s, this post-processing is not activated because, at the coder, the coding of the higher stages was calculated from the version of the telephone band without post-processing.

The post-processing, 203 and 204, introduces a phase shift in the signal. When switching between modes without and with post-processing, a smooth transition must therefore be ensured. Figure 4 describes the implementation of block 205, which provides this slow transition between the post-processed and non-post-processed telephone-band signals by applying cross-fades.

Step 401 examines whether or not the current frame is a telephone-band frame, that is, whether the bit rate of the current frame is 8 or 12 kbit/s. On a negative response, step 402 checks whether or not the previous frame was post-processed in the telephone band (which amounts to checking whether or not the bit rate of the previous frame was 8 or 12 kbit/s). On a negative response, in step 403, the non-post-processed signal S₁ is copied into the signal S₃. Conversely, on a positive response to test 402, in step 404, the signal S₃ contains the result of a cross-fade in which the weight of the non-post-processed component S₁ increases while the weight of the post-filtered component S₂ decreases. Step 404 is followed by step 405, which sets the flag prevPF to 0.

In the case of a positive response at step 401, step 406 checks whether or not the post-processing was active in the telephone band in the previous frame. On a positive response, in step 408, the post-processed signal S₂ is copied into the signal S₃. If, on the contrary, the response at step 406 is negative, the signal S₃ is calculated in step 407 as the result of a cross-fade in which, this time, the weight of the non-post-processed component S₁ decreases while the weight of the post-processed component S₂ increases. After step 407, step 409 sets the flag prevPF to 1.
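The flowchart of figure 4, as described in steps 401 to 409, can be sketched as follows. The linear weights and the one-frame duration of the cross-fade are assumptions, since the description only states that one weight increases while the other decreases; the function names are hypothetical, while prevPF is the flag named in the description.

```python
def crossfade(fade_out, fade_in):
    """Linear cross-fade over one frame (assumed shape and duration)."""
    n = len(fade_in)
    return [((n - i) * fade_out[i] + i * fade_in[i]) / n for i in range(n)]


def select_output(s1, s2, telephone_band_frame, prev_pf):
    """Return (s3, new prevPF) for one frame.

    s1: telephone-band signal without post-processing
    s2: telephone-band signal with post-processing 203, 204
    """
    if not telephone_band_frame:           # step 401: negative response
        if prev_pf:                        # step 402: previous frame post-processed
            return crossfade(s2, s1), 0    # steps 404 and 405
        return list(s1), 0                 # step 403
    if prev_pf:                            # step 406: positive response
        return list(s2), 1                 # step 408
    return crossfade(s1, s2), 1            # steps 407 and 409
```

Called once per frame with the flag returned by the previous call, the function outputs a cross-fade only on the frame where the mode changes and a plain copy otherwise.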

In a variant of this embodiment, when the number of bits received by the decoder allows only the first stage, or the first and second stages, to be decoded, that is, for a received bit rate of 8 or 12 kbit/s, the effective bandwidth of the final decoder output is the telephone band (signal S₁). In these cases, to improve the quality of the synthesized signal, a post-processing is applied in the telephone band before upsampling.

On the other hand, if the wideband stages are also decoded, for a received bit rate greater than or equal to 14 kbit/s, a different post-processing is activated (signal S₂); at the coder, the coding of the higher stages was calculated from the version of the telephone band with this post-processing.

The post-processing used for the bit rates of 8 or 12 kbit/s and the post-processing used for bit rates greater than or equal to 14 kbit/s introduce phase shifts in the signal that differ from each other. When switching between modes with the different post-processings, a smooth transition must therefore be ensured. This slow transition between the telephone-band signals with the different post-processings is achieved by applying cross-fades (which give the signal S₃).

It is examined whether or not the current frame is a telephone-band frame. On a negative response, it is checked whether the previous frame was a telephone-band frame. On a negative response, the post-processed signal S₁ is copied into the signal S₃. Conversely, on a positive response, the signal S₃ contains the result of a cross-fade in which the weight of the post-processed component S₁ increases while the weight of the post-processed component S₂ decreases.

In the case of a positive response, it is checked whether the previous frame was a telephone-band frame. On a positive response, the post-processed signal S₂ is copied into the signal S₃. If, on the contrary, the response is negative, the signal S₃ is calculated as the result of a cross-fade in which, this time, the weight of the post-processed component S₁ decreases while the weight of the post-processed component S₂ increases.

Block 209 calculates the wideband linear prediction filters required by the band extension and predictive transform decoding stages. This calculation is necessary when only the telephone-band part of the bit stream of a frame is received after a wideband frame has been received, and a band extension is to be performed in order to maintain the wideband effect. A set of LSFs is extrapolated from the LSFs of the telephone-band core decoder. For example, 8 LSFs can be distributed uniformly over the band between the last LSF from the telephone band and the Nyquist frequency. This makes the linear prediction filter tend toward a filter with a flat amplitude response at high frequencies.
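The LSF extrapolation given as an example above can be sketched as follows. The sketch assumes that the LSFs are expressed in Hz, that the Nyquist frequency is 8000 Hz for the 16 kHz signal, and that the new LSFs are placed strictly between the two end points; these details, and the function name, are not taken from the description.

```python
def extrapolate_wb_lsf(nb_lsf_hz, n_extra=8, nyquist_hz=8000.0):
    """Append n_extra LSFs spread uniformly between the last telephone-band
    LSF and the Nyquist frequency (both end points excluded)."""
    last = nb_lsf_hz[-1]
    step = (nyquist_hz - last) / (n_extra + 1)
    return list(nb_lsf_hz) + [last + step * (k + 1) for k in range(n_extra)]
```

With the 10 LSFs of a G.729-type core decoder, the 8 added values give the 18 LSFs needed for a wideband filter of order 18.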

Block 213 adapts the gain used for the band extension according to the present invention. The flowcharts corresponding to this block are described in figures 5 and 7.

The principle of the adaptive attenuation of the gain applied to the high band is described in figure 5. First, the gain of the first wideband decoding layer is calculated, 501, in one of two ways. If the bit stream corresponding to this band extension layer has been received, the gain is obtained by decoding, 503. If, on the other hand, this gain has not been received in the bit stream, the gain associated with this decoding layer is extrapolated, 502. The gain can be calculated, for example, by aligning the energy of the low band of the wideband decoding stage with the actual decoding of the telephone band performed previously.

Next, a counter of the number of wideband frames previously received is updated, 504, according to the principle described in Figure 7.

Finally, this counter is used to parameterize the attenuation applied to the gain of the first wideband decoding stage, 505.

Figure 7 shows the flowchart for managing the count of wideband frames received. The counter is updated as follows. If the current frame is a wideband frame, that is, if the gain associated with the first wideband decoding stage has been received (block 501 of Figure 5), and the preceding frame was also a wideband frame, then the counter is incremented by 1 and saturated at the value MAX_COUNT_RCV. This value corresponds to the number of frames during which the decoded wideband signal will be attenuated on switching from a telephone-band rate to a wideband rate.

If, on the other hand, the current frame received is a telephone-band frame, several behaviors are possible. If the preceding frame was also a telephone-band frame, the counter is set to 0. Otherwise, if the preceding frame was a wideband frame and the counter has a value less than MAX_COUNT_RCV, the counter is also set to 0. In all other cases, the counter keeps its previous value.
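The counter update rules described above can be sketched as a single function. The function and parameter names are hypothetical. The case of a wideband frame following a telephone-band frame is not spelled out in the text; the sketch assumes the counter is left unchanged in that case.

```python
MAX_COUNT_RCV = 100  # example value used in the description


def update_counter(count, current_is_wideband, previous_was_wideband,
                   max_count=MAX_COUNT_RCV):
    """Update the count of received wideband frames (step 504, Figure 7)."""
    if current_is_wideband:
        if previous_was_wideband:
            # Increment by 1 and saturate at max_count.
            return min(count + 1, max_count)
        # Assumption: this case is not detailed in the text, keep the value.
        return count
    # Current frame is a telephone-band frame.
    if not previous_was_wideband:
        return 0
    if count < max_count:
        return 0
    # Previous frame was wideband and the counter is saturated: keep it.
    return count


# Three consecutive wideband frames after a telephone-band frame.
count = 0
previous = False
for current in (True, True, True):
    count = update_counter(count, current, previous)
    previous = current
```

With these rules the counter only starts to grow from the second consecutive wideband frame, and a return to the telephone band before saturation restarts the count from 0.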

The operation of this flowchart is summarized in the table of Figure 8. The values taken by the attenuation coefficient are given in the table of Figure 9 for the case where MAX_COUNT_RCV is equal to 100; this table is provided by way of example. It can be seen that up to frame 65 the attenuation coefficient is held at 0, which corresponds to a phase in which decoding in the telephone band is prolonged. The transition phase proper starts at frame 66, with a progressive increase of the attenuation coefficient.
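The shape of the attenuation law can be illustrated as follows. The actual values of the table of Figure 9 are not reproduced in this text, so the linear ramp and the final value of 1.0 used here are assumptions chosen only to match the two properties stated above (coefficient held at 0 up to frame 65, then progressively increasing). The names `HOLD_FRAMES` and `attenuation_coefficient` are hypothetical.

```python
MAX_COUNT_RCV = 100  # example value from the description
HOLD_FRAMES = 65     # coefficient stays at 0 up to this frame count


def attenuation_coefficient(count):
    """Return the coefficient applied to the wideband gain for a frame count.

    Assumption: linear ramp from 0 to 1 between HOLD_FRAMES and
    MAX_COUNT_RCV; the real table of Figure 9 may use other values.
    """
    if count <= HOLD_FRAMES:
        return 0.0  # telephone-band decoding is prolonged
    if count >= MAX_COUNT_RCV:
        return 1.0  # assumed: wideband gain fully restored
    return (count - HOLD_FRAMES) / (MAX_COUNT_RCV - HOLD_FRAMES)


table = [attenuation_coefficient(n) for n in range(MAX_COUNT_RCV + 1)]
```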

Block 219 performs the adaptive attenuation of the predictive transform coding enhancement layers according to the present invention, as described in Figure 6.

This figure gives the flowchart of the adaptive attenuation procedure for the predictive transform decoding layer. First, it is checked whether the spectral envelope of this layer has been received in full, 601. If so, the MDCT correction coefficients of the 0-3500 Hz low band are attenuated, 602, using the counter of wideband frames received and the attenuation table defined in Figure 9.

Then, in both cases, the number of wideband frames received is checked. If this number is less than MAX_COUNT_RCV, the MDCT coefficients corresponding to the first wideband decoding stage, which performs band extension with transmitted information, are used for the predictive transform decoding stage. If, on the other hand, the counter has reached its maximum value, the procedure that adjusts the energy of the predictive transform decoding bands to the decoded spectral envelope is carried out.
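The decision flow of Figure 6 can be sketched as follows. This is only an outline under stated assumptions: the function name, the string labels for the two outcomes, and the attenuation by simple multiplication of each coefficient are hypothetical, and the energy adjustment procedure itself is not implemented.

```python
MAX_COUNT_RCV = 100  # example value from the description


def transform_layer_decision(envelope_received, count, low_band_mdct,
                             attenuation_coeff):
    """Outline of the adaptive attenuation of the transform layer (Figure 6).

    Returns the low-band MDCT correction coefficients and a label naming
    the processing selected for the predictive transform decoding stage.
    """
    if envelope_received:  # test 601
        # Step 602 (assumed form): scale the 0-3500 Hz correction coefficients.
        low_band_mdct = [attenuation_coeff * c for c in low_band_mdct]
    if count < MAX_COUNT_RCV:
        # Reuse the MDCT coefficients of the first wideband decoding stage.
        source = "band_extension"
    else:
        # Adjust band energies to the decoded spectral envelope.
        source = "energy_leveling"
    return low_band_mdct, source


during_transition = transform_layer_decision(True, 80, [2.0, -4.0], 0.5)
after_transition = transform_layer_decision(False, 100, [2.0], 0.5)
```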

Claims

Method for switching rate when decoding an audio signal coded by a multirate audio coding system, characterized in that, from a decoded signal, two signals, called first signal (S1) and second signal (S2), are supplied to the input of a cross-fading module, at least one of the signals being post-processed in a post-processing step, the post-processing forming part of a set of post-processing operations suited to different sets of rates, and in that:
- upon the detection (401, 406) of a rate switch between a current frame at a rate lying within a first set of rates and a preceding frame at a rate lying within a second set of rates, a cross-fading step (407) is performed by weighting, by reducing the weight of the second signal, post-processed or not, according to the post-processing suited to the second set of rates and by increasing the weight of the first signal, post-processed or not, according to the post-processing suited to the first set of rates, to obtain an output signal (S3); and

- upon the detection (401, 402) of a rate switch between a current frame at a rate lying within a second set of rates and a preceding frame at a rate lying within a first set of rates, the rates of the first set being greater than those of the second set, a cross-fading step (404) is performed by weighting, by reducing the weight of the first signal, post-processed or not, according to the post-processing suited to the first set of rates and by increasing the weight of the second signal, post-processed or not, according to the post-processing suited to the second set of rates, to obtain an output signal (S3).
Method according to Claim 1, characterized in that one of the post-processing operations is a high-pass filtering (204).
Method according to Claim 1, characterized in that one of the post-processing operations is an adaptive post-filtering (203).
Method according to Claim 1, characterized in that one of the post-processing operations is a combination of a high-pass filtering and an adaptive post-filtering.
Method according to Claim 1, characterized in that a single signal at the input of the cross-fading module is post-processed.
Method according to Claim 1, characterized in that the two signals at the input of the cross-fading module are post-processed with different post-processing operations suited to different sets of rates.
Computer program comprising code instructions for implementing the method according to any one of Claims 1 to 6 when said program is run by a computer.
Application of the method according to any one of Claims 1 to 6 to a rate-scalable audio decoding system.
Application of the method according to any one of Claims 1 to 6 to a rate- and bandwidth-scalable audio decoding system in which a first rate is obtained by at least a first decoding layer in a first frequency band, and a second rate is obtained by a second decoding layer, called extension layer of said first frequency band, in a second frequency band.
Multirate audio decoder, characterized in that it comprises a cross-fading module (205) receiving as input a first signal (S1) and a second signal (S2) obtained from a decoded signal, at least one of the two signals having undergone a post-processing (203, 204) from a set of post-processing operations suited to different sets of rates, the cross-fading module being able:
- upon the detection (401, 406) of a rate switch between a current frame at a rate lying within a first set of rates and a preceding frame at a rate lying within a second set of rates, the rates of the first set being greater than those of the second set, to perform a cross-fading (407) by weighting, by reducing the weight of the second signal, post-processed or not, according to the post-processing operation suited to the second set of rates and by increasing the weight of the first signal, post-processed or not, according to the post-processing operation suited to the first set of rates, to obtain the output signal (S3) from the cross-fading module; and

- upon the detection (401, 402) of a rate switch between a current frame at a rate lying within a second set of rates and a preceding frame at a rate lying within a first set of rates, to perform a cross-fading (404) by weighting, by reducing the weight of the first signal, post-processed or not, according to the post-processing operation suited to the first set of rates and by increasing the weight of the second signal, post-processed or not, according to the post-processing operation suited to the second set of rates, to obtain the output signal (S3) from the cross-fading module.
Decoder according to Claim 10, characterized in that at least one of the post-processing operations is a high-pass filtering.
Decoder according to Claim 10, characterized in that at least one of the post-processing operations is an adaptive post-filtering.
Decoder according to Claim 10, characterized in that at least one of the post-processing operations is a combination of a high-pass filtering and an adaptive post-filtering.
Decoder according to Claim 10, characterized in that a single signal at the input of the cross-fading module is post-processed.
Decoder according to Claim 10, characterized in that the two signals at the input of the cross-fading module are post-processed with different post-processing operations suited to different sets of rates.