EP1907812B1 - Method for switching rate- and bandwidth-scalable audio decoding rate - Google Patents
Method for switching rate- and bandwidth-scalable audio decoding rate
- Publication number
- EP1907812B1 EP06779036A
- Authority
- EP
- European Patent Office
- Prior art keywords
- post
- signal
- rates
- rate
- processed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Not-in-force
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
Definitions
- the present invention relates to a rate switching method for decoding an audio signal coded by a multi-rate audio coding system, and more particularly by an audio coding system scalable in bit rate and possibly in bandwidth. It also relates to an application of said method to an audio decoding system scalable in bit rate and bandwidth, and to an audio decoder scalable in bit rate and bandwidth.
- the invention finds a particularly advantageous application in the field of the transmission of speech and / or audio signals over voice-over-IP packet networks, in order to provide a quality which can be modulated according to the capacity of the transmission channel.
- the method according to the invention makes it possible to obtain artifact-free transitions between the different bit rates of an audio encoder/decoder (codec) scalable in bit rate and bandwidth, especially in the case of transitions between the telephone band and the broadband, in the context of audio coding scalable in bit rate and bandwidth with a telephone band core using rate-dependent post-processing and one or more broadband enhancement layers.
- the term "telephone band" or "narrow band" usually denotes the frequency band between 300 and 3400 Hz, while the term "broadband" (or "wideband") is reserved for the band extending from 50 to 7000 Hz.
- waveform coding methods, such as PCM or ADPCM coding, "analysis-by-synthesis parametric coding" methods such as CELP ("Code Excited Linear Prediction") coding, and "perceptual subband or transform coding" methods.
- in narrowband CELP coding, a post-processing is generally used to improve the quality.
- This post-processing typically includes adaptive post-filtering and high-pass filtering.
- In conventional speech coding, the encoder generates a fixed-rate bit stream. This fixed-rate constraint simplifies the implementation and use of the encoder and the decoder. Examples of such systems are G.711 coding at 64 kbit/s and G.729 coding at 8 kbit/s.
- Rate switching is easy to achieve if, at all bit rates, the coding relies on the same coding model representing an audio signal in the same bandwidth.
- the signal is defined in the telephone band (300-3400 Hz) and the coding is based on the ACELP ("Algebraic Code Excited Linear Prediction") model, except for the generation of comfort noise, which is nevertheless performed by an LPC ("Linear Predictive Coding") type model compatible with the ACELP model.
- the AMR-NB coding conventionally uses a post-processing in the form of an adaptive post-filtering and a high-pass filtering, the coefficients of the adaptive post-filtering being dependent on the decoding bit rate.
- no precautions are taken to deal with potential problems related to the use of variable post-processing parameters depending on the rate.
- AMR-WB wide band CELP coding does not use post-processing, mainly for reasons of complexity.
- Rate switching is even more problematic in audio coding scalable in bit rate and bandwidth. Indeed, in this case the coding is based on different models and bandwidths depending on the rate.
- the bit stream comprises a base layer and one or more enhancement layers.
- the base layer is generated by a fixed low rate codec, termed a "core codec", which guarantees the minimum quality of the coding.
- This layer must be received by the decoder to maintain an acceptable level of quality. Enhancement layers are used to improve quality. Although they are all sent by the encoder, they may not all be received by the decoder.
- the main advantage of hierarchical coding is that it allows an adaptation of the bit rate by simple truncation of the bit stream.
- the number of layers, namely the number of possible truncations of the bit stream, defines the granularity of the coding.
- hierarchical coding techniques scalable in bit rate and bandwidth, with a CELP-type core coder in the telephone band and one or more broadband enhancement layers. Examples of such systems are given in H. Taddei et al., Scalable Three Bitrate (8, 14.2 and 24 kbit/s) Audio Coder, 107th AES Convention, 1999, with a coarse granularity of 8, 14.2 and 24 kbit/s, and in B. Kovesi, D. Massaloux, A. Sollaud, A scalable speech and audio coding scheme with continuous bitrate flexibility, ICASSP 2004, with fine granularity from 6.4 to 32 kbit/s, or by MPEG-4 CELP coding.
- international application WO 02/060075 discloses an optimized decimation system for converting the broadband to the telephone band.
- the process proposed in the international application WO 01/48931 is in fact a band extension technique which consists in generating a pseudo-wide band signal from a telephone band signal, in particular by extracting a "spectral profile".
- Similar techniques known from the prior art mainly address the problems of switching from the broadband to the telephone band, seeking to avoid band reduction by using a band extension technique, without transmission of information, to generate a broadband signal from the received telephone band signal. It should be noted that these methods do not really seek to control the transition between bandwidths, and that they also have the disadvantage of relying on band extension techniques whose quality is highly variable and which therefore cannot ensure stable output quality.
- a post-processing is performed on the decoding during transitions to simulate a continuous variation of the bandwidth.
- the technical problem to be solved by the object of the present invention is to propose a method of switching the rate at the decoding of an audio signal coded by a multi-rate audio coding system, said decoding comprising at least one step of rate-dependent post-processing, which would make it possible to process the transitions between different rates for which post-processing is used according to the decoding rate, so as to eliminate the particularly sensitive artefacts during rapid rate variations at decoding.
- a post-processing introduces a phase shift on the signal, and the use of two different post-processings raises phase continuity problems during transitions.
- the invention also relates to a computer program comprising code instructions for implementing the method according to the invention when said program is executed by a computer.
- the invention further relates to an application of the method according to the invention to a bit-rate-scalable audio decoding system.
- the invention further relates to an application of the method according to the invention to a bit rate and bandwidth scalable audio decoding system in which the initial bit rate is obtained by at least a first decoding layer in a first frequency band, and the final rate is obtained by at least one second decoding layer, called the extension layer of said first frequency band in a second frequency band, the post-processing step being applied to the decoding performed at the initial rate.
- the invention further relates to an application of the method according to the invention to a bit rate and bandwidth scalable audio decoding system in which the final bit rate is obtained by at least a first decoding layer in a first frequency band, and the initial rate is obtained by at least one second decoding layer, called the extension layer of said first frequency band in a second frequency band, the post-processing step being applied to the decoding performed at the final rate.
- the extended band is the "broadband" defined above, said first band being in this case the telephone band.
- the invention also relates to a multi-rate audio decoder as claimed in claim 10.
- the invention is now described in the context of a scalable audio codec in bit rate and bandwidth.
- the bit-rate- and bandwidth-scalable coding structure considered here has as its core a CELP coder in the telephone band, a particular case of which uses the G.729A coder as described in ITU-T Recommendation G.729, Coding of Speech at 8 kbit/s using Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP), March 1996, and in R. Salami et al., Description of ITU-T Recommendation G.729 Annex A: Reduced Complexity 8 kbit/s CS-ACELP codec, ICASSP 1997.
- To the CELP core coding, three enhancement stages are added, namely a CELP coding enhancement in the telephone band, a band extension and a transform predictive coding.
- the rate switching considered here involves switching from the telephone band to the broadband and vice versa.
- Figure 1 gives a diagram of the encoder used.
- a 50-7000 Hz bandwidth audio signal sampled at 16 kHz is cut into frames of 320 samples, or 20 ms.
- a high-pass filtering 101 of 50 Hz cut-off frequency is applied to the input signal.
- the resulting signal, called S_WB, is reused in several branches of the encoder.
- low-pass filtering and downsampling by two, 102, from 16 to 8 kHz are applied to the signal S_WB.
- This operation makes it possible to obtain a telephone band signal sampled at 8 kHz.
- This signal is processed by the core encoder 103, according to a CELP coding.
- This coding corresponds here to the G.729A coder, which generates the core of the bit stream with a bit rate of 8 kbit / s.
- a first enhancement layer introduces a second CELP coding stage 103.
- This second stage consists of an innovation codebook that enriches the CELP excitation and offers a quality improvement, especially on unvoiced sounds.
- the rate of this second coding stage is 4 kbit/s and the associated parameters are the positions and signs of the pulses as well as the gain of the associated innovation codebook for each subframe of 40 samples (5 ms at 8 kHz).
- the decoding of the core coder and of the first enhancement layer is performed to obtain the 12 kbit/s telephone band synthesis signal 104.
- Upsampling by two from 8 to 16 kHz and low-pass filtering 105 make it possible to obtain the version sampled at 16 kHz of the first two stages of the encoder.
- the third enhancement layer makes it possible to switch to the broadband 106.
- the input signal S WB can be pre-processed by a pre-emphasis filter. This filter makes it possible to better represent the high frequencies from the broadband linear prediction filter. To compensate for the effect of the pre-emphasis filter, a de-emphasis inverse filter is then used in the synthesis. An alternative to this coding and decoding structure will not use any pre-emphasis and de-emphasis filters.
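As an illustration of the pre-emphasis and de-emphasis pair described above, the following minimal sketch uses a first-order filter. The filter form and the coefficient value `MU = 0.68` are assumptions made for the example; the text does not specify either.

```python
MU = 0.68  # hypothetical pre-emphasis coefficient, not given in the text


def pre_emphasis(x, mu=MU):
    """y[n] = x[n] - mu * x[n-1]: raises the high frequencies before LPC analysis."""
    y, prev = [], 0.0
    for v in x:
        y.append(v - mu * prev)
        prev = v
    return y


def de_emphasis(y, mu=MU):
    """x[n] = y[n] + mu * x[n-1]: inverse filter applied in the synthesis."""
    x, prev = [], 0.0
    for v in y:
        prev = v + mu * prev
        x.append(prev)
    return x
```

Applying `de_emphasis` to the output of `pre_emphasis` with the same coefficient returns the original samples, which is the compensation property the text relies on.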
- the next step is to calculate and quantize the broadband linear prediction filters.
- the order of the linear prediction filter is 18, but in a variant a lower prediction order is chosen, for example 16.
- the linear prediction filter can be calculated by the autocorrelation method and the Levinson-Durbin algorithm.
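The autocorrelation method followed by the Levinson-Durbin recursion can be sketched as below. This is a generic textbook form, not the codec's actual implementation: windowing, lag windowing and bandwidth expansion, which practical coders usually add, are left out, and the function names are hypothetical.

```python
def autocorrelation(x, order):
    """Return the autocorrelation values r[0..order] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    Returns (a, err), where a[0] == 1.0 and err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input, keep the coefficients found so far
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

With the order given in the text, a call would look like `levinson_durbin(autocorrelation(frame, 18), 18)`, where `frame` is a hypothetical list of broadband samples.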
- This broadband linear prediction filter A_WB(z) is quantized using a prediction of its coefficients from the filter Â_NB(z) of the telephone band core encoder.
- the coefficients can then be quantized using, for example, a multistage vector quantization and using the LSF (Line Spectrum Frequency) parameters dequantized from the telephone band core encoder, as described in H. Ehara, T. Morii, M. Oshikiri and K. Yoshida, Predictive VQ for scalable bandwidth LSP quantization, ICASSP 2005.
- the broadband excitation is obtained from the parameters of the telephone band excitation of the core encoder: the fundamental period delay or "pitch", the associated gain, as well as the algebraic excitations of the core coder and of the first CELP excitation enrichment layer and the associated gains.
- This excitation is generated by using an oversampled version of the parameters of the excitation of the telephone band stages.
- This broadband excitation is then shaped by the synthesis filter 1/Â_WB(z) calculated previously.
- the de-emphasis filter is applied to the output signal of the synthesis filter.
- the signal obtained is a broadband signal whose energy is not yet adjusted.
- high-pass filtering is applied to the broadband synthesis signal.
- the same high-pass filter is applied to the error signal corresponding to the difference between the original delayed signal and the synthesis signal of the two previous stages.
- This gain is calculated by a ratio of energy between the two signals.
- the quantized gain g_WB is then applied to the signal S_14^WB per subframe of 80 samples (5 ms at 16 kHz); the signal thus obtained is added to the synthesis signal of the preceding stage to create the broadband signal corresponding to the bit rate of 14 kbit/s.
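The gain computation and its per-subframe application can be sketched as follows. The text only states that the gain is "a ratio of energy between the two signals"; taking the square root of that ratio to obtain an amplitude gain is an assumption, gain quantization is omitted, and the names `target` (the high-pass filtered error signal) and `synth` (the high-pass filtered broadband synthesis) are hypothetical.

```python
import math

SUBFRAME = 80  # samples per subframe, 5 ms at 16 kHz as stated in the text


def subframe_gains(target, synth, subframe=SUBFRAME, eps=1e-12):
    """One gain per subframe, from the energy ratio target/synth (assumed sqrt form)."""
    gains = []
    for start in range(0, len(synth), subframe):
        e_target = sum(v * v for v in target[start:start + subframe])
        e_synth = sum(v * v for v in synth[start:start + subframe])
        gains.append(math.sqrt(e_target / (e_synth + eps)))
    return gains


def apply_gains(synth, gains, subframe=SUBFRAME):
    """Scale each subframe of synth by its gain."""
    return [gains[i // subframe] * v for i, v in enumerate(synth)]
```

The small constant `eps` only guards against division by zero on silent subframes; it is not part of the described method.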
- the further coding is performed in the frequency domain using a transform predictive coding scheme.
- TDAC Time Domain Aliasing Cancellation
- a Modified Discrete Cosine Transform (MDCT) is applied, on the one hand, 110, to blocks of 640 samples of the weighted input signal with an overlap of 50% (the MDCT analysis is refreshed every 20 ms) and, on the other hand, 112, to the weighted synthesis signal from the preceding 14 kbit/s band extension stage (same block length and same overlap).
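To make the block structure concrete, here is a direct-form MDCT and inverse MDCT sketch. A block of 640 samples yields 320 coefficients, and a 50% overlap means a hop of 320 samples, that is 20 ms at 16 kHz. The sine window is an assumption (the text does not name the window), and this O(N²) form is for illustration only; real codecs use fast algorithms.

```python
import math


def sine_window(block_len):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for block_len = 2N."""
    return [math.sin(math.pi * (n + 0.5) / block_len) for n in range(block_len)]


def mdct(block):
    """Windowed MDCT: 2N input samples give N coefficients."""
    two_n = len(block)
    n_coef = two_n // 2
    w = sine_window(two_n)
    shift = 0.5 + n_coef / 2.0
    return [
        sum(w[n] * block[n] * math.cos(math.pi / n_coef * (n + shift) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_coef)
    ]


def imdct(coeffs):
    """Windowed inverse MDCT: N coefficients give 2N samples to overlap-add."""
    n_coef = len(coeffs)
    two_n = 2 * n_coef
    w = sine_window(two_n)
    shift = 0.5 + n_coef / 2.0
    return [
        2.0 / n_coef * w[n]
        * sum(coeffs[k] * math.cos(math.pi / n_coef * (n + shift) * (k + 0.5))
              for k in range(n_coef))
        for n in range(two_n)
    ]
```

Overlap-adding the outputs of `imdct` for consecutive blocks taken every N samples cancels the time-domain aliasing, which is the TDAC property named above.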
- the MDCT spectrum to be encoded, 113, corresponds to the difference between the weighted input signal and the 14 kbit/s synthesis signal for the 0 to 3400 Hz band, and to the weighted input signal for the 3400 to 7000 Hz band.
- the spectrum is limited to 7000 Hz by setting the last 40 coefficients to zero (only the first 280 coefficients are coded).
- the spectrum is divided into 18 bands: a band of 8 coefficients and 17 bands of 16 coefficients.
- the energy of the MDCT coefficients is calculated (scale factors).
- the 18 scale factors constitute the spectral envelope of the weighted signal which is then quantized, coded and transmitted in the frame.
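The band split and the scale factors can be sketched as below. The widths (one band of 8 coefficients and 17 bands of 16, covering the 280 coded coefficients) come from the text. Two points are assumptions: the 8-coefficient band is placed first, and each scale factor is taken as the RMS value of the band, since the text only says that the energy of the coefficients is calculated.

```python
import math

BAND_WIDTHS = [8] + [16] * 17  # assumed ordering: the 8-coefficient band comes first


def band_edges(widths=BAND_WIDTHS):
    """Cumulative band boundaries, starting at coefficient 0."""
    edges = [0]
    for w in widths:
        edges.append(edges[-1] + w)
    return edges


def scale_factors(coeffs):
    """RMS value of the MDCT coefficients in each band (assumed definition)."""
    edges = band_edges()
    return [
        math.sqrt(sum(c * c for c in coeffs[lo:hi]) / (hi - lo))
        for lo, hi in zip(edges, edges[1:])
    ]
```

Coefficients beyond index 279 are ignored by `scale_factors`, which matches the limitation of the spectrum to 7000 Hz.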
- Figure 3 shows the format of the bit stream.
- the dynamic bit allocation is based on the energy of the spectrum bands from the dequantized version of the spectral envelope. This ensures that the bit allocations of the encoder and the decoder are compatible.
- the normalized MDCT coefficients (fine structure) in each band are then quantized by vector quantizers using codebooks nested in size and dimension, the codebooks being composed of a union of permutation codes as described in C. Lamblin et al., Vector quantization in variable size and resolution, patent PCT FR 04 00219, 2004.
- the information on the core coder, the CELP enrichment stage in the telephone band, the broadband CELP stage and finally the spectral envelope and the coded normalized coefficients is multiplexed and transmitted in a frame.
- Figure 2 represents a block diagram of the decoder associated with the encoder of Figure 1.
- An inverse MDCT is then applied to the decoded MDCT coefficients 220, and filtering by the weighted synthesis filter 221 provides the output signal.
- Block 205 represents a "cross-fade" module.
- the post-processing 203, 204 in the broad sense, which is part of the G.729A decoder, is applied in the telephone band, before upsampling.
- this post-processing is not activated because, at the encoder, the encoding of the higher stages has been calculated from the version of the telephone band signal without post-processing.
- Post-processing, 203 and 204 introduces a phase shift of the signal.
- a smooth transition must be ensured.
- Figure 4 describes the implementation of block 205, which provides this slow transition between the post-processed and non-post-processed telephone band signals by applying cross-fades.
- Step 401 examines whether the current frame is a telephone band frame or not, that is, whether the current frame rate is 8 or 12 kbit/s.
- a step 402 is called to check whether the previous frame was post-processed or not in the telephone band (which amounts to checking whether the bit rate of the previous frame was 8 or 12 kbit/s or not).
- the non-post-processed signal S1 is copied into the signal S3.
- the signal S3 will contain the result of a cross-fade, where the weight of the non-post-processed component S1 increases while the weight of the post-processed component S2 decreases.
- Step 404 is followed by step 405 which updates the prevPF flag with the value 0.
- in step 406, it is checked whether the post-processing was active or not in the telephone band in the previous frame.
- in step 408, the post-processed signal S2 is copied into the signal S3.
- the signal S3 is calculated, in step 407, as the result of a cross-fade, where this time the weight of the non-post-processed component S1 decreases while the weight of the post-processed component S2 increases.
- step 409 is called to update the prevPF flag with the value 1.
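The decision logic of steps 401 to 409 can be summarized in a short sketch. The step numbers, the signals S1, S2, S3 and the `prevPF` flag come from the text. The linear ramp and the choice of spreading the cross-fade over exactly one frame are assumptions, and the function names are hypothetical.

```python
def crossfade(fade_out, fade_in):
    """Linear cross-fade over one frame: fade_out weight falls, fade_in weight rises."""
    n = len(fade_out)
    return [((n - i) * a + i * b) / n
            for i, (a, b) in enumerate(zip(fade_out, fade_in))]


def select_output(s1, s2, is_telephone_band_rate, prev_pf):
    """Return (s3, new_prev_pf) for one frame.

    s1: non-post-processed signal, s2: post-processed signal,
    is_telephone_band_rate: True for 8 or 12 kbit/s (test of step 401),
    prev_pf: 1 if the previous frame was post-processed, else 0.
    """
    if is_telephone_band_rate:
        if prev_pf:                  # step 406: post-processing already active
            s3 = list(s2)            # step 408: copy S2
        else:
            s3 = crossfade(s1, s2)   # step 407: S1 fades out, S2 fades in
        return s3, 1                 # step 409: prevPF = 1
    if prev_pf:                      # step 402: previous frame was post-processed
        s3 = crossfade(s2, s1)       # step 404: S2 fades out, S1 fades in
    else:
        s3 = list(s1)                # copy S1
    return s3, 0                     # step 405: prevPF = 0
```

Because the flag returned for one frame is passed in for the next, a cross-fade is produced only on the first frame after a change of post-processing state.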
- the effective bandwidth of the final output of the decoder is the telephone band (signal S1).
- a post-processing is applied in the telephone band, before upsampling.
- the post-processing used for rates of 8 or 12 kbit/s and the post-processing used for rates greater than or equal to 14 kbit/s introduce different phase shifts in the signal.
- This slow transition between the telephone band signals with the different post-processings is carried out by applying cross-fades (which give the signal S3).
- the post-processed signal S2 is copied into the signal S3.
- the signal S3 is calculated as the result of a cross-fade, where this time the weight of the post-processed component S1 decreases while the weight of the post-processed component S2 increases.
- Block 209 calculates the broadband linear prediction filters required for the band extension and transform predictive decoding stages. This calculation is necessary when only the telephone band portion of the bit stream of a frame is received after a broadband frame has been received, and a band extension is to be carried out in order to maintain the broadband effect.
- a set of LSFs is extrapolated from the LSFs of the telephone band core decoder. One can, for example, evenly distribute 8 LSFs over the band between the last LSF from the telephone band and the Nyquist frequency. This makes the linear prediction filter tend toward a filter with a flat amplitude response at high frequencies.
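The even distribution of 8 extra LSFs can be sketched as follows. Expressing the LSFs in Hz, using a Nyquist frequency of 8000 Hz (16 kHz sampling) and excluding both end points from the new values are assumptions made for the example; the function name is hypothetical.

```python
def extrapolate_lsf(lsf_nb_hz, n_extra=8, nyquist_hz=8000.0):
    """Append n_extra LSFs evenly spaced between the last input LSF and Nyquist.

    Both end points are excluded, so the result stays strictly increasing
    and strictly below the Nyquist frequency.
    """
    last = lsf_nb_hz[-1]
    step = (nyquist_hz - last) / (n_extra + 1)
    return list(lsf_nb_hz) + [last + step * (k + 1) for k in range(n_extra)]
```

With the 10 LSFs of a tenth-order telephone band filter as input, the result has 18 values, which is consistent with the prediction order of 18 mentioned for the broadband filter.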
- Block 213 performs the gain adaptation used for the band extension according to the present invention.
- the flowcharts corresponding to this block are described in Figures 5 and 7.
- the principle of adaptive attenuation of the gain applied to the high band is described in Figure 5.
- the calculation of the gain of the first broadband decoding layer is done, 501, according to two possibilities.
- the gain is obtained by decoding, 503.
- an extrapolation of the gain associated with this decoding layer is carried out, 502. For example, the gain can be calculated by aligning the energy of the low band of the broadband decoding stage with that of the telephone band decoding actually performed previously.
- a counter of the number of previously received broadband frames is updated, 504, according to the principle described in Figure 7.
- this counter is used to parameterize the attenuation applied to the gain of the first wide band decoding stage, 505.
- the figure 7 represents the flowchart of the count management of the number of received wideband frames.
- the update of the counter is done as follows. If the current frame is a broadband frame, the gain associated with the first broadband decoding stage has been received (block 501 of Figure 5) and the previous frame was also a broadband frame, then the counter is incremented by 1 and saturated at the value MAX_COUNT_RCV. This value corresponds to the number of frames during which the decoded broadband signal will be attenuated when switching from the telephone band to the broadband.
- the counter is set to 0. Otherwise, if the previous frame was a broadband frame and the counter has a value less than MAX_COUNT_RCV, the counter is also set to 0. In all other cases, the counter keeps its previous value.
- Block 219 performs the adaptive attenuation of the transform predictive coding enhancement layers according to the present invention, as described in Figure 6.
- This figure gives the flowchart of the adaptive attenuation procedure of the transform predictive decoding layer. First, it is checked whether the spectral envelope of this layer has been fully received, 601. If so, the MDCT correction coefficients of the low band 0-3500 Hz are attenuated, 602, using the received broadband frame counter and the attenuation table defined in Figure 9.
- the number of received broadband frames is monitored. If this number is less than MAX_COUNT_RCV, the MDCT coefficients corresponding to the first broadband decoding stage with information transmission are used for the transform predictive decoding stage. On the other hand, if the counter has reached its maximum value, the procedure of adjusting the energy of the bands of the transform predictive decoding to the decoded spectral envelope is carried out.
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
Abstract
Description
La présente invention concerne un procédé de commutation de débit au décodage d'un signal audio codé par un système de codage audio multi-débit et plus particulièrement un système de codage audio scalable en débit et éventuellement en largeur de bande. Elle concerne également une application dudit procédé à un système de décodage audio scalable en débit et en largeur de bande et un décodeur audio scalable en débit et en largeur de bande.The present invention relates to a rate switching method for decoding an audio signal coded by a multi-rate audio coding system and more particularly a scalable audio scalability and possibly bandwidth encoding system. It also relates to an application of said method to a bit rate and bandwidth scalable audio decoding system and a bandwidth scalable and scalable audio decoder.
L'invention trouve une application particulièrement avantageuse dans le domaine de la transmission de signaux de parole et/ou audio sur des réseaux de paquets, de type voix sur IP, afin de fournir une qualité modulable en fonction de la capacité du canal de transmission.The invention finds a particularly advantageous application in the field of the transmission of speech and / or audio signals over voice-over-IP packet networks, in order to provide a quality which can be modulated according to the capacity of the transmission channel.
Le procédé selon l'invention permet d'obtenir des transitions sans artefacts entre les différents débits d'un codeur/décodeur (codec) audio scalable en débit et en largeur de bande, ceci plus spécialement dans le cas des transitions entre la bande téléphonique et la bande élargie dans le contexte d'un codage audio scalable en débit et en largeur de bande avec un coeur en bande téléphonique avec un post-traitement dépendant du débit et une ou plusieurs couches d'amélioration en bande élargie.The method according to the invention makes it possible to obtain non-artifact transitions between the different bit rates of a scalable audio encoder / decoder (codec) in bandwidth and bandwidth, especially in the case of transitions between the telephone band and the band. the broadband in the context of scalable bit rate and bandwidth audio coding with a telephone band core with rate dependent post-processing and one or more broadband enhancement layers.
De manière habituelle, on entend par « bande téléphonique » ou « bande étroite » la bande de fréquence située entre 300 et 3400 Hz, tandis que le terme « bande élargie » est réservé à la bande s'étalant de 50 à 7000 Hz.Usually, the term "telephone band" or "narrow band" the frequency band located between 300 and 3400 Hz, while the term "broadband" is reserved for the band spreading from 50 to 7000 Hz.
De nombreuses techniques existent aujourd'hui pour convertir un signal audio-fréquences (parole et/ou audio) sous la forme d'un signal numérique et traiter les signaux ainsi numérisés.Many techniques exist today to convert an audio-frequency signal (speech and / or audio) in the form of a digital signal and process the signals thus digitized.
Les techniques les plus courantes sont les méthodes de « codage de forme d'onde », telles que le codage MIC ou MICDA (PCM ou ADPCM en anglais), les méthodes de « codage paramétrique par analyse par synthèse» comme le codage CELP (« Code Excited Linear Prédiction »), et les méthodes de « codage perceptuel en sous-bandes ou par transformée ». On rappelle qu'en codage CELP en bande étroite, on utilise en général un post-traitement servant à améliorer la qualité. Ce post-traitement comprend typiquement un post-filtrage adaptatif et un filtrage passe-haut. Ces techniques classiques de codage des signaux audio-fréquences sont décrites par exemple dans l'ouvrage de
In conventional speech coding, the encoder generates a fixed-rate bitstream. This fixed-rate constraint simplifies the implementation and use of the encoder and the decoder. Examples of such systems are G.711 coding at 64 kbit/s and G.729 coding at 8 kbit/s.
In some applications, such as mobile telephony, voice over IP, or communications over ad hoc networks, it is preferable to generate a variable-rate bitstream, the rate values being taken from a predefined set. Several multi-rate coding techniques can be distinguished:
- Multi-mode coding controlled by the source and/or the channel, as implemented in the AMR-NB, AMR-WB, SMV, and VMR-WB systems.
- Hierarchical coding, also called "scalable" coding, which generates a bitstream called hierarchical because it comprises a core rate and one or more enhancement layers. The G.722 system at 48, 56 and 64 kbit/s is a simple example of bit-rate-scalable coding. The MPEG-4 CELP codec, for its part, is scalable in bit rate and bandwidth (T. Nomura et al., A bitrate and bandwidth scalable CELP coder, ICASSP 1998).
- Multiple description coding (A. Gersho, J.D. Gibson, V. Cuperman, H. Dong, A multiple description speech coder based on AMR-WB for mobile ad hoc networks, ICASSP 2004).
In multi-rate coding, it is necessary to ensure that switching from one coding rate to another does not produce any defect, or artifact.
Rate switching is easy to achieve if, at all rates, the coding relies on the same coding model representing an audio signal in the same bandwidth. For example, in the AMR-NB system, the signal is defined in the telephone band (300-3400 Hz) and the coding relies on the ACELP (Algebraic Code Excited Linear Prediction) model, except for comfort noise generation, which is nevertheless performed by an LPC (Linear Predictive Coding) model compatible with the ACELP model. Note that AMR-NB coding conventionally uses post-processing in the form of adaptive post-filtering and high-pass filtering, the coefficients of the adaptive post-filter depending on the decoding rate. However, no precaution is taken to handle possible problems arising from post-processing parameters that vary with the rate. By contrast, wideband CELP coding of the AMR-WB type uses no post-processing, essentially for reasons of complexity.
Rate switching is even more problematic in audio coding that is scalable in bit rate and bandwidth. In that case, the coding relies on different models and different bandwidths depending on the rate.
The basic concept of hierarchical audio coding is illustrated, for example, in the article by
Of particular interest here are hierarchical coding techniques that are scalable in bit rate and bandwidth, with a telephone-band CELP-type core encoder and one or more wideband enhancement layers. Examples of such systems are given in H. Taddéi et al., A Scalable Three Bitrate (8, 14.2 and 24 kbit/s) Audio Coder, 107th AES Convention, 1999, with a coarse granularity of 8, 14.2 and 24 kbit/s; in B. Kovesi, D. Massaloux, A. Sollaud, A scalable speech and audio coding scheme with continuous bitrate flexibility, ICASSP 2004, with a fine granularity from 6.4 to 32 kbit/s; and in MPEG-4 CELP coding.
Among the most relevant references on the problem of rate switching in the context of bit-rate- and bandwidth-scalable audio coding are the international applications
However, the techniques described in these two documents address only interoperability problems between communication networks using telephone-band and wideband coding.
In particular, the international application
The method proposed in the international application
The technical problem to be solved by the present invention is therefore to propose a method for switching rate when decoding an audio signal coded by a multi-rate audio coding system, said decoding comprising at least one rate-dependent post-processing step. The method should handle transitions between different rates for which different post-processing is used depending on the decoding rate, so as to eliminate artifacts, which are particularly noticeable during rapid variations of the decoding rate. Indeed, post-processing introduces a phase shift into the signal, and the use of two different post-processing operations causes phase-continuity problems during transitions.
According to the present invention, the solution to the technical problem posed is described in claim 1.
The invention also relates to a computer program comprising code instructions for implementing the method according to the invention when said program is executed by a computer.
The invention further relates to an application of the method according to the invention to a bit-rate-scalable audio decoding system.
The invention further relates to an application of the method according to the invention to an audio decoding system scalable in bit rate and bandwidth, in which the initial rate is obtained by at least a first decoding layer in a first frequency band, and the final rate is obtained by at least a second decoding layer, called the extension layer of said first frequency band into a second frequency band, the post-processing step being applied to the decoding performed at the initial rate.
The invention further relates to an application of the method according to the invention to an audio decoding system scalable in bit rate and bandwidth, in which the final rate is obtained by at least a first decoding layer in a first frequency band, and the initial rate is obtained by at least a second decoding layer, called the extension layer of said first frequency band into a second frequency band, the post-processing step being applied to the decoding performed at the final rate.
A particular example of an "extended band" is the "wideband" defined above, said first band then being the telephone band.
The invention also relates to a multi-rate audio decoder as claimed in claim 10.
The following description, given with reference to the appended drawings, which are provided as non-limiting examples, will make clear what the invention consists of and how it can be carried out.
- Figure 1 is a diagram of a four-layer encoder scalable in bit rate and bandwidth.
- Figure 2 is a diagram of a decoder according to the invention associated with the encoder of Figure 1.
- Figure 3 shows a structure of the bitstream associated with the encoder of Figure 1.
- Figure 4 is a flowchart of a method for switching between a post-processed signal and a non-post-processed signal in the telephone band of the decoder according to the invention.
- Figure 5 is a flowchart of the switching method according to the invention between a telephone band and a wideband with band extension.
- Figure 6 is a flowchart of the switching method according to the invention between a telephone band and a wideband with a predictive transform decoding layer.
- Figure 7 is a flowchart of the management of the count of frames received in wideband for switching between rates and between bands in accordance with the method according to the invention.
- Figure 8 is a table summarizing the operation of the flowchart of Figure 7.
- Figure 9 is a table giving the adaptive attenuation coefficients used when switching from the telephone band to the wideband.
The invention is now described in the context of an audio codec scalable in bit rate and bandwidth. The bit-rate- and bandwidth-scalable coding structure considered here has as its core coding a telephone-band CELP-type encoder, a particular case of which uses the G.729A encoder as described in ITU-T Recommendation G.729, Coding of Speech at 8 kbit/s using Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP), March 1996, and in
Three enhancement stages are added to the CELP core coding: an enhancement of the telephone-band CELP coding, a band extension, and predictive transform coding.
The rate switching considered here concerns switching between the telephone band and the wideband, and vice versa.
La
An audio signal with a useful band of 50-7000 Hz, sampled at 16 kHz, is divided into frames of 320 samples, i.e. 20 ms. A high-pass filter 101 with a cutoff frequency of 50 Hz is applied to the input signal. The resulting signal, called S_WB, is reused in several branches of the encoder.
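The framing described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and dropping a trailing partial frame is an assumption not stated in the description.

```python
SAMPLE_RATE_HZ = 16_000   # sampling rate of the wideband input
FRAME_LEN = 320           # samples per frame, as stated in the description


def frame_duration_ms(frame_len=FRAME_LEN, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_len / sample_rate_hz


def split_into_frames(signal, frame_len=FRAME_LEN):
    """Cut a sequence of samples into consecutive full frames.

    Assumption: a trailing partial frame is simply dropped.
    """
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]
```

With these values, `frame_duration_ms()` gives the 20 ms frame length quoted in the text.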
First, in a first branch, low-pass filtering and downsampling by two, 102, from 16 to 8 kHz are applied to the signal S_WB. This operation yields a telephone-band signal sampled at 8 kHz. This signal is processed by the core encoder 103 using CELP-type coding. This coding corresponds here to the G.729A encoder, which generates the core of the bitstream at a rate of 8 kbit/s.
Next, a first enhancement layer introduces a second CELP coding stage 103. This second stage consists of an innovation codebook that enriches the CELP excitation and improves quality, particularly for unvoiced sounds. The bit rate of this second coding stage is 4 kbit/s, and the associated parameters are the positions and signs of the pulses and the gain of the associated innovation codebook for each sub-frame of 40 samples (5 ms at 8 kHz).
The core encoder and the first enhancement layer are decoded to obtain the telephone-band synthesis signal 104 at 12 kbit/s. Upsampling by two from 8 to 16 kHz and low-pass filtering 105 yield the version, sampled at 16 kHz, of the first two stages of the encoder.
The third enhancement layer provides the switch to wideband 106. The input signal S_WB may be pre-processed by a pre-emphasis filter. This filter allows the high frequencies to be better represented by the wideband linear prediction filter. To compensate for the effect of the pre-emphasis filter, an inverse de-emphasis filter is then used at synthesis. An alternative to this coding and decoding structure uses no pre-emphasis or de-emphasis filter.
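As a rough illustration of a pre-emphasis filter and its inverse, the sketch below assumes a first-order filter 1 - alpha*z^-1. The description specifies neither the filter form nor its coefficient, so both the structure and the default `alpha` are hypothetical.

```python
def pre_emphasis(x, alpha=0.68):
    """Apply y[n] = x[n] - alpha * x[n-1] (assumed first-order form)."""
    y = []
    prev = 0.0
    for sample in x:
        y.append(sample - alpha * prev)
        prev = sample
    return y


def de_emphasis(y, alpha=0.68):
    """Inverse filter: x[n] = y[n] + alpha * x[n-1]."""
    x = []
    prev = 0.0
    for sample in y:
        prev = sample + alpha * prev
        x.append(prev)
    return x
```

Applying `de_emphasis` to the output of `pre_emphasis` with the same coefficient recovers the input up to rounding, which is the compensation property the text relies on.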
The next step consists of computing and quantizing the wideband linear prediction filters. The order of the linear prediction filter is 18, but in a variant a lower prediction order is chosen, for example 16. The linear prediction filter can be computed by the autocorrelation method and the Levinson-Durbin algorithm.
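The autocorrelation method followed by the Levinson-Durbin recursion can be sketched as below. This is a generic textbook version, not the codec's actual routine: it omits windowing and any conditioning of the autocorrelation, and it assumes `r[0]` is non-zero. The sign convention A(z) = 1 + a1*z^-1 + ... is also an assumption.

```python
def autocorrelation(x, order):
    """Autocorrelation r[0..order] of a frame (no window applied)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Return (a, err) with A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    err is the final prediction error energy. Assumes r[0] > 0.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

For the order quoted in the text, one would call `levinson_durbin(autocorrelation(frame, 18), 18)`.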
This wideband linear prediction filter A_WB(z) is quantized using a prediction of its coefficients from the filter Â_NB(z) obtained from the telephone-band core encoder. The coefficients can then be quantized using, for example, multi-stage vector quantization together with the dequantized LSF (Line Spectrum Frequency) parameters of the telephone-band core encoder, as described in
The wideband excitation is obtained from the parameters of the telephone-band excitation of the core encoder: the fundamental period delay or "pitch", the associated gain, and the algebraic excitations of the core encoder and of the first CELP excitation enrichment layer together with their associated gains. This excitation is generated using an oversampled version of the excitation parameters of the telephone-band stages.
This wideband excitation is then shaped by the synthesis filter Â_WB(z) computed previously. If pre-emphasis was applied to the input signal, the de-emphasis filter is applied to the output of the synthesis filter. The resulting signal is a wideband signal whose energy has not been adjusted. To compute the gain that brings the energy of the high band (3400-7000 Hz) to the correct level, high-pass filtering is applied to the wideband synthesis signal. In parallel, the same high-pass filter is applied to the error signal corresponding to the difference between the delayed original signal and the synthesis signal of the two preceding stages. These two signals are then used to compute the gain to be applied to the high-band synthesis signal. This gain is computed as an energy ratio between the two signals. The quantized gain g_WB is then applied to the signal S_14^WB per sub-frame of 80 samples (5 ms at 16 kHz); the signal thus obtained is added to the synthesis signal of the preceding stage to create the wideband signal corresponding to the rate of 14 kbit/s.
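The per-sub-frame gain computation can be illustrated as follows. The description only says the gain comes from an energy ratio; taking the square root of that ratio (so that the scaled signal matches the target energy) is an assumption, as are the function names and the small `eps` guard. Quantization of the gain is left out.

```python
import math

SUBFRAME_LEN = 80  # 5 ms at 16 kHz


def subframe_gains(target_hb, synth_hb, subframe_len=SUBFRAME_LEN, eps=1e-12):
    """One gain per sub-frame from the energy ratio target/synthesis.

    target_hb: high-pass filtered error signal (same length as synth_hb).
    synth_hb:  high-pass filtered wideband synthesis signal.
    """
    gains = []
    for start in range(0, len(synth_hb) - subframe_len + 1, subframe_len):
        t = target_hb[start:start + subframe_len]
        s = synth_hb[start:start + subframe_len]
        e_target = sum(v * v for v in t)
        e_synth = sum(v * v for v in s)
        gains.append(math.sqrt(e_target / (e_synth + eps)))
    return gains


def apply_gains(synth_hb, gains, subframe_len=SUBFRAME_LEN):
    """Scale each sub-frame of the high-band synthesis by its gain."""
    out = []
    for k, g in enumerate(gains):
        out.extend(g * v for v in synth_hb[k * subframe_len:(k + 1) * subframe_len])
    return out
```

A 320-sample frame yields four gains, one per 5 ms sub-frame.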
The rest of the coding is performed in the frequency domain using a predictive transform coding scheme. The delayed input signal 108 and the 14 kbit/s synthesis signal 107 are filtered by a perceptual weighting filter 109, 111 of the type A_WB(z/γ)*(1-µz), typically with γ=0.92 and µ=0.68. These signals are then encoded by the TDAC (Time Domain Aliasing Cancellation) lapped-transform coding scheme (
A modified discrete cosine transform (MDCT) is applied, on the one hand, 110, to blocks of 640 samples of the weighted input signal with 50% overlap (the MDCT analysis is refreshed every 20 ms) and, on the other hand, 112, to the weighted synthesis signal from the preceding 14 kbit/s band extension stage (same block length and same overlap ratio). The MDCT spectrum to be encoded, 113, corresponds to the difference between the weighted input signal and the 14 kbit/s synthesis signal for the band from 0 to 3400 Hz, and to the weighted input signal from 3400 Hz to 7000 Hz. The spectrum is limited to 7000 Hz by setting the last 40 coefficients to zero (only the first 280 coefficients are coded). The spectrum is divided into 18 bands: one band of 8 coefficients and 17 bands of 16 coefficients. For each band of the spectrum, the energy of the MDCT coefficients is computed (scale factors). The 18 scale factors constitute the spectral envelope of the weighted signal, which is then quantized, coded and transmitted in the frame. The
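The band layout and per-band energies described in the paragraph on the MDCT spectrum can be sketched as below. A 640-sample block gives 320 MDCT coefficients covering 0 to 8000 Hz, i.e. 25 Hz per coefficient, so the first 280 coefficients reach 7000 Hz. Placing the 8-coefficient band first and using the plain sum of squares as the band energy are assumptions; the names are hypothetical.

```python
N_MDCT = 320                      # coefficients per 640-sample block
N_CODED = 280                     # coefficients kept (up to 7000 Hz)
BAND_WIDTHS = [8] + [16] * 17     # assumed order: the 8-wide band comes first


def band_edges(widths=BAND_WIDTHS):
    """Return (start, end) coefficient indices for each band."""
    edges = []
    start = 0
    for width in widths:
        edges.append((start, start + width))
        start += width
    return edges


def limit_to_7khz(coeffs, n_coded=N_CODED):
    """Zero every coefficient above the coded range."""
    return list(coeffs[:n_coded]) + [0.0] * (len(coeffs) - n_coded)


def band_energies(coeffs, widths=BAND_WIDTHS):
    """Energy of the MDCT coefficients in each band (sum of squares)."""
    return [sum(c * c for c in coeffs[lo:hi]) for lo, hi in band_edges(widths)]
```

The 18 values returned by `band_energies` play the role of the scale factors before quantization.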
Dynamic bit allocation is based on the energy of the spectral bands, computed from the dequantized version of the spectral envelope. This keeps the bit allocation of the encoder and the decoder consistent. The normalized MDCT coefficients (fine structure) in each band are then quantized by vector quantizers using codebooks nested in size and dimension, the codebooks being composed of a union of permutation codes as described in C. Lamblin et al., Quantification vectorielle en dimension et résolution variables, patent
La
Le module 201 effectue le démultiplexage des paramètres contenus dans le train binaire. Il existe plusieurs cas de décodage en fonction du nombre de bits reçus pour une trame, les quatre cas sont décrits à partir de la
- 1. Le premier concerne la réception du nombre de bits minimum par le décodeur, pour un débit reçu de 8 kbit/s. Dans ce cas, seul le premier étage est décodé. Donc, seul le train binaire relatif au décodeur
coeur 202 de type CELP (G.729A+) est reçu et décodé. Cette synthèse peut être traitée par le post-filtrage adaptatif 203 et le post-traitement de type filtrage passe-haut 204 du décodeur G.729. On appellera dans cet exemple de réalisation « post-traitement » la combinaison de ces deux opérations. Cependant, il est bien clair que le terme de « post-traitement » peut également faire référence uniquement au post-filtrage adaptatif ou au post-traitement de type filtrage passe-haut. Ce signal est sur-échantillonné, 206, et filtré, 207, pour produire un signal échantillonné à 16 kHz. - 2. Le deuxième cas concerne la réception du nombre de bits relatif aux premier et deuxième étages de décodage uniquement, pour un débit reçu de 12 kbit/s. Dans ce cas, le décodeur coeur ainsi que le premier étage d'enrichissement de l'excitation CELP sont décodés. Cette synthèse peut être traitée
203, 204 du décodeur G.729. Comme précédemment, ce signal est ensuite sur-échantillonné, 206, et filtré, 207 pour produire un signal échantillonné à 16 kHz.par le post-traitement - 3. Le troisième cas correspond à la réception du nombre de bits relatifs aux trois premiers étages de décodage, pour un débit reçu de 14 kbit/s. Dans ce cas, les deux premiers étages de décodage sont tout d'abord réalisés comme dans le
cas 2, mis à part le fait que le post-traitement appliqué à la sortie de décodage CELP n'est pas effectué, puis le module d'extension de bande génère un signal échantillonné à 16 kHz après décodage des paramètres des paires de raies spectrales (WB-LSF) en bande élargie, 209, ainsi que des gains associés à l'excitation, 213. L'excitation en bande élargie est générée à partir des paramètres du codeur coeur et du premier étage d'enrichissement de l'excitationCELP 208. Cette excitation est ensuite filtrée par le filtre 210 de synthèse et éventuellement par le filtre 211 de dé-emphase dans le cas où un filtre de pré-emphase a été utilisé au codeur. On applique un filtre passe-haut 212 au signal obtenu et on adapte l'énergie du signal d'extension de bande à l'aide des gains associés 214 toutes les 5 ms. Ce signal est ensuite ajouté au signal en bande téléphonique échantillonné à 16 kHz obtenu à partir des deux premiers étages 215 de décodage. Dans le but d'obtenir un signal limité à 7000 Hz, ce signal est filtré dans le domaine transformé par mise à 0 des 40 derniers coefficients MDCT avant le passage par la MDCT inverse 220 et le filtre de synthèse pondéré 221. - 4. Ce dernier cas correspond au décodage de tous les étages du décodeur, pour un débit reçu supérieur ou égal à 16 kbit/s. Le dernier étage est constitué d'un décodeur prédictif par transformée.
L'étape 3 décrite précédemment est tout d'abord réalisée. Puis, en fonction du nombre de bits supplémentaires reçus, le schéma de décodage e prédictif par transformée est adapté :- * Dans le cas où le nombre de bits ne correspond qu'à une partie ou à la totalité de l'enveloppe spectrale, mais que la structure fine n'est pas reçue, l'enveloppe spectrale partielle ou complète est utilisée pour ajuster l'énergie des bandes de coefficients MDCT, 216
et 217. entre 3400 Hz et 7000Hz 218, correspondant au signalgénéré par l'étage 215 d'extension de bande. Ce système permet d'obtenir une amélioration progressive de la qualité audio en fonction du nombre de bits reçu. - * Dans le cas où le nombre de bits correspond à la totalité de l'enveloppe spectrale et à une partie ou à la totalité de la structure fine, l'allocation binaire est effectuée de la même manière qu'à l'encodeur. Dans les bandes où la structure fine est reçue, les coefficients MDCT décodés sont calculés à partir de l'enveloppe spectrale et de la structure fine déquantifiées. Dans les bandes spectrales entre 3400 Hz et 7000 Hz où la structure fine n'a pas été reçue, la procédure du paragraphe précédent est utilisée, c'est à dire que les coefficients MDCT calculés sur le signal obtenu par l'extension de bande, 216
et 217, sont ajustés en énergie à partir de l'enveloppe spectrale reçue 218. Le spectre MDCT utilisé pour la synthèse est donc constitué, d'une part, du signal de synthèse des deux premiers étages de décodage ajouté au signal d'erreur décodé dans les bandes entre 0 et 3400 Hz; d'autre part, pour les bandes comprises entre 3400 Hz et 7000 Hz des coefficients MDCT décodés dans les bandes où la structure fine a été reçu et des coefficients MDCT de l'étage d'extension de bande ajustés en énergie pour les autres bandes spectrales.
- * Dans le cas où le nombre de bits ne correspond qu'à une partie ou à la totalité de l'enveloppe spectrale, mais que la structure fine n'est pas reçue, l'enveloppe spectrale partielle ou complète est utilisée pour ajuster l'énergie des bandes de coefficients MDCT, 216
- 1. The first concerns the reception of the minimum number of bits by the decoder, for a received bit rate of 8 kbit / s. In this case, only the first stage is decoded. Thus, only the bit stream relating to the CELP core decoder 202 (G.729A +) is received and decoded. This synthesis can be processed by the
adaptive post-filtering 203 and the high-pass filtering post-processing 204 of the G.729 decoder. In this example embodiment, the combination of these two operations is called "post-processing". However, the term "post-processing" can equally refer to the adaptive post-filtering alone or to the high-pass filtering alone. This signal is oversampled, 206, and filtered, 207, to produce a signal sampled at 16 kHz.
- 2. The second case concerns the reception of the bits of the first and second decoding stages only, for a received bit rate of 12 kbit/s. In this case, the core decoder and the first enhancement stage of the CELP excitation are decoded. This synthesis can be processed by the post-processing 203, 204 of the G.729 decoder. As before, this signal is then oversampled, 206, and filtered, 207, to produce a signal sampled at 16 kHz.
- 3. The third case corresponds to receiving the bits of the first three decoding stages, for a received bit rate of 14 kbit/s. In this case, the first two decoding stages are first performed as in case 2, except that the post-processing is not applied to the CELP decoding output. The band-extension module then generates a signal sampled at 16 kHz after decoding the wideband line spectral frequency (WB-LSF) parameters, 209, and the gains associated with the excitation, 213. The wideband excitation is generated from the parameters of the core coder and of the first enhancement stage of the CELP excitation 208. This excitation is then filtered by the synthesis filter 210 and, where a pre-emphasis filter was used at the encoder, by the de-emphasis filter 211. A high-pass filter 212 is applied to the resulting signal, and the energy of the band-extension signal is adjusted with the associated gains 214 every 5 ms. This signal is then added to the telephone-band signal sampled at 16 kHz obtained from the first two decoding stages, 215. To obtain a signal limited to 7000 Hz, this signal is filtered in the transform domain by setting the last 40 MDCT coefficients to 0 before passing through the inverse MDCT 220 and the weighted synthesis filter 221.
- 4. This last case corresponds to decoding all the stages of the decoder, for a received bit rate greater than or equal to 16 kbit/s. The last stage consists of a transform predictive decoder. Step 3 described above is performed first. Then, depending on the number of additional bits received, the transform predictive decoding scheme is adapted:
- * If the bits received correspond only to part or all of the spectral envelope, and the fine structure is not received, the partial or complete spectral envelope is used to adjust the energy of the bands of MDCT coefficients, 216 and 217, between 3400 Hz and 7000 Hz, 218, corresponding to the signal generated by the band extension stage 215. This system provides a gradual improvement in audio quality as a function of the number of bits received.
- * If the bits received correspond to the whole of the spectral envelope and to part or all of the fine structure, the bit allocation is carried out in the same way as at the encoder. In the bands where the fine structure is received, the decoded MDCT coefficients are computed from the dequantized spectral envelope and fine structure. In the spectral bands between 3400 Hz and 7000 Hz where the fine structure has not been received, the procedure of the preceding paragraph is used, that is, the MDCT coefficients calculated on the signal obtained by the band extension, 216 and 217, are adjusted in energy from the received spectral envelope 218.
The MDCT spectrum used for the synthesis thus consists, on the one hand, of the synthesis signal of the first two decoding stages added to the decoded error signal in the bands between 0 and 3400 Hz and, on the other hand, for the bands between 3400 Hz and 7000 Hz, of the decoded MDCT coefficients in the bands where the fine structure has been received and of the energy-adjusted MDCT coefficients of the band extension stage in the other spectral bands.
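The energy adjustment of the band-extension MDCT coefficients from the received spectral envelope can be illustrated with a minimal sketch. It is not the patented implementation: the function name `adjust_band_energy`, the band boundaries, and the choice to express the envelope as one RMS level per band are all assumptions made for illustration.

```python
import math

def adjust_band_energy(coeffs, bands, target_rms):
    """Rescale each band of MDCT coefficients to a target RMS level.

    coeffs:     MDCT coefficients computed on the band-extension signal
    bands:      (start, stop) index pairs, one per band (hypothetical layout)
    target_rms: decoded envelope value per band, assumed here to be an RMS level
    """
    out = list(coeffs)
    for (start, stop), target in zip(bands, target_rms):
        energy = sum(c * c for c in out[start:stop])
        if energy == 0.0:
            continue  # a silent band cannot be rescaled
        # Gain that brings the band's RMS from sqrt(energy / width) to target.
        gain = target * math.sqrt((stop - start) / energy)
        for k in range(start, stop):
            out[k] *= gain
    return out

# Two toy bands of two coefficients each.
adjusted = adjust_band_energy([1.0, 1.0, 2.0, 2.0], [(0, 2), (2, 4)], [2.0, 1.0])
print(adjusted)  # [2.0, 2.0, 1.0, 1.0]
```

Bands for which no envelope value was received would simply be left out of `bands`, which matches the gradual improvement described in the text: the more of the envelope arrives, the more bands are corrected.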
An inverse MDCT is then applied to the decoded MDCT coefficients, 220, and filtering by the weighted synthesis filter 221 yields the output signal.
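The four reception cases of the decoder can be summarized as a small lookup. This is only a sketch: `DecodingPlan` and `decoding_plan` are hypothetical names, and treating a rate that falls between two of the listed values as the lower case is an assumption, since the text only mentions 8, 12, 14 and 16 kbit/s or more.

```python
from typing import NamedTuple

class DecodingPlan(NamedTuple):
    stages: int            # number of decoding stages that are run
    post_processing: bool  # whether post-processing 203, 204 is applied
    output_band: str       # "telephone" or "wideband"

def decoding_plan(rate_kbps):
    """Map a received bit rate to the decoding case described in the text."""
    if rate_kbps >= 16:
        return DecodingPlan(4, False, "wideband")   # all stages
    if rate_kbps >= 14:
        return DecodingPlan(3, False, "wideband")   # band extension added
    if rate_kbps >= 12:
        return DecodingPlan(2, True, "telephone")   # core + CELP enhancement
    if rate_kbps >= 8:
        return DecodingPlan(1, True, "telephone")   # core decoder only
    raise ValueError("rate below the core decoder rate")

print(decoding_plan(14))
```

The change in `post_processing` between the 12 and 14 kbit/s cases is the point at which the output switches between a post-processed and a non-post-processed telephone-band signal.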
The switching method according to the invention will now be described in the context of the decoder of
Block 205 represents a "cross-fade" module. When the number of bits received by the decoder allows only the first stage, or the first and second stages, to be decoded, i.e. for a received bit rate of 8 or 12 kbit/s, the effective bandwidth of the final decoder output is the telephone band. In these cases, to improve the quality of the synthesized signal, the post-processing 203, 204 in the broad sense, which forms part of the G.729A decoder, is applied in the telephone band, before oversampling.
On the other hand, if the wideband stages are also decoded, for a received bit rate greater than or equal to 14 kbit/s, this post-processing is not activated because, at the encoder, the encoding of the higher stages was computed from the version of the telephone band without post-processing.
The post-processing, 203 and 204, introduces a phase shift in the signal. A smooth transition must therefore be ensured when switching between the modes without and with post-processing.
Step 401 examines whether the current frame is a telephone-band frame, that is, whether the bit rate of the current frame is 8 or 12 kbit/s. If not, step 402 checks whether the previous frame was post-processed in the telephone band (which amounts to checking whether the bit rate of the previous frame was 8 or 12 kbit/s). If not, in step 403 the non-post-processed signal S1 is copied into signal S3. If, on the contrary, the answer to test 402 is positive, in step 404 signal S3 contains the result of a cross-fade in which the weight of the non-post-processed component S1 increases while the weight of the post-filtered component S2 decreases. Step 404 is followed by step 405, which updates the flag prevPF to the value 0.
If the answer at step 401 is positive, step 406 checks whether post-processing was active in the telephone band in the previous frame. If so, in step 408 the post-processed signal S2 is copied into signal S3. If, on the contrary, the answer at step 406 is negative, signal S3 is computed in step 407 as the result of a cross-fade in which, this time, the weight of the non-post-processed component S1 decreases while the weight of the post-processed component S2 increases. After step 407, step 409 is called to update the flag prevPF to the value 1.
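Steps 401 to 409 can be sketched as a per-frame function. This is an illustration under stated assumptions, not the patented implementation: the linear weighting ramp over one frame and the names `crossfade` and `switch_output` are hypothetical, since the text specifies only which weight rises and which falls.

```python
def crossfade(fade_out, fade_in):
    """Linear cross-fade over one frame (the ramp shape is an assumption).

    The weight of fade_out falls from 1 while the weight of fade_in rises from 0.
    """
    n = len(fade_out)
    return [((n - i) * a + i * b) / n
            for i, (a, b) in enumerate(zip(fade_out, fade_in))]

def switch_output(s1, s2, is_telephone_band, prev_pf):
    """Return (S3, new prevPF) for one frame.

    s1: non-post-processed signal, s2: post-processed signal,
    prev_pf: 1 if the previous frame was post-processed, else 0.
    """
    if not is_telephone_band:        # step 401 answered "no"
        if prev_pf:                  # step 402: previous frame was post-processed
            s3 = crossfade(s2, s1)   # step 404: S1 weight up, S2 weight down
        else:
            s3 = list(s1)            # step 403: plain copy of S1
        return s3, 0                 # step 405 (flag is already 0 after step 403)
    if prev_pf:                      # step 406: post-processing was active
        s3 = list(s2)                # step 408: plain copy of S2
    else:
        s3 = crossfade(s1, s2)       # step 407: S1 weight down, S2 weight up
    return s3, 1                     # step 409 (flag is already 1 after step 408)

# Toy frame of four samples: S1 is all zeros, S2 is all fours.
s3, prev_pf = switch_output([0.0] * 4, [4.0] * 4, True, 0)
print(s3, prev_pf)  # [0.0, 1.0, 2.0, 3.0] 1
```

Returning the flag instead of mutating shared state keeps the sketch easy to test; a decoder would carry `prev_pf` from one frame to the next.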
In a variant of this embodiment, when the number of bits received by the decoder allows only the first stage, or the first and second stages, to be decoded, i.e. for a received bit rate of 8 or 12 kbit/s, the effective bandwidth of the final decoder output is the telephone band (signal S1). In these cases, to improve the quality of the synthesized signal, a post-processing is applied in the telephone band, before oversampling.
On the other hand, if the wideband stages are also decoded, for a received bit rate greater than or equal to 14 kbit/s, a different post-processing is activated (signal S2); at the encoder, the encoding of the higher stages was computed from the version of the telephone band with this post-processing.
The post-processing used for the rates of 8 or 12 kbit/s and the post-processing used for rates greater than or equal to 14 kbit/s introduce different phase shifts in the signal. A smooth transition must therefore be ensured when switching between modes with the different post-processing operations. This slow transition between the telephone-band signals with the different post-processing operations is achieved by applying cross-fades (which give the signal S3).
It is examined whether the current frame is a telephone-band frame. If not, it is checked whether the previous frame was a telephone-band frame. If not, the post-processed signal S1 is copied into signal S3. If so, signal S3 contains the result of a cross-fade in which the weight of the post-processed component S1 increases while the weight of the post-processed component S2 decreases.
If the current frame is a telephone-band frame, it is checked whether the previous frame was a telephone-band frame. If so, the post-processed signal S2 is copied into signal S3. If not, signal S3 is computed as the result of a cross-fade in which, this time, the weight of the post-processed component S1 decreases while the weight of the post-processed component S2 increases.
Block 209 computes the wideband linear prediction filters needed by the band-extension and transform predictive decoding stages. This computation is needed when only the telephone-band part of a frame's bitstream is received after a wideband frame has been received, and a band extension is to be carried out in order to maintain the band effect. A set of LSFs is extrapolated from the LSFs of the telephone-band core decoder. For example, 8 LSFs can be distributed uniformly over the band between the last LSF from the telephone band and the Nyquist frequency. This makes the linear prediction filter tend towards a filter with a flat amplitude response at high frequencies.
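The uniform LSF extrapolation given as an example can be sketched as follows. The function name `extrapolate_lsf`, the use of frequencies in Hz, and the choice to place the new LSFs strictly between the two end points are assumptions; the Nyquist frequency of 8000 Hz follows from the 16 kHz sampling rate mentioned earlier.

```python
def extrapolate_lsf(telephone_band_lsf_hz, nyquist_hz=8000.0, n_extra=8):
    """Append n_extra LSFs spread evenly between the last LSF and Nyquist.

    The end points themselves are excluded (an assumption), so the new
    values keep the LSF set strictly increasing and below Nyquist.
    """
    last = telephone_band_lsf_hz[-1]
    step = (nyquist_hz - last) / (n_extra + 1)
    extra = [last + step * (k + 1) for k in range(n_extra)]
    return list(telephone_band_lsf_hz) + extra

# Toy input with only three telephone-band LSFs.
print(extrapolate_lsf([500.0, 1500.0, 3500.0]))
```

Evenly spaced LSFs correspond to a spectral envelope without pronounced resonances, which is why the extended filter tends towards a flat amplitude response in the high band.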
Block 213 adapts the gain used for the band extension according to the present invention. The flowcharts corresponding to this block are described in
The principle of the adaptive attenuation of the gain applied to the high band is described in
Next, a counter of the number of previously received wideband frames is updated, 504, according to the principle described in
Finally, this counter is used to parameterize the attenuation applied to the gain of the first wideband decoding stage, 505.
On the other hand, if the current received frame is a telephone-band frame, several behaviors are possible. If the previous frame was also a telephone-band frame, the counter is set to 0. Otherwise, if the previous frame was a wideband frame and the counter has a value less than MAX_COUNT_RCV, the counter is also set to 0. In all other cases, the counter keeps its previous value.
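The counter rules for a telephone-band frame can be sketched as below. Two points are assumptions rather than statements from the text: the numeric value of `MAX_COUNT_RCV`, which is not given here, and the branch for a wideband current frame, whose description is missing from this passage and is filled in with a saturating increment purely for illustration.

```python
MAX_COUNT_RCV = 100  # hypothetical value; the text does not give it

def update_counter(count, current_is_wideband, previous_is_wideband):
    """Update the counter of previously received wideband frames."""
    if current_is_wideband:
        # ASSUMPTION: this branch is not described in the passage;
        # a saturating increment is used as a placeholder.
        return min(count + 1, MAX_COUNT_RCV)
    if not previous_is_wideband:
        return 0      # two telephone-band frames in a row
    if count < MAX_COUNT_RCV:
        return 0      # too few wideband frames were received
    return count      # all other cases: keep the previous value

print(update_counter(MAX_COUNT_RCV, False, True))
```

With these rules, the counter survives a single telephone-band frame only when it had already reached its maximum.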
The operation of this flowchart is summarized in the table of
Block 219 performs the adaptive attenuation of the transform predictive coding enhancement layers according to the present invention, as described in
This figure gives the flowchart of the adaptive attenuation procedure for the transform predictive decoding layer. First, it is checked whether the spectral envelope of this layer has been fully received, 601. If so, the correction MDCT coefficients of the low band 0-3500 Hz are attenuated, 602, using the counter of received wideband frames and the attenuation table defined in
Then, in both cases, the number of received wideband frames is checked. If this number is less than MAX_COUNT_RCV, the MDCT coefficients corresponding to the first wideband decoding stage, band extension with transmitted information, are used for the transform predictive decoding stage. If, on the other hand, the counter has reached the maximum value, the energy of the bands of the transform predictive decoding is adjusted using the decoded spectral envelope.
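The two decisions driven by the wideband frame counter can be sketched as follows. The attenuation table is not reproduced in this passage, so it is passed in by the caller, and indexing it directly by the counter is an assumption; the names `attenuate_low_band` and `high_band_source` are hypothetical.

```python
def attenuate_low_band(correction, count, table):
    """Scale the low-band correction MDCT coefficients (step 602).

    table: attenuation factors supplied by the caller; indexing it by the
    counter, clamped to the last entry, is an assumption.
    """
    gain = table[min(count, len(table) - 1)]
    return [gain * c for c in correction]

def high_band_source(count, max_count_rcv):
    """Choose which high-band MDCT coefficients feed the transform stage."""
    if count < max_count_rcv:
        return "band_extension"     # coefficients of the first wideband stage
    return "envelope_adjusted"      # energy adjusted with the decoded envelope

# Toy table with three attenuation factors.
print(attenuate_low_band([2.0, 4.0], 1, [0.0, 0.5, 1.0]))  # [1.0, 2.0]
print(high_band_source(3, 10))  # band_extension
```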
Claims (15)
- Method for switching rate when decoding an audio signal coded by a multirate audio coding system, characterized in that, from a decoded signal, two signals, called first signal (S1) and second signal (S2), are supplied to the input of a cross-fading module, at least one of the signals being post-processed in a post-processing step, the post-processing forming part of a set of post-processing operations suited to different sets of rates, and in that:
  - upon the detection (401, 406) of a rate switch between a current frame at a rate lying within a first set of rates and a preceding frame at a rate lying within a second set of rates, a cross-fading step (407) is performed by weighting, by reducing the weight of the second signal, post-processed or not, according to the post-processing suited to the second set of rates and by increasing the weight of the first signal, post-processed or not, according to the post-processing suited to the first set of rates, to obtain an output signal (S3); and
  - upon the detection (401, 402) of a rate switch between a current frame at a rate lying within a second set of rates and a preceding frame at a rate lying within a first set of rates, the rates of the first set being greater than those of the second set, a cross-fading step (404) is performed by weighting, by reducing the weight of the first signal, post-processed or not, according to the post-processing suited to the first set of rates and by increasing the weight of the second signal, post-processed or not, according to the post-processing suited to the second set of rates, to obtain an output signal (S3).
- Method according to Claim 1, characterized in that one of the post-processing operations is a high-pass filtering (204).
- Method according to Claim 1, characterized in that one of the post-processing operations is an adaptive post-filtering (203).
- Method according to Claim 1, characterized in that one of the post-processing operations is a combination of a high-pass filtering and an adaptive post-filtering.
- Method according to Claim 1, characterized in that a single signal at the input of the cross-fading module is post-processed.
- Method according to Claim 1, characterized in that the two signals at the input of the cross-fading module are post-processed with different post-processing operations suited to different sets of rates.
- Computer program comprising code instructions for implementing the method according to any one of Claims 1 to 6 when said program is run by a computer.
- Application of the method according to any one of Claims 1 to 6 to a rate-scalable audio decoding system.
- Application of the method according to any one of Claims 1 to 6 to a rate- and bandwidth-scalable audio decoding system in which a first rate is obtained by at least a first decoding layer in a first frequency band, and a second rate is obtained by a second decoding layer, called extension layer of said first frequency band, in a second frequency band.
- Multirate audio decoder, characterized in that it comprises a cross-fading module (205) receiving as input a first signal (S1) and a second signal (S2) obtained from a decoded signal, at least one of the two signals having undergone a post-processing (203, 204) from a set of post-processing operations suited to different sets of rates, the cross-fading module being able:
  - upon the detection (401, 406) of a rate switch between a current frame at a rate lying within a first set of rates and a preceding frame at a rate lying within a second set of rates, the rates of the first set being greater than those of the second set, to perform a cross-fading (407) by weighting, by reducing the weight of the second signal, post-processed or not, according to the post-processing operation suited to the second set of rates and by increasing the weight of the first signal, post-processed or not, according to the post-processing operation suited to the first set of rates, to obtain the output signal (S3) from the cross-fading module; and
  - upon the detection (401, 402) of a rate switch between a current frame at a rate lying within a second set of rates and a preceding frame at a rate lying within a first set of rates, to perform a cross-fading (404) by weighting, by reducing the weight of the first signal, post-processed or not, according to the post-processing operation suited to the first set of rates and by increasing the weight of the second signal, post-processed or not, according to the post-processing operation suited to the second set of rates, to obtain the output signal (S3) from the cross-fading module.
- Decoder according to Claim 10, characterized in that at least one of the post-processing operations is a high-pass filtering.
- Decoder according to Claim 10, characterized in that at least one of the post-processing operations is an adaptive post-filtering.
- Decoder according to Claim 10, characterized in that at least one of the post-processing operations is a combination of a high-pass filtering and an adaptive post-filtering.
- Decoder according to Claim 10, characterized in that a single signal at the input of the cross-fading module is post-processed.
- Decoder according to Claim 10, characterized in that the two signals at the input of the cross-fading module are post-processed with different post-processing operations suited to different sets of rates.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0552286 | 2005-07-22 | ||
PCT/FR2006/050697 WO2007010158A2 (en) | 2005-07-22 | 2006-07-10 | Method for switching rate- and bandwidth-scalable audio decoding rate |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1907812A2 (en) | 2008-04-09 |
EP1907812B1 (en) | 2010-12-01 |
Family
ID=36177265
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP06779036A Not-in-force EP1907812B1 (en) | 2005-07-22 | 2006-07-10 | Method for switching rate- and bandwidth-scalable audio decoding rate |
Country Status (10)
Country | Link |
---|---|
US (1) | US8630864B2 (en) |
EP (1) | EP1907812B1 (en) |
JP (1) | JP5009910B2 (en) |
KR (1) | KR101295729B1 (en) |
CN (1) | CN101263554B (en) |
AT (1) | ATE490454T1 (en) |
DE (1) | DE602006018618D1 (en) |
ES (1) | ES2356492T3 (en) |
RU (1) | RU2419171C2 (en) |
WO (1) | WO2007010158A2 (en) |
Families Citing this family (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7461106B2 (en) * | 2006-09-12 | 2008-12-02 | Motorola, Inc. | Apparatus and method for low complexity combinatorial coding of signals |
EP2096632A4 (en) * | 2006-11-29 | 2012-06-27 | Panasonic Corp | Decoding apparatus and audio decoding method |
JP5255575B2 (en) * | 2007-03-02 | 2013-08-07 | テレフオンアクチーボラゲット エル エム エリクソン(パブル) | Post filter for layered codec |
RU2463674C2 (en) * | 2007-03-02 | 2012-10-10 | Панасоник Корпорэйшн | Encoding device and encoding method |
JP5377287B2 (en) * | 2007-03-02 | 2013-12-25 | パナソニック株式会社 | Post filter, decoding device, and post filter processing method |
US8576096B2 (en) * | 2007-10-11 | 2013-11-05 | Motorola Mobility Llc | Apparatus and method for low complexity combinatorial coding of signals |
US8209190B2 (en) * | 2007-10-25 | 2012-06-26 | Motorola Mobility, Inc. | Method and apparatus for generating an enhancement layer within an audio coding system |
EP2207166B1 (en) * | 2007-11-02 | 2013-06-19 | Huawei Technologies Co., Ltd. | An audio decoding method and device |
US9872066B2 (en) * | 2007-12-18 | 2018-01-16 | Ibiquity Digital Corporation | Method for streaming through a data service over a radio link subsystem |
DE102008009720A1 (en) * | 2008-02-19 | 2009-08-20 | Siemens Enterprise Communications Gmbh & Co. Kg | Method and means for decoding background noise information |
US20090234642A1 (en) * | 2008-03-13 | 2009-09-17 | Motorola, Inc. | Method and Apparatus for Low Complexity Combinatorial Coding of Signals |
US8639519B2 (en) * | 2008-04-09 | 2014-01-28 | Motorola Mobility Llc | Method and apparatus for selective signal coding based on core encoder performance |
PL2304719T3 (en) * | 2008-07-11 | 2017-12-29 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoder, methods for providing an audio stream and computer program |
US20100057473A1 (en) * | 2008-08-26 | 2010-03-04 | Hongwei Kong | Method and system for dual voice path processing in an audio codec |
US20100063825A1 (en) * | 2008-09-05 | 2010-03-11 | Apple Inc. | Systems and Methods for Memory Management and Crossfading in an Electronic Device |
US9773505B2 (en) * | 2008-09-18 | 2017-09-26 | Electronics And Telecommunications Research Institute | Encoding apparatus and decoding apparatus for transforming between modified discrete cosine transform-based coder and different coder |
US8175888B2 (en) | 2008-12-29 | 2012-05-08 | Motorola Mobility, Inc. | Enhanced layered gain factor balancing within a multiple-channel audio coding system |
US8200496B2 (en) * | 2008-12-29 | 2012-06-12 | Motorola Mobility, Inc. | Audio signal decoder and method for producing a scaled reconstructed audio signal |
US8140342B2 (en) * | 2008-12-29 | 2012-03-20 | Motorola Mobility, Inc. | Selective scaling mask computation based on peak detection |
US8219408B2 (en) * | 2008-12-29 | 2012-07-10 | Motorola Mobility, Inc. | Audio signal decoder and method for producing a scaled reconstructed audio signal |
KR101622950B1 (en) * | 2009-01-28 | 2016-05-23 | 삼성전자주식회사 | Method of coding/decoding audio signal and apparatus for enabling the method |
FR2947944A1 (en) * | 2009-07-07 | 2011-01-14 | France Telecom | PERFECTED CODING / DECODING OF AUDIONUMERIC SIGNALS |
US8428936B2 (en) * | 2010-03-05 | 2013-04-23 | Motorola Mobility Llc | Decoder for audio signal including generic audio and speech frames |
US8423355B2 (en) * | 2010-03-05 | 2013-04-16 | Motorola Mobility Llc | Encoder for audio signal including generic audio and speech frames |
US8886523B2 (en) * | 2010-04-14 | 2014-11-11 | Huawei Technologies Co., Ltd. | Audio decoding based on audio class with control code for post-processing modes |
US9047875B2 (en) * | 2010-07-19 | 2015-06-02 | Futurewei Technologies, Inc. | Spectrum flatness control for bandwidth extension |
JP5489900B2 (en) * | 2010-07-27 | 2014-05-14 | ヤマハ株式会社 | Acoustic data communication device |
NO2669468T3 (en) * | 2011-05-11 | 2018-06-02 | ||
RU2480904C1 (en) * | 2012-06-01 | 2013-04-27 | Анна Валерьевна Хуторцева | Method for combined filtering and differential pulse-code modulation/demodulation of signals |
CN103516440B (en) | 2012-06-29 | 2015-07-08 | 华为技术有限公司 | Audio signal processing method and encoding device |
US9129600B2 (en) | 2012-09-26 | 2015-09-08 | Google Technology Holdings LLC | Method and apparatus for encoding an audio signal |
ES2688021T3 (en) * | 2012-12-21 | 2018-10-30 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Adding comfort noise to model background noise at low bit rates |
RU2639952C2 (en) | 2013-08-28 | 2017-12-25 | Долби Лабораторис Лайсэнзин Корпорейшн | Hybrid speech amplification with signal form coding and parametric coding |
WO2015163750A2 (en) * | 2014-04-21 | 2015-10-29 | 삼성전자 주식회사 | Device and method for transmitting and receiving voice data in wireless communication system |
KR102244612B1 (en) | 2014-04-21 | 2021-04-26 | 삼성전자주식회사 | Appratus and method for transmitting and receiving voice data in wireless communication system |
US10049684B2 (en) * | 2015-04-05 | 2018-08-14 | Qualcomm Incorporated | Audio bandwidth selection |
CA3074750A1 (en) | 2017-09-20 | 2019-03-28 | Voiceage Corporation | Method and device for efficiently distributing a bit-budget in a celp codec |
RU2744485C1 (en) | 2017-10-27 | 2021-03-10 | Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. | Noise reduction in the decoder |
JPWO2022009505A1 (en) * | 2020-07-07 | 2022-01-13 |
Family Cites Families (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0728494A (en) * | 1993-07-09 | 1995-01-31 | Nippon Steel Corp | Method and device for decoding compression-encoded voice signal |
US5732389A (en) * | 1995-06-07 | 1998-03-24 | Lucent Technologies Inc. | Voiced/unvoiced classification of speech for excitation codebook selection in celp speech decoding during frame erasures |
US5699485A (en) * | 1995-06-07 | 1997-12-16 | Lucent Technologies Inc. | Pitch delay modification during frame erasures |
US7145898B1 (en) * | 1996-11-18 | 2006-12-05 | Mci Communications Corporation | System, method and article of manufacture for selecting a gateway of a hybrid communication system architecture |
US6904110B2 (en) * | 1997-07-31 | 2005-06-07 | Francois Trans | Channel equalization system and method |
FI980132A (en) * | 1998-01-21 | 1999-07-22 | Nokia Mobile Phones Ltd | Adaptive post-filter |
JP2000259195A (en) * | 1999-01-08 | 2000-09-22 | Matsushita Electric Ind Co Ltd | Decode circuit and reproducing device using the same |
JP2000267686A (en) * | 1999-03-19 | 2000-09-29 | Victor Co Of Japan Ltd | Signal transmission system and decoding device |
US6496794B1 (en) * | 1999-11-22 | 2002-12-17 | Motorola, Inc. | Method and apparatus for seamless multi-rate speech coding |
GB2357682B (en) | 1999-12-23 | 2004-09-08 | Motorola Ltd | Audio circuit and method for wideband to narrowband transition in a communication device |
FI115329B (en) * | 2000-05-08 | 2005-04-15 | Nokia Corp | Method and arrangement for switching the source signal bandwidth in a communication connection equipped for many bandwidths |
JP2003050598A (en) * | 2001-08-06 | 2003-02-21 | Mitsubishi Electric Corp | Voice decoding device |
CA2388439A1 (en) * | 2002-05-31 | 2003-11-30 | Voiceage Corporation | A method and device for efficient frame erasure concealment in linear predictive based speech codecs |
US6590833B1 (en) * | 2002-08-08 | 2003-07-08 | The United States Of America As Represented By The Secretary Of The Navy | Adaptive cross correlator |
US7502743B2 (en) * | 2002-09-04 | 2009-03-10 | Microsoft Corporation | Multi-channel audio encoding and decoding with multi-channel transform selection |
DE602005022641D1 (en) * | 2004-03-01 | 2010-09-09 | Dolby Lab Licensing Corp | Multi-channel audio decoding |
US7668712B2 (en) * | 2004-03-31 | 2010-02-23 | Microsoft Corporation | Audio encoding and decoding with intra frames and adaptive forward error correction |
JP5618826B2 (en) * | 2007-06-14 | 2014-11-05 | ヴォイスエイジ・コーポレーション | Apparatus and method for compensating for frame loss in a PCM codec interoperable with ITU-T Recommendation G.711 |
US8560307B2 (en) * | 2008-01-28 | 2013-10-15 | Qualcomm Incorporated | Systems, methods, and apparatus for context suppression using receivers |
EP2311271B1 (en) * | 2008-07-29 | 2014-09-03 | Dolby Laboratories Licensing Corporation | Method for adaptive control and equalization of electroacoustic channels |
US8924222B2 (en) * | 2010-07-30 | 2014-12-30 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for coding of harmonic signals |
-
2006
- 2006-07-10 DE DE602006018618T patent/DE602006018618D1/en active Active
- 2006-07-10 EP EP06779036A patent/EP1907812B1/en not_active Not-in-force
- 2006-07-10 WO PCT/FR2006/050697 patent/WO2007010158A2/en active Application Filing
- 2006-07-10 RU RU2008106750/09A patent/RU2419171C2/en not_active IP Right Cessation
- 2006-07-10 AT AT06779036T patent/ATE490454T1/en not_active IP Right Cessation
- 2006-07-10 US US11/989,313 patent/US8630864B2/en not_active Expired - Fee Related
- 2006-07-10 JP JP2008522028A patent/JP5009910B2/en not_active Expired - Fee Related
- 2006-07-10 CN CN2006800338079A patent/CN101263554B/en not_active Expired - Fee Related
- 2006-07-10 KR KR1020087004177A patent/KR101295729B1/en not_active IP Right Cessation
- 2006-07-10 ES ES06779036T patent/ES2356492T3/en active Active
Also Published As
Publication number | Publication date |
---|---|
ES2356492T3 (en) | 2011-04-08 |
KR20080033997A (en) | 2008-04-17 |
RU2008106750A (en) | 2009-08-27 |
KR101295729B1 (en) | 2013-08-12 |
RU2419171C2 (en) | 2011-05-20 |
JP2009503559A (en) | 2009-01-29 |
DE602006018618D1 (en) | 2011-01-13 |
US8630864B2 (en) | 2014-01-14 |
EP1907812A2 (en) | 2008-04-09 |
US20090306992A1 (en) | 2009-12-10 |
JP5009910B2 (en) | 2012-08-29 |
CN101263554A (en) | 2008-09-10 |
WO2007010158A2 (en) | 2007-01-25 |
WO2007010158A3 (en) | 2007-05-10 |
ATE490454T1 (en) | 2010-12-15 |
CN101263554B (en) | 2011-12-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1907812B1 (en) | Method for switching rate- and bandwidth-scalable audio decoding rate | |
EP1905010B1 (en) | Hierarchical audio encoding/decoding | |
EP1989706B1 (en) | Device for perceptual weighting in audio encoding/decoding | |
EP2277172B1 (en) | Concealment of transmission error in a digital signal in a hierarchical decoding structure | |
EP2656343B1 (en) | Low-delay audio signal coding alternating between predictive coding and transform coding | |
EP2366177B1 (en) | Encoding of an audio-digital signal with noise transformation in a scalable encoder | |
CA2766864C (en) | Improved coding /decoding of digital audio signals | |
EP3069340B1 (en) | Transition from a transform coding/decoding to a predictive coding/decoding | |
EP3175443B1 (en) | Determining a budget for lpd/fd transition frame encoding | |
EP2005424A2 (en) | Method for post-processing a signal in an audio decoder | |
Sinder et al. | Recent speech coding technologies and standards | |
Ogunfunmi et al. | Scalable and Multi-Rate Speech Coding for Voice-over-Internet Protocol (VoIP) Networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20080205 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
|
17Q | First examination report despatched |
Effective date: 20091228 |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
DAX | Request for extension of the european patent (deleted) | ||
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D Free format text: NOT ENGLISH |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
REF | Corresponds to: |
Ref document number: 602006018618 Country of ref document: DE Date of ref document: 20110113 Kind code of ref document: P |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: VDEP Effective date: 20101201 |
|
REG | Reference to a national code |
Ref country code: ES Ref legal event code: FG2A Ref document number: 2356492 Country of ref document: ES Kind code of ref document: T3 Effective date: 20110408 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 |
|
LTIE | Lt: invalidation of european patent or patent extension |
Effective date: 20101201 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110301 Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FD4D |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110302 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110401 Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110401 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 |
|
26N | No opposition filed |
Effective date: 20110902 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602006018618 Country of ref document: DE Effective date: 20110902 |
|
BERE | Be: lapsed |
Owner name: FRANCE TELECOM Effective date: 20110731 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20110731 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20110731 Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20110731 Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20110731 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20110710 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: TR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: HU Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101201 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 11 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: ES Payment date: 20160623 Year of fee payment: 11 Ref country code: GB Payment date: 20160627 Year of fee payment: 11 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20160622 Year of fee payment: 11 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: IT Payment date: 20160627 Year of fee payment: 11 Ref country code: DE Payment date: 20160622 Year of fee payment: 11 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R119 Ref document number: 602006018618 Country of ref document: DE |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20170710 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: ST Effective date: 20180330 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170710 Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180201 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170731 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IT Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170710 |
|
REG | Reference to a national code |
Ref country code: ES Ref legal event code: FD2A Effective date: 20181107 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: ES Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170711 |