EP2005424A2 - Method for post-processing a signal in an audio decoder - Google Patents

Method for post-processing a signal in an audio decoder

Info

Publication number
EP2005424A2
Authority
EP
European Patent Office
Prior art keywords
frequency
signal
envelope
module
post
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP07731774A
Other languages
German (de)
French (fr)
Inventor
Stéphane RAGOT
Cyril Guillaume
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
France Telecom SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by France Telecom SA

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26 Pre-filtering or post-filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility

Definitions

  • the present invention relates to a method of post-processing a signal in an audio decoder.
  • the invention finds a particularly advantageous application in the field of transmission and storage of digital signals such as audio-frequency signals: speech, music, etc.
  • In conventional speech coding, the encoder generates a fixed-rate bit stream. This fixed-rate constraint simplifies the implementation and use of the encoder and decoder (together called a "codec"). Examples of such systems are ITU-T G.711 coding at 64 kbit/s, ITU-T G.729 coding at 8 kbit/s, and GSM-EFR at 12.2 kbit/s.
  • In some applications, such as mobile telephony or voice over IP, it is preferable to generate a variable-rate bit stream, the bit rate values being taken from a predefined set.
  • several multi-rate coding techniques, more flexible than fixed-rate coding, can be distinguished:
  • multi-mode coding controlled by the source and/or the channel, as implemented in the AMR-NB, AMR-WB, SMV1 or VMR-WB systems; and hierarchical coding, or "scalable" coding, which generates a so-called hierarchical bit stream because it comprises a core rate and one or more enhancement layers.
  • the G.722 system at 48, 56 and 64 kbit/s is a simple example of bit-rate scalable coding.
  • the MPEG-4 CELP codec is scalable in terms of bit rate and bandwidth.
  • Other examples of such coders are found in the articles by B. Kovesi, D. Massaloux, A. Sollaud, "A Scalable Speech and Audio Coding Scheme with Continuous Bitrate Flexibility", ICASSP 2004, and by H. Taddéi et al., "A Scalable Three Bitrate (8, 14.2 and 24 kbit/s) Audio Coder", 107th AES Convention, 1999. A third multi-rate technique is multiple description coding.
  • the invention is more particularly concerned with hierarchical coding.
  • the basic concept of hierarchical audio coding is illustrated, for example, in the article by Y. Hiwasaki, T. Mori, H. Ohmuro, J. Ikedo, D. Tokumoto, and A. Kataoka, "Scalable Speech Coding Technology for High-Quality Ubiquitous Communications", NTT Technical Review, March 2004.
  • the bitstream includes a base layer and one or more enhancement layers.
  • the base layer is generated by a fixed low-rate codec, known as the "core codec", which guarantees the minimum coding quality; this layer must be received by the decoder to maintain an acceptable level of quality. Enhancement layers serve to improve the quality; it may happen that not all of them are received by the decoder.
  • the main advantage of hierarchical coding is that it allows an adaptation of the bit rate by simple truncation of the bit stream.
  • the number of layers, that is to say the number of possible truncations of the bit stream, defines the granularity of the coding: the coding is said to have "coarse granularity" if the bit stream comprises few layers, of the order of 2 to 4, with steps of the order of 4 to 8 kbit/s; "fine granularity" coding allows a large number of layers with a step of the order of 1 kbit/s.
  • the invention relates more particularly to bit-rate and bandwidth scalable coding techniques with a CELP-type core coder in the telephone band and one or more wideband enhancement layers.
  • Examples of such systems are given in the aforementioned article by H. Taddéi et al., with a coarse granularity of 8, 14.2 and 24 kbit/s, and in the aforementioned article by B. Kovesi et al., with a fine granularity from 6.4 to 32 kbit/s.
  • in 2004 the ITU-T launched a project for a hierarchical coder with a standardized core; this coder, called G.729EV (EV for "Embedded Variable bitrate"), is an annex to the known G.729 coder.
  • the objective of the G.729EV standardization is to obtain a hierarchical coder with a G.729 core, producing a signal whose band extends from narrowband (300-3400 Hz) to wideband (50-7000 Hz) at a bit rate of 8 to 32 kbit/s for conversational services.
  • This encoder is inherently interoperable with Recommendation G.729, which ensures compatibility with existing VoIP devices.
  • the input audio signals are sampled at 16 kHz over a useful band of 50 to 7000 Hz.
  • the high band typically corresponds to frequencies between 3400 Hz and 7000 Hz.
  • This band is coded using a band extension technique based on the extraction, at the encoder, of time and frequency envelopes; these envelopes are then applied, at the decoder, to a synthetic excitation signal reconstructed in the high band from parameters estimated in the low band (between 50 and 3400 Hz) sampled at 8 kHz.
  • the low band is referred to hereinafter as the "first frequency band"; the high band is referred to as the "second frequency band".
  • This band extension technique is shown schematically in FIG.
  • the high frequency components of the original signal are isolated by a bandpass filter (100) between 3400 and 7000 Hz.
  • the temporal and frequency envelopes of the signal are calculated by the modules (101) and (102) respectively. These envelopes are jointly quantized at 2 kbit/s in block (103).
  • the synthetic excitation from the reconstruction module (104) is then shaped by a scaling module (106) from the time envelope and by a filtering module (107) from the frequency envelope.
  • the band extension mechanism that has just been described with reference to the ITU-T SG16 / WP3 D214 codec is therefore based on the shaping of a synthetic excitation by temporal and frequency envelopes.
  • the application of such a model is delicate and gives rise to artifacts in the form of clearly audible, isolated "clicks" caused by large amplitude overshoots.
  • the technical problem to be solved by the object of the present invention is to propose a method for post-processing, in an audio decoder, a signal reconstructed by temporal and frequency shaping of an excitation signal obtained from at least one parameter estimated in a first frequency band,
  • said temporal and frequency shaping being performed on the basis of a temporal envelope and a frequency envelope received and decoded in a second frequency band.
  • said method comprises the steps of comparing the amplitude of said reconstructed signal with said received and decoded time envelope and, if at least one threshold that is a function of said time envelope is exceeded, applying an amplitude compression to said reconstructed signal.
  • the method according to the invention compensates for the lack of adequate coupling between the excitation and the shaping functions by means of a post-processing by amplitude compression of the audio signal supplied by the decoder in the second frequency band, or high band.
  • said amplitude compression consists in applying at least one linear attenuation to the amplitude of said signal if said amplitude exceeds at least one trigger threshold that is a function of said received and decoded time envelope.
  • the method of the invention has the advantage of being adaptive in the sense that the triggering threshold is variable since it follows the value of the time envelope received and decoded.
  • the invention also relates to a computer program comprising program code instructions for implementing the post-processing method according to the invention when said program is executed on a computer.
  • the invention further relates to a module for post-processing, in an audio decoder, a signal reconstructed by temporal and frequency shaping of an excitation signal obtained from at least one parameter estimated in a first frequency band, said temporal and frequency shaping being performed on the basis of a time envelope and a frequency envelope received and decoded in a second frequency band, the module being remarkable in that it comprises a comparator for comparing the amplitude of said reconstructed signal with said received and decoded time envelope, and amplitude compression means adapted, in the event of a positive comparison, to apply an amplitude compression to said reconstructed signal.
  • the invention also relates to an audio decoder comprising a module for estimating at least one parameter of an excitation signal in a first frequency band, a module for reconstructing an excitation signal from said parameter, a module for decoding a time envelope in a second frequency band, a module (802) for decoding a frequency envelope in a second frequency band, a module (805) for temporally shaping said excitation signal by means of at least said decoded time envelope (σ), and a module (807) for frequency shaping said excitation signal by means of at least said decoded frequency envelope, remarkable in that said decoder comprises a post-processing module according to the invention.
  • FIG. 1 is a diagram of a high-band coding / decoding stage in accordance with the prior art.
  • FIG. 2 is a high level diagram of a hierarchical audio coder to
  • FIG. 3 is a diagram of the high band encoder for the 13.65 kbit / s mode of the coder of FIG. 2.
  • FIG. 4 is a diagram showing the frame division performed by the high band encoder of FIG.
  • FIG. 5 is a high-level diagram of an 8, 12, 13.65 kbit / s hierarchical audio decoder associated with the coder of FIG. 2.
  • FIG. 6 is a diagram of the high band decoder for the 13.65 kbit/s mode of the decoder of FIG. 5.
  • FIG. 7 is a flowchart of a first embodiment of an amplitude compression function.
  • FIG. 8 is a graph of the amplitude compression function of FIG. 7.
  • FIG. 9 is a flowchart of a second embodiment of an amplitude compression function.
  • FIG. 10 is a graph of the amplitude compression function of FIG. 9.
  • FIG. 11 is a flowchart of a third embodiment of an amplitude compression function.
  • FIG. 12 is a graph of the amplitude compression function of FIG. 11.
  • It will be recalled that the present invention is more particularly part of an overall sub-band hierarchical audio coding and decoding scheme operating at three possible bit rates: 8, 12 or 13.65 kbit/s. In practice, the encoder always operates at the maximum rate of 13.65 kbit/s, while the decoder can receive the core at 8 kbit/s and one or two enhancement layers at 12 or 13.65 kbit/s.
  • the hierarchical audio coder is shown schematically in FIG.
  • the wideband input signal sampled at 16 kHz is first decomposed into two sub-bands by QMF ("Quadrature Mirror Filter") filtering.
  • the first frequency band, or low band, between 0 and 4000 Hz, is obtained by low-pass filtering L and decimation 401, and the second frequency band, or high band, between 4000 and 8000 Hz, by high-pass filtering H 402 and decimation 403.
  • the filters L and H are of length 64 and conform to those described in the article by J. Johnston, ICASSP, vol. 5, pp. 291-294, 1980.
  • the low band is pre-processed by a high-pass filter 404 that eliminates components below 50 Hz before narrowband CELP coding 405 at 8 and 12 kbit/s.
  • This high-pass filtering takes account of the fact that the wide band is defined as covering the interval 50-7000 Hz.
  • the narrowband CELP coding corresponds to that of the ITU-T SG16/WP3 D135 coder (ITU-T, COM 16, D135 (WP 3/16), "France Telecom G729EV Candidate: High level description and complexity evaluation", Q.10/16, Study Period 2005-2008, Geneva, 26 July - 5 August 2005); it is a cascaded CELP coding comprising, as a first stage at 8 kbit/s, a modified G.729 coding (ITU-T Recommendation G.729, Coding of Speech at 8 kbit/s using Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP), March 1996) without a pre-processing filter and, as a second stage at 12 kbit/s, an additional fixed CELP codebook.
  • CELP coding makes it possible to determine the parameters of the excitation signal in the low band.
  • the high band is first spectrally folded (406) to compensate for the folding caused by the high-pass filter 402 combined with the decimation 403.
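The text does not say how the spectral folding 406 is implemented. A common way to mirror a spectrum, shown here purely as an assumption, is to flip the sign of every other sample, which maps a frequency f to fs/2 - f. The function name `spectral_fold` is hypothetical.

```python
import math

def spectral_fold(samples):
    """Mirror the spectrum by multiplying sample n by (-1)**n.

    Assumption: the source does not describe block 406 in detail;
    sign alternation is one standard way to map f to fs/2 - f.
    """
    return [s if n % 2 == 0 else -s for n, s in enumerate(samples)]

# A 1000 Hz cosine sampled at 8 kHz becomes a 3000 Hz cosine after folding.
fs = 8000
tone = [math.cos(2 * math.pi * 1000 * n / fs) for n in range(160)]
folded = spectral_fold(tone)
expected = [math.cos(2 * math.pi * 3000 * n / fs) for n in range(160)]
```

Applying the same operation twice restores the original samples, which is why the decoder can undo the folding with an identical step.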
  • the high band is then pre-processed by a low-pass filter 407 that eliminates the components between 3000 and 4000 Hz of the high band, that is to say the components between 7000 and 8000 Hz of the original signal.
  • a band extension 408, or high band coding, at 13.65 kbit / s is realized.
  • the different bit streams generated by the coding modules 405 and 408 are multiplexed and structured into a hierarchical bit stream in the multiplexer 409.
  • the coding is done in blocks of samples, or frames, of 20 ms, i.e. 320 samples.
  • the hierarchical coding bit rates are 8, 12 and 13.65 kbit/s.
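The frame length and bit rates above fix the number of bits available per frame at each layer. The sketch below works out those sizes and shows bit-rate adaptation by truncation. It assumes that the layers are stored in order within each frame (core first), which the text implies but does not spell out; `bits_per_frame` and `truncate_frame` are hypothetical helper names.

```python
FRAME_MS = 20
RATES_KBIT_S = [8, 12, 13.65]

def bits_per_frame(rate_kbit_s, frame_ms=FRAME_MS):
    # kbit/s multiplied by ms gives bits
    return round(rate_kbit_s * frame_ms)

def truncate_frame(frame_bits, rate_kbit_s):
    """Keep only the leading bits needed to decode at the given rate.

    Assumption: layers are stored in order (core, then enhancements).
    """
    return frame_bits[:bits_per_frame(rate_kbit_s)]

layer_sizes = [bits_per_frame(r) for r in RATES_KBIT_S]  # [160, 240, 273]
core_only = truncate_frame([0] * 273, 8)                 # 160 bits remain
```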
  • the high band encoder 408 is detailed in FIG. 3. Its principle is similar to the parametric band extension of the ITU-T SG16 / WP3 D214 encoder.
  • the high band signal x_hi is coded in frames of N/2 samples, where N is the number of samples of the original wideband frame and the division by 2 is due to the decimation by 2 of the high band.
  • N/2 = 160 samples, i.e. 20 ms at 8 kHz sampling.
  • time and frequency envelopes are extracted by the modules 600 and 601 as in the ITU-T SG16 / WP3 D214 encoder. These envelopes are then jointly quantized in block 602.
  • This operation requires future samples, commonly called the "lookahead", because the spectral analysis uses a time window centered on the current frame that overlaps the future frame.
  • the frequency envelope extraction can be carried out, for example, as follows: calculation of the short-term spectrum by windowing the current frame and the lookahead, followed by a discrete Fourier transform,
  • the frequency envelope is thus defined as the rms value of each of the sub-bands of the signal x_hi.
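As a rough illustration of a frequency envelope defined as an rms value per sub-band, the sketch below windows one frame, takes a naive DFT, and computes the rms of the spectral magnitudes in each band. The Hann window, the use of the current frame only (no lookahead), the `n_bands=8` equal-width bands, and the name `frequency_envelope` are all assumptions; the actual window and band edges are not given in this text.

```python
import cmath
import math

def frequency_envelope(frame, n_bands=8):
    """RMS of the windowed DFT magnitudes in each sub-band.

    Assumptions: Hann window, current frame only, equal-width bands.
    """
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    half = n // 2
    # Naive DFT over the positive-frequency bins
    mags = [abs(sum(w * cmath.exp(-2j * math.pi * k * i / n)
                    for i, w in enumerate(windowed)))
            for k in range(half)]
    width = half // n_bands
    return [math.sqrt(sum(m * m for m in mags[b * width:(b + 1) * width]) / width)
            for b in range(n_bands)]

# A 1250 Hz tone at 8 kHz falls in DFT bin 25, i.e. the third band (index 2).
fs = 8000
tone = [math.cos(2 * math.pi * 1250 * i / fs) for i in range(160)]
env = frequency_envelope(tone)
```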
  • Each frame of 20 ms consists of 160 samples:
  • the time envelope of the current frame is calculated as follows:
  • the time envelope is thus defined as the rms value of each of the 16 subframes of the signal x_hi.
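The definition above (one rms value for each of the 16 subframes of a 160-sample frame) can be sketched directly. The function name `time_envelope` is hypothetical, and any quantization of the envelope is left out.

```python
import math

def time_envelope(frame, n_subframes=16):
    """Return the rms value of each subframe of `frame`."""
    size = len(frame) // n_subframes   # 10 samples for a 160-sample frame
    return [math.sqrt(sum(x * x for x in frame[i * size:(i + 1) * size]) / size)
            for i in range(n_subframes)]

# First half of the frame at amplitude 1, second half at amplitude 2
frame = [1.0] * 80 + [-2.0] * 80
env = time_envelope(frame)
```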
  • FIG. 5 represents a hierarchical audio decoder associated with the encoder which has just been described with reference to FIGS. 2 and 3.
  • the bits describing each frame of 20 ms are demultiplexed by the demultiplexer 500.
  • the bit stream of the 8 and 12 kbit/s layers is used by the CELP decoding module 501 to generate the synthesis parameters of the excitation signal in the low band.
  • the low band synthetic speech signal is then postfiltered by block 502.
  • the portion of the bit stream associated with the 13.65 kbit / s layer is decoded by the band extension module 503.
  • the wideband output signal, sampled at 16 kHz, is obtained through the synthesis QMF filter bank 504, 505, 507, 508 and 509, which incorporates the inverse spectral folding 506.
  • the high band decoder 503 of FIG. 5 is described in detail in FIG.
  • This decoder repeats the principle of synthesis of the high band described for the coder of FIG. 1, with however two modifications: a frequency envelope interpolation module 806 and a post-processing module 808. These two frequency envelope interpolation and post-processing modules are intended to improve the quality of coding in the high band.
  • the module 806 interpolates between the frequency envelope of the preceding frame and the frequency envelope of the current frame so that this envelope evolves every 10 ms, instead of 20 ms.
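The exact interpolation rule used by module 806 is not given here. One simple possibility, shown only as an assumption, is to use the average of the previous and current envelopes for the first 10 ms of the frame and the current envelope for the last 10 ms. The name `interpolate_envelope` is hypothetical.

```python
def interpolate_envelope(prev_env, cur_env):
    """Return the envelopes for the two 10 ms halves of a 20 ms frame.

    Assumption: linear midpoint for the first half, current envelope
    for the second half; the source does not specify the weights.
    """
    first_half = [0.5 * (p + c) for p, c in zip(prev_env, cur_env)]
    second_half = list(cur_env)
    return first_half, second_half

first, second = interpolate_envelope([0.0, 2.0], [2.0, 4.0])
# first == [1.0, 3.0], second == [2.0, 4.0]
```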
  • the high band decoder of FIG. 6 demultiplexes in the demultiplexer 800 the parameters received in the bitstream and decodes the time and frequency envelope information in the modules 801 and
  • a synthetic excitation signal is generated in a reconstruction module 803 from the CELP excitation parameters received by the 8 and 12 kbit / s layers. This excitation is filtered in the 804 low-pass filter to keep only the frequencies between 0 and 3000 Hz which correspond to the 4000 to 7000 Hz band of the original signal. As in the encoder of FIG. 1, the synthetic excitation signal is shaped by the modules 805 and 807:
  • the output of the temporal shaping module 805 ideally has an rms value per subframe that corresponds to the decoded time envelope; the module 805 therefore amounts to applying a time-adaptive gain,
  • the output of the frequency shaping module 807 ideally has an rms value per sub-band that corresponds to the decoded frequency envelope; the module 807 can be realized by means of a filter bank or an overlapped transform.
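The time-adaptive gain of module 805 can be sketched as a per-subframe rescaling that makes the rms of each subframe equal to the decoded envelope value. This is a minimal illustration; the name `shape_time` and the `eps` guard against silent subframes are assumptions, and any gain smoothing between subframes is omitted.

```python
import math

def shape_time(excitation, envelope, eps=1e-12):
    """Scale each subframe so that its rms equals the decoded envelope value."""
    size = len(excitation) // len(envelope)
    out = []
    for i, target in enumerate(envelope):
        sub = excitation[i * size:(i + 1) * size]
        rms = math.sqrt(sum(x * x for x in sub) / size)
        gain = target / max(rms, eps)   # eps avoids division by zero
        out.extend(gain * x for x in sub)
    return out

excitation = [1.0, -1.0] * 80          # rms of 1 in every subframe
shaped = shape_time(excitation, [0.5] * 16)
```

Because the frequency shaping 807 comes after this gain, the sample amplitudes at its output are no longer guaranteed to follow the time envelope, which is the situation the post-processing module addresses.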
  • the signal resulting from the shaping of the excitation is finally processed by the post-processing module 808 to obtain the reconstructed high band y.
  • the post-processing module 808 will now be described in detail.
  • the post-processing performed by the module 808 consists in applying to the signal x coming from the frequency shaping module 807 an amplitude compression so as to limit the amplitude of the signal and thus avoid the artifacts that could occur as a result of lack of coupling between excitation and shaping.
  • this post-processing acts instantaneously, that is to say sample by sample, without introducing any processing delay,
  • the trigger threshold for the amplitude compression is provided by the time envelope σ as decoded by the time envelope decoding module 801,
  • the post-processing is adaptive because the value of σ changes at each subframe of 10 samples, namely every 1.25 ms,
  • the decoded time envelope for the current frame corresponds to a temporal support offset by 2 ms, i.e. 16 samples, as illustrated in FIG. 4.
  • the adaptive post-processing keeps in memory the rms value of the two subframes associated with the lookahead: these two subframes correspond to the two subframes at the beginning of the current frame.
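The sample-by-sample, subframe-adaptive behaviour described above can be sketched as a small driver that looks up the envelope value of the subframe containing each sample and passes it to a compression function. For simplicity this sketch ignores the 2 ms offset and the lookahead memory. The names `post_process` and `hard_limit` are hypothetical, and the hard limiter is only a placeholder for demonstration, not one of the compression functions of the invention.

```python
def post_process(frame, envelope, compress):
    """Apply `compress(x, sigma)` to each sample, with sigma taken from the
    decoded time envelope of the subframe that contains the sample."""
    size = len(frame) // len(envelope)
    out = []
    for n, x in enumerate(frame):
        sigma = envelope[min(n // size, len(envelope) - 1)]
        out.append(compress(x, sigma))
    return out

# Placeholder compressor for demonstration only (hard limit at +/- sigma)
def hard_limit(x, sigma):
    return max(-sigma, min(sigma, x))

frame = [3.0] * 160
envelope = [1.0] * 8 + [2.0] * 8       # 16 subframes of 10 samples
y = post_process(frame, envelope, hard_limit)
```

No sample depends on any other sample, so the processing adds no delay.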
  • the flowchart of FIG. 7 details a first post-processing compression function, denoted C1(x).
  • the beginning and end of the calculation are identified by blocks 1000 and 1006.
  • the value of the output is first initialized at x (block 1001). Then two tests are done (blocks 1002 and
  • FIG. 8 clearly shows that the function C1(x) performs a symmetrical amplitude compression with a "trigger threshold" set at +/- σ. More precisely, the slope of F1(x/σ) is 1 on [-1, +1] and 1/16 elsewhere. Equivalently, the slope of C1(x) is 1 on [-σ, +σ] and 1/16 elsewhere.
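Writing σ for the decoded time envelope value, and assuming that the function is continuous and odd (as the symmetry of FIG. 8 suggests), the slopes stated above lead to the following piecewise form. This is a reconstruction from the slopes, not a formula quoted from the source.

```latex
C_1(x) =
\begin{cases}
-\sigma + \tfrac{1}{16}\,(x + \sigma), & x < -\sigma \\[4pt]
x, & -\sigma \le x \le \sigma \\[4pt]
\sigma + \tfrac{1}{16}\,(x - \sigma), & x > \sigma
\end{cases}
```

For example, with σ = 1 an input of x = 17 is reduced to 1 + 16/16 = 2.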
  • Two variants of the post-processing are described in FIGS. 9 to 12.
  • the corresponding functions are denoted C2(x) and C3(x) respectively.
  • the post-processing C2(x) shown in FIGS. 9 and 10 is identical to C1(x) but with a "trigger threshold" that goes from +/- σ to +/- 2σ.
  • the slope of C2(x) is 1 on [-2σ, +2σ] and 1/16 elsewhere.
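Since the first two compression functions differ only in their trigger threshold, both can be sketched with one function in which a multiplier `k` scales the threshold (k = 1 for the first function, k = 2 for the second). The name `compress` and its parameters are hypothetical, and continuity at the threshold is assumed.

```python
def compress(x, sigma, k=1.0, slope=1.0 / 16.0):
    """Slope 1 inside [-k*sigma, +k*sigma], slope `slope` outside.

    k=1 gives the first compression function, k=2 the second.
    """
    t = k * sigma
    if x > t:
        return t + slope * (x - t)
    if x < -t:
        return -t + slope * (x + t)
    return x

print(compress(17.0, 1.0))          # 2.0    (threshold at sigma)
print(compress(17.0, 1.0, k=2.0))   # 2.9375 (threshold at 2*sigma)
```

Raising the threshold leaves more of the signal untouched, at the cost of letting larger overshoots through.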
  • the post-processing C3(x) is a more elaborate variant of C1(x), in which the amplitude compression is performed in two successive steps.
  • the trigger interval is still set to [-σ, +σ] (blocks 1402 and 1406), but the value of y is only attenuated by a factor 1/2, unless the value of y modified by blocks 1403 and 1407 lies outside the interval [-2.5σ, +2.5σ], in which case the value of y is further modified by blocks 1405 and 1409.
  • the operation of C3(x) is illustrated in FIG. 12, where the slope of C3(x) is: 1/16 on [-∞, -4σ] and [4σ, +∞]; 1/2 on [-4σ, -σ] and [σ, 4σ]; and 1 on [-σ, +σ].
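The two-step compression can be sketched as follows. The second-step factor of 1/8 is not stated explicitly in the text; it is inferred here from the overall slope of 1/16 beyond 4σ (1/2 × 1/8 = 1/16), and the intermediate limit of 2.5σ is reached exactly when the input equals 4σ. The name `compress_c3` is hypothetical.

```python
def compress_c3(x, sigma):
    """Two-step compression: slope 1/2 beyond sigma, then 1/16 beyond 4*sigma.

    Assumption: the second step scales the excess over 2.5*sigma by 1/8.
    """
    y = x
    if y > sigma:
        y = sigma + 0.5 * (y - sigma)                     # first step
        if y > 2.5 * sigma:
            y = 2.5 * sigma + 0.125 * (y - 2.5 * sigma)   # second step
    elif y < -sigma:
        y = -sigma + 0.5 * (y + sigma)
        if y < -2.5 * sigma:
            y = -2.5 * sigma + 0.125 * (y + 2.5 * sigma)
    return y

print(compress_c3(3.0, 1.0))    # 2.0
print(compress_c3(20.0, 1.0))   # 3.5
```

Compared with a single step, moderate overshoots between σ and 4σ are attenuated more gently, and only the largest peaks receive the full 1/16 slope.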

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to a method for post-processing, in an audio decoder, a signal reconstructed by temporal and frequency shaping (805, 807) of an excitation signal obtained on the basis of at least one parameter in a first frequency band, said temporal and frequency shaping being carried out at least on the basis of a temporal envelope and a frequency envelope received and decoded (801, 802) in a second frequency band. According to the method, once the shaping (805, 807) has been carried out, the amplitude of the reconstructed signal is compared with the received and decoded temporal envelope (σ), and an amplitude compression is applied to the reconstructed signal if at least one threshold that is a function of the temporal envelope is exceeded. The invention also relates to a post-processing module for implementing the method, and to an audio decoder. It applies to the transmission and storage of digital signals such as audio-frequency signals: speech, music, etc.

Description

METHOD FOR POST-PROCESSING A SIGNAL IN AN AUDIO DECODER
The present invention relates to a method of post-processing a signal in an audio decoder.
The invention finds a particularly advantageous application in the field of transmission and storage of digital signals such as audio-frequency signals: speech, music, etc.
Various techniques exist for converting an audio-frequency signal, such as speech or music, into digital form. The most common techniques are "waveform coding" methods, such as PCM or ADPCM coding, "parametric analysis-by-synthesis coding" methods such as CELP (Code Excited Linear Prediction) coding, and "perceptual sub-band or transform coding" methods. These conventional techniques for coding and quantizing audio-frequency signals are described, for example, in A. Gersho and R.M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992, and in B. Kleijn and K.K. Paliwal (editors), Speech Coding and Synthesis, Elsevier, 1995.
In conventional speech coding, the encoder generates a fixed-rate bit stream. This fixed-rate constraint simplifies the implementation and use of the encoder and decoder (together called a "codec"). Examples of such systems are ITU-T G.711 coding at 64 kbit/s, ITU-T G.729 coding at 8 kbit/s, and the GSM-EFR system at 12.2 kbit/s.
In some applications, such as mobile telephony or voice over IP, it is preferable to generate a variable-rate bit stream, the bit rate values being taken from a predefined set. Several multi-rate coding techniques, more flexible than fixed-rate coding, can be distinguished:
- multi-mode coding controlled by the source and/or the channel, as implemented in the AMR-NB, AMR-WB, SMV1 or VMR-WB systems,
- hierarchical coding, or "scalable" coding, which generates a so-called hierarchical bit stream because it comprises a core rate and one or more enhancement layers. The G.722 system at 48, 56 and 64 kbit/s is a simple example of bit-rate scalable coding. The MPEG-4 CELP codec, for its part, is scalable in both bit rate and bandwidth. Other examples of such coders are found in the articles by B. Kovesi, D. Massaloux, A. Sollaud, "A scalable speech and audio coding scheme with continuous bitrate flexibility", ICASSP 2004, and by H. Taddéi et al., "A Scalable Three Bitrate (8, 14.2 and 24 kbit/s) Audio Coder", 107th AES Convention, 1999,
- multiple description coding.
The invention is more particularly concerned with hierarchical coding. The basic concept of hierarchical audio coding is illustrated, for example, in the article by Y. Hiwasaki, T. Mori, H. Ohmuro, J. Ikedo, D. Tokumoto, and A. Kataoka, "Scalable Speech Coding Technology for High-Quality Ubiquitous Communications", NTT Technical Review, March 2004. The bit stream comprises a base layer and one or more enhancement layers. The base layer is generated by a fixed low-rate codec, known as the "core codec", which guarantees the minimum coding quality; this layer must be received by the decoder to maintain an acceptable level of quality. Enhancement layers serve to improve the quality; it may happen that not all of them are received by the decoder. The main advantage of hierarchical coding is that it allows the bit rate to be adapted by simple truncation of the bit stream. The number of layers, that is to say the number of possible truncations of the bit stream, defines the granularity of the coding: the coding is said to have "coarse granularity" if the bit stream comprises few layers, of the order of 2 to 4, with steps of the order of 4 to 8 kbit/s; "fine granularity" coding allows a large number of layers with a step of the order of 1 kbit/s.
The invention relates more particularly to bit rate and bandwidth scalable coding techniques with a CELP-type core coder in the telephone band and one or more wideband enhancement layers. Examples of such systems are given in the aforementioned article by H. Taddéi et al., with coarse granularity at 8, 14.2 and 24 kbit/s, and in the aforementioned article by B. Kovesi, with fine granularity from 6.4 to 32 kbit/s.
In 2004, the ITU-T launched a project for a hierarchical coder with a standardized core. This coder, called G.729EV (EV for "Embedded Variable bitrate"), is an annex to the known G.729 coder. The objective of the G.729EV standardization is to obtain a hierarchical coder with a G.729 core, producing a signal whose band extends from narrowband (300-3400 Hz) to wideband (50-7000 Hz) at a bit rate of 8 to 32 kbit/s for conversational services. This coder is inherently interoperable with Recommendation G.729, which ensures compatibility with existing voice over IP equipment.
In response to this project, a three-layer coding system was proposed in particular, namely cascaded CELP coding at 8-12 kbit/s, followed by a parametric band extension at 14 kbit/s, then transform coding from 14 to 32 kbit/s. This coder is known under the reference ITU-T SG16/WP3 D214 (ITU-T, COM 16, D214 (WP 3/16), "High level description of the scalable 8-32 kbit/s algorithm submitted to the Qualification Test by Matsushita, Mindspeed and Siemens," Q.10/16, Study Period 2005-2008, Geneva, 26 July - 5 August 2005). The notion of band extension refers to the coding of the high band of a signal. In the context of the invention, the input audio signals are sampled at 16 kHz over a useful band of 50 to 7000 Hz. For the aforementioned ITU-T SG16/WP3 D214 coder, the high band typically corresponds to frequencies between 3400 and 7000 Hz. This band is coded using a band extension technique based on the extraction, at the coder, of temporal and frequency envelopes, these envelopes then being applied at the decoder to a synthetic excitation signal reconstructed in the high band from parameters estimated in the low band (between 50 and 3400 Hz), sampled at 8 kHz. In the following, the low band is referred to as the "first frequency band" and the high band as the "second frequency band".
This band extension technique is shown schematically in Figure 1. At the coder, the high-frequency components of the original signal are isolated by a band-pass filter (100) between 3400 and 7000 Hz. The temporal and frequency envelopes of the signal are then calculated by modules (101) and (102) respectively. These envelopes are jointly quantized at 2 kbit/s in block (103).
At the decoder, a synthetic excitation is reconstructed by the reconstruction module (104) from the parameters of the cascaded CELP decoder. The temporal and frequency envelopes are decoded by the inverse quantization block (105). The synthetic excitation from the reconstruction module (104) is then shaped by a scaling module (106) using the temporal envelope and by a filtering module (107) using the frequency envelope.
The band extension mechanism just described with reference to the ITU-T SG16/WP3 D214 codec is therefore based on shaping a synthetic excitation by temporal and frequency envelopes. However, in the absence of coupling between the excitation and the shaping, applying such a model is delicate and gives rise to artifacts in the form of very audible isolated "clicks" caused by large amplitude overshoots. Accordingly, the technical problem to be solved by the subject matter of the present invention is to provide a method for post-processing, in an audio decoder, a signal reconstructed by temporal and frequency shaping of an excitation signal obtained from at least one parameter estimated in a first frequency band, which would make it possible to avoid the artifacts induced by the shaping of the synthetic excitation signal, said temporal and frequency shaping being carried out on the basis of a temporal envelope and a frequency envelope received and decoded in a second frequency band.
According to the present invention, the solution to the stated technical problem is that said method comprises the steps of comparing the amplitude of said reconstructed signal with said received and decoded temporal envelope and, if at least one threshold that is a function of said temporal envelope is exceeded, applying amplitude compression to said reconstructed signal.
Thus, the method according to the invention compensates for the lack of adequate coupling between the excitation and the shaping functions by means of post-processing by amplitude compression of the audio signal supplied by the decoder in the second frequency band, or high band.
According to one embodiment, said amplitude compression consists in applying at least one linear attenuation to the amplitude of said signal if said amplitude is greater than at least one trigger threshold that is a function of said received and decoded temporal envelope.
It will be noted that, in addition to limiting the amplitude of the signal and hence the artifacts associated with large amplitudes, the method of the invention has the advantage of being adaptive, in the sense that the trigger threshold is variable since it tracks the value of the received and decoded temporal envelope.
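The comparison and compression steps can be sketched as follows. This is only a minimal illustration of a single linear attenuation above an envelope-dependent threshold: the threshold factor `k`, the attenuation `slope` and the function names are hypothetical values chosen for the example, not parameters given in the text. The subframe length of 10 samples matches the temporal envelope resolution described for the embodiment.

```python
def compress_sample(x, sigma, k=2.0, slope=0.5):
    """Compress one sample against a threshold proportional to the envelope.

    sigma is the decoded temporal envelope value for the sample (sigma > 0).
    k and slope are hypothetical tuning values, not taken from the source.
    """
    threshold = k * sigma
    magnitude = abs(x)
    if magnitude <= threshold:
        return x                      # below the trigger threshold: unchanged
    compressed = threshold + slope * (magnitude - threshold)
    return compressed if x > 0 else -compressed


def compress_frame(samples, envelope, subframe_len=10, k=2.0, slope=0.5):
    """Apply the compression sample by sample, one envelope value per subframe."""
    return [
        compress_sample(x, envelope[n // subframe_len], k, slope)
        for n, x in enumerate(samples)
    ]
```

Because the threshold scales with the envelope, the same sample value may be left untouched in a loud subframe and attenuated in a quiet one, which is the adaptive behaviour noted above.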
The invention also relates to a computer program comprising program code instructions for implementing the post-processing method according to the invention when said program is executed on a computer. The invention further relates to a module for post-processing, in an audio decoder, a signal reconstructed by shaping an excitation signal obtained from at least one parameter estimated in a first frequency band, said temporal and frequency shaping being carried out on the basis of a temporal envelope and a frequency envelope received and decoded in a second frequency band, the module being noteworthy in that it comprises a comparator for comparing the amplitude of said reconstructed signal with said received and decoded temporal envelope, and amplitude compression means capable, in the event of a positive comparison, of applying amplitude compression to said reconstructed signal.
Finally, the invention relates to an audio decoder comprising a module for estimating at least one parameter of an excitation signal in a first frequency band, a module for reconstructing an excitation signal from said parameter, a module for decoding a temporal envelope in a second frequency band, a module (802) for decoding a frequency envelope in a second frequency band, a module (805) for temporal shaping of said excitation signal by means of at least said decoded temporal envelope (σ), and a module (807) for frequency shaping of said excitation signal by means of at least said decoded frequency envelope, noteworthy in that said decoder comprises a post-processing module according to the invention. The description which follows, with reference to the appended drawings given as non-limiting examples, will make clear what the invention consists of and how it can be implemented.
Figure 1 is a diagram of a high-band coding/decoding stage according to the prior art.
Figure 2 is a high-level diagram of a hierarchical audio coder at 8, 12 and 13.65 kbit/s.
Figure 3 is a diagram of the high-band coder for the 13.65 kbit/s mode of the coder of Figure 2.
Figure 4 is a diagram showing the frame segmentation performed by the high-band coder of Figure 3.
Figure 5 is a high-level diagram of a hierarchical audio decoder at 8, 12 and 13.65 kbit/s associated with the coder of Figure 2.
Figure 6 is a diagram of the high-band decoder for the 13.65 kbit/s mode of the decoder of Figure 5.
Figure 7 is a flowchart of a first embodiment of an amplitude compression function.
Figure 8 is a graph of the amplitude compression function of Figure 7.
Figure 9 is a flowchart of a second embodiment of an amplitude compression function.
Figure 10 is a graph of the amplitude compression function of Figure 9.
Figure 11 is a flowchart of a third embodiment of an amplitude compression function.
Figure 12 is a graph of the amplitude compression function of Figure 11.
It will be recalled that the present invention falls more particularly within an overall sub-band hierarchical audio coding and decoding scheme operating at three possible bit rates: 8, 12 or 13.65 kbit/s. In practice, the coder always operates at the maximum bit rate of 13.65 kbit/s, while the decoder may receive the core at 8 kbit/s as well as one or two enhancement layers at 12 or 13.65 kbit/s.
The hierarchical audio coder is shown schematically in Figure 2.
The wideband input signal, sampled at 16 kHz, is first split into two sub-bands by QMF ("Quadrature Mirror Filterbank") filtering. The first frequency band, or low band, between 0 and 4000 Hz, is obtained by low-pass filtering 400 (L) and decimation 401, and the second frequency band, or high band, between 4000 and 8000 Hz, by high-pass filtering 402 (H) and decimation 403. In a preferred embodiment, the filters L and H are of length 64 and conform to those described in the article by J. Johnston, A filter family designed for use in quadrature mirror filter banks, ICASSP, vol. 5, pp. 291-294, 1980.
The low band is pre-processed by a high-pass filter 404 that removes components below 50 Hz before narrowband CELP coding 405 at 8 and 12 kbit/s. This high-pass filtering takes account of the fact that the wideband is defined as covering the 50-7000 Hz range. According to one embodiment, the narrowband CELP coding corresponds to that of the ITU-T SG16/WP3 D135 coder (ITU-T, COM 16, D135 (WP 3/16), "France Telecom G729EV Candidate: High level description and complexity evaluation," Q.10/16, Study Period 2005-2008, Geneva, 26 July - 5 August 2005); it is a cascaded CELP coding comprising, as a first stage at 8 kbit/s, a modified G.729 coding (ITU-T Recommendation G.729, Coding of Speech at 8 kbit/s using Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP), March 1996) without a pre-processing filter and, as a second stage at 12 kbit/s, an additional fixed CELP codebook. The CELP coding makes it possible to determine the parameters of the excitation signal in the low band.
The high band is first spectrally folded 406 to compensate for the spectral folding caused by the high-pass filter 402 combined with the decimation 403. The high band is then pre-processed by a low-pass filter 407 that removes the components between 3000 and 4000 Hz of the high band, that is to say the components between 7000 and 8000 Hz of the original signal. A band extension 408, or high-band coding, at 13.65 kbit/s is then performed.
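The text does not say how the spectral folding 406 is implemented. A common way to mirror the spectrum of a decimated real signal is to multiply it by (-1)^n, which maps frequency f to fs/2 - f; the sketch below assumes that approach, and the function name is hypothetical.

```python
def spectral_fold(samples):
    """Mirror the spectrum by changing the sign of every other sample.

    Multiplying by (-1)**n shifts the spectrum by half the sampling rate,
    so a component at frequency f moves to fs/2 - f. This is an assumed
    implementation, not one specified in the source.
    """
    return [s if n % 2 == 0 else -s for n, s in enumerate(samples)]
```

Applying the operation twice restores the original signal, which is consistent with an inverse folding step being used at the decoder.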
The bitstreams generated by the coding modules 405 and 408 are multiplexed and structured into a hierarchical bitstream in the multiplexer 409.
Coding is performed on blocks of samples, or frames, of 20 ms, i.e. 320 samples. The hierarchical coding bit rates are 8, 12 and 13.65 kbit/s. The high-band coder 408 is detailed in Figure 3. Its principle is similar to the parametric band extension of the ITU-T SG16/WP3 D214 coder.
The high-band signal $x_{hi}$ is coded in frames of N/2 samples, where N is the number of samples in the original wideband frame and the division by 2 results from the decimation by 2 of the high band. In a preferred embodiment, N/2 = 160 samples, i.e. 20 ms at a sampling rate of 8 kHz. For each frame, i.e. every 20 ms, temporal and frequency envelopes are extracted by modules 600 and 601 as in the ITU-T SG16/WP3 D214 coder. These envelopes are then jointly quantized in block 602.
An overview of the frequency envelope extraction performed by module 600 is now given.
This operation requires future samples, commonly called "lookahead", because the spectral analysis uses a time window centered on the current frame that extends into the future frame. In a preferred embodiment, the lookahead in the high band is set to L = 16 samples, i.e. 2 ms. The frequency envelope extraction can be carried out, for example, as follows:
- calculation of the short-term spectrum by windowing the current frame and the lookahead, followed by a discrete Fourier transform,
- division of the spectrum into sub-bands,
- calculation of the short-term energy of each sub-band and conversion to a root mean square (r.m.s.) value.
The frequency envelope is therefore defined as the r.m.s. value of each of the sub-bands of the signal $x_{hi}$.
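The three extraction steps listed above can be sketched as follows. The text does not specify the window shape, the number of sub-bands or the normalization, so the Hann window, the four equal-width sub-bands and the mean-power normalization used here are assumptions made for illustration, and the function name is hypothetical.

```python
import cmath
import math


def frequency_envelope(buffer, num_bands=4):
    """Return one r.m.s. value per sub-band of the short-term spectrum.

    buffer holds the analysis samples (current frame plus lookahead).
    The Hann window and the equal-width bands are assumptions.
    """
    n = len(buffer)
    if n < 2 * num_bands:
        raise ValueError("analysis buffer is too short")
    # Step 1: windowing followed by a discrete Fourier transform.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [s * w for s, w in zip(buffer, window)]
    half = n // 2
    power = []
    for k in range(half):
        bin_value = sum(
            windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)
        )
        power.append(abs(bin_value) ** 2)
    # Step 2: split the spectrum into equal-width sub-bands
    # (any leftover bins at the top of the spectrum are ignored).
    width = half // num_bands
    # Step 3: energy of each sub-band converted to an r.m.s. value.
    return [
        math.sqrt(sum(power[b * width:(b + 1) * width]) / width)
        for b in range(num_bands)
    ]
```

For a pure tone, the largest envelope value falls in the sub-band that contains the tone frequency.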
The temporal envelope extraction by module 601 is now explained with reference to Figure 4, which details the temporal segmentation of the signal $x_{hi}$.
Each 20 ms frame consists of 160 samples:
$x_{hi} = [x_0 \; x_1 \; \dots \; x_{159}]$
The last 16 samples of $x_{hi}$ in fact correspond to the lookahead for the current frame.
The temporal envelope of the current frame is calculated as follows:
- division of $x_{hi}$ into 16 subframes of 10 samples,
- calculation of the energy of each subframe and conversion to an r.m.s. value.
The temporal envelope is therefore defined as the r.m.s. value of each of the 16 subframes of the signal $x_{hi}$.
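The temporal envelope calculation described above can be sketched directly from the two steps given in the text. The function name and the use of a plain list of floats are illustrative choices.

```python
import math


def temporal_envelope(frame, num_subframes=16, subframe_len=10):
    """Return the r.m.s. value of each subframe of a 160-sample frame."""
    if len(frame) != num_subframes * subframe_len:
        raise ValueError("frame length must equal num_subframes * subframe_len")
    envelope = []
    for i in range(num_subframes):
        subframe = frame[i * subframe_len:(i + 1) * subframe_len]
        energy = sum(s * s for s in subframe)           # subframe energy
        envelope.append(math.sqrt(energy / subframe_len))  # r.m.s. value
    return envelope
```

At a sampling rate of 8 kHz, each of the 16 values covers 1.25 ms of signal.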
Figure 5 shows a hierarchical audio decoder associated with the coder just described with reference to Figures 2 and 3.
The bits describing each 20 ms frame are demultiplexed by the demultiplexer 500. The bitstream of the 8 and 12 kbit/s layers is used by the CELP decoding module 501 to generate the synthesis parameters of the excitation signal in the low band between 0 and 4000 Hz. The low-band synthetic speech signal is then post-filtered by block 502.
The portion of the bitstream associated with the 13.65 kbit/s layer is decoded by the band extension module 503. The wideband output signal, sampled at 16 kHz, is obtained by means of the synthesis QMF filter bank 504, 505, 507, 508 and 509, which incorporates the inverse folding 506.
The high-band decoder 503 of Figure 5 is described in detail in Figure 6.
This decoder uses the high-band synthesis principle described for the coder of Figure 1, with two modifications, however: a frequency envelope interpolation module 806 and a post-processing module 808. These two modules are intended to improve the coding quality in the high band. Module 806 interpolates between the frequency envelope of the previous frame and the frequency envelope of the current frame so that this envelope evolves every 10 ms instead of every 20 ms.
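The text states only that module 806 interpolates between the previous and current frequency envelopes so that the envelope is updated every 10 ms. One simple possibility, shown here purely as an assumption, is to use the midpoint of the two envelopes for the first 10 ms of the frame and the current envelope for the second 10 ms. The function name and the linear midpoint rule are hypothetical.

```python
def interpolate_frequency_envelope(previous, current):
    """Return two envelopes per 20 ms frame, one for each 10 ms half.

    The midpoint rule is an assumed interpolation; the source does not
    specify the interpolation law used by module 806.
    """
    if len(previous) != len(current):
        raise ValueError("envelopes must have the same number of sub-bands")
    first_half = [0.5 * (p + c) for p, c in zip(previous, current)]
    second_half = list(current)
    return first_half, second_half
```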
The high-band decoder of Figure 6 demultiplexes, in the demultiplexer 800, the parameters received in the bitstream and decodes the temporal and frequency envelope information in the decoding modules 801 and 802. A synthetic excitation signal is generated in a reconstruction module 803 from the CELP excitation parameters received in the 8 and 12 kbit/s layers. This excitation is filtered by the low-pass filter 804 to keep only the frequencies between 0 and 3000 Hz, which correspond to the 4000 to 7000 Hz band of the original signal. As in the coder of Figure 1, the synthetic excitation signal is shaped by modules 805 and 807:
- the output of the temporal shaping module 805 ideally has an r.m.s. value per subframe that corresponds to the decoded temporal envelope; module 805 therefore amounts to applying a time-adaptive gain,
- the output of the frequency shaping module 807 ideally has an r.m.s. value per sub-band that corresponds to the decoded frequency envelope; module 807 can be implemented by means of a filter bank or a lapped transform.
The signal x resulting from the shaping of the excitation is finally processed by the post-processing module 808 to obtain the reconstructed high band y.
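The time-adaptive gain of module 805 can be illustrated with the following minimal sketch, which scales each subframe of the excitation so that its r.m.s. value matches the decoded temporal envelope. The function name, the `eps` guard against silent subframes and the absence of any gain smoothing between subframes are assumptions; the source does not describe these details.

```python
import math


def shape_temporal(excitation, envelope, subframe_len=10, eps=1e-12):
    """Scale each subframe so that its r.m.s. equals the decoded envelope value."""
    if len(excitation) != len(envelope) * subframe_len:
        raise ValueError("excitation length must match the envelope resolution")
    shaped = []
    for i, sigma in enumerate(envelope):
        subframe = excitation[i * subframe_len:(i + 1) * subframe_len]
        rms = math.sqrt(sum(s * s for s in subframe) / subframe_len)
        gain = sigma / max(rms, eps)   # eps avoids division by zero (assumption)
        shaped.extend(gain * s for s in subframe)
    return shaped
```

The subsequent frequency shaping changes the signal again, so the per-subframe r.m.s. of the final signal x is not guaranteed to match the envelope exactly, which is why the text says "ideally".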
The post-processing module 808 will now be described in detail. The post-processing performed by module 808 consists in applying amplitude compression to the signal x delivered by the frequency shaping module 807, so as to limit the amplitude of the signal and thus avoid the artifacts that could arise from the lack of coupling between the excitation and the shaping. The output signal y of the post-processing module 808 is written as: y = C(x) = σ·F(x/σ)
where σ denotes the decoded time envelope. The properties of the post-processing proposed by the invention are as follows:
- this post-processing acts instantaneously, that is, sample by sample, without introducing any processing delay,
- the trigger threshold for the amplitude compression is provided by the time envelope as decoded by the time envelope decoding module 801. By definition, σ > 0,
- the post-processing is adaptive because the value of σ changes at each subframe of 10 samples, that is, every 1.25 ms,
- the time envelope decoded for the current frame corresponds to a time support offset by 2 ms, i.e. 16 samples, as illustrated in FIG. 4. The adaptive post-processing therefore keeps in memory the r.m.s. values of the two subframes associated with the lookahead: these two subframes correspond to the two subframes at the start of the current frame.
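The sample-by-sample, subframe-adaptive structure y = σ·F(x/σ) can be sketched as follows. This is an illustration under stated assumptions: the normalized characteristic F is passed in as a function, `hard_limit` is a placeholder characteristic that is not taken from the patent, and the 2 ms lookahead memory is not modeled (the caller is assumed to supply σ values already aligned with the samples).

```python
def postprocess(x, sigma_per_subframe, F, subframe_len=10):
    """Apply y = sigma * F(x / sigma) to every sample, updating sigma per subframe.

    Hypothetical helper: the name and interface are assumptions.
    """
    y = []
    for n, sample in enumerate(x):
        sigma = sigma_per_subframe[n // subframe_len]  # sigma > 0 by definition
        y.append(sigma * F(sample / sigma))
    return y

def hard_limit(u):
    """Placeholder normalized characteristic (assumption): clip to [-1, 1]."""
    return max(-1.0, min(1.0, u))

# Two subframes of 10 samples with different envelope values.
signal = [0.5] * 10 + [5.0] * 10
result = postprocess(signal, [1.0, 2.0], hard_limit)
```

Each output sample depends only on the current input sample and the current σ, so the processing adds no delay.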
The flowchart of FIG. 7 details a first post-processing compression function, denoted C1(x). The start and end of the computation are identified by blocks 1000 and 1006. The output value y is first initialized to x (block 1001). Two tests are then performed (blocks 1002 and 1004) to check whether y lies in the interval [-σ, σ]. Three cases are possible:
- if y lies in the interval [-σ, σ], the computation of y is finished: y = x and C1(x) = x; F1(x/σ) = x/σ,
- if y > σ, its value is modified as defined in block 1003; the difference between y and +σ is attenuated by a factor of 16,
- if y < -σ, its value is modified as defined in block 1005; the difference between y and -σ is attenuated by a factor of 16.
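The three cases above can be sketched in a few lines. The text does not reproduce the exact formulas of blocks 1003 and 1005, so the expressions below are inferred from the statement that the difference between y and ±σ is attenuated by a factor of 16; the function name `compress_c1` and the pairing of each test with a block number are assumptions.

```python
def compress_c1(x, sigma):
    """Sketch of C1(x): identity inside [-sigma, sigma], excess divided by 16 outside."""
    y = x                                  # block 1001: initialize y to x
    if y > sigma:                          # first test (assumed to be block 1002)
        y = sigma + (y - sigma) / 16.0     # block 1003 (inferred formula)
    elif y < -sigma:                       # second test (assumed to be block 1004)
        y = -sigma + (y + sigma) / 16.0    # block 1005 (inferred formula)
    return y
```

For example, with σ = 1 an input of 17 exceeds the threshold by 16, so the output is 1 + 16/16 = 2.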
To illustrate the operation y = C1(x), FIG. 8 shows the curve of y/σ as a function of x/σ. The data are normalized by σ to make the input/output characteristic independent of the value of σ. This normalized characteristic is denoted F1(x/σ); consequently, C1(x) = σ·F1(x/σ).
FIG. 8 shows that the function C1(x) performs a symmetrical amplitude compression with a "trigger threshold" set at ±σ. More precisely, the slope of F1(x/σ) is 1 on [-1, +1] and 1/16 elsewhere. Equivalently, the slope of C1(x) is 1 on [-σ, +σ] and 1/16 elsewhere.
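The slopes stated above determine the normalized characteristic up to its continuity at ±1. Assuming the curve is continuous at the thresholds, which is consistent with attenuating only the excess over ±σ, it can be written with u = x/σ as:

```latex
F_1(u) =
\begin{cases}
  -1 + \dfrac{u + 1}{16}, & u < -1, \\[4pt]
  u, & -1 \le u \le 1, \\[4pt]
  1 + \dfrac{u - 1}{16}, & u > 1,
\end{cases}
\qquad
C_1(x) = \sigma \, F_1\!\left(\frac{x}{\sigma}\right).
```

Substituting u = x/σ for x > σ gives C1(x) = σ + (x - σ)/16, i.e. a slope of 1/16 beyond the threshold, as stated.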
Two alternative embodiments of the post-processing are described in FIGS. 9 to 12. The corresponding functions are denoted C2(x) and C3(x), respectively.
The post-processing C2(x) shown in FIGS. 9 and 10 is identical to C1(x) except that the "trigger threshold" goes from ±σ to ±2σ. Thus, the slope of C2(x) is 1 on [-2σ, +2σ] and 1/16 elsewhere.
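Since C2(x) differs from the first compression function only in its trigger threshold, both can be expressed with one parameterized helper. This is a sketch under assumptions: `compress_linear`, `threshold_factor`, and `attenuation` are hypothetical names, and the formula outside the threshold is inferred from the stated slopes rather than copied from FIGS. 9 and 10.

```python
def compress_linear(x, sigma, threshold_factor=1.0, attenuation=16.0):
    """Symmetric compression: slope 1 inside the threshold, 1/attenuation outside."""
    t = threshold_factor * sigma
    if x > t:
        return t + (x - t) / attenuation
    if x < -t:
        return -t + (x + t) / attenuation
    return x

def compress_c2(x, sigma):
    """Sketch of C2(x): trigger threshold at +/- 2*sigma instead of +/- sigma."""
    return compress_linear(x, sigma, threshold_factor=2.0)
```

With σ = 1, an input of 1.5 passes through unchanged under C2, whereas the same helper with the default threshold factor of 1 would already compress it.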
The post-processing C3(x) is a more elaborate variant of C1(x), in which the amplitude compression is performed in two successive steps. As shown in FIG. 11, the trigger interval is still set to [-σ, +σ] (blocks 1402 and 1406), but the value of y is attenuated only by a factor of 1/2, unless the value of y modified by blocks 1403 and 1407 falls outside the interval [-2.5σ, +2.5σ], in which case the value of y is modified again by blocks 1405 and 1409. The operation of C3(x) is illustrated in FIG. 12, which shows that the slope of C3(x) is:
- 1/16 on [-∞, -4σ] and [4σ, +∞],
- 1/2 on [-4σ, -σ] and [σ, 4σ], and
- 1 on [-σ, +σ].
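The two-step compression can be sketched as below. The formulas of blocks 1403, 1405, 1407 and 1409 are not given in the text, so they are inferred from the stated slopes: halving the excess over σ gives the slope of 1/2, and the second-step divisor of 8 is an assumption chosen so that the overall slope beyond 4σ is 1/2 × 1/8 = 1/16. The name `compress_c3` is hypothetical.

```python
def compress_c3(x, sigma):
    """Sketch of C3(x): slope 1 on [-s, s], 1/2 up to 4s, 1/16 beyond (s = sigma)."""
    y = x
    if y > sigma:
        y = sigma + (y - sigma) / 2.0                    # first step (block 1403, inferred)
        if y > 2.5 * sigma:
            y = 2.5 * sigma + (y - 2.5 * sigma) / 8.0    # second step (block 1405, assumed)
    elif y < -sigma:
        y = -sigma + (y + sigma) / 2.0                   # first step (block 1407, inferred)
        if y < -2.5 * sigma:
            y = -2.5 * sigma + (y + 2.5 * sigma) / 8.0   # second step (block 1409, assumed)
    return y
```

The intermediate bound of 2.5σ corresponds to an input of 4σ, since σ + (4σ - σ)/2 = 2.5σ; this is why the slope changes at ±4σ in FIG. 12 although the second test is made against ±2.5σ.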

Claims

1. A method for post-processing, in an audio decoder, a signal reconstructed by temporal and frequency shaping (805, 807) of an excitation signal obtained from at least one parameter estimated in a first frequency band, said temporal and frequency shaping being performed on the basis of at least a time envelope and a frequency envelope received and decoded (801, 802) in a second frequency band, characterized in that said method comprises, after said shaping (805, 807), the steps of comparing the amplitude of said reconstructed signal with said received and decoded time envelope (σ) and, if at least one threshold that is a function of said time envelope is exceeded, applying an amplitude compression to said reconstructed signal.
2. The method according to claim 1, characterized in that said received and decoded time envelope (σ) is defined as an r.m.s. value per subframe of the signal of the second frequency band (Xhi).
3. The method according to any one of claims 1 to 2, characterized in that said amplitude compression consists in applying to the amplitude of said reconstructed signal at least one linear attenuation if said amplitude is greater than at least one trigger threshold that is a function of said received and decoded time envelope (σ).
4. The method according to one of claims 1 to 3, characterized in that said amplitude compression is performed according to a piecewise linear attenuation law triggered by a plurality of trigger thresholds that are functions of said received and decoded time envelope.
5. A computer program comprising program code instructions for implementing the post-processing method according to any one of claims 1 to 4 when said program is executed on a computer.
6. A module for post-processing, in an audio decoder, a signal reconstructed by temporal and frequency shaping of an excitation signal obtained from at least one parameter estimated in a first frequency band, said temporal and frequency shaping being performed on the basis of at least a time envelope and a frequency envelope received and decoded in a second frequency band, characterized in that said post-processing module (808) comprises a comparator for comparing the amplitude of said reconstructed signal with said received and decoded time envelope (σ), and amplitude compression means capable, if at least one threshold that is a function of said time envelope is exceeded, of applying an amplitude compression to said reconstructed signal.
7. An audio decoder comprising a module (501) for estimating at least one parameter of an excitation signal in a first frequency band, a module (803) for reconstructing an excitation signal from said parameter, a module (801) for decoding a time envelope (σ) in a second frequency band, a module (802) for decoding a frequency envelope in a second frequency band, a module (805) for temporal shaping of said excitation signal by means of at least said decoded time envelope (σ), and a module (807) for frequency shaping of said excitation signal by means of at least said decoded frequency envelope, characterized in that said decoder further comprises a post-processing module (808) according to claim 6.
8. The decoder according to claim 7, characterized in that it comprises a frequency envelope interpolation module (806).
EP07731774A 2006-03-20 2007-03-20 Method for post-processing a signal in an audio decoder Withdrawn EP2005424A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR0650954 2006-03-20
PCT/FR2007/050959 WO2007107670A2 (en) 2006-03-20 2007-03-20 Method for post-processing a signal in an audio decoder

Publications (1)

Publication Number Publication Date
EP2005424A2 (en) 2008-12-24

Family

ID=37500047

Family Applications (1)

Application Number Title Priority Date Filing Date
EP07731774A Withdrawn EP2005424A2 (en) 2006-03-20 2007-03-20 Method for post-processing a signal in an audio decoder

Country Status (6)

Country Link
US (1) US20090299755A1 (en)
EP (1) EP2005424A2 (en)
JP (1) JP5457171B2 (en)
KR (1) KR101373207B1 (en)
CN (1) CN101405792B (en)
WO (1) WO2007107670A2 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101008508B1 (en) * 2006-08-15 2011-01-17 브로드콤 코포레이션 Re-phasing of decoder states after packet loss
JP4932917B2 (en) 2009-04-03 2012-05-16 株式会社エヌ・ティ・ティ・ドコモ Speech decoding apparatus, speech decoding method, and speech decoding program
EP2362376A3 (en) 2010-02-26 2011-11-02 Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. Apparatus and method for modifying an audio signal using envelope shaping
CN103069484B (en) * 2010-04-14 2014-10-08 华为技术有限公司 Time/frequency two dimension post-processing
JP5997592B2 (en) 2012-04-27 2016-09-28 株式会社Nttドコモ Speech decoder
CN110890101B (en) * 2013-08-28 2024-01-12 杜比实验室特许公司 Method and apparatus for decoding based on speech enhancement metadata
JP6035270B2 (en) * 2014-03-24 2016-11-30 株式会社Nttドコモ Speech decoding apparatus, speech encoding apparatus, speech decoding method, speech encoding method, speech decoding program, and speech encoding program

Family Cites Families (15)

Publication number Priority date Publication date Assignee Title
JPH07193548A (en) * 1993-12-25 1995-07-28 Sony Corp Noise reduction processing method
US5945932A (en) * 1997-10-30 1999-08-31 Audiotrack Corporation Technique for embedding a code in an audio signal and for detecting the embedded code
GB2351889B (en) * 1999-07-06 2003-12-17 Ericsson Telefon Ab L M Speech band expansion
WO2001022401A1 (en) * 1999-09-20 2001-03-29 Koninklijke Philips Electronics N.V. Processing circuit for correcting audio signals, receiver, communication system, mobile apparatus and related method
JP3810257B2 (en) * 2000-06-30 2006-08-16 松下電器産業株式会社 Voice band extending apparatus and voice band extending method
SE0004818D0 (en) * 2000-12-22 2000-12-22 Coding Technologies Sweden Ab Enhancing source coding systems by adaptive transposition
US7590525B2 (en) * 2001-08-17 2009-09-15 Broadcom Corporation Frame erasure concealment for predictive speech coding based on extrapolation of speech waveform
US7173966B2 (en) * 2001-08-31 2007-02-06 Broadband Physics, Inc. Compensation for non-linear distortion in a modem receiver
US6895375B2 (en) * 2001-10-04 2005-05-17 At&T Corp. System for bandwidth extension of Narrow-band speech
US6988066B2 (en) * 2001-10-04 2006-01-17 At&T Corp. Method of bandwidth extension for narrow-band speech
US20030187663A1 (en) * 2002-03-28 2003-10-02 Truman Michael Mead Broadband frequency translation for high frequency regeneration
CA2457988A1 (en) * 2004-02-18 2005-08-18 Voiceage Corporation Methods and devices for audio compression based on acelp/tcx coding and multi-rate lattice vector quantization
US7720230B2 (en) * 2004-10-20 2010-05-18 Agere Systems, Inc. Individual channel shaping for BCC schemes and the like
US8204261B2 (en) * 2004-10-20 2012-06-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Diffuse sound shaping for BCC schemes and the like
CN1937496A (en) 2005-09-21 2007-03-28 日电(中国)有限公司 Extensible false name certificate system and method

Non-Patent Citations (1)

Title
None *

Also Published As

Publication number Publication date
KR20080109038A (en) 2008-12-16
US20090299755A1 (en) 2009-12-03
WO2007107670A3 (en) 2007-11-08
CN101405792B (en) 2012-09-05
KR101373207B1 (en) 2014-03-12
WO2007107670A2 (en) 2007-09-27
CN101405792A (en) 2009-04-08
JP5457171B2 (en) 2014-04-02
JP2009530679A (en) 2009-08-27

Similar Documents

Publication Publication Date Title
EP1989706B1 (en) Device for perceptual weighting in audio encoding/decoding
EP1907812B1 (en) Method for switching rate- and bandwidth-scalable audio decoding rate
EP1905010B1 (en) Hierarchical audio encoding/decoding
EP2115741B1 (en) Advanced encoding / decoding of audio digital signals
EP2366177B1 (en) Encoding of an audio-digital signal with noise transformation in a scalable encoder
EP2002428B1 (en) Method for trained discrimination and attenuation of echoes of a digital signal in a decoder and corresponding device
EP2452337B1 (en) Allocation of bits in an enhancement coding/decoding for improving a hierarchical coding/decoding of digital audio signals
EP2452336B1 (en) Improved coding /decoding of digital audio signals
WO2007096551A2 (en) Method for binary coding of quantization indices of a signal envelope, method for decoding a signal envelope and corresponding coding and decoding modules
EP2277172A1 (en) Concealment of transmission error in a digital signal in a hierarchical decoding structure
EP2080195A1 (en) Synthesis of lost blocks of a digital audio signal, with pitch period correction
EP2586133A1 (en) Controlling a noise-shaping feedback loop in a digital audio signal encoder
WO2007107670A2 (en) Method for post-processing a signal in an audio decoder
EP2936488B1 (en) Effective attenuation of pre-echos in a digital audio signal
EP2347411B1 (en) Pre-echo attenuation in a digital audio signal
EP2652735B1 (en) Improved encoding of an improvement stage in a hierarchical encoder
FR2737360A1 (en) Audio digital signal coding method of successive sample blocks - using spectral analysis to select vector dictionary for each sample block and allocating vector and scalar quantisation bits

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20081010

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR

17Q First examination report despatched

Effective date: 20090130

RIN1 Information on inventor provided before grant (corrected)

Inventor name: RAGOT, STEPHANE

Inventor name: GUILLAUME, CYRIL

DAX Request for extension of the european patent (deleted)
RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: ORANGE

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 19/04 20130101AFI20170223BHEP

Ipc: G10L 21/02 20130101ALI20170223BHEP

Ipc: G10L 19/24 20130101ALI20170223BHEP

INTG Intention to grant announced

Effective date: 20170310

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20170721

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 21/02 20130101ALI20170223BHEP

Ipc: G10L 19/24 20130101ALI20170223BHEP

Ipc: G10L 19/04 20130101AFI20170223BHEP

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 19/04 20130101AFI20170223BHEP

Ipc: G10L 19/24 20130101ALI20170223BHEP

Ipc: G10L 21/02 20130101ALI20170223BHEP