EP2139000A1 - Method and apparatus for encoding or decoding a speech and/or non-speech audio input signal - Google Patents
Method and apparatus for encoding or decoding a speech and/or non-speech audio input signal Download PDFInfo
- Publication number
- EP2139000A1 (application EP08159018A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- signal
- encoding
- mlt
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Definitions
- the invention relates to a method and to an apparatus for encoding or decoding a speech and/or non-speech audio input signal.
- a disadvantage of the known audio/speech codecs is a clear dependency of the coding quality on the types of content, i.e. music-like audio signals are best coded by audio codecs and speech-like audio signals are best coded by speech codecs.
- No known codec is holding a dominant position for mixed speech/music content.
- a problem to be solved by the invention is to provide a good codec performance for both speech and music, and to further improve the codec performance for such mixed signals. This problem is solved by the methods disclosed in claims 1 and 3. Apparatuses that utilise these methods are disclosed in claims 2 and 4.
- the inventive joint speech/audio codec uses speech coding techniques as well as audio transform coding techniques.
- Known transform-based audio coding processing is combined in an advantageous way with linear prediction-based speech coding processing using one or more Modulated Lapped Transform (MLT) at the codec input and one or more inverse Modulated Lapped Transform (IMLT) at the codec output.
- the MLT output spectrum is separated into frequency bins (low frequencies) assigned to the speech coding section of the codec, and the remaining frequency bins (high frequencies) assigned to the transform-based coding section of the codec, wherein the transform length at the codec input and output can be switched signal adaptively.
- as an alternative, in the transform-based coding/decoding sections the transform length can be switched input signal adaptively.
- the invention achieves a uniformly good codec quality for both speech-like and music-like audio signals, especially for very low bit rates but also for higher bit rates.
- the inventive method is suited for encoding a speech and/or non-speech audio input signal, including the steps:
- the inventive apparatus is suited for encoding a speech and/or non-speech audio input signal, said apparatus including means being adapted for:
- the inventive method is suited for decoding a bit stream representing an encoded speech and/or non-speech audio input signal that was encoded according to the above method, said decoding method including the steps:
- the inventive apparatus is suited for decoding a bit stream representing an encoded speech and/or non-speech audio input signal that was encoded according to the above encoding method, said apparatus including means being adapted for:
- known coding processing for speech-like signals is linear prediction based speech coding processing, e.g. CELP, ACELP, cf. ISO/IEC 14496-3, Subparts 2 and 3, and MPEG4-CELP
- state-of-the-art coding processing for general audio or music-like signals based on a time-frequency transform, e.g. MDCT.
- the PCM audio input signal IS is transformed by a Modulated Lapped Transform MLT having a pre-determined length in step/stage 10.
- a Modified Discrete Cosine Transform MDCT is appropriate for audio coding applications.
- the MDCT was originally called 'Oddly-stacked Time Domain Alias Cancellation Transform' by Princen and Bradley and was published in John P. Princen and Alan B. Bradley, "Analysis/synthesis filter bank design based on time domain aliasing cancellation", IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-34 (5), pp.1153-1161, 1986.
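The time domain alias cancellation property named above can be illustrated with a minimal pure-Python MDCT/iMDCT pair and a sine window. This is an illustrative sketch only: the function names, block length, scaling convention and choice of window are assumptions for demonstration and are not taken from the patent.

```python
import math
import random

def mdct(x, N):
    """MDCT of one block of 2N (windowed) samples -> N frequency coefficients."""
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(X, N):
    """Inverse MDCT: N coefficients -> 2N time samples (still contains aliasing
    that is cancelled by overlap/add of neighbouring blocks)."""
    return [2.0 / N * sum(X[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def sine_window(N):
    """Sine window of length 2N; satisfies the Princen-Bradley condition."""
    return [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]

N = 8
w = sine_window(N)
random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(4 * N)]  # test signal
z = [0.0] * N + x + [0.0] * N                          # zero-pad for edge blocks

# 50%-overlapping analysis, transform, inverse transform, synthesis overlap/add
out = [0.0] * len(z)
for b in range(len(z) // N - 1):
    block = [w[n] * z[b * N + n] for n in range(2 * N)]  # analysis windowing
    y = imdct(mdct(block, N), N)
    for n in range(2 * N):
        out[b * N + n] += w[n] * y[n]                    # synthesis window + OLA

reconstructed = out[N:N + len(x)]
err = max(abs(a - b) for a, b in zip(x, reconstructed))  # ~ floating point noise
```

The aliasing terms of each inverse transform cancel against those of the neighbouring block, which is exactly the perfect-reconstruction property of the critically sampled MDCT filter bank referred to in the text.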
- the obtained spectrum is separated into frequency bins belonging to the speech band (representing a low band signal) and the remaining bins (high frequencies) representing a remaining band signal RBS.
- the speech band bins are transformed back into time domain using the inverse MLT, e.g. an inverse MDCT, with a short transform length with respect to the pre-determined length used in step/stage 10.
- the resulting time signal has a lower sampling frequency than the input time signal and contains only the corresponding frequencies of the speech band bins.
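The separation of the MLT output spectrum into speech-band bins and remaining bins can be sketched as below. The transform length, the split index and the test tone are arbitrary illustrative choices (not values from the patent); the sketch shows that a tone lying on one of the low MDCT basis functions falls entirely into the bins routed to the speech coding path.

```python
import math

def mdct(x, N):
    """Unwindowed MDCT of one block of 2N samples -> N frequency bins."""
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

N = 32   # full transform length at the codec input (assumption)
K = 8    # number of low bins assigned to the speech band (assumption)
k0 = 3   # a low-frequency basis function index, k0 < K

# one block of a tone coinciding with MDCT basis function k0
x = [math.cos(math.pi / N * (n + 0.5 + N / 2) * (k0 + 0.5)) for n in range(2 * N)]
X = mdct(x, N)

speech_band = X[:K]   # low bins -> short inverse MLT / speech coding path
remaining = X[K:]     # high bins -> remaining band signal RBS

low_energy = sum(v * v for v in speech_band)
high_energy = sum(v * v for v in remaining)   # ~0 for this in-band tone
```

A short inverse MLT of length K applied to `speech_band` would then yield a time signal at K/N of the input sampling rate, as described in the surrounding text.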
- the generated time domain signal is then used as input signal for a speech encoding step/stage 12.
- the output of the speech encoding can be transmitted in the output bit stream OBS, depending on a decision made by a below-described speech/audio switch 15.
- the encoded 'speech' signal is decoded in a related speech decoding step/stage 13, and the decoded 'speech' signal is transformed back into frequency domain in step/stage 14 using the MLT corresponding to the inverse MLT of step/stage 11 (i.e. an 'opposite type' MLT having the short length) in order to re-generate the speech band signal, i.e. a reconstructed speech signal RSS.
- in that switch it is decided whether the original low frequency bins are coded together with the remaining high frequency bins (this indicates that the coded 'speech' signal is not transmitted in bit stream OBS), or the difference signal DS is coded together with the remaining high frequency bins in a following quantisation&coding step/stage 16 (this indicates that the coded 'speech' signal is transmitted in bit stream OBS).
- That switch may be operated by using a rate-distortion optimisation.
- An information item SWI about the decision of switch 15 is included in bit stream OBS for use in the decoding.
- in this switch, but also in the other steps/stages, the different delays introduced by the cascaded transforms are to be taken into account.
- the different delays can be balanced using corresponding buffering for these steps/stages. It is possible to use a mixture of original frequency bins and difference signal frequency bins in the low frequency band as input to step/stage 16. In such case, information about how that mixture is composed is conveyed to the decoding side.
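The rate-distortion optimisation mentioned for switch 15 can be sketched as a Lagrangian cost comparison between the two coding options. The cost model, the lambda weight and the input quantities are generic illustrative assumptions; the patent only states that such an optimisation may be used.

```python
def choose_mode(dist_orig, rate_orig, dist_diff, rate_diff, lam=0.1):
    """Decide whether to code the original low-band bins or the difference
    signal DS, by minimising the Lagrangian cost D + lambda * R.
    Returns 'original' or 'difference' (the switch information SWI)."""
    cost_orig = dist_orig + lam * rate_orig
    cost_diff = dist_diff + lam * rate_diff
    return 'original' if cost_orig <= cost_diff else 'difference'
```

For speech-like frames the difference signal is typically cheaper to code for the same distortion, so the decision falls to 'difference' and the encoded speech signal is transmitted in bit stream OBS.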
- the remaining frequency bins output by step/stage 10 (i.e. the high frequencies) are fed to quantisation&coding step/stage 16, in which an appropriate quantisation is used (e.g. like the quantisation techniques used in AAC); subsequently the quantised frequency bins are coded using e.g. Huffman coding or arithmetic coding.
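A quantiser "like the quantisation techniques used in AAC" can be sketched as a power-law (companded) quantiser. The 3/4 power law and the rounding offset below follow the AAC style, but the exact constants and the step-size handling here are illustrative assumptions, not the patent's specification.

```python
def quantise(x, step):
    """Power-law (exponent 3/4) quantisation of one spectral coefficient,
    similar in spirit to the AAC quantiser."""
    s = -1 if x < 0 else 1
    return s * int((abs(x) / step) ** 0.75 + 0.4054)

def dequantise(q, step):
    """Inverse mapping with the matching 4/3 power law."""
    s = -1 if q < 0 else 1
    return s * (abs(q) ** (4.0 / 3.0)) * step
```

The companding gives coarser absolute steps for large coefficients, matching the level-dependent sensitivity exploited by the psycho-acoustic model; the quantised integers would then be entropy coded (e.g. Huffman or arithmetic coding).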
- if the speech/audio switch 15 decides that a music-like input signal is present and therefore the speech coder/decoder or its output is not used at all, the original frequency bins corresponding to the speech band are to be encoded (together with the remaining frequency bins) in the quantisation&coding step/stage 16.
- the quantisation&coding step/stage 16 is controlled by a psycho-acoustic model calculation 18 that exploits masking properties of the input signal IS for the quantisation. Therefore side information SI can be transmitted in the bit stream multiplex to the decoder.
- Switch 15 can also receive suitable control information (e.g. degree of tonality or spectral flatness, or how noise-like the signal is) from psycho-acoustic model step/stage 18.
- a bit stream multiplexer step/stage 17 combines the output code (if present) of the speech encoder 12, the switch information of switch 15, the output code of the quantisation&coding step/stage 16, and optionally side information code SI, and provides the output bit stream OBS.
- the inverse MLT steps/stages 22 are arranged between a first grouping step/stage 21 and a second grouping step/stage 23 and provide a doubled number of output values.
- the number of combined MLT bins, which means the transform length of the inverse MLT, defines the resulting time and frequency resolution, wherein a longer inverse MLT delivers a higher time resolution.
- overlap/add is performed (optionally involving application of window functions) and the output of the inverse MLTs applied on the same input spectrum is sorted such that it results in several (the quantity depends on the size of the inverse MLTs) temporally successive 'short block' spectra which are quantised and coded in step/stage 16.
- the information about this 'short block coding' mode being used is included in the side information SI.
- multiple 'short block coding' modes with different inverse MLT transform lengths can be used and signalled in SI.
- a non-uniform time-frequency resolution over the short block spectra is facilitated, e.g. a higher time resolution for high frequencies and a higher frequency resolution for low frequencies.
- for the lowest frequencies the inverse MLT can get a length of 2 successive frequency bins and for the highest frequencies the inverse MLT can get a length of 16 successive frequency bins.
- if a non-uniform frequency resolution is chosen, it is not possible to group e.g. 8 short block spectra.
- instead, a different order of coding the resulting frequency bins can be used; for example, one 'spectrum' may contain not only different frequency bins at one point in time, but may also include the same frequency bin at different points in time.
- switching between the processing according to Fig. 1 and the processing according to Fig. 2 is carried out adaptively to the input signal IS and is controlled by psycho-acoustic model step/stage 18. For example, if from one frame to the following frame the signal energy in input signal IS rises above a threshold (i.e. there is a transient in the input signal), the processing according to Fig. 2 is carried out. In case the signal energy is below that threshold, the processing according to Fig. 1 is carried out.
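The frame-energy criterion described above can be sketched as follows. The threshold ratio is an assumed parameter; the patent only states that a rise of the signal energy above a threshold triggers the short block (Fig. 2) processing.

```python
def frame_energy(frame):
    """Sum of squared samples of one frame."""
    return sum(s * s for s in frame)

def use_short_blocks(prev_frame, cur_frame, ratio=4.0):
    """True -> transient detected (energy rose above the threshold),
    so the Fig. 2 short block processing is selected; otherwise Fig. 1."""
    e_prev = frame_energy(prev_frame)
    e_cur = frame_energy(cur_frame)
    return e_cur > ratio * max(e_prev, 1e-12)
```

The resulting switching decision would be included in the output bit stream OBS so that the decoder can switch correspondingly.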
- This switching information is included in output bitstream OBS for a corresponding switching in the decoding.
- the transform block sections can be weighted by a window function, in particular in an overlapping manner, wherein the length of a window function corresponds to the current transform length.
- Analysis and synthesis windows can be identical, but need not be.
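For example, the sine window commonly paired with the MDCT satisfies the Princen-Bradley condition w(n)² + w(n+N)² = 1, which is what permits identical analysis and synthesis windows with perfect reconstruction. The specific window below is an illustrative choice; the patent also allows other windows, e.g. the AC-3 window mentioned next.

```python
import math

def sine_window(N):
    """Sine window of length 2N for an MDCT of transform length N."""
    return [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]

N = 256
w = sine_window(N)
# Princen-Bradley condition: the overlapping window halves sum to unit power
pb = [w[n] ** 2 + w[n + N] ** 2 for n in range(N)]
```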
- a further window function is disclosed in table 7.33 of the AC-3 audio coding standard.
- transition window functions are used, e.g. as described in B. Edler, "Codierung von Audiosignalen mit überlappender Transformation und adaptiven Fensterfunktionen", FREQUENZ, vol.43, pp.252-256, 1989, or as used in mp3 and described in the MPEG1 standard ISO/IEC 11172-3, in particular section 2.4.3.4.10.3, or as in AAC (e.g. as described in the MPEG4 standard ISO/IEC 14496-3, Subpart 4).
- the received or replayed bit stream OBS is demultiplexed in a corresponding step/stage 37, thereby providing code (if present) for the speech decoder 33, the switch information SWI for switch 35, the code and the switching information for the decoding step/stage 36, and optionally side information code SI.
- if the speech subcoder 11, 12, 13, 14 was used at encoding side for a current data frame, in that current frame the corresponding encoded speech band frequency bins are correspondingly reconstructed by the speech decoding step/stage 33 and the downstream MLT step/stage 34, thereby providing the reconstructed speech signal RSS.
- the remaining encoded frequency bins are correspondingly decoded in decoding step/stage 36, whereby the encoder-side quantisation operation is reversed correspondingly.
- the speech/audio switch 35 operates corresponding to its operation at encoding side, controlled by switch information SWI.
- if switch information SWI indicates that a music-like input signal is present in the current frame and therefore the speech coding/decoding was not used, the frequency bins corresponding to the low band are decoded together with the remaining frequency bins in the decoding step/stage 36, thereby providing the reconstructed remaining band signal RRBS and the reconstructed low band signal RLBS.
- the output signal or signals of step/stage 36 and of switch 35 are correspondingly combined in inverse MLT (e.g. iMDCT) step/stage 30 in order to provide the reconstructed output signal.
- in switch 35, but also in the other steps/stages, the different delays introduced by the cascaded transforms are to be taken into account.
- the different delays can be balanced using corresponding buffering for these steps/stages.
- if the corresponding option was used at encoding side, not the frequency bins of the combined signal CS but the frequency bins of the reconstructed speech signal RSS are used for the corresponding processing in switch 35 and in step/stage 30, i.e. in steps/stages 16 and 36, respectively, there is no coding/decoding at all of the low band spectrum.
- the decoding in step/stage 36 of the 'short block mode' is illustrated in Fig. 4 .
- several temporally successive 'short block' spectra are to be decoded in step/stage 36 and collected in a first grouping step/stage 43. Overlap/add is performed (optionally involving application of window functions). Thereafter each set of temporally successive spectral coefficients is transformed using the corresponding MLT steps/stages 42, which provides a halved number of output values.
- the generated spectral coefficients are then grouped in a second grouping step/stage 41 to one MLT spectrum with the initial high frequency resolution and transform length.
- multiple 'short block decoding' modes with different MLT transform lengths can be used as signalled in SI, whereby a non-uniform time-frequency resolution over the short block spectra is facilitated, e.g. a higher time resolution for high frequencies and a higher frequency resolution for low frequencies.
- a different cascading of the MLTs can be used wherein the order of the inner MLT/inverse MLT pair in the speech encoder is switched.
- in Fig. 5 a block diagram of a corresponding encoding is depicted, wherein the Fig. 1 reference signs mean the same operations as in Fig. 1.
- the inverse MLT 11 is replaced by an MLT step/stage 51, and the MLT 14 is replaced by an inverse MLT step/stage 54 (i.e. an 'opposite type' MLT). Due to the exchanged order of these MLTs the speech encoder input signal has different properties compared to those in Fig. 1. Therefore the speech coder 52 and the speech decoder 53 are adapted to these different properties (e.g. such that aliasing components are cancelled out).
- a 'short block mode' processing can be used as shown in Fig. 6, wherein MLT steps/stages 62 corresponding to those in Fig. 4 replace the inverse MLT steps/stages 22 in Fig. 2.
- the speech decoding step/stage 33 in Fig. 3 is replaced by a correspondingly adapted speech decoding step/stage 73 and the MLT step/stage 34 in Fig. 3 is replaced by a corresponding inverse MLT step/stage 74.
- a 'short block mode' processing can be used as shown in Fig. 8, wherein inverse MLT steps/stages 82 corresponding to those in Fig. 1 replace the MLT steps/stages 42 in Fig. 4.
- in the embodiment of Fig. 9 a different way of block switching is carried out: instead of a fixed large MLT 10 (e.g. an MDCT), several short MLTs (or MDCTs) 90 can be switched on, e.g. 8 short MDCTs with a transform length of 256 samples.
- it is not required that the sum of the lengths of the short transforms is equal to the long transform length (although such equality makes the buffer handling even easier).
- the internal buffer handling is easier than for the long/short block mode switching according to figures 1 to 8 , at the cost of a less sharp band separation between the speech frequency band and the remaining frequency band.
- the reason for the internal buffer handling being easier is as follows: at least for each inverse MLT operation an additional buffer is required, which in case of an inner transform leads to the necessity of an additional buffer also in the parallel high frequency path. Therefore the switching at the outermost transform has the least side effects concerning buffers.
- because the short blocks are used only for encoding transient input signals, the sharp separation in time domain is more important.
- in Fig. 9 the Fig. 1 reference signs mean the same operations as in Fig. 1.
- the MLT 10 is input signal IS adaptively replaced by short MLT steps/stages 90, the inverse MLT 11 is replaced by shorter inverse MLT steps/stages 91, and the MLT 14 is replaced by shorter MLT steps/stages 94. Due to this kind of block switching, the lengths of the first transform 90, 30 and the second transform 11, 34, 51, 74 (iMDCT to reconstruct the speech band) and the third transform 14, 54 are coordinated. Furthermore, several short blocks of the speech band signal can be buffered after the iMDCT 91 in Fig. 9 in order to collect enough samples for a complete input frame for the speech coder.
- the encoding of Fig. 9 can also be adapted correspondingly to the encoding described for Fig. 5 .
- the decoding according to Fig. 3 is adapted correspondingly, i.e. the inverse MLTs 34 and 30 are each replaced by corresponding adaptively switched shorter inverse MLTs.
- the transform block sections are weighted at encoding side in MLT 90 and at decoding side in inverse MLT 30 by window functions, in particular in an overlapping manner, wherein the length of a window function corresponds to the current transform length. In case of switching the transform length, specially shaped long windows (start and stop windows, or transition windows) are used to achieve a smooth transition between long and short blocks.
Landscapes
- Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
- The invention relates to a method and to an apparatus for encoding or decoding a speech and/or non-speech audio input signal.
- Several wideband or speech/audio codecs are known, for example:
- S. Ragot et al., "ITU-T G.729.1: An 8-32 Kbit/s scalable coder interoperable with G.729 for wideband telephony and voice over IP", IEEE International Conference on Acoustics, Speech and Signal Processing 2007, ICASSP 2007, vol.4, pp.IV-529 to IV-532.
- This wideband speech coder includes an embedded G.729 speech coder, which is used permanently. Therefore the quality for music-like signals (non-speech) is not very good. Although this coder uses transform coding techniques it is a speech coder.
- S.A. Ramprashad, "A two stage hybrid embedded speech/audio coding structure", Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing 1998, ICASSP 1998, vol.1, pp.337-340.
This coder uses a principal structure similar to that of the above-mentioned coder. The processing is based on time domain signals, which implies a difficult handling of the delay in the core encoder/decoder (speech coder). Therefore the processing is based on a common transform in order to reduce this problem. Again, the core coder (i.e. the speech coder) is used permanently, which results in a non-optimal quality for music-like (non-speech) signals.
- M. Purat, P. Noll, "A new orthonormal wavelet packet decomposition for audio coding using frequency-varying modulated lapped transforms", IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 1995, pp.183-186.
- M. Purat, P. Noll, "Audio coding with a dynamic wavelet packet decomposition based on frequency-varying modulated lapped transforms", IEEE International Conference on Acoustics, Speech, and Signal Processing 1996, ICASSP 1996, vol.2, pp.1021-1024.
- A disadvantage of the known audio/speech codecs is a clear dependency of the coding quality on the types of content, i.e. music-like audio signals are best coded by audio codecs and speech-like audio signals are best coded by speech codecs. No known codec is holding a dominant position for mixed speech/music content.
- A problem to be solved by the invention is to provide a good codec performance for both speech and music, and to further improve the codec performance for such mixed signals. This problem is solved by the methods disclosed in claims 1 and 3. Apparatuses that utilise these methods are disclosed in claims 2 and 4.
- The inventive joint speech/audio codec uses speech coding techniques as well as audio transform coding techniques. Known transform-based audio coding processing is combined in an advantageous way with linear prediction-based speech coding processing using one or more Modulated Lapped Transform (MLT) at the codec input and one or more inverse Modulated Lapped Transform (IMLT) at the codec output. The MLT output spectrum is separated into frequency bins (low frequencies) assigned to the speech coding section of the codec, and the remaining frequency bins (high frequencies) assigned to the transform-based coding section of the codec, wherein the transform length at the codec input and output can be switched signal adaptively.
As an alternative, in the transform-based coding/decoding sections the transform length can be switched input signal adaptively.
- The invention achieves a uniformly good codec quality for both speech-like and music-like audio signals, especially for very low bit rates but also for higher bit rates.
- In principle, the inventive method is suited for encoding a speech and/or non-speech audio input signal, including the steps:
- transforming successive and possibly overlapping sections of said input signal by at least one initial MLT transform and splitting the resulting output frequency bins into a low band signal and a remaining band signal;
- passing said low band signal to a speech/audio switching and through a speech coding/decoding loop including at least one short first-type MLT transform, a speech encoding, a corresponding speech decoding, and at least one short second-type MLT transform having a type opposite to that of said first-type short MLT transform;
- quantising and encoding said remaining band signal, controlled by a psycho-acoustic model that receives as its input said audio input signal;
- combining the output signal of said quantising and encoding, a switching information signal of said switching, possibly the output signal of said speech encoding, and optionally other encoding side information, in order to form for said current section of said input signal an output bit stream,
- In principle the inventive apparatus is suited for encoding a speech and/or non-speech audio input signal, said apparatus including means being adapted for:
- transforming successive and possibly overlapping sections of said input signal by at least one initial MLT transform and splitting the resulting output frequency bins into a low band signal and a remaining band signal;
- passing said low band signal to a speech/audio switching and through a speech coding/decoding loop including at least one short first-type MLT transform, a speech encoding, a corresponding speech decoding, and at least one short second-type MLT transform having a type opposite to that of said first-type short MLT transform;
- quantising and encoding said remaining band signal, controlled by a psycho-acoustic model that receives as its input said audio input signal;
- combining the output signal of said quantising and encoding, a switching information signal of said switching, possibly the output signal of said speech encoding, and optionally other encoding side information, in order to form for said current section of said input signal an output bit stream,
- In principle, the inventive method is suited for decoding a bit stream representing an encoded speech and/or non-speech audio input signal that was encoded according to the above method, said decoding method including the steps:
- demultiplexing successive sections of said bitstream to regain the output signal of said quantising and encoding, said switching information signal, possibly the output signal of said speech encoding, and said encoding side information if present;
- if present in a current section of said bitstream, passing said output signal of said speech encoding through a speech decoding and said short second-type MLT transform;
- decoding said output signal of said quantising and encoding, controlled by said encoding side information if present, in order to provide for said current section a reconstructed remaining band signal and a reconstructed low band signal;
- providing a speech/audio switching with said reconstructed low band signal and a second input signal derived from the output of said second-type MLT transform, and passing according to said switching information signal either said reconstructed low band signal or said second input signal;
- inversely MLT transforming the output signal of said switching combined with said reconstructed remaining band signal, and possibly overlapping successive sections, in order to form a current section of the reconstructed output signal.
- In principle the inventive apparatus is suited for decoding a bit stream representing an encoded speech and/or non-speech audio input signal that was encoded according to the above encoding method, said apparatus including means being adapted for:
- demultiplexing successive sections of said bitstream to regain the output signal of said quantising and encoding, said switching information signal, possibly the output signal of said speech encoding, and said encoding side information if present;
- if present in a current section of said bitstream, passing said output signal of said speech encoding through a speech decoding and said short second-type MLT transform;
- decoding said output signal of said quantising and encoding, controlled by said encoding side information if present, in order to provide for said current section a reconstructed remaining band signal and a reconstructed low band signal;
- providing a speech/audio switching with said reconstructed low band signal and a second input signal derived from the output of said second-type MLT transform, and passing according to said switching information signal either said reconstructed low band signal or said second input signal;
- inversely MLT transforming the output signal of said switching combined with said reconstructed remaining band signal, and possibly overlapping successive sections, in order to form a current section of the reconstructed output signal.
- Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.
- Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
- Fig. 1
- Block diagram of the inventive joint speech and audio coder;
- Fig. 2
- Higher time resolution processing in the 'quantisation&coding' step/stage (short block coding);
- Fig. 3
- Block diagram of the inventive joint speech and audio decoder;
- Fig. 4
- Higher time resolution processing in the 'decoding' step/stage (short block decoding);
- Fig. 5
Block diagram of another embodiment of the inventive joint speech and audio coder;
- Fig. 6
- Higher time resolution processing in the 'quantisation&coding' step/stage (short block coding) of the other embodiment;
- Fig. 7
- Block diagram of the inventive joint speech and audio decoder of the other embodiment;
- Fig. 8
- Higher time resolution processing in the 'decoding' step/stage (short block decoding) of the other embodiment;
- Fig. 9
Block diagram of a further embodiment of the inventive joint speech and audio coder (short block coding).
- In the inventive joint speech and audio codec according to Fig. 1, known coding processing for speech-like signals (linear-prediction based speech coding, e.g. CELP, ACELP, cf. ISO/IEC 14496-3, Subparts 2 and 3, and MPEG-4 CELP) is combined with state-of-the-art coding processing for general audio or music-like signals based on a time-frequency transform, e.g. the MDCT. The PCM audio input signal IS is transformed in step/stage 10 by a Modulated Lapped Transform MLT having a pre-determined length. A special case of the MLT, e.g. the Modified Discrete Cosine Transform MDCT, is appropriate for audio coding applications. The MDCT was first called "oddly-stacked time domain alias cancellation transform" by Princen and Bradley and was published in John P. Princen and Alan B. Bradley, "Analysis/synthesis filter bank design based on time domain aliasing cancellation", IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-34(5), pp. 1153-1161, 1986. H.S. Malvar, "Signal Processing with Lapped Transforms", Artech House Inc., Norwood, 1992, and M. Temerinac, B. Edler, "A unified approach to lapped orthogonal transforms", IEEE Transactions on Image Processing, Vol. 1, No. 1, pp. 111-116, January 1992, called it the Modulated Lapped Transform (MLT), showed its relations to lapped orthogonal transforms in general, and also proved it to be a special case of a QMF filter bank. The Modified Discrete Cosine Transform (MDCT) and the inverse MDCT (iMDCT) can be regarded as a critically sampled filter bank with perfect reconstruction properties. For a block of 2N input samples x(n), the MDCT is calculated by:
X(k) = Σ_{n=0}^{2N-1} x(n)·cos[(π/N)·(n + 1/2 + N/2)·(k + 1/2)], k = 0, …, N-1.
- At the MLT output the obtained spectrum is separated into the frequency bins belonging to the speech band (representing a low band signal) and the remaining bins (high frequencies) representing a remaining band signal RBS. In step/stage 11 the speech band bins are transformed back into the time domain using an inverse MLT, e.g. an inverse MDCT, with a transform length that is short with respect to the pre-determined length used in step/stage 10. The resulting time signal has a lower sampling frequency than the input time signal and contains only the frequencies corresponding to the speech band bins. The theory behind using only a subset of the MLT bins in an inverse MLT is described in the above-cited 1995 and 1996 Purat articles.
- The generated time-domain signal is then used as input signal for a speech encoding step/stage 12. The output of the speech encoding can be transmitted in the output bit stream OBS, depending on a decision made by the speech/audio switch 15 described below. The encoded 'speech' signal is decoded in a related speech decoding step/stage 13, and the decoded 'speech' signal is transformed back into the frequency domain in step/stage 14 using the MLT corresponding to the inverse MLT of step/stage 11 (i.e. an 'opposite type' MLT having the short length), in order to re-generate the speech band signal, i.e. a reconstructed speech signal RSS. The difference signal DS between these frequency bins and the original low-frequency bins, as well as the original low-frequency bins themselves, serve as inputs to the speech/audio switch 15. In that switch it is decided whether the original low-frequency bins are coded together with the remaining high-frequency bins in the following quantisation&coding step/stage 16 (this indicates that the coded 'speech' signal is not transmitted in bit stream OBS), or the difference signal DS is coded together with the remaining high-frequency bins (this indicates that the coded 'speech' signal is transmitted in bit stream OBS). That switch may be operated by using a rate-distortion optimisation. An information item SWI about the decision of switch 15 is included in bit stream OBS for use in the decoding. In this switch, but also in the other steps/stages, the different delays introduced by the cascaded transforms are to be taken into account. The different delays can be balanced using corresponding buffering for these steps/stages.
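The MLT/iMDCT cascade of steps/stages 10 to 14 rests on the windowed MDCT/iMDCT pair with 50% overlap-add. The following minimal sketch (pure Python, direct O(N²) evaluation for clarity, sine window; an illustration, not the patent's implementation) shows that samples covered by two overlapping blocks are reconstructed exactly:

```python
from math import cos, sin, pi

def mdct(x):
    """MDCT of a windowed 2N-sample block -> N frequency bins."""
    N = len(x) // 2
    return [sum(x[n] * cos(pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]

def imdct(X):
    """Inverse MDCT: N bins -> 2N time-aliased samples (aliasing cancels on overlap-add)."""
    N = len(X)
    return [2.0 / N * sum(X[k] * cos(pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N)) for n in range(2 * N)]

def sine_window(n2):
    return [sin(pi / n2 * (n + 0.5)) for n in range(n2)]

N = 8                                                # half transform length
w = sine_window(2 * N)
x = [float((n * 7) % 13 - 6) for n in range(4 * N)]  # arbitrary test signal
out = [0.0] * (4 * N)
for off in (0, N, 2 * N):                            # 50% overlapping blocks, hop size N
    y = imdct(mdct([wi * xi for wi, xi in zip(w, x[off:off + 2 * N])]))
    for n in range(2 * N):
        out[off + n] += w[n] * y[n]                  # synthesis windowing + overlap-add
# samples covered by two overlapping blocks are reconstructed exactly
assert all(abs(out[n] - x[n]) < 1e-9 for n in range(N, 3 * N))
```

The same pair, with a shorter length, models the inner inverse MLT 11 and MLT 14 around the speech codec.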
It is possible to use a mixture of original frequency bins and difference-signal frequency bins in the low frequency band as input to step/stage 16. In such a case, information about how that mixture is composed is conveyed to the decoding side.
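The rate-distortion optimisation mentioned for switch 15 can be sketched as comparing bit-cost estimates for the two alternatives; the cost model below is purely illustrative (the patent does not specify one), as is the helper name `bits_estimate`:

```python
import math

def bits_estimate(bins, step=0.5):
    # crude illustrative bit-cost proxy: larger quantised magnitudes cost more bits
    return sum(1 + math.ceil(math.log2(1 + abs(round(b / step)))) for b in bins)

def switch_decision(low_band, difference, speech_code_bits):
    """Return 'speech' if coding the difference signal DS plus the speech-coder
    bitstream is estimated cheaper than coding the original low-band bins."""
    cost_audio = bits_estimate(low_band)
    cost_speech = bits_estimate(difference) + speech_code_bits
    return 'speech' if cost_speech < cost_audio else 'audio'

# a well-predicted speech frame: the difference bins are small
low = [4.0, -3.5, 2.8, -2.2, 1.9]
diff = [0.1, -0.05, 0.2, 0.0, -0.1]
mode = switch_decision(low, diff, speech_code_bits=4)   # -> 'speech'
```

When the speech coder fails to predict the low band (difference as large as the original), the same rule falls back to coding the original bins ('audio').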
In any case, the remaining frequency bins output by step/stage 10 (i.e. the high frequencies) are processed in quantisation&coding step/stage 16.
In step/stage 16 an appropriate quantisation is used (e.g. like the quantisation techniques used in AAC), and subsequently the quantised frequency bins are coded using e.g. Huffman coding or arithmetic coding. - In case the speech/
audio switch 15 decides that a music-like input signal is present and therefore the speech coder/decoder or its output is not used at all, the original frequency bins corresponding to the speech band are to be encoded (together with the remaining frequency bins) in the quantisation&coding step/stage 16.
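The quantisation and entropy coding of step/stage 16 described above can be illustrated with a uniform quantiser and a toy Huffman code over the quantised bin values (a sketch; AAC's actual non-uniform quantiser and codebooks are more elaborate):

```python
import heapq
from collections import Counter

def quantise(bins, step=0.5):
    # uniform quantiser as a simple stand-in for AAC's non-uniform quantiser
    return [round(b / step) for b in bins]

def huffman_code_lengths(symbols):
    """Code length per symbol of a Huffman code built from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    heap = [(f, [s]) for s, f in freq.items()]
    heapq.heapify(heap)
    length = {s: 0 for s in freq}
    while len(heap) > 1:
        f1, s1 = heapq.heappop(heap)
        f2, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every merge adds one bit to the members' codes
            length[s] += 1
        heapq.heappush(heap, (f1 + f2, s1 + s2))
    return length

q = quantise([0.1, -0.2, 0.0, 0.1, 2.6, 0.0, -0.1, 0.0])
lengths = huffman_code_lengths(q)
total_bits = sum(lengths[s] for s in q)
```

Frequent quantised values (here the zeros of a sparse spectrum) receive short codes, which is the effect the standard exploits.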
The quantisation&coding step/stage 16 is controlled by a psycho-acoustic model calculation 18 that exploits masking properties of the input signal IS for the quantisation. For this purpose, side information SI can be transmitted in the bit stream multiplex to the decoder.
Switch 15 can also receive suitable control information (e.g. degree of tonality or spectral flatness, or how noise-like the signal is) from psycho-acoustic model step/stage 18.
A bit stream multiplexer step/stage 17 combines the output code (if present) of the speech encoder 12, the switch information of switch 15, the output code of the quantisation&coding step/stage 16, and optionally side information code SI, and provides the output bit stream OBS. - As shown in Fig. 2, to achieve a higher time resolution in the transform-based coding, several small inverse MLTs (matching the type of MLT 10, e.g. inverse MDCT, iMDCT) can be used at the input of the quantisation&coding step/stage 16 for transforming (22) the long output spectrum of the initial MLT 10, which has a high frequency resolution, into several shorter spectra with lower frequency resolution but higher time resolution. The inverse MLT steps/stages 22 are arranged between a first grouping step/stage 21 and a second grouping step/stage 23 and provide a doubled number of output values. Again, the theory behind this processing is described in the above-cited 1995 and 1996 Purat articles.
In the first grouping 21, several neighbouring MLT bins are combined and used as input for the inverse MLTs 22. The number of combined MLT bins, i.e. the transform length of the inverse MLT, defines the resulting time and frequency resolution, wherein a longer inverse MLT delivers a higher time resolution. In the following grouping 23, overlap/add is performed (optionally involving the application of window functions), and the output of the inverse MLTs applied to the same input spectrum is sorted such that it results in several (the quantity depends on the size of the inverse MLTs) temporally successive 'short block' spectra which are quantised and coded in step/stage 16.
The information that this 'short block coding' mode is being used is included in the side information SI. Optionally, multiple 'short block coding' modes with different inverse MLT transform lengths can be used and signalled in SI. Thereby a non-uniform time-frequency resolution over the short block spectra is facilitated, e.g. a higher time resolution for high frequencies and a higher frequency resolution for low frequencies. For instance, for the lowest frequencies the inverse MLT can have a length of 2 successive frequency bins, and for the highest frequencies the inverse MLT can have a length of 16 successive frequency bins. In case such a non-uniform frequency resolution is chosen, it is not possible to group e.g. 8 short block spectra. A different order of coding the resulting frequency bins can be used; for example, one 'spectrum' may contain not only different frequency bins at one point in time, but also the same frequency bins at different points in time.
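The bookkeeping of the grouping stages 21-23 can be sketched as follows (shapes only; the helper names are hypothetical, and even division of the bin counts is assumed):

```python
def group_bins(spectrum, inv_len):
    """First grouping (21): split the long MLT spectrum into groups of
    neighbouring bins; each group feeds one inverse MLT (22) of that length."""
    assert len(spectrum) % inv_len == 0
    return [spectrum[i:i + inv_len] for i in range(0, len(spectrum), inv_len)]

def short_block_shape(n_bins, inv_len):
    """Each inverse MLT of length L turns L bins into 2L output values, so the
    inverse MLTs (22) together provide a doubled number of output values."""
    n_groups = n_bins // inv_len
    return n_groups, n_groups * 2 * inv_len

def nonuniform_plan(bands):
    """Non-uniform resolution: (bin_count, inverse-MLT length) per band, e.g.
    length 2 for the lowest and length 16 for the highest frequencies."""
    return [(bins // length, length) for bins, length in bands]

groups, out_values = short_block_shape(1024, 128)   # 8 inverse MLTs, 2048 values
plan = nonuniform_plan([(64, 2), (960, 16)])        # (count, length) per band
```

The second grouping (23) then sorts these doubled output values into temporally successive short spectra before quantisation.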
The input-signal-adaptive switching between the processing according to Fig. 1 and the processing according to Fig. 2 is controlled by psycho-acoustic model step/stage 18. For example, if from one frame to the following frame the signal energy in input signal IS rises above a threshold (i.e. there is a transient in the input signal), the processing according to Fig. 2 is carried out. In case the signal energy is below that threshold, the processing according to Fig. 1 is carried out. This switching information, too, is included in output bit stream OBS for a corresponding switching in the decoding. The transform block sections can be weighted by a window function, in particular in an overlapping manner, wherein the length of a window function corresponds to the current transform length.
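The energy-based long/short decision just described can be sketched as follows; the relative-threshold form is an assumption (the text only states that a threshold on the frame energy is used):

```python
def choose_block_mode(prev_frame, cur_frame, ratio=2.0):
    """Return 'short' (Fig. 2 processing) when the frame energy rises above a
    threshold from one frame to the next, i.e. a transient is present;
    otherwise 'long' (Fig. 1 processing)."""
    e_prev = sum(s * s for s in prev_frame) or 1e-12   # guard against silence
    e_cur = sum(s * s for s in cur_frame)
    return 'short' if e_cur > ratio * e_prev else 'long'

steady = [0.5] * 64
attack = [0.5] * 32 + [4.0] * 32
mode_a = choose_block_mode(steady, steady)   # no energy rise
mode_b = choose_block_mode(steady, attack)   # onset: energy jumps
```

The decision, like the psycho-acoustic control it stands in for, is made once per frame and signalled in the bitstream.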
Analysis and synthesis windows can be identical, but need not be. The analysis and synthesis window functions hA(n) and hS(n) must fulfil certain constraints in the overlapping regions of successive blocks i and i+1 in order to enable a perfect reconstruction. For the common case of identical analysis and synthesis windows h(n) = hA(n) = hS(n) of length 2N, these reduce to the symmetry condition h(2N-1-n) = h(n) and the Princen-Bradley condition h²(n) + h²(n+N) = 1 for n = 0, …, N-1. One window function fulfilling these conditions is the sine window h(n) = sin((π/2N)·(n + 1/2)).
- A further window function is disclosed in Table 7.33 of the AC-3 audio coding standard.
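These window constraints can be checked numerically for the common sine window, and a long-to-short transition window (needed when the transform length switches between frames, as discussed above) can be sketched in the AAC-style layout; the start-window construction below follows that common layout, not a formula from this text:

```python
from math import sin, pi

def sine_window(n2):
    return [sin(pi / n2 * (n + 0.5)) for n in range(n2)]

N = 256
h = sine_window(2 * N)
# symmetry h(2N-1-n) = h(n) and power complementarity h(n)^2 + h(n+N)^2 = 1
ok_sym = all(abs(h[2 * N - 1 - n] - h[n]) < 1e-12 for n in range(2 * N))
ok_pc = all(abs(h[n] ** 2 + h[n + N] ** 2 - 1.0) < 1e-12 for n in range(N))

def start_window(n_long, n_short):
    """AAC-style long-to-short 'start' window (sketch): rising long half,
    flat top, falling short half, zero tail."""
    rise = [sin(pi / (2 * n_long) * (n + 0.5)) for n in range(n_long)]
    fall = [sin(pi / (2 * n_short) * (n + 0.5)) for n in range(n_short, 2 * n_short)]
    pad = (n_long - n_short) // 2
    return rise + [1.0] * pad + fall + [0.0] * pad

w_start = start_window(1024, 128)   # 2048-sample transition window
```

The falling short half of the start window overlaps the first rising short window, so the complementarity condition keeps holding across the length switch.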
In case of switching the transform length, transition window functions are used, e.g. as described in B. Edler, "Codierung von Audiosignalen mit überlappender Transformation und adaptiven Fensterfunktionen", FREQUENZ, vol. 43, pp. 252-256, 1989, or as used in mp3 and described in the MPEG-1 standard ISO/IEC 11172-3, in particular section 2.4.3.4.10.3, or as in AAC (e.g. as described in the MPEG-4 standard ISO/IEC 14496-3, Subpart 4). - In the inventive decoder in Fig. 3, the received or replayed bit stream OBS is demultiplexed in a corresponding step/stage 37, thereby providing the code (if present) for the speech decoder 33, the switch information SWI for switch 35, the code and the switching information for the decoding step/stage 36, and optionally the side information code SI. In case the speech sub-coder was used at encoding side, its output code is passed through the speech decoding step/stage 33 and the downstream MLT step/stage 34, thereby providing the reconstructed speech signal RSS. The remaining encoded frequency bins are correspondingly decoded in decoding step/stage 36, whereby the encoder-side quantisation operation is reversed correspondingly. The speech/audio switch 35 operates corresponding to its operation at encoding side, controlled by switch information SWI. In case the switch signal SWI indicates that a music-like input signal is present in the current frame and therefore the speech coding/decoding was not used, the frequency bins corresponding to the low band are decoded together with the remaining frequency bins in the decoding step/stage 36, thereby providing the reconstructed remaining band signal RRBS and the reconstructed low band signal RLBS.
The output signal or signals of step/stage 36 and of switch 35 are correspondingly combined in the inverse MLT (e.g. iMDCT) step/stage 30 and are synthesised in order to provide the decoded output signal OS. In switch 35 and in the other steps/stages, the different delays introduced by the cascaded transforms are to be taken into account. The different delays can be balanced using corresponding buffering for these steps/stages.
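The frame-wise multiplexing (17) and demultiplexing (37) can be sketched with a simple length-prefixed byte layout; this layout is entirely hypothetical, since the patent does not define a bitstream syntax at this level:

```python
import struct

def mux_frame(swi, speech_code, quant_code, side_info=b''):
    """Pack one frame: 1 switch-flag byte, then 16-bit-length-prefixed fields.
    The speech-coder field is only carried when the switch flag says so."""
    fields = [speech_code if swi else b'', quant_code, side_info]
    out = bytes([swi])
    for f in fields:
        out += struct.pack('>H', len(f)) + f
    return out

def demux_frame(buf):
    """Reverse of mux_frame: recover flag and the three fields."""
    swi = buf[0]
    pos, fields = 1, []
    for _ in range(3):
        (n,) = struct.unpack_from('>H', buf, pos)
        fields.append(buf[pos + 2:pos + 2 + n])
        pos += 2 + n
    return swi, fields[0], fields[1], fields[2]

frame = mux_frame(1, b'\x10\x20', b'\xaa\xbb\xcc', b'\x01')
```

A round trip through both functions recovers the fields, mirroring how step/stage 37 regains the codes produced by step/stage 17.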
In case the corresponding option was used at encoding side, not the frequency bins of the combined signal CS but the frequency bins of the reconstructed speech signal RSS are used for the corresponding processing in switch 35 and in step/stage 30, i.e. in steps/stages 16 and 36, respectively, there is no coding/decoding at all of the low band spectrum.
In case at encoding side the 'short block mode' encoding was used to achieve a higher time resolution in the transform-based coding, the corresponding decoding in step/stage 36 is illustrated in Fig. 4. According to the encoding process, several temporally successive 'short block' spectra are decoded in step/stage 36 and collected in a first grouping step/stage 43. Overlap/add is performed (optionally involving the application of window functions). Thereafter each set of temporally successive spectral coefficients is transformed using the corresponding MLT steps/stages 42, which provide a halved number of output values. The generated spectral coefficients are then grouped in a second grouping step/stage 41 into one MLT spectrum with the initial high frequency resolution and transform length. Optionally, multiple 'short block decoding' modes with different MLT transform lengths can be used as signalled in SI, whereby a non-uniform time-frequency resolution over the short block spectra is facilitated, e.g. a higher time resolution for high frequencies and a higher frequency resolution for low frequencies. - As an alternative embodiment, a different cascading of the MLTs can be used wherein the order of the inner MLT/inverse MLT pair in the speech encoder is switched. In Fig. 5 a block diagram of a corresponding encoding is depicted, wherein Fig. 1 reference signs denote the same operations as in Fig. 1. The inverse MLT 11 is replaced by an MLT step/stage 51, and the MLT 14 is replaced by an inverse MLT step/stage 54 (i.e. an 'opposite type' MLT). Due to the exchanged order of these MLTs, the speech encoder input signal has different properties compared to those in Fig. 1. Therefore the speech coder 52 and the speech decoder 53 are adapted to these different properties (e.g. such that aliasing components are cancelled out). - Like in
Fig. 2 for the Fig. 1 embodiment, a 'short block mode' processing can be used in the quantisation&coding step/stage 16 for the Fig. 5 embodiment, as shown in Fig. 6, wherein MLT steps/stages 62 corresponding to those in Fig. 4 replace the inverse MLT steps/stages 22 in Fig. 2. - In the alternative embodiment decoder shown in Fig. 7, the speech decoding step/stage 33 in Fig. 3 is replaced by a correspondingly adapted speech decoding step/stage 73, and the MLT step/stage 34 in Fig. 3 is replaced by a corresponding inverse MLT step/stage 74. - Like in Fig. 4 for the Fig. 3 embodiment, a 'short block mode' processing can be used for the Fig. 7 embodiment, as shown in Fig. 8, wherein inverse MLT steps/stages 82 corresponding to those in Fig. 2 replace the MLT steps/stages 42 in Fig. 4. - Instead of achieving a higher time resolution by the processing described in connection with
Fig. 2 and Fig. 6 (block switching in the quantisation&coding step/stage 16 and in the decoding step/stage 36), in the further embodiment of Fig. 9 a different way of block switching is carried out. Instead of using a fixed large MLT 10 (e.g. an MDCT) before the separation into speech and audio bands, several short MLTs (or MDCTs) 90 can be switched on. For example, instead of using one MDCT with a transform length of 2048 samples, 8 short MDCTs with a transform length of 256 samples can be used. However, it is not mandatory that the sum of the lengths of the short transforms is equal to the long transform length (although this makes buffer handling even easier). - Correspondingly, several short inverse MLTs 91 are used in front of speech encoder 12 and several short MLTs 94 are used following speech decoder 13. Advantageously, for this Fig. 9 long/short block mode switching the internal buffer handling is easier than for the long/short block mode switching according to figures 1 to 8, at the cost of a less sharp band separation between the speech frequency band and the remaining frequency band. The reason for the easier internal buffer handling is as follows: at least one additional buffer is required for each inverse MLT operation, which in case of an inner transform also necessitates an additional buffer in the parallel high frequency path. Therefore the switching at the outermost transform has the fewest side effects concerning buffers.
On the other hand, because the short blocks are used only for encoding transient input signals, the sharp separation in time domain is more important. - In
Fig. 9, the Fig. 1 reference signs denote the same operations as in Fig. 1. The MLT 10 is input-signal-adaptively replaced by short MLT steps/stages 90, the inverse MLT 11 is replaced by shorter inverse MLT steps/stages 91, and the MLT 14 is replaced by shorter MLT steps/stages 94. Due to this kind of block switching, the lengths of the first transform (10, 90), of the second transform (11, 51, 91) and of the third transform (14, 54, 94) are coordinated. Furthermore, several short blocks of the speech band signal can be buffered after the iMDCT 91 in Fig. 9 in order to collect enough samples for a complete input frame for the speech coder.
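Collecting enough short-block output samples for a complete speech-coder input frame can be sketched with a small accumulator; the frame length of 160 samples (20 ms at 8 kHz) is an assumed example, not a value from the text:

```python
class SpeechFrameBuffer:
    """Accumulates short iMDCT output blocks until complete speech-coder
    input frames are available."""

    def __init__(self, frame_len=160):
        self.frame_len = frame_len
        self._pending = []

    def push(self, block):
        """Add one short block; return the list of complete frames now ready."""
        self._pending.extend(block)
        frames = []
        while len(self._pending) >= self.frame_len:
            frames.append(self._pending[:self.frame_len])
            del self._pending[:self.frame_len]
        return frames

buf = SpeechFrameBuffer(frame_len=160)
# five 64-sample short blocks = 320 samples -> two complete 160-sample frames
ready = [f for _ in range(5) for f in buf.push([0.0] * 64)]
```

The same buffering idea balances the different delays of the cascaded transforms mentioned throughout the encoder and decoder descriptions.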
The encoding of Fig. 9 can also be adapted correspondingly to the encoding described for Fig. 5. - Based on the Fig. 9 embodiment, the decoding according to Fig. 3, or the decoding according to Fig. 7, is adapted correspondingly, i.e. the inverse MLTs 34 and 30 are each replaced by corresponding adaptively switched shorter inverse MLTs.
Based on the Fig. 9 embodiment, the transform block sections are weighted at encoding side in MLT 90 and at decoding side in inverse MLT 30 by window functions, in particular in an overlapping manner, wherein the length of a window function corresponds to the current transform length. In case of switching the transform length, specially shaped long windows (the start and stop windows, or transition windows) are used to achieve a smooth transition between long and short blocks.
Claims (15)
- Method for encoding a speech and/or non-speech audio input signal (IS), said method including the steps:- transforming (10, 90) successive and possibly overlapping sections of said input signal (IS) by at least one initial MLT transform and splitting the resulting output frequency bins into a low band signal and a remaining band signal (RBS);- passing said low band signal to a speech/audio switching (15) and through a speech coding/decoding loop including at least one short first-type MLT transform (11, 51, 91), a speech encoding (12, 52), a corresponding speech decoding (13, 53), and at least one short second-type MLT transform (14, 54, 94) having a type opposite than that of said first-type short MLT transform;- quantising and encoding (16) said remaining band signal (RBS), controlled by a psycho-acoustic model that receives as its input said audio input signal (IS);- combining (17) the output signal of said quantising and encoding (16), a switching information signal (SWI) of said switching (15), possibly the output signal of said speech encoding (12, 52), and optionally other encoding side information (SI), in order to form for said current section of said input signal (IS) an output bit stream (OBS),wherein said speech/audio switching (15) receives said low band signal and a second input signal (DS) derived from the output of said short second-type MLT transform (14, 54, 94) and decides, whether said second input signal bypasses said quantising and encoding (16) step or said low band signal is coded together with said remaining band signal (RBS) in said quantising and encoding (16) step,
and wherein in the latter case said output signal of said speech encoding (12, 52) is not included in the current section of said output bit stream (OBS). - Apparatus for encoding a speech and/or non-speech audio input signal (IS), said apparatus including means being adapted for:- transforming (10, 90) successive and possibly overlapping sections of said input signal (IS) by at least one initial MLT transform and splitting the resulting output frequency bins into a low band signal and a remaining band signal (RBS);- passing said low band signal to a speech/audio switching (15) and through a speech coding/decoding loop including at least one short first-type MLT transform (11, 51, 91), a speech encoding (12, 52), a corresponding speech decoding (13, 53), and at least one short second-type MLT transform (14, 54, 94) having a type opposite than that of said first-type short MLT transform;- quantising and encoding (16) said remaining band signal (RBS), controlled by a psycho-acoustic model that receives as its input said audio input signal (IS);- combining (17) the output signal of said quantising and encoding (16), a switching information signal (SWI) of said switching (15), possibly the output signal of said speech encoding (12, 52), and optionally other encoding side information (SI), in order to form for said current section of said input signal (IS) an output bit stream (OBS),
wherein said speech/audio switching (15) receives said low band signal and a second input signal (DS) derived from the output of said short second-type MLT transform (14, 54, 94) and decides, whether said second input signal bypasses said quantising and encoding (16) step or said low band signal is coded together with said remaining band signal (RBS) in said quantising and encoding (16) step,
and wherein in the latter case said output signal of said speech encoding (12, 52) is not included in the current section of said output bit stream (OBS). - Method for decoding a bit stream (OBS) representing an encoded speech and/or non-speech audio input signal (IS) that was encoded according to the method of claim 1, said decoding method including the steps:- demultiplexing (37) successive sections of said bitstream (OBS) to regain the output signal of said quantising and encoding (16), said switching information signal (SWI), possibly the output signal of said speech encoding (12, 52), and said encoding side information (SI) if present;- if present in a current section of said bitstream (OBS), passing said output signal of said speech encoding through a speech decoding (33, 73) and said short second-type MLT transform (34, 74);- decoding (36) said output signal of said quantising and encoding (16), controlled by said encoding side information (SI) if present, in order to provide for said current section a reconstructed remaining band signal (RRBS) and a reconstructed low band signal (RLBS);- providing a speech/audio switching (15) with said reconstructed low band signal and a second input signal (CS) derived from the output of said second-type MLT transform (34, 74), and passing according to said switching information signal (SWI) either said reconstructed low band signal (RLBS) or said second input signal (CS);- inversely MLT transforming (30) the output signal of said switching (15) combined with said reconstructed remaining band signal (RRBS), and possibly overlapping successive sections, in order to form a current section of the reconstructed output signal (OS).
- Apparatus for decoding a bit stream (OBS) representing an encoded speech and/or non-speech audio input signal (IS) that was encoded according to the method of claim 1, said apparatus including means being adapted for:- demultiplexing (37) successive sections of said bitstream (OBS) to regain the output signal of said quantising and encoding (16), said switching information signal (SWI), possibly the output signal of said speech encoding (12, 52), and said encoding side information (SI) if present;- if present in a current section of said bitstream (OBS), passing said output signal of said speech encoding through a speech decoding (33, 73) and said short second-type MLT transform (34, 74);- decoding (36) said output signal of said quantising and encoding (16), controlled by said encoding side information (SI) if present, in order to provide for said current section a reconstructed remaining band signal (RRBS) and a reconstructed low band signal (RLBS);- providing a speech/audio switching (15) with said reconstructed low band signal and a second input signal (CS) derived from the output of said second-type MLT transform (34, 74), and passing according to said switching information signal (SWI) either said reconstructed low band signal (RLBS) or said second input signal (CS);- inversely MLT transforming (30) the output signal of said switching (15) combined with said reconstructed remaining band signal (RRBS), and possibly overlapping successive sections, in order to form a current section of the reconstructed output signal (OS).
- Method according to claim 1 or 3, or apparatus according to claim 2 or 4, wherein in case a single MLT transform (10) is used at the input of the encoding and a single inverse MLT transform (30) is used at the output of the decoding, input signal (IS) adaptively at the input of said quantisation&coding (16) and at the output of said decoding (36) several short MLT transforms each having a length smaller than the length of said single MLT transform (10) and said single inverse MLT transform (30), respectively, are carried out:either short inverse MLT transforms (22) at the input of said quantisation&coding (16) and short MLT transforms (22) at the output of said decoding (36),or short MLT transforms (62) at the input of said quantisation&coding (16) and short inverse MLT transforms (82) at the output of said decoding (36).
- Method or apparatus according to claim 5, wherein said short MLT transforms and said short inverse MLT transforms, respectively, are carried out if the signal energy in a current section of said input signal (IS) exceeds a threshold level.
- Method according to claim 1 or 3, or apparatus according to claim 2 or 4, wherein at the input of the encoding it is switched input signal (IS) adaptively from a single MLT transform (10) to multiple shorter MLT transforms (90), and at the output of said decoding (36) correspondingly from a single inverse MLT transform (30) to multiple shorter inverse MLT transforms.
- Method or apparatus according to claim 7, wherein said multiple shorter MLT transforms and said multiple shorter inverse MLT transforms, respectively, are carried out if the signal energy in a current section of said input signal (IS) exceeds a threshold level.
- Method according to one of claims 1, 3 and 5 to 8, or apparatus according to one of claims 2 and 4 to 8, wherein said second input signal (DS) is the difference signal between said low band signal and the output signal (RSS) of said second-type MLT transform (14, 54, 94).
- Method according to one of claims 1, 3 and 5 to 8, or apparatus according to one of claims 2 and 4 to 8, wherein said second input signal (DS) is said output signal (RSS) of said second-type MLT transform (14, 54, 94).
- Method according to one of claims 1, 3 and 5 to 10, or apparatus according to one of claims 2 and 4 to 10, wherein said switching (15) is controlled by information received from said psycho-acoustic model (18).
- Method according to one of claims 1, 3 and 5 to 11, or apparatus according to one of claims 2 and 4 to 11, wherein said switching (15) is operated by using a rate-distortion optimisation.
- Method according to one of claims 1, 3 and 5 to 12, or apparatus according to one of claims 2 and 4 to 12, wherein successive sections of said input signal (IS) and successive sections for said output signal (OS) are weighted by a window function having a length corresponding to the related transform length, in particular in an overlapping manner, and wherein, if the transform length is switched, corresponding transition window functions are used.
- Digital audio signal that is encoded according to the method of one of claims 1, 3 and 5 to 13.
- Storage medium, for example an optical disc, that contains or stores, or has recorded on it, a digital audio signal according to claim 14.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP08159018A EP2139000B1 (en) | 2008-06-25 | 2008-06-25 | Method and apparatus for encoding or decoding a speech and/or non-speech audio input signal |
CN2009101503026A CN101615393B (en) | 2008-06-25 | 2009-06-19 | Method and apparatus for encoding or decoding a speech and/or non-speech audio input signal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP08159018A EP2139000B1 (en) | 2008-06-25 | 2008-06-25 | Method and apparatus for encoding or decoding a speech and/or non-speech audio input signal |
Publications (2)
Publication Number | Publication Date |
---|---|
EP2139000A1 true EP2139000A1 (en) | 2009-12-30 |
EP2139000B1 EP2139000B1 (en) | 2011-05-25 |
Family
ID=39718977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP08159018A Not-in-force EP2139000B1 (en) | 2008-06-25 | 2008-06-25 | Method and apparatus for encoding or decoding a speech and/or non-speech audio input signal |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP2139000B1 (en) |
CN (1) | CN101615393B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102737636A (en) * | 2011-04-13 | 2012-10-17 | 华为技术有限公司 | Audio coding method and device thereof |
CN106033982A (en) * | 2015-03-13 | 2016-10-19 | 中国移动通信集团公司 | Method and device for realizing ultra wide band voice intercommunication and a terminal |
RU2667380C2 (en) * | 2014-06-24 | 2018-09-19 | Хуавэй Текнолоджиз Ко., Лтд. | Method and device for audio coding |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102074242B (en) * | 2010-12-27 | 2012-03-28 | 武汉大学 | Extraction system and method of core layer residual in speech audio hybrid scalable coding |
CN102103859B (en) * | 2011-01-11 | 2012-04-11 | 东南大学 | Methods and devices for coding and decoding digital audio signals |
CN103198834B (en) * | 2012-01-04 | 2016-12-14 | 中国移动通信集团公司 | A kind of acoustic signal processing method, device and terminal |
CN106463134B (en) | 2014-03-28 | 2019-12-13 | 三星电子株式会社 | method and apparatus for quantizing linear prediction coefficients and method and apparatus for inverse quantization |
EP4418266A2 (en) | 2014-05-07 | 2024-08-21 | Samsung Electronics Co., Ltd. | Method and device for quantizing linear predictive coefficient, and method and device for dequantizing same |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1278184A2 (en) * | 2001-06-26 | 2003-01-22 | Microsoft Corporation | Method for coding speech and music signals |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE60323331D1 (en) * | 2002-01-30 | 2008-10-16 | Matsushita Electric Ind Co Ltd | METHOD AND DEVICE FOR AUDIO ENCODING AND DECODING |
KR100467617B1 (en) * | 2002-10-30 | 2005-01-24 | 삼성전자주식회사 | Method for encoding digital audio using advanced psychoacoustic model and apparatus thereof |
DE10328777A1 (en) * | 2003-06-25 | 2005-01-27 | Coding Technologies Ab | Apparatus and method for encoding an audio signal and apparatus and method for decoding an encoded audio signal |
CN1471236A (en) * | 2003-07-01 | 2004-01-28 | 北京阜国数字技术有限公司 | Signal adaptive multi resolution wave filter set for sensing audio encoding |
- 2008
- 2008-06-25: EP application EP08159018A, granted as patent EP2139000B1 (not in force)
- 2009
- 2009-06-19: CN application CN2009101503026A, granted as patent CN101615393B (expired, fee related)
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1278184A2 (en) * | 2001-06-26 | 2003-01-22 | Microsoft Corporation | Method for coding speech and music signals |
Non-Patent Citations (10)
Title |
---|
J.P. PRINCEN; A.B. BRADLEY: "Analysis/synthesis filter bank design based on time domain aliasing cancellation", IEEE TRANSACTIONS ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, vol. ASSP-34, no. 5, 1986, pages 1153 - 1161 |
B. EDLER: "Codierung von Audiosignalen mit überlappender Transformation und adaptiven Fensterfunktionen" [Coding of audio signals with overlapping transform and adaptive window functions], FREQUENZ, vol. 43, 1989, pages 252 - 256 |
H.S. MALVAR: "Signal processing with lapped transforms", 1992, ARTECH HOUSE INC. |
JOHN P. PRINCEN; ALAN B. BRADLEY: "Oddly-stacked time domain alias cancellation transform" |
M. PURAT; P. NOLL: "A new orthonormal wavelet packet decomposition for audio coding using frequency-varying modulated lapped transforms", IEEE ASSP WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, 1995, pages 183 - 186, XP010154661, DOI: 10.1109/ASPAA.1995.482986 |
M. PURAT; P. NOLL: "Audio coding with a dynamic wavelet packet decomposition based on frequency-varying modulated lapped transforms", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING 1996, ICASSP, vol. 2, 1996, pages 1021 - 1024 |
M. TEMERINAC; B. EDLER: "A unified approach to lapped orthogonal transforms", IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 1, no. 1, January 1992 (1992-01-01), pages 111 - 116 |
RAMPRASHAD S A: "A two stage hybrid embedded speech/audio coding structure", ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 1998. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON SEATTLE, WA, USA 12-15 MAY 1998, NEW YORK, NY, USA,IEEE, US, vol. 1, 12 May 1998 (1998-05-12), pages 337 - 340, XP010279163, ISBN: 978-0-7803-4428-0 * |
S. RAGOT ET AL.: "ITU-T G.729.1: An 8-32 kbit/s scalable coder interoperable with G.729 for wideband telephony and voice over IP", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING 2007, ICASSP 2007, vol. 4, pages 529 - 532 |
S.A. RAMPRASHAD: "A two stage hybrid embedded speech/audio coding structure", PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING 1998, ICASSP 1998, vol. 1, pages 337 - 340, XP000854584, DOI: 10.1109/ICASSP.1998.674436 |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102737636A (en) * | 2011-04-13 | 2012-10-17 | 华为技术有限公司 | Audio coding method and device thereof |
CN102737636B (en) * | 2011-04-13 | 2014-06-04 | 华为技术有限公司 | Audio coding method and device thereof |
RU2667380C2 (en) * | 2014-06-24 | 2018-09-19 | Хуавэй Текнолоджиз Ко., Лтд. | Method and device for audio coding |
US10347267B2 (en) | 2014-06-24 | 2019-07-09 | Huawei Technologies Co., Ltd. | Audio encoding method and apparatus |
US11074922B2 (en) | 2014-06-24 | 2021-07-27 | Huawei Technologies Co., Ltd. | Hybrid encoding method and apparatus for encoding speech or non-speech frames using different coding algorithms |
CN106033982A (en) * | 2015-03-13 | 2016-10-19 | China Mobile Communications Corporation | Method, device and terminal for realizing ultra-wideband voice intercommunication |
CN106033982B (en) * | 2015-03-13 | 2018-10-12 | China Mobile Communications Corporation | Method, apparatus and terminal for realizing ultra-wideband voice intercommunication |
Also Published As
Publication number | Publication date |
---|---|
CN101615393A (en) | 2009-12-30 |
CN101615393B (en) | 2013-01-02 |
EP2139000B1 (en) | 2011-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2139000B1 (en) | Method and apparatus for encoding or decoding a speech and/or non-speech audio input signal | |
EP2255358B1 (en) | Scalable speech and audio encoding using combinatorial encoding of mdct spectrum | |
Neuendorf et al. | Unified speech and audio coding scheme for high quality at low bitrates | |
EP2311032B1 (en) | Audio encoder and decoder for encoding and decoding audio samples | |
EP2186088B1 (en) | Low-complexity spectral analysis/synthesis using selectable time resolution | |
KR101224884B1 (en) | Audio encoding/decoding scheme having a switchable bypass | |
US8595019B2 (en) | Audio coder/decoder with predictive coding of synthesis filter and critically-sampled time aliasing of prediction domain frames | |
EP2041745B1 (en) | Adaptive encoding and decoding methods and apparatuses | |
EP2044589B1 (en) | Method and apparatus for lossless encoding of a source signal, using a lossy encoded data stream and a lossless extension data stream | |
US20060173675A1 (en) | Switching between coding schemes | |
EP2849180B1 (en) | Hybrid audio signal encoder, hybrid audio signal decoder, method for encoding audio signal, and method for decoding audio signal | |
CN101371296B (en) | Apparatus and method for encoding and decoding signal | |
WO2010003491A1 (en) | Audio encoder and decoder for encoding and decoding frames of sampled audio signal | |
JP2001522156A (en) | Method and apparatus for coding an audio signal and method and apparatus for decoding a bitstream | |
KR20100086031A (en) | Technique for encoding/decoding of codebook indices for quantized mdct spectrum in scalable speech and audio codecs | |
EP2301020A1 (en) | Apparatus and method for encoding/decoding an audio signal using an aliasing switch scheme | |
CN101385079A (en) | Device for perceptual weighting in audio encoding/decoding | |
US20130173275A1 (en) | Audio encoding device and audio decoding device | |
US9240192B2 (en) | Device and method for efficiently encoding quantization parameters of spectral coefficient coding | |
Jung et al. | A bit-rate/bandwidth scalable speech coder based on ITU-T G. 723.1 standard | |
Friedrich et al. | Spectral band replication tool for very low delay audio coding applications | |
Motlicek et al. | Frequency domain linear prediction for QMF sub-bands and applications to audio coding | |
EP3002751A1 (en) | Audio encoder and decoder for encoding and decoding audio samples | |
Motlicek et al. | Scalable wide-band audio codec based on frequency domain linear prediction | |
Livshitz et al. | Perceptually Constrained Variable Bitrate Wideband Speech Coder |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR |
AX | Request for extension of the european patent |
Extension state: AL BA MK RS |
17P | Request for examination filed |
Effective date: 20100223 |
17Q | First examination report despatched |
Effective date: 20100324 |
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
AKX | Designation fees paid |
Designated state(s): DE FR GB |
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G10L 19/02 20060101ALI20100830BHEP Ipc: G10L 19/14 20060101AFI20100830BHEP Ipc: G10L 19/04 20060101ALI20100830BHEP Ipc: G10L 11/02 20060101ALI20100830BHEP |
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: THOMSON LICENSING |
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): DE FR GB |
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602008007198 Country of ref document: DE Effective date: 20110707 |
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R084 Ref document number: 602008007198 Country of ref document: DE |
REG | Reference to a national code |
Ref country code: GB Ref legal event code: 746 Effective date: 20110627 |
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R084 Ref document number: 602008007198 Country of ref document: DE Effective date: 20110622 |
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed |
Effective date: 20120228 |
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602008007198 Country of ref document: DE Effective date: 20120228 |
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 8 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20150626 Year of fee payment: 8 Ref country code: DE Payment date: 20150625 Year of fee payment: 8 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20150622 Year of fee payment: 8 |
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R082 Ref document number: 602008007198 Country of ref document: DE Representative=s name: KASTEL PATENTANWAELTE, DE |
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R082 Ref document number: 602008007198 Country of ref document: DE |
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R119 Ref document number: 602008007198 Country of ref document: DE |
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20160625 |
REG | Reference to a national code |
Ref country code: FR Ref legal event code: ST Effective date: 20170228 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20160630 Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170103 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20160625 |