EP1692686A1 - Audio signal coding - Google Patents

Audio signal coding

Info

Publication number
EP1692686A1
EP1692686A1 EP04799284A EP04799284A EP1692686A1 EP 1692686 A1 EP1692686 A1 EP 1692686A1 EP 04799284 A EP04799284 A EP 04799284A EP 04799284 A EP04799284 A EP 04799284A EP 1692686 A1 EP1692686 A1 EP 1692686A1
Authority
EP
European Patent Office
Prior art keywords
type
frequency
decoder
granule
data samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP04799284A
Other languages
German (de)
French (fr)
Inventor
Erik Schuijers
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV
Priority to EP04799284A
Publication of EP1692686A1
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
    • G10L19/022 Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/147 Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Definitions

  • the present invention relates to the encoding and decoding of data signals.
  • the invention relates particularly, but not exclusively, to apparatus for encoding and decoding MPEG-1 layer III data signals.
  • MPEG-1 layer III (commonly known as MP3) is a widely used audio codec.
  • the industry standard for MP3 is described in ISO/IEC JTC 1/SC29/WG11 MPEG, IS 11172- 3, Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s, Part 3: Audio, MPEG-1, 1992.
  • This standard is available from the International Organization for Standardization (ISO) (www.iso.ch) and is hereby incorporated herein by way of reference.
  • Figure 1 presents a simplified block diagram of a typical conventional MPEG-1 layer III encoder 10.
  • the encoder 10 is arranged to receive a PCM input signal comprising a series, or a frame, of 1152 audio samples.
  • the input signal is supplied to a (polyphase) analysis filterbank 12 which filters the input signal into 32 uniformly spaced, overlapping frequency bands to produce 32 downsampled subband signal components, each comprising 36 subband samples.
  • each subband signal is then subjected to a windowed (forward) MDCT (Modified Discrete Cosine Transform) by an MDCT unit 14.
  • Four window types are used to accommodate variable time segmentation.
  • for stationary parts of the signal, the so-called normal windows can be used, while for non-stationary parts of the signal a sequence of so-called short windows can be used.
  • Two transitory types of windows, the so-called start and stop windows, have been defined to prevent discontinuities when switching from normal to short windows and vice versa.
  • for normal, start and stop windows, the MDCT is performed on 36 inputs (i.e. 36 subband samples) and produces 18 output MDCT coefficients, which are commonly referred to as frequency lines.
  • for short windows, the MDCT is performed on three sets of 12 inputs (i.e. three sets of 12 subband samples) and produces three sets of 6 output MDCT coefficients, or frequency lines.
  • a set of 576 MDCT coefficients is known as a granule.
  • for each frame of 1152 input samples, two granules are produced.
  • as a result of the overlapping nature of the encoding process, 18 x 32 = 576 MDCT coefficients, or frequency lines, are effectively produced for each 576 input samples.
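The frame arithmetic described above can be checked with a few lines of Python. This is only an illustrative sketch: the constant names are hypothetical, while the values are those stated in the text.

```python
# Illustrative constants; the names are assumptions, the values come from the text.
FRAME_SAMPLES = 1152       # PCM samples per frame
NUM_SUBBANDS = 32          # bands of the polyphase analysis filterbank
GRANULES_PER_FRAME = 2

# Downsampling by 32 leaves 36 subband samples per subband per frame.
subband_samples_per_frame = FRAME_SAMPLES // NUM_SUBBANDS

# Normal, start and stop windows: one MDCT of 36 inputs gives 18 frequency lines.
lines_per_subband_long = 36 // 2
# Short windows: three MDCTs of 12 inputs give 3 x 6 frequency lines.
lines_per_subband_short = 3 * (12 // 2)

# One granule collects the frequency lines of all 32 subbands.
granule_lines = NUM_SUBBANDS * lines_per_subband_long
input_samples_per_granule = FRAME_SAMPLES // GRANULES_PER_FRAME
```

Both window families yield 18 lines per subband, so a granule always holds 576 frequency lines for 576 effective input samples, which is the critical sampling property referred to in the text.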
  • the MDCT frequency lines are provided to anti-aliasing butterflies 16 to reduce the effect of aliasing caused by downsampling the partially overlapping filters of the filterbank 12.
  • a quantization and coding unit 18 performs appropriate quantization and coding of the frequency lines to produce an output signal in a prescribed bitstream format.
  • the quantization and coding is performed under the control of a bit-allocation unit 20 which performs a bit-allocation algorithm, typically steered by a psycho-acoustic model.
  • Figure 2 presents a simplified block diagram of a conventional MPEG-1 layer III decoder 30.
  • the decoder 30 is arranged to receive an input signal in the prescribed bitstream format.
  • a decoding and dequantizing unit 32 performs decoding and dequantization of the bitstream to produce frequency lines, or MDCT coefficients.
  • a respective 576 frequency lines are reproduced for each set of 576 MDCT frequency lines produced by the encoder 10.
  • the frequency lines are provided to a re-ordering unit 34 which re-orders the frequency lines, in case of short type of windows, within each granule. In case of normal, start or stop windows, the frequency lines are provided to aliasing butterflies 36 which perform the inverse of the anti-aliasing operation performed by the anti-aliasing butterflies 16.
  • An IMDCT unit 38 performs an IMDCT (Inverse Modified Discrete Cosine Transform) on the frequency lines to produce 32 subband signal components each comprising 36 subband samples. For those frequency lines corresponding to a normal, start or stop window MDCT, the IMDCT unit 38 takes as input 18 frequency lines and generates 36 subband samples. For those frequency lines corresponding to a short window MDCT, the IMDCT unit 38 takes as input 3 sets of 6 frequency lines and generates 3 sets of 12 subband samples. A windowing operation and standard overlapping and adding operations are performed on the subband samples by a windowing and overlap-add unit 40. Information on which type of window to use is carried in the associated side information of the bit stream.
  • the subband samples are provided to a (polyphase) synthesis filterbank 42, which also comprises upsampling by a factor of 32, to produce an output signal comprising PCM samples.
  • the filterbanks 12, 42 comprise a prototype low pass filter that is cosine modulated to form the higher frequency bands.
  • the serial combination of a subband filterbank and an MDCT unit is known as a hybrid filterbank, because it partially consists of a filterbank and partially consists of a transform.
  • the analysis filterbank 12 and the MDCT unit 14 together comprise a hybrid analysis filterbank while in the decoder 30, the IMDCT unit 38 and the synthesis filterbank 42 together comprise a hybrid synthesis filterbank.
  • a first aspect of the invention provides a decoder for data signals encoded by providing a data signal to a subband filterbank and by performing a respective forward frequency transform on each resulting subband signal, the decoder comprising means for decoding and dequantizing a received data signal to produce a plurality of granules of frequency lines; means for performing one or more inverse frequency transforms on each granule to produce a plurality of data samples; and means for applying one or more types of window functions to said data samples to produce a plurality of windowed data samples, wherein, in respect of at least a first type of window function, said inverse frequency transform means is arranged to perform a single inverse frequency transform on all frequency lines of a respective granule, and wherein the decoder further includes
  • a second aspect of the invention provides a method of decoding data signals encoded by providing a data signal to a subband filterbank and by performing a respective forward frequency transform on each resulting subband signal, the method comprising decoding and dequantizing a received data signal to produce a plurality of granules of frequency lines; performing one or more inverse frequency transforms on each granule to produce a plurality of data samples; applying one or more types of window functions to said data samples to produce a plurality of windowed data samples; and constructing an output signal from said windowed data samples, wherein, in respect of at least a first type of window function, a single inverse frequency transform is performed on all frequency lines within a respective granule.
  • the first and second aspect of the invention each allow the output signal to be generated without the need for a filterbank.
  • the encoded data signals comprise MPEG-1 layer III data signals and the forward and inverse frequency transforms comprise the Modified Discrete Cosine Transform (MDCT) and the Inverse Modified Discrete Cosine Transform (IMDCT) respectively.
  • the forward frequency transform comprises the Modified Discrete Cosine Transform (MDCT) and the encoded data signals comprise MPEG-1 layer III data signals.
  • a third aspect of the invention provides an encoder for an input signal comprising a plurality of data samples, the encoder comprising means for applying one or more types of window functions to said data samples to produce a plurality of windowed data samples; means for performing one or more modified discrete cosine transforms (MDCTs) on the windowed data samples to produce a plurality of granules of frequency lines; and means for encoding and quantizing each granule to generate MPEG-1 layer III type data signals, wherein, in respect of at least a first type of window function, said MDCT means is arranged to perform a single MDCT on all windowed data samples of the received data signal in respect of which a respective granule is produced.
  • a fourth aspect of the invention provides a method of encoding an input signal comprising a plurality of data samples, the method comprising applying one or more types of window functions to said data samples to produce a plurality of windowed data samples; performing one or more modified discrete cosine transforms (MDCTs) on the windowed data samples to produce a plurality of granules of frequency lines; encoding and quantizing each granule to generate MPEG-1 layer III type data signals, wherein, in respect of at least a first type of window function, a single MDCT is performed on all windowed data samples of the received data signal in respect of which a respective granule is produced.
  • a fifth aspect of the invention provides a system, or codec, for encoding and decoding data signals, the system comprising an encoder of the third aspect of the invention and a decoder of the first aspect of the invention.
  • Figure 1 is a block diagram of a conventional MPEG-1 layer III encoder
  • Figure 2 is a block diagram of a conventional MPEG-1 layer III decoder
  • Figure 3 is a graphical representation of MDCT coefficients coming from the
  • Figure 6 is a block diagram of a decoder for MPEG-1 layer III signals, the decoder embodying one aspect of the present invention
  • Figure 7 shows the order of MDCT coefficients for short windows after reordering in the decoding apparatus of Figure 6
  • Figure 8 is a block diagram of an encoder for generating MPEG-1 layer III type signals embodying a third aspect of the invention.
  • a typical data frame comprises two granules of 576 frequency lines, or MDCT coefficients, each.
  • the 576 frequency lines comprise a respective set of 18 frequency lines for each of the 32 subbands.
  • each set of 18 frequency lines is comprised of 3 sets of 6 frequency lines.
  • the transformations are performed by the hybrid filterbank 12, 14.
  • the MDCT unit 14 performs one or more
  • the MDCTs performed by MDCT unit 14 may be said to comprise "short" MDCTs in that each MDCT is performed on only a respective (relatively small) portion of the frame data at a time.
  • for normal, start and stop windows, a single MDCT is performed on the 36 input samples of a subband to produce 18 frequency lines.
  • for short windows, three MDCT transforms are performed, each on a respective set of 12 input samples of a subband, to produce a respective set of 6 frequency lines.
  • the inverse MDCTs performed by IMDCT unit 38 may be said to comprise "short" inverse MDCTs since each inverse MDCT is performed on only a respective portion of the decoded and dequantized frequency lines produced in respect of the data frame.
  • for normal, start and stop windows, a single inverse MDCT is performed on the 18 frequency lines of a subband to produce 36 time domain samples.
  • for short windows, three inverse MDCT transforms are performed, each on a respective set of 6 frequency lines of a subband, to produce a respective set of 12 time domain samples.
  • a method of decoding MP3 data wherein one or more "long" inverse MDCTs are performed on the decoded and dequantized frequency lines, or MDCT coefficients, produced in respect of a whole data granule.
  • for a granule of 576 frequency lines, or MDCT coefficients, when a normal, start or stop type of window is required, a single "long" inverse MDCT is performed on all 576 frequency lines to produce 1152 time domain samples while, for a short type of window, three "long" inverse MDCTs are each performed on a respective set of 192 frequency lines to produce a respective set of 384 time domain samples.
  • one or more inverse MDCTs are performed on all of the frequency lines of a granule as a whole rather than being performed on the respective frequency lines associated with respective subbands. It is found that, with some pre-processing of the frequency lines and with appropriate windowing and overlap-add operations, the outputs of the "long" inverse MDCTs may be used to provide a perceptually close approximation of the desired PCM output signal, thereby removing the need for a filterbank in the decoder. Similar principles may be applied during the encoding process, thereby removing the need for a filterbank in the encoder. This is described in more detail below. In arriving at the present invention, the following observations are made:
  • an ideal filterbank consists of rectangular non-overlapping passbands.
  • the hybrid filterbank could be approximated quite accurately by a single "long" MDCT as described above.
  • the combination of filterbank and anti-aliasing butterflies gives a relatively good approximation of an ideal filterbank.
  • the hybrid filterbank in combination with anti-aliasing butterflies can be replaced by a single "long" MDCT.
  • n is a time index which, for conventional MP3 encoders, denotes subband sample index
  • N is the transform length or size
  • k is a frequency index
  • x[n] is the time domain signal which, in conventional MP3 encoders, comprises the subband time domain signal comprised of the subband samples
  • c[k] is the frequency domain MDCT spectrum.
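The equation to which these symbols belong is not reproduced in this text. As a hedged illustration, the sketch below assumes the textbook MDCT definition, c[k] = sum over n of x[n]·cos((2π/N)(n + 1/2 + N/4)(k + 1/2)) for k = 0 … N/2 − 1, which is the form commonly used for MP3; normalisation conventions vary between references, and the function name `mdct` is hypothetical.

```python
import math

def mdct(x):
    """Forward MDCT (textbook form, assumed): N time samples x[n] -> N/2 coefficients c[k]."""
    N = len(x)                 # transform length
    n0 = 0.5 + N / 4.0         # phase offset of the standard MDCT kernel
    return [
        sum(x[n] * math.cos(2.0 * math.pi / N * (n + n0) * (k + 0.5))
            for n in range(N))
        for k in range(N // 2)
    ]

# Conventional "short" transforms operate on the samples of one subband:
c_normal = mdct([1.0] * 36)    # 36 subband samples -> 18 frequency lines
c_short = mdct([1.0] * 12)     # 12 subband samples -> 6 frequency lines
```

The same function applied to 1152 (or 384) samples gives the "long" transform sizes discussed in the text; only N changes.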
  • Figure 3 illustrates graphically the output of the hybrid analysis filterbank, after the anti-aliasing butterflies, for a delta pulse input. It can be seen that the spectrum shown in Figure 3 is comprised of a cosine-type waveform with the waveform corresponding to odd, i.e. alternate or every second, subbands being negated (multiplied by -1). This is a characteristic shared with the output of a hybrid filterbank, which is known to comprise negated alternate subband components. Indeed, for every second subband of the synthesis filterbank 42 in the decoder 30, every second input value is negated (i.e. multiplied by -1) to compensate for the frequency inversion caused by the analysis filterbank 12 in the encoder 10.
  • the distortion that can be seen in Figure 4 is caused by the aliasing due to downsampling in the analysis filterbank which is only partially compensated by the anti-aliasing butterflies and by the fact that the analysis filter bank does not have an ideal linear phase characteristic.
  • the operation of a hybrid filterbank may be approximated by an MDCT.
  • one or more "long" inverse MDCTs are used to replace the operation of the hybrid synthesis filterbank 38, 42 of the decoder 30.
  • one or more "long" MDCTs may be used to replace the operation of the hybrid analysis filterbank 12, 14 of the encoder 10.
  • the decoding apparatus, or decoder 60 comprises a decoding and dequantization unit 62 arranged to receive a data signal in the form of an MPEG-1 layer III bitstream, or similarly encoded data signal.
  • the decoding and dequantization unit 62 performs appropriate decoding (typically Huffman decoding as prescribed by MP3) and re-quantization of the received bitstream to recover a plurality of frequency lines, or MDCT coefficients.
  • the decoding and dequantization unit 62 may perform standard MP3 decoding and re-quantization. Typically, for a frame comprising 1152 input audio samples, two granules of 576 frequency lines are recovered by the unit 62 (due to the overlap-add operation performed in the windowing, effectively 576 input samples deliver 576 MDCT coefficients and so the system is critically sampled).
  • the decoder 60 includes a re-ordering unit 64 for re-ordering, as necessary, the frequency lines produced by the decoding and dequantization unit 62. The re-ordering reverses the re-ordering that is normally performed by an encoder. This is described in more detail below.
  • the re-ordering unit 64 may determine what type of re-ordering is required from the side information associated with the respective frame.
  • An inverse MDCT (IMDCT) unit 68 is provided for performing one or more inverse MDCTs on the re-ordered frequency lines. As described above, the IMDCT unit 68 is arranged to operate on a whole granule of frequency lines at a time, performing either a single inverse MDCT on all frequency lines within the granule (when normal, start or stop type windows are required) or a plurality of inverse MDCTs on a corresponding number of subsets of all the frequency lines within the granule (when short type windows are required).
  • for an MP3 bitstream where a granule comprises 576 frequency lines, the IMDCT unit 68 performs a single inverse MDCT on the whole granule for normal, start or stop windows, resulting in 1152 time domain samples, and, for short windows, three inverse MDCTs, each on a respective one of 3 sub-sets of 192 frequency lines, resulting in three respective sequences, or sets, of 384 time domain samples.
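The choice of transform sizes made by the IMDCT unit 68 can be summarised in a small helper. This is a minimal sketch; the function name `imdct_plan` and its return format are hypothetical, and the window type numbering (0, 1 and 3 for normal, start and stop, 2 for short) follows the abstract.

```python
def imdct_plan(window_type, granule_lines=576):
    """Return one (frequency_lines_in, time_samples_out) pair per inverse MDCT."""
    if window_type in (0, 1, 3):
        # Normal, start or stop: a single "long" inverse MDCT over the whole granule.
        return [(granule_lines, 2 * granule_lines)]
    if window_type == 2:
        # Short: three "long" inverse MDCTs, one per third of the granule.
        third = granule_lines // 3
        return [(third, 2 * third)] * 3
    raise ValueError("window_type must be 0, 1, 2 or 3")

plan_normal = imdct_plan(0)
plan_short = imdct_plan(2)
```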
  • the output of the IMDCT unit 68 comprises a plurality (1152 in the present example) of recovered signal components, or samples, which may be used to construct a PCM output signal. In order to construct the PCM output signal, windowing and overlap-add operations are performed on the signal samples produced by the IMDCT unit 68.
  • the decoder 60 further includes a windowing and overlap-add unit 70, the operation of which is described in more detail below.
  • the synthesis filterbank 42 of a conventional MP3 decoder 30 negates alternate subband signal components, or subband channels, to compensate for the frequency inversion of the analysis filterbank 12 of the encoder 10.
  • the decoder 60 includes a negation unit 66 for negating, i.e. multiplying the relevant MDCT coefficients by -1, alternate subband signal components, or channels.
  • the negation unit 66 is shown in Figure 6 between the re-ordering unit 64 and the IMDCT unit 68 but may alternatively be located elsewhere, for example between the decoding and dequantization unit 62 and the re-ordering unit 64. It is also noted that the analysis filterbank 12 has overlapping subbands. The effects of this are normally reduced by the anti-aliasing butterflies 16 that are normally included in the encoder 10.
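The operation of the negation unit 66 can be sketched as follows. The sketch assumes that the frequency lines of a granule are stored subband by subband (18 lines per subband for normal, start and stop windows) and that the odd-numbered subbands are the ones negated; the function name is hypothetical.

```python
def negate_alternate_subbands(lines, lines_per_subband=18):
    """Return a copy in which every line of an odd-numbered subband is multiplied by -1."""
    out = list(lines)
    for i in range(len(out)):
        subband = i // lines_per_subband   # assumed subband-major storage
        if subband % 2 == 1:
            out[i] = -out[i]
    return out

granule = [1.0] * 576
negated = negate_alternate_subbands(granule)
```

For a short-window granule that has already been re-ordered window by window, each window holds 192 = 32 x 6 lines, so calling the same helper with `lines_per_subband=6` would select the same odd subbands under these assumptions.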
  • conventional MP3 windowing is now described in more detail. Within MP3 four different window types (and accompanying lengths) are prescribed, namely 'normal', 'start', 'short' and 'stop'.
  • a particular type of window, or sequence of different window types, is selected to suit the characteristics of the portion of the data to which the window(s) are to be applied. For example, short type windows are usually applied to data portions corresponding to transients in the audio signal.
  • the side information associated with a given data frame indicates which window types are to be used with the granule.
  • the required window type affects both the length, or size, of the MDCT (and therefore inverse MDCT) and the windowing/overlap-add operations.
  • the window functions z(n) may be described as follows: For a normal type of window (type 0):
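The window equations themselves are not reproduced in this text. As a hedged illustration, the sketch below uses the window shapes as they are commonly given for MPEG-1 layer III (sine windows of 36 and 12 points, with start and stop windows built from a long half, a flat section, a short half and zeros); this is an assumption and not a quotation of the equations. The `scale` parameter is also an assumption: it stretches the windows so that, with `scale=32`, the lengths become 1152 and 384, matching the divisors mentioned later in the text.

```python
import math

def window(window_type, scale=1):
    """Window z(n) for MP3 window types 0 (normal), 1 (start), 2 (short), 3 (stop)."""
    long_len, short_len = 36 * scale, 12 * scale
    half_long, half_short = long_len // 2, short_len // 2

    def long_sine(n):
        return math.sin(math.pi / long_len * (n + 0.5))

    def short_sine(n):
        return math.sin(math.pi / short_len * (n + 0.5))

    if window_type == 0:   # normal: full long sine window
        return [long_sine(n) for n in range(long_len)]
    if window_type == 1:   # start: long rise, flat top, short fall, zeros
        return ([long_sine(n) for n in range(half_long)]
                + [1.0] * half_short
                + [short_sine(n) for n in range(half_short, short_len)]
                + [0.0] * half_short)
    if window_type == 2:   # short: applied to each of the three short blocks
        return [short_sine(n) for n in range(short_len)]
    if window_type == 3:   # stop: zeros, short rise, flat top, long fall
        return ([0.0] * half_short
                + [short_sine(n) for n in range(half_short)]
                + [1.0] * half_short
                + [long_sine(n) for n in range(half_long, long_len)])
    raise ValueError("window_type must be 0, 1, 2 or 3")

z_normal = window(0)
z_start = window(1)
z_stop = window(3)
```

Under these definitions the stop window is the time reversal of the start window, and the normal window satisfies z(n)² + z(n + 18)² = 1, the condition needed for aliasing cancellation with 50% overlap.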
  • each granule of 576 MDCT coefficients (32 subbands times 3 windows times 6 MDCT coefficients) is ordered to allow a more efficient encoding.
  • corresponding re-ordering takes place to reverse the re-ordering performed by the encoder.
  • the MDCT coefficients, or frequency lines, of a granule are re-ordered, in increasing granularity, according to frequency line, then window index and then sub-band.
  • each frequency line, or MDCT coefficient may be accorded a respective frequency line index from 0 to 575.
  • the frequency lines are ordered in accordance with a subband index which denotes to which subband they belong and runs from 0 to 31.
  • the frequency lines are ordered in accordance with a window index which denotes which window is to be applied to the frequency lines and runs from 0 to 2.
  • the frequency lines are ordered in accordance with a frequency line sub-index which denotes the order in which the frequency lines are provided to the MDCT and runs from 0 to 5.
  • the re-ordering unit 64 is arranged to re-order the frequency lines of a granule in a manner different to that described above for a conventional decoder.
  • the re-ordering unit 64 re-orders the frequency lines, in increasing granularity, according to frequency line, then subband and finally window. This is illustrated in Figure 7 from which it may be seen that within a granule 50', the frequency lines are ordered at the highest level according to window index, then according to subband index and then according to frequency line sub-index.
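The difference between the two orderings can be illustrated with an index mapping. This is a sketch under stated assumptions: the input is taken to be in the conventional decoder order (subband, then window, then frequency line sub-index), whereas in practice the re-ordering unit 64 may work directly from the bitstream order, which is not detailed here. The function name is hypothetical.

```python
def reorder_short_granule(lines):
    """Map (subband, window, line) order to (window, subband, line) order for one granule."""
    if len(lines) != 576:
        raise ValueError("a granule holds 576 frequency lines")
    out = [0] * 576
    for sb in range(32):           # subband index, 0 to 31
        for win in range(3):       # window index, 0 to 2
            for k in range(6):     # frequency line sub-index, 0 to 5
                src = sb * 18 + win * 6 + k     # conventional position
                dst = win * 192 + sb * 6 + k    # window-major position
                out[dst] = lines[src]
    return out

reordered = reorder_short_granule(list(range(576)))
```

After this mapping each block of 192 consecutive lines belongs to one window, so each block can be passed as a whole to one of the three "long" inverse MDCTs.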
  • the construction of the PCM output signal by the windowing and overlap-add unit 70 in conjunction with the IMDCT unit 68 is now described. It is assumed in the following example that the original PCM signal comprises frames of 1152 audio samples, each frame being transformed into two granules of 576 frequency lines (or MDCT coefficients).
  • the IMDCT unit 68 operates on granules of 576 MDCT coefficients to produce a signal comprising 1152 samples which are then provided to the windowing and overlap-add unit 70.
  • the output signal v_0(n) is initialised to zero for all n.
  • the generation of the signal x_l(n) is dependent on the specified window type.
  • the window type for the l-th granule is 0, 1, or 3
  • the IMDCT unit 68 performs an inverse MDCT on 576 input coefficients provided by X_l(k) to produce a temporary signal x_tmp(n) of 1152 points as described in equation [9]:
  • the windowing and overlap-add unit 70 calculates the signal x_l(n) as:
  • the windowing and overlap-add unit 70 calculates the signal x_l(n) by first calculating three temporary signals:
  • the windowing and overlap-add unit 70 calculates the signal x_l(n) as:
  • divisor 1152 corresponds with the IMDCT length N and divisor 384 corresponds with N/3.
  • the respective window lengths of the window functions z(n) in equations [11], [12], [13] and [15] are longer in accordance with the respective transform length N and the respective divisors are correspondingly larger.
  • the window functions z(n) of equations [11], [12], [13] and [15] may be said to comprise up-sampled versions of the window functions z(n) described in equations [4], [5], [6] and [7] respectively, the extent of the up-sampling depending on the respective transform length/window length, N. It will also be noted that the window functions of equations [11], [12], [13] and [15] each comprises a single window function even though its application may involve the application of more than one window.
  • the windowing and overlap-add unit 70 makes only one application of the specified window type, i.e. applies only one window function, to the samples of a whole granule. This is in contrast to the conventional decoder 30 in which a window function is applied in respect of each subband.
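The interplay of a "long" inverse MDCT, a single window per granule and overlap-add can be illustrated with a small round trip. This is a sketch under several assumptions: the textbook MDCT and inverse MDCT are used (the numbered equations are not reproduced in this text), only the normal window type is handled, and the granules are produced here by a plain windowed MDCT rather than taken from a real MP3 bitstream. In the decoder described above N would be 1152; the demonstration uses N = 16 because this direct implementation costs O(N²) per transform. All function names are hypothetical.

```python
import math

def mdct(x):
    """Textbook forward MDCT: N samples -> N/2 coefficients."""
    N = len(x)
    return [sum(x[n] * math.cos(2 * math.pi / N * (n + 0.5 + N / 4) * (k + 0.5))
                for n in range(N)) for k in range(N // 2)]

def imdct(c):
    """Textbook inverse MDCT: N/2 coefficients -> N samples (scaled for windowed use)."""
    N = 2 * len(c)
    return [4.0 / N * sum(c[k] * math.cos(2 * math.pi / N * (n + 0.5 + N / 4) * (k + 0.5))
                          for k in range(N // 2)) for n in range(N)]

def sine_window(N):
    """Normal (type 0) window stretched to length N."""
    return [math.sin(math.pi / N * (n + 0.5)) for n in range(N)]

def encode_granules(x, N):
    """Window overlapping blocks of N samples (hop N/2) and transform each one."""
    z, hop = sine_window(N), N // 2
    return [mdct([z[n] * x[s + n] for n in range(N)])
            for s in range(0, len(x) - N + 1, hop)]

def decode_granules(granules, N):
    """One inverse MDCT and one window application per granule, then overlap-add."""
    z, hop = sine_window(N), N // 2
    out = [0.0] * (hop * (len(granules) + 1))    # output initialised to zero
    for l, X in enumerate(granules):
        x_tmp = imdct(X)                         # temporary signal of N points
        for n in range(N):
            out[l * hop + n] += z[n] * x_tmp[n]  # window, then overlap-add
    return out

N = 16
signal = [math.sin(0.3 * i) + 0.1 * i for i in range(64)]
decoded = decode_granules(encode_granules(signal, N), N)
max_error = max(abs(decoded[i] - signal[i]) for i in range(N // 2, len(signal) - N // 2))
```

Samples covered by two overlapping granules are recovered to within rounding error because the time domain aliasing of adjacent granules cancels; the first and last half-blocks are covered only once and are therefore excluded from `max_error`.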
  • the PCM output signal produced by the windowing and overlap-add unit 70 is found to comprise a high quality audio signal although it is not fully conformant or bit-true with the MP3 standard.
  • some phase distortion and aliasing are present leading to relatively small spectral distortions and time-domain distortions in comparison with MP3 conformant signals.
  • these distortions or artefacts are found not to have a significant adverse effect on human perception of the audio signal.
  • Effectively, in decoder 60 the hybrid synthesis filterbank is replaced by a "long" phase distorted inverse MDCT with some spectral aliasing.
  • the computational complexity of the decoder 60 is significantly reduced.
  • a typically optimized conventional MP3 decoder requires approximately 22.11 multiplications and 26.73 additions per output sample.
  • a correspondingly optimized decoder 60 requires just 8 multiplications and 20.5 additions per output sample.
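The relative saving implied by these figures can be worked out directly. The sketch below uses only the four operation counts quoted above; the variable names are hypothetical.

```python
# Operations per output sample, as quoted in the text.
conventional = {"mul": 22.11, "add": 26.73}
long_imdct = {"mul": 8.0, "add": 20.5}

# Fractional reduction for each operation type.
savings = {op: 1.0 - long_imdct[op] / conventional[op] for op in conventional}

# Fractional reduction in the combined operation count.
total_saving = 1.0 - sum(long_imdct.values()) / sum(conventional.values())
```

On these figures the decoder 60 needs roughly 64% fewer multiplications and 23% fewer additions per output sample, or about 42% fewer arithmetic operations overall.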
  • the decoder 60 offers a higher decoding efficiency, leading to lower power consumption or lower DSP requirements.
  • a further aspect of the invention provides an apparatus for encoding an audio signal to produce an MPEG-1 layer III type signal or bitstream. It is noted that the bitstream is not a standard MP3 bitstream, although it is conformant with MP3 - the resulting decoded signals vary from the MP3 standard in phase response and aliasing. In essence, a "long" phase distorted MDCT is used to replace the analysis hybrid filterbank 12, 14 of the conventional encoder 10.
  • Figure 8 shows a simplified block diagram of an encoder 80 embodying this aspect of the invention.
  • the encoder 80 includes a windowing unit 82 which performs windowing on the received PCM input samples.
  • the windowing functions are similar to those described in equations [4], [5], [6] and [7] although the window lengths are different in accordance with the required MDCT transform size.
  • for normal, start or stop windows, an MDCT unit 84 performs a single "long" MDCT on all 1152 input samples of a received frame to produce 576 frequency lines.
  • for short windows, the MDCT unit 84 performs three "long" MDCTs, each on a respective one of 3 sets of 384 input samples, to produce a respective set of 192 frequency lines.
  • the encoder 80 may include a conventional MP3 quantization and coding unit 86 and bit allocation unit 88.
  • a negation unit 85 may be provided between the MDCT unit 84 and the quantization and coding unit 86 for negating alternate, i.e. every second, subbands. It will be understood that the role of the negation unit 66 in decoder 60 is to compensate for the inherent negation of alternate subbands that occurs in conventional MP3 encoders. Correspondingly, the role of the negation unit 85 in encoder 80 is to create the negation of alternate subbands that would normally occur in a conventional encoder 10. However, the negation of alternate subbands is not essential and so, in alternative embodiments, the negation units 66, 85 may be omitted.
  • the decoder 60 is capable not only of decoding standard conformant MPEG-1 layer III data but also non-standard MPEG-1 layer III type data as produced by, for example, encoder 80.
  • the invention is not limited to MPEG-1 layer III data signals or to MDCTs.
  • a decoder embodying the first aspect of the invention may be arranged to operate on encoded data signals produced by an encoder (including non-MPEG-1 layer III encoders) which provides unencoded data signals (especially but not necessarily audio signals) to a subband filterbank and subsequently causes a respective forward frequency transform to be performed on each resulting subband signal, i.e. a hybrid filterbank.
  • the subsequent quantizing and encoding need not necessarily be in accordance with MP3 so long as corresponding dequantizing and decoding is performed at the decoder.
  • the forward frequency transform need not necessarily comprise the MDCT so long as a compatible inverse frequency transform is employed by the decoder.
  • the term “granule” is primarily an MP3 term but a skilled person will readily understand that, in the context of non-MP3 embodiments, the term “granule” as used herein may be interpreted as any equivalent grouping of frequency lines or coefficients (commonly the term “frame” is equivalent to "granule”).
  • the subband filterbank and the frequency transform are critically sampled and the window functions overlap by 50% (hence the transform exhibits the Time Domain Aliasing Cancellation (TDAC) property) and, more preferably, real valued. It is also preferred, but not essential, that aliasing reduction is performed, e.g. by anti-aliasing butterflies, on the transformed subband signals at the encoder.
  • TDAC Time Domain Aliasing Cancellation
  • aliasing reduction is performed, e.g. by anti-aliasing butterflies, on the transformed subband signals at the encoder.

Abstract

One aspect of the invention provides a decoder for MPEG-1 layer III data signals. In the preferred embodiment, the decoder performs a single inverse MDCT on all 576 frequency lines of a respective granule for type 0, 1 and 3 MP3 window functions, and performs three inverse MDCTs on three sets of 192 frequency lines for type 2 window functions. It is found that the use of 'long' inverse MDCTs provides an adequate approximation of a hybrid filterbank which comprises a plurality of 'short' inverse MDCTs and a synthesis filterbank. As a result, an output signal may be constructed without the need for a filterbank. Another aspect of the invention provides an encoder for generating MPEG-1 layer III type data signals in which 'long' MDCTs are used to replace the hybrid filterbank. As a result, MPEG-1 layer III type data signals may be generated without the need for a filterbank.

Description

Audio signal coding
The present invention relates to the encoding and decoding of data signals. The invention relates particularly, but not exclusively, to apparatus for encoding and decoding MPEG-1 layer III data signals. MPEG-1 layer III (commonly known as MP3) is a widely used audio codec. The industry standard for MP3 is described in ISO/IEC JTC 1/SC29/WG11 MPEG, IS 11172- 3, Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s, Part 3: Audio, MPEG-1, 1992. This standard is available from the International Organization for Standardization (ISO) (www.iso.ch) and is hereby incorporated herein by way of reference. Figure 1 presents a simplified block diagram of a typical conventional MPEG-
1 layer III encoder 10, showing only those components of the encoder 10 that are helpful for an appreciation of the present invention. The encoder 10 is arranged to receive a PCM input signal comprising a series, or a frame, of 1152 audio samples. The input signal is supplied to a (polyphase) analysis filterbank 12 which filters the input signal into 32 uniformly spaced, overlapping frequency bands to produce 32 downsampled subband signal components, each comprising 36 subband samples. In respect of each subband signal component, a windowed (forward) MDCT (Modified Discrete Cosine Transform) is performed by MDCT unit 14. Four window types are used to accommodate variable time segmentation. For (quasi-) stationary parts of the signal the so-called normal windows can be used, while for non-stationary parts of the signal a sequence of so-called short windows can be used. Two transitory types of windows, the so- called start and stop windows, have been defined to prevent discontinuities when switching from normal to short windows and vice versa. For a normal, start or stop window, the MDCT is performed on 36 inputs (i.e. 36 subband samples) and produces 18 output MDCT coefficients, which are commonly referred to as frequency lines. For a short window, the MDCT is performed on three sets of 12 inputs (i.e. three sets of 12 subband samples) and produces three sets of 6 output MDCT coefficients, or frequency lines. A set of 576 MDCT coefficients is known as a granule. In respect of a common MP3 frame, which comprises 1152 input samples, two granules are produced as a result of the overlapping nature of the encoding process. In total, 18 x 32 = 576 MDCT coefficients, or frequency lines, are produced for each 576 input samples. In case of normal, start or stop windows, the MDCT frequency lines are provided to anti-aliasing butterflies 16 to reduce the effect of aliasing caused by downsampling the partially overlapping filters of the filterbank 12. 
Finally, a quantization and coding unit 18 performs appropriate quantization and coding of the frequency lines to produce an output signal in a prescribed bitstream format. The quantization and coding is performed under the control of a bit-allocation unit 20 which performs a bit-allocation algorithm, typically steered by a psycho-acoustic model. Figure 2 presents a simplified block diagram of a conventional MPEG-1 layer
III decoder 30, showing only those components that are helpful for an appreciation of the present invention. The decoder 30 is arranged to receive an input signal in the prescribed bitstream format. A decoding and dequantizing unit 32 performs decoding and dequantization of the bitstream to produce frequency lines, or MDCT coefficients. A respective 576 frequency lines are reproduced for each set of 576 MDCT frequency lines produced by the encoder 10. The frequency lines are provided to a re-ordering unit 34 which re-orders the frequency lines, in case of short type of windows, within each granule. In case of normal, start or stop windows, the frequency lines are provided to aliasing butterflies 36 which perform the inverse of the anti-aliasing operation performed by the anti-aliasing butterflies 16. An IMDCT unit 38 performs an IMDCT (Inverse Modified Discrete Cosine Transform) on the frequency lines to produce 32 subband signal components each comprising 36 subband samples. For those frequency lines corresponding to a normal, start or stop window MDCT, the IMDCT unit 38 takes as input 18 frequency lines and generates 36 subband samples. For those frequency lines corresponding to a short window MDCT, the IMDCT unit 38 takes as input 3 sets of 6 frequency lines and generates 3 sets of 12 subband samples. A windowing operation and standard overlapping and adding operations are performed on the subband samples by a windowing and overlap-add unit 40. Information on which type of window to use is carried in the associated side information of the bit stream. Finally, the subband samples are provided to a (polyphase) synthesis filterbank 42, which also comprises upsampling by a factor of 32, to produce an output signal comprising PCM samples. The filterbanks 12, 42 comprise a prototype low pass filter that is cosine modulated to form the higher frequency bands. 
The serial combination of a subband filterbank and an MDCT unit is known as a hybrid filterbank, because it partially consists of a filterbank and partially consists of a transform. In the encoder 10, the analysis filterbank 12 and the MDCT unit 14 together comprise a hybrid analysis filterbank while in the decoder 30, the IMDCT unit 38 and the synthesis filterbank 42 together comprise a hybrid synthesis filterbank. The use of hybrid filterbanks is a recognised weakness of MP3 in view of the computational, and therefore implementational, complexity it introduces. It would be desirable therefore to provide an MP3 encoder and/or decoder that is less computationally demanding.
Accordingly, a first aspect of the invention provides a decoder for data signals encoded by providing a data signal to a subband filterbank and by performing a respective forward frequency transform on each resulting subband signal, the decoder comprising means for decoding and dequantizing a received data signal to produce a plurality of granules of frequency lines; means for performing one or more inverse frequency transforms on each granule to produce a plurality of data samples; and means for applying one or more types of window functions to said data samples to produce a plurality of windowed data samples, wherein, in respect of at least a first type of window function, said inverse frequency transform means is arranged to perform a single inverse frequency transform on all frequency lines of a respective granule, and wherein the decoder further includes means for constructing an output signal from said windowed data samples.
A second aspect of the invention provides a method of decoding data signals encoded by providing a data signal to a subband filterbank and by performing a respective forward frequency transform on each resulting subband signal, the method comprising decoding and dequantizing a received data signal to produce a plurality of granules of frequency lines; performing one or more inverse frequency transforms on each granule to produce a plurality of data samples; applying one or more types of window functions to said data samples to produce a plurality of windowed data samples; and constructing an output signal from said windowed data samples, wherein, in respect of at least a first type of window function, a single inverse frequency transform is performed on all frequency lines within a respective granule. The first and second aspects of the invention each allow the output signal to be generated without the need for a filterbank. In preferred embodiments, the encoded data signals comprise MPEG-1 layer III data signals and the forward and inverse frequency transforms comprise the Modified Discrete Cosine Transform (MDCT) and the Inverse Modified Discrete Cosine Transform (IMDCT) respectively.
A third aspect of the invention provides an encoder for an input signal comprising a plurality of data samples, the encoder comprising means for applying one or more types of window functions to said data samples to produce a plurality of windowed data samples; means for performing one or more modified discrete cosine transforms (MDCTs) on the windowed data samples to produce a plurality of granules of frequency lines; and means for encoding and quantizing each granule to generate MPEG-1 layer III type data signals, wherein, in respect of at least a first type of window function, said MDCT means is arranged to perform a single MDCT on all windowed data samples of the received data signal in respect of which a respective granule is produced. A fourth aspect of the invention provides a method of encoding an input signal comprising a plurality of data samples, the method comprising applying one or more types of window functions to said data samples to produce a plurality of windowed data samples; performing one or more modified discrete cosine transforms (MDCTs) on the windowed data samples to produce a plurality of granules of frequency lines; encoding and quantizing each granule to generate MPEG-1 layer III type data signals, wherein, in respect of at least a first type of window function, a single MDCT is performed on all windowed data samples of the received data signal in respect of which a respective granule is produced. The third and fourth aspects of the invention allow MPEG-1 layer III type data signals to be generated without using a filterbank. A fifth aspect of the invention provides a system, or codec, for encoding and decoding data signals, the system comprising an encoder of the third aspect of the invention and a decoder of the first aspect of the invention. Preferred features of each aspect of the invention are recited in the dependent claims. 
Further advantageous aspects of the invention will become apparent to those ordinarily skilled in the art upon review of the following description of a specific embodiment and with reference to the accompanying drawings.
An embodiment of the invention is now described by way of example and with reference to the accompanying drawings in which:
Figure 1 is a block diagram of a conventional MPEG-1 layer III encoder;
Figure 2 is a block diagram of a conventional MPEG-1 layer III decoder;
Figure 3 is a graphical representation of MDCT coefficients coming from the MPEG-1 layer III hybrid analysis filterbank of a delta pulse;
Figure 4 is a graphical representation of the MDCT coefficients of Figure 3 after negation (multiplication by -1) of odd subbands;
Figure 5 shows the order of MDCT coefficients for short windows after re-ordering in a conventional MPEG-1 layer III decoder;
Figure 6 is a block diagram of a decoder for MPEG-1 layer III signals, the decoder embodying one aspect of the present invention;
Figure 7 shows the order of MDCT coefficients for short windows after re-ordering in the decoding apparatus of Figure 6; and
Figure 8 is a block diagram of an encoder for generating MPEG-1 layer III type signals embodying a third aspect of the invention.
In conventional MPEG-1 layer III (MP3) systems, a typical data frame comprises two granules of 576 frequency lines, or MDCT coefficients, each. As described above, in accordance with conventional MP3 encoding, the 576 frequency lines comprise a respective set of 18 frequency lines for each of the 32 subbands. When the short type of windows is used, each set of 18 frequency lines is comprised of 3 sets of 6 frequency lines. In the encoder 10 of Figure 1, the transformations are performed by the hybrid filterbank 12, 14. Depending on the required window type, the MDCT unit 14 performs one or more
MDCTs in respect of each subband. The MDCTs performed by MDCT unit 14 may be said to comprise "short" MDCTs in that each MDCT is performed on only a respective (relatively small) portion of the frame data at a time. For a normal, start or stop type of window, a single MDCT is performed on the 36 input samples of a subband to produce 18 frequency lines. For a short type of window, three MDCT transforms are performed each on a respective set of 12 input samples of a subband to produce a respective set of 6 frequency lines. Correspondingly, in the conventional MP3 decoder 30, the inverse MDCTs performed by IMDCT unit 38 may be said to comprise "short" inverse MDCTs since each inverse MDCT is performed on only a respective portion of the decoded and dequantized frequency lines produced in respect of the data frame. For normal, start or stop type of windows, a single inverse MDCT is performed on the 18 frequency lines of a subband to produce 36 time domain samples. For a short type of window, three inverse MDCT transforms are performed each on a respective set of 6 frequency lines of a subband to produce a respective set of 12 time domain samples. In contrast, in an embodiment of one aspect of the invention, a method of decoding MP3 data is provided wherein one or more "long" inverse MDCTs are performed on the decoded and dequantized frequency lines, or MDCT coefficients, produced in respect of a whole data granule. For a granule of 576 frequency lines, or MDCT coefficients, when a normal, start or stop type of window is required, a single "long" inverse MDCT is performed on all 576 frequency lines to produce 1152 time domain samples while, for a short type of window, three "long" inverse MDCTs are performed on a respective set of 192 frequency lines to produce a respective set of 384 time domain samples. 
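The transform sizes described above can be illustrated with a minimal Python sketch. It assumes that the inverse MDCT takes the standard cosine form used by MP3 with no scaling factor, and that a type 2 granule has already been arranged so that each short window occupies a contiguous run of 192 lines. The names `imdct` and `long_imdcts` are hypothetical, and the direct O(N²) summation is chosen for readability rather than speed.

```python
import math


def imdct(lines):
    """Direct inverse MDCT: len(lines) coefficients in, 2 * len(lines) samples out."""
    half = len(lines)          # N / 2 frequency lines
    n_len = 2 * half           # N time domain samples
    return [
        sum(c * math.cos(math.pi / (2 * n_len) * (2 * n + 1 + half) * (2 * k + 1))
            for k, c in enumerate(lines))
        for n in range(n_len)
    ]


def long_imdcts(granule, window_type):
    """Apply the "long" inverse MDCT(s) to one 576-line granule.

    Window types 0, 1 and 3: one transform, 576 lines -> 1152 samples.
    Window type 2: three transforms, 192 lines -> 384 samples each
    (assumes the three windows are stored one after another).
    """
    if len(granule) != 576:
        raise ValueError("a granule is expected to hold 576 frequency lines")
    if window_type == 2:
        return [imdct(granule[192 * p:192 * (p + 1)]) for p in range(3)]
    return [imdct(granule)]
```

A practical decoder would use a fast transform in place of the double loop; the sketch only makes the input and output sizes concrete.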
In either case, one or more inverse MDCTs are performed on all of the frequency lines of a granule as a whole rather than being performed on the respective frequency lines associated with respective subbands. It is found that, with some pre-processing of the frequency lines and with appropriate windowing and overlap-add operations, the outputs of the "long" inverse MDCTs may be used to provide a perceptually close approximation of the desired PCM output signal, thereby removing the need for a filterbank in the decoder. Similar principles may be applied during the encoding process thereby removing the need for a filterbank in the encoder. This is described in more detail below. In arriving at the present invention, the following observations are made: -an ideal filterbank consists of rectangular non-overlapping passbands. If the filterbanks as used in MP3 were ideal, the hybrid filterbank could be approximated quite accurately by a single "long" MDCT as described above. The combination of filterbank and anti-aliasing butterflies gives a relatively good approximation of an ideal filterbank. Hence, the hybrid filterbank in combination with anti-aliasing butterflies can be replaced by a single "long" MDCT. From these observations, it is concluded that the overall encoding and decoding processes, and more particularly the operation of the respective hybrid filterbanks, may be approximated by a cosine modulated transform. In particular, it is supposed that the overall encoding and decoding processes may be approximated by the MDCT. If this supposition is correct, i.e. if a hybrid filterbank can be approximated as an MDCT, then the response on a delta pulse would comprise a cosine waveform. An analytical expression for the (forward) MDCT is as follows:
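$$ c[k] = \sum_{n=0}^{N-1} x[n] \cos\left( \frac{\pi}{2N} \left( 2n + 1 + \frac{N}{2} \right) (2k + 1) \right), \quad k = 0 \ldots \frac{N}{2} - 1 \qquad [1] $$
where n is a time index which, for conventional MP3 encoders, denotes subband sample index; N is the transform length or size; k is a frequency index; x[n] is the time domain signal which, in conventional MP3 encoders, comprises the subband time domain signal comprised of the subband samples; and c[k] is the frequency domain MDCT spectrum. A delta pulse at position n' can be described (independent of windowing) as follows:
$$ x[n] = \begin{cases} 1, & n = n' \\ 0, & n \neq n' \end{cases} \qquad [2] $$
A substitution of [2] into [1] results in:
$$ c[k] = \cos\left( \frac{\pi}{2N} \left( 2n' + 1 + \frac{N}{2} \right) (2k + 1) \right) \qquad [3] $$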
Figure 3 illustrates the result of the hybrid analysis filterbank after antialiasing butterflies of a delta pulse graphically. It can be seen that the spectrum shown in Figure 3 is comprised of a cosine-type waveform with the waveform corresponding to odd, i.e. alternate or every second, subbands being negated (multiplied by -1). This is a characteristic shared with the output of a hybrid filterbank, which is known to comprise negated alternate subband components. Indeed, for every second subband of the synthesis filterbank 42 in the decoder 30, every second input value is negated (i.e. multiplied by -1) to compensate for the frequency inversion caused by the analysis filterbank 12 in the encoder 10. As a result the phase differences between adjacent subbands become approximately 180 degrees, i.e., multiplied by -1. This is described in more detail in the following paper: B. Edler, Aliasing reduction in sub-bands of cascaded filter banks with decimation, Electronics Letters, 4th June 1992, Vol. 28, No. 12. Figure 4 illustrates the spectrum of the hybrid filterbank after anti-aliasing butterflies of the delta pulse graphically after the negated subband components have been multiplied by -1 to compensate for the negation. After compensation, c[k] comprises a slightly distorted cosine function. The distortion that can be seen in Figure 4 is caused by the aliasing due to downsampling in the analysis filterbank which is only partially compensated by the anti-aliasing butterflies and by the fact that the analysis filter bank does not have an ideal linear phase characteristic. Hence, with some pre-processing of the MDCT coefficients, the operation of a hybrid filterbank may be approximated by a MDCT. As is described in more detail below, in the preferred embodiments one or more "long" MDCTs are used to replace the operation of the hybrid synthesis filterbank 38, 42 of the decoder 30. 
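The claim that a plain MDCT responds to a delta pulse with a cosine can be checked numerically. The sketch below assumes the standard MP3 form of the forward MDCT for equation [1]; the function names are hypothetical. It models only the ideal MDCT, not the real hybrid filterbank whose slightly distorted response is shown in Figures 3 and 4.

```python
import math


def mdct(x):
    """Forward MDCT of len(x) samples, giving len(x) // 2 coefficients."""
    n_len = len(x)
    half = n_len // 2
    return [
        sum(x[n] * math.cos(math.pi / (2 * n_len) * (2 * n + 1 + half) * (2 * k + 1))
            for n in range(n_len))
        for k in range(half)
    ]


def delta_pulse(n_len, pos):
    """Unit pulse at sample index pos, zero elsewhere."""
    return [1.0 if n == pos else 0.0 for n in range(n_len)]


def delta_spectrum(n_len, pos):
    """Closed form expected for the MDCT of a delta pulse: a sampled cosine in k."""
    half = n_len // 2
    return [
        math.cos(math.pi / (2 * n_len) * (2 * pos + 1 + half) * (2 * k + 1))
        for k in range(half)
    ]
```

Comparing `mdct(delta_pulse(N, pos))` with `delta_spectrum(N, pos)` for any even N gives agreement to rounding error, which is the cosine behaviour the text relies on.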
Equally, one or more "long" MDCTs may be used to replace the operation of the hybrid analysis filterbank 12, 14 of the encoder 10.
An apparatus for decoding MPEG-1 layer III data signals, and similarly encoded data signals, embodying one aspect of the present invention is shown in Figure 6, generally indicated as 60. Only those components that are necessary for understanding the present invention are shown. The decoding apparatus, or decoder 60, comprises a decoding and dequantization unit 62 arranged to receive a data signal in the form of an MPEG-1 layer III bitstream, or similarly encoded data signal. The decoding and dequantization unit 62 performs appropriate decoding (typically Huffman decoding as prescribed by MP3) and re-quantization of the received bitstream to recover a plurality of frequency lines, or MDCT coefficients. When the bitstream comprises MP3 conformant data, the decoding and dequantization unit 62 may perform standard MP3 decoding and re-quantization. Typically, for a frame comprising 1152 input audio samples, two granules of 576 frequency lines are recovered by the unit 62 (due to the overlap-add operation performed in the windowing, effectively 576 input samples deliver 576 MDCT coefficients and so the system is critically sampled). The decoder 60 includes a re-ordering unit 64 for re-ordering, as necessary, the frequency lines produced by the decoding and dequantization unit 62. The re-ordering reverses the re-ordering that is normally performed by an encoder. This is described in more detail below. The re-ordering unit 64 may determine what type of re-ordering is required from the side information associated with the respective frame. An inverse MDCT (IMDCT) unit 68 is provided for performing one or more inverse MDCTs on the re-ordered frequency lines.
As described above, the IMDCT unit 68 is arranged to operate on a whole granule of frequency lines at a time, performing either a single inverse MDCT on all frequency lines within the granule (when normal, start or stop type windows are required) or a plurality of inverse MDCTs on a corresponding number of subsets of all the frequency lines within the granule (when short type windows are required). For an MP3 bitstream where a granule comprises 576 frequency lines, the IMDCT unit 68 performs a single inverse MDCT on the whole granule for normal, start or stop windows, resulting in 1152 time domain samples, and three inverse MDCTs on a respective one of 3 sub-sets of 192 frequency lines for short windows, resulting in three respective sequences, or sets, of 384 time domain samples. The output of the IMDCT unit 68 comprises a plurality (1152 in the present example) of recovered signal components, or samples, which may be used to construct a PCM output signal. In order to construct the PCM output signal, windowing and overlap-add operations are performed on the signal samples produced by the IMDCT unit 68. Hence, the decoder 60 further includes a windowing and overlap-add unit 70, the operation of which is described in more detail below.
It is noted that the synthesis filterbank 42 of a conventional MP3 decoder 30 negates alternate subband signal components, or subband channels, to compensate for the frequency inversion of the analysis filterbank 12 of the encoder 10. Accordingly, in embodiments of the decoder 60 that are intended to decode standard MP3 conformant data, the decoder 60 includes a negation unit 66 for negating alternate subband signal components, or channels, i.e. multiplying the relevant MDCT coefficients by -1. The negation unit 66 is shown in Figure 6 between the re-ordering unit 64 and the IMDCT unit 68 but may alternatively be located elsewhere, for example between the decoding and dequantization unit 62 and the re-ordering unit 64.
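The role of the negation unit 66 can be sketched as a sign flip over the frequency lines of every second subband. The sketch assumes that the odd-indexed subbands (1, 3, 5, ...) are the ones negated, as suggested by the description of Figure 4, and that the lines of each subband are stored contiguously; the function name and its parameter are hypothetical.

```python
def negate_alternate_subbands(granule, lines_per_subband=18):
    """Return a copy of the granule with every second subband multiplied by -1.

    Assumes subband-major storage: line i belongs to subband
    (i // lines_per_subband) % 32, and odd subbands are negated.
    """
    return [
        -value if (index // lines_per_subband) % 2 == 1 else value
        for index, value in enumerate(granule)
    ]
```

With 18 lines per subband this covers granules coded with normal, start or stop windows. For a short-window granule that has already been arranged window by window, calling it with `lines_per_subband=6` selects the same subbands, because each block of 192 lines holds an even number (32) of 6-line groups.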
It is also noted that the analysis filterbank 12 has overlapping subbands. The effects of this are normally reduced by the anti-aliasing butterflies 16 that are normally included in the encoder 10. In order that the re-ordering unit 64 and windowing and overlap -add unit 70 may be better understood, conventional MP3 windowing is now described in more detail. Within MP3 four different window types (and accompanying lengths) are prescribed, namely 'normal', 'start', 'short' and 'stop'. A particular type of window, or sequence of different window types, is selected to suit the characteristics of the portion of the data to which the window(s) are to be applied. For example, short type windows are usually applied to data portions corresponding to transients in the audio signal. The side information associated with a given data frame indicates which window types are to be used with the granule. The required window type affects both the length, or size, of the MDCT (and therefore inverse MDCT) and the windowing/overlap-add operations. For MP3, the window functions z(n) may be described as follows: For a normal type of window (type 0):
$$ z(n) = \sin\left( \frac{\pi}{36} \left( n + \frac{1}{2} \right) \right), \quad n = 0 \ldots 35 \qquad [4] $$
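For a start type of window (type 1):
$$ z(n) = \begin{cases} \sin\left( \frac{\pi}{36} \left( n + \frac{1}{2} \right) \right), & n = 0 \ldots 17 \\ 1, & n = 18 \ldots 23 \\ \sin\left( \frac{\pi}{12} \left( n - 18 + \frac{1}{2} \right) \right), & n = 24 \ldots 29 \\ 0, & n = 30 \ldots 35 \end{cases} \qquad [5] $$
For short type of windows (type 2), three short windows are coded simultaneously:
$$ z_p(n) = \sin\left( \frac{\pi}{12} \left( n + \frac{1}{2} \right) \right), \quad n = 0 \ldots 11, \; p = 0, 1, 2 \qquad [6] $$
For a stop type of window (type 3):
$$ z(n) = \begin{cases} 0, & n = 0 \ldots 5 \\ \sin\left( \frac{\pi}{12} \left( n - 6 + \frac{1}{2} \right) \right), & n = 6 \ldots 11 \\ 1, & n = 12 \ldots 17 \\ \sin\left( \frac{\pi}{36} \left( n + \frac{1}{2} \right) \right), & n = 18 \ldots 35 \end{cases} \qquad [7] $$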
Each of the window functions in equations [4], [5], [6] and [7] are normally regarded as single window functions even though they may involve the application of more than one window. It will be seen from functions [4], [5], and [7] that the window length is 36 (i.e. a 36 point window) and hence index n runs from 0 to 35. For function [6], the combined length of the three short 12 point windows is 36 and hence n runs from 0 to 11 for p = 0 to 2. Thus, the overall length of each window type corresponds to the size of a subband signal component (36 subband samples). For type 2 windows, i.e., a sequence of short windows, in the encoder 10 each granule of 576 MDCT coefficients (32 subbands times 3 windows times 6 MDCT coefficients) are ordered to allow a more efficient encoding. Hence in a decoder, corresponding re-ordering takes place to reverse the re-ordering performed by the encoder. In the conventional MP3 decoder 30, the MDCT coefficients, or frequency lines, of a granule are re-ordered, in increasing granularity, according to frequency line, then window index and then sub-band. This is illustrated in Figure 5 which shows the structure of part of a granule 50 in which each frequency line, or MDCT coefficient, may be accorded a respective frequency line index from 0 to 575. At the highest, or coarsest, granularity, the frequency lines are ordered in accordance with a subband index which denotes to which subband they belong and runs from 0 to 31. Within each subband, the frequency lines are ordered in accordance with a window index which denotes which window is to be applied to the frequency lines and runs from 0 to 2. Within each window, the frequency lines are ordered in accordance with a frequency line sub-index which denotes the order in which the frequency lines are provided to the MDCT and runs from 0 to 5. Hence, the first frequency line in the granule 50 (i.e. 
the frequency line with the lowest frequency line index (= 0)) is the frequency line with sub-index 0, window index 0 and subband index 0, the second frequency line (frequency line index = 1) has sub-index 1, window index 0 and subband index 0, and so on until the final frequency line in granule 50, which has the highest frequency line index 575 and has sub-index 5, window index 2 and subband index 31.
In the decoder 60, the re-ordering unit 64 is arranged to re-order the frequency lines of a granule in a manner different to that described above for a conventional decoder. For "short" (type 2) windows, the re-ordering unit 64 re-orders the frequency lines, in increasing granularity, according to frequency line, then subband and finally window. This is illustrated in Figure 7 from which it may be seen that within a granule 50', the frequency lines are ordered at the highest level according to window index, then according to subband index and then according to frequency line sub-index.
The construction of the PCM output signal by the windowing and overlap-add unit 70 in conjunction with the IMDCT unit 68 is now described. It is assumed in the following example that the original PCM signal comprises frames of 1152 audio samples, each frame being transformed into two granules of 576 frequency lines (or MDCT coefficients). Hence, the IMDCT unit 68 operates on granules of 576 MDCT coefficients to produce a signal comprising 1152 samples which are then provided to the windowing and overlap-add unit 70. The l-th set, or granule, of MDCT coefficients is denoted as $X_l(k)$ where k = 0...575.
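The difference between the two short-window orderings (Figure 5 for the conventional decoder 30, Figure 7 for the decoder 60) can be made concrete with a small index permutation. The sketch below assumes the input granule is in the conventional order (subband, then window, then line) and produces the order described for the re-ordering unit 64 (window, then subband, then line); the function name is hypothetical.

```python
def reorder_short_granule(granule):
    """Rearrange a type 2 granule from (subband, window, line) order
    to (window, subband, line) order, so that each short window's
    192 lines become one contiguous block ordered by frequency.
    """
    if len(granule) != 576:
        raise ValueError("a granule is expected to hold 576 frequency lines")
    return [
        granule[18 * subband + 6 * window + line]
        for window in range(3)
        for subband in range(32)
        for line in range(6)
    ]
```

After this step, lines 0 to 191 belong to window 0, lines 192 to 383 to window 1 and lines 384 to 575 to window 2, which is the layout a three-way "long" inverse MDCT needs.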
The output signal produced by the windowing and overlap-add unit 70 after decoding the l-th set (l starting at 1) of MDCT coefficients is described (using overlap-add) as:
$$ y_l(n + 576(l - 1)) = y_{l-1}(n + 576(l - 1)) + x_l(n) \qquad [8] $$
where index n = 0...1151, $y_{l-1}(n)$ is the output signal after decoding the (l-1)-th set and $x_l(n)$ is the signal produced by the IMDCT unit 68 operating on the MDCT coefficients $X_l(k)$. The output signal $y_0(n)$ is initialised to zero for all n. The generation of the signal $x_l(n)$ is dependent on the specified window type. When the window type for the l-th granule is 0, 1, or 3, the IMDCT unit 68 performs an inverse MDCT on 576 input coefficients provided by $X_l(k)$ to produce a temporary signal $x_{tmp}(n)$ of 1152 points as described in equation [9]:
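$$ x_{tmp}(n) = \sum_{k=0}^{N/2-1} X_l(k) \cos\left( \frac{\pi}{2N} \left( 2n + 1 + \frac{N}{2} \right) (2k + 1) \right) \qquad [9] $$
with n = 0...N-1 and N = 1152. When the window type of the l-th set is 2 (i.e. a "short" window), the IMDCT unit 68 performs three inverse MDCTs on a respective set of 192 input coefficients each provided by $X_l(k)$ to produce three temporary signals denoted as $x_{tmp,0}(n)$, $x_{tmp,1}(n)$ and $x_{tmp,2}(n)$ of 384 points each as described in equation [10]:
$$ x_{tmp,p}(n) = \sum_{k=0}^{N/2-1} X_l\left( k + p \frac{N}{2} \right) \cos\left( \frac{\pi}{2N} \left( 2n + 1 + \frac{N}{2} \right) (2k + 1) \right) \qquad [10] $$
with p = 0...2, n = 0...N-1 and N = 384. It is the temporary signals $x_{tmp}(n)$, $x_{tmp,p}(n)$ that are effectively provided to the windowing and overlap-add unit 70. When the window type of the l-th set is 0, the windowing and overlap-add unit 70 calculates the signal $x_l(n)$ as:
$$ x_l(n) = \sin\left( \frac{\pi}{1152} \left( n + \frac{1}{2} \right) \right) x_{tmp}(n), \quad n = 0 \ldots 1151 \qquad [11] $$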
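where the divisor 1152 in [11] corresponds to the IMDCT transform length N. When the window type of the l-th set is 1, the windowing and overlap-add unit 70 calculates the signal $x_l(n)$ as:
$$ x_l(n) = \begin{cases} \sin\left( \frac{\pi}{1152} \left( n + \frac{1}{2} \right) \right) x_{tmp}(n), & n = 0 \ldots 575 \\ x_{tmp}(n), & n = 576 \ldots 767 \\ \sin\left( \frac{\pi}{384} \left( n - 576 + \frac{1}{2} \right) \right) x_{tmp}(n), & n = 768 \ldots 959 \\ 0, & n = 960 \ldots 1151 \end{cases} \qquad [12] $$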
where the divisor 1152 in [12] corresponds with the IMDCT transform length N, the divisor 384 corresponds with N/3 and 576 corresponds with N/2. When the window type of the l-th set is 2, the windowing and overlap-add unit 70 calculates the signal $x_l(n)$ by first calculating three temporary signals:
$$ x_{l,tmp,p}(n) = \sin\left( \frac{\pi}{384} \left( n + \frac{1}{2} \right) \right) x_{tmp,p}(n), \quad n = 0 \ldots 383, \; p = 0 \ldots 2 \qquad [13] $$
where the divisor 384 corresponds to the IMDCT transform length N. The signal $x_l(n)$ is then constructed as follows:
$$ x_l(n) = \begin{cases} 0, & n = 0 \ldots 191 \\ x_{l,tmp,0}(n - 192), & n = 192 \ldots 383 \\ x_{l,tmp,0}(n - 192) + x_{l,tmp,1}(n - 384), & n = 384 \ldots 575 \\ x_{l,tmp,1}(n - 384) + x_{l,tmp,2}(n - 576), & n = 576 \ldots 767 \\ x_{l,tmp,2}(n - 576), & n = 768 \ldots 959 \\ 0, & n = 960 \ldots 1151 \end{cases} \qquad [14] $$
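When the window type of the l-th set is 3, the windowing and overlap-add unit 70 calculates the signal $x_l(n)$ as:
$$ x_l(n) = \begin{cases} 0, & n = 0 \ldots 191 \\ \sin\left( \frac{\pi}{384} \left( n - 192 + \frac{1}{2} \right) \right) x_{tmp}(n), & n = 192 \ldots 383 \\ x_{tmp}(n), & n = 384 \ldots 575 \\ \sin\left( \frac{\pi}{1152} \left( n + \frac{1}{2} \right) \right) x_{tmp}(n), & n = 576 \ldots 1151 \end{cases} \qquad [15] $$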
where divisor 1152 corresponds with the IMDCT length N and divisor 384 corresponds with N/3.
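For a type 2 granule, the construction in equation [14] amounts to placing three windowed 384-sample signals into one 1152-sample frame at offsets 192, 384 and 576, adding them where they overlap. The following sketch assumes the three inputs have already been windowed as in equation [13]; the function name is hypothetical.

```python
def assemble_short_frame(windowed):
    """Combine three windowed 384-sample signals into one 1152-sample frame.

    Signal p is added starting at sample 192 + 192 * p, so neighbouring
    signals overlap by 192 samples and the first and last 192 samples
    of the frame stay zero.
    """
    if len(windowed) != 3 or any(len(s) != 384 for s in windowed):
        raise ValueError("expected three signals of 384 samples each")
    frame = [0.0] * 1152
    for p, signal in enumerate(windowed):
        offset = 192 + 192 * p
        for n, value in enumerate(signal):
            frame[offset + n] += value
    return frame
```

The resulting frame has the same length as the frame produced for window types 0, 1 and 3, so the same overlap-add step can follow regardless of window type.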
It will be seen that equations [11], [12], [13] and [15] are of the general type:
$$ x_l(n) = z(n) \, x_{tmp}(n) \qquad [16] $$
where $x_l(n)$ is the windowed signal, $x_{tmp}(n)$ is the unwindowed signal and $z(n)$ is the window function. It is noted that the window functions $z(n)$ of equations [11], [12], [13] and [15] are generally similar to the window functions $z(n)$ described in equations [4], [5], [6] and [7] respectively. However, the respective window lengths of the window functions $z(n)$ in equations [11], [12], [13] and [15] are longer in accordance with the respective transform length N and the respective divisors are correspondingly larger. The window functions $z(n)$ of equations [11], [12], [13] and [15] may be said to comprise up-sampled versions of the window functions $z(n)$ described in equations [4], [5], [6] and [7] respectively, the extent of the up-sampling depending on the respective transform length/window length, N. It will also be noted that the window functions of equations [11], [12], [13] and [15] each comprise a single window function even though its application may involve the application of more than one window. Moreover, the windowing and overlap-add unit 70 makes only one application of the specified window type, i.e. applies only one window function, to the samples of a whole granule. This is in contrast to the conventional decoder 30 in which a window function is applied in respect of each subband.
The PCM output signal produced by the windowing and overlap-add unit 70 is found to comprise a high quality audio signal although it is not fully conformant or bit-true with the MP3 standard. In particular, some phase distortion and aliasing are present leading to relatively small spectral distortions and time-domain distortions in comparison with MP3 conformant signals. However, these distortions or artefacts are found not to have a significant adverse effect on human perception of the audio signal.
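The windowing and overlap-add steps for normal (type 0) granules can be sketched end to end as follows. This is a minimal illustration, not the decoder's actual implementation: it assumes the standard MDCT cosine kernel, a sine window of the full transform length, and a scale factor of 4/N on the decoder side, which is an assumption chosen so that a round trip through an unscaled forward MDCT with the same window reconstructs the input. All function names are hypothetical, and the granule index is zero-based here whereas the text counts from 1.

```python
import math


def sine_window(n_len):
    """Sine window of length n_len, as used for normal (type 0) granules."""
    return [math.sin(math.pi / n_len * (n + 0.5)) for n in range(n_len)]


def mdct(x):
    """Forward MDCT, included only to exercise the round trip."""
    n_len, half = len(x), len(x) // 2
    return [sum(x[n] * math.cos(math.pi / (2 * n_len) * (2 * n + 1 + half) * (2 * k + 1))
                for n in range(n_len)) for k in range(half)]


def imdct(lines):
    """Inverse MDCT: len(lines) coefficients in, 2 * len(lines) samples out."""
    half = len(lines)
    n_len = 2 * half
    return [sum(c * math.cos(math.pi / (2 * n_len) * (2 * n + 1 + half) * (2 * k + 1))
                for k, c in enumerate(lines)) for n in range(n_len)]


def decode_normal_granules(granules):
    """Inverse transform, window and overlap-add a list of type 0 granules."""
    half = len(granules[0])            # 576 for an MP3 granule
    n_len = 2 * half                   # 1152 for an MP3 granule
    window = sine_window(n_len)
    scale = 4.0 / n_len                # assumed normalisation, see lead-in
    out = [0.0] * (half * (len(granules) + 1))
    for index, lines in enumerate(granules):
        tmp = imdct(lines)
        for n in range(n_len):
            # Each granule's frame starts half a frame after the previous one.
            out[half * index + n] += scale * window[n] * tmp[n]
    return out
```

Every output sample that is covered by two consecutive frames is reconstructed exactly in this idealised setting, because the time domain aliasing of neighbouring frames cancels; the first and last half frames are covered only once and are therefore not reconstructed.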
Effectively, in decoder 60 the hybrid synthesis filterbank is replaced by a "long" phase distorted inverse MDCT with some spectral aliasing. By eliminating the need for many "short" MDCTs and the synthesis filterbank, the computational complexity of the decoder 60 is significantly reduced. By way of example, a typical optimized conventional MP3 decoder requires approximately 22.11 multiplications and 26.73 additions per output sample. A correspondingly optimized decoder 60 requires just 8 multiplications and 20.5 additions per output sample. As a result, the decoder 60 offers a higher decoding efficiency, leading to lower power consumption or lower DSP requirements. The complexity of the decoder 60 is further reduced in that aliasing butterflies are not required (since their presence in the decoder 30 is to help the synthesis filterbank 42 to reconstruct the PCM output signal).
The principles of the invention described above may equally be applied to an MP3-type encoder. Thus, a further aspect of the invention provides an apparatus for encoding an audio signal to produce an MPEG-1 layer III type signal or bitstream. It is noted that the bitstream is not a standard MP3 bitstream, although it is conformant with MP3 - the resulting decoded signals vary from the MP3 standard in phase response and aliasing. In essence, a "long" phase distorted MDCT is used to replace the analysis hybrid filterbank 12, 14 of the conventional encoder 10. Figure 8 shows a simplified block diagram of an encoder 80 embodying this aspect of the invention. The encoder 80 includes a windowing unit 82 which performs windowing on the received PCM input samples. The windowing functions are similar to those described in equations [4], [5], [6] and [7] although the window lengths are different in accordance with the required MDCT transform size.
For normal, start or stop type of windows, an MDCT unit 84 performs a "long" MDCT on all 1152 input samples of a received frame to produce 576 frequency lines. For short windows, the MDCT unit 84 performs three "long" MDCTs, each on a respective one of 3 sets of 384 input samples, to produce a respective set of 192 frequency lines. The encoder 80 may include a conventional MP3 quantization and coding unit 86 and bit allocation unit 88. A negation unit 85 may be provided between the MDCT unit 84 and the quantization and coding unit 86 for negating alternate, i.e. every second, subbands. It will be understood that the role of the negation unit 66 in decoder 60 is to compensate for the inherent negation of alternate subbands that occurs in conventional MP3 encoders. Correspondingly, the role of the negation unit 85 in encoder 80 is to create the negation of alternate subbands that would normally occur in a conventional encoder 10. However, the negation of alternate subbands is not essential and so, in alternative embodiments, the negation units 66, 85 may be omitted.
It will be understood that the decoder 60 is capable not only of decoding standard conformant MPEG-1 layer III data but also non-standard MPEG-1 layer III type data as produced by, for example, encoder 80. The invention is not limited to MPEG-1 layer III data signals or to MDCTs. For example, a decoder embodying the first aspect of the invention may be arranged to operate on encoded data signals produced by an encoder (including non-MPEG-1 layer III encoders) which provides unencoded data signals (especially but not necessarily audio signals) to a subband filterbank and subsequently causes a respective forward frequency transform to be performed on each resulting subband signal, i.e. a hybrid filterbank. The subsequent quantizing and encoding need not necessarily be in accordance with MP3 so long as corresponding dequantizing and decoding is performed at the decoder.
Similarly, the forward frequency transform need not necessarily comprise the MDCT so long as a compatible inverse frequency transform is employed by the decoder. In this connection, it is noted that the term "granule" is primarily an MP3 term but a skilled person will readily understand that, in the context of non-MP3 embodiments, the term "granule" as used herein may be interpreted as any equivalent grouping of frequency lines or coefficients (commonly the term "frame" is equivalent to "granule"). It is preferred, but not essential, that the subband filterbank and the frequency transform are critically sampled and the window functions overlap by 50% (hence the transform exhibits the Time Domain Aliasing Cancellation (TDAC) property) and, more preferably, real valued. It is also preferred, but not essential, that aliasing reduction is performed, e.g. by anti-aliasing butterflies, on the transformed subband signals at the encoder.
The foregoing description relates to monaural signals although the invention may readily be applied to stereo or multi-channel encoding and decoding by processing each respective channel in the manner described above. Encoders and decoders embodying the invention may be implemented in any convenient manner using, for example, computer program code, hardware or a combination of each. The invention is not limited to the embodiments described herein which may be modified or varied without departing from the scope of the invention.
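The transform sizes, the negation of alternate subbands and the TDAC property mentioned above can be illustrated with the following sketch. It is not the "phase distorted" transform of the specification: it uses a textbook MDCT/IMDCT pair with an assumed sine window and an assumed 4/N inverse scaling, chosen so that 50% overlapped blocks reconstruct the input. The function names, the choice of negating the odd-indexed subbands and the default of 18 lines per subband (576 lines across 32 subbands) are illustrative assumptions.

```python
import math


def mdct(block):
    """Textbook forward MDCT: N windowed samples -> N/2 frequency lines."""
    n_len = len(block)
    return [
        sum(block[n] * math.cos(2.0 * math.pi / n_len
                                * (n + 0.5 + n_len / 4.0) * (k + 0.5))
            for n in range(n_len))
        for k in range(n_len // 2)
    ]


def imdct(lines):
    """Textbook inverse MDCT: N/2 lines -> N samples (assumed 4/N scaling)."""
    n_len = 2 * len(lines)
    return [
        4.0 / n_len * sum(lines[k] * math.cos(2.0 * math.pi / n_len
                                              * (n + 0.5 + n_len / 4.0) * (k + 0.5))
                          for k in range(len(lines)))
        for n in range(n_len)
    ]


def negate_alternate_subbands(lines, lines_per_subband=18):
    """Negate the lines of every second subband (odd-indexed ones, by assumption)."""
    return [-v if (i // lines_per_subband) % 2 == 1 else v
            for i, v in enumerate(lines)]


def codec_round_trip(signal, n_len, lines_per_subband=18):
    """Encode then decode `signal` using 50% overlapped blocks of n_len samples.
    len(signal) must be a multiple of n_len // 2."""
    half = n_len // 2
    z = [math.sin(math.pi / n_len * (n + 0.5)) for n in range(n_len)]
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_len + 1, half):
        # Encoder side: window, one long MDCT, negate alternate subbands.
        windowed = [z[n] * padded[start + n] for n in range(n_len)]
        lines = negate_alternate_subbands(mdct(windowed), lines_per_subband)
        # Decoder side: undo the negation, one long IMDCT, window, overlap-add.
        samples = imdct(negate_alternate_subbands(lines, lines_per_subband))
        for n in range(n_len):
            out[start + n] += z[n] * samples[n]
    return out[half:half + len(signal)]


if __name__ == "__main__":
    print(len(mdct([0.0] * 1152)), len(mdct([0.0] * 384)))
```

Called on 1152 samples, `mdct` returns 576 frequency lines, and on each set of 384 samples it returns 192 lines, matching the long and short cases described for the MDCT unit 84. Because the negation is its own inverse, applying it at the encoder and again at the decoder leaves the frequency lines unchanged, which is consistent with the statement that both negation units may be omitted together.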

Claims

CLAIMS:
1. A decoder for data signals encoded by providing a data signal to a subband filterbank and by performing a respective forward frequency transform on each resulting subband signal, the decoder comprising means for decoding and dequantizing a received data signal to produce a plurality of granules of frequency lines; means for performing one or more inverse frequency transforms on each granule to produce a plurality of data samples; and means for applying one or more types of window functions to said data samples to produce a plurality of windowed data samples, wherein, in respect of at least a first type of window function, said inverse frequency transform means is arranged to perform a single inverse frequency transform on all frequency lines of a respective granule, and wherein the decoder further includes means for constructing an output signal from said windowed data samples.
2. A decoder as claimed in Claim 1, wherein said subband filterbank comprises a critically sampled subband filterbank.
3. A decoder as claimed in Claim 1 or 2, wherein the encoding of said data signals further involves performing aliasing reduction on said subband signals.
4. A decoder as claimed in any preceding claim, wherein said forward frequency transform comprises a critically sampled transform.
5. A decoder as claimed in Claim 4, wherein said window functions overlap with respect to the data samples by 50%.
6. A decoder as claimed in Claim 5, wherein said output signal constructing means applies one or more overlap-add operations to said windowed data signals to produce said output signal.
7. A decoder as claimed in any preceding claim, wherein said forward frequency transform comprises the Modified Discrete Cosine Transform (MDCT) and the inverse frequency transform comprises the Inverse Modified Discrete Cosine Transform (IMDCT).
8. A decoder as claimed in any preceding claim, wherein, in respect of at least said first type of window function, said window function application means is arranged to apply a single window function to all data samples produced in respect of a respective granule.
9. A decoder as claimed in any preceding claim, wherein said at least first type of window function includes length adjusted versions of MPEG-1 layer III type 0, type 1 and type 3 window functions.
10. A decoder as claimed in any preceding claim wherein in respect of at least a second type of window function, said inverse frequency transform means is arranged to perform a respective inverse frequency transform on a respective set of frequency lines of a granule, all of the frequency lines of said granule belonging to one or other of said sets.
11. A decoder as claimed in Claim 10, wherein, in respect of at least said second type of window function, said window function application means is arranged to apply a single window function to all data samples produced in respect of a respective set of frequency lines.
12. A decoder as claimed in Claim 10 or 11, wherein said at least second type of window function includes a length adjusted version of the MPEG-1 layer III type 2 window function, and the frequency lines of said granule belong to one or other of three sets.
13. A decoder as claimed in any preceding claim, wherein each frequency line within a granule is associated with a respective one of a plurality of frequency subbands, the decoder further including means for re-ordering the frequency lines within a granule when said at least second type of window function is to be applied, the re-ordering means being arranged to re-order the frequency lines according to, in decreasing granularity, which set they belong, which frequency subband they are associated with, and then frequency line order.
14. A decoder as claimed in any preceding claim, further including means for negating the frequency lines associated with alternate frequency subbands.
15. A decoder as claimed in Claim 14 when dependent on Claim 13, wherein said negating means is provided between said re-ordering means and said inverse frequency transform means.
16. A method of decoding data signals encoded by providing a data signal to a subband filterbank and by performing a respective forward frequency transform on each resulting subband signal, the method comprising decoding and dequantizing a received data signal to produce a plurality of granules of frequency lines; performing one or more inverse frequency transforms on each granule to produce a plurality of data samples; applying one or more types of window functions to said data samples to produce a plurality of windowed data samples; and constructing an output signal from said windowed data samples, wherein, in respect of at least a first type of window function, a single inverse frequency transform is performed on all frequency lines within a respective granule.
17. An encoder for an input signal comprising a plurality of data samples, the encoder comprising means for applying one or more types of window functions to said data samples to produce a plurality of windowed data samples; means for performing one or more modified discrete cosine transforms (MDCTs) on the windowed data samples to produce a plurality of granules of frequency lines; and means for encoding and quantizing each granule to generate MPEG-1 layer III type data signals, wherein, in respect of at least a first type of window function, said MDCT means is arranged to perform a single MDCT on all windowed data samples of the received data signal in respect of which a respective granule is produced.
18. An encoder as claimed in Claim 17, wherein, in respect of at least said first type of window function, said window function application means is arranged to apply a single window function to all windowed data samples of a respective of the received data signal in respect of which a respective granule is produced.
19. An encoder as claimed in Claim 17 or 18, wherein said at least first type of window function includes a length adjusted version of MPEG-1 layer III type 0, type 1 and type 3 window functions.
20. An encoder as claimed in any of Claims 17 to 19, wherein in respect of at least a second type of window function, said MDCT means is arranged to perform a respective MDCT on a respective set of windowed data samples in respect of which a respective granule is produced, all of the windowed data samples in respect of which a respective granule is produced belonging to one or other of said sets.
21. An encoder as claimed in Claim 20, wherein, in respect of at least said second type of window function, said window function application means is arranged to apply a single window function to all windowed data samples of a respective set.
22. An encoder as claimed in Claim 20 or 21, wherein said at least second type of window function includes a length adjusted version of the MPEG-1 layer III type 2 window function, and the windowed data samples in respect of which a respective granule is produced belong to one or other of three sets.
23. An encoder as claimed in any of Claims 17 to 22, wherein each frequency line within a granule is associated with a respective one of a plurality of frequency subbands, the encoder further including means for negating the frequency lines associated with alternate frequency subbands.
24. A method of encoding an input signal comprising a plurality of data samples, the method comprising applying one or more types of window functions to said data samples to produce a plurality of windowed data samples; performing one or more modified discrete cosine transforms (MDCTs) on the windowed data samples to produce a plurality of granules of frequency lines; encoding and quantizing each granule to generate MPEG-1 layer III type data signals, wherein, in respect of at least a first type of window function, a single MDCT is performed on all windowed data samples of the received data signal in respect of which a respective granule is produced.
25. A system for encoding and decoding data signals, the system comprising an encoder as claimed in Claim 17 and a decoder as claimed in Claim 1.
EP04799284A 2003-12-04 2004-11-30 Audio signal coding Withdrawn EP1692686A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP04799284A EP1692686A1 (en) 2003-12-04 2004-11-30 Audio signal coding

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP03104535 2003-12-04
EP04799284A EP1692686A1 (en) 2003-12-04 2004-11-30 Audio signal coding
PCT/IB2004/052602 WO2005055203A1 (en) 2003-12-04 2004-11-30 Audio signal coding

Publications (1)

Publication Number Publication Date
EP1692686A1 (en) 2006-08-23

Family

ID=34639327

Family Applications (1)

Application Number Title Priority Date Filing Date
EP04799284A Withdrawn EP1692686A1 (en) 2003-12-04 2004-11-30 Audio signal coding

Country Status (5)

Country Link
EP (1) EP1692686A1 (en)
JP (1) JP2007515672A (en)
KR (1) KR20060131767A (en)
CN (1) CN1890712A (en)
WO (1) WO2005055203A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102243872A (en) * 2010-05-10 2011-11-16 炬力集成电路设计有限公司 Method and system for encoding and decoding digital audio signals
EP2862168B1 (en) 2012-06-14 2017-08-09 Dolby International AB Smooth configuration switching for multichannel audio
EP3067886A1 (en) 2015-03-09 2016-09-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal
JP7385531B2 (en) * 2020-06-17 2023-11-22 Toa株式会社 Acoustic communication system, acoustic transmitting device, acoustic receiving device, program and acoustic signal transmitting method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002101726A1 (en) * 2001-06-08 2002-12-19 Stmicroelectronics Asia Pacific Pte Ltd Unified filter bank for audio coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2005055203A1 *

Also Published As

Publication number Publication date
CN1890712A (en) 2007-01-03
JP2007515672A (en) 2007-06-14
WO2005055203A1 (en) 2005-06-16
KR20060131767A (en) 2006-12-20

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20060704

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LU MC NL PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
17Q First examination report despatched

Effective date: 20070420

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20070831