EP1228507B1 - A method of reducing memory requirements in an ac-3 audio encoder - Google Patents

Info

Publication number
EP1228507B1
EP1228507B1 EP99954578A EP99954578A EP1228507B1 EP 1228507 B1 EP1228507 B1 EP 1228507B1 EP 99954578 A EP99954578 A EP 99954578A EP 99954578 A EP99954578 A EP 99954578A EP 1228507 B1 EP1228507 B1 EP 1228507B1
Authority
EP
European Patent Office
Prior art keywords
channel
psd
bit
memory
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP99954578A
Other languages
German (de)
French (fr)
Other versions
EP1228507A1 (en)
Inventor
Mohammed Javed Absar
Sapna George
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
STMicroelectronics Asia Pacific Pte Ltd
Original Assignee
STMicroelectronics Asia Pacific Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by STMicroelectronics Asia Pacific Pte Ltd
Publication of EP1228507A1
Application granted
Publication of EP1228507B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture

Definitions

  • the present invention relates to a method of reducing memory requirements in an encoder particularly, but not exclusively, an AC-3 encoder.
  • AC-3 is a transform-based audio coding algorithm designed to provide data-rate reduction for wide-band signals while maintaining the high quality of the original content.
  • AC-3 soundtrack can be found on the latest generation of laser disc, can be found as the standard audio track on Digital Versatile Discs (DVD), is the standard audio format for High Definition Television (HDTV), and is being used for digital cable and satellite transmissions.
  • Chip cost is dictated by parameters such as chip-area, memory and the more recently popularised low power consumption.
  • Decreasing memory requirements for a complex algorithm such as an AC-3 encoder means that the algorithm must be deeply analysed and re-structured such that the quality of encoding is preserved but the quantity of storage area is decreased.
  • U.S. 5,687,282 discloses a method for reducing memory requirements for an encoder, by successively overwriting various functions used for encoding.
  • the present invention seeks to provide a method, particularly for an AC-3 encoder, which further reduces memory requirements.
  • a method of reducing memory requirements for an encoder which includes a function of bit allocation for quantising frequency coefficients of an input signal, including:
  • the method includes allocation of bits for coding mantissa values of the frequency coefficients, the bit allocation being determined on the basis of the PSD, SSMC and said variable, with bit allocation pointers being generated and stored for a current data block only.
  • bit allocation is followed by quantisation of the mantissa values according to the bit allocation pointers and packing of the quantised values into a data frame, the quantised values being stored in memory in place of the original unquantised coefficients.
  • the frequency coefficients are initially separated into exponent and mantissa components and wherein the exponent components are overwritten with the PSD.
  • AC-3 is fundamentally an adaptive transform-based coder using a frequency-linear, critically sampled filterbank based on the Princen Bradley Time Domain Aliasing Cancellation (TDAC) technique.
  • AC-3 is a frame based encoder. Each frame contains information equivalent to 256x6 PCM (pulse code modulated) samples per channel. For coding convenience the frame is divided into six audio blocks, each block therefore containing information of 256 samples per channel.
  • Transients are detected in the full-bandwidth channels in order to decide when to switch to short length audio blocks for restricting quantization noise associated with the transient within a small temporal region about the transient.
  • High-pass filtered versions of the signals are examined for an increase in energy from one sub-block time segment to the next.
  • Sub-blocks are examined at different time scales. If a transient is detected in the second half of an audio block in a channel, that channel switches to a short block. In presence of transient the bit 'blksw' for the channel in the encoded bit stream in the particular audio block is set.
  • the transient detector operates on 512 samples for every audio block.
  • AC-3 Encoder uses overlap technique, so although each block contains 256 samples only, when the block is presented for transient detection (or frequency-transformation) the previous block is prefixed to it, which produces a total of 512 samples.
  • Each channel's time domain input signal is windowed and filtered with a TDAC-based analysis filter bank to generate frequency domain coefficients. If the blksw bit is set, meaning that a transient was detected for the block, two short transforms of length 256 each are taken, which increases the temporal resolution of the signal. If not set, a single long transform of length 512 is taken, thereby providing a high spectral resolution.
  • the output frequency coefficient $X_k$ is defined by a transform equation (equation image not reproduced here), where $x[n]$ is the windowed input sequence for a channel and $N$ is the transform length. Instead of evaluating $X_k$ in that form, it can be computed in a computationally efficient manner using an equivalent expression (equation image not reproduced here). The symbol $j$ represents the imaginary unit $\sqrt{-1}$.
  • High compression can be achieved in AC-3 by use of a technique known as coupling.
  • Coupling takes advantage of the way the human ear determines directionality for very high frequency signals.
  • the encoder combines the high frequency coefficients of the individual channels to form a common coupling channel.
  • the original channels combined to form the coupling channel are called the coupled channels.
  • the most basic encoder can form the coupling channel by simply taking the average of all the individual channel coefficients.
  • a more sophisticated encoder could alter the signs of the individual channels before adding them into the sum to avoid phase cancellation.
  • the generated coupling channel is next sectioned into a number of bands. For each such band and each coupled channel a coupling co-ordinate is transmitted to the decoder. To obtain the high frequency coefficients in any band, for a particular coupled channel, from the coupling channel, the decoder multiplies the coupling channel coefficients in that frequency band by the coupling co-ordinate of that channel for that particular frequency band. For a dual channel encoder, phase correction information is also sent for each frequency band of the coupling channel. Assume that the frequency domain coefficients are identified as:
  • An additional process, rematrixing, is invoked in the special case that the encoder is processing two channels only.
  • the sum and difference of the two signals from each channel are calculated on a band by band basis, and if, in a given band, the level disparity between the derived (matrixed) signal pair is greater than the corresponding level of the original signal, the matrix pair is chosen instead.
  • More bits are provided in the bit stream to indicate this condition, in response to which the decoder performs a complementary unmatrixing operation to restore the original signals.
  • the rematrix bits are omitted if there are more than two coded channels.
  • This technique avoids directional unmasking if the decoded signals are subsequently processed by a matrix surround processor, such as a Dolby Pro Logic decoder.
  • rematrixing is performed independently in separate frequency bands. There are four bands, with boundary locations dependent on the coupling information. The boundary locations are given by coefficient bin number, and the corresponding rematrixing band frequency boundaries change with sampling frequency.
  • the coefficient values, which may have undergone the rematrixing and coupling processes, are converted to a specific floating point representation, resulting in separate arrays of exponents and mantissas. This floating point arrangement is maintained throughout the remainder of the coding process, until just prior to the decoder's inverse transform, and provides 144 dB of dynamic range, as well as allowing AC-3 to be implemented on either fixed or floating point hardware.
  • Coded audio information consists essentially of separate representations of the exponent and mantissa arrays. The remaining coding process focuses individually on reducing the exponent and mantissa data rates.
  • the exponents are coded using one of the exponent coding strategies.
  • Each mantissa is truncated to a fixed number of binary places.
  • the number of bits to be used for coding each mantissa is to be obtained from a bit allocation algorithm which is based on the masking property of the human auditory system.
  • Exponent values in AC-3 are allowed to range from 0 to -24.
  • the exponent acts as a scale factor for each mantissa.
  • Exponents for coefficients which have more than 24 leading zeros are fixed at -24 and the corresponding mantissas are allowed to have leading zeros.
  • AC-3 bit stream contains exponents for independent, coupled and the coupling channels. Exponent information may be shared across blocks within a frame, so blocks 1 through 5 may reuse exponents from previous blocks.
  • AC-3 exponent transmission employs differential coding technique, in which the exponents for a channel are differentially coded across frequency.
  • the first exponent is always sent as an absolute value.
  • the value indicates the number of leading zeros of the first transform coefficient.
  • Successive exponents are sent as differential values which must be added to the prior exponent value to form the next actual exponent value.
  • the differential encoded exponents are next combined into groups.
  • the grouping is done by one of three methods: D15, D25 and D45. These together with 'reuse' are referred to as exponent strategies.
  • the number of exponents in each group depends only on the exponent strategy.
  • In D15, each group is formed from three exponents.
  • In D45, four exponents are represented by one differential value.
  • Three consecutive such representative differential values are grouped together to form one group.
  • Each group always comprises 7 bits.
  • If the strategy is 'reuse' for a channel in a block, then no exponents are sent for that channel and the decoder reuses the exponents last sent for this channel.
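  • The differential coding and 7-bit grouping described above can be sketched as follows. This is a minimal illustration for the D15 strategy only, not the patent's implementation. It assumes the mapping used by the ATSC A/52 standard, in which each differential lies in -2..+2, is offset by 2, and three mapped values are packed as 25*M1 + 5*M2 + M3. The function names are hypothetical.

```python
def d15_encode(exps):
    """Return (absolute first exponent, list of 7-bit group codes)."""
    absolute = exps[0]
    deltas = [b - a for a, b in zip(exps, exps[1:])]
    if any(abs(d) > 2 for d in deltas):
        raise ValueError("differentials must lie in -2..+2")
    if len(deltas) % 3:
        raise ValueError("D15 needs a multiple of three differentials")
    mapped = [d + 2 for d in deltas]          # shift -2..+2 to 0..4
    groups = [25 * m1 + 5 * m2 + m3           # at most 124, so it fits in 7 bits
              for m1, m2, m3 in zip(mapped[0::3], mapped[1::3], mapped[2::3])]
    return absolute, groups


def d15_decode(absolute, groups):
    """Rebuild the exponent sequence by accumulating the differentials."""
    exps = [absolute]
    for g in groups:
        for m in (g // 25, (g // 5) % 5, g % 5):
            exps.append(exps[-1] + m - 2)
    return exps


first, codes = d15_encode([5, 6, 6, 4, 5, 7, 7])
print(first, codes)
```

  • Because the largest group code is 124, every group fits in 7 bits regardless of the exponent values, which is why the storage needed for grouped exponents can be bounded in advance.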
  • Pre-processing of exponents prior to coding can lead to better audio quality.
  • Choice of the suitable strategy for exponent coding forms a crucial aspect of AC-3.
  • D15 provides the highest accuracy but is low in compression.
  • transmitting only one exponent set for a channel in the frame (in the first audio block of the frame) and attempting to 'reuse' the same exponents for the next five audio blocks can lead to high exponent compression but sometimes also to very audible distortion.
  • the bit allocation algorithm analyses the spectral envelope of the audio signal being coded, with respect to masking effects, to determine the number of bits to assign to each transform coefficient mantissa.
  • the bit allocation is recommended to be performed globally on the ensemble of channels as an entity, from a common bit pool.
  • the bit allocation routine contains a parametric model of the human hearing for estimating a noise level threshold, expressed as a function of frequency, which separates audible from inaudible spectral components.
  • Various parameters of the hearing model can be adjusted by the encoder depending upon the signal characteristic.
  • the number of bits available for packing mantissas, in an AC-3 frame is dependent firstly, of course, on the frame-size and, secondly, on the number of bits consumed by other fields - exponents, coupling parameters etc.
  • a significant part of the bit-allocation process is the optimisation of the bit allocation to mantissas such that, under masking considerations, the sum total of all bits consumed by mantissas equals (or is very close to) the available bits. This optimisation is performed by what is known as a Binary-Convergence Algorithm.
  • Floating point arithmetic usually uses IEEE 754 single precision (32 bits: a 23-bit stored mantissa giving 24 bits of precision with the implicit leading bit, an 8-bit exponent and 1 sign bit), which is adequate for high quality AC-3 encoding.
  • Work-stations like Sun SPARCstation 20 can provide much higher precision (e.g. double is 8 bytes).
  • floating point units require more chip area and consequently most DSP Processors use fixed point arithmetic.
  • the AC-3 Encoder is often intended to be a part of a consumer product e.g. DVD (Digital Versatile Disk) where cost (chip area) is an important factor.
  • the AC-3 Encoder has been implemented on 24-bit processors like the Motorola 56000 and has met with much commercial success.
  • Although an AC-3 encoder on a 16-bit processor is universally assumed to be of low quality, no adequate study has yet been published that benchmarks its quality or compares it with the floating point version.
  • Using double precision (32-bit) arithmetic to implement the encoder on a 16-bit processor can lead to high quality (even higher than that of the 24-bit version).
  • double precision arithmetic is very computationally expensive (e.g. on D950 single precision multiplication takes 1 cycle while double precision requires 6 cycles). Rather than allowing single or double precision throughout the whole cycle of processing, different precision calculations may be made for different stages of computation.
  • D950 contains two data-memory spaces called X-Memory and Y-Memory, from which load/store operations can be performed concurrently in a single cycle.
  • Although data memory in DSPs is usually flat (unsegmented), for indexing and logistics purposes this implementation views memory as chunks of 512 words. The choice of 512 is natural since each block contains 256 words of PCM per channel, which for stereo adds up to 512.
  • Segments in X-Memory are labelled as X00, X01 etc.
  • Y-Memory segments are labelled as Y00, Y01 etc. Consecutive segments are assumed to be adjacent to each other, e.g. if the starting address of X04 is 1500, then the starting address of X05 will be 2012 (1500 + 512).
  • segments X07-X12 are written with the input PCM data of six blocks.
  • AC-3 uses an overlap method for frequency transformation, whereby each block requires data from the previous block to generate coefficients for the current block.
  • The PCM input from the last block of the previous frame is therefore combined with the first block of the current frame.
  • The last block of the previous frame is stored in X13 and, upon start of processing for the current frame, it is copied to X06, so that X06-X12 presents a continuous run of six blocks, each of 512 samples (with overlap), as illustrated in Figure 2.
  • Transient detection for each block requires 512 inputs. As explained earlier, each block combined with data from previous one is presented for transient detection and frequency transformation.
  • the filtering operation does not alter the input but generates an equal amount of high-pass filtered information. This is analysed by the transient detector to generate transient information. Filtering and transient detection require a working buffer of 1024 words, X14-X15, as illustrated in Figure 2.
  • Frequency transformation using the Time-Domain-Aliasing-Cancellation method produces 256 32-bit coefficients. These coefficients are transferred from X14 to the appropriate location in the address space X00-X11, so that a particular coefficient X[blk_no][ch][bin] can be addressed conveniently as X00[blk_no*1024+ch*512+bin*2].
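  • The addressing formula above can be checked with a short sketch. It assumes 16-bit memory words (so each 32-bit coefficient takes two words), a stereo encoder, and 512-word segments named X00, X01 and so on. The helper names are hypothetical.

```python
SEGMENT_WORDS = 512      # one memory segment, as viewed by this implementation
WORDS_PER_COEFF = 2      # assumption: a 32-bit coefficient spans two 16-bit words


def coeff_offset(blk_no, ch, bin_no):
    """Word offset of X[blk_no][ch][bin] relative to the start of X00."""
    return blk_no * 1024 + ch * 512 + bin_no * WORDS_PER_COEFF


def segment_name(offset):
    """Name of the 512-word X-Memory segment that contains the offset."""
    return "X%02d" % (offset // SEGMENT_WORDS)


last = coeff_offset(5, 1, 255)   # last coefficient of block 5, channel 1
print(last, segment_name(last))
```

  • Six blocks of two channels with 256 two-word coefficients each occupy 6 * 2 * 512 = 6144 words, which is exactly the twelve segments X00-X11.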
  • Figure 3 shows the arrangement of PCM samples from X06-X12, which allows the generated coefficients to be stored in the required format while safeguarding against storing coefficients over PCM samples (write-before-read) that are still required to generate the coefficients of the next block (or channel).
  • Rematrixing is very straightforward as far as memory requirements and allocations are concerned. Rematrixed data is written in place of the original channel coefficients.
  • the coupling channel may be mapped to the same memory reserved for one of the coupled channels. Normally a memory space of 256 bins would be reserved for storing and processing the coefficients of each full-bandwidth channel (e.g. channels 0 and 1, for a stereo encoder). However, instead of creating a new block of memory for the coupling channel, a coupled channel's location may be reused. From bin zero to endmant[ch]-1, coefficients for the coupled channel (ch) are stored, and from endmant[ch] onwards to max (255, cplendmant) the coupling channel coefficients are stored.
  • Each frequency coefficient (32-bit) generates a mantissa and an exponent. Exponents have a maximum value of 24, therefore sixteen bits are more than enough to store their value. For mantissas it is not obvious whether sixteen bits are enough or whether the full thirty-two bits need to be retained. However, a patent application by the author titled "Accuracy Demands on Mantissa Representation in AC- Encoder" addresses this issue and proves that sixteen bits are sufficient. Therefore, six blocks of frequency coefficients in locations X00-X11 are overwritten with exponents (X00-X05) and mantissas (X06-X11), see Figure 2.
  • exponents in AC-3 are differentially-coded and subsequently grouped using one of the schemes D15, D25, D45 and Reuse. Scratch pad memory of 2 K is required for coding and grouping process. The resulting grouped exponents require additional memory for storage before they are finally packed into AC-3 frame. The memory allocated must be sufficient even in the worst case. Let us check this.
  • the grouped exponents must be easy to index. Even though the grouped exponents may occupy 512 words, they would in general be spread out in memory because of indexing e.g. to index to grp_exp[blk_no][ch][grp] , the address should be X12[ (blk_no*max_grp_size*3) + (max_grp_size*ch) + grp] .
  • Bit allocation is one of the most complicated parts (computationally and memory-wise) of AC-3 encoding. It can be partitioned into the following steps.
  • the first step of bit allocation determines the power-spectrum density (PSD) according to equation below.
  • the PSD values are stored in the same location as the exponents. This is possible because the exponents are no longer required once the PSD has been generated.
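  • The in-place overwrite can be sketched as follows. The patent's PSD equation is not reproduced in this text, so the sketch assumes the mapping used in the ATSC A/52 bit allocation routine, psd = 3072 - (exp << 7), with exponents stored as values 0 to 24. The function name is hypothetical.

```python
def exponents_to_psd_in_place(buf):
    """Overwrite each exponent in buf with its PSD value; no second buffer is used."""
    for i, exp in enumerate(buf):
        if not 0 <= exp <= 24:
            raise ValueError("exponent out of range")
        buf[i] = 3072 - (exp << 7)   # assumed A/52 mapping, 128 units per exponent step


segment = [0, 1, 24]                 # stand-in for part of an exponent segment
exponents_to_psd_in_place(segment)
print(segment)
```

  • Since each PSD value depends only on the exponent at the same index, the conversion needs no read-ahead and can safely reuse the exponent storage.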
  • Next step of the algorithm integrates fine-grain PSD values within each of a multiplicity of 1/6th octave bands to generate band-psd.
  • the integration of PSD values in each band is performed with log-addition.
  • the log-addition is implemented by computing the difference between the two operands and using the absolute difference divided by 2 as an address into a length 256 lookup table. In total, there can be 50 such bands per channel.
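  • A minimal sketch of the table-driven log-addition follows. The table here is generated from a formula rather than copied from the ATSC document, so its entries are an approximation and an assumption. It presumes PSD units of 128 per exponent step, in which adding two powers corresponds to adding 64*log2(1 + 2^(-d/64)) to the larger operand, where d is the absolute difference. The names LATAB, logadd and integrate_band are hypothetical.

```python
import math

# Approximate 256-entry correction table, indexed by |a - b| / 2 (assumed form).
LATAB = [round(64 * math.log2(1 + 2 ** (-i / 32))) for i in range(256)]


def logadd(a, b):
    """Add two log-domain PSD values using the lookup table."""
    address = min(abs(a - b) >> 1, 255)   # absolute difference divided by 2
    return max(a, b) + LATAB[address]


def integrate_band(psd, start, end):
    """Integrate fine-grain PSD values psd[start:end] into one band-PSD value."""
    total = psd[start]
    for value in psd[start + 1:end]:
        total = logadd(total, value)
    return total


print(logadd(1000, 1000), logadd(1000, 0))
```

  • Two equal operands gain 64 units, which corresponds to a doubling of power, while a much smaller operand leaves the larger one unchanged.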
  • the coupling channel, however, can reuse the same location as one of the coupled channels.
  • the band-psd for the coupled channel occupies the lower part ( 0-bndstart[ch] ), and the upper portion can be occupied by the coupling channel, provided the starting bin of the coupling channel always falls on a new band; otherwise the coupling band will overwrite the last band of the coupled channel.
  • Table I above shows the band structure for PSD-integration.
  • the excitation function is computed by applying the prototype masking curve selected by the encoder (and transmitted to the decoder) to the integrated PSD spectrum (bndpsd[]). The result of this computation is then offset downward in amplitude by the fgain and sgain parameters, which are also obtained from the bit stream.
  • excitation curve values can be written in-place of the band-psd. However, since band-psd values are required during initial portion of masking curve calculations, a temporary back-up of its value can be made.
  • FSMC First-Step-Masking Curve
  • This step computes the masking (noise level threshold) curve from the excitation function.
  • the hearing threshold is given in ATSC Document.
  • the fscod and dbknee variables are assigned by the encoder.
  • the FSMC is written over the excitation curve, as the excitation values are no longer required by the encoder.
  • AC-3 performs global bit allocation, that is, the allocation routine shuttles bits across channels and blocks as necessary, to meet the shifting demands of the signal.
  • Mantissa bits for the entire frame are allocated from a common pool. As a result the bit-allocation requires masking and psd information of the entire frame.
  • quantization of the mantissa according to the assigned bits can be performed on a block basis. This is because sharing of information about quantized mantissa is restricted to block level.
  • the first step is to separate quantization from the bit allocation process.
  • bit-allocation requires only three pieces of information: FSMC, PSD and snroffst. While the first two are familiar by now, the third parameter needs to be explained.
  • the bit allocation algorithm iterates with various values of this parameter until it converges to a value with which the total quantized-mantissa bits in the frame add up to the available bits.
  • the masking curve needs to be re-computed at each iteration from the excitation curve.
  • the baps would be computed and totalled to estimate the mantissa size. Storing baps for the entire frame would require 3 K memory.
  • the masking curve would have to be stored at a separate location from the excitation curve, otherwise for next iteration the excitation curve values would be corrupted.
  • the SSMC second-step masking curve
  • the SSMC is computed and stored in a temporary location and disposed once its purpose is served. Baps are stored for current block only.
  • Once the optimal snroffst value is known, the SSMC and baps are re-computed for each block as and when necessary. This effectively increases the number of iterations by one, but since the number of iterations is usually quite large (~6) the impact is not significant.
  • the last step of the bit allocation checks whether the constraints on the AC-3 frame (ATSC Doc.), such as the requirement that the combined size of block 0 and block 1 never exceed 5/8 of the frame, are satisfied. Once the constraint test is passed, the bit allocation pointers for each block are computed and their values are used to quantize the mantissas.
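  • The memory-saving iteration can be sketched as a binary search in which per-block results are recomputed on every pass and only a running total is kept. This is an illustration under stated assumptions, not the encoder's actual routine: the search range 0..1023, the function names, and the toy bit-count model are all hypothetical, and the bit count is assumed to be non-decreasing in snroffst.

```python
def find_snroffst(blocks, bits_for_block, available_bits, lo=0, hi=1023):
    """Largest snroffst whose total mantissa bits fit, or None if none fits."""
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        # Per-block values are recomputed and discarded; only the total survives.
        total = sum(bits_for_block(blk, mid) for blk in blocks)
        if total <= available_bits:
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best


def toy_bits(block, snroffst):
    """Hypothetical stand-in for computing the SSMC and baps of one block."""
    return sum(min(16, max(0, (psd + snroffst) // 64)) for psd in block)


frame = [[100, 200], [300]]           # toy PSD values for two blocks
print(find_snroffst(frame, toy_bits, 10))
```

  • The trade-off is the one described in the text: nothing is stored for the whole frame, at the cost of recomputing the per-block values once more after the search has converged.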
  • the mantissas in X06-X11 are quantized up to number of bits dictated by the bit-allocation-pointers.
  • the quantized mantissas are stored in-place. However, in AC-3, mantissas with certain levels of quantization are grouped together. These mantissas need to be stored separately, grouped, and then packed into the AC-3 frame.
  • These are the mantissas with baps 1, 2 and 4, i.e. Lev-3, Lev-5 and Lev-11 mantissas.
  • Mapping of bap to quantizer:
      bap   quantizer levels   quantization type   mantissa bits, qntztab[bap] (group bits / num in group)
      0     0                  none                0
      1     3                  symmetric           1.67 (5/3)
      2     5                  symmetric           2.33 (7/3)
      3     7                  symmetric           3
      4     11                 symmetric           3.5 (7/2)
      5     15                 symmetric           4
      6     32                 asymmetric          5
      7     64                 asymmetric          6
      8     128                asymmetric          7
      9     256                asymmetric          8
      10    512                asymmetric          9
      11    1,024              asymmetric          10
      12    2,048              asymmetric          11
      13    4,096              asymmetric          12
      14    16,384             asymmetric          14
      15    65,536             asymmetric          16
  • Figure 6 shows the Quantizer which quantizes mantissa of a particular block according to the corresponding baps.
  • Lev-3, Lev-5 and Lev-11 mantissas are stored separately for grouping. One could store these mantissas in their original location, but pointers to them would then be needed for the grouping stage, and these pointers, being equal in number, would occupy an identical amount of space.
  • the compression of the level mantissas is 3,3 and 2 (corresponding to group size of 3,3 and 2), therefore proportional amount of space is reserved in Y06-Y07 for each.
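  • The grouping of Lev-3, Lev-5 and Lev-11 mantissas can be sketched as base-N packing. The sketch assumes the grouping defined in the ATSC A/52 standard (three 3-level codes in 5 bits, three 5-level codes in 7 bits, two 11-level codes in 7 bits) and that each mantissa has already been quantized to a code in the range 0 to levels-1. The names GROUPING and group_codes are hypothetical.

```python
# bap -> (quantizer levels, mantissas per group, bits per group)
GROUPING = {1: (3, 3, 5), 2: (5, 3, 7), 4: (11, 2, 7)}


def group_codes(bap, codes):
    """Pack one group of quantized mantissa codes into a single group value."""
    levels, size, bits = GROUPING[bap]
    if len(codes) != size:
        raise ValueError("wrong number of mantissas for this bap")
    value = 0
    for code in codes:
        if not 0 <= code < levels:
            raise ValueError("mantissa code out of range")
        value = value * levels + code    # base-`levels` positional packing
    if value >= 1 << bits:
        raise ValueError("group value does not fit in the group width")
    return value


print(group_codes(1, [2, 2, 2]), group_codes(2, [1, 2, 3]), group_codes(4, [10, 10]))
```

  • The group widths give the fractional mantissa sizes listed in the quantizer table: 5/3, 7/3 and 7/2 bits per mantissa.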
  • the last step in the encoding process is the packing of mantissas into the AC-3 frame. For each mantissa, Q bits of the quantized mantissa are stored into the AC-3 frame, the size Q being determined from the bit-allocation pointer value. At this stage, the PSD values for the block under consideration are no longer required and so the Q values may be stored in their place (Location: X06-X11), see Figure 2.
  • the frame size depends on the compression ratio. For stereo AC-3, bitrates of 192-384 kbps are reasonable, since transparent quality can be achieved in this range. The largest frame size (836 words) results when the bitrate is 384 kbps, the sampling frequency being 44.1 kHz. A frame buffer of 1 K words is therefore reasonable for storing the AC-3 frame (X14-X15).
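  • The 836-word figure can be reproduced with a short calculation. The sketch assumes 16-bit words and uses the 1536 samples per channel that each frame represents. The function name is hypothetical.

```python
SAMPLES_PER_FRAME = 256 * 6      # six audio blocks of 256 samples per channel
BITS_PER_WORD = 16               # assumption: frame size counted in 16-bit words


def max_frame_words(bitrate_bps, sample_rate_hz):
    """Upper bound on the frame size in words for a given bitrate and sample rate."""
    numerator = bitrate_bps * SAMPLES_PER_FRAME
    denominator = sample_rate_hz * BITS_PER_WORD
    return -(-numerator // denominator)   # integer ceiling division


print(max_frame_words(384000, 44100), max_frame_words(384000, 48000))
```

  • At 44.1 kHz the exact quotient is not an integer, so the bound is rounded up. The result fits within the two 512-word segments X14-X15.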

Description

    Field of the Invention
  • The present invention relates to a method of reducing memory requirements in an encoder particularly, but not exclusively, an AC-3 encoder.
  • Background of the Invention
  • AC-3 is a transform-based audio coding algorithm designed to provide data-rate reduction for wide-band signals while maintaining the high quality of the original content. In the consumer electronics industry AC-3 soundtrack can be found on the latest generation of laser disc, can be found as the standard audio track on Digital Versatile Discs (DVD), is the standard audio format for High Definition Television (HDTV), and is being used for digital cable and satellite transmissions.
  • For all these consumer applications, the AC-3 algorithm must be mapped to the firmware of a DSP, which would be part of a bigger system performing data formatting, synchronisation, error checking and recovery, and perhaps video coding as well. The consumer market always requires a cheap but high quality product, and this means the cost issue must be addressed right from the start. Chip cost is dictated by parameters such as chip-area, memory and the more recently popularised low power consumption.
  • Decreasing memory requirements for a complex algorithm such as an AC-3 encoder means that the algorithm must be deeply analysed and re-structured such that the quality of encoding is preserved but the quantity of storage area is decreased.
  • U.S. 5,687,282 discloses a method for reducing memory requirements for an encoder, by successively overwriting various functions used for encoding. The present invention seeks to provide a method, particularly for an AC-3 encoder, which further reduces memory requirements.
  • Summary of the Invention
  • In accordance with the invention, there is provided a method of reducing memory requirements for an encoder which includes a function of bit allocation for quantising frequency coefficients of an input signal, including:
  • calculating a power spectrum density (PSD);
  • integrating the PSD over a plurality of frequency bands to form a band-PSD;
  • computing an excitation function by applying a prototype masking curve to the band-PSD;
  • generating a first-step-masking curve (FSMC) of a noise level threshold from the excitation function;
  • calculating a second-step-masking curve (SSMC) by incrementing the FSMC in accordance with a selected signal to noise variable (snroffst), wherein
  • the excitation function, after being computed, is written to memory in place of the band-PSD and is subsequently overwritten by the FSMC and wherein the SSMC is stored in a temporary memory and recalculated for each block of data provided by the encoder,
  • characterised in that: the frequency coefficients of two coupled input channels are combined in a coupling channel, the method including mapping coupling channel data to memory reserved for one of the coupled channels.
  • Preferably, the method includes allocation of bits for coding mantissa values of the frequency coefficients, the bit allocation being determined on the basis of the PSD, SSMC and said variable, with bit allocation pointers being generated and stored for a current data block only.
  • Preferably, the bit allocation is followed by quantisation of the mantissa values according to the bit allocation pointers and packing of the quantised values into a data frame, the quantised values being stored in memory in place of the original unquantised coefficients.
  • Preferably, the frequency coefficients are initially separated into exponent and mantissa components and wherein the exponent components are overwritten with the PSD.
  • Further information made redundant during encoding is identified in the following detailed description as being suitable for overwriting, to further reduce memory requirements for the AC-3 encoder.
  • Brief Description of the Drawings
  • The invention is more fully described, by way of non-limiting example only, with reference to the accompanying drawings, in which:
  • Figure 1 is an encoder system diagram;
  • Figure 2 illustrates a memory allocation scheme;
  • Figure 3 illustrates a frequency coefficients transfer method;
  • Figure 4 illustrates coupling calculation using fixed-point method;
  • Figure 5 shows a bit allocation algorithm; and
  • Figure 6 illustrates quantisation of mantissas.
  • Detailed Description of a Preferred Embodiment
  • For subsequent implementation of the AC-3 Encoder on a Fixed-Point DSP, the word-length requirements of each processing block where fixed point arithmetic is used need to be ascertained.
  • Finally, the issue of the memory allocation scheme in the Encoder, which is the subject of the present invention, is addressed. Based on the experience of the algorithm development and on analysis of the quality requirements, a low memory solution for the AC-3 Encoder will be described which performs memory optimisation without compromise in quality.
  • A. System overview
  • Like the AC-2 single channel coding technology from which it derives, AC-3 is fundamentally an adaptive transform-based coder using a frequency-linear, critically sampled filterbank based on the Princen Bradley Time Domain Aliasing Cancellation (TDAC) technique. The AC-3 system diagram is shown in Figure 1.
  • A.1 Input Format
  • AC-3 is a frame based encoder. Each frame contains information equivalent to 256x6 PCM (pulse code modulated) samples per channel. For coding convenience the frame is divided into six audio blocks, each block therefore containing information of 256 samples per channel.
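  • The frame structure can be sketched as follows for a single channel. The function name is hypothetical, and the input is assumed to be a plain list of PCM samples.

```python
BLOCKS_PER_FRAME = 6
SAMPLES_PER_BLOCK = 256
SAMPLES_PER_FRAME = BLOCKS_PER_FRAME * SAMPLES_PER_BLOCK   # 1536 per channel


def split_frame(pcm):
    """Split one channel's frame of PCM samples into six audio blocks."""
    if len(pcm) != SAMPLES_PER_FRAME:
        raise ValueError("a frame holds exactly 1536 samples per channel")
    return [pcm[i:i + SAMPLES_PER_BLOCK]
            for i in range(0, SAMPLES_PER_FRAME, SAMPLES_PER_BLOCK)]


blocks = split_frame(list(range(SAMPLES_PER_FRAME)))
print(len(blocks), len(blocks[0]))
```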
  • A.2 Transient Detection
  • Transients are detected in the full-bandwidth channels in order to decide when to switch to short length audio blocks, which restricts the quantization noise associated with the transient to a small temporal region about the transient. High-pass filtered versions of the signals are examined for an increase in energy from one sub-block time segment to the next. Sub-blocks are examined at different time scales. If a transient is detected in the second half of an audio block in a channel, that channel switches to a short block. When a transient is present, the bit 'blksw' for the channel is set in the encoded bit stream for that audio block.
  • The transient detector operates on 512 samples for every audio block. The AC-3 encoder uses an overlap technique, so although each block contains only 256 samples, when the block is presented for transient detection (or frequency transformation) the previous block is prefixed to it, which produces a total of 512 samples.
  • A.3 Frequency Transformation
  • Each channel's time domain input signal is windowed and filtered with a TDAC-based analysis filter bank to generate frequency domain coefficients. If the blksw bit is set, meaning that a transient was detected for the block, two short transforms of length 256 each are taken, which increases the temporal resolution of the signal. If not set, a single long transform of length 512 is taken, thereby providing a high spectral resolution.
  • The output frequency coefficient Xk is defined as:
    Figure 00040001
    where x[n] is the windowed input sequence for a channel and N is the transform length. Instead of evaluating Xk in the form given above it could be computed in a computationally efficient manner in accordance with the following:
    Figure 00040002
    where
    Figure 00040003
    The symbol j represents the imaginary number √(-1). The expression
    Figure 00040004
    is obtained from the well known FFT method, by first using the transformation x'[n] = x[n] * e^(jπn/N) and then computing the FFT of x'[n].
  • A.4 Coupling
  • High compression can be achieved in AC-3 by use of a technique known as coupling. Coupling takes advantage of the way the human ear determines directionality for very high frequency signals. At high audio frequencies (approximately above 4 kHz), the ear is physically unable to detect individual cycles of an audio waveform and instead responds to the envelope of the waveform. Consequently, the encoder combines the high frequency coefficients of the individual channels to form a common coupling channel. The original channels combined to form the coupling channel are called the coupled channels.
  • The most basic encoder can form the coupling channel by simply taking the average of all the individual channel coefficients. A more sophisticated encoder could alter the signs of the individual channels before adding them into the sum to avoid phase cancellation.
  • The generated coupling channel is next sectioned into a number of bands. For each such band and each coupled channel a coupling co-ordinate is transmitted to the decoder. To obtain the high frequency coefficients in any band, for a particular coupled channel, from the coupling channel, the decoder multiplies the coupling channel coefficients in that frequency band by the coupling co-ordinate of that channel for that particular frequency band. For a dual channel encoder, phase correction information is also sent for each frequency band of the coupling channel. Assume that the frequency domain coefficients are identified as:
  • a_i, for the first coupled channel,
  • b_i, for the second coupled channel,
  • c_i, for the coupling channel.
  • For each sub-band, the value Σ_i a_i * b_i is computed, the index i extending over the frequency range of the sub-band. If Σ_i a_i * b_i > 0, coupling for this sub-band is performed as c_i = (a_i + b_i)/2. Similarly, if Σ_i a_i * b_i < 0, the coupling strategy for the sub-band is c_i = (a_i - b_i)/2.
    Adjacent sub-bands using identical coupling strategies may be grouped together to form one or more coupling bands. However, sub-bands with different coupling strategies must not be banded together. If the overall coupling strategy for a band is c_i = (a_i + b_i)/2, i.e. for all sub-bands comprising the band, the phase flag for the band is set to +1; otherwise it is set to -1.
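  • The sub-band coupling decision described above may be sketched as follows. This is only an illustrative sketch: the function name couple_subband, the use of double values and the treatment of a zero correlation as in-phase are assumptions, and out-of-phase sub-bands are assumed to be coupled as c_i = (a_i - b_i)/2.

```c
#include <assert.h>
#include <stddef.h>

/* Forms the coupling channel c[] for one sub-band from the coupled
   channels a[] and b[]. Returns the phase flag of the sub-band:
   +1 when c = (a + b)/2 was used, -1 when c = (a - b)/2 was used. */
int couple_subband(const double *a, const double *b, double *c, size_t n)
{
    double corr = 0.0;
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        corr += a[i] * b[i];              /* sum of a_i * b_i over the sub-band */
    int phase = (corr < 0.0) ? -1 : 1;    /* zero is treated as in-phase (assumption) */
    for (size_t i = 0; i < n; i++)
        c[i] = (a[i] + phase * b[i]) / 2.0;
    return phase;
}
```

    Adjacent sub-bands returning the same flag could then be merged into one coupling band.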
  • A.5 Rematrixing
  • An additional process, rematrixing, is invoked in the special case that the encoder is processing two channels only. The sum and difference of the two channel signals are calculated on a band by band basis, and if, in a given band, the level disparity between the derived (matrixed) signal pair is greater than the corresponding level disparity of the original signal pair, the matrixed pair is chosen instead. Additional bits are provided in the bit stream to indicate this condition, in response to which the decoder performs a complementary unmatrixing operation to restore the original signals. The rematrix bits are omitted if there are more than two coded channels.
  • The benefit of this technique is that it avoids directional unmasking if the decoded signals are subsequently processed by a matrix surround processor, such as a Dolby Pro Logic decoder.
  • In AC-3, rematrixing is performed independently in separate frequency bands. There are four bands, with boundary locations dependent on the coupling information. The boundary locations are given by coefficient bin number, and the corresponding rematrixing band frequency boundaries change with sampling frequency.
  • A.6 Conversion to Floating Point
  • The coefficient values, which may have undergone the rematrixing and coupling processes, are converted to a specific floating point representation, resulting in separate arrays of exponents and mantissas. This floating point arrangement is maintained throughout the remainder of the coding process, until just prior to the decoder's inverse transform. It provides 144 dB of dynamic range and allows AC-3 to be implemented on either fixed or floating point hardware.
  • Coded audio information consists essentially of separate representation of the exponent and mantissas arrays. The remaining coding process focuses individually on reducing the exponent and mantissa data rate.
  • The exponents are coded using one of the exponent coding strategies. Each mantissa is truncated to a fixed number of binary places. The number of bits to be used for coding each mantissa is to be obtained from a bit allocation algorithm which is based on the masking property of the human auditory system.
  • A.7 Exponent Coding Strategy
  • Exponent values in AC-3 are allowed to range from 0 to -24. The exponent acts as a scale factor for each mantissa. Exponents for coefficients which have more than 24 leading zeros are fixed at -24 and the corresponding mantissas are allowed to have leading zeros.
  • AC-3 bit stream contains exponents for independent, coupled and the coupling channels. Exponent information may be shared across blocks within a frame, so blocks 1 through 5 may reuse exponents from previous blocks.
  • AC-3 exponent transmission employs differential coding technique, in which the exponents for a channel are differentially coded across frequency. The first exponent is always sent as an absolute value. The value indicates the number of leading zeros of the first transform coefficient. Successive exponents are sent as differential values which must be added to the prior exponent value to form the next actual exponent value.
  • The differentially encoded exponents are next combined into groups. The grouping is done by one of three methods: D15, D25 and D45. These, together with 'reuse', are referred to as exponent strategies. The number of exponents in each group depends only on the exponent strategy. In the D15 mode, each group is formed from three exponents. In D45, four exponents are represented by one differential value. Next, three consecutive such representative differential values are grouped together to form one group. Each group always comprises 7 bits. If the strategy is 'reuse' for a channel in a block, then no exponents are sent for that channel and the decoder reuses the exponents last sent for this channel.
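  • The D15 differential coding and grouping may be sketched as follows. The sketch assumes that exponents are held as magnitudes (0 to 24), that pre-processing has already limited each differential to the range -2 to +2, and that three differentials are packed as 25*M1 + 5*M2 + M3 with M = differential + 2, as in the ATSC standard; the function names are hypothetical.

```c
#include <assert.h>

/* Packs three differential exponents, each in -2..+2, into one 7-bit code. */
int group_d15(int d1, int d2, int d3)
{
    assert(d1 >= -2 && d1 <= 2);
    assert(d2 >= -2 && d2 <= 2);
    assert(d3 >= -2 && d3 <= 2);
    return 25 * (d1 + 2) + 5 * (d2 + 2) + (d3 + 2);   /* 0..124 fits in 7 bits */
}

/* Differentially codes exps[0 .. 3*ngrps] (1 + 3*ngrps values) in D15 mode.
   groups[] receives ngrps codes; the absolute first exponent is returned. */
int encode_d15(const int *exps, int ngrps, int *groups)
{
    for (int g = 0; g < ngrps; g++) {
        const int *e = &exps[3 * g];
        groups[g] = group_d15(e[1] - e[0], e[2] - e[1], e[3] - e[2]);
    }
    return exps[0];
}
```

    Since the largest code is 124, seven bits per group are always sufficient.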
  • Pre-processing of exponents prior to coding can lead to better audio quality.
  • Choice of a suitable strategy for exponent coding forms a crucial aspect of AC-3. D15 provides the highest accuracy but is low in compression. On the other hand, transmitting only one exponent set for a channel in the frame (in the first audio block of the frame) and attempting to 'reuse' the same exponents for the next five audio blocks can lead to high exponent compression but sometimes also to very audible distortion.
  • A.8 Bit Allocation for Mantissas
  • The bit allocation algorithm analyses the spectral envelope of the audio signal being coded, with respect to masking effects, to determine the number of bits to assign to each transform coefficient mantissa. In the encoder, the bit allocation is recommended to be performed globally on the ensemble of channels as an entity, from a common bit pool.
  • The bit allocation routine contains a parametric model of the human hearing for estimating a noise level threshold, expressed as a function of frequency, which separates audible from inaudible spectral components. Various parameters of the hearing model can be adjusted by the encoder depending upon the signal characteristic.
  • The number of bits available for packing mantissas in an AC-3 frame depends firstly, of course, on the frame size and, secondly, on the number of bits consumed by other fields (exponents, coupling parameters etc.). A significant part of the bit-allocation process is the optimisation of the bit allocation to mantissas such that, under masking considerations, the sum total of all bits consumed by mantissas equals (or is very close to) the available bits. This optimisation is performed by what is known as a Binary-Convergence Algorithm.
  • B. Word-Length Requirements in AC-3 Encoder
  • Floating point arithmetic usually uses the IEEE 754 single-precision format (32 bits: 1 sign bit, 8-bit exponent and 23-bit stored mantissa, giving 24 bits of precision with the implicit leading bit), which is adequate for high quality AC-3 encoding. Work-stations like the Sun SPARCstation 20 can provide much higher precision (e.g. double is 8 bytes). However, floating point units require more chip area and consequently most DSP processors use fixed point arithmetic. The AC-3 encoder is often intended to be part of a consumer product, e.g. DVD (Digital Versatile Disk), where cost (chip area) is an important factor.
  • The AC-3 encoder has been implemented on 24-bit processors like the Motorola 56000 and has met with much commercial success. Although an AC-3 encoder on a 16-bit processor is universally assumed to be of low quality, no adequate study has yet been published that benchmarks its quality or compares it with the floating point version.
  • Using double precision (32-bit) to implement the encoder on a 16-bit processor can lead to high quality (even more than the 24-bit version). However, double precision arithmetic is very computationally expensive (e.g. on D950 single precision multiplication takes 1 cycle while double precision requires 6 cycles). Rather than allowing single or double precision throughout the whole cycle of processing, different precision calculations may be made for different stages of computation.
  • C. Memory Allocation Scheme in AC-3 Stereo Encoder
  • In what follows next, the implementation of a two-channel AC-3 Encoder is described, which is greatly optimised in terms of memory requirements. The specific DSP under consideration is the STMicroelectronics' proprietary D950-Core, a general purpose programmable 16-bit fixed point Digital Signal Processor. Although the memory optimisations are being described with reference to this 16-bit machine, many of the concepts are applicable to other DSPs (with different word length) as well.
  • Like most DSPs, the D950 contains two data-memory spaces called X-Memory and Y-Memory, from which load/store operations can be performed concurrently in a single cycle. Although data memory in DSPs is usually flat (unsegmented), for indexing and logistics purposes this implementation views memory as chunks of 512 words. The choice of 512 is natural since each block contains 256 words of PCM per channel, which for stereo adds up to 512. Segments in X-Memory are labelled X00, X01 etc. Similarly, Y-Memory segments are labelled Y00, Y01 etc. Consecutive segments are assumed to be adjacent to each other, e.g. if the starting address of X04 is 1500, then the starting address of X05 is 2012 (1500 + 512).
  • C.1 Input Format
  • Six audio-blocks, each consisting of 256 samples per channel, are buffered at the input buffer and transmitted to the internal working memory of the AC-3 Encoder Algorithm.
    Figure 00080001
  • As in the diagram above, segments X07-X12 are written with the input PCM data of six blocks. As mentioned earlier, AC-3 uses an overlap method for frequency transformation, whereby each block requires data from the previous block to generate coefficients for the current block. For block zero of each frame, the PCM input from the last block of the previous frame is used. The last block of the previous frame is stored in X13 and, upon start of processing for the current frame, it is copied to X06, so that X06-X12 presents a continuous sequence of six overlapping blocks of 512 samples per channel, as illustrated in Figure 2.
  • C.2 Transient Detection
  • Transient detection for each block requires 512 inputs. As explained earlier, each block combined with data from previous one is presented for transient detection and frequency transformation.
  • The filtering operation does not alter the input but generates an equal number of high-pass filtered values. These are analysed by the transient detector to generate transient information. Filtering and transient detection require a working buffer of 1024 words (X14-X15), as illustrated in Figure 2.
  • C.3 Frequency Transformation
  • 512 samples, each one word (16-bit), are multiplied by the corresponding window coefficient (32-bit) to produce a 48-bit product which is truncated to 32 bits (2 words) before storage. The input is in X14 (512 16-bit PCM samples) but after windowing the output is 512 32-bit values, i.e. 1024 words (X14-X15). For in-place computation of the windowed signal, the 16-bit PCM data is spaced out with a blank word between consecutive samples. When the double word product is generated it can then be stored in-place.
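  • The in-place windowing with spaced-out samples may be sketched as follows. The function name, the high-word-first storage order and the choice of keeping the upper 32 bits of the 48-bit product are assumptions made for illustration; the sketch also relies on two's complement arithmetic with an arithmetic right shift, as found on common compilers.

```c
#include <assert.h>
#include <stdint.h>

/* buf holds n samples spaced out: sample i is in buf[2*i] and buf[2*i + 1]
   is a blank word. On return each pair of words holds the 32-bit windowed
   value, high word first (assumed order). */
void window_in_place(int16_t *buf, const int32_t *win, int n)
{
    assert(buf != 0 && win != 0);
    for (int i = 0; i < n; i++) {
        int64_t prod = (int64_t)buf[2 * i] * win[i];  /* 16 x 32 -> 48-bit product */
        int32_t y = (int32_t)(prod >> 16);            /* truncate to 32 bits */
        buf[2 * i]     = (int16_t)(y >> 16);          /* high word */
        buf[2 * i + 1] = (int16_t)(y & 0xFFFF);       /* low word */
    }
}
```

    Because each product overwrites only the word pair of its own sample, no second buffer is needed.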
  • Frequency Transformation using the Time-Domain-Aliasing-Cancellation Method produces 256 32-bit coefficients. These coefficients are transferred from X14 to appropriate location in the address space X00-X11, so that a particular coefficient X[blk_no][ch][bin] can be addressed conveniently as X00[blk_no*1024+ch*512+bin*2].
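  • The segment and coefficient addressing may be sketched as follows, assuming (for illustration only) that X00 starts at word offset 0 of the X-Memory; the names seg_offset and coef_offset are hypothetical.

```c
#include <assert.h>

enum { SEG_WORDS = 512 };

/* Word offset of segment Xnn (or Ynn) from the start of its memory space. */
int seg_offset(int seg)
{
    assert(seg >= 0);
    return seg * SEG_WORDS;
}

/* Word offset, relative to X00, of the 32-bit coefficient X[blk][ch][bin]. */
int coef_offset(int blk, int ch, int bin)
{
    assert(blk >= 0 && blk < 6);
    assert(ch >= 0 && ch < 2);
    assert(bin >= 0 && bin < 256);
    return blk * 1024 + ch * 512 + bin * 2;   /* two words per coefficient */
}
```

    The last coefficient, X[5][1][255], occupies the final two words before X12, so the six blocks fill X00-X11 exactly.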
  • Figure 3 shows the arrangement of PCM samples in X06-X12, which allows the generated coefficients to be stored in the required format while ensuring that storing coefficients does not overwrite (write-before-read) PCM samples still required for generating the coefficients of the next block (or channel).
  • C.4 Rematrixing
  • Rematrixing is very straight forward as far as memory requirements and allocations are concerned. Rematrixed data is written in-place of the original channel coefficients.
  • C.5 Coupling
  • Two or more coupled channels together form the coupling channel. Once generated, the coupling channel is treated as a new channel and processed differently. However, the last bin of a coupled channel is one short of the starting bin of the coupling channel: endmant[ch] = cplstrtmant; // ch is a coupled channel, ATSC Doc. pg. 47
  • Based on this knowledge, the coupling channel may be mapped to the same memory reserved for one of the coupled channels. Normally a memory space of 256 bins would be reserved for storing and processing the coefficients of each full-bandwidth channel (e.g. channels 0 and 1 for a stereo encoder). However, instead of creating a new block of memory for the coupling channel, a coupled channel's location may be reused. From bin zero to endmant[ch]-1, coefficients for the coupled channel (ch) are stored, and from endmant[ch] onwards to max (255, cplendmant) the coupling channel coefficients are stored.
  • The coupling process, as illustrated generally in Figure 4, has been described in detail in copending patent application entitled "Channel Coupling for an AC-3 Encoder". From memory point of view, 1 KW of X and Y-Memory, each, is required as scratch pad memory (working buffer) as shown in Figure 2.
  • C.6 Float to Scientific Notation
  • Each frequency coefficient (32-bit) generates a mantissa and an exponent. Exponents have a maximum value of 24, therefore sixteen bits are more than enough to store their value. For mantissas it is not obvious whether sixteen bits are enough or whether the full thirty-two bits need to be retained. However, the patent application by the author titled "Accuracy Demands on Mantissa Representation in AC- Encoder" addresses this issue and proves that sixteen bits are sufficient. Therefore, the six blocks of frequency coefficients in locations X00-X11 are overwritten with exponents (X00-X05) and mantissas (X06-X11), see Figure 2.
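  • The conversion of a 32-bit coefficient into an exponent and a 16-bit mantissa may be sketched as follows. The sketch assumes a signed fractional coefficient, takes the exponent as the number of left shifts needed to normalise the magnitude (limited to 24), and truncates the normalised value to its upper sixteen bits; the function name split_coef is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Splits a 32-bit coefficient into an exponent (0..24) and a 16-bit mantissa. */
void split_coef(int32_t coef, int *exp, int16_t *mant)
{
    assert(exp != 0 && mant != 0);
    uint32_t mag = (coef < 0) ? (uint32_t)(-(int64_t)coef) : (uint32_t)coef;
    int e = 0;
    /* count leading zeros of the magnitude after the sign bit, at most 24 */
    while (e < 24 && mag < 0x40000000u) {
        mag <<= 1;
        e++;
    }
    /* normalise in 64 bits, then keep the 16 most significant bits */
    int64_t scaled = (int64_t)coef * ((int64_t)1 << e);
    *exp = e;
    *mant = (int16_t)(scaled / 65536);
}
```

    Very small coefficients stop at exponent 24 and keep leading zeros in the mantissa, in line with section A.7.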
  • C.7 Exponent Coding
  • As explained earlier, exponents in AC-3 are differentially-coded and subsequently grouped using one of the schemes D15, D25, D45 and Reuse. Scratch pad memory of 2 K is required for coding and grouping process. The resulting grouped exponents require additional memory for storage before they are finally packed into AC-3 frame. The memory allocated must be sufficient even in the worst case. Let us check this.
  • Six blocks, with two channels in each (assuming that the coupling channel, if present, is overlapped with channel 0), present a total of 256x2x6 = 3072 exponents. The worst case is when all blocks use the D15 coding strategy. In this case, the coding gain is only 3 (since three exponents are grouped into one). Thus the total memory required for exponents is 3072/3 = 1 K words. In addition, if one ensures (as is done in the algorithm used here) that at least three blocks use reuse, the memory requirement for grouped exponents drops to 512 words.
  • For future accesses, the grouped exponents must be easy to index. Even though the grouped exponents may occupy 512 words, they would in general be spread out in memory because of indexing, e.g. to index grp_exp[blk_no][ch][grp], the address should be X12[ (blk_no*max_grp_size*3) + (max_grp_size*ch) + grp]. This means that one needs to allocate more than 512 words for storage of grouped exponents. The way to avoid this is to use a double indexing method. For each block and channel, the starting location of its grouped exponents is stored in a separate small table (size: 18 bytes). This pointer is then used to access the actual grouped exponents. In cases where the exponent coding strategy is reuse, no grouped exponents exist and the table entry for such a block and channel is a null pointer.
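  • The double indexing method may be sketched as follows. The structure layout, the use of offsets with a NO_EXPS marker in place of null pointers, the three table columns (two channels plus the coupling channel) and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NBLK = 6, NCH = 3, POOL_WORDS = 512, NO_EXPS = -1 };

typedef struct {
    int16_t  start[NBLK][NCH];   /* offset into pool, or NO_EXPS for 'reuse' */
    uint16_t pool[POOL_WORDS];   /* grouped exponents packed back to back */
    int      used;               /* number of pool words in use */
} GrpExpStore;

void store_init(GrpExpStore *s)
{
    for (int b = 0; b < NBLK; b++)
        for (int c = 0; c < NCH; c++)
            s->start[b][c] = NO_EXPS;
    s->used = 0;
}

/* Appends ngrps groups for (blk, ch); returns 0 on success, -1 if full. */
int store_put(GrpExpStore *s, int blk, int ch, const uint16_t *grps, int ngrps)
{
    if (s->used + ngrps > POOL_WORDS)
        return -1;
    s->start[blk][ch] = (int16_t)s->used;
    for (int i = 0; i < ngrps; i++)
        s->pool[s->used++] = grps[i];
    return 0;
}

/* Returns group grp of (blk, ch), or NULL when the exponents are reused. */
const uint16_t *store_get(const GrpExpStore *s, int blk, int ch, int grp)
{
    int16_t off = s->start[blk][ch];
    return (off == NO_EXPS) ? NULL : &s->pool[off + grp];
}
```

    Only blocks that actually carry exponents consume pool words, so the pool can stay at the 512-word bound derived above.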
  • C.8 Bit Allocation Algorithm
  • Bit allocation is one of the most complicated (computationally and memory wise) parts of AC-3 encoding. It can be partitioned into the following steps:
  • PSD Calculation - Power Spectrum Density from Exponents
  • PSD Integration - Band together PSD, conforming to masking curve
  • Excitation & FSMC - First Step Masking Curve
  • Fast Bit Allocation - Bit Allocation to find Optimal SNR
  • Core Bit Allocation - Generation of Bit Allocation Pointers
  • C.8.1 PSD Calculation
  • The first step of bit allocation determines the power-spectrum density (PSD) from the exponents according to the equation: psd[bin] = (3072 - (exp[bin] << 7));
  • The PSD values are stored in the same locations as the exponents. This is possible because the exponents are no longer required once the PSD has been generated.
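  • The in-place PSD calculation may be sketched as follows, assuming that the exponents are held as 16-bit magnitudes in the range 0 to 24; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Overwrites each exponent (0..24) in buf with its PSD value. */
void exps_to_psd_in_place(int16_t *buf, int nbins)
{
    assert(buf != 0);
    for (int bin = 0; bin < nbins; bin++) {
        assert(buf[bin] >= 0 && buf[bin] <= 24);
        buf[bin] = (int16_t)(3072 - (buf[bin] << 7));   /* 3072 - 128 * exp */
    }
}
```

    An exponent of 0 maps to 3072 and an exponent of 24 maps to 0, so the PSD always fits in the sixteen-bit word that held the exponent.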
  • C.8.2 PSD Integration
  • The next step of the algorithm integrates fine-grain PSD values within each of a multiplicity of 1/6th octave bands to generate the band-psd. The integration of PSD values in each band is performed with log-addition. The log-addition is implemented by computing the difference between the two operands and using the absolute difference divided by 2 as an address into a length 256 lookup table. In total, there can be 50 such bands per channel. The coupling channel, however, can reuse the same location as one of the coupled channels. The band-psd for the coupled channel occupies the lower part (0-bndstart[ch]) and the upper portion can be occupied by the coupling channel, provided the starting bin of the coupling channel always falls on a new band; otherwise the coupling band will overwrite the last band of the coupled channel.
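  • The log-addition and band integration may be sketched as follows. The lookup table contents are not reproduced here; the table is passed in by the caller, and the rule that the table value is added to the larger operand follows the ATSC description and is taken as an assumption. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Log-domain addition of a and b using a 256-entry lookup table. */
int log_add(int a, int b, const uint8_t latab[256])
{
    int c = a - b;
    int addr = abs(c) >> 1;          /* absolute difference divided by 2 */
    if (addr > 255)
        addr = 255;                  /* clamp to the table length */
    return ((c >= 0) ? a : b) + latab[addr];
}

/* Integrates psd[first .. last-1] into a single band-psd value. */
int integrate_band(const int16_t *psd, int first, int last,
                   const uint8_t latab[256])
{
    assert(psd != 0 && first < last);
    int acc = psd[first];
    for (int bin = first + 1; bin < last; bin++)
        acc = log_add(acc, psd[bin], latab);
    return acc;
}
```

    The integrated value of a band could be written over the first PSD word of that band once all bins of the band have been read.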
    Figure 00110001
    Figure 00120001
  • Table I above shows the band structure for PSD integration. For each band, bndtab (col. 2) shows the starting index of the coefficients in the band. Coupling can begin only at discrete points, as given by the equation: cplstrtmant = (cplbegf*12) + 37
  • Based on the table and above equation we note that for cplbegf from 0 to 8, the cplstrtmant is always the starting point of some band. For cplbegf beyond 8, if the values are restricted to even values, the condition is still satisfied. Thus, with this restriction (which is hardly restrictive) band-psd of coupling can be appended to that of a coupled channel.
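  • The band alignment condition may be checked with the following sketch. The start bins of the bands wider than one bin are taken to be those of the ATSC A/52 banding table and should be verified against Table I; the function names are hypothetical.

```c
#include <assert.h>

/* Start bins of bands 28..49 (assumed from the ATSC A/52 banding table).
   Bands 0..27 are one bin wide and start at bins 0..27. */
const int wide_band_start[22] = {
    28, 31, 34, 37, 40, 43, 46, 49, 55, 61, 67,
    73, 79, 85, 97, 109, 121, 133, 157, 181, 205, 229
};

/* First bin of the coupling channel for a given cplbegf. */
int cpl_start_bin(int cplbegf)
{
    assert(cplbegf >= 0 && cplbegf <= 15);
    return cplbegf * 12 + 37;
}

/* Returns 1 when bin is the first bin of some band, 0 otherwise. */
int starts_on_band(int bin)
{
    if (bin >= 0 && bin < 28)
        return 1;
    for (int i = 0; i < 22; i++)
        if (wide_band_start[i] == bin)
            return 1;
    return 0;
}
```

    Under these assumed band starts, every cplbegf from 0 to 8 and every even cplbegf above 8 lands on a band boundary, while 9, 11, 13 and 15 do not.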
  • C.8.3 Excitation Curve & FSMC
  • The excitation function is computed by applying the prototype masking curve selected by the encoder (and transmitted to the decoder) to the integrated PSD spectrum (bndpsd[]). The result of this computation is then offset downward in amplitude by the fgain and sgain parameters, which are also obtained from the bit stream.
  • The excitation curve values can be written in-place of the band-psd. However, since band-psd values are required during initial portion of masking curve calculations, a temporary back-up of its value can be made.
  • Following the excitation curve, the First-Step-Masking Curve (FSMC) is computed as given in the pseudo code below.
    Figure 00120002
  • This step computes the masking (noise level threshold) curve from the excitation function. The hearing threshold
    Figure 00120003
    is given in the ATSC Document. The fscod and dbknee variables are assigned by the encoder. The FSMC is written over the excitation curve, as the excitation values are no longer required by the encoder.
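  • The in-place computation of the FSMC may be sketched as follows. The sketch follows the corresponding masking step of the ATSC bit allocation (a correction below dbknee followed by a comparison with the hearing threshold) and is an assumption about the pseudo code referred to above, not a reproduction of it. The hearing threshold row for the chosen fscod is passed in by the caller, and bndpsd is the temporary back-up of the band-psd values; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* curve[] holds the excitation values on entry and the FSMC on return. */
void fsmc_in_place(int16_t *curve, const int16_t *bndpsd, const int16_t *hth,
                   int nbands, int dbknee)
{
    assert(curve != 0 && bndpsd != 0 && hth != 0);
    for (int band = 0; band < nbands; band++) {
        int v = curve[band];
        if (bndpsd[band] < dbknee)
            v += (dbknee - bndpsd[band]) >> 2;   /* low-level signal correction */
        if (v < hth[band])
            v = hth[band];                       /* never below hearing threshold */
        curve[band] = (int16_t)v;                /* overwrite excitation value */
    }
}
```

    Each band reads only its own excitation value before overwriting it, which is why no separate FSMC buffer is needed.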
  • C.8.4 Fast Bit Allocation
  • AC-3 performs global bit allocation, that is, the allocation routine shuttles bits across channels and blocks as necessary, to meet the shifting demands of the signal. Mantissa bits for the entire frame are allocated from a common pool. As a result the bit-allocation requires masking and psd information of the entire frame. Once the bits are assigned, quantization of the mantissa according to the assigned bits can be performed on a block basis. This is because sharing of information about quantized mantissa is restricted to block level. To decrease memory requirements the first step is to separate quantization from the bit allocation process.
  • The second important aspect to note, which brings about a tremendous decrease in memory requirement, is the fact that the bit allocation requires only three pieces of information: FSMC, PSD and snroffst. While the first two are familiar by now, the third parameter needs to be explained. With the FSMC and PSD fixed at this point of the frame encoding process, the only parameter which alters the distribution of bits to mantissas is the value of snroffst. The bit allocation algorithm iterates with various values of this parameter until it converges to a value with which the total quantized-mantissa bits in the frame add up to the available bits.
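  • The convergence on snroffst may be sketched as a binary search, as follows. The sketch assumes that the mantissa bit count never decreases as snroffst grows and that the candidate values form an integer range; the callback type, the stand-in counter linear_counter and the other names are hypothetical and only serve to make the sketch self-contained.

```c
#include <assert.h>

/* Returns the total mantissa bits of the frame for a given snroffst. */
typedef long (*BitCounter)(int snroffst, void *ctx);

/* Returns the largest snroffst in [lo, hi] whose bit count does not exceed
   avail, or lo - 1 when even the lowest value does not fit. */
int converge_snroffst(int lo, int hi, long avail, BitCounter count, void *ctx)
{
    assert(count != 0);
    int best = lo - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (count(mid, ctx) <= avail) {
            best = mid;          /* fits: try a higher snroffst */
            lo = mid + 1;
        } else {
            hi = mid - 1;        /* too many bits: try a lower snroffst */
        }
    }
    return best;
}

/* Stand-in counter for illustration only: bits grow linearly with snroffst. */
long linear_counter(int snroffst, void *ctx)
{
    return (long)snroffst * *(const long *)ctx;
}
```

    In the encoder the counter would recompute the SSMC and baps block by block for the trial value and sum the resulting mantissa bits.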
  • From the general description in the ATSC standard it would seem that, for computing the right value of snroffst, the masking curve needs to be re-computed at each iteration from the excitation curve. Using the masking curve, the baps would be computed and totalled to estimate the mantissa size. Storing baps for the entire frame would require 3 K of memory. In addition, the masking curve would have to be stored at a separate location from the excitation curve, otherwise the excitation curve values would be corrupted for the next iteration. By breaking the masking curve calculation into two parts, FSMC and SSMC (Second-Step Masking-Curve), one separates the part which is invariant under snroffst from the part which changes.
  • One innovative aspect of the memory allocation process, designed by the author, is that the SSMC (second-step masking curve) is calculated on-the-fly, i.e. each time the masking curve is required for the block. During the optimisation stage of the bit-allocation algorithm, the SSMC is computed from the FSMC (first-step masking curve) and the chosen snroffst, stored in a temporary location and discarded once its purpose is served. Baps are stored for the current block only. Following the convergence process, when the optimal snroffst value is known, the SSMC and baps are re-computed for each block as and when necessary. This effectively increases the number of iterations by one, but since the number of iterations is usually quite large (about 6) the impact is not significant.
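  • The on-the-fly computation of the SSMC and baps for one block may be sketched as follows. The arithmetic (subtraction of the snr offset and floor, masking with 0x1fe0, and a 64-entry bap table addressed by the PSD to mask difference divided by 32) follows the ATSC bit allocation and is taken as an assumption; the bap table and the bin to band mapping are supplied by the caller, and all names are hypothetical.

```c
#include <assert.h>

/* Second-step masking value for one band, derived from its FSMC value. */
int ssmc_value(int fsmc, int snroffset, int floor_val)
{
    int m = fsmc - snroffset - floor_val;
    if (m < 0)
        m = 0;
    m &= 0x1fe0;
    return m + floor_val;
}

/* Address into the 64-entry bap table for one bin. */
int bap_address(int psd, int ssmc)
{
    int diff = psd - ssmc;
    if (diff < 0)
        return 0;
    int addr = diff >> 5;
    return (addr > 63) ? 63 : addr;
}

/* Computes the baps of one block into bap[], a buffer reused by every block.
   The SSMC is derived per bin and never stored for the frame. */
void block_baps(const int *psd, const int *fsmc,
                const unsigned char *band_of_bin, int nbins,
                int snroffset, int floor_val,
                const unsigned char baptab[64], unsigned char *bap)
{
    assert(psd != 0 && fsmc != 0 && band_of_bin != 0 && bap != 0);
    for (int bin = 0; bin < nbins; bin++) {
        int ssmc = ssmc_value(fsmc[band_of_bin[bin]], snroffset, floor_val);
        bap[bin] = baptab[bap_address(psd[bin], ssmc)];
    }
}
```

    Only the FSMC and PSD persist for the whole frame; the bap buffer is sized for a single block.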
  • Storing baps only for the current block not only saves memory but also decreases computation and eases coding and testing. Consider the following common scenario in the AC-3 encoding process, in which the exponent coding strategies for the six blocks of a frame are as follows: D15, Reuse, Reuse, Reuse, D25, Reuse. Processing for block zero begins and the SSMC and baps are computed. Mantissas for block zero are quantized, grouped and packed according to the baps. Since the strategy for block one is reuse, the SSMC and thereby the baps are the same as those of block zero. So the SSMC and bap calculations are simply skipped (see Figure 5) and the quantization routine quantizes and packs the mantissas for block one according to the baps in location Y00 (without any consideration of the block for which the baps were originally calculated). This design simplifies implementation and testing of the bit-allocation algorithm, which otherwise would have to resort to complex calculations and indexing to determine the block whose baps are to be reused in the current block.
  • It must be noted that this scheme works only up to block level. SSMC and baps cannot be worked upon at channel level within a block, since in the quantization stage the grouping of mantissas may extend from one channel to another (within the same block), e.g. if channel 0 has only two level-three mantissas (i.e. mantissas quantized to three levels) then the first level-three mantissa in channel one will be added to them to form a group of three. The point is that, when quantizing the mantissas of a block, the baps for all channels within the block must be pre-computed and ready.
  • C.8.5 Core Bit Allocation
  • The last step of the bit allocation checks whether the constraints on the AC-3 frame (ATSC Doc.) are satisfied, such as the requirement that the combined size of block 0 and block 1 never exceeds 5/8 of the frame. Once the constraint test is passed, the bit allocation pointers for each block are computed and their values are used to quantize the mantissas.
  • C.9 Quantization
  • The mantissas in X06-X11 are quantized to the number of bits dictated by the bit-allocation pointers. The quantized mantissas are stored in-place. However, in AC-3, mantissas with certain levels of quantization are grouped together. These mantissas need to be stored separately, grouped and then packed into the AC-3 frame.
  • As seen in Table 2 below mantissas with baps 1,2 and 4 (i.e. Lev-3, Lev-5 and Lev-11 mantissas) are put together into groups of 3,3 and 2, respectively.
    Table 2: Mapping of bap to Quantizer
    bap   quantizer levels   quantization type   mantissa bits (qntztab[bap])
                                                 (group bits / num in group)
    0     0                  none                0
    1     3                  symmetric           1.67 (5/3)
    2     5                  symmetric           2.33 (7/3)
    3     7                  symmetric           3
    4     11                 symmetric           3.5 (7/2)
    5     15                 symmetric           4
    6     32                 asymmetric          5
    7     64                 asymmetric          6
    8     128                asymmetric          7
    9     256                asymmetric          8
    10    512                asymmetric          9
    11    1,024              asymmetric          10
    12    2,048              asymmetric          11
    13    4,096              asymmetric          12
    14    16,384             asymmetric          14
    15    65,536             asymmetric          16
  • Figure 6 shows the quantizer, which quantizes the mantissas of a particular block according to the corresponding baps. Lev-3, Lev-5 and Lev-11 mantissas are stored separately for grouping. One could store these mantissas in their original location, but pointers to them would then be needed for the grouping stage, and these pointers, being equal in number, would occupy an identical amount of space. The compression of the level mantissas is 3, 3 and 2 (corresponding to group sizes of 3, 3 and 2), therefore a proportional amount of space is reserved in Y06-Y07 for each.
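  • The grouping of Lev-3, Lev-5 and Lev-11 mantissas may be sketched as follows. The packing formulas are those of the ATSC standard and are taken as an assumption; each argument is a quantizer level index starting at zero, and the function names are hypothetical.

```c
#include <assert.h>

/* Three Lev-3 mantissas (indices 0..2) packed into a 5-bit group. */
int group_lev3(int m1, int m2, int m3)
{
    assert(m1 >= 0 && m1 < 3 && m2 >= 0 && m2 < 3 && m3 >= 0 && m3 < 3);
    return 9 * m1 + 3 * m2 + m3;      /* at most 26 */
}

/* Three Lev-5 mantissas (indices 0..4) packed into a 7-bit group. */
int group_lev5(int m1, int m2, int m3)
{
    assert(m1 >= 0 && m1 < 5 && m2 >= 0 && m2 < 5 && m3 >= 0 && m3 < 5);
    return 25 * m1 + 5 * m2 + m3;     /* at most 124 */
}

/* Two Lev-11 mantissas (indices 0..10) packed into a 7-bit group. */
int group_lev11(int m1, int m2)
{
    assert(m1 >= 0 && m1 < 11 && m2 >= 0 && m2 < 11);
    return 11 * m1 + m2;              /* at most 120 */
}
```

    The group sizes of 5, 7 and 7 bits correspond to the 5/3, 7/3 and 7/2 entries of Table 2.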
  • The last step in the encoding process is the packing of mantissas into the AC-3 frame. For each mantissa, Q bits of the quantized mantissa are stored into the AC-3 frame, the size Q being determined from the bit-allocation pointer value. At this stage, the PSD values for the block under consideration are no longer required and so the Q values may be stored in their place (Location: X06-X11), see Figure 2.
  • C.10 AC-3 Frame Packing
  • The frame size depends on the compression ratio. For stereo AC-3, bitrates of 192-384 kbps are reasonable, for in this range transparent quality can be achieved. The largest frame size (836 words) results when the bitrate is 384 kbps and the sampling frequency is 44.1 kHz. A frame buffer of 1 K words is therefore reasonable for storing the AC-3 frame (X14-X15).
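  • The frame size figure may be checked with the following sketch, which assumes that a frame carries 256x6 samples per channel, that words are 16 bits wide and that a fractional word count is rounded up; the function name is hypothetical.

```c
#include <assert.h>

enum { SAMPLES_PER_FRAME = 256 * 6, BITS_PER_WORD = 16 };

/* Number of 16-bit words needed for one frame, rounded up. */
long frame_words(long bitrate_bps, long fs_hz)
{
    assert(bitrate_bps > 0 && fs_hz > 0);
    long long num = (long long)bitrate_bps * SAMPLES_PER_FRAME;
    long long den = (long long)fs_hz * BITS_PER_WORD;
    return (long)((num + den - 1) / den);
}
```

    For 384 kbps at 44.1 kHz this gives 836 words, which fits in the 1 K word buffer.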

Claims (9)

  1. A method of reducing memory requirements for an encoder which includes the function of bit allocation for quantising frequency coefficients of an input signal, including:
    calculating a power spectrum density (PSD);
    integrating the PSD over a plurality of frequency bands to form a band-PSD;
    computing an excitation function by applying a prototype masking curve to the band-PSD;
    generating a first-step-masking curve (FSMC) of a noise level threshold from the excitation function;
    calculating a second-step-masking curve (SSMC) by incrementing the FSMC in accordance with a selected signal to noise variable (snroffst), wherein
    the excitation function, after being computed, is written to memory in place of the band-PSD and is subsequently overwritten by the FSMC and wherein the SSMC is stored in a temporary memory and recalculated for each block of data processed by the encoder, characterised in that: the frequency coefficients of two coupled input channels are combined in a coupling channel, the method including mapping coupling channel data to memory reserved for one of the coupled channels.
  2. A method as claimed in claim 1, including allocation of bits for coding mantissa values of the frequency coefficients, the bit allocation being determined on the basis of the PSD, SSMC and the variable, with bit allocation pointers being generated and stored for a current data block only.
  3. A method as claimed in claim 2, wherein the bit allocation is followed by quantisation of the mantissa values according to the bit allocation pointers and packing of the quantised values into a data frame, the quantised values being stored in memory in place of original unquantised coefficients.
  4. A method as claimed in claim 1, wherein the frequency coefficients are initially separated into exponent and mantissa components and wherein the exponent components are overwritten with the PSD.
  5. A method as claimed in claim 1, wherein the encoder includes the function of rematrixing and coupling, wherein the rematrixing and coupling, if necessary, are performed in-place, with a coupling channel imposed on an upper half of a first coupled channel, thereby removing the need for creation of a new storage area for the coupling channel.
  6. A method as claimed in any one of the preceding claims, wherein the input signal is processed in frames of six blocks of stereo input which are stored in a memory X at locations X07-X12 with a last block of a previous frame prefixed at X06, so that a continuous six block of overlapping 512 samples per input channel are presented to a transient detection and frequency transformation modules of the encoder and wherein the coefficients are represented by 32-bit 256 coefficients, per channel, which are stored in X00-X11.
  7. A method as claimed in claim 6, wherein 32-bit frequency coefficients are converted to 16-bit mantissa and exponent, each.
  8. A method as claimed in claim 7, wherein the exponents are coded according to a selected coding strategy using a double indexing method.
  9. A method as claimed in claim 2 or 3, wherein the mantissas are allocated quantisation values on the basis of the bit allocation pointers, the quantisation values being stored in place of the PSD.
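
As an illustration of claims 4 and 7 only, the following C sketch shows one way the storage that first holds an exponent could be reused for the PSD. It is not taken from the patent. The type `coef_slot_t` and the functions `exponent_of`, `split_coefs` and `psd_in_place` are hypothetical names. The exponent is assumed to be the number of redundant sign bits of the 32-bit coefficient, clamped to 24 (the AC-3 exponent range), and the PSD mapping `3072 - (exp << 7)` is assumed from the bit allocation procedure of the AC-3 standard (ATSC A/52).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical storage slot: the first 16-bit field holds the exponent
   and is later overwritten by the PSD value (in the spirit of claim 4). */
typedef struct {
    int16_t exp_or_psd;
    int16_t mant;
} coef_slot_t;

/* Assumption: exponent = number of redundant sign bits, clamped to 24. */
static int exponent_of(int32_t c)
{
    /* Fold negative values so that redundant sign bits become zeros. */
    uint32_t mag = (c < 0) ? ~(uint32_t)c : (uint32_t)c;
    int e = 0;
    while (e < 24 && (mag & 0x40000000u) == 0) {
        mag <<= 1;
        e++;
    }
    return e;
}

/* Split each 32-bit coefficient into a 16-bit exponent and a 16-bit
   mantissa (claim 7). The mantissa is the normalised coefficient with
   its low 16 bits truncated toward zero. */
static void split_coefs(const int32_t *in, coef_slot_t *out, int n)
{
    for (int i = 0; i < n; i++) {
        int e = exponent_of(in[i]);
        int64_t normalised = (int64_t)in[i] * ((int64_t)1 << e);
        out[i].exp_or_psd = (int16_t)e;
        out[i].mant = (int16_t)(normalised / 65536);
    }
}

/* Overwrite each stored exponent with its PSD value, so that no
   separate PSD buffer is needed. */
static void psd_in_place(coef_slot_t *slots, int n)
{
    for (int i = 0; i < n; i++) {
        int e = slots[i].exp_or_psd;
        assert(e >= 0 && e <= 24);
        slots[i].exp_or_psd = (int16_t)(3072 - (e << 7));
    }
}
```

After `psd_in_place` runs, the exponents are no longer available from the slot, so any step that still needs them (for example exponent coding) would have to run first or recompute them; the sketch does not model that ordering.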

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SG1999/000111 WO2001033556A1 (en) 1999-10-30 1999-10-30 A method of reducing memory requirements in an ac-3 audio encoder

Publications (2)

Publication Number Publication Date
EP1228507A1 (en) 2002-08-07
EP1228507B1 (en) 2003-05-28

Family

ID=20430245

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99954578A Expired - Lifetime EP1228507B1 (en) 1999-10-30 1999-10-30 A method of reducing memory requirements in an ac-3 audio encoder

Country Status (3)

Country Link
EP (1) EP1228507B1 (en)
DE (1) DE69908433T2 (en)
WO (1) WO2001033556A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1996021975A1 (en) * 1995-01-09 1996-07-18 Philips Electronics N.V. Method and apparatus for determining a masked threshold
TW316302B (en) * 1995-05-02 1997-09-21 Nippon Steel Corp
US5819215A (en) * 1995-10-13 1998-10-06 Dobson; Kurt Method and apparatus for wavelet based data compression having adaptive bit rate control for compression of digital audio or other sensory data

Also Published As

Publication number Publication date
DE69908433T2 (en) 2004-04-08
WO2001033556A1 (en) 2001-05-10
DE69908433D1 (en) 2003-07-03
EP1228507A1 (en) 2002-08-07

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20020529

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Designated state(s): DE FR GB IT

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REF Corresponds to:

Ref document number: 69908433

Country of ref document: DE

Date of ref document: 20030703

Kind code of ref document: P

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20040302

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 18

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 19

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20180920

Year of fee payment: 20

Ref country code: IT

Payment date: 20180919

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20180925

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20180819

Year of fee payment: 20

REG Reference to a national code

Ref country code: DE

Ref legal event code: R071

Ref document number: 69908433

Country of ref document: DE

REG Reference to a national code

Ref country code: GB

Ref legal event code: PE20

Expiry date: 20191029

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20191029