WO2007011080A1 - Apparatus and method of encoding and decoding audio signal - Google Patents

Apparatus and method of encoding and decoding audio signal Download PDF

Info

Publication number
WO2007011080A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
subdivided
prediction order
blocks
channel
Prior art date
Application number
PCT/KR2005/002292
Other languages
French (fr)
Inventor
Tilman Liebchen
Original Assignee
Lg Electronics Inc.
Noll, Peter
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lg Electronics Inc., Noll, Peter filed Critical Lg Electronics Inc.
Priority to PCT/KR2005/002292 priority Critical patent/WO2007011080A1/en
Priority to US11/481,926 priority patent/US7949014B2/en
Priority to US11/481,915 priority patent/US7996216B2/en
Priority to US11/481,927 priority patent/US7835917B2/en
Priority to US11/481,932 priority patent/US8032240B2/en
Priority to US11/481,916 priority patent/US8108219B2/en
Priority to US11/481,917 priority patent/US7991272B2/en
Priority to US11/481,941 priority patent/US8050915B2/en
Priority to US11/481,931 priority patent/US7411528B2/en
Priority to US11/481,942 priority patent/US7830921B2/en
Priority to US11/481,929 priority patent/US7991012B2/en
Priority to US11/481,933 priority patent/US7966190B2/en
Priority to US11/481,940 priority patent/US8180631B2/en
Priority to US11/481,930 priority patent/US8032368B2/en
Priority to US11/481,939 priority patent/US8121836B2/en
Priority to CNA2006800305499A priority patent/CN101243495A/en
Priority to EP06757765A priority patent/EP1913580A4/en
Priority to PCT/KR2006/002690 priority patent/WO2007008012A2/en
Priority to PCT/KR2006/002677 priority patent/WO2007007999A2/en
Priority to PCT/KR2006/002687 priority patent/WO2007008009A1/en
Priority to CNA2006800251376A priority patent/CN101218631A/en
Priority to EP06769224A priority patent/EP1913794A4/en
Priority to JP2008521316A priority patent/JP2009510810A/en
Priority to PCT/KR2006/002691 priority patent/WO2007008013A2/en
Priority to CNA2006800305412A priority patent/CN101243497A/en
Priority to PCT/KR2006/002683 priority patent/WO2007008005A1/en
Priority to JP2008521307A priority patent/JP2009500683A/en
Priority to JP2008521311A priority patent/JP2009500687A/en
Priority to JP2008521314A priority patent/JP2009500689A/en
Priority to JP2008521306A priority patent/JP2009500682A/en
Priority to JP2008521308A priority patent/JP2009500684A/en
Priority to EP06757764A priority patent/EP1913579A4/en
Priority to EP06769223A priority patent/EP1913587A4/en
Priority to PCT/KR2006/002688 priority patent/WO2007008010A1/en
Priority to CNA2006800294174A priority patent/CN101243489A/en
Priority to EP06769225A priority patent/EP1911021A4/en
Priority to EP06769220A priority patent/EP1913585A4/en
Priority to EP06757767A priority patent/EP1913582A4/en
Priority to PCT/KR2006/002678 priority patent/WO2007008000A2/en
Priority to PCT/KR2006/002680 priority patent/WO2007008002A2/en
Priority to PCT/KR2006/002689 priority patent/WO2007008011A2/en
Priority to PCT/KR2006/002681 priority patent/WO2007008003A2/en
Priority to PCT/KR2006/002685 priority patent/WO2007008007A1/en
Priority to CNA200680024866XA priority patent/CN101218852A/en
Priority to JP2008521319A priority patent/JP2009500693A/en
Priority to CN2006800251380A priority patent/CN101218628B/en
Priority to EP06757768A priority patent/EP1913583A4/en
Priority to CNA2006800304797A priority patent/CN101243493A/en
Priority to CNA2006800289829A priority patent/CN101238510A/en
Priority to EP06769219A priority patent/EP1913584A4/en
Priority to CNA2006800251395A priority patent/CN101218629A/en
Priority to PCT/KR2006/002686 priority patent/WO2007008008A2/en
Priority to CNA2006800304693A priority patent/CN101243492A/en
Priority to EP06769227A priority patent/EP1911020A4/en
Priority to JP2008521315A priority patent/JP2009500690A/en
Priority to EP06769222A priority patent/EP1908058A4/en
Priority to JP2008521310A priority patent/JP2009500686A/en
Priority to JP2008521313A priority patent/JP2009500688A/en
Priority to JP2008521318A priority patent/JP2009500692A/en
Priority to PCT/KR2006/002679 priority patent/WO2007008001A2/en
Priority to EP06769226A priority patent/EP1913588A4/en
Priority to EP06769218A priority patent/EP1913589A4/en
Priority to PCT/KR2006/002682 priority patent/WO2007008004A2/en
Priority to EP06757766A priority patent/EP1913581A4/en
Priority to CNA200680028892XA priority patent/CN101238509A/en
Priority to JP2008521305A priority patent/JP2009500681A/en
Priority to JP2008521309A priority patent/JP2009500685A/en
Priority to JP2008521317A priority patent/JP2009500691A/en
Priority to CN2006800252699A priority patent/CN101218630B/en
Priority to CN2006800294070A priority patent/CN101243496B/en
Priority to CNA2006800305111A priority patent/CN101243494A/en
Publication of WO2007011080A1 publication Critical patent/WO2007011080A1/en
Priority to US12/232,527 priority patent/US7962332B2/en
Priority to US12/232,526 priority patent/US8010372B2/en
Priority to US12/232,591 priority patent/US8255227B2/en
Priority to US12/232,593 priority patent/US8326132B2/en
Priority to US12/232,590 priority patent/US8055507B2/en
Priority to US12/232,595 priority patent/US8417100B2/en
Priority to US12/232,658 priority patent/US8510119B2/en
Priority to US12/232,662 priority patent/US8510120B2/en
Priority to US12/232,659 priority patent/US8554568B2/en
Priority to US12/232,747 priority patent/US8149878B2/en
Priority to US12/232,734 priority patent/US8155144B2/en
Priority to US12/232,748 priority patent/US8155153B2/en
Priority to US12/232,744 priority patent/US8032386B2/en
Priority to US12/232,743 priority patent/US7987008B2/en
Priority to US12/232,740 priority patent/US8149876B2/en
Priority to US12/232,741 priority patent/US8149877B2/en
Priority to US12/232,739 priority patent/US8155152B2/en
Priority to US12/232,783 priority patent/US8275476B2/en
Priority to US12/232,781 priority patent/US7930177B2/en
Priority to US12/232,784 priority patent/US7987009B2/en
Priority to US12/232,782 priority patent/US8046092B2/en
Priority to US12/314,891 priority patent/US8065158B2/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/022 - Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using predictive techniques

Definitions

  • the present invention relates to a method for processing an audio signal, and more particularly to a method and apparatus of encoding and decoding an audio signal.
  • Lossless reconstruction is becoming a more important feature than high efficiency in compression by means of perceptual coding as defined in MPEG standards such as MP3 or AAC.
  • DVD audio and Super Audio CD include proprietary lossless compression schemes.
  • a new lossless coding scheme has been considered as an extension to the MPEG-4 Audio standard. Lossless audio coding permits the compression of digital audio data without any loss in quality due to a perfect reconstruction of the original signal.
  • Audio Lossless Coding will define methods for lossless coding of audio signals with arbitrary sampling rates, resolutions of up to 32 bit, and up to 256 channels.
  • the lossless codec uses forward-adaptive Linear Predictive Coding (LPC) to reduce bit rates compared to PCM, leaving the optimization entirely to the encoder.
  • various encoder implementations are possible, offering a certain range in terms of efficiency and complexity. Although remarkable compression is achieved even for low predictor orders, still better compression becomes possible using high-order prediction. In this case, more efficient coding of the predictor coefficients is necessary in order to limit the amount of side information.
  • a method of processing an audio signal includes the steps of subdividing a channel of an audio data frame into a plurality of blocks having non-uniform lengths, and determining an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block.
  • the method may further include the steps of predicting data samples of each subdivided block using the optimum prediction order, and obtaining a residual of each subdivided block using the predicted data samples.
  • a method of encoding an audio signal includes the steps of subdividing a channel of an audio data frame into a plurality of blocks hierarchically at one or more block switching levels, and determining an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block. Each subdivided block results from a subdivision of a superordinate block of double length.
  • a method of decoding an audio signal includes the steps of receiving an audio data frame having at least one channel, where each channel is subdivided into a plurality of blocks hierarchically at one or more block switching levels, and each block results from a subdivision of a superordinate block of double length.
  • the method further comprises the steps of parsing an optimum prediction order from each subdivided block, and reconstructing data samples of each subdivided block using the optimum prediction order.
  • an apparatus of encoding an audio signal includes an encoder which subdivides a channel of an audio data frame into a plurality of blocks hierarchically at one or more block switching levels, where each block results from a subdivision of a superordinate block of double length.
  • the encoder determines an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block.
  • an apparatus of decoding an audio signal includes a decoder which receives an audio data frame having at least one channel, where each channel is subdivided into a plurality of blocks hierarchically at one or more block switching levels. The decoder then parses an optimum prediction order from each subdivided block and reconstructs data samples of each block using the parsed optimum prediction order.
  • Figure 1 is an example illustration of an audio signal encoder.
  • Figure 2 is an example illustration of an audio signal decoder.
  • Figure 3 shows measured distributions of parcor coefficients for 48 kHz, 16-bit audio material.
  • Figure 4 shows the compander functions C(r) and -C(-r).
  • Figure 5 is an example of a block switching hierarchy structure.
  • Figure 7 is an example of a bit stream using the old block switching scheme.
  • Figure 8 is an example of a bit stream using the new block switching (BS) scheme: No BS (top), synchronized BS between CPE channels 1 and 2 (middle), independent BS (bottom).
  • Figure 10 is a partition of the residual distribution.
  • Figure 1 shows the typical processing for one input channel of audio data.
  • a buffer stores one block of input samples, and an optimum set of parcor coefficients is calculated for each block.
  • the number of coefficients, i.e. the order of the predictor, can be adaptively chosen as well.
  • the quantized parcor values are entropy coded for transmission, and converted to LPC coefficients for the prediction filter which calculates the prediction residual.
  • the residual is entropy coded using different entropy codes.
  • the indices of the chosen codes have to be transmitted as side information.
  • Additional encoder options comprise block length switching, random access and joint channel coding.
  • the encoder may use these options to offer several compression levels with different complexities.
  • the basic version of the encoder uses a fixed block length.
  • the encoder can switch between different block lengths to adapt to stationary regions as well as to transient segments of the audio signal.
  • the codec allows random access in defined intervals down to some milliseconds, depending on the block length.
  • the entropy coding part of the prediction residual provides two alternative coding techniques with different complexities. Besides low complexity yet efficient Golomb-Rice coding, the BGMC arithmetic coding scheme offers even better compression at the expense of a slightly increased complexity.
  • the encoder will also offer efficient compression of floating-point audio data in the 32-bit IEEE format.
  • This codec extension employs an algorithm that basically splits the floating-point signal into a truncated integer signal and a difference signal which contains the remaining fractional part. The integer signal is then compressed using the normal encoding scheme for PCM signals, while the difference signal is coded separately. A detailed description of the floating-point extension can be found.
  • Figure 2 shows the lossless audio signal decoder, which is significantly less complex than the encoder, since no adaptation has to be carried out.
  • the decoder merely decodes the entropy coded residual and the parcor values, converts them into LPC coefficients, and applies the inverse prediction filter to calculate the lossless reconstruction signal.
  • Linear prediction is used in many applications for speech and audio signal processing. In the following, only FIR predictors are considered.
  • the current sample of a time-discrete signal x(n) can be approximately predicted from previous samples x(n - k) .
  • the prediction is
  • the procedure of estimating the predictor coefficients from a segment of input samples, prior to filtering that segment, is referred to as forward adaptation. In that case, the coefficients have to be transmitted. If the coefficients are estimated from previously processed segments or samples, e.g. from the residual, we speak of backward adaptation. This procedure has the advantage that no transmission of the coefficients is needed, since the data required to estimate the coefficients is available to the decoder as well.
  • the optimal predictor coefficients h k (in terms of a minimized variance of the residual) are usually estimated for
  • the bit rate Rc for the predictor coefficients will rise with the
  • the variance σe² of the corresponding residual can be
  • the total bit rate can be determined in each iteration, i.e. for each predictor order. The optimum order is found at the point where the total bit rate no longer decreases.
  • the first two parcor coefficients r1 and r2 are typically very close to -1 and +1, respectively.
  • the direct form predictor filter uses predictor coefficients h k
  • a lossless coding method specifies an integer-arithmetic function for conversion between quantized
  • Embodiments relate to encoders, decoders, methods of encoding, and methods of decoding.
  • an encoder is at least one of an audio encoder, and an Audio Lossless Coding encoder.
  • a method of encoding is implemented in at least one of an audio encoder, and an Audio Lossless Coding encoder.
  • a decoder is at least one of an audio decoder, and an Audio Lossless Coding decoder.
  • a method of decoding is implemented in at least one of an audio decoder, and an Audio Lossless Coding decoder.
  • Embodiments relate to a block switching mechanism which subdivides a frame of audio data into four quarter-length blocks, instead of encoding it as one single block. Switching between one long and four short blocks may be performed adaptively on a frame-by-frame basis.
  • a more flexible, hierarchical block switching scheme allows for up to six different block lengths (differing by factors of two) within a frame.
  • independent block switching for each channel may be implemented (e.g. each channel pair may be switched independently in the case of joint channel coding).
  • a maximum predictor order of 1023 may be implemented.
  • the same compression can be achieved with relatively low decoder complexity, which also allows higher compression at the same complexity.
  • Audio Lossless Coding includes a relatively simple block switching mechanism. Each frame of N samples is either encoded using one
  • this scheme may have some limitations. For example, only 1:4 switching may be possible, although different switching (e.g. 1:2, 1:8, and combinations thereof) may be more efficient in some cases. For example, switching is done identically for all channels, although different channels may require different switching (which is especially true if the channels are not correlated).
  • a relatively flexible block switching scheme may be implemented, where each frame can be hierarchically subdivided into many blocks.
  • Figure 5 illustrates a frame which can be hierarchically
  • N/2, N/4, N/8, N/16, and N/32 may be possible within a frame, as long as each block results from a subdivision of a superordinate block of double length, in accordance with embodiments.
  • a partition into N/4 + N/4 + N/2 may be possible, while a partition into N/4 + N/2 + N/4 may not be possible.
  • the actual partition may be signaled in an additional field block switching information(bs_info) (illustrated in the right column of Figure 6), where the length depends on the number of block switching levels.
  • Table 1 Block switching levels.
  • the bs_info field may include up to 4 bytes, in accordance with embodiments.
  • the mapping of bits with respect to the levels 1 to 5 may be [(0)1223333 44444444 55555555 55555555].
  • the first bit may be reserved for indicating independent block switching. In the example of Figure 6, there are
  • the bits of bs_info are set if a block is further subdivided. For the topmost example there is no subdivision at all, thus the code is (0)0000000.
  • the frame in the second row is subdivided ((0)1...), where only the second block of length N/2 is further split ((0)101...) into two blocks of length N/4. If an N/4 block is split as in the fourth row, it is indicated in the following bits ((0)111 0100).
  • bs_info fields may be transmitted for all channel pairs
  • bs_info field for each CPE and SCE in a frame (e.g. the two channels of a CPE are switched synchronously), in accordance with embodiments. If they are switched independently, the first bit of bs_info may be set to 1, and the information applies to the CPE's first channel. In this example, another bs_info field for the second channel becomes necessary.
  • the arrangement of blocks in the bit stream can be dynamically arranged.
  • all channels use the same partition (e.g. either one long or four short blocks) and corresponding short blocks of different channels are arranged successively (e.g. blocks 1.1, 2.1, and 3.1), leading to an interleaved structure.
  • short blocks are only interleaved if they belong to a channel pair that uses difference coding and therefore synchronized block switching (e.g. the middle row of Figure 8). This interleaving may be beneficial, since in a channel pair a block of one channel (e.g. block 1.2) may depend on previous blocks from both channels (e.g. blocks 1.1 and 2.1 ), so these previous blocks may need to be available prior to the current one.
  • channel data can be arranged separately (e.g. bottom row of Figure 8).
  • Embodiments relate to higher predictor orders. Absent hierarchical block switching, there may be a factor of 4 between the long and the short block length (e.g. 4096 & 1024 or 8192 & 2048), in accordance with embodiments. In embodiments (e.g. where hierarchical block switching is implemented), this factor can be increased (e.g. up to 32), enabling a larger range (e.g. 16384 down to 512 or even 32768 to 1024 for high sampling rates). In embodiments, in order to make better use of very long blocks, higher maximum predictor orders may be employed. The maximum order may be
  • Kmax may be bound by the block length NB,
  • the max_order field in the file header is 10 bits.
  • the opt_order field of the block data is 10 bits. The actual number of bits in a particular block may depend on the maximum order allowed for a block. If the block is short, this local maximum order may be smaller than the global maximum order (stated in max_order in the file
  • the opt_order is determined based on the following equation.
  • opt_order = min(global prediction order, local prediction order), where the global prediction order is determined from max_order, and the local prediction order is determined from the length of the block.
  • the distance between random access frames can be chosen from 255 to one frame. Depending on frame length and sampling rate, random access down to some milliseconds is possible.
  • the codec uses progressive prediction, which makes use of as many available samples as possible. While it is of course not feasible to predict the first sample of a random access frame, we can use first-order prediction for the second sample, second-order prediction for the third sample, and so forth, until the samples from position K + 1 on are predicted using the full K-th order predictor. Since the predictor
  • Joint channel coding can be used to exploit dependencies between the two channels of a stereo signal, or between any two channels of a multi ⁇
  • each block can be carried out by comparison of the individual signals, depending on which two signals can be coded most efficiently (see Figure 9).
  • Such prediction with switched difference coding is beneficial in cases where two channels are very similar.
  • the channels can be rearranged by the encoder in order to assign suitable channel pairs.
  • the lossless audio codec also supports a more complex scheme for exploiting inter-channel redundancy between arbitrary channels of multichannel signals.
  • the encoder can use a more complex and efficient coding scheme called BGMC (Block Gilbert-Moore Codes).
  • BGMC Block Gilbert-Moore Codes
  • the encoding of residuals is accomplished by splitting the distribution in two categories ( Figure 10): Residuals that belong to a central region of the
  • the BGMC encoder splits them into LSB and MSB components first, then it encodes MSBs using block Gilbert-Moore (arithmetic) codes, and finally it transmits LSBs using direct fixed-lengths codes. Both parameters emax and the number of directly transmitted LSBs are selected such that they only slightly affect the coding efficiency of this scheme, while making it significantly less complex.
  • the lossless audio codec is compared with two of the most popular programs for lossless audio compression:
  • the open-source codec FLAC which uses forward-adaptive prediction as well, and Monkey's Audio (MAC 3.97), a backward-adaptive codec as the current state-of-the-art algorithm in terms of compression.
  • Both codecs were run with options providing maximum compression (flac -8 and mac-c4000).
  • the results for the encoder were determined for a medium compression level (with the prediction order restricted to K ≤ 60) and a maximum compression level (K ≤ 1023), both with random access of 500 ms.
  • the tests were conducted on a 1.7 GHz Pentium-M system with 1024 MB of memory. The test material comprises nearly 1 GB of stereo waveform data with sampling rates of 48, 96, and 192 kHz, and resolutions of 16 and 24 bits.
  • the compression ratio is defined as
  • Table 3 Average CPU load (percentage on a 1.7 GHz Pentium-M), depending on audio format (kHz/bits) and ALS encoder complexity.
  • the codec is designed to offer a large range of complexity levels. While the maximum level achieves the highest compression at the expense of slowest encoding and decoding speed, the faster medium level only slightly degrades compression, but decoding is significantly less complex than for the maximum level (around 5% CPU load for 48 kHz material).
  • a low-complexity level (K ≤ 15) using Rice coding degrades compression by only 1-1.5% compared to the medium level, but the decoder complexity is further reduced by a factor of three (less than 2% CPU load for 48 kHz material).
  • audio data can be decoded even on hardware with very low computing power.
  • the present invention relates to the syntax comprised in the encoded bit stream.
  • the syntax is as follows:
  • the block_switching field is extended from 1 to 2 bits, and the max_order field is extended from 8 to 10 bits.
  • the frame_length and user_frame_length fields are merged, resulting in a frame_length field of 16 bits, while the user_frame_length field is removed.
  • Frame Data: if block switching is used, the bs_info field is added. Depending on the value of block_switching, it has 8, 16, or 32 bits. The first bit of a CPE's bs_info field holds the independent_bs flag. The number of blocks is implicitly derived from bs_info as well. If block_switching is off, there is no bs_info field; thus the number of blocks is one and independent_bs is zero.
  • the opt_order field is extended to a maximum of 10 bits (previously 8 bits).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method and apparatus of encoding and decoding an audio signal are disclosed. A channel of an audio data frame is subdivided into a plurality of blocks having non-uniform lengths, and an optimum prediction order for each subdivided block is determined based on a maximum prediction order and a length of each subdivided block. The blocks are subdivided hierarchically at one or more block switching levels, and each block results from a subdivision of a superordinate block of double length. Block switching information is generated in order to indicate how the blocks are subdivided at the block switching levels, respectively.

Description

[DESCRIPTION]
APPARATUS AND METHOD OF ENCODING AND DECODING AUDIO SIGNAL
Technical Field
The present invention relates to a method for processing an audio signal, and more particularly to a method and apparatus of encoding and decoding an audio signal.
Background Art
The storage and replaying of audio signals has been accomplished in different ways in the past. For example, music and speech have been recorded and preserved by phonographic technology (e.g. record players), magnetic technology (e.g. cassette tapes), and digital technology (e.g. compact discs). As audio storage technology progresses, many challenges need to be overcome to optimize the quality and storability of audio signals.
For the archiving and broadband transmission of music signals, lossless reconstruction is becoming a more important feature than high efficiency in compression by means of perceptual coding as defined in MPEG standards such as MP3 or AAC. Although DVD audio and Super Audio CD include proprietary lossless compression schemes, there is a demand for an open and general compression scheme among content-holders and broadcasters. In response to this demand, a new lossless coding scheme has been considered as an extension to the MPEG-4 Audio standard. Lossless audio coding permits the compression of digital audio data without any loss in quality due to a perfect reconstruction of the original signal.
Disclosure of Invention
The present invention relates to a method for processing forward- adaptive linear prediction, which offers remarkable compression even with low predictor orders. Nevertheless, performance can be significantly improved by using higher predictor orders, more efficient quantization and encoding of the predictor coefficients, and adaptive block length switching.
It is an object of the invention to provide a lossless audio coding scheme that permits the compression of digital audio data without any loss in quality due to a perfect reconstruction of the original signal.
Another object of the invention is to provide lossless coding techniques for high-definition audio signals. Audio Lossless Coding will define methods for lossless coding of audio signals with arbitrary sampling rates, resolutions of up to 32 bit, and up to 256 channels. The lossless codec uses forward-adaptive Linear Predictive Coding (LPC) to reduce bit rates compared to PCM, leaving the optimization entirely to the encoder. Thus, various encoder implementations are possible, offering a certain range in terms of efficiency and complexity. Although remarkable compression is achieved even for low predictor orders, still better compression becomes possible using high-order prediction. In this case, more efficient coding of the predictor coefficients is necessary in order to limit the amount of side information. This is achieved by applying a non-linear compander to the most important coefficients, followed by linear quantization and entropy coding of the quantized values. In addition, adaptive block length switching is used to account for changing signal statistics. As a result, compression ratios are comparable to the best high-order backward-adaptive prediction schemes, but with a significantly less complex decoder, and maintaining full random access to arbitrary parts of the encoded signal. The present invention relates to an encoder and/or decoder (including methods of encoding and decoding data). Data may be encoded or decoded in a lossless manner. Embodiments relate to a flexible, hierarchical block switching scheme, allowing for up to six different block lengths within a frame. Embodiments relate to independent block switching for each channel. Embodiments relate to a maximum predictor order of 1023.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
To achieve these objects and other advantages and in accordance with the purpose of the invention, as embodied and broadly described herein, a method of processing an audio signal includes the steps of subdividing a channel of an audio data frame into a plurality of blocks having non-uniform lengths, and determining an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block. The method may further include the steps of predicting data samples of each subdivided block using the optimum prediction order, and obtaining a residual of each subdivided block using the predicted data samples.
In another aspect of the present invention, a method of encoding an audio signal includes the steps of subdividing a channel of an audio data frame into a plurality of blocks hierarchically at one or more block switching levels, and determining an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block. Each subdivided block results from a subdivision of a superordinate block of double length.
In another aspect of the present invention, a method of decoding an audio signal includes the steps of receiving an audio data frame having at least one channel, where each channel is subdivided into a plurality of blocks hierarchically at one or more block switching levels, and each block results from a subdivision of a superordinate block of double length. The method further comprises the steps of parsing an optimum prediction order from each subdivided block, and reconstructing data samples of each subdivided block using the optimum prediction order.
In another aspect of the present invention, an apparatus of encoding an audio signal includes an encoder which subdivides a channel of an audio data frame into a plurality of blocks hierarchically at one or more block switching levels, where each block results from a subdivision of a superordinate block of double length. The encoder then determines an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block.
In another aspect of the present invention, an apparatus of decoding an audio signal includes a decoder which receives an audio data frame having at least one channel, where each channel is subdivided into a plurality of blocks hierarchically at one or more block switching levels. The decoder then parses an optimum prediction order from each subdivided block and reconstructs data samples of each block using the parsed optimum prediction order.
It is to be understood that both the foregoing general description and the following detailed description of the present invention are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
Brief Description of Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principle of the invention. In the drawings:
Figure 1 is an example illustration of an audio signal encoder.
Figure 2 is an example illustration of an audio signal decoder.
Figure 3 shows measured distributions of parcor coefficients for 48 kHz, 16-bit audio material. Figure 4 shows the compander functions C(r) and -C(-r).
Figure 5 is an example of a block switching hierarchy structure.
Figure 6 shows block switching examples and the corresponding block switching information codes.
Figure 7 is an example of a bit stream using the old block switching scheme. Figure 8 is an example of a bit stream using the new block switching (BS) scheme: No BS (top), synchronized BS between CPE channels 1 and 2 (middle), independent BS (bottom).
Figure 9 shows a switched difference coding scheme.
Figure 10 shows a partition of the residual distribution.
Best Mode for Carrying out the Invention
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
Prior to describing the present invention, it should be noted that most terms disclosed in the present invention correspond to general terms well known in the art, but some terms have been selected by the applicant as necessary and will hereinafter be disclosed in the following description of the present invention. Therefore, it is preferable that the terms defined by the applicant be understood on the basis of their meanings in the present invention.
In a lossless audio coding method, since the encoding process has to be perfectly reversible without loss of information, several parts of both encoder and decoder have to be implemented in a deterministic way.
[Structure of the codec]
Figure 1 shows the typical processing for one input channel of audio data. A buffer stores one block of input samples, and an optimum set of parcor coefficients is calculated for each block. The number of coefficients, i.e. the order of the predictor, can be adaptively chosen as well. The quantized parcor values are entropy coded for transmission, and converted to LPC coefficients for the prediction filter which calculates the prediction residual. The residual is entropy coded using different entropy codes. The indices of the chosen codes have to be transmitted as side information.
Finally, a multiplexing unit combines coded residual, code indices, predictor coefficients and other additional information to form the compressed bitstream. The encoder also provides a CRC checksum, which is supplied mainly for the decoder to verify the decoded data. On the encoder side, the CRC can be used to ensure that the compressed data is losslessly decodable.
Additional encoder options comprise block length switching, random access and joint channel coding. The encoder may use these options to offer several compression levels with different complexities. The basic version of the encoder uses a fixed block length. Optionally, the encoder can switch between different block lengths to adapt to stationary regions as well as to transient segments of the audio signal. The codec allows random access in defined intervals down to some milliseconds, depending on the block length.
Furthermore, joint channel coding is used to exploit dependencies between channels of stereo or multi-channel signals. This can be achieved by coding the difference between two channels in those segments where this difference can be coded more efficiently than one of the original channels.
The entropy coding part of the prediction residual provides two alternative coding techniques with different complexities. Besides low complexity yet efficient Golomb-Rice coding, the BGMC arithmetic coding scheme offers even better compression at the expense of a slightly increased complexity.
Furthermore, the encoder will also offer efficient compression of floating-point audio data in the 32-bit IEEE format. This codec extension employs an algorithm that basically splits the floating-point signal into a truncated integer signal and a difference signal which contains the remaining fractional part. The integer signal is then compressed using the normal encoding scheme for PCM signals, while the difference signal is coded separately. A detailed description of the floating-point extension can be found.
Figure 2 shows the lossless audio signal decoder, which is significantly less complex than the encoder, since no adaptation has to be carried out. The decoder merely decodes the entropy coded residual and the parcor values, converts them into LPC coefficients, and applies the inverse prediction filter to calculate the lossless reconstruction signal.
The computational effort of the decoder mainly depends on the predictor orders chosen by the encoder. Since the average order is typically well below the maximum order, prediction with greater maximum orders does not necessarily lead to a significant increase of decoder complexity. In most cases, realtime decoding is possible even on low-end systems.
[Linear Prediction]
Linear prediction is used in many applications for speech and audio signal processing. In the following, only FIR predictors are considered.
Prediction with FIR Filters
The current sample of a time-discrete signal x(n) can be approximately predicted from previous samples x(n - k). The prediction is given by
x̂(n) = Σ (k = 1 to K) hk · x(n - k),   (1)
where K is the order of the predictor. If the predicted samples are close to the original samples, the residual
e(n) = x(n) - x̂(n)   (2)
has a smaller variance than x(n) itself, hence e(n) can be encoded more efficiently.
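For illustration only, the following Python sketch implements equations (1) and (2) directly; the signal and coefficient values are arbitrary examples and do not come from the described encoder.

    def predict_fir(x, h):
        """Predict each sample of x from the K previous samples, Eq. (1).
        Missing history at the start of the block is treated as zero here."""
        K = len(h)
        x_hat = []
        for n in range(len(x)):
            x_hat.append(sum(h[k - 1] * x[n - k] for k in range(1, K + 1) if n - k >= 0))
        return x_hat

    def residual(x, h):
        """Residual e(n) = x(n) - x_hat(n), Eq. (2)."""
        return [xn - xp for xn, xp in zip(x, predict_fir(x, h))]

    # Arbitrary second-order example: a smooth signal leaves a small residual
    print(residual([0, 3, 6, 8, 9, 9, 8, 6], h=[1.6, -0.7]))

In the codec itself the first samples of a block are predicted from preceding samples (or with progressive order at random access frames, as described below) rather than from zeros.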
The procedure of estimating the predictor coefficients from a segment of input samples, prior to filtering that segment, is referred to as forward adaptation. In that case, the coefficients have to be transmitted. If the coefficients are estimated from previously processed segments or samples, e.g. from the residual, we speak of backward adaptation. This procedure has the advantage that no transmission of the coefficients is needed, since the data required to estimate the coefficients is available to the decoder as well. Forward-adaptive prediction with orders around 10 is widely used in speech coding, and can be employed for lossless audio coding as well. The maximum order of most forward-adaptive lossless prediction schemes is still rather small, e.g. K = 32. An exception is the special 1-bit lossless codec for the Super Audio CD, which uses predictor orders of up to 128. On the other hand, backward-adaptive FIR filters with some hundred coefficients are commonly used in many areas, e.g. channel equalization and echo cancellation. Most systems are based on the LMS algorithm or a variation thereof, which has also been proposed for lossless audio coding. Such LMS-based coding schemes with high orders are applicable since the predictor coefficients do not have to be transmitted as side information, thus their number does not contribute to the data rate. However, backward- adaptive codecs have the drawback that the adaptation has to be carried out both in the encoder and the decoder, making the decoder significantly more complex than in the forward-adaptive case.
Forward-Adaptive Prediction
In forward-adaptive linear prediction, the optimal predictor coefficients hk (in terms of a minimized variance of the residual) are usually estimated for
each block by the autocorrelation method or the covariance method.
The autocorrelation method, using the Levinson-Durbin algorithm, has the additional advantage of providing a simple means to iteratively adapt the order of the predictor. Furthermore, the algorithm inherently calculates the corresponding parcor coefficients as well. Another crucial point in forward-adaptive prediction is to determine a suitable predictor order. Increasing the order decreases the variance of the
prediction error, which leads to a smaller bit rate Re for the residual. On the other hand, the bit rate Rc for the predictor coefficients will rise with the number of coefficients to be transmitted. Thus, the task is to find the optimum order which minimizes the total bit rate. This can be expressed by minimizing
Rtotal(K) = Re(K) + Rc(K)   (3)
with respect to the prediction order K. As the prediction gain rises monotonically with higher orders, Re decreases with K. On the other hand, Rc rises monotonically with K, since an increasing number of coefficients have to be transmitted.
The search for the optimum order can be carried out efficiently by the
Levinson-Durbin algorithm, which determines recursively all predictors with increasing order. For each order, a complete set of predictor coefficients is
calculated. Moreover, the variance σe² of the corresponding residual can be
derived, resulting in an estimate of the expected bit rate for the residual. Together with the bit rate for the coefficients, the total bit rate can be determined in each iteration, i.e. for each predictor order. The optimum order is found at the point where the total bit rate no longer decreases.
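This order search can be sketched as follows. The Levinson-Durbin recursion is standard, but the two bit-rate estimates are simple stand-ins (roughly 4 bits per coefficient, as mentioned below for the parcor values, and a variance-based guess for the residual), not the estimates used by any particular encoder.

    import math

    def levinson_durbin(r, K_max):
        """Levinson-Durbin recursion on autocorrelation values r[0..K_max].
        Returns the residual variance and the parcor coefficient for each order."""
        err = float(r[0])
        a, parcors, variances = [], [], []
        for k in range(1, K_max + 1):
            acc = r[k] - sum(a[j] * r[k - 1 - j] for j in range(len(a)))
            ref = acc / err if err > 0 else 0.0
            a = [a[j] - ref * a[k - 2 - j] for j in range(len(a))] + [ref]
            err *= (1.0 - ref * ref)
            parcors.append(ref)
            variances.append(err)
        return variances, parcors

    def optimum_order(r, K_max, N_B, coeff_bits=4.0):
        """Pick the order where Rtotal = Re + Rc stops decreasing, Eq. (3)."""
        variances, _ = levinson_durbin(r, K_max)
        best_k, best_total = 0, float("inf")
        for k in range(1, K_max + 1):
            # rough Gaussian-entropy proxy for the residual bits of an N_B-sample block
            R_e = 0.5 * N_B * math.log2(2 * math.pi * math.e * max(variances[k - 1], 1e-12))
            R_c = coeff_bits * k
            if R_e + R_c >= best_total:
                break                      # total bit rate no longer decreases
            best_total, best_k = R_e + R_c, k
        return best_k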
While it is obvious from equation (3) that the coefficient bit rate has a direct effect on the total bit rate, a slower increase of Rc also allows shifting the minimum of Rtotal to higher orders (where Re is smaller as well), which would lead to better compression. Hence, efficient yet accurate quantization of the predictor coefficients plays an important role in achieving maximum compression.
Quantization of Predictor Coefficients
Direct quantization of the predictor coefficients hk is not very efficient for transmission, since even small quantization errors may result in large deviations from the desired spectral characteristics of the optimum prediction filter. For this reason, the quantization of predictor coefficients is based on the
parcor (reflection) coefficients rk , which can be calculated by means of the
Levinson-Durbin algorithm. In that case, the resulting values are restricted to the interval [-1 , 1]. Although parcor coefficients are less sensitive to quantization, they are still too sensitive when their magnitude is close to unity.
The first two parcor coefficients r1 and r2 are typically very close to -1 and
+1 , respectively, while the remaining coefficients rk , k > 2, usually have
smaller magnitudes. The distributions of the first coefficients are very different, but high-order coefficients tend to converge to a zero-mean gaussian-like distribution (Figure 3).
Therefore, only the first two coefficients are companded based on the following function:
[Equation (4): definition of the compander function C(r)]
This compander results in a significantly finer resolution at r1 → -1, whereas -C(-r2) can be used to provide a finer resolution at r2 → +1 (see
Figure 4).
However, in order to simplify computation, +C(-r2 ) is actually used for
the second coefficient, leading to an opposite sign of the companded value.
The two companded coefficients are then quantized using a simple 7-bit uniform quantizer. This results in the following values:
a1 = ⌊64 C(r1)⌋   (5)
a2 = ⌊64 C(-r2)⌋   (6)
The remaining coefficients rk, k > 2 are not companded but simply
quantized using a 7-bit uniform quantizer again:
ak = ⌊64 rk⌋   (7)
In all cases the resulting quantized values ak are restricted to the range [-64, +63]. These quantized coefficients are re-centered around their most probable values, and then encoded using Golomb-Rice codes. As a result, the average bit rate of the encoded parcor coefficients can be reduced to approximately 4 bits/coefficient, without noticeable degradation of the spectral characteristics. Thus, it is possible to employ very high orders up to K = 1023, preferably in conjunction with large block lengths.
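A sketch of this quantization in Python is given below. The closed form used for the compander, C(r) = -1 + sqrt(2(r + 1)), is an assumption chosen to match the stated behaviour (finer resolution towards r = -1); the 7-bit quantization and the range restriction follow the text directly.

    import math

    def compand(r):
        # Assumed compander form: C(r) = -1 + sqrt(2 * (r + 1)), fine resolution near r = -1
        return -1.0 + math.sqrt(2.0 * (r + 1.0))

    def quantize_parcor(parcor):
        """7-bit uniform quantization of parcor coefficients r1, r2, ..., rK."""
        quantized = []
        for k, r in enumerate(parcor, start=1):
            if k == 1:
                q = math.floor(64 * compand(r))     # C(r1): fine near r1 -> -1
            elif k == 2:
                q = math.floor(64 * compand(-r))    # +C(-r2): fine near r2 -> +1
            else:
                q = math.floor(64 * r)              # Eq. (7)
            quantized.append(max(-64, min(63, q)))  # restrict to [-64, +63]
        return quantized

    # Arbitrary example coefficients
    print(quantize_parcor([-0.97, 0.95, 0.3, -0.1]))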
However, the direct form predictor filter uses predictor coefficients hk
according to Eq. (1 ). In order to employ identical coefficients in the encoder
and the decoder, these hk values have to be derived from the quantized ak
values in both cases (see Figures 1 and 2). While it is up to the encoder how to determine a set of suitable parcor coefficients, a lossless coding method specifies an integer-arithmetic function for conversion between quantized
values ak and direct predictor coefficients hk which ensures their identical reconstruction in both encoder and decoder.
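The corresponding decoder-side steps can be sketched as follows. The integer-arithmetic conversion function itself is not reproduced here; the floating-point step-up recursion below only illustrates the parcor-to-LPC relationship, and the de-companding assumes the compander form used in the previous sketch.

    def dequantize_parcor(quantized):
        """Reconstruct approximate parcor values from the 7-bit values ak."""
        parcor = []
        for k, a in enumerate(quantized, start=1):
            c = a / 64.0
            if k == 1:
                parcor.append((c + 1.0) ** 2 / 2.0 - 1.0)      # inverse of C(r)
            elif k == 2:
                parcor.append(-((c + 1.0) ** 2 / 2.0 - 1.0))   # inverse of C(-r)
            else:
                parcor.append(c)
        return parcor

    def parcor_to_lpc(parcor):
        """Step-up recursion from reflection (parcor) coefficients to the
        direct-form coefficients hk of Eq. (1); floating point, illustrative only."""
        h = []
        for m, r in enumerate(parcor, start=1):
            h = [h[j] - r * h[m - 2 - j] for j in range(m - 1)] + [r]
        return h

    # e.g. quantized values such as those produced by the previous sketch
    print(parcor_to_lpc(dequantize_parcor([-49, -44, 19])))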
Block Length Switching
Embodiments relate to encoders, decoders, methods of encoding, and methods of decoding. In embodiments, an encoder is at least one of an audio encoder, and an Audio Lossless Coding encoder. In embodiments, a method of encoding is implemented in at least one of an audio encoder, and an Audio Lossless Coding encoder. In embodiments, a decoder is at least one of an audio decoder, and an Audio Lossless Coding decoder. In embodiments, a method of decoding is implemented in at least one of an audio decoder, and an Audio Lossless Coding decoder.
<Hierarchical Block Switching>
Embodiments relate to a block switching mechanism which subdivides a frame of audio data into four quarter-length blocks, instead of encoding it as one single block. Switching between one long and four short blocks may be performed adaptively on a frame-by-frame basis.
Even though this switching mechanism may enable a higher compression ratio than using a constant block length, there may be some drawbacks. For example, if only 1:4 switching is possible, 1:2 or 1:8 switching
(and combinations thereof) may be more efficient in some cases, in accordance with embodiments. For example, if switching is done identically for all channels, there may be challenges if different channels require different switching, in accordance with embodiments. For example, since a more flexible block switching scheme enables the use of a wide range of block lengths (including very long ones), even higher maximum predictor orders may be feasible, in accordance with embodiments.
In embodiments, a more flexible, hierarchical block switching scheme allows for up to six different block lengths (differing by factors of two) within a frame. In embodiments, independent block switching for each channel may be implemented (e.g. each channel pair may be switched independently in the case of joint channel coding). In embodiments, a maximum predictor order of 1023 may be implemented.
In embodiments, the same compression can be achieved with relatively low decoder complexity, which also allows higher compression at the same complexity.
Audio Lossless Coding (ALS) includes a relatively simple block switching mechanism. Each frame of N samples is either encoded using one
full-length block (NB = N) or four blocks of length NB = N/4, where the same
block partition applies to all channels. Under some circumstances, this scheme may have some limitations. For example, only 1:4 switching may be possible, although different switching (e.g. 1:2, 1:8, and combinations thereof) may be more efficient in some cases. For example, switching is done identically for all channels, although different channels may require different switching (which is especially true if the channels are not correlated).
In embodiments, a relatively flexible block switching scheme may be implemented, where each frame can be hierarchically subdivided into many blocks. For example, Figure 5 illustrates a frame which can be hierarchically
subdivided into up to 32 blocks. Arbitrary combinations of blocks with NB = N,
N/2, N/4, N/8, N/16, and N/32 may be possible within a frame, as long as each block results from a subdivision of a superordinate block of double length, in accordance with embodiments. For example, as illustrated in Figure 5, a partition into N/4 + N/4 + N/2 may be possible, while a partition into N/4 + N/2 + N/4 may not be possible.
In embodiments, the actual partition may be signaled in an additional field block switching information(bs_info) (illustrated in the right column of Figure 6), where the length depends on the number of block switching levels. Table 1 illustrates an example relationship of the maximum number of levels,
the minimum NB , and the number of bytes used for bs_info.
Table 1: Block switching levels.
Maximum number of levels   Minimum block length   Size of bs_info
3                          NB = N/8               1 byte
4                          NB = N/16              2 bytes
5                          NB = N/32              4 bytes
The bs_info field may include up to 4 bytes, in accordance with embodiments. The mapping of bits with respect to the levels 1 to 5 may be [(0)1223333 44444444 55555555 55555555]. The first bit may be reserved for indicating independent block switching. In the example of Figure 6, there are
three levels, thus the minimum block length is NB = N/8, and bs_info
consists of one byte. Starting at the maximum block length NB = N, the bits
of bs_info are set if a block is further subdivided. For the topmost example there is no subdivision at all, thus the code is (0)0000000. The frame in the second row is subdivided ((0)1...), where only the second block of length N/2 is further split ((0)101...) into two blocks of length N/4. If an N/4 block is split as in the fourth row, it is indicated in the following bits ((0)111 0100). In each frame, bs_info fields may be transmitted for all channel pairs
(CPEs) and all single channels (SCEs), enabling independent block switching for different channels, in accordance with embodiments.
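A decoder-side sketch of this signalling is shown below for the one-byte case (up to three block switching levels). The placement of the bit for the block at depth d and position i at bit index 2^d + i is an assumption, chosen so that the example codes quoted above decode to the partitions described in the text.

    def decode_bs_info(bs_info, N, max_depth=3):
        """Expand a one-byte bs_info value into (independent_bs, block lengths)."""
        bits = [(bs_info >> (7 - p)) & 1 for p in range(8)]   # MSB first
        independent_bs = bits[0]                              # reserved first bit

        def walk(depth, index):
            # a block is split into two halves if its bit is set
            if depth < max_depth and bits[2 ** depth + index]:
                return walk(depth + 1, 2 * index) + walk(depth + 1, 2 * index + 1)
            return [N >> depth]

        return independent_bs, walk(0, 0)

    # The example codes from the text, for N = 4096:
    print(decode_bs_info(0b00000000, 4096))   # (0)0000000 -> one block of length N
    print(decode_bs_info(0b01010000, 4096))   # (0)101...  -> N/2 + N/4 + N/4
    print(decode_bs_info(0b01110100, 4096))   # (0)1110100 -> N/4 + N/8 + N/8 + N/4 + N/4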
<Independent Block Switching>
In Independent Block Switching, while the frame length is identical for all channels, block switching can be done individually for each channel, in accordance with embodiments. If difference coding is used, both channels of a channel pair should be switched synchronously, but other channel pairs can still use different block switching. If the two channels of a channel pair are not correlated with each other, difference coding may not pay off, and thus there will be no need to switch both channels synchronously. Accordingly, if the two channels of a channel pair are not correlated with each other, switching the channels synchronously may not be practical.
There may be a bs_info field for each CPE and SCE in a frame (e.g. the two channels of a CPE are switched synchronously), in accordance with embodiments. If they are switched independently, the first bit of bs_info may be set to 1, and the information applies to the CPE's first channel. In this example, another bs_info field for the second channel becomes necessary.
In embodiments, as a result of the increased flexibility, the blocks in the bit stream can be dynamically arranged. As illustrated in Figure 7, all channels use the same partition (e.g. either one long or four short blocks) and corresponding short blocks of different channels are arranged successively (e.g. blocks 1.1, 2.1, and 3.1), leading to an interleaved structure. In the embodiments illustrated in Figure 8, short blocks are only interleaved if they belong to a channel pair that uses difference coding and therefore synchronized block switching (e.g. the middle row of Figure 8). This interleaving may be beneficial, since in a channel pair a block of one channel (e.g. block 1.2) may depend on previous blocks from both channels (e.g. blocks 1.1 and 2.1), so these previous blocks may need to be available prior to the current one. For channels whose blocks are switched independently, channel data can be arranged separately (e.g. bottom row of Figure 8).
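The two arrangements can be sketched as follows; the block labels mirror the Figure 8 numbering, and the function is only a structural illustration of the ordering, not of the actual bit stream syntax.

    def arrange_channel_pair(blocks_ch1, blocks_ch2, synchronized):
        """Order the blocks of a channel pair for the bit stream: interleave
        corresponding blocks when the pair is switched synchronously (and may
        use difference coding), otherwise keep each channel's data together."""
        if synchronized:
            ordered = []
            for b1, b2 in zip(blocks_ch1, blocks_ch2):
                ordered += [b1, b2]
            return ordered
        return list(blocks_ch1) + list(blocks_ch2)

    print(arrange_channel_pair(["1.1", "1.2"], ["2.1", "2.2"], synchronized=True))
    # -> ['1.1', '2.1', '1.2', '2.2']: block 1.2 follows the blocks it may depend on
    print(arrange_channel_pair(["1.1", "1.2", "1.3", "1.4"], ["2.1"], synchronized=False))
    # -> each channel's blocks stay together when switching is independent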
<Higher Predictor Orders>
Embodiments relate to higher predictor orders. Absent hierarchical block switching, there may be a factor of 4 between the long and the short block length (e.g. 4096 & 1024 or 8192 & 2048), in accordance with embodiments. In embodiments (e.g. where hierarchical block switching is implemented), this factor can be increased (e.g. up to 32), enabling a larger range (e.g. 16384 down to 512 or even 32768 to 1024 for high sampling rates). In embodiments, in order to make better use of very long blocks, higher maximum predictor orders may be employed. The maximum order may be
Kmax = 1023. In embodiments, Kmax may be bound by the block length NB, where Kmax < NB / 8 (e.g. Kmax = 255 for NB = 2048). Therefore, using Kmax = 1023 may require a block length of at least NB = 8192. In embodiments, the max_order field in the file header is 10 bits. In embodiments, the opt_order field of the block data is 10 bits. The actual number of bits in a particular block may depend on the maximum order allowed for a block. If the block is short, this local maximum order may be smaller than the global maximum order (stated in max_order in the file header). For example, if Kmax = 1023, but NB = 2048, the opt_order field is 8 bits (instead of 10) due to a maximum local order of 255.
The opt_order is determined based on the following equation: opt_order = min(global prediction order, local prediction order), where the global prediction order is determined from max_order and the local prediction order is determined from the length of the block. In detail, the global and local prediction orders are determined by global prediction order = ceil(log2(maximum prediction order + 1)) and local prediction order = max(ceil(log2((NB >> 3) - 1)), 1). In embodiments, it is necessary to predict data samples of the subdivided block from the channel. A first sample of a current block is predicted using the last K samples of a previous block. The K value is determined from the opt_order, which is derived from the above equation.
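Read as field widths, these expressions can be sketched as follows; interpreting the two quantities as numbers of bits (rather than as orders) is an assumption that reproduces the 10-bit and 8-bit examples given above.

    import math

    def opt_order_bits(max_order, N_B):
        """Bits spent on the opt_order field of a block:
        min(ceil(log2(max_order + 1)), max(ceil(log2((N_B >> 3) - 1)), 1))."""
        global_bits = math.ceil(math.log2(max_order + 1))
        local_bits = max(math.ceil(math.log2((N_B >> 3) - 1)), 1)
        return min(global_bits, local_bits)

    print(opt_order_bits(1023, 8192))   # 10 bits: the full order range is available
    print(opt_order_bits(1023, 2048))   # 8 bits: the local maximum order is 255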
If the current block is a channel's first block, no samples from the previous block may be used. In this case, prediction with progressive order is employed, where the scaled parcor coefficients are converted progressively to
LPC coefficients inside the prediction filter.
Random Access
Random access stands for fast access to any part of the encoded audio signal without costly decoding of previous parts. It is an important feature for applications that employ seeking, editing, or streaming of the compressed data. In order to enable random access, the encoder has to insert frames that can be decoded without decoding previous frames. In those random access frames, no samples from previous frames may be used for prediction.
The distance between random access frames can be chosen between one and 255 frames. Depending on frame length and sampling rate, random access down to some milliseconds is possible.
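As a simple illustration of that trade-off (all values below are chosen only as examples), the access granularity follows directly from the frame length, the sampling rate, and the chosen random access distance:

    #include <stdio.h>

    /* Illustration only: granularity (ms) of random access given a frame
     * length in samples, a sampling rate in Hz, and the distance between
     * random access frames (1..255 frames). */
    int main(void)
    {
        int frame_length = 2048;     /* samples per frame (example value)   */
        int sample_rate  = 48000;    /* Hz                                  */
        int ra_distance  = 1;        /* random access frame every frame     */

        double ms = 1000.0 * frame_length * ra_distance / sample_rate;
        printf("random access granularity: %.1f ms\n", ms);   /* ~42.7 ms   */
        return 0;
    }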
However, prediction at the beginning of random access frames still constitutes a problem. A conventional K-th order predictor would normally need K samples from the previous frame in order to predict the current frame's first sample. Since samples from previous frames may not be used, the encoder either has to assume zeros or has to transmit the first K original samples directly, starting the prediction at position K + 1.
As a result, compression at the beginning of random access frames would be poor. In order to minimize this problem, the codec uses progressive prediction, which makes use of as many available samples as possible. While it is of course not feasible to predict the first sample of a random access frame, we can use first-order prediction for the second sample, second-order prediction for the third sample, and so forth, until the samples from position K + 1 on are predicted using the full K-th order predictor. Since the predictor
coefficients h_k are calculated recursively from the quantized parcor coefficients a_k anyway, it is possible to calculate each coefficient set from orders 1 to K without additional costs.
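A sketch of this progressive prediction is shown below; the helper predict_sample and the layout of the per-order coefficient sets are assumptions made for the example, and the quantization of residuals is omitted.

    #include <stdio.h>

    /* Progressive prediction at the start of a random access frame:
     * sample 0 cannot be predicted, sample 1 uses a 1st-order predictor,
     * sample 2 a 2nd-order predictor, and so on, until the full order K is
     * reached from position K onwards.  lpc[m] is assumed to hold the
     * coefficient set of order m+1 (hypothetical layout). */
    static double predict_sample(const double *x, int n, const double *a, int order)
    {
        double p = 0.0;
        for (int i = 0; i < order; i++)
            p += a[i] * x[n - 1 - i];           /* uses the last 'order' samples */
        return p;
    }

    static void residual_with_progressive_order(const double *x, double *e,
                                                int nb, double lpc[][1024], int K)
    {
        e[0] = x[0];                            /* first sample sent as is       */
        for (int n = 1; n < nb; n++) {
            int order = n < K ? n : K;          /* progressive order up to K     */
            e[n] = x[n] - predict_sample(x, n, lpc[order - 1], order);
        }
    }

    int main(void)
    {
        double x[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
        double e[8];
        static double lpc[2][1024];
        lpc[0][0] = 1.0;                        /* order-1 coefficient set       */
        lpc[1][0] = 2.0; lpc[1][1] = -1.0;      /* order-2 coefficient set       */
        residual_with_progressive_order(x, e, 8, lpc, 2);
        for (int n = 0; n < 8; n++)
            printf("e[%d] = %g\n", n, e[n]);    /* residual vanishes from n = 2  */
        return 0;
    }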
In the case of 500 ms random access intervals, this scheme produces an absolute overhead of only 0.01-0.02% compared to continuous prediction without random access.
Joint Channel Coding
Joint channel coding can be used to exploit dependencies between the two channels of a stereo signal, or between any two channels of a multi-channel signal. While it is straightforward to process two channels x1(n) and x2(n) independently, a simple way to exploit dependencies between these channels is to encode the difference signal

d(n) = x2(n) - x1(n)    (8)

instead of x1(n) or x2(n). Switching between x1(n), x2(n) and d(n) in each block can be carried out by comparison of the individual signals, depending on which two signals can be coded most efficiently (see Figure 9). Such prediction with switched difference coding is beneficial in cases where two channels are very similar. In the case of multi-channel material, the channels can be rearranged by the encoder in order to assign suitable channel pairs.
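The per-block switching decision can be sketched as follows; the cost measure (a sum of magnitudes standing in for the true coded size) and the mode names are illustrative assumptions rather than the encoder's actual decision rule.

    #include <stdio.h>
    #include <stdlib.h>

    enum pair_mode { CODE_X1_X2, CODE_X1_D, CODE_D_X2 };

    /* Very rough per-block decision between coding (x1, x2), (x1, d) or (d, x2),
     * where d(n) = x2(n) - x1(n).  A real encoder would compare estimated coded
     * sizes; here the sum of magnitudes is used as a crude proxy. */
    static long long cost(const int *s, int nb)
    {
        long long c = 0;
        for (int n = 0; n < nb; n++)
            c += llabs((long long)s[n]);
        return c;
    }

    static enum pair_mode choose_pair_mode(const int *x1, const int *x2,
                                           int *d, int nb)
    {
        for (int n = 0; n < nb; n++)
            d[n] = x2[n] - x1[n];                /* difference signal d(n)        */

        long long c1 = cost(x1, nb), c2 = cost(x2, nb), cd = cost(d, nb);

        if (cd >= c1 && cd >= c2)
            return CODE_X1_X2;                   /* channels not similar enough   */
        return c1 <= c2 ? CODE_X1_D : CODE_D_X2; /* replace the costlier channel  */
    }

    int main(void)
    {
        int x1[4] = { 100, 101, 102, 103 };
        int x2[4] = { 101, 102, 103, 104 };
        int d[4];
        printf("mode = %d\n", (int)choose_pair_mode(x1, x2, d, 4));
        return 0;
    }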
Besides simple difference coding, the lossless audio codec also supports a more complex scheme for exploiting inter-channel redundancy between arbitrary channels of multi-channel signals.
Entropy Coding of the Residual
In simple mode, the residual values e(n) are entropy coded using
Rice codes. For each block, either all values can be encoded using the same Rice code, or the block can be further divided into four parts, each encoded with a different Rice code. The indices of the applied codes have to be transmitted, as shown in Figure 1. Since there are different ways to determine the optimal Rice code for a given set of data, it is up to the encoder to select suitable codes depending on the statistics of the residual.
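For reference, a plain Rice encoder for signed residuals is sketched below; the zig-zag mapping of signed values to unsigned indices is a common convention and is shown here as an assumption, not as the codec's normative mapping.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical bit sink: prints bits instead of packing them. */
    static void put_bit(int b) { putchar(b ? '1' : '0'); }

    /* Map a signed residual to an unsigned index: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ... */
    static uint32_t zigzag(int32_t e)
    {
        return e >= 0 ? (uint32_t)e << 1 : (((uint32_t)(-(int64_t)e)) << 1) - 1;
    }

    /* Rice code with parameter k: unary quotient followed by k remainder bits. */
    static void rice_encode(int32_t e, unsigned k)
    {
        uint32_t u = zigzag(e);
        uint32_t q = u >> k;

        for (uint32_t i = 0; i < q; i++)        /* quotient in unary             */
            put_bit(0);
        put_bit(1);                             /* terminator                    */
        for (int i = (int)k - 1; i >= 0; i--)   /* remainder, k bits             */
            put_bit((u >> i) & 1);
    }

    int main(void)
    {
        int32_t residuals[] = { 0, -1, 3, -4 };
        for (int i = 0; i < 4; i++) {           /* same Rice parameter per block */
            rice_encode(residuals[i], 2);
            putchar(' ');
        }
        putchar('\n');
        return 0;
    }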
Alternatively, the encoder can use a more complex and efficient coding scheme called BGMC (Block Gilbert-Moore Codes). In BGMC mode, the encoding of residuals is accomplished by splitting the distribution into two categories (Figure 10): residuals that belong to a central region of the distribution, |e(n)| < e_max, and ones that belong to its tails. The residuals in the tails are simply re-centered (i.e. for e(n) > e_max we have ê(n) = e(n) - e_max) and encoded using Rice codes as described earlier.
However, to encode residuals in the center of the distribution, the BGMC encoder splits them into LSB and MSB components first, then it encodes the MSBs using block Gilbert-Moore (arithmetic) codes, and finally it transmits the LSBs using direct fixed-length codes. Both the parameter e_max and the number of directly transmitted LSBs are selected such that they only slightly affect the coding efficiency of this scheme, while making it significantly less complex.
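The partitioning described above can be sketched as follows; the block Gilbert-Moore coding of the MSBs itself is omitted, and the symmetric re-centering of negative tail values as well as the printed output are assumptions made only for illustration.

    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch of the BGMC-mode partitioning: residuals with |e(n)| < e_max
     * belong to the centre of the distribution and are split into MSBs and
     * LSBs (assuming arithmetic right shift for negative values); the
     * remaining "tail" residuals are re-centred towards zero and would be
     * Rice-coded as in the simple mode. */
    static void bgmc_partition(const int *e, int nb, int e_max, int lsb_bits)
    {
        for (int n = 0; n < nb; n++) {
            if (abs(e[n]) < e_max) {
                int msb = e[n] >> lsb_bits;              /* arithmetic-coded part */
                int lsb = e[n] & ((1 << lsb_bits) - 1);  /* sent as fixed length  */
                printf("centre: msb=%d lsb=%d\n", msb, lsb);
            } else {
                int t = e[n] > 0 ? e[n] - e_max : e[n] + e_max;
                printf("tail:   %d\n", t);               /* then Rice-coded       */
            }
        }
    }

    int main(void)
    {
        int e[] = { 3, -2, 40, -37, 0 };
        bgmc_partition(e, 5, 16, 2);
        return 0;
    }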
[Compression Results]
In the following, the lossless audio codec is compared with two of the most popular programs for lossless audio compression: the open-source codec FLAC, which also uses forward-adaptive prediction, and Monkey's Audio (MAC 3.97), a backward-adaptive codec that represents the current state of the art in terms of compression. Both codecs were run with options providing maximum compression (flac -8 and mac -c4000). The results for the encoder were determined for a medium compression level (with the prediction order restricted to K ≤ 60) and a maximum compression level (K ≤ 1023), both with random access of 500 ms. The tests were conducted on a 1.7 GHz Pentium-M system with 1024 MB of memory. The test material comprises nearly 1 GB of stereo waveform data with sampling rates of 48, 96, and 192 kHz, and resolutions of 16 and 24 bits.
[Compression Ratio]
In the following, the compression ratio is defined as

c = (CompressedFileSize / OriginalFileSize) · 100%,

where smaller values mean better compression. The results for the examined audio formats are shown in Table 2 (192 kHz material is not supported by the FLAC codec).
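For completeness, a trivial sketch of this calculation with made-up file sizes:

    #include <stdio.h>

    /* Compression ratio as defined above (illustrative file sizes). */
    int main(void)
    {
        double compressed = 512.0, original = 1024.0;            /* e.g. sizes in MB */
        printf("c = %.1f%%\n", 100.0 * compressed / original);   /* 50.0%            */
        return 0;
    }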
Table 2: Comparison of average compression ratios for different audio formats (kHz/bits)
The results show that ALS at maximum level outperforms both FLAC and Monkey's Audio for all formats, particularly for high-definition material (i.e. 96 kHz / 24-bit and above). Even at medium level, ALS delivers the best overall compression.
[Complexity]
The complexity of different codecs strongly depends on the actual implementation, particularly that of the encoder. As mentioned earlier, the audio signal encoder of the present invention is just a snapshot of an ongoing development. Thus, we restrict our analysis to the decoder, a simple C code implementation with no further optimizations. The compressed data was generated by the currently best encoder implementation. The average CPU load for real-time decoding of various audio formats, encoded at different complexity levels, is shown in Table 3. Even for maximum complexity, the CPU load of the decoder is only around 20-25%, which in turn means that file-based decoding is at least 4-5 times faster than real-time.
Table 3: Average CPU load (percentage on a 1.7 GHz Pentium-M), depending on audio format (kHz/bits) and ALS encoder complexity.
The codec is designed to offer a large range of complexity levels. While the maximum level achieves the highest compression at the expense of the slowest encoding and decoding speed, the faster medium level only slightly degrades compression, but decoding is significantly less complex than for the maximum level (around 5% CPU load for 48 kHz material). Using a low-complexity level (K ≤ 15, Rice coding) degrades compression by only 1-1.5% compared to the medium level, but the decoder complexity is further reduced by a factor of three (less than 2% CPU load for 48 kHz material). Thus, audio data can be decoded even on hardware with very low computing power.
While the encoder complexity may be increased by both higher maximum orders and a more elaborate block switching algorithm (in accordance with embodiments), the decoder may be affected by a higher average predictor order.
The foregoing embodiments (e.g. hierarchical block switching, or schemes in accordance with embodiments with Kmax = 127) and advantages are merely examples and are not to be construed as limiting the appended claims. The above teachings can be applied to other apparatuses and methods, as would be appreciated by one of ordinary skill in the art. Many alternatives, modifications, and variations will be apparent to those skilled in the art.
[Syntax]
The present invention relates to the syntax comprised in the encoded bit stream. The syntax is as follows:
File Header: The block_switching field is extended from 1 to 2 bits, and the max_order field is extended from 8 to 10 bits. The frame_length and user_frame_length fields are merged, resulting in a frame_length field of 16 bits, while the user_frame_length field is removed.
Table 4: Syntax of als header
Frame Data: If block switching is used, the bs_info field is added. Depending on the value of block_switching, it has 8, 16, or 32 bits. The first bit of a CPE's bs_info field holds the independent_bs flag. The number of blocks is implicitly derived from bs_info as well. If block_switching is off, there is no bs_info field; thus blocks is one and independent_bs is zero.
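Purely as an illustration of how a block count can follow implicitly from hierarchical subdivision flags, the sketch below assumes a breadth-first layout of one flag per node of the subdivision tree (1 = split into two blocks of half length, 0 = leaf) and ignores the leading independent_bs bit; the exact bit assignment inside bs_info is defined by the syntax tables and may differ from this layout.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative sketch only: count the number of (leaf) blocks described by
     * a set of hierarchical subdivision flags, laid out breadth-first.  Each
     * valid split adds exactly one more block to the total. */
    static int count_blocks(const uint8_t *flags, int levels)
    {
        int blocks = 1;                   /* start with one long block          */
        int node = 0;

        for (int level = 0; level < levels; level++) {
            int nodes_at_level = 1 << level;
            for (int i = 0; i < nodes_at_level; i++, node++)
                if (flags[node])
                    blocks++;             /* each split adds one more block     */
        }
        return blocks;
    }

    int main(void)
    {
        /* root split, then only its first half split again: 3 blocks           */
        uint8_t flags[] = { 1, 1, 0 };
        printf("blocks = %d\n", count_blocks(flags, 2));
        return 0;
    }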
In order to improve readability, both new and old syntax are shown separately in the following table, instead of mixing new with old syntax elements.
Table 5: Syntax of frame_data
(The first rows of the table appear as an image in the original document; the remainder of the syntax is reproduced below.)

        CPE = channels / 2
        SCE = channels % 2
    else
        SCE = channels
    for (cp = 0; cp < CPE; cp++){
        if (block_switching){
            bs_info                                8,16,32    uimsbf
        }
        if (independent_bs){
            for (c = 0; c < 2; c++){
                if (c == 1){
                    bs_info                        8,16,32    uimsbf
                }
                for (b = 0; b < blocks; b++){
                    block_header()
                    block_data()
                }
            }
        }
        else{
            for (b = 0; b < blocks; b++){
                for (c = 0; c < 2; c++){
                    block_header()
                    block_data()
                }
            }
        }
    }
    for (sc = 0; sc < SCE; sc++){
        if (block_switching){
            bs_info                                8,16,32    uimsbf
        }
        for (b = 0; b < blocks; b++){
            block_header()
            block_data()
        }
        if (inter_channel_correlation){
            channel_data(c)
        }
    }

Block Header: The short_blocks field is removed, since block switching information is completely transmitted on frame level (bs_info, see previous paragraph).
Table 6: Syntax of blockjieader
Block Data: The opt_order field is extended to a maximum of 10 bits (previously 8 bits).
Table 7: Syntax of block_data
[Semantics]
File Header:
Table 8: Elements of als header
Frame Data:
Table 9: Elements of frame data
Table 10: Elements of block header
Table 11 : Elements of block data
Industrial Applicability
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. For example, the present invention can be adapted to another audio signal codec, such as a lossy audio signal codec. Thus, it is intended that the present invention covers the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims

[CLAIMS]
1. A method of processing an audio signal, the method comprising: subdividing a channel of an audio data frame into a plurality of blocks, wherein at least two of the subdivided blocks have different lengths; and determining an optimum prediction order for each subdivided block based on a maximum prediction order and a length (Nb) of each subdivided block.
2. The method of claim 1 , further comprising predicting data samples of each subdivided block using the optimum prediction order.
3. The method of claim 2, further comprising obtaining a residual of each subdivided block using the predicted data samples.
4. The method of claim 1 , wherein the optimum prediction order is determined based on the following equation:
optimal prediction order = min (global prediction order, local prediction order),
where the global prediction order is determined from the maximum prediction order and the local prediction order is determined from the length of each subdivided block.
5. The method of claim 4, wherein the global and local prediction orders are determined by:
global prediction order = ceil(log2(maximum prediction order +1 )), and
local prediction order = max(ceil(log2((Nb>>3)-1)), 1).
6. The method of claim 1 , wherein the plurality of blocks are subdivided hierarchically at one or more block switching levels, and each block results from a subdivision of a superordinate block of double length.
7. The method of claim 6, further comprising generating block switching information indicating how the blocks are subdivided at the block switching levels.
8. The method of claim 7, wherein a length of each block is any one of N/2, N/4, N/8, N/16, and N/32.
9. The method of claim 7, wherein a length of the block switching information is determined based on a number of the block switching levels.
10. The method of claim 7, wherein the block switching information includes a series of information bits representing how the blocks are subdivided at the block switching levels.
11. The method of claim 10, wherein each information bit has a value of 1 when a block is subdivided at a corresponding block switching level and has a value of 0 when the block is not subdivided at the corresponding block switching level.
12. The method of claim 7, further comprising transmitting the block switching information.
13. The method of claim 1 , further comprising predicting data samples of the blocks subdivided from the channel, wherein a first sample of a current block is predicted using the last K samples of a previous block.
14. The method of claim 13, wherein a first sample of the current block is predicted using prediction with progressive order when the current block is a foremost block of the channel.
15. A method of encoding an audio signal, the method comprising: subdividing a channel of an audio data frame into a plurality of blocks hierarchically at one or more block switching levels, each block resulting from
a subdivision of a superordinate block of double length; and determining an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block.
16. A method of decoding an audio signal, the method comprising: receiving an audio data frame having at least one channel, each channel being subdivided into a plurality of blocks hierarchically at one or more block switching levels, each block resulting from a subdivision of a superordinate block of double length; parsing an optimum prediction order from each subdivided block; and reconstructing data samples of each subdivided block using the optimum prediction order.
17. An apparatus of encoding an audio signal, the apparatus comprising: an encoder configured to subdivide a channel of an audio data frame into a plurality of blocks hierarchically at one or more block switching levels, each block resulting from a subdivision of a superordinate block of double length, wherein the encoder is further configured to determine an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block.
18. An apparatus of decoding an audio signal, the apparatus
comprising: a decoder configured to receive an audio data frame having at least one channel, each channel being subdivided into a plurality of blocks hierarchically at one or more block switching levels, wherein the decoder is further configured to parse an optimum prediction order from each subdivided block, and to reconstruct data samples of each block using the parsed optimum prediction order.
PCT/KR2005/002292 2005-07-11 2005-07-16 Apparatus and method of encoding and decoding audio signal WO2007011080A1 (en)

Priority Applications (93)

Application Number Priority Date Filing Date Title
PCT/KR2005/002292 WO2007011080A1 (en) 2005-07-16 2005-07-16 Apparatus and method of encoding and decoding audio signal
US11/481,926 US7949014B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,915 US7996216B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,927 US7835917B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
US11/481,932 US8032240B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
US11/481,916 US8108219B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,917 US7991272B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
US11/481,941 US8050915B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signals using hierarchical block switching and linear prediction coding
US11/481,931 US7411528B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
US11/481,942 US7830921B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,929 US7991012B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,933 US7966190B2 (en) 2005-07-11 2006-07-07 Apparatus and method for processing an audio signal using linear prediction
US11/481,940 US8180631B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal, utilizing a unique offset associated with each coded-coefficient
US11/481,930 US8032368B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signals using hierarchical block swithcing and linear prediction coding
US11/481,939 US8121836B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
CNA2006800305499A CN101243495A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
EP06757765A EP1913580A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002690 WO2007008012A2 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002677 WO2007007999A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002687 WO2007008009A1 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CNA2006800251376A CN101218631A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06769224A EP1913794A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521316A JP2009510810A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
PCT/KR2006/002691 WO2007008013A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
CNA2006800305412A CN101243497A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
PCT/KR2006/002683 WO2007008005A1 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521307A JP2009500683A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
JP2008521311A JP2009500687A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
JP2008521314A JP2009500689A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
JP2008521306A JP2009500682A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
JP2008521308A JP2009500684A (en) 2005-07-11 2006-07-10 Audio signal processing method, audio signal encoding and decoding apparatus and method
EP06757764A EP1913579A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
EP06769223A EP1913587A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002688 WO2007008010A1 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CNA2006800294174A CN101243489A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
EP06769225A EP1911021A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06769220A EP1913585A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06757767A EP1913582A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002678 WO2007008000A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002680 WO2007008002A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002689 WO2007008011A2 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002681 WO2007008003A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002685 WO2007008007A1 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CNA200680024866XA CN101218852A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521319A JP2009500693A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
CN2006800251380A CN101218628B (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding an audio signal
EP06757768A EP1913583A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CNA2006800304797A CN101243493A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
CNA2006800289829A CN101238510A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06769219A EP1913584A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
CNA2006800251395A CN101218629A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002686 WO2007008008A2 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CNA2006800304693A CN101243492A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
EP06769227A EP1911020A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
JP2008521315A JP2009500690A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
EP06769222A EP1908058A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521310A JP2009500686A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
JP2008521313A JP2009500688A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
JP2008521318A JP2009500692A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
PCT/KR2006/002679 WO2007008001A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
EP06769226A EP1913588A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06769218A EP1913589A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002682 WO2007008004A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
EP06757766A EP1913581A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
CNA200680028892XA CN101238509A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521305A JP2009500681A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
JP2008521309A JP2009500685A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
JP2008521317A JP2009500691A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
CN2006800252699A CN101218630B (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CN2006800294070A CN101243496B (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CNA2006800305111A CN101243494A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
US12/232,527 US7962332B2 (en) 2005-07-11 2008-09-18 Apparatus and method of encoding and decoding audio signal
US12/232,526 US8010372B2 (en) 2005-07-11 2008-09-18 Apparatus and method of encoding and decoding audio signal
US12/232,591 US8255227B2 (en) 2005-07-11 2008-09-19 Scalable encoding and decoding of multichannel audio with up to five levels in subdivision hierarchy
US12/232,593 US8326132B2 (en) 2005-07-11 2008-09-19 Apparatus and method of encoding and decoding audio signal
US12/232,590 US8055507B2 (en) 2005-07-11 2008-09-19 Apparatus and method for processing an audio signal using linear prediction
US12/232,595 US8417100B2 (en) 2005-07-11 2008-09-19 Apparatus and method of encoding and decoding audio signal
US12/232,658 US8510119B2 (en) 2005-07-11 2008-09-22 Apparatus and method of processing an audio signal, utilizing unique offsets associated with coded-coefficients
US12/232,662 US8510120B2 (en) 2005-07-11 2008-09-22 Apparatus and method of processing an audio signal, utilizing unique offsets associated with coded-coefficients
US12/232,659 US8554568B2 (en) 2005-07-11 2008-09-22 Apparatus and method of processing an audio signal, utilizing unique offsets associated with each coded-coefficients
US12/232,747 US8149878B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,734 US8155144B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,748 US8155153B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,744 US8032386B2 (en) 2005-07-11 2008-09-23 Apparatus and method of processing an audio signal
US12/232,743 US7987008B2 (en) 2005-07-11 2008-09-23 Apparatus and method of processing an audio signal
US12/232,740 US8149876B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,741 US8149877B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,739 US8155152B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,783 US8275476B2 (en) 2005-07-11 2008-09-24 Apparatus and method of encoding and decoding audio signals
US12/232,781 US7930177B2 (en) 2005-07-11 2008-09-24 Apparatus and method of encoding and decoding audio signals using hierarchical block switching and linear prediction coding
US12/232,784 US7987009B2 (en) 2005-07-11 2008-09-24 Apparatus and method of encoding and decoding audio signals
US12/232,782 US8046092B2 (en) 2005-07-11 2008-09-24 Apparatus and method of encoding and decoding audio signal
US12/314,891 US8065158B2 (en) 2005-07-11 2008-12-18 Apparatus and method of processing an audio signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/KR2005/002292 WO2007011080A1 (en) 2005-07-16 2005-07-16 Apparatus and method of encoding and decoding audio signal

Publications (1)

Publication Number Publication Date
WO2007011080A1 true WO2007011080A1 (en) 2007-01-25

Family

ID=37668946

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2005/002292 WO2007011080A1 (en) 2005-07-11 2005-07-16 Apparatus and method of encoding and decoding audio signal

Country Status (1)

Country Link
WO (1) WO2007011080A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015100910A1 (en) * 2013-12-31 2015-07-09 深圳迈瑞生物医疗电子股份有限公司 Method, system and medical device for compressing physiological signal
GB2524424A (en) * 2011-10-24 2015-09-23 Peter Graham Craven Lossless buried data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DAI YANG ET AL.: "A lossless audio compression scheme with random access property", ICASSP 2004, vol. 3, 17 May 2004 (2004-05-17) - 21 May 2004 (2004-05-21), pages 1016 - 1019, XP010718365 *
LIEBCHEN T.: "An introduction to MPEG-4 audio lossless coding", 2004 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP '04), vol. 3, 17 May 2004 (2004-05-17) - 21 May 2004 (2004-05-21), pages 1012 - 1015, XP010718364 *
MORIYA T. ET AL.: "Extended linear prediction tools for lossless audio coding", ICASSP 2004, vol. 3, 17 May 2004 (2004-05-17) - 21 May 2004 (2004-05-21), pages 1008 - 1011, XP010718363 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2524424A (en) * 2011-10-24 2015-09-23 Peter Graham Craven Lossless buried data
GB2495918B (en) * 2011-10-24 2015-11-04 Malcolm Law Lossless buried data
GB2524424B (en) * 2011-10-24 2016-04-27 Graham Craven Peter Lossless buried data
WO2015100910A1 (en) * 2013-12-31 2015-07-09 深圳迈瑞生物医疗电子股份有限公司 Method, system and medical device for compressing physiological signal

Similar Documents

Publication Publication Date Title
US7991272B2 (en) Apparatus and method of processing an audio signal
WO2007011080A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011083A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011078A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011079A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011085A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011084A1 (en) Apparatus and method of encoding and decoding audio signal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 05761290

Country of ref document: EP

Kind code of ref document: A1