WO2007011080A1 - Apparatus and method of encoding and decoding audio signal - Google Patents

Apparatus and method of encoding and decoding audio signal Download PDF

Info

Publication number
WO2007011080A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
subdivided
prediction order
blocks
channel
Prior art date
Application number
PCT/KR2005/002292
Other languages
French (fr)
Inventor
Tilman Liebchen
Original Assignee
Lg Electronics Inc.
Noll, Peter
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lg Electronics Inc., Noll, Peter filed Critical Lg Electronics Inc.
Priority to PCT/KR2005/002292 priority Critical patent/WO2007011080A1/en
Priority to US11/481,926 priority patent/US7949014B2/en
Priority to US11/481,915 priority patent/US7996216B2/en
Priority to US11/481,927 priority patent/US7835917B2/en
Priority to US11/481,932 priority patent/US8032240B2/en
Priority to US11/481,916 priority patent/US8108219B2/en
Priority to US11/481,917 priority patent/US7991272B2/en
Priority to US11/481,941 priority patent/US8050915B2/en
Priority to US11/481,931 priority patent/US7411528B2/en
Priority to US11/481,942 priority patent/US7830921B2/en
Priority to US11/481,929 priority patent/US7991012B2/en
Priority to US11/481,933 priority patent/US7966190B2/en
Priority to US11/481,940 priority patent/US8180631B2/en
Priority to US11/481,930 priority patent/US8032368B2/en
Priority to US11/481,939 priority patent/US8121836B2/en
Priority to CNA2006800305499A priority patent/CN101243495A/en
Priority to EP06757765A priority patent/EP1913580A4/en
Priority to PCT/KR2006/002690 priority patent/WO2007008012A2/en
Priority to PCT/KR2006/002677 priority patent/WO2007007999A2/en
Priority to PCT/KR2006/002687 priority patent/WO2007008009A1/en
Priority to CNA2006800251376A priority patent/CN101218631A/en
Priority to EP06769224A priority patent/EP1913794A4/en
Priority to JP2008521316A priority patent/JP2009510810A/en
Priority to PCT/KR2006/002691 priority patent/WO2007008013A2/en
Priority to CNA2006800305412A priority patent/CN101243497A/en
Priority to PCT/KR2006/002683 priority patent/WO2007008005A1/en
Priority to JP2008521307A priority patent/JP2009500683A/en
Priority to JP2008521311A priority patent/JP2009500687A/en
Priority to JP2008521314A priority patent/JP2009500689A/en
Priority to JP2008521306A priority patent/JP2009500682A/en
Priority to JP2008521308A priority patent/JP2009500684A/en
Priority to EP06757764A priority patent/EP1913579A4/en
Priority to EP06769223A priority patent/EP1913587A4/en
Priority to PCT/KR2006/002688 priority patent/WO2007008010A1/en
Priority to CNA2006800294174A priority patent/CN101243489A/en
Priority to EP06769225A priority patent/EP1911021A4/en
Priority to EP06769220A priority patent/EP1913585A4/en
Priority to EP06757767A priority patent/EP1913582A4/en
Priority to PCT/KR2006/002678 priority patent/WO2007008000A2/en
Priority to PCT/KR2006/002680 priority patent/WO2007008002A2/en
Priority to PCT/KR2006/002689 priority patent/WO2007008011A2/en
Priority to PCT/KR2006/002681 priority patent/WO2007008003A2/en
Priority to PCT/KR2006/002685 priority patent/WO2007008007A1/en
Priority to CNA200680024866XA priority patent/CN101218852A/en
Priority to JP2008521319A priority patent/JP2009500693A/en
Priority to CN2006800251380A priority patent/CN101218628B/en
Priority to EP06757768A priority patent/EP1913583A4/en
Priority to CNA2006800304797A priority patent/CN101243493A/en
Priority to CNA2006800289829A priority patent/CN101238510A/en
Priority to EP06769219A priority patent/EP1913584A4/en
Priority to CNA2006800251395A priority patent/CN101218629A/en
Priority to PCT/KR2006/002686 priority patent/WO2007008008A2/en
Priority to CNA2006800304693A priority patent/CN101243492A/en
Priority to EP06769227A priority patent/EP1911020A4/en
Priority to JP2008521315A priority patent/JP2009500690A/en
Priority to EP06769222A priority patent/EP1908058A4/en
Priority to JP2008521310A priority patent/JP2009500686A/en
Priority to JP2008521313A priority patent/JP2009500688A/en
Priority to JP2008521318A priority patent/JP2009500692A/en
Priority to PCT/KR2006/002679 priority patent/WO2007008001A2/en
Priority to EP06769226A priority patent/EP1913588A4/en
Priority to EP06769218A priority patent/EP1913589A4/en
Priority to PCT/KR2006/002682 priority patent/WO2007008004A2/en
Priority to EP06757766A priority patent/EP1913581A4/en
Priority to CNA200680028892XA priority patent/CN101238509A/en
Priority to JP2008521305A priority patent/JP2009500681A/en
Priority to JP2008521309A priority patent/JP2009500685A/en
Priority to JP2008521317A priority patent/JP2009500691A/en
Priority to CN2006800252699A priority patent/CN101218630B/en
Priority to CN2006800294070A priority patent/CN101243496B/en
Priority to CNA2006800305111A priority patent/CN101243494A/en
Publication of WO2007011080A1 publication Critical patent/WO2007011080A1/en
Priority to US12/232,527 priority patent/US7962332B2/en
Priority to US12/232,526 priority patent/US8010372B2/en
Priority to US12/232,591 priority patent/US8255227B2/en
Priority to US12/232,593 priority patent/US8326132B2/en
Priority to US12/232,590 priority patent/US8055507B2/en
Priority to US12/232,595 priority patent/US8417100B2/en
Priority to US12/232,658 priority patent/US8510119B2/en
Priority to US12/232,662 priority patent/US8510120B2/en
Priority to US12/232,659 priority patent/US8554568B2/en
Priority to US12/232,747 priority patent/US8149878B2/en
Priority to US12/232,734 priority patent/US8155144B2/en
Priority to US12/232,748 priority patent/US8155153B2/en
Priority to US12/232,744 priority patent/US8032386B2/en
Priority to US12/232,743 priority patent/US7987008B2/en
Priority to US12/232,740 priority patent/US8149876B2/en
Priority to US12/232,741 priority patent/US8149877B2/en
Priority to US12/232,739 priority patent/US8155152B2/en
Priority to US12/232,783 priority patent/US8275476B2/en
Priority to US12/232,781 priority patent/US7930177B2/en
Priority to US12/232,784 priority patent/US7987009B2/en
Priority to US12/232,782 priority patent/US8046092B2/en
Priority to US12/314,891 priority patent/US8065158B2/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/022 - Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using predictive techniques

Definitions

  • the present invention relates to a method for processing an audio signal, and more particularly to a method and apparatus of encoding and decoding an audio signal.
  • Lossless reconstruction is becoming a more important feature than high efficiency in compression by means of perceptual coding as defined in MPEG standards such as MP3 or AAC.
  • DVD audio and Super Audio CD include proprietary lossless compression schemes.
  • a new lossless coding scheme has been considered as an extension to the MPEG-4 Audio standard. Lossless audio coding permits the compression of digital audio data without any loss in quality due to a perfect reconstruction of the original signal.
  • Audio Lossless Coding will define methods for lossless coding of audio signals with arbitrary sampling rates, resolutions of up to 32 bit, and up to 256 channels.
  • the lossless codec uses forward-adaptive Linear Predictive Coding (LPC) to reduce bit rates compared to PCM, leaving the optimization entirely to the encoder.
  • various encoder implementations are possible, offering a certain range in terms of efficiency and complexity. Although remarkable compression is achieved even for low predictor orders, still better compression becomes possible using high-order prediction. In this case, more efficient coding of the predictor coefficients is necessary in order to limit the amount of side information.
  • a method of processing an audio signal includes the steps of subdividing a channel of an audio data frame into a plurality of blocks having non-uniform lengths, and determining an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block.
  • the method may further include the steps of predicting data samples of each subdivided block using the optimum prediction order, and obtaining a residual of each subdivided block using the predicted data samples.
  • a method of encoding an audio signal includes the steps of subdividing a channel of an audio data frame into a plurality of blocks hierarchically at one or more block switching levels, and determining an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block. Each subdivided block results from a subdivision of a superordinate block of double length.
  • a method of decoding an audio signal includes the steps of receiving an audio data frame having at least one channel, where each channel is subdivided into a plurality of blocks hierarchically at one or more block switching levels, and each block results from a subdivision of a superordinate block of double length.
  • the method further comprises the steps of parsing an optimum prediction order from each subdivided block, and reconstructing data samples of each subdivided block using the optimum prediction order.
  • an apparatus of encoding an audio signal includes an encoder which subdivides a channel of an audio data frame into a plurality of blocks hierarchically at one or more block switching levels, where each block results from a subdivision of a superordinate block of double length.
  • the encoder determines an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block.
  • an apparatus of decoding an audio signal includes a decoder which receives an audio data frame having at least one channel, where each channel is subdivided into a plurality of blocks hierarchically at one or more block switching levels. The decoder then parses an optimum prediction order from each subdivided block and reconstructs data samples of each block using the parsed optimum prediction order.
  • Figure 1 is an example illustration of an audio signal encoder.
  • Figure 2 is an example illustration of an audio signal decoder.
  • Figure 3 shows measured distributions of parcor coefficients for 48 kHz, 16-bit audio material.
  • Figure 4 shows the compander functions C(r) and -C(-r).
  • Figure 5 is an example of a block switching hierarchy structure.
  • Figure 7 is an example of a bit stream using the old block switching scheme.
  • Figure 8 is an example of a bit stream using the new block switching (BS) scheme: No BS (top), synchronized BS between CPE channels 1 and 2 (middle), independent BS (bottom).
  • Figure 10 is a partition of the residual distribution.
  • Figure 1 shows the typical processing for one input channel of audio data.
  • a buffer stores one block of input samples, and an optimum set of parcor coefficients is calculated for each block.
  • the number of coefficients, i.e. the order of the predictor, can be adaptively chosen as well.
  • the quantized parcor values are entropy coded for transmission, and converted to LPC coefficients for the prediction filter which calculates the prediction residual.
  • the residual is entropy coded using different entropy codes.
  • the indices of the chosen codes have to be transmitted as side information.
  • Additional encoder options comprise block length switching, random access and joint channel coding.
  • the encoder may use these options to offer several compression levels with different complexities.
  • the basic version of the encoder uses a fixed block length.
  • the encoder can switch between different block lengths to adapt to stationary regions as well as to transient segments of the audio signal.
  • the codec allows random access in defined intervals down to some milliseconds, depending on the block length.
  • the entropy coding part of the prediction residual provides two alternative coding techniques with different complexities. Besides low complexity yet efficient Golomb-Rice coding, the BGMC arithmetic coding scheme offers even better compression at the expense of a slightly increased complexity.
  • the encoder will also offer efficient compression of floating-point audio data in the 32-bit IEEE format.
  • This codec extension employs an algorithm that basically splits the floating-point signal into a truncated integer signal and a difference signal which contains the remaining fractional part. The integer signal is then compressed using the normal encoding scheme for PCM signals, while the difference signal is coded separately. A detailed description of the floating-point extension can be found.
  • Figure 2 shows the lossless audio signal decoder, which is significantly less complex than the encoder, since no adaptation has to be carried out.
  • the decoder merely decodes the entropy coded residual and the parcor values, converts them into LPC coefficients, and applies the inverse prediction filter to calculate the lossless reconstruction signal.
  • Linear prediction is used in many applications for speech and audio signal processing. In the following, only FIR predictors are considered.
  • the current sample of a time-discrete signal x(n) can be approximately predicted from previous samples x(n - k) .
  • the prediction is
  • the procedure of estimating the predictor coefficients from a segment of input samples, prior to filtering that segment, is referred to as forward adaptation. In that case, the coefficients have to be transmitted. If the coefficients are estimated from previously processed segments or samples, e.g. from the residual, we speak of backward adaptation. This procedure has the advantage that no transmission of the coefficients is needed, since the data required to estimate the coefficients is available to the decoder as well.
  • the optimal predictor coefficients h k (in terms of a minimized variance of the residual) are usually estimated for
  • the bit rate Rc for the predictor coefficients will rise with the
  • the variance σe² of the corresponding residual can be
  • the total bit rate can be determined in each iteration, i.e. for each predictor order. The optimum order is found at the point where the total bit rate no longer decreases.
  • the first two parcor coefficients r1 and r2 are typically very close to -1 and +1, respectively.
  • the direct form predictor filter uses predictor coefficients h k
  • a lossless coding method specifies an integer-arithmetic function for conversion between quantized
  • Embodiments relate to encoders, decoders, methods of encoding, and methods of decoding.
  • an encoder is at least one of an audio encoder, and an Audio Lossless Coding encoder.
  • a method of encoding is implemented in at least one of an audio encoder, and an Audio Lossless Coding encoder.
  • a decoder is at least one of an audio decoder, and an Audio Lossless Coding decoder.
  • a method of decoding is implemented in at least one of an audio decoder, and an Audio Lossless Coding decoder.
  • Embodiments relate to a block switching mechanism which subdivides a frame of audio data into four quarter-length blocks, instead of encoding it as one single block. Switching between one long and four short blocks may be performed adaptively on a frame-by-frame basis.
  • a more flexible, hierarchical block switching scheme allows for up to six different block lengths (differing by factors of two) within a frame.
  • independent block switching for each channel may be implemented (e.g. each channel pair may be switched independently in the case of joint channel coding).
  • a maximum predictor order of 1023 may be implemented.
  • the same compression can be achieved with relatively low decoder complexity, which also allows higher compression at the same complexity.
  • Audio Lossless Coding includes a relatively simple block switching mechanism. Each frame of N samples is either encoded using one
  • this scheme may have some limitations. For example, only 1:4 switching may be possible, although different switching (e.g. 1:2, 1:8, and combinations thereof) may be more efficient in some cases. For example, switching is done identically for all channels, although different channels may require different switching (which is especially true if the channels are not correlated).
  • a relatively flexible block switching scheme may be implemented, where each frame can be hierarchically subdivided into many blocks.
  • Figure 5 illustrates a frame which can be hierarchically
  • N/2, N/4, N/8, N/16, and N/32 may be possible within a frame, as long as each block results from a subdivision of a superordinate block of double length, in accordance with embodiments.
  • a partition into N/4 + N/4 + N/2 may be possible, while a partition into N/4 + N/2 + N/4 may not be possible.
  • the actual partition may be signaled in an additional field block switching information(bs_info) (illustrated in the right column of Figure 6), where the length depends on the number of block switching levels.
  • Table 1 Block switching levels.
  • the bs_info field may include up to 4 bytes, in accordance with embodiments.
  • the mapping of bits with respect to the levels 1 to 5 may be [(0)1223333 44444444 55555555 55555555].
  • the first bit may be reserved for indicating independent block switching. In the example of Figure 6, there are
  • the bits of bs_info are set if a block is further subdivided. For the topmost example there is no subdivision at all, thus the code is (0)0000000.
  • the frame in the second row is subdivided ((0)1...), where only the second block of length N/2 is further split ((0)101...) into two blocks of length N/4. If an N/4 block is split as in the fourth row, it is indicated in the following bits ((0)111 0100).
  • bs_info fields may be transmitted for all channel pairs
  • bs_info field for each CPE and SCE in a frame (e.g. the two channels of a CPE are switched synchronously), in accordance with embodiments. If they are switched independently, the first bit of bs_info may be set to 1, and the information applies to the CPE's first channel. In this example, another bs_info field for the second channel becomes necessary.
  • the arrangement of blocks in the bit stream can be dynamically arranged.
  • all channels use the same partition (e.g. either one long or four short blocks) and corresponding short blocks of different channels are arranged successively (e.g. blocks 1.1, 2.1, and 3.1), leading to an interleaved structure.
  • short blocks are only interleaved if they belong to a channel pair that uses difference coding and therefore synchronized block switching (e.g. the middle row of Figure 8). This interleaving may be beneficial, since in a channel pair a block of one channel (e.g. block 1.2) may depend on previous blocks from both channels (e.g. blocks 1.1 and 2.1 ), so these previous blocks may need to be available prior to the current one.
  • channel data can be arranged separately (e.g. bottom row of Figure 8).
  • Embodiments relate to higher predictor orders. Absent hierarchical block switching, there may be a factor of 4 between the long and the short block length (e.g. 4096 & 1024 or 8192 & 2048), in accordance with embodiments. In embodiments (e.g. where hierarchical block switching is implemented), this factor can be increased (e.g. up to 32), enabling a larger range (e.g. 16384 down to 512 or even 32768 to 1024 for high sampling rates). In embodiments, in order to make better use of very long blocks, higher maximum predictor orders may be employed. The maximum order may be
  • Kmax may be bound by the block length NB,
  • the max_order field in the file header is 10 bits.
  • the opt_order field of the block data is 10 bits. The actual number of bits in a particular block may depend on the maximum order allowed for a block. If the block is short, this local maximum order may be smaller than the global maximum order (stated in max_order in the file
  • the opt_order is determined based on the following equation.
  • opt_order = min(global prediction order, local prediction order), where the global prediction order is determined from max_order, and the local prediction order is determined from the length of the block.
  • the distance between random access frames can be chosen from 255 to one frame. Depending on frame length and sampling rate, random access down to some milliseconds is possible.
  • the codec uses progressive prediction, which makes use of as many available samples as possible. While it is of course not feasible to predict the first sample of a random access frame, we can use first-order prediction for the second sample, second-order prediction for the third sample, and so forth, until the samples from position K + 1 on are predicted using the full K-th order predictor. Since the predictor
  • Joint channel coding can be used to exploit dependencies between the two channels of a stereo signal, or between any two channels of a multi ⁇
  • each block can be carried out by comparison of the individual signals, depending on which two signals can be coded most efficiently (see Figure 9).
  • Such prediction with switched difference coding is beneficial in cases where two channels are very similar.
  • the channels can be rearranged by the encoder in order to assign suitable channel pairs.
  • the lossless audio codec also supports a more complex scheme for exploiting inter-channel redundancy between arbitrary channels of multichannel signals.
  • the encoder can use a more complex and efficient coding scheme called BGMC (Block Gilbert-Moore Codes).
  • BGMC Block Gilbert-Moore Codes
  • the encoding of residuals is accomplished by splitting the distribution in two categories ( Figure 10): Residuals that belong to a central region of the
  • the BGMC encoder splits them into LSB and MSB components first, then it encodes MSBs using block Gilbert-Moore (arithmetic) codes, and finally it transmits LSBs using direct fixed-lengths codes. Both parameters emax and the number of directly transmitted LSBs are selected such that they only slightly affect the coding efficiency of this scheme, while making it significantly less complex.
  • the lossless audio codec is compared with two of the most popular programs for lossless audio compression:
  • the open-source codec FLAC which uses forward-adaptive prediction as well, and Monkey's Audio (MAC 3.97), a backward-adaptive codec as the current state-of-the-art algorithm in terms of compression.
  • Both codecs were run with options providing maximum compression (flac -8 and mac-c4000).
  • the results for the encoder were determined for a medium compression level (with the prediction order restricted to K ≤ 60) and a maximum compression level (K ≤ 1023), both with random access of 500 ms.
  • the tests were conducted on a 1.7 GHz Pentium-M system with 1024 MB of memory. The test material comprises nearly 1 GB of stereo waveform data with sampling rates of 48, 96, and 192 kHz, and resolutions of 16 and 24 bits.
  • the compression ratio is defined as
  • Table 3 Average CPU load (percentage on a 1.7 GHz Pentium-M), depending on audio format (kHz/bits) and ALS encoder complexity.
  • the codec is designed to offer a large range of complexity levels. While the maximum level achieves the highest compression at the expense of slowest encoding and decoding speed, the faster medium level only slightly degrades compression, but decoding is significantly less complex than for the maximum level (around 5% CPU load for 48 kHz material).
  • a low-complexity level (K ≤ 15) using Rice coding degrades compression by only 1-1.5% compared to the medium level, but the decoder complexity is further reduced by a factor of three (less than 2% CPU load for 48 kHz material).
  • audio data can be decoded even on hardware with very low computing power.
  • the present invention relates to the syntax comprised in the encoded bit stream.
  • the syntax is as follows:
  • the block_switching field is extended from 1 to 2 bits, and the max_order field is extended from 8 to 10 bits.
  • the frame_length and user_frame_length fields are merged, resulting in a frame_length field of 16 bits, while the user_frame_length field is removed.
  • Frame Data: if block switching is used, the bs_info field is added. Depending on the value of block_switching, it has 8, 16, or 32 bits. The first bit of a CPE's bs_info field holds the independent_bs flag. The number of blocks is implicitly derived from bs_info as well. If block_switching is off, there is no bs_info field; thus the number of blocks is one and independent_bs is zero.
  • the opt_order field is extended to a maximum of 10 bits (previously 8 bits).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method and apparatus of encoding and decoding an audio signal are disclosed. A channel of an audio data frame is subdivided into a plurality of blocks having non-uniform lengths, and an optimum prediction order for each subdivided block is determined based on a maximum prediction order and a length of each subdivided block. The blocks are subdivided hierarchically at one or more block switching levels, and each block results from a subdivision of a superordinate block of double length. Block switching information is generated in order to indicate how the blocks are subdivided at the block switching levels, respectively.

Description

[DESCRIPTION]
APPARATUS AND METHOD OF ENCODING AND DECODING AUDIO SIGNAL
Technical Field
The present invention relates to a method for processing an audio signal, and more particularly to a method and apparatus of encoding and decoding an audio signal.
Background Art
The storage and replaying of audio signals has been accomplished in different ways in the past. For example, music and speech have been recorded and preserved by phonographic technology (e.g. record players), magnetic technology (e.g. cassette tapes), and digital technology (e.g. compact discs). As audio storage technology progresses, many challenges need to be overcome to optimize the quality and storability of audio signals.
For the archiving and broadband transmission of music signals, lossless reconstruction is becoming a more important feature than high efficiency in compression by means of perceptual coding as defined in MPEG standards such as MP3 or AAC. Although DVD audio and Super Audio CD include proprietary lossless compression schemes, there is a demand for an open and general compression scheme among content-holders and broadcasters. In response to this demand, a new lossless coding scheme has been considered as an extension to the MPEG-4 Audio standard. Lossless audio coding permits the compression of digital audio data without any loss in quality due to a perfect reconstruction of the original signal.
Disclosure of Invention
The present invention relates to a method for processing forward- adaptive linear prediction, which offers remarkable compression even with low predictor orders. Nevertheless, performance can be significantly improved by using higher predictor orders, more efficient quantization and encoding of the predictor coefficients, and adaptive block length switching.
It is an object of the invention to provide a lossless audio coding scheme that permits the compression of digital audio data without any loss in quality due to a perfect reconstruction of the original signal.
Another object of the invention is to provide lossless coding techniques for high-definition audio signals. Audio Lossless Coding will define methods for lossless coding of audio signals with arbitrary sampling rates, resolutions of up to 32 bit, and up to 256 channels. The lossless codec uses forward-adaptive Linear Predictive Coding (LPC) to reduce bit rates compared to PCM, leaving the optimization entirely to the encoder. Thus, various encoder implementations are possible, offering a certain range in terms of efficiency and complexity. Although remarkable compression is achieved even for low predictor orders, still better compression becomes possible using high-order prediction. In this case, more efficient coding of the predictor coefficients is necessary in order to limit the amount of side information. This is achieved by applying a non-linear compander to the most important coefficients, followed by linear quantization and entropy coding of the quantized values. In addition, adaptive block length switching is used to account for changing signal statistics. As a result, compression ratios are comparable to the best high-order backward-adaptive prediction schemes, but with a significantly less complex decoder, and maintaining full random access to arbitrary parts of the encoded signal. The present invention relates to an encoder and/or decoder (including methods of encoding and decoding data). Data may be encoded or decoded in a lossless manner. Embodiments relate to a flexible, hierarchical block switching scheme, allowing for up to six different block lengths within a frame. Embodiments relate to independent block switching for each channel. Embodiments relate to a maximum predictor order of 1023.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
To achieve these objects and other advantages and in accordance with the purpose of the invention, as embodied and broadly described herein, a method of processing an audio signal includes the steps of subdividing a channel of an audio data frame into a plurality of blocks having non-uniform lengths, and determining an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block. The method may further include the steps of predicting data samples of each subdivided block using the optimum prediction order, and obtaining a residual of each subdivided block using the predicted data samples.
In another aspect of the present invention, a method of encoding an audio signal includes the steps of subdividing a channel of an audio data frame into a plurality of blocks hierarchically at one or more block switching levels, and determining an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block. Each subdivided block results from a subdivision of a superordinate block of double length.
In another aspect of the present invention, a method of decoding an audio signal includes the steps of receiving an audio data frame having at least one channel, where each channel is subdivided into a plurality of blocks hierarchically at one or more block switching levels, and each block results from a subdivision of a superordinate block of double length. The method further comprises the steps of parsing an optimum prediction order from each subdivided block, and reconstructing data samples of each subdivided block using the optimum prediction order.
In another aspect of the present invention, an apparatus of encoding an audio signal includes an encoder which subdivides a channel of an audio data frame into a plurality of blocks hierarchically at one or more block switching levels, where each block results from a subdivision of a superordinate block of double length. The encoder then determines an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block.
In another aspect of the present invention, an apparatus of decoding an audio signal includes a decoder which receives an audio data frame having at least one channel, where each channel is subdivided into a plurality of blocks hierarchically at one or more block switching levels. The decoder then parses an optimum prediction order from each subdivided block and reconstructs data samples of each block using the parsed optimum prediction order.
It is to be understood that both the foregoing general description and the following detailed description of the present invention are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
Brief Description of Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principle of the invention. In the drawings:
Figure 1 is an example illustration of an audio signal encoder.
Figure 2 is an example illustration of an audio signal decoder.
Figure 3 shows measured distributions of parcor coefficients for 48 kHz, 16-bit audio material. Figure 4 shows the compander functions C(r) and -C(-r).
Figure 5 is an example of a block switching hierarchy structure.
Figure 6 shows block switching examples and the corresponding block switching information codes.
Figure 7 is an example of a bit stream using the old block switching scheme. Figure 8 is an example of a bit stream using the new block switching (BS) scheme: No BS (top), synchronized BS between CPE channels 1 and 2 (middle), independent BS (bottom).
Figure 9 shows a switched difference coding scheme.
Figure 10 shows a partition of the residual distribution.
Best Mode for Carrying out the Invention
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
Prior to describing the present invention, it should be noted that most terms disclosed in the present invention correspond to general terms well known in the art, but some terms have been selected by the applicant as necessary and will hereinafter be disclosed in the following description of the present invention. Therefore, it is preferable that the terms defined by the applicant be understood on the basis of their meanings in the present invention.
In a lossless audio coding method, since the encoding process has to be perfectly reversible without loss of information, several parts of both encoder and decoder have to be implemented in a deterministic way.
[Structure of the codec]
Figure 1 shows the typical processing for one input channel of audio data. A buffer stores one block of input samples, and an optimum set of parcor coefficients is calculated for each block. The number of coefficients, i.e. the order of the predictor, can be adaptively chosen as well. The quantized parcor values are entropy coded for transmission, and converted to LPC coefficients for the prediction filter which calculates the prediction residual. The residual is entropy coded using different entropy codes. The indices of the chosen codes have to be transmitted as side information.
Finally, a multiplexing unit combines coded residual, code indices, predictor coefficients and other additional information to form the compressed bitstream. The encoder also provides a CRC checksum, which is supplied mainly for the decoder to verify the decoded data. On the encoder side, the CRC can be used to ensure that the compressed data is losslessly decodable.
Additional encoder options comprise block length switching, random access and joint channel coding. The encoder may use these options to offer several compression levels with different complexities. The basic version of the encoder uses a fixed block length. Optionally, the encoder can switch between different block lengths to adapt to stationary regions as well as to transient segments of the audio signal. The codec allows random access in defined intervals down to some milliseconds, depending on the block length.
Furthermore, joint channel coding is used to exploit dependencies between channels of stereo or multi-channel signals. This can be achieved by coding the difference between two channels in those segments where this difference can be coded more efficiently than one of the original channels.
The entropy coding part of the prediction residual provides two alternative coding techniques with different complexities. Besides low complexity yet efficient Golomb-Rice coding, the BGMC arithmetic coding scheme offers even better compression at the expense of a slightly increased complexity.
Furthermore, the encoder will also offer efficient compression of floating-point audio data in the 32-bit IEEE format. This codec extension employs an algorithm that basically splits the floating-point signal into a truncated integer signal and a difference signal which contains the remaining fractional part. The integer signal is then compressed using the normal encoding scheme for PCM signals, while the difference signal is coded separately. A detailed description of the floating-point extension can be found.
Figure 2 shows the lossless audio signal decoder, which is significantly less complex than the encoder, since no adaptation has to be carried out. The decoder merely decodes the entropy coded residual and the parcor values, converts them into LPC coefficients, and applies the inverse prediction filter to calculate the lossless reconstruction signal.
The computational effort of the decoder mainly depends on the predictor orders chosen by the encoder. Since the average order is typically well below the maximum order, prediction with greater maximum orders does not necessarily lead to a significant increase of decoder complexity. In most cases, realtime decoding is possible even on low-end systems.
[Linear Prediction]
Linear prediction is used in many applications for speech and audio signal processing. In the following, only FIR predictors are considered.
Prediction with FIR Filters
The current sample of a time-discrete signal x(n) can be approximately predicted from previous samples x(n - k). The prediction is given by
x̂(n) = Σ (k = 1 to K) hk · x(n - k),   (1)
where K is the order of the predictor. If the predicted samples are close to the original samples, the residual
e(n) = x(n) - x̂(n)   (2)
has a smaller variance than x(n) itself, hence e(n) can be encoded more efficiently.
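For illustration only, the following Python sketch implements equations (1) and (2) directly; the signal and coefficient values are arbitrary examples and do not come from the described encoder.

    def predict_fir(x, h):
        """Predict each sample of x from the K previous samples, Eq. (1).
        Missing history at the start of the block is treated as zero here."""
        K = len(h)
        x_hat = []
        for n in range(len(x)):
            x_hat.append(sum(h[k - 1] * x[n - k] for k in range(1, K + 1) if n - k >= 0))
        return x_hat

    def residual(x, h):
        """Residual e(n) = x(n) - x_hat(n), Eq. (2)."""
        return [xn - xp for xn, xp in zip(x, predict_fir(x, h))]

    # Arbitrary second-order example: a smooth signal leaves a small residual
    print(residual([0, 3, 6, 8, 9, 9, 8, 6], h=[1.6, -0.7]))

In the codec itself the first samples of a block are predicted from preceding samples (or with progressive order at random access frames, as described below) rather than from zeros.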
The procedure of estimating the predictor coefficients from a segment of input samples, prior to filtering that segment, is referred to as forward adaptation. In that case, the coefficients have to be transmitted. If the coefficients are estimated from previously processed segments or samples, e.g. from the residual, we speak of backward adaptation. This procedure has the advantage that no transmission of the coefficients is needed, since the data required to estimate the coefficients is available to the decoder as well. Forward-adaptive prediction with orders around 10 is widely used in speech coding, and can be employed for lossless audio coding as well. The maximum order of most forward-adaptive lossless prediction schemes is still rather small, e.g. K = 32. An exception is the special 1-bit lossless codec for the Super Audio CD, which uses predictor orders of up to 128. On the other hand, backward-adaptive FIR filters with some hundred coefficients are commonly used in many areas, e.g. channel equalization and echo cancellation. Most systems are based on the LMS algorithm or a variation thereof, which has also been proposed for lossless audio coding. Such LMS-based coding schemes with high orders are applicable since the predictor coefficients do not have to be transmitted as side information, thus their number does not contribute to the data rate. However, backward- adaptive codecs have the drawback that the adaptation has to be carried out both in the encoder and the decoder, making the decoder significantly more complex than in the forward-adaptive case.
Forward-Adaptive Prediction
In forward-adaptive linear prediction, the optimal predictor coefficients hk (in terms of a minimized variance of the residual) are usually estimated for
each block by the autocorrelation method or the covariance method.
The autocorrelation method, using the Levinson-Durbin algorithm, has the additional advantage of providing a simple means to iteratively adapt the order of the predictor. Furthermore, the algorithm inherently calculates the corresponding parcor coefficients as well. Another crucial point in forward-adaptive prediction is to determine a suitable predictor order. Increasing the order decreases the variance of the
prediction error, which leads to a smaller bit rate Re for the residual. On the other hand, the bit rate Rc for the predictor coefficients will rise with the number of coefficients to be transmitted. Thus, the task is to find the optimum order which minimizes the total bit rate. This can be expressed by minimizing
Rtotal(K) = Re(K) + Rc(K)   (3)
with respect to the prediction order K. As the prediction gain rises monotonically with higher orders, Re decreases with K. On the other hand, Rc rises monotonically with K, since an increasing number of coefficients have to be transmitted.
The search for the optimum order can be carried out efficiently by the
Levinson-Durbin algorithm, which determines recursively all predictors with increasing order. For each order, a complete set of predictor coefficients is
calculated. Moreover, the variance σe² of the corresponding residual can be
derived, resulting in an estimate of the expected bit rate for the residual. Together with the bit rate for the coefficients, the total bit rate can be determined in each iteration, i.e. for each predictor order. The optimum order is found at the point where the total bit rate no longer decreases.
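This order search can be sketched as follows. The Levinson-Durbin recursion is standard, but the two bit-rate estimates are simple stand-ins (roughly 4 bits per coefficient, as mentioned below for the parcor values, and a variance-based guess for the residual), not the estimates used by any particular encoder.

    import math

    def levinson_durbin(r, K_max):
        """Levinson-Durbin recursion on autocorrelation values r[0..K_max].
        Returns the residual variance and the parcor coefficient for each order."""
        err = float(r[0])
        a, parcors, variances = [], [], []
        for k in range(1, K_max + 1):
            acc = r[k] - sum(a[j] * r[k - 1 - j] for j in range(len(a)))
            ref = acc / err if err > 0 else 0.0
            a = [a[j] - ref * a[k - 2 - j] for j in range(len(a))] + [ref]
            err *= (1.0 - ref * ref)
            parcors.append(ref)
            variances.append(err)
        return variances, parcors

    def optimum_order(r, K_max, N_B, coeff_bits=4.0):
        """Pick the order where Rtotal = Re + Rc stops decreasing, Eq. (3)."""
        variances, _ = levinson_durbin(r, K_max)
        best_k, best_total = 0, float("inf")
        for k in range(1, K_max + 1):
            # rough Gaussian-entropy proxy for the residual bits of an N_B-sample block
            R_e = 0.5 * N_B * math.log2(2 * math.pi * math.e * max(variances[k - 1], 1e-12))
            R_c = coeff_bits * k
            if R_e + R_c >= best_total:
                break                      # total bit rate no longer decreases
            best_total, best_k = R_e + R_c, k
        return best_k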
While it is obvious from equation (3) that the coefficient bit rate has a direct effect on the total bit rate, a slower increase of Rc also allows shifting the minimum of Rtotal to higher orders (where Re is smaller as well), which would lead to better compression. Hence, efficient yet accurate quantization of the predictor coefficients plays an important role in achieving maximum compression.
Quantization of Predictor Coefficients
Direct quantization of the predictor coefficients hk is not very efficient for transmission, since even small quantization errors may result in large deviations from the desired spectral characteristics of the optimum prediction filter. For this reason, the quantization of predictor coefficients is based on the
parcor (reflection) coefficients rk , which can be calculated by means of the
Levinson-Durbin algorithm. In that case, the resulting values are restricted to the interval [-1 , 1]. Although parcor coefficients are less sensitive to quantization, they are still too sensitive when their magnitude is close to unity.
The first two parcor coefficients r1 and r2 are typically very close to -1 and
+1 , respectively, while the remaining coefficients rk , k > 2, usually have
smaller magnitudes. The distributions of the first coefficients are very different, but high-order coefficients tend to converge to a zero-mean gaussian-like distribution (Figure 3).
Therefore, only the first two coefficients are companded based on the following function:
[Equation (4): definition of the compander function C(r)]
This compander results in a significantly finer resolution at r1 → -1, whereas -C(-r2) can be used to provide a finer resolution at r2 → +1 (see
Figure 4).
However, in order to simplify computation, +C(-r2 ) is actually used for
the second coefficient, leading to an opposite sign of the companded value.
The two companded coefficients are then quantized using a simple 7-bit uniform quantizer. This results in the following values:
a1 = ⌊64 C(r1)⌋   (5)
a2 = ⌊64 C(-r2)⌋   (6)
The remaining coefficients rk, k > 2 are not companded but simply
quantized using a 7-bit uniform quantizer again:
ak = ⌊64 rk⌋   (7)
In all cases the resulting quantized values ak are restricted to the range [-64, +63]. These quantized coefficients are re-centered around their most probable values, and then encoded using Golomb-Rice codes. As a result, the average bit rate of the encoded parcor coefficients can be reduced to approximately 4 bits/coefficient, without noticeable degradation of the spectral characteristics. Thus, it is possible to employ very high orders up to K = 1023, preferably in conjunction with large block lengths.
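A sketch of this quantization in Python is given below. The closed form used for the compander, C(r) = -1 + sqrt(2(r + 1)), is an assumption chosen to match the stated behaviour (finer resolution towards r = -1); the 7-bit quantization and the range restriction follow the text directly.

    import math

    def compand(r):
        # Assumed compander form: C(r) = -1 + sqrt(2 * (r + 1)), fine resolution near r = -1
        return -1.0 + math.sqrt(2.0 * (r + 1.0))

    def quantize_parcor(parcor):
        """7-bit uniform quantization of parcor coefficients r1, r2, ..., rK."""
        quantized = []
        for k, r in enumerate(parcor, start=1):
            if k == 1:
                q = math.floor(64 * compand(r))     # C(r1): fine near r1 -> -1
            elif k == 2:
                q = math.floor(64 * compand(-r))    # +C(-r2): fine near r2 -> +1
            else:
                q = math.floor(64 * r)              # Eq. (7)
            quantized.append(max(-64, min(63, q)))  # restrict to [-64, +63]
        return quantized

    # Arbitrary example coefficients
    print(quantize_parcor([-0.97, 0.95, 0.3, -0.1]))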
However, the direct form predictor filter uses predictor coefficients hk
according to Eq. (1 ). In order to employ identical coefficients in the encoder
and the decoder, these hk values have to be derived from the quantized ak
values in both cases (see Figures 1 and 2). While it is up to the encoder how to determine a set of suitable parcor coefficients, a lossless coding method specifies an integer-arithmetic function for conversion between quantized
values ak and direct predictor coefficients hk which ensures their identical reconstruction in both encoder and decoder.
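The corresponding decoder-side steps can be sketched as follows. The integer-arithmetic conversion function itself is not reproduced here; the floating-point step-up recursion below only illustrates the parcor-to-LPC relationship, and the de-companding assumes the compander form used in the previous sketch.

    def dequantize_parcor(quantized):
        """Reconstruct approximate parcor values from the 7-bit values ak."""
        parcor = []
        for k, a in enumerate(quantized, start=1):
            c = a / 64.0
            if k == 1:
                parcor.append((c + 1.0) ** 2 / 2.0 - 1.0)      # inverse of C(r)
            elif k == 2:
                parcor.append(-((c + 1.0) ** 2 / 2.0 - 1.0))   # inverse of C(-r)
            else:
                parcor.append(c)
        return parcor

    def parcor_to_lpc(parcor):
        """Step-up recursion from reflection (parcor) coefficients to the
        direct-form coefficients hk of Eq. (1); floating point, illustrative only."""
        h = []
        for m, r in enumerate(parcor, start=1):
            h = [h[j] - r * h[m - 2 - j] for j in range(m - 1)] + [r]
        return h

    # e.g. quantized values such as those produced by the previous sketch
    print(parcor_to_lpc(dequantize_parcor([-49, -44, 19])))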
Block Length Switching
Embodiments relate to encoders, decoders, methods of encoding, and methods of decoding. In embodiments, an encoder is at least one of an audio encoder, and an Audio Lossless Coding encoder. In embodiments, a method of encoding is implemented in at least one of an audio encoder, and an Audio Lossless Coding encoder. In embodiments, a decoder is at least one of an audio decoder, and an Audio Lossless Coding decoder. In embodiments, a method of decoding is implemented in at least one of an audio decoder, and an Audio Lossless Coding decoder.
<Hierarchical Block Switching>
Embodiments relate to a block switching mechanism which subdivides a frame of audio data into four quarter-length blocks, instead of encoding it as one single block. Switching between one long and four short blocks may be performed adaptively on a frame-by-frame basis.
Even though this switching mechanism may enable a higher compression ratio than using a constant block length, there may be some drawbacks. For example, if only 1:4 switching is possible, 1:2 or 1:8 switching
(and combinations thereof) may be more efficient in some cases, in accordance with embodiments. For example, if switching is done identically for all channels, there may be challenges if different channels require different switching, in accordance with embodiments. For example, since a more flexible block switching scheme enables the use of a wide range of block lengths (including very long ones), even higher maximum predictor orders may be feasible, in accordance with embodiments.
In embodiments, a more flexible, hierarchical block switching scheme allows for up to six different block lengths (differing by factors of two) within a frame. In embodiments, independent block switching for each channel may be implemented (e.g. each channel pair may be switched independently in the case of joint channel coding). In embodiments, a maximum predictor order of 1023 may be implemented.
In embodiments, the same compression can be achieved with relatively low decoder complexity, which also allows higher compression at the same complexity.
Audio Lossless Coding (ALS) includes a relatively simple block switching mechanism. Each frame of N samples is either encoded using one
full-length block (NB = N) or four blocks of length NB = N/4, where the same
block partition applies to all channels. Under some circumstances, this scheme may have some limitations. For example, only 1:4 switching may be possible, although different switching (e.g. 1:2, 1:8, and combinations thereof) may be more efficient in some cases. For example, switching is done identically for all channels, although different channels may require different switching (which is especially true if the channels are not correlated).
In embodiments, a relatively flexible block switching scheme may be implemented, where each frame can be hierarchically subdivided into many blocks. For example, Figure 5 illustrates a frame which can be hierarchically
subdivided into up to 32 blocks. Arbitrary combinations of blocks with NB = N,
N/2, N/4, N/8, N/16, and N/32 may be possible within a frame, as long as each block results from a subdivision of a superordinate block of double length, in accordance with embodiments. For example, as illustrated in Figure 5, a partition into N/4 + N/4 + N/2 may be possible, while a partition into N/4 + N/2 + N/4 may not be possible.
In embodiments, the actual partition may be signaled in an additional field block switching information(bs_info) (illustrated in the right column of Figure 6), where the length depends on the number of block switching levels. Table 1 illustrates an example relationship of the maximum number of levels,
the minimum NB , and the number of bytes used for bs_info.
Table 1: Block switching levels.
Maximum number of levels   Minimum block length   Size of bs_info
3                          NB = N/8               1 byte
4                          NB = N/16              2 bytes
5                          NB = N/32              4 bytes
The bs_info field may include up to 4 bytes, in accordance with embodiments. The mapping of bits with respect to the levels 1 to 5 may be [(0)1223333 44444444 55555555 55555555]. The first bit may be reserved for indicating independent block switching. In the example of Figure 6, there are
three levels, thus the minimum block length is NB = N/8, and bs_info
consists of one byte. Starting at the maximum block length NB = N, the bits
of bs_info are set if a block is further subdivided. For the topmost example there is no subdivision at all, thus the code is (0)0000000. The frame in the second row is subdivided ((0)1...), where only the second block of length N/2 is further split ((0)101...) into two blocks of length N/4. If an N/4 block is split as in the fourth row, it is indicated in the following bits ((0)111 0100). In each frame, bs_info fields may be transmitted for all channel pairs
(CPEs) and all single channels (SCEs), enabling independent block switching for different channels, in accordance with embodiments.
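A decoder-side sketch of this signalling is shown below for the one-byte case (up to three block switching levels). The placement of the bit for the block at depth d and position i at bit index 2^d + i is an assumption, chosen so that the example codes quoted above decode to the partitions described in the text.

    def decode_bs_info(bs_info, N, max_depth=3):
        """Expand a one-byte bs_info value into (independent_bs, block lengths)."""
        bits = [(bs_info >> (7 - p)) & 1 for p in range(8)]   # MSB first
        independent_bs = bits[0]                              # reserved first bit

        def walk(depth, index):
            # a block is split into two halves if its bit is set
            if depth < max_depth and bits[2 ** depth + index]:
                return walk(depth + 1, 2 * index) + walk(depth + 1, 2 * index + 1)
            return [N >> depth]

        return independent_bs, walk(0, 0)

    # The example codes from the text, for N = 4096:
    print(decode_bs_info(0b00000000, 4096))   # (0)0000000 -> one block of length N
    print(decode_bs_info(0b01010000, 4096))   # (0)101...  -> N/2 + N/4 + N/4
    print(decode_bs_info(0b01110100, 4096))   # (0)1110100 -> N/4 + N/8 + N/8 + N/4 + N/4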
<Independent Block Switching>
In Independent Block Switching, while the frame length is identical for all channels, block switching can be done individually for each channel, in accordance with embodiments. If difference coding is used, both channels of a channel pair should be switched synchronously, but other channel pairs can still use different block switching. If the two channels of a channel pair are not correlated with each other, difference coding may not pay off, and thus there will be no need to switch both channels synchronously. Accordingly, if the two channels of a channel pair are not correlated with each other, switching the channels synchronously may not be practical.
There may be a bs_info field for each CPE and SCE in a frame (e.g. the two channels of a CPE are switched synchronously), in accordance with embodiments. If they are switched independently, the first bit of bs_info may be set to 1, and the information applies to the CPE's first channel. In this example, another bs_info field for the second channel becomes necessary.
In embodiments, as a result of the increased flexibility, the blocks in the bit stream can be dynamically arranged. As illustrated in Figure 7, all channels use the same partition (e.g. either one long or four short blocks) and corresponding short blocks of different channels are arranged successively (e.g. blocks 1.1, 2.1, and 3.1), leading to an interleaved structure. In the embodiments illustrated in Figure 8, short blocks are only interleaved if they belong to a channel pair that uses difference coding and therefore synchronized block switching (e.g. the middle row of Figure 8). This interleaving may be beneficial, since in a channel pair a block of one channel (e.g. block 1.2) may depend on previous blocks from both channels (e.g. blocks 1.1 and 2.1), so these previous blocks may need to be available prior to the current one. For channels whose blocks are switched independently, channel data can be arranged separately (e.g. bottom row of Figure 8).
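The two arrangements can be sketched as follows; the block labels mirror the Figure 8 numbering, and the function is only a structural illustration of the ordering, not of the actual bit stream syntax.

    def arrange_channel_pair(blocks_ch1, blocks_ch2, synchronized):
        """Order the blocks of a channel pair for the bit stream: interleave
        corresponding blocks when the pair is switched synchronously (and may
        use difference coding), otherwise keep each channel's data together."""
        if synchronized:
            ordered = []
            for b1, b2 in zip(blocks_ch1, blocks_ch2):
                ordered += [b1, b2]
            return ordered
        return list(blocks_ch1) + list(blocks_ch2)

    print(arrange_channel_pair(["1.1", "1.2"], ["2.1", "2.2"], synchronized=True))
    # -> ['1.1', '2.1', '1.2', '2.2']: block 1.2 follows the blocks it may depend on
    print(arrange_channel_pair(["1.1", "1.2", "1.3", "1.4"], ["2.1"], synchronized=False))
    # -> each channel's blocks stay together when switching is independent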
<Higher Predictor Orders>
Embodiments relate to higher predictor orders. Absent hierarchical block switching, there may be a factor of 4 between the long and the short block length (e.g. 4096 & 1024 or 8192 & 2048), in accordance with embodiments. In embodiments (e.g. where hierarchical block switching is implemented), this factor can be increased (e.g. up to 32), enabling a larger range (e.g. 16384 down to 512 or even 32768 to 1024 for high sampling rates). In embodiments, in order to make better use of very long blocks, higher maximum predictor orders may be employed. The maximum order may be
Kmax = 1023. In embodiments, Kmax may be bound by the block length NB, where Kmax < NB / 8 (e.g. Kmax = 255 for NB = 2048). Therefore, using Kmax = 1023 may require a block length of at least NB = 8192. In embodiments, the max_order field in the file header is 10 bits. In embodiments, the opt_order field of the block data is 10 bits. The actual number of bits in a particular block may depend on the maximum order allowed for a block. If the block is short, this local maximum order may be smaller than the global maximum order (stated in max_order in the file header). For example, if Kmax = 1023, but NB = 2048, the opt_order field is 8 bits (instead of 10) due to a maximum local order of 255.
The opt_order is determined based on the following equation: opt_order = min(global prediction order, local prediction order), where the global prediction order is determined from max_order and the local prediction order is determined from the length of the block. In detail, the global and local prediction orders are determined by global prediction order = ceil(log2(maximum prediction order + 1)) and local prediction order = max(ceil(log2((NB >> 3) - 1)), 1). In embodiments, it is necessary to predict data samples of the subdivided block from the channel. A first sample of a current block is predicted using the last K samples of a previous block. The K value is determined from the opt_order, which is derived from the above equation.
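Read as field widths, these expressions can be sketched as follows; interpreting the two quantities as numbers of bits (rather than as orders) is an assumption that reproduces the 10-bit and 8-bit examples given above.

    import math

    def opt_order_bits(max_order, N_B):
        """Bits spent on the opt_order field of a block:
        min(ceil(log2(max_order + 1)), max(ceil(log2((N_B >> 3) - 1)), 1))."""
        global_bits = math.ceil(math.log2(max_order + 1))
        local_bits = max(math.ceil(math.log2((N_B >> 3) - 1)), 1)
        return min(global_bits, local_bits)

    print(opt_order_bits(1023, 8192))   # 10 bits: the full order range is available
    print(opt_order_bits(1023, 2048))   # 8 bits: the local maximum order is 255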
If the current block is a channel's first block, no samples from the previous block may be used. In this case, prediction with progressive order is employed, where the scaled parcor coefficients are converted progressively to
LPC coefficients inside the prediction filter.
Random Access
Random access stands for fast access to any part of the encoded audio signal without costly decoding of previous parts. It is an important feature for applications that employ seeking, editing, or streaming of the compressed data. In order to enable random access, the encoder has to insert frames that can be decoded without decoding previous frames. In those random access frames, no samples from previous frames may be used for prediction.
The distance between random access frames can be chosen between one and 255 frames. Depending on frame length and sampling rate, random access down to some milliseconds is possible.
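As a simple illustration of that trade-off (all values below are chosen only as examples), the access granularity follows directly from the frame length, the sampling rate, and the chosen random access distance:

    #include <stdio.h>

    /* Illustration only: granularity (ms) of random access given a frame
     * length in samples, a sampling rate in Hz, and the distance between
     * random access frames (1..255 frames). */
    int main(void)
    {
        int frame_length = 2048;     /* samples per frame (example value)   */
        int sample_rate  = 48000;    /* Hz                                  */
        int ra_distance  = 1;        /* random access frame every frame     */

        double ms = 1000.0 * frame_length * ra_distance / sample_rate;
        printf("random access granularity: %.1f ms\n", ms);   /* ~42.7 ms   */
        return 0;
    }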
However, prediction at the beginning of random access frames still constitutes a problem. A conventional K-th order predictor would normally need K samples from the previous frame in order to predict the current frame's first sample. Since samples from previous frames may not be used, the encoder either has to assume zeros or has to transmit the first K original samples directly, starting the prediction at position K + 1.
As a result, compression at the beginning of random access frames would be poor. In order to minimize this problem, the codec uses progressive prediction, which makes use of as many available samples as possible. While it is of course not feasible to predict the first sample of a random access frame, we can use first-order prediction for the second sample, second-order prediction for the third sample, and so forth, until the samples from position K + 1 on are predicted using the full K-th order predictor. Since the predictor
coefficients h_k are calculated recursively from the quantized parcor coefficients a_k anyway, it is possible to calculate each coefficient set from orders 1 to K without additional costs.
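A sketch of this progressive prediction is shown below; the helper predict_sample and the layout of the per-order coefficient sets are assumptions made for the example, and the quantization of residuals is omitted.

    #include <stdio.h>

    /* Progressive prediction at the start of a random access frame:
     * sample 0 cannot be predicted, sample 1 uses a 1st-order predictor,
     * sample 2 a 2nd-order predictor, and so on, until the full order K is
     * reached from position K onwards.  lpc[m] is assumed to hold the
     * coefficient set of order m+1 (hypothetical layout). */
    static double predict_sample(const double *x, int n, const double *a, int order)
    {
        double p = 0.0;
        for (int i = 0; i < order; i++)
            p += a[i] * x[n - 1 - i];           /* uses the last 'order' samples */
        return p;
    }

    static void residual_with_progressive_order(const double *x, double *e,
                                                int nb, double lpc[][1024], int K)
    {
        e[0] = x[0];                            /* first sample sent as is       */
        for (int n = 1; n < nb; n++) {
            int order = n < K ? n : K;          /* progressive order up to K     */
            e[n] = x[n] - predict_sample(x, n, lpc[order - 1], order);
        }
    }

    int main(void)
    {
        double x[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
        double e[8];
        static double lpc[2][1024];
        lpc[0][0] = 1.0;                        /* order-1 coefficient set       */
        lpc[1][0] = 2.0; lpc[1][1] = -1.0;      /* order-2 coefficient set       */
        residual_with_progressive_order(x, e, 8, lpc, 2);
        for (int n = 0; n < 8; n++)
            printf("e[%d] = %g\n", n, e[n]);    /* residual vanishes from n = 2  */
        return 0;
    }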
In the case of 500 ms random access intervals, this scheme produces an absolute overhead of only 0.01-0.02% compared to continuous prediction without random access.
Joint Channel Coding
Joint channel coding can be used to exploit dependencies between the two channels of a stereo signal, or between any two channels of a multi-channel signal. While it is straightforward to process two channels x1(n) and x2(n) independently, a simple way to exploit dependencies between these channels is to encode the difference signal

d(n) = x2(n) - x1(n)    (8)

instead of x1(n) or x2(n). Switching between x1(n), x2(n) and d(n) in each block can be carried out by comparison of the individual signals, depending on which two signals can be coded most efficiently (see Figure 9). Such prediction with switched difference coding is beneficial in cases where two channels are very similar. In the case of multi-channel material, the channels can be rearranged by the encoder in order to assign suitable channel pairs.
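The per-block switching decision can be sketched as follows; the cost measure (a sum of magnitudes standing in for the true coded size) and the mode names are illustrative assumptions rather than the encoder's actual decision rule.

    #include <stdio.h>
    #include <stdlib.h>

    enum pair_mode { CODE_X1_X2, CODE_X1_D, CODE_D_X2 };

    /* Very rough per-block decision between coding (x1, x2), (x1, d) or (d, x2),
     * where d(n) = x2(n) - x1(n).  A real encoder would compare estimated coded
     * sizes; here the sum of magnitudes is used as a crude proxy. */
    static long long cost(const int *s, int nb)
    {
        long long c = 0;
        for (int n = 0; n < nb; n++)
            c += llabs((long long)s[n]);
        return c;
    }

    static enum pair_mode choose_pair_mode(const int *x1, const int *x2,
                                           int *d, int nb)
    {
        for (int n = 0; n < nb; n++)
            d[n] = x2[n] - x1[n];                /* difference signal d(n)        */

        long long c1 = cost(x1, nb), c2 = cost(x2, nb), cd = cost(d, nb);

        if (cd >= c1 && cd >= c2)
            return CODE_X1_X2;                   /* channels not similar enough   */
        return c1 <= c2 ? CODE_X1_D : CODE_D_X2; /* replace the costlier channel  */
    }

    int main(void)
    {
        int x1[4] = { 100, 101, 102, 103 };
        int x2[4] = { 101, 102, 103, 104 };
        int d[4];
        printf("mode = %d\n", (int)choose_pair_mode(x1, x2, d, 4));
        return 0;
    }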
Besides simple difference coding, the lossless audio codec also supports a more complex scheme for exploiting inter-channel redundancy between arbitrary channels of multi-channel signals.
Entropy Coding of the Residual
In simple mode, the residual values e(n) are entropy coded using
Rice codes. For each block, either all values can be encoded using the same Rice code, or the block can be further divided into four parts, each encoded with a different Rice code. The indices of the applied codes have to be transmitted, as shown in Figure 1. Since there are different ways to determine the optimal Rice code for a given set of data, it is up to the encoder to select suitable codes depending on the statistics of the residual.
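For reference, a plain Rice encoder for signed residuals is sketched below; the zig-zag mapping of signed values to unsigned indices is a common convention and is shown here as an assumption, not as the codec's normative mapping.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical bit sink: prints bits instead of packing them. */
    static void put_bit(int b) { putchar(b ? '1' : '0'); }

    /* Map a signed residual to an unsigned index: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ... */
    static uint32_t zigzag(int32_t e)
    {
        return e >= 0 ? (uint32_t)e << 1 : (((uint32_t)(-(int64_t)e)) << 1) - 1;
    }

    /* Rice code with parameter k: unary quotient followed by k remainder bits. */
    static void rice_encode(int32_t e, unsigned k)
    {
        uint32_t u = zigzag(e);
        uint32_t q = u >> k;

        for (uint32_t i = 0; i < q; i++)        /* quotient in unary             */
            put_bit(0);
        put_bit(1);                             /* terminator                    */
        for (int i = (int)k - 1; i >= 0; i--)   /* remainder, k bits             */
            put_bit((u >> i) & 1);
    }

    int main(void)
    {
        int32_t residuals[] = { 0, -1, 3, -4 };
        for (int i = 0; i < 4; i++) {           /* same Rice parameter per block */
            rice_encode(residuals[i], 2);
            putchar(' ');
        }
        putchar('\n');
        return 0;
    }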
Alternatively, the encoder can use a more complex and efficient coding scheme called BGMC (Block Gilbert-Moore Codes). In BGMC mode, the encoding of residuals is accomplished by splitting the distribution into two categories (Figure 10): residuals that belong to a central region of the distribution, |e(n)| < e_max, and ones that belong to its tails. The residuals in the tails are simply re-centered (i.e. for e(n) > e_max we have ê(n) = e(n) - e_max) and encoded using Rice codes as described earlier.
However, to encode residuals in the center of the distribution, the BGMC encoder splits them into LSB and MSB components first, then it encodes the MSBs using block Gilbert-Moore (arithmetic) codes, and finally it transmits the LSBs using direct fixed-length codes. Both the parameter e_max and the number of directly transmitted LSBs are selected such that they only slightly affect the coding efficiency of this scheme, while making it significantly less complex.
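The partitioning described above can be sketched as follows; the block Gilbert-Moore coding of the MSBs itself is omitted, and the symmetric re-centering of negative tail values as well as the printed output are assumptions made only for illustration.

    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch of the BGMC-mode partitioning: residuals with |e(n)| < e_max
     * belong to the centre of the distribution and are split into MSBs and
     * LSBs (assuming arithmetic right shift for negative values); the
     * remaining "tail" residuals are re-centred towards zero and would be
     * Rice-coded as in the simple mode. */
    static void bgmc_partition(const int *e, int nb, int e_max, int lsb_bits)
    {
        for (int n = 0; n < nb; n++) {
            if (abs(e[n]) < e_max) {
                int msb = e[n] >> lsb_bits;              /* arithmetic-coded part */
                int lsb = e[n] & ((1 << lsb_bits) - 1);  /* sent as fixed length  */
                printf("centre: msb=%d lsb=%d\n", msb, lsb);
            } else {
                int t = e[n] > 0 ? e[n] - e_max : e[n] + e_max;
                printf("tail:   %d\n", t);               /* then Rice-coded       */
            }
        }
    }

    int main(void)
    {
        int e[] = { 3, -2, 40, -37, 0 };
        bgmc_partition(e, 5, 16, 2);
        return 0;
    }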
[Compression Results]
In the following, the lossless audio codec is compared with two of the most popular programs for lossless audio compression: the open-source codec FLAC, which also uses forward-adaptive prediction, and Monkey's Audio (MAC 3.97), a backward-adaptive codec that represents the current state of the art in terms of compression. Both codecs were run with options providing maximum compression (flac -8 and mac -c4000). The results for the encoder were determined for a medium compression level (with the prediction order restricted to K ≤ 60) and a maximum compression level (K ≤ 1023), both with random access of 500 ms. The tests were conducted on a 1.7 GHz Pentium-M system with 1024 MB of memory. The test material comprises nearly 1 GB of stereo waveform data with sampling rates of 48, 96, and 192 kHz, and resolutions of 16 and 24 bits.
[Compression Ratio]
In the following, the compression ratio is defined as

c = (CompressedFileSize / OriginalFileSize) · 100%,

where smaller values mean better compression. The results for the examined audio formats are shown in Table 2 (192 kHz material is not supported by the FLAC codec).
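For completeness, a trivial sketch of this calculation with made-up file sizes:

    #include <stdio.h>

    /* Compression ratio as defined above (illustrative file sizes). */
    int main(void)
    {
        double compressed = 512.0, original = 1024.0;            /* e.g. sizes in MB */
        printf("c = %.1f%%\n", 100.0 * compressed / original);   /* 50.0%            */
        return 0;
    }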
Table 2: Comparison of average compression ratios for different audio formats (kHz/bits)
The results show that ALS at maximum level outperforms both FLAC and Monkey's Audio for all formats, particularly for high-definition material (i.e. 96 kHz / 24-bit and above). Even at medium level, ALS delivers the best overall compression.
[Complexity]
The complexity of different codecs strongly depends on the actual implementation, particularly that of the encoder. As mentioned earlier, the audio signal encoder of the present invention is just a snapshot of an ongoing development. Thus, we restrict our analysis to the decoder, a simple C code implementation with no further optimizations. The compressed data was generated by the currently best encoder implementation. The average CPU load for real-time decoding of various audio formats, encoded at different complexity levels, is shown in Table 3. Even for maximum complexity, the CPU load of the decoder is only around 20-25%, which in turn means that file-based decoding is at least 4-5 times faster than real-time.
Table 3: Average CPU load (percentage on a 1.7 GHz Pentium-M), depending on audio format (kHz/bits) and ALS encoder complexity.
The codec is designed to offer a large range of complexity levels. While the maximum level achieves the highest compression at the expense of the slowest encoding and decoding speed, the faster medium level only slightly degrades compression, but decoding is significantly less complex than for the maximum level (around 5% CPU load for 48 kHz material). Using a low-complexity level (K ≤ 15, Rice coding) degrades compression by only 1-1.5% compared to the medium level, but the decoder complexity is further reduced by a factor of three (less than 2% CPU load for 48 kHz material). Thus, audio data can be decoded even on hardware with very low computing power.
While the encoder complexity may be increased by both higher maximum orders and a more elaborate block switching algorithm (in accordance with embodiments), the decoder may be affected by a higher average predictor order.
The foregoing embodiments (e.g. hierarchical block switching, or schemes in accordance with embodiments with Kmax = 127) and advantages are merely examples and are not to be construed as limiting the appended claims. The above teachings can be applied to other apparatuses and methods, as would be appreciated by one of ordinary skill in the art. Many alternatives, modifications, and variations will be apparent to those skilled in the art.
[Syntax]
The present invention relates to the syntax comprised in the encoded bit stream. The syntax is as follows:
File Header: The block_switching field is extended from 1 to 2 bits, and the max_order field is extended from 8 to 10 bits. The frame_length and user_frame_length fields are merged, resulting in a frame_length field of 16 bits, while the user_frame_length field is removed.
Table 4: Syntax of als header
Frame Data: If block switching is used, the bs_info field is added. Depending on the value of block_switching, it has 8, 16, or 32 bits. The first bit of a CPE's bs_info field holds the independent_bs flag. The number of blocks is implicitly derived from bs_info as well. If block_switching is off, there is no bs_info field; thus blocks is one and independent_bs is zero.
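Purely as an illustration of how a block count can follow implicitly from hierarchical subdivision flags, the sketch below assumes a breadth-first layout of one flag per node of the subdivision tree (1 = split into two blocks of half length, 0 = leaf) and ignores the leading independent_bs bit; the exact bit assignment inside bs_info is defined by the syntax tables and may differ from this layout.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative sketch only: count the number of (leaf) blocks described by
     * a set of hierarchical subdivision flags, laid out breadth-first.  Each
     * valid split adds exactly one more block to the total. */
    static int count_blocks(const uint8_t *flags, int levels)
    {
        int blocks = 1;                   /* start with one long block          */
        int node = 0;

        for (int level = 0; level < levels; level++) {
            int nodes_at_level = 1 << level;
            for (int i = 0; i < nodes_at_level; i++, node++)
                if (flags[node])
                    blocks++;             /* each split adds one more block     */
        }
        return blocks;
    }

    int main(void)
    {
        /* root split, then only its first half split again: 3 blocks           */
        uint8_t flags[] = { 1, 1, 0 };
        printf("blocks = %d\n", count_blocks(flags, 2));
        return 0;
    }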
In order to improve readability, both new and old syntax are shown separately in the following table, instead of mixing new with old syntax elements.
Table 5: Syntax of frame_data
(The first rows of the table appear as an image in the original document; the remainder of the syntax is reproduced below.)

        CPE = channels / 2
        SCE = channels % 2
    else
        SCE = channels
    for (cp = 0; cp < CPE; cp++){
        if (block_switching){
            bs_info                                8,16,32    uimsbf
        }
        if (independent_bs){
            for (c = 0; c < 2; c++){
                if (c == 1){
                    bs_info                        8,16,32    uimsbf
                }
                for (b = 0; b < blocks; b++){
                    block_header()
                    block_data()
                }
            }
        }
        else{
            for (b = 0; b < blocks; b++){
                for (c = 0; c < 2; c++){
                    block_header()
                    block_data()
                }
            }
        }
    }
    for (sc = 0; sc < SCE; sc++){
        if (block_switching){
            bs_info                                8,16,32    uimsbf
        }
        for (b = 0; b < blocks; b++){
            block_header()
            block_data()
        }
        if (inter_channel_correlation){
            channel_data(c)
        }
    }

Block Header: The short_blocks field is removed, since block switching information is completely transmitted on frame level (bs_info, see previous paragraph).
Table 6: Syntax of blockjieader
Block Data: The opt_order field is extended to a maximum of 10 bits (previously 8 bits).
Table 7: Syntax of block_data
[Semantics]
File Header:
Table 8: Elements of als header
Frame Data:
Table 9: Elements of frame data
Table 10: Elements of block header
Table 11 : Elements of block data
Industrial Applicability
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. For example, the present invention can be adapted to another audio signal codec, such as a lossy audio signal codec. Thus, it is intended that the present invention covers the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims

[CLAIMS]
1. A method of processing an audio signal, the method comprising: subdividing a channel of an audio data frame into a plurality of blocks, wherein at least two of the subdivided blocks have different lengths; and determining an optimum prediction order for each subdivided block based on a maximum prediction order and a length (Nb) of each subdivided block.
2. The method of claim 1 , further comprising predicting data samples of each subdivided block using the optimum prediction order.
3. The method of claim 2, further comprising obtaining a residual of each subdivided block using the predicted data samples.
4. The method of claim 1 , wherein the optimum prediction order is determined based on the following equation:
optimal prediction order = min (global prediction order, local prediction order),
where the global prediction order is determined from the maximum prediction order and the local prediction order is determined from the length of each subdivided block.
5. The method of claim 4, wherein the global and local prediction orders are determined by:
global prediction order = ceil(log2(maximum prediction order +1 )), and
local prediction order = max(ceil(log2((Nb>>3)-1)), 1).
6. The method of claim 1 , wherein the plurality of blocks are subdivided hierarchically at one or more block switching levels, and each block results from a subdivision of a superordinate block of double length.
7. The method of claim 6, further comprising generating block switching information indicating how the blocks are subdivided at the block switching levels.
8. The method of claim 7, wherein a length of each block is any one of N/2, N/4, N/8, N/16, and N/32.
9. The method of claim 7, wherein a length of the block switching information is determined based on a number of the block switching levels.
10. The method of claim 7, wherein the block switching information includes a series of information bits representing how the blocks are subdivided at the block switching levels.
11. The method of claim 10, wherein each information bit has a value of 1 when a block is subdivided at a corresponding block switching level and has a value of 0 when the block is not subdivided at the corresponding block switching level.
12. The method of claim 7, further comprising transmitting the block switching information.
13. The method of claim 1 , further comprising predicting data samples of the blocks subdivided from the channel, wherein a first sample of a current block is predicted using the last K samples of a previous block.
14. The method of claim 13, wherein a first sample of the current block is predicted using prediction with progressive order when the current block is a foremost block of the channel.
15. A method of encoding an audio signal, the method comprising: subdividing a channel of an audio data frame into a plurality of blocks hierarchically at one or more block switching levels, each block resulting from
a subdivision of a superordinate block of double length; and determining an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block.
16. A method of decoding an audio signal, the method comprising: receiving an audio data frame having at least one channel, each channel being subdivided into a plurality of blocks hierarchically at one or more block switching levels, each block resulting from a subdivision of a superordinate block of double length; parsing an optimum prediction order from each subdivided block; and reconstructing data samples of each subdivided block using the optimum prediction order.
17. An apparatus of encoding an audio signal, the apparatus comprising: an encoder configured to subdivide a channel of an audio data frame into a plurality of blocks hierarchically at one or more block switching levels, each block resulting from a subdivision of a superordinate block of double length, wherein the encoder is further configured to determine an optimum prediction order for each subdivided block based on a maximum prediction order and a length of each subdivided block.
18. An apparatus of decoding an audio signal, the apparatus
comprising: a decoder configured to receive an audio data frame having at least one channel, each channel being subdivided into a plurality of blocks hierarchically at one or more block switching levels, wherein the decoder is further configured to parse an optimum prediction order from each subdivided block, and to reconstruct data samples of each block using the parsed optimum prediction order.
PCT/KR2005/002292 2005-07-11 2005-07-16 Apparatus and method of encoding and decoding audio signal WO2007011080A1 (en)

Priority Applications (93)

Application Number Priority Date Filing Date Title
PCT/KR2005/002292 WO2007011080A1 (en) 2005-07-16 2005-07-16 Apparatus and method of encoding and decoding audio signal
US11/481,926 US7949014B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,915 US7996216B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,927 US7835917B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
US11/481,932 US8032240B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
US11/481,916 US8108219B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,917 US7991272B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
US11/481,941 US8050915B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signals using hierarchical block switching and linear prediction coding
US11/481,931 US7411528B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
US11/481,942 US7830921B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,929 US7991012B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,933 US7966190B2 (en) 2005-07-11 2006-07-07 Apparatus and method for processing an audio signal using linear prediction
US11/481,940 US8180631B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal, utilizing a unique offset associated with each coded-coefficient
US11/481,930 US8032368B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signals using hierarchical block swithcing and linear prediction coding
US11/481,939 US8121836B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
CNA2006800305499A CN101243495A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
EP06757765A EP1913580A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002690 WO2007008012A2 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002677 WO2007007999A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002687 WO2007008009A1 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CNA2006800251376A CN101218631A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06769224A EP1913794A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521316A JP2009510810A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
PCT/KR2006/002691 WO2007008013A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
CNA2006800305412A CN101243497A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
PCT/KR2006/002683 WO2007008005A1 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521307A JP2009500683A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
JP2008521311A JP2009500687A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
JP2008521314A JP2009500689A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
JP2008521306A JP2009500682A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
JP2008521308A JP2009500684A (en) 2005-07-11 2006-07-10 Audio signal processing method, audio signal encoding and decoding apparatus and method
EP06757764A EP1913579A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
EP06769223A EP1913587A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002688 WO2007008010A1 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CNA2006800294174A CN101243489A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
EP06769225A EP1911021A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06769220A EP1913585A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06757767A EP1913582A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002678 WO2007008000A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002680 WO2007008002A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002689 WO2007008011A2 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002681 WO2007008003A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002685 WO2007008007A1 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CNA200680024866XA CN101218852A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521319A JP2009500693A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
CN2006800251380A CN101218628B (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding an audio signal
EP06757768A EP1913583A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CNA2006800304797A CN101243493A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
CNA2006800289829A CN101238510A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06769219A EP1913584A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
CNA2006800251395A CN101218629A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002686 WO2007008008A2 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CNA2006800304693A CN101243492A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
EP06769227A EP1911020A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
JP2008521315A JP2009500690A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
EP06769222A EP1908058A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521310A JP2009500686A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
JP2008521313A JP2009500688A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
JP2008521318A JP2009500692A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
PCT/KR2006/002679 WO2007008001A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
EP06769226A EP1913588A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06769218A EP1913589A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002682 WO2007008004A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
EP06757766A EP1913581A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
CNA200680028892XA CN101238509A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521305A JP2009500681A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
JP2008521309A JP2009500685A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
JP2008521317A JP2009500691A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
CN2006800252699A CN101218630B (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CN2006800294070A CN101243496B (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CNA2006800305111A CN101243494A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
US12/232,527 US7962332B2 (en) 2005-07-11 2008-09-18 Apparatus and method of encoding and decoding audio signal
US12/232,526 US8010372B2 (en) 2005-07-11 2008-09-18 Apparatus and method of encoding and decoding audio signal
US12/232,591 US8255227B2 (en) 2005-07-11 2008-09-19 Scalable encoding and decoding of multichannel audio with up to five levels in subdivision hierarchy
US12/232,593 US8326132B2 (en) 2005-07-11 2008-09-19 Apparatus and method of encoding and decoding audio signal
US12/232,590 US8055507B2 (en) 2005-07-11 2008-09-19 Apparatus and method for processing an audio signal using linear prediction
US12/232,595 US8417100B2 (en) 2005-07-11 2008-09-19 Apparatus and method of encoding and decoding audio signal
US12/232,658 US8510119B2 (en) 2005-07-11 2008-09-22 Apparatus and method of processing an audio signal, utilizing unique offsets associated with coded-coefficients
US12/232,662 US8510120B2 (en) 2005-07-11 2008-09-22 Apparatus and method of processing an audio signal, utilizing unique offsets associated with coded-coefficients
US12/232,659 US8554568B2 (en) 2005-07-11 2008-09-22 Apparatus and method of processing an audio signal, utilizing unique offsets associated with each coded-coefficients
US12/232,747 US8149878B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,734 US8155144B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,748 US8155153B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,744 US8032386B2 (en) 2005-07-11 2008-09-23 Apparatus and method of processing an audio signal
US12/232,743 US7987008B2 (en) 2005-07-11 2008-09-23 Apparatus and method of processing an audio signal
US12/232,740 US8149876B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,741 US8149877B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,739 US8155152B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,783 US8275476B2 (en) 2005-07-11 2008-09-24 Apparatus and method of encoding and decoding audio signals
US12/232,781 US7930177B2 (en) 2005-07-11 2008-09-24 Apparatus and method of encoding and decoding audio signals using hierarchical block switching and linear prediction coding
US12/232,784 US7987009B2 (en) 2005-07-11 2008-09-24 Apparatus and method of encoding and decoding audio signals
US12/232,782 US8046092B2 (en) 2005-07-11 2008-09-24 Apparatus and method of encoding and decoding audio signal
US12/314,891 US8065158B2 (en) 2005-07-11 2008-12-18 Apparatus and method of processing an audio signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/KR2005/002292 WO2007011080A1 (en) 2005-07-16 2005-07-16 Apparatus and method of encoding and decoding audio signal

Publications (1)

Publication Number Publication Date
WO2007011080A1 true WO2007011080A1 (en) 2007-01-25

Family

ID=37668946

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2005/002292 WO2007011080A1 (en) 2005-07-11 2005-07-16 Apparatus and method of encoding and decoding audio signal

Country Status (1)

Country Link
WO (1) WO2007011080A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015100910A1 (en) * 2013-12-31 2015-07-09 深圳迈瑞生物医疗电子股份有限公司 Method, system and medical device for compressing physiological signal
GB2524424A (en) * 2011-10-24 2015-09-23 Peter Graham Craven Lossless buried data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DAI YANG ET AL.: "A lossless audio compression scheme with random access property", ICASSP 2004, vol. 3, 17 May 2004 (2004-05-17) - 21 May 2004 (2004-05-21), pages 1016 - 1019, XP010718365 *
LIEBCHEN T.: "An introduction to MPEG-4 audio lossless coding", 2004 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP '04), vol. 3, 17 May 2004 (2004-05-17) - 21 May 2004 (2004-05-21), pages 1012 - 1015, XP010718364 *
MORIYA T. ET AL.: "Extended linear prediction tools for lossless audio coding", ICASSP 2004, vol. 3, 17 May 2004 (2004-05-17) - 21 May 2004 (2004-05-21), pages 1008 - 1011, XP010718363 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2524424A (en) * 2011-10-24 2015-09-23 Peter Graham Craven Lossless buried data
GB2495918B (en) * 2011-10-24 2015-11-04 Malcolm Law Lossless buried data
GB2524424B (en) * 2011-10-24 2016-04-27 Graham Craven Peter Lossless buried data
WO2015100910A1 (en) * 2013-12-31 2015-07-09 深圳迈瑞生物医疗电子股份有限公司 Method, system and medical device for compressing physiological signal

Similar Documents

Publication Publication Date Title
US7991272B2 (en) Apparatus and method of processing an audio signal
WO2007011080A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011083A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011078A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011079A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011085A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011084A1 (en) Apparatus and method of encoding and decoding audio signal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 05761290

Country of ref document: EP

Kind code of ref document: A1