WO2010086342A1 - Audio encoder, audio decoder, method for encoding an input audio information, method for decoding an input audio information and computer program using improved coding tables - Google Patents


Info

Publication number
WO2010086342A1
Authority
WO
WIPO (PCT)
Prior art keywords
index
value
frequencies
arith
tuple
Prior art date
Application number
PCT/EP2010/050954
Other languages
French (fr)
Inventor
Jeremie Lecomte
Frederik Nagel
Julien Robillard
Ralf Geiger
Arne Borsum
Guillaume Fuchs
Markus Multrus
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Publication of WO2010086342A1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/0017Lossless audio signal coding; Perfect reconstruction of coded audio signal by transmission of coding error

Definitions

  • Audio Encoder, Audio Decoder, Method for Encoding an Input Audio Information, Method for Decoding an Input Audio Information and Computer Program Using Improved Coding Tables
  • Embodiments according to the invention are related to an audio decoder for providing a decoded audio information on the basis of an encoded audio information, an audio encoder for providing an encoded audio information on the basis of an input audio information, a method for providing a decoded audio information on the basis of an encoded audio information, a method for providing an encoded audio information on the basis of an input audio information and a computer program.
  • Embodiments according to the invention are related to the concept of using updated arithmetic coder tables in an audio encoder, like, for example, a so-called unified-speech-and-audio-coder (USAC).
  • a time-domain audio signal is converted into a time-frequency representation.
  • the transform from the time-domain to the time-frequency-domain is typically performed using transform blocks, which are also designated as "frames" of time-domain samples. It has been found that it is advantageous to use overlapping frames, which are shifted, for example, by half a frame, because the overlap allows to efficiently avoid (or at least reduce) artifacts. In addition, it has been found that a windowing should be performed in order to avoid the artifacts originating from this processing of temporally limited frames.
  • an energy compaction is obtained in many cases, such that some of the spectral values comprise a significantly larger magnitude than a plurality of other spectral values. Accordingly, there are, in many cases, a comparatively small number of spectral values having a magnitude, which is significantly above an average magnitude of the spectral values.
  • a typical example of a time-domain to time-frequency domain transform resulting in an energy compaction is the so-called modified-discrete-cosine-transform (MDCT).
  • the spectral values are often scaled and quantized in accordance with a psychoacoustic model, such that quantization errors are comparatively smaller for psychoacoustically more important spectral values, and are comparatively larger for psychoacoustically less-important spectral values.
  • the scaled and quantized spectral values are encoded in order to provide a bitrate-efficient representation thereof.
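  • The framing, windowing and transform steps described above can be illustrated with a minimal sketch. It assumes a sine window and a direct (unoptimized) MDCT; the function names `sine_window`, `overlapping_frames`, `mdct` and `analyze` are hypothetical and do not appear in the source, which does not prescribe a particular window shape or implementation.

```python
import math

def sine_window(length):
    # Assumed window shape; the text only states that windowing is performed.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def overlapping_frames(samples, frame_len):
    # Frames are shifted by half a frame, as in the example given in the text.
    hop = frame_len // 2
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def mdct(frame):
    # Direct MDCT: a frame of 2N windowed samples yields N spectral values.
    n_out = len(frame) // 2
    return [sum(x * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for n, x in enumerate(frame))
            for k in range(n_out)]

def analyze(samples, frame_len):
    # Window each overlapping frame, then transform it.
    window = sine_window(frame_len)
    return [mdct([x * w for x, w in zip(frame, window)])
            for frame in overlapping_frames(samples, frame_len)]
```

  • Each frame of `frame_len` time-domain samples produces `frame_len // 2` spectral values, so the half-frame overlap does not increase the number of values to be coded per unit of time.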
  • An embodiment according to the invention creates an audio decoder for providing a decoded audio information on the basis of an encoded audio information.
  • the audio decoder comprises an arithmetic decoder for providing a plurality of decoded spectral values on the basis of an arithmetically-encoded representation of the spectral values.
  • the audio decoder also comprises a frequency-domain to time-domain converter for providing a time-domain audio representation using the decoded spectral values, in order to obtain the decoded audio information.
  • the arithmetic decoder is configured to derive, in dependence on a state-index describing a state of the arithmetic decoder, a group-index from a variable-length codeword representing the group-index.
  • the arithmetic decoder is configured to derive values of a most-significant bit-plane of a tuple of spectral values using the group-index and an element-index, the element index describing (or designating, or selecting) an element within a group selected by the group-index.
  • the arithmetic decoder is configured to provide a tuple of decoded spectral values using the values of the most-significant bit-plane of the tuple of spectral values.
  • the arithmetic decoder is configured to select a cumulative-frequencies-table out of a set of 32 cumulative-frequencies-tables in dependence on the state-index describing the state of the arithmetic decoder, and to apply the selected cumulative-frequencies-table to derive the group-index from the variable-length codeword representing the group-index.
  • This embodiment according to the invention is based on the finding that the usage of a set of 32 cumulative-frequencies-tables provides for an optimal trade-off between an achievable bitrate and a complexity of an audio encoder or audio decoder.
  • 32 different cumulative-frequencies-tables are appropriate (in that they result in a reasonably low bitrate) for any relevant time-frequency-domain representation of an audio content.
  • the arithmetic decoder is configured to derive a 7-bit hash-table index from the state-index and to obtain a hash-table entry value from a hash-table, which hash-table comprises a mapping of 128 hash-table index values onto corresponding hash-table entry values.
  • the arithmetic decoder is configured to decide whether the hash-table entry value (i.e. the content of the hash-table at a memory position designated by the hash-table index value) is an escape value, a valid cumulative-frequencies-table identifier-value associated to a state-index value (e.g.
  • the arithmetic decoder is configured to scan through the entries of the hash-table until an escape value or a valid cumulative-frequencies-table identifier-value is found.
  • the arithmetic decoder is configured to provide a cumulative-frequencies-index-value in dependence on an identification of an interval of values in which the state-index value is contained, if the obtained hash-table entry value is the escape value, and to derive the cumulative-frequencies-index-value from the obtained hash-table entry value, if the hash-table entry value is a valid cumulative-frequencies-table identifier-value associated to the state-index value.
  • This embodiment of the invention is based on the finding that there are only a small number of "significant" states of the audio decoder, for which it is important to use a specific probability model (associated to a small number of only about 1 to 10 states), represented by a specific cumulative-frequencies-table (associated to a small number of only about 1 to 10 states). It has also been found that for the vast majority of states of the audio decoder, it is preferred to map the states onto a cumulative-frequencies-index-value on the basis of a determination of an interval of values, in which the state-index is contained, thereby mapping a whole interval of state-index values (typically a range of more than 1000 different state-index values) onto the same cumulative-frequencies-index-value.
  • the hash-table is configured to map 67 values of the 7-bit hash-table-index onto valid cumulative-frequencies-table identifier-values and to map 61 values of the 7-bit hash-table-index onto escape values. Accordingly, a comparatively small number of only 67 "significant" states are present.
  • any "insignificant" states (the number of which insignificant states is significantly larger than the number of significant states, for example by a factor of at least 10, but preferably by a factor of even more than 1000), which differ from the "significant" states, are mapped onto escape values, thereby making it easy to distinguish between a "significant" state and an "insignificant" state. Accordingly, the difference between significant and insignificant states can be determined fast and with low memory consumption.
  • the arithmetic decoder is configured to map 67 different values of the state index onto 67 different cumulative-frequencies-table identifier-values, such that 26 different cumulative-frequencies index values are associated to 67 different significant states described by the state-index value.
  • the arithmetic decoder is configured to map the different nonsignificant states onto nine different cumulative-frequencies index values, such that a total of nine different cumulative-frequencies-tables are available for use in connection with the non-significant states, for which a range-type mapping is performed in order to derive the cumulative-frequencies index value. It has been found that a very small number of preferably nine different cumulative-frequencies-tables is sufficient in order to obtain a bitrate-efficient arithmetic encoding for most of the states.
  • the preferably 67 significant states, to which cumulative-frequencies-tables are associated by valid cumulative-frequencies-table identifier values of the hash table, are associated with particularly characteristic patterns of the spectral values like, for example, trajectories in the time-frequency representation having a particular width and direction.
  • all the other states, which are considered as non-significant states and to which a cumulative-frequencies-table is associated using a range-based algorithm, merely represent less characteristic spectral value distributions.
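  • The two-stage table selection described above (hash-table lookup for "significant" states, interval-based fallback for all other states) can be sketched as follows. This is only an illustration under stated assumptions: the hash function (the 7 low-order bits of the state-index), the entry layout (a `(state, pki)` pair or an escape marker), the linear scan and all names (`build_hash`, `lookup_pki`, `bounds`, `range_pki`) are hypothetical; the actual layout of the table "arith_cf_ng_hash[]" is not given in this section.

```python
from bisect import bisect_right

ESCAPE = None                 # assumed marker for an escape entry
HASH_BITS = 7
HASH_SIZE = 1 << HASH_BITS    # 128 hash-table index values

def build_hash(significant):
    # significant: dict mapping a "significant" state-index to its table index pki.
    table = [ESCAPE] * HASH_SIZE
    for state, pki in significant.items():
        i = state & (HASH_SIZE - 1)      # assumed 7-bit hash of the state-index
        while table[i] is not ESCAPE:    # place colliding states in the next free slot
            i = (i + 1) % HASH_SIZE
        table[i] = (state, pki)
    return table

def lookup_pki(state, table, bounds, range_pki):
    # Scan from the hashed position until an escape or a matching entry is found.
    i = state & (HASH_SIZE - 1)
    for _ in range(HASH_SIZE):
        entry = table[i]
        if entry is ESCAPE:
            break
        if entry[0] == state:
            return entry[1]              # significant state: dedicated table index
        i = (i + 1) % HASH_SIZE
    # Non-significant state: pick the table of the interval containing the state.
    return range_pki[bisect_right(bounds, state)]
```

  • With this layout, a whole interval of state-index values shares one entry of `range_pki`, while only the few significant states need an explicit hash-table entry.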
  • Another embodiment according to the invention creates an audio encoder for providing an encoded audio information on the basis of an input audio information.
  • the audio encoder comprises a time-domain to frequency-domain converter for providing a frequency-domain audio representation on the basis of a time-domain representation of the input audio information, such that the frequency-domain audio representation comprises a set of spectral values.
  • the audio encoder also comprises an arithmetic encoder configured to encode a tuple of adjacent spectral values, or a preprocessed version thereof, using a variable-length codeword.
  • the arithmetic encoder is configured to map values of a most-significant bit-plane of a tuple of spectral values onto a group index and an element index, the element index describing an element within a group selected by the group index.
  • the arithmetic encoder is further configured to select a cumulative-frequencies-table out of a set of 32 cumulative-frequencies-tables in dependence on a state index describing a state of the arithmetic encoder and to arithmetically encode the group index using a selected cumulative-frequencies-table, in order to obtain an arithmetically encoded variable-length codeword.
  • the audio encoder according to this embodiment is based on the same idea as the audio decoder discussed above.
  • the audio encoder is based on the finding that the number of 32 cumulative-frequencies-tables brings along an optimized trade-off between bitrate-efficiency and encoding/decoding complexity.
  • Another embodiment according to the invention creates a method for providing a decoded audio representation on the basis of an encoded audio representation. Another embodiment according to the invention creates a method for providing an encoded audio representation on the basis of an input audio representation.
  • Another embodiment according to the invention creates a computer program for performing the inventive methods.
  • Fig. 1 shows a block schematic diagram of an audio encoder, according to an embodiment of the invention
  • Fig. 2 shows a block schematic diagram of an audio decoder, according to an embodiment of the invention
  • Fig. 3 shows a pseudo-program-code representation of an algorithm "tuples_decode()" for decoding a tuple of spectral values
  • Fig. 4 shows a schematic representation of a context for a state calculation
  • Fig. 5a shows a pseudo-program-code representation of an algorithm "arith_reset_context ()" for resetting a context
  • Fig. 5b shows a pseudo-program-code representation of an algorithm
  • Fig. 5c shows a pseudo-program-code representation of an algorithm
  • Fig. 5d shows a pseudo-program-code representation of an algorithm
  • Fig. 5e shows a pseudo-program-code representation of an algorithm "arith_decode
  • Fig. 5f shows a pseudo-program-code representation of an algorithm for deriving an element number value mm and a group offset value og from a group index ng;
  • Fig. 5g shows a pseudo-program-code representation of an algorithm for obtaining spectral values a, b, c, d of a most-significant bit-plane of a tuple of spectral values on the basis of the group offset value og and an element index value ne;
  • Fig. 5h shows a pseudo-program-code representation of an algorithm for combining a tuple a, b, c, d of spectral values with values of a less-significant bit-plane, in order to obtain an updated version of the tuple a, b, c, d of spectral values;
  • Fig. 5i shows a pseudo-program-code representation of an algorithm "arith_update_context ()" for updating the context;
  • Fig. 5j shows a legend of definitions and variables
  • Fig. 6a shows a syntax representation of a unified-speech-and-audio-coding (USAC) raw data block
  • Fig. 6b shows a syntax representation of a single channel element
  • Fig. 6c shows a syntax representation of a channel pair element
  • Fig. 6d shows a syntax representation of an "ics" control information
  • Fig. 6e shows a syntax representation of a frequency-domain channel stream
  • Fig. 6f shows a syntax representation of arithmetically-coded spectral data
  • Fig. 6g shows a syntax representation for decoding a set of tuples of spectral values
  • Fig. 6h shows a legend of data elements and variables
  • Fig. 7 shows a table representation of memory requirements of a previously used arithmetic coder
  • Fig. 8 shows a table representation of memory requirements of an arithmetic coder according to the present invention
  • Fig. 9 shows a block schematic diagram of a set-up for evaluating performance improvements obtained by the arithmetic coder according to the present invention.
  • Fig. 10 shows a table representation of bitrates required for encoding different pieces of audio information using a previously used arithmetic coder
  • Fig. 11 shows a table representation of bitrates required for encoding different pieces of audio information using the inventive concept
  • Fig. 12 shows, in the form of a table representation, a comparison between average bitrates produced by a previously used audio coder and an audio coder according to the present invention
  • Fig. 13 shows, in the form of a table representation, a comparison of bitrate reductions and bitrate increases obtained using the inventive concept when compared to a previously used concept;
  • Fig. 14 shows a table representation of entries of a table "arith_cf_ng_hash[]"
  • Figs. 15(1) to 15(10) show a table representation of entries of a table "arith_cf_ne []";
  • Figs. 16(1) to 16(32) show a table representation of entries of a table "arith_cf_ng [pki] [545]" for 32 different values 0 to 31 of the index pki;
  • Figs. 17(1) and 17(2) show a table representation of entries of a table "dgroups []";
  • Figs. 18(1) to 18(11) show a table representation of entries of a table "dvectors[]";
  • Figs. 19(1) to 19(32) show a table representation of entries of a table "egroups [a][b][c][d]";
  • Fig. 20 shows a flow chart of a method for providing a decoded representation of an audio information
  • Fig. 21 shows a flow chart of a method for providing an encoded representation of an audio information.
  • FIG. 1 shows a block schematic diagram of such an audio encoder 100.
  • the audio encoder 100 is configured to receive an input audio information 110 and to provide, on the basis thereof, a bitstream 112, which constitutes an encoded audio information.
  • the audio encoder 100 optionally comprises a preprocessor 120, which is configured to receive the input audio information 110 and to provide, on the basis thereof, a preprocessed input audio information 110a.
  • the audio encoder 100 also comprises an energy-compacting time-domain to frequency-domain signal transformer 130, which is also designated as signal converter.
  • the signal converter 130 is configured to receive the input audio information 110, 110a and to provide, on the basis thereof, a frequency-domain audio information 132, which preferably takes the form of a set of spectral values.
  • the signal transformer 130 may be configured to receive a frame of the input audio information 110, 110a (for example, a block of time-domain samples) and to provide a set of spectral values representing the audio content of the respective audio frame.
  • the signal transformer 130 may be configured to receive a plurality of subsequent, overlapping or non-overlapping, audio frames of the input audio information 110, 110a and to provide, on the basis thereof, a time-frequency-domain audio representation, which comprises a sequence of subsequent sets of spectral values, one set of spectral values associated with each frame.
  • the energy-compacting time-domain to frequency-domain signal transformer 130 may comprise an energy-compacting filterbank, which provides spectral values associated with different, overlapping or non-overlapping, frequency ranges.
  • the signal transformer 130 may comprise a windowing MDCT transformer 130a, which is configured to window the input audio information 110, 110a (or a frame thereof) using a transform window and to perform a modified-discrete-cosine-transform of the windowed input audio information 110, 110a (or of the windowed frame thereof).
  • the frequency-domain audio representation 132 may comprise a set of, for example, 1024 spectral values in the form of MDCT coefficients associated with a frame of the input audio information.
  • the audio encoder 100 may further, optionally, comprise a spectral post-processor 140, which is configured to receive the frequency-domain audio representation 132 and to provide, on the basis thereof, a post-processed frequency-domain audio representation 142.
  • the spectral post-processor 140 may, for example, be configured to perform a temporal noise shaping and/or a long term prediction and/or any other spectral post-processing known in the art.
  • the audio encoder further comprises, optionally, a scaler/quantizer 150, which is configured to receive the frequency-domain audio representation 132 or the post-processed version 142 thereof and to provide a scaled and quantized frequency-domain audio representation 152.
  • the audio encoder 100 further comprises, optionally, a psycho-acoustic model processor 160, which is configured to receive the input audio information 110 (or the preprocessed version 110a thereof) and to provide, on the basis thereof, an optional control information, which may be used for the control of the energy-compacting time-domain to frequency-domain signal transformer 130, for the control of the optional spectral post-processor 140 and/or for the control of the optional scaler/quantizer 150.
  • the psycho-acoustic model processor 160 may be configured to analyze the input audio information, to determine which components of the input audio information 110, 110a are particularly important for the human perception of the audio content and which components of the input audio information 110, 110a are less important for the perception of the audio content.
  • the psycho-acoustic model processor 160 may provide control information, which is used by the audio encoder 100 in order to adjust the scaling of the frequency-domain audio representation 132, 142 by the scaler/quantizer 150 and/or the quantization resolution applied by the scaler/quantizer 150. Consequently, perceptually important scale factor bands (i.e. groups of adjacent spectral values which are particularly important for the human perception of the audio content) are scaled with a large scaling factor and quantized with comparatively high resolution, while perceptually less-important scale factor bands (i.e. groups of adjacent spectral values) are scaled with a comparatively smaller scaling factor and quantized with a comparatively lower quantization resolution. Accordingly, scaled spectral values of perceptually more important frequencies are typically significantly larger than spectral values of perceptually less important frequencies.
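  • The effect of band-wise scaling on quantization resolution can be illustrated with a small sketch. It assumes a uniform rounding quantizer and one scale factor per scale factor band; the function names, the band layout and the numeric values are hypothetical and only serve to show that a larger scaling factor yields larger quantized values and a smaller quantization error.

```python
import math

def quantize_bands(spectrum, band_edges, scale_factors):
    # band_edges[i]..band_edges[i+1] delimits scale factor band i (assumed layout).
    quantized = []
    for (start, stop), scale in zip(zip(band_edges, band_edges[1:]), scale_factors):
        for x in spectrum[start:stop]:
            # Scale, then round to the nearest integer (assumed uniform quantizer).
            quantized.append(int(math.floor(x * scale + 0.5)))
    return quantized

def dequantize_bands(quantized, band_edges, scale_factors):
    # Inverse scaling, as a decoder would apply using the transmitted scale factors.
    values = []
    for (start, stop), scale in zip(zip(band_edges, band_edges[1:]), scale_factors):
        values.extend(q / scale for q in quantized[start:stop])
    return values

# Band 0 is treated as perceptually important (large scale factor), band 1 as less important.
spectrum = [0.30, -0.30, 0.30, -0.30]
edges = [0, 2, 4]
scales = [16.0, 2.0]
q = quantize_bands(spectrum, edges, scales)
restored = dequantize_bands(q, edges, scales)
errors = [abs(a - b) for a, b in zip(spectrum, restored)]
```

  • In this toy case the quantization errors in the finely scaled band are smaller than those in the coarsely scaled band, while the quantized integers of the finely scaled band are larger in magnitude.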
  • the audio encoder also comprises an arithmetic encoder 170, which is configured to receive the scaled and quantized version 152 of the frequency-domain audio representation 132 (or, alternatively, the post-processed version 142 of the frequency-domain audio representation 132, or even the frequency-domain audio representation 132 itself) and to provide arithmetic codeword information 172a, 172b on the basis thereof, such that the arithmetic codeword information represents the frequency-domain audio representation 152.
  • the audio encoder 100 also comprises a bitstream payload formatter 190, which is configured to receive the arithmetic codeword information 172a, 172b.
  • the bitstream payload formatter 190 is also typically configured to receive additional information, like, for example, scale factor information describing which scale factors have been applied by the scaler/quantizer 150.
  • the bitstream payload formatter 190 may be configured to receive other control information.
  • the bitstream payload formatter 190 is configured to provide the bitstream 112 on the basis of the received information by assembling the bitstream in accordance with a desired bitstream syntax, which will be discussed below.
  • the arithmetic encoder 170 is configured to receive a plurality of tuples of, for example, four post-processed and scaled and quantized spectral values of the frequency-domain audio representation 132.
  • the arithmetic encoder comprises a most-significant-bitplane-extractor 174, which is configured to extract a most-significant bit-plane from a tuple of spectral values. It should be noted here that the most-significant bit-plane may comprise one or even more bits (for example, two or three bits), which are the most-significant bits of the spectral values of the tuple of spectral values.
  • the most-significant bit-plane extractor 174 provides a most-significant bit-plane 176 of a tuple of spectral values (which spectral values are preferably adjacent in frequency).
  • the arithmetic encoder 170 also comprises a group index determinator/element index determinator 178, which is configured to map the most-significant bit-plane 176 onto a group-index value ng and an element-index value ne. This mapping may be performed using a look-up table, for example, the look-up table "egroups" discussed in detail below.
  • the group index determinator/element index determinator 178 may be configured to map some of the combinations of values of the most-significant bit-plane 176 onto a group index ng of a group comprising only one element and may be configured to map other combinations of values of the most-significant bit-plane 176 onto a group index ng of a group comprising a plurality of combinations of values. Accordingly, the group index determinator/element index determinator may be configured to map such combinations of values of the most-significant bit-plane 176 which comprise a comparatively high probability onto groups comprising only one or only a few elements, and to map combinations of values of the most-significant bit-plane 176 comprising a comparatively lower probability onto groups comprising more elements.
  • the element index ne of a combination of values which is mapped to a group comprising only a single element, may only take a single value and may therefore be neglected.
  • the element index ne of a combination of values, which is mapped to a group comprising a plurality of elements may take a plurality of values.
  • the group index determinator/element index determinator 178 provides a group index value ng (also designated with 180a) and, if required, the element index value ne (also designated with 180b), wherein the element index value ne may be set to a default value or omitted if the group ng to which the most-significant bit-plane 176 is mapped, comprises a single element only.
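  • The mapping between a most-significant bit-plane tuple and the pair (ng, ne) can be sketched with toy tables. The names mm (number of elements of a group) and og (group offset) follow the figure descriptions above; the function names, the table layout and the example groups are hypothetical and are far smaller than the tables "egroups", "dgroups" and "dvectors" referred to in the source.

```python
def build_group_tables(groups):
    # groups: list of groups, each a list of 4-tuples of most-significant bit-plane values.
    encode_map = {}   # tuple -> (ng, ne), the role played by "egroups" in the encoder
    offsets = []      # per group: (mm, og), number of elements and offset into vectors
    vectors = []      # flat list of all tuples, grouped one group after another
    for ng, members in enumerate(groups):
        offsets.append((len(members), len(vectors)))
        for ne, combo in enumerate(members):
            encode_map[combo] = (ng, ne)
            vectors.append(combo)
    return encode_map, offsets, vectors

def decode_tuple(ng, ne, offsets, vectors):
    mm, og = offsets[ng]
    if mm == 1:
        ne = 0        # a single-element group needs no transmitted element index
    return vectors[og + ne]

# Toy grouping: the (assumed) most probable tuple gets a group of its own.
groups = [
    [(0, 0, 0, 0)],
    [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)],
]
encode_map, offsets, vectors = build_group_tables(groups)
```

  • For the single-element group, the element index carries no information, which is why the corresponding codeword can be omitted from the bitstream.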
  • the arithmetic encoder 170 also comprises a first codeword determinator 180, which is configured to determine an arithmetic codeword acod_ng [pki][ng] representing the group index ng.
  • the first codeword determinator 180 may provide an arithmetic codeword acod_ne [ne] representing the element index ne, if the number of elements mm of the group ng is larger than 1. Otherwise, the provision of the arithmetic codeword acod_ne [ne] representing the element index ne may be omitted.
  • the codeword determinator 180 may also provide one or more escape codewords (also designated herein with "ARITH_ESCAPE") indicating, for example, how many less-significant bit-planes are available (and, consequently, indicating the numeric weight of the most-significant bit-plane).
  • the first codeword determinator 180 may be configured to provide the codeword associated with a group index ng using a selected cumulative-frequencies-table having (or being referenced by) a cumulative-frequencies-table index pki.
  • the arithmetic encoder preferably comprises a state tracker 182, which is configured to track the state of the arithmetic encoder, for example, by observing which tuples of spectral values have been encoded previously.
  • the state tracker 182 consequently provides a state information 184, for example a state value designated with "s" or "t".
  • the arithmetic encoder 170 also comprises a cumulative-frequencies-table selector 186, which is configured to receive the state information 184 and to provide an information 188 describing the selected cumulative-frequencies-table to the codeword determinator 180.
  • the cumulative-frequencies-table selector 186 may provide a cumulative-frequencies-table index pki describing which cumulative-frequencies-table, out of a set of 32 cumulative-frequencies-tables, is selected for usage by the codeword determinator.
  • the cumulative-frequencies-table selector 186 may provide the entire selected cumulative-frequencies-table to the codeword determinator.
  • the codeword determinator 180 may use the selected cumulative-frequencies-table for the provision of the codeword acod_ng[pki][ng] of the group index ng, such that the actual codeword acod_ng[pki][ng] encoding the group index ng is dependent on the value of ng and the cumulative-frequencies-table index pki, and consequently on the current state information 184.
  • the first codeword determinator 180 may use a default (state-independent) cumulative-frequencies-table for the provision of the codeword acod_ne [ne], which may, however, be dependent on the number of elements within the selected group ng. Further details regarding the coding process and the obtained codeword format will be described below.
  • the arithmetic encoder 170 further comprises a less-significant bit-plane extractor 189a, which is configured to extract one or more less-significant bit-planes from the scaled and quantized frequency-domain audio representation 152, if one or more of the values of a tuple of spectral values to be encoded exceed the range of values encodeable using the most-significant bit-plane only.
  • the less-significant bit-planes may comprise one or more bits per spectral values, as desired. Accordingly, the less-significant bit-plane extractor 189a provides a less-significant bit-plane information 189b.
  • the arithmetic encoder 170 also comprises a second codeword determinator 189c, which is configured to receive the less-significant bit-plane information 189b and to provide, on the basis thereof, 0, 1 or more codewords "acod_r" representing the content of 0, 1 or more less-significant bit-planes.
  • the second codeword determinator 189c may be configured to apply an arithmetic encoding algorithm or any other encoding algorithm in order to derive the less-significant bit-plane codewords "acod_r" from the less-significant bit-plane information 189b.
  • the number of less-significant bit-planes may vary in dependence on the value of the scaled and quantized spectral values 152, such that there may be no less-significant bit-plane at all, if the scaled and quantized spectral values of the current tuple are comparatively small, such that there may be one less-significant bit-plane if the scaled and quantized spectral values of the current tuple are of a medium range and such that there may be more than one less-significant bit-plane if the scaled and quantized spectral values take a comparatively large value.
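  • The split of a tuple into a most-significant bit-plane and a variable number of less-significant bit-planes can be sketched as follows. The sketch assumes signed integers, one bit per value and less-significant bit-plane, and a most-significant bit-plane range of -4 to 3; that range and the function names are assumptions made for illustration and are not stated in this section.

```python
MSB_MIN, MSB_MAX = -4, 3   # assumed value range of the most-significant bit-plane

def split_bit_planes(values):
    # Shift right until every value of the tuple fits into the assumed range.
    lev = 0
    while any(not (MSB_MIN <= (v >> lev) <= MSB_MAX) for v in values):
        lev += 1
    msb = [v >> lev for v in values]
    # One list of bits per less-significant bit-plane, most significant plane first.
    planes = [[(v >> i) & 1 for v in values] for i in range(lev - 1, -1, -1)]
    return msb, planes

def merge_bit_planes(msb, planes):
    # Decoder side: append one bit per value for each less-significant bit-plane.
    values = list(msb)
    for plane in planes:
        values = [(v << 1) | bit for v, bit in zip(values, plane)]
    return values

small = split_bit_planes([1, -2, 0, 3])    # fits the range: no less-significant planes
large = split_bit_planes([3, -9, 0, 20])   # needs several less-significant planes
```

  • A tuple of small values produces no less-significant bit-plane at all, whereas a tuple containing a large value produces several, which mirrors the variable number of "acod_r" codewords.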
  • the arithmetic encoder 170 is configured to encode a tuple of scaled and quantized spectral values, which is described by the information 152, using a hierarchical encoding process.
  • the most-significant bit-plane (comprising, for example, one, two or three bits per spectral value) is encoded to obtain an arithmetic codeword "acod_ng [pki][ng]" of a group index ng and, in some cases, a codeword "acod_ne [ne]" of an element index ne.
  • One or more less-significant bit-planes are encoded to obtain one or more codewords "acod_r".
  • When encoding the most-significant bit-plane, the combination of values of the most-significant bit-plane is mapped to a group ng of the plurality of groups wherein some of the groups comprise one element only and wherein others of the groups comprise a plurality of elements each. Accordingly, the probability of the different combinations of values is considered.
  • the group index ng and the element index ne are encoded, wherein 32 different cumulative-frequencies-tables are available for the encoding of the group index ng in dependence on a state of the arithmetic encoder 170, i.e. in dependence on previously-encoded tuples of spectral values. Accordingly, codewords "acod_ng [pki][ng]" and "acod_ne [ne]" are obtained, wherein the latter codeword is only included in the bitstream 112 if the group index ng designates a group comprising more than one element. In addition, one or more codewords "acod_r" are provided and included into the bitstream if one or more less-significant bit-planes are present.
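  • How a state-dependent cumulative-frequencies-table steers the arithmetic coding of a symbol can be illustrated with a deliberately simplified sketch. It uses exact fractions instead of the fixed-point interval arithmetic of a real coder, two toy tables instead of 32, and a toy state (the previously coded symbol); the function names, the table orientation (ascending cumulative counts) and the context rule are all assumptions made for illustration.

```python
from fractions import Fraction

def select_pki(prev_symbol):
    # Toy context rule (assumption): one table after a zero, another otherwise.
    return 0 if prev_symbol == 0 else 1

def encode(symbols, tables):
    # tables[pki][k] = number of counts below symbol k; the last entry is the total.
    low, width, prev = Fraction(0), Fraction(1), 0
    for s in symbols:
        cum = tables[select_pki(prev)]
        total = cum[-1]
        low += width * Fraction(cum[s], total)
        width *= Fraction(cum[s + 1] - cum[s], total)
        prev = s
    return low, width   # any value in [low, low + width) identifies the sequence

def decode(value, count, tables):
    low, width, prev, out = Fraction(0), Fraction(1), 0, []
    for _ in range(count):
        cum = tables[select_pki(prev)]   # same table selection as in the encoder
        total = cum[-1]
        target = (value - low) / width * total
        s = max(k for k in range(len(cum) - 1) if cum[k] <= target)
        low += width * Fraction(cum[s], total)
        width *= Fraction(cum[s + 1] - cum[s], total)
        out.append(s)
        prev = s
    return out

tables = [[0, 6, 7, 8], [0, 1, 2, 8]]   # toy tables for a three-symbol alphabet
```

  • Because encoder and decoder derive the table index from the same previously coded symbols, no table index needs to be transmitted; a wider final interval corresponds to a shorter codeword.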
  • the audio encoder 100 may optionally be configured to decide whether an improvement in bitrate can be obtained by resetting the context, for example by setting the state index to a default value. Accordingly, the audio encoder 100 may be configured to provide a reset information (e.g. named "arith_reset_flag") indicating whether the context for the arithmetic encoding is reset, and also indicating whether the context for the arithmetic decoding in a corresponding decoder should be reset.
  • Details regarding the bitstream format and the applied cumulative-frequency tables will be discussed below.
  • FIG. 2 shows a block schematic diagram of such an audio decoder 200.
  • the audio decoder 200 is configured to receive a bitstream 210, which represents an encoded audio information and which may be identical to the bitstream 112 provided by the audio encoder 100.
  • the audio decoder 200 provides a decoded audio information 212 on the basis of the bitstream 210.
  • the audio decoder 200 comprises an optional bitstream payload deformatter 220, which is configured to receive the bitstream 210 and to extract from the bitstream 210 an encoded frequency-domain audio representation 222.
  • the bitstream payload deformatter 220 may be configured to extract from the bitstream 210 arithmetically-coded spectral data like, for example, an arithmetic codeword "acod_ng [pki][ng]” representing a group index ng, an arithmetic codeword “acod_ne [ne]” representing an element index ne and a codeword "acod_r” representing a content of a less-significant bit-plane of the frequency-domain audio representation.
  • the encoded frequency-domain audio representation 222 constitutes (or comprises) an arithmetically-encoded representation of spectral values.
  • the bitstream payload deformatter 220 is further configured to extract from the bitstream additional control information, which is not shown in Fig. 2.
  • the bitstream payload deformatter is optionally configured to extract from the bitstream 210 a state reset information 224, which is also designated as arithmetic reset flag or "arith_reset_flag".
  • the audio decoder 200 comprises an arithmetic decoder 230, which is also designated as "spectral noiseless decoder".
  • the arithmetic decoder 230 is configured to receive the encoded frequency-domain audio representation 222 and, optionally, the state reset information 224.
  • the arithmetic decoder 230 is also configured to provide a decoded frequency-domain audio representation 232, which may comprise a decoded representation of spectral values.
  • the decoded frequency-domain audio representation 232 may comprise a decoded representation of tuples of spectral values, which are described by the encoded frequency-domain audio representation 222.
  • the audio decoder 200 also comprises an optional inverse quantizer/rescaler 240, which is configured to receive the decoded frequency-domain audio representation 232 and to provide, on the basis thereof, an inversely-quantized and rescaled frequency-domain audio representation 242.
  • the audio decoder 200 further comprises an optional spectral pre-processor 250, which is configured to receive the inversely-quantized and rescaled frequency-domain audio representation 242 and to provide, on the basis thereof, a pre-processed version 252 of the inversely-quantized and rescaled frequency-domain audio representation 242.
  • the audio decoder 200 also comprises a frequency-domain to time-domain signal transformer 260, which is also designated as a "signal converter".
  • the signal transformer 260 is configured to receive the pre-processed version 252 of the inversely-quantized and rescaled frequency-domain audio representation 242 (or, alternatively, the inversely-quantized and rescaled frequency-domain audio representation 242 or the decoded frequency-domain audio representation 232) and to provide, on the basis thereof, a time-domain representation 262 of the audio information.
  • the frequency-domain to time-domain signal transformer 260 may, for example, comprise a transformer for performing an inverse modified discrete cosine transform (IMDCT) and an appropriate windowing (as well as other auxiliary functionalities, like, for example, an overlap-and-add).
  • the audio decoder 200 may further comprise an optional time-domain post-processor 270, which is configured to receive the time-domain representation 262 of the audio information and to obtain the decoded audio information 212 using a time-domain post-processing. However, if the post-processing is omitted, the time-domain representation 262 may be identical to the decoded audio information 212.
  • the inverse quantizer/rescaler 240, the spectral pre-processor 250, the frequency-domain to time-domain signal transformer 260 and the time-domain post-processor 270 may be controlled in dependence on control information, which is extracted from the bitstream 210 by the bitstream payload deformatter 220.
  • a decoded frequency-domain audio representation 232, for example a set of spectral values associated with an audio frame of the encoded audio information, may be obtained on the basis of the encoded frequency-domain representation 222 using the arithmetic decoder 230.
  • the set of, for example, 1024 spectral values, which may be MDCT coefficients, is inversely quantized, rescaled and pre-processed. Accordingly, an inversely-quantized, rescaled and spectrally pre-processed set of spectral values (for example, 1024 MDCT coefficients) is obtained.
  • a time-domain representation of an audio frame is derived from the inversely-quantized, rescaled and spectrally pre-processed set of frequency-domain values (e.g. MDCT coefficients). Accordingly, a time-domain representation of an audio frame is obtained.
  • the time-domain representation of a given audio frame may be combined with time-domain representations of previous and/or subsequent audio frames. For example, an overlap-and-add between time-domain representations of subsequent audio frames may be performed in order to smoothen the transitions between the time-domain representations of the adjacent audio frames and in order to obtain an aliasing cancellation.
  • the arithmetic decoder 230 comprises a group index determinator/element index determinator 280, which is configured to receive the arithmetic codeword acod_ng [pki][ng] describing the group index ng and to also receive the codeword acod_ne [ne] of the element index ne if the codeword acod_ne [ne] is available.
  • the group index determinator/element index determinator 280 is configured to provide a decoded group index value ng and to also provide a decoded element index value ne if the group described by the group index value ng comprises more than one element. However, the group index determinator/element index determinator 280 may be configured to provide the default element index value ne, for example, of one if the group described by the group index value ng comprises one element only. The group index determinator/element index determinator 280 may be configured to use a cumulative-frequencies table out of a set comprising a plurality of 32 cumulative-frequencies tables for deriving the group index value ng from the arithmetic codeword "acod_ng [pki][ng]".
  • the arithmetic decoder 230 further comprises a most-significant bit-plane determinator 284, which is configured to derive values 286 of a most-significant bit-plane of a 2-bit tuple (or 3-bit tuple) of spectral values on the basis of a group index value ng and an element index value ne.
  • the arithmetic decoder 230 further comprises a less-significant bit-plane determinator 288, which is configured to receive one or more codewords "acod_r" representing one or more less-significant bit-planes of a tuple of spectral values.
  • the less-significant bit-plane determinator 288 is configured to provide decoded values 290 of one or more less-significant bit-planes.
  • the audio decoder 200 also comprises a bit-plane combiner 292, which is configured to receive the decoded values 286 of the most-significant bit-plane of the tuple of spectral values and the decoded values 290 of one or more less-significant bit-planes of the tuple of spectral values if such less-significant bit-planes are available for the current tuple of spectral values.
  • the bit-plane combiner 292 provides a tuple of decoded spectral values, which is part of the decoded frequency-domain audio representation 232.
  • the arithmetic decoder 230 is typically configured to provide a plurality of tuples of decoded spectral values in order to obtain a full set of decoded spectral values associated with a current frame of the audio content.
  • the arithmetic decoder 230 further comprises a cumulative-frequencies-table selector 296, which is configured to select one of the 32 cumulative-frequencies tables in dependence on a state index 298 describing a state of the arithmetic decoder.
  • the arithmetic decoder 230 further comprises a state tracker 299, which is configured to track a state of the arithmetic decoder in dependence on the tuples of previously-decoded spectral values.
  • the state information may optionally be reset to a default state information in response to the state reset information 224. Accordingly, the cumulative-frequencies-table selector 296 is configured to provide an index (e.g.
  • the audio decoder 200 is configured to receive a bitrate-efficiently-encoded frequency-domain audio representation 222 and to obtain a decoded frequency-domain audio representation on the basis thereof.
  • in the arithmetic decoder 230, which is used for obtaining the decoded frequency-domain audio representation 232 on the basis of the encoded frequency-domain audio representation 222, a probability of different combinations of values of the most-significant bit-plane is exploited by using an arithmetic decoder 280, which is configured to apply a cumulative-frequencies-table.
  • the decoding which will be discussed in the following, is used in order to allow for a so-called “spectral noiseless coding” of typically post-processed, scaled and quantized spectral values.
  • the spectral noiseless coding is used in an audio encoding/decoding concept to further reduce the redundancy of the quantized spectrum, which is obtained, for example, by an energy-compacting time-domain to frequency-domain transformer.
  • the spectral noiseless coding scheme which is used in embodiments of the invention, is based on an arithmetic coding in conjunction with a dynamically-adapted context.
  • the spectral values are processed by the spectral noiseless coding in tuples combining 4 spectral values that are successive in frequency, which are then called 4-tuples.
  • the noiseless coding is fed by (original or encoded representations of) quantized spectral values and uses context-dependent cumulative-frequencies-tables derived, for example, from four previously-decoded neighboring 4-tuples.
  • the neighborhood in both time and frequency is taken into account as illustrated in Fig. 4.
  • the cumulative-frequencies-tables (which will be explained below) are then used by the arithmetic coder to generate a variable-length binary code and by the arithmetic decoder to derive decoded values from a variable-length binary code.
  • the arithmetic coder 170 produces a binary code for a given set of symbols in dependence on the respective probabilities.
  • the binary code is generated by mapping a probability interval, in which the set of symbols lies, to a codeword.
  • Fig. 3 shows a pseudo-program code representation of the process of decoding a plurality of tuples of spectral values.
  • the process of decoding a plurality of tuples of spectral values comprises an initialization 310 of a context.
  • the initialization 310 of the context selectively comprises a reset of the context using the function "arith_reset_context ()" or a derivation of the current context from a previous context using the function "arith_map_context (lg/4)". Both the reset of the context and the derivation of the current context from a previous context will be discussed below.
  • the decoding of a plurality of tuples of spectral values also comprises an iteration of a tuple decoding 312 and a context update 314, which context update is performed by a function "arith_update_context (a,b,c,d,i,lg/4)", which is described below.
  • the tuple decoding 312 and the context update 314 are repeated lg/4 times, wherein lg/4 indicates the number of tuples of spectral values to be decoded.
  • the tuple decoding 312 comprises a context-value calculation 312a, a group index decoding 312b, an element index decoding 312c, a most-significant bit-plane determination 312d and a less-significant bit-plane addition 312e.
  • the state value computation 312a comprises the computation of a first state value s using the function "arith_get_context (i)", which function returns the first state value s.
  • the state value computation 312a also comprises a computation of a level value lev, which is obtained by shifting the first state value s to the right by 24 bits.
  • the state value computation 312a also comprises a computation of a second state value t according to the formula shown in Fig. 3.
  • the group index decoding 312b comprises an iterative execution of a decoding algorithm 312ba, wherein a variable j is initialized to 0 before a first execution of the algorithm 312ba.
  • the algorithm 312ba comprises a computation of a state index pki (which also serves as a cumulative-frequencies-table index) in dependence on the second state value t using a function "arith_get_pk()", which is discussed below.
  • the algorithm 312ba also comprises the selection of a cumulative-frequencies-table in dependence on the state index pki, wherein a variable "cum_freq" may be set to a starting address of one out of 32 cumulative-frequencies-tables in dependence on the state index pki.
  • a variable "cfl" may be initialized to a length of the selected cumulative-frequencies-table, which is equal to the number of symbols in the alphabet, i.e.
  • a group index ng may be obtained by executing a function "arith_decode()", taking into consideration the selected cumulative frequencies-table.
  • the algorithm 312ba also comprises checking whether the group index ng is equal to an escape symbol "ARITH_ESCAPE" or not. If the group index is not equal to the arithmetic escape symbol, the algorithm 312ba is aborted ("break"-condition) and the remaining instructions of the algorithm 312ba are therefore skipped. Accordingly, execution of the process is continued with the element index decoding 312c (if required) or with the most-significant bit-plane determination 312d. In contrast, if the decoded group index ng is identical to the arithmetic escape symbol "ARITH_ESCAPE", the level value lev is increased by two. Also, if the algorithm 312ba is executed for the first time, i.e.
  • the second state value t is increased by 4194304, and the second state value t is set to 0 otherwise.
  • the variable j is set to 1 prior to the repetition of the algorithm 312ba. As mentioned, the algorithm 312ba is repeated until the decoded group index ng is different from the arithmetic escape symbol.
  • the element index decoding 312c is executed, if required.
  • a cardinal (number of elements) of the group having the group index ng is determined, wherein the cardinal mm of the group designated by the group index ng is described by the eight least-significant bits (bits 0-7) of a table entry "dgroups [ng]" of the table "dgroups” at table position ng. If the cardinal mm of the group designated by the group index ng is larger than one, the element index ne is obtained by executing an algorithm 312ca.
  • the element index ne may optionally be set to 0, or a different default value.
  • the algorithm 312ca comprises the determination of a start address "cum_freq" of an appropriate cumulative-frequencies-table or a cumulative-frequencies-subtable.
  • the variable "cum_freq" may be set to a sum of the start address of the cumulative-frequencies-table "arith_cf_ne" and the value (mm)*(mm-1)/2, as shown in Fig. 3.
  • the variable cfl may be initialized to an appropriate length value of the respective cumulative-frequencies-table or cumulative-frequencies-subtable, which is equal to the number of elements mm within the group of index ng.
  • the element index ne may be obtained by performing the function "arith_decode()", wherein the selected cumulative-frequencies-table (e.g. a subtable of the table "arith_cf_ne") associated to the encoding of the element index is used.
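  • The selection of the sub-table for the element index can be sketched as below. The sketch only mirrors the arithmetic stated above (cardinal mm in bits 0-7 of "dgroups[ng]", offset mm*(mm-1)/2, length mm); the function name `element_subtable` and the example entry are hypothetical. The offset equals 1 + 2 + ... + (mm-1), which would be consistent with sub-tables of lengths 1, 2, 3, ... stored back to back, although the text does not state the layout explicitly.

```python
def element_subtable(dgroups_entry):
    """Return (mm, offset, cfl) for one entry of the table "dgroups".

    mm     -- cardinal of the group, taken from bits 0-7 of the entry
    offset -- offset of the sub-table relative to the start of "arith_cf_ne"
    cfl    -- length of the sub-table, equal to mm
    """
    mm = dgroups_entry & 0xFF
    offset = mm * (mm - 1) // 2
    return mm, offset, mm


if __name__ == "__main__":
    # 0x0305 is a made-up entry: upper bits 0x03, cardinal 5.
    print(element_subtable(0x0305))
```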
  • the determination 312d of the most-significant bit-plane is executed.
  • an entry of the table "dgvectors" is evaluated, the index of which is determined by the most significant bits (e.g. bits 8-15) of the value j of the table "dgroups" at element index ng and by the element index ne, as can be seen in Fig. 3.
  • the value of the most-significant bit-plane of a first spectral value "a" is determined by an entry of the table "dgvectors" at an element index 4*(j>>8+ne).
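  • A minimal sketch of this table lookup is given below. It assumes that the shift is applied to j before ne is added (in line with the statement that the index is determined by the upper bits of j and by ne) and that the values for b, c and d follow the value for a at the next three positions of "dgvectors"; the text above only specifies the position for a. The function name `msb_plane` and the table contents are hypothetical.

```python
def msb_plane(dgroups, dgvectors, ng, ne):
    """Look up the most-significant bit-plane values (a, b, c, d).

    Assumptions: index = 4 * ((j >> 8) + ne), and the four values of a
    tuple are stored consecutively in dgvectors.
    """
    j = dgroups[ng]
    base = 4 * ((j >> 8) + ne)
    return tuple(dgvectors[base:base + 4])


if __name__ == "__main__":
    dgroups = [0x0001, 0x0102]     # made-up entries: (first vector << 8) | cardinal
    dgvectors = list(range(16))    # made-up vector table, 4 values per vector
    print(msb_plane(dgroups, dgvectors, ng=1, ne=1))
```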
  • the less-significant bit planes are obtained, for example as shown at reference numerals 312e in Fig. 3.
  • For each less-significant bit-plane of the tuple, one out of 16 binary combinations is decoded.
  • the concept for obtaining the values of the less-significant bit planes is not of particular relevance for the present invention.
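  • For illustration only, the addition of less-significant bit-planes to a 4-tuple might look as follows. The sketch assumes that each decoded value r (one out of 16 combinations) carries one bit per spectral value, that bit 0 belongs to a, bit 1 to b, bit 2 to c and bit 3 to d, and that the planes arrive from more significant to less significant. None of these details is fixed by the text above, and the function name `add_lsb_planes` is hypothetical.

```python
def add_lsb_planes(msb, planes):
    """Append less-significant bit-planes to the values (a, b, c, d).

    msb    -- tuple (a, b, c, d) from the most-significant bit-plane
    planes -- iterable of values r in 0..15, one per less-significant plane
    """
    a, b, c, d = msb
    for r in planes:
        if not 0 <= r <= 15:
            raise ValueError("each plane must be one of 16 combinations")
        a = (a << 1) | (r & 1)          # assumed bit assignment
        b = (b << 1) | ((r >> 1) & 1)
        c = (c << 1) | ((r >> 2) & 1)
        d = (d << 1) | ((r >> 3) & 1)
    return a, b, c, d


if __name__ == "__main__":
    print(add_lsb_planes((1, 0, 1, 0), [0b0011]))
```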
  • 4-tuples of quantized spectral coefficients are noiselessly coded and transmitted (e.g. in the bitstream) starting from the lowest-frequency coefficient and progressing to the highest-frequency coefficient.
  • Coefficients from an advanced audio coding are stored in an array called "x_ac_quant[g][wm][sfb][bin]", and the order of transmission of the noiseless-coding-codeword (e.g. acod_ng, acod_ne, acod_r) is such that when they are decoded in the order received and stored in the array, "bin” (the frequency index) is the most rapidly incrementing index and "g” is the most slowly incrementing index.
  • the order of decoding is a, b, c, d.
  • the values a, b, c, d are spectral values of adjacent frequencies, wherein the spectral value a is associated to a lower frequency than the spectral value b, the spectral value b is associated to a lower frequency than the spectral value c, and the spectral value c is associated to a lower frequency than the spectral value d.
  • Coefficients from the transform-coded-excitation (tcx) are stored directly in an array x_tcx_invquant[win][bin], and the order of the transmission of the noiseless coding codewords is such that when they are decoded in the order received and stored in the array,
  • the order of decoding is a, b, c, d.
  • the spectral values describe a transform-coded-excitation of the linear-prediction filter of a speech coder
  • the spectral values a, b, c, d are associated to adjacent and increasing frequencies of the transform-coded-excitation.
  • the audio decoder 200 may be configured to apply the decoded frequency-domain audio representation 232, which is provided by the arithmetic decoder 230, both for a "direct" generation of a time-domain audio signal representation using a frequency-domain to time-domain signal transform and for an "indirect" provision of an audio signal representation using both a frequency-domain to time-domain decoder and a linear-prediction-filter excited by the output of the frequency-domain to time-domain signal transformer.
  • the arithmetic decoder 230 is well-suited for decoding spectral values of a time-frequency-domain representation of an audio content encoded in the frequency-domain and for the provision of a time-frequency-domain representation of a stimulus signal for a linear-prediction-filter adapted to decode a speech signal encoded in the linear-prediction-domain.
  • the arithmetic decoder is well-suited for use in an audio decoder which is capable of handling both frequency-domain-encoded audio content and linear-predictive-frequency-domain-encoded audio content.
  • the current context is stored in a global variable q [2] [290], which takes the form of an array having a first dimension of 2 and a second dimension of 290.
  • a past context is stored in a variable qs [258], which takes the form of a table having a dimension of 258.
  • the variable "previous_lg/4" describes a number of 4-tuples of a past context.
  • the reset of the context which is performed by the function "arith_reset_context()"
  • the entries "v" of the arrays q and qs (designated with qs[i].v, q[0][i].v and q[1][i].v) are initialized to -1.
  • the variable "previous_lg" is initialized to 1024.
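  • The reset described above can be sketched as follows. The dimensions (2 x 290 and 258), the initial value -1 and the value 1024 are taken from the text; representing each entry as a dictionary with a single field "v", and returning the state instead of writing global variables, are simplifications made for this sketch.

```python
def arith_reset_context():
    """Return freshly reset context state (q, qs, previous_lg)."""
    q = [[{"v": -1} for _ in range(290)] for _ in range(2)]  # current context
    qs = [{"v": -1} for _ in range(258)]                      # past context
    previous_lg = 1024
    return q, qs, previous_lg


if __name__ == "__main__":
    q, qs, previous_lg = arith_reset_context()
    print(len(q), len(q[0]), len(qs), previous_lg)
```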
  • mapping is performed if the number of spectral values associated to the current audio frame is different from the number of spectral values associated to the previous audio frame.
  • details regarding the mapping in this case are not particularly relevant for the key idea of the present invention, such that reference is made to the pseudo program code of Fig. 5b for details.
  • the first state value s (as shown in Fig. 3) can be obtained as a returned value of the function "arith_get_context(i)", a pseudo program code representation of which is shown in Fig. 5c.
  • Fig. 4 shows the context used for a state calculation.
  • Fig. 4 shows a two-dimensional representation of tuples of spectral values, both over time and frequency.
  • An abscissa 410 describes the time, and an ordinate 412 describes the frequency.
  • a tuple 420 to decode is associated with a time index t0 and a frequency index i (keeping in mind that the spectral values of the tuple 420 to decode are associated to four different frequencies).
  • the tuples having frequency indices i-1 and i-2 are already decoded at the time at which the tuple 420 having the frequency index i is to be decoded.
  • a tuple 430 having a time index t0 and a frequency index i-1 is already decoded before the tuple 420 is decoded, and the tuple 430 is considered for the context which is used for decoding the tuple 420.
  • a tuple 440 having a time index t-1 and a frequency index i-1, a tuple 444 having a time index t-1 and a frequency index i, and a tuple 448 having a time index t-1 and a frequency index i+1 are already decoded before the tuple 420 is decoded, and are considered for the determination of the context which is used for decoding the tuple 420.
  • some other tuples already decoded which are represented by squares having dashed lines, and other tuples, which are not yet decoded and which are shown by circles having dashed lines, are not used for determining the context for decoding the tuple 420.
  • Fig. 5c which shows the functionality of the function "arith_get_context()"
  • the function "arith_get_context()" comprises a variable initialization 530a, during which the variables t0, t1, t2 and t3 are initialized in dependence on the entries "v" of the array q at index positions (0, i), (1, i-1), (0, i-1) and (0, i+1). Accordingly, the variables t0 to t3 are initialized with the values of the entries "v", which are associated to the tuples 444, 430, 440, 448 respectively as shown in Fig. 4.
  • the function "arith_get_context()" performs a subsequent check of a plurality of conditions, wherein the function “arith_get_context()" is terminated as soon as a "return” instruction is reached, wherein the return instruction (or operator) serves to return its operand (following the return instruction or operator) as the state value s.
  • the execution of the function "arith_get_context()" comprises a first condition check 530b. If it is found that (the values of) all the variables t0, t1, t2 and t3 are smaller than 10, the return value is computed as shown at reference numeral 530b, and the function "arith_get_context()" is terminated with the return of said return value.
  • the execution of the function "arith_get_context()" also comprises a second condition check 530c. If it is found in the second condition check 530c that all the variables t0, t1, t2 and t3 are smaller than 34, the variables t2 and t3 are conditionally modified, as shown at the reference numeral 530c, and the return value is computed as shown at reference numeral 530c.
  • the variable t2 is set to 2 if the variable t2 is greater than 1 and smaller than 10. Otherwise, if the variable t2 is greater than or equal to 10 the variable t2 is set to 3.
  • the variable t3 is set to 2, if the variable t3 is greater than 1 and smaller than 10. Otherwise, if the variable t3 is greater than or equal to 10, the variable t3 is set to 3. Accordingly, the range of values of the variable t2 is limited to a maximum positive value of 3.
  • a third condition check 530d is performed. If it is found in the third condition check 530d that both the variables t0 and t1 are smaller than 90, then the return value is computed as shown in reference numeral 530d, wherein the values of the variables t2 and t3 are left out of consideration. If, however, none of the conditions of the first condition check 530b, of the second condition check 530c and of the third condition check 530d is fulfilled, a fourth condition check 530e is performed, in which it is determined whether the variables t0 and t1 are both smaller than 544. If this is the case, the return value is computed as shown in reference numeral 530e, and the function "arith_get_context()" is terminated.
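  • The cascade of condition checks can be summarized by the following sketch, which only reports which branch would be taken. The formulas for the return values are given in Fig. 5c and are not reproduced here. The function names `limit_neighbor` and `context_branch` are hypothetical; the thresholds 10, 34, 90 and 544 and the limiting of t2 and t3 follow the description above.

```python
def limit_neighbor(t):
    """Limit t2 or t3 as described for the second condition check 530c."""
    if 1 < t < 10:
        return 2
    if t >= 10:
        return 3
    return t  # values of 1 or less are left unchanged


def context_branch(t0, t1, t2, t3):
    """Return the reference numeral of the branch that computes the state."""
    if max(t0, t1, t2, t3) < 10:
        return "530b"
    if max(t0, t1, t2, t3) < 34:
        return "530c"
    if t0 < 90 and t1 < 90:
        return "530d"   # t2 and t3 are left out of consideration
    if t0 < 544 and t1 < 544:
        return "530e"
    return "530f"       # table-based context computation


if __name__ == "__main__":
    print(context_branch(1, 2, 3, 20), limit_neighbor(20))
```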
  • the context computation 530f comprises a variable initialization 530fa, a variable rescaling 530fb, a table-based value adaptation 530fc and a return value computation 530fd.
  • the variables a0, b0, c0, d0 are set to the values of the entries "a", "b", "c", "d" of the array q at the array position (0,i) if the variable t0 takes a value greater than 1.
  • the variables a0, b0, c0 and d0 are set to the spectral values a, b, c, d of a previously-decoded tuple of spectral values of time index t-1 and frequency index i, if the value of the variable t0 is greater than 1.
  • the variables a1, b1, c1, d1 are set to the spectral values a, b, c, d of the previously-decoded tuple of spectral values of time index t0 and frequency index i-1.
  • the variables a0, b0, c0, d0, a1, b1, c1, d1 are iteratively rescaled in that the number representations are iteratively shifted to the right by one bit until all of the variables a0, b0, c0, d0, a1, b1, c1, d1 are in a range between -4 and +3, including the boundaries -4 and +3.
  • the variable l indicates how often the set of variables a0, b0, c0, d0, a1, b1, c1, d1 has been shifted to the right, wherein at least one shift-to-the-right operation is performed. Accordingly, adapted variables a0, b0, c0, d0, a1, b1, c1, d1 are obtained, which are all in a range between -4 and +3.
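  • The rescaling step can be sketched as a do-while style loop. The sketch relies on the fact that Python's ">>" on negative integers is an arithmetic (sign-preserving) shift; the function name `rescale` is hypothetical, and the returned counter `shifts` plays the role of the shift counter mentioned above.

```python
def rescale(values):
    """Shift all values right until each lies in [-4, 3].

    At least one shift is always performed. Returns the rescaled
    values and the number of shifts applied.
    """
    vals = list(values)
    shifts = 0
    while True:
        vals = [v >> 1 for v in vals]   # arithmetic shift, keeps the sign
        shifts += 1
        if all(-4 <= v <= 3 for v in vals):
            return vals, shifts


if __name__ == "__main__":
    print(rescale([7, -8, 0, 1, 2, 3, -1, 5]))
    print(rescale([16, 0, 0, 0, 0, 0, 0, 0]))
```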
  • the table-based values adaptation 530fc is performed.
  • the variable t0 is set to a value, which is determined by an entry of the table (or array) "egroups", if the variable t0 is greater than 1.
  • the entry at the position (4+a0, 4+b0, 4+c0, 4+d0) is used for this purpose.
  • the variable t1 is set to a value, which is determined by an entry of the table "egroups" at a table position (4+a1, 4+b1, 4+c1, 4+d1).
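  • Since the rescaled values lie between -4 and +3, adding 4 maps each of them to an index between 0 and 7, so that "egroups" can be addressed as an 8 x 8 x 8 x 8 table. The sketch below illustrates only this addressing; the table contents produced by `make_dummy_egroups` are placeholders and not the values of the actual table, and both function names are hypothetical.

```python
def make_dummy_egroups():
    """Build a placeholder 8x8x8x8 table whose entries are their flat index."""
    return [[[[((w * 8 + x) * 8 + y) * 8 + z for z in range(8)]
              for y in range(8)]
             for x in range(8)]
            for w in range(8)]


def egroups_lookup(egroups, a, b, c, d):
    """Address the table at position (4+a, 4+b, 4+c, 4+d)."""
    for v in (a, b, c, d):
        if not -4 <= v <= 3:
            raise ValueError("values must be rescaled to [-4, 3] first")
    return egroups[4 + a][4 + b][4 + c][4 + d]


if __name__ == "__main__":
    table = make_dummy_egroups()
    print(egroups_lookup(table, -4, -4, -4, -4), egroups_lookup(table, 3, 3, 3, 3))
```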
  • a return value is computed in dependence on the variable l (which indicates how often a shift-to-the-right operation has been applied), and also in dependence on the variables t0 and t1, as shown at reference numeral 530fd.
  • the return value of the function "arith_get_context" is determined by the most-relevant bit planes (e.g. the three most-relevant bits) of the tuples 444, 430, 440 and 448 of the Fig. 4.
  • a table lookup is performed if the variable t0 is greater than or equal to 544 or the variable t1 is greater than or equal to 544, while a numerical computation of the return value, using multiplications and additions, is used otherwise.
  • a more elaborate and more detailed computation of the return value of the function arith_get_context is performed if one of the variables t0 and t1 is greater than or equal to 544.
  • variable "lev” is derived from the returned value s of the function arith_get_context(i).
  • the variable lev is derived from the value s by shifting the value to the right by 24 bits.
  • the state variable t is also derived from the value s by performing a mask operation of the value s by means of an "and" operation between the value s and the hexadecimal value "0xFFFFFF" and by adding a value of "1" to the result of the "and" operation.
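  • The derivation of lev and t from the returned value s amounts to splitting s into its upper bits and its lower 24 bits, which can be sketched as follows (the function name `split_state` is hypothetical):

```python
def split_state(s):
    """Derive (lev, t) from the first state value s."""
    lev = s >> 24              # level value: s shifted right by 24 bits
    t = (s & 0xFFFFFF) + 1     # lower 24 bits of s, plus one
    return lev, t


if __name__ == "__main__":
    print(split_state(0x02000010))
```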
  • process 312b of group index decoding will be discussed, which process 312b is based on a previous calculation of the state value t as described above. Also, the iterative execution of the algorithm 312ba comprises a call of the function "arith_get_pk()" with the state value t (as shown in Fig. 3) as a parameter.
  • the function "arith_get_pk()" will subsequently be described in reference to Fig. 5d.
  • the execution of the function "arith_get_pk()" comprises the initialization of an array psci with 28 values, as shown at reference numeral 540a.
  • the function “arith_get_pk()” comprises the initialization of a pointer p and of the variables i, j, as shown at reference numeral 540b.
  • the algorithm "arith_get_pk()" also comprises an initialization of the variable i to a value, which is equal to 63*t, wherein t is the parameter handed over to the function "arith_get_pk()", when the function "arith_get_pk()" is called.
  • the input variable s of the function "arith_get_pk()" may be identical to the variable t of the algorithm "tuples_decode()" as shown in Fig. 3.
  • the initialization of the variable i is shown at reference numeral 540c.
  • the function "arith_get_pk()" also comprises an iterative execution of a hash table access 540d, wherein the hash table access 540d is repeated until a "break" condition is reached, or until a "return" operator is reached. If the "break" condition is reached, a range-based provision 540e of a return value is performed. If, however, the return operator is reached, the operand of the return operator is returned and the function "arith_get_pk()" is terminated.
  • the hash table access 540d comprises an iterative execution of a first step 540da, a second step 540db, a third step 540dc and a fourth step 540dd.
  • the variable j is set to the value of an entry of the table "ari_pk_hash", wherein the index of the entry is determined by the seven least-significant bits of the variable i.
  • in the second step 540db, it is determined whether the value of the variable j, which is obtained in the first step 540da, takes the hexadecimal value of 0xFFFFFFFF.
  • bits 8 to 31) of the value of the variable j are equal to the value of the input variable t of the function "arith_get_pk()". If this is the case, the eight least significant bits (bits 0 to 7) of the variable j are returned as a return value of the function "arith_get_pk()", and the function "arith_get_pk()" is terminated. If, however, it is found that the condition of the third step 540dc is not fulfilled, the variable i is incremented by 1 (step 540dd) and the hash table access 540d is repeated, starting from the first step 540da.
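  • The hash table access can be sketched as a linear-probing lookup. The sketch assumes that the value 0xFFFFFFFF marks an unused slot and triggers the "break" condition, which is how the second step reads but is not stated in so many words here. The function name `hash_lookup`, the upper bound on the number of probes and the table contents in the example are additions of this sketch, not part of the described algorithm.

```python
HASH_END = 0xFFFFFFFF  # assumed marker of an unused slot ("break" condition)


def hash_lookup(ari_pk_hash, t):
    """Probe a 128-entry hash table for the state t.

    Returns the 8 least-significant bits of the matching entry, or None
    if the break condition is reached (range-based provision follows).
    """
    i = 63 * t
    for _ in range(len(ari_pk_hash)):    # safeguard added for this sketch
        j = ari_pk_hash[i & 127]         # index: 7 least-significant bits of i
        if j == HASH_END:
            return None
        if (j >> 8) == t:                # bits 8 to 31 hold the state
            return j & 0xFF              # bits 0 to 7 hold the result
        i += 1
    return None


if __name__ == "__main__":
    table = [HASH_END] * 128
    table[59] = (9 << 8) | 3             # made-up colliding entry for state 9
    table[60] = (5 << 8) | 17            # made-up entry for state 5
    print(hash_lookup(table, 5), hash_lookup(table, 6))
```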
  • the range-based provision 540e of a return value comprises an initialization 540ea of the pointer p to a starting point within the array psci.
  • the starting point is determined by bits 23 and 24 of the input variable t of the function "arith_get_pk()", which correspond to the number of escape symbols "ARITH_ESCAPE" already decoded for the present tuple to decode.
  • the pointer p is initialized to point at the first entry (24) of the array psci if the bits 23 and 24 of the input variable t take the values "00", to the eighth entry (30) of the array psci if the bits 23 and 24 of the input variable s take the values "01", to the 15th entry (5) of the array psci if the 23rd and 24th bits of the input variable s take the values "10", and to the 22nd entry (5) of the array psci if the 23rd and 24th bits of the input variable t take the values "11".
  • variable j is set to take the value represented by the 22 least-significant bits (bits 1 to bit 22) of the input variable t, as shown at reference numeral 540eb.
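The two bit-field extractions described above can be sketched in C as shown below. The function names are hypothetical. The sketch assumes that bits are counted from 1, so that "bit 23" has the weight 4194304 (this matches the increment of the state variable t mentioned for the escape symbol), and that the two-bit pattern is read with bit 24 as the more significant bit.

```c
#include <stdint.h>

/* Offset of the starting entry relative to the first entry of the array:
 * the patterns "00", "01", "10", "11" select the 1st, 8th, 15th and 22nd
 * entry, i.e. offsets 0, 7, 14 and 21. */
static unsigned range_start_offset(uint32_t t)
{
    return 7u * ((t >> 22) & 3u);
}

/* Value of the 22 least-significant bits of t, assigned to the variable j. */
static uint32_t range_state_bits(uint32_t t)
{
    return t & 0x3FFFFFu;
}
```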
  • a decision is made which entry of the array psci is returned as the return value of the algorithm "arith_get_pk()". The following decision is made:
  • step 540ea the second entry after the starting point determined in step 540ea is returned;
  • the function "arith_get_pk()" which is called with the state value t, provides the value pki as a return value, as can be seen in Fig. 3 at reference numeral 312ba.
  • the value of the variable pki is used to select a cumulative-frequencies-table for the execution of the function "arith_decode()" as it has been discussed with reference to Fig. 3. Accordingly, the variable "cum_freq[]" is initialized appropriately to designate the selected cumulative-frequencies-table.
  • arith_decode() uses the helper function "arith_first_symbol(void)", which returns TRUE if it is the first symbol of the sequence and FALSE otherwise.
  • the function "arith_decode()" also uses the helper function "arith_get_next_bit()", which gets and provides the next bit of the bitstream.
  • the function "arith_decode()" uses the global variables "low", "high" and "value". Further, the function "arith_decode()" receives, as an input variable, the variable "cum_freq[]", which points towards a first entry or element (having element index or entry index 0) of the selected cumulative-frequencies-table. Also, the function "arith_decode()" uses the input variable cfl, which indicates the length of the selected cumulative-frequencies-table designated by the variable "cum_freq[]".
  • the function "arith_decode()" comprises, as a first step, a variable initialization 550a, which is performed if the helper function "arith_first_symbol()" indicates that the first symbol of a sequence of symbols is being decoded.
  • the value initialization 550a initializes the variable "value" in dependence on a plurality of, for example, 20 bits, which are obtained from the bitstream using the helper function "arith_get_next_bit", such that the variable "value" takes the value represented by said bits.
  • the variable “low” is initialized to take the value of 0
  • the variable "high” is initialized to take the value of 1048575.
  • variable "range” is set to a value, which is larger, by 1, than the difference between the values of the variables "high” and “low”.
  • variable "cum" is set to a value which represents a relative position of the value of the variable "value" between the value of the variable "low" and the value of the variable "high". Accordingly, the variable "cum" takes, for example, a value between 0 and 2^16 in dependence on the value of the variable "value".
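A minimal C sketch of the initialization and of the computation of "range" and "cum" is given below. The types "ArithState" and "BitReader", the MSB-first bit order of "next_bit" and the exact scaling formula in "arith_cum" are assumptions of this sketch; the formula is the usual one for integer arithmetic coding with 16-bit cumulative frequencies and is not quoted from the figures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t low, high, value; } ArithState;

/* Hypothetical bit source: reads bits MSB-first, returns 0 past the end. */
typedef struct { const uint8_t *data; size_t size, pos; } BitReader;

static unsigned next_bit(BitReader *br)
{
    size_t byte = br->pos >> 3;
    unsigned bit = 0;
    if (byte < br->size)
        bit = (br->data[byte] >> (7u - (br->pos & 7u))) & 1u;
    br->pos++;
    return bit;
}

/* Initialization: "value" from 20 bitstream bits, low = 0, high = 2^20 - 1. */
static void arith_init(ArithState *s, BitReader *br)
{
    s->value = 0;
    for (int k = 0; k < 20; ++k)
        s->value = (s->value << 1) | next_bit(br);
    s->low = 0;
    s->high = 1048575u;
}

/* Relative position of "value" inside [low, high], scaled by 2^16
 * (assumed formula). */
static uint32_t arith_cum(const ArithState *s)
{
    assert(s->low <= s->value && s->value <= s->high);
    uint64_t range = (uint64_t)s->high - s->low + 1;   /* high - low + 1 */
    return (uint32_t)(((((uint64_t)s->value - s->low + 1) << 16) - 1) / range);
}
```

With this formula the result stays in the range 0 to 65535 whenever "value" lies inside the current interval.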
  • the pointer p is initialized to a value which is smaller, by 1, than the starting address of the selected cumulative-frequencies-table.
  • the algorithm "arith_decode()" also comprises an iterative cumulative-frequencies-table- search 550c.
  • the iterative cumulative-frequencies-table-search is repeated until the variable cfl is smaller than or equal to 1.
  • the pointer variable q is set to a value, which is equal to the sum of the current value of the pointer variable p and half the value of the variable cfl.
  • the pointer variable p is set to the value of the pointer variable q, and the variable cfl is incremented. Finally, the variable cfl is shifted to the right by one bit, thereby effectively dividing the value of the variable cfl by 2 and neglecting the modulo portion.
  • the iterative cumulative-frequencies-table-search 550c effectively compares the value of the variable "cum" with a plurality of entries of the selected cumulative-frequencies-table, in order to identify an interval within the selected cumulative-frequencies-table, which is bounded by entries of the cumulative-frequencies-table, such that the value cum lies within the identified interval.
  • the entries of the selected cumulative-frequencies-table define intervals, wherein a respective symbol value is associated to each of the intervals of the selected cumulative-frequencies-table.
  • the widths of the intervals between two adjacent values of the cumulative-frequencies-table define probabilities of the symbols associated with said intervals, such that the selected cumulative-frequencies-table in its entirety defines a probability distribution of the different symbols (or symbol values). Details regarding the available cumulative-frequencies-tables will be discussed below taking reference to Fig. 16.
  • the symbol value is derived from the value of the pointer variable p, wherein the symbol value is derived as shown at reference numeral 550d.
  • the difference between the value of the pointer variable p and the starting address "cum_freq" is evaluated in order to obtain the symbol value, which is represented by the variable "symbol”.
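The table search and the symbol derivation can be sketched in C as follows. The sketch uses array indices instead of pointers (so that no pointer before the start of the table is formed). The comparison "cum_freq[q] > cum", the assumption that the table is monotonically decreasing with a final entry of 0, and the "+ 1" in the symbol derivation are assumptions of this sketch, since the corresponding condition is not spelled out in the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the symbol whose interval [cum_freq[symbol], cum_freq[symbol-1])
 * contains cum; symbol 0 covers everything from cum_freq[0] upwards. */
static int arith_find_symbol(const uint16_t *cum_freq, int cfl, uint32_t cum)
{
    assert(cfl >= 2);
    int p = -1;                     /* one position before the table start */
    do {
        int q = p + (cfl >> 1);     /* probe in the middle of the remainder */
        if (cum_freq[q] > cum) {    /* assumed comparison */
            p = q;
            cfl++;
        }
        cfl >>= 1;                  /* halve, dropping the remainder */
    } while (cfl > 1);
    return p + 1;                   /* distance from the table start (assumed) */
}
```

For example, with the hypothetical table {12000, 8000, 3000, 0} a value cum = 9000 falls between the entries 8000 and 12000 and therefore yields the symbol 1.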
  • the algorithm "arith_decode" also comprises an adaptation 550e of the variables "high" and "low". If the symbol value represented by the variable "symbol" is different from 0, the variable "high" is updated, as shown at reference numeral 550d. Also, the value of the variable "low" is updated, as shown at reference numeral 550e.
  • the variable "high” is set to a value which is determined by the value of the variable “low”, the variable “range” and the entry having the index "symbol -1" of the selected cumulative-frequencies-table.
  • the variable “low” is increased, wherein the magnitude of the increase is determined by the variable "range” and the entry of the selected cumulative-frequencies-table having the index "symbol". Accordingly, the difference between the values of the variables "low” and “high” is adjusted in dependence on the numeric difference between two adjacent entries of the selected cumulative-frequencies-table.
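The adaptation of "high" and "low" may be sketched in C as shown below. The type "Interval", the function name and the exact arithmetic (multiplication by the table entry followed by a right shift by 16 bits) are assumptions of this sketch; the text only states which quantities enter each update.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t low, high; } Interval;

/* Narrows [low, high] to the sub-interval belonging to "symbol".
 * cum_freq is assumed to be decreasing and scaled to 2^16. */
static void arith_narrow(Interval *iv, const uint16_t *cum_freq, int symbol)
{
    assert(symbol >= 0 && iv->low <= iv->high);
    uint64_t range = (uint64_t)iv->high - iv->low + 1;
    if (symbol != 0)   /* for symbol 0 the upper bound stays unchanged */
        iv->high = iv->low
                 + (uint32_t)((range * cum_freq[symbol - 1]) >> 16) - 1;
    iv->low += (uint32_t)((range * cum_freq[symbol]) >> 16);
}
```

Note that "high" is computed before "low" is changed, so that both updates refer to the old lower bound.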
  • the interval between the values of the variables "low” and “high” is reduced to a narrow width.
  • the detected symbol value comprises a relatively large probability
  • the width of the interval between the values of the variables "low" and "high" is set to a comparatively large value. Again, the width of the interval between the values of the variables "low" and "high" is dependent on the detected symbol and the corresponding entries of the cumulative-frequencies-table.
  • the algorithm "arith_decode()" also comprises an interval renormalization 550f, in which the interval determined in the step 550e is iteratively shifted and scaled until the "break" condition is reached.
  • In the interval renormalization 550f, a selective shift-downward operation 550fa is performed. If the variable "high" is smaller than 524286, nothing is done, and the interval renormalization continues with an interval-size-increase operation 550fb.
  • If the variable "high" is not smaller than 524286 and the variable "low" is greater than or equal to 524286, the variables "value", "low" and "high" are all reduced by 524286, such that an interval defined by the variables "low" and "high" is shifted downwards, and such that the value of the variable "value" is also shifted downwards.
  • If it is found that the variable "high" is not smaller than 524286, that the variable "low" is not greater than or equal to 524286, that the variable "low" is greater than or equal to 262143 and that the variable "high" is smaller than 786429, the variables "value", "low" and "high" are all reduced by 262143, thereby shifting down the interval between the values of the variables "high" and "low" and also the value of the variable "value". If, however, none of the above conditions is fulfilled, the interval renormalization is aborted.
  • the interval-increase-operation 550fb is executed.
  • the value of the variable "low” is doubled.
  • the value of the variable "high” is doubled, and the result of the doubling is increased by 1.
  • the value of the variable "value” is doubled (shifted to the left by one bit), and a bit of the bitstream, which is obtained by the helper function "arith_get_next_bit" is used as the least-significant bit.
  • the size of the interval between the values of the variables "low” and “high” is approximately doubled, and the precision of the variable "value” is increased by using a new bit of the bitstream.
  • the steps 550fa and 550fb are repeated until the "break” condition is reached, i.e. until the interval between the values of the variables "low” and “high” is large enough.
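The renormalization loop described above can be sketched in C as follows. The constants are taken from the text; the types "ArithState" and "BitReader" and the MSB-first helper "next_bit" are hypothetical stand-ins for the global variables and for "arith_get_next_bit".

```c
#include <stddef.h>
#include <stdint.h>

#define QTR       262143u   /* thresholds as given in the description */
#define HALF      524286u
#define THREE_QTR 786429u

typedef struct { uint32_t low, high, value; } ArithState;
typedef struct { const uint8_t *data; size_t size, pos; } BitReader;

/* Hypothetical bit source: MSB-first, returns 0 past the end of the data. */
static unsigned next_bit(BitReader *br)
{
    size_t byte = br->pos >> 3;
    unsigned bit = 0;
    if (byte < br->size)
        bit = (br->data[byte] >> (7u - (br->pos & 7u))) & 1u;
    br->pos++;
    return bit;
}

static void arith_renormalize(ArithState *s, BitReader *br)
{
    for (;;) {
        if (s->high < HALF) {
            /* interval lies in the lower half: no shift needed */
        } else if (s->low >= HALF) {
            s->value -= HALF; s->low -= HALF; s->high -= HALF;
        } else if (s->low >= QTR && s->high < THREE_QTR) {
            s->value -= QTR; s->low -= QTR; s->high -= QTR;
        } else {
            break;                       /* interval is large enough */
        }
        s->low += s->low;                /* interval-size increase */
        s->high += s->high + 1;
        s->value = (s->value << 1) | next_bit(br);
    }
}
```

Each pass through the loop consumes exactly one bit of the bitstream, so the number of bits read is determined by how far the interval was narrowed by the preceding symbol.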
  • the interval between the values of the variables "low" and "high" is reduced in the step 550e in dependence on two adjacent entries of the cumulative-frequencies-table referenced by the variable "cum_freq". If an interval between two adjacent values of the selected cumulative-frequencies-table is small, i.e. if the adjacent values are comparatively close together, the interval between the values of the variables "low" and "high", which is obtained in the step 550e, will be comparatively small. In contrast, if two adjacent entries of the cumulative-frequencies-table are spaced further, the interval between the values of the variables "low" and "high", which is obtained in the step 550e, will be comparatively large.
  • the entries of the cumulative-frequencies-tables reflect the probabilities of the different symbols and also reflect a number of bits required for decoding a sequence of symbols.
  • If the decoded group index ng is the escape symbol "ARITH_ESCAPE", an additional group index ng is decoded and the variable lev is incremented by 2. Accordingly, information is obtained about the numeric significance of the most-significant bit-plane and also about a number of less-significant bit-planes to be decoded.
  • the state variable t is then incremented by the value 4194304, which corresponds to setting the 23rd bit of the variable t to "1". If an escape symbol is decoded for a second or subsequent time, the state variable t is then set to zero. In both cases, when an escape symbol "ARITH_ESCAPE" is decoded, the updated state variable t is then used for a new iteration of the group index decoding 312b.
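The bookkeeping performed when an escape symbol is decoded may be sketched in C as shown below. The structure "TupleState", its escape counter and the function name are hypothetical; the sketch merely follows the wording above (increment by 4194304 on the first escape, reset to zero on any later escape, lev increased by 2 each time).

```c
#include <stdint.h>

typedef struct {
    uint32_t t;     /* context state passed to "arith_get_pk()" */
    int lev;        /* number of less-significant bit planes to decode */
    int escapes;    /* hypothetical counter of escapes seen for this tuple */
} TupleState;

static void on_escape(TupleState *ts)
{
    ts->lev += 2;
    if (ts->escapes == 0)
        ts->t += 4194304u;   /* 2^22, i.e. the "23rd bit" counted from 1 */
    else
        ts->t = 0;           /* second or later escape */
    ts->escapes++;
}
```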
  • variable mm is set to a value, which is determined by the least-significant bits (e.g. bits 0-7) of the entries of the table "dgroups[]" at a table position determined by the group index ng.
  • group offset og is determined by the more significant bits (bit 8 and onwards) of the entry of the table "dgroups[]", which entry is determined by the position offset defined by the variable ng.
  • variable mm is greater than 1, i.e. if the group determined by the group index ng comprises more than one element, the element index ne is decoded by calling the function "arith_decode()" with the cumulative-frequencies-table
  • a subsection of the table "arith_cf_ne[]" is selected, wherein the cumulative-frequencies-table "arith_cf_ne[]" in its entirety describes probability distributions for a plurality of different numbers of elements of a group selected by the group index ng. It should be noted that offsets of the different subsections (or subtables) of the cumulative-frequencies-table "arith_cf_ne[]" are described by the formula (mm*(mm-1))>>1.
  • variable "cum_freq", which is used as an input variable of the algorithm "arith_decode()", is initialized to the starting address of the subsection (or subtable) of the table or array "arith_cf_ne[]" which is associated to the number of elements mm of the current group described by the group index ng.
  • the variable cfl, which is an input variable of the algorithm "arith_decode()” is initialized to the value mm. Subsequently, the function "arith_decode()" is called, the operation of which has been described in detail above.
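The unpacking of a "dgroups[]" entry and the computation of the subtable offset can be sketched in C as follows. The type "GroupInfo" and the function names are hypothetical; the bit layout (bits 0 to 7 for mm, bit 8 and onwards for og) and the offset formula are taken from the text.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned mm;   /* number of elements in the group */
    unsigned og;   /* group offset */
} GroupInfo;

/* Splits one table entry into element count and group offset. */
static GroupInfo unpack_group(uint32_t entry)
{
    GroupInfo g = { entry & 0xFFu, entry >> 8 };
    return g;
}

/* Start of the subtable for groups with mm elements: (mm*(mm-1))>>1. */
static unsigned ne_subtable_offset(unsigned mm)
{
    assert(mm >= 1);
    return (mm * (mm - 1u)) >> 1;
}
```

The offsets 0, 1, 3, 6, ... obtained for mm = 1, 2, 3, 4, ... are consistent with subtables of length mm being stored one after another, although that layout is an inference from the formula rather than a statement of the text.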
  • a first spectral value "a" of the tuple of spectral values can be set to an entry of the table or array "dgvectors[]", wherein the array element index (or table element index, or briefly “element index” or “entry index”) is determined as 4*(og+ne).
  • the second spectral value "b" of the tuple of spectral values can be set to an entry of the array "dgvectors[]", wherein the array element index is determined by 4*(og+ne)+1.
  • a third spectral value "c" of the tuple of spectral values can be set to an entry of the array "dgvectors[]", wherein an element index is determined by 4*(og+ne)+2.
  • a fourth spectral value "d" of the tuple of spectral values can be set to an entry of the array "dgvectors[]", wherein the element index is determined by 4*(og+ne)+3.
  • the spectral values "a", "b", "c", "d", which represent the most-significant bit planes of the tuple of spectral values, are derived from the array "dgvectors[]", wherein the entries of the array determining the spectral values "a", "b", "c", "d" are selected in accordance with the group index ng and the element index ne (if available).
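The lookup of the four spectral values can be sketched in C as shown below. The type "Tuple4", the function name and the element type of the table (signed 8-bit) are assumptions of this sketch; only the index arithmetic 4*(og+ne)+k is taken from the text.

```c
#include <stdint.h>

typedef struct { int a, b, c, d; } Tuple4;

/* Reads the four consecutive entries starting at index 4*(og+ne). */
static Tuple4 lookup_msb_plane(const int8_t *dgvectors, unsigned og, unsigned ne)
{
    const int8_t *v = &dgvectors[4u * (og + ne)];
    Tuple4 t = { v[0], v[1], v[2], v[3] };
    return t;
}
```

For a group with a single element, ne would be 0 and the group offset og alone selects the vector.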
  • the remaining bit planes are then decoded from the most-significant to the least-significant level by calling the function "arith_decode()" lev times with the cumulative-frequencies-table "arith_cf_r[]".
  • the input variable "cum_freq" of the function "arith_decode()" may be initialized to the starting address of the array "arith_cf_r[]".
  • the input variable cfl of the function "arith_decode()" may be initialized to an appropriate value representing the length of the table "arith_cf_r[]", which, in the case of a tuple of dimension 4, is equal to 16.
  • the function "arith_decode()" can return the variable r, which represents the binary values of one of the less-significant bit-planes of the decoded tuple.
  • the decoded bit-plane r allows the decoded 4-tuple to be refined according to the algorithm shown in Fig. 5h.
  • the first spectral value "a” is multiplied by 2 (or, equivalently, shifted to the left by one bit), and the least-significant bit (bit 0) of the value r is added as the new least-significant bit (which may be done using an OR operation).
  • the second spectral value "b” is multiplied by 2, and the second bit (bit 1) of the value r is added as a least-significant value of the spectral value "b”.
  • the third spectral value "c” is multiplied by 2 (or, equivalently, shifted to the left by one bit), and the third bit (bit 2) of the value r is added as the least-significant bit.
  • the fourth spectral value "d" is multiplied by 2 (or, equivalently, shifted to the left by one bit), and the fourth bit (bit 3) of the value r is added as a least-significant bit.
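The refinement by one bit plane can be sketched in C as follows. The type "Tuple4" and the function name are hypothetical. The sketch uses a multiplication by 2 followed by an addition rather than a left shift, because the spectral values may be negative and a left shift of a negative value is not defined in C.

```c
typedef struct { int a, b, c, d; } Tuple4;

/* Appends bits 0..3 of r as new least-significant bits of a, b, c, d. */
static void refine_tuple(Tuple4 *t, unsigned r)
{
    t->a = t->a * 2 + (int)(r & 1u);
    t->b = t->b * 2 + (int)((r >> 1) & 1u);
    t->c = t->c * 2 + (int)((r >> 2) & 1u);
    t->d = t->d * 2 + (int)((r >> 3) & 1u);
}
```

Calling the function lev times, once per decoded value r, reproduces the bit-plane-by-bit-plane refinement described above.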
  • Context Update (Fig. 5i)
  • the function "arith_update_context” receives, as input variables, the spectral values "a", “b", “c", “d” of the decoded 4-tuple, the index "i” of the 4-tuple to decode (or the decoded 4-tuple) and the number lg/4 of 4-tuples associated with the current audio frame.
  • the function "arith_update_context()" comprises a step 580a of copying the spectral values "a", "b", "c", "d", into the array q.
  • the entry "a" of the array q at the position (1, i) (also designated with "q[1][i].a") is set to take the first spectral value "a".
  • the entry "b" of the array q at the position (1, i) is set to the second spectral value "b".
  • the entry "c" of the array q at the position (1, i) is set to the third spectral value "c".
  • the entry "d" of the array q at the position (1, i) is set to the fourth spectral value "d".
  • the spectral values "a", "b", "c", "d" are stored in the entries "a", "b", "c", "d" of the array q at the position (1, i).
  • the function "arith_update_context()" also comprises a step 580b of setting the entry "v" of the array q at the position (1, i). If one of the spectral values "a", "b", "c", "d" of the currently-decoded tuple of spectral values is smaller than -4 or greater than or equal to 4, the entry "v" of the array q at the position (1, i) is set to the value of 1024.
  • Otherwise, the entry "v" of the array q at the position (1, i) is set to an entry of the table or array "egroups[]" at the position (4+a, 4+b, 4+c, 4+d). Accordingly, the entry "v" of the array q is typically set to a standard value of 1024, if one of the spectral values "a", "b", "c", "d" is comparatively large, thereby causing the function "arith_get_context()" to perform the process 530f during the decoding of an adjacent tuple of spectral values.
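Steps 580a and 580b may be sketched in C as shown below. The type "ContextEntry", the function name, the 16-bit element type and the flat row-major layout of the 8x8x8x8 table are assumptions of this sketch; the range test and the index offsets of 4 follow the text.

```c
#include <stdint.h>

typedef struct {
    int a, b, c, d;   /* decoded spectral values of the tuple */
    uint16_t v;       /* group value used for the context of later tuples */
} ContextEntry;

/* Stores the tuple and derives v; egroups points to 8*8*8*8 entries
 * in row-major order (assumed layout). */
static void store_context(ContextEntry *e, const uint16_t *egroups,
                          int a, int b, int c, int d)
{
    e->a = a; e->b = b; e->c = c; e->d = d;
    if (a < -4 || a >= 4 || b < -4 || b >= 4 ||
        c < -4 || c >= 4 || d < -4 || d >= 4) {
        e->v = 1024;   /* at least one value outside -4..3 */
    } else {
        e->v = egroups[(((4 + a) * 8 + (4 + b)) * 8 + (4 + c)) * 8 + (4 + d)];
    }
}
```

The offset of 4 maps the value range -4 to 3 onto the index range 0 to 7 of each table dimension.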
  • the function "arith_update_context()" also comprises a first mapping 580c, which is performed if the last tuple of spectral values of a current audio frame is decoded and the core mode is the linear-predictive-frequency-domain core mode (for the case of an audio coder switchable between a frequency-domain core mode and a linear-predictive-frequency-domain core mode).
  • the function "arith_update_context()" also comprises a second mapping 580d, which is performed if the last tuple of spectral values of the current audio frame is decoded and if the core mode is the frequency-domain core mode.
  • 4-tuples of quantized spectral coefficients are noiselessly coded and transmitted starting from the lowest-frequency coefficient and progressing to the highest-frequency coefficient.
  • the coefficients from the advanced-audio-coding are stored in an array
  • Coefficients from the transform-coded-excitation are stored directly in an array "x_tcx_invquant[win][bin]", and the order of the transmission of the noiseless coding codewords is such that when they are decoded in the order received and stored in the array, "bin” is the most rapidly incrementing index and "win” is the most slowly incrementing index.
  • the order of decoding is a, b, c, d.
  • the flag "arith_reset_flag" determines if the context must be reset. If the flag is TRUE, the function "arith_reset_context", a pseudo-program-code representation of which is shown in Fig. 5a, is called. Otherwise, when the flag "arith_reset_flag" is FALSE, a mapping is done between the past context and the current context in accordance with the function "arith_map_context()", a pseudo-program-code representation of which is shown in Fig. 5b.
  • the noiseless decoder outputs 4-tuples of signed quantized spectral coefficients.
  • the state of the context is calculated based on the four previously decoded groups surrounding the 4-tuple to decode.
  • the state of the context is given by the function "arith_get_context()", a pseudo-program-code representation of which is shown in Fig. 5c.
  • the group to which the most-significant signed 2-bit-wise plane of the 4-tuple belongs is decoded using the function "arith_decode()", fed with the appropriate cumulative-frequencies-table corresponding to the context state.
  • the function "arith_decode()" is called with the cumulative-frequencies-table "arith_cf_ng[pki][]" corresponding to the index returned by the function "arith_get_pk()".
  • the arithmetic coder (or decoder) is an integer implementation using the method of tag generation with scaling. For details, reference is made, for example, to the book “Introduction to Data Compression” of K. Sayood, third edition, 2006, Elsevier Inc.
  • the pseudo-C-code of Fig. 5e describes the used algorithm of the function "arith_decode()".
  • If the decoded group index ng is the escape symbol "ARITH_ESCAPE", an additional group index ng is decoded and the variable lev is incremented by 2. The state of the context is also adjusted.
  • If the decoded group index is not the escape symbol "ARITH_ESCAPE", the number of elements, mm, within the group and the group offset, og, are deduced by a lookup in the table "dgroups[]", in accordance with the algorithm shown in Fig. 5f.
  • the element index ne is then decoded by calling the function "arith_decode()" with the cumulative-frequencies-table (arith_cf_ne+((mm*(mm-1))>>1))[].
  • the most-significant 2-bit-wise plane of the 4-tuple can be derived with the table "dgvectors[]" in accordance with the algorithm shown in Fig. 5g.
  • bit planes are then decoded from the most-significant to the least-significant level by calling the function "arith_decode()" lev times with the cumulative-frequencies-table "arith_cf_r[]".
  • the decoded bit plane r allows the decoded 4-tuple to be refined by the algorithm shown in Fig. 5h.
  • the table of Fig. 14 lists the entries of the table "arith_cf_ng_hash[]". Said entries are referenced by a one-dimensional integer-type entry index (also designated as "element index" or "array index"), which is, for example, designated with "i".
  • a first column 1410, which is an index column, describes starting indices associated to the respective lines 1412a to 1412p.
  • a first value column 1420 shows the values of entries of the table "arith_cf_ng_hash[]" for entry indices identical to the start index shown in the index column 1410.
  • a second value column 1422 shows entries of the table "arith_cf_ng_hash" for entry indices which are larger, by one, than the start indices shown in column 1410 of the respective line.
  • a third value column 1424 describes entries of the table "arith_cf_ng_hash", for which the element index is larger, by two, than the starting index shown in column 1410 of the corresponding line.
  • columns 1426, 1428, 1430, 1432, 1434 show entries of the table "arith_cf_ng_hash", for which the element index is larger, by 3 (column 1426), by 4 (column 1428), by 5 (column 1430), by 6 (column 1432) or by 7 (column 1434) than the starting index shown in column 1410 of the respective line 1412a to 1412p.
  • the line 1412a shows, in the columns 1420 to 1434, entries of the table "arith_cf_ng_hash” having element indices 0, 1, 2, 3, 4, 5, 6 and 7.
  • the line 1412b shows, in the columns 1420 to 1434, a sequence of entries of the table "arith_cf_ng_hash” having element indices of 8, 9, 10, 11, 12, 13, 14 and 15.
  • the arrangement of the entries discussed above applies analogously.
  • Table "arith_cf_ne": Figs. 15(1) to 15(10) show table presentations of entries of the table "arith_cf_ne", which is evaluated by the function "arith_decode()" when decoding the element index ne.
  • the entries of the table "arith_cf_ne[]" comprise indices between 0 and 2699.
  • a starting index column 1510 describes starting indices associated to the respective lines.
  • a first line is designated, for example, with 1512a, and a second line is designated, for example, with 1512b.
  • a first entry column 1520 shows entries of the table "arith_cf_ne[]", the entry index of which is determined by the value given in the starting index column 1510 of the corresponding lines (e.g. lines 1512a, 1512b). Entry columns 1522, 1524, 1526, 1528, 1530, 1532, 1534 show entries of the table "arith_cf_ne[]" having element indices which are larger, by 1 (column 1522), by 2 (column 1524), by 3 (column 1526), by 4 (column
  • the first line 1512a shows, in columns 1520 to 1534, elements of the table "arith_cf_ne[]" having entry indices between 0 and 7.
  • the second line 1512b shows, in columns 1520 to 1534, entries of the table "arith_cf_ne[]" having entry indices between 8 and 15.
  • Figs. 16(1) to 16(32) show a set of 32 cumulative-frequencies-tables "arith_cf_ng[pki][545]", one of which is selected by an audio encoder 100 or audio decoder 200, for example, for the execution of the function "arith_decode()", i.e. for the decoding of the group index ng.
  • the selected one of the 32 cumulative-frequencies-tables shown in Fig. 16(1) to 16(32) takes the function of the table "cum_freq[]" in the execution of the function "arith_decode()".
  • the table 1601 comprises an index column 1640, which shows starting indices associated with respective lines of the table 1601. A first line is designated with 1642a, and a second line is designated with 1642b.
  • a first entry column 1650 represents entries of the table 1601, the entry indices of which are identical to the starting indices shown in the index column 1640 of the respective lines.
  • the first line 1642a shows the values of the entries of the table 1601 having element indices between 0 (value 16684) and 15 (value 6352).
  • the second line 1642b shows values of entries of the table 1601 having element indices between 16 (value 6202) and 31 (value 3547).
  • Figs. 17(1) and 17(2) show a representation of the entries of a table "dgroups[]", which may be applied by the audio encoder 100 and the audio decoder 200.
  • the table “dgroups[]” may be applied in the execution of the algorithm “tuples_decode()", which is shown in Fig. 3.
  • the table “dgroups[]” may be applied by the algorithm shown in Fig. 5f in order to determine the number mm of elements in a group and the group offset og of a group designated by the group index ng.
  • the representation of the table "dgroups[]" comprises an index column 1710, which shows start indices associated with respective lines, e.g. a first line 1712a and a second line 1712b of the table representation.
  • a first entry column 1720 of the representation shows entries of the table "dgroups[]", the element indices of which are identical to the starting indices shown in the index column 1710 of the respective lines.
  • entry columns 1722, 1724, 1726, 1728, 1730, 1732, 1734 show entries of the table "dgroups[]", the element indices of which are larger by 1 (col. 1722), by 2 (col. 1724), by 3 (col. 1726), by 4 (col. 1728), by 5 (col.
  • the first line 1712a shows, in entry columns 1720 to 1734, entries of the table "dgroups[]" having element indices between 0 (col. 1720) and 7 (col. 1734).
  • the second line 1712b shows, in columns 1720 to 1734, entries of the table "dgroups[]" having element indices between 8 (col. 1720) and 15 (col. 1734).
  • the entries of the table "dgroups[]" are shown in a hexadecimal notation in Figs. 17(1) and 17(2), which is indicated by the prefix "0x". The most-significant hexadecimal digit is shown on the left side and the least-significant hexadecimal digit is shown on the right side.
  • Figs. 18(1) to 18(11) show a table representation of the entries of the table "dgvectors[]".
  • the table "dgvectors[]" may for example be used in the audio encoder 100 or the audio decoder 200.
  • the table "dgvectors[]" may be used in the step 312d of the function "tuples_decode()" shown in Fig. 3 or in the execution of the algorithm of Fig. 5g.
  • the table "dgvectors[]" may be used to map a group index and an element index onto values of a most-significant bit-plane of a tuple of spectral values.
  • the table representation of Figs. 18(1) to 18(11) comprises an index column 1810, which comprises starting indices associated with the lines (for example a first line 1812a or a second line 1812b) of the table representation.
  • Entry columns 1820 to 1882 show entries of the table "dgvectors[]".
  • the entry column 1820 shows entries of the table "dgvectors[]", entry indices of which are identical to a corresponding starting index shown in the index column 1810 of the respective lines.
  • Subsequent entry columns 1822 to 1882 show, in ascending order, entries of the table "dgvectors[]", the element indices of which are larger, by 1 (col. 1822), by 2 (col. 1824), by 3 (col.
  • the first line 1812a shows, in columns 1820 to 1882, entries of the table "dgvectors[]" having element indices between 0 and 31, wherein the element indices associated with the entries increase monotonically from the left to the right.
  • the second line 1812b shows, in columns 1820 to 1882, entries of the table "dgvectors[]" having element indices between
  • Table "egroups": Figs. 19(1) to 19(32) show a table representation of a table "egroups[a][b][c][d]", which may also be considered as a 4-dimensional array having four element indices a, b, c, d. It should be noted that each of the element indices a, b, c, d may take values between 0 and 7.
  • the table or array "egroups[a][b][c][d]" may be used in the audio encoder 100 or in the audio decoder 200.
  • the table or array "egroups[a][b][c][d]" may be used in the function "arith_get_context()" to derive the return value and in the function "arith_update_context()" to determine the entry "v" of the array q at entry index (1, 1+i).
  • the table 1901 comprises an index column 1970 representing values of the third index c associated with the respective lines 1972a to 1972h.
  • an index line 1980 represents index values of the fourth index d associated with respective columns 1982a to 1982h of the table 1901.
  • the indices a and b associated to an entry of the tables 1901 to 1964 are written next to the respective table.
  • the index c associated with an entry of one of the tables 1901 to 1964 is determined by the value of the index column of the respective line
  • the fourth index d of an entry is determined by the value of the index line of the respective column of the entry.
  • the entries of the array "egroups" are represented in hexadecimal notation, which is indicated by the prefix "0x".
  • the embodiments according to the invention use an updated set of tables, as discussed above, which reduces significantly the memory requirements for the spectral noiseless coding when compared to a previously-used set of tables.
  • a lossless transcoding is possible within the bitrate constraints.
  • Audio coding concepts like, for example, the so-called unified-speech-and-audio-coding (USAC) use a context-adapted arithmetic coder (and decoder) for noiselessly (or losslessly) coding the quantized spectral coefficients (for example the spectral coefficients 252).
  • the context adaptation associated with an arithmetic coder allows a high noiseless coding performance to be achieved.
  • the main drawback of this technology is its relatively high complexity in terms of memory requirements. Indeed, the context adaptation requires a large set of tables modeling different probability distributions.
  • the read-only-memory (ROM) consumption of the previously-used unified-speech-and-audio-coding (USAC) is evaluated to be about 150 kWords, wherein the entropy coder represents around 73 percent of the total requirement.
  • One of the aims of the present contribution is to propose a new set of tables for the arithmetic coder (or decoder) requiring significantly less memory space while maintaining the original performance of the noiseless coder (encoder or decoder).
  • Fig. 7 shows a table listing the detailed memory requirements of a previously-implemented USAC noiseless coder (encoder or decoder).
  • the new set of tables shows a size reduction factor of about 7 compared to the original ones.
  • the reduction was achieved by reducing the number of probability distribution models and by optimizing the selected models. Further, the mapping between the context state and the newly-defined models was optimized.
  • Fig. 9 shows a block schematic diagram of a lossless transcoding from the "reference model 0 tables" to the "reduced tables".
  • the evaluation setup comprises a USAC RM0 encoder 910, which receives, as an input information, spectral values 908, and which provides an arithmetically encoded representation 912 of the spectral values using the "old" tables of the reference model 0. Accordingly, a so-called "RM0" bitstream is obtained, which comprises the arithmetically-encoded spectral values 912.
  • the evaluation setup 900 also comprises lossless transcoding 920, in which the arithmetically-encoded spectral values 912 of the RM0 bitstream are decoded by an entropy decoder using the "old" tables (reference model 0 tables) to obtain decoded spectral values 924.
  • the lossless transcoding also comprises an entropy encoder, which is configured to encode the decoded spectral values 924 using the "new" reduced tables in order to obtain a reduced-tables bitstream 928.
  • the RM0 bitstream which comprises the representation of the spectral values encoded using the "old" tables
  • the reduced-tables bitstream 928 which comprises the representation of the spectral values encoded using the "new" tables.
  • the so-called RM0 bitstream is decoded using a USAC reference decoder and using the "old" tables to obtain a so-called RM0 synthesis result 942.
  • the so-called reduced-tables bitstream 928 is decoded using a USAC reference decoder and using the "new" tables. Accordingly, a reduced-tables synthesis result 952 is obtained.
  • the RM0 synthesis result 942 is subsequently compared with the reduced-tables synthesis result 952 to verify the correctness of the implementation.
  • the tables of Figs. 10 and 11 display minimum, maximum and average bitrates over all sub-segments of the entire, concatenated, encoded item with the RM0 tables and the reduced tables, respectively.
  • an individual sub-segment length was determined by a combination of succeeding access-units, whose length is closest to 100 ms.
  • Sub-segment lengths and the corresponding bitrates are specified in the two tables.
  • the lossless transcoding from the RM0 tables bitstream to the reduced tables bitstream is achieved for each operating mode, i.e. the bit reservoir conditions were not violated while obtaining a bit-exact synthesis.
  • the table of Fig. 12 compares the bitrates generated only by the core coder when using the RM0 tables and the reduced tables.
  • the reduced tables perform on average slightly better than the RM0 tables for each operating mode except at 64 kbit per second (kbps) stereo, where the average increase of bits is only 0.02% of 64 kbps, which corresponds to approximately 0.5 bits/frame. Nevertheless, the bitstream generated with the reduced tables still matches the bitrate requirement at this operating mode.
  • the table of Fig. 13 shows, for each of the operating modes, the worst and best cases of the difference of gathered bits after and before transcoding the RM0 tables bitstream to the reduced tables bitstream.
  • the difference is examined on a sub-segment basis. It can be observed in all the cases that the performance of the reduced tables is very consistent and very stable.
  • the maximum increase of bits by replacing the RM0 tables with the reduced tables is below 6 % of the total bitrate.
  • the decrease of bits can reach more than 6 %.
  • a new hash table arith_cf_ng_hash[128], a new set of 32 probability models, arith_cf_ng[32][545], and the corresponding mapping function "arith_get_pk()" are used.
  • the update of the arithmetic coder tables changes the mapping of the state index s into the probability model index pki as well as the probability models themselves. It changes neither the computation of the state index s nor the way the probability model is used afterwards for coding the current symbol (i.e.
  • the main advantage of the new tables is to reduce the memory requirements for storing the tables, which now have a size of about 15 kWords (i.e. 15*1024 words of 32 bits each) instead of about 110 kWords, while maintaining the coding efficiency.
  • mapping is done by calling the function arith_get_pk() with the state variable t as an input argument.
  • the return value is the probability model index pki, which is used by the arithmetic coder as the probability distribution (also called cumulative-frequencies) for coding the current symbol (or to select the appropriate cumulative-frequencies table out of a set of cumulative-frequencies-tables).
  • mapping is done in two steps, which are done in two different parts of the function.
  • the hash table "arith_cf_ng_hash” (also designated with ari_pk_hash[]) is considered and used for checking, if the current state t is a significant state.
  • a significant state is a state, which was previously selected during a training phase to have its "own” mapping into pki.
  • the non-significant states are mapped later on into pki in the second part 550e of the function arith_get_pk() by using default mappings (also designated as range-based mapping). According to the invention, the number of significant states was reduced to 67, which is drastically lower than the previous 22955 significant states from the old table.
  • the following code makes it possible to detect whether the present state t is a significant state. If yes, the function is finished (because the return-statement is reached) and returns the associated probability model index pki :
  • the second part 540e of the mapping function is used for mapping the non-significant states.
  • the states are divided into seven non-uniform sections according to the indices. Each section is associated to a probability model.
  • the mapping is recorded in the lookup table psci[].
  • the bits beyond the 22nd bit of t indicate the accuracy of the level prediction. According to the prediction accuracy, a different mapping is used. In total, seven sections and four accuracies are considered, which correspond to a total of 28 different mappings stored in psci[]. The mapping is done as follows:
  • the embodiments according to the invention are related to the above-discussed mapping tables, which are implemented in an audio encoder and an audio decoder.
  • the advantage of the new tables partly comes from the adequate training, which was performed. Further, the invention is based on the considerations about the best dimension of the tables.
  • bitstream syntax of a bitstream carrying the arithmetically-encoded spectral information will be described taking reference to Figs. 6a to 6h.
  • Fig. 6a shows a syntax representation of a so-called USAC raw data block ("usac_raw_data_block()").
  • the USAC raw data block comprises one or more single channel elements ("single_channel_element()") and/or one or more channel pair elements ("channel_pair_element()”) .
  • the single channel element comprises a linear-prediction-domain channel stream ("lpd_channel_stream ()”) or a frequency-domain channel stream (“fd_channel_stream ()”) in dependence on the core mode.
  • Fig. 6c shows a syntax representation of a channel pair element.
  • a channel pair element comprises core mode information ("core_mode0", "core_mode1").
  • the channel pair element may comprise a configuration information "ics_info()".
  • the channel pair element comprises a linear- prediction-domain channel stream or a frequency-domain channel stream associated with a first of the channels, and the channel pair element also comprises a linear-prediction- domain channel stream or a frequency-domain channel stream associated with a second of the channels.
  • the configuration information "ics_info()" a syntax representation of which is shown in Fig. 6d, comprises a plurality of different configuration information items, which are not of particular relevance for the present invention.
  • a frequency-domain channel stream ("fd_channel_stream ()"), a syntax representation of which is shown in Fig. 6e, comprises a gain information ("global_gain") and a configuration information ("ics_info ()").
  • the frequency-domain channel stream comprises scale factor data ("scale_factor_data ()"), which describes scale factors used for the scaling of spectral values of different scale factor bands, and which is applied, for example, by the scaler 150 and the rescaler 240.
  • the frequency-domain channel stream also comprises arithmetically-coded spectral data (“ac_spectral_data ()”), which represents arithmetically-encoded spectral values.
  • the arithmetically-coded spectral data (“ac_spectral_data()"), a syntax representation of which is shown in Fig. 6f, comprises an optional arithmetic reset flag (“arith_reset_flag”), which is used for selectively resetting the context, as described above.
  • the arithmetically-coded spectral data comprise a plurality of arithmetic-data blocks ("arith_data"), which carry the arithmetically-coded spectral values.
  • the structure of the arithmetically-coded data blocks depends on the number of frequency bands (represented by the variable "num_bands") and also on the state of the arithmetic reset flag, as will be discussed in the following.
  • Fig. 6g shows a syntax representation of said arithmetically-coded data blocks.
  • the data representation within the arithmetically-coded data block depends on the number lg of spectral values to be encoded, the status of the arithmetic reset flag and also on the context, i.e. the previously-encoded spectral values.
  • the context for the encoding of the current set of spectral values is determined in accordance with the context determination algorithm shown at reference numeral 660.
  • the arithmetically-encoded data block comprises lg/4 sets of codewords, each set of codewords representing a tuple of spectral values.
  • a set of codewords comprises an arithmetic codeword "acod_ng [pki][ng]" representing a group index ng of the tuple of spectral values using between 1 and 20 bits.
  • a set of codewords also comprises an arithmetic codeword "acod_ne[ne]" representing an element index ne of a tuple of spectral values if the group comprising the tuple of spectral values includes more than one element.
  • the set of codewords comprises one or more codewords "acod_r [][][][]” if the tuple of spectral values requires more bit planes than the most-significant bit plane for a correct representation.
  • the codeword “acod_ne [ne]” represents the element index using between 1 and 20 bits and the codeword "acod_r [][][][]” represents a less-significant bit plane using between 1 and 20 bits.
  • bit planes are required (in addition to the most- significant bit plane) for a proper representation of the tuple of spectral values.
  • this is signaled by using one or more arithmetic escape codewords ("ARITH_ESCAPE").
  • ARITH_ESCAPE arithmetic escape codewords
  • arithmetic escape codewords "acod_ng [pki][ARITH_ESCAPE]", which are encoded in accordance with a currently-selected cumulative-frequencies-table, a cumulative-frequencies-table-index of which is given by the variable pki.
  • the context is adapted, as can be seen at reference numeral 664, 662, if one or more arithmetic escape codewords are included in the bitstream.
  • an arithmetic codeword "acod_ng [pki][ng]" is included in the bitstream, as shown at reference numeral 663, wherein pki designates the currently- valid probability model index (taking into consideration the context adaptation caused by the inclusion of the arithmetic escape codewords), and wherein ng designates the group index associated with the most-significant bit plane of the tuple of spectral values to be encoded.
  • the group index can be derived, in an encoder, by evaluating the table "dgvectors", which allows for deriving the group index ng and an element index ne associated to a tuple of spectral values when taken in combination with the table "dgroups".
  • the arithmetically-encoded data block comprises the codeword "acod_ne[ng]" encoding the element index ne using an appropriately-selected cumulative- frequencies-table.
  • any less-significant-bit planes results in the presence of one or more codewords "acod_r [][][][]", each of which represents a tuple of 4 bits of a least-significant bit plane.
  • the one or more codewords "acod_r[] [][][]” are encoded in accordance with a corresponding cumulative-frequencies-table, which is constant and context-independent.
  • Fig. 6h shows a legend of definitions and help elements defining the syntax of the arithmetically-encoded data block.
  • bitstream format which may be provided by the audio coder 100, and which may be evaluated by the audio decoder 200.
  • the bitstream of the arithmetically-encoded spectral values is encoded such that it fits the decoding algorithm discussed above.
  • the encoding is the inverse operation of the decoding, such that it can generally be assumed that the encoder performs a table lookup using the above-discussed tables, which is approximately inverse to the table lookup performed by the decoder.
  • the decoding algorithm and/or the desired bitstream syntax will easily be able to design an arithmetic encoder, which provides the data defined in the bitstream syntax and required by the arithmetic decoder.
  • the method 2000 comprises a first step 2010 of providing a plurality of decoded spectral values on the basis of an arithmetically-encoded representation of the spectral values.
  • the method 2000 further comprises a second step 2020 of providing a time-domain audio representation using the decoded spectral values.
  • the step 2010 comprises selecting 2012 a cumulative- frequencies-table out of a set of 32 cumulative-frequencies-tables in dependence on a state index.
  • the step 2010 also comprises a sub-step 2014 of applying the selected cumulative- frequencies-table to derive a group-index from a variable-length-codeword representing the group index.
  • the step 2010 also comprises the sub-step 2016 of deriving values of a most-significant bit-plane of a tuple of spectral values using the group index and the element index, the element index designating an element within a group selected by the group index.
  • the step 2010 also comprises a sub-step 2018 of providing a tuple of decoded spectral values using the values of the most-significant bit plane of the tuple of spectral values.
  • the method 2100 of Fig. 21 comprises a first step of providing a frequency-domain audio representation on the basis of a time-domain audio representation of the input audio information, such that the frequency-domain audio representation comprises a set of spectral values, and such that an energy is compacted in a sub-set of the spectral values.
  • the method 2100 also comprises a second step 2120 of encoding a tuple of adjacent spectral values of the set of spectral values, or of encoding a tuple of adjacent spectral values of a pre-processed version of the set of spectral values.
  • the step 2120 comprises a sub-step 2122 of mapping values of a most-significant bit plane of a tuple of spectral values onto a group index and an element index, the element index designating an element within a group selected by the group index.
  • the step 2120 also comprises a sub-step 2124 of selecting a cumulative- frequencies-table out of a set of 32 cumulative-frequencies-tables in dependence on a state index describing a state of the arithmetic encoder.
  • the step 2120 also comprises a sub-step 2126 of arithmetically encoding the group index using the selected cumulative- frequencies-table (selected in the sub-step 2124) in order to obtain an arithmetically- encoded variable-length codeword.
  • the methods 2000 and 2100 of Figs. 20 and 21 can be supplemented by any of the features and functionalities described herein with respect to the inventive coding scheme.
  • the methods 2000 and 2100 can be supplemented by any of the features and functionalities of the apparatus described herein.
  • the coding tables described herein are preferably used in connection with the encoding method and decoding method.
  • aspects described in the context of an apparatus it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
  • Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
  • the inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
  • embodiments of the invention can be implemented in hardware or in software.
  • the implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-ray disc, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
  • Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
  • embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.
  • the program code may for example be stored on a machine readable carrier.
  • inventions comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
  • an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
  • a further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
  • a further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein.
  • the data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
  • a further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
  • a further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
  • a programmable logic device for example a field programmable gate array
  • a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein.
  • the methods are preferably performed by any hardware apparatus.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An audio decoder comprises an arithmetic decoder for providing a plurality of decoded spectral values on the basis of an arithmetically-encoded representation of the spectral values and a frequency-domain to time-domain converter for providing a time-domain audio representation using the decoded spectral values, in order to obtain a decoded audio information. The arithmetic decoder is configured to derive a group index from a variable-length-codeword representing the group index in dependence on a state index. The arithmetic decoder is configured to derive the values of a most-significant bit-plane of a tuple of spectral values using the group index and an element index, and to provide a tuple of decoded spectral values using the values of the most-significant bit-plane of the tuple of spectral values. The arithmetic decoder is configured to select a cumulative-frequencies-table out of a set of 32 cumulative-frequencies-tables in dependence on the state index, and to apply the selected cumulative-frequencies-table to derive the group index from the variable-length codeword representing the group index.

Description

Audio Encoder, Audio Decoder, Method for Encoding an Input Audio Information, Method for Decoding an Input Audio Information and Computer Program using Improved Coding Tables
Background of the Invention
Embodiments according to the invention are related to an audio decoder for providing a decoded audio information on the basis of an encoded audio information, an audio encoder for providing an encoded audio information on the basis of an input audio information, a method for providing a decoded audio information on the basis of an encoded audio information, a method for providing an encoded audio information on the basis of an input audio information and a computer program.
Embodiments according to the invention are related to the concept of using updated arithmetic coder tables in an audio encoder, like, for example, a so-called unified-speech-and-audio-coder (USAC).
In the following, the background of the invention will be briefly explained in order to facilitate the understanding of the invention and the advantages thereof. During the past decade, considerable effort has been put into creating the possibility to digitally store and distribute audio contents with good bitrate efficiency. One important achievement along this way is the definition of the International Standard ISO/IEC 14496-3. Part 3 of this Standard is related to an encoding and decoding of audio contents, and subpart 4 of part 3 is related to general audio coding. ISO/IEC 14496 part 3, subpart 4 defines a concept for encoding and decoding of general audio content. In addition, further improvements have been proposed in order to improve the quality and/or to reduce the required bit rate.
According to the concept described in said Standard, a time-domain audio signal is converted into a time-frequency representation. The transform from the time-domain to the time-frequency-domain is typically performed using transform blocks, which are also designated as "frames" of time-domain samples. It has been found that it is advantageous to use overlapping frames, which are shifted, for example, by half a frame, because the overlap makes it possible to efficiently avoid (or at least reduce) artifacts. In addition, it has been found that a windowing should be performed in order to avoid the artifacts originating from this processing of temporally limited frames. By transforming a windowed portion of the input audio signal from the time-domain to the time-frequency domain, an energy compaction is obtained in many cases, such that some of the spectral values comprise a significantly larger magnitude than a plurality of other spectral values. Accordingly, there are, in many cases, a comparatively small number of spectral values having a magnitude which is significantly above an average magnitude of the spectral values. A typical example of a time-domain to time-frequency domain transform resulting in an energy compaction is the so-called modified-discrete-cosine-transform (MDCT).
The spectral values are often scaled and quantized in accordance with a psychoacoustic model, such that quantization errors are comparatively smaller for psychoacoustically more important spectral values, and are comparatively larger for psychoacoustically less-important spectral values. The scaled and quantized spectral values are encoded in order to provide a bitrate-efficient representation thereof.
For example, the usage of a so-called Huffman coding of quantized spectral coefficients is described in the International Standard ISO/IEC 14496-3:2005(E), part 3, subpart 4.
However, it has been found that the quality of the coding of the spectral values has a significant impact on the required bitrate. Also, it has been found that the complexity of an audio decoder, which is often implemented in a portable consumer device, and which should therefore be cheap and of low power consumption, is dependent on the coding used for encoding the spectral values.
In view of this situation, there is a need for a concept for an encoding and decoding of an audio content, which provides for an improved trade-off between bitrate-efficiency and resource efficiency.
Summary of the Invention
An embodiment according to the invention creates an audio decoder for providing a decoded audio information on the basis of an encoded audio information. The audio decoder comprises an arithmetic decoder for providing a plurality of decoded spectral values on the basis of an arithmetically-encoded representation of the spectral values. The audio decoder also comprises a frequency-domain to time-domain converter for providing a time-domain audio representation using the decoded spectral values, in order to obtain the decoded audio information. The arithmetic decoder is configured to derive, in dependence on a state-index describing a state of the arithmetic decoder, a group-index from a variable-length codeword representing the group-index. The arithmetic decoder is configured to derive values of a most-significant bit-plane of a tuple of spectral values using the group-index and an element-index, the element-index describing (or designating, or selecting) an element within a group selected by the group-index. The arithmetic decoder is configured to provide a tuple of decoded spectral values using the values of the most-significant bit-plane of the tuple of spectral values. The arithmetic decoder is configured to select a cumulative-frequencies-table out of a set of 32 cumulative-frequencies-tables in dependence on the state-index describing the state of the arithmetic decoder, and to apply the selected cumulative-frequencies-table to derive the group-index from the variable-length codeword representing the group-index.
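As a rough illustration of how a selected cumulative-frequencies-table turns a code value into a group index, the following Python sketch picks one table out of a set by a table index and looks up the symbol whose cumulative interval contains the value. The ascending table layout, the toy frequencies and the names `select_table` and `decode_group_index` are assumptions made for illustration only; they are not taken from the tables of the invention, and the interval narrowing and renormalization of a complete arithmetic decoder are omitted.

```python
from bisect import bisect_right

# Toy set of cumulative-frequencies-tables (assumed layout: ascending,
# starting at 0 and ending at the total count). The decoder described in
# the text holds 32 such tables; two are enough for a demonstration.
TOY_TABLES = {
    0: [0, 5, 8, 16],   # symbol 0 is the most likely one
    1: [0, 1, 2, 16],   # symbol 2 is the most likely one
}


def select_table(pki, tables=TOY_TABLES):
    """Return the cumulative-frequencies-table selected by the index pki."""
    return tables[pki]


def decode_group_index(cum_freq, value):
    """Return the symbol s with cum_freq[s] <= value < cum_freq[s + 1]."""
    if not 0 <= value < cum_freq[-1]:
        raise ValueError("value outside the coding range")
    return bisect_right(cum_freq, value) - 1


# The same code value maps to different symbols under different models.
ng_a = decode_group_index(select_table(0), 6)
ng_b = decode_group_index(select_table(1), 6)
```

The point of the sketch is only that the table selection, driven by the state index, changes which symbol a given code value represents.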
This embodiment according to the invention is based on the finding that the usage of a set of 32 cumulative-frequencies-tables provides for an optimal trade-off between an achievable bitrate and a complexity of an audio encoder or audio decoder. In particular, it has been found that 32 different cumulative-frequencies-tables are appropriate (in that they result in a reasonably low bitrate) for any relevant time-frequency-domain representation of an audio content. It has been found that a set of 32 cumulative-frequencies-tables is optimal, because the usage of a smaller number of cumulative-frequencies-tables would result in a significantly increased bitrate, and because the usage of a larger number of cumulative-frequencies-tables would bring along only insignificant improvements of the bitrate but a remarkable increase of the memory consumption both at the encoder side and at the decoder side.
To summarize the above, intensive research has shown that a number of 32 probability models, which are selected in dependence on a state-index and which are represented by 32 cumulative-frequencies-tables, provides for an optimal trade-off between bitrate-efficiency and implementation effort of the arithmetic coding.
In a preferred embodiment, the arithmetic decoder is configured to derive a 7-bit hash-table index from the state-index and to obtain a hash-table entry value from a hash-table, which hash-table comprises a mapping of 128 hash-table index values onto corresponding hash-table entry values. In this case, the arithmetic decoder is configured to decide whether the hash-table entry value (i.e. the content of the hash-table at a memory position designated by the hash-table index value) is an escape value, a valid cumulative-frequencies-table-identifier value associated to a state-index value (e.g. a state-index value on the basis of which the hash-table index value has been derived), or an invalid cumulative-frequencies-table identifier-value (e.g. a cumulative-frequencies-table identifier value which does not fit the state-index value on the basis of which the hash-table index value has been derived). The arithmetic decoder is configured to scan through the entries of the hash-table until an escape value or a valid cumulative-frequencies-table identifier-value is found. The arithmetic decoder is configured to provide a cumulative-frequencies-index-value in dependence on an identification of an interval of values in which the state-index value is contained, if the obtained hash-table entry value is the escape value, and to derive the cumulative-frequencies-index-value from the obtained hash-table entry value, if the hash-table entry value is a valid cumulative-frequencies-table identifier-value associated to the state-index value.
This embodiment of the invention is based on the finding that there are only a small number of "significant" states of the audio decoder, for which it is important to use a specific probability model (associated to a small number of only about 1 to 10 states), represented by a specific cumulative-frequencies-table (associated to a small number of only about 1 to 10 states). It has also been found that for the vast majority of states of the audio decoder, it is preferred to map the states onto a cumulative-frequencies-index-value on the basis of a determination of an interval of values, in which the state-index is contained, thereby mapping a whole interval of state-index values (typically a range of more than 1000 different state-index values) onto the same cumulative-frequencies-index-value.
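The two-stage lookup described above (a scan of a 128-entry hash table, followed by an interval-based fallback) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the mapping function of the invention: the hash function, the packing of an entry as "state in the upper bits, table index in the lower 8 bits", the escape sentinel, the helper `build_table` and all toy values are hypothetical.

```python
from bisect import bisect_right

ESCAPE = 0xFFFFFFFF            # assumed sentinel: "no specific mapping here"
HASH_SIZE = 1 << 7             # 128 entries, addressed by a 7-bit index


def hash_index(state):
    """Assumed hash; any function yielding a 7-bit index would do."""
    return (state * 123) & (HASH_SIZE - 1)


def build_table(significant):
    """Toy builder: store (state, pki) pairs using linear probing."""
    table = [ESCAPE] * HASH_SIZE
    for state, pki in significant.items():
        i = hash_index(state)
        while table[i] != ESCAPE:          # slot taken, try the next one
            i = (i + 1) & (HASH_SIZE - 1)
        table[i] = (state << 8) | pki      # assumed entry packing
    return table


def get_pk(state, table, bounds, range_pki):
    """Map a state index onto a cumulative-frequencies-table index."""
    i = hash_index(state)
    while True:
        entry = table[i]
        if entry == ESCAPE:                # non-significant state
            break
        if entry >> 8 == state:            # valid identifier for this state
            return entry & 0xFF
        i = (i + 1) & (HASH_SIZE - 1)      # invalid identifier: keep scanning
    # Range-based mapping: identify the interval that contains the state.
    return range_pki[bisect_right(bounds, state)]


# Toy data: states 5 and 133 share the same hash index on purpose.
TABLE = build_table({5: 3, 133: 7, 1000: 12})
BOUNDS = [100, 2000]           # hypothetical interval boundaries
RANGE_PKI = [26, 5, 30]        # hypothetical table index per interval
```

With this toy data, `get_pk(133, TABLE, BOUNDS, RANGE_PKI)` first meets the entry stored for state 5, treats it as an invalid identifier and continues scanning, whereas `get_pk(261, TABLE, BOUNDS, RANGE_PKI)` runs into an escape value and falls back to the interval lookup.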
As a consequence of the finding that a bitrate-efficient encoding can be obtained even if the number of "significant" states, to which specific probability distributions are associated, is very small, it has been possible to reduce the size of the hash-table to only 128 entries (while hash tables more than 100 times larger were used previously). Accordingly, the encoder-sided and decoder-sided implementation of the hash-table requires a very small amount of resources. This helps to make audio decoders cheap and also to keep the energy consumption of such audio decoders reasonably small, which in turn improves the possibilities to implement cheap and mobile consumer devices.
In a preferred embodiment, the hash-table is configured to map 67 values of the 7-bit hash-table-index onto valid cumulative-frequencies-table identifier-values and to map 61 values of the 7-bit hash-table-index onto escape values. Accordingly, a comparatively small number of only 67 "significant" states are present. In addition, any "insignificant" states (the number of which insignificant states is significantly larger than the number of significant states, for example by a factor of at least 10, but preferably by a factor of even more than 1000), which differ from the "significant" states, are mapped onto escape values, thereby making it easy to distinguish between a "significant" state and an "insignificant" state. Accordingly, the difference between significant and insignificant states can be determined fast and with low memory consumption.
In a preferred embodiment, the arithmetic decoder is configured to map 67 different values of the state index onto 67 different cumulative-frequencies-table identifier-values, such that 26 different cumulative-frequencies index values are associated to 67 different significant states described by the state-index value.
It has been found that even for the comparatively low number of 67 different significant states, it is sufficient to have only 26 different cumulative-frequencies distributions associated therewith. Again, the choice of only 26 different cumulative-frequencies-tables associated to the 67 significant states has shown to bring along an optimum trade-off between bitrate requirements and encoding/decoding complexity requirements.
In a preferred embodiment, the arithmetic decoder is configured to map the different non-significant states onto nine different cumulative-frequencies index values, such that a total of nine different cumulative-frequencies-tables are available for use in connection with the non-significant states, for which a range-type mapping is performed in order to derive the cumulative-frequencies index value. It has been found that a very small number of preferably nine different cumulative-frequencies-tables is sufficient in order to obtain a bitrate-efficient arithmetic encoding for most of the states.
It has also been found that there is a strong difference between a comparatively small number of significant states and a large number of non-significant states. The preferably 67 significant states, to which cumulative-frequencies-tables are associated by valid cumulative-frequencies-table identifier values of the hash table, are associated with particularly characteristic patterns of the spectral values like, for example, trajectories in the time-frequency representation having a particular width and direction. In contrast, all the other states, which are considered as non-significant states and to which a cumulative-frequencies-table is associated using a range-based algorithm, merely represent less characteristic spectral value distributions.
Notably, it has also been found that some of the cumulative-frequencies-tables, which are associated to the non-significant states, are also well-suited for being used in connection with some of the significant states. Accordingly, it has been found that it is advantageous to use three cumulative-frequencies-tables (e.g. cumulative-frequencies-tables having associated therewith indices 05, 26 and 30) which are applied both for one or more significant states and also for one or more non-significant states. For example, it has been found to be particularly advantageous, in terms of a trade-off between bitrate-efficiency and computational complexity, to have 23 different cumulative-frequencies-tables which are associated to significant states only using valid cumulative-frequencies-table identifier values of the hash table, to have six cumulative-frequencies-tables which are associated to non-significant states only using an escape-mechanism, which escape mechanism is based on an escape value stored in the hash table, and a range-based mapping. Also, it has been found to be advantageous to have three cumulative-frequencies-tables, which are associated both to one or more of the significant states and to one or more of the non-significant states.
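The two-stage table selection described above (hash-table lookup for significant states, range-based mapping for all other states) may be illustrated by the following minimal sketch. The table counts are taken from the text; everything else is an assumption made for illustration only: the names HASH, RANGE_BOUNDS, RANGE_PKI and get_pk are hypothetical, and the toy entries are not the entries of the table "arith_cf_ng_hash[]" of Fig. 14.

```python
from bisect import bisect_right

# Table counts stated in the text: 23 tables used by significant states only,
# 6 used by non-significant states only, 3 shared by both kinds of states.
SIGNIFICANT_ONLY, NON_SIGNIFICANT_ONLY, SHARED = 23, 6, 3
assert SIGNIFICANT_ONLY + SHARED == 26          # tables reachable from significant states
assert NON_SIGNIFICANT_ONLY + SHARED == 9       # tables reachable from non-significant states
assert SIGNIFICANT_ONLY + NON_SIGNIFICANT_ONLY + SHARED == 32

# Hypothetical toy data (NOT the values of Fig. 14).
HASH = {12: 5, 47: 17, 300: 26}      # significant state -> table index pki
RANGE_BOUNDS = [100, 1000, 5000]     # exclusive upper bounds of the first three ranges
RANGE_PKI = [5, 26, 30, 31]          # one table index pki per range


def get_pk(state):
    """Return a table index: direct hit for a significant state, else range mapping."""
    pki = HASH.get(state)
    if pki is not None:
        return pki
    # A miss plays the role of the escape value: fall back to the range-based mapping.
    return RANGE_PKI[bisect_right(RANGE_BOUNDS, state)]
```

In this sketch the index 5 is reached both through the hash table (state 12) and through the range mapping (any state below 100 that is not in the hash table), which mirrors the idea of tables shared between significant and non-significant states.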
Another embodiment according to the invention creates an audio encoder for providing an encoded audio information on the basis of an input audio information. The audio encoder comprises a time-domain to frequency-domain converter for providing a frequency-domain audio representation on the basis of a time-domain representation of the input audio information, such that the frequency-domain audio representation comprises a set of spectral values. The audio encoder also comprises an arithmetic encoder configured to encode a tuple of adjacent spectral values, or a preprocessed version thereof, using a variable-length codeword. The arithmetic encoder is configured to map values of a most-significant bit plane of a tuple of spectral values onto a group index and an element index, the element index describing an element within a group selected by the group index. The arithmetic encoder is further configured to select a cumulative-frequencies-table out of a set of 32 cumulative-frequencies-tables in dependence on a state index describing a state of the arithmetic encoder and to arithmetically encode the group index using a selected cumulative-frequencies-table, in order to obtain an arithmetically encoded variable-length codeword.
The audio encoder according to this embodiment is based on the same idea as the audio decoder discussed above. In particular, the audio encoder is based on the finding that the number of 32 cumulative-frequencies-tables brings along an optimized trade-off between bitrate-efficiency and encoding/decoding complexity.
Another embodiment according to the invention creates a method for providing a decoded audio representation on the basis of an encoded audio representation. Another embodiment according to the invention creates a method for providing an encoded audio representation on the basis of an input audio representation.
Another embodiment according to the invention creates a computer program for performing the inventive methods.
Brief Description of the Figures
Embodiments according to the present invention will subsequently be described taking reference to the enclosed figures, in which:
Fig. 1 shows a block schematic diagram of an audio encoder, according to an embodiment of the invention;
Fig. 2 shows a block schematic diagram of an audio decoder, according to an embodiment of the invention;
Fig. 3 shows a pseudo-program-code representation of an algorithm "tuples_decode()" for decoding a tuple of spectral values;
Fig. 4 shows a schematic representation of a context for a state calculation;
Fig. 5a shows a pseudo-program-code representation of an algorithm "arith_reset_context ()" for resetting a context;
Fig. 5b shows a pseudo-program-code representation of an algorithm "arith_map_context ()" for mapping a context;
Fig. 5c shows a pseudo-program-code representation of an algorithm "arith_get_context ()" for obtaining a context state value;
Fig. 5d shows a pseudo-program-code representation of an algorithm "arith_get_pk(s)" for deriving a cumulative-frequencies-table index value pki from a state variable;
Fig. 5e shows a pseudo-program-code representation of an algorithm "arith_decode ()" for arithmetically decoding a symbol from a variable-length codeword;
Fig. 5f shows a pseudo-program-code representation of an algorithm for deriving an element number value mm and a group offset value og from a group index ng;
Fig. 5g shows a pseudo-program-code representation of an algorithm for obtaining spectral values a, b, c, d of a most-significant bit-plane of a tuple of spectral values on the basis of the group offset value og and an element index value ne;
Fig. 5h shows a pseudo-program-code representation of an algorithm for combining a tuple a, b, c, d of spectral values with values of a less-significant bit-plane, in order to obtain an updated version of the tuple a, b, c, d of spectral values;
Fig. 5i shows a pseudo-program-code representation of an algorithm "arith_update_context ()" for updating the context;
Fig. 5j shows a legend of definitions and variables;
Fig. 6a shows a syntax representation of a unified-speech-and-audio-coding (USAC) raw data block;
Fig. 6b shows a syntax representation of a single channel element;
Fig. 6c shows a syntax representation of a channel pair element;
Fig. 6d shows a syntax representation of an "ics" control information;
Fig. 6e shows a syntax representation of a frequency-domain channel stream;
Fig. 6f shows a syntax representation of arithmetically-coded spectral data;
Fig. 6g shows a syntax representation for decoding a set of tuples of spectral values;
Fig. 6h shows a legend of data elements and variables;
Fig. 7 shows a table representation of memory requirements of a previously used arithmetic coder;
Fig. 8 shows a table representation of memory requirements of an arithmetic coder according to the present invention;
Fig. 9 shows a block schematic diagram of a set-up for evaluating performance improvements obtained by the arithmetic coder according to the present invention;
Fig. 10 shows a table representation of bitrates required for encoding different pieces of audio information using a previously used arithmetic coder;
Fig. 11 shows a table representation of bitrates required for encoding different pieces of audio information using the inventive concept;
Fig. 12 shows, in the form of a table representation, a comparison between average bitrates produced by a previously used audio coder and an audio coder according to the present invention;
Fig. 13 shows, in the form of a table representation, a comparison of bitrate reductions and bitrate increases obtained using the inventive concept when compared to a previously used concept;
Fig. 14 shows a table representation of entries of a table "arith_cf_ng_hash[]";
Figs. 15(1) to 15(10) show a table representation of entries of a table "arith_cf_ne []";
Figs. 16(1) to 16(32) show a table representation of entries of a table "arith_cf_ng [pki] [545]" for 32 different values 0 to 31 of the index pki;
Figs. 17(1) and 17(2) show a table representation of entries of a table "dgroups []";
Figs. 18(1) to 18(11) show a table representation of entries of a table "dvectors[]" ;
Figs. 19(1) to 19(32) show a table representation of entries of a table "egroups [a][b][c][d]";
Fig. 20 shows a flow chart of a method for providing a decoded representation of an audio information; and
Fig. 21 shows a flow chart of a method for providing an encoded representation of an audio information.
Detailed Description of the Embodiments
1. Audio Encoder
In the following, an audio encoder according to an embodiment of the present invention will be described. Fig. 1 shows a block schematic diagram of such an audio encoder 100.
The audio encoder 100 is configured to receive an input audio information 110 and to provide, on the basis thereof, a bitstream 112, which constitutes an encoded audio information. The audio encoder 100 optionally comprises a preprocessor 120, which is configured to receive the input audio information 110 and to provide, on the basis thereof, a preprocessed input audio information 110a. The audio encoder 100 also comprises an energy-compacting time-domain to frequency-domain signal transformer 130, which is also designated as signal converter. The signal converter 130 is configured to receive the input audio information 110, 110a and to provide, on the basis thereof, a frequency-domain audio information 132, which preferably takes the form of a set of spectral values. For example, the signal transformer 130 may be configured to receive a frame of the input audio information 110, 110a (for example, a block of time-domain samples) and to provide a set of spectral values representing the audio content of the respective audio frame. In addition, the signal transformer 130 may be configured to receive a plurality of subsequent, overlapping or non-overlapping, audio frames of the input audio information 110, 110a and to provide, on the basis thereof, a time-frequency-domain audio representation, which comprises a sequence of subsequent sets of spectral values, one set of spectral values associated with each frame.
The energy-compacting time-domain to frequency-domain signal transformer 130 may comprise an energy-compacting filterbank, which provides spectral values associated with different, overlapping or non-overlapping, frequency ranges. For example, the signal transformer 130 may comprise a windowing MDCT transformer 130a, which is configured to window the input audio information 110, 110a (or a frame thereof) using a transform window and to perform a modified-discrete-cosine-transform of the windowed input audio information 110, 110a (or of the windowed frame thereof). Accordingly, the frequency-domain audio representation 132 may comprise a set of, for example, 1024 spectral values in the form of MDCT coefficients associated with a frame of the input audio information.
The audio encoder 100 may further, optionally, comprise a spectral post-processor 140, which is configured to receive the frequency-domain audio representation 132 and to provide, on the basis thereof, a post-processed frequency-domain audio representation 142. The spectral post-processor 140 may, for example, be configured to perform a temporal noise shaping and/or a long term prediction and/or any other spectral post-processing known in the art. The audio encoder further comprises, optionally, a scaler/quantizer 150, which is configured to receive the frequency-domain audio representation 132 or the post-processed version 142 thereof and to provide a scaled and quantized frequency-domain audio representation 152.
The audio encoder 100 further comprises, optionally, a psycho-acoustic model processor 160, which is configured to receive the input audio information 110 (or the preprocessed version 110a thereof) and to provide, on the basis thereof, an optional control information, which may be used for the control of the energy-compacting time-domain to frequency-domain signal transformer 130, for the control of the optional spectral post-processor 140 and/or for the control of the optional scaler/quantizer 150. For example, the psycho-acoustic model processor 160 may be configured to analyze the input audio information, to determine which components of the input audio information 110, 110a are particularly important for the human perception of the audio content and which components of the input audio information 110, 110a are less important for the perception of the audio content. Accordingly, the psycho-acoustic model processor 160 may provide control information, which is used by the audio encoder 100 in order to adjust the scaling of the frequency-domain audio representation 132, 142 by the scaler/quantizer 150 and/or the quantization resolution applied by the scaler/quantizer 150. Consequently, perceptually important scale factor bands (i.e. groups of adjacent spectral values which are particularly important for the human perception of the audio content) are scaled with a large scaling factor and quantized with comparatively high resolution, while perceptually less-important scale factor bands (i.e. groups of adjacent spectral values) are scaled with a comparatively smaller scaling factor and quantized with a comparatively lower quantization resolution. Accordingly, scaled spectral values of perceptually more important frequencies are typically significantly larger than spectral values of perceptually less important frequencies.
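The effect of band-wise scaling on the quantized values may be illustrated by the following minimal sketch. It assumes, purely for illustration, a uniform quantizer that rounds to the nearest integer; the text does not specify the quantizer characteristic, and the function name scale_and_quantize as well as the numeric values are hypothetical.

```python
def scale_and_quantize(band, scale):
    """Scale each spectral value of one scale factor band, then round to an integer."""
    return [int(round(x * scale)) for x in band]


# The same spectral values, once treated as a perceptually important band
# (large scaling factor) and once as a less important band (small scaling factor).
band = [0.8, -1.3]
important = scale_and_quantize(band, 8)      # fine resolution, larger integers
less_important = scale_and_quantize(band, 1) # coarse resolution, smaller integers
```

With the larger scaling factor, the quantized values are larger in magnitude and resolve the input more finely, which is the reason why a subsequent entropy coder typically sees larger values for perceptually important bands.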
The audio encoder also comprises an arithmetic encoder 170, which is configured to receive the scaled and quantized version 152 of the frequency-domain audio representation 132 (or, alternatively, the post-processed version 142 of the frequency-domain audio representation 132, or even the frequency-domain audio representation 132 itself) and to provide arithmetic codeword information 172a, 172b on the basis thereof, such that the arithmetic codeword information represents the frequency-domain audio representation 152.
The audio encoder 100 also comprises a bitstream payload formatter 190, which is configured to receive the arithmetic codeword information 172a, 172b. The bitstream payload formatter 190 is also typically configured to receive additional information, like, for example, scale factor information describing which scale factors have been applied by the scaler/quantizer 150. In addition, the bitstream payload formatter 190 may be configured to receive other control information. The bitstream payload formatter 190 is configured to provide the bitstream 112 on the basis of the received information by assembling the bitstream in accordance with a desired bitstream syntax, which will be discussed below.
In the following, details regarding the arithmetic encoder 170 will be described. The arithmetic encoder 170 is configured to receive a plurality of tuples of, for example, four post-processed and scaled and quantized spectral values of the frequency-domain audio representation 132. The arithmetic encoder comprises a most-significant bit-plane extractor 174, which is configured to extract a most-significant bit-plane from a tuple of spectral values. It should be noted here that the most-significant bit-plane may comprise one or even more bits (for example, two or three bits), which are the most-significant bits of the spectral values of the tuple of spectral values. Thus, the most-significant bit-plane extractor 174 provides a most-significant bit-plane 176 of a tuple of spectral values (which spectral values are preferably adjacent in frequency). The arithmetic encoder 170 also comprises a group index determinator/element index determinator 178, which is configured to map the most-significant bit-plane 176 onto a group-index value ng and an element-index value ne. This mapping may be performed using a look-up table, for example, the look-up table "egroups" discussed in detail below. The group index determinator/element index determinator 178 may be configured to map some of the combinations of values of the most-significant bit-plane 176 onto a group index ng of a group comprising only one element and may be configured to map other combinations of values of the most-significant bit-plane 176 onto a group index ng of a group comprising a plurality of combinations of values. 
Accordingly, the group index determinator/element index determinator may be configured to map such combinations of values of the most-significant bit-plane 176 which comprise a comparatively high probability onto groups comprising only one or only a few elements, and to map combinations of values of the most-significant bit-plane 176 comprising a comparatively lower probability onto groups comprising more elements. Consequently, the element index ne of a combination of values, which is mapped to a group comprising only a single element, may only take a single value and may therefore be neglected. In contrast, the element index ne of a combination of values, which is mapped to a group comprising a plurality of elements, may take a plurality of values. Thus, the group index determinator/element index determinator 178 provides a group index value ng (also designated with 180a) and, if required, the element index value ne (also designated with 180b), wherein the element index value ne may be set to a default value or omitted if the group ng to which the most-significant bit-plane 176 is mapped, comprises a single element only.
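The mapping of a most-significant bit-plane onto a group index ng and an element index ne may be sketched as follows. This is a toy illustration under stated assumptions: the two groups in GROUPS are hypothetical and do not reproduce the tables "egroups", "dgroups" or "dvectors" of the figures, and the names LOOKUP, to_indices and from_indices are likewise made up for this sketch.

```python
# Hypothetical grouping: the very probable all-zero combination forms a
# single-element group, while four less probable combinations share a group.
GROUPS = [
    [(0, 0, 0, 0)],
    [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)],
]

# Encoder-side look-up: combination of values -> (group index, element index).
LOOKUP = {
    combo: (ng, ne)
    for ng, group in enumerate(GROUPS)
    for ne, combo in enumerate(group)
}


def to_indices(combo):
    """Return (ng, ne); ne is None when the group holds a single element only."""
    ng, ne = LOOKUP[combo]
    return ng, (ne if len(GROUPS[ng]) > 1 else None)


def from_indices(ng, ne=None):
    """Decoder-side inverse: recover the combination of values from ng and ne."""
    return GROUPS[ng][0 if ne is None else ne]
```

For the single-element group no element index needs to be transmitted, so only the group index contributes to the bitstream for the most probable combination.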
The arithmetic encoder 170 also comprises a first codeword determinator 180, which is configured to determine an arithmetic codeword acod_ng [pki][ng] representing the group index ng. In addition, the first codeword determinator 180 may provide an arithmetic codeword acod_ne [ne] representing the element index ne, if the number of elements mm of the group ng is larger than 1. Otherwise, the provision of the arithmetic codeword acod_ne [ne] representing the element index ne may be omitted. Optionally, the codeword determinator 180 may also provide one or more escape codewords (also designated herein with "ARITH_ESCAPE") indicating, for example, how many less-significant bit-planes are available (and, consequently, indicating the numeric weight of the most-significant bit-plane). The first codeword determinator 180 may be configured to provide the codeword associated with a group index ng using a selected cumulative-frequencies-table having (or being referenced by) a cumulative-frequencies-table index pki.
In order to determine which cumulative-frequencies-table should be selected, the arithmetic encoder preferably comprises a state tracker 182, which is configured to track the state of the arithmetic encoder, for example, by observing which tuples of spectral values have been encoded previously. The state tracker 182 consequently provides a state information 184, for example a state value designated with "s" or "t". The arithmetic encoder 170 also comprises a cumulative-frequencies-table selector 186, which is configured to receive the state information 184 and to provide an information 188 describing the selected cumulative-frequencies-table to the codeword determinator 180. For example, the cumulative-frequencies-table selector 186 may provide a cumulative-frequencies-table index pki describing which cumulative-frequencies-table, out of a set of 32 cumulative-frequencies-tables, is selected for usage by the codeword determinator. Alternatively, the cumulative-frequencies-table selector 186 may provide the entire selected cumulative-frequencies-table to the codeword determinator. Thus, the codeword determinator 180 may use the selected cumulative-frequencies-table for the provision of the codeword acod_ng[pki][ng] of the group index ng, such that the actual codeword acod_ng[pki][ng] encoding the group index ng is dependent on the value of ng and the cumulative-frequencies-table index pki, and consequently on the current state information 184. In contrast, the first codeword determinator 180 may use a default (state-independent) cumulative-frequencies-table for the provision of the codeword acod_ne [ne], which may, however, be dependent on the number of elements within the selected group ng. Further details regarding the coding process and the obtained codeword format will be described below.
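The dependence of the codeword on both ng and pki may be illustrated by the interval subdivision step of a generic arithmetic coder. The following sketch rests on assumptions that are not taken from the figures: it uses ascending cumulative counts, a 16-bit coding interval and two made-up toy tables, whereas the entries of the tables "arith_cf_ng [pki] [545]" of Figs. 16(1) to 16(32) are not reproduced here. The names TABLES and narrow are hypothetical.

```python
# Two hypothetical cumulative-frequencies-tables for a three-symbol alphabet.
# TABLES[pki][s] is the cumulative count of all symbols below s; the last
# entry is the total count.
TABLES = [
    [0, 12, 14, 16],  # pki 0: group index 0 is very probable
    [0, 2, 4, 16],    # pki 1: group index 2 is very probable
]


def narrow(low, high, cum, symbol):
    """Return the sub-interval of [low, high] assigned to symbol by the table cum."""
    total = cum[-1]
    span = high - low + 1
    new_low = low + span * cum[symbol] // total
    new_high = low + span * cum[symbol + 1] // total - 1
    return new_low, new_high


# The same group index ng = 0 encoded under two different states.
wide = narrow(0, 65535, TABLES[0], 0)    # large sub-interval, few bits needed
small = narrow(0, 65535, TABLES[1], 0)   # small sub-interval, more bits needed
```

A wider sub-interval corresponds to a shorter codeword, so selecting the table that matches the statistics of the current state is what yields the bitrate reduction.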
The arithmetic encoder 170 further comprises a less-significant bit-plane extractor 189a, which is configured to extract one or more less-significant bit-planes from the scaled and quantized frequency-domain audio representation 152, if one or more of the values of a tuple of spectral values to be encoded exceed the range of values encodeable using the most-significant bit-plane only. The less-significant bit-planes may comprise one or more bits per spectral value, as desired. Accordingly, the less-significant bit-plane extractor 189a provides a less-significant bit-plane information 189b. The arithmetic encoder 170 also comprises a second codeword determinator 189c, which is configured to receive the less-significant bit-plane information 189b and to provide, on the basis thereof, 0, 1 or more codewords "acod_r" representing the content of 0, 1 or more less-significant bit-planes. The second codeword determinator 189c may be configured to apply an arithmetic encoding algorithm or any other encoding algorithm in order to derive the less-significant bit-plane codewords "acod_r" from the less-significant bit-plane information 189b.
It should be noted here that the number of less-significant bit-planes may vary in dependence on the value of the scaled and quantized spectral values 152, such that there may be no less-significant bit-plane at all, if the scaled and quantized spectral values of the current tuple are comparatively small, such that there may be one less-significant bit-plane if the scaled and quantized spectral values of the current tuple are of a medium range and such that there may be more than one less-significant bit-plane if the scaled and quantized spectral values take a comparatively large value.
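The dependence of the number of less-significant bit-planes on the magnitude of the spectral values may be sketched as follows. The sketch assumes, for illustration only, a most-significant bit-plane covering the signed range from -4 to 3 and less-significant bit-planes of one bit per spectral value; the function name split_tuple and its parameters are hypothetical.

```python
def split_tuple(values, lo=-4, hi=3):
    """Split a tuple into most-significant values and a list of 1-bit planes.

    The values are shifted right until all of them fit into [lo, hi]; every
    shift produces one less-significant bit-plane. The planes are returned
    from the most significant to the least significant one.
    """
    msb = list(values)
    lev = 0
    while any(v < lo or v > hi for v in msb):
        msb = [v >> 1 for v in msb]   # arithmetic shift keeps the sign
        lev += 1
    planes = [[(v >> k) & 1 for v in values] for k in range(lev - 1, -1, -1)]
    return msb, planes


small_msb, small_planes = split_tuple((1, -2, 0, 3))   # no less-significant plane
large_msb, large_planes = split_tuple((5, -9, 0, 2))   # two less-significant planes
```

In this sketch a tuple of small values yields an empty list of planes, while the tuple containing the value -9 requires two shifts and therefore two less-significant bit-planes.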
To summarize the above, the arithmetic encoder 170 is configured to encode a tuple of scaled and quantized spectral values, which is described by the information 152, using a hierarchical encoding process. The most-significant bit-plane (comprising, for example, one, two or three bits per spectral value) is encoded to obtain an arithmetic codeword "acod_ng [pki][ng]" of a group index ng and, in some cases, a codeword "acod_ne [ne]" of an element index ne. One or more less-significant bit-planes (each of the less-significant bit-planes comprising, for example, one, two or three bits) are encoded to obtain one or more codewords "acod_r". When encoding the most-significant bit-plane, the combination of values of the most-significant bit-plane is mapped to a group ng of the plurality of groups wherein some of the groups comprise one element only and wherein others of the groups comprise a plurality of elements each. Accordingly, the probability of the different combinations of values is considered. Subsequently, the group index ng and the element index ne (if required) are encoded, wherein 32 different cumulative-frequencies-tables are available for the encoding of the group index ng in dependence on a state of the arithmetic encoder 170, i.e. in dependence on previously-encoded tuples of spectral values. Accordingly, codewords "acod_ng [pki][ng]" and "acod_ne [ne]" are obtained, wherein the latter codeword is only included in the bitstream 112 if the group index ng designates a group comprising more than one element. In addition, one or more codewords "acod_r" are provided and included into the bitstream if one or more less-significant bit-planes are present.
Reset description
The audio encoder 100 may optionally be configured to decide whether an improvement in bitrate can be obtained by resetting the context, for example by setting the state index to a default value. Accordingly, the audio encoder 100 may be configured to provide a reset information (e.g. named "arith_reset_flag") indicating whether the context for the arithmetic encoding is reset, and also indicating whether the context for the arithmetic decoding in a corresponding decoder should be reset.
Details regarding the bitstream format and the applied cumulative-frequency tables will be discussed below.
2. Audio Decoder
In the following, an audio decoder according to an embodiment of the invention will be described. Fig. 2 shows a block schematic diagram of such an audio decoder 200.
The audio decoder 200 is configured to receive a bitstream 210, which represents an encoded audio information and which may be identical to the bitstream 112 provided by the audio encoder 100. The audio decoder 200 provides a decoded audio information 212 on the basis of the bitstream 210. The audio decoder 200 comprises an optional bitstream payload deformatter 220, which is configured to receive the bitstream 210 and to extract from the bitstream 210 an encoded frequency-domain audio representation 222. For example, the bitstream payload deformatter 220 may be configured to extract from the bitstream 210 arithmetically-coded spectral data like, for example, an arithmetic codeword "acod_ng [pki][ng]" representing a group index ng, an arithmetic codeword "acod_ne [ne]" representing an element index ne and a codeword "acod_r" representing a content of a less-significant bit-plane of the frequency-domain audio representation. Thus, the encoded frequency-domain audio representation 222 constitutes (or comprises) an arithmetically-encoded representation of spectral values. The bitstream payload deformatter 220 is further configured to extract from the bitstream additional control information, which is not shown in Fig. 2. In addition, the bitstream payload deformatter is optionally configured to extract from the bitstream 210 a state reset information 224, which is also designated as arithmetic reset flag or "arith_reset_flag".
The audio decoder 200 comprises an arithmetic decoder 230, which is also designated as "spectral noiseless decoder". The arithmetic decoder 230 is configured to receive the encoded frequency-domain audio representation 222 and, optionally, the state reset information 224. The arithmetic decoder 230 is also configured to provide a decoded frequency-domain audio representation 232, which may comprise a decoded representation of spectral values. For example, the decoded frequency-domain audio representation 232 may comprise a decoded representation of tuples of spectral values, which are described by the encoded frequency-domain audio representation 222.
The audio decoder 200 also comprises an optional inverse quantizer/rescaler 240, which is configured to receive the decoded frequency-domain audio representation 232 and to provide, on the basis thereof, an inversely-quantized and rescaled frequency-domain audio representation 242.
The audio decoder 200 further comprises an optional spectral pre-processor 250, which is configured to receive the inversely-quantized and rescaled frequency-domain audio representation 242 and to provide, on the basis thereof, a pre-processed version 252 of the inversely-quantized and rescaled frequency-domain audio representation 242. The audio decoder 200 also comprises a frequency-domain to time-domain signal transformer 260, which is also designated as a "signal converter". The signal transformer 260 is configured to receive the pre-processed version 252 of the inversely-quantized and rescaled frequency-domain audio representation 242 (or, alternatively, the inversely-quantized and rescaled frequency-domain audio representation 242 or the decoded frequency-domain audio representation 232) and to provide, on the basis thereof, a time-domain representation 262 of the audio information. The frequency-domain to time-domain signal transformer 260 may, for example, comprise a transformer for performing an inverse- modified-discrete-cosine transform (IMDCT) and an appropriate windowing (as well as other auxiliary functionalities, like, for example, an overlap-and-add).
The audio decoder 200 may further comprise an optional time-domain post-processor 270, which is configured to receive the time-domain representation 262 of the audio information and to obtain the decoded audio information 212 using a time-domain post-processing. However, if the post-processing is omitted, the time-domain representation 262 may be identical to the decoded audio information 212.
It should be noted here that the inverse quantizer/rescaler 240, the spectral pre-processor 250, the frequency-domain to time-domain signal transformer 260 and the time-domain post-processor 270 may be controlled in dependence on control information, which is extracted from the bitstream 210 by the bitstream payload deformatter 220.
To summarize the overall functionality of the audio decoder 200, a decoded frequency-domain audio representation 232, for example, a set of spectral values associated with an audio frame of the encoded audio information, may be obtained on the basis of the encoded frequency-domain representation 222 using the arithmetic decoder 230. Subsequently, the set of, for example, 1024 spectral values, which may be MDCT coefficients, is inversely quantized, rescaled and pre-processed. Accordingly, an inversely-quantized, rescaled and spectrally pre-processed set of spectral values (for example, 1024 MDCT coefficients) is obtained. Afterwards, a time-domain representation of an audio frame is derived from the inversely-quantized, rescaled and spectrally pre-processed set of frequency-domain values (e.g. MDCT coefficients). The time-domain representation of a given audio frame may be combined with time-domain representations of previous and/or subsequent audio frames. For example, an overlap-and-add between time-domain representations of subsequent audio frames may be performed in order to smoothen the transitions between the time-domain representations of the adjacent audio frames and in order to obtain an aliasing cancellation. For details regarding the reconstruction of the decoded audio information 212 on the basis of the decoded time-frequency domain audio representation 232, reference is made, for example, to the International Standard ISO/IEC 14496-3, part 3, sub-part 4 where a detailed discussion is given. In the following, some details regarding the arithmetic decoder 230 will be described. 
The arithmetic decoder 230 comprises a group index determinator/element index determinator 280, which is configured to receive the arithmetic codeword acod_ng [pki][ng] describing the group index ng and to also receive the codeword acod_ne [ne] of the element index ne if the codeword acod_ne [ne] is available. The group index determinator 280 is configured to provide a decoded group index value ng and to also provide a decoded element index value ne if the group described by the group index value ng comprises more than one element. However, the group index determinator/element index determinator 280 may be configured to provide the default element index value ne, for example, of one if the group described by the group index value ng comprises one element only. The group index determinator/element index determinator 280 may be configured to use a cumulative-frequencies-table out of a set comprising a plurality of 32 cumulative-frequencies-tables for deriving the group index value ng from the arithmetic codeword "acod_ng [pki][ng]". The arithmetic decoder 230 further comprises a most-significant bit-plane determinator 284, which is configured to derive values 286 of a most-significant bit-plane of a 2-bit tuple (or 3-bit tuple) of spectral values on the basis of a group index value ng and an element index value ne. The arithmetic decoder 230 further comprises a less-significant bit-plane determinator 288, which is configured to receive one or more codewords "acod_r" representing one or more less-significant bit-planes of a tuple of spectral values. Accordingly, the less-significant bit-plane determinator 288 is configured to provide decoded values 290 of one or more less-significant bit-planes. 
The audio decoder 200 also comprises a bit-plane combiner 292, which is configured to receive the decoded values 286 of the most-significant bit-plane of the tuple of spectral values and the decoded values 290 of one or more less-significant bit-planes of the tuple of spectral values if such less-significant bit-planes are available for the current tuple of spectral values. Accordingly, the bit-plane combiner 292 provides a tuple of decoded spectral values, which is part of the decoded frequency-domain audio representation 232. Naturally, the arithmetic decoder 230 is typically configured to provide a plurality of tuples of decoded spectral values in order to obtain a full set of decoded spectral values associated with a current frame of the audio content.
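The operation of the bit-plane combiner 292 may be sketched as follows. The sketch assumes, for illustration only, that every less-significant bit-plane contributes one bit per spectral value and that the planes arrive from the most significant to the least significant one; the function name combine is hypothetical.

```python
def combine(msb_values, planes):
    """Append each less-significant bit-plane below the current values.

    msb_values: decoded values of the most-significant bit-plane (signed).
    planes: list of bit-planes, each holding one bit (0 or 1) per spectral value.
    """
    out = list(msb_values)
    for plane in planes:
        # Shift the value up by one bit and insert the new least significant bit.
        out = [(v << 1) | bit for v, bit in zip(out, plane)]
    return out


no_planes = combine([1, -2, 0, 3], [])
two_planes = combine([1, -3, 0, 0], [[0, 1, 0, 1], [1, 1, 0, 0]])
```

If no less-significant bit-plane is available, the values of the most-significant bit-plane are passed through unchanged; negative values are handled correctly because the shift and the bitwise OR operate on the two's complement interpretation of the integers.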
The arithmetic decoder 230 further comprises a cumulative-frequencies-table selector 296, which is configured to select one of the 32 cumulative-frequencies tables in dependence on a state index 298 describing a state of the arithmetic decoder. The arithmetic decoder 230 further comprises a state tracker 299, which is configured to track a state of the arithmetic decoder in dependence on the tuples of previously-decoded spectral values. The state information may optionally be reset to a default state information in response to the state reset information 224. Accordingly, the cumulative-frequencies-table selector 296 is configured to provide an index (e.g. pki) of a selected cumulative-frequencies-table, or a selected cumulative-frequencies-table itself, to the group index determinator/element index determinator 280 for application in the decoding of the group index ng in dependence on the group index codeword "acod_ng".
To summarize the functionality of the audio decoder 200, the audio decoder 200 is configured to receive a bitrate-efficiently-encoded frequency-domain audio representation 222 and to obtain a decoded frequency-domain audio representation on the basis thereof. In the arithmetic decoder 230, which is used for obtaining the decoded frequency-domain audio representation 232 on the basis of the encoded frequency-domain audio representation 222, a probability of different combinations of values of the most-significant bit-plane is exploited by using the group index determinator/element index determinator 280, which is configured to apply a cumulative-frequencies-table. In addition, statistical dependencies between different tuples of spectral values are exploited by selecting different cumulative-frequencies-tables out of a set comprising 32 different cumulative-frequencies-tables in dependence on a state index 298, which is obtained by observing the previously-computed tuples of decoded spectral values.
3. Overview of the Tool of Spectral Noiseless Coding
In the following, details regarding the encoding and decoding algorithm, which is performed, for example, by the arithmetic encoder 170 and the arithmetic decoder 230, will be explained.
Focus is put on the description of the decoding algorithm. It should be noted, however, that a corresponding encoding algorithm can be performed in accordance with the teachings of the decoding algorithm, wherein the mappings are inverted.
It should be noted that the decoding, which will be discussed in the following, is used in order to allow for a so-called "spectral noiseless coding" of typically post-processed, scaled and quantized spectral values. The spectral noiseless coding is used in an audio encoding/decoding concept to further reduce the redundancy of the quantized spectrum, which is obtained, for example, by an energy-compacting time-domain to frequency-domain transformer.
The spectral noiseless coding scheme, which is used in embodiments of the invention, is based on an arithmetic coding in conjunction with a dynamically-adapted context. In the preferred embodiment and in the following, the spectral values are processed by the spectral noiseless coding in tuples, each combining 4 spectral values which are successive in frequency, and which are called 4-tuples. The noiseless coding is fed by (original or encoded representations of) quantized spectral values and uses context-dependent cumulative-frequencies-tables derived, for example, from four previously-decoded neighboring 4-tuples. Here, the neighborhood in both time and frequency is taken into account as illustrated in Fig. 4. The cumulative-frequencies-tables (which will be explained below) are then used by the arithmetic coder to generate a variable-length binary code and by the arithmetic decoder to derive decoded values from a variable-length binary code.
For example, the arithmetic coder 170 produces a binary code for a given set of symbols in dependence on the respective probabilities. The binary code is generated by mapping a probability interval, in which the set of symbols lies, to a codeword.
4. Decoding Process
4.1 Decoding Process Overview
In the following, an overview of the process of decoding a tuple of spectral values will be given taking reference to Fig. 3, which shows a pseudo-program code representation of the process of decoding a plurality of tuples of spectral values.
The process of decoding a plurality of tuples of spectral values comprises an initialization 310 of a context. The initialization 310 of the context selectively comprises a reset of the context using the function "arith_reset_context ()" or a derivation of the current context from a previous context using the function "arith_map_context (lg/4)". Both the reset of the context and the derivation of the current context from a previous context will be discussed below.
The decoding of a plurality of tuples of spectral values also comprises an iteration of a tuple decoding 312 and a context update 314, which context update is performed by a function "arith_update_context (a,b,c,d,i,lg/4)", which is described below. The tuple decoding 312 and the context update 314 are repeated lg/4 times, wherein lg/4 indicates the number of tuples of spectral values to be decoded. The tuple decoding 312 comprises a state value computation 312a, a group index decoding 312b, an element index decoding 312c, a most-significant bit-plane determination 312d and a less-significant bit-plane addition 312e. The state value computation 312a comprises the computation of a first state value s using the function "arith_get_context (i)", which function returns the first state value s. The state value computation 312a also comprises a computation of a level value lev, which is obtained by shifting the first state value s to the right by 24 bits. The state value computation 312a also comprises a computation of a second state value t according to the formula shown in Fig. 3.
The group index decoding 312b comprises an iterative execution of a decoding algorithm 312ba, wherein a variable j is initialized to 0 before a first execution of the algorithm 312ba.
The algorithm 312ba comprises a computation of a state index pki (which also serves as a cumulative-frequencies-table index) in dependence on the second state value t using a function "arith_get_pk()", which is discussed below. The algorithm 312ba also comprises the selection of a cumulative-frequencies-table in dependence on the state index pki, wherein a variable "cum_freq" may be set to a starting address of one out of 32 cumulative-frequencies-tables in dependence on the state index pki. Also, a variable "cfl" may be initialized to a length of the selected cumulative-frequencies-table, which is equal to the number of symbols in the alphabet, i.e. the number of different values which can be decoded. The length of each of the cumulative-frequencies-tables from "arith_cf_ng[pki=0][545]" to "arith_cf_ng[pki=31][545]" available for the decoding of the group index is 545, as 544 different group indexes and an escape symbol can be decoded. Subsequently, a group index ng may be obtained by executing a function "arith_decode()", taking into consideration the selected cumulative-frequencies-table. When deriving the group index ng, bits named "acod_ng" of the bitstream 210 may be evaluated (see Fig. 6g).
The algorithm 312ba also comprises checking whether the group index ng is equal to an escape symbol "ARITH_ESCAPE" or not. If the group index is not equal to the arithmetic escape symbol, the algorithm 312ba is aborted ("break"-condition) and the remaining instructions of the algorithm 312ba are therefore skipped. Accordingly, execution of the process is continued with the element index decoding 312c (if required) or with the most-significant bit-plane determination 312d. In contrast, if the decoded group index ng is identical to the arithmetic escape symbol "ARITH_ESCAPE", the level value lev is increased by two. Also, if the algorithm 312ba is executed for the first time, i.e. if j=0, the second state value t is increased by 4194304, and the second state value t is set to 0 otherwise. Also, the variable j is set to 1 prior to the repetition of the algorithm 312ba. As mentioned, the algorithm 312ba is repeated until the decoded group index ng is different from the arithmetic escape symbol.
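The escape handling of the algorithm 312ba may be illustrated by the following minimal Python sketch. It is not the pseudo program code of Fig. 3: the callables get_pk and decode_symbol are hypothetical stand-ins for the functions "arith_get_pk()" and "arith_decode()", and the numeric value assigned to ARITH_ESCAPE is an assumption (the description only states that 544 group indexes and one escape symbol can be decoded).

```python
ARITH_ESCAPE = 544  # assumed numeric value of the escape symbol


def decode_group_index(t, lev, get_pk, decode_symbol):
    """Repeat the group index decoding until a non-escape symbol is obtained.

    get_pk and decode_symbol are hypothetical stand-ins for
    arith_get_pk() and arith_decode().
    """
    j = 0
    while True:
        pki = get_pk(t)          # state index, selects the table
        ng = decode_symbol(pki)  # decode one symbol with that table
        if ng != ARITH_ESCAPE:
            break                # regular group index: done
        lev += 2                 # each escape raises the level by two
        if j == 0:
            t += 4194304         # first escape: add 1 << 22
        else:
            t = 0                # any further escape: t is set to 0
        j = 1
    return ng, lev, t
```

For instance, if two escape symbols precede a regular group index, the sketch returns a level value which is increased by four with respect to its initial value.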
As soon as the group index decoding 312b is completed, i.e. a group index value different from the arithmetic escape symbol has been decoded, the element index decoding 312c is executed, if required. For this purpose, a cardinal (number of elements) of the group having the group index ng is determined, wherein the cardinal mm of the group designated by the group index ng is described by the eight least-significant bits (bits 0-7) of a table entry "dgroups[ng]" of the table "dgroups" at table position ng. If the cardinal mm of the group designated by the group index ng is larger than one, the element index ne is obtained by executing an algorithm 312ca. Otherwise, the element index ne may optionally be set to 0, or a different default value. For example, the operation ne=0 may be executed before the condition statement "if(mm>1)". The algorithm 312ca comprises the determination of a start address "cum_freq" of an appropriate cumulative-frequencies-table or a cumulative-frequencies-subtable. For example, the variable "cum_freq" may be set to a sum of the start address of the cumulative-frequencies-table "arith_cf_ne" and the value (mm)*(mm-1)/2, as shown in Fig. 3. Also, the variable cfl may be initialized to an appropriate length value of the respective cumulative-frequencies-table or cumulative-frequencies-subtable, which is equal to the number of elements mm within the group of index ng. Subsequently, the element index ne may be obtained by performing the function "arith_decode()", wherein the selected cumulative-frequencies-table (e.g. a subtable of the table "arith_cf_ne") associated to the encoding of the element index is used.
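The addressing of the cumulative-frequencies-subtable for the element index may be illustrated by the following Python sketch. The function name is hypothetical. The remark that subtables of lengths 1, 2, 3, ... are stored back to back is an inference from the offset formula (mm)*(mm-1)/2 and is not stated in the description.

```python
def ne_subtable_bounds(mm):
    """Return (offset, length) of the subtable for a group of mm elements."""
    offset = mm * (mm - 1) // 2  # equals 1 + 2 + ... + (mm - 1)
    length = mm                  # one table entry per element of the group
    return offset, length


# Under the back-to-back assumption, the subtable for mm elements ends
# exactly where the subtable for mm + 1 elements starts.
for mm in range(2, 6):
    offset, length = ne_subtable_bounds(mm)
    print(mm, offset, offset + length)
```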
Subsequently, the determination 312d of the most-significant bit-plane is executed. For this purpose, an entry of the table "dgvectors" is evaluated, the index of which is determined by the most significant bits (e.g. bits 8-15) of the value j of the table "dgroups" at element index ng and by the element index ne, as can be seen in Fig. 3. To be more specific, the value of the most-significant bit-plane of a first spectral value "a" (of a tuple of spectral values) is determined by an entry of the table "dgvectors" at an element index 4*((j>>8)+ne). Similarly, the value of the most-significant bit-plane of a second spectral value "b" (of a tuple of spectral values) is obtained by evaluating the entry of the table "dgvectors" at index 4*((j>>8)+ne)+1. Similarly, a value of the most-significant bit-plane of a third spectral value "c" and of a fourth spectral value "d" (of the tuple of spectral values) is obtained, as shown in Fig. 3 at reference numeral 312d.
Subsequently, the less-significant bit-planes are obtained, for example as shown at reference numeral 312e in Fig. 3. For each less-significant bit-plane of the tuple, one out of 16 binary combinations is decoded. However, it should be noted that the concept for obtaining the values of the less-significant bit-planes is not of particular relevance for the present invention.
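The table accesses of the determination 312d may be illustrated by the following Python sketch. The function names are hypothetical, and the small tables are toy data chosen for illustration only; they are not the tables "dgroups" and "dgvectors" of the embodiment. It is assumed here that the values for "c" and "d" are stored at the two positions directly following the value for "b".

```python
def split_group_entry(entry):
    """Split a dgroups entry into (cardinal mm, index of the first vector)."""
    mm = entry & 0xFF   # bits 0-7: number of elements of the group
    base = entry >> 8   # higher bits: position of the first vector
    return mm, base


def msb_plane(dgroups, dgvectors, ng, ne):
    """Return the most-significant bit-plane values (a, b, c, d)."""
    _, base = split_group_entry(dgroups[ng])
    k = 4 * (base + ne)             # four values are stored per vector
    return tuple(dgvectors[k:k + 4])


# Toy tables: group 0 holds one vector, group 1 holds two vectors.
toy_dgroups = [(0 << 8) | 1, (1 << 8) | 2]
toy_dgvectors = [0, 0, 0, 0,  1, 0, 0, 0,  0, 1, 0, 0]
print(msb_plane(toy_dgroups, toy_dgvectors, ng=1, ne=1))
```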
4.2. Decoding Order (Fig. 4)
In the following, the decoding order of the spectral values will be described.
4-tuples of quantized spectral coefficients are noiselessly coded and transmitted (e.g. in the bitstream) starting from the lowest-frequency coefficient and progressing to the highest-frequency coefficient.
Coefficients from an advanced audio coding (for example obtained using a modified-discrete-cosine-transform, as discussed in ISO/IEC 14496, part 3, subpart 4) are stored in an array called "x_ac_quant[g][wm][sfb][bin]", and the order of transmission of the noiseless-coding-codewords (e.g. acod_ng, acod_ne, acod_r) is such that when they are decoded in the order received and stored in the array, "bin" (the frequency index) is the most rapidly incrementing index and "g" is the most slowly incrementing index. Within a codeword, the order of decoding is a, b, c, d. In other words, the values a, b, c, d, are spectral values of adjacent frequencies, wherein the spectral value a is associated to a lower frequency than the spectral value b, the spectral value b is associated to a lower frequency than the spectral value c, and the spectral value c is associated to a lower frequency than the spectral value d.
Coefficients from the transform-coded-excitation (tcx) are stored directly in an array x_tcx_invquant[win][bin], and the order of the transmission of the noiseless coding codewords is such that when they are decoded in the order received and stored in the array, "bin" is the most rapidly incrementing index and "win" is the most slowly incrementing index.
Within a codeword, the order of decoding is a, b, c, d. In other words, if the spectral values describe a transform-coded-excitation of the linear-prediction filter of a speech coder, the spectral values a, b, c, d are associated to adjacent and increasing frequencies of the transform-coded-excitation.
Notably, the audio decoder 200 may be configured to apply the decoded frequency-domain audio representation 232, which is provided by the arithmetic decoder 230, both for a "direct" generation of a time-domain audio signal representation using a frequency-domain to time-domain signal transform and for an "indirect" provision of an audio signal representation using both a frequency-domain to time-domain signal transformer and a linear-prediction-filter excited by the output of the frequency-domain to time-domain signal transformer.
In other words, the arithmetic decoder 230, the functionality of which is discussed here in detail, is well-suited for decoding spectral values of a time-frequency-domain representation of an audio content encoded in the frequency-domain and for the provision of a time-frequency-domain representation of a stimulus signal for a linear-prediction-filter adapted to decode a speech signal encoded in the linear-prediction-domain. Thus, the arithmetic decoder is well-suited for use in an audio decoder which is capable of handling both frequency-domain-encoded audio content and linear-predictive-frequency-domain-encoded audio content.
4.3. Context Initialization (Figs. 5a and 5b)
In the following, the context initialization, which is performed in a step 310, will be described. First, the flag "arith_reset_flag", which may be part of the bitstream, determines if the context should be reset. If the flag is TRUE, the function "arith_reset_context()", which is shown in Fig. 5a, is called. Otherwise, if the flag "arith_reset_flag" is FALSE, a mapping is done between the past context and the current context in accordance with the algorithm "arith_map_context()", which is shown in Fig. 5b.
As can be seen, the current context is stored in a global variable q[2][290], which takes the form of an array having a first dimension of 2 and a second dimension of 290. A past context is stored in a variable qs[258], which takes the form of a table having a dimension of 258. The variable "previous_lg/4" describes a number of 4-tuples of a past context.
As can be seen, the reset of the context, which is performed by the function "arith_reset_context()", comprises an initialization of the entries "a", "b", "c", "d" of the arrays q and qs (designated, for example, with qs[i].a, q[0][i].a and q[1][i].a) to zero. In addition, the entries "v" of the arrays q and qs (designated with qs[i].v, q[0][i].v and q[1][i].v) are initialized to -1. Also, the variable "previous_lg" is initialized to 1024.
If, however, it is decided not to reset the context, a mapping of the context may be performed in accordance with the algorithm "arith_map_context()". As can be seen, the mapping is dependent on the core mode, wherein "core_mode==1" indicates that the spectral values to be decoded are associated with a linear-predictive-frequency-domain-encoded audio frame, and wherein "core_mode==0" indicates that the spectral values to be decoded are associated with a frequency-domain-encoded audio frame. It should be noted here that the function "arith_map_context()" sets the entries q[0][i] of the current context array q to the values qs[i] of the past context array qs for i=0 to i=lg/4+1, if the number of spectral values associated with the current frequency-domain-encoded audio frame is identical to the number of spectral values associated with the previous audio frame.
However, a more complicated mapping is performed if the number of spectral values associated to the current audio frame is different from the number of spectral values associated to the previous audio frame. However, details regarding the mapping in this case are not particularly relevant for the key idea of present invention, such that reference is made to the pseudo program code of Fig. 5b for details.
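The context reset and the simple (equal-length) case of the context mapping may be illustrated by the following Python sketch. The class name Ctx and the function names are hypothetical, and the more complicated mapping for frames of different lengths (Fig. 5b) is deliberately not covered.

```python
from dataclasses import dataclass, replace


@dataclass
class Ctx:
    """One context entry: the 4-tuple values a, b, c, d and the entry v."""
    a: int = 0
    b: int = 0
    c: int = 0
    d: int = 0
    v: int = -1


def reset_context():
    """Return (q, qs, previous_lg) in the state described for the reset."""
    q = [[Ctx() for _ in range(290)] for _ in range(2)]  # q[2][290]
    qs = [Ctx() for _ in range(258)]                     # qs[258]
    previous_lg = 1024
    return q, qs, previous_lg


def map_context_same_length(q, qs, lg):
    """Copy qs[i] into q[0][i] for i = 0 .. lg/4+1 (equal frame lengths)."""
    for i in range(lg // 4 + 2):
        q[0][i] = replace(qs[i])  # copy, so that q and qs stay independent
```

With lg=1024, the loop index reaches 257, which is the last valid position of the table qs having a dimension of 258.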
4.4 State Value Computation (Fig. 5c)
In the following, the state value computation 312a will be described in more detail.
It should be noted that the first state value s (as shown in Fig. 3) can be obtained as a returned value of the function "arith_get_context(i)", a pseudo program code representation of which is shown in Fig. 5c.
Regarding the computation of the state value, reference is also made to Fig. 4, which shows the context used for a state calculation. Fig. 4 shows a two-dimensional representation of tuples of spectral values, both over time and frequency. An abscissa 410 describes the time, and an ordinate 412 describes the frequency. As can be seen in Fig. 4, a tuple 420 to decode is associated with a time index t0 and a frequency index i (keeping in mind that the spectral values of the tuple 420 to decode are associated to four different frequencies). As can be seen, for the time index t0, the tuples having frequency indices i-1 and i-2 are already decoded at the time at which the tuple 420 having the frequency index i is to be decoded. As can be seen from Fig. 4, a tuple 430 having a time index t0 and a frequency index i-1 is already decoded before the tuple 420 is decoded, and the tuple 430 is considered for the context which is used for decoding the tuple 420. Similarly, a tuple 440 having a time index t-1 and a frequency index i-1, a tuple 444 having a time index t-1 and a frequency index i, and a tuple 448 having a time index t-1 and a frequency index i+1 are already decoded before the tuple 420 is decoded, and are considered for the determination of the context which is used for decoding the tuple 420. In contrast, some other tuples already decoded, which are represented by squares having dashed lines, and other tuples, which are not yet decoded and which are shown by circles having dashed lines, are not used for determining the context for decoding the tuple 420. Taking reference now to Fig. 5c, which shows the functionality of the function "arith_get_context()", some more details regarding the calculation of the first context value "s" will be described.
The function "arith_get_context()" comprises a variable initialization 530a, during which the variables t0, t1, t2 and t3 are initialized in dependence on the entries "v" of the array q at index positions (0, i), (1, i-1), (0, i-1) and (0, i+1). Accordingly, the variables t0 to t3 are initialized with the values of the entries "v", which are associated to the tuples 444, 430, 440, 448 respectively as shown in Fig. 4.
It should also be noted that the function "arith_get_context()" performs a subsequent check of a plurality of conditions, wherein the function "arith_get_context()" is terminated as soon as a "return" instruction is reached, wherein the return instruction (or operator) serves to return its operand (following the return instruction or operator) as the state value s.
The execution of the function "arith_get_context()" comprises a first condition check 530b. If it is found that (the values of) all the variables t0, t1, t2 and t3 are smaller than 10, the return value is computed as shown at reference numeral 530b, and the function "arith_get_context()" is terminated with the return of said return value.
The execution of the function "arith_get_context()" also comprises a second condition check 530c. If it is found in the second condition check 530c that all the variables t0, t1, t2 and t3 are smaller than 34, the variables t2 and t3 are conditionally modified, as shown at the reference numeral 530c, and the return value is computed as shown at reference numeral 530c. In particular, the variable t2 is set to 2 if the variable t2 is greater than 1 and smaller than 10. Otherwise, if the variable t2 is greater than or equal to 10, the variable t2 is set to 3. Similarly, the variable t3 is set to 2 if the variable t3 is greater than 1 and smaller than 10. Otherwise, if the variable t3 is greater than or equal to 10, the variable t3 is set to 3. Accordingly, the range of values of the variables t2 and t3 is limited to a maximum positive value of 3.
If, however, neither the condition of the first condition check 530b nor the condition of the second condition check 530c is fulfilled, a third condition check 530d is performed. If it is found in the third condition check 530d that both the variables t0 and t1 are smaller than 90, then the return value is computed as shown at reference numeral 530d, wherein the values of the variables t2 and t3 are left out of consideration. If, however, none of the conditions of the first condition check 530b, of the second condition check 530c and of the third condition check 530d is fulfilled, a fourth condition check 530e is performed, in which it is determined whether the variables t0 and t1 are both smaller than 544. If this is the case, the return value is computed as shown at reference numeral 530e, and the function "arith_get_context()" is terminated.
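The order of the condition checks 530b to 530e, and the limitation of the variables t2 and t3 in the check 530c, may be illustrated by the following Python sketch. The function names are hypothetical. The sketch only reports which branch is taken; the return value formulas of Fig. 5c are not reproduced here, since they are not given in the text.

```python
def limit_530c(t):
    """Limit t2 or t3 as described for the second condition check 530c."""
    if 1 < t < 10:
        return 2
    if t >= 10:
        return 3
    return t  # values not greater than 1 are left unchanged


def select_branch(t0, t1, t2, t3):
    """Return the reference numeral of the branch computing the state value."""
    if max(t0, t1, t2, t3) < 10:
        return "530b"
    if max(t0, t1, t2, t3) < 34:
        return "530c"
    if max(t0, t1) < 90:   # t2 and t3 are left out of consideration
        return "530d"
    if max(t0, t1) < 544:
        return "530e"
    return "530f"          # context computation 530f
```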
If, however, none of the condition checks 530b, 530c, 530d, 530e results in the termination of the function "arith_get_context()", a context computation 530f is performed. The context computation 530f comprises a variable initialization 530fa, a variable rescaling 530fb, a table-based value adaptation 530fc and a return value computation 530fd. In the variable initialization 530fa, variables a0, b0, c0, d0 are set to the values of the entries "a", "b", "c", "d" of the array q at the array position (0, i) if the variable t0 takes a value greater than 1. These values correspond to the values of the 4-tuple 444 of Fig. 4. Otherwise, if the value of the variable t0 is not greater than 1, the variables a0, b0, c0 and d0 are initialized to 0. Similarly, if the value of the variable t1 is greater than 1, the variables a1, b1, c1, d1 are initialized to the values of the entries "a", "b", "c", "d" of the array q at the position (1, i-1), which corresponds to the values of the 4-tuple 430 of Fig. 4.
Accordingly, the variables a0, b0, c0 and d0 are set to the spectral values a, b, c, d of a previously-decoded tuple of spectral values of time index t-1 and frequency index i, if the value of the variable t0 is greater than 1. Similarly, the variables a1, b1, c1, d1 are set to the spectral values a, b, c, d of the previously-decoded tuple of spectral values of time index t0 and frequency index i-1, if the value of the variable t1 is greater than 1.
Subsequently, the variables a0, b0, c0, d0, a1, b1, c1, d1 are iteratively rescaled in that the number representations are iteratively shifted to the right by one bit until all of the variables a0, b0, c0, d0, a1, b1, c1, d1 are in a range between -4 and +3, including the boundaries -4 and +3. After the variable rescaling 530fb, the variable l indicates how often the set of variables a0, b0, c0, d0, a1, b1, c1, d1 has been shifted to the right, wherein at least one shift-to-the-right operation is performed. Accordingly, adapted variables a0, b0, c0, d0, a1, b1, c1, d1 are obtained, which are all in a range between -4 and +3.
Subsequently, the table-based value adaptation 530fc is performed. For this purpose, the variable t0 is set to a value, which is determined by an entry of the table (or array) "egroups", if the variable t0 is greater than 1. As can be seen, the entry at the position (4+a0, 4+b0, 4+c0, 4+d0) is used for this purpose. Similarly, if the value t1 is greater than 1, the variable t1 is set to a value, which is determined by an entry of the table "egroups" at a table position (4+a1, 4+b1, 4+c1, 4+d1). Finally, a return value is computed in dependence on the variable l (which indicates how often a shift-to-the-right operation has been applied), and also in dependence on the variables t0 and t1, as shown at reference numeral 530fd.
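The variable rescaling 530fb may be illustrated by the following Python sketch. The function name is hypothetical. It is assumed that the shift-to-the-right operation is an arithmetic shift, i.e. that the sign of negative values is preserved, which is the behavior of the Python operator ">>" on integers.

```python
def rescale(values):
    """Shift all values right by one bit until each lies in [-4, +3].

    At least one shift is performed. Returns the rescaled values and
    the number of shifts applied.
    """
    shifts = 0
    while True:
        values = [x >> 1 for x in values]  # arithmetic shift, keeps the sign
        shifts += 1
        if all(-4 <= x <= 3 for x in values):
            return values, shifts


# The eight values stand for a0, b0, c0, d0, a1, b1, c1, d1.
print(rescale([9, -9, 0, 1, 2, 3, -1, 5]))
```

In this example, two shift operations are required, as the values 4 and -5 obtained after the first shift are still outside of the range between -4 and +3.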
Accordingly, it can generally be said that the return value of the function "arith_get_context" is determined by the most-relevant bit planes (e.g. the three most-relevant bits) of the tuples 444, 430, 440 and 448 of Fig. 4.
Also, it should be noted that a table lookup is performed if the variable t0 is greater than or equal to 544 or the variable t1 is greater than or equal to 544, while a numerical computation of the return value, using multiplications and additions, is used otherwise. Thus, a more elaborate and more detailed computation of the return value of the function arith_get_context is performed if one of the variables t0 and t1 is greater than or equal to 544.
It should be noted that in Fig. 3, at reference numeral 312a, the variable "lev" is derived from the returned value s of the function arith_get_context(i). The variable lev is derived from the value s by shifting the value to the right by 24 bits. The state variable t is also derived from the value s by performing a mask operation on the value s by means of an "AND" operation between the value s and the hexadecimal value "0xFFFFFF" and by adding a value of "1" to the result of the AND operation.
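The derivation of the level value lev and of the state value t from the returned value s may be illustrated by the following Python sketch, in which the function name is hypothetical.

```python
def split_state(s):
    """Derive (lev, t) from the first state value s."""
    lev = s >> 24            # upper bits: level value
    t = (s & 0xFFFFFF) + 1   # lower 24 bits, increased by one
    return lev, t


print(split_state(0x02000010))
```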
4.5 Group Index Decoding (Figs. 5d, 5e)
In the following, the process 312b of group index decoding will be discussed, which process 312b is based on a previous calculation of the state value t as described above. Also, the iterative execution of the algorithm 312ba comprises a call of the function "arith_get_pk()" with the state value t (as shown in Fig. 3) as a parameter.
4.5.1. Function "arith_get_pk()" (Fig. 5d)
The function "arith_get_pk()" will subsequently be described in reference to Fig. 5d. The execution of the function "arith_get_pk()" comprises the initialization of an array psci with 28 values, as shown at reference numeral 540a. In addition, the function "arith_get_pk()" comprises the initialization of a pointer p and of the variables i, j, as shown at reference numeral 540b. The algorithm "arith_get_pk()" also comprises an initialization of the variable i to a value, which is equal to 63*t, wherein t is the parameter handed over to the function "arith_get_pk()", when the function "arith_get_pk()" is called. Accordingly, the input variable s of the function "arith_get_pk()" may be identical to the variable t of the algorithm "tuples_decode()" as shown in Fig. 3. The initialization of the variable i is shown at reference numeral 540c.
The function "arith_get_pk()" also comprises an iterative execution of a hash table access 540d, wherein the hash table access 540d is repeated until a "break" condition is reached, or until a "return" operator is reached. If the "break" condition is reached, a range-based provision 540e of a return value is performed. If, however, the return operator is reached, the operand of the return operator is returned and the function "arith_get_pk()" is terminated.
The hash table access 540d comprises an iterative execution of a first step 540da, a second step 540db, a third step 540dc and a fourth step 540dd. In the first step 540da, the variable j is set to the value of an entry of the table "ari_pk_hash", wherein the index of the entry is determined by the seven least-significant bits of the variable i. In the second step 540db, it is determined whether the value of the variable j, which is obtained in the first step 540da, takes the hexadecimal value of 0xFFFFFFFF. If this is the case, the iterative execution of the hash table access 540d is aborted and the execution of the algorithm "arith_get_pk()" is continued with the range-based provision of a return value. In other words, if the entry of the table "ari_pk_hash", which is addressed by the seven least-significant bits of the variable i, takes the escape value of 0xFFFFFFFF, it is assumed that the state defined by the input variable t of the function "arith_get_pk()" is a so-called "non-significant" state, to which a return value should be assigned using the range-based provision 540e of the return value. In the third step 540dc of the hash table access 540d, it is checked whether the most significant bits (e.g. bits 8 to 31) of the value of the variable j are equal to the value of the input variable t of the function "arith_get_pk()". If this is the case, the eight least significant bits (bits 0 to 7) of the variable j are returned as a return value of the function "arith_get_pk()", and the function "arith_get_pk()" is terminated. If, however, it is found that the condition of the third step 540dc is not fulfilled, the variable i is incremented by 1 (step 540dd) and the hash table access 540d is repeated, starting from the first step 540da.
The range-based provision 540e of a return value comprises an initialization 540ea of the pointer p to a starting point within the array psci. The starting point is determined by bits 23 and 24 of the input variable t of the function "arith_get_pk()", which correspond to the number of escape symbols "ARITH_ESCAPE" already decoded for the present tuple to decode. The pointer p is initialized to point at the first entry (24) of the array psci if the bits 23 and 24 of the input variable t take the values "00", to the eighth entry (30) of the array psci if the bits 23 and 24 of the input variable t take the values "01", to the 15th entry (5) of the array psci if the bits 23 and 24 of the input variable t take the values "10", and to the 22nd entry (5) of the array psci if the bits 23 and 24 of the input variable t take the values "11". In a subsequent step 540eb, the variable j is set to take the value represented by the 22 least-significant bits (bits 1 to 22) of the input variable t, as shown at reference numeral 540eb. Subsequently, a decision is made which entry of the array psci is returned as the return value of the algorithm "arith_get_pk()". The following decision is made:
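The four steps of the hash table access may be illustrated by the following Python sketch. The function name and the table contents are hypothetical; the toy table below is not the table "ari_pk_hash" of the embodiment. The sketch returns None where the function "arith_get_pk()" would continue with the range-based provision of a return value, and it relies on the table containing escape entries, as the loop would otherwise not terminate for an unknown state.

```python
ESCAPE_ENTRY = 0xFFFFFFFF


def hash_lookup(table, t):
    """Return the stored 8-bit value for state t, or None if t is not stored."""
    i = 63 * t
    while True:
        j = table[i & 127]    # first step: seven least-significant bits of i
        if j == ESCAPE_ENTRY:
            return None       # second step: "non-significant" state
        if (j >> 8) == t:
            return j & 0xFF   # third step: match, return bits 0 to 7
        i += 1                # fourth step: probe the next entry


# Toy table with 128 entries: positions 61 and 62 hold the states 5 and 3.
toy_table = [ESCAPE_ENTRY] * 128
toy_table[61] = (5 << 8) | 9
toy_table[62] = (3 << 8) | 17
print(hash_lookup(toy_table, 3))
```

In this toy example, the state 3 is first mapped to the position 61 (as 63*3=189 and the seven least-significant bits of 189 represent the value 61), where the entry of a different state is found, such that the following position 62 is evaluated.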
• If the value of j is smaller than 436961, and if the value of j is also smaller than 252001, and if the value of j is also smaller than 243001, the entry at the starting point determined in the step 540ea is returned;
• If the value of j is smaller than 436961, and if the value of j is also smaller than 252001, and if the value of j is not smaller than 243001, the first entry after the starting point determined in the step 540ea is returned;
• If the value of j is smaller than 436961, and if the value of j is not smaller than 252001, and if the value of j is smaller than 288993, the second entry after the starting point determined in the step 540ea is returned;
• If the value of j is smaller than 436961, and if the value of j is not smaller than 252001, and if the value of j is not smaller than 288993, the third entry after the starting point determined in the step 540ea is returned;
• If the value of j is not smaller than 436961, and if the value of j is smaller than 1609865 and if the value of j is also smaller than 880865, the fourth entry after the starting point determined in the step 540ea is returned;
• If the value of j is not smaller than 436961, and if the value of j is smaller than 1609865, and if the value of j is not smaller than 880865, the fifth entry after the starting point determined in the step 540ea is returned; and
• If the value of j is not smaller than 436961, and if the value of j is also not smaller than 1609865, the sixth value after the starting point determined in the step 540ea is returned.
For more details, reference is made to the algorithm at reference numeral 540ec in Fig. 5.
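By way of illustration only, the decision cascade listed above can be sketched in Python as follows. The function name "range_based_index" is hypothetical. The sketch returns the index into the array psci rather than the entry itself, since the contents of psci are not reproduced here. It is assumed that bit 23 is the less significant of the two selector bits, such that the bit pattern "01" corresponds to bit 23 being set, and that bits are counted starting from 1.

```python
from bisect import bisect_right

# Thresholds for the 22-bit value j, taken from the list above, in ascending order.
THRESHOLDS = [243001, 252001, 288993, 436961, 880865, 1609865]


def range_based_index(t: int) -> int:
    """Return the zero-based index into psci[] selected for the state value t.

    Hypothetical helper: mirrors steps 540ea to 540ec as described in the text.
    """
    # Step 540ea: bits 23 and 24 (counted from 1) select the starting point,
    # i.e. the 1st, 8th, 15th or 22nd entry (zero-based index 0, 7, 14 or 21).
    start = 7 * ((t >> 22) & 0x3)
    # Step 540eb: the 22 least-significant bits form the value j.
    j = t & ((1 << 22) - 1)
    # Step 540ec: the number of thresholds that j has reached equals the
    # offset (0 to 6) after the starting point; this is equivalent to the
    # nested "smaller than" comparisons listed above.
    return start + bisect_right(THRESHOLDS, j)
```

For example, a state value t whose 22 least-significant bits represent 300000 and whose bit 23 is set yields the third entry after the eighth entry of the array psci.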
To summarize the above, the function "arith_get_pk()", which is called with the state value t, provides the value pki as a return value, as can be seen in Fig. 3 at reference numeral 312ba. The value of the variable pki is used to select a cumulative-frequencies-table for the execution of the function "arith_decode()", as has been discussed with reference to Fig. 3. Accordingly, the variable "cum_freq[]" is initialized appropriately to designate the selected cumulative-frequencies-table.
4.5.2. Function "arith_decode()" (Fig. 5e)
In the following, the functionality of the function "arith_decode()" will be discussed in detail taking reference to Fig. 5e. It should be noted that the function "arith_decode()" uses the helper function "arith_first_symbol(void)", which returns TRUE if it is the first symbol of the sequence and FALSE otherwise. The function "arith_decode()" also uses the helper function "arith_get_next_bit()", which gets and provides the next bit of the bitstream.
In addition, the function "arith_decode()" uses the global variables "low", "high" and "value". Further, the function "arith_decode()" receives, as an input variable, the variable "cum_freq[]", which points towards a first entry or element (having element index or entry index 0) of the selected cumulative-frequencies-table. Also, the function "arith_decode()" uses the input variable cfl, which indicates the length of the selected cumulative-frequencies-table designated by the variable "cum_freq[]".
The function "arith_decode()" comprises, as a first step, a variable initialization 550a, which is performed if the helper function "arith_first_symbol()" indicates that the first symbol of a sequence of symbols is being decoded. The variable initialization 550a initializes the variable "value" in dependence on a plurality of, for example, 20 bits, which are obtained from the bitstream using the helper function "arith_get_next_bit()", such that the variable "value" takes the value represented by said bits. Also, the variable "low" is initialized to take the value of 0, and the variable "high" is initialized to take the value of 1048575.
In a second step 550b, the variable "range" is set to a value which is larger, by 1, than the difference between the values of the variables "high" and "low". The variable "cum" is set to a value which represents a relative position of the value of the variable "value" between the value of the variable "low" and the value of the variable "high". Accordingly, the variable "cum" takes, for example, a value between 0 and 2^16 in dependence on the value of the variable "value". The pointer p is initialized to a value which is smaller, by 1, than the starting address of the selected cumulative-frequencies-table.
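The steps 550a and 550b may, for example, be sketched in Python as follows. The function names "init_decoder" and "scaled_position" are hypothetical. The text above only states that "cum" represents a relative position of "value" between "low" and "high"; the exact scaling formula used below is an assumption chosen such that "cum" falls into a 16-bit range, and the bits are assumed to be read most-significant bit first.

```python
def init_decoder(bits):
    """Step 550a (sketch): read 20 bits into 'value', reset 'low' and 'high'.

    'bits' is any iterator yielding 0 or 1; it stands in for the helper
    function arith_get_next_bit(). MSB-first order is an assumption.
    """
    value = 0
    for _ in range(20):
        value = (value << 1) | next(bits)
    return {"low": 0, "high": 1048575, "value": value}


def scaled_position(state):
    """Step 550b (sketch): compute 'range' and the scaled position 'cum'."""
    rng = state["high"] - state["low"] + 1
    # Assumed scaling: map the position of 'value' within [low, high]
    # onto a 16-bit scale.
    cum = (((state["value"] - state["low"] + 1) << 16) - 1) // rng
    return rng, cum
```

With this assumed scaling, a freshly initialized decoder maps the smallest possible "value" to 0 and the largest possible "value" to 65535.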
The algorithm "arith_decode()" also comprises an iterative cumulative-frequencies-table-search 550c. The iterative cumulative-frequencies-table-search is repeated until the variable cfl is smaller than or equal to 1. In the iterative cumulative-frequencies-table-search 550c, the pointer variable q is set to a value which is equal to the sum of the current value of the pointer variable p and half the value of the variable cfl. If the value of the entry *q of the selected cumulative-frequencies-table, which entry is addressed by the pointer variable q, is larger than the value of the variable "cum", the pointer variable p is set to the value of the pointer variable q, and the variable cfl is incremented. Finally, the variable cfl is shifted to the right by one bit, thereby effectively dividing the value of the variable cfl by 2 and neglecting the modulo portion.
Accordingly, the iterative cumulative-frequencies-table-search 550c effectively compares the value of the variable "cum" with a plurality of entries of the selected cumulative-frequencies-table, in order to identify an interval within the selected cumulative-frequencies-table, which is bounded by entries of the cumulative-frequencies-table, such that the value cum lies within the identified interval. Accordingly, the entries of the selected cumulative-frequencies-table define intervals, wherein a respective symbol value is associated to each of the intervals of the selected cumulative-frequencies-table. Also, the widths of the intervals between two adjacent values of the cumulative-frequencies-table define probabilities of the symbols associated with said intervals, such that the selected cumulative-frequencies-table in its entirety defines a probability distribution of the different symbols (or symbol values). Details regarding the available cumulative-frequencies-tables will be discussed below taking reference to Fig. 16.
Taking reference again to Fig. 5e, the symbol value is derived from the value of the pointer variable p, wherein the symbol value is derived as shown at reference numeral 550d. Thus, the difference between the value of the pointer variable p and the starting address "cum_freq" is evaluated in order to obtain the symbol value, which is represented by the variable "symbol".
The algorithm "arith_decode()" also comprises an adaptation 550e of the variables "high" and "low". If the symbol value represented by the variable "symbol" is different from 0, the variable "high" is updated, as shown at reference numeral 550d. Also, the value of the variable "low" is updated, as shown at reference numeral 550e. The variable "high" is set to a value which is determined by the value of the variable "low", the variable "range" and the entry having the index "symbol-1" of the selected cumulative-frequencies-table. The variable "low" is increased, wherein the magnitude of the increase is determined by the variable "range" and the entry of the selected cumulative-frequencies-table having the index "symbol". Accordingly, the difference between the values of the variables "low" and "high" is adjusted in dependence on the numeric difference between two adjacent entries of the selected cumulative-frequencies-table.
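For illustration, the iterative search 550c and the symbol derivation 550d may be sketched in Python using list indices in place of pointers. The function name "find_symbol" is hypothetical. It is assumed here that the entries of the table are monotonically decreasing, that the table length cfl is at least 2, and that the symbol value equals the pointer difference plus 1 (such that the initial pointer value, which lies one position before the table, yields the symbol value 0); the text above does not spell out this last detail.

```python
def find_symbol(cum_freq, cfl, cum):
    """Locate the interval of a decreasing cumulative-frequencies-table that
    contains 'cum' and return the associated symbol value (sketch)."""
    if cfl < 2:
        raise ValueError("sketch assumes a table length of at least 2")
    p = -1  # one position before the first entry, as in step 550b
    while True:
        q = p + (cfl >> 1)
        if cum_freq[q] > cum:
            p = q
            cfl += 1
        cfl >>= 1  # halve, discarding the remainder
        if cfl <= 1:
            break
    # Assumed mapping from the final pointer position to the symbol value.
    return p + 1
```

With the hypothetical table [12, 8, 3, 0] and cum equal to 5, the search returns the symbol value 2, which corresponds to the interval bounded by the entries 8 and 3.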
Accordingly, if a symbol value having a low probability is detected, the interval between the values of the variables "low" and "high" is reduced to a narrow width. In contrast, if the detected symbol value comprises a relatively large probability, the width of the interval between the values of the variables "low" and "high" is set to a comparatively large value. Again, the width of the interval between the values of the variables "low" and "high" is dependent on the detected symbol and the corresponding entries of the cumulative-frequencies-table.
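The adaptation 550e may, for example, be sketched in Python as follows. The function name "update_interval" is hypothetical. The text states which quantities the new values of "high" and "low" depend on, but not the exact arithmetic, which is shown in Fig. 5e; the scaling by 16 bits used below is therefore an assumption, chosen to match table entries lying on a 16-bit scale.

```python
def update_interval(low, high, symbol, cum_freq):
    """Narrow [low, high] to the sub-interval of the decoded symbol (sketch)."""
    rng = high - low + 1
    if symbol != 0:
        # Upper bound from the entry with index symbol-1 (uses the old 'low').
        high = low + ((rng * cum_freq[symbol - 1]) >> 16) - 1
    # Lower bound from the entry with index symbol.
    low = low + ((rng * cum_freq[symbol]) >> 16)
    return low, high
```

With a hypothetical two-entry table [32768, 0] and the initial interval from 0 to 1048575, the symbol value 0 selects the upper half of the interval and the symbol value 1 selects the lower half.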
The algorithm "arith_decode()" also comprises an interval renormalization 550f, in which the interval determined in the step 550e is iteratively shifted and scaled until the "break" condition is reached. In the interval renormalization 550f, a selective shift-downward operation 550fa is performed. If the variable "high" is smaller than 524286, nothing is done, and the interval renormalization continues with an interval-size-increase operation 550fb. If, however, the variable "high" is not smaller than 524286 and the variable "low" is greater than or equal to 524286, the variables "value", "low" and "high" are all reduced by 524286, such that an interval defined by the variables "low" and "high" is shifted downwards, and such that the value of the variable "value" is also shifted downwards. If, however, it is found that the value of the variable "high" is not smaller than 524286, and that the variable "low" is not greater than or equal to 524286, and that the variable "low" is greater than or equal to 262143 and that the variable "high" is smaller than 786429, the variables "value", "low" and "high" are all reduced by 262143, thereby shifting down the interval between the values of the variables "high" and "low" and also the value of the variable "value". If, however, none of the above conditions is fulfilled, the interval renormalization is aborted.
If, however, any of the above-mentioned conditions, which are evaluated in the step 550fa, is fulfilled, the interval-increase-operation 550fb is executed. In the interval-increase-operation 550fb, the value of the variable "low" is doubled. Also, the value of the variable "high" is doubled, and the result of the doubling is increased by 1. Also, the value of the variable "value" is doubled (shifted to the left by one bit), and a bit of the bitstream, which is obtained by the helper function "arith_get_next_bit()", is used as the least-significant bit. Accordingly, the size of the interval between the values of the variables "low" and "high" is approximately doubled, and the precision of the variable "value" is increased by using a new bit of the bitstream. As mentioned above, the steps 550fa and 550fb are repeated until the "break" condition is reached, i.e. until the interval between the values of the variables "low" and "high" is large enough.
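The interval renormalization 550f described above may be sketched in Python as follows. The function name "renormalize" and the callable "next_bit", which stands in for the helper function "arith_get_next_bit()", are hypothetical; the constants are those given in the text.

```python
def renormalize(low, high, value, next_bit):
    """Steps 550fa and 550fb (sketch): shift and double until 'break'."""
    while True:
        # Step 550fa: selective shift-downward operation.
        if high < 524286:
            pass  # nothing to shift
        elif low >= 524286:
            value -= 524286
            low -= 524286
            high -= 524286
        elif low >= 262143 and high < 786429:
            value -= 262143
            low -= 262143
            high -= 262143
        else:
            break  # interval is large enough
        # Step 550fb: double the interval and read one more bit into 'value'.
        low = 2 * low
        high = 2 * high + 1
        value = (value << 1) | next_bit()
    return low, high, value
```

As an example, starting from low = 0, high = 100000 and value = 50000, three doubling steps are executed and three bits are read from the bitstream before the "break" condition is reached.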
Regarding the functionality of the algorithm "arith_decode()", it should be noted that the interval between the values of the variables "low" and "high" is reduced in the step 550e in dependence on two adjacent entries of the cumulative-frequencies-table referenced by the variable "cum_freq". If an interval between two adjacent values of the selected cumulative-frequencies-table is small, i.e. if the adjacent values are comparatively close together, the interval between the values of the variables "low" and "high", which is obtained in the step 550e, will be comparatively small. In contrast, if two adjacent entries of the cumulative-frequencies-table are spaced further apart, the interval between the values of the variables "low" and "high", which is obtained in the step 550e, will be comparatively large.
Consequently, if the interval between the values of the variables "low" and "high", which is obtained in the step 550e, is comparatively small, a large number of interval renormalization steps will be executed to re-scale the interval to a "sufficient" size (such that none of the conditions of the condition evaluation 550fa is fulfilled). Accordingly, a comparatively large number of bits from the bitstream will be used in order to increase the precision of the variable "value". If, in contrast, the interval size obtained in the step 550e is comparatively large, only a smaller number of repetitions of the interval renormalization steps 550fa and 550fb will be required in order to renormalize the interval between the values of the variables "low" and "high" to a "sufficient" size. Accordingly, only a comparatively small number of bits from the bitstream will be used to increase the precision of the variable "value" and to prepare a decoding of a next symbol.
To summarize the above, if a symbol is decoded which comprises a comparatively high probability, and to which a large interval is associated by the entries of the selected cumulative-frequencies-table, only a comparatively small number of bits will be read from the bitstream in order to allow for the decoding of a subsequent symbol. In contrast, if a symbol is decoded which comprises a comparatively small probability and to which a small interval is associated by the entries of the selected cumulative-frequencies-table, a comparatively large number of bits will be taken from the bitstream in order to prepare a decoding of the next symbol. Accordingly, the entries of the cumulative-frequencies-tables reflect the probabilities of the different symbols and also reflect a number of bits required for decoding a sequence of symbols. By varying the cumulative-frequencies-table in dependence on a context, i.e. in dependence on previously decoded symbols, for example by selecting different cumulative-frequencies-tables in dependence on the context, stochastic dependencies between the different symbols can be exploited, which allows for a particularly bitrate-efficient encoding of the subsequent (or adjacent) symbols.
To summarize the above, the function "arith_decode()", which has been described with reference to Fig. 5e, is called with the cumulative-frequencies-table "arith_cf_ng[pki][]", corresponding to the index pki returned by the function "arith_get_pk()", to determine the group index ng.
4.5.3. Escape Mechanism
While the decoded group index ng is the escape symbol "ARITH_ESCAPE", an additional group index ng is decoded and the variable lev is incremented by 2. Accordingly, information is obtained about the numeric significance of the most-significant bit-plane and also about a number of less-significant bit-planes to be decoded. If an escape symbol "ARITH_ESCAPE" is decoded for the first time for the present tuple to decode, the state variable t is incremented by the value 4194304, which corresponds to setting the 23rd bit of the variable t to "1". If an escape symbol is decoded for the second or a subsequent time, the state variable t is set to zero. In both cases, when an escape symbol "ARITH_ESCAPE" is decoded, the updated state variable t is then used for a new iteration of the group index decoding 312b.
4.6. Element Index Decoding (Fig. 5f)
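The escape mechanism may be illustrated by the following Python sketch. The function name "decode_group_index" is hypothetical, as is the callable "decode_ng", which stands in for one iteration of the group index decoding 312b (i.e. a call of "arith_get_pk()" followed by a call of "arith_decode()"). The numeric value of the escape symbol is passed in as a parameter, since it is not stated in this section, and it is assumed that the variable lev starts at 0 for each tuple.

```python
def decode_group_index(t, decode_ng, escape_symbol):
    """Decode group indices until a non-escape symbol is found (sketch).

    Returns the final group index ng, the variable lev and the state t.
    """
    lev = 0       # assumed initial value
    escapes = 0   # number of escape symbols seen for the present tuple
    ng = decode_ng(t)
    while ng == escape_symbol:
        lev += 2
        escapes += 1
        if escapes == 1:
            t += 4194304  # sets the 23rd bit (counted from 1) of t
        else:
            t = 0
        ng = decode_ng(t)
    return ng, lev, t
```

For instance, if two escape symbols are decoded before a regular group index is obtained, the variable lev takes the value 4 and the last iteration of the group index decoding is performed with the state variable t set to zero.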
Once the decoded group index is not the escape symbol, "ARITH_ESCAPE", the number of elements, mm, within the group ng and the group offset, og are deduced by looking up the table "dgroups[]" in accordance with the algorithm shown in Fig. 5f.
In other words, the variable mm is set to a value which is determined by the least-significant bits (e.g. bits 0-7) of the entry of the table "dgroups[]" at a table position determined by the group index ng. Similarly, the group offset og is determined by the more significant bits (bit 8 and onwards) of the entry of the table "dgroups[]", which entry is determined by the position offset defined by the variable ng.
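As a minimal sketch of this unpacking, assuming that exactly bits 0 to 7 carry the number of elements mm, the following Python function may be considered. The function name "split_group_entry" and the example entry value are hypothetical.

```python
def split_group_entry(entry):
    """Split one dgroups[] entry into (mm, og) (sketch)."""
    mm = entry & 0xFF   # bits 0-7: number of elements in the group
    og = entry >> 8     # bit 8 and onwards: group offset
    return mm, og
```

For example, a hypothetical entry 0x1203 would describe a group having mm = 3 elements and a group offset og = 18.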
If the variable mm is greater than 1, i.e. if the group determined by the group index ng comprises more than one element, the element index ne is decoded by calling the function "arith_decode()" with the cumulative-frequencies-table "arith_cf_ne+((mm*(mm-1))>>1)[]" and with the length of the cumulative-frequencies-table equal to mm.
In other words, a subsection of the table "arith_cf_ne[]" is selected, wherein the cumulative-frequencies-table "arith_cf_ne[]" in its entirety describes probability distributions for a plurality of different numbers of elements of a group selected by the group index ng. It should be noted that offsets of the different subsections (or subtables) of the cumulative-frequencies-table "arith_cf_ne[]" are described by the formula (mm*(mm-1))>>1.
It should be noted that preferably the variable "cum_freq", which is used as an input variable of the algorithm "arith_decode()", is initialized to the starting address of the subsection (or subtable) of the table or array "arith_cf_ne[]" which is associated to the number of elements mm of the current group described by the group index ng. Also, the variable cfl, which is an input variable of the algorithm "arith_decode()", is initialized to the value mm. Subsequently, the function "arith_decode()" is called, the operation of which has been described in detail above. However, for the decoding of the element index, the function "arith_decode()" uses a subtable of the table or array "arith_cf_ne[]", rather than one of the cumulative-frequencies-tables "arith_cf_ng[pki=0][545]" to "arith_cf_ng[pki=31][545]". Consequently, the element index ne is provided as a return value of the function "arith_decode()", which uses a number of bits of the bitstream to obtain the element index ne.
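The selection of the subtable may be illustrated by the following Python sketch, in which the function name "ne_subtable" is hypothetical and a list slice takes the place of the pointer arithmetic. The offsets given by the formula (0, 1, 3, 6, ... for mm = 1, 2, 3, 4, ...) are consistent with subtables of length mm being stored back to back.

```python
def ne_subtable(arith_cf_ne, mm):
    """Return the subtable of arith_cf_ne[] used for groups of mm elements."""
    offset = (mm * (mm - 1)) >> 1
    # The subtable has length mm, which is also passed as cfl.
    return arith_cf_ne[offset:offset + mm]
```

The returned subtable and the value mm would then be passed to the function "arith_decode()" as the variables "cum_freq" and cfl, respectively.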
4.7. Most-Significant Bit-Plane Determination (Fig. 5g)
Once the element index ne is decoded and returned as a return value of the function "arith_decode()", the most-significant signed 2-bits wise plane of the 4-tuple can be derived using the table "dgvector[]" in accordance with the algorithm shown in Fig. 5g.
For example, a first spectral value "a" of the tuple of spectral values can be set to an entry of the table or array "dgvectors[]", wherein the array element index (or table element index, or briefly "element index" or "entry index") is determined as 4*(og+ne). Similarly, the second spectral value "b" of the tuple of spectral values can be set to an entry of the array "dgvectors[]", wherein the array element index is determined by 4*(og+ne)+1. A third spectral value "c" of the tuple of spectral values can be set to an entry of the array "dgvectors[]", wherein an element index is determined by 4*(og+ne)+2. A fourth spectral value "d" of the tuple of spectral values can be set to an entry of the array "dgvectors[]", wherein the element index is determined by 4*(og+ne)+3. Thus, the spectral values "a", "b", "c", "d", which represent the most-significant bit planes of the tuple of spectral values, are derived from the array "dgvectors[]", wherein the entries of the array determining the spectral values "a", "b", "c", "d" are selected in accordance with the group index ng and the element index ne (if available).
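The table look-up described above may be sketched in Python as follows. The function name "msb_plane" is hypothetical, and it is assumed that the element index ne is taken as 0 when the group comprises only one element, i.e. when no element index has been decoded.

```python
def msb_plane(dgvectors, og, ne=0):
    """Return the spectral values (a, b, c, d) of the most-significant
    bit plane for group offset og and element index ne (sketch)."""
    base = 4 * (og + ne)
    a, b, c, d = dgvectors[base:base + 4]
    return a, b, c, d
```

The four returned values correspond to the entries having the element indices 4*(og+ne) to 4*(og+ne)+3.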
4.8. Less-Significant Bit-Plane Determination (Fig. 5h)
The remaining bit planes are then decoded from the most-significant to the lowest-significant level by calling lev times the function "arith_decode()" with the cumulative-frequencies-table "arith_cf_r[]". For this purpose, the input variable "cum_freq" of the function "arith_decode()" may be initialized to the starting address of the array "arith_cf_r[]". Also, the input variable cfl of the function "arith_decode()" may be initialized to an appropriate value representing the length of the table "arith_cf_r[]", which, in the case of a tuple of dimension 4, is equal to 16.
The function "arith_decode()" can return the variable r, which represents the binary values of one of the less-significant bit-planes of the decoded tuple. The decoded bit-plane r allows the decoded 4-tuple to be refined according to the algorithm shown in Fig. 5h.
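As an illustration of such a refinement, the following Python sketch shifts each of the four spectral values to the left by one bit and appends one bit of the value r as the new least-significant bit, with bit 0 of r assigned to "a", bit 1 to "b", bit 2 to "c" and bit 3 to "d". The function name "add_bit_plane" is hypothetical.

```python
def add_bit_plane(a, b, c, d, r):
    """Refine the 4-tuple (a, b, c, d) by one less-significant bit plane r."""
    a = (a << 1) | (r & 1)
    b = (b << 1) | ((r >> 1) & 1)
    c = (c << 1) | ((r >> 2) & 1)
    d = (d << 1) | ((r >> 3) & 1)
    return a, b, c, d
```

This function would be applied lev times, once for each decoded value r.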
In other words, when "adding" a less-significant bit plane, the first spectral value "a" is multiplied by 2 (or, equivalently, shifted to the left by one bit), and the least-significant bit (bit 0) of the value r is added as the new least-significant bit (which may be done using an OR operation). The second spectral value "b" is multiplied by 2, and the second bit (bit 1) of the value r is added as the least-significant bit of the spectral value "b". The third spectral value "c" is multiplied by 2 (or, equivalently, shifted to the left by one bit), and the third bit (bit 2) of the value r is added as the least-significant bit. The fourth spectral value "d" is multiplied by 2 (or, equivalently, shifted to the left by one bit), and the fourth bit (bit 3) of the value r is added as the least-significant bit. For details, reference is made to the algorithm shown in Fig. 5h.
4.9. Context Update (Fig. 5i)
Once the 4-tuple (a, b, c, d) is completely decoded, i.e. all of the less-significant bit planes have been added, the context tables q and qs are updated by calling the function "arith_update_context()". In the following, details regarding the function "arith_update_context()" will be described taking reference to Fig. 5i, which shows a pseudo-program code representation of said function.
The function "arith_update_context" receives, as input variables, the spectral values "a", "b", "c", "d" of the decoded 4-tuple, the index "i" of the 4-tuple to decode (or the decoded 4-tuple) and the number lg/4 of 4-tuples associated with the current audio frame.
The function "arith_update_context()" comprises a step 580a of copying the spectral values "a", "b", "c", "d" into the array q. For example, the entry "a" of the array q at the position (1, i) (also designated with "q[1][i].a") is set to take the first spectral value "a". The entry "b" of the array q at the position (1, i) is set to the second spectral value "b". The entry "c" of the array q at the position (1, i) is set to the third spectral value "c", and the entry "d" of the array q at the position (1, i) is set to the fourth spectral value "d". Accordingly, the spectral values "a", "b", "c", "d" are stored in the entries "a", "b", "c", "d" of the array q at the position (1, i).
The function "arith_update_context()" also comprises a step 580b of setting the entry "v" of the array q at the position (1, i). If one of the spectral values "a", "b", "c", "d" of the currently-decoded tuple of spectral values is smaller than -4 or greater than or equal to 4, the entry "v" of the array q at the position (1, i) is set to the value of 1024. Otherwise, i.e. if all the spectral values "a", "b", "c", "d" are within the range between -4 and +3, including the boundaries, the entry "v" of the array q at the position (1, i) is set to an entry of the table or array "egroups[]" at the position (4+a, 4+b, 4+c, 4+d). Accordingly, the entry "v" of the array q is typically set to a standard value of 1024, if one of the spectral values "a", "b", "c", "d" is comparatively large, thereby causing the function "arith_get_context()" to perform the process by 530f during the decoding of an adjacent tuple of spectral values.
The function "arith_update_context()" also comprises a first mapping 580c, which is performed if the last tuple of spectral values of a current audio frame is decoded and the core mode is the linear-predictive-frequency-domain core mode (for the case of an audio coder switchable between a frequency-domain core mode and a linear-predictive-frequency-domain core mode). The function "arith_update_context()" also comprises a second mapping 580d, which is performed if the last tuple of spectral values of the current audio frame is decoded and if the core mode is the frequency-domain core mode.
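The step 580b may, for example, be sketched in Python as follows. The function name "context_value" is hypothetical, and the table "egroups[]" is assumed to be accessible as a four-dimensional nested list having 8 entries per dimension; its contents are not reproduced here.

```python
def context_value(a, b, c, d, egroups):
    """Compute the entry "v" for a decoded 4-tuple (sketch of step 580b)."""
    if any(x < -4 or x >= 4 for x in (a, b, c, d)):
        return 1024  # standard value for comparatively large spectral values
    # All values lie in the range -4..+3, so the shifted indices lie in 0..7.
    return egroups[4 + a][4 + b][4 + c][4 + d]
```

The returned value would be stored in the entry "v" of the array q at the position (1, i).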
Regarding the first mapping 580c, reference is made to the pseudo-program code of Fig. 5i. Regarding the second mapping 580d, reference is also made to Fig. 5i.
4.10 Summary of the Decoding Process
In the following, the decoding process will be briefly summarized.
4-tuples of quantized spectral coefficients are noiselessly coded and transmitted starting from the lowest-frequency coefficient and progressing to the highest-frequency coefficient.
The coefficients from the advanced-audio-coding are stored in an array "x_ac_quant[g][win][sfb][bin]", and the order of transmission of the noiseless coding codewords is such that when they are decoded in the order received and stored in the array, "bin" is the most rapidly incrementing index and "g" is the most slowly incrementing index. Within a codeword, the order of decoding is a, b, c, d.
Coefficients from the transform-coded-excitation are stored directly in an array "x_tcx_invquant[win][bin]", and the order of the transmission of the noiseless coding codewords is such that when they are decoded in the order received and stored in the array, "bin" is the most rapidly incrementing index and "win" is the most slowly incrementing index. Within a codeword, the order of decoding is a, b, c, d.
First, the flag "arith_reset_flag" determines if the context must be reset. If the flag is TRUE, the function "arith_reset_context()", a pseudo-program-code representation of which is shown in Fig. 5a, is called. Otherwise, when the flag "arith_reset_flag" is FALSE, a mapping is done between the past context and the current context in accordance with the function "arith_map_context()", a pseudo-program-code representation of which is shown in Fig. 5b.
The noiseless decoder outputs 4-tuples of signed quantized spectral coefficients. At first, the state of the context is calculated based on the four previously decoded groups surrounding the 4-tuple to decode. The state of the context is given by the function "arith_get_context()", a pseudo-program-code representation of which is shown in Fig. 5c. Once the state is known, the group to which the most-significant signed 2-bits wise plane of the 4-tuple belongs is decoded using the function "arith_decode()", fed with the appropriate cumulative-frequencies-table corresponding to the context state.
The correspondence is made by the function "arith_get_pk()", a pseudo-program-code representation of which is shown in Fig. 5d.
Then, to determine the group index ng, the function "arith_decode()" is called with the cumulative-frequencies-table "arith_cf_ng[pki][]", corresponding to the index returned by the function "arith_get_pk()". The arithmetic coder (or decoder) is an integer implementation using the method of tag generation with scaling. For details, reference is made, for example, to the book "Introduction to Data Compression" of K. Sayood, third edition, 2006, Elsevier Inc. The pseudo-C-code of Fig. 5e describes the used algorithm of the function "arith_decode()".
While the decoded group index ng is the escape symbol, ARITH_ESCAPE, an additional group index ng is decoded and the variable lev is incremented by 2. The state of the context is also adjusted. Once the decoded group index is not the escape symbol, ARITH_ESCAPE, the number of elements, mm, within the group and the group offset, og, are deduced by looking up the table "dgroups[]", in accordance with the algorithm shown in Fig. 5f.
The element index ne is then decoded by calling the function "arith_decode()" with the cumulative-frequencies-table "arith_cf_ne+((mm*(mm-1))>>1)[]". Once the element index is decoded, the most-significant 2-bits wise plane of the 4-tuple can be derived with the table "dgvector[]" in accordance with the algorithm shown in Fig. 5g.
The remaining bit planes are then decoded from the most-significant to the lowest-significant level by calling lev times the function "arith_decode()" with the cumulative-frequencies-table "arith_cf_r[]".
The decoded bit plane r allows the decoded 4-tuple to be refined by the algorithm shown in Fig. 5h.
Once the 4-tuple (a, b, c, d) is completely decoded, the context tables q and qs are updated by calling the function "arith_update_context()", which is shown in Fig. 5i. A legend of definitions is shown in Fig. 5j.
5. Mapping tables
In an embodiment according to the invention, particularly advantageous tables "arith_cf_ng_hash" and "arith_cf_ng" are used for the execution of the function "arith_get_pk", which has been discussed with reference to Fig. 5d, and for the execution of the function "arith_decode", which has been discussed with reference to Fig. 5e.
5.1. Table "arith_cf_ng_hash[]"
A content of a particularly advantageous implementation of the table "arith_cf_ng_hash[]", which takes the role of the table "arith_pk_hash[]", is shown in the table of Fig. 14. It should be noted here that the table of Fig. 14 lists the entries of the table "arith_cf_ng_hash[]". Said entries are referenced by a one-dimensional integer-type entry index (also designated as "element index" or "array index"), which is, for example, designated with "i". As can be seen, a first column 1410, which is an index column, describes starting indices associated to the respective lines 1412a to 1412p. A first value column 1420 shows the values of entries of the table "arith_cf_ng_hash[]" for entry indices identical to the start index shown in the index column 1410. A second value column 1422 shows entries of the table "arith_cf_ng_hash[]" for entry indices which are larger, by one, than the start indices shown in column 1410 of the respective line. A third value column 1424 describes entries of the table "arith_cf_ng_hash[]", for which the element index is larger, by two, than the starting index shown in column 1410 of the corresponding line. Similarly, columns 1426, 1428, 1430, 1432, 1434 show entries of the table "arith_cf_ng_hash[]", for which the element index is larger, by 3 (column 1426), by 4 (column 1428), by 5 (column 1430), by 6 (column 1432) or by 7 (column 1434) than the starting index shown in column 1410 of the respective line 1412a to 1412p.
For example, the line 1412a shows, in the columns 1420 to 1434, entries of the table "arith_cf_ng_hash" having element indices 0, 1, 2, 3, 4, 5, 6 and 7. Similarly, the line 1412b shows, in the columns 1420 to 1434, a sequence of entries of the table "arith_cf_ng_hash" having element indices of 8, 9, 10, 11, 12, 13, 14 and 15. For the other lines 1412c to 1412p, the arrangement of the entries discussed above applies analogously.
5.2. Table "arith_cf_ne"
Figs. 15(1) to 15(10) show table presentations of entries of the table "arith_cf_ne", which is evaluated by the function "arith_decode()" when decoding the element index ne.
As can be seen, the entries of the table "arith_cf_ne[]" comprise indices between 0 and 2699. A starting index column 1510 describes starting indices associated to the respective lines. A first line is designated, for example, with 1512a, and a second line is designated, for example, with 1512b.
A first entry column 1520 shows entries of the table "arith_cf_ne[]", the entry index of which is determined by the value given in the starting index column 1510 of the corresponding lines (e.g. lines 1512a, 1512b). Entry columns 1522, 1524, 1526, 1528, 1530, 1532, 1534 show entries of the table "arith_cf_ne[]" having element indices which are larger, by 1 (column 1522), by 2 (column 1524), by 3 (column 1526), by 4 (column 1528), by 5 (column 1530), by 6 (column 1532) or by 7 (column 1534) than the starting index shown in the starting index column 1510 of the respective line (e.g. line 1512a or 1512b). Accordingly, the first line 1512a shows, in columns 1520 to 1534, elements of the table "arith_cf_ne[]" having entry indices between 0 and 7. The second line 1512b shows, in columns 1520 to 1534, entries of the table "arith_cf_ne[]" having entry indices between 8 and 15. The above discussed rules also apply to the additional lines of the table "arith_cf_ne[]".
5.3. Table "arith_cf_ng[pki][545]"
Figs. 16(1) to 16(32) show a set of 32 cumulative-frequencies-tables "arith_cf_ng[pki][545]", one of which is selected by an audio encoder 100 or audio decoder 200, for example, for the execution of the function "arith_decode()", i.e. for the decoding of the group index ng. The selected one of the 32 cumulative-frequencies-tables shown in Figs. 16(1) to 16(32) takes the function of the table "cum_freq[]" in the execution of the function "arith_decode()".
As can be seen in Figs. 16(1) to 16(32), different values of the variable pki are associated to the different tables. A cumulative-frequencies-table index pki=0 is associated to a table 1601, a cumulative-frequencies-table index pki=1 is associated to a table 1602, and cumulative-frequencies-table indices pki=2, pki=3, ... pki=31 are associated with tables 1603 to 1632.
The structure of the tables 1601 to 1632 associated with the table indices pki=0 to pki=31 is identical, such that only the structure of the representation of the table 1601 associated with the table index pki=0 will be discussed in detail. The table 1601 comprises an index column 1640, which shows starting indices associated with respective lines of the table 1601. A first line is designated with 1642a, and a second line is designated with 1642b.
A first entry column 1650 represents entries of the table 1601, the entry indices of which are identical to the starting indices shown in the index column 1640 of the respective lines. Similarly, entry columns 1651 to 1665 show entries of the table 1601 (associated with the table index pki=0) having entry indices which are larger, by 1 (col. 1651), by 2 (col. 1652), by 3 (col. 1653), by 4 (col. 1654), by 5 (col. 1655), by 6 (col. 1656), by 7 (col. 1657), by 8 (col. 1658), by 9 (col. 1659), by 10 (col. 1660), by 11 (col. 1661), by 12 (col. 1662), by 13 (col. 1663), by 14 (col. 1664) or by 15 (col. 1665) than the starting indices shown in the index column 1640 of the respective lines. Accordingly, the first line 1642a shows the values of the entries of the table 1601 having element indices between 0 (value 16684) and 15 (value 6352). The second line 1642b shows values of entries of the table 1601 having element indices between 16 (value 6202) and 31 (value 3547).
Naturally, the above rules apply to the other lines as well. Also, the arrangement of the entries in the tables of Figs. 16(2) to 16(32) is identical to the arrangement of the entries of the table 1601.
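The line layout described above amounts to a simple index split. As a minimal sketch (the function and type names are illustrative and do not appear in the figures), an entry index can be located in the printed representation as follows:

```c
#include <assert.h>

/* Illustrative only: these names are not taken from the figures. */
enum { ENTRIES_PER_LINE = 16, TABLE_LENGTH = 545 };

typedef struct {
    int start_index;  /* value shown in the index column 1640 */
    int column;       /* 0 for column 1650, ..., 15 for column 1665 */
} table_position;

/* Locate entry `index` of one cumulative-frequencies-table in the
 * 16-entries-per-line layout of Figs. 16(1) to 16(32). */
static table_position locate_entry(int index)
{
    table_position pos;
    assert(index >= 0 && index < TABLE_LENGTH);
    pos.start_index = (index / ENTRIES_PER_LINE) * ENTRIES_PER_LINE;
    pos.column = index % ENTRIES_PER_LINE;
    return pos;
}
```

For example, the entry index 31 falls in the line with starting index 16 at column offset 15, which is the position of the value 3547 mentioned above.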
5.4. Table "dgroups[]"
Figs. 17(1) and 17(2) show a representation of the entries of a table "dgroups[]", which may be applied by the audio encoder 100 and the audio decoder 200. For example, the table "dgroups[]" may be applied in the execution of the algorithm "tuples_decode()", which is shown in Fig. 3. Also, the table "dgroups[]" may be applied by the algorithm shown in Fig. 5f in order to determine the number mm of elements in a group and the group offset og of a group designated by the group index ng.
The representation of the table "dgroups[]" comprises an index column 1710, which shows starting indices associated with respective lines, e.g. a first line 1712a and a second line 1712b of the table representation. A first entry column 1720 of the representation shows entries of the table "dgroups[]", the element indices of which are identical to the starting indices shown in the index column 1710 of the respective lines. Similarly, entry columns 1722, 1724, 1726, 1728, 1730, 1732, 1734 show entries of the table "dgroups[]", the element indices of which are larger by 1 (col. 1722), by 2 (col. 1724), by 3 (col. 1726), by 4 (col. 1728), by 5 (col. 1730), by 6 (col. 1732) or by 7 (col. 1734) than the starting indices shown in the index column 1710 of the respective lines. For example, the first line 1712a shows, in entry columns 1720 to 1734, entries of the table "dgroups[]" having element indices between 0 (col. 1720) and 7 (col. 1734). Similarly, the second line 1712b shows, in columns 1720 to 1734, entries of the table "dgroups[]" having element indices between 8 (col. 1720) and 15 (col. 1734). It should be noted that the entries of the table "dgroups[]" are shown in hexadecimal notation in Figs. 17(1) and 17(2), which is indicated by the prefix "0x". The most-significant hexadecimal digit is shown on the left side and the least-significant hexadecimal digit is shown on the right side.
5.5. Table "dgvectors[]"
Figs. 18(1) to 18(11) show a table representation of the entries of the table "dgvectors[]". The table "dgvectors[]" may, for example, be used in the audio encoder 100 or the audio decoder 200. For example, the table "dgvectors[]" may be used in the step 312d of the function "tuples_decode()" shown in Fig. 3 or in the execution of the algorithm of Fig. 5g. Accordingly, the table "dgvectors[]" may be used to map a group index and an element index onto values of a most-significant bit-plane of a tuple of spectral values.
The table representation of Figs. 18(1) to 18(11) comprises an index column 1810, which comprises starting indices associated with the lines (for example a first line 1812a or a second line 1812b) of the table representation. Entry columns 1820 to 1882 show entries of the table "dgvectors[]". For example, the entry column 1820 shows entries of the table "dgvectors[]", the entry indices of which are identical to the corresponding starting index shown in the index column 1810 of the respective lines. Subsequent entry columns 1822 to 1882 show, in ascending order, entries of the table "dgvectors[]", the entry indices of which are larger, by 1 (col. 1822), by 2 (col. 1824), by 3 (col. 1826), by 4, by 5, by 6, by 7, by 8, by 9, by 10, by 11, ... by 29 (col. 1878), by 30 (col. 1880) or by 31 (col. 1882), than the starting index values shown in the index column 1810 of the respective lines. Accordingly, the first line 1812a shows, in columns 1820 to 1882, entries of the table "dgvectors[]" having element indices between 0 and 31, wherein the element indices associated with the entries increase monotonically from the left to the right. Similarly, the second line 1812b shows, in columns 1820 to 1882, entries of the table "dgvectors[]" having element indices between 32 and 63 (with the element indices increasing from the left to the right).
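Taken together, the tables "dgroups[]" and "dgvectors[]" turn a pair (ng, ne) into the four values of the most-significant bit plane. The following is a minimal sketch of such a lookup. It assumes that the lower 8 bits of a "dgroups[]" entry hold the number mm of elements, that the remaining upper bits hold the group offset og, and that "dgvectors[]" stores four consecutive values per element. These assumptions are not stated in this section, and the toy table contents are hypothetical rather than taken from Figs. 17 and 18.

```c
#include <assert.h>

/* Hypothetical toy tables; the real contents are shown in Figs. 17 and 18. */
static const unsigned int toy_dgroups[2] = {
    0x0001u,  /* group 0: og = 0, mm = 1 */
    0x0102u   /* group 1: og = 1, mm = 2 */
};
static const signed char toy_dgvectors[3 * 4] = {
    0,  0, 0, 0,   /* element at offset 0 */
    1,  0, 0, 0,   /* element at offset 1 */
    0, -1, 0, 0    /* element at offset 2 */
};

/* Assumed packing: low 8 bits = number of elements, upper bits = offset. */
static unsigned int group_size(unsigned int ng)   { return toy_dgroups[ng] & 255u; }
static unsigned int group_offset(unsigned int ng) { return toy_dgroups[ng] >> 8; }

/* Copy the most-significant bit-plane values of element ne of group ng. */
static void msb_tuple(unsigned int ng, unsigned int ne, int out[4])
{
    unsigned int base, i;
    assert(ng < 2u && ne < group_size(ng));
    base = 4u * (group_offset(ng) + ne);
    for (i = 0; i < 4u; i++)
        out[i] = toy_dgvectors[base + i];
}
```

With these toy tables, the pair ng=1, ne=1 selects the third stored element, i.e. the tuple (0, -1, 0, 0).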
5.6. Table "egroups"
Figs. 19(1) to 19(32) show a table representation of a table "egroups[a][b][c][d]", which may also be considered as a 4-dimensional array having four element indices a, b, c, d. It should be noted that each of the element indices a, b, c, d may take values between 0 and 7. The table or array "egroups[a][b][c][d]" may be used in the audio encoder 100 or in the audio decoder 200. For example, the table or array "egroups[a][b][c][d]" may be used in the function "arith_get_context()" to derive the return value and in the function "arith_update_context()" to determine the entry "v" of the array q at entry index (1, 1+i).
The entries of the array "egroups" are shown in 64 tables 1901 to 1964. Different combinations of index values a and b are associated with each of the tables 1901 to 1964. For example, the combination a=0 and b=0 is associated with the first table 1901, and the combination a=0, b=1 is associated with the second table 1902.
It should be noted here that the structure of the different tables 1901 to 1964 is identical, such that only the structure of the first table 1901 will be discussed here. The table 1901 comprises an index column 1970 representing values of the third index c associated with the respective lines 1972a to 1972h. Similarly, an index line 1980 represents index values of the fourth index d associated with respective columns 1982a to 1982h of the table 1901. The indices a and b associated with an entry of the tables 1901 to 1964 are written next to the respective table. The index c associated with an entry of one of the tables 1901 to 1964 is determined by the value of the index column of the respective line, and the fourth index d of an entry is determined by the value of the index line of the respective column of the entry. For example, the line 1972a of the table 1901 represents entries of the array "egroups[a][b][c][d]" for the indices a=0, b=0, c=0 and d=(0 to 7) (from the left to the right). Also, the column 1982a represents entries of the array "egroups[a][b][c][d]" with indices a=0, b=0, c=(0 to 7) and d=0, from the top to the bottom. Also, it should be noted that the entries of the array "egroups" are represented in hexadecimal notation, which is indicated by the prefix "0x".
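The relation between the four indices and the printed sub-tables can be sketched as follows. The flat index is simply the row-major position of an element of an 8x8x8x8 C array. The reference-numeral helper assumes that the numbering 1901, 1902, ... continues with b varying fastest, which is consistent with the two examples given above but is otherwise an assumption. The function names are illustrative.

```c
#include <assert.h>

enum { EGROUPS_DIM = 8 };

/* Position of egroups[a][b][c][d] in row-major (C array) order. */
static int egroups_flat_index(int a, int b, int c, int d)
{
    assert(a >= 0 && a < EGROUPS_DIM && b >= 0 && b < EGROUPS_DIM);
    assert(c >= 0 && c < EGROUPS_DIM && d >= 0 && d < EGROUPS_DIM);
    return ((a * EGROUPS_DIM + b) * EGROUPS_DIM + c) * EGROUPS_DIM + d;
}

/* Reference numeral of the sub-table holding the pair (a, b), assuming
 * that the numbering continues with b varying fastest. */
static int egroups_table_numeral(int a, int b)
{
    assert(a >= 0 && a < EGROUPS_DIM && b >= 0 && b < EGROUPS_DIM);
    return 1901 + a * EGROUPS_DIM + b;
}
```

Within one sub-table, c selects the line and d selects the column, so the 64 entries of a sub-table occupy 64 consecutive flat indices.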
6. Performance Evaluation and Advantages
The embodiments according to the invention use an updated set of tables, as discussed above, which reduces significantly the memory requirements for the spectral noiseless coding when compared to a previously-used set of tables. A lossless transcoding is possible within the bitrate constraints.
In the following, the modification of a previously-used noiseless coding underlying the inventive concept will be discussed. Audio coding concepts like, for example, the so-called unified-speech-and-audio-coding (USAC) use a context-adapted arithmetic coder (and decoder) for noiselessly (or losslessly) coding the quantized spectral coefficients (for example the spectral coefficients 252). The context adaptation associated with an arithmetic coder (encoder or decoder) makes it possible to achieve a high noiseless coding performance. The main drawback of this technology is its relatively high complexity in terms of memory requirements. Indeed, the context adaptation requires a large set of tables modeling different probability distributions. The read-only-memory (ROM) consumption of the previously-used unified-speech-and-audio-coding (USAC) is evaluated to be about 150 kWords, wherein the entropy coder represents around 73 percent of the total requirement.
One of the aims of the present contribution is to propose a new set of tables for the arithmetic coder (or decoder) requiring significantly less memory space while maintaining the original performance of the noiseless coder (encoder or decoder).
In the following, the current status of the previously-implemented unified-speech-and-audio-coding will be described. Fig. 7 shows a table listing the detailed memory requirements of a previously-implemented USAC noiseless coder (encoder or decoder).
It can easily be seen from the table of Fig. 7 that the most demanding tables are, by far, the tables "arith_cf_ng_hash[]" and "arith_cf_ng[][]" related to the context adaptation of the group symbol coding. It is worth noting that the tables "arith_cf_ne[]" and "arith_cf_r[]", which gather the cumulative frequencies of the element index symbols and of the remaining bit plane symbols, can easily be recovered algebraically and do not necessarily need to be stored.
In the following, some details regarding the proposed new set of reduced tables will be described. In accordance with the present invention, it is proposed to replace the previously-used tables "arith_cf_ng_hash[]" and "arith_cf_ng[][]" by the new set of tables given in Figs. 14 and 16(1) to 16(32). The new tables "arith_cf_ng_hash[]" and "arith_cf_ng[pki][545]" exhibit reduced sizes compared to the previously used tables (for example, tables used in the USAC reference model 0) and are subsequently called "reduced tables". The memory requirements of the noiseless coder (encoder or decoder) using the reduced tables are listed in detail in the table of Fig. 8.
The new set of tables shows a size reduction factor of about 7 compared to the original ones. The reduction was achieved by reducing the number of probability distribution models and by optimizing the selected models. Further, the mapping between the context state and the newly-defined models was optimized.
In the following, a performance evaluation in terms of bitrates will be given. In particular, a lossless transcoding between a bitstream, which was generated using the previously applied "large" tables, and a transcoded bitstream, which is provided in accordance with the proposed "smaller" tables, will be discussed. The new set of tables was proven to be able to transcode transparently a bitstream generated with the previously-used tables, which are also designated as "reference model 0 tables" or "RM0 tables". The transcoding was performed using the transcoding scheme described in Fig. 9.
Fig. 9 shows a block schematic diagram of a lossless transcoding from the "reference model 0 tables" to the "reduced tables". As can be seen from Fig. 9, the evaluation setup comprises a USAC RM0 encoder 910, which receives, as an input information, spectral values 908, and which provides an arithmetically encoded representation 912 of the spectral values using the "old" tables of the reference model 0. Accordingly, a so-called "RM0" bitstream is obtained, which comprises the arithmetically-encoded spectral values 912. The evaluation setup 900 also comprises lossless transcoding 920, in which the arithmetically-encoded spectral values 912 of the RM0 bitstream are decoded by an entropy decoder using the "old" tables (reference model 0 tables) to obtain decoded spectral values 924. The lossless transcoding also comprises an entropy encoder, which is configured to encode the decoded spectral values 924 using the "new" reduced tables in order to obtain a reduced-tables bitstream 928. Subsequently, the RM0 bitstream, which comprises the representation of the spectral values encoded using the "old" tables, is compared with the reduced-tables bitstream 928, which comprises the representation of the spectral values encoded using the "new" tables.
For verification purposes, the so-called RM0 bitstream is decoded using a USAC reference decoder and using the "old" tables to obtain a so-called RM0 synthesis result 942. Also, the so-called reduced-tables bitstream 928 is decoded using a USAC reference decoder and using the "new" tables. Accordingly, a reduced-tables synthesis result 952 is obtained. The RM0 synthesis result 942 is subsequently compared with the reduced-tables synthesis result 952 to verify the correctness of the implementation.
In the following, some results of the analysis will be described. The tables of Figs. 10 and 11 display minimum, maximum and average bitrates over all sub-segments of the entire, concatenated, encoded item with the RM0 tables and the reduced tables, respectively. For each operating mode, an individual sub-segment length was determined by a combination of succeeding access units whose length is closest to 100 ms. Sub-segment lengths and corresponding bitrates are specified in the two tables. As mentioned earlier, the lossless transcoding from the RM0 tables bitstream to the reduced tables bitstream is achieved for each operating mode, i.e. the bit reservoir conditions were not violated while obtaining a bit-exact synthesis. The table of Fig. 12 compares the bitrates generated only by the core coder when using the RM0 tables and the reduced tables. The reduced tables perform on average slightly better than the RM0 tables for each operating mode except at 64 kbit per second (kbps) stereo, where the average increase of bits is only 0.02% of 64 kbps, which corresponds to approximately 0.5 bits/frame. Nevertheless, the bitstream generated with the reduced tables still meets the bitrate requirement at this operating mode.
The table of Fig. 13 shows, for each of the operating modes, the worst and best cases of the difference in gathered bits after and before transcoding the RM0 tables bitstream to the reduced tables bitstream. The difference is evaluated on a sub-segment basis. It can be observed in all the cases that the performance of the reduced tables is very consistent and very stable. For a sub-segment, the maximum increase of bits caused by replacing the RM0 tables with the reduced tables is below 6 % of the total bitrate. On the other hand, the decrease of bits can reach more than 6 %.
To summarize the above, new tables for the USAC spectral noiseless coding have been proposed, the sizes of which are significantly reduced while maintaining the high coding performance of the spectral noiseless coding module. The achieved size reduction amounts to a factor of about 7, or more than 90 kWords. The proposed new set of tables significantly reduces the memory requirements and therefore lowers the implementation complexity. Bit exactness is maintained for each operating mode when comparing the synthesized output waveforms.
The above-mentioned advantages are reached by improving a specific part of the arithmetic coding. In some embodiments, a new hash table, arith_cf_ng_hash[128], a new set of 32 probability models, arith_cf_ng[32][545], and the corresponding mapping function "arith_get_pk()" are used. The update of the arithmetic coder tables (when compared to previously-used arithmetic coder tables) changes the mapping of the state index s into the probability model index pki as well as the probability models themselves. It changes neither the computation of the state index s, nor the way the probability model is used afterwards for coding the current symbol (i.e. the group index ng of the present 4-tuple). The main advantage of the new tables is to reduce the memory requirements for storing the tables, which now have a size of about 15 kWords (i.e. 15*1024 words of 32 bits each) instead of about 110 kWords, while maintaining the coding efficiency.
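The memory figures quoted in this section are mutually consistent, as a rough check shows. The check uses the rounded values given in the text and treats the figure of about 110 kWords as the share of the entropy coder in the total ROM consumption, which is an interpretation rather than a statement of the source.

```latex
\begin{align*}
0.73 \times 150\ \text{kWords} &\approx 110\ \text{kWords} && \text{(entropy coder share of the ROM)}\\
110\ \text{kWords} - 15\ \text{kWords} &= 95\ \text{kWords} && \text{(saving, i.e. more than 90 kWords)}\\
\frac{110\ \text{kWords}}{15\ \text{kWords}} &\approx 7.3 && \text{(size reduction factor of about 7)}
\end{align*}
```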
One of the important aspects, which brings along the above-discussed improvements, is the inventive mapping between state-index and probability-model index. This mapping is done by calling the function arith_get_pk() with the state variable t as an input argument.
The return value is the probability model index pki, which is used by the arithmetic coder as the probability distribution (also called cumulative-frequencies) for coding the current symbol (or to select the appropriate cumulative-frequencies table out of a set of cumulative-frequencies-tables).
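The use of the return value pki can be illustrated by a minimal sketch in which the index simply selects one row of a two-dimensional array. The element type, the array name with the "toy_" prefix and the helper function are assumptions made for illustration; the actual table values are those of Figs. 16(1) to 16(32).

```c
#include <assert.h>

enum { NUM_MODELS = 32, MODEL_LENGTH = 545 };

/* Placeholder storage (zero-initialized); the element type is an assumption. */
static unsigned short toy_arith_cf_ng[NUM_MODELS][MODEL_LENGTH];

/* Return the table that plays the role of cum_freq[] for model index pki. */
static const unsigned short *select_cum_freq(int pki)
{
    assert(pki >= 0 && pki < NUM_MODELS);
    return toy_arith_cf_ng[pki];
}
```

Because only a pointer is returned, switching between probability models does not require copying any table data.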
The mapping is done in two steps, which are done in two different parts of the function.
In the first part 540d (of the function "arith_get_pk()"), the hash table "arith_cf_ng_hash" (also designated with ari_pk_hash[]) is used for checking whether the current state t is a significant state. A significant state is a state which was selected during a training phase to have its "own" mapping into pki. The non-significant states are mapped later on into pki in the second part 540e of the function arith_get_pk() by using default mappings (also designated as range-based mapping). According to the invention, the number of significant states was reduced to 67, which is drastically lower than the previous 22955 significant states from the old table. By reducing the number of significant states, the size of the table "arith_cf_ng_hash[]" (also designated as ari_pk_hash[]) is proportionally reduced. It could be expected that the performance would be affected by this reduction. However, the performance was maintained by ensuring that the training was done correctly. The training was performed for both AAC and TCX in a switching coding structure. Such precautions were not taken when generating the old tables, and the retained 67 states are the most useful states for a non-switching audio coder and also for a switching audio coder.
The following code detects whether the present state t is a significant state. If so, the function terminates (because the return statement is reached) and returns the associated probability model index pki:
i=63*t;
for(;;)
{
  j=ari_pk_hash[i&127];
  if(j==0xFFFFFFFFul) break;
  if((j>>8)==t) return j&255;
  i++;
}
The coefficient 63 in the first line of the above code (see also reference numeral 540c) determines the starting point of the search in the hash table. It is fixed during the training phase by using an exhaustive global optimization. Inside the hash table, the state index t is coded on the 24 "last" bits of the entries (for example the 24 leftmost or most-significant bits of the entries). The "first" bits (for example the eight rightmost or least-significant bits) correspond to the associated probability model index. In a straightforward implementation, the table would contain only 67 entries. However, in order to minimize the number of accesses to the table, an escape mechanism is used. Indeed, 128-67=61 escape symbols of value 0xFFFFFFFFul are inserted, which reduce the number of accesses to the table when a state is insignificant. When such a symbol is met, the code jumps directly into the second part 540e of the function "arith_get_pk()" (because the "break" statement is reached in this case).
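The probing scheme can be tried out with a small self-contained sketch. The lookup follows the search described above, with the state in the upper 24 bits of an entry and the probability model index in the lower 8 bits. The insertion routine, the table name and the return code for a non-significant state are hypothetical, since the text only states that the real table was obtained by training.

```c
#include <assert.h>

#define HASH_SIZE       128u
#define HASH_ESCAPE     0xFFFFFFFFul
#define NOT_SIGNIFICANT (-1)

/* Toy hash table; the trained table itself is not reproduced here. */
static unsigned long toy_hash[HASH_SIZE];

/* Fill the table with escape symbols. */
static void hash_clear(void)
{
    unsigned int k;
    for (k = 0; k < HASH_SIZE; k++)
        toy_hash[k] = HASH_ESCAPE;
}

/* Hypothetical insertion: probe from slot 63*t, exactly like the lookup.
 * The caller must keep at least one escape slot free. */
static void hash_insert(unsigned long t, unsigned long pki)
{
    unsigned long i = 63ul * t;
    assert(t < (1ul << 24) && pki < 256ul);
    while (toy_hash[i & 127ul] != HASH_ESCAPE)
        i++;
    toy_hash[i & 127ul] = (t << 8) | pki;
}

/* First part of the mapping: returns pki for a significant state,
 * or NOT_SIGNIFICANT as soon as an escape symbol is met. */
static int lookup_significant(unsigned long t)
{
    unsigned long i = 63ul * t;
    for (;;) {
        unsigned long j = toy_hash[i & 127ul];
        if (j == HASH_ESCAPE)
            break;
        if ((j >> 8) == t)
            return (int)(j & 255ul);
        i++;
    }
    return NOT_SIGNIFICANT;
}
```

In this sketch, the states 5 and 133 both start their search at slot 59, so the second one inserted ends up in slot 60 and is found after one additional access. A state whose starting slot holds an escape symbol is rejected after a single access, which is the purpose of the 61 escape entries.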
The second part 540e of the mapping function is used for mapping the non-significant states. The states are divided into seven non-uniform sections according to their indices. Each section is associated with a probability model. The mapping is recorded in the lookup table psci[]. The bits beyond the 22nd bit of t indicate the accuracy of the level prediction. According to the prediction accuracy, a different mapping is used. In total, seven sections and four accuracies are considered, which correspond to a total of 28 different mappings stored in psci[]. The mapping is done as follows:
p=psci+7*(t>>22);
j=t&4194303;
if (j<436961)
{
  if (j<252001) return p[(j<243001)?0:1]; else return p[(j<288993)?2:3];
}
else
{
  if (j<1609865) return p[(j<880865)?4:5]; else return p[6];
}
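The range-based mapping can be exercised with the following self-contained sketch. The thresholds are those of the code above. The table contents are hypothetical: entry k of the toy table simply holds the value k, so that the selected position is directly visible in the result. The names with the "toy_" prefix and the function name are illustrative.

```c
#include <assert.h>

/* Hypothetical contents: 4 accuracies x 7 sections, entry k holds k. */
static const unsigned char toy_psci[28] = {
     0,  1,  2,  3,  4,  5,  6,
     7,  8,  9, 10, 11, 12, 13,
    14, 15, 16, 17, 18, 19, 20,
    21, 22, 23, 24, 25, 26, 27
};

/* Second part of the mapping: default model for a non-significant state t. */
static int map_default(unsigned long t)
{
    const unsigned char *p;
    unsigned long j;
    assert((t >> 22) < 4ul);        /* only four accuracies are stored */
    p = toy_psci + 7 * (t >> 22);   /* row selected by the prediction accuracy */
    j = t & 4194303ul;              /* lower 22 bits, 4194303 = 2^22 - 1 */
    if (j < 436961ul) {
        if (j < 252001ul)
            return p[(j < 243001ul) ? 0 : 1];
        return p[(j < 288993ul) ? 2 : 3];
    }
    if (j < 1609865ul)
        return p[(j < 880865ul) ? 4 : 5];
    return p[6];
}
```

The nested comparisons form a small binary decision tree, so that at most three comparisons are needed to select one of the seven sections.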
Finally, the number of probability models was reduced from 128 to 32. It was found that many of the previously used models were only seldom selected and not really useful. An appropriate training makes it possible to select only 32 representative models.
To summarize the above, the embodiments according to the invention are related to the above-discussed mapping tables, which are implemented in an audio encoder and an audio decoder. The advantage of the new tables partly comes from the adequate training, which was performed. Further, the invention is based on the considerations about the best dimension of the tables.
7. Bitstream Syntax
In the following, the bitstream syntax of a bitstream carrying the arithmetically-encoded spectral information will be described taking reference to Figs. 6a to 6h.
Fig. 6a shows a syntax representation of a so-called USAC raw data block ("usac_raw_data_block()").
The USAC raw data block comprises one or more single channel elements ("single_channel_element()") and/or one or more channel pair elements ("channel_pair_element()").
Taking reference now to Fig. 6b, the syntax of a single channel element is described. The single channel element comprises a linear-prediction-domain channel stream ("lpd_channel_stream()") or a frequency-domain channel stream ("fd_channel_stream()") in dependence on the core mode.
Fig. 6c shows a syntax representation of a channel pair element. A channel pair element comprises core mode information ("core_mode0", "core_mode1"). In addition, the channel pair element may comprise a configuration information "ics_info()". Additionally, depending on the core mode information, the channel pair element comprises a linear-prediction-domain channel stream or a frequency-domain channel stream associated with a first of the channels, and the channel pair element also comprises a linear-prediction-domain channel stream or a frequency-domain channel stream associated with a second of the channels. The configuration information "ics_info()", a syntax representation of which is shown in Fig. 6d, comprises a plurality of different configuration information items, which are not of particular relevance for the present invention.
A frequency-domain channel stream ("fd_channel_stream()"), a syntax representation of which is shown in Fig. 6e, comprises a gain information ("global_gain") and a configuration information ("ics_info()"). In addition, the frequency-domain channel stream comprises scale factor data ("scale_factor_data()"), which describes scale factors used for the scaling of spectral values of different scale factor bands, and which is applied, for example, by the scaler 150 and the rescaler 240. The frequency-domain channel stream also comprises arithmetically-coded spectral data ("ac_spectral_data()"), which represents arithmetically-encoded spectral values.
The arithmetically-coded spectral data ("ac_spectral_data()"), a syntax representation of which is shown in Fig. 6f, comprises an optional arithmetic reset flag ("arith_reset_flag"), which is used for selectively resetting the context, as described above. In addition, the arithmetically-coded spectral data comprise a plurality of arithmetic-data blocks ("arith_data"), which carry the arithmetically-coded spectral values. The structure of the arithmetically-coded data blocks depends on the number of frequency bands (represented by the variable "num_bands") and also on the state of the arithmetic reset flag, as will be discussed in the following.
The structure of the arithmetically-encoded data block will be described taking reference to Fig. 6g, which shows a syntax representation of said arithmetically-coded data blocks. The data representation within the arithmetically-coded data block depends on the number lg of spectral values to be encoded, the status of the arithmetic reset flag and also on the context, i.e. the previously-encoded spectral values.
The context for the encoding of the current set of spectral values is determined in accordance with the context determination algorithm shown at reference numeral 660. The arithmetically-encoded data block comprises lg/4 sets of codewords, each set of codewords representing a tuple of spectral values. A set of codewords comprises an arithmetic codeword "acod_ng[pki][ng]" representing a group index ng of the tuple of spectral values using between 1 and 20 bits. A set of codewords also comprises an arithmetic codeword "acod_ne[ne]" representing an element index ne of a tuple of spectral values if the group comprising the tuple of spectral values includes more than one element. In addition, the set of codewords comprises one or more codewords "acod_r[][][][]" if the tuple of spectral values requires more bit planes than the most-significant bit plane for a correct representation. The codeword "acod_ne[ne]" represents the element index using between 1 and 20 bits and the codeword "acod_r[][][][]" represents a less-significant bit plane using between 1 and 20 bits.
If, however, one or more less-significant bit planes are required (in addition to the most-significant bit plane) for a proper representation of the tuple of spectral values, this is signaled by using one or more arithmetic escape codewords ("ARITH_ESCAPE"). Thus, it can be generally said that for a tuple of spectral values, it is determined how many bit planes (the most-significant bit plane and, possibly, one or more additional less-significant bit planes) are required. If one or more less-significant bit planes are required, this is signaled by one or more arithmetic escape codewords "acod_ng[pki][ARITH_ESCAPE]", which are encoded in accordance with a currently-selected cumulative-frequencies-table, a cumulative-frequencies-table index of which is given by the variable pki. In addition, the context is adapted, as can be seen at reference numerals 664, 662, if one or more arithmetic escape codewords are included in the bitstream. Following the one or more arithmetic escape codewords, an arithmetic codeword "acod_ng[pki][ng]" is included in the bitstream, as shown at reference numeral 663, wherein pki designates the currently-valid probability model index (taking into consideration the context adaptation caused by the inclusion of the arithmetic escape codewords), and wherein ng designates the group index associated with the most-significant bit plane of the tuple of spectral values to be encoded. The group index can be derived, in an encoder, by evaluating the table "dgvectors", which allows for deriving the group index ng and an element index ne associated with a tuple of spectral values when taken in combination with the table "dgroups".
If the group which includes the tuple of spectral values to be encoded comprises more than one element, the arithmetically-encoded data block comprises the codeword "acod_ne[ne]" encoding the element index ne using an appropriately-selected cumulative-frequencies-table.
As discussed above, the presence of any less-significant bit planes results in the presence of one or more codewords "acod_r[][][][]", each of which represents a tuple of 4 bits of a less-significant bit plane. The one or more codewords "acod_r[][][][]" are encoded in accordance with a corresponding cumulative-frequencies-table, which is constant and context-independent.
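How a decoded tuple of 4 bits may refine the values obtained from the most-significant bit plane can be sketched as follows. The sketch assumes that bit i of the decoded 4-bit symbol r belongs to the i-th spectral value of the tuple and that the bit planes are appended from more significant to less significant. This bit ordering and the function name are assumptions and are not stated in this section.

```c
#include <assert.h>

/* Append one less-significant bit plane to a tuple of four values.
 * r is the decoded 4-bit symbol; bit i of r is assumed to belong to
 * tuple[i] (hypothetical ordering). The multiplication by 2 is used
 * instead of a left shift because the values may be negative. */
static void append_bit_plane(int tuple[4], unsigned int r)
{
    int i;
    assert(r < 16u);
    for (i = 0; i < 4; i++)
        tuple[i] = tuple[i] * 2 + (int)((r >> i) & 1u);
}
```

Calling the function once per decoded codeword extends each value of the tuple by one bit, so that a tuple signaled with two escape codewords would be refined by two such calls under the stated assumptions.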
In addition, it should be noted that the context is updated after the encoding of each tuple of spectral values, as shown at reference numeral 668, such that the context is typically different for encoding of two subsequent tuples of spectral values. Fig. 6h shows a legend of definitions and help elements defining the syntax of the arithmetically-encoded data block.
To summarize the above, a bitstream format has been described, which may be provided by the audio encoder 100, and which may be evaluated by the audio decoder 200. The bitstream of the arithmetically-encoded spectral values is encoded such that it fits the decoding algorithm discussed above.
In addition, it should be generally noted that the encoding is the inverse operation of the decoding, such that it can generally be assumed that the encoder performs a table lookup using the above-discussed tables, which is approximately inverse to the table lookup performed by the decoder. Generally, it can be said that a man skilled in the art who knows the decoding algorithm and/or the desired bitstream syntax will easily be able to design an arithmetic encoder, which provides the data defined in the bitstream syntax and required by the arithmetic decoder.
8. Decoding Method
In the following, a method for providing a decoded audio information on the basis of an encoded audio information will be described taking reference to Fig. 20. The method 2000 comprises a first step 2010 of providing a plurality of decoded spectral values on the basis of an arithmetically-encoded representation of the spectral values. The method 2000 further comprises a second step 2020 of providing a time-domain audio representation using the decoded spectral values. The step 2010 comprises a sub-step 2012 of selecting a cumulative-frequencies-table out of a set of 32 cumulative-frequencies-tables in dependence on a state index. The step 2010 also comprises a sub-step 2014 of applying the selected cumulative-frequencies-table to derive a group index from a variable-length codeword representing the group index. The step 2010 also comprises a sub-step 2016 of deriving values of a most-significant bit plane of a tuple of spectral values using the group index and the element index, the element index designating an element within a group selected by the group index. The step 2010 also comprises a sub-step 2018 of providing a tuple of decoded spectral values using the values of the most-significant bit plane of the tuple of spectral values.
9. Encoding Method
In the following, a method for providing an encoded audio information on the basis of an input audio information will be discussed taking reference to Fig. 21. The method 2100 of Fig. 21 comprises a first step of providing a frequency-domain audio representation on the basis of a time-domain audio representation of the input audio information, such that the frequency-domain audio representation comprises a set of spectral values, and such that an energy is compacted in a sub-set of the spectral values. The method 2100 also comprises a second step 2120 of encoding a tuple of adjacent spectral values of the set of spectral values, or of encoding a tuple of adjacent spectral values of a pre-processed version of the set of spectral values. The step 2120 comprises a sub-step 2122 of mapping values of a most-significant bit plane of a tuple of spectral values onto a group index and an element index, the element index designating an element within a group selected by the group index. The step 2120 also comprises a sub-step 2124 of selecting a cumulative-frequencies-table out of a set of 32 cumulative-frequencies-tables in dependence on a state index describing a state of the arithmetic encoder. The step 2120 also comprises a sub-step 2126 of arithmetically encoding the group index using the selected cumulative-frequencies-table (selected in the sub-step 2124) in order to obtain an arithmetically-encoded variable-length codeword.
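The mapping of the sub-step 2122 is the inverse of the table lookup performed in a decoder. A minimal sketch of such an inverse mapping by exhaustive search is given below. The toy tables are hypothetical, and the assumed layout (lower 8 bits of a group entry for the number mm of elements, upper bits for the offset og, four values per element) is not stated in this section. The function name is illustrative.

```c
#include <assert.h>

/* Hypothetical toy tables using the assumed layout described above. */
enum { TOY_NUM_GROUPS = 2 };
static const unsigned int toy_groups[TOY_NUM_GROUPS] = { 0x0001u, 0x0102u };
static const signed char toy_vectors[3 * 4] = {
    0,  0, 0, 0,
    1,  0, 0, 0,
    0, -1, 0, 0
};

/* Search for the tuple msb; on success store (ng, ne) and return 1. */
static int find_group_and_element(const int msb[4],
                                  unsigned int *ng, unsigned int *ne)
{
    unsigned int g, e, i;
    assert(msb != 0 && ng != 0 && ne != 0);
    for (g = 0; g < TOY_NUM_GROUPS; g++) {
        unsigned int mm = toy_groups[g] & 255u;  /* elements in group g */
        unsigned int og = toy_groups[g] >> 8;    /* offset of group g */
        for (e = 0; e < mm; e++) {
            for (i = 0; i < 4u; i++)
                if (toy_vectors[4u * (og + e) + i] != msb[i])
                    break;
            if (i == 4u) {
                *ng = g;
                *ne = e;
                return 1;
            }
        }
    }
    return 0;  /* tuple is not contained in the toy tables */
}
```

The exhaustive search is chosen here only for brevity. An encoder could equally use a precomputed direct lookup from the tuple values to the pair (ng, ne).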
It should be noted here that the methods 2000 and 2100 of Figs. 20 and 21 can be supplemented by any of the features and functionalities described herein with respect to the inventive coding scheme. In addition, the methods 2000 and 2100 can be supplemented by any of the features and functionalities of the apparatus described herein. Also, the coding tables described herein are preferably used in connection with the encoding method and the decoding method.
10. Implementation Alternatives
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus. The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-ray disc, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet. A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The above described embodiments are merely illustrative for the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
While the foregoing has been particularly shown and described with reference to particular embodiments above, it will be understood by those skilled in the art that various other changes in the forms and details may be made without departing from the spirit and scope thereof. It is to be understood that various changes may be made in adapting to different embodiments without departing from the broader concept disclosed herein and comprehended by the claims that follow.

Claims

1. An audio decoder (200) for providing a decoded audio information (212) on the basis of an encoded audio information (210), the audio decoder comprising:
an arithmetic decoder (230) for providing a plurality of decoded spectral values (232; a,b,c,d) on the basis of an arithmetically-encoded representation (222; acod_ng[pki][ng], acod_ne[ne], acod_r[] [][][]) of the spectral values; and
a frequency-domain to time-domain converter (260) for providing a time-domain audio representation (262, 212) using the decoded spectral values (232; a,b,c,d), in order to obtain the decoded audio information (262, 212);
wherein the arithmetic decoder (230) is configured to derive a group index (ng) from a variable-length codeword (acod_ng[pki][ng]) representing the group index (ng) in dependence on a state index (t);
wherein the arithmetic decoder (230) is configured to derive values (a, b, c, d) of a most-significant bit-plane of a tuple of spectral values using the group index (ng) and an element index (ne), the element index (ne) describing an element within a group selected by the group index;
wherein the arithmetic decoder (230) is configured to provide a tuple of decoded spectral values using the values of the most-significant bit-plane of the tuple of spectral values; and
wherein the arithmetic decoder is configured to select a cumulative-frequencies-table (arith_cf_ng[pki][545]) out of a set of 32 cumulative-frequencies-tables (arith_cf_ng[0][545] to arith_cf_ng[31][545]) in dependence on the state index (t), and to apply the selected cumulative-frequencies-table to derive the group index (ng) from the variable-length codeword (acod_ng[pki][ng]) representing the group index (ng).
2. The audio decoder (200) according to claim 1,
wherein the arithmetic decoder is configured to derive a 7-bit hash-table index value (i & 127) from the state index (t), and to obtain a hash-table value (j) from a hash-table (ari_pk_hash[]; arith_cf_ng_hash[]), which hash-table comprises a mapping of 128 hash-table index values onto corresponding hash-table entry values, and
wherein the arithmetic decoder is configured to decide whether the hash-table value (j) associated to the hash-table index value derived from the state index (t) is an escape value (0xFFFFFFFF), a valid cumulative-frequencies-table identifier value associated to the state index, or an invalid cumulative-frequencies-table identifier value in conflict with the state index, and to derive the cumulative-frequencies-table index value (pki) from the hash-table value, if the hash-table value associated to the hash-table index value derived from the state index (t) is a valid cumulative-frequencies-table identifier value associated to the state index (t), and to provide a cumulative-frequencies-table index value (pki) in dependence on an identification of an interval of values in which the state index (t) is contained, if the hash-table value (j) associated to the hash-table index value derived from the state index (t) is the escape value; and
wherein the arithmetic decoder is configured to scan through the entries of the hash-table (ari_pk_hash[]; arith_cf_ng_hash[]), starting from the entry designated by the hash-table index value (i & 127) derived from the state index (t), until the hash-table value (j) associated to the hash-table index value derived from the state index (t) is either an escape value (0xFFFFFFFF) or a valid cumulative-frequencies-table identifier value associated to the state index, and
wherein the arithmetic decoder is configured to provide a cumulative-frequencies-table index value (pki) in dependence on an identification of an interval of values in which the state index (t) is contained, if the hash-table value reached when scanning through the entries of the hash-table is the escape value (0xFFFFFFFF), and to derive the cumulative-frequencies-table index value (pki) from the hash-table value reached when scanning through the entries of the hash-table, if the hash-table value reached when scanning through the entries of the hash-table is a valid cumulative-frequencies-table identifier value associated to the state index (t).
3. The audio decoder (200) according to claim 2, wherein the hash-table (ari_pk_hash[]; arith_cf_ng_hash[]) is configured to map 67 values of the 7-bit hash-table index value (i & 127) onto valid cumulative-frequencies-table identifier values, and to map 61 values of the 7-bit hash-table index value (i & 127) onto the escape value (0xFFFFFFFF).
4. The audio decoder (200) according to claim 3, wherein the arithmetic decoder (230) is configured to map 67 different values of the state index (t) onto 26 different cumulative-frequencies-table index values (pki), such that 26 different cumulative-frequencies-table index values are associated to 67 different significant states described by the state index (t).
5. The audio decoder (200) according to one of claims 2-4, wherein the arithmetic decoder (230) is configured to map a plurality of non-significant states onto 9 different cumulative-frequencies-table index-values.
6. The audio decoder(200) according to one of claims 1-5,
wherein the arithmetic decoder (230) is configured to set
i = 63 n
and to iteratively perform the algorithm:
j = ari_pk_hash[i & 127]; if (j == 0xFFFFFFFFul) break; if ((j >> 8) == t) return j & 255; i++;
until the first condition j == 0xFFFFFFFF
or the second condition (j >> 8) == t
is fulfilled,
wherein t designates the state index,
wherein i and j designate integer variables,
wherein ari_pk_hash[i & 127] designates an entry of index (i & 127) of a hash-table, wherein "&" designates a bit-wise logical AND operator,
wherein ">> 8" designates a binary shift-to-the-right operation shifting by 8 bits,
wherein "==" designates a check for an identity condition,
wherein "++" designates an increase-by-one operator,
wherein "return j & 255" designates an operation of returning a value described by the 8 least significant bits of variable j as a cumulative-frequencies-table-index value (pki); and
wherein the arithmetic decoder is configured to perform the algorithm
p = psci + 7*(t >> 22); j = t & 4194303;
if (j < 436961) { if (j < 252001) return p[(j < 243001) ? 0 : 1]; else return p[(j < 288993) ? 2 : 3]; }
else { if (j < 1609865) return p[(j < 880865) ? 4 : 5]; else return p[6]; }
in response to a break condition, wherein
Figure imgf000063_0001
24,5,25,26,27,28,29,30,5,30,30,30,30,31,5,5,
O^ O/ O^ O / O^ O^ O^ Of O/ Of O^ 0
}; defines an array of different cumulative frequencies-table-index-values having array indices of 0 to 27, and
wherein the operation "p = psci + 7*(t >> 22)" sets the pointer p to an element having an element index which is determined by 7 times a value represented by the most significant bits of the state index t,
wherein an operation of the form "return p[condition?x:y]" returns an entry of the array psci having an element index which is determined by a sum of 7 times a value represented by the most significant bits of the state index t and the value x, if the condition is fulfilled, and returns an entry of the array psci having an element index which is determined by a sum of 7 times a value represented by the most significant bits of the state index t and the value y, if the condition is not fulfilled, and
wherein the hash-table ari_pk_hash is defined as given in Fig. 14.
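The two-stage lookup recited above (hash-table scan first, interval search on an escape) can be sketched in C as follows. This is only an illustrative sketch under stated assumptions: the names lookup_pki and interval_slot, the start parameter (the initial probe position), the 128-probe limit and the row check are hypothetical additions, and the contents of the hash-table and of the 28-entry fallback array are supplied by the caller rather than taken from Fig. 14 or from the array psci.

```c
#include <stdint.h>

#define ESCAPE 0xFFFFFFFFu

/* Interval search over the low 22 bits of the state index t, using the
   thresholds recited in the claim. Returns a position 0..6 inside a
   7-entry row of the fallback array. */
static int interval_slot(uint32_t t)
{
    uint32_t j = t & 4194303u;               /* low 22 bits of t */
    if (j < 436961u) {
        if (j < 252001u) return (j < 243001u) ? 0 : 1;
        return (j < 288993u) ? 2 : 3;
    }
    if (j < 1609865u) return (j < 880865u) ? 4 : 5;
    return 6;
}

/* hash:     128 entries, each either ESCAPE or (state << 8) | pki.
   fallback: 28 entries (4 rows of 7); contents are hypothetical here.
   start:    initial probe position (hypothetical parameter).
   Returns the table index pki, or -1 if the row would be out of range. */
static int lookup_pki(const uint32_t hash[128], const uint8_t fallback[28],
                      uint32_t t, uint32_t start)
{
    uint32_t i = start;
    /* The probe limit is a safety net that is not part of the claim. */
    for (int probes = 0; probes < 128; probes++) {
        uint32_t j = hash[i & 127u];
        if (j == ESCAPE) break;               /* not a significant state */
        if ((j >> 8) == t) return (int)(j & 255u);
        i++;                                  /* conflict: try next entry */
    }
    uint32_t row = t >> 22;
    if (row >= 4u) return -1;                 /* safety check, not in the claim */
    return fallback[7u * row + (uint32_t)interval_slot(t)];
}
```

The scan resolves hash conflicts by moving to the next entry, so a state that is not stored in the table always ends at an escape entry and falls through to the interval search.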
7. The audio decoder (200) according to one of claims 1 to 6,
wherein the arithmetic decoder (230) is configured to obtain (550a) a codeword- value (value) in dependence on a plurality of bits (acod_ng[pki][ng]) of the encoded audio information (222),
to obtain (550b) a relative position value (cum) describing a relative position of the codeword value (value) within a range between a lower range boundary value (low) and a higher range boundary value (high),
to determine (550c) in which interval (cum_freq[symbol], cum_freq[symbol-1]) out of a plurality of intervals defined by entries of the selected cumulative-frequencies-table (cum_freq, arith_cf_ng[pki][545]) the relative position value (cum) is included,
to provide (550d) a symbol information (symbol) in response to a result of determining, in which interval out of the plurality of intervals defined by the entries of the selected cumulative-frequencies-table (cum_freq, arith_cf_ng[pki][545]) the relative position value (cum) is included, and to update (550e) one or both of the range boundary values (high, low) in dependence on one or more of the entries (cum_freq[symbol-1], cum_freq[symbol]) of the selected cumulative-frequencies-table (cum_freq, arith_cf_ng[pki][545]) associated with the symbol information (symbol), and
to rescale (550fa) the range between the range boundary values (low, high), and
to update (550fb) the codeword value (value) using one or more additional bits (arith_get_next_bit) of the encoded audio information.
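Steps 550b to 550e can be illustrated with a small C sketch. It assumes a cumulative-frequencies-table stored in descending order with a final entry of 0 and a 16-bit frequency scale, which is how the entries cum_freq[symbol-1] and cum_freq[symbol] are used above. The struct ArithRange and the function names are hypothetical, and a plain linear scan stands in for whatever search the decoder actually uses.

```c
#include <stdint.h>

typedef struct { uint32_t low, high; } ArithRange;

/* Step 550b: position of the codeword value inside [low, high],
   scaled to 16 bits. */
static uint32_t relative_position(uint32_t value, const ArithRange *r)
{
    uint64_t range = (uint64_t)r->high - r->low + 1;
    uint64_t num = (((uint64_t)(value - r->low) + 1) << 16) - 1;
    return (uint32_t)(num / range);
}

/* Steps 550c/550d: the symbol is the number of leading table entries that
   are strictly greater than cum. The table is assumed to be descending
   and to end with 0, so the scan stops at the last entry at the latest. */
static int find_symbol(const uint16_t *cum_freq, int len, uint32_t cum)
{
    int symbol = 0;
    while (symbol < len - 1 && cum_freq[symbol] > cum) symbol++;
    return symbol;
}

/* Step 550e: narrow [low, high] to the sub-interval of the symbol. */
static void narrow_range(ArithRange *r, const uint16_t *cum_freq, int symbol)
{
    uint64_t range = (uint64_t)r->high - r->low + 1;
    uint32_t low = r->low;
    if (symbol > 0)
        r->high = low + (uint32_t)((range * cum_freq[symbol - 1]) >> 16) - 1;
    r->low = low + (uint32_t)((range * cum_freq[symbol]) >> 16);
}
```

With the hypothetical table {50000, 30000, 10000, 0}, a full 20-bit range and value = 500000, the relative position is 31250, which selects symbol 1 and narrows the range to [480000, 799999].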
8. The audio decoder (200) according to one of claims 1-7, wherein the arithmetic decoder is configured to obtain the group-index (ng) on the basis of the encoded audio information (222) using a selected one (arith_cf_ng[pki][545]) of the 32 cumulative frequencies-tables (arith_cf_ng[0][545] to arith_cf_ng[31][545]), and using the following algorithm arith_decode(),
arith_decode() {
  if (arith_first_symbol()) {
    value = 0;
    for (i=1; i<=20; i++) { value = (value<<1) | arith_get_next_bit(); }
    low = 0; high = 1048575;
  }
  range = high-low+1;
  cum = ((((int64) (value-low+1))<<16) - ((int64) 1)) / ((int64) range);
  p = cum_freq-1;
  do { q = p+(cfl>>1); if (*q > cum) { p = q; cfl++; } cfl>>=1; } while (cfl>1);
  symbol = p-cum_freq+1;
  if (symbol) high = low + ((((int64) range) * ((int64) cum_freq[symbol-1])) >> 16) - 1;
  low += (((int64) range) * ((int64) cum_freq[symbol])) >> 16;
  for (;;) {
    if (high<524286) { }
    else if (low>=524286) { value -= 524286; low -= 524286; high -= 524286; }
    else if (low>=262143 && high<786429) { value -= 262143; low -= 262143; high -= 262143; }
    else break;
    low += low; high += high+1; value = (value<<1) | arith_get_next_bit();
  }
  return symbol;
}
wherein the arithmetic decoder is configured to initialize a codeword value "value" in dependence on 20 bits (acod_ng[pki][ng]) of the encoded audio information, if a helper function "arith_first_symbol()" indicates that a first symbol of a sequence of symbols is decoded;
wherein a helper function "arith_get_next_bit()" provides a next bit of the encoded audio information,
wherein an operator "<<" designates a Boolean shift-to-the-left operation to shift to the left bits of an operand preceding the operator "<<" by a number of bits specified by an operand following the operator "<<",
wherein the operator "|" designates a Boolean or-operation,
wherein the operator "(int64)" designates that a number type to be used for a representation of a following operand is a 64-bit integer number type,
wherein "cum_freq" is an address value of a first entry of a selected cumulative-frequencies-table (arith_cf_ng[pki][545]),
wherein "*q" designates a (q-cum_freq+1)-th entry of the selected cumulative-frequencies-table (arith_cf_ng[pki][545]) having an entry index of q-cum_freq,
wherein cfl is a variable initialized to a length of the selected cumulative-frequencies-table (arith_cf_ng[pki][545]) and modified during a processing of the algorithm "arith_decode()",
wherein "cum_freq[symbol-1]" designates a symbol-th entry of the selected cumulative-frequencies-table (arith_cf_ng[pki][545]); and
wherein an operation "for(;;) {...}" indicates to repeat a block of instructions included in the brackets "{" and "}" until a "break" instruction is reached.
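The initialization of the codeword value from the first 20 bits can be sketched in C as below. The BitReader type, the function names and the MSB-first bit order within each byte are assumptions made for illustration; the claim only states that a helper function provides the next bit of the encoded audio information.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit source: reads bits MSB-first from a byte buffer and
   returns 0 once the buffer is exhausted. */
typedef struct {
    const uint8_t *data;
    size_t size;     /* buffer length in bytes */
    size_t bitpos;   /* index of the next bit to read */
} BitReader;

static int next_bit(BitReader *br)
{
    size_t byte = br->bitpos >> 3;
    int bit = 0;
    if (byte < br->size)
        bit = (br->data[byte] >> (7 - (int)(br->bitpos & 7))) & 1;
    br->bitpos++;
    return bit;
}

/* Mirrors the first-symbol branch: shift 20 bits into the codeword value. */
static uint32_t init_value(BitReader *br)
{
    uint32_t value = 0;
    for (int i = 1; i <= 20; i++)
        value = (value << 1) | (uint32_t)next_bit(br);
    return value;
}
```

After this call the range boundaries would be set to low = 0 and high = 1048575, so that the 20-bit value lies inside the initial range.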
9. The audio decoder according to claim 8, wherein the 32 cumulative-frequencies-tables (arith_cf_ng[pki=0][545] to arith_cf_ng[pki=31][545]) are associated to 32 cumulative-frequencies-table index values pki between 0 and 31, and wherein the cumulative-frequencies-tables are defined in accordance with the table representations of Fig.16(1) to Fig.16(32).
10. The audio decoder (200) according to one of the claims 1-9, wherein the arithmetic decoder (230) is configured to determine a number (mm) of group elements of a group designated by the group-index (ng),
to evaluate an encoded representation (acod_ne[ne]) of an element index, to obtain a decoded representation of the element index, if the group designated by the group index (ng) comprises more than one element,
to determine a look-up address base (4·(og + ne)) in dependence on the group index (ng) and, if the group designated by the group index (ng) comprises more than one element, the element index (ne), and
to determine look-up values of the most-significant bit plane in a look-up table (dgvectors) in dependence on the look-up address base.
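The look-up described in this claim can be sketched in C as follows. The packing of a group descriptor (offset og in the upper bits, element count mm in the lower 8 bits), the array names groups and vectors, and the table contents are all hypothetical; only the address base 4·(og + ne) and the rule that ne is used solely for groups with more than one element are taken from the claim.

```c
#include <stdint.h>

/* Hypothetical descriptor layout: element count in the low 8 bits,
   offset of the group's first element in the bits above. */
static int group_size(uint16_t desc)   { return desc & 255; }
static int group_offset(uint16_t desc) { return desc >> 8; }

/* Writes the four most-significant bit-plane values of the tuple
   selected by (ng, ne) into out[0..3]. */
static void msb_plane(const uint16_t *groups, const int8_t *vectors,
                      int ng, int ne, int out[4])
{
    int og = group_offset(groups[ng]);
    int mm = group_size(groups[ng]);
    if (mm <= 1) ne = 0;             /* single-element group: ne is not coded */
    int base = 4 * (og + ne);        /* look-up address base */
    for (int k = 0; k < 4; k++)
        out[k] = vectors[base + k];
}
```

Because every element occupies four consecutive entries of the vector table, the factor 4 turns an element number into an address.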
11. The audio decoder (200) according to one of claims 1 to 10, wherein the arithmetic decoder (230) is configured to determine the state index (t) for decoding a current tuple of spectral values using an algorithm "arith_get_context" defined as follows:
arith_get_context() {
  t0=q[0][i].v+1; t1=q[1][i-1].v+1; t2=q[0][i-1].v+1; t3=q[0][i+1].v+1;
  if ((t0<10) && (t1<10) && (t2<10) && (t3<10)) {
    if (t2>1) t2=2;
    if (t3>1) t3=2;
    return 3*(3*(3*(3*(3*(10*(10*t0+t1))+t2)+t3)));
  }
  if ((t0<34) && (t1<34) && (t2<34) && (t3<34)) {
    if ((t2>1) && (t2<10)) t2=2; else if (t2>=10) t2=3;
    if ((t3>1) && (t3<10)) t3=2; else if (t3>=10) t3=3;
    return 252000+4*(4*(34*(34*t0+t1))+t2)+t3;
  }
  if ((t0<90) && (t1<90)) return 880864+90*(90*t0+t1);
  if ((t0<544) && (t1<544)) return 1609864 + 544*t0+t1;
  if (t0>1) { a0=q[0][i].a; b0=q[0][i].b; c0=q[0][i].c; d0=q[0][i].d; }
  else a0=b0=c0=d0=0;
  if (t1>1) { a1=q[1][i-1].a; b1=q[1][i-1].b; c1=q[1][i-1].c; d1=q[1][i-1].d; }
  else a1=b1=c1=d1=0;
  l=0;
  do {
    a0>>=1; b0>>=1; c0>>=1; d0>>=1;
    a1>>=1; b1>>=1; c1>>=1; d1>>=1;
    l++;
  } while ((a0<-4) || (a0>=4) || (b0<-4) || (b0>=4) || (c0<-4) || (c0>=4) || (d0<-4) || (d0>=4) ||
           (a1<-4) || (a1>=4) || (b1<-4) || (b1>=4) || (c1<-4) || (c1>=4) || (d1<-4) || (d1>=4));
  if (t0>1) t0=1+(egroups[4+a0][4+b0][4+c0][4+d0] >> 16);
  if (t1>1) t1=1+(egroups[4+a1][4+b1][4+c1][4+d1] >> 16);
  return 1609864 + ((l<<24) | (544*t0+t1));
}
wherein q[0][i].v designates a context variable of a tuple of spectral values of a previous audio frame associated with the same frequencies as the current tuple of spectral values;
wherein q[1][i-1] designates a context variable of a tuple of spectral values of a current audio frame associated with lower frequencies than the current tuple of spectral values;
wherein q[0][i-1] designates a context variable of a tuple of spectral values of the previous audio frame associated with lower frequencies than the current tuple of spectral values;
wherein q[0][i+1] designates a context variable of a tuple of spectral values of the previous audio frame associated with higher frequencies than the current tuple of spectral values;
wherein "&&" designates a logical AND operation;
wherein q[0][i].a designates a first tuple value "a" of a tuple of spectral values of the previous audio frame associated with the same frequencies as the current tuple of spectral values;
wherein q[0][i].b designates a second tuple value "b" of a tuple of spectral values of the previous audio frame associated with the same frequencies as the current tuple of spectral values;
wherein q[0][i].c designates a third tuple value "c" of a tuple of spectral values of the previous audio frame associated with the same frequencies as the current tuple of spectral values;
wherein q[0][i].d designates a fourth tuple value "d" of a tuple of spectral values of the previous audio frame associated with the same frequencies as the current tuple of spectral values;
wherein q[1][i-1].a designates a first tuple value "a" of a tuple of spectral values of the current audio frame associated with lower frequencies than the current tuple of spectral values;
wherein q[1][i-1].b designates a second tuple value "b" of a tuple of spectral values of the current audio frame associated with lower frequencies than the current tuple of spectral values;
wherein q[1][i-1].c designates a third tuple value "c" of a tuple of spectral values of the current audio frame associated with lower frequencies than the current tuple of spectral values;
wherein q[1][i-1].d designates a fourth tuple value "d" of a tuple of spectral values of the current audio frame associated with lower frequencies than the current tuple of spectral values;
wherein a0, b0, c0, d0, a1, b1, c1, d1 are variables representing a signed number in a two's complement representation;
wherein ">>" designates a logical shift-to-the-right operator; and
wherein "egroups[4+a0][4+b0][4+c0][4+d0]" defines an entry having entry indices 4+a0, 4+b0, 4+c0, 4+d0 of a four-dimensional array "egroups",
wherein "egroups[4+a1][4+b1][4+c1][4+d1]" defines an entry having entry indices 4+a1, 4+b1, 4+c1, 4+d1 of the four-dimensional array "egroups",
wherein the array "egroups" is defined in accordance with the table representations of Fig.19(1) to Fig.19(32).
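The four early-return branches of "arith_get_context" pack the neighbouring context values into disjoint numeric ranges. The following C sketch reproduces only these branches; the function name state_from_neighbours and the return value -1 for the remaining case (which in the algorithm above requires the stored tuple values and the array "egroups") are hypothetical.

```c
#include <stdint.h>

/* t0..t3 are the context values already incremented by one, as in the
   first line of the algorithm. Returns the state index for the four
   closed-form branches, or -1 when the table-based branch would be
   needed. */
static int32_t state_from_neighbours(int32_t t0, int32_t t1,
                                     int32_t t2, int32_t t3)
{
    if (t0 < 10 && t1 < 10 && t2 < 10 && t3 < 10) {
        if (t2 > 1) t2 = 2;                  /* coarse 3-level quantization */
        if (t3 > 1) t3 = 2;
        return 3 * (3 * (3 * (3 * (3 * (10 * (10 * t0 + t1)) + t2) + t3)));
    }
    if (t0 < 34 && t1 < 34 && t2 < 34 && t3 < 34) {
        if (t2 > 1 && t2 < 10) t2 = 2; else if (t2 >= 10) t2 = 3;
        if (t3 > 1 && t3 < 10) t3 = 2; else if (t3 >= 10) t3 = 3;
        return 252000 + 4 * (4 * (34 * (34 * t0 + t1)) + t2) + t3;
    }
    if (t0 < 90 && t1 < 90)
        return 880864 + 90 * (90 * t0 + t1);
    if (t0 < 544 && t1 < 544)
        return 1609864 + 544 * t0 + t1;
    return -1;                               /* table-based branch not sketched */
}
```

The offsets are consistent with the branch sizes: 4·4·34·34·34 + 252000 = 880864 and 90·90·90 + 880864 = 1609864, so each branch starts where the value range of the previous one ends.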
12. The audio decoder (200) according to claim 11,
wherein the arithmetic decoder (230) is configured to update the context variables using the following algorithm:
arith_update_context() {
  q[1][i].a=a; q[1][i].b=b; q[1][i].c=c; q[1][i].d=d;
  if ((a<-4) || (a>=4) || (b<-4) || (b>=4) || (c<-4) || (c>=4) || (d<-4) || (d>=4)) { q[1][i].v = 1024; }
  else q[1][i].v = egroups[4+a][4+b][4+c][4+d];
  if (i==lg/4 && core_mode==1) {
    qs[0]=q[1][0];
    ratio = ((float) lg)/((float) 1024);
    for (j=0; j<256; j++) { k = (int) ((float) j*ratio); qs[1+k] = q[1][1+j]; }
    qs[previous_lg/4+1] = q[1][lg/4+1];
    previous_lg = 1024;
  }
  if (i==lg/4 && core_mode==0) {
    for (j=0; j<258; j++) { qs[j] = q[1][j]; }
    previous_lg = min(1024, lg);
  }
}
wherein, a, b, c, d designate values of a current, completely-decoded tuple of spectral values;
wherein lg/4 designates a number of 4-tuples associated to the current audio frame;
wherein "core_mode == 1" indicates a linear-prediction-domain core mode; and
wherein "core_mode == 0" indicates a frequency-domain core mode;
wherein a "(float)" operator indicates to use a floating point number representation; and
wherein a "(int)" operator indicates to use an integer number representation.
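Two small pieces of the context update can be isolated for illustration: the range test that decides between the fixed value 1024 and a table look-up, and the index scaling used when the context is stored in the linear-prediction-domain core mode. The function names below are hypothetical, and the sketch does not touch the arrays q, qs or egroups.

```c
/* Returns 1 if any tuple value lies outside [-4, 4), i.e. the case in
   which the context variable v is set to 1024 instead of being read
   from a table. */
static int exceeds_msb_range(int a, int b, int c, int d)
{
    return (a < -4) || (a >= 4) || (b < -4) || (b >= 4) ||
           (c < -4) || (c >= 4) || (d < -4) || (d >= 4);
}

/* Index mapping of the core_mode == 1 branch: k = (int)((float)j * ratio)
   with ratio = lg / 1024. */
static int rescaled_index(int j, int lg)
{
    float ratio = ((float)lg) / ((float)1024);
    return (int)((float)j * ratio);
}
```

For example, with lg = 512 the ratio is 0.5, so source indices 2 and 3 both map to k = 1.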
13. An audio encoder (100) for providing an encoded audio information (112) on the basis of an input audio information (110), the audio encoder comprising:
an energy-compacting time-domain to frequency-domain converter (130) for providing a frequency-domain audio representation (132) on the basis of a time-domain representation (110, 110a) of the input audio information, such that the frequency-domain audio representation (132) comprises a set of spectral values;
an arithmetic encoder (170) configured to encode a tuple (a,b,c,d) of adjacent spectral values (132), or a pre-processed version (152) thereof, using a variable-length codeword (acod_ng[pki][ng], acod_ne[ne], acod_r[][][][]);
wherein the arithmetic encoder (170) is configured to map values of a most-significant bit-plane (a>>lev, b>>lev, c>>lev, d>>lev) of a tuple (a,b,c,d) of spectral values onto a group index (ng) and an element index (ne), the element index (ne) describing an element within a group selected by the group index (ng);
wherein the arithmetic encoder (170) is configured to select a cumulative-frequencies-table (arith_cf_ng[pki][545]) out of a set of 32 cumulative-frequencies-tables (arith_cf_ng[0][545] to arith_cf_ng[31][545]) in dependence on a state index (t) of the arithmetic encoder; and
wherein the arithmetic encoder (170) is configured to arithmetically encode the group index (ng) using the selected cumulative-frequencies-table (arith_cf_ng[pki][545]) in order to obtain an arithmetically-encoded variable-length codeword (acod_ng[pki][ng]).
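How the most-significant bit-plane (a>>lev, b>>lev, c>>lev, d>>lev) might be obtained can be sketched in C as follows. The sketch assumes that lev is the smallest shift for which all four shifted values fall into the range [-4, 4) used by the context algorithms; that choice, and the names asr and msb_level, are assumptions rather than statements of the claim. Because a right shift of a negative int is implementation-defined in C, the helper asr spells out the arithmetic (sign-preserving) shift explicitly.

```c
/* Arithmetic shift right that rounds toward minus infinity, written so
   that only non-negative values are ever shifted. */
static int asr(int x, int n)
{
    return (x < 0) ? ~((~x) >> n) : (x >> n);
}

static int in_msb_range(int v)
{
    return v >= -4 && v < 4;
}

/* Smallest lev such that all four shifted tuple values lie in [-4, 4). */
static int msb_level(int a, int b, int c, int d)
{
    int lev = 0;
    while (!(in_msb_range(asr(a, lev)) && in_msb_range(asr(b, lev)) &&
             in_msb_range(asr(c, lev)) && in_msb_range(asr(d, lev))))
        lev++;
    return lev;
}
```

For the tuple (100, 0, 0, 0) this yields lev = 5, since 100 >> 5 = 3 is the first shifted value inside the range; the bits dropped by the shift would then have to be conveyed separately as less-significant bit-planes.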
14. A method (2000) for providing a decoded audio representation on the basis of an encoded audio representation, the method comprising:
providing (2010) a plurality of decoded spectral values on the basis of an arithmetically-encoded representation of the spectral values; and
providing (2020) a time-domain audio representation using the decoded spectral values;
wherein providing (2010) a plurality of decoded spectral values on the basis of an arithmetically-encoded representation of the spectral values comprises
selecting (2012) a cumulative-frequencies-table out of a set of 32 cumulative-frequencies-tables in dependence on a state index,
applying (2014) the selected cumulative-frequencies-table to derive a group index from a variable-length codeword representing the group index,
deriving (2016) values of a most-significant bit-plane of a tuple of spectral values using the group index and an element index, the element index designating an element within a group selected by the group index; and
providing (2018) a tuple of decoded spectral values using the values of the most-significant bit-plane of the tuple of spectral values.
15. A method (2100) for providing an encoded audio representation on the basis of an input audio representation, the method comprising:
providing (2110) a frequency-domain audio representation on the basis of a time-domain audio representation of an input audio information, such that the frequency-domain audio representation comprises a set of spectral values, and such that an energy is compacted in a sub-set of the spectral values; and
encoding (2120) a tuple of adjacent spectral values of the set of spectral values, or of a pre-processed version of the set of spectral values, wherein the encoding of the tuple of adjacent spectral values comprises
mapping (2122) values of a most-significant bit-plane of the tuple of spectral values onto a group index and an element index, the element index designating an element within a group selected by the group index,
selecting (2124) a cumulative-frequencies-table out of a set of 32 cumulative-frequencies-tables in dependence on a state index describing a state of the arithmetic encoding, and
arithmetically encoding (2126) the group index using the selected cumulative-frequencies-table, in order to obtain an arithmetically-encoded variable-length codeword.
16. A computer program for performing the method according to claim 14 or 15, when the computer program runs on a computer.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14788509P 2009-01-28 2009-01-28
US61/147,885 2009-01-28

Publications (1)

Publication Number Publication Date
WO2010086342A1 true WO2010086342A1 (en) 2010-08-05

Family

ID=42245645

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2010/050954 WO2010086342A1 (en) 2009-01-28 2010-01-27 Audio encoder, audio decoder, method for encoding an input audio information, method for decoding an input audio information and computer program using improved coding tables

Country Status (3)

Country Link
AR (1) AR075200A1 (en)
TW (1) TW201126508A (en)
WO (1) WO2010086342A1 (en)


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007066970A1 (en) * 2005-12-07 2007-06-14 Samsung Electronics Co., Ltd. Method, medium, and apparatus encoding and/or decoding an audio signal


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"ISO/IEC 14496-3:2001(E) - Subpart 4: General Audio Coding (GA) . AAC, TwinVQ, BSAC", INTERNATIONAL STANDARD ISO/IEC, XX, XX, 1 January 2001 (2001-01-01), pages 1 - 313, XP007902531 *
K. SAYOOD: "Introduction to Data Compression", 2006, ELSEVIER INC.
MEINE NIKOLAUS ET AL: "IMPROVED QUANTIZATION AND LOSSLESS CODING FOR SUBBAND AUDIO CODING", PREPRINTS OF PAPERS PRESENTED AT THE AES CONVENTION, XX, XX, vol. 1-4, 31 May 2005 (2005-05-31), pages 1 - 9, XP008071322 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2652464C2 (en) * 2013-01-31 2018-04-26 Оранж Improved correction of frame loss when decoding signal
US11146903B2 (en) 2013-05-29 2021-10-12 Qualcomm Incorporated Compression of decomposed representations of a sound field
US11962990B2 (en) 2013-05-29 2024-04-16 Qualcomm Incorporated Reordering of foreground audio objects in the ambisonics domain
US10499176B2 (en) 2013-05-29 2019-12-03 Qualcomm Incorporated Identifying codebooks to use when coding spatial components of a sound field
US11540076B2 (en) 2013-07-11 2022-12-27 Dolby Laboratories Licensing Corporation Methods and apparatus for decoding encoded HOA signals
KR102534163B1 (en) 2013-07-11 2023-05-30 돌비 인터네셔널 에이비 Method and apparatus for generating from a coefficient domain representation of hoa signals a mixed spatial/coefficient domain representation of said hoa signals
KR20210029302A (en) * 2013-07-11 2021-03-15 돌비 인터네셔널 에이비 Method and apparatus for generating from a coefficient domain representation of hoa signals a mixed spatial/coefficient domain representation of said hoa signals
KR102658702B1 (en) 2013-07-11 2024-04-19 돌비 인터네셔널 에이비 Method and apparatus for generating from a coefficient domain representation of hoa signals a mixed spatial/coefficient domain representation of said hoa signals
US11863958B2 (en) 2013-07-11 2024-01-02 Dolby Laboratories Licensing Corporation Methods and apparatus for decoding encoded HOA signals
US11297455B2 (en) 2013-07-11 2022-04-05 Dolby Laboratories Licensing Corporation Methods and apparatus for decoding encoded HOA signals
KR102386726B1 (en) 2013-07-11 2022-04-15 돌비 인터네셔널 에이비 Method and apparatus for generating from a coefficient domain representation of hoa signals a mixed spatial/coefficient domain representation of said hoa signals
KR20220051026A (en) * 2013-07-11 2022-04-25 돌비 인터네셔널 에이비 Method and apparatus for generating from a coefficient domain representation of hoa signals a mixed spatial/coefficient domain representation of said hoa signals
KR20160028442A (en) * 2013-07-11 2016-03-11 톰슨 라이센싱 Method and apparatus for generating from a coefficient domain representation of hoa signals a mixed spatial/coefficient domain representation of said hoa signals
KR20230070540A (en) * 2013-07-11 2023-05-23 돌비 인터네셔널 에이비 Method and apparatus for generating from a coefficient domain representation of hoa signals a mixed spatial/coefficient domain representation of said hoa signals
KR102226620B1 (en) 2013-07-11 2021-03-12 돌비 인터네셔널 에이비 Method and apparatus for generating from a coefficient domain representation of hoa signals a mixed spatial/coefficient domain representation of said hoa signals
US10770087B2 (en) 2014-05-16 2020-09-08 Qualcomm Incorporated Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals
RU2688275C2 (en) * 2014-05-16 2019-05-21 Квэлкомм Инкорпорейтед Selection of codebooks for encoding vectors decomposed from higher-order ambisonic audio signals
CN112509591A (en) * 2020-12-04 2021-03-16 北京百瑞互联技术有限公司 Audio coding and decoding method and system
CN112509591B (en) * 2020-12-04 2024-05-14 北京百瑞互联技术股份有限公司 Audio encoding and decoding method and system

Also Published As

Publication number Publication date
TW201126508A (en) 2011-08-01
AR075200A1 (en) 2011-03-16

Similar Documents

Publication Publication Date Title
US20230162742A1 (en) Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a detection of a group of previously-decoded spectral values
AU2011287747B2 (en) Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using an optimized hash table
WO2011086065A1 (en) Audio encoder, audio decoder, method for encoding and audio information, method for decoding an audio information and computer program using a hash table describing both significant state values and interval boundaries
WO2010086342A1 (en) Audio encoder, audio decoder, method for encoding an input audio information, method for decoding an input audio information and computer program using improved coding tables
AU2010309898B2 (en) Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a detection of a group of previously-decoded spectral values
AU2010309821B2 (en) Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using an iterative interval size reduction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10711852

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10711852

Country of ref document: EP

Kind code of ref document: A1