WO2009044346A1 - System and method for combining adaptive golomb coding with fixed rate quantization - Google Patents

System and method for combining adaptive golomb coding with fixed rate quantization Download PDF

Info

Publication number
WO2009044346A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal components
signal
adaptive parameter
bits
bitstream
Prior art date
Application number
PCT/IB2008/053986
Other languages
French (fr)
Inventor
Adriana Vasilache
Lasse Laaksonen
Mikko Tammi
Anssi Ramo
Original Assignee
Nokia Corporation
Nokia Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Corporation, Nokia Inc filed Critical Nokia Corporation
Publication of WO2009044346A1 publication Critical patent/WO2009044346A1/en

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032Quantisation or dequantisation of spectral components
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/002Dynamic bit allocation

Definitions

  • the present invention relates generally to data compression. More particularly, the present invention relates to encoding spectral parameters in full band speech and audio coding where adaptive and/or estimated coding of side information is based on the inter or intra frame correlation.
  • Audio signals such as speech or music
  • Audio encoders and decoders are used to represent audio-based signals, such as music and background noise. These types of coders generally do not utilize a speech model for the coding process, but rather use processes for representing all types of audio signals, including speech.
  • Speech encoders and decoders are generally optimized for speech signals and can operate at either a fixed or variable bit rate. An audio codec can also be configured to operate with varying bit rates. At lower bit rates, such an audio codec may work with speech signals at a coding rate equivalent to a pure speech codec.
  • the audio codec may code any signal including music, background noise and speech, with higher quality and performance.
  • the encoder transforms input data/signal(s), e.g., the speech signals, into a compressed representation suited for storage and/or transmission, and the decoder can uncompress the compressed representation back into a useable or presentable form.
  • the encoder discards some information in the original data or speech signal sequence in order to represent the content in a more compact form, i.e., at a lower bitrate.
  • the time-frequency analysis process is performed by either a unitary transform such as the modified discrete cosine transform (MDCT) or the Discrete Fourier Transform (DFT), or by an analysis filterbank such as a polyphase filterbank with uniform bandpass filters or a time varying, critically sampled bank of non-uniform bandpass filters.
  • a unitary transform such as the modified discrete cosine transform (MDCT) and the Discrete Fourier Transform (DFT)
  • DFT Discrete Fourier Transform
  • an analysis filterbank such as a polyphase filterbank with uniform bandpass filters or a time varying, critically sampled bank of non-uniform bandpass filters.
  • time-frequency analysis process involves a tradeoff between time and frequency resolution requirements.
  • the choice of time-frequency analysis methods additionally determines the amount of coding delay introduced, a parameter which is important in duplex broadcasts.
  • audio codecs generally employ lossless or entropy-based coding on the integer representation of the quantized coefficients.
  • the entropy of a signal can be considered to denote a minimum number of bits on which the information contained in the signal can be represented. The difference between the amount of symbols or bits actually used to represent the signal and the theoretical entropy lower bound is the redundancy associated with the signal.
  • an entropy coder allocates short code words to samples with a high probability of occurrence and longer code words to samples with a lower probability, and in this way reduces the average bit consumption.
  • an embedded variable rate speech or audio coding scheme, which is also referred to as a layered coding process.
  • Embedded variable rate audio or speech coding denotes an audio or speech coding process in which a bitstream resulting from a coding operation is distributed into successive layers.
  • a base or core layer which comprises primary coded data, is generated from the binary elements essential for the decoding of the binary stream, and determines a minimum quality of decoding.
  • the ITU-T Recommendation G.722.1 specifies low complexity coding at 24 and 32 kbits/s for hands free operation in systems with low frame loss. This codec is designed to operate over a wideband signal which is sampled at 16 kHz, and whose band is limited to 7 kHz.
  • study group 16 aims to incorporate a full band scalable extension to this codec which is capable of coding the input signal at a sampling frequency of up to 48 kHz.
  • the codec is to be based on the transform codec principle, where the time based signal is transformed into the frequency domain using a unitary transform such as the MDCT.
  • the frequency coefficients are conventionally divided into sub vectors, where consecutive sub vectors can be grouped into bands, where the number of vectors grouped into each band may be arranged such that it reflects the psychoacoustic characteristic of the signal.
  • a full band input signal is split into overlapping frames of 40 ms (e.g., 20 ms overlap), where each 40 ms frame is windowed (sine window identical to that in the G.722.1 codec).
  • the full band input signal that is split and "frame-windowed" is transformed to the frequency domain through the use of a modified discrete cosine transform (MDCT), which uses the same floating-point code as the G.722.1 codec.
  • MDCT modified discrete cosine transform
  • Spectral weighting as well as quantized transform coefficients are then encoded and transmitted to a decoder.
  • One type of efficient quantizer that is conventionally utilized for long vectors is based on the Voronoi extension of the eight-dimensional Gosset (RE8) lattice and used in several state-of-the-art codecs such as adaptive multi-rate wideband plus (AMR-WB+) and G.EV-VBR, an embedded variable bit rate codec which is currently under standardisation activity within study group 16 of the ITU.
  • RE8 lattices are described in "Method and system for multi-rate lattice vector quantization of a signal" by B. Bessette, S. Ragot, and J. -P. Adoul, and U.S. Patent No. 7,106,228.
  • Quantizing long vectors thus consists of splitting a vector into eight-dimensional sub-vectors and quantizing them in a RE8 lattice, where the indices of the RE8 lattice code vectors, together with side information indicating how many bits each quantized sub-vector uses, form a bitstream. Additionally, it should be noted that the side information is encoded with a modified Golomb code with a fixed Golomb parameter, and there is a global gain which ensures that the available number of bits is not exceeded.
  • Various embodiments allow for encoding an audio signal via the adapting of a Golomb parameter.
  • a plurality of signal components are generated, where each of the plurality of signal components has an associated value.
  • Each of the associated values is encoded with each of the plurality of signal components.
  • the number of the plurality of signal components to be distributed to a bitstream is determined.
  • Figure 1 is a flow chart illustrating coding processes performed in accordance with various embodiments
  • Figure 2 illustrates a table representative of Golomb codes for n integers of a Golomb parameter m
  • Figure 3 is a flow chart illustrating processes for utilizing intra/inter frame correlation in accordance with various embodiments
  • Figure 4 is a flow chart illustrating processes for encoding an audio signal via the adapting of a Golomb parameter in accordance with various embodiments
  • Figure 5 is an overview diagram of a system within which the present invention may be implemented.
  • Figure 6 is a perspective view of a mobile telephone that can be used in the implementation of the present invention.
  • Figure 7 is a schematic representation of the telephone circuitry of the mobile telephone of Figure 6.
  • Various embodiments are utilized to increase the efficiency of a quantization tool based on the Voronoi extension of the RE8 lattice described above by adaptive and/or estimated encoding of side information. That is, where the side information is comprised of non-negative integers 0, 2, 3, 4, 5, 6, ..., and, e.g., for a positive integer m, the mth Golomb code defines a reversible prefix-free mapping of those non-negative integers to variable length binary codewords, the modified Golomb coding with fixed parameter is replaced by adaptive Golomb Rice coding.
  • Golomb Rice coding is a specific instance of Golomb coding in which the Golomb parameter is a power of two, where in accordance with various embodiments, adaptation of the Golomb parameter is performed based on a moving window average of previous side information values.
  • the Golomb parameter for encoding is decided based on values of one or more side information parameters from the previous frame.
  • various embodiments utilizing predictive (via estimation) adaptation as described herein within, for example, a multi-block frame can experience increased compression efficiency.
  • the saved bits from the current block which would have been used to encode the side information data using the fixed Golomb code, can be utilized for the encoding of the next block. This is because a reduction in codeword length is achieved by having an adaptive Golomb tuning parameter, m.
  • An example of a coding scenario in which various embodiments can be used is when an input audio signal is MDCT-transformed and the spectral coefficients are quantized. Prior to quantization, the spectral coefficients in each psycho-acoustical band are scaled with a scale derived from the masking threshold per band, where psycho-acoustical compression models can refer to those models that select humanly-relevant frequencies for compression, while rejecting other frequencies. [0024] To encode the full band of 20 kHz of a signal sampled at 48 kHz, for example, 800 spectral components are quantized for each 20 ms frame. For transient frames, the frame is split into four blocks and on each of the blocks, an MDCT is performed.
  • FIG. 1 illustrates a flow chart describing a possible implementation according to various embodiments.
  • a coder receives the MDCT coefficients output from a MDCT processor at 100, and at 110 a grouping processor processes the coefficients into groups of coefficients (these groups of coefficients are also known as sub-bands or perceptual bands), where the grouping of the frequency coefficients is performed according to the psycho-acoustical model described above.
  • each "frame" of the signal (20 ms) when applied to the MDCT produces 280 critically sampled coefficient values.
  • the psycho-acoustical band energies are calculated and spread across neighboring sub-bands in order to determine a masking threshold for each sub-band.
  • This masking threshold or alternatively its inverse, is used to scale each sub-band prior to quantization as described above.
  • the masking threshold is indicative of how much quantization error is tolerated in each sub-band, and this information is used to change the quantization resolution accordingly. Because, as described above, the same quantizer/quantization tool is being used for all of the sub-bands, the scaling acts as a quantization resolution change.
  • the scales are re-optimized based on the quantized spectral coefficient values, where the re-optimized scales are quantized with a logarithmic scalar quantizer having, e.g., 4 bits per scale. That is, at 122, the scales are calculated and at 124, the scaling processor scales the coefficients with the non-quantized scales. At 126, the scales are re-optimized for the quantized coefficient values, and at 128, the re-calculated scales are quantized. It should be noted that the scaling processor can quantize the scale factors, where the quantization of the scale factors is performed, for example, using a 16 codeword quantizer.
  • one codebook can be used for each sub band.
  • the quantization processor performs a quantization and indexing of the scaled MDCT coefficients at 130. Thereafter, the signals are multiplexed at 140. It should be noted that there is one set of scales calculated from the masking thresholds described above.
  • the transmission of bits from one (sub)frame to another differs from the bitpool mechanism in Advanced Audio Coding (AAC), in that it does not need a header to specify the frame boundaries, and for the transient cases, it is usable in a fixed rate case where the same number of bits is used for all the frames. It should further be noted that depending on various embodiments used, savings averaging between 3 and 10% of the bitrate allocated to spectral coefficients are achieved. Under complexity and delay constraints, the compression efficiency is increased for transients.
  • AAC Advanced Audio Coding
  • a global gain is estimated such that the number of bits used for quantization do not exceed the available number of bits for any given total bit rate.
  • the vector is split into 100 eight-dimensional sub-vectors and quantized in an RE8 lattice. It should be noted, however, that the vector can be split into sub-vectors of dimensions other than eight. Moreover, it should be noted that various embodiments can be implemented using various quantizers including, but not limited to, the RE8 lattice described herein.
  • the side information and the lattice codevector indices are calculated. It should be noted that the encoding of the side information is done independently of the lattice codevectors indices, as described below.
  • the code 1 0 0 is the mth ("2") Golomb code variable length binary codeword mapping of the nonnegative integer 3.
  • the estimation of the Golomb parameter can be based on an estimation of the mean of the data to be encoded. Since the statistics of the data change within a frame, it is customary to use a moving window for the estimation of the mean value, where one alternative for estimation involves using previous data values to estimate the mean. It should be noted, though, that other methods of estimating the Golomb parameter can be found in "Lossless adaptive Golomb/Rice decoding of integer data using backward adaptive rules", H.S. Malvar, U.S. Patent Application Publication No. 2006/0103556, 2006. In accordance with one method of estimation, for scaled spectral data, the correlation in the side information has a relatively short history length. Therefore, the last 2 or 3 values result in substantially optimal performance, although various embodiments are not limited to using these values.
  • the side information that needs to be encoded can be the following
  • nq [6 11 8 8 9 5 5 3 4 0 2 3 6 2 2 4 0 4 2 3 2 2 0 0 2 0 3 0 0 0 0 0 0 0 0]
  • the moving window length is 2.
  • a default Golomb parameter which can be 2.
  • This function returns the number of bits for different Golomb codes without counting the stop bit.
  • the resulting Golomb parameter for nq[2] is 8, the resulting Golomb parameter for nq[3] is 8, etc.
  • GPO2 Golomb power of two
  • the number of bits for side information of the sub-vectors is calculated in decreasing order of sub-vector energy (or estimated number of bits per sub-vector), to ensure that the most energetic sub-vectors will be encoded. This order is considered only when counting the bits.
  • the order of the encoded sub-vectors in the bitstream is the "natural" order, starting with the first sub-vector. Furthermore, at the decoding end, the natural order is also followed, thus ensuring that the previous values from the moving window are available and allowing the calculation of the Golomb parameter with which the data has been encoded.
  • inter-frame correlation is usually higher than the intra- frame correlation, this approach can be more efficient. It should be noted however that utilizing inter- frame correlation for adapting Golomb codes involves inter-frame prediction, which without any specialized frame erasure concealment method, renders the encoded bitstream sensitive to channel errors. It can, however be used if the channel is clean. That is, in the presence of frame erasures, restricted prediction (between a small number of frames, e.g., 2 - 4) could be used.
  • the side information would be transmitted without any adaptation based on the previous frame.
  • the adaptation provided in the first embodiment is not affected by frame erasures.
  • the moving window would, in this case be centered on the current position, since all the data for the previous frame is available.
  • the saved bits can be borrowed and used for the next frame, when there are no delay constraints.
  • Inter-frame correlation can be implicitly used at the encoder, e.g., only at the estimation of the global gain. Since the global gain is estimated based on the energy of the sub-vectors to be quantized and the number of available bits, and since, for the previous frame, there is an estimation of the number of bits saved if intra-frame correlation were to be utilized, the available number of bits can be considered to be larger by the number of bits saved at the previous frame. Therefore, while the very first frame will be left with some unused bits, the rest of the frames will have an indication indicative of how many bits are actually available.
  • Figure 3 is a flow chart illustrating processes performed for exploiting either intra and/or inter frame correlation.
  • the global gain is estimated upon receiving a spectral vector and the available number of bits, where N_Tot = N_fixed + N_prev_saved, and where N_prev_saved refers to the number of bits saved at a previous frame.
  • the spectral vector is divided over the global gain, and at 320, the spectral vector is split into 8-dimensional sub-vectors.
  • each sub- vector is quantized in an RE8 lattice and the side information is calculated. The number of bits for the side information is then calculated at 340 based upon inter/intra frame correlation.
  • An example illustrating the use of inter- frame prediction in accordance with various embodiments involves a scenario involving transients, where a 20 ms frame is divided in four blocks on which MDCT is performed, the four blocks being considered to be four consecutive sub-frames.
  • bits can be transmitted from one sub-frame to another and inter-(sub)frame prediction can be utilized.
  • the transmission of bits from one (sub)frame to another needs as side information, the number of bits that are saved. This is because the decoding of the bitstream in the Voronoi extension tool described above stops when the allocated number of bits are consumed. If the decoder is not aware of the bit savings, it will not stop decoding when the information relevant to the spectral data has finished decoding.
  • FIG. 4 is a flow chart illustrating various processes for encoding an audio signal via the adapting of a Golomb parameter in accordance with various embodiments.
  • a plurality of signal components are generated, where each of the plurality of signal components has an associated value.
  • each of the plurality of signal components can be a single eight- dimensional sub-vector quantized in an RE8 lattice, and the associated value can be, e.g., the side information.
  • each of the associated values is encoded with each of the plurality of signal components.
  • an adaptive parameter of the entropy coder e.g., the Golomb parameter m, is estimated using at least one previous associated value.
  • inter-frame correlation can be utilized to estimate the Golomb parameter needed to encode the side information, negating the need to use bits to transmit the Golomb parameter m.
  • the number of the plurality of signal components to be distributed to a bitstream is determined. [0044] In other words, once it is known what the m value is, the number of bits needed to code the bitstream of side information is known. Hence, this information can be used to instruct the quantizer how many bits it can use in the next frame for quantizing the sub-vectors. The number of bits needed to code the bitstream of side information can be determined because a layered codec is utilized.
  • the bandwidth that is left for the sub-vector indices i.e., the indices from the lattice quantizer
  • a subset of the total set of sub-vector indices can be distributed to the bitstream for the current frame.
  • the saved bits can also be re-used in substantially the same manner as described herein.
  • FIG. 5 shows a generic multimedia communications system for use with the present invention.
  • a data source 500 provides a source signal in an analog, uncompressed digital, or compressed digital format, or any combination of these formats.
  • An encoder 510 encodes the source signal into a coded media bitstream.
  • the encoder 510 may be capable of encoding more than one media type, such as audio and video, or more than one encoder 510 may be required to code different media types of the source signal.
  • the encoder 510 may also get synthetically produced input, such as graphics and text, or it may be capable of producing coded bitstreams of synthetic media.
  • only processing of one coded media bitstream of one media type is considered to simplify the description.
  • typically real-time broadcast services comprise several streams (typically at least one audio, video and text sub-titling stream).
  • the system may include many encoders, but in the following only one encoder 510 is considered to simplify the description without a lack of generality.
  • the coded media bitstream is transferred to a storage 520.
  • the storage 520 may comprise any type of mass memory to store the coded media bitstream.
  • the format of the coded media bitstream in the storage 520 may be an elementary self- contained bitstream format, or one or more coded media bitstreams may be encapsulated into a container file.
  • Some systems operate "live", i.e. omit storage and transfer coded media bitstream from the encoder 510 directly to a sender 530.
  • the coded media bitstream is then transferred to the sender 530, also referred to as the server, on a need basis.
  • the format used in the transmission may be an elementary self-contained bitstream format, a packet stream format, or one or more coded media bitstreams may be encapsulated into a container file.
  • the encoder 510, the storage 520, and the sender 530 may reside in the same physical device or they may be included in separate devices.
  • the encoder 510 and the sender 530 may operate with live real-time content, in which case the coded media bitstream is typically not stored permanently, but rather buffered for small periods of time in the content encoder 510 and/or in the sender 530 to smooth out variations in processing delay, transfer delay, and coded media bitrate.
  • the sender 530 sends the coded media bitstream using a communication protocol stack.
  • the stack may include but is not limited to Real-Time Transport Protocol (RTP), User Datagram Protocol (UDP), and Internet Protocol (IP).
  • RTP Real-Time Transport Protocol
  • UDP User Datagram Protocol
  • IP Internet Protocol
  • the sender 530 encapsulates the coded media bitstream into packets.
  • the sender 530 may or may not be connected to a gateway 540 through a communication network.
  • the gateway 540 may perform different types of functions, such as translation of a packet stream according to one communication protocol stack to another communication protocol stack, merging and forking of data streams, and manipulation of data stream according to the downlink and/or receiver capabilities, such as controlling the bit rate of the forwarded stream according to prevailing downlink network conditions.
  • Examples of gateways 540 include multipoint conference control units (MCUs), gateways between circuit-switched and packet- switched video telephony, Push-to-talk over Cellular (PoC) servers, IP encapsulators in digital video broadcasting-handheld (DVB-H) systems, or set-top boxes that forward broadcast transmissions locally to home wireless networks.
  • MCUs multipoint conference control units
  • PoC Push-to-talk over Cellular
  • DVB-H digital video broadcasting-handheld
  • the system includes one or more receivers 550, typically capable of receiving, de-modulating, and de-capsulating the transmitted signal into a coded media bitstream.
  • the coded media bitstream is typically processed further by a decoder 560, whose output is one or more uncompressed media streams.
  • a renderer 570 may reproduce the uncompressed media streams with a loudspeaker or a display, for example.
  • the receiver 550, the decoder 560, and the renderer 570 may reside in the same physical device or they may be included in separate devices.
  • the bitstream to be decoded can be received from a remote device located within virtually any type of network. Additionally, the bitstream can be received from local hardware or software.
  • Communication devices of the present invention may communicate using various transmission technologies including, but not limited to, Code Division Multiple Access (CDMA), Global System for Mobile Communications (GSM), Universal Mobile Telecommunications System (UMTS), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Transmission Control Protocol/Internet Protocol (TCP/IP), Short Messaging Service (SMS), Multimedia Messaging Service (MMS), e-mail, Instant Messaging Service (IMS), Bluetooth, IEEE 802.11, etc.
  • CDMA Code Division Multiple Access
  • GSM Global System for Mobile Communications
  • UMTS Universal Mobile Telecommunications System
  • TDMA Time Division Multiple Access
  • FDMA Frequency Division Multiple Access
  • TCP/IP Transmission Control Protocol/Internet Protocol
  • SMS Short Messaging Service
  • MMS Multimedia Messaging Service
  • IMS Instant Messaging Service
  • Bluetooth IEEE 802.11, etc.
  • a communication device may communicate using various media including, but not limited to, radio, infrared, laser, cable connection, and the like.
  • Figures 6 and 7 show one representative mobile device 12 within which the present invention may be implemented. It should be understood, however, that the present invention is not intended to be limited to one particular type of mobile device 12 or other electronic device. Some or all of the features depicted in Figures 6 and 7 could be incorporated into any or all of the devices represented in Figure 5.
  • the mobile device 12 of Figures 6 and 7 includes a housing 30, a display 32 in the form of a liquid crystal display, a keypad 34, a microphone 36, an ear-piece 38, a battery 40, an infrared port 42, an antenna 44, a smart card 46 in the form of a UICC according to one embodiment of the invention, a card reader 48, radio interface circuitry 52, codec circuitry 54, a controller 56 and a memory 58.
  • Individual circuits and elements are all of a type well known in the art, for example in the Nokia range of mobile telephones.
  • the present invention is described in the general context of method steps, which may be implemented in one embodiment by a program product including computer-executable instructions, such as program code, executed by computers in networked environments.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein.
  • the particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
  • Software and web implementations of the present invention could be accomplished with standard programming techniques with rule based logic and other logic to accomplish the various database searching steps, correlation steps, comparison steps and decision steps.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Encoding an audio signal via the adapting of a Golomb parameter is provided. A plurality of signal components are generated, where each of the plurality of signal components has an associated value. Each of the associated values is encoded with each of the plurality of signal components. An adaptive parameter of the entropy coder, e.g., a Golomb parameter m, is estimated using at least one previous associated value. That is, inter or intra-frame prediction can be utilized to estimate the Golomb parameter needed to encode the side information, negating the need to use bits to transmit the Golomb parameter m. The number of the plurality of signal components to be distributed to a bitstream is determined.

Description

SYSTEM AND METHOD FOR COMBINING ADAPTIVE GOLOMB CODING WITH FIXED RATE QUANTIZATION
FIELD OF THE INVENTION
[0001] The present invention relates generally to data compression. More particularly, the present invention relates to encoding spectral parameters in full band speech and audio coding where adaptive and/or estimated coding of side information is based on the inter or intra frame correlation.
BACKGROUND
[0002] This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
[0003] Audio signals, such as speech or music, can be encoded, e.g., for enabling an efficient transmission or storage of the audio signals. Audio encoders and decoders are used to represent audio-based signals, such as music and background noise. These types of coders generally do not utilize a speech model for the coding process, but rather use processes for representing all types of audio signals, including speech. [0004] Speech encoders and decoders (codecs) are generally optimized for speech signals and can operate at either a fixed or variable bit rate. An audio codec can also be configured to operate with varying bit rates. At lower bit rates, such an audio codec may work with speech signals at a coding rate equivalent to a pure speech codec. At higher bit rates, the audio codec may code any signal including music, background noise and speech, with higher quality and performance. It should be noted that the encoder transforms input data/signal(s), e.g., the speech signals, into a compressed representation suited for storage and/or transmission, and the decoder can uncompress the compressed representation back into a useable or presentable form. Typically, the encoder discards some information in the original data or speech signal sequence in order to represent the content in a more compact form, i.e., at a lower bitrate.
[0005] Conventional audio transform-based compression algorithms segment the input signal into blocks having durations ranging from 2 ms up to 50 ms. A time-frequency analysis then decomposes each analysis block in the encoder. This transformation or sub-band filtering process compacts the energy of the input signal into a few transform coefficients and therefore de-correlates successive samples. These coefficients, sub-band samples, or parameters are quantized and encoded according to perceptual criteria. Generally, the time-frequency analysis process is performed by either a unitary transform such as the modified discrete cosine transform (MDCT) or the Discrete Fourier Transform (DFT), or by an analysis filterbank such as a polyphase filterbank with uniform bandpass filters or a time varying, critically sampled bank of non-uniform bandpass filters.
[0006] It should be noted that this conventional time-frequency analysis process involves a tradeoff between time and frequency resolution requirements. The choice of time-frequency analysis methods additionally determines the amount of coding delay introduced, a parameter which is important in duplex broadcasts. In addition to lossy coding techniques, such as the perceptual quantization of frequency domain coefficients, audio codecs generally employ lossless or entropy-based coding on the integer representation of the quantized coefficients. It should also be noted that the entropy of a signal can be considered to denote a minimum number of bits on which the information contained in the signal can be represented. The difference between the number of symbols or bits actually used to represent the signal and the theoretical entropy lower bound is the redundancy associated with the signal. Therefore, an entropy coder allocates short code words to samples with a high probability of occurrence and longer code words to samples with a lower probability, and in this way reduces the average bit consumption. [0007] Yet another conventional audio and speech coding option is an embedded variable rate speech or audio coding scheme, which is also referred to as a layered coding process. Embedded variable rate audio or speech coding denotes an audio or speech coding process in which a bitstream resulting from a coding operation is distributed into successive layers. A base or core layer, which comprises primary coded data, is generated from the binary elements essential for the decoding of the binary stream, and determines a minimum quality of decoding. Subsequent layers make it possible to progressively improve the quality of the signal arising from the decoding operation, where each new layer brings new information. One of the particular features of layered coding is the possibility of intervening at any level of the transmission or storage chain, so as to delete a part of the binary stream without having to include any particular indication thereof to the decoder. [0008] The ITU-T Recommendation G.722.1 specifies low complexity coding at 24 and 32 kbits/s for hands free operation in systems with low frame loss. This codec is designed to operate over a wideband signal which is sampled at 16 kHz, and whose band is limited to 7 kHz. Within the ITU-T, study group 16 aims to incorporate a full band scalable extension to this codec which is capable of coding the input signal at a sampling frequency of up to 48 kHz. The codec is to be based on the transform codec principle, where the time based signal is transformed into the frequency domain using a unitary transform such as the MDCT. In a codec of this type, the frequency coefficients are conventionally divided into sub vectors, where consecutive sub vectors can be grouped into bands, where the number of vectors grouped into each band may be arranged such that it reflects the psychoacoustic characteristic of the signal.
[0009] For example, at an encoder level, a full band input signal is split into overlapping frames of 40 ms (e.g., 20 ms overlap), where each 40 ms frame is windowed (sine window identical to that in the G.722.1 codec). Moreover, the full band input signal that is split and "frame-windowed" is transformed to the frequency domain through the use of a modified discrete cosine transform (MDCT), which uses the same floating-point code as the G.722.1 codec. With the transform coefficients that result from the MDCT, a spectral weighting function is estimated and used in order to shape the quantization noise of the transform coefficients. Spectral weighting as well as quantized transform coefficients are then encoded and transmitted to a decoder. However, a problem exists with regard to the encoding of the spectral parameters resulting from the spectral weighting.
[0010] One type of efficient quantizer that is conventionally utilized for long vectors is based on the Voronoi extension of the eight-dimensional Gosset (RE8) lattice and used in several state-of-the-art codecs such as adaptive multi-rate wideband plus (AMR-WB+) and G.EV-VBR, an embedded variable bit rate codec which is currently under standardisation activity within study group 16 of the ITU. RE8 lattices are described in "Method and system for multi-rate lattice vector quantization of a signal" by B. Bessette, S. Ragot, and J.-P. Adoul, and U.S. Patent No. 7,106,228. Quantizing long vectors thus consists of splitting a vector into eight-dimensional sub-vectors and quantizing them in a RE8 lattice, where the indices of the RE8 lattice code vectors, together with side information indicating how many bits each quantized sub-vector uses, form a bitstream. Additionally, it should be noted that the side information is encoded with a modified Golomb code with a fixed Golomb parameter, and there is a global gain which ensures that the available number of bits is not exceeded.
SUMMARY
[0011] Various embodiments allow for encoding an audio signal via the adapting of a Golomb parameter. A plurality of signal components are generated, where each of the plurality of signal components has an associated value. Each of the associated values is encoded with each of the plurality of signal components. An adaptive parameter of the entropy coder, e.g., a Golomb parameter m, is estimated using at least one previous associated value. That is, inter or intra frame correlation can be utilized to estimate the Golomb parameter needed to encode the side information, negating the need to use bits to transmit the Golomb parameter m. The number of the plurality of signal components to be distributed to a bitstream is determined.
[0012] Therefore, once it is known what the m value is for each signal component, the number of bits needed to code the bitstream of side information is known. Hence, this information can be used to instruct the quantizer how many bits it can use in the next frame for quantizing the sub-vectors. Additionally, several bits are saved with respect to the available number of bits for the spectral coefficients in the current frame. The difference arises because the estimation of the number of bits for the sub- vectors has been done without considering the correlations between them. If there are no complexity constraints, the global gain can be changed (diminished) and the data re-quantized. Moreover, if complexity is an issue, the saved bits can be borrowed and used for the next frame, when there are no delay constraints.
[0013] These and other advantages and features of the invention, together with the organization and manner of operation thereof, will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, wherein like elements have like numerals throughout the several drawings described below.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Figure 1 is a flow chart illustrating coding processes performed in accordance with various embodiments;
[0015] Figure 2 illustrates a table representative of Golomb codes for n integers of a
Golomb parameter m;
[0016] Figure 3 is a flow chart illustrating processes for utilizing intra/inter frame correlation in accordance with various embodiments;
[0017] Figure 4 is a flow chart illustrating processes for encoding an audio signal via the adapting of a Golomb parameter in accordance with various embodiments;
[0018] Figure 5 is an overview diagram of a system within which the present invention may be implemented;
[0019] Figure 6 is a perspective view of a mobile telephone that can be used in the implementation of the present invention; and
[0020] Figure 7 is a schematic representation of the telephone circuitry of the mobile telephone of Figure 6.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0021] Various embodiments are utilized to increase the efficiency of a quantization tool based on the Voronoi extension of the RE8 lattice described above by adaptive and/or estimated encoding of side information. That is, where the side information is comprised of non-negative integers 0, 2, 3, 4, 5, 6, ..., and, e.g., for a positive integer m, the mth Golomb code defines a reversible prefix-free mapping of those non-negative integers to variable length binary codewords, the modified Golomb coding with fixed parameter is replaced by adaptive Golomb Rice coding. It should be noted that Golomb Rice coding is a specific instance of Golomb coding in which the Golomb parameter is a power of two, where in accordance with various embodiments, adaptation of the Golomb parameter is performed based on a moving window average of previous side information values. In other words, given an inter-frame correlation, if the general context of the codec allows a particular case of inter-frame prediction, the Golomb parameter for encoding is decided based on values of one or more side information parameters from the previous frame.
[0022] Therefore, various embodiments utilizing predictive (via estimation) adaptation as described herein within, for example, a multi-block frame (e.g., transient frames), can experience increased compression efficiency. Furthermore, the saved bits from the current block, which would have been used to encode the side information data using the fixed Golomb code, can be utilized for the encoding of the next block. This is because a reduction in codeword length is achieved by having an adaptive Golomb tuning parameter, m.
[0023] An example of a coding scenario in which various embodiments can be used is when an input audio signal is MDCT-transformed and the spectral coefficients are quantized. Prior to quantization, the spectral coefficients in each psycho-acoustical band are scaled with a scale derived from the masking threshold per band, where psycho-acoustical compression models can refer to those models that select humanly-relevant frequencies for compression, while rejecting other frequencies. [0024] To encode the full band of 20 kHz of a signal sampled at 48 kHz, for example, 800 spectral components are quantized for each 20 ms frame. For transient frames, the frame is split into four blocks and on each of the blocks, an MDCT is performed. Each of the four sub-frames is then quantized independently. It should be noted that the same quantization tool is used for both transient and non-transient frames, where only the length of the vector to be quantized differs. [0025] Figure 1 illustrates a flow chart describing a possible implementation according to various embodiments. A coder receives the MDCT coefficients output from an MDCT processor at 100, and at 110 a grouping processor processes the coefficients into groups of coefficients (these groups of coefficients are also known as sub-bands or perceptual bands), where the grouping of the frequency coefficients is performed according to the psycho-acoustical model described above. In this example, each "frame" of the signal (20 ms) when applied to the MDCT produces 280 critically sampled coefficient values. It should be noted that in this example, a 16 kHz sampling signal is described, whereas, as described above, various embodiments can be implemented for other signal frequencies, e.g., the full band of 20 kHz sampled at 48 kHz. It should also be noted that different groupings of coefficients can be effectuated. That is, depending on the number of samples input, the MDCT may output different numbers of coefficients per transform.
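By way of a non-limiting illustration only, the following C-style sketch shows how such a frame might be routed either to one long transform or to four block transforms in the transient case. The mdct() routine, its signature, and the omission of windowing and overlap handling are assumptions made solely for this sketch and are not part of the codec described herein.

    /* Illustrative sketch: one MDCT per 20 ms frame, or one MDCT per quarter
     * block for transient frames; the same quantization tool then handles
     * either vector length. mdct() is a hypothetical transform routine, and
     * windowing/overlap handling is intentionally omitted for brevity. */
    extern void mdct(const double *in, int n, double *out);   /* hypothetical */

    void transform_frame(const double *frame, int n, int is_transient,
                         double *coef)
    {
        if (!is_transient) {
            mdct(frame, n, coef);                 /* one long transform */
        } else {
            int blk = n / 4;                      /* four consecutive sub-frames */
            for (int b = 0; b < 4; b++)
                mdct(frame + b * blk, blk, coef + b * blk);
        }
    }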
[0026] The psycho-acoustical band energies are calculated and spread across neighboring sub-bands in order to determine a masking threshold for each sub-band. This masking threshold, or alternatively its inverse, is used to scale each sub-band prior to quantization as described above. The masking threshold is indicative of how much quantization error is tolerated in each sub-band, and this information is used to change the quantization resolution accordingly. Because, as described above, the same quantizer/quantization tool is being used for all of the sub-bands, the scaling acts as a quantization resolution change. After the quantization has been performed, the scales are re-optimized based on the quantized spectral coefficient values, where the re-optimized scales are quantized with a logarithmic scalar quantizer having, e.g., 4 bits per scale. That is, at 122, the scales are calculated and at 124, the scaling processor scales the coefficients with the non-quantized scales. At 126, the scales are re-optimized for the quantized coefficient values, and at 128, the re-calculated scales are quantized. It should be noted that the scaling processor can quantize the scale factors, where the quantization of the scale factors is performed, for example, using a 16 codeword quantizer. Therefore, in this example, one codebook can be used for each sub-band. The quantization processor performs a quantization and indexing of the scaled MDCT coefficients at 130. Thereafter, the signals are multiplexed at 140. It should be noted that there is one set of scales calculated from the masking thresholds described above.
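A minimal sketch of the scaling step, under assumptions, is given below: each sub-band is scaled with the inverse of its masking threshold and the scale is indexed with a 16-codeword (4-bit) quantizer. The band layout arrays, the 3 dB logarithmic step, and the function names are assumptions introduced only for this illustration.

    #include <math.h>

    /* Sketch: scale each sub-band by the inverse masking threshold and quantize
     * the scale with an assumed 16-level (4-bit) logarithmic scalar quantizer.
     * The 3 dB step size is an assumption, not a value taken from the text. */
    static int quantize_scale_log(double scale, double step_db)
    {
        int idx = (int)floor(20.0 * log10(scale) / step_db + 0.5);
        if (idx < 0)  idx = 0;
        if (idx > 15) idx = 15;                   /* 16 codewords, 4 bits */
        return idx;
    }

    void scale_subbands(double *coef, const int *band_start, const int *band_len,
                        const double *mask_thr, int n_bands, int *scale_idx)
    {
        for (int b = 0; b < n_bands; b++) {
            double scale = 1.0 / mask_thr[b];     /* inverse masking threshold */
            scale_idx[b] = quantize_scale_log(scale, 3.0);
            for (int i = 0; i < band_len[b]; i++)
                coef[band_start[b] + i] *= scale; /* scale prior to quantization */
        }
    }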
[0027] It should be noted that the transmission of bits from one (sub)frame to another differs from the bitpool mechanism in Advanced Audio Coding (AAC), in that it does not need a header to specify the frame boundaries, and for the transient cases, it is usable in a fixed rate case where the same number of bits is used for all the frames. It should further be noted that depending on various embodiments used, savings averaging between 3 and 10% of the bitrate allocated to spectral coefficients are achieved. Under complexity and delay constraints, the compression efficiency is increased for transients.
[0028] A global gain is estimated such that the number of bits used for quantization does not exceed the available number of bits for any given total bit rate. After division over the estimated global gain, the vector is split into 100 eight-dimensional sub-vectors and quantized in an RE8 lattice. It should be noted, however, that the vector can be split into sub-vectors of dimensions other than eight. Moreover, it should be noted that various embodiments can be implemented using various quantizers including, but not limited to, the RE8 lattice described herein. The side information and the lattice codevector indices are calculated. It should be noted that the encoding of the side information is done independently of the lattice codevector indices, as described below.
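A short sketch of the splitting step follows, assuming a signature for the lattice quantizer; re8_quantize() is a hypothetical stand-in for the RE8 tool referenced above and is not an actual function of any particular codec.

    /* Sketch: divide the spectral vector by the estimated global gain, split it
     * into eight-dimensional sub-vectors (e.g., 800/8 = 100 of them) and quantize
     * each in the RE8 lattice. re8_quantize() is hypothetical; it is assumed to
     * return the side-information value and to write the codevector index. */
    #define SUBVEC_DIM 8

    extern int re8_quantize(const double *x, long *index);   /* hypothetical */

    void split_and_quantize(const double *spec, int n, double global_gain,
                            int *nq, long *index)
    {
        double sub[SUBVEC_DIM];
        int n_sub = n / SUBVEC_DIM;
        for (int s = 0; s < n_sub; s++) {
            for (int k = 0; k < SUBVEC_DIM; k++)
                sub[k] = spec[s * SUBVEC_DIM + k] / global_gain;
            nq[s] = re8_quantize(sub, &index[s]); /* side info per sub-vector */
        }
    }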
[0029] In conventional implementations of Golomb coding, side information integers are encoded with a Golomb code using a parameter m=1 as described above. Because the value "1" is not permitted as side information, all integers n>1 are encoded with the code corresponding to n-1. A description of Golomb Rice codes can be found in "Punctured Elias codes for variable-length coding of integers", P. Fenwick, Technical report 137, ISSN 1173-3500, Department of Computer Science, University of Auckland, New Zealand, 2006. [0030] Figure 2 illustrates a table representative of Golomb codes for a first set of integers and the parameter m. For example, if m = 2 and n = 3, the corresponding Golomb code is "1 0 0." Therefore, with conventional Golomb coding, the code 1 0 0 is the mth ("2") Golomb code variable length binary codeword mapping of the nonnegative integer 3.
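For concreteness, a sketch of a Golomb power-of-two encoder is given below. The bit-writing helper put_bit() is hypothetical, and the unary-plus-remainder convention shown here is an assumption chosen so that encoding the value n-1 = 2 with m = 2 reproduces the "1 0 0" codeword of the example above.

    /* Sketch of a Golomb-Rice (GPO2) encoder with parameter m = 2^k. Convention
     * assumed here: q ones for the unary quotient, a zero stop bit, then k
     * remainder bits. Encoding v = n - 1 = 2 with k = 1 (m = 2) yields "1 0 0",
     * matching the example above. put_bit() is a hypothetical bit writer. */
    extern void put_bit(int bit);                 /* hypothetical */

    void golomb_rice_encode(unsigned v, unsigned k)
    {
        unsigned q = v >> k;                      /* quotient */
        unsigned r = v & ((1u << k) - 1u);        /* remainder */
        for (unsigned i = 0; i < q; i++)
            put_bit(1);                           /* unary part */
        put_bit(0);                               /* stop bit */
        for (int b = (int)k - 1; b >= 0; b--)
            put_bit((int)((r >> b) & 1u));        /* k remainder bits, MSB first */
    }

Under this convention the codeword length is q + 1 + k bits; the bit-counting routine reproduced further below counts such lengths without the stop bit.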
[0031] However, as described above, various embodiments effectuate the adaptation of the Golomb parameter, where according to a first embodiment, the estimation of the Golomb parameter can be based on an estimation of the mean of the data to be encoded. Since the statistics of the data change within a frame, it is customary to use a moving window for the estimation of the mean value, where one alternative for estimation involves using previous data values to estimate the mean. It should be noted, though, that other methods of estimating the Golomb parameter can be found in "Lossless adaptive Golomb/Rice decoding of integer data using backward adaptive rules", H.S. Malvar, U.S. Patent Application Publication No. 2006/0103556, 2006. In accordance with one method of estimation, for scaled spectral data, the correlation in the side information has a relatively short history length. Therefore, the last 2 or 3 values result in substantially optimal performance, although various embodiments are not limited to using these values.
[0032] For example, the side information that needs to be encoded can be the following
nq = [6 11 8 8 9 5 5 3 4 0 2 3 6 2 2 4 0 4 2 3 2 2 2 0 0 2 0 3 0 0 0 0 0 0 0 0 0]
[0033] The moving window length is 2. The first two values (nq[0] = 6 and nq[1] = 11) are encoded with a default Golomb parameter, which can be 2. For the value 8, the average (6+11)/2 = 8.5 is computed and the number of bits used in the Golomb code for the quasi-optimal Golomb parameter is calculated, where pos = 2, using the following algorithm, which is further described in "Selecting the Golomb Parameter in Rice Coding," A. Kiely, IPN Progress Report, vol. 42-159, pp. 1-8, November 15, 2004, available at http://coding.jpl.nasa.gov/~aaron/:

    if (mean < 1.62) {                /* Golomb parameter m = 1 */
        if (nq[pos] == 0) return 0;
        return nq[pos];
    }
    if (mean < 3.68) {                /* Golomb parameter m = 2 */
        if (nq[pos] == 0) return 1;
        return (nq[pos] - 1)/2 + 1;
    } else if (mean < 7.84) {         /* Golomb parameter m = 4 = 2^2 */
        if (nq[pos] == 0) return 2;
        return (nq[pos] - 1)/4 + 2;
    } else {                          /* Golomb parameter m = 8 = 2^3 */
        if (nq[pos] == 0) return 3;
        return (nq[pos] - 1)/8 + 3;
    }
[0034] This function returns the number of bits for different Golomb codes without counting the stop bit. The resulting Golomb parameter for nq[2] is 8, the resulting Golomb parameter for nq[3] is 8, etc. It should be noted that various embodiments are not limited to considering Golomb power of two (GPO2) codes. [0035] Moreover, the number of bits for side information of the sub-vectors is calculated in decreasing order of sub-vector energy (or estimated number of bits per sub-vector), to ensure that the most energetic sub-vectors will be encoded. This order is considered only when counting the bits. The order of the encoded sub-vectors in the bitstream is the "natural" order, starting with the first sub-vector. Furthermore, at the decoding end, the natural order is also followed, thus ensuring that the previous values from the moving window are available and allowing the calculation of the Golomb parameter with which the data has been encoded.
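The intra-frame adaptation described above can be summarized by the sketch below, which maps the moving-window mean to a parameter using the same thresholds as the bit-counting routine and reuses the golomb_rice_encode() sketch given earlier; the function names and the handling of the value-to-codeword offset are assumptions for illustration only.

    /* Sketch of intra-frame adaptation: the parameter for element i is derived
     * from the mean of the previous two side-information values (moving window
     * of length 2); the first two elements use the default parameter m = 2.
     * k_from_mean() mirrors the thresholds of the bit-counting routine above. */
    extern void golomb_rice_encode(unsigned v, unsigned k);   /* earlier sketch */

    unsigned k_from_mean(double mean)
    {
        if (mean < 1.62) return 0;                /* m = 1 */
        if (mean < 3.68) return 1;                /* m = 2 */
        if (mean < 7.84) return 2;                /* m = 4 */
        return 3;                                 /* m = 8 */
    }

    void encode_side_info(const unsigned *nq, int n)
    {
        for (int i = 0; i < n; i++) {
            unsigned k = 1;                       /* default m = 2 for nq[0], nq[1] */
            if (i >= 2)
                k = k_from_mean(0.5 * (nq[i - 1] + nq[i - 2]));
            golomb_rice_encode(nq[i], k);         /* offset convention as noted above */
        }
    }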
[0036] Therefore, in most frames, the implementation of various embodiments saves a number of bits with respect to the fixed Golomb code with parameter m=1. There may be scenarios when this is not the case, however, and when such a scenario occurs, one bit can be used to indicate whether the adaptive Golomb code or the fixed Golomb code with parameter m=1 is used.
[0037] In accordance with a second embodiment, instead of using the moving window average from the current frame, values from the previous frame can be used. Since the inter-frame correlation is usually higher than the intra-frame correlation, this approach can be more efficient. It should be noted, however, that utilizing inter-frame correlation for adapting Golomb codes involves inter-frame prediction, which, without any specialized frame erasure concealment method, renders the encoded bitstream sensitive to channel errors. It can, however, be used if the channel is clean. That is, in the presence of frame erasures, restricted prediction (between a small number of frames, e.g., 2 - 4) could be used. Hence, at every second (e.g., third/fourth) frame, the side information would be transmitted without any adaptation based on the previous frame. Note that the adaptation provided in the first embodiment is not affected by frame erasures. The moving window would, in this case, be centered on the current position, since all the data for the previous frame is available. [0038] For both methods, there are several bits saved with respect to the available number of bits for the spectral coefficients in the current frame. The difference arises because the estimation of the number of bits for the sub-vectors has been done without considering the correlations between them. If there are no complexity constraints, the global gain can be changed (diminished) and the data re-quantized. Moreover, if complexity is an issue, the saved bits can be borrowed and used for the next frame, when there are no delay constraints. [0039] However, for the first embodiment described above, which uses intra-frame correlation, the issue of utilizing the saved bits without affecting the delay or the sensitivity to frame erasures can be addressed as follows. Inter-frame correlation can be implicitly used at the encoder, e.g., only at the estimation of the global gain. Since the global gain is estimated based on the energy of the sub-vectors to be quantized and the number of available bits, and since, for the previous frame, there is an estimation of the number of bits saved if intra-frame correlation were to be utilized, the available number of bits can be considered to be larger by the number of bits saved at the previous frame. Therefore, while the very first frame will be left with some unused bits, the rest of the frames will have an indication indicative of how many bits are actually available.
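By way of illustration only, the restricted prediction mentioned above might be organized as sketched below; the reset period and the use of the previous frame's mean are assumptions illustrating the idea rather than a prescribed method.

    /* Sketch of restricted inter-frame adaptation: every RESET_PERIOD frames the
     * default parameter is used with no prediction, so that a frame erasure
     * cannot propagate beyond a few frames. The period (here 3, i.e., within the
     * 2 - 4 range mentioned above) and the estimator are assumptions;
     * k_from_mean() is the mapping from the earlier sketch. */
    #define RESET_PERIOD 3

    extern unsigned k_from_mean(double mean);     /* earlier sketch */

    unsigned choose_frame_parameter(int frame_idx, double prev_frame_mean)
    {
        if (frame_idx % RESET_PERIOD == 0)
            return 1;                             /* default m = 2, no prediction */
        return k_from_mean(prev_frame_mean);      /* adapt from the previous frame */
    }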
[0040] Figure 3 is a flow chart illustrating processes performed for exploiting either intra and/or inter frame correlation. At 300, the global gain is estimated upon receiving a spectral vector and the available number of bits, where N_Tot = N_fixed + N_prev_saved, and where N_prev_saved refers to the number of bits saved at a previous frame. At 310, the spectral vector is divided over the global gain, and at 320, the spectral vector is split into 8-dimensional sub-vectors. At 330, each sub-vector is quantized in an RE8 lattice and the side information is calculated. The number of bits for the side information is then calculated at 340 based upon inter/intra frame correlation. The total number of bits needed is calculated at 350 according to, e.g., the formula NB = N + 1 + NB_bits_saved, where N is the calculated number of bits needed for the side information, NB_bits_saved is indicative of the number of bits on which the number of saved bits is written, and the "1" bit from NB = N + 1 + NB_bits_saved is used to indicate what coding is used, e.g., Golomb or fixed. It is then determined at 360 whether N_fixed > NB, where N_fixed is the default number of bits given by the codec. If so, then the number of bits needed, N, is increased by N_fixed - NB in the next frame/block at 380. If not, then the default Golomb encoding with m = 1 is used at 370.
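A sketch of the bookkeeping in Figure 3 is given below using the notation above; the structure and field names are introduced only for this illustration and are not part of the codec itself.

    /* Sketch of the Figure 3 decision: NB = N + 1 + NB_bits_saved, where N is
     * the side-information bit count with the adaptive coding, one bit flags
     * adaptive versus fixed Golomb coding, and NB_bits_saved is the width of
     * the saved-bits field. If N_fixed > NB, the surplus is carried to the
     * next frame/block; otherwise the fixed code with m = 1 is used. */
    typedef struct {
        int use_adaptive;                         /* 1 = adaptive, 0 = fixed m = 1 */
        int bits_carried;                         /* surplus for the next frame/block */
    } budget_t;

    budget_t update_budget(int N_adaptive, int N_fixed, int NB_bits_saved)
    {
        budget_t b;
        int NB = N_adaptive + 1 + NB_bits_saved;
        if (N_fixed > NB) {
            b.use_adaptive = 1;
            b.bits_carried = N_fixed - NB;        /* extra bits for the next frame */
        } else {
            b.use_adaptive = 0;                   /* fall back to fixed m = 1 */
            b.bits_carried = 0;
        }
        return b;
    }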
[0041] An example illustrating the use of inter-frame prediction in accordance with various embodiments involves a scenario with transients, where a 20 ms frame is divided into four blocks on which the MDCT is performed, the four blocks being considered as four consecutive sub-frames. In this scenario, bits can be transferred from one sub-frame to another and inter-(sub)frame prediction can be utilized.

[0042] In other words, the transfer of bits from one (sub)frame to another requires, as side information, the number of bits that are saved. This is because the decoding of the bitstream in the Voronoi extension tool described above stops when the allocated number of bits is consumed. If the decoder is not aware of the bit savings, it will not stop decoding once the information relevant to the spectral data has been fully decoded. This occurs because some of the sub-vectors are not explicitly encoded; rather, knowing that the bit budget has already been spent, those sub-vectors are inferred to be null vectors. Practically speaking, seven bits (five for sub-frames) are sufficient to signal the quantity of saved bits.
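For the decoder side of paragraph [0042], the sketch below shows why the saved-bit count must be signalled: the decoder reduces its nominal budget by that amount, stops reading once the reduced budget is spent, and infers the remaining sub-vectors as null vectors. read_bits() and decode_subvector() are assumed helpers, not the codec's actual bitstream reader, and the 8-dimensional null vector mirrors the RE8 sub-vector size used above.

```python
# Illustrative decoder-side sketch: the transmitted saved-bit count tells
# the decoder where the spectral data really ends; without it the decoder
# would keep reading until the nominal budget is exhausted.

def decode_frame(reader, n_fixed, n_subvectors,
                 read_bits, decode_subvector, saved_bits_field=7):
    saved = read_bits(reader, saved_bits_field)     # bits left unused by the encoder
    budget = n_fixed - saved                        # bits actually carrying data
    subvectors, consumed = [], 0
    for _ in range(n_subvectors):
        if consumed >= budget:
            subvectors.append([0] * 8)              # not encoded: inferred null vector
        else:
            sv, used = decode_subvector(reader)
            consumed += used
            subvectors.append(sv)
    return subvectors, saved                        # `saved` augments the next budget
```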
[0043] Figure 4 is a flow chart illustrating various processes for encoding an audio signal via the adaptation of a Golomb parameter in accordance with various embodiments. At 400, a plurality of signal components are generated, where each of the plurality of signal components has an associated value. For example, and as described above, each of the plurality of signal components can be a single eight-dimensional sub-vector quantized in an RE8 lattice, and the associated value can be, e.g., the side information. At 410, each of the associated values is encoded with each of the plurality of signal components. At 420, an adaptive parameter of the entropy coder, e.g., the Golomb parameter m, is estimated using at least one previous associated value. That is, as described above, inter-frame correlation can be utilized to estimate the Golomb parameter needed to encode the side information, thereby eliminating the need to use bits to transmit the Golomb parameter m. At 430, the number of the plurality of signal components to be distributed to a bitstream is determined.

[0044] In other words, once the value of m is known, the number of bits needed to code the bitstream of side information is known. Hence, this information can be used to instruct the quantizer how many bits it can use in the next frame for quantizing the sub-vectors. The number of bits needed to code the bitstream of side information can be determined because a layered codec is utilized. That is, once a target bit rate is reached, the bandwidth that is left for the sub-vector indices (i.e., the indices from the lattice quantizer) can be determined. Thereafter, a subset of the total set of sub-vector indices can be distributed to the bitstream for the current frame. Moreover, if there is other data whose encoding can be made more efficient, e.g., scales, error spectrum shapes, etc., the saved bits can also be re-used in substantially the same manner as described herein. It should be noted that efficient entropy encoding of the side information (e.g., Golomb-Rice coding), scales (e.g., differential coding, arithmetic coding, and/or Golomb-Rice coding), error spectrum shapes, etc. can also be used to obtain a variable bit-rate version of the layered codec. An illustrative sketch of this bit accounting is given after paragraph [0045] below.

[0045] Figure 5 shows a generic multimedia communications system for use with the present invention. As shown in Figure 5, a data source 500 provides a source signal in an analog, uncompressed digital, or compressed digital format, or any combination of these formats. An encoder 510 encodes the source signal into a coded media bitstream. The encoder 510 may be capable of encoding more than one media type, such as audio and video, or more than one encoder 510 may be required to code different media types of the source signal. The encoder 510 may also receive synthetically produced input, such as graphics and text, or it may be capable of producing coded bitstreams of synthetic media. In the following, only the processing of one coded media bitstream of one media type is considered in order to simplify the description. It should be noted, however, that real-time broadcast services typically comprise several streams (typically at least one audio, one video, and one text sub-titling stream). It should also be noted that the system may include many encoders, but in the following only one encoder 510 is considered to simplify the description without loss of generality.
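Returning to the parameter estimation of paragraphs [0043]-[0044]: the following sketch estimates the Golomb parameter m from the previous frame's side information, prices the current side information under that m, and hands whatever remains of the layer budget to the lattice sub-vector indices. The simplified Golomb length model and the budget split are assumptions made for illustration, not the codec's defined behaviour.

```python
# Illustrative sketch: m is estimated from the previous frame's side
# information (so it is never transmitted), and the resulting bit count
# determines how much of the layer budget remains for sub-vector indices.

import math

def golomb_length(value: int, m: int) -> int:
    """Approximate Golomb code length: unary quotient + 1 + binary remainder."""
    q = value // m
    b = math.ceil(math.log2(m)) if m > 1 else 0
    return q + 1 + b          # simplified: ignores the truncated-binary saving

def plan_layer(prev_side_info, cur_side_info, layer_budget_bits):
    # Inter-frame correlation: derive m from the previous frame's values.
    mean_prev = sum(prev_side_info) / max(len(prev_side_info), 1)
    m = max(1, round(mean_prev))
    # Cost of the current side information under that m.
    side_bits = sum(golomb_length(v, m) for v in cur_side_info)
    # Whatever is left in the layer carries lattice sub-vector indices.
    index_budget = max(0, layer_budget_bits - side_bits)
    return m, side_bits, index_budget
```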
[0046] It should be understood that, although text and examples contained herein may specifically describe an encoding process, one skilled in the art would readily understand that the same concepts and principles also apply to the corresponding decoding process and vice versa.
[0047] The coded media bitstream is transferred to a storage 520. The storage 520 may comprise any type of mass memory to store the coded media bitstream. The format of the coded media bitstream in the storage 520 may be an elementary self-contained bitstream format, or one or more coded media bitstreams may be encapsulated into a container file. Some systems operate "live", i.e., they omit storage and transfer the coded media bitstream from the encoder 510 directly to a sender 530. The coded media bitstream is then transferred to the sender 530, also referred to as the server, on an as-needed basis. The format used in the transmission may be an elementary self-contained bitstream format or a packet stream format, or one or more coded media bitstreams may be encapsulated into a container file. The encoder 510, the storage 520, and the sender 530 may reside in the same physical device, or they may be included in separate devices. The encoder 510 and the sender 530 may operate with live real-time content, in which case the coded media bitstream is typically not stored permanently but rather buffered for small periods of time in the content encoder 510 and/or in the sender 530 to smooth out variations in processing delay, transfer delay, and coded media bitrate.
[0048] The sender 530 sends the coded media bitstream using a communication protocol stack. The stack may include, but is not limited to, the Real-Time Transport Protocol (RTP), the User Datagram Protocol (UDP), and the Internet Protocol (IP). When the communication protocol stack is packet-oriented, the sender 530 encapsulates the coded media bitstream into packets. For example, when RTP is used, the sender 530 encapsulates the coded media bitstream into RTP packets according to an RTP payload format. Typically, each media type has a dedicated RTP payload format. It should again be noted that a system may contain more than one sender 530, but for the sake of simplicity the following description considers only one sender 530.

[0049] The sender 530 may or may not be connected to a gateway 540 through a communication network. The gateway 540 may perform different types of functions, such as translation of a packet stream from one communication protocol stack to another, merging and forking of data streams, and manipulation of a data stream according to the downlink and/or receiver capabilities, such as controlling the bit rate of the forwarded stream according to prevailing downlink network conditions. Examples of gateways 540 include multipoint conference control units (MCUs), gateways between circuit-switched and packet-switched video telephony, Push-to-talk over Cellular (PoC) servers, IP encapsulators in digital video broadcasting-handheld (DVB-H) systems, and set-top boxes that forward broadcast transmissions locally to home wireless networks. When RTP is used, the gateway 540 is called an RTP mixer and acts as an endpoint of an RTP connection.
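As a generic illustration of the RTP encapsulation mentioned in paragraph [0048], the sketch below prepends a minimal RFC 3550 fixed header to a coded-audio payload. The dynamic payload type, SSRC, and other field values are placeholders chosen for illustration; a real system would use whatever RTP payload format is defined for the codec in question.

```python
# Generic RTP packetization sketch (RFC 3550 fixed header, no CSRC list
# or extensions).  Payload type 96 is an arbitrary dynamic value, not the
# codec's registered payload format.

import struct

def rtp_packet(payload: bytes, seq: int, timestamp: int,
               ssrc: int = 0x12345678, payload_type: int = 96,
               marker: bool = False) -> bytes:
    byte0 = 2 << 6                                   # version 2, P = X = 0, CC = 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    return header + payload                          # 12-byte header + coded frame
```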
[0050] The system includes one or more receivers 550, typically capable of receiving, de-modulating, and de-capsulating the transmitted signal into a coded media bitstream. The coded media bitstream is typically processed further by a decoder 560, whose output is one or more uncompressed media streams. Finally, a renderer 570 may reproduce the uncompressed media streams with a loudspeaker or a display, for example. The receiver 550, the decoder 560, and the renderer 570 may reside in the same physical device or they may be included in separate devices.

[0051] It should be noted that the bitstream to be decoded can be received from a remote device located within virtually any type of network. Additionally, the bitstream can be received from local hardware or software.

[0052] Scalability in terms of bitrate, decoding complexity, and picture size is a desirable property for heterogeneous and error-prone environments. This property is desirable in order to counter limitations such as constraints on bit rate, content resolution, network throughput, and computational power in a receiving device.

[0053] Communication devices of the present invention may communicate using various transmission technologies including, but not limited to, Code Division Multiple Access (CDMA), Global System for Mobile Communications (GSM), Universal Mobile Telecommunications System (UMTS), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Transmission Control Protocol/Internet Protocol (TCP/IP), Short Messaging Service (SMS), Multimedia Messaging Service (MMS), e-mail, Instant Messaging Service (IMS), Bluetooth, IEEE 802.11, etc. A communication device may communicate using various media including, but not limited to, radio, infrared, laser, cable connection, and the like.
[0054] Figures 6 and 7 show one representative mobile device 12 within which the present invention may be implemented. It should be understood, however, that the present invention is not intended to be limited to one particular type of mobile device 12 or other electronic device. Some or all of the features depicted in Figures 6 and 7 could be incorporated into any or all of the devices represented in Figure 5.

[0055] The mobile device 12 of Figures 6 and 7 includes a housing 30, a display 32 in the form of a liquid crystal display, a keypad 34, a microphone 36, an ear-piece 38, a battery 40, an infrared port 42, an antenna 44, a smart card 46 in the form of a UICC according to one embodiment of the invention, a card reader 48, radio interface circuitry 52, codec circuitry 54, a controller 56 and a memory 58. Individual circuits and elements are all of a type well known in the art, for example in the Nokia range of mobile telephones.
[0056] The present invention is described in the general context of method steps, which may be implemented in one embodiment by a program product including computer-executable instructions, such as program code, executed by computers in networked environments. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.

[0057] Software and web implementations of the present invention could be accomplished with standard programming techniques, with rule-based logic and other logic to accomplish the various database searching steps, correlation steps, comparison steps, and decision steps. It should also be noted that the words "component" and "module," as used herein and in the claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving manual inputs.

[0058] The foregoing description of embodiments of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the present invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the present invention. The embodiments were chosen and described in order to explain the principles of the present invention and its practical application, to enable one skilled in the art to utilize the present invention in various embodiments and with various modifications as are suited to the particular use contemplated.

Claims

WHAT IS CLAIMED IS:
1. A method of encoding, comprising: generating a plurality of signal components; encoding an associated value with each of the plurality of signal components, wherein an adaptive parameter is estimated using at least one previous associated value; and utilizing information relative to a value of the adaptive parameter to determine a number of bits needed for the plurality of signal components to be distributed to a bitstream.
2. The method of claim 1, wherein each of the plurality of signal components is comprised of a lattice quantized sub-vector.
3. The method of claim 1, wherein the associated value comprises side information indicative of a number of bits used for a lattice codevector.
4. The method of claim 3, wherein the side information comprises a plurality of non-negative integers.
5. The method of claim 1, wherein the adaptive parameter comprises a Golomb parameter associated with a Golomb Rice algorithm.
6. The method of claim 1, wherein estimating the adaptive parameter further comprises utilizing passed data sets of at least one of an inter-frame and an intra-frame prediction process.
7. The method of claim 1, wherein the number of bits needed for the plurality of signal components to be distributed to the bitstream is further utilized to instruct a quantizer regarding a number of bits to be used in a next frame for quantizing sub-vectors.
8. The method of claim 1, wherein the plurality of signal components are derived from a modified discrete cosine transform and lattice quantized input audio signal.
9. A computer program product, embodied on a computer-readable medium, comprising computer code configured to perform the processes of claim 1.
10. An apparatus, comprising: a processor; and a memory unit communicatively connected to the processor and including: computer code configured to generate a plurality of signal components for encoding an input signal; computer code configured to encode an associated value with each of the plurality of signal components, wherein an adaptive parameter is estimated using at least one previous associated value; and computer code configured to utilize information relative to a value of the adaptive parameter to determine a number of bits needed for the plurality of signal components to be distributed to a bitstream.
11. The apparatus of claim 10, wherein each of the plurality of signal components is comprised of a lattice quantized sub-vector.
12. The apparatus of claim 10, wherein the associated value comprises side information indicative of a number of bits used for a lattice codevector.
13. The apparatus of claim 12, wherein the side information comprises a plurality of non-negative integers.
14. The apparatus of claim 10, wherein the adaptive parameter comprises a Golomb parameter associated with a Golomb Rice algorithm.
15. The apparatus of claim 10, wherein the memory unit further comprises computer code configured to utilize passed data sets of at least one of an inter-frame and an intra-frame prediction process to estimate the adaptive parameter.
16. The apparatus of claim 10, wherein the memory unit further comprises computer code configured to instruct a quantizer regarding a number of bits to be used in a next frame for quantizing sub-vectors based upon the number of bits needed for the plurality of signal components to be distributed to the bitstream.
17. The apparatus of claim 10, wherein the input signal is an input audio signal, and wherein the plurality of signal components are derived from a modified discrete cosine transform and lattice quantized input audio signal.
18. Encoding means, comprising: means for generating a plurality of signal components for encoding an input signal; means for encoding an associated value with each of the plurality of signal components, wherein an adaptive parameter is estimated using at least one previous associated value; and means for utilizing information relative to a value of the adaptive parameter to determine a number of bits needed for the plurality of signal components to be distributed to a bitstream.
19. The encoding means of claim 18, wherein each of the plurality of signal components is comprised of a lattice quantized sub-vector.
20. A method, comprising: decoding information relative to a value of an adaptive parameter to determine a number of bits needed for the plurality of signal components to be distributed to a bitstream, wherein the adaptive parameter is estimated using at least one previous associated value; decoding an associated value included with each of a plurality of signal components; and generating an output signal represented by the plurality of signal components.
21. The method of claim 20, wherein each of the plurality of signal components is comprised of a lattice quantized sub-vector.
22. The method of claim 21, wherein a natural order associated with each of the lattice quantized sub-vectors is followed during the decoding of the information.
23. The method of claim 20, wherein the associated value comprises side information indicative of a number of bits used for a lattice codevector.
24. The method of claim 23, wherein the side information comprises a plurality of non-negative integers.
25. The method of claim 20, wherein the adaptive parameter comprises a Golomb parameter associated with a Golomb Rice algorithm.
26. A computer program product, embodied on a computer-readable medium, comprising computer code configured to perform the processes of claim 20.
27. An apparatus, comprising: a processor; and a memory unit communicatively connected to the processor and including: computer code configured to decode information relative to a value of an adaptive parameter to determine a number of bits needed for the plurality of signal components to be distributed to a bitstream for decoding the bitstream, wherein the adaptive parameter is estimated using at least one previous associated value; computer code configured to decode an associated value included with each of a plurality of signal components; and computer code configured to generate an output signal represented by the plurality of signal components.
28. The apparatus of claim 27, wherein each of the plurality of signal components is comprised of a lattice quantized sub-vector.
29. The apparatus of claim 28, wherein the associated value comprises side information indicative of a number of bits used for a lattice codevector.
30. The apparatus of claim 27, wherein the side information comprises a plurality of non-negative integers.
31. The apparatus of claim 27, wherein the adaptive parameter comprises a Golomb parameter associated with a Golomb Rice algorithm.
32. The apparatus of claim 27, wherein the input signal is an input audio signal, and wherein the plurality of signal components are derived from a modified discrete cosine transform and lattice quantized input audio signal.
33. Decoder means, comprising: means for decoding information relative to a value of an adaptive parameter to determine a number of bits needed for the plurality of signal components to be distributed to a bitstream for decoding the bitstream, wherein the adaptive parameter is estimated using at least one previous associated value; means for decoding an associated value included with each of a plurality of signal components; and means for generating an output signal represented by the plurality of signal components.
34. The decoder means of claim 33, wherein each of the plurality of signal components is comprised of a lattice quantized sub-vector.
PCT/IB2008/053986 2007-10-05 2008-10-01 System and method for combining adaptive golomb coding with fixed rate quantization WO2009044346A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US97805107P 2007-10-05 2007-10-05
US60/978,051 2007-10-05

Publications (1)

Publication Number Publication Date
WO2009044346A1 true WO2009044346A1 (en) 2009-04-09

Family

ID=40328230

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2008/053986 WO2009044346A1 (en) 2007-10-05 2008-10-01 System and method for combining adaptive golomb coding with fixed rate quantization

Country Status (1)

Country Link
WO (1) WO2009044346A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7106228B2 (en) * 2002-05-31 2006-09-12 Voiceage Corporation Method and system for multi-rate lattice vector quantization of a signal

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AARON KIELY: "Selecting the Golomb Parameter in Rice Coding", IPN PROGRESS REPORTS, vol. 42-159, 15 November 2004 (2004-11-15), pages 1 - 18, XP002514321 *
GUILLAUME FUCHS; ROCH LEFEBVRE: "A Scalable CELP/Transform Coder for Low Bit Rate Speech and Audio Coding", PROCEEDINGS OF THE 120TH AES CONVENTION, 20 May 2006 (2006-05-20) - 23 May 2006 (2006-05-23), pages 1 - 9, XP002514322 *
KELVIN H. C. ENG; DONG-YAN HUANG; SAY WEI FOO: "A New Bit Allocation Method for Low Delay Audio Coding at Low Bit Rates", PROCEEDINGS OF THE 112TH AES CONVENTION, 10 May 2002 (2002-05-10) - 13 May 2002 (2002-05-13), pages 1 - 6, XP002514320 *
STEPHANE RAGOT: "Nouvelles techniques de quantification vectorielle algébrique basées sur le codage de Voronoi ? Application au codage AMR-WB+", May 2003, SHERBROOKE UNIVERSITY, XP002514323 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105719654A (en) * 2011-04-21 2016-06-29 三星电子株式会社 Decoding device and method for sound signal or audio signal and quantizing device
EP2837185A4 (en) * 2012-04-13 2016-03-16 Canon Kk Method, apparatus and system for encoding and decoding a subset of transform units of encoded video data
RU2634214C1 (en) * 2012-04-13 2017-10-24 Кэнон Кабусики Кайся Method, device and system for coding and decoding subset of units of conversion of coded video
KR101818102B1 (en) 2012-04-13 2018-01-12 캐논 가부시끼가이샤 Method, apparatus and system for encoding and decoding a subset of transform units of encoded video data
KR20180006495A (en) * 2012-04-13 2018-01-17 캐논 가부시끼가이샤 Method, apparatus and system for encoding and decoding a subset of transform units of encoded video data
RU2667715C1 (en) * 2012-04-13 2018-09-24 Кэнон Кабусики Кайся Method, device and system for coding and decoding conversion units of coded video data
KR101974320B1 (en) 2012-04-13 2019-04-30 캐논 가부시끼가이샤 Method, apparatus and medium for encoding and decoding a sub block of transform units of video data
US10873761B2 (en) 2012-04-13 2020-12-22 Canon Kabushiki Kaisha Method, apparatus and system for encoding and decoding a subset of transform units of encoded video data
US10652543B2 (en) 2018-09-13 2020-05-12 Sony Corporation Embedded codec circuitry and method for frequency-dependent coding of transform coefficients
CN117411947A (en) * 2023-12-15 2024-01-16 安徽中科大国祯信息科技有限责任公司 Cloud edge cooperation-based water service data rapid transmission method
CN117411947B (en) * 2023-12-15 2024-02-23 安徽中科大国祯信息科技有限责任公司 Cloud edge cooperation-based water service data rapid transmission method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08836346

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08836346

Country of ref document: EP

Kind code of ref document: A1