US20120232913A1 - Methods and systems for bit allocation and partitioning in gain-shape vector quantization for audio coding - Google Patents
- Publication number
- US20120232913A1 (application US 13/414,490)
- Authority
- US
- United States
- Prior art keywords
- shape
- bits
- gain
- codebook
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
- G10L19/038—Vector quantisation, e.g. TwinVQ audio
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/002—Dynamic bit allocation
Definitions
- One or more implementations relate generally to digital communications, and more specifically to eliminating quantization distortion in audio codecs.
- codecs coder-decoders
- Vector quantization is used in many signal compression applications.
- Each vector is called a code vector or a codeword and the set of all the codewords is called a codebook.
- the encoder takes an input vector and outputs the index of the codeword that offers the lowest distortion.
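- As an illustration of the codeword search described above, the following minimal Python sketch returns the index of the lowest-distortion codeword. The function name `nearest_codeword`, the use of squared error as the distortion measure, and the toy codebook are assumptions made for this example and are not taken from the specification.

```python
def nearest_codeword(codebook, vector):
    """Return the index of the codeword with the lowest squared error."""
    def squared_error(codeword):
        return sum((c - v) ** 2 for c, v in zip(codeword, vector))

    # Exhaustive search: evaluate the distortion of every codeword.
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


# Toy two-dimensional codebook (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
index = nearest_codeword(codebook, (0.9, 0.2))
```

- Only `index` would need to be transmitted; a decoder holding the same codebook looks up `codebook[index]` to reconstruct the vector.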
- Gain shape vector quantization is a type of vector quantization method that has become widely used in high quality speech coding systems, and is generally used when it is important to preserve the energy of the vector.
- FIG. 1 is a diagram of an encoder circuit that implements a bit allocation and band partitioning scheme in an audio coding system, under an embodiment.
- FIG. 2 is a diagram of a decoder circuit that implements a bit allocation and band partitioning scheme in an audio coding system, under an embodiment.
- FIG. 3 is a diagram that illustrates the partitioning of audio bands into gain and shape units for use with a bit allocation and partitioning scheme in a gain shape vector quantization coding system, under an embodiment.
- FIG. 4 is a diagram that illustrates the iterative splitting of shape units to match codebook size, under an embodiment.
- FIG. 5 is a flowchart that illustrates a method of performing bit allocation in a gain shape vector quantization coding system, under an embodiment.
- Embodiments are generally directed to systems and methods for bit allocation and band partitioning for gain-shape vector quantization in an audio codec.
- the method uses an implicit, dynamic scheme to allow an encoder and decoder to recreate a series of bit allocation decisions without transmitting additional side information for each decision, based on the number of bits that are left remaining and available in a given packet. Since packet-switched networks for real-time communication must already convey the size of the packet, this side channel reduces the amount of explicit side information that must be transmitted, thus improving compression of the audio signal.
- the band comprising the allocation of bits for the shape is recursively split into equal partitions until the size of each partition is less than the maximum codebook size.
- any of the embodiments described herein may be used alone or together with one another in any combination.
- the one or more implementations encompassed within this specification may also include embodiments that are only partially mentioned or alluded to or are not mentioned or alluded to at all in this brief summary or in the abstract.
- aspects of the one or more embodiments described herein may be implemented on one or more computers or processor-based devices executing software instructions.
- the computers may be networked in a peer-to-peer or other distributed computer network arrangement (e.g., client-server), and may be included as part of an audio and/or video processing and playback system.
- As shown in FIG. 1, the encoder 100 is a transform codec circuit based on the modified discrete cosine transform (MDCT) using a codebook for transform coefficients in the frequency domain.
- the input signal is a pulse-code modulated (PCM) signal that is input to a pre-filter stage 102 .
- the PCM coded input signal is segmented into relatively small overlapping blocks by segmentation component 104 .
- the block-segmented signal is input to the MDCT function 106 and transformed to frequency coefficients through an MDCT function.
- the frequency coefficients are grouped to resemble the critical bands of the human auditory system.
- the entire amount of energy of each group is analyzed in band energy component 108 , and the values quantized in quantizer 110 for data reduction.
- the quantized energy values are compressed through prediction by transmitting only the difference to the predicted values (delta encoding).
- the unquantized band energy values are removed from the raw MDCT coefficients (normalization) in function 113.
- the coefficients of the resulting residual signal (the so-called “band shape”) are coded by Pyramid Vector Quantization (PVQ) block 112 .
- PVQ is a form of spherical vector quantization using the lattice points of a pyramidal shape in multidimensional space as the quantizer codebook for quickly and efficiently quantizing Laplacian-like data, such as data generated by transforms or subband filters.
- This encoding process produces code words of fixed (predictable) length, which in turn enables robustness against bit errors and removes any need for entropy encoding.
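- To make the fixed codeword length concrete, the following Python sketch counts the lattice points of a pyramid codebook, that is, the integer vectors of dimension N whose absolute values sum to K, and derives the number of bits needed to index them. The recurrence used is the standard combinatorial count for such vectors; the function names `pvq_codebook_size` and `pvq_bits` are hypothetical, and the specification does not prescribe this code.

```python
from functools import lru_cache
from math import ceil, log2


@lru_cache(maxsize=None)
def pvq_codebook_size(n, k):
    """Count integer vectors of dimension n whose absolute values sum to k."""
    if k == 0:
        return 1  # only the all-zero vector
    if n == 0:
        return 0  # no coordinates left to hold the remaining pulses
    return (pvq_codebook_size(n - 1, k)
            + pvq_codebook_size(n, k - 1)
            + pvq_codebook_size(n - 1, k - 1))


def pvq_bits(n, k):
    """Bits needed for a fixed-length index into the (n, k) codebook."""
    return ceil(log2(pvq_codebook_size(n, k)))
```

- Under these assumptions, `pvq_codebook_size(16, 16)` already exceeds 2**32, which illustrates why a codeword index for a moderately sized band may not fit in a single 32-bit word.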
- the output of the encoder is coded into a single bitstream by a range encoder 114 .
- the bitstream output from the range encoder 114 is then transmitted to the decoder circuit.
- the encoder 100 uses a technique known as band folding, which delivers a similar effect to the spectral band replication by reusing coefficients of lower bands for higher bands, while also reducing algorithmic delay and computational complexity.
- FIG. 2 is a block diagram of a decoder circuit that implements a bit allocation and band partitioning scheme in an audio coding system, under an embodiment.
- the decoder 200 receives the encoded data from the encoder and processes the input signal through a range decoder 202 . From the range decoder 202 , the signal is passed through an energy decoder 203 and a PVQ decoder 208 , and to pitch post filter 210 . The values from PVQ decoder 208 are multiplied to the band shape coefficients by function 204 , and then transformed back to PCM data through inverse MDCT function 206 . The individual blocks may be rejoined using weighted overlap-add (WOLA) in a folding block.
- A bit allocation and partitioning function 120 that is incorporated as part of PVQ 112 provides the bit allocation and partitioning functions described herein.
- a separate bit allocation block 205 provides bit allocation data to the energy decoder 203 and PVQ decoder 208 .
- a similar bit allocation block may be provided on the encoder side between quantizer 110 and PVQ 112 for symmetry between the encoder and decoder.
- the codec represented by FIG. 1 and FIG. 2 may be an audio codec, such as the CELT (Constrained Energy Lapped Transform) codec developed by the Xiph.Org Foundation. It should be noted, however, that any similar codec might be used.
- an input audio signal is mapped from the time domain into a set of frequency domain coefficients, using a transform function.
- This function may be either a transform with a fixed resolution across all frequencies, such as the Modified Discrete Cosine Transform (MDCT), or one with variable time-frequency (TF) resolution.
- Embodiments of the codec circuits of FIGS. 1 and 2 are used to implement a signal compression system that employs gain shape vector quantization methods.
- a vector quantization method comprises passing signal vectors of a codebook through a synthesis filter to reproduce signals and using error values between the reproduced signals and the input signal in order to determine the index of a signal vector having the smallest error.
- In gain-shape vector quantization, a vector is expressed in terms of a gain and a shape, where the shape is a unit-norm vector that can be coded using a codebook of unit-norm vectors. The gain and shape can be quantized separately, each with its own number of bits, so that the split of bits determines whether the gain or the shape is represented more accurately.
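- The gain-shape decomposition can be sketched in a few lines of Python. This is a minimal illustration that assumes the Euclidean norm as the gain; the function names and the handling of the all-zero vector are choices made for this example only.

```python
import math


def gain_shape_split(vector):
    """Split a vector into a scalar gain and a unit-norm shape."""
    gain = math.sqrt(sum(v * v for v in vector))
    if gain == 0.0:
        # Assumption for this sketch: a zero vector has no defined shape,
        # so it is returned unchanged with a gain of zero.
        return 0.0, list(vector)
    return gain, [v / gain for v in vector]


def gain_shape_merge(gain, shape):
    """Rebuild the vector from its gain and shape."""
    return [gain * s for s in shape]


gain, shape = gain_shape_split([3.0, 4.0])
```

- Because the gain is coded on its own, the energy of the reconstructed vector depends only on the gain quantizer, regardless of how coarsely the shape is quantized.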
- Embodiments of the signal processing systems and methods described herein implement methods for bit allocation and band partitioning for use in an audio codec based on gain-shape vector quantization. In certain audio applications, these methods allow for the practical adaptation of bit rates from 32 kbps to 255 kbps per channel and latencies of 5 ms or less up to more than 20 ms.
- the system uses an implicit-dynamic scheme to allow an encoder and decoder both to recreate a series of bit allocation decisions without requiring the transmission of additional side information.
- Each of the encoder 100 and decoder 200 stages executes a respective bit allocation and partitioning process 120 and 220 to determine appropriate bit allocations for the gain and shape values of the audio signal.
- the input PCM signal is partitioned into (possibly overlapping) frames, each of which may contain one or more blocks that are transformed to frequency coefficients through an MDCT (or similar) function.
- the frequency coefficients are grouped into a number of bands, whose size may vary to match properties of the human ear. This accounts for psychoacoustic effects associated with audio signal processing.
- Each band may further group coefficients into tiles, where each tile contains coefficients from that band corresponding to a single block.
- the bands are then quantized, coded, and transmitted to the decoder 200 , and may possibly undergo time-frequency (TF)-resolution changes (such as described in U.S. Patent App. No. 61/384,154).
- FIG. 3 is a diagram that illustrates the partitioning of audio bands into subsequent units for use with a bit allocation and partitioning scheme in a gain shape vector quantization coding system, under an embodiment.
- coefficients representing the audio content 302 are partitioned into one or more of bands 304 , whose size may vary to match properties of the human ear. These coefficients may be the output of any appropriate process, such as a time-domain filtering operation, the excitation of an LPC (Linear Predictive Coding) model, the result of a subband filterbank such as the MDCT, or a combination of these processes, or the result of some other processing.
- the codec system under an embodiment includes a gain-shape allocation function that determines the number of bits to allocate to coding the gain versus the number of bits to code the shape. Essentially, the system determines the size of the codebook to be used for the gain (bit rate) and then uses the remaining bits to code the shape. After coding an initial set of parameters, such as flags to set the operating mode, transform sizes, filtering parameters, a coarse representation of the gains, or other side information, any remaining bits in the packet are distributed to the individual bands. The exact method of distributing bits to bands is usually based on psychoacoustic principles, which are well known in the art. It depends on the specific representation of audio content being used, and may additionally benefit from a small amount of side information to adapt to the signal being coded.
- $N$ denotes the number of dimensions of the vector and $b$ the target bitrate.
- MSE: mean squared error
- one known method derives this allocation under the assumptions that the gain is quantized using an A-law quantizer and the shape is quantized using an optimal spherical quantizer (for which there is no known construction for arbitrary dimension) and that the bitrate b is large.
- the result for the size of the codebook to use for the gain, $N_g$, is given in Eq. 1 as follows:
- $$N_g = \left(\frac{(N-1)\,C_g}{C_{svq}}\right)^{\frac{N-1}{2N}} 2^{\frac{b}{N}}, \qquad (1)$$
- $C_g$ is a constant that depends on the A-law quantizer parameter, but not on $N$ or $b$.
- the value of C svq is:
- The expressions for $N_g$ and $C_{svq}$ are quite complicated and require several processor-intensive division operations, as well as the evaluation of several transcendental functions.
- The desired result is $\log_2 N_g$, the number of bits to use, rather than $N_g$ itself, which further complicates the situation.
- these calculations are not particularly well suited for implementation on low-powered DSP processors, such as may be found in many commercial audio compression systems.
- the assumption that b is large gives suboptimal results when b is in fact small, as is often the case for low-bitrate audio coding.
- a gain-shape allocation method uses an approximation to simplify the gain-shape bit allocation calculations and thereby reduce the processing load.
- the process applies an approximation function for large factorials (e.g., Stirling's approximation) to Eq. (2) above to produce the following expression:
- bit allocation for the gain is actually computed via the expression:
- $G$ and $G_2$ are experimentally chosen constants (selected to be close to $\tfrac{1}{2}\log_2 C_g$ and $G + N/2$, respectively), and a is a low-rate correction factor determined as follows:
- the codebook size may be limited to various sizes (such as a whole number of bits), subject to some maximum, $b_g^{\max}$.
- the actual size of the codebook is determined as given in Eq. 8, as follows:
- the constants G and G 2 can be chosen experimentally by an offline training procedure. This procedure first collects a large number of training vectors to be quantized, and measures the average MSE after quantizing at every supported combination of gain quantizer bitrate and shape quantizer bitrate. For a given target bitrate and for each supported gain quantizer bitrate, the process finds the largest shape quantizer bitrate that yields a total less than the target, and the smallest shape quantizer bitrate that yields a total greater than the target, and uses these to interpolate an average MSE value at the target bitrate. Finally, the process selects the gain quantizer bitrate that minimizes this interpolated MSE for the target bitrate.
- the process then repeats with all supported N>2 for all desired bitrate targets, and picks the value of G that minimizes the mismatch between the decisions made by this process and those made by Eq. 8.
- the roles of gain and shape can be reversed in this process, but there are typically fewer supported gain bitrates than shape bitrates, which can make this option less efficient.
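- The selection step of the offline training procedure can be sketched as follows. This Python example assumes that the measured MSE values are stored per gain bitrate as lists of (shape bits, MSE) pairs and that the interpolation is linear; the specification states only that the values are interpolated. The names `interpolated_mse` and `best_gain_bits` and all numbers in the toy table are hypothetical and are not measured results.

```python
def interpolated_mse(shape_points, gain_bits, target_bits):
    """Interpolate the MSE at target_bits for one gain quantizer bitrate.

    shape_points is a list of (shape_bits, mse) pairs measured offline.
    """
    below = [(b, m) for b, m in shape_points if gain_bits + b < target_bits]
    above = [(b, m) for b, m in shape_points if gain_bits + b > target_bits]
    if not below or not above:
        return None  # the target cannot be bracketed at this gain bitrate
    b_lo, m_lo = max(below)  # largest shape bitrate with total below target
    b_hi, m_hi = min(above)  # smallest shape bitrate with total above target
    t = (target_bits - gain_bits - b_lo) / (b_hi - b_lo)
    return m_lo + t * (m_hi - m_lo)  # linear interpolation (assumption)


def best_gain_bits(measurements, target_bits):
    """Return (gain_bits, mse) minimizing the interpolated MSE, or None."""
    best = None
    for gain_bits, shape_points in measurements.items():
        mse = interpolated_mse(shape_points, gain_bits, target_bits)
        if mse is not None and (best is None or mse < best[1]):
            best = (gain_bits, mse)
    return best


# Made-up table: gain bits -> [(shape bits, average MSE), ...].
toy_measurements = {
    2: [(5.2, 0.40), (8.6, 0.20)],
    3: [(4.1, 0.30), (7.5, 0.10)],
}
choice = best_gain_bits(toy_measurements, target_bits=10)
```

- Repeating this selection over all supported band sizes and target bitrates would give the reference decisions against which candidate values of the constants can be compared.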
- Eq. 8 may be approximated using fixed-point integer arithmetic. The equation requires only a single division and a single logarithm calculation, both of which can be accelerated through the use of a small lookup table.
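- A table-driven logarithm of the kind mentioned above can be sketched as follows. This Python example is not Eq. 8 itself; it only illustrates how a base-2 logarithm may be approximated with integer shifts and a small lookup table. The Q8 output format, the 16-entry table, and the name `log2_fixed` are assumptions made for this sketch; on a DSP the table would be stored as precomputed constants rather than built with floating point.

```python
import math

FRAC_BITS = 8    # output is in Q8 fixed point (assumption for this sketch)
TABLE_BITS = 4   # 16-entry lookup table

# Table of log2(1 + i/16) in Q8, indexed by the top mantissa bits.
LOG2_TABLE = [round(math.log2(1 + i / (1 << TABLE_BITS)) * (1 << FRAC_BITS))
              for i in range(1 << TABLE_BITS)]


def log2_fixed(x):
    """Approximate log2(x) for an integer x >= 1, returned in Q8."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    exponent = x.bit_length() - 1  # integer part of log2(x)
    mask = (1 << TABLE_BITS) - 1
    # Take the TABLE_BITS bits that follow the leading one bit.
    if exponent >= TABLE_BITS:
        index = (x >> (exponent - TABLE_BITS)) & mask
    else:
        index = (x << (TABLE_BITS - exponent)) & mask
    return (exponent << FRAC_BITS) + LOG2_TABLE[index]
```

- The table truncates the mantissa, so the result can be low by a little under 0.09 bits; a larger table or interpolation between entries would reduce this error at the cost of memory or a few extra operations.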
- the normalized coefficients of an entire band, which constitute the “shape,” are quantized.
- Ideally, the normalized coefficients of an entire band, which compose the shape, would be quantized with a single vector quantizer, but in practice efficient vector quantizers with codewords larger than a typical microprocessor word (e.g., 32 bits) are difficult to implement. That is, the number of bits allocated for the shape may be on the order of hundreds of bits, but a codebook of that size would be too large for practical purposes. To address this issue, the process performs a band partitioning and allocation procedure.
- Algebraic codebooks such as the Pyramid Vector Quantizer are an ideal choice for a vector quantizer when a large number of band sizes, $N$, and bit rates, $b_s$, must be supported. They can be implemented for sizes larger than 32 bits using multiple-precision arithmetic, but this has a large cost in terms of computation time, code size, and data size.
- the following described method of band partitioning and allocation generally works with any suitable vector quantizer, but the Pyramid Vector Quantizer is used in a preferred embodiment.
- If a band is allocated more than a certain number of bits for the shape, it is recursively split into halves (partitioned) until the allocation for each partition becomes small enough to code with a single vector quantization codeword, or until the maximum partition depth is reached.
- the exact number of bits required to trigger a split may vary from band to band, or even among the partitions within a band.
- a threshold is set a constant amount above the largest codebook size for the current partition (usually close to 32 bits, but sometimes significantly smaller), and the partition is split into two more partitions only if its target allocation exceeds this threshold.
- Because splitting reduces the VQ (vector quantization) dimension of the codebooks used, it adds a small amount of coding inefficiency. The constant amount added to the threshold helps compensate for this overhead by avoiding splitting when the increased bit allocation would not result in lower distortion.
- Alternative embodiments may utilize other splitting rules, such as splitting when the allocation exceeds a fixed threshold (such as 32 bits), which is simpler to implement and reduces compression performance only slightly.
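- The recursive splitting rule can be sketched in Python as follows. This is a simplified illustration: it divides the bits evenly between the two halves, whereas the described method also spends bits on a gain-ratio parameter and may skew the split between the halves. The function name `partition_band` and the parameters `max_codeword_bits`, `split_margin`, and `max_depth` are hypothetical.

```python
def partition_band(num_coeffs, shape_bits, max_codeword_bits=32,
                   split_margin=0, max_depth=4, depth=0):
    """Return a list of (num_coeffs, bits) leaf partitions for one band."""
    # Split only when the allocation exceeds the largest codeword size
    # by more than a constant margin.
    threshold = max_codeword_bits + split_margin
    if shape_bits <= threshold or depth >= max_depth or num_coeffs < 2:
        return [(num_coeffs, shape_bits)]

    left_coeffs = num_coeffs // 2
    left_bits = shape_bits // 2  # even split: an assumption of this sketch
    left = partition_band(left_coeffs, left_bits, max_codeword_bits,
                          split_margin, max_depth, depth + 1)
    right = partition_band(num_coeffs - left_coeffs, shape_bits - left_bits,
                           max_codeword_bits, split_margin, max_depth,
                           depth + 1)
    return left + right


leaves = partition_band(16, 100)
```

- With these example values, a 16-coefficient band holding 100 bits is split twice into four partitions of 4 coefficients and 25 bits each. Setting `split_margin=4` with a 70-bit allocation stops after one split, leaving two partitions of 35 bits each, slightly above the 32-bit codeword size, which corresponds to avoiding a split whose overhead would outweigh its benefit.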
- the codec must determine the optimal bit allocations for $\theta$, $x_1$, and $x_2$, denoted $b_\theta$, $b_1$, and $b_2$, respectively.
- the value $\theta$ represents the ratio of the gains
- $x_1$ and $x_2$ are the normalized shapes that are generated after factoring out the gains from $y_1$ and $y_2$.
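- One way to picture the split of a shape into two halves is the following Python sketch. It assumes that the gain-ratio parameter is represented as an angle, written here as theta and computed as the arctangent of the ratio of the two half-band gains. That reading is suggested by the later references to the cosine, sine, and tangent of the quantized parameter, but the exact definition is not reproduced in this text. The function names are hypothetical, and no quantization is performed.

```python
import math


def split_with_angle(shape):
    """Split a unit-norm vector into two normalized halves and an angle."""
    half = len(shape) // 2
    y1, y2 = shape[:half], shape[half:]
    g1 = math.sqrt(sum(v * v for v in y1))  # gain of the first half
    g2 = math.sqrt(sum(v * v for v in y2))  # gain of the second half
    theta = math.atan2(g2, g1)              # assumed form of the gain ratio
    x1 = [v / g1 for v in y1] if g1 > 0 else list(y1)
    x2 = [v / g2 for v in y2] if g2 > 0 else list(y2)
    return theta, x1, x2


def merge_with_angle(theta, x1, x2):
    """Rebuild the unit-norm vector from the angle and the two halves."""
    return ([math.cos(theta) * v for v in x1]
            + [math.sin(theta) * v for v in x2])


theta, x1, x2 = split_with_angle([0.6, 0.0, 0.8, 0.0])
rebuilt = merge_with_angle(theta, x1, x2)
```

- Because the input has unit norm, the two gains satisfy g1² + g2² = 1, so the single angle carries both gains and each half can be coded as a unit-norm shape of its own.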
- the normalized coefficients in a band may be further grouped into one or more tiles (after possible deinterleaving or other reordering), where each tile contains coefficients from distinct periods of time.
- the normalized shape coefficients 310 are grouped into tiles 314 after deinterleaving process 312 .
- These tiles 314 may vary in size, and in the preferred embodiment the size of each tile may vary from band to band, though all the tiles within a band are the same size. It is not necessary that the basis functions corresponding to coefficients within an individual tile be exactly zero outside of the time period that the tile corresponds to, but minimizing their magnitude outside this period avoids leakage and reduces the occurrence of pre-echo artifacts.
- Knowledge of the tile groupings does not affect the partitioning process, and a partition may contain several tiles, a single tile, or part of a single tile. However, the tile groupings do affect the optimal bit allocation, which attempts to take into account temporal masking.
- As shown in FIG. 4, the tiles 314 of the normalized shape coefficients are successively split into partitions 402 until the allocation for each partition becomes small enough to code with a single vector quantization codeword.
- Quantized values of $\theta$, $g_1$, and $g_2$, denoted $\hat{\theta}$, $\hat{g}_1$, and $\hat{g}_2$, respectively, are generated for each partition.
- These values, along with the gains 308, are processed by quantization/coding stage 404.
- a bit allocation process is used to determine the optimal bit allocations for $\theta$, $x_1$, and $x_2$.
- $b_p$ denotes the current allocation for the band, e.g., either $b_s$ if the entire band is being partitioned, or $b_1$ or $b_2$ from a previous round of partitioning.
- the target allocation for $\theta$, in terms of the total allocation for the current partition, $b_p$, and the size of each partition after splitting, $N_p$, is determined by the following Eq. 9:
- the process uses a uniform probability distribution function (PDF) to drive the range coder, while for partitions that contain data only from a single tile, it uses a triangular PDF.
- Many other coding schemes of varying complexity and compression performance are also possible. Because these coding schemes can use a variable number of bits, a fixed-point estimate of the actual number of bits used, $b_{\hat{\theta}}$, is subtracted from the total allocation $b_p$, instead of the original target allocation.
- $T_\theta$ is a temporal masking offset, computed according to psychoacoustic principles.
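- $$T_\theta = \begin{cases} \max\left(\dfrac{tN}{8},\, -\delta(\hat{\theta})\right), & \hat{\theta} \le \dfrac{\pi}{4}, \\ -\dfrac{t\,\delta(\hat{\theta})}{32}, & \hat{\theta} > \dfrac{\pi}{4}, \end{cases} \qquad (13)$$
- $T_\theta = 0$.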
- Different values depending on the sampling rates, tile sizes, and other factors may also be used as appropriate, depending on the constraints and requirements of the system.
- dequantized versions of the original gains may be recovered as shown in Eq. 14:
- the denominators are 1.
- a practical implementation will use an integer approximation to $\cos\hat{\theta}$ and $\sin\hat{\theta}$, in order to use them for computing $\log_2 \tan\hat{\theta}$ in Eq. 12 (also using an integer approximation), which must produce exactly the same value in the encoder and the decoder.
- each of the encoder 100 and decoder 200 circuits includes a respective bit allocation/partitioning process 120 and 220 . These processes determine and generate the appropriate signals for the coding and allocation of bits for the gain and shape parameters.
- process 120 of the encoder is incorporated in the encoder side PVQ function 112 and makes the bit allocation decisions and transmits symbols using codebooks that are sized to take up the appropriate number of bits. These symbols are then sent in a packet to the decoder 200 .
- the bit allocation/processing component 220 of the decoder 200 reads the symbols and repeats the same calculations as performed in process 120 to determine the size of the codebook to use to read the symbols that follow in the packet.
- the encoder determines the number of bits to use for $\theta$ and sends the quantized value using the requisite number of bits.
- the decoder reads $\hat{\theta}$ and determines from its value the number of bits to use for the quantized values of $x_1$ and $x_2$ using Eqs. 10 and 11.
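- The idea that both sides derive the same allocation without side information can be sketched with a toy example in Python. Here the field widths are a deterministic function of the packet size alone. In the described method the widths additionally depend on previously decoded values through Eqs. 9 to 11, which are not reproduced in this sketch. The allocation rule in `field_widths`, the per-field caps, and the string-of-bits packet format are all hypothetical.

```python
def field_widths(packet_bits, caps):
    """Derive the bit width of each field from the bits still available."""
    widths = []
    remaining = packet_bits
    for i, cap in enumerate(caps):
        fields_left = len(caps) - i
        width = min(cap, remaining // fields_left)  # toy allocation rule
        widths.append(width)
        remaining -= width
    return widths


def encode(values, packet_bits, caps):
    """Write each value with the width derived from the packet size."""
    bits = ""
    for value, width in zip(values, field_widths(packet_bits, caps)):
        if value >> width:
            raise ValueError("value does not fit in its allocated width")
        bits += format(value, "b").zfill(width) if width else ""
    return bits


def decode(bits, packet_bits, caps):
    """Repeat the same width calculation, then read the fields back."""
    values, pos = [], 0
    for width in field_widths(packet_bits, caps):
        values.append(int(bits[pos:pos + width], 2) if width else 0)
        pos += width
    return values


packet = encode([9, 33, 5], packet_bits=16, caps=[4, 8, 8])
decoded = decode(packet, packet_bits=16, caps=[4, 8, 8])
```

- No widths are written into the packet; the decoder recovers them by repeating the encoder's calculation, which is why every step of that calculation must give bit-identical results on both sides.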
- FIG. 5 is a flowchart that illustrates an overall method of performing bit allocation in a gain shape vector quantization coding system, under an embodiment.
- the overall process begins with act 502 , which determines the size of the codebook to use for the gain, such as determined using Eq. 8.
- the number of bits allocated to the shape may exceed the practical codebook size (e.g., 32 bits).
- the band is split into partitions that are smaller than the maximum codebook size, act 506 .
- the first split operation creates two half bands or partitions.
- the relative magnitude of the values on either side of the split is encoded, and the process then determines whether the size of each partition exceeds the maximum codebook size, act 508. If the first split does not generate sufficiently small partitions, the splitting process is executed recursively until the appropriate codebook size is reached, act 510. Bits are then allocated to the ratio of the magnitudes of each half, $\theta$, and to the two partitions, $x_1$ and $x_2$.
- a partition 402 may not use all of its allocated bits. In order to reduce the waste incurred by not using the entire allocation, these bits may be redistributed to subsequent partitions, and even subsequent bands. To maximize the effectiveness of the redistribution, the described method may employ a rebalancing technique to code the larger of the two partitions in each split (the one allocated the greater number of bits) first, followed by the smaller one, after possibly adjusting its allocation to use some or all of the bits the first one failed to use. Bits unused during shape coding may also be redistributed for improving the precision of the gains.
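- The rebalancing step can be sketched as follows in Python. The list of supported codeword sizes in `CODEBOOK_BITS` is invented for this example, and `bits_used` is a hypothetical stand-in for a shape quantizer that picks the largest codebook fitting its allocation.

```python
# Hypothetical codeword sizes, in bits, that the shape quantizer supports.
CODEBOOK_BITS = [0, 8, 13, 21, 30]


def bits_used(allocation):
    """Largest supported codeword size that fits in the allocation."""
    return max(size for size in CODEBOOK_BITS if size <= allocation)


def code_with_rebalancing(alloc_1, alloc_2):
    """Code the larger partition first and pass its unused bits on."""
    allocations = {"x1": alloc_1, "x2": alloc_2}
    first, second = sorted(allocations, key=allocations.get, reverse=True)
    used_first = bits_used(allocations[first])
    spare = allocations[first] - used_first
    # The second partition inherits whatever the first failed to use.
    second_allocation = allocations[second] + spare
    used_second = bits_used(second_allocation)
    leftover = second_allocation - used_second
    return {first: used_first, second: used_second}, leftover


used, leftover = code_with_rebalancing(20, 12)
```

- In this toy case the first partition uses 13 of its 20 bits, and the 7 spare bits lift the second partition from 12 to 19 bits, so it can use a 13-bit codeword instead of an 8-bit one. The 6 bits still left over could then be passed to later partitions or bands, or used to refine the gains.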
- the input signal may be a digitized video signal that is organized such that the frequency coefficients are grouped into a number of bands, whose size may vary to match properties of the human eye to account for the psychovisual effects associated with video signal processing.
- Appropriate changes may be made to the values of certain variables in the equations shown above, depending on the characteristics of the video signal and the requirements of the video codec components.
- Embodiments are directed to a method and system of coding an audio signal using gain-shape vector quantization, comprising: organizing coefficients representing audio content into one or more bands; dividing each band into a gain and a shape; determining, in a processor-based device processing the audio content, a size of a codebook to use for the gain using an approximation method, wherein the size of the codebook dictates a number of bits to allocate to the gain; subtracting, in the processor-based device, the number of bits allocated to the gain from a total number of bits to determine a number of bits to allocate to the shape; determining if the number of bits allocated to the shape is less than a defined number of bits used in the codebook; and recursively dividing the band into equal size partitions until the number of bits allocated to the shape in each partition is less than the defined number.
- Embodiments are further directed to a method and system of coding an audio signal using gain-shape vector quantization, comprising: organizing coefficients representing audio content into one or more bands; dividing each band into a gain and a shape; quantizing the gain using an A-law quantizer, and quantizing the shape using an optimal spherical quantizer; determining, in a processor-based device processing the audio content, a size of a codebook to use for the gain using an approximation method for large factorials, wherein the size of the codebook dictates a number of bits to allocate to the gain; and subtracting, in the processor-based device, the number of bits allocated to the gain from a total number of bits to determine a number of bits to allocate to the shape.
- the terms “component,” “module,” and “process,” may be used interchangeably to refer to a processing unit that performs a particular function and that may be implemented through computer program code (software), digital or analog circuitry, computer firmware, or any combination thereof.
- the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words “herein,” “hereunder,” “above,” “below,” and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word “or” is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.
Abstract
Embodiments are generally directed to systems and methods for bit allocation and band partitioning for gain-shape vector quantization in an audio codec. An audio codec implements a method that uses an implicit, dynamic scheme to allow an encoder and decoder to recreate a series of bit allocation decisions for gain and shape without transmitting additional side information for each decision, based on the number of bits that are left remaining and available in a given packet. For implementation in practical codecs, the band comprising the allocation of bits for the shape is recursively split into equal partitions until the number of bits allocated to each partition is less than the maximum codebook size.
Description
- This application claims priority to U.S. Provisional Patent Application No. 61/450,053, filed on Mar. 7, 2011 and entitled “Method and System for Bit Allocation and Partitioning in Gain-Shape Vector Quantization for Audio Coding,” which is incorporated herein in its entirety.
- A portion of the disclosure of this patent document including any priority documents contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
- One or more implementations relate generally to digital communications, and more specifically to eliminating quantization distortion in audio codecs.
- The present application incorporates by reference U.S. Patent Application No. 61/384,154, which is assigned to the assignees of the present application.
- The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches.
- The transmission and storage of computer data increasingly relies on codecs (coder-decoders) to compress and decompress digital media files, reducing file sizes to optimize transmission bandwidth and memory use. Vector quantization is used in many signal compression applications. In general, a vector quantizer maps k-dimensional vectors in a vector space into a finite set of vectors Y = {y_i : i = 1, 2, ..., N}. Each vector y_i is called a code vector or a codeword, and the set of all the codewords is called a codebook. In a codec, the encoder takes an input vector and outputs the index of the codeword that offers the lowest distortion. The lowest distortion is typically found by evaluating the Euclidean distance between the input vector and each codeword in the codebook. Once the closest codeword is found, the index of that codeword is sent through a channel, and the decoder then replaces the index with the associated codeword. Gain-shape vector quantization is a type of vector quantization method that has become widely used in high quality speech coding systems, and is generally used when it is important to preserve the energy of the vector.
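- The nearest-codeword search described above can be sketched in a few lines of Python. This is only an illustration of the general idea; the function name `nearest_codeword` and the four-entry toy codebook are assumptions made for the example and do not come from the described codec.

```python
import math


def nearest_codeword(vector, codebook):
    """Return (index, codeword) of the codebook entry closest to `vector`.

    Distortion is measured with the Euclidean distance, as in the text.
    """
    best_index = min(
        range(len(codebook)),
        key=lambda i: math.dist(vector, codebook[i]),
    )
    return best_index, codebook[best_index]


# Toy 2-dimensional codebook (hypothetical values, for illustration only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# The encoder sends only `index`; the decoder looks up codebook[index].
index, codeword = nearest_codeword((0.9, 0.2), codebook)
```

Only the index crosses the channel, so the cost of a codeword in bits is set by the size of the codebook, which is why the codebook sizes discussed below translate directly into bit allocations.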
- Many existing low-delay audio codecs only support a limited number of frame sizes and bitrates, often hard-coding the dimensions and rates of the codebooks they use. This allows careful tuning of the rate allocation to various pieces of the codec, but is not very flexible. This lack of flexibility limits the ability of the codec to adapt to the variable capacity of modern network channels, and to trade off latency for quality and loss robustness. Moreover, with regard to gain shape vector quantization, present methods of determining bit rate allocations for the gain and shape quantizations require the solution of processor-intensive calculations that are not appropriate for use with low-power or fixed-point digital signal processors (DSPs).
- What is needed, therefore, is an efficient system for bit allocation and band partitioning for use in an audio codec for gain-shape vector quantization operations.
- In the following drawings like reference numbers are used to refer to like elements. Although the following figures depict various examples, the one or more implementations are not limited to the examples depicted in the figures.
- FIG. 1 is a diagram of an encoder circuit that implements a bit allocation and band partitioning scheme in an audio coding system, under an embodiment.
- FIG. 2 is a diagram of a decoder circuit that implements a bit allocation and band partitioning scheme in an audio coding system, under an embodiment.
- FIG. 3 is a diagram that illustrates the partitioning of audio bands into gain and shape units for use with a bit allocation and partitioning scheme in a gain shape vector quantization coding system, under an embodiment.
- FIG. 4 is a diagram that illustrates the iterative splitting of shape units to match codebook size, under an embodiment.
- FIG. 5 is a flowchart that illustrates a method of performing bit allocation in a gain shape vector quantization coding system, under an embodiment.
- Embodiments are generally directed to systems and methods for bit allocation and band partitioning for gain-shape vector quantization in an audio codec. The method uses an implicit, dynamic scheme to allow an encoder and decoder to recreate a series of bit allocation decisions without transmitting additional side information for each decision, based on the number of bits remaining and available in a given packet. Since packet-switched networks for real-time communication must already convey the size of the packet, this side channel reduces the amount of explicit side information that must be transmitted, thus improving compression of the audio signal. For implementation in practical codecs, the band comprising the allocation of bits for the shape is recursively split into equal partitions until the size of each partition is less than the maximum codebook size.
- Any of the embodiments described herein may be used alone or together with one another in any combination. The one or more implementations encompassed within this specification may also include embodiments that are only partially mentioned or alluded to or are not mentioned or alluded to at all in this brief summary or in the abstract. Although various embodiments may have been motivated by various deficiencies with the prior art, which may be discussed or alluded to in one or more places in the specification, the embodiments do not necessarily address any of these deficiencies. In other words, different embodiments may address different deficiencies that may be discussed in the specification. Some embodiments may only partially address some deficiencies or just one deficiency that may be discussed in the specification, and some embodiments may not address any of these deficiencies.
- Aspects of the one or more embodiments described herein may be implemented on one or more computers or processor-based devices executing software instructions. The computers may be networked in a peer-to-peer or other distributed computer network arrangement (e.g., client-server), and may be included as part of an audio and/or video processing and playback system.
- Embodiments are directed to an audio coding scheme implemented in a codec (coder-decoder) system. FIG. 1 is a diagram of an encoder circuit that implements a bit allocation and band partitioning scheme in an audio coding system, under an embodiment. The encoder 100 is a transform codec circuit based on the modified discrete cosine transform (MDCT) using a codebook for transform coefficients in the frequency domain. The input signal is a pulse-code modulated (PCM) signal that is input to a pre-filter stage 102. The PCM coded input signal is segmented into relatively small overlapping blocks by segmentation component 104. The block-segmented signal is input to the MDCT function 106 and transformed to frequency coefficients. Different block sizes can be selected depending on application requirements and constraints. For example, short block sizes allow for low latency, but may cause a decrease in frequency resolution. The frequency coefficients are grouped to resemble the critical bands of the human auditory system. The entire amount of energy of each group is analyzed in band energy component 108, and the values are quantized in quantizer 110 for data reduction. The quantized energy values are compressed through prediction by transmitting only the difference to the predicted values (delta encoding). The unquantized band energy values are removed from the raw DCT coefficients (normalization) in function 113. The coefficients of the resulting residual signal (the so-called “band shape”) are coded by Pyramid Vector Quantization (PVQ) block 112. PVQ is a form of spherical vector quantization using the lattice points of a pyramidal shape in multidimensional space as the quantizer codebook for quickly and efficiently quantizing Laplacian-like data, such as data generated by transforms or subband filters. This encoding process produces code words of fixed (predictable) length, which in turn enables robustness against bit errors and removes any need for entropy encoding. The output of the encoder is coded into a single bitstream by a range encoder 114. The bitstream output from the range encoder 114 is then transmitted to the decoder circuit.
- In an embodiment, and in connection with the PVQ function 112, the encoder 100 uses a technique known as band folding, which delivers a similar effect to spectral band replication by reusing coefficients of lower bands for higher bands, while also reducing algorithmic delay and computational complexity.
- FIG. 2 is a block diagram of a decoder circuit for use in an audio coding system that includes a dynamic coefficient spreading mechanism, under an embodiment. The decoder 200 receives the encoded data from the encoder and processes the input signal through a range decoder 202. From the range decoder 202, the signal is passed through an energy decoder 203 and a PVQ decoder 208, and to pitch post filter 210. The values from PVQ decoder 208 are multiplied to the band shape coefficients by function 204, and then transformed back to PCM data through inverse MDCT function 206. The individual blocks may be rejoined using weighted overlap-add (WOLA) in a folding block. Many parameters are not explicitly coded, but instead are reconstructed using the same functions as the encoder. The decoded signal is then processed through a pitch post filter 210 and output to an audio output circuit, such as audio speaker(s). In the embodiment of FIG. 2, a bit allocation and partitioning function 220 that is incorporated as part of PVQ 112 provides the bit allocation and partitioning functions described herein. A separate bit allocation block 205 provides bit allocation data to the energy decoder 203 and PVQ decoder 208. A similar bit allocation block may be provided on the encoder side between quantizer 110 and PVQ 112 for symmetry between the encoder and decoder.
- In an embodiment, the codec represented by FIG. 1 and FIG. 2 may be an audio codec, such as the CELT (Constrained Energy Lapped Transform) codec developed by the Xiph.Org Foundation. It should be noted, however, that any similar codec might be used.
- For the embodiment of FIGS. 1 and 2, an input audio signal is mapped from the time domain into a set of frequency domain coefficients, using a transform function. This function may be either a transform with a fixed resolution across all frequencies, such as the Modified Discrete Cosine Transform (MDCT), or one with variable time-frequency (TF) resolution. An example of a variable time-frequency resolution scheme is described in U.S. Patent Application No. 61/384,154, which is hereby incorporated by reference in its entirety.
- Embodiments of the codec circuits of FIGS. 1 and 2 are used to implement a signal compression system that employs gain shape vector quantization methods. A vector quantization method comprises passing signal vectors of a codebook through a synthesis filter to reproduce signals and using error values between the reproduced signals and the input signal in order to determine the index of a signal vector having the smallest error. In gain shape vector quantization, a vector can be expressed in terms of a gain and a shape, which is a unit norm vector that can be coded using a codebook with unit norm vectors. The gain and shape can be quantized separately using some respective number of bits so that either the gain or shape is more accurately represented.
- Embodiments of the signal processing systems and methods described herein implement methods for bit allocation and band partitioning for use in an audio codec based on gain-shape vector quantization. In certain audio applications, these methods allow for the practical adaptation of bit rates from 32 kbps to 255 kbps per channel and latencies of 5 ms or less up to more than 20 ms. The system uses an implicit-dynamic scheme to allow an encoder and decoder both to recreate a series of bit allocation decisions without requiring the transmission of additional side information. Each of the encoder 100 and decoder 200 stages executes a respective bit allocation and partitioning process.
- In an embodiment of the audio codec system, as shown in FIGS. 1 and 2, the input PCM signal is partitioned into (possibly overlapping) frames, each of which may contain one or more blocks that are transformed to frequency coefficients through an MDCT (or similar) function. After transformation to the frequency domain, the frequency coefficients are grouped into a number of bands, whose size may vary to match properties of the human ear. This accounts for psychoacoustic effects associated with audio signal processing. Each band may further group coefficients into tiles, where each tile contains coefficients from that band corresponding to a single block. The bands are then quantized, coded, and transmitted to the decoder 200, and may possibly undergo time-frequency (TF)-resolution changes (such as described in U.S. Patent App. No. 61/384,154).
- FIG. 3 is a diagram that illustrates the partitioning of audio bands into subsequent units for use with a bit allocation and partitioning scheme in a gain shape vector quantization coding system, under an embodiment. Under an embodiment, coefficients representing the audio content 302 are partitioned into one or more bands 304, whose size may vary to match properties of the human ear. These coefficients may be the output of any appropriate process, such as a time-domain filtering operation, the excitation of an LPC (Linear Predictive Coding) model, the result of a subband filterbank such as the MDCT, or a combination of these processes, or the result of some other processing. As shown in FIG. 3, the bands 304 are processed through a normalization process 306 so that each band y is divided into a gain 308, g, and a shape 310, x, where y=g·x and ∥x∥=1 under some norm, such as the L2 norm.
- The codec system under an embodiment includes a gain-shape allocation function that determines the number of bits to allocate to coding the gain versus the number of bits to code the shape. Essentially the system determines the size of the codebook to be used for the gain (bit rate) and then uses the remaining bits to code the shape. After coding an initial set of parameters, such as flags to set the operating mode, transform sizes, filtering parameters, a coarse representation of the gains, or other side information, any remaining bits in the packet are distributed to the individual bands. The exact method of distributing bits to bands is usually based on psychoacoustic principles, which are well-known in the art, and depends on the specific representation of audio content being used, and may additionally benefit from a small amount of side information to adapt to the signal being coded.
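- The normalization of a band y into a gain g and a unit-norm shape x (y=g·x, ∥x∥=1) can be illustrated with a short Python sketch using the L2 norm. The function name `gain_shape` and the convention of returning an all-zero shape for a silent band are assumptions made for this example only.

```python
import math


def gain_shape(band):
    """Split a band into (gain, shape) with band = gain * shape, ||shape||_2 = 1."""
    gain = math.sqrt(sum(c * c for c in band))
    if gain == 0.0:
        # Assumed convention for a silent band: zero gain, zero shape.
        return 0.0, [0.0] * len(band)
    return gain, [c / gain for c in band]


g, x = gain_shape([3.0, 4.0])

# Multiplying the shape by the gain recovers the original band.
reconstructed = [g * c for c in x]
```

Because the shape always has unit norm, the energy of the band is carried entirely by the scalar g, which is what allows the gain and the shape to be given separate bit budgets.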
- Once bits have been allocated to a particular band, they must be partitioned between the scalar gain quantizer and the vector shape quantizer of dimension N−1. It is assumed that N≧2, since if N=1, the “shape” consists of, at most, a single sign bit, and all the remaining bits should go to the gain. Given the number of dimensions N and the target bitrate b, one can find the allocation that minimizes the mean squared error (MSE) introduced by the quantization, using known methods. For example, one known method derives this allocation under the assumptions that the gain is quantized using an A-law quantizer and the shape is quantized using an optimal spherical quantizer (for which there is no known construction for arbitrary dimension) and that the bitrate b is large. The result for the size of the codebook to use for the gain, Ng, is given in Eq. 1 as follows:
-
- where Cg is a constant that depends on the A-law quantizer parameter, but not N or b. The value of Csvq is:
-
- As can be seen, the expression based on Ng and Csvq is quite complicated, and requires several processor-intensive division operations, as well as the evaluation of several transcendental functions. In addition, the result that is desired is log2 Ng, which is the number of bits to use, and not Ng itself, further complicating the situation. As such, these calculations are not particularly well suited for implementation on low-powered DSP processors, such as may be found in many commercial audio compression systems. In addition, the assumption that b is large gives suboptimal results when b is in fact small, as is often the case for low-bitrate audio coding.
- In an embodiment, a gain-shape allocation method uses an approximation to simplify the gain-shape bit allocation calculations and thereby reduce the processing load. The process applies an approximation function for large factorials (e.g., Stirling's approximation) to Eq. (2) above to produce the following expression:
-
- In Eq. 3 above, the value of Csvq rapidly approaches 1 as N becomes large. Substituting the value 1 into Eq. 1 for Csvq and replacing (N−1) with N (which compensates for undershooting Csvq for small N) produces the following:
- $N_g \approx \sqrt{C_g N}\; 2^{b/N}$, (4)
- which is moderately accurate for N>2. This gives the bit allocation for the gain, bg (in bits), as:
-
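- Since the number of bits for the gain is log2 Ng, taking the base-2 logarithm of Eq. 4 suggests an expression of the following form for Eq. 5. This is a derivation from Eq. 4 as printed, so the original equation may group the terms differently.

```latex
b_g = \log_2 N_g \approx \frac{b}{N} + \frac{1}{2}\log_2\left(C_g N\right)
    = \frac{b}{N} + \frac{1}{2}\log_2 C_g + \frac{1}{2}\log_2 N
```

The constant term ½ log2 Cg in this form matches the description of G as being chosen close to ½ log2 Cg.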
- In an embodiment, the bit allocation for the gain is actually computed via the expression:
-
- In the above Eq. 6, the values G and G2 are experimentally chosen constants (selected to be close to ½ log2Cg and G+N/2, respectively), and a is a low-rate correction factor determined as follows:
-
- Given suitably chosen values of G2 and G, this comes quite close to minimizing the mean square error (MSE) over a large range of values of N and b, but is much simpler to compute than Eq. 1. In a practical codec, one cannot use negative bits, and the codebook size may be limited to various sizes (such as a whole number of bits), subject to some maximum, bg max. Thus in a preferred embodiment, the actual size of the codebook is determined as given in Eq. 8, as follows:
-
- The above Eq. 8 rounds the calculated number of bits for gain to an integer number of bits, as well as imposes bounds on the possible value and prevents the possibility of negative bits.
- In an embodiment, the constants G and G2 can be chosen experimentally by an offline training procedure. This procedure first collects a large number of training vectors to be quantized, and measures the average MSE after quantizing at every supported combination of gain quantizer bitrate and shape quantizer bitrate. For a given target bitrate and for each supported gain quantizer bitrate, the process finds the largest shape quantizer bitrate that yields a total less than the target, and the smallest shape quantizer bitrate that yields a total greater than the target, and uses these to interpolate an average MSE value at the target bitrate. Finally, the process selects the gain quantizer bitrate that minimizes this interpolated MSE for the target bitrate. The process is repeated with N=2 for all desired bitrate targets and picks the value of G2 that minimizes the mismatch between the decisions made by this process and those made by Eq. 8. The process then repeats with all supported N>2 for all desired bitrate targets, and picks the value of G that minimizes the mismatch between the decisions made by this process and those made by Eq. 8. The roles of gain and shape can be reversed in this process, but there are typically fewer supported gain bitrates than shape bitrates, which can make this option less efficient.
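- The selection step of this training procedure (interpolating the measured MSE at the target bitrate and picking the gain bitrate that minimizes it) might be sketched as follows. The function names, the use of linear interpolation, the handling of an exact bitrate match, and the toy MSE table are all assumptions made for illustration; the text only states that the two neighboring shape bitrates are used to interpolate.

```python
def interpolated_mse(mse_by_shape_bits, gain_bits, target_bits):
    """Estimate the MSE at `target_bits` total for one gain bitrate.

    `mse_by_shape_bits` maps a supported shape bitrate to its measured average MSE.
    Returns None when the target cannot be bracketed.
    """
    exact = target_bits - gain_bits
    if exact in mse_by_shape_bits:  # assumed shortcut: no interpolation needed
        return mse_by_shape_bits[exact]
    below = [s for s in mse_by_shape_bits if gain_bits + s < target_bits]
    above = [s for s in mse_by_shape_bits if gain_bits + s > target_bits]
    if not below or not above:
        return None
    lo, hi = max(below), min(above)  # largest below, smallest above the target
    weight = (exact - lo) / (hi - lo)  # linear interpolation (an assumption)
    return (1 - weight) * mse_by_shape_bits[lo] + weight * mse_by_shape_bits[hi]


def best_gain_bits(mse_table, target_bits):
    """Pick the gain bitrate whose interpolated MSE at the target is smallest."""
    candidates = {}
    for gain_bits, by_shape in mse_table.items():
        value = interpolated_mse(by_shape, gain_bits, target_bits)
        if value is not None:
            candidates[gain_bits] = value
    return min(candidates, key=candidates.get)


# Made-up measurements: mse_table[gain_bits][shape_bits] = average MSE.
mse_table = {2: {4: 0.30, 8: 0.10}, 3: {3: 0.28, 6: 0.08}}
choice = best_gain_bits(mse_table, target_bits=8)
```

Repeating this selection over all bitrate targets yields the reference decisions against which candidate values of G2 (for N=2) and G (for N>2) would then be compared.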
- Once the number of bits bg for the gain is determined, a simple subtraction step is used to determine the number of bits to allocate to the shape bs. In this case, the remaining bs=b−bg bits are allocated to the shape. In practice, Eq. 8 may be approximated using fixed-point integer arithmetic. The equation requires only a single division and a single logarithm calculation, both of which can be accelerated through the use of a small lookup table.
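- A minimal Python sketch of this gain/shape bit split follows. It does not implement Eq. 6 or Eq. 8 themselves; it assumes the simpler estimate obtained by taking log2 of Eq. 4, followed by rounding and clamping in the spirit of Eq. 8. The values of `c_g` and `max_gain_bits`, the function name, and the extra cap at the total budget are hypothetical choices for illustration.

```python
import math


def split_gain_shape_bits(total_bits, n, c_g=1.0, max_gain_bits=8):
    """Return (gain_bits, shape_bits) for a band of n coefficients.

    Uses b/N + 0.5*log2(C_g*N) as the gain estimate (log2 of Eq. 4),
    not the tuned expression with constants G and G2.
    """
    estimate = total_bits / n + 0.5 * math.log2(c_g * n)
    gain_bits = math.floor(estimate + 0.5)             # round to whole bits
    gain_bits = max(0, min(max_gain_bits, gain_bits))  # no negative bits, bounded above
    gain_bits = min(gain_bits, total_bits)             # assumed: never exceed the budget
    shape_bits = total_bits - gain_bits                # b_s = b - b_g
    return gain_bits, shape_bits


gain_bits, shape_bits = split_gain_shape_bits(total_bits=64, n=16)
```

A fixed-point implementation would replace the floating-point division and logarithm with integer arithmetic and a small lookup table, as the text notes.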
- Once the numbers of bits to be allocated to the gain (bg) and to the shape (bs) have been determined, the normalized coefficients of an entire band, which compose the “shape,” are quantized. Ideally, these coefficients would be quantized with a single vector quantizer, but in practice efficient vector quantizers with codewords larger than the size of a typical microprocessor word, e.g., 32 bits, are difficult to implement. That is, the number of bits allocated for the shape may be on the order of hundreds of bits, but such a codebook would be too big for practical purposes. To address this issue, the process undertakes a band partitioning and allocation procedure. Algebraic codebooks such as the Pyramid Vector Quantizer are an ideal choice for a vector quantizer when a large number of band sizes, N, and bit rates, bs, must be supported. They can be implemented for sizes larger than 32 bits using multiple-precision arithmetic, but this has a large cost in terms of computation time, code size, and data size. The band partitioning and allocation method described below generally works with any suitable vector quantizer, but the Pyramid Vector Quantizer is used in a preferred embodiment.
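- To make the 32-bit limit concrete, the size of a pyramid codebook can be counted directly. The sketch below uses the standard recurrence for the number of integer vectors of dimension n whose absolute values sum to k; this recurrence is a well-known property of pyramid vector quantizers and is not spelled out in the present description. The function names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def pvq_codebook_size(n, k):
    """Count integer vectors of dimension n with sum(|x_i|) == k."""
    if k == 0:
        return 1  # only the all-zero vector
    if n == 0:
        return 0  # no coordinates left to hold the remaining pulses
    return (
        pvq_codebook_size(n - 1, k)
        + pvq_codebook_size(n, k - 1)
        + pvq_codebook_size(n - 1, k - 1)
    )


def codeword_bits(n, k):
    """Whole bits needed to index one codeword of the (n, k) pyramid codebook."""
    return (pvq_codebook_size(n, k) - 1).bit_length()


def fits_in_word(n, k, word_bits=32):
    """True when a single codeword index fits in a machine word of `word_bits`."""
    return codeword_bits(n, k) <= word_bits
```

For example, `pvq_codebook_size(2, 2)` counts the eight points (±2, 0), (0, ±2), and (±1, ±1). The count grows quickly with both n and k, which is why a band with a large shape allocation has to be partitioned before each piece can be indexed within a single word.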
- To maintain processing efficiency, when a band is allocated more than a certain number of bits for the shape, it is recursively split into halves (partitioned) until the allocation for each partition becomes small enough to code with a single vector quantization codeword, or until the maximum partition depth is reached. The exact number of bits required to trigger a split may vary from band to band, or even among the partitions within a band. In a preferred embodiment, a threshold is set a constant amount above the largest codebook size for the current partition (usually close to 32 bits, but sometimes significantly smaller), and it is only split into two more partitions if the target allocation exceeds this amount. Because splitting reduces the VQ (vector quantization) dimension of the codebooks used, it adds some small amount of coding inefficiency, and the constant amount added to the threshold helps compensate for this overhead by avoiding splitting when the increased bit allocation would not result in lower distortion. Alternative embodiments may utilize other splitting rules, like splitting when the allocation exceeds a fixed threshold (such as 32 bits), which is simpler to implement and reduces compression performance only by a very tiny amount.
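- The recursive halving can be sketched as follows, using the simpler fixed-threshold rule mentioned as an alternative embodiment. This is a deliberately simplified illustration: it divides the bits evenly between the two halves and ignores the bits spent on coding the relative magnitude of the halves, both of which the described method handles differently. The function name and the default `max_depth` are assumptions.

```python
def partition_band(n_coeffs, shape_bits, max_codeword_bits=32, max_depth=4, depth=0):
    """Return a list of (n_coeffs, bits) leaves after recursive halving.

    A partition is split only while its allocation exceeds the fixed
    threshold, the maximum depth is not reached, and it can still be halved.
    """
    if shape_bits <= max_codeword_bits or depth >= max_depth or n_coeffs < 2:
        return [(n_coeffs, shape_bits)]
    left_n = n_coeffs // 2
    left_bits = shape_bits // 2  # placeholder for the real per-half allocation
    return (
        partition_band(left_n, left_bits, max_codeword_bits, max_depth, depth + 1)
        + partition_band(
            n_coeffs - left_n,
            shape_bits - left_bits,
            max_codeword_bits,
            max_depth,
            depth + 1,
        )
    )


leaves = partition_band(n_coeffs=16, shape_bits=100)
```

With these inputs the band is split twice, giving four partitions of four coefficients with 25 bits each, every one of which fits under the 32-bit threshold.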
- If x is the input to the splitting process (either a whole band, or a single partition that has already been split at least once), then it is split into two pieces y1 and y2, such that x is the concatenation of y1 and y2. These are again separated into gains, g1 and g2, and shapes, x1 and x2, such that y1=g1x1 and y2=g2x2 and ∥x1∥=∥x2∥=1. The relative magnitude of the two partitions is coded using a scalar parameter θ=arctan(g2/g1), in the range [0, π/2]. Given these parameters, the codec must determine the optimal bit allocations for θ, x1, and x2, denoted bθ, b1, and b2, respectively. The value θ represents the ratio of the gains, and x1, and x2 are the normalized shapes that are generated after factoring out the gains from y1 and y2.
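- A single split step can be illustrated in Python under the L2 norm. The names `split_partition` and `l2_norm` are illustrative, splitting at the midpoint is an assumption, and `math.atan2` is used in place of a literal arctan(g2/g1) so that the case g1=0 is handled, which gives the same angle in [0, π/2] for non-negative gains.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))


def split_partition(x):
    """Split x into two halves and return (theta, x1, x2).

    theta = arctan(g2 / g1) encodes the relative magnitude of the halves;
    x1 and x2 are the halves normalized to unit L2 norm.
    """
    half = len(x) // 2
    y1, y2 = x[:half], x[half:]
    g1, g2 = l2_norm(y1), l2_norm(y2)
    theta = math.atan2(g2, g1)
    x1 = [c / g1 for c in y1] if g1 > 0 else list(y1)
    x2 = [c / g2 for c in y2] if g2 > 0 else list(y2)
    return theta, x1, x2


# A unit-norm input shape, since 0.6**2 + 0.8**2 == 1.
theta, x1, x2 = split_partition([0.6, 0.0, 0.8, 0.0])

# With a unit-norm input under the L2 norm, the gains of the halves
# are recovered from theta alone as cos(theta) and sin(theta).
g1_recovered, g2_recovered = math.cos(theta), math.sin(theta)
```

This is why only the single scalar θ needs to be coded for each split: the two gains follow from it, and for an input with gain g they scale to g·cos θ and g·sin θ under the L2 norm.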
- The normalized coefficients in a band may be further grouped into one or more tiles (after possible deinterleaving or other reordering), where each tile contains coefficients from distinct periods of time. Thus, as shown with reference to FIG. 3, the normalized shape coefficients 310 are grouped into tiles 314 after deinterleaving process 312. These tiles 314 may vary in size, and in the preferred embodiment the size of each tile may vary from band to band, though all the tiles within a band are the same size. It is not necessary that the basis functions corresponding to coefficients within an individual tile be exactly zero outside of the time period that tile corresponds to, but minimizing their magnitude outside this period avoids leakage and reduces the occurrence of pre-echo artifacts. Knowledge of the tile groupings does not affect the partitioning process, and a partition may contain several tiles, a single tile, or part of a single tile. However, the tile groupings do affect the optimal bit allocation, which attempts to take into account temporal masking.
- FIG. 4 is a diagram that illustrates the iterative splitting of shape units to match codebook size, under an embodiment. As shown in FIG. 4, the tiles 314 of the normalized shape coefficients are successively split into partitions 402 until the allocation for each partition becomes small enough to code with a single vector quantization codeword. Quantized values of θ, g1, and g2, denoted $\hat{\theta}$, $\hat{g}_1$, and $\hat{g}_2$, respectively, are generated for each partition. These values, along with the gains 308, are processed by quantization/coding stage 404.
- In an embodiment, a bit allocation process is used to determine the optimal bit allocations for θ, x1, and x2. In this process, bp denotes the current allocation for the band, e.g., either bs if the entire band is being partitioned, or b1 or b2 from a previous round of partitioning. Following a process similar to that used for Eq. 8, above, the target allocation for θ in terms of the total allocation for the current partition, bp, and the size of each partition after splitting, Np, is determined by the following Eq. 9:
-
- In the above Eq. 9, S is an experimentally determined constant. As before, a practical implementation will need to map this allocation to a real codebook for θ. It is possible to derive a number of alternatives for this procedure, and use it to produce a quantized θ value, $\hat{\theta}$. For example, in the preferred embodiment, the allocation is capped at a maximum value, bθ max, and the codebook size is computed from an integer approximation of Eq. 9 using ⅛th bit precision. A preferred embodiment actually codes $\hat{\theta}$ using a range coder, which allows codebooks that do not use a whole number of bits. For partitions that contain data from more than one tile, the process uses a uniform probability distribution function (PDF) to drive the range coder, while for partitions that contain data only from a single tile, it uses a triangular PDF. Many other coding schemes of varying complexity and compression performance are also possible. Because these coding schemes can use a variable number of bits, a fixed-point estimate of the actual number of bits used, $b_{\hat{\theta}}$, is subtracted from the total allocation bp, instead of the original target allocation.
- The allocation for the two partitions x1 and x2 is determined, in turn, as given in Eqs. 10 and 11:
-
- In the above Eq. 12, Tδ is a temporal masking offset, computed according to psychoacoustic principles. In a preferred embodiment, when the total number of tiles on both sides of the partition, t, is greater than 1, then
-
- Otherwise Tδ=0. Different values depending on the sampling rates, tile sizes, and other factors may also be used as appropriate, depending on the constraints and requirements of the system.
- In the decoder 200, dequantized versions of the original gains may be recovered as shown in Eq. 14:
-
- When the L2 norm is used, the denominators are 1. A practical implementation will use an integer approximation to cos $\hat{\theta}$ and sin $\hat{\theta}$, in order to use them for computing log2 tan $\hat{\theta}$ in Eq. 12 (also using an integer approximation), which must produce exactly the same value in the encoder and the decoder.
- As shown in FIGS. 1 and 2, each of the encoder 100 and decoder 200 circuits includes a respective bit allocation/partitioning process. The process 120 of the encoder is incorporated in the encoder-side PVQ function 112; it makes the bit allocation decisions and transmits symbols using codebooks that are sized to take up the appropriate number of bits. These symbols are then sent in a packet to the decoder 200. The bit allocation/processing component 220 of the decoder 200 reads the symbols and repeats the same calculations as performed in process 120 to determine the size of the codebook to use to read the symbols that follow in the packet. Thus, the encoder determines the number of bits to use for θ and sends the quantized value using the requisite number of bits. The decoder reads θ and figures out from its value the number of bits to use for the quantized values of x1 and x2 using Eqs. 10 and 11.
- FIG. 5 is a flowchart that illustrates an overall method of performing bit allocation in a gain shape vector quantization coding system, under an embodiment. The overall process begins with act 502, which determines the size of the codebook to use for the gain, such as determined using Eq. 8. The remaining bits are then allocated to the shape by the simple operation bs=b−bg, act 504. In a practical implementation, the number of bits allocated to the shape may exceed the practical codebook size (e.g., 32 bits). In this case, the band is split into partitions that are smaller than the maximum codebook size, act 506. The first split operation creates two half bands or partitions. The relative magnitude of the values on either side of the split is encoded, and the process then determines whether the size of each partition exceeds the maximum codebook size, act 508. If the first split does not generate sufficiently small partitions, the splitting process is executed recursively until the appropriate codebook size is reached, act 510. Bits are then allocated to the ratio of the magnitudes of the two halves, θ, and to the two partitions, x1 and x2.
- Because of the practical restrictions on the size of various codebooks, a partition 402, as shown in FIG. 4, may not use all of its allocated bits. In order to reduce the waste incurred by not using the entire allocation, these bits may be redistributed to subsequent partitions, and even subsequent bands. To maximize the effectiveness of the redistribution, the described method may employ a rebalancing technique to code the larger of the two partitions in each split (the one allocated the greater number of bits) first, followed by the smaller one, after possibly adjusting its allocation to use some or all of the bits the first one failed to use. Bits unused during shape coding may also be redistributed for improving the precision of the gains.
- Although embodiments have been described in relation to processing audio signals using an audio codec, it should be understood that the methods and systems described herein can also be implemented to process video signals using a video codec. In this case, the input signal may be a digitized video signal that is organized such that the frequency coefficients are grouped into a number of bands, whose size may vary to match properties of the human eye to account for the psychovisual effects associated with video signal processing. Appropriate changes may be made to the values of certain variables in the equations shown above, depending on the characteristics of the video signal and the requirements of the video codec components.
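- The rebalancing of unused bits between the two partitions of a split, described earlier for the partitions of FIG. 4, might be sketched as follows. The callback `bits_used`, the function name, and the toy coder that can only spend whole bytes are hypothetical and serve only to show the order of operations: the partition with the larger allocation is coded first, and whatever it leaves unused is added to the budget of the other.

```python
def code_with_rebalancing(alloc_1, alloc_2, bits_used):
    """Code the larger partition first and pass its unused bits to the other.

    `bits_used(label, budget)` is a hypothetical callback that codes one
    partition and returns how many bits it actually consumed (<= budget).
    """
    order = sorted(
        [("x1", alloc_1), ("x2", alloc_2)],
        key=lambda item: item[1],
        reverse=True,  # larger allocation first; x1 wins ties (stable sort)
    )
    (first, first_budget), (second, second_budget) = order
    used_first = bits_used(first, first_budget)
    second_budget += first_budget - used_first  # redistribute the leftover
    used_second = bits_used(second, second_budget)
    return {first: used_first, second: used_second, "unused": second_budget - used_second}


def toy_coder(label, budget):
    """Toy stand-in for a shape coder: spends whole bytes, at most 32 bits."""
    return min(budget, 32) // 8 * 8


result = code_with_rebalancing(30, 20, toy_coder)
```

In this toy case the first partition uses 24 of its 30 bits, so the second is coded with a budget of 26 rather than 20 and can also spend 24 bits, leaving 2 bits for later partitions, later bands, or the gains.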
- Embodiments are directed to a method and system of coding an audio signal using gain-shape vector quantization, comprising: organizing coefficients representing audio content into one or more bands; dividing each band into a gain and a shape; determining, in a processor-based device processing the audio content, a size of a codebook to use for the shape using an approximation method, wherein the size of the codebook dictates a number of bits to allocate to the size; subtracting, in the processor-based device, the number of bits allocated to the size from a total number of bits to determine a number of bits to allocate to the shape; determining if the number of bits allocated to the shape is less than a defined number of bits used in the codebook; and recursively dividing the band into equal size partitions until the number of bits allocated to the shape in each partition is less than the defined number.
- Embodiments are further directed to a method and system of coding an audio signal using gain-shape vector quantization, comprising: organizing coefficients representing audio content into one or more bands; dividing each band into a gain and a shape; quantizing the gain using an A-law quantizer, and quantizing the shape using an optimal spherical quantizer; determining, in a processor-based device processing the audio content, a size of a codebook to use for the shape using an approximation method for large factorials that approximates the size of the codebook to use for the gain, wherein the size of the codebook dictates a number of bits to allocate to the size; and subtracting, in the processor-based device, the number of bits allocated to the size from a total number of bits to determine a number of bits to allocate to the shape.
- For purposes of the present description, the terms “component,” “module,” and “process,” may be used interchangeably to refer to a processing unit that performs a particular function and that may be implemented through computer program code (software), digital or analog circuitry, computer firmware, or any combination thereof.
- It should be noted that the various functions disclosed herein may be described using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, physical (non-transitory), non-volatile storage media in various forms, such as optical, magnetic or semiconductor storage media.
- Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words “herein,” “hereunder,” “above,” “below,” and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word “or” is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.
- While one or more implementations have been described by way of example and in terms of the specific embodiments, it is to be understood that one or more implementations are not limited to the disclosed embodiments. To the contrary, it is intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Claims (21)
1. A computer-implemented method of coding an audio signal using gain-shape vector quantization, comprising:
organizing coefficients representing audio content into one or more bands;
dividing each band into a gain and a shape;
determining, in processor-based device processing the audio content, a size of a codebook to use for the shape using an approximation method, wherein the size of the codebook dictates a number of bits to allocate to the size;
subtracting, in the processor-based device, the number bits allocated to the size from a total number of bits to determine a number of bits to allocate to the shape;
determining if the number of bits allocated to the shape is less than a defined number of bits used in the codebook; and
recursively dividing the band into equal size partitions until the number of bits allocated to the shape in each partition is less than the defined number.
2. The method of claim 1 wherein the coefficients are generated by a process selected from the group consisting of: time-domain filtering, excitation of a Linear Predictive Coding (LPC) model, a subband filter process, and a modified discrete cosine transform function.
3. The method of claim 2 wherein the one or more bands are selected to be of a size that matches one or more properties of human hearing.
4. The method of claim 1 wherein the codebook comprises an algebraic codebook, and wherein the defined number of bits comprises 32 bits.
5. The method of claim 4 wherein the processor-based device comprises an audio codec having an encoder circuit and a decoder circuit.
6. The method of claim 5 wherein the encoder circuit executes an encoder process that makes a series of bit allocation decisions for the gain and the shape of the audio content, and wherein the decoder circuit executes a decoder process that recreates the series of bit allocation decisions for gain and shape, without requiring transmission of additional side information for each decision in any data packet transmitted between the encoder circuit and the decoder circuit.
7. The method of claim 1 wherein the gain is quantized using an A-law quantizer, and the shape is quantized using an optimal spherical quantizer, and wherein the approximation comprises an approximation for large factorials that approximates the size of the codebook to use for the gain, denoted $N_g$, as: $N_g \approx \sqrt{C_g N}\, 2^{b/N}$, wherein $N$ is a number of dimensions, $b$ is a target bitrate, and $C_g$ is a defined constant that depends on the A-law quantizer parameter.
8. The method of claim 7 wherein the number of bits allocated for the gain is denoted $b_g$, and is calculated using the formula: $b_g = \log_2 N_g$.
9. The method of claim 8 further comprising determining the number of bits allocated for the gain using a low bitrate correction factor.
10. A computer-implemented method of coding an audio signal using gain-shape vector quantization, comprising:
organizing coefficients representing audio content into one or more bands;
dividing each band into a gain and a shape;
quantizing the gain using an A-law quantizer, and quantizing the shape using an optimal spherical quantizer;
determining, in processor-based device processing the audio content, a size of a codebook to use for the shape using an approximation method for large factorials that approximates the size of the codebook to use for the gain, wherein the size of the codebook dictates a number of bits to allocate to the size; and
subtracting, in the processor-based device, the number bits allocated to the size from a total number of bits to determine a number of bits to allocate to the shape.
11. The method of claim 10 further comprising:
determining if the number of bits allocated to the shape is less than a defined number of bits used in the codebook; and
recursively dividing the band into equal size partitions until the number of bits allocated to the shape in each partition is less than the defined number.
12. The method of claim 11 wherein each partition is separated into gains denoted g1 and g2 and shapes denoted x1 and x2.
13. The method of claim 12 further comprising coding a relative magnitude of two partitions comprising a divided band using a scalar parameter denoted θ, wherein a value of the scalar parameter is calculated by: θ=arctan(g1/g2).
14. The method of claim 13 wherein the codebook comprises an algebraic codebook, and wherein the defined number of bits comprises 32 bits.
15. The method of claim 14 wherein the processor-based device comprises an audio codec having an encoder circuit and a decoder circuit.
16. The method of claim 15 wherein the encoder circuit executes an encoder process that makes a series of bit allocation decisions for the gain and the shape of the audio content, and wherein the decoder circuit executes a decoder process that recreates the series of bit allocation decisions for gain and shape, without requiring transmission of additional side information for each decision in any data packet transmitted between the encoder circuit and the decoder circuit.
17. A system for coding an audio signal in an audio codec utilizing gain-shape vector quantization, comprising:
a first component organizing coefficients representing audio content into one or more bands and dividing each band into a gain and a shape;
a gain shape allocation component determining a size of a codebook to use for the shape using an approximation method, wherein the size of the codebook dictates a number of bits to allocate to the size and subtracting, in the processor-based device, the number bits allocated to the size from a total number of bits to determine a number of bits to allocate to the shape; and
a band partitioning and allocation component determining if the number of bits allocated to the shape is less than a defined number of bits used in the codebook, and recursively dividing the band into equal size partitions until the number of bits allocated to the shape in each partition is less than the defined number.
18. The system of claim 17 wherein the coefficients are generated by a process selected from the group consisting of: time-domain filtering, excitation of a Linear Predictive Coding (LPC) model, a subband filter process, and a modified discrete cosine transform function.
19. The system of claim 18 wherein the codebook comprises an algebraic codebook, and wherein the defined number of bits comprises 32 bits.
20. The system of claim 19 wherein the system includes an audio codec having an encoder circuit and a decoder circuit, wherein the encoder circuit executes an encoder process that makes a series of bit allocation decisions for the gain and the shape of the audio content, and wherein the decoder circuit executes a decoder process that recreates the series of bit allocation decisions for gain and shape, without requiring transmission of additional side information for each decision in any data packet transmitted between the encoder circuit and the decoder circuit.
21. The system of claim 17 wherein the gain is quantized using an A-law quantizer, and the shape is quantized using an optimal spherical quantizer, and wherein the approximation comprises an approximation for large factorials that approximates the size of the codebook to use for the gain.
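By way of illustration only, the formulas recited in claims 7, 8, and 13 might be evaluated as in the following minimal sketch. The function names and the example value of the constant Cg are assumptions made for illustration; the claims state only that Cg depends on the A-law quantizer parameter. The use of `math.atan2` in place of a literal division is likewise an assumption, chosen so that a zero g2 does not raise an error; for positive g2 it gives the same value as arctan(g1/g2).

```python
import math


def gain_codebook_size(n_dims: int, total_bits: float, c_g: float) -> float:
    """Claim 7: Ng is approximately sqrt(Cg * N) * 2**(b / N)."""
    return math.sqrt(c_g * n_dims) * 2.0 ** (total_bits / n_dims)


def gain_bits(n_dims: int, total_bits: float, c_g: float) -> float:
    """Claim 8: bg = log2(Ng)."""
    return math.log2(gain_codebook_size(n_dims, total_bits, c_g))


def split_angle(g1: float, g2: float) -> float:
    """Claim 13: theta = arctan(g1 / g2) for non-negative gains g1 and g2."""
    # atan2 is an illustrative choice that also handles g2 == 0.
    return math.atan2(g1, g2)


if __name__ == "__main__":
    # Hypothetical values: 16 dimensions, 64 bits, and Cg = 1.0.
    b_g = gain_bits(16, 64, 1.0)
    print(f"gain bits: {b_g:.2f}, shape bits: {64 - b_g:.2f}")
    print(f"theta for equal gains: {split_angle(1.0, 1.0):.4f} rad")
```

With the hypothetical values above, Ng is sqrt(16) × 2^4 = 64, so the gain receives 6 bits and the remaining 58 bits go to the shape. Two partitions of equal gain give θ = π/4.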
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/414,490 US9009036B2 (en) | 2011-03-07 | 2012-03-07 | Methods and systems for bit allocation and partitioning in gain-shape vector quantization for audio coding |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201161450053P | 2011-03-07 | 2011-03-07 | |
US13/414,490 US9009036B2 (en) | 2011-03-07 | 2012-03-07 | Methods and systems for bit allocation and partitioning in gain-shape vector quantization for audio coding |
Publications (2)
Publication Number | Publication Date |
---|---|
US20120232913A1 (en) | 2012-09-13 |
US9009036B2 (en) | 2015-04-14 |
Family
ID=46796877
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/414,490 Active 2033-01-05 US9009036B2 (en) | 2011-03-07 | 2012-03-07 | Methods and systems for bit allocation and partitioning in gain-shape vector quantization for audio coding |
Country Status (2)
Country | Link |
---|---|
US (1) | US9009036B2 (en) |
WO (1) | WO2012122299A1 (en) |
2012
- 2012-03-07 US application US13/414,490, patent US9009036B2 (en), status: Active
- 2012-03-07 WO application PCT/US2012/028120, patent WO2012122299A1 (en), status: Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2012122299A1 (en) | 2012-09-13 |
US9009036B2 (en) | 2015-04-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 8 |