US20020116199A1 - Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec - Google Patents

Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec

Info

Publication number
US20020116199A1
US20020116199A1 (application US10/061,310)
Authority
US
United States
Prior art keywords
signal
coefficients
residue
noise
transform
Prior art date
Legal status
Granted
Application number
US10/061,310
Other versions
US6885993B2
Inventor
Shuwu Wu
John Mantegna
Keren Perlmutter
Current Assignee
Meta Platforms Inc
Original Assignee
America Online Inc
Priority date
Filing date
Publication date
Priority to US10/061,310 (US6885993B2)
Application filed by America Online Inc
Publication of US20020116199A1
Priority to US11/075,440 (US7181403B2)
Application granted
Publication of US6885993B2
Priority to US11/609,081 (US7418395B2)
Priority to US12/197,645 (US8010371B2)
Assigned to BANK OF AMERICA, N.A. AS COLLATERAL AGENT reassignment BANK OF AMERICA, N.A. AS COLLATERAL AGENT SECURITY AGREEMENT Assignors: AOL ADVERTISING INC., AOL INC., BEBO, INC., GOING, INC., ICQ LLC, LIGHTNINGCAST LLC, MAPQUEST, INC., NETSCAPE COMMUNICATIONS CORPORATION, QUIGO TECHNOLOGIES LLC, SPHERE SOURCE, INC., TACODA LLC, TRUVEO, INC., YEDDA, INC.
Assigned to AMERICA ONLINE, INC. reassignment AMERICA ONLINE, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: 07039-104IL1, ., MANTEGNA, JOHN, PERLMUTTER, KEREN O.
Assigned to AOL INC. reassignment AOL INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AOL LLC
Assigned to AOL LLC reassignment AOL LLC CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: AMERICA ONLINE, INC.
Assigned to MAPQUEST, INC, YEDDA, INC, SPHERE SOURCE, INC, TACODA LLC, TRUVEO, INC, NETSCAPE COMMUNICATIONS CORPORATION, GOING INC, AOL ADVERTISING INC, LIGHTNINGCAST LLC, QUIGO TECHNOLOGIES LLC, AOL INC reassignment MAPQUEST, INC TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS Assignors: BANK OF AMERICA, N A
Priority to US13/191,496 (US8285558B2)
Assigned to AMERICA ONLINE, INC. reassignment AMERICA ONLINE, INC. CORRECTIVE ASSIGNMENT TO CORRECT THE NAME OF FIRST ASSIGNOR PREVIOUSLY RECORDED ON REEL 023713 FRAME 0240. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT OF ASSIGNORS INTEREST. Assignors: MANTEGNA, JOHN, PERLMUTTER, KEREN, WU, SHUWU
Assigned to FACEBOOK, INC. reassignment FACEBOOK, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AOL INC.
Priority to US13/618,339 (US20130173271A1)
Priority to US13/618,414 (US8712785B2)
Adjusted expiration
Assigned to META PLATFORMS, INC. reassignment META PLATFORMS, INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: FACEBOOK, INC.
Expired - Lifetime (Current)


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 — using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 — using orthogonal transformation
    • G10L19/022 — Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
    • G10L19/028 — Noise substitution, i.e. substituting non-tonal spectral components by noisy source
    • G10L19/032 — Quantisation or dequantisation of spectral components
    • G10L19/038 — Vector quantisation, e.g. TwinVQ audio
    • G10L2019/0001 — Codebooks
    • G10L2019/0012 — Smoothing of parameters of the decoder interpolation

Definitions

  • This invention relates to compression and decompression of continuous signals, and more particularly to a method and system for reduction of quantization-induced block-discontinuities arising from lossy compression and decompression of continuous signals, especially audio signals.
  • A variety of audio compression techniques have been developed to transmit audio signals in constrained bandwidth channels and store such signals on media with limited storage capacity.
  • For general purpose audio compression, no assumptions can be made about the source or characteristics of the sound.
  • Compression/decompression algorithms must therefore be general enough to deal with the arbitrary nature of audio signals, which in turn poses a substantial constraint on viable approaches.
  • Here, "audio" refers to a signal that can be any sound in general, such as music of any type, speech, and a mixture of music and speech.
  • General audio compression thus differs from speech coding in one significant aspect: in speech coding where the source is known a priori, model-based algorithms are practical.
  • Most approaches to audio compression can be broadly divided into two major categories: time-domain and transform-domain quantization.
  • The characteristics of the transform domain are defined by the reversible transformations employed.
  • When a transform such as the fast Fourier transform (FFT), discrete cosine transform (DCT), or modified discrete cosine transform (MDCT) is used, the transform domain is equivalent to the frequency domain.
  • When transforms like the wavelet transform (WT) or packet transform (PT) are used, the transform domain represents a mixture of time and frequency information.
  • Quantization is one of the most common and direct techniques to achieve data compression.
  • Scalar quantization encodes data points individually, while vector quantization groups input data into vectors, each of which is encoded as a whole.
  • Vector quantization typically searches a codebook (a collection of vectors) for the closest match to an input vector, yielding an output index.
  • A dequantizer simply performs a table lookup in an identical codebook to reconstruct the original vector.
  • Other approaches that do not involve codebooks are known, such as closed form solutions.
  • A coder/decoder ("codec") that complies with the MPEG-Audio standard (ISO/IEC 11172-3:1993(E)) (here, simply "MPEG") is an example of an approach employing time-domain scalar quantization.
  • In particular, MPEG employs scalar quantization of the time-domain signal in individual subbands, while bit allocation in the scalar quantizer is based on a psychoacoustic model, which is implemented separately in the frequency domain (dual-path approach).
  • Vector quantization schemes usually can achieve far better compression ratios than scalar quantization at a given distortion level.
  • However, the human auditory system is sensitive to the distortion associated with zeroing even a single time-domain sample. This phenomenon makes direct application of traditional vector quantization techniques on a time-domain audio signal an unattractive proposition, since vector quantization at the rate of 1 bit per sample or lower often leads to zeroing of some vector components (that is, time-domain samples).
  • The inventors have determined that it would be desirable to provide an audio compression technique suitable for real-time applications while having reduced computational complexity.
  • The technique should provide low bit-rate full bandwidth compression (about 1 bit per sample) of music and speech, while being applicable to higher bit-rate audio compression.
  • The present invention provides such a technique.
  • The invention includes a method and system for minimization of quantization-induced block-discontinuities arising from lossy compression and decompression of continuous signals, especially audio signals.
  • The invention includes a general purpose, ultra-low latency audio codec algorithm.
  • The invention includes: a method and apparatus for compression and decompression of audio signals using a novel boundary analysis and synthesis framework to substantially reduce quantization-induced frame or block-discontinuity; a novel adaptive cosine packet transform (ACPT) as the transform of choice to effectively capture the input audio characteristics; a signal-residue classifier to separate the strong signal clusters from the noise and weak signal components (collectively called residue); an adaptive sparse vector quantization (ASVQ) algorithm for signal components; a stochastic noise model for the residue; and an associated rate control algorithm.
  • ACPT: adaptive cosine packet transform
  • ASVQ: adaptive sparse vector quantization
  • This invention also involves a general purpose framework that substantially reduces the quantization-induced block-discontinuity in lossy data compression involving any continuous data.
  • The ACPT algorithm dynamically adapts to the instantaneous changes in the audio signal from frame to frame, resulting in efficient signal modeling that leads to a high degree of data compression.
  • Subsequently, a signal/residue classifier is employed to separate the strong signal clusters from the residue.
  • The signal clusters are encoded using a special type of adaptive sparse vector quantization.
  • The residue is modeled and encoded as bands of stochastic noise.
  • The invention includes a zero-latency method for reducing quantization-induced block-discontinuities of continuous data formatted into a plurality of time-domain blocks having boundaries, including performing a first quantization of each block and generating first quantization indices indicative of such first quantization; determining a quantization error for each block; performing a second quantization of any quantization error arising near the boundaries of each block from such first quantization and generating second quantization indices indicative of such second quantization; and encoding the first and second quantization indices and formatting such encoded indices as an output bit-stream.
  • The invention also includes a low-latency method for reducing quantization-induced block-discontinuities of continuous data formatted into a plurality of time-domain blocks having boundaries, including forming an overlapping time-domain block by prepending a small fraction of a previous time-domain block to a current time-domain block; performing a reversible transform on each overlapping time-domain block, so as to yield energy concentration in the transform domain; quantizing each reversibly transformed block and generating quantization indices indicative of such quantization; encoding the quantization indices for each quantized block as an encoded block, and outputting each encoded block as a bit-stream; decoding each encoded block into quantization indices; generating a quantized transform-domain block from the quantization indices; inversely transforming each quantized transform-domain block into an overlapping time-domain block; excluding data from regions near the boundary of each overlapping time-domain block and reconstructing an initial output data block from the remaining data of such overlapping time-domain block; interpolating boundary data between adjacent overlapping time-domain blocks; and prepending the interpolated boundary data with the initial output data block to generate a final output data block.
  • The invention also includes corresponding methods for decompressing a bitstream representing an input signal compressed in this manner, particularly audio data.
  • The invention further includes corresponding computer program implementations of these and other algorithms.
  • FIGS. 1 A- 1 C are waveform diagrams for a data block derived from a continuous data stream.
  • FIG. 1A shows a sine wave before quantization.
  • FIG. 1B shows the sine wave of FIG. 1A after quantization.
  • FIG. 1C shows that the quantization error or residue (and thus energy concentration) substantially increases near the boundaries of the block.
  • FIG. 2 is a block diagram of a preferred general purpose audio encoding system in accordance with the invention.
  • FIG. 3 is a block diagram of a preferred general purpose audio decoding system in accordance with the invention.
  • FIG. 4 illustrates the boundary analysis and synthesis aspects of the invention.
  • In the case of lossy quantization, the residue is non-zero, and due to the block-independent application of the quantization, the residue will not match at the block boundaries; hence, block-discontinuity will result in the reconstructed signal.
  • If the quantization error is relatively small when compared to the original signal strength, i.e., the reconstructed waveform approximates the original signal within a data block, one interesting phenomenon arises: the residue energy tends to concentrate at both ends of the block boundary. In other words, the Gibbs leakage energy tends to concentrate at the block boundaries. Certain windowing techniques can further enhance such residue energy concentration.
  • A windowing technique is used to enhance the residue energy concentration near the block boundaries.
  • The preferred windowing function is characterized by the identity function (i.e., no transformation) over most of a block, with bell-shaped decays near the boundaries of the block (see FIG. 4, described below).
  • Residue quantization: rigorous time-domain waveform quantization of the residue (i.e., the quantization error near the boundaries of each frame) is applied. In essence, more bits are used to define the boundaries by encoding the residue near the block boundaries. This approach is slightly less efficient in coding but results in zero coding latency.
  • Boundary exclusion and interpolation: during encoding, overlapped data blocks with a small overlapped data region that contains all the concentrated residue energy are used, resulting in a small coding latency. During decoding, each reconstructed block excludes the boundary regions where residue energy concentrates, resulting in a minimized time-domain residue and block-discontinuity. Boundary interpolation is then used to further reduce the block-discontinuity.
  • An ideal audio compression algorithm may include the features described in the following subsections.
  • Adaptive Cosine Packet Transform (ACPT)
  • The (wavelet or cosine) packet transform is a well-studied subject in the wavelet research community as well as in the data compression community.
  • A wavelet transform (WT) results in transform coefficients that represent a mixture of time and frequency domain characteristics.
  • One characteristic of WTs is that they have mathematically compact support. In other words, the wavelet basis functions are non-vanishing only in a finite region, in contrast to sine waves that extend to infinity.
  • The advantage of such compact support is that WTs can capture the characteristics of a transient signal impulse more efficiently than FFTs or DCTs can.
  • PTs have the further advantage that they adapt to the input signal time scale through best basis analysis (by minimizing certain parameters like entropy), yielding even more efficient representation of a transient signal event.
  • These properties motivate the use of WTs or PTs as the transform of choice in the present audio coding framework, and of ACPT as the preferred transform for an audio codec.
  • One advantage of using a cosine packet transform (CPT) for audio coding is that it can efficiently capture transient signals, while also adapting to harmonic-like (sinusoidal-like) signals appropriately.
  • ACPTs are an extension to conventional CPTs that provide a number of advantages. In low bit-rate audio coding, coding efficiency is improved by using longer audio coding frames (blocks). When a highly transient signal is embedded in a longer coding frame, CPTs may not capture the fast time response. This is because, for example, in the best basis analysis algorithm that minimizes entropy, entropy may not be the most appropriate signature (nonlinear dependency on the signal normalization factor is one reason) for time scale adaptation under certain signal conditions.
  • An ACPT provides an alternative by pre-splitting the longer coding frame into sub-frames through an adaptive switching mechanism, and then applying a CPT on the subsequent sub-frames. The “best basis” associated with ACPTs is called the extended best basis.
  • A signal and residue classifier may be implemented in different ways. One approach is to identify all the discrete strong signal components from the residue, yielding a sparse signal coefficient frame vector, where subsequent adaptive sparse vector quantization (ASVQ) is used as the preferred quantization mechanism. A second approach is based on one simple observation of natural signals: the strong signal component coefficients tend to be clustered.
  • This second approach separates the strong signal clusters from the contiguous residue coefficients.
  • The subsequent quantization of the clustered signal vector can be regarded as a special type of ASVQ (global clustered sparse vector type). It has been shown that the second approach generally yields higher coding efficiency since signal components are clustered, and thus fewer bits are required to encode their locations.
  • ASVQ is the preferred quantization mechanism for the strong signal components.
  • For details of ASVQ, please refer to allowed U.S. patent application Ser. No. 08/958,567 by Shuwu Wu and John Mantegna, entitled "Audio Codec using Adaptive Sparse Vector Quantization with Subband Vector Classification", filed Oct. 28, 1997, which is assigned to the assignee of the present invention and hereby incorporated by reference.
  • The preferred embodiment employs a mechanism to provide bit-allocation that is appropriate for the block-discontinuity minimization. This simple yet effective bit-allocation also allows for short-term bit-rate prediction, which proves to be useful in the rate-control algorithm.
  • A second approach is rooted in a time-domain filter bank. Again, the residue energy is calculated and quantized. On reconstruction, a predetermined bank of filters is used to generate the residue signal for each frequency band. The input to these filters is white noise, and the output is gain-adjusted to match the original residue energy. This approach offers gain interpolation for each residue band between residue frames, yielding continuous residue energy.
  • A rate control mechanism is employed in the encoder to better target the desired range of bit-rates.
  • The rate control mechanism operates as a feedback loop to the SRC block and the ASVQ.
  • The preferred rate control mechanism uses a linear model to predict the short-term bit-rate associated with the current coding frame. It also calculates the long-term bit-rate. Both the short- and long-term bit-rates are then used to select appropriate SRC and ASVQ control parameters.
  • This rate control mechanism offers a number of benefits, including reduced computational complexity (the short-term bit rate is predicted without actually applying quantization) and in situ adaptation to transient signals.
  • The framework for minimization of quantization-induced block-discontinuity allows for dynamic and arbitrary reversible transform-based signal modeling. This provides flexibility for dynamic switching among different signal models and the potential to produce near-optimal coding.
  • This advantageous feature is simply not available in the traditional MPEG I or MPEG II audio codecs or in the advanced audio codec (AAC). (For a detailed description of AAC, please see the References section below). This is important due to the dynamic and arbitrary nature of audio signals.
  • The preferred audio codec of the invention is a general purpose audio codec that applies to all music, sounds, and speech. Further, the codec's inherent low latency is particularly useful in the coding of short (on the order of one second) sound effects.
  • The preferred audio coding algorithm of the invention is also very scalable in the sense that it can produce low bit-rate (about 1 bit/sample) full bandwidth audio compression at sampling rates ranging from 8 kHz to 44 kHz with only minor adjustments in coding parameters. The algorithm can also be extended to high quality audio and stereo compression.
  • The preferred audio encoding and decoding embodiments of the invention form an audio coding and decoding system that achieves audio compression at variable low bit-rates in the neighborhood of 0.5 to 1.2 bits per sample. The system applies both to low bit-rate coding and to high quality transparent coding and audio reproduction at higher rates.
  • The following sections separately describe preferred encoder and decoder embodiments.
  • FIG. 2 is a block diagram of a preferred general purpose audio encoding system in accordance with the invention.
  • The preferred audio encoding system may be implemented in software or hardware, and comprises 8 major functional blocks, 100-114, which are described below.
  • Boundary analysis 100 constitutes the first functional block in the general purpose audio encoder.
  • The first approach yields zero latency at a cost of requiring encoding of the residue waveform near the block boundaries ("near" typically being about 1/16 of the block size).
  • The second approach introduces a very small latency, but has better coding efficiency because it avoids the need to encode the residue near the block boundaries, where most of the residue energy concentrates.
  • Despite the small latency this second approach introduces in the audio coding relative to a state-of-the-art MPEG AAC codec (where the latency is multiple frames vs. a fraction of a frame for the preferred codec of the invention), it is preferable to use the second approach for better coding efficiency, unless zero latency is absolutely required.
  • The first approach can simply be viewed as a special case of the second approach as far as the boundary analysis function 100 and synthesis function 212 (see FIG. 3) are concerned, so a description of the second approach suffices to describe both approaches.
  • FIG. 4 illustrates the boundary analysis and synthesis aspects of the invention.
  • An audio coding (analysis or synthesis) frame consists of a sufficient (should be no less than 256, preferably 1024 or 2048) number of samples, Ns. In general, larger Ns values lead to higher coding efficiency, but at a risk of losing fast transient response fidelity.
  • HB_E denotes the analysis history buffer (of size sHB_E); HB_D denotes the synthesis history buffer (of size sHB_D).
  • A window function is created during audio codec initialization to have the following properties: (1) at the center region, Ns − sHB_E + sHB_D samples in size, the window function equals unity (i.e., the identity function); and (2) the remaining equally divided left and right edges typically equate to the left and right half of a bell-shaped curve, respectively.
  • A typical candidate bell-shaped curve could be a Hamming or Kaiser-Bessel window function. This window function is then applied to the analysis frame samples. The analysis history buffer (HB_E) is then updated with the last sHB_E samples from the current analysis frame. This completes the boundary analysis.
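  • As an illustration of the boundary-analysis windowing described above, the following Python sketch builds a window that is unity over the frame center and decays with half-Hamming bells at the edges. The frame size Ns and the history-buffer sizes sHB_E and sHB_D are hypothetical values chosen for the example, not parameters taken from the patent.

```python
import numpy as np

def make_boundary_window(Ns, sHB_E, sHB_D):
    """Window equal to 1.0 over the Ns - sHB_E + sHB_D center samples,
    with half-Hamming bell decays over the two remaining edge regions."""
    edge = (sHB_E - sHB_D) // 2          # samples in each decaying edge
    bell = np.hamming(2 * edge)          # full bell; split into two halves
    win = np.ones(Ns)
    win[:edge] = bell[:edge]             # rising left half
    win[Ns - edge:] = bell[edge:]        # falling right half
    return win

# Usage sketch: prepend the sHB_E history samples, window, update history.
Ns, sHB_E, sHB_D = 1024, 64, 32          # hypothetical sizes
history = np.zeros(sHB_E)
new_samples = np.random.randn(Ns - sHB_E)
frame = np.concatenate([history, new_samples])
windowed = frame * make_boundary_window(Ns, sHB_E, sHB_D)
history = frame[-sHB_E:]                 # analysis history buffer (HB_E) update
```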
  • An optional normalization function 102 in the general purpose audio codec performs a normalization of the windowed output signal from the boundary analysis block.
  • In the normalization function 102, the average time-domain signal amplitude over the entire coding frame (Ns samples) is calculated. Then a scalar quantization of the average amplitude is performed. The quantized value is used to normalize the input time-domain signal. The purpose of this normalization is to reduce the signal dynamic range, which results in bit savings during the later quantization stage.
  • This normalization is performed after boundary analysis and in the time-domain for the following reasons: (1) the boundary matching needs to be performed on the original signal in the time-domain where the signal is continuous; and (2) it is preferable for the scalar quantization table to be independent of the subsequent transform, and thus it must be performed before the transform.
  • The scalar normalization factor is later encoded as part of the encoding of the audio signal.
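  • A minimal sketch of the time-domain normalization step follows, assuming a simple uniform scalar quantizer for the average amplitude; the actual quantization table used by the codec is not specified in this extract.

```python
import numpy as np

def normalize_frame(frame, step=0.01):
    """Quantize the mean absolute amplitude, then scale the frame by it."""
    avg = np.mean(np.abs(frame))
    q_index = int(round(avg / step))      # scalar quantization index (side info)
    q_avg = max(q_index * step, step)     # dequantized amplitude, avoid divide-by-zero
    return frame / q_avg, q_index         # normalized signal + normalization factor index
```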
  • The transform function 104 transforms each time-domain block to a transform domain block comprising a plurality of coefficients.
  • The preferred transform algorithm is an adaptive cosine packet transform (ACPT).
  • ACPT is an extension or generalization of the conventional cosine packet transform (CPT).
  • CPT consists of cosine packet analysis (forward transform) and synthesis (inverse transform). The following describes the steps of performing cosine packet analysis in the preferred embodiment.
  • MathWorks' Matlab notation is used in the pseudo-code throughout this description, where 1:m implies an array of numbers with starting value 1, increment 1, and ending value m; and .*, ./, and .^2 indicate the point-wise multiply, divide, and square operations, respectively.
  • N is the number of sample points in the cosine packet transform.
  • D is the depth of the finest time splitting.
  • The function dct4 is the type IV discrete cosine transform. When Nc is a power of 2, a fast dct4 transform can be used.
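  • The type IV DCT at the heart of the cosine packet analysis is available in SciPy. The hedged sketch below shows a dct4 helper and a naive dyadic split of a frame into 2^d sub-blocks, each transformed with dct4; real cosine packet analysis additionally folds the block edges with bell windows and runs a best basis search, both omitted here.

```python
import numpy as np
from scipy.fft import dct

def dct4(x):
    """Type IV discrete cosine transform (orthonormal)."""
    return dct(x, type=4, norm='ortho')

def packet_level_coeffs(frame, d):
    """Split a frame into 2**d equal sub-blocks and apply dct4 to each.
    (Requires len(frame) divisible by 2**d; edge folding with bell
    windows and the best basis search are omitted.)"""
    sub = np.split(np.asarray(frame, dtype=float), 2 ** d)
    return [dct4(s) for s in sub]
```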
  • ACPT: adaptive cosine packet transform
  • The purpose of introducing D2 is to provide a means to stop the basis splitting at a point (D2) which could be smaller than the maximum allowed value D, thus de-coupling the link between the size of the edge correction region of ACPT and the finest splitting of the best basis. If pre-splitting is required, then the best basis analysis is carried out for each of the pre-split sub-frames, yielding an extended best basis tree (a 2-D array, instead of the conventional 1-D array). Since the only difference between ACPT and CPT is to allow for more flexible best basis selection, which we have found to be very helpful in the context of low bit-rate audio coding, ACPT is a reversible transform like CPT.
  • 3. Perform an adaptive switching algorithm to determine whether a pre-split at level D1 is needed for the current ACPT frame.
  • Many algorithms are available for such adaptive switching.
  • Another class of approaches would be to use the packet transform table coefficients at level D1.
  • One candidate in this class of approaches is to calculate the entropy of the transform coefficients for each of the pre-split sub-frames individually. Then, an entropy-based switching criterion can be used.
  • Other candidates include computing some transient signature parameters from the available transform coefficients from Step 2, and then employing some appropriate criteria.
  • Nt is a threshold number which is typically set to a fraction of Nj (e.g., Nj/8).
  • The thresholds thr1 and thr2 are two empirically determined values. The first criterion detects the transient signal amplitude variation; the second detects the transform coefficient (similar to the DCT coefficients within each sub-frame) or spectrum spread per unit of entropy value.
  • D0 and D2 are the maximum depths for time-splitting in the PRE-SPLIT_REQUIRED and PRE-SPLIT_NOT_REQUIRED cases, respectively.
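  • The pre-split decision described above can be illustrated with per-sub-frame transient signatures. The sketch below is only a plausible reading of the two criteria (amplitude variation and spread per unit of entropy); the exact signatures and the thresholds thr1 and thr2 are empirical and not fully reproduced in this extract.

```python
import numpy as np

def needs_presplit(subframe_coeffs, thr1=4.0, thr2=0.5):
    """Illustrative pre-split test over the level-D1 sub-frame coefficients.
    Criterion 1: large amplitude variation across sub-frames (transient).
    Criterion 2: small coefficient spread per unit of entropy."""
    eps = 1e-12
    amps = np.array([np.sqrt(np.mean(c ** 2)) for c in subframe_coeffs])
    variation = amps.max() / max(amps.min(), eps)
    spread_per_entropy = []
    for c in subframe_coeffs:
        p = c ** 2 / max(np.sum(c ** 2), eps)
        entropy = -np.sum(p * np.log2(p + eps))
        spread = np.count_nonzero(np.abs(c) > 0.1 * np.abs(c).max())
        spread_per_entropy.append(spread / (entropy + eps))
    return variation > thr1 or np.mean(spread_per_entropy) < thr2
```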
  • Each 1-D sub-array is the statistics tree for one sub-frame.
  • In the PRE-SPLIT_REQUIRED case, there are 2^D1 such sub-arrays.
  • In the PRE-SPLIT_NOT_REQUIRED case, there is no splitting (just one sub-frame), so there is only one sub-array; i.e., strees becomes a 1-D array.
  • Because ACPT computes the transform table coefficients only at the required time-splitting levels, ACPT is generally less computationally complex than CPT.
  • The extended best basis tree (a 2-D array) can be considered an array of individual best basis trees (1-D) for each sub-frame.
  • A lossless (optimal) variable-length technique is preferred for coding a best basis tree.
  • The signal and residue classifier (SRC) function 106 partitions the coefficients of each transform domain block into signal coefficients and residue coefficients. More particularly, the SRC function 106 separates strong input signal components (called signal) from noise and weak signal components (collectively called residue). As discussed above, there are two preferred approaches for SRC. In both cases, ASVQ is an appropriate technique for subsequent quantization of the signal. The following describes the second approach, which identifies signal and residue in clusters:
  • Nt is a threshold number which is typically set to a fraction of N.
  • zone = zone(:, index);
  • minZS is the minimum zone size, which is empirically determined to minimize the required quantization bits for coding the signal zone indices and signal vectors.
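  • A hedged sketch of the cluster-based signal/residue classification: coefficients above a threshold seed signal zones, zones shorter than minZS are discarded (treated as residue), and the surviving zones are what the ASVQ stage later encodes. The threshold choice here is illustrative.

```python
import numpy as np

def classify_signal_zones(coeffs, rel_thresh=0.2, minZS=8):
    """Return (start, end) index zones holding strong signal clusters;
    coefficients outside all zones are treated as residue."""
    coeffs = np.asarray(coeffs, dtype=float)
    strong = np.abs(coeffs) > rel_thresh * np.abs(coeffs).max()
    zones, start = [], None
    for i, flag in enumerate(strong):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= minZS:           # keep sufficiently long clusters only
                zones.append((start, i))
            start = None
    if start is not None and len(coeffs) - start >= minZS:
        zones.append((start, len(coeffs)))
    return zones
```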
  • The signal components are processed by a quantization function 108.
  • The preferred quantization for signal components is adaptive sparse vector quantization (ASVQ).
  • ASVQ is the preferred quantization scheme for such sparse vectors.
  • In this case, type IV quantization in ASVQ applies.
  • An improvement to ASVQ type IV quantization can be accomplished in cases where all signal components are contained in a number of contiguous clusters.
  • ASVQ supports variable bit allocation, which allows various types of vectors to be coded differently in a manner that reduces psychoacoustic artifacts.
  • A simple bit allocation scheme is implemented to rigorously quantize the strongest signal components. Such fine quantization is required in the preferred framework due to the block-discontinuity minimization mechanism.
  • The variable bit allocation enables different quality settings for the codec.
  • The residue components, which are weak and psychoacoustically less important, are modeled as stochastic noise in order to achieve low bit-rate coding.
  • The motivation behind such a model is that, for residue components, it is more important to reconstruct their energy levels correctly than to re-create their phase information.
  • The stochastic noise model of the preferred embodiment follows:
  • A DCT or FFT is performed and the resulting spectral coefficients are grouped into a number of subbands.
  • The sizes and number of subbands can be variable and dynamically determined.
  • A mean energy level is then calculated for each spectral subband.
  • The subband energy vector can then be encoded in either the linear or logarithmic domain by an appropriate vector quantization technique.
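  • A minimal sketch of this residue analysis, assuming a DCT and an even subband split; the actual band layout and the vector quantization of the energy vector are placeholders.

```python
import numpy as np
from scipy.fft import dct

def residue_subband_energies(residue, n_bands=8):
    """Mean spectral energy per subband of the residue signal."""
    spec = dct(np.asarray(residue, dtype=float), type=2, norm='ortho')
    bands = np.array_split(spec, n_bands)    # band sizes could also be variable
    return np.array([np.mean(b ** 2) for b in bands])
```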
  • Although the preferred audio codec is a general purpose algorithm designed to deal with arbitrary types of signals, it takes advantage of spectral or temporal properties of an audio signal to reduce the bit-rate. This approach may lead to rates that are outside of the targeted rate ranges (sometimes rates are too low and sometimes higher than desired, depending on the audio content). Accordingly, a rate control function 112 is optionally applied to bring better uniformity to the resulting bit-rates.
  • The preferred rate control mechanism operates as a feedback loop to the SRC 106 or quantization 108 functions.
  • The preferred algorithm dynamically modifies the SRC or ASVQ quantization parameters to better maintain a desired bit rate.
  • The dynamic parameter modifications are driven by the desired short-term and long-term bit rates.
  • The short-term bit rate can be defined as the "instantaneous" bit-rate associated with the current coding frame.
  • The long-term bit-rate is defined as the average bit-rate over a large number (or all) of the previously coded frames.
  • The preferred algorithm attempts to target a desired short-term bit rate associated with the signal coefficients through an iterative process. This desired bit rate is determined from the short-term bit rate for the current frame and the short-term bit rate not associated with the signal coefficients of the previous frame.
  • The expected short-term bit rate associated with the signal can be predicted based on a linear model.
  • A and B are functions of quantization-related parameters, collectively represented as q.
  • The variable q can take on values from a limited set of choices, represented by the variable n.
  • An increase (decrease) in n leads to better (worse) quantization for the signal coefficients.
  • S represents the percentage of the frame that is classified as signal, and it is a function of the characteristics of the current frame.
  • S can take on values from a limited set of choices, represented by the variable m. An increase (decrease) in m leads to a larger (smaller) portion of the frame being classified as signal.
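  • The linear model itself is not reproduced in this extract; a plausible reading of the surrounding text is that the signal bit rate is predicted as R_signal ≈ A(q)·S + B(q), with A and B depending on the quantization setting n and S on the classification setting m. The sketch below implements that assumed form, together with a very rough feedback step, purely for illustration.

```python
def predict_signal_bitrate(n, m, A, B, S):
    """Assumed linear model: R_signal = A[n] * S[m] + B[n].
    A and B are hypothetical coefficient tables indexed by the quantization
    setting n; S[m] is the fraction of the frame classified as signal under
    classification setting m."""
    return A[n] * S[m] + B[n]

def adjust_settings(n, m, short_rate, long_rate, target, n_max, m_max):
    """Very rough feedback step in the spirit of Cases 1 and 4 below:
    spend more bits when both rates run low, fewer when both run high."""
    if short_rate < target and long_rate < target:       # Case 1 (LOW, LOW)
        return min(n + 1, n_max), min(m + 1, m_max)
    if short_rate > target and long_rate > target:       # Case 4 (HIGH, HIGH)
        return max(n - 1, 0), max(m - 1, 0)
    return n, m
```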
  • The rate control mechanism targets the desired long-term bit rate by predicting the short-term bit rate and using this prediction to guide the selection of classification and quantization related parameters associated with the preferred audio codec.
  • The use of this model to predict the short-term bit rate associated with the current frame offers the following benefits:
  • The rate control mechanism can react in situ to transient signals.
  • The preferred implementation uses both the long-term bit rate and the short-term bit rate to guide the encoder to better target a desired bit rate.
  • The algorithm is activated under four conditions:
  • The preferred implementation of the rate control mechanism is outlined in the three-step procedure below. The four conditions differ in Step 3 only.
  • The implementation of Step 3 for Case 1 (LOW, LOW) and Case 4 (HIGH, HIGH) is given below.
  • Case 2 (LOW, HIGH) and Case 4 (HIGH, HIGH) are identical, with the exception that they have different values for the upper limit of the target short-term bit rate for the signal coefficients.
  • Case 3 (HIGH, LOW) and Case 1 (LOW, LOW) are identical, with the exception that they have different values for the lower limit of the target short-term bit rate for the signal coefficients. Accordingly, given n and m used for the previous frame:
  • The indices output by the quantization function 108 and the stochastic noise analysis function 110 are formatted into a suitable bit-stream form by the bit-stream formatting function 114.
  • The output information may also include zone indices to indicate the location of the quantization and stochastic noise analysis indices, rate control information, best basis tree information, and any normalization factors.
  • The format is the "ART" multimedia format used by America Online and further described in U.S. patent application Ser. No. 08/866,857, filed May 30, 1997, entitled "Encapsulated Document and Format System", assigned to the assignee of the present invention and hereby incorporated by reference.
  • Formatting may include such information as identification fields, field definitions, error detection and correction data, version information, etc.
  • The formatted bit-stream represents a compressed audio file that may then be transmitted over a channel, such as the Internet, or stored on a medium, such as a magnetic or optical data storage disk.
  • FIG. 3 is a block diagram of a preferred general purpose audio decoding system in accordance with the invention.
  • The preferred audio decoding system may be implemented in software or hardware, and comprises 7 major functional blocks, 200-212, which are described below.
  • An incoming bit-stream previously generated by an audio encoder in accordance with the invention is coupled to a bit-stream decoding function 200 .
  • The decoding function 200 simply disassembles the received binary data into the original audio data, separating out the quantization indices and Stochastic Noise Analysis indices into corresponding signal and noise energy values, in known fashion.
  • The Stochastic Noise Analysis indices are applied to a Stochastic Noise Synthesis function 202.
  • There are two preferred implementations of the stochastic noise synthesis in function 202. Given coded spectral energy for each frequency band, one can synthesize the stochastic noise in either the spectral domain or the time domain for each of the residue sub-frames.
  • The spectral-domain approach generates pseudo-random numbers, which are scaled by the residue energy level in each frequency band. These scaled random numbers for each band are used as the synthesized DCT or FFT coefficients. Then, the synthesized coefficients are inversely transformed to form a time-domain spectrally colored noise signal. This technique is lower in computational complexity than its time-domain counterpart, and is useful when the residue sub-frame sizes are small.
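  • A sketch of the spectral-domain synthesis path, assuming the same subband layout as the analysis sketch above: draw pseudo-random coefficients per band, scale each band to the decoded residue energy, and inverse-DCT back to the time domain.

```python
import numpy as np
from scipy.fft import idct

def synthesize_noise_spectral(band_energies, frame_len, rng=None):
    """Spectrally colored noise frame from per-band mean energies."""
    rng = rng or np.random.default_rng()
    spec = np.zeros(frame_len)
    edges = np.linspace(0, frame_len, len(band_energies) + 1, dtype=int)
    for energy, lo, hi in zip(band_energies, edges[:-1], edges[1:]):
        spec[lo:hi] = rng.standard_normal(hi - lo) * np.sqrt(energy)  # match band energy
    return idct(spec, type=2, norm='ortho')   # time-domain colored noise
```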
  • The time-domain technique involves a filter-bank-based noise synthesizer.
  • A bank of band-limited filters, one for each frequency band, is pre-computed.
  • The time-domain noise signal is synthesized one frequency band at a time. The following describes the details of synthesizing the time-domain noise signal for one frequency band:
  • A random number generator is used to generate white noise.
  • The white noise signal is fed through the band-limited filter to produce the desired spectrally colored stochastic noise for the given frequency band.
  • The noise gain curve for the entire coding frame is determined by interpolating the encoded residue energy levels among residue sub-frames and between audio coding frames. Because of the interpolation, such a noise gain curve is continuous. This continuity is an additional advantage of the time-domain-based technique.
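  • A hedged sketch of the time-domain alternative for one band: white noise is filtered by a pre-computable band-limited FIR filter and then shaped by a gain curve obtained by linearly interpolating the decoded residue energy levels across sub-frames. The filter length and design are illustrative.

```python
import numpy as np
from scipy.signal import firwin, lfilter

def band_noise(frame_len, band, fs, gains, rng=None):
    """Colored noise for one band, gain-interpolated across the frame.
    band = (f_lo, f_hi) in Hz with f_hi < fs/2; gains = residue energy
    levels at the sub-frame centers (illustrative placeholders)."""
    rng = rng or np.random.default_rng()
    taps = firwin(129, band, pass_zero=False, fs=fs)     # pre-computable band filter
    colored = lfilter(taps, 1.0, rng.standard_normal(frame_len))
    centers = np.linspace(0, frame_len - 1, num=len(gains))
    gain_curve = np.interp(np.arange(frame_len), centers, gains)  # continuous gains
    return colored * gain_curve
```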
  • Steps 1 and 2 can be pre-computed, thereby eliminating the need for implementing these steps during the decoding process. Computational complexity can therefore be reduced.
  • The quantization indices are applied to an inverse quantization function 204 to generate signal coefficients.
  • The de-quantization process is carried out for each of the best basis trees for each sub-frame.
  • The signal coefficients are applied to an inverse transform function 206 to generate a time-domain reconstructed signal waveform.
  • The adaptive cosine packet synthesis is similar to its counterpart in CPT, with one additional step that converts the extended best basis tree (in general a 2-D array) into the combined best basis tree (a 1-D array). Then the cosine packet synthesis is carried out for the inverse transform.
  • The time-domain reconstructed signal and synthesized stochastic noise signal, from the inverse adaptive cosine packet synthesis function 206 and the stochastic noise synthesis function 202, respectively, are combined to form the complete reconstructed signal.
  • The reconstructed signal is then optionally multiplied by the encoded scalar normalization factor in a renormalization function 208.
  • The boundary synthesis function 210 constitutes the last functional block before any time-domain post-processing (including but not limited to soft clipping, scaling, and re-sampling). Boundary synthesis is illustrated in the bottom (Decode) portion of FIG. 4.
  • A synthesis history buffer (HB_D) is maintained for the purpose of boundary interpolation.
  • The size of this history buffer (sHB_D) is a fraction of the size of the analysis history buffer (sHB_E).
  • The first Ns − sHB_E samples of the current synthesis frame are called the pre-interpolation output data.
  • The first sHB_D samples of the pre-interpolation output data overlap in time with the samples kept in the synthesis history buffer. Therefore, a simple interpolation (e.g., linear interpolation) is used to reduce the boundary discontinuity.
  • The Ns − sHB_E output data samples are then sent to the next functional block (in this embodiment, soft clipping 212).
  • The synthesis history buffer is subsequently updated with the sHB_D samples from the current synthesis frame, starting at sample number Ns − sHB_E/2 − sHB_D/2.
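  • A sketch of the decoder-side boundary synthesis, assuming a linear cross-fade over the sHB_D overlapping samples; buffer sizes and offsets follow the description above but are otherwise illustrative.

```python
import numpy as np

def boundary_synthesis(frame, history, Ns, sHB_E, sHB_D):
    """Cross-fade the first sHB_D output samples with the synthesis history
    (of length sHB_D), emit Ns - sHB_E samples, and return the updated history."""
    out = np.asarray(frame[:Ns - sHB_E], dtype=float).copy()  # pre-interpolation output
    fade = np.linspace(0.0, 1.0, sHB_D)                       # simple linear interpolation
    out[:sHB_D] = (1.0 - fade) * history + fade * out[:sHB_D]
    start = Ns - sHB_E // 2 - sHB_D // 2                      # per the description above
    new_history = np.asarray(frame[start:start + sHB_D], dtype=float).copy()
    return out, new_history
```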
  • The output of the boundary synthesis component 210 is applied to a soft clipping component 212.
  • Signal saturation in low bit-rate audio compression due to lossy algorithms is a significant source of audible distortion if a simple and naive "hard clipping" mechanism is used to remove it.
  • Soft clipping reduces spectral distortion when compared to the conventional "hard clipping" technique. The preferred soft clipping algorithm is described in allowed U.S. patent application Ser. No. 08/958,567, referenced above.
  • The invention may be implemented in hardware or software, or a combination of both (e.g., programmable logic arrays). Unless otherwise specified, the algorithms included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus to perform the required method steps. However, preferably, the invention is implemented in one or more computer programs executing on programmable systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The program code is executed on the processors to perform the functions described herein.
  • Each such program may be implemented in any desired computer language (including but not limited to machine, assembly, and high level logical, procedural, or object oriented programming languages) to communicate with a computer system.
  • the language may be a compiled or interpreted language.
  • Each such computer program is preferably stored on a storage media or device (e.g., ROM, CD-ROM, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein.
  • The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Compressing the digitized time-domain continuous input signal typically includes formatting the input signal into a plurality of time-domain blocks having boundaries, forming an overlapping time-domain block by prepending a fraction of a previous time-domain block to a current time-domain block, transforming each overlapping time-domain block to a transform domain block including a plurality of coefficients, partitioning the coefficients of each transform domain block into signal coefficients and residue coefficients, quantizing the signal coefficients for each transform domain block and generating signal quantization indices indicative of such quantization, modeling the residue coefficients for each transform domain block as stochastic noise and generating residue quantization indices indicative of such quantization, and formatting the signal quantization indices and the residue quantization indices for each transform domain block as an output bit-stream. The continuous data may include audio data.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is a division of U.S. application Ser. No. 09/321,488, filed May 27, 1999, and titled “Method and System For Reduction of Quantization-Induced Block-Discontinuities and General Purpose Audio Codec,” which is incorporated by reference.[0001]
  • TECHNICAL FIELD
  • This invention relates to compression and decompression of continuous signals, and more particularly to a method and system for reduction of quantization-induced block-discontinuities arising from lossy compression and decompression of continuous signals, especially audio signals. [0002]
  • BACKGROUND
  • A variety of audio compression techniques have been developed to transmit audio signals in constrained bandwidth channels and store such signals on media with limited storage capacity. For general purpose audio compression, no assumptions can be made about the source or characteristics of the sound. Thus, compression/decompression algorithms must be general enough to deal with the arbitrary nature of audio signals, which in turn poses a substantial constraint on viable approaches. In this document, the term “audio” refers to a signal that can be any sound in general, such as music of any type, speech, and a mixture of music and speech. General audio compression thus differs from speech coding in one significant aspect: in speech coding where the source is known a priori, model-based algorithms are practical. [0003]
  • Most approaches to audio compression can be broadly divided into two major categories: time and transform domain quantization. The characteristics of the transform domain are defined by the reversible transformations employed. When a transform such as the fast Fourier transform (FFT), discrete cosine transform (DCT), or modified discrete cosine transform (MDCT) is used, the transform domain is equivalent to the frequency domain. When transforms like wavelet transform (WT) or packet transform (PT) are used, the transform domain represents a mixture of time and frequency information. [0004]
  • Quantization is one of the most common and direct techniques to achieve data compression. There are two basic quantization types: scalar and vector. Scalar quantization encodes data points individually, while vector quantization groups input data into vectors, each of which is encoded as a whole. Vector quantization typically searches a codebook (a collection of vectors) for the closest match to an input vector, yielding an output index. A dequantizer simply performs a table lookup in an identical codebook to reconstruct the original vector. Other approaches that do not involve codebooks are known, such as closed form solutions. [0005]
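  • To make the codebook search concrete, here is a minimal nearest-neighbor vector quantizer sketch (illustrative only; not the quantizer used by the invention):

```python
import numpy as np

def vq_encode(vec, codebook):
    """Return the index of the closest codebook vector (Euclidean distance)."""
    dists = np.sum((codebook - vec) ** 2, axis=1)
    return int(np.argmin(dists))

def vq_decode(index, codebook):
    """Dequantization is a simple table lookup in the identical codebook."""
    return codebook[index]
```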
  • A coder/decoder ("codec") that complies with the MPEG-Audio standard (ISO/IEC 11172-3:1993(E)) (here, simply "MPEG") is an example of an approach employing time-domain scalar quantization. In particular, MPEG employs scalar quantization of the time-domain signal in individual subbands, while bit allocation in the scalar quantizer is based on a psychoacoustic model, which is implemented separately in the frequency domain (dual-path approach). [0006]
  • It is well known that scalar quantization is not optimal with respect to rate/distortion tradeoffs. Scalar quantization cannot exploit correlations among adjacent data points and thus scalar quantization generally yields higher distortion levels for a given bit rate. To reduce distortion, more bits must be used. Thus, time-domain scalar quantization limits the degree of compression, resulting in higher bit-rates. [0007]
  • Vector quantization schemes usually can achieve far better compression ratios than scalar quantization at a given distortion level. However, the human auditory system is sensitive to the distortion associated with zeroing even a single time-domain sample. This phenomenon makes direct application of traditional vector quantization techniques on a time-domain audio signal an unattractive proposition, since vector quantization at the rate of 1 bit per sample or lower often leads to zeroing of some vector components (that is, time-domain samples). [0008]
  • These limitations of time-domain-based approaches may lead one to conclude that a frequency domain-based (or more generally, a transform domain-based) approach may be a better alternative in the context of vector quantization for audio compression. However, there is a significant difficulty that needs to be resolved in non-time-domain quantization based audio compression. The input signal is continuous, with no practical limits on the total time duration. It is thus necessary to encode the audio signal in a piecewise manner. Each piece is called an audio encode or decode block or frame. Performing quantization in the frequency domain on a per frame basis generally leads to discontinuities at the frame boundaries. Such discontinuities yield objectionable audible artifacts (“clicks” and “pops”). One remedy to this discontinuity problem is to use overlapped frames, which results in proportionately lower compression ratios and higher computational complexity. A more popular approach is to use critically sampled subband filter banks, which employ a history buffer that maintains continuity at frame boundaries, but at a cost of latency in the codec-reconstructed audio signal. The long history buffer may also lead to inferior reconstructed transient response, resulting in audible artifacts. Another class of approaches enforces boundary conditions as constraints in audio encode and decode processes. The formal and rigorous mathematical treatments of the boundary condition constraint-based approaches generally involve intensive computation, which tends to be impractical for real-time applications. [0009]
  • The inventors have determined that it would be desirable to provide an audio compression technique suitable for real-time applications while having reduced computational complexity. The technique should provide low bit-rate full bandwidth compression (about 1-bit per sample) of music and speech, while being applicable to higher bit-rate audio compression. The present invention provides such a technique. [0010]
  • SUMMARY
  • The invention includes a method and system for minimization of quantization-induced block-discontinuities arising from lossy compression and decompression of continuous signals, especially audio signals. In one embodiment, the invention includes a general purpose, ultra-low latency audio codec algorithm. [0011]
  • In one aspect, the invention includes: a method and apparatus for compression and decompression of audio signals using a novel boundary analysis and synthesis framework to substantially reduce quantization-induced frame or block-discontinuity; a novel adaptive cosine packet transform (ACPT) as the transform of choice to effectively capture the input audio characteristics; a signal-residue classifier to separate the strong signal clusters from the noise and weak signal components (collectively called residue); an adaptive sparse vector quantization (ASVQ) algorithm for signal components; a stochastic noise model for the residue; and an associated rate control algorithm. This invention also involves a general purpose framework that substantially reduces the quantization-induced block-discontinuity in lossy data compression involving any continuous data. [0012]
  • The ACPT algorithm dynamically adapts to the instantaneous changes in the audio signal from frame to frame, resulting in efficient signal modeling that leads to a high degree of data compression. Subsequently, a signal/residue classifier is employed to separate the strong signal clusters from the residue. The signal clusters are encoded as a special type of adaptive sparse vector quantization. The residue is modeled and encoded as bands of stochastic noise. [0013]
  • More particularly, in one aspect, the invention includes a zero-latency method for reducing quantization-induced block-discontinuities of continuous data formatted into a plurality of time-domain blocks having boundaries, including performing a first quantization of each block and generating first quantization indices indicative of such first quantization; determining a quantization error for each block; performing a second quantization of any quantization error arising near the boundaries of each block from such first quantization and generating second quantization indices indicative of such second quantization; and encoding the first and second quantization indices and formatting such encoded indices as an output bit-stream. [0014]
  • In another aspect, the invention includes a low-latency method for reducing quantization-induced block-discontinuities of continuous data formatted into a plurality of time-domain blocks having boundaries, including forming an overlapping time-domain block by prepending a small fraction of a previous time-domain block to a current time-domain block; performing a reversible transform on each overlapping time-domain block, so as to yield energy concentration in the transform domain; quantizing each reversibly transformed block and generating quantization indices indicative of such quantization; encoding the quantization indices for each quantized block as an encoded block, and outputting each encoded block as a bit-stream; decoding each encoded block into quantization indices; generating a quantized transform-domain block from the quantization indices; inversely transforming each quantized transform-domain block into an overlapping time-domain block; excluding data from regions near the boundary of each overlapping time-domain block and reconstructing an initial output data block from the remaining data of such overlapping time-domain block; interpolating boundary data between adjacent overlapping time-domain blocks; and prepending the interpolated boundary data with the initial output data block to generate a final output data block. [0015]
  • The invention also includes corresponding methods for decompressing a bitstream representing an input signal compressed in this manner, particularly audio data. The invention further includes corresponding computer program implementations of these and other algorithms. [0016]
  • Advantages of the invention include: [0017]
  • A novel block-discontinuity minimization framework that allows for flexible and dynamic signal or data modeling; [0018]
  • A general purpose and highly scalable audio compression technique; [0019]
  • High data compression ratio/lower bit-rate, characteristics well suited for applications like real-time or non-real-time audio transmission over the Internet with limited connection bandwidth; [0020]
  • Ultra-low to zero coding latency, ideal for interactive real-time applications; [0021]
  • Ultra-low bit-rate compression of certain types of audio; [0022]
  • Low computational complexity. [0023]
  • The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims. [0024]
  • DESCRIPTION OF DRAWINGS
  • FIGS. [0025] 1A-1C are waveform diagrams for a data block derived from a continuous data stream. FIG. 1A shows a sine wave before quantization. FIG. 1B shows the sine wave of FIG. 1A after quantization. FIG. 1C shows that the quantization error or residue (and thus energy concentration) substantially increases near the boundaries of the block.
  • FIG. 2 is a block diagram of a preferred general purpose audio encoding system in accordance with the invention. [0026]
  • FIG. 3 is a block diagram of a preferred general purpose audio decoding system in accordance with the invention. [0027]
  • FIG. 4 illustrates the boundary analysis and synthesis aspects of the invention. [0028]
  • Like reference numbers and designations in the various drawings indicate like elements. [0029]
  • DETAILED DESCRIPTION General Concepts
  • The following subsections describe basic concepts on which the invention is based, and characteristics of the preferred embodiment. [0030]
  • Framework for Reduction of Quantization-Induced Block-Discontinuity
  • When encoding a continuous signal in a frame or block-wise manner in a transform domain, block-independent application of lossy quantization of the transform coefficients will result in discontinuity at the block boundary. This problem is closely related to the so-called “Gibbs leakage” problem. Consider the case where the quantization applied in each data block is to reconstruct the original signal waveform, in contrast to quantization that reproduces the original signal characteristics, such as its frequency content. We define the quantization error, or “residue”, in a data block to be the original signal minus the reconstructed signal. If the quantization in question is lossless, then the residue is zero for each block, and no discontinuity results (we always assume the original signal is continuous). However, in the case of lossy quantization, the residue is non-zero, and due to the block-independent application of the quantization, the residue will not match at the block boundaries; hence, block-discontinuity will result in the reconstructed signal. If the quantization error is relatively small when compared to the original signal strength, i.e., the reconstructed waveform approximates the original signal within a data block, one interesting phenomenon arises: the residue energy tends to concentrate at both ends of the block boundary. In other words, the Gibbs leakage energy tends to concentrate at the block boundaries. Certain windowing techniques can further enhance such residue energy concentration. [0031]
  • As an example of Gibbs leakage energy, FIGS. [0032] 1A-1C are waveform diagrams for a data block derived from a continuous data stream. FIG. 1A shows a sine wave before quantization. FIG. 1B shows the sine wave of FIG. 1A after quantization. FIG. 1C shows that the quantization error or residue (and thus energy concentration) substantially increases near the boundaries of the block.
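  • This tendency can be checked with a small numerical experiment. The following Python/NumPy sketch (an illustration only, not part of the original disclosure; the block length, sine frequency, and number of retained coefficients are arbitrary choices) applies a crude lossy quantization by keeping only the strongest DCT coefficients of a block, and compares the residue energy near the block edges with the residue energy in the block center:
    import numpy as np
    from scipy.fft import dct, idct

    N = 1024
    n = np.arange(N)
    x = np.sin(2 * np.pi * 10.5 * n / N)       # sine that does not align with the block

    c = dct(x, type=2, norm='ortho')           # block transform
    keep = np.argsort(np.abs(c))[-8:]          # keep only the 8 strongest coefficients
    cq = np.zeros_like(c)
    cq[keep] = c[keep]                         # crude lossy "quantization" by truncation

    residue = x - idct(cq, type=2, norm='ortho')
    edge = 0.5 * (np.mean(residue[:64] ** 2) + np.mean(residue[-64:] ** 2))
    center = np.mean(residue[480:544] ** 2)
    print(edge / center)                       # a ratio well above 1 indicates edge concentration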
  • With this concept in mind, one aspect of the invention encompasses: [0033]
  • 1. Optional use of a windowing technique to enhance the residue energy concentration near the block boundaries. Preferred is a windowing function characterized by the identity function (i.e., no transformation) for most of a block, but with bell-shaped decays near the boundaries of a block (see FIG. 4, described below). [0034]
  • 2. Use of dynamically adapted signal modeling to effectively capture the signal characteristics within each block without regard to neighboring blocks. [0035]
  • 3. Efficient quantization on the transform coefficients to approximate the original waveform. [0036]
  • 4. Use of one of two approaches near the block boundaries, where the residue energy is concentrated, to substantially reduce the effects of quantization error: [0037]
  • (1) Residue quantization: Application of rigorous time-domain waveform quantization of the residue (i.e., the quantization error near the boundaries of each frame). In essence, more bits are used to define the boundaries by encoding the residue near the block-boundaries. This approach is slightly less efficient in coding but results in zero coding latency. [0038]
  • (2) Boundary exclusion and interpolation: During encoding, overlapped data blocks with a small overlapped data region that contains all the concentrated residue energy are used, resulting in a small coding latency. During decoding, each reconstructed block excludes the boundary regions where residue energy concentrates, resulting in a minimized time-domain residue and block-discontinuity. Boundary interpolation is then used to further reduce the block-discontinuity. [0039]
  • 5. Modeling the remaining residue energy as bands of stochastic noise, which provides the psychoacoustic masking for artifacts that may be introduced in the signal modeling, and approximates the original noise floor. [0040]
  • The characteristics and advantages of this procedural framework are the following: [0041]
  • 1. It applies to any transform-based (actually, any reversible operation-based) coding of an arbitrary continuous signal (including but not limited to audio signals) employing quantization that approximates the original signal waveform. [0042]
  • 2. Great flexibility, in that it allows for many different classes of solutions. [0043]
  • 3. It allows for block-to-block adaptive change in transformation, resulting in potentially optimal signal modeling and transient fidelity. [0044]
  • 4. It yields very low to zero coding latency since it does not rely on a long history buffer to maintain the block continuity. [0045]
  • 5. It is simple and low in computational complexity. [0046]
  • Application of Framework for Reduction of Quantization-Induced Block-Discontinuity to Audio Compression
  • An ideal audio compression algorithm may include the following features: [0047]
  • 1. Flexible and dynamic signal modeling for coding efficiency; [0048]
  • 2. Continuity preservation without introducing long coding latency or compromising the transient fidelity; [0049]
  • 3. Low computational complexity for real-time applications. [0050]
  • Traditional approaches to reducing quantization-induced block-discontinuities arising from lossy compression and decompression of continuous signals typically rely on a long history buffer (e.g., multiple frames) to maintain the boundary continuity at the expense of codec latency, transient fidelity, and coding efficiency. The transient response gets compromised due to the averaging or smearing effects of a long history buffer. The coding efficiency is also reduced because maintenance of continuity through a long history buffer precludes adaptive signal modeling, which is necessary when dealing with the dynamic nature of arbitrary audio signals. The framework of the present invention offers a solution for coding of continuous data, particularly audio data, without such compromises. As stated in the last subsection, this framework is very flexible in nature, which allows for many possible implementations of coding algorithms. Described below is a novel and practical general purpose, low-latency, and efficient audio coding algorithm. [0051]
  • Adaptive Cosine Packet Transform (ACPT)
  • The (wavelet or cosine) packet transform (PT) is a well-studied subject in the wavelet research community as well as in the data compression community. A wavelet transform (WT) results in transform coefficients that represent a mixture of time and frequency domain characteristics. One characteristic of WTs is that they have mathematically compact support. In other words, the wavelet has basis functions that are non-vanishing only in a finite region, in contrast to sine waves that extend to infinity. The advantage of such compact support is that WTs can capture more efficiently the characteristics of a transient signal impulse than FFTs or DCTs can. PTs have the further advantage that they adapt to the input signal time scale through best basis analysis (by minimizing certain parameters like entropy), yielding even more efficient representation of a transient signal event. Although one can certainly use WTs or PTs as the transform of choice in the present audio coding framework, it is the inventors' intention to present ACPT as the preferred transform for an audio codec. One advantage of using a cosine packet transform (CPT) for audio coding is that it can efficiently capture transient signals, while also adapting to harmonic-like (sinusoidal-like) signals appropriately. [0052]
  • ACPTs are an extension to conventional CPTs that provide a number of advantages. In low bit-rate audio coding, coding efficiency is improved by using longer audio coding frames (blocks). When a highly transient signal is embedded in a longer coding frame, CPTs may not capture the fast time response. This is because, for example, in the best basis analysis algorithm that minimizes entropy, entropy may not be the most appropriate signature (nonlinear dependency on the signal normalization factor is one reason) for time scale adaptation under certain signal conditions. An ACPT provides an alternative by pre-splitting the longer coding frame into sub-frames through an adaptive switching mechanism, and then applying a CPT on the subsequent sub-frames. The “best basis” associated with ACPTs is called the extended best basis. [0053]
  • Signal and Residue Classifier (SRC)
  • To achieve low bit-rate compression (e.g., at 1-bit per sample or lower), it is beneficial to separate the strong signal component coefficients in the set of transform coefficients from the noise and very weak signal component coefficients. For the purpose of this document, the term “residue” is used to describe both noise and weak signal components. A Signal and Residue Classifier (SRC) may be implemented in different ways. One approach is to identify all the discrete strong signal components from the residue, yielding a sparse vector signal coefficient frame vector, where subsequent adaptive sparse vector quantization (ASVQ) is used as the preferred quantization mechanism. A second approach is based on one simple observation of natural signals: the strong signal component coefficients tend to be clustered. Therefore, this second approach would separate the strong signal clusters from the contiguous residue coefficients. The subsequent quantization of the clustered signal vector can be regarded as a special type of ASVQ (global clustered sparse vector type). It has been shown that the second approach generally yields higher coding efficiency since signal components are clustered, and thus fewer bits are required to encode their locations. [0054]
  • ASVQ
  • As mentioned in the last section, ASVQ is the preferred quantization mechanism for the strong signal components. For a discussion of ASVQ, please refer to allowed U.S. Patent application Ser. No. 08/958,567 by Shuwu Wu and John Mantegna, entitled “Audio Codec using Adaptive Sparse Vector Quantization with Subband Vector Classification”, filed Oct. 28, 1997, which is assigned to the assignee of the present invention and hereby incorporated by reference. [0055]
  • In addition to ASVQ, the preferred embodiment employs a mechanism to provide bit-allocation that is appropriate for the block-discontinuity minimization. This simple yet effective bit-allocation also allows for short-term bit-rate prediction, which proves to be useful in the rate-control algorithm. [0056]
  • Stochastic Noise Model
  • While the strong signal components are coded more rigorously using ASVQ, the remaining residue is treated differently in the preferred embodiment. First, the extended best basis from applying an ACPT is used to divide the coding frame into residue sub-frames. Within each residue sub-frame, the residue is then modeled as bands of stochastic noise. Two approaches may be used: [0057]
  • 1. One approach simply calculates the residue amplitude or energy in each frequency band. Then random DCT coefficients are generated in each band to match the original residue energy. The inverse DCT is performed on the combined DCT coefficients to yield a time-domain residue signal. [0058]
  • 2. A second approach is rooted in a time-domain filter bank approach. Again, the residue energy is calculated and quantized. On reconstruction, a predetermined bank of filters is used to generate the residue signal for each frequency band. The input to these filters is white noise, and the output is gain-adjusted to match the original residue energy. This approach offers gain interpolation for each residue band between residue frames, yielding continuous residue energy. [0059]
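  • As an illustration of the first approach above, the following Python/NumPy sketch (an assumption for exposition; the band boundaries and the RMS-matching rule are illustrative choices, not taken from the original pseudo-code) measures the per-band residue energy on the analysis side and regenerates random DCT coefficients with matching energy on the synthesis side:
    import numpy as np
    from scipy.fft import dct, idct

    def band_energies(residue, band_edges):
        c = dct(residue, type=2, norm='ortho')
        return np.array([np.sqrt(np.mean(c[lo:hi] ** 2))            # RMS energy per band
                         for lo, hi in zip(band_edges[:-1], band_edges[1:])])

    def synthesize_residue(energies, band_edges, n):
        c = np.zeros(n)
        for (lo, hi), e in zip(zip(band_edges[:-1], band_edges[1:]), energies):
            r = np.random.randn(hi - lo)                            # random DCT coefficients
            c[lo:hi] = r * e / (np.sqrt(np.mean(r ** 2)) + 1e-12)   # scale to the original band energy
        return idct(c, type=2, norm='ortho')                        # time-domain residue signal

    band_edges = [0, 64, 160, 320, 640, 1024]                       # hypothetical split for a 1024-point sub-frame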
  • Rate Control Algorithm
  • Another aspect of the invention is the application of rate control to the preferred codec. The rate control mechanism is employed in the encoder to better target the desired range of bit-rates. The rate control mechanism operates as a feedback loop to the SRC block and the ASVQ. The preferred rate control mechanism uses a linear model to predict the short-term bit-rate associated with the current coding frame. It also calculates the long-term bit-rate. Both the short- and long-term bit-rates are then used to select appropriate SRC and ASVQ control parameters. This rate control mechanism offers a number of benefits, including reduced computational complexity, since the bit-rate is predicted without applying quantization, and in situ adaptation to transient signals. [0060]
  • Flexibility
  • As discussed above, the framework for minimization of quantization-induced block-discontinuity allows for dynamic and arbitrary reversible transform-based signal modeling. This provides flexibility for dynamic switching among different signal models and the potential to produce near-optimal coding. This advantageous feature is simply not available in the traditional MPEG I or MPEG II audio codecs or in the advanced audio codec (AAC). (For a detailed description of AAC, please see the References section below). This is important due to the dynamic and arbitrary nature of audio signals. The preferred audio codec of the invention is a general purpose audio codec that applies to all music, sounds, and speech. Further, the codec's inherent low latency is particularly useful in the coding of short (on the order of one second) sound effects. [0061]
  • Scalability
  • The preferred audio coding algorithm of the invention is also very scalable in the sense that it can produce low bit-rate (about 1 bit/sample) full bandwidth audio compression at sampling rates ranging from 8 kHz to 44 kHz with only minor adjustments in coding parameters. This algorithm can also be extended to high quality audio and stereo compression. [0062]
  • Audio Encoding/Decoding
  • The preferred audio encoding and decoding embodiments of the invention form an audio coding and decoding system that achieves audio compression at variable low bit-rates in the neighborhood of 0.5 to 1.2 bits per sample. This audio compression system applies to both low bit-rate coding and high quality transparent coding and audio reproduction at a higher rate. The following sections separately describe preferred encoder and decoder embodiments. [0063]
  • Audio Encoding
  • FIG. 2 is a block diagram of a preferred general purpose audio encoding system in accordance with the invention. The preferred audio encoding system may be implemented in software or hardware, and comprises 8 major functional blocks, 100-114, which are described below. [0064]
  • Boundary Analysis 100
  • Excluding any signal pre-processing that converts input audio into the internal codec sampling frequency and pulse code modulation (PCM) representation, [0065] boundary analysis 100 constitutes the first functional block in the general purpose audio encoder. As discussed above, either of two approaches to reduction of quantization-induced block-discontinuities may be applied. The first approach (residue quantization) yields zero latency at a cost of requiring encoding of the residue waveform near the block boundaries (“near” typically being about 1/16 of the block size). The second approach (boundary exclusion and interpolation) introduces a very small latency, but has better coding efficiency because it avoids the need to encode the residue near the block boundaries, where most of the residue energy concentrates. Given the very small latency that this second approach introduces in the audio coding relative to a state-of-the-art MPEG AAC codec (where the latency is multiple frames vs. a fraction of a frame for the preferred codec of the invention), it is preferable to use the second approach for better coding efficiency, unless zero latency is absolutely required.
  • Although the two different approaches have an impact on the subsequent vector quantization block, the first approach can simply be viewed as a special case of the second approach as far as the [0066] boundary analysis function 100 and synthesis function 212 (see FIG. 3) are concerned. So a description of the second approach suffices to describe both approaches.
  • FIG. 4 illustrates the boundary analysis and synthesis aspects of the invention. The following technique is illustrated in the top (Encode) portion of FIG. 4. An audio coding (analysis or synthesis) frame consists of a sufficient (should be no less than 256, preferably 1024 or 2048) number of samples, Ns. In general, larger Ns values lead to higher coding efficiency, but at a risk of losing fast transient response fidelity. An [0067] analysis history buffer (HBE) of size sHBE=RE*Ns samples from the previous coding frame is kept in the encoder, where RE is a small fraction (typically set to 1/16 or 1/8 of the block size) to cover regions near the block boundaries that have high residue energy. During the encoding of the current frame, sInput=(1−RE)*Ns samples are taken in and concatenated with the samples in HBE to form a complete analysis frame. In the decoder, a similar synthesis history buffer (HBD) is also kept for boundary interpolation purposes, as described in a later section. The size of HBD is sHBD=RD*sHBE=RD*RE*Ns samples, where RD is a fraction, typically set to 1/4.
  • A [0068] window function is created during audio codec initialization to have the following properties: (1) at the center region of Ns−sHBE+sHBD samples in size, the window function equals unity (i.e., the identity function); and (2) the remaining equally divided left and right edges typically equate to the left and right half of a bell-shape curve, respectively. A typical candidate bell-shape curve could be a Hamming or Kaiser-Bessel window function. This window function is then applied on the analysis frame samples. The analysis history buffer (HBE) is then updated by the last sHBE samples from the current analysis frame. This completes the boundary analysis.
  • When the [0069] parameter RE is set to zero, this analysis reduces to the first approach mentioned above. Therefore, residue quantization can be viewed as a special case of boundary exclusion and interpolation.
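  • For clarity, the boundary analysis described above can be sketched as follows in Python/NumPy (an illustrative rendering under the stated parameter conventions, not the original implementation; the Hann edge ramp stands in for the Hamming or Kaiser-Bessel halves mentioned above, and the history buffer is kept from the un-windowed frame):
    import numpy as np

    def make_window(Ns, sHBE, sHBD):
        edge = (sHBE - sHBD) // 2                 # length of each bell-shaped edge
        w = np.ones(Ns)                           # unity over the center Ns - sHBE + sHBD samples
        if edge > 0:
            ramp = np.hanning(2 * edge)           # any bell-shaped curve may be substituted here
            w[:edge] = ramp[:edge]
            w[-edge:] = ramp[edge:]
        return w

    def boundary_analysis(new_samples, hb_e, window):
        frame = np.concatenate([hb_e, new_samples])   # sHBE history samples + (1-RE)*Ns new samples
        hb_e_next = frame[-len(hb_e):].copy()         # update HBE with the last sHBE samples
        return frame * window, hb_e_next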
  • Normalization 102
  • An [0070] optional normalization function 102 in the general purpose audio codec performs a normalization of the windowed output signal from the boundary analysis block. In the normalization function 102, the average time-domain signal amplitude over the entire coding frame (Ns samples) is calculated. Then a scalar quantization of the average amplitude is performed. The quantized value is used to normalize the input time-domain signal. The purpose of this normalization is to reduce the signal dynamic range, which will result in bit savings during the later quantization stage. This normalization is performed after boundary analysis and in the time-domain for the following reasons: (1) the boundary matching needs to be performed on the original signal in the time-domain where the signal is continuous; and (2) it is preferable for the scalar quantization table to be independent of the subsequent transform, and thus it must be performed before the transform. The scalar normalization factor is later encoded as part of the encoding of the audio signal.
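  • A minimal Python sketch of this normalization (illustrative only; the quantization table below is a hypothetical power-of-two table, not one specified by the codec) follows:
    import numpy as np

    def normalize_frame(frame, table=2.0 ** np.arange(-15, 16)):
        avg = np.mean(np.abs(frame))                   # average time-domain amplitude over Ns samples
        idx = int(np.argmin(np.abs(table - avg)))      # scalar quantization of the average amplitude
        return frame / table[idx], idx                 # idx is later encoded as part of the bit-stream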
  • Transform 104
  • The [0071] transform function 104 transforms each time-domain block to a transform domain block comprising a plurality of coefficients. In the preferred embodiment, the transform algorithm is an adaptive cosine packet transform (ACPT). ACPT is an extension or generalization of the conventional cosine packet transform (CPT). CPT consists of cosine packet analysis (forward transform) and synthesis (inverse transform). The following describes the steps of performing cosine packet analysis in the preferred embodiment. Note: Mathwork's Matlab notation is used in the pseudo-codes throughout this description, where: 1:m implies an array of numbers with starting value of 1, increment of 1, and ending value of m; and .*, ./, and .^2 indicate the point-wise multiply, divide, and square operations, respectively.
  • CPT
  • Let N be the number of sample points in the cosine packet transform, D be the depth of the finest time splitting, and Nc be the number of samples at the finest time splitting (Nc=N/2^D, which must be an integer). [0072] Perform the following:
  • 1. Pre-calculate bell window function bp (interior to domain) and bm (exterior to domain): [0073]
    m = Nc/2;
    x = 0.5 * [1 + (0.5:m-0.5) / m];
    if USE_TRIVIAL_BELL_WINDOW
    bp = sqrt(x);
    elseif USE_SINE_BELL_WINDOW
    bp = sin (pi / 2 * x);
    end
    bm = sqrt(1 − bp.^ 2).
  • 2. Calculate cosine packet transform table, pkt, for input N-point data x: [0074]
    pkt = zeros (N,D+1);
    for d=D:−1:0,
    nP = 2^ d;
    Nj = N / nP;
    for b = 0:nP−1,
    ind = b*Nj + (1:Nj);
    ind1 = 1:m; ind2 = Nj+1 − ind1;
    if b == 0
    xc = x(ind);
    xl = zeros(Nj, 1);
    xl(ind2) = xc(ind1) .* (1−bp) ./bm;
    else
    xl = xc;
    xc = xr;
    end
    if b < nP−1,
    xr = x(Nj+ind);
    else
    xr = zeros(Nj, 1);
    xr(ind1) = −xc(ind2) .* (1−bp) ./ bm;
    end
    xlcr = xc;
    xlcr(ind1) = bp .* xlcr(ind1) + bm .* xl(ind2);
    xlcr(ind2) = bp .* xlcr(ind2) − bm .* xr(ind1);
    c = sqrt(2/Nj) * dct4(xlcr);
    pkt(ind, d+1) = c;
    end
    end
  • The function dct4 is the type IV discrete cosine transform. When Nc is a power of 2, a fast dct4 transform can be used. [0075]
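  • For reference, a type IV DCT is available in recent versions of SciPy (an assumed environment, not a requirement of the codec); because the orthonormal DCT-IV matrix is its own inverse, a quick check can confirm the transform pair:
    import numpy as np
    from scipy.fft import dct

    x = np.random.randn(256)
    c = dct(x, type=4, norm='ortho')                       # dct4
    assert np.allclose(dct(c, type=4, norm='ortho'), x)    # DCT-IV is self-inverse when orthonormal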
  • 3. Build the statistics tree, stree, for the subsequent best basis analysis. The following pseudo-code demonstrates only the most common case where the basis selection is based on the entropy of the packet transform coefficients: [0076]
    stree = zeros(2^ (D+1)−1, 1);
    pktN_1 = norm(pkt(:,1));
    if pktN_1 ˜= 0,
    pktN_1 = 1 / pktN_1;
    else
    pktN_1 = 1;
    end
    i = 0
    for d = 0:D,
    nP = 2^ d;
    Nj = N / nP
    for b = 0:nP−1,
    i = i+1;
    ind = b * Nj + (1:Nj);
    p = (pkt(ind, d+1)*pktN_1) .^ 2;
    stree(i) =− sum(p.* log(p+eps));
    end;
    end;
  • 4. Perform the best basis analysis to determine the best basis tree, btree: [0077]
    btree = zeros(2^ (D+1)−1, 1);
    vtree = stree;
    for d = D−1:−1:0,
    nP = 2^ d;
    for b = 0:nP−1,
    i = nP +b;
    vparent = stree(i);
    vchild = vtree(2*i) + vtree(2*i+1);
    if vparent <= vchild,
    btree(i) = 0; (terminating node)
    vtree(i) = vparent;
    else
    btree(i) = 1; (non-terminating node)
    vtree(i) = vchild;
    end
    end
    end
    entropy = vtree(1). (total entropy for cosine packet transform coefficients)
  • 5. Determine (optimal) CPT coefficients, opkt, from packet transform table and the best basis tree: [0078]
    opkt = zeros(N, 1);
    stack = zeros(2^ (D+1), 2);
    k = 1;
    while (k > 0),
    d = stack(k, 1);
    b = stack(k, 2);
    k = k−1;
    nP=2^ d;
    i = nP + b
    if btree(i) == 0,
    Nj = N/nP;
    ind = b * Nj + (1:Nj);
    opkt(ind) = pkt(ind, d+1);
    else
    k = k+1; stack(k, :) = [d+1 2*b];
    k = k+1; stack(k, :) = [d+1 2*b+1];
    end
    end
  • For a detailed description of wavelet transforms, packet transforms, and cosine packet transforms, see the References section below. [0079]
  • As mentioned above, the best basis selection algorithms offered by the conventional cosine packet transform sometimes fail to recognize the very fast (relatively speaking) time response inside a transform frame. We determined that it is necessary to generalize the cosine packet transform to what we call the “adaptive cosine packet transform”, ACPT. The basic idea behind ACPT is to employ an independent adaptive switching mechanism, on a frame by frame basis, to determine whether a pre-splitting of the CPT frame at a time splitting level of D1 is required, where 0<=D1<=D. If the pre-splitting is not required, ACPT is almost reduced to CPT with the exception that the maximum depth of time splitting is D2 for ACPTs' best basis analysis, where D1<=D2<=D. [0080]
  • The purpose of introducing D2 is to provide a means to stop the basis splitting at a point (D2) which could be smaller than the maximum allowed value D, thus de-coupling the link between the size of the edge correction region of ACPT and the finest splitting of best basis. If pre-splitting is required, then the best basis analysis is carried out for each of the pre-split sub-frames, yielding an extended best basis tree (a 2-D array, instead of the conventional 1-D array). Since the only difference between ACPT and CPT is to allow for more flexible best basis selection, which we have found to be very helpful in the context of low bit-rate audio coding, ACPT is a reversible transform like CPT. [0081]
  • ACPT
  • The preferred ACPT algorithm follows: [0082]
  • 1. Pre-calculate the bell window functions, bp and bm, as in [0083] Step 1 of the CPT algorithm above.
  • 2. Calculate the cosine packet transform table just for the time splitting level of D1, pkt(:,D1+1), as in CPT Step 2, but only for d=D1 (instead of d=D:−1:0). [0084]
  • 3. Perform an adaptive switching algorithm to determine whether a pre-split at level D1 is needed for the current ACPT frame. Many algorithms are available for such adaptive switching. One can use a time-domain based algorithm, where the adaptive switching can be carried out before Step 2. Another class of approaches would be to use the packet transform table coefficients at level D1. One candidate in this class of approaches is to calculate the entropy of the transform coefficients for each of the pre-split sub-frames individually. Then, an entropy-based switching criterion can be used. Other candidates include computing some transient signature parameters from the available transform coefficients from Step 2, and then employing some appropriate criteria. The following describes only a preferred implementation: [0085]
    nP1 = 2^ D1;
    Nj = N / nP1;
    entropy = zeros(1, nP1);
    amplitude = zeros(1, nP1);
    index = zeros(1, nP1);
    for i = 0:nP1−1,
    ind = i*Nj + (1:Nj);
    ci = pkt(ind, D1+1);
    norm_1 = norm(ci);
    amplitude(i+1) = norm_1;
    if norm_1 ˜= 0
    norm_1 = 1 / norm_1
    else
    norm_1 = 1
    end
    p = (norm_1*ci) .^ 2;
    entropy(i+1) =− sum(p.*log(p+eps));
    ind2 = quickSort(abs(ci)); (quick sort index by abs(ci) in ascending order)
    ind2 = ind2(Nj+1 − (1:Nt)); (keep Nt indices associated with Nt largest abs(ci))
    index(i+1) = std(ind2); (standard deviation of ind2, spectrum spread)
    end
    if mean(amplitude) > 0.0,
    amplitude = amplitude/mean(amplitude);
    end
    mEntropy = mean(entropy);
    mIndex = mean(index);
    if max(amplitude) − min(amplitude) > thr1 | mIndex < thr2 * mEntropy,
    PRE-SPLIT_REQUIRED
    else
    PRE-SPLIT_NOT_REQUIRED
    end;
  • where: Nt is a threshold number which is typically set to a fraction of Nj (e.g., Nj/8). The thr1 and thr2 are two empirically determined threshold values. The first criterion detects the transient signal amplitude variation; the second detects the spread of the transform coefficients (similar to the DCT coefficients within each sub-frame), or spectrum spread, per unit of entropy value. [0086]
  • 4. Calculate pkt at the required levels depending on pre-split decision: [0087]
    if PRE-SPLIT_REQUIRED
    CALCULATE pkt for levels = [D1+1:D2];
    else
    if D1 < D0,
    CALCULATE pkt for levels = [0:D1−1 D1+1:D0];
    elseif D1 == D0,
    CALCULATE pkt for levels = [0:D0−1];
    else
    CALCULATE pkt for levels = [0:D0];
    end
    end;
  • where D2 and D0 are the maximum depths of time-splitting for the PRE-SPLIT_REQUIRED and PRE-SPLIT_NOT_REQUIRED cases, respectively. [0088]
  • 5. Build statistics tree, stree, as in CPT Step 3, for only the required levels. [0089]
  • 6. Split the statistics tree, stree, into the extended statistics tree, strees, which is generally a 2-D array. Each 1-D sub-array is the statistics tree for one sub-frame. For the PRE-SPLIT_REQUIRED case, there are 2^D1 such sub-arrays. [0090] For the PRE-SPLIT_NOT_REQUIRED case, there is no splitting (or just one sub-frame), so there is only one sub-array, i.e., strees becomes a 1-D array. The details are as follows:
    if PRE-SPLIT_NOT_REQUIRED,
    strees = stree
    else
    nP1 = 2^ D1;
    strees = zeros(2^ (D2−D1+1)−1, nP1);
    index = nP1;
    d2 = D2−D1
    for d = 0:d2,
    for i = 1:nP1,
    for j = 2^ d−1 + (1:2^ d),
    strees(j, i) = stree(index);
    index = index+1;
    end
    end
    end
    end
  • 7. Perform best basis analysis to determine the extended best basis tree, btrees, for each of the sub-frames the same way as in CPT Step 4. [0091]
  • 8. Determine the optimal transform coefficients, opkt, from the extended best basis tree. This involves determining opkt for each of the sub-frames. The algorithm for each sub-frame is the same as in CPT Step 5. [0092]
  • Because ACPT computes the transform table coefficients only at the required time-splitting levels, ACPT is generally less computationally complex than CPT. [0093]
  • The extended best basis tree (2-D array) can be considered an array of individual best basis trees (1-D) for each sub-frame. A lossless (optimal) variable length technique for coding a best basis tree is preferred: [0094]
  • d=maximum depth of time-splitting for the best basis tree in question
  • [0095]
    code = zeros(1,2^ d−1);
    code(1) = btree(1); index = 1;
    for i = 0:d−2,
    nP = 2^ i;
    for b = 0:nP−1,
    if btree(nP+b) == 1,
    code(index + (1:2)) = btree(2*(nP+b) + (0:1)); index = index + 2;
    end
    end
    end
    code = code(1:index); (quantized bit-stream, index bits used)
  • Signal and Residue Classifier 106
  • The signal and residue classifier (SRC) function [0096] 106 partitions the coefficients of each time-domain block into signal coefficients and residue coefficients. More particularly, the SRC function 106 separates strong input signal components (called signal) from noise and weak signal components (collectively called residue). As discussed above, there are two preferred approaches for SRC. In both cases, ASVQ is an appropriate technique for subsequent quantization of the signal. The following describes the second approach that identifies signal and residue in clusters:
  • 1. Sort index in ascending order of the absolute value of the ACPT coefficients, opkt: [0097]
  • ax=abs(opkt);
  • order=quickSort(ax);
  • 2. Calculate global noise floor, gnf: [0098]
  • gnf=ax(order(N−Nt));
  • where Nt is a threshold number which is typically set to a fraction of N.
  • 3. Determine signal clusters by calculating zone indices, zone, in the first pass: [0099]
    zone = zeros(2, N/2); (assuming no more than N/2 signal clusters)
    zc = 0;
    i = 1;
    inS = 0;
    sc = 0;
    while i <= N
    if ˜inS & ax(i) <= gnf,
    elseif ˜inS & ax(i) > gnf,
    zc = zc+1;
    inS = 1;
    sc = 0;
    zone(1, zc) = i; (start index of a signal cluster)
    elseif inS & ax(i) <= gnf,
    if sc >= nt, (nt is a threshold number, typically set to 5)
    zone(2, zc) = i;
    inS = 0;
    sc = 0;
    else
    sc = sc + 1;
    end;
    elseif inS & ax(i) > gnf
    sc = 0;
    end
    i = i + 1;
    end;
    if zc > 0 & zone(2,zc) == 0,
    zone(2, zc) = N;
    end;
    zone = zone(:, 1:zc);
    for i = 1:zc,
    indH = zone(2, i);
    while ax(indH) <= gnf,
    indH = indH − 1;
    end;
    zone(2, i) = indH;
    end;
  • 4. Determine the signal clusters in the second pass by using a local noise floor, lnf. sRR is the size of the neighboring residue region for local noise floor estimation purposes, typically set to a small fraction of N (e.g., N/32): [0100]
    zone0 = zone(2, :);
    for i = 1:zc,
    indL = max(1, zone(1,i)−sRR); indH = min(N, zone(2,i)+sRR);
    index = indL:indH;
    index = indL−1 + find(ax(index) <= gnf);
    if length(index) == 0,
    lnf = gnf;
    else
    lnf = ratio * mean(ax(index));(ratio is threshold number,
    typically set to 4.0)
    end;
    if lnf < gnf,
    indL = zone(1, i); indH = zone(2, i);
    if i == 1,
    indl = 1;
    else
    indl = zone0(i−1);
    end
    if i == zc,
    indh = N;
    else
    indh = zone0(i+1);
    end
    while indL > indl & ax(indL) > lnf,
    indL = indL − 1;
    end;
    while indH < indh & ax(indH) > lnf,
    indH = indH + 1,
    end;
    zone(1, i) = indL; zone(2, i) = indH;
    elseif lnf > gnf,
    indL = zone(1, i); indH = zone(2, i);
    while indL <= indH & ax(indL) <= lnf,
    indL = indL + 1;
    end;
    if indL > indH,
    zone(1, i) = 0; zone(2, i) = 0;
    else
    while indH >= indL & ax(indH) <= lnf,
    indH = indH − 1;
    end
    if indH < indL,
    zone(1, i) = 0; zone(2, i) = 0;
    else
    zone(1, i) = indL; zone(2, i) = indH;
    end
    end
    end
    end
  • 5. Remove the weak signal components: [0101]
    for i = 1:zc,
    indL = zone(1, i);
    if indL > 0,
    indH = zone(2, i); index = indL:indH;
    if max(ax(index)) > Athr, (Athr typically set to 2)
    while ax(indL) < Xthr, (Xthr typically set to 0.2)
    indL = indL+1;
    end
    while ax(indH) < Xthr,
    indH = indH−1;
    end
    zone(1, i) = indL; zone(2, i) = indH;
    end
    end
    end
  • 6. Remove the residue components: [0102]
  • index=find(zone(1,:) > 0);
  • zone=zone(:, index);
  • zc=size(zone, 2);
  • 7. Merge signal clusters that are close neighbors: [0103]
    for i = 2:zc,
    indL = zone(1, i);
    if indL > 0 & indL − zone(2, i−1) < minZS,
    zone(1, i) = zone(1, i−1);
    zone(1, i−1) = 0; zone(2, i−1) = 0;
    end
    end
  • where minZS is the minimum zone size, which is empirically determined to minimize the required quantization bits for coding the signal zone indices and signal vectors. [0104]
  • 8. Remove the residue components again, as in Step [0105] 6.
  • Quantization 108
  • After the [0106] SRC 106 separates ACPT coefficients into signal and residue components, the signal components are processed by a quantization function 108. The preferred quantization for signal components is adaptive sparse vector quantization (ASVQ).
  • If one considers the signal clusters vector as the original ACPT coefficients with the residue components set to zero, then a sparse vector results. As discussed in allowed U.S. patent application Ser. No. 08/958,567 by Shuwu Wu and John Mantegna, entitled “Audio Codec using Adaptive Sparse Vector Quantization with Subband Vector Classification”, filed Oct. 28, 1997, ASVQ is the preferred quantization scheme for such sparse vectors. In the case where the signal components are in clusters, type IV quantization in ASVQ applies. An improvement to ASVQ type IV quantization can be accomplished in cases where all signal components are contained in a number of contiguous clusters. In such cases, it is sufficient to only encode all the start and end indices for each of the clusters when encoding the element location index (ELI). Therefore, for the purpose of ELI quantization, instead of encoding the original sparse vector, a modified sparse vector (a super-sparse vector) with only non-zero elements at the start and end points of each signal cluster is encoded. This results in very significant bit savings. That is one of the main reasons it is advantageous to consider signal clusters instead of discrete components. For a detailed description of Type IV quantization and quantization of the ELI, please refer to the patent application referenced above. Of course, one can certainly use other lossless techniques, such as run length coding with Huffman codes, to encode the ELI. [0107]
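  • The bit saving can be illustrated with a short Python sketch (a hypothetical helper for exposition, not the ASVQ implementation itself): only the start and end index of each cluster is written, so the index cost grows with the number of clusters rather than with the number of non-zero coefficients:
    import math

    def encode_cluster_indices(zones, n):
        # zones: list of (start, end) index pairs of signal clusters in an n-point coefficient frame
        bits_per_index = math.ceil(math.log2(n))
        payload = [i for start, end in zones for i in (start, end)]
        return payload, len(payload) * bits_per_index    # two indices per cluster, regardless of cluster width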
  • ASVQ supports variable bit allocation, which allows various types of vectors to be coded differently in a manner that reduces psychoacoustic artifacts. In the preferred audio codec, a simple bit allocation scheme is implemented to rigorously quantize the strongest signal components. Such a fine quantization is required in the preferred framework due to the block-discontinuity minimization mechanism. In addition, the variable bit allocation enables different quality settings for the codec. [0108]
  • Stochastic Noise Analysis 110
  • After the [0109] SRC 106 separates ACPT coefficients into signal and residue components, the residue components, which are weak and psychoacoustically less important, are modeled as stochastic noise in order to achieve low bit-rate coding. The motivation behind such a model is that, for residue components, it is more important to reconstruct their energy levels correctly than to re-create their phase information. The stochastic noise model of the preferred embodiment follows:
  • 1. Construct a residue vector by taking the ACPT coefficient vector and setting all signal components to zero. [0110]
  • 2. Perform adaptive cosine packet synthesis (see above) on the residue vector to synthesize a time-domain residue signal. [0111]
  • 3. Use the extended best basis tree btrees, to split the residue frame into several residue sub-frames of variable sizes. The preferred algorithm is as follows: [0112]
  • join btrees to form a combined best basis tree, btree, as described in Step 2 of the Inverse Transform section below
  • [0113]
    index = zeros(1, 2^D);
    stack = zeros(2^D+1, 2);
    k = 1;
    nSF = 0;     (number of residue sub-frames)
    while k > 0,
    d = stack(k, 1); b = stack(k, 2);
    k = k − 1;
    nP = 2^d; Nj = N / nP;
    i = nP + b;
    if btree(i) == 0,
    nSF = nSF + 1; index(nSF) = b * Nj;
    else
    k = k+1; stack(k, :) = [d+1 2*b];
    k = k+1; stack(k, :) = [d+1 2*b+1];
    end
    end;
    index = index(1:nSF);
    sort index in ascending order
    sSF = zeros(1, nSF);   (sizes of residue sub-frames)
    sSF(1:nSF−1) = diff(index);
    sSF(nSF) = N − index(nSF);
  • 4. Optionally, one may want to limit the maximum or minimum sizes of residue sub-frames by further sub-splitting or merging neighboring sub-frames for practical bit-allocation control. [0114]
  • 5. Optionally, for each residue sub-frame, a DCT or FFT is performed and the subsequent spectral coefficients are grouped into a number of subbands. The sizes and number of subbands can be variable and dynamically determined. A mean energy level then would be calculated for each spectral subband. The subband energy vector then could be encoded in either the linear or logarithmic domain by an appropriate vector quantization technique. [0115]
  • Rate Control 112
  • Because the preferred audio codec is a general purpose algorithm that is designed to deal with arbitrary types of signals, it takes advantage of spectral or temporal properties of an audio signal to reduce the bit-rate. This approach may lead to rates that are outside of the targeted rate ranges (sometimes rates are too low and sometimes higher than desired, depending on the audio content). Accordingly, a [0116] rate control function 112 is optionally applied to bring better uniformity to the resulting bit-rates.
  • The preferred rate control mechanism operates as a feedback loop to the [0117] SRC 106 or quantization 108 functions. In particular, the preferred algorithm dynamically modifies the SRC or ASVQ quantization parameters to better maintain a desired bit rate. The dynamic parameter modifications are driven by the desired short-term and long-term bit rates. The short-term bit rate can be defined as the “instantaneous” bit-rate associated with the current coding frame. The long-term bit-rate is defined as the average bit-rate over a large number or all of the previously coded frames. The preferred algorithm attempts to target a desired short-term bit rate associated with the signal coefficients through an iterative process. This desired bit rate is determined from the short-term bit rate for the current frame and the short-term bit rate not associated with the signal coefficients of the previous frame. The expected short-term bit rate associated with the signal can be predicted based on a linear model:
  • Predicted=A(q(n))*S(c(m))+B(q(n)).   (1)
  • Here, A and B are functions of quantization related parameters, collectively represented as q. The variable q can take on values from a limited set of choices, represented by the variable n. An increase (decrease) in n leads to better (worse) quantization for the signal coefficients. Here S represents the percentage of the frame that is classified as signal, and it is a function of the characteristics of the current frame. S can take on values from a limited set of choices, represented by the variable m. An increase (decrease) in m leads to a larger (smaller) portion of the frame being classified as signal. [0118]
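  • A minimal Python sketch of this predictor (the table-driven parameterization of A, B, and S below is an assumption for illustration, not the codec's actual tables) is:
    def predict_signal_bits(n, m, A_table, B_table, signal_fraction):
        S = signal_fraction(m)                 # S(c(m)): fraction of the frame classified as signal
        return A_table[n] * S + B_table[n]     # Predicted = A(q(n)) * S(c(m)) + B(q(n)), per equation (1)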
  • Thus, the rate control mechanism targets the desired long-term bit rate by predicting the short-term bit rate and using this prediction to guide the selection of classification and quantization related parameters associated with the preferred audio codec. The use of this model to predict the short-term bit rate associated with the current frame offers the following benefits: [0119]
  • 1. Because the rate control is guided by characteristics of the current frame, the rate control mechanism can react in situ to transient signals. [0120]
  • 2. Because the short-term bit rate is predicted without performing quantization, reduced computational complexity results. [0121]
  • The preferred implementation uses both the long-term bit rate and the short-term bit rate to guide the encoder to better target a desired bit rate. The algorithm is activated under four conditions: [0122]
  • 1. (LOW, LOW): The long-term bit rate is low and the short-term bit rate is low. [0123]
  • 2. (LOW, HIGH): The long-term bit rate is low and the short-term bit rate is high. [0124]
  • 3. (HIGH, LOW): The long-term bit rate is high and the short-term bit rate is low. [0125]
  • 4. (HIGH, HIGH): The long-term bit rate is high and the short-term bit rate is high. [0126]
  • The preferred implementation of the rate control mechanism is outlined in the three-step procedure below. The four conditions differ in Step 3 only. The implementation of Step 3 for Case 1 (LOW, LOW) and Case 4 (HIGH, HIGH) is given below. Case 2 (LOW, HIGH) and Case 4 (HIGH, HIGH) are identical, with the exception that they have different values for the upper limit of the target short-term bit rate for the signal coefficients. Case 3 (HIGH, LOW) and Case 1 (LOW, LOW) are identical, with the exception that they have different values for the lower limit of the target short-term bit rate for the signal coefficients. Accordingly, given n and m used for the previous frame: [0127]
  • 1. Calculate S(c(m)), the percentage of the frame classified as signal, based on the characteristics of the frame. [0128]
  • 2. Predict the required bits to quantize the signal in the current frame based on the linear model given in equation (1) above, using the S(c(m)) calculated in Step 1, A(n), and B(n). [0129]
  • 3. Conditional processing step: [0130]
    if the (LOW, LOW) case applies:
    do {
    if m < MAX_M
    m++;
    else
    end loop after this iteration
    end
    Repeat Steps
    1 and 2 with the new parameter m
    (and therefore S(c(m)).
    if predicted short term bit rate for signal < lower limit of target
    short term bit
    rate for signal and n < MAX_N
    n++;
    if further from target than before
    n−−; (use results with previous n)
    end loop after this iteration
    end
    end
    } while (not end loop and (predicted short term bit rate for signal < lower
    limit of
    target short term bit rate for signal) and (m < MAX_M or n < MAX_N))
    end
    if the (HIGH, HIGH) case applies:
    do {
    if m > MIN_M
    m−−;
    else
    end loop after this iteration
    end
  • [0131] Repeat Steps 1 and 2 with the new parameter m (and therefore S(c(m)).
     if predicted short term bit rate for signal > upper limit of target short term bit
    rate for signal and n > MIN_N
    n−−;
    if further from target than before
    n++; (use results with previous n)
    end loop after this iteration
     end
    end
    } while (not end loop and (predicted short term bit rate for signal > upper limit of
    target short term bit rate for signal) and (m > MIN_M or n > MIN_N))
    end
  • In this implementation, additional information about which set of quantization parameters is chosen may be encoded. [0132]
  • Bit-Stream Formatting 114
  • The indices output by the [0133] quantization function 108 and the Stochastic Noise Analysis function 110 are formatted into a suitable bit-stream form by the bit-stream formatting function 114. The output information may also include zone indices to indicate the location of the quantization and stochastic noise analysis indices, rate control information, best basis tree information, and any normalization factors.
  • In the preferred embodiment, the format is the “ART” multimedia format used by America Online and further described in U.S. patent application Ser. No. 08/866,857, filed May 30, 1997, entitled “Encapsulated Document and Format System”, assigned to the assignee of the present invention and hereby incorporated by reference. However, other formats may be used, in known fashion. Formatting may include such information as identification fields, field definitions, error detection and correction data, version information, etc. [0134]
  • The formatted bit-stream represents a compressed audio file that may then be transmitted over a channel, such as the Internet, or stored on a medium, such as a magnetic or optical data storage disk. [0135]
  • Audio Decoding
  • FIG. 3 is a block diagram of a preferred general purpose audio decoding system in accordance with the invention. The preferred audio decoding system may be implemented in software or hardware, and comprises 7 major functional blocks, [0136] 200-212, which are described below.
  • Bit-stream Decoding 200
  • An incoming bit-stream previously generated by an audio encoder in accordance with the invention is coupled to a bit-[0137] stream decoding function 200. The decoding function 200 simply disassembles the received binary data into the original audio data, separating out the quantization indices and Stochastic Noise Analysis indices into corresponding signal and noise energy values, in known fashion.
  • Stochastic Noise Synthesis 202
  • The Stochastic Noise Analysis indices are applied to a Stochastic [0138] Noise Synthesis function 202. As discussed above, there are two preferred implementations of the stochastic noise synthesis. Given coded spectral energy for each frequency band, one can synthesize the stochastic noise in either the spectral domain or the time-domain for each of the residue sub-frames.
  • The spectral domain approaches generate pseudo-random numbers, which are scaled by the residue energy level in each frequency band. These scaled random numbers for each band are used as the synthesized DCT or FFT coefficients. Then, the synthesized coefficients are inversely transformed to form a time-domain spectrally colored noise signal. This technique is lower in computational complexity than its time-domain counterpart, and is useful when the residue sub-frame sizes are small. [0139]
  • The time-domain technique involves a filter bank based noise synthesizer. A bank of band-limited filters, one for each frequency band, is pre-computed. The time-domain noise signal is synthesized one frequency band at a time. The following describes the details of synthesizing the time-domain noise signal for one frequency band: [0140]
  • 1. A random number generator is used to generate white noise. [0141]
  • 2. The white noise signal is fed through the band-limited filter to produce the desired spectrally colored stochastic noise for the given frequency band. [0142]
  • 3. For each frequency band, the noise gain curve for the entire coding frame is determined by interpolating the encoded residue energy levels among residue sub-frames and between audio coding frames. Because of the interpolation, such a noise gain curve is continuous. This continuity is an additional advantage of the time-domain-based technique. [0143]
  • 4. Finally, the gain curve is applied to the spectrally colored noise signal. [0144]
  • Steps 1 and 2 can be pre-computed, thereby eliminating the need for implementing these steps during the decoding process. Computational complexity can therefore be reduced. [0145]
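  • The filter bank technique can be sketched in Python/SciPy as follows (illustrative only; the FIR design, tap count, and normalized band edges are assumptions rather than the codec's pre-computed filters; gain_curves holds one interpolated gain array per band, as produced in step 3):
    import numpy as np
    from scipy.signal import firwin, lfilter

    def make_filter_bank(band_edges, ntaps=129):
        # band_edges are normalized to the Nyquist frequency, e.g. [0.0, 0.1, 0.25, 0.5, 0.95]
        return [firwin(ntaps, [max(lo, 1e-3), hi], pass_zero=False)   # one band-limited filter per band
                for lo, hi in zip(band_edges[:-1], band_edges[1:])]

    def synthesize_noise(filters, gain_curves, n):
        out = np.zeros(n)
        for h, gains in zip(filters, gain_curves):
            colored = lfilter(h, [1.0], np.random.randn(n))   # steps 1-2: band-limited filtered white noise
            out += gains * colored                            # step 4: apply the interpolated gain curve
        return out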
  • Inverse Quantization 204
  • The quantization indices are applied to an [0146] inverse quantization function 204 to generate signal coefficients. As in the case of quantization of the extended best basis tree, the de-quantization process is carried out for each of the best basis trees for each sub-frame. The preferred algorithm for de-quantization of a best basis tree follows:
    d = maximum depth of time-splitting for the best basis tree in question
    maxWidth = 2^d−1;
    read maxWidth bits from bit-stream to code(1:maxWidth); (code = quantized bit-stream)
    btree = zeros(2^(D+1)−1, 1);
    btree(1) = code(1); index = 1;
    for i = 0:d−2,
    nP = 2^i;
    for b = 0:nP−1,
    if btree(nP+b) == 1,
    btree(2*(nP+b) + (0:1)) = code(index+(1:2)); index = index + 2;
    end
    end
    end
    code = code(1:index);    (actual bits used is index)
    rewind bit pointer for the bit-stream by (maxWidth − index) bits.
  • The preferred de-quantization algorithm for the signal components is a straightforward application of ASVQ type IV de-quantization described in allowed U.S. patent application Ser. No. 08/958,567 referenced above. [0147]
  • Inverse Transform 206
  • The signal coefficients are applied to an [0148] inverse transform function 206 to generate a time-domain reconstructed signal waveform. In this example, the adaptive cosine synthesis is similar to its counterpart in CPT with one additional step that converts the extended best basis tree (2-D array in general) into the combined best basis tree (1-D array). Then the cosine packet synthesis is carried out for the inverse transform. Details follow:
  • 1. Pre-calculate the bell window functions, bp and bm, as in [0149] CPT Step 1.
  • 2. Join the extended best basis tree, btrees, into a combined best basis tree, btree, a reverse of the split operation carried out in ACPT Step [0150] 6:
    if PRE-SPLIT_NOT_REQUIRED,
    btree = btrees;
    else
    nP1 = 2^D1;
    btree = zeros(2^(D+1)−1, 1);
    btree(1:nP1−1) = ones(nP1−1, 1);
    index = nP1;
    d2 = D2−D1;
    for i = 0:d2−1,
    for j = 1:nP1,
    for k = 2^i−1 + (1:2^i),
    btree(index) = btrees(k, j);
    index = index+1;
    end
    end
    end
    end
  • 3. Perform cosine packet synthesis to recover the time-domain signal, y, from the optimal cosine packet coefficients, opkt: [0151]
    m = N / 2^(D+1);
    y = zeros(N, 1);
    stack = zeros(2^D+1, 2);
    k = 1;
    while k > 0,
    d = stack(k, 1);
    b = stack(k, 2);
    k = k − 1;
    nP = 2^d;
    Nj = N / nP;
    i = nP + b;
    if btree(i) == 0,
    ind = b * Nj + (1:Nj);
    xlcr = sqrt(2/Nj) * dct4(opkt(ind));
    xc = xlcr;
    xl = zeros(Nj, 1);
    xr = zeros(Nj, 1);
    ind1 = 1:m;
    ind2 = Nj+1 − ind1;
    xc(ind1) = bp .* xlcr(ind1);
    xc(ind2) = bp .* xlcr(ind2);
    xl(ind2) = bm .* xlcr(ind1);
    xr(ind1) = −bm .* xlcr(ind2);
    y(ind) = y(ind) + xc;
    if b == 0,
    y(ind1) = y(ind1) + xc(ind1) .* (1−bp) ./ bp;
    else
    y(ind−Nj) = y(ind−Nj) + xl;
    end
    if b < nP−1,
    y(ind+Nj) = y(ind+Nj) + xr;
    else
    y(ind2+N−Nj) = y(ind2+N−Nj) + xc(ind2) .* (1−bp) ./ bp;
    end;
    else
    k = k+1; stack(k, :) = [d+1 2*b];
    k = k+1; stack(k, :) = [d+1 2*b+1];
    end;
    end
  • Renormalization 208
  • The time-domain reconstructed signal and synthesized stochastic noise signal, from the inverse adaptive cosine [0152] packet synthesis function 206 and the stochastic noise synthesis function 202, respectively, are combined to form the complete reconstructed signal. The reconstructed signal is then optionally multiplied by the encoded scalar normalization factor in a renormalization function 208.
  • Boundary Synthesis 210
  • In the decoder, the [0153] boundary synthesis function 210 constitutes the last functional block before any time-domain post-processing (including but not limited to soft clipping, scaling, and re-sampling). Boundary synthesis is illustrated in the bottom (Decode) portion of FIG. 4. In the boundary synthesis component 210, a synthesis history buffer (HBD) is maintained for the purpose of boundary interpolation. The size of this history (sHBD) is a fraction of the size of the analysis history buffer (sHBE), namely,
  • sHBD=RD*sHBE=RD*RE*Ns, where Ns is the number of samples in a coding frame.
  • Consider one coding frame of Ns samples. Label them S[i], where i=0, 1, 2, . . . , Ns−1. The [0154] synthesis history buffer keeps the sHBD samples from the last coding frame, starting at sample number Ns−sHBE/2−sHBD/2. The system takes Ns−sHBE samples from the synthesized time-domain signal (from the renormalization block), starting at sample number sHBE/2−sHBD/2.
  • These [0155] Ns−sHBE samples are called the pre-interpolation output data. The first sHBD samples of the pre-interpolation output data overlap in time with the samples kept in the synthesis history buffer. Therefore, a simple interpolation (e.g., linear interpolation) is used to reduce the boundary discontinuity. After the first sHBD samples are interpolated, the Ns−sHBE output data is then sent to the next functional block (in this embodiment, soft clipping 212). The synthesis history buffer is subsequently updated by the sHBD samples from the current synthesis frame, starting at sample number Ns−sHBE/2−sHBD/2.
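  • A Python/NumPy sketch of this boundary synthesis step (an illustration of the indexing and a linear cross-fade over the overlap, not the original implementation) is:
    import numpy as np

    def boundary_synthesis(synth_frame, hb_d, sHBE):
        Ns = len(synth_frame)
        sHBD = len(hb_d)
        start = sHBE // 2 - sHBD // 2
        out = synth_frame[start:start + Ns - sHBE].copy()         # Ns - sHBE pre-interpolation samples
        fade = np.linspace(0.0, 1.0, sHBD, endpoint=False)
        out[:sHBD] = (1.0 - fade) * hb_d + fade * out[:sHBD]      # linear interpolation over the overlap
        lo = Ns - sHBE // 2 - sHBD // 2
        hb_d_next = synth_frame[lo:lo + sHBD].copy()              # update HBD for the next frame
        return out, hb_d_next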
  • The resulting codec latency is simply given by the following formula, [0156]
  • latency=(sHBE+sHBD)/2=RE*(1+RD)*Ns/2 (samples),
  • which is a small fraction of the audio coding frame. Since the latency is given in samples, higher intrinsic audio sampling rate generally implies lower codec latency. [0157]
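  • For example, with the typical parameter values mentioned above (Ns=1024, RE=1/16, and RD=1/4), the latency is (1/16)*(1+1/4)*1024/2=40 samples, which is under one millisecond at a 44.1 kHz sampling rate and 5 milliseconds at 8 kHz, a small fraction of a single coding frame.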
  • Soft Clipping 212
  • In the preferred embodiment, the output of the [0158] boundary synthesis component 210 is applied to a soft clipping component 212. Signal saturation in low bit-rate audio compression due to lossy algorithms is a significant source of audible distortion if a simple and naive “hard clipping” mechanism is used to remove them. Soft clipping reduces spectral distortion when compared to the conventional “hard clipping” technique. The preferred soft clipping algorithm is described in allowed U.S. patent application Ser. No. 08/958,567 referenced above.
  • Computer Implementation
  • The invention may be implemented in hardware or software, or a combination of both (e.g., programmable logic arrays). Unless otherwise specified, the algorithms included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus to perform the required method steps. However, preferably, the invention is implemented in one or more computer programs executing on programmable systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The program code is executed on the processors to perform the functions described herein. [0159]
  • Each such program may be implemented in any desired computer language (including but not limited to machine, assembly, and high level logical, procedural, or object oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language. [0160]
  • Each such computer program is preferably stored on a storage media or device (e.g., ROM, CD-ROM, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein. [0161]
  • A number of embodiments of the present invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, some of the steps of the various algorithms may be order independent, and thus may be executed in an order other than as described above. As another example, although the preferred embodiments use vector quantization, scalar quantization may be used if desired in appropriate circumstances. Accordingly, other embodiments are within the scope of the following claims. [0167]

Claims (135)

What is claimed is:
1. A method for compressing a digitized time-domain continuous input signal, including:
formatting the input signal into a plurality of time-domain blocks having boundaries;
forming an overlapping time-domain block by prepending a fraction of a previous time-domain block to a current time-domain block;
transforming each overlapping time-domain block to a transform domain block comprising a plurality of coefficients;
partitioning the coefficients of each transform domain block into signal coefficients and residue coefficients;
quantizing the signal coefficients for each transform domain block and generating signal quantization indices indicative of such quantization;
modeling the residue coefficients for each transform domain block as stochastic noise and generating residue quantization indices indicative of such quantization; and
formatting the signal quantization indices and the residue quantization indices for each transform domain block as an output bit-stream.
2. The method of claim 1 wherein the continuous input signal includes audio data.
3. The method of claim 1 further including applying a windowing function to each time-domain block to enhance residue energy concentration near the boundaries of each such time-domain block.
4. The method of claim 1 further including normalizing each time-domain block before transforming each such time-domain block to a transform domain block.
5. The method of claim 1 wherein transforming each time-domain block to a transform domain block comprising a plurality of coefficients includes applying an adaptive cosine packet transform algorithm.
6. The method of claim 5 wherein the adaptive cosine packet transform algorithm optimally adapts to instantaneous changes in each overlapping time-domain block, independent of previous and subsequent blocks.
7. The method of claim 5 wherein the adaptive cosine packet transform algorithm includes:
calculating bell window functions;
calculating a cosine packet transform table for at least one time splitting level utilizing the bell window functions;
determining whether a pre-split at the time splitting level is needed for a current frame;
recalculating the cosine packet transform table at selected levels depending on the pre-split determination;
building a statistics tree for only the selected levels;
generating an extended statistics tree from the statistics tree;
performing a best basis analysis to determine an extended best basis tree from the extended statistics tree; and
determining optimal transform coefficients from the extended best basis tree.
8. The method of claim 1 further including applying a rate control feedback loop to dynamically modify parameters of either or both of the partitioning step or the quantizing step to approach a target bit rate.
9. The method of claim 8 wherein the rate control feedback loop includes:
computing a predicted short term bit rate as A(q(n))×S(c(m))+B(q(n)), where A and B are functions of quantization related parameters, collectively represented as a variable q, the variable q can take on values from a limited set of choices, represented by a variable n, and S represents the percentage of a time-domain block that is classified as signal, where S can take on values from a limited set of choices, represented by a variable m; and
iteratively generating values for n and m, based on a long-term bit rate and the predicted short-term bit rate.
10. The method of claim 8 wherein applying the rate control feedback loop includes:
calculating a short-term bit rate for a preceding encoding frame;
calculating a long-term running average bit rate;
comparing the short-term bit rate and the long-term running average bit rate to a target bit rate range; and
adjusting an input threshold factor within a specified range for a signal and noise partitioning in a subsequent frame.
11. The method of claim 1 wherein partitioning the coefficients of each time-domain block into signal coefficients and residue coefficients includes:
sorting the absolute value of the coefficients of each transform domain block;
calculating a global noise floor from the sorted coefficients;
calculating zone indices indicative of signal coefficient clusters;
calculating a local noise floor based on the zone indices;
determining signal coefficients based on the global noise floor, each local noise floor, and the zone indices;
removing weak signal coefficients from the signal coefficients;
removing residue coefficients from the signal coefficients in a first pass;
merging close neighbor signal coefficient clusters; and
removing residue coefficients from the signal coefficients in a second pass.
12. The method of claim 11 wherein calculating the global noise floor includes:
calculating a mean coefficient amplitude;
calculating a product of the mean coefficient amplitude and an adjustable input threshold factor as a threshold level; and
calculating the global noise floor as a mean amplitude of coefficients that are below the threshold level.
13. The method of claim 1 wherein quantizing the signal coefficients and generating signal quantization indices indicative of such quantization includes applying an adaptive sparse quantization algorithm.
14. The method of claim 1 wherein modeling the residue coefficients for each transform domain block as stochastic noise includes:
constructing a residue vector for each transform domain block;
synthesizing a time-domain residue frame from each residue vector;
splitting each residue frame into a plurality of residue sub-frames;
transforming each residue sub-frame into subbands of spectral coefficients; and
quantizing the spectral coefficients.
15. The method of claim 14 wherein splitting each residue frame into a plurality of residue sub-frames includes:
calculating subband sizes from a best basis tree; and
splitting each subband or joining neighboring subbands to create noise subframes that are within a specified range of subframe sizes.
16. A method for performing an adaptive cosine packet transform, including:
calculating bell window functions;
calculating a cosine packet transform table for at least one time splitting level utilizing the bell window functions;
determining whether a pre-split at the time splitting level is needed for a current frame;
recalculating the cosine packet transform table at selected levels depending on the pre-split determination;
building a statistics tree for only the selected levels;
generating an extended statistics tree from the statistics tree;
performing a best basis analysis to determine an extended best basis tree from the extended statistics tree; and
determining optimal transform coefficients from the extended best basis tree.
17. The method of claim 16 further including:
determining how to perform the pre-split for the current cosine packet transform frame to form the pre-split subframes; and
performing the pre-split for the current cosine packet transform frame to form the pre-split subframes.
18. A method for performing an adaptive cosine packet transform, including:
determining whether a pre-split is needed for a current cosine packet transform frame to form pre-split subframes;
applying a cosine packet transform to the pre-split subframes based on the determination;
performing a best basis analysis; and
determining optimal transform coefficients.
19. The method of claim 18 further including:
determining how to perform the pre-split for the current cosine packet transform frame to form the pre-split subframes; and
performing the pre-split for the current cosine packet transform frame to form the pre-split subframes.
20. The method of claim 18 further including:
calculating bell window functions; and
calculating a cosine packet transform table only for a time splitting level utilizing the bell window functions.
21. The method of claim 18 wherein performing the best basis analysis includes:
building a statistics tree for the pre-split subframes;
generating an extended statistics tree from the statistics tree; and
performing the best basis analysis to determine an extended best basis tree from the extended statistics tree.
22. The method of claim 21 wherein determining the optimal transform coefficients includes determining the optimal transform coefficients from the extended best basis tree.
23. A method for decompressing a bit stream including signal vector quantization indices and residue vector quantization indices, including:
decoding an output bit stream into vector quantization indices and residue vector quantization indices;
applying an inverse vector quantization algorithm to the vector quantization indices to generate signal coefficients;
applying an inverse transform to the signal coefficients to generate a time-domain reconstructed signal waveform;
applying a stochastic noise synthesis algorithm to the residue vector quantization indices to generate a time-domain reconstructed residue waveform;
combining the reconstructed signal waveform and the reconstructed residue waveform as a reconstructed input signal waveform block; and
applying a boundary synthesis algorithm to the reconstructed input signal waveform block to generate an output signal having substantially reduced boundary discontinuities.
24. The method of claim 23 wherein the inverse vector quantization algorithm includes an inverse adaptive sparse vector quantization algorithm.
25. The method of claim 23 wherein the inverse transform includes an inverse adaptive cosine packet transform.
26. The method of claim 25 wherein the inverse adaptive cosine packet transform includes:
calculating bell window functions;
joining an extended best basis tree into a combined best basis tree; and
synthesizing a time-domain signal from optimal cosine packet coefficients using the bell window functions.
27. The method of claim 23 further including renormalizing the reconstructed input signal waveform block.
28. The method of claim 23 wherein the stochastic noise synthesis algorithm is performed in the spectral domain, and includes:
generating pseudo-random numbers;
scaling the pseudo-random numbers by residue energy to produce synthesized DCT or FFT coefficients; and
performing an inverse-DCT or inverse-FFT to obtain a time-domain synthesized noise subframe signal.
29. The method of claim 23 wherein the stochastic noise synthesis algorithm includes a time-domain filter-bank based noise synthesizer which includes:
pre-computing band-limited filter coefficients for a plurality of frequency bands;
generating pseudo-random white noise;
applying the band-limited filter coefficients to the pseudo-random white noise to produce spectrally colored stochastic noise for each frequency band;
computing a noise gain curve for each frequency band by interpolating encoded residue energy levels among residue sub-frames and between audio coding frames;
applying each gain curve to a spectrally colored noise signal; and
adding each such noise signal to a corresponding frequency band to produce a final synthesized noise signal.
30. The method of claim 23 wherein the stochastic noise synthesis algorithm includes a synthesized noise subframe signal assembled into a noise frame signal by:
calculating subband sizes from a best basis tree;
splitting each subband or joining neighboring subbands to create noise subframes that are within a specified range of subframe sizes; and
placing the ordered noise subframe signal into a reconstructed noise frame utilizing the subframe sizes.
31. The method of claim 23 further including applying a soft clipping algorithm to the output signal to reduce spectral distortion.
32. A method for decompressing a bit stream including signal vector quantization indices and residue vector quantization indices, including:
generating a time-domain reconstructed signal waveform and residue vector quantization indices from an output bit stream;
applying a noise synthesis algorithm to the residue vector quantization indices to generate a time-domain reconstructed residue waveform;
combining the reconstructed signal waveform and the reconstructed residue waveform as a reconstructed input signal waveform block; and
applying a boundary synthesis algorithm to the reconstructed input signal waveform block to generate an output signal having substantially reduced boundary discontinuities.
33. The method of claim 32 wherein generating the time-domain reconstructed signal waveform and the residue vector quantization indices from the output bit stream includes:
decoding the output bit stream into vector quantization indices and the residue vector quantization indices;
applying an inverse vector quantization algorithm to the vector quantization indices to generate signal coefficients; and
applying an inverse transform to the signal coefficients to generate the time-domain reconstructed signal waveform.
34. The method of claim 33 wherein the inverse vector quantization algorithm includes an inverse adaptive sparse vector quantization algorithm.
35. The method of claim 33 wherein the inverse transform includes an inverse adaptive cosine packet transform.
36. The method of claim 35 wherein the inverse adaptive cosine packet transform includes:
calculating bell window functions;
joining an extended best basis tree into a combined best basis tree; and
synthesizing a time-domain signal from optimal cosine packet coefficients using the bell window functions.
37. The method of claim 32 further including renormalizing the reconstructed input signal waveform block.
38. The method of claim 32 wherein the noise synthesis algorithm includes a stochastic noise synthesis algorithm.
39. The method of claim 38 wherein the stochastic noise synthesis algorithm is performed in the spectral domain, and includes:
generating pseudo-random numbers;
scaling the pseudo-random numbers by residue energy to produce synthesized DCT or FFT coefficients; and
performing an inverse-DCT or inverse-FFT to obtain a time-domain synthesized noise signal.
40. The method of claim 38 wherein the stochastic noise synthesis algorithm includes a time-domain filter-bank based noise synthesizer which includes:
pre-computing band-limited filter coefficients for a plurality of frequency bands;
generating pseudo-random white noise;
applying the band-limited filter coefficients to the pseudo-random white noise to produce spectrally colored stochastic noise for each frequency band;
computing a noise gain curve for each frequency band by interpolating encoded residue energy levels among residue sub-frames and between audio coding frames;
applying each gain curve to a spectrally colored noise signal; and
adding each such noise signal to a corresponding frequency band to produce a final synthesized noise signal.
41. The method of claim 38 wherein the stochastic noise synthesis algorithm includes a synthesized noise subframe signal assembled into a noise frame signal by:
calculating subband sizes from a best basis tree;
splitting each subband or joining neighboring subbands to create noise subframes that are within a specified range of subframe sizes; and
placing the ordered noise subframe signal into a reconstructed noise frame utilizing the subframe sizes.
42. The method of claim 32 further including applying a soft clipping algorithm to the output signal to reduce spectral distortion.
43. A method for performing an inverse adaptive cosine packet transform, including:
calculating bell window functions;
joining an extended best basis tree into a combined best basis tree; and
synthesizing a time-domain signal from optimal cosine packet coefficients using the bell window functions.
44. The method of claim 43 further including applying the inverse adaptive cosine packet transform to signal coefficients to generate a time-domain reconstructed signal waveform.
45. A method for ultra-low latency compression and decompression for a general-purpose audio input signal, including:
formatting the audio input signal into a plurality of time-domain blocks having boundaries;
forming an overlapping time-domain block by prepending a fraction of a previous time-domain block to the current time-domain block;
transforming each time-domain block to a transform domain block comprising a plurality of coefficients;
partitioning the coefficients of each transform domain block into signal coefficients and residue coefficients;
quantizing the signal coefficients for each transform domain block and generating signal quantization indices indicative of such quantization;
modeling the residue coefficients for each transform domain block as stochastic noise and generating residue quantization indices indicative of such quantization;
formatting the signal quantization indices and the residue quantization indices for each transform domain block as an output bit-stream;
decoding the output bit stream into quantization indices and residue quantization indices;
applying an inverse quantization algorithm to the quantization indices to generate signal coefficients;
applying an inverse transform to the signal coefficients to generate a time-domain reconstructed signal waveform;
applying a stochastic noise synthesis algorithm to the residue quantization indices to generate a time-domain reconstructed residue waveform;
combining the reconstructed signal waveform and the reconstructed residue waveform as a reconstructed input signal waveform block; and
applying a boundary synthesis algorithm to the reconstructed input signal waveform block to generate an output signal having substantially reduced boundary discontinuities.
46. A computer program, residing on a computer-readable medium, for compressing a digitized time-domain continuous input signal, the computer program comprising instructions for causing a computer to:
format the input signal into a plurality of time-domain blocks having boundaries;
form an overlapping time-domain block by prepending a fraction of a previous time-domain block to a current time-domain block;
transform each overlapping time-domain block to a transform domain block comprising a plurality of coefficients;
partition the coefficients of each transform domain block into signal coefficients and residue coefficients;
quantize the signal coefficients for each transform domain block and generate signal quantization indices indicative of such quantization;
model the residue coefficients for each transform domain block as stochastic noise and generate residue quantization indices indicative of such quantization; and
format the signal quantization indices and the residue quantization indices for each transform domain block as an output bit-stream.
47. The computer program of claim 46 wherein the continuous input signal includes audio data.
48. The computer program of claim 46 further including instructions for causing the computer to apply a windowing function to each time-domain block to enhance residue energy concentration near the boundaries of each such time-domain block.
49. The computer program of claim 46 further including instructions for causing the computer to normalize each time-domain block before transforming each such time-domain block to a transform domain block.
50. The computer program of claim 46 wherein the instructions for causing the computer to transform each time-domain block to a transform domain block comprising a plurality of coefficients include instructions for causing the computer to apply an adaptive cosine packet transform algorithm.
51. The computer program of claim 50 wherein the adaptive cosine packet transform algorithm optimally adapts to instantaneous changes in each overlapping time-domain block, independent of previous and subsequent blocks.
52. The computer program of claim 50 wherein the adaptive cosine packet transform algorithm includes instructions for causing the computer to:
calculate bell window functions;
calculate a cosine packet transform table for at least one time splitting level utilizing the bell window functions;
determine whether a pre-split at the time splitting level is needed for a current frame;
recalculate the cosine packet transform table at selected levels depending on the pre-split determination;
build a statistics tree for only the selected levels;
generate an extended statistics tree from the statistics tree;
perform a best basis analysis to determine an extended best basis tree from the extended statistics tree; and
determine optimal transform coefficients from the extended best basis tree.
53. The computer program of claim 46 further including instructions for causing the computer to apply a rate control feedback loop to dynamically modify parameters of either or both of the instructions that cause the computer to partition or the instructions that cause the computer to quantize to approach a target bit rate.
54. The computer program of claim 53 wherein the rate control feedback loop includes instructions for causing the computer to:
compute a predicted short term bit rate as A(q(n))×S(c(m))+B(q(n)), where A and B are functions of quantization related parameters, collectively represented as a variable q, the variable q can take on values from a limited set of choices, represented by a variable n, and S represents the percentage of a time-domain block that is classified as signal, where S can take on values from a limited set of choices, represented by a variable m; and
iteratively generate values for n and m, based on a long-term bit rate and the predicted short-term bit rate.
55. The computer program of claim 53 wherein the instructions for causing the computer to apply the rate control feedback loop include instructions for causing the computer to:
calculate a short-term bit rate for a preceding encoding frame;
calculate a long-term running average bit rate;
compare the short-term bit rate and the long-term running average bit rate to a target bit rate range; and
adjust an input threshold factor within a specified range for a signal and noise partitioning in a subsequent frame.
56. The computer program of claim 46 wherein the instructions for causing the computer to partition the coefficients of each time-domain block into signal coefficients and residue coefficients include instructions for causing the computer to:
sort the absolute value of the coefficients of each transform domain block;
calculate a global noise floor from the sorted coefficients;
calculate zone indices indicative of signal coefficient clusters;
calculate a local noise floor based on the zone indices;
determine signal coefficients based on the global noise floor, each local noise floor, and the zone indices;
remove weak signal coefficients from the signal coefficients;
remove residue coefficients from the signal coefficients in a first pass;
merge close neighbor signal coefficient clusters; and
remove residue coefficients from the signal coefficients in a second pass.
57. The computer program of claim 56 wherein the instructions for causing the computer to calculate the global noise floor include instructions for causing the computer to:
calculate a mean coefficient amplitude;
calculate a product of the mean coefficient amplitude and an adjustable input threshold factor as a threshold level; and
calculate the global noise floor as a mean amplitude of coefficients that are below the threshold level.
58. The computer program of claim 46 wherein the instructions for causing the computer to quantize the signal coefficients and generate signal quantization indices indicative of such quantization include instructions for causing the computer to apply an adaptive sparse quantization algorithm.
59. The computer program of claim 46 wherein the instructions for causing the computer to model the residue coefficients for each transform domain block as stochastic noise include instructions for causing the computer to:
construct a residue vector for each transform domain block;
synthesize a time-domain residue frame from each residue vector;
split each residue frame into a plurality of residue sub-frames;
transform each residue sub-frame into subbands of spectral coefficients; and
quantize the spectral coefficients.
60. The computer program of claim 59 wherein the instructions for causing the computer to split each residue frame into a plurality of residue sub-frames include instructions for causing the computer to:
calculate subband sizes from a best basis tree; and
split each subband or join neighboring subbands to create noise subframes that are within a specified range of subframe sizes.
61. A computer program, residing on a computer-readable medium, for performing an adaptive cosine packet transform, the computer program comprising instructions for causing a computer to:
calculate bell window functions;
calculate a cosine packet transform table for at least one time splitting level utilizing the bell window functions;
determine whether a pre-split at the time splitting level is needed for a current frame;
recalculate the cosine packet transform table at selected levels depending on the pre-split determination;
build a statistics tree for only the selected levels;
generate an extended statistics tree from the statistics tree;
perform a best basis analysis to determine an extended best basis tree from the extended statistics tree; and
determine optimal transform coefficients from the extended best basis tree.
62. The computer program of claim 61 further including instructions for causing the computer to:
determine how to perform the pre-split for the current cosine packet transform frame to form the pre-split subframes; and
perform the pre-split for the current cosine packet transform frame to form the pre-split subframes.
63. A computer program, residing on a computer-readable medium, for performing an adaptive cosine packet transform, the computer program comprising instructions for causing a computer to:
determine whether a pre-split is needed for a current cosine packet transform frame to form pre-split subframes;
apply a cosine packet transform to the pre-split subframes based on the determination;
perform a best basis analysis; and
determine optimal transform coefficients.
64. The computer program of claim 63 further including instructions for causing the computer to:
determine how to perform the pre-split for the current cosine packet transform frame to form the pre-split subframes; and
perform the pre-split for the current cosine packet transform frame to form the pre-split subframes.
65. The computer program of claim 63 further including instructions for causing the computer to:
calculate bell window functions; and
calculate a cosine packet transform table only for a time splitting level utilizing the bell window functions.
66. The computer program of claim 63 wherein the instructions for causing the computer to perform the best basis analysis include instructions for causing the computer to:
build a statistics tree for the pre-split subframes;
generate an extended statistics tree from the statistics tree; and
perform the best basis analysis to determine an extended best basis tree from the extended statistics tree.
67. The computer program of claim 66 wherein the instructions for causing the computer to determine the optimal transform coefficients include instructions for causing the computer to determine the optimal transform coefficients from the extended best basis tree.
68. A computer program, residing on a computer-readable medium, for decompressing a bit stream including signal vector quantization indices and residue vector quantization indices, the computer program comprising instructions for causing a computer to:
decode an output bit stream into vector quantization indices and residue vector quantization indices;
apply an inverse vector quantization algorithm to the vector quantization indices to generate signal coefficients;
apply an inverse transform to the signal coefficients to generate a time-domain reconstructed signal waveform;
apply a stochastic noise synthesis algorithm to the residue vector quantization indices to generate a time-domain reconstructed residue waveform;
combine the reconstructed signal waveform and the reconstructed residue waveform as a reconstructed input signal waveform block; and
apply a boundary synthesis algorithm to the reconstructed input signal waveform block to generate an output signal having substantially reduced boundary discontinuities.
69. The computer program of claim 68 wherein the inverse vector quantization algorithm includes an inverse adaptive sparse vector quantization algorithm.
70. The computer program of claim 68 wherein the inverse transform includes an inverse adaptive cosine packet transform.
71. The computer program of claim 70 wherein the inverse adaptive cosine packet transform includes instructions for causing the computer to:
calculate bell window functions;
join an extended best basis tree into a combined best basis tree; and
synthesize a time-domain signal from optimal cosine packet coefficients using the bell window functions.
72. The computer program of claim 68 further including instructions for causing the computer to renormalize the reconstructed input signal waveform block.
73. The computer program of claim 68 wherein the stochastic noise synthesis algorithm is performed in the spectral domain, and includes instructions for causing the computer to:
generate pseudo-random numbers;
scale the pseudo-random numbers by residue energy to produce synthesized DCT or FFT coefficients; and
perform an inverse-DCT or inverse-FFT to obtain a time-domain synthesized noise subframe signal.
74. The computer program of claim 68 wherein the stochastic noise synthesis algorithm includes a time-domain filter-bank based noise synthesizer which includes instructions for causing the computer to:
pre-compute band-limited filter coefficients for a plurality of frequency bands;
generate pseudo-random white noise;
apply the band-limited filter coefficients to the pseudo-random white noise to produce spectrally colored stochastic noise for each frequency band;
compute a noise gain curve for each frequency band by interpolating encoded residue energy levels among residue sub-frames and between audio coding frames;
apply each gain curve to a spectrally colored noise signal; and
add each such noise signal to a corresponding frequency band to produce a final synthesized noise signal.
75. The computer program of claim 68 wherein the stochastic noise synthesis algorithm includes a synthesized noise subframe signal assembled into a noise frame signal by including instructions for causing the computer to:
calculate subband sizes from a best basis tree;
split each subband or join neighboring subbands to create noise subframes that are within a specified range of subframe sizes; and
place the ordered noise subframe signal into a reconstructed noise frame utilizing the subframe sizes.
76. The computer program of claim 68 further including instructions for causing the computer to apply a soft clipping algorithm to the output signal to reduce spectral distortion.
77. A computer program, residing on a computer-readable medium, for decompressing a bit stream including signal vector quantization indices and residue vector quantization indices, the computer program comprising instructions for causing a computer to:
generate a time-domain reconstructed signal waveform and residue vector quantization indices from an output bit stream;
apply a noise synthesis algorithm to the residue vector quantization indices to generate a time-domain reconstructed residue waveform;
combine the reconstructed signal waveform and the reconstructed residue waveform as a reconstructed input signal waveform block; and
apply a boundary synthesis algorithm to the reconstructed input signal waveform block to generate an output signal having substantially reduced boundary discontinuities.
78. The computer program of claim 77 wherein the instructions for causing the computer to generate the time-domain reconstructed signal waveform and the residue vector quantization indices from the output bit stream include instructions for causing the computer to:
decode the output bit stream into vector quantization indices and the residue vector quantization indices;
apply an inverse vector quantization algorithm to the vector quantization indices to generate signal coefficients; and
apply an inverse transform to the signal coefficients to generate the time-domain reconstructed signal waveform.
79. The computer program of claim 78 wherein the inverse vector quantization algorithm includes an inverse adaptive sparse vector quantization algorithm.
80. The computer program of claim 78 wherein the inverse transform includes an inverse adaptive cosine packet transform.
81. The computer program of claim 80 wherein the inverse adaptive cosine packet transform includes instructions for causing the computer to:
calculate bell window functions;
join an extended best basis tree into a combined best basis tree; and
synthesize a time-domain signal from optimal cosine packet coefficients using the bell window functions.
82. The computer program of claim 77 further including instructions for causing the computer to renormalize the reconstructed input signal waveform block.
83. The computer program of claim 77 wherein the noise synthesis algorithm includes a stochastic noise synthesis algorithm.
84. The computer program of claim 83 wherein the stochastic noise synthesis algorithm is performed in the spectral domain, and includes instructions for causing the computer to:
generate pseudo-random numbers;
scale the pseudo-random numbers by residue energy to produce synthesized DCT or FFT coefficients; and
perform an inverse-DCT or inverse-FFT to obtain a time-domain synthesized noise signal.
85. The computer program of claim 83 wherein the stochastic noise synthesis algorithm includes a time-domain filter-bank based noise synthesizer which includes instructions for causing the computer to:
pre-compute band-limited filter coefficients for a plurality of frequency bands;
generate pseudo-random white noise;
apply the band-limited filter coefficients to the pseudo-random white noise to produce spectrally colored stochastic noise for each frequency band;
compute a noise gain curve for each frequency band by interpolating encoded residue energy levels among residue sub-frames and between audio coding frames;
apply each gain curve to a spectrally colored noise signal; and
add each such noise signal to a corresponding frequency band to produce a final synthesized noise signal.
86. The computer program of claim 83 wherein the stochastic noise synthesis algorithm includes a synthesized noise subframe signal assembled into a noise frame signal by including instructions for causing the computer to:
calculate subband sizes from a best basis tree;
split each subband or join neighboring subbands to create noise subframes that are within a specified range of subframe sizes; and
place the ordered noise subframe signal into a reconstructed noise frame utilizing the subframe sizes.
87. The computer program of claim 77 further including instructions for causing the computer to apply a soft clipping algorithm to the output signal to reduce spectral distortion.
88. A computer program, residing on a computer-readable medium, for performing an inverse adaptive cosine packet transform, the computer program comprising instructions for causing a computer to:
calculate bell window functions;
join an extended best basis tree into a combined best basis tree; and
synthesize a time-domain signal from optimal cosine packet coefficients using the bell window functions.
89. The computer program of claim 88 further including instructions for causing the computer to apply the inverse adaptive cosine packet transform to signal coefficients to generate a time-domain reconstructed signal waveform.
90. A computer program, residing on a computer-readable medium, for ultra-low latency compression and decompression for a general-purpose audio input signal, the computer program comprising instructions for causing a computer to:
format the audio input signal into a plurality of time-domain blocks having boundaries;
form an overlapping time-domain block by prepending a fraction of a previous time-domain block to the current time-domain block;
transform each time-domain block to a transform domain block comprising a plurality of coefficients;
partition the coefficients of each transform domain block into signal coefficients and residue coefficients;
quantize the signal coefficients for each transform domain block and generate signal quantization indices indicative of such quantization;
model the residue coefficients for each transform domain block as stochastic noise and generate residue quantization indices indicative of such quantization;
format the signal quantization indices and the residue quantization indices for each transform domain block as an output bit-stream;
decode the output bit stream into quantization indices and residue quantization indices;
apply an inverse quantization algorithm to the quantization indices to generate signal coefficients;
apply an inverse transform to the signal coefficients to generate a time-domain reconstructed signal waveform;
apply a stochastic noise synthesis algorithm to the residue quantization indices to generate a time-domain reconstructed residue waveform;
combine the reconstructed signal waveform and the reconstructed residue waveform as a reconstructed input signal waveform block; and
apply a boundary synthesis algorithm to the reconstructed input signal waveform block to generate an output signal having substantially reduced boundary discontinuities.
91. A system for compressing a digitized time-domain continuous input signal, including:
means for formatting the input signal into a plurality of time-domain blocks having boundaries;
means for forming an overlapping time-domain block by prepending a fraction of a previous time-domain block to a current time-domain block;
means for transforming each overlapping time-domain block to a transform domain block comprising a plurality of coefficients;
means for partitioning the coefficients of each transform domain block into signal coefficients and residue coefficients;
means for quantizing the signal coefficients for each transform domain block and generating signal quantization indices indicative of such quantization;
means for modeling the residue coefficients for each transform domain block as stochastic noise and generating residue quantization indices indicative of such quantization; and
means for formatting the signal quantization indices and the residue quantization indices for each transform domain block as an output bit-stream.
92. The system of claim 91 wherein the continuous input signal includes audio data.
93. The system of claim 91 further including means for applying a windowing function to each time-domain block to enhance residue energy concentration near the boundaries of each such time-domain block.
94. The system of claim 91 further including means for normalizing each time-domain block before transforming each such time-domain block to a transform domain block.
95. The system of claim 91 wherein the means for transforming each time-domain block to a transform domain block comprising a plurality of coefficients includes means for applying an adaptive cosine packet transform algorithm.
96. The system of claim 95 wherein the means for applying the adaptive cosine packet transform algorithm optimally adapts to instantaneous changes in each overlapping time-domain block, independent of previous and subsequent blocks.
97. The system of claim 95 wherein the means for applying the adaptive cosine packet transform algorithm includes:
means for calculating bell window functions;
means for calculating a cosine packet transform table for at least one time splitting level utilizing the bell window functions;
means for determining whether a pre-split at the time splitting level is needed for a current frame;
means for recalculating the cosine packet transform table at selected levels depending on the pre-split determination;
means for building a statistics tree for only the selected levels;
means for generating an extended statistics tree from the statistics tree;
means for performing a best basis analysis to determine an extended best basis tree from the extended statistics tree; and
means for determining optimal transform coefficients from the extended best basis tree.
98. The system of claim 91 further including means for applying a rate control feedback loop to dynamically modify parameters of either or both of the means for partitioning or the means for quantizing to approach a target bit rate.
99. The system of claim 98 wherein the means for applying the rate control feedback loop includes:
means for computing a predicted short term bit rate as A(q(n))×S(c(m))+B(q(n)), where A and B are functions of quantization related parameters, collectively represented as a variable q, the variable q can take on values from a limited set of choices, represented by a variable n, and S represents the percentage of a time-domain block that is classified as signal, where S can take on values from a limited set of choices, represented by a variable m; and
means for iteratively generating values for n and m, based on a long-term bit rate and the predicted short-term bit rate.
100. The system of claim 98 wherein the means for applying the rate control feedback loop includes:
means for calculating a short-term bit rate for a preceding encoding frame;
means for calculating a long-term running average bit rate;
means for comparing the short-term bit rate and the long-term running average bit rate to a target bit rate range; and
means for adjusting an input threshold factor within a specified range for a signal and noise partitioning in a subsequent frame.
101. The system of claim 91 wherein the means for partitioning the coefficients of each time-domain block into signal coefficients and residue coefficients includes:
means for sorting the absolute value of the coefficients of each transform domain block;
means for calculating a global noise floor from the sorted coefficients;
means for calculating zone indices indicative of signal coefficient clusters;
means for calculating a local noise floor based on the zone indices;
means for determining signal coefficients based on the global noise floor, each local noise floor, and the zone indices;
means for removing weak signal coefficients from the signal coefficients;
means for removing residue coefficients from the signal coefficients in a first pass;
means for merging close neighbor signal coefficient clusters; and
means for removing residue coefficients from the signal coefficients in a second pass.
102. The system of claim 101 wherein the means for calculating the global noise floor includes:
means for calculating a mean coefficient amplitude;
means for calculating a product of the mean coefficient amplitude and an adjustable input threshold factor as a threshold level; and
means for calculating the global noise floor as a mean amplitude of coefficients that are below the threshold level.
103. The system of claim 91 wherein the means for quantizing the signal coefficients and generating signal quantization indices indicative of such quantization includes means for applying an adaptive sparse quantization algorithm.
104. The system of claim 91 wherein the means for modeling the residue coefficients for each transform domain block as stochastic noise includes:
means for constructing a residue vector for each transform domain block;
means for synthesizing a time-domain residue frame from each residue vector;
means for splitting each residue frame into a plurality of residue sub-frames;
means for transforming each residue sub-frame into subbands of spectral coefficients; and
means for quantizing the spectral coefficients.
105. The system of claim 104 wherein the means for splitting each residue frame into a plurality of residue sub-frames includes:
means for calculating subband sizes from a best basis tree; and
means for splitting each subband or joining neighboring subbands to create noise subframes that are within a specified range of subframe sizes.
106. A system for performing an adaptive cosine packet transform, including:
means for calculating bell window functions;
means for calculating a cosine packet transform table for at least one time splitting level utilizing the bell window functions;
means for determining whether a pre-split at the time splitting level is needed for a current frame;
means for recalculating the cosine packet transform table at selected levels depending on the pre-split determination;
means for building a statistics tree for only the selected levels;
means for generating an extended statistics tree from the statistics tree;
means for performing a best basis analysis to determine an extended best basis tree from the extended statistics tree; and
means for determining optimal transform coefficients from the extended best basis tree.
107. The system of claim 106 further including:
means for determining how to perform the pre-split for the current cosine packet transform frame to form the pre-split subframes; and
means for performing the pre-split for the current cosine packet transform frame to form the pre-split subframes.
108. A system for performing an adaptive cosine packet transform, including:
means for determining whether a pre-split is needed for a current cosine packet transform frame to form pre-split subframes;
means for applying a cosine packet transform to the pre-split subframes based on the determination;
means for performing a best basis analysis; and
means for determining optimal transform coefficients.
109. The system of claim 108 further including:
means for determining how to perform the pre-split for the current cosine packet transform frame to form the pre-split subframes; and
means for performing the pre-split for the current cosine packet transform frame to form the pre-split subframes.
110. The system of claim 108 further including:
means for calculating bell window functions; and
means for calculating a cosine packet transform table only for a time splitting level utilizing the bell window functions.
111. The system of claim 108 wherein the means for performing the best basis analysis includes:
means for building a statistics tree for the pre-split subframes;
means for generating an extended statistics tree from the statistics tree; and
means for performing the best basis analysis to determine an extended best basis tree from the extended statistics tree.
112. The system of claim 111 wherein the means for determining the optimal transform coefficients includes means for determining the optimal transform coefficients from the extended best basis tree.
113. A system for decompressing a bit stream including signal vector quantization indices and residue vector quantization indices, including:
means for decoding an output bit stream into vector quantization indices and residue vector quantization indices;
means for applying an inverse vector quantization algorithm to the vector quantization indices to generate signal coefficients;
means for applying an inverse transform to the signal coefficients to generate a time-domain reconstructed signal waveform;
means for applying a stochastic noise synthesis algorithm to the residue vector quantization indices to generate a time-domain reconstructed residue waveform;
means for combining the reconstructed signal waveform and the reconstructed residue waveform as a reconstructed input signal waveform block; and
means for applying a boundary synthesis algorithm to the reconstructed input signal waveform block to generate an output signal having substantially reduced boundary discontinuities.
114. The system of claim 113 wherein the means for applying the inverse vector quantization algorithm includes means for applying an inverse adaptive sparse vector quantization algorithm.
115. The system of claim 113 wherein the means for applying the inverse transform includes means for applying an inverse adaptive cosine packet transform.
116. The system of claim 115 wherein the means for applying the inverse adaptive cosine packet transform includes:
means for calculating bell window functions;
means for joining an extended best basis tree into a combined best basis tree; and
means for synthesizing a time-domain signal from optimal cosine packet coefficients using the bell window functions.
117. The system of claim 113 further including means for renormalizing the reconstructed input signal waveform block.
118. The system of claim 113 wherein the means for applying the stochastic noise synthesis algorithm is performed in the spectral domain, and includes:
means for generating pseudo-random numbers;
means for scaling the pseudo-random numbers by residue energy to produce synthesized DCT or FFT coefficients; and
means for performing an inverse-DCT or inverse-FFT to obtain a time-domain synthesized noise subframe signal.
119. The system of claim 113 wherein the means for applying the stochastic noise synthesis algorithm includes a time-domain filter-bank based noise synthesizer which includes:
means for pre-computing band-limited filter coefficients for a plurality of frequency bands;
means for generating pseudo-random white noise;
means for applying the band-limited filter coefficients to the pseudo-random white noise to produce spectrally colored stochastic noise for each frequency band;
means for computing a noise gain curve for each frequency band by interpolating encoded residue energy levels among residue sub-frames and between audio coding frames;
means for applying each gain curve to a spectrally colored noise signal; and
means for adding each such noise signal to a corresponding frequency band to produce a final synthesized noise signal.
120. The system of claim 119 wherein the means for applying the stochastic noise synthesis algorithm includes a synthesized noise subframe signal assembled into a noise frame signal by:
means for calculating subband sizes from a best basis tree;
means for splitting each subband or joining neighboring subbands to create noise subframes that are within a specified range of subframe sizes; and
means for placing the ordered noise subframe signal into a reconstructed noise frame utilizing the subframe sizes.
121. The system of claim 113 further including means for applying a soft clipping algorithm to the output signal to reduce spectral distortion.
122. A system for decompressing a bit stream including signal vector quantization indices and residue vector quantization indices, including:
means for generating a time-domain reconstructed signal waveform and residue vector quantization indices from an output bit stream;
means for applying a noise synthesis algorithm to the residue vector quantization indices to generate a time-domain reconstructed residue waveform;
means for combining the reconstructed signal waveform and the reconstructed residue waveform as a reconstructed input signal waveform block; and
means for applying a boundary synthesis algorithm to the reconstructed input signal waveform block to generate an output signal having substantially reduced boundary discontinuities.
123. The system of claim 122 wherein the means for generating the time-domain reconstructed signal waveform and the residue vector quantization indices from the output bit stream includes:
means for decoding the output bit stream into vector quantization indices and the residue vector quantization indices;
means for applying an inverse vector quantization algorithm to the vector quantization indices to generate signal coefficients; and
means for applying an inverse transform to the signal coefficients to generate the time-domain reconstructed signal waveform.
124. The system of claim 123 wherein the means for applying the inverse vector quantization algorithm includes means for applying an inverse adaptive sparse vector quantization algorithm.
125. The system of claim 123 wherein the means for applying the inverse transform includes means for applying an inverse adaptive cosine packet transform.
126. The system of claim 125 wherein means for applying the inverse adaptive cosine packet transform includes:
means for calculating bell window functions;
means for joining an extended best basis tree into a combined best basis tree; and
means for synthesizing a time-domain signal from optimal cosine packet coefficients using the bell window functions.
127. The system of claim 122 further including means for renormalizing the reconstructed input signal waveform block.
128. The system of claim 122 wherein the means for applying the noise synthesis algorithm includes means for applying a stochastic noise synthesis algorithm.
129. The system of claim 128 wherein the means for applying the stochastic noise synthesis algorithm is performed in the spectral domain, and includes:
means for generating pseudo-random numbers;
means for scaling the pseudo-random numbers by residue energy to produce synthesized DCT or FFT coefficients; and
means for performing an inverse-DCT or inverse-FFT to obtain a time-domain synthesized noise signal.
130. The system of claim 128 wherein the means for applying the stochastic noise synthesis algorithm includes a time-domain filter-bank based noise synthesizer which includes:
means for pre-computing band-limited filter coefficients for a plurality of frequency bands;
means for generating pseudo-random white noise;
means for applying the band-limited filter coefficients to the pseudo-random white noise to produce spectrally colored stochastic noise for each frequency band;
means for computing a noise gain curve for each frequency band by interpolating encoded residue energy levels among residue sub-frames and between audio coding frames;
means for applying each gain curve to a spectrally colored noise signal; and
means for adding each such noise signal to a corresponding frequency band to produce a final synthesized noise signal.
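A sketch of the filter-bank variant under simplifying assumptions: FIR band filters designed with scipy.signal.firwin, one scalar energy level per band per sub-frame, and linear interpolation of the gain curve between sub-frame centers. The band edges, tap count, and interpolation scheme are illustrative, not taken from the patent.

```python
import numpy as np
from scipy.signal import firwin, lfilter

def make_band_filters(band_edges, fs, num_taps=65):
    """Pre-compute band-limited FIR coefficients for each frequency band.
    band_edges -- list of (low_hz, high_hz) pairs (illustrative values)."""
    filters = []
    for low, high in band_edges:
        if low <= 0.0:
            filters.append(firwin(num_taps, high, fs=fs))                          # low-pass band
        else:
            filters.append(firwin(num_taps, [low, high], fs=fs, pass_zero=False))  # band-pass band
    return filters

def synthesize_noise_filterbank(filters, band_energies, subframe_len, rng=None):
    """band_energies -- (num_subframes, num_bands) encoded residue energy levels.
    A gain curve per band is built by interpolating these levels across
    sub-frames, applied to filtered white noise, and the bands are summed."""
    if rng is None:
        rng = np.random.default_rng()
    num_subframes, num_bands = band_energies.shape
    total_len = num_subframes * subframe_len
    t = np.arange(total_len)
    centers = (np.arange(num_subframes) + 0.5) * subframe_len
    out = np.zeros(total_len)
    for b in range(num_bands):
        white = rng.standard_normal(total_len)
        colored = lfilter(filters[b], 1.0, white)                # spectrally colored noise
        gains = np.interp(t, centers, np.sqrt(band_energies[:, b]))
        out += gains * colored
    return out
```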
131. The system of claim 128 wherein the means for applying the stochastic noise synthesis algorithm includes a synthesized noise subframe signal assembled into a noise frame signal by:
means for calculating subband sizes from a best basis tree;
means for splitting each subband or joining neighboring subbands to create noise subframes that are within a specified range of subframe sizes; and
means for placing the ordered noise subframe signal into a reconstructed noise frame utilizing the subframe sizes.
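A rough sketch of the sub-frame assembly, assuming the best basis tree is supplied as a left-to-right list of leaf depths in a dyadic decomposition; that tree encoding and the split/join policy are simplifications of the range constraint stated in the claim.

```python
import numpy as np

def subband_sizes_from_tree(leaf_depths, frame_len):
    """Sub-band sizes implied by a dyadic best-basis tree, given the depth of
    each leaf in left-to-right order (illustrative tree encoding)."""
    return [frame_len >> d for d in leaf_depths]

def regroup_subframes(sizes, min_len, max_len):
    """Split oversized sub-bands and join undersized neighbours so every noise
    sub-frame size falls within [min_len, max_len] (simplified policy)."""
    out = []
    for s in sizes:
        while s > max_len:
            out.append(max_len)
            s -= max_len
        if s == 0:
            continue
        if s < min_len and out:
            out[-1] += s          # join the short remainder with its neighbour
        else:
            out.append(s)
    return out

def assemble_noise_frame(subframe_signals, sizes):
    """Place the ordered noise sub-frame signals into one reconstructed frame."""
    return np.concatenate([sig[:n] for sig, n in zip(subframe_signals, sizes)])
```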
132. The system of claim 122 further including means for applying a soft clipping algorithm to the output signal to reduce spectral distortion.
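The soft clipping of claims 121 and 132 is illustrated below with a generic tanh-based clipper: samples below a threshold pass through unchanged and samples above it are bent smoothly into the remaining headroom. This stand-in shape is not the patent's own soft-clipping postprocessor.

```python
import numpy as np

def soft_clip(x, threshold=0.9):
    """Generic soft clipper: pass-through below the threshold, smooth
    saturation above it, adding far less spectral distortion than a
    hard limiter."""
    y = x.copy()
    over = np.abs(x) > threshold
    headroom = 1.0 - threshold
    excess = np.abs(x[over]) - threshold
    y[over] = np.sign(x[over]) * (threshold + headroom * np.tanh(excess / headroom))
    return y
```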
133. A system for performing an inverse adaptive cosine packet transform, including:
means for calculating bell window functions;
means for joining an extended best basis tree into a combined best basis tree; and
means for synthesizing a time-domain signal from optimal cosine packet coefficients using the bell window functions.
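A greatly simplified synthesis sketch: every segment is treated as an equal-sized lapped cosine (MDCT-style) block with a sine bell window and 50% overlap-add. The actual inverse adaptive cosine packet transform of claims 126 and 133 instead derives variable segment sizes from a combined best basis tree and unfolds those segments, which this sketch does not attempt.

```python
import numpy as np

def bell_window(length):
    """Sine bell, one common window satisfying the overlap-add condition."""
    n = np.arange(length)
    return np.sin(np.pi * (n + 0.5) / length)

def imdct(coeffs):
    """Inverse MDCT of N coefficients -> 2N samples, by direct evaluation."""
    N = len(coeffs)
    n = np.arange(2 * N)
    k = np.arange(N)
    phase = np.pi / N * np.outer(n + 0.5 + N / 2.0, k + 0.5)
    return (2.0 / N) * np.cos(phase) @ coeffs

def synthesize_lapped_cosine(blocks):
    """Windowed overlap-add synthesis from per-block cosine coefficients.
    blocks -- list of equal-length coefficient arrays (N coefficients each)."""
    N = len(blocks[0])
    win = bell_window(2 * N)
    out = np.zeros((len(blocks) + 1) * N)
    for i, coeffs in enumerate(blocks):
        out[i * N:(i + 2) * N] += win * imdct(coeffs)
    return out
```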
134. The system of claim 133 further including means for applying the inverse adaptive cosine packet transform to signal coefficients to generate a time-domain reconstructed signal waveform.
135. A system for ultra-low latency compression and decompression for a general-purpose audio input signal, including:
means for formatting the audio input signal into a plurality of time-domain blocks having boundaries;
means for forming an overlapping time-domain block by prepending a fraction of a previous time-domain block to the current time-domain block;
means for transforming each time-domain block to a transform domain block comprising a plurality of coefficients;
means for partitioning the coefficients of each transform domain block into signal coefficients and residue coefficients;
means for quantizing the signal coefficients for each transform domain block and generating signal quantization indices indicative of such quantization;
means for modeling the residue coefficients for each transform domain block as stochastic noise and generating residue quantization indices indicative of such quantization;
means for formatting the signal quantization indices and the residue quantization indices for each transform domain block as an output bit-stream;
means for decoding the output bit stream into quantization indices and residue quantization indices;
means for applying an inverse quantization algorithm to the quantization indices to generate signal coefficients;
means for applying an inverse transform to the signal coefficients to generate a time-domain reconstructed signal waveform;
means for applying a stochastic noise synthesis algorithm to the residue quantization indices to generate a time-domain reconstructed residue waveform;
means for combining the reconstructed signal waveform and the reconstructed residue waveform as a reconstructed input signal waveform block; and
means for applying a boundary synthesis algorithm to the reconstructed input signal waveform block to generate an output signal having substantially reduced boundary discontinuities.
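To show how the elements of claim 135 fit together end to end, here is a compact skeleton in which every sub-step is a deliberately naive stand-in: a DCT-II as the transform, a keep-the-largest-coefficients split, uniform scalar quantization, and energy-only residue modeling. None of these stand-ins is the transform, quantizer, or noise model the patent describes, and the boundary synthesis step is only indicated by a comment.

```python
import numpy as np
from scipy.fft import dct, idct

QUANT_STEP = 64.0   # illustrative scalar quantization step

def encode_block(block, prev_block, overlap_fraction=0.25, num_signal_coeffs=64):
    """Encoder skeleton: overlap, transform, coefficient split, quantization."""
    overlap = int(len(prev_block) * overlap_fraction)
    extended = np.concatenate([prev_block[len(prev_block) - overlap:], block])  # overlapping block
    coeffs = dct(extended, norm='ortho')                         # stand-in transform
    order = np.argsort(np.abs(coeffs))[::-1]
    signal_idx = order[:num_signal_coeffs]                       # "signal" coefficients
    residue_idx = order[num_signal_coeffs:]                      # "residue" coefficients
    signal_q = np.round(coeffs[signal_idx] * QUANT_STEP).astype(np.int32)
    residue_energy = float(np.sum(coeffs[residue_idx] ** 2))     # noise model keeps energy only
    return {'signal_idx': signal_idx, 'signal_q': signal_q,
            'residue_energy': residue_energy, 'length': len(extended)}

def decode_block(packet, rng=None):
    """Decoder skeleton: inverse quantization, noise synthesis, recombination."""
    if rng is None:
        rng = np.random.default_rng()
    n = packet['length']
    coeffs = np.zeros(n)
    coeffs[packet['signal_idx']] = packet['signal_q'] / QUANT_STEP
    # Stochastic residue: random coefficients in the non-signal positions,
    # scaled to carry the encoded residue energy.
    mask = np.ones(n, dtype=bool)
    mask[packet['signal_idx']] = False
    noise = rng.standard_normal(mask.sum())
    coeffs[mask] = noise * np.sqrt(packet['residue_energy'] / np.sum(noise ** 2))
    reconstructed = idct(coeffs, norm='ortho')
    # A real decoder would now run boundary synthesis over the overlapping
    # region to suppress block-edge discontinuities; omitted in this sketch.
    return reconstructed
```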
US10/061,310 1999-05-27 2002-02-04 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec Expired - Lifetime US6885993B2 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
US10/061,310 US6885993B2 (en) 1999-05-27 2002-02-04 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US11/075,440 US7181403B2 (en) 1999-05-27 2005-03-09 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US11/609,081 US7418395B2 (en) 1999-05-27 2006-12-11 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US12/197,645 US8010371B2 (en) 1999-05-27 2008-08-25 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US13/191,496 US8285558B2 (en) 1999-05-27 2011-07-27 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US13/618,414 US8712785B2 (en) 1999-05-27 2012-09-14 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US13/618,339 US20130173271A1 (en) 1999-05-27 2012-09-14 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/321,488 US6370502B1 (en) 1999-05-27 1999-05-27 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US10/061,310 US6885993B2 (en) 1999-05-27 2002-02-04 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/321,488 Division US6370502B1 (en) 1999-05-27 1999-05-27 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US11/075,440 Division US7181403B2 (en) 1999-05-27 2005-03-09 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec

Publications (2)

Publication Number Publication Date
US20020116199A1 true US20020116199A1 (en) 2002-08-22
US6885993B2 US6885993B2 (en) 2005-04-26

Family

ID=23250806

Family Applications (9)

Application Number Title Priority Date Filing Date
US09/321,488 Expired - Lifetime US6370502B1 (en) 1999-05-27 1999-05-27 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US10/061,310 Expired - Lifetime US6885993B2 (en) 1999-05-27 2002-02-04 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US10/061,206 Expired - Lifetime US6704706B2 (en) 1999-05-27 2002-02-04 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US11/075,440 Expired - Lifetime US7181403B2 (en) 1999-05-27 2005-03-09 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US11/609,081 Expired - Lifetime US7418395B2 (en) 1999-05-27 2006-12-11 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US12/197,645 Expired - Fee Related US8010371B2 (en) 1999-05-27 2008-08-25 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US13/191,496 Expired - Lifetime US8285558B2 (en) 1999-05-27 2011-07-27 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US13/618,339 Abandoned US20130173271A1 (en) 1999-05-27 2012-09-14 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US13/618,414 Expired - Lifetime US8712785B2 (en) 1999-05-27 2012-09-14 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US09/321,488 Expired - Lifetime US6370502B1 (en) 1999-05-27 1999-05-27 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec

Family Applications After (7)

Application Number Title Priority Date Filing Date
US10/061,206 Expired - Lifetime US6704706B2 (en) 1999-05-27 2002-02-04 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US11/075,440 Expired - Lifetime US7181403B2 (en) 1999-05-27 2005-03-09 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US11/609,081 Expired - Lifetime US7418395B2 (en) 1999-05-27 2006-12-11 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US12/197,645 Expired - Fee Related US8010371B2 (en) 1999-05-27 2008-08-25 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US13/191,496 Expired - Lifetime US8285558B2 (en) 1999-05-27 2011-07-27 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US13/618,339 Abandoned US20130173271A1 (en) 1999-05-27 2012-09-14 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US13/618,414 Expired - Lifetime US8712785B2 (en) 1999-05-27 2012-09-14 Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec

Country Status (6)

Country Link
US (9) US6370502B1 (en)
EP (2) EP1181686B1 (en)
AT (2) ATE278236T1 (en)
CA (1) CA2373520C (en)
DE (2) DE60041790D1 (en)
WO (1) WO2000074038A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020120445A1 (en) * 2000-11-03 2002-08-29 Renat Vafin Coding signals
US20040024592A1 (en) * 2002-08-01 2004-02-05 Yamaha Corporation Audio data processing apparatus and audio data distributing apparatus
US20040039565A1 (en) * 2002-08-23 2004-02-26 Kulas Charles J. Digital representation of audio waveforms using peak shifting to provide increased dynamic range
US20050185732A1 (en) * 2004-02-25 2005-08-25 Nokia Corporation Multiscale wireless communication
US20060020453A1 (en) * 2004-05-13 2006-01-26 Samsung Electronics Co., Ltd. Speech signal compression and/or decompression method, medium, and apparatus
US20070016405A1 (en) * 2005-07-15 2007-01-18 Microsoft Corporation Coding with improved time resolution for selected segments via adaptive block transformation of a group of samples from a subband decomposition
US20070094015A1 (en) * 2005-09-22 2007-04-26 Georges Samake Audio codec using the Fast Fourier Transform, the partial overlap and a decomposition in two plans based on the energy.
US20080086313A1 (en) * 2006-10-02 2008-04-10 Sony Corporation Signal processing apparatus, signal processing method, and computer program
US20090024395A1 (en) * 2004-01-19 2009-01-22 Matsushita Electric Industrial Co., Ltd. Audio signal encoding method, audio signal decoding method, transmitter, receiver, and wireless microphone system
US7761290B2 (en) 2007-06-15 2010-07-20 Microsoft Corporation Flexible frequency and time partitioning in perceptual transform coding of audio
US20100281017A1 (en) * 2009-04-29 2010-11-04 Oracle International Corp Partition pruning via query rewrite
US20110110421A1 (en) * 2009-11-10 2011-05-12 Electronics And Telecommunications Research Institute Rate control method for video encoder using kalman filter and fir filter

Families Citing this family (136)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5807670A (en) * 1995-08-14 1998-09-15 Abbott Laboratories Detection of hepatitis GB virus genotypes
WO1999017451A2 (en) * 1997-09-30 1999-04-08 Koninklijke Philips Electronics N.V. Method and device for detecting bits in a data signal
US6370502B1 (en) * 1999-05-27 2002-04-09 America Online, Inc. Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
EP1201088B1 (en) * 1999-07-30 2005-11-16 Indinell Sociedad Anonima Method and apparatus for processing digital images and audio data
EP1228506B1 (en) * 1999-10-30 2006-08-16 STMicroelectronics Asia Pacific Pte Ltd. Method of encoding an audio signal using a quality value for bit allocation
JP3507743B2 (en) * 1999-12-22 2004-03-15 インターナショナル・ビジネス・マシーンズ・コーポレーション Digital watermarking method and system for compressed audio data
EP1199711A1 (en) * 2000-10-20 2002-04-24 Telefonaktiebolaget Lm Ericsson Encoding of audio signal using bandwidth expansion
US7062445B2 (en) * 2001-01-26 2006-06-13 Microsoft Corporation Quantization loop with heuristic approach
CN1167034C (en) * 2001-02-27 2004-09-15 华为技术有限公司 Method for image predenoising
US6757648B2 (en) * 2001-06-28 2004-06-29 Microsoft Corporation Techniques for quantization of spectral data in transcoding
US6882685B2 (en) * 2001-09-18 2005-04-19 Microsoft Corporation Block transform and quantization for image and video coding
EP1318611A1 (en) * 2001-12-06 2003-06-11 Deutsche Thomson-Brandt Gmbh Method for retrieving a sensitive criterion for quantized spectra detection
US7460993B2 (en) * 2001-12-14 2008-12-02 Microsoft Corporation Adaptive window-size selection in transform coding
US7027982B2 (en) * 2001-12-14 2006-04-11 Microsoft Corporation Quality and rate control strategy for digital audio
US7240001B2 (en) * 2001-12-14 2007-07-03 Microsoft Corporation Quality improvement techniques in an audio encoder
US6934677B2 (en) * 2001-12-14 2005-08-23 Microsoft Corporation Quantization matrices based on critical band pattern information for digital audio wherein quantization bands differ from critical bands
US7242713B2 (en) * 2002-05-02 2007-07-10 Microsoft Corporation 2-D transforms for image and video coding
US6980695B2 (en) * 2002-06-28 2005-12-27 Microsoft Corporation Rate allocation for mixed content video
US7328150B2 (en) * 2002-09-04 2008-02-05 Microsoft Corporation Innovations in pure lossless audio compression
US7424434B2 (en) * 2002-09-04 2008-09-09 Microsoft Corporation Unified lossy and lossless audio compression
US7536305B2 (en) 2002-09-04 2009-05-19 Microsoft Corporation Mixed lossless audio compression
US7299190B2 (en) * 2002-09-04 2007-11-20 Microsoft Corporation Quantization and inverse quantization for audio
JP4676140B2 (en) * 2002-09-04 2011-04-27 マイクロソフト コーポレーション Audio quantization and inverse quantization
US7502743B2 (en) * 2002-09-04 2009-03-10 Microsoft Corporation Multi-channel audio encoding and decoding with multi-channel transform selection
TW573293B (en) * 2002-09-13 2004-01-21 Univ Nat Central Nonlinear operation method suitable for audio encoding/decoding and an applied hardware thereof
US6831868B2 (en) * 2002-12-05 2004-12-14 Intel Corporation Byte aligned redundancy for memory array
DE10306022B3 (en) * 2003-02-13 2004-02-19 Siemens Ag Speech recognition method for telephone, personal digital assistant, notepad computer or automobile navigation system uses 3-stage individual word identification
US7471726B2 (en) * 2003-07-15 2008-12-30 Microsoft Corporation Spatial-domain lapped transform in digital media compression
US7609763B2 (en) * 2003-07-18 2009-10-27 Microsoft Corporation Advanced bi-directional predictive coding of video frames
US7738554B2 (en) 2003-07-18 2010-06-15 Microsoft Corporation DC coefficient signaling at small quantization step sizes
US7383180B2 (en) * 2003-07-18 2008-06-03 Microsoft Corporation Constant bitrate media encoding techniques
US8218624B2 (en) * 2003-07-18 2012-07-10 Microsoft Corporation Fractional quantization step sizes for high bit rates
US7602851B2 (en) * 2003-07-18 2009-10-13 Microsoft Corporation Intelligent differential quantization of video coding
US7580584B2 (en) * 2003-07-18 2009-08-25 Microsoft Corporation Adaptive multiple quantization
US7343291B2 (en) 2003-07-18 2008-03-11 Microsoft Corporation Multi-pass variable bitrate media encoding
US10554985B2 (en) 2003-07-18 2020-02-04 Microsoft Technology Licensing, Llc DC coefficient signaling at small quantization step sizes
US7369709B2 (en) * 2003-09-07 2008-05-06 Microsoft Corporation Conditional lapped transform
US7724827B2 (en) 2003-09-07 2010-05-25 Microsoft Corporation Multi-layer run level encoding and decoding
US7460990B2 (en) * 2004-01-23 2008-12-02 Microsoft Corporation Efficient coding of digital media spectral data using wide-sense perceptual similarity
DE102004007184B3 (en) * 2004-02-13 2005-09-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and apparatus for quantizing an information signal
JP4997098B2 (en) * 2004-03-25 2012-08-08 ディー・ティー・エス,インコーポレーテッド Scalable reversible audio codec and authoring tool
US7272567B2 (en) * 2004-03-25 2007-09-18 Zoran Fejzo Scalable lossless audio codec and authoring tool
US20050232497A1 (en) * 2004-04-15 2005-10-20 Microsoft Corporation High-fidelity transcoding
US7487193B2 (en) * 2004-05-14 2009-02-03 Microsoft Corporation Fast video codec transform implementations
US7801383B2 (en) * 2004-05-15 2010-09-21 Microsoft Corporation Embedded scalar quantizers with arbitrary dead-zone ratios
US7930184B2 (en) 2004-08-04 2011-04-19 Dts, Inc. Multi-channel audio coding/decoding of random access points and transients
US7428342B2 (en) * 2004-12-17 2008-09-23 Microsoft Corporation Reversible overlap operator for efficient lossless data compression
US7471850B2 (en) * 2004-12-17 2008-12-30 Microsoft Corporation Reversible transform for lossy and lossless 2-D data compression
US7305139B2 (en) * 2004-12-17 2007-12-04 Microsoft Corporation Reversible 2-dimensional pre-/post-filtering for lapped biorthogonal transform
US20070160154A1 (en) * 2005-03-28 2007-07-12 Sukkar Rafid A Method and apparatus for injecting comfort noise in a communications signal
US20060217988A1 (en) * 2005-03-28 2006-09-28 Tellabs Operations, Inc. Method and apparatus for adaptive level control
US20060217970A1 (en) * 2005-03-28 2006-09-28 Tellabs Operations, Inc. Method and apparatus for noise reduction
US20060217972A1 (en) * 2005-03-28 2006-09-28 Tellabs Operations, Inc. Method and apparatus for modifying an encoded signal
US20060217983A1 (en) * 2005-03-28 2006-09-28 Tellabs Operations, Inc. Method and apparatus for injecting comfort noise in a communications system
US20060215683A1 (en) * 2005-03-28 2006-09-28 Tellabs Operations, Inc. Method and apparatus for voice quality enhancement
US8086451B2 (en) * 2005-04-20 2011-12-27 Qnx Software Systems Co. System for improving speech intelligibility through high frequency compression
US8422546B2 (en) 2005-05-25 2013-04-16 Microsoft Corporation Adaptive video encoding using a perceptual model
US7539612B2 (en) * 2005-07-15 2009-05-26 Microsoft Corporation Coding and decoding scale factor information
US8036274B2 (en) * 2005-08-12 2011-10-11 Microsoft Corporation SIMD lapped transform-based digital media encoding/decoding
US7689052B2 (en) * 2005-10-07 2010-03-30 Microsoft Corporation Multimedia signal processing using fixed-point approximations of linear transforms
ES2296489B1 (en) * 2005-12-02 2009-04-01 Cesar Alonso Abad SCALABLE METHOD OF AUDIO AND IMAGE COMPRESSION.
TWI311856B (en) * 2006-01-04 2009-07-01 Quanta Comp Inc Synthesis subband filtering method and apparatus
US7831434B2 (en) * 2006-01-20 2010-11-09 Microsoft Corporation Complex-transform channel coding with extended-band frequency coding
US7974340B2 (en) * 2006-04-07 2011-07-05 Microsoft Corporation Adaptive B-picture quantization control
US8503536B2 (en) * 2006-04-07 2013-08-06 Microsoft Corporation Quantization adjustments for DC shift artifacts
US8059721B2 (en) 2006-04-07 2011-11-15 Microsoft Corporation Estimating sample-domain distortion in the transform domain with rounding compensation
US7995649B2 (en) 2006-04-07 2011-08-09 Microsoft Corporation Quantization adjustment based on texture level
US8130828B2 (en) 2006-04-07 2012-03-06 Microsoft Corporation Adjusting quantization to preserve non-zero AC coefficients
CN101473347B (en) * 2006-04-21 2012-05-30 皇家飞利浦电子股份有限公司 Picture enhancing increasing precision smooth profiles
TWI316189B (en) * 2006-05-01 2009-10-21 Silicon Motion Inc Block-based method for processing wma stream
US8711925B2 (en) * 2006-05-05 2014-04-29 Microsoft Corporation Flexible quantization
KR101412255B1 (en) * 2006-12-13 2014-08-14 파나소닉 인텔렉츄얼 프로퍼티 코포레이션 오브 아메리카 Encoding device, decoding device, and method therof
JPWO2008072733A1 (en) * 2006-12-15 2010-04-02 パナソニック株式会社 Encoding apparatus and encoding method
US8238424B2 (en) 2007-02-09 2012-08-07 Microsoft Corporation Complexity-based adaptive preprocessing for multiple-pass video compression
US8942289B2 (en) * 2007-02-21 2015-01-27 Microsoft Corporation Computational complexity and precision control in transform-based digital media codec
US8498335B2 (en) * 2007-03-26 2013-07-30 Microsoft Corporation Adaptive deadzone size adjustment in quantization
US8243797B2 (en) * 2007-03-30 2012-08-14 Microsoft Corporation Regions of interest for quality adjustments
US8442337B2 (en) * 2007-04-18 2013-05-14 Microsoft Corporation Encoding adjustments for animation content
US8331438B2 (en) 2007-06-05 2012-12-11 Microsoft Corporation Adaptive selection of picture-level quantization parameters for predicted video pictures
US7885819B2 (en) 2007-06-29 2011-02-08 Microsoft Corporation Bitstream syntax for multi-process audio decoding
US8254455B2 (en) * 2007-06-30 2012-08-28 Microsoft Corporation Computing collocated macroblock information for direct mode macroblocks
US8457958B2 (en) 2007-11-09 2013-06-04 Microsoft Corporation Audio transcoder using encoder-generated side information to transcode to target bit-rate
US8239210B2 (en) 2007-12-19 2012-08-07 Dts, Inc. Lossless multi-channel audio codec
KR101441897B1 (en) * 2008-01-31 2014-09-23 삼성전자주식회사 Method and apparatus for encoding residual signals and method and apparatus for decoding residual signals
US8386271B2 (en) * 2008-03-25 2013-02-26 Microsoft Corporation Lossless and near lossless scalable audio codec
US8189933B2 (en) * 2008-03-31 2012-05-29 Microsoft Corporation Classifying and controlling encoding quality for textured, dark smooth and smooth video content
US8164862B2 (en) * 2008-04-02 2012-04-24 Headway Technologies, Inc. Seed layer for TMR or CPP-GMR sensor
US8325800B2 (en) 2008-05-07 2012-12-04 Microsoft Corporation Encoding streaming media as a high bit rate layer, a low bit rate layer, and one or more intermediate bit rate layers
US8379851B2 (en) 2008-05-12 2013-02-19 Microsoft Corporation Optimized client side rate control and indexed file layout for streaming media
US8369638B2 (en) 2008-05-27 2013-02-05 Microsoft Corporation Reducing DC leakage in HD photo transform
US7860996B2 (en) 2008-05-30 2010-12-28 Microsoft Corporation Media streaming with seamless ad insertion
US8447591B2 (en) * 2008-05-30 2013-05-21 Microsoft Corporation Factorization of overlapping tranforms into two block transforms
US8897359B2 (en) 2008-06-03 2014-11-25 Microsoft Corporation Adaptive quantization for enhancement layer video coding
US8265140B2 (en) * 2008-09-30 2012-09-11 Microsoft Corporation Fine-grained client-side control of scalable media delivery
US8275209B2 (en) * 2008-10-10 2012-09-25 Microsoft Corporation Reduced DC gain mismatch and DC leakage in overlap transform processing
CN102272833B (en) * 2008-12-30 2013-10-30 阿塞里克股份有限公司 Audio equipment and signal processing method thereof
KR101622950B1 (en) * 2009-01-28 2016-05-23 삼성전자주식회사 Method of coding/decoding audio signal and apparatus for enabling the method
US8311115B2 (en) * 2009-01-29 2012-11-13 Microsoft Corporation Video encoding using previously calculated motion information
US8396114B2 (en) * 2009-01-29 2013-03-12 Microsoft Corporation Multiple bit rate video encoding using variable bit rate and dynamic resolution for adaptive video streaming
US8189666B2 (en) 2009-02-02 2012-05-29 Microsoft Corporation Local picture identifier and computation of co-located information
US8270473B2 (en) * 2009-06-12 2012-09-18 Microsoft Corporation Motion based dynamic resolution multiple bit rate video encoding
EP2517201B1 (en) * 2009-12-23 2015-11-04 Nokia Technologies Oy Sparse audio processing
US8705616B2 (en) 2010-06-11 2014-04-22 Microsoft Corporation Parallel multiple bitrate video encoding to reduce latency and dependences between groups of pictures
US9591318B2 (en) 2011-09-16 2017-03-07 Microsoft Technology Licensing, Llc Multi-layer encoding and decoding
US11089343B2 (en) 2012-01-11 2021-08-10 Microsoft Technology Licensing, Llc Capability advertisement, configuration and control for video coding and decoding
US9454972B2 (en) * 2012-02-10 2016-09-27 Panasonic Intellectual Property Corporation Of America Audio and speech coding device, audio and speech decoding device, method for coding audio and speech, and method for decoding audio and speech
KR101821532B1 (en) * 2012-07-12 2018-03-08 노키아 테크놀로지스 오와이 Vector quantization
JP6065452B2 (en) * 2012-08-14 2017-01-25 富士通株式会社 Data embedding device and method, data extraction device and method, and program
KR102204136B1 (en) 2012-08-22 2021-01-18 한국전자통신연구원 Apparatus and method for encoding audio signal, apparatus and method for decoding audio signal
JP6146069B2 (en) 2013-03-18 2017-06-14 富士通株式会社 Data embedding device and method, data extraction device and method, and program
CN105247614B (en) * 2013-04-05 2019-04-05 杜比国际公司 Audio coder and decoder
EP3217398B1 (en) * 2013-04-05 2019-08-14 Dolby International AB Advanced quantizer
US20140355769A1 (en) 2013-05-29 2014-12-04 Qualcomm Incorporated Energy preservation for decomposed representations of a sound field
US9922656B2 (en) 2014-01-30 2018-03-20 Qualcomm Incorporated Transitioning of ambient higher-order ambisonic coefficients
US9502045B2 (en) 2014-01-30 2016-11-22 Qualcomm Incorporated Coding independent frames of ambient higher-order ambisonic coefficients
KR102244612B1 (en) 2014-04-21 2021-04-26 삼성전자주식회사 Appratus and method for transmitting and receiving voice data in wireless communication system
US9852737B2 (en) 2014-05-16 2017-12-26 Qualcomm Incorporated Coding vectors decomposed from higher-order ambisonics audio signals
US9620137B2 (en) * 2014-05-16 2017-04-11 Qualcomm Incorporated Determining between scalar and vector quantization in higher order ambisonic coefficients
US10770087B2 (en) 2014-05-16 2020-09-08 Qualcomm Incorporated Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals
US9747910B2 (en) 2014-09-26 2017-08-29 Qualcomm Incorporated Switching between predictive and non-predictive quantization techniques in a higher order ambisonics (HOA) framework
EP3405950B1 (en) * 2016-01-22 2022-09-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Stereo audio coding with ild-based normalisation prior to mid/side decision
WO2018009226A1 (en) * 2016-07-08 2018-01-11 Hewlett-Packard Development Company, L.P. Color look up table compression
CN110998724B (en) 2017-08-01 2021-05-21 杜比实验室特许公司 Audio object classification based on location metadata
US11277455B2 (en) 2018-06-07 2022-03-15 Mellanox Technologies, Ltd. Streaming system
US11625393B2 (en) * 2019-02-19 2023-04-11 Mellanox Technologies, Ltd. High performance computing system
EP3699770A1 (en) 2019-02-25 2020-08-26 Mellanox Technologies TLV Ltd. Collective communication system and methods
US11750699B2 (en) 2020-01-15 2023-09-05 Mellanox Technologies, Ltd. Small message aggregation
US11252027B2 (en) 2020-01-23 2022-02-15 Mellanox Technologies, Ltd. Network element supporting flexible data reduction operations
US11533033B2 (en) * 2020-06-12 2022-12-20 Bose Corporation Audio signal amplifier gain control
US11876885B2 (en) 2020-07-02 2024-01-16 Mellanox Technologies, Ltd. Clock queue with arming and/or self-arming features
US11556378B2 (en) 2020-12-14 2023-01-17 Mellanox Technologies, Ltd. Offloading execution of a multi-task parameter-dependent operation to a network device
CN112737711B (en) * 2020-12-24 2023-04-18 成都戎星科技有限公司 Broadband carrier detection method based on adaptive noise floor estimation
CN113948085B (en) * 2021-12-22 2022-03-25 中国科学院自动化研究所 Speech recognition method, system, electronic device and storage medium
US11922237B1 (en) 2022-09-12 2024-03-05 Mellanox Technologies, Ltd. Single-step collective operations
CN116403599B (en) * 2023-06-07 2023-08-15 中国海洋大学 Efficient voice separation method and model building method thereof
CN117877504B (en) * 2024-03-11 2024-05-24 中国海洋大学 Combined voice enhancement method and model building method thereof

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5388181A (en) * 1990-05-29 1995-02-07 Anderson; David J. Digital audio compression system
US5428395A (en) * 1992-06-18 1995-06-27 Samsung Electronics Co., Ltd. Encoding and decoding method and apparatus thereof using a variable picture partitioning technique
US5787204A (en) * 1991-01-10 1998-07-28 Olympus Optical Co., Ltd. Image signal decoding device capable of removing block distortion with simple structure
US5911130A (en) * 1995-05-30 1999-06-08 Victor Company Of Japan, Ltd. Audio signal compression and decompression utilizing amplitude, frequency, and time information
US5987407A (en) * 1997-10-28 1999-11-16 America Online, Inc. Soft-clipping postprocessor scaling decoded audio signal frame saturation regions to approximate original waveform shape and maintain continuity
US6256422B1 (en) * 1998-11-04 2001-07-03 International Business Machines Corporation Transform-domain correction of real-domain errors
US6263312B1 (en) * 1997-10-03 2001-07-17 Alaris, Inc. Audio compression and decompression employing subband decomposition of residual signal and distortion reduction
US6370502B1 (en) * 1999-05-27 2002-04-09 America Online, Inc. Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US6475245B2 (en) * 1997-08-29 2002-11-05 The Regents Of The University Of California Method and apparatus for hybrid coding of speech at 4KBPS having phase alignment between mode-switched frames

Family Cites Families (81)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL161617C (en) * 1968-06-17 1980-02-15 Nippon Electric Co SEMICONDUCTOR WITH FLAT SURFACE AND METHOD FOR MANUFACTURING THE SAME
JPS5124341B2 (en) * 1971-12-24 1976-07-23
US3775262A (en) * 1972-02-09 1973-11-27 Ncr Method of making insulated gate field effect transistor
JPS4995591A (en) * 1973-01-12 1974-09-10
US4040073A (en) * 1975-08-29 1977-08-02 Westinghouse Electric Corporation Thin film transistor and display panel using the transistor
US4236167A (en) * 1978-02-06 1980-11-25 Rca Corporation Stepped oxide, high voltage MOS transistor with near intrinsic channel regions of different doping levels
US4232327A (en) * 1978-11-13 1980-11-04 Rca Corporation Extended drain self-aligned silicon gate MOSFET
US4336550A (en) * 1980-03-20 1982-06-22 Rca Corporation CMOS Device with silicided sources and drains and method
EP0058548B1 (en) * 1981-02-16 1986-08-06 Fujitsu Limited Method of producing mosfet type semiconductor device
JPS5823479A (en) * 1981-08-05 1983-02-12 Fujitsu Ltd Manufacture of semiconductor device
JPS59188974A (en) * 1983-04-11 1984-10-26 Nec Corp Manufacture of semiconductor device
US4503601A (en) * 1983-04-18 1985-03-12 Ncr Corporation Oxide trench structure for polysilicon gates and interconnects
JPH0693509B2 (en) * 1983-08-26 1994-11-16 シャープ株式会社 Thin film transistor
US4727044A (en) * 1984-05-18 1988-02-23 Semiconductor Energy Laboratory Co., Ltd. Method of making a thin film transistor with laser recrystallized source and drain
DE3530065C2 (en) * 1984-08-22 1999-11-18 Mitsubishi Electric Corp Process for the production of a semiconductor
EP0222215B1 (en) * 1985-10-23 1991-10-16 Hitachi, Ltd. Polysilicon mos transistor and method of manufacturing the same
US4701423A (en) * 1985-12-20 1987-10-20 Ncr Corporation Totally self-aligned CMOS process
US4755865A (en) * 1986-01-21 1988-07-05 Motorola Inc. Means for stabilizing polycrystalline semiconductor layers
US4690730A (en) * 1986-03-07 1987-09-01 Texas Instruments Incorporated Oxide-capped titanium silicide formation
JPS62229873A (en) * 1986-03-29 1987-10-08 Hitachi Ltd Manufacture of thin film semiconductor device
JPH0777264B2 (en) * 1986-04-02 1995-08-16 三菱電機株式会社 Method of manufacturing thin film transistor
US4728617A (en) * 1986-11-04 1988-03-01 Intel Corporation Method of fabricating a MOSFET with graded source and drain regions
US4753896A (en) * 1986-11-21 1988-06-28 Texas Instruments Incorporated Sidewall channel stop process
JPH0687503B2 (en) * 1987-03-11 1994-11-02 株式会社日立製作所 Thin film semiconductor device
US5024960A (en) * 1987-06-16 1991-06-18 Texas Instruments Incorporated Dual LDD submicron CMOS process for making low and high voltage transistors with common gate
US5258319A (en) * 1988-02-19 1993-11-02 Mitsubishi Denki Kabushiki Kaisha Method of manufacturing a MOS type field effect transistor using an oblique ion implantation step
US5238859A (en) * 1988-04-26 1993-08-24 Kabushiki Kaisha Toshiba Method of manufacturing semiconductor device
JP2653099B2 (en) * 1988-05-17 1997-09-10 セイコーエプソン株式会社 Active matrix panel, projection display and viewfinder
JPH01291467A (en) * 1988-05-19 1989-11-24 Toshiba Corp Thin film transistor
JP2752991B2 (en) * 1988-07-14 1998-05-18 株式会社東芝 Semiconductor device
US5146291A (en) * 1988-08-31 1992-09-08 Mitsubishi Denki Kabushiki Kaisha MIS device having lightly doped drain structure
US4971837A (en) * 1989-04-03 1990-11-20 Ppg Industries, Inc. Chip resistant coatings and methods of application
JPH0787189B2 (en) * 1990-01-19 1995-09-20 松下電器産業株式会社 Method for manufacturing semiconductor device
KR950000141B1 (en) * 1990-04-03 1995-01-10 미쓰비시 뎅끼 가부시끼가이샤 Semiconductor device & manufacturing method thereof
DE69127395T2 (en) * 1990-05-11 1998-01-02 Asahi Glass Co Ltd Method of manufacturing a thin film transistor with polycrystalline semiconductor
US5126283A (en) * 1990-05-21 1992-06-30 Motorola, Inc. Process for the selective encapsulation of an electrically conductive structure in a semiconductor device
US5227321A (en) * 1990-07-05 1993-07-13 Micron Technology, Inc. Method for forming MOS transistors
JP3163092B2 (en) * 1990-08-09 2001-05-08 株式会社東芝 Method for manufacturing semiconductor device
JP2940880B2 (en) * 1990-10-09 1999-08-25 三菱電機株式会社 Semiconductor device and manufacturing method thereof
US5514879A (en) * 1990-11-20 1996-05-07 Semiconductor Energy Laboratory Co., Ltd. Gate insulated field effect transistors and method of manufacturing the same
JP2999271B2 (en) * 1990-12-10 2000-01-17 株式会社半導体エネルギー研究所 Display device
US5097301A (en) * 1990-12-19 1992-03-17 Intel Corporation Composite inverse T-gate metal oxide semiconductor device and method of fabrication
DE69125260T2 (en) * 1990-12-28 1997-10-02 Sharp Kk A method of manufacturing a thin film transistor and an active matrix substrate for liquid crystal display devices
US5521107A (en) * 1991-02-16 1996-05-28 Semiconductor Energy Laboratory Co., Ltd. Method for forming a field-effect transistor including anodic oxidation of the gate
EP0499979A3 (en) * 1991-02-16 1993-06-09 Semiconductor Energy Laboratory Co., Ltd. Electro-optical device
USRE36314E (en) * 1991-03-06 1999-09-28 Semiconductor Energy Laboratory Co., Ltd. Insulated gate field effect semiconductor devices having a LDD region and an anodic oxide film of a gate electrode
KR960001611B1 (en) * 1991-03-06 1996-02-02 가부시끼가이샤 한도다이 에네르기 겐뀨쇼 Insulated gate type fet and its making method
JP2794678B2 (en) * 1991-08-26 1998-09-10 株式会社 半導体エネルギー研究所 Insulated gate semiconductor device and method of manufacturing the same
JP2794499B2 (en) * 1991-03-26 1998-09-03 株式会社半導体エネルギー研究所 Method for manufacturing semiconductor device
JP3277548B2 (en) * 1991-05-08 2002-04-22 セイコーエプソン株式会社 Display board
JP2717237B2 (en) * 1991-05-16 1998-02-18 株式会社 半導体エネルギー研究所 Insulated gate semiconductor device and method of manufacturing the same
US5151374A (en) * 1991-07-24 1992-09-29 Industrial Technology Research Institute Method of forming a thin film field effect transistor having a drain channel junction that is spaced from the gate electrode
JP2845303B2 (en) * 1991-08-23 1999-01-13 株式会社 半導体エネルギー研究所 Semiconductor device and manufacturing method thereof
US5545571A (en) * 1991-08-26 1996-08-13 Semiconductor Energy Laboratory Co., Ltd. Method of making TFT with anodic oxidation process using positive and negative voltages
US5650338A (en) * 1991-08-26 1997-07-22 Semiconductor Energy Laboratory Co., Ltd. Method for forming thin film transistor
US5495121A (en) * 1991-09-30 1996-02-27 Semiconductor Energy Laboratory Co., Ltd. Semiconductor device
JP2650543B2 (en) * 1991-11-25 1997-09-03 カシオ計算機株式会社 Matrix circuit drive
JP2564725B2 (en) * 1991-12-24 1996-12-18 株式会社半導体エネルギー研究所 Method of manufacturing MOS transistor
JP3313432B2 (en) * 1991-12-27 2002-08-12 株式会社東芝 Semiconductor device and manufacturing method thereof
US5485019A (en) * 1992-02-05 1996-01-16 Semiconductor Energy Laboratory Co., Ltd. Semiconductor device and method for forming the same
US5241139A (en) * 1992-03-25 1993-08-31 International Business Machines Corporation Method and apparatus for determining the position of a member contacting a touch screen
EP0589478B1 (en) * 1992-09-25 1999-11-17 Sony Corporation Liquid crystal display device
TW232751B (en) * 1992-10-09 1994-10-21 Semiconductor Energy Res Co Ltd Semiconductor device and method for forming the same
US5403762A (en) * 1993-06-30 1995-04-04 Semiconductor Energy Laboratory Co., Ltd. Method of fabricating a TFT
JP3587537B2 (en) * 1992-12-09 2004-11-10 株式会社半導体エネルギー研究所 Semiconductor device
JP3437863B2 (en) * 1993-01-18 2003-08-18 株式会社半導体エネルギー研究所 Method for manufacturing MIS type semiconductor device
US5747355A (en) * 1993-03-30 1998-05-05 Semiconductor Energy Laboratory Co., Ltd. Method for producing a transistor using anodic oxidation
US5572040A (en) * 1993-07-12 1996-11-05 Peregrine Semiconductor Corporation High-frequency wireless communication system on a single ultrathin silicon on sapphire chip
US5492843A (en) * 1993-07-31 1996-02-20 Semiconductor Energy Laboratory Co., Ltd. Method of fabricating semiconductor device and method of processing substrate
TW297142B (en) * 1993-09-20 1997-02-01 Handotai Energy Kenkyusho Kk
JP3030368B2 (en) * 1993-10-01 2000-04-10 株式会社半導体エネルギー研究所 Semiconductor device and manufacturing method thereof
US6777763B1 (en) * 1993-10-01 2004-08-17 Semiconductor Energy Laboratory Co., Ltd. Semiconductor device and method for fabricating the same
US5719065A (en) * 1993-10-01 1998-02-17 Semiconductor Energy Laboratory Co., Ltd. Method for manufacturing semiconductor device with removable spacers
JPH07135323A (en) * 1993-10-20 1995-05-23 Semiconductor Energy Lab Co Ltd Thin film semiconductor integrated circuit and its fabrication
KR970010685B1 (en) * 1993-10-30 1997-06-30 삼성전자 주식회사 Thin film transistor semiconductor device & manufacturing method
TW299897U (en) * 1993-11-05 1997-03-01 Semiconductor Energy Lab A semiconductor integrated circuit
US5576231A (en) * 1993-11-05 1996-11-19 Semiconductor Energy Laboratory Co., Ltd. Process for fabricating an insulated gate field effect transistor with an anodic oxidized gate electrode
JP2873660B2 (en) * 1994-01-08 1999-03-24 株式会社半導体エネルギー研究所 Manufacturing method of semiconductor integrated circuit
JP3330736B2 (en) * 1994-07-14 2002-09-30 株式会社半導体エネルギー研究所 Method for manufacturing semiconductor device
US5789762A (en) * 1994-09-14 1998-08-04 Semiconductor Energy Laboratory Co., Ltd. Semiconductor active matrix circuit
JP3246715B2 (en) 1996-07-01 2002-01-15 松下電器産業株式会社 Audio signal compression method and audio signal compression device

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5388181A (en) * 1990-05-29 1995-02-07 Anderson; David J. Digital audio compression system
US5787204A (en) * 1991-01-10 1998-07-28 Olympus Optical Co., Ltd. Image signal decoding device capable of removing block distortion with simple structure
US5428395A (en) * 1992-06-18 1995-06-27 Samsung Electronics Co., Ltd. Encoding and decoding method and apparatus thereof using a variable picture partitioning technique
US5911130A (en) * 1995-05-30 1999-06-08 Victor Company Of Japan, Ltd. Audio signal compression and decompression utilizing amplitude, frequency, and time information
US6475245B2 (en) * 1997-08-29 2002-11-05 The Regents Of The University Of California Method and apparatus for hybrid coding of speech at 4KBPS having phase alignment between mode-switched frames
US6263312B1 (en) * 1997-10-03 2001-07-17 Alaris, Inc. Audio compression and decompression employing subband decomposition of residual signal and distortion reduction
US5987407A (en) * 1997-10-28 1999-11-16 America Online, Inc. Soft-clipping postprocessor scaling decoded audio signal frame saturation regions to approximate original waveform shape and maintain continuity
US6006179A (en) * 1997-10-28 1999-12-21 America Online, Inc. Audio codec using adaptive sparse vector quantization with subband vector classification
US6256422B1 (en) * 1998-11-04 2001-07-03 International Business Machines Corporation Transform-domain correction of real-domain errors
US6370502B1 (en) * 1999-05-27 2002-04-09 America Online, Inc. Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US6704706B2 (en) * 1999-05-27 2004-03-09 America Online, Inc. Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7020615B2 (en) * 2000-11-03 2006-03-28 Koninklijke Philips Electronics N.V. Method and apparatus for audio coding using transient relocation
US20020120445A1 (en) * 2000-11-03 2002-08-29 Renat Vafin Coding signals
US7363230B2 (en) * 2002-08-01 2008-04-22 Yamaha Corporation Audio data processing apparatus and audio data distributing apparatus
US20040024592A1 (en) * 2002-08-01 2004-02-05 Yamaha Corporation Audio data processing apparatus and audio data distributing apparatus
US8014609B2 (en) 2002-08-23 2011-09-06 Quonsil Pl. 3, Llc Digital representation of audio waveforms using peak shifting to provide increased dynamic range
US7356186B2 (en) * 2002-08-23 2008-04-08 Kulas Charles J Digital representation of audio waveforms using peak shifting to provide increased dynamic range
US20080140402A1 (en) * 2002-08-23 2008-06-12 Kulas Charles J Digital representation of audio waveforms using peak shifting to provide increased dynamic range
US20040039565A1 (en) * 2002-08-23 2004-02-26 Kulas Charles J. Digital representation of audio waveforms using peak shifting to provide increased dynamic range
US20090024395A1 (en) * 2004-01-19 2009-01-22 Matsushita Electric Industrial Co., Ltd. Audio signal encoding method, audio signal decoding method, transmitter, receiver, and wireless microphone system
US20050185732A1 (en) * 2004-02-25 2005-08-25 Nokia Corporation Multiscale wireless communication
US7680208B2 (en) * 2004-02-25 2010-03-16 Nokia Corporation Multiscale wireless communication
US20060020453A1 (en) * 2004-05-13 2006-01-26 Samsung Electronics Co., Ltd. Speech signal compression and/or decompression method, medium, and apparatus
US8019600B2 (en) * 2004-05-13 2011-09-13 Samsung Electronics Co., Ltd. Speech signal compression and/or decompression method, medium, and apparatus
US7546240B2 (en) * 2005-07-15 2009-06-09 Microsoft Corporation Coding with improved time resolution for selected segments via adaptive block transformation of a group of samples from a subband decomposition
US20070016405A1 (en) * 2005-07-15 2007-01-18 Microsoft Corporation Coding with improved time resolution for selected segments via adaptive block transformation of a group of samples from a subband decomposition
US20070094015A1 (en) * 2005-09-22 2007-04-26 Georges Samake Audio codec using the Fast Fourier Transform, the partial overlap and a decomposition in two plans based on the energy.
US8719040B2 (en) * 2006-10-02 2014-05-06 Sony Corporation Signal processing apparatus, signal processing method, and computer program
US20080086313A1 (en) * 2006-10-02 2008-04-10 Sony Corporation Signal processing apparatus, signal processing method, and computer program
US7761290B2 (en) 2007-06-15 2010-07-20 Microsoft Corporation Flexible frequency and time partitioning in perceptual transform coding of audio
US20100281017A1 (en) * 2009-04-29 2010-11-04 Oracle International Corp Partition pruning via query rewrite
US8533181B2 (en) * 2009-04-29 2013-09-10 Oracle International Corporation Partition pruning via query rewrite
US20110110421A1 (en) * 2009-11-10 2011-05-12 Electronics And Telecommunications Research Institute Rate control method for video encoder using kalman filter and fir filter
US8451891B2 (en) * 2009-11-10 2013-05-28 Electronics And Telecommunications Research Institute Rate control method for video encoder using Kalman filter and FIR filter

Also Published As

Publication number Publication date
US8712785B2 (en) 2014-04-29
ATE425531T1 (en) 2009-03-15
US6885993B2 (en) 2005-04-26
ATE278236T1 (en) 2004-10-15
DE60041790D1 (en) 2009-04-23
US20090063164A1 (en) 2009-03-05
US20020111801A1 (en) 2002-08-15
US6704706B2 (en) 2004-03-09
US8010371B2 (en) 2011-08-30
DE60014363D1 (en) 2004-11-04
EP1480201B1 (en) 2009-03-11
CA2373520A1 (en) 2000-12-07
EP1480201A2 (en) 2004-11-24
WO2000074038A1 (en) 2000-12-07
US20050159940A1 (en) 2005-07-21
EP1181686B1 (en) 2004-09-29
US7181403B2 (en) 2007-02-20
EP1181686A1 (en) 2002-02-27
US20110282677A1 (en) 2011-11-17
US20130173272A1 (en) 2013-07-04
CA2373520C (en) 2006-01-24
US8285558B2 (en) 2012-10-09
US6370502B1 (en) 2002-04-09
US7418395B2 (en) 2008-08-26
EP1480201A3 (en) 2005-01-19
US20070083364A1 (en) 2007-04-12
US20130173271A1 (en) 2013-07-04
DE60014363T2 (en) 2005-10-13

Similar Documents

Publication Publication Date Title
US6885993B2 (en) Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US6006179A (en) Audio codec using adaptive sparse vector quantization with subband vector classification
US8924201B2 (en) Audio encoder and decoder
TWI441170B (en) Audio signal decoder, audio signal encoder, method for decoding an audio signal, method for encoding an audio signal and computer program using a pitch-dependent adaptation of a coding context
CA2853987A1 (en) Scalable compressed audio bit stream and codec using a hierarchical filterbank and multichannel joint coding
RU2809981C1 (en) Audio decoder, audio encoder and related methods using united coding of scaling parameters for multi-channel audio signal channels
RU2807462C1 (en) Audio data quantization device, audio data dequantation device and related methods

Legal Events

Date Code Title Description
STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: BANK OF AMERICAN, N.A. AS COLLATERAL AGENT,TEXAS

Free format text: SECURITY AGREEMENT;ASSIGNORS:AOL INC.;AOL ADVERTISING INC.;BEBO, INC.;AND OTHERS;REEL/FRAME:023649/0061

Effective date: 20091209

Owner name: BANK OF AMERICAN, N.A. AS COLLATERAL AGENT, TEXAS

Free format text: SECURITY AGREEMENT;ASSIGNORS:AOL INC.;AOL ADVERTISING INC.;BEBO, INC.;AND OTHERS;REEL/FRAME:023649/0061

Effective date: 20091209

AS Assignment

Owner name: AMERICA ONLINE, INC., VIRGINIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:07039-104IL1, .;MANTEGNA, JOHN;PERLMUTTER, KEREN O.;REEL/FRAME:023713/0240

Effective date: 19990719

AS Assignment

Owner name: AOL LLC,VIRGINIA

Free format text: CHANGE OF NAME;ASSIGNOR:AMERICA ONLINE, INC.;REEL/FRAME:023723/0585

Effective date: 20060403

Owner name: AOL INC.,VIRGINIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AOL LLC;REEL/FRAME:023723/0645

Effective date: 20091204

Owner name: AOL LLC, VIRGINIA

Free format text: CHANGE OF NAME;ASSIGNOR:AMERICA ONLINE, INC.;REEL/FRAME:023723/0585

Effective date: 20060403

Owner name: AOL INC., VIRGINIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AOL LLC;REEL/FRAME:023723/0645

Effective date: 20091204

AS Assignment

Owner name: LIGHTNINGCAST LLC, NEW YORK

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:BANK OF AMERICA, N A;REEL/FRAME:025323/0416

Effective date: 20100930

Owner name: AOL ADVERTISING INC, NEW YORK

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:BANK OF AMERICA, N A;REEL/FRAME:025323/0416

Effective date: 20100930

Owner name: SPHERE SOURCE, INC, VIRGINIA

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:BANK OF AMERICA, N A;REEL/FRAME:025323/0416

Effective date: 20100930

Owner name: TACODA LLC, NEW YORK

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:BANK OF AMERICA, N A;REEL/FRAME:025323/0416

Effective date: 20100930

Owner name: TRUVEO, INC, CALIFORNIA

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:BANK OF AMERICA, N A;REEL/FRAME:025323/0416

Effective date: 20100930

Owner name: MAPQUEST, INC, COLORADO

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:BANK OF AMERICA, N A;REEL/FRAME:025323/0416

Effective date: 20100930

Owner name: NETSCAPE COMMUNICATIONS CORPORATION, VIRGINIA

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:BANK OF AMERICA, N A;REEL/FRAME:025323/0416

Effective date: 20100930

Owner name: QUIGO TECHNOLOGIES LLC, NEW YORK

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:BANK OF AMERICA, N A;REEL/FRAME:025323/0416

Effective date: 20100930

Owner name: GOING INC, MASSACHUSETTS

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:BANK OF AMERICA, N A;REEL/FRAME:025323/0416

Effective date: 20100930

Owner name: AOL INC, VIRGINIA

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:BANK OF AMERICA, N A;REEL/FRAME:025323/0416

Effective date: 20100930

Owner name: YEDDA, INC, VIRGINIA

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:BANK OF AMERICA, N A;REEL/FRAME:025323/0416

Effective date: 20100930

AS Assignment

Owner name: AMERICA ONLINE, INC., VIRGINIA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE NAME OF FIRST ASSIGNOR PREVIOUSLY RECORDED ON REEL 023713 FRAME 0240. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WU, SHUWU;MANTEGNA, JOHN;PERLMUTTER, KEREN;REEL/FRAME:028363/0473

Effective date: 19990719

AS Assignment

Owner name: FACEBOOK, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AOL INC.;REEL/FRAME:028487/0602

Effective date: 20120614

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12

AS Assignment

Owner name: META PLATFORMS, INC., CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:FACEBOOK, INC.;REEL/FRAME:058961/0436

Effective date: 20211028