CA2246532A1 - Perceptual audio coding - Google Patents
Perceptual audio coding
- Publication number
- CA2246532A1
- Authority
- CA
- Canada
- Prior art keywords
- band
- energy
- codebook
- codevector
- codevectors
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L2019/0001—Codebooks
- G10L2019/0013—Codebook search algorithms
Abstract
A method and apparatus for perceptual audio coding. The method and apparatus provide high-quality sound for coding rates down to and below 1 bit/sample for a wide variety of input signals including speech, music and background noise. The invention provides a new distortion measure for coding the input speech and training the codebooks, where the distortion measure is based on a masking spectrum of the input frequency spectrum. The invention also provides a method for direct calculation of masking thresholds from a modified discrete cosine transform of the input signal. The invention also provides a predictive and non-predictive vector quantizer for determining the energy of the coefficients representing the frequency spectrum. As well, the invention provides a split vector quantizer for quantizing the fine structure of coefficients representing the frequency spectrum. Bit allocation for the split vector quantizer is based on the masking threshold. The split vector quantizer also makes use of embedded codebooks.
Furthermore, the invention makes use of a new transient detection method for selection of input windows.
Claims (86)
1. A method of transmitting a discretely represented frequency signal within a frequency band, said signal discretely represented by coefficients at certain frequencies within said band, comprising the steps of:
(a) providing a codebook of codevectors for said band, each codevector having an element for each of said certain frequencies;
(b) obtaining a masking threshold for said frequency signal;
(c) for each one of a plurality of codevectors in said codebook, obtaining a distortion measure by the steps of:
for each of said coefficients of said frequency signal (i) obtaining a representation of a difference between said coefficient and a corresponding element of said one codevector and (ii) reducing said difference by said masking threshold to obtain an indicator measure;
summing those obtained indicator measures which are positive to obtain said distortion measure;
(d) selecting a codevector having a smallest distortion measure;
(e) transmitting an index to said selected codevector.
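The masked distortion search of claim 1 can be sketched in Python as follows. This is a hypothetical illustration, not the patent's reference implementation: the function names are invented, and the masking threshold is assumed to be supplied as one threshold value per coefficient.

```python
def masked_distortion(coeffs, codevector, mask):
    """Sum of per-coefficient squared errors that exceed the masking
    threshold; error terms at or below the mask are treated as inaudible
    and contribute nothing (only positive indicators are summed)."""
    total = 0.0
    for x, c, m in zip(coeffs, codevector, mask):
        indicator = (x - c) ** 2 - m  # squared difference reduced by mask
        if indicator > 0:
            total += indicator
    return total

def select_codevector(coeffs, codebook, mask):
    """Return the index of the codevector with the smallest masked
    distortion, as in steps (c) and (d) of claim 1."""
    return min(range(len(codebook)),
               key=lambda i: masked_distortion(coeffs, codebook[i], mask))
```

A codevector whose errors all fall below the mask yields zero distortion, which is the point of the measure: inaudible differences cost nothing.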
2. The method of claim 1 wherein said codevectors are normalised with respect to energy and wherein step (c)(i) of obtaining a representation of a difference between a given coefficient of said frequency signal and a corresponding element of said one codevector comprises obtaining a squared difference between said given coefficient and said corresponding element after unnormalising said corresponding element with a measure of energy in said signal and including the step of:
(f) transmitting an indication of energy in said signal.
3. The method of claim 2 wherein said step of obtaining a masking threshold comprises convolving a measure of energy in said signal with a known spreading function.
4. The method of claim 3 wherein said step of obtaining a masking threshold further comprises adjusting said convolution by an offset dependent upon a spectral flatness measure comprising an arithmetic mean of said coefficients.
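Claims 3 and 4 obtain the masking threshold by convolving the signal energy with a spreading function and lowering the result by an offset tied to a spectral flatness measure. A minimal sketch under stated assumptions: the spreading function is a short symmetric window, and the offset is supplied directly in dB (the patent's actual spreading function and offset derivation are more elaborate).

```python
def masking_threshold(energy, spreading, offset_db=0.0):
    """Convolve per-line energy with a spreading function, then lower
    the spread energy by a dB offset to obtain the masking threshold."""
    n, k = len(energy), len(spreading)
    half = k // 2
    spread = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(spreading):
            idx = i + j - half
            if 0 <= idx < n:  # clip the convolution at the band edges
                acc += w * energy[idx]
        spread.append(acc)
    scale = 10.0 ** (-offset_db / 10.0)  # dB offset as a linear factor
    return [s * scale for s in spread]
```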
5. A method of transmitting a discretely represented frequency signal, said signal discretely represented by coefficients at certain frequencies, comprising the steps of:
(a) grouping said coefficients into frequency bands;
(b) for each band - providing a codebook of codevectors, each codevector having an element corresponding with each coefficient within said each band;
- obtaining a representation of energy of coefficients in said each band;
- selecting a set of addresses which address at least a portion of said codebook such that a size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy;
- selecting a codevector from said codebook from amongst those addressable by said address set to represent said coefficients for said band and obtaining an index to said selected codevector;
(d) concatenating said selected codevector addresses; and
(e) transmitting said concatenated codevector addresses and an indication of each said representation of energy.
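The key property of claim 5 is that the size of each band's address set depends only on the transmitted energy representations, so the decoder can reproduce the allocation and parse the bitstream without any explicit side information about bit allocation. A simplified sketch of such an energy-proportional allocation (the proportionality rule and the clamping limits here are assumptions for illustration):

```python
def allocate_bits(band_energies_db, total_bits, min_bits=0, max_bits=10):
    """Give each band a share of total_bits proportional to its energy.
    Because the result depends only on the (transmitted) energies, the
    encoder and decoder compute identical allocations."""
    positive = [max(e, 0.0) for e in band_energies_db]
    total = sum(positive) or 1.0
    bits = [int(total_bits * e / total) for e in positive]
    return [min(max(b, min_bits), max_bits) for b in bits]
```

Claim 6 refines this by replacing the raw energy with energy reduced by the masking threshold, so perceptually irrelevant bands receive fewer bits.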
6. The method of claim 5 including the step of obtaining a representation of a masking threshold for each said band from said representation of energy and wherein said step of selecting a set of addresses comprises selecting such that said size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy reduced by a masking threshold indicated by said representation of a masking threshold.
7. The method of claim 6 wherein said representation of a masking threshold is obtained from a convolution of said representation of energy with a pre-defined spreading function.
8. The method of claim 7 wherein said representation of a masking threshold is reduced by an offset dependent upon a spectral flatness measure chosen as a constant.
9. The method of claim 5 wherein any band having an identical number of coefficients as another band shares a codebook with said other band.
10. The method of claim 5 wherein said step of selecting a codevector to represent said coefficients for said each band comprises the steps of:
- for each one codevector of said plurality of codevectors addressed by said address set for each of said coefficients of said each band (i) obtaining a representation of a difference between said coefficient and a corresponding element of said one codevector and (ii) reducing said difference by said masking threshold indicated by said representation of a masking threshold to obtain an indicator measure;
summing those obtained indicator measures which are positive to obtain a distortion measure;
- selecting a codevector having a smallest distortion measure.
11. The method of claim 10 wherein said codevectors are normalised with respect to energy and wherein the step of obtaining a representation of a difference between a given coefficient of said each band and a corresponding element of said one codevector comprises obtaining a squared difference between said given coefficient and said corresponding element after unnormalising said corresponding element with said representation of energy in said signal.
12. The method of claim 5 wherein each said codebook is sorted so as to provide sets of codevectors addressed by corresponding sets of addresses such that each larger set of addresses addresses a larger set of codevectors which span a frequency spectrum of said each band with increasingly less granularity.
13. A method of transmitting a discretely represented time series comprising the steps of:
- obtaining a frame of time samples;
- obtaining a discrete frequency representation of said time series frame, said frequency representation comprising coefficients at certain frequencies;
- grouping said coefficients into frequency bands;
- for each band (i) providing a codebook of codevectors, each codevector having an element corresponding with each coefficient within said each band;
(ii) obtaining a representation of energy of coefficients in said each band;
(iii) selecting a set of addresses which address at least a portion of said codebook such that a size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy;
(iv) selecting a codevector from said codebook from amongst those addressable by said address set to represent said coefficients for said band and obtaining an address to said selected codevector;
- concatenating said selected codevector addresses; and
- transmitting said concatenated codevector addresses and an indication of each said representation of energy.
14. The method of claim 13 wherein said step of obtaining a representation of energy of coefficients in said each band comprises the steps of:
- determining an indication of energy for said band;
- determining an average energy for said band;
- quantising said average energy by finding an entry in an average energy codebook which, when adjusted with a representation of average energy from a frequency representation for a previous frame, best approximates said average energy;
- normalising said energy indication with respect to said quantised approximation of said average energy;
- quantising said normalised energy indication by manipulating a normalised energy indication from a frequency representation for said previous frame with each of a number of prediction matrices and selecting a prediction matrix resulting in a quantised normalised energy indication which best approximates said normalised energy indication;
- obtaining said representation of energy from said quantised normalised energy.
15. The method of claim 13 including the steps of:
- obtaining an index to said entry in said average energy codebook;
- obtaining an index to said selected prediction matrix;
and wherein said step of transmitting said concatenated codevector addresses and an indication of each said representation of energy comprises
- transmitting said average energy codebook index; and
- transmitting said selected prediction matrix index.
16. The method of claim 15 including the steps of:
- obtaining an actual residual from a difference between said quantised normalised energy indication and said normalised energy indication;
- comparing said actual residual to a residual codebook to find a quantised residual which is a best approximation of said actual residual;
- adjusting said quantised normalised energy with said quantised residual;
and wherein said step of obtaining said representation of energy comprises obtaining said representation of energy from a combination of said quantised normalised energy and said quantised residual.
17. The method of claim 16 including the steps of:
- obtaining an actual second residual from a difference between (i) said combination of said quantised normalised energy and said quantised residual and (ii) said normalised energy indication;
- comparing said actual second residual to a second residual codebook to find a quantised second residual which is a best approximation of said actual second residual;
adjusting said combination with said quantised second residual to obtain a further combination;
and wherein said step of obtaining said representation of energy comprises obtaining said representation of energy from said further combination.
18. The method of claim 17 including the step of obtaining an index to said quantised residual in said residual codebook and an index to said quantised second residual in said second residual codebook;
and wherein said step of transmitting said concatenated codevector addresses and an indication of each said representation of energy comprises transmitting said quantised residual index and said quantised second residual index.
19. The method of claim 18 wherein said step of obtaining a representation of energy comprises unnormalising said further combination with said quantised average energy.
20. The method of claim 13 including the step of obtaining a representation of a masking threshold for each said band from said representation of energy and wherein said step of selecting a set of addresses comprises selecting such that said size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy reduced by a masking threshold indicated by said representation of a masking threshold.
21. The method of claim 20 wherein said representation of a masking threshold is obtained from a convolution of said representation of energy with a pre-defined spreading function.
22. The method of claim 21 wherein said representation of a masking threshold is reduced by an offset dependent upon a spectral flatness measure chosen as a constant.
23. The method of claim 13 wherein any band having an identical number of coefficients as another band shares a codebook with said other band.
24. The method of claim 13 wherein said step of selecting a codevector to represent said coefficients for said each band comprises the steps of:
- for each one codevector of said plurality of codevectors addressed by said address set for each of said coefficients of said each band (i) obtaining a representation of a difference between said coefficient and a corresponding element of said one codevector and (ii) reducing said difference by said masking threshold indicated by said representation of a masking threshold to obtain an indicator measure;
summing those obtained indicator measures which are positive to obtain a distortion measure;
- selecting a codevector having a smallest distortion measure.
25. The method of claim 24 wherein said codevectors are normalised with respect to energy and wherein the step of obtaining a representation of a difference between a given coefficient of said each band and a corresponding element of said one codevector comprises obtaining a squared difference between said given coefficient and said corresponding element after unnormalising said corresponding element with said representation of energy in said signal.
26. A method of receiving a discretely represented frequency signal, said signal discretely represented by coefficients at certain frequencies, comprising the steps of:
- providing pre-defined frequency bands;
- for each band providing a codebook of codevectors, each codevector having an element corresponding with each of said certain frequencies which are within said each band;
- receiving concatenated codevector addresses for said bands and a per band indication of a representation of energy of coefficients in each band;
- determining a length of address for each band based on said per band indication of a representation of energy;
- parsing said concatenated codevector addresses based on said address length determining step;
- addressing said codebook for each band with a parsed codebook address to obtain frequency coefficients for each said band.
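On the receive side (claim 26), the decoder derives each band's address length from the transmitted energy indications and splits the concatenated addresses accordingly. A minimal sketch, assuming the concatenated addresses arrive as a binary string and the per-band lengths have already been derived from the energies:

```python
def parse_bitstream(bits, band_bit_lengths):
    """Split a concatenated string of codevector addresses into one
    address per band, using per-band lengths that the decoder computed
    from the transmitted energy representations."""
    addresses, pos = [], 0
    for n in band_bit_lengths:
        addresses.append(int(bits[pos:pos + n], 2) if n else 0)
        pos += n
    return addresses
```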
27. A transmitter comprising:
means for obtaining a frame of time samples;
means for obtaining a discrete frequency representation of said time series frame, said frequency representation comprising coefficients at certain frequencies;
means for grouping said coefficients into frequency bands;
means for, for each band (i) providing a codebook of codevectors, each codevector having an element corresponding with each coefficient within said each band;
(ii) obtaining a representation of energy of coefficients in said each band;
(iii) selecting a set of addresses which address at least a portion of said codebook such that a size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy;
(iv) selecting a codevector from said codebook from amongst those addressable by said address set to represent said coefficients for said band and obtaining an address to said selected codevector;
means for concatenating said selected codevector addresses; and
means for transmitting said concatenated codevector addresses and an indication of each said representation of energy.
28. A receiver comprising:
means for providing pre-defined frequency bands;
a memory storing, for each band, a codebook of codevectors, each codevector having an element corresponding with each of said certain frequencies which are within said each band;
means for receiving concatenated codevector addresses for said bands and a per band indication of a representation of energy of coefficients in each band;
means for determining a length of address for each band based on said per band indication of a representation of energy;
means for parsing said concatenated codevector addresses based on said address length determining step;
means for addressing said codebook for each band with a parsed codebook address to obtain frequency coefficients for each said band.
29. A method of obtaining a codebook of codevectors which span a frequency band discretely represented at pre-defined frequencies, comprising the steps of:
- receiving training vectors for said frequency band;
- receiving an initial set of estimated codevectors;
- associating each training vector with one of said estimated codevectors with respect to which it generates a smallest distortion measure to obtain associated groups of vectors;
- partitioning said associated groups of vectors into Voronoi regions;
- determining a centroid for each Voronoi region;
- selecting each centroid vector as a new estimated codevector;
- repeating from said associating step until a difference between new estimated codevectors and estimated codevectors from a previous iteration is less than a pre-defined threshold; and
- populating said codebook with said estimated codevectors resulting after a last iteration.
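The training procedure of claim 29 is a generalised Lloyd (LBG-style) iteration. The sketch below uses plain squared error and vector-mean centroids for brevity, whereas claims 30 through 35 specify the masked distortion measure for both the assignment and the centroid search; names and the convergence test are assumptions.

```python
def train_codebook(training, codebook, iterations=20, tol=1e-9):
    """Generalised Lloyd iteration: assign each training vector to its
    nearest codevector (Voronoi partition), replace each codevector by
    its region's centroid, and repeat until the codevectors converge."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    for _ in range(iterations):
        regions = [[] for _ in codebook]
        for t in training:
            nearest = min(range(len(codebook)),
                          key=lambda i: dist(t, codebook[i]))
            regions[nearest].append(t)
        new, moved = [], 0.0
        for old, region in zip(codebook, regions):
            if region:  # centroid = componentwise mean of the region
                c = [sum(col) / len(region) for col in zip(*region)]
            else:       # keep an empty region's codevector unchanged
                c = old
            moved += dist(old, c)
            new.append(c)
        codebook = new
        if moved < tol:
            break
    return codebook
```

Claim 36 reuses the same loop to grow the codebook while holding an already-trained first set fixed, which is what makes the embedded-codebook structure of claim 37 possible.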
30. The method of claim 29 wherein each distortion measure is obtained by the steps of:
- for each element of said training vector (i) obtaining a representation of a difference between said element and a corresponding element of said one estimated codevector and (ii) reducing said difference by a masking threshold of said training vector to obtain an indicator measure;
- summing those obtained indicator measures which are positive to obtain said distortion measure.
31. The method of claim 30 wherein said masking threshold is obtained by convolving a measure of energy in said training vector with a known spreading function.
32. The method of claim 31 wherein said masking threshold is obtained by adjusting said convolution by an offset dependent upon a spectral flatness measure comprising an arithmetic mean of said coefficients.
33. The method of claim 32 wherein said estimated codevectors are normalised with respect to energy and wherein the step of obtaining a representation of a difference between a given element of said training vector and a corresponding element of said one estimated codevector comprises obtaining a squared difference between said given element and said corresponding element after unnormalising said corresponding element with a measure of energy in said training vector.
34. The method of claim 33 wherein said step of determining a centroid for a Voronoi region comprises finding a candidate vector within said region which generates a minimum value for a sum of distortion measures between said candidate vector and each training vector in said region.
35. The method of claim 34 wherein each distortion measure in said sum of distortion measures is obtained by the steps of:
- for each training vector, for each element of said each training vector (i) obtaining a representation of a difference between said element and a corresponding element of said candidate vector and (ii) reducing said difference by a masking threshold for said training vector to obtain an indicator measure;
- summing those obtained indicator measures which are positive to obtain said distortion measure.
36. The method of claim 29 wherein said estimated codevectors with which said codebook is populated is a first set of codevectors and wherein said codebook is enlarged by the steps of:
- fixing said first set of estimated codevectors;
- receiving an initial second set of estimated codevectors;
- associating each training vector with one estimated codevector from said first set or said second set with respect to which it generates a smallest distortion measure to obtain associated groups of vectors;
- partitioning said associated groups of vectors into Voronoi regions;
- determining a centroid for each Voronoi region containing an estimated codevector from said second set;
- selecting each centroid vector as a new estimated second set codevector;
- repeating from said associating step until a difference between new estimated second set codevectors and estimated second set codevectors from a previous iteration is less than a pre-defined threshold; and
- populating said codebook with said estimated second set codevectors resulting after a last iteration.
37. The method of claim 36 including the step of sorting said second set estimated codevectors to an end of said codebook whereby to obtain an embedded codebook.
38. A method of generating an embedded codebook for a frequency band discretely represented at pre-defined frequencies, comprising the steps of:
(a) obtaining an optimized larger first codebook of codevectors which span said frequency band;
(b) obtaining an optimized smaller second codebook of codevectors which span said frequency band;
(c) finding codevectors in said first codebook which best approximate each entry in said second codebook;
(d) sorting said first codebook to place said codevectors found in step (c) at a front of said first codebook.
39. The method of claim 38 wherein each step of obtaining an optimized codebook comprises the steps of:
- receiving training vectors for said frequency band;
- receiving an initial set of estimated codevectors;
- associating each training vector with one of said estimated codevectors with respect to which it generates a smallest distortion measure to obtain associated groups of vectors;
- partitioning said associated groups of vectors into Voronoi regions;
- determining a centroid for each Voronoi region;
- selecting each centroid vector as a new estimated codevector;
- repeating from said associating step until a difference between new estimated codevectors and estimated codevectors from a previous iteration is less than a pre-defined threshold; and
- populating said codebook with said estimated codevectors resulting after a last iteration.
40. The method of claim 39 wherein step (c) comprises utilising a least squares method to find codevectors in said first codebook which best approximate each entry in said second codebook.
41. A method for allocating encoding bits to bands within the frequency spectrum in a perceptual audio coding transmitter, said transmitter having a split VQ unit, said method comprising the steps of:
(A) receiving at least one masking threshold and at least one spectral energy for each band;
(B) allocating bits to each band based on said masking threshold and spectral energy for each band; and
(C) transmitting the bit allocation for each band to the split VQ unit.
42. The method of claim 41 wherein the step of allocating bits to each band based on said masking threshold and spectral energy for each band further comprises the steps of:
(B.1) calculating a gap value for each band, wherein said gap value is obtained by subtracting from the spectral energy for the band both the masking threshold for the band and the ratio of the bits already allocated to the band to the number of coefficients in the band, multiplied by a constant;
(B.2) allocating a bit to the band with the highest gap value; and
(B.3) repeating steps B.1 and B.2 until all bits available for transmission have been allocated.
43. The method of claim 42 further comprising the step of:
(A.1) calculating a first approximation of the number of bits to be allocated to each band.
44. The method of claim 43 wherein the step of calculating a first approximation of the number of bits to be allocated to each band comprises the steps of:
(A.1.1) calculating a second gap value for each band wherein said gap is calculated by subtracting from the spectral energy for each band the masking threshold for that band;
(A.1.2) approximating the number of bits for each band as a second ratio, the numerator of which is the product of the second gap value, the number of coefficients in the band, and the total number of bits available for transmission, and the denominator of which is the sum over all bands of the product of the second gap value and the number of coefficients in the band;
(A.1.3) discarding the fractional results of the second ratio to yield an integer second ratio; and
(A.1.4) allocating to each band as a first approximation said integer second ratio.
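Steps (A.1.1)-(A.1.4) amount to a proportional first pass. A sketch, with the caveat that clamping negative gaps to zero is an added assumption so that bands already below the mask receive no bits:

```python
def initial_allocation(energy_db, mask_db, n_coeffs, total_bits):
    """First-approximation bit allocation (claims 43-44, sketch)."""
    gap = [max(0.0, e - m) for e, m in zip(energy_db, mask_db)]  # (A.1.1)
    weight = [g * n for g, n in zip(gap, n_coeffs)]
    denom = sum(weight)
    if denom == 0.0:
        return [0] * len(gap)
    # (A.1.2)-(A.1.4): proportional share with the fractional part discarded
    return [int(w * total_bits / denom) for w in weight]
```

The bits left over after truncation would then be distributed by the greedy loop of claim 42.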
45. A method of selecting a window for calculating frequency domain coefficients in a perceptual audio coding transmitter, said method comprising the steps of:
(A) receiving a series of time samples of the input signal;
(B) determining when a strong positive transient occurs in said series; and
(C) switching to a different window when a strong positive transient is detected.
46. The method of claim 45 wherein the step of determining when a strong positive transient occurs in said series comprises the steps of:
(B.1) calculating, for a set of n successive time samples in said series, the sum of the squares of the amplitudes of the n successive time samples to yield a first sum;
(B.2) calculating, for the next n successive time samples in said series, the sum of the squares of the amplitudes of the next n successive time samples to yield a second sum;
(B.3) calculating a ratio of the first sum less the second sum to the first sum;
(B.4) determining a strong positive transient has occurred when said ratio exceeds a threshold value;
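Steps (B.1)-(B.4), with n = 3 (claim 47) and a threshold of 5 (claim 55), can be sketched as below. Note that the claim literally forms (first sum less second sum) over the first sum, which for positive energies can never exceed 5; for a *positive* transient the ratio is interpreted here as (second - first)/first, which is an assumption about the intended sign.

```python
def strong_transient(samples, n=3, threshold=5.0):
    """Window-switch transient detector (claims 45-47 and 55, sketch)."""
    e1 = sum(s * s for s in samples[:n])        # (B.1) first sum
    e2 = sum(s * s for s in samples[n:2 * n])   # (B.2) second sum
    if e1 == 0.0:
        return e2 > 0.0
    # (B.3)-(B.4) energy-jump ratio against the threshold
    return (e2 - e1) / e1 > threshold
```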
47. The method of claim 46 wherein n has the value 3.
48. The method of claim 45 wherein said different window is a first transitional window.
49. The method of claim 47 further comprising the steps of:
(D) receiving a next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said next series;
and, (F) switching to a series of short windows when a strong positive transient is detected in said next series.
(D) receiving a next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said next series;
and, (F) switching to a series of short windows when a strong positive transient is detected in said next series.
50. The method of claim 49 wherein the series of short windows is a set of three short windows.
51. The method of claim 47 further comprising the steps of:
(D) receiving a next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said next series;
and, (F) switching to a second transitional window when a strong positive transient is not detected in said next series.
(D) receiving a next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said next series;
and, (F) switching to a second transitional window when a strong positive transient is not detected in said next series.
52. The method of claim 48 further comprising the steps of:
(D) receiving a second next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said second next series; and, (F) switching to a series of short windows when a strong positive transient is detected in said second next series.
(D) receiving a second next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said second next series; and, (F) switching to a series of short windows when a strong positive transient is detected in said second next series.
53. The method of claim 52 wherein the series of short windows is a set of three short windows.
54. The method of claim 48 further comprising the steps of:
(D) receiving a second next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said second next series; and, (F) switching to a second transitional window when a strong positive transient is not detected in said second next series.
(D) receiving a second next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said second next series; and, (F) switching to a second transitional window when a strong positive transient is not detected in said second next series.
55. The method of claim 46 wherein said threshold value is 5.
56. In a perceptual audio coder, a method for calculating the masking threshold for a band, said band being one of a plurality of bands in a frame, said method comprising the steps of:
(A) receiving an input frame;
(B) calculating MDCT coefficients for each band of said frame;
(C) calculating a spectral energy for each band of said frame from said MDCT coefficients to yield a power spectral density function;
(D) convolving a normalized spreading function with said power spectral density function to yield a convolution;
(E) subtracting in the log domain an offset measure from said convolution to yield a masking threshold for each band.
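Steps (B)-(E) amount to a band-energy power spectral density, a spreading-function multiply, and a dB-domain offset subtraction. A minimal sketch, assuming the spreading function is supplied as an already-normalized matrix (per claim 61) and the offsets are given in dB; all names are illustrative:

```python
import math

def masking_thresholds(mdct, band_edges, spread, offset_db):
    """Per-band masking thresholds from MDCT coefficients (claim 56, sketch)."""
    # (C) spectral energy per band -> power spectral density
    psd = [sum(x * x for x in mdct[lo:hi])
           for lo, hi in zip(band_edges[:-1], band_edges[1:])]
    # (D) "convolve" the normalized spreading function: a matrix-vector product
    spread_psd = [sum(row[j] * psd[j] for j in range(len(psd))) for row in spread]
    # (E) subtract the per-band offset in the log (dB) domain
    return [10.0 * math.log10(p + 1e-12) - o for p, o in zip(spread_psd, offset_db)]
```

With an identity spreading matrix and zero offsets the thresholds reduce to the band energies in dB, which makes the three steps easy to check in isolation.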
57. The method of claim 56, wherein said offset measure is calculated from the band number and a spectral flatness measure.
58. The method of claim 57 wherein said spectral flatness measure is 0.5.
59. The method of claim 57 wherein said spectral flatness measure is the ratio of the geometric mean of the MDCT coefficients to the arithmetic mean of the MDCT coefficients.
60. The method of claim 59 wherein the offset is calculated according to the equation:
F_i = 5.5(1 - a) + (14.5 + i)a
where F_i is the offset for the ith band and a is the spectral flatness measure for the frame.
61. The method of claim 56, wherein said spreading function is normalized by:
(I) calculating the overall gain due to the unnormalized spreading function;
(II) dividing unnormalized spreading function values by the overall gain due to the spreading function.
62. The method of claim 60, wherein the unnormalized spreading function is:
63. In a perceptual audio coder, a method for calculating the masking threshold for a band, said method comprising the steps of:
(A) receiving an input frame;
(B) calculating MDCT coefficients for each band of the frame;
(C) calculating a spectral energy for each band of said frame from said MDCT coefficients to yield a power spectral density function;
(C.1) calculating a quantized spectral energy for each band from said spectral energy for each band to yield a quantized power spectral density function;
(D) convolving a normalized spreading function with said quantized power spectral density function to yield a convolution;
(E) subtracting in the log domain an offset measure from said convolution to yield a masking threshold for each band.
64. In a perceptual audio coding transmitter, a method for quantizing the spectral energy of MDCT coefficients in a band of a frame comprising the steps of:
(A) receiving MDCT coefficients for each band in the frame;
(B) calculating the energy in each band from the MDCT coefficients;
(C) calculating a quantized value for the average energy of the frame;
(D) calculating a normalized energy vector for the frame by subtracting in the log domain the quantized value of the average energy of the frame from the energy in each band;
(E) determining a best prediction matrix to predict the normalized energy vector;
(F) calculating a first residual vector from the best predicted normalized energy vector and the normalized energy vector for each band;
(G) finding a first codevector which most closely matches the first residual vector;
(H) calculating and storing the normalized quantized energy vector for the frame; and
(I) transmitting the indices of the quantized energy, prediction matrix and first codevector to the receiver.
65. The method of claim 64 wherein the step of calculating the energy in each band from the MDCT coefficients comprises the step of:
(B.1) taking the sum of the squares of the absolute values of the MDCT coefficients in the band.
66. The method of claim 64 wherein the step of calculating a quantized value for the average energy of the frame comprises the steps of:
(C.1) converting the energy in each band to the logarithmic domain;
(C.2) calculating the average log energy of the power spectrum by taking the sum of energy in each band and dividing by the number of bands;
(C.3) calculating a product of a leakage factor and the quantized value of the average log energy for the previous frame;
(C.4) subtracting this product from the average log energy of the power spectrum to yield a difference;
(C.5) finding the best match in a codebook to said difference; and
(C.6) adding the best match to said product to yield the quantized value for the average energy of the frame.
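Steps (C.1)-(C.6) describe leaky predictive scalar quantization of the frame's average log energy. A sketch, assuming band energies already in dB, a scalar codebook, and an illustrative leakage factor of 0.9 (the claims do not fix its value):

```python
def quantize_avg_energy(band_energy_db, prev_quantized_avg, codebook, leak=0.9):
    """Quantized average frame energy with leaky prediction (claim 66, sketch)."""
    avg = sum(band_energy_db) / len(band_energy_db)    # (C.2) average log energy
    pred = leak * prev_quantized_avg                   # (C.3) leaky prediction
    diff = avg - pred                                  # (C.4) prediction residual
    best = min(codebook, key=lambda c: abs(c - diff))  # (C.5) nearest codeword
    return pred + best                                 # (C.6) quantized average
```

The leakage factor keeps the decoder's predictor from drifting after channel errors, at the cost of a slightly larger residual to quantize.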
67. The method of claim 64 wherein the step of determining a best prediction matrix to predict the normalized energy vector for all bands comprises the steps of:
(E.1) finding the prediction matrix which when multiplied by the normalized quantized energy vector of the previous frame gives the closest match to the normalized energy vector of the current frame;
(E.2) calculating a best predicted normalized energy vector by multiplying the prediction matrix which gives the closest match by the normalized quantized energy vector of the previous frame;
68. The method of claim 67 wherein said prediction matrices are tridiagonal.
69. The method of claim 64 wherein the step of calculating a residual vector from the best predicted normalized energy vector and the normalized energy for each band comprises the step of subtracting the best predicted normalized energy from the normalized energy for each band.
70. The method of claim 64 wherein the step of calculating and storing the normalized quantized energy vector for the frame comprises adding the best predicted normalized energy vector to the first codevector which most closely matches the first residual vector.
71. The method of claim 64 further comprising the steps of:
(I) calculating a second residual vector by subtracting the first codevector which most closely matches the first residual vector from the first residual vector;
(J) finding a second codevector which most closely matches the second residual vector; and
(K) transmitting the index of the second codevector to the receiver.
72. The method of claim 71 wherein the step of calculating and storing the normalized quantized energy vector for the frame comprises adding the best predicted normalized energy vector to the first codevector which most closely matches the first residual vector and to the second codevector.
73. In a perceptual audio coding transmitter, a method for vector quantizing the MDCT
coefficients, said coefficients belonging to bands, said method comprising the steps of:
(A) receiving MDCT coefficients for each band;
(B) for each band:
(B.1) selecting a codevector that is the best match to the received MDCT coefficients for that band from a codebook;
(C) transmitting the indices for the selected codevectors to the receiver.
74. The method of claim 73 wherein the step of selecting a codevector from a codebook that is the best match to the received MDCT coefficients for that band further comprises the step of selecting the codevector that minimizes the energy between the codevector coefficients and the dead zone.
75. The method of claim 74 wherein the codevector that minimizes the energy between the codevector coefficients and the dead zone satisfies the equation:
D_i = Σ_k max[0, E_k(i) - t_iu]
where the sum is over all coefficients k in the ith critical band and the max function takes the larger of its two arguments.
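In conventional terms the measure sums, over the coefficients of band i, only the part of the error energy that exceeds the masking threshold. A sketch, assuming E_k(i) is the coefficient-wise squared error between signal and codevector:

```python
def deadzone_distortion(coeffs, codevector, t):
    """D_i = sum_k max(0, E_k(i) - t) over one band's coefficients
    (claims 74-75 and 81, sketch): error below the masking threshold t
    is taken as inaudible and contributes nothing."""
    return sum(max(0.0, (x - c) ** 2 - t) for x, c in zip(coeffs, codevector))
```

This is what makes the quantizer perceptual: two codevectors with equal squared error can have very different audible distortion.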
76. The method of claim 73 further comprising the steps of:
(A.1) receiving an indication of the number of bits, b, used to represent the codevector index for each band; and
(A.2) selecting a codevector for the band from a codebook having 2^b codevectors.
77. The method of claim 73 further comprising the steps of:
(A.1) receiving an indication of the number of bits, b, used to represent the codevector index for each band; and
(A.2) selecting a codevector for the band from the first 2^b codevectors in the codebook.
78. The method of claim 73 wherein at least one band comprises a plurality of critical bands.
79. In a perceptual audio coding system, a method of training the codebook in which the distortion measure used to select the codebook vectors for the codebook is calculated using the masking threshold.
80. The method of claim 79 further comprising the steps of:
(A) producing a set of training vectors;
(B) calculating from each training vector a set of MDCT coefficients;
(C) calculating for each training vector a masking threshold for each band;
(D) making an estimate of codevectors for the codebook;
(E) calculating a distortion measure by calculating the energy of the difference between the MDCT coefficients for the training vector and the deadband surrounding the coefficients for the estimated codevectors;
(F) associating the coefficients within each band of each training vector with the estimated codevector that minimizes said distortion measure;
(G) calculating the centroid of each associated group;
(H) replacing the estimated codevectors by the centroids of each group;
(I) repeating steps (E) - (H) until the difference between successive estimated codevectors is small;
(J) populating the codebook with the estimated codevectors.
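Steps (D)-(J) are a generalized Lloyd (LBG) iteration under the masked distortion. The sketch below uses a fixed iteration count, takes the first training vectors as the initial estimate, and approximates step (G) with the plain mean rather than the exact masked centroid of claim 82; all of these are simplifying assumptions.

```python
def train_codebook(training, n_code, t, iters=10):
    """LBG codebook training with a masking-aware distortion (claim 80, sketch)."""
    def dist(x, c):
        # (E) only error energy above the masking threshold t counts
        return sum(max(0.0, (xi - ci) ** 2 - t) for xi, ci in zip(x, c))
    code = [list(v) for v in training[:n_code]]       # (D) initial estimate
    for _ in range(iters):
        groups = [[] for _ in range(n_code)]
        for x in training:                            # (F) nearest-codevector partition
            j = min(range(n_code), key=lambda k: dist(x, code[k]))
            groups[j].append(x)
        for j, g in enumerate(groups):                # (G)-(H) replace by centroids
            if g:
                code[j] = [sum(col) / len(g) for col in zip(*g)]
    return code                                       # (J) populate the codebook
```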
81. The method of claim 80 wherein the distortion measure is calculated according to the equation:
D_i = Σ_k max[0, E_k(i) - t_iu]
where the sum is over all coefficients k in the ith critical band and the max function takes the larger of its two arguments.
82. The method of claim 80 wherein the centroid for each group is calculated according to the equation:
Xbest_k(i) is that providing min Σ_j max[0, (X_k(i) - (G_i)^(1/2) Xbest_k(i))^2 - t_iu]
where Σ_j is a sum over all training vectors in the jth Voronoi region.
83. The method of claim 80 wherein the difference between successive estimated codevectors is small when a least squares difference between successive estimated codevectors is less than a threshold value, namely 10^-4.
84. A method for creating an embedded codebook comprising the steps of:
(A) training a codebook having 2^d codevectors;
(B) training a codebook having 2^e codevectors, where e is less than d;
(C) finding the codevectors in the 2^d element codebook closest to the codevectors in the 2^e element codebook; and
(D) sorting the 2^d codevectors so that the closest 2^e are placed in the first 2^e portion of the codebook.
85. The method of claim 84 wherein the step of finding the codevectors in the 2^d element codebook closest to the codevectors in the 2^e element codebook comprises the steps of:
(C.1) calculating the mean square difference between each codevector in the 2^e element codebook and each of the codevectors in the 2^d element codebook; and
(C.2) selecting the codevector in the 2^d element codebook which has the least mean square difference to each codevector in the 2^e element codebook.
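The sort of claims 84-85 can be sketched as follows. The helper names are assumptions, and removing each match from the pool (so the front entries are distinct) is an interpretation the claims leave implicit.

```python
def embed_codebook(big, small):
    """Sort the large (2^d) codebook so its first entries best match the
    small (2^e) codebook, yielding an embedded codebook (claims 84-85, sketch)."""
    def mse(a, b):  # (C.1) mean square difference between two codevectors
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    front, rest = [], list(big)
    for s in small:
        # (C.2) nearest remaining large-book codevector for this small-book entry
        j = min(range(len(rest)), key=lambda k: mse(rest[k], s))
        front.append(rest.pop(j))   # (D) move it to the front of the codebook
    return front + rest
```

A decoder restricted to e bits then simply indexes the first 2^e entries of the sorted book.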
86. A method for creating an embedded codebook comprising the steps of:
(A) training a codebook having 2^f codevectors;
(B) estimating (2^g - 2^f) additional codevectors, where g is greater than f;
(C) forming a set of 2^g codevectors from step (A) and from the (2^g - 2^f) additional estimated codevectors from step (B);
(D) determining the Voronoi regions for said set;
(E) determining the centroid of the Voronoi regions for the (2^g - 2^f) additional estimated codevectors;
(F) replacing the additional estimated codevectors by the centroids of their Voronoi regions;
(G) repeating steps (D) - (F) until the difference between successive additional estimated codevectors is small; and
(H) populating a new 2^g element codebook with the 2^f codevectors from step (A) in the bottom 2^f positions of said new codebook and populating positions 2^f + 1 to 2^g with the additional estimated codevectors.
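Claims 36-37 and 86 grow a trained codebook while freezing its existing entries: training vectors are partitioned over the full set, but only the new codevectors move to the centroids of their Voronoi cells. A sketch with plain squared-error distortion, a fixed iteration count, and explicit initial new codevectors (all assumptions):

```python
def enlarge_codebook(fixed, initial_new, training, iters=10):
    """Embedded-codebook enlargement (claims 36 and 86, sketch): the 2^f
    original codevectors stay fixed; the (2^g - 2^f) new ones are refined."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    new = [list(v) for v in initial_new]
    for _ in range(iters):
        code = fixed + new                        # (C) the combined 2^g set
        groups = [[] for _ in new]
        for x in training:                        # (D) Voronoi partition of the set
            j = min(range(len(code)), key=lambda k: sqdist(x, code[k]))
            if j >= len(fixed):                   # (E) only cells of new codevectors
                groups[j - len(fixed)].append(x)
        for j, g in enumerate(groups):            # (F) centroid update
            if g:
                new[j] = [sum(col) / len(g) for col in zip(*g)]
    return fixed + new                            # (H) old entries first, new appended
```

Because the original entries keep their positions at the bottom of the book, a decoder with fewer index bits still addresses a valid, previously optimized codebook.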
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/146,752 US6704705B1 (en) | 1998-09-04 | 1998-09-04 | Perceptual audio coding |
CA002246532A CA2246532A1 (en) | 1998-09-04 | 1998-09-04 | Perceptual audio coding |
Publications (1)
Publication Number | Publication Date |
---|---|
CA2246532A1 true CA2246532A1 (en) | 2000-03-04 |
Family
ID=32471057
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002246532A Abandoned CA2246532A1 (en) | 1998-09-04 | 1998-09-04 | Perceptual audio coding |
Country Status (2)
Country | Link |
---|---|
US (1) | US6704705B1 (en) |
CA (1) | CA2246532A1 (en) |
Families Citing this family (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3507743B2 (en) * | 1999-12-22 | 2004-03-15 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Digital watermarking method and system for compressed audio data |
TW521266B (en) * | 2000-07-13 | 2003-02-21 | Verbaltek Inc | Perceptual phonetic feature speech recognition system and method |
US20040002859A1 (en) * | 2002-06-26 | 2004-01-01 | Chi-Min Liu | Method and architecture of digital conding for transmitting and packing audio signals |
KR100462611B1 (en) * | 2002-06-27 | 2004-12-20 | 삼성전자주식회사 | Audio coding method with harmonic extraction and apparatus thereof. |
US7724827B2 (en) * | 2003-09-07 | 2010-05-25 | Microsoft Corporation | Multi-layer run level encoding and decoding |
US7426462B2 (en) * | 2003-09-29 | 2008-09-16 | Sony Corporation | Fast codebook selection method in audio encoding |
US7349842B2 (en) * | 2003-09-29 | 2008-03-25 | Sony Corporation | Rate-distortion control scheme in audio encoding |
US7630902B2 (en) * | 2004-09-17 | 2009-12-08 | Digital Rise Technology Co., Ltd. | Apparatus and methods for digital audio coding using codebook application ranges |
US7668715B1 (en) | 2004-11-30 | 2010-02-23 | Cirrus Logic, Inc. | Methods for selecting an initial quantization step size in audio encoders and systems using the same |
US7627481B1 (en) * | 2005-04-19 | 2009-12-01 | Apple Inc. | Adapting masking thresholds for encoding a low frequency transient signal in audio data |
US7885809B2 (en) * | 2005-04-20 | 2011-02-08 | Ntt Docomo, Inc. | Quantization of speech and audio coding parameters using partial information on atypical subsequences |
US7418394B2 (en) * | 2005-04-28 | 2008-08-26 | Dolby Laboratories Licensing Corporation | Method and system for operating audio encoders utilizing data from overlapping audio segments |
US8599925B2 (en) * | 2005-08-12 | 2013-12-03 | Microsoft Corporation | Efficient coding and decoding of transform blocks |
US8630849B2 (en) * | 2005-11-15 | 2014-01-14 | Samsung Electronics Co., Ltd. | Coefficient splitting structure for vector quantization bit allocation and dequantization |
US7461106B2 (en) * | 2006-09-12 | 2008-12-02 | Motorola, Inc. | Apparatus and method for low complexity combinatorial coding of signals |
CN101308655B (en) * | 2007-05-16 | 2011-07-06 | 展讯通信(上海)有限公司 | Audio coding and decoding method and layout design method of static discharge protective device and MOS component device |
US7774205B2 (en) * | 2007-06-15 | 2010-08-10 | Microsoft Corporation | Coding of sparse digital media spectral data |
JP5434592B2 (en) * | 2007-06-27 | 2014-03-05 | 日本電気株式会社 | Audio encoding method, audio decoding method, audio encoding device, audio decoding device, program, and audio encoding / decoding system |
CN101790756B (en) * | 2007-08-27 | 2012-09-05 | 爱立信电话股份有限公司 | Transient detector and method for supporting encoding of an audio signal |
ES2375192T3 (en) | 2007-08-27 | 2012-02-27 | Telefonaktiebolaget L M Ericsson (Publ) | CODIFICATION FOR IMPROVED SPEECH TRANSFORMATION AND AUDIO SIGNALS. |
US8576096B2 (en) * | 2007-10-11 | 2013-11-05 | Motorola Mobility Llc | Apparatus and method for low complexity combinatorial coding of signals |
US8209190B2 (en) * | 2007-10-25 | 2012-06-26 | Motorola Mobility, Inc. | Method and apparatus for generating an enhancement layer within an audio coding system |
US20090234642A1 (en) * | 2008-03-13 | 2009-09-17 | Motorola, Inc. | Method and Apparatus for Low Complexity Combinatorial Coding of Signals |
US8639519B2 (en) * | 2008-04-09 | 2014-01-28 | Motorola Mobility Llc | Method and apparatus for selective signal coding based on core encoder performance |
US8666733B2 (en) * | 2008-06-26 | 2014-03-04 | Japan Science And Technology Agency | Audio signal compression and decoding using band division and polynomial approximation |
KR101756834B1 (en) * | 2008-07-14 | 2017-07-12 | 삼성전자주식회사 | Method and apparatus for encoding and decoding of speech and audio signal |
US8219408B2 (en) * | 2008-12-29 | 2012-07-10 | Motorola Mobility, Inc. | Audio signal decoder and method for producing a scaled reconstructed audio signal |
US8140342B2 (en) * | 2008-12-29 | 2012-03-20 | Motorola Mobility, Inc. | Selective scaling mask computation based on peak detection |
US8200496B2 (en) * | 2008-12-29 | 2012-06-12 | Motorola Mobility, Inc. | Audio signal decoder and method for producing a scaled reconstructed audio signal |
US8175888B2 (en) | 2008-12-29 | 2012-05-08 | Motorola Mobility, Inc. | Enhanced layered gain factor balancing within a multiple-channel audio coding system |
CN102067211B (en) * | 2009-03-11 | 2013-04-17 | 华为技术有限公司 | Linear prediction analysis method, device and system |
CA2778323C (en) | 2009-10-20 | 2016-09-20 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a detection of a group of previously-decoded spectral values |
SG182467A1 (en) | 2010-01-12 | 2012-08-30 | Fraunhofer Ges Forschung | Audio encoder, audio decoder, method for encoding and audio information, method for decoding an audio information and computer program using a hash table describing both significant state values and interval boundaries |
US8423355B2 (en) * | 2010-03-05 | 2013-04-16 | Motorola Mobility Llc | Encoder for audio signal including generic audio and speech frames |
US8428936B2 (en) * | 2010-03-05 | 2013-04-23 | Motorola Mobility Llc | Decoder for audio signal including generic audio and speech frames |
WO2011122875A2 (en) * | 2010-03-31 | 2011-10-06 | 한국전자통신연구원 | Encoding method and device, and decoding method and device |
ES2600313T3 (en) * | 2010-10-07 | 2017-02-08 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for estimating the level of audio frames encoded in a bitstream domain |
WO2013057895A1 (en) * | 2011-10-19 | 2013-04-25 | パナソニック株式会社 | Encoding device and encoding method |
US9129600B2 (en) | 2012-09-26 | 2015-09-08 | Google Technology Holdings LLC | Method and apparatus for encoding an audio signal |
CN104934034B (en) * | 2014-03-19 | 2016-11-16 | 华为技术有限公司 | Method and apparatus for signal processing |
KR102244612B1 (en) * | 2014-04-21 | 2021-04-26 | 삼성전자주식회사 | Appratus and method for transmitting and receiving voice data in wireless communication system |
CN106448688B (en) | 2014-07-28 | 2019-11-05 | 华为技术有限公司 | Audio coding method and relevant apparatus |
EP3079151A1 (en) * | 2015-04-09 | 2016-10-12 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoder and method for encoding an audio signal |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4817157A (en) | 1988-01-07 | 1989-03-28 | Motorola, Inc. | Digital speech coder having improved vector excitation source |
US5040217A (en) | 1989-10-18 | 1991-08-13 | At&T Bell Laboratories | Perceptual coding of audio signals |
US5148489A (en) * | 1990-02-28 | 1992-09-15 | Sri International | Method for spectral estimation to improve noise robustness for speech recognition |
US5317672A (en) | 1991-03-05 | 1994-05-31 | Picturetel Corporation | Variable bit rate speech encoder |
US5187745A (en) * | 1991-06-27 | 1993-02-16 | Motorola, Inc. | Efficient codebook search for CELP vocoders |
US5179594A (en) * | 1991-06-12 | 1993-01-12 | Motorola, Inc. | Efficient calculation of autocorrelation coefficients for CELP vocoder adaptive codebook |
US5285498A (en) | 1992-03-02 | 1994-02-08 | At&T Bell Laboratories | Method and apparatus for coding audio signals based on perceptual model |
US5272529A (en) * | 1992-03-20 | 1993-12-21 | Northwest Starscan Limited Partnership | Adaptive hierarchical subband vector quantization encoder |
US5664057A (en) | 1993-07-07 | 1997-09-02 | Picturetel Corporation | Fixed bit rate speech encoder/decoder |
US5533052A (en) | 1993-10-15 | 1996-07-02 | Comsat Corporation | Adaptive predictive coding with transform domain quantization based on block size adaptation, backward adaptive power gain control, split bit-allocation and zero input response compensation |
CA2137756C (en) * | 1993-12-10 | 2000-02-01 | Kazunori Ozawa | Voice coder and a method for searching codebooks |
US5651090A (en) * | 1994-05-06 | 1997-07-22 | Nippon Telegraph And Telephone Corporation | Coding method and coder for coding input signals of plural channels using vector quantization, and decoding method and decoder therefor |
US5956674A (en) * | 1995-12-01 | 1999-09-21 | Digital Theater Systems, Inc. | Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels |
US6041297A (en) * | 1997-03-10 | 2000-03-21 | At&T Corp | Vocoder for coding speech by using a correlation between spectral magnitudes and candidate excitations |
US6351730B2 (en) * | 1998-03-30 | 2002-02-26 | Lucent Technologies Inc. | Low-complexity, low-delay, scalable and embedded speech and audio coding with adaptive frame loss concealment |
- 1998
- 1998-09-04 CA CA002246532A patent/CA2246532A1/en not_active Abandoned
- 1998-09-04 US US09/146,752 patent/US6704705B1/en not_active Expired - Lifetime
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1676264A2 (en) * | 2003-09-29 | 2006-07-05 | Sony Electronics Inc. | A method of making a window type decision based on mdct data in audio encoding |
EP1676264A4 (en) * | 2003-09-29 | 2008-02-20 | Sony Electronics Inc | A method of making a window type decision based on mdct data in audio encoding |
CN110047499A (en) * | 2013-01-29 | 2019-07-23 | 弗劳恩霍夫应用研究促进协会 | Low complex degree tone adaptive audio signal quantization |
US11694701B2 (en) | 2013-01-29 | 2023-07-04 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Low-complexity tonality-adaptive audio signal quantization |
CN110047499B (en) * | 2013-01-29 | 2023-08-29 | 弗劳恩霍夫应用研究促进协会 | Low Complexity Pitch Adaptive Audio Signal Quantization |
Also Published As
Publication number | Publication date |
---|---|
US6704705B1 (en) | 2004-03-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2246532A1 (en) | Perceptual audio coding | |
EP0905680B1 (en) | Method for quantizing LPC parameters using switched-predictive quantization | |
KR101343267B1 (en) | Method and apparatus for audio coding and decoding using frequency segmentation | |
EP2346030B1 (en) | Audio encoder, method for encoding an audio signal and computer program | |
KR101330362B1 (en) | Modification of codewords in dictionary used for efficient coding of digital media spectral data | |
CN102089808B (en) | Audio encoder, audio decoder and methods for encoding and decoding audio signal | |
US7325023B2 (en) | Method of making a window type decision based on MDCT data in audio encoding | |
RU2505921C2 (en) | Method and apparatus for encoding and decoding audio signals (versions) | |
KR20000010994A (en) | Audio signal coding and decoding methods and audio signal coder and decoder | |
KR20070017524A (en) | Encoding device, decoding device, and method thereof | |
US6889185B1 (en) | Quantization of linear prediction coefficients using perceptual weighting | |
EP1673765B1 (en) | A method for grouping short windows in audio encoding | |
CN102419977A (en) | Method for discriminating transient audio signals | |
EP0899720B1 (en) | Quantization of linear prediction coefficients | |
KR101393301B1 (en) | Method and apparatus for quantization and de-quantization of the Linear Predictive Coding coefficients | |
KR100188912B1 (en) | Bit reassigning method of subband coding | |
JPH10268897A (en) | Signal coding method and device therefor | |
EP0612159B1 (en) | An enhancement method for a coarse quantizer in the ATRAC | |
JP2842276B2 (en) | Wideband signal encoding device | |
Najafzadeh et al. | Perceptual bit allocation for low rate coding of narrowband audio | |
CN101271691B (en) | Time-domain noise reshaping instrument start-up judging method and device | |
KR101512320B1 (en) | Method and apparatus for quantization and de-quantization | |
Kemp et al. | LPC parameter quantization at 600, 800 and 1200 bits per second | |
KR20130047630A (en) | Apparatus and method for coding signal in a communication system | |
KR100300963B1 (en) | Linked scalar quantizer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request | ||
FZDE | Discontinued |