CA2246532A1 - Perceptual audio coding - Google Patents

Publication number
CA2246532A1
CA2246532A1 CA002246532A CA2246532A CA2246532A1 CA 2246532 A1 CA2246532 A1 CA 2246532A1 CA 002246532 A CA002246532 A CA 002246532A CA 2246532 A CA2246532 A CA 2246532A CA 2246532 A1 CA2246532 A1 CA 2246532A1
Authority
CA
Canada
Prior art keywords
band
energy
codebook
codevector
codevectors
Legal status
Abandoned
Application number
CA002246532A
Other languages
French (fr)
Inventor
Peter Kabal
Hossein Najafzadeh-Azghandi
Current Assignee
Nortel Networks Ltd
Original Assignee
Nortel Networks Corp
Application filed by Nortel Networks Corp
Priority to US09/146,752 (US6704705B1)
Priority to CA002246532A (CA2246532A1)
Publication of CA2246532A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032: Quantisation or dequantisation of spectral components
    • G10L2019/0001: Codebooks
    • G10L2019/0013: Codebook search algorithms

Abstract

A method and apparatus for perceptual audio coding. The method and apparatus provide high-quality sound for coding rates down to and below 1 bit/sample for a wide variety of input signals including speech, music and background noise. The invention provides a new distortion measure for coding the input speech and training the codebooks, where the distortion measure is based on a masking spectrum of the input frequency spectrum. The invention also provides a method for direct calculation of masking thresholds from a modified discrete cosine transform of the input signal. The invention also provides a predictive and non-predictive vector quantizer for determining the energy of the coefficients representing the frequency spectrum. As well, the invention provides a split vector quantizer for quantizing the fine structure of coefficients representing the frequency spectrum. Bit allocation for the split vector quantizer is based on the masking threshold. The split vector quantizer also makes use of embedded codebooks.
Furthermore, the invention makes use of a new transient detection method for selection of input windows.

Claims (86)

1. A method of transmitting a discretely represented frequency signal within a frequency band, said signal discretely represented by coefficients at certain frequencies within said band, comprising the steps of:
(a) providing a codebook of codevectors for said band, each codevector having an element for each of said certain frequencies;
(b) obtaining a masking threshold for said frequency signal;
(c) for each one of a plurality of codevectors in said codebook, obtaining a distortion measure by the steps of:
for each of said coefficients of said frequency signal (i) obtaining a representation of a difference between said coefficient and a corresponding element of said one codevector and (ii) reducing said difference by said masking threshold to obtain an indicator measure;
summing those obtained indicator measures which are positive to obtain said distortion measure;
(d) selecting a codevector having a smallest distortion measure;
(e) transmitting an index to said selected codevector.
2. The method of claim 1 wherein said codevectors are normalised with respect to energy and wherein step (c)(i) of obtaining a representation of a difference between a given coefficient of said frequency signal and a corresponding element of said one codevector comprises obtaining a squared difference between said given coefficient and said corresponding element after unnormalising said corresponding element with a measure of energy in said signal and including the step of:
(f) transmitting an indication of energy in said signal.
3. The method of claim 2 wherein said step of obtaining a masking threshold comprises convolving a measure of energy in said signal with a known spreading function.
4. The method of claim 3 wherein said step of obtaining a masking threshold further comprises adjusting said convolution by an offset dependent upon a spectral flatness measure comprising an arithmetic mean of said coefficients.
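As an illustration only, the distortion measure and codevector selection of claims 1 and 2 might be sketched as follows. The function names, the single scalar `threshold` per band, and the scalar `gain` used to unnormalise the codevector are assumptions made for this sketch, not details taken from the claims.

```python
from typing import Sequence


def masked_distortion(coeffs: Sequence[float], codevector: Sequence[float],
                      gain: float, threshold: float) -> float:
    """Sum the positive parts of (squared error - masking threshold)."""
    total = 0.0
    for x, c in zip(coeffs, codevector):
        # Squared difference after unnormalising the codevector element,
        # reduced by the masking threshold (the "indicator measure").
        indicator = (x - gain * c) ** 2 - threshold
        if indicator > 0.0:  # only error above the threshold is counted
            total += indicator
    return total


def search_codebook(coeffs: Sequence[float],
                    codebook: Sequence[Sequence[float]],
                    gain: float, threshold: float) -> int:
    """Return the index of the codevector with the smallest distortion."""
    distortions = [masked_distortion(coeffs, cv, gain, threshold)
                   for cv in codebook]
    return min(range(len(codebook)), key=distortions.__getitem__)
```

Error that stays below the threshold contributes nothing, so two codevectors whose errors are both fully masked tie at zero distortion; this sketch then returns the first of them.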
5. A method of transmitting a discretely represented frequency signal, said signal discretely represented by coefficients at certain frequencies, comprising the steps of:
(a) grouping said coefficients into frequency bands;
(b) for each band - providing a codebook of codevectors, each codevector having an element corresponding with each coefficient within said each band;
- obtaining a representation of energy of coefficients in said each band;
- selecting a set of addresses which address at least a portion of said codebook such that a size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy;
- selecting a codevector from said codebook from amongst those addressable by said address set to represent said coefficients for said band and obtaining an index to said selected codevector;
(d) concatenating said selected codevector addresses; and
(e) transmitting said concatenated codevector addresses and an indication of each said representation of energy.
6. The method of claim 5 including the step of obtaining a representation of a masking threshold for each said band from said representation of energy and wherein said step of selecting a set of addresses comprising selecting such that said size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy reduced by a masking threshold indicated by said representation of a masking threshold.
7. The method of claim 6 wherein said representation of a masking threshold is obtained from a convolution of said representation of energy with a pre-defined spreading function.
8. The method of claim 7 wherein said representation of a masking threshold is reduced by an offset dependent upon a spectral flatness measure chosen as a constant.
9. The method of claim 5 wherein any band having an identical number of coefficients as another band shares a codebook with said other band.
10. The method of claim 5 wherein said step of selecting a codevector to represent said coefficients for said each band comprises the steps of:

- for each one codevector of said plurality of codevectors addressed by said address set for each of said coefficients of said each band (i) obtaining a representation of a difference between a corresponding element of said one codevector and (ii) reducing said difference by said masking threshold indicated by said representation of a masking threshold to obtain an indicator measure;
summing those obtained indicator measures which are positive to obtain a distortion measure;
- selecting a codevector having a smallest distortion measure.
11. The method of claim 10 wherein said codevectors are normalised with respect to energy and wherein the step of obtaining a representation of a difference between a given coefficient of said each band and a corresponding element of said one codevector comprises obtaining a squared difference between said given coefficient and said corresponding element after unnormalising said corresponding element with said representation of energy in said signal.
12. The method of claim 5 wherein each said codebook is sorted so as to provide sets of codevectors addressed by corresponding sets of addresses such that each larger set of addresses addresses a larger set of codevectors which span a frequency spectrum of said each band with increasingly less granularity.
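A minimal sketch of searching an embedded codebook through a restricted address set, in the spirit of claims 5 and 12. It assumes the address set is the first `2**bits` entries of a sorted codebook and that `bits` has already been derived from the band energy; both the parameter `bits` and the pluggable `distortion` callable are hypothetical names introduced here.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def search_embedded(coeffs: Vector, codebook: Sequence[Vector], bits: int,
                    distortion: Callable[[Vector, Vector], float]) -> int:
    """Search only the codevectors reachable with a `bits`-bit address."""
    # A larger address set exposes a larger prefix of the same codebook.
    size = min(2 ** bits, len(codebook))
    return min(range(size), key=lambda k: distortion(coeffs, codebook[k]))
```

Because every smaller address set is a prefix of the larger ones, one stored codebook serves every bit allocation for the band.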
13. A method of transmitting a discretely represented time series comprising the steps of:
- obtaining a frame of time samples;

- obtaining a discrete frequency representation of said time series frame, said frequency representation comprising coefficients at certain frequencies;
- grouping said coefficients into frequency bands;
- for each band (i) providing a codebook of codevectors, each codevector having an element corresponding with each coefficient within said each band;
(ii) obtaining a representation of energy of coefficients in said each band;
(iii) selecting a set of addresses which address at least a portion of said codebook such that a size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy;
(iv) selecting a codevector from said codebook from amongst those addressable by said address set to represent said coefficients for said band and obtaining an address to said selected codevector;
- concatenating said selected codevector addresses; and
- transmitting said concatenated codevector addresses and an indication of each said representation of energy.
14. The method of claim 13 wherein said step of obtaining a representation of energy of coefficients in said each band comprises the steps of:
- determining an indication of energy for said band;
- determining an average energy for said band;
- quantising said average energy by finding an entry in an average energy codebook which, when adjusted with a representation of average energy from a frequency representation for a previous frame, best approximates said average energy;
- normalising said energy indication with respect to said quantised approximation of said average energy;
- quantising said normalised energy indication by manipulating a normalised energy indication from a frequency representation for said previous frame with each of a number of prediction matrices and selecting a prediction matrix resulting in a quantised normalised energy indication which best approximates said normalised energy indication;
- obtaining said representation of energy from said quantised normalised energy.
15. The method of claim 13 including the steps of:
- obtaining an index to said entry in said average energy codebook;
- obtaining an index to said selected prediction matrix;
and wherein said step of transmitting said concatenated codevector addresses and an indication of each said representation of energy comprises:
- transmitting said average energy codebook index; and
- transmitting said selected prediction matrix index.
16. The method of claim 15 including the steps of:
- obtaining an actual residual from a difference between said quantised normalised energy indication and said normalised energy indication;
- comparing said actual residual to a residual codebook to find a quantised residual which is a best approximation of said actual residual;
- adjusting said quantised normalised energy with said quantised residual;
and wherein said step of obtaining said representation of energy comprises obtaining said representation of energy from a combination of said quantised normalised energy and said quantised residual.
17. The method of claim 16 including the steps of:
- obtaining an actual second residual from a difference between (i) said combination of said quantised normalised energy and said quantised residual and (ii) said normalised energy indication;
- comparing said actual second residual to a second residual codebook to find a quantised second residual which is a best approximation of said actual second residual;
- adjusting said combination with said quantised second residual to obtain a further combination;
and wherein said step of obtaining said representation of energy comprises obtaining said representation of energy from said further combination.
18. The method of claim 17 including the step of obtaining an index to said quantised residual in said residual codebook and an index to said quantised second residual in said second residual codebook;
and wherein said step of transmitting said concatenated codevector addresses and an indication of each said representation of energy comprises transmitting said quantised residual index and said quantised second residual index.
19. The method of claim 18 wherein said step of obtaining a representation of energy comprises unnormalising said further combination with said quantised average energy.
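The predictive quantisation with two residual stages described in claims 14 to 19 can be sketched roughly as below. This is a simplified illustration: it assumes squared error as the "best approximation" criterion, plain lists for vectors and matrices, and hypothetical names (`predictors`, `stage1`, `stage2`) for the prediction matrices and residual codebooks.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_err(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mat_vec(m: Sequence[Vector], v: Vector) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def nearest(target: Vector, candidates: Sequence[Vector]) -> int:
    """Index of the candidate closest to target in squared error."""
    return min(range(len(candidates)),
               key=lambda k: sq_err(target, candidates[k]))


def quantise_energy(g: Vector, prev_q: Vector,
                    predictors: Sequence[Sequence[Vector]],
                    stage1: Sequence[Vector],
                    stage2: Sequence[Vector]
                    ) -> Tuple[Tuple[int, int, int], List[float]]:
    """Quantise normalised energy g from the previous frame's quantised value."""
    # Try each prediction matrix on the previous quantised vector.
    predictions = [mat_vec(p, prev_q) for p in predictors]
    p_idx = nearest(g, predictions)
    approx = predictions[p_idx]
    indices = [p_idx]
    # Two residual stages, each refining the current approximation.
    for book in (stage1, stage2):
        residual = [x - a for x, a in zip(g, approx)]
        r_idx = nearest(residual, book)
        approx = [a + r for a, r in zip(approx, book[r_idx])]
        indices.append(r_idx)
    return (indices[0], indices[1], indices[2]), approx
```

Only the three indices would need to be sent; a receiver holding the same matrices, codebooks and previous quantised vector can rebuild `approx`.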
20. The method of claim 13 including the step of obtaining a representation of a masking threshold for each said band from said representation of energy and wherein said step of selecting a set of addresses comprising selecting such that said size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy reduced by a masking threshold indicated by said representation of a masking threshold.
21. The method of claim 20 wherein said representation of a masking threshold is obtained from a convolution of said representation of energy with a pre-defined spreading function.
22. The method of claim 21 wherein said representation of a masking threshold is reduced by an offset dependent upon a spectral flatness measure chosen as a constant.
23. The method of claim 13 wherein any band having an identical number of coefficients as another band shares a codebook with said other band.
24. The method of claim 13 wherein said step of selecting a codevector to represent said coefficients for said each band comprises the steps of:
- for each one codevector of said plurality of codevectors addressed by said address set for each of said coefficients of said each band (i) obtaining a representation of a difference between a corresponding element of said one codevector and (ii) reducing said difference by said masking threshold indicated by said representation of a masking threshold to obtain an indicator measure;
summing those obtained indicator measures which are positive to obtain a distortion measure;
- selecting a codevector having a smallest distortion measure.
25. The method of claim 24 wherein said codevectors are normalised with respect to energy and wherein the step of obtaining a representation of a difference between a given coefficient of said each band and a corresponding element of said one codevector comprises obtaining a squared difference between said given coefficient and said corresponding element after unnormalising said corresponding element with said representation of energy in said signal.
26. A method of receiving a discretely represented frequency signal, said signal discretely represented by coefficients at certain frequencies, comprising the steps of:
- providing pre-defined frequency bands;
- for each band providing a codebook of codevectors, each codevector having an element corresponding with each of said certain frequencies which are within said each band;
- receiving concatenated codevector addresses for said bands and a per band indication of a representation of energy of coefficients in each band;
- determining a length of address for each band based on said per band indication of a representation of energy;

- parsing said concatenated codevector addresses based on said address length determining step;
- addressing said codebook for each band with a parsed codebook address to obtain frequency coefficients for each said band.
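The parsing step of claim 26 might look like the following sketch. It assumes the per-band address lengths `bits` have already been derived identically at transmitter and receiver from the transmitted energy information; the bit-string representation and both function names are illustrative choices, not part of the claims.

```python
from typing import List


def pack_addresses(indices: List[int], bits: List[int]) -> str:
    """Concatenate per-band codevector addresses into one bit string."""
    return "".join(format(i, "0{}b".format(b))
                   for i, b in zip(indices, bits) if b > 0)


def parse_addresses(stream: str, bits: List[int]) -> List[int]:
    """Split the bit string back into one address per band.

    A band with zero bits carries no address; this sketch decodes it as 0.
    """
    indices, pos = [], 0
    for b in bits:
        indices.append(int(stream[pos:pos + b], 2) if b > 0 else 0)
        pos += b
    return indices
```

No length fields are transmitted in this scheme: the receiver knows where each address ends only because it can recompute `bits` itself.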
27. A transmitter comprising:
means for obtaining a frame of time samples;
means for obtaining a discrete frequency representation of said time series frame, said frequency representation comprising coefficients at certain frequencies;
means for grouping said coefficients into frequency bands;
means for, for each band (i) providing a codebook of codevectors, each codevector having an element corresponding with each coefficient within said each band;
(ii) obtaining a representation of energy of coefficients in said each band;
(iii) selecting a set of addresses which address at least a portion of said codebook such that a size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy;
(iv) selecting a codevector from said codebook from amongst those addressable by said address set to represent said coefficients for said band and obtaining an address to said selected codevector;
means for concatenating said selected codevector addresses; and
means for transmitting said concatenated codevector addresses and an indication of each said representation of energy.
28. A receiver comprising:
means for providing pre-defined frequency bands;
a memory storing, for each band, a codebook of codevectors, each codevector having an element corresponding with each of said certain frequencies which are within said each band;
means for receiving concatenated codevector addresses for said bands and a per band indication of a representation of energy of coefficients in each band;
means for determining a length of address for each band based on said per band indication of a representation of energy;
means for parsing said concatenated codevector addresses based on said address length determining step;
means for addressing said codebook for each band with a parsed codebook address to obtain frequency coefficients for each said band.
29. A method of obtaining a codebook of codevectors which span a frequency band discretely represented at pre-defined frequencies, comprising the steps of:
- receiving training vectors for said frequency band;
- receiving an initial set of estimated codevectors;
- associating each training vector with a one of said estimated codevectors with respect to which it generates a smallest distortion measure to obtain associated groups of vectors;
- partitioning said associated groups of vectors into Voronoi regions;
- determining a centroid for each Voronoi region;
- selecting each centroid vector as a new estimated codevector;
- repeating from said associating step until a difference between new estimated codevectors and estimated codevectors from a previous iteration is less than a pre-defined threshold; and
- populating said codebook with said estimated codevectors resulting after a last iteration.
30. The method of claim 29 wherein each distortion measure is obtained by the steps of:
- for each element of said training vector (i) obtaining a representation of a difference between a corresponding element of said one estimated codevector and (ii) reducing said difference by a masking threshold of said training vector to obtain an indicator measure;
- summing those obtained indicator measures which are positive to obtain said distortion measure.
31. The method of claim 30 wherein said masking threshold is obtained by convolving a measure of energy in said training vector with a known spreading function.
32. The method of claim 31 wherein said masking threshold is obtained by adjusting said convolution by an offset dependent upon a spectral flatness measure comprising an arithmetic mean of said coefficients.
33. The method of claim 32 wherein said estimated codevectors are normalised with respect to energy and wherein the step of obtaining a representation of a difference between a given element of said training vector and a corresponding element of said one estimated codevector comprises obtaining a squared difference between said given element and said corresponding element after unnormalising said corresponding element with a measure of energy in said training vector.
34. The method of claim 33 wherein said step of determining a centroid for a Voronoi region comprises finding a candidate vector within said region which generates a minimum value for a sum of distortion measures between said candidate vector and each training vector in said region.
35. The method of claim 34 wherein each distortion measure in said sum of distortion measures is obtained by the steps of:
- for each training vector, for each element of said each training vector (i) obtaining a representation of a difference between a corresponding element of said candidate vector and (ii) reducing said difference by a masking threshold for said training vector to obtain an indicator measure;
- summing those obtained indicator measures which are positive to obtain said distortion measure.
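The iterative training loop of claims 29 to 35 resembles a generalised Lloyd iteration and can be sketched as below. Two simplifications are assumed: the distortion is passed in as a callable (so a masked measure or plain squared error can be plugged in), and the centroid of each region is approximated by the training vector in that region with the smallest summed distortion, which is only one possible reading of "candidate vector within said region".

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def train_codebook(training: Sequence[Vector], initial: Sequence[Vector],
                   distortion: Callable[[Vector, Vector], float],
                   tol: float = 1e-6, max_iter: int = 100) -> List[List[float]]:
    codebook = [list(c) for c in initial]
    for _ in range(max_iter):
        # Association step: each training vector joins its nearest codevector.
        cells: List[List[Vector]] = [[] for _ in codebook]
        for t in training:
            best = min(range(len(codebook)),
                       key=lambda k: distortion(t, codebook[k]))
            cells[best].append(t)
        # Centroid step: pick the member minimising the summed distortion.
        new_codebook = []
        for old, cell in zip(codebook, cells):
            if not cell:  # keep the old codevector for an empty region
                new_codebook.append(old)
                continue
            centre = min(cell,
                         key=lambda cand: sum(distortion(t, cand) for t in cell))
            new_codebook.append(list(centre))
        # Stop once the codevectors move less than the threshold.
        change = max(max(abs(a - b) for a, b in zip(n, o))
                     for n, o in zip(new_codebook, codebook))
        codebook = new_codebook
        if change < tol:
            break
    return codebook
```

For a masked distortion, the callable would also need access to each training vector's masking threshold, for example by closing over a lookup table.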
36. The method of claim 29 wherein said estimated codevectors with which said codebook is populated is a first set of codevectors and wherein said codebook is enlarged by the steps of:
- fixing said first set of estimated codevectors;
- receiving an initial second set of estimated codevectors;
- associating each training vector with one estimated codevector from said first set or said second set with respect to which it generates a smallest distortion measure to obtain associated groups of vectors;
- partitioning said associated groups of vectors into Voronoi regions;
- determining a centroid for each Voronoi region containing an estimated codevector from said second set;
- selecting each centroid vector as a new estimated second set codevector;
- repeating from said associating step until a difference between new estimated second set codevectors and estimated second set codevectors from a previous iteration is less than a pre-defined threshold; and
- populating said codebook with said estimated second set codevectors resulting after a last iteration.
37. The method of claim 36 including the step of sorting said second set estimated codevectors to an end of said codebook whereby to obtain an embedded codebook.
38. A method of generating an embedded codebook for a frequency band discretely represented at pre-defined frequencies, comprising the steps of:
(a) obtaining an optimized larger first codebook of codevectors which span said frequency band;
(b) obtaining an optimized smaller second codebook of codevectors which span said frequency band;
(c) finding codevectors in said first codebook which best approximate each entry in said second codebook;
(d) sorting said first codebook to place said codevectors found in step (c) at a front of said first codebook.
39. The method of claim 38 wherein each step of obtaining an optimized codebook comprises the steps of:
- receiving training vectors for said frequency band;
- receiving an initial set of estimated codevectors;
- associating each training vector with a one of said estimated codevectors with respect to which it generates a smallest distortion measure to obtain associated groups of vectors;
- partitioning said associated groups of vectors into Voronoi regions;
- determining a centroid for each Voronoi region;
- selecting each centroid vector as a new estimated codevector;
- repeating from said associating step until a difference between new estimated codevectors and estimated codevectors from a previous iteration is less than a pre-defined threshold; and
- populating said codebook with said estimated codevectors resulting after a last iteration.
40. The method of claim 39 wherein step (c) comprises utilising a least squares method to find codevectors in said first codebook which best approximate each entry in said second codebook.
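A rough sketch of the reordering in claims 38 to 40, assuming squared error as the least squares criterion. Skipping entries that were already selected, so that no codevector is moved to the front twice, is an assumption added for this sketch; the function name is hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def embed_codebook(large: Sequence[Vector],
                   small: Sequence[Vector]) -> List[Vector]:
    """Reorder `large` so its best matches to `small` come first."""
    chosen: List[int] = []
    for target in small:
        candidates = [k for k in range(len(large)) if k not in chosen]
        best = min(candidates,
                   key=lambda k: sum((a - b) ** 2
                                     for a, b in zip(large[k], target)))
        chosen.append(best)
    rest = [k for k in range(len(large)) if k not in chosen]
    return [large[k] for k in chosen + rest]
```

After sorting, the first `len(small)` entries of the result act as the smaller codebook, while the full list remains available when more address bits are allocated.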
41. A method for allocating encoding bits to bands within the frequency spectrum in a perceptual audio coding transmitter, said transmitter having a split VQ unit, said method comprising the steps of:
(A) receiving at least one masking threshold and at least one spectral energy for each band;
(B) allocating bits to each band based on said masking threshold and spectral energy for each band; and
(C) transmitting the bit allocation for each band to the split VQ unit.
42. The method of claim 41 wherein the step of allocating bits to each band based on said masking threshold and spectral energy for each band further comprises the steps of:
(B.1) calculating a gap value for each band wherein said gap is calculated by subtracting from the spectral energy for each band the masking threshold and subtracting the ratio of the (bits already allocated to that band) to (the coefficients in that band, multiplied by some constant);
(B.2) allocating a bit to the band with the highest gap value; and
(B.3) repeating steps B.1 and B.2 until all bits available for transmission have been allocated.
43. The method of claim 42 further comprising the step of:
(A.1) calculating a first approximation of the number of bits to be allocated to each band.
44. The method of claim 43 wherein the step of calculating a first approximation of the number of bits to be allocated to each band comprises the steps of:
(A.1.1) calculating a second gap value for each band wherein said gap is calculated by subtracting from the spectral energy for each band the masking threshold for that band;
(A.1.2) approximating the number of bits for each band as equal to a second ratio of the second gap value times the number of coefficients in the band times the total number of bits available for transmission to the sum over all bands of the product of the second gap value times the number of coefficients in the band;
(A.1.3) discarding the fractional results of the second ratio to yield an integer second ratio; and
(A.1.4) allocating to each band as a first approximation said integer second ratio.
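The two-phase allocation of claims 42 to 44 could be sketched as follows. Energies and thresholds are assumed to be in a log (dB-like) domain, negative first-pass gaps are clamped to zero, and the constant `c` is a placeholder value; none of these choices is specified in the claims.

```python
from typing import List, Sequence


def allocate_bits(energy_db: Sequence[float], mask_db: Sequence[float],
                  n_coeffs: Sequence[int], total_bits: int,
                  c: float = 0.25) -> List[int]:
    n_bands = len(energy_db)
    # First approximation: share bits in proportion to gap * band width.
    gap0 = [max(e - m, 0.0) for e, m in zip(energy_db, mask_db)]
    weight = [g * n for g, n in zip(gap0, n_coeffs)]
    total_weight = sum(weight)
    if total_weight > 0.0:
        bits = [int(w * total_bits / total_weight) for w in weight]
    else:
        bits = [0] * n_bands
    # Greedy refinement: hand out the remaining bits one at a time to the
    # band whose gap, reduced by the bits it already holds, is largest.
    while sum(bits) < total_bits:
        gaps = [energy_db[i] - mask_db[i] - bits[i] / (n_coeffs[i] * c)
                for i in range(n_bands)]
        best = max(range(n_bands), key=gaps.__getitem__)
        bits[best] += 1
    return bits
```

Truncating the first approximation never over-allocates, so the greedy loop only has to distribute the bits lost to rounding.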
45. A method of selecting a window for calculating frequency domain coefficients in a perceptual audio coding transmitter, said method comprising the steps of:
(A) receiving a series of time samples of the input signal;
(B) determining when a strong positive transient occurs in said series; and,
(C) switching to a different window when a strong positive transient is detected.
46. The method of claim 45 wherein the step of determining when a strong positive transient occurs in said series comprises the steps of:
(B.1) calculating for a set of n successive time samples in said series the sum of the squares of the amplitudes for the three successive time samples to yield a first sum;
(B.2) calculating for the next n successive time samples in said series the sum of the squares of the amplitudes of the next three successive time samples to yield a second sum;
(B.3) calculating a ratio of the first sum less the second sum to the first sum;
(B.4) determining a strong positive transient has occurred when said ratio exceeds a threshold value.
47. The method of claim 46 wherein n has the value 3.
48. The method of claim 45 wherein said different window is a first transitional window.
49. The method of claim 47 further comprising the steps of:
(D) receiving a next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said next series; and,
(F) switching to a series of short windows when a strong positive transient is detected in said next series.
50. The method of claim 49 wherein the series of short windows is a set of three short windows.
51. The method of claim 47 further comprising the steps of:
(D) receiving a next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said next series; and,
(F) switching to a second transitional window when a strong positive transient is not detected in said next series.
52. The method of claim 48 further comprising the steps of:
(D) receiving a second next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said second next series; and,
(F) switching to a series of short windows when a strong positive transient is detected in said second next series.
53. The method of claim 52 wherein the series of short windows is a set of three short windows.
54. The method of claim 48 further comprising the steps of:
(D) receiving a second next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said second next series; and,
(F) switching to a second transitional window when a strong positive transient is not detected in said second next series.
55. The method of claim 46 wherein said threshold value is 5.
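A minimal sketch of an energy-ratio transient test along the lines of claims 46, 47 and 55, with n = 3 and a threshold of 5. One point is an interpretation rather than a quotation: the sketch measures the relative rise in energy from the first group of samples to the next, (second - first) / first, on the assumption that a "positive" transient means increasing energy. The small `eps` guard against division by zero is also an addition.

```python
from typing import Sequence


def block_energy(samples: Sequence[float]) -> float:
    """Sum of squared amplitudes."""
    return sum(s * s for s in samples)


def is_strong_positive_transient(samples: Sequence[float], start: int = 0,
                                 n: int = 3, threshold: float = 5.0,
                                 eps: float = 1e-12) -> bool:
    first = block_energy(samples[start:start + n])
    second = block_energy(samples[start + n:start + 2 * n])
    # Relative energy increase between the two consecutive groups.
    ratio = (second - first) / max(first, eps)
    return ratio > threshold
```

With this sign convention a decaying signal yields a negative ratio and never triggers a window switch.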
56. In a perceptual audio coder, a method for calculating the masking threshold for a band, said band being one of a plurality of bands in a frame, said method comprising the steps of:
(A) receiving an input frame;
(B) calculating MDCT coefficients for each band of said frame;
(C) calculating a spectral energy for each band of said frame from said MDCT coefficients to yield a power spectral density function;
(D) convolving a normalized spreading function with said power spectral density function to yield a convolution;
(E) subtracting in the log domain an offset measure from said convolution to yield a masking threshold for each band.
57. The method of claim 56, wherein said offset measure is calculated from the band number and a spectral flatness measure.
58. The method of claim 56 wherein said spectral flatness measure is 0.5.
59. The method of claim 57 wherein said spectral flatness measure is the ratio of the geometric mean of the MDCT coefficients to the arithmetic mean of the MDCT coefficients.
60. The method of claim 59 wherein the offset is calculated according to the equation:
$$F_i = 5.5\,(1 - a) + (14.5 + i)\,a$$
where $F_i$ is the offset for the ith band; and $a$ is the spectral flatness measure for the frame.
61. The method of claim 56, wherein said spreading function is normalized by:
(I) calculating the overall gain due to the unnormalized spreading function;
(II) dividing unnormalized spreading function values by the overall gain due to the spreading function.
62. The method of claim 60, wherein the unnormalized spreading function is:
[equation not recoverable from the source text]
63. In a perceptual audio coder, a method for calculating the masking threshold for a band, said method comprising the steps of:
(A) receiving an input frame;
(B) calculating MDCT coefficients for each band of the frame;
(C) calculating a spectral energy for each band of said frame from said MDCT coefficients to yield a power spectral density function;
(C.1) calculating a quantized spectral energy for each band from said spectral energy for each band;
(D) convolving a normalized spreading function with said quantized power spectral density function to yield a convolution;
(E) subtracting in the log domain an offset measure from said convolution to yield a masking threshold for each band.
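The threshold calculation of claims 56 to 63 might be sketched as below. Several details are assumptions: the spreading function is supplied by the caller as an odd-length list centred on the current band, its "overall gain" is taken to be the sum of its values, band indices count from 1 in the offset formula, and the log domain is 10·log10.

```python
import math
from typing import List, Sequence


def normalise_spreading(spread: Sequence[float]) -> List[float]:
    """Divide the spreading values by their overall gain (assumed: the sum)."""
    gain = sum(spread)
    return [s / gain for s in spread]


def masking_thresholds_db(band_energy: Sequence[float],
                          spread: Sequence[float],
                          sfm: float) -> List[float]:
    norm = normalise_spreading(spread)
    half = len(norm) // 2
    thresholds = []
    for i in range(len(band_energy)):
        # Convolve band energies with the normalised spreading function.
        acc = 0.0
        for k, s in enumerate(norm):
            j = i - (k - half)
            if 0 <= j < len(band_energy):
                acc += band_energy[j] * s
        # Offset F_i = 5.5(1 - a) + (14.5 + i)a, with i counted from 1 here.
        offset = 5.5 * (1.0 - sfm) + (14.5 + (i + 1)) * sfm
        thresholds.append(10.0 * math.log10(max(acc, 1e-12)) - offset)
    return thresholds
```

Passing quantised band energies instead of the exact ones, as in claim 63, would let a receiver reproduce the same thresholds from the transmitted energy indices.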
64. In a perceptual audio coding transmitter, a method for quantizing the spectral energy of MDCT coefficients in a band of a frame comprising the steps of:

(A) receiving MDCT coefficients for each band in the frame;
(B) calculating the energy in each band from the MDCT coefficients;
(C) calculating a quantized value for the average energy of the frame;
(D) calculating a normalized energy vector for the frame by subtracting in the log domain the quantized value of the average energy of the frame from the energy in each band;
(E) determining a best prediction matrix to predict the normalized energy vector;
(F) calculating a first residual vector from the best predicted normalized energy vector and the normalized energy vector for each band;
(G) finding a first codevector which most closely matches the first residual vector;
(H) calculating and storing the normalized quantized energy vector for the frame; and,
(I) transmitting the indices of the quantized energy, prediction matrix and first codevector to the receiver.
65. The method of claim 64 wherein the step of calculating the energy in each band from the MDCT coefficients comprises the step of:
(B.1) taking the sum of the squares of the absolute values of the MDCT coefficients in the band.
66. The method of claim 64 wherein the step of calculating a quantized value for the average energy of the frame comprises the steps of:
(C.1) converting the energy in each band to the logarithmic domain;
(C.2) calculating the average log energy of the power spectrum by taking the sum of energy in each band and dividing by the number of bands;
(C.3) calculating a product of a leakage factor and the quantized value of the average log energy for the previous frame;
(C.4) subtracting this product from the average log energy of the power spectrum to yield a difference;
(C.5) finding the best match in a codebook to said difference; and,
(C.6) adding the best match to said product to yield the quantized value for the average energy of the frame.
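Steps (C.1) to (C.6) amount to a leaky first-order predictive scalar quantizer. The sketch below is illustrative only: the function name, the decibel scale, the default leakage factor of 0.9 and the example codebook are assumptions, since the claims do not give numeric values.

```python
import math

def quantize_average_energy(band_energy, prev_quantized, codebook, leakage=0.9):
    """Return (codebook index, quantized average log energy) for one frame."""
    # (C.1) convert each band energy to the log domain (dB assumed)
    log_e = [10.0 * math.log10(max(e, 1e-12)) for e in band_energy]
    # (C.2) average log energy over the bands
    avg = sum(log_e) / len(log_e)
    # (C.3) leakage factor times the previous frame's quantized value
    predicted = leakage * prev_quantized
    # (C.4) difference to be quantized
    diff = avg - predicted
    # (C.5) best match in the scalar codebook
    index = min(range(len(codebook)), key=lambda n: abs(codebook[n] - diff))
    # (C.6) add the best match back to the product
    return index, codebook[index] + predicted
```

Because the decoder holds the same previous quantized value, it can rebuild the result from the transmitted index alone.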
67. The method of claim 64 wherein the step of determining a best prediction matrix to predict the normalized energy vector for all bands comprises the steps of:
(E.1) finding the prediction matrix which when multiplied by the normalized quantized energy vector of the previous frame gives the closest match to the normalized energy vector of the current frame;
(E.2) calculating a best predicted normalized energy vector by multiplying the prediction matrix which gives the closest match by the normalized quantized energy vector of the previous frame.
68. The method of claim 67 wherein said prediction matrices are tridiagonal.
69. The method of claim 64 wherein the step of calculating a residual vector from the best predicted normalized energy vector and the normalized energy for each band comprises the step of subtracting the best predicted normalized energy from the normalized energy for each band.
70. The method of claim 64 wherein the step of calculating and storing the normalized quantized energy vector for the frame comprises adding the best predicted normalized energy vector to the first codevector which most closely matches the first residual vector.
71. The method of claim 64 further comprising the steps of:
(I) calculating a second residual vector by subtracting the first codevector which most closely matches the first residual vector from the first residual vector;
(J) finding a second codevector which most closely matches the second residual vector; and,
(K) transmitting the index of the second codevector to the receiver.
72. The method of claim 64 wherein the step of calculating and storing the normalized quantized energy vector for the frame comprises adding the best predicted normalized energy vector to the first codevector which most closely matches the first residual vector and to the second codevector.
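The predictive, two-stage quantization of the normalized energy vector described in claims 64 to 72 can be sketched as below. All names are hypothetical, and squared error is assumed as the matching criterion because the claims only say "closest match".

```python
def mat_vec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def sq_err(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def nearest(candidates, target):
    # Index of the candidate with the smallest squared error to the target.
    return min(range(len(candidates)), key=lambda n: sq_err(candidates[n], target))

def quantize_energy_vector(normalized, prev_quantized, matrices, stage1, stage2):
    """Return ((matrix index, first index, second index), quantized vector)."""
    # (E) try every prediction matrix on the previous quantized vector
    predictions = [mat_vec(m, prev_quantized) for m in matrices]
    m_idx = nearest(predictions, normalized)
    predicted = predictions[m_idx]
    # (F) first residual and (G) its best first-stage codevector
    r1 = [x - p for x, p in zip(normalized, predicted)]
    i1 = nearest(stage1, r1)
    # Second residual and its best second-stage codevector (claim 71)
    r2 = [r - c for r, c in zip(r1, stage1[i1])]
    i2 = nearest(stage2, r2)
    # (H) stored quantized vector: prediction plus both codevectors
    quantized = [p + c1 + c2 for p, c1, c2 in zip(predicted, stage1[i1], stage2[i2])]
    return (m_idx, i1, i2), quantized
```

The returned quantized vector is what would be fed back as `prev_quantized` for the next frame, so encoder and decoder predictions stay in step.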
73. In a perceptual audio coding transmitter, a method for vector quantizing the MDCT coefficients, said coefficients belonging to bands, said method comprising the steps of:
(A) receiving MDCT coefficients for each band;
(B) for each band:
(B.1) selecting a codevector that is the best match to the received MDCT coefficients for that band from a codebook;
(C) transmitting the indices for the selected codevectors to the receiver.
74. The method of claim 73 wherein the step of selecting a codevector from a codebook that is the best match to the received MDCT coefficients for that band further comprises the step of selecting the codevector that minimizes the energy between the codevector coefficients and the dead zone.
75. The method of claim 74 wherein the codevector that minimizes the energy between the codevector coefficients and the deadband satisfies the equation:

$$D_i = \sum_{k} \max\left[0,\ E_k(i) - t_{iu}\right]$$

where the sum is taken over all coefficients $k$ in the ith critical band and the max function takes the larger value of the two arguments.
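A small sketch of the dead-zone distortion may help: error energy below the threshold costs nothing, so only the excess is summed. The function names are hypothetical. Reading `E_k(i)` as the squared error between a coefficient and the gain-scaled codevector element, and `t_iu` as a single per-coefficient threshold for the band, are assumptions.

```python
import math

def deadzone_distortion(coeffs, codevector, gain, threshold):
    # Sum over the band of max(0, squared error - threshold).
    scale = math.sqrt(gain)  # assumed: codevectors are scaled by sqrt(band energy)
    return sum(max(0.0, (x - scale * c) ** 2 - threshold)
               for x, c in zip(coeffs, codevector))

def select_codevector(coeffs, codebook, gain, threshold):
    # Index of the codevector with the smallest dead-zone distortion.
    return min(range(len(codebook)),
               key=lambda n: deadzone_distortion(coeffs, codebook[n], gain, threshold))
```

A codevector whose per-coefficient error stays inside the dead zone scores zero, exactly like a perfect match, which is the point of the measure.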
76. The method of claim 73 further comprising the steps of:

(A.1) receiving an indication of the number of bits, b, used to represent the codevector index for each band; and
(A.2) selecting a codevector for the band from a codebook having $2^b$ codevectors.
77. The method of claim 73 further comprising the steps of:
(A.1) receiving an indication of the number of bits, b, used to represent the codevector index for each band; and
(A.2) selecting a codevector for the band from the first $2^b$ codevectors in the codebook.
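Restricting the search to the first $2^b$ entries means one stored codebook can serve every bit allocation. The sketch below assumes a plain squared-error match for brevity; the function name is hypothetical.

```python
def search_embedded(coeffs, codebook, bits):
    """Search only the first 2**bits codevectors of an embedded codebook."""
    size = 1 << bits
    if size > len(codebook):
        raise ValueError("codebook holds fewer than 2**bits codevectors")
    # Squared error is an assumed stand-in for the coder's distortion measure.
    return min(range(size),
               key=lambda n: sum((x - c) ** 2 for x, c in zip(coeffs, codebook[n])))
```

The returned index always fits in `bits` bits, so the receiver can decode it against the same codebook without knowing which sub-codebook was searched.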
78. The method of claim 73 wherein at least one band comprises a plurality of critical bands.
79. In a perceptual audio coding system, a method of training the codebook in which the distortion measure used to select the codebook vectors for the codebook is calculated using the masking threshold.
80. The method of claim 79 further comprising the steps of:
(A) producing a set of training vectors;
(B) calculating from each training vector a set of MDCT coefficients;
(C) calculating for each training vector a masking threshold for each band;
(D) making an estimate of codevectors for the codebook;

(E) calculating a distortion measure by calculating the energy of the difference between the MDCT coefficients for the training vector and the deadband surrounding the coefficients for the estimated codevectors;
(F) associating the coefficients within each band of each training vector with the estimated codevector that minimizes said distortion measure;
(G) calculating the centroid of each associated group;
(H) replacing the estimated codevectors by the centroids of each group;
(I) repeating steps (E) - (H) until the difference between successive estimated codevectors is small;
(J) populating the codebook with the estimated codevectors.
81. The method of claim 80 wherein the distortion measure is calculated according to the equation:

$$D_i = \sum_{k} \max\left[0,\ E_k(i) - t_{iu}\right]$$

where the sum is taken over all coefficients $k$ in the ith critical band and the max function takes the larger value of the two arguments.
82. The method of claim 80 wherein the centroid for each group is calculated according to the equation:
$X^{\text{best}}_k(i)$ is that providing

$$\min \sum \sum \max\left[0,\ \left(X_k(i) - (G_i)^{0.5}\, X^{\text{best}}_k(i)\right)^2 - t_{iu}\right]$$

where $\sum$ is a sum over all training vectors in the jth Voronoi region.
83. The method of claim 80 wherein the difference between successive estimated codevectors is small when a least squares difference between successive estimated codevectors is less than a threshold value, namely $10^{-4}$.
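The training loop of claims 80 to 83 has the shape of a generalized Lloyd iteration. The sketch below keeps the dead-zone distortion for the assignment step but swaps in the arithmetic mean as the centroid, which is a simplification and not the minimization stated in claim 82. Unit gain, a single threshold for all coefficients, and all names are assumptions.

```python
def distortion(x, c, threshold):
    # Dead-zone distortion: only error energy above the threshold counts.
    return sum(max(0.0, (a - b) ** 2 - threshold) for a, b in zip(x, c))

def train_codebook(training, initial, threshold, tol=1e-4, max_iter=100):
    codebook = [list(c) for c in initial]
    for _ in range(max_iter):
        # (E)-(F) associate each training vector with its best codevector
        groups = [[] for _ in codebook]
        for x in training:
            j = min(range(len(codebook)),
                    key=lambda n: distortion(x, codebook[n], threshold))
            groups[j].append(x)
        # (G)-(H) replace codevectors by group centroids
        # (arithmetic mean used here as a simplification); empty groups keep theirs
        updated = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for c, g in zip(codebook, groups)]
        # (I) stop when the least squares change falls below the threshold value
        change = sum((a - b) ** 2
                     for old, new in zip(codebook, updated)
                     for a, b in zip(old, new))
        codebook = updated
        if change < tol:
            break
    return codebook
```

The default `tol` mirrors the $10^{-4}$ stopping value named in claim 83; `max_iter` is only a safety bound added for the sketch.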
84. A method for creating an embedded codebook comprising the steps of:
(A) training a codebook having $2^d$ codevectors;
(B) training a codebook having $2^e$ codevectors, where e is less than d;
(C) finding the codevectors in the $2^d$ element codebook closest to the codevectors in the $2^e$ element codebook; and,
(D) sorting the $2^d$ codevectors so that the closest $2^e$ are placed in the first $2^e$ portion of the codebook.
85. The method of claim 84 wherein the step of finding the codevectors in the $2^d$ element codebook closest to the codevectors in the $2^e$ element codebook comprises the steps of:
(C.1) calculating the mean square difference between each codevector in the $2^d$ element codebook and each of the codevectors in the $2^e$ element codebook; and,
(C.2) selecting the codevector in the $2^d$ element codebook which has the least mean square difference to each codevector in the $2^e$ element codebook.
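The sorting approach of claims 84 and 85 can be illustrated as follows. The names are hypothetical. Skipping codevectors that were already selected, so that the leading portion contains distinct entries, is an assumption: the claims do not say how collisions are handled.

```python
def mse(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def embed_by_sorting(large, small):
    """Reorder `large` so its leading len(small) entries are the ones
    closest (in mean square difference) to the entries of `small`."""
    chosen = []
    for target in small:
        # Assumption: each large-codebook entry may be selected only once.
        remaining = [n for n in range(len(large)) if n not in chosen]
        chosen.append(min(remaining, key=lambda n: mse(large[n], target)))
    rest = [n for n in range(len(large)) if n not in chosen]
    return [large[n] for n in chosen + rest]
```

After reordering, a search limited to the leading entries approximates the separately trained smaller codebook while sharing storage with the larger one.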
86. A method for creating an embedded codebook comprising the steps of:
(A) training a codebook having $2^f$ codevectors;
(B) estimating ($2^g - 2^f$) additional codevectors, where g is greater than f;
(C) forming a set of $2^g$ codevectors from step (A) and from the ($2^g - 2^f$) additional estimated codevectors from step (B);
(D) determining the Voronoi regions for said set;
(E) determining the centroid of the Voronoi regions for the ($2^g - 2^f$) additional estimated codevectors;
(F) replacing the additional estimated codevectors by the centroids of their Voronoi regions;
(G) repeating steps (D) - (F) until the difference between successive additional estimated codevectors is small; and,
(H) populating a new $2^g$ element codebook with the $2^f$ codevectors from step (A) in the bottom $2^f$ positions of said new $2^g$ element codebook and populating positions $2^f + 1$ to $2^g$ of the codebook with the additional estimated codevectors.
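The growing approach of claim 86 keeps the smaller codebook fixed and refines only the added entries. The sketch below assumes squared-error Voronoi regions, arithmetic-mean centroids and a stopping tolerance of 1e-4; the claim itself does not fix any of these, and all names are hypothetical.

```python
def sq_err(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def grow_codebook(training, base, extra, tol=1e-4, max_iter=100):
    """base: trained codevectors, kept fixed. extra: initial estimates to refine."""
    extra = [list(c) for c in extra]
    for _ in range(max_iter):
        full = base + extra
        # (D) Voronoi regions of the combined set
        groups = [[] for _ in full]
        for x in training:
            j = min(range(len(full)), key=lambda n: sq_err(x, full[n]))
            groups[j].append(x)
        # (E)-(F) move only the additional codevectors to their centroids
        updated = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for c, g in zip(extra, groups[len(base):])]
        # (G) stop once the additional codevectors no longer move
        change = sum(sq_err(old, new) for old, new in zip(extra, updated))
        extra = updated
        if change < tol:
            break
    # (H) fixed codevectors first, refined additions after them
    return base + extra
```

Because the first entries are never modified, indices chosen at the lower bit allocation remain valid in the larger codebook.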
CA002246532A 1998-09-04 1998-09-04 Perceptual audio coding Abandoned CA2246532A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US09/146,752 US6704705B1 (en) 1998-09-04 1998-09-04 Perceptual audio coding
CA002246532A CA2246532A1 (en) 1998-09-04 1998-09-04 Perceptual audio coding

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/146,752 US6704705B1 (en) 1998-09-04 1998-09-04 Perceptual audio coding
CA002246532A CA2246532A1 (en) 1998-09-04 1998-09-04 Perceptual audio coding

Publications (1)

Publication Number Publication Date
CA2246532A1 true CA2246532A1 (en) 2000-03-04

Family

ID=32471057

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002246532A Abandoned CA2246532A1 (en) 1998-09-04 1998-09-04 Perceptual audio coding

Country Status (2)

Country Link
US (1) US6704705B1 (en)
CA (1) CA2246532A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1676264A2 (en) * 2003-09-29 2006-07-05 Sony Electronics Inc. A method of making a window type decision based on mdct data in audio encoding
CN110047499A (en) * 2013-01-29 2019-07-23 弗劳恩霍夫应用研究促进协会 Low complex degree tone adaptive audio signal quantization

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3507743B2 (en) * 1999-12-22 2004-03-15 インターナショナル・ビジネス・マシーンズ・コーポレーション Digital watermarking method and system for compressed audio data
TW521266B (en) * 2000-07-13 2003-02-21 Verbaltek Inc Perceptual phonetic feature speech recognition system and method
US20040002859A1 (en) * 2002-06-26 2004-01-01 Chi-Min Liu Method and architecture of digital conding for transmitting and packing audio signals
KR100462611B1 (en) * 2002-06-27 2004-12-20 삼성전자주식회사 Audio coding method with harmonic extraction and apparatus thereof.
US7724827B2 (en) * 2003-09-07 2010-05-25 Microsoft Corporation Multi-layer run level encoding and decoding
US7426462B2 (en) * 2003-09-29 2008-09-16 Sony Corporation Fast codebook selection method in audio encoding
US7349842B2 (en) * 2003-09-29 2008-03-25 Sony Corporation Rate-distortion control scheme in audio encoding
US7630902B2 (en) * 2004-09-17 2009-12-08 Digital Rise Technology Co., Ltd. Apparatus and methods for digital audio coding using codebook application ranges
US7668715B1 (en) 2004-11-30 2010-02-23 Cirrus Logic, Inc. Methods for selecting an initial quantization step size in audio encoders and systems using the same
US7627481B1 (en) * 2005-04-19 2009-12-01 Apple Inc. Adapting masking thresholds for encoding a low frequency transient signal in audio data
US7885809B2 (en) * 2005-04-20 2011-02-08 Ntt Docomo, Inc. Quantization of speech and audio coding parameters using partial information on atypical subsequences
US7418394B2 (en) * 2005-04-28 2008-08-26 Dolby Laboratories Licensing Corporation Method and system for operating audio encoders utilizing data from overlapping audio segments
US8599925B2 (en) * 2005-08-12 2013-12-03 Microsoft Corporation Efficient coding and decoding of transform blocks
US8630849B2 (en) * 2005-11-15 2014-01-14 Samsung Electronics Co., Ltd. Coefficient splitting structure for vector quantization bit allocation and dequantization
US7461106B2 (en) * 2006-09-12 2008-12-02 Motorola, Inc. Apparatus and method for low complexity combinatorial coding of signals
CN101308655B (en) * 2007-05-16 2011-07-06 展讯通信(上海)有限公司 Audio coding and decoding method and layout design method of static discharge protective device and MOS component device
US7774205B2 (en) * 2007-06-15 2010-08-10 Microsoft Corporation Coding of sparse digital media spectral data
JP5434592B2 (en) * 2007-06-27 2014-03-05 日本電気株式会社 Audio encoding method, audio decoding method, audio encoding device, audio decoding device, program, and audio encoding / decoding system
CN101790756B (en) * 2007-08-27 2012-09-05 爱立信电话股份有限公司 Transient detector and method for supporting encoding of an audio signal
ES2375192T3 (en) 2007-08-27 2012-02-27 Telefonaktiebolaget L M Ericsson (Publ) CODIFICATION FOR IMPROVED SPEECH TRANSFORMATION AND AUDIO SIGNALS.
US8576096B2 (en) * 2007-10-11 2013-11-05 Motorola Mobility Llc Apparatus and method for low complexity combinatorial coding of signals
US8209190B2 (en) * 2007-10-25 2012-06-26 Motorola Mobility, Inc. Method and apparatus for generating an enhancement layer within an audio coding system
US20090234642A1 (en) * 2008-03-13 2009-09-17 Motorola, Inc. Method and Apparatus for Low Complexity Combinatorial Coding of Signals
US8639519B2 (en) * 2008-04-09 2014-01-28 Motorola Mobility Llc Method and apparatus for selective signal coding based on core encoder performance
US8666733B2 (en) * 2008-06-26 2014-03-04 Japan Science And Technology Agency Audio signal compression and decoding using band division and polynomial approximation
KR101756834B1 (en) * 2008-07-14 2017-07-12 삼성전자주식회사 Method and apparatus for encoding and decoding of speech and audio signal
US8219408B2 (en) * 2008-12-29 2012-07-10 Motorola Mobility, Inc. Audio signal decoder and method for producing a scaled reconstructed audio signal
US8140342B2 (en) * 2008-12-29 2012-03-20 Motorola Mobility, Inc. Selective scaling mask computation based on peak detection
US8200496B2 (en) * 2008-12-29 2012-06-12 Motorola Mobility, Inc. Audio signal decoder and method for producing a scaled reconstructed audio signal
US8175888B2 (en) 2008-12-29 2012-05-08 Motorola Mobility, Inc. Enhanced layered gain factor balancing within a multiple-channel audio coding system
CN102067211B (en) * 2009-03-11 2013-04-17 华为技术有限公司 Linear prediction analysis method, device and system
CA2778323C (en) 2009-10-20 2016-09-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a detection of a group of previously-decoded spectral values
SG182467A1 (en) 2010-01-12 2012-08-30 Fraunhofer Ges Forschung Audio encoder, audio decoder, method for encoding and audio information, method for decoding an audio information and computer program using a hash table describing both significant state values and interval boundaries
US8423355B2 (en) * 2010-03-05 2013-04-16 Motorola Mobility Llc Encoder for audio signal including generic audio and speech frames
US8428936B2 (en) * 2010-03-05 2013-04-23 Motorola Mobility Llc Decoder for audio signal including generic audio and speech frames
WO2011122875A2 (en) * 2010-03-31 2011-10-06 한국전자통신연구원 Encoding method and device, and decoding method and device
ES2600313T3 (en) * 2010-10-07 2017-02-08 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for estimating the level of audio frames encoded in a bitstream domain
WO2013057895A1 (en) * 2011-10-19 2013-04-25 パナソニック株式会社 Encoding device and encoding method
US9129600B2 (en) 2012-09-26 2015-09-08 Google Technology Holdings LLC Method and apparatus for encoding an audio signal
CN104934034B (en) * 2014-03-19 2016-11-16 华为技术有限公司 Method and apparatus for signal processing
KR102244612B1 (en) * 2014-04-21 2021-04-26 삼성전자주식회사 Appratus and method for transmitting and receiving voice data in wireless communication system
CN106448688B (en) 2014-07-28 2019-11-05 华为技术有限公司 Audio coding method and relevant apparatus
EP3079151A1 (en) * 2015-04-09 2016-10-12 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoder and method for encoding an audio signal

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4817157A (en) 1988-01-07 1989-03-28 Motorola, Inc. Digital speech coder having improved vector excitation source
US5040217A (en) 1989-10-18 1991-08-13 At&T Bell Laboratories Perceptual coding of audio signals
US5148489A (en) * 1990-02-28 1992-09-15 Sri International Method for spectral estimation to improve noise robustness for speech recognition
US5317672A (en) 1991-03-05 1994-05-31 Picturetel Corporation Variable bit rate speech encoder
US5187745A (en) * 1991-06-27 1993-02-16 Motorola, Inc. Efficient codebook search for CELP vocoders
US5179594A (en) * 1991-06-12 1993-01-12 Motorola, Inc. Efficient calculation of autocorrelation coefficients for CELP vocoder adaptive codebook
US5285498A (en) 1992-03-02 1994-02-08 At&T Bell Laboratories Method and apparatus for coding audio signals based on perceptual model
US5272529A (en) * 1992-03-20 1993-12-21 Northwest Starscan Limited Partnership Adaptive hierarchical subband vector quantization encoder
US5664057A (en) 1993-07-07 1997-09-02 Picturetel Corporation Fixed bit rate speech encoder/decoder
US5533052A (en) 1993-10-15 1996-07-02 Comsat Corporation Adaptive predictive coding with transform domain quantization based on block size adaptation, backward adaptive power gain control, split bit-allocation and zero input response compensation
CA2137756C (en) * 1993-12-10 2000-02-01 Kazunori Ozawa Voice coder and a method for searching codebooks
US5651090A (en) * 1994-05-06 1997-07-22 Nippon Telegraph And Telephone Corporation Coding method and coder for coding input signals of plural channels using vector quantization, and decoding method and decoder therefor
US5956674A (en) * 1995-12-01 1999-09-21 Digital Theater Systems, Inc. Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels
US6041297A (en) * 1997-03-10 2000-03-21 At&T Corp Vocoder for coding speech by using a correlation between spectral magnitudes and candidate excitations
US6351730B2 (en) * 1998-03-30 2002-02-26 Lucent Technologies Inc. Low-complexity, low-delay, scalable and embedded speech and audio coding with adaptive frame loss concealment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1676264A2 (en) * 2003-09-29 2006-07-05 Sony Electronics Inc. A method of making a window type decision based on mdct data in audio encoding
EP1676264A4 (en) * 2003-09-29 2008-02-20 Sony Electronics Inc A method of making a window type decision based on mdct data in audio encoding
CN110047499A (en) * 2013-01-29 2019-07-23 弗劳恩霍夫应用研究促进协会 Low complex degree tone adaptive audio signal quantization
US11694701B2 (en) 2013-01-29 2023-07-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Low-complexity tonality-adaptive audio signal quantization
CN110047499B (en) * 2013-01-29 2023-08-29 弗劳恩霍夫应用研究促进协会 Low Complexity Pitch Adaptive Audio Signal Quantization

Also Published As

Publication number Publication date
US6704705B1 (en) 2004-03-09

Similar Documents

Publication Publication Date Title
CA2246532A1 (en) Perceptual audio coding
EP0905680B1 (en) Method for quantizing LPC parameters using switched-predictive quantization
KR101343267B1 (en) Method and apparatus for audio coding and decoding using frequency segmentation
EP2346030B1 (en) Audio encoder, method for encoding an audio signal and computer program
KR101330362B1 (en) Modification of codewords in dictionary used for efficient coding of digital media spectral data
CN102089808B (en) Audio encoder, audio decoder and methods for encoding and decoding audio signal
US7325023B2 (en) Method of making a window type decision based on MDCT data in audio encoding
RU2505921C2 (en) Method and apparatus for encoding and decoding audio signals (versions)
KR20000010994A (en) Audio signal coding and decoding methods and audio signal coder and decoder
KR20070017524A (en) Encoding device, decoding device, and method thereof
US6889185B1 (en) Quantization of linear prediction coefficients using perceptual weighting
EP1673765B1 (en) A method for grouping short windows in audio encoding
CN102419977A (en) Method for discriminating transient audio signals
EP0899720B1 (en) Quantization of linear prediction coefficients
KR101393301B1 (en) Method and apparatus for quantization and de-quantization of the Linear Predictive Coding coefficients
KR100188912B1 (en) Bit reassigning method of subband coding
JPH10268897A (en) Signal coding method and device therefor
EP0612159B1 (en) An enhancement method for a coarse quantizer in the ATRAC
JP2842276B2 (en) Wideband signal encoding device
Najafzadeh et al. Perceptual bit allocation for low rate coding of narrowband audio
CN101271691B (en) Time-domain noise reshaping instrument start-up judging method and device
KR101512320B1 (en) Method and apparatus for quantization and de-quantization
Kemp et al. LPC parameter quantization at 600, 800 and 1200 bits per second
KR20130047630A (en) Apparatus and method for coding signal in a communication system
KR100300963B1 (en) Linked scalar quantizer

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued