CA2246532A1

CA2246532A1 - Perceptual audio coding

Info

Publication number: CA2246532A1
Application number: CA002246532A
Authority: CA
Inventors: Peter Kabal; Hossein Najafzadeh-Azghandi
Original assignee: Nortel Networks Corp
Current assignee: Nortel Networks Ltd
Priority date: 1998-09-04
Filing date: 1998-09-04
Publication date: 2000-03-04
Also published as: US6704705B1

Abstract

A method and apparatus for perceptual audio coding. The method and apparatus provide high-quality sound for coding rates down to and below 1 bit/sample for a wide variety of input signals including speech, music and background noise. The invention provides a new distortion measure for coding the input speech and training the codebooks, where the distortion measure is based on a masking spectrum of the input frequency spectrum. The invention also provides a method for direct calculation of masking thresholds from a modified discrete cosine transform of the input signal. The invention also provides a predictive and non-predictive vector quantizer for determining the energy of the coefficients representing the frequency spectrum. As well, the invention provides a split vector quantizer for quantizing the fine structure of coefficients representing the frequency spectrum. Bit allocation for the split vector quantizer is based on the masking threshold. The split vector quantizer also makes use of embedded codebooks.
Furthermore, the invention makes use of a new transient detection method for selection of input windows.

Claims

1. A method of transmitting a discretely represented frequency signal within a frequency band, said signal discretely represented by coefficients at certain frequencies within said band, comprising the steps of:
(a) providing a codebook of codevectors for said band, each codevector having an element for each of said certain frequencies;
(b) obtaining a masking threshold for said frequency signal;
(c) for each one of a plurality of codevectors in said codebook, obtaining a distortion measure by the steps of:
for each of said coefficients of said frequency signal (i) obtaining a representation of a difference between a corresponding element of said one codevector and (ii) reducing said difference by said masking threshold to obtain an indicator measure;
summing those obtained indicator measures which are positive to obtain said distortion measure;
(d) selecting a codevector having a smallest distortion measure;
(e) transmitting an index to said selected codevector.
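The codevector search of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the "representation of a difference" is the squared difference named in claim 2, and that a single scalar masking threshold applies to every coefficient in the band. The function names and the toy codebook are hypothetical.

```python
def masked_distortion(coeffs, codevector, threshold):
    """Sum the positive parts of (squared error - masking threshold)."""
    total = 0.0
    for x, c in zip(coeffs, codevector):
        indicator = (x - c) ** 2 - threshold  # error that exceeds the mask
        if indicator > 0:
            total += indicator
    return total


def select_codevector(coeffs, codebook, threshold):
    """Return the index of the codevector with the smallest masked distortion."""
    return min(range(len(codebook)),
               key=lambda i: masked_distortion(coeffs, codebook[i], threshold))


# Toy data (assumed values, for illustration only).
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
coeffs = [0.9, 1.2]
best_index = select_codevector(coeffs, codebook, threshold=0.02)
```

Errors that fall below the threshold contribute nothing, so two codevectors whose errors are both fully masked are treated as equally good.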

2. The method of claim 1 wherein said codevectors are normalised with respect to energy and wherein step (c)(i) of obtaining a representation of a difference between a given coefficient of said frequency signal and a corresponding element of said one codevector comprises obtaining a squared difference between said given coefficient and said corresponding element after unnormalising said corresponding element with a measure of energy in said signal and including the step of:
(f) transmitting an indication of energy in said signal.

3. The method of claim 2 wherein said step of obtaining a masking threshold comprises convolving a measure of energy in said signal with a known spreading function.

4. The method of claim 3 wherein said step of obtaining a masking threshold further comprises adjusting said convolution by an offset dependent upon a spectral flatness measure comprising an arithmetic mean of said coefficients.

5. A method of transmitting a discretely represented frequency signal, said signal discretely represented by coefficients at certain frequencies, comprising the steps of:
(a) grouping said coefficients into frequency bands;
(b) for each band - providing a codebook of codevectors, each codevector having an element corresponding with each coefficient within said each band;
- obtaining a representation of energy of coefficients in said each band;
- selecting a set of addresses which address at least a portion of said codebook such that a size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy;
- selecting a codevector from said codebook from amongst those addressable by said address set to represent said coefficients for said band and obtaining an index to said selected codevector;
(d) concatenating said selected codevector addresses; and
(e) transmitting said concatenated codevector addresses and an indication of each said representation of energy.

6. The method of claim 5 including the step of obtaining a representation of a masking threshold for each said band from said representation of energy and wherein said step of selecting a set of addresses comprising selecting such that said size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy reduced by a masking threshold indicated by said representation of a masking threshold.

7. The method of claim 6 wherein said representation of a masking threshold is obtained from a convolution of said representation of energy with a pre-defined spreading function.

8. The method of claim 7 wherein said representation of a masking threshold is reduced by an offset dependent upon a spectral flatness measure chosen as a constant.

9. The method of claim 5 wherein any band having an identical number of coefficients as another band shares a codebook with said other band.

10. The method of claim 5 wherein said step of selecting a codevector to represent said coefficients for said each band comprises the steps of:

- for each one codevector of said plurality of codevectors addressed by said address set for each of said coefficients of said each band (i) obtaining a representation of a difference between a corresponding element of said one codevector and (ii) reducing said difference by said masking threshold indicated by said representation of a masking threshold to obtain an indicator measure;
summing those obtained indicator measures which are positive to obtain a distortion measure;
- selecting a codevector having a smallest distortion measure.

11. The method of claim 10 wherein said codevectors are normalised with respect to energy and wherein the step of obtaining a representation of a difference between a given coefficient of said each band and a corresponding element of said one codevector comprises obtaining a squared difference between said given coefficient and said corresponding element after unnormalising said corresponding element with said representation of energy in said signal.

12. The method of claim 5 wherein each said codebook is sorted so as to provide sets of codevectors addressed by corresponding sets of addresses such that each larger set of addresses addresses a larger set of codevectors which span a frequency spectrum of said each band with increasingly less granularity.

13. A method of transmitting a discretely represented time series comprising the steps of:
- obtaining a frame of time samples;

- obtaining a discrete frequency representation of said time series frame, said frequency representation comprising coefficients at certain frequencies;
- grouping said coefficients into frequency bands;
- for each band (i) providing a codebook of codevectors, each codevector having an element corresponding with each coefficient within said each band;
(ii) obtaining a representation of energy of coefficients in said each band;
(iii) selecting a set of addresses which address at least a portion of said codebook such that a size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy;
(iv) selecting a codevector from said codebook from amongst those addressable by said address set to represent said coefficients for said band and obtaining an address to said selected codevector;
- concatenating said selected codevector addresses; and
- transmitting said concatenated codevector addresses and an indication of each said representation of energy.

14. The method of claim 13 wherein said step of obtaining a representation of energy of coefficients in said each band comprises the steps of:
- determining an indication of energy for said band;
- determining an average energy for said band;
- quantising said average energy by finding an entry in an average energy codebook which, when adjusted with a representation of average energy from a frequency representation for a previous frame, best approximates said average energy;
- normalising said energy indication with respect to said quantised approximation of said average energy;
- quantising said normalised energy indication by manipulating a normalised energy indication from a frequency representation for said previous frame with each of a number of prediction matrices and selecting a prediction matrix resulting in a quantised normalised energy indication which best approximates said normalised energy indication;
- obtaining said representation of energy from said quantised normalised energy.

15. The method of claim 13 including the steps of:
- obtaining an index to said entry in said average energy codebook;
- obtaining an index to said selected prediction matrix;
and wherein said step of transmitting said concatenated codevector addresses and an indication of each said representation of energy comprises:
- transmitting said average energy codebook index; and
- transmitting said selected prediction matrix index.

16. The method of claim 15 including the steps of:
- obtaining an actual residual from a difference between said quantised normalised energy indication and said normalised energy indication;
- comparing said actual residual to a residual codebook to find a quantised residual which is a best approximation of said actual residual;
- adjusting said quantised normalised energy with said quantised residual;

and wherein said step of obtaining said representation of energy comprises obtaining said representation of energy from a combination of said quantised normalised energy and said quantised residual.

17. The method of claim 16 including the steps of:
- obtaining an actual second residual from a difference between (i) said combination of said quantised normalised energy and said quantised residual and (ii) said normalised energy indication;
- comparing said actual second residual to a second residual codebook to find a quantised second residual which is a best approximation of said actual second residual;
adjusting said combination with said quantised second residual to obtain a further combination;
and wherein said step of obtaining said representation of energy comprises obtaining said representation of energy from said further combination.

18. The method of claim 17 including the step of obtaining an index to said quantised residual in said residual codebook and an index to said quantised second residual in said second residual codebook;
and wherein said step of transmitting said concatenated codevector addresses and an indication of each said representation of energy comprises transmitting said quantised residual index and said quantised second residual index.

19. The method of claim 18 wherein said step of obtaining a representation of energy comprises unnormalising said further combination with said quantised average energy.

20. The method of claim 13 including the step of obtaining a representation of a masking threshold for each said band from said representation of energy and wherein said step of selecting a set of addresses comprising selecting such that said size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy reduced by a masking threshold indicated by said representation of a masking threshold.

21. The method of claim 20 wherein said representation of a masking threshold is obtained from a convolution of said representation of energy with a pre-defined spreading function.

22. The method of claim 21 wherein said representation of a masking threshold is reduced by an offset dependent upon a spectral flatness measure chosen as a constant.

23. The method of claim 13 wherein any band having an identical number of coefficients as another band shares a codebook with said other band.

24. The method of claim 13 wherein said step of selecting a codevector to represent said coefficients for said each band comprises the steps of:
- for each one codevector of said plurality of codevectors addressed by said address set for each of said coefficients of said each band (i) obtaining a representation of a difference between a corresponding element of said one codevector and (ii) reducing said difference by said masking threshold indicated by said representation of a masking threshold to obtain an indicator measure;
summing those obtained indicator measures which are positive to obtain a distortion measure;
- selecting a codevector having a smallest distortion measure.

25. The method of claim 24 wherein said codevectors are normalised with respect to energy and wherein the step of obtaining a representation of a difference between a given coefficient of said each band and a corresponding element of said one codevector comprises obtaining a squared difference between said given coefficient and said corresponding element after unnormalising said corresponding element with said representation of energy in said signal.

26. A method of receiving a discretely represented frequency signal, said signal discretely represented by coefficients at certain frequencies, comprising the steps of:
- providing pre-defined frequency bands;
- for each band providing a codebook of codevectors, each codevector having an element corresponding with each of said certain frequencies which are within said each band;
- receiving concatenated codevector addresses for said bands and a per band indication of a representation of energy of coefficients in each band;
- determining a length of address for each band based on said per band indication of a representation of energy;

- parsing said concatenated codevector addresses based on said address length determining step;
- addressing said codebook for each band with a parsed codebook address to obtain frequency coefficients for each said band.
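The parsing step of claim 26 relies on the receiver being able to derive the same per-band address lengths as the transmitter. The sketch below assumes those lengths have already been computed from the decoded energies (the rule for doing so is not shown) and only illustrates packing and parsing. The bit-string representation and the function names are hypothetical.

```python
def concatenate_addresses(addresses, lengths):
    """Pack per-band codevector addresses into one bit string."""
    bits = ""
    for address, length in zip(addresses, lengths):
        if length > 0:
            bits += format(address, "0{}b".format(length))
    return bits


def parse_addresses(bits, lengths):
    """Split the bit string back into per-band addresses using known lengths."""
    addresses, pos = [], 0
    for length in lengths:
        if length == 0:
            addresses.append(None)  # band received no address bits
        else:
            addresses.append(int(bits[pos:pos + length], 2))
        pos += length
    return addresses


# Round trip with assumed lengths of 3, 0 and 2 bits for three bands.
lengths = [3, 0, 2]
sent = [5, None, 3]
bitstream = concatenate_addresses(sent, lengths)
received = parse_addresses(bitstream, lengths)
```

Because no length fields are transmitted, any mismatch between the energies seen by the two ends would misalign every subsequent address.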

27. A transmitter comprising:
means for obtaining a frame of time samples;
means for obtaining a discrete frequency representation of said time series frame, said frequency representation comprising coefficients at certain frequencies;
means for grouping said coefficients into frequency bands;
means for, for each band (i) providing a codebook of codevectors, each codevector having an element corresponding with each coefficient within said each band;
(ii) obtaining a representation of energy of coefficients in said each band;
(iii) selecting a set of addresses which address at least a portion of said codebook such that a size of said address set is directly proportional to energy of coefficients in said each band indicated by said representation of energy;
(iv) selecting a codevector from said codebook from amongst those addressable by said address set to represent said coefficients for said band and obtaining an address to said selected codevector;
means for concatenating said selected codevector addresses; and
means for transmitting said concatenated codevector addresses and an indication of each said representation of energy.

28. A receiver comprising:
means for providing pre-defined frequency bands;
a memory storing, for each band, a codebook of codevectors, each codevector having an element corresponding with each of said certain frequencies which are within said each band;
means for receiving concatenated codevector addresses for said bands and a per band indication of a representation of energy of coefficients in each band;
means for determining a length of address for each band based on said per band indication of a representation of energy;
means for parsing said concatenated codevector addresses based on said address length determining step;
means for addressing said codebook for each band with a parsed codebook address to obtain frequency coefficients for each said band.

29. A method of obtaining a codebook of codevectors which span a frequency band discretely represented at pre-defined frequencies, comprising the steps of:
- receiving training vectors for said frequency band;
- receiving an initial set of estimated codevectors;
- associating each training vector with a one of said estimated codevectors with respect to which it generates a smallest distortion measure to obtain associated groups of vectors;
- partitioning said associated groups of vectors into Voronoi regions;
- determining a centroid for each Voronoi region;
- selecting each centroid vector as a new estimated codevector;

- repeating from said associating step until a difference between new estimated codevectors and estimated codevectors from a previous iteration is less than a pre-defined threshold; and
- populating said codebook with said estimated codevectors resulting after a last iteration.
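The training loop of claim 29 follows the pattern of the generalized Lloyd algorithm. The sketch below is illustrative only: it substitutes plain squared error for the distortion measure and uses the arithmetic mean as the centroid, which is the correct centroid for squared error but not necessarily for the masked distortion of the later claims. The function names, tolerance and iteration cap are assumptions.

```python
def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(training, initial, tol=1e-6, max_iter=100):
    """Iterate nearest-neighbour association and centroid update until stable."""
    codebook = [list(c) for c in initial]
    for _ in range(max_iter):
        # Associate each training vector with its closest codevector.
        groups = [[] for _ in codebook]
        for v in training:
            nearest = min(range(len(codebook)),
                          key=lambda i: squared_error(v, codebook[i]))
            groups[nearest].append(v)
        # Replace each codevector by the mean of its group.
        new_codebook = []
        for old, group in zip(codebook, groups):
            if group:
                new_codebook.append(
                    [sum(v[d] for v in group) / len(group) for d in range(len(old))])
            else:
                new_codebook.append(old)  # keep the codevector of an empty cell
        change = max(squared_error(o, n) for o, n in zip(codebook, new_codebook))
        codebook = new_codebook
        if change < tol:
            break
    return codebook


training = [[0, 0], [0, 1], [10, 10], [10, 11]]
trained = train_codebook(training, initial=[[0, 0], [10, 10]])
```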

30. The method of claim 29 wherein each distortion measure is obtained by the steps of:
- for each element of said training vector (i) obtaining a representation of a difference between a corresponding element of said one estimated codevector and (ii) reducing said difference by a masking threshold of said training vector to obtain an indicator measure;
- summing those obtained indicator measures which are positive to obtain said distortion measure.

31. The method of claim 30 wherein said masking threshold is obtained by convolving a measure of energy in said training vector with a known spreading function.

32. The method of claim 31 wherein said masking threshold is obtained by adjusting said convolution by an offset dependent upon a spectral flatness measure comprising an arithmetic mean of said coefficients.

33. The method of claim 32 wherein said estimated codevectors are normalised with respect to energy and wherein the step of obtaining a representation of a difference between a given element of said training vector and a corresponding element of said one estimated codevector comprises obtaining a squared difference between said given element and said corresponding element after unnormalising said corresponding element with a measure of energy in said training vector.

34. The method of claim 33 wherein said step of determining a centroid for a Voronoi region comprises finding a candidate vector within said region which generates a minimum value for a sum of distortion measures between said candidate vector and each training vector in said region.

35. The method of claim 34 wherein each distortion measure in said sum of distortion measures is obtained by the steps of:
- for each training vector, for each element of said each training vector (i) obtaining a representation of a difference between a corresponding element of said candidate vector and (ii) reducing said difference by a masking threshold for said training vector to obtain an indicator measure;
- summing those obtained indicator measures which are positive to obtain said distortion measure.

36. The method of claim 29 wherein said estimated codevectors with which said codebook is populated is a first set of codevectors and wherein said codebook is enlarged by the steps of:
- fixing said first set of estimated codevectors;
- receiving an initial second set of estimated codevectors;
- associating each training vector with one estimated codevector from said first set or said second set with respect to which it generates a smallest distortion measure to obtain associated groups of vectors;
- partitioning said associated groups of vectors into Voronoi regions;
- determining a centroid for Voronoi region containing an estimated codevector from said second set;
- selecting each centroid vector as a new estimated second set codevector;
- repeating from said associating step until a difference between new estimated second set codevectors and estimated second set codevectors from a previous iteration is less than a pre-defined threshold; and
- populating said codebook with said estimated second set codevectors resulting after a last iteration.

37. The method of claim 36 including the step of sorting said second set estimated codevectors to an end of said codebook whereby to obtain an embedded codebook.

38. A method of generating an embedded codebook for a frequency band discretely represented at pre-defined frequencies, comprising the steps of:
(a) obtaining an optimized larger first codebook of codevectors which span said frequency band;
(b) obtaining an optimized smaller second codebook of codevectors which span said frequency band;
(c) finding codevectors in said first codebook which best approximate each entry in said second codebook;

(d) sorting said first codebook to place said codevectors found in step (c) at a front of said first codebook.

39. The method of claim 38 wherein each step of obtaining an optimized codebook comprises the steps of:
- receiving training vectors for said frequency band;
- receiving an initial set of estimated codevectors;
- associating each training vector with a one of said estimated codevectors with respect to which it generates a smallest distortion measure to obtain associated groups of vectors;
- partitioning said associated groups of vectors into Voronoi regions;
- determining a centroid for each Voronoi region;
- selecting each centroid vector as a new estimated codevector;
- repeating from said associating step until a difference between new estimated codevectors and estimated codevectors from a previous iteration is less than a pre-defined threshold; and
- populating said codebook with said estimated codevectors resulting after a last iteration.

40. The method of claim 39 wherein step (c) comprises utilising a least squares method to find codevectors in said first codebook which best approximate each entry in said second codebook.
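The reordering described in claims 38 to 40 can be sketched as below. This is a hedged illustration: it uses squared error as the least squares criterion and, as an added assumption not stated in the claims, matches each entry of the smaller codebook to a distinct entry of the larger one in a greedy pass. The function names are hypothetical.

```python
def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def embed_codebook(large, small):
    """Reorder `large` so the entries closest to those of `small` come first."""
    chosen = []
    for target in small:
        candidates = [i for i in range(len(large)) if i not in chosen]
        best = min(candidates, key=lambda i: squared_error(large[i], target))
        chosen.append(best)
    rest = [i for i in range(len(large)) if i not in chosen]
    return [large[i] for i in chosen + rest]


large = [[0, 0], [5, 5], [9, 9], [2, 2]]
small = [[8, 8], [0.5, 0.5]]
embedded = embed_codebook(large, small)
```

After sorting, the first `len(small)` entries of the larger codebook approximate the smaller codebook, so a shorter address can index a coarser subset of the same table.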

41. A method for allocating encoding bits to bands within the frequency spectrum in a perceptual audio coding transmitter, said transmitter having a split VQ unit, said method comprising the steps of:
(A) receiving at least one masking threshold and at least one spectral energy for each band;
(B) allocating bits to each band based on said masking threshold and spectral energy for each band; and
(C) transmitting the bit allocation for each band to the split VQ unit.

42. The method of claim 41 wherein the step of allocating bits to each band based on said masking threshold and spectral energy for each band further comprises the steps of:
(B.1) calculating a gap value for each band wherein said gap is calculated by subtracting from the spectral energy for each band the masking threshold and subtracting the ratio of the (bits already allocated to that band) to (the coefficients in that band, multiplied by some constant);
(B.2) allocating a bit to the band with the highest gap value; and
(B.3) repeating steps B.1 and B.2 until all bits available for transmission have been allocated.
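The greedy loop of claim 42 can be sketched as follows. The wording of the gap calculation is open to more than one reading; this sketch assumes gap = energy - threshold - constant x (bits allocated / number of coefficients), with energies and thresholds in dB and an assumed constant of 6.02 dB per bit per coefficient. None of these choices is confirmed by the claim text, and the function name is hypothetical.

```python
def allocate_bits(energy_db, threshold_db, n_coeffs, total_bits, db_per_bit=6.02):
    """Give one bit at a time to the band with the largest remaining gap."""
    bits = [0] * len(energy_db)
    for _ in range(total_bits):
        gaps = [e - t - db_per_bit * b / n
                for e, t, b, n in zip(energy_db, threshold_db, bits, n_coeffs)]
        best = max(range(len(gaps)), key=lambda i: gaps[i])
        bits[best] += 1
    return bits


# Two bands of two coefficients each, eight bits to distribute (assumed values).
allocation = allocate_bits([30.0, 20.0], [10.0, 15.0], [2, 2], total_bits=8)
```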

43. The method of claim 42 further comprising the step of:
(A.1) calculating a first approximation of the number of bits to be allocated to each band.

44. The method of claim 43 wherein the step of calculating a first approximation of the number of bits to be allocated to each band comprises the steps of:
(A.1.1) calculating a second gap value for each band wherein said gap is calculated by subtracting from the spectral energy for each band the masking threshold for that band;
(A.1.2) approximating the number of bits for each band as equal a second ratio of the second gap value times the number of coefficients in the band times the total number of bits available for transmission to the sum over all bands of the product of the second gap value times the number of coefficients in the band;
(A.1.3) discarding the fractional results of the second ratio to yield an integer second ratio; and
(A.1.4) allocating to each band as a first approximation said integer second ratio.
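The first approximation of claim 44 amounts to a proportional split followed by truncation. A minimal sketch, assuming energies and thresholds in dB and, as an added assumption, clamping negative gaps to zero so that fully masked bands receive no bits:

```python
def initial_allocation(energy_db, threshold_db, n_coeffs, total_bits):
    """Split bits in proportion to gap x coefficient count, then truncate."""
    weights = [max(e - t, 0.0) * n
               for e, t, n in zip(energy_db, threshold_db, n_coeffs)]
    total = sum(weights)
    if total == 0:
        return [0] * len(weights)
    return [int(w * total_bits / total) for w in weights]


first_guess = initial_allocation([30.0, 20.0], [10.0, 15.0], [2, 2], total_bits=8)
```

Truncation can leave some bits unassigned (here one of the eight), which a greedy pass such as that of claim 42 could then distribute.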

45. A method of selecting a window for calculating frequency domain coefficients in a perceptual audio coding transmitter, said method comprising the steps of:
(A) receiving a series of time samples of the input signal;
(B) determining when a strong positive transient occurs in said series; and,
(C) switching to a different window when a strong positive transient is detected.

46. The method of claim 45 wherein the step of determining when a strong positive transient occurs in said series comprises the steps of:
(B.1) calculating for a set of n successive time samples in said series the sum of the squares of the amplitudes for the three successive time samples to yield a first sum;
(B.2) calculating for the next n successive time samples in said series the sum of the squares of the amplitudes of the next three successive time samples to yield a second sum;
(B.3) calculating a ratio of the first sum less the second sum to the first sum;
(B.4) determining a strong positive transient has occurred when said ratio exceeds a threshold value;

47. The method of claim 46 wherein n has the value 3.

48. The method of claim 45 wherein said different window is a first transitional window.

49. The method of claim 47 further comprising the steps of:
(D) receiving a next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said next series;
and, (F) switching to a series of short windows when a strong positive transient is detected in said next series.

50. The method of claim 49 wherein the series of short windows is a set of three short windows.

51. The method of claim 47 further comprising the steps of:
(D) receiving a next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said next series;
and, (F) switching to a second transitional window when a strong positive transient is not detected in said next series.

52. The method of claim 48 further comprising the steps of:
(D) receiving a second next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said second next series; and,
(F) switching to a series of short windows when a strong positive transient is detected in said second next series.

53. The method of claim 52 wherein the series of short windows is a set of three short windows.

54. The method of claim 48 further comprising the steps of:
(D) receiving a second next series of time samples of the input signal;
(E) determining if a strong positive transient occurs in said second next series; and,
(F) switching to a second transitional window when a strong positive transient is not detected in said second next series.

55. The method of claim 46 wherein said threshold value is 5.

56. In a perceptual audio coder, a method for calculating the masking threshold for a band, said band being one of a plurality of bands in a frame, said method comprising the steps of:
(A) receiving an input frame;
(B) calculating MDCT coefficients for each band of said frame;
(C) calculating a spectral energy for each band of said frame from said MDCT coefficients to yield a power spectral density function;
(D) convolving a normalized spreading function with said power spectral density function to yield a convolution;
(E) subtracting in the log domain an offset measure from said convolution to yield a masking threshold for each band.
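Steps (C) to (E) of claim 56 can be sketched as a convolution across bands followed by a subtraction in dB. The kernel values below are placeholders rather than the spreading function of the patent, the kernel is normalised by dividing by its total gain, and the log domain is assumed to be decibels. The function names are hypothetical.

```python
import math


def normalize(kernel):
    """Divide the kernel by its overall gain so that it sums to one."""
    gain = sum(kernel)
    return [k / gain for k in kernel]


def masking_threshold_db(band_energy, kernel, offset_db):
    """Spread band energies with the kernel, then subtract an offset in dB."""
    half = len(kernel) // 2
    norm = normalize(kernel)
    thresholds = []
    for i in range(len(band_energy)):
        spread = 0.0
        for j, w in enumerate(norm):
            src = i + half - j  # convolution index into the band energies
            if 0 <= src < len(band_energy):
                spread += w * band_energy[src]
        thresholds.append(10.0 * math.log10(spread) - offset_db[i])
    return thresholds


# Three bands of equal power with an assumed symmetric kernel.
thresholds = masking_threshold_db([100.0, 100.0, 100.0], [1.0, 2.0, 1.0],
                                  [5.0, 5.0, 5.0])
```

Edge bands see less spread energy in this sketch because the kernel is simply truncated at the band limits; other boundary treatments are possible.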

57. The method of claim 56, wherein said offset measure is calculated from the band number and a spectral flatness measure.

58. The method of claim 56 wherein said spectral flatness measure is 0.5.

59. The method of claim 57 wherein said spectral flatness measure is the ratio of the geometric mean of the MDCT coefficients to the arithmetic mean of the MDCT coefficients.
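The spectral flatness measure of claim 59 can be sketched as below. Because a geometric mean is not defined for signed values, this sketch assumes the means are taken over the squared coefficients, and it adds a small constant to avoid the logarithm of zero. Both choices are assumptions, not statements from the claims.

```python
import math


def spectral_flatness(coeffs, eps=1e-12):
    """Ratio of geometric mean to arithmetic mean of the coefficient powers."""
    power = [c * c + eps for c in coeffs]
    log_mean = sum(math.log(p) for p in power) / len(power)
    geometric = math.exp(log_mean)
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic


flat = spectral_flatness([1.0, 1.0, 1.0, 1.0])    # noise-like spectrum
peaky = spectral_flatness([1.0, 0.0, 0.0, 0.0])   # tone-like spectrum
```

The measure approaches 1 for a flat spectrum and 0 for a spectrum dominated by a single component.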

60. The method of claim 59 wherein the offset is calculated according to the equation:

$$F_i = 5.5\,(1 - a) + (14.5 + i)\,a$$

where $F_i$ is the offset for the ith band; and $a$ is the spectral flatness measure for the frame.

61. The method of claim 56, wherein said spreading function is normalized by:
(I) calculating the overall gain due to the unnormalized spreading function;
(II) dividing unnormalized spreading function values by the overall gain due to the spreading function.

62. The method of claim 60, wherein the unnormalized spreading function is:

[equation not reproduced in the source text]
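The offset equation stated in the claims, F_i = 5.5(1 - a) + (14.5 + i)a, can be evaluated directly. In the sketch below the function name is hypothetical, and whether band numbering starts at 0 or 1 is not specified in the claims.

```python
def band_offset(band_index, flatness):
    """Offset for band i given the frame's spectral flatness measure a."""
    return 5.5 * (1.0 - flatness) + (14.5 + band_index) * flatness


low = band_offset(3, 0.0)    # flatness of 0 gives a constant offset
mid = band_offset(3, 0.5)
high = band_offset(3, 1.0)   # flatness of 1 gives 14.5 + i
```

The offset interpolates linearly between 5.5 and 14.5 + i as the flatness measure moves from 0 to 1.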

63. In a perceptual audio coder, a method for calculating the masking threshold for a band, said method comprising the steps of:
(A) receiving an input frame;
(B) calculating MDCT coefficients for each band of the frame;
(C) calculating a spectral energy for each band of said frame from said MDCT coefficients to yield a power spectral density function;
(C.1) calculating a quantized spectral energy for each band from said spectral energy for each band;
(D) convolving a normalized spreading function with said quantized power spectral density function to yield a convolution;
(E) subtracting in the log domain an offset measure from said convolution to yield a masking threshold for each band.

64. In a perceptual audio coding transmitter, a method for quantizing the spectral energy of MDCT coefficients in a band of a frame comprising the steps of:

(A) receiving MDCT coefficients for each band in the frame;
(B) calculating the energy in each band from the MDCT coefficients;
(C) calculating a quantized value for the average energy of the frame;
(D) calculating a normalized energy vector for the frame by subtracting in the log domain the quantized value of the average energy of the frame from the energy in each band;
(E) determining a best prediction matrix to predict the normalized energy vector;
(F) calculating a first residual vector from the best predicted normalized energy vector and the normalized energy vector for each band;
(G) finding a first codevector which most closely matches the first residual vector;
(H) calculating and storing the normalized quantized energy vector for the frame; and,
(I) transmitting the indices of the quantized energy, prediction matrix and first codevector to the receiver.

65. The method of claim 64 wherein the step of calculating the energy in each band from the MDCT coefficients comprises the step of:
(B.1) taking the sum of the squares of the absolute values of the MDCT coefficients in the band.

66. The method of claim 64 wherein the step of calculating a quantized value for the average energy of the frame comprises the steps of:
(C.1) converting the energy in each band to the logarithmic domain;
(C.2) calculating the average log energy of the power spectrum by taking the sum of energy in each band and dividing by the number of bands;
(C.3) calculating a product of a leakage factor and the quantized value of the average log energy for the previous frame;
(C.4) subtracting this product from the average log energy of the power spectrum to yield a difference;
(C.5) finding the best match in a codebook to said difference; and,
(C.6) adding the best match to said product to yield the quantized value for the average energy of the frame;
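Steps (C.1) to (C.6) of claim 66 describe a scalar predictive quantizer with leakage. The sketch below assumes a decibel log domain, a leakage factor of 0.9 and a small illustrative codebook. None of these values comes from the claims, and the function name is hypothetical.

```python
import math


def quantize_average_energy(band_energy, prev_quantized, codebook, leakage=0.9):
    """Quantize the frame's average log energy relative to a leaky prediction."""
    log_energy = [10.0 * math.log10(e) for e in band_energy]       # (C.1)
    average = sum(log_energy) / len(log_energy)                    # (C.2)
    predicted = leakage * prev_quantized                           # (C.3)
    difference = average - predicted                               # (C.4)
    index = min(range(len(codebook)),
                key=lambda i: abs(codebook[i] - difference))       # (C.5)
    return index, predicted + codebook[index]                      # (C.6)


index, quantized = quantize_average_energy(
    [100.0, 10000.0], prev_quantized=20.0, codebook=[-10.0, 0.0, 10.0, 20.0])
```

A leakage factor below one makes the influence of any past transmission error decay over successive frames.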

67. The method of claim 64 wherein the step of determining a best prediction matrix to predict the normalized energy vector for all bands comprises the steps of:
(E.1) finding the prediction matrix which when multiplied by the normalized quantized energy vector of the previous frame gives the closest match to the normalized energy vector of the current frame;
(E.2) calculating a best predicted normalized energy vector by multiplying the prediction matrix which gives the closest match by the normalized quantized energy vector of the previous frame;
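Steps (E.1) and (E.2) of claim 67 can be sketched as an exhaustive search over a small set of prediction matrices. The sketch assumes "closest match" means smallest squared error and represents matrices as lists of rows. The example matrices and the function names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def select_prediction_matrix(matrices, prev_quantized, current):
    """Return the index of the best matrix and the prediction it produces."""
    best_index, best_pred, best_err = None, None, float("inf")
    for index, matrix in enumerate(matrices):
        predicted = mat_vec(matrix, prev_quantized)
        err = sum((p - c) ** 2 for p, c in zip(predicted, current))
        if err < best_err:
            best_index, best_pred, best_err = index, predicted, err
    return best_index, best_pred


matrices = [[[1.0, 0.0], [0.0, 1.0]],
            [[0.5, 0.0], [0.0, 0.5]]]
chosen, prediction = select_prediction_matrix(matrices, [2.0, 4.0], [1.1, 1.9])
```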

68. The method of claim 67 wherein said prediction matrices are tridiagonal.

69. The method of claim 64 wherein the step of calculating a residual vector from the best predicted normalized energy vector and the normalized energy for each band comprises the step of subtracting the best predicted normalized energy from the normalized energy for each band.

70. The method of claim 64 wherein the step of calculating and storing the normalized quantized energy vector for the frame comprises the step of adding the best predicted normalized energy vector to the first codevector which most closely matches the first residual vector.

71. The method of claim 64 further comprising the steps of:
(I) calculating a second residual vector by subtracting the first codevector which most closely matches the first residual vector from the first residual vector;
(J) finding a second codevector which most closely matches the second residual vector; and,
(K) transmitting the index of the second codevector to the receiver.

72. The method of claim 64 wherein the step of calculating and storing the normalized quantized energy vector for the frame comprises the step of adding the best predicted normalized energy vector to the first codevector which most closely matches the first residual vector and to the codevector.
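
The prediction and residual quantization described in claims 67 to 72 can be sketched as below. The sketch assumes squared Euclidean distance as the "closest match" criterion, stores the prediction matrices as dense lists of rows (a tridiagonal matrix, as in claim 68, is simply a special case), and treats the second stage as optional. All function and parameter names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sq_dist(a, b):
    """Squared Euclidean distance (assumed matching criterion)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(candidates, target):
    """Index of the candidate vector closest to the target."""
    return min(range(len(candidates)), key=lambda n: sq_dist(candidates[n], target))


def quantize_normalized_energy(energy, prev_quantized, matrices, stage1, stage2=None):
    """Return (indices to transmit, normalized quantized energy vector)."""
    # (E.1)/(E.2) pick the prediction matrix whose prediction is closest
    predictions = [mat_vec(m, prev_quantized) for m in matrices]
    m_idx = nearest(predictions, energy)
    predicted = predictions[m_idx]
    # claim 69: first residual = normalized energy minus prediction
    residual1 = [e - p for e, p in zip(energy, predicted)]
    i1 = nearest(stage1, residual1)
    # claim 70: reconstruction = prediction + first codevector
    quantized = [p + c for p, c in zip(predicted, stage1[i1])]
    indices = [m_idx, i1]
    if stage2 is not None:
        # claim 71: second residual and second codevector
        residual2 = [r - c for r, c in zip(residual1, stage1[i1])]
        i2 = nearest(stage2, residual2)
        quantized = [q + c for q, c in zip(quantized, stage2[i2])]
        indices.append(i2)
    return indices, quantized
```

The returned vector is what would be stored as the previous frame's normalized quantized energy when the next frame is processed.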

73. In a perceptual audio coding transmitter, a method for vector quantizing the MDCT coefficients, said coefficients belonging to bands, said method comprising the steps of:
(A) receiving MDCT coefficients for each band;
(B) for each band:
(B.1) selecting a codevector that is the best match to the received MDCT coefficients for that band from a codebook;
(C) transmitting the indices for the selected codevectors to the receiver.

74. The method of claim 73 wherein the step of selecting a codevector from a codebook that is the best match to the received MDCT coefficients for that band further comprises the step of selecting the codevector that minimizes the energy between the codevector coefficients and the dead zone.

75. The method of claim 74 wherein the codevector that minimizes the energy between the codevector coefficients and the dead zone satisfies the equation:

$$D_i = \sum_{k} \max\left[0,\; E_k(i) - t_{iu}\right]$$

where the sum is taken over all coefficients k in the i-th critical band and the max function takes the larger of its two arguments.
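
A minimal sketch of this dead-zone distortion and the resulting codevector search follows. It assumes that E_k(i) is the squared error between a coefficient and the corresponding gain-scaled codevector element, which is the form that appears in the centroid expression of claim 82, and that a single threshold value applies to every coefficient of the band. The function names and argument layout are hypothetical.

```python
def masked_distortion(coeffs, codevector, gain, threshold):
    """Sum over the band of max(0, error energy - masking threshold).

    Error energy is assumed to be (X_k - sqrt(G) * Xhat_k)^2.
    """
    scale = gain ** 0.5
    return sum(
        max(0.0, (x - scale * c) ** 2 - threshold)
        for x, c in zip(coeffs, codevector)
    )


def select_codevector(coeffs, codebook, gain, threshold):
    """Index of the codevector with the smallest masked distortion."""
    return min(
        range(len(codebook)),
        key=lambda n: masked_distortion(coeffs, codebook[n], gain, threshold),
    )
```

Any error that stays below the threshold contributes nothing, so several codevectors can tie at zero distortion; in this sketch the lowest index wins such ties.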

76. The method of claim 73 further comprising the steps of:
(A.1) receiving an indication of the number of bits, b, used to represent the codevector index for each band; and,
(A.2) selecting a codevector for the band from a codebook having 2^b codevectors.

77. The method of claim 73 further comprising the steps of:
(A.1) receiving an indication of the number of bits, b, used to represent the codevector index for each band; and,
(A.2) selecting a codevector for the band from the first 2^b codevectors in the codebook.
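
The embedded search of claim 77 can be illustrated with a short sketch: the same stored codebook serves every bit allocation, and only the first 2^b entries are searched. The distortion function is passed in as a parameter, and the names used here are assumptions rather than anything defined in the claims.

```python
def search_embedded(coeffs, codebook, bits, distortion):
    """Search only the first 2**bits codevectors of an embedded codebook."""
    size = 1 << bits
    if size > len(codebook):
        raise ValueError("codebook holds fewer than 2**bits codevectors")
    return min(range(size), key=lambda n: distortion(coeffs, codebook[n]))


def squared_error(a, b):
    """Plain squared error, used here only as a stand-in distortion."""
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

For example, with a four-entry codebook, `bits=1` restricts the search to entries 0 and 1, while `bits=2` searches all four.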

78. The method of claim 73 wherein at least one band comprises a plurality of critical bands.

79. In a perceptual audio coding system, a method of training the codebook in which the distortion measure used to select the codebook vectors for the codebook is calculated using the masking threshold.

80. The method of claim 79 further comprising the steps of:
(A) producing a set of training vectors;
(B) calculating from each training vector a set of MDCT coefficients;
(C) calculating for each training vector a masking threshold for each band;
(D) making an estimate of codevectors for the codebook;
(E) calculating a distortion measure by calculating the energy of the difference between the MDCT coefficients for the training vector and the deadband surrounding the coefficients for the estimated codevectors;
(F) associating the coefficients within each band of each training vector with the estimated codevector that minimizes said distortion measure;
(G) calculating the centroid of each associated group;
(H) replacing the estimated codevectors by the centroids of each group;
(I) repeating steps (E) - (H) until the difference between successive estimated codevectors is small; and,
(J) populating the codebook with the estimated codevectors.

81. The method of claim 80 wherein the distortion measure is calculated according to the equation:

$$D_i = \sum_{k} \max\left[0,\; E_k(i) - t_{iu}\right]$$

where the sum is taken over all coefficients k in the i-th critical band and the max function takes the larger of its two arguments.

82. The method of claim 80 wherein the centroid for each group is calculated according to the equation:

$$X^{\mathrm{best}}_k(i) = \arg\min \sum \sum \max\left[0,\; \left(X_k(i) - G_i^{0.5}\, X^{\mathrm{best}}_k(i)\right)^2 - t_{iu}\right]$$

where the summation is taken over all training vectors in the j-th Voronoi region.

83. The method of claim 80 wherein the difference between successive estimated codevectors is small when a least squares difference between successive estimated codevectors is less than a threshold value, namely 10^-4.
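
The training loop of claims 80 to 83 resembles a generalized Lloyd iteration driven by the masked distortion. The sketch below makes several simplifying assumptions that are not in the claims: the gain is taken as 1, each training vector carries a single threshold, and the centroid is approximated by choosing, from the group mean and the group members, the candidate with the lowest total masked distortion rather than solving the minimization of claim 82 exactly. All names are hypothetical.

```python
def masked_distortion(x, c, threshold):
    """Masked distortion with the gain assumed equal to 1."""
    return sum(max(0.0, (a - b) ** 2 - threshold) for a, b in zip(x, c))


def train_codebook(training, thresholds, initial, tol=1e-4, max_iter=100):
    """Iterate steps (E) to (H) until successive codebooks differ by less than tol."""
    codebook = [list(c) for c in initial]          # (D) initial estimate
    for _ in range(max_iter):
        # (E)/(F) associate each training vector with its best codevector
        groups = [[] for _ in codebook]
        for x, t in zip(training, thresholds):
            j = min(range(len(codebook)),
                    key=lambda n: masked_distortion(x, codebook[n], t))
            groups[j].append((x, t))
        # (G)/(H) replace each codevector by an approximate centroid
        updated = []
        for old, group in zip(codebook, groups):
            if not group:
                updated.append(old)                # keep codevectors with empty groups
                continue
            mean = [sum(col) / len(group) for col in zip(*(x for x, _ in group))]
            candidates = [mean] + [list(x) for x, _ in group]
            updated.append(min(
                candidates,
                key=lambda c: sum(masked_distortion(x, c, t) for x, t in group),
            ))
        # (I) least squares difference between successive codebooks
        change = sum((a - b) ** 2
                     for old, new in zip(codebook, updated)
                     for a, b in zip(old, new))
        codebook = updated
        if change < tol:
            break
    return codebook                                # (J)
```

The default tolerance mirrors the 10^-4 figure of claim 83; `max_iter` is an added safeguard against non-convergence.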

84. A method for creating an embedded codebook comprising the steps of:
(A) training a codebook having 2^d codevectors;
(B) training a codebook having 2^e codevectors, where e is less than d;
(C) finding the codevectors in the 2^d element codebook closest to the codevectors in the 2^e element codebook; and,
(D) sorting the 2^d codevectors so that the closest 2^e are placed in the first 2^e portion of the codebook.

85. The method of claim 84 wherein the step of finding the codevectors in the 2^d element codebook closest to the codevectors in the 2^e element codebook comprises the steps of:
(C.1) calculating the mean square difference between each codevector in the 2^d element codebook and each of the codevectors in the 2^e element codebook; and,
(C.2) selecting the codevector in the 2^d element codebook which has the least mean square difference to each codevector in the 2^e element codebook.
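
The reordering approach of claims 84 and 85 might be sketched as follows. The claims do not say what happens when two small-codebook entries share the same nearest large-codebook entry; this sketch assumes each large-codebook entry may be picked only once and moves on to the next nearest unused entry. The function names are hypothetical.

```python
def mean_square_difference(a, b):
    """(C.1) mean square difference between two codevectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def embed_by_reordering(large, small):
    """Reorder `large` so its entries closest to `small` come first."""
    chosen = []
    for s in small:
        # (C.2) nearest entry of the large codebook not already chosen (assumption)
        best = min(
            (n for n in range(len(large)) if n not in chosen),
            key=lambda n: mean_square_difference(large[n], s),
        )
        chosen.append(best)
    # (D) chosen entries first, remaining entries after them in original order
    rest = [n for n in range(len(large)) if n not in chosen]
    return [large[n] for n in chosen + rest]
```

After reordering, searching only the leading entries of the large codebook approximates a search of the small codebook.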

86. A method for creating an embedded codebook comprising the steps of:
(A) training a codebook having 2^f codevectors;
(B) estimating (2^g - 2^f) additional codevectors, where g is greater than f;
(C) forming a set of 2^g codevectors from step (A) and from the (2^g - 2^f) additional estimated codevectors from step (B);
(D) determining the Voronoi regions for said set;
(E) determining the centroid of the Voronoi regions for the (2^g - 2^f) additional estimated codevectors;
(F) replacing the additional estimated codevectors by the centroids of their Voronoi regions;
(G) repeating steps (D) - (F) until the difference between successive additional estimated codevectors is small; and,
(H) populating a new 2^g element codebook with the 2^f codevectors from step (A) in the bottom 2^f positions of said new 2^g element codebook and populating the (2^f + 1) to 2^g positions of the codebook with the additional estimated codevectors.
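
The growing approach of claim 86 keeps the already trained codevectors fixed and refines only the added ones. The sketch below assumes that Voronoi regions are formed from a set of training vectors under squared Euclidean distance and that the centroid is the arithmetic mean; the claim itself does not fix either choice, and a masked distortion could be substituted. Names and default values are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (assumed distortion for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def grow_codebook(base, extra, training, tol=1e-4, max_iter=100):
    """Refine the `extra` codevectors while leaving `base` untouched."""
    extra = [list(c) for c in extra]               # (B) initial estimates
    for _ in range(max_iter):
        full = base + extra                        # (C) combined set
        # (D) Voronoi regions: assign each training vector to its nearest codevector
        regions = [[] for _ in full]
        for x in training:
            j = min(range(len(full)), key=lambda n: sq_dist(full[n], x))
            regions[j].append(x)
        # (E)/(F) update only the additional codevectors
        updated = []
        for old, region in zip(extra, regions[len(base):]):
            if region:
                updated.append([sum(col) / len(region) for col in zip(*region)])
            else:
                updated.append(old)                # empty region: keep the estimate
        # (G) stop when successive estimates barely change
        change = sum(sq_dist(o, n) for o, n in zip(extra, updated))
        extra = updated
        if change < tol:
            break
    return base + extra                            # (H) base entries occupy the first positions
```

Because the first positions are never modified, a decoder using only the smaller codebook remains compatible with the larger one.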