EP1172803A2 - Vector quantization system and method of operation - Google Patents
- Publication number
- EP1172803A2
- Authority
- EP
- European Patent Office
- Legal status
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
- G10L19/07—Line spectrum pair [LSP] vocoders
Definitions
- the current output of inverse quantizer 62 is multiplied by t0 at a multiplier 68 and provided to an adder 70.
- the previous output of inverse quantizer 62, stored at a delay 72, is multiplied by a tap t1 at a multiplier 74 and provided to adder 70.
- the twice-previous output of inverse quantizer 62, stored at a delay 76, is multiplied by a tap t2 at a multiplier 78 and provided to adder 70.
- Adder 70 adds all three inputs and provides the result to an adder 80.
- the output of adder 70 represents the current quantization error component of the output LSF.
- the previous output of inverse quantizer 62, stored at delay 72, is multiplied by tap t1 at a multiplier 82 and provided to adder 58.
- the twice-previous output of inverse quantizer 62, stored at delay 76, is multiplied by tap t2 at a multiplier 84 and provided to adder 58.
- Adder 58 adds the two inputs and provides the result to subtractor 56.
- the output of adder 58 represents the current predicted estimate of the LSF.
- the current output of inverse quantizer 62 is also provided to an ordered series of n delays 86, the i-th of which stores the i-th previous output of inverse quantizer 62.
- Each of these n previous values is then multiplied by 1/n at a series of multipliers 88, thereby giving every value equal gain, and the products are provided to adder 54, where they are added together with a predetermined estimate of the mean of the LSF stored at a delay 90.
- the value of this predetermined estimate may be determined from training data and represents an initial estimate of the LSF means.
- the output of adder 54 is then provided to adder 80 as well as subtractor 52.
- the output of adder 54 represents the current short-term mean value associated with the LSF.
- the current quantization error component of the LSF is preferably added to the current short-term LSF mean value at adder 80 to provide an approximation of the LSF input.
- at decoder 66, elements 68'-90' preferably operate in the manner described hereinabove for correspondingly numbered elements 68-90, with the notable exceptions that adder 54' provides input only to adder 80', delay 72' provides input only to multiplier 74', and delay 76' provides input only to multiplier 78'.
- Fig. 3 is a simplified graph illustration showing Mean Spectral Distortion Performance (dB) of a conventional third-order MA Predictive LSF VQ with Fixed Means, represented by a plot 100, the Moving Average Mean Adaptation of Fig. 2, represented by a plot 102, and the Backwards Adapted Means of Fig. 1, represented by a plot 104.
- Fig. 3 shows the spectral distortion figures for three identical third-order moving average predictive quantizers (MA-PVQs) plotted with and without adaptation of the mean values as described hereinabove.
- the test file that was used comprised 8,000 frames each of flat filtered speech, Intermediate Reference System (IRS) filtered speech, and modified IRS filtered speech.
- the training data that was used for all three quantizers was IRS filtered.
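The Fig. 2 encoder steps described above (multipliers 68 to 84, delays 72, 76 and 86, multipliers 88 and adders 54, 58, 70 and 80) can be sketched as one state object of which the encoder and the decoder each hold a copy. This is a hedged illustration, not the patent's implementation: the tap values, the 37-frame mean window, the 10-element LSF vector and the uniform scalar quantizer standing in for quantizer 60 and inverse quantizers 62/64 are all hypothetical, and the names `Fig2State`, `encode` and `reconstruct_and_update` are invented here.

```python
import numpy as np

# Hypothetical parameters: the source leaves taps, window length and quantizer open.
T0, T1, T2 = 1.0, 0.6, 0.3   # short MA predictor taps (illustrative values)
N_MEAN = 37                  # number of delays 86 in the long mean predictor
STEP = 0.01                  # step of a stand-in uniform scalar quantizer


class Fig2State:
    """Predictor and mean state; encoder 50 and decoder 66 each keep a copy."""

    def __init__(self, mu0):
        self.mu0 = np.asarray(mu0, dtype=float)          # training mean (delay 90)
        self.e1 = np.zeros_like(self.mu0)                # delay 72
        self.e2 = np.zeros_like(self.mu0)                # delay 76
        self.hist = np.zeros((N_MEAN, self.mu0.size))    # delays 86, circular
        self.pos = 0

    def short_term_mean(self):
        # multipliers 88 (gain 1/n each) and adder 54
        return self.mu0 + self.hist.sum(axis=0) / N_MEAN

    def prediction(self):
        # multipliers 82, 84 and adder 58
        return T1 * self.e1 + T2 * self.e2

    def reconstruct_and_update(self, idx):
        e_hat = idx * STEP                                    # inverse quantizer
        # adders 70 and 80, using the mean and prediction of the current frame
        lsf_hat = self.short_term_mean() + T0 * e_hat + self.prediction()
        self.e2, self.e1 = self.e1, e_hat                     # shift delays 72, 76
        self.hist[self.pos] = e_hat                           # newest into delays 86
        self.pos = (self.pos + 1) % N_MEAN
        return lsf_hat


def encode(lsf, st):
    # subtractors 52 and 56, divider 92, then quantizer 60
    resid = lsf - st.short_term_mean() - st.prediction()
    idx = np.round(resid / T0 / STEP).astype(int)
    st.reconstruct_and_update(idx)                            # local decoding loop
    return idx


rng = np.random.default_rng(1)
mu0 = np.linspace(0.05, 0.45, 10)        # hypothetical training means
enc, dec = Fig2State(mu0), Fig2State(mu0)
max_err = 0.0
for _ in range(200):
    lsf = mu0 + 0.02 + 0.005 * rng.standard_normal(10)   # input offset from mu0
    idx = encode(lsf, enc)                                # only idx is transmitted
    lsf_hat = dec.reconstruct_and_update(idx)
    max_err = max(max_err, float(np.max(np.abs(lsf_hat - lsf))))
```

Because the encoder runs the same update as the decoder on the same indices, the two states stay identical, and with this stand-in quantizer the reconstruction error never exceeds half a quantizer step times t0.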
Description
- The present invention relates to speech encoding and decoding systems in general, and more particularly to systems and methods for vector quantization of line spectral frequencies.
- One of the challenges of designing vector quantizers for linear predictive coding (LPC) speech filters is making them robust to variations in spectral balance, as well as variations between speakers. Spectral balance variations may have several sources, but the dominant sources are the spectral responses of microphones and of anti-aliasing filters, which can vary quite considerably. In order to account for these variations it is common to train a quantizer for an LPC filter for use with a wide variety of speech input, recorded from many different sources.
- Conventional vector quantization (VQ) training methods use speech from as many different sources as possible, in an attempt to provide robust performance for many different input spectra. However, this approach is disadvantageous in that training is relatively slow and complex, as many speech samples are required. This approach furthermore generally results in a quantizer which is not optimal for any one filtering condition.
- One method for handling spectral balance variations is the Microphone/Speaker Adaptation (MSA) method taught by Aarskog et al. In this method, the average spectrum of speech input presented to a speech coder is compensated for by an MSA filter prior to further compression by an inverse filter. The speech is subsequently filtered by a complementary filter after decoding. This method is disadvantageous in that it requires two stages of inverse filtering, thus increasing the complexity of the quantizer due to the required autocorrelation function calculations. Furthermore, two LPC filter quantizers are needed, one for the MSA filter and one for the conventional LPC filter. An additional slow-speed data path is also needed to convey the quantized MSA filter parameters from encoder to decoder.
- The following publications are believed to be descriptive of the current state of the art of speech encoding and decoding systems in general, of line spectral frequency vector quantization in particular, and of terms related thereto:
- A. Aarskog, A. Nilsen, O. Berg, and H. C. Gruen, "A Long-Term Predictive ADPCM Coder with Short-Term Prediction and Vector Quantization," 1991 International Conference on Acoustics, Speech, and Signal Processing, ICASSP-91, vol. 1, pp. 37-40;
- A. Aarskog and H. C. Gruen, "Predictive Coding of Speech Using Microphone/Speaker Adaptation and Vector Quantization," IEEE Transactions on Speech and Audio Processing, April 1994, vol. 2, pp. 266-273;
- W.B. Kleijn and K.K. Paliwal, "Speech Coding and Synthesis," Elsevier Press, 1995.
- The disclosures of all patents, patent applications, and other publications mentioned in this specification are hereby incorporated by reference.
- The present invention seeks to provide improved systems and methods for vector quantization that account for spectral balance variations while avoiding the limitations of the prior art.
- A quantization system and method are disclosed that achieve similar objective performance, in terms of mean spectral distortion and outliers, for speech within and outside the training database, and similar quantizer performance for different types of speech largely irrespective of the spectral balance. The present invention exploits properties of line-spectrum pairs to yield a robust quantizer with superior performance under various conditions. The present invention further discloses a more error-robust system and method for deriving adaptable mean values based upon previous quantizer decisions in a uniform gain moving average fashion.
- The present invention is an extension to mean-removed vector quantization and is equally applicable to both auto-regressive and moving average predictive vector quantization. A system and method are disclosed for slow averaging of the positions of the inverse quantized line spectral frequencies (LSFs) using a series of simple filters (one per LSF) with one or more long time constants.
- There is thus provided in accordance with a preferred embodiment of the present invention a method of providing robust quantization of speech spectral parameters tolerant to spectral balance and speaker variations, the method including the steps of, for each of a plurality of line spectral frequencies (LSFs) of a speech spectrum, quantizing the displacement of the LSF from an estimate of its long-term mean, reconstructing an estimate of the LSF from the quantized displacement and the long-term LSF mean estimate, and filtering the reconstructed LSF estimate, thereby providing a subsequent long-term LSF mean estimate.
- Further in accordance with a preferred embodiment of the present invention the filtering step includes filtering the reconstructed LSF estimate using a first-order recursive filter.
- Still further in accordance with a preferred embodiment of the present invention, the first-order recursive filter is of unity gain and employs a time constant of about 1 second for the LSF.
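As a worked illustration of the unity-gain first-order recursive filter, write $m_k$ for the LSF mean estimate used in frame $k$, $\hat f_k$ for the reconstructed LSF and $X$ for the filter value (the notation is chosen here; the Fig. 1 description quotes $X = 0.037$). The update and its transfer function from $\hat f_k$ to the next mean estimate are:

$$m_{k+1} = X\,\hat f_k + (1 - X)\,m_k, \qquad H(z) = \frac{X\,z^{-1}}{1 - (1 - X)\,z^{-1}}, \qquad H(1) = \frac{X}{1 - (1 - X)} = 1.$$

After a step change in the input, the mean estimate closes a fraction $1 - (1 - X)^k$ of the gap within $k$ frames, so the time constant measured in frames is

$$\tau = -\frac{1}{\ln(1 - X)} \approx \frac{1}{X}, \qquad X = 0.037 \;\Rightarrow\; \tau = -\frac{1}{\ln 0.963} \approx 26.5 \text{ frames}.$$

Converting this to seconds requires the frame period, which this section does not state.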
- There is also provided in accordance with a preferred embodiment of the present invention a method of quantizing speech spectral parameters that is tolerant to spectral balance and speaker variations, the method including the steps of, for each of a plurality of line spectral frequencies (LSFs) of a speech spectrum, at an encoder a) quantizing the difference between the LSF and a current LSF mean value estimate, and at the encoder and a decoder b) dequantizing the difference, c) adding the dequantized difference to a current LSF mean value estimate, thereby providing an approximation of the LSF, and d) filtering the quantized LSF together with the current LSF mean value estimate, thereby providing a new current LSF mean value estimate.
- There is additionally provided in accordance with a preferred embodiment of the present invention a method of quantizing speech spectral parameters that is tolerant to spectral balance and speaker variations, the method including the steps of, for each of a plurality of line spectral frequencies (LSFs) of a speech spectrum, at an encoder a) quantizing a prediction error derived from the LSF from which a current short-term LSF mean value and a current moving average predicted LSF estimate have been subtracted, and at the encoder and a decoder b) dequantizing the prediction error, c) determining a next-current short-term LSF mean value from the dequantized prediction error and at least one previously dequantized prediction error, and d) determining a next-current moving average predicted LSF estimate from the dequantized prediction error and at least one previously dequantized prediction error.
- Further in accordance with a preferred embodiment of the present invention the next-current short-term LSF mean value is the sum of a training data derived mean and a moving average of a plurality of previously dequantized prediction error values.
- Still further in accordance with a preferred embodiment of the present invention, equal gains are assigned to each dequantized prediction error value.
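The Fig. 2 signal flow (subtractors 52 and 56, divider 92, adders 54, 58, 70 and 80) can be summarized per LSF and per frame $k$ as follows. The notation is chosen here for illustration: $f_k$ is the input LSF, $m_k$ the short-term LSF mean from adder 54, $p_k$ the moving average predicted estimate from adder 58, $Q$ the quantizer and $Q^{-1}$ its inverse, $\hat e_k$ the dequantized prediction error, $n$ the number of delays 86, and $\mu_0$ the training-derived mean estimate stored at delay 90.

$$e_k = \frac{f_k - m_k - p_k}{t_0}, \qquad \hat e_k = Q^{-1}\big(Q(e_k)\big)$$

$$p_k = t_1\,\hat e_{k-1} + t_2\,\hat e_{k-2}, \qquad m_k = \mu_0 + \frac{1}{n}\sum_{i=1}^{n} \hat e_{k-i}$$

$$\hat f_k = m_k + t_0\,\hat e_k + p_k$$

Because $m_k$ and $p_k$ depend only on previously dequantized errors, the decoder can form both without side information, and a corrupted index influences $m_k$ for at most $n$ frames.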
- There is also provided in accordance with a preferred embodiment of the present invention apparatus for providing robust quantization of speech spectral parameters tolerant to spectral balance and speaker variations, the apparatus including means for quantizing the displacement of a line spectral frequency (LSF) from an estimate of its long-term mean, means for reconstructing an estimate of the LSF from the quantized displacement and the long-term LSF mean estimate, and means for filtering the reconstructed LSF estimate, thereby providing a subsequent long-term LSF mean estimate.
- Further in accordance with a preferred embodiment of the present invention the filtering means includes a first-order recursive filter.
- Still further in accordance with a preferred embodiment of the present invention the first-order recursive filter is of unity gain and employs a time constant of about 1 second for the LSF.
- There is additionally provided in accordance with a preferred embodiment of the present invention apparatus for quantizing speech spectral parameters that is tolerant to spectral balance and speaker variations, the apparatus including an encoder including means for quantizing the difference between a line spectral frequency (LSF) and a current LSF mean value estimate, means for dequantizing the difference, means for adding the dequantized difference to a current LSF mean value estimate, thereby providing an approximation of the LSF, and means for filtering the quantized LSF together with the current LSF mean value estimate, thereby providing a new current LSF mean value estimate, and a decoder including means for dequantizing the difference, means for adding the dequantized difference to a current LSF mean value estimate, thereby providing an approximation of the LSF, and means for filtering the quantized LSF together with the current LSF mean value estimate, thereby providing a new current LSF mean value estimate.
- There is also provided in accordance with a preferred embodiment of the present invention apparatus for quantizing speech spectral parameters that is tolerant to spectral balance and speaker variations, the apparatus including an encoder including means for quantizing a prediction error derived from the LSF from which a current short-term LSF mean value and a current moving average predicted LSF estimate have been subtracted, means for dequantizing the prediction error, means for determining a next-current short-term LSF mean value from the dequantized prediction error and at least one previously dequantized prediction error, and means for determining a next-current moving average predicted LSF estimate from the dequantized prediction error and at least one previously dequantized prediction error and the current short-term LSF mean value, and a decoder including means for dequantizing the prediction error, means for determining a next-current short-term LSF mean value from the dequantized prediction error and at least one previously dequantized prediction error, and means for determining a next-current moving average predicted LSF estimate from the dequantized prediction error and at least one previously dequantized prediction error.
- Further in accordance with a preferred embodiment of the present invention the next-current short-term LSF mean value is the sum of a training data derived mean and a moving average of a plurality of previously dequantized prediction error values.
- Still further in accordance with a preferred embodiment of the present invention, equal gains are assigned to each dequantized prediction error value.
- The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the appended drawings in which:
- Fig. 1 is a simplified illustration of a system for backwards-adaptive vector quantization of line spectral frequencies (LSF), constructed and operative in accordance with a preferred embodiment of the present invention;
- Fig. 2 is a simplified illustration of a system for backwards-adaptive vector quantization of line spectral frequencies (LSF), constructed and operative in accordance with another preferred embodiment of the present invention; and
- Fig. 3 is a simplified graph illustration showing mean spectral distortion performance (dB) of a conventional quantizer with fixed means and of the systems of Figs. 2 and 1, which use moving average mean adaptation and backwards adapted means, respectively.
- Reference is now made to Fig. 1, which is a simplified illustration of a system for backwards-adaptive vector quantization of line spectral frequencies (LSF), constructed and operative in accordance with a preferred embodiment of the present invention. In the system of Fig. 1 LSFs are quantized with their previous long-term mean values removed, using any conventional VQ technique, such as memoryless, AR predictive, MA predictive, or other suitable technique. The same long-term mean value is used during encoding and decoding. As each quantization process is performed, the long-term average value of the LSF changes at both the encoder and decoder. In this way, the quantizer adapts to long-term variations in the LSFs.
- In the system of Fig. 1, line spectral frequencies of a speech spectrum are provided to an encoder, generally referenced 10. A subtractor 12 subtracts the current estimate of the mean value associated with the LSF from the LSF input. A quantizer 14 then quantizes the difference of the LSF from its mean value by selecting an appropriate codebook index in accordance with any known and suitable quantization means. The quantization index is then provided to inverse quantizers 16 and 18 at encoder 10 and a decoder, generally referenced 20, respectively. Inverse quantizer 18 dequantizes the quantization index using any known and suitable means to determine an associated LSF. An adder 22 adds the current estimate of the mean value associated with the LSF back into the LSF determined at inverse quantizer 18, thus providing an approximation of the LSF input to encoder 10.
- The quantized LSF from adder 22, in addition to being used during subsequent speech encoding and decoding, is provided to a simple, first-order filter where the LSF is multiplied by a filter value X at a multiplier 26. The previous estimate of the LSF mean value, held at a delay 28, is multiplied by a filter value 1-X at a multiplier 30. The result of multiplier 30 is then added to the result from multiplier 26 at an adder 32. The result from adder 32 represents the current estimate of the LSF mean value and is stored in delay 28.
- The time constant used in the system of Fig. 1 may in principle take any value; however, the filter value X is preferably chosen so that the time constant of the filter is long compared to the maximum duration of steady-state vowels. This ensures that the filter removes the slowly varying spectral shape rather than the utterance-to-utterance variations, i.e., the fast spectral variations of normal speech, which typically do not exceed a few hundred milliseconds. Conversely, too long a time constant slows adaptation to new speakers. Experimentation has shown that a time constant of approximately 1 second, corresponding to a filter value of X = 0.037, provides satisfactory performance. Where errors may occur in quantizer index transmission, the time constant is preferably selected to minimize the error propagation stemming from the use of an infinite-memory recursive filter.
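The Fig. 1 loop can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patent's implementation: the source allows any quantizer, so a hypothetical uniform scalar quantizer (`quantize`/`dequantize`, step 0.01) stands in for quantizer 14 and inverse quantizers 16/18, and the 10-element LSF vector, initial means and synthetic input are illustrative. What it shows is that encoder and decoder run the same mean-update filter on the same dequantized data, so their mean estimates stay identical without side information.

```python
import numpy as np

X = 0.037      # filter value quoted in the Fig. 1 description
STEP = 0.01    # hypothetical quantizer step size (not from the source)


def quantize(diff):
    """Hypothetical stand-in for quantizer 14: uniform scalar quantizer."""
    return np.round(diff / STEP).astype(int)


def dequantize(idx):
    """Hypothetical stand-in for inverse quantizers 16 and 18."""
    return idx * STEP


class MeanTracker:
    """Holds the long-term LSF mean estimate (delay 28 or delay 42) and applies
    the unity-gain first-order recursive update after each frame."""

    def __init__(self, initial_mean, x=X):
        self.mean = np.array(initial_mean, dtype=float)  # private copy
        self.x = x

    def reconstruct_and_update(self, idx):
        # adder 22 / 34: add the current mean back to the dequantized displacement
        lsf_hat = self.mean + dequantize(idx)
        # multipliers 26, 30 and adder 32 (or 38, 44, 40): new mean estimate
        self.mean = self.x * lsf_hat + (1.0 - self.x) * self.mean
        return lsf_hat


def encode_frame(lsf, enc):
    """Encoder 10: subtractor 12 and quantizer 14, then the local decoding loop."""
    idx = quantize(lsf - enc.mean)
    enc.reconstruct_and_update(idx)
    return idx


# Illustrative run: the input LSFs sit 0.02 above the initial (training) means,
# e.g. because of a different microphone response.
rng = np.random.default_rng(0)
init = np.linspace(0.05, 0.45, 10)   # hypothetical initial means
true_mean = init + 0.02
enc, dec = MeanTracker(init), MeanTracker(init)
for _ in range(300):
    lsf = true_mean + 0.005 * rng.standard_normal(10)
    idx = encode_frame(lsf, enc)           # only idx is transmitted
    lsf_hat = dec.reconstruct_and_update(idx)

print(np.max(np.abs(enc.mean - true_mean)))
```

After a few hundred frames both trackers have moved from the initial means to the shifted long-term mean, which is the adaptation to spectral balance the text describes.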
- Inverse quantizer 16 likewise dequantizes the quantization index to determine an associated LSF, which is then provided to an adder 34 and to a simple, first-order filter comprising a multiplier 38, an adder 40, a delay 42 and a multiplier 44. These elements operate in the manner described above for adder 22, multiplier 26, delay 28, multiplier 30 and adder 32, with the notable exception that the estimate of the LSF mean value in delay 42 is provided to subtractor 12 as well as to adder 34.
- Reference is now made to Fig. 2, which is a simplified illustration of a system for backwards-adaptive vector quantization of line spectral frequencies (LSF), constructed and operative in accordance with another preferred embodiment of the present invention. In the system of Fig. 2, the LSF means are derived from a relatively long moving average predictor, whose finite memory overcomes the problems associated with infinite error propagation, and this mean predictor is incorporated within a conventional third-order (short) moving average predictive vector quantizer. The system of Fig. 2 may be implemented using a rectangular-window moving average predictor for the calculation of the LSF means, for example one that is about 750 ms long (a relatively long predictor). This is easily achieved by employing a circular buffer containing the quantizer indices from previous decisions.
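A rectangular-window moving average kept in a circular buffer of quantizer indices might look like the sketch below. It is illustrative only: it assumes a hypothetical uniform scalar quantizer with step `STEP`, so that buffered indices map linearly to values, and the class name `CircularMovingAverage` is made up. The window length `n` is counted in frames; converting the suggested 750 ms into frames requires the frame period, which this passage does not give. The property it demonstrates is finite memory: a corrupted index influences the average for exactly `n` frames and then leaves the window, unlike the infinite-memory recursive filter of Fig. 1.

```python
from collections import deque
from typing import Deque, List

STEP = 0.01  # hypothetical uniform quantizer step; index i stands for the value i * STEP


class CircularMovingAverage:
    """Rectangular-window moving average over the last n quantizer decisions.

    Quantizer indices are buffered, as the text suggests. With a uniform scalar
    quantizer the running sum of integer indices is exact, so no rounding error
    accumulates however long the coder runs.
    """

    def __init__(self, n: int, dim: int) -> None:
        self.n = n  # window length in frames
        # Start with n all-zero decisions, i.e. no displacement from the initial mean.
        self.buf: Deque[List[int]] = deque([[0] * dim for _ in range(n)], maxlen=n)
        self.sums = [0] * dim  # running integer sum of the buffered indices per LSF

    def push(self, indices: List[int]) -> None:
        oldest = self.buf[0]  # the decision about to leave the window
        self.sums = [s + new - old for s, new, old in zip(self.sums, indices, oldest)]
        self.buf.append(list(indices))  # maxlen=n evicts `oldest` automatically

    def average(self) -> List[float]:
        # Every buffered decision gets the same weight 1/n.
        return [STEP * s / self.n for s in self.sums]


# Usage: one large (possibly corrupted) index affects the average for exactly n frames.
ma = CircularMovingAverage(n=4, dim=1)
ma.push([40])
history = [ma.average()[0]]
for _ in range(4):
    ma.push([0])
    history.append(ma.average()[0])
```

Here `history` holds 0.1 for four consecutive frames and then returns to 0.0 once the large index has been pushed out of the buffer.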
- In the system of Fig. 2, line spectral frequencies of a speech spectrum are provided to an encoder, generally referenced 50. A subtractor 52 receives the current short-term mean value associated with the LSF from an adder 54 and subtracts it from the LSF. A subtractor 56 then receives the current moving average predicted estimate of the LSF from an adder 58 and subtracts it from the output of subtractor 52. The output of subtractor 56 is then divided by a tap t0 of the short MA predictor at a divider 92 to provide a prediction error, which is quantized at a quantizer 60 using any known and suitable quantization means. The quantization index is then provided to inverse quantizers 62 and 64 of encoder 50 and a decoder, generally referenced 66, respectively. Inverse quantizer 62 dequantizes the quantization index using any known and suitable means to determine the associated dequantized prediction error. The taps of the short moving average predictor (t0, t1 and t2) may be determined by any reasonable technique, but are ideally optimized jointly with the relatively long moving average LSF mean predictor in operation.
- The current output of inverse quantizer 62 is multiplied by t0 at a multiplier 68 and provided to an adder 70. The previous output of inverse quantizer 62, stored at a delay 72, is multiplied by a tap t1 at a multiplier 74 and provided to adder 70. The twice-previous output of inverse quantizer 62, stored at a delay 76, is multiplied by a tap t2 at a multiplier 78 and provided to adder 70. Adder 70 adds all three inputs and provides the result to an adder 80. The output of adder 70 represents the current quantization error component of the output LSF.
- The previous output of inverse quantizer 62, stored at delay 72, is also multiplied by tap t1 at a multiplier 82 and provided to adder 58. The twice-previous output of inverse quantizer 62, stored at delay 76, is multiplied by tap t2 at a multiplier 84 and provided to adder 58. Adder 58 adds the two inputs and provides the result to subtractor 56. The output of adder 58 represents the current predicted estimate of the LSF.
- The current output of inverse quantizer 62 is also provided to an ordered series of n delays 86, the k-th delay storing the k-th previous output of inverse quantizer 62. Each of the n stored values is multiplied by 1/n at a series of multipliers 88, so that every stored value receives equal gain, and the products are provided to adder 54, where they are added together with a predetermined estimate µ of the LSF mean stored at a delay 90. The value of µ may be determined from training data and represents an initial estimate of the LSF means. The output of adder 54 is provided to adder 80 as well as to subtractor 52, and represents the current short-term mean value associated with the LSF.
- The current quantization error component of the LSF is preferably added to the current short-term LSF mean value at adder 80 to provide an approximation of the LSF input.
- At decoder 66, elements 68' to 90' preferably operate in the manner described above for the correspondingly numbered elements 68 to 90, with the notable exceptions that adder 54' provides input only to adder 80', delay 72' provides input only to multiplier 74', and delay 76' provides input only to multiplier 78'.
- Experimentation with the system of Fig. 2 has shown that there is little degradation in performance when the moving-average-derived adaptive mean method is applied to a third-order moving average predictive VQ, as compared with a conventional third-order MA predictive VQ without mean adaptation.
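Reading the delays as holding values from previous frames, the Fig. 2 signal flow for one LSF reduces to m[k] = µ + (1/n)·Σ ê[k−j] for j = 1..n (adder 54), p[k] = t1·ê[k−1] + t2·ê[k−2] (adder 58), e[k] = (f[k] − m[k] − p[k]) / t0 (divider 92), and f̂[k] = m[k] + p[k] + t0·ê[k] (adders 70 and 80), where ê is the dequantized prediction error. The Python sketch here implements that recursion under assumptions not found in the text: the tap values `T0`, `T1`, `T2`, the short window `n = 4`, the scalar quantizer (`STEP`) standing in for the vector quantizer, and all class and function names are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Deque, List

STEP = 0.01                   # hypothetical scalar quantizer step (stands in for the VQ)
T0, T1, T2 = 0.6, 0.25, 0.15  # hypothetical MA predictor taps; the text gives no values


def quantize(e: float) -> int:
    return round(e / STEP)


def dequantize(i: int) -> float:
    return i * STEP


class Fig2State:
    """Predictor state; encoder 50 and decoder 66 each keep an identical copy."""

    def __init__(self, mu: List[float], n: int) -> None:
        dim = len(mu)
        self.mu = list(mu)                      # training-data mean estimate (delay 90)
        self.hist: Deque[List[float]] = deque(  # delays 86: last n dequantized errors
            [[0.0] * dim for _ in range(n)], maxlen=n)
        self.e1 = [0.0] * dim                   # delay 72: e_hat[k-1]
        self.e2 = [0.0] * dim                   # delay 76: e_hat[k-2]

    def short_term_mean(self) -> List[float]:
        # Multipliers 88 and adder 54: mu + (1/n) * sum of the n previous e_hat values.
        n = len(self.hist)
        return [mu_d + sum(h[d] for h in self.hist) / n for d, mu_d in enumerate(self.mu)]

    def prediction(self) -> List[float]:
        # Multipliers 82/84 and adder 58: t1 * e_hat[k-1] + t2 * e_hat[k-2].
        return [T1 * a + T2 * b for a, b in zip(self.e1, self.e2)]

    def reconstruct_and_update(self, indices: List[int]) -> List[float]:
        e_hat = [dequantize(i) for i in indices]  # inverse quantizer 62 / 64
        mean, pred = self.short_term_mean(), self.prediction()
        # Adders 70 and 80: m[k] + t0*e_hat[k] + t1*e_hat[k-1] + t2*e_hat[k-2].
        lsf_hat = [m + p + T0 * e for m, p, e in zip(mean, pred, e_hat)]
        # Clock the delay lines for the next frame.
        self.e2, self.e1 = self.e1, e_hat
        self.hist.append(e_hat)
        return lsf_hat


def encode(lsf: List[float], state: Fig2State) -> List[int]:
    # Subtractors 52 and 56, divider 92 and quantizer 60.
    mean, pred = state.short_term_mean(), state.prediction()
    return [quantize((f - m - p) / T0) for f, m, p in zip(lsf, mean, pred)]


# Usage: run encoder and decoder side by side on a short synthetic LSF track.
enc, dec = Fig2State(mu=[0.3, 0.8], n=4), Fig2State(mu=[0.3, 0.8], n=4)
frames = [[0.30 + 0.01 * (k % 5), 0.80 - 0.02 * (k % 3)] for k in range(50)]
enc_out, dec_out = [], []
for f in frames:
    idx = encode(f, enc)                      # only idx is transmitted
    enc_out.append(enc.reconstruct_and_update(idx))
    dec_out.append(dec.reconstruct_and_update(idx))
```

Since the encoder subtracts exactly the m[k] and p[k] that the decoder adds back, the reconstruction error is f̂[k] − f[k] = t0·(ê[k] − e[k]), so in this sketch it never exceeds t0·STEP/2 per LSF.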
- Experimentation has shown that applying the systems of Figs. 1 and 2 leads to significant performance gains, because long-term averaging of the LSF means removes some of the speaker and microphone/anti-aliasing spectral variation present in the input. These gains are shown in Fig. 3, a simplified graph of Mean Spectral Distortion Performance (dB) for a conventional third-order MA predictive LSF VQ with fixed means (plot 100), the moving average mean adaptation of Fig. 2 (plot 102), and the backwards-adapted means of Fig. 1 (plot 104). Fig. 3 shows the spectral distortion figures for three otherwise identical third-order moving average predictive quantizers (MA-PVQs), with and without adaptation of the mean values as described above. The test file comprised 8,000 frames each of flat-filtered speech, Intermediate Reference System (IRS) filtered speech, and modified IRS filtered speech. The training data used for both quantizers was IRS filtered.
- While the methods and apparatus disclosed herein may or may not have been described with reference to specific hardware or software, the methods and apparatus have been described in a manner sufficient to enable persons of ordinary skill in the art to readily adapt commercially available hardware and software as may be needed to reduce any of the embodiments of the present invention to practice without undue experimentation and using conventional techniques.
- While the present invention has been described with reference to a few specific embodiments, the description is intended to be illustrative of the invention as a whole and is not to be construed as limiting the invention to the embodiments shown. It is appreciated that various modifications may occur to those skilled in the art that, while not specifically shown herein, are nevertheless within the true scope of the invention.
Claims (14)
- A method of providing robust quantization of speech spectral parameters tolerant to spectral balance and speaker variations, the method comprising the steps of, for each of a plurality of line spectral frequencies (LSFs) of a speech spectrum: quantizing (14) the displacement (12) of said LSF from an estimate (28) of its long-term mean; reconstructing (22) an estimate of said LSF from said quantized displacement and said long-term LSF mean estimate; and filtering (26,30,32) said reconstructed LSF estimate, thereby providing a subsequent long-term LSF mean estimate (28).
- A method according to claim 1 wherein said filtering step comprises filtering said reconstructed LSF estimate using a first-order recursive filter (26,30,32).
- A method according to claim 2 wherein said first-order recursive filter is of unity gain and employs a time constant of about 1 second for said LSF.
- A method of quantizing speech spectral parameters that is tolerant to spectral balance and speaker variations, the method comprising the steps of, for each of a plurality of line spectral frequencies (LSFs) of a speech spectrum: at an encoder (10): a) quantizing (14) the difference (12) between said LSF and a current LSF mean value estimate; at said encoder (10) and a decoder (20): b) dequantizing (16,18) said difference; c) adding (22,34) said dequantized difference to a current LSF mean value estimate (28,42), thereby providing an approximation of said LSF; and d) filtering (26,30,32;38,40,44) said quantized LSF together with said current LSF mean value estimate (28,42), thereby providing a new current LSF mean value estimate.
- A method of quantizing speech spectral parameters that is tolerant to spectral balance and speaker variations, the method comprising the steps of, for each of a plurality of line spectral frequencies (LSFs) of a speech spectrum: at an encoder (50): a) quantizing (60) a prediction error (92) derived from said LSF from which a current short-term LSF mean value (52) and a current moving average predicted LSF estimate (56) have been subtracted; and at said encoder (50) and a decoder (66): b) dequantizing (62,64) said prediction error; c) determining a next-current short-term LSF mean value (54) from said dequantized prediction error and at least one previously dequantized prediction error (86); and d) determining a next-current moving average predicted LSF estimate (58) from said dequantized prediction error (72) and at least one previously dequantized prediction error (76).
- A method according to claim 5 wherein the next-current short-term LSF mean value (54) is the sum of a training data derived mean (90) and a moving average (88) of a plurality of previously dequantized prediction error values (86).
- A method according to claim 6 wherein equal gains (88) are assigned to each dequantized prediction error value (86).
- Apparatus for providing robust quantization of speech spectral parameters tolerant to spectral balance and speaker variations, said apparatus comprising: means (14) for quantizing the displacement (12) of a line spectral frequency (LSF) from an estimate (28) of its long-term mean; means (22) for reconstructing an estimate of said LSF from said quantized displacement and said long-term LSF mean estimate; and means (26,30,32) for filtering said reconstructed LSF estimate, thereby providing a subsequent long-term LSF mean estimate (28).
- Apparatus according to claim 8 wherein said filtering means comprises a first-order recursive filter (26,30,32).
- Apparatus according to claim 9 wherein said first-order recursive filter is of unity gain and employs a time constant of about 1 second for said LSF.
- Apparatus for quantizing speech spectral parameters that is tolerant to spectral balance and speaker variations, said apparatus comprising: an encoder (10) comprising: means (14) for quantizing the difference (12) between a line spectral frequency (LSF) and a current LSF mean value estimate; means (16) for dequantizing said difference; means (34) for adding said dequantized difference to a current LSF mean value estimate (42), thereby providing an approximation of said LSF; and means (38,40,44) for filtering said quantized LSF together with said current LSF mean value estimate (42), thereby providing a new current LSF mean value estimate; and a decoder comprising: means (18) for dequantizing said difference; means (22) for adding said dequantized difference to a current LSF mean value estimate (28), thereby providing an approximation of said LSF; and means (26,30,32) for filtering said quantized LSF together with said current LSF mean value estimate (28), thereby providing a new current LSF mean value estimate.
- Apparatus for quantizing speech spectral parameters that is tolerant to spectral balance and speaker variations, said apparatus comprising: an encoder (50) comprising: means (60) for quantizing a prediction error (92) derived from said LSF from which a current short-term LSF mean value (52) and a current moving average predicted LSF estimate (56) have been subtracted; means (62) for dequantizing said prediction error; means (54) for determining a next-current short-term LSF mean value from said dequantized prediction error and at least one previously dequantized prediction error (86); and means (58) for determining a next-current moving average predicted LSF estimate from said dequantized prediction error (72) and at least one previously dequantized prediction error (76); and a decoder (66) comprising: means (64) for dequantizing said prediction error; means (54') for determining a next-current short-term LSF mean value from said dequantized prediction error and at least one previously dequantized prediction error (86'); and means (58') for determining a next-current moving average predicted LSF estimate from said dequantized prediction error (72') and at least one previously dequantized prediction error (76').
- Apparatus according to claim 12 wherein the next-current short-term LSF mean value (54) is the sum of a training data derived mean (90) and a moving average (88) of a plurality of previously dequantized prediction error values (86).
- Apparatus according to claim 13 wherein equal gains (88) are assigned to each dequantized prediction error value (86).
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0017145 | 2000-07-13 | ||
GB0017145A GB2364870A (en) | 2000-07-13 | 2000-07-13 | Vector quantization system for speech encoding/decoding |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1172803A2 (en) | 2002-01-16 |
EP1172803A3 (en) | 2004-01-14 |
Family ID: 9895545
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01116530A (EP1172803A3, withdrawn) | 2000-07-13 | 2001-07-09 | Vector quantization system and method of operation |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP1172803A3 (en) |
GB (1) | GB2364870A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5966688A (en) * | 1997-10-28 | 1999-10-12 | Hughes Electronics Corporation | Speech mode based multi-stage vector quantizer |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3033060B2 (en) * | 1988-12-22 | 2000-04-17 | 国際電信電話株式会社 | Voice prediction encoding / decoding method |
US5664053A (en) * | 1995-04-03 | 1997-09-02 | Universite De Sherbrooke | Predictive split-matrix quantization of spectral parameters for efficient coding of speech |
US6081776A (en) * | 1998-07-13 | 2000-06-27 | Lockheed Martin Corp. | Speech coding system and method including adaptive finite impulse response filter |
- 2000-07-13: GB application GB0017145A filed (published as GB2364870A); status: not active, withdrawn
- 2001-07-09: EP application EP01116530A filed (published as EP1172803A3); status: not active, withdrawn
Non-Patent Citations (4)
Title |
---|
AARSKOG A: "A LONG-TERM PREDICTIVE ADPCM CODER WITH SHORT-TERM PREDICTION AND VECTOR QUANTIZATION" IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH & SIGNAL PROCESSING (ICASSP'91), TORONTO, CANADA, vol. 1, 14 - 17 May 1991, pages 37-40, XP000245161 IEEE, New York, USA ISBN: 0-7803-0003-3 * |
CHO INHWAN ET AL: "Predictive pyramid vector quantisation of LSF parameters" ELECTRONICS LETTERS, IEE STEVENAGE, GB, vol. 34, no. 8, 16 April 1998 (1998-04-16), pages 735-736, XP006009612 ISSN: 0013-5194 * |
OHMURO H ET AL: "VECTOR QUANTIZATION OF LSP PARAMETERS USING MOVING AVERAGE INTERFRAME PREDICTION" ELECTRONICS & COMMUNICATIONS IN JAPAN, PART III - FUNDAMENTAL ELECTRONIC SCIENCE, SCRIPTA TECHNICA. NEW YORK, US, vol. 77, no. 10, PART 3, 1 October 1994 (1994-10-01), pages 12-25, XP000527379 ISSN: 1042-0967 * |
SKOGLUND J ET AL: "PREDICTIVE VQ FOR NOISY CHANNEL SPECTRUM CODING: AR OR MA?" IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP'97), MUNICH, GERMANY, vol. 2, 21 - 24 April 1997, pages 1351-1354, XP000822706 IEEE COMP. SOC. PRESS, USA ISBN: 0-8186-7920-4 * |
Also Published As
Publication number | Publication date |
---|---|
GB0017145D0 (en) | 2000-08-30 |
GB2364870A (en) | 2002-02-06 |
EP1172803A3 (en) | 2004-01-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP0503684B1 (en) | Adaptive filtering method for speech and audio | |
CA2483791C (en) | Method and device for efficient frame erasure concealment in linear predictive based speech codecs | |
EP0939394B1 (en) | Apparatus for encoding and apparatus for decoding speech and musical signals | |
US5140638A (en) | Speech coding system and a method of encoding speech | |
EP1202251A2 (en) | Transcoder for prevention of tandem coding of speech | |
EP0501421B1 (en) | Speech coding system | |
US6694018B1 (en) | Echo canceling apparatus and method, and voice reproducing apparatus | |
KR20090039660A (en) | Updating of decoder states after packet loss concealment | |
NO340674B1 (en) | Information signal encoding | |
CA2262787C (en) | Methods and devices for noise conditioning signals representative of audio information in compressed and digitized form | |
US5913187A (en) | Nonlinear filter for noise suppression in linear prediction speech processing devices | |
JPH02155313A (en) | Coding method | |
JP3357795B2 (en) | Voice coding method and apparatus | |
Kroon et al. | Predictive coding of speech using analysis-by-synthesis techniques | |
EP1301018A1 (en) | Apparatus and method for modifying a digital signal in the coded domain | |
US6104994A (en) | Method for speech coding under background noise conditions | |
EP1208413A2 (en) | Coded domain noise control | |
US20130268268A1 (en) | Encoding of an improvement stage in a hierarchical encoder | |
EP1172803A2 (en) | Vector quantization system and method of operation | |
CA2542137C (en) | Harmonic noise weighting in digital speech coders | |
Härmä et al. | Backward adaptive warped lattice for wideband stereo coding | |
Lee | An enhanced ADPCM coder for voice over packet networks | |
JPH04301900A (en) | Audio encoding device | |
KR100392258B1 (en) | Implementation method for reducing the processing time of CELP vocoder | |
Aarskog et al. | Predictive coding of speech using microphone/speaker adaptation and vector quantization |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
AK | Designated contracting states | Kind code of ref document: A2; designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
AX | Request for extension of the European patent | Free format text: AL;LT;LV;MK;RO;SI |
PUAL | Search report despatched | Free format text: ORIGINAL CODE: 0009013 |
AK | Designated contracting states | Kind code of ref document: A3; designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
AX | Request for extension of the European patent | Extension state: AL LT LV MK RO SI |
AKX | Designation fees paid | |
REG | Reference to a national code | Ref country code: DE; ref legal event code: 8566 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
18D | Application deemed to be withdrawn | Effective date: 2004-07-15 |
P01 | Opt-out of the competence of the Unified Patent Court (UPC) registered | Effective date: 2023-05-20 |