EP1035538A2 - Multimodal quantization of the prediction error in a speech coder - Google Patents

Multimodal quantization of the prediction error in a speech coder

Info

Publication number
EP1035538A2
EP1035538A2 EP00200874A EP00200874A EP1035538A2 EP 1035538 A2 EP1035538 A2 EP 1035538A2 EP 00200874 A EP00200874 A EP 00200874A EP 00200874 A EP00200874 A EP 00200874A EP 1035538 A2 EP1035538 A2 EP 1035538A2
Authority
EP
European Patent Office
Prior art keywords
vector
weak
vectors
predictor
strong
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP00200874A
Other languages
English (en)
French (fr)
Other versions
EP1035538A3 (de)
EP1035538B1 (de)
Inventor
Jacek Stachurski
Alan V. Mccree
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Texas Instruments Inc
Original Assignee
Texas Instruments Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Texas Instruments Inc
Publication of EP1035538A2
Publication of EP1035538A3
Application granted
Publication of EP1035538B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes

Definitions

  • the present invention relates generally to the field of electronic devices, and, more particularly, to speech coding, technical transmission, storage, and synthesis circuitry and methods.
  • LPC: linear predictive coding
  • the speech output from such LPC vocoders is not acceptable in many applications because it does not always sound like natural human speech, especially in the presence of background noise. And there is a demand for a speech vocoder with at least telephone quality speech at a bit rate of about 4 Kbps.
  • Various approaches to improve quality include enhancing the estimation of the parameters of a mixed excitation linear prediction (MELP) system and more efficient quantization of them. See Yeldener et al, A Mixed Sinusoidally Excited Linear Prediction coder at 4 kb/s and Below, Proc. IEEE Int. Conf. Acoust.,Speech,Signal Processing (1998) and Shlomot et al, Combined Harmonic and Waveform Coding of Speech at Low Bit Rates, IEEE ... 585 (1998).
  • MELP: mixed excitation linear prediction
  • the present application discloses a linear predictive coding method with the residual's Fourier coefficients classified into overlapping classes with each class having its own vector quantization codebook(s).
  • both strongly predictive and weakly predictive codebooks may be used but with a weak predictor replacing a strong predictor which otherwise would have followed a weak predictor.
  • First preferred embodiments classify the spectra of the linear prediction (LP) residual (in a MELP coder) into classes of spectra (vectors) and vector quantize each class separately. For example, one first preferred embodiment classifies the spectra into long vectors (many harmonics which correspond roughly to low pitch frequency as typical of male speech) and short vectors (few harmonics which correspond roughly to high pitch frequency as typical of female speech). These spectra are then vector quantized with separate codebooks to facilitate encoding of vectors with different numbers of components (harmonics).
  • Figure 1a shows the classification flow and includes an overlap of the classes.
  • Second preferred embodiments allow for predictive coding of the spectra (or alternatively, other parameters such as line spectral frequencies or LSFs) and a selection of either the strong or weak predictor based on best approximation but with the proviso that a first strong predictor which otherwise follows a weak predictor is replaced with a weak predictor. This deters error propagation by a sequence of strong predictors of an error in a weak predictor preceding the series of strong predictors.
  • Figure 1b illustrates a predictive coding control flow.
  • Figures 2a-2b illustrate preferred embodiment MELP coding (analysis) and decoding (synthesis) in block format.
  • M, the order of the linear prediction filter, is taken to be about 10-12; the sampling rate to form the samples y(n) is taken to be 8000 Hz (the same as the public telephone network sampling for digital transmission); and the number of samples {y(n)} in a frame is often 160 (a 20 msec frame) or 180 (a 22.5 msec frame).
  • a frame of samples may be generated by various windowing operations applied to the input speech samples.
  • minimizing Σ e(n)^2 yields the {a(j)} which furnish the best linear prediction.
  • the coefficients {a(j)} may be converted to LSFs for quantization and transmission.
  • the {e(n)} form the LP residual for the frame and ideally would be the excitation for the synthesis filter 1/A(z) where A(z) is the transfer function of equation (1).
  • the LP residual is not available at the decoder; so the task of the encoder is to represent the LP residual so that the decoder can generate the LP excitation from the encoded parameters.
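The LP analysis described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the residual convention e(n) = y(n) - Σ a(j) y(n-j) (equation (1) is not reproduced in this record), uses the autocorrelation method to minimize Σ e(n)^2, and treats samples before the frame as zero. The function name `lp_analysis` is hypothetical.

```python
import numpy as np

def lp_analysis(y, order=10):
    """Return LP coefficients a(1..M) and the residual e(n) for one frame."""
    y = np.asarray(y, dtype=float)
    # Autocorrelation r(0..M) of the frame.
    r = np.array([np.dot(y[:len(y) - k], y[k:]) for k in range(order + 1)])
    # Normal equations R a = r(1..M); R is Toeplitz in r(0..M-1).
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # Assumed convention: e(n) = y(n) - sum_j a(j) y(n-j),
    # with samples before the start of the frame taken as zero.
    prediction = np.zeros_like(y)
    for j in range(1, order + 1):
        prediction[j:] += a[j - 1] * y[:-j]
    return a, y - prediction
```

For a frame of 160 or 180 samples and an order of about 10, the linear system is small enough that a direct solve is adequate; a production coder would more likely use a Levinson-Durbin recursion.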
  • the Band-Pass voicing for a frequency band of samples determines whether the LP excitation derived from the LP residual {e(n)} should be periodic (voiced) or white noise (unvoiced) for a particular band.
  • the Pitch Analysis determines the pitch period (smallest period in voiced frames) by low pass filtering {y(n)} and then correlating {y(n)} with {y(n+m)} for various m; interpolations provide for fractional sample intervals.
  • the resultant pitch period is denoted pT where p is a real number, typically constrained to be in the range 20 to 132 and T is the sampling interval of 1/8 millisecond. Thus p is the number of samples in a pitch period.
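A rough sketch of the correlation step in the pitch analysis is given below. It is an illustration under stated assumptions: the low-pass filtering and the fractional-sample interpolation mentioned above are omitted, so only integer lags in the range 20 to 132 are searched. The function name and the `tie_tol` parameter (used to prefer the smallest lag among near-equal peaks, since multiples of the period correlate equally well) are hypothetical.

```python
import numpy as np

def estimate_pitch_period(y, min_lag=20, max_lag=132, tie_tol=1e-3):
    """Return (lag, score): the smallest integer lag with near-maximal correlation."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    lags = np.arange(min_lag, max_lag + 1)
    scores = np.empty(len(lags))
    for i, m in enumerate(lags):
        a, b = y[:-m], y[m:]  # y(n) and y(n+m) over their common support
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        scores[i] = np.dot(a, b) / denom if denom > 0 else 0.0
    # Prefer the smallest lag whose score is within tie_tol of the maximum,
    # so that a multiple of the true period is not selected.
    first = int(np.argmax(scores >= scores.max() - tie_tol))
    return int(lags[first]), float(scores[first])
```

The input must be longer than `max_lag` samples for every lag to have a non-empty overlap.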
  • the LP residual {e(n)} in voiced bands should be a combination of pitch-frequency harmonics.
  • Gain Analysis sets the overall energy level for a frame.
  • the encoding (and decoding) may be implemented with a digital signal processor (DSP) such as the TMS320C30 manufactured by Texas Instruments which can be programmed to perform the analysis or synthesis essentially in real time.
  • DSP: digital signal processor
  • Figure 3a illustrates an LP residual {e(n)} for a voiced frame and includes about eight pitch periods with each pitch period about 26 samples.
  • Figure 3b shows the magnitudes of the {E(j)} for one particular period of the LP residual.
  • Figure 3c shows the magnitudes of the {E(j)} for all eight pitch periods.
  • the Fourier coefficients peak about 1/pT, 2/pT, 3/pT, ..., k/pT, ...; that is, at the fundamental frequency 1/pT and harmonics.
  • p may not be an integer, and the magnitudes of the Fourier coefficients at the fundamental-frequency harmonics, denoted X[1], X[2], ..., X[k], ... must be estimated. These estimates will be quantized, transmitted, and used by the decoder to create the LP excitation.
  • the preferred embodiments use vector quantization of the spectra. That is, treat the set of Fourier coefficients X[1], X[2], ... X[k], ... as a vector in a multi-dimensional quantization, and transmit only the index of the output quantized vector. Note that there are [p] or [p]+1 coefficients, but only half of the components are significant due to their conjugate symmetry.
  • the set of output quantized vectors may be created by adaptive selection with a clustering method from a set of input training vectors. For example, a large number of randomly selected vectors (spectra) from various speakers can be used to form a codebook (or codebooks with multistep vector quantization).
  • a quantized and coded version of an input spectrum X[1], X[2], ... X[k], ... can be transmitted as the index of the quantized vector in the codebook; this index may be 20 bits.
  • the first preferred embodiments proceed with vector quantization of the Fourier coefficient spectra as follows.
  • Some vectors will qualify as both short and long vectors.
  • conjugate symmetry of the Fourier coefficients implies only the first half of the vector components are significant and used.
  • Each codebook has 2^20 output quantized vectors, so 20 bits will index the output quantized vectors in each codebook. One bit could be used to select the codebook, but the pitch is transmitted and can be used to determine whether the 20 bits are long or short vector quantization.
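The long/short classification can be illustrated with the sketch below. Every threshold in it is hypothetical, since this record does not state the class boundaries. It also assumes one plausible reading of the overlap: a vector near the boundary contributes to the training of both codebooks, while at coding time a single split on the transmitted pitch selects the codebook so that no selection bit is needed. The number of significant harmonics is taken as p/2, following the conjugate-symmetry remark above.

```python
def num_harmonics(p):
    """Harmonics k/(pT) up to half the sampling rate, for pitch period p in samples."""
    return int(p // 2)

def training_classes(p, short_max=35, long_min=30):
    """Classes whose codebook training set receives this vector (hypothetical bounds)."""
    k = num_harmonics(p)
    classes = set()
    if k <= short_max:
        classes.add("short")
    if k >= long_min:
        classes.add("long")  # overlap: 30..35 harmonics belong to both classes
    return classes

def codebook_for_pitch(p, split=32):
    """Codebook used when coding, derived from the transmitted pitch (hypothetical split)."""
    return "short" if num_harmonics(p) <= split else "long"
```

Because the decoder also knows the pitch, it can call the same `codebook_for_pitch` rule to interpret the 20-bit index.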
  • a differential (predictive) approach will decrease the quantization noise. That is, rather than vector quantize a spectrum X[1], X[2], ... X[k], ..., first generate a prediction of the spectrum from the preceding one or more frames' quantized spectra (vectors) and just quantize the difference. If the current frame's vector can be well approximated from the prior frames' vectors, then a "strong" prediction can be used in which the difference between the current frame's vector and a strong predictor may be small. Contrarily, if the current frame's vector cannot be well approximated from the prior frames' vectors, then a "weak" prediction (including no prediction) can be used in which the difference between the current frame's vector and a predictor may be large.
  • a simple prediction of the current frame's vector X could be the preceding frame's quantized vector Y, or more generally a multiple αY with α a weight factor (between 0 and 1).
  • α could be a diagonal matrix with different factors for different vector components.
  • for α close to 1, the predictor αY is close to Y and if also close to X, the difference vector X-αY to be quantized is small compared to X. This would be a strong predictor, and the decoder recovers an estimate for X by Q(X-αY) + αY with the first term the quantized difference vector X-αY and the second term from the previous frame and likely the dominant term.
  • the predictor is weak in that the difference vector X-αY to be quantized is likely comparable to X.
  • α = 0 is no prediction at all and the vector to be quantized is X itself.
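The choice between a strong and a weak predictor "based on best approximation" can be sketched as below. This is an illustrative toy, not the patent's coder: the weight factors 0.9 and 0.3 (called alpha here), the dictionary layout, and the function names are assumptions, and the tiny codebooks in any usage stand in for the 20-bit codebooks described above. The encoder tries both predictors, quantizes the difference with that predictor's codebook, and keeps whichever reconstruction Q(X - alpha*Y) + alpha*Y is closer to X.

```python
import numpy as np

# Hypothetical weight factors; the text only says the factor lies between 0 and 1.
ALPHAS = {"strong": 0.9, "weak": 0.3}

def quantize(v, codebook):
    """Nearest-neighbour vector quantization: return (index, codeword)."""
    idx = int(np.argmin(np.sum((codebook - v) ** 2, axis=1)))
    return idx, codebook[idx]

def encode_frame(x, y_prev, codebooks):
    """Try both predictors; keep the one whose reconstruction is closest to x."""
    best = None
    for mode, alpha in ALPHAS.items():
        prediction = alpha * y_prev
        idx, q = quantize(x - prediction, codebooks[mode])
        recon = q + prediction
        err = float(np.sum((x - recon) ** 2))
        if best is None or err < best["err"]:
            best = {"mode": mode, "index": idx, "recon": recon, "err": err}
    return best

def decode_frame(mode, index, y_prev, codebooks):
    """Decoder side: quantized difference plus the same prediction."""
    return codebooks[mode][index] + ALPHAS[mode] * y_prev
```

The decoder needs the predictor choice as well as the index, and it must form the prediction from its own reconstructed previous vector, which is why a lost frame affects later frames.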
  • the parameters (i.e., LSFs, Fourier coefficients, pitch, ...) corresponding to the current frame are considered lost or unreliable and the frame is reconstructed based on the parameters from the previous frames.
  • the error resulting from missing a set of parameters will propagate throughout the series of frames for which a strong prediction is used. If the error occurs in the middle of the series, the exact evolution of the predicted parameters is compromised and some perceptual distortion is usually introduced.
  • If a frame erasure happens within a region where a weak predictor is consistently selected, the effect of the error will be localized (it will be quickly reduced by the weak prediction).
  • a second preferred embodiment analyzes the predictors used in a series of frames and controls their sequencing.
  • one preferred embodiment modifies the current frame to use the weak predictor but does not affect the next frame's predictor.
  • Figure 1b illustrates the decisions.
  • the usual decoder recovers X_2 as Q(X_2 - X_{2,strong}) + X_{2,strong} with the second term dominant, and analogously for X_3, X_4, ...
  • the preferred embodiment decoder recovers X_2 as Q(X_2 - X_{2,weak}) + X_{2,weak} but with the first term likely dominant.
  • the decoder recreates X_{1,weak} from the preceding reconstructed frames' vectors X_0, X_{-1}, ..., and similarly for X_{2,strong} and X_{2,weak} recreated from reconstructed X_1, X_0, ..., and likewise for the other predictors.
  • the vector Q(X_1 - X_{1,weak}) is lost and the decoder reconstructs X_1 by something such as just repeating the reconstructed X_0 from the prior frame. However, this may not be a very good approximation because originally a weak predictor was used.
  • the usual decoder reconstructs X_2 by Q(X_2 - X_{2,strong}) + Y_{2,strong} with Y_{2,strong} the strong predictor recreated from X_0, X_0, ... rather than from X_1, X_0, ... because X_1 was lost and replaced by the possibly poor approximation X_0.
  • the preferred embodiment reconstructs X_2 by Q(X_2 - X_{2,weak}) + Y_{2,weak} with Y_{2,weak} the weak predictor recreated from X_0, X_0, ... rather than from X_1, X_0, ... again because X_1 was lost and replaced by the possibly poor approximation X_0.
  • the error would roughly be X_{2,weak} - Y_{2,weak} which likely is small due to the weak predictor being the smaller term compared to the difference term Q(X_2 - X_{2,weak}). And this smaller error also applies to the reconstruction of X_3, X_4, ...
  • Alternative second preferred embodiments modify two (or more) successive frames' strong predictors after a weak predictor frame to be weak predictors. That is, a sequence of weak, strong, strong, strong, ... would be changed to weak, weak, weak, strong, ...
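The predictor sequencing rule of the second preferred embodiments can be sketched as a small function. This is a simplified illustration: it operates on a list of predictor choices already made by best approximation, whereas a real encoder would decide frame by frame. The function name and the `n_forced` parameter are hypothetical; `n_forced=1` corresponds to replacing only the first strong predictor after a weak one, and `n_forced=2` to the alternative described in the last item above. A frame that was forced to weak does not itself trigger further forcing, matching the statement that the modification does not affect the next frame's predictor.

```python
def enforce_weak_after_weak(selected, n_forced=1):
    """Replace the first n_forced strong predictors after each originally weak frame."""
    out = []
    since_weak = None  # frames elapsed since the last frame originally selected as weak
    for mode in selected:
        if mode == "weak":
            out.append("weak")
            since_weak = 0
            continue
        if since_weak is not None:
            since_weak += 1
        if since_weak is not None and since_weak <= n_forced:
            out.append("weak")    # strong predictor replaced to limit error propagation
        else:
            out.append("strong")
    return out
```

For example, `enforce_weak_after_weak(["weak", "strong", "strong", "strong"])` yields weak, weak, strong, strong, and with `n_forced=2` it yields weak, weak, weak, strong.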

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
EP20000200874 1999-03-12 2000-03-13 Multimodal quantization of the prediction error in a speech coder Expired - Lifetime EP1035538B1 (de)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US12411299P 1999-03-12 1999-03-12
US12408999P 1999-03-12 1999-03-12
US124112P 1999-03-12
US124089P 1999-03-12

Publications (3)

Publication Number Publication Date
EP1035538A2 (de) 2000-09-13
EP1035538A3 (de) 2003-04-23
EP1035538B1 (de) 2005-07-27

Family

ID=26822196

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20000200874 Expired - Lifetime EP1035538B1 (de) 1999-03-12 2000-03-13 Multimodal quantization of the prediction error in a speech coder

Country Status (3)

Country Link
EP (1) EP1035538B1 (de)
JP (1) JP2000305597A (de)
DE (1) DE60021455T2 (de)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080249767A1 (en) * 2007-04-05 2008-10-09 Ali Erdem Ertan Method and system for reducing frame erasure related error propagation in predictive speech parameter coding

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0751494A1 (de) * 1994-12-21 1997-01-02 Sony Corporation System for coding sound signals
US5749065A (en) * 1994-08-30 1998-05-05 Sony Corporation Speech encoding method, speech decoding method and speech encoding/decoding method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ERIKSSON T ET AL: "Exploiting interframe correlation in spectral quantization: a study of different memory VQ schemes" 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING CONFERENCE PROCEEDINGS (CAT. NO.96CH35903), 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING CONFERENCE PROCEEDINGS, ATLANTA, GA, USA, 7-10 M, pages 765-768 vol. 2, XP002230715 1996, New York, NY, USA, IEEE, USA ISBN: 0-7803-3192-3 *
MARSTON D F: "Gender adapted speech coding" ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 1998. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON SEATTLE, WA, USA 12-15 MAY 1998, NEW YORK, NY, USA,IEEE, US, 12 May 1998 (1998-05-12), pages 357-360, XP010279165 ISBN: 0-7803-4428-6 *
MCCREE A ET AL: "A 1.7 kb/s MELP coder with improved analysis and quantization" ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 1998. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON SEATTLE, WA, USA 12-15 MAY 1998, NEW YORK, NY, USA,IEEE, US, 12 May 1998 (1998-05-12), pages 593-596, XP010279196 ISBN: 0-7803-4428-6 *
STACHURSKI JACEK ET AL: "High quality MELP coding at bit-rates around 4 kb/s" PROCEEDINGS OF THE 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP-99);PHOENIX, AZ, USA MAR 15-MAR 19 1999, vol. 1, 1999, pages 485-488, XP002230714 ICASSP IEEE Int Conf Acoust Speech Signal Process Proc;ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 1999 IEEE, Piscataway, NJ, USA *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8224657B2 (en) 2002-07-05 2012-07-17 Nokia Corporation Method and device for efficient in-band dim-and-burst signaling and half-rate max operation in variable bit-rate wideband speech coding for CDMA wireless systems
WO2004059618A1 (en) * 2002-12-24 2004-07-15 Nokia Corporation Method and device for robust predictive vector quantization of linear prediction parameters in variable bit rate speech coding
KR100712056B1 (ko) * 2002-12-24 2007-05-02 Nokia Corporation Method and device for robust predictive vector quantization of linear prediction parameters in variable bit rate speech coding

Also Published As

Publication number Publication date
DE60021455D1 (de) 2005-09-01
EP1035538A3 (de) 2003-04-23
EP1035538B1 (de) 2005-07-27
DE60021455T2 (de) 2006-05-24
JP2000305597A (ja) 2000-11-02

Similar Documents

Publication Publication Date Title
JP5343098B2 (ja) LPC harmonic vocoder with superframe structure
JP4843124B2 (ja) Codec and method for encoding and decoding speech signals
JP5373217B2 (ja) Variable rate speech coding
KR100304682B1 (ko) Fast excitation coding for speech coders
Bradbury Linear predictive coding
EP0718822A2 (de) Multimode CELP codec operating at a low bit rate with backward prediction
CA2412449C (en) Improved speech model and analysis, synthesis, and quantization methods
KR20020052191A (ko) Variable bit-rate CELP coding method for speech using speech classification
KR20010102004A (ko) CELP transcoding
TW463143B (en) Low-bit rate speech encoding method
EP1597721B1 (de) MELP (mixed excitation linear prediction) transcoding at 600 bps
EP1035538B1 (de) Multimodal quantization of the prediction error in a speech coder
US7295974B1 (en) Encoding in speech compression
JPH07225599A (ja) Speech coding method
EP1397655A1 (de) Method and device for coding speech in analysis-by-synthesis speech coders
JP3496618B2 (ja) Speech coding/decoding apparatus and method including non-speech coding operating at multiple rates
JP3153075B2 (ja) Speech coding device
KR100554164B1 (ko) Transcoding apparatus between speech codecs of different CELP types and method thereof
KR0155798B1 (ko) Speech signal encoding and decoding method
Gournay et al. A 1200 bits/s HSX speech coder for very-low-bit-rate communications
Papanastasiou et al. Efficient mixed excitation models in LPC based prototype interpolation speech coders
Drygajilo Speech Coding Techniques and Standards
KR20060064694A (ko) Harmonic noise weighting in digital speech coders
JPH02160300A (ja) Speech coding system
Viswanathan et al. A harmonic deviations linear prediction vocoder for improved narrowband speech transmission

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

17P Request for examination filed

Effective date: 20031023

AKX Designation fees paid

Designated state(s): DE FR GB

17Q First examination report despatched

Effective date: 20040206

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REF Corresponds to:

Ref document number: 60021455

Country of ref document: DE

Date of ref document: 20050901

Kind code of ref document: P

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20060428

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 16

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20150224

Year of fee payment: 16

Ref country code: GB

Payment date: 20150224

Year of fee payment: 16

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20150331

Year of fee payment: 16

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 60021455

Country of ref document: DE

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20160313

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20161130

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160313

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160331

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20161001