EP0689189B1 - Method for improving the performance of speech coders

Info

Publication number
EP0689189B1
EP0689189B1 EP95108870A EP95108870A EP0689189B1 EP 0689189 B1 EP0689189 B1 EP 0689189B1 EP 95108870 A EP95108870 A EP 95108870A EP 95108870 A EP95108870 A EP 95108870A EP 0689189 B1 EP0689189 B1 EP 0689189B1
Authority
EP
European Patent Office
Prior art keywords
excitation
signal
synthetic
free
objective
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP95108870A
Other languages
English (en)
French (fr)
Other versions
EP0689189A1 (de)
Inventor
Silvio Cucchi
Marco Fratti
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alcatel Lucent SAS
Original Assignee
Alcatel CIT SA
Alcatel SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1994-06-20
Filing date
1995-06-08
Publication date
1999-09-22
Application filed by Alcatel CIT SA, Alcatel SA
Publication of EP0689189A1
Application granted
Publication of EP0689189B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/083 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being an excitation gain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders

Definitions

  • Speech coding has applications in several communication fields, from satellite transmission to mobile radio, store-and-forward systems, automatic answering systems, etc.
  • An excitation-parameter computing method is provided as claimed in claim 1.
  • An audio coder is provided as claimed in claim 6.
  • Voice coders based on Linear Prediction (LP) are parametric coders; Analysis-by-Synthesis (A-b-S) techniques are typically used to determine the parameters of the system.
  • Such coders synthesize the voice by applying a suitable excitation to the input of an LP synthesis filter.
  • The excitation should have the characteristics of the "physical" excitation wave which, coming from the glottis, is then spectrally shaped as a function of the characteristics of the system that models the vocal tract (the LP filter).
  • A-b-S coders make use of an excitation structure composed of an Adaptive Codebook and a Fixed Codebook (possibly structured). Without loss of generality, it can be assumed that the Fixed Codebook is composed of independent vectors of random numbers, as in the case of CELP coders (M.R. Schroeder, B.S. Atal, "Code Excited Linear Prediction (CELP): high-quality speech at very low bit rates", Proc. ICASSP '85, pages 937-940).
  • CELP: Code Excited Linear Prediction
  • Fig. 1 shows a block diagram of a typical CELP voice synthesizer: block LPC-IIR denotes the synthesis filter that reconstructs the voice waveform; e_a(n) is the adaptive codebook vector (and Ga is the corresponding scaling factor); e_s(n) is the fixed codebook vector (and Gs is the corresponding scaling factor); and e(n) is the composite excitation vector.
  • e_a(n) and e_s(n) are selected from a suitable set of vectors and are determined simultaneously with their respective gains Ga and Gs. The determination is made over a time interval of about 5 to 10 ms (the analysis frame) and is based on minimizing an objective function according to the well-known perceptually weighted mean-squared-error criterion (see M.R. Schroeder, B.S. Atal, "Code Excited Linear Prediction (CELP): high-quality speech at very low bit rates", Proc. ICASSP '85, pages 937 to 940), according to the following expression:
    $$ E = \sum_{n=0}^{N-1} \left[ r_s(n) - G\,u_i(n) \right]^2 \qquad (1) $$
    where N is the length of the time interval for minimization; u_i(n) is the zero-state response of the synthesis filter to the i-th codebook entry (either adaptive or fixed) and G is the corresponding gain; lastly, r_s(n) is the reference signal or "objective" signal (i.e. the original voice segment from which the contribution of the reconstruction filter memory deriving from the previous synthesis has been subtracted).
  • A first approach consists in using, as the reference signal of the objective function (i.e. in place of signal r_s(n) of eq. (1)), a signal r_s^el(n) longer than N samples.
  • The voice can always be considered as obtained from an ideal excitation applied to the input of an all-pole synthesis filter (the filter denoted by LPC-IIR in Fig. 1).
  • This ideal excitation is simply the prediction residue, obtained by filtering the voice through the "inverse filter", i.e. the all-zero filter derived from LPC-IIR.
  • The synthetic excitation has spectral and time-location (e.g. pitch pulse) characteristics similar to those of the ideal excitation. Therefore, by including in the objective function the contributions of the free evolutions due to both the ideal excitation and the synthetic excitation, a better choice of the latter can be made. In fact, depending on the spectral/time characteristics of the signal, the difference between the ideal free evolution and the synthetic one may have a preponderant weight in the modified objective function.
  • The excitation parameters, i.e. the index i and the corresponding gain G, are then chosen so as to minimize the modified objective function (4).
  • The synthetic excitation has spectral and time-location (e.g. pitch pulse) characteristics similar to those present in the ideal excitation. It follows that it may be important to obtain a good similarity not only between the original voice and the synthetic voice, but also between the ideal excitation and the synthetic excitation.
  • The parameters of the reconstructed excitation yield a synthetic voice which is, on average, similar to the original voice.
  • e_s(n) is the prediction residue obtained from the reference signal r_s(n), and e_i(n) is the codebook excitation generating the synthetic signal u_i(n).
  • The prediction residue e_s(n) must be calculated from r_s(n) by inverse filtering (with the all-zero filter) starting from a null initial state.
  • The reference has been obtained from the voice signal by subtracting the reconstruction filter memory deriving from the previous synthesis. The reference signal is therefore "free" from any contribution due to the filter memory and can be considered as obtained from a suitable ideal excitation e_s(n) entering the synthesis filter with null initial state.
  • Parameter α can be either fixed or adaptive (i.e. varying with time), for instance as a function of certain characteristics of the signal that can be estimated a priori (e.g. a voiced/unvoiced estimate, an estimate of transients, or an estimate of the pitch period or of the synthesis filter).
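
The definitions above can be tied together in a minimal Python sketch of a codebook search under the modified objective function Ex = αE1 + (1 - α)E3. It is an illustration under stated assumptions, not the patented implementation: the helper names (`synthesize`, `inverse_filter`, `search_codebook`) are hypothetical; any perceptual weighting is assumed to be already folded into the filter coefficients `a`; the extended reference r_s^el(n) is assumed to be the zero-state response of the synthesis filter to the prediction residue followed by M zeros; and the gain G is taken in closed form by setting the derivative of Ex with respect to G to zero.

```python
def synthesize(excitation, a):
    """Zero-state all-pole filter 1/A(z), with A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ..."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * out[n - k]
        out.append(acc)
    return out


def inverse_filter(signal, a):
    """Zero-state all-zero filter A(z): returns the prediction residue."""
    res = []
    for n, s in enumerate(signal):
        acc = s
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * signal[n - k]
        res.append(acc)
    return res


def dot(x, y):
    """Inner product over the common length of x and y."""
    return sum(p * q for p, q in zip(x, y))


def search_codebook(r_s, codebook, a, M, alpha):
    """Return (index, gain, Ex) minimizing Ex = alpha*E1 + (1 - alpha)*E3.

    r_s      : reference ("objective") signal, N samples
    codebook : list of candidate excitation vectors e_i, N samples each
    a        : LP coefficients a[0..p-1] of A(z) (assumed layout)
    M        : free-evolution length in samples
    alpha    : weighting factor, 0 < alpha <= 1
    """
    e_s = inverse_filter(r_s, a)                 # ideal excitation (residue)
    r_el = synthesize(e_s + [0.0] * M, a)        # reference + its free evolution
    best = None
    for i, e_i in enumerate(codebook):
        e_i = list(e_i)
        u_el = synthesize(e_i + [0.0] * M, a)    # synthetic signal + free evolution
        num = alpha * dot(r_el, u_el) + (1.0 - alpha) * dot(e_s, e_i)
        den = alpha * dot(u_el, u_el) + (1.0 - alpha) * dot(e_i, e_i)
        if den == 0.0:
            continue                             # all-zero candidate, skip it
        G = num / den                            # closed-form optimal gain
        e1 = sum((r - G * u) ** 2 for r, u in zip(r_el, u_el))
        e3 = sum((s - G * e) ** 2 for s, e in zip(e_s, e_i))
        ex = alpha * e1 + (1.0 - alpha) * e3
        if best is None or ex < best[2]:
            best = (i, G, ex)
    return best
```

With `alpha = 1` and `M = 0` the search reduces to the conventional criterion of eq. (1). The gain here is left unquantized, and a time-varying α would simply be passed in frame by frame.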

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Reduction Or Emphasis Of Bandwidth Of Signals (AREA)

Claims (6)

  1. Method for computing the excitation parameters in speech coders based on linear prediction and analysis-by-synthesis techniques that use an objective function to be minimized, characterized in that the objective function comprises, jointly or alternatively, a) the free evolution of the objective signal and of the synthetic signal, and b) a weighting with respect to the error between the prediction residue and the synthetic excitation.
  2. Method according to claim 1, in alternative a) or a) and b), characterized in that the objective function Ex = αE1 + (1 - α)E3 is used, wherein function E1 takes into account, in addition to the error between the objective signals and the synthetic signals, also the error between the related free evolutions, function E3 takes into account the error between the prediction residue and the synthetic excitation, and 0 < α ≤ 1.
  3. Method according to claim 2, characterized in that function E1 is given by:
    $$ E_1 = \sum_{n=0}^{N+M-1} \left[ r_s^{el}(n) - G\,u_i^{el}(n) \right]^2 $$
       wherein N is the length of the time interval for minimization, M is the free evolution length, r_s^el(n) is the extended reference signal obtained through free evolution, u_i^el(n) is the extended zero-state response of the synthesis filter to the i-th codebook entry, and G is the corresponding gain.
  4. Method according to claim 2, characterized in that function E3 is given by:
    $$ E_3 = \sum_{n=0}^{N-1} \left[ e_s(n) - G\,e_i(n) \right]^2 $$
       wherein e_s(n) is the prediction residue obtained from the reference signal and e_i(n) is the codebook excitation signal.
  5. Method according to claim 2, characterized in that the weighting factor is variable over time.
  6. Audio coder comprising:
    means for performing linear prediction,
    means for performing analysis-by-synthesis, and
    means for computing the excitation parameters using an objective function to be minimized,
       characterized in that the objective function comprises, jointly or alternatively,
    a) the free evolution of the objective signal and of the synthetic signal, and
    b) a weighting with respect to the error between the prediction residue and the synthetic excitation.
EP95108870A 1994-06-20 1995-06-08 Method for improving the performance of speech coders Expired - Lifetime EP0689189B1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ITMI941283 1994-06-20
ITMI941283A IT1271182B (it) 1994-06-20 Method for improving the performance of speech coders

Publications (2)

Publication Number Publication Date
EP0689189A1 (de) 1995-12-27
EP0689189B1 (de) 1999-09-22

Family

Family ID: 11369140

Family Applications (1)

Application Number Title Priority Date Filing Date
EP95108870A Expired - Lifetime EP0689189B1 (de) 1994-06-20 1995-06-08 Method for improving the performance of speech coders

Country Status (4)

Country Link
EP (1) EP0689189B1 (de)
AU (1) AU698340B2 (de)
DE (1) DE69512323T2 (de)
IT (1) IT1271182B (de)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3273455B2 (ja) * 1994-10-07 2002-04-08 日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation) Vector quantization method and decoder therefor
DE10047172C1 (de) * 2000-09-22 2001-11-29 Siemens Ag Method for speech processing

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5235669A (en) * 1990-06-29 1993-08-10 AT&T Laboratories Low-delay code-excited linear-predictive coding of wideband speech at 32 kbits/sec
FI98104C (fi) * 1991-05-20 1997-04-10 Nokia Mobile Phones Ltd Method for generating an excitation vector and digital speech coder
JPH06138896A (ja) * 1991-05-31 1994-05-20 Motorola Inc Apparatus and method for encoding speech frames

Also Published As

Publication number Publication date
DE69512323D1 (de) 1999-10-28
EP0689189A1 (de) 1995-12-27
AU2175395A (en) 1996-01-04
ITMI941283A0 (it) 1994-06-20
ITMI941283A1 (it) 1995-12-20
AU698340B2 (en) 1998-10-29
IT1271182B (it) 1997-05-27
DE69512323T2 (de) 2000-07-06

Similar Documents

Publication Publication Date Title
Spanias Speech coding: A tutorial review
US5826224A (en) Method of storing reflection coeffients in a vector quantizer for a speech coder to provide reduced storage requirements
CA2177421C (en) Pitch delay modification during frame erasures
CA2031006C (en) Near-toll quality 4.8 kbps speech codec
US5138661A (en) Linear predictive codeword excited speech synthesizer
EP1338002B1 (de) Method and apparatus for one-stage or two-stage noise feedback coding of speech and audio signals
KR100389178B1 (ko) Speech decoder and method for its use
EP0731449B1 (de) Method for modifying LPC coefficients of acoustic signals
DE69934608T2 (de) Adaptive compensation of the spectral distortion of a synthesized speech residual
US20080275698A1 (en) Excitation vector generator, speech coder and speech decoder
EP0718822A2 (de) Low-bit-rate multi-mode CELP codec using backward prediction
US6148282A (en) Multimodal code-excited linear prediction (CELP) coder and method using peakiness measure
US5598504A (en) Speech coding system to reduce distortion through signal overlap
Salami et al. 8 kbit/s ACELP coding of speech with 10 ms speech-frame: A candidate for CCITT standardization
US6169970B1 (en) Generalized analysis-by-synthesis speech coding method and apparatus
EP0557940B1 (de) Speech coding system
US5313554A (en) Backward gain adaptation method in code excited linear prediction coders
US6704703B2 (en) Recursively excited linear prediction speech coder
EP0689189B1 (de) Method for improving the performance of speech coders
Paulus Variable bitrate wideband speech coding using perceptually motivated thresholds
EP0745972B1 (de) Method and apparatus for speech coding
EP0539103B1 (de) Generalized analysis-by-synthesis method and apparatus for speech coding
Ekudden et al. ITU-T G.729 extension at 6.4 kbps.
Kim et al. A 4 kbps adaptive fixed code-excited linear prediction speech coder
GB2352949A (en) Speech coder for communications unit

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB IT

17P Request for examination filed

Effective date: 19960111

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

17Q First examination report despatched

Effective date: 19980810

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: ALCATEL

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB IT

ITF It: translation for a ep patent filed

Owner name: BORSANO CORRADO

REF Corresponds to:

Ref document number: 69512323

Country of ref document: DE

Date of ref document: 19991028

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: IT

Payment date: 20060630

Year of fee payment: 12

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20070608

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20140618

Year of fee payment: 20

REG Reference to a national code

Ref country code: FR

Ref legal event code: GC

Effective date: 20140717

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20140619

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20140619

Year of fee payment: 20

REG Reference to a national code

Ref country code: FR

Ref legal event code: RG

Effective date: 20141016

REG Reference to a national code

Ref country code: DE

Ref legal event code: R071

Ref document number: 69512323

Country of ref document: DE

REG Reference to a national code

Ref country code: GB

Ref legal event code: PE20

Expiry date: 20150607

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20150607