EP0484339B1 - Digital speech coder with vector excitation source having improved speech quality - Google Patents


Info

Publication number
EP0484339B1
Authority
EP
European Patent Office
Prior art keywords
excitation
excitation signal
signal
candidate
pitch
Prior art date
1989-06-23
Legal status
Expired - Lifetime
Application number
EP90908908A
Other languages
German (de)
English (en)
French (fr)
Other versions
EP0484339A4 (en)
EP0484339A1 (en)
Inventor
Ira Alan Gerson
Current Assignee
Motorola Solutions Inc
Original Assignee
Motorola Inc
Priority date
1989-06-23
Filing date
1990-05-02
Publication date
1998-02-04
Application filed by Motorola Inc
Publication of EP0484339A1
Publication of EP0484339A4
Application granted
Publication of EP0484339B1
Anticipated expiration
Expired - Lifetime (current legal status)

Classifications

    • G PHYSICS
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L13/00 Speech synthesis; Text to speech systems
                • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
                    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
                        • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
                            • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
                    • G10L2019/0001 Codebooks
                        • G10L2019/0004 Design or structure of the codebook
                            • G10L2019/0005 Multi-stage vector quantisation
                        • G10L2019/0011 Long term prediction filters, i.e. pitch estimation
                        • G10L2019/0013 Codebook search algorithms

Definitions

  • This invention relates generally to speech coders, and more particularly to digital speech coders that use vector excitation sources.
  • Speech coders are known in the art. Some speech coders convert analog voice samples into digitized representations, and subsequently represent the spectral speech information through use of linear predictive coding. Other speech coders improve upon ordinary linear predictive coding techniques by providing an excitation signal that is related to the original voice signal.
  • Previously issued U.S. Patent No. 4,817,157 describes a digital speech coder having an improved vector excitation source wherein a codebook of excitation vectors is accessed to select an excitation signal that best fits the available information, and hence provides a recovered speech signal that closely represents the original.
  • The resultant decoded speech signal will more closely represent the original unencoded speech signal if a significant number of candidate excitation vectors is available for consideration as the excitation source.
  • Increasing performance in this way, however, generally enlarges the codebook, which usually increases processing complexity and data rates.
  • The methodology of this invention allows candidate excitation signals to be considered without a commensurate increase in processing complexity or data rates.
  • The coded excitation signal is determined substantially independently of any pitch information.
  • Candidate excitation signals provided by a codebook are processed to substantially remove any components that can be represented, at least in part, by a reference component related, at least in part, to the intermediate pitch vector. More particularly, the vector component related to the intermediate pitch vector is removed from each candidate excitation signal, a process known as orthogonalizing.
  • The orthogonalized candidate excitation signals are then compared with the unencoded speech sample to identify the candidate that best represents that particular speech sample.
  • The pitch information, including a pitch filter coefficient parameter, can then be optimized to best suit the selected excitation signal, yielding an overall optimized coded representation of the speech signal.
  • A second codebook of candidate excitation signals can also be provided, in which case two excitation signals are used to represent the speech sample.
  • The first excitation signal can be selected as described above, and the second excitation signal can be selected in a similar manner, except that candidate second excitation signals are first orthogonalized with respect to both the intermediate pitch vector and the previously selected first excitation signal.
  • This invention can be embodied in a speech coder that makes use of an appropriate digital signal processor such as a Motorola DSP 56000 family device.
  • The computational functions of such a DSP embodiment are represented in Fig. 1 as an equivalent-circuit block diagram.
  • A pitch period parameter (101), determined using prior-art techniques, is provided to a pitch filter state (102) that forms part of a pitch filter.
  • The resultant signal (103) is an intermediate pitch vector that is provided both to a first multiplier (104) and to two orthogonalizing processes (106 and 107), described in more detail below.
  • The first multiplier (104) multiplies the resultant signal by a pitch filter coefficient (108) to yield a pitch filter output (109). Selection of the pitch filter coefficient (108) is described in more detail below.
  • A first codebook (111) contains a set of basis vectors that can be linearly combined to form a plurality of resultant excitation signals.
  • The number of possible resultant excitation signals might be, for example, between 64 and 2,048, although more are possible where a particular application calls for them.
  • When encoding a particular speech sample, the problem is to select whichever of these excitation sources best represents the corresponding component of the original speech information.
  • The excitation signals formed from the first codebook (111) are presented one at a time (seriatim) as candidate excitation sources.
  • Each candidate excitation source is first orthogonalized (106) with respect to the resultant signal (103).
  • The dimension of the vector space equals the number of samples in each vector, which may be 40 or more.
  • Because orthogonalization is a linear operation, the candidate excitation vectors can be orthogonalized efficiently by orthogonalizing only the basis vectors: any linear combination of the orthogonalized basis vectors is itself an orthogonalized excitation vector.
  • The resulting orthogonalized candidate excitation source can then be compared (112) with the unencoded signal (113), or with an appropriate representative signal derived from it, to determine how similar or dissimilar the two are.
  • The process is repeated for each excitation source of the first codebook (111), after which it can be determined which candidate excitation source most closely matches the unencoded signal (113).
  • A gain factor (114) can also be applied to each candidate excitation source signal, as is well understood in the art.
  • Excitation source selection and gain compensation can both be carried out substantially simultaneously, as is also well understood in the art.
  • Once the best candidate has been identified, the orthogonalizing process (106) can be dispensed with and the exact excitation source signal selected (116) through an appropriate control mechanism (117). Then, assuming a single-codebook coder, the pitch information can be gated (117) and summed (118) with the selected excitation source, with the pitch filter coefficient (108) and the excitation gain (114) optimized so that the combined excitation most closely matches the unencoded signal (113).
  • At that point, the pitch period parameter, the pitch filter coefficient, and the selected excitation source and its gain are all known, and representations of them can thereafter serve as the coded representation of the original speech sample.
  • An additional, second codebook (121) can also be used; it again contains a plurality of candidate excitation sources derived from basis vectors.
  • The use of multiple codebooks in this way is understood in the art.
  • Candidate excitation sources from the second codebook (121) are orthogonalized (107) with respect to both the resultant signal (103) and the excitation source signal selected from the first codebook (111).
  • Selection then proceeds as described above: the orthogonalized candidates from the second codebook (121) are compared against a representative of the unencoded signal (113) to identify the closest fit.
  • The pitch filter coefficient (108) and the excitation gains (114 and 120) can then be optimized as described above.
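
The search described above can be stated compactly as a least-squares argument. The notation here is an assumption of ours, not the patent's: b is the intermediate pitch vector (103), c_i a candidate excitation from the first codebook (111), p a target vector derived from the unencoded signal (113), β the pitch filter coefficient (108) and γ the excitation gain (114). Any synthesis or perceptual weighting filtering is assumed to be already folded into these vectors, and c_i is assumed not to be parallel to b.

```latex
\begin{aligned}
\tilde{\mathbf c}_i &= \mathbf c_i - \frac{\langle \mathbf c_i,\mathbf b\rangle}{\langle \mathbf b,\mathbf b\rangle}\,\mathbf b, \\
\min_{\beta,\gamma}\,\lVert \mathbf p - \beta\,\mathbf b - \gamma\,\mathbf c_i\rVert^2
  &= \lVert \mathbf p\rVert^2
   - \frac{\langle \mathbf b,\mathbf p\rangle^2}{\langle \mathbf b,\mathbf b\rangle}
   - \frac{\langle \tilde{\mathbf c}_i,\mathbf p\rangle^2}{\langle \tilde{\mathbf c}_i,\tilde{\mathbf c}_i\rangle}, \\
i^\ast &= \arg\max_i \frac{\langle \tilde{\mathbf c}_i,\mathbf p\rangle^2}{\langle \tilde{\mathbf c}_i,\tilde{\mathbf c}_i\rangle}.
\end{aligned}
```

The middle line holds because b and the orthogonalized candidate are mutually orthogonal and span the same plane as b and c_i. Only the last term depends on i, so ranking the orthogonalized candidates by it finds the candidate that is best once β and γ are jointly optimized, without β being known during the search. After i* is fixed, β and γ follow from the 2×2 normal equations in the raw vectors b and c_{i*}. For a second codebook the same argument applies with b replaced by an orthogonal basis of the plane spanned by b and c_{i*}.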
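
The first-codebook search, including the shortcut of orthogonalizing only the basis vectors, might be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each codeword is assumed to be a ±1 signed sum of M basis vectors (so M = 6 gives 64 candidates and M = 11 gives 2,048), the target is assumed to be already in the comparison domain, and the helper names `remove_component` and `search_first_codebook` are hypothetical.

```python
import itertools

import numpy as np


def remove_component(v, ref):
    """Return v with its projection onto ref removed (the orthogonalizing step)."""
    v = np.asarray(v, dtype=float)
    ref = np.asarray(ref, dtype=float)
    energy = float(ref @ ref)
    if energy == 0.0:
        return v.copy()
    return v - (v @ ref) / energy * ref


def search_first_codebook(basis, pitch_vec, target):
    """Sketch of a first-codebook search with pitch orthogonalization.

    Assumptions (not taken from the patent): each codeword is a +/-1 signed sum
    of the M rows of `basis`, giving 2**M candidates; `target` is already in the
    domain where candidates are compared (any filtering is omitted).
    Returns (signs, beta, gamma): chosen signs, pitch coefficient, excitation gain.
    """
    # Orthogonalize only the M basis vectors. Because the operation is linear,
    # every signed combination of them is then orthogonal to pitch_vec too.
    ortho_basis = np.array([remove_component(v, pitch_vec) for v in basis])

    best_score, best_signs = -np.inf, None
    for signs in itertools.product((-1.0, 1.0), repeat=len(basis)):
        c_tilde = np.asarray(signs) @ ortho_basis
        energy = float(c_tilde @ c_tilde)
        if energy == 0.0:
            continue
        # Error reduction obtained with this candidate's own optimal gain.
        score = float(c_tilde @ target) ** 2 / energy
        if score > best_score:
            best_score, best_signs = score, np.asarray(signs)
    if best_signs is None:
        raise ValueError("every candidate lies along the pitch vector")

    # Selection done: drop the orthogonalization, rebuild the raw codeword and
    # jointly fit the pitch coefficient (beta) and excitation gain (gamma).
    codeword = best_signs @ basis
    design = np.column_stack([pitch_vec, codeword])
    (beta, gamma), *_ = np.linalg.lstsq(design, target, rcond=None)
    return best_signs, float(beta), float(gamma)


# Hypothetical usage: 6 basis vectors of 40 samples each (64 codewords).
rng = np.random.default_rng(0)
basis = rng.standard_normal((6, 40))
pitch_vec = rng.standard_normal(40)
true_signs = np.array([1.0, -1.0, 1.0, 1.0, -1.0, 1.0])
target = 0.5 * pitch_vec + 2.0 * (true_signs @ basis)
signs, beta, gamma = search_first_codebook(basis, pitch_vec, target)
```

Codewords come in ± pairs with identical scores, so the search may return the negated signs together with a negated gain; only their product is determined. A practical coder would likely avoid recomputing every combination from scratch (for example, by stepping through codewords in Gray-code order so that each step flips one sign), which this sketch omits for clarity.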
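
The two-codebook case, where second-codebook candidates are orthogonalized against both the intermediate pitch vector and the first selected excitation, can be sketched with Gram-Schmidt. The helper names `gram_schmidt`, `select_candidate` and `joint_gains`, the explicit candidate lists, and the toy data are all hypothetical and chosen only so the result can be checked by hand; this is not presented as the patent's implementation.

```python
import numpy as np


def gram_schmidt(vectors, tol=1e-12):
    """Mutually orthogonal vectors spanning the same space (near-zero ones dropped)."""
    ortho = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        w = v.copy()
        for q in ortho:
            w -= (w @ q) / (q @ q) * q
        if w @ w > tol * (v @ v):
            ortho.append(w)
    return ortho


def select_candidate(candidates, references, target, tol=1e-12):
    """Index of the candidate that best matches target once its components
    along all reference vectors have been removed (hypothetical helper)."""
    refs = gram_schmidt(references, tol)
    target = np.asarray(target, dtype=float)
    best_idx, best_score = -1, -np.inf
    for idx, c in enumerate(candidates):
        c = np.asarray(c, dtype=float)
        c_tilde = c.copy()
        for q in refs:
            c_tilde -= (c_tilde @ q) / (q @ q) * q
        energy = c_tilde @ c_tilde
        if energy <= tol * (c @ c):
            continue  # candidate (almost) lies in the span of the references
        score = (c_tilde @ target) ** 2 / energy
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx


def joint_gains(vectors, target):
    """Least-squares gains for the raw (non-orthogonalized) vectors."""
    design = np.column_stack([np.asarray(v, dtype=float) for v in vectors])
    gains, *_ = np.linalg.lstsq(design, np.asarray(target, dtype=float), rcond=None)
    return gains


# Toy example (hypothetical data): pitch vector along e0, target mixes
# 3.0 * pitch + 1.5 * e1 - 0.3 * e3.
e = np.eye(6)
pitch = e[0]
target = 3.0 * e[0] + 1.5 * e[1] - 0.3 * e[3]
book1 = [e[0], e[1] + e[2], e[1]]   # e[0] would win if not orthogonalized
book2 = [e[2], e[1], e[3]]          # e[1] duplicates the first selection

i1 = select_candidate(book1, [pitch], target)
i2 = select_candidate(book2, [pitch, book1[i1]], target)
beta, gamma1, gamma2 = joint_gains([pitch, book1[i1], book2[i2]], target)
```

In this toy case the first stage rejects the candidate lying along the pitch vector and picks e1; the second stage rejects the candidate that merely repeats e1 and picks e3, after which the joint fit recovers the pitch coefficient and both excitation gains.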

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Analogue/Digital Conversion (AREA)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US37054189A 1989-06-23 1989-06-23
US370541 1989-06-23
PCT/US1990/002469 WO1991001545A1 (en) 1989-06-23 1990-05-02 Digital speech coder with vector excitation source having improved speech quality

Publications (3)

Publication Number Publication Date
EP0484339A1 EP0484339A1 (en) 1992-05-13
EP0484339A4 EP0484339A4 (en) 1993-05-05
EP0484339B1 EP0484339B1 (en) 1998-02-04

Family

ID=23460115

Family Applications (1)

Application Number Title Priority Date Filing Date
EP90908908A Expired - Lifetime EP0484339B1 (en) 1989-06-23 1990-05-02 Digital speech coder with vector excitation source having improved speech quality

Country Status (10)

Country Link
EP (1) EP0484339B1 (en)
KR (1) KR950003557B1 (ko)
CN (1) CN1023160C (zh)
AU (1) AU638462B2 (en)
BR (1) BR9007467A (pt)
CA (1) CA2060310C (en)
DE (1) DE69032026T2 (de)
IL (1) IL94119A (he)
NZ (1) NZ234180A (en)
WO (1) WO1991001545A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0451199A (ja) * 1990-06-18 1992-02-19 Fujitsu Ltd Speech coding and decoding system
JPH0451200A (ja) * 1990-06-18 1992-02-19 Fujitsu Ltd Speech coding system
IT1241358B (it) * 1990-12-20 1994-01-10 Sip Speech signal coding system with nested subcode
JP2776050B2 (ja) * 1991-02-26 1998-07-16 NEC Corp Speech coding system
DE4315315A1 (de) * 1993-05-07 1994-11-10 Ant Nachrichtentech Method for vector quantization, in particular of speech signals
US5727122A (en) * 1993-06-10 1998-03-10 Oki Electric Industry Co., Ltd. Code excitation linear predictive (CELP) encoder and decoder and code excitation linear predictive coding method
JP3224955B2 (ja) * 1994-05-27 2001-11-05 Toshiba Corp Vector quantization apparatus and vector quantization method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA1252568A (en) * 1984-12-24 1989-04-11 Kazunori Ozawa Low bit-rate pattern encoding and decoding capable of reducing an information transmission rate
US4868867A (en) * 1987-04-06 1989-09-19 Voicecraft Inc. Vector excitation speech or audio coder for transmission or storage
US4899385A (en) * 1987-06-26 1990-02-06 American Telephone And Telegraph Company Code excited linear predictive vocoder

Also Published As

Publication number Publication date
DE69032026T2 (de) 1998-09-17
KR950003557B1 (ko) 1995-04-14
CA2060310C (en) 2001-07-17
NZ234180A (en) 1993-11-25
IL94119A (he) 1996-06-18
DE69032026D1 (de) 1998-03-12
CN1023160C (zh) 1993-12-15
EP0484339A4 (en) 1993-05-05
CA2060310A1 (en) 1990-12-24
BR9007467A (pt) 1992-06-16
AU5735990A (en) 1991-02-22
IL94119A0 (en) 1991-01-31
EP0484339A1 (en) 1992-05-13
KR920702787A (ko) 1992-10-06
CN1048278A (zh) 1991-01-02
AU638462B2 (en) 1993-07-01
WO1991001545A1 (en) 1991-02-07

Similar Documents

Publication Publication Date Title
EP0409239B1 (en) Speech coding/decoding method
CA2023167C (en) Speech coding system and a method of encoding speech
EP0443548A2 (en) Speech coder
US5633980A (en) Voice cover and a method for searching codebooks
US5694426A (en) Signal quantizer with reduced output fluctuation
EP0415163B1 (en) Digital speech coder having improved long term lag parameter determination
JPH04270398A (ja) Speech coding system
JPH04134400A (ja) Speech coding apparatus
EP0484339B1 (en) Digital speech coder with vector excitation source having improved speech quality
CA2147394C (en) Quantization of input vectors with and without rearrangement of vector elements of a candidate vector
EP0578436A1 (en) Selective application of speech coding techniques
US6330531B1 (en) Comb codebook structure
JP2002268686A (ja) Speech coding apparatus and speech decoding apparatus
US5924063A (en) Celp-type speech encoder having an improved long-term predictor
US5822721A (en) Method and apparatus for fractal-excited linear predictive coding of digital signals
KR100416363B1 (ko) Linear predictive analysis-by-synthesis encoding method and encoder
JP2658816B2 (ja) Speech pitch coding apparatus
JP3183944B2 (ja) Speech coding apparatus
JP3249144B2 (ja) Speech coding apparatus
EP0483882A2 (en) Speech parameter encoding method capable of transmitting a spectrum parameter with a reduced number of bits
JP3471889B2 (ja) Speech coding method and apparatus
JP3228389B2 (ja) Gain-shape vector quantization apparatus
JP3192051B2 (ja) Speech coding apparatus
JP2876785B2 (ja) Digital speech coder with vector excitation source having improved speech quality
US5761633A (en) Method of encoding and decoding speech signals

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19920109

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB

A4 Supplementary search report drawn up and despatched

Effective date: 19930319

AK Designated contracting states

Kind code of ref document: A4

Designated state(s): DE FR GB

17Q First examination report despatched

Effective date: 19951030

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

ET Fr: translation filed
REF Corresponds to:

Ref document number: 69032026

Country of ref document: DE

Date of ref document: 19980312

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20090507

Year of fee payment: 20

Ref country code: DE

Payment date: 20090529

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20090407

Year of fee payment: 20

REG Reference to a national code

Ref country code: GB

Ref legal event code: PE20

Expiry date: 20100501

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20100501

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20100502

P01 Opt-out of the competence of the unified patent court (upc) registered

Effective date: 20230520