WO1991001545A1 - Digital speech coder with vector excitation source having improved speech quality - Google Patents
- Publication number
- WO1991001545A1 PCT/US1990/002469
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- excitation signal
- excitation
- candidate
- signal
- determining
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L2019/0001—Codebooks
- G10L2019/0004—Design or structure of the codebook
- G10L2019/0005—Multi-stage vector quantisation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L2019/0001—Codebooks
- G10L2019/0011—Long term prediction filters, i.e. pitch estimation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L2019/0001—Codebooks
- G10L2019/0013—Codebook search algorithms
Definitions
- This invention relates generally to speech coders, and more particularly to digital speech coders that use vector excitation sources.
- Speech coders are known in the art. Some speech coders convert analog voice samples into digitized representations, and subsequently represent the spectral speech information through use of linear predictive coding. Other speech coders improve upon ordinary linear predictive coding techniques by providing an excitation signal that is related to the original voice signal. I have described, in previously issued U.S. Patent No. 4,817,157, a digital speech coder having an improved vector excitation source wherein a codebook of excitation vectors is accessed to select an excitation signal that best fits the available information, and hence provides a recovered speech signal that closely represents the original. In general, the resultant decoded speech signal will more closely represent the original unencoded speech signal if there is a significant number of candidate excitation vectors available for consideration as the excitation source. Increasing performance in this way, however, generally results in enlargement of the codebook size, and this will usually increase processing complexity and data rates.
- When encoding a signal sample, such as a speech sample, the coder first determines a pitch period parameter for the speech sample. Relying in part upon this pitch period parameter, a particular coded excitation signal can be determined independently of the pitch filter coefficient, following which the pitch filter coefficient parameter can be optimized for that particular speech sample. This methodology allows candidate excitation signals to be considered without requiring a commensurate increase in processing complexity or data rates.
- The coded excitation signal is determined substantially independently of any pitch information.
- Candidate excitation signals as provided by a codebook are processed to substantially remove components that are representable, at least in part, by a reference component that is related, at least in part, to the intermediate pitch vector. More particularly, the vector component related to the intermediate pitch vector is removed from the candidate excitation signal (a process known as orthogonalizing).
- The orthogonalized candidate excitation signals are then compared with the unencoded speech sample to identify the candidate excitation signal that best represents this particular speech sample.
- The pitch information, including a pitch filter coefficient parameter, can be optimized later to best suit the selected excitation signal, thereby yielding an overall optimized coded representation of the speech signal.
- A second codebook of candidate excitation signals can be provided, wherein two excitation signals are used to represent the speech sample.
- The first excitation signal can be selected as described above, and the second excitation signal can be selected in a similar manner, wherein candidate second excitation signals are first orthogonalized with respect to both the intermediate pitch vector and the previously selected first excitation signal.
- Fig. 1 comprises a block diagrammatic depiction of the invention.
- Fig. 2 comprises a simple vector diagram representing one aspect of the invention.
- Best Mode For Carrying Out The Invention:
- This invention can be embodied in a speech coder that makes use of an appropriate digital signal processor such as a Motorola DSP 56000 family device.
- The computational functions of such a DSP embodiment are represented in Fig. 1 as a block diagram equivalent circuit.
- A pitch period parameter (101) (determined in accordance with prior art technique) is provided to a pitch filter state (102) that comprises part of a pitch filter.
- The resultant signal (103) comprises an intermediate pitch vector that is provided to both a first multiplier (104) and two orthogonalizing processes (106 and 107), as described below in more detail.
- This first multiplier (104) functions to multiply the resultant signal by a pitch filter coefficient (108) to yield a pitch filter output (109). Selection of the pitch filter coefficient (108) will be described below in more detail.
- A first codebook (111) includes a set of basis vectors that can be linearly combined to form a plurality of resultant excitation signals.
- The number of possible resultant excitation signals can be, for example, between 64 and 2,048, with more being possible when appropriate to a particular application.
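As a rough illustration of how a small set of basis vectors can yield a large codebook, the sketch below forms every combination in which each basis vector is weighted by +1 or -1. That weighting rule, the function name `build_codebook`, and the toy basis are assumptions made for illustration; the text above says only that the basis vectors are linearly combined. Under this assumed rule, M basis vectors give 2^M candidates, so 6 to 11 basis vectors would cover the 64 to 2,048 range mentioned above.

```python
from itertools import product


def build_codebook(basis):
    """Return every +1/-1 weighted sum of the basis vectors (assumed rule)."""
    length = len(basis[0])
    codebook = []
    # One sign per basis vector; product() enumerates all 2**M sign patterns.
    for signs in product((1.0, -1.0), repeat=len(basis)):
        vector = [
            sum(sign * vec[n] for sign, vec in zip(signs, basis))
            for n in range(length)
        ]
        codebook.append(vector)
    return codebook


# Hypothetical toy basis: 3 vectors of 4 samples each.
basis = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
codebook = build_codebook(basis)
print(len(codebook))
```

Only the basis vectors need to be stored; the candidates themselves can be generated on demand, which may be one reason a basis-vector codebook is attractive on a DSP with limited memory.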
- The problem, when encoding a particular speech sample, is to select whichever of these excitation sources best represents the corresponding component of the original speech information.
- The excitation signals formulated by the first codebook (111) will be presented in seriatim fashion as candidate excitation sources.
- Each candidate excitation source will first be orthogonalized (106) with respect to the resultant signal. For example, referring momentarily to Fig. 2, if vector A were considered to represent the resultant signal and vector B were to represent a particular candidate excitation source, orthogonalization of the candidate excitation source signal would result in the vector denoted by reference character B'.
- The dimension of the vector space is a function of the number of samples comprising the vectors, which may be 40 samples or more.
- The candidate excitation vectors may be readily orthogonalized by orthogonalizing the basis vectors, wherein linear combinations of the orthogonalized basis vectors with one another will result in orthogonalized excitation vectors.
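The orthogonalization of Fig. 2 can be sketched as a single Gram-Schmidt step, B' = B - ((B·A)/(A·A))·A, where A stands for the resultant signal (103) and B for a candidate. The helper names `dot` and `orthogonalize` and the two-sample vectors are hypothetical; the patent text does not give a formula, so this is one standard way to remove the component of B along A. The second half of the sketch checks the linearity property described above: combining orthogonalized basis vectors gives the same result as orthogonalizing the combination.

```python
def dot(x, y):
    """Inner product of two equal-length sample vectors."""
    return sum(a * b for a, b in zip(x, y))


def orthogonalize(b, a):
    """Remove from b its component along a (one Gram-Schmidt step)."""
    energy = dot(a, a)
    if energy == 0.0:
        return list(b)  # nothing to remove when a is all zeros
    scale = dot(b, a) / energy
    return [bi - scale * ai for bi, ai in zip(b, a)]


# Toy 2-sample vectors in the spirit of Fig. 2.
A = [2.0, 0.0]          # stands in for the intermediate pitch vector
B = [1.0, 1.0]          # stands in for a candidate excitation vector
B_prime = orthogonalize(B, A)

# Linearity check with two hypothetical basis vectors and the combination v1 - v2.
v1, v2 = [1.0, 1.0], [3.0, -1.0]
combined_then_orth = orthogonalize([p - q for p, q in zip(v1, v2)], A)
orth_then_combined = [
    p - q for p, q in zip(orthogonalize(v1, A), orthogonalize(v2, A))
]
print(B_prime, dot(B_prime, A))
```

Because of this linearity, only the basis vectors need to be orthogonalized once per pitch vector, rather than every candidate individually.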
- The resulting candidate excitation source can be compared (112) with the unencoded signal (113) (or an appropriate representative signal based thereon) to determine the relative similarity or disparity between the two.
- The process is then repeated for each of the excitation sources of the first codebook (111).
- A determination can then be made as to which candidate excitation source most closely aligns with the unencoded signal (113).
- A gain factor (114) can also be used to modify each candidate excitation source signal, as well understood in the art.
- The excitation source selection and gain compensation can both be accomplished in a substantially simultaneous manner, as also well understood in the art.
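One common way to combine selection and gain compensation, offered here as an assumption rather than as the patent's stated procedure, is a least-squares search: for each candidate c and target t, the best gain is (c·t)/(c·c), and the candidate with the largest (c·t)²/(c·c) leaves the smallest squared error. The function name `search_codebook` and the toy data are hypothetical, and the target stands in for the unencoded signal (113) or a representative signal derived from it.

```python
def dot(x, y):
    """Inner product of two equal-length sample vectors."""
    return sum(a * b for a, b in zip(x, y))


def search_codebook(candidates, target):
    """Return (index, gain) of the candidate that best matches target.

    Minimizing |target - gain * c|^2 over gain gives gain = (c.t)/(c.c);
    the remaining error is smallest where (c.t)^2/(c.c) is largest.
    """
    best_index, best_gain, best_score = None, 0.0, -1.0
    for index, cand in enumerate(candidates):
        energy = dot(cand, cand)
        if energy == 0.0:
            continue  # a zero vector cannot represent anything
        corr = dot(cand, target)
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    return best_index, best_gain


# Hypothetical toy data: three (already orthogonalized) candidates.
target = [0.0, 2.0, 0.0]
candidates = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, -1.0, 0.0],
]
index, gain = search_codebook(candidates, target)
print(index, gain)
```

Note that the gain falls out of the same correlation and energy terms used for ranking, which is why the two steps can be done together.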
- The orthogonalizing process (106) can thereafter be dispensed with and the exact excitation source signal selected (116) through an appropriate control mechanism (117). Thereafter, presuming a single codebook coder, the pitch information can be gated (117) and summed (118) together with the selected excitation source, with the pitch filter coefficient (108) and excitation gain (114) optimized such that the combined excitation most closely aligns with the unencoded signal (113).
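The final optimization of the pitch filter coefficient (108) and the excitation gain (114) can be pictured as a two-parameter least-squares fit. The sketch below assumes a squared-error criterion and solves the 2x2 normal equations for the weights `beta` (applied to the pitch vector) and `gamma` (applied to the selected, non-orthogonalized excitation). The criterion, the function name `joint_gains`, and the toy vectors are illustrative assumptions, not details given in the text.

```python
def dot(x, y):
    """Inner product of two equal-length sample vectors."""
    return sum(a * b for a, b in zip(x, y))


def joint_gains(target, pitch_vec, excitation):
    """Least-squares beta, gamma for target ~ beta*pitch_vec + gamma*excitation."""
    pp = dot(pitch_vec, pitch_vec)
    cc = dot(excitation, excitation)
    pc = dot(pitch_vec, excitation)
    pt = dot(pitch_vec, target)
    ct = dot(excitation, target)
    det = pp * cc - pc * pc
    if det == 0.0:
        raise ValueError("pitch vector and excitation are linearly dependent")
    # Cramer's rule on the 2x2 normal equations.
    beta = (pt * cc - ct * pc) / det
    gamma = (ct * pp - pt * pc) / det
    return beta, gamma


# Hypothetical toy data: the target is built as 0.5 * pitch + 2.0 * excitation.
pitch_vec = [1.0, 1.0, 0.0]
excitation = [0.0, 1.0, 1.0]
target = [0.5, 2.5, 2.0]
beta, gamma = joint_gains(target, pitch_vec, excitation)
print(beta, gamma)
```

Since the two vectors are generally not orthogonal to each other at this stage, the weights have to be solved jointly rather than one at a time.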
- The pitch period parameter, pitch filter coefficient, and particular excitation source and gain are known, and appropriate representations thereof may be utilized thereafter as representative of the original speech sample.
- An additional codebook (121) can also be utilized; this second codebook (121) again includes a plurality of candidate excitation sources derived from basis vectors.
- The use of such multiple codebooks is understood in the art.
- The candidate excitation sources from the second codebook (121) are orthogonalized (107) with respect to both the resultant signal (103) and the selected excitation source signal from the first codebook (111).
- The selection process can then continue as described above, with the orthogonalized candidate excitation source signals from the second codebook (121) being compared against a representative unencoded signal (113) to identify the closest fit.
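Orthogonalizing against two reference vectors can be sketched with sequential Gram-Schmidt: first make the references mutually orthogonal, then subtract the candidate's component along each one. The helper names `remove_component` and `orthogonalize_against` and the three-sample vectors are hypothetical; the text states only that second-codebook candidates are orthogonalized with respect to both the resultant signal (103) and the selected first excitation.

```python
def dot(x, y):
    """Inner product of two equal-length sample vectors."""
    return sum(a * b for a, b in zip(x, y))


def remove_component(vec, ref):
    """Subtract from vec its projection onto ref."""
    energy = dot(ref, ref)
    if energy == 0.0:
        return list(vec)
    scale = dot(vec, ref) / energy
    return [v - scale * r for v, r in zip(vec, ref)]


def orthogonalize_against(vec, references):
    """Make vec orthogonal to every vector in references."""
    ortho_refs = []
    for ref in references:
        # Orthogonalize the references among themselves first, so that
        # removing one component cannot reintroduce another.
        for prev in ortho_refs:
            ref = remove_component(ref, prev)
        ortho_refs.append(ref)
    for ref in ortho_refs:
        vec = remove_component(vec, ref)
    return vec


# Hypothetical toy data.
pitch_vec = [1.0, 0.0, 0.0]        # stands in for the resultant signal (103)
first_choice = [1.0, 1.0, 0.0]     # stands in for the selected first excitation
candidate = [1.0, 2.0, 3.0]        # a second-codebook candidate
result = orthogonalize_against(candidate, [pitch_vec, first_choice])
print(result)
```

The surviving vector contains only what neither the pitch vector nor the first excitation can already represent, so the second search can proceed in the same way as the first.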
- The pitch filter coefficient (108) and excitation gains (114 and 120) can then be optimized as described above.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Analogue/Digital Conversion (AREA)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP90908908A EP0484339B1 (en) | 1989-06-23 | 1990-05-02 | Digital speech coder with vector excitation source having improved speech quality |
BR909007467A BR9007467A (pt) | 1989-06-23 | 1990-05-02 | Processo de codificacao de amostra de fala e processo de codificacao de amostra de sinal |
DE69032026T DE69032026T2 (de) | 1989-06-23 | 1990-05-02 | Digitaler sprachcodierer mit verbesserter sprachqualität unter anwendung einer vektoranregungsquelle |
CA002060310A CA2060310C (en) | 1989-06-23 | 1990-05-02 | Digital speech coder with vector excitation source having improved speech quality |
KR1019910701947A KR950003557B1 (ko) | 1989-06-23 | 1990-05-02 | 음성 샘플 및 신호 샘플 엔코딩 방법 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US37054189A | 1989-06-23 | 1989-06-23 | |
US370,541 | 1989-06-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1991001545A1 (en) | 1991-02-07 |
Family
ID=23460115
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1990/002469 WO1991001545A1 (en) | 1989-06-23 | 1990-05-02 | Digital speech coder with vector excitation source having improved speech quality |
Country Status (10)
Country | Link |
---|---|
EP (1) | EP0484339B1 (zh) |
KR (1) | KR950003557B1 (zh) |
CN (1) | CN1023160C (zh) |
AU (1) | AU638462B2 (zh) |
BR (1) | BR9007467A (zh) |
CA (1) | CA2060310C (zh) |
DE (1) | DE69032026T2 (zh) |
IL (1) | IL94119A (zh) |
NZ (1) | NZ234180A (zh) |
WO (1) | WO1991001545A1 (zh) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0462559A2 (en) * | 1990-06-18 | 1991-12-27 | Fujitsu Limited | Speech coding and decoding system |
EP0462558A2 (en) * | 1990-06-18 | 1991-12-27 | Fujitsu Limited | Speech coding system |
US5353373A (en) * | 1990-12-20 | 1994-10-04 | Sip - Societa Italiana Per L'esercizio Delle Telecomunicazioni P.A. | System for embedded coding of speech signals |
WO1994027286A1 (de) * | 1993-05-07 | 1994-11-24 | Ant Nachrichtentechnik Gmbh | Verfahren zur vektorquantisierung, insbesondere von sprachsignalen |
EP0654909A1 (en) * | 1993-06-10 | 1995-05-24 | Oki Electric Industry Company, Limited | Code excitation linear prediction encoder and decoder |
US5677986A (en) * | 1994-05-27 | 1997-10-14 | Kabushiki Kaisha Toshiba | Vector quantizing apparatus |
EP0898267A2 (en) * | 1991-02-26 | 1999-02-24 | Nec Corporation | Speech coding method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4821324A (en) * | 1984-12-24 | 1989-04-11 | Nec Corporation | Low bit-rate pattern encoding and decoding capable of reducing an information transmission rate |
US4868867A (en) * | 1987-04-06 | 1989-09-19 | Voicecraft Inc. | Vector excitation speech or audio coder for transmission or storage |
US4899385A (en) * | 1987-06-26 | 1990-02-06 | American Telephone And Telegraph Company | Code excited linear predictive vocoder |
-
1990
- 1990-04-18 IL IL9411990A patent/IL94119A/en not_active IP Right Cessation
- 1990-05-02 CA CA002060310A patent/CA2060310C/en not_active Expired - Lifetime
- 1990-05-02 WO PCT/US1990/002469 patent/WO1991001545A1/en active IP Right Grant
- 1990-05-02 AU AU57359/90A patent/AU638462B2/en not_active Expired
- 1990-05-02 BR BR909007467A patent/BR9007467A/pt not_active IP Right Cessation
- 1990-05-02 DE DE69032026T patent/DE69032026T2/de not_active Expired - Lifetime
- 1990-05-02 EP EP90908908A patent/EP0484339B1/en not_active Expired - Lifetime
- 1990-05-02 KR KR1019910701947A patent/KR950003557B1/ko not_active IP Right Cessation
- 1990-06-19 CN CN90103020A patent/CN1023160C/zh not_active Expired - Lifetime
- 1990-06-21 NZ NZ234180A patent/NZ234180A/en unknown
Non-Patent Citations (1)
Title |
---|
See also references of EP0484339A4 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5799131A (en) * | 1990-06-18 | 1998-08-25 | Fujitsu Limited | Speech coding and decoding system |
EP0462558A2 (en) * | 1990-06-18 | 1991-12-27 | Fujitsu Limited | Speech coding system |
EP0462559A3 (en) * | 1990-06-18 | 1992-08-05 | Fujitsu Limited | Speech coding and decoding system |
EP0462558A3 (en) * | 1990-06-18 | 1992-08-12 | Fujitsu Limited | Speech coding system |
EP0462559A2 (en) * | 1990-06-18 | 1991-12-27 | Fujitsu Limited | Speech coding and decoding system |
US5353373A (en) * | 1990-12-20 | 1994-10-04 | Sip - Societa Italiana Per L'esercizio Delle Telecomunicazioni P.A. | System for embedded coding of speech signals |
EP0898267A3 (en) * | 1991-02-26 | 1999-03-03 | Nec Corporation | Speech coding method and system |
EP0898267A2 (en) * | 1991-02-26 | 1999-02-24 | Nec Corporation | Speech coding method and system |
WO1994027286A1 (de) * | 1993-05-07 | 1994-11-24 | Ant Nachrichtentechnik Gmbh | Verfahren zur vektorquantisierung, insbesondere von sprachsignalen |
AU681137B2 (en) * | 1993-05-07 | 1997-08-21 | Bosch Telecom Gmbh | Process for vector quantization, especially of voice signals |
EP0654909A1 (en) * | 1993-06-10 | 1995-05-24 | Oki Electric Industry Company, Limited | Code excitation linear prediction encoder and decoder |
US5727122A (en) * | 1993-06-10 | 1998-03-10 | Oki Electric Industry Co., Ltd. | Code excitation linear predictive (CELP) encoder and decoder and code excitation linear predictive coding method |
EP0654909A4 (en) * | 1993-06-10 | 1997-09-10 | Oki Electric Ind Co Ltd | PREDICTIVE LINEAR ENCODER-ENCODER WITH CODES EXCITATION. |
US5677986A (en) * | 1994-05-27 | 1997-10-14 | Kabushiki Kaisha Toshiba | Vector quantizing apparatus |
Also Published As
Publication number | Publication date |
---|---|
AU5735990A (en) | 1991-02-22 |
KR920702787A (ko) | 1992-10-06 |
CA2060310A1 (en) | 1990-12-24 |
DE69032026T2 (de) | 1998-09-17 |
CN1023160C (zh) | 1993-12-15 |
EP0484339A1 (en) | 1992-05-13 |
DE69032026D1 (de) | 1998-03-12 |
EP0484339B1 (en) | 1998-02-04 |
AU638462B2 (en) | 1993-07-01 |
BR9007467A (pt) | 1992-06-16 |
IL94119A (en) | 1996-06-18 |
IL94119A0 (en) | 1991-01-31 |
NZ234180A (en) | 1993-11-25 |
CN1048278A (zh) | 1991-01-02 |
EP0484339A4 (en) | 1993-05-05 |
CA2060310C (en) | 2001-07-17 |
KR950003557B1 (ko) | 1995-04-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5208862A (en) | Speech coder | |
US8364473B2 (en) | Method and apparatus for receiving an encoded speech signal based on codebooks | |
KR100417836B1 (ko) | 과다-샘플된 합성 광대역 신호를 위한 고주파 내용 복구방법 및 디바이스 | |
US6484140B2 (en) | Apparatus and method for encoding a signal as well as apparatus and method for decoding signal | |
US5140638A (en) | Speech coding system and a method of encoding speech | |
EP0294020A2 (en) | Vector adaptive coding method for speech and audio | |
CA2202825C (en) | Speech coder | |
US5633980A (en) | Voice cover and a method for searching codebooks | |
KR20020077389A (ko) | 광대역 신호의 코딩을 위한 대수적 코드북에서의 펄스위치 및 부호의 인덱싱 | |
KR19980024885A (ko) | 벡터양자화 방법, 음성부호화 방법 및 장치 | |
US5926785A (en) | Speech encoding method and apparatus including a codebook storing a plurality of code vectors for encoding a speech signal | |
US5526464A (en) | Reducing search complexity for code-excited linear prediction (CELP) coding | |
EP0570365A1 (en) | Digital speech coder having optimized signal energy parameters | |
CA2147394C (en) | Quantization of input vectors with and without rearrangement of vector elements of a candidate vector | |
CA2060310C (en) | Digital speech coder with vector excitation source having improved speech quality | |
US5797119A (en) | Comb filter speech coding with preselected excitation code vectors | |
JP2002268686A (ja) | 音声符号化装置及び音声復号化装置 | |
US5822721A (en) | Method and apparatus for fractal-excited linear predictive coding of digital signals | |
Chen et al. | Vector adaptive predictive coding of speech at 9.6 kb/s | |
KR100416363B1 (ko) | 선형 예측 분석 대 합성 엔코딩 방법 및 엔코더 | |
JP2658816B2 (ja) | 音声のピッチ符号化装置 | |
WO2000017858A9 (en) | Robust fast search for two-dimensional gain vector quantizer | |
EP0405548B1 (en) | System for speech coding and apparatus for the same | |
JP3252285B2 (ja) | 音声帯域信号符号化方法 | |
JP2876785B2 (ja) | 改善された音声品質を有するベクトル励起源を具備するデジタル音声符号器 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AK | Designated states | Kind code of ref document: A1; Designated state(s): AU BR CA JP KR |
| AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FR GB IT LU NL SE |
| WWE | Wipo information: entry into national phase | Ref document number: 2060310; Country of ref document: CA |
| WWE | Wipo information: entry into national phase | Ref document number: 1990908908; Country of ref document: EP |
| WWP | Wipo information: published in national office | Ref document number: 1990908908; Country of ref document: EP |
| WWG | Wipo information: grant in national office | Ref document number: 1990908908; Country of ref document: EP |