AU638462B2 - Digital speech coder with vector excitation source having improved speech quality - Google Patents


Info

Publication number
AU638462B2
Authority
AU
Australia
Prior art keywords
excitation signal
signal
excitation
candidate
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
AU57359/90A
Other versions
AU5735990A (en
Inventor
Ira Alan Gerson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Motorola Solutions Inc
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1989-06-23
Filing date
1990-05-02
Publication date
1993-07-01
Application filed by Motorola Inc
Publication of AU5735990A
Application granted
Publication of AU638462B2
Anticipated expiration
Expired (current legal status)


Classifications

    • G PHYSICS
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L13/00 Speech synthesis; Text to speech systems
          • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
            • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
              • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
                • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
            • G10L2019/0001 Codebooks
              • G10L2019/0004 Design or structure of the codebook
                • G10L2019/0005 Multi-stage vector quantisation
              • G10L2019/0011 Long term prediction filters, i.e. pitch estimation
              • G10L2019/0013 Codebook search algorithms

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Analogue/Digital Conversion (AREA)

Description

OPI DATE 22/02/91 APPLN. ID 57359
AOJP DATE 28/03/91 PCT NUMBER PCT/US90/02469
INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(51) International Patent Classification 5: 5/00
(11) International Publication Number: WO 91/01545 A1
(43) International Publication Date: 7 February 1991 (07.02.91)
(21) International Application Number: PCT/US90/02469
(22) International Filing Date: 2 May 1990 (02.05.90)
Priority data: 370,541 23 July 1989 (23.07.89)
(81) Designated States: AT (European patent), AU, BE (European patent), BR, CA, CH (European patent), DE (European patent)*, DK (European patent), ES (European patent), FR (European patent), GB (European patent), IT (European patent), JP, KR, LU (European patent), NL (European patent), SE (European patent).
Published
With international search report.
638462
(71) Applicant: MOTOROLA, INC. [US/US]; 1303 East Algonquin Road, Schaumburg, IL 60196 (US).
(72) Inventor: GERSON, Ira, Alan; 1120 Nottingham Lane, Hoffman Estates, IL 60195 (US).
(74) Agents: PARMELEE, Steven, G. et al.; Motorola, Inc., Intellectual Property Department, 1303 East Algonquin Road, Schaumburg, IL 60196 (US).
(54) Title: DIGITAL SPEECH CODER WITH VECTOR EXCITATION SOURCE HAVING IMPROVED SPEECH QUALITY
[Front-page figure (Fig. 1): block diagram showing the pitch period parameter (101), pitch filter state, pitch filter coefficient, unencoded signal, codebooks No. 1 and No. 2 with their orthogonalizing processes, comparison, gain control signals, and control signal.]
(57) Abstract
In a vector excitation source digital speech coder utilizing vector excitation, candidate excitation sources (111, 121) are considered independent of certain pitch parameters. Once a particular excitation source has been selected, the excluded pitch parameter may then be optimized, resulting in an overall improvement in speech quality.
WO 91/01545 PCT/US90/02469
DIGITAL SPEECH CODER WITH VECTOR EXCITATION SOURCE HAVING IMPROVED SPEECH QUALITY
Technical Field
This invention relates generally to speech coders, and more particularly to digital speech coders that use vector excitation sources.
Background of the Invention
Speech coders are known in the art. Some speech coders convert analog voice samples into digitized representations, and subsequently represent the spectral speech information through use of linear predictive coding. Other speech coders improve upon ordinary linear predictive coding techniques by providing an excitation signal that is related to the original voice signal. I have described, in previously issued U.S. Patent No. 4,817,157, a digital speech coder having an improved vector excitation source wherein a codebook of excitation vectors is accessed to select an excitation signal that best fits the available information, and hence provides a recovered speech signal that closely represents the original.
In general, the resultant decoded speech signal will more closely represent the original unencoded speech signal if there is a significant number of candidate excitation vectors available for consideration as the excitation source. Increasing performance in this way, however, generally results in enlargement of the codebook size, and this will usually increase processing complexity and data rates.
A need therefore exists for a digital speech coder that uses a vector excitation signal, wherein for a given codebook size, the quality of the decoded speech signal is substantially maximized with minimal increase in complexity and substantially no increase in data rate.
These needs and others may be substantially met through provision of the digital speech coder with vector excitation source having improved speech quality disclosed herein. When encoding a signal sample, such as a speech sample, the coder may first determine a pitch period parameter for the speech sample. Relying in part upon this pitch period parameter, a particular coded excitation signal can be determined independent of the pitch filter coefficient, following which the pitch filter coefficient parameter can be optimized for that particular speech sample. This methodology may allow candidate excitation signals to be considered without requiring a commensurate increase in processing complexity or data rates.
In one embodiment, the coded excitation signal may be determined substantially independent of any pitch information. In particular, candidate excitation signals as provided by a codebook may be processed to substantially remove components that are representable, at least in part, by a reference component that is related, at least in part, to the intermediate pitch vector. More particularly, the vector component related to the intermediate pitch vector may be removed from the candidate excitation signal (a process known as orthogonalizing). The orthogonalized candidate excitation signals may then be compared with the unencoded speech sample to identify the candidate excitation signal that best represents this particular speech sample. The pitch information, including a pitch filter coefficient parameter, can be optimized later to best suit the selected excitation signal, thereby yielding an overall optimized coded representation of the speech signal.
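The following Python sketch shows the orthogonalize-then-compare search described above. It makes several assumptions:

- The unencoded speech sample and all candidates are plain lists of samples in one comparison domain. A practical coder would typically compare filtered, perceptually weighted signals, which the text leaves open.
- The helper names `dot`, `orthogonalize` and `select_excitation` are hypothetical.
- The match score (squared correlation divided by energy) is an assumed criterion, not one stated in the patent.

```python
def dot(u, v):
    """Inner product of two equal-length sample vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(candidate, reference):
    """Remove from `candidate` its component along `reference`."""
    rr = dot(reference, reference)
    if rr == 0.0:
        return list(candidate)  # nothing to remove
    scale = dot(candidate, reference) / rr
    return [c - scale * r for c, r in zip(candidate, reference)]


def select_excitation(target, pitch_vector, codebook):
    """Index of the codebook entry that, once orthogonalized against the
    intermediate pitch vector, best matches the target (assumed criterion:
    squared correlation over energy)."""
    best_index, best_score = -1, -1.0
    for i, candidate in enumerate(codebook):
        c = orthogonalize(candidate, pitch_vector)
        energy = dot(c, c)
        if energy == 0.0:
            continue  # candidate lies entirely along the pitch vector
        score = dot(target, c) ** 2 / energy
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

Consider the target [3, 0, 2] with pitch vector [1, 0, 0]. A plain correlation search would pick the entry [2, 0, 0], because that entry merely duplicates the pitch vector. Once orthogonalized, that entry has no energy left, so the search selects index 1 instead.

Because an orthogonalized candidate has no component along the pitch vector, correlating it with the raw target gives the same value as correlating it with the pitch-free part of the target.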
In another embodiment, a second codebook of candidate excitation signals, wherein two excitation signals are used to represent the speech sample, may be provided. The first excitation signal can be selected as described above, and the second excitation signal can be selected in a similar manner, wherein candidate second excitation signals may first be orthogonalized with respect to both the intermediate pitch vector and the previously selected first excitation signal.
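The same idea extends to the second codebook. In this minimal sketch, which again assumes plain lists of samples, modified Gram-Schmidt first builds an orthonormal basis for the span of two vectors: the intermediate pitch vector and the selected first excitation. It then strips that span from a second-codebook candidate. The function name `remove_span` and the tolerance `eps` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sample vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_span(candidate, references, eps=1e-12):
    """Project `candidate` onto the orthogonal complement of span(references),
    e.g. references = [pitch_vector, first_excitation]."""
    basis = []  # orthonormal vectors spanning the references
    for ref in references:
        v = list(ref)
        for q in basis:  # modified Gram-Schmidt step
            s = dot(v, q)
            v = [a - s * b for a, b in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > eps:  # skip references already in the span
            basis.append([a / norm for a in v])
    out = list(candidate)
    for q in basis:
        s = dot(out, q)
        out = [a - s * b for a, b in zip(out, q)]
    return out
```

The candidate that survives has no component along either the pitch vector or the first excitation. The second search therefore looks only for information the first two contributions cannot already represent.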
According to one aspect of the present invention there is provided a method of encoding a speech sample, comprising the steps of: A) determining a pitch period parameter for the speech sample; characterized by: B) determining, independent of any pitch filter coefficient, a coded excitation signal for the speech sample; and C) optimizing at least a pitch filter coefficient parameter for the speech sample.
According to a further aspect of the present invention there is provided a method of encoding a signal sample using at least two codebooks that include information regarding candidate excitation signals, comprising the steps of: A) determining, using a first one of the codebooks, a first excitation signal for the signal sample; characterized by: B) determining, using a second one of the codebooks, a second excitation signal for the signal sample, which second excitation signal is comprised of information that is substantially independent of information that is representable by the first excitation signal; and C) using the first and second excitation signals to represent, at least in part, the signal sample.
A preferred embodiment of the present invention will now be described with reference to the accompanying drawings wherein: Fig. 1 comprises a block diagrammatic depiction of the invention; and Fig. 2 comprises a simple vector diagram representing one aspect of the invention.
This invention can be embodied in a speech coder that makes use of an appropriate digital signal processor such as a Motorola DSP 56000 family device. The computational functions of such a DSP embodiment are represented in Fig. 1 as a block diagram equivalent circuit.
A pitch period parameter (101) (determined in accordance with prior art techniques) is provided to a pitch filter state (102) that comprises part of a pitch filter. The resultant signal (103) comprises an intermediate pitch vector that is provided to both a first multiplier (104) and two orthogonalizing processes (106 and 107) as described below in more detail. This first multiplier (104) functions to multiply the resultant signal by a pitch filter coefficient (108) to yield a pitch filter output (109). Selection of the pitch filter coefficient (108) will be described below in more detail.
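The text does not say how the pitch filter state (102) turns the pitch period parameter into the intermediate pitch vector. This sketch assumes a common long-term-predictor convention:

- The vector is read from the past excitation history, delayed by the pitch lag.
- When the lag is shorter than the vector, the delayed segment repeats periodically.

All names are hypothetical. The final multiplication mirrors multiplier (104).

```python
def intermediate_pitch_vector(history, lag, length):
    """Read `length` samples of past excitation `history` delayed by `lag`
    samples; repeat the delayed segment when lag < length (assumed convention)."""
    if lag <= 0 or lag > len(history):
        raise ValueError("lag must be in 1..len(history)")
    start = len(history) - lag
    return [history[start + (n % lag)] for n in range(length)]


def pitch_filter_output(pitch_vector, coefficient):
    """Multiplier (104): scale the intermediate pitch vector (103) by the
    pitch filter coefficient (108) to give the pitch filter output (109)."""
    return [coefficient * x for x in pitch_vector]
```

Keeping the unscaled vector (103) separate from the scaled output (109) is what allows the codebook search to proceed before the coefficient is known.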
A first codebook (111) includes a set of basis vectors that can be linearly combined to form a plurality of resultant excitation signals. Depending upon the size of the memory utilized, and other factors appropriate to the application, the number of possible resultant excitation signals can be, for example, between 64 and 2,048, with more of course being possible when appropriate to a particular application. The problem, when encoding a particular speech sample, is to select whichever of these excitation sources best represents the corresponding component of the original speech information.
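The text leaves open how the basis vectors are combined. This sketch assumes a sign-combination scheme of the kind used in vector-sum excitation coders:

- Each of M basis vectors gets a weight of +1 or -1.
- One code bit per basis vector selects the sign.

Under this scheme, M basis vectors yield 2^M excitation vectors, so M = 6 through 11 covers the 64 to 2,048 range mentioned above. The function names are hypothetical.

```python
def excitation_from_code(basis, code):
    """Bit m of `code` selects weight +1 (bit set) or -1 (bit clear) for
    basis vector m; the weighted vectors are summed (assumed convention)."""
    length = len(basis[0])
    out = [0.0] * length
    for m, vec in enumerate(basis):
        sign = 1.0 if (code >> m) & 1 else -1.0
        for n in range(length):
            out[n] += sign * vec[n]
    return out


def all_excitations(basis):
    """Every excitation vector reachable from the basis: 2**M of them."""
    return [excitation_from_code(basis, code) for code in range(2 ** len(basis))]
```

With this convention the codebook index is simply the M-bit code. A 2,048-entry codebook then costs 11 bits per selection while only 11 basis vectors need to be stored.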
Pursuant to this invention, once a particular resultant signal (103) has been determined, the excitation signals formulated by the first codebook (111) are presented in seriatim as candidate excitation sources. Each candidate excitation source is first orthogonalized (106) with respect to the resultant signal. For example, referring momentarily to Fig. 2, if vector A represents the resultant signal and vector B represents a particular candidate excitation source, orthogonalization of the candidate excitation source signal yields the component of B that is orthogonal to A, shown as a separate vector in Fig. 2. (It should be understood that, in practice, the dimension of the vector space equals the number of samples comprising the vectors, which may be 40 samples or more. It should also be noted that the candidate excitation vectors may be readily orthogonalized by orthogonalizing the basis vectors, since linear combinations of the orthogonalized basis vectors with one another will result in orthogonalized excitation vectors.) Once orthogonalized, the resulting candidate excitation source can be compared (112) with the unencoded signal (113) (or an appropriate representative signal based thereon) to determine the relative similarity or disparity between the two. The process is then repeated for each of the excitation sources of the first codebook (111). A determination can then be made as to which candidate excitation source most closely aligns with the unencoded signal (113).
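The remark about orthogonalizing basis vectors follows from linearity. For the vectors A and B of Fig. 2, the orthogonalized candidate is B - (⟨B, A⟩ / ⟨A, A⟩) A, and this is a linear function of B. The operation can therefore be applied once to each basis vector rather than to every combined candidate. The small numerical check below illustrates this; the vectors are arbitrary illustrative values and the helper names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sample vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(candidate, reference):
    """B - (<B, A> / <A, A>) A, with B = candidate and A = reference."""
    scale = dot(candidate, reference) / dot(reference, reference)
    return [c - scale * r for c, r in zip(candidate, reference)]


def combine(basis, weights):
    """Linear combination sum_m weights[m] * basis[m]."""
    return [sum(w * vec[n] for w, vec in zip(weights, basis))
            for n in range(len(basis[0]))]


reference = [1.0, 0.5, -0.25, 2.0]  # stands in for the resultant signal (103)
basis = [[0.2, 1.0, 0.0, -0.5],
         [1.0, -1.0, 0.5, 0.0],
         [0.0, 0.3, 1.0, 0.7]]
weights = [1.0, -1.0, 1.0]

# Route 1: combine the basis vectors first, then orthogonalize the candidate.
direct = orthogonalize(combine(basis, weights), reference)
# Route 2: orthogonalize each basis vector once, then combine.
orth_basis = [orthogonalize(v, reference) for v in basis]
via_basis = combine(orth_basis, weights)
```

Both routes give the same vector up to rounding. With M basis vectors, only M orthogonalizations are needed instead of one per candidate excitation.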
In this particular embodiment, a gain factor (114) can also be used to modify each candidate excitation source signal, as is well understood in the art. In addition, if desired, the excitation source selection and gain compensation can both be accomplished in a substantially simultaneous manner, as is also well understood in the art.
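The text says only that simultaneous selection and gain determination is well understood. One standard way to do it, assumed here, uses the least-squares gain. In the derivation below, x is the target vector and c_i is the i-th orthogonalized candidate; both symbols are introduced for illustration.

```latex
\begin{aligned}
E_i(\gamma) &= \lVert x - \gamma c_i \rVert^2
             = \lVert x \rVert^2 - 2\gamma \langle x, c_i \rangle + \gamma^2 \langle c_i, c_i \rangle, \\
\frac{dE_i}{d\gamma} = 0 \;&\Rightarrow\; \gamma_i^{\ast} = \frac{\langle x, c_i \rangle}{\langle c_i, c_i \rangle}, \\
E_i(\gamma_i^{\ast}) &= \lVert x \rVert^2 - \frac{\langle x, c_i \rangle^2}{\langle c_i, c_i \rangle}, \\
i^{\ast} &= \arg\max_i \frac{\langle x, c_i \rangle^2}{\langle c_i, c_i \rangle}.
\end{aligned}
```

Maximizing the last ratio selects the candidate. The gain for that candidate comes from the same two inner products, so no separate gain search is required.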
Once an appropriate excitation source from the first codebook (111) has been selected through this process, the orthogonalizing process (106) can thereafter be dispensed with and the exact excitation source signal selected (116) through an appropriate control mechanism (117). Thereafter, presuming a single-codebook coder, the pitch information can be gated (117) and summed (118) with the selected excitation source, and the pitch filter coefficient (108) and excitation gain (114) can be optimized such that the combined excitation most closely aligns with the unencoded signal (113). Once optimized, the pitch period parameter, pitch filter coefficient, and particular excitation source and gain are known, and appropriate representations thereof may thereafter be used to represent the original speech sample.
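After selection, the raw (non-orthogonalized) excitation is used again. It generally correlates with the pitch vector, so the pitch filter coefficient and the excitation gain must be chosen together.

This minimal sketch assumes a squared-error criterion in a common comparison domain. It solves the resulting 2x2 normal equations by Cramer's rule. The name `optimize_pitch_and_gain` is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sample vectors."""
    return sum(a * b for a, b in zip(u, v))


def optimize_pitch_and_gain(target, pitch_vector, excitation):
    """Choose beta (pitch filter coefficient) and gamma (excitation gain)
    minimizing ||target - beta*pitch_vector - gamma*excitation||^2."""
    bb = dot(pitch_vector, pitch_vector)
    cc = dot(excitation, excitation)
    bc = dot(pitch_vector, excitation)
    xb = dot(target, pitch_vector)
    xc = dot(target, excitation)
    det = bb * cc - bc * bc  # determinant of the 2x2 Gram matrix
    if abs(det) < 1e-12:
        raise ValueError("pitch vector and excitation are linearly dependent")
    beta = (xb * cc - xc * bc) / det
    gamma = (bb * xc - bc * xb) / det
    return beta, gamma
```

If the target is exactly 2 times the pitch vector plus 3 times the excitation, the function returns (2.0, 3.0).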
If desired, and as depicted in Fig. 1, an additional codebook (121) can be used; this second codebook (121) likewise includes a plurality of candidate excitation sources derived from basis vectors. The use of such multiple codebooks is understood in the art. Pursuant to this invention, however, once the first excitation source from the first codebook (111) has been selected as described above, the candidate excitation sources from the second codebook (121) are orthogonalized (107) with respect to both the resultant signal (103) and the selected excitation source signal from the first codebook (111). The selection process can then continue as described above, with the orthogonalized candidate excitation source signals from the second codebook (121) being compared against a representative unencoded signal (113) to identify the closest fit. Once this excitation source has been selected, the pitch filter coefficient (108) and excitation gains (114 and 120) can then be optimized as described above.
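With two codebooks there are three weights to optimize: the pitch filter coefficient and the gains 114 and 120. Under the same assumed squared-error criterion, they follow from the 3x3 normal equations. These equations are built from the Gram matrix of the pitch vector and both selected excitations. The small solver below is a hypothetical helper and works for any number of vectors.

```python
def dot(u, v):
    """Inner product of two equal-length sample vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(a, y):
    """Solve the small linear system a @ s = y by Gaussian elimination
    with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s[r] = (m[r][n] - sum(m[r][k] * s[k] for k in range(r + 1, n))) / m[r][r]
    return s


def optimize_gains(target, vectors):
    """Least-squares weights for `vectors`, e.g.
    [pitch_vector, first_excitation, second_excitation]."""
    gram = [[dot(u, v) for v in vectors] for u in vectors]
    rhs = [dot(target, u) for u in vectors]
    return solve(gram, rhs)
```

The returned list holds the pitch filter coefficient followed by the two excitation gains, in the same order as `vectors`.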
What is claimed is:

Claims (10)

1. A method of encoding a speech sample, comprising the steps of: A) determining a pitch period parameter for the speech sample; characterized by: B) determining, independent of any pitch filter coefficient, a coded excitation signal for the speech sample; and C) optimizing at least a pitch filter coefficient parameter for the speech sample.
2. The method of claim 1 further characterized in that the step of determining a coded excitation signal includes providing a plurality of candidate excitation signals.
3. The method of claim 2 further characterized in that the step of determining a coded excitation signal includes processing the plurality of candidate excitation signals to render processed candidate excitation signals that are comprised of information that is substantially independent of information that is representable by a pitch filter output that is derived, at least in part, as a function of the pitch period parameter.
4. The method of claim 2 further characterized in that the step of determining a coded excitation signal includes processing the plurality of candidate excitation signals to orthogonalize the plurality of candidate excitation signals with respect to a pitch filter output that is derived, at least in part, as a function of the pitch period parameter.
5. The method of claim 1, further characterized in that the step of determining the coded excitation signal comprises the steps of: B1) processing an excitation signal to substantially remove components that are representable, at least in part, by a reference that is related, at least in part, to the pitch period parameter; and B2) determining an appropriate excitation signal for the speech sample.
6. The method of claim 5 further characterized in that the step of processing the excitation signal includes processing the excitation signal to orthogonalize the excitation signal with respect to a pitch filter output that is derived, at least in part, as a function of the pitch period parameter.
7. The method of claim 5, further characterized by the step of: C1) processing a candidate excitation signal to substantially remove components that are representable, at least in part, by: a reference that is related, at least in part, to the pitch period parameter; and the appropriate excitation signal determined in step C.
8. The method of claim 7 further characterized in that the step of processing a candidate excitation signal includes processing the candidate excitation signal to orthogonalize the candidate excitation signal with respect to both the reference and the appropriate excitation signal determined in step C.
9. A method of encoding a signal sample using at least two codebooks that include information regarding candidate excitation signals, comprising the steps of: A) determining, using a first one of the codebooks, a first excitation signal for the signal sample; characterized by: B) determining, using a second one of the codebooks, a second excitation signal for the signal sample, which second excitation signal is comprised of information that is substantially independent of information that is representable by the first excitation signal; and C) using the first and second excitation signals to represent, at least in part, the signal sample.
10. The method of claim 9 further characterized in that the signal sample comprises a speech sample.
11. The method of claim 9 further characterized in that the step of determining the second excitation signal includes processing a candidate excitation signal to orthogonalize the candidate excitation signal with respect to the first excitation signal.
12. A method of encoding a signal sample substantially as herein described with reference to the accompanying drawings.
DATED: 27 April, 1993.
PHILLIPS ORMONDE FITZPATRICK
Attorneys for: MOTOROLA, INC.
AU57359/90A 1989-06-23 1990-05-02 Digital speech coder with vector excitation source having improved speech quality Expired AU638462B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US37054189A 1989-06-23 1989-06-23
US370541 1989-06-23

Publications (2)

Publication Number Publication Date
AU5735990A (en) 1991-02-22
AU638462B2 (en) 1993-07-01

Family

ID=23460115

Family Applications (1)

Application Number Title Priority Date Filing Date
AU57359/90A Expired AU638462B2 (en) 1989-06-23 1990-05-02 Digital speech coder with vector excitation source having improved speech quality

Country Status (10)

Country Link
EP (1) EP0484339B1 (en)
KR (1) KR950003557B1 (en)
CN (1) CN1023160C (en)
AU (1) AU638462B2 (en)
BR (1) BR9007467A (en)
CA (1) CA2060310C (en)
DE (1) DE69032026T2 (en)
IL (1) IL94119A (en)
NZ (1) NZ234180A (en)
WO (1) WO1991001545A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0451200A (en) * 1990-06-18 1992-02-19 Fujitsu Ltd Sound encoding system
JPH0451199A (en) * 1990-06-18 1992-02-19 Fujitsu Ltd Sound encoding/decoding system
IT1241358B (en) * 1990-12-20 1994-01-10 Sip VOICE SIGNAL CODING SYSTEM WITH NESTED SUBCODE
JP2776050B2 (en) * 1991-02-26 1998-07-16 日本電気株式会社 Audio coding method
DE4315315A1 (en) * 1993-05-07 1994-11-10 Ant Nachrichtentech Method for vector quantization, especially of speech signals
SG43128A1 (en) * 1993-06-10 1997-10-17 Oki Electric Ind Co Ltd Code excitation linear predictive (celp) encoder and decoder
JP3224955B2 (en) * 1994-05-27 2001-11-05 株式会社東芝 Vector quantization apparatus and vector quantization method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4821324A (en) * 1984-12-24 1989-04-11 Nec Corporation Low bit-rate pattern encoding and decoding capable of reducing an information transmission rate
US4868867A (en) * 1987-04-06 1989-09-19 Voicecraft Inc. Vector excitation speech or audio coder for transmission or storage
US4899385A (en) * 1987-06-26 1990-02-06 American Telephone And Telegraph Company Code excited linear predictive vocoder


Also Published As

Publication number Publication date
KR920702787A (en) 1992-10-06
EP0484339B1 (en) 1998-02-04
EP0484339A4 (en) 1993-05-05
WO1991001545A1 (en) 1991-02-07
CA2060310C (en) 2001-07-17
BR9007467A (en) 1992-06-16
DE69032026T2 (en) 1998-09-17
IL94119A0 (en) 1991-01-31
CN1023160C (en) 1993-12-15
NZ234180A (en) 1993-11-25
IL94119A (en) 1996-06-18
DE69032026D1 (en) 1998-03-12
KR950003557B1 (en) 1995-04-14
CA2060310A1 (en) 1990-12-24
AU5735990A (en) 1991-02-22
CN1048278A (en) 1991-01-02
EP0484339A1 (en) 1992-05-13

Similar Documents

Publication Publication Date Title
EP1164579B1 (en) Audible signal encoding method
US7444283B2 (en) Method and apparatus for transmitting an encoded speech signal
CA2156593C (en) Postfilter and method of postfiltering
JP3392412B2 (en) Voice coding apparatus and voice encoding method
CA2202825C (en) Speech coder
US6532443B1 (en) Reduced length infinite impulse response weighting
EP0957472B1 (en) Speech coding apparatus and speech decoding apparatus
US5633980A (en) Voice cover and a method for searching codebooks
AU638462B2 (en) Digital speech coder with vector excitation source having improved speech quality
CA2147394C (en) Quantization of input vectors with and without rearrangement of vector elements of a candidate vector
CA2090205C (en) Speech coding system
EP1093230A1 (en) Voice coder
JP3360545B2 (en) Audio coding device
JP3299099B2 (en) Audio coding device
EP1113418B1 (en) Voice encoding/decoding device
EP0405548B1 (en) System for speech coding and apparatus for the same
JP2001142499A (en) Speech encoding device and speech decoding device
JP3192051B2 (en) Audio coding device
JPH0844398A (en) Voice encoding device
EP0662682A2 (en) Speech signal coding
JPH09146599A (en) Sound coding device
JP2907019B2 (en) Audio coding device
EP0762658A3 (en) Method for coding vectorial digital data and associated decoding method
JPH09319399A (en) Voice encoder