CA2254620A1 - Vocoder with efficient, fault tolerant excitation vector encoding - Google Patents

Vocoder with efficient, fault tolerant excitation vector encoding

Info

Publication number
CA2254620A1
Authority
CA
Canada
Prior art keywords
excitation pulse
pulses
sets
pulse
excitation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002254620A
Other languages
French (fr)
Inventor
Michael D. Turner
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia of America Corp
Original Assignee
Lucent Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lucent Technologies Inc
Publication of CA2254620A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/06 Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A CELP vocoder efficiently encodes an excitation vector in a way that is less sensitive to single bit errors. Each of the pulses composing the excitation vector is limited to one of four predetermined positions. As a result, only three bits are required to encode each pulse (two bits for position and one sign bit) and, in addition, a single bit error only produces an error in one pulse.

Description

VOCODER WITH EFFICIENT, FAULT TOLERANT EXCITATION VECTOR ENCODING
Background of the Invention
Field of the Invention
The present invention relates to communications; more specifically, voice encoding.
Description of the Related Art
A voice encoder (vocoder) is used to encode voice signals so as to minimize the amount of bandwidth that is used for transmitting over communication channels. It is important to minimize the amount of bandwidth used per communication channel so as to maximize the number of channels available within a given range of spectrum. Many vocoders are known as code excited linear predictive (CELP) vocoders. Present CELP vocoders which model the fixed codebook contribution to the filter excitation as a series of pulses use an inefficient encoding scheme that is sensitive to bit errors. An encoding scheme that is wasteful of precious bandwidth and is sensitive to bit errors is particularly undesirable in an error-prone communication channel such as a wireless communication channel.
The encoding process involves representing a series of excitation pulses or an excitation vector as a series of bits referred to as a fixed index. The fixed index is used by a vocoder at a receiver to reproduce the excitation pulses which are then used to excite a speech model and thereby reproduce speech. Prior vocoders represent these pulses using 3-1/2 or more bits per pulse. Additionally, prior vocoders are sensitive to communication channel induced errors because a single bit error may produce errors in up to two pulses.
FIG. 1 illustrates a series of pulses that are to be represented by a fixed index. In this example there are ten pulses; each pulse may be positive or negative.
The fixed index specifies which ten of the forty possible predetermined positions are occupied by a pulse and the sign of each pulse. An inefficient coding scheme is illustrated by the table of FIG. 2. There are 40 possible positions for pulses; however, the table indicates that each pulse is limited to one of eight positions. As a result, the vocoder is limited to using excitation vectors that are composed of a series of pulses that are permitted by the possible combinations specified by the table. FIG. 2 illustrates a fixed index table where two pulses are associated with each row of the table. In the first row, each of pulses I0 and I5 is restricted to one of eight positions; namely, positions 0, 5, 10, 15, 20, 25, 30 and 35. Likewise, each remaining row specifies the possible positions that may be assigned to each pulse of the pulse pair associated with that row. It should be noted that specifying one of eight positions for each pulse requires three bits for each pulse.
Additionally, a sign is specified for each pulse. In this prior art system, one bit is used to specify the sign of the first pulse of each pulse pair in each row. The sign of the second pulse in each pulse pair is specified by the position of that pulse. If the second pulse has a position that is smaller than the first pulse's position, the sign of the second pulse is opposite to that of the first pulse, otherwise the signs of the pulses are the same. As a result, for ten pulses, thirty-five bits are used to specify their positions and signs (3.5 bits/pulse). It should be noted that in this system if a single bit error occurs it will not only affect the position or sign of the pulse associated with that error, but it may also affect the sign of the second pulse in a pair of pulses.
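The bit count and the sign coupling of this prior art scheme can be illustrated with a short Python sketch. The function and variable names are hypothetical, and the example positions are taken from the first row of the table described above; it is only meant to show why one corrupted position bit can disturb two pulses.

```python
# Sketch of the prior-art pair coding described above (names are illustrative).
PULSES = 10
POSITION_BITS = 3        # one of eight positions per pulse
PAIRS = PULSES // 2      # two pulses share each table row
SIGN_BITS = PAIRS        # one explicit sign bit per pair (first pulse only)

total_bits = PULSES * POSITION_BITS + SIGN_BITS
bits_per_pulse = total_bits / PULSES


def second_pulse_sign(first_sign, first_pos, second_pos):
    """Sign of the second pulse, implied by the relative positions."""
    return -first_sign if second_pos < first_pos else first_sign


# First row of the table: both pulses of the pair share these positions.
ROW_POSITIONS = [0, 5, 10, 15, 20, 25, 30, 35]
first_idx, second_idx, first_sign = 4, 3, +1   # positions 20 and 15

before = second_pulse_sign(
    first_sign, ROW_POSITIONS[first_idx], ROW_POSITIONS[second_idx])

# A single bit error in the first pulse's 3-bit position field.
corrupted_idx = first_idx ^ 0b100              # position 20 becomes position 0
after = second_pulse_sign(
    first_sign, ROW_POSITIONS[corrupted_idx], ROW_POSITIONS[second_idx])
```

Here the corrupted bit moves the first pulse and, because the implied sign depends on the relative positions, also inverts the sign of the second pulse.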
Summary of the Invention
The present invention provides a CELP vocoder that efficiently encodes an excitation vector in a way that is less sensitive to single bit errors. Each of the pulses composing the excitation vector is limited to one of four predetermined positions. As a result, only three bits are required to encode each pulse (two bits for position and one sign bit) and, in addition, a single bit error only produces an error in one pulse.
Brief Description of the Drawing
FIG. 1 illustrates a series of pulses;
FIG. 2 is a fixed index table illustrating an inefficient encoding scheme;
FIG. 3 is a block diagram of a typical vocoder;
FIG. 4 illustrates the major functions of encoder 14 of vocoder 10;
FIG. 5 is a functional block diagram of decoder 20 of vocoder 10;
FIG. 6 is a fixed index table specifying valid pulse positions for a ten pulse excitation vector;
FIG. 7 is a fixed index table specifying valid pulse positions for a five pulse excitation vector; and
FIG. 8 is a fixed index table specifying valid pulse positions for a three pulse excitation vector.
Detailed Description of the Invention
FIG. 3 illustrates a block diagram of a typical vocoder. Vocoder 10 receives digitized speech on input 12. The digitized speech is an analog speech signal that has been passed through an analog-to-digital converter, and has been broken into frames where each frame is typically on the order of 20 milliseconds.
The signal at input 12 is passed to encoder section 14 which encodes the speech so as to decrease the amount of bandwidth used to transmit the speech. The encoded speech is made available at output 16. The encoded speech is received by the decoder section of a similar vocoder at the other end of a communication channel. The decoder at the other end of the communication channel is similar or identical to the decoder portion of vocoder 10. Encoded speech is received by vocoder 10 through input 18, and is passed to decoder section 20. Decoder section 20 uses the encoded signals received from the transmitting vocoder to produce digitized speech at output 22.
Vocoders are well known in the communications arts. For example, vocoders are described in "Speech and Audio Coding for Wireless and Network Applications," edited by Bishnu S. Atal, Vladimir Cuperman, and Allen Gersho, 1993, by Kluwer Academic Publishers. Vocoders are widely available and manufactured by companies such as Qualcomm Incorporated of San Diego, California, and Lucent Technologies Inc., of Murray Hill, New Jersey.
FIG. 4 illustrates the major functions of encoder 14 of vocoder 10. A digitized speech signal is received at input 12, and is passed to linear predictive coder 40. Linear predictive coder 40 performs a linear predictive analysis of the incoming speech once per frame. Linear predictive analysis is well known in the art and produces a linear predictive synthesis model of the vocal tract based on the input speech signal. The linear predictive parameters or coefficients describing this model are transmitted as part of the encoded speech signal through output 16. Coder 40 uses this model to produce a residual speech signal which represents the excitation that the model uses to reproduce the input speech signal. The residual speech signal is made available at output 42. The residual speech from output 42 is provided to input 48 of open-loop pitch search unit 50, to an input of adaptive codebook unit 72 and to fixed codebook unit 82.
Impulse response unit 60 receives the linear predictive parameters from coder 40 and generates the impulse response of the model generated in coder 40. This impulse response is used in the adaptive and fixed codebook units.
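The relationship between the residual speech signal and the impulse response of the linear predictive model can be sketched in Python as follows. This is a minimal illustration and not the implementation of coder 40 or unit 60: the function names, the sign convention A(z) = 1 + sum of a_k z^-k, and the zero initial filter state are all assumptions.

```python
def residual(speech, lpc):
    """Apply the analysis (inverse) filter A(z) to the speech samples.

    Assumed convention: A(z) = 1 + sum_k lpc[k-1] * z^-k, zero initial state.
    """
    out = []
    for n, s in enumerate(speech):
        acc = s
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * speech[n - k]
        out.append(acc)
    return out


def impulse_response(lpc, length):
    """First `length` samples of the impulse response of the model 1/A(z)."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * h[n - k]
        h.append(acc)
    return h


def convolve(x, h):
    """Causal convolution of x with h, truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(min(n, len(h) - 1) + 1))
            for n in range(len(x))]
```

Under these assumptions, convolving the residual with the impulse response reproduces the original samples, which is the sense in which the residual is the excitation that the model uses to reproduce the input speech.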
Open loop pitch search unit 50 uses the residual speech signal from coder 40 to model its pitch and provides a pitch, or what is commonly called the pitch period or pitch delay signal, at output 52. The pitch delay signal from output 52 and the impulse response signal from output 64 of impulse response unit 60 are received by input 70 of adaptive codebook unit 72. Adaptive codebook unit 72 produces a pitch gain output and a pitch index output which become part of encoded speech output 16 of vocoder 10. Output 74 of adaptive codebook 72 also provides the pitch gain and pitch index signals to input 80 of fixed codebook unit 82.
Additionally, adaptive codebook 72 provides an excitation signal and an adaptive codebook target signal to input 80.
The adaptive codebook 72 produces its outputs using the digitized speech signal from input 12 and the residual speech signal produced by linear predictive coder 40. Adaptive codebook 72 uses the digitized speech signal and linear predictive coder 40's residual speech signal to form an adaptive codebook target signal. The adaptive codebook target signal is used as an input to fixed codebook 82, and as an input to a computation that produces the pitch gain, pitch index and excitation outputs of adaptive codebook unit 72. Additionally, the adaptive codebook target signal, the pitch delay signal from open loop pitch search unit 50, and the impulse response from impulse response unit 60 are used to produce the pitch index, the pitch gain and excitation signals which are passed to fixed codebook unit 82. The manner in which these signals are computed is well known in the vocoder art.
Fixed codebook 82 uses the inputs received from input 80 to produce a fixed gain output and a fixed index output which are used as part of the encoded speech at output 16. The fixed codebook unit attempts to model the stochastic part of linear predictive coder 40's residual speech signal. A target for a fixed codebook search is produced by determining a fixed codebook error or the difference between the current adaptive codebook target signal and the residual speech signal from linear predictive coder 40. The fixed codebook error is well known in the art and is described in telecommunications standards as the mean square error between a weighted speech signal and a weighted synthesis speech signal. These standards are published by groups such as the International Telecommunication Union, the European Telecommunications Standards Institute, and the Telecommunications Industry Association. The fixed codebook search produces the fixed gain and fixed index that minimize the fixed codebook error or the mean square of the error.
The fixed index describes a set of excitation pulses. The fixed index is obtained by searching for a set of excitation pulses that minimizes the fixed codebook error; however, the search for a set of excitation pulses is limited to valid sets of excitation pulses defined by the fixed codebook's fixed index table. The fixed index table limits the number of possible positions that each pulse may occupy. The manner in which the fixed gain and fixed index signals are computed using the outputs from adaptive codebook unit 72 is well known in the vocoder art.
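As a rough illustration of a search that is restricted by a fixed index table, the following Python sketch exhaustively tries every valid pulse set of a toy three pulse table and keeps the one with the smallest squared error. The table, the subframe length of 12, the function names and the plain (unweighted) squared error are assumptions made for illustration; practical vocoders use perceptual weighting and faster search strategies.

```python
from itertools import product

SUBFRAME = 12
# Hypothetical table: pulse i may sit at i, i+3, i+6 or i+9.
TABLE = [[i + 3 * k for k in range(4)] for i in range(3)]


def filtered(code, h):
    """Convolve a code vector with impulse response h, truncated to the subframe."""
    out = [0.0] * SUBFRAME
    for n, c in enumerate(code):
        if c:
            for k, hk in enumerate(h):
                if n + k < SUBFRAME:
                    out[n + k] += c * hk
    return out


def search(target, h):
    """Return (error, gain, [(position, sign), ...]) for the best valid pulse set."""
    best = None
    for positions in product(*TABLE):            # only positions allowed by the table
        for signs in product((+1, -1), repeat=len(TABLE)):
            code = [0.0] * SUBFRAME
            for p, s in zip(positions, signs):
                code[p] += s
            y = filtered(code, h)
            energy = sum(v * v for v in y)
            if energy == 0.0:
                continue
            # Gain that minimizes the squared error for this candidate.
            gain = sum(t * v for t, v in zip(target, y)) / energy
            error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
            if best is None or error < best[0]:
                best = (error, gain, list(zip(positions, signs)))
    return best
```

With four positions and two signs per pulse, this toy search visits 4^3 x 2^3 = 512 candidates; the same brute-force approach would be impractical for ten pulses, which is one reason the sketch should not be read as a production search.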
FIG. 5 illustrates a functional block diagram of decoder 20 of vocoder 10. Encoded speech signals are received at input 18 of decoder 20. The encoded speech signals are received by decoder 100. Decoder 100 produces fixed and adaptive code vectors corresponding to the fixed index and pitch index signals, respectively. These code vectors are passed to the excitation construction portion of unit 110 along with the pitch gain and the fixed gain signals. The pitch gain signal is used to scale the adaptive vector which was produced using the pitch index signal, and the fixed gain signal is used to scale the fixed vector which was obtained using the fixed index signal. Decoder 100 passes the linear predictive code parameters to the filter or model synthesis section of unit 110. Unit 110 then uses the scaled vectors to excite the filter that is synthesized using the linear predictive coefficients produced by linear predictive coder 40, and produces an output signal which is representative of the digitized speech originally received at input 12.
Optionally, post filter 120 may be used to shape the spectrum of the digitized speech signal that is produced at output 22.
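The excitation construction and synthesis steps of unit 110 can be sketched in Python as shown below. The function names, the summing of the two scaled vectors into one excitation, and the filter convention A(z) = 1 + sum of a_k z^-k are assumptions; the post filter is omitted.

```python
def build_excitation(adaptive_vec, fixed_vec, pitch_gain, fixed_gain):
    """Scale the adaptive and fixed code vectors and sum them sample by sample."""
    return [pitch_gain * a + fixed_gain * f
            for a, f in zip(adaptive_vec, fixed_vec)]


def synthesize(excitation, lpc, history=None):
    """Excite the all-pole filter 1/A(z) with the given excitation.

    Assumed convention: A(z) = 1 + sum_k lpc[k-1] * z^-k.
    `history` holds previous output samples, most recent first.
    """
    past = list(history) if history else [0.0] * len(lpc)
    out = []
    for e in excitation:
        s = e - sum(a * p for a, p in zip(lpc, past))
        out.append(s)
        past = [s] + past[:-1]   # shift the newest output sample into the state
    return out
```

Passing the final filter state from one subframe as `history` for the next would keep the output continuous across subframes; that bookkeeping is left out of the sketch.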
Referring back to FIG. 4, one of fixed codebook 82's outputs is a fixed index. A fixed index is produced four times per frame (once per subframe), which is every 5 msec for a system using 20 msec frames. The fixed index specifies an excitation vector or a series of excitation pulses, where the bits of the fixed index describe the position and sign of the pulses. As mentioned earlier, these excitation pulses are used as inputs to a speech model in a receiving vocoder.
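The effect of the fixed index size on channel bandwidth follows from this timing. The short Python calculation below is only a worked example; it assumes a ten pulse excitation vector and compares the three bits per pulse summarized above with the 3.5 bits per pulse of the prior art scheme, counting the fixed index alone and ignoring all other encoded parameters.

```python
FRAME_MS = 20
SUBFRAMES_PER_FRAME = 4
subframe_ms = FRAME_MS / SUBFRAMES_PER_FRAME   # one fixed index per subframe


def fixed_index_rate(bits_per_subframe):
    """Bits per second spent on the fixed index alone."""
    return bits_per_subframe * 1000 / subframe_ms


new_rate = fixed_index_rate(10 * 3)   # ten pulses at 3 bits each
old_rate = fixed_index_rate(35)       # ten pulses at 3.5 bits each
saving = old_rate - new_rate
```

Under these assumptions the fixed index uses 6000 bits per second instead of 7000 bits per second.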
FIG. 6 illustrates a fixed index table used for specifying the possible predetermined positions of the excitation pulses composing a valid excitation vector.
Each pulse is limited to one of four predetermined positions and therefore only requires two bits to specify a position. A third bit is used to specify a sign. For example, if ten pulses are to be specified, ten rows each having four possible positions are included in the table. In this example, pulse I0 may occupy positions 0, 10, 20 or 30. Likewise, each of the other pulses may occupy one of the possible positions specified in its row. In this example, only thirty bits are required to specify the position and sign of ten pulses (3 bits/pulse) because two bits per pulse specify position and one bit per pulse specifies a sign.
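A minimal Python sketch of this three bits per pulse packing is given below. Only the row for pulse I0 (positions 0, 10, 20 and 30) is stated in the text; the remaining rows of the table, the bit order within the index and the function names are assumptions chosen for illustration.

```python
NUM_PULSES = 10
# Assumed layout: pulse i may sit at i, i+10, i+20 or i+30.
# Only the row for pulse I0 (0, 10, 20, 30) comes from the description.
TABLE = [[i + 10 * k for k in range(4)] for i in range(NUM_PULSES)]


def encode(pulses):
    """Pack (position, sign) pairs, one per table row, into a 30-bit index."""
    index = 0
    for row, (pos, sign) in zip(TABLE, pulses):
        field = (row.index(pos) << 1) | (1 if sign < 0 else 0)  # 2 bits + 1 bit
        index = (index << 3) | field
    return index


def decode(index):
    """Recover the (position, sign) pairs from a 30-bit index."""
    pulses = []
    for i, row in enumerate(TABLE):
        shift = 3 * (NUM_PULSES - 1 - i)
        field = (index >> shift) & 0b111
        pulses.append((row[field >> 1], -1 if field & 1 else +1))
    return pulses


pulses = [(TABLE[i][i % 4], +1 if i % 2 == 0 else -1) for i in range(NUM_PULSES)]
index = encode(pulses)

# Simulate a single channel bit error and list the pulses that changed.
received = decode(index ^ (1 << 13))
changed = [i for i in range(NUM_PULSES) if received[i] != pulses[i]]
```

Because every pulse owns its own three bit field and no sign is inferred from another pulse, flipping any one bit of the index changes exactly one decoded pulse.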
FIG. 7 illustrates a fixed index table used for specifying the possible predetermined positions of five pulses where each pulse may occupy only one of four positions.
FIG. 8 illustrates a fixed index table specifying the possible predetermined positions of the pulses in a three pulse excitation vector where the excitation pulses specified by the last two rows are limited to three possible predetermined locations each. It is also possible to use a fixed index table that limits one or more excitation pulses to two possible predetermined locations each.
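The size of the fixed index for tables with mixed position counts can be estimated with the small Python helper below. It assumes, as a simplification, that each pulse is coded in its own field with a whole number of position bits plus one sign bit; the helper name is hypothetical.

```python
def index_bits(positions_per_pulse):
    """Total fixed index bits when each pulse has its own position and sign field."""
    total = 0
    for n in positions_per_pulse:
        position_bits = (n - 1).bit_length()   # smallest bit count covering n choices
        total += position_bits + 1             # plus one sign bit
    return total


ten_pulse = index_bits([4] * 10)       # table of FIG. 6
five_pulse = index_bits([4] * 5)       # table of FIG. 7
three_pulse = index_bits([4, 3, 3])    # table of FIG. 8
two_position = index_bits([2, 2])      # pulses limited to two positions each
```

Under this assumption a pulse limited to three positions still needs two position bits, so the three pulse table costs nine bits, while pulses limited to two positions need only one position bit each.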
The schemes of FIGS. 6, 7 and 8 may be applied to excitation vectors having any number of pulses, and the number of possible predetermined positions that each pulse may occupy may be limited to four or fewer.
The functional block diagrams can be implemented in various forms.
Each block can be implemented individually using microprocessors or microcomputers, or they can be implemented using a single microprocessor or microcomputer. It is also possible to implement each or all of the functional blocks using programmable digital signal processing devices or specialized devices received from the aforementioned manufacturers or other semiconductor manufacturers.

Claims (20)

1. A method for encoding an excitation vector, comprising the steps of:
selecting a selected excitation pulse set from a plurality of valid excitation pulse sets, each excitation pulse set having a plurality of excitation pulses;
restricting the plurality of valid excitation pulse sets to sets where each excitation pulse is limited to one of up to four predetermined positions; and
producing an output describing the selected excitation pulse set.
2. The method of claim 1, wherein the step of restricting comprises restricting the plurality of valid excitation pulse sets to sets where each excitation pulse is limited to one of four predetermined positions.
3. The method of claim 1, wherein the step of restricting comprises restricting the plurality of valid excitation pulse sets to sets where a first excitation pulse is limited to one of up to four predetermined positions and a second excitation pulse is limited to one of up to three predetermined positions.
4. The method of claim 1, wherein the step of restricting comprises restricting the plurality of valid excitation pulse sets to sets where a first excitation pulse is limited to one of four predetermined positions and a second excitation pulse is limited to one of three predetermined positions.
5. The method of claim 1, wherein the step of producing an output comprises producing an output that describes a position of each excitation pulse in the selected excitation pulse set by up to two bits.
6. The method of claim 5, wherein the step of producing an output comprises producing an output that describes a sign of each excitation pulse in the selected excitation pulse set by one bit.
7. The method of claim 1, wherein the step of selecting comprises selecting a selected excitation pulse set having ten pulses.
8. The method of claim 1, wherein the step of selecting comprises selecting a selected excitation pulse set having five pulses.
9. The method of claim 1, wherein the step of selecting comprises selecting a selected excitation pulse set having four pulses.
10. The method of claim 1, wherein the step of selecting comprises selecting a selected excitation pulse set having three pulses.
11. A method for encoding an excitation vector, comprising the steps of:
searching through a plurality of valid excitation pulse sets for a selected excitation pulse set that minimizes a fixed codebook error, each excitation pulse set having a plurality of excitation pulses;
restricting the plurality of valid excitation pulse sets to sets where each excitation pulse is limited to one of up to four predetermined positions; and
producing an output describing the selected excitation pulse set.
12. The method of claim 11, wherein the step of restricting comprises restricting the plurality of valid excitation pulse sets to sets where each excitation pulse is limited to one of four predetermined positions.
13. The method of claim 11, wherein the step of restricting comprises restricting the plurality of valid excitation pulse sets to sets where a first excitation pulse is limited to one of up to four predetermined positions and a second excitation pulse is limited to one of up to three predetermined positions.
14. The method of claim 11, wherein the step of restricting comprises restricting the plurality of valid excitation pulse sets to sets where a first excitation pulse is limited to one of four predetermined positions and a second excitation pulse is limited to one of three predetermined positions.
15. The method of claim 11, wherein the step of producing an output comprises producing an output that describes a position of each excitation pulse in the selected excitation pulse set by up to two bits.
16. The method of claim 15, wherein the step of producing an output comprises producing an output that describes a sign of each excitation pulse in the selected excitation pulse set by one bit.
17. The method of claim 11, wherein the step of selecting comprises selecting a selected excitation pulse set having ten pulses.
18. The method of claim 11, wherein the step of selecting comprises selecting a selected excitation pulse set having five pulses.
19. The method of claim 11, wherein the step of selecting comprises selecting a selected excitation pulse set having four pulses.
20. The method of claim 11, wherein the step of selecting comprises selecting a selected excitation pulse set having three pulses.
CA002254620A 1998-01-13 1998-11-30 Vocoder with efficient, fault tolerant excitation vector encoding Abandoned CA2254620A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US667698A 1998-01-13 1998-01-13
US09/006,676 1998-01-13

Publications (1)

Publication Number Publication Date
CA2254620A1 true CA2254620A1 (en) 1999-07-13

Family

ID=21722053

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002254620A Abandoned CA2254620A1 (en) 1998-01-13 1998-11-30 Vocoder with efficient, fault tolerant excitation vector encoding

Country Status (5)

Country Link
EP (1) EP0930608A1 (en)
KR (1) KR19990067850A (en)
CN (1) CN1239796A (en)
BR (1) BR9900019A (en)
CA (1) CA2254620A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL8500843A (en) * 1985-03-22 1986-10-16 Koninkl Philips Electronics Nv MULTIPULS EXCITATION LINEAR-PREDICTIVE VOICE CODER.
SE463691B (en) * 1989-05-11 1991-01-07 Ericsson Telefon Ab L M PROCEDURE TO DEPLOY EXCITATION PULSE FOR A LINEAR PREDICTIVE ENCODER (LPC) WORKING ON THE MULTIPULAR PRINCIPLE
FR2729245B1 (en) * 1995-01-06 1997-04-11 Lamblin Claude LINEAR PREDICTION SPEECH CODING AND EXCITATION BY ALGEBRIC CODES
US5822724A (en) * 1995-06-14 1998-10-13 Nahumi; Dror Optimized pulse location in codebook searching techniques for speech processing

Also Published As

Publication number Publication date
BR9900019A (en) 2000-01-04
KR19990067850A (en) 1999-08-25
EP0930608A1 (en) 1999-07-21
CN1239796A (en) 1999-12-29

Similar Documents

Publication Publication Date Title
US7280959B2 (en) Indexing pulse positions and signs in algebraic codebooks for coding of wideband signals
US7016831B2 (en) Voice code conversion apparatus
EP0409239B1 (en) Speech coding/decoding method
JP3430175B2 (en) Algebraic codebook with signal-selected pulse amplitude for fast encoding of speech signals
US6470313B1 (en) Speech coding
US7792679B2 (en) Optimized multiple coding method
KR20040028750A (en) Method and system for line spectral frequency vector quantization in speech codec
EP0815554A1 (en) Analysis-by-synthesis linear predictive speech coder
JPH05197400A (en) Means and method for low-bit-rate vocoder
US20040024594A1 (en) Fine granularity scalability speech coding for multi-pulses celp-based algorithm
US6847929B2 (en) Algebraic codebook system and method
JPH11259100A (en) Method for encoding exciting vector
EP0556354B1 (en) Error protection for multimode speech coders
EP1020848A2 (en) Method for transmitting auxiliary information in a vocoder stream
JP3063668B2 (en) Voice encoding device and decoding device
EP0863500A2 (en) Variable rate speech coding method and decoding method
US20030055633A1 (en) Method and device for coding speech in analysis-by-synthesis speech coders
EP0930608A1 (en) Vocoder with efficient, fault tolerant excitation vector encoding
AU1127699A (en) Vocoder with efficient, fault tolerant excitation vector encoding
KR100389898B1 (en) Method for quantizing linear spectrum pair coefficient in coding voice
JPH05165498A (en) Voice coding method
JPH09269798A (en) Voice coding method and voice decoding method

Legal Events

Date Code Title Description
EEER Examination request
FZDE Dead