CA2254620A1

CA2254620A1 - Vocoder with efficient, fault tolerant excitation vector encoding

Info

Publication number: CA2254620A1
Application number: CA002254620A
Authority: CA
Inventors: Michael D. Turner
Original assignee: Lucent Technologies Inc
Current assignee: Nokia of America Corp
Priority date: 1998-01-13
Filing date: 1998-11-30
Publication date: 1999-07-13
Also published as: BR9900019A; KR19990067850A; EP0930608A1; CN1239796A

Abstract

A CELP vocoder efficiently encodes an excitation vector in a way that is less sensitive to single bit errors. Each of the pulses composing the excitation vector is limited to one of four predetermined positions. As a result, only three bits are required to encode each pulse (two bits for position and one sign bit) and, in addition, a single bit error produces an error in only one pulse.

Description

VOCODER WITH EFFICIENT, FAULT TOLERANT EXCITATION VECTOR ENCODING
Background of the Invention
Field of the Invention
The present invention relates to communications; more specifically, to voice encoding.
Description of the Related Art
A voice encoder (vocoder) is used to encode voice signals so as to minimize the amount of bandwidth used for transmission over communication channels. It is important to minimize the bandwidth used per communication channel so as to maximize the number of channels available within a given range of spectrum. Many vocoders are of a type known as code excited linear predictive (CELP) vocoders. Present CELP vocoders that model the fixed codebook contribution to the filter excitation as a series of pulses use an inefficient encoding scheme that is sensitive to bit errors. An encoding scheme that wastes precious bandwidth and is sensitive to bit errors is particularly undesirable in an error-prone communication channel such as a wireless communication channel.
The encoding process involves representing a series of excitation pulses, or an excitation vector, as a series of bits referred to as a fixed index. A vocoder at the receiver uses the fixed index to reproduce the excitation pulses, which are then used to excite a speech model and thereby reproduce speech. Prior vocoders represent these pulses using 3.5 or more bits per pulse. Additionally, prior vocoders are sensitive to channel-induced errors because a single bit error may produce errors in up to two pulses.
FIG. 1 illustrates a series of pulses that are to be represented by a fixed index. In this example there are ten pulses; each pulse may be positive or negative.
The fixed index specifies which ten of the forty possible predetermined positions are occupied by a pulse, and the sign of each pulse. An inefficient coding scheme is illustrated by the table of FIG. 2. There are 40 possible positions for pulses; however, the table limits each pulse to one of eight positions. As a result, the vocoder can only use excitation vectors composed of pulse combinations permitted by the table.
FIG. 2 illustrates a fixed index table where two pulses are associated with each row of the table. In the first row, each of pulses I0 and I5 is restricted to one of eight positions; namely, positions 0, 5, 10, 15, 20, 25, 30 and 35. Likewise, each remaining row specifies the possible positions that may be assigned to each pulse of the pulse pair associated with that row. Note that specifying one of eight positions for each pulse requires three bits per pulse.
Additionally, a sign is specified for each pulse. In this prior art system, one bit specifies the sign of the first pulse of the pulse pair in each row. The sign of the second pulse in each pair is implied by its position: if the second pulse has a position smaller than the first pulse's position, its sign is opposite to that of the first pulse; otherwise the two signs are the same. As a result, thirty-five bits are used to specify the positions and signs of ten pulses (3.5 bits/pulse). Note also that in this system a single bit error affects not only the position or sign of the pulse associated with that bit, but may also change the sign of the second pulse in the pair.
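The pair encoding described above, and the way one bit error can corrupt two pulses, can be illustrated with a short sketch. It follows the sign rule exactly as stated in the text; everything else is assumed for illustration: a track holds the eight positions {t, t+5, ..., t+35}, sign bit 0 means positive, and the 7-bit pair word is laid out as [sign of first pulse | 3-bit index of first | 3-bit index of second]. The helper names are hypothetical.

```python
# Sketch of a FIG. 2 style (prior art) pulse-pair encoding.
# Assumed (not from the text): track = {t, t+5, ..., t+35}, sign bit 0 = positive,
# word layout = [sign of first | 3-bit index of first | 3-bit index of second].

STEP = 5        # spacing between allowed positions on one track
TRACK_SIZE = 8  # eight positions per pulse -> 3 position bits


def _index(track, pos):
    """Map an absolute position to its 3-bit index on the given track."""
    idx, rem = divmod(pos - track, STEP)
    if rem or not 0 <= idx < TRACK_SIZE:
        raise ValueError(f"position {pos} is not on track {track}")
    return idx


def encode_pair(track, pulse_a, pulse_b):
    """Encode two (position, sign) pulses sharing a track into 7 bits."""
    if pulse_a[1] == pulse_b[1]:
        # Equal signs: order the pulses so the second position is not smaller.
        first, second = sorted([pulse_a, pulse_b])
    elif pulse_a[0] != pulse_b[0]:
        # Opposite signs: order the pulses so the second position is smaller.
        first, second = sorted([pulse_a, pulse_b], reverse=True)
    else:
        raise ValueError("opposite-sign pulses cannot share a position")
    sign_bit = 0 if first[1] > 0 else 1
    return (sign_bit << 6) | (_index(track, first[0]) << 3) | _index(track, second[0])


def decode_pair(track, word):
    """Recover ((pos1, sign1), (pos2, sign2)) from a 7-bit pair word."""
    sign1 = -1 if (word >> 6) & 1 else 1
    pos1 = track + STEP * ((word >> 3) & 0b111)
    pos2 = track + STEP * (word & 0b111)
    # The second sign is never transmitted: it is implied by the position order.
    sign2 = sign1 if pos2 >= pos1 else -sign1
    return (pos1, sign1), (pos2, sign2)
```

Five such 7-bit words make up the 35-bit index. With the pair (10, +), (15, +) on track 0, flipping only the most significant position bit of the first pulse (bit 5) decodes as (30, +), (15, -): one pulse moves, and the other pulse, whose own bits were untouched, changes sign.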
Summary of the Invention
The present invention provides a CELP vocoder that efficiently encodes an excitation vector in a way that is less sensitive to single bit errors.
Each of the pulses composing the excitation vector is limited to one of four predetermined positions. As a result, only three bits are required to encode each pulse (two bits for position and one sign bit) and, in addition, a single bit error produces an error in only one pulse.
Brief Description of the Drawing
FIG. 1 illustrates a series of pulses;
FIG. 2 is a fixed index table illustrating an inefficient encoding scheme;
FIG. 3 is a block diagram of a typical vocoder;
FIG. 4 illustrates the major functions of encoder 14 of vocoder 10;
FIG. 5 is a functional block diagram of decoder 20 of vocoder 10;
FIG. 6 is a fixed index table specifying valid pulse positions for a ten pulse excitation vector;
FIG. 7 is a fixed index table specifying valid pulse positions for a five pulse excitation vector; and
FIG. 8 is a fixed index table specifying valid pulse positions for a three pulse excitation vector.

Detailed Description of the Invention
FIG. 3 is a block diagram of a typical vocoder. Vocoder 10 receives digitized speech on input 12. The digitized speech is an analog speech signal that has been passed through an analog-to-digital converter and divided into frames, each typically on the order of 20 milliseconds long.
The signal at input 12 is passed to encoder section 14, which encodes the speech so as to decrease the amount of bandwidth used to transmit it. The encoded speech is made available at output 16 and is received by the decoder section of a similar vocoder at the other end of a communication channel. That decoder is similar or identical to the decoder portion of vocoder 10. Encoded speech is received by vocoder 10 through input 18 and is passed to decoder section 20. Decoder section 20 uses the encoded signals received from the transmitting vocoder to produce digitized speech at output 22.
Vocoders are well known in the communications arts. For example, vocoders are described in "Speech and Audio Coding for Wireless and Network Applications," edited by Bishnu S. Atal, Vladimir Cuperman, and Allen Gersho, Kluwer Academic Publishers, 1993. Vocoders are widely available and are manufactured by companies such as Qualcomm Incorporated of San Diego, California, and Lucent Technologies Inc., of Murray Hill, New Jersey.
FIG. 4 illustrates the major functions of encoder 14 of vocoder 10. A digitized speech signal is received at input 12 and is passed to linear predictive coder 40. Linear predictive coder 40 performs a linear predictive analysis of the incoming speech once per frame. Linear predictive analysis is well known in the art and produces a linear predictive synthesis model of the vocal tract based on the input speech signal. The linear predictive parameters or coefficients describing this model are transmitted as part of the encoded speech signal through output 16. Coder 40 uses this model to produce a residual speech signal, which represents the excitation that the model uses to reproduce the input speech signal. The residual speech signal is made available at output 42. The residual speech from output 42 is provided to input 48 of open-loop pitch search unit 50, to an input of adaptive codebook unit 72, and to fixed codebook unit 82.
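The text does not say how coder 40 computes its model. One common textbook choice is the autocorrelation method solved with the Levinson-Durbin recursion, followed by inverse filtering through A(z) to obtain the residual. The sketch below shows only that choice, as an assumption: it omits the windowing, bandwidth expansion and coefficient quantization that a practical coder would add, and the function names are hypothetical.

```python
# Minimal sketch: autocorrelation-method LPC (Levinson-Durbin) and residual.
# A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p; no windowing or quantization.


def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] x[n-k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, err): predictor polynomial with a[0] = 1 and final error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_residual(x, a):
    """Inverse-filter x through A(z): e[n] = x[n] + sum_k a[k] x[n-k]."""
    p = len(a) - 1
    return [x[n] + sum(a[k] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
            for n in range(len(x))]


# Example: a first-order autoregressive signal is almost perfectly predicted,
# so the residual is (nearly) a single impulse at n = 0.
signal = [0.9 ** n for n in range(200)]
coeffs, _ = levinson_durbin(autocorrelation(signal, 1), 1)
residual = lpc_residual(signal, coeffs)
```

The residual is what the model must be "excited" with to reproduce the input, which is why it is passed on to the pitch search and codebook units.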
Impulse response unit 60 receives the linear predictive parameters from coder 40 and generates the impulse response of the model generated in coder 40.
This impulse response is used in the adaptive and fixed codebook units.
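As a hedged illustration of what impulse response unit 60 computes, the sketch below returns the first samples of the impulse response of the all-pole synthesis filter 1/A(z). Many CELP coders actually use a perceptually weighted synthesis filter here; this sketch assumes the plain 1/A(z) model, and the function name is hypothetical.

```python
def impulse_response(a, length):
    """First `length` samples of h[n], the impulse response of 1/A(z),
    where A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p (a[0] is assumed to be 1)."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0  # unit impulse input
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * h[n - k]  # all-pole feedback
        h.append(acc)
    return h
```

For example, `impulse_response([1.0, -0.5], 4)` yields `[1.0, 0.5, 0.25, 0.125]`; in a codebook search the length would typically be one subframe.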
Open-loop pitch search unit 50 uses the residual speech signal from coder 40 to estimate its pitch and provides a pitch signal, commonly called the pitch period or pitch delay signal, at output 52. The pitch delay signal from output 52 and the impulse response signal from output 64 of impulse response unit 60 are received by input 70 of adaptive codebook unit 72. Adaptive codebook unit 72 produces a pitch gain output and a pitch index output, which become part of encoded speech output 16 of vocoder 10. Output 74 of adaptive codebook 72 also provides the pitch gain and pitch index signals to input 80 of fixed codebook unit 82.
Additionally, adaptive codebook 72 provides an excitation signal and an adaptive codebook target signal to input 80.
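The open-loop pitch search is not specified in detail in the text. A common approach, assumed here, picks the lag that maximizes the residual's correlation with a delayed copy of itself, normalized by the delayed copy's energy. The lag range of 20 to 147 samples (roughly 54 to 400 Hz at an assumed 8 kHz sampling rate) is an illustrative choice, and the function name is hypothetical.

```python
import math


def open_loop_pitch(residual, min_lag=20, max_lag=147):
    """Pick the lag T maximizing sum r[n] r[n-T] / sqrt(sum r[n-T]^2).

    The lag range is an illustrative assumption, not taken from the patent.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        num = sum(residual[n] * residual[n - lag] for n in range(lag, len(residual)))
        energy = sum(residual[n - lag] ** 2 for n in range(lag, len(residual)))
        if energy <= 0.0:
            continue  # delayed segment is silent; correlation is undefined
        score = num / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

For a residual that is an impulse train with a 40-sample period, the normalized score is largest at lag 40, ahead of its multiples 80 and 120.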
Adaptive codebook 72 produces its outputs using the digitized speech signal from input 12 and the residual speech signal produced by linear predictive coder 40. Adaptive codebook 72 uses these two signals to form an adaptive codebook target signal. The adaptive codebook target signal is used as an input to fixed codebook 82, and as an input to a computation that produces the pitch gain, pitch index and excitation outputs of adaptive codebook unit 72. Additionally, the adaptive codebook target signal, the pitch delay signal from open-loop pitch search unit 50, and the impulse response from impulse response unit 60 are used to produce the pitch index, pitch gain and excitation signals that are passed to fixed codebook unit 82. The manner in which these signals are computed is well known in the vocoder art.
Fixed codebook 82 uses the inputs received at input 80 to produce a fixed gain output and a fixed index output, which are used as part of the encoded speech at output 16. The fixed codebook unit attempts to model the stochastic part of linear predictive coder 40's residual speech signal. A target for the fixed codebook search is produced by determining a fixed codebook error, that is, the difference between the current adaptive codebook target signal and the residual speech signal from linear predictive coder 40. The fixed codebook error is well known in the art and is described in telecommunications standards as the mean square error between a weighted speech signal and a weighted synthesized speech signal. These standards are published by groups such as the International Telecommunication Union, the European Telecommunications Standards Institute, and the Telecommunications Industry Association. The fixed codebook search produces the fixed gain and fixed index that minimize the fixed codebook error, i.e., the mean square of the error.
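The minimization described above has a standard closed form in CELP analysis-by-synthesis. The notation here is assumed, not taken from the patent: x is the fixed codebook target, c a candidate pulse vector, H the lower-triangular convolution matrix built from the impulse response of unit 60, y = Hc the filtered candidate, and g the fixed gain.

```latex
\begin{aligned}
E(g, \mathbf{c}) &= \lVert \mathbf{x} - g\,\mathbf{y} \rVert^2,
  \qquad \mathbf{y} = H\mathbf{c},\\
\frac{\partial E}{\partial g} = -2\,\mathbf{x}^{\mathsf T}\mathbf{y} + 2g\,\mathbf{y}^{\mathsf T}\mathbf{y} = 0
  \;&\Rightarrow\; g^{*} = \frac{\mathbf{x}^{\mathsf T}\mathbf{y}}{\mathbf{y}^{\mathsf T}\mathbf{y}},\\
E(g^{*}, \mathbf{c}) &= \lVert \mathbf{x} \rVert^2 - \frac{(\mathbf{x}^{\mathsf T}\mathbf{y})^2}{\mathbf{y}^{\mathsf T}\mathbf{y}}.
\end{aligned}
```

Because the first term does not depend on c, the search over valid pulse sets only needs to maximize (x^T y)^2 / (y^T y); the fixed index then describes the winning c, and the fixed gain is g*.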
The fixed index describes a set of excitation pulses. The fixed index is obtained by searching for a set of excitation pulses that minimizes the fixed codebook error; however, the search is limited to the valid sets of excitation pulses defined by the fixed codebook's fixed index table. The fixed index table limits the number of possible positions that each pulse may occupy. The manner in which the fixed gain and fixed index signals are computed using the outputs from adaptive codebook unit 72 is well known in the vocoder art.
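A search restricted to a fixed index table can be sketched as follows. This is an assumption-laden illustration, not the patent's search procedure: it pre-sets each pulse sign from the backward-filtered target d = H^T x (a common simplification), then exhaustively tries one position per table row and keeps the set that maximizes (x.y)^2 / (y.y). The 12-sample table with three rows of four positions and the impulse response are hypothetical. Exhaustive search is only practical for small tables; a ten-pulse, four-position table already has 4^10 (about a million) combinations.

```python
import itertools

# Hypothetical 12-sample table: three pulses, four allowed positions each.
TRACKS = [(0, 3, 6, 9), (1, 4, 7, 10), (2, 5, 8, 11)]
H = [1.0, 0.5, 0.25]  # hypothetical impulse response


def convolve_truncated(c, h):
    """y[n] = sum_k h[k] c[n-k], truncated to len(c) samples (y = H c)."""
    return [sum(h[k] * c[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(c))]


def search_fixed_codebook(x, h, tracks):
    """Return (positions, signs, gain) for the best one-pulse-per-row set."""
    n = len(x)
    # Backward-filtered target d = H^T x; its signs fix the pulse signs.
    d = [sum(h[k] * x[i + k] for k in range(len(h)) if i + k < n) for i in range(n)]
    sign = [1 if v >= 0 else -1 for v in d]
    best = None
    for positions in itertools.product(*tracks):
        c = [0.0] * n
        for p in positions:
            c[p] += sign[p]
        y = convolve_truncated(c, h)
        xy = sum(a * b for a, b in zip(x, y))
        yy = sum(b * b for b in y)
        if yy == 0.0:
            continue
        score = xy * xy / yy  # maximizing this minimizes the squared error
        if best is None or score > best[0]:
            best = (score, positions, xy / yy)
    _, positions, gain = best
    return positions, tuple(sign[p] for p in positions), gain
```

If the target is built from a valid pulse set, for example pulses at 0 (+), 4 (-) and 8 (+) filtered through `H`, the search recovers those positions and signs with a gain of 1.0.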
FIG. 5 illustrates a functional block diagram of decoder 20 of vocoder 10. Encoded speech signals are received at input 18 of decoder 20 and are passed to decoder 100. Decoder 100 produces fixed and adaptive code vectors corresponding to the fixed index and pitch index signals, respectively. These code vectors are passed to the excitation construction portion of unit 110 along with the pitch gain and fixed gain signals. The pitch gain signal is used to scale the adaptive vector, which was produced using the pitch index signal, and the fixed gain signal is used to scale the fixed vector, which was obtained using the fixed index signal. Decoder 100 passes the linear predictive coefficients to the filter or model synthesis section of unit 110. Unit 110 then uses the scaled vectors to excite the filter synthesized from the linear predictive coefficients produced by linear predictive coder 40, and produces an output signal representative of the digitized speech originally received at input 12.
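The excitation construction and synthesis steps of unit 110 can be sketched as a gain-weighted sum of the two code vectors followed by the all-pole filter 1/A(z). This is a simplified illustration under assumptions: the function names are hypothetical, and it omits the post filter and any enhancement a real decoder may apply. Filter memory is carried between subframes so that consecutive subframes join smoothly.

```python
def build_excitation(v, c, pitch_gain, fixed_gain):
    """u[n] = g_p * v[n] + g_c * c[n] (adaptive plus fixed contribution)."""
    return [pitch_gain * vn + fixed_gain * cn for vn, cn in zip(v, c)]


def synthesize(u, a, memory=None):
    """Run u through 1/A(z): s[n] = u[n] - sum_k a[k] s[n-k].

    `memory` holds the last p output samples of the previous subframe
    (most recent last); the updated memory is returned with the output.
    """
    p = len(a) - 1
    hist = list(memory) if memory is not None else [0.0] * p
    out = []
    for un in u:
        s = un - sum(a[k] * hist[-k] for k in range(1, p + 1))
        out.append(s)
        hist.append(s)
    return out, (hist[-p:] if p else [])
```

For instance, with adaptive vector `[1, 0, 0]`, fixed vector `[0, 1, 0]`, gains 0.5 and 2.0, and A(z) = 1 - 0.5 z^-1, the excitation is `[0.5, 2.0, 0.0]` and the synthesized output is `[0.5, 2.25, 1.125]`.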
Optionally, post filter 120 may be used to shape the spectrum of the digitized speech signal produced at output 22.
Referring back to FIG. 4, one of fixed codebook 82's outputs is a fixed index. A fixed index is produced four times per frame (once per subframe), which is every 5 msec for a system using 20 msec frames. The fixed index specifies an excitation vector, or a series of excitation pulses, where the bits of the fixed index describe the position and sign of the pulses. As mentioned earlier, these excitation pulses are used as inputs to a speech model in a receiving vocoder.
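The frame and subframe timing above can be made concrete with a small framing helper. The 8 kHz sampling rate is an assumption (it is consistent with 40 pulse positions spanning one 5 msec subframe, if each position is one sample), and the helper name is hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # assumption: 40 positions per 5 ms subframe -> 8 kHz
FRAME_MS = 20
SUBFRAMES_PER_FRAME = 4

FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000    # 160 samples per frame
SUBFRAME_LEN = FRAME_LEN // SUBFRAMES_PER_FRAME  # 40 samples per subframe


def subframes(samples):
    """Yield (frame_no, subframe_no, subframe) for every complete frame.

    One fixed index would be produced for each yielded subframe.
    """
    n_frames = len(samples) // FRAME_LEN  # a trailing partial frame is ignored
    for f in range(n_frames):
        frame = samples[f * FRAME_LEN:(f + 1) * FRAME_LEN]
        for s in range(SUBFRAMES_PER_FRAME):
            yield f, s, frame[s * SUBFRAME_LEN:(s + 1) * SUBFRAME_LEN]
```

With these values, 400 samples contain two complete 160-sample frames and therefore eight 40-sample subframes.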
FIG. 6 illustrates a fixed index table used for specifying the possible predetermined positions of the excitation pulses composing a valid excitation vector.
Each pulse is limited to one of four predetermined positions and therefore requires only two bits to specify its position. A third bit specifies the sign. For example, if ten pulses are to be specified, the table includes ten rows, each having four possible positions. In this example, pulse I0 may occupy positions 0, 10, 20 or 30, and likewise each of the other pulses may occupy one of the possible positions specified in its row. Only thirty bits are therefore required to specify the positions and signs of ten pulses (3 bits/pulse): two bits per pulse specify position and one bit per pulse specifies sign.
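A sketch of the FIG. 6 encoding shows why a single bit error now affects only one pulse: every bit of the 30-bit index belongs to exactly one pulse's 3-bit field. Only row 0 (positions 0, 10, 20, 30) is given in the text; the assumption that row k holds {k, k+10, k+20, k+30}, the bit layout (pulse k in bits 3k to 3k+2 as [sign | 2-bit index]) and the function names are hypothetical.

```python
NUM_PULSES = 10
POSITIONS_PER_ROW = 4
SPACING = 10  # assumed: row k = {k, k+10, k+20, k+30}; only row 0 is given


def encode_fixed_index(pulses):
    """Pack ten (position, sign) pulses, pulse k from row k, into 30 bits."""
    word = 0
    for k, (pos, sign) in enumerate(pulses):
        idx, rem = divmod(pos - k, SPACING)
        if rem or not 0 <= idx < POSITIONS_PER_ROW:
            raise ValueError(f"position {pos} is not allowed for pulse {k}")
        sign_bit = 0 if sign > 0 else 1
        word |= ((sign_bit << 2) | idx) << (3 * k)  # 3 bits per pulse
    return word


def decode_fixed_index(word):
    """Unpack a 30-bit fixed index into ten (position, sign) pulses."""
    pulses = []
    for k in range(NUM_PULSES):
        field = (word >> (3 * k)) & 0b111
        pos = k + SPACING * (field & 0b11)
        sign = -1 if field >> 2 else 1
        pulses.append((pos, sign))
    return pulses


EXAMPLE_PULSES = [(0, 1), (11, -1), (22, 1), (33, -1), (4, 1),
                  (15, 1), (26, -1), (37, 1), (8, -1), (19, 1)]
```

Unlike the FIG. 2 scheme, no pulse's sign is inferred from another pulse's position, so flipping any one of the 30 bits of `encode_fixed_index(EXAMPLE_PULSES)` changes exactly one decoded pulse.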
FIG. 7 illustrates a fixed index table used for specifying the possible predetermined positions of five pulses where each pulse may occupy only one of four positions.

FIG. 8 illustrates a fixed index table specifying the possible predetermined positions of the pulses in a three pulse excitation vector where the excitation pulses specified by the last two rows are limited to three possible predetermined locations each. It is also possible to use a fixed index table that limits one or more excitation pulses to two possible predetermined locations each.
The schemes of FIGS. 6, 7 and 8 may be applied to excitation vectors having any number of pulses, with the number of possible predetermined positions for each pulse limited to four or fewer.
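The bit cost of such tables follows directly from the row sizes. The sketch below assumes a simple per-pulse packing (ceil(log2(n)) position bits plus one sign bit per row) and uses the four-subframes-per-20-msec-frame timing stated earlier to convert bits per index into a bit rate. The table contents other than FIG. 6 row 0, and the function names, are hypothetical.

```python
def fixed_index_bits(table):
    """Fixed-index length for a table of allowed positions, one row per pulse.

    Each row costs ceil(log2(len(row))) position bits plus one sign bit.
    """
    bits = 0
    for row in table:
        if not 1 <= len(row) <= 4:
            raise ValueError("each pulse must have one to four allowed positions")
        bits += (len(row) - 1).bit_length() + 1  # ceil(log2(n)) + sign bit
    return bits


def fixed_index_rate(bits, subframes_per_frame=4, frame_ms=20):
    """Bit rate in bit/s consumed by the fixed index alone."""
    return bits * subframes_per_frame * 1000 // frame_ms


# FIG. 6 style table; rows other than row 0 are an assumption.
FIG6_TABLE = [tuple(range(k, 40, 10)) for k in range(10)]
```

Here `fixed_index_bits(FIG6_TABLE)` is 30, so the fixed index costs 6000 bit/s against 7000 bit/s for the 35-bit prior art index. Under this per-pulse packing, a row with three positions still needs two position bits, so a hypothetical three-pulse table with rows of 4, 3 and 3 positions uses 9 bits.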
The functional blocks can be implemented in various forms. Each block can be implemented individually using microprocessors or microcomputers, or all of the blocks can be implemented using a single microprocessor or microcomputer. It is also possible to implement each or all of the functional blocks using programmable digital signal processing devices or specialized devices obtained from the aforementioned manufacturers or other semiconductor manufacturers.

Claims

1. A method for encoding an excitation vector, comprising the steps of:
selecting a selected excitation pulse set from a plurality of valid excitation pulse sets, each excitation pulse set having a plurality of excitation pulses;
restricting the plurality of valid excitation pulse sets to sets where each excitation pulse is limited to one of up to four predetermined positions; and producing an output describing the selected excitation pulse set.

2. The method of claim 1, wherein the step of restricting comprises restricting the plurality of valid excitation pulse sets to sets where each excitation pulse is limited to one of four predetermined positions.

3. The method of claim 1, wherein the step of restricting comprises restricting the plurality of valid excitation pulse sets to sets where a first excitation pulse is limited to one of up to four predetermined positions and a second excitation pulse is limited to one of up to three predetermined positions.

4. The method of claim 1, wherein the step of restricting comprises restricting the plurality of valid excitation pulse sets to sets where a first excitation pulse is limited to one of four predetermined positions and a second excitation pulse is limited to one of three predetermined positions.

5. The method of claim 1, wherein the step of producing an output comprises producing an output that describes a position of each excitation pulse in the selected excitation pulse set by up to two bits.

6. The method of claim 5, wherein the step of producing an output comprises producing an output that describes a sign of each excitation pulse in the selected excitation pulse set by one bit.

7. The method of claim 1, wherein the step of selecting comprises selecting a selected excitation pulse set having ten pulses.

8. The method of claim 1, wherein the step of selecting comprises selecting a selected excitation pulse set having five pulses.

9. The method of claim 1, wherein the step of selecting comprises selecting a selected excitation pulse set having four pulses.

10. The method of claim 1, wherein the step of selecting comprises selecting a selected excitation pulse set having three pulses.

11. A method for encoding an excitation vector, comprising the steps of:
searching through a plurality of valid excitation pulse sets for a selected excitation pulse set that minimizes a fixed codebook error, each excitation pulse set having a plurality of excitation pulses;
restricting the plurality of valid excitation pulse sets to sets where each excitation pulse is limited to one of up to four predetermined positions; and producing an output describing the selected excitation pulse set.

12. The method of claim 11, wherein the step of restricting comprises restricting the plurality of valid excitation pulse sets to sets where each excitation pulse is limited to one of four predetermined positions.

13. The method of claim 11, wherein the step of restricting comprises restricting the plurality of valid excitation pulse sets to sets where a first excitation pulse is limited to one of up to four predetermined positions and a second excitation pulse is limited to one of up to three predetermined positions.

14. The method of claim 11, wherein the step of restricting comprises restricting the plurality of valid excitation pulse sets to sets where a first excitation pulse is limited to one of four predetermined positions and a second excitation pulse is limited to one of three predetermined positions.

15. The method of claim 11, wherein the step of producing an output comprises producing an output that describes a position of each excitation pulse in the selected excitation pulse set by up to two bits.

16. The method of claim 15, wherein the step of producing an output comprises producing an output that describes a sign of each excitation pulse in the selected excitation pulse set by one bit.

17. The method of claim 11, wherein the step of selecting comprises selecting a selected excitation pulse set having ten pulses.

18. The method of claim 11, wherein the step of selecting comprises selecting a selected excitation pulse set having five pulses.

19. The method of claim 11, wherein the step of selecting comprises selecting a selected excitation pulse set having four pulses.

20. The method of claim 11, wherein the step of selecting comprises selecting a selected excitation pulse set having three pulses.