MXPA99001099A

MXPA99001099A - Method and apparatus for searching an excitation codebook in a code excited linear prediction (CELP) coder

Info

Publication number: MXPA99001099A
Application number: MXPA/A/1999/001099A
Authority: MX
Inventors: P Dejaco Andrew; Bi Ning
Original assignee: Qualcomm Incorporated
Priority date: 1996-07-31
Filing date: 1999-01-29
Publication date: 2000-01-01

Abstract

A method and apparatus for selecting a code vector in an algebraic codebook, wherein the analysis window of the coder is extended beyond the length of the target speech frame. An input signal is filtered by a perceptual weighting filter (76). The filter is then allowed to ring out, with a zero input vector applied as input, for a number of samples equal to the length of the impulse response of the perceptual weighting filter (76). By extending the analysis window, the two-dimensional autocorrelation matrix of the impulse response can be stored in memory (60, 80) as a one-dimensional autocorrelation vector, greatly reducing the computational complexity and memory required for the search.

Description

METHOD AND APPARATUS FOR SEARCHING AN EXCITATION CODEBOOK IN A CODE EXCITED LINEAR PREDICTION (CELP) CODER

BACKGROUND OF THE INVENTION

I. Field of the Invention

The present invention relates to speech processing. More particularly, the present invention relates to a novel and improved method and apparatus for locating an optimal excitation vector in a code excited linear prediction (CELP) coder.

II. Description of the Related Art

Transmission of voice by digital techniques has become widespread, particularly in long-distance telephone and digital radio applications. This, in turn, has created interest in methods that minimize the amount of information sent over the transmission channel while maintaining high quality in the reconstructed speech. If speech is transmitted by simply sampling and digitizing, a data rate on the order of 64 kilobits per second (kbps) is required to achieve the speech quality of a conventional analog telephone. However, a significant reduction in the data rate can be achieved through the use of speech analysis, followed by appropriate coding, transmission, and resynthesis at the receiver.

Devices that compress speech by extracting parameters that relate to a model of human speech generation are typically called vocoders. Such devices are composed of an encoder, which analyzes the incoming speech to extract the relevant parameters, and a decoder, which resynthesizes the speech using the parameters it receives over the transmission channel. The model changes constantly so that it accurately tracks the time-varying speech signal. The speech is therefore divided into blocks of time, or analysis frames, during which the parameters are calculated. The parameters are then updated for each new frame.

Among the various classes of speech coders, Code Excited Linear Predictive Coding (CELP), Stochastic Coding, and Vector Excited Speech Coding belong to one class. An example of a coding algorithm of this particular class is described in the paper "A 4.8 kbps Code Excited Linear Predictive Coder" by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988. Examples of other vocoders of this type are detailed in U.S. Patent No. 5,414,796, entitled "Variable Rate Vocoder," assigned to the assignee of the present invention and incorporated by reference herein.

The function of the vocoder is to compress the digitized speech signal into a low bit rate signal by removing the natural redundancies inherent in speech. In a CELP coder, redundancies are removed by a short-term formant (or LPC) filter. Once these redundancies are removed, the resulting residual signal can be modeled as white Gaussian noise, which must also be encoded.

The coding parameters for a given frame of speech are determined as follows. First, the parameters of the LPC filter are determined by finding the filter coefficients that remove the short-term redundancy in the speech that is due to vocal tract filtering. Then, an excitation signal, which is the input to the LPC filter at the decoder, is chosen by driving the LPC filter with a number of random excitation waveforms from a codebook and selecting the particular excitation waveform that makes the output of the LPC filter the closest approximation to the original speech. The transmitted parameters thus relate to (1) the LPC filter and (2) an identification of the codebook excitation vector.

A promising excitation codebook structure is the algebraic codebook. The structure of algebraic codebooks is well known in the art and is described in the paper "Fast CELP Coding Based on Algebraic Codes" by J. P. Adoul et al., Proceedings of ICASSP, April 6-9, 1987. The use of algebraic codes is further described in U.S. Patent No. 5,444,816, entitled "Dynamic Codebook for Efficient Speech Coding Based on Algebraic Codes," the disclosure of which is incorporated by reference.

SUMMARY OF THE INVENTION

Analysis-by-synthesis CELP coding uses a minimum mean square error measure to match the best synthesized speech vector to the target speech vector. This measure is used to search the codebook of code vectors and choose the optimal vector for the current subframe. The mean square error measure is typically limited to the window over which the excitation code vector is being chosen and, therefore, fails to account for the contribution that this code vector makes to the next subframe to be searched.

In the present invention, the window over which the mean square error is minimized is extended to account for the ringing of the code vector from the current subframe into the next subframe. The window extension is equal to the length of the impulse response of the perceptual weighting filter, h(n). The mean square error approach of the present invention is analogous to the correlation approach to the minimum mean square error used in LPC analysis, as described in the paper "A 4.8 kbps Code Excited Linear Predictive Coder" by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988. Formulating the mean square error problem from this perspective gives the present invention the following advantages over the current approach:

1) The ringing of the code vector from the current subframe into the next subframe is accounted for in the measure, so that pulses placed at the end of the vector are weighted the same as pulses placed at the beginning of the vector.

2) The impulse response of the perceptual weighting filter becomes stationary over the whole subframe, making the autocorrelation matrix of h(n), F(i, j), Toeplitz; put another way, F(i, j) = F(|i - j|).

In this way, the present invention converts a 2-D matrix into a 1-D vector and thus reduces both the RAM requirements and the number of computational operations of the codebook search.
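The Toeplitz property in advantage 2) can be checked with a short derivation. The following is a sketch under stated assumptions, not the patent's own derivation: M denotes the subframe length, L the length of the impulse response, h(n) is taken to be zero outside 0 <= n <= L-1, |i - j| <= L-1, and the subscripts "trad" and "ext" are labels introduced here only to tell the two windows apart.

```latex
\begin{align*}
% Traditional window: the sum stops at the end of the subframe (n = M-1).
F_{\mathrm{trad}}(i,j) &= \sum_{n=\max(i,j)}^{M-1} h(n-i)\,h(n-j)
                        = \sum_{m=0}^{M-1-\max(i,j)} h(m)\,h(m+|i-j|) \\
% Extended window: the sum runs over M+L-1 samples (n = 0, ..., M+L-2).
F_{\mathrm{ext}}(i,j)  &= \sum_{n=\max(i,j)}^{M+L-2} h(n-i)\,h(n-j)
                        = \sum_{m=0}^{L-1-|i-j|} h(m)\,h(m+|i-j|)
                        = F(|i-j|)
\end{align*}
```

In the traditional case the upper limit depends on max(i, j), so in general the value changes along a diagonal. In the extended case the upper limit M+L-2-max(i, j) is never smaller than L-1, so every nonzero product of h is included and only the lag |i - j| matters.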

BRIEF DESCRIPTION OF THE DRAWINGS

The features, objects, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which reference numbers are used consistently throughout, and wherein:

Figure 1 is an illustration of the traditional apparatus for selecting a code vector in an ACELP coder;

Figure 2 is a block diagram of the apparatus of the present invention for selecting a code vector in an ACELP coder; and

Figure 3 is a flow chart describing the method for selecting a code vector in the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Figure 1 illustrates the traditional apparatus and method used to perform an algebraic codebook search. Codebook generator 6 includes a pulse generator 2 which, in response to a pulse position signal, p_i, generates a signal with a unit pulse at position p_i. In the exemplary embodiment, the codebook excitation vector comprises forty samples, and the possible positions for the unit pulses are divided into tracks T0 to T4 as shown in Table 1 below.

TABLE 1

Track   Positions
T0      0, 5, 10, 15, 20, 25, 30, 35
T1      1, 6, 11, 16, 21, 26, 31, 36
T2      2, 7, 12, 17, 22, 27, 32, 37
T3      3, 8, 13, 18, 23, 28, 33, 38
T4      4, 9, 14, 19, 24, 29, 34, 39

In the exemplary embodiment, one pulse is provided for each track by pulse generator 2. Np is the number of pulses in an excitation vector. In the exemplary embodiment, Np is 5. For each pulse, p_i, a corresponding sign, s_i, is assigned to the pulse. The sign of the pulse is applied by multiplier 4, which multiplies the unit pulse at position p_i by the sign value s_i. The resulting code vector, c_k, is given by equation (1) below.

$$c_k(n) = \sum_{i=0}^{N_p-1} s_i\,\delta(n - p_i), \qquad n = 0, \ldots, 39 \tag{1}$$
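The construction of a code vector from Table 1 can be illustrated with a short Python sketch. The function name `build_code_vector` and the `slots` argument (an index into each track) are hypothetical names chosen for this illustration; only the track layout, the forty-sample length, and the one-signed-pulse-per-track rule come from the text.

```python
# Track layout from Table 1: track t holds positions t, t+5, ..., t+35.
SUBFRAME_LEN = 40   # samples per excitation vector
NUM_TRACKS = 5      # Np, one pulse per track
TRACKS = [list(range(t, SUBFRAME_LEN, NUM_TRACKS)) for t in range(NUM_TRACKS)]


def build_code_vector(slots, signs):
    """Return a 40-sample vector with one signed unit pulse per track.

    slots[t] is an index (0..7) into track t; signs[t] is +1 or -1.
    """
    if len(slots) != NUM_TRACKS or len(signs) != NUM_TRACKS:
        raise ValueError("expected one slot and one sign per track")
    vector = [0] * SUBFRAME_LEN
    for track, slot, sign in zip(TRACKS, slots, signs):
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        vector[track[slot]] = sign   # signed unit pulse at position p_i
    return vector


example = build_code_vector(slots=[0, 1, 2, 3, 7], signs=[1, -1, 1, -1, 1])
nonzero = [(n, v) for n, v in enumerate(example) if v != 0]
print(nonzero)   # [(0, 1), (6, -1), (12, 1), (18, -1), (39, 1)]
```

Because every track has eight candidate positions and each pulse has two possible signs, a code vector in this layout is fully described by five slot indices and five signs.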

Filter generator 12 generates the tap values for the formant filter, h(n), as is well known in the art and described in detail in the aforementioned U.S. Patent No. 5,414,796. Typically, the impulse response, h(n), is computed for M samples, where M is the length of the subframe being searched, for example 40. The composite filter coefficients, h(n), are provided to and stored in memory element 13 as a two-dimensional lower triangular Toeplitz matrix, H, whose diagonal is h(0) and whose lower diagonals are h(1), ..., h(M-1), as shown below.

$$H = \begin{bmatrix} h(0) & 0 & \cdots & 0 \\ h(1) & h(0) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h(M-1) & h(M-2) & \cdots & h(0) \end{bmatrix} \tag{2}$$

The values are provided by memory 13 to matrix multiplication element 14. H is then multiplied by its transpose to give the correlation of the impulse response matrix, F, according to equation (3) below.

$$F(i,j) = H^{T} H = \sum_{n=j}^{M-1} h(n-i)\,h(n-j), \qquad i \le j \tag{3}$$

The result of the correlation operation is then provided to memory element 18 and stored as a two-dimensional array requiring 40^2, or 1600, memory locations for this embodiment. The frame of the input speech signal, s(n), is provided to and filtered by perceptual weighting filter 32 to provide the target signal, x(n).
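The traditional storage cost can be made concrete with a small Python sketch that builds H and F = H^T H directly. The function names and the decaying impulse response `0.8 ** n` are assumptions made for illustration; they are not values from the patent.

```python
def lower_toeplitz(h):
    """Build the M x M lower triangular Toeplitz matrix H from h(0..M-1)."""
    m = len(h)
    return [[h[row - col] if row >= col else 0.0 for col in range(m)]
            for row in range(m)]


def correlation_matrix(h):
    """Return F = H^T H as a full M x M list of lists."""
    m = len(h)
    big_h = lower_toeplitz(h)
    return [[sum(big_h[n][i] * big_h[n][j] for n in range(m))
             for j in range(m)]
            for i in range(m)]


# Hypothetical decaying impulse response, M = 40 samples.
h = [0.8 ** n for n in range(40)]
F = correlation_matrix(h)
storage = sum(len(row) for row in F)
print(storage)                 # 1600 stored entries
print(F[0][1] == F[1][2])      # False
```

The second print is False because truncating the sum at the end of the subframe drops one product from F(1, 2) that is present in F(0, 1). Entries on the same diagonal therefore differ, which is why the traditional method stores the full two-dimensional array.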

The design and implementation of perceptual weighting filter 32 are well known in the art and are described in detail in the aforementioned U.S. Patent No. 5,414,796. The sample values of the target signal, x(n), and the values of the impulse response, h(n), are provided to matrix multiplication element 16, which computes the cross-correlation of the target signal and the impulse response according to equation (4) below.

$$d(j) = H^{T} x = \sum_{i=j}^{M-1} x(i)\,h(i-j), \qquad j = 0, \ldots, M-1 \tag{4}$$

The values d(i) from memory element 20 and the amplitude elements of the codebook vector, c_k, are provided to matrix multiplication element 22, which multiplies the amplitude elements of the codebook vector by the vector d(n) and squares the resulting value according to equation (5) below.

$$(E_{xy})^2 = \left( \sum_{i=0}^{N_p-1} c_k(p_i)\,d(p_i) \right)^{2} \tag{5}$$

The amplitude elements of the codebook vector, c_k, and the codebook pulse position vector, p, are provided to matrix multiplication element 26. Matrix multiplication element 26 computes the value E_yy according to equation (6) below.

$$E_{yy} = \sum_{i=0}^{N_p-1} c_k(p_i)^2\,F(p_i, p_i) + 2 \sum_{i=0}^{N_p-1} \sum_{j=i+1}^{N_p-1} c_k(p_i)\,c_k(p_j)\,F(p_i, p_j) \tag{6}$$

The values of E_yy and (E_xy)^2 are provided to divider 28, which computes the value T_k according to equation (7) below.

$$T_k = \frac{(E_{xy})^2}{E_{yy}} \tag{7}$$
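As a rough illustration of the traditional evaluation of one candidate, the Python sketch below computes d, the two-dimensional F, and the ratio of the squared cross-correlation to the synthesized energy. It assumes the code vector holds only signed unit pulses; the function names and the tiny four-sample test case are hypothetical.

```python
def cross_correlation(x, h):
    """d(j) = sum_{i=j}^{M-1} x(i) h(i-j), for j = 0..M-1."""
    m = len(x)
    return [sum(x[i] * h[i - j] for i in range(j, m)) for j in range(m)]


def correlation_2d(h):
    """F(i, j) = sum_{n=max(i,j)}^{M-1} h(n-i) h(n-j), stored as M x M."""
    m = len(h)
    return [[sum(h[n - i] * h[n - j] for n in range(max(i, j), m))
             for j in range(m)] for i in range(m)]


def criterion(positions, signs, d, F):
    """Return (E_xy)^2 / E_yy for a code vector of signed unit pulses."""
    e_xy = sum(s * d[p] for p, s in zip(positions, signs))
    e_yy = sum(F[p][p] for p in positions)        # each squared sign is 1
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            e_yy += 2 * signs[a] * signs[b] * F[positions[a]][positions[b]]
    return e_xy ** 2 / e_yy


# Tiny hypothetical case: M = 4, single-tap filter, one pulse at position 2.
h = [1.0, 0.0, 0.0, 0.0]
x = [0.0, 0.0, 3.0, 0.0]
d = cross_correlation(x, h)
F = correlation_2d(h)
print(criterion([2], [1], d, F))   # 9.0
```

With a single-tap filter, d equals x and F is the identity, so the ratio reduces to the squared target sample at the pulse position.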

The values T_k for each set of codebook vector amplitude elements, c_k, and codebook pulse position vector, p, are provided to minimization element 30, and the codebook vector that maximizes the value T_k is selected.

Figure 2 illustrates the apparatus for selecting the code vector in the present invention. Figure 3 is a flow chart describing the operational flow of the present invention. First, in block 100, the present invention precomputes the values of d(k), which can be computed in advance and stored because they do not change with the code vector being searched. The frame of the speech signal, s(n), is provided to perceptual weighting filter 76, which generates the target signal, x(n). The resulting target speech segment, x(n), consists of M+L-1 perceptually weighted samples, which are provided to multiply-and-accumulate element 78. L is the length of the impulse response of perceptual weighting filter 76. This extended-length target speech vector, x(n), is created by filtering M samples of the speech signal through perceptual weighting filter 76 and then letting the filter ring out for L-1 additional samples while a zero input vector is applied as the input to perceptual weighting filter 76.

As previously described with respect to filter generator 12, filter generator 56 computes the filter tap coefficients for the formant filter and from these coefficients determines the impulse response, h(n). However, filter generator 56 generates a filter response for delays from 0 to L-1, where L is the length of the impulse response, h(n). It should be noted that, although the exemplary embodiment is described without a pitch filter, the present invention is equally applicable to cases where there is a pitch filter, by a simple modification of the impulse response, as is well known in the art.

The values of h(n) from filter generator 56 are provided to multiply-and-accumulate element 78. Multiply-and-accumulate element 78 computes the cross-correlation of the target sequence, x(n), with the filter impulse response, h(n), according to equation (8) below.

$$d(n) = \sum_{j=n}^{n+L-1} x(j)\,h(j-n), \qquad n = 0, \ldots, M-1 \tag{8}$$

The computed values of d(n) are then stored in memory element 80. In block 102, the present invention precomputes the values of F needed for the computation of E_yy. It is at this point that the greatest memory savings of the present invention are realized.
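The ring-out and the extended cross-correlation d(n) can be sketched in Python as follows. This is a simplified illustration, not the patent's implementation: the filter is modeled as a finite impulse response of L taps, the same hypothetical three-tap response stands in for both perceptual weighting filter 76 and h(n), and the function names are invented for this example.

```python
def filter_with_ring_out(samples, taps):
    """Filter M samples through an FIR filter, then feed L-1 zeros.

    Returns M + L - 1 output samples, where L = len(taps).
    """
    padded = list(samples) + [0.0] * (len(taps) - 1)   # zero input vector
    out = []
    for n in range(len(padded)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * padded[n - k]
        out.append(acc)
    return out


def extended_cross_correlation(x_ext, h, subframe_len):
    """d(n) = sum_{j=n}^{n+L-1} x(j) h(j-n), for n = 0..M-1."""
    return [sum(x_ext[n + k] * h[k] for k in range(len(h)))
            for n in range(subframe_len)]


# Hypothetical values: M = 4 speech samples, L = 3 taps.
speech = [1.0, 0.0, 0.0, 2.0]
taps = [1.0, 0.5, 0.25]
x_ext = filter_with_ring_out(speech, taps)
d = extended_cross_correlation(x_ext, taps, subframe_len=len(speech))
print(len(x_ext), x_ext)   # 6 [1.0, 0.5, 0.25, 2.0, 1.0, 0.5]
print(d)                   # [1.3125, 1.125, 1.5, 2.625]
```

The last two entries of `x_ext` are the ringing of the final speech sample. Without the extension, d(3) would see only the first tap of that contribution.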

Because the mean square error measure has been extended over a larger window, h(n) is now stationary over the entire subframe and, consequently, the 2-D matrix F(i, j) becomes a 1-D vector, because F(i, j) = F(|i - j|). In the present embodiment, as described in Table 1, this means that the traditional method requires 1600 RAM locations whereas the present invention requires only 40. Savings are also obtained in the number of multiply-and-accumulate operations, since a 1-D vector is computed rather than a 2-D matrix. In the present invention, the values of F are computed according to equation (9) below.

$$F(i) = \sum_{n=i}^{L-1} h(n)\,h(n-i) \tag{9}$$

The values of F(i) are stored in memory element 60, which requires only L memory locations, as opposed to the traditional method, which requires storage of M^2 elements. In this embodiment, L = M. In block 104, the present invention computes the cross-correlation value E_xy. The values of d(k) stored in memory element 80 and the current codebook vector, c_k, from codebook generator 50 are provided to multiply-and-accumulate element 62. Multiply-and-accumulate element 62 computes the cross-correlation of the target vector, x(k), and the amplitude elements of the codebook vector, c_k, according to equation (10) below.

$$E_{xy} = \sum_{i=0}^{N_p-1} c_k(p_i)\,d(p_i) \tag{10}$$

The value of E_xy is then provided to squaring means 64, which computes the square of E_xy. In block 106, the present invention computes the autocorrelation of the synthesized speech, E_yy. The amplitude elements of the codebook vector, c_k(p_i) and c_k(p_j), are provided from codebook generator 50 to multiply-and-accumulate element 70. In addition, the values of F(|p_i - p_j|) are provided to multiply-and-accumulate element 70 from memory element 60. Multiply-and-accumulate element 70 computes the value given in equation (11) below.

$$\sum_{i=0}^{N_p-1} \sum_{j=i+1}^{N_p-1} c_k(p_i)\,c_k(p_j)\,F(|p_i - p_j|) \tag{11}$$

The value computed by multiply-and-accumulate element 70 is provided to multiplier 72, where it is multiplied by 2. The product from multiplier 72 is provided to a first input of adder 74. Memory element 60 provides the value F(0) to multiplier 75, where it is multiplied by the value Np. The product from multiplier 75 is provided to a second input of adder 74. The sum from adder 74 is the value E_yy given by equation (12) below.

$$E_{yy} = N_p\,F(0) + 2 \sum_{i=0}^{N_p-1} \sum_{j=i+1}^{N_p-1} c_k(p_i)\,c_k(p_j)\,F(|p_i - p_j|) \tag{12}$$

The savings in computational resources can be appreciated by comparing equation (12) of the present invention with equation (6) of the traditional search method. These savings result from the faster addressing of a 1-D vector, F(|p_i - p_j|), compared with the 2-D access of F(p_i, p_j); from the smaller number of additions required for each computation of E_yy (for the exemplary embodiment, equation (6) requires 15 additions whereas equation (12) requires 11, assuming that the c_k(p_i) terms are only signs, 1 or -1); and from the savings of 1560 RAM locations, since the 2-D matrix F(i, j) does not need to be stored. In block 108, the present invention computes the value of (E_xy)^2 / E_yy. The value of E_yy from adder 74 is provided to a first input of divider 66. The value of (E_xy)^2 from squaring means 64 is provided to a second input of divider 66. Divider 66 then computes the quotient given in equation (13) below.

$$T_k = \frac{(E_{xy})^2}{E_{yy}} \tag{13}$$
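The one-dimensional form of E_yy can be checked numerically against a brute-force energy computed over the extended window. The Python sketch below assumes signed unit pulses, a hypothetical impulse response `0.9 ** n` with L = M = 40, and illustrative function names; it is a consistency check, not the patent's circuit.

```python
def autocorrelation_1d(h):
    """F(i) = sum_{n=i}^{L-1} h(n) h(n-i), for i = 0..L-1."""
    length = len(h)
    return [sum(h[n] * h[n - i] for n in range(i, length))
            for i in range(length)]


def synthesized_energy(positions, signs, F):
    """E_yy = Np*F(0) + 2*sum over pairs of s_i s_j F(|p_i - p_j|)."""
    pair_sum = 0.0
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            lag = abs(positions[a] - positions[b])
            if lag < len(F):                 # F is zero beyond lag L-1
                pair_sum += signs[a] * signs[b] * F[lag]
    return len(positions) * F[0] + 2.0 * pair_sum


def brute_force_energy(positions, signs, h, subframe_len):
    """Energy of the code vector filtered by h over the M+L-1 window."""
    y = [0.0] * (subframe_len + len(h) - 1)
    for p, s in zip(positions, signs):
        for k, tap in enumerate(h):
            y[p + k] += s * tap
    return sum(v * v for v in y)


h = [0.9 ** n for n in range(40)]      # hypothetical response, L = M = 40
F = autocorrelation_1d(h)
positions, signs = [5, 6, 12, 18, 39], [1, -1, 1, -1, 1]
fast = synthesized_energy(positions, signs, F)
slow = brute_force_energy(positions, signs, h, subframe_len=40)
print(len(F), abs(fast - slow) < 1e-9)   # 40 True
```

The table `F` holds 40 values instead of the 1600 entries of the two-dimensional array, and the pulse at position 39 contributes its full ringing to the energy because the window extends past the subframe.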

The quotient from divider 66 is provided to minimization element 68. In block 110, if not all code vectors, c_k, have been tested, the flow moves back to block 104 and the next code vector is tested as described above. If all code vectors have been tested, then in block 112, minimization element 68 selects the code vector that results in the maximum value of (E_xy)^2 / E_yy.

The previous description of the preferred embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
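The search loop of blocks 104 through 112 of Figure 3 can be sketched in Python as an exhaustive search. To keep the example small, it assumes a hypothetical eight-sample subframe with two tracks rather than the forty-sample, five-track layout of Table 1. The values of `d` and `F` are made-up precomputed inputs, and `search_codebook` is an illustrative name.

```python
from itertools import product


def search_codebook(d, F, tracks):
    """Return (best_ratio, positions, signs) maximizing (E_xy)^2 / E_yy."""
    best = (-1.0, None, None)
    sign_choices = list(product((1, -1), repeat=len(tracks)))
    for positions in product(*tracks):            # one pulse per track
        for signs in sign_choices:
            # Block 104: cross-correlation with the precomputed d.
            e_xy = sum(s * d[p] for p, s in zip(positions, signs))
            # Block 106: energy from the 1-D autocorrelation vector F.
            e_yy = len(positions) * F[0]
            for a in range(len(positions)):
                for b in range(a + 1, len(positions)):
                    lag = abs(positions[a] - positions[b])
                    e_yy += 2.0 * signs[a] * signs[b] * F[lag]
            # Block 108: ratio; blocks 110 and 112: keep the maximum.
            score = e_xy * e_xy / e_yy
            if score > best[0]:
                best = (score, positions, signs)
    return best


# Hypothetical inputs: M = 8, h = [1.0, 0.5], so F(0) = 1.25 and F(1) = 0.5.
tracks = [[0, 2, 4, 6], [1, 3, 5, 7]]
F = [1.25, 0.5] + [0.0] * 6
d = [0.0, 0.5, 1.25, 0.5, -0.5, -1.25, -0.5, 0.0]
best = search_codebook(d, F, tracks)
print(best)   # (2.5, (2, 5), (1, -1))
```

The `d` values here were derived from a target equal to a positive pulse at position 2 and a negative pulse at position 5 filtered by h, so the search recovers those two pulses. Because the criterion squares E_xy, the globally sign-flipped vector scores the same, and the loop keeps the first one found.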


Claims

1. In a linear prediction coder for providing synthesized speech, in which short-term and long-term redundancies are removed from a frame of N digitized samples of the speech signal by filter means having L taps and an impulse response, h(n), resulting in a residual waveform of N samples, a method for encoding the residual waveform using the k-th codebook vector, c_k, comprising:

combining a target signal, x(n), and the impulse response, h(n), to provide a first convolution;

autocorrelating an impulse response matrix, wherein the impulse response matrix is a lower triangular Toeplitz matrix with diagonal h(0), where h(0) is the zeroth value of the impulse response, and lower diagonals h(1), ..., h(L-1), and wherein the autocorrelation of the impulse response is computed according to the equation:

$$F(i) = \sum_{n=i}^{L-1} h(n)\,h(n-i);$$

autocorrelating the synthesized speech according to the autocorrelation of the impulse response matrix and the codebook vectors, c_k, to provide a synthesized speech autocorrelation, E_yy;

cross-correlating the synthesized speech and the target speech according to the first convolution and the codebook vectors to provide a cross-correlation, E_xy; and

selecting a codebook vector according to the cross-correlation, E_xy, and the synthesized speech autocorrelation, E_yy.

2. The method according to claim 1, further comprising the steps of:

generating a first set of filter coefficients;

generating a second set of filter coefficients; and

combining the first set of filter coefficients and the second set of filter coefficients to provide the impulse response, h(n).

3. The method according to claim 1, further comprising:

receiving an input frame of N digitized samples; and

perceptually weighting the input frame to provide the target signal.

4. The method according to claim 1, wherein the step of combining the target signal and the impulse response is performed according to the equation:

$$d(n) = \sum_{j=n}^{n+L-1} x(j)\,h(j-n), \qquad n = 0, \ldots, M-1.$$

5. The method according to claim 1, further comprising the step of storing the autocorrelation of the impulse response in a memory of L memory locations.

6. The method according to claim 1, wherein the step of cross-correlating the synthesized speech and the target speech is performed according to the equation:

$$E_{xy} = \sum_{i=0}^{N_p-1} c_k(p_i)\,d(p_i),$$

where d(n) is the cross-correlation of the target signal and the impulse response.

7. The method according to claim 1, wherein the step of autocorrelating the synthesized speech is performed according to the equation:

$$E_{yy} = N_p\,F(0) + 2 \sum_{i=0}^{N_p-1} \sum_{j=i+1}^{N_p-1} c_k(p_i)\,c_k(p_j)\,F(|p_i - p_j|).$$

8. The method according to claim 1, wherein the step of selecting a codebook vector comprises the steps of:

for each code vector, c_k, squaring the value E_xy;

dividing the square of E_xy by the computed value of E_yy for each code vector, c_k; and

selecting the code vector that maximizes the quotient of the square of E_xy and E_yy.

9. The method according to claim 1, wherein the codebook vectors, c_k, are selected according to an algebraic codebook format.
