A METHOD AND DEVICE FOR DESIGNING AND
SEARCHING LARGE STOCHASTIC CODEBOOKS
IN LOW BIT RATE SPEECH ENCODERS
BACKGROUND OF THE INVENTION
1. Field of the invention:
The present invention relates to a stochastic codebook structure, a method for generating a codeword using this stochastic codebook structure, and method and devices for efficiently searching a stochastic codebook.
2. Brief description of the prior art:
The demand for efficient digital speech encoding techniques with a good subjective quality/bit rate trade-off is increasing for numerous applications such as voice transmission over land-mobile, satellite, digital radio, or packet networks, as well as voice storage, voice response, and wireless telephony.
A speech encoder converts a speech signal into a digital bitstream transmitted over a communication channel or stored in a storage medium. The speech signal is first digitized, i.e. sampled and quantized with usually 16 bits per sample. The speech encoder then represents these digital samples with a smaller number of bits while maintaining a good subjective speech quality. The speech decoder or synthesizer processes the transmitted or stored bitstream and converts it back to a sound signal.
One of the best prior art techniques capable of achieving a good quality/bit rate trade-off is the so-called Code Excited Linear
Prediction (CELP) technique. According to this technique, the sampled speech signal is processed in successive blocks of L samples usually called frames where L is some predetermined number (corresponding to
10-30 ms of speech). In CELP, a linear prediction (LP) filter is computed and transmitted every frame. The L-sample frame is then divided into smaller blocks called subframes of N samples, where L=rN and r is the number of subframes in a frame (N usually corresponds to 4-10 ms of speech). An excitation signal is determined in each subframe, which usually consists of two components: one from the past excitation (also called pitch contribution or adaptive codebook) and the other from an innovative codebook (also called fixed codebook). This excitation signal is transmitted and used at the decoder as the input of the LP synthesis filter in order to obtain the synthesized speech.
In the CELP context, an innovative codebook is an indexed set of N-sample-long sequences which will be referred to as N-dimensional codevectors. Each codebook sequence is indexed by an integer k ranging from 1 to B, where B represents the size of the innovative codebook, often expressed as a number of bits b, where B = 2^b.
An innovative codebook can be stored in physical memory (e.g. look-up table) or can refer to a mechanism for relating the index k to a corresponding codevector (e.g. a formula).
To synthesize speech according to the CELP technique, each subframe (block of N samples) is synthesized by filtering an appropriate codevector from an innovative codebook through time varying filters modelling the spectral characteristics of the speech signal. At the encoder end, a synthetic output is computed for at least a subset of the codevectors of the innovative codebook (codebook search). The retained codevector is the one producing the synthetic output closest to the original speech signal according to a perceptually weighted distortion measure. This perceptual weighting is performed through a so-called perceptual weighting filter, which is usually derived from the LP filter.
A first type of innovative codebooks are the so-called "stochastic codebooks". A drawback of these codebooks is that they involve substantial physical storage. They are stochastic (i.e. random) in the sense that the path from index to codevector involves look-up tables which are the result of randomly generated numbers or statistical techniques applied to large speech training sets. The size of stochastic codebooks tends to be limited by storage and/or search complexity.
A second type of innovative codebooks are the algebraic codebooks. By contrast to the stochastic codebooks, algebraic codebooks are not random and require no substantial storage. An
algebraic codebook is a set of indexed codevectors in which the amplitudes and positions of the pulses of the k-th codevector can be derived from a corresponding index k through a rule requiring no, or minimal, physical storage. Therefore, the size of algebraic codebooks is not limited by storage considerations. Algebraic codebooks can also be designed for efficient search. For these reasons, algebraic codebooks have enjoyed considerable success in speech coding standards, where codebooks ranging from 17 bits (e.g. ITU-T Recommendation G.729) to 35 bits (ETSI Enhanced Full Rate GSM) have been efficiently used.
As the bit rate is reduced, the number of pulses in the codevectors of an algebraic codebook is reduced. This results in lower performance for unvoiced frames and in case of background noise, where codevectors with stochastic contents are more suitable. This shows the need for stochastic codebooks with efficient storage and search techniques.
OBJECT OF THE INVENTION
An object of the present invention is therefore to provide a stochastic codebook structure with reduced storage requirements, a method for generating a codeword using this stochastic codebook structure, and method and devices for efficiently searching this stochastic codebook structure.
SUMMARY OF THE INVENTION
More specifically, in accordance with the present invention, there is provided a stochastic codebook structure for generating codevectors, comprising a stochastic table and a codevector generator. The stochastic table contains a set of M random vectors. The codevector generator is connected to the stochastic table and comprises means for adding a number P of random vectors from the stochastic table to produce a codevector.
The present invention also relates to a stochastic codebook structure for generating codevectors, comprising a stochastic table containing a set of M random vectors. The stochastic codebook structure also comprises a codevector generator connected to the stochastic table and including a combiner of subsets of P random vectors from the stochastic table. This combiner produces codevectors each by combination of a subset of P random vectors from the stochastic table.
Further in accordance with the present invention, there is provided a method for generating a codevector, comprising constructing a stochastic table containing a set of M random vectors and combining a number P of random vectors from the stochastic table to produce a codevector.
In accordance with preferred embodiments of the present invention:
- combining a number P of random vectors comprises adding the number P of random vectors from the stochastic table to produce the codevector;
- the number P is selected from the group consisting of 2 and 3;
- adding the number P of random vectors from the stochastic table to produce the codevector comprises computing the codevector using the following relation:
c = s1 v_p1 + s2 v_p2 + ... + sP v_pP

where c denotes the codevector, v_p1, ..., v_pP denote the P random vectors, s1, s2, ..., sP are signs equal to -1 or 1, and p1, p2, ..., pP are the indices of the P random vectors.
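As an illustration, the construction of a codevector as a sum of P signed random vectors can be sketched in Python. The helper name `make_codevector`, the table dimensions, and the example values are assumptions made for the sketch, not part of the invention:

```python
import numpy as np

def make_codevector(table, positions, signs):
    """Form the codevector c = s1*v_p1 + ... + sP*v_pP by adding P
    signed random vectors taken from the M x N stochastic table."""
    c = np.zeros(table.shape[1])
    for p, s in zip(positions, signs):
        c += s * table[p]        # each sign s is +1 or -1
    return c

# Toy example: a table of M=8 random vectors of dimension N=5, with P=2
rng = np.random.default_rng(0)
table = rng.standard_normal((8, 5))
c = make_codevector(table, positions=[3, 6], signs=[+1, -1])
```

Because only the P indices and signs need to be transmitted, the physical storage is limited to the M x N table itself.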
Still further in accordance with the present invention, there is provided a method for efficiently searching a stochastic codebook having a stochastic table containing a set of M random vectors of dimension N to find the best codevector for encoding a signal. This stochastic codebook searching method comprises applying to the M random vectors a preselection criterion related to the signal, preselecting a subset of K random vectors amongst the M random vectors of the above mentioned set in relation to the preselection criterion, applying a search criterion related to the signal to combinations of P random vectors out of the K random vectors of the preselected subset, and selecting, in relation to the search criterion, one of the combinations of P random vectors forming the best codevector for encoding the signal.
The invention is concerned with a corresponding device for efficiently searching a stochastic codebook having a stochastic table containing a set of M random vectors of dimension N to find the best codevector for encoding a signal. This stochastic codebook searching device comprises means for applying to the M random vectors a preselection criterion related to the signal, and means for preselecting a subset of K (typically K=8) random vectors amongst the M random vectors of the set in relation to the preselection criterion. This stochastic codebook searching device further comprises means for applying a search criterion related to the signal to combinations of P random vectors out of the K random vectors of the preselected subset, and means for selecting, in relation to the search criterion, one of the combinations of P random vectors forming the best codevector for encoding the signal.
In accordance with preferred embodiments of the stochastic codebook searching method and device:
- applying the preselection criterion comprises: calculating a dot product between: a backward filtered version of a target vector calculated during encoding of the signal and used for searching the stochastic codebook; and each of the M random vectors of the set; and preselecting a subset of K random vectors comprises: preselecting as the subset the K random vectors of the set with the largest absolute values of dot products; (This corresponds to testing only the numerator of the search criterion)
- calculating the dot product comprises calculating the backward filtered version d(n) of the search target vector x(n) by correlating the search target vector x(n) with h(n) in accordance with the following relation:
d(n) = x(n) * h(-n) = Σ_{i=n}^{N-1} x(i) h(i-n)
where h(n) is an impulse response of a weighted synthesis filter calculated during encoding of the signal;
- presetting a sign of each random vector of the subset, wherein the preset sign can be the sign of the corresponding dot product;
- applying a search criterion comprises calculating, for each combination of P random vectors, a mathematical relation involving the combination, the mathematical relation being advantageously a ratio involving the combination and the target vector;
(The search process then proceeds with testing the search criterion for all the possible combinations of P out of K vectors. For P=2, this corresponds to testing the search criterion K×(K+1)/2 times (36 times for K=8). A full search requires testing the criterion M×(M+1)/2 times (528 times for M=32). This shows the significant decrease in the search complexity using the search method of the invention (the decrease is even more significant when P=3 and M=64).)
- selecting one of the combinations of P random vectors comprises selecting the combination with the largest ratio;
- calculating the ratio for each combination of P random vectors comprises: convolving each random vector of the subset of K random vectors with an impulse response of a weighted synthesis filter calculated during encoding of the signal and thereby producing K filtered random vectors; computing the energy of each filtered random vector; calculating a dot product of each filtered random vector with the target vector; and for each combination of P random vectors, computing the ratio in response to the corresponding P filtered random vectors, P computed energies and P calculated dot products;
- computing the ratio for each combination of P random vectors comprises computing the ratios for all possible combinations of P vectors through P nested calculation loops;
- calculating a gain of the signal representative codevector through a ratio having: a numerator constituted by a sum of the P dot products between the P random vectors of the selected one combination and the target vector; and a denominator involving the P computed energies and P filtered random vectors respectively corresponding to the P random vectors of the selected one combination;
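A hedged Python sketch of the two nested loops for P=2 is given below. The function name and toy data are assumptions, and the criterion is written in the standard CELP form Q = (correlation with target)² / (energy of the filtered combination), consistent with the ratio and gain described above:

```python
import numpy as np

def search_p2(x, h, table, idx, signs):
    """Nested-loop search over pairs of preselected random vectors.
    Each kept vector is filtered (convolved with h) with its preset
    sign folded in; the retained pair maximizes
    Q = (psi_i + psi_j)^2 / ||w_i + w_j||^2."""
    N = len(x)
    K = len(idx)
    # Filter the K preselected vectors once, outside the loops
    w = [s * np.convolve(table[p], h)[:N] for p, s in zip(idx, signs)]
    psi = [np.dot(wi, x) for wi in w]    # dot products with the target
    eps = [np.dot(wi, wi) for wi in w]   # energies of filtered vectors
    best, best_pair, best_gain = -1.0, None, 0.0
    for i in range(K):                   # first nested loop
        for j in range(i, K):            # second nested loop (j >= i)
            num = (psi[i] + psi[j]) ** 2
            den = eps[i] + eps[j] + 2.0 * np.dot(w[i], w[j])
            if den > 0 and num / den > best:
                best = num / den
                best_pair = (i, j)
                best_gain = (psi[i] + psi[j]) / den   # optimal gain
    return best_pair, best_gain

# Toy example with K=4 preselected vectors out of M=8
rng = np.random.default_rng(3)
N = 16
x = rng.standard_normal(N)
h = rng.standard_normal(N)
table = rng.standard_normal((8, N))
idx = [0, 2, 5, 7]                 # indices kept by the preselection
signs = [1.0, -1.0, 1.0, 1.0]      # preset signs of the dot products
pair, gain = search_p2(x, h, table, idx, signs)
```

Note that the energies eps and the filtered vectors w are computed once before the loops, so each of the K(K+1)/2 iterations only costs one dot product and a few additions.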
- calculating an index of the best codevector, this index containing information about:
signs of the P random vectors of the selected one combination; and indices of the P random vectors of the selected one combination;
- P=2 and calculating the index of the best codevector comprises constructing the index I of the best codevector from the respective indices p1 and p2 and sign indices σ1 and σ2 (σ = 0 or 1 to identify the sign) of the two random vectors using the following relation:
I = s + 2 × (i1 + i2 × M)
and the following rules:
- if σ1 ≠ σ2, and p1 < p2, then set i1 = p1, i2 = p2, and s = σ2;
- if σ1 ≠ σ2, and p1 > p2, then set i1 = p2, i2 = p1, and s = σ1;
- if σ1 = σ2, and p1 > p2, then set i1 = p1, i2 = p2, and s = σ1; and
- if σ1 = σ2, and p1 ≤ p2, then set i1 = p2, i2 = p1, and s = σ1;
(The number of bits needed to encode the index of each random vector is log2(M) and the sign information can be encoded with only one (1) bit. Accordingly, the stochastic codebook structure described hereinabove corresponds to a codebook of P·log2(M)+1 bits. As an example, with M=32 and P=2, an 11-bit codebook can be constructed (5 bits for each vector and one (1) bit for the signs). Similarly, if M=32 and P=3, a 16-bit codebook can be obtained using only 32 random vectors.)
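The packing I = s + 2×(i1 + i2×M) can be exercised with a hypothetical encode/decode pair in Python. This is a sketch, not the invention's exact code: the ordering of (i1, i2) is assumed to tell the decoder whether the two signs are equal (i1 ≥ i2) or differ (i1 < i2), and pairs with p1 = p2 of opposite signs, which cancel to zero, are assumed excluded:

```python
def encode_index_p2(p1, p2, sigma1, sigma2, M):
    """Pack two vector indices and two sign indices into one integer
    I = s + 2*(i1 + i2*M); the ordering of (i1, i2) encodes whether
    the two signs are equal or different."""
    if sigma1 != sigma2:
        # different signs: smaller index first, s = sign of the
        # vector with the larger index
        (i1, i2, s) = (p1, p2, sigma2) if p1 < p2 else (p2, p1, sigma1)
    else:
        # equal signs: larger index first, s = the common sign
        (i1, i2, s) = (p1, p2, sigma1) if p1 > p2 else (p2, p1, sigma1)
    return s + 2 * (i1 + i2 * M)

def decode_index_p2(I, M):
    """Recover (p1, p2, sigma1, sigma2) from the packed index I."""
    s = I & 1
    i1 = (I >> 1) % M
    i2 = (I >> 1) // M
    if i1 < i2:                    # signs differ
        return (i1, i2, 1 - s, s)
    return (i2, i1, s, s)          # signs equal

# Example with M=32 (an 11-bit codebook: indices span 0 .. 2*M*M - 1)
I = encode_index_p2(3, 7, 0, 1, M=32)
```

The decoder needs no side information beyond I and M, which is what keeps the index at P·log2(M)+1 bits.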
- P=3 and calculating the index of the best codevector comprises: dividing the stochastic table into two halves with M/2 random vectors in each half of the stochastic table;
determining the one of these two halves of the stochastic table which contains at least two of the three random vectors of the selected one combination; and constructing the index I of the best codevector using the following relation:
I = φ + 2 × (s + 2 × (i1 + i2 × M/2)) + M × M × (σ3 + 2 × p3)
and the following rules:
- if σ1 ≠ σ2, and p1 < p2, then set i1 = p1, i2 = p2, and s = σ2;
- if σ1 ≠ σ2, and p1 > p2, then set i1 = p2, i2 = p1, and s = σ1;
- if σ1 = σ2, and p1 > p2, then set i1 = p1, i2 = p2, and s = σ1;
- if σ1 = σ2, and p1 ≤ p2, then set i1 = p2, i2 = p1, and s = σ1; and
- if φ corresponds to the second half, set i1 = i1 - M/2 and i2 = i2 - M/2;
where:
- φ = 0 or 1 and denotes said one half of the stochastic table containing at least two of the three random vectors of the selected one combination;
- σ1 and σ2 denote respective sign indices of the two random vectors located in said one half of the stochastic table;
- σ3 denotes a sign index of the third random vector; and
- p1 and p2 denote respective indices of the two random vectors located in said one half of the stochastic table; and
- p3 denotes an index of the third random vector; and
- P=3 and the method and device further comprise calculating an index of the signal representative codevector, this index containing information about: signs of two of the three random vectors of the selected one combination; indices of these two random vectors of the selected one combination; and a bit indicating that a pulse is chosen to replace the third of the three random vectors of the selected one combination.
Other variations of the codebook structure according to the invention can be obtained. In the case of P=3, the third vector can be replaced by a pulse covering the range 0, ..., N-1. The possibility of replacing the third vector by a pulse gives the codebook more flexibility to capture special time events in the signal.
The stochastic codebook structure according to the present invention can be also used in conjunction with a sparse codebook (such as an algebraic codebook) where one (1) more bit can be used to denote which codebook is selected.
Finally, the present invention relates to a cellular communication system, a cellular network element, a cellular mobile transmitter/receiver unit, and a bidirectional wireless communication sub-system.
The objects, advantages and other features of the present invention will become more apparent upon reading of the following non restrictive description of a preferred embodiment thereof, given by way of example only with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
In the appended drawings:
Figure 1 is a schematic block diagram of a CELP-type speech encoding device;
Figure 2 is a schematic block diagram of a CELP-type speech decoding device;
Figure 3 is a flow chart describing computation, according to the present invention, of the index of an innovative codevector for the case P=2, P being the number of signed random vectors added to derive the innovative codevector;
Figure 4 is a schematic representation of two nested loops used for computing, according to the present invention, two optimum indices among K preselected random vectors (for the case P=2);
Figure 5 is a schematic representation of three nested loops used for computing, according to the present invention, three optimum indices among the K preselected random vectors (for the case P=3);
Figure 6 is a schematic flow chart summarizing the procedure according to the present invention for searching a stochastic codebook; and
Figure 7 is a simplified, schematic block diagram of a cellular communication system in which the present invention can be used.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
As well known to those of ordinary skill in the art, a cellular communication system such as 701 (see Figure 7) provides a telecommunication service over a large geographic area by dividing that large geographic area into a number C of smaller cells. The C smaller cells are serviced by respective cellular base stations 702₁, 702₂, ..., 702C to provide each cell with radio signalling, audio and data channels.
Radio signalling channels are used to page mobile radiotelephones (mobile transmitter/receiver units) such as 703 within the limits of the coverage area (cell) of the cellular base station 702, and to place calls to other radiotelephones 703 located either inside or outside the base station's cell or to another network such as the Public Switched Telephone Network (PSTN) 704.
Once a radiotelephone 703 has successfully placed or received a call, an audio or data channel is established between this radiotelephone 703 and the cellular base station 702 corresponding to the cell in which the radiotelephone 703 is situated, and communication between the base station 702 and radiotelephone 703 is conducted over that audio or data channel. The radiotelephone 703 may also receive control or timing information over a signalling channel while a call is in progress.
If a radiotelephone 703 leaves a cell and enters another adjacent cell while a call is in progress, the radiotelephone 703 hands over the call to an available audio or data channel of the base station 702 of the new cell. If a radiotelephone 703 leaves a cell and enters another adjacent cell while no call is in progress, the radiotelephone 703 sends a control message over the signalling channel to log into the base station 702 of the new cell. In this manner mobile communication over a wide geographical area is possible.
The cellular communication system 701 further comprises a control terminal 705 to control communication between the cellular base stations 702 and the PSTN 704, for example during a communication between a radiotelephone 703 and the PSTN 704, or between a radiotelephone 703 located in a first cell and a radiotelephone 703 situated in a second cell.
Of course, a bidirectional wireless radio communication subsystem is required to establish an audio or data channel between a base station 702 of one cell and a radiotelephone 703 located in that cell. As illustrated in very simplified form in Figure 7, such a bidirectional wireless radio communication subsystem typically comprises in the radiotelephone 703:
- a transmitter 706 including:
- an encoder 707 for encoding the speech signal; and
- a transmission circuit 708 for transmitting the encoded speech signal from the encoder 707 through an antenna such as 709; and
- a receiver 710 including:
- a receiving circuit 711 for receiving a transmitted encoded speech signal usually through the same antenna 709; and
- a decoder 712 for decoding the received encoded speech signal from the receiving circuit 711.
The radiotelephone further comprises other conventional radiotelephone circuits 713 to which the encoder 707 and decoder 712 are connected and for processing signals therefrom, which circuits 713 are well known to those of ordinary skill in the art and, accordingly, will not be further described in the present specification.
Also, such a bidirectional wireless radio communication subsystem typically comprises in the base station 702:
- a transmitter 714 including:
- an encoder 715 for encoding the speech signal; and
- a transmission circuit 716 for transmitting the encoded speech signal from the encoder 715 through an antenna such as 717; and
- a receiver 718 including:
- a receiving circuit 719 for receiving a transmitted encoded speech signal through the same antenna 717 or through another antenna (not shown); and
- a decoder 720 for decoding the received encoded voice signal from the receiving circuit 719.
The base station 702 further comprises, typically, a base station controller 721 , along with its associated database 722, for controlling communication between the control terminal 705 and the transmitter 714 and receiver 718.
As well known to those of ordinary skill in the art, encoding is required in order to reduce the bandwidth necessary to transmit a sound signal, for example a voice signal such as speech, across the bidirectional wireless radio communication subsystem, i.e., between a radiotelephone 703 and a base station 702.
LP speech encoders (such as 715 and 707) typically operating at 13 kbit/s and below, such as Code-Excited Linear Prediction (CELP) encoders, use an LP synthesis filter to model the short-term spectral envelope of the speech signal. The LP information is transmitted, typically, every 10 or 20 ms to the decoder (such as 720 and 712) and is extracted at the decoder end.
The novel technique disclosed in the present specification can apply to different LP-based encoding systems. However, a CELP-type encoding system is used in the preferred embodiment of the present invention for illustrating these novel techniques. Although speech is used in this preferred embodiment as the signal to be encoded, this novel technique can also be applied to other types of signals.
Figure 1 is a general, schematic block diagram of a CELP-type speech encoding device.
Referring to Figure 1 , the sampled input speech 113 is divided into L-sample blocks called "frames". For each frame, the different parameters representing the speech signal in the frame are computed, encoded, and transmitted. These parameters include linear prediction (LP) parameters representing the LP synthesis filter and excitation parameters. The LP parameters are usually computed once every frame.
Each frame is further divided into smaller blocks of N samples (blocks of length N) in which the excitation parameters (adaptive and innovative parameters) are determined. In the CELP literature, these blocks of length N are called "subframes", and a Λ/-sample sequence in a subframe is referred to as a Λ/-dimensional vector.
In the present preferred embodiment, the value of N corresponds to 5 ms and that of L corresponds to 20 ms, which means that a frame (L=160 at the sampling rate of 8 kHz) contains four subframes (N=40 at the sampling rate of 8 kHz). Various N-dimensional vectors occur in the encoding procedure. A list of vectors which appear in Figures 1 and 2 as well as a list of transmitted parameters are given herein below:
List of the main N-dimensional vectors

s       Input speech vector (after pre-processing);
sw      Weighted speech vector;
s0      Zero-input response of the weighted synthesis filter W(z)/A(z);
x'      Target vector for adaptive codebook search;
h       Impulse response of the combination of synthesis and weighting filters;
fT      Adaptive codevector at pitch lag T;
bfT     Adaptive codevector scaled by pitch gain b;
yT      Filtered adaptive codevector (fT convolved with h);
x       Target vector for innovative codebook search;
ck      Innovative codevector at index k (k-th entry of the innovative codebook);
gck     Innovative codevector scaled by the innovative codebook gain g;
zk      Filtered innovative codevector (ck convolved with h);
vi      Random vectors in the stochastic table of size M;
wj      Filtered preselected random vectors;
u       Excitation codevector (scaled innovative and adaptive codevectors);
s'      Synthesis signal before postfiltering; and
d       Correlation between target vector x and impulse response h.

List of transmitted parameters

STP     Short term prediction parameters (defining A(z));
T       Pitch lag (or adaptive codebook index);
b       Pitch gain (or adaptive codebook gain);
k       Codevector index (innovative codebook entry); and
g       Innovative codebook gain.

List of other codec parameters and symbols

A(z)    Short term prediction filter (LP filter) in the subframe;
Â(z)    Quantized LP filter in the subframe;
W(z)    Perceptual weighting filter;
W(z)/A(z)  Weighted synthesis filter;
L       Number of samples in a frame;
N       Number of samples in a subframe;
M       Number of random vectors in the stochastic table;
P       Number of added random vectors from the stochastic table to form the innovative codevector;
K       Number of preselected random vectors in the stochastic codebook, these preselected random vectors having indices p1, p2, ..., pK and signs s1, s2, ..., sK;
λi      Dot products between d and the random vectors vi;
εj      Energies of the filtered preselected random vectors wj;
ψj      Dot products between the target vector x and the filtered preselected random vectors wj;
LTP     Long Term Prediction parameters;
MSWE    Mean-Squared Weighted Error;
H       Lower triangular convolution matrix derived from the impulse response vector h; and
Q       Innovative codebook search criterion.
DECODING PRINCIPLE
It is believed preferable to first describe the speech decoding device of Figure 2. Figure 2 is a schematic block diagram of a CELP-type speech decoding device and illustrates the various steps carried out between the digital input (input of the demultiplexer/decoder 201) and the output sampled speech (output of the postfilter 209).
The demultiplexer/decoder 201 extracts four types of parameters from the binary information (input bitstream 210) received through a digital input channel from the encoding device of Figure 1. From each received binary frame, the extracted parameters are:
- the short term prediction parameters STP (usually once per frame);
- the long-term prediction parameters LTP (pitch lag T and pitch gain b), usually once per subframe;
- the innovative codebook index k; and
- the innovative codebook gain g (usually once per subframe).
The current speech signal is synthesized on the basis of these parameters as will be explained hereinbelow.
The decoding device of Figure 2 comprises an innovative excitation generator 203 to produce an innovative codevector ck in response to the received index k. This innovative codevector ck is scaled by the innovative codebook gain g through a scaler 207. The innovative excitation generator 203 is normally formed by an innovative codebook responsive to the index k to output the innovative codevector ck.
The LTP (Long Term Prediction) contribution, which usually consists of the past excitation delayed by the pitch lag T, is generated by an adaptive codebook 202. As illustrated in Figure 2, the adaptive codebook 202 is responsive to the pitch lag T and to the past excitation u stored in memory 204 to produce the adaptive codevector fT at delay T.

The adaptive codevector fT is scaled by the pitch gain b through a scaler 206 to obtain the signal bfT. The signal bfT is then added to the scaled innovative codevector gck through an adder 205 to produce the excitation codevector u. The contents of the adaptive codebook 202 are updated through the memory 204, which itself receives and stores the excitation codevector u.
The synthesized output speech is obtained by filtering the excitation codevector u through a synthesis filter 208 of transfer function 1/Â(z), and then through a postfilter 209. The synthesis filter 208 and the postfilter 209 are updated by the received STP parameters from the demultiplexer/decoder 201. Both filters 208 and 209 are well known to those of ordinary skill in the art and will not be further described in the present specification.
It should be pointed out here that the previously described operations are repeated on a subframe basis, where the subframe size is equal to the vector dimension N. Although the STP parameters are updated on a frame basis (2 to 5 subframes), a quantized LP filter A(z) is computed on a subframe basis as it is well known to those of ordinary skill in the art.
ENCODING PRINCIPLE
The sampled input speech signal 113 is processed on a frame by frame basis by the encoding device of Figure 1. In Figure 1, the encoding device is broken down into 12 modules numbered from 101 to 112.
Each input frame is first processed through an optional preprocessing unit 101. This pre-processing unit 101 consists of a high pass filter with a 140 Hz cut-off frequency. This high pass filter removes the unwanted sound components below 140 Hz.
The output of the pre-processing unit 101 is denoted s(n). This signal is used for performing linear prediction (LP) analysis in module 102. LP analysis is a technique well known to those of ordinary skill in the art. In this preferred embodiment, the autocorrelation approach is used. In the autocorrelation approach, the signal is first windowed using a Hamming window (usually of the order of 20-30 ms). The autocorrelations are computed from the windowed signal, and the Levinson-Durbin recursion is used to compute the LP parameters ai, where i = 1, ..., p, and where p is the LP order, which is typically 10. The parameters ai are the coefficients of the LP filter, which is given by the following relation:
A(z) = 1 + Σ_{i=1}^{p} a_i z^{-i}
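The autocorrelation method just described can be sketched as follows. This is a simplified illustration: the function name is an assumption, and a low order p=2 on a synthetic AR(2) signal is used in the example so the estimate is easy to verify:

```python
import numpy as np

def lp_coefficients(signal, p=10):
    """Autocorrelation method: Hamming window, autocorrelations, then
    the Levinson-Durbin recursion for the coefficients a_1..a_p of
    A(z) = 1 + sum_i a_i z^-i. Returns (a, prediction_error)."""
    w = signal * np.hamming(len(signal))
    # Autocorrelations r[0..p] of the windowed signal
    r = np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(p + 1)])
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient for order i
        k = -(r[i] + np.dot(a[1:i], r[i-1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i-1:0:-1]   # update lower-order coefficients
        a[i] = k
        err *= (1.0 - k * k)                 # residual prediction error
    return a, err

# Synthetic AR(2) test signal: A(z) = 1 - 0.5 z^-1 + 0.25 z^-2
rng = np.random.default_rng(2)
e = rng.standard_normal(4000)
x = np.zeros(4000)
for n in range(2, 4000):
    x[n] = 0.5 * x[n-1] - 0.25 * x[n-2] + e[n]
a, err = lp_coefficients(x, p=2)
```

With a long enough signal the recovered coefficients approach the true values a1 = -0.5 and a2 = 0.25, and the residual error err decreases at each order of the recursion.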
Module 102 performs LP analysis, as well as quantization and interpolation of the LP filter coefficients. The LP filter coefficients are first transformed into another equivalent domain more suitable for quantization and interpolation purposes. Line spectral pairs (LSP) and immittance spectral pairs (ISP) are two domains in which quantization and interpolation can be efficiently performed. The 10 LP filter coefficients can be quantized in the order of 18 to 30 bits using split or multi-stage quantization, or a combination thereof. The purpose of the interpolation is to enable updating of the LP filter coefficients every subframe while transmitting them once every frame; this improves the performance of the encoding device without increasing the bit rate. Quantization and interpolation of the LP filter coefficients are believed to be otherwise well known to those of ordinary skill in the art and, accordingly, will not be further described in the present specification.
The following paragraphs will describe the other encoding operations performed on a subframe basis. In the following description, the filter A(z) denotes the unquantized interpolated LP filter of the subframe, and the filter Â(z) denotes the quantized interpolated LP filter of the subframe.
Perceptual weighting
In analysis-by-synthesis encoders, the optimum adaptive and innovative parameters are searched by minimizing the mean squared error between the input speech and the synthesized speech in a perceptually weighted domain. This is equivalent to minimizing the error between the weighted input speech and the weighted synthesis speech.
The perceptually weighted signal sw(n) is computed in a perceptual weighting filter 103. Traditionally, the perceptually weighted signal sw(n) is computed by a perceptual weighting filter having a transfer function W(z) of the form:
W(z) = A(z/γ1) / A(z/γ2), where 0 < γ2 < γ1 ≤ 1
Typical values of γ1 and γ2 are 0.9 and 0.6, respectively. Other forms of transfer function W(z) also exist in the literature and could be used.
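Since A(z/γ) simply scales the i-th LP coefficient by γ^i, the numerator and denominator of W(z) can be formed directly from the LP coefficients, as in the following sketch (the helper name and example coefficients are assumptions):

```python
import numpy as np

def weighting_filter_coeffs(a, gamma1=0.9, gamma2=0.6):
    """Coefficients of W(z) = A(z/gamma1) / A(z/gamma2): each copy of
    A(z) = 1 + sum_i a_i z^-i has its i-th coefficient scaled by
    gamma^i (bandwidth expansion)."""
    i = np.arange(len(a))
    num = a * gamma1 ** i    # numerator   A(z/gamma1)
    den = a * gamma2 ** i    # denominator A(z/gamma2)
    return num, den

# Example with a short LP filter A(z) = 1 - 0.9 z^-1 + 0.4 z^-2
a = np.array([1.0, -0.9, 0.4])
num, den = weighting_filter_coeffs(a)
```

The filtering itself is then an ordinary pole-zero (ARMA) filtering of the speech with these two coefficient sets.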
Open-loop pitch analysis
In order to simplify the pitch analysis, an open-loop pitch lag T_OL is first estimated in open-loop adaptive search module 104 using the weighted speech signal s_w(n). Then the closed-loop pitch analysis, which is performed on a subframe basis in closed-loop adaptive codebook search module 107, is restricted around the open-loop pitch lag T_OL, which significantly reduces the search complexity of the LTP parameters T and b (pitch lag T and pitch gain b).
In the preferred embodiment, open-loop pitch analysis is usually performed in open-loop adaptive search module 104 once every 10 ms (two subframes) using techniques well known to those of ordinary skill in the art.
Target vector computation
The target vector x' for LTP analysis is first computed by the adder 105. This is usually done by subtracting the zero-input response s_0 of the weighted synthesis filter W(z)/Â(z) from the weighted speech signal s_w(n). More specifically:
x' = s_w - s_0
where x' is the N-dimensional target vector, s_w is the weighted speech signal vector in the subframe, and s_0 is the zero-input response of the filter W(z)/Â(z), which is the output of that combined filter due to its initial states. Note that alternative, but mathematically equivalent, approaches can be used to compute the target vector x'.
s_0 is computed in the zero-input response calculating unit 110. More specifically, the zero-input response calculator 110 is responsive to the quantized interpolated LP filter Â(z) from the LP analysis, quantization and interpolation module 102 and to the initial states of the weighted synthesis filter W(z)/Â(z) stored in update memory module 111 to calculate the zero-input response s_0 (that part of the response due to the initial states, as determined by setting the inputs equal to zero) of the filter W(z)/Â(z). In update memory 111, the states of the weighted synthesis filter W(z)/Â(z) are updated by filtering the excitation signal
u = g·c_k + b·f_T
through the weighted synthesis filter W(z)/Â(z). At the end of this filtering, the states of the weighted synthesis filter W(z)/Â(z) are stored in update memory 111 and used in the next subframe as initial states for calculating the zero-input response in module 110. As with the target vector, other alternative, but mathematically equivalent, approaches can be used to update the filter states. This operation is otherwise well known to those of ordinary skill in the art and, accordingly, will not be further described in the present specification.
Adaptive codebook search
First, an N-dimensional impulse response vector h of the weighted synthesis filter W(z)/Â(z) is computed in the impulse response generator 106 using the LP filter coefficients A(z) and Â(z) from module 102. Again, this operation is well known to those of ordinary skill in the art and, accordingly, will not be further described in the present specification.
The closed-loop pitch or adaptive codebook parameters b and T are computed in the closed-loop adaptive codebook search module 107; this module is responsive to the target vector x' and the impulse response vector h to compute these closed-loop pitch or adaptive codebook parameters b and T.
Traditionally, pitch prediction has been represented by a pitch filter having the following transfer function:
1/(1 - b·z^(-T))

where b is the pitch gain and T is the pitch lag. In this case, the pitch contribution to the excitation signal u(n) is given by b·u(n-T), where the total excitation is given by
u(n) = b·u(n-T) + g·c_k(n)
with g being the innovative codebook gain and c_k(n) the innovative codevector at index k.
This representation has limitations if the delay T is less than the subframe length N. In another representation, the pitch contribution can be seen as an adaptive codebook containing the past excitation signal. Generally, each vector in the adaptive codebook is a shift-by-one version of the previous vector (discarding one sample and adding a new sample). For pitch lags T > N, the adaptive codebook is equivalent to the filter structure 1/(1 - b·z^(-T)), and the adaptive codevector f_T(n) is given by:
f_T(n) = u(n-T),   n = 0, ..., N-1
For pitch lags T shorter than N, a codevector f_T(n) is built by repeating the available samples from the past excitation until the codevector is completed (this is not equivalent to the filter structure).
In recent encoders, a higher pitch resolution is used, which significantly improves the quality of voiced sound segments. This is achieved by oversampling the past excitation signal using polyphase interpolation filters. In this case, the codevector f_T(n) may correspond to an interpolated version of the past excitation, with pitch lag T being a non-integer delay (e.g. 50.25).
The adaptive codebook search consists of finding the best pitch lag T and gain b that minimize the mean squared weighted error E between the target vector x' and the scaled filtered past excitation, where the error E is expressed as:
E = ||x' - b·y_T||²
where y_T is the filtered adaptive codevector (f_T convolved with h) at delay T:
y_T(n) = f_T(n) * h(n) = Σ_{i=0..n} f_T(i)·h(n-i),   n = 0, ..., N-1
It can be shown that the error E is minimized by maximizing the criterion:

C = (x'^t y_T)² / (y_T^t y_T)

where t denotes vector transpose.
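The closed-loop evaluation of this criterion over candidate integer lags can be sketched as follows (an illustrative Python sketch, not the patent's implementation; lags are assumed ≥ N so that f_T(n) = u(n-T), and function names are hypothetical):

```python
def filter_codevector(f, h):
    """Convolve an adaptive codevector f with the impulse response h,
    truncated to the subframe length N (y_T = f_T * h)."""
    N = len(f)
    return [sum(f[i] * h[n - i] for i in range(n + 1)) for n in range(N)]

def pitch_criterion(x, y):
    """C = (x^t y)^2 / (y^t y); returns 0 for an all-zero y."""
    num = sum(a * b for a, b in zip(x, y)) ** 2
    den = sum(b * b for b in y)
    return num / den if den > 0.0 else 0.0

def closed_loop_search(x, past_excitation, h, t_ol, delta=5):
    """Test integer lags T in [t_ol - delta, t_ol + delta] and return the
    one maximizing C (lags assumed >= N, so f_T(n) = u(n - T))."""
    best_t, best_c = None, -1.0
    for t in range(t_ol - delta, t_ol + delta + 1):
        f = [past_excitation[n - t] for n in range(len(x))]  # u(n - T)
        c = pitch_criterion(x, filter_codevector(f, h))
        if c > best_c:
            best_t, best_c = t, c
    return best_t
```

A production encoder updates y_T recursively from one lag to the next instead of recomputing the convolution, as noted below for the second search stage.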
In this preferred embodiment, a 1/3 subsample pitch resolution is used, and the adaptive search is composed of three stages.
In the first stage, an open-loop pitch lag T_OL is estimated in the open-loop adaptive search module 104 in response to the perceptually weighted speech signal s_w(n). As indicated in the foregoing description, this open-loop pitch analysis is usually performed once every 10 ms (two subframes) using techniques well known to those of ordinary skill in the art.
In the second stage, the search criterion C is evaluated in the closed-loop adaptive search module 107 for integer pitch lags T around the estimated open-loop lag T_OL (usually ±5), which significantly simplifies the search procedure. A simple procedure is used for updating the filtered adaptive codevector y_T without the need to compute the convolution for every pitch lag.
Once an optimum integer pitch lag is found, a third stage of the search (module 107) tests the fractions around that optimum integer pitch lag.
Innovative codebook search
Once the pitch or LTP parameters T and b are determined, the search for the optimum innovative excitation is conducted by means of the innovative codebook search module 109. First, subtractor 108 updates the target vector x' by subtracting the LTP contribution from it:
x = x' - b·y_T
where b is the pitch gain and y_T is the filtered adaptive codevector (the past excitation at delay T convolved with the impulse response h). The new target vector x is used for the innovative codebook search and is therefore supplied to module 109.
The search procedure in CELP is performed by finding the optimum innovative codevector c_k and gain g which minimize the mean-squared error between the weighted input speech and the weighted synthesis speech. This is equivalent to minimizing the mean-squared error between the target vector x and the scaled filtered codevector g·z_k, as is well known to those of ordinary skill in the art. The mean-squared weighted error (MSWE) is given by:
E = Σ_{n=0..N-1} (x(n) - g·z_k(n))²
where z_k(n) is the filtered innovative codevector at index k, given by:
z_k(n) = c_k(n) * h(n) = Σ_{i=0..n} c_k(i)·h(n-i),   n = 0, ..., N-1
It is usually easier to use vector and matrix notations to represent the MSWE. That is:
E = ||x - g·z_k||² = ||x - g·H·c_k||²
where z_k = H·c_k is the filtered innovative codevector, and H is a lower triangular convolution matrix derived from the impulse response vector h. The matrix H is given by:
        | h(0)    0      0     ...  0    |
        | h(1)    h(0)   0     ...  0    |
    H = | h(2)    h(1)   h(0)  ...  0    |
        | ...     ...    ...   ...  ...  |
        | h(N-1)  ...    h(2)  h(1) h(0) |
By differentiating with respect to g, it can be shown that the MSWE E is minimized by maximizing the search criterion:
Q_k = (x^t z_k)² / (z_k^t z_k)
In an exhaustive innovative codebook search, the search criterion is evaluated for all possible codevectors c_k, k = 0, ..., B-1, where B is the codebook size. For innovative codebooks exceeding 10 bits (1024 entries), an exhaustive search procedure becomes impractical. For sparse innovative codebooks, where the codevectors contain few non-zero pulses, it is possible to construct huge codebooks and search them efficiently. For example, algebraic innovative codebooks of sizes as large as 35 bits can be easily constructed and searched using efficient non-exhaustive search procedures. Examples of such codebooks are given in the following US patents:
5,444,816 (Adoul et al.) 1995
5,699,482 (Adoul et al.) 1997
5,754,976 (Adoul et al.) 1998
5,701,392 (Adoul et al.) 1997.
For non sparse stochastic innovative codebooks, it is difficult to construct and search codebooks exceeding 10 bits. The use of sparse innovative codebooks was efficient at bit rates higher than 6 kbits/s. However, as the bit rate decreases, multi-mode coding becomes necessary,
where the speech signal is divided into different modes (e.g. voiced, unvoiced, transient, background noise) and a speech frame is encoded according to the selected mode. In voiced speech mode, algebraic codebooks with a small number of pulses are suitable, while in unvoiced or background noise modes, non sparse stochastic codebooks are more suitable. Even without the use of multi-mode encoding, it was found that using an innovative codebook which contains a mixture of algebraic and random codevectors improves the performance of low bit rate codecs. In this case, 1 bit can be used to denote whether the algebraic or stochastic part of the innovative codebook is selected.
The present invention is concerned with constructing and efficiently searching such large stochastic codebooks, in particular but not exclusively innovative codebooks. This is disclosed in the following description.
STRUCTURE OF THE STOCHASTIC CODEBOOK
According to the preferred embodiment, an N-dimensional codevector is derived by the addition of P signed random vectors (typically P=2 or 3) from a stochastic table containing M random vectors of dimension N (typically M=32 or 64). Let v_i denote the i-th N-dimensional random vector in the stochastic table; then a codevector is constructed by:

c_k = s_1·v_p1 + s_2·v_p2 + ... + s_P·v_pP

where the signs s_1, s_2, ..., s_P are equal to -1 or 1, and p_1, p_2, ..., p_P are the indices of the random vectors from the stochastic table.
The number of bits needed to encode the index of each vector v_{p_j} is log_2(M), and the sign information can be encoded with only 1 bit, as will be seen below. So the structure described above corresponds to a codebook of size B = P·log_2(M)+1 bits.
As an example, with M=32 and P=2, an 11-bit stochastic codebook can be constructed, with 5 bits for each vector index and 1 bit for the signs. Similarly, if M=32 and P=3, a 16-bit stochastic codebook can be constructed.
This shows the memory efficiency of this new structure, since non sparse stochastic codebooks of sizes as large as 2^16 and higher can be constructed using only a table of M=32 or 64 vectors.
The random vectors in the stochastic table can be generated using several approaches. Some suggested approaches for generating the contents of the random vectors are:
- random generators with uniform distribution;
- random generators with Gaussian distribution;
- band-pass filtered vectors randomly generated as above;
- vectors generated using training algorithms;
- overlapping vectors obtained from a random sequence, where each vector is a shift-by-k version of the previous vector, with k = 2 or 3 (this saves memory since only the first vector is stored with N samples and each subsequent vector needs only k new samples);
- inverse DFT (Discrete Fourier Transform) of complex vectors with unit amplitude and random, uniformly distributed phases; and
- as above, but with the first few and last few complex rays set to zero (equivalent to band-pass filtering).
Other approaches can be used for generating the contents of the random vectors without departing from the spirit of this invention.
In the present preferred embodiment, the last approach is used, where:
(1) complex vectors are generated with unit amplitudes and randomly generated phases (uniformly distributed between -π and π);
(2) the amplitudes of the first few and last few rays are set to zero (to perform a sort of band-pass filtering); and
(3) inverse DFT is used to obtain the contents of the random vectors.
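The three steps above can be sketched in Python as follows (an illustrative sketch, not the patent's implementation; the function name, the number of zeroed rays, and the use of a direct O(N²) inverse DFT rather than an FFT are all choices made for clarity):

```python
import cmath
import random

def generate_random_vector(N, zero_rays=2, rng=random):
    """One N-dimensional random vector: unit-amplitude spectrum with
    uniformly distributed random phases, the first and last few rays
    zeroed (band-pass), followed by an inverse DFT."""
    half = N // 2
    spectrum = [0j] * N          # ray 0 (DC) and ray N/2 stay zero
    for k in range(1, half):
        if k <= zero_rays or k >= half - zero_rays:
            continue             # drop the lowest and highest rays
        spectrum[k] = cmath.exp(1j * rng.uniform(-cmath.pi, cmath.pi))
        spectrum[N - k] = spectrum[k].conjugate()   # keep the signal real
    # Direct O(N^2) inverse DFT for clarity; an FFT would normally be used.
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

stochastic_table = [generate_random_vector(40) for _ in range(32)]  # M=32
```

Because the DC ray is zeroed, each generated vector sums to zero, and the conjugate-symmetric spectrum guarantees a real-valued result.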
ENCODING THE CODEBOOK INDEX
Let's first consider the case where the table has M random vectors, and the innovative codevectors are generated by the addition of 2 random vectors from the stochastic table (P=2). That is, the codevectors are given by:

c_k = s_1·v_i + s_2·v_j

In this case, two signs, s_1 and s_2, and two indices, i and j, have to be encoded. The values of i and j are in the range 0 to M-1, so that encoding thereof requires log_2(M) bits for each index, and encoding of the signs requires one (1) bit for each sign. However, one (1) bit can be saved upon encoding the signs since the order of the vectors v_i and v_j is not important. For example, choosing v_16 as the first vector and v_25 as the second vector is equivalent to choosing v_25 as the first vector and v_16 as the second vector. Thus, a total of 2·log_2(M)+1 bits is required to encode the signs and indices of the two random vectors.
A simple approach for implementing the encoding of the codevector is to use only 1 bit for the sign information and 2·log_2(M) bits for the two indices, while ordering the indices in such a way that the other sign information can be easily deduced. To better explain this, Figure 3 shows a flow chart for calculating the index of the codevector from the vector indices p_1 and p_2 and the corresponding sign indices σ_1 and σ_2. According to this procedure, the codevector index is given by (step 308):
I = s + 2·(i_1 + i_2·M)
If σ_1 ≠ σ_2 (step 301) and p_1 < p_2 (step 302), then set i_1 = p_2, i_2 = p_1, and s = σ_2 (step 303).

If σ_1 ≠ σ_2 (step 301) and p_1 > p_2 (step 302), then set i_1 = p_1, i_2 = p_2, and s = σ_1 (step 304).

If σ_1 = σ_2 (step 301) and p_1 > p_2 (step 305), then set i_1 = p_2, i_2 = p_1, and s = σ_1 (step 306).

If σ_1 = σ_2 (step 301) and p_1 ≤ p_2 (step 305), then set i_1 = p_1, i_2 = p_2, and s = σ_1 (step 307).
Thus, when constructing the index of the codevector, if the two signs are equal then the smaller of the indices p_1 and p_2 is assigned to i_1 and the larger is assigned to i_2; otherwise the larger index is assigned to i_1 and the smaller index to i_2.
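The index construction of Figure 3, together with the decoding it implies, can be sketched as follows (an illustrative Python sketch; sign indices σ are taken as bits 0/1, and the decoder recovers the second sign from the ordering of i_1 and i_2):

```python
def encode_index_p2(p1, p2, sigma1, sigma2, M):
    """Pack two vector indices and their sign bits (0 or 1) into the
    codevector index I = s + 2*(i1 + i2*M), ordering i1/i2 so the
    second sign bit can be deduced by the decoder (Figure 3)."""
    if sigma1 != sigma2:
        if p1 < p2:                      # different signs: larger index first
            i1, i2, s = p2, p1, sigma2
        else:
            i1, i2, s = p1, p2, sigma1
    else:
        if p1 > p2:                      # equal signs: smaller index first
            i1, i2, s = p2, p1, sigma1
        else:
            i1, i2, s = p1, p2, sigma1
    return s + 2 * (i1 + i2 * M)

def decode_index_p2(I, M):
    """Recover (i1, sign1, i2, sign2); the signs differ iff i1 > i2."""
    s, q = I & 1, I >> 1
    i1, i2 = q % M, q // M
    s2 = 1 - s if i1 > i2 else s
    return i1, s, i2, s2
```

The round trip preserves the chosen vector/sign pairs even though the order of the two vectors is not transmitted.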
When three random vectors are added to construct the excitation codevector, each vector requires log_2(M) bits and the sign information needs only one (1) bit; a total of 3·log_2(M)+1 bits. A simple way of performing the encoding of the index of the codevector is the following. The stochastic table is divided into two halves with M/2 random vectors in each half. The half which contains at least two of the chosen vectors is then determined. This information, denoted by φ, is encoded with one (1) bit. The two vector indices in the same half are encoded according to the algorithm of Figure 3, and require 2·log_2(M/2)+1 bits (which is equal to 2·log_2(M)-1 bits). The third vector is encoded separately with log_2(M) bits for the index and one (1) bit for the sign. The total number of bits is 1 + (2·log_2(M/2)+1) + (log_2(M)+1) = 3·log_2(M)+1.
More specifically, in the case P=3, calculating the index of the signal representative codevector comprises: dividing the stochastic table into two halves with M/2 random vectors in each half of the stochastic table;
determining the one of the two halves of the stochastic table which contains at least two of the three random vectors of the selected one of the combinations; and constructing the index I of the best codevector using the following relation:

I = φ + 2·(s + 2·(i_1 + i_2·M/2)) + M·M·(σ_3 + 2·p_3)
and the following rules:
- if σ_1 ≠ σ_2 and p_1 < p_2, then set i_1 = p_2, i_2 = p_1, and s = σ_2;
- if σ_1 ≠ σ_2 and p_1 > p_2, then set i_1 = p_1, i_2 = p_2, and s = σ_1;
- if σ_1 = σ_2 and p_1 > p_2, then set i_1 = p_2, i_2 = p_1, and s = σ_1;
- if σ_1 = σ_2 and p_1 ≤ p_2, then set i_1 = p_1, i_2 = p_2, and s = σ_1; and
- if φ corresponds to the second half, set i_1 = i_1 - M/2 and i_2 = i_2 - M/2;
- φ = 0 or 1 and denotes said one half of the stochastic table containing at least two of the three random vectors of said selected one combination;
- σ_1 and σ_2 denote respective sign indices of the two random vectors located in said one half of the stochastic table;
- σ3 denotes a sign index of the third random vector; and
- p_1 and p_2 denote respective indices of the two random vectors located in said one half of the stochastic table; and
- p_3 denotes the index of the third random vector.
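The three-vector index construction above can be sketched as follows (an illustrative Python sketch; the half-selection relies on the pigeonhole fact that at least two of the three indices always fall in the same half of the table, and function and variable names are hypothetical):

```python
def encode_index_p3(indices, sigmas, M):
    """Index for P=3: I = phi + 2*(s + 2*(i1 + i2*M/2)) + M*M*(sigma3 + 2*p3).
    By the pigeonhole principle, at least two of the three indices always
    fall in the same half of the M-entry table."""
    half = M // 2
    lower = [j for j in range(3) if indices[j] < half]
    if len(lower) >= 2:
        phi, (a, b) = 0, lower[:2]
    else:
        phi, (a, b) = 1, [j for j in range(3) if indices[j] >= half][:2]
    third = ({0, 1, 2} - {a, b}).pop()
    p1, p2, s1, s2 = indices[a], indices[b], sigmas[a], sigmas[b]
    # Same ordering rules as the two-vector case of Figure 3.
    if s1 != s2:
        i1, i2, s = (p2, p1, s2) if p1 < p2 else (p1, p2, s1)
    else:
        i1, i2, s = (p2, p1, s1) if p1 > p2 else (p1, p2, s1)
    if phi == 1:                         # map indices into [0, M/2)
        i1, i2 = i1 - half, i2 - half
    return (phi + 2 * (s + 2 * (i1 + i2 * half))
            + M * M * (sigmas[third] + 2 * indices[third]))
```

With M=32 the resulting index always fits in 3·log_2(32)+1 = 16 bits.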
FAST SEARCH PROCEDURE FOR THE STOCHASTIC CODEBOOK
The innovative codebook search is performed in the above described innovative codebook search module 109.
The codevectors are given by:

c_k = s_1·v_p1 + s_2·v_p2 + ... + s_P·v_pP
The goal of the search procedure is to find the indices p_1, p_2, ..., p_P of the best P random vectors and their corresponding signs s_1, s_2, ..., s_P, which maximize the search criterion:
Q_k = (x^t z_k)² / (z_k^t z_k) = (x^t H c_k)² / (c_k^t H^t H c_k) = (d^t c_k)² / (c_k^t H^t H c_k)

where x is the target vector and z_k = H·c_k is the filtered innovative codevector at index k. Note that in the numerator of the search criterion, the dot product between x and z_k is equivalent to the dot product between d and c_k, where d = H^t·x is the backward filtered version of the target vector x, which is also the correlation between the target vector x and the impulse response h. To find the elements of the vector d, the following relation is used (step 601 of Figure 6):
d(n) = x(n) * h(-n) = Σ_{i=n..N-1} x(i)·h(i-n),   n = 0, ..., N-1
Since d is independent of the codevector index k, it is computed only once; this simplifies the computation of the numerator for the different codevectors.
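The backward filtering of step 601 follows directly from the relation above (a plain O(N²) correlation, sketched here for illustration; the function name is hypothetical):

```python
def backward_filter(x, h):
    """d(n) = sum_{i=n}^{N-1} x(i) h(i-n): correlate the target with the
    impulse response, so that x^t (H c) = d^t c for any codevector c."""
    N = len(x)
    return [sum(x[i] * h[i - n] for i in range(n, N)) for n in range(N)]
```

Because d is computed once per subframe, every subsequent numerator evaluation reduces to a dot product d^t c_k.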
After computing the vector d, a preselection process is used to identify K out of the M random vectors in the stochastic table, so that the search process is then confined to those K vectors.
This preselection is performed by testing the numerator of the search criterion Q_k for the M random vectors, and by selecting the K vectors which have the largest absolute dot products (or squared dot products) between d and v_i, i = 0, ..., M-1. More specifically, the dot products χ_i, given by:

χ_i = Σ_{n=0..N-1} d(n)·v_i(n)

are calculated for all the random vectors v_i (step 602 of Figure 6), and the indices of the K vectors which result in the K largest values of |χ_i| are retained (step 603 of Figure 6). These indices are stored in the index vector m_j, j = 0, ..., K-1.
To further simplify the search, the sign information corresponding to each preselected vector is also preset. The sign corresponding to each preselected vector is given by the sign of χ_i for that vector (step 603 of Figure 6). These preset signs are stored in the sign vector s_j, j = 0, ..., K-1.
The innovative codebook search is now confined to the preselected K vectors with their corresponding signs. For typical values of M=64, P=2, and K=6, the search is reduced to finding the best combination of P=2 vectors among K=6 random vectors instead of finding them among 64 random vectors. This reduces the number of tested vector combinations from 64×65/2 = 2080 to 6×7/2 = 21.
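Steps 602 and 603 (dot products, preselection of the K most promising vectors, and presetting of their signs) can be sketched as follows (an illustrative Python sketch; signs are returned as ±1):

```python
def preselect(d, table, K):
    """Steps 602-603: keep the K table vectors with the largest |d^t v_i|,
    together with the sign (+1/-1) of each dot product chi_i."""
    chi = [sum(dn * vn for dn, vn in zip(d, v)) for v in table]
    order = sorted(range(len(table)), key=lambda i: abs(chi[i]), reverse=True)
    m = order[:K]                                   # index vector m_j
    signs = [1 if chi[i] >= 0 else -1 for i in m]   # sign vector s_j
    return m, signs
```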
Once the K most promising vectors and their corresponding signs are predetermined, the search proceeds to select the P vectors among those K vectors which maximize the search criterion Q_k.
The filtered vectors w_j, j = 0, ..., K-1, corresponding to the K preselected vectors, are first calculated (step 604 of Figure 6) and stored.
This can be performed by convolving the preselected vectors with the impulse response h of the weighted synthesis filter. The sign information is also included in the filtered vectors; i.e.:

w_j(n) = s_j Σ_{i=0..n} v_{m_j}(i)·h(n-i),   n = 0, ..., N-1,   j = 0, ..., K-1.
The energy of each filtered preselected vector is then computed (step 605 of Figure 6):

ε_j = w_j^t w_j = Σ_{n=0..N-1} w_j²(n),   j = 0, ..., K-1
as well as its dot product with the target vector (step 605 of Figure 6):

ρ_j = x^t w_j = Σ_{n=0..N-1} w_j(n)·x(n),   j = 0, ..., K-1.
Note that ρ_j and ε_j correspond to the numerator and denominator of the search criterion due to each preselected vector.
The search then proceeds with the selection of P vectors among the K preselected vectors by maximizing the search criterion Qk (step 606 of Figure 6).
Let's first start with the case where two vectors are added from the stochastic table (P=2). The search reduces to finding two vector indices p_1 and p_2 among the K preselected vectors. In the case of P=2, a codevector is given by:

c_k = s_1·v_p1 + s_2·v_p2
The filtered innovative codevector z is given by:

z = w_p1 + w_p2

Note that the predetermined signs are included in the filtered preselected vectors w_j.
For two vectors, the search criterion is given by (the codevector index k is dropped for simplicity):

Q = (x^t z)² / (z^t z) = (x^t (w_p1 + w_p2))² / ((w_p1 + w_p2)^t (w_p1 + w_p2)) = (ρ_p1 + ρ_p2)² / (ε_p1 + ε_p2 + 2·w_p1^t w_p2)
The vectors w_j and the values ρ_j and ε_j are computed before starting the codebook search. Then Q is evaluated using two nested loops over all possible indices p_1 and p_2. Only the dot products between the different vectors w_j need to be computed inside the loops.
The search procedure is shown in Figure 4. The search criterion is computed as Q = R²/D. However, a cross product is used when comparing the present Q with the optimum one Q_opt, in order to avoid the division inside the loop; more specifically, testing if Q > Q_opt is equivalent to testing if R²·D_opt > R²_opt·D.
At the end of the two nested loops, the optimum vector indices p_1 and p_2 will be known. The two indices and the corresponding signs are then encoded as shown in Figure 3. The gain of the innovative codevector is then found by:

g = (ρ_p1 + ρ_p2) / D_opt
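The two nested loops of Figure 4, including the cross-product comparison that avoids division inside the loop and the final gain computation, can be sketched as follows (an illustrative Python sketch; the filtered preselected vectors w_j are assumed to already include the preset signs):

```python
def search_p2(x, w):
    """Figure 4: two nested loops over the K filtered preselected vectors
    w[j] (preset signs already folded in); Q = R^2/D is compared by
    cross-multiplication to avoid a division inside the loop."""
    K, N = len(w), len(x)
    rho = [sum(wj[n] * x[n] for n in range(N)) for wj in w]  # rho_j = x^t w_j
    eps = [sum(v * v for v in wj) for wj in w]               # eps_j = ||w_j||^2
    best_r2, best_d, best_pair = 0.0, 1.0, (0, 0)
    for p1 in range(K):
        for p2 in range(p1, K):          # p1 == p2: same vector used twice
            r2 = (rho[p1] + rho[p2]) ** 2
            d = eps[p1] + eps[p2] + 2.0 * sum(
                w[p1][n] * w[p2][n] for n in range(N))
            if r2 * best_d > best_r2 * d:            # Q > Q_opt, no division
                best_r2, best_d, best_pair = r2, d, (p1, p2)
    p1, p2 = best_pair
    gain = (rho[p1] + rho[p2]) / best_d              # g = (rho_p1+rho_p2)/D_opt
    return p1, p2, gain
```

The inner loop starts at p1 so that K(K+1)/2 combinations are tested, matching the 6×7/2 count given above for K=6.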
The case where the codevector is found by adding three vectors from the stochastic codebook (P=3) will now be briefly considered. The K preselected random vectors and their corresponding signs are found in the same manner as described above. The filtered preselected vectors w_j, their dot products with x, ρ_j, and their energies, ε_j, are also found before starting the codebook search. The search then proceeds with finding the best three vectors among the K preselected vectors, by computing the search criterion Q = R²/D using three nested loops.
Figure 5 shows the structure of the search in three nested loops in the case of adding three random vectors to construct the innovative codevector. At the end of the three nested loops, the optimum vector indices p_1, p_2, and p_3 will be known. The gain of the innovative codevector is then found by:

g = (ρ_p1 + ρ_p2 + ρ_p3) / D_opt
Once the optimum codevector and its gain are chosen, the codebook index k and gain g are encoded and transmitted.
As indicated in the foregoing description, the search procedure described above is summarized in the flow chart of Figure 6.
The stochastic codebook disclosed in the present invention can be used alone or in conjunction with a sparse innovative codebook such as an algebraic codebook. In this case, one (1) bit can be used to denote whether the algebraic section or the stochastic section of the innovative codebook is chosen. Both sections are searched and a candidate from each section is retained. The two candidates are compared and the one which maximizes the selection criterion Q is chosen. A modified selection criterion can be used for choosing the winner among the two codebook sections, by taking into consideration the nature of the current speech signal in the subframe. Criteria such as the pitch gain, the synthesis filter tilt, etc. can be used to modify the search criterion so as to favor the algebraic part of the codebook in case of periodic signals (high pitch gain and strong tilt) or to favor the stochastic section otherwise.
Other variants of the stochastic codebook are also possible. One such variant is to have the flexibility of replacing the third random vector, in case of P=3, with a single pulse. In this case, one (1) bit is needed for indicating that a pulse is chosen to replace the third random vector. This helps in capturing special time events in the signal.
Although the present invention has been described hereinabove by way of a preferred embodiment thereof, this embodiment can be modified at will, within the scope of the appended claims, without departing from the spirit and nature of the subject invention.