EP0801789A1

EP0801789A1 - Speech coding method using synthesis analysis

Info

Publication number: EP0801789A1
Application number: EP96901009A
Authority: EP
Inventors: William Navarro; Michel Mauc
Original assignee: Matra Communication SA
Current assignee: Nortel Networks France SAS
Priority date: 1995-01-06
Filing date: 1996-01-03
Publication date: 1997-10-22
Anticipated expiration: 2016-01-03
Also published as: AU4490296A; WO1996021219A1; CN1134761C; ATE174147T1; EP0721180A1; DE69601068D1; DE69601068T2; ATE183600T1; FR2729244A1; DE69603755T2; US5899968A; CN1173940A; EP0721180B1; DE69603755D1; EP0801789B1; FR2729244B1

Abstract

The method involves using a digitiser (18) to form a speech signal into successive frames, each typically divided into 4 subframes of 40 samples of 16 bits. A coder (16) delivers a binary sequence at a substantially lower bit rate to a channel encoder (22), which introduces error detection and/or correction bits. Each frame is analysed by short-term linear prediction (26) to determine the coefficients of a short-term synthesis filter. For each subframe, an excitation sequence is determined which, after filtering, produces a synthetic signal representing the speech. The subframe is divided into segments corresponding to the pulses of the stochastic excitation (40), and the positions of these pulses are found so that there is at most one pulse in each segment.

Description

ANALYSIS-BY-SYNTHESIS SPEECH CODING METHOD

The present invention relates to analysis-by-synthesis speech coding.

The applicant has in particular described such speech coders which it developed in its European patent applications 0 195 487, 0 347 307 and 0 469 997.

In an analysis-by-synthesis speech coder, a linear prediction of the speech signal is carried out to obtain the coefficients of a short-term synthesis filter modeling the transfer function of the vocal tract. These coefficients are transmitted to the decoder, together with parameters characterizing an excitation to be applied to the short-term synthesis filter. In most current coders, the longer-term correlations of the speech signal are also searched for, in order to characterize a long-term synthesis filter accounting for the pitch of the speech. When the signal is voiced, the excitation indeed has a predictable component which can be represented by the past excitation, delayed by TP samples of the speech signal and scaled by a gain $g_P$. The long-term synthesis filter, also reconstituted at the decoder, then has a transfer function of the form $1/B(z)$ with $B(z) = 1 - g_P \, z^{-TP}$. The remaining, unpredictable part of the excitation is called the stochastic excitation. In so-called CELP ("Code Excited Linear Prediction") coders, the stochastic excitation consists of a vector searched for in a predetermined dictionary. In so-called MPLPC ("Multi-Pulse Linear Prediction Coding") coders, the stochastic excitation comprises a certain number of pulses whose positions are searched for by the coder. In general, CELP coders are preferred for low transmission rates, but they are more complex to implement than MPLPC coders.

An object of the present invention is to provide a speech coding method in which the search for stochastic excitation is simplified.

The invention thus proposes an analysis-by-synthesis method for coding a speech signal digitized in successive frames divided into sub-frames of lst samples, in which a linear prediction analysis is carried out for each frame to determine the coefficients of a short-term synthesis filter, and an excitation sequence with nc contributions, each associated with a respective gain, is determined for each sub-frame so that the excitation sequence applied to the short-term synthesis filter produces a synthetic signal representative of the speech signal, the nc contributions of the excitation sequence and the associated gains being determined by an iterative process in which iteration n ($0 \le n < nc$) comprises:

- the determination of the contribution n which maximizes the quantity $(F_p \cdot e_{n-1}^T)^2 / (F_p \cdot F_p^T)$, where $F_p$ denotes a row vector with lst components equal to the convolution products between a possible value of the contribution n and the impulse response of a compound filter made up of the short-term synthesis filter and a perceptual weighting filter, and $e_{n-1}$ denotes a target vector determined during iteration n-1 if $n \ge 1$, with $e_{-1} = X$ an initial target vector; and

- the calculation of n+1 gains forming a row vector $g_n = (g_n(0), ..., g_n(n))$ by solving the linear system $g_n \cdot B_n = b_n$, where $B_n$ is a symmetric matrix with n+1 rows and n+1 columns whose component $B_n(i,j)$ ($0 \le i, j \le n$) is equal to the scalar product $F_{p(i)} \cdot F_{p(j)}^T$, where $F_{p(i)}$ and $F_{p(j)}$ respectively designate the row vectors equal to the convolution products between the previously determined contributions i and j and the impulse response of the compound filter, and $b_n$ is a row vector with n+1 components $b_n(i)$ ($0 \le i \le n$) respectively equal to the scalar products between the vectors $F_{p(i)}$ and the initial target vector X,

the nc gains associated with the nc contributions of the excitation sequence being those calculated during iteration nc-1. At each iteration n ($0 \le n < nc$), the rows n of three matrices L, R and K with nc rows and nc columns are calculated, such that $B_n = L_n \cdot R_n^T$ and $L_n = R_n \cdot K_n$, where $L_n$, $R_n$ and $K_n$ denote the matrices with n+1 rows and n+1 columns corresponding respectively to the first n+1 rows and the first n+1 columns of said matrices L, R and K, the matrices L and R being lower triangular, the matrix K being diagonal, and the matrix L having only 1s on its main diagonal; row n of the matrix $L^{-1}$, the inverse of the matrix L, is calculated; and the n+1 gains are calculated according to the relation $g_n = b_n \cdot K_n \cdot (L_n^{-1})^T \cdot L_n^{-1}$, where $L_n^{-1}$ designates the matrix with n+1 rows and n+1 columns corresponding respectively to the first n+1 rows and the first n+1 columns of the inverse matrix $L^{-1}$.
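The iterative process above can be illustrated with a minimal sketch for the multi-pulse case. Several points are assumptions made for illustration and are not taken from the text: the target update e_n = X - sum of g_n(i).F_p(i), the exclusion of already selected positions, the helper names, and above all the gain computation, which here solves g_n.B_n = b_n by plain Gaussian elimination rather than by the incremental L, R, K recursion described in the text.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def filtered_pulse(h, pos, lst):
    # Row vector F_p: unit pulse at `pos` convolved with h, truncated to lst samples.
    return [h[i - pos] if 0 <= i - pos < len(h) else 0.0 for i in range(lst)]

def solve_gains(B, b):
    # Solve g.B = b for symmetric B (equivalently B.g^T = b^T).
    # Stand-in for the L, R, K recursion: Gaussian elimination with pivoting.
    n = len(b)
    M = [list(B[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    g = [0.0] * n
    for r in range(n - 1, -1, -1):
        g[r] = (M[r][n] - sum(M[r][k] * g[k] for k in range(r + 1, n))) / M[r][r]
    return g

def search_excitation(x, h, nc):
    lst = len(x)
    target = list(x)                      # e_{-1} = X
    positions, rows, gains = [], [], []
    for _ in range(nc):
        best_pos, best_val = None, -1.0
        for p in range(lst):
            if p in positions:            # assumption: one pulse per position
                continue
            fp = filtered_pulse(h, p, lst)
            energy = dot(fp, fp)
            if energy == 0.0:
                continue
            val = dot(fp, target) ** 2 / energy
            if val > best_val:
                best_pos, best_val = p, val
        positions.append(best_pos)
        rows.append(filtered_pulse(h, best_pos, lst))
        B = [[dot(fi, fj) for fj in rows] for fi in rows]
        b = [dot(fi, x) for fi in rows]   # scalar products with the initial target X
        gains = solve_gains(B, b)         # all n+1 gains are recomputed jointly
        # Assumed target update for the next iteration.
        target = [x[i] - sum(g * fi[i] for g, fi in zip(gains, rows))
                  for i in range(lst)]
    return positions, gains
```

Recomputing every gain at each iteration is what distinguishes this scheme from a purely sequential search in which each gain is fixed once its pulse has been chosen.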

This excitation search mode limits the complexity of the calculations required to determine the excitation sequence, by making it possible to carry out only one division or inversion per iteration. In the case of an MPLPC coder, the contributions can be pulse contributions. This excitation search mode is not, however, applicable exclusively to MPLPC coders. It is applicable for example to so-called VSELP coders, where the contributions to the stochastic excitation are vectors chosen from a predetermined dictionary (see I. Gerson and M. Jasiuk: "Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 kb/s", Proc. Int. Conf. on Acoustics, Speech and Signal Processing, Albuquerque 1990, Vol. 1, pages 461-464). Furthermore, the contributions can include the contribution corresponding to the past excitation delayed by TP samples, whose associated gain $g_P$ is recalculated during successive iterations, or several contributions of this nature if several LTP delays are determined.

Other features and advantages of the invention will appear in the description below of preferred, but non-limiting, examples of embodiment, with reference to annexed drawings, in which:

- Figure 1 is a block diagram of a radio station incorporating a speech encoder implementing the invention;

- Figure 2 is a block diagram of a radio station capable of receiving a signal produced by that of Figure 1;

- Figures 3 to 6 are flowcharts illustrating an open loop LTP analysis process applied in the speech coder of Figure 1;

- Figure 7 is a flowchart illustrating a process for determining the impulse response of the weighted synthesis filter applied in the speech coder of Figure 1;

- Figures 8 to 11 are flowcharts illustrating a process for finding the stochastic excitation applied in the speech coder of Figure 1.

A speech coder implementing the invention is applicable in various types of speech transmission and/or storage systems using a digital compression technique. In the example of Figure 1, the speech coder 16 is part of a mobile radio station. The speech signal S is a digital signal sampled at a frequency typically equal to 8 kHz. The signal S comes from an analog-digital converter 18 receiving the amplified and filtered output signal from a microphone 20. The converter 18 puts the speech signal S in the form of successive frames, themselves subdivided into nst sub-frames of lst samples. A frame of 20 ms typically comprises nst = 4 sub-frames of lst = 40 samples of 16 bits at 8 kHz. Upstream of the coder 16, the speech signal S can also be subjected to conventional shaping treatments such as Hamming filtering. The speech coder 16 delivers a binary sequence with a bit rate significantly lower than that of the speech signal S, and addresses this sequence to a channel coder 22 whose function is to introduce redundancy bits into the signal in order to allow detection and/or correction of possible transmission errors. The output signal from the channel coder 22 is then modulated on a carrier frequency by the modulator 24, and the modulated signal is transmitted on the air interface.

The speech coder 16 is an analysis-by-synthesis coder. The coder 16 determines on the one hand parameters characterizing a short-term synthesis filter modeling the speaker's vocal tract, and on the other hand an excitation sequence which, applied to the short-term synthesis filter, provides a synthetic signal constituting an estimate of the speech signal S according to a perceptual weighting criterion.

The short-term synthesis filter has a transfer function of the form 1 / A (z), with:

The coefficients $a_i$ are determined by a module 26 for short-term linear prediction analysis of the speech signal S. The $a_i$ are the linear prediction coefficients of the speech signal S. The order q of the linear prediction is typically of the order of 10. The methods applicable by module 26 for short-term linear prediction are well known in the field of speech coding. Module 26, for example, implements the Durbin-Levinson algorithm (see J. Makhoul: "Linear Prediction: A tutorial review", Proc. IEEE, Vol. 63, No. 4, April 1975, p. 561-580). The coefficients $a_i$ obtained are supplied to a module 28 which converts them into line spectrum parameters (LSP). The representation of the prediction coefficients $a_i$ by LSP parameters is frequently used in analysis-by-synthesis speech coders. The LSP parameters are the q numbers $\cos(2\pi f_i)$ arranged in descending order, the q normalized line spectrum frequencies (LSF) $f_i$ ($1 \le i \le q$) being such that the complex numbers $\exp(2\pi j f_i)$, with $i = 1, 3, ..., q-1, q+1$ and $f_{q+1} = 0.5$, are the roots of the polynomial $Q(z)$ defined by $Q(z) = A(z) + z^{-(q+1)} A(z^{-1})$, and the complex numbers $\exp(2\pi j f_i)$, with $i = 0, 2, 4, ..., q$ and $f_0 = 0$, are the roots of the polynomial $Q^*(z)$ defined by $Q^*(z) = A(z) - z^{-(q+1)} A(z^{-1})$.

The LSP parameters can be obtained by the conversion module 28 by the classical method of Chebyshev polynomials (see P. Kabal and R.P. Ramachandran: "The computation of line spectral frequencies using Chebyshev polynomials", IEEE Trans. ASSP, Vol. 34, No. 6, 1986, pages 1419-1426). It is the quantized values of the LSP parameters, obtained by a quantization module 30, which are transmitted to the decoder so that the latter can recover the coefficients $a_i$ of the short-term synthesis filter. The coefficients $a_i$ can be recovered simply, given that:

To avoid sudden variations in the transfer function of the short-term synthesis filter, the LSP parameters are interpolated before the prediction coefficients $a_i$ are deduced from them. This interpolation is performed on the first sub-frames of each frame of the signal. For example, if $LSP_t$ and $LSP_{t-1}$ respectively designate an LSP parameter calculated for frame t and for the previous frame t-1, we take: $LSP_t(0) = 0.5 \, LSP_{t-1} + 0.5 \, LSP_t$, $LSP_t(1) = 0.25 \, LSP_{t-1} + 0.75 \, LSP_t$ and $LSP_t(2) = ... = LSP_t(nst-1) = LSP_t$ for sub-frames 0, 1, 2, ..., nst-1 of frame t. The coefficients $a_i$ of the filter $1/A(z)$ are then determined sub-frame by sub-frame from the interpolated LSP parameters.
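The interpolation rule given in this example can be written out as a short sketch. The function name and the list-of-lists layout are illustrative choices, not part of the text; the weights are the ones quoted above.

```python
def interpolate_lsp(lsp_prev, lsp_cur, nst=4):
    """Return one interpolated LSP vector per sub-frame of frame t.

    lsp_prev, lsp_cur: LSP vectors of frames t-1 and t.
    Weights on the current frame: 0.5, 0.75, then 1.0 for the remaining sub-frames.
    """
    weights = ([0.5, 0.75] + [1.0] * max(nst - 2, 0))[:nst]
    return [
        [(1.0 - w) * p + w * c for p, c in zip(lsp_prev, lsp_cur)]
        for w in weights
    ]

# Example with a two-component LSP vector (values chosen only for illustration).
subframes = interpolate_lsp([0.0, 1.0], [1.0, 0.0])
```

Only the first two sub-frames mix in the previous frame, so the filter settles on the current frame's parameters for the second half of the frame.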

The non-quantized LSP parameters are supplied by the module 28 to a module 32 for calculating the coefficients of a perceptual weighting filter 34. The perceptual weighting filter 34 preferably has a transfer function of the form $W(z) = A(z/\gamma_1) / A(z/\gamma_2)$, where $\gamma_1$ and $\gamma_2$ are coefficients such that $\gamma_1 > \gamma_2 > 0$ (for example $\gamma_1 = 0.9$ and $\gamma_2 = 0.6$). The coefficients of the perceptual weighting filter are calculated by the module 32 for each sub-frame after interpolation of the LSP parameters received from the module 28.
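As a small illustration of how the coefficients of W(z) follow from those of A(z): replacing z by z/γ multiplies the coefficient of z^-i by γ^i. The sketch below assumes the polynomial A(z) is given as a list of coefficients in increasing powers of z^-1, starting with the constant term; the function names are hypothetical.

```python
def bandwidth_expand(poly, gamma):
    # Coefficients of A(z/gamma): the coefficient of z^-i is scaled by gamma**i.
    return [c * gamma ** i for i, c in enumerate(poly)]

def weighting_filter(poly, gamma1=0.9, gamma2=0.6):
    # Numerator and denominator of W(z) = A(z/gamma1) / A(z/gamma2).
    return bandwidth_expand(poly, gamma1), bandwidth_expand(poly, gamma2)

# Example with an arbitrary second-order polynomial.
num, den = weighting_filter([1.0, -0.5, 0.25])
```

The scaling holds whatever sign convention is used for the prediction coefficients inside A(z), since it acts on the polynomial coefficients themselves.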

The perceptual weighting filter 34 receives the speech signal S and delivers a perceptually weighted signal SW which is analyzed by modules 36, 38, 40 to determine the excitation sequence. The excitation sequence of the short-term filter consists of an excitation predictable by a long-term synthesis filter modeling the pitch of the speech, and a non-predictable stochastic excitation, or innovation sequence.

Module 36 performs long-term prediction (LTP) in open loop, i.e. it does not directly contribute to the minimization of the weighted error. In the case shown, the weighting filter 34 intervenes upstream of the open-loop analysis module, but it could be otherwise: the module 36 could operate directly on the speech signal S, or on the signal S cleared of its short-term correlations by a filter with transfer function A(z). On the other hand, the modules 38 and 40 operate in closed loop, that is to say that they contribute directly to the minimization of the perceptually weighted error.

The long-term synthesis filter has a transfer function of the form $1/B(z)$ with $B(z) = 1 - g_P \, z^{-TP}$, where $g_P$ denotes a long-term prediction gain and TP denotes a long-term prediction delay. The long-term prediction delay can typically take N = 256 values between rmin and rmax samples. A fractional resolution is provided for the smallest delay values so as to avoid discernible differences in terms of voicing frequency. We use for example a resolution of 1/6 between rmin = 21 and 33 + 5/6, a resolution of 1/3 between 34 and 47 + 2/3, a resolution of 1/2 between 48 and 88 + 1/2, and an integer resolution between 89 and rmax = 142. Each possible delay is thus quantized by an integer index between 0 and N-1 = 255.
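The example resolutions can be checked by enumerating the delay grid. The sketch below assumes that rmin is 21 (the reading under which the four bands add up to exactly 256 values) and that indexes are assigned in increasing order of delay; the function name is illustrative.

```python
from fractions import Fraction

def build_delay_table():
    # (first integer delay, last integer delay, steps per sample) for each band.
    bands = [(21, 33, 6), (34, 47, 3), (48, 88, 2), (89, 142, 1)]
    table = []
    for first, last, steps in bands:
        for whole in range(first, last + 1):
            for k in range(steps):
                table.append(Fraction(whole) + Fraction(k, steps))
    return table

table = build_delay_table()
# 13*6 + 14*3 + 41*2 + 54*1 = 78 + 42 + 82 + 54 = 256 quantized delays,
# so each delay fits in an 8-bit index.
```

Exact fractions are used here so that delays such as 33 + 5/6 are represented without rounding.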

The long-term prediction delay is determined in two stages. In the first stage, the open-loop LTP analysis module 36 detects the voiced frames of the speech signal and determines, for each voiced frame, a degree of voicing MV and a search interval for the long-term prediction delay. The degree of voicing MV of a voiced frame can take three values: 1 for weakly voiced frames, 2 for moderately voiced frames, and 3 for very voiced frames. In the notations used below, we take a degree of voicing MV = 0 for the unvoiced frames. The search interval is defined by a central value represented by its quantization index ZP and by a width in the domain of the quantization indexes, depending on the degree of voicing MV. For weakly or moderately voiced frames (MV = 1 or 2) the width of the search interval is N1 indexes, that is to say that the index of the long-term prediction delay will be sought between ZP-16 and ZP+15 if N1 = 32. For very voiced frames (MV = 3), the width of the search interval is N3 indexes, i.e. the index of the long-term prediction delay will be sought between ZP-8 and ZP+7 if N3 = 16.
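A compact way to express the search interval is as a range of quantization indexes around ZP. In this sketch, the clipping to the valid index range 0..N-1 at the edges and the function name are assumptions for illustration; the widths N1 = 32 and N3 = 16 are the example values from the text.

```python
def search_interval(zp, mv, n_total=256, n1=32, n3=16):
    """Indexes searched for the LTP delay, given the center ZP and voicing degree MV."""
    if mv == 0:
        return range(0)                      # unvoiced frame: no LTP delay search
    width = n3 if mv == 3 else n1
    low = max(zp - width // 2, 0)            # ZP-16 (or ZP-8), clipped at 0
    high = min(zp + width // 2, n_total)     # exclusive bound: ZP+15 (or ZP+7) is the last index
    return range(low, high)

indexes = list(search_interval(100, 1))      # 84, 85, ..., 115
```

A narrower interval for very voiced frames costs one bit less per sub-frame for the differential delay index.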

Once the degree of voicing MV of a frame has been determined by the module 36, the module 30 performs the quantization of the LSP parameters which have previously been determined for this frame. This quantization is for example a vector quantization, that is to say it consists in selecting, in one or more predetermined quantization tables, a set of quantized parameters $LSP_Q$ which has a minimum distance from the set of LSP parameters provided by the module 28. In known manner, the quantization tables differ according to the degree of voicing MV provided to the quantization module 30 by the open-loop analyzer 36. A set of quantization tables for a degree of voicing MV is determined, during prior tests, so as to be statistically representative of frames having this degree MV. These sets are stored both in the coders and in the decoders implementing the invention. The module 30 delivers the set of quantized parameters $LSP_Q$ as well as its index Q in the applicable quantization tables.

The speech coder 16 further comprises a module 42 for calculating the impulse response of the filter composed of the short-term synthesis filter and the perceptual weighting filter. This compound filter has the transfer function $W(z)/A(z)$. For the calculation of its impulse response $h = (h(0), h(1), ..., h(lst-1))$ over the duration of a sub-frame, the module 42 takes for the perceptual weighting filter W(z) the one corresponding to the interpolated but non-quantized LSP parameters, that is to say the one whose coefficients were calculated by the module 32, and for the synthesis filter $1/A(z)$ the one corresponding to the quantized and interpolated LSP parameters, that is to say the one that will actually be reconstructed by the decoder.
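The impulse response of the compound filter can be sketched by direct recursion on its difference equation. This is only an illustration, not the process of Figure 7: it assumes each polynomial is a list of coefficients in increasing powers of z^-1 with leading coefficient 1, and the names `a_unq` (non-quantized A) and `a_q` (quantized A) are hypothetical.

```python
def poly_mul(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def expand(poly, gamma):
    # Coefficients of A(z/gamma).
    return [c * gamma ** i for i, c in enumerate(poly)]

def impulse_response(num, den_factors, length):
    # First `length` samples of the impulse response of num(z) / product(den_factors).
    den = [1.0]
    for factor in den_factors:
        den = poly_mul(den, factor)
    h = []
    for n in range(length):
        acc = num[n] if n < len(num) else 0.0
        for k in range(1, min(n, len(den) - 1) + 1):
            acc -= den[k] * h[n - k]
        h.append(acc / den[0])
    return h

def weighted_synthesis_response(a_unq, a_q, lst=40, gamma1=0.9, gamma2=0.6):
    # W(z)/A(z) = A_unq(z/gamma1) / (A_unq(z/gamma2) * A_q(z))
    return impulse_response(expand(a_unq, gamma1),
                            [expand(a_unq, gamma2), a_q], lst)
```

Mixing the non-quantized weighting filter with the quantized synthesis filter, as the text specifies, keeps the search target matched to what the decoder will actually synthesize.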

In the second stage of determining the long-term prediction delay TP, the closed-loop LTP analysis module 38 determines the delay TP for each sub-frame of the voiced frames (MV = 1, 2 or 3). This delay TP is characterized by a differential value DP in the domain of the quantization indexes, coded on 5 bits if MV = 1 or 2 (N1 = 32), and on 4 bits if MV = 3 (N3 = 16). The index of the delay TP is ZP + DP. As is known, closed-loop LTP analysis consists in determining, in the search interval for long-term prediction delays T, the delay TP which maximizes, for each sub-frame of a voiced frame, the normalized correlation:

where x (i) denotes the weighted speech signal SW of the subframe from which the memory of the weighted synthesis filter has been subtracted (i.e. the response to a zero signal, due to its initial states, of the filter whose impulse response was calculated by module 42), and y _T (i) denotes the convolution product:

u(j-T) designating the predictable component of the excitation sequence delayed by T samples, estimated by the well-known adaptive codebook technique. For delays T shorter than the length of a sub-frame, the missing values of u(j-T) can be extrapolated from the previous values. Fractional delays are taken into account by oversampling the signal u(j-T) in the adaptive codebook. Oversampling by a factor m is obtained by means of polyphase interpolating filters.
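The closed-loop search can be sketched for integer delays as follows. The formulas for the normalized correlation and for y_T(i) are not reproduced in this text, so the sketch assumes the commonly used forms: y_T is the past excitation delayed by T and convolved with h, and the criterion is (sum of x(i).y_T(i))² divided by the energy of y_T. The periodic repetition used when T is shorter than the sub-frame is one possible extrapolation, also an assumption; fractional delays are not handled.

```python
def delayed_excitation(past, T, lst):
    # u(j - T) for j = 0..lst-1, where past[-1] is the most recent past sample.
    out = []
    for j in range(lst):
        idx = j - T
        while idx >= 0:          # T < lst: repeat the last T samples (assumed extrapolation)
            idx -= T
        out.append(past[idx])
    return out

def closed_loop_delay(x, past, h, delays):
    # Return the integer delay maximizing the assumed normalized correlation.
    best_T, best_val = None, -1.0
    for T in delays:
        u = delayed_excitation(past, T, len(x))
        y = [sum(u[j] * h[i - j] for j in range(i + 1) if i - j < len(h))
             for i in range(len(x))]
        energy = sum(v * v for v in y)
        if energy <= 0.0:
            continue
        corr = sum(a * b for a, b in zip(x, y))
        val = corr * corr / energy
        if val > best_val:
            best_T, best_val = T, val
    return best_T
```

In the coder described here, `delays` would be the quantized delays whose indexes lie in the search interval around ZP, and `past` must hold at least as many samples as the largest delay tested.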

The gain g _P of long-term prediction could be determined by the module 38 for each sub-frame, by applying the known formula:

However, in a preferred version of the invention, the gain g _P is calculated by the stochastic analysis module 40.

The stochastic excitation determined for each sub-frame by the module 40 is of the multi-pulse type. An innovation sequence of lst samples comprises np pulses of positions p(n) and of amplitudes g(n). In other words, the pulses have an amplitude of 1 and are associated with respective gains g(n). Since the LTP delay is not determined for the sub-frames of the unvoiced frames, a higher number of pulses can be taken for the stochastic excitation relating to these sub-frames, for example np = 5 if MV = 1, 2 or 3 and np = 6 if MV = 0. The positions and gains calculated by the stochastic analysis module 40 are quantized by a module 44.

A bit scheduling module 46 receives the various parameters which will be useful to the decoder, and constitutes the binary sequence transmitted to the channel coder 22. These parameters are:

- the Q index of the LSP parameters quantized for each frame;

- the degree MV of voicing of each frame;

- the ZP index of the center of the LTP delay search interval for each voiced frame;

- the differential index DP of the delay LTP for each sub-frame of a voiced frame, and the associated gain g _P ;

- the positions p (n) and the gains g (n) of the pulses of the stochastic excitation for each sub-frame.

Some of these parameters may have a particular importance for the quality of speech reproduction or a particular sensitivity to transmission errors. A module 48 is thus provided in the coder, which receives the various parameters and adds to some of them redundancy bits making it possible to detect and/or correct any transmission errors. For example, the degree of voicing MV, coded on two bits, being a critical parameter, we want it to reach the decoder with as few errors as possible. For this reason, redundancy bits are added to this parameter by the module 48. It is for example possible to add a parity bit to the two bits coding MV and to repeat once the three bits thus obtained. This example of redundancy makes it possible to detect all the single or double errors and to correct all the single errors and 75% of the double errors.
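The redundancy example for MV can be tried out with a small sketch. It assumes an even parity bit and a nearest-codeword decoder, neither of which is specified in the text, and it only illustrates the detection of single and double errors and the correction of single errors; the strategy by which 75% of double errors are corrected is not described here and is not modelled.

```python
from itertools import combinations

def protect_mv(mv):
    # Two MV bits plus an (assumed even) parity bit, the three bits sent twice.
    a, b = (mv >> 1) & 1, mv & 1
    word = [a, b, a ^ b]
    return word + word

CODEBOOK = {mv: protect_mv(mv) for mv in range(4)}

def decode_mv(received):
    # Nearest-codeword decoding; distance 2 or more is reported as a detected error.
    dist, mv = min(
        (sum(r != c for r, c in zip(received, cw)), mv)
        for mv, cw in CODEBOOK.items()
    )
    if dist == 0:
        return mv, "ok"
    if dist == 1:
        return mv, "corrected"
    return None, "detected"

def flip(word, positions):
    return [bit ^ 1 if i in positions else bit for i, bit in enumerate(word)]

# Exhaustive check over all single and double error patterns.
single_ok = all(decode_mv(flip(cw, {i})) == (mv, "corrected")
                for mv, cw in CODEBOOK.items() for i in range(6))
double_detected = all(decode_mv(flip(cw, set(pair)))[1] == "detected"
                      for cw in CODEBOOK.values()
                      for pair in combinations(range(6), 2))
```

The four codewords are at Hamming distance 4 from one another, which is why one error can be corrected and two errors can never be mistaken for a valid or near-valid word.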

The allocation of the bit rate per 20 ms frame is for example that indicated in Table I.

In the example considered here, the channel coder 22 is that used in the pan-European mobile radiocommunication system (GSM). This channel coder, described in detail in Recommendation GSM 05.03, was developed for a 13 kbit/s speech coder of RPE-LTP type which also produces 260 bits per 20 ms frame. The sensitivity of each of the 260 bits was determined from listening tests. The bits from the source coder have been grouped into three categories. The first of these categories (IA) groups 50 bits which are coded convolutionally on the basis of a generator polynomial giving a redundancy of one half with a constraint length equal to 5. Three parity bits are calculated and added to the 50 bits of category IA before convolutional coding. The second category (IB) has 132 bits which are protected at a rate of one half by the same polynomial as the previous category. The third category (II) contains 78 unprotected bits. After application of the convolutional code, the bits (456 per frame) are subjected to interleaving. The scheduling module 46 of the new source coder implementing the invention distributes the bits in the three categories according to the subjective importance of these bits.
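The figures quoted for the GSM channel coder can be cross-checked with a few lines of arithmetic. The 4 tail bits are not mentioned in the text; they are assumed here from the constraint length of 5 (one fewer than the constraint length, to flush the encoder), which is what makes the total come out at 456.

```python
CLASS_IA, PARITY, CLASS_IB, CLASS_II = 50, 3, 132, 78
TAIL_BITS = 4              # assumption: constraint length 5 needs 4 flushing bits
FRAME_SECONDS = 0.020

source_bits = CLASS_IA + CLASS_IB + CLASS_II                     # 260 bits per frame
source_rate = source_bits / FRAME_SECONDS                        # 13 kbit/s
protected = CLASS_IA + PARITY + CLASS_IB + TAIL_BITS             # 189 bits into the rate-1/2 code
coded_bits = 2 * protected + CLASS_II                            # 378 + 78 = 456 bits per frame
channel_rate = coded_bits / FRAME_SECONDS                        # 22.8 kbit/s
```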

A mobile radio station capable of receiving the speech signal processed by the source coder 16 is shown diagrammatically in FIG. 2. The received radio signal is first processed by a demodulator 50, then by a channel decoder 52, which perform the operations dual to those of the modulator 24 and of the channel coder 22. The channel decoder 52 supplies the speech decoder 54 with a binary sequence which, in the absence of transmission errors or when the possible errors have been corrected by the channel decoder 52, corresponds to the binary sequence delivered by the scheduling module 46 at the coder 16. The decoder 54 comprises a module 56 which receives this binary sequence and which identifies the parameters relating to the different frames and sub-frames. The module 56 also performs some checks on the parameters received. In particular, the module 56 examines the redundancy bits introduced by the module 48 of the coder, to detect and/or correct the errors affecting the parameters associated with these redundancy bits.

For each speech frame to be synthesized, a module 58 of the decoder receives the degree of voicing MV and the quantization index Q of the LSP parameters. The module 58 finds the quantized LSP parameters in the tables corresponding to the value of MV and, after interpolation, converts them into coefficients $a_i$ for the short-term synthesis filter 60. For each speech sub-frame to be synthesized, a pulse generator 62 receives the positions p(n) of the np pulses of the stochastic excitation. The generator 62 delivers pulses of unit amplitude which are each multiplied, at the amplifier 64, by the associated gain g(n). The output of the amplifier 64 is addressed to the long-term synthesis filter 66. This filter 66 has an adaptive codebook structure. The output samples u of the filter 66 are stored in the adaptive codebook 68 so as to be available for the subsequent sub-frames. The delay TP relating to a sub-frame, calculated from the quantization indexes ZP and DP, is supplied to the adaptive codebook 68 to produce the suitably delayed signal u. The amplifier 70 multiplies the signal thus delayed by the long-term prediction gain $g_P$. The long-term filter 66 finally comprises an adder 72 which adds the outputs of the amplifiers 64 and 70 to provide the excitation sequence u. When the LTP analysis has not been carried out at the coder, for example if MV = 0, a zero prediction gain $g_P$ is imposed on the amplifier 70 for the corresponding sub-frames. The excitation sequence is addressed to the short-term synthesis filter 60, and the resulting signal can also, in known manner, be subjected to a post-filter 74 whose coefficients depend on the synthesis parameters received, to form the synthetic speech signal S'. The output signal S' of the decoder 54 is then converted to analog by the converter 76 before being amplified to drive a loudspeaker 78.
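The excitation path of the decoder (pulse generator 62, amplifier 64, adaptive codebook 68, amplifier 70 and adder 72) can be sketched for one sub-frame as below. This is a simplified illustration restricted to integer delays TP; the function name and the list-based history are assumptions, and `past` must contain at least TP samples.

```python
def decode_excitation(positions, gains, past, tp, g_p, lst=40):
    """Return the lst excitation samples u of one sub-frame.

    positions, gains: pulse positions p(n) and gains g(n) of the stochastic excitation.
    past: previously decoded excitation samples (most recent last).
    tp, g_p: LTP delay and gain; g_p = 0 is imposed when MV = 0.
    """
    innovation = [0.0] * lst
    for p, g in zip(positions, gains):
        innovation[p] += g                      # unit pulse scaled by its gain
    buf = list(past)                            # adaptive codebook contents
    start = len(buf)
    for j in range(lst):
        delayed = buf[start + j - tp] if g_p else 0.0
        buf.append(innovation[j] + g_p * delayed)   # adder: u(j) = innovation(j) + g_P.u(j - TP)
    return buf[start:]

u = decode_excitation([0], [1.0], [0.0] * 4, tp=2, g_p=0.5, lst=6)
```

Appending each new sample to the history before computing the next one is what lets a delay shorter than the sub-frame reuse samples produced within the same sub-frame, as the recursion 1/B(z) requires.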

We will now describe, with reference to FIGS. 3 to 6, the LTP open loop analysis process implemented by the module 36 of the coder according to a first aspect of the invention.

In a first step 90, the module 36 calculates and stores, for each sub-frame st = 0, 1, ..., nst-1 of the current frame, the autocorrelations $C_{st}(k)$ and the delayed energies $G_{st}(k)$ of the weighted speech signal SW for the integer delays k between rmin and rmax:

The energies per sub-frame R0 _st are also calculated:

In step 90, the module 36 also determines, for each sub-frame st, the integer delay $K_{st}$ which maximizes the open-loop estimate $P_{st}(k)$ of the long-term prediction gain over the sub-frame st, excluding the delays k for which the autocorrelation $C_{st}(k)$ is negative or smaller than a small fraction ε of the energy $R0_{st}$ of the sub-frame. The estimate $P_{st}(k)$, expressed in decibels, is written:

Maximizing $P_{st}(k)$ therefore amounts to maximizing the expression $X_{st}(k) = C_{st}^2(k) / G_{st}(k)$, as shown in FIG. 6. The integer delay $K_{st}$ is the basic delay in integer resolution for the sub-frame st. Step 90 is followed by a comparison 92 between a first open-loop estimate of the overall prediction gain over the current frame and a predetermined threshold S0 typically between 1 and 2 decibels (for example S0 = 1.5 dB). The first estimate of the overall prediction gain is equal to:

where R0 is the total energy of the frame ($R0 = R0_0 + R0_1 + ... + R0_{nst-1}$), and $X_{st}(K_{st}) = C_{st}^2(K_{st}) / G_{st}(K_{st})$ designates the maximum determined in step 90 for the sub-frame st. As shown in FIG. 6, the comparison 92 can be performed without having to calculate the logarithm.
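Step 90 and comparison 92 can be sketched as follows. The defining equations are not reproduced in this text, so the sketch relies on assumed standard forms: C_st(k) as the sum of SW(i).SW(i-k) over the sub-frame, G_st(k) as the sum of SW(i-k)², R0_st as the sub-frame energy, and an overall gain estimate of 10.log10(R0 / (R0 - sum of X_st(K_st))). Under that last assumption, comparing the gain with S0 reduces to a comparison without logarithm. The value of `eps` and the function names are hypothetical.

```python
def open_loop_stats(sw, start, lst, rmin, rmax, eps=0.01):
    # Return (K_st, X_st(K_st), R0_st) for the sub-frame sw[start:start+lst].
    # Requires start >= rmax so that the delayed samples exist.
    frame = range(start, start + lst)
    r0 = sum(sw[i] ** 2 for i in frame)
    best_k, best_x = None, 0.0
    for k in range(rmin, rmax + 1):
        c = sum(sw[i] * sw[i - k] for i in frame)      # assumed form of C_st(k)
        g = sum(sw[i - k] ** 2 for i in frame)         # assumed form of G_st(k)
        if c < eps * r0 or g <= 0:                     # delays excluded in step 90
            continue
        x = c * c / g
        if best_k is None or x > best_x:
            best_k, best_x = k, x
    return best_k, best_x, r0

def frame_is_voiced(stats, s0_db=1.5):
    # Comparison 92 without logarithm (assumed gain formula):
    # 10.log10(R0 / (R0 - sum X)) >= S0  <=>  sum X >= R0 * (1 - 10**(-S0/10))
    r0 = sum(s[2] for s in stats)
    sum_x = sum(s[1] for s in stats)
    return sum_x >= r0 * (1.0 - 10.0 ** (-s0_db / 10.0))
```

The factor 1 - 10^(-S0/10) depends only on the threshold, so it can be precomputed once.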

If the comparison 92 shows a first estimate of the prediction gain below the threshold S0, it is considered that the speech signal contains too few long-term correlations to be considered voiced, and the degree of voicing MV of the current frame is set to 0 in step 94, which in this case ends the operations performed by the module 36 on this frame. If on the contrary the threshold S0 is exceeded in step 92, the current frame is detected as voiced and the degree MV will be equal to 1, 2 or 3. The module 36 then calculates, for each sub-frame st, a list $I_{st}$ containing candidate delays for the center ZP of the search interval for long-term prediction delays.

The operations performed by the module 36 for each sub-frame st (st initialized to 0 at step 96) of a voiced frame begin with the determination 98 of a selection threshold $SE_{st}$ in decibels, equal to a determined fraction β of the estimate $P_{st}(K_{st})$ of the prediction gain in decibels over the sub-frame, maximized in step 90 (β = 0.75 typically). For each sub-frame st of a voiced frame, the module 36 determines the basic delay rbf used for the rest of the processing. This basic delay could be taken equal to the integer $K_{st}$ obtained in step 90. Finding the basic delay in fractional resolution around $K_{st}$ however makes it possible to gain in precision. Step 100 thus consists in finding, around the integer delay $K_{st}$ obtained in step 90, the fractional delay which maximizes the expression $C_{st}^2 / G_{st}$. This search can be carried out at the maximum resolution of the fractional delays (1/6 in the example described here) even if the integer delay $K_{st}$ is not in the domain where this maximum resolution applies. We determine for example the number $\Delta_{st}$ which maximizes $C_{st}^2(K_{st} + \delta/6) / G_{st}(K_{st} + \delta/6)$ for $-6 < \delta < +6$, then the basic delay rbf in maximum resolution is taken equal to $K_{st} + \Delta_{st}/6$. For the fractional values T of the delay, the autocorrelations $C_{st}(T)$ and the delayed energies $G_{st}(T)$ are obtained by interpolation from the values stored in step 90 for the integer delays. Of course, the basic delay relating to a sub-frame could also be determined in fractional resolution from step 90 and taken into account in the first estimate of the overall prediction gain over the frame.

Once the basic delay rbf has been determined for a sub-frame, an examination 101 of the sub-multiples of this delay is carried out in order to select those for which the prediction gain is relatively large (FIG. 4), followed by an examination of the multiples of the smallest sub-multiple selected (FIG. 5). In step 102, the address j in the list $I_{st}$ and the index m of the sub-multiple are initialized to 0 and 1, respectively. A comparison 104 is made between the sub-multiple rbf/m and the minimum delay rmin. The sub-multiple rbf/m is to be examined if it is not smaller than rmin. We then take for the integer i the value of the index of the quantized delay $r_i$ closest to rbf/m (step 106), then we compare, at 108, the estimated value of the prediction gain $P_{st}(r_i)$ associated with the quantized delay $r_i$ for the sub-frame considered with the selection threshold $SE_{st}$ calculated in step 98:

with, for the fractional delays, an interpolation of the values $C_{st}$ and $G_{st}$ calculated in step 90 for the integer delays. If $P_{st}(r_i) < SE_{st}$, the delay $r_i$ is not taken into account, and we go directly to step 110 of incrementing the index m before carrying out the comparison 104 again for the next sub-multiple. If test 108 shows that $P_{st}(r_i) \ge SE_{st}$, the delay $r_i$ is retained and step 112 is executed before incrementing the index m in step 110. In step 112, the index i is stored at address j in the list $I_{st}$, the value m is given to the integer m0 intended to be equal to the index of the smallest sub-multiple retained, then the address j is incremented by one.

The examination of the sub-multiples of the basic delay is finished when the comparison 104 shows rbf/m < rmin. We then examine the delays that are multiples of the smallest sub-multiple rbf/m0 previously selected, according to the process illustrated in FIG. 5. This examination begins with an initialization 114 of the index n of the multiple: n = 2. A comparison 116 is made between the multiple n.rbf/m0 and the maximum delay rmax. If n.rbf/m0 ≤ rmax, test 118 is carried out to determine whether the index m0 of the smallest sub-multiple is an integer multiple of n. If so, the delay n.rbf/m0 has already been examined when examining the sub-multiples of rbf, and we go directly to step 120 of incrementing the index n before carrying out the comparison 116 again for the next multiple. If test 118 shows that m0 is not an integer multiple of n, the multiple n.rbf/m0 is to be examined. We then take for the integer i the value of the index of the quantized delay $r_i$ closest to n.rbf/m0 (step 122), then we compare, at 124, the estimated value of the prediction gain $P_{st}(r_i)$ with the selection threshold $SE_{st}$. If $P_{st}(r_i) < SE_{st}$, the delay $r_i$ is not taken into account, and we go directly to step 120 of incrementing the index n. If test 124 shows that $P_{st}(r_i) \ge SE_{st}$, the delay $r_i$ is retained and step 126 is executed before incrementing the index n in step 120. In step 126, the index i is stored at address j in the list $I_{st}$, then the address j is incremented by one.

The examination of the multiples of the smallest submultiple is finished when the comparison 116 shows that n·rbf/m0 > rmax. At this time, the list I_st contains j candidate delay indexes. If we wish to limit the maximum length of the list I_st to jmax for the following steps, we can take the length j_st of this list equal to min(j, jmax) (step 128) and then, in step 130, sort the list I_st in order of decreasing gains C_st²(r_{I_st(j)}) / G_st(r_{I_st(j)}) for 0 ≤ j < j_st, so as to keep only the j_st delays providing the largest gain values. The value of jmax is chosen according to the compromise sought between the efficiency of the search for LTP delays and the complexity of this search. Typical values of jmax range from 3 to 5.

Once the submultiples and the multiples have been examined and the list I_st has thus been obtained (FIG. 3), the analysis module 36 calculates a quantity Ymax determining a second open-loop estimate of the long-term prediction gain over the entire frame, as well as indexes ZP, ZP0 and ZP1, in a phase 132, the progress of which is detailed in FIG. 6. This phase 132 consists in testing search intervals of length N1 to determine which one maximizes a second estimate of the overall prediction gain over the frame. The intervals tested are those centered on the candidate delays contained in the list I_st calculated during phase 101.

Phase 132 begins with a step 136 where the address j in the list I_st is initialized to 0. At step 138, we check whether the index I_st(j) has already been encountered when testing a previous interval centered on I_st'(j') with st' < st and 0 ≤ j' < j_st', in order to avoid testing the same interval twice. If test 138 reveals that I_st(j) already appeared in a list I_st' with st' < st, the address j is directly incremented in step 140, then it is compared with the length j_st of the list I_st (comparison 142). If the comparison 142 shows that j < j_st, we return to step 138 for the new value of the address j. When the comparison 142 shows that j = j_st, all the intervals relating to the list I_st have been tested, and phase 132 is terminated.

When test 138 is negative, the interval centered on I_st(j) is tested, starting with step 148, where we determine, for each subframe st', the index i_st' of the optimal delay which maximizes over this interval the open-loop estimate P_st'(r_i) of the long-term prediction gain, that is to say which maximizes the quantity Y_st'(i) = C_st'²(r_i) / G_st'(r_i), where r_i denotes the quantized delay of index i, for I_st(j) − N1/2 ≤ i < I_st(j) + N1/2 and 0 ≤ i < N.

During the maximization 148 relating to a subframe st', the indexes i for which the autocorrelation C_st'(r_i) is negative are discarded a priori, in order to avoid degrading the coding. If it turns out that all the values of i included in the tested interval [I_st(j) − N1/2, I_st(j) + N1/2) give rise to negative autocorrelations C_st'(r_i), the index i_st' for which this autocorrelation is the smallest in absolute value is selected. Then, in 150, the quantity Y determining the second estimate of the overall prediction gain for the interval centered on I_st(j) is calculated according to:

Y = Σ_{st'=0}^{nst−1} Y_st'(i_st')

then compared with Ymax, where Ymax represents the value to be maximized. This value Ymax is for example initialized to 0 at the same time as the index st in step 96. If Y ≤ Ymax, we go directly to step 140 for incrementing the address j. If the comparison 150 shows that Y > Ymax, step 152 is executed before incrementing the address j in step 140. At this step 152, the index ZP is taken equal to I_st(j) and the indexes ZP0 and ZP1 are respectively taken equal to the smallest and the largest of the indexes i_st' determined in step 148.

At the end of phase 132 relating to a subframe st, the index st is incremented by one (step 154) then compared, in step 156, with the number nst of subframes per frame. If st < nst, we return to step 98 to carry out the operations relating to the next subframe. When the comparison 156 shows that st = nst, the index ZP designates the center of the search interval which will be supplied to the closed-loop LTP analysis module 38, and ZP0 and ZP1 are indexes whose difference is representative of the dispersion of the optimal delays per subframe in the interval centered on ZP.

In step 158, the module 36 determines the degree of voicing MV on the basis of the second open-loop estimate of the gain expressed in decibels: Gp = 20·log₁₀[R0 / (R0 − Ymax)]. Two other thresholds S1 and S2 are used. If Gp ≤ S1, the degree of voicing MV is taken equal to 1 for the current frame. The threshold S1 is typically between 3 and 5 dB; for example S1 = 4 dB. If S1 < Gp < S2, the degree of voicing MV is taken equal to 2 for the current frame. The threshold S2 is typically between 5 and 8 dB; for example S2 = 7 dB. If Gp > S2, the dispersion of the optimal delays for the different subframes of the current frame is examined. If ZP1 − ZP < N3/2 and ZP − ZP0 ≤ N3/2, an interval of length N3 centered on ZP is sufficient to take into account all the optimal delays, and the degree of voicing is taken equal to 3 (if Gp > S2). Otherwise, if ZP1 − ZP ≥ N3/2 or ZP − ZP0 > N3/2, the degree of voicing is taken equal to 2 (if Gp > S2).
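The decision of step 158 can be sketched as follows. This is only an illustration under stated assumptions: the function name and argument order are hypothetical, the frame is assumed to have already been detected as voiced with 0 ≤ Ymax < R0, and the case Gp = S2, which the text leaves open, is arbitrarily assigned to MV = 2.

```python
import math

def voicing_degree(R0, Ymax, ZP, ZP0, ZP1, S1=4.0, S2=7.0, N3=16):
    """Degree of voicing MV (1, 2 or 3) for a frame already detected as voiced.

    Hypothetical helper: thresholds default to the example values of the text
    (S1 = 4 dB, S2 = 7 dB, N3 = 16). Assumes 0 <= Ymax < R0.
    """
    # Second open-loop estimate of the long-term prediction gain, in dB.
    Gp = 20.0 * math.log10(R0 / (R0 - Ymax))
    if Gp <= S1:
        return 1
    if Gp <= S2:          # assumption: Gp == S2 is treated as MV = 2
        return 2
    # Gp > S2: check that all per-subframe optimal delays fit in an
    # interval of length N3 centered on ZP.
    if ZP1 - ZP < N3 / 2 and ZP - ZP0 <= N3 / 2:
        return 3
    return 2
```

For instance, with R0 = 100 and Ymax = 90 the gain estimate is 20 dB, so the result depends only on the spread of ZP0 and ZP1 around ZP.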

The index ZP of the center of the search interval of the prediction delay for a voiced frame can lie between 0 and N − 1 = 255, and the differential index DP determined by the module 38 can range from −16 to +15 if MV = 1 or 2, and from −8 to +7 if MV = 3 (case N1 = 32, N3 = 16). The index ZP + DP of the delay TP finally determined can therefore in some cases be smaller than 0 or greater than 255. This allows the closed-loop LTP analysis to also cover some delays TP smaller than rmin or larger than rmax. This improves the subjective quality of the reproduction of so-called pathological voices and of non-voice signals (DTMF tones or signaling frequencies used by the switched telephone network). Another possibility is to take for the search interval the first or last 32 quantization indexes of the delays if ZP < 16 or ZP > 240 with MV = 1 or 2, and the first or last 16 indexes if ZP < 8 or ZP > 248 with MV = 3.

Reducing the delay search interval for strongly voiced frames (typically 16 values for MV = 3 instead of 32 for MV = 1 or 2) reduces the complexity of the closed-loop LTP analysis performed by the module 38 by reducing the number of convolutions y_T(i) to be calculated according to formula (1). Another advantage is that one coding bit of the differential index DP is saved. Since the output rate is constant, this bit can be reallocated to the coding of other parameters. In particular, this additional bit can be allocated to the quantization of the long-term prediction gain g_P calculated by the module 40. Better precision on the gain g_P thanks to an additional quantization bit is worthwhile because this parameter is perceptually important for strongly voiced subframes (MV = 3). Another possibility is to provide a parity bit for the delay TP and/or the gain g_P, making it possible to detect possible errors affecting these parameters.

It is possible to make some modifications to the open loop LTP analysis process described above with reference to Figures 3 to 6.

According to a first variant of this process, the first optimizations carried out in step 90 for the different subframes are replaced by a single optimization relating to the entire frame. In addition to the parameters C_st(k) and G_st(k) calculated for each subframe st, the autocorrelations C(k) and the delayed energies G(k) for the entire frame are also calculated:

C(k) = Σ_{st=0}^{nst−1} C_st(k),   G(k) = Σ_{st=0}^{nst−1} G_st(k)

We then determine the basic delay in integer resolution K which maximizes X(k) = C²(k) / G(k) for rmin ≤ k ≤ rmax.

The first gain estimate compared with S0 in step 92 is then P(K) = 20·log₁₀[R0 / (R0 − X(K))]. We then determine, around K, a single basic delay in fractional resolution rbf, and the examination 101 of the submultiples and the multiples is done once and produces a single list I instead of nst lists I_st. Phase 132 is then performed only once for this list I, distinguishing the subframes only in steps 148, 150 and 152. This variant embodiment has the advantage of reducing the complexity of the open-loop analysis.

According to a second variant of the open-loop LTP analysis process, the domain [rmin, rmax] of possible delays is subdivided into nz sub-intervals having for example the same length (nz = 3 typically), and the first optimizations carried out in step 90 for the different subframes are replaced by nz optimizations in the different sub-intervals, each relating to the entire frame. We thus obtain nz basic delays K_1', ..., K_nz' in integer resolution. The voiced/unvoiced decision (step 92) is taken on the basis of that one of the basic delays K_i' which provides the greatest value of the first open-loop estimate of the long-term prediction gain. Then, if the frame is voiced, the basic delays in fractional resolution are determined by the same process as in step 100, but allowing only the quantized delay values. The examination 101 of the submultiples and multiples is not carried out. For the phase 132 of calculating the second estimate of the prediction gain, the nz basic delays previously determined are taken as candidate delays. This second variant makes it possible to dispense with the systematic examination of the submultiples and multiples, which are generally taken into account by virtue of the subdivision of the domain of possible delays.

According to a third variant of the open-loop LTP analysis process, phase 132 is modified in that, in the optimization steps 148, we determine on the one hand the index i_st' which maximizes C_st'²(r_i) / G_st'(r_i) for I_st(j) − N1/2 ≤ i < I_st(j) + N1/2 and 0 ≤ i < N, and on the other hand, during the same maximization loop, the index k_st' which maximizes this same quantity over a reduced interval I_st(j) − N3/2 ≤ i < I_st(j) + N3/2 and 0 ≤ i < N. Step 152 is also modified: the indexes ZP0 and ZP1 are no longer stored, but rather a quantity Ymax' defined in the same way as Ymax but with reference to the reduced-length interval:

Ymax' = Σ_{st'=0}^{nst−1} Y_st'(k_st')

In this third variant, the determination 158 of the voicing mode leads to selecting the degree of voicing MV = 3 more often. In addition to the gain Gp previously described, we also take into account a third open-loop estimate of the LTP gain, corresponding to Ymax': Gp' = 20·log₁₀[R0 / (R0 − Ymax')]. The degree of voicing is MV = 1 if Gp ≤ S1, MV = 3 if Gp' > S2, and MV = 2 if neither of these two conditions is satisfied. By thus increasing the proportion of frames of degree MV = 3, the average complexity of the closed-loop analysis is reduced and the robustness to transmission errors is improved.

A fourth variant of the open-loop LTP analysis process mainly concerns weakly voiced frames (MV = 1). These frames often correspond to the start or the end of a voiced region. Frequently, these frames comprise from one to three subframes for which the gain coefficient of the long-term synthesis filter is zero or even negative. It is proposed not to perform the closed-loop LTP analysis for the subframes in question, in order to reduce the average complexity of the coding. This can be achieved by storing, in step 152 of FIG. 6, nst pointers indicating for each subframe st' whether the autocorrelation C_st' corresponding to the delay of index i_st' is negative or very small. Once all the intervals referenced in the lists I_st have been tested, the subframes for which the prediction gain is negative or negligible can be identified by consulting the nst pointers. Where applicable, the module 38 is deactivated for the corresponding subframes. This does not affect the quality of the LTP analysis since the prediction gain corresponding to these subframes would be almost zero anyway.

Another aspect of the invention relates to the module 42 for calculating the impulse response of the weighted synthesis filter. The closed-loop LTP analysis module 38 needs this impulse response h over the duration of a subframe to calculate the convolutions y_T(i) according to formula (1). The stochastic analysis module 40 also needs it to calculate convolutions, as will be seen below. Having to calculate convolutions with a response h extending over the duration of a subframe (lst = 40 typically) implies a certain coding complexity, which it would be desirable to reduce, in particular to increase the autonomy of the mobile station. In some cases it has been proposed to truncate the impulse response to a length less than the length of a subframe (for example to 20 samples), but this can degrade the quality of the coding. It is proposed according to the invention to truncate the impulse response h taking into account on the one hand the energy distribution of this response and on the other hand the degree of voicing MV of the frame considered, determined by the open-loop LTP analysis module 36.

The operations performed by the module 42 are for example in accordance with the flowchart of FIG. 7. The impulse response is first calculated in step 160 over a length pst greater than the length of a subframe and sufficiently large to be sure of taking into account all the energy of the impulse response (for example pst = 60 for nst = 4 and lst = 40 if the short-term linear prediction is of order q = 10). In step 160, the truncated energies of the impulse response are also calculated: Eh(i) = Σ_{k=0}^{i} [h(k)]². The components h(i) of the impulse response and the truncated energies Eh(i) can be obtained by filtering a unit pulse by means of a filter of transfer function W(z)/A(z) with zero initial states, or by recurrence:

for 0 < i < pst, with f(i) = h(i) = 0 for i < 0, δ(0) = f(0) = h(0) = Eh(0) = 1, and δ(i) = 0 for i ≠ 0. In expression (2), the coefficients a_i are those used in the perceptual weighting filter, i.e. the linear prediction coefficients interpolated but not quantized, while in expression (3), the coefficients a_i are those applied to the synthesis filter, i.e. the quantized and interpolated linear prediction coefficients.

Then the module 42 determines the smallest length Lα such that the energy Eh(Lα−1) of the impulse response truncated to Lα samples is at least equal to a proportion α of its total energy Eh(pst−1) estimated over pst samples. A typical value of α is 98%. The number Lα is initialized to pst in step 162 and decremented by one at 166 as long as Eh(Lα−2) > α·Eh(pst−1) (test 164). The length Lα sought is obtained when test 164 shows that Eh(Lα−2) ≤ α·Eh(pst−1).

To take account of the degree of voicing MV, a corrective term Δ(MV) is added to the value of Lα which has been obtained (step 168). This corrective term is preferably an increasing function of the degree of voicing. We can for example take Δ(0) = −5, Δ(1) = 0, Δ(2) = +5 and Δ(3) = +7. In this way, the impulse response h is determined all the more precisely as the voicing of the speech is stronger. The truncation length Lh of the impulse response is taken equal to Lα if Lα ≤ pst and to pst otherwise. The remaining samples of the impulse response (h(i) with i ≥ Lh) can be set to zero.
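Steps 162 to 168 can be sketched as follows, as a minimal illustration only. The function name and the DELTA table are hypothetical; the guard that stops the loop at one sample and the lower clamp to 1 are assumptions added to keep the sketch safe for arbitrary inputs, and are not stated in the text.

```python
from itertools import accumulate

# Example corrective terms Δ(MV) quoted in the text.
DELTA = {0: -5, 1: 0, 2: 5, 3: 7}

def truncation_length(h, mv, alpha=0.98):
    """Truncation length Lh of an impulse response h computed over pst samples."""
    pst = len(h)
    # Truncated energies: Eh[i] = h[0]^2 + ... + h[i]^2.
    Eh = list(accumulate(x * x for x in h))
    # Steps 162-166: shrink L while the response truncated to L-1 samples
    # still holds more than a proportion alpha of the total energy.
    L = pst
    while L >= 2 and Eh[L - 2] > alpha * Eh[pst - 1]:
        L -= 1
    # Step 168: voicing-dependent correction, then clamp to [1, pst]
    # (the lower bound of 1 is an assumption of this sketch).
    L += DELTA[mv]
    return max(1, min(L, pst))
```

With a rapidly decaying response, the energy criterion alone gives a short length, and the correction Δ(MV) then lengthens it for strongly voiced frames.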

With the truncation of the impulse response, the calculation (1) of the convolutions y _T (i) by the module 38 of LTP analysis in closed loop is modified as follows:

Obtaining these convolutions, which represents a significant part of the calculations performed, therefore requires significantly fewer multiplications, additions and addressing operations in the adaptive codebook when the impulse response is truncated. The dynamic truncation of the impulse response involving the degree of voicing MV makes it possible to obtain such a reduction in complexity without affecting the quality of the coding. The same considerations apply to the convolution calculations performed by the stochastic analysis module 40. These advantages are particularly appreciable when the perceptual weighting filter has a transfer function of the form W(z) = A(z/γ₁) / A(z/γ₂) with 0 < γ₂ < γ₁ < 1, which gives rise to impulse responses generally longer than those of the form W(z) = A(z) / A(z/γ) more commonly used in analysis-by-synthesis coders.

A third aspect of the invention relates to the stochastic analysis module 40 used to model the unpredictable part of the excitation.

The stochastic excitation considered here is of the multi-pulse type. The stochastic excitation relating to a subframe is represented by np pulses of positions p(n) and of amplitudes, or gains, g(n) (1 ≤ n ≤ np). The gain g_P of long-term prediction can also be calculated during the same process. In general, it can be considered that the excitation sequence relating to a subframe comprises nc contributions associated respectively with nc gains. The contributions are vectors of lst samples which, weighted by the associated gains and summed, correspond to the excitation sequence of the short-term synthesis filter. One of the contributions can be predictable, or several in the case of a long-term synthesis filter with several taps ("multi-tap pitch synthesis filter"). The other contributions are in the present case np vectors comprising only zeros except for one pulse of amplitude 1. We therefore have nc = np if MV = 0, and nc = np + 1 if MV = 1, 2 or 3.

Multi-pulse analysis including the calculation of the gain g_P = g(0) consists, in a known manner, of finding for each subframe positions p(n) (1 ≤ n ≤ np) and gains g(n) (0 ≤ n ≤ np) which minimize the perceptually weighted quadratic error E between the speech signal and the synthesized signal, given by:

E = ‖X − Σ_{n=0}^{nc−1} g(n)·F_{p(n)}‖²

the gains being the solution of the linear system g·B = b.

In the above notations:

- X denotes an initial target vector composed of the lst samples of the weighted speech signal SW without memory: X = (x(0), x(1), ..., x(lst−1)), the x(i) having been calculated as indicated above during the closed-loop LTP analysis;

- g denotes the row vector composed of the np + 1 gains: g = (g(0) = g_P, g(1), ..., g(np));

- the row vectors F_{p(n)} (0 ≤ n < nc) are weighted contributions having as components i (0 ≤ i < lst) the convolution products between the contribution n to the excitation sequence and the impulse response h of the weighted synthesis filter;

- b denotes the row vector composed of the nc scalar products between the vector X and the row vectors F_{p(n)};

- B designates a symmetric matrix with nc rows and nc columns whose term B_{i,j} = F_{p(i)}·F_{p(j)}^T (0 ≤ i, j < nc) is equal to the scalar product between the vectors F_{p(i)} and F_{p(j)} previously defined;

- (.)^T denotes matrix transposition.

For the pulses of the stochastic excitation (1 ≤ n ≤ np = nc − 1), the vectors F_{p(n)} are simply constituted by the vector of the impulse response h shifted by p(n) samples. Truncating the impulse response as described above therefore makes it possible to significantly reduce the number of operations needed to calculate the scalar products involving these vectors F_{p(n)}. For the predictable contribution of the excitation, the vector F_{p(0)} = Y_TP has for components F_{p(0)}(i) (0 ≤ i < lst) the convolutions y_TP(i) that the module 38 has calculated according to formula (1) or (1') for the selected long-term prediction delay TP. If MV = 0, the contribution n = 0 is also of pulse type and the position p(0) is to be calculated.

Minimizing the quadratic error E defined above amounts to finding the set of positions p(n) which maximizes the normalized correlation b·B^-1·b^T, then calculating the gains according to g = b·B^-1.

But an exhaustive search of the pulse positions would require an excessive volume of calculations. To alleviate this problem, the multi-pulse approach generally applies a sub-optimal procedure consisting of successively calculating the gains and/or the pulse positions for each contribution. For each contribution n (0 ≤ n < nc), we first determine the position p(n) which maximizes the normalized correlation (F_p·e_{n−1}^T)² / (F_p·F_p^T), we recalculate the gains g_n(0) to g_n(n) according to g_n = b_n·B_n^-1, where g_n = (g_n(0), ..., g_n(n)), b_n = (b(0), ..., b(n)) and B_n = {B_{i,j}}_{0≤i,j≤n}, then we calculate for the next iteration the target vector e_n, equal to the initial target vector X from which we subtract the contributions 0 to n of the weighted synthetic signal multiplied by their respective gains:

e_n = X − Σ_{i=0}^{n} g_n(i)·F_{p(i)}

At the end of the last iteration nc−1, the gains g_{nc−1}(i) are the selected gains and the minimized quadratic error E is equal to the energy of the target vector e_{nc−1}.
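The sequential procedure described above can be sketched as follows for the case where all contributions are pulses (MV = 0). This is only an illustration under assumptions: the function names are hypothetical, NumPy's general linear solver stands in for the matrix inversion, and each position is simply barred from being selected twice.

```python
import numpy as np

def shifted_response(h, p, lst):
    """F_p: impulse response h delayed by p samples, limited to lst samples."""
    F = np.zeros(lst)
    n = min(len(h), lst - p)
    F[p:p + n] = h[:n]
    return F

def multipulse_search(X, h, n_pulses):
    """Greedy search: one position per iteration, all gains re-optimized."""
    lst = len(X)
    cands = [shifted_response(h, p, lst) for p in range(lst)]
    positions, gains, e = [], np.zeros(0), X.copy()
    for _ in range(n_pulses):
        # Normalized correlation (F_p.e^T)^2 / (F_p.F_p^T) for every position.
        scores = [(F @ e) ** 2 / (F @ F) for F in cands]
        for p in positions:
            scores[p] = -1.0              # do not reuse an occupied position
        positions.append(int(np.argmax(scores)))
        Fm = np.array([cands[p] for p in positions])  # rows F_p(0..n)
        B = Fm @ Fm.T                     # B_n, symmetric
        b = Fm @ X                        # b_n
        gains = np.linalg.solve(B, b)     # g_n.B_n = b_n
        e = X - gains @ Fm                # target vector e_n
    return positions, gains, e
```

When the target is an exact sum of two well-separated shifted responses, the search recovers both positions and gains and the final target vector vanishes.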

The above method gives satisfactory results, but it requires the inversion of a matrix B_n at each iteration. In their article "Amplitude Optimization and Pitch Prediction in Multipulse Coders" (IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. 37, No. 3, March 1989, pages 317-327), S. Singhal and B.S. Atal proposed to simplify the problem of inverting the matrices B_n using the Cholesky decomposition: B_n = M_n·M_n^T, where M_n is a lower triangular matrix. This decomposition is possible because B_n is a symmetric matrix with positive eigenvalues. The advantage of this approach is that the inversion of a triangular matrix is relatively simple, B_n^-1 being obtainable by B_n^-1 = (M_n^-1)^T·M_n^-1.

The Cholesky decomposition and the inversion of the matrix M_n nevertheless require divisions and square root calculations, which are demanding operations in terms of computational complexity. The invention proposes to considerably simplify the implementation of the optimization by modifying the decomposition of the matrices B_n as follows:

B_n = L_n·R_n^T = L_n·(L_n·K_n^-1)^T

where K_n is a diagonal matrix and L_n is a lower triangular matrix having only 1s on its main diagonal (i.e. L_n = M_n·K_n^1/2 with the previous notations).

Given the structure of the matrix B_n, the matrices L_n = R_n·K_n, R_n, K_n and L_n^-1 are each constructed by simply adding a row to the corresponding matrices of the previous iteration.

Under these conditions, the decomposition of B_n, the inversion of L_n, the obtaining of B_n^-1 = (L_n^-1)^T·K_n·L_n^-1 and the recalculation of the gains require only one division per iteration and no square root calculation.

The stochastic analysis relating to a subframe of a voiced frame (MV = 1, 2 or 3) can therefore take place as indicated in FIGS. 8 to 11. To calculate the long-term prediction gain, the contribution index n is initialized to 0 in step 180 and the vector F_{p(0)} is taken equal to the long-term contribution Y_TP provided by the module 38. If n > 0, the iteration n begins with the determination 182 of the position p(n) of the pulse n which maximizes the quantity:

(F_p·e^T)² / (F_p·F_p^T)

where e = (e(0), ..., e(lst−1)) is a target vector calculated during the previous iteration. Various constraints can be imposed on the domain of maximization of the above quantity, which is included in the interval [0, lst). The invention preferably uses a segmental search in which the excitation subframe is subdivided into ns segments of the same length (for example ns = 10 for lst = 40). For the first pulse (n = 1), the maximization of (F_p·e^T)² / (F_p·F_p^T) is performed over all the possible positions p in the subframe. At iteration n > 1, the maximization is carried out in step 182 over the set of possible positions excluding the segments in which the positions p(1), ..., p(n−1) of the pulses were respectively found during the previous iterations.
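The segmental restriction of step 182 can be sketched as follows. The function name and the matrix layout (row p of F holding the vector F_p) are assumptions made for the illustration; the skip of zero-energy candidates is a precaution of the sketch, not something stated in the text.

```python
import numpy as np

def segmental_argmax(e, F, ls, used_positions):
    """Position p maximizing (F_p.e^T)^2 / (F_p.F_p^T) outside occupied segments.

    F is assumed to be an array whose row p is F_p; ls is the segment length;
    used_positions holds the pulse positions found at previous iterations.
    """
    occupied = {p // ls for p in used_positions}   # segments already holding a pulse
    best_p, best_score = None, -1.0
    for p in range(F.shape[0]):
        if p // ls in occupied:
            continue                               # whole segment is excluded
        energy = F[p] @ F[p]
        if energy <= 0.0:
            continue                               # assumption: skip null vectors
        score = (F[p] @ e) ** 2 / energy
        if score > best_score:
            best_p, best_score = p, score
    return best_p
```

With an identity matrix standing in for F (a one-sample impulse response), the function simply picks the largest target sample that lies outside the segments already used.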

In the case where the current frame has been detected as unvoiced, the contribution n = 0 is also constituted by a pulse of position p(0). Step 180 then only includes the initialization n = 0, and it is followed by a maximization step identical to step 182 to find p(0), with e = e_{−1} = X as the initial value of the target vector.

Note that when the contribution n = 0 is predictable (MV = 1, 2 or 3), the closed-loop LTP analysis module 38 has performed an operation similar in nature to the maximization 182, since it has determined the long-term contribution, characterized by the delay TP, by maximizing the quantity (Y_T·e^T)² / (Y_T·Y_T^T) over the search interval of the delays T, with e = e_{−1} = X as the initial value of the target vector. It is also possible, when the energy of the LTP contribution is very low, to ignore this contribution in the process of recalculating the gains.

After step 180 or 182, the module 40 proceeds to the calculation 184 of the row n of the matrices L, R and K involved in the decomposition of the matrix B, which makes it possible to complete the matrices L_n, R_n and K_n defined above. The decomposition of the matrix B makes it possible to write:

B(n,j) = R(n,j) + Σ_{k=0}^{j−1} L(n,k)·R(j,k)

for the component located in row n and in column j. We can therefore write, for j increasing from 0 to n−1:

R(n,j) = B(n,j) − Σ_{k=0}^{j−1} L(n,k)·R(j,k),   L(n,j) = R(n,j)·K(j)

and, for j = n:

K(n) = 1/R(n,n) = 1 / [B(n,n) − Σ_{k=0}^{n−1} L(n,k)·R(n,k)],   L(n,n) = 1

These relationships are used in the calculation 184 detailed in FIG. 9. The column index j is first initialized to 0, in step 186. For the column index j, the variable tmp is first initialized to the value of the component B(n,j), that is:

tmp = F_{p(n)}·F_{p(j)}^T

In step 188, the integer k is also initialized to 0. A comparison 190 is then made between the integers k and j. If k < j, we subtract the term L(n,k)·R(j,k) from the variable tmp, then we increment the integer k by one (step 192) before performing the comparison 190 again. When the comparison 190 shows that k = j, a comparison 194 is carried out between the integers j and n. If j < n, the component R(n,j) is taken equal to tmp and the component L(n,j) to tmp·K(j) in step 196, then the column index j is incremented by one before returning to step 188 to calculate the following components. When the comparison 194 shows that j = n, the component K(n) of the row n of the matrix K is calculated, which ends the calculation 184 relating to the row n. K(n) is taken equal to 1/tmp if tmp ≠ 0 (step 198) and to 0 otherwise. We note that the calculation 184 requires at most one division 198, to obtain K(n). In addition, a possible singularity of the matrix B_n does not cause instabilities since divisions by 0 are avoided.
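The calculation 184 can be sketched as follows, as an illustration under assumptions: the function name is hypothetical, B is passed as a full matrix rather than computed from the vectors F_{p(n)}, and the diagonal entries R(n,n) and L(n,n) are stored explicitly only so that the identity B = L·R^T can be checked.

```python
import numpy as np

def add_row(B, L, R, K, n):
    """Fill row n of L, R and K in place, rows 0..n-1 being already known."""
    for j in range(n + 1):
        tmp = B[n, j]                      # B(n,j) = F_p(n).F_p(j)^T
        for k in range(j):
            tmp -= L[n, k] * R[j, k]       # steps 190-192
        if j < n:
            R[n, j] = tmp                  # step 196
            L[n, j] = tmp * K[j]
        else:
            R[n, n] = tmp                  # stored here for verification only
            L[n, n] = 1.0
            # Step 198: the single division of the iteration, guarded
            # against a singular B_n.
            K[n] = 1.0 / tmp if tmp != 0.0 else 0.0
```

Calling `add_row` for n = 0, 1, 2, ... on a symmetric positive definite matrix yields a unit lower triangular L and a lower triangular R such that L·R^T reproduces B and L = R·K.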

With reference to FIG. 8, the calculation 184 of the rows n of L, R and K is followed by the inversion 200 of the matrix L_n consisting of the rows and columns 0 to n of the matrix L. The fact that L is triangular with 1s on its main diagonal greatly simplifies the inversion, as FIG. 10 shows. We can indeed write:

L^-1(n,j') = −L(n,j') − Σ_{k'=j'+1}^{n−1} L(k',j')·L^-1(n,k')   (5)

for 0 ≤ j' < n and L^-1(n,n) = 1, that is to say that the inversion can be done without any division. In addition, as the components of the row n of L^-1 suffice to recalculate the gains, the use of the relation (5) makes it possible to perform the inversion without having to store the whole matrix L^-1, but only a vector Linv = (Linv(0), ..., Linv(n−1)) with Linv(j') = L^-1(n,j'). The inversion 200 then begins with an initialization 202 of the column index j' to n−1. In step 204, the term Linv(j') is initialized to −L(n,j') and the integer k' to j'+1. A comparison 206 is then made between the integers k' and n. If k' < n, we subtract the term L(k',j')·Linv(k') from Linv(j'), then we increment the integer k' by one (step 208) before performing the comparison 206 again. When the comparison 206 shows that k' = n, we compare j' with 0 (test 210). If j' > 0, we decrement the integer j' by one (step 212) and we return to step 204 to calculate the next component. The inversion 200 is complete when test 210 shows that j' = 0.
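The inversion 200 can be sketched as follows, as a small illustration of relation (5). The function name is hypothetical and L is assumed to be given as a list of rows of a unit lower triangular matrix.

```python
def inverse_row(L, n):
    """Row n of the inverse of a unit lower triangular matrix L, columns 0..n-1.

    Only multiplications and subtractions are needed, and only the row n
    of the inverse is produced (the diagonal term is 1 and is not stored).
    """
    Linv = [0.0] * n
    for j in range(n - 1, -1, -1):          # j' from n-1 down to 0
        acc = -L[n][j]                      # step 204
        for k in range(j + 1, n):           # steps 206-208
            acc -= L[k][j] * Linv[k]
        Linv[j] = acc
    return Linv
```

For the matrix with rows (1, 0, 0), (2, 1, 0) and (3, 4, 1), the last row of the inverse is (5, −4, 1), which can be checked by multiplying it with the columns of L.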

With reference to FIG. 8, the inversion 200 is followed by the calculation 214 of the reoptimized gains and of the target vector e for the following iteration. The computation of the reoptimized gains is also greatly simplified by the decomposition retained for the matrix B. We can indeed compute the vector g_n = (g_n(0), ..., g_n(n)), solution of g_n·B_n = b_n, according to:

g_n(n) = [b(n) + Σ_{i=0}^{n−1} b(i)·L^-1(n,i)]·K(n)

and g_n(i') = g_{n−1}(i') + L^-1(n,i')·g_n(n) for 0 ≤ i' < n. The calculation 214 is detailed in FIG. 11. The component b(n) of the vector b is first calculated:

b(n) = F_{p(n)}·X^T

b(n) serves as the initialization value for the variable tmq. In step 216, the index i is also initialized to 0. The comparison 218 is then carried out between the integers i and n. If i < n, we add the term b(i)·Linv(i) to the variable tmq and we increment i by one (step 220) before returning to the comparison 218. When the comparison 218 shows that i = n, we calculate the gain relating to the contribution n according to g(n) = tmq·K(n), and the loop for calculating the other gains and the target vector is initialized (step 222) by taking e = X − g(n)·F_{p(n)} and i' = 0. This loop includes a comparison 224 between the integers i' and n. If i' < n, the gain g(i') is recalculated in step 226 by adding Linv(i')·g(n) to its value calculated during the previous iteration n−1, then we subtract from the target vector e the vector g(i')·F_{p(i')}. Step 226 also includes the incrementation of the index i' before returning to the comparison 224. The calculation 214 of the gains and of the target vector is finished when the comparison 224 shows that i' = n. We see that the gains can be updated by using only the row n of the inverse matrix L_n^-1.
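The gain part of the calculation 214 can be sketched as follows. The function name and its arguments are hypothetical: the gains of the previous iteration, the vector b up to index n, the row Linv and the scalar K(n) are assumed to be already available, and the update of the target vector e is left out.

```python
def update_gains(g_prev, b, Linv, K_n):
    """Gains g_n(0..n) from the gains g_{n-1}(0..n-1) of the previous iteration.

    b holds b(0..n); Linv holds row n of the inverse of L (columns 0..n-1);
    K_n is the diagonal term K(n).
    """
    n = len(g_prev)
    # Steps 216-220: tmq = b(n) + sum over i < n of b(i).Linv(i)
    tmq = b[n] + sum(b[i] * Linv[i] for i in range(n))
    g_n = tmq * K_n                                   # gain of contribution n
    # Step 226: correct the earlier gains using only row n of the inverse.
    return [g_prev[i] + Linv[i] * g_n for i in range(n)] + [g_n]
```

As a worked check with B = [[4, 2], [2, 3]] and b = (2, 5): iteration 0 gives K(0) = 1/4 and g = (0.5); iteration 1 gives L(1,0) = 0.5, K(1) = 1/2, Linv = (−0.5), hence g = (−0.5, 2), which indeed satisfies g·B = b.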

The calculation 214 is followed by an incrementation 228 of the index n of the contribution, then by a comparison 230 between the index n and the number of contributions nc. If n < nc, we return to step 182 for the next iteration. The optimization of positions and gains is finished when n = nc in test 230.

Segmental pulse search significantly decreases the number of pulse positions to be evaluated during steps 182 of the search for the stochastic excitation. It also allows efficient quantization of the positions found. In the typical case where the subframe of lst = 40 samples is divided into ns = 10 segments of ls = 4 samples, the set of possible pulse positions can take ns!·ls^np / [np!·(ns−np)!] = 258,048 values if np = 5 (MV = 1, 2 or 3) or 860,160 if np = 6 (MV = 0), instead of lst! / [np!·(lst−np)!] = 658,008 values if np = 5 or 3,838,380 if np = 6 in the case where it is imposed only that two pulses cannot have the same position. In other words, we can quantize the positions on 18 bits instead of 20 bits if np = 5, and on 20 bits instead of 22 if np = 6.
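The counts and bit budgets quoted above can be reproduced with a short calculation. The function names are hypothetical and serve only this illustration.

```python
from math import comb

def segmental_count(ns, ls, n_pulses):
    """Sets of positions with at most one pulse per segment: C(ns, np) * ls^np."""
    return comb(ns, n_pulses) * ls ** n_pulses

def free_count(lst, n_pulses):
    """Sets of positions when only distinct positions are required: C(lst, np)."""
    return comb(lst, n_pulses)

def bits_needed(count):
    """Smallest number of bits able to index `count` distinct values."""
    return (count - 1).bit_length()

for n_pulses in (5, 6):
    seg = segmental_count(10, 4, n_pulses)
    free = free_count(40, n_pulses)
    print(n_pulses, seg, bits_needed(seg), free, bits_needed(free))
```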

The special case where the number of segments per subframe is equal to the number of pulses per stochastic excitation (ns = np) leads to the greatest simplicity of the search for the stochastic excitation, as well as to the lowest bit rate (if lst = 40 and np = 5, there are 8⁵ = 32,768 sets of possible positions, quantizable on 15 bits only instead of 18 if ns = 10). But by reducing the number of possible innovation sequences to this point, the quality of the coding can be reduced. For a given number of pulses, the number of segments can be optimized according to a targeted compromise between the quality of the coding and its simplicity of implementation (as well as the required bit rate).

The case where ns > np also has the advantage that good robustness to transmission errors can be obtained with regard to the positions of the pulses, by virtue of a separate quantization of the sequence numbers of the occupied segments and of the relative positions of the pulses in each occupied segment. For a pulse n, the sequence number s_n of the segment and the relative position pr_n are respectively the quotient and the remainder of the Euclidean division of p(n) by the length ls of a segment: p(n) = s_n·ls + pr_n (0 ≤ s_n < ns, 0 ≤ pr_n < ls). The relative positions are each quantized separately on 2 bits, if ls = 4. In the event of a transmission error affecting one of these bits, the corresponding pulse will be only slightly displaced, and the perceptual impact of the error will be limited. The sequence numbers of the occupied segments are identified by a binary word of ns = 10 bits, each worth 1 for the occupied segments and 0 for the segments in which the stochastic excitation has no pulse. The possible binary words are those with a Hamming weight of np; there are ns! / [np!·(ns−np)!] = 252 of them if np = 5, or 210 if np = 6. This word is quantizable by an index of nb bits with 2^(nb−1) < ns! / [np!·(ns−np)!] ≤ 2^nb, that is nb = 8 in the example considered. If, for example, the stochastic analysis provided np = 5 pulses of positions 4, 12, 21, 34, 38, the relative positions quantized scalarly are 0, 0, 1, 2, 2 and the binary word representing the occupied segments is 0101010011, or 339 in decimal.
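The split of each position into a segment number and a relative position can be sketched as follows. The function name is hypothetical, and writing segment 0 as the leftmost bit of the word is an assumption chosen to match the example of the text.

```python
def encode_positions(positions, ns=10, ls=4):
    """Relative positions and occupied-segment word for a set of pulse positions.

    Assumes at most one pulse per segment; segment 0 is written leftmost.
    """
    segments = [p // ls for p in positions]   # quotient s_n
    relative = [p % ls for p in positions]    # remainder pr_n
    word = "".join("1" if s in segments else "0" for s in range(ns))
    return relative, word

relative, word = encode_positions([4, 12, 21, 34, 38])
print(relative, word, int(word, 2))  # [0, 0, 1, 2, 2] 0101010011 339
```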

At the decoder, the possible binary words are stored in a quantization table in which the read addresses are the received quantization indexes. The order in this table, determined once and for all, can be optimized so that a transmission error affecting one bit of the index (the most frequent error case, especially when interleaving is implemented in the channel coder 22) has, on average, minimal consequences according to a neighborhood criterion. The neighborhood criterion is, for example, that a word of ns bits can only be replaced by "neighboring" words, separated from it by a Hamming distance at most equal to a threshold np-2δ, so as to keep all the pulses except δ of them at valid positions in the event of a transmission error on a single bit of the index. Other criteria could be used instead or in addition, for example that two words are considered to be neighbors if replacing one by the other does not modify the order in which the gains are assigned to the pulses.

For purposes of illustration, we can consider the simplified case where ns = 4 and np = 2, i.e. 6 possible binary words quantizable on nb = 3 bits. In this case, it can be verified that the quantization table presented in Table II keeps np-1 = 1 pulse correctly positioned for any error affecting one bit of the transmitted index. There are 4 error cases (out of a total of 18) in which the received quantization index is known to be incorrect (6 instead of 2 or 4; 7 instead of 3 or 5), but the decoder can then take measures limiting the distortion, for example repeating the innovation sequence of the previous subframe, or else assigning acceptable binary words to the "impossible" indexes (for example, 1001 or 1010 for index 6 and 1100 or 0110 for index 7 still lead to np-1 = 1 correctly positioned pulse when 6 or 7 is received with a binary error).
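The count of error cases given above does not depend on the content of Table II and can be checked by enumeration. The following Python sketch (the function name single_bit_error_cases is hypothetical) flips each of the nb = 3 bits of each of the 6 valid indexes and lists the cases in which the received index is not a valid one.

```python
def single_bit_error_cases(n_words, nb):
    # Every (sent index, received index) pair produced by one flipped bit.
    cases = [(i, i ^ (1 << b)) for i in range(n_words) for b in range(nb)]
    # A received index is detectably wrong when it addresses no table entry.
    invalid = [(sent, got) for sent, got in cases if got >= n_words]
    return cases, invalid

cases, invalid = single_bit_error_cases(n_words=6, nb=3)
print(len(cases))  # 18
print(invalid)     # [(2, 6), (3, 7), (4, 6), (5, 7)]
```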

In the general case, the order in the word quantization table can be determined from arithmetic considerations or, if this is insufficient, by simulating the error scenarios on a computer (exhaustively or by Monte Carlo statistical sampling, depending on the number of possible error cases). To secure the transmission of the quantization index of the occupied segments, it is also possible to take advantage of the different protection categories offered by the channel coder 22, in particular if the neighborhood criterion cannot be satisfactorily verified for all possible error cases affecting one bit of the index. The scheduling module 46 can thus place in the minimum protection category, or in the unprotected category, a certain number nx of the bits of the index which, if affected by a transmission error, give rise to an erroneous word that nevertheless satisfies the neighborhood criterion with a probability deemed satisfactory, and place the other bits of the index in a more protected category. This procedure calls for a different ordering of the words in the quantization table. This ordering can also be optimized by means of simulations if it is desired to maximize the number nx of bits of the index assigned to the least protected category.

One possibility is to start by constituting a list of words of ns bits by counting in Gray code from 0 to 2^ns - 1, and to obtain the ordered quantization table by deleting from this list the words whose Hamming weight is not np. The table thus obtained is such that two consecutive words have a Hamming distance of np-2. If the indexes in this table have a binary representation in Gray code, any error on the least significant bit varies the index by ±1 and thus causes the actual occupation word to be replaced by a neighboring word in the sense of the np-2 threshold on the Hamming distance, and an error on the i-th least significant bit also varies the index by ±1 with a probability of approximately 2^(1-i). By placing the nx least significant bits of the Gray-coded index in an unprotected category, a possible transmission error affecting one of these bits leads to the replacement of the occupation word by a neighboring word with a probability at least equal to (1 + 1/2 + ... + 1/2^(nx-1)) / nx. This minimum probability decreases from 1 to (2/nb)(1 - 1/2^nb) for nx increasing from 1 to nb. The errors affecting the nb-nx most significant bits of the index will most often be corrected thanks to the protection applied to them by the channel coder. The value of nx is in this case chosen according to a compromise between robustness to errors (small values) and a reduced size of the protected categories (large values).
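The construction of the table and the probability bound can be sketched in Python as follows. The names gray_ordered_table and min_neighbor_probability are hypothetical, and the sketch assumes the usual reflected binary Gray code (k XOR k shifted right by one bit), which the text does not name explicitly.

```python
def gray_ordered_table(ns, n_pulses):
    # Count in Gray code from 0 to 2**ns - 1, keeping words of weight n_pulses.
    table = []
    for k in range(2 ** ns):
        g = k ^ (k >> 1)  # assumption: reflected binary Gray code
        if bin(g).count("1") == n_pulses:
            table.append(g)
    return table

def min_neighbor_probability(nx):
    # Lower bound (1 + 1/2 + ... + 1/2**(nx-1)) / nx quoted in the text.
    return sum(0.5 ** i for i in range(nx)) / nx

print([format(w, "04b") for w in gray_ordered_table(4, 2)])
# ['0011', '0110', '0101', '1100', '1010', '1001']
print(len(gray_ordered_table(10, 5)))  # 252
print(min_neighbor_probability(1))     # 1.0
```

For nx = nb = 8, min_neighbor_probability(8) agrees with the closed form (2/nb)(1 - 1/2^nb), i.e. about 0.249.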

At the coder, the possible binary words representing the occupation of the segments are arranged in ascending order in a search table. An indexing table associates with each address the sequence number, in the quantization table stored at the decoder, of the binary word having this address in the search table. In the simplified example mentioned above, the content of the search table and of the indexing table is given in Table III (in decimal values).

The quantization of the occupation word of the segments, deduced from the np positions provided by the stochastic analysis module 40, is carried out in two stages by the quantization module 44. A binary (dichotomous) search is first carried out in the search table to determine the address in this table of the word to be quantized. The quantization index is then read at that address in the indexing table and supplied to the bit scheduling module 46.
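The two-stage quantization can be sketched in Python as follows. The function names are hypothetical, and the decoder quantization table used here is an illustrative ordering for ns = 4 and np = 2; it is an assumption and is not claimed to be the content of Table II or Table III.

```python
from bisect import bisect_left

def build_coder_tables(quant_table):
    # Search table: the possible occupation words in ascending order.
    search_table = sorted(quant_table)
    # Indexing table: for each search-table address, the index of the same
    # word in the decoder quantization table.
    index_of = {word: i for i, word in enumerate(quant_table)}
    indexing_table = [index_of[word] for word in search_table]
    return search_table, indexing_table

def quantize_word(word, search_table, indexing_table):
    # Stage 1: binary search for the address of the word.
    addr = bisect_left(search_table, word)
    if addr == len(search_table) or search_table[addr] != word:
        raise ValueError("not a valid occupation word")
    # Stage 2: read the quantization index at that address.
    return indexing_table[addr]

# Hypothetical decoder table (ns = 4, np = 2), for illustration only.
quant_table = [0b0011, 0b0110, 0b0101, 0b1100, 0b1010, 0b1001]
search_table, indexing_table = build_coder_tables(quant_table)
print(search_table)    # [3, 5, 6, 9, 10, 12]
print(indexing_table)  # [0, 2, 1, 5, 4, 3]
print(quantize_word(0b1100, search_table, indexing_table))  # 3
```

The decoder side is then a plain table read: quant_table[index] returns the occupation word.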

The module 44 also performs the quantization of the gains calculated by the module 40. The gain g_TP is for example quantized in the interval [0; 1.6], on 5 bits if MV = 1 or 2 and on 6 bits if MV = 3, to take into account the greater perceptual importance of this parameter for very voiced frames. For the coding of the gains associated with the pulses of the stochastic excitation, the greatest absolute value Gs of the gains g(1), ..., g(np) is quantized on 5 bits, for example by taking 32 quantization values in geometric progression in the interval [0; 32767], and each of the relative gains g(1)/Gs, ..., g(np)/Gs is quantized in the interval [-1; +1], on 4 bits if MV = 1, 2 or 3, or on 5 bits if MV = 0.
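A minimal Python sketch of this gain coding is given below, under several assumptions that are not in the text: the smallest of the 32 geometric levels is taken as 1.0 (hypothetical parameter g_min), the relative gains use uniformly spaced levels over [-1; +1], and each value is mapped to the nearest level. All function names are hypothetical.

```python
def geometric_levels(g_min=1.0, g_max=32767.0, bits=5):
    # Assumption: 2**bits levels in geometric progression from g_min to g_max.
    n = 2 ** bits
    ratio = (g_max / g_min) ** (1.0 / (n - 1))
    return [g_min * ratio ** k for k in range(n)]

def nearest_index(value, levels):
    # Assumption: nearest-level rule (the text does not specify the rule).
    return min(range(len(levels)), key=lambda k: abs(levels[k] - value))

def quantize_pulse_gains(gains, rel_bits=4):
    gs = max(abs(g) for g in gains)  # greatest absolute value Gs
    gs_index = nearest_index(gs, geometric_levels())
    n = 2 ** rel_bits
    # Assumption: uniform levels over [-1, +1] for the relative gains.
    rel_levels = [-1.0 + 2.0 * k / (n - 1) for k in range(n)]
    rel_indexes = [nearest_index(g / gs if gs > 0 else 0.0, rel_levels)
                   for g in gains]
    return gs_index, rel_indexes

gs_index, rel_indexes = quantize_pulse_gains([1000.0, -500.0, 250.0, -1000.0, 0.0])
print(gs_index, rel_indexes)
```

With rel_bits=4 the relative gains of +1 and -1 map to the extreme indexes 15 and 0; rel_bits=5 would correspond to the case MV = 0.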

The quantization bits of Gs are placed in a category protected by the channel coder 22, as are the most significant bits of the quantization indexes of the relative gains. The relative gain quantization bits are ordered so as to allow their assignment to the associated pulses belonging to the segments located by the occupation word. The segmental search according to the invention also makes it possible to protect effectively the relative positions of the pulses associated with the greatest gain values.

In the case where np = 5 and ls = 4, ten bits per subframe are necessary to quantize the relative positions of the pulses in the segments. We consider the case where 5 of these 10 bits are placed in a category with little or no protection (II) and where the other 5 are placed in a more protected category (IB). The most natural distribution is to place the most significant bit of each relative position in the protected category IB, so that possible transmission errors preferentially affect the least significant bits and therefore only cause a shift of one sample for the corresponding pulse. It is however judicious, for the quantization of the relative positions, to consider the pulses in decreasing order of the absolute values of the associated gains, and to place in category IB both quantization bits of each of the first two relative positions as well as the most significant bit of the third. In this way, the positions of the pulses are preferentially protected when they are associated with significant gains, which improves the average quality, particularly for the most voiced subframes.
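The gain-ordered distribution of the ten relative-position bits can be sketched in Python as follows. The function name allocate_position_bits is hypothetical, and the order in which bits are listed inside each category is an assumption made for illustration.

```python
def allocate_position_bits(rel_positions, gains):
    # Rank the pulses by decreasing absolute gain (stable for equal gains).
    order = sorted(range(len(gains)), key=lambda n: -abs(gains[n]))
    protected, unprotected = [], []  # categories IB and II respectively
    for rank, n in enumerate(order):
        msb = (rel_positions[n] >> 1) & 1
        lsb = rel_positions[n] & 1
        if rank < 2:       # first two pulses: both bits protected
            protected += [msb, lsb]
        elif rank == 2:    # third pulse: only the most significant bit
            protected.append(msb)
            unprotected.append(lsb)
        else:              # remaining pulses: both bits unprotected
            unprotected += [msb, lsb]
    return order, protected, unprotected

order, ib, ii = allocate_position_bits([0, 0, 1, 2, 2],
                                       [300, -1200, 50, 800, -100])
print(order)  # [1, 3, 0, 4, 2]
print(ib)     # [0, 0, 1, 0, 0]
print(ii)     # [0, 1, 0, 0, 1]
```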

To reconstruct the pulse contributions of the excitation, the decoder 54 first locates the segments by means of the received occupation word; it then assigns the associated gains; it then assigns the relative positions to the pulses on the basis of the order of importance of the gains.
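These three decoder steps can be sketched in Python as follows. The function name rebuild_pulses is hypothetical. The sketch assumes that the gains are received in segment order, that the relative positions are received in decreasing order of absolute gain, that segment 0 is the most significant bit of the occupation word, and that coder and decoder rank the same (quantized) gain values with the same tie-breaking rule.

```python
def rebuild_pulses(word, ns, ls, gains_by_segment, rel_by_rank):
    # Step 1: locate the occupied segments from the occupation word.
    segments = [s for s in range(ns) if (word >> (ns - 1 - s)) & 1]
    # Step 2: the gains are assigned to the pulses in segment order.
    count = len(segments)
    # Step 3: relative positions follow the order of importance of the gains.
    order = sorted(range(count), key=lambda k: -abs(gains_by_segment[k]))
    rel = [0] * count
    for rank, k in enumerate(order):
        rel[k] = rel_by_rank[rank]
    return [(segments[k] * ls + rel[k], gains_by_segment[k])
            for k in range(count)]

pulses = rebuild_pulses(339, ns=10, ls=4,
                        gains_by_segment=[300, -1200, 50, 800, -100],
                        rel_by_rank=[0, 2, 0, 2, 1])
print([p for p, _ in pulses])  # [4, 12, 21, 34, 38]
```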

It will be understood that the various aspects of the invention described above each provide their own improvements, and that it is therefore possible to implement them independently of each other. Their combination makes it possible to produce a coder with particularly attractive performance.

In the embodiment described above, the 13 kbit/s speech coder requires around 15 million instructions per second (Mips) in fixed point. It will therefore typically be implemented by programming a commercial digital signal processor (DSP), as will the decoder, which requires only around 5 Mips.

Claims

1. Method for coding with analysis by synthesis of a speech signal digitized in successive frames divided into sub-frames of lst samples, in which a linear prediction analysis is carried out for each frame to determine the coefficients of a short-term synthesis filter (60), and an excitation sequence with nc contributions each associated with a respective gain (g_P, g(n)) is determined for each subframe so that the excitation sequence subjected to the short-term synthesis filter produces a synthetic signal representative of the speech signal, the contributions of the excitation sequence and the associated gains being determined by an iterative process in which the iteration n (0 ≤ n < nc) comprises:

- the determination of the contribution n which maximizes the quantity (F_p . e_(n-1)^T)² / (F_p . F_p^T), where F_p denotes a row vector with lst components equal to the convolution products between a possible value of the contribution n and the impulse response of a filter composed of the short-term synthesis filter and a perceptual weighting filter, and e_(n-1) denotes a target vector determined during the iteration n-1 if n ≥ 1, and e_(-1) = X is an initial target vector; and

- the calculation of n+1 gains forming a row vector g_n = (g_n(0), ..., g_n(n)) by solving the linear system g_n . B_n = b_n, where B_n is a symmetric matrix with n+1 rows and n+1 columns whose component B_n(i, j) (0 ≤ i, j ≤ n) is equal to the scalar product F_p(i) . F_p(j)^T, where F_p(i) and F_p(j) respectively denote the row vectors equal to the convolution products between the contributions i and j previously determined and the impulse response of the compound filter, and b_n is a row vector with n+1 components b_n(i) (0 ≤ i ≤ n) respectively equal to the scalar products between the vectors F_p(i) and the initial target vector X,

the nc gains associated with the nc contributions of the excitation sequence being those calculated during the iteration nc-1, characterized in that at each iteration n (0 ≤ n < nc), the rows n of three matrices L, R and K with nc rows and nc columns are calculated, such that B_n = L_n . R_n^T and L_n = R_n . K_n, where L_n, R_n and K_n denote matrices with n+1 rows and n+1 columns corresponding respectively to the first n+1 rows and to the first n+1 columns of said matrices L, R and K, the matrices L and R being lower triangular, the matrix K being diagonal, and the matrix L having only 1s on its main diagonal, the row n of the matrix L^-1, inverse of the matrix L, is calculated, and the n+1 gains are calculated according to the relation g_n = b_n . K_n . (L_n^-1)^T . L_n^-1, where L_n^-1 denotes the matrix with n+1 rows and n+1 columns corresponding respectively to the first n+1 rows and to the first n+1 columns of the inverse matrix L^-1.

2. Method according to claim 1, characterized in that at each iteration n (0 ≤ n < nc), the terms R(n, j) and L(n, j), located respectively at row n and column j of the matrices R and L, are successively calculated for j increasing from 0 to n-1, according to:

then the term K(n) located at row n and column n of the matrix K is calculated according to:

3. Method according to claim 2, characterized in that at each iteration n (0 ≤ n < nc), the terms L^-1(n, j') located respectively at row n and in columns j' of the inverse matrix L^-1 are successively calculated for j' decreasing from n-1 to 0, according to:

4. Method according to claim 3, characterized in that at each iteration n (0≤n <nc), the gain g _n (n) associated with the contribution n is calculated according to:

,

then we recalculate the gains associated with contributions i 'for i' between 0 and n-1 according to:

5. Method according to any one of claims 1 to 4, characterized in that the nc contributions include at least one long-term contribution corresponding to a delayed past excitation.

6. Method according to any one of claims 1 to 5, in which the excitation sequence comprises a stochastic excitation constituted by several pulses whose respective positions (p(n)) in the subframe and respective associated gains (g(n)) are calculated, characterized in that each subframe is subdivided into ns segments, ns being a number at least equal to the number np of pulses per stochastic excitation, in that the positions (p(n)) of the pulses of the stochastic excitation relating to a subframe are determined successively, and in that the first pulse is searched for at any position of the subframe, and the following pulses are searched for while excluding each segment to which a pulse whose position was previously determined belongs.

7. Method according to claim 6, characterized in that, the number ns of segments per subframe being greater than the number np of pulses per stochastic excitation, the sequence numbers of the segments occupied by a pulse of the stochastic excitation and the relative positions of the pulses in the occupied segments are quantized separately.

8. Method according to claim 7, characterized in that the occupation of the segments is represented by a word of ns bits in which the bits at 1 are those having the same sequence number as the occupied segments, the possible occupation words being ordered in a quantization table indexed by nb-bit indexes, with 2^(nb-1) < ns! / [np! (ns-np)!] ≤ 2^nb, in such a way that two words whose respective indexes in binary representation differ by a single bit are neighbors according to a predetermined criterion, and in that, for each subframe, the index in the quantization table of the occupation word corresponding to the np pulses of the stochastic excitation is transmitted.

9. Method according to claim 7, characterized in that the occupation of the segments is represented by a word of ns bits in which the bits at 1 are those having the same sequence number as the occupied segments, the possible occupation words being ordered in a quantization table indexed by nb-bit indexes, with 2^(nb-1) < ns! / [np! (ns-np)!] ≤ 2^nb, in such a way that two words whose respective indexes in binary representation differ by a single bit forming part of nx bits of determined ranks are neighbors according to a predetermined criterion, and in that, for each subframe, the index in the quantization table of the occupation word corresponding to the np pulses of the stochastic excitation is transmitted, the nb-nx bits of the index other than said nx bits of determined ranks being selectively protected against transmission errors.

10. Method according to claim 7 or 8, characterized in that an open loop analysis of the speech signal is carried out to detect the voiced frames of the signal, in that a first number of pulses per stochastic excitation and a first table for quantizing the occupation words of the segments are provided for the subframes of the voiced frames, and in that a second number of pulses per stochastic excitation and a second table for quantizing the occupation words of the segments are provided for the subframes of the unvoiced frames.

11. Method according to any one of claims 7 to 10, characterized in that the bits for quantizing the relative positions of the np pulses are distributed between a first group protected from transmission errors and a second, less protected group, depending on the magnitude of the gains associated with the pulses.

12. Method according to claim 11, characterized in that at least one pulse having a large relative gain in absolute value has more quantization bits of its relative position in said first group than pulses having a lower relative gain in absolute value.