EP1194923A1

EP1194923A1 - Methods and device for audio analysis and synthesis

Info

Publication number: EP1194923A1
Application number: EP00953223A
Authority: EP
Inventors: François CAPMAN; Carlo Murgia
Original assignee: Matra Nortel Communications SAS
Current assignee: Nortel Networks France SAS
Priority date: 1999-07-05
Filing date: 2000-07-04
Publication date: 2002-04-10
Anticipated expiration: 2020-07-04
Also published as: FR2796194A1; DE60025615D1; EP1194923B1; AU6575100A; WO2001003116A1; FR2796194B1; ATE316284T1

Abstract

The invention concerns a method which consists, for the analysis of an audio signal (x), in weighting samples of each frame of N samples through an analysis window of the Hamming, Hanning, Kaiser type or the like; calculating the audio signal spectrum by transforming each frame of weighted samples in the frequency domain; and in processing the spectrum of the audio signal to deliver synthesis parameters for a signal derived from an analysed audio signal. The successive frames for which complete sets of synthesis parameters are supplied advantageously have mutual overlaps of less than N/2 samples. During synthesis, the method consists in recovering the frames with an optional interpolation, and in reforming the signal by a frame overlapping sum, after having carried out an appropriate weighting of the samples.

Description

AUDIO ANALYSIS AND SYNTHESIS METHODS AND DEVICES

The present invention relates to the analysis and synthesis of audio signals, from representations of these signals in the spectral domain.

It applies in particular, but not exclusively, to speech coding, in narrowband or in wideband, over various ranges of coding bit rates. Other fields of application include denoising by spectral subtraction (see EP-A-0 534 837 or WO 99/14739).

In the analysis methods in question, the signal spectrum is obtained by transforming successive frames into the frequency domain. The transform used is most often the fast Fourier transform (FFT), but other known transforms can be used. In the frequent case of a signal sampled at 8 kHz, the number N of samples per frame is typically of the order of 100 to 500, which represents frames of a few tens of milliseconds. To benefit from the maximum frequency resolution, the FFT is performed on 2N points, N zero-valued samples being appended to the N samples of the frame.

The spectrum obtained by Fourier transform of the signal frame is the convolution of the true spectrum of the signal with the Fourier transform of the analysis window. This analysis window, which weights the samples of each frame, is necessary to take into account the finite duration of the frame. If the signal frame is submitted directly to the FFT, that is to say if a rectangular analysis window is used, the spectrum obtained is disturbed by the side lobes of the Fourier transform of the analysis window. To limit this drawback, which is particularly troublesome when parameters representing the signal or the noise must be extracted from the spectra, use is made of windows having better spectral properties, that is to say weighting functions whose support is limited to N samples and whose Fourier transform has its energy concentrated in a narrow main lobe with strong attenuation of the side lobes. The most common of these windows are the Hamming, Hanning and Kaiser windows.

In the OLA (“Overlap-And-Add”) analysis and synthesis method, the successive frames have mutual overlaps of 50% (N/2 samples). As the analysis windows commonly used satisfy the property $f_A(i + N/2) + f_A(i) = 1$, the synthesis can be carried out simply by performing the overlap-add sum of the frames of N samples successively calculated by inverse Fourier transform of the spectra.
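
As an illustration of the 50% overlap property, the following minimal sketch checks it numerically for a Hanning window. The centred raised-cosine expression and the value N = 256 are assumptions chosen for the example, not values imposed by the method.

```python
import numpy as np

N = 256  # frame length (example value, assumed)
i = np.arange(N)

# Hanning window centred on the frame (assumed raised-cosine form)
f_A = 0.5 + 0.5 * np.cos(2 * np.pi * (i - (N - 1) / 2) / N)

# Sum of the window and of its copy shifted by N/2 samples
overlap_sum = f_A[: N // 2] + f_A[N // 2 :]
print(overlap_sum.min(), overlap_sum.max())
```

With this window the two halves sum to a constant, so a plain overlap-add of frames shifted by N/2 restores the signal without any reweighting.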

In order to refine the spectral representation, certain so-called WOLA (“Weighted OLA”) methods use analysis frames with mutual overlaps of more than 50%. At synthesis, the samples of the frames must be reweighted before they are summed. These methods increase the complexity of analysis and synthesis. In coding applications, they also increase the required transmission rate.

An object of the invention is to propose an analysis and synthesis scheme for audio signals which makes it possible to limit the rate of the analysis frames, while using analysis windows having good spectral properties.

The invention provides a method of analyzing an audio signal processed by successive frames of N samples, in which the samples of each frame are weighted by an analysis window of the Hamming, Hanning, Kaiser or similar type, a spectrum of the audio signal is calculated by transforming each frame of weighted samples into the frequency domain, and the spectrum of the audio signal is processed to deliver synthesis parameters of a signal derived from the analyzed audio signal. According to the invention, the successive frames comprise an alternation of frames for which complete sets of synthesis parameters are delivered, which have mutual overlaps of less than N/2 samples, i.e. less than 50%, and of frames for which incomplete sets of synthesis parameters are delivered.

The frames for which complete sets of synthesis parameters are not delivered need not undergo any spectral analysis. As a variant, an analysis can nevertheless be carried out for these frames, in order to deliver incomplete sets of synthesis parameters including data representing an interpolation error of at least one of the synthesis parameters and/or data representing an interpolation filter for at least one of the synthesis parameters.

In a first field of application of the method, the processing of the spectrum of the audio signal comprises an extraction of coding parameters with a view to the transmission and/or storage of the coded audio signal. In a second field of application, the processing of the spectrum of the audio signal comprises denoising by spectral subtraction. Other fields of application can also be envisaged in audio processing.

A second aspect of the invention relates to a method of synthesizing an audio signal, in which successive spectral estimates are obtained, corresponding respectively to frames of N samples of the audio signal weighted by an analysis window, the successive frames having mutual overlaps of L samples; each frame of the audio signal is evaluated by transforming the spectral estimates into the time domain; and the evaluated frames are combined to form the synthesized signal. According to this method, each evaluated frame is modified by applying to it a processing corresponding to a division by said analysis window and to a multiplication by a synthesis window, and the synthesized signal is formed as an overlap-add sum of the modified frames. The number L being smaller than N/2 and the samples of a frame having ranks i numbered from 0 to N-1, the synthesis window $f_s(i)$ satisfies $f_s(N-L+i) + f_s(i) = A$ for $0 \le i < L$, and is equal to A for $L \le i < N-L$, A being a positive constant.

In a variant of the synthesis method according to the invention, a set of successive overlapping frames of N samples of the audio signal weighted by an analysis window is evaluated by transforming into the time domain spectral estimates corresponding respectively to said frames, and the evaluated frames are combined to form the synthesized signal. For a subset of the evaluated frames, the spectral estimates are obtained by processing synthesis parameters respectively associated with the frames of said subset while, for the frames not forming part of the subset, the spectral estimates are obtained with an interpolation of at least part of the synthesis parameters. The successive frames of said subset have mutual time offsets of M samples, the number M being greater than N/2, while the successive frames of said set have mutual time offsets of M/p samples, p being an integer greater than 1. Each evaluated frame is modified by applying to it a processing corresponding to a division by said analysis window and to a multiplication by a synthesis window, and the synthesized signal is formed as an overlap-add sum of the modified frames. The samples of a frame having ranks i numbered from 0 to N-1, the synthesis window $f_s(i)$ has a support limited to the ranks i ranging from N/2 - M/p to N/2 + M/p and satisfies $f_s(i) + f_s(i + M/p) = A$ for $N/2 - M/p \le i < N/2$, A being a positive constant.
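
The first synthesis variant can be sketched as follows. This is only an illustration under stated assumptions: the trapezoidal shape of the synthesis window (linear ramps over the L overlapping samples) is one possible choice satisfying the stated conditions, the spectral analysis and synthesis steps are replaced by an identity, and the values N = 256, M = 160 and A = 1 are example values.

```python
import numpy as np

N, M = 256, 160      # frame length and frame shift (example values)
L = N - M            # overlap of 96 samples, smaller than N/2
A = 1.0              # positive constant of the synthesis window

i = np.arange(N)
f_A = 0.54 + 0.46 * np.cos(2 * np.pi * (i - (N - 1) / 2) / N)  # Hamming analysis window

# Assumed trapezoidal synthesis window: f_s(N-L+i) + f_s(i) = A on the overlap,
# and f_s(i) = A on the central part of the frame
ramp = A * (np.arange(L) + 0.5) / L
f_s = np.full(N, A)
f_s[:L] = ramp
f_s[N - L:] = A - ramp

rng = np.random.default_rng(0)
x = rng.standard_normal(9 * M + N)      # exactly 10 frames
y = np.zeros_like(x)
for n in range(10):
    start = n * M
    frame = f_A * x[start:start + N]    # analysis weighting (spectral steps omitted)
    frame = frame / f_A * f_s           # division by analysis window, synthesis weighting
    y[start:start + N] += frame         # overlap-add
```

Away from the first and last L samples, which belong to a single frame only, the overlap-add output equals A times the input.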

The invention also provides audio processing devices comprising means for implementing the above analysis and synthesis methods. Other features and advantages of the present invention will appear in the following description of nonlimiting exemplary embodiments, with reference to the appended drawings, in which:

- Figure 1 is a block diagram of an audio encoder according to the invention;

- Figures 2 and 3 are diagrams illustrating the formation of audio signal frames in the encoder of Figure 1;

- Figures 4 and 5 are graphs showing an example of the audio signal spectrum and illustrating the extraction of the upper and lower envelopes of this spectrum;

- Figure 6 is a block diagram of an example of quantization means usable in the encoder of Figure 1;

- Figure 7 is a block diagram of means used to extract parameters relating to the phase of the non-harmonic component in a variant of the encoder of Figure 1;

- Figure 8 is a block diagram of an audio decoder corresponding to the encoder of Figure 1;

- Figure 9 is a flow diagram of an example of a procedure for smoothing spectral coefficients and extracting minimum phases implemented in the decoder of Figure 8;

- Figure 10 is a block diagram of analysis and spectral mixing modules of harmonic and non-harmonic components of the audio signal;

- Figures 11 to 13 are graphs showing examples of non-linear functions usable in the analysis module of Figure 10;

- Figures 14 and 15 are diagrams illustrating one way of proceeding to the temporal synthesis of the signal frames in the decoder of Figure 8;

- Figures 16 and 17 are graphs showing windowing functions usable in the synthesis of the frames according to Figures 14 and 15;

- Figures 18 and 19 are block diagrams of interpolation means which can be used in an alternative embodiment of the coder and the decoder;

- Figure 20 is a block diagram of interpolation means which can be used in another variant embodiment of the coder; and

- Figures 21 and 22 are diagrams illustrating another way of carrying out the temporal synthesis of the signal frames in the decoder of Figure 8, using an interpolation of parameters.

The coder and the decoder described below are digital circuits which can, as is usual in the field of audio signal processing, be produced by programming a digital signal processor (DSP) or an application-specific integrated circuit (ASIC).

The audio coder represented in Figure 1 processes an input audio signal x which, in the nonlimiting example considered below, is a speech signal. The signal x is available in digital form, for example at a sampling frequency $F_e$ of 8 kHz. It is for example delivered by an analog-to-digital converter processing the amplified output signal of a microphone. The input signal x can also be formed from another version, analog or digital, coded or not, of the speech signal.

The coder comprises a module 1 which forms successive audio signal frames for the various processing operations carried out, and an output multiplexer 6 which delivers an output stream Φ containing, for each frame, sets of quantization parameters from which a decoder will be able to synthesize a decoded version of the audio signal.

The structure of the frames is illustrated by Figures 2 and 3. Each frame 2 is composed of a number N of consecutive samples of the audio signal x. The successive frames have mutual time offsets corresponding to M samples, so that their overlap is L = N - M samples of the signal. In the example considered, where N = 256, M = 160 and L = 96, the duration of the frames 2 is $N/F_e$ = 32 ms, and a frame is formed every $M/F_e$ = 20 ms.
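
The frame formation and the timing figures quoted above can be checked with a short sketch. The helper name `form_frames` and the dummy one-second input are hypothetical and serve only the illustration.

```python
import numpy as np

F_e = 8000          # sampling frequency in Hz
N, M = 256, 160     # frame length and frame shift
L = N - M           # overlap between successive frames

def form_frames(x, N=N, M=M):
    """Return an array of shape (n_frames, N) of frames shifted by M samples."""
    n_frames = 1 + (len(x) - N) // M
    return np.stack([x[n * M : n * M + N] for n in range(n_frames)])

x = np.arange(F_e)            # one second of dummy samples
frames = form_frames(x)
frame_ms = 1000 * N / F_e     # duration of one frame in ms
period_ms = 1000 * M / F_e    # interval between frame starts in ms
print(frames.shape, frame_ms, period_ms)
```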

Conventionally, the module 1 multiplies the samples of each frame 2 by a windowing function $f_A$, preferably chosen for its good spectral properties. The samples x(i) of the frame being numbered from i = 0 to i = N-1, the analysis window $f_A(i)$ can thus be a Hamming window, of expression:

$$f_A(i) = 0.54 + 0.46 \cos\left(2\pi\,\frac{i - (N-1)/2}{N}\right) \tag{1}$$

or a Hanning window, of expression:

$$f_A(i) = 0.5 + 0.5 \cos\left(2\pi\,\frac{i - (N-1)/2}{N}\right) \tag{2}$$

or a Kaiser window, of expression:

$$f_A(i) = \frac{I_0\!\left(\alpha \sqrt{1 - \left(\frac{i - (N-1)/2}{(N-1)/2}\right)^{2}}\right)}{I_0(\alpha)} \tag{3}$$

where α is a coefficient for example equal to 6, and $I_0(\cdot)$ denotes the modified Bessel function of order 0.
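
For illustration, the three analysis windows can be computed as in the sketch below. The Hamming expression follows relation (1); the Hanning and Kaiser expressions are the standard textbook forms written in the same centred convention, which is an assumption about the exact expressions intended here. The function name `analysis_window` is hypothetical.

```python
import numpy as np

def analysis_window(N, kind="hamming", alpha=6.0):
    """Return the N-sample analysis window f_A for the requested type."""
    u = np.arange(N) - (N - 1) / 2          # sample rank relative to the frame centre
    if kind == "hamming":
        return 0.54 + 0.46 * np.cos(2 * np.pi * u / N)
    if kind == "hanning":
        return 0.5 + 0.5 * np.cos(2 * np.pi * u / N)
    if kind == "kaiser":
        r = u / ((N - 1) / 2)               # normalised position in [-1, 1]
        # np.i0 is the modified Bessel function of the first kind, order 0
        return np.i0(alpha * np.sqrt(1 - r ** 2)) / np.i0(alpha)
    raise ValueError(f"unknown window type: {kind}")

windows = {k: analysis_window(256, k) for k in ("hamming", "hanning", "kaiser")}
```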

The coder of Figure 1 analyzes the audio signal in the spectral domain. It includes a module 3 which calculates the fast Fourier transform (FFT) of each signal frame. The signal frame is formatted before being submitted to the FFT module 3: the module 1 appends N = 256 zero-valued samples to it in order to obtain the maximum resolution of the Fourier transform, and it also performs a circular permutation of the 2N = 512 samples in order to compensate for the phase effects resulting from the analysis window. This modification of the frame is illustrated in Figure 3. The frame for which the fast Fourier transform on 2N = 512 points is calculated begins with the last N/2 = 128 weighted samples of the frame, followed by the N = 256 zero samples, and ends with the first N/2 = 128 weighted samples of the frame.

The FFT module 3 obtains the spectrum of the signal for each frame, the modulus and phase of which are respectively denoted |X| and $\varphi_X$, or |X(i)| and $\varphi_X(i)$ for the frequency indexes i = 0 to i = 2N-1 (thanks to the symmetry of the Fourier transform and of the frames, it suffices to consider the values for 0 ≤ i < N).
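
The frame formatting and the 2N-point transform can be sketched as follows. The helper name `format_frame` and the dummy input are hypothetical; only the layout (last N/2 samples, N zeros, first N/2 samples) comes from the description.

```python
import numpy as np

def format_frame(weighted, N=256):
    """Build the 2N-point FFT input: last N/2 samples, N zeros, first N/2 samples."""
    if len(weighted) != N:
        raise ValueError("expected a frame of N weighted samples")
    buf = np.zeros(2 * N)
    buf[: N // 2] = weighted[N // 2 :]      # last half of the frame goes first
    buf[-(N // 2) :] = weighted[: N // 2]   # first half of the frame goes last
    return buf

N = 256
frame = np.arange(1.0, N + 1)               # dummy weighted samples 1..N
buf = format_frame(frame)
X = np.fft.fft(buf)                         # spectrum on 2N = 512 points
modulus, phase = np.abs(X), np.angle(X)
```

Because the input is real, the values for indexes above N are complex conjugates of those below N, which is why only half of the spectrum needs to be kept.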

A fundamental frequency detector 4 estimates, for each signal frame, a value of the fundamental frequency $F_0$. The detector 4 can apply any known method of analysis of the speech signal of the frame to estimate the fundamental frequency $F_0$, for example a method based on the autocorrelation function or the AMDF function, possibly preceded by a whitening module using linear prediction. The estimation can also be performed in the spectral domain or in the cepstral domain. Another possibility is to evaluate the time intervals between the consecutive breaks in the speech signal attributable to closures of the speaker's glottis occurring during the frame. Well-known methods which can be used to detect such micro-breaks are described in the following articles: M. Basseville et al., “Sequential detection of abrupt changes in spectral characteristics of digital signals” (IEEE Trans. on Information Theory, 1983, Vol. IT-29, No. 5, pages 708-723); R. Andre-Obrecht, “A new statistical approach for the automatic segmentation of continuous speech signals” (IEEE Trans. on Acoust., Speech and Sig. Proc., Vol. 36, No. 1, January 1988); and C. Murgia et al., “An algorithm for the estimation of glottal closure instants using the sequential detection of abrupt changes in speech signals” (Signal Processing VII, 1994, pages 1685-1688).
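
As one possible illustration of the autocorrelation approach mentioned above, the sketch below picks the lag of the strongest autocorrelation peak within a plausible pitch range. The search range of 60 to 400 Hz and the function name `estimate_f0_autocorr` are assumptions; the detector 4 is not limited to this method.

```python
import numpy as np

def estimate_f0_autocorr(frame, F_e=8000, f_min=60.0, f_max=400.0):
    """Estimate F0 (Hz) from the lag of the largest autocorrelation peak."""
    frame = frame - np.mean(frame)
    # Autocorrelation for lags 0..len(frame)-1
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]
    lag_min = int(F_e / f_max)              # shortest pitch period considered
    lag_max = int(F_e / f_min)              # longest pitch period considered
    lag = lag_min + int(np.argmax(r[lag_min : lag_max + 1]))
    return F_e / lag

# Synthetic voiced frame: five harmonics of 100 Hz sampled at 8 kHz
t = np.arange(256) / 8000
sig = sum(np.cos(2 * np.pi * 100 * k * t) for k in range(1, 6))
f0 = estimate_f0_autocorr(sig)
print(f0)
```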

The estimated fundamental frequency $F_0$ is quantized, for example by scalar quantization, by a module 5, which supplies the output multiplexer 6 with a quantization index iF of the fundamental frequency for each frame of the signal.

The coder uses cepstral parametric models to represent an upper envelope and a lower envelope of the spectrum of the audio signal. The first step of the cepstral transformation consists in applying to the modulus of the signal spectrum a spectral compression function, which can be a logarithmic or root function. The module 8 of the coder thus performs, for each value X(i) of the signal spectrum (0 ≤ i < N), the following transformation:

$$LX(i) = \operatorname{Log}\left(|X(i)|\right) \tag{4}$$

in the case of logarithmic compression, or

$$LX(i) = |X(i)|^{\gamma} \tag{5}$$

in the case of root compression, γ being an exponent between 0 and 1.
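
A minimal sketch of the two compression functions follows. The natural logarithm, the small floor guarding against log(0), and the default γ = 0.5 are assumptions made for the example.

```python
import numpy as np

def compress_spectrum(X, mode="log", gamma=0.5, floor=1e-12):
    """Return the compressed modulus LX of a complex spectrum X."""
    mag = np.abs(X)
    if mode == "log":
        # Logarithmic compression; the floor avoids log(0) (assumed safeguard)
        return np.log(np.maximum(mag, floor))
    if mode == "root":
        # Root compression with an exponent between 0 and 1
        return mag ** gamma
    raise ValueError(f"unknown compression mode: {mode}")

LX_log = compress_spectrum(np.array([1.0, np.e]), "log")
LX_root = compress_spectrum(np.array([4.0 + 0j]), "root", gamma=0.5)
```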

The compressed spectrum LX of the audio signal is processed by a module 9 which extracts spectral amplitudes associated with the harmonics of the signal, corresponding to the multiples of the estimated fundamental frequency $F_0$. These amplitudes are then interpolated by a module 10 in order to obtain a compressed upper envelope denoted LX_sup.

It should be noted that the spectral compression could equivalently be carried out after the determination of the amplitudes associated with the harmonics. It could also be carried out after the interpolation, which would only modify the form of the interpolation functions.

The module 9 for extracting the maxima takes into account the possible variation of the fundamental frequency over the analysis frame, the errors that the detector 4 can make, as well as inaccuracies linked to the discrete nature of the frequency sampling. For this reason, the search for the amplitudes of the spectral peaks does not simply consist in taking the values LX(i) corresponding to the indices i such that $i F_e / 2N$ is the frequency closest to a harmonic of frequency $k F_0$ (k ≥ 1). The spectral amplitude retained for a harmonic of order k is a local maximum of the modulus of the spectrum in the vicinity of the frequency $k F_0$ (this amplitude is obtained directly in compressed form when the spectral compression 8 is carried out before the extraction of the maxima 9).

Figures 4 and 5 show an example of the shape of the compressed spectrum LX, where it can be seen that the maximum amplitudes of the harmonic peaks do not necessarily coincide with the amplitudes corresponding to the integer multiples of the estimated fundamental frequency $F_0$. The sides of the peaks being quite steep, a small positioning error of the fundamental frequency $F_0$, amplified by the harmonic index k, can strongly distort the estimated upper envelope of the spectrum and cause poor modeling of the formant structure of the signal. For example, taking the spectral amplitude directly at the frequency $3 F_0$ in the case of Figures 4 and 5 would produce a significant error in the extraction of the upper envelope in the vicinity of the harmonic of order k = 3, whereas this is an energetically significant zone in the example shown. By performing the interpolation from the true maximum, this kind of estimation error of the upper envelope is avoided.

In the example shown in Figure 4, the interpolation is carried out between points whose abscissa is the frequency corresponding to the maximum of the amplitude of a spectral peak, and whose ordinate is this maximum, before or after compression.

The interpolation carried out to calculate the upper envelope LX_sup is a simple linear interpolation. Of course, another form of interpolation could be used (for example polynomial or spline).

In the preferred variant represented in Figure 5, the interpolation is carried out between points whose abscissa is a frequency $k F_0$ that is a multiple of the fundamental frequency (in fact the closest frequency in the discrete spectrum) and whose ordinate is the maximum amplitude, before or after compression, of the spectrum in the vicinity of this multiple frequency.

By comparing Figures 4 and 5, it can be seen that the extraction mode according to Figure 5, which repositions the peaks on the harmonic frequencies, leads to better precision on the amplitude of the peaks that the decoder will assign to the frequencies that are multiples of the fundamental frequency. A slight frequency shift of the position of these peaks may occur, which is not perceptually very important and is not avoided in the case of Figure 4 either. In the case of Figure 4, the anchor points for the interpolation coincide with the vertices of the harmonic peaks. In the case of Figure 5, these anchor points are constrained to lie precisely at the frequencies that are multiples of the fundamental frequency, their amplitudes corresponding to those of the peaks.
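
The extraction mode of Figure 5 can be sketched as below: for each harmonic, the anchor abscissa is the FFT bin nearest to $k F_0$ and the anchor ordinate is the local maximum of the compressed spectrum around that bin, followed by linear interpolation. The function name `upper_envelope`, the fixed half-width of 5 bins, and the constant extension outside the first and last anchors (the behaviour of `np.interp`) are assumptions of this sketch.

```python
import numpy as np

def upper_envelope(LX, F0, F_e=8000.0, N=256, half_width=5):
    """LX: compressed modulus on bins 0..N-1 of a 2N-point FFT.
    Returns the interpolated upper envelope and the anchor bins."""
    anchors_i, anchors_a = [], []
    n_harm = int((F_e / 2) // F0)
    for k in range(1, n_harm + 1):
        i_k = int(np.floor(2 * N * k * F0 / F_e + 0.5))   # bin nearest to k*F0
        if i_k >= N:
            break
        lo, hi = max(i_k - half_width, 0), min(i_k + half_width + 1, N)
        anchors_i.append(i_k)                 # abscissa: harmonic position
        anchors_a.append(LX[lo:hi].max())     # ordinate: local maximum near k*F0
    env = np.interp(np.arange(N), anchors_i, anchors_a)
    return env, anchors_i

# Toy spectrum: peaks displaced by 2 bins from the harmonics of 250 Hz
LX = np.zeros(256)
for k in range(1, 16):
    LX[16 * k + 2] = float(k)
env, idx = upper_envelope(LX, F0=250.0)
```

In this toy case the peak amplitudes are recovered even though the peaks do not sit exactly on the harmonic bins.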

The search interval for the maximum amplitude associated with a harmonic of rank k is centred on the index i of the FFT frequency closest to $k F_0$, i.e. $i = \lfloor 2N k F_0 / F_e + 1/2 \rfloor$, where $\lfloor a \rfloor$ denotes the integer equal to or immediately below the number a. The width of this search interval depends on the sampling frequency $F_e$, the size 2N of the FFT and the range of possible variation of the fundamental frequency. This width is typically of the order of ten frequencies with the example values previously considered. It can be made adjustable as a function of the value $F_0$ of the fundamental frequency and of the rank k of the harmonic.

In order to improve the resolution at low frequencies and therefore to represent more faithfully the amplitudes of the harmonics in this zone, a non-linear distortion of the frequency scale is applied to the compressed upper envelope by a module 12 before the module 13 performs the inverse fast Fourier transform (IFFT) providing the cepstral coefficients cx_sup.

The non-linear distortion makes it possible to minimize the modeling error more effectively. It is for example carried out according to a Mel or Bark type frequency scale. This distortion may possibly depend on the estimated fundamental frequency F ₀ . Figure 1 illustrates the case of the Mel scale. The relationship between the frequencies F of the linear spectrum, expressed in hertz, and the frequencies F 'of the Mel scale is as follows:

In order to limit the transmission rate, the cepstral coefficients cx_sup are truncated. The IFFT module 13 needs to calculate only a cepstral vector of NCS cepstral coefficients of orders 0 to NCS-1. As an example, NCS can be equal to 16.
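
The truncated cepstral representation can be illustrated with the sketch below. It deliberately omits the non-linear frequency distortion, assumes the compressed envelope is available on bins 0 to N inclusive so that an even-symmetric spectrum of 2N points can be formed, and uses hypothetical function names.

```python
import numpy as np

def truncated_cepstrum(LX_half, n_coef=16):
    """LX_half: compressed envelope on bins 0..N (N+1 values).
    Returns the first n_coef cepstral coefficients."""
    full = np.concatenate([LX_half, LX_half[-2:0:-1]])   # even symmetry, 2N points
    c = np.fft.ifft(full).real                           # real for a symmetric input
    return c[:n_coef]

def envelope_from_cepstrum(c, n_bins):
    """Evaluate c[0] + 2*sum_i c[i]*cos(i*w) for n_bins frequencies w in [0, pi]."""
    w = np.linspace(0, np.pi, n_bins)
    i = np.arange(1, len(c))
    return c[0] + 2 * np.cos(np.outer(w, i)) @ c[1:]

N = 256
w = np.linspace(0, np.pi, N + 1)
LX_half = 1.0 + 0.5 * np.cos(w) + 0.2 * np.cos(3 * w)    # smooth toy envelope
c = truncated_cepstrum(LX_half, 16)
LX_rebuilt = envelope_from_cepstrum(c, N + 1)
```

A smooth envelope such as this toy example is represented exactly by a few low-order coefficients, which is what makes the truncation to NCS coefficients economical.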

A post-filtering in the cepstral domain, called post-liftering, is applied by a module 15 to the compressed upper envelope LX_sup. This post-liftering corresponds to a manipulation of the cepstral coefficients cx_sup delivered by the IFFT module 13, which corresponds approximately to a post-filtering of the harmonic part of the signal by a transfer function having the classical form:

$$H(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}\,\left(1 - \mu z^{-1}\right) \tag{7}$$

where A(z) is the transfer function of a linear prediction filter of the audio signal, $\gamma_1$ and $\gamma_2$ are coefficients between 0 and 1, and μ is a pre-emphasis coefficient, possibly zero. The relation between the post-liftered coefficient of order i, denoted $c_p(i)$, and the corresponding cepstral coefficient c(i) = cx_sup(i) delivered by the module 13 is then:

$$c_p(0) = c(0), \qquad c_p(i) = \left(1 + \gamma_2^{\,i} - \gamma_1^{\,i}\right) c(i) - \frac{\mu^{i}}{i} \quad \text{for } i > 0 \tag{8}$$
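
The post-liftering relation can be sketched as follows, reading relation (8) as $c_p(0) = c(0)$ and $c_p(i) = (1 + \gamma_2^i - \gamma_1^i)\,c(i) - \mu^i / i$ for i > 0. The function name `post_lifter` and the numerical values are hypothetical.

```python
import numpy as np

def post_lifter(c, gamma1, gamma2, mu=0.0):
    """Apply the post-liftering to a vector of cepstral coefficients c."""
    c = np.asarray(c, dtype=float)
    c_p = c.copy()                          # the coefficient of order 0 is unchanged
    i = np.arange(1, len(c))
    c_p[1:] = (1 + gamma2 ** i - gamma1 ** i) * c[1:] - mu ** i / i
    return c_p

c = np.arange(1, 17) / 10.0                 # dummy vector of 16 coefficients
g1, g2 = 0.7, 0.9
c_same = post_lifter(c, g1, g1)             # equal gammas, no pre-emphasis
c_p = post_lifter(c, g1, g2, mu=(g2 - g1) * c[1])
```

With equal values of the two γ coefficients and μ = 0 the coefficients are left unchanged, and choosing μ equal to $(\gamma_2 - \gamma_1)\,c(1)$ leaves the coefficient of order 1 unchanged.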

The optional pre-emphasis coefficient μ can be controlled by imposing the constraint of preserving the value of the cepstral coefficient cx_sup(1), which relates to the spectral slope. Indeed, the value c(1) = cx_sup(1) of a white noise filtered by the pre-emphasis filter corresponds to the pre-emphasis coefficient. The latter can thus be chosen as follows: $\mu = (\gamma_2 - \gamma_1)\,c(1)$.

After the post-lifter 15, a normalization module 16 further modifies the cepstral coefficients by imposing the constraint of exact modeling of one point of the initial spectrum, which is preferably the most energetic point among the spectral maxima extracted by the module 9. In practice, this normalization only modifies the value of the coefficient $c_p(0)$.

The normalization module 16 operates as follows: it recalculates a value of the synthesized spectrum at the frequency of the maximum indicated by the module 9, by Fourier transform of the truncated and post-liftered cepstral coefficients, taking into account the non-linear distortion of the frequency axis; it determines a normalization gain $g_N$ as the logarithmic difference between the value of the maximum provided by the module 9 and this recalculated value; and it adds the gain $g_N$ to the post-liftered cepstral coefficient $c_p(0)$.
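
The normalization step can be sketched as below, assuming logarithmic compression and the usual cosine-series evaluation of the log-envelope from the cepstral coefficients. The frequency `w_max` stands for the (already distorted) normalized frequency of the retained maximum; both function names and all numerical values are hypothetical.

```python
import numpy as np

def log_envelope_at(c, w):
    """Log-envelope c[0] + 2*sum_i c[i]*cos(i*w) at normalized frequency w (radians)."""
    i = np.arange(1, len(c))
    return c[0] + 2 * np.sum(c[1:] * np.cos(i * w))

def normalize(c_p, w_max, target_log_amp):
    """Add to c_p[0] the gain g_N that makes the envelope exact at w_max."""
    g_N = target_log_amp - log_envelope_at(c_p, w_max)
    out = np.array(c_p, dtype=float)
    out[0] += g_N                           # only the coefficient of order 0 changes
    return out, g_N

c_p = np.array([0.5, 0.2, -0.1, 0.05])      # dummy post-liftered coefficients
c_norm, g_N = normalize(c_p, w_max=0.3, target_log_amp=2.0)
```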

This normalization can be regarded as part of the post-liftering.

The post-liftered and normalized cepstral coefficients are quantized by a module 18, which transmits corresponding quantization indexes icxs to the output multiplexer 6 of the coder.

The module 18 can operate by vector quantization of cepstral vectors formed from the post-liftered and normalized coefficients, denoted here cx[n] for the signal frame of rank n. For example, the cepstral vector cx[n] of NCS = 16 cepstral coefficients cx[n,0], cx[n,1], ..., cx[n,NCS-1] is divided into four cepstral sub-vectors each containing four coefficients of consecutive orders. The cepstral vector cx[n] can be processed by the means shown in Figure 6, which form part of the quantization module 18. These means implement, for each component cx[n,i], a predictor of the form:

$$cx_p[n,i] = \left(1 - \alpha(i)\right) rcx[n,i] + \alpha(i)\, rcx[n-1,i] \tag{9}$$

where rcx[n] denotes a prediction residual vector for the frame of rank n, whose components are respectively denoted rcx[n,0], rcx[n,1], ..., rcx[n,NCS-1], and α(i) denotes a prediction coefficient chosen to be representative of an assumed inter-frame correlation. After quantization of the residues, this residual vector is defined by:

$$rcx[n,i] = \frac{cx[n,i] - \alpha(i)\, rcx\_q[n-1,i]}{2 - \alpha(i)} \tag{10}$$

where rcx_q[n-1] denotes the quantized residual vector for the frame of rank n-1, whose components are respectively denoted rcx_q[n-1,0], rcx_q[n-1,1], ..., rcx_q[n-1,NCS-1].

The numerator of relation (10) is obtained by a subtractor 20, the components of whose output vector are divided by the quantities 2-α(i) at 21. For the purposes of quantization, the residual vector rcx[n] is subdivided into four sub-vectors, corresponding to the subdivision into four cepstral sub-vectors. On the basis of a dictionary obtained by prior training, the unit 22 carries out the vector quantization of each sub-vector of the residual vector rcx[n]. This quantization can consist, for each sub-vector srcx[n], in selecting in the dictionary the quantized sub-vector srcx_q[n] which minimizes the quadratic error $\| srcx[n] - srcx\_q[n] \|^2$. The set icxs of the quantization indexes icx, corresponding to the addresses in the dictionary or dictionaries of the quantized residual sub-vectors srcx_q[n], is supplied to the output multiplexer 6.

The unit 22 also delivers the values of the quantized residual sub-vectors, which form the vector rcx_q[n]. The latter is delayed by one frame at 23, and its components are multiplied by the coefficients α(i) at 24 to provide the vector at the negative input of the subtractor 20. This last vector is also supplied to an adder 25, the other input of which receives a vector formed by the components of the quantized residue rcx_q[n] respectively multiplied by the quantities 1-α(i) at 26. The adder 25 thus delivers the quantized cepstral vector cx_q[n] that the decoder will recover.
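
The dictionary search performed for each residual sub-vector can be sketched as follows. The dictionaries used here are random placeholders standing in for trained ones, their size of 8 entries is arbitrary, and the function names are hypothetical; only the selection criterion (minimum quadratic error) comes from the description.

```python
import numpy as np

def quantize_subvector(srcx, codebook):
    """Return (index, codeword) minimizing the quadratic error ||srcx - codeword||^2."""
    errors = np.sum((codebook - srcx) ** 2, axis=1)
    idx = int(np.argmin(errors))
    return idx, codebook[idx]

def quantize_residual(rcx, codebooks):
    """Split rcx into len(codebooks) equal sub-vectors and quantize each one."""
    subs = np.split(np.asarray(rcx, dtype=float), len(codebooks))
    picks = [quantize_subvector(s, cb) for s, cb in zip(subs, codebooks)]
    indexes = [p[0] for p in picks]
    rcx_q = np.concatenate([p[1] for p in picks])
    return indexes, rcx_q

rng = np.random.default_rng(1)
codebooks = [rng.standard_normal((8, 4)) for _ in range(4)]   # placeholder dictionaries
chosen = [3, 0, 5, 7]
rcx = np.concatenate([cb[p] for cb, p in zip(codebooks, chosen)])
icxs, rcx_q = quantize_residual(rcx, codebooks)
```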

The prediction coefficient α(i) can be optimized separately for each of the cepstral coefficients. The quantization dictionaries can also be optimized separately for each of the four cepstral sub-vectors. Moreover, it is possible, in a manner known per se, to normalize the cepstral vectors before applying the prediction/quantization scheme, on the basis of the variance of the cepstra.

It should be noted that the above scheme for quantizing the cepstral coefficients may be applied for only some of the frames. For example, a second quantization mode can be provided, together with a selection process which retains whichever of the two modes minimizes a least-squares criterion with respect to the cepstral coefficients to be quantized, and a bit indicating which of the two modes has been selected can be transmitted with the quantization indexes of the frame.

The quantized cepstral coefficients cx_sup_q = cx_q[n] supplied by the adder 25 are sent to a module 28 which recalculates the spectral amplitudes associated with one or more of the harmonics of the fundamental frequency $F_0$ (Figure 1). These spectral amplitudes are for example calculated in compressed form, by applying the Fourier transform to the quantized cepstral coefficients, taking into account the non-linear distortion of the frequency scale used in the cepstral transformation. The amplitudes thus recalculated are supplied to an adaptation module 29 which compares them with the amplitudes of the maxima determined by the extraction module 9.

The adaptation module 29 controls the post-lifter 15 so as to minimize a modulus deviation between the spectrum of the audio signal and the corresponding modulus values calculated at 28. This modulus deviation can be expressed as a sum of absolute values of differences of amplitudes, compressed or not, corresponding to one or more of the harmonic frequencies. This sum can be weighted according to the spectral amplitudes associated with these frequencies.

Optimally, the modulus deviation taken into account in the adaptation of the post-liftering would cover all the harmonics of the spectrum. However, in order to reduce the complexity of the optimization, the module 28 can resynthesize the spectral amplitudes only for one or more frequencies that are multiples of the fundamental frequency $F_0$, selected on the basis of the magnitude of the modulus of the spectrum in absolute value. The adaptation module 29 can for example consider the three most intense spectral peaks in the calculation of the modulus deviation to be minimized.

In another embodiment, the adaptation module 29 estimates a spectral masking curve of the audio signal by means of a psychoacoustic model, and the frequencies taken into account in the calculation of the modulus deviation to be minimized are selected on the basis of the magnitude of the modulus of the spectrum relative to the masking curve (for example, the three frequencies for which the modulus of the spectrum exceeds the masking curve the most can be taken). Various conventional methods can be used to calculate the masking curve from the audio signal. One can for example use the one developed by J.D. Johnston (“Transform Coding of Audio Signals Using Perceptual Noise Criteria”, IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988).

To carry out the adaptation of the post-liftering, the module 29 can use a filter identification model. A simpler method consists in predefining a set of sets of post-liftering parameters, that is to say a set of pairs ($\gamma_1$, $\gamma_2$) in the case of a post-liftering according to relations (8), in carrying out the operations incumbent on the modules 15, 16, 18 and 28 for each of these sets of parameters, and in retaining the set of parameters which leads to the minimum modulus deviation between the signal spectrum and the recalculated values. The quantization indexes provided by the module 18 are then those which relate to the best set of parameters.

By a process analogous to that of the extraction of the coefficients cx_sup representing the compressed upper envelope LX_sup of the signal spectrum, the coder determines coefficients cx_inf representing a compressed lower envelope LX_inf. A module 30 extracts from the compressed spectrum LX spectral amplitudes associated with frequencies located in zones of the spectrum that are intermediate with respect to the frequencies that are multiples of the estimated fundamental frequency $F_0$.

In the example illustrated by Figures 4 and 5, each amplitude associated with a frequency situated in an intermediate zone between two successive harmonics $k F_0$ and $(k+1) F_0$ simply corresponds to the modulus of the spectrum for the frequency $(k+1/2) F_0$ located in the middle of the interval separating the two harmonics. In another embodiment, this amplitude could be an average of the modulus of the spectrum over a small range surrounding this frequency $(k+1/2) F_0$.

A module 31 carries out an interpolation, for example linear, of the spectral amplitudes associated with the frequencies located in the intermediate zones to obtain the compressed lower envelope LX_inf.
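
The lower envelope extraction can be sketched as follows: the compressed spectrum is sampled at the FFT bins nearest to $(k + 1/2) F_0$ for k ≥ 1 and interpolated linearly. The function name `lower_envelope` and the constant extension outside the first and last anchors (the behaviour of `np.interp`) are assumptions of this sketch.

```python
import numpy as np

def lower_envelope(LX, F0, F_e=8000.0, N=256):
    """LX: compressed modulus on bins 0..N-1 of a 2N-point FFT."""
    anchors_i, anchors_a = [], []
    for k in range(1, N):
        # Bin nearest to the midpoint between harmonics k and k+1
        i_k = int(np.floor(2 * N * (k + 0.5) * F0 / F_e + 0.5))
        if i_k >= N:
            break
        anchors_i.append(i_k)
        anchors_a.append(LX[i_k])
    return np.interp(np.arange(N), anchors_i, anchors_a)

LX = np.arange(256.0)                 # dummy compressed spectrum
LX_inf = lower_envelope(LX, F0=250.0)
```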

The cepstral transformation applied to this compressed lower envelope LX_inf is carried out according to a frequency scale resulting from a non-linear distortion applied by a module 32. The IFFT module 33 calculates a cepstral vector of NCI cepstral coefficients cx_inf of orders 0 to NCI-1 representing the lower envelope. NCI is a number which can be significantly smaller than NCS, for example NCI = 4.

The non-linear transformation of the frequency scale for the cepstral transformation of the lower envelope can be performed towards a finer scale at high frequencies than at low frequencies, which advantageously makes it possible to model the unvoiced components of the signal at high frequencies. However, to ensure a uniformity of representation between the upper envelope and the lower envelope, it may be preferable to adopt in module 32 the same scale as in module 12 (Mel in the example considered).

The cepstral coefficients cx_inf representing the compressed lower envelope are quantized by a module 34, which can operate in the same way as the module 18 for quantizing the cepstral coefficients representing the compressed upper envelope. In the case considered, where we limit ourselves to NCI = 4 cepstral coefficients for the lower envelope, the vector thus formed is subjected to a vector quantization of the prediction residue, carried out by means identical to those represented in FIG. 6 but without subdivision into sub-vectors. The quantization index icx = icxi determined by the vector quantizer 22 for each frame relative to the coefficients cx_inf is supplied to the output multiplexer 6 of the coder.

The coder shown in FIG. 1 does not include any particular device for coding the phases of the spectrum at the harmonics of the audio signal. On the other hand, it includes means 36-40 for coding temporal information linked to the phase of the non-harmonic component represented by the lower envelope.

A spectral decompression module 36 and a TFRI module 37 form a temporal estimate of the frame of the non-harmonic component. The module 36 applies a decompression function that is the reciprocal of the compression function applied by the module 8 (that is to say an exponential or a power 1/γ function) to the compressed lower envelope LX_inf produced by the interpolation module 31. This provides the modulus of the estimated frame of the non-harmonic component, the phase of which is taken to be equal to the phase φ_X of the spectrum of the signal X over the frame. The inverse Fourier transform performed by the module 37 provides the estimated frame of the non-harmonic component.

The module 38 subdivides this estimated frame of the non-harmonic component into several time segments. Since the frame delivered by the module 37 consists of 2N = 512 weighted samples as illustrated in FIG. 3, the module 38 considers only the first N/2 = 128 samples and the last N/2 = 128 samples, and subdivides them for example into eight segments of 32 consecutive samples, each representing 4 ms of signal.

For each segment, the module 38 calculates the energy equal to the sum of the squares of the samples, and forms a vector E1 formed by eight positive real components equal to the eight calculated energies. The largest of these eight energies, denoted EM, is also determined to be supplied, with the vector E1, to a normalization module 39. The latter divides each component of the vector E1 by EM, so that the normalized vector Emix is formed of eight components between 0 and 1. It is this normalized vector Emix, or weighting vector, which is subject to quantization by module 40. This can perform vector quantization with a dictionary determined during a prior learning. The quantization index iEm is supplied by the module 40 to the output multiplexer 6 of the coder.
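The energy computation and normalization attributed to modules 38 and 39 can be sketched as follows. This is only an illustrative sketch: the function name `energy_weighting` and the handling of an all-zero frame are assumptions, and the input is taken to be the 256 retained samples already arranged in time order.

```python
def energy_weighting(samples, seg_len=32):
    """Return (Emix, EM) for a list of retained samples.

    samples: the retained time samples (256 in the example of the text).
    seg_len: segment length (32 samples, i.e. 4 ms at 8 kHz in the text).
    """
    n_seg = len(samples) // seg_len
    # Vector E1: energy of each segment = sum of the squares of its samples.
    e1 = [
        sum(s * s for s in samples[k * seg_len:(k + 1) * seg_len])
        for k in range(n_seg)
    ]
    em = max(e1)  # largest energy EM
    if em == 0.0:
        # Assumption: a silent frame gives an all-zero weighting vector.
        return [0.0] * n_seg, 0.0
    # Normalized vector Emix: every component lies between 0 and 1.
    return [e / em for e in e1], em


if __name__ == "__main__":
    frame = [1.0] * 32 + [0.5] * 224
    emix, em = energy_weighting(frame)
    print(em, emix)
```

The vector returned here is what the text submits to the quantization module 40; the quantizer itself is not sketched.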

FIG. 7 shows an alternative embodiment of the means used by the coder of FIG. 1 to determine the energy weighting vector Emix of the frame of the non-harmonic component. The spectral decompression and TFRI modules 36, 37 operate like those bearing the same references in FIG. 1. A selection module 42 is added to determine the value of the modulus of the spectrum subjected to the inverse Fourier transform 37. On the basis of the estimated fundamental frequency F₀, the module 42 identifies harmonic regions and non-harmonic regions of the spectrum of the audio signal. For example, a frequency is considered to belong to a harmonic region if it lies in a frequency interval centered on a harmonic kF₀ and whose width corresponds to the width of a synthesized spectral line, and to a non-harmonic region otherwise. In the non-harmonic regions, the complex signal subjected to the TFRI 37 is equal to the value of the spectrum, that is to say its modulus and its phase correspond to the values |X| and φ_X provided by the TFR module 3. In the harmonic regions, this complex signal has the same phase φ_X as the spectrum and a modulus given by the lower envelope after spectral decompression 36. This way of proceeding according to FIG. 7 provides a more precise modeling of the non-harmonic regions.

The decoder shown in Figure 8 includes an input demultiplexer 45 which extracts from the bit stream Φ, coming from a coder according to FIG. 1, the indexes iF, icxs, icxi, iEm of quantization of the fundamental frequency F₀, of the cepstral coefficients representing the compressed upper envelope, of the coefficients representing the compressed lower envelope, and of the weighting vector Emix, and distributes them respectively to modules 46, 47, 48 and 49. These modules 46-49 include quantization dictionaries similar to those of modules 5, 18, 34 and 40 of FIG. 1, in order to restore the values of the quantized parameters. The modules 47 and 48 have dictionaries to form the quantized prediction residues rcx_q[n], and they deduce therefrom the quantized cepstral vectors cx_q[n] with elements identical to the elements 23-26 of figure 6. These quantized cepstral vectors cx_q[n] provide the cepstral coefficients cx_sup_q and cx_inf_q processed by the decoder.

A module 51 calculates the fast Fourier transform of the cepstral coefficients cx_sup for each signal frame. The frequency scale of the resulting compressed spectrum is modified non-linearly by a module 52 applying the non-linear transformation reciprocal to that of module 12 of figure 1, which provides the estimate LX_sup of the compressed upper envelope. A spectral decompression of LX_sup, operated by a module 53, provides the upper envelope X_sup comprising the estimated values of the modulus of the spectrum at the frequencies that are multiples of the fundamental frequency F₀.

The module 54 synthesizes the spectral estimate X_v of the harmonic component of the audio signal, as a sum of spectral lines centered on the multiples of the fundamental frequency F₀ and whose amplitudes (in modulus) are those given by the upper envelope X_sup.

Although the digital input stream Φ does not contain specific information on the phase of the signal spectrum at the harmonics of the fundamental frequency, the decoder in FIG. 8 is capable of extracting information on this phase from the cepstral coefficients cx_sup_q representing the compressed upper envelope. This phase information is used to assign a phase φ(k) to each of the spectral lines determined by the module 54 in the estimation of the harmonic component of the signal.

As a first approximation, the speech signal can be considered to be minimum-phase. Moreover, it is known that minimum phase information can easily be deduced from a cepstral model. Minimum phase information is therefore calculated for each harmonic frequency. The minimum phase assumption means that the energy of the synthesized signal is localized at the start of each period of the fundamental frequency F₀. To be closer to a real speech signal, a little dispersion is introduced by means of a specific post-liftering of the cepstra during the synthesis of the phase. With this post-liftering, carried out by the module 55 in FIG. 8, it is possible to accentuate the formant resonances of the envelope and therefore to control the dispersion of the phases. This post-liftering is for example of the form (8).

To limit phase discontinuities, it is preferable to smooth the post-liftered cepstral coefficients, which is done by module 56. Module 57 deduces from the post-liftered and smoothed cepstral coefficients the minimum phase assigned to each spectral line representing a harmonic peak of the spectrum.

The operations performed by the modules 56, 57 for smoothing and extracting the minimum phase are illustrated by the flowchart in FIG. 9. The module 56 examines the variations of the cepstral coefficients in order to apply a lesser smoothing in the presence of sudden variations than in the presence of slow variations. For this, it performs the smoothing of the cepstral coefficients by means of a forgetting factor λ_c chosen as a function of a comparison between a threshold d_th and a distance d between two successive sets of post-liftered cepstral coefficients. The threshold d_th is itself adapted as a function of the variations of the cepstral coefficients. The first step 60 consists in calculating the distance d between the two successive vectors relating to the frames n-1 and n. These vectors, denoted here cxp[n-1] and cxp[n], correspond for each frame to the set of NCS post-liftered cepstral coefficients representing the compressed upper envelope. The distance used can in particular be the Euclidean distance between the two vectors or even a quadratic distance.

Two smoothings are first carried out, respectively by means of forgetting factors λ_min and λ_max, to determine a minimum distance d_min and a maximum distance d_max. The threshold d_th is then determined in step 70 as being located between the minimum and maximum distances d_min, d_max: d_th = β·d_max + (1 - β)·d_min, the coefficient β being for example equal to 0.5.

In the example shown, the forgetting factors λ_min and λ_max are themselves selected from two distinct values, respectively λ_min1, λ_min2 and λ_max1, λ_max2, between 0 and 1, the values λ_min1, λ_max1 each being substantially closer to 0 than the values λ_min2, λ_max2. If d > d_min (test 61), the forgetting factor λ_min is equal to λ_min1 (step 62); otherwise it is taken equal to λ_min2 (step 63). In step 64, the minimum distance d_min is taken equal to λ_min·d_min + (1 - λ_min)·d. If d > d_max (test 65), the forgetting factor λ_max is equal to λ_max1 (step 66); otherwise it is taken equal to λ_max2 (step 67). In step 68, the maximum distance d_max is taken equal to λ_max·d_max + (1 - λ_max)·d.

If the distance d between the two consecutive cepstral vectors is greater than the threshold d_th (test 71), a value λ_c1 relatively close to 0 is adopted for the forgetting factor λ_c (step 72). In this case, the corresponding signal is considered to be of the non-stationary type, so that there is no need to keep a long memory of the previous cepstral coefficients. If d < d_th, a value λ_c2 less close to 0 is adopted for the forgetting factor λ_c in step 73, in order to smooth the cepstral coefficients more strongly. Smoothing is performed in step 74, where the vector cxl[n] of smoothed coefficients for the current frame n is determined by: cxl[n] = λ_c·cxl[n-1] + (1 - λ_c)·cxp[n] (11)
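As an illustration of tests 71 to 74, the following sketch selects the forgetting factor from the distance between successive cepstral vectors and then applies relation (11). The numerical values of `lam_c1` and `lam_c2` and the function names are hypothetical; the text only requires λ_c1 to be closer to 0 than λ_c2. The threshold `d_th` is taken as an input here rather than adapted as in steps 61 to 70.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two cepstral vectors (step 60)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smooth_cepstrum(cxl_prev, cxp_prev, cxp_cur, d_th, lam_c1=0.1, lam_c2=0.8):
    """Return the smoothed vector cxl[n] per relation (11).

    cxl_prev: smoothed vector cxl[n-1] of the previous frame.
    cxp_prev, cxp_cur: post-liftered vectors cxp[n-1] and cxp[n].
    lam_c1, lam_c2: hypothetical forgetting factors (lam_c1 closer to 0).
    """
    d = euclidean_distance(cxp_prev, cxp_cur)
    # Test 71: a large distance means a non-stationary signal, so less memory.
    lam_c = lam_c1 if d > d_th else lam_c2
    # Step 74: first-order recursive smoothing.
    return [lam_c * p + (1.0 - lam_c) * c for p, c in zip(cxl_prev, cxp_cur)]


if __name__ == "__main__":
    print(smooth_cepstrum([0.0, 0.0], [0.0, 0.0], [1.0, 0.0], d_th=0.5))
    print(smooth_cepstrum([0.0, 0.0], [0.0, 0.0], [1.0, 0.0], d_th=2.0))
```

With a small threshold the new vector is followed almost immediately; with a large threshold the previous smoothed value dominates.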

The module 57 then calculates the minimum phases φ (k) associated with the harmonics kF ₀ . In known manner, the minimum phase for a harmonic of order k is given by:

$$\varphi(k) = -2 \sum_{m=1}^{NCS-1} cxl[n,m] \, \sin\left(2\pi m k F_0 / F_e\right) \qquad (12)$$

where cxl[n, m] denotes the smoothed cepstral coefficient of order m for the frame n.

In step 75, the harmonic index k is initialized to 1. To initialize the calculation of the minimum phase assigned to the harmonic k, the phase φ(k) and the cepstral index m are initialized respectively to 0 and 1 in step 76. In step 77, the module 57 adds to the phase φ(k) the quantity -2·cxl[n, m]·sin(2πmkF₀/F_e). The cepstral index m is incremented in step 78 and compared to NCS in step 79. Steps 77 and 78 are repeated as long as m < NCS. When m = NCS, the calculation of the minimum phase is completed for the harmonic k, and the index k is incremented in step 80. The calculation of minimum phases 76-79 is repeated for the following harmonic as long as kF₀ < F_e/2 (test 81). In the exemplary embodiment according to FIG. 8, the module 54 takes account of a constant phase over the width of each spectral line, equal to the minimum phase φ(k) supplied for the corresponding harmonic k by the module 57.
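The loop of steps 75 to 81 can be sketched as below. This is a minimal illustration under stated assumptions: the function name `minimum_phases` is hypothetical, and `cxl` is assumed to be a list holding the NCS smoothed cepstral coefficients of the current frame, indexed by order m.

```python
import math


def minimum_phases(cxl, f0, fe):
    """Return [phi(1), phi(2), ...] for all harmonics k with k*f0 < fe/2.

    cxl: smoothed cepstral coefficients of orders 0 to NCS-1 for frame n.
    f0:  fundamental frequency in Hz; fe: sampling frequency in Hz.
    """
    ncs = len(cxl)
    phases = []
    k = 1                                   # step 75
    while k * f0 < fe / 2.0:                # test 81
        phi = 0.0                           # step 76
        for m in range(1, ncs):             # steps 77-79, m = 1 .. NCS-1
            phi += -2.0 * cxl[m] * math.sin(2.0 * math.pi * m * k * f0 / fe)
        phases.append(phi)
        k += 1                              # step 80
    return phases


if __name__ == "__main__":
    # Toy vector with a single non-zero coefficient of order 1.
    print(minimum_phases([0.0, 0.5], f0=1000.0, fe=8000.0))
```

The coefficient of order 0 does not contribute, since the sum in relation (12) starts at m = 1.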

The estimate X_v of the harmonic component is synthesized by summing spectral lines positioned at the harmonic frequencies of the fundamental frequency F₀. During this synthesis, the spectral lines can be positioned on the frequency axis with a resolution finer than the resolution of the Fourier transform. For that, a reference spectral line is precomputed once and for all at the higher resolution. This calculation can consist of a Fourier transform of the analysis window f_A with a transform size of 16384 points, providing a resolution of approximately 0.5 Hz per point. The synthesis of each harmonic line is then carried out by the module 54 by positioning the high-resolution reference line on the frequency axis, and by sub-sampling this reference spectral line to come down to the resolution of 15.625 Hz of the Fourier transform on 512 points. This makes it possible to position the spectral line precisely.
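The resolutions quoted above follow from the sampling frequency: assuming F_e = 8 kHz (the case used throughout the example), a 16384-point transform gives 8000/16384 ≈ 0.488 Hz per point and a 512-point transform gives 8000/512 = 15.625 Hz per point, a ratio of 32. The sketch below shows one possible way of sub-sampling the reference line; the function name, the `half_width_bins` parameter and the rounding to the nearest fine-grid point are assumptions, not details given in the text.

```python
FE = 8000.0          # assumed sampling frequency in Hz
N_FINE = 16384       # size of the high-resolution transform
N_COARSE = 512       # size of the synthesis transform (2N)
STEP = N_FINE // N_COARSE   # 32 fine points per coarse bin


def line_sample_indices(f_harm, half_width_bins=2):
    """Return (coarse_bin, fine_offset) pairs for one harmonic line.

    fine_offset is the index, relative to the peak of the precomputed
    reference line, of the fine-grid value to copy into coarse_bin.
    """
    fine_center = round(f_harm * N_FINE / FE)      # harmonic on the fine grid
    coarse_center = round(f_harm * N_COARSE / FE)  # nearest coarse bin
    pairs = []
    for i in range(coarse_center - half_width_bins,
                   coarse_center + half_width_bins + 1):
        pairs.append((i, i * STEP - fine_center))
    return pairs


if __name__ == "__main__":
    print(FE / N_FINE, FE / N_COARSE, STEP)
    print(line_sample_indices(100.0, half_width_bins=1))
```

Because the offsets are generally not multiples of 32, two harmonics falling in neighbouring coarse bins sample the reference line at different sub-bin positions, which is what allows the precise placement described above.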

For the determination of the lower envelope, the TFR module 85 of the decoder of FIG. 8 receives the NCI quantized cepstral coefficients cx_inf_q of orders 0 to NCI - 1, and it advantageously supplements them with the NCS - NCI cepstral coefficients cx_sup_q of orders NCI to NCS - 1 representing the upper envelope. Indeed, it can be estimated as a first approximation that the rapid variations of the compressed lower envelope are well reproduced by those of the compressed upper envelope. In another embodiment, the TFR module 85 could consider only the NCI cepstral coefficients cx_inf_q.

The module 86 converts the frequency scale reciprocally to the conversion operated by the module 32 of the coder, in order to restore the estimate LX_inf of the compressed lower envelope, which is submitted to the spectral decompression module 87. At the output of the module 87, the decoder has a lower envelope X_inf comprising the values of the modulus of the spectrum in the valleys located between the harmonic peaks.

This envelope X_inf will modulate the spectrum of a noise frame whose phase is processed as a function of the quantized weighting vector Emix extracted by the module 49. A generator 88 delivers a normalized noise frame whose 4 ms segments are weighted in a module 89 in accordance with the normalized components of the Emix vector provided by module 49 for the current frame. This noise is a high-pass filtered white noise, to take account of the low level which the unvoiced component has in principle at low frequencies. From the energy-weighted noise, the module 90 forms frames of 2N = 512 samples by applying the analysis window f_A, the insertion of 256 samples at zero and the circular permutation for the phase compensation, in accordance with what has been explained with reference to Figure 3. The Fourier transform of the resulting frame is calculated by the TFR module 91.

The spectral estimate X_uv of the non-harmonic component is determined by the spectral synthesis module 92, which performs a frequency-by-frequency weighting. This weighting consists in multiplying each complex spectral value supplied by the TFR module 91 by the value of the lower envelope X_inf obtained for the same frequency by the spectral decompression module 87.

The spectral estimates X_v, X_uv of the harmonic (voiced in the case of a speech signal) and non-harmonic (or unvoiced) components are combined by a mixing module 95 controlled by a module 96 for analyzing the degree of harmonicity (or voicing) of the signal.

The organization of these modules 95, 96 is illustrated by FIG. 10. The analysis module 96 comprises a unit 97 for estimating a frequency-dependent degree of voicing W, from which four frequency-dependent gains are calculated, namely two gains g_v, g_uv controlling the relative importance of the harmonic and non-harmonic components in the synthesized signal, and two gains g_v_φ, g_uv_φ used to noise the phase of the harmonic component.

The degree of voicing W(i) is a continuously variable value between 0 and 1 determined for each frequency index i (0 ≤ i < N) as a function of the upper envelope X_sup(i) and the lower envelope X_inf(i) obtained for this frequency i by the decompression modules 53, 87. The degree of voicing W(i) is estimated by the unit 97 for each frequency index i corresponding to a harmonic of the fundamental frequency F₀, namely

$$i = \left\lfloor 2N k F_0 / F_e + 1/2 \right\rfloor \quad \text{for } k = 1, 2, \ldots$$

by an increasing function of the ratio between the upper envelope X_sup and the lower envelope X_inf at this frequency, for example according to the formula:

$$W(i) = \min\left\{ 1, \; \frac{10 \log_{10}\left( X_{sup}(i) / X_{inf}(i) \right)}{V_{th}(F_0)} \right\} \qquad (13)$$

The threshold Vth (F ₀ ) corresponds to the average dynamics calculated on a synthetic spectrum purely voiced at the fundamental frequency. It is advantageously chosen depending on the fundamental frequency F ₀ .

The degree of voicing W (i) for a frequency other than the harmonic frequencies is obtained simply as being equal to that estimated for the nearest harmonic.

The gain g_v(i), which depends on the frequency, is obtained by applying a non-linear function to the degree of voicing W(i) (block 98). This non-linear function has for example the form represented in figure 11:

$$g_v(i) = \begin{cases} 0 & \text{if } 0 \le W(i) < W1 \\ \dfrac{W(i) - W1}{W2 - W1} & \text{if } W1 \le W(i) < W2 \\ 1 & \text{if } W2 \le W(i) \le 1 \end{cases} \qquad (14)$$

the thresholds W1, W2 being such that 0 < W1 < W2 < 1. The gain g_uv can be calculated in a similar way to the gain g_v (the sum of the two gains g_v, g_uv being constant, for example equal to 1), or simply deduced from it by the relation g_uv(i) = 1 - g_v(i), as shown diagrammatically by the subtractor 99 in FIG. 10.

It is useful to be able to noise the phase of the harmonic component of the signal at a given frequency if the analysis of the degree of voicing shows that the signal is rather of non-harmonic type at this frequency. For this, the phase φ'_v of the mixed harmonic component is the result of a linear combination of the phases φ_v, φ_uv of the harmonic and non-harmonic components X_v, X_uv synthesized by the modules 54, 92. The gains g_v_φ, g_uv_φ respectively applied to these phases are calculated from the degree of voicing W and also weighted as a function of the frequency index i, since the phase noise is only really useful beyond a certain frequency.
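The piecewise-linear mapping of relation (14) and the complementary gain obtained by the subtractor 99 can be sketched as follows. The threshold values `w1` and `w2` and the function names are hypothetical; the text only requires 0 < W1 < W2 < 1.

```python
def gain_v(w, w1=0.3, w2=0.7):
    """Gain g_v as a function of the degree of voicing w in [0, 1]."""
    if w < w1:
        return 0.0                      # mostly unvoiced: no harmonic part
    if w < w2:
        return (w - w1) / (w2 - w1)     # linear ramp between the thresholds
    return 1.0                          # clearly voiced


def gain_uv(w, w1=0.3, w2=0.7):
    """Complementary gain g_uv = 1 - g_v (subtractor 99)."""
    return 1.0 - gain_v(w, w1, w2)


if __name__ == "__main__":
    for w in (0.1, 0.5, 0.9):
        print(w, gain_v(w), gain_uv(w))
```

Choosing the complementary form keeps the sum of the two gains equal to 1 at every frequency, which is the simpler of the two options mentioned in the text.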

A first gain g_v1_φ is calculated by applying a non-linear function to the degree of voicing W(i), as shown diagrammatically by block 100 in FIG. 10. This non-linear function can have the form represented in FIG. 12:

$$g_{v1\_\varphi}(i) = \begin{cases} G1 & \text{if } 0 \le W(i) < W3 \\ G1 + (1 - G1) \, \dfrac{W(i) - W3}{W4 - W3} & \text{if } W3 \le W(i) < W4 \\ 1 & \text{if } W4 \le W(i) \le 1 \end{cases} \qquad (15)$$

the thresholds W3 and W4 being such that 0 < W3 < W4 < 1, and the minimum gain G1 being between 0 and 1.

A multiplier 101 multiplies, for each frequency index i, the gain g_v1_φ by another gain g_v2_φ depending only on the frequency index i, to form the gain g_v_φ(i). The gain g_v2_φ(i) depends non-linearly on the frequency index i, for example as shown in Figure 13:

$$g_{v2\_\varphi}(i) = \begin{cases} 1 & \text{if } 0 \le i < i1 \\ 1 - (1 - G2) \, \dfrac{i - i1}{i2 - i1} & \text{if } i1 \le i < i2 \\ G2 & \text{if } i2 \le i < N \end{cases} \qquad (16)$$

the indices i1 and i2 being such that 0 < i1 < i2 < N, and the minimum gain G2 being between 0 and 1. The gain g_uv_φ(i) can be calculated simply as being equal to 1 - g_v_φ(i) = 1 - g_v1_φ(i)·g_v2_φ(i) (subtractor 102 of figure 10).

The complex spectrum Y of the synthesized signal is produced by the mixing module 95, which implements the following mixing relation, for 0 ≤ i < N:

$$Y(i) = g_v(i) \, |X_v(i)| \, \exp\left[ j \varphi'_v(i) \right] + g_{uv}(i) \, X_{uv}(i) \qquad (17)$$

with

$$\varphi'_v(i) = g_{v\_\varphi}(i) \, \varphi_v(i) + g_{uv\_\varphi}(i) \, \varphi_{uv}(i) \qquad (18)$$

where φ_v(i) denotes the argument of the complex number X_v(i) supplied by the module 54 for the frequency of index i (block 104 of figure 10), and φ_uv(i) denotes the argument of the complex number X_uv(i) supplied by the module 92 (block 105 of FIG. 10). This combination is carried out by the multipliers 106-110 and the adders 111-112 represented in FIG. 10.
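For a single frequency index, the mixing relations (17) and (18) can be sketched as below. This is an illustrative sketch only: the function name `mix_bin` is hypothetical, and the phases are taken as principal arguments via `cmath.phase`, which is an assumption about how the arguments of blocks 104 and 105 are represented.

```python
import cmath


def mix_bin(x_v, x_uv, g_v, g_uv, g_v_phi, g_uv_phi):
    """Return Y(i) from the harmonic and non-harmonic spectral values.

    x_v, x_uv: complex values X_v(i) and X_uv(i).
    g_v, g_uv: amplitude gains; g_v_phi, g_uv_phi: phase gains.
    """
    # Relation (18): noised phase of the harmonic component.
    phi_mixed = g_v_phi * cmath.phase(x_v) + g_uv_phi * cmath.phase(x_uv)
    # Relation (17): harmonic modulus with mixed phase, plus weighted noise.
    return g_v * abs(x_v) * cmath.exp(1j * phi_mixed) + g_uv * x_uv


if __name__ == "__main__":
    xv = 2.0 * cmath.exp(0.3j)
    xuv = 0.5 * cmath.exp(-1.1j)
    print(mix_bin(xv, xuv, 1.0, 0.0, 1.0, 0.0))   # fully voiced: equals xv
    print(mix_bin(xv, xuv, 0.0, 1.0, 0.0, 1.0))   # fully unvoiced: equals xuv
```

The two limiting cases show that the relation degenerates to the pure harmonic or pure noise spectrum when the gains are at their extremes.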

The mixed spectrum Y(i) for 0 ≤ i < 2N (with Y(2N-1-i) = Y(i)) is then transformed to the time domain by the TFRI module 115 (Figure 8). Only the first N/2 = 128 and the last N/2 = 128 samples of the frame of 2N = 512 samples produced by the module 115 are retained, and the circular permutation inverse to that illustrated in FIG. 3 is applied to obtain the synthesized frame of N = 256 samples weighted by the analysis window f_A. The frames successively obtained in this way are finally processed by the time synthesis module 116, which forms the decoded audio signal x.

The time synthesis module 116 performs an overlap-add of frames modified with respect to those successively evaluated at the output of the module 115. The modification can be seen as two steps, illustrated respectively in FIGS. 14 and 15.

The first step (Figure 14) consists in multiplying each frame 2' delivered by the TFRI module 115 by a window 1/f_A, the inverse of the analysis window f_A used by the module 1 of the coder. The samples of the resulting frame 2" are therefore weighted uniformly.

The second step (FIG. 15) consists in multiplying the samples of this frame 2" by a synthesis window f_s verifying the following properties:

$$f_s(N - L + i) + f_s(i) = A \quad \text{for } 0 \le i < L \qquad (19)$$

$$f_s(i) = A \quad \text{for } L \le i < N - L \qquad (20)$$

where A denotes an arbitrary positive constant, for example A = 1. The synthesis window f_s(i) gradually increases from 0 to A for i going from 0 to L. It is for example a raised half-sinusoid:

$$f_s(i) = \frac{A}{2} \left( 1 - \cos\left[ (i + 1/2) \pi / L \right] \right) \quad \text{for } 0 \le i < L \qquad (21)$$

After reweighting each frame 2" by the synthesis window f_s, the module 116 positions the successive frames with their time offsets of M = 160 samples and their time overlaps of L = 96 samples, then sums the frames thus positioned in time. Due to the properties (19) and (20) of the synthesis window f_s, each sample of the decoded audio signal x thus obtained is assigned a uniform overall weight equal to A. This overall weight comes from the contribution of a single frame if the sample has in this frame a rank i such that L ≤ i < N - L, and comprises the summed contributions of two successive frames if 0 ≤ i < L or N - L ≤ i < N.
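The uniform overall weight can be checked numerically. The sketch below builds a window satisfying (19) to (21) with N = 256 and L = 96, overlap-adds copies of it at offsets of M = 160 samples, and observes that the summed weight equals A everywhere except in the first and last L samples, which have no neighbouring frame. The function names are hypothetical.

```python
import math


def synthesis_window(n, l, a=1.0):
    """Window equal to a in the middle, with raised half-sinusoid edges."""
    fs = [a] * n
    for i in range(l):
        rise = 0.5 * a * (1.0 - math.cos((i + 0.5) * math.pi / l))
        fs[i] = rise                 # relation (21), rising edge
        fs[n - l + i] = a - rise     # relation (19), falling edge
    return fs


def overlap_add(frames, m):
    """Sum frames positioned with a time offset of m samples."""
    n = len(frames[0])
    out = [0.0] * (m * (len(frames) - 1) + n)
    for idx, frame in enumerate(frames):
        for i, s in enumerate(frame):
            out[idx * m + i] += s
    return out


if __name__ == "__main__":
    N, M, L = 256, 160, 96
    fs = synthesis_window(N, L)
    total = overlap_add([fs] * 4, M)
    steady = total[L:-L]
    print(min(steady), max(steady))
```

Feeding the window itself as every frame is equivalent to synthesizing a constant signal, so the printed extremes directly show the overall weight applied to each output sample.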

The time synthesis can thus be performed in a simple way even if, as in the case considered, the overlap L between two successive frames is smaller than half the size N of these frames.

The two steps set out above for modifying the signal frames can be merged into a single step. It suffices to precompute a compound window f_c(i) = f_s(i)/f_A(i), and simply to multiply the frames 2' of N = 256 samples delivered by the module 115 by the compound window f_c before performing the overlap-add.

FIG. 16 shows the appearance of the compound window f_c in the case where the analysis window f_A is a Hamming window and the synthesis window f_s has the form given by the relations (19) to (21).

Other forms of the synthesis window f_s verifying the relations (19) and (20) can be used. In the variant of FIG. 17, it is a piecewise affine function defined by f_s(i) = A·i/L for 0 ≤ i < L (22)

In order to improve the coding quality of the audio signal, the coder in FIG. 1 can increase the rate of formation and analysis of the frames, in order to transmit more quantization parameters to the decoder. In the frame structure represented in figure 2, a frame of N = 256 samples (32 ms) is formed every 20 ms. These frames of 256 samples could be formed at a higher rate, for example every 10 ms, two successive frames then having an offset of M/2 = 80 samples and an overlap of 176 samples.

Under these conditions, it is possible to transmit the complete sets of quantization parameters iF, icxs, icxi, iEm for only a subset of the frames, and to transmit for the other frames parameters making it possible to carry out an adequate interpolation at the decoder. In the example envisaged above, the subset for which complete sets of parameters are transmitted can be constituted by the frames of integer rank n, whose periodicity is M/F_e = 20 ms, and the frames for which an interpolation is performed can be those of half-integer rank n + 1/2, which are offset by 10 ms relative to the frames of the subset.

In the embodiment illustrated in FIG. 18, the notations cx_q[n-1] and cx_q[n] denote quantized cepstral vectors determined, for two successive frames of integer rank, by the quantization module 18 and/or by the quantization module 34. These vectors comprise for example four consecutive cepstral coefficients each. They could also include more cepstral coefficients. A module 120 performs an interpolation of these two cepstral vectors cx_q[n-1] and cx_q[n], in order to estimate an intermediate value cx_i[n-1/2]. The interpolation performed by the module 120 can be a simple arithmetic mean of the vectors cx_q[n-1] and cx_q[n]. As a variant, the module 120 could apply a more sophisticated interpolation formula, for example polynomial, also relying on the cepstral vectors obtained for frames prior to the frame n-1. Moreover, if more than one interpolated frame is interposed between two consecutive frames of integer rank, the interpolation takes account of the relative position of each interpolated frame.

Using the means described above, the coder also calculates the cepstral coefficients cx[n-1/2] relating to the frame of half-integer rank. In the case of the upper envelope, these cepstral coefficients are those provided by the TFRI module 13 after post-liftering 15 (for example with the same post-liftering coefficients as for the previous frame n-1) and normalization 16. In the case of the lower envelope, the cepstral coefficients cx[n-1/2] are those delivered by the TFRI module 33.

A subtractor 121 forms the difference ecx[n-1/2] between the cepstral coefficients cx[n-1/2] calculated for the frame of half-integer rank and the coefficients cx_i[n-1/2] estimated by interpolation. This difference is supplied to a quantization module 122 which sends quantization indexes icx[n-1/2] to the output multiplexer 6 of the coder. The module 122 operates for example by vector quantization of the interpolation errors ecx[n-1/2] successively determined for the frames of half-integer rank.

This quantization of the interpolation error can be carried out by the coder for each of the NCS + NCI cepstral coefficients used by the decoder, or only for some of them, typically those of the lowest orders.

The corresponding means of the decoder are illustrated in FIG. 19.

The decoder essentially functions as that described with reference to Figure 8 to determine the signal frames of integer rank. An interpolation module 124 identical to the module 120 of the coder estimates the intermediate coefficients cx_i[n-1/2] from the quantized coefficients cx_q[n-1] and cx_q[n] supplied by the module 47 and/or the module 48 from the indexes icxs, icxi extracted from the stream Φ. A parameter extraction module 125 receives the quantization index icx[n-1/2] from the input demultiplexer 45 of the decoder, and deduces therefrom the quantized interpolation error ecx_q[n-1/2] using the same quantization dictionary as that used by the module 122 of the coder. An adder 126 sums the cepstral vectors cx_i[n-1/2] and ecx_q[n-1/2] in order to provide the cepstral coefficients cx[n-1/2] which will be used by the decoder (modules 51-57, 95, 96, 115 and/or modules 85-87, 92, 95, 96, 115) to form the interpolated frame of rank n-1/2.
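The interplay between the coder side (modules 120 to 122) and the decoder side (modules 124 to 126) can be sketched as follows. This is a simplified illustration: the uniform scalar quantizer with step `STEP_Q` is a hypothetical stand-in for the dictionary-based vector quantization of module 122, and all function names are assumptions.

```python
STEP_Q = 0.05  # hypothetical quantization step (stand-in for a VQ dictionary)


def interpolate(cxq_prev, cxq_cur):
    """Arithmetic mean of cx_q[n-1] and cx_q[n] (modules 120 and 124)."""
    return [0.5 * (a + b) for a, b in zip(cxq_prev, cxq_cur)]


def encode_half_frame(cx_half, cxq_prev, cxq_cur):
    """Coder: quantize the interpolation error ecx[n-1/2]."""
    cx_i = interpolate(cxq_prev, cxq_cur)
    ecx = [c - i for c, i in zip(cx_half, cx_i)]      # subtractor 121
    return [round(e / STEP_Q) for e in ecx]           # indexes icx[n-1/2]


def decode_half_frame(icx, cxq_prev, cxq_cur):
    """Decoder: interpolate, then add the dequantized error (adder 126)."""
    cx_i = interpolate(cxq_prev, cxq_cur)
    return [i + k * STEP_Q for i, k in zip(cx_i, icx)]


if __name__ == "__main__":
    prev, cur = [1.0, 0.0], [2.0, 1.0]
    icx = encode_half_frame([1.62, 0.41], prev, cur)
    print(icx, decode_half_frame(icx, prev, cur))
```

Because both sides interpolate from the same quantized vectors, only the small residual has to be transmitted for the half-integer frame.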

If only some of the cepstral coefficients have been the subject of an interpolation error quantification, the others are determined by the decoder by a simple interpolation, without correction.

The decoder can also interpolate the other parameters F ₀ , Emix used to synthesize the signal frames. The fundamental frequency F ₀ can be interpolated linearly, either in the time domain, or (preferably) directly in the frequency domain. For the possible interpolation of the energy weighting vector Emix, the interpolation should be carried out after denormalization and of course taking account of the time offsets between frames.

It should be noted that it is particularly advantageous, to interpolate the representation of the spectral envelopes, to perform this interpolation in the cepstral domain. Contrary to an interpolation carried out on other parameters, such as the LSP coefficients (“Line Spectrum Pairs”), the linear interpolation of the cepstral coefficients corresponds to the linear interpolation of the compressed spectral amplitudes.

In the variant represented in FIG. 20, the coder uses the cepstral vectors cx_q[n], cx_q[n-1], ..., cx_q[n-r] and cx[n-1/2] calculated for the last frames processed (r ≥ 1) to identify an optimal interpolator filter which, when applied to the quantized cepstral vectors cx_q[n-r], ..., cx_q[n] relating to frames of integer rank, delivers an interpolated cepstral vector cx_i[n-1/2] at minimum distance from the vector cx[n-1/2] calculated for the last frame of half-integer rank.

In the example shown in FIG. 20, this interpolator filter 128 is present in the coder, and a subtractor 129 subtracts its output cx_i[n-1/2] from the calculated cepstral vector cx[n-1/2]. A minimization module 130 determines the set of parameters {P} of the interpolator filter 128 for which the interpolation error ecx[n-1/2] delivered by the subtractor 129 has a minimum norm. This set of parameters {P} is sent to a quantization module 131 which provides a corresponding quantization index iP to the output multiplexer 6 of the coder.

Depending on the bit rate allocated in the stream Φ to the quantization indexes of the parameters {P} defining the optimal interpolator filter 128, a more or less fine quantization of these parameters can be adopted, or a more or less elaborate form of the interpolator filter, or even several separately quantized interpolator filters for different vectors of cepstral coefficients.

In a simple embodiment, the interpolator filter 128 is linear, with r = 1: cx_i[n-1/2] = ρ·cx_q[n-1] + (1 - ρ)·cx_q[n] (23) and the set of parameters {P} is limited to the coefficient ρ between 0 and 1.
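For the linear case of relation (23), the minimization performed by module 130 has a closed form if the norm is taken to be Euclidean, which is an assumption made here for illustration (the text does not specify the norm). Writing a = cx_q[n-1], b = cx_q[n] and c = cx[n-1/2], the error c - b - ρ(a - b) is minimized by projecting c - b onto a - b. The function name and the fallback value for identical vectors are hypothetical.

```python
def optimal_rho(cx_half, cxq_prev, cxq_cur):
    """Least-squares coefficient rho of relation (23), clipped to [0, 1]."""
    diff = [a - b for a, b in zip(cxq_prev, cxq_cur)]       # a - b
    target = [c - b for c, b in zip(cx_half, cxq_cur)]      # c - b
    energy = sum(d * d for d in diff)
    if energy == 0.0:
        # Assumption: identical neighbours make rho irrelevant; use the mean.
        return 0.5
    rho = sum(t * d for t, d in zip(target, diff)) / energy
    return min(1.0, max(0.0, rho))


def interpolate_linear(rho, cxq_prev, cxq_cur):
    """cx_i[n-1/2] = rho * cx_q[n-1] + (1 - rho) * cx_q[n]."""
    return [rho * a + (1.0 - rho) * b for a, b in zip(cxq_prev, cxq_cur)]


if __name__ == "__main__":
    rho = optimal_rho([0.5, 0.5], [0.0, 0.0], [2.0, 2.0])
    print(rho, interpolate_linear(rho, [0.0, 0.0], [2.0, 2.0]))
```

In this toy case the half-integer vector lies a quarter of the way from cx_q[n-1] to cx_q[n], so the interpolated vector reproduces it exactly.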

From the quantization indexes iP of the parameters {P} obtained in the bit stream Φ, the decoder reconstructs the interpolator filter 128 (except for quantization errors), and processes the cepstral vectors cx_q[n-r], ..., cx_q[n] in order to estimate the cepstral coefficients cx[n-1/2] used to synthesize the frames of half-integer rank.

In general, the decoder can use a simple interpolation method (without transmission of parameters from the coder for the frames of half-integer rank), an interpolation method taking into account a quantized interpolation error (according to Figures 18 and 19), or an interpolation method with an optimal interpolator filter (according to Figure 20) to evaluate the frames of half-integer rank in addition to the frames of integer rank evaluated directly as explained with reference to Figures 8 to 13. The time synthesis module 116 can then combine all of these evaluated frames to form the synthesized signal x in the manner explained below with reference to Figures 14, 21 and 22.

As in the time synthesis method previously described, the module 116 performs an overlap-add of frames modified with respect to those successively evaluated at the output of the module 115, and this modification can be seen as two steps, the first of which is identical to that previously described with reference to FIG. 14 (dividing the samples of the frame 2' by the analysis window f_A).

The second step (Figure 21) consists in multiplying the samples of the renormalized frame 2" by a synthesis window f_s verifying the following properties:

$$f_s(i) = 0 \quad \text{for } 0 \le i < N/2 - M/p \text{ and } N/2 + M/p \le i < N \qquad (24)$$

$$f_s(i) + f_s(i + M/p) = A \quad \text{for } N/2 - M/p \le i < N/2 \qquad (25)$$

where A denotes an arbitrary positive constant, for example A = 1, and p is the integer such that the time offset between the successive frames (calculated directly and interpolated) is M/p samples, i.e. p = 2 in the example described. The synthesis window f_s(i) gradually increases for i going from N/2 - M/p to N/2. It is for example a raised sinusoid on the interval N/2 - M/p ≤ i < N/2 + M/p. In particular, the synthesis window f_s can be, over this interval, a Hamming window (as represented in FIG. 21) or a Hanning window.

FIG. 21 shows the successive frames 2" repositioned in time by the module 116. The hatching indicates the portions eliminated from the frames (synthesis window equal to 0). It can be seen that, when the overlap-add of the samples of the successive frames is performed, property (25) ensures a homogeneous weighting of the samples of the synthesized signal.
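The homogeneous weighting can be checked numerically by summing copies of the synthesis window shifted by M/p samples. The sketch below assumes a Hanning-shaped synthesis window with A = 1 and example values N = 256, M/p = 80; these values and the function names are illustrative only.

```python
import math


def hann_synthesis_window(N, h, A=1.0):
    """Raised sinusoid on [N/2 - h, N/2 + h), zero elsewhere (h = M/p)."""
    lo = N // 2 - h
    fs = [0.0] * N
    for i in range(lo, N // 2 + h):
        fs[i] = A * 0.5 * (1.0 - math.cos(math.pi * (i - lo) / h))
    return fs


def overlap_add_weights(N, h, n_frames, A=1.0):
    """Total weight applied to each output sample after overlap-add."""
    fs = hann_synthesis_window(N, h, A)
    total = [0.0] * (N + (n_frames - 1) * h)
    for k in range(n_frames):       # frame k starts at sample k * h
        for i in range(N):
            total[k * h + i] += fs[i]
    return total


weights = overlap_add_weights(N=256, h=80, n_frames=5)
# Away from the first and last frames, every sample receives weight A.
interior = weights[128:128 + 4 * 80]
```

Only the leading edge of the first frame and the trailing edge of the last frame receive a weight below A, since no neighbouring frame completes the sum there.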

As in the synthesis method illustrated by FIGS. 14 and 15, the weighting of the frames obtained by inverse Fourier transform of the spectra Y can be carried out in a single step, with a compound window $f_c(i) = f_s(i) / f_A(i)$. Figure 22 shows the shape of the compound window $f_c$ in the case where the windows $f_A$ and $f_s$ are of the Hamming type.
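The equivalence between the two-step weighting and a single multiplication by a compound window can be illustrated as follows. The sketch assumes that the compound window is the ratio of the synthesis window to the analysis window, a symmetric Hamming analysis window (which never reaches zero, so the division is safe) and a Hanning-shaped synthesis window. The sizes and function names are hypothetical.

```python
import math


def hamming(N):
    """Symmetric Hamming analysis window (assumed form)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (N - 1)) for i in range(N)]


def hann_synthesis_window(N, h, A=1.0):
    """Raised sinusoid on [N/2 - h, N/2 + h), zero elsewhere (h = M/p)."""
    lo = N // 2 - h
    fs = [0.0] * N
    for i in range(lo, N // 2 + h):
        fs[i] = A * 0.5 * (1.0 - math.cos(math.pi * (i - lo) / h))
    return fs


def compound_window(fs, fa):
    """Single-step window: synthesis window divided by analysis window."""
    return [s / a for s, a in zip(fs, fa)]


N, h = 256, 80                                   # example values only
fa, fs = hamming(N), hann_synthesis_window(N, h)
fc = compound_window(fs, fa)
signal = [math.sin(0.1 * i) for i in range(N)]
frame = [x * a for x, a in zip(signal, fa)]      # frame as seen after analysis
one_step = [y * c for y, c in zip(frame, fc)]    # single multiplication
two_step = [(y / a) * s for y, a, s in zip(frame, fa, fs)]
```

Precomputing the compound window once saves one division per sample at synthesis time.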

Like the time synthesis method illustrated by figures 14 to 17, the method illustrated by figures 14, 21 and 22 makes it possible to accommodate an overlap L between two analysis frames (those for which the analysis is carried out completely) smaller than half the size N of these frames. In general, this latter method is applicable when the successive analysis frames have mutual time offsets M of more than N/2 samples (possibly even more than N samples if a very low bit rate is required), the interpolation leading to a set of frames whose mutual time offsets are less than N/2 samples.
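The relation between N, M and p can be made concrete with a small helper. It is a sketch that takes the overlap between two analysis frames to be L = N - M (zero when M exceeds N) and that requires M to be divisible by p. The function name, the dictionary keys and the numerical values are hypothetical.

```python
def frame_layout(N, M, p):
    """Summarize the frame geometry for frame size N, offset M and factor p."""
    if M % p != 0:
        raise ValueError("this sketch assumes M is divisible by p")
    return {
        # Overlap between two fully analyzed frames.
        "overlap_L": max(N - M, 0),
        # Offset between successive frames once interpolated frames are added.
        "offset_interp": M // p,
        # Conditions stated above: M > N/2 and M/p < N/2 (integer form).
        "applicable": 2 * M > N and 2 * M < N * p,
    }


layout = frame_layout(N=256, M=160, p=2)
sparse = frame_layout(N=256, M=300, p=3)   # M > N: analysis frames do not overlap
```

With a sampling rate of 8 kHz, the first example corresponds to an analysis frame every 20 ms and a synthesized frame every 10 ms.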

The interpolated frames can be the subject of a reduced transmission of coding parameters, as described above, but this is not compulsory. This embodiment makes it possible to maintain a relatively large interval M between two analysis frames, and therefore to limit the required transmission rate, while limiting the discontinuities likely to appear because of the size of this interval relative to the time scales typical of the variations of the parameters of the audio signal, in particular the cepstral coefficients and the fundamental frequency.

Claims


1. Method for analyzing an audio signal (x) processed by successive frames of N samples, in which the samples of each frame are weighted by an analysis window (f_A) of the Hamming, Hanning, Kaiser or similar type, a spectrum of the audio signal is computed by transforming each frame of weighted samples into the frequency domain, and the spectrum of the audio signal is processed to deliver synthesis parameters (cx_sup, cx_inf, Emix) for a signal derived from the analyzed audio signal, characterized in that the successive frames comprise an alternation of frames for which complete sets of synthesis parameters are delivered and of frames for which incomplete sets of synthesis parameters are delivered, and in that the successive frames for which complete sets of synthesis parameters are delivered have mutual overlaps of less than N/2 samples.

2. Method according to claim 1, in which the incomplete sets of synthesis parameters include data (icx [n-1/2]) representing an interpolation error (ecx [n-1/2]) of at least one of the synthesis parameters.

3. Method according to claim 1, in which the incomplete sets of synthesis parameters include data (iP) representing a filter (128) of interpolation of at least one of the synthesis parameters.

4. Method according to any one of claims 1 to 3, in which the processing of the spectrum of the audio signal (x) comprises an extraction of coding parameters (cx_sup, cx_inf, Emix) with a view to transmission and/or storage of the coded audio signal.

5. Method according to any one of claims 1 to 3, wherein the processing of the spectrum of the audio signal (x) includes denoising by spectral subtraction.

6. Audio processing device, comprising analysis means for executing a method according to any one of claims 1 to 5.

7. Method for synthesizing an audio signal, in which successive spectral estimates (Y) are obtained corresponding respectively to frames of N samples of the audio signal weighted by an analysis window (f_A), the successive frames having mutual overlaps of L samples, each frame of the audio signal is evaluated by transforming the spectral estimates into the time domain, and the evaluated frames are combined to form the synthesized signal (x), characterized in that each evaluated frame is modified by applying to it a processing corresponding to a division by said analysis window (f_A) and to a multiplication by a synthesis window (f_s), and the synthesized signal is formed as an overlap-add of the modified frames, and in that, the number L being smaller than N/2 and the samples of a frame having ranks i numbered from 0 to N-1, the synthesis window f_s(i) satisfies f_s(N-L+i) + f_s(i) = A for 0 < i < L, and is equal to A for L < i < N-L, A being a positive constant.

8. Method according to claim 7, in which the synthesis window f_s(i) increases from 0 to A for i ranging from 0 to L.

9. The method of claim 8, wherein the synthesis window f _s (i) for 0 <i <L is a raised half-sinusoid.

10. Method of synthesizing an audio signal, in which a set of successive overlapping frames of N samples of the audio signal weighted by an analysis window (f_A) is evaluated by transforming spectral estimates (Y) corresponding respectively to said frames, and the evaluated frames are combined to form the synthesized signal (x), characterized in that, for a subset of the evaluated frames, the spectral estimates are obtained by processing synthesis parameters (cx_sup_q, cx_inf_q, Emix) respectively associated with the frames of said subset while, for the frames not forming part of the subset, the spectral estimates are obtained with an interpolation of at least part of the synthesis parameters, in that the frames of said subset have mutual time offsets of M samples, the number M being greater than N/2, while the successive frames of said set have mutual time offsets of M/p samples, p being an integer greater than 1, in that each evaluated frame is modified by applying to it a processing corresponding to a division by said analysis window (f_A) and to a multiplication by a synthesis window (f_s), and the synthesized signal is formed as an overlap-add of the modified frames, and in that, the samples of a frame having ranks i numbered from 0 to N-1, the synthesis window f_s(i) has a support limited to the ranks i going from N/2 - M/p to N/2 + M/p and satisfies f_s(i) + f_s(i + M/p) = A for N/2 - M/p < i < N/2, A being a positive constant.

11. The method of claim 10, wherein the synthesis window fs (i) increases for i ranging from N / 2 - M / p to N / 2.

12. The method of claim 11, wherein the synthesis window f _s (i) for N / 2 - M / p <i <N / 2 + M / p is a raised sinusoid.

13. Method according to any one of claims 10 to 12, in which data (icx_q [n-1/2]) representing an interpolation error (ecx_q [n-1/2]) are associated with the frames not forming part of said subset, and are used to correct at least one of the interpolated synthesis parameters (cx_i [n-1/2]).

14. Method according to any one of claims 10 to 12, in which data (iP) representing an interpolating filter (128) are associated with the frames not forming part of said subset, and are used to interpolate at least one of the synthesis parameters.

15. Method according to any one of claims 10 to 14, in which the synthesis parameters include cepstral coefficients (cx [n]) subjected to the interpolation.

16. Audio processing device, comprising synthesis means for executing a method according to any one of claims 7 to 15.