WO2011073600A1

WO2011073600A1 - Parametric stereo encoding/decoding having downmix optimisation

Info

Publication number: WO2011073600A1
Application number: PCT/FR2010/052807
Authority: WO
Inventors: Stéphane RAGOT; Thi Minh Nguyet Hoang; Balazs Kovesi
Original assignee: France Telecom
Priority date: 2009-12-18
Filing date: 2010-12-17
Publication date: 2011-06-23

Abstract

The present invention relates to a method for parametric encoding of a stereo digital audio signal comprising a step of encoding (312) a mono signal produced by downmixing (307) applied to the stereo signal and a step of encoding spatialisation information (315, 316) of the stereo signal. Said method is characterised in that the downmixing comprises calculating (330), per frequency coefficient, the amplitude of the mono signal according to the amplitudes of the channels of the stereo signal, and determining (331), for a predetermined set of frequency coefficients, the phase of the mono signal by calculating the phase of the sum signal of the channels of the stereo signal. According to the invention, the transmitted spatialisation information is also adapted to said downmixing. The invention also relates to the corresponding decoding method, and to the encoder and decoder implementing said respective methods.

Description

Stereo parametric coding / decoding with channel reduction processing optimization

The present invention relates to the field of coding / decoding of digital signals.

The coding and decoding according to the invention is particularly suitable for the transmission and / or storage of digital signals such as audio-frequency signals (speech, music or other).

More particularly, the present invention relates to the parametric encoding / decoding of multichannel audio signals, especially stereophonic signals hereinafter called stereo signals.

This type of coding / decoding is based on the extraction of spatial information parameters so that at decoding, these spatial characteristics can be reconstructed for the listener, in order to recreate the same spatial image as in the original signal.

Such a parametric encoding / decoding technique is for example described in the document entitled "Parametric Coding of Stereo Audio" by J. Breebaart, S. van de Par, A. Kohlrausch and E. Schuijers, in EURASIP Journal on Applied Signal Processing, 2005:9, pp. 1305-1322. This example is summarized with reference to FIGS. 1 and 2, which respectively describe a parametric stereo encoder and a parametric stereo decoder.

Thus, FIG. 1 describes an encoder receiving two audio channels, a left channel (denoted L) and a right channel (denoted R).

The time-domain channels L(n) and R(n) are processed by the blocks 101, 102 and 103, 104 respectively, which perform a short-term Fourier analysis. The transformed signals L[j] and R[j] are thus obtained.

Block 105 performs a channel reduction processing, or "downmix", to obtain in the frequency domain, from the left and right signals, a monophonic signal (hereinafter called the mono signal), which is here a sum signal.

Extraction of spatial information parameters is also performed in block 105. The extracted parameters are as follows.

The ICLD (InterChannel Level Difference) parameters, also called interchannel intensity differences, characterize the energy ratios per frequency subband between the left and right channels. These parameters make it possible to position sound sources in the stereo horizontal plane by panning. They are defined in dB by the following formula:

$$\mathrm{ICLD}[k] = 10 \log_{10}\!\left(\frac{\sum_{j=B[k]}^{B[k+1]-1} L[j]\, L^{*}[j]}{\sum_{j=B[k]}^{B[k+1]-1} R[j]\, R^{*}[j]}\right) \ \mathrm{dB} \qquad (1)$$

where L[j] and R[j] correspond to the spectral (complex) coefficients of the L and R channels, the values B[k] and B[k+1], for each frequency band k, define the division of the spectrum into subbands, and the symbol * indicates the complex conjugate.

The ICPD (InterChannel Phase Difference) parameters, also called phase differences, are defined according to the following relation:

$$\mathrm{ICPD}[k] = \angle\!\left(\sum_{j=B[k]}^{B[k+1]-1} L[j]\, R^{*}[j]\right) \qquad (2)$$

where ∠ indicates the argument (phase) of the complex operand.
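The two definitions above can be illustrated with a short sketch. It is only a minimal example under stated assumptions: the function names `icld_db` and `icpd`, the subband limits passed as `lo` and `hi`, and the toy spectra are hypothetical and do not come from the cited article.

```python
import cmath
import math

def icld_db(L, R, lo, hi):
    """ICLD in dB over coefficients lo..hi-1: left energy over right energy.

    Assumes the right-channel energy is non-zero in the subband.
    """
    e_left = sum(abs(x) ** 2 for x in L[lo:hi])
    e_right = sum(abs(x) ** 2 for x in R[lo:hi])
    return 10.0 * math.log10(e_left / e_right)

def icpd(L, R, lo, hi):
    """ICPD in radians: argument of the sum of L[j] * conj(R[j])."""
    cross = sum(l * r.conjugate() for l, r in zip(L[lo:hi], R[lo:hi]))
    return cmath.phase(cross)

# Toy subband (illustrative values): the left channel has twice the
# amplitude of the right channel and leads it by 90 degrees.
R = [1 + 0j, 0.5 + 0.5j]
L = [2j * r for r in R]

print(icld_db(L, R, 0, 2))  # about 6.02 dB, since the energy ratio is 4
print(icpd(L, R, 0, 2))     # about pi / 2
```

In this toy case the amplitude ratio of 2 gives an energy ratio of 4, hence roughly 6 dB, and the phase difference is recovered directly from the cross-spectrum.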

Equivalently to the ICPD, an inter-channel time shift called ICTD (InterChannel Time Difference) can also be defined; its definition is not repeated here.

The ICC (InterChannel Coherence) parameters represent the inter-channel correlation (or coherence) and are associated with the spatial width of the sound sources; their definition is not recalled here, but the article by Breebaart et al. notes that these parameters are not necessary in subbands reduced to a single frequency coefficient.

These ICLD, ICPD and ICC parameters are extracted from the stereo signals, by block 105.

The mono signal is converted to the time domain (blocks 106 to 108) by short-term Fourier synthesis (inverse FFT, windowing and overlap-add, or OLA), and a mono coding is then performed (block 109). In parallel, the stereo parameters are quantized and coded in block 110.

In general, the spectrum of the signals (L[j], R[j]) is divided according to a non-linear frequency scale of ERB (Equivalent Rectangular Bandwidth) or Bark type, with a number of subbands typically ranging from 20 to 34 for a signal sampled at 16 to 48 kHz. This scale defines the values of B(k) and B(k+1) for each subband k. The parameters (ICLD, ICPD, ICC) are encoded by scalar quantization, possibly followed by entropy coding or differential coding. For example, in the article previously cited, the ICLD is encoded by a non-uniform quantizer (ranging from -50 to +50 dB) with differential coding. The non-uniform quantization step exploits the fact that the higher the value of the ICLD, the lower the auditory sensitivity to variations of this parameter.

For the coding of the mono signal (block 109), several quantization techniques with or without memory are possible, for example Pulse Code Modulation (PCM) coding, its adaptive version called Adaptive Differential Pulse Code Modulation (ADPCM), or more advanced techniques such as perceptual transform coding or Code Excited Linear Prediction (CELP) coding.

Of particular interest here is ITU-T Recommendation G.722, which uses ADPCM (Adaptive Differential Pulse Code Modulation) coding.

The input signal of a G.722-type encoder is a wideband signal with a minimum bandwidth of [50-7000 Hz] and a sampling frequency of 16 kHz. This signal is decomposed into two subbands, [0-4000 Hz] and [4000-8000 Hz], by Quadrature Mirror Filters (QMF); each subband is then encoded separately by an ADPCM encoder.

The low band is coded by an embedded-code ADPCM coder on 6, 5 or 4 bits per sample, while the high band is coded by an ADPCM coder on 2 bits per sample. The total bit rate is 64, 56 or 48 kbit/s depending on the number of bits used for decoding the low band.

The G.722 Recommendation, dating from 1988, was first used in ISDN (Integrated Services Digital Network) for audio and videoconferencing applications. For several years, this coder has been used in HD (High Definition) telephony applications, also called HD Voice, on fixed IP networks.

A quantized signal frame according to the G.722 standard consists of quantization indices coded on 6, 5 or 4 bits per sample in the low band (0-4000 Hz) and on 2 bits per sample in the high band (4000-8000 Hz). Since the transmission frequency of the scalar indices is 8 kHz in each subband, the bit rate is 64, 56 or 48 kbit/s.
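The bit-rate figures follow from simple arithmetic, which the following sketch reproduces. The constant and function names are illustrative only.

```python
SUBBAND_RATE_HZ = 8000  # one quantization index per sample in each subband
HIGH_BAND_BITS = 2      # bits per high-band sample

def g722_bitrate(low_band_bits):
    """Total bit rate in bit/s for a given number of low-band bits per sample."""
    return (low_band_bits + HIGH_BAND_BITS) * SUBBAND_RATE_HZ

rates = [g722_bitrate(bits) for bits in (6, 5, 4)]
print(rates)  # [64000, 56000, 48000]
```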

At the decoder 200, with reference to FIG. 2, the mono signal is decoded (block 201) and a decorrelator (block 202) is used to produce two versions, M(n) and M'(n), of the decoded mono signal. These two signals are transformed to the frequency domain (blocks 203 to 206) and the decoded stereo parameters (block 207) are used by the stereo synthesis (block 208) to reconstruct the left and right channels in the frequency domain. These channels are finally reconstructed in the time domain (blocks 209 to 214).

Thus, as mentioned for the encoder, the block 105 performs a channel reduction processing or "downmix" by combining the stereo channels (left, right) to obtain a mono signal which is then encoded by a mono encoder. The spatial parameters (ICLD, ICPD, ICC, ...) are extracted from the stereo channels and transmitted in addition to the bitstream from the mono encoder.

Several techniques have been developed for channel reduction processing or stereo downmix to mono. This "downmix" can be performed in the time or frequency domain. There are usually two types of "downmix":

- The passive "downmix" which corresponds to a direct matrixing of the stereo channels to combine them into a single signal;

- Active (adaptive) downmix that includes energy and / or phase control in addition to the combination of the two stereo channels.

The simplest example of passive downmix is given by the following time-domain matrixing:

$$M(n) = \frac{1}{2}\big(L(n) + R(n)\big) \qquad (3)$$

This type of "downmix", however, has the disadvantage of not conserving the energy of the signals after the stereo to mono conversion when the L and R channels are not in phase.
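The energy loss of a passive downmix can be seen on a toy signal. The sketch below is only an illustration with made-up sample values; `passive_downmix` and `energy` are hypothetical helper names.

```python
def passive_downmix(left, right):
    """Sample-by-sample average of the two channels."""
    return [0.5 * (l + r) for l, r in zip(left, right)]

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

left = [1.0, -1.0, 1.0, -1.0]

in_phase = passive_downmix(left, left)               # R identical to L
opposed = passive_downmix(left, [-v for v in left])  # R in phase opposition

print(energy(in_phase))  # 4.0, the energy of one channel is preserved
print(energy(opposed))   # 0.0, the mono signal vanishes
```

When the channels are in phase opposition the mono signal cancels completely, although each input channel carries energy.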

An active downmix mechanism that improves the situation is given by the following equation:

$$M(n) = \gamma(n)\, \frac{L(n) + R(n)}{2} \qquad (4)$$

where γ(n) is a factor that compensates for a possible loss of energy.

However, combining the signals L(n) and R(n) in the time domain does not make it possible to finely control (with sufficient frequency resolution) the possible phase differences between channels, and therefore the energy conservation per frequency subband. This is why it is often more advantageous in terms of quality to perform the downmix in the frequency domain, even if this involves calculating time / frequency transforms and induces additional delay and complexity compared with a time-domain downmix.

The previous active downmix can thus be transposed to the spectra of the left and right channels, as follows:

$$M[k] = \gamma[k]\, \frac{L[k] + R[k]}{2} \qquad (5)$$

where k corresponds to the index of a frequency coefficient (Fourier coefficient for example representing a frequency subband). The compensation parameter can be set as follows:

This ensures that the overall energy of the "downmix" is the sum of the energies of the left and right channels. The factor γ[k] is here saturated at an amplification of 6 dB.

The stereo-to-mono "downmix" technique of Breebaart et al. cited above is performed in the frequency domain. The mono signal M[k] is obtained by a linear combination of the L and R channels according to the equation:

$$M[k] = w_1\, L[k] + w_2\, R[k] \qquad (7)$$

where w₁ and w₂ are complex-valued gains. If w₁ = w₂ = 0.5, the mono signal is the average of the two channels L and R. The gains w₁ and w₂ are generally adapted as a function of the short-term signal, in particular to align the phases.

A particular case of this frequency downmix technique is proposed in the document entitled "A stereo to mono downmixing scheme for MPEG-4 parametric stereo encoder" by Samsudin, E. Kurniawati, N. Boon Poh, F. Sattar, S. George, in IEEE Trans., ICASSP 2006. In this document, the L and R channels are aligned in phase before performing the channel reduction processing.

More precisely, the phase of the channel L for each frequency subband is chosen as the reference phase, and the channel R is aligned with the phase of the channel L for each subband by the following formula:

$$R'[k] = e^{\,j \cdot \mathrm{ICPD}[b]}\, R[k] \qquad (8)$$

where R'[k] is the aligned R channel, k is the index of a coefficient in the b-th frequency subband, and ICPD[b] is the inter-channel phase difference in the b-th frequency subband, given by:

$$\mathrm{ICPD}[b] = \angle\!\left(\sum_{k=k_b}^{k_{b+1}-1} L[k]\, R^{*}[k]\right) \qquad (9)$$

where k_b defines the frequency intervals of the corresponding subband and * is the complex conjugate. Note that when the subband of index b is reduced to a single frequency coefficient, we find:

$$R'[k] = |R[k]|\, e^{\,j \angle L[k]} \qquad (10)$$

Finally, the mono signal obtained by the "downmix" of the Samsudin document mentioned above is calculated by averaging the L channel and the aligned R' channel, according to the following equation:

$$M[k] = \frac{L[k] + R'[k]}{2} \qquad (11)$$

Phase alignment therefore conserves energy by eliminating the influence of the phase. This "downmix" corresponds to the "downmix" described in the document by Breebaart et al. where:

$$M[k] = w_1\, L[k] + w_2\, R[k] \quad \text{with} \quad w_1 = \frac{1}{2} \quad \text{and} \quad w_2 = \frac{e^{\,j \cdot \mathrm{ICPD}[b]}}{2}$$
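The phase-aligned downmix described above can be sketched for the simple case where each subband holds a single frequency coefficient. This is a minimal illustration, not the implementation of the cited document; `aligned_downmix` and the test values are assumptions.

```python
import cmath
import math

def aligned_downmix(l, r):
    """Average of L and of R rotated onto the phase of L (one coefficient per subband)."""
    phase_diff = cmath.phase(l * r.conjugate())   # ICPD for this coefficient
    r_aligned = cmath.exp(1j * phase_diff) * r    # aligned right channel
    return 0.5 * (l + r_aligned)

# Two unit-amplitude coefficients in phase opposition (illustrative values).
l = cmath.rect(1.0, 0.3)
r = cmath.rect(1.0, 0.3 + math.pi)

print(abs(0.5 * (l + r)))          # close to 0: the plain average cancels
print(abs(aligned_downmix(l, r)))  # close to 1: the amplitude is preserved
```

The sketch also shows the dependence on the reference channel: the phase of the result is always that of `l`, so a weak or noisy left channel directly determines the phase of the mono signal.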

However, an ideal conversion of a stereo signal to a mono signal must conserve energy for all frequency components of the signal.

This "downmix" operation is important for parametric stereo coding because the decoded stereo signal is only a spatial shaping of the decoded mono signal.

The downmix technique in the frequency domain described above retains the energy level of the stereo signal in the mono signal by aligning the R channel and the L channel before processing. This phase alignment avoids situations where the channels are in phase opposition.

However, this method makes the "downmix" processing entirely dependent on the channel (L or R) chosen to set the reference phase.

In extreme cases where the reference channel has zero energy or corresponds to a random signal (ambient noise, etc.), the phase of the mono signal after the downmix can become random or poorly conditioned, which yields a mono signal of poor quality.

The invention improves the situation.

For this purpose, it proposes a method of parametric coding of a stereo audio signal comprising a step of coding a mono signal resulting from a channel reduction processing applied to the stereo signal and a step of coding spatialization information of the stereo signal, the channel reduction processing including a calculation, per frequency coefficient, of the amplitude of the mono signal as a function of the amplitudes of the channels of the stereo signal. The method is such that it further comprises a determination, for a predetermined set of frequency coefficients, of the phase of the mono signal by calculating the phase of the sum signal of the channels of the stereo signal.

Thus, the channel reduction processing according to the invention is carried out in the frequency domain, per frequency coefficient, so as to control the energy and the phase very precisely over the entire frequency spectrum.

The determination, according to the invention, of the phase of the mono signal makes it possible to obtain this phase information more naturally, without depending on a particular stereo channel. This avoids the problems that arise in particular configurations of the stereo channels.

To adapt to this channel reduction processing, in a first embodiment, the spatialization information includes a first information on the amplitude of the stereo channels and a second information on the phase of the stereo channels, the second information comprising, by frequency coefficient, the phase difference defined between the mono signal and a first predetermined stereo channel.

Thus, only spatialization information useful for the reconstruction of the stereo signal is encoded. A low rate coding is then possible while allowing the decoder to obtain a good quality stereo signal.

In a second embodiment, the second piece of information furthermore comprises a minimum indication enabling the phase difference between the mono signal and the second stereo channel to be deduced.

In a preferred embodiment, the minimum indication is coded on a bit and indicates the choice between two possible phase differences between the mono signal and the second stereo channel for a low bit rate coding. In an alternative embodiment, the minimum indication is coded on a bit and indicates the choice between two possible gains to be applied to the intensity of the mono signal to find the second stereo channel.

To adapt to the channel reduction processing of the invention, in a third embodiment, the spatialization information includes first information on the amplitude of the stereo channels and second information giving in particular the amplitude of the sum of the stereo channels.

This spatialization information is sufficient to reconstruct a stereo signal of good quality at the decoder.

For a low bit rate coding of the spatialization information, the second piece of information comprises, by frequency coefficient, the value of the amplitude of the sum of the stereo channels and a minimum indication making it possible to deduce the direction of rotation of the stereo channels.

In an alternative embodiment, the second piece of information comprises, by frequency coefficient, the value of a gain to be applied to the amplitude of the mono signal and a minimum indication making it possible to deduce the direction of rotation of the stereo channels.

In an alternative embodiment of all the modes, adapted to a hierarchical coding, the first piece of information is coded by a first coding layer and the second piece of information is coded by a second coding layer.

The invention also relates to a method of parametric decoding of a stereo audio signal comprising a step of decoding a received mono signal, resulting from a channel reduction processing applied to the original stereo signal, and a step of decoding spatialization information of the original stereo signal. The decoding is such that it comprises a synthesis of the stereo signals, per frequency coefficient, from the decoded mono signal, which comprises amplitude information obtained as a function of the amplitudes of the original stereo channels and phase information obtained from the sum signal of the original stereo channels, and from the decoded spatialization information.

The mono signal thus received provides a stereo signal that retains the energy of the original stereo signal.

In a first embodiment, the decoded spatialization information includes a first information on the amplitude of the stereo channels and a second information on the phase of the stereo channels, the second information comprising, by frequency coefficient, the angle defined between the mono signal and a first predetermined stereo channel.

This information therefore makes it possible to reproduce a stereo signal of good quality. In a second embodiment, the second piece of information furthermore comprises a minimum indication enabling the angle between the mono signal and the second stereo channel to be deduced.

A simple low-bit-rate indication provides the information needed to recover the stereo channels with the correct phase shift.

In one case, the decoded minimum indication indicates the choice between two possible angles between the mono signal and the second stereo channel.

In a variant, the decoded minimum indication indicates the choice between two possible gains to be applied to the intensity of the mono signal to find the second stereo channel.

In a third embodiment, the spatialization information includes a first information on the amplitude of the stereo channels and a second information on the amplitude of the sum of the stereo channels.

This information also makes it possible to reproduce a stereo signal of good quality.

In an alternative embodiment, the second piece of information comprises, per frequency coefficient, the value of the amplitude of the sum of the stereo channels and a minimum indication enabling the direction of rotation of the stereo channels to be deduced. In a variant, the second piece of information comprises, per frequency coefficient, the value of a gain to be applied to the amplitude of the decoded mono signal and a minimum indication for deducing the direction of rotation of the stereo channels.

In an alternative embodiment of all the modes, adapted to the hierarchical decoding, the first information on the amplitude of the stereo channels is decoded by a first decoding layer and the second information is decoded by a second decoding layer.

The invention also relates to a parametric encoder of a stereo digital audio signal comprising a module for coding a mono signal produced by a channel reduction processing module applied to the stereo signal and a module for coding spatialization information of the stereo signal, the channel reduction processing module comprising a module for calculating, per frequency coefficient, the amplitude of the mono signal as a function of the amplitudes of the channels of the stereo signal. The encoder is such that the channel reduction processing module further comprises a module for determining, for a predetermined set of frequency coefficients, the phase of the mono signal by calculating the phase of the sum signal of the channels of the stereo signal.

It also relates to a parametric decoder of a stereo audio signal comprising a coding module of a mono signal from a channel reduction processing module applied to the stereo signal and a spatialization information coding module of the stereo signal, the channel reduction processing module comprising a module for calculating, per frequency coefficient, the amplitude of the mono signal as a function of the amplitudes of the channels of the stereo signal. The decoder is such that the channel reduction processing module further comprises a module for determining, for a predetermined set of frequency coefficients, the phase of the mono signal by calculating the phase of the sum signal of the channels of the stereo signal.

Finally, the invention relates to a computer program comprising code instructions for implementing the steps of a coding method according to the invention and / or a decoding method according to the invention.

The invention finally relates to a storage means readable by a processor storing a computer program as described.

Other features and advantages of the invention will appear more clearly on reading the following description, given solely by way of nonlimiting example, and with reference to the appended drawings, in which:

FIG. 1 illustrates an encoder implementing a parametric coding known from the state of the art and previously described;

FIG. 2 illustrates a decoder implementing a parametric decoding known from the state of the art and previously described;

FIG. 3 illustrates a stereo parametric encoder according to one embodiment of the invention embodying a coding method according to several embodiments of the invention;

FIGS. 4a and 4b illustrate the bit stream of spatialization information coded in a particular embodiment;

FIGS. 5a and 5b illustrate, in flowchart form, the steps for determining, at the encoder, the spatialization information according to a first and a second embodiment of the invention, respectively;

FIGS. 6a and 6b illustrate a method of calculating the spatialization information and of synthesizing the stereo signals using this information according to the first embodiment;

FIG. 6c illustrates a method of calculating the spatialization information and of synthesizing the stereo signals using this information according to the second embodiment;

FIGS. 7a and 7b illustrate, in the form of flowcharts, the steps of determining the spatialization information according to a third embodiment of the invention in a first and second variant;

FIGS. 8a and 8b illustrate a method of calculating the spatialization information and of synthesizing the stereo signals using this information according to a third embodiment;

FIG. 9 illustrates an alternative embodiment of an encoder according to the invention;

FIG. 10 illustrates a decoder according to one embodiment of the invention, implementing a decoding method according to several embodiments of the invention;

FIGS. 11a and 11b illustrate, in flowchart form, the steps for determining, at the decoder, the spatialization information according to a first and a second embodiment of the invention, respectively;

FIG. 12 illustrates, in flowchart form, the steps for determining, at the decoder, the spatialization information according to a third embodiment of the invention;

FIGS. 13a and 13b respectively illustrate a hardware example of an equipment incorporating an encoder and a decoder able to implement the coding method and the decoding method, according to one embodiment of the invention.

With reference to FIG. 3, a parametric encoder of stereo signals according to an embodiment of the invention, delivering both a mono signal and spatial information parameters of the stereo signal is now described.

This parametric stereo encoder as shown uses a G.722 mono coding and extends this coding by operating in wideband with stereo signals sampled at 16 kHz with 5 ms frames. It should be noted that the choice of a frame length of 5 ms is in no way restrictive: the invention applies equally in variants of the embodiment where the frame length is different, for example 10 or 20 ms. Each time-domain channel (L and R) sampled at 16 kHz is first pre-filtered by a high-pass filter (HPF) eliminating the components below 50 Hz (blocks 301 and 302).

The L and R channels are analyzed in frequency by a discrete Fourier transform with sinusoidal windowing, 50% overlap and a window length of 10 ms, i.e. 160 samples (blocks 303 to 306). For each frame, the signal (L, R) is weighted by a symmetric analysis window covering 2 frames of 5 ms, i.e. 10 ms (160 samples). The 10 ms analysis window covers the current frame and the future frame. The future frame corresponds to a "future" signal segment of 5 ms, commonly called "lookahead".

For the current frame, the spectra obtained, L'[j] and R'[j] (j = 0, ..., 80), include 81 complex coefficients, with a resolution of 100 Hz per frequency coefficient. The coefficient of index j = 0 corresponds to the DC component (0 Hz) and is real. The coefficient of index j = 80 corresponds to the Nyquist frequency (8000 Hz) and is also real. The coefficients of index 0 < j < 80 are complex and each corresponds to a subband of width 100 Hz centered on the frequency of index j, i.e. 100·j Hz.
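The figures quoted for the analysis (160-sample window, 81 coefficients, 100 Hz resolution) can be checked with a few lines. The sketch also builds one common form of sinusoidal window; the exact window expression is an assumption, since the text only states that the window is sinusoidal and symmetric.

```python
import math

FS_HZ = 16000
FRAME_MS = 5

frame_len = FS_HZ * FRAME_MS // 1000   # 80 samples per 5 ms frame
window_len = 2 * frame_len             # 160 samples: current frame + lookahead
num_bins = window_len // 2 + 1         # 81 coefficients, from DC to Nyquist
resolution_hz = FS_HZ / window_len     # 100.0 Hz per coefficient

# Assumed window shape: half a sine period over the 160 samples.
window = [math.sin(math.pi * (n + 0.5) / window_len) for n in range(window_len)]

# With 50% overlap, the squared windows of adjacent frames sum to 1, which is
# what allows reconstruction when the window is applied at analysis and synthesis.
overlap_sum = [window[n] ** 2 + window[n + frame_len] ** 2 for n in range(frame_len)]

print(frame_len, window_len, num_bins, resolution_hz)  # 80 160 81 100.0
```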

The spectra L' and R' are combined in block 307 to obtain a mono (downmix) signal M' in the frequency domain. This signal is converted to the time domain by inverse FFT and windowing with overlap-add using the "lookahead" part of the previous frame (blocks 308 to 310).

Since the algorithmic delay of G.722 is 22 samples, the mono signal is delayed (block 311) by T = 80 - 22 = 58 samples. The accumulated delay between the mono signal and the stereo channels thus becomes a multiple of the frame length (80 samples), which makes it possible to reuse the result of the frequency analysis of blocks 305 and 306.
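The delay alignment amounts to the following arithmetic, shown here as a small sketch with illustrative constant names.

```python
FRAME_LEN = 80    # samples in a 5 ms frame at 16 kHz
G722_DELAY = 22   # algorithmic delay of G.722, in samples

extra_delay = FRAME_LEN - G722_DELAY    # delay T applied in block 311
total_delay = G722_DELAY + extra_delay  # accumulated delay on the mono path

print(extra_delay, total_delay)  # 58 80
```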

The current frame of 5 ms of the obtained mono signal is encoded by the G.722 encoder (block 312). However, the invention also applies in variant embodiments where a modified version of G.722 is used, or even an encoder other than G.722.

To synchronize the extraction of the stereo parameters (block 314) with the spatial synthesis performed from the mono signal at the decoder, a delay of 2 frames must be introduced into the codec. This delay of 2 frames is specific to the implementation detailed here; in particular, it is related to the symmetrical sinusoidal windows of 10 ms. The delay could be different: for example, a delay of one frame could be obtained with an optimized window having a lower overlap between adjacent windows. In a particular embodiment of the invention, illustrated here in FIG. 3, the block 313 introduces a delay of two frames on the spectra L'[j] and R'[j] in order to obtain the spectra L[j] and R[j].

However, it would be more advantageous, in terms of the quantity of data to be stored, to shift the outputs of the parameter extraction block 314 or else the outputs of the quantization blocks 315 and 316. It would also be possible to introduce this offset at the decoder when receiving the stereo enhancement layers.

In parallel with the mono coding, the coding of the stereo spatial information is implemented in the blocks 314 to 316.

The stereo parameters are extracted (block 314) and coded (blocks 315 and 316) from the spectra L', R' and M' delayed by two frames, denoted L, R and M.

The channel reduction processing block 307 or "downmix" is now described in more detail.

According to one embodiment of the invention, the latter performs a "downmix" in the frequency domain to obtain a mono signal M'[j].

This mono signal M '[j] is calculated by the following formula which defines the amplitude and the phase for each frequency line:

Note that the amplitude of the mono channel can also be determined according to a formula of the type:

Thus, the channel reduction processing of the stereo signal comprises a calculation, per frequency coefficient, of the amplitude of the mono signal as a function of the amplitudes of the channels of the stereo signal, performed in module 330 of block 307, and a determination, per frequency coefficient, of the phase of the mono signal by calculating the phase of the sum signal of the stereo channels, performed in module 331 of block 307.

It should be noted that an alternative embodiment of the calculation of the amplitude and the phase of this mono signal amounts to calculating in an equivalent way, by frequency coefficient (or frequency line):

$$M'[j] = \gamma[j] \cdot \big(L'[j] + R'[j]\big) \qquad (16)$$

with

So we find

The compensation factor is calculated and applied in the frequency domain, here coefficient by coefficient. This factor is calculated from the amplitudes of the stereo channels and of their sum signal, and not from the energies of these signals. Thus the amplitude is maintained everywhere in the spectrum, over the entire band to be coded, and not only in a dominant frequency zone.

The frequency-domain "downmix" processing involves a delay of 5 ms for the overlap-add reconstruction.
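The principle of a per-coefficient downmix with separate amplitude and phase rules can be sketched as follows. This is a hedged illustration only: it assumes that the mono amplitude is the mean of the two channel amplitudes and that the mono phase is the phase of L'[j] + R'[j]; the function name `downmix`, this particular amplitude rule and the sample values are assumptions, not formulas taken from the text.

```python
import cmath

def downmix(l, r):
    """Hypothetical downmix of one coefficient: mean amplitude, phase of l + r.

    If l + r is exactly zero its phase is undefined; cmath.phase then returns 0.0.
    """
    amplitude = 0.5 * (abs(l) + abs(r))   # assumed amplitude rule
    phase = cmath.phase(l + r)            # phase of the sum signal
    return cmath.rect(amplitude, phase)

# Illustrative coefficients with different amplitudes and phases.
l = cmath.rect(2.0, 0.4)
r = cmath.rect(1.0, 1.2)
m = downmix(l, r)

# Equivalent form: a real, positive gain applied to the sum signal.
gain = 0.5 * (abs(l) + abs(r)) / abs(l + r)
m_alt = gain * (l + r)

print(abs(m))          # 1.5, the mean of the amplitudes 2.0 and 1.0
print(abs(m - m_alt))  # close to 0: both forms give the same coefficient
```

Because the gain is real and positive, it changes only the amplitude of the sum signal and leaves its phase untouched, so neither channel acts as a privileged phase reference.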

However, the encoder presented here uses short frames of 5 ms. The impact on the algorithmic delay of the overall encoder is therefore not too important. This additional delay would be more troublesome with longer frames of the order of 20 ms. Nevertheless, there are solutions to reduce this additional delay, in particular by using an optimized non-sinusoidal window with a lower overlap between adjacent windows. Moreover, the additional complexity due to the "downmix" operation in the frequency domain is limited here to the frequency / time conversion of the mono signal (blocks 308 to 310) because in all cases the time / frequency conversion of the stereo channels (blocks 303 to 306) is necessary for the extraction and coding of the stereo parameters which are defined and calculated in the frequency domain.

To adapt the spatialization parameters to the mono signal as obtained by the "downmix" processing described above, a particular extraction of the parameters by the block 314 is now described with reference to FIG.

For the extraction of the ICLD parameters (block 314), the spectra L[j] and R[j] are divided into 20 frequency subbands. These subbands are defined by the following boundaries:

{B(k)}, k = 0, ..., 20 = [0, 1, 2, 3, 4, 5, 6, 7, 9, 11, 13, 16, 19, 23, 27, 31, 37, 44, 52, 61, 80]

The table above delimits (in number of Fourier coefficients) the frequency subbands of index k = 0 to 19. For example, the first subband (k = 0) goes from the coefficient B(k) = 0 to B(k+1) - 1 = 0; it is therefore reduced to a single coefficient (100 Hz). Similarly, the last subband (k = 19) goes from the coefficient B(k) = 61 to B(k+1) - 1 = 79; it comprises 19 coefficients (1900 Hz).
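The subband layout can be derived from the boundary table with a short sketch. The variable names are illustrative; the boundary values are those listed above.

```python
B = [0, 1, 2, 3, 4, 5, 6, 7, 9, 11, 13, 16, 19, 23, 27, 31, 37, 44, 52, 61, 80]
BIN_HZ = 100  # width of one Fourier coefficient

# Subband k covers coefficients B[k] .. B[k+1] - 1.
widths = [B[k + 1] - B[k] for k in range(len(B) - 1)]
last_coeffs = [B[k + 1] - 1 for k in range(len(B) - 1)]

print(len(widths))                 # 20 subbands
print(widths[0], widths[19])       # 1 19
print(widths[19] * BIN_HZ)         # 1900 (Hz) for the last subband
print(last_coeffs[0], last_coeffs[19])  # 0 79
```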

For each frame, the ICLD of the subband k = 0, ..., 19 is calculated according to the equation:

$$\mathrm{ICLD}[k] = 10 \log_{10}\!\left(\frac{\sigma_L^2[k]}{\sigma_R^2[k]}\right) \ \mathrm{dB} \qquad (18)$$

where σ_L²[k] and σ_R²[k] respectively represent the energy of the left channel (L) and of the right channel (R).

According to a particular embodiment, in a first stereo extension layer (+8 kbit / s), the ICLD parameters are coded by differential non-uniform scalar quantization (block 315) on 40 bits per frame. This quantification will not be detailed here because it goes beyond the scope of the invention.

According to J Blauert, "Spatial Hearing: The Psychophysics of Human Sound Localization", revised edition, MIT Press, 1997 that phase information for frequencies below 1.5-2 kHz is particularly important for obtaining good stereo quality. The time-frequency analysis performed here gives 80 frequency coefficients per frame, a resolution of 100 Hz per coefficient. Since the bit budget is 40 bits and the allocation is, as explained below, 5 bits per coefficient, only 8 lines can be coded. By experimentation the lines of index j ~ 2 to 9 were chosen for this coding of the phase information. These lines correspond to a frequency band of 150 to 950 Hz.
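The selection of coded lines follows from the bit budget, as this small sketch shows. The constant names are illustrative; the band edges assume that each line covers 100 Hz centered on 100·j Hz.

```python
BUDGET_BITS = 40    # bits per frame for the second stereo layer
BITS_PER_LINE = 5   # bits spent on each frequency line
BIN_HZ = 100        # resolution of one frequency line
FIRST_LINE = 2      # first coded line index

num_lines = BUDGET_BITS // BITS_PER_LINE                 # 8 lines
lines = list(range(FIRST_LINE, FIRST_LINE + num_lines))  # j = 2 .. 9

band_low_hz = lines[0] * BIN_HZ - BIN_HZ / 2    # lower edge of line 2
band_high_hz = lines[-1] * BIN_HZ + BIN_HZ / 2  # upper edge of line 9

print(lines)                      # [2, 3, 4, 5, 6, 7, 8, 9]
print(band_low_hz, band_high_hz)  # 150.0 950.0
```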

Thus, for the second stereo extension layer (+8 kbit/s), the frequency coefficients where the phase information is the most perceptually important are identified, and the associated phases are coded (block 316) by a technique detailed below with reference to Figures 6a and 6b, using a budget of 40 bits per frame.

Normally so that the decoder can reconstruct the L and R channels, it would be necessary to code two angles by frequency coefficient.

In the first embodiment of the invention described here, a single angle is coded: the angle between a first stereo channel, here for example the secondary channel (defined below), and the mono signal defined by the "downmix" processing of block 307.

This information alone is sufficient to find the dominant channel as explained later with reference to Figure 6b.

In a second embodiment, the angle between the dominant channel (defined below) and the mono signal is coded, and a minimal additional piece of information is also coded, on 1 bit, so that the angle of the second stereo channel, here the secondary channel, can be deduced from the other information already coded.

More specifically, the parameters that are transmitted in the second stereo enhancement layer are, for each line in the first embodiment:

- the angle $\beta[j]$ between the mono signal and the secondary channel, coded on 5 bits in the interval $[-\pi, \pi]$ by uniform scalar quantization with a step of $\pi/16$.

In the second embodiment, the parameters that are transmitted in the second stereo enhancement layer are, for each line:

- the angle $\alpha[j]$ between the mono signal and the dominant channel, coded on 4 bits in the interval $[-\pi/2, \pi/2]$;

- an indicator $b[j]$ making it possible to choose between $\beta_0[j]$ and $\beta_1[j]$, coded on 1 bit: $b[j] = 0$ for $\beta_0[j]$ and 1 for $\beta_1[j]$.

For each line $j$ considered, 5 bits are therefore used. Since the total budget of the second layer is 40 bits per frame, only the parameters associated with 8 consecutive coefficients are coded, preferably for the lines of index $j = 2$ to 9.

Figures 4a and 4b show the structure of the bitstream for the encoder in a preferred embodiment. It is a hierarchical binary bit structure derived from scalable coding with G.722 coding for core coding.

The mono signal is thus coded by G.722 at 56 or 64 kbit / s.

In FIG. 4a, the G.722 core coding operates at 56 kbit/s and a first stereo extension layer (Ext.stereo 1) is added.

In FIG. 4b, the G.722 core coding operates at 64 kbit/s and two stereo extension layers (Ext.stereo 1 and Ext.stereo 2) are added.

The encoder thus operates according to two possible modes (or configurations):

- a mode with a bit rate of 56 + 8 kbit/s (FIG. 4a), with coding of the mono (downmix) signal by G.722 coding at 56 kbit/s and a stereo extension of 8 kbit/s;

- a mode with a bit rate of 64 + 16 kbit/s (FIG. 4b), with coding of the mono (downmix) signal by G.722 coding at 64 kbit/s and a stereo extension of 16 kbit/s.

For this second mode, it is assumed that the additional 16 kbit/s are divided into two 8 kbit/s layers, the first of which is identical in terms of syntax (i.e. coded parameters) to the enhancement layer of the 56 + 8 kbit/s mode.

Thus, the bit stream shown in FIG. 4a includes the information on the amplitude of the stereo channels, for example the ICLD parameters as described above.

The bit stream shown in FIG. 4b includes both the stereo channel amplitude information in the first extension layer and the stereo channel phase information in the second extension layer.

The division into two extension layers shown in FIGS. 4a and 4b could be generalized to the case where at least one of the two extension layers comprises both a part of the amplitude information and a part of the phase information.

In the embodiments described above, the phase information comprises the phase difference between the mono signal and one of the stereo channels, determined as secondary for the first embodiment or dominant for the second embodiment. In the case of the second embodiment, the phase information also includes a minimal indication for deducing the phase difference between the mono signal and the stereo channel determined as secondary. The budget allocated to code this phase information is only one particular example of an embodiment. It can be lower, in which case only a small number of frequency lines are taken into account, or on the contrary higher, making it possible to code a greater number of frequency lines.

Similarly, the coding of this spatialization information on two extension layers is a particular embodiment. The invention is also applicable in the case where this information is coded in a single enhancement coding layer.

The determination of the phase information is now explained with reference to FIGS. 6a, 6b and 6c.

Two channels are distinguished here for each line $j = 2$ to 9: the dominant channel $X[j]$ and the secondary channel $Y[j]$.

At the decoder these channels are determined as follows:

$$\hat{X}[j] = \hat{L}[j],\quad \hat{Y}[j] = \hat{R}[j] \quad \text{if } \hat{I}[j] \ge 1$$

and

$$\hat{X}[j] = \hat{R}[j],\quad \hat{Y}[j] = \hat{L}[j] \quad \text{if } \hat{I}[j] < 1$$

where $\hat{I}[j]$ is the amplitude information, defined in equation (42), which corresponds to the amplitude ratio between left channel and right channel. Thus the dominant channel $\hat{X}[j]$ is the decoded channel $\hat{L}[j]$ or $\hat{R}[j]$ whose amplitude is the strongest. Similarly, the channel $\hat{Y}[j]$ is the decoded channel $\hat{L}[j]$ or $\hat{R}[j]$ whose amplitude is the smallest.

In order to ensure coherence between encoder and decoder, the dominant and secondary channels are defined in the same way at the encoder:

$$X[j] = L[j],\quad Y[j] = R[j] \quad \text{if } \hat{I}[j] \ge 1$$

and

$$X[j] = R[j],\quad Y[j] = L[j] \quad \text{if } \hat{I}[j] < 1$$

where $\hat{I}[j]$ is information available to the coder (by local decoding). The decision criterion $\hat{I}[j]$ is therefore identical for the coder and the decoder.

The complex vectors associated with the dominant channel $X[j]$ and the secondary channel $Y[j]$ are illustrated in FIG. 6a, which also defines their respective angles $\alpha[j]$ and $\beta[j]$ relative to the mono channel $M[j]$.

To simplify the notations, the indices of frequency coefficients are not noted on this figure. In particular X [j], Y [j] and M [j] are respectively denoted by X, Y and M in this figure.

Note that the developments are presented here using the original signals $X[j]$, $Y[j]$ and $M[j]$; however, to make the coder and decoder coherent, it would be possible in a variant of the encoder to use in their place the quantized versions $\hat{X}[j]$, $\hat{Y}[j]$ and $\hat{M}[j]$ available by hierarchical local decoding of the mono layers and of the stereo enhancement layers. In the preferred embodiment, the original signals are used at the encoder, which in particular avoids local G.722 decoding and makes it possible to reduce the complexity.

The calculation of the "downmix" signal is illustrated in the complex plane: the mono signal M follows the angle of L + R, but its amplitude is calculated as an average of the amplitudes of the channels of the stereo signal.
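The downmix rule just stated can be sketched per frequency coefficient as follows. This is only an illustration under stated assumptions: the function name `downmix` is hypothetical, the arithmetic mean is assumed as the "average", and the sketch covers only lines for which the phase of L + R is used.

```python
import cmath

def downmix(left, right):
    """Mono coefficient with amplitude (|L| + |R|) / 2 and the phase of L + R.
    If L + R is exactly zero, cmath.phase returns 0.0, which is taken here
    as an arbitrary default (an assumption, not something stated in the text)."""
    amplitude = (abs(left) + abs(right)) / 2.0
    phase = cmath.phase(left + right)
    return cmath.rect(amplitude, phase)

# Usage on one frequency line.
l = cmath.rect(2.0, 0.3)
r = cmath.rect(1.0, -0.2)
m = downmix(l, r)
```

Unlike the plain half-sum (L + R) / 2, this amplitude does not collapse when the two channels are close to phase opposition.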

In FIG. 6a, two angles are defined:

- the phase difference between the dominant channel and the mono signal:

$$\alpha[j] = \angle\left(X[j] \cdot M^*[j]\right)$$

- the phase difference between the secondary channel and the mono signal:

$$\beta[j] = \angle\left(Y[j] \cdot M^*[j]\right)$$

where $\angle(.)$ is the operator that gives the argument (or phase) of the complex operand.

By definition of the mono signal according to the invention, the angle $\alpha[j]$ lies in the interval $[-\pi/2, \pi/2]$.

We now show how it is possible to find the angle $\alpha[j]$ assuming that $M[j]$, $Y[j]$ and $|X[j]|$ are known.

The principle of this first embodiment is discussed on the basis of FIG. 6b.

On the theoretical level, the problem can be posed geometrically. From the elements assumed to be known ($M[j]$, $Y[j]$ and $|X[j]|$), we know in Figure 6b the points M and Y and the angle $\beta[j]$, and we look for the angle $\alpha[j]$. The angle defined by YKO is identical to the angle $\alpha[j]$ and the length YK is identical to $|X[j]|$. In the triangle YKO we know two sides and one of the angles; the missing angle is to be found.

If we project the secondary channel Y on the line OM, where O is the point of the complex plane corresponding to a zero value and M is the point of the complex plane corresponding to M [j], we find:

$$|X[j]| \cdot |\sin\alpha[j]| = |Y[j]| \cdot |\sin\beta[j]|$$

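In other words, the phase information to be encoded is reduced to the angle $\beta[j]$, since $\alpha[j]$, which lies in $[-\pi/2, \pi/2]$ and has a sign opposite to that of $\beta[j]$, is found with the following formula:

$$\alpha[j] = -\operatorname{sgn}(\beta[j]) \cdot \arcsin\left(\frac{|Y[j]|}{|X[j]|}\,|\sin\beta[j]|\right)$$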
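The projection relation above can be checked numerically. The sketch below is an assumption-laden illustration: it recovers the dominant-channel angle from the secondary-channel angle using the equality of the projections, together with the fact that the mono phase lies between the two channels so that the two angles have opposite signs. The helper name `alpha_from_beta` and the clipping guard are hypothetical.

```python
import cmath
import math

def alpha_from_beta(beta, abs_x, abs_y):
    """Solve |X| |sin(alpha)| = |Y| |sin(beta)| for alpha in [-pi/2, pi/2],
    with a sign opposite to that of beta. The clip to 1.0 is a hypothetical
    guard against rounding, not a step described in the text."""
    s = min(1.0, abs_y * abs(math.sin(beta)) / abs_x)
    return -math.copysign(math.asin(s), beta)

# Usage: build a consistent (X, Y, M) triple and compare with the true angle.
x = cmath.rect(2.0, 0.5)                    # dominant channel
y = cmath.rect(1.0, -0.4)                   # secondary channel
m = cmath.rect((abs(x) + abs(y)) / 2.0, cmath.phase(x + y))
alpha_true = cmath.phase(x * m.conjugate())
beta_true = cmath.phase(y * m.conjugate())
alpha_est = alpha_from_beta(beta_true, abs(x), abs(y))
```

Here `alpha_est` matches `alpha_true`, which is why transmitting the secondary-channel angle alone is sufficient in this embodiment.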

To fully understand the coding of the spatialization information for the second stereo extension layer according to this first embodiment, Fig. 5a illustrates a flowchart showing the steps of this encoding.

Thus, in step B501, the amplitude information that corresponds to the amplitude ratio between the left channel and the right channel is decoded. This local decoding is possible because this information is available during the coding of the phase.

Then in step B502 the secondary channel is determined as follows:

$$Y[j] = R[j] \quad \text{if } \hat{I}[j] \ge 1$$

and

$$Y[j] = L[j] \quad \text{if } \hat{I}[j] < 1$$

The angles $\beta[j]$, representing the angles, per frequency coefficient, between the mono signal $M[j]$ and the secondary channel $Y[j]$, are calculated in B503.

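The angle $\beta[j]$ can for example be calculated as follows:

$$\beta[j] = \operatorname{arctan2}\left(\mathrm{Re}(Y[j] \cdot M^*[j]),\ \mathrm{Im}(Y[j] \cdot M^*[j])\right)$$

where the function $\operatorname{arctan2}(x, y)$ is defined by:

$$\operatorname{arctan2}(x, y) = \begin{cases} \arctan(y/x) & x > 0 \\ \pi + \arctan(y/x) & y \ge 0,\ x < 0 \\ -\pi + \arctan(y/x) & y < 0,\ x < 0 \\ \pi/2 & y > 0,\ x = 0 \\ -\pi/2 & y < 0,\ x = 0 \\ \text{undefined} & y = 0,\ x = 0 \end{cases}$$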
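Note that the function defined above takes the real part first, whereas many programming languages put the imaginary part first. A minimal Python sketch of the angle computation, with the hypothetical helper name `angle_to_mono`, could look like this:

```python
import cmath
import math

def angle_to_mono(channel, mono):
    """Phase of channel * conj(mono), i.e. the angle of the channel relative
    to the mono signal. math.atan2 takes (imaginary, real) in that order,
    the reverse of the arctan2(x, y) convention used in the text."""
    prod = channel * mono.conjugate()
    # math.atan2(0.0, 0.0) returns 0.0, whereas the text leaves that case undefined.
    return math.atan2(prod.imag, prod.real)

# Usage: a channel 0.5 rad ahead of the mono signal.
beta = angle_to_mono(cmath.rect(1.0, 0.7), cmath.rect(2.0, 0.2))
```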

The angles $\beta[j]$ are quantized in block B504. For example, consider here the case of a uniform scalar quantization on 5 bits with a step of $\pi/16$ in the interval $[-\pi, \pi]$.
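A uniform scalar quantizer of this kind can be sketched as follows, assuming 32 levels spaced by π/16 over one full turn. The index layout (how levels are mapped to the 5-bit code) is a hypothetical choice made for the example and is not specified in the text.

```python
import math

STEP = math.pi / 16          # 2*pi divided into 2**5 levels (assumed step)

def quantize_beta(beta):
    """Map an angle in [-pi, pi] to a 5-bit index in 0..31."""
    k = round(beta / STEP)   # nearest multiple of the step, k in -16..16
    return k % 32            # +pi and -pi are the same angle and share index 16

def dequantize_beta(index):
    """Inverse mapping: indices 16..31 represent the negative angles."""
    k = index if index < 16 else index - 32
    return k * STEP

# Usage
idx = quantize_beta(0.5)
beta_hat = dequantize_beta(idx)
```

The reconstruction error is at most half a step, i.e. π/32.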

We now show how it is possible to find the angle $\beta[j]$ assuming that $M[j]$, $X[j]$ and $|Y[j]|$ are known, according to a second embodiment. This estimate of the angle $\beta[j]$ is illustrated in Figure 6c. If we project the dominant channel X on the line OM, where O is the point of the complex plane corresponding to a zero value and M is the point of the complex plane corresponding to $M[j]$, we find:

$$|X[j]| \cdot |\sin\alpha[j]| = |Y[j]| \cdot |\sin\beta[j]| \tag{19}$$

We can find the angle $\beta[j]$ with the relation:

The equation below makes it possible to find the angle $\beta[j]$ as follows:

where $s = +1$ or $-1$ so that the sign of $\beta[j]$ is opposite to that of $\alpha[j]$; more precisely:

However, if $\beta_0[j]$ satisfies $|X[j]| \cdot |\sin\alpha[j]| = |Y[j]| \cdot |\sin\beta[j]|$, then $\beta_1[j]$ defined by:

also satisfies equation (21).

It is therefore necessary to add an additional information bit to remove the ambiguity between $\beta_0[j]$ and $\beta_1[j]$.

An example of implementation of the principle of calculation of $\beta_0[j]$ and $\beta_1[j]$ is given by an example of code in Appendix A1.
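Independently of that appendix, the ambiguity can be illustrated with a short sketch. It assumes that the first candidate is the arcsine solution of equation (19) with a sign opposite to that of the dominant-channel angle, and that the second candidate is its supplement with the same sign (both have the same sine). The exact forms used in the patent's equations are not legible here, so these formulas, the function names and the nearest-candidate rule are assumptions.

```python
import cmath
import math

def beta_candidates(alpha, abs_x, abs_y):
    """Two angles with the same |sin| satisfying equation (19)."""
    s = -1.0 if alpha > 0 else 1.0             # sign opposite to alpha
    ratio = min(1.0, abs_x * abs(math.sin(alpha)) / abs_y)  # hypothetical guard
    beta0 = s * math.asin(ratio)               # |beta0| <= pi/2
    beta1 = s * math.pi - beta0                # |beta1| >= pi/2, same sine
    return beta0, beta1

def angular_distance(a, b):
    """Distance between two angles, wrapped to [0, pi]."""
    return abs(cmath.phase(cmath.exp(1j * (a - b))))

def choose_indicator(beta_ref, beta0, beta1):
    """Bit b: 0 if the reference angle is closer to beta0, else 1."""
    return 0 if angular_distance(beta_ref, beta0) <= angular_distance(beta_ref, beta1) else 1

# Usage: a case where the secondary channel is more than pi/2 away from M.
x = cmath.rect(2.0, 0.0)
y = cmath.rect(1.0, 2.5)
m = cmath.rect((abs(x) + abs(y)) / 2.0, cmath.phase(x + y))
alpha = cmath.phase(x * m.conjugate())
beta = cmath.phase(y * m.conjugate())
b0, b1 = beta_candidates(alpha, abs(x), abs(y))
bit = choose_indicator(beta, b0, b1)
```

In this example the true angle coincides with the second candidate, so the indicator bit is 1.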

Note that for the above estimate to be valid it is necessary that, as shown in Figure 6c, the line OM defined by the mono signal has at least one intersection with the circle of radius $|Y[j]|$ centered on X. In the opposite case, there would be a mathematical inconsistency, and the hypothesis on the definition of the mono signal of the form

would be invalid and it would be impossible to deduce the phase of the secondary signal.

The phase difference $\alpha[j]$ between the dominant channel X and the mono signal M is thus subject to the following general constraint:

$$\left|X[j] \cdot \sin(\alpha[j])\right| < |Y[j]| \tag{24}$$

We can deduce:

$$|\alpha[j]| < \arcsin\left(\frac{|Y[j]|}{|X[j]|}\right) \tag{25}$$

This condition must be verified, including at the decoder from the decoded parameters.

In order to align the processing between encoder and decoder, it will be possible to use the parameters decoded locally at the encoder and available for the coding of the phase information, which gives the following relation:

$$|\alpha[j]| < \arcsin\left(\hat{I}[j]\right) \quad \text{if } \hat{I}[j] < 1$$

and

$$|\alpha[j]| < \arcsin\left(\frac{1}{\hat{I}[j]}\right) \quad \text{if } \hat{I}[j] \ge 1$$

To fully understand the coding of spatialization information for the second stereo extension layer according to this second embodiment, Fig. 5b illustrates a flowchart showing the steps of this encoding.

Thus, in step E501, the angles $\alpha[j]$ are calculated per frequency coefficient.

These angles are those formed, per frequency coefficient, by the mono channel $M[j]$ with the dominant channel $X[j]$. The angle $\alpha[j]$ can for example be calculated as follows:

$$\alpha[j] = \operatorname{arctan2}\left(\mathrm{Re}(X[j] \cdot M^*[j]),\ \mathrm{Im}(X[j] \cdot M^*[j])\right)$$

The angles $\alpha[j]$ for $j = 2, \dots, 9$ are then quantized in step E502 and the condition of equation (25) above is verified in step E503.

The angles $\beta[j]$, representing the angles, per frequency coefficient, between the mono signal $M[j]$ and the secondary channel $Y[j]$, are calculated in E504.

The angle $\beta[j]$ can for example be calculated as follows:

$$\beta[j] = \operatorname{arctan2}\left(\mathrm{Re}(Y[j] \cdot M^*[j]),\ \mathrm{Im}(Y[j] \cdot M^*[j])\right)$$

The angles $\beta_0[j]$ and $\beta_1[j]$ are also determined in E505 as formulated in equations (21) and (23) above. For each frequency line, an indicator $b[j]$ is used to select one of the two angles $\beta_0[j]$ or $\beta_1[j]$ by taking the angle $\beta[j]$ as a reference, in step E506. Finally, the indicator $b[j]$ is coded on one bit in E507, where $b[j] = 0$ selects $\beta_0[j]$ and $b[j] = 1$ selects $\beta_1[j]$.

In a variant of the second embodiment described above, the secondary channel Y is reconstructed by combining the dominant channel X and the mono signal M multiplied by a gain factor. This calculation is an equivalent formulation in the absence of quantization of X and M, and thus replaces the rotation by the angle $\beta$ detailed above.

The principle of this variant is discussed on the basis of Figure 6c described above.

At the encoder, we have the following relationship between the vectors X, K and M:

$$|K - X| = |Y| \tag{26}$$

Since the point M lies on the line OK, one seeks a gain $\lambda$ such that:

$$K = \lambda M \tag{27}$$

From equations (26) and (27), we deduce:

$$(X_r - \lambda M_r)^2 + (X_i - \lambda M_i)^2 = |Y|^2 \tag{28}$$

where $M = M_r + jM_i$ and $X = X_r + jX_i$. We thus deduce the following equation:

$$(X_r^2 + X_i^2) + \lambda^2 (M_r^2 + M_i^2) - 2\lambda (X_r M_r + X_i M_i) = |Y|^2 \tag{29}$$

We obtain a second-order equation in the factor $\lambda$:

$$\lambda^2 |M|^2 - 2\lambda (X_r M_r + X_i M_i) + (|X|^2 - |Y|^2) = 0 \tag{30}$$

Equation (30) has two solutions $\lambda_0$ and $\lambda_1$, which makes it possible to find two candidates for the secondary channel Y:

$$Y_0 = \lambda_0 M - X \tag{31}$$

$$Y_1 = \lambda_1 M - X \tag{32}$$

These two candidates $Y_0$ and $Y_1$ correspond to the two points K and K' represented in FIG. 6c.
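Equations (30) to (32) translate directly into a few lines of code. The sketch below is a minimal illustration: the function name is hypothetical, and the ordering of the two roots (which one is called the first gain) is an assumption, since the text does not state it in legible form.

```python
import cmath
import math

def secondary_candidates(x, m, abs_y):
    """Solve equation (30) for the gain and return (Y0, Y1) from (31)-(32).
    Returns None when there is no real solution (negative discriminant)."""
    a = abs(m) ** 2                            # |M|^2
    p = x.real * m.real + x.imag * m.imag      # X_r M_r + X_i M_i
    c = abs(x) ** 2 - abs_y ** 2               # |X|^2 - |Y|^2
    disc = p * p - a * c                       # reduced discriminant
    if a == 0.0 or disc < 0.0:
        return None
    root = math.sqrt(disc)
    lam0, lam1 = (p + root) / a, (p - root) / a   # assumed ordering of the roots
    return lam0 * m - x, lam1 * m - x

# Usage: one of the candidates reproduces the true secondary channel.
x = cmath.rect(2.0, 0.5)
y = cmath.rect(1.0, -0.4)
m = cmath.rect((abs(x) + abs(y)) / 2.0, cmath.phase(x + y))
y0, y1 = secondary_candidates(x, m, abs(y))
```

Both candidates have the amplitude of the secondary channel; the transmitted bit selects the one that matches it in phase.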

Note that for equation (30) to have two solutions, we find the condition:

$$(X_r M_r + X_i M_i)^2 > |M|^2 \left(|X|^2 - |Y|^2\right) \tag{33}$$

It can be shown that this condition given in equation (33) is in fact equivalent to the condition given in equation (25).

Thus, in this variant embodiment, in the second stereo enhancement layer, the coded parameters are:

- the angle $\alpha[j]$ ($j = 2 \dots 9$), coded on 4 bits in the interval $[-\pi/2, \pi/2]$ by uniform scalar quantization;

- an indicator $b[j]$ ($j = 2 \dots 9$) making it possible to choose between two possible gains $\lambda_0[j]$ and $\lambda_1[j]$, coded on 1 bit.

The embodiment has been presented using the original amplitude information $|L[j]|$ and $|R[j]|$.

Note that the phase is coded per frequency line and uses the amplitude information ($L[j]$, $R[j]$) coefficient by coefficient. However, the amplitude information, which is transmitted in the form of ICLD in the first enhancement layer, is encoded by frequency subbands, and these subbands can comprise several frequency coefficients. The following approximation is therefore made for coding and decoding information in the second layer:

When the frequency line of index $j$ corresponds to a subband whose size is greater than a single coefficient, the amplitudes are assumed to be those obtained by applying the constant amplitude information $\hat{I}[j]$ over the subband.

In a third embodiment of the coding method according to the invention, the channel reduction processing is identical but the spatialization information that is transmitted to the decoder is different. It should be noted that this third embodiment is new and inventive on its own.

As in the first embodiment, a first coding extension layer contains the intensity information of the stereo channels, for example the parameter ICLD as defined above.

However, here, the second extension layer does not contain information on the phase differences of the stereo channels. This second extension layer contains the coded amplitude of the sum of the stereo signals $d[j] = |L[j] + R[j]|$. Since the budget available per frame is 40 bits in the particular mode of the encoder described with reference to FIG. 3, it does not make it possible to code the amplitude of the sum, $d[j]$, for all frequency lines. Only frequency lines where this information is identified as perceptually important are used.

In this embodiment the identification of the dominant channel is not necessary. During the description of this embodiment the following notation is used:

$\alpha[j]$ is the angle between $M[j]$ and $L[j]$, and $\beta[j]$ is the angle between $M[j]$ and $R[j]$, regardless of the dominance of a channel.

FIGS. 7a and 7b illustrate flowcharts showing the steps of coding the spatialization information for the second stereo extension layer according to this third embodiment for two variants.

In a first step, E701 for FIG. 7a and E711 for FIG. 7b, the amplitude $d[j]$ of the sum of the stereo channels is calculated per frequency line.

The quantization of this amplitude, in step E702 for FIG. 7a and E712 for FIG. 7b, can be done directly, by quantizing the value of the amplitude $d[j]$, for example with a scalar quantizer using 4 bits per spectral line.

We know that the value of the decoded amplitude $\hat{d}[j]$ is non-negative and that it must satisfy the following inequalities, which will be referred to as "triangular inequalities":

$$\Big|\,|\hat{L}[j]| - |\hat{R}[j]|\,\Big| \le \hat{d}[j] \le |\hat{L}[j]| + |\hat{R}[j]|$$

Indeed, $L[j]$, $R[j]$ and their sum $L[j] + R[j]$ form a triangle in the complex plane.

Equality is also allowed and corresponds to the case where the channels are perfectly in phase or perfectly in opposite phase. It will be considered hereafter, by misuse of language, that these extreme cases also represent a triangle where the length of the longest side is the sum of the lengths of the other two sides. The three angles of this triangle are therefore 0, 0 and $\pi$. When the channels are perfectly in phase, $d[j] = |L[j]| + |R[j]|$, $\alpha[j] = 0$ and $\beta[j] = 0$. And when the channels are perfectly in opposite phase and for example

The quantization step can also be performed relative to the amplitude of the mono signal, $|\hat{M}[j]|$, already decoded, in the form of a scale factor (or gain), as follows:

where the value of the gain $g[j]$ is quantized, for example with a scalar quantizer using 4 bits per spectral line.

In a similar way to the previous case, the quantized value of $g[j]$, denoted $\hat{g}[j]$, must satisfy the following inequalities:

In FIG. 7a, it is not checked whether the parameters after quantization satisfy the triangular inequalities; this verification and the possible correction will be made at the decoder. The version shown in FIG. 7b includes a local decoder (not shown in the encoder of FIG. 3) and produces the quantized values of the parameters $|\hat{L}[j]|$, $|\hat{R}[j]|$ and $\hat{d}[j]$. The triangular inequalities are verified at the encoder in step E713. If a problem is detected (N in E714), a new quantized value is chosen by the quantizer of $d[j]$ in E712, and so on until the triangular inequalities are satisfied (O in E714).

In this case, the quantization index that is sent is one for which the decoded value of $d[j]$ satisfies the triangular inequalities.

In addition to this sum or gain amplitude information, a minimum information (1 bit) denoted b [j] is transmitted to enable the direction of rotation of the left and right channels to be deduced from the mono signal.

Figures 8a and 8b illustrate a geometric example for a selected frequency line, from the values $|\hat{L}[j]|$, $|\hat{R}[j]|$ and $\hat{d}[j]$. In these figures, OD corresponds to the amplitude value $d[j]$ on the axis defined by the mono signal (OM). Figures 8a and 8b illustrate the two possible solutions for the given values $|\hat{L}[j]|$, $|\hat{R}[j]|$ and $\hat{d}[j]$; the transmitted information bit makes it possible to choose between these two possibilities.

For example, this bit can indicate the direction of rotation of a channel with respect to the mono signal. That of the left channel is sufficient; that of the other channel need not be transmitted because it is always the opposite of the first. One can also choose to transmit the sign of the angle for the dominant channel.

To this end, in steps E703 and E715 of FIGS. 7a and 7b, the angle $\alpha[j]$ that one of the channels forms with the mono signal is determined. Then, in steps E704 and E716 of FIGS. 7a and 7b, the sign of the angle $\alpha[j]$ is determined, and this direction of rotation is quantized on one bit in steps E705 and E707 respectively.

Thus, at the decoder, knowing $\hat{M}[j]$, $|\hat{L}[j]|$, $|\hat{R}[j]|$ and $\hat{d}[j]$, we can find the quantized values of the angles $\hat{\alpha}[j]$ and $\hat{\beta}[j]$, and so $\hat{L}[j]$ and $\hat{R}[j]$.

The embodiment has been presented using the original amplitude information $|L[j]|$ and $|R[j]|$ per frequency coefficient, used coefficient by coefficient. However, the amplitude information, which is transmitted in ICLD form in the first enhancement layer, is encoded by frequency subbands, and these subbands can include several frequency coefficients. The following approximation is therefore made for coding and decoding information in the second layer:

When the frequency line of index $j$ corresponds to a subband whose size is greater than a single coefficient, the amplitudes $|L[j]|$ and $|R[j]|$ are assumed to be those obtained by applying the constant amplitude information $\hat{I}[j]$ over the subband, as defined below.

An alternative embodiment of the encoder of FIG. 3 is now presented with reference to FIG. 9.

In this encoder, the block 307 performing the "downmix" processing using the modules 330 and 331 according to the invention also extracts spatialization parameters of the stereo signals through the module 332.

These parameters are determined in accordance with the first, second and third embodiments described above as well as for their variants.

In the first embodiment, the phase difference between the mono signal obtained by the "downmix" processing and one of the stereo channels determined as secondary, is determined by the module 332.

In the second embodiment, the phase difference between the mono signal obtained by the "downmix" processing and one of the stereo channels determined as dominant is determined by the module 332. The indicator making it possible to recover the phase difference between the mono signal and the second channel determined as secondary is also determined by the module 332.

In a variant of this second embodiment, the indicator for determining a gain to be applied to the mono signal to find the secondary channel, is determined.

In the third embodiment, it is the amplitude of the sum of the stereo signals that is determined in the module 332 of the block 307. An indication to find the direction of rotation of the stereo channels is also determined in this module.

In a variant of this third embodiment, it is a gain to be applied to the mono signal that is determined to recover the amplitude of the sum of the stereo signals.

These parameters are shifted by two frames at 313, like the signals $L'[j]$ and $R'[j]$ and as explained with reference to FIG.

The parameter extraction block 314 retrieves these parameters from the block 307 and determines the intensity information parameter, for example the parameter ICLD. This block 314 then transmits all these parameters for a quantification at 315.

The encoders as shown in FIGS. 3 and 9 use FFT analysis-synthesis with a symmetric sinusoidal window.

However, a shorter or even asymmetric window could advantageously be used to reduce the coding delay. In general, the invention applies similarly to an implementation using a time-frequency analysis other than an FFT filter bank, for example a frequency filter bank based on a "Modulated Complex Lapped Transform" (MCLT), which combines two transforms in quadrature, a "Modified Discrete Cosine Transform" (MDCT) and a "Modified Discrete Sine Transform" (MDST), or a complex filter bank of the "pseudo quadrature mirror filter" (PQMF) type.

Furthermore, the principle of the invention also applies to the case where the G.722 encoder and decoder are replaced by other encoders and decoders, possibly with different characteristics (bit rate, frame length, etc.).

Referring to Figure 10 a decoder according to an embodiment of the invention is now described.

This decoder comprises a demultiplexer 501 in which the coded mono signal is extracted to be decoded at 502 by a G.722 decoder in this example. The portion of the bit stream (scalable) corresponding to G.722 is decoded at 56 or 64 kbit / s depending on the selected mode. It is assumed here that there is no loss of frames or bit errors on the bit stream to simplify the description, however, known frame loss correction techniques can obviously be implemented in the decoder.

The synthesized mono signal corresponds to $\hat{M}(n)$ in the absence of channel errors. A short-term discrete Fourier transform analysis with the same windowing as at the encoder is performed on $\hat{M}(n)$ (blocks 503 and 504) to obtain the spectrum $\hat{M}[j]$.

The part of the bit stream associated with the stereo extension is also demultiplexed. The ICLD parameters are decoded to obtain the values $\mathrm{ICLD}^q$ (block 505), and the phase difference $\beta[j]$ between the secondary channel and the signal M, per frequency line, is decoded (block 506) to obtain $\hat{\beta}[j]$ according to a first embodiment.

According to a second embodiment, it is the phase difference $\alpha[j]$ between the dominant channel and the signal M, per frequency line, which is decoded (block 506) to obtain $\hat{\alpha}[j]$.

The amplitudes of the left and right channels are reconstructed (block 507) by applying the decoded ICLD parameters by subband. This synthesis is carried out as follows:

where $c_1[j]$ and $c_2[j]$ are the factors that are calculated from the ICLD values by subband.

These factors $c_1[j]$ and $c_2[j]$ are for example in the following form:

where $\hat{I}[j]$ is defined from the decoded ICLD parameter as

where $\mathrm{ICLD}^q[j]$ is the ICLD parameter decoded for the line $j$.
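The formulas for the two factors and for the amplitude ratio are not legible in the text above. As a hedged illustration only, the sketch below derives one self-consistent choice from two assumptions: the mono amplitude is the mean of the channel amplitudes, and the ratio is the left-to-right amplitude ratio implied by an ICLD expressed in dB on energies. The function names are hypothetical and the formulas should not be read as the ones given in the source.

```python
import math

def ratio_from_icld(icld_db):
    """Amplitude ratio |L| / |R| implied by an energy-based ICLD in dB
    (assumption: ICLD = 20 * log10(|L| / |R|))."""
    return 10.0 ** (icld_db / 20.0)

def gains_from_ratio(i_hat):
    """Factors c1, c2 such that |L| = c1 * |M| and |R| = c2 * |M| when
    |M| = (|L| + |R|) / 2 and |L| / |R| = i_hat (derived, not quoted)."""
    c1 = 2.0 * i_hat / (1.0 + i_hat)
    c2 = 2.0 / (1.0 + i_hat)
    return c1, c2

# Usage: a left channel about 6 dB louder than the right channel.
i_hat = ratio_from_icld(20.0 * math.log10(2.0))
c1, c2 = gains_from_ratio(i_hat)
```

Under these assumptions the two factors always sum to 2, and the larger one belongs to the dominant channel, which is consistent with the use of max and min of the two factors in the reconstruction formulas.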

The ratio $\hat{I}[j]$ is decoded from the information encoded in the first 8 kbit/s stereo enhancement layer. The coding and the associated decoding are not detailed here, but for a budget of 40 bits per frame it can be considered that this ratio is coded by subband and not by frequency line, with a non-uniform subband division. If the decoder operates at 56 + 8 kbit/s for the current frame, only the parameters $\hat{I}[j]$ decoded by subband are used to reconstruct the spectra of the L and R channels, as previously described, i.e. by equation (41).

If the decoder operates at 64 + 16 kbit/s, the decoder also receives the information coded in the second stereo enhancement layer, which makes it possible to decode the parameters $\hat{\beta}[j]$ for the lines of index $j = 2$ to 9 in a first embodiment of the invention, and the parameters $\hat{\alpha}[j]$ and $\hat{b}[j]$ for the lines of index $j = 2$ to 9 in a second embodiment.

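The decoder defines for each frequency line the dominant channel $\hat{X}[j]$ and the secondary channel $\hat{Y}[j]$ as follows:

$$\hat{X}[j] = \hat{L}[j],\quad \hat{Y}[j] = \hat{R}[j] \quad \text{if } \hat{I}[j] \ge 1$$

and

$$\hat{X}[j] = \hat{R}[j],\quad \hat{Y}[j] = \hat{L}[j] \quad \text{if } \hat{I}[j] < 1$$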

In a first embodiment, the secondary channel is reconstructed from the angles $\hat{\beta}[j]$ decoded by block 506, simply according to the formula:

$$\hat{Y}[j] = \min(c_1[j], c_2[j]) \cdot \hat{M}[j] \cdot e^{j\hat{\beta}[j]} \tag{43b}$$

The amplitude of the dominant channel is decoded using the decoded mono signal $\hat{M}[j]$, the decoded secondary channel $\hat{Y}[j]$ and the amplitude $|\hat{X}[j]|$, which is known from the ratio $\hat{I}[j]$, by the following formula:

The angle $\hat{\alpha}[j]$ is derived from the following relation: $\hat{\alpha}[j] =$

and the dominant channel is reconstructed according to the formula

$$\hat{X}[j] = \max(c_1[j], c_2[j]) \cdot \hat{M}[j] \cdot e^{j\hat{\alpha}[j]} \tag{43}$$

In a second embodiment, the dominant channel is reconstructed from the angles $\hat{\alpha}[j]$ decoded by block 506, simply by the formula:

$$\hat{X}[j] = \max(c_1[j], c_2[j]) \cdot \hat{M}[j] \cdot e^{j\hat{\alpha}[j]} \tag{43}$$

The amplitude of the secondary channel is decoded using the decoded mono signal $\hat{M}[j]$, the decoded dominant channel $\hat{X}[j]$ and the amplitude $|\hat{Y}[j]|$ known from the ratio $\hat{I}[j]$, by the following formula:

$$|\hat{Y}[j]| = |\hat{M}[j]| \cdot \min(c_1[j], c_2[j]) \tag{44}$$

For each frequency line $j = 2$ to 9, the indicator $\hat{b}[j]$ is received, whose value makes it possible to choose between two decoded angles $\hat{\beta}_0[j]$ and $\hat{\beta}_1[j]$.

The calculation of $\hat{\beta}_0[j]$ and $\hat{\beta}_1[j]$ is identical to the method given with reference to FIG. 6c and uses the decoded information and not the original values.

From the calculation of $\hat{\beta}_0[j]$ and $\hat{\beta}_1[j]$ and the received bit $\hat{b}[j]$, the secondary channel is reconstructed by rotating by the angle $\hat{\beta}[j]$ (the sign of $\hat{\beta}[j]$ is opposite to that of $\hat{\alpha}[j]$) according to the following formula:

$$\hat{Y}[j] = \min(c_1[j], c_2[j]) \cdot \hat{M}[j] \cdot e^{j\hat{\beta}[j]} \tag{45}$$

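Note that the above formula is only valid and applicable if the following condition is true:

$$|\hat{\alpha}[j]| < \arcsin\left(\frac{|\hat{Y}[j]|}{|\hat{X}[j]|}\right)$$

where $\hat{X}[j]$ is the decoded dominant channel and $\hat{Y}[j]$ is the decoded secondary channel. In the case considered where L is dominant and R is secondary, the condition becomes:

$$|\hat{\alpha}[j]| < \arcsin\left(\frac{1}{|\hat{I}[j]|}\right) \tag{46}$$

In the opposite case where R is dominant and L is secondary:

$$|\hat{\alpha}[j]| < \arcsin\left(|\hat{I}[j]|\right) \tag{47}$$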

In the case where this condition is not satisfied, the angle $\hat{\alpha}[j]$ is limited as follows:

$$|\hat{\alpha}[j]| = \arcsin\left(\frac{|\hat{Y}[j]|}{|\hat{X}[j]|}\right) \tag{48}$$

The decoding previously described for the 64 + 16 kbit / s rate then works correctly.
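The limitation of the decoded angle can be sketched as a simple clamp. This is an assumption-based illustration: the bound is taken as the arcsine of the secondary-to-dominant amplitude ratio, which equals the smaller of the decoded ratio and its inverse, and the sign of the decoded angle is assumed to be preserved. The function name is hypothetical.

```python
import math

def limit_alpha(alpha_hat, i_hat):
    """Clamp the decoded dominant-channel angle to +/- arcsin(|Y| / |X|),
    where |Y| / |X| = min(i_hat, 1 / i_hat) for a ratio i_hat = |L| / |R| > 0."""
    bound = math.asin(min(i_hat, 1.0 / i_hat))
    return max(-bound, min(bound, alpha_hat))

# Usage: an out-of-range angle is pulled back to the bound, a valid one is kept.
clamped = limit_alpha(1.0, 2.0)      # bound is arcsin(0.5)
unchanged = limit_alpha(0.1, 2.0)
```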

The spectra $\hat{R}[j]$ and $\hat{L}[j]$ are deduced from $\hat{X}[j]$ and $\hat{Y}[j]$ and converted into the time domain by inverse FFT, windowing and overlap-add (blocks 508 to 513) to obtain the synthesized channels $\hat{R}(n)$ and $\hat{L}(n)$.

Figure 11a shows the flowchart for decoding, in the first embodiment, the angles $\hat{\alpha}[j]$ and $\hat{\beta}[j]$ for the second stereo extension layer.

The angle $\hat{\beta}[j]$ is decoded in step B1101.

The secondary channel is reconstructed in step B1102 according to the formula:

The angle $\hat{\alpha}[j]$ is deduced in step B1103 from the relation

and the dominant channel is reconstructed in step B1104 according to the formula:

The stereo signals $\hat{L}[j]$ and $\hat{R}[j]$ can thus be synthesized in step B1105.

FIG. 11b shows the flowchart for decoding, in the second embodiment, the angles $\hat{\alpha}[j]$ and $\hat{\beta}[j]$ for the second stereo extension layer. The angle $\alpha[j]$ and the indicator $b[j]$ are decoded in steps E1101 and E1103, and the quantized values $\hat{\alpha}[j]$ and $\hat{b}[j]$ are obtained.

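The angles $\hat{\beta}_0[j]$ and $\hat{\beta}_1[j]$ are calculated in step E1102 according to the following equations:

$$\hat{\beta}_0[j] = s \cdot \arcsin\left(\frac{|\hat{X}[j]|}{|\hat{Y}[j]|}\,|\sin\hat{\alpha}[j]|\right) \tag{49}$$

The value of $\hat{b}[j]$ is used to select the angle $\hat{\beta}_0[j]$ or $\hat{\beta}_1[j]$ in step E1104.

The stereo signals $\hat{L}[j]$ and $\hat{R}[j]$ can thus be synthesized in step E1105.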

In a variant of this second embodiment, the dominant channel $\hat{X}[j]$ is reconstructed as explained above, from the angles $\hat{\alpha}[j]$ decoded by block 506 and in step E1101.

For each frequency coefficient where this information has been coded, an indicator $\hat{b}[j]$ is received, which is the decoded value of $b[j]$ and which makes it possible to choose between $\hat{\lambda}_0[j]$ and $\hat{\lambda}_1[j]$, gains to be applied to the synthesized mono signal.

The secondary channel $\hat{Y}[j]$ is then reconstructed from the following function:

In a third embodiment, the block 506 of FIG. 10 decodes information per frequency line on the sum of the stereo channels, i.e. the amplitude of the sum of the channels or, in a variant, a gain to be applied to the amplitude of the mono signal to obtain the amplitude of the sum of the stereo channels.

In both cases, an indication by frequency line is also decoded at 506. This indication indicates the direction of rotation to be given for one of the stereo channels to be synthesized in the module 507.

FIG. 12 represents the flow diagram of the decoding of the spatialization information of the second extension layer corresponding to the codings represented in FIGS. 7a and 7b.

At the decoder, after inverse quantization of $d[j]$ in step E1201, knowing $\hat{M}[j]$, $|\hat{L}[j]|$, $|\hat{R}[j]|$ and $\hat{d}[j]$, we can find the quantized values of the angles $\hat{\alpha}[j]$ and $\hat{\beta}[j]$ and thus $\hat{L}[j]$ and $\hat{R}[j]$, for example as described below.

If this has not been done at the encoder, one must first check that the triangular inequalities hold for the quantized parameters, in steps E1202 and E1203. If not (N in E1203), the value of $\hat{d}[j]$ must be corrected in step E1204, for example as follows:

- if $\hat{d}[j]$ is less than $\big|\,|\hat{L}[j]| - |\hat{R}[j]|\,\big|$, we take $\hat{d}[j] = \big|\,|\hat{L}[j]| - |\hat{R}[j]|\,\big|$;

- if $\hat{d}[j]$ is greater than $|\hat{L}[j]| + |\hat{R}[j]|$, we take $\hat{d}[j] = |\hat{L}[j]| + |\hat{R}[j]|$.
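The correction of step E1204 amounts to clamping the decoded sum amplitude between the two bounds of the triangular inequalities. A minimal sketch, with a hypothetical function name, is:

```python
def enforce_triangle(d_hat, abs_l, abs_r):
    """Clamp the decoded amplitude of the sum so that
    | |L| - |R| | <= d <= |L| + |R|."""
    lower = abs(abs_l - abs_r)
    upper = abs_l + abs_r
    return min(max(d_hat, lower), upper)

# Usage with |L| = 2 and |R| = 1: the valid range for d is [1, 3].
too_small = enforce_triangle(0.5, 2.0, 1.0)
too_large = enforce_triangle(3.5, 2.0, 1.0)
valid = enforce_triangle(2.2, 2.0, 1.0)
```

Values already inside the valid range are returned unchanged.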

The checks made by steps E1202, E1203 and E1204 are also necessary if the probability of bit errors during transmission is not zero.

In other cases these steps are optional.

If the triangular inequalities have been respected for the quantized value of d [j] as given above, L [j], R [j] and d [j] then determine a unique triangle as represented in FIG. 8b whose two angles are the two desired angles: â [j] between d [j] and L [j] and fi [j] between d [j] and R [j].

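The absolute values of these decoded angles can be obtained in E1205 and E1206, using the Al-Kashi theorem, also known as the law of cosines, according to the following formulas:

$$|\hat{\alpha}[j]| = \arccos\left(\frac{\hat{d}[j]^2 + |\hat{L}[j]|^2 - |\hat{R}[j]|^2}{2\,\hat{d}[j]\,|\hat{L}[j]|}\right), \qquad |\hat{\beta}[j]| = \arccos\left(\frac{\hat{d}[j]^2 + |\hat{R}[j]|^2 - |\hat{L}[j]|^2}{2\,\hat{d}[j]\,|\hat{R}[j]|}\right)$$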

Since the triangle inequalities hold, one is sure, without further verification, that the argument of these "arccos" functions lies in the interval $[-1, 1]$.

The "arccos" functions can thus be calculated, and they give unique results in the interval $[0, \pi]$. We check, for example, that in the case where the two channels are in phase, we have $\hat{\alpha}[j] = \hat{\beta}[j] = 0$.

As previously explained, another bit is also transmitted to determine the sign of one of the angles ($\hat{\alpha}[j]$ in our example), the sign of the other angle ($\hat{\beta}[j]$) being opposite to the first.

Thus, in step E1207, the sign of $\hat{\alpha}[j]$ is decoded; it determines the direction of rotation of both $\hat{L}[j]$ and $\hat{R}[j]$ with respect to $\hat{M}[j]$.

Knowing the amplitudes $|\hat{L}[j]|$, $|\hat{R}[j]|$ and the angles $\hat{\alpha}[j]$ and $\hat{\beta}[j]$ (rotation) relative to $\hat{M}[j]$, the values $\hat{L}[j]$ and $\hat{R}[j]$ are easily obtained in step E1208, in the same way as in the other embodiments already presented.
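By way of illustration, steps E1205 to E1208 can be sketched as follows for a single frequency line. The sketch assumes that the amplitude of the sum has already passed the triangle-inequality check, that the angle of a channel is counted as its phase minus the phase of the mono signal, and that the sign bit is available as a boolean. All names are hypothetical.

```python
import cmath
import math


def _cosine_law_angle(side_a, side_b, opposite):
    """Angle between side_a and side_b in a triangle whose third side is opposite."""
    denom = 2.0 * side_a * side_b
    if denom == 0.0:
        return 0.0  # degenerate triangle: no rotation can be derived
    cos_value = (side_a ** 2 + side_b ** 2 - opposite ** 2) / denom
    # Guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_value)))


def synthesize_stereo_line(m, amp_l, amp_r, d, alpha_is_positive):
    """Rebuild the complex L and R coefficients from the decoded parameters.

    m                 -- decoded complex mono coefficient (only its phase is used)
    amp_l, amp_r      -- decoded amplitudes of the left and right channels
    d                 -- decoded amplitude of the sum of the channels
    alpha_is_positive -- decoded sign bit of the angle alpha
    """
    alpha = _cosine_law_angle(d, amp_l, amp_r)  # angle between d and L
    beta = _cosine_law_angle(d, amp_r, amp_l)   # angle between d and R
    sign = 1.0 if alpha_is_positive else -1.0
    phase_m = cmath.phase(m)
    # The two channels are rotated in opposite directions around the mono phase.
    left = cmath.rect(amp_l, phase_m + sign * alpha)
    right = cmath.rect(amp_r, phase_m - sign * beta)
    return left, right
```

When the mono phase equals the phase of the sum of the original channels and the parameters are unquantized, this sketch returns the original channels up to rounding error.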

The encoder presented with reference to FIG. 3 and the decoder presented with reference to FIG. 10 have been described in the case of a particular application of hierarchical coding and decoding. The invention can also be applied in the case where the spatialization information is transmitted to and received at the decoder in the same coding layer and at the same bit rate.

The encoders and decoders as described with reference to FIGS. 3, 9 and 10 may be integrated in multimedia equipment such as a set-top box or an audio or video content player. They can also be integrated into communication equipment such as a mobile phone.

FIG. 13a represents an exemplary embodiment of such an equipment in which an encoder according to the invention is integrated. This device comprises a PROC processor cooperating with a memory block BM having a storage and / or working memory MEM.

The memory block may advantageously comprise a computer program comprising code instructions for implementing the steps of the coding method in the sense of the invention, when these instructions are executed by the processor PROC, and in particular the steps of coding a mono signal resulting from a channel reduction processing applied to the stereo signal and of coding spatialization information of the stereo signal. During these steps, the channel reduction processing comprises a calculation, per frequency coefficient, of the amplitude of the mono signal as a function of the amplitudes of the channels of the stereo signal, and additionally a determination, for a predetermined set of frequency coefficients, of the phase of the mono signal by the calculation of the phase of the signal by means of the channels of the stereo signal.

The program may include the steps implemented to code the information adapted to this processing.

Typically, the descriptions of FIGS. 3, 5 and 7 show the steps of an algorithm of such a computer program. The computer program can also be stored on a memory medium readable by a reader of the device or downloadable in the memory space thereof.

Such equipment or encoder comprises an input module adapted to receive a stereo signal comprising the R and L channels for right and left, either by a communication network, or by reading a content stored on a storage medium. This multimedia equipment may also include means for capturing such a stereo signal.

The device comprises an output module adapted to transmit the coded spatial information parameters P_c and a mono signal M resulting from the coding of the stereo signal.

In the same way, FIG. 13b illustrates an example of multimedia equipment or decoding device comprising a decoder according to the invention.

This device comprises a PROC processor cooperating with a memory block BM having a storage and / or working memory MEM.

The memory block can advantageously comprise a computer program comprising code instructions for implementing the steps of the decoding method in the sense of the invention, when these instructions are executed by the processor PROC, and in particular the steps of decoding a received mono signal, resulting from a channel reduction processing applied to the original stereo signal, and of decoding spatialization information of the original stereo signal. The decoding method further comprises a synthesis of the stereo signals, per frequency coefficient, from the decoded mono signal comprising an amplitude information obtained as a function of the amplitudes of the original stereo channels and a phase information of the signal by means of the original stereo channels, and from the decoded spatialization information.

Typically, the description of FIGS. 10, 11 and 12 sets out the steps of an algorithm of such a computer program. The computer program can also be stored on a memory medium readable by a reader of the device or downloadable into the memory space of the equipment.

The device comprises an input module capable of receiving the coded spatial information parameters P_c and a mono signal M originating, for example, from a communication network. These input signals can also be obtained by reading a storage medium.

The device comprises an output module capable of transmitting a stereo signal, L and R, decoded by the decoding method implemented by the equipment.

This multimedia equipment may also include speaker type reproduction means or communication means capable of transmitting this stereo signal.

Obviously, such multimedia equipment may include both the encoder and the decoder according to the invention, the input signal then being the original stereo signal and the output signal the decoded stereo signal.

ANNEX

α = angle(L * conj(M));
if |α| > π / 2
    α = -2 * π * sign(α)
end
β = angle(R * conj(M));
if (sign(α * β) > 0)
    β = β - 2 * π * sign(β)
end
β0 = asin(|L| / |R| * sin(α))
if (β0 * α > 0)
    β0 = -β0;
end
if (β0 < 0)
    β1 = β0 + π;
else
    β1 = π - β0;
end
if (β1 * α > 0)
    β1 = -β1
end
b = arg min_i |β - βi|
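The last part of the annex (construction of the two candidate angles and choice of the indicator b) can be sketched in Python as follows. The sketch takes the angles α and β as inputs, already wrapped as in the first part of the annex, and assumes a non-zero amplitude for the second channel. The function names are hypothetical, and the clipping of the arcsine argument is an added safeguard that is not part of the annex.

```python
import math


def candidate_betas(alpha, amp_l, amp_r):
    """Return the two candidate angles (beta0, beta1) for the second channel.

    Both candidates have the same sine and a sign opposite to that of alpha.
    amp_r is assumed to be strictly positive.
    """
    arg = amp_l / amp_r * math.sin(alpha)
    arg = max(-1.0, min(1.0, arg))  # safeguard against rounding (assumption)
    beta0 = math.asin(arg)
    if beta0 * alpha > 0:
        beta0 = -beta0
    if beta0 < 0:
        beta1 = beta0 + math.pi
    else:
        beta1 = math.pi - beta0
    if beta1 * alpha > 0:
        beta1 = -beta1
    return beta0, beta1


def choose_indicator(alpha, beta, amp_l, amp_r):
    """Return b in {0, 1}: index of the candidate closest to the true beta."""
    candidates = candidate_betas(alpha, amp_l, amp_r)
    return min(range(2), key=lambda i: abs(beta - candidates[i]))
```

At the decoder, the same two candidates can be rebuilt from the decoded α and the decoded amplitudes, so that the single bit b is enough to select the angle of the second channel.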

Claims

1. A method of parametrically encoding a stereo digital audio signal comprising a step of encoding (312) a mono signal resulting from a channel reduction processing (307) applied to the stereo signal and of encoding spatialization information (315, 316) of the stereo signal, the channel reduction processing comprising a calculation (330), per frequency coefficient, of the amplitude of the mono signal as a function of the amplitudes of the channels of the stereo signal, characterized in that it further comprises determining (331), for a predetermined set of frequency coefficients, the phase of the mono signal by calculating the phase of the signal by means of the channels of the stereo signal.

2. Method according to claim 1, characterized in that the spatialization information comprises a first information (ICLD) on the amplitude of the stereo channels and a second information on the phase of the stereo channels, the second information comprising, per frequency coefficient, the phase difference (β[j] or α[j]) defined between the mono signal and a first predetermined stereo channel.

3. Method according to claim 2, characterized in that the second information further comprises an indication (b[j]) for deriving the phase difference between the mono signal and the second stereo channel.

4. Method according to claim 3, characterized in that the minimum indication is coded on one bit and indicates the choice between two possible phase differences (β0[j], β1[j]) between the mono signal and the second stereo channel.

5. Method according to claim 3, characterized in that the minimum indication is coded on one bit and indicates the choice between two possible gains (/ ¾ [/], ¾ [/]) to be applied to the intensity of the mono signal to find the second stereo channel.

6. Method according to claim 1, characterized in that the spatialization information comprises a first information on the amplitude of the stereo channels and a second information on the amplitude of the sum of the stereo channels.

7. Method according to claim 6, characterized in that the second information comprises, by frequency coefficient, the value of the amplitude of the sum of the stereo channels and a minimum indication to deduce the direction of rotation of the stereo channels.

8. Method according to claim 6, characterized in that the second information comprises, by frequency coefficient, the value of a gain to be applied to the amplitude of the mono signal and a minimum indication to deduce the direction of rotation of the stereo channels.

9. Method according to one of claims 2 to 8, characterized in that the first information is coded by a first coding layer and the second information is coded by a second coding layer.

10. A method of parametric decoding of a stereo digital audio signal comprising a step of decoding (502) a received mono signal resulting from a channel reduction processing applied to the original stereo signal and of decoding (505, 506) spatialization information of the original stereo signal,

characterized in that it comprises a synthesis (507) of the stereo signals, by frequency coefficient, from the decoded mono signal comprising an amplitude information obtained as a function of the amplitudes of the original stereo channels and a phase information of the signal by means of the original stereo channels and from decoded spatialization information.

11. Method according to claim 10, characterized in that the spatialization information comprises a first information on the amplitude of the stereo channels (ICLD) and a second information on the phase of the stereo channels, the second information comprising, per frequency coefficient, the angle (β[j] or α[j]) defined between the mono signal and a first predetermined stereo channel.

12. Decoding method according to claim 11, characterized in that the second information further comprises a minimum indication (b[j]) for deriving the angle between the mono signal and the second stereo channel.

13. The method of claim 10, characterized in that the spatialization information comprises a first information on the amplitude of the stereo channels and a second information on the amplitude of the sum of the stereo channels.

14. Method according to one of claims 10 to 13, characterized in that the first information is decoded by a first decoding layer and the second information is decoded by a second decoding layer.

15. A parametric encoder of a stereo audio signal having a coding module (312) of a mono signal resulting from a channel reduction processing module (307) applied to the stereo signal and a spatialization information coding module (315, 316) of the stereo signal, the channel reduction processing module comprising a module (330) for calculating, per frequency coefficient, the amplitude of the mono signal as a function of the amplitudes of the channels of the stereo signal,

characterized in that the channel reduction processing module further comprises a determination module (331) for determining, for a predetermined set of frequency coefficients, the phase of the mono signal by calculating the phase of the signal by means of the channels of the stereo signal.

16. Parametric decoder of a stereo audio signal comprising a decoding module (502) of a received mono signal, resulting from a channel reduction processing applied to the original stereo signal, and a decoding module (505, 506) of spatialization information of the original stereo signal,

characterized in that it comprises a synthesis module (507) of stereo signals, by frequency coefficient, from the decoded mono signal (M (n)) comprising an amplitude information obtained as a function of the amplitudes of the original stereo channels and signal phase information by means of the original stereo channels and from decoded spatialization information.

17. Computer program comprising code instructions for carrying out the steps of an encoding method according to one of claims 1 to 8 and/or a decoding method according to one of claims 9 to 16, when these instructions are executed by a processor.