EP3391370A1

EP3391370A1 - Adaptive channel-reduction processing for encoding a multi-channel audio signal

Info

Publication number: EP3391370A1
Application number: EP16825835.8A
Authority: EP
Inventors: Bertrand FATUS; Stéphane RAGOT
Original assignee: Orange SA
Current assignee: Orange SA
Priority date: 2015-12-16
Filing date: 2016-12-13
Publication date: 2018-10-24
Also published as: CN108369810B; WO2017103418A1; CN108369810A; US20190156841A1; FR3045915A1; US10553223B2

Abstract

The invention relates to a method for parametric encoding of a multi-channel digital audio signal, including a step of encoding (312) a mono signal (M) resulting from channel-reduction processing (307) applied to the multi-channel signal and of encoding spatialisation information (315, 316) of the multi-channel signal. Said method is characterised in that the channel-reduction processing includes the following steps, implemented per spectral unit of the multi-channel signal: extracting (307a) at least one indicator characterising the channels of the multi-channel digital audio signal; selecting (307b), from a set of channel-reduction processing modes, a channel-reduction processing mode in accordance with the value of the at least one indicator characterising the channels of the multi-channel audio signal. The invention likewise relates to a corresponding encoding device and to a processing method which comprises channel-reduction processing as described.

Description

Adaptive Channel Reduction Processing for Encoding a Multichannel Audio Signal

The present invention relates to the field of coding / decoding of digital signals.

The coding and decoding according to the invention is particularly suitable for the transmission and / or storage of digital signals such as audio-frequency signals (speech, music or other).

More particularly, the present invention relates to parametric encoding or processing of multichannel audio signals, e.g. stereophonic signals hereinafter referred to as stereo signals.

This type of coding is based on the extraction of spatial information parameters so that at decoding, these spatial characteristics can be reconstructed for the listener, in order to recreate the same spatial image as in the original signal.

Such a parametric coding/decoding technique is described, for example, in the document by J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers, titled "Parametric Coding of Stereo Audio", in EURASIP Journal on Applied Signal Processing 2005:9, pp. 1305-1322. This example is recalled with reference to FIGS. 1 and 2, which respectively describe a parametric stereo encoder and decoder.

Thus, FIG. 1 describes a stereo encoder receiving two audio channels, a left channel (denoted L) and a right channel (denoted R).

The time-domain signals L(n) and R(n), where n is the integer index of the samples, are processed by blocks 101, 102, 103 and 104, which perform a short-term Fourier analysis. The transformed signals L[k] and R[k], where k is the integer index of the frequency coefficients, are thus obtained.

Block 105 performs channel reduction processing ("downmix") to obtain, in the frequency domain, a monophonic signal (hereinafter called the mono signal) from the left and right signals.

Extraction of spatial information parameters is also performed in block 105. The extracted parameters are as follows.

The ICLD (InterChannel Level Difference) parameters, also called interchannel intensity differences, characterize the energy ratios per frequency subband between the left and right channels. These parameters make it possible to position sound sources in the stereo horizontal plane by panning. They are defined in dB by the following formula:

$$\mathrm{ICLD}[b] = 10 \log_{10} \left( \frac{\sum_{k=k_b}^{k_{b+1}-1} L[k]\, L^*[k]}{\sum_{k=k_b}^{k_{b+1}-1} R[k]\, R^*[k]} \right) \ \mathrm{dB} \tag{1}$$

where L[k] and R[k] correspond to the (complex) spectral coefficients of the L and R channels, each frequency band of index b comprises the frequency lines in the interval [k_b, k_{b+1} - 1], and the symbol * indicates the complex conjugate.

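The ICPD parameters (for "InterChannel Phase Difference"), also called phase differences, are defined by the following relation:

$$\mathrm{ICPD}[b] = \angle \left( \sum_{k=k_b}^{k_{b+1}-1} L[k]\, R^*[k] \right) \tag{2}$$

where ∠ indicates the argument (phase) of the complex operand.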
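As a minimal sketch of how these two parameters could be computed for one subband, the Python functions below assume the usual definitions from Breebaart et al.: ICLD[b] is 10·log10 of the ratio of the subband energies of L and R, and ICPD[b] is the argument of the cross-spectrum L[k]·R*[k] summed over the subband. The function names, the toy spectra and the small `eps` guard are illustrative assumptions, not part of the encoder described here.

```python
import cmath
import math

def icld_db(L, R, k_lo, k_hi, eps=1e-12):
    """Level difference in dB over frequency lines k_lo..k_hi (inclusive)."""
    e_left = sum(abs(L[k]) ** 2 for k in range(k_lo, k_hi + 1))
    e_right = sum(abs(R[k]) ** 2 for k in range(k_lo, k_hi + 1))
    # eps is an assumption: it avoids log(0) and division by zero on silent bands
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))

def icpd(L, R, k_lo, k_hi):
    """Phase difference in radians: argument of the sum of L[k] * conj(R[k])."""
    cross = sum(L[k] * R[k].conjugate() for k in range(k_lo, k_hi + 1))
    return cmath.phase(cross)

# Toy subband of three lines: R is L at half amplitude, lagging by 90 degrees.
L = [1 + 0j, 0 + 2j, -1 + 1j]
R = [0.5 * x * cmath.exp(-1j * math.pi / 2) for x in L]

level = icld_db(L, R, 0, 2)   # energy ratio of 4, i.e. about 6 dB
phase = icpd(L, R, 0, 2)      # L leads R by pi/2
```

With a subband reduced to a single line (k_lo equal to k_hi), the same functions give the per-line level and phase differences.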

It is also possible to define, in a manner equivalent to the ICPD, an interchannel time shift called ICTD (for "InterChannel Time Difference"); its definition is known to those skilled in the art and is not recalled here.

Unlike the ICLD, ICPD and ICTD parameters, which are localization parameters, the ICC (for "InterChannel Coherence") parameters represent the inter-channel correlation (or coherence) and are associated with the spatial width of the sound sources. Their definition is not recalled here, but it is noted in the article by Breebaart et al. that the ICC parameters are not necessary in subbands reduced to a single frequency coefficient: the amplitude and phase differences completely describe the spatialization in this "degenerate" case.

These ICLD, ICPD and ICC parameters are extracted by block 105 through analysis of the stereo signals. If the ICTD (or ITD) parameters were also coded, they could likewise be extracted per subband from the spectra L[k] and R[k]; however, the extraction of the ITD parameters is generally simplified by assuming an identical inter-channel time shift for every subband, in which case a single parameter can be extracted from the time-domain channels L(n) and R(n) by cross-correlation.

The mono signal M[k] is converted to the time domain (blocks 106 to 108) by short-term Fourier synthesis (inverse FFT, windowing and overlap-add), and mono coding (block 109) is then performed. In parallel, the stereo parameters are quantized and coded in block 110.

In general, the spectrum of the signals (L[k], R[k]) is divided according to a nonlinear frequency scale of ERB (Equivalent Rectangular Bandwidth) or Bark type, with a number of subbands typically ranging from 20 to 34 for a signal sampled at 16 to 48 kHz according to the Bark scale. This scale defines the values of k_b and k_{b+1} for each subband b. The parameters (ICLD, ICPD, ICC, ITD) are encoded by scalar quantization, possibly followed by entropy coding and/or differential coding. For example, in the article cited above, the ICLD is encoded by a non-uniform quantizer (ranging from -50 to +50 dB) with differential entropy coding. The non-uniform quantization step exploits the fact that the higher the value of the ICLD, the lower the auditory sensitivity to variations of this parameter.

For the coding of the mono signal (block 109), several quantization techniques with or without memory are possible, for example Pulse Code Modulation (PCM) coding, its version with adaptive prediction called Adaptive Differential Pulse Code Modulation (ADPCM), or more advanced techniques such as perceptual transform coding, Code Excited Linear Prediction (CELP) coding or multi-mode coding.

We are interested here more particularly in the 3GPP EVS (for "Enhanced Voice Services") recommendation, which uses multi-mode coding. The algorithmic details of the EVS codec are provided in 3GPP specifications TS 26.441 to 26.451 and are therefore not repeated here. In what follows, these specifications are referred to as EVS.

The input signal of the EVS codec is sampled at a frequency of 8, 16, 32 or 48 kHz, and the codec can represent the telephone audio band (narrowband, NB), wideband (WB), super-wideband (SWB) or fullband (FB). The bit rates of the EVS codec are divided into two modes:

o "EVS Primary":

   o Fixed rates: 7.2, 8, 9.6, 13.2, 16.4, 24.4, 32, 48, 64, 96, 128 kbit/s

   o Variable bit rate (VBR) mode with an average bit rate close to 5.9 kbit/s for active speech

   o "Channel-aware" mode at 13.2 kbit/s in WB and SWB only

o "EVS AMR-WB IO", whose bit rates are identical to those of the 3GPP AMR-WB codec (9 modes)

Added to this is the discontinuous transmission (DTX) mode, in which the frames detected as inactive are replaced by SID frames (SID Primary or SID AMR-WB IO) that are transmitted intermittently, approximately once every 8 frames.

At the decoder 200, with reference to FIG. 2, the mono signal is decoded (block 201) and a decorrelator (block 202) is used to produce two versions M(n) and M'(n) of the decoded mono signal. This decorrelation, necessary only when the ICC parameter is used, makes it possible to increase the spatial width of the mono source M(n). These two signals M(n) and M'(n) are converted to the frequency domain (blocks 203 to 206), and the decoded stereo parameters (block 207) are used by the stereo synthesis (or shaping) (block 208) to reconstruct the left and right channels in the frequency domain. These channels are finally reconstructed in the time domain (blocks 209 to 214).

Thus, as mentioned for the encoder, block 105 performs channel reduction processing ("downmix") by combining the stereo channels (left, right) to obtain a mono signal, which is then encoded by a mono encoder. The spatial parameters (ICLD, ICPD, ICC, ...) are extracted from the stereo channels and transmitted in addition to the bitstream from the mono encoder.

Several techniques have been developed for channel reduction processing or stereo downmix to mono. This "downmix" can be performed in the time or frequency domain. There are usually two types of "downmix":

- Passive "downmix", which corresponds to a direct matrixing of the stereo channels to combine them into a single signal; the coefficients of the downmix matrix are generally real and of predetermined (fixed) values;

- Active (adaptive) downmix that includes energy and / or phase control in addition to the combination of the two stereo channels.

The simplest example of passive downmix is given by the following time-domain matrixing:

$$M(n) = \frac{1}{2}\left(L(n) + R(n)\right) = \begin{bmatrix} 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} L(n) \\ R(n) \end{bmatrix} \tag{3}$$

This type of "downmix" however has the disadvantage of not conserving the energy of the signals after the stereo to mono conversion when the L and R channels are not in phase: in the extreme case where L (n) = - R (n), the mono signal is zero, which is not desirable.
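The cancellation described above can be checked numerically. The short Python sketch below applies the passive average M(n) = (L(n) + R(n))/2 to an in-phase pair and to a pair in phase opposition; the 440 Hz test tone, the 16 kHz rate and the function names are illustrative assumptions.

```python
import math

def passive_downmix(left, right):
    """Sample-wise average of the two channels: M(n) = (L(n) + R(n)) / 2."""
    return [0.5 * (l + r) for l, r in zip(left, right)]

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

# One 20 ms frame (320 samples) of a 440 Hz tone sampled at 16 kHz.
left = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]

in_phase = passive_downmix(left, left)                    # R(n) = L(n)
out_of_phase = passive_downmix(left, [-v for v in left])  # R(n) = -L(n)
```

In the first case the mono signal keeps the energy of one channel; in the second it is identically zero, which is the extreme case mentioned in the text.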

An active downmix mechanism that improves the situation is given by the following equation:

$$M(n) = \gamma(n)\, \frac{L(n) + R(n)}{2} \tag{4}$$

where γ(n) is a factor that compensates for a possible loss of energy.

However, the fact of combining the signals L (n) and R (n) in the time domain does not make it possible to finely control (with sufficient frequency resolution) any phase differences between channels L and R; when the L and R channels have comparable amplitudes and almost opposite phases, phenomena of "erasure" or "attenuation" (loss of "energy") on the mono signal can be observed by frequency subbands with respect to stereo channels.

That is why it is often more advantageous in terms of quality to perform the "downmix" in the frequency domain, even if this involves calculating time/frequency transforms and induces additional delay and complexity compared to a time-domain "downmix".

The previous active downmix can thus be transposed to the spectra of the left and right channels, as follows:

$$M[k] = \gamma[k]\, \frac{L[k] + R[k]}{2} \tag{5}$$

where k corresponds to the index of a frequency coefficient (a Fourier coefficient, for example, representing a frequency subband). The compensation parameter can be set as follows:

$$\gamma[k] = \min\left(2,\ \sqrt{\frac{|L[k]|^2 + |R[k]|^2}{|L[k] + R[k]|^2 / 2}}\right) \tag{6}$$

This ensures that the overall energy of the "downmix" is proportional to the sum of the energies of the left and right channels. The factor γ[k] is saturated here at an amplification of 6 dB.
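The following Python sketch illustrates this frequency-domain active downmix on a single coefficient. It assumes that the mono coefficient is γ[k]·(L[k] + R[k])/2 and that γ[k] is the square root of (|L[k]|² + |R[k]|²) / (|L[k] + R[k]|²/2), capped at 2 (the 6 dB saturation mentioned in the text). The cap being an upper bound, the handling of a fully cancelled coefficient and the function name are assumptions made for illustration.

```python
import math

def active_downmix_bin(l, r, max_gain=2.0):
    """Gain-compensated average of two complex spectral coefficients."""
    s = l + r
    denom = abs(s) ** 2 / 2.0
    if denom == 0.0:
        # Assumption: a fully cancelled coefficient stays at zero,
        # since no finite gain can restore it.
        return 0j
    gain = min(max_gain, math.sqrt((abs(l) ** 2 + abs(r) ** 2) / denom))
    return gain * s / 2.0

m_equal = active_downmix_bin(1 + 1j, 1 + 1j)        # identical channels
m_quadrature = active_downmix_bin(1 + 0j, 0 + 1j)   # 90 degrees apart
m_opposed = active_downmix_bin(1 + 0j, -0.9 + 0j)   # near phase opposition
```

Under these assumptions, identical channels give a gain of 1, channels in quadrature are boosted so that the mono energy equals the mean of the channel energies, and near phase opposition the gain hits the cap, so part of the energy is still lost.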

The stereo-to-mono "downmix" technique of Breebaart et al. cited above is performed in the frequency domain. The mono signal M[k] is obtained by a linear combination of the L and R channels according to the equation:

$$M[k] = w_1 L[k] + w_2 R[k] \tag{7}$$

where w_1 and w_2 are complex-valued gains. If w_1 = w_2 = 0.5, the mono signal is the average of the two channels L and R. The gains w_1 and w_2 are generally adapted according to the short-term signal, in particular to align the phases.

A particular case of this frequency downmix technique is proposed in the document entitled "A stereo to mono downmixing scheme for MPEG-4 parametric stereo encoder" by Samsudin, E. Kurniawati, N. Boon Poh, F. Sattar, S. George, in Proc. ICASSP, 2006. In this document, the L and R channels are aligned in phase prior to performing the channel reduction processing.

More precisely, the phase of the L channel in each frequency subband is chosen as the reference phase, and the R channel is aligned with the phase of the L channel in each subband by the following formula:

$$R'[k] = e^{j \cdot \mathrm{ICPD}[b]}\, R[k] \tag{8}$$

where $j = \sqrt{-1}$, R'[k] is the aligned R channel, k is the index of a coefficient in the b-th frequency subband, and ICPD[b] is the inter-channel phase difference in the b-th frequency subband given by equation (2).

Note that when the subband of index b is reduced to a frequency coefficient, we find:

$$R'[k] = |R[k]|\, e^{j \angle L[k]} \tag{9}$$

Finally, the mono signal obtained by the "downmix" of the document by Samsudin et al. cited above is calculated by averaging the L channel and the aligned R' channel, according to the following equation:

$$M[k] = \frac{L[k] + R'[k]}{2} \tag{10}$$

Phase alignment therefore conserves energy and avoids attenuation problems by eliminating the influence of the phase. This "downmix" corresponds to the "downmix" described in the document by Breebaart et al., where:

$$M[k] = w_1 L[k] + w_2 R[k] \tag{11}$$

with $w_1 = 0.5$ and $w_2 = \frac{e^{j \cdot \mathrm{ICPD}[b]}}{2}$ in the case where the subband of index b contains only one frequency coefficient, of index k.
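As a rough illustration of this phase-aligned downmix in the single-coefficient case, the Python sketch below rotates R[k] by the phase difference with L[k] and then averages. It assumes the phase difference is the argument of L[k]·R*[k]; the function name and test values are hypothetical.

```python
import cmath

def aligned_downmix_bin(l, r):
    """Single-line subband: rotate R onto the phase of L, then average."""
    icpd = cmath.phase(l * r.conjugate())   # phase of L minus phase of R
    r_aligned = cmath.exp(1j * icpd) * r    # same modulus as R, phase of L
    return 0.5 * (l + r_aligned)

l = 1 + 1j
m_opposed = aligned_downmix_bin(l, -l)           # channels in phase opposition
m_generic = aligned_downmix_bin(2 + 0j, 0 + 1j)  # |L| = 2, |R| = 1, 90 degrees apart
```

In both cases the mono coefficient takes the phase of L and a modulus equal to the average of the two moduli, so the opposition case no longer cancels.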

An ideal conversion of a stereo signal to a mono signal should avoid attenuation problems for all frequency components of the signal.

This "downmix" operation is important for parametric stereo coding because the decoded stereo signal is only a spatial shaping of the decoded mono signal.

The downmix technique in the frequency domain described above retains the energy level of the stereo signal in the mono signal by aligning the R channel and the L channel before processing. This phase alignment avoids situations where the channels are in phase opposition.

The method described in the Samsudin document referred to above, however, makes the "downmix" processing totally dependent on the channel (L or R) chosen to set the reference phase.

In extreme cases, if the reference channel is zero ("total" silence) and the other channel is non-zero, the phase of the mono signal after "downmix" becomes constant, and the resulting mono signal will generally be of poor quality; similarly, if the reference channel is a random signal (ambient noise, etc.), the phase of the mono signal may become random or poorly conditioned, again yielding a mono signal that will generally be of poor quality.

An alternative frequency downmix technique has been proposed in the document entitled "Parametric stereo extension of ITU-T G.722 based on a new downmixing scheme" by T. M. N. Hoang, S. Ragot, B. Kovessi, P. Scalart, Proc. IEEE MMSP, 4-6 Oct. 2010. This document proposes a "downmix" technique which overcomes the disadvantages of the "downmix" proposed by Samsudin et al. According to this document, the mono signal M[k] is calculated from the stereo channels L[k] and R[k] by the polar decomposition $M[k] = |M[k]|\, e^{j \angle M[k]}$, where the amplitude |M[k]| and the phase ∠M[k] for each subband are defined by:

$$\begin{cases} |M[k]| = \dfrac{|L[k]| + |R[k]|}{2} \\[2mm] \angle M[k] = \angle\left(L[k] + R[k]\right) \end{cases} \tag{12}$$

The amplitude of M [k] is the average of the amplitudes of the L and R channels. The phase of M [k] is given by the phase of the signal summing the two stereo channels (L + R).

The method of Hoang et al. preserves the energy of the mono signal, like the method of Samsudin et al., and it avoids the problem of total dependence on one of the stereo channels (L or R) for the calculation of the phase ∠M[k]. However, it has a disadvantage when the L and R channels are in quasi-phase opposition in some subbands (with the extreme case L = -R). Under these conditions, the resulting mono signal will be of poor quality.
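The sensitivity near phase opposition can be seen on a single coefficient. The Python sketch below assumes the polar construction described above (modulus equal to the average of |L[k]| and |R[k]|, phase equal to the phase of L[k] + R[k]); the function name and the perturbation of 1e-6 are illustrative assumptions.

```python
import cmath

def polar_downmix_bin(l, r):
    """Modulus: average of the moduli. Phase: phase of the sum L + R."""
    magnitude = 0.5 * (abs(l) + abs(r))
    phase = cmath.phase(l + r)
    return cmath.rect(magnitude, phase)

m_quadrature = polar_downmix_bin(1 + 0j, 0 + 1j)

# Two right channels that differ only by a tiny imaginary part around R = -L.
m_plus = polar_downmix_bin(1 + 0j, -1 + 1e-6j)
m_minus = polar_downmix_bin(1 + 0j, -1 - 1e-6j)
```

Here `m_plus` and `m_minus` have nearly the same modulus but opposite phases: when L + R is close to zero, a negligible change in the input flips the phase of the mono coefficient, which is consistent with the poor quality noted for channels in quasi-phase opposition.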

In the ITU-T G.722 Appendix D codec and in the article "Parametric stereo coding scheme with a new downmix method and a whole band inter channel time / phase differences" by W. Wu, L. Miao, Y. Lang, D. Virette, Proc. ICASSP, 2013, another method for managing the phase opposition of stereo signals has been described. The method relies in particular on estimating a full-band phase parameter. It can be verified experimentally that the quality of this method is unsatisfactory for stereo signals where the phase relationship between channels is complex, or for stereo speech signals with AB-type sound recording (using two spaced omnidirectional microphones). Indeed, this method consists of calculating the phase of the downmix signal from the phases of the L and R signals, and this calculation can result in audio artifacts for certain signals because the phase defined by short-term FFT analysis is a delicate parameter to interpret and manipulate.

In addition, this method does not directly take into account the phase changes that can appear in successive frames which can possibly cause phase jumps.

Thus, there is a need for a coding/decoding method of limited complexity which makes it possible to combine channels with a "robust" quality, that is to say a good quality whatever the type of multichannel signal, while managing signals in phase opposition, signals whose phase is poorly conditioned (e.g. a null channel or a channel containing only noise), or signals whose channels have complex phase relationships that are better left "unmanipulated", in order to avoid the quality problems that these signals can create.

The invention improves the situation of the state of the art.

For this purpose, it proposes a method of parametric coding of a multichannel digital audio signal comprising a step of coding a mono signal resulting from a channel reduction processing applied to the multichannel signal and of coding spatialization information of the multichannel signal. The method is remarkable in that the channel reduction processing comprises the following steps, implemented per spectral unit of the multichannel signal:

extracting at least one indicator characterizing the channels of the multichannel digital audio signal;

selecting, from among a set of channel reduction processing modes, a channel reduction processing mode according to the value of the at least one indicator characterizing the channels of the multichannel audio signal.

Thus, the method makes it possible to obtain a channel reduction processing that is suited to the multichannel signal to be coded, especially when the channels of this signal are in phase opposition. In addition, since the adaptation of the downmix is performed per frequency unit, that is to say per frequency subband or per frequency line, it makes it possible to adapt to the fluctuations of the multichannel signal from one frame to the next.

According to a particular embodiment, the method furthermore comprises the determination of a phase indicator representative of a measure of the degree of phase opposition between the channels of the multichannel signal, and one of the channel reduction processing modes of said set depends on the value of the phase indicator.

A particular downmix processing is thus performed for the signals whose channels are in phase opposition. This processing is implemented in a manner adapted to the fluctuation of the signal over time.

In an exemplary embodiment, the set of channel reduction processing modes includes a plurality of processing modes from the following list:

- passive channel reduction processing with or without gain compensation;

- adaptive-type channel reduction processing with phase alignment on a reference and/or energy control;

- hybrid-type channel reduction processing depending on a phase indicator representative of a measure of the degree of phase opposition between the channels of the multichannel signal;

- combination of at least two of the passive, adaptive or hybrid processing modes.

Several types of downmix processing are thus possible for better adaptation to the multichannel signal.

In a particular embodiment, the indicator characterizing the channels of the multichannel audio signal is a correlation measurement indicator between the channels of the multichannel audio signal.

This indicator makes it possible to adapt the channel reduction processing to the channel correlation characteristics of the multichannel audio signal. The determination of this indicator is simple to implement and the quality of the downmix is improved.

In another embodiment, the indicator characterizing the channels of the multichannel audio signal is a phase indicator, representative of a measure of degree of phase opposition between the multichannel signal channels.

This indicator makes it possible to adapt the channel reduction processing to the phase characteristics of the channels of the multichannel audio signal and in particular to the signals which have channels in phase opposition.

The invention also relates to a device for parametric coding of a multichannel digital audio signal comprising an encoder able to encode a mono signal coming from a channel reduction processing module applied to the multichannel signal and a quantization module for encoding spatialization information of the multichannel signal. The device is remarkable in that the channel reduction processing module comprises:

an extraction module capable of obtaining at least one indicator characterizing the channels of the multichannel digital audio signal, per spectral unit of the multichannel signal;

a selection module capable of selecting, per spectral unit of the multichannel signal, from among a set of channel reduction processing modes, a channel reduction processing mode according to the value of the at least one indicator characterizing the channels of the multichannel audio signal.

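This device has the same advantages as the method it implements.

The invention also applies to a method of processing a decoded multichannel audio signal comprising channel reduction processing to obtain a mono signal to be rendered. The method is remarkable in that the channel reduction processing comprises the following steps, implemented per spectral unit of the multichannel signal:

extracting at least one indicator characterizing the channels of the multichannel digital audio signal;

selecting, from among a set of channel reduction processing modes, a channel reduction processing mode according to the value of the at least one indicator characterizing the channels of the multichannel audio signal.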

Thus, it is possible to obtain a mono signal with good listening quality from a multichannel audio signal that has already been decoded. The method makes it possible to carry out, in a simple manner, a downmix processing adapted to the signal received.

According to a particular embodiment, the processing method further comprises determining a phase indicator representative of a measure of the degree of phase opposition between the channels of the multichannel signal, and one of the channel reduction processing modes of said set depends on the value of the phase indicator.

A particular downmix processing is thus performed for the decoded signals whose channels are in phase opposition. This processing is implemented in a manner adapted to the fluctuation of the signal over time.

In an exemplary embodiment, the set of channel reduction processing modes includes a plurality of processing modes from the following list:

- passive channel reduction processing with or without gain compensation;

- adaptive-type channel reduction processing with phase alignment on a reference and/or energy control;

- combination of at least two of the passive, adaptive or hybrid processing modes.

Several types of downmix processing are thus possible for better adaptation to the multichannel signal.

This indicator is used to adapt the channel reduction processing to the channel correlation characteristics of the decoded multichannel audio signal. The determination of this indicator is simple to implement and the quality of the downmix is improved.

The invention also relates to a device for processing a decoded multichannel audio signal comprising a channel reduction processing module for obtaining a mono signal to be reproduced, which is remarkable in that the channel reduction processing module comprises:

an extraction module able to obtain at least one indicator characterizing the channels of the multichannel digital audio signal, per spectral unit of the multichannel signal;

This device has the same advantages as the method described above that it implements.

Finally, the invention relates to a computer program comprising code instructions for implementing the steps of an encoding method according to the invention, when these instructions are executed by a processor. The invention also relates to a processor-readable storage medium on which is recorded a computer program comprising code instructions for performing the steps of the method as described.

Other features and advantages of the invention will appear more clearly on reading the following description, given solely by way of nonlimiting example, and with reference to the appended drawings, in which:

FIG. 1 illustrates an encoder implementing a parametric coding known from the state of the art and previously described;

FIG. 2 illustrates a decoder implementing a parametric decoding known from the state of the art and previously described;

FIG. 3 illustrates a stereo parametric encoder according to one embodiment of the invention;

FIGS. 4a, 4b, 4c, 4d, 4e and 4f illustrate in flowchart form the steps of the channel reduction processing according to different embodiments of the invention;

FIG. 5 illustrates an example of the evolution of an indicator characterizing the channels of a given multichannel signal, used according to one embodiment of the invention;

FIG. 6 illustrates an example of possible weightings as a function of the value of an indicator characterizing the channels of a signal according to one embodiment of the invention;

FIG. 7 illustrates a stereo parametric decoder implementing a decoding adapted to the signals coded according to the coding method of the invention;

FIG. 8 illustrates a device for processing a decoded audio signal in which a channel reduction processing according to the invention is carried out, and

FIG. 9 illustrates a hardware example of a device incorporating an encoder able to implement the coding method, according to one embodiment of the invention.

With reference to FIG. 3, a parametric encoder of stereo signals according to an embodiment of the invention, delivering both a mono signal and spatial information parameters of the stereo signal is now described.

This figure shows both the entities, hardware modules or software driven by a processor of the coding device and the steps implemented by the coding method according to one embodiment of the invention.

Here we describe the case of a stereo signal. The invention also applies to the case of a multichannel signal with a number of channels greater than 2.

This parametric stereo encoder as shown uses standard EVS-type mono coding; it operates on stereo signals sampled at a sampling frequency F_s of 8, 16, 32 or 48 kHz, with 20 ms frames. In what follows, without loss of generality, the description is mainly given for the case F_s = 16 kHz.

It should be noted that the choice of a frame length of 20 ms is in no way restrictive; the invention applies equally to variants of the embodiment where the frame length is different, for example 5 or 10 ms, with a codec other than EVS.

Moreover, the invention applies similarly to other types of mono coding (eg IETF OPUS, ITU-T G.722) operating at identical or different sampling rates.

Each time-domain channel (L(n) and R(n)) sampled at 16 kHz is first pre-filtered by a high-pass filter (HPF), typically eliminating components below 50 Hz (blocks 301 and 302). This pre-filtering is optional, but it can be used to avoid a bias due to the DC component in the estimation of parameters such as ICTD or ICC.

The channels L'(n) and R'(n) coming from the pre-filtering blocks are analyzed in frequency by discrete Fourier transform with sinusoidal windowing, 50% overlap and a window length of 40 ms, i.e. 640 samples (blocks 303 to 306). For each frame, the signal (L'(n), R'(n)) is thus weighted by a symmetric analysis window covering 2 frames of 20 ms, i.e. 40 ms (640 samples for F_s = 16 kHz). The 40 ms analysis window covers the current frame and the future frame. The future frame corresponds to a "future" signal segment, commonly called "lookahead", of 20 ms. In variants of the invention, other windows may be used, for example the asymmetrical low-delay window called "ALDO" in the EVS codec. In addition, in variants, the analysis windowing can be made adaptive according to the current frame, in order to use an analysis with a long window on stationary segments and an analysis with short windows on transient/non-stationary segments, possibly with transition windows between long and short windows.
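A symmetric sinusoidal window with 50% overlap can be sketched as follows in Python. The exact window formula is not given in the text; the expression sin(π(n + 0.5)/N) used here is a common choice and is an assumption, as are the constant names.

```python
import math

FS_HZ = 16000
FRAME = FS_HZ * 20 // 1000   # 320 samples per 20 ms frame
WIN = 2 * FRAME              # 640-sample window: current frame plus lookahead

# Assumed sine window; it is symmetric around the middle of the window.
window = [math.sin(math.pi * (n + 0.5) / WIN) for n in range(WIN)]

# With 50% overlap, the squared windows of two successive frames add up to 1,
# which is what allows reconstruction when the same window is applied again
# at synthesis before overlap-add.
overlap_sum = [window[n] ** 2 + window[n + FRAME] ** 2 for n in range(FRAME)]
```

The second half of the window weights the 20 ms "lookahead" segment, which is then reused as the first half of the next frame's analysis.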

For the current frame of 320 samples (20 ms at F_s = 16 kHz), the spectra obtained, L[k] and R[k] (k = 0 ... 320), comprise 321 complex coefficients, with a resolution of 25 Hz per frequency coefficient. The coefficient of index k = 0 corresponds to the DC component (0 Hz); it is real. The coefficient of index k = 320 corresponds to the Nyquist frequency (8000 Hz for F_s = 16 kHz); it is also real. The coefficients of index 0 < k < 320 are complex and each correspond to a 25 Hz subband centered on the frequency of index k.
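The figures quoted above follow directly from the FFT length and the sampling frequency, as this short Python check shows (the names are illustrative; the FFT length of 640 corresponds to the 40 ms window at 16 kHz):

```python
FS_HZ = 16000
N_FFT = 640   # 40 ms analysis window at 16 kHz

resolution_hz = FS_HZ / N_FFT   # spacing between frequency coefficients
num_coeffs = N_FFT // 2 + 1     # non-redundant coefficients of a real signal

def bin_frequency_hz(k):
    """Centre frequency of the coefficient of index k."""
    return k * FS_HZ / N_FFT

dc_hz = bin_frequency_hz(0)
nyquist_hz = bin_frequency_hz(N_FFT // 2)
```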

The spectra L[k] and R[k] are combined in block 307, described later, to obtain a mono (downmix) signal M[k] in the frequency domain. This signal is converted to the time domain by inverse FFT and windowing with overlap-add with the "lookahead" part of the previous frame (blocks 308 to 310).

The algorithmic delay of the EVS codec is 30.9375 ms at F_s = 8 kHz and 32 ms for the other frequencies F_s = 16, 32 or 48 kHz. This delay includes the current frame of 20 ms; the additional delay with respect to the frame length is therefore 10.9375 ms at F_s = 8 kHz and 12 ms for the other frequencies (i.e. 192 samples at F_s = 16 kHz). The mono signal is delayed (block 311) by T = 320 - 192 = 128 samples so that the accumulated delay between the mono signal decoded by EVS and the original stereo channels becomes a multiple of the frame length (320 samples). As a result, to synchronize the extraction of the stereo parameters (block 314) with the spatial synthesis performed at the decoder from the mono signal, the lookahead for the calculation of the mono signal (20 ms) and the mono coding/decoding delay, to which the delay T is added to align the mono synthesis (20 ms), correspond to an additional delay of 2 frames (40 ms) relative to the current frame. This delay of 2 frames is specific to the implementation detailed here; in particular, it is related to the symmetrical sinusoidal windows of 20 ms. This delay could be different. In an alternative embodiment, a delay of one frame could be obtained with an optimized window having a smaller overlap between adjacent windows, with block 311 introducing no delay (T = 0).
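The delay budget at 16 kHz can be retraced with a few lines of Python. The values come from the paragraph above; the variable names are illustrative.

```python
FS_HZ = 16000
frame = FS_HZ * 20 // 1000          # 320 samples per 20 ms frame
codec_delay = FS_HZ * 32 // 1000    # 512 samples: 32 ms EVS algorithmic delay
extra_codec = codec_delay - frame   # 192 samples beyond the current frame (12 ms)

T = frame - extra_codec             # alignment delay applied in block 311
lookahead = frame                   # 20 ms lookahead of the analysis window

# Additional delay relative to the current frame, in samples and in frames.
total_extra = lookahead + extra_codec + T
total_extra_frames = total_extra // frame
```

The alignment delay T rounds the codec's extra 12 ms up to one full frame, and the lookahead adds a second frame, which gives the 2 frames (40 ms) stated in the text.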

The shifted mono signal is then coded (block 312) by the mono EVS encoder, for example at a rate of 13.2, 16.4 or 24.4 kbit / s. In variants, the coding may be performed directly on the non-shifted signal; in this case the shift can be performed after decoding.

It is considered in a particular embodiment of the invention, illustrated here in FIG. 3, that block 313 introduces a delay of two frames on the spectra L[k], R[k] and M[k] in order to obtain the spectra L_buf[k], R_buf[k] and M_buf[k].

It would be more advantageous, in terms of the amount of data to be stored, to shift the outputs of the parameter extraction block 314 or the outputs of the quantization blocks 315, 316 and 317. This shift could also be introduced at the decoder upon receiving the stereo enhancement layers.

In parallel with the mono coding, the coding of the stereo spatial information is implemented in the blocks 314 to 317.

The stereo parameters are extracted (block 314) and coded (blocks 315 to 317) from the spectra L[k], R[k] and M[k] shifted by two frames: L_buf[k], R_buf[k] and M_buf[k].

The channel reduction processing block 307 or "downmix" is now described in more detail.

The latter, according to one embodiment of the invention, performs a "downmix" in the frequency domain to obtain a mono signal M[k]. This processing block 307 comprises a module 307a for obtaining at least one indicator characterizing the channels of the multichannel signal, here the stereo signal. The indicator may for example be an inter-channel correlation indicator or an indicator of the degree of phase opposition between channels. How these indicators are obtained is described later.

According to the value of this indicator, the selection block 307b selects, from among a set of downmix processing modes, a downmix processing mode which is applied at 307c to the input signals, here the stereo signal L[k], R[k], to give a mono signal M[k].

Figures 4a to 4f illustrate various embodiments implemented by the processing block 307.

To present these figures and to simplify their descriptions, several parameters are first defined:

• ICPD[k] parameter

The parameter ICPD [k] is calculated in the current frame for each frequency line k according to the formula:

$$\mathrm{ICPD}[k] = \angle\left(L[k] \cdot R^{*}[k]\right) \qquad (13)$$

This parameter corresponds to the phase difference between the L and R channels. It is used here to define the ICCr parameter.
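As a minimal sketch of this definition, ICPD[k] can be taken as the argument of the product L[k]·R*[k] for each frequency line. The function name `icpd` and the list-of-complex representation of the spectra are assumptions of this example.

```python
import cmath


def icpd(left, right):
    """Per-line phase difference angle(L[k] * conj(R[k])), in radians."""
    return [cmath.phase(lk * rk.conjugate()) for lk, rk in zip(left, right)]
```

Two lines with identical phase give 0, while a left line at 90 degrees against a right line at 0 degrees gives pi/2.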

• ICCr[m] parameter

A correlation parameter ICCp[m] is calculated for the current frame as follows, where N_FFT is the length of the FFT (here N_FFT = 640 for F_s = 16 kHz). In variants, the complex modulus may be omitted, but in this case the use of the ICCp parameter (or its derivatives) must take into account the signed value of this parameter.

It should be noted that the division in the calculation of the ICCp parameter can be avoided, because ICCp (smoothed according to equation (15) below) is then compared to a threshold. It is common to add a small non-zero value ε to the denominator to avoid a division by zero; this precaution is in fact unnecessary, and ε = 0 can be used in practice if the numerator and the denominator are calculated separately. In the embodiments of the invention this division is not necessary because the parameter ICCp (or its optionally smoothed version ICCr defined below) is compared to a threshold; the absence of division in the implementation is advantageous in terms of complexity. However, to simplify the description that follows, the notation involving a division is kept.

This parameter may optionally be smoothed to attenuate temporal variations. If the current frame is of index m, this smoothing can be calculated with an MA (moving average) filter of order 2:

ICCr[m] = 0.5·ICCp[m] + 0.25·ICCp[m-1] + 0.25·ICCp[m-2] (15)

In practice, since the division in the definition of ICCr[m] does not have to be explicitly calculated, this MA filter will advantageously be applied separately to the numerator and denominator values.

Subsequently, the notation ICCr will be used to designate ICCr[m] (without mentioning the index of the current frame); if the smoothing is not applied, the ICCr parameter corresponds directly to ICCp. In variants, other smoothing methods may be implemented, for example by smoothing with an AR (autoregressive) filter.

The ICCr parameter quantifies the level of correlation between the L and R channels when the phase differences between these channels are ignored.
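The following sketch illustrates how such a correlation measure and its smoothing could be computed. The full-band equation for ICCp is not legible in this text, so the normalized form used here is an assumption patterned on the subband variant of the description: the cross term of each line is rotated by exp(-j·ICPD[k]) before summation, then divided by the square root of the product of the channel energies. The names `iccp` and `iccr` are hypothetical.

```python
import cmath
import math


def iccp(left, right):
    """Correlation with per-line phase differences removed (assumed form)."""
    num = 0j
    e_left = 0.0
    e_right = 0.0
    for lk, rk in zip(left, right):
        cross = lk * rk.conjugate()
        # Rotating by exp(-j*ICPD[k]) puts each cross term on the real axis.
        num += cross * cmath.exp(-1j * cmath.phase(cross))
        e_left += abs(lk) ** 2
        e_right += abs(rk) ** 2
    den = math.sqrt(e_left * e_right)
    # Returning 0 for silent input is a choice of this sketch.
    return abs(num) / den if den > 0.0 else 0.0


def iccr(iccp_m, iccp_m1, iccp_m2):
    """Order-2 moving-average smoothing of equation (15)."""
    return 0.5 * iccp_m + 0.25 * iccp_m1 + 0.25 * iccp_m2
```

With this form, two channels in exact phase opposition still yield a value of 1, which matches the statement that phase differences between the channels are ignored.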

In variants, the parameter ICCp can be defined by subband simply by changing the bounds of the sums, as follows:

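$$\mathrm{ICCp}[b] = \frac{\left|\sum_{k=k_b}^{k_{b+1}-1} L[k] \cdot R^{*}[k] \cdot e^{-j \cdot \mathrm{ICPD}[k]}\right|}{\sqrt{\left(\sum_{k=k_b}^{k_{b+1}-1} L[k] \cdot L^{*}[k]\right)\left(\sum_{k=k_b}^{k_{b+1}-1} R[k] \cdot R^{*}[k]\right)}}$$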

where k_b ... k_{b+1} - 1 represent the indices of the frequency lines in the subband of index b. Again, the ICCp[b] parameter can be smoothed, and in this case the invention is implemented as follows: instead of a single comparison on ICCr[m], there are as many comparisons on ICCp[b] as there are subbands of index b.

• SGN[m] parameter

The dominant channel is also identified for use as a phase reference. For example, this dominant channel can be determined via an SGN sign parameter calculated for the current frame as the sign of the difference in channel levels.

where the function sign(.) takes the value 1 or -1 if its operand is respectively > 0 or < 0.

It is important to note that the change of reference (L or R) for aligning the phase of the mono signal (resulting from the downmix) with that of L or R is allowed only under certain conditions. This avoids phase problems during the overlap-add operation after the inverse transform when the phase reference switches arbitrarily from L to R or vice versa.

In the preferred embodiment, switching is allowed only when the signal is weakly correlated and this phase is not used in the current frame because the downmix is in this case of the passive type (see the details of the different downmixes used below). Thus, the value of SGN_d in the current frame is ignored if this condition is not fulfilled; the phase reference switching is only allowed when the value of ICCr in the current frame is below a predetermined threshold, for example ICCr < 0.4.

We set:

If m = 1, SGN[m] = 1 (initial choice arbitrarily fixed to the L channel)

If not

If ICCr[m] < 0.4

SGN[m] = SGN_d

End if

In variants, the value of 0.4 may be modified, however it corresponds here to the threshold th1 = 0.4 used later.
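The update rule above can be sketched as follows. The function name `update_sgn` is hypothetical, the candidate value SGN_d is taken as an input rather than recomputed, and keeping the previous value when the condition is not met is an assumption drawn from the statement that SGN_d is ignored in that case.

```python
def update_sgn(m, sgn_prev, sgn_d, iccr_m, th1=0.4):
    """Phase-reference selector for frame m (1 selects L, -1 selects R)."""
    if m == 1:
        return 1  # initial choice arbitrarily fixed to the L channel
    if iccr_m < th1:
        return sgn_d  # weakly correlated frame: switching is allowed
    return sgn_prev  # assumption: otherwise the previous reference is kept
```

The effect is that the reference can only change in frames where the passive downmix is in use, so the change never appears in the phase of the output.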

In variants, the initial choice SGN[1] may be changed to SGN[1] = SGN_d to ensure that the phase reference corresponds to the dominant signal in the first frame, even though by definition this frame contains only 20 ms of signal out of the 40 ms used (for the frame size preferentially used here).

In variants, the condition for authorizing a phase reference switching can be defined per frequency line and depend on the type of downmix used in the current frame (of index m) and the type of downmix used in the previous frame (of index m-1); indeed, if the downmix for the line of index k in frame m-1 was of the passive type (with gain compensation) and if the downmix selected in frame m is a downmix with alignment on an adaptive phase reference, then a phase reference switching can be authorized. In other words, the phase reference switching is forbidden for the line of index k as long as the downmix explicitly uses the phase reference corresponding to the parameter SGN.

The sign parameter SGN[m] therefore only changes value when ICCr is below a threshold (in the preferred embodiment). This precaution avoids changing the phase reference in areas where the channels are highly correlated and potentially in phase opposition. In variants, another criterion may be used to define the phase reference switching conditions. In variants of the invention, the binary decision associated with the calculation of SGN_d may be stabilized to avoid potentially rapid fluctuations. It is thus possible to define a tolerance, for example of +/- 3 dB, on the level of the channels L and R, in order to implement a hysteresis preventing the change of phase reference if the tolerance is not exceeded. It is also possible to apply inter-frame smoothing to the value of the signal level.

In other variants, the parameter SGN _d can be calculated with another definition of the level of the channels, for example:

SGN _d = sign (17) or from the ICLD parameters in the following form:

SGN _d = sign (Σ _{= 1} ^iC ™ M ¹⁰ - B) (18) where B is the number of subbands, or non-equivalent

SGN _d = sign (Σ _{= 1} ICPD [k]) (19) In other variants, the level of the different channels in the time domain can be calculated.

In variants of the invention, the explicit calculation of SGN_d is not performed and a parameter representing the level of each channel (L or R) is calculated separately. Wherever SGN_d would be used, a simple comparison between these respective levels is performed instead. The implementation is in fact strictly equivalent, but it avoids explicitly calculating a sign.

• ISD[k] parameter

An ISD[k] parameter, defined for each line of the current frame and making it possible to detect a phase opposition, is also calculated:

$$\mathrm{ISD}[k] = \frac{\left|L[k] - R[k]\right|}{\left|L[k] + R[k]\right|} \qquad (20)$$

When the L and R channels are of opposite phase, the ISD value becomes arbitrarily large.

It should be noted that the division in the calculation of the ISD parameter can be avoided because ISD is then compared to a threshold. It is common to add a small non-zero value to the denominator to avoid a division by zero; this precaution is unnecessary here because, in the embodiments of the invention, this division is not implemented. Indeed, the comparison ISD[k] > th0 is equivalent to the comparison |L[k] - R[k]| > th0·|L[k] + R[k]|, which makes the downmix mode selection attractive in terms of complexity.
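The division-free comparison can be sketched in a few lines. The function name `is_phase_opposed` is hypothetical; the default threshold of 1.3 is the value th0 used later in the description.

```python
def is_phase_opposed(lk, rk, th0=1.3):
    """Division-free test of ISD[k] > th0 for one frequency line."""
    return abs(lk - rk) > th0 * abs(lk + rk)
```

Because no quotient is formed, the case L[k] + R[k] = 0 needs no special handling: the right-hand side is simply zero and the test succeeds for any non-zero difference.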

In a first embodiment, FIG. 4a illustrates the steps implemented for the channel reduction processing of block 307.

In step E400, an indicator characterizing the channels of the multichannel audio signal is obtained. In the example illustrated here, it is the ICCr parameter as defined above, calculated from the ICPD parameter. The ICCr indicator corresponds to a correlation measurement between the channels of the multichannel signal, in the particular case here between the channels of the stereo signal.

As illustrated in this FIG. 4a, the choice of the downmix depends mainly on the ICCr [m] indicator calculated as previously explained from the L and R channels of the current frame and of any smoothing.

The choice between downmix processing modes is based on the value of the ICCr [m] indicator.

Several downmix processing modes are provided and are part of a set of downmix processing modes.

The calculation of the downmix signal is done by line as follows, using three potential downmixes which are listed below:

1. Downmix of passive type (with gain compensation).

This downmix M₁[k] is defined as a sum signal with energy equalization, in the form:

$$M_1[k] = \frac{L[k] + R[k]}{2} \cdot \gamma[k]$$

where γ[k] is defined so that M₁[k] is equivalent to:

$$\left|M_1[k]\right| = \frac{\left|L[k]\right| + \left|R[k]\right|}{2}$$

$$\angle M_1[k] = \angle\left(L[k] + R[k]\right)$$

We define:

$$\gamma[k] = \frac{\left|L[k]\right| + \left|R[k]\right|}{\left|L[k] + R[k]\right|}$$

This downmix is effective for stereo signals (and their frequency decompositions by line or by subband) whose channels are not highly correlated and do not have a complex phase relationship. Since it is not used for problematic signals where the gain γ[k] could take arbitrarily large values, no limitation of the gain is applied here; in variants, however, a limitation of the amplification could be implemented.

In variants, this equalization by the gain γ[k] may be different. For example it would be possible to take the already quoted value:

The advantage of the gain γ[k] is that it provides the same amplitude level for the downmix M₁[k] as for the other downmixes used. It is therefore preferable to adjust the gain γ[k] to ensure a homogeneous amplitude or energy level between the different downmixes.
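A minimal sketch of the passive downmix with gain compensation, for a single frequency line, is given below. The function name `downmix_passive` is hypothetical, and returning zero when L[k] + R[k] vanishes is an assumption of the sketch, since the description applies no gain limitation.

```python
def downmix_passive(lk, rk):
    """Passive downmix of one line: half-sum scaled by the gain gamma[k]."""
    s = lk + rk
    if s == 0:
        return 0j  # gain undefined here; zero output is a choice of this sketch
    gamma = (abs(lk) + abs(rk)) / abs(s)
    return 0.5 * s * gamma
```

For L[k] = 3 and R[k] = 4j, the sum has modulus 5, the gain is 7/5 and the output has modulus 3.5, that is (|L[k]| + |R[k]|)/2, while keeping the phase of L[k] + R[k].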

2. Downmix with alignment to an adaptive phase reference

This downmix M ₃ [k] is defined as follows:

where the value of SGN is to be understood as the value SGN [m] in the current frame, but to lighten the notations the index of the frame is not mentioned here.

As explained above, the phase of this downmix can also be expressed in an equivalent way as:

This downmix is similar to the downmix proposed by the above-mentioned Samsudin method; here, however, the reference phase is not given by the L channel, and the phase is determined line by line rather than per frequency band.

The phase is here set according to the dominant channel identified by the parameter SGN.

This downmix is advantageous for highly correlated signals, for example for signals captured with AB-type or binaural microphones. It may also happen that independent channels have a fairly strong correlation even if the same signal is not recorded in the L and R channels; to avoid inadvertent switching of the phase reference, it is preferable to allow such switching only when the signals do not present a risk of generating audio artifacts when this downmix is used. This explains the constraint ICCr[m] < 0.4 in the calculation of the SGN[m] parameter when the phase reference switching condition uses this criterion.

3. Hybrid downmix between a passive downmix (with gain compensation) and a downmix with alignment on an adaptive phase reference, depending on an indicator of the degree of phase opposition between the channels (ISD[k], as defined above).

This downmix M ₂ [k] is defined as follows:

If ISD[k] > th0 (th0 = 1.3),

M₂[k] = M₃[k]

If not

M₂[k] = M₁[k]

End if

This downmix is applied here in cases where the signals are moderately correlated and potentially in phase opposition. The ISD[k] parameter is used here to detect a phase relationship close to phase opposition; in this case it is preferable to select the downmix with alignment on an adaptive phase reference M₃[k], whereas otherwise the passive downmix with gain compensation M₁[k] is sufficient.

In variants, the threshold th0 = 1.3 applied to ISD[k] may take other values.

It should be noted that the downmix M₂[k] corresponds to either M₁[k] or M₃[k], depending on the value of the ISD[k] parameter. It will be understood that, in variants of the invention, it is therefore possible not to define this downmix M₂[k] explicitly but to combine the decisions on the downmix selection with the criterion on ISD[k]. Such an example is given in Figure 4c, but it clearly applies to all the embodiments presented here.
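The three per-line downmixes can be sketched together as follows. All function names are hypothetical. The form of the aligned downmix is an assumption, since its defining equation is not legible in this text: it takes the magnitude (|L[k]| + |R[k]|)/2 and the phase of the channel designated by SGN (1 for L, -1 for R), in line with the L-referenced case given for FIG. 4e.

```python
import cmath


def downmix_passive(lk, rk):
    """M1: half-sum with gain compensation (zero if L + R vanishes)."""
    s = lk + rk
    if s == 0:
        return 0j
    return 0.5 * s * (abs(lk) + abs(rk)) / abs(s)


def downmix_aligned(lk, rk, sgn):
    """M3 (assumed form): mean magnitude, phase of the reference channel."""
    ref = lk if sgn == 1 else rk
    return 0.5 * (abs(lk) + abs(rk)) * cmath.exp(1j * cmath.phase(ref))


def downmix_hybrid(lk, rk, sgn, th0=1.3):
    """M2: aligned downmix near phase opposition, passive downmix otherwise."""
    if abs(lk - rk) > th0 * abs(lk + rk):
        return downmix_aligned(lk, rk, sgn)
    return downmix_passive(lk, rk)
```

Both branches deliver the same magnitude, so switching between them from one line to the next changes only the phase of the output.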

Thus, according to FIG. 4a, if in step E401 the indicator is lower than a first threshold th1, then a first downmix processing mode M1 is implemented in step E402.

If ICCr[m] < 0.4 (step E401 with th1 = 0.4)

M[k] = M₁[k]

If in step E403 the indicator is lower than a second threshold th2, then a second downmix processing mode depending on M1 and M2 is implemented in step E404.

If 0.4 < ICCr[m] < 0.5 (step E403 with th2 = 0.5)

M[k] = f1(M₁[k], M₂[k])

If in step E405 the indicator is lower than a third threshold th3, then a third downmix processing mode depending on M2 and M3 is implemented in step E406.

If 0.5 < ICCr[m] < 0.6 (step E405 with th3 = 0.6)

M[k] = f2(M₂[k], M₃[k])

Finally, if in step E405 the indicator is greater than the third threshold th3, then a fourth downmix processing mode M3 is implemented in step E407.

If ICCr[m] > 0.6 (step E405, N)

M[k] = M₃[k]

In variants of the invention, the threshold values th1, th2, th3 may be set to other values; the values given here typically correspond to a frame length of 20 ms.

The weighting functions of the combination functions f1(.,.) and f2(.,.) are shown in Figure 6. These combination functions perform a "cross-fade" between different downmixes to avoid threshold effects, that is, transitions that are too abrupt between the respective downmixes from one frame to the next for a given line. Any weighting functions having complementary values between 0 and 1 over the defined interval are suitable, but in the embodiment these functions are derived from the function:

with

f1(M₁[k], M₂[k]) = (1 - p)·M₁[k] + p·M₂[k]

and

f2(M₂[k], M₃[k]) = (1 - p)·M₃[k] + p·M₂[k]

It should be noted that the parameter ICCr [m] is here defined at the level of the current frame; in variants this parameter can be estimated by frequency band (for example according to the ERB or Bark scale).
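The selection of FIG. 4a can be sketched as follows for one frequency line, given the three candidate values M₁[k], M₂[k] and M₃[k]. The weighting function of the embodiment is not legible in this text, so a linear ramp is used here as a stand-in; this is an assumption, permitted only in the sense that any complementary weights between 0 and 1 are stated to be suitable. The function name `select_downmix_4a` and the handling of values exactly equal to a threshold are also assumptions.

```python
def select_downmix_4a(iccr, m1, m2, m3, th1=0.4, th2=0.5, th3=0.6):
    """Pick or cross-fade the downmix of one line from the indicator ICCr."""
    if iccr < th1:
        return m1
    if iccr < th2:
        p = (iccr - th1) / (th2 - th1)  # 0 at th1, 1 at th2 (linear stand-in)
        return (1 - p) * m1 + p * m2    # f1(M1, M2)
    if iccr < th3:
        p = (th3 - iccr) / (th3 - th2)  # 1 at th2, 0 at th3 (linear stand-in)
        return (1 - p) * m3 + p * m2    # f2(M2, M3)
    return m3
```

With these ramps the output equals M₂[k] at ICCr = 0.5 from both sides, so the two cross-fades join without a jump.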

In a second embodiment, FIG. 4b illustrates the steps implemented for the channel reduction processing of block 307. This variant embodiment is intended to simplify the decision on the downmix method to be used and to reduce complexity by not cross-fading between two downmix methods.

Steps E400, E401, E402, E405 and E407 are identical to those described with reference to FIG. 4a.

Thus, according to FIG. 4b, if in step E401 the indicator is lower than a first threshold th1, then a first downmix processing mode M1 is implemented in step E402.

If ICCr[m] < 0.4 (step E401 with th1 = 0.4)

M[k] = M₁[k]

If in step E405 the indicator is below a threshold th3, then a second downmix processing mode M2 is implemented in step E410.

If 0.4 < ICCr[m] < 0.6 (step E405 with th3 = 0.6)

M[k] = M₂[k]

Finally, if in step E405 the indicator is greater than the threshold th3, then a third downmix processing mode M3 is implemented in step E407.

If ICCr[m] > 0.6 (step E405, N)

M[k] = M₃[k]

The downmix methods M1, M2 and M3 are for example those described above. Note that the downmix M2 is a hybrid between the downmixes M1 and M3 which involves a further decision criterion based on another indicator, ISD, as defined above.
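Since the hybrid downmix always resolves to either M1 or M3, the decision of this second embodiment can be sketched as a single function returning the mode to apply to a given line. The function name `select_mode`, the string labels and the treatment of values exactly equal to a threshold are assumptions of this example.

```python
def select_mode(iccr, lk, rk, th0=1.3, th1=0.4, th3=0.6):
    """Return "M1" or "M3" for one line from ICCr and the ISD criterion."""
    if iccr < th1:
        return "M1"  # weakly correlated: passive downmix
    if iccr >= th3:
        return "M3"  # highly correlated: aligned downmix
    # Moderately correlated: the hybrid mode M2 decides per line,
    # using the division-free form of the test ISD[k] > th0.
    opposed = abs(lk - rk) > th0 * abs(lk + rk)
    return "M3" if opposed else "M1"
```

Grouping the two criteria this way avoids computing a candidate downmix that is then discarded.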

An embodiment strictly identical to that of FIG. 4b in terms of result is shown in FIG. 4c. In this variant, the evaluation of the selection parameters (block E450) and the downmix selection decisions (block E451) are grouped together.

In a third embodiment, FIG. 4d illustrates the steps implemented for the channel reduction processing of block 307. This variant embodiment is intended to simplify the decision on the downmix method to be used, this time by not using the passive downmix M₁[k]. Indeed, this passive downmix is already included in the hybrid downmix M₂[k]; moreover, the hybrid downmix can be considered a more robust variant of the downmix M₁[k] because it avoids the problems of phase opposition.

The downmix in Figure 4d is calculated as follows:

If in step E403, the indicator is below a threshold th2, then downmix processing M2 is implemented in step E410.

If ICCr [m] <0.5 (Step E403 with th2 = 0.5)

M [k] = M ₂ [k]

If in step E405, the indicator is below a threshold th3, then a downmix processing mode according to M2 and M3 is implemented in step E406.

If 0.5 <ICCr [m] <0.6 (Step E405 with th3 = 0.6)

M [k] = f2 (M ₂ [k], M ₃ [k])

Finally, if in step E405, the indicator is greater than threshold th3, then a downmix processing mode M3 is implemented in step E407.

If ICCr [m]> 0.6 (Step E405, N)

M[k] = M₃[k]

In a variant not shown here, it is possible not to use the cross-fade and thus to eliminate the decision E405 in Figure 4d.

It will be noted that the embodiment of FIG. 4d is strictly equivalent to that of FIG. 4b by setting th1 to a value <0.

In a fourth embodiment, FIG. 4e illustrates the steps implemented for the channel reduction processing of block 307. In this embodiment, the indicator characterizing the channels of the multichannel digital audio signal is the phase indicator ISD, representative of a measurement of the degree of phase opposition between the channels of the multichannel signal.

It is determined in step E420. For a stereo signal, this parameter is as defined in equation (20) for a calculation per spectral line.

Thus, according to FIG. 4e, if in step E421 the indicator ISD[k] is greater than a threshold th0, then a first downmix processing mode is implemented in step E422.

If ISD[k] > 1.3 (O of step E421 with th0 = 1.3)

then the downmix processing is defined as follows:

$$\angle M[k] = \angle L[k]$$

$$\left|M[k]\right| = \frac{\left|L[k]\right| + \left|R[k]\right|}{2}$$

If in step E421 the indicator ISD[k] is below the threshold th0, then a second downmix processing mode is implemented in step E423.

If ISD[k] < 1.3 (N of step E421 with th0 = 1.3)

then the downmix processing M₁[k] is applied. It is defined as follows:

$$M[k] = \frac{L[k] + R[k]}{2} \cdot \gamma[k]$$

Finally, a variant of the determination of the downmix signal of FIG. 4e is presented in FIG. 4f. In this variant, the main criterion for selecting the downmix mode is the ISD parameter as in FIG. 4e, but this parameter is this time defined per subband in step E430, ISD[b], where b is the index of the frequency subband (typically ERB or Bark). In this variant, when the phase relationship between the L and R channels is close to phase opposition (ISD[b] > 1.3 in step E431), the selected downmix mode is this time similar to the method defined in Annex D of G.722, but more direct, without the use of the full-band IPD.

Thus, according to FIG. 4f, if in step E431 the indicator ISD[b] is greater than a threshold th0, then a first downmix processing mode is implemented in step E432.

If ISD[b] > 1.3 (O of step E431 with th0 = 1.3) then the downmix processing is defined as follows (downmix with alignment to an adaptive phase reference, M3):

for k = k_b ... k_{b+1} - 1

ZL [k]. \ L [k] \ + ZR [k]. \ R [k] \

ZM [k] =

\ L [k] \ + \ R [k] \

\ M [k] \ =

2

If in step E431 the indicator ISD[b] is below the threshold th0, then a second downmix processing mode is implemented in step E433.

If ISD[b] < 1.3 (N of step E431 with th0 = 1.3)

then the downmix processing is defined as follows (passive downmix with gain compensation, M1):

for k = k_b ... k_{b+1} - 1

$$M[k] = \frac{L[k] + R[k]}{2} \cdot \gamma[k]$$

In additional variants, further classification/decision criteria can be added to refine the choice of the downmix, but at least one decision is kept between at least two downmix modes depending on the value of at least one indicator characterizing the channels of the multichannel signal, such as the ICCr parameter or the ISD parameter (over the frame, per subband, or per line).

The examples of downmix selection illustrated in FIGS. 4a to 4f are not limiting. Other combinations or applications of criteria may be considered.

For example, a cross fade could be applied in the embodiment where the criterion is the ISD indicator.

A downmix combining the 3 types of downmix with adaptive weights, of the type

M[k] = p1·M₁[k] + p2·M₂[k] + p3·M₃[k]

could also be chosen. The weights p1, p2 and p3 are then adapted according to the selection criteria.

FIG. 5 gives an example of the evolution of the parameter ICCr for a given signal, with the decision thresholds th1 and th3 set at 0.4 and 0.6 as described in the embodiment of FIG. 4b. Note that these predetermined values are valid in particular for a frame of 20 ms and can be modified if the frame length is different.

This figure shows the fluctuation of the ICCr indicator and of the SGN indicator. It is therefore wise to adapt the downmix processing as closely as possible to the evolution of this indicator. Indeed, a significant correlation of the signals, for example for frames 100 to 300, allows an adaptive downmix with alignment on a phase reference. When the ICCr indicator is between the thresholds th1 and th3, the signal channels are moderately correlated and potentially in phase opposition. In this case, the downmix to be applied depends on an indicator revealing a phase opposition between the channels. If the indicator reveals a phase opposition, it is preferable to select the downmix with alignment on an adaptive phase reference defined above by M₃[k]. Otherwise, the passive downmix with gain compensation defined above by M₁[k] is sufficient.

The value of the parameter SGN which is also represented in FIG. 5 serves to choose the right phase reference in the case where the correlation indicator is under a threshold, for example 0.4. In the example of FIG. 5, the phase reference therefore passes from L to R around the frame 500.

We will now return to FIG. 3. To adapt the spatialization parameters to the mono signal as obtained by the "downmix" processing described above, a particular extraction of the parameters by the block 314 is now described.


For the extraction of ICLD parameters (block 314), the spectra L_buf[k] and R_buf[k] are divided into 35 frequency subbands. These subbands are defined by the following boundaries:

2 3 4 6 7 9 11 13 15 18 21 24 28 32 36 41 47 53 59 67 75 84 94 105 118 131 146 163 182 202 225 250 278 308 321]

The table above delimits (in number of Fourier coefficients) the frequency subbands of index b = 0 to 34. For example, the first subband (b = 0) goes from the coefficient k_b = 0 to k_{b+1} - 1 = 0; it is thus reduced to a single coefficient, which represents 25 Hz. Similarly, the last subband (b = 34) goes from the coefficient k_b = 308 to k_{b+1} - 1 = 320; it comprises 12 coefficients (300 Hz). The frequency line of index k = 321, which corresponds to the Nyquist frequency, is not taken into account here.

For each frame, the ICLD of the sub-band b = 0, ..., 34 is calculated according to the equation:

$$\mathrm{ICLD}[b] = 10 \cdot \log_{10}\left(\frac{\sigma_L^2[b]}{\sigma_R^2[b]}\right) \qquad (21)$$

where σ_L²[b] and σ_R²[b] represent the energy of the left channel (L_buf[k]) and of the right channel (R_buf[k]) respectively:
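As an illustration of the ICLD extraction, the sketch below computes a level difference in dB per subband. It assumes that the energy of a channel in subband b is the sum of the squared magnitudes of its lines from k_b to k_{b+1} - 1, and that the ratio is taken left over right; neither is legible in this text. The function name `icld_per_subband`, the `edges` list of boundaries and the small energy floor guarding against empty bands are all hypothetical.

```python
import math


def icld_per_subband(left, right, edges, floor=1e-12):
    """Level difference in dB for each band [edges[b], edges[b + 1])."""
    values = []
    for b in range(len(edges) - 1):
        lo, hi = edges[b], edges[b + 1]
        e_left = sum(abs(x) ** 2 for x in left[lo:hi])
        e_right = sum(abs(x) ** 2 for x in right[lo:hi])
        # The floor (an addition of this sketch) avoids log of zero.
        ratio = max(e_left, floor) / max(e_right, floor)
        values.append(10.0 * math.log10(ratio))
    return values
```

A band in which the left channel carries four times the energy of the right channel yields about 6 dB, and equal energies yield 0 dB.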

According to a particular embodiment, the ICLD parameters are coded by differential non-uniform scalar quantization (block 315). This quantization will not be detailed here because it goes beyond the scope of the invention.

Similarly, the ICPD and ICC parameters are encoded by methods known to those skilled in the art, for example with uniform scalar quantization over the appropriate interval.

Referring to Figure 7 a decoder according to an embodiment of the invention is now described.

This decoder comprises a demultiplexer 501 in which the coded mono signal is extracted to be decoded at 502 by a mono EVS decoder in this example. The part of the bitstream corresponding to the EVS mono encoder is decoded according to the bit rate used at the encoder. It is assumed here that there is no loss of frames or bit errors on the bit stream to simplify the description, however, known frame loss correction techniques can obviously be implemented in the decoder.

The decoded mono signal corresponds to M(n) in the absence of channel errors. A short-term discrete Fourier transform analysis with the same windowing as at the encoder is performed on M(n) (blocks 503 and 504) to obtain the spectrum M[k]. It is considered here that a decorrelation in the frequency domain (block 520) is also applied.

The part of the bit stream associated with the stereo extension is also de-multiplexed. The ICLD, ICPD, ICC parameters are decoded to obtain ICLD ^q [b], ICPD ^q [b] and ICC ^q [b] (blocks 505 to 507). In addition, the decoded mono signal may be decorrelated for example in the frequency domain (block 520). The implementation details of block 508 are not presented here because they go beyond the scope of the invention, but conventional techniques known to those skilled in the art can be used.

The spectra L[k] and R[k] are thus calculated and then converted to the time domain by inverse FFT, windowing and overlap-add (blocks 509 to 514) to obtain the synthesized channels L(n) and R(n).

The encoder presented with reference to FIG. 3 and the decoder presented with reference to FIG. 7 have been described in the case of a particular application of stereo coding and decoding. The invention has been described on the basis of a decomposition of the stereo channels by discrete Fourier transform. The invention is also applicable to other complex representations, such as the Modulated Complex Lapped Transform (MCLT) decomposition, which combines a modified discrete cosine transform (MDCT) and a modified discrete sine transform (MDST), as well as to Pseudo-Quadrature Mirror Filter (PQMF) filter banks. Thus the term "frequency coefficient" used in the detailed description can be extended to the concept of "subband" or "frequency band" without changing the nature of the invention.

Finally, the downmix that is the subject of the invention may be used not only for coding but also for decoding, in order to generate a mono signal at the output of a stereo decoder or receiver and thus ensure compatibility with mono-only equipment. This can be the case, for example, when switching from sound reproduction over headphones to reproduction over a loudspeaker.

FIG. 8 illustrates this embodiment. For example, a stereo signal (L(n), R(n)) is decoded. It is transformed by the respective blocks 601, 602 and 603, 604 to obtain the left and right spectra (L[k] and R[k]).

One of the methods as described with reference to FIGS. 4a to 4f is then implemented in the processing block 605, in the same way as for the processing block 307 of FIG. 3.

This processing block 605 comprises a module 605a for obtaining at least one indicator characterizing the channels of the received multichannel signal, here the stereo signal. The indicator may for example be an inter-channel correlation indicator or an indicator of the degree of phase opposition between channels.

According to the value of this indicator, the selection block 605b selects, from among a set of downmix processing modes, a downmix processing mode which is applied at 605c to the input signals, here the stereo signal L[k], R[k], to give a mono signal M[k].

The encoders and decoders as described with reference to FIGS. 3, 7 and 8 may be integrated in multimedia equipment of the set-top box type or audio or video content player. They can also be integrated into communication equipment of the mobile phone or communication gateway type.

In variants, the case of a downmix from 5.1 channels to a stereo signal is considered. Instead of 2 channels at the input of the downmix, a 5.1 surround signal is considered, defined as a set of 6 channels: L (Front Left), C (Center), R (Front Right), Ls (Left Surround or Rear Left), Rs (Right Surround or Rear Right), LFE (Low Frequency Effects or Subwoofer). In this case, two variants of the 5.1-to-stereo downmix can be applied according to the invention:

• The C and LFE channels can be combined by passive downmix, and the result can be combined separately with the L and R channels by applying the embodiments of the downmix from 2 channels (stereo) to 1 channel (mono), to obtain channels L' and R' respectively. Then, the channels L' and R' can in turn be combined with Ls and Rs respectively by applying the 2-channel (stereo) to 1-channel (mono) downmix embodiments, to obtain the channels L" and R" which are the result of the downmix.

This implementation therefore calls "hierarchically" (in successive steps) on an elementary 2-to-1 downmix as previously described according to different variants.

• In a more general variant, the invention can be generalized to combine simultaneously 3 channels on one side (L, Ls, C+LFE) and on the other side (R, Rs, C+LFE), where C+LFE is the result of a simple passive downmix, to obtain directly the two channels L" and R".

In this case, several downmixes can be defined as in the stereo case: a passive downmix M₁[k] of 3 signals with gain compensation, and a downmix M₃[k] of 3 signals with alignment of the phase on an adaptive reference (the dominant signal among the 3). In this case, the downmix is obtained according to the generalization:

M[k] = p1(ICCr12, ICCr13, ICCr23)·M₁[k] + p3(ICCr12, ICCr13, ICCr23)·M₃[k]

where the weights p1 and p3 are multivariate functions, for example of the correlation ICCrij between each pair of channels i and j (for example, L, Ls, C+LFE) taken in pairs.
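The first, hierarchical variant can be sketched for one frequency line as three successive 2-to-1 steps. The function name `downmix_5_1_to_stereo` and the `downmix2` callable standing for any of the stereo-to-mono downmixes described earlier are hypothetical, and the factor 0.5 in the passive combination of C and LFE is an assumption, since the text does not give its scaling.

```python
def downmix_5_1_to_stereo(l, r, c, lfe, ls, rs, downmix2):
    """Hierarchical 5.1-to-stereo downmix of one line via a 2-to-1 downmix."""
    c_lfe = 0.5 * (c + lfe)   # simple passive downmix (assumed scaling)
    l1 = downmix2(l, c_lfe)   # L' from L and C+LFE
    r1 = downmix2(r, c_lfe)   # R' from R and C+LFE
    return downmix2(l1, ls), downmix2(r1, rs)  # L" and R"
```

A call such as `downmix_5_1_to_stereo(l, r, c, lfe, ls, rs, lambda a, b: 0.5 * (a + b))` uses a plain half-sum as the elementary step; any adaptive 2-to-1 downmix with the same signature can be substituted.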

In other variants of the invention, the number of input and output channels of the downmix may be different from the stereo to mono or 5.1 to stereo cases illustrated here.

FIG. 9 represents an exemplary embodiment of such an equipment in which an encoder as described with reference to FIG. 3 or a processing device as described with reference to FIG. 8, according to the invention is integrated. This device comprises a PROC processor cooperating with a memory block BM having a storage and / or working memory MEM.

The memory block can advantageously comprise a computer program comprising code instructions for implementing the steps of the coding method in the sense of the invention, or the processing method when these instructions are executed by the processor PROC, and especially the steps of extracting at least one indicator characterizing the channels of the multichannel digital audio signal and selecting, from a set of channel reduction processing modes, a mode of processing of channel reduction according to the value of the at least one indicator characterizing the multichannel audio signal channels.

These instructions are executed for channel reduction processing when encoding a multichannel signal or processing a decoded multichannel signal.

The program may also include the steps implemented to code the information appropriate to this processing.

The memory MEM can store the various downmix processing modes to be selected according to the method of the invention.
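Purely as an illustrative sketch, the extraction and selection steps performed per spectral unit, with the processing modes held in a table in memory, might look as follows for a stereo signal. The indicator (cosine of the inter-channel phase difference of one frequency coefficient), the threshold of 0, the two modes stored and all names are assumptions made for this example, not the definitions of the invention.

```python
import cmath


def passive(l, r):
    """Passive downmix: plain average of the two channels."""
    return 0.5 * (l + r)


def phase_aligned(l, r):
    """Mean magnitude carried on the phase of the louder channel."""
    ref = l if abs(l) >= abs(r) else r
    return cmath.rect(0.5 * (abs(l) + abs(r)), cmath.phase(ref))


# Hypothetical table of processing modes, as might be stored in the memory MEM.
MODES = {"passive": passive, "aligned": phase_aligned}


def extract_indicator(l, r):
    """Extraction step (sketch): assumed indicator in [-1, 1] for one coefficient."""
    den = abs(l) * abs(r)
    return (l * r.conjugate()).real / den if den > 0.0 else 1.0


def select_mode(indicator, threshold=0.0):
    """Selection step (sketch): assumed rule based on a single threshold."""
    return "passive" if indicator > threshold else "aligned"


def downmix(left, right):
    """Apply extraction, selection and the selected mode to each spectral unit."""
    out, chosen = [], []
    for l, r in zip(left, right):
        name = select_mode(extract_indicator(l, r))
        chosen.append(name)
        out.append(MODES[name](l, r))
    return out, chosen
```

For a coefficient in which the two channels are in phase opposition, the passive average would cancel to zero; the selection step then switches to the phase-aligned mode for that spectral unit only.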

Typically, the descriptions of FIGS. 3 and 4a to 4f set out the steps of an algorithm of such a computer program. The computer program can also be stored on a storage medium readable by a reader of the device or equipment, or be downloadable into its memory space.

Such equipment or such an encoder comprises an input module capable of receiving a multichannel signal, for example a stereo signal comprising the right and left channels R and L, either via a communication network or by reading content stored on a storage medium. This multimedia equipment may also include means for capturing such a stereo signal.

The device comprises an output module capable of transmitting a mono signal M resulting from the channel reduction processing selected according to the invention and, in the case of a coding device, the coded spatial information parameters P_c.

Claims

1. A method of parametric coding of a multichannel digital audio signal comprising a step of encoding (312) a mono signal (M) resulting from channel reduction processing (307) applied to the multichannel signal and of encoding spatialization information (315, 316, 317) of the multichannel signal,

characterized in that the channel reduction processing comprises the following steps, implemented per spectral unit of the multichannel signal:

- extracting (307a) at least one indicator characterizing the channels of the multichannel digital audio signal;

- selecting (307b), from among a set of channel reduction processing modes, a channel reduction processing mode according to the value of the at least one indicator characterizing the channels of the multichannel audio signal.

2. Method according to claim 1, characterized in that it further comprises the determination of a phase indicator, representative of a measurement of the degree of phase opposition between the channels of the multichannel signal, and in that one of the channel reduction processing modes of said set depends on the value of the phase indicator.

3. Method according to one of claims 1 or 2, characterized in that the set of channel reduction processing modes comprises a plurality of the processing modes in the following list:

- passive channel reduction processing with or without gain compensation;

- combination of at least two modes of passive, adaptive or hybrid processing.

4. Method according to one of the preceding claims, characterized in that the indicator characterizing the channels of the multichannel audio signal is a correlation measurement indicator between the channels of the multichannel audio signal.

5. Method according to claim 1, characterized in that the indicator characterizing the channels of the multichannel audio signal is a phase indicator, representative of a measurement of degree of phase opposition between the channels of the multichannel signal.

6. Parametric coding device for a multichannel digital audio signal comprising an encoder (312) capable of coding a mono signal (M) resulting from a channel reduction processing module (307) applied to the multichannel signal and a quantization module (315, 316, 317) for encoding spatialization information of the multichannel signal,

characterized in that the channel reduction processing module comprises:

- an extraction module (307a) capable of obtaining at least one indicator characterizing the channels of the multichannel digital audio signal, per spectral unit of the multichannel signal;

- a selection module (307b) capable of selecting, per spectral unit of the multichannel signal, from among a set of channel reduction processing modes, a channel reduction processing mode according to the value of the at least one indicator characterizing the channels of the multichannel audio signal.

7. A method for processing a decoded multichannel audio signal comprising channel reduction processing to obtain a mono signal to be reproduced, characterized in that the channel reduction processing comprises the following steps, implemented per spectral unit of the multichannel signal:

- extracting (605a) at least one indicator characterizing the channels of the multichannel digital audio signal;

- selecting (605b), from among a set of channel reduction processing modes, a channel reduction processing mode according to the value of the at least one indicator characterizing the channels of the multichannel audio signal.

8. A device for processing a decoded multichannel audio signal comprising a channel reduction processing module for obtaining a mono signal to be reproduced, characterized in that the channel reduction processing module comprises:

- an extraction module (605a) capable of obtaining at least one indicator characterizing the channels of the multichannel digital audio signal, per spectral unit of the multichannel signal;

- a selection module (605b) capable of selecting, per spectral unit of the multichannel signal, from among a set of channel reduction processing modes, a channel reduction processing mode according to the value of the at least one indicator characterizing the channels of the multichannel audio signal.

9. Computer program comprising code instructions for implementing the steps of the method according to one of claims 1 to 5, when these instructions are executed by a processor.

10. A processor-readable storage medium on which is stored a computer program comprising code instructions for performing the steps of the method according to one of claims 1 to 5.