EP2489206A1

EP2489206A1 - Processing of sound data encoded in a sub-band domain

Info

Publication number: EP2489206A1
Application number: EP10781956A
Authority: EP
Inventors: Marc Emerit; Rozenn Nicol; Grégory PALLONE
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2009-10-12
Filing date: 2010-10-08
Publication date: 2012-08-22
Also published as: WO2011045506A1; US20120201389A1; US8976972B2

Abstract

The invention relates to the processing of sound data encoded in a sub-band domain, for dual-channel playback of binaural or transaural® type, in which matrix filtering is applied in order to go from a sound representation with N channels, with N>0, to a dual-channel representation. This N-channel sound representation consists in considering N virtual loudspeakers surrounding the head of a listener and, for each virtual loudspeaker of at least some of the loudspeakers: a first transfer function specific to an ipsilateral path from the loudspeaker (AVG) to a first ear (OG) of the listener, facing the loudspeaker, and a second transfer function specific to a contralateral path from said loudspeaker (AVG) to the second ear (OD) of the listener, masked from the loudspeaker by the head of the listener. The matrix filtering applied within the meaning of the invention comprises a multiplicative coefficient ((C/I)AVG) defined by the spectrum, in the sub-band domain, of the second transfer function deconvolved with the first transfer function.

Description

Processing encoded sound data in a subband domain

The invention relates to the processing of sound data.

In the context of processing sound data in a multichannel format (5.1 or more), it is sought to provide a 3D spatialization effect called "Virtual Surround". Such processing involves filters that aim to reproduce a sound field at the entrances of a person's ear canals. Indeed, a listener is able to locate sounds in space with a certain precision, thanks to the perception of the sounds by both ears. The signals emitted by the sound sources undergo acoustic transformations as they propagate to the ears. These acoustic transformations are characteristic of the acoustic channel established between a sound source and a point of the individual's ear canal. Each ear has its own acoustic channel, and these acoustic channels depend on the position and orientation of the source relative to the listener, on the shape of the listener's head and ears, and also on the acoustic environment (e.g. reverberation due to a room effect). These acoustic channels can be modeled by filters commonly called "Head Related Impulse Responses" (HRIR) or "Head Related Transfer Functions" (HRTF), according to whether they are represented in the time domain or in the frequency domain, respectively.

Referring to FIG. 1, there is shown a "direct" path CD from a source HP1 to the (left) ear OG of the listener AU (seen from above), this ear OG being situated directly opposite the source HP1. There is also shown a "crossed" path CC between a source HP2 and this same ear OG of the listener AU, the path CC crossing the head TET of the listener AU because the source HP2 is disposed on the other side of the median plane P with respect to the source HP1.
In a medium without reverberation (for example an anechoic chamber), and considering that human faces are symmetrical, the HRTFs for the left ear and for the right ear (hereinafter "left HRTF" and "right HRTF" respectively) are identical for sources that lie in the median plane (the plane P which separates the left half from the right half of the body, as illustrated in FIG. 2). The acoustic cues exploited by the brain to locate sounds are often classified into two families:

- the so-called "monaural" cues, concerning the location of a sound from a single ear, and

- the so-called "interaural" cues, concerning the location of a sound by the brain by exploiting the differences between the signals perceived at the left ear and at the right ear.

Known techniques are described hereinafter for processing sound data in a multi-channel format (for example for more than two speakers) for playback on two speakers only, for example on a headset, with a 3D spatialization effect.

The term "binaural rendering" is understood here to mean listening, over headphones, to audio content initially in a multi-channel format (for example the 5.1 format, or other formats delivering more than two channels), this audio content being processed, in particular by mixing the channels, to deliver only two signals feeding, in a so-called "binaural" configuration, the two mini-speakers (or "earphones") of conventional stereophonic headphones. Thus, in the transformation from a "multi-channel" format to a "binaural" format, it is sought to offer a quality of spatialization and immersion over headphones that is close to or equivalent to that obtained with a multi-channel rendering system comprising as many distant speakers as channels. Furthermore, the term "transaural playback" means listening on two remote speakers to audio content initially in a multi-channel format.

Conventionally, for listening to audio content in 5.1 multi-channel format on a stereo headset or on a pair of speakers, the channels are mixed down, an operation hereinafter referred to as "Downmix". "Downmix" processing is a matrix processing that makes it possible to go from N channels to M channels with N > M. It will be considered in the following that a "Downmix" processing (since it does not take spatialization effects into account) does not involve any filter based on HRTFs. In general, the "Downmix" processing matrices used in sound reproduction devices (PC, DVD, TV, etc.) have constant coefficients that are independent of time and frequency. More recent "Downmix" processes have matrices whose coefficients depend on time and frequency and are adjusted at each instant according to a time-frequency representation of the input signals. This type of matrix makes it possible, for example, to prevent the input signals from canceling each other when they are summed.
A constant-matrix version of "Downmix" processing, named "ITU Downmix", has been standardized by the International Telecommunication Union (ITU). This processing is applied by implementing the following equations:

$$S_G = E_{AVG} + 0.707\,E_C + 0.707\,E_{ARG}$$

$$S_R = E_{AVD} + 0.707\,E_C + 0.707\,E_{ARD}$$

where:

- $S_G$ and $S_R$ are respectively the left and right stereo output signals,

- $E_{AVG}$ and $E_{AVD}$ are respectively input signals that would have been intended to feed the left lateral loudspeaker AVG and the right lateral loudspeaker AVD (illustrated in FIG. 2),

- $E_{ARG}$ and $E_{ARD}$ are respectively input signals that would have been intended to feed the left rear loudspeaker ARG and the right rear loudspeaker ARD, located behind the listener AU of FIG. 2,

- $E_C$ is an input signal that would have been intended to feed a central loudspeaker C located in front of the listener AU, and

- 0.707 is an approximation of the square root of 1/2. Such gains can be considered as gains applied to the loudspeakers.
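As a minimal sketch of the two ITU Downmix equations above, the following Python function mixes five channel signals into a stereo pair. The function name `itu_downmix` and the use of NumPy arrays are illustrative assumptions; the argument names simply mirror the symbols E_AVG, E_AVD, E_C, E_ARG and E_ARD.

```python
import numpy as np

def itu_downmix(e_avg, e_avd, e_c, e_arg, e_ard, g=0.707):
    """Mix five equal-length channel signals down to a stereo pair (s_g, s_r).

    g approximates sqrt(1/2) and weights the centre and rear channels.
    The function name and signature are illustrative, not from the source.
    """
    e_avg, e_avd, e_c, e_arg, e_ard = (
        np.asarray(x, dtype=float) for x in (e_avg, e_avd, e_c, e_arg, e_ard)
    )
    s_g = e_avg + g * e_c + g * e_arg  # left output
    s_r = e_avd + g * e_c + g * e_ard  # right output
    return s_g, s_r
```

With constant coefficients such as these, the same matrix applies to every sample and every frequency, which is why no spatial cue is introduced.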

For example, the "ITU Downmix" processing described above does not allow precise spatial perception of sound events. Moreover, as indicated above, "Downmix" processing in general does not allow spatial perception, since it does not involve any HRTF filter. The feeling of immersion that multi-channel content can offer is then lost when listening over headphones, compared to listening on a system with more than two speakers (for example in the 5.1 format illustrated in FIG. 2). For example, a sound supposed to be emitted by a source moving from the front to the back of the listener is not correctly reproduced on a merely stereo system (a headset or a pair of loudspeakers). In addition, a sound present only in the channel S_G (or S_R) after ITU Downmix processing is output only at the left (or right, respectively) ear when listening over headphones, whereas when listening on a system with more than two speakers (for example in the 5.1 format), the right (or left, respectively) ear also perceives a signal, by diffraction.

In order to overcome these disadvantages, a process of downmixing to a binaural format, called "Downmix binaural", has been developed. It consists in virtually placing five (or more) speakers in a sound environment rendered on two channels only, as if five (or more) sources were to be spatialized for binaural rendering.

Thus, content in multi-channel format is played on "virtual" speakers in a binaural playback context. Such a technique is currently used mainly in DVD players (on PCs, television sets, living-room players, or others), and soon on mobile terminals for playing television or video data.

In the "Downmix binaural" process, the virtual loudspeakers are created by the so-called "binaural synthesis" technique. This technique consists in applying head-related transfer functions (HRTFs) to monophonic audio signals, to obtain a binaural signal that gives the listener, over headphones, the sensation that the sound sources come from a particular direction in space. The signal for the right ear is obtained by filtering the monophonic signal with the HRTF of the right ear, and the signal for the left ear is obtained by filtering this same monophonic signal with the HRTF of the left ear. The resulting binaural signal is then available for headphone listening.
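The binaural synthesis just described can be sketched in the time domain, where filtering by an HRTF corresponds to convolution with the matching impulse response (HRIR). The function name and the toy impulse responses in the usage note are assumptions made for illustration; measured HRIRs would be used in practice.

```python
import numpy as np

def binaural_synthesis(mono, hrir_left, hrir_right):
    """Filter one monophonic signal with the HRIR of each ear.

    Returns the (left, right) signals of the binaural pair. The name and
    signature are illustrative assumptions, not taken from the source.
    """
    left = np.convolve(mono, hrir_left)    # signal for the left ear
    right = np.convolve(mono, hrir_right)  # signal for the right ear
    return left, right
```

For instance, with the toy responses `hrir_left = [1.0]` and `hrir_right = [0.0, 0.5]`, the right-ear signal is a copy of the source delayed by one sample and halved, a crude stand-in for a source located on the listener's left.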

This implementation is illustrated in Figure 3A. A transfer function defined by a filter is associated with each acoustic path between an ear of the listener and a virtual speaker (placed as recommended in the multi-channel format 5.1 in the example shown). Thus, with reference to FIG. 3B, for ten acoustic paths in all:

- HCg (respectively HCd) is the filter corresponding to an HRTF for the path between the central loudspeaker C and the left ear OG (respectively the right ear OD) of the listener,

- HGg (respectively HDd) is the filter corresponding to a so-called "ipsilateral" HRTF (ear "illuminated" by the loudspeaker) for the direct path (solid line) between the left lateral loudspeaker AVG (respectively right lateral AVD) and the left ear OG (respectively right OD) of the listener,

- HGd (respectively HDg) is the filter corresponding to a so-called "contralateral" HRTF (ear in the "shadow" of the head) for the indirect path (dashed line) between the left lateral loudspeaker AVG (respectively right lateral AVD) and the right ear OD (respectively left OG) of the listener,

- HGSg (respectively HDSd) is the filter corresponding to an ipsilateral HRTF for the direct path (solid line) between the left rear loudspeaker ARG (respectively right rear ARD) and the left ear OG (respectively right OD) of the listener, and

- HGSd (respectively HDSg) is the filter corresponding to a contralateral HRTF for the indirect path (dashed line) between the left rear loudspeaker ARG (respectively right rear ARD) and the right ear OD (respectively left OG) of the listener.

A disadvantage of this technique is its complexity, since two binaural filters are required per virtual loudspeaker (an ipsilateral HRTF and a contralateral HRTF), hence ten filters in all in the case of a 5.1 format.
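To make the filter count concrete, the prior-art scheme with two filters per virtual loudspeaker can be sketched as below. This is an illustrative time-domain sketch only: the function name, the dictionary layout and the toy impulse responses are assumptions, and with five loudspeakers (C, AVG, AVD, ARG, ARD) the loop performs ten convolutions.

```python
import numpy as np

def binaural_downmix(channels, hrirs):
    """Prior-art style binaural downmix (illustrative sketch).

    channels: dict mapping a speaker name to its 1-D signal.
    hrirs: dict mapping the same names to (h_left, h_right), the impulse
           responses from that virtual speaker to the left and right ears.
    Two convolutions are needed per speaker, e.g. ten for five speakers.
    """
    length = max(
        len(channels[name]) + len(h) - 1
        for name in channels
        for h in hrirs[name]
    )
    left = np.zeros(length)
    right = np.zeros(length)
    for name, x in channels.items():
        h_left, h_right = hrirs[name]
        y_left = np.convolve(x, h_left)    # path to the left ear
        y_right = np.convolve(x, h_right)  # path to the right ear
        left[: len(y_left)] += y_left
        right[: len(y_right)] += y_right
    return left, right
```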

The problem is compounded when these transfer functions have to be manipulated in various kinds of processing, such as those according to the MPEG standard and in particular the processing called "MPEG Surround"®. Indeed, with reference to point 6.11.4.2.2.2 of the document "Information technology - MPEG audio technologies - Part 1: MPEG Surround", ISO/IEC JTC 1/SC 29 (July 21, 2006), matrix filtering is provided, in the domain of the subbands m (also denoted κ(k) here), of the type:

to go from two monophonic signals to stereophonic signals in binaural representation.

Indeed, this standard provides an embodiment in which a multi-channel signal is transported in the form of a stereo downmix and spatialization parameters (CLD for "Channel Level Difference", ICC for "Inter-Channel Coherence", and CPC for "Channel Prediction Coefficient"). These parameters make it possible, in a first step, to expand the stereo downmix into three signals L', R' and C. In a second step, they allow the signals L', R' and C to be expanded to obtain 5.1 signals (denoted L, Ls, R, Rs, C and LFE for "Low Frequency Effect"). In the binaural mode, the C and LFE signals are not separated, and the signal C is used for the binaural Downmix processing. So here, from two monophonic signals, three signals are first constructed (for respective left L', right R' and center C' channels). Thus, the notation $W^{l,m}$ designates a matrix for expanding the stereo signals to these three channels. The following processing steps are then:

- an expansion of these three channels to N channels in a multi-channel configuration, for example 5 channels in 5.1 format, and

- a spatialization of N virtual loudspeakers respectively associated with these N channels, to obtain a two-channel, binaural or transaural, representation, with:

- […], for the path from a central loudspeaker, associated with the aforementioned channel C, to the left ear,

- […], for the path from the central loudspeaker associated with channel C to the right ear,

- […], for the ipsilateral paths to the left ear,

- […], for the contralateral paths to the left ear,

- […], for the contralateral paths to the right ear,

- […], for the ipsilateral paths to the right ear,

where:

- […] and […] represent relative gains to be applied to the signal of channel L' to define the channels L and Ls, respectively, of the left and left surround virtual speakers in 5.1 format, for sample l of frequency band m in the time-frequency transform,

- […] and […] represent relative gains to be applied to the signal of channel R' to define the channels R and Rs of the right and right surround virtual speakers in 5.1 format, for sample l of frequency band m in the time-frequency transform,

- […] and […] are phase shifts corresponding to interaural delays, and

- […] are weights such that:

We note in particular that:

- […] is the expression of the spectrum of the HRTF for a path between a center speaker in 5.1 format and the left ear of a listener,

- […] is the expression of the spectrum of the HRTF for a path between a center speaker in 5.1 format and the right ear of a listener,

- […] is the expression of the spectrum of the HRTF for a path between a left surround speaker in 5.1 format and the left ear,

- […] is the expression of the spectrum of the HRTF for a path between a left surround speaker in 5.1 format and the right ear,

- […] is the expression of the spectrum of the HRTF for a path between a right surround speaker in 5.1 format and the left ear,

- […] is the expression of the spectrum of the HRTF for a path between a right surround speaker in 5.1 format and the right ear,

- […] is the expression of the spectrum of the HRTF for a path between a right speaker in 5.1 format and the left ear,

- […] is the expression of the spectrum of the HRTF for a path between a right speaker in 5.1 format and the right ear,

- […] is the expression of the spectrum of the HRTF for a path between a left speaker in 5.1 format and the left ear, and

- […] is the expression of the spectrum of the HRTF for a path between a left speaker in 5.1 format and the right ear.

There are thus ten filters associated with the aforementioned HRTFs for going from the 5.1 format to a binaural representation in this example. Hence the complexity problem posed by this technique, which requires two binaural filters per virtual speaker (an ipsilateral HRTF and a contralateral HRTF).

The present invention improves the situation.

For this purpose, it first proposes a method for processing sound data encoded in a subband domain, for two-channel reproduction of binaural or transaural® type, in which matrix filtering is applied to go from an N-channel sound representation, with N > 0, to a two-channel representation, this N-channel sound representation consisting in considering N virtual loudspeakers surrounding the listener's head and, for each virtual loudspeaker of at least some of the loudspeakers:

- a first transfer function specific to an ipsilateral path from the loudspeaker to a first ear of the listener, facing the loudspeaker, and

- a second transfer function specific to a contralateral path from said loudspeaker to the second ear of the listener, masked from the loudspeaker by the head of the listener.

Advantageously, the applied matrix filtering comprises a multiplicative coefficient defined by the spectrum, in the subband domain, of the second transfer function deconvolved by the first transfer function.
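Since a deconvolution in the time domain corresponds to a division of spectra, the multiplicative coefficient described above can be illustrated as the ratio of the contralateral spectrum to the ipsilateral spectrum. The sketch below is only an approximation under stated assumptions: it uses plain FFT bins in place of the subband filter bank of the codec, the function name is invented for the example, and the small regularization term `eps` is an addition that guards against division by near-zero bins.

```python
import numpy as np

def contra_over_ipsi(hrir_ipsi, hrir_contra, n_fft=64, eps=1e-12):
    """Spectrum of the contralateral response deconvolved by the ipsilateral one.

    Assumption: FFT bins stand in for the codec's subbands. The division
    H_contra / H_ipsi is written in a regularized form so that bins where
    the ipsilateral spectrum is nearly zero do not blow up.
    """
    h_i = np.fft.rfft(hrir_ipsi, n_fft)
    h_c = np.fft.rfft(hrir_contra, n_fft)
    return h_c * np.conj(h_i) / (np.abs(h_i) ** 2 + eps)
```

With a toy ipsilateral response `[1.0]` and a contralateral response `[0.0, 0.0, 0.5]`, the coefficient has magnitude 0.5 in every bin and a linear phase corresponding to a delay of two samples, i.e. an attenuation combined with an interaural delay.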

A first advantage of such a construction is a significant reduction in processing complexity. To begin with, as will be seen in detail below, the transfer functions of the central virtual speaker no longer need to be taken into account. Thus, it is not necessary to take into account the transfer functions of all the virtual speakers, but only those of some of them.

Another simplification that follows from the construction within the meaning of the invention is that it is no longer necessary to provide a transfer function for the ipsilateral paths. For example, consider matrix filtering that goes from an M-channel sound representation, with M > 0, to a two-channel representation (binaural or transaural), passing through an intermediate representation on the N channels, with N > 2, as in the case of the standard described above. The coefficients of the matrix are then expressed, for a contralateral path, notably as a function of the respective spatialization gains of the M channels on the N virtual loudspeakers located in a hemisphere around a first ear, and of the spectra of the contralateral transfer function, relative to the second ear of the listener, deconvolved by the ipsilateral transfer function, relative to the first ear. Advantageously, however, for an ipsilateral path, the coefficients of the matrix are no longer expressed as a function of the HRTF spectra but simply as a function of the spatialization gains of the M channels on the N virtual speakers located in a hemisphere around a first ear.

Thus, if the N-channel representation comprises, per hemisphere around an ear, at least one direct virtual speaker and one ambience virtual speaker, as in "virtual surround", the coefficients of the matrix are expressed, in a domain of time-frequency transform subbands (for example of the "PQMF" type, for "Pseudo-Quadrature Mirror Filters"), by:

If the HRTFs are symmetrical, we have:

- […], for the contralateral paths to the left ear;

- […], for the contralateral paths to the right ear;

- […] only, for the ipsilateral paths to the left ear;

- […] only, for the ipsilateral paths to the right ear,

where:

- […] and […] represent relative gains to be applied to the same first signal (e.g. the signal of channel L' in an initial three-channel configuration, as described above) to define the channels L and Ls, respectively, of the left and left surround virtual speakers, for sample l of frequency band m in the time-frequency transform,

- […] and […] represent relative gains to be applied to the same second signal (for example that of channel R') to define the channels R and Rs of the right and right surround virtual speakers, for sample l of frequency band m in the time-frequency transform,

- […] or […] is the expression of the spectrum of the contralateral HRTF, relative to the right ear of the listener, deconvolved by an ipsilateral transfer function, relative to the left ear, for a left virtual speaker, direct or ambience respectively,

- […] or […] is the expression of the spectrum of the contralateral HRTF, relative to the left ear of the listener, deconvolved by an ipsilateral transfer function, relative to the right ear, for a right virtual speaker, direct or ambience respectively,

- […] and […] are phase shifts between contralateral and ipsilateral transfer functions, corresponding to selected interaural delays, and

- […] are selected weights.

Typically, the coefficient g may advantageously have a value of 0.707 (corresponding to the square root of 1/2, when the energy of the central loudspeaker signal is shared half and half between the lateral loudspeakers), as recommended in the "ITU Downmix" processing.

More precisely, in the implementation of the invention, the matrix filtering is expressed as a product of matrices of the type:

where:

- $W^{l,m}$ represents the matrix for expanding the stereo signals to M' channels, with M' > 2 (for example M' = 3), and

- […] represents a global processing matrix comprising:

  - an expansion of the M' channels to the N channels, with N > 3 (for example 5, for a 5.1 format), and

  - a spatialization of the N virtual speakers respectively associated with the N channels, to obtain a two-channel, binaural or transaural®, representation.

Another disadvantage of the prior-art "Downmix binaural" method is that it does not respect the timbre of the initial sound, which is well reproduced by the "Downmix" processing, because the binaural processing filters derived from the HRTFs strongly modify the signal spectrum and thus introduce "coloration" effects compared to the "Downmix". The vast majority of users prefer the "Downmix", even though the "Downmix binaural" does provide an extracranial spatial perception of sounds. In the users' judgment, the timbre distortion (or "coloration") introduced by the "Downmix binaural" is not compensated by the spatialization effects it contributes.

Here again, the construction within the meaning of the present invention improves the situation. The implementation of the invention as described above makes it possible to preserve the perceived timbre of the sound sources from any distortion.

Indeed, filtering the contralateral component with the contralateral transfer function deconvolved by the ipsilateral transfer function makes it possible to reduce the timbre distortion introduced by the binauralization processing. As will be seen below, such filtering amounts to a low-pass filtering delayed by a value corresponding to the interaural delay. It is advantageous to choose a cut-off frequency of the low-pass filter of about 500 Hz for all the HRTF pairs, with a very steep filter slope. The brain perceives, at one ear, the original (unprocessed) signal and, at the other ear, the delayed and low-pass filtered signal. Above the cut-off frequency, the difference in perceived level compared to diotic listening to the original signal attenuated by 6 dB is minimal. Below the cut-off frequency, on the other hand, the signal is perceived as twice as loud. For signals containing frequencies below the cut-off frequency, the difference in timbre will therefore consist of an amplification of the low frequencies.

Such timbre distortion may advantageously be eliminated simply by high-pass filtering, which may be the same for all the HRTFs (speaker directions). In the case of processing for binaural reproduction, the correction of this timbre distortion can advantageously be applied to the binaural stereo signal resulting from the downmixing. In addition, to avoid a loudness difference between the results of a "Downmix" processing and of a binauralization processing within the meaning of the invention, an automatic gain control can advantageously be provided at the end of the processing, so that the levels delivered by the Downmix processing and by the binauralization processing within the meaning of the invention are similar. For this purpose, as will be seen in detail below, a high-pass filter and an automatic gain control are provided at the end of the processing chain.

Thus, in more generic terms, a selected gain is also applied to the two left-channel and right-channel signals of the two-channel representation (binaural or transaural®), before rendering, the selected gain being controlled so as to limit the energy of the left-channel and right-channel signals to, at most, the energy of the signals of the virtual loudspeakers. In a practical implementation, an automatic gain control is preferably applied to the two left-channel and right-channel signals, downstream of the application of the frequency-dependent weighting factor.
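The energy limit stated above can be illustrated by a deliberately simple, block-wise gain computation. This is a sketch under assumptions: the function name is invented, the gain is computed once over a whole block, and a practical automatic gain control would typically smooth the gain over time, which is not shown here.

```python
import numpy as np

def limit_output_energy(s_left, s_right, speaker_signals):
    """Scale the two output channels so that their total energy does not
    exceed the total energy of the virtual-speaker signals (block-wise).

    Returns (left, right, gain). Illustrative sketch, not the source's AGC.
    """
    s_left = np.asarray(s_left, dtype=float)
    s_right = np.asarray(s_right, dtype=float)
    e_out = np.sum(s_left ** 2) + np.sum(s_right ** 2)
    e_in = sum(np.sum(np.asarray(x, dtype=float) ** 2) for x in speaker_signals)
    if e_out <= e_in or e_out == 0.0:
        gain = 1.0  # already within the limit: leave the signals untouched
    else:
        gain = float(np.sqrt(e_in / e_out))  # energy scales with gain squared
    return gain * s_left, gain * s_right, gain
```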

It is furthermore advantageous to use the method according to the invention to eliminate the coloration introduced by the usual binauralization processing. It appears that the processing for reducing coloration is very simple to perform in the transformed domain of the subbands. Indeed, the equations above giving the matrix coefficients simply become:

The "Gain" weighting in the above equations being such that, in one exemplary embodiment:

Gain = 0.5 if the frequency band of index m is such that m <9 (or if the frequency f is itself less than 500 Hz) and Gain = 1, otherwise.

Thus, in more generic terms, the aforementioned matrix coefficients involved in the matrix filtering vary with frequency, being weighted by a chosen factor (Gain) that is less than one if the frequency is below a chosen threshold, and equal to one otherwise. In the embodiment given above, the factor is about 0.5 and the chosen frequency threshold is about 500 Hz, to eliminate coloration.
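The frequency-dependent "Gain" weighting of the exemplary embodiment (0.5 for band indices m < 9, 1 otherwise) can be sketched as follows. The function names are assumptions, and the sketch applies the weighting to an arbitrary array whose last axis is the band index m; which matrix coefficients it multiplies is defined by the equations of the description.

```python
import numpy as np

def gain_per_band(n_bands, threshold_band=9, low_gain=0.5):
    """Gain = low_gain for band indices m < threshold_band, 1 otherwise."""
    m = np.arange(n_bands)
    return np.where(m < threshold_band, low_gain, 1.0)

def weight_coefficients(coeffs, threshold_band=9, low_gain=0.5):
    """Apply the per-band gain along the last axis (band index m).

    Illustrative helper: 'coeffs' stands for any per-band coefficients.
    """
    coeffs = np.asarray(coeffs)
    return coeffs * gain_per_band(coeffs.shape[-1], threshold_band, low_gain)
```

Because the weighting is a per-band multiplication, it adds no filtering stage of its own, which is what makes the correction inexpensive in the subband domain.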

It is also possible to apply this gain directly at the output of the processing, in particular to the output signals before playback on speakers or earphones, by applying the aforementioned gain to the equations:

[…]

as follows:

[…]

The "Gain" weighting and the automatic gain control can also be integrated into a single processing, as follows:

[…] if the frequency band of index m is such that m < 9 (or if the frequency itself is below 500 Hz), and […] otherwise.

Another advantage provided by the invention concerns the transport of the encoded signal and its processing by a decoder to improve its sound quality, for example a decoder of the MPEG Surround® type. In the context of the invention, where no transfer function is applied for the direct paths (ipsilateral contributions) and additional processing is provided on the indirect paths (spectrum of the contralateral transfer function deconvolved by the ipsilateral transfer function), it is interesting to note that, by applying a gain of 0.707 to the signals of the center and surround (left and right) channels, the unprocessed part of the stereo downmix (ipsilateral contributions) has the same form as the result of ITU Downmix processing. The above can be generalized to any type of Downmix processing. Indeed, Downmix processing to two channels usually consists in applying a weighting to the channels (virtual speakers), then summing the N channels into two output signals. Applying binaural spatialization to Downmix processing consists in applying, to the N weighted channels, the HRTF filters corresponding to the positions of the N virtual speakers. Since these filters are equal to 1 for the ipsilateral contributions, the Downmix processing is recovered by summing the ipsilateral contributions.

Thus, the signals obtained by a binauralization processing within the meaning of the invention appear as the sum of Downmix-type signals and of a stereo signal comprising the localization cues the brain needs to perceive the spatialization of the sounds. This second signal is hereinafter referred to as "Downmix Binaural Additional", so that the processing within the meaning of the invention, here called "Downmix Binaural", is such that:

"Downmix Binaural" = "Downmix" + "Downmix Binaural Additional".

This last equation can be generalized to:

"Downmix Binaural" = "Downmix" + α "Downmix Binaural Additional"

In this equation, α can be a coefficient between 0 and 1. For example, a listener can choose the value of the coefficient α between 0 and 1, continuously or by switching between 0 and 1 (in "ON-OFF" mode). The coefficient α can thus be chosen as a weighting of the second processing, "Downmix Binaural Additional", within the overall processing using matrix filtering within the meaning of the invention.
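The generalized relation "Downmix Binaural" = "Downmix" + α "Downmix Binaural Additional" can be sketched as a simple weighted sum applied to each of the two output channels. The function name and the range check are illustrative choices made for this example.

```python
import numpy as np

def downmix_binaural(downmix, dba, alpha=1.0):
    """Combine a Downmix signal with the additional binaural signal (DBA).

    alpha = 0 gives the plain Downmix, alpha = 1 the full binaural result.
    The name and the validation are assumptions made for illustration.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return np.asarray(downmix, dtype=float) + alpha * np.asarray(dba, dtype=float)
```

Setting `alpha` to 0 or 1 reproduces the "ON-OFF" mode, while intermediate values give the listener a continuous control over the amount of spatialization.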

The weighting α in this equation can also be considered as a quantization function, for example based on energy thresholding of the result of the DBA processing (for "Downmix Binaural Additional"), with for example α = 0 if the result of the DBA processing has, in a given spectral band, an energy below a threshold, and α = 1 otherwise, for this same spectral band. This embodiment has the advantage of requiring only a low bandwidth for transmitting the results of the Downmix and DBA processing from an encoder to a decoder, as shown in FIG. 7 described below, by using bit rate only if the result of the DBA processing is significant compared to the result of the Downmix. Of course, several thresholds can be provided with, for example, α = 0; 0.25; 0.5; 0.75; 1.

This additional signal requires only a low bit rate for its transport. Indeed, it takes the form of a residual, low-pass filtered signal that is thus a priori much less energetic than the Downmix signal. In addition, it has redundancies with the Downmix signal. This property can be exploited advantageously in conjunction with codecs of the Dolby Surround, Dolby Pro Logic or MPEG Surround type.
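The two-level energy thresholding of the weighting α described above could be sketched as follows. The function name, the array layout (one row per spectral band) and the way the threshold is supplied are assumptions; the description does not specify how the threshold is chosen.

```python
import numpy as np

def alpha_per_band(dba_bands, threshold):
    """Return one weighting per spectral band: 0 where the DBA energy in the
    band is below the threshold, 1 otherwise.

    dba_bands: array of shape (n_bands, n_samples), real or complex.
    Illustrative sketch; the threshold value is left to the caller.
    """
    dba_bands = np.asarray(dba_bands)
    energy = np.sum(np.abs(dba_bands) ** 2, axis=-1)
    return np.where(energy < threshold, 0.0, 1.0)
```

Bands whose weighting is 0 need not be transmitted at all, which is how the thresholding translates into a bit rate saving.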

The "Downmix Binaural Additional" signal can then be compressed and transported as a supplement to the Downmix signal and/or in a scalable manner, at a low bit rate. When listening over headphones, the addition of the two stereo signals allows the listener to take full advantage of the binaural signal, with a quality very close to that of a 5.1 format.

Thus, it suffices to decode the "Downmix Binaural Additional" signal and add it directly to the Downmix signal. A scalable encoder can be provided, carrying for example by default a stereo signal without binauralization effect and, if the bit rate allows, additionally carrying an extra signal layer for binauralization. In the case of the MPEG Surround encoder, in which it is currently planned, in one of its operating modes, to carry a stereo signal (of the Downmix type) and to perform binaural processing in the coded (or transformed) domain, reduced complexity and better rendering quality are obtained. In the case of headphone rendering, the decoder simply has to compute the "Downmix Binaural Additional" signal. Complexity is reduced, without any risk of degrading the Downmix-type signal; the sound quality can only be improved.

Such characteristics are summarized as follows: matrix filtering within the meaning of the invention consists in applying, in an advantageous embodiment:

a first process of downmixing the N channels to two stereo signals (for example of the Downmix type), and

a second processing leading, when executed in conjunction with the first processing, to a spatialization of the N virtual loudspeakers respectively associated with the N channels to obtain a bi-channel, binaural or transaural representation.

Advantageously, whether to apply the second processing is decided optionally (for example as a function of the bit rate, of the spatialized rendering capabilities of a terminal, or of other criteria). The aforementioned first processing can be applied in an encoder communicating with a decoder, while the second processing is advantageously applied at the decoder.

The processing within the meaning of the invention may advantageously be managed by a computer program comprising instructions for implementing the method according to the invention when this program is executed by a processor, for example in a decoder. In this respect, the invention also relates to such a program.

The present invention also relates to a module equipped with a processor and a memory and capable of executing this computer program. A module within the meaning of the invention, for the processing of sound data encoded in a subband domain, for two-channel rendering of binaural or transaural® type, then comprises means for applying matrix filtering to go from an N-channel sound representation, with N > 0, to a two-channel representation. The N-channel sound representation consists in considering N virtual loudspeakers surrounding a listener's head and, for each virtual loudspeaker of at least some of the loudspeakers:

- a first transfer function specific to an ipsilateral path from the loudspeaker to a first ear of the listener, facing the loudspeaker, and

- a second transfer function specific to a contralateral path from said loudspeaker to the second ear of the listener, masked from the loudspeaker by the head of the listener.

The applied matrix filtering comprises a multiplicative coefficient defined by the spectrum, in the subband domain, of the second transfer function deconvolved by the first transfer function.

Such a module may advantageously be a decoder of the MPEG Surround® type and furthermore include decoding means of the MPEG Surround® type, or may alternatively be implemented in such a decoder.

Other features and advantages of the invention will appear on examining the detailed description below, and the attached drawings in which:

- Figure 1 schematically shows a reproduction on two loudspeakers around the head of a listener;

- Figure 2 schematically shows a reproduction on five loudspeakers in multi-channel 5.1 format;

- Figure 3A schematically shows the ipsi-lateral (solid lines) and contra-lateral (dashed lines) paths in multi-channel 5.1 format;

- Figure 3B shows a prior art processing scheme for switching from the multi-channel 5.1 format illustrated in Figure 3A to a binaural or transaural format;

- Figure 4A schematically shows the ipsi-lateral (solid lines) and contra-lateral (dashed lines) paths in multi-channel 5.1 format, with the ipsi-lateral and contra-lateral paths of the central loudspeaker;

- Figure 4B shows a processing diagram for the transition from the multi-channel 5.1 format illustrated in Figure 4A to a binaural or transaural format, with only four filters, in an embodiment within the meaning of the invention;

- Figure 5 illustrates a processing equivalent to the application of one of the filters of Figure 4B;

- Figure 6 illustrates an additional processing of high-pass filtering and automatic gain control to be applied to the outputs SG and SD to avoid a coloration distortion and a difference in timbre between a "downmix" processing and a processing according to the invention;

- Figure 7 illustrates the situation of a processing within the meaning of the invention carried out at the encoder in an exemplary embodiment of the invention, particularly in the case of an additional DBA processing to be combined with the Downmix processing.

Reference is first made to FIG. 4A to describe an example of implementation of the processing for switching from a multi-channel representation (5.1 format in the example described) to a binaural or transaural stereo two-channel representation. In this figure, five loudspeakers configured in 5.1 format are illustrated:

- a front loudspeaker C situated facing the listener, in a median plane (plane P of FIG. 2),

- a left lateral loudspeaker AVG,

- a right lateral loudspeaker AVD,

- a left rear loudspeaker ARG to produce a so-called "surround" effect, and

- a right rear loudspeaker ARD to produce a so-called "surround" effect.

Referring now to FIG. 4B, the reproduction of the audio content in binaural or transaural context is intended to be performed on a first channel SG and a second channel SD, this content being initially encoded in a multi-channel format (to N channels with N = 5 in the example described) in which each channel is associated with a loudspeaker position relative to the listener (Figure 4A).

Advantageously, the channels associated with loudspeaker positions (for example the AVG and ARG loudspeakers of FIG. 4A) in a first hemisphere with respect to the listener (that of the left ear OG) are grouped together and applied directly to the SG channel of FIG. 4B. The channels associated with the positions of the AVD and ARD loudspeakers in a second hemisphere relative to the listener (that of the right ear OD) are grouped together and applied directly to the other channel SD of FIG. 4B. It is specified that the first and second hemispheres are separated by the median plane of the listener. Since the signal components AVG, ARG are applied directly to the SG channel, on the one hand, and the signal components AVD, ARD are applied directly to the SD channel, on the other hand, it will be noted, in the example of FIG. 4B, that no particular processing is applied to them.

Referring back to FIG. 4B, the AVG and ARG channels associated with positions of the first hemisphere are grouped together and also applied to the second channel SD, and the AVD and ARD channels associated with positions of the second hemisphere are grouped together and also applied to the first channel SG. Here, an additional processing is provided, to be applied:

to each channel AVG and ARG of the first hemisphere intended for the second channel SD, and

to each channel AVD and ARD of the second hemisphere intended for the first channel SG.

The additional processing preferably comprises the application of filters (C/I)_AVG, (C/I)_AVD, (C/I)_ARG, (C/I)_ARD (FIG. 4B), each defined, in the coded (or transformed) domain, by the spectrum of a contra-lateral acoustic transfer function deconvolved by an ipsi-lateral transfer function. Specifically, the ipsi-lateral transfer function is associated with a direct acoustic path I_AVG, I_AVD, I_ARG, I_ARD (FIG. 4A) between a loudspeaker position and an ear of the listener, and the contra-lateral transfer function is associated with an acoustic path C_AVG, C_AVD, C_ARG, C_ARD (FIG. 4A) passing through the listener's head, between the same loudspeaker position and the other ear of the listener.

Thus, for each channel associated with a virtual loudspeaker located outside the median plane (that is, all the loudspeakers except the front loudspeaker), the spatialization of the virtual loudspeaker is provided by a pair of transfer functions, HRTF (expressed in the frequency domain) or HRIR (expressed in the time domain). These transfer functions represent the ipsi-lateral path (direct path between the loudspeaker and the closest ear, in solid lines in FIG. 4A) and the contra-lateral path (path between the loudspeaker and the ear masked by the listener's head, in dashed lines in FIG. 4A).

Rather than using raw transfer functions for each path as in the prior art, the filter associated with the ipsi-lateral path is advantageously omitted and, for the contra-lateral path, a filter is used which corresponds to the contra-lateral transfer function deconvolved by the ipsi-lateral transfer function. Thus, for each virtual loudspeaker (except the central loudspeaker C), only one filter is used.

Thus, with reference to FIG. 4B:

- the filter referenced (C/I)_ARG is defined, in the transformed domain, by the spectrum of the contra-lateral transfer function of the path between the left rear loudspeaker ARG and the right ear OD, deconvolved by the ipsi-lateral transfer function of the path between the left rear loudspeaker ARG and the left ear OG of the individual,

- the filter referenced (C/I)_ARD is defined, in the transformed domain, by the spectrum of the contra-lateral transfer function of the path between the right rear loudspeaker ARD and the left ear OG, deconvolved by the ipsi-lateral transfer function of the path between the right rear loudspeaker ARD and the right ear OD of the individual,

- the filter referenced (C/I)_AVG is defined, in the transformed domain, by the spectrum of the contra-lateral transfer function of the path between the left lateral loudspeaker AVG and the right ear OD, deconvolved by the ipsi-lateral transfer function of the path between the left lateral loudspeaker AVG and the left ear OG of the individual, and

- the filter referenced (C/I)_AVD is defined, in the transformed domain, by the spectrum of the contra-lateral transfer function of the path between the right lateral loudspeaker AVD and the left ear OG, deconvolved by the ipsi-lateral transfer function of the path between the right lateral loudspeaker AVD and the right ear OD of the individual.
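By way of illustration only, the deconvolution that defines each of these filters can be sketched as a regularized spectral division of a contra-lateral impulse response by the ipsi-lateral one. The function names, the naive DFT, the number of frequency points `n` and the regularization constant `eps` are assumptions made for this sketch; the text itself works in a subband (PQMF) domain and does not prescribe this computation.

```python
import cmath

def dft(x, n):
    """Naive n-point DFT of a real sequence, zero-padded to n samples (len(x) <= n)."""
    padded = list(x) + [0.0] * (n - len(x))
    return [sum(padded[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def contra_over_ipsi(hrir_contra, hrir_ipsi, n=64, eps=1e-9):
    """Spectrum of the contra-lateral response deconvolved by the ipsi-lateral one.

    eps is a hypothetical regularization term that avoids dividing by a
    near-zero ipsi-lateral bin; it is not taken from the source text.
    """
    hc = dft(hrir_contra, n)
    hi = dft(hrir_ipsi, n)
    return [c * i.conjugate() / (abs(i) ** 2 + eps) for c, i in zip(hc, hi)]

# Toy responses: the contra-lateral path is the ipsi-lateral one delayed by
# three samples and attenuated by one half.
ratio = contra_over_ipsi([0.0, 0.0, 0.0, 0.5], [1.0])
```

With these toy responses, every bin of `ratio` has a magnitude close to 0.5 and a phase that decreases linearly with frequency, that is, a pure attenuation and delay.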

Furthermore, the signal which, in 5.1 encoding, is intended to feed the central loudspeaker C (in the median plane of symmetry of the listener's head) is split into two fractions (preferably equal, 50% and 50%) which are added to the respective channels of the left and right lateral loudspeakers. Similarly, if a rear loudspeaker is provided in the median plane, the associated signal is mixed with the signals associated with the left rear loudspeaker ARG and the right rear loudspeaker ARD. Of course, if there are several central loudspeakers (a front loudspeaker for reproducing mid-range frequencies, a front loudspeaker for reproducing low frequencies, or others), their signals are added and likewise distributed over the signals associated with the lateral loudspeakers.

As the channel associated with a central loudspeaker position C, in the median plane, is divided into a first and a second signal fraction, respectively added to the channel of the AVG loudspeaker in the first hemisphere (around the left ear OG) and to the channel of the AVD loudspeaker in the second hemisphere (around the right ear OD), it is not necessary to provide for filtering by the transfer functions associated with the loudspeakers in the median plane, and this without any change in the perception of the spatialization of the sound scene in binaural or transaural reproduction.
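The routing of FIG. 4B described above can be sketched as follows, in the time domain and with the four cross filters given as short FIR impulse responses. This is a minimal sketch under stated assumptions: the function names, the dictionary `ci` of hypothetical impulse responses, the default gain `g = 0.5` and the choice to add the center fraction before the cross filtering (one reading of the text) are illustrative, and the text itself applies the filters as coefficients in the subband domain.

```python
def fir(x, h):
    """Causal FIR filtering of x by the impulse response h, truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(x))]

def mix(*signals):
    """Sample-wise sum of equally long signals."""
    return [sum(samples) for samples in zip(*signals)]

def binauralize(avg, avd, c, arg, ard, ci, g=0.5):
    """Five channels to two (SG, SD) with one cross filter per lateral or rear channel.

    ci maps "AVG", "AVD", "ARG", "ARD" to hypothetical impulse responses of the
    contra-lateral / ipsi-lateral filters.
    """
    front_left = [a + g * m for a, m in zip(avg, c)]    # center fraction added to AVG
    front_right = [a + g * m for a, m in zip(avd, c)]   # center fraction added to AVD
    # Same-side channels go unfiltered; opposite-side channels go through a filter.
    sg = mix(front_left, arg, fir(front_right, ci["AVD"]), fir(ard, ci["ARD"]))
    sd = mix(front_right, ard, fir(front_left, ci["AVG"]), fir(arg, ci["ARG"]))
    return sg, sd

zeros = [0.0] * 4
impulse = [1.0, 0.0, 0.0, 0.0]
ci = {"AVG": [0.0, 0.5], "AVD": [0.0, 0.5], "ARG": [0.0, 0.4], "ARD": [0.0, 0.4]}
sg, sd = binauralize(impulse, zeros, zeros, zeros, zeros, ci)
```

In this toy call an impulse on AVG reaches SG unchanged and reaches SD delayed by one sample and halved; no filter is evaluated for the ipsi-lateral paths or for the center channel.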

Of course, it is also possible to provide a processing for the transition from a multi-channel format with N channels, with N greater than 5 (7.1 format or others), to a binaural format. For this purpose, when two additional lateral loudspeakers are added, it is sufficient to provide the same types of filters (represented by the contra-lateral HRTF deconvolved by the ipsi-lateral HRTF), for example for the two additional loudspeakers of the initial 7.1 format.

The processing complexity is greatly reduced since the filters associated with the loudspeakers located in the median plane are removed. Another advantage is that the coloring effect of the associated signals is reduced.

The spectrum of the contra-lateral transfer function deconvolved by the ipsi-lateral transfer function can be defined in the transformed domain by:

the gain of the transform of the contra-lateral transfer function deconvolved by the ipsi-lateral transfer function, and

the delay defined by the difference of the respective phases of the contra-lateral and ipsi-lateral transfer functions,

and possibly according to an estimate of coherence between the left channel and the right channel, in particular in the case of a single initial mono source to be spatialized in 5.1 format and then in binaural format (this case being described later).

As a first approximation, one can simply consider that the ratio of the respective gains of the transforms of the transfer functions, in each frequency band considered, is close to the gain of the transform of the contra-lateral transfer function deconvolved by the ipsi-lateral transfer function. The gains of the transforms of the contra-lateral and ipsi-lateral transfer functions, as well as their phases, in each spectral band, are given, for example, in Appendix C of the above-mentioned standard "Information technology - MPEG audio technologies - Part 1: MPEG Surround", ISO/IEC JTC 1/SC 29 (July 21, 2006), for a PQMF transform in 64 subbands.
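As a rough illustration of this first approximation, a coefficient for one band can be formed from the ratio of the two gains and the difference of the two phases. The function name and the convention that each phase is the argument of the corresponding complex spectrum in the band are assumptions of this sketch; the tabulated parameters of the standard may follow a different convention.

```python
import cmath

def band_coefficient(gain_contra, phase_contra, gain_ipsi, phase_ipsi):
    """Contra-lateral over ipsi-lateral coefficient for one spectral band.

    Assumes gains are magnitudes (gain_ipsi > 0) and phases are in radians.
    """
    return (gain_contra / gain_ipsi) * cmath.exp(1j * (phase_contra - phase_ipsi))

# Hypothetical band values: the contra-lateral path is 6 dB weaker and lags by 0.8 rad.
coeff = band_coefficient(0.5, -1.0, 1.0, -0.2)

# Applying the coefficient to complex subband samples of that band.
subband_samples = [1.0 + 0.0j, 0.0 + 1.0j]
filtered = [coeff * s for s in subband_samples]
```

A single complex multiplication per subband sample then stands in for the pair of ipsi-lateral and contra-lateral filters in that band.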

Thus, as a first approximation, for a contra-lateral path and in a given spectral band m, the spectrum of the contra-lateral transfer function deconvolved by the ipsi-lateral transfer function can be defined, in the transformed domain, from the gain and the phase of the contra-lateral transfer function and from the gain and the phase of the ipsi-lateral transfer function, that is, by the ratio of the two gains and the difference of the two phases.

With reference to FIG. 5, each filter is equivalent to applying:

an equalizer filter 11, preferably of the low-pass type,

advantageously an interaural delay (or "ITD") 10, to take account of the differences in path between a virtual source and each ear, and

possibly an attenuation 12 with respect to the unfiltered signal components (for example the AVG component on the SG channel of FIG. 4B).
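The three elements of FIG. 5 can be sketched in the time domain as an integer delay, a one-pole low-pass filter and a scalar attenuation. The function name, the one-pole smoother standing in for the equalizer and the parameter values are assumptions chosen for readability; they are not taken from the text.

```python
def cross_path(x, delay_samples, attenuation, alpha):
    """Delay, low-pass and attenuate a signal, keeping the input length.

    alpha in (0, 1] is the coefficient of a one-pole low-pass
    y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]; alpha = 1 disables smoothing.
    """
    delayed = ([0.0] * delay_samples + list(x))[:len(x)]   # interaural-type delay
    out, state = [], 0.0
    for sample in delayed:
        state = alpha * sample + (1.0 - alpha) * state      # low-pass equalizer
        out.append(attenuation * state)                     # attenuation
    return out

contra = cross_path([1.0, 0.0, 0.0, 0.0], delay_samples=2, attenuation=0.5, alpha=0.5)
```

For the impulse above, the output starts two samples late and decays, which is the delayed, attenuated and low-passed version of the signal that feeds the contra-lateral channel.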

It should be noted here that the applied ITD delay is "substantially" interaural, the term "substantially" referring in particular to the fact that the strict morphology of the listener may not be rigorously taken into account (eg if HRTFs are used by default, including HRTFs called "Kemar head").

Thus, the binaural synthesis of a virtual loudspeaker (AVG for example) consists simply of playing the input signal without modification on the corresponding ipsi-lateral channel (channel SG in FIG. 4B) and of applying, to the signal to be played on the contra-lateral channel (channel SD in FIG. 4B), the corresponding filter (C/I)_AVG, which applies a delay, an attenuation and a low-pass filtering. The resulting signal is thus delayed, attenuated and filtered by eliminating the high frequencies, which results, from the point of view of auditory perception, in a masking of the signal received by the "contra-lateral" ear (OD, in the example where the virtual loudspeaker is the left lateral loudspeaker AVG) relative to the signal received by the "ipsi-lateral" ear (OG).

The coloration that can be perceived is therefore directly that of the signal received by the ipsi-lateral ear. Now, advantageously, this signal undergoes no transformation and, therefore, the processing within the meaning of the invention should introduce only a weak coloration. However, as a supplementary precaution, with reference to FIG. 6, a processing of the output signals SG and SD of FIG. 4B can be provided, consisting in applying a high-pass filter FPH followed by an automatic gain control AGC.

The high-pass filter is equivalent to applying the "Gain" factor described above, with:

Gain = 0.5 if the frequency f is less than 500 Hz and

Gain = 1 otherwise.

Advantageously, in this embodiment, this factor is applied globally at the output of the signals SG and SD, alternatively from an individual application to each coefficient of the matrix explained below.
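The frequency-dependent "Gain" factor can be sketched as a simple per-band weighting. The function names and the idea of identifying each subband by a center frequency in hertz are assumptions of this sketch; the 0.5 factor and the 500 Hz threshold are the values given in the text.

```python
def low_band_gain(freq_hz, threshold_hz=500.0, factor=0.5):
    """Gain of 0.5 below 500 Hz and 1 otherwise, as stated in the text."""
    return factor if freq_hz < threshold_hz else 1.0

def weight_subbands(subband_values, band_centers_hz):
    """Apply the gain to one value per subband (hypothetical band centers)."""
    return [v * low_band_gain(f) for v, f in zip(subband_values, band_centers_hz)]

weighted = weight_subbands([2.0, 2.0, 2.0], [250.0, 500.0, 1000.0])
```

Only the band centered below the threshold is attenuated; a band centered exactly at 500 Hz keeps a unit gain because the condition in the text is a strict inequality.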

Advantageously, the automatic gain control is calibrated on the overall intensity I_D of the signals corresponding to the Downmix treatment, which is determined from the respective energies of the signals of the front-left, front-right, rear-left, rear-right and center channels of a 5.1 format. The gains g and g_s are applied globally, the gain g to the signal C and the gain g_s to the signals ARG and ARD. In other words, the energy of the left channel signal S'_G and of the right channel signal S'_D is limited at the end of this treatment, at most, to the overall energy I_D² of the signals of the virtual loudspeakers. The recovered signals S'_G and S'_D can finally be routed to a sound reproduction device in binaural stereophonic mode.
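The limiting role of the gain control can be illustrated with a block-wise sketch that scales the two output signals only when their combined energy exceeds a reference energy. The function name and the block-wise static gain are assumptions; the reference energy is passed in as a parameter because the exact expression of the overall energy is not reproduced here, and a real automatic gain control would normally vary smoothly over time.

```python
import math

def limit_energy(sg, sd, reference_energy):
    """Scale SG and SD so that their combined energy does not exceed reference_energy."""
    out_energy = sum(v * v for v in sg) + sum(v * v for v in sd)
    if out_energy == 0.0 or out_energy <= reference_energy:
        return list(sg), list(sd)                       # already within the limit
    scale = math.sqrt(reference_energy / out_energy)    # same gain on both channels
    return [scale * v for v in sg], [scale * v for v in sd]

limited_sg, limited_sd = limit_energy([2.0, 0.0], [0.0, 2.0], reference_energy=2.0)
```

Using one gain for both channels keeps the level difference between the two ears, and therefore the spatial cues, unchanged.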

In practice, in an encoder, particularly of the MPEG Surround type, the overall intensity of the signals is usually calculated directly from the energy of the input signals. Thus, in a variant, this data is taken into account for the estimation of the intensity I_D.

The implementation of the invention then results in a suppression of the monaural localization cues. However, the further a source deviates from the median plane, the more the interaural cues predominate over the monaural cues. Given that, under recommendation ITU-R BS.775 for the 5.1 loudspeaker layout, the angle between the lateral loudspeakers (or between the rear loudspeakers) is greater than 60°, the monaural cues have little influence on the perceived position of the virtual loudspeakers. Moreover, the difference perceived here is smaller than the difference that the listener could perceive because the HRTFs used are not his own (for example HRTF models derived from the so-called "Kemar head" technique).

Thus, the spatial perception of the signal is respected, without introducing coloration and while retaining the timbre of the sound sources.

What is more, the solution within the meaning of the present invention divides the number of filters to be provided substantially by two and further corrects the coloration effects. In addition, it has been observed that the choice of the position of the virtual loudspeakers can significantly influence the quality of the result of the spatialization. Indeed, it has proved preferable to place the lateral and rear virtual loudspeakers at +/- 45° with respect to the median plane, rather than at +/- 30° from the median plane as in the configuration recommended by the International Telecommunications Union (ITU). Indeed, when the virtual loudspeakers approach the median plane, the ipsi-lateral and contra-lateral HRTF functions tend to resemble each other and the previous simplifications may no longer give a satisfactory spatialization.

Thus, in generic terms, considering an initial multi-channel format defining at least four positions:

two side speakers, symmetrical with respect to the median plane, and two rear loudspeakers, symmetrical with respect to the median plane,

the position of a lateral loudspeaker is advantageously in an angular sector of 10° to 90°, and preferably 30° to 60°, from a plane of symmetry P and facing the face of the listener. More particularly, the position of a lateral loudspeaker will preferably be close to 45° from the plane of symmetry.

Reference is now made to Figure 7 to describe a possible embodiment of the invention in which the processing within the meaning of the invention takes place after the step of encoding the sound data, for example before transmission via a network 73 to a decoder 74. Here, a processing module 72 within the meaning of the invention operates directly downstream of an encoder 71, to deliver, as indicated previously, data processed according to a treatment of the type:

Downmix + α DBA (with DBA for "Downmix Binaural Additional"). A possible embodiment of such a treatment is described below.
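The scalable combination "Downmix + α DBA" can be pictured as a weighted sum of a plain stereo downmix and an additional binaural layer. The function name and the sample values are assumptions of this sketch; how each layer is computed is the subject of the description that follows.

```python
def combine_layers(downmix, dba, alpha):
    """Return downmix + alpha * dba, sample by sample, for one output channel."""
    return [d + alpha * b for d, b in zip(downmix, dba)]

stereo_only = combine_layers([1.0, 2.0], [0.5, -0.5], alpha=0.0)   # plain downmix
binaural = combine_layers([1.0, 2.0], [0.5, -0.5], alpha=1.0)      # full binaural layer
```

With α = 0 the output is exactly the downmix, which is why the additional layer can be dropped (low bit rate, loudspeaker playback) without degrading the stereo signal.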

Starting from a 5.0 signal (L, R, C, Ls, Rs) to be encoded and transported, we consider a global downmix processing of the type:

The signals Lo and Ro therefore correspond to the two stereo signals, without any spatialization effect, that a decoder could deliver to feed two loudspeakers for sound reproduction.

The computation of the Downmix processing, without binaural filtering, should thus make it possible to recover these two signals Lo and Ro, which is then expressed, for example, as follows:

By now applying binaural filtering and distributing the signal of the center loudspeaker on the L and R channels equally with gain g, we obtain:

If the contra-lateral HRTF functions deconvolved by the ipsi-lateral HRTF functions are used for the contra-lateral filtering, we have:

and

and so :

The additional Binaural Downmix is written:

Taking again the example of a matrix filtering expressed as a product of matrices of the type:

where W represents a matrix of expansion processing of two stereo signals to M' channels, with M' > 2 (for example M' = 3), this matrix W being expressed as a 2x6 matrix of the type:

In particular, in the aforementioned MPEG Surround standard, the coefficients of the matrix are such that:

In developing this product, we find

Looking for an addition of two distinct matrices, we find:

which will be written below as the sum of two matrices, one for the Downmix treatment and one for the additional Downmix Binaural treatment.

In this embodiment, it can be considered that the coefficients of the matrix are given by:

as previously stated.

As a first approximation, it can be considered that a lateral channel (right or left) and the corresponding rear channel (right or left respectively) are mutually decorrelated. This assumption is reasonable insofar as the rear channel usually only carries the room reverberation or some other signal derived (with a time delay) from the lateral channel. In this case, the channels L and Ls, and the channels R and Rs, have disjoint time-frequency supports, and then:

This assumption does not, however, hold for all signals. In the case where the signals have a common time-frequency support, it is preferable to seek to conserve the energies of the signals. This precaution is moreover recommended in the MPEG Surround standard. Indeed, signals in phase opposition cancel each other when added. As indicated above, such a situation never occurs in practice if one considers the case of a room with a reverberation effect on the surround channels.
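The energy-conservation precaution can be illustrated with a generic sketch that rescales the sum of two signals so that its energy equals the sum of their individual energies. This is a common technique given here for illustration only, not the energy-preserving variant used in the example of this description; the function name is hypothetical.

```python
import math

def energy_preserving_sum(a, b):
    """Sum two equally long signals and rescale to the sum of their energies."""
    mixed = [x + y for x, y in zip(a, b)]
    target = sum(x * x for x in a) + sum(y * y for y in b)
    actual = sum(m * m for m in mixed)
    if actual == 0.0:
        return mixed                       # complete cancellation cannot be undone
    scale = math.sqrt(target / actual)
    return [scale * m for m in mixed]

disjoint = energy_preserving_sum([1.0, 0.0], [0.0, 1.0])   # disjoint supports
in_phase = energy_preserving_sum([1.0, 0.0], [1.0, 0.0])   # common support
```

When the two signals have disjoint supports the scale factor is 1, which matches the decorrelation assumption above; when they overlap, the factor compensates for the constructive or partially destructive addition. A practical implementation would also bound the factor, since it grows without limit as the two signals approach exact phase opposition.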

Nevertheless, in the example described below, variants of the above formulas are used to preserve the energy of the signals in the Downmix processing, as follows:

The global processing matrix H ₁ ^{1, k} is still expressed as the sum of two matrices:, with

and with:

The first of these matrices, associated with the Downmix processing, contains no term relating to the HRTF filter coefficients.

This matrix deals globally with the spatialization operations from two channels (M = 2) to five channels (N = 5) and the downmixing operations of these five channels into two channels. In a particular embodiment in which a signal "Downmix" derived from the signals 5.0 to be encoded is carried, the coefficients g, w _ij , and

can be calculated by the encoder for this matrix to approach the unit matrix. Indeed, we must have:

The second matrix, associated with the additional binaural processing, consists of applying filtering based on the contra-lateral HRTF functions deconvolved by the ipsi-lateral functions. Note that going through the Downmix processing described above is a particular embodiment. The invention can also be implemented with other types of Downmix matrices.

Moreover, the embodiment introduced above is described by way of example. It appears in fact that it is not necessary, in practice, to try to estimate the signals Lo and Ro by applying the Downmix matrix, since these signals are transmitted from the encoder to the decoder, which has these signals and, where applicable, spatialization parameters with which to reconstruct the signals for sound reproduction (possibly binaural if the decoder has received the spatialization parameters). This latter embodiment has two advantages. On the one hand, the number of operations to be performed to recover the signals Lo and Ro is reduced. On the other hand, the quality of the output signals is improved: the transition to the transformed domain and the return to the starting domain, as well as the application of the Downmix matrix, necessarily degrade the signals. It is then sufficient to apply the following treatment:

It also appears that the matrix can be simplified further. Indeed, returning to the preceding expression, we can calculate the expressions of the five intermediate signals with binaural Downmix processing as follows:

With still, we manage to:

and

These expressions are simplified compared to their usual calculation. However, here again, we can take the precaution of not leading to cancellation of signals in phase opposition by seeking to preserve the energy levels of the different signals in the Downmix process, as recommended above. We then obtain:

with:

The expression of the matrix is then the following:

Of course, the present invention is not limited to the embodiment described above by way of example; it extends to other variants. Thus, the case described above is that of a processing of two initial stereo signals to be encoded and spatialized to binaural stereo, passing through a 5.1 spatialization. Nevertheless, the invention also applies to the processing of an initial mono signal (in which case N = 1 in the general expression N > 0 given above and applying to the number of initial channels to be processed). For example, in the case of the standard "Information technology - MPEG audio technologies - Part 1: MPEG Surround", ISO/IEC JTC 1/SC 29 (July 21, 2006), the equations presented in point 6.11.4.1.3.1, for the case of a first treatment of the 5.1 - binauralization type (denoted "5-1-5_1" and consisting of processing the surround channels first, before the central channel), are simplified to:

Similarly, the equations presented in point 6.11.4.1.3.2, for the case of a treatment of the mono - 5.1 spatialization - binauralization type (denoted "5-1-5" and consisting of processing the central channel first, then processing the surround effect on each of the left and right channels), are simplified to:

and

More generally, it is possible to provide further processing signals or signal components to be returned in binaural or transaural format. For example, the SG and SD channels of FIG. 4B may furthermore undergo dynamic low-pass filtering of the Dolby® or other type.

The present invention also relates to a MOD module (FIG. 4B) for processing sound data, for the transition from a multi-channel format to a binaural or transaural format, in the transformed domain, the elements of which could be those illustrated in FIG. 4B. Such a module then comprises processing means, such as a PROC processor and a MEM working memory, for the implementation of the invention. It can be implemented in any type of decoder, including a sound reproduction device (PC, walkman, mobile phone, or other) and possibly movie viewing. Alternatively, the module may be designed to operate separately from the restitution, for example to prepare binaural or transaural format content, for subsequent decoding.

The present invention also relates to a computer program, downloadable via a telecommunication network and/or stored in a memory of a processing module of the aforementioned type and/or stored on a memory medium intended to cooperate with a reader of such a processing module, and comprising instructions for the implementation of the invention when they are executed by a processor of said module.

Claims


1. A method for processing sound data encoded in a subband domain, for binaural or transaural® two-channel rendering, in which matrix filtering is applied to switch from an N-channel sound representation, with N > 0, to a two-channel representation,

said N-channel sound representation consisting of considering N virtual loudspeakers surrounding a listener's head, and, for each virtual loudspeaker of at least a portion of the loudspeakers:

a first transfer function specific to an ipsi-lateral path from the loudspeaker (AVG) to a first ear (OG) of the listener, facing the loudspeaker, and

a second transfer function specific to a contra-lateral path from said loudspeaker (AVG) to the second ear (OD) of the listener, masked from the loudspeaker by the listener's head,

the applied matrix filtering comprising a multiplicative coefficient ((C/I)_AVG) defined by the spectrum, in the subband domain, of the second transfer function deconvolved by the first transfer function.

2. The method according to claim 1, wherein a matrix filtering is applied to switch from a sound representation with M channels, with M > 0, to a two-channel representation, passing through an intermediate representation on said N channels, with N > 2, and in which the coefficients of the matrix are expressed, for a contra-lateral path, at least as a function of respective spatialization gains of the M channels on the N virtual loudspeakers located in a hemisphere around a first ear, and of contra-lateral transfer function spectra, relative to the second ear of the listener, deconvolved by the ipsi-lateral transfer function, relative to the first ear,

while for an ipsi-lateral path, the coefficients of the matrix are expressed as a function of the spatialization gains of the M channels on the N virtual speakers located in a hemisphere around a first ear.

3. A method according to claim 2, wherein the N-channel representation comprises, per hemisphere around an ear, at least one direct virtual loudspeaker and one ambience virtual loudspeaker, the coefficients of the matrix being expressed, in a domain of time-frequency transform subbands (PQMF), by respective expressions:

- for paths from a central virtual loudspeaker to the left ear,

- for paths from a central virtual loudspeaker to the right ear,

- for contra-lateral paths to the left ear,

- for contra-lateral paths to the right ear,

- for ipsi-lateral paths to the left ear,

- for ipsi-lateral paths to the right ear,

in which expressions:

- g is a mixing gain distributing a central virtual loudspeaker channel to the left and right direct loudspeaker channels,

- two relative gains are applied to a same first signal to define the L and Ls channels, respectively, of the left and left surround virtual loudspeakers, for the sample l of the frequency band m in time-frequency transform,

- two relative gains are applied to a same second signal to define the R and Rs channels, respectively, of the right and right surround virtual loudspeakers, for the sample l of the frequency band m in time-frequency transform,

- the expressions of the spectra of the HRTF transfer functions are used,

- phase shifts between contra-lateral and ipsi-lateral transfer functions correspond to selected interaural delays, and

- selected weights are applied.

4. Method according to one of the preceding claims, wherein the coefficients of the matrix vary according to the frequency, according to a weighting by a chosen factor less than one if the frequency is below a chosen threshold, and by one otherwise.

5. The method of claim 4, wherein the factor is about 0.5 and the chosen frequency threshold is about 500 Hz, to eliminate coloration distortion.

6. Method according to one of the preceding claims, wherein a chosen gain is further applied to the two signals, left channel and right channel, in two-channel representation, before restitution, the chosen gain being controlled to limit an energy of the left and right channel signals, at most, to an energy of the signals of the virtual loudspeakers.

7. The method of claim 6, taken in combination with one of claims 4 and 5, wherein an automatic gain control is applied to both left and right channel signals, downstream of the application of the variable frequency weighting.

8. Method according to one of claims 3 to 7, wherein the matrix filtering is expressed as a product of matrices, in which:

- a first matrix W represents an expansion processing of stereo signals to M' channels, with M' > 2, and

- a second matrix represents a global matrix treatment comprising:

an expansion processing of the M' channels to said N channels, with N > 3, and

a spatialization processing of the N virtual loudspeakers respectively associated with the N channels to obtain a two-channel, binaural or transaural, representation, with:

9. Method according to one of the preceding claims, wherein the matrix filtering consists of applying:

a first downmix processing (DOWNMIX) of the N channels to two stereo signals, and

a second processing (DBA) leading, when executed together with the first processing, to a spatialization of the N virtual loudspeakers respectively associated with the N channels for a bi-channel, binaural or transaural® representation.

10. The method of claim 9, wherein a weighting (a) of the second processing is chosen in said matrix filtering.

11. The method of claim 10, wherein the first processing is applied in an encoder communicating with a decoder, and the second processing is applied in said decoder.

12. Method according to one of claims 9 to 11, taken in combination with claim 8, wherein the matrix:

is written as a sum of matrices, with:

a first matrix representing the first treatment expressed by:

and a second matrix representing the second treatment expressed by, with:

13. Computer program comprising instructions for implementing the method according to one of the preceding claims, when the program is executed by a processor.

14. Module for processing sound data encoded in a subband domain, for binaural or transaural® bi-channel reproduction,

the module comprising means for applying a matrix filtering to switch from an N-channel sound representation, with N > 0, to a two-channel representation, said N-channel sound representation consisting of considering N virtual loudspeakers surrounding the head of a listener, and, for each virtual loudspeaker of at least part of the loudspeakers:

a first transfer function specific to an ipsi-lateral path of the loudspeaker (AVG) to a first ear (OG) of the listener, facing the loudspeaker, and

a second transfer function specific to a contra-lateral path of said loudspeaker (AVG) towards the second ear (OD) of the listener, masked from the loudspeaker by the listener's head,

the applied matrix filtering comprising a multiplicative coefficient ((C / I) AVG) defined by the spectrum, in the subband domain, of the second transfer function deconvolved by the first transfer function.

15. Module according to claim 14, further comprising decoding means of the MPEG Surround® type.