EP3058564B1

EP3058564B1 - Sound spatialisation with reverberation, optimised in terms of complexity

Info

Publication number: EP3058564B1
Application number: EP14796814.3A
Authority: EP
Inventors: Grégory PALLONE; Marc Emerit
Original assignee: Orange SA
Current assignee: Orange SA
Priority date: 2013-10-18
Filing date: 2014-10-14
Publication date: 2023-07-26
Anticipated expiration: 2034-10-14
Also published as: KR20160073394A; US20160269850A1; ES2982054T3; EP4184505B1; EP4184505A1; EP3058564A1; JP6518661B2; US9641953B2; CN105706162B; FR3012247A1; ES2959534T3; CN105706162A; KR102156650B1; JP2016537866A; WO2015055946A1

Description

The present invention relates to sound spatialization with room effect.

The invention finds an advantageous but non-limiting application in the processing of sound signals respectively originating from L channels associated with virtual loudspeakers (for example in a multichannel representation, or in a surround-sound representation, of the sound to be reproduced), for spatialized reproduction on real loudspeakers (for example the two earpieces of a headset in binaural reproduction, or two separate loudspeakers in transaural reproduction).

For example, the signal of one of these channels can be processed to have a first contribution on the left earpiece and a second contribution on the right earpiece, in binaural reproduction, in particular by applying a transfer function with room effect to each of these contributions. Applying these room-effect transfer functions then helps give the listener a feeling of immersion, practically allowing the listener to "locate in space" the virtual loudspeaker associated with this channel.

In a particular embodiment, described in particular in document FR1357299, a transfer function with room effect is applied to each sound signal of a corresponding channel, in the time domain, in the form of an impulse response of the BRIR type ("Binaural Room Impulse Response"). In particular, in that document, incorporated here by reference, this BRIR transfer function is constructed as the combination of:

a first transfer function, specific to each signal, and
a second, global transfer function, common to all the signals and characterizing in particular a diffuse field, which usually appears in a room after a certain time, typically after the first reflections of a sound wave.

Such an embodiment advantageously makes it possible to apply a common processing to all the signals, which corresponds, in physical reality, to a "mixing" of the acoustic waves as the reverberations progress, and therefore beyond a given duration (characterizing the beginning of the presence of the diffuse field). Such an embodiment then makes it possible to reduce the complexity of spatialization processing with room effect on several initial channels.

In addition, document US 2011/170721 A1 discloses a method of processing at least one input signal through a set of binaural filters such that the outputs can be played over headphones to give the sensation of listening to sound in a listening room through one or more virtual loudspeakers.

Nevertheless, in spatialization modules operating upstream of reproduction, there is still a need to reduce the complexity of the spatialization processing as much as possible. For example (but without limitation), the channel signals are received in encoded form by a compression decoder. This decoder sends the channel signals, once decoded, to a spatialization module for sound reproduction with room effect on two loudspeakers. This spatialization step (which follows the decoding of the received signals) should then have reduced processing complexity, so as not to delay the overall sequence of decoding and spatialization steps between reception of the signals and reproduction.

The present invention improves the situation.

To this end, the invention proposes to reduce the complexity of applying the room-effect transfer function, in particular by reducing this complexity in the spectral domain. Indeed, in the spectral domain, convolution by a transfer function becomes a multiplication of spectral components of the signal on the one hand, and of a filter representing the transfer function on the other hand (Figure 1, discussed in detail below).

The invention then starts from the advantageous observation that, after direct propagation, a sound wave tends to be attenuated at high frequencies because of successive reflections on surfaces (typically walls, the listener's face, etc.) that absorb the wave, in particular at high frequencies. In addition, the air itself absorbs the highest-frequency spectral components of sound as it propagates. This phenomenon is all the more pronounced, for example, for the diffuse sound field, for which there is no need for a frequency representation at very high frequencies (for example above a frequency in a range of 5 to 15 kHz).

Thus, it is possible to reduce the complexity of applying the transfer function with room effect, in the spectral domain, by simply not taking into account, when carrying out the aforementioned multiplications of spectral components, the components associated with frequencies above a predetermined cut-off frequency (for example a cut-off frequency chosen between 5 and 15 kHz).

The invention is therefore directed to a method of sound spatialization, comprising the application of at least one room-effect transfer function to at least one sound signal, said application amounting to multiplying, in the spectral domain, spectral components of the sound signal by the spectral components of a filter corresponding to the aforementioned transfer function. Each spectral component of the filter has a temporal evolution in a time-frequency representation (as detailed below with reference to Figure 3).

In particular, these spectral components of the filter are ignored, for the aforementioned multiplications of components, beyond a threshold frequency and after at least one given instant in said time-frequency representation. Thus, after this given instant, the spectral components of the filter are taken into account up to a cut-off frequency which can be chosen for example between 5 and 15 kHz (depending on the room effect to be applied and/or the signal to be spatialized, as described below). Beyond the cut-off frequency, the multiplication is not even performed, which is mathematically the same as multiplying the signal by zero.
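The skipped multiplications can be pictured with a minimal sketch, assuming the time-frequency representation stores the signal and the filter as lists of blocks, each block holding one complex value per frequency band. The function name `apply_truncated_filter`, the parameters `start_block` and `cutoff_band`, and the block-wise accumulation scheme are illustrative assumptions and are not taken from the patent text.

```python
def apply_truncated_filter(signal_blocks, filter_blocks, start_block, cutoff_band):
    """Block-wise spectral filtering that skips high bands of late filter blocks.

    signal_blocks[n][b]: spectral component b of signal block n (complex).
    filter_blocks[m][b]: spectral component b of filter block m (complex).
    start_block: first filter block to which the cut-off applies (assumed name).
    cutoff_band: number of bands kept from start_block onward (assumed name).
    Returns the filtered blocks and the number of multiplications performed.
    """
    n_bands = len(signal_blocks[0])
    output = [[0j] * n_bands for _ in signal_blocks]
    multiplications = 0
    for n in range(len(signal_blocks)):
        for m, filt in enumerate(filter_blocks):
            if m > n:
                break  # filter block m needs signal block n - m, which does not exist
            # Early filter blocks use every band; later ones stop at the cut-off.
            kept = n_bands if m < start_block else min(cutoff_band, n_bands)
            past = signal_blocks[n - m]
            for b in range(kept):
                output[n][b] += past[b] * filt[b]
                multiplications += 1
    return output, multiplications


# Toy data: 3 signal blocks and 2 filter blocks of 4 bands each.
signal = [[1 + 0j] * 4 for _ in range(3)]
room_filter = [[2 + 0j] * 4 for _ in range(2)]
out, count = apply_truncated_filter(signal, room_filter, start_block=1, cutoff_band=2)
```

In this toy case 16 multiplications are performed, against the 20 that a full-band filter of the same size would need, and the two upper bands of each output block receive no contribution from the second filter block.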

This given instant typically represents the moment when a sound wave begins to undergo reverberations (through successive reflections or, later still, from the onset of a diffuse sound field). Thus, in general terms, in an embodiment where the transfer function takes account of reverberations in the room effect (taking into account for example a diffuse sound field), the aforementioned given instant can be chosen as a function of such reverberations. For example, the given instant may come later, in the room effect, than direct sound propagation with first reflections, and then correspond to the beginning of the presence of a diffuse sound field.

Furthermore, an embodiment can be provided in which the aforementioned threshold frequency decreases as a function of time in said time-frequency representation. For example, if the signal is sampled over several successive blocks, provision may be made to retain the spectral components present in the signal, in the multiplication of the components, for a first block, then to ignore them beyond a first threshold frequency for a second block which follows the first block, then to ignore them beyond a second threshold frequency for a third block which follows the second block, and so on, the second threshold frequency being lower than the first.

Thus, in more generic terms, in an embodiment where the signal is sampled over several successive blocks, the spectral components of the filter can be ignored, for the multiplication of the components:

beyond a first threshold frequency for a given block,
then, beyond a second threshold frequency, for a block following the given block,

the second threshold frequency being lower than the first threshold frequency. The aforementioned given block can include, for example, samples located at instants when a sound wave has undergone one or more reflections, possibly with the beginning of the presence of a diffuse sound field. The block which follows this given block (immediately or a few blocks later) can include, for example, samples located after or from the beginning of the presence of a diffuse sound field.
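A threshold that decreases from block to block can be sketched as a per-block count of retained frequency bands. The sketch assumes, purely for illustration, that the bands are uniform and span 0 Hz to half the sampling frequency; the function name `kept_bands_per_block`, the 48 kHz sampling rate, the 64 bands, and the threshold values are hypothetical choices, and `None` stands for a block with no threshold.

```python
def kept_bands_per_block(thresholds_hz, sample_rate_hz, n_bands):
    """Convert one threshold frequency per block into a number of retained bands.

    Assumes n_bands uniform bands covering 0 Hz up to sample_rate_hz / 2.
    A threshold of None means the block keeps every band.
    """
    nyquist_hz = sample_rate_hz / 2
    kept = []
    for threshold in thresholds_hz:
        if threshold is None:
            kept.append(n_bands)
        else:
            # Keep only the bands lying entirely below the threshold frequency.
            kept.append(min(n_bands, int(n_bands * threshold // nyquist_hz)))
    return kept


# First block unrestricted, then thresholds of 15, 10 and 5 kHz (illustrative values).
schedule = kept_bands_per_block([None, 15000, 10000, 5000],
                                sample_rate_hz=48000, n_bands=64)
full_cost = 64 * len(schedule)   # multiplications per band position without any threshold
reduced_cost = sum(schedule)     # multiplications actually needed with the schedule
```

With these illustrative values the schedule is `[64, 40, 26, 13]`, so 143 multiplications replace 256 for each set of four filter blocks.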

Such an embodiment makes it possible, for example, to limit possibly audible artefacts caused by limiting the signal at high frequencies for the reverberations, since the limitation is applied progressively over several blocks. It also makes it possible to consider several forms of transfer functions (denoted below $B_{mean}^{k} (m)$, m being a block index) characterizing a diffuse sound field. Indeed, it is possible for example to apply a transfer function $B_{mean}^{k}$ to the aforementioned given block, and to apply a temporally progressive cut-off window (of the "fade out" type) to this transfer function $B_{mean}^{k}$ for the block which follows, in order to "terminate" the presence of the diffuse sound field.

In one embodiment where the method is implemented by a sound spatialization module receiving a plurality of input signals and delivering at least two output signals, a room-effect transfer function is applied to each input signal in order to deliver each output signal,
each of said output signals being given by application of the following formula:

$\begin{matrix} O^{k} = \sum_{l = 1}^{L} \left( I (l) *_{[0; \dots; f^{k} (l)]} A^{k} (l) \right) + \\ \sum_{m = 1}^{M} \left( z^{- iDDm} . \sum_{l = 1}^{L} \left( G (I (l)) . \frac{1}{W^{k} (l)} . I (l) \right) \right) *_{[0; \dots; f^{k} (m)]} B_{mean}^{k} (m) \end{matrix}$

$O^{k}$ being an output signal, and k being the index relating to an output signal,
$l \in [1; L]$ being the index relating to an input signal among said input signals, L being the number of input signals, and $I (l)$ being an input signal among said input signals,
$A^{k} (l)$ being a transfer function with room effect specific to an input signal,
$B_{mean}^{k} (m)$ being a global transfer function, with room effect, common to the input signals,
$W^{k} (l)$ being a chosen weighting, and $G (I (l))$ a predetermined energy compensation gain,
$z^{- iDDm}$ being the application of a delay, counted in number of blocks of samples, corresponding to the time difference between a sound emission in a room corresponding to the room effect and the beginning of the presence of a diffuse field in this room, the index m corresponding to a number of blocks of samples whose duration corresponds to this delay, M being the total number of blocks that a transfer function lasts in a time-frequency representation,
the sign " . " denoting multiplication,
the sign "$*_{[0; \dots; f^{k} (l)]}$" denoting the convolution operator over a limited number of frequencies, going from a lowest frequency to a maximum frequency $f^{k} (l)$ which depends at least on the input signal of index l, and
the sign "$*_{[0; \dots; f^{k} (m)]}$" denoting the convolution operator over a limited number of frequencies, going from a lowest frequency to a frequency $f^{k} (m)$ which depends on the block of samples of index m.
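The structure of this formula can be illustrated with a simplified sketch for a single output k, in which each band-limited convolution is replaced by a per-band multiplication restricted to the lowest bands. Several simplifications are assumptions made here and not statements of the patent: the specific filter $A^{k}(l)$ is reduced to a single block, band limits are given as band counts rather than frequencies, and the delay applied before filter block m is supplied by a hypothetical caller-provided function `delay_of`, since the exact form of the delay is not restated here. All function and parameter names are illustrative.

```python
def spatialize_output(inputs, specific, bands_direct, diffuse, bands_diffuse,
                      gains, weights, delay_of):
    """Simplified evaluation of one output O^k in a block/band representation.

    inputs[l][n][b]   : band b of block n of input signal I(l).
    specific[l][b]    : one-block stand-in for A^k(l) (simplifying assumption).
    bands_direct[l]   : number of bands kept for input l (stands for f^k(l)).
    diffuse[m][b]     : block m of the common filter B_mean^k(m).
    bands_diffuse[m]  : number of bands kept for block m (stands for f^k(m)).
    gains[l], weights[l]: G(I(l)) and W^k(l).
    delay_of(m)       : delay in blocks before filter block m (hypothetical helper).
    """
    n_inputs, n_blocks, n_bands = len(inputs), len(inputs[0]), len(inputs[0][0])
    out = [[0j] * n_bands for _ in range(n_blocks)]

    # First sum: each input filtered by its own transfer function, band-limited.
    for l in range(n_inputs):
        for n in range(n_blocks):
            for b in range(min(bands_direct[l], n_bands)):
                out[n][b] += inputs[l][n][b] * specific[l][b]

    # Inner sum of the second term: one weighted mix shared by all diffuse blocks.
    mixed = [[sum(gains[l] / weights[l] * inputs[l][n][b] for l in range(n_inputs))
              for b in range(n_bands)] for n in range(n_blocks)]

    # Second sum: delayed mix filtered by each block of the common diffuse filter.
    for m, filt in enumerate(diffuse):
        delay = delay_of(m)
        for n in range(delay, n_blocks):
            for b in range(min(bands_diffuse[m], n_bands)):
                out[n][b] += mixed[n - delay][b] * filt[b]
    return out


# Toy data: 2 inputs, 3 blocks, 2 bands, all components equal to 1.
toy_inputs = [[[1 + 0j, 1 + 0j] for _ in range(3)] for _ in range(2)]
result = spatialize_output(
    toy_inputs,
    specific=[[1 + 0j, 1 + 0j], [1 + 0j, 1 + 0j]],
    bands_direct=[2, 1],
    diffuse=[[1 + 0j, 1 + 0j]],
    bands_diffuse=[1],
    gains=[1.0, 1.0],
    weights=[2.0, 2.0],
    delay_of=lambda m: 1 + m,
)
```

The point of the structure is that the weighted mix is computed once and shared by every block of the common filter, so the cost of the diffuse part does not grow with the number of input signals, and the band limits reduce it further.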

This embodiment will be described in detail below, with reference in particular to Figures 2 and 5.

It is also possible to limit the multiplication calculations beyond a first threshold frequency from the first block or blocks of samples onward, depending on the characteristics of the signal (for example its sampling frequency, or the highest frequency represented in the spectral components of the signal) or depending on the characteristics of the spatialization applied (for example with a limitation of the high-frequency components for a contra-lateral acoustic path, as detailed below).

In this case, the signal resulting from the reverberations (after reflections or in the diffuse field) does not normally contain spectral components of higher frequency than the initial signal. Thus, the aforementioned threshold frequency cannot be greater than the highest frequency present in the initial signal.

Thus, in more generic terms, in one embodiment, information is obtained on the highest-frequency spectral component in the sound signal, and the aforementioned threshold frequency is chosen as the minimum of a predetermined threshold frequency (for example between 5 and 15 kHz) and said highest frequency.

Typically, in an embodiment where the sound signal comes from a compression decoder, the information on the highest-frequency spectral component can be provided by the decoder.

Similarly, if the spatialization is carried out by a module capable of supporting different signal formats, in particular in terms of the sampling frequency of such signals, the aforementioned highest frequency cannot be greater than half the sampling frequency, and the threshold frequency for implementing the invention can therefore also be chosen as a function of this sampling frequency.
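The selection rule described in the preceding paragraphs reduces to taking a minimum over the available bounds. The sketch below is only an illustration: the function name `select_threshold_hz` and its parameters are hypothetical, and the way the highest signal frequency is obtained (for example from a decoder) is left outside the sketch.

```python
def select_threshold_hz(predetermined_hz, highest_signal_hz=None, sample_rate_hz=None):
    """Return the threshold frequency as the minimum of the known upper bounds.

    predetermined_hz : predetermined threshold (the text suggests 5 to 15 kHz).
    highest_signal_hz: highest frequency present in the signal, if known.
    sample_rate_hz   : sampling frequency, whose half bounds any signal content.
    """
    candidates = [predetermined_hz]
    if highest_signal_hz is not None:
        candidates.append(highest_signal_hz)
    if sample_rate_hz is not None:
        candidates.append(sample_rate_hz / 2)
    return min(candidates)


# A signal band-limited to 8 kHz lowers a 12 kHz predetermined threshold.
narrowband = select_threshold_hz(12000, highest_signal_hz=8000)
# A 16 kHz sampling frequency has the same effect through its half.
low_rate = select_threshold_hz(12000, sample_rate_hz=16000)
# A full-band 48 kHz signal leaves the predetermined threshold unchanged.
fullband = select_threshold_hz(12000, highest_signal_hz=20000, sample_rate_hz=48000)
```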

In an embodiment where the sound signal is spatialized on at least first and second virtual loudspeakers, associated respectively with a first and a second channel, first and second transfer functions with room effect are applied respectively to these first and second channels, as explained in the introduction above (for example by adapting signals on surround channels for binaural or transaural reproduction). In particular, where one of the first and second transfer functions applies an ipsi-lateral acoustic path effect while the other applies a contra-lateral acoustic path effect, the spectral components of the sound signal can be eliminated beyond a given screening frequency. This "screening" frequency is explained by the fact that, for a contra-lateral path between a virtual loudspeaker and the listener's ear under consideration, the listener's head masks the acoustic path and absorbs the highest tones of the acoustic wave (and thus eliminates the spectral components associated with the highest frequencies of the acoustic wave). Thus, the aforementioned threshold frequency, for the transfer function applying a contra-lateral path effect, can be chosen as the minimum of a predetermined threshold frequency (for example chosen between 5 and 15 kHz) and this screening frequency. This embodiment can advantageously be applied from the first block of samples onward.

However, it does not exclude the possibility of raising the threshold frequency again for the following block, in order to simulate a first reflection on a wall facing the ear under consideration, this first reflection being received at that ear via an ipsi-lateral path.

It will be understood in any case that, in one possible embodiment, the cut-off frequency can be chosen to be common to all the signals after a given instant which corresponds for example to the presence of the diffuse field.

Thus, the embodiment described in document FR1357299, introduced above, can be advantageous in the context of the invention, and in particular if each transfer function applied to a signal comprises:

a transfer function specific to this signal, added to
a global transfer function, common to all the signals and representative of the presence of a diffuse field,

then the aforementioned given instant can be common to all the signals and correspond for example to the beginning of the presence of a diffuse sound field.

In an embodiment where the signals comprise successive blocks of samples, of the same size from one signal to another, at least one given instant is provided from which the frequency components are taken into account only up to a cut-off frequency, this given instant being located at the start of a block other than the first block in a succession of blocks. This given instant therefore occurs after direct propagation, at the time of sound reflections or of the presence of a diffuse field.

This embodiment is detailed below with reference to Figure 5, which also illustrates, in an exemplary embodiment, a possible algorithm of a computer program that would be executed by a processor of a spatialization module carrying out the method within the meaning of the invention. In this respect, the present invention also relates, in general, to a computer program comprising instructions for implementing the above method when they are executed by a processor.

This disclosure also presents an example implementation of a sound spatialization module, comprising calculation means for applying at least one room-effect transfer function to at least one input sound signal, said application amounting to multiplying, in the spectral domain, spectral components of the sound signal by the spectral components of a filter corresponding to said transfer function, each spectral component of the filter having a temporal evolution in a time-frequency representation. In particular, these calculation means are configured to ignore said spectral components of the filter in said multiplications of components, beyond a threshold frequency and after at least one given instant in said time-frequency representation.

The sound spatialization module, receiving a plurality of input signals, delivers at least two output signals, the calculation means being configured to apply a room-effect transfer function to each input signal, each of said output signals being given by the following formula:

$\begin{matrix} O^{k} = \sum_{l = 1}^{L} (I (l) *_{[0; \dots; f^{k} (l)]} A^{k} (l)) + \\ \sum_{m = 1}^{M} (z^{- iDDm} . \sum_{l = 1}^{L} (G (I (l)) . \frac{1}{W^{k} (l)} . I (l))) *_{[0; \dots; f^{k} (m)]} B_{mean}^{k} (m) \end{matrix}$

O^k being an output signal, and k being the index relating to an output signal,
l ∈ [1;L] being the index relating to an input signal among said input signals, L being the number of input signals, and I(l) being an input signal among said input signals,
A^k(l) being a room-effect transfer function specific to an input signal,
$B_{mean}^{k} (m)$
being a global room-effect transfer function, common to the input signals,
W^k(l) being a chosen weight, and G(I(l)) a predetermined energy compensation gain,
z^-iDDm being the application of a delay, counted in number of blocks of samples, corresponding to the time difference between a sound emission in a room corresponding to the room effect and the onset of the diffuse field in this room, the index m corresponding to a number of blocks of samples of duration corresponding to this delay, M being the total number of blocks that a transfer function lasts in a time-frequency representation,
the sign « . » denoting multiplication,
the sign « $*_{[0; \dots; f^{k} (l)]}$ » denoting the convolution operator over a limited number of frequencies, running from a lowest frequency to a maximum frequency f^k(l) which is a function at least of the input signal of index l, and
the sign « $*_{[0; \dots; f^{k} (m)]}$ » denoting the convolution operator over a limited number of frequencies, running from a lowest frequency to a frequency f^k(m) which is a function of the block of samples of index m.

This module can be integrated into a decoding device for compressed signals, or more generally into a playback system.

Figure 6 shows such a spatialization module SPAT, as well as a decoding device DECOD which, in the example shown, receives compression-coded signals I'(l) (with l = 1,...,L) from a network RES and decodes them before playback, transmitting the decoded signals I(l) (with l = 1,...,L) to the spatialization module. In the example shown, the latter comprises an input interface IN for receiving the decoded signals, as well as calculation means such as a processor PROC and a working memory MEM cooperating with the interfaces IN/OUT to spatialize the signals I(l) and to deliver, through the output interface OUT, only two signals O^d and O^g intended to feed the respective earpieces of a headset CAS.

Other characteristics and advantages of the invention will become apparent on examining the detailed description below and the appended drawings, in which:

Figure 1 illustrates a general embodiment of the method according to the invention;
Figure 2 illustrates an example of application of the method according to an embodiment where the transfer functions take the form of a combination of two transfer functions, one of which is applied with a delay to the signal to be processed;
Figure 3 shows an example of a time-frequency representation of a transfer function with cutoff frequencies (the aforementioned "threshold frequencies") that vary in particular as a function of time;
Figure 4 illustrates a flowchart corresponding to a possible general algorithm of the computer program within the meaning of the invention;
Figure 5 shows a particular embodiment derived from the mode shown in Figure 2, but over more than two successive time blocks, with an evolution of the transfer function $B_{mean}^{k} (m)$
representing the diffuse field, as a function of the blocks m;
Figure 6 illustrates an example of a spatialization module within the meaning of the invention;
Figure 7 schematically illustrates the virtual loudspeakers and the room effect for applying an appropriate transfer function, with the frequency components of this transfer function limited to those up to an appropriate cutoff frequency.

Before describing Figure 1 and the general principle of the invention, reference is made to Figure 7 to explain the physical phenomena underlying the present invention.

In the example shown, a plurality of virtual loudspeakers surround the head TE of a listener. Each of the virtual loudspeakers HPV is initially fed by a signal I(l) with l ∈ [1;L], for example previously decoded as indicated above with reference to Figure 6. The layout of the virtual loudspeakers may correspond to a multichannel or a surround-sound representation of the signals I(l) to be processed so that they are played back together in a spatialized manner, with a room effect, on a headset CAS with earpieces (Figure 6). For this purpose, a room-effect transfer function is usually applied to each signal for each earpiece signal O^k to be delivered, with k = d (for right), g (for left). Thus, with reference to Figure 7, for each virtual loudspeaker HPV one considers the acoustic path from the loudspeaker HPV to the left ear OG (ipsilateral path TIL in the example shown), the acoustic path from the loudspeaker HPV to the right ear OD (contralateral path TCL in the example shown), reflections on the walls MUR (path RIL), and finally a diffuse field after several reflections. At each reflection, the acoustic wave is considered to be attenuated in the highest frequencies.

Thus, with reference to Figure 3, which relates to a time-frequency representation of a transfer function suited to the virtual loudspeaker HPV shown in Figure 7, it already appears that the listener's head naturally masks the contralateral path, so that the highest frequencies to be considered for the transfer function specific to the right ear OD are lower than those to be considered for the transfer function specific to the left ear OG (which faces the virtual loudspeaker HPV along an ipsilateral path). Thus, considering a first time block from 0 to N-1, denoted m=0, the maximum frequency F_c^d(0) of a filter representing the transfer function specific to the right ear can be lower than the maximum frequency F_c^g(0) of a filter representing the transfer function specific to the left ear. A designer of such a filter can thus limit the components of the filter for the right ear to those up to the cutoff frequency F_c^d(0) (corresponding to a head-shadowing frequency), even though the signal to be processed I(l) may have higher spectral components, up to the frequency F_c^g(0) at least.

Then, after reflections, the acoustic wave tends to be attenuated in the high frequencies, which is reflected in the time-frequency representation of the transfer function for the left ear, as for the right ear, for the instants N to 2N-1, corresponding to the next block, denoted m=1. Thus, a designer of filters representing these transfer functions can choose to limit the filter components to those up to the cutoff frequency F_c^d(1) for the right ear and up to the cutoff frequency F_c^g(1) for the left ear. In an embodiment illustrated in particular in Figure 5, it can be considered that in block m=1 the transfer function typically characterizes a diffuse field for the right ear as for the left ear, and it can therefore be set (possibly, but without limitation) that F_c^d(1)=F_c^g(1).

Then, in the presence of a diffuse field with overall attenuation of the sound ("fade out"), the acoustic wave tends to be attenuated further in the high frequencies, which is again reflected in the time-frequency representation of the transfer function for the left ear, as for the right ear, in Figure 3, for the instants 2N to 3N-1, corresponding to the block denoted m=2. Thus, a designer of filters representing these transfer functions can choose to limit the filter components to those up to the cutoff frequency F_c^d(2) for the right ear and up to the cutoff frequency F_c^g(2) for the left ear.

It will be noted that shorter blocks would allow the highest frequency to be considered to vary more finely, for example to take account of a first reflection RIL for which the highest frequency increases for the right ear (dotted lines around F_c^d(0) in Figure 3) in the first instants of block m=0.

It should thus be noted that it is possible not to take into account all the spectral components of a filter representing a transfer function, in particular beyond a cutoff frequency F_c. It is therefore advantageous to apply the transfer function in the spectral domain. Indeed, the convolution of a signal I(l) with a transfer function becomes, in the spectral domain, a multiplication of the spectral components of the signal I(l) by the spectral components of the filter representing the transfer function, and, in particular, this multiplication can be carried out only up to a cutoff frequency, which depends, for example, on a given block and on the signal to be processed.
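As an illustration only, the band-limited multiplication described above can be sketched as follows. This is a minimal sketch, not the patented implementation: the function name `multiply_up_to_cutoff`, the representation of a spectrum as a list of complex bins ordered from the lowest frequency upward, and the convention that ignored bins are left at zero are all assumptions made for the example.

```python
def multiply_up_to_cutoff(signal_spec, filter_spec, cutoff_bin):
    """Multiply two spectra bin by bin, but only for bins 0..cutoff_bin.

    Bins above the cutoff are ignored (left at zero), so the number of
    complex multiplications is cutoff_bin + 1 instead of len(signal_spec).
    All names here are hypothetical and chosen for the example.
    """
    if len(signal_spec) != len(filter_spec):
        raise ValueError("spectra must have the same number of bins")
    last = min(cutoff_bin, len(signal_spec) - 1)
    out = [0j] * len(signal_spec)
    for f in range(last + 1):
        out[f] = signal_spec[f] * filter_spec[f]
    return out


# Toy spectra with four bins; only bins 0 and 1 are processed.
signal_spec = [1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j]
filter_spec = [2 + 0j, 2 + 0j, 2 + 0j, 2 + 0j]
result = multiply_up_to_cutoff(signal_spec, filter_spec, cutoff_bin=1)
```

In this toy case two complex multiplications are performed instead of four, which is the kind of saving the limitation to a cutoff frequency is meant to provide.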

Thus, with reference to Figure 1, L input signals I(1), I(2), ..., I(L) are transformed into the frequency domain, at steps TF11, TF12, ..., TF1L respectively. As a variant, such input signals may already be available in frequency form (for example from the decoder).

At step BA11, a complete spatialization impulse response (typically of the BRIR type, for "Binaural Room Impulse Response"), in temporal form and corresponding to the signal I(1) of channel 1, is stored in memory. At step TFA11, this impulse response is transformed into frequency form to obtain a corresponding filter in the spectral domain. In an advantageous embodiment, the filter is stored in its spectral form to avoid recomputing the transform. This filter is then multiplied by the input signal of channel 1 in frequency form (which is equivalent to a convolution in the time domain). The spatialized signal for the signal I(1) of channel 1 is thus obtained.

The same operations are carried out for the other L-1 channels, giving a total of L spatialized channels. These channels are then summed to obtain a single output signal representing the L channels, and the result is returned to the time domain at step ITF11 to deliver one of the signals O^k (with k=d,g) feeding an earpiece. A similar processing is carried out for the other earpiece. In an embodiment described in detail below with reference to Figures 2 and 5, the L spatialized channels are not independently accessible before summation: the single output signal is built by progressively adding each spatialized channel to the previous output signal.

These operations are carried out for each output signal O^k to be constructed. Typically, for binaural playback, these steps are performed twice: once for the output signal intended to feed the left earpiece of a headset and once for the output signal intended to feed the right earpiece. Two spatialized signals O^d and O^g are thus finally obtained, each corresponding to one ear.

The L input signals can typically correspond to the L channels of a multichannel audio content intended to feed ("virtual") loudspeakers. The L input signals can, for example, correspond to the L surround signals of an audio content in surround representation.

Referring now to Figure 2, which illustrates an implementation within the meaning of the invention, the principle of spatializing L channels as presented in Figure 1 is taken up again. However, the presentation of Figure 2 is simplified in that the L input signals are grouped into a single path I(l). Thus, L input signals I(1), I(2), ..., I(L) are transformed into the frequency domain at step S21. As indicated previously, such input signals may alternatively already be available in frequency form. At step S22, a spatialization impulse response A^k(l) (typically of the BRIR type) corresponding to the signal I(l) of channel l is transformed into the spectral domain to obtain a frequency filter. This impulse response A^k(l) is incomplete in the representation of Figure 2, because it corresponds to a first time block of samples m=0. As indicated previously, this impulse response may already be available in frequency form. The components of this filter are then multiplied by the spectral signal of the corresponding channel I(l). This multiplication is parameterized (as indicated below with reference to Figure 4) so that certain frequency components are ignored, within the meaning of the invention. Typically, the highest frequency components are ignored to limit the complexity of the calculations. In Figures 2 and 5, the multiplication of components limited to a cutoff frequency is denoted by the sign: X

A cutoff frequency f_cA(l), beyond which the frequency components are ignored, is defined (for example the maximum frequency represented in the signal of channel I(l), or half of its sampling frequency). In addition, this cutoff frequency is specific to each filter and to each block (it decreases, for example, for the blocks m=1, m=2). Since the filters here are specific to each input signal and to each ear, a cutoff frequency is specific to an input signal, to an ear (hence to an output signal) and to a time block.
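One possible way to organize such cutoff frequencies in a program is a table indexed by input signal l, ear k and block m, converted once into spectral bin indices. The sketch below is hypothetical: the helper `cutoff_bin`, the table layout, the 48 kHz sampling rate, the transform size of 1024 and every frequency value are assumptions chosen for illustration and do not come from the description.

```python
import math


def cutoff_bin(cutoff_hz, sample_rate_hz, fft_size):
    """Index of the last spectral bin kept for a cutoff in hertz.

    The result is clamped to the Nyquist bin (fft_size // 2), which matches
    the default case where the cutoff is half the sampling frequency.
    """
    nyquist_bin = fft_size // 2
    return min(nyquist_bin, math.floor(cutoff_hz * fft_size / sample_rate_hz))


# Hypothetical cutoffs in Hz, keyed by (input index l, ear k, block m).
# Ear "g" is left and "d" is right, as in the description.
cutoff_hz = {
    (1, "g", 0): 24000.0,  # ipsilateral ear, first block: full band
    (1, "d", 0): 12000.0,  # contralateral ear, first block: head shadowing
    (1, "g", 1): 6000.0,   # later block: lower cutoff for both ears
    (1, "d", 1): 6000.0,
}

# Convert the table once, outside the audio processing loop.
cutoff_bins = {key: cutoff_bin(hz, 48000, 1024) for key, hz in cutoff_hz.items()}
```

Precomputing the bin indices keeps the per-block processing loop free of any frequency-to-bin conversion.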

The spatialized signal for channel l is then available for this first time block. These operations are carried out for all L channels: l = 1,...,L. L spatialized channels are thus obtained. These channels are then summed at step S23 to obtain a single signal representing the L channels over the first time block.

In practice, the summation is carried out in a particular way, because it takes account of a delay on the channels to characterize the reverberation (reflections and diffuse field), as detailed below. Indeed, in one embodiment, the L spatialized channels are not independently accessible before summation: the single output signal is built by progressively adding each spatialized channel to the previous output signal. For this purpose, at step DBD, the input signals I(l) are delayed by a given delay z^-iDD.m specific to each block m=1, ..., M. Note that for the first block the delay is zero. In the case of a frequency representation, this delay generally corresponds to the size of a signal frame processed for the first block, and amounts to taking the previous input block in its frequency form.
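The idea of obtaining the delay by reusing previous input blocks in their frequency form can be sketched with a small history buffer. This is a simplified illustration under stated assumptions: the class name `BlockDelayLine` is hypothetical, and the sketch assumes that the delay associated with block m is exactly m blocks, which is only one possible reading of the delay z^-iDD.m.

```python
from collections import deque


class BlockDelayLine:
    """Keeps the spectra of the most recent input blocks.

    delayed(m) returns the spectrum pushed m blocks ago; delayed(0) is the
    current block. Slots not yet filled hold an all-zero spectrum.
    """

    def __init__(self, max_delay_blocks, num_bins):
        size = max_delay_blocks + 1
        self.history = deque(
            ([0j] * num_bins for _ in range(size)), maxlen=size
        )

    def push(self, spectrum):
        # The newest block goes to the front; the oldest one falls off the end.
        self.history.appendleft(list(spectrum))

    def delayed(self, m):
        return self.history[m]


line = BlockDelayLine(max_delay_blocks=2, num_bins=1)
line.push([1 + 0j])  # first input block
line.push([2 + 0j])  # second input block
```

After these two calls, `line.delayed(0)` is the second block, `line.delayed(1)` is the first block, and `line.delayed(2)` is still the initial all-zero spectrum.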

At step S24, an incomplete spatialization impulse response B^k_m(l) (typically of BRIR type) corresponding to the signal I(l) of channel l is transformed into the spectral domain to obtain a frequency filter. This impulse response B^k_m(l) is incomplete because it corresponds to a second time block of samples (then to a third block, and so on, for m = 1, ..., M). As indicated above, this impulse response may alternatively already be available in frequency form. By applying the principle described in document FR1357299, it is possible to reduce the complexity of the processing by setting B^k_m(1) = ... = B^k_m(l) = ... = B^k_m(L) = B^k_mean(m), so that this transfer function ultimately depends only on the block m considered (main diffuse field, or secondary diffuse field with "fade out" attenuation) and on the ear k. Similarly, the diffuse field does not depend on the channels, and the cutoff frequency f_c can be set to be identical for each channel (although it may decrease further from one block to the next, as seen previously with reference to Figure 3). This embodiment is shown in Figure 5.
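The saving obtained with a filter common to all channels can be illustrated by the following minimal sketch, which is not taken from the patent: because the multiplication is linear, the channels may be summed first and the common filter applied once, instead of once per channel. The function names are hypothetical.

```python
def diffuse_per_channel(inputs, b_mean):
    """Reference form: L multiplications per bin, one per channel."""
    out = [0j] * len(b_mean)
    for spectrum in inputs:
        for j, b in enumerate(b_mean):
            out[j] += spectrum[j] * b
    return out

def diffuse_downmix_first(inputs, b_mean):
    """Reduced form: sum the channels, then one multiplication per bin."""
    mix = [sum(spectrum[j] for spectrum in inputs) for j in range(len(b_mean))]
    return [mix[j] * b_mean[j] for j in range(len(b_mean))]

inputs = [[1 + 1j, 2], [3, 4 - 1j]]   # two channels, two frequency bins
b_mean = [2, 1j]                      # common diffuse-field filter for one block
reference = diffuse_per_channel(inputs, b_mean)
reduced = diffuse_downmix_first(inputs, b_mean)
```

Both forms give the same spectrum; the second replaces L complex multiplications per bin with L - 1 additions and a single multiplication.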

Referring again to Figure 2, this filter B^k_m(l) is then multiplied with the signal I(l) of channel l. The cutoff frequencies are different for this second time block. As presented with reference to Figure 3, measurements show that high frequencies are more attenuated in the later time blocks (corresponding to diffuse sound and multiple reverberations). The cutoff frequencies for these later blocks can therefore be lower than for the first blocks. The lower the cutoff frequency, the smaller the number of operations. The complexity of the calculations is thus advantageously reduced.
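To give an order of magnitude, the following sketch counts the frequency bins that actually take part in the multiplication for a given cutoff frequency. The sampling frequency of 48 kHz, the transform size of 1024 and the cutoff values are assumptions chosen for illustration, and `bins_below_cutoff` is a hypothetical helper.

```python
def bins_below_cutoff(cutoff_hz, sample_rate_hz, fft_size):
    """Count bins of a real-input transform whose frequency is strictly below the cutoff."""
    return sum(1 for j in range(fft_size // 2 + 1)
               if j * sample_rate_hz / fft_size < cutoff_hz)

# Assumed cutoffs: full band for the first block, then 12 kHz and 6 kHz.
cutoffs_hz = [24000, 12000, 6000]
counts = [bins_below_cutoff(fc, 48000, 1024) for fc in cutoffs_hz]
```

Under these assumptions, halving the cutoff frequency halves the number of complex multiplications for the block concerned.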

The same operations are carried out for the L channels, and the multiplications by the filter are repeated on the progressively delayed spectral signals, the contributions being summed at step S25, which is repeated for each delay m until a single signal is obtained representing the L channels over the set M of time blocks m considered. The single output signal is built by progressively adding each spatialized channel to the previous output signal, as will now be seen with reference to Figure 4.

Finally, the signal is converted back to the time domain at step S26 to obtain an output signal intended to feed one of the earpieces of the headset.

With reference to Figure 4, a spatialization method is now described for a given time block (for example the block representing the direct sound field, with values in the time interval [0; N-1]) and for a signal corresponding, for example, to the right ear. The same method is of course applied for the signal corresponding to the left ear. The distinction between the two ears is introduced by applying filters specific to each ear.

At step S40, the output signal S is initialized to 0. This output signal is expressed in the frequency domain. It has a limited size, with a length extending beyond the cutoff frequency fc(l). For example, this signal is defined on [0; fs(l)/2], fs(l) being the sampling frequency of the signal I(l). A first counting variable l is also initialized to 1. This first counting variable identifies one of the channel signals I(1), I(2), ..., I(l), ..., I(L) on the time block [0; N-1] for the right ear. At step S41, a second counting variable j is initialized to 0. This second counting variable identifies a frequency component of a signal I(l) on the time block [0; N-1] for the right ear.

At step S42, the coefficient c_BRIR(j; l) is stored in memory. This coefficient corresponds to the frequency component j of the filter BRIR(l) on the time block [0; N-1] for the right ear. Similarly, the coefficient c_I(j; l) is stored in memory. This coefficient corresponds to the frequency component j of the signal I(l) on the time block [0; N-1] for the right ear. The coefficients c_BRIR(j; l) and c_I(j; l) thus correspond to the same frequency component (identified by the variable j) and can subsequently be multiplied term by term (step S44).

At test T47, it is checked that the frequency corresponding to the variable j is lower (for example strictly lower) than the cutoff frequency fc(l). This cutoff frequency is the cutoff frequency of the signal I(l) for the time block [0; N-1] for the right ear. If the frequency j is lower than the cutoff frequency fc(l), the method proceeds to step S44.

At step S44, a value MULT(j) is calculated, corresponding to the multiplication of the coefficients c_BRIR(j; l) and c_I(j; l). These coefficients are indeed multiplied term by term because they correspond to the same frequency component j (for the same channel, on the same block and for the same ear).

At step S45, this value MULT(j) is added to the signal S at the position of frequency j.

A signal S is thus built step by step which, at the end of the loop of length fc(l), comprises all the frequency components up to the cutoff frequency fc(l) (for this signal I(l) on the block [0; N-1] and for the right ear). Since all the components are initialized to 0 at the start of the loop of Figure 4, at the end of the loop a buffer (initially zero) has been filled up to the cutoff frequency in order to build the signal S successively. Each product MULT(j) of coefficients is thus added step by step to the signal S under construction.

At step S46, the variable j is incremented and the method resumes at step S42. If the variable j is greater than (for example, or equal to) the cutoff frequency fc(l), the method proceeds to test T48. The signal S has thus been filled over the interval [0; fc(l)].

As specified above, this signal may be defined on an interval larger than [0; fc(l)] (for example [0; fs(l)/2]). Moreover, this signal was initialized to 0 over its entire interval of definition. It is therefore zero over the remainder of the interval, which has not been filled (for example [fc(l); fs(l)/2]). Complexity is thus improved here because the filling steps for that part of the signal S have not been carried out, which reduces the number of calculations required.

At test T48, it is checked that the counting variable l corresponding to the signal I(l) of channel l is less than (for example strictly less than) the number L of channels. If the variable l is less than or equal to L, the variable l is incremented at step S49 and the method resumes at step S41. If the variable l is greater than L, the signal S corresponding to the spatialized signal for the time block [0; N-1] for the right ear is available at step S50.
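The loop of Figure 4 (steps S40 to S50) may be sketched as follows for one time block and one ear. This is a minimal illustration under stated assumptions: the cutoff of each channel is expressed directly as a bin index (`cutoff_bins`, a hypothetical parameter), channels are indexed from 0 rather than 1, and the function name `spatialize_block` is not taken from the patent.

```python
def spatialize_block(inputs, filters, cutoff_bins):
    """Accumulate filter-times-input products, channel by channel, below each cutoff."""
    num_bins = len(inputs[0])
    s = [0j] * num_bins                      # S40: output signal initialized to 0
    for l in range(len(inputs)):             # T48/S49: loop over the L channels
        j = 0                                # S41: frequency index reset
        while j < cutoff_bins[l] and j < num_bins:   # T47: stay below the cutoff
            c_brir = filters[l][j]           # S42: filter coefficient for bin j
            c_i = inputs[l][j]               # S42: input coefficient for bin j
            s[j] += c_brir * c_i             # S44 and S45: multiply, then add to S
            j += 1                           # S46: next frequency component
    return s                                 # S50: spatialized block for this ear

inputs = [[1, 1, 1, 1], [2, 2, 2, 2]]        # two channels, four bins
filters = [[1, 1, 1, 1], [1, 1, 1, 1]]       # unit filters, for readability
block = spatialize_block(inputs, filters, cutoff_bins=[2, 3])
```

Bins at or above a channel's cutoff are never visited for that channel, so the last bin of `block` keeps its initial value of zero without any operation being spent on it.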

This signal S corresponding to the time block [0; N-1] is then summed with the other signals generated in a similar way for the other time blocks [N; 2N-1], [2N; 3N-1], etc. (and to which an appropriate delay has been applied, for example in accordance with step DBD of Figure 2 described above).

Typically, to construct the block [N; 2N-1], a filter is applied in the frequency domain corresponding to a transfer function common to all the input signals I(l) and representing the diffuse field, with a cutoff frequency fc in the multiplication of the spectral components that corresponds to the minimum between:

a maximum diffuse-field frequency Fc(diffuse) as illustrated in Figure 3 described above (chosen for example between 10 and 15 kHz for block m=1 and between 5 and 10 kHz for block m=2), and
the maximum frequency fmax represented in each input signal (for example its sampling frequency, or the maximum frequency whose spectral component is not zero, this value usually being given by a compression decoder).
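The choice of cutoff described above reduces to taking a minimum, as in the following sketch. The numerical values are assumptions picked inside the ranges quoted in the text, and `block_cutoff_hz` is a hypothetical helper.

```python
def block_cutoff_hz(diffuse_limit_hz, input_max_hz):
    """Cutoff used in the multiplication: the lower of the two limits."""
    return min(diffuse_limit_hz, input_max_hz)

# Assumed diffuse-field limits per block, and an input band-limited to 8 kHz
# (a value that would typically be reported by the decoder).
diffuse_limits_hz = {1: 12000, 2: 6000}
cutoffs = {m: block_cutoff_hz(limit, 8000) for m, limit in diffuse_limits_hz.items()}
```

With these values the input bandwidth sets the cutoff for block m=1, while the diffuse-field limit sets it for block m=2.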

It should be noted that stopping the frequency multiplication at a given cutoff frequency (which is mathematically equivalent to multiplying by 0 beyond it) is not trivial for a person skilled in the art. Indeed, when filtering an audio signal, such a very abrupt low-pass filter generally introduces audible artifacts (known as "aliasing"), caused by echo or pre-echo phenomena arising from the time-domain aliasing generated by circular convolution, which it is generally desirable to avoid. In the context of the invention, however, this low-pass filter is not applied to the audio signal but to the BRIR filter (which is itself convolved with the audio signal), and that filter already consists of multiple reflections; the artifacts produced will therefore, at worst, be perceived as additional reflections of the original BRIR filter, and in practice are rarely perceptible. It is nevertheless possible to attenuate these artifacts by slightly modifying the filter at the frequencies just below the cutoff frequency (for example a gentle attenuation obtained by applying a half Hanning window, of the fade-out type).
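One possible form of this gentle attenuation is sketched below: the filter bins just below the cutoff are weighted by the falling half of a Hann window, and the bins from the cutoff upwards are left at zero. The fade length and the function name `truncate_with_fade` are assumptions made for illustration; the patent does not specify them.

```python
import math

def truncate_with_fade(filter_bins, cutoff_bin, fade_len):
    """Zero the filter from cutoff_bin upwards, fading out over the fade_len bins before it."""
    out = [0j] * len(filter_bins)
    fade_start = max(cutoff_bin - fade_len, 0)
    for j in range(min(cutoff_bin, len(filter_bins))):
        if j < fade_start:
            gain = 1.0                       # untouched part of the filter
        else:
            # Falling half of a Hann window: 1 at fade_start, tending to 0 at the cutoff.
            phase = (j - fade_start) / (cutoff_bin - fade_start)
            gain = 0.5 * (1.0 + math.cos(math.pi * phase))
        out[j] = filter_bins[j] * gain
    return out

faded = truncate_with_fade([1] * 8, cutoff_bin=6, fade_len=4)
```

Since the window is applied to the filter once, at design time, it adds no cost to the per-frame multiplication loop.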

In general, with reference to Figure 4, it will be noted that two operations are carried out in the same loop iteration (typically one clock cycle): the multiplication MULT(j) and its addition to the output signal S. This makes it possible in particular to implement the method on processors able to perform several operations in the same loop iteration (typically one clock cycle), and thus to reduce the time required for the calculations.

Figure 5 illustrates a complete algorithmic form of the processing, in accordance with the formula giving an output signal O^k presented above:

$O^{k} = \sum_{l = 1}^{L} \left( I(l) *_{[0; \dots; f^{k}(l)]} A^{k}(l) \right) + \sum_{m = 1}^{M} \left( z^{-iDDm} \cdot \sum_{l = 1}^{L} \left( G(I(l)) \cdot \frac{1}{W^{k}(l)} \cdot I(l) \right) \right) *_{[0; \dots; f^{k}(m)]} B_{mean}^{k}(m)$

As indicated above, the weightings W^k(l) and the gains G(I(l)) can be set to 1. The gains G(I(l)) are not shown in Figure 5 because this figure should be read with the gains integrated into the weights 1/W^k(l). Moreover, when the filters are designed, these two parameters are determined, fixed and multiplied together once and for all.
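For one output frame and one ear k, the formula may be sketched as below. This is an illustrative reading under several assumptions, none of which is stated by the patent: the delay z^-iDDm is taken to be m whole frames, the cutoffs are given as bin indices, and `weights[l]` holds the product G(I(l))/W^k(l) fixed at design time. All names are hypothetical.

```python
def render_frame(history, direct_filters, direct_cutoffs,
                 weights, diffuse_filters, diffuse_cutoffs):
    """history[d][l] is the spectrum of channel l received d frames ago (d = 0: current)."""
    num_bins = len(direct_filters[0])
    out = [0j] * num_bins
    # First term: per-channel filter A^k(l), limited to the cutoff of channel l.
    for l, a in enumerate(direct_filters):
        for j in range(min(direct_cutoffs[l], num_bins)):
            out[j] += history[0][l][j] * a[j]
    # Second term: weighted downmix delayed by m frames, then common filter B_mean^k(m).
    for m, b_mean in enumerate(diffuse_filters, start=1):
        frame = history[m]
        for j in range(min(diffuse_cutoffs[m - 1], num_bins)):
            mix = sum(weights[l] * frame[l][j] for l in range(len(frame)))
            out[j] += mix * b_mean[j]
    return out

history = [
    [[1, 1], [2, 2]],        # current frame: two channels, two bins
    [[10, 10], [20, 20]],    # previous frame
]
frame_out = render_frame(
    history,
    direct_filters=[[1, 1], [1, 1]], direct_cutoffs=[2, 1],
    weights=[1, 0.5],
    diffuse_filters=[[2, 2]], diffuse_cutoffs=[1],
)
```

In this example the second bin receives only the direct contribution of the first channel, since every other term has a cutoff below it.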

Claims

Method for achieving audio spatialization, comprising the application of at least one room-effect transfer function to at least one audio signal, said application amounting to multiplying, in the spectral domain, spectral components of the audio signal by spectral components of a filter corresponding to said transfer function, each spectral component of the filter comprising a variation as a function of time in a time-frequency representation,
in which method said spectral components of the filter are ignored, in said multiplications of components, beyond a threshold frequency and after at least one given time in said time-frequency representation and in which method, in an implementation by a module for achieving audio spatialization that receives a plurality of input signals and that delivers at least two output signals, to deliver each output signal, a room-effect transfer function is applied to each input signal, the method being characterized in that each of said output signals is given by application of the following formula: $O^{k} = \sum_{l = 1}^{L} \left( I(l) *_{[0; \dots; f^{k}(l)]} A^{k}(l) \right) + \sum_{m = 1}^{M} \left( z^{-iDDm} \cdot \sum_{l = 1}^{L} \left( G(I(l)) \cdot \frac{1}{W^{k}(l)} \cdot I(l) \right) \right) *_{[0; \dots; f^{k}(m)]} B_{mean}^{k}(m)$

- O^k being an output signal, and k being the index relative to an output signal,

- l ∈ [1;L], being the index relative to one input signal among said input signals, L being the number of input signals, and I(l) being one input signal among said input signals,

- A^k (l) being a transfer function with room effect specific to one input signal,

- $B_{mean}^{k}(m)$ being an overall transfer function, with room effect, common to the input signals,

- W^k (l) being a selected weighting weight, and G(I(l)) being a predetermined energy compensation gain,

- z^-iDDm being an application of a delay, counted in number of blocks of samples, corresponding to a time difference between an audio emission in a room corresponding to the room effect and an onset of presence of a diffuse field in this room, the index m corresponding to a number of blocks of samples of duration corresponding to this delay, M being the total number of blocks that a transfer function lasts in a time-frequency representation,

- the sign "·" designating multiplication,

- the sign "$*_{[0; \dots; f^{k}(l)]}$" designating convolution over a limited number of frequencies ranging from a lowest frequency to a maximum frequency f^k(l) that is dependent at least on the input signal of index l, and

- the sign "$*_{[0; \dots; f^{k}(m)]}$" designating convolution over a limited number of frequencies ranging from a lowest frequency to a frequency f^k(m) that is dependent on the block of samples of index m.
Method according to Claim 1, wherein the threshold frequency decreases as a function of time in said time-frequency representation.
Method according to one of the preceding claims, wherein information is obtained on the spectral component of highest frequency in the audio signal, and wherein said threshold frequency is a minimum among a predetermined threshold frequency and said highest frequency.
Method according to Claim 3, wherein the audio signal is generated by a compression decoder and the information on the spectral component of highest frequency is delivered by the decoder.
Method according to one of Claims 3 and 4, wherein the audio signal is sampled at a given sampling frequency, said threshold frequency being chosen depending on said sampling frequency.
Method according to one of the preceding claims, wherein the audio signal is spatialized via at least a first virtual loudspeaker and a second virtual loudspeaker that are associated with a first channel and a second channel, respectively, and first and second transfer functions with room effect are applied to said first and second channels, respectively,
one of the first and second transfer functions applying an ipsi-lateral acoustic pathway effect, and the other of the first and second transfer functions applying a contra-lateral acoustic pathway effect with removal of spectral components above a given screening frequency from the audio signal,

and wherein said threshold frequency for the transfer function applying a contra-lateral pathway effect is a minimum among a predetermined threshold frequency and said screening frequency.
Method according to Claim 1, wherein the signals comprise successive blocks of samples, of same sizes between signals, and wherein said at least one given time is located temporally at the start of a block other than a first block in a succession of blocks.
Computer program, containing instructions for implementing the method according to one of the preceding claims, when they are executed by a processor.