EP1992198B1

EP1992198B1 - Optimization of binaural sound spatialization based on multichannel encoding

Info

Publication number: EP1992198B1
Application number: EP07731684.2A
Authority: EP
Inventors: Julien Faure; Jérôme DANIEL; Marc Emerit
Original assignee: Orange SA
Current assignee: Orange SA
Priority date: 2006-03-09
Filing date: 2007-03-01
Publication date: 2016-07-20
Anticipated expiration: 2027-03-01
Also published as: WO2007101958A2; US20090067636A1; EP1992198A2; WO2007101958A3; US9215544B2

Description

The present invention relates to the processing of sound signals for their spatialization.

Spatialized sound reproduction allows a listener to perceive sound sources as coming from any direction or position in space.

The particular spatialized sound reproduction techniques to which the present invention relates are based on the acoustic transfer functions of the head between positions in space and the ear canal. These transfer functions, called "HRTF" (for "Head Related Transfer Functions"), are the frequency-domain form of the transfer functions. Their time-domain form is referred to hereafter as "HRIR" (for "Head Related Impulse Response").

Furthermore, the term "binaural" refers to reproduction over stereophonic headphones that nevertheless provides spatialization effects. The present invention is not limited to this technique and also applies, in particular, to techniques derived from binaural, such as so-called "transaural" reproduction, that is, reproduction over remote loudspeakers. Such techniques may then use what is called "crosstalk cancellation", which consists in cancelling the acoustic cross paths so that a sound, thus processed and then emitted by the loudspeakers, is perceived by only one of the listener's two ears.

In processing for spatialized sound reproduction, the term "multichannel" refers to producing a representation of the sound field in the form of N signals (called spatial components). These signals contain all the sounds that make up the sound field, but with weightings that depend on their direction (or "incidence") and that are described by N associated spatial encoding functions. The sound field is then reconstructed, for reproduction at a chosen point, by N' spatial decoding functions (most often with N = N').

In the particular case of binaural, this decomposition makes it possible to perform so-called "multichannel binaural" encoding and decoding. The decoding functions (which are in reality filters), associated with a given set of spatial encoding functions (which are in reality encoding gains), give the listener, when they are optimal for reproduction, a feeling of perfect immersion within a sound scene, even though binaural reproduction actually relies on only two loudspeakers (the earpieces of a headset or two remote loudspeakers).

The advantages of a multichannel approach for binaural techniques are several, since the encoding step is independent of the decoding step.
Thus, when a virtual sound scene is composed from synthesized or recorded signals, encoding is generally inexpensive in memory and/or computation, since the spatial functions are gains that depend only on the incidences of the sources to be encoded and not on the number of sources themselves. Decoding likewise has a cost that is independent of the number of sources to be spatialized.
Furthermore, in the case of a real sound field measured by a microphone array and encoded according to known spatial functions, it is now possible to find decoding functions that allow satisfactory binaural listening. Finally, the decoding functions can be individualized for each listener.
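
The separation between encoding and decoding described above can be sketched as follows. This is a minimal illustration, assuming NumPy and hypothetical function names (`encode`, `decode`); the gains and filters are arbitrary placeholders, not values from the patent. It shows that each source only contributes N multiplications at encoding, while decoding always applies N filters per ear whatever the number of sources.

```python
import numpy as np

def encode(sources, gains):
    """Mix S mono sources into N spatial channels.

    sources: (S, T) array of source signals.
    gains:   (S, N) array; gains[s, n] is the encoding gain of source s on
             channel n and depends only on the direction of that source.
    Returns an (N, T) array of encoded channels.
    """
    return gains.T @ sources

def decode(channels, filters_left, filters_right):
    """Filter each of the N channels for each ear, then sum.

    The number of convolutions is 2 * N, independent of the number of sources.
    """
    left = sum(np.convolve(c, f) for c, f in zip(channels, filters_left))
    right = sum(np.convolve(c, f) for c, f in zip(channels, filters_right))
    return left, right
```

Because both stages are linear, the ear signal obtained this way equals the sum, over all sources, of each source filtered by the weighted combination of decoding filters for its direction.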

The present invention relates in particular to an improved way of obtaining the decoding filters and/or the encoding gains in the multichannel binaural technique. The context is as follows: sources are spatialized by multichannel encoding, and the spatially encoded content is reproduced by applying appropriate decoding filters.

Document WO-00/19415 discloses a multichannel binaural processing that provides for the computation of decoding filters. Denoting by:

- g_i(θ_p, φ_p) fixed spatial encoding functions, where g is the gain corresponding to channel i ∈ {1, ..., N} and to position p ∈ {1, ..., P} defined by its angles of incidence θ (azimuth) and φ (elevation), and
- L(θ_p, φ_p, f) and R(θ_p, φ_p, f) the HRTF bases obtained by measuring the acoustic transfer functions of each ear (L and R) of an individual for a number P of positions in space (p ∈ {1, ..., P}) and for a given frequency f,

document WO-00/19415 essentially provides two steps for obtaining filters from these spatial functions.

In the first step, the delays of each HRTF are extracted. The shape of a head is usually such that, for a given position, a sound reaches one ear a certain time before reaching the other (a sound located on the left naturally reaching the left ear before the right ear). The delay difference t between the two ears is an interaural localization cue called ITD (for "Interaural Time Difference"). New HRTF bases, denoted $\underline{L}$ and $\underline{R}$, are then defined by:

$L(\theta_{p}, \varphi_{p}, f) = T_{L}(\theta_{p}, \varphi_{p}) \, \underline{L}(\theta_{p}, \varphi_{p}, f)$ for p = 1, 2, ..., P

$R(\theta_{p}, \varphi_{p}, f) = T_{R}(\theta_{p}, \varphi_{p}) \, \underline{R}(\theta_{p}, \varphi_{p}, f)$ for p = 1, 2, ..., P

where $T_{L,R} = e^{j 2 \pi f t_{L,R}}$, with a delay $t_{L,R}$.
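
As a rough illustration of what extracting a delay means, the sketch below works in the time domain on a single HRIR. The onset criterion (first sample reaching 10% of the peak magnitude) and the function names are assumptions made for illustration; the extraction method of the cited document is not described in this text.

```python
import numpy as np

def extract_delay(hrir, threshold=0.1):
    """Return (delay_in_samples, delay_free_hrir).

    The onset is taken as the first sample whose magnitude reaches
    `threshold` times the peak magnitude (an illustrative criterion only).
    Samples before the onset are discarded and the tail is zero-padded.
    """
    hrir = np.asarray(hrir, dtype=float)
    peak = np.max(np.abs(hrir))
    onset = int(np.argmax(np.abs(hrir) >= threshold * peak))
    aligned = np.concatenate([hrir[onset:], np.zeros(onset)])
    return onset, aligned

def reinsert_delay(aligned, delay):
    """Shift a delay-free response back by `delay` samples (same length)."""
    aligned = np.asarray(aligned, dtype=float)
    return np.concatenate([np.zeros(delay), aligned[:len(aligned) - delay]])
```

With such a scheme, the ITD for a given position would be the difference between the delays extracted for the left and right ears, and the stored delays have to be reapplied per ear at encoding time.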

In the second step, decoding filters L_i(f) and R_i(f) are obtained for channel i, satisfying the equations:

$\underline{L}(\theta_{p}, \varphi_{p}, f) = \sum_{i=1}^{N} g_{i}(\theta_{p}, \varphi_{p}) \, L_{i}(f)$ for p = 1, 2, ..., P,

and

$\underline{R}(\theta_{p}, \varphi_{p}, f) = \sum_{i=1}^{N} g_{i}(\theta_{p}, \varphi_{p}) \, R_{i}(f)$ for p = 1, 2, ..., P,

which can also be written, in matrix notation, $\underline{L} = GL$ and $\underline{R} = GR$, where G denotes a matrix of gains.

To obtain these filters, this document proposes a so-called "pseudo-inverse computation" method, which aims to satisfy the preceding equations in the least-squares sense, namely:

$\underline{L} = GL \;\to\; L = (G^{T} G)^{-1} G^{T} \underline{L}$
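
The least-squares solution above can be sketched numerically as follows. This is an illustration only, assuming NumPy; the function name and array layout (positions along rows, frequency bins along columns) are hypothetical choices. `numpy.linalg.lstsq` is used rather than forming the inverse of the normal matrix explicitly, which gives the same result when G has full column rank.

```python
import numpy as np

def pseudo_inverse_filters(G, H_delay_free):
    """Solve H_delay_free ≈ G @ L in the least-squares sense.

    G:            (P, N) gain matrix, G[p, i] = g_i(theta_p, phi_p).
    H_delay_free: (P, F) HRTFs with delays removed, one column per
                  frequency bin (real or complex).
    Returns L:    (N, F) decoding filters, one row per channel.
    """
    L, *_ = np.linalg.lstsq(G, H_delay_free, rcond=None)
    return L
```

The same call would be made once for the left-ear base and once for the right-ear base.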

Implementing such a technique therefore requires reintroducing a delay corresponding to the ITD when each sound source is encoded. Each source is therefore encoded twice (once for each ear). Document WO-00/19415 states that it is possible not to extract the delays, but that the sound rendering quality would then be lower. In particular, the quality is better, even with fewer channels, when the delays are extracted.

Furthermore, a second approach, proposed in document US-5,500,900, for jointly computing the decoding filters and the spatial encoding functions consists in decomposing the sets of HRIRs by performing a principal component analysis (PCA) and then selecting a reduced number of components (which corresponds to the number of channels).
An equivalent approach, proposed in US-5,596,644, uses a singular value decomposition (SVD) instead. If the delays are extracted from the HRIRs before the decomposition and then used at encoding time, the reconstruction of the HRIRs is very good with a reduced number of components.
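
A truncated SVD of this kind can be sketched as follows. This is a generic illustration assuming NumPy and an HRIR matrix with one row per position; it is not the specific procedure of the cited documents, and the function name is hypothetical.

```python
import numpy as np

def svd_decompose(H, n_channels):
    """Split an HRIR set into spatial functions and filters by truncated SVD.

    H: (P, T) matrix, one HRIR of T samples per position.
    Returns (gains, filters) with shapes (P, n_channels) and (n_channels, T),
    such that gains @ filters is the best rank-n_channels approximation of H
    in the least-squares sense.
    """
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    gains = U[:, :n_channels] * s[:n_channels]   # spatial encoding functions
    filters = Vt[:n_channels]                    # decoding filters
    return gains, filters
```

Since the singular vectors depend on the HRIR set being decomposed, the resulting spatial functions differ from one individual to another.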

When the delays are left in the original filters, the number of channels must be increased in order to obtain a good-quality reconstruction.
Moreover, these prior-art techniques do not provide universal spatial encoding functions, since the decomposition yields different spatial functions for each individual.

It should also be noted that multichannel binaural can be viewed as the binaural simulation of a multichannel rendering over a plurality of loudspeakers (more than two). This is referred to as the "virtual loudspeakers" method, since the binaural reproduction, in this approach, still takes place only over the two earpieces of a headset or over two remote loudspeakers. The principle of such a reproduction consists in considering a configuration of loudspeakers distributed around the listener. When rendering over two real loudspeakers, intensity panning laws (or "pan pot") are used to give the listener the sensation that sources are actually positioned in space, using only two loudspeakers. These are referred to as "phantom sources". Similar rules are used to define the positions of virtual loudspeakers, which amounts to defining spatial encoding functions. The decoding filters correspond directly to the HRIR functions computed at the positions of the virtual loudspeakers.

For effective spatial rendering with a small number of channels, the prior-art techniques require the delays to be extracted from the HRIRs. Techniques for sound pickup or multichannel encoding at a point in space are widely used, since the encoded signals can then be transformed (for example, rotated). However, when the signal to be decoded is a multichannel signal measured (or encoded) at a point, the delay information cannot be extracted from the signal alone. The decoding filters must then be able to reproduce the delays for optimal sound rendering. In addition, in the case of recordings, the number of channels may be small, and the prior-art techniques do not allow good decoding with few channels without extracting the delays. For example, when acquiring with ambisonic microphones, the acquired multichannel signal may typically consist of only four channels. "Ambisonic microphones" is understood to mean microphones composed of coincident directional sensors. The interaural delays must then be reproduced at decoding.

More generally, extracting the delays has at least two other major drawbacks:

- the delays must be taken into account at encoding time (an additional step), which increases the computing resources required,
- since the delays are taken into account at encoding time, the signals must be encoded for each ear and the number of filtering operations needed for decoding is doubled.

The present invention improves on this situation.

To this end, it proposes a sound spatialization method with multichannel encoding, for binaural reproduction over two loudspeakers, comprising a spatial encoding defined by encoding functions associated with a plurality of encoding channels, and a decoding by application of filters for reproduction in a binaural context over the two loudspeakers.

The method according to the invention comprises the steps of:

a) obtaining an original set of acoustic transfer functions specific to an individual's morphology (HRIR; HRTF),
b) choosing spatial encoding functions and/or decoding filters, and
c) by successive iterations, optimizing the filters associated with the chosen encoding functions, or the encoding functions associated with the chosen filters, or jointly the chosen filters and encoding functions, by minimizing an error calculated from a comparison between:
- the original set of transfer functions, and
- a set of transfer functions reconstructed from the encoding functions and the decoding filters, optimized and/or chosen.
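
Step c) can be sketched, in its simplest form, as an iterative refinement of the filters for fixed encoding gains. The error function used here (sum of squared differences in the time domain) and the fixed-step gradient descent are assumptions chosen for simplicity; they are not the error criterion or the optimizer of the invention, and all names are hypothetical.

```python
import numpy as np

def optimize_filters(H, G, F0, n_iter=500):
    """Iteratively refine decoding filters F for fixed encoding gains G.

    H:  (P, T) original HRIRs, one row per position.
    G:  (P, N) chosen encoding gains.
    F0: (N, T) starting filters (for example from a pseudo-inverse or from
        virtual loudspeakers).
    Returns (F, errors), where errors[k] is the squared error measured
    before update k.
    """
    F = np.array(F0, dtype=float)
    # Step size derived from the largest singular value of G, so that the
    # squared error cannot increase from one iteration to the next.
    step = 1.0 / np.linalg.norm(G, 2) ** 2
    errors = []
    for _ in range(n_iter):
        residual = H - G @ F                  # original minus reconstructed set
        errors.append(float(np.sum(residual ** 2)))
        F += step * (G.T @ residual)          # move against the error gradient
    return F, errors
```

With this plain squared error, the iteration simply converges towards the least-squares solution; the interest of an iterative scheme lies in the freedom to substitute another error function or to update the gains as well.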

The expression "acoustic transfer functions specific to an individual's morphology" may refer to the HRIR functions expressed in the time domain. However, the first step a) may equally consider the HRTF functions expressed in the frequency domain, which in practice usually correspond to the Fourier transforms of the HRIR functions.
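
The relationship between the two forms can be illustrated with a discrete Fourier transform. This is a minimal sketch assuming NumPy, real-valued HRIRs stored along the last axis, and a sampling rate `fs` supplied by the caller; the function names are hypothetical.

```python
import numpy as np

def hrir_to_hrtf(hrir, fs):
    """Return (frequencies_in_Hz, HRTF) for real HRIRs along the last axis."""
    hrir = np.asarray(hrir, dtype=float)
    hrtf = np.fft.rfft(hrir, axis=-1)
    freqs = np.fft.rfftfreq(hrir.shape[-1], d=1.0 / fs)
    return freqs, hrtf

def hrtf_to_hrir(hrtf, n_samples):
    """Inverse transform back to n_samples time-domain samples."""
    return np.fft.irfft(hrtf, n=n_samples, axis=-1)
```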

Thus, in general terms, the invention proposes computing, by optimization, the filters associated with a set of chosen encoding gains, or the encoding gains associated with a set of chosen decoding filters, or jointly optimizing the decoding filters and the encoding gains. These filters and/or gains have, for example, been fixed or initially computed by the pseudo-inverse or virtual loudspeaker techniques described in particular in document WO-00/19415. These filters and/or the associated gains are then improved, according to the invention, by an iterative optimization that aims to reduce a predetermined error function.

The invention thus proposes determining decoding filters and encoding gains that allow both a good reconstruction of the delay and a good reconstruction of the amplitude of the HRTFs (modulus of the HRTFs), with a small number of channels, as will be seen in the detailed description below.

Other features and advantages of the invention will become apparent from the detailed description below and from the appended drawings, in which:

- figure 1 illustrates the general steps of a method according to the invention,
- figure 2 illustrates the amplitude (gray levels) of the time-domain HRIR functions (over several successive samples Ech) chosen for the implementation of step E0 of figure 1, as a function of azimuth (in degrees, denoted deg°),
- figure 3 illustrates the shape of the first few spherical harmonics in an ambisonic context, used as spatial encoding functions in a first embodiment,
- figures 4A, 4B and 4C compare the performance of the processing according to the first embodiment, for a non-optimized solution (figure 4A), for a solution partially optimized by a few processing iterations (figure 4B) and for a solution fully optimized by the processing according to the invention (figure 4C),
- figure 5 illustrates the encoding functions in the virtual loudspeaker technique used in a second embodiment,
- figure 6 compares an actual mean HRTF function (solid line) with the mean HRTF functions reconstructed using the prior-art pseudo-inverse solution (dotted line), the starting solution given by the virtual loudspeaker method (long-dashed line) and the converged optimized solution according to the second embodiment of the invention (dash-dotted line),
- figure 7 compares the variations of the original interaural delay ITD (solid line) with the one obtained by the optimized solution according to the second embodiment of the invention (dash-dotted line), the one reconstructed from the virtual loudspeaker technique (long-dashed line) and the one reconstructed from the filters obtained by the prior-art pseudo-inverse solution (dotted line),
- figure 8 schematically represents a spatialization system obtainable by implementing the first embodiment, taking the interaural delays into account at encoding,
- figure 9 schematically represents a spatialization system obtainable by implementing the second embodiment, without taking the interaural delays into account at encoding but including these delays in the decoding filters.

In an exemplary embodiment, the method according to the invention can be broken down into three steps:

a) obtaining a set of HRIRs (left ear and/or right ear) at P positions around the listener, denoted hereafter H(θ_p, φ_p, t),
b) fixing spatial encoding functions and/or base filters, the encoding functions being denoted g(θ, φ, n) (or g(θ, φ, n, f)), where:
- θ, φ are the angles of incidence in azimuth and elevation,
- n is the index of the encoding channel considered,
- and f is the frequency,
c) and finding the filters associated with the fixed spatial functions, or the spatial functions associated with the fixed filters, or a combination of associated filters and spatial functions, by an optimization technique that will be described in detail below.

It is simply noted here that, for the implementation of the aforementioned first step a), the HRTFs of the second ear can be deduced from the measurement of the first ear by symmetry. The set of HRIR functions can, for example, be measured on a subject by placing microphones at the entrance of the subject's ear canal. As a variant, this set of HRIRs can also be computed by numerical simulation methods (modeling of the subject's morphology or computation by an artificial neural network), or may have undergone a chosen processing (reduction of the number of samples, phase correction, or other).
In this step a), it is possible to extract the delays from the HRIRs, to store them and then to add them back at the time of spatial encoding, steps b) and c) remaining unchanged. This embodiment will be described in detail with reference in particular to figure 8.

This first step a) bears the reference E0 in Figure 1.

For the implementation of step b), if optimized filters are sought, the spatial encoding functions g(θ, φ, n) (or g(θ, φ, n, f)) must be fixed; conversely, to obtain optimized spatial functions, the decoding filters, denoted F(t, n), must be fixed.
Nevertheless, provision can be made to optimize the filters and the spatial functions jointly, as indicated above.

The choice between optimizing the spatial functions and optimizing the decoding filters can depend on the application context.
If the spatial encoding functions are fixed, they are then reproducible and universal, and the individualization of the filters is carried out simply at decoding.
Moreover, spatial encoding functions that comprise a large number of zeros among the n encoding channels, as in the second embodiment described below, make it possible to limit the number of operations during encoding. Intensity panning laws ("pan pot") between virtual loudspeakers in two dimensions, and their extensions to three dimensions, can be represented by encoding functions comprising, for a single given source, at most two non-zero gains in two dimensions and three non-zero gains in three dimensions. The number of non-zero gains is, of course, independent of the number of channels and, above all, the zero gains lighten the encoding calculations.
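By way of illustration only, the saving brought by zero gains can be sketched as follows in Python. The sparse representation of the gains as (channel index, gain) pairs, the function name `encode_sparse` and the numerical values are assumptions made for this sketch and do not come from the description.

```python
def encode_sparse(sample, nonzero_gains, n_channels):
    """Encode one source sample onto n_channels using only its non-zero gains.

    nonzero_gains is a list of (channel_index, gain) pairs; this sparse
    representation is an assumption made for illustration.
    """
    channels = [0.0] * n_channels
    multiplications = 0
    for index, gain in nonzero_gains:
        channels[index] += gain * sample
        multiplications += 1
    return channels, multiplications


# A 2D pan-pot source lying between virtual loudspeakers 3 and 4 out of 10:
# only two multiplications are needed, whatever the number of channels.
channels, cost = encode_sparse(0.5, [(3, 0.6), (4, 0.8)], n_channels=10)
```

The multiplication count depends only on the number of non-zero gains, which is the property the paragraph above relies on.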

As for the encoding functions themselves, several choices are again available.
Spatial functions of the spherical harmonic type, in an ambisonic context, have mathematical properties that make it possible to apply transformations to the encoded signals (for example rotations of the sound field). In addition, such functions ensure compatibility between binaural decoding and ambisonic recordings based on a decomposition of the sound field into spherical harmonics.
The encoding functions can be real or simulated microphone directivity functions, in order to allow recordings to be listened to in multichannel binaural. The encoding functions can be arbitrary (non-universal) and determined by any method, the rendering then having to be optimized in subsequent steps of the method within the meaning of the invention.
The spatial functions can equally well be functions of time or of frequency. The optimization is then carried out taking this dependence into account (for example by optimizing each time or frequency sample independently).

As regards the decoding filters, they can be fixed so that the decoding can be universal.
The decoding filters can also be chosen so as to reduce the resource cost of filtering. For example, the use of so-called "infinite impulse response" or "IIR" filters is advantageous.
The decoding filters can also be chosen according to a psychoacoustic criterion, for example one constructed from normalized Bark bands.
More generally, the decoding filters can be determined by any method. The rendering, in particular for an individual listener, can then be optimized in subsequent steps of the method relating to the encoding functions.

This second step b), relating to the calculation of an initial solution S0, bears the reference E1 in Figure 1. In short, it consists in choosing the decoding filters (referenced "F") and/or the spatial encoding functions (referenced "g") and in determining an initial solution S0 for the encoding functions or the decoding filters, by a method that is also chosen.

For example, where the fixed spatial functions are functions defining intensity panning laws ("pan pot") between virtual loudspeakers, the filters of the starting solution S0 at step E1 can directly be the HRIR functions given at the corresponding positions of the virtual loudspeakers.

In this example, provision can also be made to jointly optimize the decoding filters and the encoding gains, the starting solution S0 again being determined by functions defining the intensity panning laws ("pan pot"), as encoding functions, and by the HRIR functions themselves, given at the positions of the virtual loudspeakers, as decoding filters.

In another example, where the spatial encoding functions are fixed as spherical harmonics, the decoding filters are calculated at step E1 from the pseudo-inverse, in order to determine the starting solution S0.

More generally, the starting solution S0 at step E1 can be calculated from the least-squares solution:

$F = HRIR \, g^{-1}$

It should be specified here that the elements F, HRIR and g are matrices. In addition, the notation $g^{-1}$ denotes the pseudo-inverse of the gain matrix g according to the expression $g^{-1} = \mathrm{pinv}(g) = g^{T} (g \, g^{T})^{-1}$, the notation $g^{T}$ denoting the transpose of the matrix g.
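Purely as an illustrative sketch, the least-squares starting solution can be computed as follows with NumPy. The matrix layout (HRIR with one column per position, g with one row per encoding channel), the random placeholder data and the choice of five 2D ambisonic gains are assumptions made for the example; with this layout the reconstruction is written F g.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes borrowed from the first embodiment: 32 taps, 64 positions, 5 channels.
n_taps, n_positions, n_channels = 32, 64, 5

# Placeholder data standing in for a measured HRIR set (not real measurements).
hrir = rng.standard_normal((n_taps, n_positions))

# Encoding gains g: one row per channel, one column per position (assumed layout).
azimuths = np.linspace(0.0, 2.0 * np.pi, n_positions, endpoint=False)
g = np.vstack([
    np.ones(n_positions),
    np.cos(azimuths), np.sin(azimuths),
    np.cos(2.0 * azimuths), np.sin(2.0 * azimuths),
])

# Pseudo-inverse as written in the text: g^T (g g^T)^-1.
g_pinv = g.T @ np.linalg.inv(g @ g.T)

# Starting solution S0: F = HRIR g^-1, one column of filter taps per channel.
F = hrir @ g_pinv

# Reconstructed set for this layout.
hrir_rec = F @ g
```

The explicit expression agrees with `np.linalg.pinv(g)` here because g has full row rank; for an ill-conditioned g, the library routine would be the safer choice.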

More generally still, the starting solution S0 can be arbitrary (random or fixed), the essential point being that it leads to a converged solution SC at step E6 of Figure 1.

Figure 1 also illustrates the operations E2, E3, T4, E5, E6 of the general step c) of optimization within the meaning of the invention. Here, this optimization is carried out by iterations. By way of non-limiting example, the so-called "gradient" optimization method (searching, by finite differences, for zeros of the first derivative of an error function of several variables) can be applied. Of course, variant methods that make it possible to optimize functions according to an established criterion can also be considered.

At step E2, the reconstruction of the set of HRIR functions then gives a reconstructed set HRIR* = gF which, at the first iteration, differs from the original set.

At step E3, the calculation of an error function is an important point of the optimization method within the meaning of the invention. One proposed approach is simply to minimize the difference in modulus between the Fourier transform HRTF* of the reconstructed set of HRIR functions and the Fourier transform HRTF of the original set of HRIR functions (given at step E0). The corresponding error function, denoted c, is written:

$c = \sum_{p} \sum_{f} \left| \, |\mathcal{F}(HRIR)| - |\mathcal{F}(HRIR^{*})| \, \right|^{2}$, that is, $c = \sum_{p} \sum_{f} \left| \, |HRTF(p, f)| - |HRTF^{*}(p, f)| \, \right|^{2}$,

where $\mathcal{F}(X)$ denotes the Fourier transform of the function X.
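As a minimal sketch only, the modulus-based error function c can be evaluated as follows in plain Python. The naive discrete Fourier transform, the list-of-lists data layout (one impulse response per position) and the function names are assumptions chosen to keep the example self-contained.

```python
import cmath


def dft_magnitudes(signal):
    """Return |DFT(signal)| for every frequency bin (naive O(n^2) DFT)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def magnitude_error(hrir, hrir_rec):
    """Sum over positions p and frequencies f of (|HRTF| - |HRTF*|)^2.

    hrir and hrir_rec are lists of impulse responses, one per position.
    """
    total = 0.0
    for original, reconstructed in zip(hrir, hrir_rec):
        mags = dft_magnitudes(original)
        mags_rec = dft_magnitudes(reconstructed)
        total += sum((a - b) ** 2 for a, b in zip(mags, mags_rec))
    return total


# A circularly delayed copy has the same modulus spectrum, so this
# criterion does not penalize it (the error is zero up to rounding).
delayed = magnitude_error([[1, 0, 0, 0]], [[0, 1, 0, 0]])

# A copy with doubled amplitude differs by 1 in each of the 4 bins.
scaled = magnitude_error([[1, 0, 0, 0]], [[2, 0, 0, 0]])
```

Because only moduli are compared, such a criterion constrains the spectrum shape and leaves the phase free, which is why the delays have to be handled by other means.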

Other error functions also allow an optimal spatial rendering. For example, it is possible to weight the HRIR functions by a gain that depends on their position, in order to better reconstruct certain privileged positions in space, which is written:

$c = \sum_{p} w_{p} \sum_{f} \left| \, |\mathcal{F}(HRIR)| - |\mathcal{F}(HRIR^{*})| \, \right|^{2}$ or $c = \sum_{p} w_{p} \sum_{f} \left| \, |HRTF(p, f)| - |HRTF^{*}(p, f)| \, \right|^{2}$,

where $w_{p}$ is the gain corresponding to a position p. It is thus possible to favor the reconstruction of certain spatial regions of the HRIR set (for example the frontal part).
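The position weighting can be illustrated by the following hedged sketch, which assumes that the modulus spectra have already been computed and are stored as `mags[p][f]`. The function name and the numerical values are hypothetical.

```python
def weighted_magnitude_error(mags, mags_rec, weights):
    """c = sum_p w_p * sum_f (|HRTF(p, f)| - |HRTF*(p, f)|)^2.

    mags[p][f] and mags_rec[p][f] hold modulus spectra; weights[p] is w_p.
    """
    return sum(
        w * sum((a - b) ** 2 for a, b in zip(row, row_rec))
        for w, row, row_rec in zip(weights, mags, mags_rec)
    )


# Two positions with the same reconstruction error; the first one
# (say, a frontal position) is given twice the weight of the second.
mags = [[1.0, 1.0], [1.0, 1.0]]
mags_rec = [[0.5, 1.0], [0.5, 1.0]]
c = weighted_magnitude_error(mags, mags_rec, weights=[2.0, 1.0])
```

With these values the first position contributes twice as much to c as the second, so an optimizer reducing c is pushed to reconstruct it more accurately.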

In the same way, it is also possible to weight the HRIR functions as a function of time or of frequency.

The error function can also minimize the energy difference between the moduli, namely:

$c = \sum_{p} \sum_{f} \left| \, |\mathcal{F}(HRIR)|^{2} - |\mathcal{F}(HRIR^{*})|^{2} \, \right|^{2}$ or $c = \sum_{p} \sum_{f} \left| \, |HRTF(p, f)|^{2} - |HRTF^{*}(p, f)|^{2} \, \right|^{2}$

In general, any error function calculated wholly or partly from the HRIR functions can be envisaged (modulus, phase, estimated delay or ITD, interaural differences, or other).
Moreover, if the error criterion relates to the frequency samples of the HRTF functions independently of one another, contrary to what was proposed above (sum over all frequencies in the calculation of the error function c), the optimization iterations can be applied successively to each frequency sample. This has the advantage of reducing the number of simultaneous variables, of providing an error function specific to each frequency f, and of allowing a stopping criterion that depends on the convergence specific to each frequency.

Step T4 is a test for deciding whether or not to stop iterating the optimization, according to a chosen stopping criterion. It may be a criterion characterizing the fact that:

the variable c has reached a minimum value ε, and/or that
the variable c no longer decreases sufficiently, and/or that
a maximum number of iterations has been reached, and/or that
the modifications of the filters are no longer sufficient, or other.
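The criteria listed above can be combined, for instance, as in the following sketch. The function name, the way the filter modification is summarized by a single number and all threshold values are assumptions made for illustration.

```python
def should_stop(errors, iteration, filter_change,
                eps=1e-6, min_decrease=1e-9,
                max_iterations=1000, min_change=1e-9):
    """Return True when any of the stopping criteria of test T4 is met.

    errors is the history of values of c (latest last), and filter_change
    is a measure of the last modification of the filters. The threshold
    values are illustrative assumptions.
    """
    if errors[-1] <= eps:                        # c has reached ε
        return True
    if len(errors) >= 2 and errors[-2] - errors[-1] < min_decrease:
        return True                              # c no longer decreases enough
    if iteration >= max_iterations:              # iteration budget exhausted
        return True
    if filter_change < min_change:               # filters barely change
        return True
    return False
```

Here the criteria are combined with a logical "or"; the text also allows them to be combined with "and", which would simply replace the early returns by a conjunction of the same tests.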

If the criterion is met (arrow 0 at the output of test T4), the calculated filters F(n, t), gains g(θ, φ, n), or filter/gain pairs make it possible to obtain an optimal spatial rendering, as will be seen in particular with reference to Figure 4C or Figure 6 below. The processing then stops with a converged solution (step E6).

If the criterion is not met (arrow N at the output of test T4), it is difficult, depending on the error function used, to know analytically how the filters F or the gains g should evolve in order to minimize the error c. A gradient calculation is advantageously used to adjust the filters and/or the gains so that they lead to a reduction of the error function c (iterative steps E5).

This processing is advantageously computer-assisted. A function called "fminunc" from the "Optimization Toolbox" module of the Matlab® software, suitably programmed, makes it possible to carry out the steps E2, E3, T4, E5, E6 described above with reference to Figure 1.
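The iterative loop can be sketched in Python as a plain gradient descent with a finite-difference gradient. This is not the "fminunc" function mentioned above but a simplified stand-in; the step size, learning rate, tolerance and the toy error function are all assumptions made for the example.

```python
def finite_difference_gradient(cost, params, step=1e-6):
    """Approximate the gradient of cost at params by forward differences."""
    base = cost(params)
    gradient = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += step
        gradient.append((cost(shifted) - base) / step)
    return gradient


def gradient_descent(cost, start, rate=0.1, max_iterations=500, tolerance=1e-10):
    """Repeat: evaluate the error, test for convergence, update the parameters."""
    params = list(start)
    previous = cost(params)
    current = previous
    for _ in range(max_iterations):
        gradient = finite_difference_gradient(cost, params)
        params = [p - rate * g for p, g in zip(params, gradient)]  # update (E5)
        current = cost(params)                                     # error (E3)
        if abs(previous - current) < tolerance:                    # test (T4)
            break
        previous = current
    return params, current


# Toy error function with its minimum at (1, -2); it stands in for c,
# and its two variables stand in for the filter taps or gains.
solution, error = gradient_descent(
    lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, [0.0, 0.0]
)
```

In an actual use, the parameter vector would hold the filter coefficients and/or the encoding gains, and the cost would be one of the error functions c defined above.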

Of course, the embodiment illustrated in Figure 1 applies equally well when it has been chosen to fix the decoding filters at step E1 and then to optimize the spatial encoding functions during steps E2, E3, E5, E6. It also applies when it has been chosen to iteratively optimize both the encoding functions and the decoding filters.

First embodiment

An example is described below of optimizing the filters for decoding content obtained from spatial encoding by spherical harmonic functions in a high-order ambisonic context, for binaural reproduction. This is a sensitive case because, if sources have been recorded or encoded in an ambisonic context, the interaural delays must be respected in the decoding processing, through application of the decoding filters.

In the implementation of the invention described below by way of example, the case is limited to two dimensions, and the aim is to provide optimized filters for decoding ambisonic content of order 2 (five ambisonic channels) for binaural listening on headphones.

For the first step a) of the general method described above (reference E0 in Figure 1), a set of HRIR functions is used that was measured for the left ear in an anechoic chamber, for 64 different azimuth angle values ranging from 0 to about 350° (ordinates of the graph of Figure 2). The filters of this set of HRIR functions have been reduced to 32 non-zero time samples (abscissae of the graph of Figure 2). The listener's head is assumed to be symmetrical, and the HRIRs of the right ear are the mirror images of the HRIRs of the left ear.
As a variant to measurements carried out on an individual, the HRIR functions can be obtained from standard databases ("Kemar head") or by modeling the morphology of the individual, or other.
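The symmetry assumption can be illustrated by the following sketch, which derives a right-ear set from a left-ear set by mirroring the azimuth index. Uniform azimuth spacing starting at 0° and the function name are assumptions of the example; they are not stated in the description.

```python
def right_ear_from_left(left_hrirs):
    """Mirror a left-ear HRIR set to obtain the right-ear set.

    left_hrirs[i] is the response at azimuth i * 360 / N degrees; uniform
    spacing starting at 0 degrees is an assumption of this sketch.
    """
    n = len(left_hrirs)
    # The right ear at azimuth θ hears what the left ear hears at 360° - θ.
    return [left_hrirs[(n - i) % n] for i in range(n)]


# Four labelled responses at 0°, 90°, 180° and 270°.
right = right_ear_from_left(["h0", "h90", "h180", "h270"])
```

The responses at 0° and 180° lie in the plane of symmetry and are therefore shared by both ears, while those at 90° and 270° are swapped.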

The spatial encoding functions chosen here are the spherical harmonics, calculated from the functions cos(mθ) and sin(mθ), with increasing angular frequencies m = 0, 1, 2, ..., N, to characterize the dependence on azimuth (as illustrated in Figure 3), and from Legendre functions for the dependence on elevation, for 3D encoding.
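For the two-dimensional case, the azimuth-dependent encoding gains can be sketched as follows. The channel ordering, the absence of any normalization factor and the function name are assumptions made for illustration.

```python
import math


def ambisonic_gains_2d(azimuth, order):
    """Return the 2 * order + 1 gains [1, cos(mθ), sin(mθ), ...], m = 1..order.

    The azimuth is in radians. The channel ordering and the absence of
    normalization are assumptions of this sketch.
    """
    gains = [1.0]  # m = 0: cos(0) = 1, and sin(0) = 0 carries no information
    for m in range(1, order + 1):
        gains.append(math.cos(m * azimuth))
        gains.append(math.sin(m * azimuth))
    return gains


# Order 2 gives five channels; here for a source at 90 degrees of azimuth.
gains = ambisonic_gains_2d(math.radians(90.0), order=2)
```

At order 2 this yields the five channels mentioned for the first embodiment, and, unlike panning gains, these gains are in general all non-zero for a given direction.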

The starting solution S0 for step E1 is given by calculating the pseudo-inverse (with linear resolution). This starting solution constitutes the decoding solution that was proposed as such in document WO-00/19415 of the prior art described above. The optimization technique employed within the meaning of the invention is preferably the gradient technique described above. The error function c employed corresponds to the least squares on the modulus of the Fourier transform of the HRIR functions, namely:

$c = \sum_{p} \sum_{f} \left| \, |HRTF(p, f)| - |HRTF^{*}(p, f)| \, \right|^{2}$

Figures 4A, 4B and 4C show the time course (over a few tens of time samples) of the five decoding filters, together with the reconstruction errors of the modulus (in dB, shown as gray levels) and of the phase (in radians, shown as gray levels) of the Fourier transform of the HRIR functions for each position (ordinates, in azimuth) and for each frequency (abscissae), respectively:

at the end of the first step E1 (starting solution S0 obtained by linear resolution, by calculating the pseudo-inverse),
after a few iterations E5 (intermediate solution SI),
at the end of the last processing step E6 (converged solution SC).

For the starting solution, which nevertheless constituted the decoding solution within the meaning of document WO-00/19415, the modulus of the HRTF functions is relatively poorly reconstructed, most of the reconstruction errors being greater than 8 dB. It appears, however, that the error on the phase is practically unchanged over the iterations. That error is nonetheless minimal at low frequencies and on the ipsilateral part of the HRTF functions (region from 0 to 180° of azimuth). By contrast, the error on the modulus decreases sharply as the optimization iterations proceed, especially in this ipsilateral region. The optimization within the meaning of the invention therefore improves the modulus of the HRTF functions without modifying the phase, hence the group delay and, above all, the interaural delay ITD, so that the rendering is particularly faithful thanks to the implementation of this first embodiment.

Second embodiment

An example is described below of optimizing the decoding filters for spatial functions derived from intensity panning laws ("pan pot"), which in simple terms amount to mixing rules.

Panning laws are commonly used by sound engineers to produce audio content, in particular multichannel content in so-called "surround" formats used in 5.1, 6.1 or other sound reproduction. In this second embodiment, the aim is to calculate the filters that make it possible to reproduce "surround" content on headphones. In this case, the encoding by panning laws is carried out by mixing a sound scene in a "surround" format (5.1 tracks of a digital recording, for example). The filters optimized from the same panning laws then provide an optimal binaural decoding for the desired rendering with this "surround" effect.

The present invention applies advantageously where the positions of the virtual loudspeakers correspond to positions of a consumer multichannel reproduction system with "surround" effect. The optimized decoding filters then allow consumer multimedia content (typically multichannel content with "surround" effect) to be decoded for reproduction on two loudspeakers, for example on headphones in binaural. This binaural reproduction of content that is, for example, initially in 5.1 format is optimized thanks to the implementation of the invention.

The case of an example with ten virtual loudspeakers "arranged" around the listener is described below.
First, the HRIR functions are obtained at 64 positions around the listener, as described with reference to the first embodiment above.

In this second embodiment, the spatial functions given by the intensity panning laws (here the tangent law) between each pair of adjacent loudspeakers are determined by a relation of the type: $\tan (θ_{v}) = \frac{L - R}{L + R} \tan (u),$

where:

L is the gain of the left loudspeaker,
R is the gain of the right loudspeaker,
u is the angle between the loudspeakers (360/10 = 36° in this example, as illustrated in figure 5),
θ_v is the angle for which the gains are to be calculated (typically the angle between the plane of symmetry of the two loudspeakers and the desired direction).
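By way of illustration, the tangent law can be solved for the two gains once a normalization is chosen. The sketch below assumes L + R = 1, which the description does not specify; the function name `tangent_pan_gains` and the use of degrees are likewise illustrative choices, and u must be non-zero.

```python
import math

def tangent_pan_gains(theta_v_deg, u_deg):
    """Return (L, R) such that tan(theta_v) = (L - R) / (L + R) * tan(u).

    Assumption (not stated in the description): gains are normalized so that L + R = 1.
    """
    # With L + R = 1, the law reduces to L - R = tan(theta_v) / tan(u).
    ratio = math.tan(math.radians(theta_v_deg)) / math.tan(math.radians(u_deg))
    left_gain = (1.0 + ratio) / 2.0
    right_gain = (1.0 - ratio) / 2.0
    return left_gain, right_gain
```

With this normalization, a direction lying in the plane of symmetry of the pair (θ_v = 0) yields L = R = 0.5, and a positive θ_v favors the left loudspeaker.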

The shapes of the ten spatial functions retained, as a function of azimuth, are given in figure 5. For each azimuth, at most two of the gains to be associated with the encoding channels are non-zero. Indeed, a virtual loudspeaker is considered here to be "placed" such that only one gain (if it lies on an encoding axis) or two gains (if it lies between two encoding axes) need to be determined to define the encoding. By contrast, no encoding gain is zero a priori in an ambiophonic context, whose encoding functions are illustrated in figure 3 described above. Nevertheless, the reproduction quality with a choice of ambiophonic encoding, after optimization within the meaning of the first embodiment, is generally very well regarded.
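The sparsity of these encoding gains can be sketched as follows. Several assumptions are made here and are not taken from the description: the loudspeakers are placed at 0°, 36°, ..., 324° (figure 5 is not reproduced, so the layout is hypothetical), the gains of a pair are normalized to sum to 1, and half the loudspeaker spacing is used as the aperture in the tangent law so that a direction lying on a loudspeaker axis receives a single non-zero gain. The name `encoding_gains` is illustrative.

```python
import math

def encoding_gains(azimuth_deg, n_speakers=10):
    """Gain vector for one direction; at most two entries are non-zero."""
    spacing = 360.0 / n_speakers           # 36 degrees for ten loudspeakers
    az = azimuth_deg % 360.0
    k = int(az // spacing)                  # index of the sector containing az
    lo = k % n_speakers                     # loudspeaker at the start of the sector
    hi = (k + 1) % n_speakers               # loudspeaker at the end of the sector
    # Angle measured from the plane of symmetry of the pair (lo, hi).
    theta = az - (k + 0.5) * spacing
    # Assumption: the half-spacing is the aperture of the tangent law.
    ratio = math.tan(math.radians(theta)) / math.tan(math.radians(spacing / 2.0))
    gains = [0.0] * n_speakers
    gains[hi] = (1.0 + ratio) / 2.0
    gains[lo] = (1.0 - ratio) / 2.0
    return gains
```

For example, `encoding_gains(36.0)` places all the energy on the second loudspeaker, whereas `encoding_gains(50.0)` splits it between the second and third; the eight remaining gains are zero in both cases, which is what keeps the encoding cost low.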

The optimization method used in the second embodiment is again the gradient method. The starting solution S0 in step E1 is given by the ten decoding filters that correspond to the ten HRIR functions given at the positions of the virtual loudspeakers. The fixed spatial functions are the encoding functions representing the panning laws. The error function c is based on the modulus of the Fourier transform of the HRIR functions, namely: $c = \sum_{p} \sum_{f} \left| \, |HRTF(p, f)| - |HRTF^{*}(p, f)| \, \right|^{2}$
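The error function c can be sketched as below, assuming each HRIR is given as a list of time samples and that the original and reconstructed sets have matching lengths. A naive discrete Fourier transform is used only to keep the sketch free of dependencies; the names `dft_magnitude` and `magnitude_error` are illustrative and an FFT would normally be preferred.

```python
import cmath

def dft_magnitude(h):
    """Modulus of the discrete Fourier transform of one impulse response."""
    n = len(h)
    return [
        abs(sum(h[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)))
        for f in range(n)
    ]

def magnitude_error(hrir, hrir_rec):
    """c = sum over positions p and frequencies f of (|HRTF| - |HRTF*|)^2."""
    c = 0.0
    for h, h_rec in zip(hrir, hrir_rec):          # loop over positions p
        mags = dft_magnitude(h)
        mags_rec = dft_magnitude(h_rec)
        for m, m_rec in zip(mags, mags_rec):      # loop over frequency bins f
            c += (m - m_rec) ** 2
    return c
```

Because only moduli are compared, this quantity is zero for identical sets and remains essentially zero when a response is merely shifted circularly in time.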

Reference is now made to figure 6, which compares a real HRTF function (shown as a solid line), averaged over a set of 64 measured positions (for azimuth angles ranging from 0 to about 350°), with the average HRTF functions reconstructed using:

the pseudo-inverse starting solution, without optimization (shown as a dotted line),
the starting solution given by the better-suited virtual loudspeaker method (shown as a long-dashed line),
and the optimized solution, converged after a few iterations, within the meaning of the invention (shown as a dash-dot line).

The optimized solution within the meaning of the invention matches the original function perfectly, which is explained by the fact that the error function c proposed here aims to minimize the error on the modulus of the function.

Figure 7 illustrates the variations of the interaural delay ITD as a function of the azimuth position of the HRIR functions. The optimized solution makes it possible to reconstruct an ITD (dash-dot line) relatively close to the original ITD (solid line), yet no closer than the one reconstructed from the starting solution, obtained here by the virtual loudspeaker technique (long-dashed line). The ITD reconstructed from the filters obtained by linear resolution (pseudo-inverse), shown as a dotted line in figure 7, is rather irregular and far from the original ITD. These results confirm the poor performance of the linear resolution method when the delays are reconstructed from the decoding filters.
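This passage does not state how the ITD is measured from a pair of HRIR functions. As an assumption, the sketch below uses one common estimator, namely the lag that maximizes the cross-correlation between the left and right responses; the name `estimate_itd` and the sign convention are illustrative.

```python
def estimate_itd(hrir_left, hrir_right):
    """Lag in samples maximizing the cross-correlation of the two responses.

    The result is positive when the left response lags the right one,
    i.e. when the sound reaches the right ear first.
    """
    n = len(hrir_left)
    best_lag = 0
    best_value = float("-inf")
    for lag in range(-(n - 1), n):
        acc = 0.0
        for t in range(n):
            u = t - lag
            if 0 <= u < n:
                acc += hrir_left[t] * hrir_right[u]
        if acc > best_value:
            best_value = acc
            best_lag = lag
    return best_lag
```

Dividing the returned lag by the sampling rate gives the delay in seconds; applying the same estimator to original and reconstructed responses at each azimuth yields curves comparable in kind to those of figure 7.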

The optimization of the method within the meaning of the invention therefore makes it possible to reconstruct both the modulus of the HRTF functions and the ITD group delay between the two ears.

Moreover, it emerged in this second embodiment that the quality of the reconstructed filters is not affected by the choice of encoding functions. It is therefore possible to use arbitrary spatial encoding functions, for example functions that advantageously contain many zeros, as in this exemplary embodiment, which correspondingly reduces the resources needed to compute the encoding.

Examples of implementation

The purpose of this part of the description is to assess the gain in terms of the number of operations and the memory resources needed to implement multichannel binaural encoding and decoding within the meaning of the invention, with decoding filters that take the delay into account.

The case dealt with in the example described here is that of two spatially distinct sources to be encoded in multichannel and reproduced binaurally. The two implementation examples of figures 8 and 9 use the symmetry properties of the HRIR functions.

The example given in figure 9 corresponds to the case where the encoding gains are obtained by applying the virtual loudspeaker method according to the second embodiment described above. Figure 8 shows an implementation of multichannel encoding and decoding when the delays are not included in the decoding filters but must be taken into account from the encoding stage. It may correspond to that of the prior art WO-00/19415 described above, provided that the decoding filters (and/or the encoding functions) have not been optimized within the meaning of the invention.

The embodiment of figure 8 consists, in generic terms, in extracting interaural delay information from the transfer functions obtained in step a), while the optimization, within the meaning of the invention, of the encoding functions and/or of the decoding filters is carried out here on the basis of the transfer functions from which this delay information has been extracted. These interaural delays can then be stored and applied subsequently, in particular at encoding.

In the example of figure 8, the symmetry of the HRTF functions for the right ear and the left ear makes it possible to consider n filters $F_{j, L}$ and n symmetric filters ${F̑}_{j, L}$, hence 2n channels. The encoding gains are denoted $g_{j, L}^{i}$ (the gains of index R need not be taken into account because of the symmetry), where i runs from 1 to K for K sources to be considered (K = 2 in the example) and j runs from 1 to n for n filters $F_{j, L}$.

In figures 8 and 9, the same notations S₁ and S₂ have of course been adopted for the two sources to be encoded, each being placed at a given position in space.
In figure 8, $τ_{ITD}^{1}$ and $τ_{ITD}^{2}$ denote the delays (ITD) corresponding to the positions of the sources S₁ and S₂. In this example, both sounds are assumed to reach the right ear before reaching the left ear.
In figure 9, $g_{j, L}^{i}$ again denotes the encoding gains for the position of source i and for channel j ∈ [1, ..., n]. It is recalled that the gains for the left ear and the right ear are identical, the symmetry being introduced during filtering.

For the decoding part of figure 8, $F_{j, L}$ denotes the decoding filters for channel j and ${F̑}_{j, L}$ the symmetric filters of the filters $F_{j, L}$. It is noted here that, in the case of virtual loudspeakers, the symmetric filter of a given virtual loudspeaker (a given channel) is the filter of the symmetric virtual loudspeaker (with respect to the left/right plane of symmetry of the head).
Finally, L and R denote the left and right binaural channels.

In the implementation of figure 8, since the ITD delay is introduced at encoding, the multichannel signals for the left channel differ from those for the right channel. The consequences of introducing delays at encoding are therefore a doubling of the number of encoding operations and a doubling of the number of channels, compared with the second implementation illustrated in figure 9, which benefits from the advantages offered by the second embodiment of the invention.

Thus, with reference to figure 8, each signal from a source S_i in the encoding block ENCOD is duplicated so that a delay (positive or negative) $τ_{ITD}^{1}$, $τ_{ITD}^{2}$ is applied to one of the copies, and each duplicated signal is multiplied by each gain $g_{j, L}^{i}$, the results of the multiplications then being grouped by channel index j (n channels) and according to whether or not an interaural delay has been applied (2 times n channels in total). The 2n signals obtained are conveyed over a network, stored, or otherwise handled, with a view to reproduction and, to this end, are applied to a decoding block DECOD comprising n filters $F_{j, L}$ for a left channel L and n symmetric filters ${F̑}_{j, L}$ for a right channel R. It is recalled that the symmetry of the filters results from the assumed symmetry of the HRTF functions. The filtered signals are grouped in each channel, and the signal resulting from this grouping is intended to feed one of the two loudspeakers for reproduction on two distant loudspeakers (in which case a crosstalk cancellation operation should be added) or directly one of the two channels of a headset for binaural reproduction.
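The encoding stage of figure 8 can be sketched as follows. This is a minimal illustration under stated assumptions: delays are whole, non-negative numbers of samples (the description allows positive or negative delays), all sources have the same length, and the names `delay` and `encode_fig8` are hypothetical.

```python
def delay(signal, d):
    """Delay a signal by d >= 0 samples, keeping its length (zeros are shifted in)."""
    d = min(d, len(signal))
    return [0.0] * d + list(signal[:len(signal) - d])

def encode_fig8(sources, gains, itd_samples):
    """Return (undelayed, delayed): two groups of n channels, 2n in total.

    sources[i]     : samples of source i (all of equal length)
    gains[i][j]    : encoding gain of source i for channel j
    itd_samples[i] : interaural delay of source i in whole samples (assumed)
    """
    n = len(gains[0])
    length = len(sources[0])
    undelayed = [[0.0] * length for _ in range(n)]
    delayed = [[0.0] * length for _ in range(n)]
    for src, g, d in zip(sources, gains, itd_samples):
        src_delayed = delay(src, d)          # one delay line per source
        for j in range(n):                   # each copy is multiplied by each gain
            for t in range(length):
                undelayed[j][t] += g[j] * src[t]
                delayed[j][t] += g[j] * src_delayed[t]
    return undelayed, delayed
```

For two sources, each of the two copies is multiplied by n gains, so the sketch performs multiplications by 4n gains and produces 2n channels, which is the cost that the implementation of figure 9 avoids.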

Figure 9, for its part, shows an implementation of multichannel encoding and decoding when the delays are, on the contrary, included in the decoding filters within the meaning of the second embodiment, using the virtual loudspeaker method and exploiting the observation drawn from figures 6 and 7 above.

Thus, not having to take the interaural delays into account at encoding makes it possible to reduce the number of channels to n (instead of 2n). In the implementation of figure 9, using the symmetry of the decoding filters also makes it possible to apply the decoding filtering as a sum $(F_{j, L} + {F̑}_{j, L}) / 2$ on the first k channels (k being here the number of virtual loudspeakers positioned between 0 and 180° inclusive), followed by a difference $(F_{j, L} - {F̑}_{j, L}) / 2$ on the following channels, and thus to halve the number of filtering operations required. Of course, each sum or difference of filters is to be regarded as a filter in its own right. What is referred to here as a sum or a difference of filters is to be understood in relation to the expressions of the filters $F_{j, L}$ and ${F̑}_{j, L}$ described above with reference to figure 8.

It is noted that this implementation of figure 9 would, by contrast, be impossible if the delays had to be integrated at encoding as illustrated in figure 8.

The decoding processing of figure 9 continues with a grouping of the sums SS and a grouping of the differences SD, which feed channel L by their sum (module SL delivering the signal SS+SD) and channel R by their difference (module DR delivering the signal SS-SD).
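One possible reading of this sum/difference decoding is the classic "shuffler" arrangement, sketched below for a single mirror pair of channels. It is an assumption, not a transcription of figure 9: the two channels of the pair are combined into a sum signal and a difference signal before filtering, which the description does not spell out. The names `c_a`, `c_b`, `f` and `f_sym` are hypothetical; `f_sym` stands for the symmetric filter of `f`, and both signals and both filters are assumed to have equal lengths.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y

def decode_direct(c_a, c_b, f, f_sym):
    """Reference decoding with four filterings for one mirror pair of channels."""
    left = [p + q for p, q in zip(convolve(c_a, f), convolve(c_b, f_sym))]
    right = [p + q for p, q in zip(convolve(c_a, f_sym), convolve(c_b, f))]
    return left, right

def decode_sum_difference(c_a, c_b, f, f_sym):
    """Same outputs with two filterings: (F + F_sym)/2 and (F - F_sym)/2."""
    f_sum = [(p + q) / 2.0 for p, q in zip(f, f_sym)]
    f_dif = [(p - q) / 2.0 for p, q in zip(f, f_sym)]
    ss = convolve([p + q for p, q in zip(c_a, c_b)], f_sum)   # contribution to SS
    sd = convolve([p - q for p, q in zip(c_a, c_b)], f_dif)   # contribution to SD
    left = [p + q for p, q in zip(ss, sd)]                    # L = SS + SD
    right = [p - q for p, q in zip(ss, sd)]                   # R = SS - SD
    return left, right
```

Both functions return the same left and right signals up to rounding, which illustrates how the number of filterings can be halved when each filter and its symmetric counterpart are available.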

Thus, whereas the solution illustrated in figure 8 requires:

at encoding, two delays to be taken into account, multiplications by 4n gains and 2n sums, and
at decoding, 2n filtering operations and 2n sums,

the solution illustrated in figure 9 requires only:

2n gains and n sums at encoding, and
n filtering operations, n sums and simply one overall sum and one overall difference, at decoding.
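The counts listed above can be tabulated for a given n with a few lines of code. The sketch below simply restates the figures given in the description for two sources; the function name `operation_counts` and the dictionary keys are illustrative, and the overall sum and overall difference of figure 9 are counted together with the n decoding sums.

```python
def operation_counts(n):
    """Operation counts stated in the description for two sources and n filters."""
    fig8 = {
        "delays": 2,
        "gains": 4 * n,
        "encoding_sums": 2 * n,
        "filterings": 2 * n,
        "decoding_sums": 2 * n,
    }
    fig9 = {
        "delays": 0,
        "gains": 2 * n,
        "encoding_sums": n,
        "filterings": n,
        "decoding_sums": n + 2,   # n sums plus one overall sum and one difference
    }
    return fig8, fig9
```

For n = 10, for instance, this gives 40 gains and 20 filtering operations for figure 8 against 20 gains and 10 filtering operations for figure 9.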

Furthermore, even though storage in memory requires the same capacity for both solutions (storage of n filters, with the delays and gains computed on the fly), the working memory (buffer) needed for the implementation of figure 8 is more than twice that needed for the implementation of figure 9, since 2n channels pass between encoding and decoding and a delay line per source must be implemented in the implementation of figure 8.

The present invention thus relates to a sound spatialization system with multichannel encoding and for reproduction on two channels, comprising a spatial encoding block ENCOD defined by encoding functions associated with a plurality of encoding channels and a decoding block DECOD applying filters for reproduction in a binaural context. In particular, the spatial encoding functions and/or the decoding filters are determined by implementing the method described above. Such a system may correspond to the one illustrated in figure 8, in an embodiment in which the delays are integrated at encoding, which corresponds to the state of the art within the meaning of document WO-00/19415.

Another advantageous embodiment consists in implementing the method according to the second embodiment in order to build a spatialization system with a direct encoding block, without delay application, so as to reduce the number of encoding channels and the corresponding number of decoding filters, which directly include the interaural delays ITD, an advantage offered by the implementation of the invention, as illustrated in figure 9.

This embodiment of figure 9 achieves a spatial rendering quality at least as good as, if not better than, the prior art techniques, with half as many filters and a lower computational cost. Indeed, as shown with reference to figures 6 and 7, where the decomposition targets a set of HRIR functions, this embodiment gives a better reconstruction quality for the modulus of the HRTFs and for the interaural delay than the prior art techniques, with a reduced number of channels.

The present invention also relates to a computer program comprising instructions for implementing the method described above, the algorithm of which can be illustrated by a general flowchart of the type shown in figure 1.

Claims

Method of sound spatialization with a multichannel encoding and for reproduction on two loudspeakers, comprising a spatial encoding defined by encoding functions associated with a plurality of encoding channels and a decoding by applying filters for reproduction in a binaural context on the two loudspeakers, characterized in that it comprises the steps:
a) obtaining an original suite of acoustic transfer functions specific to an individual's morphology (HRIR;HRTF),

b) fixing spatial encoding functions (g(θ,ϕ,n,f)) and/or decoding filters (F(t,n)), and

c) through successive iterations, optimizing the filters associated with the chosen encoding functions or the encoding functions associated with the chosen filters, or jointly the chosen filters and encoding functions, by minimizing an error (c(HRIR, HRIR*)) calculated as a function of a comparison between:
- the original suite of transfer functions (HRIR), and

- a suite of transfer functions reconstructed (HRIR*) on the basis of the encoding functions and the decoding filters, optimized and/or chosen, characterized in that the comparison of step c) is calculated by differences between respective moduli of the original (HRTF(p,f)) and reconstructed (HRTF * (p, f)) transfer functions, expressed in the frequency domain, for each position in space associated with a transfer function.
Method according to Claim 1, characterized in that the reconstructed suite of transfer functions (HRIR*) is calculated by multiplying the filters by the encoding functions (g(θ,ϕ,n), g(θ,ϕ,n,f)) at each iteration.
Method according to Claim 2, characterized in that in step b) spatial encoding functions are chosen which represent intensity panning laws based on virtual loudspeaker positions.
Method according to Claim 3, characterized in that the positions of the virtual loudspeakers correspond to positions of a multichannel reproduction system with "surround" effect, the optimized decoding filters allowing a decoding of multichannel multimedia contents with "surround" effect for reproduction on two loudspeakers.
Method according to one of Claims 3 and 4, characterized in that the encoding functions comprise a plurality of zero gains to be associated with encoding channels.
Method according to one of the preceding claims, characterized in that interaural delay information is extracted, on the basis of the transfer functions (HRIR, HRTF) obtained in step a), while the optimization of the encoding functions (g(θ,ϕ,n), g(θ,ϕ,n,f)) and/or of the decoding filters (F(t,n)) is conducted on the basis of transfer functions from which said delay information has been extracted, said delay information being applied subsequently, on encoding.
Method according to one of Claims 1 to 5, characterized in that interaural delay information is taken into account in the optimization of the decoding filters (F(t,n)), and in that the spatial encoding is conducted without delay application (ITD).
Method according to one of the preceding claims, characterized in that, in step b), some at least of the transfer functions obtained (HRTF) are chosen as decoding filters.
Method according to Claim 2, characterized in that in step b) spatial encoding functions (g(θ,ϕ,n), g(θ,ϕ,n,f)) of the spherical harmonic type in an ambiophonic context are chosen.
Method according to one of Claims 1, 2, 3, 4, 5, 6, 7 and 9, characterized in that, for the first optimization iteration, the decoding filters (F(t,n)) are calculated by a solution of the pseudo-inverse type.
Method according to Claim 1, characterized in that each difference is weighted as a function of a given direction in space so as to favor certain of said directions.
Computer program for determining encoding functions (g(θ,ϕ,n)) and/or decoding filters (F(t,n)), for a sound spatialization processing with a multichannel spatial encoding and a decoding for binaural reproduction on two loudspeakers, characterized in that it comprises instructions for executing the method according to one of the preceding claims.
Sound spatialization system with a multichannel encoding and for reproduction on two loudspeakers, comprising a spatial encoding block (ENCOD) defined by encoding functions associated with a plurality of encoding channels and a block for decoding (DECOD) by applying filters for reproduction in a binaural context on two loudspeakers, characterized in that the system is adapted for implementing the method according to one of Claims 1 to 11.
System according to Claim 13, characterized in that the spatial encoding functions and/or the decoding filters are determined by implementing the method according to Claim 7,
and in that it comprises a direct encoding block without delay application so as to reduce a number of encoding channels and a corresponding number of decoding filters.