EP3895446B1

EP3895446B1 - Method for interpolating a sound field and corresponding computer program product and device

Info

Publication number: EP3895446B1
Application number: EP19816809.8A
Authority: EP
Inventors: Alexandre GUÉRIN
Original assignee: Fondation B Com
Current assignee: Fondation B Com
Priority date: 2018-12-14
Filing date: 2019-12-13
Publication date: 2023-01-25
Anticipated expiration: 2039-12-13
Also published as: WO2020120772A1; FR3090179B1; FR3090179A1; US20220132262A1; EP3895446A1; US11736882B2

Description

Domaine de l'inventionField of the invention

Le domaine de l'invention est celui de l'interpolation d'un champ sonore (ou acoustique) ayant été émis par une ou plusieurs sources et ayant été capté par un ensemble fini de microphones.The field of the invention is that of the interpolation of a sound (or acoustic) field having been emitted by one or more sources and having been picked up by a finite set of microphones.

L'invention a de nombreuses applications, notamment, mais non exclusivement, dans le domaine de la réalité virtuelle, par exemple pour permettre à un auditeur de se déplacer dans une scène sonore qui lui est restituée, ou dans le domaine de l'analyse de scène sonore, par exemple pour déterminer le nombre de sources sonores présentes dans la scène analysée, ou dans le domaine de la restitution d'une scène multicanal, par exemple au sein d'un décodeur MPEG-H 3D, etc.The invention has many applications, in particular, but not exclusively, in the field of virtual reality, for example to allow a listener to move around in a sound scene which is restored to him, or in the field of the analysis of sound scene, for example to determine the number of sound sources present in the analyzed scene, or in the field of the restitution of a multichannel scene, for example within a 3D MPEG-H decoder, etc.

Art antérieur et ses inconvénientsPrior art and its drawbacks

Afin d'interpoler un champ sonore en une position donnée d'une scène sonore, une approche classique consiste à estimer le champ sonore à la position donnée à l'aide d'une interpolation linéaire entre les champs tels que captés et encodés par les différents microphones de la scène. Les coefficients d'interpolation sont estimés en minimisant une fonction de coût. Une telle approche est montrée dans le document WO 2018/064528 .In order to interpolate a sound field at a given position of a sound scene, a classic approach consists in estimating the sound field at the given position using a linear interpolation between the fields as picked up and encoded by the different stage microphones. The interpolation coefficients are estimated by minimizing a cost function. Such an approach is shown in the document WO 2018/064528 .

Dans une telle approche, les techniques connues privilégient une captation du champ sonore par des microphones dits ambisoniques. Plus particulièrement, un microphone ambisonique encode et délivre le champ sonore qu'il capte dans un format ambisonique. Le format ambisonique se caractérise par des composantes qui consistent en la projection du champ sonore suivant différentes directivités. Ces composantes sont regroupées par ordre. L'ordre zéro encode la pression acoustique instantanée captée par le microphone, l'ordre un encode les trois gradients de pression suivant les trois axes de l'espace, etc. Plus on monte dans les ordres, plus la résolution spatiale de la représentation du champ est importante. Le format ambisonique dans sa représentation complète, i.e. à l'ordre infini, permet d'encoder le champ en tout point à l'intérieur de la sphère maximale exempte de sources sonores, et ayant comme centre l'emplacement physique du microphone ayant effectué la captation. Un tel encodage du champ sonore permet en théorie, à partir d'un seul microphone, de se déplacer à l'intérieur de la zone délimitée par la source la plus proche du microphone, sans pouvoir cependant contourner aucune des sources en question.In such an approach, the known techniques favor capturing the sound field by so-called ambisonic microphones. Specifically, an ambisonic microphone encodes and delivers the sound field it picks up in an ambisonic format. The Ambisonic format is characterized by components that consist of the projection of the sound field following different directivities. These components are grouped in order. Order zero encodes the instantaneous acoustic pressure picked up by the microphone, order one encodes the three pressure gradients along the three axes of space, etc. The higher you go in the orders, the greater the spatial resolution of the representation of the field. The ambisonic format in its complete representation, i.e. at infinite order, makes it possible to encode the field at any point inside the maximum sphere free of sound sources, and having as its center the physical location of the microphone having carried out the capture. Such encoding of the sound field theoretically makes it possible, from a single microphone, to move within the zone delimited by the source closest to the microphone, without however being able to circumvent any of the sources in question.

De tels microphones permettent ainsi de représenter le champ sonore en trois dimensions via une décomposition de ce dernier en harmoniques sphériques. Cette décomposition est particulièrement adaptée à la navigation dite 3DoF (de l'anglais « Degree of Freedom »), e.g. une navigation suivant les trois dimensions. C'est d'ailleurs ce format qui a été retenu pour les contenus immersifs sur la chaîne de réalité virtuelle de Youtube ou encore sur Facebook-360.Such microphones thus make it possible to represent the sound field in three dimensions via a decomposition of the latter into spherical harmonics. That decomposition is particularly suitable for so-called 3DoF navigation (from the English “Degree of Freedom”), eg navigation along the three dimensions. This is also the format that has been chosen for immersive content on Youtube's virtual reality channel or on Facebook-360.

Cependant, les méthodes d'interpolation de l'état de l'art supposent généralement qu'il existe un couple de microphones à égale distance de la position de l'auditeur comme dans la méthode divulguée dans l'article de conférence de A. Southern, J. Wells et D. Murphy : « Rendering walk-through auralisations using wave-based acoustical models », 17th European Signal Processing Conférence, 2009, p. 715-719 ». Une telle condition d'égalité des distances est impossible à garantir en pratique. Par ailleurs, de telles approches ne donnent de résultats intéressants que lorsque le réseau de microphones est dense dans la scène, ce qui est rarement le cas en pratique.However, state-of-the-art interpolation methods generally assume that there is a pair of microphones equidistant from the position of the listener as in the method disclosed in the conference paper of A. Southern, J. Wells and D. Murphy: “Rendering walk-through auralizations using wave-based acoustical models”, 17th European Signal Processing Conference, 2009, p. 715-719 ". Such a condition of equality of distances is impossible to guarantee in practice. Moreover, such approaches only give interesting results when the network of microphones is dense in the scene, which is rarely the case in practice.

Il existe ainsi un besoin pour une méthode améliorée d'interpolation de champ sonore. En particulier, la méthode doit permettre d'estimer le champ sonore à la position d'interpolation de manière à ce que le champ en question soit cohérent avec la position des sources sonores. Par exemple, un auditeur se situant à la position d'interpolation doit avoir l'impression que le champ interpolé arrive effectivement en direction de la ou des source(s) sonore(s) de la scène sonore qui lorsque le champ en question lui est restitué (e.g. pour permettre à l'auditeur de naviguer dans la scène sonore).There is thus a need for an improved method of sound field interpolation. In particular, the method must make it possible to estimate the sound field at the interpolation position so that the field in question is consistent with the position of the sound sources. For example, a listener located at the interpolation position must have the impression that the interpolated field is indeed arriving in the direction of the sound source(s) of the sound scene which, when the field in question is rendered (e.g. to allow the listener to navigate the soundstage).

Il existe également un besoin pour que la complexité calculatoire de la méthode d'interpolation soit maitrisée, par exemple pour permettre une implémentation en temps-réel sur des dispositifs à capacité de calcul limitée (e.g. sur un terminal portable, un casque de réalité virtuelle, etc.).There is also a need for the computational complexity of the interpolation method to be controlled, for example to allow real-time implementation on devices with limited computing capacity (e.g. on a portable terminal, a virtual reality headset, etc.).

Exposé de l'inventionDisclosure of Invention

Dans un mode de réalisation de l'invention, il est proposé un procédé d'interpolation d'un champ sonore capté par une pluralité de N microphones délivrant chacun le champ sonore encodé sous une forme comprenant au moins une pression captée et un vecteur de gradients de pression associé. Un tel procédé comprend une interpolation du champ sonore en une position d'interpolation délivrant un champ sonore encodé interpolé s'exprimant comme une combinaison linéaire des N champs sonores encodés pondérés chacun par un facteur de pondération correspondant. Le procédé comprend en outre une estimation des N facteurs de pondération à partir au moins :

de la position d'interpolation ;
d'une position de chacun desdits N microphones ;
desdites N pressions captées par lesdits N microphones ; et
d'une puissance estimée dudit champ sonore à ladite position d'interpolation.

In one embodiment of the invention, there is proposed a method for interpolating a sound field picked up by a plurality of N microphones each delivering the sound field encoded in a form comprising at least one pressure picked up and a vector of gradients associated pressure. Such a method comprises an interpolation of the sound field at an interpolation position delivering an interpolated encoded sound field expressed as a linear combination of the N encoded sound fields each weighted by a corresponding weighting factor. The method further comprises an estimation of the N weighting factors from at least:

the interpolation position;
a position of each of said N microphones;
said N pressures picked up by said N microphones; and
an estimated power of said sound field at said interpolation position.

Ainsi, l'invention propose une solution nouvelle et inventive pour réaliser une interpolation d'un champ sonore capté par au moins deux microphones, par exemple dans une scène comprenant une ou plusieurs source(s) sonore(s).Thus, the invention proposes a new and inventive solution for carrying out an interpolation of a sound field picked up by at least two microphones, for example in a scene comprising one or more sound source(s).

Plus particulièrement, la méthode proposée tire parti de l'encodage du champ sonore sous une forme donnant accès au vecteur de gradients de pression en plus de la pression. De la sorte, le vecteur de gradients de pression du champ interpolé reste cohérent avec celui du champ sonore tel qu'émis par la ou les source(s) de la scène à la position d'interpolation. Par exemple, un auditeur se situant à la position d'interpolation et écoutant le champ interpolé a l'impression que le champ qui lui est restitué est cohérent avec le ou les source(s) sonore(s) (i.e. que le champ qui lui est restitué arrive effectivement en direction de la ou des source(s) sonore(s) en question).More particularly, the proposed method takes advantage of the encoding of the sound field in a form giving access to the vector of pressure gradients in addition to the pressure. In this way, the vector of pressure gradients of the interpolated field remains consistent with that of the sound field as emitted by the source(s) of the scene at the interpolation position. For example, a listener located at the interpolation position and listening to the interpolated field has the impression that the field which is restored to him is coherent with the sound source(s) (i.e. that the field which is is reproduced actually arrives in the direction of the sound source(s) in question).

Par ailleurs, l'utilisation d'une puissance estimée du champ sonore à la position d'interpolation pour estimer les facteurs de pondération permet de conserver une complexité calculatoire faible. Ceci permet par exemple une implémentation en temps-réel sur des dispositifs à capacité de calcul limitée.Furthermore, the use of an estimated power of the sound field at the interpolation position to estimate the weighting factors makes it possible to keep computational complexity low. This allows for example a real-time implementation on devices with limited computing capacity.

Selon un mode de réalisation, l'estimation met en oeuvre une résolution de l'équation $\sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) x_{i} (t) = \hat{W_{a}^{2}} (t) x_{a} (t)$

, avec :

x_i (t) un vecteur représentatif de la position du microphone d'indice i parmi les N microphones ;
x_a (t) un vecteur représentatif de la position d'interpolation ;
$\hat{W_{a}^{2}} (t)$
l'estimation de la puissance du champ sonore à la position d'interpolation ; et
$\hat{W_{ι}^{2}} (t)$
une estimation de la puissance instantanée $W_{i}^{2} (t)$
de la pression captée par le microphone d'indice i.

According to one embodiment, the estimation implements a resolution of the equation

\sum_{I} {has}_{I} (you) \hat{W_{ι}^{}} (you) x_{I} (you) = \hat{W_{has}^{2}} (you) x_{has} (you)

, with :

x _i ( t ) a vector representative of the position of the microphone of index i among the N microphones;
x _a ( t ) a vector representative of the interpolation position;
$\hat{W_{has}^{2}} (you)$
estimating the strength of the sound field at the interpolation position; and
$\hat{W_{ι}^{}} (you)$
an estimate of the instantaneous power $W_{I}$ $_{2} (you)$
the pressure picked up by the microphone of index i.

Par exemple, l'équation en question est résolue au sens de la minimisation de l'erreur quadratique moyenne, e.g. par minimisation de la fonction de coût ${‖ \sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) x_{i} (t) - \hat{W_{a}^{2}} (t) x_{a} (t) ‖}^{2}$

. En pratique, la méthode de résolution (e.g. l'algorithme du Simplexe) est choisie en fonction du caractère surdéterminé (plus d'équations que de microphones) ou sous-déterminé (plus de microphones que d'équations).For example, the equation in question is solved in the sense of minimizing the root mean square error, eg by minimizing the cost function

{‖ \sum_{I} {has}_{I} (you) \hat{W_{ι}^{}} (you) x_{I} (you) - \hat{W_{has}^{2}} (you) x_{has} (you) ‖}^{2}

. In practice, the resolution method (eg the Simplex algorithm) is chosen according to whether it is overdetermined (more equations than microphones) or underdetermined (more microphones than equations).

Selon un mode de réalisation, la résolution est effectuée sous la contrainte que $\sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) = \hat{W_{a}^{2}} (t)$

.According to one embodiment, the resolution is carried out under the constraint that

\sum_{I} {has}_{I} (you) \hat{W_{ι}^{}} (you) = \hat{W_{has}^{2}} (you)

.

Selon un mode de réalisation, la résolution est en outre effectuée sous la contrainte que les N facteurs de pondération a_i (t) sont tous positifs ou nuls.According to one embodiment, the resolution is also performed under the constraint that the N weighting factors a _i ( t ) are all positive or zero.

Ainsi les inversions de phases sont évitées, conduisant par là-même à des résultats améliorés. Par ailleurs, la résolution de l'équation précitée se trouve accélérée.Thus phase inversions are avoided, thereby leading to improved results. Moreover, the resolution of the aforementioned equation is accelerated.

Selon un mode de réalisation, l'estimation met également en oeuvre une résolution de l'équation $α \sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) = α \hat{W_{a}^{2}} (t)$

, avec α un facteur d'homogénéisation.According to one embodiment, the estimation also implements a resolution of the equation

α \sum_{I} {has}_{I} (you) \hat{W_{ι}^{}} (you) = α \hat{W_{has}^{2}} (you)

, with α a homogenization factor.

Selon un mode de réalisation, le facteur d'homogénéisation α est proportionnel à la norme L-2 du vecteur x_a (t). According to one embodiment, the homogenization factor α is proportional to the L-2 norm of the vector x _a ( t ) .

Selon un mode de réalisation, l'estimation comprend :

un moyennage temporel de ladite puissance instantanée $W_{i}^{2} (t)$
sur une durée temporelle prédéterminée délivrant ladite estimation $W_{ι}$ $\hat{_{2}} (t)$
; ou
un filtrage autorégressif d'échantillons temporels de ladite puissance instantanée $W_{i}^{2} (t)$
, délivrant ladite estimation $W_{ι}$ $\hat{_{2}} (t)$
.

According to one embodiment, the estimate includes:

a temporal average of said instantaneous power $W_{I}$ $_{2} (you)$
over a predetermined time duration delivering said estimate $W_{ι}$ $\hat{_{2}} (you)$
; Where
autoregressive filtering of time samples of said instantaneous power $W_{I}^{2} (you)$
, delivering said estimate $W_{ι}$ $\hat{_{2}} (you)$
.

Ainsi, en utilisant la puissance efficace, les variations de la puissance instantanée $W_{i}^{2} (t)$

sont lissées dans le temps. De la sorte, le bruit pouvant entacher les facteurs de pondération est réduit lors de leur estimation. Le champ sonore interpolé se trouve ainsi plus stable.Thus, using the effective power, the variations of the instantaneous power

W_{I}^{2} (you)

are smoothed over time. In this way, the noise that could taint the weighting factors is reduced during their estimation. The interpolated sound field is thus more stable.

Selon un mode de réalisation, l'estimation $\hat{W_{a}^{2}} (t)$

de la puissance du champ sonore à la position d'interpolation est estimée à partir de la puissance sonore instantanée

W_{i}^{2} (t)

captée par celui des N microphones le plus proche de la position d'interpolation ou à partir de l'estimation

\hat{W_{ι}^{2}} (t)

de la puissance sonore instantanée

W_{i}^{2} (t)

captée par celui des N microphones le plus proche de la position d'interpolation.According to one embodiment, the estimate

\hat{W_{has}^{2}} (you)

of the sound field power at the interpolation position is estimated from the instantaneous sound power

W_{I}^{2} (you)

picked up by the one of the N microphones closest to the interpolation position or from the estimate

W_{ι}

\hat{_{2}} (you)

instant sound power

W_{I}^{2} (you)

picked up by the one of the N microphones closest to the interpolation position.

Selon un mode de réalisation, l'estimation $\hat{W_{a}^{2}} (t)$

de la puissance du champ sonore à la position d'interpolation est estimée à partir d'un barycentre des N puissances sonores instantanées

W_{i}^{2} (t)

captées par les N microphones, respectivement à partir d'un barycentre des N estimations

\hat{W_{ι}^{2}} (t)

des N puissances sonores instantanées

W_{i}^{2} (t)

captées par les N microphones. Un coefficient pondérant la puissance sonore instantanée

W_{i}^{2} (t)

, respectivement pondérant l'estimation

\hat{W_{ι}^{2}} (t)

de la puissance sonore instantanée

W_{i}^{2} (t)

captée par le microphone d'indice i, dans le barycentre est inversement proportionnel à une version normalisée de la distance entre la position du microphone d'indice i délivrant la pression W_i (t) et la position d'interpolation. La distance est exprimée au sens d'une norme L-p.According to one embodiment, the estimate

\hat{W_{has}^{2}} (you)

of the sound field power at the interpolation position is estimated from a barycenter of the N instantaneous sound powers

W_{I}^{2} (you)

picked up by the N microphones, respectively from a barycenter of the N estimates

\hat{W_{ι}^{}} (you)

N instantaneous sound powers

W_{I}^{2} (you)

picked up by the N microphones. A coefficient weighting the instantaneous sound power

W_{I}^{2} (you)

, respectively weighting the estimate

W_{ι}

\hat{_{2}} (you)

instant sound power

W_{I}^{2} (you)

picked up by the microphone of index i , in the barycenter is inversely proportional to a normalized version of the distance between the position of the microphone of index i delivering the pressure W _i ( t ) and the interpolation position. The distance is expressed in the sense of an L- p norm.

Ainsi, la pression du champ sonore à la position d'interpolation est estimée de manière précise sur la base des pressions délivrés par les microphones. Notamment, lorsque p est choisi égal à deux, la loi de décroissance de la pression du champ sonore est respectée, conduisant à de bons résultats quelle que soit la configuration de la scène.Thus, the sound field pressure at the interpolation position is accurately estimated based on the pressures delivered by the microphones. In particular, when p is chosen equal to two, the law of decreasing pressure of the sound field is respected, leading to good results whatever the configuration of the stage.

Selon un mode de réalisation, le procédé d'interpolation comprend en outre, préalablement à l'interpolation, une sélection des N microphones parmi Nt microphones, Nt>N.According to one embodiment, the interpolation method further comprises, prior to the interpolation, a selection of the N microphones from among Nt microphones, Nt>N.

Ainsi, les facteurs de pondération peuvent être obtenus via un système d'équations déterminé ou surdéterminé, permettant ainsi d'éviter ou du moins minimiser les changements de timbres perceptibles à l'oreille sur le champ sonore interpolé.Thus, the weighting factors can be obtained via a determined or overdetermined system of equations, thus making it possible to avoid or at least minimize the changes of timbres perceptible to the ear on the interpolated sound field.

Selon un mode de réalisation, les N microphones sélectionnés sont les plus proches de la position d'interpolation parmi les Nt microphones.According to one embodiment, the N microphones selected are the closest to the interpolation position among the Nt microphones.

Selon un mode de réalisation, la sélection comprend :

une sélection de deux microphones d'indices i₁ et i₂ les plus proches de ladite position d'interpolation parmi lesdits Nt microphones ;
un calcul d'un vecteur médian u₁₂(t) ayant pour origine ladite position d'interpolation et pointant entre les positions des deux microphones d'indices i₁ et i₂ ; et
une détermination d'un troisième microphone d'indices i₃ différent desdits deux microphones d'indices i₁ et i₂ parmi les Nt microphones et dont la position est la plus à l'opposé du vecteur médian u₁₂(t).

According to one embodiment, the selection includes:

a selection of two microphones of indices i ₁ and i ₂ closest to said interpolation position from among said Nt microphones;
a calculation of a median vector u ₁₂ (t) originating from said interpolation position and pointing between the positions of the two microphones of indices i ₁ and i ₂ ; and
a determination of a third microphone of indices i ₃ different from said two microphones of indices i ₁ and i ₂ among the Nt microphones and whose position is furthest away from the median vector u ₁₂ (t).

Ainsi, les microphones sont sélectionnés de manière à être répartis autour de la position d'interpolation.Thus, the microphones are selected so as to be distributed around the interpolation position.

Selon un mode de réalisation, le vecteur médian u₁₂(t) s'exprime comme $u_{12} (t) = \frac{(x_{i_{2}} (t) - x_{a} (t) + x_{i_{1}} (t) - x_{a} (t))}{‖ x_{i_{2}} (t) - x_{a} (t) + x_{i_{1}} (t) - x_{a} (t) ‖}$

, avec x_a (t) le vecteur représentatif de la position d'interpolation, x _{i 1} (t) un vecteur représentatif de la position du microphone d'indice i₁, et x_i2 (t) un vecteur représentatif de la position du microphone d'indice i₂ . L'indice i₃ du troisième microphone est un indice différent de i₁ et i₂ qui minimise le produit scalaire

〈 u_{12} (t), \frac{x_{i} (t) - x_{a} (t)}{‖ x_{i} (t) - x_{a} (t) ‖} 〉

parmi les Nt indices de microphones.According to one embodiment, the median vector u ₁₂ (t) is expressed as

a_{12} (you) = \frac{(x_{I_{2}} (you) - x_{has} (you) + x_{I_{1}} (you) - x_{has} (you))}{‖ x_{I_{2}} (you) - x_{has} (you) + x_{I_{1}} (you) - x_{has} (you) ‖}

, with x _a ( t ) the representative vector of the interpolation position, x _i ₁ (t) a vector representative of the position of the microphone of index i ₁ , and x _i ₂ ( t ) a vector representative of the position of the microphone of index i ₂ . The index i ₃ of the third microphone is an index different from i ₁ and i ₂ which minimizes the scalar product

〈 a_{12} (you), \frac{x_{I} (you) - x_{has} (you)}{‖ x_{I} (you) - x_{has} (you) ‖} 〉

among the Nt microphone indices.

Selon un mode de réalisation, le procédé d'interpolation comprend en outre, pour un champ sonore encodé donné parmi les N champs sonores encodés délivrés par les N microphones, une transformation du champ sonore encodé donné par application d'un banc de filtres à reconstruction parfaite délivrant M composantes fréquentielles de champ associées au champ sonore encodé donné, chaque composante fréquentielle de champ parmi les M composantes fréquentielles de champ étant localisée dans une sous-bande fréquentielle distincte. La transformation répétée pour les N champs sonores encodés délivre N jeux correspondants de M composantes fréquentielles de champ. Pour une sous-bande fréquentielle donnée parmi les M sous-bandes fréquentielles, l'interpolation délivre une composante fréquentielle de champ interpolée en la position d'interpolation et localisée dans la sous-bande fréquentielle donnée, la composante fréquentielle de champ interpolée s'exprime comme une combinaison linéaire des N composantes fréquentielles de champ, parmi les N jeux, localisées dans la sous-bande fréquentielle donnée. L'interpolation répétée pour les M sous-bandes fréquentielles délivre M composantes fréquentielles de champ interpolées en la position d'interpolation, chaque composante fréquentielle de champ interpolée parmi les M composantes fréquentielles de champ interpolées étant localisée dans une sous-bande fréquentielle distincte.According to one embodiment, the interpolation method further comprises, for a given encoded sound field among the N encoded sound fields delivered by the N microphones, a transformation of the given encoded sound field by applying a bank of reconstruction filters perfect delivering M field frequency components associated with the given encoded sound field, each field frequency component among the M field frequency components being located in a distinct frequency sub-band. The repeated transformation for the N encoded sound fields delivers N corresponding sets of M field frequency components. For a given frequency sub-band among the M frequency sub-bands, the interpolation delivers a frequency component of field interpolated at the interpolation position and localized in the given frequency sub-band, the interpolated field frequency component is expressed as a linear combination of the N field frequency components, among the N sets, located in the given frequency sub-band. The repeated interpolation for the M frequency sub-bands delivers M interpolated field frequency components at the interpolation position, each interpolated field frequency component among the M interpolated field frequency components being located in a distinct frequency sub-band.

Ainsi, les résultats sont améliorés dans le cas où le champ sonore est généré par une pluralité de sources sonores.Thus, the results are improved in the case where the sound field is generated by a plurality of sound sources.

Selon un mode de réalisation, le procédé d'interpolation comprend en outre une transformation inverse à ladite transformation. La transformation inverse appliquée aux M composantes fréquentielles de champ interpolées délivre le champ sonore encodé interpolé en la position d'interpolation.According to one embodiment, the interpolation method further comprises a transformation inverse to said transformation. The inverse transformation applied to the M interpolated field frequency components delivers the interpolated encoded sound field at the interpolation position.

Selon un mode de réalisation, le banc de filtres à reconstruction parfaite appartient au groupe comprenant :

DFT (de l'anglais « Discrète Fourier Transform ») ;
QMF (de l'anglais « Quadrature Mirror Filter ») ;
PQMF (de l'anglais « Pseudo - Quadrature Mirror Filter ») ; et
MDCT (de l'anglais « Modified Discrète Cosine Transform »).

According to one embodiment, the bank of filters with perfect reconstruction belongs to the group comprising:

DFT (from the English “Discrete Fourier Transform”);
QMF (from the English “Quadrature Mirror Filter”);
PQMF (from the English “Pseudo-Quadrature Mirror Filter”); and
MDCT (from the English “Modified Discrete Cosine Transform”).

L'invention concerne également un procédé de restitution d'un champ sonore. Un tel procédé comprend :

une captation du champ sonore par une pluralité de N microphones délivrant chacun un champ sonore capté correspondant ;
un encodage de chacun des champs sonores captés délivrant un champ sonore encodé correspondant sous une forme comprenant au moins une pression captée et un vecteur de gradients de pression associé ;
une phase interpolation mettant en oeuvre le procédé d'interpolation décrit ci-dessus (selon l'un quelconque des modes de réalisation précités) délivrant le champ sonore encodé interpolé en la position d'interpolation ;
une compression du champ sonore encodé interpolé délivrant un champ sonore encodé interpolé compressé ;
une transmission du champ sonore encodé interpolé compressé à au moins un dispositif de restitution ;
une décompression du champ sonore encodé interpolé compressé reçu ; et
une restitution du champ sonore encodé interpolé sur ledit au moins un dispositif de restitution.

The invention also relates to a method for restoring a sound field. Such a process includes:

a capture of the sound field by a plurality of N microphones each delivering a corresponding captured sound field;
encoding each of the sensed sound fields delivering a corresponding encoded sound field in a form comprising at least one sensed pressure and an associated pressure gradient vector;
an interpolation phase implementing the interpolation method described above (according to any one of the aforementioned embodiments) delivering the interpolated encoded sound field at the interpolation position;
compression of the interpolated encoded sound field delivering a compressed interpolated encoded sound field;
a transmission of the compressed interpolated encoded sound field to at least one rendering device;
decompressing the received compressed interpolated encoded sound field; and
a restitution of the interpolated encoded sound field on said at least one restitution device.

L'invention concerne également un programme d'ordinateur, comprenant des instructions de code de programme pour la mise en oeuvre d'un procédé d'interpolation ou de restitution tel que décrit précédemment, selon l'un quelconque de ses différents modes de réalisation, lorsque ledit programme est exécuté par un processeur.The invention also relates to a computer program, comprising program code instructions for the implementation of an interpolation or restitution method as described previously, according to any one of its various embodiments, when said program is executed by a processor.

Dans un autre mode de réalisation de l'invention, il est proposé un dispositif d'interpolation d'un champ sonore capté par une pluralité de N microphones délivrant chacun le champ sonore encodé sous une forme comprenant au moins une pression captée et un vecteur de gradients de pression associé. Un tel dispositif d'interpolation comprend une machine de calcul reprogrammable ou une machine de calcul dédiée, apte à et configurée pour mettre en oeuvre les étapes du procédé d'interpolation décrit précédemment (selon l'un quelconque de ses différents modes de réalisation).In another embodiment of the invention, there is proposed a device for interpolating a sound field picked up by a plurality of N microphones each delivering the sound field encoded in a form comprising at least one pressure picked up and a vector of associated pressure gradients. Such an interpolation device comprises a reprogrammable calculation machine or a dedicated calculation machine, capable of and configured to implement the steps of the interpolation method described previously (according to any one of its various embodiments).

Ainsi, les caractéristiques et avantages de ce dispositif sont les mêmes que ceux du procédé d'interpolation décrit précédemment. Par conséquent, ils ne sont pas détaillés plus amplement.Thus, the characteristics and advantages of this device are the same as those of the interpolation method described above. Therefore, they are not further detailed.

Liste des figuresList of Figures

D'autres buts, caractéristiques et avantages de l'invention apparaîtront plus clairement à la lecture de la description suivante, donnée à titre de simple exemple illustratif, et non limitatif, en relation avec les figures, parmi lesquelles :

[ fig. 1 ] représente une scène sonore dans laquelle un auditeur se déplace, un champ sonore ayant été diffusé par des sources sonores et ayant été capté par des microphones ;
[ fig. 2 ] représente les étapes d'un procédé d'interpolation du champ sonore capté par les microphones de la [ fig. 1 ] selon un mode de réalisation de l'invention ;
[ fig. 3a ] représente une scène dans laquelle un champ sonore est diffusé par une source sonore unique et est capté par quatre microphones selon une première configuration ;
[ fig. 3b ] représente une cartographie de l'opposé de l'intensité acoustique normalisée dans le plan 2D générée par la source sonore de la scène de la [ fig. 3a ] ainsi qu'une cartographie de l'opposé de l'intensité acoustique normalisée tel qu'estimée par une méthode connue à partir des grandeurs captées par les quatre microphones de la [ fig. 3a ] ;
[ fig. 3c ] représente une cartographie de l'opposé de l'intensité acoustique normalisée dans le plan 2D générée par la source sonore de la scène de la [ fig. 3a ] ainsi qu'une cartographie de l'opposé de l'intensité acoustique normalisée tel qu'estimée par la méthode de la figure [ fig. 2 ] à partir des grandeurs captées par les quatre microphones de la [ fig. 3a ] ;
[ fig. 4a ] représente une autre scène dans laquelle un champ sonore est diffusé par une source sonore unique et est capté par quatre microphones selon une deuxième configuration ;
[ fig. 4b ] représente une cartographie de l'opposé de l'intensité acoustique normalisée dans le plan 2D générée par la source sonore de la scène de la [ fig. 4a ] ainsi qu'une cartographie de l'opposé de l'intensité acoustique normalisée du champ sonore tel qu'estimée par une méthode connue à partir des grandeurs captées par les quatre microphones de la [ fig. 4a ] ;
[ fig. 4c ] représente une cartographie de l'opposé de l'intensité acoustique normalisée dans le plan 2D générée par la source sonore de la scène de la [ fig. 4a ] ainsi qu'une cartographie de l'opposé de l'intensité acoustique normalisée du champ sonore tel qu'estimée par la méthode de la figure [ fig. 2 ] à partir des grandeurs captées par les quatre microphones de la [ fig. 4a ] ;
[ fig. 5 ] représente les étapes d'un procédé d'interpolation du champ sonore capté par les microphones de la [ fig. 1 ] selon un autre mode de réalisation de l'invention;
[ fig. 6 ] représente les étapes d'un procédé de restitution, à l'auditeur de la [ fig. 1 ], du champ sonore capté par les microphones de la [ fig. 1 ] selon un mode de réalisation de l'invention ;
[ fig. 7 ] représente un exemple de structure de dispositif d'interpolation selon un mode de réalisation de l'invention.

Other aims, characteristics and advantages of the invention will appear more clearly on reading the following description, given by way of a simple illustrative example, and not limiting, in relation to the figures, among which:

[ fig. 1 ] represents a sound scene in which a listener moves, a sound field having been diffused by sound sources and having been picked up by microphones;
[ fig. 2 ] represents the stages of a process of interpolation of the sound field picked up by the microphones of the [ fig. 1 ] according to one embodiment of the invention;
[ fig. 3a ] represents a scene in which a sound field is broadcast by a single sound source and is picked up by four microphones according to a first configuration;
[ fig. 3b ] represents a map of the opposite of the normalized acoustic intensity in the 2D plane generated by the sound source of the scene of the [ fig. 3a ] as well as a map of the opposite of the normalized acoustic intensity as estimated by a known method from the quantities picked up by the four microphones of the [ fig. 3a ];
[ fig. 3c ] represents a map of the opposite of the normalized acoustic intensity in the 2D plane generated by the sound source of the scene of the [ fig. 3a ] as well as a map of the opposite of the normalized acoustic intensity as estimated by the method of figure [ fig. 2 ] from the quantities picked up by the four microphones of the [ fig. 3a ];
[ fig. 4a ] shows another scene in which a sound field is broadcast by a single sound source and is picked up by four microphones according to a second configuration;
[ fig. 4b ] represents a map of the opposite of the normalized acoustic intensity in the 2D plane generated by the sound source of the scene of the [ fig. 4a ] as well as a map of the opposite of the normalized acoustic intensity of the sound field as estimated by a known method from the quantities picked up by the four microphones of the [ fig. 4a ];
[ fig. 4c ] represents a map of the opposite of the normalized acoustic intensity in the 2D plane generated by the sound source of the scene of the [ fig. 4a ] as well as a map of the opposite of the normalized acoustic intensity of the sound field as estimated by the method of figure [ fig. 2 ] from the quantities picked up by the four microphones of the [ fig. 4a ];
[ fig. 5 ] represents the stages of a process of interpolation of the sound field picked up by the microphones of the [ fig. 1 ] according to another embodiment of the invention;
[ fig. 6 ] represents the stages of a restitution process, to the listener of the [ fig. 1 ], the sound field picked up by the microphones of the [ fig. 1 ] according to one embodiment of the invention;
[ fig. 7 ] represents an example of an interpolation device structure according to one embodiment of the invention.

Description détaillée de modes de réalisation de l'inventionDetailed Description of Embodiments of the Invention

Sur toutes les figures du présent document, les éléments et étapes identiques sont désignés par une même référence.In all the figures of this document, the identical elements and steps are designated by the same reference.

Le principe général de l'invention repose sur l'encodage du champ sonore par les microphones captant le champ sonore en question sous une forme comprenant au moins une pression captée et un gradient de pression associé. De la sorte, le gradient de pression du champ interpolé via une combinaison linéaire des champs sonores encodés par les microphones reste cohérent avec celui du champ sonore tel qu'émis par la ou les source(s) de la scène à la position d'interpolation. Par ailleurs, la méthode selon l'invention base l'estimation des facteurs de pondération mis en jeu dans la combinaison linéaire en question sur une estimation de la puissance du champ sonore à la position d'interpolation. Ainsi, une complexité calculatoire faible est obtenue.The general principle of the invention is based on the encoding of the sound field by the microphones picking up the sound field in question in a form comprising at least one pressure picked up and an associated pressure gradient. In this way, the pressure gradient of the field interpolated via a linear combination of the sound fields encoded by the microphones remains consistent with that of the sound field as emitted by the source(s) of the scene at the interpolation position. . Moreover, the method according to the invention bases the estimation of the weighting factors involved in the linear combination in question on an estimation of the power of the sound field at the interpolation position. Thus, a low computational complexity is obtained.

On s'intéresse dans la suite à décrire un exemple particulier d'application de l'invention au contexte de la navigation d'un auditeur dans une scène sonore. On notera que l'invention n'est bien sûr pas limitée à ce type d'application et peut être avantageusement utilisée dans d'autres domaines comme celui de la restitution d'une scène multicanal, de la compression d'une scène multicanal, etc.We are interested in the following to describe a particular example of application of the invention to the context of the navigation of a listener in a sound scene. It will be noted that the invention is of course not limited to this type of application and can be advantageously used in other fields such as that of the restitution of a multi-channel scene, the compression of a multi-channel scene, etc.

Par ailleurs, dans la présente demande :

la terminologie d'encodage (ou de codage) est utilisée pour désigner l'opération de représentation d'un champ sonore physique capté par un microphone donné suivant une ou plusieurs grandeurs selon un format de représentation prédéfini. Un tel format est par exemple le format ambisonique décrit ci-dessus en relation avec la section « Art antérieur et ses inconvénients ». L'opération inverse s'apparente alors à une restitution du champs sonore, e.g. sur un dispositif du type haut-parleur qui convertit des échantillons du champs sonore au format de représentation prédéfini en un champs acoustique physique ; et
la terminologie de compression est quant à elle utilisée pour désigner un traitement visant à réduire la quantité de données nécessaire à représenter une quantité d'information donnée. Il s'agit par exemple d'un traitement du type « codage entropique » (e.g. selon le standard MP3) appliqué aux échantillons du champ sonore encodé. La terminologie de décompression correspond ainsi à l'opération inverse.

Further, in this application:

the terminology of encoding (or coding) is used to designate the operation of representing a physical sound field picked up by a given microphone according to one or more quantities according to a predefined representation format. Such a format is for example the ambisonic format described above in relation to the section “Prior art and its disadvantages”. The inverse operation is then similar to a restitution of the sound field, eg on a device of the loudspeaker type which converts samples of the sound field in the predefined representation format into a physical acoustic field; and
the terminology of compression is itself used to designate a processing aimed at reducing the quantity of data necessary to represent a given quantity of information. This is for example a processing of the “entropic coding” type (eg according to the MP3 standard) applied to the samples of the encoded sound field. The decompression terminology thus corresponds to the reverse operation.

On présente désormais en relation avec la [ fig. 1 ] une scène sonore 100 dans laquelle un auditeur 110 se déplace, un champ sonore ayant été diffusé par des sources sonores 100s et ayant été capté par des microphones 100m.We now present in relation to the [ fig. 1 ] a sound scene 100 in which a listener 110 moves, a sound field having been broadcast by sound sources 100s and having been picked up by microphones 100m.

Plus particulièrement, l'auditeur 110 est muni d'un casque équipé de haut-parleurs 110hp permettant la restitution du champ sonore interpolé à la position d'interpolation qu'il occupe. Il s'agit par exemple d'un casque Hi-Fi, ou d'un casque de réalité virtuelle comme l'Oculus, le HTC Vive ou encore le Samsung Gear. Le champ sonore est ici interpolé et restitué par mise en oeuvre du procédé de restitution décrit ci-dessous en relation avec la [fig. 6].More particularly, the listener 110 is provided with a helmet equipped with 110hp loudspeakers allowing the restitution of the interpolated sound field at the interpolation position which it occupies. This is for example a Hi-Fi headset, or a virtual reality headset such as the Oculus, the HTC Vive or the Samsung Gear. The sound field is here interpolated and restored by implementing the restitution process described below in relation to the [ fig. 6 ].

Par ailleurs, le champ sonore capté par les microphones 100m est encodé sous une forme comprenant une pression captée et un gradient de pression associé.Furthermore, the sound field picked up by the 100m microphones is encoded in a form comprising a picked up pressure and an associated pressure gradient.

Dans d'autres modes de réalisation non illustrés, le champ sonore capté par les microphones est encodé sous une forme comprenant la pression captée, le vecteur des gradients de pression associé ainsi que tout ou partie des composantes d'ordres supérieurs du champ sonore au format ambisonique.In other embodiments not illustrated, the sound field picked up by the microphones is encoded in a form comprising the pressure picked up, the vector of the associated pressure gradients as well as all or part of the higher order components of the sound field in the format ambisonic.

De retour à la [fig. 1], la perception de la direction d'arrivée du front d'onde du champ sonore est directement corrélée avec un vecteur intensité acoustique I (t) qui mesure le flux instantané d'énergie acoustique à travers une surface élémentaire. Le vecteur intensité en question est égal au produit de la pression acoustique instantanée W(t) par la vélocité particulaire, qui est opposée au vecteur des gradients de pression B (t). Ce vecteur des gradients de pression peut être exprimé en 2D ou 3D selon que l'on désire se déplacer et/ou percevoir les sons en 2D ou 3D. Dans la suite, on se place dans le cas 3D, la dérivation du cas 2D étant immédiate. Dans ce cas le vecteur des gradients s'exprime comme un vecteur de dimension 3 : B (t) = [X(t) Y(t) Z(t)] ^T . Ainsi, dans le formalisme considéré où le champ sonore est encodé sous une forme comprenant la pression captée et le vecteur des gradients de pression associé (à un coefficient multiplicateur près) : $\vec{I} (t) = - W (t) [\begin{matrix} X (t) \\ Y (t) \\ Z (t) \end{matrix}] .$

Back to the [ fig. 1 ], the perception of the direction of arrival of the sound field wavefront is directly correlated with an acoustic intensity vector I ( t ) which measures the instantaneous flux of acoustic energy through an elementary surface. The intensity vector in question is equal to the product of the instantaneous acoustic pressure W(t) by the particle velocity, which is opposed to the vector of the pressure gradients B (t). This vector of gradients pressure can be expressed in 2D or 3D depending on whether one wishes to move and/or perceive sounds in 2D or 3D. In the following, we place ourselves in the 3D case, the derivation of the 2D case being immediate. In this case the gradient vector is expressed as a 3-dimensional vector: B ( t ) = [X( t ) Y ( t ) Z ( t )] ^T . Thus, in the formalism considered where the sound field is encoded in a form comprising the pressure captured and the vector of the associated pressure gradients (within a multiplier coefficient):

\vec{I} (you) = - W (you) [\begin{matrix} X (you) \\ Y (you) \\ Z (you) \end{matrix}] .

On montre que ce vecteur est orthogonal au front d'onde et pointe dans la direction de la propagation de l'onde sonore, soit à l'opposé de la position de la source émettrice : en ce sens, il est directement corrélé avec la perception du front d'onde. C'est particulièrement évident si on considère un champ généré par une seule source ponctuelle et lointaine s(t) se propageant dans un milieu anéchoïque. La théorie de l'ambinosie stipule que, pour une telle onde plane d'incidence (θ, ϕ), où θ est l'azimut et ϕ l'élévation, le champ sonore au premier ordre est donné par l'équation suivante : ${\begin{matrix} W (t) & = & s (t) \\ X (t) & = & \cos θ \cos φ s (t) \\ Y (t) & = & \sin θ \cos φ s (t) \\ Z (t) & = & \sin φ s (t) \end{matrix} .$

We show that this vector is orthogonal to the wave front and points in the direction of the propagation of the sound wave, i.e. opposite the position of the emitting source: in this sense, it is directly correlated with the perception of the wavefront. This is particularly evident if we consider a field generated by a single distant point source s(t) propagating in an anechoic medium. Ambinosia theory states that, for such a plane wave of incidence (θ , ϕ), where θ is the azimuth and ϕ the elevation, the first-order sound field is given by the following equation:

{\begin{matrix} W (you) & = & s (you) \\ X (you) & = & \cos θ \cos φ s (you) \\ Y (you) & = & \sin θ \cos φ s (you) \\ Z (you) & = & \sin φ s (you) \end{matrix} .

Dans ce cas, l'intensité acoustique pleine-bande I (t) est égale (à un coefficient multiplicateur près), à : $\vec{I} (t) = - [\begin{matrix} \cos θ \cos φ \\ \sin θ \cos φ \\ \sin φ \end{matrix}] s^{2} (t) .$

In this case, the full-band sound intensity I ( t ) is equal (to within a multiplier coefficient), to:

\vec{I} (you) = - [\begin{matrix} \cos θ \cos φ \\ \sin θ \cos φ \\ \sin φ \end{matrix}] s^{2} (you) .

On voit donc qu'il pointe à l'opposé de la direction de la source émettrice et la direction d'arrivée (θ, ϕ) du front d'onde peut être estimée par les relations trigonométriques suivantes : ${\begin{matrix} θ = & \arctan (\frac{WY}{WX}) \\ φ = & \arctan (\frac{WZ}{\sqrt[2]{{(WX)}^{2} + {(WY)}^{2}}}) \end{matrix} .$

We can therefore see that it points opposite to the direction of the emitting source and the direction of arrival (θ , ϕ) of the wave front can be estimated by the following trigonometric relations:

{\begin{matrix} θ = & \arctan (\frac{WY}{WX}) \\ φ = & \arctan (\frac{WZ}{(WX)}) (\sqrt[2]{{()}^{2} + {(WY)}^{2}}) \end{matrix} .

On présente désormais, en relation avec la [ fig. 2 ], un procédé d'interpolation du champ sonore capté par les microphones 100m de la scène 100 selon un mode de réalisation de l'invention.We now present, in relation to the [ fig. 2 ], a method for interpolating the sound field picked up by the microphones 100m of the stage 100 according to one embodiment of the invention.

Un tel procédé comprend une étape E200 de sélection de N microphones parmi les Nt microphones de la scène 100. On notera que dans le mode de réalisation représenté sur la [fig. 1], Nt=4. Cependant, dans d'autres modes de réalisation non illustrés, la scène considérée peut comprendre un nombre Nt différent de microphones.Such a method comprises a step E200 of selecting N microphones from among the Nt microphones of the stage 100. It will be noted that in the embodiment shown in [ fig. 1 ], Nt=4. However, in other embodiments not illustrated, the scene considered may comprise a different number Nt of microphones.

Plus particulièrement, comme discuté ci-dessous en relation avec les étapes E210 et E210a, le procédé selon l'invention met en oeuvre la résolution de systèmes d'équations (i.e. [Math 4] dans différentes alternatives de contraintes (i.e. hyperplan et/ou positivité des facteurs de pondération) et [Math 5]). En pratique, il s'avère que la résolution des systèmes en question dans le cas où ils sont sous-déterminés (cas qui correspond à la configuration où il y a plus de microphones 100m que d'équations à résoudre) amène à des solutions qui, au cours du temps, peuvent privilégier différents ensembles de microphones. Si la localisation des sources 100s telle que perçue via le champ sonore interpolé reste toujours cohérente, il en résulte cependant des changements de timbres perceptibles à l'oreille. Ces différences sont dues : i) à la coloration de la réverbération qui est différente d'un microphone 100m à un autre ; ii) au filtrage en peigne induit par le mélange de microphones 100m non coïncidents, filtrage qui a des caractéristiques différentes d'un ensemble de microphones à un autre.More particularly, as discussed below in relation to steps E210 and E210a, the method according to the invention implements the resolution of systems of equations (i.e. [Math 4] in different constraint alternatives (i.e. hyperplane and/or positivity of the weighting factors) and [Math 5]). In practice, it turns out that the resolution of the systems in question in the case where they are under-determined (case which corresponds to the configuration where there are more 100m microphones than equations to be solved) leads to solutions which , over time, may favor different sets of microphones. If the localization of the sources 100s as perceived via the interpolated sound field always remains coherent, it results however from it changes of timbres perceptible to the ear. These differences are due to: i) the coloring of the reverberation which is different from one 100m microphone to another; ii) the comb filtering induced by the mixing of non-coincident 100m microphones, filtering which has different characteristics from one set of microphones to another.

Pour éviter de tels changements de timbres, on sélectionne N microphones 100m en se ramenant à un mélange déterminé, voire sur-déterminé. Par exemple, dans le cas d'une interpolation en 3D, on pourra sélectionner jusqu'à trois microphones parmi les Nt microphones 100m.To avoid such changes in timbre, N 100m microphones are selected by reducing ourselves to a determined, or even over-determined, mixture. For example, in the case of a 3D interpolation, it is possible to select up to three microphones among the Nt microphones 100m.

Dans une variante, on sélectionne les N microphones 110m les plus proches de la position à interpoler. Cette solution est à privilégier lorsqu'un grand nombre Nt de microphones 110m est présent dans la scène. Cependant, dans certains cas, le choix des N microphones 110m les plus proches peut s'avérer « déséquilibré » en regard de la position d'interpolation par rapport à la source 100s et amener à une inversion complète de la direction d'arrivée : c'est notamment le cas lorsque la source 100s est placée entre les microphones 100m et la position d'interpolation.In a variant, the N microphones 110m closest to the position to be interpolated are selected. This solution is to be preferred when a large number Nt of 110m microphones is present in the scene. However, in some cases, the choice of the N closest 110m microphones may turn out to be "unbalanced" with regard to the interpolation position with respect to the 100s source and lead to a complete inversion of the direction of arrival: c This is notably the case when the 100s source is placed between the 100m microphones and the interpolation position.

Pour éviter cette situation, dans une autre variante les N microphones sont choisis de manière répartie autour de la position d'interpolation. Par exemple, on sélectionne les deux microphones d'indices i₁ et i₂ les plus proches de la position d'interpolation parmi les Nt microphones 100m, puis on cherche parmi les microphones restants celui qui maximise « l'enveloppement » de la position d'interpolation. Pour réaliser ceci, l'étape E200 comprend par exemple :

une sélection de deux microphones d'indices i₁ et i₂ les plus proches de la position d'interpolation parmi les Nt microphones 110m ;
un calcul d'un vecteur médian u₁₂(t) ayant pour origine la position d'interpolation et pointant entre les positions des deux microphones d'indices i₁ et i₂ ; et
une détermination d'un troisième microphone d'indices i₃ différent des deux microphones d'indices i₁ et i₂ parmi les Nt microphones 110m et dont la position est la plus à l'opposé du vecteur médian u₁₂(t).

To avoid this situation, in another variant the N microphones are chosen in a distributed manner around the interpolation position. For example, we select the two microphones of indices i ₁ and i ₂ closest to the interpolation position among the Nt microphones 100m, then we search among the remaining microphones for the one that maximizes the “envelopment” of the position d 'interpolation. To achieve this, step E200 comprises for example:

a selection of two microphones of indices i ₁ and i ₂ closest to the interpolation position from among the Nt microphones 110m;
a calculation of a median vector u ₁₂ (t) originating from the interpolation position and pointing between the positions of the two microphones of indices i ₁ and i ₂ ; and
a determination of a third microphone of indices i ₃ different from the two microphones with indices i ₁ and i ₂ among the Nt microphones 110m and whose position is furthest away from the median vector u ₁₂ (t).

Par exemple, le vecteur médian u₁₂(t) s'exprime comme : $u_{12} (t) = \frac{(x_{i_{2}} (t) - x_{a} (t) + x_{i_{1}} (t) - x_{a} (t))}{‖ x_{i_{2}} (t) - x_{a} (t) + x_{i_{1}} (t) - x_{a} (t) ‖}$

avec :

x_a (t) = (x_a (t) y_a (t) z_a (t)) ^T un vecteur représentatif de la position d'interpolation (i.e. la position de l'auditeur 110 dans le mode de réalisation représenté sur la [fig. 1]) ;
x _{i 1}(t) = (x _{i 1}(t) y _{i 1}(t) z _{i 1}(t)) ^T un vecteur représentatif de la position du microphone d'indice i₁ ; et
x _{i 2}(t) = (x _{i 2}(t) y _{i 2}(t) z _{i 2}(t)) ^T un vecteur représentatif de la position du microphone d'indice i₂ ,

les vecteurs en question étant exprimés dans un repère donné.For example, the median vector u ₁₂ (t) is expressed as:

a_{12} (you) = \frac{(x_{I_{2}} (you) - x_{has} (you) + x_{I_{1}} (you) - x_{has} (you))}{‖ x_{I_{2}} (you) - x_{has} (you) + x_{I_{1}} (you) - x_{has} (you) ‖}

with :

x _a ( t ) = ( x _a ( t ) y _a ( t ) z _a ( t )) ^T a vector representative of the interpolation position (ie the position of the listener 110 in the embodiment represented on the [ fig. 1 ]);
x _i ₁ ( t ) = ( x _i ₁ ( t ) y _i ₁ ( t ) z _i ₁ ( t )) ^T a vector representative of the position of the microphone of index i ₁ ; and
x _i ₂ ( t ) = ( x _i ₂ ( t ) y _i ₂ ( t ) z _i ₂ ( t )) ^T a vector representative of the position of the microphone of index i ₂ ,

the vectors in question being expressed in a given frame.

Dans ce cas, l'indice i₃ dudit troisième microphone est par exemple un indice différent de i₁ et i₂ qui minimise le produit scalaire $〈 u_{12} (t), \frac{x_{i} (t) - x_{a} (t)}{‖ x_{i} (t) - x_{a} (t) ‖} 〉$

parmi les Nt indices de microphones 100m. En effet, le produit scalaire en question varie entre -1 et +1, et il est minimal lorsque les vecteurs u₁₂(t) et

\frac{x_{i} (t) - x_{a} (t)}{‖ x_{i} (t) - x_{a} (t) ‖}

sont opposés, c'est-à-dire lorsque les 3 microphones sélectionnés parmi les Nt microphones 110m entourent la position d'interpolation.In this case, the index i ₃ of said third microphone is for example an index different from i ₁ and i ₂ which minimizes the scalar product

〈 a_{12} (you), \frac{x_{I} (you) - x_{has} (you)}{‖ x_{I} (you) - x_{has} (you) ‖} 〉

among the Nt indices of microphones 100m. Indeed, the scalar product in question varies between -1 and +1, and it is minimal when the vectors u ₁₂ (t) and

\frac{x_{I} (you) - x_{has} (you)}{‖ x_{I} (you) - x_{has} (you) ‖}

are opposite, that is to say when the 3 microphones selected from the Nt microphones 110m surround the interpolation position.

Dans d'autres modes de réalisation non illustrés sur la [fig. 2], l'étape E200 de sélection n'est pas mise en oeuvre et les étapes E210 et E210a décrites ci-dessous sont mises en oeuvre sur la base des champs sonores encodés par l'ensemble des Nt microphones 100m. En d'autres termes, N=Nt pour la mise en oeuvre des étapes E210 et E210a dans les autres modes de réalisation en question.In other embodiments not illustrated in [ fig. 2 ], the selection step E200 is not implemented and the steps E210 and E210a described below are implemented on the basis of the sound fields encoded by all of the Nt microphones 100m. In other words, N=Nt for the implementation of steps E210 and E210a in the other embodiments in question.

De retour à la [fig. 2], le procédé comprend une étape E210 d'interpolation du champ sonore en une position d'interpolation, délivrant un champ sonore encodé interpolé s'exprimant comme une combinaison linéaire des N champs sonores encodés par les N microphones 100m sélectionnés, les N champs sonores encodés étant pondérés chacun par un facteur de pondération correspondant.Back to the [ fig. 2 ], the method comprises a step E210 of interpolating the sound field at an interpolation position, delivering an interpolated encoded sound field expressed as a linear combination of the N sound fields encoded by the N selected microphones 100m, the N sound fields encoded, each being weighted by a corresponding weighting factor.

Ainsi, dans le mode de réalisation discuté ci-dessus en relation avec la [fig. 1], dans lequel le champ sonore capté par les N microphones 100m sélectionnés est encodé sous une forme comprenant une pression captée et le vecteur des gradients de pression associé, on peut écrire la combinaison linéaire des N champs sonores encodés sous la forme : $(\begin{matrix} W_{a} (t) \\ X_{a} (t) \\ Y_{a} (t) \\ Z_{a} (t) \end{matrix}) = \sum_{i} a_{i} (t) (\begin{matrix} W_{i} (t) \\ X_{i} (t) \\ Y_{i} (t) \\ Z_{i} (t) \end{matrix}),$

avec :

(W_i (t) X_i (t) Y_i (t) Z_i (t)) ^T le vecteur colonne du champ au format encodé délivré par le microphone d'indice i, i un entier de 1 à N ;
(W_a (t) X_a (t) Y_a (t) Z_a (t)) ^T le vecteur colonne du champ au format encodé au niveau de la position d'interpolation (e.g. la position de l'auditeur 110 dans le mode de réalisation illustré sur la [fig. 1]) ; et
a_i (t) le facteur de pondération pondérant le champ au format encodé délivré par le microphone d'indice i dans la combinaison linéaire donnée par [Math 1].

Thus, in the embodiment discussed above in connection with the [ fig. 1 ], in which the sound field picked up by the N selected 100m microphones is encoded in a form comprising a picked up pressure and the vector of the associated pressure gradients, the linear combination of the N encoded sound fields can be written in the form:

(\begin{matrix} W_{has} (you) \\ X_{has} (you) \\ Y_{has} (you) \\ Z_{has} (you) \end{matrix}) = \sum_{I} {has}_{I} (you) (\begin{matrix} W_{I} (you) \\ X_{I} (you) \\ Y_{I} (you) \\ Z_{I} (you) \end{matrix}),

with :

( W _i ( t ) X _i ( t ) Y _i ( t ) Z _i ( t )) ^T the column vector of the field in the encoded format delivered by the microphone of index i, i an integer from 1 to N;
( W _a ( t ) X _a ( t ) Y _a ( t ) Z _a ( t )) ^T the column vector of the field in the encoded format at the level of the interpolation position (eg the position of the listener 110 in the embodiment illustrated in [ fig. 1 ]); and
a _i ( t ) the weighting factor weighting the field in encoded format delivered by the microphone of index i in the linear combination given by [Math 1].

Dans d'autres modes de réalisation non illustrés sur la [fig. 1] où le champ sonore capté par les microphones est encodé sous une forme comprenant la pression captée, le vecteur des gradients de pression associé ainsi que tout ou partie des composantes d'ordres supérieurs du champ sonore décomposé au format ambisonique, la combinaison linéaire donnée par [Math 1] se réécrit de manière plus générale comme : $(\begin{matrix} W_{a} (t) \\ X_{a} (t) \\ Y_{a} (t) \\ Z_{a} (t) \\ ⋮ \end{matrix}) = \sum_{i} a_{i} (t) (\begin{matrix} W_{i} (t) \\ X_{i} (t) \\ Y_{i} (t) \\ Z_{i} (t) \\ ⋮ \end{matrix}),$

où les pointillés désignent les composantes d'ordres supérieurs du champ sonore décomposé au format ambisonique.In other embodiments not illustrated in [ fig. 1 ] where the sound field picked up by the microphones is encoded in a form comprising the picked up pressure, the vector of the associated pressure gradients as well as all or part of the higher order components of the sound field decomposed in ambisonic format, the linear combination given by [Math 1] can be rewritten more generally as:

(\begin{matrix} W_{has} (you) \\ X_{has} (you) \\ Y_{has} (you) \\ Z_{has} (you) \\ ⋮ \end{matrix}) = \sum_{I} {has}_{I} (you) (\begin{matrix} W_{I} (you) \\ X_{I} (you) \\ Y_{I} (you) \\ Z_{I} (you) \\ ⋮ \end{matrix}),

where the dotted lines designate the higher-order components of the sound field decomposed in Ambisonic format.

Quel que soit le mode de réalisation considéré pour l'encodage du champ sonore, le procédé d'interpolation selon l'invention s'applique de la même manière afin d'estimer les facteurs de pondération a_i (t).Whatever the embodiment considered for the encoding of the sound field, the interpolation method according to the invention applies in the same way in order to estimate the weighting factors a _i ( t ).

Pour ce faire, le procédé de la [fig. 2] comprend une étape E210a d'estimation des N facteurs de pondération a_i (t) de manière à avoir les gradients de pression estimés à la position d'interpolation, représentés par le vecteur $\hat{B_{a}} = {(\begin{matrix} \hat{X_{a}} (t) & \hat{Y_{a}} (t) & \hat{Z_{a}} (t) \end{matrix})}^{T}$

, qui soient cohérents relativement à la position des sources 100s présentes dans la scène sonore 100.To do this, the process of the [ fig. 2 ] comprises a step E210a of estimating the N weighting factors a _i ( t ) so as to have the pressure gradients estimated at the interpolation position, represented by the vector

\hat{B_{has}} = {(\begin{matrix} \hat{X_{has}} (you) & \hat{Y_{has}} (you) & \hat{Z_{has}} (you) \end{matrix})}^{T}

, which are consistent with respect to the position of the sources 100s present in the sound scene 100.

Plus particulièrement, dans le mode de réalisation de la [fig. 2], on suppose qu'une seule des sources 100s est active en même temps. En effet, dans ce cas et tant que la réverbération est suffisamment contenue, le champ capté en tout point de la scène 100 peut être assimilé à une onde plane. De la sorte, les composantes d'ordre un (i.e. les gradients de pression) sont inversement proportionnelles à la distance entre la source 100s active et le point de mesure, e.g. le microphone 100m d'indice i, et pointe depuis la source 100s active vers le microphone 100m d'indice i en question. On peut ainsi écrire que le vecteur du gradient de pression capté par le microphone 100m d'indice i vérifie : $B_{i} % \frac{1}{d^{2} (x_{i} (t), x_{s} (t))} (x_{i} (t) - x_{s} (t)) .$

avec :

x_i (t) = (x_i (t) y_i (t) z_i (t)) ^T un vecteur représentatif de la position du microphone 100m d'indice i ;
x_s (t) = (x_s (t) y_s (t) z_s (t)) ^T un vecteur représentatif de la position de la source 100s active ; et
d(x_i (t), x _s (t)) est la distance entre le microphone 100m d'indice i et la source 100s active.

More particularly, in the embodiment of the [ fig. 2 ], it is assumed that only one of the sources 100s is active at the same time. Indeed, in this case and as long as the reverberation is sufficiently contained, the field captured at any point of the scene 100 can be likened to a plane wave. In this way, the first-order components (ie the pressure gradients) are inversely proportional to the distance between the active 100s source and the measurement point, eg the 100m microphone of index i, and point from the active 100s source to microphone 100m of index i in question. We can thus write that the vector of the pressure gradient picked up by the microphone 100m of index i verifies:

B_{I} % \frac{1}{d^{2} (x_{I} (you), x_{s} (you))} (x_{I} (you) - x_{s} (you)) .

with :

x _i ( t ) = ( x _i ( t ) y _i ( t ) z _i ( t )) ^T a vector representative of the position of the microphone 100m of index i ;
x _s ( t ) = ( x _s ( t ) y _s ( t ) z _s ( t )) ^T a vector representative of the position of the active 100s source; and
d ( _xi ( t ), xs(t ₎ ) is the distance between the 100m microphone of index i and the active 100s source.

L'équation [Math 2] traduit ici simplement que pour une onde plane :

La composante du premier ordre (i.e. le vecteur des gradients de pression) du champ sonore encodé est orienté dans la direction « source-point de captation » ; et
L'amplitude du champ sonore décroit linéairement avec la distance.

The equation [Math 2] translates here simply that for a plane wave:

The first-order component (ie the vector of pressure gradients) of the encoded sound field is oriented in the “source-point of capture” direction; and
The amplitude of the sound field decreases linearly with distance.

La distance d(x_i (t),x_s (t)) est a priori inconnue, mais on peut observer que, dans l'hypothèse d'une onde plane unique, la pression acoustique instantanée W_i (t) au niveau du microphone 100m d'indice i est, elle-aussi, inversement proportionnelle à cette distance. Ainsi : $W_{i} (t) % \frac{1}{d (x_{i} (t), x_{s} (t))}$

The distance d ( x _i ( t ), x _s ( t )) is a priori unknown, but it can be observed that, in the hypothesis of a single plane wave, the instantaneous acoustic pressure W _i ( t ) at the level of the microphone 100m of index i is also inversely proportional to this distance. Thereby :

W_{I} (you) % \frac{1}{d (x_{I} (you), x_{s} (you))}

En substituant cette relation dans [Math 2], on obtient la relation de proportionnalité suivante : $B_{i} % W_{i}^{2} (x_{i} (t) - x_{s} (t))$

By substituting this relation in [Math 2], we obtain the following proportionality relation:

B_{I} % W_{I}^{2} (x_{I} (you) - x_{s} (you))

En remplaçant la relation cette dernière relation dans [Math 1], on obtient l'équation suivante : $\sum_{i} a_{i} (t) W_{i}^{2} (t) (x_{i} (t) - x_{s} (t)) = W_{a}^{2} (t) (x_{a} (t) - x_{s} (t)),$

avec x_a (t) = (x_a (t) y_a (t) z_a (t)) ^T un vecteur représentatif de la position d'interpolation dans le repère précité. En réorganisant, on obtient :

\sum_{i} a_{i} (t) W_{i}^{2} (t) x_{i} (t) - W_{a}^{2} (t) x_{a} (t) = (\sum_{i} a_{i} (t) W_{i}^{2} (t) - W_{a}^{2} (t)) x_{s} (t) .

By replacing the relation this last relation in [Math 1], we obtain the following equation:

\sum_{I} {has}_{I} (you) W_{I}^{2} (you) (x_{I} (you) - x_{s} (you)) = W_{has}^{2} (you) (x_{has} (you) - x_{s} (you)),

with x _a ( t ) = ( x _a ( t ) y _a ( t ) z _a ( t )) ^T a vector representative of the interpolation position in the aforementioned reference frame. Rearranging, we get:

\sum_{I} {has}_{I} (you) W_{I}^{2} (you) x_{I} (you) - W_{has}^{2} (you) x_{has} (you) = (\sum_{I} {has}_{I} (you) W_{I}^{2} (you) - W_{has}^{2} (you)) x_{s} (you) .

En général, les différentes positions précitées (e.g. de la source 100s active, des microphones 100m, de la position d'interpolation, etc.) varient dans le temps. Ainsi les facteurs de pondération a_i (t) sont de manière générale fonction du temps. Estimer les facteurs de pondération a_i (t) revient à résoudre un système de trois équations linéaires (écrites ci-dessus sous forme d'une seule équation vectorielle dans [Math 3]). Pour que l'interpolation reste cohérente au cours du temps avec la position d'interpolation qui peut varier dans le temps (e.g. si la position en question correspond à la position de l'auditeur 110 qui est amené à se déplacer), elle est réalisée à différents instants avec une résolution temporelle T_a adaptée à la vitesse de changement de la position d'interpolation. En pratique, une fréquence de rafraîchissement f_a = $\frac{1}{T_{a}}$

est largement inférieure à la fréquence d'échantillonnage f_s des signaux acoustiques. Par exemple, une mise-à-jour des coefficients d'interpolation a_i (t) toutes les T_a = 100ms est tout à fait suffisante.In general, the different positions mentioned above (eg of the active 100s source, of the 100m microphones, of the interpolation position, etc.) vary over time. So the factors of weighting a _i ( t ) are generally a function of time. Estimating the weighting factors a _i ( t ) amounts to solving a system of three linear equations (written above as a single vector equation in [Math 3]). For the interpolation to remain consistent over time with the interpolation position which may vary over time (eg if the position in question corresponds to the position of the listener 110 who is required to move), it is carried out at different instants with a temporal resolution T _a adapted to the rate of change of the interpolation position. In practice, a refresh rate f _a =

\frac{1}{T_{has}}

is much lower than the sampling frequency f _s of the acoustic signals. For example, an update of the interpolation coefficients a _i ( t ) every T _a = 100 ms is quite sufficient.

Dans [Math 3], le carré de la pression sonore au niveau de la position d'interpolation, $W_{a}^{2} (t)$

, également appelée puissance acoustique instantanée (ou plus simplement puissance instantanée), est une inconnue, ainsi que le vecteur représentatif de la position x_s (t) de la source 100s active.In [Math 3], the square of the sound pressure at the interpolation position,

W_{has}^{2} (you)

, also called instantaneous acoustic power (or more simply instantaneous power), is an unknown, as well as the representative vector of the position x _s ( t ) of the active 100s source.

Afin de pouvoir estimer les facteurs de pondération a_i (t) sur la base d'une résolution de [Math 3], une estimation $\hat{W_{a}^{2}} (t)$

de la puissance acoustique au niveau de la position d'interpolation est par exemple obtenue.In order to be able to estimate the weighting factors a _i ( t ) based on a resolution of [Math 3], an estimate

\hat{W_{has}^{2}} (you)

acoustic power at the interpolation position is for example obtained.

Une première approche consiste à approcher la puissance acoustique instantanée par celle captée par le microphone 100m le plus proche de la position d'interpolation en question, i.e. : $\hat{W_{a}^{2}} (t) = W_{k}^{2} (t), o ù k = \arg (\min_{i} (d (x_{i} (t), x_{a} (t)))) .$

A first approach consists in approximating the instantaneous acoustic power by that picked up by the microphone 100m closest to the interpolation position in question, i.e.:

\hat{W_{has}^{2}} (you) = W_{k}^{} (you), oh where k = \arg (\min_{I} (d (x_{I} (you), x_{has} (you)))) .

En pratique, la puissance acoustique instantanée $W_{k}^{2} (t)$

peut varier rapidement au cours du temps, ce qui peut amener à une estimation bruitée des facteurs de pondération a_i (t) et à une instabilité de la scène interpolée. Ainsi, dans des variantes on calcule la puissance moyenne ou efficace captée par le microphone 100m le plus proche de la position d'interpolation sur une fenêtre temporelle autour de l'instant considéré, en moyennant la puissance instantanée sur une trame de Téchantillons :

\hat{W_{ι}^{2}} (t) = \frac{1}{T} \sum_{n = t - T}^{t} W_{i}^{2} (n),

où T correspond à une durée de quelques dizaines de millisecondes, ou encore être égale à la résolution temporelle du rafraîchissement des facteurs de pondération a_i (t).In practice, the instantaneous sound power

W_{k}

_{2} (you)

can vary rapidly over time, which can lead to a noisy estimation of the weighting factors a _i ( t ) and to an instability of the interpolated scene. Thus, in variants, the average or effective power picked up by the microphone 100m closest to the interpolation position is calculated over a time window around the instant considered, by averaging the instantaneous power over a frame of Tsamples:

\hat{W_{ι}^{2}} (you) = \frac{1}{T} \sum_{not = you - T}^{you} W_{I}^{2} (not),

where T corresponds to a duration of a few tens of milliseconds, or else to be equal to the temporal resolution of the refreshing of the weighting factors a _i ( t ).

Dans d'autres variantes, on peut estimer la puissance efficace par lissage autorégressif de la forme : $\hat{W_{ι}^{2}} (t) = α_{w} \hat{W_{ι}^{2}} (t - 1) + (1 - α_{w}) W_{i}^{2} (t),$

où le facteur d'oubli α_w est déterminé de telle manière à intégrer la puissance sur quelques dizaines de millisecondes. En pratique, des valeurs de 0.95 à 0.98 pour des fréquences d'échantillonnage du signal allant de 8kHz à 48kHz réalise un bon compromis entre la robustesse de l'interpolation et sa réactivité aux changements de position de la source.In other variants, the effective power can be estimated by autoregressive smoothing of the form:

\hat{W_{ι}^{}} (you) = α_{w} W_{ι} \hat{_{2}} (you - 1) + (1 - α_{w}) W_{I}^{2} (you),

where the forgetting factor α _w is determined in such a way as to integrate the power over a few tens of milliseconds. In practice, values of 0.95 to 0.98 for signal sampling frequencies ranging from 8kHz to 48kHz achieve a good compromise between the robustness of the interpolation and its reactivity to source position changes.

Dans une deuxième approche, la puissance acoustique instantanée $W_{a}^{2} (t)$

au niveau de la position d'interpolation est estimée comme un barycentre des N estimations

\hat{W_{ι}^{2}} (t)

des N puissances instantanées

W_{i}^{2} (t)

des N pressions captées par les N microphones 100m sélectionnés. Une telle approche s'avère plus pertinente lorsque les microphones 100m sont espacés les uns des autres. Par exemple, on détermine les coefficients barycentriques en fonction de la distance ∥x_i (t) - x_a (t)∥ _p, où p est un réel positif et ∥.∥ _p est la norme L-p, entre la position d'interpolation et le microphone 110m d'indice i parmi les N microphones 100m. Ainsi, selon cette deuxième approche :

{\begin{matrix} \hat{W_{a}^{2}} (t) = \sum_{i} \frac{\hat{W_{ι}^{2}} (t)}{\tilde{d} (x_{i} (t), x_{a} (t))} \\ \tilde{d} (x_{i} (t), x_{a} (t)) = \frac{{‖ x_{i} (t) - x_{a} (t) ‖}_{p}}{\sum_{j} {‖ x_{j} (t) - x_{a} (t) ‖}_{p}} \end{matrix}

où d̃(x_i (t), x_a (t)) est la version normalisée de ∥x_i (t) - x_a (t)∥ _p telle que Σ _i d̃(x_i (t), x_a (t)) = 1. Ainsi, un coefficient pondérant l'estimation

\hat{W_{ι}^{2}} (t)

de la puissance instantanée

W_{i}^{2} (t)

de la pression captée par le microphone 110m d'indice i, dans l'expression barycentrique ci-dessus est inversement proportionnel à une version normalisée de la distance, au sens de la norme L-p, entre la position du microphone d'indice i délivrant la pression W_i (t) et la position d'interpolation.In a second approach, the instantaneous acoustic power

W_{has}^{2} (you)

at the interpolation position is estimated as a barycenter of the N estimates

\hat{W_{ι}^{}} (you)

of the N instantaneous powers

W_{I}^{2} (you)

of the N pressures captured by the N microphones 100m selected. Such an approach is more relevant when the microphones are 100m apart from each other. For example, one determines the barycentric coefficients according to the distance ∥ x _i ( t ) - x _a ( t )∥ _p , where p is a positive real and ∥.∥ _p is the Lp norm, between the interpolation position and the microphone 110m of index i among the N microphones 100m. Thus, according to this second approach:

{\begin{matrix} \hat{W_{has}^{2}} (you) = \sum_{I} \frac{\hat{W_{ι}^{2}} (you)}{\tilde{d} (x_{I} (you), x_{has} (you))} \\ \tilde{d} (x_{I} (you), x_{has} (you)) = \frac{{‖ x_{I} (you) - x_{has} (you) ‖}_{p}}{\sum_{I} {‖ x_{I} (you) - x_{has} (you) ‖}_{p}} \end{matrix}

where d̃ ( x _i ( t ), x _a ( t )) is the normalized version of ∥ x _i ( t ) - x _a ( t )∥ _p such that Σ _i d̃ ( x _i ( t ), x _a ( t )) = 1. Thus, a coefficient weighting the estimate

W_{ι}

\hat{_{2}} (you)

instant power

W_{I}^{2} (you)

of the pressure picked up by the microphone 110m of index i , in the barycentric expression above is inversely proportional to a normalized version of the distance, within the meaning of the Lp standard, between the position of the microphone of index i delivering the pressure W _i ( t ) and the interpolation position.

Dans des alternatives, la puissance acoustique instantanée $W_{a}^{2} (t)$

au niveau de la position d'interpolation est estimée directement comme un barycentre des N puissances instantanées

W_{i}^{2} (t)

des N pressions captées par les N microphones 100m. En pratique, cela revient à substituer

W_{i}^{2} (t)

à

\hat{W_{ι}^{2}} (t)

dans l'équation ci-dessus.In alternatives, the instantaneous sound power

W_{has}^{2} (you)

at the interpolation position is estimated directly as a barycenter of the N instantaneous powers

W_{I}^{2} (you)

of the N pressures picked up by the N microphones 100m. In practice, this amounts to substituting

W_{I}^{2} (you)

at

\hat{W_{ι}^{2}} (you)

in the equation above.

Par ailleurs, différents choix de la norme p peuvent être envisagés. Par exemple une valeur faible de p tend à moyenner la puissance sur toute la zone délimitée par les microphones 100m, tandis qu'une valeur élevée tend à privilégier le microphone 100m le plus proche de la position d'interpolation, le cas p = ∞ revenant à l'estimation par celle du microphone 100m le plus proche. Par exemple, lorsque p est choisi égal à deux, la loi de décroissance de la pression du champ sonore est respectée, conduisant à de bons résultats quelle que soit la configuration de la scène.Moreover, different choices of the norm p can be envisaged. For example, a low value of p tends to average the power over the entire area bounded by the 100m microphones, while a high value tends to favor the 100m microphone closest to the interpolation position, the case p = ∞ returning to the estimate by that of the nearest 100m microphone. For example, when p is chosen equal to two, the law of decreasing sound field pressure is respected, leading to good results whatever the configuration of the stage.

Par ailleurs, l'estimation des facteurs de pondération a_i (t) sur la base d'une résolution de [Math 3] nécessite d'adresser la problématique de la non connaissance du vecteur représentatif de la position x_s (t) de la source 100s active.Moreover, the estimation of the weighting factors a _i ( t ) on the basis of a resolution of [Math 3] requires addressing the problem of not knowing the representative vector of the position x _s ( t ) of the source 100s active.

Dans une première variante, on estime les facteurs de pondération a_i (t) en négligeant le terme contenant la position de la source que l'on ne connaît pas, i.e. le membre de droite dans [Math 3]. Par ailleurs, à partir de l'estimation de la puissance $\hat{W_{a}^{2}} (t)$

et de l'estimation

\hat{W_{ι}^{2}} (t)

de la puissance instantanée

W_{i}^{2} (t)

captée par les microphones 100m, une telle négligence du membre de droite de [Math 3] revient à résoudre le système de trois équations linéaires suivant, écrit ici sous forme vectorielle :

\sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) x_{i} = \hat{W_{a}^{2}} (t) x_{a} (t) .

In a first variant, we estimate the weighting factors a _i ( t ) by neglecting the term containing the position of the source that we do not know, ie the right-hand side in [Math 3]. Moreover, from the estimate of the power

\hat{W_{has}^{2}} (you)

and estimate

W_{ι}

\hat{_{2}} (you)

instant power

W_{I}^{2} (you)

captured by the 100m microphones, such neglect of the right-hand side of [Math 3] amounts to solving the following system of three linear equations, written here in vector form:

\sum_{I} {has}_{I} (you) \hat{W_{ι}^{2}} (you) x_{I} = \hat{W_{has}^{2}} (you) x_{has} (you) .

Ainsi, il apparaît que les facteurs de pondération a_i (t) sont estimés à partir :

de la position d'interpolation, représentée par le vecteur x_a (t) ;
de la position de chacun des N microphones 100m, représentée par le vecteur x_i (t) correspondant, i de 1 à N, dans le repère précité;
des N pressions W_i (t), i de 1 à N, captées par les N microphones ; et
de la puissance estimée $\hat{W_{a}^{2}} (t)$
du champ sonore à la position d'interpolation, $\hat{W_{ι}^{}} (t)$
étant effectivement estimée à partir des grandeurs en question comme décrit ci-dessus.

Thus, it appears that the weighting factors a _i ( t ) are estimated from:

the interpolation position, represented by the vector x _a ( t );
the position of each of the N microphones 100m, represented by the corresponding vector x _i ( t ), i from 1 to N, in the aforementioned frame;
N pressures W _i ( t ), i from 1 to N, picked up by the N microphones; and
estimated power $\hat{W_{has}^{2}} (you)$
from the sound field to the interpolation position, $\hat{W_{ι}^{}} (you)$
being actually estimated from the quantities in question as described above.

Par exemple, [Math 4] est résolue au sens de la minimisation de l'erreur quadratique moyenne, e.g. par minimisation de la fonction de coût ${‖ \sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) x_{i} (t) - \hat{W_{a}^{2}} (t) x_{a} (t) ‖}^{2}$

. En pratique, la méthode de résolution (e.g. l'algorithme du Simplexe) est choisie en fonction du caractère surdéterminé (plus d'équations que de microphones) ou sous-déterminé (plus de microphones que d'équations).For example, [Math 4] is solved in the sense of minimizing the root mean square error, eg by minimizing the cost function

{‖ \sum_{I} {has}_{I} (you) \hat{W_{ι}^{2}} (you) x_{I} (you) - \hat{W_{has}^{2}} (you) x_{has} (you) ‖}^{2}

Dans une deuxième variante, on estime les facteurs de pondération a_i (t) non plus en négligeant le terme contenant la position de la source que l'on ne connaît pas, i.e. le membre de droite de [Math 3], mais en contraignant la recherche des coefficients a_i (t) autour de l'hyperplan $\sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) = \hat{W_{a}^{2}} (t)$

. En effet, dans le cas où l'estimation

{\hat{W}}_{a}^{2} (t)

est une estimation fiable de la puissance réelle

W_{a}^{2} (t)

, imposer que les coefficients a_i (t) respectent « au mieux » la relation

\sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) = \hat{W_{a}^{2}} (t)

implique que le membre de droite dans [Math 3] est faible, et par suite toute solution qui résout le système d'équations [Math 4] reconstruit correctement les gradients de pression.In a second variant, we estimate the weighting factors a _i ( t ) no longer by neglecting the term containing the position of the source that we do not know, ie the right-hand side of [Math 3], but by constraining the search for the coefficients a _i ( t ) around the hyperplane

\sum_{I} {has}_{I} (you) \hat{W_{ι}^{2}} (you) = \hat{W_{has}^{2}} (you)

. Indeed, in the case where the estimate

{\hat{W}}_{has}^{2} (you)

is a reliable estimate of the real power

W_{has}^{2} (you)

, impose that the coefficients a _i ( t ) respect “at best” the relation

\sum_{I} {has}_{I} (you) \hat{W_{ι}^{2}} (you) = \hat{W_{has}^{2}} (you)

implies that the right-hand side in [Math 3] is weak, and hence any solution that solves the system of equations [Math 4] correctly reconstructs the pressure gradients.

Ainsi, dans cette deuxième variante, les facteurs de pondération a_i (t) sont estimés par résolution du système [Math 4] sous la contrainte que $\sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) = \hat{W_{a}^{2}} (t)$

. Dans le système en question,

\hat{W_{ι}^{2}} (t)

et

\hat{W_{a}^{2}} (t)

sont par exemple estimés selon une des variantes proposées ci-dessus. En pratique, la résolution d'un tel système linéaire sous contrainte linéaire peut être réalisée par l'algorithme du Simplexe ou tout autre algorithme de minimisation sous contrainte.Thus, in this second variant, the weighting factors a _i ( t ) are estimated by solving the system [Math 4] under the constraint that

\sum_{I} {has}_{I} (you) \hat{W_{ι}^{2}} (you) = \hat{W_{has}^{2}} (you)

. In the system in question,

\hat{W_{ι}^{}} (you)

and

\hat{W_{has}^{2}} (you)

are for example estimated according to one of the variants proposed above. In practice, the resolution of such a linear system under linear constraint can be carried out by the Simplex algorithm or any other minimization algorithm under constraint.

Pour accélérer la recherche, on peut ajouter une contrainte de positivité des facteurs de pondération a_i (t). Dans ce cas, les facteurs de pondération a_i (t) sont estimés par résolution du système [Math 4] sous la double contrainte que $\sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) = \hat{W_{a}^{2}} (t)$

, et que Vi,a_i (t) ≥ 0. Par ailleurs, la contrainte de positivité des facteurs de pondération a_i permet d'éviter les inversions de phases, conduisant par là-même à des résultats d'estimation améliorés.To speed up the search, we can add a constraint of positivity of the weighting factors a _i ( t ). In this case, the weighting factors a _i ( t ) are estimated by solving the system [Math 4] under the double constraint that

\sum_{I} {has}_{I} (you) \hat{W_{ι}^{2}} (you) = \hat{W_{has}^{2}} (you)

, and that Vi, a _i ( t ) ≥ 0. Furthermore, the positivity constraint of the weighting factors a _i makes it possible to avoid phase inversions, thereby leading to improved estimation results.

Alternativement, afin de réduire le temps de calcul, une autre implémentation consiste à intégrer directement la contrainte de l'hyperplan $\sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) = \hat{W_{a}^{2}} (t)$

dans le système [Math 4], ce qui revient au final à la résolution du système linéaire :

{\begin{matrix} \sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) x_{i} (t) = \hat{W_{a}^{2}} (t) x_{a} (t) \\ α \sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) = α \hat{W_{a}^{2}} (t) \end{matrix}

Alternatively, in order to reduce the computation time, another implementation consists in directly integrating the constraint of the hyperplane

\sum_{I} {has}_{I} (you) \hat{W_{ι}^{2}} (you) = \hat{W_{has}^{2}} (you)

in the system [Math 4], which ultimately amounts to solving the linear system:

{\begin{matrix} \sum_{I} {has}_{I} (you) \hat{W_{ι}^{2}} (you) x_{I} (you) = \hat{W_{has}^{2}} (you) x_{has} (you) \\ α \sum_{I} {has}_{I} (you) \hat{W_{ι}^{2}} (you) = α \hat{W_{has}^{2}} (you) \end{matrix}

Ici, le coefficient α permet d'homogénéiser les unités des grandeurs $\hat{W_{a}^{2}} (t) x_{a} (t)$

et

\hat{W_{a}^{2}} (t)

. En effet, les grandeurs en question ne sont pas homogènes et, en fonction de l'unité choisie pour les coordonnées de position (mètre, centimètre, ...), les solutions vont privilégier soit l'ensemble d'équation

\sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) x_{i} (t) = \hat{W_{a}^{2}} (t) x_{a} (t)

, soit l'hyperplan

\sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) =

\hat{W_{a}^{2}} (t)

. Afin de rendre ces grandeurs homogènes, le coefficient α est par exemple choisi égal à la norme L-2 du vecteur x_a (t), i.e. α = ∥x_a (t)∥₂, avec

{‖ x_{a} (t) ‖}_{2} = \sqrt[2]{x_{a}^{2} (t) + y_{a}^{2} (t) + z_{a}^{2} (t)}

. En pratique, il peut être intéressant de contraindre plus fortement les coefficients d'interpolation à respecter la contrainte de l'hyperplan

\sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) = \hat{W_{a}^{2}} (t)

. Ce peut être obtenu en pondérant le coefficient α par un facteur d'amplification λ > 1. Les résultats montrent qu'un facteur d'amplification λ de 2 à 10 rend plus robuste la prédiction des gradients de pression.Here, the coefficient α makes it possible to homogenize the units of the quantities

\hat{W_{has}^{2}} (you) x_{has} (you)

and

\hat{W_{has}^{2}} (you)

. Indeed, the magnitudes in question are not homogeneous and, depending on the unit chosen for the position coordinates (meter, centimeter, ...), the solutions will favor either the set of equations

\sum_{I} {has}_{I} (you) \hat{W_{ι}^{2}} (you) x_{I} (you) = \hat{W_{has}^{2}} (you) x_{has} (you)

, i.e. the hyperplane

\sum_{I} {has}_{I} (you) \hat{W_{ι}^{2}} (you) =

\hat{W_{has}^{2}} (you)

. In order to make these quantities homogeneous, the coefficient α is for example chosen equal to the L-2 norm of the vector x _a ( t ), ie α = ∥ x _a ( t )∥ ₂ , with

{‖ x_{has} (you) ‖}_{2} = \sqrt[2]{x_{has}^{2} (you) + {there}_{has}^{2} (you) + z_{has}^{2} (you)}

. In practice, it may be interesting to constrain the interpolation coefficients more strongly to respect the constraint of the hyperplane

\sum_{I} {has}_{I} (you) \hat{W_{ι}^{2}} (you) = \hat{W_{has}^{2}} (you)

. This can be obtained by weighting the coefficient α by an amplification factor λ > 1. The results show that an amplification factor λ of 2 to 10 makes the prediction of pressure gradients more robust.

On note ainsi que dans cette deuxième variante également, les facteurs de pondération a_i (t) sont estimés à partir :

de la position d'interpolation, représentée par le vecteur x_a (t) ;
de la position de chacun des N microphones 100m , représentées chacune par le vecteur x_i (t) correspondant, i de 1 à N;
des N pressions W_i (t), i de 1 à N, captées par les N microphones ; et
de la puissance estimée $\hat{W_{a}^{2}} (t)$
du champ sonore à la position d'interpolation, $\hat{W_{ι}^{}} (t)$
étant effectivement estimée à partir des grandeurs en question comme décrit ci-dessus.

It is thus noted that in this second variant also, the weighting factors a _i ( t ) are estimated from:

the interpolation position, represented by the vector x _a ( t );
the position of each of the N microphones 100m , each represented by the corresponding vector x _i ( t ), i from 1 to N;
N pressures W _i ( t ) , i from 1 to N, picked up by the N microphones; and
estimated power $\hat{W_{has}^{2}} (you)$
from the sound field to the interpolation position, $\hat{W_{ι}^{}} (you)$
being actually estimated from the quantities in question as described above.

On présente désormais, en relation avec les [ fig. 3a], [fig. 3b] et [fig. 3c ] les performances du procédé de la [fig. 2] appliqué à une scène 300 comprenant quatre microphones 300m et une source 300s disposée dans une configuration symétrique par rapport à la scène 300 et aux quatre microphones 300m.We now present, in relation to the [ fig. 3a], [fig. 3b] and [fig. 3c ] the process performance of the [ fig. 2 ] applied to a stage 300 comprising four microphones 300m and a source 300s arranged in a symmetrical configuration with respect to the stage 300 and the four microphones 300m.

Plus particulièrement, les quatre microphones 300m sont disposés aux quatre coins d'une pièce et la source 300s est disposée au centre de la pièce. La pièce présente une réverbération moyenne, avec un temps de réverbération ou T₆₀ d'environ 500ms. Le champ sonore capté par les microphones 300m est encodé sous une forme comprenant une pression captée et le vecteur de gradients de pression associé.More particularly, the four microphones 300m are placed at the four corners of a room and the source 300s is placed in the center of the room. The room has medium reverberation, with a reverberation time or T ₆₀ of around 500ms. The sound field picked up by the 300m microphones is encoded in a form comprising a picked up pressure and the associated pressure gradient vector.

Les résultats obtenus par application du procédé de la [fig. 2] sont comparés avec ceux obtenus par application de la méthode du barycentre proposée dans l'article de conférence de A. Southern, J. Wells et D. Murphy précité et qui présente un coût de calcul d'un même ordre de grandeur. Le calcul des coefficients a_i (t) est adapté en fonction de la distance de la position d'interpolation à la position du microphone 300m d'indice i correspondant : $a_{i} (t) = \frac{{‖ x_{i} (t) - x_{a} (t) ‖}_{5}}{\sum_{k = 1}^{N} {‖ x_{k} (t) - x_{a} (t) ‖}_{5}}$

The results obtained by application of the method of [ fig. 2 ] are compared with those obtained by application of the barycenter method proposed in the aforementioned conference article by A. Southern, J. Wells and D. Murphy and which has a computational cost of the same order of magnitude. The calculation of the coefficients a _i ( t ) is adapted according to the distance from the interpolation position to the position of the microphone 300m of corresponding index i:

{has}_{I} (you) = \frac{{‖ x_{I} (you) - x_{has} (you) ‖}_{5}}{\sum_{k = 1}^{NOT} {‖ x_{k} (you) - x_{has} (you) ‖}_{5}}

Les simulations montrent que cette formule heuristique donne de meilleurs résultats que la méthode avec des poids fixes proposée dans la littérature.The simulations show that this heuristic formula gives better results than the method with fixed weights proposed in the literature.

Pour mesurer la performance de l'interpolation du champ, on utilise le vecteur intensité I (t) qui doit en théorie pointer dans la direction opposée à la source 300s active. Sur les [fig. 3b] et [fig. 3c] sont respectivement tracés les vecteurs intensité normés I (t)/∥ I (t)∥ réels et estimés par la méthode de l'état de l'art et par le procédé de la [fig. 2]. Dans la configuration symétrique de la scène 300, on note un biais plus faible du procédé de la [fig. 2] par rapport à la méthode de l'état de l'art, notamment à la frontière entre deux microphones 300m et en dehors de la zone délimitée par les microphones 300m.To measure the performance of the field interpolation, we use the intensity vector I ( t ) which should in theory point in the opposite direction to the active 300s source. On the [ fig. 3b] and [fig. 3c ] are respectively plotted the normalized intensity vectors I ( t )/∥ I ( t )∥ real and estimated by the method of the state of the art and by the process of [ fig. 2 ]. In the symmetrical configuration of stage 300, we note a lower bias of the process of the [ fig. 2 ] compared to the method of the state of the art, in particular at the border between two microphones 300m and outside the zone delimited by the microphones 300m.

On présente désormais, en relation avec les [ fig. 4a], [fig. 4b] et [fig. 4c ] les performances du procédé de la [fig. 2] appliqué à une scène 400 comprenant quatre microphones 400m et une source 400s disposée dans une configuration non symétrique par rapport à la scène 400 et aux quatre microphones 400m.We now present, in relation to the [ fig. 4a], [fig. 4b] and [fig. 4c ] the process performance of the [ fig. 2 ] applied to a stage 400 comprising four microphones 400m and a source 400s arranged in a configuration that is not symmetrical with respect to the stage 400 and the four microphones 400m.

Plus particulièrement, par rapport à la configuration de la scène 300 de la [fig. 3a], les quatre microphones 400m restent ici disposés aux quatre coins d'une pièce alors que la source 400s est maintenant décentrée par rapport au centre de la pièce.Specifically, compared to the Stage 300 configuration of the [ fig. 3a ], the four 400m microphones remain here positioned at the four corners of a room while the 400s source is now off-center in relation to the center of the room.

Sur les [fig. 4b] et [fig. 4c] sont respectivement tracés les vecteurs intensité normés I (t)/∥ I (t)∥ réels et estimés par la méthode de l'état de l'art et par le procédé de la [fig. 2] pour la configuration de la scène 400. On constate la robustesse de la méthode proposée : le champ sonore interpolé par le procédé de la [fig. 2] est cohérent sur tout l'espace, y compris en dehors de la zone délimitée par les microphones 400m (proches des murs). Au contraire, le champ interpolé par la méthode de l'état de l'art est incohérent sur quasiment la moitié de l'espace de la scène 400 si l'on se réfère à la divergence entre l'intensité acoustique réelle et estimée représentées sur la [fig. 4b].On the [ fig. 4b] and [fig. 4c ] are respectively plotted the normalized intensity vectors I ( t )/∥ I ( t )∥ real and estimated by the method of the state of the art and by the process of [ fig. 2 ] for the configuration of the scene 400. We note the robustness of the proposed method: the sound field interpolated by the process of [ fig. 2 ] is consistent throughout the space, including outside the area delimited by the 400m microphones (close to the walls). On the contrary, the field interpolated by the method of the state of the art is incoherent over almost half of the space of the scene 400 if one refers to the discrepancy between the real and estimated acoustic intensity represented on there [ fig. 4b ].

On présente désormais, en relation avec la [ fig. 5 ], un autre mode de réalisation du procédé d'interpolation du champ sonore capté par les microphones 100m de la scène 100.We now present, in relation to the [ fig. 5 ], another embodiment of the method for interpolating the sound field picked up by the microphones 100m of the stage 100.

Selon le mode de réalisation de la [fig. 5], le procédé comprend l'étape E200 de sélection de N microphones parmi les Nt microphones de la scène 100 décrite ci-dessus en relation avec la [fig. 2].According to the embodiment of the [ fig. 5 ], the method comprises the step E200 of selecting N microphones from among the Nt microphones of the scene 100 described above in relation to the [ fig. 2 ].

Cependant, dans d'autres modes de réalisation non illustrés sur la [fig. 5], l'étape E200 de sélection n'est pas mise en oeuvre et les étapes E500, E210 et E510 discutées ci-dessous, sont mises en oeuvre sur la base des champs sonores encodés par l'ensemble des Nt microphones 100m. En d'autres termes, N=Nt dans ces autres modes de réalisation.However, in other embodiments not shown in [ fig. 5 ], the selection step E200 is not implemented and the steps E500, E210 and E510 discussed below are implemented on the basis of the sound fields encoded by all of the Nt microphones 100m. In other words, N=Nt in these other embodiments.

De retour à la [fig. 5], le mode de réalisation en question se trouve être adapté au cas où plusieurs sources parmi les sources 100s sont actives simultanément. Dans ce cas, l'hypothèse d'un champ pleine bande ressemblant à une onde plane n'est plus valide. En effet, même en milieu anéchoïque, le mélange de deux ondes planes n'est pas une onde plane - sauf dans le cas bien particulier d'une même source émettant depuis 2 points de l'espace équidistants du point de captation. En pratique, la procédure de reconstruction du champ « plein bande » s'adapte à la source prépondérante dans la trame utilisée pour le calcul des puissances efficaces. Ceci produit des variations rapides de directivité, et parfois des incohérences de localisation des sources : lorsqu'une source est plus énergétique qu'une autre, les deux sources en question sont estimées comme étant localisées à la position de la source la plus énergétique.Back to the [ fig. 5 ], the embodiment in question is suitable for the case where several sources among the sources 100s are active simultaneously. In this case, the assumption of a full-band field resembling a plane wave is no longer valid. Indeed, even in an anechoic medium, the mixture of two plane waves is not a plane wave - except in the very specific case of the same source emitting from 2 points in space equidistant from the capture point. In practice, the "full band" field reconstruction procedure adapts to the preponderant source in the frame used for the calculation of the effective powers. This produces rapid variations in directivity, and sometimes source location inconsistencies: when one source is more energetic than another, the two sources in question are estimated to be located at the position of the most energetic source.

Pour éviter ceci, le mode de réalisation de la [fig. 5] exploite la parcimonie des signaux dans le domaine fréquentiel. Pour des signaux de parole par exemple, il est prouvé statistiquement que les supports fréquentiels de plusieurs signaux de parole sont globalement disjoints : c'est à dire que la majorité du temps, une seule source est présente dans chaque bande de fréquence. Le mode de réalisation de la [fig. 2] (selon l'une quelconque des variantes précitées) peut ainsi s'appliquer au signal présent dans chaque bande de fréquences.To avoid this, the embodiment of the [ fig. 5 ] exploits signal sparseness in the frequency domain. For speech signals for example, it is statistically proven that the frequency supports of several speech signals are globally disjoint: that is to say that most of the time, a single source is present in each band of frequency. The embodiment of the [ fig. 2 ] (according to any of the aforementioned variants) can thus be applied to the signal present in each frequency band.

Ainsi, lors d'une étape E500, pour un champ sonore encodé donné parmi les N champs sonores encodés délivrés par les N microphones 100m sélectionnés, une transformation du champ sonore encodé donné est effectuée par application d'une transformation temps-fréquence comme la transformée de Fourier ou d'un banc de filtres à reconstruction parfaite ou presque parfaite, comme les filtres miroirs en quadrature ou QMF. Une telle transformation délivre M composantes fréquentielles de champ associées au champ sonore encodé donné, chaque composante fréquentielle de champ parmi les M composantes fréquentielles de champ étant localisée dans une sous-bande fréquentielle distincte.Thus, during a step E500, for a given encoded sound field from among the N encoded sound fields delivered by the N selected microphones 100m, a transformation of the given encoded sound field is performed by applying a time-frequency transformation such as the transform filter or a bank of filters with perfect or almost perfect reconstruction, such as quadrature mirror filters or QMF. Such a transformation delivers M field frequency components associated with the given encoded sound field, each field frequency component among the M field frequency components being located in a distinct frequency sub-band.

Par exemple, le vecteur de champs encodé, ψ_i , délivré par le microphone d'indice i, i de 1 à N, est segmenté en trames d'indice n, de taille T compatible avec la stationnarité des sources présentes dans la scène : $ψ_{i} (n) = [\begin{matrix} ψ_{i} (t_{n} - T + 1) & ψ_{i} (t_{n} - T + 2) & \dots & ψ_{i} (t_{n}) \end{matrix}] .$

For example, the encoded field vector, ψ _i , delivered by the microphone of index i, i from 1 to N, is segmented into frames of index n, of size T compatible with the stationarity of the sources present in the scene:

ψ_{I} (not) = [\begin{matrix} ψ_{I} ({you}_{not} - T + 1) & ψ_{I} ({you}_{not} - T + 2) & \dots & ψ_{I} ({you}_{not}) \end{matrix}] .

Le rythme trame correspond par exemple au rythme de réactualisation T_a des facteurs de pondération a_i (t), i.e. : $t_{n + 1} = t_{n} + E [T_{a} / T_{s}],$

où Ts = 1/fs est la fréquence d'échantillonnage des signaux et E[.] désigne la partie entière.The frame rate corresponds for example to the updating rate T _a of the weighting factors a _i ( t ), i.e.:

{you}_{not + 1} = {you}_{not} + E [T_{has} / T_{s}],

where Ts = 1/ fs is the signal sampling frequency and E[.] denotes the integer part.

La transformation est ainsi appliquée à chaque composante du vecteur ψ_i représentant le champ sonore encodé par le microphone 100m d'indice i (i.e. est appliquée à la pression captée, aux composantes du vecteur des gradients de pression, ainsi qu'aux composantes d'ordre supérieur présentes dans le champ sonore encodé le cas échant), pour produire une représentation temps-fréquence. Par exemple, la transformation en question est une transformation de Fourier directe. De sorte, on obtient pour la I-ème composante ψ _i,l du vecteur ψ_i : $ψ_{i, l} (n, ω) = \frac{1}{T} \sum_{t = 0}^{T - 1} ψ_{i, l} (t_{n} - t) e^{- j ω t}$

où

j = \sqrt{- 1}

, et ω la pulsation normalisée.The transformation is thus applied to each component of the vector ψ _i representing the sound field encoded by the microphone 100m of index i (ie is applied to the pressure picked up, to the components of the pressure gradient vector, as well as to the components of higher order present in the encoded sound field if any), to produce a time-frequency representation. For example, the transformation in question is a direct Fourier transformation. So, we obtain for the I-th component ψ _{i, l} of the vector ψ _i :

ψ_{I, I} (not, ω) = \frac{1}{T} \sum_{you = 0}^{T - 1} ψ_{I, I} ({you}_{not} - you) e^{- I ω you}

where

I = \sqrt{- 1}

, and ω the normalized pulsation.

En pratique, on peut choisir T comme une puissance de deux (e.g. immédiatement supérieure à T_a ) et choisir ω = 2πk/T, 0 ≤ k < T de sorte à implémenter la transformation de Fourier sous forme d'une transformée de Fourier rapide $ψ_{i, l} (n, k) = \frac{1}{T} \sum_{t = 0}^{T - 1} ψ_{i, l} (t_{n} - t) e^{- \frac{2 j π k t}{T}}$

In practice, one can choose T as a power of two (eg immediately greater than T _a ) and choose ω = 2πk / T, 0 ≤ k < T so as to implement the Fourier transform in the form of a fast Fourier transform

ψ_{I, I} (not, k) = \frac{1}{T} \sum_{you = 0}^{T - 1} ψ_{I, I} ({you}_{not} - you) e^{- \frac{2 I π k you}{T}}

Dans ce cas, le nombre de composantes fréquentielles M est égal à la taille de la trame d'analyse T. Lorsque T > T_a, on peut également appliquer la technique du zero-padding afin d'appliquer la transformation de Fourier rapide. Ainsi, pour une sous-bande fréquentielle ω (ou k dans le cas d'une transformée de Fourier rapide) considérée, le vecteur constitué de l'ensemble des composantes ψ _i,l(n,ω), (ou ψ _i,l (n,k)) pour les différents /, représente la composante fréquentielle du champ ψ _i dans la sous-bande fréquentielle ω (ou k) considérée.In this case, the number of frequency components M is equal to the size of the analysis frame T. When T>T _a , the zero-padding technique can also be applied in order to apply the fast Fourier transformation. Thus, for a frequency sub-band ω (or k in the case of a fast Fourier transform) considered, the vector made up of the set of components ψ _{i , l} ( n , ω ), (or ψ _i,l ( n , k )) for the different /, represents the frequency component of the field ψ _i in the frequency sub-band ω (or k) considered.

Par ailleurs, dans d'autres variantes, la transformation appliquée à l'étape E500 n'est pas une transformée de Fourier, mais un banc de filtres à reconstruction (presque) parfaite, par exemple un banc de filtres :

QMF (de l'anglais « Quadrature Mirror Filter ») ;
PQMF (de l'anglais « Pseudo - Quadrature Mirror Filter ») ; ou
MDCT (de l'anglais « Modified Discrète Cosine Transform »).

Furthermore, in other variants, the transformation applied in step E500 is not a Fourier transform, but a bank of filters with (almost) perfect reconstruction, for example a bank of filters:

QMF (from the English “Quadrature Mirror Filter”);
PQMF (from the English “Pseudo-Quadrature Mirror Filter”); Where
MDCT (from the English “Modified Discrete Cosine Transform”).

De retour à la [fig. 5], la transformation mise en oeuvre lors de l'étape E500 est répétée pour les N champs sonores encodés par les N microphones 100m sélectionnés, délivrant N jeux correspondants de M composantes fréquentielles de champ.Back to the [ fig. 5 ], the transformation implemented during step E500 is repeated for the N sound fields encoded by the N selected microphones 100m, delivering N corresponding sets of M field frequency components.

De la sorte, les étapes E210 et E210a décrites ci-dessus en relation avec la [fig. 2] (selon l'une quelconque des variantes précitées) sont mises en oeuvre pour chaque sous-bande fréquentielle parmi les M sous-bandes fréquentielles. Plus particulièrement, pour une sous-bande fréquentielle donnée parmi les M sous-bandes fréquentielles, l'interpolation délivre une composante fréquentielle de champ interpolée en la position d'interpolation et localisée dans la sous-bande fréquentielle donnée. La composante fréquentielle de champ interpolée s'exprime comme une combinaison linéaire des N composantes fréquentielles de champ, parmi les N jeux, localisées dans la sous-bande fréquentielle donnée. En d'autres termes, la résolution des systèmes d'équations permettant de déterminer les facteurs de pondération (i.e. [Math 4] dans les alternatives de contraintes précitées (i.e. hyperplan et/ou positivité des facteurs de pondération) et [Math 5]) est effectuée dans chacune des sous-bande fréquentielles pour produire un ensemble de facteurs de pondération par sous-bande fréquentielle a_i (n, ω) (ou a_i (n, k)).In this way, the steps E210 and E210a described above in relation to the [ fig. 2 ] (according to any one of the aforementioned variants) are implemented for each frequency sub-band among the M frequency sub-bands. More particularly, for a given frequency sub-band among the M frequency sub-bands, the interpolation delivers a field frequency component interpolated at the interpolation position and localized in the given frequency sub-band. The interpolated field frequency component is expressed as a linear combination of the N field frequency components, among the N sets, located in the given frequency sub-band. In other words, the resolution of the systems of equations making it possible to determine the weighting factors (ie [Math 4] in the alternatives of constraints mentioned above (ie hyperplane and/or positivity of the weighting factors) and [Math 5]) is performed in each of the frequency sub-bands to produce a set of weighting factors per frequency sub-band a _i ( n, ω ) (or a _i ( n , k )).

Par exemple, afin de mettre en oeuvre la résolution des systèmes [Math 4] ou [Math 5], la puissance efficace dans chaque sous-bande fréquentielle est estimée soit par moyenne glissante : $\hat{W_{ι}^{2}} (n, ω) = \frac{1}{P} \sum_{p = n - P + 1}^{n} |W_{i}^{2} (p, ω)|,$

soit par filtrage autorégressif :

\hat{W_{ι}^{2}} (n, ω) = α_{w} \hat{W_{ι}^{2}} (n - 1, ω) + (1 - α_{w}) |W_{i}^{2} (n, ω)| .

For example, in order to implement the resolution of the systems [Math 4] or [Math 5], the effective power in each frequency sub-band is estimated either by moving average:

\hat{W_{ι}^{}} (not, ω) = \frac{1}{P} \sum_{p = not - P + 1}^{not} |W_{I}^{2} (p, ω)|,

either by autoregressive filtering:

\hat{W_{ι}^{}} (not, ω) = α_{w} W_{ι} \hat{_{2}} (not - 1, ω) + (1 - α_{w}) |W_{I}^{2} (not, ω)| .

Ainsi, l'interpolation répétée pour les M sous-bandes fréquentielles délivre M composantes fréquentielles de champ interpolées en la position d'interpolation, chaque composante fréquentielle de champ interpolée parmi les M composantes fréquentielles de champ interpolées étant localisée dans une sous-bande fréquentielle distincte.Thus, the repeated interpolation for the M frequency sub-bands delivers M interpolated field frequency components at the interpolation position, each interpolated field frequency component among the M interpolated field frequency components being located in a distinct frequency sub-band .

Ainsi, lors d'une étape E510, une transformation inverse à la transformation appliquée lors de l'étape E500 est appliquée aux M composantes fréquentielles de champ interpolées délivrant le champ sonore encodé interpolé en la position d'interpolation.Thus, during a step E510, a transformation inverse to the transformation applied during step E500 is applied to the M interpolated field frequency components delivering the interpolated encoded sound field at the interpolation position.

Par exemple, reconsidérant l'exemple donné ci-dessus où la transformation appliquée lors de l'étape E500 est une transformée de Fourier directe, la transformation inverse appliquée lors de l'étape E510 est une transformée de Fourier inverse.For example, reconsidering the example given above where the transformation applied during step E500 is a direct Fourier transform, the inverse transformation applied during step E510 is an inverse Fourier transform.

On présente désormais, en relation avec la [ fig. 6 ], un procédé de restitution du champ sonore capté par les microphones 100m de la [fig. 1] à l'auditeur 110 selon un mode de réalisation de l'invention.We now present, in relation to the [ fig. 6 ], a process for restoring the sound field picked up by the 100m microphones of the [ fig. 1 ] to the listener 110 according to one embodiment of the invention.

Plus particulièrement, lors d'une étape E600, le champ sonore est capté par les microphones 110m, chaque microphone parmi les microphones 110m délivrant un champ sonore capté correspondant.More particularly, during a step E600, the sound field is picked up by the microphones 110m, each microphone among the microphones 110m delivering a corresponding picked up sound field.

Lors d'une étape E610, chacun des champs sonores captés est encodé sous une forme comprenant la pression captée et un vecteur de gradients de pression associé.During a step E610, each of the sound fields picked up is encoded in a form comprising the pressure picked up and an associated vector of pressure gradients.

Dans d'autres modes de réalisation non illustrés, le champ sonore capté par les microphones 110m est encodé sous une forme comprenant la pression captée, un vecteur de gradients de pression associé ainsi que tout ou partie des composantes d'ordres supérieurs du champ sonore décomposé au format ambisonique.In other embodiments not illustrated, the sound field picked up by the microphones 110m is encoded in a form comprising the pressure picked up, a vector of associated pressure gradients as well as all or part of the higher order components of the decomposed sound field in Ambisonics format.

De retour à la [fig. 6], le procédé de restitution comprend une phase d'interpolation E620 correspondant à la mise en oeuvre du procédé d'interpolation selon l'invention (selon l'un quelconque des modes de réalisation et/ou variantes décrit(e)s ci-dessus en relation avec les [fig. 2] et [fig. 5]) délivrant le champ sonore encodé interpolé en la position d'interpolation, e.g. la position de l'auditeur 110.Back to the [ fig. 6 ], the restitution method comprises an interpolation phase E620 corresponding to the implementation of the interpolation method according to the invention (according to any one of the embodiments and/or variants described above in relation to the [ fig. 2] and [fig. 5 ]) delivering the interpolated encoded sound field at the interpolation position, eg the position of the listener 110.

Lors d'une étape E630, le champ sonore encodé interpolé est compressé, e.g. par mise en oeuvre d'un codage entropique. Un champ sonore encodé interpolé compressé est ainsi délivré. Par exemple, l'étape E630 de compression est mise en oeuvre par le dispositif 700 (décrit ci-dessous en relation avec la figure 7) qui se trouve déporté du dispositif de restitution 110hp.During a step E630, the interpolated encoded sound field is compressed, eg by implementing entropic coding. A compressed interpolated encoded sound field is thus delivered. For example, the compression step E630 is implemented by the device 700 (described below in relation to the figure 7 ) which is remote from the 110hp restitution device.

Ainsi, lors d'une étape E640, le champ sonore encodé interpolé compressé délivré par le dispositif 700 est transmis au dispositif de restitution 110hp. Dans d'autres modes de réalisation, le champ sonore encodé interpolé compressé est transmis à un autre dispositif disposant d'une capacité de calcul permettant de décompresser un contenu compressé, e.g. un smartphone, un ordinateur, ou tout autre terminal connecté disposant d'une capacité de calcul suffisante, en vue d'une transmission ultérieure.Thus, during a step E640, the compressed interpolated encoded sound field delivered by the device 700 is transmitted to the restitution device 110hp. In other embodiments, the compressed interpolated encoded sound field is transmitted to another device having a computing capacity making it possible to decompress compressed content, eg a smartphone, a computer, or any other connected terminal with sufficient computing capacity, for subsequent transmission.

De retour à la [fig. 6], lors d'une étape E650, le champ sonore encodé interpolé compressé reçu par le dispositif de restitution 110hp est décompressé afin de délivrer les échantillons du champ sonore encodé interpolé au format de codage utilisé (i.e. au format comprenant au moins la pression captée par le microphone 110m correspondant, les composantes du vecteur gradient de pression, ainsi que les composantes d'ordre supérieur présentes dans le champ sonore encodé le cas échant).Back to the [ fig. 6 ], during a step E650, the compressed interpolated encoded sound field received by the restitution device 110hp is decompressed in order to deliver the samples of the interpolated encoded sound field in the coding format used (ie in the format comprising at least the pressure picked up by the corresponding microphone 110m, the components of the pressure gradient vector, as well as the higher order components present in the encoded sound field if applicable).

Lors d'une étape E660, le champ sonore encodé interpolé est restitué sur le dispositif de restitution 110hp.During a step E660, the interpolated encoded sound field is reproduced on the restitution device 110hp.

Ainsi, lorsque la position d'interpolation correspond à la position physique de l'auditeur 110, ce dernier a l'impression que le champ sonore qui lui est restitué est cohérent avec les sources sonores 100s (i.e. que le champ qui lui est restitué arrive effectivement en direction des sources sonores 100s).Thus, when the interpolation position corresponds to the physical position of the listener 110, the latter has the impression that the sound field which is restored to him is coherent with the sound sources 100s (i.e. that the field which is restored to him actually in the direction of sound sources 100s).

Dans certains modes de réalisation non illustrés sur la [fig. 6], les étapes E630 de compression et E650 de décompression ne sont pas mises en oeuvre. Dans ces modes de réalisation, ce sont les échantillons bruts du champ sonore encodé interpolé qui sont transmis au dispositif de restitution 110hp.In some embodiments not shown in [ fig. 6 ], the compression and decompression steps E630 E650 are not implemented. In these embodiments, it is the raw samples of the interpolated encoded sound field which are transmitted to the rendering device 110hp.

Dans d'autres modes de réalisation non illustrés sur la [fig. 6], le dispositif 700 mettant en oeuvre au moins la phase d'interpolation E620 est embarqué dans le dispositif de restitution 110hp. Dans ce cas, ce sont les échantillons du champ sonore encodé (une fois compressés ou non selon les variantes) qui sont transmis au dispositif de restitution 110hp lors de l'étape E640, et non pas les échantillons du champ sonore encodé interpolé (une fois compressés ou non selon les variantes). En d'autres termes, dans ces modes de réalisation, l'étape E640 est mise en oeuvre juste après les étapes E600 et E610 de captation et d'encodage.In other embodiments not illustrated in [ fig. 6 ], the device 700 implementing at least the interpolation phase E620 is embedded in the restitution device 110hp. In this case, it is the samples of the encoded sound field (once compressed or not depending on the variants) which are transmitted to the restitution device 110hp during step E640, and not the samples of the interpolated encoded sound field (once compressed or not depending on the variants). In other words, in these embodiments, step E640 is implemented just after steps E600 and E610 of capture and encoding.

On présente désormais, en relation avec la [ fig. 7 ], un exemple de structure de dispositif 700 d'interpolation selon un mode de réalisation de l'invention.We now present, in relation to the [ fig. 7 ], an example of structure of interpolation device 700 according to one embodiment of the invention.

Le dispositif 700 comprend une mémoire vive 703 (par exemple une mémoire RAM), une unité de traitement 702 équipée par exemple d'un processeur, et pilotée par un programme d'ordinateur stocké dans une mémoire morte 701 (par exemple une mémoire ROM ou un disque dur). A l'initialisation, les instructions de code du programme d'ordinateur sont par exemple chargées dans la mémoire vive 703 avant d'être exécutées par le processeur de l'unité de traitement 702.The device 700 comprises a random access memory 703 (for example a RAM memory), a processing unit 702 equipped for example with a processor, and controlled by a computer program stored in a read only memory 701 (for example a ROM memory or a hard disc). On initialization, the code instructions of the computer program are for example loaded into the RAM 703 before being executed by the processor of the processing unit 702.

Cette [fig. 7] illustre seulement une manière particulière, parmi plusieurs possibles, de réaliser le dispositif 700 afin qu'il effectue certaines étapes du procédé d'interpolation selon l'invention (selon l'un quelconque des modes de réalisation et/ou variantes décrit(e)s ci-dessus en relation avec les [fig. 2] et [fig. 5]). En effet, ces étapes peuvent être réalisées indifféremment sur une machine de calcul reprogrammable (un ordinateur PC, un processeur DSP ou un microcontrôleur) exécutant un programme comprenant une séquence d'instructions, ou sur une machine de calcul dédiée (par exemple un ensemble de portes logiques comme un FPGA ou un ASIC, ou tout autre module matériel).That [ fig. 7 ] illustrates only one particular way, among several possible, of making the device 700 so that it performs certain steps of the interpolation method according to the invention (according to any one of the embodiments and/or variants described above in relation to the [ fig. 2] and [fig. 5 ]). Indeed, these steps can be carried out either on a reprogrammable calculation machine (a PC computer, a DSP processor or a microcontroller) executing a program comprising a sequence of instructions, or on a dedicated calculation machine (for example a set of logic gates like an FPGA or an ASIC, or any other hardware module).

Dans le cas où le dispositif 700 est réalisé avec une machine de calcul reprogrammable, le programme correspondant (c'est-à-dire la séquence d'instructions) pourra être stocké dans un médium de stockage amovible (tel que par exemple une disquette, un CD-ROM ou un DVD-ROM) ou non, ce médium de stockage étant lisible partiellement ou totalement par un ordinateur ou un processeur.In the case where the device 700 is produced with a reprogrammable calculating machine, the corresponding program (that is to say the sequence of instructions) could be stored in a removable storage medium (such as for example a diskette, a CD-ROM or a DVD-ROM) or not, this storage medium being partially or totally readable by a computer or a processor.

Par ailleurs, dans certains modes de réalisation discutés ci-dessus en relation avec la [fig. 6], le dispositif 700 est également configuré pour mettre en oeuvre tout ou partie des étapes supplémentaires du procédé de restitution de la [fig. 6] (e.g. les étapes E600, E610, E630, E640, E650 ou E660).Furthermore, in certain embodiments discussed above in connection with the [ fig. 6 ], the device 700 is also configured to implement all or part of the additional steps of the method for restoring the [ fig. 6 ] (eg steps E600, E610, E630, E640, E650 or E660).

Ainsi, dans certains modes de réalisation, le dispositif 700 est inclus dans le dispositif de restitution 110hp.Thus, in some embodiments, device 700 is included in renderer 110hp.

Dans d'autres modes de réalisation, le dispositif 700 est inclus dans un des microphones 110m ou bien est dupliqué dans plusieurs des microphones 110m.In other embodiments, the device 700 is included in one of the microphones 110m or is duplicated in several of the microphones 110m.

Dans encore d'autres modes de réalisation, le dispositif 700 est inclus dans un équipement déporté à la fois des microphones 110m ainsi que du dispositif de restitution 110hp. Par exemple, l'équipement déporté est un décodeur MPEG-H 3D, un serveur de contenus, un ordinateur, etc.In yet other embodiments, the device 700 is included in equipment remote from both the microphones 110m as well as the playback device 110hp. For example, the remote equipment is an MPEG-H 3D decoder, a content server, a computer, etc.

Claims

A method for interpolating a sound field captured by a plurality of N microphones each outputting said encoded sound field in a form comprising at least one captured pressure and an associated pressure gradient vector,
said method comprising an interpolation of said sound field at an interpolation position outputting an interpolated encoded sound field as a linear combination of said N encoded sound fields each weighted by a corresponding weighting factor,

characterised in that said interpolation comprises an estimation of said N weighting factors at least from:
- said interpolation position;

- a position of each of said N microphones,

- said N pressures captured by said N microphones; and

- an estimated power of said sound field at said interpolation position.
The method according to claim 1, wherein said estimation implements a resolution of the equation $\sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) x_{i} (t) = \hat{W_{a}^{2}} (t) x_{a} (t)$
, with:
- x _i (t) a vector representative of said position of the microphone bearing the index i among said N microphones,

- x _a (t) a vector representative of said interpolation position;

- $\hat{W_{a}^{2}} (t)$
said estimate of the power of said sound field at said interpolation position; and

- $\hat{W_{ι}^{2}} (t)$
an estimate of the instantaneous power $W_{i}^{2} (t)$
of said pressure captured by said microphone bearing the index i.
The method according to claim 2, wherein said resolution is performed with the constraint that $\sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) = \hat{W_{a}^{2}} (t) .$
The method according to claim 3, wherein said resolution is further performed with the constraint that of the N weighting factors a_i (t) are positive or zero.
The method according to claim 2, wherein said estimation also implements a resolution of the equation $α \sum_{i} a_{i} (t) \hat{W_{ι}^{2}} (t) = α \hat{W_{a}^{2}} (t)$
, with α a homogenisation factor.
The method according to any one of claims 2 to 5, wherein said estimation comprises:
- a time averaging of said instantaneous power $W_{i}^{2} (t)$
over a predetermined period of time outputting said estimate $\hat{W_{ι}^{2}} (t)$
; or

- an autoregressive filtering of time samples of said instantaneous power $W_{i}^{2} (t)$
, outputting said estimate $\hat{W_{ι}^{2}} (t)$
.
The method according to any one of claims 2 to 6, wherein said estimate $\hat{W_{a}^{2}} (t)$
of the power of said sound field at said interpolation position is estimated from said instantaneous sound power $W_{i}^{2} (t)$
captured by that one among said N microphones the closest to said interpolation position or from said estimate $\hat{W_{ι}^{2}} (t)$
of said instantaneous sound power $W_{i}^{2} (t)$
captured by that one among said N microphones the closest to said interpolation position.
The method according to any one of claims 2 to 6, wherein said estimate $\hat{W_{a}^{2}} (t)$
of the power of said sound field at said interpolation position is estimated from a barycentre of said N instantaneous sound powers $W_{i}^{2} (t)$
captured by said N microphones, respectively from a barycentre of said N estimates $\hat{W_{ι}^{2}} (t)$
of said N instantaneous sound powers $W_{i}^{2} (t)$
captured by said N microphones, a coefficient weighting the instantaneous sound power $W_{i}^{2} (t)$
, respectively weighting the estimate $\hat{W_{ι}^{2}} (t)$
of the instantaneous sound power $W_{i}^{2} (t)$
captured by said microphone bearing the index i, in said barycentre being inversely proportional to a normalised version of the distance between the position of said microphone bearing the index i outputting said pressure W_i (t) and said interpolation position, said distance being expressed in the sense of a L-p norm.
The method according to any one of claims 1 to 8, further comprising, prior to said interpolation, a selection of said N microphones among Nt microphones, Nt>N.
The method according to claim 9, wherein the N selected microphones are those the closest to said interpolation position among said Nt microphones.
The method according to claim 9, wherein said selection comprising:
- a selection of two microphones bearing the indexes i₁ and i₂ the closest to said interpolation position among said Nt microphones;

- a calculation of a median vector u ₁₂(t) having as an origin said interpolation position and pointing between the positions of the two microphones bearing the indexes i₁ and i₂; and

- a determination of a third microphone bearing the index i₃ different from said two microphones bearing the indexes i₁ and i₂ among the Nt microphones and whose position is the most opposite to the median vector u ₁₂(t).
The method according to any one of claims 1 to 11, further comprising, for given encoded sound field among said N encoded sound fields output by said N microphones, a transformation of said given encoded sound field by application of a perfect reconstruction filter bank outputting M field frequency components associated to said given encoded sound field, each field frequency component among said M field frequency components being located in a distinct frequency sub-band,
said transformation repeated for said N encoded sound fields outputting N corresponding sets of M field frequency components,

wherein, for a given frequency sub-band among said M frequency sub-bands, said interpolation outputs a field frequency component interpolated at said interpolation position and located within said given frequency sub-band, said interpolated field frequency component being expressed as a linear combination of said N field frequency components, among said N sets, located in said given frequency sub-band,

said interpolation repeated for said M frequency sub-bands outputting M interpolated field frequency components at said interpolation position, each interpolated field frequency component among said M interpolated field frequency components being located in a distinct frequency sub-band.
The method according to claim 12, further comprising an inverse transformation of said transformation, said inverse transformation applied to said M interpolated field frequency components outputting said interpolated encoded sound field at said interpolation position.
A method for rendering a sound field characterised in that it comprises:
- capturing said sound field by a plurality of N microphones each outputting a corresponding captured sound field;

- encoding of each of said captured sound fields outputting a corresponding encoded sound field in a form comprising at least one captured pressure and an associated pressure gradient vector;

- an interpolation phase according to any one of claims 1 to 13 outputting said interpolated encoded sound field at said interpolation position;

- a compression of said interpolated encoded sound field outputting a compressed interpolated encoded sound field;

- a transmission of said compressed interpolated encoded sound field to at least one rendering device;

- a decompression of said received compressed interpolated encoded sound field; and

- rendering said interpolated encoded sound field on said at least one rendering device.
A computer program product, comprising program code instructions for the implementation of a method according to any one of claims 1 to 14, when said program is executed on a computer.
A device for interpolating a sound field captured by a plurality of N microphones each outputting said encoded sound field in a form comprising at least one captured pressure and an associated pressure gradient vector,
said device comprising a reprogrammable computing machine or a dedicated computing machine, configured to interpolate said sound field at an interpolation position outputting an interpolated encoded sound field expressed as a linear combination of said N encoded sound fields each weighted by a corresponding weighting factor,

characterised in that said reprogrammable computing machine or said dedicated computing machine is further configured to estimate said N weighting factors from at least:
- said interpolation position;

- a position of each of said N microphones,

- said N pressures captured by said N microphones, and

- an estimate of the power of said sound field at said interpolation position.