EP3895446A1 - Verfahren zur interpolation eines schallfeldes und zugehöriges computerprogrammprodukt und vorrichtung - Google Patents

Verfahren zur interpolation eines schallfeldes und zugehöriges computerprogrammprodukt und vorrichtung

Info

Publication number
EP3895446A1
Authority
EP
European Patent Office
Prior art keywords
microphones
sound field
interpolation
field
interpolated
Prior art date
Legal status
Granted
Application number
EP19816809.8A
Other languages
English (en)
French (fr)
Other versions
EP3895446B1 (de)
Inventor
Alexandre GUÉRIN
Current Assignee
Fondation B Com
Original Assignee
Fondation B Com
Priority date
Filing date
Publication date
Application filed by Fondation B Com filed Critical Fondation B Com
Publication of EP3895446A1
Application granted
Publication of EP3895446B1


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H04S7/301: Automatic calibration of stereophonic sound system, e.g. with test microphone
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R3/005: Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H04S7/302: Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303: Tracking of listener position or orientation
    • H04S7/304: For headphones
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2400/00: Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15: Aspects of sound capture and related signal processing for recording or reproduction
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2420/00: Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/11: Application of ambisonics in stereophonic audio systems

Definitions

  • TITLE: Sound field interpolation method, and corresponding computer program product and device.
  • the field of the invention is that of the interpolation of a sound (or acoustic) field having been emitted by one or more sources and having been picked up by a finite set of microphones.
  • the invention has many applications, in particular, but not exclusively, in the field of virtual reality (for example to allow a listener to move within a sound scene played back to them), in the field of sound scene analysis (for example to determine the number of sound sources present in the analyzed scene), or in the field of the reproduction of a multichannel scene (for example within an MPEG-H 3D decoder), etc.
  • a classic approach consists in estimating the sound field at the given position by linear interpolation between the fields as captured and encoded by the different scene microphones.
  • the interpolation coefficients are estimated by minimizing a cost function.
  • an ambisonic microphone encodes and delivers the sound field which it picks up in an ambisonic format.
  • the ambisonic format is characterized by components which consist of projections of the sound field along different directivities. These components are grouped by order: order zero encodes the instantaneous sound pressure picked up by the microphone, order one encodes the three pressure gradients along the three axes of space, etc. The higher the order, the greater the spatial resolution of the representation of the field.
  • the ambisonic format in its complete representation makes it possible to encode the field at any point inside the maximum sphere free of sound sources, and having as center the physical location of the microphone having performed the capture.
  • Such encoding of the sound field theoretically makes it possible, from a single microphone, to move within the zone delimited by the source closest to the microphone, without however being able to bypass any of the sources in question.
  • Such microphones thus make it possible to represent the sound field in three dimensions via a decomposition of the latter into spherical harmonics.
  • This decomposition is particularly suitable for so-called 3DoF (from the English "Degree of Freedom") navigation, i.e. navigation according to the three dimensions. It is this format that was chosen for immersive content on the virtual reality channel of YouTube or on Facebook-360.
  • the method must allow the sound field at the interpolation position to be estimated so that the field in question is consistent with the position of the sound sources. For example, a listener at the interpolation position must have the impression, when the field in question is played back (e.g. to allow the listener to navigate the sound scene), that the interpolated field actually arrives from the direction of the sound source(s) of the sound scene.
  • a method of interpolating a sound field picked up by a plurality of N microphones, each delivering the sound field encoded in a form comprising at least a sensed pressure and an associated vector of pressure gradients, comprises an interpolation of the sound field at an interpolation position, delivering an interpolated encoded sound field expressed as a linear combination of the N encoded sound fields, each weighted by a corresponding weighting factor.
  • the method further comprises an estimation of the N weighting factors from at least:
  • the invention proposes a new and inventive solution for carrying out an interpolation of a sound field picked up by at least two microphones, for example in a scene comprising one or more sound source (s).
  • the proposed method takes advantage of the encoding of the sound field in a form giving access to the vector of pressure gradients in addition to the pressure.
  • the vector of pressure gradients of the interpolated field remains consistent with that of the sound field as emitted by the source or sources of the scene at the interpolation position.
  • a listener at the interpolation position listening to the interpolated field has the impression that the field restored to them is coherent with the sound source(s) (i.e. that the field played back actually arrives from the direction of the sound source(s) in question).
  • the use of an estimated power of the sound field at the interpolation position to estimate the weighting factors makes it possible to keep a low computational complexity. This allows for example a real-time implementation on devices with limited computing capacity.
  • the estimation implements a resolution of the equation
  • the equation in question is solved in the sense of minimizing the mean square error, e.g. by minimizing the corresponding cost function.
  • the resolution method (e.g. the Simplex algorithm) depends on whether the system is overdetermined (more equations than microphones) or underdetermined (more microphones than equations).
  • the resolution is carried out under the constraint that
  • the resolution is also carried out under the constraint that the N weighting factors a_i(t) are all positive or zero. Phase inversions are thus avoided, leading to improved results. In addition, the resolution of the above equation is accelerated.
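As an illustration, the non-negativity constraint on the weighting factors can be handled with a non-negative least-squares solver. The sketch below is a minimal Python example assuming the system takes the form sum_i a_i(t) · W_i²(t) · x_i(t) = W_a²(t) · x_a(t) (the function name and the exact shape of the system are illustrative assumptions, not the patent's exact [Math 4]):

```python
import numpy as np
from scipy.optimize import nnls

def estimate_weights(mic_positions, mic_powers, interp_position, interp_power):
    """Estimate non-negative weighting factors a_i in the least-squares sense.

    Assumed system (a sketch): sum_i a_i * W_i^2 * x_i = W_a^2 * x_a,
    solved under the constraint a_i >= 0.
    """
    # Each column i of A is the position of microphone i scaled by its power.
    A = np.asarray(mic_positions, dtype=float).T * np.asarray(mic_powers, dtype=float)
    b = float(interp_power) * np.asarray(interp_position, dtype=float)
    a, _residual = nnls(A, b)  # least squares under a_i >= 0
    return a
```

For two microphones at (1, 0, 0) and (0, 1, 0) with unit powers, an interpolation position at (0.5, 0.5, 0) yields the intuitive weights (0.5, 0.5).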
  • the estimation also implements a homogenization factor.
  • the homogenization factor α is proportional to the L2 norm of the vector x_a(t).
  • the estimate includes:
  • the estimate Ŵ_a²(t) of the power of the sound field at the interpolation position is obtained from the instantaneous sound power W_i²(t) picked up by the one of the N microphones closest to the interpolation position, or from the estimate Ŵ_i²(t) of the instantaneous sound power W_i²(t) picked up by the one of the N microphones closest to the interpolation position.
  • the estimate Ŵ_a²(t) of the power of the sound field at the interpolation position is obtained from a barycenter of the N instantaneous sound powers W_i²(t) picked up by the N microphones, respectively from a barycenter of the N estimates Ŵ_i²(t) of the N instantaneous sound powers W_i²(t) picked up by the N microphones.
  • a coefficient weighting the instantaneous sound power W_i²(t), respectively the estimate Ŵ_i²(t) of the instantaneous sound power W_i²(t) picked up by the microphone of index i, in the barycenter is inversely proportional to a normalized version of the distance between the position of the microphone of index i and the interpolation position. The distance is expressed in the sense of an Lp norm.
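A minimal sketch of such a barycentric power estimate, with inverse-distance coefficients normalized to sum to one (the function name, the normalization choice and the eps guard against a zero distance are illustrative assumptions):

```python
import numpy as np

def power_at_interpolation(mic_positions, mic_powers, interp_position, p=2, eps=1e-12):
    """Barycenter of the N microphone powers, each weighted by a coefficient
    inversely proportional to the Lp distance to the interpolation position."""
    x_a = np.asarray(interp_position, dtype=float)
    d = np.array([np.linalg.norm(np.asarray(x, dtype=float) - x_a, ord=p)
                  for x in mic_positions])
    w = 1.0 / (d + eps)   # inverse-distance coefficients
    w /= w.sum()          # normalized version: coefficients sum to 1
    return float(np.dot(w, mic_powers))
```

Two equidistant microphones with powers 2 and 4 thus yield an estimated power of 3 at the midpoint.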
  • the interpolation method further comprises, prior to the interpolation, a selection of the N microphones from among Nt microphones, Nt > N.
  • the weighting factors can thus be obtained via a determined or overdetermined system of equations, making it possible to avoid, or at least minimize, changes in timbre perceptible to the ear in the interpolated sound field.
  • the N microphones selected are the closest to the interpolation position among the Nt microphones.
  • the selection includes:
  • the microphones are selected so as to be distributed around the interpolation position.
  • the median vector u_m(t) is expressed as a function of x_a(t), the vector representative of the interpolation position, x_i1(t), a vector representative of the position of the microphone of index i1, and x_i2(t), a vector representative of the position of the microphone of index i2.
  • the index i3 of the third microphone is an index, different from i1 and i2, which minimizes the dot product among the Nt microphone indices.
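The selection of three microphones surrounding the interpolation position can be sketched as follows. The exact expression of the median vector is an assumption here (normalized sum of the unit directions towards the two closest microphones); the dot-product minimizer then picks the most "opposite" third microphone:

```python
import numpy as np

def select_three_microphones(mic_positions, interp_position):
    """Pick the two microphones closest to the interpolation position, then the
    third microphone whose direction minimizes the dot product with their
    median direction (so that the three surround the position)."""
    x_a = np.asarray(interp_position, dtype=float)
    X = np.asarray(mic_positions, dtype=float)
    dirs = X - x_a
    dist = np.linalg.norm(dirs, axis=1)
    i1, i2 = np.argsort(dist)[:2]                  # two closest microphones
    u = dirs[i1] / dist[i1] + dirs[i2] / dist[i2]  # median direction (assumed form)
    u /= np.linalg.norm(u)
    dots = dirs @ u / dist                         # cosine with the median vector
    dots[[i1, i2]] = np.inf                        # exclude the two already chosen
    i3 = int(np.argmin(dots))
    return int(i1), int(i2), i3
```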
  • the interpolation method further comprises, for a given encoded sound field among the N encoded sound fields delivered by the N microphones, a transformation of the given encoded sound field by application of a perfect-reconstruction filter bank, delivering M field frequency components associated with the given encoded sound field, each of the M field frequency components being located in a distinct frequency sub-band.
  • the repeated transformation for the N encoded sound fields delivers N corresponding sets of M frequency field components.
  • the interpolation delivers a field frequency component interpolated at the interpolation position and located in the given frequency sub-band; the interpolated field frequency component is expressed as a linear combination of the N field frequency components, among the N sets, located in the given frequency sub-band.
  • the repeated interpolation for the M frequency sub-bands delivers M frequency components of the interpolated field in the interpolation position, each frequency component of the interpolated field among the M frequency components of the interpolated field being located in a separate frequency sub-band.
  • the results are improved in the case where the sound field is generated by a plurality of sound sources.
  • the interpolation method further comprises a transformation inverse to said transformation.
  • the inverse transformation applied to the M interpolated field frequency components delivers the interpolated encoded sound field at the interpolation position.
  • the bank of filters with perfect reconstruction belongs to the group comprising:
  • MDCT, from "Modified Discrete Cosine Transform".
  • the invention also relates to a method for restoring a sound field.
  • Such a method includes:
  • the invention also relates to a computer program, comprising program code instructions for the implementation of an interpolation or restitution method as described above, according to any one of its different embodiments, when said program is executed by a processor.
  • a device for interpolating a sound field picked up by a plurality of N microphones each delivering the encoded sound field in a form comprising at least one sensed pressure and a vector of associated pressure gradients comprises a reprogrammable calculation machine or a dedicated calculation machine, capable of and configured to implement the steps of the interpolation method described above (according to any one of its different embodiments).
  • FIG. 1 represents a sound scene in which a listener moves, a sound field having been diffused by sound sources and having been picked up by microphones;
  • FIG. 2 represents the stages of a process of interpolation of the sound field picked up by the microphones of [fig. 1] according to one embodiment of the invention
  • FIG. 3a represents a scene in which a sound field is diffused by a single sound source and is picked up by four microphones according to a first configuration
  • [fig. 3b] represents a map of the opposite of the normalized acoustic intensity in the 2D plane generated by the sound source of the scene in [fig. 3a] as well as a map of the opposite of the normalized acoustic intensity as estimated by a known method from the quantities picked up by the four microphones of [fig. 3a];
  • [fig. 3c] represents a map of the opposite of the normalized acoustic intensity in the 2D plane generated by the sound source of the scene in [fig. 3a] as well as a map of the opposite of the normalized acoustic intensity as estimated by the method of the figure [fig. 2] from the quantities picked up by the four microphones in [fig. 3a];
  • [fig. 4a] represents another scene in which a sound field is diffused by a single sound source and is picked up by four microphones according to a second configuration;
  • [fig. 4b] represents a map of the opposite of the normalized acoustic intensity in the 2D plane generated by the sound source of the scene in [fig. 4a], as well as a map of the opposite of the normalized acoustic intensity of the sound field as estimated by a known method from the quantities picked up by the four microphones of [fig. 4a];
  • [fig. 4c] represents a map of the opposite of the normalized acoustic intensity in the 2D plane generated by the sound source of the scene in [fig. 4a], as well as a map of the opposite of the normalized acoustic intensity of the sound field as estimated by the method of [fig. 2] from the quantities picked up by the four microphones in [fig. 4a];
  • FIG. 5 represents the stages of a process of interpolation of the sound field picked up by the microphones of [fig. 1] according to another embodiment of the invention
  • FIG. 6 represents the stages of a restitution process, to the listener of [fig. 1], of the sound field picked up by the microphones in [fig. 1] according to one embodiment of the invention
  • FIG. 7 shows an example of an interpolation device structure according to an embodiment of the invention.
  • the general principle of the invention is based on the encoding of the sound field by the microphones picking up the sound field in question in a form comprising at least one sensed pressure and an associated pressure gradient.
  • the pressure gradient of the field interpolated via a linear combination of the sound fields encoded by the microphones remains consistent with that of the sound field as emitted by the source (s) of the scene at the interpolation position.
  • the method according to the invention bases the estimation of the weighting factors involved in the linear combination in question on an estimation of the power of the sound field at the interpolation position.
  • a low computational complexity is obtained.
  • encoding (or coding) terminology is used to designate the operation of representing a physical sound field picked up by a given microphone by one or more quantities, according to a predefined representation format.
  • a predefined representation format is for example the ambisonic format described above in relation to the section "Prior art and its drawbacks”.
  • the reverse operation is then similar to a restitution of the sound field, e.g. on a loudspeaker type device which converts samples of the sound field in the predefined representation format into a physical sound field; and
  • compression terminology is used to designate processing aimed at reducing the amount of data necessary to represent a given amount of information. This is, for example, a processing of the “entropy coding” type (eg according to the MP3 standard) applied to samples of the encoded sound field.
  • the decompression terminology thus corresponds to the reverse operation.
  • the listener 110 is provided with a headset equipped with loudspeakers allowing the restitution of the interpolated sound field at the interpolation position which the listener occupies.
  • This is for example a Hi-Fi headset, or a virtual reality headset like the Oculus, the HTC Vive or the Samsung Gear.
  • the sound field is here interpolated and restored by implementing the reproduction process described below in relation to [fig. 6]
  • the sound field picked up by the 100m microphones is encoded in a form comprising a captured pressure and an associated pressure gradient.
  • the sound field picked up by the microphones is encoded in a form comprising the pressure picked up, the vector of associated pressure gradients, as well as all or part of the higher-order components of the sound field in ambisonic format.
  • the perception of the direction of arrival of the wave front of the sound field is directly correlated with an acoustic intensity vector I(t), which measures the instantaneous flow of acoustic energy through an elementary surface.
  • the intensity vector in question is equal to the product of the instantaneous sound pressure W(t) by the particle velocity, which is opposite to the vector of pressure gradients B(t).
  • This vector of pressure gradients can be expressed in 2D or 3D, depending on whether one wants to move and/or perceive sounds in 2D or 3D. In the following, we place ourselves in the 3D case, the derivation of the 2D case being immediate.
  • this vector is orthogonal to the wave front and points in the direction of propagation of the sound wave, i.e. away from the position of the emitting source: in this sense, it is directly correlated with the perception of the wave front. This is particularly obvious if we consider a field generated by a single, distant point source s(t) propagating in an anechoic medium.
  • ambisonic theory stipulates that, for such a plane wave of incidence (θ, φ), where θ is the azimuth and φ the elevation, the first-order sound field is given by the following equation:
  • the full-band acoustic intensity I(t) is equal (to within a multiplicative coefficient) to:
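As an illustration of the relation stated in the bullets above (intensity as the product of the pressure with a particle velocity opposite to the pressure-gradient vector), here is a minimal sketch, leaving the unspecified multiplicative coefficient aside:

```python
import numpy as np

def acoustic_intensity(w, b):
    """Full-band acoustic intensity, up to a multiplicative constant:
    I(t) = -W(t) * B(t), with W(t) the pressure samples (shape (T,)) and
    B(t) the pressure-gradient samples (shape (T, 3))."""
    w = np.asarray(w, dtype=float)
    b = np.asarray(b, dtype=float)
    return -(w[:, None] * b)  # shape (T, 3), points towards the propagation direction
```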
  • the method according to the invention implements the resolution of systems of equations (i.e. [Math 4], in its different constraint alternatives (hyperplane and/or weighting factors), and [Math 5]).
  • the resolution of the systems in question in the case where they are underdetermined (which corresponds to the configuration where there are more microphones 100m than equations to be solved) leads to solutions which, over time, may favor different sets of microphones. While the location of the sources 100s as perceived via the interpolated sound field always remains consistent, this nevertheless results in changes in timbre perceptible to the ear.
  • N microphones 100m are selected so as to reduce to a determined, or even overdetermined, system. For example, in the case of a 3D interpolation, it is possible to select as few as three microphones from among the Nt microphones 100m.
  • the N microphones 100m closest to the position to be interpolated are selected. This solution is to be preferred when a large number Nt of microphones 100m is present in the scene. However, in certain cases, the choice of the N closest microphones 100m may prove "unbalanced" with regard to the interpolation position relative to the source 100s, and lead to a complete inversion of the direction of arrival: this is particularly the case when the source 100s is placed between the microphones 100m and the interpolation position.
  • step E200 includes for example:
  • x_a(t) = (x_a(t) y_a(t) z_a(t))^T, a vector representative of the interpolation position (i.e. the position of the listener 110 in the embodiment shown in [fig. 1]);
  • x_i1(t), a vector representative of the position of the microphone of index i1;
  • the index of said third microphone is, for example, an index different from i1 and i2 which minimizes the dot product among the Nt microphone indices
  • the dot product in question varies between -1 and +1, and it is minimal when the two vectors are opposite, that is to say when the 3 microphones selected from among the Nt microphones 100m surround the interpolation position.
  • the selection step E200 is not implemented and the steps E210 and E210a described below are implemented on the basis of the sound fields encoded by all of the Nt microphones 100m.
  • N = Nt for the implementation of steps E210 and E210a in the other embodiments in question.
  • the method comprises a step E210 of interpolating the sound field at an interpolation position, delivering an interpolated encoded sound field expressed as a linear combination of the N sound fields encoded by the N selected microphones 100m, the N encoded sound fields each being weighted by a corresponding weighting factor.
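The linear combination of step E210 can be sketched as follows (a minimal illustration; the weighting factors a_i(t) are assumed already estimated and held constant over the frame):

```python
import numpy as np

def interpolate_field(encoded_fields, weights):
    """Interpolated encoded field as the weighted linear combination of the
    N encoded fields. Each field is an array of shape (T, C), with C the
    number of ambisonic components (e.g. C = 4 at first order)."""
    fields = np.asarray(encoded_fields, dtype=float)  # shape (N, T, C)
    a = np.asarray(weights, dtype=float)              # shape (N,)
    return np.tensordot(a, fields, axes=1)            # sum_i a_i * y_i, shape (T, C)
```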
  • the interpolation method according to the invention applies in the same way in order to estimate the weighting factors a_i(t).
  • the first-order components are inversely proportional to the distance between the active source 100s and the measurement point, e.g. the microphone 100m of index i, and point from the active source 100s to the microphone 100m of index i in question.
  • x_s(t) = (x_s(t) y_s(t) z_s(t))^T, a vector representative of the position of the active source 100s;
  • d(x_i(t), x_s(t)) is the distance between the microphone 100m of index i and the active source 100s.
  • the first order component (i.e. the vector of pressure gradients) of the encoded sound field is oriented in the “source-point of capture” direction;
  • the amplitude of the sound field decreases linearly with distance.
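The free-field, first-order model of a point source described in the bullets above can be sketched as follows (the function name and sign conventions are assumptions; the gradient vector is oriented from the source to the microphone, with amplitude decreasing in 1/d as stated):

```python
import numpy as np

def encode_point_source(s, x_src, x_mic):
    """First-order encoding of a point source in the free field (sketch):
    pressure W(t) = s(t)/d, gradient components along the unit vector from
    the source to the microphone."""
    x_src = np.asarray(x_src, dtype=float)
    x_mic = np.asarray(x_mic, dtype=float)
    d = np.linalg.norm(x_mic - x_src)
    u = (x_mic - x_src) / d            # unit "source -> capture point" direction
    w = np.asarray(s, dtype=float) / d # pressure component, 1/d attenuation
    b = w[:, None] * u                 # gradient components (X, Y, Z)
    return w, b
```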
  • the different positions mentioned above (eg of the active source 100s, microphones 100m, of the interpolation position, etc.) vary over time.
  • the weighting factors a_i(t) are generally a function of time.
  • Estimating the weighting factors a_i(t) amounts to solving a system of three linear equations (written above as a single vector equation in [Math 3]). So that the interpolation remains consistent over time with the interpolation position, which can itself vary over time (e.g. if the position in question corresponds to the position of the listener 110, who may move), it is carried out at different times, with a time resolution T_a adapted to the speed of change of the interpolation position.
  • W_a²(t), the square of the sound pressure at the interpolation position, also called instantaneous acoustic power (or more simply instantaneous power), is an unknown, as is the vector x_s(t) representative of the position of the active source 100s.
  • an estimate Ŵ_a²(t) of the sound power at the interpolation position is for example obtained.
  • a first approach consists in approximating the instantaneous sound power by that picked up by the microphone 100m closest to the interpolation position in question, i.e.:
  • the instantaneous sound power W_i²(t) can vary rapidly over time, which can lead to a noisy estimate of the weighting factors a_i(t) and to an instability of the interpolated scene.
  • the average or effective power picked up by the microphone 100m closest to the interpolation position over a time window around the considered instant is calculated, by averaging the instantaneous power over a frame of T samples:
  • T corresponds to a duration of a few tens of milliseconds, or may even be equal to the temporal resolution of the refresh of the weighting factors a_i(t).
  • the coefficient α_w is determined in such a way as to integrate the power over a few tens of milliseconds.
  • values from 0.95 to 0.98, for signal sampling frequencies ranging from 8 kHz to 48 kHz, achieve a good compromise between the robustness of the interpolation and its reactivity to changes in the position of the source.
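A recursive (exponentially smoothed) power estimate with such a coefficient can be sketched as follows. The recursion form W̃²(t) = α·W̃²(t-1) + (1-α)·W(t)² is an assumed, standard first-order smoother, not necessarily the patent's exact formula:

```python
import numpy as np

def smoothed_power(w, alpha=0.97):
    """Exponentially smoothed estimate of the instantaneous power W^2(t).
    alpha between 0.95 and 0.98 integrates over a few tens of milliseconds
    at 8-48 kHz sampling rates."""
    est = 0.0
    out = []
    for sample in np.asarray(w, dtype=float):
        est = alpha * est + (1.0 - alpha) * sample * sample
        out.append(est)
    return np.array(out)
```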
  • the instantaneous acoustic power Ŵ_a²(t) at the interpolation position is estimated as a barycenter of the N estimates Ŵ_i²(t) of the N instantaneous powers W_i²(t) of the N pressures picked up by the N selected microphones 100m.
  • such a barycenter of the N estimates Ŵ_i²(t) of the N instantaneous powers W_i²(t) picked up by the N selected microphones 100m is more relevant when the microphones 100m are spaced apart from each other.
  • a coefficient weighting the estimate Ŵ_i²(t) of the instantaneous power W_i²(t) of the pressure sensed by the microphone 100m of index i, in the barycentric expression above, is inversely proportional to a normalized version of the distance, in the sense of the Lp norm, between the position of the microphone of index i delivering the pressure W_i(t) and the interpolation position.
  • the instantaneous acoustic power Ŵ_a²(t) at the interpolation position is estimated directly as a barycenter of the N instantaneous powers W_i²(t) of the N pressures picked up by the N microphones 100m. In practice, this amounts to substituting W_i²(t) for Ŵ_i²(t) in the above equation.
  • the weighting factors a_i(t) are estimated from:
  • the resolution method (e.g. the Simplex algorithm) depends on whether the system is overdetermined (more equations than microphones) or underdetermined (more microphones than equations).
  • Ŵ_i²(t) and Ŵ_a²(t) are for example estimated according to one of the variants proposed here above
  • the resolution of such a linear system under linear constraint can be carried out by the Simplex algorithm or any other constrained-minimization algorithm.
  • the coefficient α makes it possible to homogenize the units of the quantities Ŵ_a²(t) x_a(t) and Ŵ_a(t): the quantities in question are not homogeneous and, depending on the unit chosen for the position coordinates (meter, centimeter, ...), the solutions will favor one or the other.
  • the coefficient α is for example chosen equal to the L2 norm of the vector x_a(t)
  • the weighting factors are estimated from:
  • the four microphones 300m are placed at the four corners of a room and the source 300s is placed in the center of the room.
  • the room has an average reverberation, with a reverberation time, or T60, of around 500 ms.
  • the sound field picked up by the 300m microphones is encoded in a form comprising a captured pressure and the associated pressure gradient vector.
  • the method comprises the step E200 of selecting N microphones from among the Nt microphones of the scene 100 described above in relation to [fig. 2].
  • the selection step E200 is not implemented and the steps E500, E210 and E510 discussed below, are implemented on the basis of the sound fields encoded by all of the Nt microphones 100m.
  • N = Nt in these other embodiments.
  • the embodiment in question is found to be suitable for the case where several sources among the sources 100s are active simultaneously.
  • the hypothesis of a full-band field resembling a plane wave is no longer valid. Indeed, even in an anechoic medium, the mixture of two plane waves is not a plane wave, except in the very specific case of the same source emitting from two points in space equidistant from the point of capture.
  • the "full band" field reconstruction procedure adapts to the preponderant source in the frame used for the calculation of the effective powers. This produces rapid variations in directivity, and sometimes inconsistencies in the location of sources: when one source is more energetic than another, the two sources in question are estimated to be located at the position of the more energetic source.
  • the embodiment of [fig. 5] exploits the parsimony of signals in the frequency domain.
  • for speech signals, for example, it is statistically established that the frequency supports of several speech signals are globally disjoint: that is to say that, most of the time, only one source is present in each frequency band.
  • the embodiment of [fig. 2] (according to any one of the aforementioned variants) can thus be applied to the signal present in each frequency band.
  • a transformation of the given encoded sound field is carried out by application of a time-frequency transformation, such as the Fourier transform, or of a filter bank with perfect or almost perfect reconstruction, such as quadrature mirror filters (QMF).
  • a transformation delivers M frequency components of field associated with the given encoded sound field, each frequency component of field among the M frequency components of field being located in a distinct frequency sub-band.
  • the encoded field vector y_i delivered by the microphone of index i, i from 1 to N, is segmented into frames of index n, of size T compatible with the stationarity of the sources present in the scene:
  • Y_i(n) = [y_i(t_n - T + 1) y_i(t_n - T + 2) ... y_i(t_n)]
  • the frame rate is for example the update rate of the weighting factors a_i(t), i.e.:
  • the transformation is applied to each component of the vector y_i representing the sound field encoded by the microphone 100m of index i (i.e. to the sensed pressure, to the components of the vector of pressure gradients, as well as to the higher-order components present in the encoded sound field, if applicable), to produce a time-frequency representation.
  • the transformation in question is a direct Fourier transform. We thus obtain, for the l-th component of the vector Y_i:
  • the number of frequency components M is equal to the size T of the analysis frame.
  • the vector consisting of the set of components Y_{l,i}(n, ω) (or Y_{l,i}(n, k)) for the different l represents the frequency component of the field y_i in the frequency sub-band ω (or k) considered.
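A per-component Fourier analysis of one frame can be sketched as follows (a plain one-sided FFT along the time axis; the embodiment equally allows (near-)perfect-reconstruction filter banks such as QMF or MDCT instead):

```python
import numpy as np

def analyze_frame(frame):
    """Fourier analysis of one frame of the encoded field.
    frame: shape (T, C), T samples and C ambisonic components; returns the
    one-sided frequency components for each component, shape (T//2 + 1, C)."""
    return np.fft.rfft(np.asarray(frame, dtype=float), axis=0)
```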
  • the transformation applied in step E500 is not a Fourier transform but a filter bank with (almost) perfect reconstruction, for example:
  • MDCT, from "Modified Discrete Cosine Transform".
  • step E500 is repeated for the N sound fields encoded by the N microphones 100m selected, delivering N corresponding sets of M frequency field components.
  • steps E210 and E210a described above in relation to [fig. 2] are implemented for each frequency sub-band among the M frequency sub-bands. More particularly, for a given frequency sub-band among the M frequency sub-bands, the interpolation delivers a field frequency component interpolated at the interpolation position and located in the given frequency sub-band.
  • the interpolated field frequency component is expressed as a linear combination of the N field frequency components, among the N sets, located in the given frequency sub-band.
  • the resolution of the systems of equations making it possible to determine the weighting factors is performed in each of the frequency sub-bands, producing one set of weighting factors per frequency sub-band, a_i(n, ω) (or a_i(n, k)).
  • the effective power in each frequency sub-band is estimated, for example, by a sliding average.
  • the repeated interpolation for the M frequency sub-bands delivers M frequency components of the interpolated field at the interpolation position, each of the M frequency components of the interpolated field being located in a distinct frequency sub-band.
  • a transformation inverse to the transformation applied during step E500 is applied to the M interpolated field frequency components, delivering the encoded sound field interpolated at the interpolation position.
  • the inverse transformation applied during step E510 is an inverse Fourier transform.
  • the sound field is picked up by the microphones 110m, each microphone among the microphones 110m delivering a corresponding picked up sound field.
  • each of the captured sound fields is encoded in a form comprising the captured pressure and an associated pressure gradient vector.
  • the sound field picked up by the microphones 110m is encoded in a form comprising the sensed pressure, an associated pressure gradient vector and all or part of the higher-order components of the sound field decomposed in ambisonic format.
  • the restitution method comprises an interpolation phase E620 corresponding to the implementation of the interpolation method according to the invention (according to any of the embodiments and/or variants described above in relation to [fig. 2] and [fig. 5]), delivering the encoded sound field interpolated at the interpolation position, e.g. the position of the listener 110.
  • the interpolated encoded sound field is compressed, e.g. by implementing entropy coding.
  • a compressed interpolated encoded sound field is thus delivered.
  • the compression step E630 is implemented by the device 700 (described below in relation to [fig. 7]), which is remote from the 110hp rendering device.
  • the compressed interpolated encoded sound field delivered by the device 700 is transmitted to the 110hp playback device.
  • the compressed interpolated encoded sound field is transmitted to another device having sufficient computing capacity to decompress compressed content, e.g. a smartphone, a computer or any other connected terminal, for later transmission.
  • the compressed interpolated encoded sound field received by the 110hp playback device is decompressed in order to deliver the samples of the interpolated encoded sound field in the coding format used (i.e. in the format comprising at least the pressure sensed by the corresponding microphone 110m, the components of the pressure gradient vector and, where applicable, the higher-order components present in the encoded sound field).
  • in step E660, the interpolated encoded sound field is rendered on the 110hp reproduction device.
  • the interpolation position corresponds to the physical position of the listener 110
  • the latter has the impression that the sound field rendered to him is consistent with the sound sources 100s (i.e. that the rendered field effectively arrives from the direction of the sound sources 100s).
  • steps E630 of compression and E650 of decompression are not implemented.
  • it is the raw samples of the interpolated encoded sound field which are transmitted to the 110hp reproduction device.
  • the device 700 implementing at least the interpolation phase E620 is embedded in the 110hp rendering device.
  • it is the samples of the encoded sound field (compressed or not, depending on the variant) which are transmitted to the 110hp playback device during step E640, and not the samples of the interpolated encoded sound field.
  • step E640 is implemented just after the steps E600 and E610 of capture and encoding.
  • the device 700 comprises a random access memory 703 (for example a RAM memory), a processing unit 702 equipped for example with a processor, and controlled by a computer program stored in a read-only memory 701 (for example a ROM memory or a hard disc). On initialization, the code instructions of the computer program are for example loaded into the random access memory 703 before being executed by the processor of the processing unit 702.
  • This [fig. 7] illustrates only one particular way, among several possible, of producing the device 700 so that it performs certain steps of the interpolation method according to the invention (according to any one of the embodiments and/or variants described above in relation to [fig. 2] and [fig. 5]). Indeed, these steps can be carried out equally well on a reprogrammable computing machine (a PC, a DSP processor or a microcontroller) executing a program comprising a sequence of instructions, or on a dedicated computing machine (for example a set of logic gates such as an FPGA or an ASIC, or any other hardware module).
  • the corresponding program (that is to say the sequence of instructions) may be stored in a removable storage medium (such as for example a floppy disk, CD-ROM or DVD-ROM) or not, this storage medium being partially or completely readable by a computer or a processor.
  • the device 700 is also configured to implement all or part of the additional steps of the restitution process of [fig. 6] (e.g. steps E600, E610, E630, E640, E650 or E660).
  • the device 700 is included in the 110hp rendering device.
  • the device 700 is included in one of the microphones 110m or is duplicated in several of the microphones 110m.
  • the device 700 is included in equipment remote from both the microphones 110m and the 110hp playback device.
  • the remote equipment is an MPEG-H 3D decoder, a content server, a computer, etc.
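As a hedged illustration of the time-frequency analysis of step E500 and the inverse transformation of step E510, the framing and per-component Fourier analysis/synthesis described above might be sketched as follows with NumPy. The Hann window, the hop size and the overlap-add normalization are assumptions of this sketch, not prescribed by the text:

```python
import numpy as np

def analyze(encoded_field, frame_size, hop):
    """Segment each component of an encoded field (shape: n_comp x n_samples)
    into frames of size T = frame_size and apply a direct FFT per frame
    (sketch of step E500). Returns an array of shape (n_frames, n_comp, M)."""
    n_comp, n_samp = encoded_field.shape
    window = np.hanning(frame_size)
    frames = []
    for start in range(0, n_samp - frame_size + 1, hop):
        frame = encoded_field[:, start:start + frame_size] * window
        frames.append(np.fft.rfft(frame, axis=1))  # M frequency sub-bands
    return np.stack(frames)

def synthesize(spectra, frame_size, hop, n_samp):
    """Inverse FFT and windowed overlap-add (sketch of step E510)."""
    window = np.hanning(frame_size)
    out = np.zeros((spectra.shape[1], n_samp))
    norm = np.zeros(n_samp)
    for f, spec in enumerate(spectra):
        start = f * hop
        out[:, start:start + frame_size] += np.fft.irfft(spec, n=frame_size, axis=1) * window
        norm[start:start + frame_size] += window ** 2
    norm[norm == 0] = 1.0  # avoid dividing by zero at the uncovered edges
    return out / norm
```

With a frame size T compatible with the stationarity of the sources, the analysis/synthesis round trip reconstructs the encoded field exactly wherever the windows overlap.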
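The per-sub-band interpolation (steps E210 and E210a applied per band), in which the interpolated field frequency component is a linear combination of the N microphones' frequency components weighted by a_i(n, ω), might look like the following sketch; the weights are assumed to have been determined already by the resolution of the systems of equations mentioned above:

```python
import numpy as np

def interpolate_field(spectra, weights):
    """Linear-combination interpolation carried out independently in each
    frequency sub-band.
    spectra: complex array of shape (N, n_comp, M) -- the frequency field
             components of the N encoded sound fields (n_comp encoded
             components, M sub-bands).
    weights: array of shape (N, M) -- one weighting factor per microphone
             and per sub-band, a_i(n, w).
    Returns the interpolated spectrum of shape (n_comp, M)."""
    return np.einsum('nm,ncm->cm', weights, spectra)
```

With equal weights 1/N in every sub-band, this reduces to a plain average of the N encoded fields.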
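The resolution of the systems of equations that determine the weighting factors is not spelled out in this excerpt; as a generic, hedged sketch, a per-sub-band least-squares resolution (the matrix A and right-hand side b being placeholders for whatever constraints the method actually imposes) could be written as:

```python
import numpy as np

def solve_band_weights(A, b):
    """Generic least-squares resolution of one sub-band's linear system
    A @ a = b for the weighting-factor vector a (N factors, one per
    microphone). A and b are placeholders: the actual system depends on
    the interpolation constraints chosen in the method."""
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a
```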
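The sliding-average estimate of the effective power in each frequency sub-band can be illustrated with a recursive (exponential) average; the smoothing factor alpha is an assumption of this sketch, and the exact form of sliding average used by the method is not specified in this excerpt:

```python
import numpy as np

def sliding_power(spectra, alpha=0.9):
    """Recursive sliding-average estimate of the effective power per
    frequency sub-band: P(n, w) = alpha * P(n-1, w) + (1 - alpha) * |Y(n, w)|^2.
    spectra: complex array of shape (n_frames, M).
    Returns a real array of shape (n_frames, M) of power estimates."""
    power = np.zeros(spectra.shape)
    prev = np.zeros(spectra.shape[1])
    for n, frame in enumerate(spectra):
        prev = alpha * prev + (1 - alpha) * np.abs(frame) ** 2
        power[n] = prev
    return power
```

For a stationary input the estimate converges to the true sub-band power; larger alpha smooths more at the cost of slower tracking.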

EP19816809.8A 2018-12-14 2019-12-13 Verfahren zur interpolation eines schallfeldes und zugehöriges computerprogrammprodukt und vorrichtung Active EP3895446B1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR1872951A FR3090179B1 (fr) 2018-12-14 2018-12-14 Procédé d’interpolation d’un champ sonore, produit programme d’ordinateur et dispositif correspondants.
PCT/EP2019/085175 WO2020120772A1 (fr) 2018-12-14 2019-12-13 Procédé d'interpolation d'un champ sonore, produit programme d'ordinateur et dispositif correspondants

Publications (2)

Publication Number Publication Date
EP3895446A1 true EP3895446A1 (de) 2021-10-20
EP3895446B1 EP3895446B1 (de) 2023-01-25

Family

ID=66530214

Country Status (4)

Country Link
US (1) US11736882B2 (de)
EP (1) EP3895446B1 (de)
FR (1) FR3090179B1 (de)
WO (1) WO2020120772A1 (de)

Also Published As

Publication number Publication date
FR3090179B1 (fr) 2021-04-09
FR3090179A1 (fr) 2020-06-19
WO2020120772A1 (fr) 2020-06-18
US11736882B2 (en) 2023-08-22
EP3895446B1 (de) 2023-01-25
US20220132262A1 (en) 2022-04-28

Legal Events

  • STAA status history: unknown → the international publication has been made → request for examination was made → grant of patent is intended → the patent has been granted → no opposition filed within time limit
  • PUAI public reference made under article 153(3) EPC to a published international application that has entered the European phase
  • 17P request for examination filed, effective 20210521
  • AK designated contracting states (A1 and B1): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
  • DAV request for validation of the European patent (deleted); DAX request for extension of the European patent (deleted)
  • GRAP despatch of communication of intention to grant a patent; INTG intention to grant announced, effective 20220804
  • GRAS grant fee paid; GRAA (expected) grant
  • REG references to national codes: GB FG4D (not English); CH EP; AT REF (ref. document 1546591, kind code T, effective 20230215); IE FG4D (language of EP document: French); DE R096 and R097 (ref. document 602019024868); LT MG9D; NL MP (effective 20230125); AT MK05 (ref. document 1546591, kind code T, effective 20230125)
  • PG25 lapsed in a contracting state (lapse because of failure to submit a translation of the description or to pay the fee within the prescribed time limit): AT, CZ, DK, EE, ES, FI, HR, LT, LV, NL, PL, RO, RS, SE, SI, SK, SM effective 20230125; NO effective 20230425; GR effective 20230426; IS, PT effective 20230525
  • PLBE no opposition filed within time limit; 26N no opposition filed, effective 20231026
  • PGFP annual fee paid to national office, year of fee payment 5: GB 20231229; FR 20231219; DE 20231221