EP3895446A1 - Method for interpolating a sound field, and associated computer program product and device - Google Patents
Method for interpolating a sound field, and associated computer program product and device
- Publication number
- EP3895446A1 (application EP19816809.8A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- microphones
- sound field
- interpolation
- field
- interpolated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 79
- 238000004590 computer program Methods 0.000 title claims description 6
- 239000013598 vector Substances 0.000 claims abstract description 70
- 230000009466 transformation Effects 0.000 claims description 24
- 238000004364 calculation method Methods 0.000 claims description 14
- 230000006835 compression Effects 0.000 claims description 6
- 238000007906 compression Methods 0.000 claims description 6
- 230000002441 reversible effect Effects 0.000 claims description 6
- 238000001914 filtration Methods 0.000 claims description 5
- 230000006837 decompression Effects 0.000 claims description 4
- 238000012935 Averaging Methods 0.000 claims description 3
- 230000005540 biological transmission Effects 0.000 claims description 3
- 238000000265 homogenisation Methods 0.000 claims description 3
- 230000000875 corresponding effect Effects 0.000 description 13
- 230000008569 process Effects 0.000 description 10
- 238000013459 approach Methods 0.000 description 7
- 230000006870 function Effects 0.000 description 6
- 238000012545 processing Methods 0.000 description 5
- 238000004422 calculation algorithm Methods 0.000 description 4
- 238000009877 rendering Methods 0.000 description 4
- 230000008901 benefit Effects 0.000 description 3
- 230000007423 decrease Effects 0.000 description 3
- 239000000203 mixture Substances 0.000 description 3
- 238000005070 sampling Methods 0.000 description 3
- 230000003321 amplification Effects 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 2
- 230000001427 coherent effect Effects 0.000 description 2
- 230000002596 correlated effect Effects 0.000 description 2
- 238000000354 decomposition reaction Methods 0.000 description 2
- 230000008447 perception Effects 0.000 description 2
- 239000000969 carrier Substances 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000004040 coloring Methods 0.000 description 1
- 238000009795 derivation Methods 0.000 description 1
- 238000009499 grossing Methods 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 239000002245 particle Substances 0.000 description 1
- 230000001902 propagating effect Effects 0.000 description 1
- 230000010349 pulsation Effects 0.000 description 1
- 230000009257 reactivity Effects 0.000 description 1
- 230000033764 rhythmic process Effects 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
- 230000005236 sound signal Effects 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/301—Automatic calibration of stereophonic sound system, e.g. with test microphone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
- H04S7/304—For headphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/11—Application of ambisonics in stereophonic audio systems
Definitions
- TITLE Sound field interpolation process, computer program product and corresponding device.
- the field of the invention is that of the interpolation of a sound (or acoustic) field having been emitted by one or more sources and having been picked up by a finite set of microphones.
- the invention has many applications, in particular, but not exclusively, in the field of virtual reality, for example to allow a listener to move within a sound scene rendered to him, or in the field of sound scene analysis, for example to determine the number of sound sources present in the analyzed scene, or in the field of the reproduction of a multichannel scene, for example within an MPEG-H 3D decoder, etc.
- a classic approach consists in estimating the sound field at the given position using linear interpolation between the fields as captured and encoded by the different microphones of the scene.
- the interpolation coefficients are estimated by minimizing a cost function.
- an ambisonic microphone encodes and delivers the sound field which it picks up in an ambisonic format.
- the ambisonic format is characterized by components which consist of the projection of the sound field according to different directivities. These components are grouped by order. The zero order encodes the instantaneous sound pressure picked up by the microphone, the first order encodes the three pressure gradients along the three axes of space, etc. The higher the order, the greater the spatial resolution of the representation of the field.
- the ambisonic format in its complete representation makes it possible to encode the field at any point inside the maximum sphere free of sound sources, and having as center the physical location of the microphone having performed the capture.
- Such encoding of the sound field theoretically makes it possible, from a single microphone, to move within the zone delimited by the source closest to the microphone, without however being able to bypass any of the sources in question.
- Such microphones thus make it possible to represent the sound field in three dimensions via a decomposition of the latter into spherical harmonics.
- This decomposition is particularly suitable for so-called 3DoF ("Degrees of Freedom") navigation, i.e. navigation along the three dimensions. It is this format that was chosen for immersive content on the virtual reality channel of YouTube or on Facebook-360.
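As a rough illustration of the first-order encoding described above, the following sketch builds the four components [W, X, Y, Z] of a plane wave (the signal and angles are made-up values, and the normalization factors vary between ambisonic conventions):

```python
import numpy as np

def encode_first_order(s, azimuth, elevation):
    """Encode a mono plane-wave signal s(t) into first-order
    ambisonic components [W, X, Y, Z] (one common convention;
    normalization factors differ between ambisonic standards)."""
    w = s                                        # order 0: pressure
    x = s * np.cos(azimuth) * np.cos(elevation)  # order 1: gradient along x
    y = s * np.sin(azimuth) * np.cos(elevation)  # order 1: gradient along y
    z = s * np.sin(elevation)                    # order 1: gradient along z
    return np.stack([w, x, y, z])

sig = np.sin(2 * np.pi * 440 * np.arange(480) / 48000)
b = encode_first_order(sig, azimuth=np.pi / 2, elevation=0.0)
# For a source at azimuth 90 degrees the gradient points along y: X ~ 0, Y == W
```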
- the method must allow the sound field at the interpolation position to be estimated so that the field in question is consistent with the position of the sound sources. For example, a listener at the interpolation position must have the impression that the interpolated field actually arrives from the sound source(s) of the sound scene when the field in question is rendered (e.g. to allow the listener to navigate the sound scene).
- a method of interpolating a sound field picked up by a plurality of N microphones, each delivering the encoded sound field in a form comprising at least one sensed pressure and a vector of associated pressure gradients, comprises an interpolation of the sound field at an interpolation position delivering an interpolated encoded sound field expressed as a linear combination of the N encoded sound fields, each weighted by a corresponding weighting factor.
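The linear combination in this claim can be sketched as follows (a minimal illustration with made-up array shapes, not the patented implementation itself):

```python
import numpy as np

def interpolate(encoded_fields, weights):
    """encoded_fields: array (N, 4, T) of N encoded fields, each with a
    pressure component W plus the 3-component pressure-gradient vector,
    over T samples; weights: array (N,) of weighting factors.
    Returns the interpolated encoded field, shape (4, T)."""
    return np.tensordot(weights, encoded_fields, axes=1)

fields = np.random.default_rng(0).normal(size=(3, 4, 128))
a = np.array([0.5, 0.3, 0.2])   # weighting factors, here summing to one
y_interp = interpolate(fields, a)
```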
- the method further comprises an estimation of the N weighting factors from at least:
- the invention proposes a new and inventive solution for carrying out an interpolation of a sound field picked up by at least two microphones, for example in a scene comprising one or more sound source (s).
- the proposed method takes advantage of the encoding of the sound field in a form giving access to the vector of pressure gradients in addition to the pressure.
- the vector of pressure gradients of the interpolated field remains consistent with that of the sound field as emitted by the source or sources of the scene at the interpolation position.
- a listener at the interpolation position listening to the interpolated field has the impression that the field rendered to him is coherent with the sound source(s) (i.e. that the rendered field actually arrives from the direction of the sound source(s) in question).
- the use of an estimated power of the sound field at the interpolation position to estimate the weighting factors makes it possible to keep a low computational complexity. This allows for example a real-time implementation on devices with limited computing capacity.
- the estimation implements the resolution of an equation [equation not reproduced].
- the equation in question is solved in the sense of minimizing the mean square error, e.g. by minimizing a quadratic cost function.
- the resolution method (e.g. the Simplex algorithm) depends on whether the system is overdetermined (more equations than microphones) or underdetermined (more microphones than equations).
- the resolution is carried out under an additional constraint (the hyperplane constraint referred to below).
- the resolution is also carried out under the constraint that the N weighting factors a_i(t) are all positive or zero. Phase reversals are thus avoided, thereby leading to improved results. In addition, the resolution of the above equation is accelerated.
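A minimal numerical sketch of such a constrained least-squares resolution (hypothetical matrix B and target g; only the hyperplane constraint sum(a) = 1 is enforced here, via the KKT system — the positivity constraint above would in practice be handled by a constrained solver such as the Simplex algorithm):

```python
import numpy as np

def solve_weights(B, g):
    """Minimize ||B a - g||^2 subject to sum(a) = 1, via the KKT system
    [2 B^T B, 1; 1^T, 0] [a; lambda] = [2 B^T g; 1]."""
    n = B.shape[1]
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = 2 * B.T @ B
    kkt[:n, n] = 1.0        # constraint gradient column
    kkt[n, :n] = 1.0        # constraint row: sum(a) = 1
    rhs = np.concatenate([2 * B.T @ g, [1.0]])
    return np.linalg.solve(kkt, rhs)[:n]

B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
g = np.array([0.4, 0.6, 1.0])
a = solve_weights(B, g)
# a sums to one by construction
```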
- the estimation also implements a homogenization factor.
- the homogenization factor α is proportional to the L2 norm of the vector x_a(t).
- the estimate includes:
- the estimate Ŵ_a²(t) of the power of the sound field at the interpolation position is obtained from the instantaneous sound power W_i²(t) picked up by that of the N microphones closest to the interpolation position, or from the estimate Ŵ_i²(t) of that instantaneous sound power.
- the estimate Ŵ_a²(t) of the power of the sound field at the interpolation position is obtained from a barycenter of the N instantaneous sound powers W_i²(t) picked up by the N microphones, respectively from a barycenter of the N estimates Ŵ_i²(t) of the N instantaneous sound powers W_i²(t) picked up by the N microphones.
- a coefficient weighting the instantaneous sound power W_i²(t), respectively the estimate Ŵ_i²(t) of the instantaneous sound power W_i²(t), picked up by the microphone with index i in the barycenter is inversely proportional to a normalized version of the distance between the position of the microphone with index i and the interpolation position. The distance is expressed in the sense of an Lp norm.
- the interpolation method further comprises, prior to the interpolation, a selection of the N microphones from among Nt microphones, Nt > N.
- the weighting factors can be obtained via a system of determined or overdetermined equations, thus making it possible to avoid or at least minimize the changes in timbre perceptible to the ear on the interpolated sound field.
- the N microphones selected are the closest to the interpolation position among the Nt microphones.
- the selection includes:
- the microphones are selected so as to be distributed around the interpolation position.
- the median vector u_{i1,i2}(t) is expressed as a function of x_a(t), the vector representative of the interpolation position, x_{i1}(t), a vector representative of the position of the microphone with index i1, and x_{i2}(t), a vector representative of the position of the microphone with index i2.
- the index i3 of the third microphone is an index, different from i1 and i2, which minimizes the dot product among the Nt microphone indices.
- the interpolation method further comprises, for a given encoded sound field among the N encoded sound fields delivered by the N microphones, a transformation of the given encoded sound field by application of a perfect-reconstruction filter bank delivering M field frequency components associated with the given encoded sound field, each of the M field frequency components being located in a separate frequency sub-band.
- the transformation, repeated for the N encoded sound fields, delivers N corresponding sets of M field frequency components.
- the interpolation delivers a field frequency component interpolated at the interpolation position and located in the given frequency sub-band; the interpolated field frequency component is expressed as a linear combination of the N field frequency components, among the N sets, located in the given frequency sub-band.
- the interpolation, repeated for the M frequency sub-bands, delivers M frequency components of the interpolated field at the interpolation position, each frequency component of the interpolated field being located in a separate frequency sub-band.
- the results are improved in the case where the sound field is generated by a plurality of sound sources.
- the interpolation method further comprises a transformation inverse to said transformation.
- the inverse transformation, applied to the M interpolated field frequency components, delivers the interpolated encoded sound field at the interpolation position.
- the bank of filters with perfect reconstruction belongs to the group comprising:
- MDCT, from "Modified Discrete Cosine Transform".
- the invention also relates to a method for restoring a sound field.
- Such a method includes:
- the invention also relates to a computer program, comprising program code instructions for the implementation of an interpolation or restitution method as described above, according to any one of its different embodiments, when said program is executed by a processor.
- a device for interpolating a sound field picked up by a plurality of N microphones each delivering the encoded sound field in a form comprising at least one sensed pressure and a vector of associated pressure gradients comprises a reprogrammable calculation machine or a dedicated calculation machine, capable of and configured to implement the steps of the interpolation method described above (according to any one of its different embodiments).
- FIG. 1 represents a sound scene in which a listener moves, a sound field having been diffused by sound sources and having been picked up by microphones;
- FIG. 2 represents the stages of a process of interpolation of the sound field picked up by the microphones of [fig. 1] according to one embodiment of the invention
- FIG. 3a represents a scene in which a sound field is diffused by a single sound source and is picked up by four microphones according to a first configuration
- [fig. 3b] represents a map of the opposite of the normalized acoustic intensity in the 2D plane generated by the sound source of the scene in [fig. 3a] as well as a map of the opposite of the normalized acoustic intensity as estimated by a known method from the quantities picked up by the four microphones of [fig. 3a];
- [fig. 3c] represents a map of the opposite of the normalized acoustic intensity in the 2D plane generated by the sound source of the scene in [fig. 3a] as well as a map of the opposite of the normalized acoustic intensity as estimated by the method of the figure [fig. 2] from the quantities picked up by the four microphones in [fig. 3a];
- [fig. 4a] represents another scene in which a sound field is diffused by a single sound source and is picked up by four microphones according to a second configuration;
- [fig. 4b] represents a map of the opposite of the normalized acoustic intensity in the 2D plane generated by the sound source of the scene in [fig. 4a], as well as a map of the opposite of the normalized acoustic intensity of the sound field as estimated by a known method from the quantities picked up by the four microphones of [fig. 4a];
- [fig. 4c] represents a map of the opposite of the normalized acoustic intensity in the 2D plane generated by the sound source of the scene in [fig. 4a] as well as a map of the opposite of the normalized acoustic intensity of the sound field as estimated by the method in Figure [fig. 2] from the quantities picked up by the four microphones in [fig. 4a];
- FIG. 5 represents the stages of a process of interpolation of the sound field picked up by the microphones of [fig. 1] according to another embodiment of the invention
- FIG. 6 represents the stages of a restitution process, to the listener of [fig. 1], of the sound field picked up by the microphones in [fig. 1] according to one embodiment of the invention
- FIG. 7 shows an example of an interpolation device structure according to an embodiment of the invention.
- the general principle of the invention is based on the encoding of the sound field by the microphones picking up the sound field in question in a form comprising at least one sensed pressure and an associated pressure gradient.
- the pressure gradient of the field interpolated via a linear combination of the sound fields encoded by the microphones remains consistent with that of the sound field as emitted by the source (s) of the scene at the interpolation position.
- the method according to the invention bases the estimation of the weighting factors involved in the linear combination in question on an estimation of the power of the sound field at the interpolation position.
- a low computational complexity is obtained.
- encoding (or coding) terminology is used to designate the operation of representing a physical sound field picked up by a given microphone according to one or more quantities according to a predefined representation format.
- a predefined representation format is for example the ambisonic format described above in relation to the section "Prior art and its drawbacks”.
- the reverse operation is then similar to a restitution of the sound field, e.g. on a loudspeaker type device which converts samples of the sound field in the predefined representation format into a physical sound field; and
- compression terminology is used to designate processing aimed at reducing the amount of data necessary to represent a given amount of information. This is, for example, a processing of the “entropy coding” type (eg according to the MP3 standard) applied to samples of the encoded sound field.
- the decompression terminology thus corresponds to the reverse operation.
- the listener 110 is provided with a headset equipped with HOhp speakers allowing the restitution of the interpolated sound field at the interpolation position which it occupies.
- This is for example a Hi-Fi headset, or a virtual reality headset like the Oculus, the HTC Vive or the Samsung Gear.
- the sound field is here interpolated and restored by implementing the reproduction process described below in relation to [fig. 6]
- the sound field picked up by the 100m microphones is encoded in a form comprising a captured pressure and an associated pressure gradient.
- the sound field picked up by the microphones is encoded in a form comprising the pressure picked up, the vector of the associated pressure gradients, as well as all or part of the higher-order components of the sound field in ambisonic format.
- the perception of the direction of arrival of the wave front of the sound field is directly correlated with an acoustic intensity vector I(t) which measures the instantaneous flow of acoustic energy through an elementary surface.
- the intensity vector in question is equal to the product of the instantaneous sound pressure W(t) and the particle velocity, which is opposite to the vector of pressure gradients B(t).
- This vector of pressure gradients can be expressed in 2D or 3D depending on whether one wants to move and/or perceive sounds in 2D or 3D. In the following, we place ourselves in the 3D case, the derivation of the 2D case being immediate.
- this vector is orthogonal to the wave front and points in the direction of propagation of the sound wave, i.e. opposite to the position of the emitting source: in this sense, it is directly correlated with the perception of the wave front. This is particularly obvious if we consider a field generated by a single point-like and distant source s(t) propagating in an anechoic medium.
- ambisonic theory stipulates that, for such a plane wave of incidence (θ, φ), where θ is the azimuth and φ the elevation, the first-order sound field is given by the following equation: B(t) = s(t) · [1, cos θ cos φ, sin θ cos φ, sin φ]^T.
- the full-band acoustic intensity I(t) is equal (to within a multiplicative coefficient) to: I(t) = -s²(t) · [cos θ cos φ, sin θ cos φ, sin φ]^T.
- the method according to the invention implements the resolution of systems of equations (i.e. [Math 4], in its different constraint alternatives (hyperplane and/or positive weighting factors), and [Math 5]).
- the resolution of the systems in question in the case where they are under-determined (a case which corresponds to the configuration where there are more microphones 100m than equations to be solved) leads to solutions which, over time, may favor different sets of microphones. While the localization of the sources 100s as perceived via the interpolated sound field always remains consistent, this nevertheless results in changes of timbre perceptible to the ear.
- N microphones 100m are selected so as to reduce to a determined, or even over-determined, system. For example, in the case of a 3D interpolation, up to three microphones may be selected from among the Nt microphones 100m.
- the N microphones 100m closest to the position to be interpolated are selected. This solution is to be preferred when a large number Nt of microphones 100m is present in the scene. However, in certain cases, the choice of the N closest microphones 100m may prove to be "unbalanced" with regard to the interpolation position relative to the source 100s and lead to a complete inversion of the direction of arrival: this is particularly the case when the source 100s is placed between the microphones 100m and the interpolation position.
- step E200 includes for example:
- x_a(t) = (x_a(t), y_a(t), z_a(t))^T, a vector representative of the interpolation position (i.e. the position of the listener 110 in the embodiment shown in [fig. 1]);
- x_{i1}(t) = (x_{i1}(t), y_{i1}(t), z_{i1}(t))^T, a vector representative of the position of the microphone with index i1;
- the index i3 of said third microphone is, for example, an index different from i1 and i2 which minimizes the dot product among the Nt microphone indices;
- the dot product in question varies between -1 and +1, and it is minimal when the vector u_{i1,i2}(t) and the normalized vector pointing from the interpolation position toward the third microphone are opposite, that is to say when the 3 microphones selected from among the Nt microphones 100m surround the interpolation position.
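The selection heuristic described above can be sketched as follows (the exact definition of the median vector is an assumption here: it is taken as pointing from the midpoint of the two closest microphones toward the interpolation position):

```python
import numpy as np

def select_three(mic_positions, x_a):
    """Pick the two microphones closest to the interpolation position x_a,
    then the third microphone whose direction from x_a minimizes the dot
    product with the (assumed) median vector, so the three surround x_a."""
    d = np.linalg.norm(mic_positions - x_a, axis=1)
    i1, i2 = np.argsort(d)[:2]                       # two closest microphones
    u = x_a - 0.5 * (mic_positions[i1] + mic_positions[i2])
    u = u / np.linalg.norm(u)                        # assumed median vector
    best, best_dot = None, np.inf
    for i3 in range(len(mic_positions)):
        if i3 in (i1, i2):
            continue
        v = mic_positions[i3] - x_a
        v = v / np.linalg.norm(v)
        dot = float(u @ v)                           # in [-1, +1]
        if dot < best_dot:
            best, best_dot = i3, dot
    return i1, i2, best

mics = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 2.0], [0.5, -3.0]])
i1, i2, i3 = select_three(mics, np.array([0.5, 0.5]))
# The third pick lies on the opposite side of x_a from the first two
```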
- the selection step E200 is not implemented and the steps E210 and E210a described below are implemented on the basis of the sound fields encoded by all of the Nt microphones 100m.
- N = Nt for the implementation of steps E210 and E210a in the other embodiments in question.
- the method comprises a step E210 of interpolation of the sound field at an interpolation position, delivering an interpolated encoded sound field expressed as a linear combination of the N sound fields encoded by the N selected microphones 100m, the N encoded sound fields each being weighted by a corresponding weighting factor.
- the interpolation method according to the invention applies in the same way in order to estimate the weighting factors a_i(t).
- the first-order components are inversely proportional to the distance between the active source 100s and the measurement point, e.g. the microphone 100m with index i, and point from the active source 100s toward the microphone 100m with index i in question.
- x_s(t) = (x_s(t), y_s(t), z_s(t))^T, a vector representative of the position of the active source 100s;
- d(x_i(t), x_s(t)) is the distance between the microphone 100m with index i and the active source 100s.
- the first order component (i.e. the vector of pressure gradients) of the encoded sound field is oriented in the “source-point of capture” direction;
- the amplitude of the sound field decreases in inverse proportion to the distance.
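A small numeric illustration of this point-source model (a hypothetical helper, not the patented encoder): the order-0 component decays with the source-microphone distance, and the first-order components point from the active source toward the capture point.

```python
import numpy as np

def encode_point_source(s, x_src, x_mic):
    """s: source signal; x_src, x_mic: 3D positions.
    Returns the pressure and first-order components under the
    point-source model sketched in the text."""
    d = np.linalg.norm(x_mic - x_src)
    direction = (x_mic - x_src) / d   # unit "source -> capture point" vector
    w = s / d                         # pressure, inversely proportional to d
    grad = np.outer(direction, w)     # first-order components, shape (3, T)
    return w, grad

s = np.ones(4)
w, grad = encode_point_source(s, np.array([0.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0]))
# At distance 2, the pressure is halved and the gradient is aligned with +x
```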
- the different positions mentioned above (eg of the active source 100s, microphones 100m, of the interpolation position, etc.) vary over time.
- the weighting factors a_i(t) are generally a function of time.
- Estimating the weighting factors a_i(t) amounts to solving a system of three linear equations (written above as a single vector equation in [Math 3]). So that the interpolation remains consistent over time with the interpolation position, which can vary over time (e.g. if the position in question corresponds to the position of the listener 110, who may move), it is carried out at different times with a time resolution T_a adapted to the speed of change of the interpolation position.
- W_a²(t), the square of the sound pressure at the interpolation position, also called the instantaneous acoustic power (or more simply the instantaneous power), is an unknown, as is the vector x_s(t) representative of the position of the active source 100s.
- an estimate Ŵ_a²(t) of the sound power at the interpolation position is for example obtained.
- a first approach consists in approaching the instantaneous sound power by that picked up by the microphone 100m closest to the interpolation position in question, i.e.:
- the instantaneous sound power W_i²(t) can vary rapidly over time, which can lead to a noisy estimate of the weighting factors a_i(t) and to an instability of the interpolated scene.
- the average or effective power picked up by the microphone 100m closest to the interpolation position over a time window around the instant considered is calculated, by averaging the instantaneous power over a frame of T samples:
- T corresponds to a duration of a few tens of milliseconds, or may even be equal to the temporal resolution T_a of the refreshing of the weighting factors a_i(t).
- α_w is determined in such a way as to integrate the power over a few tens of milliseconds.
- values of α_w from 0.95 to 0.98, for signal sampling frequencies ranging from 8 kHz to 48 kHz, achieve a good compromise between the robustness of the interpolation and its reactivity to changes in the position of the source.
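The smoothing implied by the coefficient above can be sketched as follows (the exact recursion used in the patent is assumed here to be a standard exponential average of the squared pressure):

```python
import numpy as np

def smoothed_power(w, alpha_w=0.97):
    """w: pressure samples; returns the running estimate of the power
    W^2(t), assumed to follow the recursion
    est(t) = alpha_w * est(t-1) + (1 - alpha_w) * w(t)^2."""
    est = np.empty_like(w)
    acc = 0.0
    for t, sample in enumerate(w):
        acc = alpha_w * acc + (1.0 - alpha_w) * sample ** 2
        est[t] = acc
    return est

w = np.ones(2000)                 # constant unit pressure
p = smoothed_power(w, alpha_w=0.95)
# The estimate rises smoothly toward the true power of 1.0
```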
- the instantaneous acoustic power W_a²(t) at the interpolation position is estimated as a barycenter of the N estimates Ŵ_i²(t) of the N instantaneous powers W_i²(t) of the N pressures picked up by the N selected microphones 100m.
- this barycenter of the N estimates Ŵ_i²(t) of the N instantaneous powers W_i²(t) of the N pressures picked up by the N selected microphones 100m is more relevant when the microphones 100m are spaced apart from each other.
- a coefficient weighting the estimate Ŵ_i²(t) of the instantaneous power W_i²(t) of the pressure sensed by the microphone 100m with index i in the barycentric expression above is inversely proportional to a normalized version of the distance, in the sense of the Lp norm, between the position of the microphone with index i delivering the pressure W_i(t) and the interpolation position.
- the instantaneous acoustic power W_a²(t) at the interpolation position is estimated directly as a barycenter of the N instantaneous powers W_i²(t) of the N pressures picked up by the N microphones 100m. In practice, this amounts to substituting W_i²(t) for Ŵ_i²(t) in the above equation.
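The barycentric estimate can be sketched as follows (the inverse-distance weights are normalized to sum to one, an assumption consistent with the text; positions and powers are made-up):

```python
import numpy as np

def barycentric_power(powers, mic_positions, x_a, p=2):
    """Estimate the power at the interpolation position x_a as a
    barycenter of the microphone powers, each weighted inversely to
    its normalized Lp distance to x_a."""
    d = np.linalg.norm(mic_positions - x_a, ord=p, axis=1)
    inv = 1.0 / np.maximum(d, 1e-12)   # guard against a zero distance
    coeffs = inv / inv.sum()           # normalized inverse-distance weights
    return float(coeffs @ powers)

powers = np.array([1.0, 4.0])
mics = np.array([[0.0, 0.0], [3.0, 0.0]])
# x_a twice as close to the first microphone -> its power counts twice as much
est = barycentric_power(powers, mics, np.array([1.0, 0.0]))
```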
- the weighting factors a_i(t) are estimated from:
- the resolution method (e.g. the Simplex algorithm) depends on whether the system is overdetermined (more equations than microphones) or underdetermined (more microphones than equations).
- W_i²(t) and Ŵ_a²(t) are for example estimated according to one of the variants proposed above.
- the resolution of such a linear system under linear constraints can be carried out by the Simplex algorithm or any other constrained minimization algorithm.
- the coefficient α makes it possible to homogenize the units of the quantities involved (e.g. Ŵ_a²(t) x_a(t) and W_a(t)): the quantities in question are not homogeneous and, depending on the unit chosen for the position coordinates (meter, centimeter, ...), the solutions would otherwise favor one quantity or the other.
- the coefficient α is for example chosen equal to the L2 norm of the vector x_a(t).
- the weighting factors a_i(t) are estimated from:
- the four microphones 300m are placed at the four corners of a room and the source 300s is placed in the center of the room.
- the room has an average reverberation, with a reverberation time (T60) of around 500 ms.
- the sound field picked up by the 300m microphones is encoded in a form comprising a captured pressure and the associated pressure gradient vector.
- the method comprises the step E200 of selecting N microphones from among the Nt microphones of the scene 100 described above in relation to [fig. 2].
- the selection step E200 is not implemented and the steps E500, E210 and E510 discussed below, are implemented on the basis of the sound fields encoded by all of the Nt microphones 100m.
- N = Nt in these other embodiments.
- the embodiment in question is found to be suitable for the case where several sources among the sources 100s are active simultaneously.
- the hypothesis of a full-band field resembling a plane wave is no longer valid. Indeed, even in an anechoic medium, the mixture of two plane waves is not a plane wave, except in the very specific case of the same source emitting from two points in space equidistant from the capture point.
- the "full band" field reconstruction procedure adapts to the preponderant source in the frame used for the calculation of the effective powers. This produces rapid variations in directivity, and sometimes inconsistencies in the localization of sources: when one source is more energetic than another, the two sources in question are estimated as being located at the position of the more energetic source.
- the embodiment of [fig. 5] exploits the parsimony of signals in the frequency domain.
- for speech signals, for example, it is statistically established that the frequency supports of several speech signals are globally disjoint: that is to say that, most of the time, only one source is present in each frequency band.
- the embodiment of [fig. 2] (according to any one of the aforementioned variants) can thus be applied to the signal present in each frequency band.
- a transformation of the given encoded sound field is carried out by applying a time-frequency transformation such as the Fourier transform, or a bank of filters with perfect or almost perfect reconstruction, such as quadrature mirror filters (QMF).
- this transformation delivers M field frequency components associated with the given encoded sound field, each of the M field frequency components being located in a distinct frequency sub-band.
- the encoded field vector y_i delivered by the microphone with index i, i from 1 to N, is segmented into frames of index n, of size T compatible with the stationarity of the sources present in the scene:
- Y_i(n) = [y_i(t_n - T + 1), y_i(t_n - T + 2), ..., y_i(t_n)]
- the frame rate is for example the update rate of the weighting factors w_i(t), ie:
- the transformation is applied to each component of the vector y_i representing the sound field encoded by the microphone 100m of index i (ie to the sensed pressure, to the components of the pressure gradient vector, as well as to any higher-order components present in the encoded sound field), to produce a time-frequency representation.
- the transformation in question is a direct Fourier transformation. Thus, for the m-th component of the vector Y_i, we obtain:
- the number M of frequency components is equal to the size T of the analysis frame.
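As an illustrative sketch of this framing and per-frame Fourier analysis (the frame size, hop and test signal below are invented for the example and are not taken from the patent):

```python
import numpy as np

def analysis_frames(y, T, hop):
    """Segment a signal into frames of size T with the given hop, then apply a
    DFT to each frame: each frame yields M = T frequency components."""
    n_frames = 1 + (len(y) - T) // hop
    frames = np.stack([y[n * hop : n * hop + T] for n in range(n_frames)])
    return np.fft.fft(frames, axis=1)   # shape (n_frames, M), with M = T

# One component of the encoded field (e.g. the pressure channel) of microphone i:
y_i = np.random.default_rng(0).standard_normal(48000)   # 1 s at 48 kHz
Y_i = analysis_frames(y_i, T=1024, hop=512)
assert Y_i.shape == (92, 1024)   # 92 frames, M = T = 1024 components each
```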
- the vector consisting of the set of components Y_i(n, ω) (or Y_i(n, k)) for the different i represents the frequency component of the fields y_i in the frequency sub-band ω (or k) considered.
- the transformation applied in step E500 is not a Fourier transform, but a bank of filters with (almost) perfect reconstruction, for example a bank of filters:
- MDCT, from “Modified Discrete Cosine Transform”.
- step E500 is repeated for the N sound fields encoded by the N microphones 100m selected, delivering N corresponding sets of M frequency field components.
- steps E210 and E210a described above in relation to [fig. 2] are implemented for each frequency sub-band among the M frequency sub-bands. More particularly, for a given frequency sub-band among the M frequency sub-bands, the interpolation delivers a field frequency component interpolated at the interpolation position and located in the given frequency sub-band.
- the interpolated field frequency component is expressed as a linear combination of the N field frequency components, among the N sets, located in the given frequency sub-band.
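This per-sub-band linear combination can be sketched as follows; the weights below are random placeholders standing in for the factors obtained from the per-band resolution, and the sizes N and M are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 4, 8                                  # 4 microphones, 8 sub-bands (toy sizes)
# Y[i, k]: field frequency component of microphone i in sub-band k.
Y = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
# w[i, k]: per-sub-band weighting factors (placeholder values, normalized per band).
w = rng.random((N, M))
w /= w.sum(axis=0)

# Interpolated component in each sub-band: linear combination over the N microphones.
Y_interp = (w * Y).sum(axis=0)
assert Y_interp.shape == (M,)
```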
- the resolution of the systems of equations making it possible to determine the weighting factors is performed in each of the frequency sub-bands, producing one set of weighting factors per frequency sub-band, w_i(n, ω) (or w_i(n, k)).
- the effective power in each frequency sub-band is estimated either by sliding average:
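The formula itself is elided above; purely as a hedged illustration, a sliding (exponential) average of the per-sub-band power |Y(n, k)|² might look like the following, where the smoothing factor alpha is an assumption:

```python
import numpy as np

def sliding_power(Y_frames, alpha=0.9):
    """Per-sub-band effective power, smoothed across frames by an exponential
    (sliding) average of |Y(n, k)|^2; alpha is an illustrative smoothing factor."""
    P = np.abs(Y_frames[0]) ** 2
    out = [P]
    for Y_n in Y_frames[1:]:
        P = alpha * P + (1.0 - alpha) * np.abs(Y_n) ** 2
        out.append(P)
    return np.array(out)        # shape (n_frames, M)

Y_frames = np.ones((5, 8)) * 2.0   # toy input: constant magnitude 2 in every band
P = sliding_power(Y_frames)
assert P.shape == (5, 8)
assert np.allclose(P, 4.0)         # power of a constant-magnitude field
```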
- the interpolation repeated over the M frequency sub-bands delivers M frequency components of the interpolated field at the interpolation position, each frequency component of the interpolated field being located in a distinct frequency sub-band.
- a transformation inverse to the transformation applied during step E500 is applied to the M interpolated field frequency components, delivering the interpolated encoded sound field at the interpolation position.
- the inverse transformation applied during step E510 is an inverse Fourier transform.
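The Fourier round trip between steps E500 and E510 can be checked in a few lines (the frame count and frame size are illustrative):

```python
import numpy as np

x = np.random.default_rng(2).standard_normal((10, 1024))  # 10 interpolated frames
Y = np.fft.fft(x, axis=1)               # analysis transform (as in step E500)
x_rec = np.fft.ifft(Y, axis=1).real     # inverse Fourier transform (step E510)
assert np.allclose(x, x_rec)            # reconstruction up to rounding error
```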
- the sound field is picked up by the microphones 110m, each microphone among the microphones 110m delivering a corresponding picked up sound field.
- each of the captured sound fields is encoded in a form comprising the captured pressure and an associated pressure gradient vector.
- the sound field picked up by the 110m microphones is encoded in a form comprising the sensed pressure, an associated pressure gradient vector, as well as all or part of the higher-order components of the sound field decomposed in ambisonic format.
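A hedged sketch of such an encoding for a single plane-wave source; the function name and gain convention are assumptions for the example (real ambisonic encoders differ, e.g. in SN3D vs N3D scaling):

```python
import numpy as np

def encode_first_order(p, azimuth, elevation):
    """Encode a source signal p arriving from (azimuth, elevation) as a pressure
    channel W plus a pressure-gradient vector (X, Y, Z), i.e. first-order
    ambisonics; the normalization used here is illustrative only."""
    W = p
    X = p * np.cos(azimuth) * np.cos(elevation)
    Y = p * np.sin(azimuth) * np.cos(elevation)
    Z = p * np.sin(elevation)
    return np.stack([W, X, Y, Z])

p = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)  # 440 Hz tone at 48 kHz
b = encode_first_order(p, azimuth=np.pi / 4, elevation=0.0)
assert b.shape == (4, 48000)
assert np.allclose(b[3], 0.0)   # zero elevation: no Z (vertical gradient) component
```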
- the restitution method comprises an interpolation phase E620 corresponding to the implementation of the interpolation method according to the invention (according to any of the embodiments and/or variants described above in relation to [fig. 2] and [fig. 5]), delivering the encoded sound field interpolated at the interpolation position, eg the position of the listener 110.
- the interpolated encoded sound field is compressed, e.g. by implementing entropy coding.
- a compressed interpolated encoded sound field is thus delivered.
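The patent mentions entropy coding without fixing a codec; purely as a stand-in for the compression/decompression round trip of steps E630/E650, a general-purpose lossless coder such as zlib can be used:

```python
import zlib
import numpy as np

# Toy quantized samples of the interpolated encoded sound field.
samples = (np.sin(2 * np.pi * np.arange(4096) / 64) * 1000).astype(np.int16)
raw = samples.tobytes()
compressed = zlib.compress(raw, level=9)     # stand-in for entropy coding (E630)
restored = np.frombuffer(zlib.decompress(compressed), dtype=np.int16)  # E650
assert np.array_equal(samples, restored)     # lossless round trip
assert len(compressed) < len(raw)
```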
- the compression step E630 is implemented by the device 700 (described below in relation to FIG. 7), which is remote from the 110hp rendering device.
- the compressed interpolated encoded sound field delivered by the device 700 is transmitted to the 110hp playback device.
- the compressed interpolated encoded sound field is transmitted to another device having a calculation capacity making it possible to decompress compressed content, eg a smartphone, a computer, or any other connected terminal with sufficient computing capacity, for later transmission.
- the compressed interpolated encoded sound field received by the 110hp playback device is decompressed in order to deliver the samples of the interpolated encoded sound field in the coding format used (ie in the format comprising at least the pressure sensed by the corresponding microphone 110m, the components of the pressure gradient vector, as well as any higher-order components present in the encoded sound field).
- in step E660, the interpolated encoded sound field is restored on the 110hp reproduction device.
- the interpolation position corresponds to the physical position of the listener 110
- the latter has the impression that the sound field restored to him is consistent with the sound sources 100s (ie that the restored field effectively reaches him from the sound sources 100s).
- steps E630 of compression and E650 of decompression are not implemented.
- it is the raw samples of the interpolated encoded sound field which are transmitted to the 110hp reproduction device.
- the device 700 implementing at least the interpolation phase E620 is embedded in the 110hp rendering device.
- it is the samples of the encoded sound field (compressed or not, depending on the variants) which are transmitted to the 110hp playback device during step E640, and not the samples of the interpolated encoded sound field.
- step E640 is implemented just after the steps E600 and E610 of capture and encoding.
- the device 700 comprises a random access memory 703 (for example a RAM), a processing unit 702 equipped for example with a processor, and controlled by a computer program stored in a read-only memory 701 (for example a ROM or a hard disk). On initialization, the code instructions of the computer program are for example loaded into the random access memory 703 before being executed by the processor of the processing unit 702.
- This [fig. 7] illustrates only one particular way, among several possible, of producing the device 700 so that it performs certain steps of the interpolation method according to the invention (according to any one of the embodiments and/or variants described above in relation to [fig. 2] and [fig. 5]). Indeed, these steps can be carried out equally well on a reprogrammable computing machine (a PC, a DSP processor or a microcontroller) executing a program comprising a sequence of instructions, or on a dedicated computing machine (for example a set of logic gates such as an FPGA or an ASIC, or any other hardware module).
- the corresponding program (that is to say the sequence of instructions) may be stored in a storage medium, removable (such as, for example, a floppy disk, a CD-ROM or a DVD-ROM) or not, this storage medium being partially or completely readable by a computer or a processor.
- the device 700 is also configured to implement all or part of the additional steps of the restitution process of [fig. 6] (e.g. steps E600, E610, E630, E640, E650 or E660).
- the device 700 is included in the 110hp rendering device.
- the device 700 is included in one of the microphones 110m or is duplicated in several of the microphones 110m.
- the device 700 is included in equipment remote from both the 110m microphones and the 110hp playback device.
- the remote equipment is an MPEG-H 3D decoder, a content server, a computer, etc.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Otolaryngology (AREA)
- Circuit For Audible Band Transducer (AREA)
- Stereophonic System (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR1872951A FR3090179B1 (fr) | 2018-12-14 | 2018-12-14 | Procédé d’interpolation d’un champ sonore, produit programme d’ordinateur et dispositif correspondants. |
PCT/EP2019/085175 WO2020120772A1 (fr) | 2018-12-14 | 2019-12-13 | Procédé d'interpolation d'un champ sonore, produit programme d'ordinateur et dispositif correspondants |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3895446A1 true EP3895446A1 (de) | 2021-10-20 |
EP3895446B1 EP3895446B1 (de) | 2023-01-25 |
Family
ID=66530214
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP19816809.8A Active EP3895446B1 (de) | 2018-12-14 | 2019-12-13 | Verfahren zur interpolation eines schallfeldes und zugehöriges computerprogrammprodukt und vorrichtung |
Country Status (4)
Country | Link |
---|---|
US (1) | US11736882B2 (de) |
EP (1) | EP3895446B1 (de) |
FR (1) | FR3090179B1 (de) |
WO (1) | WO2020120772A1 (de) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2588801A (en) * | 2019-11-08 | 2021-05-12 | Nokia Technologies Oy | Determination of sound source direction |
FR3131164B1 (fr) | 2021-12-16 | 2023-12-22 | Fond B Com | Procédé d’estimation d’une pluralité de signaux représentatifs du champ sonore en un point, dispositif électronique et programme d’ordinateur associés |
US20240098439A1 (en) * | 2022-09-15 | 2024-03-21 | Sony Interactive Entertainment Inc. | Multi-order optimized ambisonics encoding |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140355769A1 (en) * | 2013-05-29 | 2014-12-04 | Qualcomm Incorporated | Energy preservation for decomposed representations of a sound field |
US11032663B2 (en) * | 2016-09-29 | 2021-06-08 | The Trustees Of Princeton University | System and method for virtual navigation of sound fields through interpolation of signals from an array of microphone assemblies |
- 2018
- 2018-12-14 FR FR1872951A patent/FR3090179B1/fr active Active
- 2019
- 2019-12-13 US US17/413,229 patent/US11736882B2/en active Active
- 2019-12-13 EP EP19816809.8A patent/EP3895446B1/de active Active
- 2019-12-13 WO PCT/EP2019/085175 patent/WO2020120772A1/fr unknown
Also Published As
Publication number | Publication date |
---|---|
FR3090179B1 (fr) | 2021-04-09 |
FR3090179A1 (fr) | 2020-06-19 |
WO2020120772A1 (fr) | 2020-06-18 |
US11736882B2 (en) | 2023-08-22 |
EP3895446B1 (de) | 2023-01-25 |
US20220132262A1 (en) | 2022-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2898707B1 (de) | Optimierte kalibrierung eines klangwiedergabesystems mit mehreren lautsprechern | |
EP3895446B1 (de) | Verfahren zur interpolation eines schallfeldes und zugehöriges computerprogrammprodukt und vorrichtung | |
EP2374124B1 (de) | Verwaltete codierung von mehrkanaligen digitalen audiosignalen | |
EP2374123B1 (de) | Verbesserte codierung von mehrkanaligen digitalen audiosignalen | |
EP2002424B1 (de) | Vorrichtung und verfahren zur skalierbaren kodierung eines mehrkanaligen audiosignals auf der basis einer hauptkomponentenanalyse | |
EP3427260B1 (de) | Optimierte codierung und decodierung von verräumlichungsinformationen zur parametrischen codierung und decodierung eines mehrkanaligen audiosignals | |
EP3807669B1 (de) | Ortung von schallquellen in einer bestimmten akustischen umgebung | |
EP3052958B1 (de) | Verfahren zur positionsbestimmung einer schallquelle und humanoider roboter zur durchführung des verfahrens | |
WO2013166439A1 (en) | Systems and methods for source signal separation | |
EP1479266A2 (de) | Verfahren und vorrichtung zur steuerung einer anordnung zur wiedergabe eines schallfeldes | |
EP1502475B1 (de) | Verfahren und system zum repräsentieren eines schallfeldes | |
EP3400599B1 (de) | Verbesserter ambisonic-codierer für eine tonquelle mit mehreren reflexionen | |
WO2018115666A1 (fr) | Traitement en sous-bandes d'un contenu ambisonique réel pour un décodage perfectionné | |
FR3009158A1 (fr) | Spatialisation sonore avec effet de salle | |
FR3051959A1 (fr) | Procede et dispositif pour estimer un signal dereverbere | |
EP2452293A1 (de) | Quelllokation | |
Sharma et al. | Development of a speech separation system using frequency domain blind source separation technique | |
FR2943867A1 (fr) | Traitement d'egalisation de composantes spatiales d'un signal audio 3d | |
WO2022207994A1 (fr) | Estimation d'un masque optimise pour le traitement de donnees sonores acquises | |
WO2009081002A1 (fr) | Traitement d'un flux audio 3d en fonction d'un niveau de presence de composantes spatiales | |
FR2682252A1 (fr) | Procede de traitement de signal et dispositif de prise et de restitution de son mettant en óoeuvre ce procede. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: UNKNOWN |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20210521 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
INTG | Intention to grant announced |
Effective date: 20220804 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D Free format text: NOT ENGLISH |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: REF Ref document number: 1546591 Country of ref document: AT Kind code of ref document: T Effective date: 20230215 Ref country code: IE Ref legal event code: FG4D Free format text: LANGUAGE OF EP DOCUMENT: FRENCH |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602019024868 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: LT Ref legal event code: MG9D |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: MP Effective date: 20230125 |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: MK05 Ref document number: 1546591 Country of ref document: AT Kind code of ref document: T Effective date: 20230125 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230525 Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230425 Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230525 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230426 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602019024868 Country of ref document: DE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed |
Effective date: 20231026 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20231229 Year of fee payment: 5 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230125 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20231219 Year of fee payment: 5 Ref country code: DE Payment date: 20231221 Year of fee payment: 5 |