EP4248231A1 - Improved localization of an acoustic source (Localisation perfectionnée d'une source acoustique) - Google Patents
- Publication number
- EP4248231A1 (application EP21810072.5A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- vector
- velocity vector
- wall
- tau1
- peaks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G01S5/22—Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements
- G01S3/00—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
- G01S3/80—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
- G01S3/802—Systems for determining direction or deviation from predetermined direction
- G01S3/803—Systems for determining direction or deviation from predetermined direction using amplitude comparison of signals derived from receiving transducers or transducer systems having differently-oriented directivity characteristics
- G01S3/8034—Systems for determining direction or deviation from predetermined direction using amplitude comparison of signals derived from receiving transducers or transducer systems having differently-oriented directivity characteristics wherein the signals are derived simultaneously
- G01S3/8036—Systems for determining direction or deviation from predetermined direction using amplitude comparison of signals derived from receiving transducers or transducer systems having differently-oriented directivity characteristics wherein the signals are derived simultaneously derived directly from separate directional systems
- G01S11/00—Systems for determining distance or velocity not using reflection or reradiation
- G01S11/14—Systems for determining distance or velocity not using reflection or reradiation using ultrasonic, sonic, or infrasonic waves
Definitions
- the present invention relates to the field of acoustic source localization, in particular the estimation of the direction of arrival ("DoA") of sound by a compact microphone system (for example a microphone capable of picking up sounds in an “ambiophonic” representation, called “ambisonic” below).
- DoA Direction of Arrival
- a possible application is, for example, beamforming, which involves a spatial separation of audio sources, in particular to improve speech recognition (for example for a voice-interaction virtual assistant).
- Such processing can also be used in 3D audio coding (pre-analysis of a sound scene in order to code the main signals individually), or to allow the spatial editing of immersive sound content, possibly audio-visual (for artistic, radio, cinema or other purposes). It also allows the tracking of speakers in teleconferences and the detection of sound events (with or without associated video).
- One category of methods is based on the analysis of the velocity vector V(f) or the intensity vector I(f) (the former being an alternative version of the latter, normalized by the power of the omnidirectional reference component), as expressed in Eq.6 and Eq.7.
- the imaginary part (reactive part associated with the energy gradient) is considered characteristic of stationary acoustic phenomena.
- the known method operates either on time samples filtered in sub-bands, in which case they are real and so is the intensity vector, or on complex frequency samples, in which case only the real part of the intensity vector is exploited as designating the direction of origin (or more precisely, its opposite).
- the calculation of a so-called “diffuseness” coefficient, linked to the ratio between the norm of the vector and the energy of the sound field, makes it possible to decide whether the information available at the frequency considered is characteristic of directional components (in which case the direction of the vector determines the location) or of an “ambience” (a field resulting from diffuse reverberation and/or a mixture of undifferentiated secondary sound sources).
- Another method, denoted “WM” below, is based on the velocity vector and the statistics of the angular direction of its real part, weighted by certain factors linked to the ratio between the real and imaginary parts and their norms.
- a spherical map (2D histogram, for example equirectangular) is established by collecting values over all the frequency samples and over a certain number of time frames. The estimate is therefore essentially based on a maximum of probability and is subject to a certain latency.
- Another category of so-called “covariance” methods, sometimes presented as an extension of the first, involves the calculation of a covariance matrix of the spatial components (also sometimes called the Power Spectral Density matrix or “PSD”) by frequency sub-bands. Here again, the imaginary part is sometimes totally ignored. It should be noted that the first row (or first column) of this matrix is equivalent to the intensity vector when the spatial components are of the ambisonic type. Many of these approaches involve “subspace” methods and algorithms that are sometimes costly, especially when working on a large number of frequency sub-bands and when exploiting higher spatial resolutions.
- PSD Power Spectral Density matrix
- vector-based or matrix-based methods attempt to discern the “directional” components associated with localizable acoustic sources or paths, on the one hand, from ambient components, on the other.
- the localization is generally biased by the systematic interference of the direct sound with reflections associated with the same acoustic source.
- it is the real part of the velocity vector which is mainly considered, while the imaginary part is usually ignored (or at least underused).
- Acoustic reflections, considered annoying, are not included in the estimation problem. They therefore remain an ignored component, not modeled, without taking into account the particular interference structures induced.
- the acoustic localization is generally estimated in angular terms only. Moreover, no effective approach seems to propose an evaluation of distance from a single pick-up point (considered as unique for a coincident or more generally "compact" microphone system, i.e. contained in a volume small in size compared to the distances from the acoustic sources, typically around ten centimeters for an ambisonic microphone).
- a particularly advantageous approach uses the velocity vector of the sound to obtain, in particular, the direction of arrival of the sound, its delay (and thus the distance to the source), and the delays linked to possible reflections, and thereby to determine the positions of walls (partitions).
- Such an embodiment makes it possible to model the interference between the direct wave and at least one indirect wave (resulting from a reflection) and to exploit the manifestations of this model on the entire velocity vector (on its imaginary part as well as on its real part).
- a method for processing sound signals acquired by at least one microphone is proposed, for localization of at least one sound source in a space comprising at least one wall, in which:
- a generalized velocity vector V'(f) is expressed in the frequency domain, estimated from an expression of a velocity vector V(f) in which a reference component D(f), different from an omnidirectional component W(f), appears in the denominator of said expression, said expression being complex with a real part and an imaginary part, the generalized velocity vector V'(f) characterizing a composition between:
- At least one parameter is determined from among:
- the aforementioned generalized velocity vector V'(f) is constructed from the velocity vector V(f), which is generally expressed with an omnidirectional component in its denominator.
- the generalized velocity vector V'(f) replaces the "conventional" velocity vector V(f) within the meaning of the aforementioned document FR1911723, with a "reference" component in the denominator which is different from an omnidirectional component. This reference component can indeed be more "selective" towards the direction of arrival of the sound.
- the direction of arrival of the sound, which makes it possible to calculate the reference component, can be obtained as a first approximation by using the conventional velocity vector V(f), for example during the first iteration of an iterative process gradually converging towards an accurate DoA.
- the method may comprise, as indicated above, a plurality of iterations, in at least part of which the generalized velocity vector V'(f) is used with, in its denominator, a reference component D(f) determined from an approximation of the direction of the direct path (DoA) obtained at a previous iteration. In most situations, these iterations converge to a more accurate DoA.
- Such a method can then comprise a first iteration in which the “conventional” velocity vector V(f) is used instead of the generalized velocity vector V′(f).
- the velocity vector V(f) is expressed in the frequency domain by making the omnidirectional component W(f) appear in the denominator. It is then possible to determine at least, at the end of this first iteration, a first approximation of the direction of the direct path (DoA).
- the generalized velocity vector V'(f) is used, estimated from an expression of the velocity vector V(f) in the denominator of which the omnidirectional component W(f) is replaced by the reference component D(f), the latter being spatially more selective than the omnidirectional component W(f).
- the reference component D(f) is more selective in a direction corresponding to the aforementioned first approximation of the direction of the direct path (DoA).
- the iterations can be repeated until convergence is reached according to a predetermined criterion.
- a predetermined criterion can be a criterion of causality to identify with a reasonable degree of certainty at least first sound reflections on obstacles (or "partitions" above) in the sound propagation environment between the microphone and a source.
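- The iterations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the patent: the reference component is taken here as a hypothetical first-order beam D(f) = W(f) + u . [X(f), Y(f), Z(f)] steered toward the previous DoA estimate u, the DoA is estimated as the normalized frequency average of the real part of the velocity vector, and convergence is tested with a simple change threshold rather than a causality criterion. All function names are illustrative.

```python
import cmath
import math


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def mean_real_velocity(bins, steer=None):
    """Average over frequency of Re([X, Y, Z] / ref).

    ref is W(f) on the first pass (steer=None); afterwards it is a
    hypothetical D(f) = W + steer . [X, Y, Z], more selective toward steer.
    """
    acc = [0.0, 0.0, 0.0]
    for w, xyz in bins:
        ref = w if steer is None else w + sum(s * c for s, c in zip(steer, xyz))
        for i in range(3):
            acc[i] += (xyz[i] / ref).real / len(bins)
    return acc


def iterate_doa(bins, tol=1e-9, max_iter=10):
    doa = normalize(mean_real_velocity(bins))           # first iteration: V(f)
    for _ in range(max_iter):
        new = normalize(mean_real_velocity(bins, doa))  # later ones: V'(f)
        change = max(abs(a - b) for a, b in zip(new, doa))
        doa = new
        if change < tol:                                # simple stand-in criterion
            break
    return doa


# Synthetic field: direct wave from u0 plus one reflection from u1 (gain g).
u0, u1, g, n = (1.0, 0.0, 0.0), (0.0, 0.6, 0.8), 0.8, 256
bins = []
for k in range(n):
    gamma = g * cmath.exp(-2j * math.pi * k / n)
    bins.append((1 + gamma, [a + gamma * b for a, b in zip(u0, u1)]))
doa = iterate_doa(bins)
```

- In this synthetic case the reflection is weaker than the direct wave, so the estimate settles on the direct direction u0 after the first generalized iteration.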
- An inverse transform is also applied, from frequencies to time, to said expression of the generalized velocity vector V'(f) to obtain, in the time domain, a succession of peaks each linked to a reflection on at least one wall, in addition to a peak linked to an arrival of sound along said direct path (DoA), and
- the second case is followed by the implementation of the following steps, the acquired signals being delivered in the form of successive frames of samples:
- a score for the presence of a sound attack in the frame is estimated (in accordance for example with an equation of the type Eq.53 in the appendix), and
- the frames with scores higher than a threshold are selected to process the sound signals acquired in the selected frames.
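- The two steps above can be sketched as follows. Equation Eq.53 is not reproduced in this text, so the score used here (energy of a frame divided by the energy of the previous frame) is only a hypothetical stand-in, and the names `attack_scores` and `select_frames` are illustrative.

```python
def attack_scores(frames, floor=1e-12):
    """Hypothetical attack score: energy of each frame divided by the
    energy of the previous frame (the first frame gets a score of 0)."""
    energies = [sum(s * s for s in frame) for frame in frames]
    scores = [0.0]
    for prev, cur in zip(energies, energies[1:]):
        scores.append(cur / (prev + floor))
    return scores


def select_frames(frames, threshold):
    """Indices of the frames whose score exceeds the threshold."""
    return [i for i, sc in enumerate(attack_scores(frames)) if sc > threshold]


# Four short frames; the third one contains a sudden rise in level.
frames = [[0.01, -0.01], [0.02, 0.01], [0.9, -0.8], [0.7, 0.6]]
selected = select_frames(frames, threshold=10.0)
```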
- the “classic” velocity vector V(f) can be expressed in the frequency domain by ambisonic components of order 1 in a form of the type:
- $V(f) = \frac{1}{W(f)}\,[X(f),\ Y(f),\ Z(f)]^{T}$,
- W(f) being the omnidirectional component,
- while the generalized velocity vector V'(f), expressed in the frequency domain by ambisonic components of order 1, takes a form of the type:
- $V'(f) = \frac{1}{D(f)}\,[X(f),\ Y(f),\ Z(f)]^{T}$, D(f) being the aforementioned reference component.
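- As an illustration, the following minimal sketch computes the velocity vector for a single frequency bin as the ratio of the directional components to a reference component (W(f) for the classic vector, D(f) for the generalized one). The function name `velocity_vector` and the plane-wave encoding W = s, [X, Y, Z] = s.u (no extra gain on the directional components) are assumptions made for the example.

```python
def velocity_vector(ref, x, y, z):
    """Return [X, Y, Z] / ref for one frequency bin (complex values).

    ref is W(f) for the classic vector V(f), or a reference
    component D(f) for the generalized vector V'(f).
    """
    return (x / ref, y / ref, z / ref)


# Hypothetical single plane wave: spectrum value s, unit direction u.
s = 0.3 - 0.4j
u = (0.6, 0.0, 0.8)
w = s                                  # omnidirectional component
x, y, z = (s * c for c in u)           # directional components
v = velocity_vector(w, x, y, z)        # classic V(f): equals u here
```

- For a single plane wave the source spectrum s cancels out, which is why the ratio isolates the direction u.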
- the order considered here is 1, which makes it possible to express the components of the velocity vector in a three-dimensional frame, but other embodiments are possible with in particular a higher ambisonic order.
- an estimate of the direction of the direct path assimilated to the first vector U0 can be determined from an average over a set of frequencies of the real part of the generalized velocity vector V'(f) expressed in the frequency domain (in accordance with the formalism of equation Eq.24 applied here of course to the generalized velocity vector V'(f)).
- $V1 = V'(\mathrm{TAU1}) - \dfrac{V'(\mathrm{TAU1}) \cdot V'(2\,\mathrm{TAU1})}{\lVert V'(\mathrm{TAU1}) \rVert^{2}}\, V'(0)$, the vector U1 then being given by: $U1 = V1 / \lVert V1 \rVert$
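- The estimate of U1 from the peaks at 0, TAU1 and 2.TAU1 (subtract from V'(TAU1) the peak V'(0) scaled by the scalar product of V'(TAU1) and V'(2.TAU1) divided by the squared norm of V'(TAU1), then normalize) can be sketched as follows. The synthetic peak values assume the simple model of one direct wave and one reflection of gain g with the classic velocity vector; the helper names are illustrative.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def estimate_u1(v0, v_tau, v_2tau):
    """U1 = V1 / ||V1||, with V1 = V'(TAU1) - rho * V'(0) and
    rho = (V'(TAU1) . V'(2.TAU1)) / ||V'(TAU1)||^2."""
    rho = dot(v_tau, v_2tau) / dot(v_tau, v_tau)
    v1 = [a - rho * b for a, b in zip(v_tau, v0)]
    norm = math.sqrt(dot(v1, v1))
    return [c / norm for c in v1]


# Synthetic peaks for one reflection of gain g (assumed model):
# V(0) = U0, V(TAU1) = g (U1 - U0), V(2 TAU1) = -g^2 (U1 - U0).
g = 0.5
u0 = (1.0, 0.0, 0.0)
u1 = (0.0, 0.6, 0.8)
diff = [b - a for a, b in zip(u0, u1)]
v_tau = [g * d for d in diff]
v_2tau = [-g * g * d for d in diff]
u1_est = estimate_u1(u0, v_tau, v_2tau)
```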
- the space comprises a plurality of walls:
- peaks linked to a reflection on a wall of said plurality of walls are identified, each identified peak having a temporal abscissa that is a function of a first delay TAUn of the acoustic path resulting from the reflection on the corresponding wall n, with respect to the direct path,
- At least one parameter is determined from:
- a first part of peaks can be pre-selected at the smallest positive temporal abscissas, to identify in this part the peaks each associated with a single reflection on a wall.
- when the signals acquired by the microphone are in the form of a succession of samples, it is more generally possible to apply to these samples a weighting window with an exponentially decreasing variation over time (as will be seen later with reference to FIG. 5A).
- this window can be placed at the very start of the sound attack (or even just before the start of the attack). This avoids the disturbance caused by multiple reflections.
- the application of such a weighting window makes it possible to obtain a less biased first estimate of the parameters U0, d0, etc. resulting from the exploitation of the expression of the velocity vector in the time domain, in particular in the case of the “classic” velocity vector, for example within the framework of a first iteration of the method. Indeed, in certain situations where the cumulative magnitude of the reflections is greater than that of the direct sound, the estimation of the aforementioned parameters may be biased. These situations can be detected when peaks are observed at negative temporal abscissas (top curve of FIG. 5B) in the temporal expression of the velocity vector.
- the application of a weighting window of the aforementioned type makes it possible to bring these peaks back to positive abscissas, as illustrated by the bottom curve of FIG. 5B, and to give less biased estimates.
- this embodiment is optional insofar as the use of the generalized velocity vector instead of the "conventional" velocity vector already allows a practically unbiased estimation of the parameters U0, d0, etc., including in this type of situation. Nevertheless, such processing can take place, for example, for a first iteration of the method with the “conventional” velocity vector, or even in the second case, mentioned above, of non-convergence of the iterative processing.
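- A weighting window with exponential decay, as discussed above, can be sketched as follows. The sample rate, the time constant and the function names are hypothetical values chosen for the example; the window is assumed to start at the onset of the sound attack.

```python
import math


def exponential_window(num_samples, sample_rate, time_constant):
    """w[n] = exp(-n / (sample_rate * time_constant)), decreasing in time."""
    return [math.exp(-n / (sample_rate * time_constant)) for n in range(num_samples)]


def apply_window(samples, window):
    return [s * w for s, w in zip(samples, window)]


# Hypothetical values: 8 samples at 8 kHz, 0.5 ms time constant,
# with the window starting at the attack onset (sample 0).
window = exponential_window(8, 8000.0, 0.0005)
weighted = apply_window([1.0] * 8, window)
```

- Later samples, where reflections accumulate, receive progressively smaller weights than the samples close to the onset.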
- q(f) exp (-
- Such an embodiment makes it possible to select the most usable frequency bands for determining the aforementioned parameters.
- the present invention also relates to a device for processing sound signals, comprising a processing circuit arranged for the implementation of the above method.
- FIG. 4 schematically presents such a processing circuit which can then comprise:
- an input interface IN to receive signals SIG acquired by the microphone (which may include several piezoelectric pads to compose these signals, for example in an ambisonic context),
- a processor PROC cooperating with a working memory MEM to process these signals, in particular to work out the expression of the generalized velocity vector in order to derive from it the desired parameters d0, U0, etc., the values of which can be delivered by an output interface OUT.
- Such a device can be in the form of a module for locating a sound source in a 3D environment, this module being connected to a microphone (sound antenna type, or other). Conversely, it can be a sound rendering engine based on a given position of a source in a virtual space (comprising one or more walls) in augmented reality.
- the present invention also relates to a computer program comprising instructions for implementing the above method, when these instructions are executed by a processor of a processing circuit.
- FIGS. 3A, 3B and 7 illustrate an example of a possible flowchart of the algorithm of such a program.
- a non-transitory recording medium readable by a computer, on which such a program is recorded.
- the generalized velocity vector and the "classic" velocity vector are designated without distinction by the same term “velocity vector”, with the same notation V (V(f); V(t)), notably in the equations presented in the appendix.
- V' V'(f); V'(t)
- the principles underlying the formalism used in document FR1911723, and included here in the equations given in the appendix, are recalled below.
- Figure 1 shows, by way of illustration, various parameters involved in locating a sound source according to one embodiment.
- FIG. 2 shows by way of illustration the various successive peaks exhibited by the temporal expression of a velocity vector after its inverse frequency-to-time transform ("IDFT").
- IDFT inverse frequency-to-time transform
- Figure 3A shows the beginning steps of an algorithmic processing to determine the relevant parameters U0, d0, etc.
- FIG. 3B shows the continuation of the processing steps of FIG. 3A.
- Figure 4 schematically shows a device within the meaning of the invention according to one embodiment.
- Figure 5A shows a weighting window of the samples of the acquired signals, exponentially decreasing over time, according to one embodiment.
- Figure 5B compares a time expression after IDFT of the velocity vector:
- Figures 6A, 6B and 6C represent the shape of the peaks present in the temporal expression of the generalized velocity vector V'(t), over the successive iterations of the method described later with reference to FIG.
- Figure 6D illustrates very schematically and by way of illustration the form of the reference component D(f) appearing in the denominator of the expression of the generalized velocity vector V'(f), over several successive iterations of the process, and
- Figure 7 schematically represents the steps of an iterative method within the meaning of the invention, according to an embodiment given here by way of example.
- the velocity vector can be calculated in a manner known per se. However, some specific settings can be recommended to improve the final results obtained.
- frequency spectra B(f) of ambisonic signals are typically first obtained by Short-Time Fourier Transform (STFT) for a succession of time frames b(t), generally overlapping (with overlap-add, for example).
- a velocity vector is then calculated for all the frequency samples as the ratio of the directional components X(f), Y(f) and Z(f) to the omnidirectional component W(f):
- the characteristic of the source signal is substantially removed to highlight the characteristics of the acoustic channel, if indeed the spectral composition of the audio signal excites a substantial amount of useful frequencies (e.g. over a wide frequency band).
- This impulse response is called SRIR for "Spatial Room Impulse Response" and is generally represented as a series of temporal peaks:
- the effect of interference between the direct sound and at least one reflection is inspected on the expression of the complex velocity vector, so as to estimate relevant parameters for defining the position of the sound source.
- the frequency expression of the ambisonic field follows, if the later part, given by the expression Eq.16, is neglected.
- the short-term velocity vector is then expressed by equation Eq.17, or by equation Eq.18 in a regularized version with a non-zero EPSILON term, so as to avoid (quasi-)infinite values when the component in the denominator is (almost) zero.
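- Equation Eq.18 is not reproduced in this text. As a hedged illustration of what such a regularization can look like, the sketch below uses one common form, [X, Y, Z] . conj(W) / (|W|^2 + EPSILON), which is an assumption and may differ from the patent's expression; the function name is illustrative.

```python
def regularized_velocity(w, xyz, epsilon=1e-6):
    """[X, Y, Z] * conj(W) / (|W|^2 + epsilon): close to [X, Y, Z] / W
    when |W|^2 >> epsilon, and finite when W is (almost) zero."""
    denom = abs(w) ** 2 + epsilon
    return [c * w.conjugate() / denom for c in xyz]


# With epsilon = 0 and W != 0, this is exactly the plain ratio.
v_regular = regularized_velocity(2 + 0j, [1 + 0j, 0j, 2j], epsilon=0.0)
# With W = 0 the result stays finite instead of raising a division error.
v_null = regularized_velocity(0j, [1 + 0j, 0j, 2j])
```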
- the component W (specific to the classical velocity vector) can be replaced by the reference component D to express the generalized velocity vector.
- the short-term analysis makes it possible to observe, over time and according to the dynamic evolution of the source signal, frequency fingerprints (denoted “FDVV” below) that are characteristic of sub-mixtures of wavefronts within the spatial impulse response.
- a characteristic submix (smx for “submix”), for a given observation, is modeled according to Eq.19, in the time and frequency domains.
- the implicit model h_smx(t) is plausibly similar to the beginning of the spatial impulse response h_early(t), at least in terms of the directions and relative delays of the wavefronts.
- since the implicit relative gain parameters g_n are affected by the temporal windowing and the dynamic characteristics of the signal, they do not necessarily match those of the impulse response.
- the affine segment (real part) lies on a line containing the unit vectors U0 and U1 pointing to the direct and indirect waves respectively, and the two segments are orthogonal to the median plane of these two vectors (and thus the imaginary part of the vector is itself always orthogonal to that plane, since it lies on the linear segment). Furthermore, assuming a homogeneous distribution of the phase shifts between the waves (and therefore a representative sweep of the frequencies), a statistical calculation shows that the average of the real part of the velocity vector is equal to the vector U0, as expressed in Eq.24, and that the maximum of probability is an average of U0 and U1 weighted by the respective amplitudes of the waves, as expressed in Eq.25.
- a solution to this problem consists in operating a time-frequency analysis that is more temporally selective (i.e. with shorter time windows) to have the chance of seeing a simpler acoustic mixture appear during the attacks.
- amplitude transients, signal rise
- the delays associated with successive reflections may be too close together to isolate the effect of the first reflection in its interference with the direct sound.
- a first step consists in converting the fingerprint of the velocity vector into the time domain (or “TDVV”, for “Time-Domain Velocity Vector”), by means of an Inverse Fourier Transform as presented in Eq.31.
- TDVV Time-Domain Velocity Vector
- This has the effect of condensing the effects of frequency cyclicity associated with certain axes, which manifest themselves in complex trajectories of the velocity vector, into sparser data that are therefore easier to analyze.
- Such a conversion causes series of peaks to appear at regular time intervals, peaks of which the most important are easily detectable and extractable (see for example FIG. 5B).
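- The appearance of regularly spaced peaks can be checked on a small synthetic example. The sketch below assumes a model with one direct wave (direction u0) and one reflection (direction u1, gain g, delay of an integer number of samples), builds the classic velocity vector on a frequency grid, and applies a naive inverse DFT written out in pure Python. All names and values are illustrative.

```python
import cmath
import math


def idft(spectrum):
    """Naive inverse DFT: x[n] = (1/N) * sum_k X[k] exp(+2j*pi*k*n/N)."""
    n_bins = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_bins)
            for k in range(n_bins)) / n_bins
        for n in range(n_bins)
    ]


# Assumed model: direct wave (direction u0) plus one reflection
# (direction u1, gain g, delay of `delay` samples).
u0, u1, g, delay, n_bins = (1.0, 0.0, 0.0), (0.0, 0.6, 0.8), 0.5, 5, 64
spectra = [[], [], []]                      # one spectrum per axis x, y, z
for k in range(n_bins):
    gamma = g * cmath.exp(-2j * math.pi * k * delay / n_bins)
    for axis in range(3):
        spectra[axis].append((u0[axis] + gamma * u1[axis]) / (1 + gamma))

tdvv = [idft(s) for s in spectra]           # tdvv[axis][n]


def peak(n):
    """Real part of the time-domain velocity vector at sample n."""
    return [tdvv[axis][n].real for axis in range(3)]
```

- Under this model, `peak(0)` is close to u0, `peak(delay)` is close to g (u1 - u0) and `peak(2 * delay)` is close to -g^2 (u1 - u0), while samples between the peaks are close to zero.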
- the attenuation parameter g1 (this is likely to be modified by the time-frequency analysis parameters, in particular by the shape of an analysis window and by its temporal placement relative to the acoustic events; the estimation of this parameter is therefore of less utility in the application contexts considered here).
- For the generalized velocity vector V', a slightly different equation is found in equation Eq.B2 at the end of the appendix, this equation also being denoted Eq.35b because it corresponds to equation Eq.35 but given here for the generalized velocity vector. Moreover, the equations specific to the generalized velocity vector V' are given at the end of the appendix, and their correspondence with a previously written equation specific to the classical velocity vector is indicated by a "b" after the equation number (Eq.xxb).
- Figure 2 illustrates such a series in the simplified case of two reflections interfering with a direct sound.
- Each marker (respectively round, cross, diamond) indicates by its ordinate the contribution of the vectors U0, U1, U2 (characteristics of the direct sound, of a first reflection and of a second reflection respectively) to the temporal imprint TDVV as a function of temporal abscissas. It can thus be seen that the reception of the direct sound is characterized by the first peak at time zero and of amplitude 1, illustrated by a circle.
- the interference of a first reflection (delay TAU1) with the direct path causes a first series of peaks in TAU1 , 2xTAU1 , 3xTAU1 , etc., which are marked here by a cross at one end and a circle at the other end (top-bottom).
- the interference of a second reflection (TAU2 delay) with the direct path causes a second series of peaks in TAU2, 2xTAU2, 3xTAU2, etc., marked here by a diamond at one end and a circle at the other end.
- an element of the "crossed series” that is to say the interference between the reflections (first delay: TAU1 +TAU2, then 2TAU1 +TAU2, then TAU1 +2TAU2, etc.).
- the first case is detected if there are non-negative integers k (some of which may be zero) giving TAUnew according to Eq.40; otherwise we fall into the second case and the set of identified reflections is increased by introducing the new delay TAU(N+1), associated with a direction which can be estimated in the manner described in the case of a single reflection.
- the inverse Fourier transform gives a “bidirectional” TDVV time imprint, with series generally developing both towards positive times and towards negative times (top curve of FIG. 5B for illustration).
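- The distinction between the two cases can be sketched as a search for integer coefficients. Equation Eq.40 is not reproduced in this text; the sketch assumes it expresses TAUnew as a sum of already identified delays weighted by non-negative integers. The tolerance, the bound `k_max` and the function name are hypothetical.

```python
from itertools import product


def is_combination(tau_new, taus, tol=1e-4, k_max=4):
    """True if tau_new ~= sum(k_n * tau_n) for non-negative integers k_n
    (not all zero, each at most k_max); a brute-force search."""
    for ks in product(range(k_max + 1), repeat=len(taus)):
        if any(ks) and abs(tau_new - sum(k * t for k, t in zip(ks, taus))) < tol:
            return True
    return False


known = [0.0030, 0.0047]                      # delays already identified (s)
first_case = is_combination(0.0107, known)    # 2 * 0.0030 + 0.0047
second_case = not is_combination(0.0052, known)
if second_case:
    known.append(0.0052)                      # new reflection delay TAU(N+1)
```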
- Such a situation in which one or more reflection gains would be greater than 1 can be encountered, for example, when the direct wave has an amplitude less than the sum of the amplitudes of the waves resulting from the reflections on one or more partitions.
- the main peak at time zero no longer corresponds strictly to the vector U0, but to a mixture of the latter with a more or less significant proportion of the vectors designating the directions of the reflections. This leads to a location bias (of the ‘estimated DoA’).
- Another symptom is that the main peak then has a norm different from 1 in general, and more often less than 1.
- the invention proposes a robust method, particularly in this type of situation. It proposes to adjust the expression of the velocity vector by conferring spatial selectivity towards the DoA on the D component which appears in its denominator instead of the usual omnidirectional component W.
- Starting from the equation Eq.B6 at the end of the appendix, a particular relationship is retained between two successive vectors of a series, in particular between the first two vectors V'(TAUn) and V'(2.TAUn), which are the most salient.
- the equation Eq.B7 thus shows a factor (-Gn/BETAn), denoted here "RHO", for which equation Eq.B8 proposes an estimate as the scalar product of the first two aforementioned vectors of the same series, V'(TAUn) and V'(2.TAUn), this scalar product being divided by the squared norm of the first.
- By reintegrating the RHO factor into the equation Eq.B6, the obtained equation Eq.B9 can be rearranged to give the equation Eq.B10.
- the vector Un in particular, the vector U1 if we focus on the first reflection and its associated series
- a factor NU0/Gn which is positive (except in an a priori rare situation such as a reflection with phase inversion): it can therefore be obtained by normalization of the left side V'(TAUn)-RHO.V'(0).
- the global factor NU0 is likely to integrate influencing factors other than the reference directivity, for example an overall reduction in amplitude which could be caused by a limitation of the frequency bandwidth of the signal source, and/or its partial masking by noise (although the latter effect is generally more complex to model). It is interesting to note that, in the end, the direction of the vector U1 (or more generally Un) can be estimated in the same way whatever the cause of this overall reduction in amplitude NU0.
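The two operations named above (RHO estimated as the scalar product of V'(TAUn) and V'(2.TAUn) divided by the squared norm of V'(TAUn), then the direction Un obtained by normalizing V'(TAUn)-RHO.V'(0)) can be sketched as follows. The function names are hypothetical, the vectors are plain 3-element lists, and the sketch assumes the positive-factor case mentioned above (no phase inversion).

```python
import math


def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def estimate_rho(v_tau, v_2tau):
    """RHO estimate: <V'(TAUn), V'(2.TAUn)> / ||V'(TAUn)||^2."""
    return dot(v_tau, v_2tau) / dot(v_tau, v_tau)


def estimate_reflection_direction(v_0, v_tau, v_2tau):
    """Unit vector along V'(TAUn) - RHO.V'(0), taken as the estimate of Un."""
    rho = estimate_rho(v_tau, v_2tau)
    residual = [a - rho * b for a, b in zip(v_tau, v_0)]
    norm = math.sqrt(dot(residual, residual))
    return [c / norm for c in residual]


# Synthetic peaks built so that V'(2.TAU) = RHO * V'(TAU) with RHO = -0.3
v_0 = [1.0, 0.0, 0.0]
v_tau = [-0.3, 0.6, 0.0]
v_2tau = [0.09, -0.18, 0.0]
print(estimate_rho(v_tau, v_2tau))
print(estimate_reflection_direction(v_0, v_tau, v_2tau))
```

Because only the direction of the residual is kept, any positive global scaling of the three peaks leaves the estimate unchanged, in line with the remark on the factor NU0.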
- in step S721, the calculations presented above are carried out from the frequency expression of the classical velocity vector V(f) up to the estimation of the temporal expression of the classical velocity vector V(t).
- in step S731, the temporal expression of the classical velocity vector V(t) is analyzed as a time series of peaks.
- a modified velocity vector V′(f), then V′(t) in the time domain, is calculated (step S781) on the basis of these filtered ambisonic data.
- the method can be reiterated again by taking the estimate of the coarse DoA(n) obtained previously (S752) to determine a reference component D(f) (denoted D(n) for the iteration n of the method at step S762) whose directivity makes it possible to represent the direction of the direct sound more selectively than its estimate D(n-1) at the previous iteration, and to replace (S772) the latter D(n-1) in the estimation of the generalized velocity vector V'(f), then V'(t) at step S782.
- the directivity of a reference component “captures” the estimated direction of the direct sound more selectively than a reference component at a previous iteration.
- the process can thus be repeated until the peaks at negative times are lower in amplitude or in energy than the chosen threshold THR, as illustrated in FIG. 6C.
- the first step S71 begins with the construction of the conventional velocity vector in the frequency domain V(f) with the omnidirectional component W(f) in its denominator.
- V(t) in the time domain is estimated.
- if a signal structure is identified, representing the temporal expression of the classical velocity vector V(t), with peaks such that the energy NRJ of this signal structure at negative temporal abscissas (t < 0) remains below a fixed threshold THR (arrow KO), then the present acoustic situation already makes it possible to derive an unbiased DoA directly from the classical velocity vector.
- in step S781, the temporal expression of the generalized vector V'(t) is estimated to determine in test S732 whether there remains significant energy (greater than the threshold THR) in the signal structure of this expression V'(t) at negative time abscissas. If such is not the case (arrow KO at the output of test S732), the process can stop at this first iteration by giving the DoA parameters, etc. at step S742. Otherwise, the method is repeated, updating the index n of iterations of the method, at step S791 (here, the steps referenced S79x relate to the iterations of the method, such as the incrementation of the index n (step S793) or the determination of the termination of the process S792-S794).
- a new reference component D(n) is estimated at step S762, to replace the old reference component D(n-1) in the denominator of the generalized velocity vector V'(f) in step S772.
- its expression in the time domain V'(t) is determined in step S782.
- the comparison of its signal structure (in energy with respect to the threshold THR) to the test S733 is repeated, to determine whether the new DoA which can be estimated therefrom would be biased or not. If this is not the case (arrow KO at the output of the S733 test), then the parameters, in particular of DoA, etc. can be obtained.
- step S743 here after three iterations as in the illustrative example of FIGS. 6A to 6C. Otherwise (OK arrow at the output of test S733), the method must be reiterated again from step S752 with the last estimated DoA, even coarse and possibly biased.
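The iteration described in the preceding items (build the velocity vector, inspect the energy of its temporal expression at negative times, and if that energy exceeds THR re-steer the reference component towards the last estimated DoA) can be sketched as a loop. The two callbacks `compute_imprint` and `estimate_doa`, and the cap `max_iter`, are hypothetical placeholders standing in for the processing steps; only the control flow is illustrated.

```python
def negative_time_energy(times, vectors):
    """Energy of the peaks located at negative temporal abscissas (t < 0)."""
    return sum(sum(c * c for c in v) for t, v in zip(times, vectors) if t < 0)


def iterate_reference(compute_imprint, estimate_doa, thr, max_iter=10):
    """Repeat the analysis until the negative-time energy falls below thr.

    compute_imprint(doa) must return (times, vectors) for a velocity vector
    whose denominator is steered towards doa; doa=None stands for the
    omnidirectional component W of the first pass.
    Returns (doa, number_of_iterations, converged).
    """
    doa = None
    for n in range(max_iter):
        times, vectors = compute_imprint(doa)
        doa = estimate_doa(times, vectors)  # possibly coarse and biased
        if negative_time_energy(times, vectors) < thr:
            return doa, n + 1, True
    return doa, max_iter, False
```

A caller would supply its own implementation of the two callbacks; the returned flag distinguishes a DoA obtained below the threshold from one returned only because the iteration cap was reached.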
- the reference component D(f) typically derives from a matrixing (or a vector-weighted sum D_Θ) of the ambisonic components, D(f) = D_Θ·B(f), where:
- - B(f) is a vector of signals describing an ambisonic field in the frequency domain, for example in the case of a plane wave carrying a signal S(f) and coming from a direction described by a unit vector, or for the mixture of direct and indirect waves considered, and is built from the vector of the spherical harmonic coefficients
- - D_Θ can be a vector of the 'steering vector' type causing a beamforming ("formation de voie" or "formation de faisceau" in French) oriented in general in a particular direction Θ (which can also be designated by a unit vector), so that a 'steering gain' g_Θ can be defined.
- the spherical function g_Θ(un) is axially symmetric, the available degrees of freedom only influencing the proportion between positive and negative lobes (if any) in addition to the orientation.
- the main lobe (a priori in the targeted direction) does not necessarily have symmetry, and secondary lobes can have more varied shapes and orientations. So despite the notation D_Θ, the beamforming is not only parameterized by the targeted main direction Θ.
- the gain is then expressed as a polynomial Pbeamshape(.) of the scalar product (a variant of Legendre polynomial)
- the gbeamshape diagonal coefficients can take into account: - on the one hand, the choice of ambisonic encoding convention, therefore the calculation of spherical harmonic functions
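As an illustration of the vector-weighted sum described above, the sketch below forms a reference component as a weighted sum of first-order ambisonic components and uses it as the denominator of the velocity vector for one frequency bin. The channel order (W, X, Y, Z), the absence of any normalization factor, the function names and the guard `eps` are assumptions made here; the application's own equations (Eq.A1 to Eq.A5) are not reproduced.

```python
def reference_component(d_theta, b):
    """Weighted sum of the ambisonic components of one frequency bin."""
    return sum(w * c for w, c in zip(d_theta, b))


def generalized_velocity_vector(b, d_theta=(1.0, 0.0, 0.0, 0.0), eps=1e-12):
    """Divide the directional components (X, Y, Z) by the reference component.

    b is assumed to be ordered (W, X, Y, Z). With the default weights the
    reference is W itself, which gives the classical velocity vector.
    Returns None when the reference component is too small to divide by.
    """
    d = reference_component(d_theta, b)
    if abs(d) < eps:
        return None
    return [c / d for c in b[1:4]]


bin_values = (2 + 0j, 1 + 0j, 0j, 1j)
print(generalized_velocity_vector(bin_values))                        # classical case
print(generalized_velocity_vector(bin_values, (0.5, 0.5, 0.0, 0.0)))  # steered reference
```

The second call uses weights that favor the +X direction, a simple stand-in for a steering vector; higher orders or non-axisymmetric beams would need a longer weight vector.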
- the iterations of the method can stop at step S794. If not (OK arrow at the output of test S792), the method can be executed for a following iteration at step S793 starting with an incrementation of the iteration counter n.
- the frames where signal attacks are localized are those which make it possible to observe an acoustic mixture involving only the earliest wave fronts: the direct sound and one or more reflections (so that the aforementioned “sum of the gammas” remains less than 1 according to Eq.38).
- the temporal imprint is of a unidirectional nature, which is evidenced by peaks only for positive times after application of the decreasing exponential window (lower part of FIG. 5B). It is also observed that in practice, since the energy of the observed signal decreases very quickly with the exponential, the numerical impact on the estimates of a truncation of the said signal becomes quite negligible beyond a relatively short truncation time. In other words, one obtains in the shorter term the advantages of a long-term analysis which encompasses both the entire exciter signal and its reverberation. Indeed, the observed “TDVV” conforms to the interference model without the errors due to the dynamics of the signal. The weighting by such a window therefore has a double property, which ideally makes it possible to obtain an exploitable temporal imprint.
- the ALPHA attenuation is preferably chosen by seeking a compromise between a value strong enough to ensure the unidirectionality of the temporal imprint and a value that is not too strong, to avoid reducing the chances of detecting and estimating indirect waves.
- An iterative process (for example by dichotomy) can be implemented to adjust the attenuation value. Starting from a threshold value of attenuation, when the temporal imprint obtained is detected as bidirectional, and therefore a priori gives a biased vector U0, the analysis is repeated with a stronger attenuation. Otherwise, at least the estimate of U0 is obtained; if the following peaks are not very discernible (because they are reduced by the attenuation), the analysis is repeated with an intermediate attenuation between the two previous ones, and so on if necessary until the vector U1 can be estimated.
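The dichotomy described above can be sketched as follows. The callbacks `is_bidirectional` and `peaks_discernible` are hypothetical stand-ins for the analysis of the temporal imprint at a given attenuation, and the doubling or halving used before two bracketing values are available is an assumption of this sketch, not something stated in the text.

```python
def tune_attenuation(alpha_start, is_bidirectional, peaks_discernible, max_iter=20):
    """Search for an attenuation giving a unidirectional imprint whose
    reflection peaks are still discernible.

    Returns the retained attenuation, or None if none was found
    within max_iter trials.
    """
    too_weak = None    # last attenuation that still gave a bidirectional imprint
    too_strong = None  # last attenuation that hid the reflection peaks
    alpha = alpha_start
    for _ in range(max_iter):
        if is_bidirectional(alpha):
            too_weak = alpha
        elif peaks_discernible(alpha):
            return alpha
        else:
            too_strong = alpha
        if too_weak is not None and too_strong is not None:
            alpha = 0.5 * (too_weak + too_strong)  # intermediate attenuation
        elif too_strong is None:
            alpha *= 2.0  # assumption: strengthen by doubling
        else:
            alpha *= 0.5  # assumption: weaken by halving
    return None


# Toy criteria: bidirectional below 1.2, peaks lost above 1.6
print(tune_attenuation(0.5, lambda a: a < 1.2, lambda a: a <= 1.6))
```

With the toy criteria the trials are 0.5, 1.0, 2.0 and finally 1.5, the midpoint of the last "too weak" and "too strong" values.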
- An initial DFT size that is generally larger than this analysis window is chosen.
- time-frequency de-noising for example by defining a mask (time-frequency filter, possibly binary), so as to avoid introducing elements from other ambient and/or diffuse field sources into the interference footprint. It is necessary to calculate the impulse response of the mask (result of the inverse transform) to control the influence of the mask on the analysis of the peaks. It can alternatively be integrated into a frequency weighting of the imprint of a frame considered to be stored, so as to subsequently calculate a weighted average of frequency imprints corresponding a priori to similar interfering mixtures (typically on signal attacks, by checking that the source concerned has not moved, which can be guessed through an estimation of the delays).
- the peaks are then extracted and observed, for example according to the norm
- the peak at time zero is by construction equal to the average of the velocity vector over its complete spectrum (the imaginary part canceling out by Hermitian symmetry), or to the real part of that average if only positive frequencies are considered. One may therefore consider it unnecessary to calculate an inverse transform of the FDVV to obtain a DoA estimate when only the direct sound is of interest. However, the temporal examination of the TDVV makes it possible to detect whether this DoA is reliable (criterion of development towards positive and increasing times).
- the frequency and time imprints of the velocity vector are not always identifiable with an ideal model of interfering wave mixing. It may be that the source signal does not sufficiently, or not always, excite a significant range of frequencies at key moments, due to a lack of transmitted power, possibly taking into account competition from other components of the captured sound field (insufficient SNR or SIR). This can be linked to a more or less diffuse background sound (other sound sources) or to microphonic noise.
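The property stated above (the zero-time peak equals the spectral average of the velocity vector, whose real part suffices when only positive frequencies are used) gives a direct way to obtain a first DoA estimate without an inverse transform. The sketch below is a minimal illustration; the function names are hypothetical and the input is assumed to be a list of 3-component complex vectors, one per positive-frequency bin.

```python
import math


def zero_time_peak(v_f):
    """Real part of the average of V(f) over the supplied frequency bins."""
    n = len(v_f)
    return [sum(v[i].real for v in v_f) / n for i in range(3)]


def doa_from_spectrum(v_f):
    """Unit vector along the zero-time peak, together with the peak norm.

    The norm is returned because a value clearly different from 1 is one of
    the symptoms of a biased estimate mentioned in the text.
    """
    peak = zero_time_peak(v_f)
    norm = math.sqrt(sum(c * c for c in peak))
    return [c / norm for c in peak], norm


spectrum = [
    [1 + 0.2j, 0j, 0j],
    [1 - 0.1j, 0.1 + 0j, 0j],
    [1 + 0j, -0.1 + 0j, 0j],
]
print(doa_from_spectrum(spectrum))
```

This shortcut gives no indication of reliability by itself; the examination of the peaks at negative times still requires the temporal expression.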
- in step S1, the Fourier transform (from time to frequency) of the ambisonic signals is calculated, these signals possibly being in the form of a succession of “frames” (blocks of successive samples).
- in step S2, a dynamic mask can be applied to some of the frequency bands for which the signal-to-noise ratio is below a threshold (some frequency bands can in fact be highly noisy, for example because of noise inherent in the microphone or other noise, so that the exploitation of a signal picked up in such a frequency band is compromised).
- the search for noise by frequency band is carried out in step S3, preferentially on the "omni" component W, and the frequency bands altered by the noise (beyond a threshold, for example SNR < 0 dB) are masked (that is to say set to zero) in step S4.
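Steps S2 to S4 can be sketched as a binary mask computed on the W component and applied to every ambisonic channel. The per-band noise power is assumed to be available from some prior estimate, which the text does not detail; the function names and the handling of empty bands are likewise assumptions of this sketch.

```python
import math


def band_mask(w_power, noise_power, threshold_db=0.0):
    """Return 1.0 for bands to keep and 0.0 for bands whose SNR,
    measured on the W component, is below threshold_db."""
    mask = []
    for p, n in zip(w_power, noise_power):
        if p <= 0.0:
            mask.append(0.0)  # no signal in this band: mask it
        elif n <= 0.0:
            mask.append(1.0)  # no measurable noise: keep it
        else:
            snr_db = 10.0 * math.log10(p / n)
            mask.append(1.0 if snr_db >= threshold_db else 0.0)
    return mask


def apply_mask(channels, mask):
    """Set the masked bands to zero in every ambisonic channel."""
    return [[x * m for x, m in zip(ch, mask)] for ch in channels]


mask = band_mask([4.0, 1.0, 0.5], [1.0, 1.0, 1.0])
print(mask)
print(apply_mask([[1 + 1j, 2j, 3 + 0j]], mask))
```

A band at exactly 0 dB is kept here, since the text masks bands with SNR below the threshold.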
- in step S5, the velocity vector V(f) is calculated in the frequency domain, for example by equation Eq.6 (or even in the form of Eq.11, Eq.18 or Eq.20).
- weights q(f) calculated as described below are applied to give more or less importance to frequency bands f.
- the optimum weights are iteratively calculated as a function of U0 and V(f).
- the various weights q(f) are set to 1.
- U0 is determined for each frame k, such that:
- this first estimate of U0(k) is rough. It is refined iteratively by calculating the weights with respect to the previous determination of U0(k), using the equation Eq.49 based on the imaginary part of the vector V(f), where the vector m is a unit vector normal to the plane defined by the vector U0 and a normal to the wall (the direction z of figure 1 for example).
- the vector m is estimated iteratively also as a function of U0 in step S9, then the weights are calculated by Eq.49 in step S10.
- the weights found are applied in step S7, and the estimation of U0 is refined until convergence at the output of the test S11. At this stage, U0(k) was estimated for the different frames.
- U1 can be deduced therefrom, by a relation of the Eq.41 type described above.
- U1 is determined by the equations Eq.50 to Eq.52, having previously applied an inverse transform IDFT (from frequency to time) in step S12 to the vector Vbar(f) found in step S7, to obtain a time representation V(t) of the velocity vector.
- the various delays, TAU1, then TAU2, etc., are then determined in step S15 (by removing from V(t)k the moduli to be compared in Eq.51 that correspond to the delay TAU1).
- the delay TAUm is given by the component tmax found at each iteration m, divided by the sampling frequency fs according to Eq.52, taking into account that the times t and tmax(k) are first expressed in terms of the sample index (time zero being taken as the reference for the zero index).
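The conversion from a peak index to a delay can be sketched as follows: the sample index of the strongest remaining peak at positive time is divided by the sampling frequency. The function names and the way already-used indices are excluded are assumptions of this sketch; Eq.50 to Eq.52 themselves are not reproduced here.

```python
import math


def vector_norm(v):
    """Euclidean norm (modulus) of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


def next_delay(v_t, fs, excluded=()):
    """Find the strongest peak of V(t) at strictly positive time.

    v_t is a list of 3-vectors indexed by sample, index 0 being time zero.
    excluded lists sample indices already attributed to earlier delays.
    Returns (t_max, tau) with tau = t_max / fs in seconds.
    """
    candidates = [i for i in range(1, len(v_t)) if i not in excluded]
    t_max = max(candidates, key=lambda i: vector_norm(v_t[i]))
    return t_max, t_max / fs


v_t = [
    [1.0, 0.0, 0.0],  # direct sound at time zero
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.5, 0.2, 0.0],  # strongest later peak
    [0.0, 0.0, 0.0],
    [0.1, 0.3, 0.0],
]
t1, tau1 = next_delay(v_t, fs=48000)
t2, tau2 = next_delay(v_t, fs=48000, excluded={t1})
print(t1, tau1, t2, tau2)
```

In a real series the multiples of TAU1 and the crossed terms would also have to be excluded before the next peak is attributed to a new reflection.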
- the step S18 may consist in further selecting the “good” frames, representative of a sound attack with first reflections.
- the criterion D(k) for selecting such frames can be illustrated by way of example by equation Eq.53, where C(f)i(k) designates a magnitude (amplitude in absolute value) detected on the ambisonic channel i, at the time-frequency sample (t, f) resulting from the first transform (time to frequency) of frame k.
- Epsilon designates a nonzero positive value to avoid a zero in the denominator in the absence of a signal.
- F designates the total number of frequency sub-bands used.
- It is thus possible to retain at step S22 only the frames whose criterion D(k) calculated from Eq.53 is not smaller than 90% of the maximum Dmax found at step S21 among the criteria D(k) of all frames.
- the values D(k) are calculated for all the frames, then at step S19, the processing delivers the U0(k), d0(k), D(k) for the different frames.
- the values D(k) are collected to identify at step S21 the highest and to eliminate at step S22 the frames whose value D(k) is less than 0.9 Dmax.
- the vector U0 which is retained is preferably here the median (rather than the mean) among the vectors U0 of the different frames retained.
- the distance d0 retained is also the median value among the distances d0 of the various frames retained.
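The frame selection and the median-based consolidation described above can be sketched as follows. The text does not specify how the median of a set of vectors is taken; the component-wise median followed by a renormalization used here is an assumption, as are the function names.

```python
import math
from statistics import median


def select_frames(d_values, ratio=0.9):
    """Indices of the frames whose criterion D(k) is at least ratio * Dmax."""
    d_max = max(d_values)
    return [k for k, d in enumerate(d_values) if d >= ratio * d_max]


def consolidate(u0_frames, d0_frames, d_values, ratio=0.9):
    """Median U0 (component-wise, renormalized) and median d0 over kept frames."""
    kept = select_frames(d_values, ratio)
    u0 = [median(u0_frames[k][i] for k in kept) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in u0))
    if norm > 0.0:
        u0 = [c / norm for c in u0]
    d0 = median(d0_frames[k] for k in kept)
    return u0, d0, kept


u0_frames = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
d0_frames = [2.0, 2.2, 9.0, 2.1]
d_values = [1.0, 0.95, 0.5, 0.92]
print(consolidate(u0_frames, d0_frames, d_values))
```

In this toy data the third frame, whose criterion is only half of the maximum, is discarded before the medians are taken, so its outlying direction and distance have no influence on the result.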
- the velocity vector can be replaced by a ratio between components of a “coincident” type spatial acoustic representation in the frequency domain and work in a coordinate system characteristic of said spatial representation.
- TDVV can be used more generally in association with artificial intelligence methods, including neural networks.
- Some training strategies envisaged (for example on imprints from models or windowed SRIR, and not necessarily from original signals) can allow the network to learn to exploit the succession of frames to improve detections and estimates with respect to given room situations.
- in step S761, D(1) would simply be calculated as a function of the angular sector considered, these iterative methods being carried out in parallel for each of these sectors.
- the parameters to be optimized are the beamforming parameters, formulated for example as the coefficients of the matrix D involved in the equations Eq.A1 and Eq.A4, or as the direction Theta (Θ) and the beam shape parameters (gbeamshape) of equation Eq.A5 when opting for axial symmetry or in the case where one is restricted to order 1;
- a principle of adjustment of the parameters tested (during the iterations) is needed, because typically a reorientation of the acoustic beam towards the last estimated DoA is not always a sufficiently robust choice: it is then necessary, rather than stopping the algorithm for lack of improvement, to start again from one of the stored situations (a priori the best from the point of view of the minimization criterion) and to adjust the parameters along another axis (for example a directivity shape parameter) or another combination of axes.
- the parameters ultimately targeted are not directly those which are optimized, but they derive therefrom. It should be noted that there is then potentially a multiplicity of sets of beamforming parameters which could all be 'optimal' in the sense that they induce compliance with a causal model. Consequently, they all then make it possible to deduce the same set of parameters Un in an a priori exact manner.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Circuit For Audible Band Transducer (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR2011874A FR3116348A1 (fr) | 2020-11-19 | 2020-11-19 | Localisation perfectionnée d’une source acoustique |
PCT/FR2021/051801 WO2022106765A1 (fr) | 2020-11-19 | 2021-10-15 | Localisation perfectionnée d'une source acoustique |
Publications (1)
Publication Number | Publication Date |
---|---|
EP4248231A1 true EP4248231A1 (fr) | 2023-09-27 |
Family
ID=75108412
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP21810072.5A Pending EP4248231A1 (fr) | 2020-11-19 | 2021-10-15 | Localisation perfectionnée d'une source acoustique |
Country Status (7)
Country | Link |
---|---|
US (1) | US20240012093A1 (ja) |
EP (1) | EP4248231A1 (ja) |
JP (1) | JP2023550434A (ja) |
KR (1) | KR20230109670A (ja) |
CN (1) | CN116472471A (ja) |
FR (1) | FR3116348A1 (ja) |
WO (1) | WO2022106765A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR3102325A1 (fr) * | 2019-10-18 | 2021-04-23 | Orange | Localisation perfectionnée d’une source acoustique |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2998438A1 (fr) * | 2012-11-16 | 2014-05-23 | France Telecom | Acquisition de donnees sonores spatialisees |
FR3067511A1 (fr) * | 2017-06-09 | 2018-12-14 | Orange | Traitement de donnees sonores pour une separation de sources sonores dans un signal multicanal |
FR3081641A1 (fr) * | 2018-06-13 | 2019-11-29 | Orange | Localisation de sources sonores dans un environnement acoustique donne. |
-
2020
- 2020-11-19 FR FR2011874A patent/FR3116348A1/fr active Pending
-
2021
- 2021-10-15 EP EP21810072.5A patent/EP4248231A1/fr active Pending
- 2021-10-15 CN CN202180078003.5A patent/CN116472471A/zh active Pending
- 2021-10-15 KR KR1020237019883A patent/KR20230109670A/ko unknown
- 2021-10-15 WO PCT/FR2021/051801 patent/WO2022106765A1/fr active Application Filing
- 2021-10-15 US US18/251,967 patent/US20240012093A1/en active Pending
- 2021-10-15 JP JP2023530282A patent/JP2023550434A/ja active Pending
Also Published As
Publication number | Publication date |
---|---|
JP2023550434A (ja) | 2023-12-01 |
US20240012093A1 (en) | 2024-01-11 |
WO2022106765A1 (fr) | 2022-05-27 |
FR3116348A1 (fr) | 2022-05-20 |
KR20230109670A (ko) | 2023-07-20 |
WO2022106765A8 (fr) | 2023-05-04 |
CN116472471A (zh) | 2023-07-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3807669B1 (fr) | Localisation de sources sonores dans un environnement acoustique donné | |
EP3822654B1 (en) | Audio recognition method, and target audio positioning method, apparatus and device | |
JP4406428B2 (ja) | 信号分離装置、信号分離方法、信号分離プログラム及び記録媒体 | |
US7626889B2 (en) | Sensor array post-filter for tracking spatial distributions of signals and noise | |
EP3635718B1 (fr) | Traitement de donnees sonores pour une separation de sources sonores dans un signal multicanal | |
EP3281026B1 (fr) | Procédé de séparation de sources pour signaux parcimonieux | |
EP2680262A1 (fr) | Procédé de débruitage d'un signal acoustique pour un dispositif audio multi-microphone opérant dans un milieu bruité | |
EP4046390A1 (fr) | Localisation perfectionnee d'une source acoustique | |
Foy et al. | Mean absorption estimation from room impulse responses using virtually supervised learning | |
EP4248231A1 (fr) | Localisation perfectionnée d'une source acoustique | |
EP3025342B1 (fr) | Procédé de suppression de la réverbération tardive d'un signal sonore | |
EP0410826B1 (fr) | Procédé itératif d'estimation de mouvement, entre une image de référence et une image courante, et dispositif pour la mise en oeuvre de ce procédé | |
JP2019054344A (ja) | フィルタ係数算出装置、収音装置、その方法、及びプログラム | |
WO2023156316A1 (fr) | Localisation d'une source acoustique en mouvement | |
CN115267672A (zh) | 声源检测和定位的方法 | |
Firoozabadi et al. | Combination of nested microphone array and subband processing for multiple simultaneous speaker localization | |
Traa | Multichannel source separation and tracking with phase differences by random sample consensus | |
Al-Ali et al. | Enhanced forensic speaker verification performance using the ICA-EBM algorithm under noisy and reverberant environments | |
CN111627425B (zh) | 一种语音识别方法及系统 | |
WO2022207994A1 (fr) | Estimation d'un masque optimise pour le traitement de donnees sonores acquises | |
EP3934282A1 (fr) | Procédé de conversion d'un premier ensemble de signaux représentatifs d'un champ sonore en un second ensemble de signaux et dispositif électronique associé | |
CN116756486A (zh) | 基于声光电磁多源数据融合的海上目标识别方法及装置 | |
Aparicio | A Geometric Deep Learning Approach to Sound Source Localization and Tracking | |
Díaz-Guerra Aparicio et al. | A Geometric Deep Learning Approach to Sound Source Localization and Tracking |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| 17P | Request for examination filed | Effective date: 20230512 |
| AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| RAP3 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: ORANGE |
| DAV | Request for validation of the european patent (deleted) | |
| DAX | Request for extension of the european patent (deleted) | |