WO2018029341A1 - Acoustic environment mapping - Google Patents
- Publication number: WO2018029341A1
- Application number: PCT/EP2017/070429
- Authority: WIPO (PCT)
- Prior art keywords: sensor array, source, estimates, processing unit, central processing
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
Abstract
A method for estimating a map of an acoustic environment including a set of known sources, a set of sensor arrays, and a central processing unit which is synchronized with the known sources and sensor arrays. The method includes sequentially emitting a known measurement signal from each known source, and, in each sensor array, estimating a position of the currently emitting known source in a local coordinate system of the sensor array, and computing a quality measure of the estimated position. The position estimates and quality measures of each position estimate are transmitted to the central processing unit, which aligns all local coordinate systems based on the position estimates, using the position estimate quality measure to give relatively higher weight to more accurate estimates than to less accurate estimates, thereby providing a map including a position of each known source, and a position and orientation of each sensor array.
Description
ACOUSTIC ENVIRONMENT MAPPING
Field of the invention
The present invention relates to estimation of loudspeaker topology, i.e. to determining a map of an acoustic environment including a set of known sources and at least one unknown source. In particular, the present invention relates to estimation of loudspeaker positions relative to each other and to a listener.
Background
The listening experience is highly influenced by the position of the loudspeakers relative to the listener and the room in which they are placed. For example, in a stereo set-up, the two loudspeakers and the listener should ideally be placed on the vertices of an equilateral triangle, and in a surround set-up, the loudspeakers should be placed at certain angles on a circle centered on the listening position. Unfortunately, the loudspeakers and listener(s) are often not placed in their ideal positions since other interior design considerations often take higher priority. However, if the positions of the loudspeakers and the listener are known, signal processing algorithms can to a certain extent compensate for the non-ideal positions.
As illustrated in figure 1, an acoustic environment includes a set of loudspeakers 1 in an acoustic enclosure 2 such as a room, and a listener at a given position 3 in the enclosure 2. In order to achieve a satisfactory listening experience, the audio rendering should take the position of the loudspeakers and the listening position into account. In order to achieve that, a map containing the position of the loudspeakers and the listening position is required. Preferably, the map should also contain the orientation of the loudspeakers.
A map including the position (and possibly also orientation) of the loudspeakers in an acoustic environment is not only useful in the above scenario. Future consumer audio will to a much wider extent be object-based, in which case it becomes more important for the rendering system to know the speaker layout.
The typical approach to compensate for a non-ideal system layout is to measure the transfer function from all loudspeakers to the listening position. Unfortunately, this is quite inconvenient for the listener, and this approach does not directly create a speaker and listener map that can be used for rendering object-based audio faithfully.
Another approach is provided in US 8,279,709. Here, a microphone is placed on each speaker, and impulse responses from each speaker are measured to determine distances between each pair of speakers. The distance matrix is then used to estimate the position of each speaker.
However, this approach still does not provide the orientation of the loudspeakers. Also, the method in US 8,279,709 is sensitive to errors in the measurements by the microphones. Finally, the method requires a significant amount of data to be sent between each speaker and the central computation unit.
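To illustrate how a distance matrix can be turned into positions, the following is a minimal sketch using textbook classical multidimensional scaling. It is an assumption for illustration only and is not claimed to be the procedure of US 8,279,709; the function name `positions_from_distances` and the example speaker layout are hypothetical. Note that the result is only defined up to rotation, reflection and translation, and that no orientation information is obtained.

```python
import numpy as np

def positions_from_distances(dist, dim=2):
    """Classical multidimensional scaling (illustrative, not the cited method).

    Recovers coordinates, up to rotation, reflection and translation,
    from a symmetric matrix of pairwise distances.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centred Gram matrix of the unknown coordinates.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]    # indices of the largest ones
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Hypothetical layout: four speakers at the corners of a 4 m x 3 m rectangle.
speakers = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]])
dist = np.linalg.norm(speakers[:, None, :] - speakers[None, :, :], axis=2)

estimate = positions_from_distances(dist)
# The estimated layout reproduces the measured pairwise distances.
recovered = np.linalg.norm(estimate[:, None, :] - estimate[None, :, :], axis=2)
```

Because only distances enter the computation, errors in individual distance measurements propagate directly into the recovered layout.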
General disclosure of the Invention
It is an object of the present invention to at least mitigate some of the problems mentioned above, and provide an improved way to determine a map of an acoustic environment including a set of known sources and optionally one or more unknown sources.
According to a first aspect of the present invention, this and other objects are achieved by a method for estimating a map of an acoustic environment including a set of S known sources, comprising:
providing a set of M sensor arrays, each sensor array including at least two microphones and processing circuitry for processing sound signals received by the microphones;
providing a central processing unit, said central processing unit being synchronized with said known sources and with said sensor arrays;
sequentially emitting a known measurement signal from each known source;
in each sensor array:
estimating a position of the currently emitting known source in a local coordinate system of the sensor array, and
computing a quality measure of the estimated position;
transmitting position estimates of each known source in each local coordinate system and the quality measure of each position estimate to the central processing unit; and
in the central processing unit, aligning all local coordinate systems based on the position estimates, using the position estimate quality measure to give relatively higher weight to more accurate estimates than to less accurate estimates, thereby providing a map including a position of each known source, and a position and orientation of each sensor array.
According to a second aspect of the present invention, the above object is achieved by a system for estimating a map of an acoustic environment including a set of known sources, comprising:
a set of sensor arrays, each sensor array including at least two microphones and processing circuitry for processing sound signals received by the microphones;
a central processing unit, said central processing unit being synchronized with said known sources and with said sensor arrays;
wherein each sensor array is configured to:
- sequentially receive a known measurement signal from each known source,
- estimate a position of the currently emitting known source in a local coordinate system of the sensor array,
- compute a quality measure of the estimated position, and
- transmit, to the central processing unit, position estimates of each known source in said local coordinate system and a quality measure of each position estimate; and
wherein the central processing unit is configured to align all local coordinate systems based on the position estimates, using the position estimate quality measure to give relatively higher weight to more accurate estimates than to less accurate estimates, thereby providing a map including a position of each known source, and a position and orientation of each sensor array.
According to these aspects, a set of M sensor arrays is used to create a map containing the positions and orientations of the sensor arrays, and the positions of a set of known sources (e.g. loudspeakers).
The "known" sources are thus synchronized with the sensor arrays and with the central processing unit. In a loudspeaker setup, all loudspeakers are connected to the same system and synchronization within a few tens of microseconds is typically required for playback of audio.
By emitting a known signal, the synchronized sensor array can determine distance as well as direction. Accordingly, the sensor array can determine the position of the source in its own, local coordinate system.
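As a simplified illustration of this point, the sketch below converts a time of flight and a direction of arrival into a position in the local coordinate system of a sensor array. It assumes that both quantities have already been estimated, that the DOA is measured counter-clockwise from the local x-axis, and that the speed of sound is 343 m/s; the function name `local_position` is hypothetical, and the sketch does not replace the estimator described later in this disclosure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius

def local_position(time_of_flight_s, doa_rad, c=SPEED_OF_SOUND):
    """Convert a time of flight and a DOA into local 2D coordinates.

    Because source and sensor array are synchronized, the time of flight
    is known and gives the distance d = c * time_of_flight.
    """
    d = c * time_of_flight_s
    return (d * math.cos(doa_rad), d * math.sin(doa_rad))

# A source heard 10 ms after emission, straight along the local y-axis.
x, y = local_position(0.010, math.pi / 2)
```

Without synchronization the emission time is unknown, so only the direction, and not the distance, is available from a single array.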
The sensor arrays and sources (referred to as slaves) are connected to the central processing unit (referred to as the master). As position estimates and quality measures are calculated locally in each sensor array, only low rate data transmission is required from the slaves to the master. The required data rate can typically be provided by available connections, without the need for high speed data transmission.
The sensor arrays and the known sources (i.e. loudspeakers) do not need to be co-located, and the orientation of the sensor arrays may be unknown. However, in one embodiment, at least one sensor array is co-located with one of the known sources. Thereby, the estimated orientation of the sensor array can be used to determine the orientation of the co-located known source.
Preferably, each sensor array includes at least three microphones when the positions are estimated in two-dimensional space, and at least four microphones when the positions are estimated in three-dimensional space.
According to one embodiment, the acoustic environment further includes one or several unknown sources (e.g. a user in a listening position), and the method further includes:
sequentially emitting an unknown signal from each unknown source;
in each sensor array:
estimating a direction of arrival (DOA) of the currently emitting unknown source in the local coordinate system of the sensor array, and
computing a quality measure of the estimated DOA;
transmitting DOA estimates of each unknown source in each local coordinate system and the quality measure of each DOA estimate to the central processing unit; and
in the central processing unit, determining the position of each unknown source in the map based on the DOA estimates, using the DOA estimate quality measures to give relatively higher weight to more accurate estimates than to less accurate estimates.
Brief description of the drawings
The present invention will be described in more detail with reference to the appended drawings, showing currently preferred embodiments of the invention.
Figure 1 schematically shows an acoustic environment.
Figure 2 schematically shows an acoustic environment according to an embodiment of the present invention.
Figure 3 is a block diagram of a sensor array in figure 2.
Figure 4 is a flow chart of a method according to an embodiment of the present invention.
Figure 5 shows a map of sensor arrays and known sources determined according to an embodiment of the invention.
Figure 6 is a more detailed view of a part of the map in figure 5.
Figure 7 shows a map of sensor arrays, known sources and one unknown source determined according to an embodiment of the invention.
Figure 8 shows a map of sensor arrays, known sources and two unknown sources determined according to an embodiment of the invention.
Detailed description of preferred embodiments
As illustrated in figure 1, an acoustic environment includes a set of S sources (loudspeakers) 1 placed in an acoustic enclosure such as a room 2. As illustrated in figure 2, a set of M sensor arrays 4 have been placed in the environment. As shown in figure 3, each sensor array 4 can include a plurality of microphones 5 and processing circuitry 6 for processing sound signals received by the microphones 5. A central processing unit 7, hereinafter referred to as "master", is connected to and synchronized with each sensor array 4. The sources 1 are also synchronized with the sensor arrays 4, and will therefore be referred to as "known" sources.
The environment may further include one or several "unknown" sources 8, i.e. sources which are not synchronized with the central processing unit 7. An example of such an "unknown" source is a listener located at a given listening position 3.
According to the present invention, a map of the acoustic environment is obtained by a method illustrated in figure 4. In brief, and with reference to figures 2 and 4, the method includes:
1. In turn, the S known sources 1 emit a known source signal (step S1) while the M sensor arrays 4 estimate the position of these sources in their own local coordinate system 11 (step S2). In addition to the position estimates, the sensor arrays 4 also compute a quality matrix describing the accuracy of the estimated positions (step S3).
2. In step S4, the position estimates of the S sources in the M local coordinate systems 11 are transmitted to the master 7 along with the quality matrices. The master 7 now rotates and translates the local coordinate systems so that they fit (are aligned) as well as possible (step S5). The quality matrices are used in this process to ensure that the most accurate estimates have a higher weight than the less accurate estimates. The result of the process is a map of all sensor arrays 4 and known sources 1 in a common coordinate system 12.
3. If the environment includes unknown sources 8, the R unknown sources 8 emit, in step S6, an unknown source signal while the M sensor arrays 4 estimate the direction of arrival (DOA) in their own coordinate system 11 (step S7). The unknown source signal may, for example, come from a talking person or a mobile phone. Since the source signal is non-synchronized and the source is often in the far-field of the arrays, only the DOA and not the distance is estimated. Again, a quality matrix is computed describing the accuracy of the estimated DOAs (step S8).
4. In step S9, the DOA estimates of the R unknown sources 8 in the M local coordinate systems 11 are transmitted to the master 7 along with the quality matrices. The master 7 now finds the R points that best describe the estimated DOAs (step S10). The quality matrices are used in this process to ensure that the most accurate estimates have a higher weight than the less accurate estimates. The result of the process is a map of all sensors and sources emitting a known or an unknown source signal.
There is no upper bound on the values of S, M, and R. However, at least two sensor arrays 4 are required to estimate the position of an unknown source 8. If only one sensor array 4 is present, only the direction of arrival (DOA) of an unknown source can be estimated. With two or more sensor arrays 4, the position of an unknown source 8 can be triangulated.
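The triangulation mentioned above can be sketched as follows. This is a simple weighted linear least-squares stand-in, chosen for brevity, and is not claimed to be the estimator used in this disclosure. It assumes that the positions and orientations of the sensor arrays in the common coordinate system are already known from the map, and that each DOA comes with a scalar weight (for example an inverse variance); the function name `triangulate` and all numerical values are hypothetical.

```python
import numpy as np

def triangulate(array_positions, array_orientations, doas, weights):
    """Weighted least-squares intersection of bearing lines in 2D.

    Each array m at position p_m with orientation phi_m reports a local
    DOA theta_m. The source should lie on the line through p_m with
    global bearing phi_m + theta_m, i.e. n_m^T (x - p_m) = 0 for the
    line normal n_m. The weighted sum of squared residuals is minimised.
    """
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, phi, theta, w in zip(array_positions, array_orientations, doas, weights):
        bearing = phi + theta
        normal = np.array([-np.sin(bearing), np.cos(bearing)])
        projector = w * np.outer(normal, normal)
        A += projector
        b += projector @ np.asarray(p, dtype=float)
    return np.linalg.solve(A, b)

# Two arrays on the x-axis observing a source at (2, 2).
positions = [(0.0, 0.0), (4.0, 0.0)]
orientations = [0.0, 0.0]
target = np.array([2.0, 2.0])
doas = [np.arctan2(target[1] - y, target[0] - x) for x, y in positions]

estimate = triangulate(positions, orientations, doas, weights=[1.0, 1.0])
```

With a single array the matrix `A` has rank one and cannot be inverted, which mirrors the statement that one array yields only a direction of arrival.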
Figures 5 and 6 show an example of a map including four uniform circular array (UCA) sensor arrays, and four known sources (i.e. all source signals are assumed known and synchronized with the sensor arrays). The map has been determined with a signal-to-noise ratio of 20 dB, a data length of 999, and a sampling frequency of 4 kHz. The true positions of the sources and sensors are marked by dots and crosses, respectively. The estimated sensor positions are marked by circles and the estimated source positions are marked by stars. Finally, the quality matrices are illustrated as lines (which in fact are compressed ellipses) which indicate the one standard deviation uncertainty contours. Thus, bigger ellipses suggest more uncertain estimates and vice versa.
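As an illustration of how a one standard deviation contour can be derived from a quality matrix, the sketch below computes the semi-axes and orientation of the ellipse from a 2 x 2 covariance matrix. It assumes that the quality matrix is available as a covariance matrix; the function name `one_sigma_ellipse` and the numerical values are hypothetical.

```python
import numpy as np

def one_sigma_ellipse(cov):
    """Return (major semi-axis, minor semi-axis, angle of major axis).

    The semi-axes of the one standard deviation contour are the square
    roots of the eigenvalues of the covariance matrix, and the axes
    point along the corresponding eigenvectors.
    """
    eigvals, eigvecs = np.linalg.eigh(np.asarray(cov, dtype=float))  # ascending
    minor, major = np.sqrt(eigvals)
    major_direction = eigvecs[:, 1]
    angle = np.arctan2(major_direction[1], major_direction[0])
    return major, minor, angle

# Hypothetical covariance: 1 mm standard deviation along x, 20 cm along y.
cov = np.diag([1e-6, 4e-2])
major, minor, angle = one_sigma_ellipse(cov)
```

When one semi-axis is several orders of magnitude smaller than the other, as in this example, the ellipse is drawn as what looks like a line.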
Aside from seeing that all the source and sensor locations are estimated with a very high precision, the most interesting part is the uncertainty ellipses. The ellipses have almost no extension in the direction corresponding to range from the source, indicating that we are much more certain about the range than the DOA. This means that we get a much more accurate estimate of the sources if we use weighted Procrustes analysis.
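The benefit of weighting can be illustrated with a simplified example in which two independent estimates of the same source, already expressed in a common coordinate system, are combined by inverse-covariance weighting. This is a sketch under those assumptions and not the full alignment procedure; the function name `fuse` and all numbers are hypothetical.

```python
import numpy as np

def fuse(estimates, covariances):
    """Combine independent estimates of one point by inverse-covariance weighting."""
    information = np.zeros((2, 2))
    weighted_sum = np.zeros(2)
    for x, cov in zip(estimates, covariances):
        cov_inv = np.linalg.inv(cov)
        information += cov_inv
        weighted_sum += cov_inv @ x
    fused_cov = np.linalg.inv(information)
    return fused_cov @ weighted_sum, fused_cov

# Array A is accurate along x and poor along y; array B is the opposite.
est_a = np.array([1.00, 2.30])
cov_a = np.diag([1e-4, 1e-1])
est_b = np.array([1.40, 2.00])
cov_b = np.diag([1e-1, 1e-4])

fused, fused_cov = fuse([est_a, est_b], [cov_a, cov_b])
unweighted = 0.5 * (est_a + est_b)  # for comparison: plain average
```

The fused estimate takes its x-coordinate almost entirely from array A and its y-coordinate almost entirely from array B, whereas the plain average inherits the large errors of both.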
Figure 6 shows a part of figure 5 in which the ellipses for one source are easier to see. Two ellipses are much bigger than the other two and this is hardly surprising since the corresponding sensor arrays are further away from this source.
If we assume that sources emitting an unknown source signal are in the near-field of the arrays, the positions of these sources can be estimated jointly with the positions of sources emitting a known source signal. In this case, step 3 above (steps S6-S8 in figure 4) is a part of step 1, and step 4 (steps S9-S10) can be omitted. An illustration of such a joint estimation is given in figures 7 and 8.
In figure 7, the source signal of the source at (3, 0) is unknown, while in figure 8, the source signals of the sources at (3, 0) and at (2, 2) are unknown. The other sources are known, and other conditions are the same as in figures 5 and 6.
Clearly, the ellipses for the unknown sources are now much bigger indicating that they are much harder to estimate. However, the sources are still estimated fairly accurately since two of the arrays have a fairly good estimate of the source. This further demonstrates the power of weighted Procrustes analysis.
In the following, a more detailed disclosure of an example of the present invention will be provided.
Step 1: Source localisation
Many source localisation algorithms already exist in the scientific literature for various array geometries. In principle, any array geometry can be used as long as
• at least three sensors not on the same line are used for 2D source localisation, and
• at least four sensors not in the same plane are used for 3D source localisation.
For 2D localisation, we have used a uniform circular array (UCA) as the array geometry since the range and DOA estimation performance are independent of the direction of the source. To estimate the source position using a UCA, we use the maximum likelihood (ML) estimator in [5] since it is optimal in a statistical sense. Assuming white and Gaussian measurement noise, the covariance matrix of the estimator is the inverse Fisher information matrix (FIM) for large enough data sizes. We use this matrix as an inverse quality matrix, and, for a source in the far field, it is given by equation (1.1),
where $x$ is the source position, $d$ is the distance to the source in meters, $\theta$ is the DOA in radians, $\beta$ is the gain from the source to the sensor array, $N$ is the number of data points, $K$ is the number of sensors in the array, $r$ is the array radius in meters, $c$ is the propagation speed, and $\sigma^2$ is the estimated noise variance. The expression also depends on the DFT coefficients of the source signal.
Clearly, the difference between how accurately we can estimate the x- and y-coordinates of $x$ varies with $\theta$ and can be huge. For example, the error in the x-coordinate can be expected to be $2d^2/r^2 - 1$ times bigger than the error in the y-coordinate when $\theta = \pi/2$. This factor can be large, depending on $d$ and $r$. By including these quality matrices as inverse weighting matrices in step two, we automatically include the differences in uncertainties.
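To give a feeling for the size of this factor, the following short sketch evaluates the expression $2d^2/r^2 - 1$ for a hypothetical source at a distance of 3 m from an array with a radius of 5 cm. The values of `d` and `r` are assumptions chosen for illustration only.

```python
def error_ratio(d, r):
    """Evaluate 2 d^2 / r^2 - 1 for distance d and array radius r (same unit)."""
    return 2.0 * d ** 2 / r ** 2 - 1.0

# Hypothetical values: source 3 m away, array radius 5 cm.
ratio = error_ratio(3.0, 0.05)
```

For these values the factor is approximately 7199, which illustrates why an unweighted combination of the x- and y-coordinates would discard most of the available accuracy.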
A quality matrix for a source in the near-field can also be found (see details later).
Step 2: Combining independent source localisation estimates
The algorithm for combining the independent source localisation estimates into one map is described below. The algorithm is a variant of weighted generalised orthogonal Procrustes analysis.
Assume that the true coordinates of S sources in a reference coordinate system are given as the columns in the matrix X. In the coordinate system of the m'th sensor array these global coordinates are observed rotated and translated as
where Qm and tm are a rotation and a translation matrix, respectively. Unfortunately, we do not observe Xm directly, but only the noisy version
where em = vec(Em) is a white Gaussian noise vector with an unknown variance, and Wm is a block diagonal matrix of the form
The weighting matrix Vmn is a 2 x 2 matrix proportional to the inverse FIM in (1.1) (see step one above). Combining these two equations gives the signal model
By stacking all the ym's on top of each other for m = 1, . . . , M, we obtain the signal model
where
For white Gaussian noise with an unknown variance σ2, the maximum likelihood estimator of Q is
It is well known from generalised Procrustes analysis, that a closed-form solution to the above problem is not available unless M = 2 AND the same weights are applied to each column of Em. In this case, a D-dimensional eigenvalue decomposition can be used in the computation of Q Q2 [2]. If M > 2 AND the same weights are applied to each column of Em, the estimates of X and Q are computed iteratively as detailed in [2]. In this case, Qm is estimated from (1.8) for m = 1, . . . , M as the solution to
Although this looks complicated, the estimate of Qm is the result of an eigenvalue decomposition WHEN the same weights are applied to each column of Em. Since the uncertainty in the x- and y-coordinates can be far from satisfying this condition in our case, we will not describe the detailed solution here. Instead, we will seek to find a solution for a general weighting matrix.
According to [2], (1.20) can only be solved iteratively, and the authors refer to [4] for an algorithm. In [4], an iterative algorithm has been suggested, but it seems to be very sensitive to the starting point. Specifically, the authors suggest that at least 20 random starting points should be tried out, and that the unweighted solution is not suitable to use as a starting point. This is a major drawback of the algorithm, and we therefore propose a different approach. In our initial case, the dimension of the problem is D = 2 so the rotation matrix can be written as
Thus, the complete problem has M − 1 nonlinear parameters. For many loudspeakers, it might be computationally very intensive to optimise such a high-dimensional nonlinear cost function (especially, if we move to 3D), so we instead attack the problem as it is traditionally solved in generalised orthogonal Procrustes analysis. That is, we have to solve a number of 1D nonlinear optimisation problems where the objective is given in (1.20). In 3D, we instead get a series of 2D nonlinear optimisation problems which are not too costly to solve.
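One of these 1D problems can be sketched as follows for a single sensor array in 2D. This is an illustration under stated assumptions and not the disclosed implementation: the reference coordinates `X` are taken as given (in the full algorithm they are re-estimated between iterations), each point has a 2 x 2 weight `W[n]` assumed to be the inverse of the corresponding quality matrix, the translation is eliminated in closed form for each candidate angle, and the angle is found by a coarse grid followed by golden-section refinement. All function names are hypothetical.

```python
import math

def rot(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def matvec(A, v):
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])

def inv2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return ((A[1][1] / det, -A[0][1] / det), (-A[1][0] / det, A[0][0] / det))

def best_translation(theta, X, Y, W):
    """Weighted least-squares translation for a fixed rotation angle."""
    Q = rot(theta)
    S = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for x, y, w in zip(X, Y, W):
        qx = matvec(Q, x)
        wr = matvec(w, (y[0] - qx[0], y[1] - qx[1]))
        for i in range(2):
            b[i] += wr[i]
            for j in range(2):
                S[i][j] += w[i][j]
    return matvec(inv2(S), b)

def cost(theta, X, Y, W):
    """Weighted squared residual after removing the best translation."""
    Q = rot(theta)
    t = best_translation(theta, X, Y, W)
    total = 0.0
    for x, y, w in zip(X, Y, W):
        qx = matvec(Q, x)
        e = (y[0] - qx[0] - t[0], y[1] - qx[1] - t[1])
        we = matvec(w, e)
        total += e[0] * we[0] + e[1] * we[1]
    return total

def align(X, Y, W, n_grid=360, n_refine=40):
    """1D search over the rotation angle: coarse grid, then golden section."""
    step = 2.0 * math.pi / n_grid
    k = min(range(n_grid), key=lambda i: cost(i * step, X, Y, W))
    lo, hi = (k - 1) * step, (k + 1) * step
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(n_refine):
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if cost(a, X, Y, W) < cost(b, X, Y, W):
            hi = b
        else:
            lo = a
    theta = 0.5 * (lo + hi)
    return theta, best_translation(theta, X, Y, W)
```

The coarse grid is intended to avoid the sensitivity to the starting point noted above; because the search is one-dimensional, evaluating a few hundred candidate angles is inexpensive.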
Step 3: DOA Estimation
In the third step, the K sources emitting an unknown source signal emit these in turn. In a very similar fashion to step 1, the sensor arrays produce DOA estimates and quality measures. The ML estimator is only a minor modification of the estimator in step 1. The inverse FIM for this estimator is given by
where we have used the same notation as above. Asymptotically, the ML estimator is efficient and distributed as a Gaussian. Since the DOA is periodic in 2π, we therefore model the distribution of the DOA estimator with the circular normal distribution (Von Mises distribution) which is given by [1, pp. 105-110]
variance 1/κ. Since we expect this variance to be small, we can therefore set κ as
Since the source signal is unknown, we know neither α, nor β. However, we can estimate at all sensors.
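The Von Mises model of step 3 can be sketched as below. This is an illustration only: the concentration is taken as the reciprocal of the DOA variance, as described above, and the Bessel function I0 in the normalising constant is evaluated with a power series for small arguments and the leading terms of its large-argument expansion otherwise. The function names and the switch-over point at 30 are assumptions made for this example.

```python
import math

def log_bessel_i0(x):
    """log I0(x) for x >= 0: power series for small x, asymptotic form for large x."""
    if x <= 30.0:
        term, total, k = 1.0, 1.0, 0
        while term > 1e-17 * total:
            k += 1
            term *= (x / (2.0 * k)) ** 2
            total += term
        return math.log(total)
    # Leading terms of the large-argument expansion of I0.
    return (x - 0.5 * math.log(2.0 * math.pi * x)
            + math.log1p(1.0 / (8.0 * x) + 9.0 / (128.0 * x * x)))

def kappa_from_variance(var):
    """Concentration chosen as the reciprocal of a (small) DOA variance."""
    return 1.0 / var

def von_mises_logpdf(theta, mean, kappa):
    """Log-density of the circular normal distribution on an interval of length 2*pi."""
    return (kappa * math.cos(theta - mean)
            - math.log(2.0 * math.pi) - log_bessel_i0(kappa))
```

For a DOA standard deviation of one degree the concentration is roughly 3283, and the peak of the density is then very close to that of a Gaussian with the same variance.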
Step 4: Source Position Estimation
Assume that the unknown source is positioned at location z in a reference coordinate system. In the coordinate system of sensor array m, this point is given by
where Qm and tm are the rotation matrix and translation vector estimated in step 2. Since the source signal is unknown, we only estimate the angle θm of the polar representation of zm(z), but not the length. In terms of the source and sensor array position, this angle is given by
where em is the unit vector in the m'th dimension. As alluded to above, we assume that our estimator is unbiased and distributed as a Von Mises random variable. Thus, we model the m'th estimator as
where is the DOA estimate. Since it is reasonable to assume, that M sensor arrays produce observation errors independently, we therefore model the joint distribution as
Clearly, the maximum likelihood estimator of z is now the argument maximizing the above distribution w.r.t. z. Thus, we have to solve the optimisation problem
where S is the set of sensor array positions. By inserting the definition of θm(z) and by doing a number of manipulations, the above objective function can be rewritten into
where
For M > 2, a solution can be found by doing a search on a 2D grid.
Details of step 1 and step 3
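As a sketch of the 2D grid search mentioned for step 4, the code below maximises the joint Von Mises log-likelihood directly, i.e. the sum over arrays of κm cos(θ̂m − θm(z)), where terms that do not depend on z are dropped. It assumes the array frames are given by zm = Qm z + tm with Qm parameterised by a rotation angle; the function names, the grid limits and the grid size are hypothetical, and the rewritten objective referred to above is not used.

```python
import math

def local_bearing(z, rot_angle, t):
    """Angle of the map point z seen in an array frame given by z_m = Q z + t."""
    c, s = math.cos(rot_angle), math.sin(rot_angle)
    xm = c * z[0] - s * z[1] + t[0]
    ym = s * z[0] + c * z[1] + t[1]
    return math.atan2(ym, xm)

def doa_log_likelihood(z, arrays, doas, kappas):
    """Sum of Von Mises log-densities, dropping terms that do not depend on z."""
    return sum(k * math.cos(d - local_bearing(z, a, t))
               for (a, t), d, k in zip(arrays, doas, kappas))

def grid_search(arrays, doas, kappas, xlim, ylim, n=201):
    """Evaluate the objective on an n x n grid and return the best grid point."""
    best_val, best_z = -math.inf, None
    for i in range(n):
        x = xlim[0] + (xlim[1] - xlim[0]) * i / (n - 1)
        for j in range(n):
            y = ylim[0] + (ylim[1] - ylim[0]) * j / (n - 1)
            val = doa_log_likelihood((x, y), arrays, doas, kappas)
            if val > best_val:
                best_val, best_z = val, (x, y)
    return best_z
```

Arrays with a large κm, i.e. a good quality measure, dominate the sum, which is how the weighting towards more accurate DOA estimates enters.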
In the description above, many details were left out of step 1 and step 3 of the four step procedure for the sake of clarity. Below, however, all the details will be described. The description also includes an extension of the algorithm which compensates for non-ideal microphone responses. Practical microphones do not have an ideal response, and this will affect the source localisation of step 1 and step 3. If the microphone responses are known in terms of impulse responses, however, the above algorithm can be extended to take non-ideal microphones into account, thus producing more accurate estimates.
Modelling periodic signals
The reason for considering the periodic signal model is that
1. a time-shift of a finite-length periodic signal leads to a phase shift in the frequency domain
2. the filtering of an N-periodic signal with an N-length FIR filter can be implemented using circular convolution without any zero-padding.
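Both properties can be verified numerically on one period of a short test signal. The sketch below uses a plain O(N²) DFT so that it has no dependencies; the signal `x`, the filter `h` and the delay `m` are arbitrary values chosen for illustration.

```python
import cmath
import math

def dft(x):
    """Plain O(N^2) DFT, adequate for a short demonstration."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def circular_delay(x, m):
    """Delay one period of an N-periodic signal by m samples."""
    N = len(x)
    return [x[(n - m) % N] for n in range(N)]

def circular_convolve(h, x):
    """Filter one period of an N-periodic signal with an N-tap FIR filter."""
    N = len(x)
    return [sum(h[m] * x[(n - m) % N] for m in range(N)) for n in range(N)]

x = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0, -2.0, 0.0]
h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, -0.1]
N, m = len(x), 3
X, H = dft(x), dft(h)

# Property 1: a delay of m samples multiplies bin k by exp(-j*2*pi*k*m/N).
shift_error = max(abs(a - b * cmath.exp(-2j * math.pi * k * m / N))
                  for k, (a, b) in enumerate(zip(dft(circular_delay(x, m)), X)))

# Property 2: circular convolution in time is a bin-wise product in frequency.
conv_error = max(abs(a - hk * xk)
                 for a, hk, xk in zip(dft(circular_convolve(h, x)), H, X))
```

Both `shift_error` and `conv_error` are at the level of floating-point rounding, and no zero-padding is involved.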
The applicability of these two properties will be apparent later. First, however, we will consider the model of such a periodic signal. Any periodic signal can be written as
where the fundamental frequency is given in radians/sample, Ai > 0 is the amplitude of the i'th harmonic component, each harmonic component has its own phase, and L < ⌊N/2⌋. Using Euler's identity, we obtain that
where we have defined
This assumption is the same as saying that the signal is periodic in N. When we can design the signal, this assumption is easily satisfied. However, if we cannot design the signal, the above is an assumption which is used (nearly always and often implicitly [3]) for source localisation. We also set the number of harmonic components to the maximum of
This is not a critical assumption when we assume the signal to be known since we can always set some of the amplitudes to zero.
The above assumptions were made to facilitate a fast implementation using a fast Fourier transform. To see this, we will reformulate the above signal model slightly. We first define the DFT matrix
where, as assumed above . It then follows that
Moreover, it turns out that it is also desirable to establish a link between the DFT of a periodic signal and the vector UL - This is given by
where of s (0) and
Using these expressions, we obtain an alternative formulation for the periodic signal which is given by
Linear filtering of a periodic signal
where we have defined
If the signal s (n) is periodic in N, we can then rewrite the above as
where we have now defined
The two equations are identical since s (n + N) = s (n) when s(n) is a periodic signal (in N) . The reason for considering the second form is that H is a circulant matrix. Thus, it can be diagonalised with the discrete Fourier transform so that
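The diagonalisation property can be illustrated as follows: every complex exponential with a frequency on the DFT grid is an eigenvector of a circulant matrix, with the corresponding DFT coefficient of the filter as eigenvalue. This is a minimal sketch; the indexing convention H[n][m] = h[(n − m) mod N] and the function names are assumptions made for the example.

```python
import cmath
import math

def circulant(h):
    """N x N matrix with entries H[n][m] = h[(n - m) mod N]."""
    N = len(h)
    return [[h[(n - m) % N] for m in range(N)] for n in range(N)]

def dft_coefficient(h, k):
    """k'th DFT coefficient of the filter taps h."""
    N = len(h)
    return sum(h[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

def eigen_residual(h, k):
    """Largest entry of H f_k - lambda_k f_k for the k'th complex exponential f_k."""
    N = len(h)
    H = circulant(h)
    f = [cmath.exp(2j * math.pi * k * n / N) for n in range(N)]
    lam = dft_coefficient(h, k)
    Hf = [sum(H[n][m] * f[m] for m in range(N)) for n in range(N)]
    return max(abs(Hf[n] - lam * f[n]) for n in range(N))
```

Because the eigenvectors do not depend on the filter, filtering a periodic signal reduces to a multiplication per DFT bin.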
Modelling sensor array measurements from a periodic source
Assume that we have K sensors, each with their own direction-dependent impulse response vector where is the position the source. The source emits an N-periodic signal x{0)
sections, we can model the k'th received sensor signal as
source to the k'th sensor. In the far-field model, all
are the same whereas we assume that the near-field model is given by
If we now concatenate all of the sensor data into one big vector, we obtain that
Unknown source signal
If the source signal s (0) is unknown, we cannot distinguish between β and s (0) . We, therefore, set β = 1 and keep in mind that we are estimating a scaled version of the source signal. Given
the source position
the least-squares estimate of the source signal is
Note that Q {rjk) and A]( (ps) are diagonal matrices so no matrix inversions are necessary. Also note th
In practice, however, we can set
also for an even data length and non-integer r\k since anti-aliasing filters will ensure that
H for the frequency
where the NLS objective is given by
Known source signal
If the source signal s (0) is known, we can estimate the non-negative gain parameter β. However, since the gain factor is non-negative, we have to solve a constrained optimisation problem of the form
Fortunately, it is easy to modify the standard unconstrained solution to give a solution to the constrained optimisation problem. First, suppose ps is given. Then, the unconstrained solution for β is the least squares solution which is given by
Note that the denominator is easily evaluated since the matrices involved are diagonal. If the unconstrained solution is non-negative, the constraint is inactive. Otherwise, the constraint is active and β = 0.
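The handling of the non-negativity constraint can be sketched for a generic scalar-gain model y = βa + noise, where `a` stands for the modelled sensor data for unit gain. This simplified model and the function name `nonnegative_gain` are assumptions made for illustration; the actual vectors in the method are built from the DFT-domain quantities described above.

```python
def nonnegative_gain(a, y):
    """Least-squares gain beta >= 0 in the model y = beta * a + noise."""
    # Unconstrained least-squares solution: Re(a^H y) / (a^H a).
    num = sum((ai.conjugate() * yi).real for ai, yi in zip(a, y))
    den = sum(abs(ai) ** 2 for ai in a)
    beta_unconstrained = num / den
    # If the unconstrained value is negative, the constraint is active and beta = 0.
    return max(0.0, beta_unconstrained)
```

Since the objective is a convex quadratic in the single parameter β, clipping the unconstrained solution at zero gives the constrained minimiser.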
where
Since the denominator is non-zero and the numerator is the squared value of we can rewrite the optimisation problem into its final form of
where the NLS objective is given by
So far, we have not assumed a particular array structure. The array structure establishes a map between a source location and the delays
That is, we can model the array structure as a function Here, we will consider a 2D uniform circular array (UCA) in
source in radians, respectively. Similarly, we also have known microphone locations in either Euclidean or polar coordinates. For the k'th microphone, we have that
For a UCA centred in the origin of the coordinate system
where fs and c are the sampling frequency in Hz and the propagation speed in metres per second, respectively. The propagation speed depends on the temperature T in degrees Celsius approximately via
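A minimal sketch of how the delays of a UCA could be computed is given below. Two assumptions are made loudly: the propagation speed uses the widely quoted linear approximation c ≈ 331.3 + 0.606·T m/s, which may differ from the expression intended above, and the far-field delays are measured relative to the array centre with microphone k placed at angle 2πk/K, a sign and indexing convention chosen only for this example.

```python
import math

def speed_of_sound(temp_celsius):
    """Common linear approximation in m/s (assumed, not taken from the text)."""
    return 331.3 + 0.606 * temp_celsius

def uca_far_field_delays(theta, K, r, fs, temp_celsius=20.0):
    """Delays in samples, relative to the array centre, for a far-field source at angle theta."""
    c = speed_of_sound(temp_celsius)
    delays = []
    for k in range(K):
        phi_k = 2.0 * math.pi * k / K  # angular position of microphone k
        # A microphone facing the source receives the wavefront earlier (negative delay).
        delays.append(-fs * r / c * math.cos(theta - phi_k))
    return delays
```

In the far field the delays depend only on the DOA and not on the distance, and for a UCA they sum to zero over the microphones.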
The interesting thing about the far-field model is that the distance d and the angle Θ are separated into two different terms. Thus, inserting the far-field expression for into the vector gives
where
The angles ζ and pk represent the distance to the source and the k'th microphone in radians, respectively. The definitions
are straight-forward, but a bit cumbersome to write down in general. However, when is an integer, we have that
meaning that ιι (ζ) is a DFT vector. The definition of Β (θ) depends on whether N is even or uneven, even when is an integer. When this is the case, we have that
Quality matrices
To derive the quality (or weighting) matrices, we first have to compute the Fisher information matrix (FIM) for the case of a known and an unknown source signal.
Recall that the signal model for the microphone data is given by
where
If we assume that the noise is white and Gaussian with an unknown variance σ², the vector y is distributed as
where ϋ contains the unknown model parameters, except σ², and
find an expression for the FIM. Since ps is a common parameter for both a known and an unknown source signal, we will start with this. When we do not assume that (it is not
where
When the source signal is known, we have to find the derivative w.r.t. β. This is
where
Quality matrix for an unknown source signal
When the source signal is unknown, we set β = 1 and have to find the derivative w.r.t. s (0) . This is
Thus, the FIM is where
from which we can extract the inverse quality matrix as
Note that some of the DFT matrices F cancel when calculating the term BA 1 B nT
Bibliography
[1] Christopher M. Bishop. Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006. ISBN: 0387310738.
[2] Fabio Crosilla and Alberto Beinat. "Use of generalised Procrustes analysis for the photogrammetric block adjustment by independent models". In: ISPRS Journal of Photogrammetry and Remote Sensing 56.3 (2002), pp. 195-209.
[3] J. R. Jensen et al. "On Frequency Domain Models for TDOA Estimation". In: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. 2015.
[4] M. A. Koschat and D. F. Swayne. "A weighted Procrustes criterion". In: Psychometrika 56.2 (1991), pp. 229-239.
[5] J. K. Nielsen et al. "Grid Size Selection for Nonlinear Least-Squares Optimisation in Spectral Estimation and Array Processing". In: Proc. European Signal Processing Conf. 2016.
Claims
1. A method for estimating a map of an acoustic environment including a set of known sources, comprising:
providing a set of sensor arrays, each sensor array including at least two microphones and processing circuitry for processing sound signals received by the microphones;
providing a central processing unit, said central processing unit being synchronized with said known sources and with said sensor arrays;
sequentially emitting a known measurement signal from each known source;
in each sensor array:
- estimating a position of the currently emitting known source in a local coordinate system of the sensor array, and
- computing a quality measure of the estimated position;
transmitting the position estimates of each known source in each local coordinate system and the quality measure of each position estimate to the central processing unit; and
in the central processing unit, aligning all local coordinate systems based on the position estimates, using the position estimate quality measure to give relatively higher weight to more accurate estimates than to less accurate estimates, thereby providing a map including a position of each known source, and a position and orientation of each sensor array.
2. The method according to claim 1, wherein at least one sensor array is co-located with one of the known sources.
3. The method according to claim 1 or 2, wherein the acoustic environment includes more than two known sources.
4. The method according to any one of the preceding claims, wherein the known sources are loudspeakers.
5. The method according to any one of the preceding claims, wherein each sensor array has at least three microphones.
6. The method according to any one of the preceding claims, wherein the sensor array estimates the position in a two-dimensional plane.
7. The method according to claim 6, wherein each sensor array is a uniform circular array (UCA).
8. The method according to any one of the preceding claims, wherein the environment further comprises at least one unknown source which is unsynchronized with said central processing unit, and wherein the method further comprises:
sequentially emitting an unknown signal from each unknown source;
in each sensor array:
- estimating a direction of arrival (DOA) of the currently emitting unknown source in the local coordinate system of the sensor array, and
- computing a quality measure of the estimated DOA;
transmitting DOA estimates of each unknown source in each local coordinate system and the quality measure of each DOA estimate to the central processing unit; and
in the central processing unit, determining the position of each unknown source in the map based on the DOA estimates, using the DOA estimate quality measures to give relatively higher weight to more accurate estimates than to less accurate estimates.
9. The method according to claim 8, wherein the at least one unknown source includes a user at an unknown listening position.
10. A system for estimating a map of an acoustic environment including a set of known sources, comprising:
a set of sensor arrays, each sensor array including at least two microphones and processing circuitry for processing sound signals received by the microphones;
a central processing unit, said central processing unit being synchronized with said known sources and with said sensor arrays;
wherein each sensor array is configured to:
- sequentially receive a known measurement signal from each known source,
- estimate a position of the currently emitting known source in a local coordinate system of the sensor array,
- compute a quality measure of the estimated position, and
- transmit, to the central processing unit, position estimates of each known source in said local coordinate system and a quality measure of each position estimate; and
wherein the central processing unit is configured to align all local coordinate systems based on the position estimates, using the position estimate quality measure to give relatively higher weight to more accurate estimates than to less accurate estimates, thereby providing a map including a position of each known source, and a position and orientation of each sensor array.
11. The system according to claim 10, wherein at least one sensor array is co-located with one of the known sources.
12. The system according to claim 10 or 11, wherein the acoustic environment includes more than two known sources.
13. The system according to any one of claims 10 - 12, wherein the known sources are loudspeakers.
14. The system according to any one of claims 10 - 13, wherein each sensor array has at least three microphones.
15. The system according to any one of claims 10 - 14, wherein each sensor array is configured to estimate the position in a two-dimensional plane.
16. The system according to claim 15, wherein each sensor array is a uniform circular array (UCA).
17. The system according to any one of claims 10 - 16, wherein the environment further comprises at least one unknown source which is unsynchronized with said central processing unit;
wherein each sensor array is further configured to:
- sequentially receive an unknown signal from each unknown source,
- estimate a direction of arrival (DOA) of the currently emitting unknown source in the local coordinate system of the sensor array,
- compute a quality measure of the estimated DOA, and
- transmit DOA estimates of each unknown source in each local coordinate system and a quality measure of each DOA estimate to the central processing unit; and
wherein the central processing unit is further configured to determine the position of each unknown source in the map based on the DOA estimates, using the DOA estimate quality measures to give relatively higher weight to more accurate estimates than to less accurate estimates.
18. The system according to claim 17, wherein the at least one unknown source includes a user at an unknown listening position.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DKPA201600468 | 2016-08-12 | ||
DKPA201700003 | 2017-01-04 | ||
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018029341A1 (en) | 2018-02-15 |
Family
ID=61163040
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2017/070429 WO2018029341A1 (en) | 2016-08-12 | 2017-08-11 | Acoustic environment mapping |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2018029341A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8279709B2 (en) | 2007-07-18 | 2012-10-02 | Bang & Olufsen A/S | Loudspeaker position estimation |
US20120295637A1 (en) * | 2010-01-12 | 2012-11-22 | Nokia Corporation | Collaborative Location/Orientation Estimation |
CN102901949A (en) * | 2012-10-13 | 2013-01-30 | 天津大学 | Two-dimensional spatial distribution type relative sound positioning method and device |
-
2017
- 2017-08-11 WO PCT/EP2017/070429 patent/WO2018029341A1/en active Application Filing
Non-Patent Citations (8)
Title |
---|
ANDREAS M ALI ET AL: "An Empirical Study of Collaborative Acoustic Source Localization", INFORMATION PROCESSING IN SENSOR NETWORKS, 2007. IPSN 2007. 6TH INTERN ATIONAL SYMPOSIUM ON, IEEE, PI, 1 April 2007 (2007-04-01), pages 41 - 50, XP031158396, ISBN: 978-1-59593-638-7 * |
AYLLON DAVID ET AL: "Indoor Blind Localization of Smartphones by Means of Sensor Data Fusion", IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 65, no. 4, 1 April 2016 (2016-04-01), pages 783 - 794, XP011602096, ISSN: 0018-9456, [retrieved on 20160309], DOI: 10.1109/TIM.2015.2494629 * |
CHRISTOPHER M. BISHOP.: "Pattern Recognition and Machine Learning", 2006, SPRINGER |
FABIO CROSILLA; ALBERTO BEINAT: "Use of generalised Procrustes analysis for the pho-togrammetric block adjustment by independent models", ISPRS JOURNAL OF PHOTOGRAM-METRY AND REMOTE SENSING, vol. 56, no. 3, 2002, pages 195 - 209 |
J. K NIELSEN ET AL.: "Grid Size Selection for Nonlinear Least-Squares Optimisation in Spectral Estimation and Array Processing", PROC. EUROPEAN SIGNAL PROCESSING CONF., 2016 |
J. R. JENSEN ET AL.: "On Frequency Domain Models for TDOA Estimation", PROC. IEEE INT. CONF. ACOUST., SPEECH, SIGNAL PROCESS, 2015 |
LEWIS GIROD ET AL: "The design and implementation of a self-calibrating distributed acoustic sensing platform", SENSYS'06 : PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON EMBEDDED NETWORKED SENSOR SYSTEMS : OCT. 31 - NOV. 3, 2006, BOULDER, COLORADO, USA, ASSOCIATION FOR COMPUTING MACHINERY, NEW YORK, NY, USA, 31 October 2006 (2006-10-31), pages 71 - 84, XP058318518, ISBN: 978-1-59593-343-0, DOI: 10.1145/1182807.1182815 * |
M. A. KOSCHAT; D. F. SWAYNE: "A weighted Procrustes criterion", PSYCHOMETRIKA, vol. 56, no. 2, 1991, pages 229 - 239 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17752111 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17752111 Country of ref document: EP Kind code of ref document: A1 |