WO2018210429A1 - Calibration system for loudspeakers

Calibration system for loudspeakers

Info

Publication number
WO2018210429A1
Authority
WO
WIPO (PCT)
Prior art keywords
loudspeakers
loudspeaker
microphones
voice sound
listener
Application number
PCT/EP2017/062104
Other languages
French (fr)
Inventor
Antoine PEILLOT
Benoit Burette
Original Assignee
Gibson Innovations Belgium Nv
Priority date
Filing date
Publication date
Application filed by Gibson Innovations Belgium Nv filed Critical Gibson Innovations Belgium Nv
Priority to PCT/EP2017/062104 priority Critical patent/WO2018210429A1/en
Publication of WO2018210429A1 publication Critical patent/WO2018210429A1/en

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04S - STEREOPHONIC SYSTEMS
    • H04S7/00 - Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 - Control circuits for electronic adaptation of the sound field
    • H04S7/301 - Automatic calibration of stereophonic sound system, e.g. with test microphone
    • H04S7/302 - Electronic adaptation of stereophonic sound system to listener position or orientation
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 - Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 - Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/30 - Determining absolute distances from a plurality of spaced points of known location


Abstract

A method of calibrating a plurality of loudspeakers each corresponding to a respective microphone located at a respective predetermined location relative to the respective loudspeaker, each loudspeaker for playing back an audio channel, the method comprising: receiving through one or more of the microphones a voice sound from a listener at a listening position; playing back a voice sound through one or more of the loudspeakers sequentially; receiving through one or more of the microphones the voice sound played sequentially back through the one or more loudspeakers; calculating locations of one or more of the loudspeakers based on receipt through the one or more microphones of the voice sound from the one or more loudspeakers; calculating a location of the listening position based on receipt through the one or more microphones of the voice sound from the listener; calculating a respective offset for one or more of the loudspeakers based on the calculated locations of the listening position and the one or more loudspeakers; and adjusting the audio channel being played back on a respective loudspeaker based on the respective offset calculated for the respective loudspeaker.

Description

TITLE OF THE INVENTION
Calibration System for Loudspeakers
FIELD OF THE INVENTION
The invention relates to calibration systems for loudspeakers, particularly systems for calibrating offsets applied to audio channels played back by the loudspeakers.
BACKGROUND OF THE INVENTION
In multichannel audio systems having separate multiple loudspeakers, such as surround sound systems, the loudspeakers are typically placed in a variety of positions, most of which are non-optimal in terms of sound propagation to a listening position. The possible positions for placing the loudspeakers are often constrained by objects such as furniture in the room. Also, a listener may not necessarily sit at the optimal listening position (or "sweet spot"), which again is usually due to constraints resulting from the positioning of furniture or other objects in the room.
Spatial calibration is a feature that has become more and more available for surround sound systems over the last few years. Spatial calibration is based on adjusting the gains and delays (both of which can be termed "offsets") to be applied to each surround sound channel being played back through a respective loudspeaker to produce an optimal sound field at the listening position wherever the loudspeakers are placed. Spatial calibration therefore focuses the sweet spot at the listening position to provide the listener with an optimal surround sound experience. The sweet spot is virtually focused at the listening position.
Previously, an external microphone was used at the listening position to capture a test signal played by the loudspeakers. The signal captured was used to estimate the gains and delays based on distances between the listening position and the speakers. For instance, one prior sound system (http://www.sony.co.uk/electronics/sound-bars/ht-rt5) featured an auto-calibration using an external microphone included in the sales package for the system. Another prior sound system made use of a smartphone microphone at the listening position to calibrate the sound system (http://www.whathifi.com/news/sonos-trueplay-what-it-how-can-you-get-it). In those scenarios, however, the user is required to manually capture the sound emitted by the loudspeakers at the listening position. This is a tedious task for a user and the robustness of the calibration is highly dependent on the user. In particular, the user must handle the microphone in a proper position until calibration is complete. For some systems, this process can last a long time, which makes it even more vulnerable to user error. Also, during this process, there is a high chance of the user obstructing the line of sight between a speaker and a microphone with his or her hand or body unintentionally and unknowingly. The use of a smartphone microphone can result in inconsistency and unreliability.
Calibrating a sound system can also be achieved by inversing the problem, that is, emitting a sound at the listening position and capturing the sound at one or more loudspeaker locations using a microphone array. In one particular example (http://www.gibsoninnovations.com/en/news/philips-fidelio-b5-soundbar-wins-eisa- european-home-theater-solution-2015-2016), a microphone array is integrated in a soundbar to calibrate the surround speakers. In a first step, the surround speakers are placed at the listening position to emit the calibration sound. The speakers are then moved to the desired locations and the process is started again. Like the previously described systems, this process is a tedious task for a user and the robustness of the calibration is highly dependent on the user.
Recently, US patent publication 2015/0016642 ("Walsh") proposed a simpler way to calibrate sound systems. Walsh makes use of a microphone array embedded in an anchoring component of the system, such as a soundbar, located in a known front centre position or other predictable or assumed position. The speakers are calibrated by emitting MLS input signals for receipt by the microphone array. The listening position is calibrated with a voice input from the user at the listening position for receipt by the microphone array. In this way, the locations of the speakers and the listening position are estimated relative to the microphone array, and based on these, spatial calibration is performed. Walsh, however, relies on a microphone array embedded in an anchoring component located in a known location, and the locations of the other speakers and the listening position are estimated relative to this anchoring component. Also, in Walsh, time difference of arrival (TDOA) techniques are used to localize sound sources in a free field environment. However, in real-world environments, like a living room, Walsh suffers from issues arising from room reflections which decrease the performance and accuracy of TDOA techniques.
Prior US patent 8,279,709 ("Choisel") discloses a method to estimate speaker positions in a room using one embedded microphone in each speaker. Choisel only uses specific generated test signals such as MLS input signals. In the Choisel system, these test signals are long and broadband. In order to deal with room reflections, a signal covering all of the audible frequency range is used to maximize the discrimination of sound sources. Accordingly, such test signals are noise-like or very close to noise. Furthermore, the localization of a listener is not considered by Choisel.
It is an object of the present invention to overcome or ameliorate at least one of the disadvantages of the prior art, or to provide a useful alternative.
SUMMARY OF THE INVENTION
The present invention provides, in a first aspect, a method of calibrating a plurality of loudspeakers each corresponding to a respective microphone located at a respective predetermined location relative to the respective loudspeaker, each loudspeaker for playing back an audio channel, the method comprising:
receiving through one or more of the microphones a voice sound from a listener at a listening position;
playing back a voice sound through one or more of the loudspeakers sequentially; receiving through one or more of the microphones the voice sound played sequentially back through the one or more loudspeakers;
calculating locations of one or more of the loudspeakers based on receipt through the one or more microphones of the voice sound from the one or more loudspeakers;
calculating a location of the listening position based on receipt through the one or more microphones of the voice sound from the listener;
calculating a respective offset for one or more of the loudspeakers based on the calculated locations of the listening position and the one or more loudspeakers; and
adjusting the audio channel being played back on a respective loudspeaker based on the respective offset calculated for the respective loudspeaker.
In one embodiment, the voice sound played back through one or more of the loudspeakers is a prerecorded voice sound. In one embodiment, the voice sound played back through one or more of the loudspeakers is a recording of the same voice sound received through the one or more microphones from the listener. In one embodiment, the voice sound is a short word sequence.
In one embodiment, at least one loudspeaker and the respective microphone form part of a loudspeaker unit.
In one embodiment, at least two of the loudspeakers form part of a soundbar. In one embodiment, the two respective microphones corresponding to the two loudspeakers are located at a minimum distance of 40 cm from each other. In one embodiment, the two respective microphones corresponding to the two loudspeakers form part of the soundbar.
In one embodiment, the plurality of loudspeakers comprises three or more loudspeakers.
In one embodiment, the plurality of loudspeakers comprises four or more loudspeakers.
In one embodiment, no more than one line of sight between two of the loudspeakers or between the listener and one of the loudspeakers is obstructed.
In one embodiment, the plurality of loudspeakers comprise three or more loudspeakers with a first one of the loudspeakers matched to a first audio channel and a second one of the loudspeakers matched to a second audio channel, and the method comprises automatically matching at least one of the remaining loudspeakers to an appropriate audio channel based on the calculated location of the at least one loudspeaker.
In one embodiment, the method comprises automatically matching each loudspeaker to an audio channel based on the respective calculated location of each loudspeaker.
In one embodiment, the microphones are synchronized to each other.
In a second aspect, the present invention provides a calibration system for a plurality of loudspeakers, each loudspeaker for playing back an audio channel, the system comprising:
a plurality of microphones each corresponding to a respective loudspeaker and located at a respective predetermined location relative to the respective loudspeaker; and a processor configured to:
receive through one or more of the microphones a sound from a listener at a listening position;
play back a voice sound through one or more of the loudspeakers sequentially; receive through one or more of the microphones the voice sound played sequentially back through the one or more loudspeakers;
calculate locations of one or more of the loudspeakers based on receipt through the one or more microphones of the voice sound from the one or more loudspeakers;
calculate a location of the listening position based on receipt through the one or more microphones of the voice sound from the listener;
calculate a respective offset for one or more of the loudspeakers based on the calculated locations of the listening position and the one or more loudspeakers; and
adjust the audio channel being played back on a respective loudspeaker based on the respective offset calculated for the respective loudspeaker.
Throughout this specification, including the claims, the words "comprise", "comprising", and other like terms are to be construed in an inclusive sense, that is, in the sense of "including, but not limited to", and not in an exclusive or exhaustive sense, unless explicitly stated otherwise or the context clearly requires otherwise.
BRIEF DESCRIPTION OF THE FIGURES
Preferred embodiments in accordance with the best mode of the present invention will now be described, by way of example only, with reference to the accompanying figures, in which:
Fig. 1 is a schematic diagram of a calibration system for a plurality of loudspeakers in accordance with an embodiment of the present invention;
Fig. 2 is a schematic diagram of a calibration system for a plurality of loudspeakers in accordance with another embodiment of the present invention;
Fig. 3 is a schematic diagram of a loudspeaker localization sequence in accordance with an embodiment of the present invention;
Fig. 4 is a graph of a GCC-PHAT-β function between two loudspeakers exhibiting strong reflections;
Fig. 5 is a schematic diagram of a listening position localization sequence in accordance with an embodiment of the present invention;
Fig. 6 is a graph of a GCC-PHAT-β function between a pair of microphones corrupted by noise, reverberation, and strong early reflections; and
Fig. 7(a) to (c) are schematic diagrams showing the process of determining consistent TDOA combinations, with Fig. 7(a) showing a consistent triple of microphones, Fig. 7(b) showing a consistent quadruple of microphones, and Fig. 7(c) showing a consistent quintuple of microphones.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION
Referring to Figs. 1 and 2, there is provided a method of calibrating a plurality of loudspeakers 1 a, 1 b, 1 c, 1 d, etc., each corresponding to a respective microphone 2a, 2b, 2c, 2d, etc. (with 1 a corresponding to 2a, 1 b corresponding to 2b, 1 c corresponding to 2c, etc.) located at a respective predetermined location relative to the respective loudspeaker 1 a, 1 b, 1 c, 1 d, etc. Each loudspeaker is for playing back an audio channel. The method comprises:
receiving through one or more of the microphones 2a, 2b, 2c, 2d, etc. a voice sound from a listener 3 at a listening position 4;
playing back a voice sound through one or more of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. sequentially;
receiving through one or more of the microphones 2a, 2b, 2c, 2d, etc. the voice sound played sequentially back through the one or more loudspeakers 1 a, 1 b, 1 c, 1 d, etc;
calculating locations of one or more of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. based on receipt through the one or more microphones 2a, 2b, 2c, 2d, etc. of the voice sound from the one or more loudspeakers;
calculating a location of the listening position 4 based on receipt through the one or more microphones 2a, 2b, 2c, 2d, etc. of the voice sound from the listener 3;
calculating a respective offset for one or more of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. based on the calculated locations of the listening position 4 and the one or more loudspeakers; and adjusting the audio channel being played back on a respective loudspeaker based on the respective offset calculated for the respective loudspeaker.
Typically, each loudspeaker has one corresponding microphone. However, two or more of the loudspeakers can form part of a loudspeaker unit, in which case, there can be two or more corresponding microphones. For example, Fig. 2 shows a soundbar 5 having loudspeakers 1 c, 1 d, 1 e, and 1 f integrated within it. Microphones 2c and 2d correspond to the soundbar 5 and all the loudspeakers 1 c, 1 d, 1 e, and 1f integrated within it. Alternatively, microphone 2c can correspond to loudspeaker 1 c only, or correspond to both loudspeakers 1 c and 1 e only. Similarly, microphone 2d can correspond to loudspeaker 1 d only, or correspond to both loudspeakers 1 d and 1 f only. In cases where a loudspeaker unit, such as the soundbar 5, includes two or more corresponding microphones, these microphones are located at a minimum distance of 40 cm from each other. For example, in the example shown in Fig. 2, the two microphones 2c and 2d (where microphone 2c corresponds to loudspeakers 1 c and/or 1 e, and microphone 2d corresponds to loudspeakers 1 d and/or 1f) are located at a minimum distance of 40 cm from each other. This ensures the uniqueness of the information picked up by each microphone.
Typically, a loudspeaker 1 a, 1 b, 1 c, 1 d, etc. and its corresponding microphone 2a, 2b, 2c, 2d, etc. form part of a loudspeaker unit. In other words, a microphone is integrated within the same housing that houses the corresponding loudspeaker. As shown in Fig. 1 , loudspeaker 1 a is integrated with corresponding microphone 2a in a loudspeaker unit, loudspeaker 1 b is similarly integrated with corresponding microphone 2b, loudspeaker 1 c is similarly integrated with corresponding microphone 2c, and loudspeaker 1 d is similarly integrated with corresponding microphone 2d. In Fig. 2, microphones 2c and 2d are integrated with corresponding loudspeakers 1 c, 1 d, 1 e, and 1f within soundbar 5.
The voice sound played back through one or more of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. is a prerecorded voice sound. However, in other embodiments, the voice sound played back through one or more of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. is a recording of the same voice sound received through the one or more microphones 2a, 2b, 2c, 2d , etc. from the listener 3.
In the present embodiment, where the voice sound is a prerecorded voice sound, the step of receiving through one or more of the microphones 2a, 2b, 2c, 2d, etc. a voice sound from the listener 3 at the listening position 4 can be performed either before or after playing back the prerecorded voice sound through one or more of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. sequentially and receiving through one or more of the microphones 2a, 2b, 2c, 2d, etc. the prerecorded voice sound played sequentially back through the one or more loudspeakers 1 a, 1 b, 1 c, 1 d, etc. In embodiments where the voice sound played back through one or more of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. is a recording of the same voice sound received through the one or more microphones 2a, 2b, 2c, 2d , etc. from the listener 3, the step of receiving through one or more of the microphones 2a, 2b, 2c, 2d, etc. a voice sound from the listener 3 at the listening position 4 must of course be performed before playing back the recorded voice sound through one or more of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. sequentially.
The voice sound is a short word sequence. This is a significant advantage over prior methods since this allows much quicker and simpler calibration of loudspeakers. The involvement of the user in the calibration process is also minimized. This greatly improves the user experience during the calibration process, which is as seamless as possible.
It has been found that the calibration performance is good where the plurality of loudspeakers 1 a, 1 b, 1 c, 1 d, etc. comprises four or more loudspeakers. Also, no more than one line of sight between two of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. or between the listener 3 and one of the loudspeakers should be obstructed. However, in other embodiments, the plurality of loudspeakers comprise just three loudspeakers.
The microphones 2a, 2b, 2c, 2d, etc. are synchronized to each other. This means that the microphones are operated simultaneously to receive the voice sound. The microphones do not need to be synchronized with the loudspeakers.
Where the plurality of loudspeakers 1 a, 1 b, 1 c, 1 d, etc. comprise three or more loudspeakers with a first one of the loudspeakers matched to a first audio channel and a second one of the loudspeakers matched to a second audio channel, the method can also comprise automatically matching at least one of the remaining loudspeakers to an appropriate audio channel based on the calculated location of the at least one loudspeaker. The appropriate audio channel depends on the type of multi-channel audio system or surround sound audio system. For example, in a two-channel (left and right) stereo audio system, a first one of the loudspeakers is matched to a first audio channel (e.g. the left channel) and a second one of the loudspeakers is matched to a second audio channel (e.g. the right channel), and the method automatically matches at least one of the remaining loudspeakers to the first or second audio channel based on the calculated location of the at least one loudspeaker. In a surround sound system, the remaining loudspeakers can be automatically matched to one of the left front, right front, left rear, right rear, etc. audio channels appropriate to the calculated location of each remaining loudspeaker.
Where it is important to match the loudspeakers more accurately in 3D space, a third one of the loudspeakers is also matched to an audio channel before automatically matching the remaining loudspeakers to appropriate respective audio channels.
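By way of illustration only, once two reference loudspeakers are known, the remaining loudspeakers could be matched to channels by expressing each calculated location in a coordinate frame defined by the front-left and front-right references and reading off front/rear and left/right from the sign of the projections. The sketch below is the editor's assumption of one such scheme, written in Python; none of the names come from the patent.

```python
import numpy as np

def assign_channels(coords, front_left_id, front_right_id):
    """Assign each loudspeaker to a nominal channel from its 2D coordinates.

    coords: dict mapping loudspeaker id -> np.array([x, y]) (e.g. MDS output).
    front_left_id / front_right_id: ids of the two reference loudspeakers.
    Returns a dict mapping loudspeaker id -> channel label.
    """
    fl, fr = coords[front_left_id], coords[front_right_id]
    origin = (fl + fr) / 2.0                        # midpoint between the two fronts
    x_axis = (fr - fl) / np.linalg.norm(fr - fl)    # unit vector from left to right
    y_axis = np.array([-x_axis[1], x_axis[0]])      # perpendicular axis

    # Orient the y axis towards the front, i.e. away from the centroid of the
    # remaining (presumably rear or side) loudspeakers.
    others = [coords[k] for k in coords if k not in (front_left_id, front_right_id)]
    if others and np.dot(np.mean(others, axis=0) - origin, y_axis) > 0:
        y_axis = -y_axis

    channels = {front_left_id: "front_left", front_right_id: "front_right"}
    for spk_id, p in coords.items():
        if spk_id in channels:
            continue
        rel = p - origin
        lr = "left" if np.dot(rel, x_axis) < 0 else "right"
        fb = "front" if np.dot(rel, y_axis) > 0 else "rear"
        channels[spk_id] = f"{fb}_{lr}"
    return channels
```

The same idea extends naturally when a third reference loudspeaker is available for full 3D disambiguation.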
Alternatively, the method can automatically match each loudspeaker 1 a, 1 b, 1 c, 1 d, etc. to an audio channel based on the respective calculated location of each loudspeaker. As well as a calibration method, it should be clear from the foregoing that the present invention also includes a calibration system for a plurality of loudspeakers 1 a, 1 b, 1 c, 1 d, etc., with each loudspeaker for playing back an audio channel. The calibration system comprises a plurality of microphones 2a, 2b, 2c, 2d, etc. each corresponding to a respective loudspeaker 1 a, 1 b, 1 c, 1 d, etc. (with 1 a corresponding to 2a, 1 b corresponding to 2b, 1 c corresponding to 2c, etc.) and located at a respective predetermined location relative to the respective loudspeaker. A processor is configured to: receive through one or more of the microphones 2a, 2b, 2c, 2d, etc. a sound from a listener 3 at a listening position 4; play back a voice sound through one or more of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. sequentially; receive through one or more of the microphones 2a, 2b, 2c, 2d, etc. the voice sound played sequentially back through the one or more loudspeakers 1 a, 1 b, 1 c, 1 d, etc.; calculate locations of one or more of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. based on receipt through the one or more microphones 2a, 2b, 2c, 2d, etc. of the voice sound from the one or more loudspeakers; calculate a location of the listening position 4 based on receipt through the one or more microphones 2a, 2b, 2c, 2d, etc. of the voice sound from the listener 3; calculate a respective offset for one or more of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. based on the calculated locations of the listening position 4 and the one or more loudspeakers; and adjust the audio channel being played back on a respective loudspeaker based on the respective offset calculated for the respective loudspeaker.
The audio channels played back through the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. are typically the audio channels that form a stereo or surround sound system. For example, in Fig. 1 , loudspeakers 1 a and 1 c can play back a left stereo channel, whilst loudspeakers 1 b and 1 d play back an associated right stereo channel. Alternatively, loudspeaker 1 a plays back a left rear channel, loudspeaker 1 b plays back a right rear channel, loudspeaker 1 c plays back a left front channel, and loudspeaker 1 d plays back a right front channel of a surround sound system. In Fig. 2, loudspeakers 1 a, 1 c, and 1 e can play back a left stereo channel, whilst loudspeakers 1 b, 1 d, and 1 f play back an associated right stereo channel. Alternatively, loudspeaker 1 a plays back a left rear channel, loudspeaker 1 b plays back a right rear channel, loudspeakers 1 c and 1 e play back a left front channel, and loudspeakers 1 d and 1 f play back a right front channel of a surround sound system.
A typical operation of the method and system of the present invention described above involves, firstly, localizing and mapping the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. to corresponding audio channels. Short voice sounds or sequences are used as short playback sequences to localize the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. Each loudspeaker 1 a, 1 b, 1 c, 1 d, etc. plays the short voice sequence one after the other while the remaining loudspeakers capture the radiated sound utilizing their corresponding built-in microphones 2a, 2b, 2c, 2d, etc. In doing so, the inter-distances between each pair of loudspeakers are estimated by correlating the captured sound with the original signal. By combining these inter-distances, the location of each loudspeaker is then estimated.
Once the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. have been mapped, the listener/user 3 is localized. The voice of the user 3 is captured by the built-in microphones 2a, 2b, 2c, 2d, etc. and used to localize the listening position 4. The time difference of arrivals (TDOAs) is estimated by combining every possible pair of signals captured by the built-in microphones 2a, 2b, 2c, 2d, etc. By combining the estimated set of TDOAs and the location of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc., the location of the listening position 4 is estimated.
From the calculated locations of the loudspeakers 1 a, 1 b, 1 c, 1 d, etc. and the listening position 4, offsets in the form of gains and delays are calculated for each audio channel to virtually locate the listening position 4 to the optimal listening position, such as the centre of a surround sound audio track. Using the present invention, it is possible to automatically assign any number of arbitrarily placed satellite loudspeakers (such as 1 a, 1 b, 1 c, 1 d, etc.) to the correct audio channel knowing two reference points, e.g. the front right and front left loudspeakers.
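As a rough sketch of how such offsets might be computed, assuming the common convention that every channel is delayed to match the most distant loudspeaker and attenuated with a simple 1/r level model (the constants and names below are the editor's assumptions, not taken from the patent):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, a room-temperature approximation

def compute_offsets(speaker_coords, listener_coords):
    """Return per-loudspeaker (delay_seconds, gain_dB) offsets.

    speaker_coords: dict id -> np.array([x, y]) in metres.
    listener_coords: np.array([x, y]) of the calculated listening position.
    """
    dists = {k: float(np.linalg.norm(p - listener_coords))
             for k, p in speaker_coords.items()}
    d_max = max(dists.values())

    offsets = {}
    for k, d in dists.items():
        delay = (d_max - d) / SPEED_OF_SOUND    # closer loudspeakers wait longer
        gain_db = 20.0 * np.log10(d / d_max)    # closer loudspeakers are attenuated (1/r model)
        offsets[k] = (delay, gain_db)
    return offsets
```

Applying these per-channel delays and gains is one way of virtually moving the sweet spot onto the calculated listening position.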
Further details of how the loudspeakers and the listening position are localized will now be presented.
The loudspeaker localization stage is based on a multidimensional scaling (MDS) algorithm as described by S. Choisel et al. in "Loudspeaker Position Estimation", US Patent Application Publication 2010/0195444, published 5 August 2010.
It starts with a sequence of short clips played by each loudspeaker one after the other. For a better user experience during the calibration process, embodiments of the present invention use short voice samples as calibration clips. Each time a loudspeaker is playing a voice sample, propagated sound is captured by all the embedded microphones. The purpose of such a sequence, as best shown in Fig. 3 in the case of a four-speaker setup, is to estimate inter-speaker distances $d_{nm}$ for $n = 1, \dots, N$ and $m = 1, \dots, N$, to eventually compute loudspeaker coordinates by means of an MDS algorithm. $N$ denotes the total number of loudspeakers. It is noted that the reference numerals used in Fig. 3 are specific to Fig. 3 only and do not refer to the same or like reference numerals in the other figures.
Let $x_n$ denote the signal played by loudspeaker $n$ and $y_{mn}$ the signal captured by microphone $m$ when loudspeaker $n$ is playing. Their Fourier transforms are $X_n(\omega)$ and $Y_{mn}(\omega)$ respectively. We compute the generalized cross-correlation (GCC) functions $\Gamma_{mn}(\tau)$ (as defined in "The Generalized Correlation Method for Estimation of Time Delay", IEEE Trans. on Acoustics, Speech, and Sig. Proc., Vol. 24, pp. 320-327, August 1976 by C.H. Knapp and G.C. Carter):

$$\Gamma_{mn}(\tau) = \int_{-\infty}^{+\infty} W(\omega)\, Y_{mn}(\omega)\, X_n^{*}(\omega)\, e^{j\omega\tau}\, d\omega \qquad (1)$$
The superscript $*$ denotes the complex conjugate, $\tau$ is the time index, $\omega$ the angular frequency, and $W(\omega)$ a weighting function in the Fourier domain to be defined. In a free field environment, the maximum peak in $\Gamma_{mn}(\tau)$ should correspond to the direct time of flight (TOF) between loudspeakers $n$ and $m$. The corresponding estimated distance $d_{mn}$ is found as:

$$d_{mn} = \underset{\tau}{\arg\max}\left\{\Gamma_{mn}(\tau)\right\} \cdot c \qquad (2)$$

$c$ being the celerity of sound. However, the GCC function is often corrupted by noise and reverberation (including early reflections), making the estimation of the inter-speaker direct TOF difficult. Using the PHAT-β transform method, the benefit of which has been proven in noisy and reverberant environments (see "Performance of Phase Transform for Detecting Sound Sources with Microphone Arrays in Reverberant and Noisy Environments", Signal Processing, Vol. 87, pp. 1677-1691, July 2007 by K. Donohue, J. Hannemann, H.G. Dietz), the weighting function for GCC-PHAT-β can be given as follows:
$$W(\omega) = \frac{1}{\left|\, Y_{mn}(\omega)\, X_n^{*}(\omega)\,\right|^{\beta}} \qquad (3)$$
For $\beta = 0$, the peak resolution is too poor with narrowband signals such as voice to be able to extract the direct TOF. Using $\beta = 1$ amounts to whitening the signal, which results in narrower peaks in the GCC functions for narrowband signals. However, it gives noise and other distorted frequency components as much importance as the relevant signal in the correlation. Decreasing the value of $\beta$ gives more importance to the relevant signal itself.
Experiments have shown good results using $\beta$ between 0.75 and 0.95.
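A minimal sketch of how equations (1) to (3) could be implemented with an FFT-based cross-correlation follows; the helper names, the default β of 0.85 and the small regularization constant are the editor's assumptions:

```python
import numpy as np

def gcc_phat_beta(y, x, fs, beta=0.85):
    """GCC-PHAT-beta between a captured signal y and a played signal x.

    Returns (lags_in_seconds, correlation_values), with zero lag at the centre.
    """
    n = len(x) + len(y) - 1
    nfft = 1 << (n - 1).bit_length()                 # next power of two >= n
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(y, nfft)
    cross = Y * np.conj(X)                           # Y_mn(w) X_n*(w)
    weight = np.abs(cross) ** beta                   # |Y_mn(w) X_n*(w)|^beta
    weight[weight < 1e-12] = 1e-12                   # avoid division by zero
    gcc = np.fft.irfft(cross / weight, nfft)
    gcc = np.concatenate((gcc[-(nfft // 2):], gcc[:nfft // 2]))  # centre zero lag
    lags = (np.arange(nfft) - nfft // 2) / fs
    return lags, gcc

def estimate_distance(y, x, fs, c=343.0, beta=0.85):
    """Equation (2): distance from the lag of the strongest GCC peak."""
    lags, gcc = gcc_phat_beta(y, x, fs, beta)
    tof = lags[np.argmax(gcc)]
    return max(tof, 0.0) * c                         # a direct TOF cannot be negative
```

For example, with recordings sampled at 48 kHz, `estimate_distance(captured, played, 48000)` would return the inter-speaker distance in metres corresponding to the strongest correlation peak.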
Besides, maximum peaks are selected so that one distance and its reciprocal are almost equal. They must fulfil the following condition:

$$\left|\, d_{mn} - d_{nm} \,\right| < \varepsilon \qquad (4)$$
where $\varepsilon$ is a small constant. It results in the set of inter-speaker distances $\{d_{nm}\}_{n=1\dots N,\, m=1\dots N}$ stored in the distance matrix $D$. Note that $D$, whose entries on the main diagonal are all zeros, is symmetric. In the case of a four loudspeaker setup, $D$ is written as:

$$D = \begin{pmatrix} 0 & d_{12} & d_{13} & d_{14} \\ d_{21} & 0 & d_{23} & d_{24} \\ d_{31} & d_{32} & 0 & d_{34} \\ d_{41} & d_{42} & d_{43} & 0 \end{pmatrix} \qquad (5)$$
The distance matrix is then used by an MDS algorithm to determine loudspeaker coordinates. An MDS implementation can be found in "An Introduction to MDS", Sound Quality Research Unit, Aalborg University, Denmark, May 2003 by F. Wickelmaier. The algorithm outputs the estimated coordinates $\hat{x}_n$, $n = 1, \dots, N$, and an error value of the estimation, the STRESS value, written as:

$$\mathrm{STRESS} = \sqrt{\frac{\sum_{n,m}\left(d_{nm} - \hat{d}_{nm}\right)^{2}}{\sum_{n,m} d_{nm}^{2}}} \qquad (6)$$

where $\hat{d}_{nm} = \lVert \hat{x}_n - \hat{x}_m \rVert$ is the inter-speaker distance recomputed from the estimated coordinates. This STRESS value is used to inform the system of a loudspeaker localization success, or of a failure if the value is too high, meaning that one or more inter-speaker distances have been wrongly estimated.
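For illustration, a generic classical (Torgerson) MDS step of the kind referred to above could look as follows; this is a textbook sketch, not the specific implementation of the cited references:

```python
import numpy as np

def classical_mds(D, dims=2):
    """Classical MDS: symmetric distance matrix -> coordinates and STRESS value."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:dims]     # keep the largest eigenvalues
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    X = eigvecs[:, order] * scale                # one row of coordinates per loudspeaker

    D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    stress = np.sqrt(np.sum((D - D_hat) ** 2) / np.sum(D ** 2))  # equation (6)
    return X, stress
```

A low STRESS value indicates that the recovered coordinates reproduce the measured inter-speaker distances well; a high value flags a failed localization, as described above.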
Note that the estimated coordinates are only relative, meaning that front and rear as well as left and right cannot be disambiguated. In 2D space, two reference points are needed to get absolute coordinates. For instance, if both front left and right loudspeakers are assigned to their corresponding channel during the setup, then any number of loudspeakers can be localized in the listening room.
In some critical configurations, particularly when a loudspeaker-microphone pair is close to a corner of the listening room, early reflections may have a strong impact on the captured signal and distort the GCC-PHAT-β functions. The direct TOF estimation is then problematic. The example shown in Fig. 4 illustrates that the maximum peak of a function $\Gamma_{12}(\tau)$ does not correspond to the direct TOF between speakers 1 and 2 but to a strong reflection. In case the strong reflection is also present in the reciprocal function, that is $\Gamma_{21}(\tau)$, equation (2) cannot be applied to estimate the direct TOF. To circumvent this issue, more peaks are selected than just the maximum one. These peaks are constrained to be above a given threshold depending on the maximum peak value. Then, as many distance matrices $D_i$ are built as there are possible distance combinations. Each $D_i$ is used as an input for the MDS algorithm. Finally, the optimal loudspeaker coordinate set $\{\hat{x}_n\}^{OPT}$, which corresponds to the minimum STRESS value returned by the MDS algorithm, is selected.
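The multi-candidate fallback just described could be sketched as follows, reusing the hypothetical `gcc_phat_beta` and `classical_mds` helpers from the earlier sketches; the relative threshold and peak count are assumptions:

```python
import itertools
import numpy as np

def candidate_distances(lags, gcc, c=343.0, rel_threshold=0.6, max_peaks=3):
    """Candidate distances from the strongest positive-lag GCC samples.

    A real implementation would use proper peak picking; taking the strongest
    samples above rel_threshold * max_peak is enough for this sketch.
    """
    pos = lags > 0
    lags, gcc = lags[pos], gcc[pos]
    idx = np.argsort(gcc)[::-1][:max_peaks]
    best = gcc[idx[0]]
    return [lags[i] * c for i in idx if gcc[i] >= rel_threshold * best]

def best_coordinates(candidates):
    """candidates: dict (i, j) -> list of candidate d_ij, with i < j zero-based.

    Builds one distance matrix per combination of candidates and keeps the
    coordinate set with the minimum STRESS value.
    """
    pairs = sorted(candidates)
    n_spk = max(max(p) for p in pairs) + 1
    best = (None, np.inf)
    for combo in itertools.product(*(candidates[p] for p in pairs)):
        D = np.zeros((n_spk, n_spk))
        for (i, j), d in zip(pairs, combo):
            D[i, j] = D[j, i] = d
        coords, stress = classical_mds(D)        # from the MDS sketch above
        if stress < best[1]:
            best = (coords, stress)
    return best
```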
To localize the listening position, the listener is asked by the system in accordance with embodiments of the present invention to say a short sentence. The voice signal is then captured by each embedded microphone composing the surround sound system, as shown in Fig. 5 in case of a four loudspeaker setup. Thus, time difference of arrivals (TDOA) between each possible pair of microphones can be estimated. In a free field environment, a direct pairwise TDOA should correspond to the maximum peak in the cross-correlation function involving the two microphones. It would describe the direct path between the listener and each of the microphones. However, in a real environment, the cross-correlation function is usually corrupted by several factors. First, the narrowband voice signal does not allow clear peaks to be obtained. Secondly, other peaks will occur because of background noise as well as early reflections and reverberation. It is noted that the reference numerals used in Fig. 5 are specific to Fig. 5 only and do not refer to the same or like reference numerals in the other figures.
To be able to obtain relevant information from a narrowband signal such as voice, the pairwise GCC-PHAT-β functions are computed as described above. In the case of listener localization, the function Γ_{Y_m Y_n}(τ) is written as:

\[ \Gamma_{Y_m Y_n}(\tau) = \int \frac{Y_m(f)\, Y_n^*(f)}{\left| Y_m(f)\, Y_n^*(f) \right|^{\beta}} \, e^{j 2 \pi f \tau} \, df \qquad (7) \]

where Y_m and Y_n are the (frequency-domain) signals captured by microphones m and n respectively. Fig. 6 shows a typical GCC-PHAT-β function computed from a pair of microphones 1 and 2. In that case, the maximum peak does not correspond to the direct TDOA between the microphones but to a strong reflection occurring in the listening room. Hence the need to disambiguate the direct sound from room reflections.
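A minimal sketch of the GCC-PHAT-β computation for one microphone pair follows; the function name and the default β value are illustrative choices, the description only specifies the β-weighted PHAT form:

```python
import numpy as np

def gcc_phat_beta(y_m, y_n, fs, beta=0.8, max_tau=None):
    """GCC-PHAT-beta cross-correlation of two microphone signals.

    Returns (lags_in_seconds, correlation). beta=1 gives the classical PHAT
    weighting; beta<1 only partially whitens the spectrum, which behaves
    better on narrowband signals such as speech. max_tau (seconds) bounds
    the searched lag range, e.g. to the inter-microphone spacing over c.
    """
    n = len(y_m) + len(y_n)                 # zero-pad to avoid circular wrap
    Ym = np.fft.rfft(y_m, n=n)
    Yn = np.fft.rfft(y_n, n=n)
    cross = Ym * np.conj(Yn)
    weight = np.maximum(np.abs(cross) ** beta, 1e-12)   # avoid divide by zero
    corr = np.fft.irfft(cross / weight, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(round(fs * max_tau)), max_shift)
    corr = np.concatenate((corr[-max_shift:], corr[:max_shift + 1]))
    lags = np.arange(-max_shift, max_shift + 1) / fs
    return lags, corr
```

The lag at which the selected peak occurs, divided into a distance by the speed of sound, yields the pairwise estimate used above.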
First of all, the peak search in each pairwise cross-correlation function is bounded by the inter-microphone distance d_nm (i.e., by d_nm/c in time). In fact, a TDOA from a sound source to a pair of microphones can never exceed in magnitude the time of flight from one microphone to the other. The disambiguation algorithm is inspired by "Disambiguation of TDOA Estimation for Multiple Sources in Reverberant Environments", IEEE Trans. on Audio, Speech, and Lang. Proc., Vol. 16, No. 8, November 2008 by J. Scheuing and B. Yang. It is meant to estimate the combination of TDOAs resulting from the direct sound, that is, the listener's voice.
The main idea is that the zero cyclic sum condition always holds for TDOAs originating from the same sound source. For any microphone triple {n, m, k}, the zero cyclic sum condition is as follows:

\[ \tau_{nm} + \tau_{mk} + \tau_{kn} = 0 \qquad (8) \]

where τ_mn is the TDOA between microphones m and n. A combination of TDOAs fulfilling the above condition forms a consistent triple. Unfortunately, the zero cyclic sum condition usually holds for several TDOA combinations. They are linked to sound sources including the direct one, i.e. the listener's voice, image sources resulting from room reflections, or even ghost sources occurring because of noise interference within the listening room. Thus, the number N of microphones is taken advantage of to build consistent N-tuples and narrow down the number of possible consistent TDOA combinations. The process is shown in Fig. 7 for the case of five microphones. For the sake of convenience, the TDOA values τ_mn are replaced by letters (a, b, c, ...). The process is as follows:
(a) Start with an arbitrary set of three microphones 1, 2, 3. Build consistent triples by finding TDOA combinations {a,b,c} that fulfil the zero cyclic sum condition.
(b) Add microphone 4 to the reference triple of microphones. Find TDOA combinations {a,d,e}, {b,e,f}, and {c,d,f} that form consistent triples. This results in consistent quadruples {a,b,c,d,e,f}.
(c) Add microphone 5 to build consistent quintuples.
(d) Continue the process to build consistent N-tuples.
Ideally, the final set of N-tuples contains only one TDOA combination, corresponding to the listener position. However, in a critical environment, a few TDOA combinations may remain, particularly if the number N of microphones is small (a typical situation when N = 4). It is noted that the reference numerals used in Fig. 7 are specific to Fig. 7 only and do not refer to the same or like reference numerals in the other figures.
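The consistency check can be sketched as follows, assuming candidate TDOAs have already been extracted per microphone pair; only the triple-building step (a) is shown, and the extension to N-tuples follows the same pattern:

```python
import itertools

def _get(cand, a, b):
    """Candidate TDOAs for the ordered pair (a, b); tau_ab = -tau_ba."""
    return cand[(a, b)] if (a, b) in cand else [-t for t in cand[(b, a)]]

def consistent_triples(cand, n_mics, tol=1e-4):
    """Enumerate TDOA combinations satisfying the zero cyclic sum condition
    tau_nm + tau_mk + tau_kn ~= 0 for every microphone triple.

    cand[(m, n)] (m < n) holds the candidate TDOAs, in seconds, extracted
    from the GCC-PHAT-beta peaks of that pair. `tol` is an illustrative
    tolerance absorbing sampling and estimation errors.
    """
    triples = []
    for n, m, k in itertools.combinations(range(n_mics), 3):
        for t_nm, t_mk, t_kn in itertools.product(
                _get(cand, n, m), _get(cand, m, k), _get(cand, k, n)):
            if abs(t_nm + t_mk + t_kn) < tol:
                triples.append({(n, m): t_nm, (m, k): t_mk, (k, n): t_kn})
    return triples
```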
Thus, we make use of another criterion to estimate the optimal TDOA combination linked to the listener position. In fact, we can usefully combine GCC with the steered response power (SRP) method, similar to that suggested in "A High-Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments Using Microphone Arrays", Ph.D. Thesis, Brown Univ., Providence, RI, May 2000 by J.H. DiBiase. For each N-tuple linked to a potential sound source S, we compute the SRP value as follows:
\[ P(S) = \sum_{n=1}^{N} \sum_{m=1}^{N} \Gamma_{mn}\!\left( \tau_{mn}^{S} \right) \qquad (9) \]
Selecting the maximum of P(S) yields the optimal sound source S and its related TDOA combination.
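A sketch of the SRP-based selection, assuming each surviving N-tuple is represented as a mapping from microphone pairs to TDOAs and that the GCC-PHAT-β functions computed earlier are available as (lags, correlation) arrays per pair:

```python
import numpy as np

def srp_value(tdoa_tuple, gcc):
    """Steered response power of one candidate TDOA combination (eq. 9).

    tdoa_tuple[(m, n)] is the TDOA of pair (m, n) in seconds and
    gcc[(m, n)] = (lags, corr) the GCC-PHAT-beta function of that pair;
    the correlation is simply sampled at the lag closest to each TDOA.
    """
    p = 0.0
    for (m, n), tau in tdoa_tuple.items():
        lags, corr = gcc[(m, n)]
        p += corr[np.argmin(np.abs(lags - tau))]
    return p

def pick_listener_tuple(candidate_tuples, gcc):
    """Select the consistent N-tuple with the highest SRP value."""
    return max(candidate_tuples, key=lambda t: srp_value(t, gcc))
```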
As a final step, the optimal TDOA combination is converted into listener coordinates. The listener position is the intersection point of the hyperbolas defined by each TDOA pair. The simple spherical interpolation (SI) method in "Closed-Form Least-Squares Source Location Estimation from Range-Difference Measurements", IEEE Trans. Acoust., Speech, Sig. Proc., Vol. 35, pp. 1661-1669, December 1987 by J.O. Smith and J.S. Abel provides, for instance, a solution to this problem.
The problem with the SI method is that it can introduce significant bias in the listener localization when small errors are present in the estimated TDOAs and/or loudspeaker coordinates. To overcome this issue, embodiments of the present invention implement another solution based on the MDS algorithm. The starting point is the distance matrix D returned by the MDS algorithm during the loudspeaker localization stage. If the absolute distances d_sn, n = 1…N, between the listener s and each speaker n had been estimated, D could simply be augmented with one column and the symmetric row containing all d_sn to obtain joint loudspeaker and listener coordinates from the MDS algorithm. However, only relative distances from the listener to the loudspeakers can be computed from the estimated TDOA set. Let k denote the loudspeaker closest to the listener. The relative distances are written as \tilde{d}_{sn} = d_{sn} - d_{sk} and are simply computed as \tilde{d}_{sn} = \tau_{kn} \cdot c, where c is the speed of sound. Note that the relative distance between the listener and the closest loudspeaker is \tilde{d}_{sk} = 0.
We use an iterative process to estimate the absolute listener-to-loudspeaker distances and therefore the listener position. In the first iteration, D is augmented with the relative listener-to-speaker distances. In the case of a four-loudspeaker setup:

\[
D_{aug} =
\begin{pmatrix}
0 & d_{12} & d_{13} & d_{14} & \tilde{d}_{s1} \\
d_{21} & 0 & d_{23} & d_{24} & \tilde{d}_{s2} \\
d_{31} & d_{32} & 0 & d_{34} & \tilde{d}_{s3} \\
d_{41} & d_{42} & d_{43} & 0 & \tilde{d}_{s4} \\
\tilde{d}_{s1} & \tilde{d}_{s2} & \tilde{d}_{s3} & \tilde{d}_{s4} & 0
\end{pmatrix}
\]
The augmented matrix D_aug is used as the input of the MDS algorithm. The STRESS value returned is indicative of the unknown absolute distance between the listener and the closest loudspeaker. Thus, at each iteration i, D_aug is updated based on the current STRESS value. The process is iterated until the STRESS value converges. Experiments have shown that it usually converges after a few iterations. At the last iteration, the algorithm returns the refined loudspeaker coordinates x_n as well as the listener coordinates x_s and the final STRESS value. Again, the latter is used to inform the system of an accurate localization success or, otherwise, of a wrong localization if the final STRESS value is still too high.
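Since the exact update rule for D_aug is not detailed here, the following sketch replaces the iteration by a simple scan over candidate offsets d_sk and keeps the augmented solution with the lowest STRESS; it is an assumption-based stand-in, not the patented procedure itself (localize_speakers is the earlier sketch):

```python
import numpy as np

def localize_listener(D, rel, search=np.arange(0.1, 5.0, 0.05)):
    """Joint loudspeaker/listener localization from the speaker distance
    matrix D and the relative listener-to-speaker distances `rel`
    (rel[k] == 0 for the closest speaker k).

    The offset grid `search` (metres) is an illustrative assumption.
    """
    N = D.shape[0]
    best = (np.inf, None)
    for d_sk in search:
        dsn = rel + d_sk                       # candidate absolute distances
        D_aug = np.zeros((N + 1, N + 1))
        D_aug[:N, :N] = D
        D_aug[:N, N] = D_aug[N, :N] = dsn
        coords, stress, _ = localize_speakers(D_aug)
        if stress < best[0]:
            best = (stress, coords)
    stress, coords = best
    return coords[:N], coords[N], stress       # speakers, listener, STRESS
```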
From the loudspeaker and the listener coordinates, the absolute listener-to-loudspeaker distances d_sn can be computed. These lead to the gains and delays applied to each channel to virtually relocate the listening position to the centre of the surrounding sound events. The gains G_n in dB are expressed as:

\[ G_n = 20 \log_{10}\!\left( \frac{d_{sn}}{\max_{m}(d_{sm})} \right), \quad n = 1 \ldots N \qquad (11) \]

The delays τ_n in seconds are expressed as:

\[ \tau_n = \frac{d_{sn} - \min_{m}(d_{sm})}{c}, \quad n = 1 \ldots N \qquad (12) \]
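A sketch of the resulting per-channel offsets, following the reconstructed equations (11) and (12); the reference distances (farthest speaker for the gains, closest speaker for the delays) reflect that reconstruction and should be treated as assumptions:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, room-temperature assumption

def channel_offsets(dsn):
    """Per-channel gain (dB) and delay (s) from the absolute
    listener-to-loudspeaker distances `dsn` (metres)."""
    dsn = np.asarray(dsn, dtype=float)
    gains_db = 20.0 * np.log10(dsn / dsn.max())       # eq. (11)
    delays_s = (dsn - dsn.min()) / SPEED_OF_SOUND     # eq. (12)
    return gains_db, delays_s

# Example for a four-speaker setup:
gains, delays = channel_offsets([2.1, 2.8, 3.5, 3.0])
```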
Moreover, embodiments of the method of the present invention allow the automatic allocation of any number of arbitrarily placed loudspeakers to their appropriate audio channels in a sound system. As explained above, in 2D space two reference points are needed in order to allocate each loudspeaker to its actual channel. For instance, if the front left and front right loudspeakers are assigned to their corresponding channels during setup, then any number of speakers can be added to the system in any order. Each loudspeaker will be correctly localized and assigned to its appropriate channel. This is particularly advantageous for scalable systems, including those having wireless speakers.
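As an illustration of this channel-allocation step, the following sketch matches each localized loudspeaker to the nearest nominal channel direction as seen from the listener; the nominal layout and the greedy matching rule are illustrative assumptions, since the description only requires that allocation be based on the calculated locations:

```python
import numpy as np

# Nominal channel azimuths in degrees, counter-clockwise from the listener's
# front direction (an illustrative 5.0 layout, not specified in the patent).
NOMINAL_ANGLES = {'FL': 30.0, 'FR': -30.0, 'C': 0.0, 'SL': 110.0, 'SR': -110.0}

def assign_channels(speaker_xy, listener_xy, front_dir=(0.0, 1.0)):
    """Greedy channel assignment by angular distance between each localized
    loudspeaker and the nominal channel directions, seen from the listener.
    Assumes there are no more speakers than nominal channels.
    """
    front = np.arctan2(front_dir[1], front_dir[0])
    assignment, free = {}, dict(NOMINAL_ANGLES)
    for idx, xy in enumerate(speaker_xy):
        v = np.asarray(xy, float) - np.asarray(listener_xy, float)
        az = np.degrees(np.arctan2(v[1], v[0]) - front)
        az = (az + 180.0) % 360.0 - 180.0      # wrap to [-180, 180)
        best = min(free, key=lambda ch: abs((free[ch] - az + 180) % 360 - 180))
        assignment[idx] = best
        del free[best]
    return assignment
```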
Embodiments of the present invention surprisingly and advantageously estimate proper parameters to perform spatial calibration of a multi-channel sound system. The spatial calibration parameters are based on a joint loudspeaker and listener localization algorithm that is designed to provide very good robustness and accuracy in real environments such as typical living rooms. Besides, embodiments of the method of the present invention use a microphone integrated in each loudspeaker which makes it particularly suitable for scalable systems.
The present invention advantageously overcomes issues such as background noise, early reflections against walls, and reverberation, which can hamper or prevent successful calibration. The aforementioned algorithms are used to disambiguate direct sound from reflected sound. Furthermore, if a loudspeaker is located next to a wall, the energy it radiates into the room is approximately 3 dB higher than if the loudspeaker were located far from the wall, because of strong early reflections. Embodiments of the present invention in which loudspeaker units have built-in microphones help to estimate the energy radiated at the position of a loudspeaker, and then reduce the gain of the audio channel played back by that loudspeaker accordingly. Thus, the present invention provides a robust calibration method and system for calibrating loudspeakers in a multi-channel audio system.
By not requiring a separate external microphone, the present invention overcomes the problems discussed above associated with requiring such an external microphone. In particular, the problems discussed above associated with the user having to move an external microphone around or having to move a soundbar with an integrated microphone around are avoided.
It is appreciated that the aforesaid embodiments are only exemplary embodiments adopted to describe the principles of the present invention, and the present invention is not merely limited thereto. Various variants and modifications can be made by those of ordinary skill in the art without departing from the spirit and essence of the present invention, and these variants and modifications are also covered within the scope of the present invention. Accordingly, although the invention has been described with reference to specific examples, it is appreciated by those skilled in the art that the invention can be embodied in many other forms. It is also appreciated by those skilled in the art that the features of the various examples described can be combined in other combinations.

Claims

1. A method of calibrating a plurality of loudspeakers each corresponding to a respective microphone located at a respective predetermined location relative to the respective loudspeaker, each loudspeaker for playing back an audio channel, the method comprising: receiving through one or more of the microphones a voice sound from a listener at a listening position;
playing back a voice sound through one or more of the loudspeakers sequentially; receiving through one or more of the microphones the voice sound played sequentially back through the one or more loudspeakers;
calculating locations of one or more of the loudspeakers based on receipt through the one or more microphones of the voice sound from the one or more loudspeakers;
calculating a location of the listening position based on receipt through the one or more microphones of the voice sound from the listener;
calculating a respective offset for one or more of the loudspeakers based on the calculated locations of the listening position and the one or more loudspeakers; and
adjusting the audio channel being played back on a respective loudspeaker based on the respective offset calculated for the respective loudspeaker.
2. A method according to claim 1 wherein the voice sound played back through one or more of the loudspeakers is a prerecorded voice sound.
3. A method according to claim 1 wherein the voice sound played back through one or more of the loudspeakers is a recording of the same voice sound received through the one or more microphones from the listener.
4. A method according to any one of claims 1 to 3 wherein the voice sound is a short word sequence.
5. A method according to any one of claims 1 to 4 wherein at least one loudspeaker and the respective microphone form part of a loudspeaker unit.
6. A method according to any one of claims 1 to 5 wherein at least two of the loudspeakers form part of a soundbar.
7. A method according to claim 6 wherein the two respective microphones corresponding to the two loudspeakers are located at a minimum distance of 40 cm from each other.
8. A method according to any of claims 6 to 7 wherein the two respective microphones corresponding to the two loudspeakers form part of the soundbar.
9. A method according to any one of claims 1 to 8 wherein the plurality of loudspeakers comprises three or more loudspeakers.
10. A method according to any one of claims 1 to 8 wherein the plurality of loudspeakers comprises four or more loudspeakers.
11. A method according to claim 10 wherein no more than one line of sight between two of the loudspeakers or between the listener and one of the loudspeakers is obstructed.
12. A method according to any one of claims 1 to 11 wherein the plurality of loudspeakers comprises three or more loudspeakers with a first one of the loudspeakers matched to a first audio channel and a second one of the loudspeakers matched to a second audio channel, the method comprising automatically matching at least one of the remaining loudspeakers to an appropriate audio channel based on the calculated location of the at least one loudspeaker.
13. A method according to any one of claims 1 to 11 comprising automatically matching each loudspeaker to an audio channel based on the respective calculated location of each loudspeaker.
14. A method according to any one of claims 1 to 13 wherein the microphones are synchronized to each other.
15. A calibration system for a plurality of loudspeakers, each loudspeaker for playing back an audio channel, the system comprising:
a plurality of microphones each corresponding to a respective loudspeaker and located at a respective predetermined location relative to the respective loudspeaker; and a processor configured to:
receive through one or more of the microphones a sound from a listener at a listening position;
play back a voice sound through one or more of the loudspeakers sequentially; receive through one or more of the microphones the voice sound played sequentially back through the one or more loudspeakers;
calculate locations of one or more of the loudspeakers based on receipt through the one or more microphones of the voice sound from the one or more loudspeakers; calculate a location of the listening position based on receipt through the one or more microphones of the voice sound from the listener;
calculate a respective offset for one or more of the loudspeakers based on the calculated locations of the listening position and the one or more loudspeakers; and adjust the audio channel being played back on a respective loudspeaker based on the respective offset calculated for the respective loudspeaker.

Non-Patent Citations (6)

C.H. KNAPP; G.C. CARTER: "The Generalized Correlation Method for Estimation of Time Delay", IEEE Trans. on Acoustics, Speech, and Sig. Proc., vol. 24, August 1976, pages 320-327
F. WICKELMAIER: "An Introduction to MDS", Sound Quality Research Unit, Aalborg University, Denmark, May 2003
J. SCHEUING; B. YANG: "Disambiguation of TDOA Estimation for Multiple Sources in Reverberant Environments", IEEE Trans. on Audio, Speech, and Lang. Proc., vol. 16, no. 8, November 2008
J.H. DIBIASE: "A High-Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments Using Microphone Arrays", Ph.D. Thesis, Brown University, May 2000
J.O. SMITH; J.S. ABEL: "Closed-Form Least-Squares Source Location Estimation from Range-Difference Measurements", IEEE Trans. Acoust., Speech, Sig. Proc., vol. 35, December 1987, pages 1661-1669
K. DONOHUE; J. HANNEMANN; H.G. DIETZ: "Performance of Phase Transform for Detecting Sound Sources with Microphone Arrays in Reverberant and Noisy Environments", Signal Processing, vol. 87, July 2007, pages 1677-1691

Legal Events

Code Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17724365; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the EP bulletin as the address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12/02/2020))
122 Ep: PCT application non-entry into the European phase (Ref document number: 17724365; Country of ref document: EP; Kind code of ref document: A1)