EP3900402A1

EP3900402A1 - Calibration of a distributed sound reproduction system

Info

Publication number: EP3900402A1
Application number: EP19839368.8A
Authority: EP
Inventors: Grégory PALLONE; Marc Emerit; Stéphane Louis Dit Picard; Thomas JOUBAUD
Original assignee: Orange SA
Current assignee: Orange SA
Priority date: 2018-12-21
Filing date: 2019-12-09
Publication date: 2021-10-27
Also published as: WO2020128214A1; FR3090918A1; US20220060840A1; US11689874B2

Abstract

The invention concerns a method for calibrating a distributed audio reproduction system comprising a set of N heterogeneous loudspeakers controlled by a server. The method comprises the following steps: a) placing (E415) a microphone in front of a first loudspeaker of the set; b) capturing (E420), by the microphone, a calibration signal sent to that loudspeaker at a first time and reproduced by it; c) capturing (E430), by the microphone, the calibration signal sent with a known time delay to the N-1 other loudspeakers of the set and reproduced by these N-1 loudspeakers; d) capturing (E440), by the microphone, the calibration signal sent to the first loudspeaker at a second time and reproduced again by it; e) repeating steps a) to d) for the N loudspeakers of the set; f) determining (E470) a plurality of heterogeneity factors to be corrected for the set of N loudspeakers by analysing the captured data; g) correcting (E480) the determined heterogeneity factors. The invention also concerns a sound reproduction system implementing the method.

Description


Title: Calibration of a distributed sound reproduction system

The present invention relates to the field of audio reproduction in a distributed and heterogeneous audio reproduction system.

More particularly, the present invention relates to a method and a system for calibrating an audio reproduction system comprising a plurality of heterogeneous loudspeakers or sound reproduction elements.

Heterogeneous loudspeakers are loudspeakers that come from different suppliers and/or are of different types, for example wired or wireless.

In such a heterogeneous distributed context, where wired and wireless loudspeakers of different brands and models are networked and controlled by a server, it is not easy to obtain a coherent listening system, that is, one that makes it possible to listen to a complete sound stage or to broadcast the same audio signal simultaneously in several rooms of the same house.

Indeed, several heterogeneity factors can arise. Each wireless loudspeaker has its own clock, which leads to a lack of coordination between the loudspeakers. This lack of coordination covers both a lack of synchronization between the loudspeaker clocks, that is to say the loudspeakers do not start to "play" at the same time, and a lack of tuning, that is to say the loudspeakers do not "play" at the same rate.

A lack of synchronization can lead to an audible delay and/or a shift of the spatial image between the devices. A lack of tuning can cause a comb-filter effect, an unstable spatial image and/or audible clicks caused by buffer starvation (running out of samples) or overfeeding (accumulating too many samples).

Another heterogeneity factor comes from the fact that different loudspeakers can have different sound renderings. First, from a global point of view, since some loudspeakers are not driven by the same sound card and others are wireless, they are unlikely to play at the same volume. In addition, each loudspeaker has its own frequency response, so each frequency component of the signal to be played is not rendered in the same way.

Yet another heterogeneity factor may lie in the spatial configuration of the loudspeakers. In multichannel rendering, the loudspeakers are generally not ideally positioned, that is to say their positions relative to one another do not follow the standardized positions that give optimal listening at a defined listener position. For example, Recommendation ITU-R BS.775-3, "Multichannel stereophonic sound system with and without accompanying picture", Radiocommunication Sector of ITU, Broadcasting service (sound), published in 2012, describes such loudspeaker positioning for multichannel stereophonic systems.

Various existing systems or protocols each correct only some of these heterogeneity factors, independently of one another.

Conventional multichannel listening systems drive the different loudspeakers from the same sound card, so they do not suffer from synchronization issues. Synchronization problems appear as soon as several sound cards or wireless loudspeakers are used; the problem then stems from differences in latency between the loudspeakers.

Wireless loudspeaker manufacturers can solve this problem by running a network synchronization protocol between products of their own brand, but this is no longer possible in heterogeneous distributed audio, where the loudspeakers come from different manufacturers.

Another solution is to determine the latency between the loudspeakers from an electro-acoustic measurement. If the same signal is sent to all loudspeakers of a distributed audio system at the same time, each loudspeaker plays it at a different time. Measuring the differences between these times gives the relative latencies between the loudspeakers, and synchronizing the loudspeakers then amounts to delaying the ones that are furthest ahead by the estimated amounts. This technique has already been applied to synchronize Bluetooth loudspeakers of different brands and models. However, it does not take into account the clock drift between the loudspeakers, so the loudspeakers may appear to play at the same time at the start of playback but drift out of sync over time.
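To give an order of magnitude of why latency compensation alone is not enough, the short sketch below computes the offset that accumulates between two loudspeakers that start aligned but whose clocks differ by a constant drift. The 10 ppm figure is the order of magnitude quoted later in this description; the function name is hypothetical.

```python
def accumulated_offset_ms(drift_ppm: float, elapsed_s: float) -> float:
    """Offset (ms) built up after elapsed_s seconds between two clocks whose
    rates differ by drift_ppm parts per million (constant drift assumed)."""
    return drift_ppm * 1e-6 * elapsed_s * 1000.0


# Two loudspeakers aligned at the start of playback, 10 ppm apart:
for minutes in (1, 10, 60):
    offset = accumulated_offset_ms(10.0, minutes * 60.0)
    print(f"after {minutes} min: {offset:.1f} ms")
```

Under this assumption the two loudspeakers are already 6 ms apart after 10 minutes and 36 ms apart after an hour, even though they started in sync.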

Other techniques make it possible to reduce defects such as differences in sound level or in loudspeaker position, but each requires its own measurement dedicated to the defect to be corrected.

The present invention improves the situation.

To this end, it proposes a method for calibrating a distributed audio reproduction system comprising a set of N heterogeneous loudspeakers controlled by a server. The method comprises the following steps: a) placing a microphone in front of a first loudspeaker of the set; b) capturing, by the microphone, a calibration signal sent to the first loudspeaker at a first instant and reproduced by it; c) capturing, by the microphone, the calibration signal sent with a known time offset to the N-1 other loudspeakers of the set and reproduced by these N-1 loudspeakers; d) capturing, by the microphone, the calibration signal sent to the first loudspeaker at a second instant and reproduced again by it; e) iterating steps a) to d) for the N loudspeakers of the set; f) determining a plurality of heterogeneity factors to be corrected for all of the N loudspeakers by analysing the captured data; g) correcting the determined heterogeneity factors.

The calibration method thus described makes it possible to optimize the capture for heterogeneous loudspeakers that do not necessarily come from the same supplier or that are of different types, in order to obtain corrections adapted to the different heterogeneity factors of the loudspeakers of the reproduction system. A single calibration procedure corrects several heterogeneity factors, which both improves the quality of the distributed system and optimizes the resources needed to calibrate it. Steps b), c) and d) of this method can be carried out in a different order without departing from the scope of the invention.

The various particular embodiments mentioned below can be added to the steps of the calibration method defined above, independently or in combination with one another.

Several heterogeneity factors are possible, such as the synchronization and tuning of the loudspeakers, which together make up their clock coordination, the loudness of the loudspeakers, their sound rendering and/or their mapping.

These heterogeneity factors are to be corrected at least in part, and all of them can be corrected by the same calibration procedure.

In a particular embodiment, the microphone is included in a calibration device previously tuned to the server.

Thus, a terminal equipped with a microphone can, for example, be used to carry out the capture steps. Since this calibration device runs at the same rate as the server, the captured data make it possible to correct the heterogeneity factors of the different loudspeakers consistently with the server that controls them.

In one embodiment, the analysis of the captured data includes multiple peak detection in a signal obtained by convolving the captured data with an inverse calibration signal, a peak being retained only if it exceeds a detection threshold and lies at least a minimum duration away from another detected peak, so as to obtain N(N+1) timestamp data.

Convolving the captured data with the inverse calibration signal gives the impulse responses of the different loudspeakers during the capture procedure. Peak detection therefore yields the timestamps of these impulse responses.

According to an advantageous embodiment, the captured data are oversampled before peak detection. Oversampling makes peak detection more precise, which refines the timestamp data derived from it and increases the precision of the estimated drifts.
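A minimal sketch of such oversampling, assuming band-limited interpolation by zero-padding the spectrum with NumPy. The description does not specify the oversampling method or factor, so `oversample` and the factor 8 below are illustrative. The analysis window is treated as periodic, which is acceptable when the impulse response lies well inside it.

```python
import numpy as np


def oversample(x: np.ndarray, factor: int) -> np.ndarray:
    """Band-limited oversampling of a real signal by zero-padding its spectrum."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    if n % 2 == 0 and factor > 1:
        # The original Nyquist bin becomes an ordinary bin in the longer
        # spectrum; halve it so its energy is not counted twice.
        spectrum[-1] *= 0.5
    m = n * factor
    padded = np.zeros(m // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum
    # irfft divides by m instead of n, hence the rescaling by `factor`.
    return np.fft.irfft(padded, n=m) * factor


# Band-limited pulse whose true peak lies between two samples (at 20.3).
t = np.arange(64)
pulse = np.sinc(t - 20.3)
fine = oversample(pulse, 8)
print(np.argmax(pulse), np.argmax(fine) / 8)  # whole-sample vs 1/8-sample estimate
```

The original samples are preserved exactly (every 8th output sample), and the peak estimate moves from a whole-sample grid to a 1/8-sample grid. Parabolic interpolation around the maximum is a cheaper alternative with a similar aim.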

In a particular embodiment, the clock drift of a loudspeaker of the set relative to the clock of the processing server is estimated from the timestamp data obtained for the calibration signals sent at the first and second instants and from the time elapsed between these two instants.

Calculating this clock drift determines the heterogeneity factor relating to the tuning of the loudspeakers, which can then be corrected to homogenize the reproduction system.

To complete this drift estimate, in one embodiment, the relative latency between the loudspeakers of the set, taken in pairs, is estimated from the timestamp data obtained and the estimated drifts.

Calculating these latencies determines the heterogeneity factor relating to the synchronization of the different loudspeakers, which can then be corrected to homogenize the reproduction system.

From this latency estimate, according to one embodiment, the distance between the loudspeakers of the set, taken in pairs, can be estimated from the timestamp data obtained, the estimated relative latencies and the estimated drifts.

Estimating these distances determines the heterogeneity factor relating to the mapping of the loudspeakers in the reproduction system, which can then be corrected to homogenize it.
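The description does not spell out the latency and distance equations at this point. One way to separate the two quantities, sketched below as an assumption rather than as the patent's own formulas, exploits the symmetry of the capture procedure, in the spirit of two-way time transfer (as in NTP or PTP). Let m[i][j] be the arrival time, with the microphone in front of HPi, of the signal played by HPj, minus its known send time, with drift already compensated. Then m[i][j] - m[i][i] = (L_j - L_i) + d_ij/c, because the microphone sits (almost) at HPi and the unknown start time of each recording cancels; swapping i and j flips the sign of the latency term but not of the distance term. The speed of sound value and all names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def pairwise_latency_and_distance(m, i, j, c=SPEED_OF_SOUND):
    """Separate relative latency and distance for loudspeakers HPi and HPj.

    m[p][q]: arrival time (s) of HPq's signal with the microphone in front
    of HPp, minus its known send time, drift already compensated.
    """
    a = m[i][j] - m[i][i]  # (L_j - L_i) + d_ij / c
    b = m[j][i] - m[j][j]  # (L_i - L_j) + d_ij / c
    relative_latency = (a - b) / 2.0  # L_j - L_i, in seconds
    distance = c * (a + b) / 2.0      # d_ij, in metres
    return relative_latency, distance


# Synthetic check: latencies 10 ms and 150 ms, loudspeakers 2 m apart, and an
# arbitrary recording start offset for each microphone position.
lat, d, start = [0.010, 0.150], 2.0, [0.3, 1.7]
m = [[start[p] + lat[q] + (d / SPEED_OF_SOUND if p != q else 0.0)
      for q in range(2)] for p in range(2)]
print(pairwise_latency_and_distance(m, 0, 1))  # approximately (0.14, 2.0)
```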

According to one embodiment of the invention, a heterogeneity factor relating to the tuning of the loudspeakers of the set is corrected by resampling the audio signals intended for the corresponding loudspeakers at a frequency that depends on the estimated clock drifts of the loudspeakers relative to the server clock.

This type of correction thus makes it possible to correct the clock drifts of the loudspeakers without modifying the clocks of their respective client modules.
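A minimal sketch of drift compensation by resampling, assuming the loudspeaker consumes samples drift_ppm parts per million too fast, so the stream sent to it must be stretched by the factor 1 + drift_ppm·10^-6. Linear interpolation with `numpy.interp` keeps the example short; a production system would rather use a band-limited (polyphase or windowed-sinc) resampler, and the sign convention must match the one used for the drift estimate. `compensate_drift` is a hypothetical name.

```python
import numpy as np


def compensate_drift(x: np.ndarray, drift_ppm: float) -> np.ndarray:
    """Stretch x so that a loudspeaker running drift_ppm too fast plays it
    in the intended time (linear interpolation, for illustration only)."""
    ratio = 1.0 + drift_ppm * 1e-6          # output samples per input sample
    n_out = int(round(len(x) * ratio))
    # Position of each output sample expressed on the input sample grid.
    positions = np.arange(n_out) / ratio
    return np.interp(positions, np.arange(len(x)), x)


one_second = np.zeros(1_000_000)
print(len(compensate_drift(one_second, 10.0)))  # 10 extra samples per million
```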

According to one embodiment, a heterogeneity factor relating to the synchronization of the loudspeakers of the set is corrected by adding, for the transmission of the audio signals intended for the corresponding loudspeakers, a buffer whose duration depends on the estimated latencies of the loudspeakers. Likewise, this type of correction makes it possible to correct the relative latencies between the loudspeakers without modifying the clocks of their respective client modules.
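As a sketch, assuming the relative latencies are expressed in seconds against any common reference (only their differences matter), each client module could buffer its stream so that every loudspeaker plays in step with the one that has the largest latency. The 48 kHz rate, the latency values and the name `buffer_delays` are hypothetical.

```python
def buffer_delays(latencies_s, fs=48_000):
    """Extra buffering (in samples) per loudspeaker so that all of them play
    in step with the loudspeaker that has the largest latency."""
    slowest = max(latencies_s)
    return [round((slowest - lat) * fs) for lat in latencies_s]


# Hypothetical latencies of three loudspeakers, in seconds:
print(buffer_delays([0.010, 0.150, 0.040]))  # [6720, 0, 5280]
```

Delaying the faster loudspeakers, rather than trying to advance the slower one, keeps the correction causal.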

According to a particular embodiment, a heterogeneity factor relating to sound rendering and/or a heterogeneity factor relating to the sound volume of the loudspeakers of the set is corrected by equalizing the audio signals intended for the corresponding loudspeakers, with gains that depend on the impulse responses obtained for the loudspeakers.
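A minimal sketch of the broadband part of this correction, assuming each loudspeaker's level is taken as the energy of its measured impulse response and every loudspeaker is attenuated to match the quietest one, so that none is boosted towards clipping. A frequency-dependent version would apply the same idea band by band to the magnitude spectrum of each impulse response. `level_matching_gains_db` is a hypothetical name.

```python
import numpy as np


def level_matching_gains_db(impulse_responses):
    """Broadband gain (dB) per loudspeaker so that all match the quietest one."""
    levels_db = [10.0 * np.log10(np.sum(np.square(np.asarray(h, dtype=float))))
                 for h in impulse_responses]
    target = min(levels_db)
    return [target - level for level in levels_db]


# Second loudspeaker twice as loud (in amplitude) as the first:
print(level_matching_gains_db([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]))
```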

Thus, the correction applied to the audio signals makes it easy to adapt the sound rendering and/or the sound volume. Several heterogeneity factors can therefore be corrected through the same calibration procedure.

In a particular embodiment, a heterogeneity factor relating to the mapping of the loudspeakers of the set is corrected by applying a spatial correction to the corresponding loudspeakers, using at least one delay that depends on the estimated distances between the loudspeakers and a given listener position.
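As an illustration, assuming the distance from each loudspeaker to the listener position has already been derived from the estimated inter-loudspeaker distances, the closer loudspeakers can be delayed so that all wavefronts reach the listener together. The speed of sound (343 m/s) and the function name are assumptions, and level compensation for distance is left out.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def alignment_delays_ms(distances_to_listener_m, c=SPEED_OF_SOUND):
    """Delay (ms) per loudspeaker so that sound from every loudspeaker
    arrives at the listener at the same time as from the farthest one."""
    farthest = max(distances_to_listener_m)
    return [1000.0 * (farthest - d) / c for d in distances_to_listener_m]


print(alignment_delays_ms([2.0, 3.0, 2.5]))
```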

Another heterogeneity factor is thus corrected on the basis of these same captured data and estimated distances between the loudspeakers.

The present invention also relates to a system for calibrating a distributed audio reproduction system comprising a set of N heterogeneous loudspeakers controlled by a server. The calibration system comprises: a microphone which, placed in front of a first loudspeaker of the set, is able to capture a calibration signal sent to the first loudspeaker at a first instant and reproduced by it, to capture the calibration signal sent with a known time offset to the N-1 other loudspeakers of the set and reproduced by these N-1 loudspeakers, and to capture the calibration signal sent to the first loudspeaker at a second instant and reproduced again by it, these capture operations being iterated for the N loudspeakers of the set; and a processing server comprising a module for collecting the captured data, an analysis module able to analyse the captured and collected data to determine a plurality of heterogeneity factors to be corrected, and a correction module able to calculate the corrections of the determined heterogeneity factors and to transmit them to the client modules of the corresponding loudspeakers so that the calculated corrections are applied. In a particular embodiment, the microphone is integrated into a terminal.

The calibration system has the same advantages as the method described above, which it implements.

The invention also relates to a computer program comprising code instructions for implementing the steps of the calibration method as described, when these instructions are executed by a processor.

Finally, the invention relates to a processor-readable storage medium, possibly removable and integrated or not into the calibration system, on which is recorded a computer program comprising code instructions for executing the steps of the calibration method as previously described.

Other characteristics and advantages of the invention will appear more clearly on reading the following description, given solely by way of nonlimiting example, and made with reference to the appended drawings, in which: [Fig 1] Figure 1 illustrates a calibration system comprising a plurality of heterogeneous speakers, a server and a microphone for implementing the calibration process according to an embodiment of the invention.

[Fig 2] Figure 2 illustrates a clock model and the heterogeneity factors relating to synchronization and tuning according to an embodiment of the invention;

[Fig 3] Figure 3 illustrates an example of a calibration signal used to implement the calibration method according to an embodiment of the invention;

[Fig 4] Figure 4 illustrates a flowchart representing the main steps of a calibration process according to an embodiment of the invention; and

[Fig 5] Figure 5 illustrates in detail the analysis and correction steps implemented in an embodiment of the calibration method according to the invention.

Thus, FIG. 1 represents a calibration system according to an embodiment of the invention. This system includes a set of N heterogeneous loudspeakers HP1, HP2, HP3, ..., HPi, ..., HPN. In the example illustrated here, the loudspeakers come from different suppliers; some are connected to a sound card by wire, others through a wireless transmission system. For example, loudspeaker HP1 is a Bluetooth® loudspeaker from one manufacturer, and loudspeaker HPN is a Bluetooth® loudspeaker from another manufacturer.

Loudspeaker HP3 is, for example, a speaker using Apple AirPlay® technology to connect wirelessly to a broadcasting server.

The other loudspeakers of the reproduction system are wired to devices that may differ and have different sound cards. For example, loudspeaker HP2 is connected to a living-room audio/video decoder of the set-top box type, and loudspeaker HPi is connected to a personal computer. Of course, this configuration is only one example; many other configurations are possible and the number N of loudspeakers is variable.

All the loudspeakers of this set are therefore heterogeneous and each has its own clock. Each sound card or wireless loudspeaker is controlled by a software module called a client module, represented here as C1, C2, C3, ..., Ci, ..., CN. These client modules are connected to a processing server 100 of a local network. This local network server can be a personal computer, a compact computer of the Raspberry Pi® type, an audio/video amplifier (AVR, for Audio Video Receiver), a home gateway serving both as an access point to the external network and as a local network server, or a communication terminal. The server 100 and the client modules can be integrated into the same device or distributed over several devices in the house. For example, the client module C1 of loudspeaker HP1 is integrated into the server 100, while the client module C2 of loudspeaker HP2 is integrated into a TV decoder controlled by the server 100. The server 100 comprises a processing module 150 comprising a processor µP that controls the interactions between the various modules of the server and cooperates with a memory block 120 (MEM) comprising a storage and/or working memory. The memory block 120 stores a computer program (Pg) comprising instructions which, when executed by the processor, carry out the steps of the calibration method as described, for example, with reference to FIGS. 4 and 5. The computer program can also be stored on a memory medium readable by a reader of the server device, or downloaded into the memory space of the server.

This server 100 comprises an input or communication module 110 capable of receiving audio data S coming from different audio sources, either local or from a communication network.

The processing module 150 then sends the received audio data to the client modules C1 to CN in the form of RTP packets (Real-time Transport Protocol). For these audio data to be reproduced homogeneously by the set of loudspeakers, that is to say so that they form a homogeneous and audible sound scene across the different loudspeakers, the client modules must be able to control their loudspeakers with no uncorrected heterogeneity factors between them. For example, the different clients C1 to CN must be both synchronized with the server and tuned to it. These two terms are explained later with reference to Figure 2.

The calibration system presented in FIG. 1 comprises at least one microphone 140 connected to a client control module (CAL) 130, which can be integrated into the server as shown here. In this case, the microphone can be wired to the server; the microphone client control module and the server then share the same clock, so this client module is naturally tuned to the server.

In another embodiment, a microphone 240 is integrated into a calibration device 200 comprising the microphone control module 230 and a processing module 210 comprising a microprocessor and a memory MEM. Such a calibration device also includes a communication module 220 capable of communicating data to the server 100. This calibration device can, for example, be a communication terminal of the smartphone type.

In this embodiment, the calibration device has its own sound card and its own clock. It must then be tuned so that the calibration device and the server have the same clock rate, and so that the data collection and the corrections to be applied to the loudspeakers are consistent with the server clock. For this purpose, a network synchronization protocol of the PTP type (Precision Time Protocol) can be implemented, as described for example in IEEE 1588-2008, "IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems", published by the IEEE Instrumentation and Measurement Society.

To implement the calibration method according to the invention, the microphone is placed successively in front of each loudspeaker of the reproduction system according to the calibration procedure described below. A calibration signal, as described with reference to FIG. 3, is sent by the processing server 100 to the different loudspeakers of the system at different times, according to the capture procedure described later with reference to FIG. 4.

All the data captured by this microphone according to this calibration procedure are collected, for example, by the collection module 160 of the server, which stores in memory the captured signals as well as the timestamp information determined by analysing the reproduced signals and the times at which the calibration signals were sent to the different loudspeakers.

These captured and recorded data are analysed by the analysis module 170 of the server 100 to determine a plurality of heterogeneity factors to be corrected for the different loudspeakers. The corrections of these heterogeneity factors are then determined by the correction module 180, which calculates the sampling frequencies, buffer durations, gains or other parameters to be applied to the loudspeakers to make the system homogeneous.

These different parameters are then sent to the different client modules so that the appropriate correction is made on the corresponding speakers.

In the case where the microphone is integrated into a calibration device 200, this device can also include a collection module 260 which collects the captured data and sends them to the server through the communication module 220. This calibration device can also integrate an analysis module 270 which, in the same way as described above for the server, analyses the collected data to determine a plurality of heterogeneity factors to be corrected. The calibration device can send these heterogeneity factors to the server via its communication module 220, or itself determine the corrections to be made if it integrates a correction module. In this case, it sends the corrections to be applied to the loudspeakers via their respective client modules.

Thus, once the calibration procedure has been carried out, the reproduction system has become homogeneous, that is to say the different heterogeneity factors of the loudspeakers of the set have been corrected. The loudspeakers are then, for example, synchronized and tuned, and have a homogeneous sound quality and volume. Their spatial rendering can be corrected so that the sound scene reproduced by the system is optimal for the given listener position.

The terms synchronization and tuning of the clocks of the different loudspeakers are now defined. Two independently operating devices each have their own clock. A clock is defined as a monotonic function equal to a time that increases at a rate determined by the clock frequency. Its origin is generally the moment the device is started up.

The clocks of two devices are necessarily different and three parameters are defined:

- clock offset: time difference at the origin between two clocks;

- clock drift: frequency difference between two clocks;

- clock deviation: variation of the drift over time, that is to say the second derivative of the clock with respect to time.

A classical clock model neglects the clock deviation, which is mainly caused by temperature changes. Thus, in a server/client network context, the client clock T_c is expressed as a function of the server clock T_s according to equation (EQ1):

$$T_c = \alpha\, T_s + \theta \qquad \text{(EQ1)}$$

where α represents the drift of the client clock relative to the server clock and θ represents the offset of the client clock. FIG. 2 represents this model.

The offset is a time, expressed in seconds. The drift is a dimensionless value equal to the ratio f_s/f_c of the server and client clock frequencies. It is generally given as a value in ppm (parts per million), calculated as (EQ2):

$$\delta_{\mathrm{ppm}} = (\alpha - 1) \times 10^{6} \qquad \text{(EQ2)}$$

In an audio context, the drift can be determined from the sampling frequencies. FIG. 2 introduces the problem of clock coordination: for the client to have the same clock as the server, both its drift α and its offset θ must be corrected. The first operation tunes the client to the server, while the second synchronizes them.
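The affine clock model of EQ1 and the two corrections it calls for can be sketched as follows. The ppm conversion assumes the convention (α - 1)·10^6; the function names and numeric values are illustrative only.

```python
def client_time(server_time: float, alpha: float, theta: float) -> float:
    """Affine clock model (EQ1): T_c = alpha * T_s + theta."""
    return alpha * server_time + theta


def to_server_time(client_t: float, alpha: float, theta: float) -> float:
    """Undo the offset theta (synchronization) and the drift alpha (tuning)."""
    return (client_t - theta) / alpha


def drift_ppm(alpha: float) -> float:
    """Drift in parts per million, assuming the (alpha - 1) * 1e6 convention."""
    return (alpha - 1.0) * 1e6


alpha, theta = 1.0 + 20e-6, 0.0035  # hypothetical: 20 ppm drift, 3.5 ms offset
for t_s in (0.0, 60.0):
    gap_ms = (client_time(t_s, alpha, theta) - t_s) * 1000
    print(f"T_s = {t_s:4.0f} s -> client ahead by {gap_ms:.1f} ms")
```

With these values the client is 3.5 ms ahead at start-up and 4.7 ms ahead after one minute: correcting θ alone removes the initial gap but leaves it growing by 1.2 ms per minute, which only correcting α stops.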

The calibration method implemented by the calibration system described above with reference to FIG. 1 is now described with reference to FIG. 4, for a calibration covering N loudspeakers.

A first step E410 launches the capture and initializes the loudspeaker index to 0 (i = 0).

In step E415, the microphone of the calibration device is placed in front of a first loudspeaker HPi of the reproduction system, which comprises N loudspeakers. In step E420, the server sends a calibration signal at a first instant t1 to loudspeaker HPi via the client module Ci of that loudspeaker. The reproduction of this signal is captured by the microphone in this step E420.

The calibration signal is, for example, a signal whose frequency increases exponentially with time, known as a "chirp" or "logarithmic sweep".

Convolving the signal measured at the output of the loudspeaker with an inverse calibration signal directly yields the impulse response of the loudspeaker. Such a signal is, for example, an exponential sine sweep (ESS) as illustrated in FIG. 3, of length T (0.2 s in the example of FIG. 3), going from frequency f1 (20 Hz) to f2 (20 kHz). As a function of time t, this signal is written as follows (EQ3):

$$\mathrm{ESS}(t) = \sin\left[\frac{2\pi f_1 T}{\ln(f_2/f_1)}\left(e^{\frac{t}{T}\ln(f_2/f_1)} - 1\right)\right], \qquad 0 \le t < T \qquad \text{(EQ3)}$$

Measuring this signal played by a loudspeaker makes it possible to estimate its impulse response by computing the cross-correlation between the measured signal and the theoretical signal ESS(t). In practice, this is achieved by convolving the measured signal with the inverse sweep ESS^{-1}, a time-reversed sweep with an exponentially decaying envelope that compensates for the energy differences between frequencies (EQ4):

$$\mathrm{ESS}^{-1}(t) = \mathrm{ESS}(T - t)\, e^{-\frac{t}{T}\ln(f_2/f_1)} \qquad \text{(EQ4)}$$

FIG. 3 presents such an example of calibration signal: graph (a) represents an exponential sine sweep of 0.2 s, graph (b) the inverse signal, and graph (c) the impulse response obtained by convolving the sweep with its inverse.
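The sketch below generates the sweep of EQ3 and the inverse filter of EQ4 with NumPy, using the parameters of FIG. 3 (T = 0.2 s, f1 = 20 Hz, f2 = 20 kHz) and an assumed sampling rate of 48 kHz, then convolves a delayed copy of the sweep, standing in for an ideal loudspeaker, with the inverse. The result is a single dominant peak whose position, minus the length of the time-reversed filter, gives the delay. `ess_pair` is a hypothetical name.

```python
import numpy as np


def ess_pair(T=0.2, f1=20.0, f2=20_000.0, fs=48_000):
    """Exponential sine sweep (EQ3) and its inverse filter (EQ4)."""
    t = np.arange(int(round(T * fs))) / fs
    L = np.log(f2 / f1)
    sweep = np.sin(2 * np.pi * f1 * T / L * (np.exp(t * L / T) - 1.0))
    # Time reversal plus an exponentially decaying envelope, which compensates
    # the larger energy the sweep spends at low frequencies.
    inverse = sweep[::-1] * np.exp(-t * L / T)
    return sweep, inverse


sweep, inverse = ess_pair()
# An ideal loudspeaker with a 100-sample delay, as captured by the microphone:
captured = np.concatenate([np.zeros(100), sweep])
ir = np.convolve(captured, inverse)
print(np.argmax(np.abs(ir)) - (len(sweep) - 1))  # delay in samples
```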

In steps E430, E432 and E435 of FIG. 4, the calibration signal is sent to the other loudspeakers HPk of the set, that is to say the N-1 loudspeakers with k different from i. This signal is sent to each loudspeaker via its client module Ck with a known time offset Δt, for example 5 s.

This time offset is stored in memory in the server. It may be the same for every loudspeaker or differ from one loudspeaker to another. The reproduction of these signals is captured in step E430 by the microphone, which remains in front of loudspeaker HPi.

The order in which the calibration signal is sent to these loudspeakers can be pre-established by the server. For example, in the embodiment illustrated in steps E430 to E435 of FIG. 4, if the microphone is in front of loudspeaker i, the server sends a calibration signal to loudspeaker i+1, then to loudspeaker i+2, ..., to loudspeaker i+k modulo N, until all loudspeakers other than i have been covered. It performs the same sequence each time the microphone is moved.

Another pre-established order consists, for example, in always starting from the same loudspeaker other than i and following a defined sequence, skipping to the next loudspeaker when the sequence reaches the loudspeaker in front of which the microphone is placed.

These pre-established orders are known to the server and to the analysis module in order to know which loudspeaker corresponds to a captured data.

Finally, the server can send the calibration signal to the loudspeakers other than i in a random order, but in that case the identity of the loudspeaker to which each calibration signal is sent must be provided together with the received data, so that the captured data can be analysed correctly.

In step E440, the calibration signal is played again by loudspeaker HPi at a time t2 different from t1, which may be offset by Δt from the last loudspeaker of the loop E430 to E435, or offset from t1 and located before the loop E430 to E435 is carried out.

The time separating t2 from t1 is stored in the memory of the processing server.

In step E450, it is checked whether the loop E415 to E440 is finished, that is to say whether all the loudspeakers have been processed in the same way. If this is not the case (branch N, "no", of E450), steps E415 to E440 are iterated for the next loudspeaker i, i going from 0 to N-1. The order of the loudspeakers in the loop E430 to E435 is the same for each iteration. When all the loudspeakers have been processed by the loop E415 to E440 (branch O, "yes", of E450), step E460 is carried out.
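The capture order of FIG. 4 can be sketched as a schedule of (loudspeaker, send time) pairs per microphone position. This assumes the default order i, i+1, ..., i+N-1 (mod N) described above, a constant offset Δt between consecutive emissions, and t2 placed Δt after the last other loudspeaker, so that t2 - t1 = NΔt. Times are relative to the start of each position, loudspeakers are indexed from 0, and `capture_schedule` is a hypothetical name.

```python
def capture_schedule(n_speakers, delta_s=5.0):
    """For each microphone position i: HPi at t1, then HP(i+1) ... HP(i+N-1)
    (mod N) every delta_s seconds, then HPi again at t2 = t1 + N * delta_s."""
    schedule = {}
    for i in range(n_speakers):
        order = [i] + [(i + k) % n_speakers for k in range(1, n_speakers)] + [i]
        schedule[i] = [(spk, k * delta_s) for k, spk in enumerate(order)]
    return schedule


for position, events in capture_schedule(3).items():
    print(position, events)
```

Each position yields N+1 emissions, hence N(N+1) timestamps over the whole calibration.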

Steps E420 to E440 can be performed in a different order. For example, the calibration signal sent at times t1 and t2 to the same loudspeaker i can be captured before the signals reproduced by the other loudspeakers. It is also possible to capture the signals reproduced by the loudspeakers other than i before capturing the signals reproduced by loudspeaker i at times t1 and t2. The order of these steps does not affect the result of the method.

In step E460, the capture by the microphone is stopped and the captured data (De) are collected and recorded in a memory of the server or of the calibration device, depending on the embodiment. These data are used in the analysis step E470, which determines a plurality of heterogeneity factors to be corrected for all of the N loudspeakers. These heterogeneity factors are taken from the following list:

- clock coordination of the loudspeakers, including their synchronization and tuning;

- the volume of the loudspeakers;

- the sound rendering of the loudspeakers; and

- the mapping of the loudspeakers.

A correction adapted to the determined heterogeneity factors is then determined and applied in E480.

These steps E470 and E480 are detailed in FIG. 5, now described. The captured data received in E460 and resulting from capture steps E410 to E460 are transformed into impulse responses by convolution with the inverse signal, as described above with reference to FIG. 3. Since this operation can be cumbersome over the whole recording, it may be preferable to perform it over successive analysis windows.

Once this operation has been carried out, a signal is obtained comprising a series of impulse responses corresponding to the different loudspeakers according to the order of restitution of the calibration signal from the capture procedure.

In step E520, peak detection is performed on the impulse responses thus obtained. The times corresponding to the maxima of the impulse responses are kept as timestamps. The detection step is in fact a multiple-peak detection. The approach used here, as one embodiment, consists in finding each local maximum, defined by the passage from a positive slope to a negative slope. All these local maxima are then sorted in descending order and the first N(N + 1) are kept.

This approach is simple but can lead to errors if an impulse response has a lower maximum than a noise peak. To detect these particular cases, a peak detection threshold is defined.

In addition, an impulse response may contain secondary peaks that are larger than the main peak of another response. To avoid detecting them, a minimum duration is imposed between two peaks detected on the signal.

This gives N(N + 1) timestamps.
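The multiple-peak detection of step E520 can be sketched as follows: local maxima, a detection threshold, then a greedy pass that keeps the strongest peaks while enforcing the minimum spacing. Working on the magnitude of the signal, and the names `detect_peaks`, `threshold` and `min_distance` (in samples), are assumptions for illustration.

```python
import numpy as np

def detect_peaks(signal, n_speakers, threshold, min_distance):
    """Return up to N(N + 1) peak indices, in chronological order."""
    x = np.abs(np.asarray(signal, dtype=float))  # assumed: work on the magnitude
    d = np.diff(x)
    # Local maxima: the slope goes from positive to non-positive.
    candidates = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    # Detection threshold: discard maxima that could be noise.
    candidates = candidates[x[candidates] >= threshold]
    # Sort the remaining maxima by decreasing amplitude.
    order = candidates[np.argsort(x[candidates])[::-1]]
    kept = []
    for idx in order:
        # Minimum duration between peaks: reject secondary peaks near a kept one.
        if all(abs(idx - k) >= min_distance for k in kept):
            kept.append(idx)
        if len(kept) == n_speakers * (n_speakers + 1):
            break
    return np.sort(np.array(kept, dtype=int))
```

With N = 2 loudspeakers, six peaks are expected; a secondary peak of 0.9 placed 10 samples after the strongest response is rejected by the spacing rule, while a noise bump below the threshold never becomes a candidate.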

In step E522, the drift $\alpha_i$ of its clock relative to that of the processing server is determined for each loudspeaker HPi of the set.

The captured data used are the N + 1 timestamps measured while the calibration microphone is placed in front of loudspeaker HPi. These timestamps are denoted $T_i^k$, with $k \in [0, \dots, N + 1[$, and are used together with the theoretical time elapsed between two measurements of the same loudspeaker HPi, $t_2 - t_1$.

If the theoretical time elapsed between the signal played by loudspeaker HPi at time $t_1$ and at time $t_2$ is equal to $NS$, where $S = \Delta t$ is the constant theoretical time elapsed between two reproductions of the calibration signal on two adjacent loudspeakers of the loop E430 to E435, the drift of loudspeaker HPi relative to the server can be estimated according to equation (EQ5).

This theoretical time $t_2 - t_1$ is fixed before launching the calibration, and its choice may depend on the desired precision in estimating the various heterogeneity factors.
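Equation (EQ5) itself is not reproduced in this text. As a sketch only, assuming the drift is expressed as the ratio of the measured duration $T_i^N - T_i^0$ to the theoretical duration $NS$ (a convention that agrees with the corrected sampling frequency $F_s/\alpha_i$ used in step E530), the estimate could be computed as follows; `estimate_drift` and `drift_ppm` are hypothetical helpers.

```python
import numpy as np

def estimate_drift(timestamps, S):
    """Drift ratio alpha_i of loudspeaker HPi (assumed form of EQ5).

    timestamps: the N + 1 measured times T_i^0, ..., T_i^N in seconds; the
    first and last belong to HPi itself, played at t1 and t2 = t1 + N*S.
    """
    T = np.asarray(timestamps, dtype=float)
    n = len(T) - 1
    measured = T[-1] - T[0]
    theoretical = n * S
    return measured / theoretical

def drift_ppm(alpha):
    """Express a drift ratio as a deviation in parts per million."""
    return (alpha - 1.0) * 1e6
```

Under this convention, alpha > 1 means the loudspeaker took longer than expected to play the loop, that is, its clock runs slow relative to the server.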

In fact, the estimation precision of the different clock coordination and mapping parameters is mainly linked to the estimation precision of the timestamps. Detecting peaks on the impulse responses implies a temporal precision of one sample, that is to say approximately 20 µs at a sampling frequency of 48 kHz. Beyond the fact that better precision may be desirable, it is above all the estimation of clock drift that is affected. Indeed, small drift values are to be expected, of the order of 10 ppm. If the theoretical duration between the two timestamps used to estimate the drift in equation (EQ5) is equal to 1 s, an error of one sample on a timestamp results in a drift error of about 20 ppm.

A first solution to reduce this error is to increase the duration d between successive reproductions of the calibration signal. If this duration is such that the interval between the two reproductions of the calibration signal on the same loudspeaker ($t_2 - t_1$), used to estimate the drift, is at least 20 s, the estimation error drops to the order of 1 ppm. This solution significantly increases the total duration of the acoustic calibration, which is not always acceptable.

A second solution consists in oversampling the impulse responses during a step E510 represented in FIG. 5, in order to increase the precision of peak detection. Oversampling by an integer factor $P$ is a conventional signal processing operation: $P - 1$ zeros are first inserted between the samples of the signal to be oversampled, and the resulting signal is then filtered by a low-pass filter. In an exemplary embodiment, this low-pass filter is a Butterworth filter of order 100, as described in "Discrete-Time Signal Processing" by A. V. Oppenheim, R. W. Schafer and J. R. Buck (Prentice Hall, 2nd edition, 1999). Its cutoff frequency is set to the Nyquist frequency $F_s/2$ of the initial signal, $F_s$ being its sampling frequency. This technique reduces the estimation errors on the timestamps, and therefore on the calibration parameters, without increasing the measurement time, but oversampling increases the computation time.
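A minimal sketch of step E510 with NumPy: zero insertion followed by low-pass filtering at the original Nyquist frequency. It swaps the order-100 Butterworth filter of the embodiment for a linear-phase, Hamming-windowed sinc FIR, an assumption made so that the filter does not shift peak positions; `oversample` and `taps_per_phase` are hypothetical names.

```python
import numpy as np

def oversample(x, P, taps_per_phase=32):
    """Oversample x by an integer factor P (zero insertion + low-pass).

    The low-pass is a windowed-sinc FIR (not the Butterworth filter of the
    embodiment) with cutoff Fs/2, i.e. 1/(2P) cycles per output sample.
    x should be longer than 2 * taps_per_phase samples.
    """
    x = np.asarray(x, dtype=float)
    up = np.zeros(len(x) * P)
    up[::P] = x  # insert P - 1 zeros between the samples
    half = taps_per_phase * P
    n = np.arange(-half, half + 1)
    # np.sinc(n / P) is the ideal low-pass at Fs/2 scaled by P, which restores
    # the amplitude lost by zero insertion; the Hamming window truncates it.
    h = np.sinc(n / P) * np.hamming(len(n))
    # Odd-length symmetric kernel + mode="same": no delay is introduced.
    return np.convolve(up, h, mode="same")
```

Because the kernel is zero at every nonzero multiple of P, the original samples are preserved and only the intermediate points are interpolated. On a band-limited pulse centred at 100.37 samples, the plain argmax is off by 0.37 sample, whereas after oversampling by 10 it lands within 0.1 sample.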

In practice, a combination of the two solutions (increasing the time interval d and oversampling) is used: the time between the signals used to estimate the drift is increased to around 8 s and oversampling by a factor of 10 is applied.
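The orders of magnitude quoted above follow from simple arithmetic. This sketch assumes, as the text does, a timestamp error of one (possibly oversampled) sample; `drift_error_ppm` is a hypothetical helper.

```python
def drift_error_ppm(fs, interval_s, oversampling=1):
    """Drift error in ppm caused by a one-sample timestamp error."""
    timestamp_error_s = 1.0 / (fs * oversampling)
    return timestamp_error_s / interval_s * 1e6

print(drift_error_ppm(48000, 1.0))       # ≈ 20.8 ppm: 1 s interval, no oversampling
print(drift_error_ppm(48000, 20.0))      # ≈ 1.04 ppm: 20 s interval
print(drift_error_ppm(48000, 8.0, 10))   # ≈ 0.26 ppm: 8 s interval, oversampling by 10
```

The combined setting thus keeps the drift error well below the expected drifts of about 10 ppm while keeping the calibration much shorter than the 20 s interval alone would require.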

Thus, the drift of each speaker is estimated in E522.

From the timestamps obtained in E520 and the theoretical time elapsed between the calibration signal played by loudspeaker i and the signal played by loudspeaker 0, equal to $i(N + 1)S$, it is possible to define a relative latency $\theta_{i0}$ between these two loudspeakers, given by equation (EQ6).

The definition of the relative latencies with respect to the first loudspeaker is arbitrary and can lead to negative values. To obtain only positive values, and thus the delay of each loudspeaker relative to the one that is most ahead, one computes (EQ7):

$$\theta_i = \theta_{i0} - \min_j \theta_{j0}$$

All the relative latencies between the loudspeakers, taken two by two, are thus obtained in step E524.

When all the clock drifts and all the relative latencies are known, the distances between the loudspeakers can be estimated in step E526. According to the calibration procedure described in FIG. 4, when the microphone is placed in front of loudspeaker i, the other loudspeakers play the calibration signal in a circular order. For $k \in [0, \dots, N[$, the theoretical time elapsed between the timestamps $T_i^0$ and $T_i^k$ is equal to $kS$. The distance between loudspeaker i and another loudspeaker j is estimated according to equation (EQ8), where c is the speed of sound in air.

The value $\tau_{ij}$ represents the propagation time of a sound wave between the two loudspeakers. For each pair (i, j) of loudspeakers, the distance $d_{ij}$ is estimated twice, once with the microphone in front of each loudspeaker of the pair. The average of these two values is used (EQ9):

$$\bar{d}_{ij} = \frac{d_{ij} + d_{ji}}{2}$$

to build a symmetric square matrix D whose elements are the squares of the distances between each pair of loudspeakers:

$$D = \left[\bar{d}_{ij}^{\,2}\right] \quad \text{for } (i, j) \in [0, \dots, N[^2$$
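Building D from the pairwise estimates can be sketched as follows. Equation (EQ8) is not reproduced here, so the sketch starts from propagation times $\tau_{ij}$ assumed to be already corrected for latency and drift, and uses an assumed speed of sound of 343 m/s; `squared_distance_matrix` is a hypothetical name.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C

def squared_distance_matrix(tau):
    """tau[i, j]: propagation time (s) from HPj, measured with the mic at HPi."""
    d = SPEED_OF_SOUND * np.asarray(tau, dtype=float)  # d_ij = c * tau_ij
    d_mean = 0.5 * (d + d.T)       # EQ9: average of d_ij and d_ji
    np.fill_diagonal(d_mean, 0.0)  # a loudspeaker is at distance 0 from itself
    return d_mean ** 2             # D holds squared distances
```

Averaging the two estimates makes D symmetric by construction, which the mapping step relies on.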

After the analysis step E470 detailed above, the calibration method implements the correction step E480, now detailed, to homogenize the heterogeneous distributed audio system.

In step E530, a correction of the tuning heterogeneity factor, corresponding to the clock drift of a loudspeaker relative to the server, is calculated. The clock drift between a loudspeaker and the server is not corrected by directly modifying the clock of the sound card of the corresponding loudspeaker or of the wireless speaker, mainly because this clock cannot be accessed in this context of heterogeneous distributed audio. The correction is instead applied to the audio data by the client module controlling the loudspeaker. Indeed, the audio samples are supplied to the sound card or to the wireless speaker by a client module, as described with reference to FIG. 1.

To correct this drift, the sampling frequency is processed: if the acoustic calibration shows that the data are played too quickly, the client module must slow them down.

Thus, for a loudspeaker HPi whose drift $\alpha_i$ relative to the server was estimated in step E522, the new sampling frequency to be applied to the audio samples is calculated in E530 and is equal to $F_{SRC} = F_s / \alpha_i$. This new sampling frequency is given to the sample rate converter (SRC) of the client module Ci. In step E570, this correction is applied by the client Ci via its SRC converter, which in this embodiment performs linear interpolation between the samples and takes as its only parameter the new sampling frequency $F_{SRC}$ defined above. This resampling is carried out in E580 by each of the clients C1, C2, ..., CN corresponding to the loudspeakers HP1, HP2, ..., HPN, to correct the tuning heterogeneity factor of the different loudspeakers.
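A sketch of a linear-interpolation sample rate converter as a client module might implement it. The function name `resample_linear` is hypothetical, and the block-wise streaming behaviour of a real SRC (carrying the fractional position across buffers) is deliberately left out.

```python
import numpy as np

def resample_linear(x, fs, f_src):
    """Resample x from fs to f_src by linear interpolation between samples.

    Output sample m is read at input position m * fs / f_src; with
    f_src = fs / alpha, a slow loudspeaker (alpha > 1) receives fewer
    samples and a fast one (alpha < 1) receives more.
    """
    x = np.asarray(x, dtype=float)
    n_out = int(np.floor((len(x) - 1) * f_src / fs)) + 1
    positions = np.arange(n_out) * fs / f_src
    return np.interp(positions, np.arange(len(x)), x)
```

For example, doubling the rate of a ramp `np.arange(11.0)` yields the 21 values 0, 0.5, ..., 10, and one second of audio at 48 kHz sent to a loudspeaker with a drift of +10 ppm comes out one sample shorter.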

In the same way as the correction of clock drift and therefore of the tuning heterogeneity factor, the correction of the synchronization heterogeneity factor, due to the relative latencies between the speakers, is carried out by the client module of the speaker affected by the correction.

The latencies $\theta_i$ calculated in E524 represent the delay of each loudspeaker relative to the one that is most ahead. In practice, this latency cannot be corrected by advancing the playback of the late devices. It is therefore necessary to delay playback on the loudspeakers that are ahead, relative to the one that is most late. This is done by adding a buffer memory (buffering). The duration $\phi_i$ of this buffer memory for loudspeaker HPi is obtained in E540 from the latencies $\theta_i$ according to equation (EQ10):

$$\phi_i = \max_j \theta_j - \theta_i$$

This buffer value is transmitted to the client module Ci of loudspeaker HPi in E580, so that the audio data received from the server are not sent directly to the sound card or to the wireless speaker but after a delay corresponding to the size of the buffer thus determined. The synchronization of all the loudspeakers is then achieved by adding $\phi_i$ to the size of the buffer memory of each client Ci.
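A sketch combining the normalization (EQ7) with the buffer rule, here taken as $\phi_i = \max_j \theta_j - \theta_i$ since the most delayed loudspeaker needs no extra buffer. The input values and the name `buffer_durations` are illustrative assumptions; the relative latencies $\theta_{i0}$ of (EQ6) are taken as given.

```python
import numpy as np

def buffer_durations(theta_rel_0):
    """Buffer duration phi_i (s) per loudspeaker from latencies theta_i0 (s)."""
    theta0 = np.asarray(theta_rel_0, dtype=float)
    theta = theta0 - theta0.min()  # EQ7: delay relative to the most advanced loudspeaker
    return theta.max() - theta     # buffer so that every loudspeaker matches the latest one

phi = buffer_durations([0.0, 0.015, -0.005])   # seconds
print(np.round(phi * 48000).astype(int))       # buffer sizes in samples at 48 kHz
```

The offset removed by (EQ7) cancels in the buffer computation, so the choice of loudspeaker 0 as reference does not affect the buffers actually sent to the clients.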

To correct the heterogeneity factor relating to the sound rendering of the loudspeakers, step E560 retrieves the impulse responses of the loudspeakers, which were generated and stored from the captured data. The magnitude of the Fourier transform of an impulse response constitutes the response of the loudspeaker as a function of frequency, from which step E560 calculates the energy in each frequency band considered. The calibration process described in FIG. 4 produces two impulse responses per loudspeaker, so the estimated energy values can be averaged over these two measurements. The energy value obtained is then averaged over each frequency band to obtain an equalization correction in the form of a gain to be applied to each loudspeaker in each band. These equalization gains can be applied at the server level or sent in E580 to the different clients to equalize the audio signal to be transmitted to the loudspeakers and thus homogenize their sound rendering.
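A sketch of the per-band equalization gains. The band edges, the choice of the mean energy across loudspeakers as the per-band target, and the names `band_energies` and `equalization_gains` are assumptions; the source only specifies band energies averaged over the two impulse responses of each loudspeaker.

```python
import numpy as np

def band_energies(ir, fs, band_edges):
    """Energy of an impulse response in each [lo, hi) frequency band."""
    spectrum = np.abs(np.fft.rfft(ir)) ** 2
    freqs = np.fft.rfftfreq(len(ir), d=1.0 / fs)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])

def equalization_gains(irs, fs, band_edges):
    """irs[i] = (ir_t1, ir_t2): the two impulse responses of loudspeaker i.

    Returns an amplitude gain per loudspeaker and per band, shape (N, bands).
    """
    energies = np.array([0.5 * (band_energies(a, fs, band_edges)
                                + band_energies(b, fs, band_edges))
                         for a, b in irs])
    target = energies.mean(axis=0)   # assumed target: mean energy across loudspeakers
    return np.sqrt(target / energies)  # energy ratio -> amplitude gain
```

A loudspeaker twice as loud in amplitude as another (four times the energy) receives gains that bring both to the same per-band energy.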

To correct the sound volume of the loudspeakers, in step E570 and in one embodiment of this step, only a global volume equalization is carried out, that is to say on a single band covering the whole audible spectrum. To avoid overloading the loudspeakers, the equalization applies a gain reduction to each loudspeaker in order to bring its volume down to that of the quietest one.

For this, the client modules of the corresponding loudspeakers have a volume option expressed as a percentage. If $E_i$ is the estimated overall energy of loudspeaker i, its volume $V_i$ (in %) is calculated according to equation (EQ11):

$$V_i = 100 \, \frac{\min_j E_j}{E_i}$$

This volume correction is sent in E580 to the corresponding client modules, which apply it through a suitable gain.
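Equation (EQ11) in code. The overall energy $E_i$ is computed here, as an assumption, as the time-domain energy of each impulse response averaged over the two measurements, which by Parseval's theorem equals the energy over the whole sampled spectrum rather than strictly the audible band; the helper names are hypothetical.

```python
import numpy as np

def overall_energy(ir_t1, ir_t2):
    """Single-band energy of a loudspeaker, averaged over its two responses."""
    return 0.5 * (np.sum(np.square(ir_t1)) + np.sum(np.square(ir_t2)))

def volume_percentages(energies):
    """EQ11: V_i = 100 * min_j E_j / E_i, so the quietest loudspeaker stays at 100 %."""
    E = np.asarray(energies, dtype=float)
    return 100.0 * E.min() / E
```

Since every $V_i$ is at most 100 %, the correction only ever reduces gains, which matches the intent of not overloading any loudspeaker.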

In step E526, the acoustic calibration produces the matrix D of the squared distances between each pair of loudspeakers. In step E550, a loudspeaker mapping is first carried out on the basis of these data, so that a spatial correction can then be applied to adapt the optimal listening point to a given listener position. An approach based on Euclidean distance matrices (EDM) can be applied for this purpose.

The MDS (Multi-Dimensional Scaling) algorithm may be applied. It relies on the rank properties of EDMs to estimate the Cartesian coordinates of the loudspeakers in an arbitrary frame of reference, as described in "Euclidean distance matrices: Essential theory, algorithms, and applications" by I. Dokmanic, R. Parhizkar, J. Ranieri and M. Vetterli, IEEE Signal Processing Magazine, 32(6):12-30, 2015.

In particular, classical MDS places the origin of the coordinate system at the barycenter of the loudspeakers. However, one important assumption must hold for MDS to be applicable: the matrix D must be a Euclidean distance matrix.

According to the authors, this assumption holds if the Gram matrix obtained after centering the matrix D is positive semidefinite, that is to say if its eigenvalues are greater than or equal to 0. In the application described above, this condition is found not to be always met, because of the placement of the measurement microphone or of errors in estimating the distances between the loudspeakers.

If the matrix D is not an EDM, another approach is needed for the mapping, for example the ACD (Alternate Coordinate Descent) algorithm. This method performs a gradient descent on each sought coordinate to minimize the error between the measured matrix D and the estimated one. It is described in "Euclidean Distance Matrices: Properties, Algorithms and Applications", R. Parhizkar, PhD thesis, École Polytechnique Fédérale de Lausanne, Switzerland, 2013. Although this algorithm converges quickly, it is still computationally heavier than classical MDS. This is why, in one embodiment of the invention, the mapping algorithm starts by applying the MDS method and applies the ACD method only once the measured distance matrix has been found not to be an EDM.
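A sketch of classical MDS with the EDM check described above, using NumPy. The eigenvalue tolerance `tol`, the two-dimensional output and the name `classical_mds` are assumptions; the ACD fallback is not implemented, the function only returns a flag telling the caller whether it would be needed.

```python
import numpy as np

def classical_mds(D, dim=2, tol=1e-9):
    """Classical MDS on a matrix D of squared distances.

    Returns (coords, is_edm): coords has shape (N, dim) with the origin at the
    barycenter of the points; is_edm is False when the centred Gram matrix
    has a significantly negative eigenvalue (D is then not an EDM).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    G = -0.5 * J @ D @ J                      # Gram matrix
    eigvals, eigvecs = np.linalg.eigh(G)      # G is symmetric
    is_edm = bool(eigvals.min() >= -tol * max(1.0, np.abs(eigvals).max()))
    idx = np.argsort(eigvals)[::-1][:dim]     # largest eigenvalues first
    coords = eigvecs[:, idx] * np.sqrt(np.clip(eigvals[idx], 0.0, None))
    return coords, is_edm
```

Usage in the MDS-first strategy: call `classical_mds(D)`, keep the coordinates if `is_edm` is true, and otherwise switch to an iterative method such as ACD. The coordinates are only defined up to a rotation and reflection, which is consistent with the arbitrary frame of reference mentioned above.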

The mapping returns the positions of all the loudspeakers as Cartesian coordinates in an arbitrary coordinate system. Applying a spatial correction adapted to the position of a listener requires knowing this position in the same frame of reference. It can be obtained by localization methods based on microphone arrays or on several microphones distributed in the room. Other approaches can be based on video localization. Determining the position of the listener is not the object of this invention. This position is received by the server in step E550 to determine the spatial corrections to be made to the different loudspeakers.

A first spatial correction method consists in virtually moving all the loudspeakers onto a circle centred on the listener. The distance between the listener and each loudspeaker is calculated, and the radius of the circle is the largest of these distances. The virtual displacement is then achieved by applying a delay and a gain to each loudspeaker whose distance to the listener is less than the radius of the circle.
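The circle method can be sketched as follows. The delay compensates the missing propagation distance $R - r_i$; the gain assumes free-field 1/r attenuation, so a loudspeaker virtually moved from $r_i$ to $R$ is scaled by $r_i / R$. The speed of sound (343 m/s) and the name `circle_correction` are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def circle_correction(speaker_xy, listener_xy):
    """Delay (s) and gain per loudspeaker to place all of them on one circle."""
    P = np.asarray(speaker_xy, dtype=float)
    L = np.asarray(listener_xy, dtype=float)
    r = np.linalg.norm(P - L, axis=1)   # listener-to-loudspeaker distances
    R = r.max()                         # radius of the virtual circle
    delays = (R - r) / SPEED_OF_SOUND   # extra propagation time to reach radius R
    gains = r / R                       # assumed 1/r attenuation law
    return delays, gains
```

The farthest loudspeaker gets zero delay and unit gain, so only loudspeakers closer than the radius are modified, as described above.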

This method already greatly contributes to improving the immersion of the listener, but is not sufficient if the actual positions of the loudspeakers are too far from the optimal positions defined in the standard (ITU, 2012) cited above.

In this case, an angular adaptation that virtually moves the loudspeakers to the optimal positions can be used. This functionality is, for example, present in the MPEG-H codec and described in the standard (ISO/IEC 23008-3, 2015).

These delay, gain or angle parameters determined in step E550 are sent to the corresponding client modules, which apply these corrections in E570 in order to correct the heterogeneity factor relating to the mapping.

Thus, carrying out a calibration method according to the invention gives access, in a single measurement, to all the parameters needed to homogenize a heterogeneous distributed audio system. This global calibration is important because the parameters depend on each other: the relative latency between two loudspeakers depends on their respective clock drifts, and the estimated distance between two loudspeakers depends on their relative latency and their respective drifts.

The audio reproduction system can then make the necessary corrections with the method presented here:

- tuning, by sampling frequency conversion;

- synchronization, by adapting buffer memories;

- global equalization of the loudspeakers, by adjusting their volume;

- equalization by frequency band, to homogenize the sound rendering;

- the spatial configuration of the system, by a mapping algorithm.

One or more of these factors can be corrected in this way.

Claims

1. Method for calibrating a distributed audio reproduction system, comprising a set of N heterogeneous loudspeakers controlled by a server, the method comprising the following steps:

a) placing (E415) a microphone in front of a first loudspeaker of the set;

b) capture (E420), by the microphone, of a calibration signal sent to the first loudspeaker at a first instant and reproduced by the latter;

c) capture (E430), by the microphone, of the calibration signal sent with a known time difference to the N-1 other loudspeakers of the set and reproduced by these N-1 loudspeakers;

d) capture (E440), by the microphone, of the calibration signal sent to the first loudspeaker at a second instant and reproduced once again by the latter;

e) iteration of steps a) to d) for the N loudspeakers of the set;

f) determination (E470) of a plurality of heterogeneity factors to be corrected for all of the N loudspeakers by analysis of the data thus captured;

g) correction (E480) of the determined heterogeneity factors.

2. Method according to claim 1, in which the heterogeneity factors belong to the following list:

- clock coordination of the speakers including synchronization and tuning of the speakers;

- loudspeaker volume;

- a sound rendering of the speakers; and

- a map of the speakers.

3. Method according to claim 1 or 2, wherein the microphone is included in a calibration device previously tuned with the server.

4. Method according to one of claims 1 to 3, in which the analysis of the captured data comprises multiple peak detection in a signal resulting from a convolution of the captured data with an inverse calibration signal, a maximum peak being detected taking into account a threshold to be exceeded by the detected peak and a minimum duration between two detected peaks, to obtain N(N + 1) timestamps.

5. Method according to claim 4, in which oversampling is carried out on the data captured before the detection of peaks.

6. Method according to claim 4 or 5, in which a clock drift of a loudspeaker of the set relative to a clock of the processing server is estimated from the timestamps obtained for the calibration signals sent at the first and second instants and from the time elapsed between these two instants.

7. Method according to claim 6, wherein the relative latency between the loudspeakers of the set, taken two by two, is estimated from the timestamps obtained and the estimated drifts.

8. Method according to claim 7, wherein the distance between the loudspeakers of the set, taken two by two, is estimated from the timestamps obtained, the estimated relative latencies and the estimated drifts.

9. Method according to one of claims 6 to 8, in which a heterogeneity factor relating to a tuning of the loudspeakers of the set is corrected by resampling the audio signals intended for the corresponding loudspeakers, at a frequency dependent on the estimated clock drifts of the loudspeakers relative to the server clock.

10. Method according to one of claims 7 to 9, in which a heterogeneity factor relating to a synchronization of the loudspeakers of the set is corrected by adding a buffer memory for the transmission of the audio signals intended for the corresponding loudspeakers, the duration of which depends on the estimated latencies of the loudspeakers.

11. Method according to one of claims 1 to 10, in which a heterogeneity factor relating to the sound rendering and/or a heterogeneity factor relating to the sound volume of the loudspeakers of the set is corrected by equalizing the audio signals intended for the corresponding loudspeakers, according to gains dependent on impulse responses picked up from the loudspeakers.

12. Method according to one of claims 8 to 11, in which a heterogeneity factor relating to a mapping of the loudspeakers of the set is corrected by applying a spatial correction to the corresponding loudspeakers, according to at least one delay depending on the estimated distances between the loudspeakers and a given position of a listener.

13. Calibration system for a distributed audio reproduction system comprising a set of N heterogeneous loudspeakers (HP1, HP2, ..., HPN) driven by client modules (C1, ..., CN) controlled by a server (100), the calibration system comprising:

- a microphone (140) which, placed in front of a first loudspeaker of the set, is capable of picking up a calibration signal sent to the first loudspeaker at a first instant and reproduced by the latter, of picking up the calibration signal sent with a known time difference to the N-1 other loudspeakers of the set and reproduced by these N-1 loudspeakers, of picking up the calibration signal sent to the first loudspeaker at a second instant and reproduced by the latter, and of iterating these capture operations for the N loudspeakers of the set; and

- a processing server (100) comprising a collection module (160) for the captured data, an analysis module (170) able to analyze the captured and collected data to determine a plurality of heterogeneity factors to be corrected, and a correction module (180) capable of calculating the corrections of the determined heterogeneity factors and of transmitting them to the different client modules (C1, ..., CN) of the corresponding loudspeakers so that they apply the calculated corrections.

14. The calibration system according to claim 13, in which the microphone (240) is integrated in a terminal (200).

15. Storage medium readable by a processor, on which a computer program is recorded comprising code instructions for the execution of the steps of the calibration method according to one of claims 1 to 12.