CN117178567A - Measuring speech intelligibility of an audio environment - Google Patents


Info

Publication number
CN117178567A
Authority
CN
China
Prior art keywords
speaker
speakers
microphone
microphones
values
Prior art date
Legal status
Pending
Application number
CN202280023026.0A
Other languages
Chinese (zh)
Inventor
尤金·F·戈夫
雷蒙德·J·迪珀特
马修·V·科特维斯
萨玛尔·贝胡拉
Current Assignee
Biamp Systems, LLC
Original Assignee
Biamp Systems, LLC
Priority date
Filing date
Publication date
Priority claimed from US 17/521,303 (published as US11711061B2)
Application filed by Biamp Systems, LLC
Publication of CN117178567A


Abstract

An example method of operation may include: starting an automatic tuning process; detecting, via one or more microphones, sound measurements associated with outputs of one or more speakers at two or more locations; determining a number of speech transmission index (STI) values equal to the number of microphones; and averaging the speech transmission index values to identify a single speech transmission index value.

Description

Measuring speech intelligibility of an audio environment
Background
In a workplace, conference area, public forum, or other environment, speakers producing audio and microphones capturing audio may be arranged in a networked configuration covering multiple floors, areas, and rooms of different sizes. Tuning the audio in all or most locations is a challenge for the manufacturers and design teams of such large-scale audio systems. More advanced tuning work (e.g., combining different test signal strategies and independent speaker signals) presents further challenges to the setup and configuration process.
In one example, a test procedure may initiate a tone via one speaker and a capture procedure may be initiated via one or more microphones. However, testing a single speaker's signal and identifying feedback for that speaker alone may not accurately represent the numerous speakers that will be used during a bulletin, presentation, or other auditory event.
In a typical audio system (e.g., a conference room), there may be microphones, speakers, phone integration, input signal processing, output signal processing, acoustic echo cancellation, noise reduction, and non-linear processing and mixing of the audio signals. Due to the complexity of the corresponding equipment, installation process, and software configuration, a team of experts is typically required to set up, test, and install all the audio equipment.
Disclosure of Invention
An example embodiment may provide a method comprising one or more of: identifying a plurality of individual speakers on a network controlled by a controller; providing a first test signal to a first speaker and a second test signal to a second speaker, the second test signal comprising a different frequency than the first test signal; detecting the different test signals at one or more microphones; and automatically tuning speaker output parameters based on analysis of the different test signals.
Another example embodiment includes a process configured to perform one or more of the following operations: in a particular room environment, identifying a plurality of speakers and one or more microphones on a network controlled by a controller and an amplifier; providing a test signal for sequential playback from each amplifier channel and a plurality of speakers of the amplifier; simultaneously monitoring test signals from one or more microphones to detect operational speaker and amplifier channels; providing additional test signals to the plurality of speakers to determine tuning parameters; detecting an additional test signal at one or more microphones controlled by a controller; and automatically establishing a background noise level and a noise spectrum of the room environment based on the detected additional test signal.
Another example embodiment may include an apparatus comprising a processor configured to perform one or more of the following: identifying a plurality of speakers and one or more microphones on a network controlled by a controller and an amplifier in a particular room environment; providing a test signal for sequential playback from each amplifier channel and a plurality of speakers of the amplifier; simultaneously monitoring test signals from one or more microphones to detect speaker and amplifier channels in operation; providing additional test signals to the plurality of speakers to determine tuning parameters; detecting an additional test signal at one or more microphones controlled by a controller; and automatically establishing a background noise level and a noise spectrum of the room environment based on the detected additional test signal.
Another example embodiment may include a non-transitory computer-readable storage medium configured to store instructions that, when executed, cause a processor to perform one or more of the following: identifying a plurality of speakers and one or more microphones on a network controlled by a controller and an amplifier in a particular room environment; providing a test signal for sequential playback from each amplifier channel and a plurality of speakers of the amplifier; simultaneously monitoring test signals from one or more microphones to detect speaker and amplifier channels in operation; providing additional test signals to the plurality of speakers to determine tuning parameters; detecting an additional test signal at one or more microphones controlled by a controller; and automatically establishing a background noise level and a noise spectrum of the room environment based on the detected additional test signal.
Another example embodiment may include a method comprising one or more of the following operations: identifying a plurality of speakers and microphones connected to a network controlled by a controller; assigning preliminary output gains to a plurality of speakers for applying the test signals; measuring ambient noise detected from the microphone; simultaneously recording chirp responses from all microphones based on the test signal; deconvolving all chirp responses to determine a corresponding number of impulse responses; and measuring an average sound pressure level (SPL) of each microphone to obtain the SPL level based on the average of the SPLs.
Another example embodiment includes an apparatus comprising a processor configured to: identifying a plurality of speakers and microphones connected to a network controlled by a controller; assigning preliminary output gains to a plurality of speakers for applying the test signals; measuring ambient noise detected from the microphone; simultaneously recording chirp responses from all microphones based on the test signal; deconvolving all chirp responses to determine a corresponding number of impulse responses; and measuring an average Sound Pressure Level (SPL) of each microphone to obtain the SPL level based on the average of the SPLs.
Another example embodiment includes a non-transitory computer-readable storage medium configured to store instructions that, when executed, cause a processor to perform one or more of: identifying a plurality of speakers and microphones connected to a network controlled by a controller; assigning preliminary output gains to a plurality of speakers for applying the test signals; measuring ambient noise detected from the microphone; simultaneously recording chirp responses from all microphones based on the test signal; deconvolving all chirp responses to determine a corresponding number of impulse responses; and measuring an average Sound Pressure Level (SPL) of each microphone to obtain the SPL level based on the average of the SPLs.
Another example embodiment may include a method comprising one or more of the following operations: determining a frequency response of the measured chirp signal detected from the one or more speakers; determining an average value of the frequency response based on the high limit value and the low limit value; subtracting the measured response from the target response, wherein the target response is based on the one or more filter frequencies; determining a frequency-limited target filter having audible parameters based on the subtracting; and applying an infinite impulse response (IIR) biquad filter to equalize the frequency response of the one or more speakers based on the region defined by the frequency-limited target filter.
Another example embodiment includes an apparatus comprising a processor configured to: determining a frequency response of the measured chirp signal detected from the one or more speakers; determining an average value of the frequency response based on the high limit value and the low limit value; subtracting the measured response from the target response, wherein the target response is based on the one or more filter frequencies; determining a frequency limited target filter having audible parameters based on the subtracting; and applying an Infinite Impulse Response (IIR) biquad filter based on the region defined by the frequency-limited target filter to equalize the frequency response of the one or more speakers.
Another example embodiment includes a non-transitory computer-readable storage medium configured to store instructions that, when executed, cause a processor to perform one or more of: determining a frequency response of the measured chirp signal detected from the one or more speakers; determining an average value of the frequency response based on the high limit value and the low limit value; subtracting the measured response from the target response, wherein the target response is based on the one or more filter frequencies; determining a frequency limited target filter having audible parameters based on the subtracting; and applying an Infinite Impulse Response (IIR) biquad filter based on the region defined by the frequency-limited target filter to equalize the frequency response of the one or more speakers.
Another example embodiment includes a method comprising one or more of the following operations: applying a set of initial power and gain parameters to the speaker; playing the excitation signal via a speaker; determining a sound level at the microphone location and a sound level at a predetermined distance from the speaker; determining a gain at the microphone location based on a difference between a sound level at the microphone location and a sound level at a predetermined distance from the speaker; and applying a gain to the speaker output.
Another example embodiment includes an apparatus comprising a processor configured to: applying a set of initial power and gain parameters to the speaker; playing the excitation signal via a speaker; determining a sound level at the microphone location and a sound level at a predetermined distance from the speaker; determining a gain at the microphone location based on a difference between a sound level at the microphone location and a sound level at a predetermined distance from the speaker; and applying a gain to the speaker output.
Another example embodiment includes a non-transitory computer-readable storage medium configured to store instructions that, when executed, cause a processor to: applying a set of initial power and gain parameters to the speaker; playing the excitation signal via a speaker; determining a sound level at the microphone location and a sound level at a predetermined distance from the speaker; determining a gain at the microphone location based on a difference between a sound level at the microphone location and a sound level at a predetermined distance from the speaker; and applying a gain to the speaker output.
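The gain determination described in these embodiments amounts to simple level arithmetic in decibels. A minimal sketch (the function names and the target-level convention are illustrative, not taken from the patent):

```python
def room_attenuation_db(spl_ref_db: float, spl_mic_db: float) -> float:
    """Level drop from the reference point near the speaker to the
    microphone position, in dB."""
    return spl_ref_db - spl_mic_db

def output_gain_db(target_spl_db: float, spl_ref_db: float,
                   attenuation_db: float) -> float:
    """Gain to add to the speaker output so that the microphone
    position reaches the target SPL, given the measured attenuation."""
    spl_mic_db = spl_ref_db - attenuation_db  # level currently at the mic
    return target_spl_db - spl_mic_db
```

For example, if the excitation signal measures 85 dB SPL at the reference distance and 79 dB at the microphone, reaching an 82 dB target at the microphone position requires 3 dB of added output gain.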
Another example embodiment includes a method comprising one or more of the following operations: starting an automatic tuning process; detecting, via one or more microphones, sound measurements associated with outputs of one or more speakers at two or more locations; determining a number of speech transmission index (STI) values equal to the number of microphones; and averaging the speech transmission index values to identify a single speech transmission index value.
Another example embodiment includes an apparatus comprising a processor configured to: start an automatic tuning process; detect, via one or more microphones, sound measurements associated with outputs of one or more speakers at two or more locations; determine a number of speech transmission index (STI) values equal to the number of microphones; and average the speech transmission index values to identify a single speech transmission index value.
Another example embodiment includes a non-transitory computer-readable storage medium configured to store instructions that, when executed, cause a processor to perform one or more of: starting an automatic tuning process; detecting, via one or more microphones, sound measurements associated with outputs of one or more speakers at two or more locations; determining a number of speech transmission index (STI) values equal to the number of microphones; and averaging the speech transmission index values to identify a single speech transmission index value.
Drawings
Fig. 1 illustrates a controlled speaker and microphone environment according to an example embodiment.
Fig. 2 illustrates a process for performing an auto-tuning process in a controlled speaker and microphone environment according to an example embodiment.
Fig. 3 illustrates a process for performing an automatic equalization process in a controlled speaker and microphone environment, according to an example embodiment.
Fig. 4 illustrates an audio configuration for identifying gain levels in a controlled speaker and microphone environment according to an example embodiment.
Fig. 5 illustrates an audio configuration for identifying sound pressure levels (SPL) in a controlled speaker and microphone environment according to an example embodiment.
Fig. 6A shows a flow chart of an auto-tuning process in a controlled speaker and microphone environment according to an example embodiment.
Fig. 6B shows a flow chart of another auto-tuning process in a controlled speaker and microphone environment according to an example embodiment.
Fig. 7 shows another flow chart of an auto-configuration process in a controlled speaker and microphone environment according to an example embodiment.
Fig. 8 shows a flow chart of an automatic equalization process in a controlled speaker and microphone environment, according to an example embodiment.
Fig. 9 shows a flow chart of an automatic gain identification process in a controlled speaker and microphone environment according to an example embodiment.
Fig. 10 shows a flowchart of an automatic speech intelligibility determination process in a controlled speaker and microphone environment, according to an example embodiment.
Fig. 11 shows a system configuration for storing and executing an auto-tuning process.
Detailed Description
It will be readily understood that the instant components, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of embodiments of at least one of the methods, apparatus, non-transitory computer-readable media, and systems represented in the accompanying drawings is not intended to limit the scope of the application as claimed, but is merely representative of selected embodiments.
The instant features, structures, or characteristics may be combined in any suitable manner in one or more embodiments throughout this specification. For example, throughout this specification, the use of the phrase "example embodiments," "some embodiments," or other similar language refers to the fact that: a particular feature, structure, or characteristic described in connection with the embodiments may be included within at least one embodiment. Thus, appearances of the phrases "in an example embodiment," "in some embodiments," "in other embodiments," or other similar language throughout this specification do not necessarily all refer to the same group of embodiments, but the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, although the term "message" may be used in the description of the embodiments, the present application is applicable to various types of network data, such as packets, frames, datagrams, and the like. The term "message" also includes packets, frames, datagrams and equivalents thereof. Furthermore, although certain types of messages and signaling may be depicted in the exemplary embodiments, they are not limited to a certain type of message, nor is the application limited to a certain type of signaling.
The startup process for establishing auto-tune and configuration settings for an audio system may include a series of operations. In the auto-configuration phase, the system firmware may use an Ethernet-based networking protocol to discover peripheral devices connected to the central controller device. These peripheral devices may include beam-tracking microphones, amplifiers, Universal Serial Bus (USB) and Bluetooth (BT) I/O interfaces, and telephone dialer devices. The device firmware then modifies its own configuration and the configuration of the discovered peripheral devices to correlate them and route the associated audio signals through the appropriate audio signal processing functions. The auto-tuning phase has three sub-phases: microphone and speaker detection, tuning, and verification.
Not every amplifier output channel (not shown) managed by the controller device may have a speaker attached. During the microphone and speaker detection phase, each amplifier channel in turn plays a unique detection signal. During the playing of each detection signal, all of the input signals detected by the microphones are monitored simultaneously. With this technique, unconnected amplifier output channels are identified and the integrity of each microphone input signal is verified. During the tuning phase, each connected amplifier output channel in turn plays other unique test signals. These signals are again monitored simultaneously by all microphones. By knowing the frequency response(s) of the microphones in advance, and using various audio processing techniques, the firmware can calculate the background noise level and noise spectrum of the room, the sensitivity of each amplifier channel and connected speaker (the room SPL generated at a given signal level), the frequency response of each speaker, the distance of each microphone to each speaker, the room reverberation time (RT60), etc. Using these calculations, the firmware can compute tuning parameters to optimize the level settings for each speaker channel to achieve a given target SPL, and to optimize the equalization (EQ) settings for each speaker channel to normalize the frequency response of the speakers and achieve a target room frequency response. Acoustic echo cancellation (AEC), noise reduction (NR), and nonlinear processing (NLP) settings are likewise set to be most suitable and efficient for the room environment.
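For example, the microphone-to-speaker distance mentioned above can be read off the measured impulse response: the arrival sample of the direct-path peak, divided by the sample rate, times the speed of sound. A minimal sketch (it assumes playback and capture are sample-synchronized and that fixed system latency has already been removed):

```python
import numpy as np

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound at ~20 degrees C

def speaker_mic_distance_m(impulse_response: np.ndarray, fs: int) -> float:
    """Estimate speaker-to-microphone distance from the arrival time
    of the direct-path peak in a measured impulse response."""
    peak_index = int(np.argmax(np.abs(impulse_response)))
    return peak_index / fs * SPEED_OF_SOUND_M_S
```

At a 48 kHz sample rate, a direct-path peak arriving 480 samples after the excitation corresponds to 10 ms of flight time, i.e. about 3.43 m.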
The verification phase occurs after the tuning parameters are applied. At this stage, the test signal is again played sequentially from each connected amplifier output channel and monitored simultaneously by all microphones. The measurement results are used to verify that the system reaches the target SPL and the target room frequency response. Also during the verification phase, all speakers simultaneously play a specially designed speech intelligibility test signal and are monitored simultaneously by all microphones. Speech intelligibility is an industry-standard measure of how well a listener can correctly recognize and understand speech. Most of the measurements made during automatic setup, along with the settings applied, are provided in an information report available for download from the device.
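The per-microphone STI values described here are reduced to a single room score by averaging. A minimal sketch (the qualitative bands follow the nominal IEC 60268-16 rating scale and are not taken from this patent):

```python
def average_sti(sti_values):
    """Collapse per-microphone STI measurements into one room score."""
    if not sti_values:
        raise ValueError("need at least one STI measurement")
    return sum(sti_values) / len(sti_values)

def sti_rating(sti: float) -> str:
    """Qualitative band on the nominal IEC 60268-16 scale (0..1)."""
    if sti < 0.30:
        return "bad"
    if sti < 0.45:
        return "poor"
    if sti < 0.60:
        return "fair"
    if sti < 0.75:
        return "good"
    return "excellent"
```

For example, three microphones reporting 0.62, 0.58, and 0.66 average to an STI of 0.62, which falls in the "good" band.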
Example embodiments provide a system that includes a controller or central computer system to manage multiple microphones and speakers and provide audio-optimized tuning management in a particular environment (e.g., a workplace, a conference room, a conference hall, multiple rooms on different floors, etc.). Automatic tuning of an audio system includes tuning various sound levels, performing equalization, identifying a target sound pressure level (SPL), determining whether compression is required, measuring speech intelligibility, determining an optimal gain approximation to apply to a speaker/microphone, and so forth. The environment may include multiple microphone and speaker areas, where the various speakers are separated by different distances. Third-party test equipment is not ideal and does not provide simplified scalability. Ideally, identifying the network components active on a network and using only those components to set up an optimized audio platform for conference or other presentation purposes is the best choice in terms of time, expertise, and cost.
The automatic equalization process is capable of automatically equalizing the frequency response of any speaker in any room to any desired response shape, which may be defined by a flat line and/or a parameterized curve. The process may not run in real time during an active program audio event, but rather during the system setup process. This process considers and equalizes the log-amplitude frequency response (decibels versus frequency) and may not attempt to equalize the phase. The process identifies the optimal filters whose frequency response is very close to the inverse of the measured response, thus flattening the curve or reshaping it to other desired response values. The process may use a single biquad infinite impulse response (IIR) filter, which may be a bell-shaped (boost or cut) parametric, low-pass, and/or high-pass filter. FIR filters may also be used, but IIR filters have better computational efficiency and low-frequency resolution, and are more suitable for spatial averaging or equalization over a wide listening area of a room.
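For illustration, a single bell-shaped biquad of the kind described can be designed with the widely used Audio EQ Cookbook formulas. This is a generic sketch of the building block, not the patent's actual filter-fitting procedure:

```python
import math

def peaking_biquad(fs: float, f0: float, gain_db: float, q: float):
    """Audio EQ Cookbook peaking (bell) biquad.

    Returns normalized coefficients (b, a) with a[0] == 1, suitable
    for a direct-form IIR filter: boost (gain_db > 0) or cut
    (gain_db < 0) centered at f0 Hz with bandwidth set by q.
    """
    a_lin = 10.0 ** (gain_db / 40.0)          # sqrt of linear gain
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0 = 1.0 + alpha * a_lin
    b1 = -2.0 * math.cos(w0)
    b2 = 1.0 - alpha * a_lin
    a0 = 1.0 + alpha / a_lin
    a1 = -2.0 * math.cos(w0)
    a2 = 1.0 - alpha / a_lin
    return ([b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0])
```

A 0 dB design degenerates to a unity (pass-through) filter, which is a quick sanity check on the coefficients.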
The desired target frequency response is identified when performing the equalization process. Typically, this will be a flat response with low- and high-frequency roll-offs, to avoid designing a filter bank that attempts to achieve an unachievable result through the frequency-limited speaker(s). The target mid-band response need not be flat, and this process allows for an arbitrary target frequency response in the form of a biquad filter array. This process also allows the user to set maximum dB boost or cut limits on the overall DSP filter bank to be applied before any auto-tuning process.
Fig. 1 illustrates a controlled speaker and microphone environment according to an example embodiment. Referring to fig. 1, the illustration shows an audio control environment 112 that may have any number of speakers 114 and microphones 116 to detect audio, play audio back, adjust audio output levels, etc. via an auto-tuning process. The arrangement 100 may include various different areas 130-160 separated by spaces, walls, and/or floors. The controller 128 may be in communication with all of the audio elements, and the controller 128 may include software applications, computers, processors, etc. for receiving and generating audio. In this example, the frequency response may be obtained by measuring the speakers using a chirp response measurement technique.
Regarding the setup process, a startup option (auto-setup + auto-tune) at the front end of the user interface of a user device in communication with the controller 128 may provide a method to test the sound profile of the room(s), speaker(s), and microphone(s). Network discovery may be used to find devices that are plugged in, include them in a list of system devices, and provide them with a baseline configuration that is initiated at run time. During the device discovery process, the audio system may be displayed in a graphical format, and the operator may then drag and drop the data to obtain a more customizable experience, or reset to factory default levels. If the system is not sufficiently tuned to a certain level, an alarm may be generated, and any miswiring can be found by sending test signals to all known devices.
The audio environment typically includes various components and devices, such as microphones, amplifiers, speakers, DSP devices, and so forth. After installation, these devices need to be configured to act as an integrated system. The software application may be used to configure certain functions performed by each device. The controller or central computing device may store a configuration file that may be updated during installation to include the newly discovered audio profile.
One method of performing an auto-tuning process may include allowing the auto-tuning process to run on a device that also contains custom DSP processing. To enable this combined function, the code will find the appropriate signal injection and monitoring points in the custom configuration. Any selected DSP processing topology will automatically be compatible once the injection and monitoring points are identified. Some operations in the auto-tuning process send test signals from each speaker one by one, which increases the total measurement time when there are many speakers. Other operations may include transmitting test signals from all speakers simultaneously or within overlapping time periods, and performing the test procedure on the received and processed aggregate sound.
To reduce the total measurement time, different signals may be played from each speaker at the same time. Some methods of providing such a mixed signal may include: generating a specific sine wave for each speaker, where each speaker uses a unique frequency; playing a short piece of music in which each speaker plays a unique instrument in the mix; or pairing tones of different frequencies with each speaker separately. In the case of a large number of speakers, a song containing multiple percussion instruments may be used, with each speaker corresponding to a drum sound. Any other mix of multi-channel sounds may be used to drive the dynamic and/or custom sound testing process. Other sound event detection algorithms exist that can detect the presence of one sound in a mix of many other sounds, which may be useful in the present test analysis process. The auto-tuning may use a combination of voice prompts and test signals played by each speaker. The test signals are used to gather information about the amplifiers, speakers, and microphones in the system, as well as the locations of these devices in the acoustic space.
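The unique-frequency-per-speaker approach above can be sketched as follows (the specific base frequency, spacing, and duration are illustrative choices, not values from the patent):

```python
import numpy as np

def speaker_test_tones(num_speakers: int, fs: int = 48000,
                       dur_s: float = 2.0, base_hz: float = 500.0,
                       step_hz: float = 250.0):
    """Generate one distinct sine per speaker so all speakers can play
    simultaneously and still be separated at the microphones by
    frequency. Speaker i gets base_hz + i * step_hz."""
    t = np.arange(int(fs * dur_s)) / fs
    return [np.sin(2.0 * np.pi * (base_hz + i * step_hz) * t)
            for i in range(num_speakers)]
```

Each microphone recording can then be checked for energy at each speaker's assigned frequency (e.g., with an FFT peak search or a Goertzel detector) to verify which speaker channels are live.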
Other signals may also be used to collect the same room and device information. The decision to use different signals may be based on different objectives, such as which signals are acoustically pleasing and which signals may include voice and/or music cues. This has the advantage of avoiding the playing of scientific-sounding test tones in the space. A potential disadvantage is that additional time is required to extract room and device information from non-ideal signal sources. To reduce the total measurement time, voice prompts may be eliminated and the basic test signal that produces the fastest results may be used.
The automatic equalization process (see fig. 3) is capable of automatically equalizing the frequency response of any speaker in any room to any desired response shape, which may be defined by a flat straight line and/or a parameterized curve. The process may not run in real time during an active program audio event, but rather during the system setup process. The process equalizes the log-amplitude frequency response (decibels versus frequency) and may not equalize the phase. The process identifies a set of optimal filters whose frequency response closely matches the inverse of the measured response, flattening the response or reshaping it to other desired response values. The process uses a single biquad IIR filter, which may be bell-shaped (e.g., a boost or cut parametric filter), low-pass, or high-pass. FIR filters may be used, but IIR filters have better computational efficiency and low-frequency resolution and are more suitable for spatial averaging and/or equalization over a wide listening area of a room.
In performing the equalization process, a desired target frequency response is first identified. Typically, this will be a flat response with low- and high-frequency roll-offs, to avoid the process designing a filter bank that tries to achieve an unachievable result with frequency-limited speakers. The target mid-band response need not be flat, and this process allows any arbitrary target frequency response in the form of a biquad filter array. The process also allows the user to set maximum dB boost or cut limits on the overall DSP filter bank to be applied.
An example process associated with the automatic setup process (see fig. 2) may sequence through each speaker output channel and perform the following for each output: raising the multitone signal until a desired SPL level is detected; determining whether the speaker output channel is operating properly; determining whether all microphone (mic) input channels are operating properly; setting preliminary output gains of the unknown amplifier and speaker for the test signal; measuring the ambient noise from all microphones to set a reference for the RT60 measurement, RT60 being the time required for sound to decay 60 dB in a space with a diffuse sound field; checking for excessive noise; providing a chirp test signal; recording the chirp responses from all "N" microphones simultaneously into an array; deconvolving all chirps from the "N" microphones to obtain "N" impulse responses; and, for each microphone input: locating the peak of the main impulse and calculating the distance from the speaker to the microphone; calculating a smoothed log-magnitude frequency response and applying microphone compensation values (using known microphone sensitivities); and calculating the SPL average across all frequencies. The process continues by averaging the frequency responses of all microphones to obtain a spatial average; performing automatic equalization on the spatially averaged response to match the target response; using the SPL levels and distances of the nearest and farthest microphones to calculate the room attenuation; calculating an output gain using the SPL from the nearest microphone and the room attenuation to reach a desired level at the average distance of all microphones; calculating an SPL limiter threshold with automatic equalization and automatic gain enabled; generating a chirp to measure and verify the response; measuring the octave-band RT60 for each microphone; and measuring the average SPL for each microphone, then averaging over all microphones to obtain the achieved SPL level.
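The step of locating the main-pulse peak and converting its time of flight into a speaker-to-microphone distance can be sketched in a few lines (a minimal illustration, assuming the chirp deconvolution has already produced the impulse response and a speed of sound of roughly 343 m/s; the function name is hypothetical):

```python
# Sketch: estimate speaker-to-microphone distance from the main peak of an
# impulse response, as in the auto-setup step above.
SPEED_OF_SOUND_M_S = 343.0  # approximate, at ~20 degrees C

def distance_from_impulse(ir, sample_rate_hz):
    """Locate the main-pulse peak and convert its time of flight to meters."""
    peak_index = max(range(len(ir)), key=lambda n: abs(ir[n]))
    time_of_flight_s = peak_index / sample_rate_hz
    return time_of_flight_s * SPEED_OF_SOUND_M_S

# Example: a synthetic impulse response whose main peak is at sample 441
# of a 44.1 kHz capture, i.e., 10 ms of flight, or about 3.43 meters.
ir = [0.0] * 2000
ir[441] = 1.0
ir[700] = 0.3   # a weaker surface reflection
print(round(distance_from_impulse(ir, 44100), 2))
```

In a real capture the peak search would be preceded by noise gating so that a strong early reflection is not mistaken for the direct path.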
Another example embodiment may include an automatic setup process, the process including: determining which input microphones are operating and which output speaker channels are operating; performing automatic equalization on each output speaker channel to achieve any desired target frequency response (defined by parameterized EQ parameters); automatically setting each output path gain to reach a target SPL level at the center of the room, the center being approximated by the average speaker-to-microphone distance; automatically setting an output limiter for a maximum SPL level at the center of the room; automatically setting Automatic Echo Cancellation (AEC), nonlinear processing (NLP), and Noise Reduction (NR) values based on room measurements; measuring the frequency response of each output speaker channel in the room; measuring the final nominal SPL level expected at the center of the room for each output channel; measuring the octave-band and full-band reverberation times of the room; measuring the noise spectrum and octave-band noise of each microphone; measuring the Noise Criterion (NC) level of the room; and measuring the minimum, maximum, and average distances of all microphones from the speakers, as well as the speech intelligibility of the room. All of the measurement data can be used to establish optimal speaker and microphone configuration values.
In one example audio system setup process, a start operation (i.e., auto-setup + auto-tune) on the user interface may provide a way to begin testing the sound profiles of the room, speakers, and microphones. Network discovery may be used to find devices that are plugged in and include them in the list of system devices, providing them with a baseline configuration from which to start in an audio usage scenario. The audio system may be presented in a graphical format during the device discovery process; an operator may interact with the display and drag and drop data for a more customizable experience, or the system may be reset to factory default levels before or after automatic system configuration. If the system cannot be tuned to a sufficient level, an alarm may be generated, and any misconnections may be discovered by sending test signals to all known devices.
An audio environment typically includes various components and devices, such as microphones, amplifiers, speakers, Digital Signal Processing (DSP) devices, and so forth. After installation, these devices need to be configured to act as an integrated system. Application software may be used to configure certain functions performed by each device. A controller or central computing device may store a configuration file that may be updated during installation, based on the currently installed hardware, audio environment profile(s), and/or desired configuration, to include newly discovered audio profiles. In one example embodiment, the auto-tuning process may tune the audio system, including all accessible hardware managed by the central network controller. The audio input/output levels, equalization, and average Sound Pressure Level (SPL)/compression values may all be selected to achieve optimal performance in a particular environment.
During automatic setup, it is determined which input microphones are in operation and which output speaker channels are in operation. Each output speaker channel is automatically equalized to achieve the desired target frequency response (defined by parameterized EQ parameters, high-pass filters, low-pass filters, etc.). The default option may be a "flat" response. Other operations may include: automatically setting each output path gain to achieve a target SPL level for a user at the center of the room (assuming the average distance of the microphones); and automatically setting an output limiter to achieve a maximum SPL level for a user at the center of the room. Another function may include automatically determining Automatic Echo Cancellation (AEC), nonlinear processing (NLP), and Noise Reduction (NR) values based on room measurements. Informational measurements may also be made, including measuring the frequency response of each output speaker channel within the room, measuring the final nominal SPL level each output channel is expected to produce at the center of the room, measuring the octave-band reverberation time (RT-60) of the room, and measuring the noise floor of the room. Other functions may include measuring the minimum, maximum, and average distances of all microphones from the speakers. These values may provide the information needed to perform other automatic settings, such as setting the high-pass filter cutoff frequency of a beam-tracking microphone based on the reverberation time of the room's low-frequency band, and fine-tuning the adaptive filter profile of the AEC to best match the expected echo characteristics of the room. The obtained information may be saved in memory and used by an application to provide examples of meeting room acoustic features and sound quality characteristics.
Based on the room audio characteristics, suggestions may be provided, such as increasing the spacing between the microphones and speakers, or making acoustic adjustments to the room via the speakers and microphones, for example when RT-60 (a reverberation "score" used to predict speech intelligibility) is too high.
The audio setup process may include a series of operations, such as pausing any kind of conference audio layout function and handing input (microphone) and output (speaker) control to the automatic setup application. Each output speaker participating in the automatic setup will in turn produce a series of tones and/or "chirps" designed to capture the acoustic characteristics of the room. The number of sounds produced in a room is directly related to the number of inputs and outputs involved in the automatic setup process. For example, in a system with three microphones and two speakers, the automatic setup will perform the following actions: speaker 1 produces a series of sounds which are captured by microphone 1, speaker 1 produces a series of sounds which are captured by microphone 2, and speaker 1 produces a series of sounds which are captured by microphone 3; speaker 2 produces a series of sounds which are captured by microphone 1, speaker 2 produces a series of sounds which are captured by microphone 2, and speaker 2 produces a series of sounds which are captured by microphone 3. After this process is completed, normal conference layout audio processing resumes. Based on the automatic setup process, the gain and equalization of each speaker are adjusted, AEC performance for the room is tuned, and the in-room microphone LPF is tuned; the acoustic properties of the room have also been recorded. Optionally, the user may also be presented with some summary data describing the results of the automatic setup process. During processing, the automatic setup may "fail" if a defective microphone or speaker is found, or if an unexpected loud sound (e.g., street noise) is captured. If this is the case, the automatic setup will stop and alert the end user.
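The speaker-by-speaker, microphone-by-microphone sequence above can be sketched as a simple nested loop (illustrative only; the function name is hypothetical, and in a simultaneous-capture variant one chirp per speaker would serve all microphones at once):

```python
def auto_setup_schedule(num_speakers, num_mics):
    """Return the ordered (speaker, mic) capture pairs for auto-setup.

    Each speaker in turn produces its series of tones/chirps, and each
    microphone captures it, so the number of captures is speakers * mics.
    """
    return [(spk, mic)
            for spk in range(1, num_speakers + 1)
            for mic in range(1, num_mics + 1)]

# The 2-speaker / 3-microphone example from the text: six captures,
# starting with speaker 1 / microphone 1 and ending with speaker 2 /
# microphone 3.
schedule = auto_setup_schedule(2, 3)
print(len(schedule))
print(schedule[0], schedule[-1])
```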
In addition, a friendly automatic setup voice may be used to explain to the user what is being done throughout the automatic setup process.
Fig. 2 illustrates an automatic equalization process, including an iterative process for multiple speakers in an environment. Referring to fig. 2, during startup, a user interface may be used to control the start and "auto-tune" options. Memory allocation operations may be performed to detect certain speakers, microphones, etc., and the identified network elements may be stored in memory. The tuning process may then be performed such that the operations of fig. 2 are initiated. Each speaker may receive an output signal (202) as input (204) to produce a sound or signal. The ambient noise level near the speaker, as detected by the microphone, may also be identified (206). Multiple tones may be sent to different speakers (208), the tones measured, and the values stored in memory. Furthermore, the chirp response (210) may be used to determine the levels of the speakers and the corresponding room/environment. An impulse response may be identified (212) and a corresponding frequency response value calculated based on the input (214). Furthermore, a speech intelligibility level (speech transmission index (STI)) can be calculated, as well as an "RT60" value, which is the time required for sound to decay 60 dB in a space with a diffuse sound field, meaning the room is large enough that reflections of the sound source reach the microphone from all directions at the same level. An average of the input values (216) may be determined to estimate an overall sound value for the corresponding network element. Averaging may include summing the input values and dividing by the number of input values.
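The RT60 value mentioned above is commonly estimated from a measured impulse response by Schroeder backward integration: integrate the squared impulse response from the tail backward, fit the decay slope of that energy curve in dB, and extrapolate to a 60 dB decay. The sketch below illustrates that standard technique on a synthetic exponential decay (the patent does not specify its exact RT60 estimator; the fit range and names are assumptions):

```python
import math

def rt60_schroeder(ir, sample_rate_hz, fit_lo_db=-5.0, fit_hi_db=-25.0):
    """Estimate RT60 via Schroeder backward integration of ir**2,
    fitting the decay slope between fit_lo_db and fit_hi_db and
    extrapolating to a 60 dB decay."""
    energy = [s * s for s in ir]
    total = sum(energy)
    edc_db, running = [], total
    for e in energy:                      # backward integral, expressed
        edc_db.append(10.0 * math.log10(running / total))  # as dB decay
        running -= e
        if running <= 0:
            break
    # Least-squares line fit over the chosen decay range (a T20-style fit).
    pts = [(n / sample_rate_hz, db) for n, db in enumerate(edc_db)
           if fit_hi_db <= db <= fit_lo_db]
    n = len(pts)
    sx = sum(t for t, _ in pts); sy = sum(d for _, d in pts)
    sxx = sum(t * t for t, _ in pts); sxy = sum(t * d for t, d in pts)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # dB per second
    return -60.0 / slope

# Synthetic diffuse-field decay engineered to fall 60 dB in 0.5 s.
fs = 8000
a = 60.0 * math.log(10) / (20.0 * 0.5 * fs)  # per-sample amplitude decay
ir = [math.exp(-a * n) for n in range(fs)]   # 1 s of decay
print(round(rt60_schroeder(ir, fs), 2))
```

On the synthetic input the estimate recovers the designed 0.5 s reverberation time.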
Continuing with the same example, automatic equalization may be performed based on a spatial average of the input responses (218). The automatic equalization levels may be output (222) until the process is complete (224). When the output is complete (224), the output values are set (226), which may include the parameters used when outputting audio signals to the various speakers. During the verification process (230), the process continues iteratively and may include similar operations for each speaker, e.g., 202, 204, 210, 212, 214, 216. Further, in the iterative verification process, a speech intelligibility measurement may be performed until all output values are identified. If the output is not complete in operation 224, the automatic equalization levels (225) are used while measuring the next output value of the next speaker (i.e., iteratively), continuing until the outputs of all speakers are measured and stored.
The automatic setup operation relies on measuring speaker, microphone, and room parameters using a chirp signal and chirp deconvolution to obtain an impulse response. Chirp deconvolution can be used to obtain a high-quality Impulse Response (IR), largely free of noise, system distortion, and surface reflections, using a practical FFT size. One factor that affects the effectiveness of the automatic setup process is the degree of knowledge of the system components (e.g., microphone, power amplifier, and speaker). With the component frequency responses known, a Digital Signal Processor (DSP) should apply corrective equalization before generating and recording any chirp signals, to improve the accuracy of the chirp measurements.
An automatic equalization process may be used to equalize the frequency response of any speaker in any room to achieve a desired response shape (e.g., a flat line and/or a parameterized curve). This process may utilize bell-type biquad IIR filters, a single biquad per iteration. The process may begin with a desired target frequency response having a low-frequency roll-off and a high-frequency roll-off, to avoid running into the limitations of filters built for specific speakers and rooms. The target response (H_target) may be a flat response with a low-frequency roll-off. Using a chirped stimulus/response measurement, the measured frequency response of the speaker in the room can be obtained. The response is normalized to a 0 dB average, and the high- and low-frequency limits can be used to bound the data used for equalization. The process calculates the average level between the limits and subtracts that average from the measured response to provide a response normalized to "0" (H_meas). The frequency-limited target filter is then determined by subtracting the measured response from the target response: H_targfilt = H_target - H_meas. This value is the target response for the next auto-EQ biquad filter.
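The normalization and subtraction just described can be sketched directly (hypothetical values; responses are lists of dB magnitudes on a shared frequency grid):

```python
def target_filter(h_target_db, h_raw_db, freqs_hz, f_lo, f_hi):
    """Normalize the measured response to a 0 dB average between the
    frequency limits, then return H_targfilt = H_target - H_meas."""
    in_band = [h for f, h in zip(freqs_hz, h_raw_db) if f_lo <= f <= f_hi]
    avg = sum(in_band) / len(in_band)
    h_meas = [h - avg for h in h_raw_db]          # normalized to "0"
    return [t - m for t, m in zip(h_target_db, h_meas)]

# Hypothetical measured response with a +6 dB bump around 200 Hz,
# against a flat 0 dB target, limits 100 Hz to 8 kHz.
freqs = [100, 200, 400, 800, 1600, 3200, 6400]
measured = [62.0, 68.0, 62.0, 62.0, 62.0, 62.0, 62.0]
target = [0.0] * len(freqs)
filt = target_filter(target, measured, freqs, 100, 8000)
print([round(x, 2) for x in filt])
```

The resulting target filter is negative where the room/speaker response bulges (a cut is needed) and positive elsewhere.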
In order to find a parameterized filter fitting H_targfilt, all important curve features (0 dB crossing points and peak points) are found by a function named FindFreqFeatures().
The filter selection at the two frequency limits is slightly different. If the target filter requires a boost at a frequency limit, a PEQ boost filter will be used with its center frequency at the limit frequency. If the target filter requires attenuation at a frequency limit (which typically occurs in the case of a target response roll-off), an HPF/LPF is selected and the -3 dB corner frequency is calculated to match the point where the curve is at -3 dB. This approach can produce a better match beyond the auto-EQ range, especially when a roll-off response is required (which is most often the case). Once all the frequency features of the target filter are identified, the most prominent biquad filter can be found using a function named FindBiggestArea(): the largest area under the target filter curve characterizes the most prominent biquad filter.
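A minimal sketch of splitting the target filter at its 0 dB crossings and selecting the largest-area region (the function names in the text are the patent's; this discrete-grid implementation is illustrative only):

```python
def biggest_region(targfilt_db):
    """Split the target-filter curve into regions at 0 dB crossings and
    return (start_index, end_index, area) of the region with the largest
    absolute area, i.e., the most prominent candidate biquad."""
    regions, start, area = [], 0, 0.0
    for i in range(1, len(targfilt_db)):
        if targfilt_db[i - 1] * targfilt_db[i] < 0:  # sign change = crossing
            regions.append((start, i - 1, abs(area)))
            start, area = i, 0.0
        # Plain trapezoid slice; a log-frequency axis would be more
        # faithful but keeps the sketch longer.
        area += 0.5 * (targfilt_db[i - 1] + targfilt_db[i])
    regions.append((start, len(targfilt_db) - 1, abs(area)))
    return max(regions, key=lambda r: r[2])

# Hypothetical curve: a small +1 dB ripple, then a deep -6 dB dip.
curve = [1.0, 1.0, -1.0, -6.0, -6.0, -1.0, 1.0]
print(biggest_region(curve))
```

Here the -6 dB dip (indices 2 through 5) dominates, so it would be the first region handed to the biquad-fitting stage.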
Based on these characteristics, a function named DeriveFilterParamsFromFreqFeatures() calculates three parameters (fctr, dB, Q) from the curve's center frequency, dB boost/cut, and bandwidth (Q). The bandwidth of a two-pole bandpass filter is defined as fctr/(f_upper - f_lower), where f_upper and f_lower are the points with linear amplitude .707 times the peak. The filter here is a 1+bandpass bell filter, but empirically it was found that using .707 times the peak (in dB, with a 0 dB baseline) also provides the best result for estimating the Q of the bell. The edge frequencies are not used to calculate the PEQ bandwidth, but rather to delineate two adjacent PEQ peaks. If a region represents attenuation at a frequency limit, the function calculates the LPF/HPF filter corner frequency where the response is -3 dB. From these filter parameters, auto-EQ biquad filter coefficients are calculated and added to the auto-EQ DSP filter bank. The updated DSP filter response (H_dspfilt) is added to the measured response (H_meas) {all values in dB} to give the automatically equalized response (H_autoeq). The auto-equalized response (H_autoeq) is then subtracted from the target response (H_target) to generate a new target filter (H_targfilt). This new target filter represents the error, i.e., the difference between the expected target response and the corrected response.
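The (fctr, dB, Q) derivation can be sketched using the .707 × peak rule described above (an illustrative implementation on a discrete grid, not the patent's exact code; names are hypothetical):

```python
def bell_params(freqs_hz, region_db):
    """Derive (fctr, gain_db, Q) for one bell region.

    fctr is the frequency of the extreme point, gain_db its level, and Q is
    fctr / (f_upper - f_lower), where f_upper/f_lower are the last points on
    either side of the peak still at or above .707 of the peak level (in dB,
    relative to a 0 dB baseline), per the heuristic described above.
    """
    peak_i = max(range(len(region_db)), key=lambda i: abs(region_db[i]))
    gain_db = region_db[peak_i]
    edge = 0.707 * abs(gain_db)
    lo = peak_i
    while lo > 0 and abs(region_db[lo - 1]) >= edge:
        lo -= 1
    hi = peak_i
    while hi < len(region_db) - 1 and abs(region_db[hi + 1]) >= edge:
        hi += 1
    fctr = freqs_hz[peak_i]
    q = fctr / (freqs_hz[hi] - freqs_hz[lo])
    return fctr, gain_db, q

# Hypothetical +6 dB bell centered at 1 kHz.
freqs = [500, 707, 1000, 1414, 2000]
bell = [1.0, 4.5, 6.0, 4.5, 1.0]
fctr, gain_db, q = bell_params(freqs, bell)
print(fctr, gain_db, round(q, 2))
```

The returned triple is what would then be turned into peaking-EQ biquad coefficients for the DSP filter bank.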
Fig. 3 illustrates a process for determining an automatic equalization filter bank applied to a speaker environment according to an example embodiment. Referring to fig. 3, the process may include: defining the target response as a list of HPF/LPF frequencies and biquad filters (302); measuring a chirp response from the microphone (304); normalizing the values between the frequency limits to 0 dB (306); subtracting the measured response from the target response to provide a target filter (308); finding the zero crossings and derivative zeros of the target filter (310); combining the two sets of zero frequencies in order to identify the frequency feature values (312); identifying the maximum area under the target filter curve (314); deriving parameters to fit a bell-shaped region, with edge frequencies at .707 times the peak (316); and determining whether the filter parameters are audible (318) and, if so, continuing to calculate biquad coefficients based on the identified filter parameters (320). The process continues with limiting the filter dB based on amplitude limits (322), adding the new limited filter to the DSP filter bank (324), adding the unlimited EQ filter to the measured response to provide an unlimited corrected response (326), and subtracting the corrected response from the target response to provide a new target filter (328). If all available biquad stages have been used (330), the process ends (332); otherwise the process continues back to operation (310).
To determine which speaker outputs are active (live), a five-tone multitone signal (five sine waves spaced one octave apart) is applied to the speakers and its level rapidly raised to quickly detect any connected active speakers. The multitone signal level is raised one speaker at a time while the signal levels from all microphones are monitored. As soon as the signal received by one microphone (mic) reaches the desired target Sound Pressure Level (SPL) of the audio system (i.e., the SPL threshold level), the multitone test signal is terminated and the speaker output channel is designated as operational. If the multitone test signal reaches a maximum "safety limit" and no microphone receives the target SPL level, the speaker output is designated as dead/off. The received multitone signal is passed through a set of five narrow bandpass filters. The purpose of the five test tones and the five bandpass filters is to prevent false detection of the speaker due to broadband ambient noise or a single tone generated by another source in the room. In other words, the audio system is generating and receiving a particular signal signature to distinguish the signal from other unrelated sound sources in the room. The same five-tone multitone used to detect operational speaker outputs is used at the same time to detect operational microphone inputs. The multitone detection signal is terminated once the highest microphone signal reaches the audio system target SPL level. At this point, all microphone signal levels are recorded. If a microphone signal is above some minimum threshold level, that microphone input is designated as an operational microphone input; otherwise it is designated as dead/off.
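Detecting the five-tone signature can be sketched with the Goertzel algorithm standing in for the five narrow bandpass filters (a common single-frequency detector; the patent does not specify the filter implementation, and the base frequency, threshold, and names here are assumptions):

```python
import math

def goertzel_power(samples, sample_rate_hz, freq_hz):
    """Power of `samples` at a single frequency (Goertzel algorithm)."""
    k = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + k * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - k * s1 * s2

def is_multitone(samples, sample_rate_hz, base_hz=250.0, num_tones=5,
                 rel_threshold=0.01):
    """True only if all five octave-spaced tones are simultaneously present,
    guarding against broadband noise or a single stray tone."""
    tones = [base_hz * 2 ** i for i in range(num_tones)]
    powers = [goertzel_power(samples, sample_rate_hz, f) for f in tones]
    return min(powers) > rel_threshold * max(powers)

# One second at 16 kHz: the full five-tone signature vs. a lone tone.
fs, n = 16000, 16000
t = [i / fs for i in range(n)]
multi = [sum(math.sin(2 * math.pi * 250 * 2 ** k * ti) for k in range(5))
         for ti in t]
single = [math.sin(2 * math.pi * 250 * ti) for ti in t]
print(is_multitone(multi, fs), is_multitone(single, fs))
```

The lone 250 Hz tone fails the check because its power at the four upper octaves is negligible, which is exactly the false-trigger case the five bandpass filters are meant to reject.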
To set the speaker output gain levels, the desired acoustic listening level in dB SPL will be determined and stored in firmware (the acoustic listening level). The gain of each DSP speaker output channel will be set to achieve this target SPL level. If the gain of the power amplifier is known and the sensitivity of the speakers is also known, these DSP output gains may be accurately set for a particular SPL level, for example referenced to one meter from each speaker (other distances are also contemplated and may be used as alternatives). The level at an estimated listener position will be less than this one-meter level. In free space, the sound level drops by 6 dB for every doubling of distance from the sound source. For a typical conference room, the sound level change for each doubling of distance from the sound source may be taken as -3 dB. If each listener is assumed to be between 2 meters and 8 meters from the nearest speaker and the gain is set for a middle distance of 4 meters, the generated sound level will be within +/-3 dB of the desired sound level. If the sensitivity of the speaker(s) is not known, the chirp response signal obtained from the nearest microphone will be used. The reason for using the nearest microphone is to minimize reflections and errors due to the estimated level loss versus distance. From the level of this response and the time of flight (TOF), the sensitivity of the speaker can be estimated, although the attenuation caused by off-axis pickup of the speaker is unknown. If the gain of the power amplifier is not known, a typical value of 29 dB will be used, which may introduce an SPL level error of +/-3 dB.
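The gain arithmetic described above (level referenced to 1 meter, -3 dB per doubling of distance in a typical room) can be sketched as follows, with hypothetical level values; setting the gain for the 4-meter midpoint leaves the 2 m and 8 m extremes exactly +/-3 dB from target:

```python
import math

def spl_at_distance(l_1m_db, distance_m, atten_db_per_doubling=-3.0):
    """Estimate SPL at `distance_m` from a speaker whose level 1 m away
    is l_1m_db, assuming a fixed attenuation per doubling of distance."""
    return l_1m_db + atten_db_per_doubling * math.log2(distance_m)

def gain_for_target(l_target_db, l_1m_db, distance_m,
                    atten_db_per_doubling=-3.0):
    """Output gain change (dB) needed to hit l_target_db at distance_m."""
    return l_target_db - spl_at_distance(l_1m_db, distance_m,
                                         atten_db_per_doubling)

# Hypothetical speaker producing 80 dB SPL at 1 m; target 72 dB SPL at 4 m.
g = gain_for_target(72.0, 80.0, 4.0)
near = spl_at_distance(80.0 + g, 2.0)   # closest assumed listener
far = spl_at_distance(80.0 + g, 8.0)    # farthest assumed listener
print(round(g, 2), round(near, 1), round(far, 1))
```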
An electroacoustic sound system may be analyzed to determine which gains should be used to reach the optimal sound level. The voltage, power, sound level, and gain values may be derived for any sound system, and these values can be used by a DSP processor to provide a specified SPL level at a particular location. Generally, an audio system will have a microphone, a speaker, a codec, a DSP processor, and an amplifier.
Fig. 4 illustrates an example configuration for identifying various audio signal levels and characteristics according to an example embodiment. Referring to fig. 4, this example includes a particular room or environment, such as a meeting room, having a person 436, and the person 436 is estimated to be approximately one meter from the speaker 434. The attenuation values are denoted as gain values. For example, G_PS = L_P - L_SPKR is the gain from the speaker to the person at one meter, which may be about -6 dB, for example. L_P is the acoustic sound pressure level at the person, irrespective of any particular average value, and L_SPKR is the sound pressure level at a distance of 1 meter from the speaker. G_MP is the gain from microphone 432 to the person, and G_MS is the gain from the microphone to the speaker. A power amplifier 424 may be used to drive the speaker, and DSP processor 422 may be used to receive and process data from the microphone to identify the optimum gain and power level to apply to speaker 434. Ideally, identifying these optimal values includes determining G_PS and G_MS. This helps achieve the desired sound level at the listener position, as well as set the DSP output gain and input preamplifier gain values.
In this example of fig. 4, some basic parameters are known about the microphone, amplifier, and speaker. L_sens,mic,1Pa (dBu) is the sensitivity of the analog microphone in dBu, referenced to 1 Pascal (Pa), in this example -26.4 dBu. G_amp is the gain of the power amplifier, in this example 29 dB. L_sens,spkr is the sensitivity of the speaker, in this example 90 dBA. Continuing with the present example, L_gen is the level of the signal generator (dBu), G_dsp,in is the DSP processor input gain, including the microphone preamplifier gain, in this example 54 dB, and G_dsp,out is the DSP processor output gain, in this example -24 dB. The excitation signal is played and the response signal measured, which may be, for example, 14.4 dBu, with L_1Pa = 94. In this example, the sound level at the microphone may be computed as L_mic = L_dsp - L_sens,mic,1Pa + L_1Pa - G_dsp,in = 14.4 - (-26.4) + 94 - 54 = 80.8 dBA. The sound level at 1 meter from the speaker is L_spkr = L_gen + G_dsp,out + G_amp + L_sens,spkr - L_sens,spkr,volts = 0 + (-24 dB) + 29 dB + 90 dBA - 11.3 dBu = 83.7 dB. G_MS can now be calculated: G_MS = L_mic - L_spkr = -2.9 dB. The estimate is based on a -2.5 dB value per doubling of distance in a typical conference room.
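The level bookkeeping in this example can be checked in a few lines (values taken directly from the example above; variable names follow the text):

```python
# Known parameters from the fig. 4 example.
L_dsp = 14.4              # measured response level, dBu
L_sens_mic_1Pa = -26.4    # microphone sensitivity, dBu re 1 Pa
L_1Pa = 94.0              # SPL corresponding to 1 Pa, dB
G_dsp_in = 54.0           # DSP input gain incl. mic preamp, dB
L_gen = 0.0               # signal generator level, dBu
G_dsp_out = -24.0         # DSP output gain, dB
G_amp = 29.0              # power amplifier gain, dB
L_sens_spkr = 90.0        # speaker sensitivity, dBA
L_sens_spkr_volts = 11.3  # speaker sensitivity reference voltage, dBu

# Sound level at the microphone.
L_mic = L_dsp - L_sens_mic_1Pa + L_1Pa - G_dsp_in
# Sound level 1 meter from the speaker.
L_spkr = L_gen + G_dsp_out + G_amp + L_sens_spkr - L_sens_spkr_volts
# Microphone-to-speaker gain.
G_MS = L_mic - L_spkr
print(round(L_mic, 1), round(L_spkr, 1), round(G_MS, 1))
```

The three printed values reproduce the 80.8 dBA, 83.7 dB, and -2.9 dB figures from the text.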
In the case where the gains and other parameters of the microphone, power amplifier, and speaker are unknown, typical values may be assumed: a microphone sensitivity of -38 dBu +/-12 dB, a power amplifier gain of 29 dB +/-3 dB, and a speaker sensitivity of 90 dBA +/-5 dB. The above formulas are needed to calculate the DSP gains for the desired sound level and to achieve the dynamic range. The desired listener level L_P can then be identified through various gain measurements.
Fig. 5 illustrates a process for identifying Sound Pressure Levels (SPLs) in a controlled speaker and microphone environment according to an example embodiment. Referring to fig. 5, the example includes a listener 436 in a simulation model, the listener 436 being at a distance D_P from the speaker 534 in a particular room. In free space, the sound level decays 6 dB for each doubling of distance. However, in a room, this attenuation will be less than 6 dB due to reflection and reverberation. A typical value of sound level attenuation in a conference room is about 3 dB for every doubling of distance; small and/or reflective rooms will typically show less attenuation than this value, and large and/or absorptive rooms more.
Multiple microphones are used to produce a desired SPL at a particular location (the desired listener level L_P at a distance D_P from speaker 534), given a known level L_1 at 1 meter from the speaker 534, the attenuation per doubling of distance, and the sensitivity of the speaker. All of these parameters can be determined from one chirp measured simultaneously at two positions (shown as D1 and D2). Assuming uniform level attenuation in the room, the amount of attenuation per doubling of distance can be calculated from any two measurements (at two different locations) in the room. This assumption becomes more valid as the room size increases and/or the sound field becomes more diffuse, and is also more valid as an average attenuation over all frequencies. The attenuation per doubling of distance can be derived as α_dd = -(L_1 - L_2)/log2(D_2/D_1), where L = SPL level, D = distance, and α_dd, negative in this case, is the attenuation value treated as a negative gain. The measurement locations for L_1 and L_2 may be at any distances from the speaker and in any order (i.e., not necessarily D_2 > D_1). Next, the sensitivity of the speaker, i.e., the SPL level 1 meter from the speaker for a given reference drive voltage, must be measured. If the measurement is made at a distance other than 1 meter from the speaker, the level 1 meter from the speaker can be calculated using α_dd and the number of "distance doublings" relative to 1 meter. The number of doublings relative to 1 meter can be calculated using the expression OneMeterDoublings = log2(D_1). The level at 1 meter can now be calculated as L_1m = L_1 - OneMeterDoublings*α_dd. If the electrical test signal used is at the electrical reference level of the speaker sensitivity, typically 2.83 V (1 W at 8 ohms), then L_1m = L_sens,spkr.
However, if the speaker drive voltage differs, L_sens,spkr can simply be calculated using the equation L_sens,spkr = L_1m - L_dsp,FSout - G_dsp,out - G_amp - G_attn,out + L_sens,spkr,volts. L_sens,spkr is the sensitivity of the speaker, L_dsp,FSout is the full-scale output sensitivity of the DSP processor, G_dsp,out is the DSP output gain, G_amp is the gain of the power amplifier, G_attn,out is the gain of any attenuator, and L_sens,spkr,volts is the speaker sensitivity reference voltage expressed in dBu.
Since α_dd of the room and the speaker sensitivity have been identified, the speaker drive level (or DSP output gain) required to produce the desired level L_P at listener distance D_P can be determined by first calculating the number of one-meter doublings for the listener position: OneMeterDoublings = log2(D_P). Next, the level required at 1 meter from the speaker can be calculated: L_1m = L_P - OneMeterDoublings*α_dd. Finally, the speaker drive level or DSP output gain can be identified by: G_dsp,out = L_1m - L_sens,spkr - L_dsp,FSout - G_amp - G_attn,out + L_sens,spkr,volts.
In the example of fig. 5, one end of the room has a speaker, and the goal is to calculate the DSP output gain required to produce a desired SPL level, for example 72.0 dB SPL at a distance of 11.92 meters from the speaker. The SPL level is wideband and unweighted, so an unweighted full-range chirp test signal is used. There are exactly two microphones in the room, but their distances from the speaker are unknown, as is the speaker sensitivity. The known system parameters are: L_dsp,FSout = +20.98 dBu, G_dsp,out = -20.27 dB (the DSP output gain used for the chirp measurement), G_amp = 29.64 dB, G_attn,out = -19.1 dB, and L_sens,spkr,volts = +11.25 dBu (2.83 V). The process is summarized in seven operations: 1) generate a chirp and measure the response at two or more locations; a single chirp is generated and the responses from the two microphones are recorded, revealing L_1 = 82.0 dB SPL at 1.89 m from the speaker and L_2 = 73.8 dB SPL at 7.23 m from the speaker; 2) calculate the room attenuation per doubling of distance: α_dd = -(82.0 dB - 73.8 dB)/log2(7.23 m/1.89 m) = -4.24 dB/doubling; 3) calculate the chirp level at 1 meter from the speaker by first finding the nearest microphone's doublings relative to 1 meter, OneMeterDoublings = log2(1.89 m) = 0.918 doublings, then L_1m = 82.0 dB SPL - (0.918 doublings)*(-4.24 dB/doubling) = 85.9 dB SPL; 4) calculate the speaker sensitivity: L_sens,spkr = 85.9 dB SPL - 20.98 dBu - (-20.27 dB) - 29.64 dB - (-19.1 dB) + 11.25 dBu = 85.9 dB SPL; 5) calculate the doublings from 1 meter to the listener distance D_P: OneMeterDoublings = log2(11.92 m) = 3.575 doublings; 6) calculate the required level at 1 meter from the speaker using L_1m = 72 dB SPL - (3.575 doublings)*(-4.236 dB/doubling) = 87.15 dB SPL.
Finally, 7) calculate the DSP output gain required to produce that level: G_dsp,out = 87.15 dB SPL - 85.9 dB SPL - 20.98 dBu - 29.64 dB - (-19.1 dB) + 11.25 dBu = -19.01 dB. In this example, using a DSP output gain of -20.27 dB, a chirp level of 72.0 dB SPL was actually measured at 11.92 meters from the speaker, so the calculated output gain in this example differs from the actual gain by (20.27 - 19.01) = 1.26 dB.
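The seven operations above can be reproduced end to end in a few lines, using the numbers straight from the example (small rounding differences in the last step aside):

```python
import math

# Known system parameters.
L_dsp_FSout = 20.98        # dBu, DSP full-scale output sensitivity
G_dsp_out_meas = -20.27    # dB, DSP output gain used for the chirp
G_amp = 29.64              # dB
G_attn_out = -19.1         # dB
L_sens_spkr_volts = 11.25  # dBu (2.83 V)

# 1) One chirp, responses recorded at two microphones.
L1, D1 = 82.0, 1.89        # dB SPL at 1.89 m
L2, D2 = 73.8, 7.23        # dB SPL at 7.23 m

# 2) Room attenuation per doubling of distance.
alpha_dd = -(L1 - L2) / math.log2(D2 / D1)

# 3) Chirp level 1 m from the speaker, via the nearest microphone.
L_1m = L1 - math.log2(D1) * alpha_dd

# 4) Speaker sensitivity.
L_sens_spkr = (L_1m - L_dsp_FSout - G_dsp_out_meas - G_amp
               - G_attn_out + L_sens_spkr_volts)

# 5-6) Level required 1 m from the speaker for 72 dB SPL at 11.92 m.
L_1m_needed = 72.0 - math.log2(11.92) * alpha_dd

# 7) Required DSP output gain.
G_dsp_out = (L_1m_needed - L_sens_spkr - L_dsp_FSout - G_amp
             - G_attn_out + L_sens_spkr_volts)
print(round(alpha_dd, 2), round(L_1m, 1), round(L_sens_spkr, 1),
      round(G_dsp_out, 2))
```

Unrounded intermediate values give a required gain of about -19.02 dB, within a hundredth of a dB of the -19.01 dB in the worked example.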
The procedure thus calculates a prescribed DSP output gain of -19.0 dB, based on a single chirp measured at 1.89 meters and 7.23 meters from an unknown speaker, to reach 72.0 dB SPL at 11.9 meters from the speaker, and the calculated gain has an error of 1.26 dB relative to the actual measured level at 11.9 m, a position outside the range spanned by the two microphones. If limited DSP resources only allow measuring the level of one microphone at a time, sequentially, the level difference (L1 - L2) must be calculated differently. If the test signal is raised at each microphone until the desired SPL level is reached, and the SPL level and output gain are recorded, then the dB level difference is: dB_diff = (L1 - G_dBout1) - (L2 - G_dBout2). When microphone 1 is closer to the speaker than microphone 2, dB_diff will be positive. Typically, L1 and L2 are the same, but the closer microphone requires a lower output gain to bring both microphones to the same SPL level, so G_dBout1 will be lower, making dB_diff positive.
In another example, establishing the input microphone gain level may include the following: if the microphone has a known input sensitivity, the DSP input gain, including the analog preamplifier gain, can be set for optimal dynamic range. For example, if the maximum sound pressure level expected at the microphone locations in the room is 100 dB SPL, the gain may be set so that 100 dB SPL produces a full-scale value. If the input gain setting is too high, clipping may occur in the preamplifier or the A/D converter. If the input gain setting is too low, the result can be too weak a signal and excessive noise (which is then amplified by Automatic Gain Control (AGC)).
If the microphone has no known input sensitivity, the chirp response signal level and time of flight (TOF) information from the speaker closest to each microphone input may be used to estimate the microphone sensitivity. If the speaker and/or microphone does not have an omnidirectional pickup pattern, the unknown off-axis attenuation of the speaker and/or the unknown off-axis attenuation of the microphone may introduce error into the estimate, and the unknown frequency response of the microphone may have further effects on the estimate.
When determining speaker equalization, it is desirable to equalize each speaker to compensate for irregularities in its frequency response and for the enhancement of low frequencies by nearby surfaces. If the frequency response of the microphone is known, the response of each speaker can be measured by chirp deconvolution after subtracting the known response of the microphone. Furthermore, if the frequency response of the speaker is known, the response of the room alone may be determined. A complication is that surface reflections in the room may cause comb filtering of the measured response, which is undesirable; comb filtering is a time-domain phenomenon that cannot be corrected by frequency-domain filtering. Detection of surface reflections in the impulse response must therefore be considered, so that if a major reflection farther out in time can be detected, it can be windowed out of the impulse response and thus removed from the frequency response used to derive the DSP filters.
If the frequency response of the microphone is not known, the frequency response measurement cannot distinguish between speaker-induced irregularities and microphone-induced irregularities. If the microphone and speaker frequency responses are both unknown and all correction is applied to the speaker output path, then the microphone's imperfections are over-corrected at the speaker, providing poor sound to listeners in the room when a far-end talker's audio is being played. Likewise, if all correction is applied to the microphone input path, the speaker's defects are over-corrected at the microphone, giving the far-end listener poor sound quality from the near-end talker. "Splitting the difference" and applying half the correction to each of the microphone input and speaker output is not feasible either, and will not obtain good sound.
Equalization will be applied using standard Infinite Impulse Response (IIR) parametric filters. Finite Impulse Response (FIR) filters are not suitable for this application because their frequency resolution is linear rather than logarithmic (octave-based), which may require a very large number of taps for low-frequency filters; they are also not well suited when the exact listening position(s) are not known. The IIR filters are determined by "inverse filtering", such that the inverse of the measured amplitude response is used as the target to which a cascade of parametric filters is best fit. The automatic equalization filter has practical limits on both the degree of response correction (dB) and the frequency range (Hz). Frequency response correction by inverse filtering from the impulse response is, strictly speaking, accurate only for the measured positions of the sound source and listener. Since the microphone positions are the only known positions, in order for each speaker to sound good at all listening positions, an ensemble average of the frequency responses is performed: the responses picked up by all microphones for a given speaker are averaged together after applying some octave smoothing. This process is transparent to the installer because the responses of all microphones can be recorded simultaneously from a single speaker chirp.
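The averaging-and-inversion step described above can be sketched as follows. This is a simplified illustration; the 1/3-octave fraction, the boost/cut limit, and the function names are assumed values, not taken from this document:

```python
import numpy as np


def spatial_average_db(mag_responses_db) -> np.ndarray:
    """Ensemble-average the magnitude responses (dB) measured at all mics."""
    return np.mean(np.asarray(mag_responses_db, dtype=float), axis=0)


def octave_smooth(mag_db: np.ndarray, freqs: np.ndarray, fraction: int = 3) -> np.ndarray:
    """Crude 1/fraction-octave smoothing: average bins within a half-octave window."""
    out = np.empty_like(mag_db)
    for i, f in enumerate(freqs):
        lo, hi = f * 2 ** (-0.5 / fraction), f * 2 ** (0.5 / fraction)
        sel = (freqs >= lo) & (freqs <= hi)
        out[i] = mag_db[sel].mean()
    return out


def inverse_target_db(smoothed_db: np.ndarray, max_correction_db: float = 6.0) -> np.ndarray:
    """Inverse of the measured response, limited in correction depth (dB)."""
    return np.clip(-smoothed_db, -max_correction_db, max_correction_db)
```

The clipped inverse would then serve as the target to which a cascade of parametric IIR filters is fitted.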
An example may include a microphone equalization process. When the microphone frequency response is unknown, it cannot be separated from that of an unknown speaker, so the frequency response of the unknown microphone cannot be determined and equalizing against an unknown speaker is impractical and should not be attempted. However, if the frequency response of the speaker is known, microphone equalization may be performed on an unknown microphone. The process of microphone equalization by chirp deconvolution takes the known response of the speaker stored in firmware and subtracts it to derive the microphone response. This process should be repeated for each speaker so that ensemble averaging can be applied to the measured frequency responses. The equalizer settings for each microphone are then determined according to the inverse filtering method described under speaker equalization.
Once the speaker and microphone levels are set and the frequency response irregularities have been equalized, the remaining processing values and levels may be set based on the room RT60 measurement. The reverberation time (RT60) can be obtained by computing the Schroeder backward integration of the impulse response. RT60 is the time required for sound in the space to decay by 60 dB, measured in a diffuse sound field, meaning the room is large enough that reflections of the sound source arrive at the microphone from all directions with equal energy. Once the RT60 value(s) is known, the non-linear processing (NLP) level can be set; when the reverberant tail is longer than the acoustic echo canceller's (AEC's) effective tail length, a more aggressive NLP setting is typically used.
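A common way to implement the Schroeder backward integration is sketched below, assuming a broadband impulse response array. The -5 dB to -25 dB fit range (a T20-style fit extrapolated to 60 dB) is a conventional choice, not one specified by this document:

```python
import numpy as np


def rt60_from_impulse(ir, fs: float) -> float:
    """Estimate RT60 by Schroeder backward integration of the impulse response."""
    energy = np.asarray(ir, dtype=float) ** 2
    # backward (reverse cumulative) integration of the squared IR
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0])
    t = np.arange(len(edc)) / fs
    # fit the -5..-25 dB segment of the decay curve, extrapolate to -60 dB
    i1 = int(np.argmax(edc_db <= -5.0))
    i2 = int(np.argmax(edc_db <= -25.0))
    slope = np.polyfit(t[i1:i2], edc_db[i1:i2], 1)[0]  # dB per second
    return -60.0 / slope
```

For an ideal exponential decay the fitted slope recovers the true RT60; real rooms require the diffuse-field assumption noted above and care with the noise floor near the end of the recording.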
Another example may include setting an output limiter. If the power amplifier gain is known and the speaker power rating is known, a DSP output limiter may be set to protect the speaker. Furthermore, if the sensitivity of the speaker is known, the limiter may further reduce the maximum signal level to protect listeners from excessive sound levels. In practice, most administrators do not keep records of gain values and similar power/gain/sensitivity data. Moreover, even if the gain value is nominally known, speaker wiring/configuration errors, such as bridged-wiring errors, may make the effective gain incorrect and result in a wrong power limit setting. Therefore, limiting based on measured SPL is the more desirable operation.
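The power-based limiter arithmetic, when the gain and rating happen to be known, can be sketched as follows (all parameter names and the RMS/full-scale framing are illustrative assumptions; the document does not give a formula):

```python
import math


def limiter_threshold_dbfs(amp_gain_db: float,
                           speaker_power_w: float,
                           speaker_impedance_ohms: float,
                           dac_fullscale_vrms: float) -> float:
    """Highest DSP output level (dBFS) whose amplified voltage stays
    within the speaker's continuous power rating.

    V_max = sqrt(P * R); dividing out the amplifier's voltage gain gives
    the DAC voltage, referenced to the converter's full-scale RMS level.
    """
    v_max = math.sqrt(speaker_power_w * speaker_impedance_ohms)
    dac_v = v_max / (10.0 ** (amp_gain_db / 20.0))
    return 20.0 * math.log10(dac_v / dac_fullscale_vrms)
```

For example, a 100 W / 8 ohm speaker behind a 26 dB amplifier and a 2 Vrms full-scale output would call for a threshold near -3 dBFS; a bridged-wiring error that doubles the effective gain would make this computed threshold wrong, which is why SPL-based limiting is preferred.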
According to other example embodiments, measuring the speech intelligibility rating (SIR) of a conference room may include measuring a Speech Transmission Index (STI) from a speech source to a listener location in the room. Multiple speech sources (e.g., ceiling speakers) and multiple listening positions around the room may also be examined to determine the optimal STI and corresponding SIR. Furthermore, the voice sources in a conference environment may be located remotely, and the remote microphones, remote rooms, and transmission channels may all affect the listener's speech intelligibility experience. In a conference room where multiple speakers are typically used simultaneously, the STI should be measured with all "voice conference" speakers playing simultaneously. A voice conference speaker refers to any speaker that is normally on during a conference, while speakers dedicated to music playback will be off. The reason is that listeners typically hear speech from all of the voice conference speakers at the same time, so speech intelligibility is affected by all of those speakers, and the rating should therefore be measured while all voice conference speakers are active. The STI measured with all voice conference speakers on may be better or worse than that of a single speaker, depending on the background noise level, echo and reverberation in the room, spacing between speakers, etc.
The auto-tuning process may use the microphones of the conference system without additional measurement microphones, so the obtained STI measurement is a proxy for the true STI value that a measurement microphone placed exactly at a listener's ear position would report. Since the conference room has multiple listener locations, and possibly multiple conference microphones, the best STI rating is obtained by measuring on all "N" microphones simultaneously, computing "N" STI values, and then averaging them to yield a single STI value for the room. This is the average STI value measured at all conference microphone locations, which serves as a proxy for the average STI value at all listener locations. The auto-tuning process is designed to sequence through each output speaker zone one at a time while measuring all microphones simultaneously. However, a real-time STI analyzer is DSP intensive and may only be able to measure one microphone input at a time. This places a practical limit on measuring and averaging the STI values of the "N" microphones. To obtain the most accurate STI value, all voice conference speakers should play simultaneously. Thus, during the auto-tuning process, certain strategies may be needed to measure STI at multiple microphones.
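The modulation-to-transmission-index mapping that underlies STI (an IEC 60268-16-style formulation) and the per-microphone averaging can be sketched as follows. This is only the core mapping, not a full STI analyzer; the function names are illustrative:

```python
import math


def transmission_index(m: float) -> float:
    """Map a measured modulation depth m (0..1) to a transmission index.

    Effective SNR = 10*log10(m / (1 - m)), clamped to +/-15 dB, then
    scaled to 0..1. A full STI computation weights such indices over
    7 octave bands and 14 modulation frequencies.
    """
    snr = 10.0 * math.log10(m / (1.0 - m))
    snr = max(-15.0, min(15.0, snr))
    return (snr + 15.0) / 30.0


def room_sti(per_mic_sti) -> float:
    """Single-room proxy STI: the mean of the N per-microphone STI values."""
    values = list(per_mic_sti)
    if not values:
        raise ValueError("need at least one STI measurement")
    return sum(values) / len(values)
```

A fully preserved modulation (m near 1) maps to an index of 1.0, a half-preserved modulation to 0.5, matching the intuition that noise and reverberation erode the speech envelope.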
One strategy: although all speakers play the STI signal, the STI is measured only during the first speaker iteration, using the first microphone. Another strategy is to measure using the microphone determined to be in the most central position, as determined from the speaker-to-microphone distances measured when calculating the impulse responses. Yet another strategy: for each speaker zone iteration, measure the STI at the next microphone input so that multiple STI measurements can be averaged. This approach has disadvantages; for example, if there is only one speaker zone, only the first microphone can be measured, and if the number of speaker zones is less than the number of microphones, the centrally located microphone may be omitted. This method also runs for the longest time.
It should also be noted that the STI value is generally understood to represent the speech transmission quality of a room. For a teleconferencing system, the speech transmission quality experienced by a listener has three components: the STI of the talker and the room in which he/she is located, the STI of the electronic transmission channel, and the STI of the remote microphone and room. Thus, the STI value calculated by the auto-tuning process is a proxy for only one of the three components that make up the listener's speech intelligibility experience. However, this information is still useful because a score for the near-end component can be obtained, and the user or installer can control the near-end component. For example, a user/installer may use the auto-tuned STI score to evaluate the relative improvement in STI from two different acoustic treatment designs.
The automatic equalization algorithm is capable of automatically equalizing the frequency response of any speaker in any room to any desired response shape, which may be defined by a flat line and/or a parameterized curve. The algorithm is not designed to operate in real time during active audio events, but rather during system setup. The algorithm considers and equalizes only the log-magnitude frequency response (decibels versus frequency) and does not attempt to equalize phase. The algorithm essentially designs a set of optimal filters whose combined frequency response closely matches the inverse of the measured response, flattening it or reshaping it into another desired response. The algorithm uses only single biquad IIR filters of the bell (boost or cut parametric), low-pass, or high-pass type. FIR filters could be used, but IIR filters are chosen because they are computationally efficient, have better low-frequency resolution, and are more suitable for spatial averaging, i.e., equalization over a wide listening area in a room.
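One concrete form of the bell-type biquad such an algorithm can rely on is the standard audio-EQ-cookbook peaking filter. This is a sketch of that well-known formulation; the document does not specify which biquad design it actually uses:

```python
import math


def peaking_biquad(fs: float, f0: float, gain_db: float, q: float):
    """Audio-EQ-cookbook peaking (bell) biquad coefficients.

    Returns normalized (b, a) with a[0] == 1. The gain at f0 is exactly
    gain_db; the response returns to 0 dB away from f0.
    """
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0 = 1.0 + alpha * a_lin
    b1 = -2.0 * math.cos(w0)
    b2 = 1.0 - alpha * a_lin
    a0 = 1.0 + alpha / a_lin
    a1 = -2.0 * math.cos(w0)
    a2 = 1.0 - alpha / a_lin
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]
```

A cascade of such sections, one per correction band, realizes the filter bank the algorithm fits to the inverse of the measured response.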
In performing the equalization process, a desired target frequency response is first identified. Typically this will be a flat response with low- and high-frequency roll-offs, to prevent the process from designing a filter bank that attempts to achieve unrealizable results with a band-limited speaker. The in-band target response is not necessarily flat, and the process allows any arbitrary target frequency response in the form of a biquad filter array. The process also allows the user to set maximum dB boost or cut limits for the entire DSP filter bank to be applied.
Fig. 6A shows a process for performing an auto-tuning process for an audio system. Referring to fig. 6A, the process may include: identifying a plurality of individual speakers (612) on a network controlled by the controller; providing a first test signal to a first speaker and a second test signal to a second speaker (614); and detecting the first and second test signals at one or more microphones controlled by the controller and automatically establishing speaker tuning output parameters based on analysis of the different test signals (616). The tuning parameters may be applied to parameter sets of a digital signal processor (DSP) that are applied to the various speakers and microphones in the audio environment.
The frequency of the first test signal may be different from the frequency of the second test signal. The first test signal may be provided at a first time and the second test signal may be provided at a second time later than the first time. The process may further include: automatically establishing speaker tuning output parameters based on analysis of the different test signals by measuring ambient noise levels via the one or more microphones; determining an impulse response based on the first test signal and the second test signal; and determining speaker output levels for use by the first and second speakers based on the impulse response and the ambient noise level. The process may further include: determining a frequency response based on the outputs of the first and second speakers, and averaging values associated with the first and second test signals to obtain one or more of: an average Sound Pressure Level (SPL) of the one or more microphones, an average distance from all of the one or more microphones, and an average frequency response measured from the one or more microphones. The process may further include initiating a verification process that proceeds as an iterative process for each of the first speaker and the second speaker. The process may further include: performing an automatic equalization process to equalize the frequency response of the first and second speakers to a desired response shape; and identifying one or more optimal filters having a frequency response closely matching the inverse of the measured frequency response.
Fig. 6B shows a process for performing an auto-tuning process for an audio system. Referring to fig. 6B, the process may include: identifying, in a particular room environment, a plurality of speakers and one or more microphones on a network controlled by a controller (652); providing test signals for playback sequentially from each amplifier channel and the plurality of speakers (654); simultaneously monitoring test signals from one or more microphones to detect speaker and amplifier channels in operation (656); providing additional test signals to the plurality of speakers to determine tuning parameters (658); detecting additional test signals at one or more microphones controlled by the controller (662); and automatically establishing a background noise level and a noise spectrum of the room environment based on the detected additional test signal (664).
The process may further include: while monitoring test signals from one or more microphones to identify whether any of the amplifier output channels are not connected to the plurality of speakers. The additional test signals may include a first test signal provided at a first time and a second test signal provided at a second time that is later than the first time. The process may further include: the frequency response of each of the plurality of speakers, and the sensitivity level of each amplifier channel and corresponding speaker, is automatically established. The sensitivity level is based on a target Sound Pressure Level (SPL) for a particular room environment. The process may further include: the method includes identifying a distance of each of the one or more microphones to each of the plurality of speakers, a room reverberation time for a particular room environment, a level setting of each speaker channel for achieving a target SPL, an equalization setting of each speaker channel for normalizing a frequency response of each speaker and achieving a target room frequency response, an echo cancellation parameter optimal for the particular room environment, a noise reduction parameter optimal for the particular room environment for reducing background noise detected by the microphone, and a nonlinear processing parameter optimal for the particular room environment for reducing background noise when speech is not detected. The process may further include initiating a verification process that continues to verify for each of the plurality of speakers as an iterative process, and the verification process includes again detecting additional test signals on one or more microphones controlled by the controller to verify the target SPL and the target room frequency response.
Fig. 7 illustrates an example process for performing an automatic audio system setup configuration. Referring to fig. 7, the process may include: identifying a plurality of speakers and microphones connected to a network controlled by the controller (712); assigning preliminary output gains to the plurality of speakers for applying the test signals (714); measuring ambient noise detected from the microphones (716); simultaneously recording chirp responses from all microphones (718); deconvolving all chirp responses to determine a corresponding number of impulse responses (722); and measuring an average Sound Pressure Level (SPL) at each microphone to obtain an SPL level based on the SPL average (724).
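The chirp deconvolution step (718, 722) is commonly implemented as FFT division, which also yields the main pulse peak used later for speaker-to-microphone distance. A simplified sketch (the regularization constant and function names are assumptions):

```python
import numpy as np


def deconvolve_chirp(chirp: np.ndarray, recording: np.ndarray) -> np.ndarray:
    """Recover an impulse response from a recorded chirp via FFT division."""
    n = len(chirp) + len(recording)
    c = np.fft.rfft(chirp, n)
    r = np.fft.rfft(recording, n)
    # regularized division avoids blow-up where the chirp has little energy
    h = r * np.conj(c) / (np.abs(c) ** 2 + 1e-9)
    return np.fft.irfft(h, n)


def peak_distance(ir: np.ndarray, fs: float, c: float = 343.0) -> float:
    """Distance implied by the main impulse peak (acoustic time of flight)."""
    return float(np.argmax(np.abs(ir))) / fs * c
```

With a delayed copy of the chirp as the "recording", the recovered impulse response peaks at the delay, and the peak index converts directly to distance via the speed of sound.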
Measuring the ambient noise detected from the microphone may include checking for excessive noise. For each microphone input signal, the process may include: a main pulse peak is identified, and a distance from one or more of the plurality of speakers to each microphone is identified. The process may include: a frequency response of each microphone input signal is determined and a compensation value is applied to each microphone based on the frequency response. The process may further include: the frequency response is averaged to obtain a spatially averaged response and an automatic equalization is performed on the spatially averaged response to match the target response value. The process may further include: an attenuation value associated with the room is determined based on the SPL level and the distances from the nearest and farthest microphones, and an output gain is determined based on the SPL level and the attenuation value, the output gain providing a target sound level at an average distance of all microphones.
Fig. 8 illustrates an example process for performing an automatic equalization process on an audio system. Referring to fig. 8, the process may include: determining a frequency response of the measured chirp signal detected from the one or more speakers (812); determining an average value of the frequency response based on the high and low values (814); subtracting the measured response from the target response, wherein the target response is based on the one or more filter frequencies (816); determining a frequency limited target filter (818) having audible parameters based on the subtracting; and applying an Infinite Impulse Response (IIR) biquad filter based on the defined region of the frequency limited target filter to equalize frequency responses of the one or more speakers (822).
The average value is normalized to zero, and the target response is based on one or more frequencies associated with one or more biquad filters. Determining the target filter based on the target response may include determining target zero crossings and target filter derivative zeros. The process may further include: limiting the decibels of the target filter based on detected amplitude peaks to create a limited filter; and adding the limited filter to the filter bank. The process may further include: adding an unrestricted equalization filter to the measured response to provide an unrestricted corrected response. The process may further include: subtracting the unrestricted corrected response from the target response to provide a new target filter.
Fig. 9 illustrates an example process for determining one or more gain values to apply to an audio system. Referring to fig. 9, the process may include: applying a set of initial power and gain parameters for the speaker (912); playing the excitation signal via a speaker (914); measuring a frequency response signal of the played excitation signal (916); determining a sound level at a microphone location and a sound level at a predetermined distance from one or more speakers (918); determining a gain at the microphone location based on a difference between the sound level at the microphone location and the sound level at a predetermined distance from the speaker (922); and applying a gain to the speaker output (924).
The predetermined distance may be a set distance (e.g., one meter) related to a position, relative to the speaker, where a user may be located. The process may further include: detecting the excitation signal at a microphone at a first distance from the speaker and detecting the excitation signal at a second microphone at a second, greater distance from the speaker, the detection being performed simultaneously at both microphones. The process may further include: determining a first sound pressure level at the first distance and a second sound pressure level at the second distance. The process may further include: determining the attenuation of the speaker based on the difference between the first sound pressure level and the second sound pressure level. The process may further include: determining the sensitivity of the speaker based on a sound pressure level measured at the predetermined distance from the speaker while the speaker is driven by a reference voltage.
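The level arithmetic in these steps is straightforward dB bookkeeping; a sketch under free-field assumptions (function names are illustrative, and real rooms typically measure less than the free-field 6 dB per doubling):

```python
import math


def attenuation_db_per_doubling(spl_near_db: float, d_near: float,
                                spl_far_db: float, d_far: float) -> float:
    """Decay rate measured between two microphones at different distances."""
    return (spl_near_db - spl_far_db) / math.log2(d_far / d_near)


def gain_for_target_spl(measured_spl_db: float, target_spl_db: float) -> float:
    """Output gain (dB) to add so the measured level reaches the target."""
    return target_spl_db - measured_spl_db


def sensitivity_at_1m(spl_at_d_db: float, d_m: float) -> float:
    """Back out a 1 m sensitivity figure from a level measured at d meters,
    assuming inverse-square spreading and drive at the reference voltage."""
    return spl_at_d_db + 20.0 * math.log10(d_m)
```

For example, 90 dB at 1 m falling to 84 dB at 2 m gives exactly the free-field 6 dB per doubling, and a microphone reading 6 dB under target calls for 6 dB of added output gain.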
Fig. 10 shows a process for recognizing a voice clarity level or a voice transmission index. Referring to fig. 10, the process may include: initiating an auto-tuning process (1012); detecting sound measurements associated with outputs of a plurality of speakers at two or more locations via one or more microphones (1014); determining a number of voice transmission index (STI) values equal to the number of microphones (1016); and averaging the voice transmission index values to identify a single voice transmission index value (1018).
The process may further include: the number of STI values is measured when a plurality of speakers simultaneously provide output signals. Measuring the number of STI values while the plurality of speakers simultaneously provide output signals may include: a microphone is used. Measuring the number of STI values while the plurality of speakers simultaneously provide output signals may include: one of the plurality of microphones is used and is identified as being closest to an intermediate position of the plurality of speaker positions. Averaging the voice transmission index values to identify a single voice transmission index value may include: the STI values at the "N" microphones are measured and "N" is greater than 1, and the "N" values are averaged to identify a single STI value for a particular environment.
Auto-tuning can automatically measure the speech intelligibility of the conference audio system and the corresponding room using only the components typically required for conference systems, without the need for other instrumentation. Auto-tuning may be used with third party power amplifiers and speakers. Since the gain and sensitivity of these components are unknown, the auto-tuning process quickly determines these parameters by using a unique wideband multi-tone ramp signal until the microphone reaches a known SPL level, while using a speaker-to-microphone distance that is automatically measured via acoustic delay and calculated using the speed of sound. With this technique, the auto-tuning can determine the gain and sensitivity of the corresponding components, as well as the SPL level of the speaker. The wideband multitone signal is rapidly boosted and optimized for automatic determination of system parameters. The auto-tune auto-equalization algorithm rapidly equalizes multiple speaker zones according to various filters. In addition, the algorithm adds additional enhancement functions.
The process may include: the level and gain of the electroacoustic sound system are analyzed to determine the gain required to reach the desired sound level, and the gain structure is optimized to obtain a maximum dynamic range. Historically, sound pressure levels were represented by "dB SPL". The sound level is usually expressed in units "dB", which means that it is actually an absolute level with respect to 0 db=20u Pascal. Modern international standards denote sound pressure levels by Lp/(20 uPa) or simply Lp. However, lp is also commonly used to represent a variable in sound level, not sound level units. To avoid confusion, in this analysis, sound pressure level will always be denoted by "dBa", meaning absolute sound level, the same as outdated "dB SPL". "dBa" should not be confused with "dBa", which is typically a unit representing an a-weighted sound level. In this analysis, "L" is always a horizontal variable, is an absolute quantity, and "G" is always a gain variable, is a relative quantity. Since the equations contain variables with different units (electrical and acoustic), but still in decibels, these units are explicitly shown in { }, for clarity.
The analysis is split into two distinct signal paths: an input path from the sound source (speaker 218) to the DSP internal processing, and an output path from the DSP internal processing to the speaker output level. There are two variations of each path: the input signal path may use an analog microphone or a digital microphone, and the output path may use an analog amplifier or a digital amplifier (digital in terms of its input signal, not its power amplification technique). For consistency and simplicity, all signal attenuations are expressed as gains, which may have negative values. For example, GP-S = LP - LSpkr is the gain from the speaker (at 1 meter) to the person, which may be -6 dB. These gains are shown as straight arrows in the figure, but in practice each sound path includes reflected and diffuse sound from the surfaces around the room. The impulse response of a room may reveal details of the room's behavior, but in the present analysis only non-time-varying sound levels are of interest, such as the sound level generated by pink noise. To simplify the analysis, these multiple sound paths are lumped into a single path with gain "G". By measuring GP-S and GM-P, the sound level at the listener's location can be identified, and the DSP output gain and input preamplifier gain can be set. GP-S and GM-P are estimates, since measurements are not made at the listener's location; however, GM-S can be measured accurately, and estimates of GP-S and GM-P can be made according to typical conference-room acoustic "rules of thumb".
The operations of a method or algorithm associated with the embodiments disclosed herein may be embodied directly in hardware, in a computer program executed by a processor, or in a combination of the two. The computer program may be embodied on a computer readable medium, such as a storage medium. For example, a computer program may reside in random access memory ("RAM"), flash memory, read-only memory ("ROM"), erasable programmable read-only memory ("EPROM"), electrically erasable programmable read-only memory ("EEPROM"), registers, hard disk, a removable disk, a compact disc read-only memory ("CD-ROM"), or any other form of storage medium known in the art.
Fig. 11 is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the application described herein. Regardless, the computing node 1100 is capable of implementing and/or performing any of the functions described herein.
In computing node 1100 there is a computer system/server 1102, and the computer system/server 1102 may operate in conjunction with a number of other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for computer system/server 1102 include, but are not limited to, personal computer systems, server computer systems, thin clients, rich clients, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems or devices, and the like.
The computer system/server 1102 may be described in the general context of computer system-executable instructions (e.g., program modules) being executed by a computer system. Generally, program modules may include routines, procedures, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server 1102 may be used in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
As shown in fig. 11, computer systems/servers 1102 in cloud computing node 1100 are shown in the form of general purpose computing devices. Components of computer system/server 1102 may include, but are not limited to, one or more processors or processing units 1104, a system memory 1106, and a bus that couples various system components including the system memory 1106 to the processor 1104.
Bus represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
The computer system/server 1102 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer system/server 1102, and includes both volatile and nonvolatile media, removable and non-removable media. In one embodiment, the system memory 1106 implements the flow diagrams of the other figures. The system memory 1106 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 1110 and/or cache memory 1112. The computer system/server 1102 may also include other removable/non-removable, volatile/nonvolatile computer system storage media. For example, the storage system 1114 may be provided for reading from and writing to non-removable, non-volatile magnetic media (not shown, commonly referred to as a "hard disk drive"). Although not shown, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may also be provided. In such instances, each drive may be connected to the bus through one or more data media interfaces. As will be further depicted and described below, memory 1106 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the various embodiments of the application.
By way of example, and not limitation, a program/utility 1116, having a set (at least one) of program modules 1118, may be stored in the memory 1106, as may an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data, or some combination thereof, may include an implementation of a network environment. Program modules 1118 generally perform the functions and/or methods of the various embodiments of the application described herein.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or computer program product. Accordingly, aspects of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module," or "system." Furthermore, aspects of the present application may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied therein.
The computer system/server 1102 may also communicate with the following devices: one or more external devices 1120, such as a keyboard, pointing device, display 1122, or the like; one or more devices that enable a user to interact with the computer system/server 1102; and/or any device (e.g., network card, modem, etc.) that enables the computer system/server 1102 to communicate with one or more other computing devices. Such communication may occur through I/O interface 1124. In addition, the computer system/server 1102 can also communicate with one or more networks such as a Local Area Network (LAN), a general Wide Area Network (WAN), and/or a public network (e.g., the Internet) through a network adapter 1126. As shown, network adapter 1126 communicates with the other components of computer system/server 1102 through a bus. It should be appreciated that although not shown, other hardware and/or software components can be utilized in conjunction with the computer system/server 1102. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data archive storage systems, and the like.
Those skilled in the art will appreciate that a "system" may be embodied as a personal computer, server, console, Personal Digital Assistant (PDA), cell phone, tablet computing device, smart phone, or any other suitable computing device, or combination of devices. Describing the above functions as being performed by a "system" is not intended to limit the scope of the application in any way, but rather provides one example of many embodiments. Indeed, the methods, systems, and apparatus disclosed herein may be implemented in localized and distributed forms consistent with computing technology.
It should be noted that certain system features described in this specification are presented in terms of modules to further highlight the independence of their implementation. For example, a module may be implemented as a hardware circuit comprising custom Very Large Scale Integration (VLSI) circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, graphics processing units or the like.
Modules may also be implemented at least partially in software for execution by various types of processors. For example, an identified unit of executable code may comprise one or more physical or logical blocks of computer instructions which may, for instance, be organized as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different locations which, when joined logically together, comprise the module and achieve the stated purpose for the module. Furthermore, the modules may be stored on a computer readable medium, such as a hard disk drive, a flash memory device, random access memory (RAM), magnetic tape, or any other medium for storing data.
Indeed, a module of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Likewise, operational data may be identified and illustrated herein within modules, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices.
It will be readily understood that the application components as generally described and illustrated in the figures herein could be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments is not intended to limit the scope of the disclosure, but is merely representative of selected embodiments of the application.
Those of ordinary skill in the art will readily appreciate that the foregoing may be implemented with steps in a different order and/or hardware elements in a configuration than those disclosed. Thus, while the application has been described in terms of these preferred embodiments, certain modifications, variations and alternative constructions will be apparent to those skilled in the art.
While the preferred embodiment of the present application has been described, it is to be understood that the described embodiment is illustrative only and the scope of the application is to be defined solely by the appended claims when considered in light of the full scope of equivalents and modifications (e.g., protocols, hardware devices, software platforms, etc.).

Claims (20)

1. A method, comprising:
starting an automatic tuning process;
detecting, via one or more microphones, sound measurements associated with outputs of one or more speakers at two or more locations;
determining a number of speech transmission index (STI) values equal to a number of the microphones; and
averaging the speech transmission index values to identify a single speech transmission index value.
2. The method of claim 1, further comprising:
measuring the number of STI values while the one or more speakers simultaneously provide output signals.
3. The method of claim 2, wherein measuring the number of STI values while the one or more speakers simultaneously provide output signals comprises using one microphone.
4. The method of claim 1, wherein measuring the number of STI values while the one or more speakers simultaneously provide output signals comprises using one microphone of a plurality of microphones, and wherein the one microphone is identified as being closest to an intermediate position of the plurality of speakers.
5. The method of claim 1, wherein averaging the speech transmission index values to identify a single speech transmission index value comprises measuring the STI values at "N" microphones.
6. The method of claim 5, wherein "N" is greater than 1, and averaging the speech transmission index values further comprises averaging the "N" values to identify a single STI value for a particular environment.
7. The method of claim 1, wherein the one or more speakers comprise a plurality of speakers.
8. An apparatus, comprising:
a processor configured to:
start an automatic tuning process;
detect, via one or more microphones, sound measurements associated with outputs of one or more speakers at two or more locations;
determine a number of speech transmission index (STI) values equal to a number of the microphones; and
average the speech transmission index values to identify a single speech transmission index value.
9. The apparatus of claim 8, wherein the processor is further configured to measure the number of STI values while the one or more speakers simultaneously provide output signals.
10. The apparatus of claim 9, wherein the number of STI values is measured using one microphone while the one or more speakers simultaneously provide output signals.
11. The apparatus of claim 8, wherein the processor is configured to control one microphone of a plurality of microphones to measure the number of STI values while the one or more speakers simultaneously provide output signals, and wherein the one microphone is identified as being closest to an intermediate position of the plurality of speakers.
12. The apparatus of claim 8, wherein, to average the speech transmission index values to identify a single speech transmission index value, the processor is configured to measure the STI values at "N" microphones.
13. The apparatus of claim 12, wherein "N" is greater than 1, and wherein, to average the speech transmission index values, the processor is further configured to average the "N" values to identify a single STI value for a particular environment.
14. The apparatus of claim 8, wherein the one or more speakers comprise a plurality of speakers.
15. A non-transitory computer-readable storage medium configured to store instructions that, when executed, cause a processor to:
start an automatic tuning process;
detect, via one or more microphones, sound measurements associated with outputs of one or more speakers at two or more locations;
determine a number of speech transmission index (STI) values equal to a number of the microphones; and
average the speech transmission index values to identify a single speech transmission index value.
16. The non-transitory computer-readable storage medium of claim 15, wherein the instructions further cause the processor to:
measure the number of STI values while the one or more speakers simultaneously provide output signals.
17. The non-transitory computer-readable storage medium of claim 16, wherein measuring the number of STI values while the one or more speakers simultaneously provide output signals comprises using one microphone.
18. The non-transitory computer-readable storage medium of claim 15, wherein measuring the number of STI values while the one or more speakers simultaneously provide output signals comprises using one microphone of a plurality of microphones, and wherein the one microphone is identified as being closest to an intermediate position of the plurality of speakers.
19. The non-transitory computer-readable storage medium of claim 15, wherein averaging the speech transmission index values to identify a single speech transmission index value comprises measuring the STI values at "N" microphones.
20. The non-transitory computer-readable storage medium of claim 19, wherein "N" is greater than 1, and averaging the speech transmission index values further comprises averaging the "N" values to identify a single STI value for a particular environment.
CN202280023026.0A 2021-01-21 2022-01-20 Measuring speech intelligibility of an audio environment Pending CN117178567A (en)

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
US202163139810P 2021-01-21 2021-01-21
US63/139,807 2021-01-21
US63/139,810 2021-01-21
US63/139,808 2021-01-21
US63/139,813 2021-01-21
US63/139,814 2021-01-21
US63/139,811 2021-01-21
US17/521,303 US11711061B2 (en) 2021-01-21 2021-11-08 Customized automated audio tuning
US17/521,103 2021-11-08
PCT/US2022/013185 WO2022159621A1 (en) 2021-01-21 2022-01-20 Measuring speech intelligibility of an audio environment

Publications (1)

Publication Number Publication Date
CN117178567A true CN117178567A (en) 2023-12-05

Family

ID=88964923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280023026.0A Pending CN117178567A (en) 2021-01-21 2022-01-20 Measuring speech intelligibility of an audio environment

Country Status (1)

Country Link
CN (1) CN117178567A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination