US11211081B1 - Microphone array with automated adaptive beam tracking - Google Patents
- Publication number
- US11211081B1
- Authority
- US
- United States
- Prior art keywords
- microphone arrays
- microphone
- signal
- audio
- signals
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/40—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
- H04R1/406—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2410/00—Microphones
- H04R2410/01—Noise reduction using microphones having different directional characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/02—Circuits for transducers, loudspeakers or microphones for preventing acoustic reaction, i.e. acoustic oscillatory feedback
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
Abstract
An example method of operation may include designating sub-regions which collectively provide a defined reception space, receiving audio signals at a controller from the microphone arrays in the defined reception space, configuring the controller with known locations of each of the microphone arrays, assigning each of the sub-regions to at least one of the microphone arrays based on the known locations, and creating beamform tracking configurations for each of the microphone arrays based on their assigned sub-regions.
Description
This application is a continuation of U.S. patent application Ser. No. 16/279,927, filed on Feb. 19, 2019, now U.S. Pat. No. 10,741,193, issued Aug. 11, 2020, which is a continuation of U.S. patent application Ser. No. 16/017,538, filed on Jun. 25, 2018, now U.S. Pat. No. 10,210,882, issued Feb. 19, 2019, the entire disclosures of which are herein incorporated by reference.
This application generally relates to beam forming, and more particularly, to automated beam forming for optimal voice acquisition in a fixed environment.
A fixed environment may require a sound reception device that identifies sound from a desired area using a microphone array. The environment may be set up for a voice conference, which includes microphones, speakers, etc., to which a sound detection device is applied.
Conventionally, voice conference devices may receive sound (i.e., speech) from various attendants participating in the voice conference and transmit the received sound to remote conference sites or local speaker systems, so that speech or other shared sound can be replayed in real time for others to hear.
In a conference scenario, there are often many attendants, and a voice detection device needs to identify the sound associated with each of those attendants. In addition, when an attendant moves, the device must recognize that the attendant has moved away from a sound-pickup area. Also, when there is a noise source in the conference room, such as a projector or other noise-making entity, the voice conference device should use a focused sound-pickup area to prevent undesirable noise from outside that area from being captured.
Conventional approaches provide microphone arrays which have multiple beamformers that define fixed steering directions for fixed beams or coverage zones for tracking beams. The directions or zones are either pre-programmed and not modifiable by administrators, or are configurable only during a setup stage. Once configured, the specified configuration remains unchanged during operation. When the number of persons speaking in a particular environment changes over time and/or the positions of activity change, the result is sub-optimal, since no dynamic adjustment is made to match those changes in the environment. Also, current beamforming systems deployed in microphone arrays operate mostly in the azimuth dimension, at a single fixed distance and at a small number of elevation angles.
Audio installations frequently include both microphones and loudspeakers in the same acoustic space. When the content sent to the loudspeakers includes signals from the local microphones, the potential for feedback exists. Mix-minus configurations are frequently used to maximize gain before feedback in these types of situations. “Mix-minus” generally refers to the practice of attenuating or eliminating a microphone's contribution to proximate loudspeakers. Mix-minus configurations can be tedious to set up, and are often not set up correctly or ideally.
One example embodiment may provide a method that includes initializing a microphone array in a defined space to receive one or more sound instances based on a preliminary beamform tracking configuration, detecting the one or more sound instances within the defined space via the microphone array, modifying the preliminary beamform tracking configuration, based on a location of the one or more sound instances, to create a modified beamform tracking configuration, and saving the modified beamform tracking configuration in a memory of a microphone array controller.
Another example embodiment may include an apparatus that includes a processor configured to initialize a microphone array in a defined space to receive one or more sound instances based on a preliminary beamform tracking configuration, detect the one or more sound instances within the defined space via the microphone array, modify the preliminary beamform tracking configuration, based on a location of the one or more sound instances, to create a modified beamform tracking configuration, and a memory configured to store the modified beamform tracking configuration in a microphone array controller.
Yet another example embodiment may include a non-transitory computer readable storage medium configured to store instructions that when executed cause a processor to perform initializing a microphone array in a defined space to receive one or more sound instances based on a preliminary beamform tracking configuration, detecting the one or more sound instances within the defined space via the microphone array, modifying the preliminary beamform tracking configuration, based on a location of the one or more sound instances, to create a modified beamform tracking configuration, and saving the modified beamform tracking configuration in a memory of a microphone array controller.
Still another example embodiment may include a method that includes designating a plurality of sub-regions which collectively provide a defined reception space, receiving audio signals at a controller from a plurality of microphone arrays in the defined reception space, configuring the controller with known locations of each of the plurality of microphone arrays, assigning each of the plurality of sub-regions to at least one of the plurality of microphone arrays based on the known locations, and creating beamform tracking configurations for each of the plurality of microphone arrays based on their assigned sub-regions.
Still yet another example embodiment may include an apparatus that includes a processor configured to designate a plurality of sub-regions which collectively provide a defined reception space, a receiver configured to receive audio signals at a controller from a plurality of microphone arrays in the defined reception space, and the processor is further configured to configure the controller with known locations of each of the plurality of microphone arrays, assign each of the plurality of sub-regions to at least one of the plurality of microphone arrays based on the known locations, and create beamform tracking configurations for each of the plurality of microphone arrays based on their assigned sub-regions.
Still yet another example embodiment may include a non-transitory computer readable storage medium configured to store instructions that when executed cause a processor to perform designating a plurality of sub-regions which collectively provide a defined reception space, receiving audio signals at a controller from a plurality of microphone arrays in the defined reception space, configuring the controller with known locations of each of the plurality of microphone arrays, assigning each of the plurality of sub-regions to at least one of the plurality of microphone arrays based on the known locations, and creating beamform tracking configurations for each of the plurality of microphone arrays based on their assigned sub-regions.
Yet another example embodiment may include a method that includes one or more of detecting an acoustic stimulus via active beams associated with at least one microphone disposed in a defined space, detecting loudspeaker characteristic information of at least one loudspeaker providing the acoustic stimulus, transmitting acoustic stimulus information based on the acoustic stimulus to a controller, and modifying, via a controller, at least one control function associated with the at least one microphone and the at least one loudspeaker to minimize acoustic feedback produced by the loudspeaker.
Still yet a further example embodiment may include an apparatus that includes a processor configured to detect an acoustic stimulus via active beams associated with at least one microphone disposed in a defined space, detect loudspeaker characteristic information of at least one loudspeaker providing the acoustic stimulus, a transmitter configured to transmit acoustic stimulus information based on the acoustic stimulus to a controller, and the processor is further configured to modify, via a controller, at least one control function associated with the at least one microphone and the at least one loudspeaker to minimize acoustic feedback produced by the loudspeaker.
Yet still another example embodiment may include a non-transitory computer readable storage medium configured to store instructions that when executed cause a processor to perform detecting an acoustic stimulus via active beams associated with at least one microphone disposed in a defined space, detecting loudspeaker characteristic information of at least one loudspeaker providing the acoustic stimulus, transmitting acoustic stimulus information based on the acoustic stimulus to a controller, and modifying, via a controller, at least one control function associated with the at least one microphone and the at least one loudspeaker to minimize acoustic feedback produced by the loudspeaker.
It will be readily understood that the instant components, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of at least one of a method, apparatus, non-transitory computer readable medium and system, as represented in the attached figures, is not intended to limit the scope of the application as claimed, but is merely representative of selected embodiments.
The instant features, structures, or characteristics as described throughout this specification may be combined in any suitable manner in one or more embodiments. For example, the usage of the phrases “example embodiments”, “some embodiments”, or other similar language, throughout this specification refers to the fact that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment. Thus, appearances of the phrases “example embodiments”, “in some embodiments”, “in other embodiments”, or other similar language, throughout this specification do not necessarily all refer to the same group of embodiments, and the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In addition, while the term “message” may have been used in the description of embodiments, the application may be applied to many types of network data, such as, packet, frame, datagram, etc. The term “message” also includes packet, frame, datagram, and any equivalents thereof. Furthermore, while certain types of messages and signaling may be depicted in exemplary embodiments they are not limited to a certain type of message, and the application is not limited to a certain type of signaling.
Example embodiments provide a voice tracking procedure which is applied to microphone arrays disposed in a fixed environment, such as a conference room. The arrays are centrally managed and controlled via a central controller (i.e., server, computer, etc.). In another example, the arrays may be centrally managed and controlled with one of the arrays acting as a central controller and/or a remote controller outside the arrays. Location data from the microphone array will be 3-dimensional, including azimuth, elevation and distance coordinates. This represents an extension over current beamforming systems, which operate mostly in the azimuth dimension, at a single fixed distance and at a small number of elevation angles.
Validation of the accuracy of the location data may be provided by a tracking beamformer module which is part of the microphone array(s). The distance dimension may be included in the calculations to inform both the digital signal processing (DSP) algorithm development and specification of relevant product features. Beamforming procedures and setup algorithms may be used to define a discrete search space of beamforming filters at defined locations, referred to as a filter grid. This grid is defined by a range and number of points in each of three spherical coordinate dimensions including azimuth, elevation and distance.
Compared to previous attempts at beamforming in a conference room environment and similar environments, a major distinction of the present example embodiments is the requirement to cover a larger area. The information produced by the tracker must include not just azimuth and elevation angles but also a distance to the talker, creating three dimensions of beamforming considerations. Two complementary but discrete functions of the tracking algorithm are steering the array directivity pattern to optimize voice quality, and producing talker location information for certain purposes, such as user interfaces, camera selection, etc.
When estimating distance from a single microphone array for a given steering direction (specified by both azimuth and elevation angles), the ability of the array to distinguish different talker distances using a steered response power and/or time delay method depends on its ability to resolve the curvature of the sound wave front. This is illustrated in the examples of FIGS. 1C and 1D. It can be observed that the impact of the wave front curvature is more significant for closer sources, leading to greater distance differences.
The preceding example is formalized by distinguishing the near field and far field of a microphone array. In the near field, the wave behaves like a spherical wave, and there is therefore some ability to resolve source distances. In the far field, however, the wave approximates a plane wave, and hence source distances cannot be resolved using a single microphone array. The array far field is defined by r > 2L²/λ, where 'r' is the radial distance to the source, 'L' is the array length, and 'λ' is the wavelength, equivalently c/f, where 'c' is the speed of sound and 'f' is the frequency. In practice, while some distance discrimination may be achieved for sources within a certain distance of the array, beyond that distance all sources are essentially far-field and the steered response power will not show a clear maximum at the source distance. Given the typical range of talkers for array configuration use cases, attempting to discriminate distance directly using steered response power from a single array may be imprecise.
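The far-field boundary above is simple to evaluate numerically. A minimal Python sketch (the function name and default speed of sound are illustrative, not from the patent):

```python
def far_field_distance(array_length_m, freq_hz, c=343.0):
    """Return the far-field boundary r = 2*L^2 / lambda for an array of
    length L at the given frequency, with wavelength lambda = c / f."""
    wavelength = c / freq_hz
    return 2.0 * array_length_m ** 2 / wavelength

# A 0.5 m array at 4 kHz: sources beyond roughly 5.8 m are effectively
# far-field, so their distance cannot be resolved from wavefront curvature.
r = far_field_distance(0.5, 4000.0)
```

Note how the boundary grows with the square of the array length and with frequency, which is why even moderately sized arrays place typical conference-room talkers in the far field.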
With regard to the tracking example described above, for the purpose of optimizing voice quality by beamforming, there is therefore not considered to be any significant audio benefit from beamforming filters calculated at different distances, due to the difficulty of resolving the distance dimension. Instead, a single set of beamforming filters optimized for far-field sources provides the most consistent audio output and constrains the tracking search to operate only over azimuth and elevation angles. Nonetheless, for the secondary purpose of providing talker location information for other uses, it is still desirable to estimate distance to some resolution.
To obtain talker location information, a projection of distance based on elevation angle and an assumed average vertical distance between the array and talker head locations, and/or a triangulation of angle estimates from multiple microphone array devices in the room, may be performed. In this approach, the microphone array should be mounted in or suspended from the ceiling; target source locations are the mouths of people that will be either standing or sitting in the room (see FIG. 1E).
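The elevation-based distance projection can be sketched as follows. The angle convention (elevation measured down from the array's horizontal plane) and the names are assumptions for illustration only:

```python
import math

def project_distance(elevation_deg, vertical_drop_m):
    """Estimate the slant distance from a ceiling-mounted array to a
    talker's head, given the beam elevation angle (assumed here to be
    measured down from the array's horizontal plane) and an assumed
    average vertical drop between the array and head height."""
    return vertical_drop_m / math.sin(math.radians(elevation_deg))

# Array 1.2 m above seated head height, beam steered 30 degrees down:
# the projected slant distance is 2.4 m.
r = project_distance(30.0, 1.2)
```

The same vertical drop yields different projected distances for standing versus seated talkers, which is why the patent describes the vertical distance as an assumed average.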
For the resolution and dimensionality of the search grid, as seen previously, there is negligible ability to resolve distances with a single microphone array device due to the far-field nature of the voice sources. While the larger microphone array according to example embodiments provides increased resolution in azimuth and elevation, particularly at higher frequencies, for reasons of voice clarity the actual beam filters may be designed to target a 3 dB beamwidth of approximately 20-30 degrees. For this reason, a grid resolution of 5 degrees in both azimuth and elevation may be considered a practical resolution for tracking, since there is unlikely to be any noticeable optimization in audio quality from tracking to finer resolutions. This resolution leads to 72 points in the azimuth dimension (0 to 355 degrees) and 15 points in the elevation dimension (5 to 75 degrees), giving a total grid (i.e., energy map) size of 1080 distinct locations. If a 6-degree resolution is instead used in both dimensions, the grid size decreases to 780 points (60 points in azimuth, 13 points in elevation from 6 to 78 degrees), approximately a 25% reduction in computational load.
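The grid-size arithmetic above can be reproduced with a small helper; a sketch, assuming a full-circle azimuth sweep and an inclusive elevation range at the same angular step:

```python
def grid_size(step_deg, el_min_deg, el_max_deg):
    """Grid point counts for a full 360-degree azimuth sweep plus an
    inclusive elevation range, both sampled at step_deg degrees.
    Returns (total, azimuth_points, elevation_points)."""
    az_points = 360 // step_deg
    el_points = (el_max_deg - el_min_deg) // step_deg + 1
    return az_points * el_points, az_points, el_points

total_5deg, _, _ = grid_size(5, 5, 75)   # 72 * 15 = 1080 points
total_6deg, _, _ = grid_size(6, 6, 78)   # 60 * 13 = 780 points
saving = 1.0 - total_6deg / total_5deg   # roughly a quarter fewer points
```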
According to example embodiments, the microphone array may contain 128 microphones for beamforming; however, as tracking only uses a single energy value over a limited frequency band, it is not necessary to use all of those microphones for tracking purposes. In particular, many of the closely spaced microphones may be discarded, as the average energy over the frequency band will not be overly influenced by high-frequency aliasing effects. This is both because a high-frequency cut-off for the tracker calculations will eliminate much of the aliasing, and because any remaining aliasing lobes will vary in direction by frequency bin, and hence averaging will reduce their impact. One example demonstrates a full 128-microphone array and an 80-microphone subset that could be used for energy map tracking calculations, a reduction in computational complexity of approximately 35% over using the full array.
The tracking procedure is based on calculating power of a beam steered to each grid point. This is implemented in the FFT domain by multiply and accumulate operations to apply a beamforming filter over all tracking microphone channels, calculating the power spectrum of the result, and obtaining average power over all frequency bins. As the audio output of each of these beams is not required by the tracking algorithm, there is no need to process all FFT bins, and so computational complexity can be limited by only calculating the power based on a subset of bins. While wideband voice has useful information up to 7000 or 8000 Hz, it is also well-known that the main voice energy is concentrated in frequencies below 4000 Hz, even as low as 3400 Hz in traditional telephony.
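A band-limited steered response power calculation of the kind described can be sketched in NumPy as follows; the array shapes and names are illustrative, not the patent's implementation:

```python
import numpy as np

def steered_response_power(mic_ffts, beam_filters, max_bin):
    """Steered response power for one grid point.

    mic_ffts:     (n_mics, n_bins) complex FFT of one frame per channel
    beam_filters: (n_mics, n_bins) complex beamforming filters for this
                  grid point
    max_bin:      only bins below this index are used, restricting the
                  calculation to the band where voice energy dominates
    """
    # Multiply-accumulate across channels: one beam output value per bin.
    beam = np.sum(mic_ffts[:, :max_bin] * beam_filters[:, :max_bin], axis=0)
    # Average power over the retained bins only.
    return np.mean(np.abs(beam) ** 2)
```

Limiting `max_bin` to, say, the bins below 4000 Hz both cuts the per-point cost and matches the observation that the main voice energy lies in that band.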
Further, the phase-transformed microphone inputs need only be calculated on 80 microphones, once every N frames, and stored for use with all grid points. Hence the computational complexity of this input stage is reduced by a factor of N. To spread the computational load, the transformed microphone inputs may be calculated in one audio frame callback, and the energy map then updated from that input over the following 15-20 audio frames. This configuration provides that the full grid energy map will be updated at a rate of 20-40 fps, i.e., every 25 to 50 milliseconds. Voiced sounds in speech are typically considered stationary over a period of approximately 20 milliseconds, so a tracker update rate of 50 milliseconds may be considered sufficient. Further computational optimization comes from the fact that the noise removal sidechain in the tracking algorithm needs to be applied only over the tracking microphone subset, e.g., 80 microphones instead of the full 128. The steered response power (SRP) is calculated at every point of the search grid over several low-rate audio frames. Having access to the audio energy at each point of the grid permits combination over multiple devices, assuming relative array locations are known. This also facilitates room telemetry applications.
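The load-spreading schedule described above amounts to ceiling division of the grid across callbacks; a minimal sketch with illustrative numbers:

```python
def points_per_frame(total_points, frames_per_update):
    """Number of grid points to refresh per audio callback so the whole
    energy map is updated once every frames_per_update frames (ceiling
    division, so no grid points are skipped)."""
    return -(-total_points // frames_per_update)

# Spreading a 1080-point map over 18 callbacks: 60 point updates per frame.
chunk = points_per_frame(1080, 18)
```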
According to example embodiments, the beamforming and microphone array system would be operated as one or more arrays in a single reception space along with a master processing system. At a new installation, the master processing system or controller would initiate an array detection process in which each array is located relative to the other arrays by emitting and detecting a calibration signal; optionally, this step may be performed via a user interface instead of through the automated process. The master would then know the relative locations of each array. The process would then emit a similar calibration signal from each loudspeaker in the room to determine relative locations or the impulse response to each loudspeaker. During operation (i.e., a meeting), each array would calculate a local acoustic energy map. This energy map data would be sent to the master in real-time, and the master would merge it into a single room energy map. Based on this room energy map, the master would identify the main voice activity locations in a clustering step, ignoring remote signals at the known loudspeaker locations. It would assign the detected voice locations to the nearest array in the system. Each array would form one or more beam signals in real-time as controlled by this master process. The beam audio signals would come back from each array to the master audio system, which would then be responsible for automatically mixing them into a room mix signal.
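The clustering step that picks voice activity locations from the merged energy map while ignoring known loudspeaker locations might look like the following simplified sketch (a real implementation would cluster neighboring grid points rather than pick single peaks; all names here are illustrative):

```python
import numpy as np

def find_voice_locations(energy_map, loudspeaker_idx, threshold, n_max=4):
    """Pick the strongest peaks of a merged room energy map as candidate
    talker locations, skipping known loudspeaker grid indices.

    energy_map:      1-D array, one energy value per grid point
    loudspeaker_idx: set of grid indices to ignore (remote audio playback)
    threshold:       minimum energy for a point to count as voice activity
    """
    order = np.argsort(energy_map)[::-1]  # strongest points first
    picks = []
    for idx in order:
        if energy_map[idx] < threshold or len(picks) >= n_max:
            break
        if int(idx) not in loudspeaker_idx:
            picks.append(int(idx))
    return picks
```

Each picked index would then be assigned to the nearest array, which forms the corresponding beam.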
Example embodiments provide a configuration for initializing and adapting the definition of a microphone array beamformer tracking zone. The beamforming is conducted based on detected voice activity and voice location information. The configuration may dynamically adjust the center and range of beamforming steering regions in an effort to optimize voice acquisition from a group of talkers within a room during a particular conversation or meeting.
Localized voice activity patterns are modeled over time, and zone definitions are dynamically adjusted so that default steering locations and coverage ranges for each beam correspond to the expected and/or observed behavior of persons speaking during the conference/event. In one example, predefined zones of expected voice input may be defined for a particular space. A zone may be a portion of a circle, square, rectangle or other defined space. Dynamic zone adjustment may be performed to accommodate changes in the speaking person(s) at any given time: a zone may change in size, shape, direction, etc., in a dynamic and real-time manner. The zones may also have minimum requirements, such as a minimum size, width, etc., which may be taken into consideration when performing dynamic zone adjustments.
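One simple way to realize this dynamic recentering is an exponential nudge of a zone's default steering center toward the observed talker azimuth; a hypothetical sketch, with the smoothing factor and minimum width as assumed parameters not taken from the patent:

```python
def adapt_zone(center_deg, width_deg, talker_az_deg, alpha=0.1, min_width=20.0):
    """Nudge a beam zone's steering center a fraction alpha of the way
    toward the observed talker azimuth, wrapping correctly across the
    0/360 boundary, and enforce a minimum zone width."""
    # Shortest signed angular difference in (-180, 180].
    delta = ((talker_az_deg - center_deg + 180.0) % 360.0) - 180.0
    new_center = (center_deg + alpha * delta) % 360.0
    return new_center, max(width_deg, min_width)

# A zone at 350 degrees tracking a talker at 10 degrees moves through 0,
# not the long way around the circle.
center, width = adapt_zone(350.0, 30.0, 10.0)
```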
In another example, the number of talkers or persons speaking at any given time may be identified, estimated and/or modeled over a period of time. This ensures stable mixing and tracking of beam zones with active talkers, as opposed to zones which are not producing audible noise or noise of interest. By automating the allocation of beam locations and numbers, the configuration used to accommodate the event may be selected based on event characteristics, such as center, right, left, presentation podium, etc., instead of at the 'per-beam' level. The controller would then distribute the available beams across those conceptual areas in a dynamic distribution to optimize audio acquisition according to actual usage patterns. Also, the zones may be classified in a particular category, such as "speech" or "noise" zones. Noise zone classification may be performed by detecting a loudspeaker direction using information from AEC or a calibration phase, and/or by locating prominent noise sources during a non-speech period. The noise zones may then be suppressed when configuring a particular mix, such as through a spatial null applied in the beamformer.
Example embodiments minimize beam and zone configuration time for installers, since the automation and dynamic adjustments yield ongoing changes. The initialization provides uniformly distributed zones, with adaptation during usage to adjust to changes in the environment. This ensures optimal audio output is maintained as the environment evolves.
One approach to configuring a modular microphone array is a three-dimensional approach to adjusting the beams, including azimuth, elevation and distance coordinates. A setup configuration of physical elements may provide a physical placement of various microphone arrays, such as, for example, two or more microphone arrays in a particular fixed environment defined as a space with a floor and walls. The automated configuration process may be initiated by a user, and the resulting calibration configuration parameters are stored in a memory accessible to the controller of the microphone arrays until the calibration configuration is deleted or re-calculated. During the calibration configuration process, the microphone arrays may either take turns emitting a noise, one at a time, or each microphone array may emit a noise signal designed to be detected concurrently (e.g., a different known frequency range for each device, or a different known pseudo-random sequence). The "noise" may be a pseudo-random "white" noise, a tone pulse and/or a frequency sweep. One example provides emitting a Gaussian-modulated sinusoidal pulse signal from one device, detected using a matched filter on another device within the arrays; however, one skilled in the art would appreciate that other signal emissions and detections may be used during the setup calibration phase.
The calibration and coordinating process would run on a master processor of the controller (e.g., a personal computer (PC) or an audio server) that has access to audio and data from all devices. While a master process will need to coordinate the processing, some of the processing may be performed on each of the microphone arrays via a memory and processor coupled to each microphone array device. During the calibration process, relative locations of the microphone arrays may be established in a single coordinate system. For example, one array may be designated as the origin (i.e., (x, y, z) = (0, 0, 0)) and the other microphone arrays will be located at corresponding Cartesian coordinates with respect to this origin position. Knowing relative locations permits merging of beam tracking zones across multiple arrays and determining which array "owns" each beam when performing actual beamforming, which also provides input for automatic beam mixing and gain control procedures. The calibration procedure may require ranging of signals for a few seconds per microphone array; the entire process may require a few minutes.
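The ranging part of this calibration can be sketched as a matched-filter delay estimate converted to a distance; this sketch assumes the emitting and receiving devices share (or have been aligned to) a common sample clock, and the names are illustrative:

```python
import numpy as np

def inter_array_distance(emitted, received, fs, c=343.0):
    """Estimate the distance between two devices from a calibration
    signal: cross-correlate (matched filter) the received capture
    against the known emitted signal and convert the peak lag to a
    distance via the speed of sound."""
    corr = np.correlate(received, emitted, mode="full")
    # Positive lag = number of samples by which `received` is delayed.
    lag = int(np.argmax(np.abs(corr))) - (len(emitted) - 1)
    return lag / fs * c

# A 48-sample delay at 48 kHz corresponds to 1 ms, i.e. about 0.343 m.
fs = 48000
pulse = np.array([0.0, 1.0, 0.5, -0.3, 0.0])
capture = np.concatenate([np.zeros(48), pulse, np.zeros(10)])
d = inter_array_distance(pulse, capture, fs)
```

Repeating this for each array pair yields the inter-array distances from which the single shared coordinate system can be built.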
One example result may reduce mixing of multiple out-of-phase versions of the same voice to reduce feedback and unwanted audio signals. When the arrays work independently and each tracks the same voice at a given time, the result can be unfavorable: due to their different physical locations, a person's voice originating from a common location would have different phase delays at each microphone array, which in turn would lead to voice degradation from a comb-filtering type effect. Another objective may be to have the closest microphone array responsible for forming an audio beam for a given talker, since proximity to the talker optimizes the signal-to-noise ratio (SNR) compared to a more distant microphone array.
One example embodiment may provide optimizing the accuracy of a beam tracking operation by discerning distances, triangulating between multiple microphone arrays based on energy tracking. The distance and energy information may be used to decide which array unit is responsible for providing a beamformed signal for a particular voice source (person). The method may also include determining mixing weights for merging the various beam signals originating from multiple microphone arrays into a single room mixed signal.
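A minimal sketch of the ownership and mixing-weight decision might look like the following. The energy-proportional weighting is an assumed heuristic for illustration, not the patent's exact procedure.

```python
import numpy as np

def assign_owner_and_weights(energies):
    """Pick the owning array for a talker and derive room-mix weights.

    `energies` maps array id -> mean beam energy toward the talker;
    the loudest (typically nearest) array owns the beam, and the
    normalized energies become mixing weights.
    """
    ids = list(energies)
    e = np.array([energies[i] for i in ids], dtype=float)
    owner = ids[int(np.argmax(e))]
    weights = dict(zip(ids, e / e.sum()))
    return owner, weights

owner, weights = assign_owner_and_weights(
    {"array1": 0.9, "array2": 0.3, "array3": 0.1})
print(owner)  # "array1" owns the beam and receives the dominant weight
```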
The adaptation to voices may be based on actual live event data received from the event room as a meeting occurs; such a procedure does not require samples of audio and/or calibration of beam positions in a setup stage prior to a conference event. The system provides dynamic and ongoing adjustments among the microphone arrays based on the data received regarding locations of speakers, background noise levels, direction of voices, etc. The process may begin from an initial room condition, which could be a uniform distribution of ‘N’ beam zones around 360 degrees (i.e., 360/N degrees apart), a stored distribution based on a final state from a previous event, a preset configuration that was created and saved through a user interface, or a configuration created by sampling voices in different places of the event room.
As the meeting begins, the array may automatically adapt the beam tracking zones according to detected voice locations and activity in the room over a certain period of time. For instance, the process may begin with four beams at 0, 90, 180 and 270 degrees, each covering +/−45 degrees around a center point. Then, if someone begins talking at a 30-degree angle, the first beam zone will gradually adapt to be centered on 30 degrees +/− some range, and the other three beams will adjust accordingly. An initial condition may provide a beam zone distribution of four uniformly spaced zones; however, six or another number may also be appropriate depending on the circumstances. There may be some changes to the center and range of some of the zones after some live usage activity to account for actual talker locations during a meeting.
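The gradual re-centering described above could be sketched with a simple exponential-smoothing update. The smoothing constant `alpha` is an illustrative assumption; the patent does not specify an adaptation rule.

```python
import numpy as np

def init_zones(n):
    """N beam zones uniformly spaced around 360 degrees."""
    return list(np.arange(n) * 360.0 / n)

def adapt_zones(centers, talker_deg, alpha=0.2):
    """Pull the nearest zone center gradually toward a detected talker."""
    # Signed angular difference of each center from the talker, in (-180, 180]
    diffs = [(c - talker_deg + 180) % 360 - 180 for c in centers]
    i = int(np.argmin([abs(d) for d in diffs]))
    centers[i] = (centers[i] - alpha * diffs[i]) % 360
    return centers

zones = init_zones(4)            # [0, 90, 180, 270]
for _ in range(20):              # a talker keeps speaking at 30 degrees
    zones = adapt_zones(zones, 30.0)
print(round(zones[0], 1))        # first zone has drifted close to 30
```

Each detection nudges only the owning zone, so the other three zones keep covering their regions until voice activity appears there as well.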
According to another example embodiment, multiple microphone array devices (modules) may be strategically arranged in a single room or ‘space’. Those modules may be identified by a central controller as being located in a particular location and/or zone of the room. The modules may also be aware of their position and other module positions throughout the space. Location information may be used to provide a joint beamforming configuration where multiple microphone arrays provide and contribute to a single beamform configuration. The modules or central controller may perform intelligent mixing of beamformed audio signals and voice tracking data. The grouping of modules in a single room and their configuration and relative position/locations and orientation may be automatically configured and adjusted by a process that jointly detects calibration signals emitted from each device. The calibration signals may be spoken words by a speaker, pulses sent from the speakers in the room or speakers associated with the modules, etc.
In general, there may be some physical separation between the arrays 212, 214 and 216. One approach may provide separating the arrays by one meter from one another; the configuration may also include modules that are directly adjacent to one another. During a joint beamforming configuration, all microphone elements of all arrays may participate in one or more beamforms used to capture audio from various parts of the room. The controller 220 may incorporate one, some or all of the microphone array elements into any number of joint beamforms to create one large beamforming array. Beamformer steering directions and tracking zones are created and managed for all the microphone arrays so that multiple arrays may perform a single joint beamforming activity.
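A joint beamform that pools elements from several arrays can be illustrated with a basic delay-and-sum sketch. The element coordinates, focus point, and integer-sample delays are simplifying assumptions; a production beamformer would use fractional delays and weighting.

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def joint_delay_and_sum(signals, mic_positions, focus, fs):
    """Delay-and-sum across mic elements pooled from several arrays.

    `mic_positions` are element coordinates in the shared room frame
    established during calibration; `focus` is the target point.
    """
    dists = np.linalg.norm(mic_positions - focus, axis=1)
    # Farther mics receive the wavefront later; advance them to align.
    delays = np.round((dists - dists.min()) / C * fs).astype(int)
    n = signals.shape[1] - delays.max()
    aligned = np.stack([s[d:d + n] for s, d in zip(signals, delays)])
    return aligned.mean(axis=0)

fs = 16000
x = np.random.default_rng(1).standard_normal(2048)

# Two 2-element arrays in the shared frame; a talker at (2, 1, 0)
mics = np.array([[0, 0, 0], [0.1, 0, 0], [4, 0, 0], [4.1, 0, 0]], float)
focus = np.array([2.0, 1.0, 0.0])
d = np.linalg.norm(mics - focus, axis=1)
lags = np.round((d - d.min()) / C * fs).astype(int)
sigs = np.stack([np.concatenate([np.zeros(l), x])[: len(x)] for l in lags])

y = joint_delay_and_sum(sigs, mics, focus, fs)
print(np.allclose(y, x[: len(y)]))  # → True: the joint beam recovers the source
```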
According to another example embodiment, a microphone array and speaker system may utilize an automated location-based mixing procedure to reduce undesirable feedback from occurring in a predefined space. The configuration may include one or more microphone arrays or array devices and multiple speakers used for local reinforcement, so that the active beam location from a microphone array is used to invoke an automated mixing and reduction (mix-minus) procedure to reduce relative feedback of one or more persons' voices as they are amplified through the room speakers. Detecting locations of the speakers in the room relative to the microphone arrays may be performed to determine certain characteristics of the potential for noise feedback and the degree of correction necessary. In operation, calibration signals may be emitted from the speakers and detected to identify speaker locations with respect to the various microphone arrays. Delays may also be determined to identify characteristics between microphones and speakers in the room. In another example, the calibration signals may be emitted from speakers that are not necessarily physically co-located with the microphone array device.
In one example embodiment, a DSP processing algorithm may be used to automate the configuration of a mixing and subtracting system to optimize for gain before feedback occurs. Feedback occurs when the gain of a microphone-loudspeaker combination is greater than 0 dB at one or more frequencies. The rate at which feedback will grow or decay is based on the following formula: R=G/D, where: “R” is the feedback growth/decay rate in dB/sec (i.e., how quickly the feedback tone will get louder or softer), “G” is the acoustic gain of the microphone-loudspeaker combination in dB (i.e., the difference between the level of a signal sent to the DSP output and the level of the same signal received by the microphone at the DSP input), and “D” is the delay of the microphone-loudspeaker combination in seconds (i.e., the elapsed time between when a signal is picked up by a microphone, output by the loudspeaker, and arrives back at the microphone).
Since delay is always a positive value, the gain of the microphone-loudspeaker combination must be greater than 0 dB for feedback to occur. However, if the gain is negative but still relatively close to 0 dB, the feedback decay rate will be slow and an undesirable, audible “ringing” will be heard in the system. For instance, if the gain of a microphone-loudspeaker combination is −0.1 dB and its delay is 0.02 seconds (20 ms), then feedback will decay at a rate of 5 dB/sec, which is certainly audible. If the level of the microphone's contribution to that loudspeaker is reduced by 3 dB, then feedback will decay at a much faster rate of 155 dB/sec. Feedback is frequency-dependent: it creates resonances at periodic frequencies, which depend on the delay time, and feedback will first occur at those resonant frequencies. If a DSP algorithm has the ability to measure the inherent gain and delay of a microphone-loudspeaker combination, it can manage the rate of feedback decay in the system by modifying the gain or by modifying the delay, although modifying the delay would likely have undesirable side effects. Such an algorithm can maximize the level of the microphone's signal being reproduced by the loudspeaker while minimizing the potential for feedback.
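The rate formula and the worked numbers above can be checked directly; this is a straight transcription of R = G/D using the example gain and delay values from the text.

```python
def feedback_rate_db_per_sec(gain_db, delay_s):
    """R = G / D: feedback growth (positive) or decay (negative) rate."""
    return gain_db / delay_s

# Worked example from the text: G = -0.1 dB, D = 20 ms
slow = feedback_rate_db_per_sec(-0.1, 0.020)   # about -5 dB/s: slow, audible ringing
fast = feedback_rate_db_per_sec(-3.1, 0.020)   # about -155 dB/s after a 3 dB cut
print(slow, fast)
```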
The proposed algorithm/procedure is designed to maximize gain before feedback; however, it is important to note the limits of this mix-and-subtraction system beyond maximizing gain before feedback. For instance, this algorithm should not be expected to maximize speech intelligibility or to properly set up voice lift systems, where the reinforcement system is not designed to be “heard” and the listener still perceives the sound as originating from the talker. That requires much more knowledge of the relative distances between the talker and listener, and between the listener and loudspeaker; maximizing gain before feedback is not the only task required to properly set up such a system. Likewise, this algorithm/procedure should not be expected to properly set up the gain structure of an entire system or to correct for poor gain structure.
The procedure may be set up to adjust the cross-point attenuations within a matrix mixer such that gain before feedback is maximized. In order to perform this function, the algorithm first needs to measure the gain of each microphone-loudspeaker combination. The procedure will output a sufficiently loud noise signal out of each speaker zone at a known level, one zone at a time. It will then measure the level of the signal received by each microphone while that single speaker (or zone of speakers) is activated. The gain measurements are taken while the microphone is routed to the speaker, because the transfer function of the open-loop system (i.e., where no feedback is possible) will be different from the transfer function of the closed-loop system. In order for the procedure to calculate the exact feedback decay rate of each microphone-loudspeaker combination, it would also need to measure the delay of each combination. However, measuring the delay of a microphone-loudspeaker combination may be more complicated than simply measuring the gain and/or may require different test signals. Furthermore, for these purposes, it can be assumed that the delay will be reasonably small (e.g., less than 50 milliseconds) for any microphone-loudspeaker combination that actually has enough gain to produce feedback.
The microphone array may be used to locate the speakers for purposes of estimating delay and/or gain correction. As described above, calibration signals emitted from the speakers may be detected to identify speaker locations with respect to the various microphone arrays and to determine the potential for noise feedback, gain, and/or a relative degree of correction necessary; delays between microphones and speakers in the room may also be determined, and the calibration signals may be emitted from speakers that are not necessarily physically co-located with the microphone array device.
Therefore, if the acoustic gain of the microphone-loudspeaker combination is less than some threshold value (e.g., −3 dB), then the feedback decay rate will be acceptable and “ringing” will not be audible. For this reason, measuring the delay of each microphone-loudspeaker combination is unnecessary. Once the algorithm has measured the gain of each microphone-loudspeaker combination, it must check whether any combinations have an acoustic gain that is greater than the threshold value (−3 dB). For any combination with a gain greater than the threshold value, the algorithm will attenuate the matrix mixer crosspoint corresponding to that combination by a value that lowers the gain below the threshold value. For any combination with an acoustic gain that is already less than the threshold value, the algorithm will pass the signal through at unity gain for the corresponding crosspoint, and no positive gain will be added to any crosspoint.
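A sketch of the crosspoint-attenuation step might look like the following. The −3 dB threshold follows the text, while the matrix layout and function name are illustrative assumptions.

```python
import numpy as np

THRESHOLD_DB = -3.0  # acceptable open-loop gain before audible ringing

def set_crosspoints(gains_db):
    """Attenuate matrix-mixer crosspoints so no mic-loudspeaker loop
    exceeds the gain threshold; compliant loops pass at unity (0 dB).

    gains_db[m][s] is the measured acoustic gain from mic m through
    speaker zone s; the returned matrix holds crosspoint attenuations
    in dB (always <= 0: no positive gain is ever added).
    """
    gains = np.asarray(gains_db, dtype=float)
    return np.where(gains > THRESHOLD_DB, THRESHOLD_DB - gains, 0.0)

# Hypothetical measured gains for 2 mics x 2 speaker zones
measured = [[-6.0, 1.5],
            [-2.0, -10.0]]
atten = set_crosspoints(measured)
print(atten)  # only the two loops above -3 dB are attenuated
```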
The method may further include designating a plurality of sub-regions which collectively provide the defined space, scanning each of the plurality of sub-regions for the one or more sound instances, and designating each of the plurality of sub-regions as a desired sound sub-region or an unwanted noise sub-region based on the sound instances received by the plurality of microphone arrays during the scanning of the plurality of sub-regions, and the one or more sound instances may include a human voice. The method may also provide subsequently re-scanning each of the plurality of sub-regions for new desired sound instances, creating a new modified beamform tracking configuration based on new locations of the new desired sound instances, and saving the new modified beamform tracking configuration in the memory of the microphone array controller. The preliminary beamform tracking configuration for each sub-region and the modified beamform tracking configuration include a beamform center steering location and a beamform steering region range. Also, the method may perform determining estimated locations of the detected one or more sound instances, as detected by the microphone array, by performing microphone array localization based on time delay of arrival (TDOA) or steered response power (SRP). In addition to sound being transmitted, received and processed by the controller, determining a location via the controller may be based on metadata signals produced by the audio sensing devices, which include location and/or direction vector data (e.g., error-bound direction data, spectral data and/or temporal audio data). The controller may be distributed, such as across multiple controller locations which receive sound, metadata and other indicators for accurate prediction purposes.
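One common way to implement the TDOA estimation mentioned above is generalized cross-correlation with phase transform (GCC-PHAT). The patent does not specify a particular estimator; this sketch assumes two microphone channels and an integer-sample delay.

```python
import numpy as np

def gcc_phat_tdoa(sig, ref, fs):
    """Estimate the time delay of arrival between two mics via GCC-PHAT.

    Phase-transform weighting whitens the cross-spectrum so the peak
    location depends only on the delay, which helps in reverberant rooms.
    """
    n = len(sig) + len(ref)
    S = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    S /= np.abs(S) + 1e-12                      # PHAT weighting
    cc = np.fft.irfft(S, n)
    cc = np.concatenate([cc[-(n // 2):], cc[: n // 2 + 1]])  # lags -n/2..n/2
    return (np.argmax(np.abs(cc)) - n // 2) / fs

fs = 16000
x = np.random.default_rng(2).standard_normal(4096)
delay = 12                                      # samples: ~0.26 m extra path
sig = np.concatenate([np.zeros(delay), x])[: len(x)]

print(gcc_phat_tdoa(sig, x, fs) * fs)  # ≈ 12 samples
```

The recovered TDOAs from several microphone pairs would then feed the SRP or triangulation step that produces a location estimate.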
The method may also include forming one or more beamformed signals according to the beamform tracking configurations for each of the plurality of microphone arrays, combining, via the central controller, the one or more beamformed signals from each of the plurality of microphone arrays, emitting the audio signals as an audio calibration signal from a known position, and receiving the audio calibration signal at each of the microphone arrays. The audio calibration signal may include one or more of a pulsed tone, a pseudorandom sequence signal, a chirp signal and a sweep signal, and creating the beamform tracking configurations for each of the plurality of microphone arrays further includes combining beamformed signals from each of the plurality of the microphone arrays into a single joint beamformed signal. The audio calibration signals are emitted from each of the microphone arrays, and the method also includes displaying beam zone and microphone array locations on a user interface.
The method may also include increasing an acoustic gain or decreasing an acoustic gain responsive to receiving the acoustic stimulus and the loudspeaker location information. The acoustic gain includes a function of a difference between a level of the acoustic stimulus processed as output by a digital signal processor and the level of the acoustic stimulus received at the at least one microphone. The method also includes outputting the acoustic stimulus, at a known signal level, from each of a plurality of loudspeakers one loudspeaker zone at a time, and each loudspeaker zone includes one or more of the at least one loudspeaker, and the method also includes determining a delay for each combination of the at least one microphone and the plurality of loudspeakers. The method may also include performing an acoustic gain measurement for each combination of the at least one microphone and the plurality of loudspeakers, and determining whether the acoustic gain is less than a predefined threshold value, and when the acoustic gain is less than the predefined threshold value, setting a feedback decay rate based on the acoustic gain to minimize the acoustic feedback.
The above embodiments may be implemented in hardware, in a computer program executed by a processor, in firmware, or in a combination of the above. A computer program may be embodied on a computer readable medium, such as a storage medium. For example, a computer program may reside in random access memory (“RAM”), flash memory, read-only memory (“ROM”), erasable programmable read-only memory (“EPROM”), electrically erasable programmable read-only memory (“EEPROM”), registers, hard disk, a removable disk, a compact disk read-only memory (“CD-ROM”), or any other form of storage medium known in the art.
An exemplary storage medium may be coupled to the processor such that the processor may read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application specific integrated circuit (“ASIC”). In the alternative, the processor and the storage medium may reside as discrete components. For example, FIG. 5 illustrates an example computer system architecture 500, which may represent or be integrated in any of the above-described components, etc.
In computing node 500 there is a computer system/server 502, which is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 502 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.
Computer system/server 502 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Computer system/server 502 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
As shown in FIG. 5 , computer system/server 502 in a computing node 500 is shown in the form of a general-purpose computing device. The components of computer system/server 502 may include, but are not limited to, one or more processors or processing units 504, a system memory 506, and a bus that couples various system components including system memory 506 to processor 504.
The bus represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnects (PCI) bus.
Computer system/server 502 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 502, and it includes both volatile and non-volatile media, removable and non-removable media. System memory 506, in one embodiment, implements the flow diagrams of the other figures. The system memory 506 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 510 and/or cache memory 512. Computer system/server 502 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 514 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to the bus by one or more data media interfaces. As will be further depicted and described below, memory 506 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of various embodiments of the application.
Program/utility 516, having a set (at least one) of program modules 518, may be stored in memory 506 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment. Program modules 518 generally carry out the functions and/or methodologies of various embodiments of the application as described herein.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method, or computer program product. Accordingly, aspects of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present application may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Computer system/server 502 may also communicate with one or more external devices 520 such as a keyboard, a pointing device, a display 522, etc.; one or more devices that enable a user to interact with computer system/server 502; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 502 to communicate with one or more other computing devices. Such communication can occur via I/O interfaces 524. Still yet, computer system/server 502 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 526. Also, communications with an external audio device, such as a microphone array, over the network or via another proprietary protocol may also be necessary to transfer/share audio data. As depicted, network adapter 526 communicates with the other components of computer system/server 502 via a bus. It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system/server 502. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.
Although an exemplary embodiment of at least one of a system, method, and non-transitory computer readable medium has been illustrated in the accompanying drawings and described in the foregoing detailed description, it will be understood that the application is not limited to the embodiments disclosed, but is capable of numerous rearrangements, modifications, and substitutions as set forth and defined by the following claims. For example, the capabilities of the system of the various figures can be performed by one or more of the modules or components described herein or in a distributed architecture and may include a transmitter, a receiver, or a pair of both. For example, all or part of the functionality performed by the individual modules may be performed by one or more of these modules. Further, the functionality described herein may be performed at various times and in relation to various events, internal or external to the modules or components. Also, the information sent between various modules can be sent between the modules via at least one of: a data network, the Internet, a voice network, an Internet Protocol network, a wireless device, a wired device and/or via a plurality of protocols. Also, the messages sent or received by any of the modules may be sent or received directly and/or via one or more of the other modules.
One skilled in the art will appreciate that a “system” could be embodied as a personal computer, a server, a console, a personal digital assistant (PDA), a cell phone, a tablet computing device, a smartphone or any other suitable computing device, or combination of devices. Presenting the above-described functions as being performed by a “system” is not intended to limit the scope of the present application in any way, but is intended to provide one example of many embodiments. Indeed, methods, systems and apparatuses disclosed herein may be implemented in localized and distributed forms consistent with computing technology.
It should be noted that some of the system features described in this specification have been presented as modules, in order to more particularly emphasize their implementation independence. For example, a module may be implemented as a hardware circuit comprising custom very large scale integration (VLSI) circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, graphics processing units, or the like.
A module may also be at least partially implemented in software for execution by various types of processors. An identified unit of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions that may, for instance, be organized as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different locations which, when joined logically together, comprise the module and achieve the stated purpose for the module. Further, modules may be stored on a computer-readable medium, which may be, for instance, a hard disk drive, flash device, random access memory (RAM), tape, or any other such medium used to store data.
Indeed, a module of executable code could be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Similarly, operational data may be identified and illustrated herein within modules, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, merely as electronic signals on a system or network.
It will be readily understood that the components of the application, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments is not intended to limit the scope of the application as claimed, but is merely representative of selected embodiments of the application.
One having ordinary skill in the art will readily understand that the above may be practiced with steps in a different order, and/or with hardware elements in configurations that are different than those which are disclosed. Therefore, although the application has been described based upon these preferred embodiments, it would be apparent to those of skill in the art that certain modifications, variations, and alternative constructions would be apparent.
While preferred embodiments of the present application have been described, it is to be understood that the embodiments described are illustrative only and the scope of the application is to be defined solely by the appended claims when considered with a full range of equivalents and modifications (e.g., protocols, hardware devices, software platforms etc.) thereto.
Claims (20)
1. A method, comprising:
receiving audio signals as an audio calibration signal at a controller from a plurality of microphone arrays in a space comprising a plurality of sub-regions;
assigning each of the plurality of sub-regions to at least one of the plurality of microphone arrays based on known locations of each of the plurality of microphone arrays;
combining beamformed signals from each of the plurality of the microphone arrays into a single joint beamformed signal; and
creating beamform tracking configurations for each of the plurality of microphone arrays based on their assigned sub-regions and the single joint beamformed signal.
2. The method of claim 1 , further comprising:
forming one or more beamformed signals according to the beamform tracking configurations for each of the plurality of microphone arrays.
3. The method of claim 2 , further comprising:
combining, via the controller, the one or more beamformed signals from each of the plurality of microphone arrays.
4. The method of claim 1 , further comprising:
receiving the audio calibration signal at each of the microphone arrays.
5. The method of claim 4 , wherein the audio calibration signal comprises one or more of a pulsed tone, a pseudorandom sequence signal, a chirp signal and a sweep signal.
6. The method of claim 4 , further comprising:
emitting the audio signals as the audio calibration signal from a known position, wherein the audio calibration signals are emitted from each of the microphone arrays.
7. The method of claim 1 , further comprising:
displaying beam zone and microphone array locations on a user interface.
8. An apparatus, comprising:
a receiver configured to:
receive audio signals as an audio calibration signal at a controller from a plurality of microphone arrays in a space comprising a plurality of sub-regions; and
a processor configured to:
combine beamformed signals from each of the plurality of the microphone arrays into a single joint beamformed signal,
assign each of the plurality of sub-regions to at least one of the plurality of microphone arrays based on known locations of each of the plurality of microphone arrays, and
create beamform tracking configurations for each of the plurality of microphone arrays based on their assigned sub-regions and the single joint beamformed signal.
9. The apparatus of claim 8 , wherein the processor is further configured to:
form one or more beamformed signals according to the beamform tracking configurations for each of the plurality of microphone arrays.
10. The apparatus of claim 9 , wherein the processor is further configured to:
combine, via the controller, the one or more beamformed signals from each of the plurality of microphone arrays.
11. The apparatus of claim 8 , wherein the receiver is further configured to:
receive the audio calibration signal at each of the microphone arrays.
12. The apparatus of claim 11 , wherein the audio calibration signal comprises one or more of:
a pulsed tone, a pseudorandom sequence signal, a chirp signal and a sweep signal.
13. The apparatus of claim 12 , wherein the processor is further configured to emit the audio signals as the audio calibration signal from a known position, wherein the audio calibration signals are emitted from each of the microphone arrays.
14. The apparatus of claim 8, wherein the processor is further configured to:
display beam zone and microphone array locations on a user interface.
15. A non-transitory computer readable storage medium configured to store one or more instructions that when executed by a processor cause the processor to perform:
receiving audio signals as an audio calibration signal at a controller from a plurality of microphone arrays in a space comprising a plurality of sub-regions;
assigning each of the plurality of sub-regions to at least one of the plurality of microphone arrays based on known locations of each of the plurality of microphone arrays;
combining beamformed signals from each of the plurality of microphone arrays into a single joint beamformed signal; and
creating beamform tracking configurations for each of the plurality of microphone arrays based on their assigned sub-regions and the single joint beamformed signal.
16. The non-transitory computer readable storage medium of claim 15, wherein the one or more instructions are further configured to cause the processor to perform:
forming one or more beamformed signals according to the beamform tracking configurations for each of the plurality of microphone arrays.
17. The non-transitory computer readable storage medium of claim 16, wherein the one or more instructions are further configured to cause the processor to perform:
combining, via the controller, the one or more beamformed signals from each of the plurality of microphone arrays.
18. The non-transitory computer readable storage medium of claim 15, wherein the one or more instructions are further configured to cause the processor to perform:
receiving the audio calibration signal at each of the microphone arrays.
19. The non-transitory computer readable storage medium of claim 18, wherein the audio calibration signal comprises one or more of:
a pulsed tone, a pseudorandom sequence signal, a chirp signal and a sweep signal.
20. The non-transitory computer readable storage medium of claim 15, wherein the one or more instructions are further configured to cause the processor to perform:
displaying beam zone and microphone array locations on a user interface, and wherein audio calibration signals are emitted from each of the microphone arrays.
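Across all three independent claims, the final step is creating per-array "beamform tracking configurations" from the assigned sub-regions. One hedged reading, assuming a configuration is simply a steering azimuth from each array toward each sub-region it was assigned (the positions and the assignment below are illustrative, not from the patent):

```python
import math

# Hypothetical geometry; positions are illustrative only.
arrays = {"A": (0.0, 0.0), "B": (4.0, 0.0)}
subregions = {"zone1": (1.0, 1.0), "zone2": (3.5, 0.5)}
assignment = {"zone1": "A", "zone2": "B"}   # e.g. output of a nearest-array rule

def tracking_configs(arrays, subregions, assignment):
    """Build one tracking configuration per array: the steering azimuth
    (degrees) from the array toward each sub-region assigned to it."""
    configs = {name: [] for name in arrays}
    for zone, array_name in assignment.items():
        ax, ay = arrays[array_name]
        zx, zy = subregions[zone]
        azimuth = math.degrees(math.atan2(zy - ay, zx - ax))
        configs[array_name].append((zone, round(azimuth, 1)))
    return configs

print(tracking_configs(arrays, subregions, assignment))
```

A real beamformer would carry more state per configuration (lobe width, tracking limits, adaptation rates); the azimuth-per-assigned-zone mapping is the minimal skeleton.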
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/990,924 US11211081B1 (en) | 2018-06-25 | 2020-08-11 | Microphone array with automated adaptive beam tracking |
US17/564,073 US11676618B1 (en) | 2018-06-25 | 2021-12-28 | Microphone array with automated adaptive beam tracking |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/017,538 US10210882B1 (en) | 2018-06-25 | 2018-06-25 | Microphone array with automated adaptive beam tracking |
US16/279,927 US10741193B1 (en) | 2018-06-25 | 2019-02-19 | Microphone array with automated adaptive beam tracking |
US16/990,924 US11211081B1 (en) | 2018-06-25 | 2020-08-11 | Microphone array with automated adaptive beam tracking |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/279,927 Continuation US10741193B1 (en) | 2018-06-25 | 2019-02-19 | Microphone array with automated adaptive beam tracking |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/564,073 Continuation US11676618B1 (en) | 2018-06-25 | 2021-12-28 | Microphone array with automated adaptive beam tracking |
Publications (1)
Publication Number | Publication Date |
---|---|
US11211081B1 (en) | 2021-12-28 |
Family
ID=65322693
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/017,538 Active US10210882B1 (en) | 2018-06-25 | 2018-06-25 | Microphone array with automated adaptive beam tracking |
US16/279,927 Active US10741193B1 (en) | 2018-06-25 | 2019-02-19 | Microphone array with automated adaptive beam tracking |
US16/990,924 Active US11211081B1 (en) | 2018-06-25 | 2020-08-11 | Microphone array with automated adaptive beam tracking |
US17/564,073 Active US11676618B1 (en) | 2018-06-25 | 2021-12-28 | Microphone array with automated adaptive beam tracking |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/017,538 Active US10210882B1 (en) | 2018-06-25 | 2018-06-25 | Microphone array with automated adaptive beam tracking |
US16/279,927 Active US10741193B1 (en) | 2018-06-25 | 2019-02-19 | Microphone array with automated adaptive beam tracking |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/564,073 Active US11676618B1 (en) | 2018-06-25 | 2021-12-28 | Microphone array with automated adaptive beam tracking |
Country Status (1)
Country | Link |
---|---|
US (4) | US10210882B1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11638091B2 (en) | 2018-06-25 | 2023-04-25 | Biamp Systems, LLC | Microphone array with automated adaptive beam tracking |
US11676618B1 (en) * | 2018-06-25 | 2023-06-13 | Biamp Systems, LLC | Microphone array with automated adaptive beam tracking |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9554207B2 (en) | 2015-04-30 | 2017-01-24 | Shure Acquisition Holdings, Inc. | Offset cartridge microphones |
US9565493B2 (en) | 2015-04-30 | 2017-02-07 | Shure Acquisition Holdings, Inc. | Array microphone system and method of assembling the same |
US10367948B2 (en) | 2017-01-13 | 2019-07-30 | Shure Acquisition Holdings, Inc. | Post-mixing acoustic echo cancellation systems and methods |
WO2019231632A1 (en) | 2018-06-01 | 2019-12-05 | Shure Acquisition Holdings, Inc. | Pattern-forming microphone array |
JP7024615B2 (en) * | 2018-06-07 | 2022-02-24 | 日本電信電話株式会社 | Blind separation devices, learning devices, their methods, and programs |
US11297423B2 (en) | 2018-06-15 | 2022-04-05 | Shure Acquisition Holdings, Inc. | Endfire linear array microphone |
US10433086B1 (en) | 2018-06-25 | 2019-10-01 | Biamp Systems, LLC | Microphone array with automated adaptive beam tracking |
EP3854108A1 (en) | 2018-09-20 | 2021-07-28 | Shure Acquisition Holdings, Inc. | Adjustable lobe shape for array microphones |
US10878812B1 (en) * | 2018-09-26 | 2020-12-29 | Amazon Technologies, Inc. | Determining devices to respond to user requests |
US11558693B2 (en) | 2019-03-21 | 2023-01-17 | Shure Acquisition Holdings, Inc. | Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition and voice activity detection functionality |
CN113841419A (en) | 2019-03-21 | 2021-12-24 | 舒尔获得控股公司 | Housing and associated design features for ceiling array microphone |
CN113841421A (en) * | 2019-03-21 | 2021-12-24 | 舒尔获得控股公司 | Auto-focus, in-region auto-focus, and auto-configuration of beamforming microphone lobes with suppression |
US11432086B2 (en) | 2019-04-16 | 2022-08-30 | Biamp Systems, LLC | Centrally controlling communication at a venue |
WO2020237206A1 (en) | 2019-05-23 | 2020-11-26 | Shure Acquisition Holdings, Inc. | Steerable speaker array, system, and method for the same |
WO2020243471A1 (en) | 2019-05-31 | 2020-12-03 | Shure Acquisition Holdings, Inc. | Low latency automixer integrated with voice and noise activity detection |
CN110517703B (en) * | 2019-08-15 | 2021-12-07 | 北京小米移动软件有限公司 | Sound collection method, device and medium |
JP2022545113A (en) | 2019-08-23 | 2022-10-25 | シュアー アクイジッション ホールディングス インコーポレイテッド | One-dimensional array microphone with improved directivity |
CN110913306B (en) * | 2019-12-02 | 2021-07-02 | 北京飞利信电子技术有限公司 | Method for realizing array microphone beam forming |
US11361774B2 (en) * | 2020-01-17 | 2022-06-14 | Lisnr | Multi-signal detection and combination of audio-based data transmissions |
US11552611B2 (en) | 2020-02-07 | 2023-01-10 | Shure Acquisition Holdings, Inc. | System and method for automatic adjustment of reference gain |
US11617035B2 (en) * | 2020-05-04 | 2023-03-28 | Shure Acquisition Holdings, Inc. | Intelligent audio system using multiple sensor modalities |
WO2021243368A2 (en) | 2020-05-29 | 2021-12-02 | Shure Acquisition Holdings, Inc. | Transducer steering and configuration systems and methods using a local positioning system |
CN112492452B (en) * | 2020-11-26 | 2022-08-26 | 北京字节跳动网络技术有限公司 | Beam coefficient storage method, device, equipment and storage medium |
JP2024505068A (en) | 2021-01-28 | 2024-02-02 | シュアー アクイジッション ホールディングス インコーポレイテッド | Hybrid audio beamforming system |
Citations (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5335011A (en) | 1993-01-12 | 1994-08-02 | Bell Communications Research, Inc. | Sound localization system for teleconferencing using self-steering microphone arrays |
US20010002930A1 (en) | 1997-11-18 | 2001-06-07 | Kates James Mitchell | Feedback cancellation improvements |
US20050201204A1 (en) | 2004-03-11 | 2005-09-15 | Stephane Dedieu | High precision beamsteerer based on fixed beamforming approach beampatterns |
US20060241490A1 (en) | 2005-03-25 | 2006-10-26 | Siemens Medical Solutions Usa, Inc. | Multi stage beamforming |
US20080199024A1 (en) * | 2005-07-26 | 2008-08-21 | Honda Motor Co., Ltd. | Sound source characteristic determining device |
US20100025184A1 (en) | 2005-02-24 | 2010-02-04 | Jgc Corporation | Mercury removal apparatus for liquid hydrocarbon |
US20100215184A1 (en) | 2009-02-23 | 2010-08-26 | Nuance Communications, Inc. | Method for Determining a Set of Filter Coefficients for an Acoustic Echo Compensator |
US20110103612A1 (en) | 2009-11-03 | 2011-05-05 | Industrial Technology Research Institute | Indoor Sound Receiving System and Indoor Sound Receiving Method |
US20110164761A1 (en) | 2008-08-29 | 2011-07-07 | Mccowan Iain Alexander | Microphone array system and method for sound acquisition |
US20120087509A1 (en) | 2010-10-06 | 2012-04-12 | Thomas Bo Elmedyb | Method of determining parameters in an adaptive audio processing algorithm and an audio processing system |
US20130039503A1 (en) | 2011-08-11 | 2013-02-14 | Broadcom Corporation | Beamforming apparatus and method based on long-term properties of sources of undesired noise affecting voice quality |
US20130070936A1 (en) | 2011-09-20 | 2013-03-21 | Oticon A/S | Control of an adaptive feedback cancellation system based on probe signal injection |
US20130148821A1 (en) | 2011-12-08 | 2013-06-13 | Karsten Vandborg Sorensen | Processing audio signals |
US20130179163A1 (en) | 2012-01-10 | 2013-07-11 | Tobias Herbig | In-car communication system for multiple acoustic zones |
US20130259254A1 (en) | 2012-03-28 | 2013-10-03 | Qualcomm Incorporated | Systems, methods, and apparatus for producing a directional sound field |
US20130294616A1 (en) | 2010-12-20 | 2013-11-07 | Phonak Ag | Method and system for speech enhancement in a room |
US20140003622A1 (en) | 2012-06-28 | 2014-01-02 | Broadcom Corporation | Loudspeaker beamforming for personal audio focal points |
US20140023199A1 (en) | 2012-07-23 | 2014-01-23 | Qsound Labs, Inc. | Noise reduction using direction-of-arrival information |
US20140037100A1 (en) | 2012-08-03 | 2014-02-06 | Qsound Labs, Inc. | Multi-microphone noise reduction using enhanced reference noise signal |
US20140098964A1 (en) | 2012-10-04 | 2014-04-10 | Siemens Corporation | Method and Apparatus for Acoustic Area Monitoring by Exploiting Ultra Large Scale Arrays of Microphones |
US20140314251A1 (en) | 2012-10-04 | 2014-10-23 | Siemens Aktiengesellschaft | Broadband sensor location selection using convex optimization in very large scale arrays |
US20150063590A1 (en) | 2013-08-30 | 2015-03-05 | Oki Electric Industry Co., Ltd. | Sound source separating apparatus, sound source separating program, sound pickup apparatus, and sound pickup program |
US20150208166A1 (en) | 2014-01-18 | 2015-07-23 | Microsoft Corporation | Enhanced spatial impression for home audio |
US20160198258A1 (en) * | 2015-01-05 | 2016-07-07 | Oki Electric Industry Co., Ltd. | Sound pickup device, program recorded medium, and method |
US20160255446A1 (en) | 2015-02-27 | 2016-09-01 | Giuliano BERNARDI | Methods, Systems, and Devices for Adaptively Filtering Audio Signals |
US20170031530A1 (en) | 2013-12-27 | 2017-02-02 | Sony Corporation | Display control device, display control method, and program |
US20170289716A1 (en) | 2016-03-29 | 2017-10-05 | Honda Motor Co., Ltd. | Test device and test method |
US20170374454A1 (en) | 2016-06-23 | 2017-12-28 | Stmicroelectronics S.R.L. | Beamforming method based on arrays of microphones and corresponding apparatus |
US20180070185A1 (en) | 2014-05-20 | 2018-03-08 | Oticon A/S | Hearing device |
US20180115650A1 (en) | 2015-10-16 | 2018-04-26 | Panasonic Intellectual Property Management Co., Ltd. | Device for assisting two-way conversation and method for assisting two-way conversation |
US20180146307A1 (en) | 2016-11-24 | 2018-05-24 | Oticon A/S | Hearing device comprising an own voice detector |
US20180190260A1 (en) | 2017-01-05 | 2018-07-05 | Harman Becker Automotive Systems Gmbh | Active noise reduction earphones |
US20190027032A1 (en) | 2017-07-24 | 2019-01-24 | Harman International Industries, Incorporated | Emergency vehicle alert system |
US10210882B1 (en) | 2018-06-25 | 2019-02-19 | Biamp Systems, LLC | Microphone array with automated adaptive beam tracking |
US20190129026A1 (en) | 2015-06-04 | 2019-05-02 | Chikayoshi Sumi | Measurement and imaging instruments and beamforming method |
US20190174226A1 (en) | 2017-12-06 | 2019-06-06 | Honeywell International Inc. | Systems and methods for automatic speech recognition |
US20190349551A1 (en) | 2018-05-14 | 2019-11-14 | COMSATS University Islamabad | Surveillance system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7068797B2 (en) | 2003-05-20 | 2006-06-27 | Sony Ericsson Mobile Communications Ab | Microphone circuits having adjustable directivity patterns for reducing loudspeaker feedback and methods of operating the same |
EP1591995B1 (en) | 2004-04-29 | 2019-06-19 | Harman Becker Automotive Systems GmbH | Indoor communication system for a vehicular cabin |
- 2018-06-25: US16/017,538 filed; granted as US10210882B1 (en); status Active
- 2019-02-19: US16/279,927 filed; granted as US10741193B1 (en); status Active
- 2020-08-11: US16/990,924 filed; granted as US11211081B1 (en); status Active
- 2021-12-28: US17/564,073 filed; granted as US11676618B1 (en); status Active
Also Published As
Publication number | Publication date |
---|---|
US10210882B1 (en) | 2019-02-19 |
US11676618B1 (en) | 2023-06-13 |
US10741193B1 (en) | 2020-08-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11676618B1 (en) | Microphone array with automated adaptive beam tracking | |
US11863942B1 (en) | Microphone array with automated adaptive beam tracking | |
US11638091B2 (en) | Microphone array with automated adaptive beam tracking | |
US10972835B2 (en) | Conference system with a microphone array system and a method of speech acquisition in a conference system | |
US11765498B2 (en) | Microphone array system | |
JP2022526761A (en) | Beam forming with blocking function Automatic focusing, intra-regional focusing, and automatic placement of microphone lobes | |
US9338549B2 (en) | Acoustic localization of a speaker | |
GB2495472B (en) | Processing audio signals | |
JP2013543987A (en) | System, method, apparatus and computer readable medium for far-field multi-source tracking and separation | |
US20160161595A1 (en) | Narrowcast messaging system | |
US20160161594A1 (en) | Swarm mapping system | |
US20160165338A1 (en) | Directional audio recording system | |
EP3420735B1 (en) | Multitalker optimised beamforming system and method | |
US10932079B2 (en) | Acoustical listening area mapping and frequency correction | |
CN111078185A (en) | Method and equipment for recording sound | |
WO2022118072A1 (en) | Pervasive acoustic mapping | |
US10490205B1 (en) | Location based storage and upload of acoustic environment related information | |
JP2019161604A (en) | Audio processing device | |
Tashev et al. | Cost function for sound source localization with arbitrary microphone arrays | |
US20230292041A1 (en) | Sound receiving device and control method of sound receiving device | |
US11889261B2 (en) | Adaptive beamformer for enhanced far-field sound pickup | |
WO2022239650A1 (en) | Information processing device, information processing method, and program | |
WO2022119990A1 (en) | Audibility at user location through mutual device audibility | |
JP2023057964A (en) | Beamforming microphone system, sound collection program and setting program for beamforming microphone system, setting device for beamforming microphone and setting method for beamforming microphone | |
CN116830599A (en) | Pervasive acoustic mapping |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |