WO2016109103A1 - Capture audio directionnelle - Google Patents

Capture audio directionnelle Download PDF

Info

Publication number
WO2016109103A1
Authority
WO
WIPO (PCT)
Prior art keywords
directional
audio
source
estimates
salience
Prior art date
Application number
PCT/US2015/063519
Other languages
English (en)
Inventor
Harinarayanan Erumbi Vallabhan
Shailesh Sakri
Carlos Avendano
Ludger Solbach
Original Assignee
Knowles Electronics, Llc
Priority date
Filing date
Publication date
Application filed by Knowles Electronics, Llc filed Critical Knowles Electronics, Llc
Priority to CN201580071317.7A (CN107113499B)
Priority to DE112015005862.1T (DE112015005862T5)
Publication of WO2016109103A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R3/005: Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00: Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20: Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
    • H04R2430/21: Direction finding using differential microphone array [DMA]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2499/00: Aspects covered by H04R or H04S not otherwise provided for in their subgroups
    • H04R2499/10: General applications
    • H04R2499/11: Transducers incorporated or for use in hand-held devices, e.g. mobile phones, PDA's, camera's

Definitions

  • the present disclosure relates generally to audio processing and, more particularly, to systems and methods for improving performance of directional audio capture.
  • Existing systems for directional audio capture are typically configured to capture an audio signal within an area of interest (e.g., within a lobe) and to suppress anything outside the lobe. Furthermore, the existing systems for directional audio capture do not utilize the directionality of the speaker being recorded. This results in non-uniform suppression throughout the lobe. The robustness of such systems can be compromised, especially in cases of varying distances between a talker (i.e., speaker) and an audio capturing device for a given angle. If the talker moves closer to or farther away from the device, the suppression can become non-uniform.
  • suppressing/boosting certain angles is desirable to maintain uniform noise suppression across the lobe.
  • the existing directional audio capture solutions can also be very sensitive to microphone sealing. Better microphone sealing results in more uniform suppression and poor microphone sealing results in non-uniform suppression.
  • Microphone sealing, in general, can make one device different from another even when the devices come from the same manufacturing batch. A solution that remains robust to microphone sealing during changes in distance between a talker and an audio capture system is desirable.
  • An example method includes correlating phase plots of at least two audio inputs. The method allows for generating, based on the correlation, estimates of salience at different directional angles to localize at least one direction associated with at least one source of a sound. The method also includes determining cues, based on the estimates of salience, and providing the cues to the directional audio capture system.
  • the cues are used by the directional audio capture system to attenuate or amplify the at least two audio inputs at the different directional angles.
  • the cues include at least attenuation levels for the different directional angles.
  • the estimates of salience include a vector of saliences at directional angles from 0 to 360 degrees in a plane parallel to the ground.
  • generating the cues includes mapping the different directional angles to relative levels of attenuation for the directional audio capture system.
  • the method includes controlling the rate of change of the levels of attenuation in real time using attack and release time constants to avoid sound artifacts.
  • the method includes determining, based on absence or presence of one or more peaks in the estimates of salience, a mode from a plurality of the operational modes.
  • the method allows configuring, based on the determined mode, the directional audio capture system.
  • the method allows controlling a rate of switching between modes from the plurality of the operational modes in real time by applying attack and release time constants.
  • the audio inputs are captured by at least two microphones having different qualities of sealing.
  • the steps of the method for improving performance of directional audio capture systems are stored on a machine-readable medium comprising instructions which, when executed by one or more processors, perform the recited steps.
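
The correlate-then-estimate-salience step summarized above lends itself to a compact illustration. The following Python sketch is an assumption-laden illustration, not the patent's implementation: the microphone spacing, FFT size, and steering model are all invented for the example. It computes a salience vector over directional angles from the phase of the cross-spectrum of two microphone inputs; a directional source produces a peak, while diffuse noise yields a flat, low vector.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def salience_vector(x1, x2, fs, mic_distance=0.05, n_fft=512, n_angles=360):
    """Steered-response salience from the phase of the cross-spectrum of one
    frame (x1 and x2 are assumed to hold at least n_fft samples)."""
    win = np.hanning(n_fft)
    X1 = np.fft.rfft(x1[:n_fft] * win)
    X2 = np.fft.rfft(x2[:n_fft] * win)
    cross = X1 * np.conj(X2)
    phase = cross / (np.abs(cross) + 1e-12)       # PHAT-style: keep phase only
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    salience = np.zeros(n_angles)
    for i in range(n_angles):
        # Expected inter-microphone delay for a source at this angle
        # (sign convention depends on which microphone leads).
        tau = mic_distance * np.cos(np.deg2rad(i)) / SPEED_OF_SOUND
        steering = np.exp(-2j * np.pi * freqs * tau)
        # The coherent sum is large when the steering matches the observed phase.
        salience[i] = np.real(np.sum(phase * steering))
    # Two microphones alone cannot resolve front/back ambiguity; full
    # 360-degree localization may require additional microphones.
    return salience / len(freqs)
```
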
  • FIG. 1 is a block diagram of an exemplary environment in which the present technology can be used.
  • FIG. 2 is a block diagram of an exemplary audio device.
  • FIG. 3 is a block diagram of an exemplary audio processing system.
  • FIG. 4 is a block diagram of an exemplary beam former module.
  • FIG. 5 is a flow chart of an exemplary method for performing an audio zoom.
  • FIG. 6 is a flow chart of an exemplary method for enhancing acoustic signal components.
  • FIG. 7 is a flow chart of an exemplary method for generating a multiplicative mask.
  • FIG. 8 is a block diagram of an exemplary audio processing system suitable for improving performance of directional audio capture.
  • FIG. 9 is a flow chart of an exemplary method for improving performance of directional audio capture.
  • FIG. 10 is a computer system that can be used to implement methods disclosed herein, according to various example embodiments.
  • the technology disclosed herein relates to systems and methods for improving performance of directional audio capture.
  • Embodiments of the present technology may be practiced with audio devices operable at least to capture and process acoustic signals.
  • the audio devices can include: radio frequency (RF) receivers, transmitters, and transceivers; wired and/or wireless telecommunications and/or networking devices; amplifiers; audio and/or video players; encoders; decoders; and the like.
  • Audio devices may include input devices such as buttons, switches, keys, keyboards, trackballs, sliders, touch screens, one or more microphones, gyroscopes, accelerometers, global positioning system (GPS) receivers, and the like.
  • the audio devices may include outputs, such as Light-Emitting Diode (LED) indicators, video displays, touchscreens, speakers, and the like.
  • the audio devices include hand-held devices, such as wired and/or wireless remote controls, notebook computers, tablet computers, phablets, smart phones, personal digital assistants, media players, mobile telephones, and the like.
  • the audio devices include Television (TV) sets, car control and audio systems, smart thermostats, light switches, dimmers, and so on.
  • TV Television
  • the audio devices operate in stationary and portable environments.
  • Stationary environments can include residential and commercial buildings or structures, and the like.
  • the stationary environments can include living rooms, bedrooms, home theaters, conference rooms, auditoriums, business premises, and the like.
  • Portable environments can include moving vehicles, moving persons, other transportation means, and the like.
  • a method for improving a directional audio capture system includes correlating phase plots of at least two audio inputs. The method allows for generating, based on the correlation, estimates of salience at different directional angles to localize at least one direction associated with at least one source of a sound.
  • the cues include at least levels of attenuation.
  • the method includes determining cues, based on the estimates of salience, and providing the cues to the directional audio capture system.
  • FIG. 1 is a block diagram of an exemplary environment 100 in which the present technology can be used.
  • the environment 100 of FIG. 1 includes audio device 104, and audio sources 112, 114 and 116, all within an environment 100 having walls 132 and 134.
  • a user of the audio device 104 may choose to focus on or "zoom" into a particular audio source from the multiple audio sources within environment 100.
  • Environment 100 includes audio sources 112, 114, and 116 which all provide audio in multiple directions, including towards audio device 104. Additionally, reflections from audio sources 112 and 116 as well as other audio sources may provide audio which reflects off the walls 132 and 134 of the environment 100 and is directed at audio device 104. For example, reflection 128 is a reflection of an audio signal provided by audio source 112 and reflected from wall 132, and reflection 129 is a reflection of an audio signal provided by audio source 116 and reflected from wall 134, both of which travel towards audio device 104.
  • the present technology allows the user to select an area to "zoom.” By performing an audio zoom on a particular area, the present technology detects audio signals having a source within the particular area and enhances those signals with respect to signals from audio sources outside the particular area.
  • the area may be defined using a beam, such as, for example, beam 140 in FIG. 1.
  • beam 140 contains an area that includes audio source 114. Audio sources 112 and 116 are contained outside the beam area.
  • the present technology would emphasize or "zoom" into the audio signal provided by audio source 114 and de-emphasize the audio provided by audio sources 112 and 116, including any reflections provided by environment 100, such as reflections 128 and 129.
  • a primary microphone 106 and secondary microphone 108 of audio device 104 may be omni-directional microphones. Alternate embodiments may utilize other forms of microphones or acoustic sensors, such as directional microphones.
  • While the microphones 106 and 108 receive sound (i.e., acoustic signals) from the audio source 114, they also pick up noise from audio source 112.
  • Although the noise 122 is shown coming from a single location in FIG. 1, the noise 122 may include any sounds from one or more locations that differ from the location of audio source 114, and may include reverberations and echoes.
  • the noise 124 may be stationary, non-stationary, and/or a combination of both stationary and non-stationary noise.
  • Some embodiments may utilize level differences (e.g., energy differences) between the acoustic signals received by the two microphones 106 and 108. Because the primary microphone 106 is much closer to the audio source 116 than the secondary microphone 108 in a close-talk use case, the intensity level for noise 126 is higher for the primary microphone 106, resulting in a larger energy level received by the primary microphone 106 during a speech/voice segment, for example.
  • the level difference may then be used to discriminate speech and noise in the time-frequency domain. Further embodiments may use a combination of energy level differences and time delays to discriminate speech. Based on binaural cue encoding, speech signal extraction or speech enhancement may be performed.
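
As a concrete illustration of the level-difference cue described above, the sketch below marks sub-bands whose inter-microphone level difference suggests near-field speech at the primary microphone. The threshold value and band layout are assumptions, not values from the disclosure.

```python
import numpy as np

def ild_speech_mask(primary_bands, secondary_bands, ild_threshold_db=6.0):
    """Per-sub-band boolean mask: True where the inter-microphone level
    difference suggests near-field speech at the primary microphone."""
    eps = 1e-12
    ild_db = 10.0 * np.log10(
        (np.abs(primary_bands) ** 2 + eps) /
        (np.abs(secondary_bands) ** 2 + eps))
    return ild_db > ild_threshold_db
```
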
  • FIG. 2 is a block diagram of an exemplary audio device.
  • the audio device of FIG. 2 provides more detail for audio device 104 of FIG. 1.
  • the audio device 104 includes a receiver 210, a processor 220, the primary microphone 106, an optional secondary microphone 108, an audio processing system 230, and an output device 240.
  • the audio device 104 may include further or other components needed for audio device 104 operations.
  • the audio device 104 may include fewer components that perform similar or equivalent functions to those depicted in FIG. 2.
  • Processor 220 may execute instructions and modules stored in a memory (not illustrated in FIG. 2) in the audio device 104 to perform functionality described herein, including noise reduction for an acoustic signal.
  • Processor 220 may include hardware and software implemented as a processing unit, which may process floating point operations and other operations for the processor 220.
  • the exemplary receiver 210 is an acoustic sensor configured to receive a signal from a communications network.
  • the receiver 210 may include an antenna device.
  • the signal may then be forwarded to the audio processing system 230 to reduce noise using the techniques described herein, and provide an audio signal to the output device 240.
  • the present technology may be used in one or both of the transmitting and receiving paths of the audio device 104.
  • the audio processing system 230 is configured to receive the acoustic signals from an acoustic source via the primary microphone 106 and secondary microphone 108 and process the acoustic signals. Processing may include performing noise reduction within an acoustic signal.
  • the audio processing system 230 is discussed in more detail below.
  • the primary and secondary microphones 106, 108 may be spaced a distance apart in order to allow for detecting an energy level difference, time difference, or phase difference between them.
  • the acoustic signals received by primary microphone 106 and secondary microphone 108 may be converted into electrical signals (i.e., a primary electrical signal and a secondary electrical signal). The electrical signals may themselves be converted by an analog-to-digital converter (not shown) into digital signals for processing in accordance with some embodiments.
  • the acoustic signal received by the primary microphone 106 is herein referred to as the primary acoustic signal
  • the acoustic signal received by the secondary microphone 108 is herein referred to as the secondary acoustic signal.
  • the primary acoustic signal and the secondary acoustic signal may be processed by the audio processing system 230 to produce a signal with an improved signal-to-noise ratio. It should be noted that embodiments of the technology described herein may be practiced utilizing only the primary microphone 106.
  • the output device 240 is any device that provides an audio output to the user.
  • the output device 240 may include a speaker, an earpiece of a headset or handset, or a speaker on a conference device.
  • a beamforming technique may be used to simulate forwards-facing and backwards-facing directional microphones.
  • the level difference may be used to discriminate speech and noise in the time-frequency domain, which can be used in noise reduction.
  • FIG. 3 is a block diagram of an exemplary audio processing system.
  • the block diagram of FIG. 3 provides more detail for the audio processing system 230 in the block diagram of FIG. 2.
  • Audio processing system 230 includes fast cosine transform (FCT) modules 302 and 304, beam former module 310, multiplicative gain expansion module 320, reverb module 330, mixer module 340, and zoom control module 350.
  • FCT modules 302 and 304 may receive acoustic signals from audio device microphones and convert the acoustic signals to frequency range sub-band signals.
  • FCT modules 302 and 304 are implemented as one or more modules that create one or more sub-band signals for each received microphone signal.
  • FCT modules 302 and 304 receive an acoustic signal from each microphone contained in audio device 104. These received signals are illustrated as signals X1 through Xi, wherein X1 is the primary microphone signal and Xi represents the remaining microphone signals.
  • the audio processing system 230 of FIG. 3 performs an audio zoom on a per-frame and per-sub-band basis.
  • beam former module 310 receives the frequency sub-band signals as well as a zoom indication signal.
  • the zoom indication is received from zoom control module 350.
  • the zoom indication communicated by zoom indicator signal K may be generated in response to user input, analysis of a primary microphone signal or other acoustic signals received by audio device 104, a video zoom feature selection, or some other data.
  • beam former module 310 receives sub-band signals, processes the sub-band signals to identify which signals are within a particular area to enhance (or "zoom"), and provides data for the selected signals as output to multiplicative gain expansion module 320.
  • the output may include sub-band signals for the audio source within the area to enhance.
  • Beam former module 310 also provides a gain factor to multiplicative gain expansion module 320.
  • the gain factor may indicate whether multiplicative gain expansion module 320 should perform additional gain or reduction to the signals received from beam former module 310.
  • the gain factor is generated as an energy ratio based on the received microphone signals and components.
  • the gain indication output by beam former module 310 may be a ratio of how much energy is reduced in the signal from the primary microphone versus the energy in the signals from the other microphones.
  • the gain may be a boost or cancellation gain expansion factor.
  • the gain factor is discussed in more detail below.
  • Beam former module 310 can be implemented as a null processing noise subtraction (NPNS) module, multiplicative module, or a combination of these modules.
  • the beam is focused by narrowing the alpha and gamma constraints. For a wider beam, the constraints may be made larger. Hence, a beam may be manipulated by putting a protective range around the preferred direction.
  • Beam former module 310 may be implemented by a system described in the U.S. Patent Application No. 61/325,764, entitled “Multi-Microphone Robust Noise Suppression System," the disclosure of which is incorporated herein by reference. Additional techniques for reducing undesired audio components of a signal are discussed in U.S. Patent
  • Multiplicative gain expansion module 320 receives the sub-band signals associated with audio sources within the selected beam, the gain factor from beam former module 310, and the zoom indicator signal. Multiplicative gain expansion module 320 applies a multiplicative gain based on the gain factor received. In effect, multiplicative gain expansion module 320 filters the beam former signal provided by beam former module 310.
  • the gain factor may be implemented as one of several different energy ratios.
  • the energy ratio may be the ratio of a noise reduced signal to a primary acoustic signal received from a primary microphone, the ratio of a noise reduced signal and a detected noise component within the primary microphone signal, the ratio of a noise reduced signal and a secondary acoustic signal, or the ratio of a noise reduced signal compared to the intra level difference between a primary signal and another signal.
  • the gain factors may be an indication of signal strength in a target direction versus all other directions. Put another way, the gain factor may indicate the amount of multiplicative expansion due, and whether additional expansion or reduction should be performed at the multiplicative gain expansion module 320.
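
One plausible reading of such an energy-ratio gain factor is sketched below; the disclosure lists several alternative ratios, and this sketch arbitrarily picks the noise-reduced-to-primary ratio.

```python
import numpy as np

def gain_factor(noise_reduced, primary):
    """Frame-level energy of the noise-reduced signal over the energy of
    the primary microphone signal."""
    eps = 1e-12
    return ((np.sum(np.abs(noise_reduced) ** 2) + eps) /
            (np.sum(np.abs(primary) ** 2) + eps))
```
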
  • Multiplicative gain expansion module 320 outputs the modified signal and provides the signal to reverb module 330 (which may also function to de-reverb).
  • Reverb module 330 receives the sub-band signals output by multiplicative gain expansion module 320, as well as the microphone signals which were also received by beam former module 310, and performs reverberation or dereverberation on the sub-band signals output by multiplicative gain expansion module 320.
  • Reverb module 330 may adjust a ratio of direct energy to remaining energy within a signal based on the zoom control indicator provided by zoom control module 350.
  • Adjusting the reverb for a signal may involve adjusting the energy of different components of the signal.
  • An audio signal has several components in a frequency domain, including a direct component, early reflections, and a tail.
  • a direct component typically has the highest energy level, followed by a somewhat lower energy level of reflections within the signal. Also included within the signal is a tail, which may include noise and other low-energy data or low-energy audio.
  • a reverberation is defined as reflections of the direct audio component. Hence, a reverberation with many reflections over a broad frequency range results in a more noticeable reverberation. A signal with fewer reflection components has a smaller reverberation component.
  • reverb module 330 may adjust the reverberation components in the signal received from multiplicative gain expansion module 320. Hence, if the zoom indicator received indicates that a zoom in operation is to be performed on the audio, the reverberation will be decreased by minimizing the reflection components of the received signal.
  • reverb module 330 provides the modified signal to mixer module 340.
  • the mixer module 340 receives the reverberation adjusted signal and mixes the signal with the signal from the primary microphone. In some embodiments, mixer module 340 increases the energy of the signal appropriately when there is audio present in the frame and decreases it where there is little audio energy present in the frame.
  • FIG. 4 is a block diagram of an exemplary beam former module.
  • the beam former module 310 may be implemented per tap (i.e., per sub-band).
  • Beam former module 310 receives FCT output signals for a first microphone (such as a primary microphone) and a second microphone.
  • the first microphone FCT signal is processed by module 410 according to a first function, and the secondary microphone FCT signal is processed by module 420 according to a second function (both functions appear as equations in the original publication and are not reproduced here).
  • the output of module 410 is then subtracted from the secondary microphone FCT signal at combiner 440, and the output of module 420 is subtracted from the primary microphone FCT signal at combiner 430.
  • a cardioid signal Cf is output from combiner 430 and provided to module 450, where a further function (shown as an equation in the original publication) is applied.
  • a cardioid signal Cb is output from combiner 440 and provided to module 460, where a further function (likewise shown as an equation in the original publication) is applied.
  • the difference of the outputs of modules 450 and 460 is determined by element 470 and output as an ILD cue.
  • the ILD cue may be output by beam former module 310 to a post filter (for example, a filter implemented by multiplicative gain expansion module 320).
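
The module functions themselves are not reproduced in this text, but a common differential-array form of the structure just described can be sketched as follows; the delay model and normalization are assumptions rather than the patent's exact functions.

```python
import numpy as np

def ild_cue(X1, X2, omega, tau):
    """X1, X2: complex sub-band samples from the primary/secondary microphones;
    omega: sub-band center frequency (rad/s); tau: inter-microphone delay (s)."""
    delay = np.exp(-1j * omega * tau)
    c_f = X1 - delay * X2   # forward-facing cardioid (rear null)
    c_b = X2 - delay * X1   # backward-facing cardioid (front null)
    eps = 1e-12
    # Log-energy difference between the cardioids: positive when the source
    # lies in the forward lobe.
    return 10.0 * np.log10((np.abs(c_f) ** 2 + eps) / (np.abs(c_b) ** 2 + eps))
```
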
  • FIG. 5 is a flow chart of an exemplary method for performing an audio zoom.
  • An acoustic signal is received from one or more sources at step 510.
  • the acoustic signals are received through one or more microphones on audio device 104.
  • acoustic signals from audio sources 112-116 and reflections 128-129 are received through microphones 106 and 108 of audio device 104.
  • a zoom indication is then received for a spatial area at step 520.
  • the zoom indication is received from a user or determined based on other data.
  • the zoom indication is received from a user via a video zoom setting, pointing an audio device in a particular direction, an input for video zoom, or in some other manner.
  • Acoustic signal component energy levels are enhanced based on the zoom indication at step 530.
  • acoustic signal component energy levels are enhanced by increasing the energy levels for audio source sub-band signals that originate from a source device within a selected beam area. Audio signals from a device outside a selected beam area are de-emphasized. Enhancing acoustic signal component energy levels is discussed in more detail below with respect to the method of FIG. 6.
  • Reverberation signal components associated with a position inside the spatial area are adjusted based on the received indication at step 540.
  • the adjustments may include modifying the ratio of a direct component with respect to reflection components for the particular signal.
  • reverberation should be decreased by increasing the ratio of the direct component to the reflection components in the audio signal.
  • the direct component is reduced with respect to the reflection components to decrease the ratio of direct to reflection components of the audio signal.
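
A minimal sketch of the direct-to-reflection ratio adjustment described above, assuming separate estimates of the direct component and the reverberant residual are already available (how those estimates are obtained is outside this sketch):

```python
import numpy as np

def adjust_reverb(direct, residual, zoom_amount):
    """direct, residual: arrays for one frame; zoom_amount in [0, 1], where
    1 means fully zoomed in (reflections maximally suppressed)."""
    reverb_gain = 1.0 - zoom_amount   # zooming in raises the direct/reflected ratio
    return np.asarray(direct) + reverb_gain * np.asarray(residual)
```
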
  • a modulated gain is applied to the signal component at step 550.
  • the gain may be applied by mixing a reverb processed acoustic signal with a primary acoustic signal (or another audio signal received by audio device 104).
  • the mixed signal that has been processed by audio zoom is output at step 560.
  • FIG. 6 is a flow chart of an exemplary method for enhancing acoustic signal components.
  • the method in FIG. 6 provides more detail for step 530 of the method in FIG. 5.
  • An audio source is detected in the direction of a beam at step 610. This detection may be performed by a null-processing noise subtraction mechanism or some other module that is able to identify a spatial position of a source based on audio signals received by two or more microphones.
  • Acoustic signal sources located outside the spatial area are attenuated at step 620.
  • the acoustic sources outside the spatial area include certain audio sources (e.g., 112 in FIG. 1) and reflected audio signals such as reflections 128 and 129.
  • Adaptation constraints are then used to steer the beam based on the zoom indication at step 630.
  • the adaptation constraints include α and γ constraints used in a null processing noise suppression system. The adaptation constraints may also be derived from multiplicative expansion or selection of a region around a preferred direction based on a beam pattern.
  • FIG. 7 is a flow chart of an exemplary method for generating a multiplicative mask. The method of FIG. 7 provides more detail for step 650 in the method of FIG. 6. Differential arrays are generated from microphone signals at step 710. The arrays may be generated as part of a beam former module 310.
  • the beam pattern may be a cardioid pattern generated based at least in part from the differential output signals.
  • a beam pattern is generated from the differential arrays at step 720.
  • Energy ratios are then generated from beam patterns at step 730.
  • the energy ratios may be generated as any of a combination of signals.
  • an ILD map may be generated per frequency from energy ratios.
  • An ILD range corresponding to the desired selection may be selected.
  • An ILD window may then be applied to a map by boosting the signal components within the window and attenuating the signal components positioned outside the window.
  • a filter, such as a post filter, may be derived from the energy ratio at step 740.
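
A minimal sketch of steps 730-740, assuming per-band ILD values are already available; the window bounds and boost/cut gains are illustrative assumptions:

```python
import numpy as np

def multiplicative_mask(ild_db, window_lo_db, window_hi_db, boost=1.5, cut=0.25):
    """ild_db: per-sub-band ILD values (dB) for one frame. Bands whose ILD
    falls inside the selected window are boosted; the rest are attenuated."""
    inside = (ild_db >= window_lo_db) & (ild_db <= window_hi_db)
    return np.where(inside, boost, cut)

# Applying the mask is an element-wise multiply on the sub-band spectrum:
# enhanced = multiplicative_mask(ild_db, 3.0, 12.0) * subband_spectrum
```
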
  • the above described modules may include instructions stored in a storage media such as a machine readable medium (e.g., computer readable medium). These instructions may be retrieved and executed by the processor 220 to perform the functionality discussed herein. Some examples of instructions include software, program code, and firmware. Some examples of storage media include memory devices and integrated circuits.
  • FIG. 8 is a block diagram illustrating an audio processing system 800, according to another example embodiment.
  • the example audio processing system 800 includes a source estimation subsystem 830 coupled to various elements of an example AZA subsystem.
  • the example AZA subsystem includes limiters 802a, 802b, and 802n, FCT modules 804a, 804b, and 804n, analysis module 806, zoom control module 810, signal modifier 812, element 818, and a limiter 820.
  • the source estimation subsystem 830 may include a source direction estimator (SDE) module 808, also referred to as a target estimator, a gain module 816, and an automatic gain control (AGC) module 814.
  • SDE source direction estimator
  • AGC automatic gain control
  • the example audio processing system 800 processes acoustic audio signals from microphones 106a, 106b, and 106n.
  • SDE module 808 is operable to localize a source of sound.
  • the SDE module 808 may generate cues based on correlation of phase plots between different microphone inputs. Based on the correlation of the phase plots, the example SDE module 808 can compute a vector of salience estimates at different angles. Based on the salience estimates, the SDE module 808 can determine a direction of the source. In other words, according to various embodiments, a peak in the vector of salience estimates indicates a source in a particular direction. At the same time, sources of a diffuse nature, i.e., non-directional, may be represented by poor salience estimates at all angles. Various embodiments may rely upon the cues (estimates of salience) to improve the performance of an existing directional audio solution, which is carried out by the analysis module 806, signal modifier 812, and zoom control module 810.
  • estimates of salience are used to localize the angle of the source in the range of 0 to 360 degrees in a plane parallel to the ground, when, for example, the audio device 104 is placed on a table top.
  • the estimates of salience can be used to attenuate/amplify the signals at different angles as required by the customer/user.
  • the SDE module 808 is configured to operate in two or more modes.
  • the modes of operation can include "normal," "noisy," and "simultaneous talkers" modes.
  • a "normal" mode of operation is defined by a single directional speech source without the presence of any kind of strong speech distractors with or without the presence of noise.
  • a vector of salience estimates in such case can be characterized by a single peak (above a salience threshold).
  • the single peak can indicate a presence of a single source of sound.
  • the location of the peak, in the vector of salience estimates may characterize the angle of the source.
  • both a diffused source detector and a simultaneous talker detector may be set to a "no" state. Based on these states, the target estimator, in various embodiments, drives the level of suppression/amplification as desired by the user on a per angle basis.
  • the target estimator generates a mapping of angle to relative levels of attenuation in the AZA subsystem. For example, a range of angles from 240 to 270 degrees may require 10 dB of incremental suppression relative to the AZA's performance, with the target estimator containing an array that holds 0 dB throughout except for the entries between 240 and 270 degrees (see the sketch following the next paragraph).
  • the AGC module 814 can control the rate of roll-off by means of attack and release time constants. A smooth roll-off can effectively stabilize the speech system without audible distortions in the audio.
  • noise, if present along with the directional speech, is alleviated by the AZA subsystem.
  • a noisy mode of operation can be characterized by a diffused noise source with no directional speech.
  • the noisy mode can result in poor salience estimates for all angles. Since there is no directional information of the source of such data, the signal can be processed solely by the AZA subsystem.
  • interactions between the noisy mode and the normal mode of operation are handled smoothly, without sudden switch-overs, to avoid pumping or any gain-related artifacts.
  • a target estimator can provide a target of 0 dB to the AGC module 814. By appropriately handling the attack and release time, a smooth handover can be achieved. It should be noted, however, that the attack and release times in the noisy mode are different from those used in the normal mode.
  • a simultaneous talkers mode is characterized by simultaneous multiple talkers/side distractors either with or without noise.
  • the salience vector for the simultaneous talkers mode can be characterized by multiple peaks (above a salience threshold).
  • the simultaneous talkers mode can be handled in a way similar to the noisy mode.
  • when the SDE module operates in the simultaneous talkers mode, acoustic signals from the microphones can be processed solely by the AZA subsystem.
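
A minimal sketch of the mode decision described above: count the salience peaks above a threshold (the threshold and peak-picking rule are assumptions):

```python
def classify_mode(salience, threshold):
    """Pick an operational mode from the circular salience vector by counting
    local maxima above the threshold."""
    n = len(salience)
    peaks = [i for i in range(n)
             if salience[i] > threshold
             and salience[i] >= salience[i - 1]        # wraps at i == 0
             and salience[i] >= salience[(i + 1) % n]]
    if not peaks:
        return "noisy"                  # diffuse: poor salience at all angles
    if len(peaks) == 1:
        return "normal"                 # single directional speech source
    return "simultaneous_talkers"       # multiple peaks above the threshold
```
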
  • a handover between the above modes can be carried out in a seamless manner with the help of the AGC subsystem.
  • Various embodiments of the technology described herein can avoid the problem of microphone sealing by ignoring any inter-microphone signal level differences.
  • Various embodiments focus instead on the time of arrival/phase cues between the microphones.
  • the underlying AZA sub-system may still be sensitive to the microphone sealing, and therefore the overall system performance may depend on the microphone sealing.
  • an AZA sub-system may be tuned based on characteristics of the sealing of the microphones utilized, to reduce sensitivity to microphone sealing. Further details regarding exemplary tuning of the AZA sub-system may be found in U.S. Patent Application No. 12/896,725, filed October 1, 2010, incorporated by reference herein.
  • Various embodiments of the present technology may utilize the fact that SDE salience varies very little with changes in the distance between a talker/speaker and an audio device, when the distance is in the range of 0.5 m to 2 m and the speaker's mouth is around 30 cm above the audio device. This can make the audio processing system 800 more robust to distance variance and can result in even/similar performance for a talker speaking at these distances.
  • the AZA subsystem may be tuned to take full advantage of this robustness to distance.
  • the target estimator block 808 can provide relative levels of suppression based on the angle of arrival of sounds independently of the AZA subsystem.
  • the target estimator block can be controlled independently without any interactions with other subsystems.
  • This independently controllable (e.g. "island") architecture can empower the field tuning engineers to match the performance desired by a customer/user.
  • the array of the target estimators during the "normal" mode of operation provides a powerful tool which can allow implementing the above architecture by manipulating the angle of the
  • FIG. 9 is a flow chart showing steps of a method 900 for improving performance of directional audio capture systems, according to an example embodiment.
  • the example method 900 includes correlating phase plots of at least two audio inputs.
  • the audio inputs can be captured by at least two microphones having different qualities of sealing.
  • the example method 900 allows generating, based on the correlation, estimates of salience at different directional angles to localize at least one direction associated with at least one source of a sound.
  • the estimates of salience include a vector of saliences at directional angles from 0 to 360 degrees in a plane parallel to the ground.
  • the example method 900 includes determining cues based on the estimates of salience.
  • the example method 900 includes providing those "estimates of salience"-based cues to a directional audio capture system.
  • the example method 900 includes determining, based on the estimates of salience (e.g., absence or presence of one or more peaks in the estimates of salience), a mode from a plurality of the operational modes.
  • the operational modes include a "normal" mode characterized by a single directional speech source, a "simultaneous talkers" mode characterized by the presence of at least two directional speech sources, and a "noisy" mode characterized by a diffuse noise source without directional speech.
  • the example method 900 includes configuring, based on the determined mode, the directional audio capture system.
  • the example method 900 includes determining, based on the estimates of salience and the determined mode, other cues including at least levels of attenuation.
  • the example method 900 includes controlling a rate of switching between modes from the plurality of the operational modes in real time by applying attack and release time constants.
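
Rate-limiting the mode switches, as described in the last step, can be sketched with hold counters standing in for attack/release time constants (the frame counts are assumptions):

```python
def make_mode_switcher(hold_frames=10):
    """Return an update function that only commits a mode switch after the new
    mode has been detected for hold_frames consecutive frames."""
    state = {"mode": "normal", "candidate": None, "count": 0}

    def update(detected_mode):
        if detected_mode == state["mode"]:
            state["candidate"], state["count"] = None, 0
        elif detected_mode == state["candidate"]:
            state["count"] += 1
            if state["count"] >= hold_frames:     # stable run: commit the switch
                state["mode"] = detected_mode
                state["candidate"], state["count"] = None, 0
        else:
            state["candidate"], state["count"] = detected_mode, 1
        return state["mode"]

    return update
```
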
  • FIG. 10 illustrates an exemplary computer system 1000 that may be used to implement some embodiments of the present disclosure.
  • the computer system 1000 of FIG. 10 may be implemented in the contexts of the likes of computing systems, networks, servers, or combinations thereof.
  • the computer system 1000 of FIG. 10 includes one or more processor units 1010 and main memory 1020.
  • Main memory 1020 stores, in part, instructions and data for execution by processor units 1010.
  • Main memory 1020 stores the executable code when in operation, in this example.
  • the computer system 1000 of FIG. 10 further includes a mass data storage 1030, portable storage device 1040, output devices 1050, user input devices 1060, a graphics display system 1070, and peripheral devices 1080.
  • The components shown in FIG. 10 are depicted as being connected via a single bus 1090. The components may be connected through one or more data transport means. Processor unit 1010 and main memory 1020 are connected via a local microprocessor bus, and the mass data storage 1030, peripheral device(s) 1080, portable storage device 1040, and graphics display system 1070 are connected via one or more input/output (I/O) buses.
  • Mass data storage 1030, which can be implemented with a magnetic disk drive, solid state drive, or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 1010. Mass data storage 1030 stores the system software for implementing embodiments of the present disclosure for purposes of loading that software into main memory 1020.
  • Portable storage device 1040 operates in conjunction with a portable nonvolatile storage medium, such as a flash drive, floppy disk, compact disk, digital video disc, or Universal Serial Bus (USB) storage device, to input and output data and code to and from the computer system 1000 of FIG. 10.
  • User input devices 1060 can provide a portion of a user interface.
  • User input devices 1060 may include one or more microphones, an alphanumeric keypad, such as a keyboard, for inputting alphanumeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys.
  • User input devices 1060 can also include a touchscreen.
  • the computer system 1000 as shown in FIG. 10 includes output devices 1050. Suitable output devices 1050 include speakers, printers, network interfaces, and monitors.
  • Graphics display system 1070 includes a liquid crystal display (LCD) or other suitable display device. Graphics display system 1070 is configurable to receive textual and graphical information and process the information for output to the display device.
  • Peripheral devices 1080 may include any type of computer support device to add additional functionality to the computer system.
  • the components provided in the computer system 1000 of FIG. 10 are those typically found in computer systems that may be suitable for use with embodiments of the present disclosure and are intended to represent a broad category of such computer components that are well known in the art.
  • the computer system 1000 of FIG. 10 can be a personal computer (PC), hand held computer system, telephone, mobile computer system, workstation, tablet, phablet, mobile phone, server, minicomputer, mainframe computer, wearable, or any other computer system.
  • the computer may also include different bus configurations, networked platforms, multi-processor platforms, and the like.
  • Various operating systems may be used, including UNIX, LINUX, WINDOWS, MAC OS, PALM OS, QNX, ANDROID, IOS, CHROME, and TIZEN.
  • the processing for various embodiments may be implemented in software that is cloud-based.
  • the computer system 1000 is implemented as a cloud-based computing environment, such as a virtual machine operating within a computing cloud.
  • the computer system 1000 may itself include a cloud-based computing environment, where the functionalities of the computer system 1000 are executed in a distributed fashion.
  • the computer system 1000 when configured as a computing cloud, may include pluralities of computing devices in various forms, as will be described in greater detail below.
  • a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors (such as within web servers) and/or that combines the storage capacity of a large grouping of computer memories or storage devices.
  • Systems that provide cloud-based resources may be utilized exclusively by their owners, or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
  • the cloud may be formed, for example, by a network of web servers that comprise a plurality of computing devices, such as the computer system 1000, with each server (or at least a plurality thereof) providing processor and/or storage resources.
  • These servers may manage workloads provided by multiple users (e.g., cloud resource customers or other users).
  • each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depends on the type of business associated with the user.

Landscapes

  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

Systems and methods for improving the performance of a directional audio capture system are provided. An illustrative method includes correlating phase plots of at least two audio inputs, the audio inputs being captured by at least two microphones. The method can also include generating, based on the correlation, estimates of salience at different directional angles in order to localize a direction of a sound source. The method can provide cues to the directional audio capture system based on the estimates. The cues include attenuation levels. A rate of change of the attenuation levels is controlled by attack and release time constants in order to avoid sound artifacts. The method also includes determining a mode according to the absence or presence of one or more peaks in the salience estimates. The method also makes it possible to configure the directional audio capture system according to the determined mode.
PCT/US2015/063519 2014-12-30 2015-12-02 Capture audio directionnelle WO2016109103A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201580071317.7A CN107113499B (zh) 2014-12-30 2015-12-02 定向音频捕获
DE112015005862.1T DE112015005862T5 (de) 2014-12-30 2015-12-02 Gerichtete Audioerfassung

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201462098247P 2014-12-30 2014-12-30
US62/098,247 2014-12-30

Publications (1)

Publication Number Publication Date
WO2016109103A1 (fr) 2016-07-07

Family

ID=56284893

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/063519 WO2016109103A1 (fr) 2014-12-30 2015-12-02 Capture audio directionnelle

Country Status (3)

Country Link
CN (1) CN107113499B (fr)
DE (1) DE112015005862T5 (fr)
WO (1) WO2016109103A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9820042B1 (en) 2016-05-02 2017-11-14 Knowles Electronics, Llc Stereo separation and directional suppression with omni-directional microphones

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108235164B (zh) * 2017-12-13 2020-09-15 安克创新科技股份有限公司 一种麦克风颈环耳机

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110075857A1 (en) * 2009-09-29 2011-03-31 Oki Electric Industry Co., Ltd. Apparatus for estimating sound source direction from correlation between spatial transfer functions of sound signals on separate channels
US20110129095A1 (en) * 2009-12-02 2011-06-02 Carlos Avendano Audio Zoom
US8194880B2 (en) * 2006-01-30 2012-06-05 Audience, Inc. System and method for utilizing omni-directional microphones for speech enhancement
US8233352B2 (en) * 2009-08-17 2012-07-31 Broadcom Corporation Audio source localization system and method
US20120257778A1 (en) * 2011-04-08 2012-10-11 Board Of Regents, The University Of Texas System Differential microphone with sealed backside cavities and diaphragms coupled to a rocking structure thereby providing resistance to deflection under atmospheric pressure and providing a directional response to sound pressure
US20140003622A1 (en) * 2012-06-28 2014-01-02 Broadcom Corporation Loudspeaker beamforming for personal audio focal points

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8767975B2 (en) * 2007-06-21 2014-07-01 Bose Corporation Sound discrimination method and apparatus
US9622006B2 (en) * 2012-03-23 2017-04-11 Dolby Laboratories Licensing Corporation Method and system for head-related transfer function generation by linear mixing of head-related transfer functions

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8194880B2 (en) * 2006-01-30 2012-06-05 Audience, Inc. System and method for utilizing omni-directional microphones for speech enhancement
US8233352B2 (en) * 2009-08-17 2012-07-31 Broadcom Corporation Audio source localization system and method
US20110075857A1 (en) * 2009-09-29 2011-03-31 Oki Electric Industry Co., Ltd. Apparatus for estimating sound source direction from correlation between spatial transfer functions of sound signals on separate channels
US20110129095A1 (en) * 2009-12-02 2011-06-02 Carlos Avendano Audio Zoom
US20120257778A1 (en) * 2011-04-08 2012-10-11 Board Of Regents, The University Of Texas System Differential microphone with sealed backside cavities and diaphragms coupled to a rocking structure thereby providing resistance to deflection under atmospheric pressure and providing a directional response to sound pressure
US20140003622A1 (en) * 2012-06-28 2014-01-02 Broadcom Corporation Loudspeaker beamforming for personal audio focal points

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9820042B1 (en) 2016-05-02 2017-11-14 Knowles Electronics, Llc Stereo separation and directional suppression with omni-directional microphones

Also Published As

Publication number Publication date
DE112015005862T5 (de) 2017-11-02
CN107113499A (zh) 2017-08-29
CN107113499B (zh) 2018-09-18

Similar Documents

Publication Publication Date Title
US9838784B2 (en) Directional audio capture
US10257611B2 (en) Stereo separation and directional suppression with omni-directional microphones
US9210503B2 (en) Audio zoom
US9799330B2 (en) Multi-sourced noise suppression
US9668048B2 (en) Contextual switching of microphones
US10271135B2 (en) Apparatus for processing of audio signals based on device position
EP3189521B1 (fr) Procédé et appareil permettant d'améliorer des sources sonores
US9426568B2 (en) Apparatus and method for enhancing an audio output from a target source
US9595997B1 (en) Adaption-based reduction of echo and noise
US9521486B1 (en) Frequency based beamforming
EP2320676A1 (fr) Procédé, dispositif de communication et système de communication pour commander une focalisation sonore
US20170208391A1 (en) Acoustic echo cancellation reference signal
US10200787B2 (en) Mixing microphone signals based on distance between microphones
US9532138B1 (en) Systems and methods for suppressing audio noise in a communication system
EP2996352B1 (fr) Système et procédé audio utilisant un signal de haut-parleur pour la réduction des bruits de vent
WO2016109103A1 (fr) Capture audio directionnelle
Tashev Recent advances in human-machine interfaces for gaming and entertainment
US20180277134A1 (en) Key Click Suppression
US20220337945A1 (en) Selective sound modification for video communication
WO2023086273A1 (fr) Atténuation distribuée de dispositif audio
WO2023086303A1 (fr) Rendu basé sur l'orientation d'un haut-parleur
JP2023551704A (ja) サブ帯域ドメイン音響エコーキャンセラに基づく音響状態推定器
JP2011182292A (ja) 収音装置、収音方法及び収音プログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15875926

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 112015005862

Country of ref document: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15875926

Country of ref document: EP

Kind code of ref document: A1