US20100103776A1 - Audio source proximity estimation using sensor array for noise reduction - Google Patents

Audio source proximity estimation using sensor array for noise reduction

Info

Publication number
US20100103776A1
Authority
US
United States
Prior art keywords
audio
amplitudes
determining
proximity
audio signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/603,824
Other versions
US8218397B2 (en)
Inventor
Kwokleung Chan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US12/603,824 priority Critical patent/US8218397B2/en
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Priority to KR1020117011581A priority patent/KR101260131B1/en
Priority to JP2011533361A priority patent/JP5551176B2/en
Priority to CN200980142292XA priority patent/CN102197422B/en
Priority to PCT/US2009/061807 priority patent/WO2010048490A1/en
Priority to EP09748604A priority patent/EP2353159B1/en
Priority to TW098136052A priority patent/TW201042634A/en
Assigned to QUALCOMM INCORPORATED reassignment QUALCOMM INCORPORATED ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CHAN, KWOKLEUNG
Publication of US20100103776A1 publication Critical patent/US20100103776A1/en
Application granted granted Critical
Publication of US8218397B2 publication Critical patent/US8218397B2/en
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S11/00Systems for determining distance or velocity not using reflection or reradiation
    • G01S11/14Systems for determining distance or velocity not using reflection or reradiation using ultrasonic, sonic, or infrasonic waves
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • G10L21/028Voice signal separating using properties of sound source
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0232Processing in the frequency domain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0324Details of processing therefor
    • G10L21/0332Details of processing therefor involving modification of waveforms
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/005Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166Microphone arrays; Beamforming
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering

Definitions

  • the present disclosure pertains generally to audio signal processing, and more specifically, to near-field audio signal detection and noise suppression.
  • Devices such as cellular phones, two-way radios and personal digital assistants (PDAs) that accept audio input are often used in adverse noise environments such as crowds, busy streets, restaurants, airports, vehicles or the like.
  • Unwanted sounds generated from various sound sources within an audio environment, referred to as background noise, can emanate from differing locations within that audio environment. Common examples can include, but are not limited to, automobile noises or other voices within a crowded public place. Regardless of the source, the inability to distinguish a desired audio signal from background noise can result in audio input signals having decreased quality.
  • Disclosed herein is an improved technique for suppressing background noise received by an audio input device.
  • the technique permits an audio input device to differentiate between relatively distant noise sources and sound originating at close proximity to the device.
  • the technique can be applied to mobile handsets, such as cellular phones or PDAs, hands-free headsets, and other audio input devices. Audio input devices taking advantage of this “close proximity” detection are better able to suppress background noise and deliver an improved user experience.
  • a method of determining the proximity of an audio source includes transforming audio signals from a plurality of sensors to frequency domain. The amplitudes of the transformed audio signals are then determined. The proximity of the audio source is determined based on a comparison of the amplitudes.
  • a method of determining the proximity of an audio source includes receiving audio signals from a plurality of sensors and transforming the audio signals to frequency domain. The amplitudes of the transformed audio signals are determined at a plurality of frequencies. For each frequency, a differential signal is determined by comparing the spectral amplitudes from the different sensors at the frequency. This produces a plurality of differential signals. The proximity of the audio source is determined based on the differential signals.
  • an apparatus includes a plurality of audio sensors outputting a plurality of audio signals in response to an audio source.
  • a processor included in the apparatus is configured to transform the audio signals to frequency domain and to also determine the proximity of the audio source by comparing amplitudes of the transformed audio signals.
  • an apparatus includes means for transforming a plurality of audio signals from a plurality of sensors to frequency domain; means for determining amplitudes of the transformed audio signals; means for comparing the amplitudes; and means for determining the proximity of the audio source based on the comparison of the amplitudes.
  • a computer-readable medium embodying a set of instructions executable by one or more processors, includes code for transforming a plurality of audio signals from a plurality of sensors to frequency domain; code for determining amplitudes of the transformed audio signals; code for comparing the amplitudes; and code for determining the proximity of the audio source based on the comparison of the amplitudes.
  • FIG. 1 is a diagram of an exemplary audio environment including a near-field audio source and a far-field background audio source.
  • FIG. 2 is a diagram conceptually illustrating sound waves emitted from a near-field audio source.
  • FIG. 3 is a diagram conceptually illustrating sound waves emitted from a far-field audio source.
  • FIG. 4 is a flowchart illustrating a method of determining the proximity of an audio source by comparing signal amplitude from different audio sensors.
  • FIG. 5 is a flowchart illustrating a method of determining the proximity of an audio source using beam forming.
  • FIG. 6 is a flowchart illustrating a method of determining the proximity of an audio source by comparing spectral components of incoming audio.
  • FIG. 7 is a process block diagram showing a process of spectral noise reduction.
  • FIG. 8 is a more detailed process block diagram showing the process of spectral noise reduction.
  • FIG. 9 is a block diagram showing certain components of an exemplary headset device having audio source proximity estimation capability.
  • FIG. 10 shows graphs depicting exemplary background noise suppression.
  • FIG. 1 is a diagram of an exemplary audio environment 10 including an audio input device such as a headset 16 , a near-field audio source 15 such as the mouth of a user 12 , and a far-field background audio source 14 , such as a radio.
  • the headset 16 is subjected to the near-field audio source 15 and far-field audio source 14 .
  • the far-field audio source 14 is located farther away from the audio input device than the near-field audio source 15 .
  • Each of the audio sources 14 , 15 can be anything that emits sounds.
  • the headset 16 uses a sensor array to estimate the proximity of the audio sources 14 , 15 and thereafter control a noise reduction module included in the headset 16 to suppress audio signals classified as far-field.
  • the sensor array includes a first audio sensor 18 and second audio sensor 20 included in the headset 16 .
  • the audio sensors 18, 20 are spaced apart; for example, they may be 2 to 4 cm apart.
  • the audio sensors 18 , 20 can be microphones or any other suitable audio transducer responsive to sound input.
  • the first audio sensor 18 is closer to the user's mouth than the second audio sensor 20 .
  • audio signals originating from the user's mouth that are picked up by the first audio sensor 18 are louder than the same signal picked up by the second audio sensor 20.
  • the magnitude difference between the detected audio signals can range from 1 dB to 3 dB or more, depending on the relative distance from the mouth to the audio sensors 18 , 20 .
  • the signal level difference is signal frequency dependent. Typically, higher signal frequency gives a higher signal level difference because of the diffraction effect.
  • the headset 16 monitors and compares the signal amplitude levels at the two audio sensors 18, 20 to estimate the audio source proximity.
  • the exemplary headset 16 includes an earpiece body 17 and at least one support 19 , such as an ear hanger, for allowing the headset 16 to be comfortably worn by the user 12 .
  • a boom 21 can also be included in the headset 16 for placing the first audio sensor 18 closer to the user's mouth.
  • the second audio sensor 20 can be included in the earpiece body 17 , as shown.
  • the headset 16 is a wireless headset, such as a Bluetooth headset, in which audio signals between one or more devices and the headset 16 are carried over one or more wireless radio frequency (RF) or infrared (IR) channels.
  • the headset 16 can include components and functionality as defined by the Bluetooth Specification available at www.bluetooth.com.
  • the Bluetooth Specification provides specific guidelines for providing wireless headset functionality.
  • the headset 16 may be a wired headset, having a conductor carrying audio signals between a device and the headset 16 .
  • although the audio input device is illustrated as the headset 16, the audio source proximity estimation and noise suppression techniques and devices disclosed herein may also be included in other audio input devices, such as communication devices, e.g., phones, cellular phones, PDAs, video games, voice-activated remotes, live reporting systems, public address systems or the like.
  • An audio input device is a device that receives sound.
  • FIG. 2 is a diagram conceptually illustrating the headset 16 subjected to sound waves emitted from the near-field audio source 15. Since the first audio sensor 18 is closer to the audio source 15 than the second audio sensor 20, the amplitude of the sound received at the sensors 18, 20 from the source 15 is measurably different. This difference in sensor amplitudes is exploited by the headset 16 to determine whether an audio source is near to or distant from the headset 16.
  • FIG. 3 is a diagram conceptually illustrating the headset 16 subjected to sound waves emitted from a far-field audio source 14. Since the audio sensors 18, 20 are close to each other relative to the distance from the far-field audio source 14, they pick up the audio at roughly the same amplitude level, irrespective of the direction of arrival of the audio signal. As a result, a system that monitors the signal levels received by the two sensors 18, 20 is able to estimate the audio source proximity.
  • FIG. 4 is a flowchart 100 illustrating a method for estimating audio source proximity based on the audio signal levels at the sensor array elements, e.g., audio sensors 18 , 20 .
  • audio input signals are received from the audio sensors. Each sensor provides a separate audio signal, also referred to as an audio channel. Each audio signal represents sound received at a particular audio sensor.
  • the incoming audio signals are pre-conditioned. The pre-conditioning may include band-pass filtering each of the audio signals to reject interfering signals outside the frequency range of interest. For example, the audio signals may be filtered to remove signal content outside the human audible range.
  • the audio input signals may also be individually amplified to account for the difference in intrinsic sensitivity of the individual sensors.
  • the signal levels from the audio sensors should more accurately represent the signal strengths arriving at the audio sensors.
  • the audio sensors can be calibrated during manufacturing of the audio input device to obtain the correct amplification factor. If pre-use estimation of the correction factor is not feasible, the audio sensors can be calibrated and the correction factor can also be estimated during operation of the audio input device through an automatic gain matching mechanism.
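  • For illustration only, a minimal sketch of one possible automatic gain matching scheme follows. The function name, the smoothing constant, and the use of long-term RMS tracking are assumptions made for the example, not details taken from this disclosure; in practice such an update would typically be restricted to frames judged to contain only far-field sound.

```python
import numpy as np

def update_gain_match(frame_ch1, frame_ch2, state, alpha=0.995):
    """Hypothetical running gain match: track the long-term RMS of each
    channel and derive a correction gain for channel 2 so that, on average,
    both channels report the same level for far-field sound."""
    rms1 = np.sqrt(np.mean(frame_ch1 ** 2) + 1e-12)
    rms2 = np.sqrt(np.mean(frame_ch2 ** 2) + 1e-12)
    # Exponentially smoothed long-term levels (alpha close to 1 => slow adaptation).
    state["lvl1"] = alpha * state.get("lvl1", rms1) + (1 - alpha) * rms1
    state["lvl2"] = alpha * state.get("lvl2", rms2) + (1 - alpha) * rms2
    correction = state["lvl1"] / state["lvl2"]  # gain applied to channel 2
    return frame_ch1, frame_ch2 * correction, state
```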
  • the audio signals may be initially received from the sensors as analog signals and then converted into digital audio signals by an analog-to-digital (A/D) converter.
  • the signal pre-conditioning described above can be performed on the analog audio signals, digitized audio signals, or in any suitable combination of the digital and analog processing domains.
  • the amplitude of each audio sensor signal is determined.
  • one method is to digitize each of the audio signals into a conventional digital audio format, such as PCM (pulse code modulation) audio, where the audio samples are in a time series.
  • the digitized incoming audio signals from each sensor are divided into audio frames of a predetermined length, e.g., 10 ms (milliseconds). Other suitable frame lengths may be used, such as 20 ms.
  • the amplitude of each audio signal is then computed on a frame-by-frame basis.
  • the amplitude of an audio signal in a frame is computed for each sensor according to Equation 1, in which:
  • amp_k(n) represents the audio signal amplitude of the nth frame for the kth sensor
  • n is the frame index
  • x_k(t) represents a digital audio sample at time t
  • k denotes the kth sensor
  • t is the time index for the incoming audio signal samples.
  • p is a pre-chosen parameter that may have a value greater than one; for example, p may equal two.
  • the summation is over all the audio samples in the frame.
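  • As an illustration of the per-frame amplitude computation, the sketch below uses p = 2 and a 10 ms frame at a 16 kHz sampling rate; the frame length and the exact form used here are assumptions for the example rather than a verbatim transcription of Equation 1.

```python
import numpy as np

def frame_amplitude(x_k, frame_len=160, p=2):
    """Split one sensor's PCM samples x_k(t) into frames (e.g. 10 ms at
    16 kHz -> 160 samples) and return amp_k(n): here, the sum of
    |x_k(t)|**p over the samples of frame n."""
    n_frames = len(x_k) // frame_len
    frames = np.reshape(x_k[: n_frames * frame_len], (n_frames, frame_len))
    return np.sum(np.abs(frames.astype(float)) ** p, axis=1)
```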
  • the audio signal amplitude amp_k(n) may also be smoothed over successive frames using a smoothing function, such as that given by Equation 2, in which:
  • amp_sm_k(n) is the smoothed amplitude value of the nth frame
  • amp_sm_k(n−1) is the smoothed amplitude value of the (n−1)th frame
  • the smoothing weight is a predefined constant, preferably having a value less than one.
  • smoothed frame amplitudes may optionally be converted to the log domain.
  • the smoothed frame amplitudes may be converted to log domain according to Equation 3, below.
  • log_amp_sm_k(n) is the log value of the smoothed amplitude value of the nth frame.
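  • A possible reading of the smoothing and log-conversion steps is sketched below; the recursive (exponential) form and the value of the weighting constant c are assumptions consistent with the text, not the literal Equations 2 and 3.

```python
import numpy as np

def smooth_and_log(amp, c=0.9):
    """Recursively smooth the per-frame amplitudes amp_k(n) and convert the
    result to the log domain.  c < 1 weights the previous smoothed value."""
    amp_sm = np.empty_like(amp, dtype=float)
    amp_sm[0] = amp[0]
    for n in range(1, len(amp)):
        # Assumed form: amp_sm_k(n) = c * amp_sm_k(n-1) + (1 - c) * amp_k(n)
        amp_sm[n] = c * amp_sm[n - 1] + (1 - c) * amp[n]
    log_amp_sm = np.log10(amp_sm + 1e-12)  # Eq. 3-style log conversion
    return amp_sm, log_amp_sm
```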
  • the audio channel amplitudes are then compared on a frame-by-frame basis to find the difference between channel amplitudes.
  • the difference, diffAmp(n), can be computed according to Equation 4, in which:
  • diffAmp(n) represents the difference between the channel amplitudes for the nth frame, for a first audio channel and a second audio channel.
  • the amplitude difference can be computed without converting the amplitudes to the log domain by computing the difference between amp_sm_k(n) for the two channels.
  • the proximity of the audio source is determined.
  • the amplitude difference between the audio channels is compared to a predefined threshold.
  • diffAmp(n) from Equation 4 is compared to a threshold. If diffAmp(n) is greater than the threshold for a predefined number of consecutive frames, a near-field flag is triggered to a set state. The set flag indicates that the audio sensors have detected an audio source that is in close proximity to the audio input device. This flag may stay on until diffAmp(n) falls below the threshold for a predefined number of consecutive frames.
  • a noise reduction/suppression module of the audio input device may suppress the signal when the near-field flag is off, as the incoming audio signal is classified as far-field and thus treated as background noise.
  • a near_field_score for each frame may be computed from diffAmp(n) through the division by a predefined normalization factor, as given, for example, by Equation 5, below.
  • the normalization factor, norm_factor may be any suitable constant value or function.
  • the near_field_score(n) may further be converted to a probability value indicating the likelihood that the audio source is near-field.
  • the conversion can be made using a non-linear function, such as a sigmoid function, for example, as given in Equation 6, below.
  • in Equation 6, u is the near_field_score(n), f(u) represents the probability value, and A and B are constants.
  • the amount of suppression applied by the noise reduction/suppression module may then be made a function of the near_field_score(n) or, alternatively, of the near-field probability value, f(u).
  • the score or probability value is compared to a predefined threshold. If the score or f(u) is greater than the threshold for a predefined number of consecutive frames, a near-field flag is triggered to a set state. The set flag indicates that the audio sensors have detected an audio source that is in close proximity to the audio input device.
  • This flag may stay on until the score or f(u) falls below the threshold for a predefined number of consecutive frames.
  • Different threshold values can be used for the near_field_score and probability.
  • a noise reduction/suppression module of the audio input device may suppress the signal when the near-field flag is off, as the incoming audio signal is classified as far-field and thus treated as background noise. Or alternatively, the amount of suppression is made a function of the near_field_score(n) or the near-field probability values, f(u). Typically, as the score or probability decreases, stronger suppression is applied.
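  • Putting the comparison, scoring, and flag logic together, the following sketch computes diffAmp(n), a near_field_score, a sigmoid probability, and a hysteresis-style near-field flag. The threshold, normalization factor, consecutive-frame count, and sigmoid constants A and B are illustrative placeholders, not values from this disclosure.

```python
import numpy as np

def near_field_flags(log_amp_sm_1, log_amp_sm_2, norm_factor=1.0,
                     A=8.0, B=0.5, threshold=0.5, hold_frames=5):
    """Per-frame near-field decision from the two channels' smoothed
    (log-domain) amplitudes."""
    diff_amp = log_amp_sm_1 - log_amp_sm_2            # Eq. 4-style difference
    score = diff_amp / norm_factor                    # Eq. 5-style score
    prob = 1.0 / (1.0 + np.exp(-A * (score - B)))     # Eq. 6-style sigmoid
    flags, flag = [], False
    above_run = below_run = 0
    for p in prob:
        if p > threshold:
            above_run, below_run = above_run + 1, 0
        else:
            above_run, below_run = 0, below_run + 1
        # Set the flag after enough consecutive frames above the threshold,
        # and clear it after enough consecutive frames below it.
        if not flag and above_run >= hold_frames:
            flag = True
        elif flag and below_run >= hold_frames:
            flag = False
        flags.append(flag)
    return np.array(flags), score, prob
```

A noise reduction module could then suppress frames whose flag is off, or scale its suppression with the score or probability, as described above.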
  • FIG. 5 is a flowchart 200 illustrating a method of determining the proximity of an audio source using beamforming. The method begins by receiving multi-channel audio inputs from plural audio sensors and pre-conditioning the audio signals (blocks 102-104), as described above in connection with the method of FIG. 4.
  • beamforming is applied to the digitized audio channels to improve the accuracy of the proximity estimation.
  • the audio input signals are passed through a beam-former to enhance audio signals received from a direction of interest, for example, from the frontal direction.
  • the spatial selectivity of incoming audio is achieved by using adaptive or fixed receive beam patterns.
  • Suitable beamforming techniques are readily available for application in the audio input devices disclosed herein. For example, the output of a beamformer, y_k(t), is given by Equation 7, in which:
  • ∗ denotes a convolution function
  • W_kk′ is a weighting factor
  • k indicates the kth audio sensor
  • k′ indicates the k′th audio sensor
  • x_k′(t) represents a digital audio sample from the k′th audio sensor at time t.
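  • A minimal filter-and-sum reading of Equation 7 is sketched below: each sensor signal x_k′(t) is filtered by a weighting filter W_kk′ and the results are summed over k′. Interpreting W as a bank of short FIR filters is an assumption made for the example.

```python
import numpy as np

def beamform(x, W):
    """x: array of shape (num_sensors, num_samples) holding the sensor
    signals x_k'(t).  W: array of shape (num_outputs, num_sensors, num_taps)
    holding the weighting filters W_kk'.  Returns y with y[k] approximating
    y_k(t) = sum over k' of (W_kk' convolved with x_k'(t))."""
    num_out, num_sensors, _ = W.shape
    y = np.zeros((num_out, x.shape[1]))
    for k in range(num_out):
        for kp in range(num_sensors):
            y[k] += np.convolve(x[kp], W[k, kp], mode="same")
    return y
```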
  • the beamformed audio signals, y k (t) can then be processed in a manner similar to that described in blocks 106 - 110 of FIG. 4 .
  • the amplitude of each beamformed audio sensor signal is determined.
  • one method is to digitize each of the audio signals into a conventional digital audio format, such as PCM (pulse code modulation) audio, where the audio samples are in a time series.
  • the digitized beamformed audio signals from each sensor are divided into audio frames of a predetermined length, e.g., 10 ms. Other suitable frame lengths may be used, such as 20 ms.
  • the amplitude of each beamformed audio signal is then computed on a frame-by-frame basis.
  • the amplitude of a beamformed audio signal in a frame may be computed for each sensor using Equation 1, substituting y_k(t) for x_k(t).
  • the beamformed audio signal amplitude may also be smoothed over successive frames using a smoothing function, such as the smoothing function given by Equation 2.
  • the smoothed frame amplitudes may optionally be converted to the log domain according to Equation 3.
  • the beamformed audio channel amplitudes are then compared on a frame-by-frame basis to find the difference between channel amplitudes.
  • the beamformed amplitude difference can be determined according to Equation 4.
  • the beamformed amplitude difference can be computed without converting the amplitudes to the log domain by computing the difference between amp_sm_k(n) for the two beamformed channels.
  • the proximity of the audio source is determined.
  • the amplitude difference between the beamformed audio channels is compared to a predefined threshold.
  • diffAmp(n) of Equation 4 is compared to a threshold. If the diffAmp(n) is greater than the threshold for a predefined number of consecutive frames, a near-field flag is triggered to a set state. The set flag indicates that the audio sensors have detected an audio source that is in close proximity to the audio input device. This flag may stay on until diffAmp(n) falls below the threshold for a predefined number of consecutive frames.
  • a noise reduction/suppression module of the audio input device may suppress the incoming audio signal when the near-field flag is off, as the incoming audio signal is classified as far-field and thus treated as background noise.
  • a near_field_score for each beamformed frame may be computed from diffAmp(n) through the division by a predefined normalization factor, as given, for example, by Equation 5.
  • the near_field_score(n) for the beamformed audio channels may further be converted to a probability value indicating the likelihood that the audio source is near-field.
  • the conversion can be made using a non-linear function, such as a sigmoid function, for example, as given in Equation 6.
  • the amount of suppression applied by the noise reduction/suppression module of a beamforming audio input device may then be made a function of the near_field_score(n) or, alternatively, of the near-field probability value.
  • the score or probability value f(u) is compared to a predefined threshold. If the score or f(u) is greater than the threshold for a predefined number of consecutive frames, a near-field flag is triggered to a set state. The set flag indicates that the audio sensors have detected an audio source that is in close proximity to the audio input device. This flag may stay on until the score or f(u) falls below the threshold for a predefined number of consecutive frames. Different threshold values can be used for the score and probability value.
  • a noise reduction/suppression module of the beamforming audio input device may suppress the signal when the near-field flag is off, as the incoming audio signal is classified as far-field and thus treated as background noise. Or alternatively, the amount of suppression is made a function of the near_field_score(n) or the near-field probability values, f(u). Typically, as the score or probability decreases, stronger suppression is applied.
  • FIG. 6 is a flowchart 300 illustrating a method of determining the proximity of an audio source by comparing frequency components of incoming audio. The method begins by receiving multi-channel audio inputs from plural audio sensors and pre-conditioning the audio signals (blocks 102 - 104 ), as described above in connection with the method of FIG. 4 .
  • the sensor signals are transformed to the frequency domain.
  • This transformation of each signal can be done using, for example, a fast Fourier transform (FFT), discrete Fourier transform (DFT), discrete cosine transform (DCT), wavelet transform, or any other suitable transformation.
  • an FFT is used to convert the audio signals from the sensor to the frequency domain.
  • One method for accomplishing the transformation is to digitize each of the audio signals into a conventional digital audio format, such as PCM (pulse code modulation) audio, where the audio samples are in a time series.
  • the digitized audio signals from each sensor are divided into a sequence of audio frames of a predetermined length, e.g., 10 ms (milliseconds). Other suitable frame lengths may be used, such as 20 ms.
  • a frequency domain transform is then applied to the audio samples in each frame.
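  • The per-frame frequency transform might look like the sketch below, which applies an FFT to 10 ms frames; the frame length, sampling rate, and the absence of windowing or overlap are simplifying assumptions.

```python
import numpy as np

def frame_spectra(x_k, fs=16000, frame_ms=10):
    """Divide one sensor's PCM signal into frames and apply an FFT to each,
    returning a (num_frames, num_bins) array of complex spectra for
    amplitude computation at each frequency of interest."""
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(x_k) // frame_len
    frames = np.reshape(x_k[: n_frames * frame_len].astype(float),
                        (n_frames, frame_len))
    return np.fft.rfft(frames, axis=1)
```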
  • the amplitude of the transformed audio signals is determined.
  • the frequency amplitudes of each transformed audio signal may be computed on a frame-by-frame basis, with the amplitude amp_k(n, f) at a particular frequency, f, of the nth frame being obtained directly from the transform function.
  • the range of frequencies of interest may be any desirable frequency spectrum, for example, the audible range of human hearing.
  • Each frequency of interest in the range may be a particular frequency or bandwidth different from other frequencies or bandwidths of interest within the range.
  • the frequencies of interest may be spaced at regular intervals, e.g., 100 Hz, or spaced at non-regular intervals.
  • the frequency amplitudes may be smoothed according to Equation 2 at each frequency of interest to yield amp_sm_k(n, f), and optionally converted to the log domain using Equation 3 at each frequency of interest to yield log_amp_sm_k(n, f), computed for each frequency f.
  • the amplitudes (e.g., magnitudes) of the transformed sensor signals are compared to one another.
  • a diffAmp(n, f) and a near_field_score(n, f) may be computed at each frequency, f, according to Equations 4 and 5, respectively.
  • the frequency domain amplitude difference can be determined according to Equation 4.
  • the frequency domain amplitude difference can be computed without converting the amplitudes to the log domain by computing the difference between amp_sm_k(n, f) for the two transformed channels.
  • a near-field flag may also be computed separately for each frequency.
  • the proximity of the audio source is determined.
  • the amplitude difference between the frequency-transformed audio channels is compared to a predefined threshold. For example, diffAmp(n,f) is compared to a threshold. If the diffAmp(n,f) is greater than the threshold for a predefined number of consecutive frames, a near-field flag for the frequency is triggered to a set state. The set flag indicates that the audio sensors have detected an audio source that is in close proximity to the audio input device. This flag may stay on until diffAmp(n,f) falls below the threshold for a predefined number of consecutive frames.
  • a noise reduction/suppression module of the audio input device may suppress the incoming audio signal when the near-field flag is off, as the incoming audio signal is classified as far-field and thus treated as background noise.
  • a near_field_score(n,f) at each frequency of interest in each transformed frame may be computed from diffAmp(n,f) through the division by a predefined normalization factor, as given, for example, by Equation 5.
  • the near_field_score(n,f) values for the frequency-transformed audio channels may further be converted to probability values, f(u,f), each probability value corresponding to one of the frequencies, indicating the likelihood that the audio source is near-field.
  • the conversion can be made using a non-linear function, such as a sigmoid function, for example, as given in Equation 6.
  • different amounts of noise suppression may then be applied to different frequency components of the incoming audio signal during noise reduction.
  • This frequency domain approach is beneficial when a desired near-field audio signal and far-field background noise are present in different frequency bands of the same audio frame.
  • the amount of suppression applied by the noise reduction/suppression module of a frequency domain audio input device may be made a function of the near_field_score(n, f) or, alternatively, of the near-field probability values, f(u, f).
  • each score or probability value is compared to a predefined threshold. If the score or f(u,f) is greater than the threshold for a predefined number of consecutive frames, a near-field flag is triggered to a set state.
  • the set flag indicates that the audio sensors have detected an audio source that is in close proximity to the audio input device for the particular frequency. This flag may stay on until the score or f(u,f) falls below the threshold for a predefined number of consecutive frames.
  • a noise reduction/suppression module of the frequency domain audio input device may suppress the frequency component of the audio signal when the corresponding near-field flag is off, as the incoming audio signal is classified as far-field and thus treated as background noise at that frequency. Or alternatively, the amount of suppression is made a function of the near_field_score(n,f) or the near-field probability values, f(u,f). Typically, as the score or probability decreases, stronger suppression is applied.
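  • The per-frequency variant of the comparison and suppression steps can be pictured as in the sketch below: spectral amplitudes from the two channels are compared bin by bin, converted to scores and probabilities, and used to attenuate bins judged to be far-field. The normalization factor, sigmoid constants, and gain floor are illustrative assumptions.

```python
import numpy as np

def per_frequency_suppression(X1, X2, norm_factor=1.0, A=8.0, B=0.5, floor=0.1):
    """X1, X2: complex spectra of one frame from the two sensors.  Returns a
    suppressed copy of X1 plus the per-bin near-field probability."""
    amp1 = np.log10(np.abs(X1) + 1e-12)
    amp2 = np.log10(np.abs(X2) + 1e-12)
    diff = amp1 - amp2                             # diffAmp(n, f)
    score = diff / norm_factor                     # near_field_score(n, f)
    prob = 1.0 / (1.0 + np.exp(-A * (score - B)))  # f(u, f)
    # Stronger attenuation as the probability drops, down to a fixed floor.
    gain = np.maximum(prob, floor)
    return X1 * gain, prob
```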
  • The methods of FIGS. 4-6 can be used individually or together in any suitable combination to effect background noise suppression in an audio input device.
  • FIG. 7 is a process block diagram showing an exemplary process 400 for spectral noise reduction in a voice processing device.
  • the process 400 may be incorporated into an audio input device, such as the headset 16 of FIG. 1 .
  • Two or more audio sensors, such as microphones 402 , 404 transduce incoming audio into electrical signals.
  • the electrical signals can then be pre-conditioned, as described for example in block 104 , digitized using an A/D converter (not shown) into a digital audio format such as PCM, and then formed into a sequence of digital audio frames, which are then received by a microphone calibration module 406 .
  • the microphone calibration module 406 balances the gains of the microphones 402 , 404 to compensate for intrinsic differences in the sensitivities of the individual microphones 402 , 404 . After this correction, the signal levels from the microphones 402 , 404 should more accurately represent the signal strengths actually arriving at the microphones 402 , 404 .
  • the microphones 402, 404 can alternatively be calibrated during manufacture of the audio input device to obtain the correct amplification factor. If pre-use estimation of the correction factor is not feasible, the microphone calibration module 406 can calibrate the microphones 402, 404 through the use of, for example, an automatic gain matching mechanism.
  • the audio signals output from the microphone calibration module 406 are provided to an echo cancellation module 408 .
  • the echo cancellation module 408 can employ conventional echo cancellation algorithms to remove echo from the incoming audio signals.
  • the audio frames output from the echo cancellation module are then provided to a voice activity detection (VAD) module 410 , a spatial noise processing module 412 , and a proximity detection module 414 .
  • the VAD module 410 detects the presence or absence of human speech (voice) in the frames of the incoming audio signals, and outputs one or more flags corresponding to the audio signals, indicating whether voice is currently present in the incoming audio received by the audio input device.
  • the VAD algorithm used by the VAD module 410 can be, for example, any suitable VAD algorithm currently known to those skilled in the art. For example, an energy-based VAD algorithm may be used. This type of VAD algorithm computes signal energy and compares the signal energy level to a threshold to determine voice activity. A zero-crossing count type VAD algorithm may also be used. This type of VAD algorithm determines the presence of voice by counting the number of zero crossings per frame as the input audio signal fluctuates from positive to negative and vice versa.
  • a certain threshold of zero-crossings may be used to indicate voice activity.
  • pitch estimation and detection algorithms can be used to detect voice activity, as well as VAD algorithms that compute formants and/or cepstral coefficients to indicate the presence of voice.
  • Other VAD algorithms or any suitable combination of the above VAD algorithms may alternatively/additionally be employed by the VAD module 410 .
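  • For illustration, the sketch below combines an energy threshold with a zero-crossing count, two of the VAD cues mentioned above; the thresholds are arbitrary placeholders and a practical VAD module would be considerably more elaborate.

```python
import numpy as np

def simple_vad(frame, energy_thresh=1e-3, zc_low=5, zc_high=60):
    """Flag a frame as voiced if its energy exceeds a threshold and its
    zero-crossing count lies in a speech-like range."""
    frame = np.asarray(frame, dtype=float)
    energy = np.mean(frame ** 2)
    signs = np.signbit(frame)
    zero_crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return energy > energy_thresh and zc_low <= zero_crossings <= zc_high
```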
  • the proximity detection module 414 may employ any of the proximity detection methods described in connection with FIGS. 4-6 herein, or any suitable combination thereof, to determine the proximity of an audio source producing sound received by the audio input device.
  • the proximity detection method used is the frequency domain method described with reference to FIG. 6 .
  • the proximity detection module 414 outputs a near-field flag for each audio frame. Using the preferred frequency-domain proximity detection method, a near-field flag is output for each frequency of interest, per each audio frame.
  • the spatial noise processing module 412 suppresses audio noise in the time domain based on the output flag(s) of the VAD module 410 .
  • the audio frames processed are preferably those received from a predefined one of the microphones, e.g., the microphone closer to a user's mouth. If, for example, the VAD flag(s) indicate that an incoming audio frame does not include voice, the spatial noise processing module 412 suppresses the audio frame, otherwise the module 412 passes the audio frame unchanged to a spectral noise reduction (SNR) module 416 .
  • the SNR module 416 suppresses background noise in the audio frame based on the VAD flag(s) and the near-field flag(s) received from the VAD module 410 and proximity detection module 414 , respectively. If at least one of the VAD flags indicates that voice is contained in a frame, then the SNR module 416 checks to determine whether a near-field flag from the proximity detection module 414 indicates that the audio source is within close proximity to audio input device. If a VAD flag is not set, then the SNR module 416 is receiving a partially suppressed audio frame from the spatial noise processing module 412 , and may perform further processing on the frame. If voice is present, the SNR module 416 transforms the audio frames into the frequency domain.
  • the transformation can be done using any of the transforms described in connection with block 306 of FIG. 6 .
  • the SNR module 416 may use the near-field flags from the proximity detection module 414 for each frequency of interest. If the near-field flag is set for a particular frequency, then that frequency component of the frame is not suppressed. If the near-field flag is not set, then the corresponding frequency component of the audio frame is suppressed. Or alternatively, the amount of suppression is linked to the near_field_score(n,f) or the near-field probability values, f(u,f). Typically, as the score or probability decreases, stronger suppression is applied.
  • the SNR module 416 transforms the processed audio frames back to the time domain using an inverse transform. The processed audio frames may then be output as a transmit (Tx) audio signal.
  • FIG. 8 is a more detailed process block diagram showing a process 600 of spectral noise reduction that can be incorporated into the SNR module 416 .
  • the incoming signal is divided into frames of 10 ms.
  • the spectrum of each frame is computed (blocks 606 , 608 ).
  • a decision is made to decide if the given frame is the desired signal or not. This decision may be a soft one and done independently on each frequency in the spectrum.
  • the signal energy, X̄(f)², and the noise energy, N̄(f)², for each frequency f are updated (blocks 606 and 608, respectively).
  • the signal of the current frame is typically attenuated if the current frame contains mostly noise. This is done by multiplying the current frame signal by a gain factor, G(f) (block 614).
  • G(f) is usually a function of X̄(f)² and N̄(f)², with some parameters controlling the aggressiveness of attenuation. Below are two commonly used formulae to compute the gain factor:
  • G(f) = max(1 − α · N̄(f)² / X̄(f)², ε)   (Eq. 8)
  • G(f) = max(X̄(f)² / (X̄(f)² + α · N̄(f)²), ε)   (Eq. 9)
  • α and ε are the aggressiveness parameters. Increasing α makes the attenuation more aggressive, while increasing ε makes the attenuation less aggressive.
  • both the VAD 410 and proximity detection 414 may control the audio and noise signal spectrum estimation, blocks 606 and 608 , respectively. For example, when VAD is ON and the near-field flag is set, the input frame is used to update the audio signal spectrum, but not the noise spectrum.
  • the aggressiveness parameters are determined.
  • when the signal is classified as coming from far away, G(f) is reduced, for example by setting α to a high value and ε to a low value.
  • when the signal is classified as coming from nearby, G(f) is increased by setting α to a low value and ε to a high value.
  • the values α and ε can be made a function of the near_field_score or probability value. Typically, α decreases and ε increases as the near_field_score (or probability) increases.
  • when other forms of G(f) are used, they can be modified similarly, following the principle that G(f) is reduced when the score or probability decreases.
  • the final gain factor is obtained by smoothing G(f) over the frequency axis and time direction (block 612 ).
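  • A direct transcription of the two gain formulae (Equations 8 and 9) might look like the following sketch; alpha and eps stand in for the aggressiveness parameters α and ε above, the default values are illustrative assumptions, and the energy inputs are assumed to be already-smoothed per-frequency estimates.

```python
import numpy as np

def gain_spectral_subtraction(sig_energy, noise_energy, alpha=2.0, eps=0.05):
    """Eq. 8-style gain: G(f) = max(1 - alpha * N(f)^2 / X(f)^2, eps)."""
    return np.maximum(1.0 - alpha * noise_energy / (sig_energy + 1e-12), eps)

def gain_wiener_like(sig_energy, noise_energy, alpha=2.0, eps=0.05):
    """Eq. 9-style gain: G(f) = max(X(f)^2 / (X(f)^2 + alpha * N(f)^2), eps)."""
    return np.maximum(sig_energy / (sig_energy + alpha * noise_energy + 1e-12), eps)
```

Increasing alpha makes either gain more aggressive, while increasing eps limits how far the signal can be attenuated, mirroring the roles of α and ε described above.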
  • FIG. 9 is a block diagram showing certain components of the exemplary headset 16 .
  • the headset 16 is configured to perform audio source proximity estimation and noise suppression, as described herein.
  • the headset 16 includes a wireless interface 700 , microphones 402 , 404 , a processor 704 , a memory 706 , a microphone pre-processing module 708 , audio processing circuit 710 , and at least one headphone (HP) speaker 711 .
  • the components 700 - 710 can be coupled together using a digital bus 713 .
  • the processor 704 executes software or firmware that is stored in the memory 706 to provide the functionality of the blocks 406-416 and/or the proximity detection methods described in connection with FIGS. 4-6.
  • the processor 704 can be any suitable processor or controller, such as an ARM7, digital signal processor (DSP), one or more application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), discrete logic, or any suitable combination thereof.
  • the processor 704 may include a multi-processor architecture having a plurality of processors, such as a microprocessor-DSP combination.
  • a DSP can be programmed to provide at least some of the audio processing disclosed herein, such as the functions described for blocks 406-416, and a microprocessor can be programmed to control the overall operation of the audio input device.
  • the memory 706 and the processor 704 can be coupled together and communicate on a common bus, such as the bus 713.
  • the memory 706 and the processor 704 may be integrated onto a single chip, or they may be separate components, or any suitable combination of integrated and discrete components.
  • other processor-memory architectures may alternatively be used, such as a multiprocessor and/or multi memory arrangement.
  • the memory 706 may be any suitable memory device for storing programming code and/or data contents, such as a flash memory, RAM, ROM, PROM or the like, or any suitable combination of the foregoing types of memories. Separate memory devices can also be included in the headset 16.
  • the microphone preprocessor 708 is configured to process electronic signals received from the microphones 402 , 404 .
  • the microphone preprocessor 708 may include an analog-to-digital converter (ADC), amplifiers, and a noise reduction and echo cancellation circuit (NREC) responsive to the microphones 402, 404.
  • the ADC converts analog signals from the microphones into digital signals that are then processed by the NREC.
  • the NREC is employed to reduce undesirable audio artifacts for communications and voice control applications.
  • the microphone preprocessor 708 may be implemented using commercially-available hardware, software, firmware, or any suitable combination thereof.
  • the audio processing circuit 710 includes digital circuitry and/or analog circuitry to additionally process the digitized audio signals that are being output to the headphone speaker(s) 711 after passing through the noise suppression processing of the headset 16 .
  • Digital-to-analog (D/A) conversion, filtering, amplification and other audio processing functions can be performed by the audio processing circuit 710 .
  • the headphone speaker(s) 711 are any suitable audio transducer(s) for converting the electronic signals output from the audio processing circuit 710 into sound to be heard by a user.
  • the wireless interface 700 permits the headset 16 to wirelessly communicate with other devices, for example, a cellular phone or the like.
  • the wireless interface 700 includes a transceiver 702 .
  • the wireless interface 700 provides two-way wireless communications with the handset and other devices, if needed.
  • the wireless interface 700 includes a commercially-available Bluetooth module that provides at least a Bluetooth core system consisting of a Bluetooth RF transceiver, baseband processor, protocol stack, as well as hardware and software interfaces for connecting the module to a controller, such as the processor 704 , in the headset 16 .
  • the transceiver 702 is preferably a Bluetooth transceiver.
  • the wireless interface 700 may be controlled by the headset controller (e.g., the processor 704 ).
  • An audio input device may have more than two audio sensors.
  • a near_field_score or probability value, either of which may be referred to as a proximity score, may be computed for each possible pair of audio sensors.
  • the individual pair scores can then be combined to give a final score. For example, if there are three audio sensors, namely 1, 2 and 3, three pair scores can be computed for the three possible pairs. These proximity scores would be score_12 for audio sensors 1 and 2, score_13 for audio sensors 1 and 3, and score_23 for audio sensors 2 and 3.
  • a final score can be obtained by taking the average of the scores, or by taking the maximum of the scores, or alternatively, by taking the average of the two largest scores among the three, and ignoring the other score. And again, G(f) would be reduced when this combined near_field_score is low.
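  • One way to realize the pair-score combination described above is sketched below; the function name is hypothetical and the choice among mean, maximum, and mean-of-the-two-largest is exposed as a parameter.

```python
import numpy as np

def combine_pair_scores(pair_scores, mode="top2_mean"):
    """pair_scores: e.g. [score_12, score_13, score_23] for three sensors."""
    scores = np.asarray(pair_scores, dtype=float)
    if mode == "mean":
        return scores.mean()
    if mode == "max":
        return scores.max()
    # Default: average the two largest scores and ignore the smallest.
    return np.sort(scores)[-2:].mean()
```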
  • FIG. 10 shows graphs 800 , 802 , 804 depicting exemplary background noise suppression.
  • Graph 800 shows a trace of a raw input audio signal from an audio sensor.
  • the graphs 800 - 804 cover a first time interval 806 , when the audio signal comprises a mix of human speech and noise, and a second time interval 808 , when the audio signal includes only background noise, without any speech.
  • Graph 802 depicts the value of the near-field flag during the intervals 806 , 808 .
  • the near-field flag can be generated by any of the audio source proximity detection methods described herein in connection with FIGS. 4-6 .
  • the near-field flag is set during the first interval 806 , when a near-field source, such as a human speaking, is detected.
  • the flag is not set in the second interval 808 , when only background noise from a distant audio source is present.
  • the graph 804 shows the output audio signal after noise suppression is applied according to the near-field flag.
  • when the near-field flag is set, as in the first interval 806, no or limited noise suppression is applied to the audio signal.
  • in the second interval 808, when the flag is not set, the background noise is reduced, by for example the SNR module 416, to smaller levels, as shown in graph 804.
  • the background noise is suppressed when the proximity information (e.g., near-field flag) corresponding to the audio signal is employed by a noise reduction module.
  • the principles disclosed herein may be applied to other devices, such as other wireless devices including cellular phones, PDAs, personal computers, stereo systems, video games and the like. Also, the principles disclosed herein may be applied to wired headsets, where the communications link between the headset and another device is a wire, rather than a wireless link. In addition, the various components and/or method steps/blocks may be implemented in arrangements other than those specifically disclosed without departing from the scope of the claims.
  • the functionality of the systems, devices, headsets and their respective components, as well as the method steps and blocks described herein may be implemented in hardware, software, firmware, or any suitable combination thereof.
  • the software/firmware may be a program having sets of instructions (e.g., code segments) executable by one or more digital circuits, such as microprocessors, DSPs, embedded controllers, or intellectual property (IP) cores. If implemented in software/firmware, the functions may be stored on or transmitted over as instructions or code on one or more computer-readable media.
  • Computer-readable medium includes both computer storage medium and communication medium, including any medium that facilitates transfer of a computer program from one place to another.
  • a storage medium may be any available medium that can be accessed by a computer.
  • such computer-readable medium can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Abstract

Estimating the proximity of an audio source is accomplished by transforming audio signals from a plurality of sensors to frequency domain. The amplitudes of the transformed audio signals are then determined. The proximity of the audio source is determined based on a comparison of the frequency domain amplitudes. This estimation permits a device to differentiate between relatively distant audio sources and audio sources at close proximity to the device. The technique can be applied to mobile handsets, such as cellular phones or PDAs, hands-free headsets, and other audio input devices. Devices taking advantage of this “close proximity” detection are better able to suppress background noise and deliver an improved user experience.

Description

    CLAIM OF PRIORITY UNDER 35 U.S.C. §119
  • The present application for patent claims priority to Provisional Application No. 61/108,413 entitled “Estimation of Signal Proximity with a Sensor Array for Noise Reduction” filed Oct. 24, 2008, assigned to the same assignee hereof and which is incorporated herein by reference.
  • BACKGROUND
  • 1. Field
  • The present disclosure pertains generally to audio signal processing, and more specifically, to near-field audio signal detection and noise suppression.
  • 2. Background
  • Devices such as cellular phones, two-way radios and personal digital assistants (PDAs) that accept audio input are often used in adverse noise environments such as crowds, busy streets, restaurants, airports, vehicles or the like. Unwanted sounds generated from various sound sources within an audio environment, referred to as background noise, can emanate from differing locations within that audio environment. Common examples can include, but are not limited to, automobile noises or other voices within a crowded public place. Regardless of the source, the inability to distinguish a desired audio signal from background noise can result in audio input signals having decreased quality.
  • Strong background noise in these environments can obscure a user's speech and make it difficult to understand what the person is saying. In many cases, noise corrupts a speech signal and hence significantly degrades the quality of the desired audio signal. In cellular phones, for example, a person conversing in a noisy environment, like a crowded cafe or a busy train station, might not be able to converse properly as the noise-corrupted speech perceived by a listener on the other end of a call is less intelligible. In all such cases of audio corruption, improving the quality of transmitted audio by suppressing background noise is desirable.
  • While noise filtering systems have been developed that attempt to remove background noise, these systems have not been able to remove all of the noise in all environments. Thus, there is a need for an improved technique of detecting and suppressing background noise.
  • SUMMARY
  • Disclosed herein is an improved technique for suppressing background noise received by an audio input device. The technique permits an audio input device to differentiate between relatively distant noise sources and sound originating at close proximity to the device. The technique can be applied to mobile handsets, such as cellular phones or PDAs, hands-free headsets, and other audio input devices. Audio input devices taking advantage of this “close proximity” detection are better able to suppress background noise and deliver an improved user experience.
  • According to an aspect, a method of determining the proximity of an audio source includes transforming audio signals from a plurality of sensors to frequency domain. The amplitudes of the transformed audio signals are then determined. The proximity of the audio source is determined based on a comparison of the amplitudes.
  • According to another aspect, a method of determining the proximity of an audio source includes receiving audio signals from a plurality of sensors and transforming the audio signals to frequency domain. The amplitudes of the transformed audio signals are determined at a plurality of frequencies. For each frequency, a differential signal is determined by comparing the spectral amplitudes from the different sensors at the frequency. This produces a plurality of differential signals. The proximity of the audio source is determined based on the differential signals.
  • According to another aspect, an apparatus includes a plurality of audio sensors outputting a plurality of audio signals in response to an audio source. A processor included in the apparatus is configured to transform the audio signals to frequency domain and to also determine the proximity of the audio source by comparing amplitudes of the transformed audio signals.
  • According to another aspect, an apparatus includes means for transforming a plurality of audio signals from a plurality of sensors to frequency domain; means for determining amplitudes of the transformed audio signals; means for comparing the amplitudes; and means for determining the proximity of the audio source based on the comparison of the amplitudes.
  • According to a further aspect, a computer-readable medium, embodying a set of instructions executable by one or more processors, includes code for transforming a plurality of audio signals from a plurality of sensors to frequency domain; code for determining amplitudes of the transformed audio signals; code for comparing the amplitudes; and code for determining the proximity of the audio source based on the comparison of the amplitudes.
  • Other aspects, features, and advantages will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional features, aspects, and advantages be included within this description and be protected by the accompanying claims.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • It is to be understood that the drawings are solely for purpose of illustration. Furthermore, the components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the techniques described herein. In the figures, like reference numerals designate corresponding parts throughout the different views.
  • FIG. 1 is a diagram of an exemplary audio environment including a near-field audio source and a far-field background audio source.
  • FIG. 2 is a diagram conceptually illustrating sound waves emitted from a near-field audio source.
  • FIG. 3 is a diagram conceptually illustrating sound waves emitted from a far-field audio source.
  • FIG. 4 is a flowchart illustrating a method of determining the proximity of an audio source by comparing signal amplitude from different audio sensors.
  • FIG. 5 is a flowchart illustrating a method of determining the proximity of an audio source using beam forming.
  • FIG. 6 is a flowchart illustrating a method of determining the proximity of an audio source by comparing spectral components of incoming audio.
  • FIG. 7 is a process block diagram showing a process of spectral noise reduction.
  • FIG. 8 is a more detailed process block diagram showing the process of spectral noise reduction.
  • FIG. 9 is a block diagram showing certain components of an exemplary headset device having audio source proximity estimation capability.
  • FIG. 10 shows graphs depicting exemplary background noise suppression.
  • DETAILED DESCRIPTION
  • The following detailed description, which references and incorporates the drawings, describes and illustrates one or more specific embodiments. These embodiments, offered not to limit but only to exemplify and teach, are shown and described in sufficient detail to enable those skilled in the art to practice what is claimed. Thus, for the sake of brevity, the description may omit certain information known to those of skill in the art.
  • The word “exemplary” is used throughout this disclosure to mean “serving as an example, instance, or illustration.” Anything described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other approaches or features.
  • FIG. 1 is a diagram of an exemplary audio environment 10 including an audio input device such as a headset 16, a near-field audio source 15 such as the mouth of a user 12, and a far-field background audio source 14, such as a radio. The headset 16 is subjected to the near-field audio source 15 and far-field audio source 14. The far-field audio source 14 is located farther away from the audio input device than the near-field audio source 15. Each of the audio sources 14, 15 can be anything that emits sounds.
  • The headset 16 uses a sensor array to estimate the proximity of the audio sources 14, 15 and thereafter controls a noise reduction module included in the headset 16 to suppress audio signals classified as far-field. In the example shown, the sensor array includes a first audio sensor 18 and a second audio sensor 20 included in the headset 16. The audio sensors 18, 20 are spaced apart; for example, they may be 2-4 cm apart. The audio sensors 18, 20 can be microphones or any other suitable audio transducers responsive to sound input. At the nominal wearing position, the first audio sensor 18 is closer to the user's mouth than the second audio sensor 20. As a result, audio signals originating from the user's mouth that are picked up by the first audio sensor 18 are louder than the same signals picked up by the second audio sensor 20. The magnitude difference between the detected audio signals can range from 1 dB to 3 dB or more, depending on the relative distances from the mouth to the audio sensors 18, 20. In addition, the signal level difference is frequency dependent: typically, a higher signal frequency gives a larger signal level difference because of diffraction effects. With respect to the far-field audio source 14, since the audio sensors 18, 20 are close to each other relative to the distance from the far-field source 14, they pick up the far-field audio at roughly the same amplitude level, irrespective of the direction of arrival of the far-field audio. As a result, the headset 16 monitors and compares the signal amplitude levels at the two audio sensors 18, 20 to estimate the audio source proximity.
  • The exemplary headset 16 includes an earpiece body 17 and at least one support 19, such as an ear hanger, for allowing the headset 16 to be comfortably worn by the user 12. A boom 21 can also be included in the headset 16 for placing the first audio sensor 18 closer to the user's mouth. The second audio sensor 20 can be included in the earpiece body 17, as shown. In the example shown, the headset 16 is a wireless headset, such as a Bluetooth headset, in which audio signals between one or more devices and the headset 16 are carried over one or more wireless radio frequency (RF) or infrared (IR) channels. If implemented as a Bluetooth wireless headset, the headset 16 can include components and functionality as defined by the Bluetooth Specification available at www.bluetooth.com. The Bluetooth Specification provides specific guidelines for providing wireless headset functionality. Alternatively, the headset 16 may be a wired headset, having a conductor carrying audio signals between a device and the headset 16.
  • Although the audio input device is illustrated as the headset 16, the audio source proximity estimation and noise suppression techniques and devices disclosed herein may also be included in other audio input devices, such as communication devices, e.g., phones, cellular phones, PDAs, video game consoles, voice-activated remote controls, live reporting systems, public address systems, or the like. An audio input device is a device that receives sound.
  • FIG. 2 is a diagram conceptually illustrating the headset 16 subjected to sound waves emitted from the near-field audio source 15. Since the first audio sensor 18 is closer to the audio source 15 than the second audio sensor 20, the amplitude of the sound received at the sensors 18, 20 from the source 15 is measurably different. This difference in sensor amplitudes is exploited by the headset 16 to determine whether an audio source is near or distant to the headset 16.
  • FIG. 3 is a diagram conceptually illustrating the headset 16 subjected to sound waves emitted from a far-field audio source 14. Since the audio sensors 18, 20 are close to each other relative to the distance from the far-field audio source 14, they pick up the audio at roughly the same amplitude level, irrespective of the direction of arrival of the audio signal. As a result, a system which monitors the signal levels received by the two sensors 18, 20 is able to estimate the audio source proximity.
  • FIG. 4 is a flowchart 100 illustrating a method for estimating audio source proximity based on the audio signal levels at the sensor array elements, e.g., audio sensors 18, 20. In block 102, audio input signals are received from the audio sensors. Each sensor provides a separate audio signal, also referred to as an audio channel. Each audio signal represents sound received at a particular audio sensor. In block 104, the incoming audio signals are pre-conditioned. The pre-conditioning may include band-pass filtering each of the audio signals to reject interfering signals outside the frequency range of interest. For example, the audio signals may be filtered to remove signals outside the human audible range. The audio input signals may also be individually amplified to account for differences in the intrinsic sensitivities of the individual sensors. After this correction, the signal levels from the audio sensors should more accurately represent the signal strengths arriving at the audio sensors. The audio sensors can be calibrated during manufacturing of the audio input device to obtain the correct amplification factor. If pre-use estimation of the correction factor is not feasible, the audio sensors can instead be calibrated, and the correction factor estimated, during operation of the audio input device through an automatic gain matching mechanism. The audio signals may be initially received from the sensors as analog signals and then converted into digital audio signals by an analog-to-digital (A/D) converter. The signal pre-conditioning described above can be performed on the analog audio signals, the digitized audio signals, or in any suitable combination of the digital and analog processing domains.
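  • As an illustration only, the pre-conditioning of block 104 might be sketched in Python as follows; the band edges, filter order, sampling rate, and gain values are assumed placeholders rather than values specified by this disclosure.

```python
import numpy as np
from scipy.signal import butter, lfilter

def precondition(channels, fs=8000, band=(100.0, 3800.0), gains=None):
    """Band-pass filter each sensor channel and apply a per-channel gain
    correction so the channel levels better reflect the sound arriving at
    each sensor (block 104).

    channels: array-like of shape (num_sensors, num_samples), digitized PCM
    gains:    per-sensor correction factors (e.g., from factory calibration
              or automatic gain matching); defaults to unity gains
    """
    channels = np.asarray(channels, dtype=float)
    if gains is None:
        gains = np.ones(channels.shape[0])
    # 4th-order Butterworth band-pass covering the frequency range of interest
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    out = np.empty_like(channels)
    for k in range(channels.shape[0]):
        out[k] = gains[k] * lfilter(b, a, channels[k])
    return out
```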
  • Next, in block 106 the amplitude of each audio sensor signal is determined. Although different methods can be employed to determine the amplitudes of the audio signals, one method is to digitize each of the audio signals into a conventional digital audio format, such as PCM (pulse code modulation) audio, where the audio samples are in a time series. Then, the digitized incoming audio signals from each sensor are divided into audio frames of a predetermined length, e.g., 10 ms (milliseconds). Other suitable frame lengths may be used, such as 20 ms. The amplitude of each audio signal is then computed on a frame-by-frame basis. The amplitude of an audio signal in a frame is computed for each sensor as:

  • $\mathrm{amp}_k(n) = \sum_{t} \lvert x_k(t) \rvert^{p}$  Eq. 1
  • In Equation 1, ampk(n) represents the audio signal amplitude of the nth frame, n is the frame index, xk(t) represents a digital audio sample at time t, k denotes the kth sensor, and t is the time index for the incoming audio signal samples. p is a pre-chosen parameter that may have a value greater than one; for example, p may equal two. The summation is over all the audio samples in the frame. For each sensor, the audio signal amplitude ampk(n) may also be smoothed over successive frames using a smoothing function, such as:

  • $\mathrm{amp\_sm}_k(n) = \alpha \cdot \mathrm{amp}_k(n) + (1 - \alpha) \cdot \mathrm{amp\_sm}_k(n-1)$  Eq. 2
  • In Equation 2, amp_smk(n) is the smoothed amplitude value of the nth frame, amp_smk(n−1) is the smoothed amplitude value of the (n−1)th frame, and α is a predefined weighting constant, preferably having a value less than one.
  • In addition, the smoothed frame amplitudes may optionally be converted to the log domain, for example according to Equation 3, below.

  • $\mathrm{log\_amp\_sm}_k(n) = \log\left(\mathrm{amp\_sm}_k(n)\right)$  Eq. 3
  • In Equation 3, log_amp_smk(n) is the log value of the smoothed amplitude value of the nth frame.
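  • The per-frame amplitude, smoothing, and log conversion of Equations 1-3 might be sketched as follows for a single channel; the frame length (10 ms at 8 kHz), the exponent p, and the weight α are the example values mentioned above, not required ones.

```python
import numpy as np

def frame_log_amplitudes(x, frame_len=80, p=2, alpha=0.1):
    """Compute amp_k(n) per Eq. 1, smooth it per Eq. 2, and convert to the
    log domain per Eq. 3 for one sensor channel x of PCM samples."""
    n_frames = len(x) // frame_len
    frames = np.reshape(np.asarray(x, dtype=float)[:n_frames * frame_len],
                        (n_frames, frame_len))
    amp = np.sum(np.abs(frames) ** p, axis=1)            # Eq. 1
    amp_sm = np.empty(n_frames)
    amp_sm[0] = amp[0]
    for n in range(1, n_frames):                          # Eq. 2
        amp_sm[n] = alpha * amp[n] + (1 - alpha) * amp_sm[n - 1]
    return np.log(amp_sm + 1e-12)                         # Eq. 3 (eps avoids log of zero)
```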
  • In block 108, the audio channel amplitudes are then compared on a frame-by-frame basis to find the difference between channel amplitudes. For example, for a sensor array with two elements (as shown in FIG. 1), the difference, diffAmp(n), can be computed as:

  • $\mathrm{diffAmp}(n) = \mathrm{log\_amp\_sm}_1(n) - \mathrm{log\_amp\_sm}_2(n)$  Eq. 4
  • In Equation 4, diffAmp(n) represents the difference between the channel amplitudes for the nth frame, for a first audio channel and a second audio channel. Alternatively, the amplitude difference can be computed without converting the amplitudes to the log domain by computing the difference between amp_smk(n) for the two channels.
  • In block 110, the proximity of the audio source is determined. To accomplish this, the amplitude difference between the audio channels is compared to a predefined threshold. For example, diffAmp(n) of Equation 4 is compared to a threshold. If diffAmp(n) is greater than the threshold for a predefined number of consecutive frames, a near-field flag is triggered to a set state. The set flag indicates that the audio sensors have detected an audio source that is in close proximity to the audio input device. This flag may stay on until diffAmp(n) falls below the threshold for a predefined number of consecutive frames. A noise reduction/suppression module of the audio input device may suppress the signal when the near-field flag is off, as the incoming audio signal is classified as far-field and thus treated as background noise.
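  • The comparison and flag logic of blocks 108-110 could look like the following sketch, which assumes the smoothed log amplitudes of the two channels are already available; the threshold and consecutive-frame count are illustrative.

```python
def near_field_flags(log_amp_sm1, log_amp_sm2, threshold=0.5, hold_frames=5):
    """Per Eq. 4, diffAmp(n) = log_amp_sm1(n) - log_amp_sm2(n). The near-field
    flag is set after diffAmp exceeds the threshold for hold_frames consecutive
    frames and cleared after it stays below the threshold for the same count."""
    flag, above, below = False, 0, 0
    flags = []
    for a1, a2 in zip(log_amp_sm1, log_amp_sm2):
        diff = a1 - a2                                  # Eq. 4
        if diff > threshold:
            above, below = above + 1, 0
        else:
            above, below = 0, below + 1
        if not flag and above >= hold_frames:
            flag = True
        elif flag and below >= hold_frames:
            flag = False
        flags.append(flag)
    return flags
```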
  • As an alternative to the channel amplitude difference, a near_field_score for each frame may be computed from diffAmp(n) through the division by a predefined normalization factor, as given, for example, by Equation 5, below.
  • $\mathrm{near\_field\_score}(n) = \dfrac{\mathrm{diffAmp}(n)}{\mathrm{norm\_factor}}$  Eq. 5
  • The normalization factor, norm_factor, may be any suitable constant value or function.
  • The near_field_score(n) may further be converted to a probability value indicating the likelihood that the audio source is near-field. The conversion can be made using a non-linear function, such as a sigmoid function, for example, as given in Equation 6, below.
  • $f(u) = \dfrac{1}{1 + \exp(-Au + B)}$  Eq. 6
  • In Equation 6, u is the near_field_score(n), f(u) represents the probability value, and A and B are constants. The amount of suppression applied by the noise reduction/suppression module may then be made a function of the near_field_score(n), or alternatively, of the near-field probability value, f(u). Using either the score or the probability value f(u), the value is compared to a predefined threshold. If the score or f(u) is greater than the threshold for a predefined number of consecutive frames, a near-field flag is triggered to a set state. The set flag indicates that the audio sensors have detected an audio source that is in close proximity to the audio input device. This flag may stay on until the score or f(u) falls below the threshold for a predefined number of consecutive frames. Different threshold values can be used for the near_field_score and the probability. A noise reduction/suppression module of the audio input device may suppress the signal when the near-field flag is off, as the incoming audio signal is classified as far-field and thus treated as background noise. Alternatively, the amount of suppression may be made a function of the near_field_score(n) or the near-field probability value, f(u). Typically, as the score or probability decreases, stronger suppression is applied.
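  • A sketch of the score and probability mapping of Equations 5 and 6; the normalization factor and the sigmoid constants A and B are unspecified above and are shown only as placeholders.

```python
import math

def near_field_probability(diff_amp, norm_factor=1.0, A=4.0, B=0.0):
    """Eq. 5: normalize the per-frame amplitude difference into a
    near_field_score; Eq. 6: map the score to a probability with a sigmoid."""
    score = diff_amp / norm_factor                     # Eq. 5
    return 1.0 / (1.0 + math.exp(-A * score + B))      # Eq. 6
```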
  • FIG. 5 is a flowchart 200 illustrating a method of determining the proximity of an audio source using beamforming. The method begins by receiving multi-channel audio inputs from plural audio sensors and pre-conditioning the audio signals (blocks 102-104), as described above in connection with the method of FIG. 4.
  • Next, in block 206, beamforming is applied to the digitized audio channels to improve the accuracy of the proximity estimation. Instead of using the raw audio input signals, the audio input signals are passed through a beam-former to enhance audio signals received from a direction of interest, for example, from the frontal direction. The spatial selectivity of incoming audio is achieved by using adaptive or fixed receive beam patterns. Suitable beamforming techniques are readily available for application in the audio input devices disclosed herein. For example, the output of a beamformer, yk(t), is given by:

  • $y_k(t) = \sum_{k'} W_{kk'} * x_{k'}(t)$  Eq. 7
  • In Equation 7, $*$ denotes convolution, Wkk′ is a weighting factor, k indicates the kth audio sensor, k′ indicates the k′th audio sensor, and xk′(t) represents a digital audio sample from the k′th audio sensor at time t. The beamformed audio signals, yk(t), can then be processed in a manner similar to that described in blocks 106-110 of FIG. 4.
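  • Equation 7 describes a filter-and-sum beamformer. A minimal numpy sketch is given below, where the FIR weight filters W[k][k′] are assumed to have been designed elsewhere (for example, fixed delay-and-sum taps) and each filter is assumed to be shorter than the signal.

```python
import numpy as np

def beamform(x, W):
    """Eq. 7: y_k(t) = sum over k' of (W_kk' convolved with x_k')(t).

    x: array of shape (num_sensors, num_samples), digitized sensor signals
    W: nested list W[k][k'] of FIR filter taps for each output/sensor pair
    returns: array of shape (num_outputs, num_samples), beamformed channels
    """
    num_out, num_samples = len(W), x.shape[1]
    y = np.zeros((num_out, num_samples))
    for k in range(num_out):
        for kp in range(x.shape[0]):
            # mode="same" keeps the output aligned with the input length
            y[k] += np.convolve(x[kp], W[k][kp], mode="same")
    return y
```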
  • More specifically, in block 208, the amplitude of each beamformed audio sensor signal is determined. Although different methods can be employed to determine the amplitudes of the beamformed audio signals, one method is to digitize each of the audio signals into a conventional digital audio format, such as PCM (pulse code modulation) audio, where the audio samples are in a time series. Then, the digitized beamformed audio signals from each sensor are divided into audio frames of a predetermined length, e.g., 10 ms. Other suitable frame lengths may be used, such as 20 ms. The amplitude of each beamformed audio signal is then computed on a frame-by-frame basis. The amplitude of a beamformed audio signal in a frame may be computed for each sensor using Equation 1, substituting yk(t) for xk(t).
  • For each sensor, the beamformed audio signal amplitude may also be smoothed over successive frames using a smoothing function, such as the smoothing function given by Equation 2. In addition, the smoothed frame amplitudes may optionally be converted to the log domain according to Equation 3.
  • In block 210, the beamformed audio channel amplitudes are then compared on a frame-by-frame basis to find the difference between channel amplitudes. For example, for a sensor array with two elements (as shown in FIG. 1), the beamformed amplitude difference can be determined according to Equation 4. Alternatively, the beamformed amplitude difference can be computed without converting the amplitudes to the log domain by computing the difference between amp_smk(n) for the two beamformed channels.
  • In block 212, the proximity of the audio source is determined. To accomplish this, the amplitude difference between the beamformed audio channels is compared to a predefined threshold. For example, diffAmp(n) of Equation 4 is compared to a threshold. If the diffAmp(n) is greater than the threshold for a predefined number of consecutive frames, a near-field flag is triggered to a set state. The set flag indicates that the audio sensors have detected an audio source that is in close proximity to the audio input device. This flag may stay on until diffAmp(n) falls below the threshold for a predefined number of consecutive frames. A noise reduction/suppression module of the audio input device may suppress the incoming audio signal when the near-field flag is off, as the incoming audio signal is classified as far-field and thus treated as background noise.
  • As an alternative to the beamformed channel amplitude difference, a near_field_score for each beamformed frame may be computed from diffAmp(n) through the division by a predefined normalization factor, as given, for example, by Equation 5.
  • The near_field_score(n) for the beamformed audio channels may further be converted to a probability value indicating the likelihood that the audio source is near-field. The conversion can be made using a non-linear function, such as a sigmoid function, for example, as given in Equation 6.
  • The amount of suppression applied by the noise reduction/suppression module of a beamforming audio input device may then be made a function of the near_field_score(n), or alternatively, of the near-field probability value. Using the score or probability value f(u), the value is compared to a predefined threshold. If the score or f(u) is greater than the threshold for a predefined number of consecutive frames, a near-field flag is triggered to a set state. The set flag indicates that the audio sensors have detected an audio source that is in close proximity to the audio input device. This flag may stay on until the score or f(u) falls below the threshold for a predefined number of consecutive frames. Different threshold values can be used for the score and the probability value. A noise reduction/suppression module of the beamforming audio input device may suppress the signal when the near-field flag is off, as the incoming audio signal is classified as far-field and thus treated as background noise. Alternatively, the amount of suppression may be made a function of the near_field_score(n) or the near-field probability value, f(u). Typically, as the score or probability decreases, stronger suppression is applied.
  • FIG. 6 is a flowchart 300 illustrating a method of determining the proximity of an audio source by comparing frequency components of incoming audio. The method begins by receiving multi-channel audio inputs from plural audio sensors and pre-conditioning the audio signals (blocks 102-104), as described above in connection with the method of FIG. 4.
  • Next, in block 306, the sensor signals are transformed to the frequency domain. This transformation of each signal can be done using, for example, a fast Fourier transform (FFT), discrete Fourier transform (DFT), discrete cosine transform (DCT), wavelet transformation, or any other suitable transformation. Preferably, an FFT is used to convert the audio signals from the sensors to the frequency domain. One method for accomplishing the transformation is to digitize each of the audio signals into a conventional digital audio format, such as PCM (pulse code modulation) audio, where the audio samples are in a time series. Then, the digitized audio signals from each sensor are divided into a sequence of audio frames of a predetermined length, e.g., 10 ms (milliseconds). Other suitable frame lengths may be used, such as 20 ms. A frequency domain transform is then applied to the audio samples in each frame.
  • In block 308, at each frequency of interest, the amplitude of the transformed audio signals is determined. The frequency amplitudes of each transformed audio signal may be computed on a frame-by-frame basis, with the amplitude ampk(n,f) at a particular frequency, f, of the nth frame being obtained directly from the transform function. The range of frequencies of interest may be any desirable frequency spectrum, for example, the audible range of human hearing. Each frequency of interest in the range may be a particular frequency or bandwidth different from other frequencies or bandwidths of interest within the range. For example, the frequencies of interest may be spaced at regular intervals, e.g., 100 Hz, or spaced at non-regular intervals.
  • The frequency amplitudes may be smoothed according to Equation 2, at each frequency of interest to yield amp_smk(n,f), and optionally converted to the log domain using Equation 3 at each frequency of interest to yield log_amp_smk(n,f), computed for each frequency f.
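  • Blocks 306-308 might be sketched as follows, assuming an FFT per 10 ms frame; the magnitude of each FFT bin serves as ampk(n,f), and the smoothing of Equation 2 is applied along the frame axis at every bin.

```python
import numpy as np

def framed_spectral_amplitudes(x, frame_len=80, alpha=0.1):
    """Transform one sensor channel to the frequency domain frame by frame
    (block 306) and return smoothed per-bin amplitudes amp_sm_k(n, f)
    (block 308 plus Eq. 2 smoothing over successive frames)."""
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len
    frames = np.reshape(x[:n_frames * frame_len], (n_frames, frame_len))
    amp = np.abs(np.fft.rfft(frames, axis=1))     # |FFT| per frame and bin
    amp_sm = np.empty_like(amp)
    amp_sm[0] = amp[0]
    for n in range(1, n_frames):                  # Eq. 2 along the frame axis
        amp_sm[n] = alpha * amp[n] + (1 - alpha) * amp_sm[n - 1]
    return amp_sm                                 # shape: (frames, bins)
```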
  • At block 310, at each frequency of interest, the amplitudes (e.g., magnitudes) of the transformed sensor signals are compared to one another. A diffAmp(n,f) and a near_field_score(n,f) may be computed at each frequency, f, according to Equations 4 and 5, respectively. For example, for a sensor array with two elements (as shown in FIG. 1), the frequency domain amplitude difference can be determined according to Equation 4. Alternatively, the frequency domain amplitude difference can be computed without converting the amplitudes to the log domain by computing the difference between amp_smk(n,f) for the two transformed channels. A near-field flag may also be computed separately for each frequency.
  • In block 312, the proximity of the audio source is determined. To accomplish this, the amplitude difference between the frequency-transformed audio channels is compared to a predefined threshold. For example, diffAmp(n,f) is compared to a threshold. If the diffAmp(n,f) is greater than the threshold for a predefined number of consecutive frames, a near-field flag for the frequency is triggered to a set state. The set flag indicates that the audio sensors have detected an audio source that is in close proximity to the audio input device. This flag may stay on until diffAmp(n,f) falls below the threshold for a predefined number of consecutive frames. A noise reduction/suppression module of the audio input device may suppress the incoming audio signal when the near-field flag is off, as the incoming audio signal is classified as far-field and thus treated as background noise.
  • As an alternative to the frequency-transformed channel amplitude difference, a near_field_score(n,f) at each frequency of interest in each transformed frame may be computed from diffAmp(n,f) through the division by a predefined normalization factor, as given, for example, by Equation 5.
  • The near_field_score(n,f) values for the frequency-transformed audio channels may further be converted to probability values, f(u,f), each probability value corresponding to one of the frequencies, indicating the likelihood that the audio source is near-field. The conversion can be made using a non-linear function, such as a sigmoid function, for example, as given in Equation 6.
  • Using the method of FIG. 6, different amounts of noise suppression may then be applied to different frequency components of the incoming audio signal during noise reduction. This frequency domain approach is beneficial when a desired near-field audio signal and far-field background noise are present in different frequency bands of the same audio frame.
  • For example, the amount of suppression applied by the noise reduction/suppression module of a frequency domain audio input device may be made a function of the near_field_score(n,f), or alternatively, of the near-field probability values, f(u,f). Using the scores or probability values, each score or probability value is compared to a predefined threshold. If the score or f(u,f) is greater than the threshold for a predefined number of consecutive frames, a near-field flag is triggered to a set state. The set flag indicates that the audio sensors have detected an audio source that is in close proximity to the audio input device for the particular frequency. This flag may stay on until the score or f(u,f) falls below the threshold for a predefined number of consecutive frames. Different threshold values may be used for the scores and probability values. A noise reduction/suppression module of the frequency domain audio input device may suppress the frequency component of the audio signal when the corresponding near-field flag is off, as the incoming audio signal is classified as far-field and thus treated as background noise at that frequency. Alternatively, the amount of suppression may be made a function of the near_field_score(n,f) or the near-field probability values, f(u,f). Typically, as the score or probability decreases, stronger suppression is applied.
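  • For illustration, the per-frequency comparison and suppression gating of blocks 310-312 could be reduced to a per-bin gain mask as sketched below; the threshold and floor gain are placeholders, and the consecutive-frame hysteresis described above is omitted for brevity.

```python
import numpy as np

def per_bin_gain_mask(amp_sm1, amp_sm2, threshold=0.5, floor=0.1):
    """For each frame n and bin f, diffAmp(n, f) is the difference of the log
    smoothed amplitudes of the two channels. Bins whose difference falls below
    the threshold are treated as far-field and attenuated to a small floor
    gain; near-field bins are passed through unchanged."""
    diff = np.log(amp_sm1 + 1e-12) - np.log(amp_sm2 + 1e-12)
    near_field = diff > threshold                 # per-frequency near-field flag
    return np.where(near_field, 1.0, floor)       # per-bin gain, same shape as diff
```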
  • The methods described in FIGS. 4-6 can be used individually or together in any suitable combination to effect background noise suppression in an audio input device.
  • FIG. 7 is a process block diagram showing an exemplary process 400 for spectral noise reduction in a voice processing device. The process 400 may be incorporated into an audio input device, such as the headset 16 of FIG. 1. Two or more audio sensors, such as microphones 402, 404, transduce incoming audio into electrical signals. The electrical signals can then be pre-conditioned, as described for example in block 104, digitized using an A/D converter (not shown) into a digital audio format such as PCM, and then formed into a sequence of digital audio frames, which are received by a microphone calibration module 406. The microphone calibration module 406 balances the gains of the microphones 402, 404 to compensate for intrinsic differences in the sensitivities of the individual microphones 402, 404. After this correction, the signal levels from the microphones 402, 404 should more accurately represent the signal strengths actually arriving at the microphones 402, 404. The microphones 402, 404 can alternatively be calibrated during manufacture of the audio input device to obtain the correct amplification factor. If pre-use estimation of the correction factor is not feasible, the microphone calibration module 406 can calibrate the microphones 402, 404 through the use of, for example, an automatic gain matching mechanism.
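  • The gain matching mechanism itself is not spelled out above; as one assumed example, the microphone calibration module 406 could adapt the correction gain slowly during far-field frames, when both microphones should register roughly equal levels.

```python
def update_gain_match(gain, amp_primary, amp_secondary, rate=0.001):
    """During frames classified as far-field, nudge the correction gain for
    the secondary microphone so that gain * amp_secondary approaches
    amp_primary. rate controls how quickly the estimate adapts."""
    if amp_secondary > 0:
        target = amp_primary / amp_secondary
        gain += rate * (target - gain)
    return gain
```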
  • The audio signals output from the microphone calibration module 406 are provided to an echo cancellation module 408. The echo cancellation module 408 can employ conventional echo cancellation algorithms to remove echo from the incoming audio signals. The audio frames output from the echo cancellation module are then provided to a voice activity detection (VAD) module 410, a spatial noise processing module 412, and a proximity detection module 414.
  • The VAD module 410 detects the presence or absence of human speech (voice) in the frames of the incoming audio signals, and outputs one or more flags corresponding to the audio signals, indicating whether voice is currently present in the incoming audio received by the audio input device. The VAD algorithm used by the VAD module 410 can be, for example, any suitable VAD algorithm currently known to those skilled in the art. For example, an energy-based VAD algorithm may be used. This type of VAD algorithm computes signal energy and compares the signal energy level to a threshold to determine voice activity. A zero-crossing count type VAD algorithm may also be used. This type of VAD algorithm determines the presence of voice by counting the number of zero crossings per frame as an input audio signal fluctuates from positive to negative and vice versa. A certain threshold number of zero-crossings may be used to indicate voice activity. Also, pitch estimation and detection algorithms can be used to detect voice activity, as can VAD algorithms that compute formants and/or cepstral coefficients to indicate the presence of voice. Other VAD algorithms, or any suitable combination of the above VAD algorithms, may alternatively or additionally be employed by the VAD module 410.
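  • Two of the simple VAD criteria mentioned above, sketched for a single frame of samples held in a numpy array; the thresholds are placeholders, and a practical detector would smooth decisions over several frames.

```python
import numpy as np

def energy_vad(frame, energy_threshold=1e4):
    """Energy-based VAD: declare voice when the frame energy exceeds a threshold."""
    return float(np.sum(frame.astype(float) ** 2)) > energy_threshold

def zero_crossing_vad(frame, min_zc=5, max_zc=60):
    """Zero-crossing VAD: voiced speech typically shows a moderate number of
    sign changes per frame, unlike silence (few) or broadband noise (many)."""
    signs = np.signbit(frame.astype(float)).astype(int)
    zc = int(np.sum(np.abs(np.diff(signs))))
    return min_zc <= zc <= max_zc
```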
  • The proximity detection module 414 may employ any of the proximity detection methods described in connection with FIGS. 4-6 herein, or any suitable combination thereof, to determine the proximity of an audio source producing sound received by the audio input device. Preferably, the proximity detection method used is the frequency domain method described with reference to FIG. 6. The proximity detection module 414 outputs a near-field flag for each audio frame. Using the preferred frequency-domain proximity detection method, a near-field flag is output for each frequency of interest, per each audio frame.
  • The spatial noise processing module 412 suppresses audio noise in the time domain based on the output flag(s) of the VAD module 410. The audio frames processed are preferably those received from a predefined one of the microphones, e.g., the microphone closer to a user's mouth. If, for example, the VAD flag(s) indicate that an incoming audio frame does not include voice, the spatial noise processing module 412 suppresses the audio frame, otherwise the module 412 passes the audio frame unchanged to a spectral noise reduction (SNR) module 416.
  • The SNR module 416 suppresses background noise in the audio frame based on the VAD flag(s) and the near-field flag(s) received from the VAD module 410 and proximity detection module 414, respectively. If at least one of the VAD flags indicates that voice is contained in a frame, then the SNR module 416 checks whether a near-field flag from the proximity detection module 414 indicates that the audio source is in close proximity to the audio input device. If a VAD flag is not set, then the SNR module 416 is receiving a partially suppressed audio frame from the spatial noise processing module 412, and may perform further processing on the frame. If voice is present, the SNR module 416 transforms the audio frames into the frequency domain. The transformation can be done using any of the transforms described in connection with block 306 of FIG. 6. The SNR module 416 may use the near-field flags from the proximity detection module 414 for each frequency of interest. If the near-field flag is set for a particular frequency, then that frequency component of the frame is not suppressed. If the near-field flag is not set, then the corresponding frequency component of the audio frame is suppressed. Alternatively, the amount of suppression is linked to the near_field_score(n,f) or the near-field probability values, f(u,f). Typically, as the score or probability decreases, stronger suppression is applied. After this processing takes place in the SNR module 416, the SNR module 416 transforms the processed audio frames back to the time domain using an inverse transform. The processed audio frames may then be output as a transmit (Tx) audio signal.
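  • The frame-level decision flow of the SNR module 416 might be sketched as follows, assuming one boolean near-field flag per FFT bin and a simple forward/inverse FFT round trip; the fixed far-field gain is illustrative, and in practice the gain could instead follow Equations 8 and 9 discussed below.

```python
import numpy as np

def snr_process_frame(frame, vad_voice, near_field_flags, far_gain=0.1):
    """Process one time-domain frame: transform to the frequency domain, pass
    bins whose near-field flag is set, attenuate the remaining bins, and
    transform back (a simplified view of SNR module 416).

    near_field_flags: boolean array with one entry per rfft bin of the frame.
    """
    spec = np.fft.rfft(frame)
    if vad_voice:
        gains = np.where(near_field_flags, 1.0, far_gain)
    else:
        gains = np.full(spec.shape, far_gain)   # no voice: treat frame as noise
    return np.fft.irfft(spec * gains, n=len(frame))
```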
  • FIG. 8 is a more detailed process block diagram showing a process 600 of spectral noise reduction that can be incorporated into the SNR module 416.
  • Typically, in a spectral suppression process, the incoming signal is divided into frames of 10 ms. The spectrum of each frame is computed (blocks 606, 608). A decision is made as to whether the given frame contains the desired signal or not. This decision may be a soft one, made independently at each frequency in the spectrum. At the end of this spectrum computation and signal/noise decision, the signal energy, $\sigma_{X(f)}^{2}$, and noise energy, $\sigma_{N(f)}^{2}$, for each frequency f are updated (blocks 606 and 608, respectively). The signal of the current frame is typically attenuated if the frame contains mostly noise. This is done by multiplying the current frame signal by a gain factor, G(f) (block 614). G(f) is usually a function of $\sigma_{X(f)}^{2}$ and $\sigma_{N(f)}^{2}$, with some parameters controlling the aggressiveness of the attenuation. Two commonly used formulae for computing the gain factor are given below:
  • $G(f) = \max\!\left(1 - \alpha\,\dfrac{\sigma_{N(f)}^{2}}{\sigma_{X(f)}^{2}},\ \varepsilon\right)$  Eq. 8
  • $G(f) = \max\!\left(\dfrac{\sigma_{X(f)}^{2}}{\sigma_{X(f)}^{2} + \alpha\,\sigma_{N(f)}^{2}},\ \varepsilon\right)$  Eq. 9
  • Here α and ε are the aggressiveness parameters. Increasing α makes the attenuation more aggressive, while increasing ε makes the attenuation less aggressive.
  • In a typical usage of an audio input device, the desired voice comes from a close distance, while sound arriving from far away is usually noise. Hence, to reduce the background noise, it is desirable to apply more attenuation when the signal is detected to be coming from a distance. This can be done by making G(f) a function of the proximity detection output (block 414) and/or the VAD flag (block 410). In addition, both the VAD 410 and the proximity detection 414 may control the audio and noise signal spectrum estimation, blocks 606 and 608, respectively. For example, when the VAD is ON and the near-field flag is set, the input frame is used to update the audio signal spectrum, but not the noise spectrum.
  • In block 610, the aggressiveness parameters are determined. When the signal is classified as coming from far away, G(f) is reduced by, for example, setting α to a high value and ε to a low value. When the signal is classified as coming from nearby, G(f) is increased by setting α to a low value and ε to a high value. The values α and ε can be made functions of the near_field_score or probability value; typically, α would decrease with increasing near_field_score (or probability) and ε would increase with it. When other forms of G(f) are used, they can be modified similarly, following the principle that G(f) be reduced when the score or probability decreases. After the instantaneous G(f) is computed, the final gain factor is obtained by smoothing G(f) along the frequency axis and over time (block 612).
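  • A sketch of the gain computation of Equations 8 and 9 with the aggressiveness parameters tied to the near-field score, as described for block 610; the mapping from score to α and ε is an assumed linear one, and the parameter ranges are placeholders.

```python
import numpy as np

def spectral_gain(sig_var, noise_var, near_field_score,
                  alpha_range=(1.0, 4.0), eps_range=(0.05, 0.3),
                  use_wiener=True):
    """Compute G(f) per Eq. 8 or Eq. 9. sig_var and noise_var are per-frequency
    estimates of sigma_X(f)^2 and sigma_N(f)^2. A low near_field_score (0..1)
    pushes alpha up and epsilon down, making the attenuation more aggressive."""
    score = np.clip(near_field_score, 0.0, 1.0)
    alpha = alpha_range[1] - score * (alpha_range[1] - alpha_range[0])
    eps = eps_range[0] + score * (eps_range[1] - eps_range[0])
    if use_wiener:                                        # Eq. 9
        g = sig_var / (sig_var + alpha * noise_var + 1e-12)
    else:                                                 # Eq. 8
        g = 1.0 - alpha * noise_var / (sig_var + 1e-12)
    return np.maximum(g, eps)
```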
  • FIG. 9 is a block diagram showing certain components of the exemplary headset 16. The headset 16 is configured to perform audio source proximity estimation and noise suppression, as described herein. The headset 16 includes a wireless interface 700, microphones 402, 404, a processor 704, a memory 706, a microphone pre-processing module 708, audio processing circuit 710, and at least one headphone (HP) speaker 711. The components 700-710 can be coupled together using a digital bus 713.
  • The processor 704 executes software or firmware that is stored in the memory 706 to provide the functionality of the blocks 406-416, and/or the proximity detection methods described in connection with FIGS. 4-6.
  • The processor 704 can be any suitable processor or controller, such as an ARM7, a digital signal processor (DSP), one or more application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), discrete logic, or any suitable combination thereof. Alternatively, the processor 704 may include a multi-processor architecture having a plurality of processors, such as a microprocessor-DSP combination. In an exemplary multi-processor architecture, a DSP can be programmed to provide at least some of the audio processing disclosed herein, such as the functions described for blocks 406-416, and a microprocessor can be programmed to control the overall operation of the audio input device.
  • The memory 706 and processor 704 can be coupled together and communicate on a common bus, such as bus 713. The memory 706 and processor 704 may be integrated onto a single chip, or they may be separate components, or any suitable combination of integrated and discrete components. In addition, other processor-memory architectures may alternatively be used, such as a multi-processor and/or multi-memory arrangement.
  • The memory 706 may be any suitable memory device for storing programming code and/or data contents, such as a flash memory, RAM, ROM, PROM, or the like, or any suitable combination of the foregoing types of memories. Separate memory devices can also be included in the headset 16.
  • The microphone preprocessor 708 is configured to process electronic signals received from the microphones 402, 404. The microphone preprocessor 708 may include an analog-to-digital converter (ADC), amplifiers, and a noise reduction and echo cancellation circuit (NREC) responsive to the microphones 402, 404. The ADC converts analog signals from the microphones into digital signals that are then processed by the NREC. The NREC is employed to reduce undesirable audio artifacts for communications and voice control applications. The microphone preprocessor 708 may be implemented using commercially-available hardware, software, firmware, or any suitable combination thereof.
  • The audio processing circuit 710 includes digital circuitry and/or analog circuitry to additionally process the digitized audio signals that are being output to the headphone speaker(s) 711 after passing through the noise suppression processing of the headset 16. Digital-to-analog (D/A) conversion, filtering, amplification and other audio processing functions can be performed by the audio processing circuit 710.
  • The headphone speaker(s) 711 are any suitable audio transducer(s) for converting the electronic signals output from the audio processing circuit 710 into sound to be heard by a user.
  • The wireless interface 700 permits the headset 16 to wirelessly communicate with other devices, for example, a cellular phone or the like. The wireless interface 700 includes a transceiver 702. The wireless interface 700 provides two-way wireless communications with the handset and other devices, if needed. Preferably, the wireless interface 700 includes a commercially-available Bluetooth module that provides at least a Bluetooth core system consisting of a Bluetooth RF transceiver, baseband processor, protocol stack, as well as hardware and software interfaces for connecting the module to a controller, such as the processor 704, in the headset 16. Although any suitable wireless technology can be employed with the headset 16, the transceiver 702 is preferably a Bluetooth transceiver. The wireless interface 700 may be controlled by the headset controller (e.g., the processor 704).
  • An audio input device may have more than two audio sensors. In cases where three or more audio sensors are used, a near_field_score or probability value, either being referred to as a proximity score, may be computed for each possible pair of audio sensors. The individual pair scores can then be combined to give a final score. For example, if there are three audio sensors, namely 1, 2 and 3, three pair scores can be computed for the three possible pairs. These proximity scores would be score12 for audio sensors 1 and 2, score13 for audio sensors 1 and 3, and score23 for audio sensors 2 and 3. A final score can be obtained by taking the average of the scores, by taking the maximum of the scores, or alternatively, by taking the average of the two largest scores and ignoring the third. Again, G(f) would be reduced when this combined near_field_score is low.
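  • The pair-score combination described above might look like the following sketch; the dictionary keying and the rule names are merely illustrative.

```python
def combine_pair_scores(pair_scores, rule="avg_top2"):
    """Combine pairwise proximity scores, e.g., {('1', '2'): score12, ...},
    into a single near-field score using one of the strategies mentioned
    above: the average, the maximum, or the average of the two largest."""
    scores = sorted(pair_scores.values(), reverse=True)
    if rule == "mean":
        return sum(scores) / len(scores)
    if rule == "max":
        return scores[0]
    if rule == "avg_top2":
        top = scores[:2]
        return sum(top) / len(top)
    raise ValueError("unknown combination rule")
```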
  • An example of an audio signal processed in accordance with the techniques disclosed herein is shown in FIG. 10. FIG. 10 shows graphs 800, 802, 804 depicting exemplary background noise suppression. Graph 800 shows a trace of a raw input audio signal from an audio sensor. The graphs 800-804 cover a first time interval 806, when the audio signal comprises a mix of human speech and noise, and a second time interval 808, when the audio signal includes only background noise, without any speech. Graph 802 depicts the value of the near-field flag during the intervals 806, 808. The near-field flag can be generated by any of the audio source proximity detection methods described herein in connection with FIGS. 4-6. As shown in the example graph 802, the near-field flag is set during the first interval 806, when a near-field source, such as a human speaking, is detected. The flag is not set in the second interval 808, when only background noise from a distant audio source is present.
  • The graph 804 shows the output audio signal after noise suppression is applied according to the near-field flag. When the near-field flag is set in interval 806, no or limited noise suppression is applied to the audio signal. When the flag is not set in interval 808, the background noise, as shown in graph 800, is reduced by, for example, the SNR module 416, to smaller levels, as shown in graph 804. In the last graph 804, the background noise is suppressed when the proximity information (e.g., the near-field flag) corresponding to the audio signal is employed by a noise reduction module.
  • The principles disclosed herein may be applied to other devices, such as other wireless devices including cellular phones, PDAs, personal computers, stereo systems, video games and the like. Also, the principles disclosed herein may be applied to wired headsets, where the communications link between the headset and another device is a wire, rather than a wireless link. In addition, the various components and/or method steps/blocks may be implemented in arrangements other than those specifically disclosed without departing from the scope of the claims.
  • The functionality of the systems, devices, headsets and their respective components, as well as the method steps and blocks described herein, may be implemented in hardware, software, firmware, or any suitable combination thereof. The software/firmware may be a program having sets of instructions (e.g., code segments) executable by one or more digital circuits, such as microprocessors, DSPs, embedded controllers, or intellectual property (IP) cores. If implemented in software/firmware, the functions may be stored on or transmitted over as instructions or code on one or more computer-readable media. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • Certain embodiments have been described. However, various modifications to these embodiments are possible, and the principles presented herein may thus be applied to other embodiments as well. Thus, other embodiments and modifications will occur readily to those of ordinary skill in the art in view of these teachings. Therefore, the following claims are intended to cover all such embodiments and modifications when viewed in conjunction with the above specification and accompanying drawings.

Claims (38)

1. A method of determining proximity of an audio source, comprising:
transforming a plurality of audio signals from a plurality of sensors to frequency domain;
determining amplitudes of the transformed audio signals;
comparing the amplitudes; and
determining the proximity of the audio source based on the comparison of the amplitudes.
2. The method of claim 1, further comprising:
beamforming each of the audio signals.
3. The method of claim 1, further comprising:
beamforming each of the transformed audio signals.
4. The method of claim 1, further comprising:
band-pass filtering each of the audio signals.
5. The method of claim 1, further comprising:
amplifying each of the audio signals by a respective correction factor.
6. The method of claim 1, wherein transforming includes applying an FFT, DCT, DFT, wavelet transformation, or any suitable combination of the foregoing transformations to the audio signals.
7. The method of claim 1, further comprising:
dividing the audio signals into a plurality of frames;
determining an amplitude of each transformed audio signal for each of the frames, whereby producing the amplitudes;
smoothing the amplitudes over the frames;
comparing the smoothed amplitudes of the transformed audio signals to one another to produce at least one differential signal; and
determining the proximity of the audio source based on the differential signal.
8. The method of claim 7, further comprising:
normalizing the differential signal.
9. The method of claim 7, further comprising:
converting the smoothed amplitudes to a log domain.
10. The method of claim 7, further comprising:
applying a non-linear function to the differential signal to produce a proximity score.
11. The method of claim 10, wherein the non-linear function is a sigmoid function.
12. A method of determining proximity of an audio source, comprising:
receiving a plurality of audio signals from a plurality of sensors;
transforming the audio signals to frequency domain;
determining amplitudes of the transformed audio signals at a plurality of frequencies;
for each of the frequencies, determining a differential signal by comparing the amplitudes corresponding to the frequency, whereby determining a plurality of differential signals; and
determining the proximity of the audio source based on the differential signals.
13. The method of claim 12, further comprising:
comparing the differential signal at each of the frequencies to a predetermined threshold; and
determining a near-field flag at each of the frequencies, based on the comparison of the differential signal and the predetermined threshold for the frequency.
14. The method of claim 12, further comprising:
dividing the audio signals into a plurality of frames; and
determining the amplitudes for each of the frames.
15. The method of claim 14, further comprising:
smoothing the amplitudes over the frames.
16. The method of claim 15, further comprising:
converting the smoothed amplitudes to a log domain.
17. The method of claim 12, further comprising:
normalizing the differential signals to determine a proximity score at each of the frequencies.
18. The method of claim 12, further comprising:
applying a non-linear function to the differential signals to produce a proximity score at each of the frequencies.
19. An apparatus, comprising:
a plurality of audio sensors outputting a plurality of audio signals in response to an audio source; and
a proximity detection module configured to transform the audio signals to frequency domain and to determine the proximity of the audio source by comparing amplitudes of the transformed audio signals.
20. The apparatus of claim 19, wherein the apparatus is a headset.
21. The apparatus of claim 20, wherein the headset is a wireless headset.
22. The apparatus of claim 19, further comprising:
a noise reduction/suppression module responsive to output from the proximity detection module.
23. The apparatus of claim 22, wherein the noise reduction/suppression module is configured to estimate an audio signal spectrum and a noise signal spectrum.
24. The apparatus of claim 19, further comprising:
a microphone calibration module.
25. The apparatus of claim 19, further comprising:
a voice activity detection (VAD) module.
26. The apparatus of claim 19, further comprising:
an echo cancellation module.
27. An apparatus, comprising:
means for transforming a plurality of audio signals from a plurality of sensors to frequency domain;
means for determining amplitudes of the transformed audio signals;
means for comparing the amplitudes; and
means for determining the proximity of the audio source based on the comparison of the amplitudes.
28. A computer-readable medium embodying a set of instructions executable by one or more processors, comprising:
code for transforming a plurality of audio signals from a plurality of sensors to frequency domain;
code for determining amplitudes of the transformed audio signals;
code for comparing the amplitudes; and
code for determining the proximity of the audio source based on the comparison of the amplitudes.
29. The computer-readable medium of claim 28, further comprising:
code for noise reduction/suppression.
30. The computer-readable medium of claim 29, further comprising:
code for estimating an audio signal spectrum and a noise signal spectrum.
31. The computer-readable medium of claim 28, further comprising:
code for voice activity detection.
32. A method of determining proximity of an audio source, comprising:
receiving a plurality of audio signals from a plurality of sensors;
beamforming the audio signals;
determining amplitudes of the beamformed audio signals;
comparing the amplitudes; and
determining the proximity of the audio source based on the comparison of the amplitudes.
33. The method of claim 32, wherein determining comprises:
computing a near field score; and
determining the proximity of the audio source based on the near field score.
34. The method of claim 32, wherein determining comprises:
computing a near field probability value; and
determining the proximity of the audio source based on the near field probability value.
35. The method of claim 32, further comprising:
amplifying each of the audio signals by a respective correction factor.
36. The method of claim 32, further comprising:
dividing the audio signals into a plurality of frames;
determining an amplitude of each beamformed audio signal for each of the frames, whereby producing the amplitudes;
smoothing the amplitudes over the frames;
comparing the smoothed amplitudes of the beamformed audio signals to one another to produce at least one differential signal; and
determining the proximity of the audio source based on the differential signal.
37. The method of claim 36, further comprising:
normalizing the differential signal.
38. The method of claim 36, further comprising:
applying a non-linear function to the differential signal to produce a proximity score.
US12/603,824 2008-10-24 2009-10-22 Audio source proximity estimation using sensor array for noise reduction Active 2030-10-15 US8218397B2 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
US12/603,824 US8218397B2 (en) 2008-10-24 2009-10-22 Audio source proximity estimation using sensor array for noise reduction
JP2011533361A JP5551176B2 (en) 2008-10-24 2009-10-23 Audio source proximity estimation using sensor array for noise reduction
CN200980142292XA CN102197422B (en) 2008-10-24 2009-10-23 Audio source proximity estimation using sensor array for noise reduction
PCT/US2009/061807 WO2010048490A1 (en) 2008-10-24 2009-10-23 Audio source proximity estimation using sensor array for noise reduction
KR1020117011581A KR101260131B1 (en) 2008-10-24 2009-10-23 Audio source proximity estimation using sensor array for noise reduction
EP09748604A EP2353159B1 (en) 2008-10-24 2009-10-23 Audio source proximity estimation using sensor array for noise reduction
TW098136052A TW201042634A (en) 2008-10-24 2009-10-23 Audio source proximity estimation using sensor array for noise reduction

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10841308P 2008-10-24 2008-10-24
US12/603,824 US8218397B2 (en) 2008-10-24 2009-10-22 Audio source proximity estimation using sensor array for noise reduction

Publications (2)

Publication Number Publication Date
US20100103776A1 true US20100103776A1 (en) 2010-04-29
US8218397B2 US8218397B2 (en) 2012-07-10

Family

ID=42117378

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/603,824 Active 2030-10-15 US8218397B2 (en) 2008-10-24 2009-10-22 Audio source proximity estimation using sensor array for noise reduction

Country Status (7)

Country Link
US (1) US8218397B2 (en)
EP (1) EP2353159B1 (en)
JP (1) JP5551176B2 (en)
KR (1) KR101260131B1 (en)
CN (1) CN102197422B (en)
TW (1) TW201042634A (en)
WO (1) WO2010048490A1 (en)

US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9799330B2 (en) 2014-08-28 2017-10-24 Knowles Electronics, Llc Multi-sourced noise suppression
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US9858948B2 (en) 2015-09-29 2018-01-02 Apple Inc. Electronic equipment with ambient noise sensing input circuitry
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
WO2018101868A1 (en) * 2016-12-02 2018-06-07 Dirac Research Ab Processing of an audio input signal
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US10148801B2 (en) 2015-10-20 2018-12-04 Huawei Technologies Co., Ltd. Method and apparatus for controlling multi-microphone noise-canceling sound pickup range of terminal
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10257240B2 (en) * 2014-11-18 2019-04-09 Cisco Technology, Inc. Online meeting computer with improved noise management logic
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10304475B1 (en) * 2017-08-14 2019-05-28 Amazon Technologies, Inc. Trigger word based beam selection
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US10332518B2 (en) 2017-05-09 2019-06-25 Apple Inc. User interface for correcting recognition errors
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
US10402151B2 (en) 2011-07-28 2019-09-03 Apple Inc. Devices with enhanced audio
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
US10403283B1 (en) 2018-06-01 2019-09-03 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10496705B1 (en) 2018-06-03 2019-12-03 Apple Inc. Accelerated task performance
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
US10643611B2 (en) 2008-10-02 2020-05-05 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10684703B2 (en) 2018-06-01 2020-06-16 Apple Inc. Attention aware virtual assistant dismissal
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
US10748546B2 (en) 2017-05-16 2020-08-18 Apple Inc. Digital assistant services based on device capabilities
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
CN111667842A (en) * 2020-06-10 2020-09-15 北京达佳互联信息技术有限公司 Audio signal processing method and device
US10789945B2 (en) 2017-05-12 2020-09-29 Apple Inc. Low-latency intelligent automated assistant
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US10915709B2 (en) * 2016-04-28 2021-02-09 Masoud Amri Voice-controlled system
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
US11010127B2 (en) 2015-06-29 2021-05-18 Apple Inc. Virtual assistant for media playback
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US11023513B2 (en) 2007-12-20 2021-06-01 Apple Inc. Method and apparatus for searching using an active ontology
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
US11217251B2 (en) 2019-05-06 2022-01-04 Apple Inc. Spoken notifications
US11227589B2 (en) 2016-06-06 2022-01-18 Apple Inc. Intelligent list reading
US11231904B2 (en) 2015-03-06 2022-01-25 Apple Inc. Reducing response latency of intelligent automated assistants
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11269678B2 (en) 2012-05-15 2022-03-08 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US11281993B2 (en) 2016-12-05 2022-03-22 Apple Inc. Model and ensemble compression for metric learning
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11314370B2 (en) 2013-12-06 2022-04-26 Apple Inc. Method for extracting salient dialog usage from live data
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
US11386266B2 (en) 2018-06-01 2022-07-12 Apple Inc. Text correction
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US11468282B2 (en) 2015-05-15 2022-10-11 Apple Inc. Virtual assistant in a communication session
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
US11495218B2 (en) 2018-06-01 2022-11-08 Apple Inc. Virtual assistant operation in multi-device environments
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
US11532306B2 (en) 2017-05-16 2022-12-20 Apple Inc. Detecting a trigger of a digital assistant
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
US11657813B2 (en) 2019-05-31 2023-05-23 Apple Inc. Voice identification in digital assistant systems
US11798547B2 (en) 2013-03-15 2023-10-24 Apple Inc. Voice activated device for use with a voice-based digital assistant

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8965033B2 (en) 2012-08-31 2015-02-24 Sonos, Inc. Acoustic optimization
US9460590B2 (en) 2012-09-24 2016-10-04 Wal-Mart Stores, Inc. Determination of customer proximity to a register through use of sound and methods thereof
US9820033B2 (en) 2012-09-28 2017-11-14 Apple Inc. Speaker assembly
US8858271B2 (en) 2012-10-18 2014-10-14 Apple Inc. Speaker interconnect
US9357299B2 (en) 2012-11-16 2016-05-31 Apple Inc. Active protection for acoustic device
US20140272209A1 (en) 2013-03-13 2014-09-18 Apple Inc. Textile product having reduced density
CN104102337A (en) * 2013-04-08 2014-10-15 普雷森株式会社 Method and apparatus for determining proximity level between user and electronic device
RU2536343C2 (en) * 2013-04-15 2014-12-20 Открытое акционерное общество "Концерн "Созвездие" Method of picking up speech signal in presence of interference and apparatus therefor
CN105378826B (en) 2013-05-31 2019-06-11 诺基亚技术有限公司 Audio scene device
US9451354B2 (en) 2014-05-12 2016-09-20 Apple Inc. Liquid expulsion from an orifice
CN103987000A (en) * 2014-05-28 2014-08-13 深圳市金立通信设备有限公司 Audio frequency correction method and terminal
CN103987001A (en) * 2014-05-28 2014-08-13 深圳市金立通信设备有限公司 Audio correcting method and device
US20160057597A1 (en) * 2014-08-25 2016-02-25 Telecommunication Systems, Inc. Audio emergency beacon
GB2538853B (en) 2015-04-09 2018-09-19 Dolby Laboratories Licensing Corp Switching to a second audio interface between a computer apparatus and an audio apparatus
US9847093B2 (en) 2015-06-19 2017-12-19 Samsung Electronics Co., Ltd. Method and apparatus for processing speech signal
CN106328151B (en) * 2015-06-30 2020-01-31 芋头科技(杭州)有限公司 ring noise eliminating system and application method thereof
CN106328154B (en) * 2015-06-30 2019-09-17 芋头科技(杭州)有限公司 A kind of front audio processing system
US9900698B2 (en) 2015-06-30 2018-02-20 Apple Inc. Graphene composite acoustic diaphragm
KR101731714B1 (en) * 2015-08-13 2017-04-28 중소기업은행 Method and headset for improving sound quality
US20180210704A1 (en) * 2017-01-26 2018-07-26 Wal-Mart Stores, Inc. Shopping Cart and Associated Systems and Methods
KR101893768B1 (en) * 2017-02-27 2018-09-04 주식회사 브이터치 Method, system and non-transitory computer-readable recording medium for providing speech recognition trigger
US10395667B2 (en) * 2017-05-12 2019-08-27 Cirrus Logic, Inc. Correlation-based near-field detector
EP3425923A1 (en) * 2017-07-06 2019-01-09 GN Audio A/S Headset with reduction of ambient noise
US11307661B2 (en) 2017-09-25 2022-04-19 Apple Inc. Electronic device with actuators for producing haptic and audio output along a device housing
CN108391190B (en) * 2018-01-30 2019-09-20 努比亚技术有限公司 A kind of noise-reduction method, earphone and computer readable storage medium
US10873798B1 (en) 2018-06-11 2020-12-22 Apple Inc. Detecting through-body inputs at a wearable audio device
US10757491B1 (en) 2018-06-11 2020-08-25 Apple Inc. Wearable interactive audio device
US11334032B2 (en) 2018-08-30 2022-05-17 Apple Inc. Electronic watch with barometric vent
US11561144B1 (en) 2018-09-27 2023-01-24 Apple Inc. Wearable electronic device with fluid-based pressure sensing
CN109841214B (en) * 2018-12-25 2021-06-01 百度在线网络技术(北京)有限公司 Voice wakeup processing method and device and storage medium
CN114399013A (en) 2019-04-17 2022-04-26 苹果公司 Wireless locatable tag
US11217262B2 (en) * 2019-11-18 2022-01-04 Google Llc Adaptive energy limiting for transient noise suppression

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5917921A (en) * 1991-12-06 1999-06-29 Sony Corporation Noise reducing microphone apparatus
US6061456A (en) * 1992-10-29 2000-05-09 Andrea Electronics Corporation Noise cancellation apparatus
US6549630B1 (en) * 2000-02-04 2003-04-15 Plantronics, Inc. Signal expander with discrimination between close and distant acoustic source
US7221622B2 (en) * 2003-01-22 2007-05-22 Fujitsu Limited Speaker distance detection apparatus using microphone array and speech input/output apparatus
US20070154031A1 (en) * 2006-01-05 2007-07-05 Audience, Inc. System and method for utilizing inter-microphone level differences for speech enhancement
US20080019548A1 (en) * 2006-01-30 2008-01-24 Audience, Inc. System and method for utilizing omni-directional microphones for speech enhancement
US20080152167A1 (en) * 2006-12-22 2008-06-26 Step Communications Corporation Near-field vector signal enhancement
US20080317260A1 (en) * 2007-06-21 2008-12-25 Short William R Sound discrimination method and apparatus
WO2010048490A1 (en) * 2008-10-24 2010-04-29 Qualcomm Incorporated Audio source proximity estimation using sensor array for noise reduction

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63262577A (en) * 1987-04-20 1988-10-28 Sony Corp Microphone apparatus
JPH07135694A (en) * 1993-11-11 1995-05-23 Matsushita Electric Ind Co Ltd Microphone
JP2002218583A (en) * 2001-01-17 2002-08-02 Sony Corp Sound field synthesis arithmetic method and device
AU2003242921A1 (en) * 2002-07-01 2004-01-19 Koninklijke Philips Electronics N.V. Stationary spectral power dependent audio enhancement system
US7499686B2 (en) 2004-02-24 2009-03-03 Microsoft Corporation Method and apparatus for multi-sensory speech enhancement on a mobile device
JP2005303574A (en) * 2004-04-09 2005-10-27 Toshiba Corp Voice recognition headset
US7970150B2 (en) 2005-04-29 2011-06-28 Lifesize Communications, Inc. Tracking talkers using virtual broadside scan and directed beams
EP1830348B1 (en) * 2006-03-01 2016-09-28 Nuance Communications, Inc. Hands-free system for speech signal acquisition
US20080175408A1 (en) 2007-01-20 2008-07-24 Shridhar Mukund Proximity filter
DE112007003716T5 (en) * 2007-11-26 2011-01-13 Fujitsu Ltd., Kawasaki Sound processing device, correction device, correction method and computer program

Cited By (325)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US11928604B2 (en) 2005-09-08 2024-03-12 Apple Inc. Method and apparatus for building an intelligent automated assistant
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US8942986B2 (en) 2006-09-08 2015-01-27 Apple Inc. Determining user intent based on ontologies of domains
US9117447B2 (en) 2006-09-08 2015-08-25 Apple Inc. Using event alert text as input to an automated assistant
US8930191B2 (en) 2006-09-08 2015-01-06 Apple Inc. Paraphrasing of user requests and results by automated digital assistant
US11012942B2 (en) 2007-04-03 2021-05-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US11023513B2 (en) 2007-12-20 2021-06-01 Apple Inc. Method and apparatus for searching using an active ontology
US10381016B2 (en) 2008-01-03 2019-08-13 Apple Inc. Methods and apparatus for altering audio output signals
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9865248B2 (en) 2008-04-05 2018-01-09 Apple Inc. Intelligent text-to-speech conversion
US10108612B2 (en) 2008-07-31 2018-10-23 Apple Inc. Mobile device having human language translation capability with positional feedback
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US10643611B2 (en) 2008-10-02 2020-05-05 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US8218397B2 (en) * 2008-10-24 2012-07-10 Qualcomm Incorporated Audio source proximity estimation using sensor array for noise reduction
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US10795541B2 (en) 2009-06-05 2020-10-06 Apple Inc. Intelligent organization of tasks items
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US11080012B2 (en) 2009-06-05 2021-08-03 Apple Inc. Interface for a virtual digital assistant
US10475446B2 (en) 2009-06-05 2019-11-12 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US20110075858A1 (en) * 2009-09-09 2011-03-31 Sony Corporation Information processing apparatus, information processing method, and program
US8848941B2 (en) * 2009-09-09 2014-09-30 Sony Corporation Information processing apparatus, information processing method, and program
US20110161074A1 (en) * 2009-12-29 2011-06-30 Apple Inc. Remote conferencing center
US8560309B2 (en) 2009-12-29 2013-10-15 Apple Inc. Remote conferencing center
US10706841B2 (en) 2010-01-18 2020-07-07 Apple Inc. Task flow identification based on user intent
US8903716B2 (en) 2010-01-18 2014-12-02 Apple Inc. Personalized vocabulary for digital assistant
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10741185B2 (en) 2010-01-18 2020-08-11 Apple Inc. Intelligent automated assistant
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US9548050B2 (en) 2010-01-18 2017-01-17 Apple Inc. Intelligent automated assistant
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US10049675B2 (en) 2010-02-25 2018-08-14 Apple Inc. User profiling for voice input processing
US10692504B2 (en) 2010-02-25 2020-06-23 Apple Inc. User profiling for voice input processing
US8988970B2 (en) * 2010-03-12 2015-03-24 University Of Maryland Method and system for dereverberation of signals propagating in reverberative environments
US20110222372A1 (en) * 2010-03-12 2011-09-15 University Of Maryland Method and system for dereverberation of signals propagating in reverberative environments
US9502048B2 (en) 2010-04-19 2016-11-22 Knowles Electronics, Llc Adaptively reducing noise to limit speech distortion
US9343056B1 (en) 2010-04-27 2016-05-17 Knowles Electronics, Llc Wind noise detection and suppression
US9438992B2 (en) 2010-04-29 2016-09-06 Knowles Electronics, Llc Multi-microphone robust noise suppression
US8452037B2 (en) 2010-05-05 2013-05-28 Apple Inc. Speaker clip
US9558755B1 (en) 2010-05-20 2017-01-31 Knowles Electronics, Llc Noise suppression assisted automatic speech recognition
WO2011149969A2 (en) * 2010-05-27 2011-12-01 Ikoa Corporation Separating voice from noise using a network of proximity filters
WO2011149969A3 (en) * 2010-05-27 2012-04-05 Ikoa Corporation Separating voice from noise using a network of proximity filters
US9431023B2 (en) 2010-07-12 2016-08-30 Knowles Electronics, Llc Monaural noise suppression based on computational auditory scene analysis
WO2012009047A1 (en) * 2010-07-12 2012-01-19 Audience, Inc. Monaural noise suppression based on computational auditory scene analysis
US8447596B2 (en) 2010-07-12 2013-05-21 Audience, Inc. Monaural noise suppression based on computational auditory scene analysis
US8644519B2 (en) 2010-09-30 2014-02-04 Apple Inc. Electronic devices with improved audio
US8666082B2 (en) 2010-11-16 2014-03-04 Lsi Corporation Utilizing information from a number of sensors to suppress acoustic noise through an audio processing system
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10154342B2 (en) * 2011-02-10 2018-12-11 Dolby International Ab Spatial adaptation in multi-microphone sound capture
US20170078791A1 (en) * 2011-02-10 2017-03-16 Dolby International Ab Spatial adaptation in multi-microphone sound capture
US10417405B2 (en) 2011-03-21 2019-09-17 Apple Inc. Device access using voice authentication
US10102359B2 (en) 2011-03-21 2018-10-16 Apple Inc. Device access using voice authentication
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US8811648B2 (en) 2011-03-31 2014-08-19 Apple Inc. Moving magnet audio transducer
US8811601B2 (en) 2011-04-04 2014-08-19 Qualcomm Incorporated Integrated echo cancellation and noise suppression
US9007871B2 (en) 2011-04-18 2015-04-14 Apple Inc. Passive proximity detection
US9674625B2 (en) 2011-04-18 2017-06-06 Apple Inc. Passive proximity detection
US11350253B2 (en) 2011-06-03 2022-05-31 Apple Inc. Active transport based notifications
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US9269367B2 (en) * 2011-07-05 2016-02-23 Skype Limited Processing audio signals during a communication event
US20130013303A1 (en) * 2011-07-05 2013-01-10 Skype Limited Processing Audio Signals
US10402151B2 (en) 2011-07-28 2019-09-03 Apple Inc. Devices with enhanced audio
US10771742B1 (en) 2011-07-28 2020-09-08 Apple Inc. Devices with enhanced audio
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US8989428B2 (en) 2011-08-31 2015-03-24 Apple Inc. Acoustic systems in electronic devices
US9042574B2 (en) 2011-09-30 2015-05-26 Skype Processing audio signals
US9031257B2 (en) 2011-09-30 2015-05-12 Skype Processing signals
US9042573B2 (en) 2011-09-30 2015-05-26 Skype Processing signals
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US20130158711A1 (en) * 2011-10-28 2013-06-20 University Of Washington Through Its Center For Commercialization Acoustic proximity sensing
US9199380B2 (en) * 2011-10-28 2015-12-01 University Of Washington Through Its Center For Commercialization Acoustic proximity sensing
US9210504B2 (en) 2011-11-18 2015-12-08 Skype Processing audio signals
US8879761B2 (en) 2011-11-22 2014-11-04 Apple Inc. Orientation-based audio
US10284951B2 (en) 2011-11-22 2019-05-07 Apple Inc. Orientation-based audio
US9111543B2 (en) 2011-11-25 2015-08-18 Skype Processing signals
US8903108B2 (en) 2011-12-06 2014-12-02 Apple Inc. Near-field null and beamforming
US9020163B2 (en) 2011-12-06 2015-04-28 Apple Inc. Near-field null and beamforming
US9042575B2 (en) 2011-12-08 2015-05-26 Skype Processing audio signals
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US11069336B2 (en) 2012-03-02 2021-07-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US11269678B2 (en) 2012-05-15 2022-03-08 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US9183844B2 (en) * 2012-05-22 2015-11-10 Harris Corporation Near-field noise cancellation
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US20130344924A1 (en) * 2012-06-21 2013-12-26 Michael Sorensen Headset System With A Headset Unit And A Detachable Wearing Device
US8818467B2 (en) * 2012-06-21 2014-08-26 Gn Netcom A/S Headset system with a headset unit and a detachable wearing device
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US9640194B1 (en) 2012-10-04 2017-05-02 Knowles Electronics, Llc Noise suppression for speech processing based on machine-learning mask estimation
US9424859B2 (en) * 2012-11-21 2016-08-23 Harman International Industries Canada Ltd. System to control audio effect parameters of vocal signals
US20140142927A1 (en) * 2012-11-21 2014-05-22 Harman International Industries Canada Ltd. System to control audio effect parameters of vocal signals
US8942410B2 (en) 2012-12-31 2015-01-27 Apple Inc. Magnetically biased electromagnet for audio applications
US10714117B2 (en) 2013-02-07 2020-07-14 Apple Inc. Voice trigger for a digital assistant
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US20140241702A1 (en) * 2013-02-25 2014-08-28 Ludger Solbach Dynamic audio perspective change during video playback
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US11798547B2 (en) 2013-03-15 2023-10-24 Apple Inc. Voice activated device for use with a voice-based digital assistant
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9966060B2 (en) 2013-06-07 2018-05-08 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US10657961B2 (en) 2013-06-08 2020-05-19 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US11048473B2 (en) 2013-06-09 2021-06-29 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US11727219B2 (en) 2013-06-09 2023-08-15 Apple Inc. System and method for inferring user intent from speech inputs
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10769385B2 (en) 2013-06-09 2020-09-08 Apple Inc. System and method for inferring user intent from speech inputs
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US9392353B2 (en) * 2013-10-18 2016-07-12 Plantronics, Inc. Headset interview mode
US20150112671A1 (en) * 2013-10-18 2015-04-23 Plantronics, Inc. Headset Interview Mode
US11314370B2 (en) 2013-12-06 2022-04-26 Apple Inc. Method for extracting salient dialog usage from live data
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US10497365B2 (en) 2014-05-30 2019-12-03 Apple Inc. Multi-command single utterance input method
US10699717B2 (en) 2014-05-30 2020-06-30 Apple Inc. Intelligent assistant for home automation
US10878809B2 (en) 2014-05-30 2020-12-29 Apple Inc. Multi-command single utterance input method
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10657966B2 (en) 2014-05-30 2020-05-19 Apple Inc. Better resolution when referencing to concepts
US10417344B2 (en) 2014-05-30 2019-09-17 Apple Inc. Exemplar-based natural language processing
US10083690B2 (en) 2014-05-30 2018-09-25 Apple Inc. Better resolution when referencing to concepts
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US10714095B2 (en) 2014-05-30 2020-07-14 Apple Inc. Intelligent assistant for home automation
US10169329B2 (en) 2014-05-30 2019-01-01 Apple Inc. Exemplar-based natural language processing
US9668024B2 (en) 2014-06-30 2017-05-30 Apple Inc. Intelligent automated assistant for TV user interactions
US10904611B2 (en) 2014-06-30 2021-01-26 Apple Inc. Intelligent automated assistant for TV user interactions
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US9799330B2 (en) 2014-08-28 2017-10-24 Knowles Electronics, Llc Multi-sourced noise suppression
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10431204B2 (en) 2014-09-11 2019-10-01 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US9606986B2 (en) 2014-09-29 2017-03-28 Apple Inc. Integrated word N-gram and class M-gram language models
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US9986419B2 (en) 2014-09-30 2018-05-29 Apple Inc. Social reminders
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10453443B2 (en) 2014-09-30 2019-10-22 Apple Inc. Providing an indication of the suitability of speech recognition
US10390213B2 (en) 2014-09-30 2019-08-20 Apple Inc. Social reminders
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US10438595B2 (en) 2014-09-30 2019-10-08 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10257240B2 (en) * 2014-11-18 2019-04-09 Cisco Technology, Inc. Online meeting computer with improved noise management logic
US10362403B2 (en) 2014-11-24 2019-07-23 Apple Inc. Mechanically actuated panel acoustic system
US9525943B2 (en) 2014-11-24 2016-12-20 Apple Inc. Mechanically actuated panel acoustic system
US11556230B2 (en) 2014-12-02 2023-01-17 Apple Inc. Data detection
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US11231904B2 (en) 2015-03-06 2022-01-25 Apple Inc. Reducing response latency of intelligent automated assistants
US10529332B2 (en) 2015-03-08 2020-01-07 Apple Inc. Virtual assistant activation
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US10930282B2 (en) 2015-03-08 2021-02-23 Apple Inc. Competing devices responding to voice triggers
US10311871B2 (en) 2015-03-08 2019-06-04 Apple Inc. Competing devices responding to voice triggers
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
WO2016176329A1 (en) * 2015-04-28 2016-11-03 Dolby Laboratories Licensing Corporation Impulsive noise suppression
US10319391B2 (en) 2015-04-28 2019-06-11 Dolby Laboratories Licensing Corporation Impulsive noise suppression
US11468282B2 (en) 2015-05-15 2022-10-11 Apple Inc. Virtual assistant in a communication session
US11127397B2 (en) 2015-05-27 2021-09-21 Apple Inc. Device voice control
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10681212B2 (en) 2015-06-05 2020-06-09 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US11010127B2 (en) 2015-06-29 2021-05-18 Apple Inc. Virtual assistant for media playback
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US11126400B2 (en) 2015-09-08 2021-09-21 Apple Inc. Zero latency digital assistant
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US9858948B2 (en) 2015-09-29 2018-01-02 Apple Inc. Electronic equipment with ambient noise sensing input circuitry
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10148801B2 (en) 2015-10-20 2018-12-04 Huawei Technologies Co., Ltd. Method and apparatus for controlling multi-microphone noise-canceling sound pickup range of terminal
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US10354652B2 (en) 2015-12-02 2019-07-16 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10942703B2 (en) 2015-12-23 2021-03-09 Apple Inc. Proactive assistance based on dialog communication between devices
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US10915709B2 (en) * 2016-04-28 2021-02-09 Masoud Amri Voice-controlled system
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US11227589B2 (en) 2016-06-06 2022-01-18 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US11069347B2 (en) 2016-06-08 2021-07-20 Apple Inc. Intelligent automated assistant for media exploration
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10580409B2 (en) 2016-06-11 2020-03-03 Apple Inc. Application integration with a digital assistant
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US10942702B2 (en) 2016-06-11 2021-03-09 Apple Inc. Intelligent device arbitration and control
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10553215B2 (en) 2016-09-23 2020-02-04 Apple Inc. Intelligent automated assistant
WO2018101868A1 (en) * 2016-12-02 2018-06-07 Dirac Research Ab Processing of an audio input signal
US10638227B2 (en) 2016-12-02 2020-04-28 Dirac Research Ab Processing of an audio input signal
CN110062945A (en) * 2016-12-02 2019-07-26 迪拉克研究公司 The processing of audio input signal
US11281993B2 (en) 2016-12-05 2022-03-22 Apple Inc. Model and ensemble compression for metric learning
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
US11656884B2 (en) 2017-01-09 2023-05-23 Apple Inc. Application integration with a digital assistant
US10741181B2 (en) 2017-05-09 2020-08-11 Apple Inc. User interface for correcting recognition errors
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
US10332518B2 (en) 2017-05-09 2019-06-25 Apple Inc. User interface for correcting recognition errors
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US11599331B2 (en) 2017-05-11 2023-03-07 Apple Inc. Maintaining privacy of personal information
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
US10847142B2 (en) 2017-05-11 2020-11-24 Apple Inc. Maintaining privacy of personal information
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10789945B2 (en) 2017-05-12 2020-09-29 Apple Inc. Low-latency intelligent automated assistant
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US11380310B2 (en) 2017-05-12 2022-07-05 Apple Inc. Low-latency intelligent automated assistant
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
US10909171B2 (en) 2017-05-16 2021-02-02 Apple Inc. Intelligent automated assistant for media exploration
US10748546B2 (en) 2017-05-16 2020-08-18 Apple Inc. Digital assistant services based on device capabilities
US11532306B2 (en) 2017-05-16 2022-12-20 Apple Inc. Detecting a trigger of a digital assistant
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US10304475B1 (en) * 2017-08-14 2019-05-28 Amazon Technologies, Inc. Trigger word based beam selection
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US11710482B2 (en) 2018-03-26 2023-07-25 Apple Inc. Natural assistant interaction
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US11169616B2 (en) 2018-05-07 2021-11-09 Apple Inc. Raise to speak
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11854539B2 (en) 2018-05-07 2023-12-26 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
US11495218B2 (en) 2018-06-01 2022-11-08 Apple Inc. Virtual assistant operation in multi-device environments
US10684703B2 (en) 2018-06-01 2020-06-16 Apple Inc. Attention aware virtual assistant dismissal
US10403283B1 (en) 2018-06-01 2019-09-03 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10984798B2 (en) 2018-06-01 2021-04-20 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11386266B2 (en) 2018-06-01 2022-07-12 Apple Inc. Text correction
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
US11009970B2 (en) 2018-06-01 2021-05-18 Apple Inc. Attention aware virtual assistant dismissal
US11431642B2 (en) 2018-06-01 2022-08-30 Apple Inc. Variable latency device coordination
US10720160B2 (en) 2018-06-01 2020-07-21 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10504518B1 (en) 2018-06-03 2019-12-10 Apple Inc. Accelerated task performance
US10496705B1 (en) 2018-06-03 2019-12-03 Apple Inc. Accelerated task performance
US10944859B2 (en) 2018-06-03 2021-03-09 Apple Inc. Accelerated task performance
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
US11217251B2 (en) 2019-05-06 2022-01-04 Apple Inc. Spoken notifications
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
US11657813B2 (en) 2019-05-31 2023-05-23 Apple Inc. Voice identification in digital assistant systems
US11360739B2 (en) 2019-05-31 2022-06-14 Apple Inc. User activity shortcut suggestions
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
CN111667842A (en) * 2020-06-10 2020-09-15 北京达佳互联信息技术有限公司 Audio signal processing method and device

Also Published As

Publication number Publication date
EP2353159B1 (en) 2013-03-27
US8218397B2 (en) 2012-07-10
CN102197422A (en) 2011-09-21
CN102197422B (en) 2013-12-18
WO2010048490A1 (en) 2010-04-29
KR20110090940A (en) 2011-08-10
KR101260131B1 (en) 2013-05-02
JP2012507046A (en) 2012-03-22
EP2353159A1 (en) 2011-08-10
TW201042634A (en) 2010-12-01
JP5551176B2 (en) 2014-07-16

Similar Documents

Publication Publication Date Title
US8218397B2 (en) Audio source proximity estimation using sensor array for noise reduction
US10229698B1 (en) Playback reference signal-assisted multi-microphone interference canceler
US8391507B2 (en) Systems, methods, and apparatus for detection of uncorrelated component
US8898058B2 (en) Systems, methods, and apparatus for voice activity detection
US8620672B2 (en) Systems, methods, apparatus, and computer-readable media for phase-based processing of multichannel signal
EP2277323B1 (en) Speech enhancement using multiple microphones on multiple devices
KR101470262B1 (en) Systems, methods, apparatus, and computer-readable media for multi-microphone location-selective processing
US8693704B2 (en) Method and apparatus for canceling noise from mixed sound
KR100860805B1 (en) Voice enhancement system
US20190272842A1 (en) Speech enhancement for an electronic device
US20120263317A1 (en) Systems, methods, apparatus, and computer readable media for equalization
US20110058676A1 (en) Systems, methods, apparatus, and computer-readable media for dereverberation of multichannel signal
US20070033020A1 (en) Estimation of noise in a speech signal
US20110293103A1 (en) Systems, methods, devices, apparatus, and computer program products for audio equalization
US9532149B2 (en) Method of signal processing in a hearing aid system and a hearing aid system
KR20080092404A (en) System and method for utilizing inter-microphone level differences for speech enhancement
US20140365212A1 (en) Receiver Intelligibility Enhancement System
JP5903921B2 (en) Noise reduction device, voice input device, wireless communication device, noise reduction method, and noise reduction program
EP3764360B1 (en) Signal processing methods and systems for beam forming with improved signal to noise ratio
EP3764660B1 (en) Signal processing methods and systems for adaptive beam forming

Legal Events

Date Code Title Description
AS Assignment

Owner name: QUALCOMM INCORPORATED, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CHAN, KWOKLEUNG;REEL/FRAME:023919/0376

Effective date: 20100125

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY