US20020198705A1 - Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors - Google Patents

Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors

Info

Publication number
US20020198705A1
Authority
US
United States
Prior art keywords
acoustic signals
speech
acoustic
noise
difference parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US10/159,770
Other versions
US7246058B2 (en
Inventor
Gregory Burnett
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ji Audio Holdings LLC
BlackRock Advisors LLC
Jawbone Innovations LLC
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US10/159,770 (US7246058B2)
Application filed by Individual
Publication of US20020198705A1
Assigned to ALIPHCOM, INC reassignment ALIPHCOM, INC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BURNETT, GREGORY C.
Priority to US11/805,987 (US20070233479A1)
Application granted
Publication of US7246058B2
Priority to US13/431,725 (US10225649B2)
Priority to US13/436,765 (US8682018B2)
Priority to US13/753,441 (US8942383B2)
Priority to US13/919,919 (US20140372113A1)
Assigned to DBD CREDIT FUNDING LLC, AS ADMINISTRATIVE AGENT reassignment DBD CREDIT FUNDING LLC, AS ADMINISTRATIVE AGENT SECURITY AGREEMENT Assignors: ALIPH, INC., ALIPHCOM, BODYMEDIA, INC., MACGYVER ACQUISITION LLC
Assigned to WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT reassignment WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT PATENT SECURITY AGREEMENT Assignors: ALIPH, INC., ALIPHCOM, BODYMEDIA, INC., MACGYVER ACQUISITION LLC
Priority to US14/224,868 (US20140286519A1)
Assigned to SILVER LAKE WATERMAN FUND, L.P., AS SUCCESSOR AGENT reassignment SILVER LAKE WATERMAN FUND, L.P., AS SUCCESSOR AGENT NOTICE OF SUBSTITUTION OF ADMINISTRATIVE AGENT IN PATENTS Assignors: DBD CREDIT FUNDING LLC, AS RESIGNING AGENT
Assigned to BLACKROCK ADVISORS, LLC reassignment BLACKROCK ADVISORS, LLC SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ALIPH, INC., ALIPHCOM, BODYMEDIA, INC., MACGYVER ACQUISITION LLC, PROJECT PARIS ACQUISITION LLC
Assigned to BODYMEDIA, INC., ALIPHCOM, ALIPH, INC., MACGYVER ACQUISITION LLC, PROJECT PARIS ACQUISITION LLC reassignment BODYMEDIA, INC. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT
Assigned to BODYMEDIA, INC., ALIPHCOM, ALIPH, INC., MACGYVER ACQUISITION LLC, PROJECT PARIS ACQUISITION, LLC reassignment BODYMEDIA, INC. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: SILVER LAKE WATERMAN FUND, L.P., AS ADMINISTRATIVE AGENT
Assigned to ALIPHCOM reassignment ALIPHCOM CORRECTIVE ASSIGNMENT TO CORRECT THE RECEIVING PARTY NAME PREVIOUSLY RECORDED AT REEL: 013855 FRAME: 906. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT. Assignors: BURNETT, GREGORY C
Assigned to BLACKROCK ADVISORS, LLC reassignment BLACKROCK ADVISORS, LLC SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ALIPH, INC., ALIPHCOM, BODYMEDIA, INC., MACGYVER ACQUISITION LLC, PROJECT PARIS ACQUISITION LLC
Assigned to BLACKROCK ADVISORS, LLC reassignment BLACKROCK ADVISORS, LLC CORRECTIVE ASSIGNMENT TO CORRECT THE APPLICATION NO. 13870843 PREVIOUSLY RECORDED ON REEL 036500 FRAME 0173. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY INTEREST. Assignors: ALIPH, INC., ALIPHCOM, BODYMEDIA, INC., MACGYVER ACQUISITION, LLC, PROJECT PARIS ACQUISITION LLC
Assigned to ALIPHCOM, LLC reassignment ALIPHCOM, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ALIPHCOM DBA JAWBONE
Assigned to JAWB ACQUISITION, LLC reassignment JAWB ACQUISITION, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ALIPHCOM, LLC
Assigned to ALIPHCOM (ASSIGNMENT FOR THE BENEFIT OF CREDITORS), LLC reassignment ALIPHCOM (ASSIGNMENT FOR THE BENEFIT OF CREDITORS), LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ALIPHCOM
Assigned to JAWB ACQUISITION LLC reassignment JAWB ACQUISITION LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ALIPHCOM (ASSIGNMENT FOR THE BENEFIT OF CREDITORS), LLC
Assigned to BODYMEDIA, INC., ALIPH, INC., MACGYVER ACQUISITION LLC, PROJECT PARIS ACQUISITION LLC, ALIPHCOM reassignment BODYMEDIA, INC. CORRECTIVE ASSIGNMENT TO CORRECT THE INCORRECT APPL. NO. 13/982,956 PREVIOUSLY RECORDED AT REEL: 035531 FRAME: 0554. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST. Assignors: SILVER LAKE WATERMAN FUND, L.P., AS ADMINISTRATIVE AGENT
Assigned to ALIPHCOM (ASSIGNMENT FOR THE BENEFIT OF CREDITORS), LLC reassignment ALIPHCOM (ASSIGNMENT FOR THE BENEFIT OF CREDITORS), LLC RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: BLACKROCK ADVISORS, LLC
Assigned to JI AUDIO HOLDINGS LLC reassignment JI AUDIO HOLDINGS LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: JAWB ACQUISITION LLC
Assigned to JAWBONE INNOVATIONS, LLC reassignment JAWBONE INNOVATIONS, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: JI AUDIO HOLDINGS LLC
Adjusted expiration
Expired - Lifetime

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93: Discriminating between voiced and unvoiced parts of speech signals
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165: Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G10L2025/783: Detection of presence or absence of voice signals based on threshold decision

Definitions

  • the disclosed embodiments relate to the processing of speech signals.
  • Typical methods for classifying voiced and unvoiced speech have relied mainly on the acoustic content of microphone data, which is plagued by problems with noise and the corresponding uncertainties in signal content. This is especially problematic now with the proliferation of portable communication devices like cellular telephones and personal digital assistants because, in many cases, the quality of service provided by the device depends on the quality of the voice services offered by the device.
  • There are methods known in the art for suppressing the noise present in the speech signals, but these methods demonstrate performance shortcomings that include unusually long computing time, requirements for cumbersome hardware to perform the signal processing, and distortion of the signals of interest.
  • FIG. 1 is a block diagram of a NAVSAD system, under an embodiment.
  • FIG. 2 is a block diagram of a PSAD system, under an embodiment.
  • FIG. 3 is a block diagram of a denoising system, referred to herein as the Pathfinder system, under an embodiment.
  • FIG. 4 is a flow diagram of a detection algorithm for use in detecting voiced and unvoiced speech, under an embodiment.
  • FIG. 5A plots the received GEMS signal for an utterance along with the mean correlation between the GEMS signal and the Mic 1 signal and the threshold for voiced speech detection.
  • FIG. 5B plots the received GEMS signal for an utterance along with the standard deviation of the GEMS signal and the threshold for voiced speech detection.
  • FIG. 6 plots voiced speech detected from an utterance along with the GEMS signal and the acoustic noise.
  • FIG. 7 is a microphone array for use under an embodiment of the PSAD system.
  • FIG. 8 is a plot of $\Delta M$ versus $d_1$ for several $\Delta d$ values, under an embodiment.
  • FIG. 9 shows a plot of the gain parameter as the sum of the absolute values of $H_1(z)$ and the acoustic data or audio from microphone 1.
  • FIG. 10 is an alternative plot of acoustic data presented in FIG. 9.
  • FIG. 1 is a block diagram of a NAVSAD system 100, under an embodiment.
  • the NAVSAD system couples microphones 10 and sensors 20 to at least one processor 30.
  • the sensors 20 of an embodiment include voicing activity detectors or non-acoustic sensors.
  • the processor 30 controls subsystems including a detection subsystem 50, referred to herein as a detection algorithm, and a denoising subsystem 40. Operation of the denoising subsystem 40 is described in detail in the Related Applications.
  • the NAVSAD system works extremely well in any background acoustic noise environment.
  • FIG. 2 is a block diagram of a PSAD system 200, under an embodiment.
  • the PSAD system couples microphones 10 to at least one processor 30.
  • the processor 30 includes a detection subsystem 50, referred to herein as a detection algorithm, and a denoising subsystem 40.
  • the PSAD system is highly sensitive in low acoustic noise environments and relatively insensitive in high acoustic noise environments.
  • the PSAD can operate independently or as a backup to the NAVSAD, detecting voiced speech if the NAVSAD fails.
  • the detection subsystems 50 and denoising subsystems 40 of both the NAVSAD and PSAD systems of an embodiment are algorithms controlled by the processor 30, but are not so limited.
  • Alternative embodiments of the NAVSAD and PSAD systems can include detection subsystems 50 and/or denoising subsystems 40 that comprise additional hardware, firmware, software, and/or combinations of hardware, firmware, and software.
  • functions of the detection subsystems 50 and denoising subsystems 40 may be distributed across numerous components of the NAVSAD and PSAD systems.
  • FIG. 3 is a block diagram of a denoising subsystem 300, referred to herein as the Pathfinder system, under an embodiment.
  • the Pathfinder system is briefly described below, and is described in detail in the Related Applications. Two microphones Mic 1 and Mic 2 are used in the Pathfinder system, and Mic 1 is considered the “signal” microphone.
  • the Pathfinder system 300 is equivalent to the NAVSAD system 100 when the voicing activity detector (VAD) 320 is a non-acoustic voicing sensor 20 and the noise removal subsystem 340 includes the detection subsystem 50 and the denoising subsystem 40.
  • the Pathfinder system 300 is equivalent to the PSAD system 200 in the absence of the VAD 320, and when the noise removal subsystem 340 includes the detection subsystem 50 and the denoising subsystem 40.
  • the NAVSAD and PSAD systems support a two-level commercial approach in which (i) a relatively less expensive PSAD system supports an acoustic approach that functions in most low- to medium-noise environments, and (ii) a NAVSAD system adds a non-acoustic sensor to enable detection of voiced speech in any environment.
  • Unvoiced speech is normally not detected using the sensor, as it normally does not sufficiently vibrate human tissue.
  • detecting the unvoiced speech is not as important, as it is normally very low in energy and easily washed out by the noise. Therefore in high noise environments the unvoiced speech is unlikely to affect the voiced speech denoising.
  • Unvoiced speech information is most important in the presence of little to no noise and, therefore, the unvoiced detection should be highly sensitive in low noise situations, and insensitive in high noise situations. This is not easily accomplished, and comparable acoustic unvoiced detectors known in the art are incapable of operating under these environmental constraints.
  • the NAVSAD and PSAD systems include an array algorithm for speech detection that uses the difference in frequency content between two microphones to calculate a relationship between the signals of the two microphones. This is in contrast to conventional arrays that attempt to use the time/phase difference of each microphone to remove the noise outside of an “area of sensitivity”.
  • the methods described herein provide a significant advantage, as they do not require a specific orientation of the array with respect to the signal.
  • the systems described herein are sensitive to noise of every type and every orientation, unlike conventional arrays that depend on specific noise orientations. Consequently, the frequency-based arrays presented herein are unique as they depend only on the relative orientation of the two microphones themselves with no dependence on the orientation of the noise and signal with respect to the microphones. This results in a robust signal processing system with respect to the type of noise, microphones, and orientation between the noise/signal source and the microphones.
  • the systems described herein use the information derived from the Pathfinder noise suppression system and/or a non-acoustic sensor described in the Related Applications to determine the voicing state of an input signal, as described in detail below.
  • the voicing state includes silent, voiced, and unvoiced states.
  • the NAVSAD system, for example, includes a non-acoustic sensor to detect the vibration of human tissue associated with speech.
  • the non-acoustic sensor of an embodiment is a General Electromagnetic Movement Sensor (GEMS) as described briefly below and in detail in the Related Applications, but is not so limited. Alternative embodiments, however, may use any sensor that is able to detect human tissue motion associated with speech and is unaffected by environmental acoustic noise.
  • the GEMS is a radio frequency device (2.4 GHz) that allows the detection of moving human tissue dielectric interfaces.
  • the GEMS includes an RF interferometer that uses homodyne mixing to detect small phase shifts associated with target motion. In essence, the sensor sends out weak electromagnetic waves (less than 1 milliwatt) that reflect off of whatever is around the sensor. The reflected waves are mixed with the original transmitted waves and the results analyzed for any change in position of the targets. Anything that moves near the sensor will cause a change in phase of the reflected wave that will be amplified and displayed as a change in voltage output from the sensor.
  • a similar sensor is described by Gregory C. Burnett (1999) in “The physiological basis of glottal electromagnetic micropower sensors (GEMS) and their use in defining an excitation function for the human vocal tract”; Ph.D. Thesis, University of California at Davis.
  • FIG. 4 is a flow diagram of a detection algorithm 50 for use in detecting voiced and unvoiced speech, under an embodiment.
  • both the NAVSAD and PSAD systems of an embodiment include the detection algorithm 50 as the detection subsystem 50.
  • This detection algorithm 50 operates in real-time and, in an embodiment, operates on 20 millisecond windows and steps 10 milliseconds at a time, but is not so limited.
  • the voice activity determination is recorded for the first 10 milliseconds, and the second 10 milliseconds functions as a “look-ahead” buffer. While an embodiment uses the 20/10 windows, alternative embodiments may use numerous other combinations of window values.
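The windowing described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the 8000 Hz sample rate is borrowed from the cross-correlation discussion elsewhere in this description, and the function names `frame_bounds` and `decision_and_lookahead` are hypothetical.

```python
def frame_bounds(num_samples, sample_rate_hz=8000, window_ms=20, step_ms=10):
    """Return (start, stop) sample indices of overlapping analysis windows."""
    window = sample_rate_hz * window_ms // 1000  # 160 samples at 8 kHz
    step = sample_rate_hz * step_ms // 1000      # 80 samples at 8 kHz
    return [(s, s + window) for s in range(0, num_samples - window + 1, step)]


def decision_and_lookahead(start, stop):
    """Split a window into the half whose decision is recorded and the look-ahead half."""
    mid = (start + stop) // 2
    return (start, mid), (mid, stop)


# 100 ms of audio at the assumed 8 kHz rate
frames = frame_bounds(num_samples=800)
```

Because each window advances by half its length, every 10 millisecond segment is seen once as the recorded half and once as the look-ahead half of the preceding window.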
  • the systems using the detection algorithm of an embodiment function in environments containing varying amounts of background acoustic noise. If the non-acoustic sensor is available, this external noise is not a problem for voiced speech. However, for unvoiced speech (and voiced if the non-acoustic sensor is not available or has malfunctioned) reliance is placed on acoustic data alone to separate noise from unvoiced speech.
  • An advantage inheres in the use of two microphones in an embodiment of the Pathfinder noise suppression system, and the spatial relationship between the microphones is exploited to assist in the detection of unvoiced speech.
  • the speech source should be relatively louder in one designated microphone when compared to the other microphone. Tests have shown that this requirement is easily met with conventional microphones when the microphones are placed on the head, as any noise should result in an $H_1$ with a gain near unity.
  • the NAVSAD relies on two parameters to detect voiced speech. These two parameters include the energy of the sensor in the window of interest, determined in an embodiment by the standard deviation (SD), and optionally the cross-correlation (XCORR) between the acoustic signal from microphone 1 and the sensor data.
  • the energy of the sensor can be determined in any one of a number of ways, and the SD is just one convenient way to determine the energy.
  • the SD is akin to the energy of the signal, which normally corresponds quite accurately to the voicing state, but may be susceptible to movement noise (relative motion of the sensor with respect to the human user) and/or electromagnetic noise.
  • the XCORR can be used. The XCORR is only calculated to 15 delays, which corresponds to just under 2 milliseconds at 8000 Hz.
  • the XCORR can also be useful when the sensor signal is distorted or modulated in some fashion. For example, there are sensor locations (such as the jaw or back of the neck) where speech production can be detected but where the signal may have incorrect or distorted time-based information. That is, they may not have well defined features in time that will match with the acoustic waveform. However, XCORR is more susceptible to errors from acoustic noise, and in high-noise (<0 dB SNR) environments is almost useless. Therefore, it should not be the sole source of voicing information.
  • the sensor detects human tissue motion associated with the closure of the vocal folds, so the acoustic signal produced by the closure of the folds is highly correlated with the closures. Therefore, sensor data that correlates highly with the acoustic signal is declared as speech, and sensor data that does not correlate well is termed noise.
  • the acoustic data is expected to lag behind the sensor data by about 0.1 to 0.8 milliseconds (or about 1-7 samples) as a result of the delay time due to the relatively slower speed of sound (around 330 m/s).
  • an embodiment uses a 15-sample correlation, as the acoustic wave shape varies significantly depending on the sound produced, and a larger correlation width is needed to ensure detection.
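The timing figures quoted above can be checked with a few lines of arithmetic. The sample rate and speed of sound come from the text; the conversion from delay to acoustic path length is an added illustration (distance equals speed times time), and the helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000        # sample rate quoted in the text
SPEED_OF_SOUND_M_S = 330.0   # approximate speed of sound quoted in the text


def samples_to_ms(n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration in milliseconds spanned by n_samples."""
    return 1000.0 * n_samples / sample_rate_hz


def ms_to_samples(ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples (possibly fractional) covering a duration in milliseconds."""
    return ms * sample_rate_hz / 1000.0


def path_length_cm(delay_ms, speed_m_s=SPEED_OF_SOUND_M_S):
    """Acoustic path length, in centimeters, that produces the given delay."""
    return speed_m_s * (delay_ms / 1000.0) * 100.0


correlation_span_ms = samples_to_ms(15)                   # just under 2 ms
lag_range_samples = (ms_to_samples(0.1), ms_to_samples(0.8))
```

Under these numbers a lag of 0.1 to 0.8 milliseconds spans about 0.8 to 6.4 samples, which is consistent with the "about 1-7 samples" stated above, and corresponds to acoustic paths of roughly 3 to 26 centimeters.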
  • the SD and XCORR signals are related, but are sufficiently different so that the voiced speech detection is more reliable. For simplicity, though, either parameter may be used.
  • the values for the SD and XCORR are compared to empirical thresholds, and if both are above their threshold, voiced speech is declared. Example data is presented and described below.
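The two-parameter test described above can be sketched as follows, under stated assumptions. The description does not give the exact normalization of the "mean correlation", so this sketch averages the absolute Pearson correlation over 15 lags (0 through 14) with the microphone lagging the sensor. The threshold values and the function names `lagged_correlation` and `is_voiced` are hypothetical.

```python
import math
import statistics


def lagged_correlation(sensor, mic, lag):
    """Pearson correlation between sensor[i] and mic[i + lag]."""
    a = sensor[:len(sensor) - lag]
    b = mic[lag:]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def is_voiced(sensor, mic, sd_threshold, xcorr_threshold, max_lag=15):
    """Declare voiced speech only if both SD and mean |XCORR| exceed their thresholds."""
    sd = statistics.pstdev(sensor)  # SD as a stand-in for sensor energy
    xcorr = sum(abs(lagged_correlation(sensor, mic, k))
                for k in range(max_lag)) / max_lag
    return sd > sd_threshold and xcorr > xcorr_threshold
```

In practice the thresholds would be set empirically, as the text notes. Either test could also be used alone by setting the other threshold to a value that always passes.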
  • FIGS. 5A, 5B, and 6 show data plots for an example in which a subject twice speaks the phrase “pop pan”, under an embodiment.
  • FIG. 5A plots the received GEMS signal 502 for this utterance along with the mean correlation 504 between the GEMS signal and the Mic 1 signal and the threshold $T_1$ used for voiced speech detection.
  • FIG. 5B plots the received GEMS signal 502 for this utterance along with the standard deviation 506 of the GEMS signal and the threshold $T_2$ used for voiced speech detection.
  • the NAVSAD can determine when voiced speech is occurring with high degrees of accuracy due to the non-acoustic sensor data.
  • the sensor offers little assistance in separating unvoiced speech from noise, as unvoiced speech normally causes no detectable signal in most non-acoustic sensors. If there is a detectable signal, the NAVSAD can be used, although use of the SD method is dictated as unvoiced speech is normally poorly correlated. In the absence of a detectable signal, the system and methods of the Pathfinder noise removal algorithm are used to determine when unvoiced speech is occurring. A brief review of the Pathfinder algorithm is given below, while a detailed description is provided in the Related Applications.
  • the acoustic information coming into Microphone 1 is denoted by $m_1(n)$
  • the information coming into Microphone 2 is similarly labeled $m_2(n)$
  • the GEMS sensor is assumed available to determine voiced speech areas.
  • these signals are represented as $M_1(z)$ and $M_2(z)$.
  • $M_1(z) = S(z) + N_2(z)$ and $M_2(z) = N(z) + S_2(z)$
  • with $N_2(z) = N(z) H_1(z)$ and $S_2(z) = S(z) H_2(z)$, so that $M_1(z) = S(z) + N(z) H_1(z)$ and $M_2(z) = N(z) + S(z) H_2(z)$ (Equation 1)
  • This is the general case for all two-microphone systems. There is always going to be some leakage of noise into Mic 1, and some leakage of signal into Mic 2. Equation 1 has four unknowns and only two relationships and cannot be solved explicitly.
  • $H_1(z)$ can be calculated using any of the available system identification algorithms and the microphone outputs when only noise is being received. The calculation can be done adaptively, so that if the noise changes significantly $H_1(z)$ can be recalculated quickly.
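As one concrete possibility, the noise-only estimate of $H_1(z)$ could be obtained with a normalized least-mean-squares (NLMS) adaptive filter. NLMS is a choice made here for illustration; the text only calls for "any of the available system identification algorithms". The sketch assumes that during noise-only intervals the Mic 1 signal is the Mic 2 signal filtered by $H_1(z)$, and the tap count, step size, and the name `nlms_identify` are hypothetical.

```python
import random


def nlms_identify(reference, target, num_taps=4, mu=0.5, eps=1e-8):
    """Adapt FIR taps h so that filtering `reference` approximates `target`."""
    h = [0.0] * num_taps
    for n in range(num_taps - 1, len(reference)):
        x = [reference[n - k] for k in range(num_taps)]  # newest sample first
        y = sum(hk * xk for hk, xk in zip(h, x))         # current filter output
        e = target[n] - y                                # estimation error
        norm = sum(xk * xk for xk in x) + eps            # input power (regularized)
        h = [hk + mu * e * xk / norm for hk, xk in zip(h, x)]
    return h


# Synthetic noise-only example: Mic 2 hears the noise, Mic 1 hears it through H1.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
true_h1 = [0.9, 0.2, -0.1, 0.05]  # made-up coefficients for the demonstration
mic2 = noise
mic1 = [sum(true_h1[k] * noise[n - k] for k in range(4) if n - k >= 0)
        for n in range(len(noise))]
h1_estimate = nlms_identify(mic2, mic1, num_taps=4)
```

Because the update runs sample by sample, the same loop can keep adapting whenever the detector reports a noise-only interval, which is one way to meet the requirement that $H_1(z)$ be recalculated quickly when the noise changes.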
  • With a solution for one of the unknowns in Equation 1, solutions can be found for another, $H_2(z)$, by using the amplitude of the GEMS or similar device along with the amplitude of the two microphones.
  • $N(z) = M_2(z) - S(z) H_2(z)$
  • $H_2(z)$ is usually quite small, so that $H_2(z) H_1(z) \ll 1$, and $S(z) \approx M_1(z) - M_2(z) H_1(z)$.
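The consequence of a small $H_2(z) H_1(z)$ can be worked out explicitly. The derivation below is a sketch that assumes the noise reaching Mic 1 is the noise filtered by $H_1(z)$, that is, $M_1(z) = S(z) + N(z) H_1(z)$. That relation is an assumption here, since it is not fully legible in this text.

```latex
\begin{align*}
M_1(z) &= S(z) + N(z)\,H_1(z) \\
N(z)   &= M_2(z) - S(z)\,H_2(z) \\
M_1(z) &= S(z) + \bigl[M_2(z) - S(z)\,H_2(z)\bigr] H_1(z) \\
S(z)\,\bigl[1 - H_2(z)\,H_1(z)\bigr] &= M_1(z) - M_2(z)\,H_1(z) \\
S(z) &\approx M_1(z) - M_2(z)\,H_1(z) \qquad \text{when } H_2(z)\,H_1(z) \ll 1
\end{align*}
```

Under this approximation the speech estimate needs only the two microphone signals and $H_1(z)$.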
  • the PSAD system is described. As sound waves propagate, they normally lose energy as they travel due to diffraction and dispersion. Assuming the sound waves originate from a point source and radiate isotropically, their amplitude will decrease as a function of 1/r, where r is the distance from the originating point. This 1/r dependence of the amplitude is the worst case; if the waves are confined to a smaller area, the reduction will be less. However, it is an adequate model for the configurations of interest, specifically the propagation of noise and speech to microphones located somewhere on the user's head.
  • $\Delta M$ is the difference in gain between Mic 1 and Mic 2 and therefore $H_1(z)$, as above in Equation 2.
  • $d_1$ is the distance from Mic 1 to the speech or noise source.
  • FIG. 8 is a plot 800 of $\Delta M$ versus $d_1$ for several $\Delta d$ values, under an embodiment. It is clear that as $\Delta d$ becomes larger and the noise source is closer, $\Delta M$ becomes larger. The variable $\Delta d$ will change depending on the orientation to the speech/noise source, from the maximum value on the array midline to zero perpendicular to the array midline. From the plot 800 it is clear that for small $\Delta d$ and for distances over approximately 30 centimeters (cm), $\Delta M$ is close to unity.
  • Since most noise sources are farther away than 30 cm and are unlikely to be on the midline of the array, it is probable that when calculating $H_1(z)$ as above in Equation 2, $\Delta M$ (or equivalently the gain of $H_1(z)$) will be close to unity. Conversely, for noise sources that are close (within a few centimeters), there could be a substantial difference in gain depending on which microphone is closer to the noise.
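The behavior described for FIG. 8 can be reproduced numerically with a small sketch of the 1/r model. The expression for $\Delta M$ is not legible in this text, so the formula below is an assumption: Mic 2 is taken to be $\Delta d$ farther from the source than Mic 1, and $\Delta M$ is taken as the ratio of the two amplitudes, $(d_1 + \Delta d)/d_1$. The function name `gain_ratio` is hypothetical.

```python
def gain_ratio(d1_cm, delta_d_cm):
    """Amplitude at Mic 1 divided by amplitude at Mic 2 under a 1/r model.

    Assumes Mic 2 is delta_d_cm farther from the source than Mic 1, so the
    amplitudes scale as 1/d1 and 1/(d1 + delta_d).
    """
    if d1_cm <= 0:
        raise ValueError("d1_cm must be positive")
    return (d1_cm + delta_d_cm) / d1_cm


far_noise = gain_ratio(d1_cm=30.0, delta_d_cm=1.0)    # distant source, small offset
near_speech = gain_ratio(d1_cm=3.0, delta_d_cm=3.0)   # source a few cm from Mic 1
```

With these assumed values the distant source gives a ratio within a few percent of unity, while the nearby source gives a ratio of 2, which matches the qualitative claim that only close sources produce a substantial gain difference.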
  • the gain in this example is calculated by the sum of the absolute value of the filter coefficients. This sum is not equivalent to the gain, but the two are related in that a rise in the sum of the absolute value reflects a rise in the gain.
  • FIG. 9 shows a plot 900 of the gain parameter 902 as the sum of the absolute values of $H_1(z)$ and the acoustic data 904 or audio from microphone 1.
  • the speech signal was an utterance of the phrase “pop pan”, repeated twice.
  • the evaluated bandwidth included the frequency range from 2500 Hz to 3500 Hz, although 1500 Hz to 2500 Hz was additionally used in practice. Note the rapid increase in the gain when the unvoiced speech is first encountered, then the rapid return to normal when the speech ends.
  • the large changes in gain that result from transitions between noise and speech can be detected by any standard signal processing techniques.
  • the standard deviation of the last few gain calculations is used, with thresholds being defined by a running average of the standard deviations and the standard deviation noise floor. The later changes in gain for the voiced speech are suppressed in this plot 900 for clarity.
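A minimal sketch of this gain-change detection follows. The gain proxy is the sum of absolute filter coefficients, as described above. The specific threshold rule (the larger of a fixed noise floor and a multiple of the running average SD), the history length, the scaling factor, and the smoothing constant are all assumptions, as are the names `gain_proxy` and `GainChangeDetector`.

```python
import statistics
from collections import deque


def gain_proxy(filter_coefficients):
    """Sum of absolute coefficient values; rises and falls with the filter gain."""
    return sum(abs(c) for c in filter_coefficients)


class GainChangeDetector:
    """Flags frames where the SD of recent gain values jumps above a threshold."""

    def __init__(self, history=5, factor=3.0, noise_floor=0.01, smoothing=0.9):
        self.gains = deque(maxlen=history)  # the last few gain calculations
        self.factor = factor                # hypothetical multiple of the average SD
        self.noise_floor = noise_floor      # hypothetical SD noise floor
        self.smoothing = smoothing          # hypothetical running-average constant
        self.avg_sd = None

    def update(self, gain):
        self.gains.append(gain)
        if len(self.gains) < 2:
            return False
        sd = statistics.pstdev(list(self.gains))
        if self.avg_sd is None:
            self.avg_sd = sd
            return False
        threshold = max(self.noise_floor, self.factor * self.avg_sd)
        flagged = sd > threshold
        if not flagged:
            # Only track the background level while no change is flagged.
            self.avg_sd = self.smoothing * self.avg_sd + (1 - self.smoothing) * sd
        return flagged
```

Freezing the running average while a change is flagged keeps a long stretch of speech from raising its own threshold. That is a design choice of this sketch, not something stated in the text.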
  • FIG. 10 is an alternative plot 1000 of acoustic data presented in FIG. 9.
  • the data used to form plot 900 is presented again in this plot 1000, along with audio data 1004 and GEMS data 1006 without noise to make the unvoiced speech apparent.
  • this automatic backup of the NAVSAD system functions best in an environment with low noise (approximately 10+ dB SNR), as high amounts (10 dB of SNR or less) of acoustic noise can quickly overwhelm any acoustic-only unvoiced detector, including the PSAD.
  • This is evident in the difference in the voiced signal data 602 and 1002 shown in plots 600 and 1000 of FIGS. 6 and 10, respectively, where the same utterance is spoken, but the data of plot 600 shows no unvoiced speech because the unvoiced speech is undetectable. This is the desired behavior when performing denoising, since if the unvoiced speech is not detectable then it will not significantly affect the denoising process.
  • Using the Pathfinder system to detect unvoiced speech ensures detection of any unvoiced speech loud enough to distort the denoising.
  • the configuration of the microphones can have an effect on the change in gain associated with speech and the thresholds needed to detect speech.
  • each configuration will require testing to determine the proper thresholds, but tests with two very different microphone configurations showed the same thresholds and other parameters to work well.
  • the first microphone set had the signal microphone near the mouth and the noise microphone several centimeters away at the ear, while the second configuration placed the noise and signal microphones back-to-back within a few centimeters of the mouth.
  • the results presented herein were derived using the first microphone configuration, but the results using the other set are virtually identical, so the detection algorithm is relatively robust with respect to microphone placement.
  • a number of configurations are possible using the NAVSAD and PSAD systems to detect voiced and unvoiced speech.
  • One configuration uses the NAVSAD system (non-acoustic only) to detect voiced speech along with the PSAD system to detect unvoiced speech; the PSAD also functions as a backup to the NAVSAD system for detecting voiced speech.
  • An alternative configuration uses the NAVSAD system (non-acoustic correlated with acoustic) to detect voiced speech along with the PSAD system to detect unvoiced speech; the PSAD also functions as a backup to the NAVSAD system for detecting voiced speech.
  • Another alternative configuration uses the PSAD system to detect both voiced and unvoiced speech.
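The way these configurations combine the two detectors can be sketched as a small decision function. This is an illustration only: the per-frame boolean inputs, the rule that a voiced decision takes precedence over an unvoiced one, and the names `VoicingState` and `classify_frame` are assumptions, not details taken from the text.

```python
from enum import Enum


class VoicingState(Enum):
    SILENT = "silent"
    VOICED = "voiced"
    UNVOICED = "unvoiced"


def classify_frame(sensor_available, navsad_voiced, psad_voiced, psad_unvoiced):
    """Combine NAVSAD and PSAD decisions for one frame.

    NAVSAD decides voicing when the non-acoustic sensor is available;
    otherwise PSAD acts as the backup voiced detector. PSAD always
    supplies the unvoiced decision.
    """
    voiced = navsad_voiced if sensor_available else psad_voiced
    if voiced:
        return VoicingState.VOICED
    if psad_unvoiced:
        return VoicingState.UNVOICED
    return VoicingState.SILENT
```

Calling the function with `sensor_available=False` on every frame corresponds to the PSAD-only configuration.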
  • the “k” in “kick” has significant frequency content from 500 Hz to 4000 Hz, but a “sh” in “she” only contains significant energy from 1700-4000 Hz.
  • Voiced speech could be classified in a similar manner. For instance, an /i/ (“ee”) has significant energy around 300 Hz and 2500 Hz, and an /a/ (“ah”) has energy at around 900 Hz and 1200 Hz. This ability to discriminate unvoiced and voiced speech in the presence of noise is, thus, very useful.
  • routines described herein can be provided with one or more of the following, or one or more combinations of the following: stored in non-volatile memory (not shown) that forms part of an associated processor or processors, or implemented using conventional programmed logic arrays or circuit elements, or stored in removable media such as disks, or downloaded from a server and stored locally at a client, or hardwired or preprogrammed in chips such as EEPROM semiconductor chips, application specific integrated circuits (ASICs), or by digital signal processing (DSP) integrated circuits.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

Systems and methods are provided for detecting voiced and unvoiced speech in acoustic signals having varying levels of background noise. The systems receive acoustic signals at two microphones, and generate difference parameters between the acoustic signals received at each of the two microphones. The difference parameters are representative of the relative difference in signal gain between portions of the received acoustic signals. The systems identify information of the acoustic signals as unvoiced speech when the difference parameters exceed a first threshold, and identify information of the acoustic signals as voiced speech when the difference parameters exceed a second threshold. Further, embodiments of the systems include non-acoustic sensors that receive physiological information to aid in identifying voiced speech.

Description

    RELATED APPLICATIONS
  • [0001] This application claims the benefit of U.S. Application Nos. 60/294,383 filed May 30, 2001; 09/905,361 filed Jul. 12, 2001; 60/335,100 filed Oct. 30, 2001; 60/332,202 and 09/990,847, both filed Nov. 21, 2001; 60/362,103, 60/362,161, 60/362,162, 60/362,170, and 60/361,981, all filed Mar. 5, 2002; 60/368,208, 60/368,209, and 60/368,343, all filed Mar. 27, 2002; all of which are incorporated herein by reference in their entirety.
  • TECHNICAL FIELD
  • The disclosed embodiments relate to the processing of speech signals. [0002]
  • BACKGROUND
  • The ability to correctly identify voiced and unvoiced speech is critical to many speech applications including speech recognition, speaker verification, noise suppression, and many others. In a typical acoustic application, speech from a human speaker is captured and transmitted to a receiver in a different location. In the speaker's environment there may exist one or more noise sources that pollute the speech signal, or the signal of interest, with unwanted acoustic noise. This makes it difficult or impossible for the receiver, whether human or machine, to understand the user's speech. [0003]
  • Typical methods for classifying voiced and unvoiced speech have relied mainly on the acoustic content of microphone data, which is plagued by problems with noise and the corresponding uncertainties in signal content. This is especially problematic now with the proliferation of portable communication devices like cellular telephones and personal digital assistants because, in many cases, the quality of service provided by the device depends on the quality of the voice services offered by the device. There are methods known in the art for suppressing the noise present in the speech signals, but these methods demonstrate performance shortcomings that include unusually long computing time, requirements for cumbersome hardware to perform the signal processing, and distorting the signals of interest.[0004]
  • BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 is a block diagram of a NAVSAD system, under an embodiment. [0005]
  • FIG. 2 is a block diagram of a PSAD system, under an embodiment. [0006]
  • FIG. 3 is a block diagram of a denoising system, referred to herein as the Pathfinder system, under an embodiment. [0007]
  • FIG. 4 is a flow diagram of a detection algorithm for use in detecting voiced and unvoiced speech, under an embodiment. [0008]
  • FIG. 5A plots the received GEMS signal for an utterance along with the mean correlation between the GEMS signal and the Mic 1 signal and the threshold for voiced speech detection. [0009]
  • FIG. 5B plots the received GEMS signal for an utterance along with the standard deviation of the GEMS signal and the threshold for voiced speech detection. [0010]
  • FIG. 6 plots voiced speech detected from an utterance along with the GEMS signal and the acoustic noise. [0011]
  • FIG. 7 is a microphone array for use under an embodiment of the PSAD system. [0012]
  • FIG. 8 is a plot of ΔM versus d1 for several Δd values, under an embodiment. [0013]
  • FIG. 9 shows a plot of the gain parameter as the sum of the absolute values of H1(z) and the acoustic data or audio from microphone 1. [0014]
  • FIG. 10 is an alternative plot of acoustic data presented in FIG. 9. [0015]
  • In the figures, the same reference numbers identify identical or substantially similar elements or acts. [0016]
  • Any headings provided herein are for convenience only and do not necessarily affect the scope or meaning of the claimed invention.[0017]
  • DETAILED DESCRIPTION
  • Systems and methods for discriminating voiced and unvoiced speech from background noise are provided below including a Non-Acoustic Sensor Voiced Speech Activity Detection (NAVSAD) system and a Pathfinder Speech Activity Detection (PSAD) system. The noise removal and reduction methods provided herein, while allowing for the separation and classification of unvoiced and voiced human speech from background noise, address the shortcomings of typical systems known in the art by cleaning acoustic signals of interest without distortion. [0018]
  • FIG. 1 is a block diagram of a NAVSAD system 100, under an embodiment. The NAVSAD system couples microphones 10 and sensors 20 to at least one processor 30. The sensors 20 of an embodiment include voicing activity detectors or non-acoustic sensors. The processor 30 controls subsystems including a detection subsystem 50, referred to herein as a detection algorithm, and a denoising subsystem 40. Operation of the denoising subsystem 40 is described in detail in the Related Applications. The NAVSAD system works extremely well in any background acoustic noise environment. [0019]
  • FIG. 2 is a block diagram of a PSAD system 200, under an embodiment. The PSAD system couples microphones 10 to at least one processor 30. The processor 30 includes a detection subsystem 50, referred to herein as a detection algorithm, and a denoising subsystem 40. The PSAD system is highly sensitive in low acoustic noise environments and relatively insensitive in high acoustic noise environments. The PSAD can operate independently or as a backup to the NAVSAD, detecting voiced speech if the NAVSAD fails. [0020]
  • Note that the detection subsystems 50 and denoising subsystems 40 of both the NAVSAD and PSAD systems of an embodiment are algorithms controlled by the processor 30, but are not so limited. Alternative embodiments of the NAVSAD and PSAD systems can include detection subsystems 50 and/or denoising subsystems 40 that comprise additional hardware, firmware, software, and/or combinations of hardware, firmware, and software. Furthermore, functions of the detection subsystems 50 and denoising subsystems 40 may be distributed across numerous components of the NAVSAD and PSAD systems. [0021]
  • FIG. 3 is a block diagram of a denoising subsystem 300, referred to herein as the Pathfinder system, under an embodiment. The Pathfinder system is briefly described below, and is described in detail in the Related Applications. Two microphones Mic 1 and Mic 2 are used in the Pathfinder system, and Mic 1 is considered the “signal” microphone. With reference to FIG. 1, the Pathfinder system 300 is equivalent to the NAVSAD system 100 when the voicing activity detector (VAD) 320 is a non-acoustic voicing sensor 20 and the noise removal subsystem 340 includes the detection subsystem 50 and the denoising subsystem 40. With reference to FIG. 2, the Pathfinder system 300 is equivalent to the PSAD system 200 in the absence of the VAD 320, and when the noise removal subsystem 340 includes the detection subsystem 50 and the denoising subsystem 40. [0022]
  • The NAVSAD and PSAD systems support a two-level commercial approach in which (i) a relatively less expensive PSAD system supports an acoustic approach that functions in most low- to medium-noise environments, and (ii) a NAVSAD system adds a non-acoustic sensor to enable detection of voiced speech in any environment. Unvoiced speech is normally not detected using the sensor, as it normally does not sufficiently vibrate human tissue. However, in high noise situations detecting the unvoiced speech is not as important, as it is normally very low in energy and easily washed out by the noise. Therefore in high noise environments the unvoiced speech is unlikely to affect the voiced speech denoising. Unvoiced speech information is most important in the presence of little to no noise and, therefore, the unvoiced detection should be highly sensitive in low noise situations, and insensitive in high noise situations. This is not easily accomplished, and comparable acoustic unvoiced detectors known in the art are incapable of operating under these environmental constraints. [0023]
  • The NAVSAD and PSAD systems include an array algorithm for speech detection that uses the difference in frequency content between two microphones to calculate a relationship between the signals of the two microphones. This is in contrast to conventional arrays that attempt to use the time/phase difference of each microphone to remove the noise outside of an “area of sensitivity”. The methods described herein provide a significant advantage, as they do not require a specific orientation of the array with respect to the signal. [0024]
  • Further, the systems described herein are sensitive to noise of every type and every orientation, unlike conventional arrays that depend on specific noise orientations. Consequently, the frequency-based arrays presented herein are unique as they depend only on the relative orientation of the two microphones themselves with no dependence on the orientation of the noise and signal with respect to the microphones. This results in a robust signal processing system with respect to the type of noise, microphones, and orientation between the noise/signal source and the microphones. [0025]
  • The systems described herein use the information derived from the Pathfinder noise suppression system and/or a non-acoustic sensor described in the Related Applications to determine the voicing state of an input signal, as described in detail below. The voicing state includes silent, voiced, and unvoiced states. The NAVSAD system, for example, includes a non-acoustic sensor to detect the vibration of human tissue associated with speech. The non-acoustic sensor of an embodiment is a General Electromagnetic Movement Sensor (GEMS) as described briefly below and in detail in the Related Applications, but is not so limited. Alternative embodiments, however, may use any sensor that is able to detect human tissue motion associated with speech and is unaffected by environmental acoustic noise. [0026]
  • The GEMS is a radio frequency device (2.4 GHz) that allows the detection of moving human tissue dielectric interfaces. The GEMS includes an RF interferometer that uses homodyne mixing to detect small phase shifts associated with target motion. In essence, the sensor sends out weak electromagnetic waves (less than 1 milliwatt) that reflect off of whatever is around the sensor. The reflected waves are mixed with the original transmitted waves and the results analyzed for any change in position of the targets. Anything that moves near the sensor will cause a change in phase of the reflected wave that will be amplified and displayed as a change in voltage output from the sensor. A similar sensor is described by Gregory C. Burnett (1999) in “The physiological basis of glottal electromagnetic micropower sensors (GEMS) and their use in defining an excitation function for the human vocal tract”; Ph.D. Thesis, University of California at Davis. [0027]
  • FIG. 4 is a flow diagram of a detection algorithm 50 for use in detecting voiced and unvoiced speech, under an embodiment. With reference to FIGS. 1 and 2, both the NAVSAD and PSAD systems of an embodiment include the detection algorithm 50 as the detection subsystem 50. This detection algorithm 50 operates in real-time and, in an embodiment, operates on 20 millisecond windows and steps 10 milliseconds at a time, but is not so limited. The voice activity determination is recorded for the first 10 milliseconds, and the second 10 milliseconds functions as a “look-ahead” buffer. While an embodiment uses the 20/10 windows, alternative embodiments may use numerous other combinations of window values. [0028]
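The windowing scheme above can be sketched as follows. This is a minimal illustration, assuming an 8000 Hz sampling rate (the rate mentioned later in connection with the XCORR delays); the function name `frame_signal` and its parameters are hypothetical and are not taken from the Related Applications.

```python
def frame_signal(samples, fs=8000, window_ms=20, step_ms=10):
    """Yield (start_index, window) pairs for overlapping analysis windows.

    The voicing decision for each window would be recorded for its first
    step_ms milliseconds; the remainder acts as the look-ahead buffer.
    """
    win = fs * window_ms // 1000   # 160 samples at 8000 Hz
    step = fs * step_ms // 1000    # 80 samples at 8000 Hz
    for start in range(0, len(samples) - win + 1, step):
        yield start, samples[start:start + win]


# Example: 400 samples (50 ms) produce four 20 ms windows.
frames = list(frame_signal(list(range(400))))
```

At 8000 Hz each window holds 160 samples and consecutive windows overlap by 80 samples, so the look-ahead half of one window is the decision half of the next.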
  • Consideration was given to a number of multi-dimensional factors in developing the detection algorithm 50. The biggest consideration was maintaining the effectiveness of the Pathfinder denoising technique, described in detail in the Related Applications and reviewed herein. Pathfinder performance can be compromised if the adaptive filter training is conducted on speech rather than on noise. It is therefore important not to exclude any significant amount of speech from the VAD to keep such disturbances to a minimum. [0029]
  • Consideration was also given to the accuracy of the characterization between voiced and unvoiced speech signals, and distinguishing each of these speech signals from noise signals. This type of characterization can be useful in such applications as speech recognition and speaker verification. [0030]
  • Furthermore, the systems using the detection algorithm of an embodiment function in environments containing varying amounts of background acoustic noise. If the non-acoustic sensor is available, this external noise is not a problem for voiced speech. However, for unvoiced speech (and voiced if the non-acoustic sensor is not available or has malfunctioned) reliance is placed on acoustic data alone to separate noise from unvoiced speech. An advantage inheres in the use of two microphones in an embodiment of the Pathfinder noise suppression system, and the spatial relationship between the microphones is exploited to assist in the detection of unvoiced speech. However, there may occasionally be noise levels high enough that the speech will be nearly undetectable and the acoustic-only method will fail. In these situations, the non-acoustic sensor (or hereafter just the sensor) will be required to ensure good performance. [0031]
  • In the two-microphone system, the speech source should be relatively louder in one designated microphone when compared to the other microphone. Tests have shown that this requirement is easily met with conventional microphones when the microphones are placed on the head, as any noise should result in an H1 with a gain near unity. [0032]
  • Regarding the NAVSAD system, and with reference to FIG. 1 and FIG. 3, the NAVSAD relies on two parameters to detect voiced speech. These two parameters include the energy of the sensor in the window of interest, determined in an embodiment by the standard deviation (SD), and optionally the cross-correlation (XCORR) between the acoustic signal from microphone 1 and the sensor data. The energy of the sensor can be determined in any one of a number of ways, and the SD is just one convenient way to determine the energy. [0033]
  • For the sensor, the SD is akin to the energy of the signal, which normally corresponds quite accurately to the voicing state, but may be susceptible to movement noise (relative motion of the sensor with respect to the human user) and/or electromagnetic noise. To further differentiate sensor noise from tissue motion, the XCORR can be used. The XCORR is only calculated to 15 delays, which corresponds to just under 2 milliseconds at 8000 Hz. [0034]
  • The XCORR can also be useful when the sensor signal is distorted or modulated in some fashion. For example, there are sensor locations (such as the jaw or back of the neck) where speech production can be detected but where the signal may have incorrect or distorted time-based information. That is, the signals from these locations may not have well defined features in time that will match with the acoustic waveform. However, XCORR is more susceptible to errors from acoustic noise, and in high-noise (<0 dB SNR) environments is almost useless. Therefore it should not be the sole source of voicing information. [0035]
  • The sensor detects human tissue motion associated with the closure of the vocal folds, so the acoustic signal produced by the closure of the folds is highly correlated with the closures. Therefore, sensor data that correlates highly with the acoustic signal is declared as speech, and sensor data that does not correlate well is termed noise. The acoustic data is expected to lag behind the sensor data by about 0.1 to 0.8 milliseconds (or about 1-7 samples) as a result of the delay time due to the relatively slower speed of sound (around 330 m/s). However, an embodiment uses a 15-sample correlation, as the acoustic wave shape varies significantly depending on the sound produced, and a larger correlation width is needed to ensure detection. [0036]
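The timing figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the path lengths of 3.3 cm and 26.4 cm are assumed values chosen to reproduce the 0.1 and 0.8 millisecond delays at 330 m/s, and the helper names are hypothetical.

```python
FS_HZ = 8000            # sampling rate quoted in the text
SPEED_OF_SOUND = 330.0  # metres per second, as quoted in the text


def lag_in_samples(path_m, fs_hz=FS_HZ, c=SPEED_OF_SOUND):
    """Acoustic delay, in samples, for a propagation path of path_m metres."""
    return path_m / c * fs_hz


def lags_to_ms(n_lags, fs_hz=FS_HZ):
    """Time span, in milliseconds, covered by n_lags sample delays."""
    return n_lags / fs_hz * 1000.0


short_path = lag_in_samples(0.033)   # assumed 3.3 cm path, about 0.1 ms
long_path = lag_in_samples(0.264)    # assumed 26.4 cm path, about 0.8 ms
xcorr_span_ms = lags_to_ms(15)       # span of the 15-sample correlation
```

The two delays come out at roughly 0.8 and 6.4 samples, consistent with the "about 1-7 samples" stated above, and 15 delays span 1.875 milliseconds, which is the "just under 2 milliseconds" quoted for the XCORR.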
  • The SD and XCORR signals are related, but are sufficiently different so that the voiced speech detection is more reliable. For simplicity, though, either parameter may be used. The values for the SD and XCORR are compared to empirical thresholds, and if both are above their threshold, voiced speech is declared. Example data is presented and described below. [0037]
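A minimal sketch of the two-parameter test follows. The text does not give the exact XCORR formula, so the mean of the absolute normalized correlation over 15 lags is an assumption made here for illustration; the function names and the threshold values in the example are likewise hypothetical, since the actual thresholds are empirical.

```python
import math


def std_dev(x):
    """Population standard deviation, used as a proxy for sensor energy."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))


def mean_xcorr(sensor, mic, max_lag=15):
    """Mean absolute normalized correlation, with mic lagging the sensor."""
    n = len(sensor) - max_lag
    total = 0.0
    for lag in range(max_lag):
        a = sensor[:n]
        b = mic[lag:lag + n]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        total += abs(num) / den if den > 0 else 0.0
    return total / max_lag


def is_voiced(sensor, mic, sd_threshold, xcorr_threshold):
    """Declare voiced speech only when both parameters exceed their thresholds."""
    return (std_dev(sensor) > sd_threshold
            and mean_xcorr(sensor, mic) > xcorr_threshold)


# Example: a 100 Hz tone at the sensor, with the microphone lagging by 3 samples.
fs = 8000
sensor = [math.sin(2 * math.pi * 100 * i / fs) for i in range(160)]
mic = [math.sin(2 * math.pi * 100 * (i - 3) / fs) for i in range(160)]
voiced = is_voiced(sensor, mic, sd_threshold=0.1, xcorr_threshold=0.5)
silent = is_voiced([0.0] * 160, mic, sd_threshold=0.1, xcorr_threshold=0.5)
```

Requiring both tests to pass follows the rule stated above; dropping the XCORR term gives the simpler SD-only variant.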
  • FIGS. 5A, 5B, and 6 show data plots for an example in which a subject twice speaks the phrase “pop pan”, under an embodiment. FIG. 5A plots the received GEMS signal 502 for this utterance along with the mean correlation 504 between the GEMS signal and the Mic 1 signal and the threshold T1 used for voiced speech detection. FIG. 5B plots the received GEMS signal 502 for this utterance along with the standard deviation 506 of the GEMS signal and the threshold T2 used for voiced speech detection. FIG. 6 plots voiced speech 602 detected from the acoustic or audio signal 608, along with the GEMS signal 604 and the acoustic noise 606; no unvoiced speech is detected in this example because of the heavy background babble noise 606. The thresholds have been set so that there are virtually no false negatives, and only occasional false positives. A voiced speech activity detection accuracy of greater than 99% has been attained under any acoustic background noise conditions. [0038]
  • The NAVSAD can determine when voiced speech is occurring with high degrees of accuracy due to the non-acoustic sensor data. However, the sensor offers little assistance in separating unvoiced speech from noise, as unvoiced speech normally causes no detectable signal in most non-acoustic sensors. If there is a detectable signal, the NAVSAD can be used, although use of the SD method is dictated as unvoiced speech is normally poorly correlated. In the absence of a detectable signal use is made of the system and methods of the Pathfinder noise removal algorithm in determining when unvoiced speech is occurring. A brief review of the Pathfinder algorithm is described below, while a detailed description is provided in the Related Applications. [0039]
  • With reference to FIG. 3, the acoustic information coming into Microphone 1 is denoted by m1(n), the information coming into Microphone 2 is similarly labeled m2(n), and the GEMS sensor is assumed available to determine voiced speech areas. In the z (digital frequency) domain, these signals are represented as M1(z) and M2(z). Then [0040]

    $$M_1(z) = S(z) + N_2(z)$$
    $$M_2(z) = N(z) + S_2(z)$$

    with

    $$N_2(z) = N(z) H_1(z)$$
    $$S_2(z) = S(z) H_2(z)$$

    so that

    $$M_1(z) = S(z) + N(z) H_1(z)$$
    $$M_2(z) = N(z) + S(z) H_2(z) \qquad (1)$$

  • This is the general case for all two microphone systems. There is always going to be some leakage of noise into Mic 1, and some leakage of signal into Mic 2. Equation 1 has four unknowns and only two relationships and cannot be solved explicitly. [0041]
  • However, there is another way to solve for some of the unknowns in Equation 1. Examine the case where the signal is not being generated, that is, where the GEMS signal indicates voicing is not occurring. In this case, s(n)=S(z)=0, and Equation 1 reduces to [0042]

    $$M_{1n}(z) = N(z) H_1(z)$$
    $$M_{2n}(z) = N(z)$$

  • where the n subscript on the M variables indicates that only noise is being received. This leads to [0043]

    $$M_{1n}(z) = M_{2n}(z) H_1(z)$$
    $$H_1(z) = \frac{M_{1n}(z)}{M_{2n}(z)} \qquad (2)$$
  • H1(z) can be calculated using any of the available system identification algorithms and the microphone outputs when only noise is being received. The calculation can be done adaptively, so that if the noise changes significantly H1(z) can be recalculated quickly. [0044]
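As one possible illustration of the adaptive calculation, the sketch below estimates H1(z) as a short FIR filter with a normalized least-mean-squares (NLMS) update. NLMS is an assumption made here; the text permits any system identification algorithm. The function name, tap count, step size, and the synthetic noise path are all hypothetical.

```python
import random


def nlms_identify(m2, m1, taps=4, mu=0.5, eps=1e-8):
    """Estimate FIR coefficients h so that m1[n] ~ sum_k h[k] * m2[n - k].

    Intended to run only on windows flagged as noise-only, matching the
    condition under which Equation 2 holds.
    """
    h = [0.0] * taps
    for n in range(taps - 1, len(m2)):
        x = [m2[n - k] for k in range(taps)]            # most recent sample first
        error = m1[n] - sum(hk * xk for hk, xk in zip(h, x))
        power = sum(xk * xk for xk in x) + eps          # normalization term
        h = [hk + mu * error * xk / power for hk, xk in zip(h, x)]
    return h


# Synthetic noise-only example with a known (made-up) noise path.
rng = random.Random(0)
true_h1 = [0.9, 0.1, -0.05, 0.0]
noise = [rng.uniform(-1.0, 1.0) for _ in range(2000)]   # Mic 2 sees the noise
mic1 = [sum(true_h1[k] * noise[n - k] for k in range(4) if n - k >= 0)
        for n in range(len(noise))]                     # Mic 1 sees filtered noise
h1_est = nlms_identify(noise, mic1)
```

Because the update runs sample by sample, the estimate can follow a changing noise field, which is the property the paragraph above relies on.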
  • With a solution for one of the unknowns in Equation 1, solutions can be found for another, H2(z), by using the amplitude of the GEMS or similar device along with the amplitude of the two microphones. When the GEMS indicates voicing, but the recent (less than 1 second) history of the microphones indicates low levels of noise, assume that n(n)=N(z)≈0. Then Equation 1 reduces to [0045]

    $$M_{1s}(z) = S(z)$$
    $$M_{2s}(z) = S(z) H_2(z)$$

  • which in turn leads to [0046]

    $$M_{2s}(z) = M_{1s}(z) H_2(z)$$
    $$H_2(z) = \frac{M_{2s}(z)}{M_{1s}(z)}$$

  • which is the inverse of the H1(z) calculation, but note that different inputs are being used. [0047]
  • After calculating H1(z) and H2(z) above, they are used to remove the noise from the signal. Rewrite Equation 1 as [0048]

    $$S(z) = M_1(z) - N(z) H_1(z)$$
    $$N(z) = M_2(z) - S(z) H_2(z)$$
    $$S(z) = M_1(z) - \left[ M_2(z) - S(z) H_2(z) \right] H_1(z)$$
    $$S(z) \left[ 1 - H_2(z) H_1(z) \right] = M_1(z) - M_2(z) H_1(z)$$

  • and solve for S(z) as: [0049]

    $$S(z) = \frac{M_1(z) - M_2(z) H_1(z)}{1 - H_2(z) H_1(z)} \qquad (3)$$
  • In practice H2(z) is usually quite small, so that H2(z)H1(z)<<1, and [0050]

    $$S(z) \approx M_1(z) - M_2(z) H_1(z),$$

  • obviating the need for the H2(z) calculation. [0051]
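In the time domain, the approximation S(z) ≈ M1(z) − M2(z)H1(z) amounts to filtering the Mic 2 signal with H1 and subtracting the result from the Mic 1 signal. The following is a minimal sketch under the assumption that H1 is available as FIR coefficients and that H2 is negligible; the function name and the test signals are hypothetical.

```python
import math
import random


def denoise(m1, m2, h1):
    """Return s[n] ~ m1[n] - sum_k h1[k] * m2[n - k]."""
    cleaned = []
    for n in range(len(m1)):
        noise_estimate = sum(h1[k] * m2[n - k]
                             for k in range(len(h1)) if n - k >= 0)
        cleaned.append(m1[n] - noise_estimate)
    return cleaned


# Synthetic example: Mic 2 hears only noise (H2 = 0), Mic 1 hears speech
# plus the same noise filtered by a made-up H1.
rng = random.Random(1)
h1 = [0.8, 0.2]
noise = [rng.uniform(-1.0, 1.0) for _ in range(200)]
speech = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(200)]
mic1 = [speech[n] + sum(h1[k] * noise[n - k] for k in range(2) if n - k >= 0)
        for n in range(200)]
recovered = denoise(mic1, noise, h1)
```

With H2 exactly zero and H1 known, the speech is recovered up to rounding error; in practice the quality depends on how well H1 was estimated during noise-only periods.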
  • With reference to FIG. 2 and FIG. 3, the PSAD system is described. As sound waves propagate, they normally lose energy as they travel due to diffraction and dispersion. Assuming the sound waves originate from a point source and radiate isotropically, their amplitude will decrease as a function of 1/r, where r is the distance from the originating point. This 1/r amplitude falloff is the worst case; if the sound is confined to a smaller area, the reduction will be less. However, it is an adequate model for the configurations of interest, specifically the propagation of noise and speech to microphones located somewhere on the user's head. [0052]
  • FIG. 7 is a microphone array for use under an embodiment of the PSAD system. Placing the microphones Mic 1 and Mic 2 in a linear array with the mouth on the array midline, the difference in signal strength in Mic 1 and Mic 2 (assuming the microphones have identical frequency responses) will be proportional to both d1 and Δd. Assuming a 1/r (or in this case 1/d) relationship, it is seen that [0053]

    $$\Delta M = \frac{\mathrm{Mic1}}{\mathrm{Mic2}} = \Delta H_1(z) \propto \frac{d_1 + \Delta d}{d_1},$$

  • where ΔM is the difference in gain between Mic 1 and Mic 2 and therefore H1(z), as above in Equation 2. The variable d1 is the distance from Mic 1 to the speech or noise source. [0054]
  • FIG. 8 is a plot 800 of ΔM versus d1 for several Δd values, under an embodiment. It is clear that as Δd becomes larger and the noise source is closer, ΔM becomes larger. The variable Δd will change depending on the orientation to the speech/noise source, from the maximum value on the array midline to zero perpendicular to the array midline. From the plot 800 it is clear that for small Δd and for distances over approximately 30 centimeters (cm), ΔM is close to unity. Since most noise sources are farther away than 30 cm and are unlikely to be on the midline on the array, it is probable that when calculating H1(z) as above in Equation 2, ΔM (or equivalently the gain of H1(z)) will be close to unity. Conversely, for noise sources that are close (within a few centimeters), there could be a substantial difference in gain depending on which microphone is closer to the noise. [0055]
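The behavior described for plot 800 follows directly from the 1/d model. The short sketch below evaluates (d1 + Δd)/d1 for a few distances; the specific distances are illustrative assumptions and are not read from FIG. 8, and the function name is hypothetical.

```python
def delta_m(d1_cm, delta_d_cm):
    """Gain ratio between Mic 1 and Mic 2 under a 1/d amplitude model."""
    return (d1_cm + delta_d_cm) / d1_cm


near_source = delta_m(2.0, 5.0)     # source 2 cm from Mic 1, mics 5 cm apart
mid_source = delta_m(30.0, 2.0)     # source 30 cm away, small path difference
far_source = delta_m(100.0, 2.0)    # source 1 m away, small path difference
```

A source a couple of centimeters from Mic 1 gives a ratio of 3.5, while sources at 30 cm and 1 m give roughly 1.07 and 1.02, which is the near-unity gain expected for typical environmental noise.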
  • If the “noise” is the user speaking, and Mic 1 is closer to the mouth than Mic 2, the gain increases. Since environmental noise normally originates much farther away from the user's head than speech, noise will be found during the time when the gain of H1(z) is near unity or some fixed value, and speech can be found after a sharp rise in gain. The speech can be unvoiced or voiced, as long as it is of sufficient volume compared to the surrounding noise. The gain will stay somewhat high during the speech portions, then descend quickly after speech ceases. The rapid increase and decrease in the gain of H1(z) should be sufficient to allow the detection of speech under almost any circumstances. The gain in this example is calculated by the sum of the absolute value of the filter coefficients. This sum is not equivalent to the gain, but the two are related in that a rise in the sum of the absolute value reflects a rise in the gain. [0056]
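The gain parameter described above can be sketched in a few lines. The coefficient values below are made up for illustration, and the function names are hypothetical. The sketch also evaluates the true magnitude response of the filter to show how the two quantities relate: the sum of absolute coefficient values is an upper bound on the magnitude response at every frequency.

```python
import cmath


def gain_parameter(h):
    """Sum of the absolute values of the FIR coefficients of H1."""
    return sum(abs(c) for c in h)


def magnitude_response(h, omega):
    """|H(e^{j*omega})| for FIR coefficients h at digital frequency omega."""
    return abs(sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(h)))


noise_h1 = [0.9, 0.1, -0.05]    # hypothetical H1 trained on distant noise
speech_h1 = [2.5, -0.6, 0.3]    # hypothetical H1 while the user is speaking

noise_gain = gain_parameter(noise_h1)
speech_gain = gain_parameter(speech_h1)
```

Here the parameter rises from about 1.05 to about 3.4 between the two filters, which is the kind of sharp rise the detector looks for.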
  • As an example of this behavior, FIG. 9 shows a plot 900 of the gain parameter 902 as the sum of the absolute values of H1(z) and the acoustic data 904 or audio from microphone 1. The speech signal was an utterance of the phrase “pop pan”, repeated twice. The evaluated bandwidth included the frequency range from 2500 Hz to 3500 Hz, although 1500 Hz to 2500 Hz was additionally used in practice. Note the rapid increase in the gain when the unvoiced speech is first encountered, then the rapid return to normal when the speech ends. The large changes in gain that result from transitions between noise and speech can be detected by any standard signal processing techniques. The standard deviation of the last few gain calculations is used, with thresholds being defined by a running average of the standard deviations and the standard deviation noise floor. The later changes in gain for the voiced speech are suppressed in this plot 900 for clarity. [0057]
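One way to read the thresholding rule above is sketched below. The text does not specify the history length, the multiplier on the running average, the noise floor value, or the averaging constant, so every parameter here is an assumption, as is the function name. The sketch flags windows where the gain is changing sharply, that is, the transitions between noise and speech.

```python
from collections import deque
import statistics


def detect_gain_changes(gains, history=5, factor=3.0, noise_floor=0.05, alpha=0.9):
    """Flag windows where the SD of recent gain values exceeds a threshold.

    The threshold is the larger of a fixed noise floor and a multiple of the
    running average of the SD, which is updated only on unflagged windows.
    """
    recent = deque(maxlen=history)
    running_avg = None
    flags = []
    for g in gains:
        recent.append(g)
        sd = statistics.pstdev(recent) if len(recent) > 1 else 0.0
        if running_avg is None:
            running_avg = sd
        threshold = max(factor * running_avg, noise_floor)
        active = sd > threshold
        flags.append(active)
        if not active:
            running_avg = alpha * running_avg + (1 - alpha) * sd
    return flags


# Hypothetical gain track: noise, a burst of speech, then noise again.
gain_track = [1.0] * 20 + [3.0] * 10 + [1.0] * 20
flags = detect_gain_changes(gain_track)
```

In this example the flags are raised for a few windows at the rise (index 20) and again at the fall (index 30); a complete detector would treat the span between the two as speech.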
  • FIG. 10 is an alternative plot 1000 of acoustic data presented in FIG. 9. The data used to form plot 900 is presented again in this plot 1000, along with audio data 1004 and GEMS data 1006 without noise to make the unvoiced speech apparent. The voiced signal 1002 has three possible values: 0 for noise, 1 for unvoiced, and 2 for voiced. Denoising is only accomplished when V=0. It is clear that the unvoiced speech is captured very well, aside from two single dropouts in the unvoiced detection near the end of each “pop”. However, these single-window dropouts are not common and do not significantly affect the denoising algorithm. They can easily be removed using standard smoothing techniques. [0058]
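As one example of such smoothing, the sketch below fills any single window whose label differs from two agreeing neighbors. The text does not name a specific technique, so this rule and the function name are assumptions; the labels follow the 0/1/2 convention of the voiced signal 1002.

```python
def fill_single_dropouts(labels):
    """Replace any single-window label that differs from two agreeing neighbors."""
    smoothed = list(labels)
    for i in range(1, len(labels) - 1):
        if labels[i - 1] == labels[i + 1] != labels[i]:
            smoothed[i] = labels[i - 1]
    return smoothed


# 0 = noise, 1 = unvoiced, 2 = voiced; index 3 is a single-window dropout.
raw = [0, 1, 1, 0, 1, 2, 2, 2, 0]
smoothed = fill_single_dropouts(raw)
```

Note that the same rule also removes isolated single-window detections surrounded by noise, which may or may not be desirable depending on the application.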
  • What is not clear from this plot 1000 is that the PSAD system functions as an automatic backup to the NAVSAD. This is because the voiced speech (since it has the same spatial relationship to the mics as the unvoiced) will be detected as unvoiced if the sensor or NAVSAD system fails for any reason. The voiced speech will be misclassified as unvoiced, but the denoising will still not take place, preserving the quality of the speech signal. [0059]
  • However, this automatic backup of the NAVSAD system functions best in an environment with low noise (approximately 10+ dB SNR), as high amounts (10 dB of SNR or less) of acoustic noise can quickly overwhelm any acoustic-only unvoiced detector, including the PSAD. This is evident in the difference in the voiced signal data 602 and 1002 shown in plots 600 and 1000 of FIGS. 6 and 10, respectively, where the same utterance is spoken, but the data of plot 600 shows no unvoiced speech because the unvoiced speech is undetectable. This is the desired behavior when performing denoising, since if the unvoiced speech is not detectable then it will not significantly affect the denoising process. Using the Pathfinder system to detect unvoiced speech ensures detection of any unvoiced speech loud enough to distort the denoising. [0060]
  • Regarding hardware considerations, and with reference to FIG. 7, the configuration of the microphones can have an effect on the change in gain associated with speech and the thresholds needed to detect speech. In general, each configuration will require testing to determine the proper thresholds, but tests with two very different microphone configurations showed the same thresholds and other parameters to work well. The first microphone set had the signal microphone near the mouth and the noise microphone several centimeters away at the ear, while the second configuration placed the noise and signal microphones back-to-back within a few centimeters of the mouth. The results presented herein were derived using the first microphone configuration, but the results using the other set are virtually identical, so the detection algorithm is relatively robust with respect to microphone placement. [0061]
  • A number of configurations are possible using the NAVSAD and PSAD systems to detect voiced and unvoiced speech. One configuration uses the NAVSAD system (non-acoustic only) to detect voiced speech along with the PSAD system to detect unvoiced speech; the PSAD also functions as a backup to the NAVSAD system for detecting voiced speech. An alternative configuration uses the NAVSAD system (non-acoustic correlated with acoustic) to detect voiced speech along with the PSAD system to detect unvoiced speech; the PSAD also functions as a backup to the NAVSAD system for detecting voiced speech. Another alternative configuration uses the PSAD system to detect both voiced and unvoiced speech. [0062]
  • While the systems described above have been described with reference to separating voiced and unvoiced speech from background acoustic noise, there is no reason more complex classifications cannot be made. For more in-depth characterization of speech, the system can bandpass the information from Mic 1 and Mic 2 so that it is possible to see which bands in the Mic 1 data are more heavily composed of noise and which are more weighted with speech. Using this knowledge, it is possible to group the utterances by their spectral characteristics similar to conventional acoustic methods; this method would work better in noisy environments. [0063]
  • As an example, the “k” in “kick” has significant frequency content from 500 Hz to 4000 Hz, but a “sh” in “she” only contains significant energy from 1700-4000 Hz. Voiced speech could be classified in a similar manner. For instance, an /i/ (“ee”) has significant energy around 300 Hz and 2500 Hz, and an /a/ (“ah”) has energy at around 900 Hz and 1200 Hz. This ability to discriminate unvoiced and voiced speech in the presence of noise is, thus, very useful. [0064]
  • Each of the steps depicted in the flow diagrams presented herein can itself include a sequence of operations that need not be described herein. Those skilled in the relevant art can create routines, algorithms, source code, microcode, program logic arrays or otherwise implement the invention based on the flow diagrams and the detailed description provided herein. The routines described herein can be provided with one or more of the following, or one or more combinations of the following: stored in non-volatile memory (not shown) that forms part of an associated processor or processors, or implemented using conventional programmed logic arrays or circuit elements, or stored in removable media such as disks, or downloaded from a server and stored locally at a client, or hardwired or preprogrammed in chips such as EEPROM semiconductor chips, application specific integrated circuits (ASICs), or by digital signal processing (DSP) integrated circuits. [0065]
  • Unless described otherwise herein, the information described herein is well known or described in detail in the Related Applications. Indeed, much of the detailed description provided herein is explicitly disclosed in the Related Applications; most or all of the additional material of aspects of the invention will be recognized by those skilled in the relevant art as being inherent in the detailed description provided in such Related Applications, or well known to those skilled in the relevant art. Those skilled in the relevant art can implement aspects of the invention based on the material presented herein and the detailed description provided in the Related Applications. [0066]
  • Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words “herein,” “hereunder,” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of this application. [0067]
  • The above description of illustrated embodiments of the invention is not intended to be exhaustive or to limit the invention to the precise form disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize. The teachings of the invention provided herein can be applied to signal processing systems, not only for the speech signal processing described above. Further, the elements and acts of the various embodiments described above can be combined to provide further embodiments. [0068]
  • All of the above references and Related Applications are incorporated herein by reference. Aspects of the invention can be modified, if necessary, to employ the systems, functions and concepts of the various references described above to provide yet further embodiments of the invention. [0069]
  • These and other changes can be made to the invention in light of the above detailed description. In general, in the following claims, the terms used should not be construed to limit the invention to the specific embodiments disclosed in the specification and the claims, but should be construed to include all speech signal systems that operate under the claims to provide a method for procurement. Accordingly, the invention is not limited by the disclosure, but instead the scope of the invention is to be determined entirely by the claims. [0070]
  • While certain aspects of the invention are presented below in certain claim forms, the inventor contemplates the various aspects of the invention in any number of claim forms. Thus, the inventor reserves the right to add additional claims after filing the application to pursue such additional claim forms for other aspects of the invention. [0071]

Claims (7)

What I claim is:
1. A system for detecting voiced and unvoiced speech in acoustic signals having varying levels of background noise, comprising:
at least two microphones for receiving the acoustic signals;
at least one processor coupled among the microphones, wherein the at least one processor:
generates difference parameters between the acoustic signals received at each of the two microphones, wherein the difference parameters are representative of the relative difference in signal gain between portions of the received acoustic signals;
identifies information of the acoustic signals as unvoiced speech when the difference parameters exceed a first threshold; and
identifies information of the acoustic signals as voiced speech when the difference parameters exceed a second threshold.
2. A method for detecting voiced and unvoiced speech in acoustic signals having varying levels of background noise, comprising:
receiving the acoustic signals at two receivers;
generating difference parameters between the acoustic signals received at each of the two receivers, wherein the difference parameters are representative of the relative difference in signal gain between portions of the received acoustic signals;
identifying information of the acoustic signals as unvoiced speech when the difference parameters exceed a first threshold; and
identifying information of the acoustic signals as voiced speech when the difference parameters exceed a second threshold.
3. The method of claim 2, further comprising generating the first and second thresholds using standard deviations corresponding to the generation of the difference parameters.
4. The method of claim 2, further comprising:
identifying information of the acoustic signals as noise when the difference parameters are less than the first threshold; and
performing denoising on the identified noise.
5. The method of claim 2, further comprising receiving physiological information associated with human voicing activity, wherein the physiological information comprises receiving physiological data associated with human voicing using at least one detector selected from a group including radio frequency devices, electroglottographs, ultrasound devices, acoustic throat microphones, and airflow detectors.
6. A system for detecting voiced and unvoiced speech in acoustic signals having varying levels of background noise, comprising:
at least two microphones that receive the acoustic signals;
at least one voicing sensor that receives physiological information associated with human voicing activity; and
at least one processor coupled among the microphones and the voicing sensor, wherein the at least one processor:
generates cross correlation data between the physiological information and an acoustic signal received at one of the two microphones;
identifies information of the acoustic signals as voiced speech when the cross correlation data corresponding to a portion of the acoustic signal received at the one receiver exceeds a correlation threshold;
generates difference parameters between the acoustic signals received at each of the two receivers, wherein the difference parameters are representative of the relative difference in signal gain between portions of the received acoustic signals;
identifies information of the acoustic signals as unvoiced speech when the difference parameters exceed a gain threshold; and
identifies information of the acoustic signals as noise when the difference parameters are less than the gain threshold.
7. A method for removing noise from acoustic signals, comprising:
receiving the acoustic signals at two receivers and receiving physiological information associated with human voicing activity at a voicing sensor;
generating cross correlation data between the physiological information and an acoustic signal received at one of the two receivers;
identifying information of the acoustic signals as voiced speech when the cross correlation data corresponding to a portion of the acoustic signal received at the one receiver exceeds a correlation threshold;
generating difference parameters between the acoustic signals received at each of the two receivers, wherein the difference parameters are representative of the relative difference in signal gain between portions of the received acoustic signals;
identifying information of the acoustic signals as unvoiced speech when the difference parameters exceed a gain threshold; and
identifying information of the acoustic signals as noise when the difference parameters are less than the gain threshold.
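
The decision logic recited in claims 2 to 4 and claim 7 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation described in the specification. Several details are assumptions made here for the example: the difference parameter is taken to be a per-frame RMS gain ratio between the two microphones, the thresholds are placed a fixed number of standard deviations above the mean gain measured on noise-only frames, the second (voiced) threshold is assumed to be higher than the first, the cross correlation is a zero-lag normalized correlation, and all function names and the constants k_unvoiced and k_voiced are hypothetical.

```python
import math
from statistics import mean, pstdev


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def gain_difference(frame_mic1, frame_mic2, eps=1e-12):
    """Relative gain between the two microphones for one frame.

    Assumption: an RMS energy ratio stands in for the claimed
    "difference parameters"; eps avoids division by zero.
    """
    return math.sqrt((frame_energy(frame_mic1) + eps) / (frame_energy(frame_mic2) + eps))


def thresholds_from_noise(noise_gains, k_unvoiced=2.0, k_voiced=6.0):
    """Place two thresholds k standard deviations above the noise-only mean gain.

    The multipliers are hypothetical; claim 3 only states that standard
    deviations are used to generate the thresholds.
    """
    mu = mean(noise_gains)
    sigma = pstdev(noise_gains)
    return mu + k_unvoiced * sigma, mu + k_voiced * sigma


def classify_by_gain(gain, first_threshold, second_threshold):
    """Microphone-only labeling in the spirit of claims 2 and 4.

    Assumes second_threshold >= first_threshold; a gain equal to a
    threshold is treated as not exceeding it.
    """
    if gain > second_threshold:
        return "voiced"
    if gain > first_threshold:
        return "unvoiced"
    return "noise"


def normalized_xcorr(a, b):
    """Zero-lag normalized cross-correlation of two equal-length frames."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def classify_frame(mic1, mic2, sensor, corr_threshold, gain_threshold):
    """Label one frame using the voicing sensor first, then the gain test (claim 7 order)."""
    if normalized_xcorr(sensor, mic1) > corr_threshold:
        return "voiced"
    if gain_difference(mic1, mic2) > gain_threshold:
        return "unvoiced"
    return "noise"
```

In use, thresholds_from_noise would be called on gains measured while no speech is present, and classify_by_gain or classify_frame would then be applied frame by frame. Frames labeled "noise" are the ones that claim 4 passes on to denoising.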
US10/159,770 2000-07-19 2002-05-30 Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors Expired - Lifetime US7246058B2 (en)

Priority Applications (7)

Application Number Publication Number Priority Date Filing Date Title
US10/159,770 US7246058B2 (en) 2001-05-30 2002-05-30 Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors
US11/805,987 US20070233479A1 (en) 2002-05-30 2007-05-25 Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors
US13/431,725 US10225649B2 (en) 2000-07-19 2012-03-27 Microphone array with rear venting
US13/436,765 US8682018B2 (en) 2000-07-19 2012-03-30 Microphone array with rear venting
US13/753,441 US8942383B2 (en) 2001-05-30 2013-01-29 Wind suppression/replacement component for use with electronic systems
US13/919,919 US20140372113A1 (en) 2001-07-12 2013-06-17 Microphone and voice activity detection (vad) configurations for use with communication systems
US14/224,868 US20140286519A1 (en) 2000-07-19 2014-03-25 Microphone array with rear venting

Applications Claiming Priority (12)

Application Number Priority Date Filing Date Title
US29438301P 2001-05-30 2001-05-30
US33510001P 2001-10-30 2001-10-30
US33220201P 2001-11-21 2001-11-21
US36210302P 2002-03-05 2002-03-05
US36216202P 2002-03-05 2002-03-05
US36217002P 2002-03-05 2002-03-05
US36216102P 2002-03-05 2002-03-05
US36198102P 2002-03-05 2002-03-05
US36834302P 2002-03-27 2002-03-27
US36820902P 2002-03-27 2002-03-27
US36820802P 2002-03-27 2002-03-27
US10/159,770 US7246058B2 (en) 2001-05-30 2002-05-30 Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors

Related Parent Applications (1)

Application Number Relation Publication Number Priority Date Filing Date Title
US12/606,140 Continuation-In-Part US8326611B2 (en) 2001-05-30 2009-10-26 Acoustic voice activity detection (AVAD) for electronic systems

Related Child Applications (2)

Application Number Relation Publication Number Priority Date Filing Date Title
US10/769,302 Continuation US7433484B2 (en) 2001-05-30 2004-01-30 Acoustic vibration sensor
US11/805,987 Continuation US20070233479A1 (en) 2000-07-19 2007-05-25 Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors

Publications (2)

Publication Number Publication Date
US20020198705A1 (en) 2002-12-26
US7246058B2 (en) 2007-07-17

Family

ID=27583771

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
US10/159,770 Expired - Lifetime US7246058B2 (en) 2000-07-19 2002-05-30 Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors

Country Status (1)

Country Link
US (1) US7246058B2 (en)

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050027522A1 (en) * 2003-07-30 2005-02-03 Koichi Yamamoto Speech recognition method and apparatus therefor
US6961623B2 (en) 2002-10-17 2005-11-01 Rehabtronics Inc. Method and apparatus for controlling a device or process with vibrations generated by tooth clicks
US20060120537A1 (en) * 2004-08-06 2006-06-08 Burnett Gregory C Noise suppressing multi-microphone headset
US20070118375A1 (en) * 1999-09-21 2007-05-24 Kenyon Stephen C Audio Identification System And Method
US20090003640A1 (en) * 2003-03-27 2009-01-01 Burnett Gregory C Microphone Array With Rear Venting
US20090254350A1 (en) * 2006-07-13 2009-10-08 Nec Corporation Apparatus, Method and Program for Giving Warning in Connection with inputting of unvoiced Speech
US20100036657A1 (en) * 2006-11-20 2010-02-11 Mitsunori Morisaki Speech estimation system, speech estimation method, and speech estimation program
US20100095210A1 (en) * 2003-08-08 2010-04-15 Audioeye, Inc. Method and Apparatus for Website Navigation by the Visually Impaired
US20110071825A1 (en) * 2008-05-28 2011-03-24 Tadashi Emori Device, method and program for voice detection and recording medium
US20130024194A1 (en) * 2010-11-25 2013-01-24 Goertek Inc. Speech enhancing method and device, and nenoising communication headphone enhancing method and device, and denoising communication headphones
US8577057B2 (en) 2010-11-02 2013-11-05 Robert Bosch Gmbh Digital dual microphone module with intelligent cross fading
US20130317809A1 (en) * 2010-08-24 2013-11-28 Lawrence Livermore National Security, Llc Speech masking and cancelling and voice obscuration
US20150081285A1 (en) * 2013-09-16 2015-03-19 Samsung Electronics Co., Ltd. Speech signal processing apparatus and method for enhancing speech intelligibility
US9047371B2 (en) 2010-07-29 2015-06-02 Soundhound, Inc. System and method for matching a query against a broadcast stream
US9066186B2 (en) 2003-01-30 2015-06-23 Aliphcom Light-based detection for acoustic applications
US9099094B2 (en) 2003-03-27 2015-08-04 Aliphcom Microphone array with rear venting
US9196261B2 (en) 2000-07-19 2015-11-24 Aliphcom Voice activity detector (VAD)—based multiple-microphone acoustic noise suppression
US9292488B2 (en) 2014-02-01 2016-03-22 Soundhound, Inc. Method for embedding voice mail in a spoken utterance using a natural language processing computer system
US9390167B2 (en) 2010-07-29 2016-07-12 Soundhound, Inc. System and methods for continuous audio matching
US9507849B2 (en) 2013-11-28 2016-11-29 Soundhound, Inc. Method for combining a query and a communication command in a natural language computer system
US9516442B1 (en) 2012-09-28 2016-12-06 Apple Inc. Detecting the positions of earbuds and use of these positions for selecting the optimum microphones in a headset
US9564123B1 (en) 2014-05-12 2017-02-07 Soundhound, Inc. Method and system for building an integrated user profile
US9715626B2 (en) 1999-09-21 2017-07-25 Iceberg Industries, Llc Method and apparatus for automatically recognizing input audio and/or video streams
US10121165B1 (en) 2011-05-10 2018-11-06 Soundhound, Inc. System and method for targeting content based on identified audio and multimedia
US10423709B1 (en) 2018-08-16 2019-09-24 Audioeye, Inc. Systems, devices, and methods for automated and programmatic creation and deployment of remediations to non-compliant web pages or user interfaces
US10444934B2 (en) 2016-03-18 2019-10-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10867120B1 (en) 2016-03-18 2020-12-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10896286B2 (en) 2016-03-18 2021-01-19 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10957310B1 (en) 2012-07-23 2021-03-23 Soundhound, Inc. Integrated programming framework for speech and text understanding with meaning parsing
US11064296B2 (en) * 2017-12-28 2021-07-13 Iflytek Co., Ltd. Voice denoising method and apparatus, server and storage medium
US11295730B1 (en) 2014-02-27 2022-04-05 Soundhound, Inc. Using phonetic variants in a local context to improve natural language understanding
US11627413B2 (en) 2012-11-05 2023-04-11 Jawbone Innovations, Llc Acoustic voice activity detection (AVAD) for electronic systems
US11727195B2 (en) 2016-03-18 2023-08-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies

Families Citing this family (65)

Publication number Priority date Publication date Assignee Title
US8280072B2 (en) 2003-03-27 2012-10-02 Aliphcom, Inc. Microphone array with rear venting
US8326611B2 (en) * 2007-05-25 2012-12-04 Aliphcom, Inc. Acoustic voice activity detection (AVAD) for electronic systems
US8452023B2 (en) 2007-05-25 2013-05-28 Aliphcom Wind suppression/replacement component for use with electronic systems
DE60333200D1 (en) * 2002-08-30 2010-08-12 Nat Univ Corp Nara Inst Microphone and communication interface system
US7574008B2 (en) * 2004-09-17 2009-08-11 Microsoft Corporation Method and apparatus for multi-sensory speech enhancement
US8345890B2 (en) * 2006-01-05 2013-01-01 Audience, Inc. System and method for utilizing inter-microphone level differences for speech enhancement
US9185487B2 (en) 2006-01-30 2015-11-10 Audience, Inc. System and method for providing noise suppression utilizing null processing noise subtraction
US8194880B2 (en) * 2006-01-30 2012-06-05 Audience, Inc. System and method for utilizing omni-directional microphones for speech enhancement
US8204252B1 (en) 2006-10-10 2012-06-19 Audience, Inc. System and method for providing close microphone adaptive array processing
US8744844B2 (en) 2007-07-06 2014-06-03 Audience, Inc. System and method for adaptive intelligent noise suppression
US20070276658A1 (en) * 2006-05-23 2007-11-29 Barry Grayson Douglass Apparatus and Method for Detecting Speech Using Acoustic Signals Outside the Audible Frequency Range
US8150065B2 (en) 2006-05-25 2012-04-03 Audience, Inc. System and method for processing an audio signal
US8934641B2 (en) 2006-05-25 2015-01-13 Audience, Inc. Systems and methods for reconstructing decomposed audio signals
US8949120B1 (en) 2006-05-25 2015-02-03 Audience, Inc. Adaptive noise cancelation
US8849231B1 (en) 2007-08-08 2014-09-30 Audience, Inc. System and method for adaptive power control
US8204253B1 (en) 2008-06-30 2012-06-19 Audience, Inc. Self calibration of audio device
US7844070B2 (en) 2006-05-30 2010-11-30 Sonitus Medical, Inc. Methods and apparatus for processing audio signals
US8291912B2 (en) 2006-08-22 2012-10-23 Sonitus Medical, Inc. Systems for manufacturing oral-based hearing aid appliances
US20080260169A1 (en) * 2006-11-06 2008-10-23 Plantronics, Inc. Headset Derived Real Time Presence And Communication Systems And Methods
US9591392B2 (en) * 2006-11-06 2017-03-07 Plantronics, Inc. Headset-derived real-time presence and communication systems and methods
US8259926B1 (en) 2007-02-23 2012-09-04 Audience, Inc. System and method for 2-channel and 3-channel acoustic echo cancellation
RU2440627C2 (en) 2007-02-26 2012-01-20 Долби Лэборетериз Лайсенсинг Корпорейшн Increasing speech intelligibility in sound recordings of entertainment programmes
US8503686B2 (en) 2007-05-25 2013-08-06 Aliphcom Vibration sensor and acoustic voice activity detection system (VADS) for use with electronic systems
US8321213B2 (en) * 2007-05-25 2012-11-27 Aliphcom, Inc. Acoustic voice activity detection (AVAD) for electronic systems
US20100098269A1 (en) * 2008-10-16 2010-04-22 Sonitus Medical, Inc. Systems and methods to provide communication, positioning and monitoring of user status
US8270638B2 (en) 2007-05-29 2012-09-18 Sonitus Medical, Inc. Systems and methods to provide communication, positioning and monitoring of user status
US8189766B1 (en) 2007-07-26 2012-05-29 Audience, Inc. System and method for blind subband acoustic echo cancellation postfiltering
US8433080B2 (en) 2007-08-22 2013-04-30 Sonitus Medical, Inc. Bone conduction hearing device with open-ear microphone
US8224013B2 (en) 2007-08-27 2012-07-17 Sonitus Medical, Inc. Headset systems and methods
US7682303B2 (en) 2007-10-02 2010-03-23 Sonitus Medical, Inc. Methods and apparatus for transmitting vibrations
US8795172B2 (en) 2007-12-07 2014-08-05 Sonitus Medical, Inc. Systems and methods to provide two-way communications
US8180064B1 (en) 2007-12-21 2012-05-15 Audience, Inc. System and method for providing voice equalization
US8143620B1 (en) 2007-12-21 2012-03-27 Audience, Inc. System and method for adaptive classification of audio sources
US8055307B2 (en) 2008-01-18 2011-11-08 Aliphcom, Inc. Wireless handsfree headset method and system with handsfree applications
US7974845B2 (en) 2008-02-15 2011-07-05 Sonitus Medical, Inc. Stuttering treatment methods and apparatus
US8270637B2 (en) 2008-02-15 2012-09-18 Sonitus Medical, Inc. Headset systems and methods
US8194882B2 (en) 2008-02-29 2012-06-05 Audience, Inc. System and method for providing single microphone noise suppression fallback
US8023676B2 (en) 2008-03-03 2011-09-20 Sonitus Medical, Inc. Systems and methods to provide communication and monitoring of user status
US20090226020A1 (en) * 2008-03-04 2009-09-10 Sonitus Medical, Inc. Dental bone conduction hearing appliance
US8150075B2 (en) 2008-03-04 2012-04-03 Sonitus Medical, Inc. Dental bone conduction hearing appliance
US8355511B2 (en) 2008-03-18 2013-01-15 Audience, Inc. System and method for envelope-based acoustic echo cancellation
US8774423B1 (en) 2008-06-30 2014-07-08 Audience, Inc. System and method for controlling adaptivity of signal modification using a phantom coefficient
US8521530B1 (en) 2008-06-30 2013-08-27 Audience, Inc. System and method for enhancing a monaural audio signal
BR112012007264A2 (en) 2009-10-02 2020-08-11 Sonitus Medical Inc. intraoral device for sound transmission
US9008329B1 (en) 2010-01-26 2015-04-14 Audience, Inc. Noise reduction using multi-feature cluster tracker
CN203136164U (en) * 2010-03-22 2013-08-14 艾利佛有限公司 Device and system for pipe calibration of omnidirectional microphone
US8635066B2 (en) 2010-04-14 2014-01-21 T-Mobile Usa, Inc. Camera-assisted noise cancellation and speech recognition
US8712069B1 (en) * 2010-04-19 2014-04-29 Audience, Inc. Selection of system parameters based on non-acoustic sensor information
US8798290B1 (en) 2010-04-21 2014-08-05 Audience, Inc. Systems and methods for adaptive signal equalization
WO2012021832A1 (en) 2010-08-12 2012-02-16 Aliph, Inc. Calibration system with clamping system
US9357307B2 (en) 2011-02-10 2016-05-31 Dolby Laboratories Licensing Corporation Multi-channel wind noise suppression system and method
US9794678B2 (en) 2011-05-13 2017-10-17 Plantronics, Inc. Psycho-acoustic noise suppression
US9318129B2 (en) 2011-07-18 2016-04-19 At&T Intellectual Property I, Lp System and method for enhancing speech activity detection using facial feature detection
US9459276B2 (en) 2012-01-06 2016-10-04 Sensor Platforms, Inc. System and method for device self-calibration
US9640194B1 (en) 2012-10-04 2017-05-02 Knowles Electronics, Llc Noise suppression for speech processing based on machine-learning mask estimation
US9726498B2 (en) 2012-11-29 2017-08-08 Sensor Platforms, Inc. Combining monitoring sensor measurements and system signals to determine device context
US9536540B2 (en) 2013-07-19 2017-01-03 Knowles Electronics, Llc Speech signal separation and synthesis based on auditory scene analysis and speech modeling
US9609423B2 (en) 2013-09-27 2017-03-28 Volt Analytics, Llc Noise abatement system for dental procedures
US9454976B2 (en) 2013-10-14 2016-09-27 Zanavox Efficient discrimination of voiced and unvoiced sounds
US9807492B1 (en) 2014-05-01 2017-10-31 Ambarella, Inc. System and/or method for enhancing hearing using a camera module, processor and/or audio input and/or output devices
US9870500B2 (en) 2014-06-11 2018-01-16 At&T Intellectual Property I, L.P. Sensor enhanced speech recognition
EP3441966A1 (en) * 2014-07-23 2019-02-13 PCMS Holdings, Inc. System and method for determining audio context in augmented-reality applications
CN106797512B (en) 2014-08-28 2019-10-25 美商楼氏电子有限公司 Method, system and the non-transitory computer-readable storage medium of multi-source noise suppressed
US10614788B2 (en) 2017-03-15 2020-04-07 Synaptics Incorporated Two channel headset-based own voice enhancement
US10776073B2 (en) 2018-10-08 2020-09-15 Nuance Communications, Inc. System and method for managing a mute button setting for a conference call

Citations (34)

Publication number Priority date Publication date Assignee Title
US3789166A (en) * 1971-12-16 1974-01-29 Dyna Magnetic Devices Inc Submersion-safe microphone
US4006318A (en) * 1975-04-21 1977-02-01 Dyna Magnetic Devices, Inc. Inertial microphone system
US4591668A (en) * 1984-05-08 1986-05-27 Iwata Electric Co., Ltd. Vibration-detecting type microphone
US4653102A (en) * 1985-11-05 1987-03-24 Position Orientation Systems Directional microphone system
US4777649A (en) * 1985-10-22 1988-10-11 Speech Systems, Inc. Acoustic feedback control of microphone positioning and speaking volume
US4901354A (en) * 1987-12-18 1990-02-13 Daimler-Benz Ag Method for improving the reliability of voice controls of function elements and device for carrying out this method
US5097515A (en) * 1988-11-30 1992-03-17 Matsushita Electric Industrial Co., Ltd. Electret condenser microphone
US5212764A (en) * 1989-04-19 1993-05-18 Ricoh Company, Ltd. Noise eliminating apparatus and speech recognition apparatus using the same
US5400409A (en) * 1992-12-23 1995-03-21 Daimler-Benz Ag Noise-reduction method for noise-affected voice channels
US5406622A (en) * 1993-09-02 1995-04-11 At&T Corp. Outbound noise cancellation for telephonic handset
US5414776A (en) * 1993-05-13 1995-05-09 Lectrosonics, Inc. Adaptive proportional gain audio mixing system
US5473702A (en) * 1992-06-03 1995-12-05 Oki Electric Industry Co., Ltd. Adaptive noise canceller
US5517435A (en) * 1993-03-11 1996-05-14 Nec Corporation Method of identifying an unknown system with a band-splitting adaptive filter and a device thereof
US5515865A (en) * 1994-04-22 1996-05-14 The United States Of America As Represented By The Secretary Of The Army Sudden Infant Death Syndrome (SIDS) monitor and stimulator
US5539859A (en) * 1992-02-18 1996-07-23 Alcatel N.V. Method of using a dominant angle of incidence to reduce acoustic noise in a speech signal
US5590241A (en) * 1993-04-30 1996-12-31 Motorola Inc. Speech processing system and method for enhancing a speech signal in a noisy environment
US5633935A (en) * 1993-04-13 1997-05-27 Matsushita Electric Industrial Co., Ltd. Stereo ultradirectional microphone apparatus
US5649055A (en) * 1993-03-26 1997-07-15 Hughes Electronics Voice activity detector for speech signals in variable background noise
US5664052A (en) * 1992-04-15 1997-09-02 Sony Corporation Method and device for discriminating voiced and unvoiced sounds
US5684460A (en) * 1994-04-22 1997-11-04 The United States Of America As Represented By The Secretary Of The Army Motion and sound monitor and stimulator
US5729694A (en) * 1996-02-06 1998-03-17 The Regents Of The University Of California Speech coding, reconstruction and recognition using acoustics and electromagnetic waves
US5754665A (en) * 1995-02-27 1998-05-19 Nec Corporation Noise Canceler
US5835608A (en) * 1995-07-10 1998-11-10 Applied Acoustic Research Signal separating system
US5853005A (en) * 1996-05-02 1998-12-29 The United States Of America As Represented By The Secretary Of The Army Acoustic monitoring system
US5917921A (en) * 1991-12-06 1999-06-29 Sony Corporation Noise reducing microphone apparatus
US5966090A (en) * 1998-03-16 1999-10-12 Mcewan; Thomas E. Differential pulse radar motion sensor
US5986600A (en) * 1998-01-22 1999-11-16 Mcewan; Thomas E. Pulsed RF oscillator and radar motion sensor
US6000396A (en) * 1995-08-17 1999-12-14 University Of Florida Hybrid microprocessor controlled ventilator unit
US6006175A (en) * 1996-02-06 1999-12-21 The Regents Of The University Of California Methods and apparatus for non-acoustic speech characterization and recognition
US6069963A (en) * 1996-08-30 2000-05-30 Siemens Audiologische Technik Gmbh Hearing aid wherein the direction of incoming sound is determined by different transit times to multiple microphones in a sound channel
US6191724B1 (en) * 1999-01-28 2001-02-20 Mcewan Thomas E. Short pulse microwave transceiver
US6233551B1 (en) * 1998-05-09 2001-05-15 Samsung Electronics Co., Ltd. Method and apparatus for determining multiband voicing levels using frequency shifting method in vocoder
US6266422B1 (en) * 1997-01-29 2001-07-24 Nec Corporation Noise canceling method and apparatus for the same
US6430295B1 (en) * 1997-07-11 2002-08-06 Telefonaktiebolaget Lm Ericsson (Publ) Methods and apparatus for measuring signal level and delay at multiple sensors

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
EP0637187B1 (en) 1993-07-28 1999-12-22 Pan Communications, Inc. Two-way communications earset
US5933506A (en) 1994-05-18 1999-08-03 Nippon Telegraph And Telephone Corporation Transmitter-receiver having ear-piece type acoustic transducing part
JP3522954B2 (en) 1996-03-15 2004-04-26 株式会社東芝 Microphone array input type speech recognition apparatus and method
JP2000312395A (en) 1999-04-28 2000-11-07 Alpine Electronics Inc Microphone system
JP2001189987A (en) 1999-12-28 2001-07-10 Pioneer Electronic Corp Narrow directivity microphone unit
US20020039425A1 (en) 2000-07-19 2002-04-04 Burnett Gregory C. Method and apparatus for removing noise from electronic signals

Cited By (70)

Publication number Priority date Publication date Assignee Title
US20070118375A1 (en) * 1999-09-21 2007-05-24 Kenyon Stephen C Audio Identification System And Method
US9715626B2 (en) 1999-09-21 2017-07-25 Iceberg Industries, Llc Method and apparatus for automatically recognizing input audio and/or video streams
US7783489B2 (en) * 1999-09-21 2010-08-24 Iceberg Industries Llc Audio identification system and method
US9196261B2 (en) 2000-07-19 2015-11-24 Aliphcom Voice activity detector (VAD)—based multiple-microphone acoustic noise suppression
US8589169B2 (en) * 2002-07-31 2013-11-19 Nathan T. Bradley System and method for creating audio files
US6961623B2 (en) 2002-10-17 2005-11-01 Rehabtronics Inc. Method and apparatus for controlling a device or process with vibrations generated by tooth clicks
US9066186B2 (en) 2003-01-30 2015-06-23 Aliphcom Light-based detection for acoustic applications
US8254617B2 (en) * 2003-03-27 2012-08-28 Aliphcom, Inc. Microphone array with rear venting
US9099094B2 (en) 2003-03-27 2015-08-04 Aliphcom Microphone array with rear venting
US20090003640A1 (en) * 2003-03-27 2009-01-01 Burnett Gregory C Microphone Array With Rear Venting
US20050027522A1 (en) * 2003-07-30 2005-02-03 Koichi Yamamoto Speech recognition method and apparatus therefor
US8046229B2 (en) * 2003-08-08 2011-10-25 Audioeye, Inc. Method and apparatus for website navigation by the visually impaired
US20110307259A1 (en) * 2003-08-08 2011-12-15 Bradley Nathan T System and method for audio content navigation
US8296150B2 (en) * 2003-08-08 2012-10-23 Audioeye, Inc. System and method for audio content navigation
US20100095210A1 (en) * 2003-08-08 2010-04-15 Audioeye, Inc. Method and Apparatus for Website Navigation by the Visually Impaired
US8340309B2 (en) * 2004-08-06 2012-12-25 Aliphcom, Inc. Noise suppressing multi-microphone headset
US20060120537A1 (en) * 2004-08-06 2006-06-08 Burnett Gregory C Noise suppressing multi-microphone headset
US8364492B2 (en) * 2006-07-13 2013-01-29 Nec Corporation Apparatus, method and program for giving warning in connection with inputting of unvoiced speech
US20090254350A1 (en) * 2006-07-13 2009-10-08 Nec Corporation Apparatus, Method and Program for Giving Warning in Connection with inputting of unvoiced Speech
US20100036657A1 (en) * 2006-11-20 2010-02-11 Mitsunori Morisaki Speech estimation system, speech estimation method, and speech estimation program
US20110071825A1 (en) * 2008-05-28 2011-03-24 Tadashi Emori Device, method and program for voice detection and recording medium
US8589152B2 (en) * 2008-05-28 2013-11-19 Nec Corporation Device, method and program for voice detection and recording medium
US10055490B2 (en) 2010-07-29 2018-08-21 Soundhound, Inc. System and methods for continuous audio matching
US9047371B2 (en) 2010-07-29 2015-06-02 Soundhound, Inc. System and method for matching a query against a broadcast stream
US9390167B2 (en) 2010-07-29 2016-07-12 Soundhound, Inc. System and methods for continuous audio matching
US10657174B2 (en) 2010-07-29 2020-05-19 Soundhound, Inc. Systems and methods for providing identification information in response to an audio segment
US9563699B1 (en) 2010-07-29 2017-02-07 Soundhound, Inc. System and method for matching a query against a broadcast stream
US20130317809A1 (en) * 2010-08-24 2013-11-28 Lawrence Livermore National Security, Llc Speech masking and cancelling and voice obscuration
US8577057B2 (en) 2010-11-02 2013-11-05 Robert Bosch Gmbh Digital dual microphone module with intelligent cross fading
US9240195B2 (en) * 2010-11-25 2016-01-19 Goertek Inc. Speech enhancing method and device, and denoising communication headphone enhancing method and device, and denoising communication headphones
US20130024194A1 (en) * 2010-11-25 2013-01-24 Goertek Inc. Speech enhancing method and device, and denoising communication headphone enhancing method and device, and denoising communication headphones
US10832287B2 (en) 2011-05-10 2020-11-10 Soundhound, Inc. Promotional content targeting based on recognized audio
US10121165B1 (en) 2011-05-10 2018-11-06 Soundhound, Inc. System and method for targeting content based on identified audio and multimedia
US12100023B2 (en) 2011-05-10 2024-09-24 Soundhound Ai Ip, Llc Query-specific targeted ad delivery
US10957310B1 (en) 2012-07-23 2021-03-23 Soundhound, Inc. Integrated programming framework for speech and text understanding with meaning parsing
US10996931B1 (en) 2012-07-23 2021-05-04 Soundhound, Inc. Integrated programming framework for speech and text understanding with block and statement structure
US11776533B2 (en) 2012-07-23 2023-10-03 Soundhound, Inc. Building a natural language understanding application using a received electronic record containing programming code including an interpret-block, an interpret-statement, a pattern expression and an action statement
US9516442B1 (en) 2012-09-28 2016-12-06 Apple Inc. Detecting the positions of earbuds and use of these positions for selecting the optimum microphones in a headset
US11627413B2 (en) 2012-11-05 2023-04-11 Jawbone Innovations, Llc Acoustic voice activity detection (AVAD) for electronic systems
US9767829B2 (en) * 2013-09-16 2017-09-19 Samsung Electronics Co., Ltd. Speech signal processing apparatus and method for enhancing speech intelligibility
US20150081285A1 (en) * 2013-09-16 2015-03-19 Samsung Electronics Co., Ltd. Speech signal processing apparatus and method for enhancing speech intelligibility
US9507849B2 (en) 2013-11-28 2016-11-29 Soundhound, Inc. Method for combining a query and a communication command in a natural language computer system
US9601114B2 (en) 2014-02-01 2017-03-21 Soundhound, Inc. Method for embedding voice mail in a spoken utterance using a natural language processing computer system
US9292488B2 (en) 2014-02-01 2016-03-22 Soundhound, Inc. Method for embedding voice mail in a spoken utterance using a natural language processing computer system
US11295730B1 (en) 2014-02-27 2022-04-05 Soundhound, Inc. Using phonetic variants in a local context to improve natural language understanding
US10311858B1 (en) 2014-05-12 2019-06-04 Soundhound, Inc. Method and system for building an integrated user profile
US11030993B2 (en) 2014-05-12 2021-06-08 Soundhound, Inc. Advertisement selection by linguistic classification
US9564123B1 (en) 2014-05-12 2017-02-07 Soundhound, Inc. Method and system for building an integrated user profile
US10860173B1 (en) 2016-03-18 2020-12-08 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11080469B1 (en) 2016-03-18 2021-08-03 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10896286B2 (en) 2016-03-18 2021-01-19 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10928978B2 (en) 2016-03-18 2021-02-23 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10867120B1 (en) 2016-03-18 2020-12-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10997361B1 (en) 2016-03-18 2021-05-04 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10444934B2 (en) 2016-03-18 2019-10-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10845947B1 (en) 2016-03-18 2020-11-24 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11029815B1 (en) 2016-03-18 2021-06-08 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11061532B2 (en) 2016-03-18 2021-07-13 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US12045560B2 (en) 2016-03-18 2024-07-23 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10866691B1 (en) 2016-03-18 2020-12-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11151304B2 (en) 2016-03-18 2021-10-19 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11157682B2 (en) 2016-03-18 2021-10-26 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10845946B1 (en) 2016-03-18 2020-11-24 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11455458B2 (en) 2016-03-18 2022-09-27 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10809877B1 (en) 2016-03-18 2020-10-20 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11727195B2 (en) 2016-03-18 2023-08-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11836441B2 (en) 2016-03-18 2023-12-05 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11064296B2 (en) * 2017-12-28 2021-07-13 Iflytek Co., Ltd. Voice denoising method and apparatus, server and storage medium
US10762280B2 (en) 2018-08-16 2020-09-01 Audioeye, Inc. Systems, devices, and methods for facilitating website remediation and promoting assistive technologies
US10423709B1 (en) 2018-08-16 2019-09-24 Audioeye, Inc. Systems, devices, and methods for automated and programmatic creation and deployment of remediations to non-compliant web pages or user interfaces

Also Published As

Publication number Publication date
US7246058B2 (en) 2007-07-17

Similar Documents

Publication Publication Date Title
US7246058B2 (en) Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors
US20070233479A1 (en) Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors
US8321213B2 (en) Acoustic voice activity detection (AVAD) for electronic systems
US8326611B2 (en) Acoustic voice activity detection (AVAD) for electronic systems
US8503686B2 (en) Vibration sensor and acoustic voice activity detection system (VADS) for use with electronic systems
US10230346B2 (en) Acoustic voice activity detection
US20140126743A1 (en) Acoustic voice activity detection (avad) for electronic systems
US8942383B2 (en) Wind suppression/replacement component for use with electronic systems
US8488803B2 (en) Wind suppression/replacement component for use with electronic systems
EP2633519B1 (en) Method and apparatus for voice activity detection
US7372770B2 (en) Ultrasonic Doppler sensor for speech-based user interface
JP3812887B2 (en) Signal processing system and method
WO2002098169A1 (en) Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors
US11627413B2 (en) Acoustic voice activity detection (AVAD) for electronic systems
AU2016202314A1 (en) Acoustic Voice Activity Detection (AVAD) for electronic systems
Kalgaonkar et al. Ultrasonic doppler sensor for voice activity detection
EP1415505A1 (en) Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors
US12063487B2 (en) Acoustic voice activity detection (AVAD) for electronic systems

Legal Events

Date Code Title Description
AS Assignment

Owner name: ALIPHCOM, INC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BURNETT, GREGORY C.;REEL/FRAME:013855/0906

Effective date: 20030312

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: DBD CREDIT FUNDING LLC, AS ADMINISTRATIVE AGENT, NEW YORK

Free format text: SECURITY AGREEMENT;ASSIGNORS:ALIPHCOM;ALIPH, INC.;MACGYVER ACQUISITION LLC;AND OTHERS;REEL/FRAME:030968/0051

Effective date: 20130802

AS Assignment

Owner name: WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT, OREGON

Free format text: PATENT SECURITY AGREEMENT;ASSIGNORS:ALIPHCOM;ALIPH, INC.;MACGYVER ACQUISITION LLC;AND OTHERS;REEL/FRAME:031764/0100

Effective date: 20131021

AS Assignment

Owner name: SILVER LAKE WATERMAN FUND, L.P., AS SUCCESSOR AGENT, CALIFORNIA

Free format text: NOTICE OF SUBSTITUTION OF ADMINISTRATIVE AGENT IN PATENTS;ASSIGNOR:DBD CREDIT FUNDING LLC, AS RESIGNING AGENT;REEL/FRAME:034523/0705

Effective date: 20141121

FPAY Fee payment

Year of fee payment: 8

AS Assignment

Owner name: ALIPHCOM, CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:SILVER LAKE WATERMAN FUND, L.P., AS ADMINISTRATIVE AGENT;REEL/FRAME:035531/0554

Effective date: 20150428

Owner name: MACGYVER ACQUISITION LLC, CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:035531/0419

Effective date: 20150428

Owner name: PROJECT PARIS ACQUISITION, LLC, CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:SILVER LAKE WATERMAN FUND, L.P., AS ADMINISTRATIVE AGENT;REEL/FRAME:035531/0554

Effective date: 20150428

Owner name: ALIPHCOM, CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:035531/0419

Effective date: 20150428

Owner name: PROJECT PARIS ACQUISITION LLC, CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:035531/0419

Effective date: 20150428

Owner name: BLACKROCK ADVISORS, LLC, NEW JERSEY

Free format text: SECURITY INTEREST;ASSIGNORS:ALIPHCOM;MACGYVER ACQUISITION LLC;ALIPH, INC.;AND OTHERS;REEL/FRAME:035531/0312

Effective date: 20150428

Owner name: MACGYVER ACQUISITION LLC, CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:SILVER LAKE WATERMAN FUND, L.P., AS ADMINISTRATIVE AGENT;REEL/FRAME:035531/0554

Effective date: 20150428

Owner name: ALIPH, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:035531/0419

Effective date: 20150428

Owner name: ALIPH, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:SILVER LAKE WATERMAN FUND, L.P., AS ADMINISTRATIVE AGENT;REEL/FRAME:035531/0554

Effective date: 20150428

Owner name: BODYMEDIA, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:SILVER LAKE WATERMAN FUND, L.P., AS ADMINISTRATIVE AGENT;REEL/FRAME:035531/0554

Effective date: 20150428

Owner name: BODYMEDIA, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:035531/0419

Effective date: 20150428

AS Assignment

Owner name: ALIPHCOM, CALIFORNIA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE RECEIVING PARTY NAME PREVIOUSLY RECORDED AT REEL: 013855 FRAME: 906. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNOR:BURNETT, GREGORY C;REEL/FRAME:035532/0905

Effective date: 20030312

AS Assignment

Owner name: BLACKROCK ADVISORS, LLC, NEW JERSEY

Free format text: SECURITY INTEREST;ASSIGNORS:ALIPHCOM;MACGYVER ACQUISITION LLC;ALIPH, INC.;AND OTHERS;REEL/FRAME:036500/0173

Effective date: 20150826

AS Assignment

Owner name: BLACKROCK ADVISORS, LLC, NEW JERSEY

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE APPLICATION NO. 13870843 PREVIOUSLY RECORDED ON REEL 036500 FRAME 0173. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY INTEREST;ASSIGNORS:ALIPHCOM;MACGYVER ACQUISITION, LLC;ALIPH, INC.;AND OTHERS;REEL/FRAME:041793/0347

Effective date: 20150826

AS Assignment

Owner name: ALIPHCOM, LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ALIPHCOM DBA JAWBONE;REEL/FRAME:043637/0796

Effective date: 20170619

Owner name: JAWB ACQUISITION, LLC, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ALIPHCOM, LLC;REEL/FRAME:043638/0025

Effective date: 20170821

AS Assignment

Owner name: ALIPHCOM (ASSIGNMENT FOR THE BENEFIT OF CREDITORS), LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ALIPHCOM;REEL/FRAME:043711/0001

Effective date: 20170619

AS Assignment

Owner name: JAWB ACQUISITION LLC, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ALIPHCOM (ASSIGNMENT FOR THE BENEFIT OF CREDITORS), LLC;REEL/FRAME:043746/0693

Effective date: 20170821

AS Assignment

Owner name: PROJECT PARIS ACQUISITION LLC, CALIFORNIA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE INCORRECT APPL. NO. 13/982,956 PREVIOUSLY RECORDED AT REEL: 035531 FRAME: 0554. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST;ASSIGNOR:SILVER LAKE WATERMAN FUND, L.P., AS ADMINISTRATIVE AGENT;REEL/FRAME:045167/0597

Effective date: 20150428

Owner name: ALIPH, INC., CALIFORNIA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE INCORRECT APPL. NO. 13/982,956 PREVIOUSLY RECORDED AT REEL: 035531 FRAME: 0554. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST;ASSIGNOR:SILVER LAKE WATERMAN FUND, L.P., AS ADMINISTRATIVE AGENT;REEL/FRAME:045167/0597

Effective date: 20150428

Owner name: BODYMEDIA, INC., CALIFORNIA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE INCORRECT APPL. NO. 13/982,956 PREVIOUSLY RECORDED AT REEL: 035531 FRAME: 0554. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST;ASSIGNOR:SILVER LAKE WATERMAN FUND, L.P., AS ADMINISTRATIVE AGENT;REEL/FRAME:045167/0597

Effective date: 20150428

Owner name: MACGYVER ACQUISITION LLC, CALIFORNIA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE INCORRECT APPL. NO. 13/982,956 PREVIOUSLY RECORDED AT REEL: 035531 FRAME: 0554. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST;ASSIGNOR:SILVER LAKE WATERMAN FUND, L.P., AS ADMINISTRATIVE AGENT;REEL/FRAME:045167/0597

Effective date: 20150428

Owner name: ALIPHCOM, ARKANSAS

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE INCORRECT APPL. NO. 13/982,956 PREVIOUSLY RECORDED AT REEL: 035531 FRAME: 0554. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST;ASSIGNOR:SILVER LAKE WATERMAN FUND, L.P., AS ADMINISTRATIVE AGENT;REEL/FRAME:045167/0597

Effective date: 20150428

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Free format text: 11.5 YR SURCHARGE- LATE PMT W/IN 6 MO, SMALL ENTITY (ORIGINAL EVENT CODE: M2556); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2553); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 12

AS Assignment

Owner name: ALIPHCOM (ASSIGNMENT FOR THE BENEFIT OF CREDITORS), LLC, NEW YORK

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BLACKROCK ADVISORS, LLC;REEL/FRAME:055207/0593

Effective date: 20170821

AS Assignment

Owner name: JI AUDIO HOLDINGS LLC, NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:JAWB ACQUISITION LLC;REEL/FRAME:056320/0195

Effective date: 20210518

AS Assignment

Owner name: JAWBONE INNOVATIONS, LLC, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:JI AUDIO HOLDINGS LLC;REEL/FRAME:056323/0728

Effective date: 20210518

IPR Aia trial proceeding filed before the patent and appeal board: inter partes review

Free format text: TRIAL NO: IPR2022-00623

Opponent name: GOOGLE LLC

Effective date: 20220308

IPR Aia trial proceeding filed before the patent and appeal board: inter partes review

Free format text: TRIAL NO: IPR2022-01021

Opponent name: APPLE, INC., SAMSUNG ELECTRONICS CO., LTD., AND SAMSUNG ELECTRONICS AMERICA, INC.

Effective date: 20220516

IPR Aia trial proceeding filed before the patent and appeal board: inter partes review

Free format text: TRIAL NO: IPR2023-00222

Opponent name: AMAZON.COM, INC., AND AMAZON.COM SERVICES LLC

Effective date: 20221117

IPR Aia trial proceeding filed before the patent and appeal board: inter partes review

Free format text: TRIAL NO: IPR2023-01110

Opponent name: LG ELECTRONICS, INC.

Effective date: 20230623

IPR Aia trial proceeding filed before the patent and appeal board: inter partes review

Free format text: TRIAL NO: IPR2023-01119

Opponent name: SONY ELECTRONICS, INC., SONY GROUP CORPORATION, SONY CORPORATION, AND SONY CORPORATION OF AMERICA

Effective date: 20230623

IPR Aia trial proceeding filed before the patent and appeal board: inter partes review

Free format text: TRIAL NO: IPR2023-01352

Opponent name: META PLATFORMS, INC.

Effective date: 20230831