EP1415505A1 - Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors - Google Patents

Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors

Info

Publication number
EP1415505A1
Authority
EP
European Patent Office
Prior art keywords
acoustic signals
speech
acoustic
noise
difference parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP02739572A
Other languages
German (de)
English (en)
French (fr)
Inventor
Gregory C. Burnett
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AliphCom LLC
Original Assignee
AliphCom LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/905,361 (US20020039425A1)
Priority claimed from US09/990,847 (US20020099541A1)
Application filed by AliphCom LLC
Priority claimed from PCT/US2002/017251 (WO2002098169A1)
Publication of EP1415505A1
Legal status: Withdrawn


Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93Discriminating between voiced and unvoiced parts of speech signals

Definitions

  • TECHNICAL FIELD The disclosed embodiments relate to the processing of speech signals.
  • the ability to correctly identify voiced and unvoiced speech is critical to many speech applications including speech recognition, speaker verification, noise suppression, and many others.
  • speech from a human speaker is captured and transmitted to a receiver in a different location.
  • the environment in which the speech is captured typically contains noise sources that pollute the speech signal, or the signal of interest, with unwanted acoustic noise. This makes it difficult or impossible for the receiver, whether human or machine, to understand the user's speech.
  • Typical methods for classifying voiced and unvoiced speech have relied mainly on the acoustic content of microphone data, which is plagued by problems with noise and the corresponding uncertainties in signal content. This is especially problematic now with the proliferation of portable communication devices like cellular telephones and personal digital assistants because, in many cases, the quality of service provided by the device depends on the quality of the voice services offered by the device.
  • Methods known in the art for suppressing the noise present in speech signals demonstrate performance shortcomings that include unusually long computing times, requirements for cumbersome hardware to perform the signal processing, and distortion of the signals of interest.
  • Figure 1 is a block diagram of a NAVSAD system, under an embodiment.
  • Figure 2 is a block diagram of a PSAD system, under an embodiment.
  • Figure 3 is a block diagram of a denoising system, referred to herein as the Pathfinder system, under an embodiment.
  • Figure 4 is a flow diagram of a detection algorithm for use in detecting voiced and unvoiced speech, under an embodiment.
  • Figure 5A plots the received GEMS signal for an utterance along with the mean correlation between the GEMS signal and the Mic 1 signal and the threshold for voiced speech detection.
  • Figure 5B plots the received GEMS signal for an utterance along with the standard deviation of the GEMS signal and the threshold for voiced speech detection.
  • Figure 6 plots voiced speech detected from an utterance along with the GEMS signal and the acoustic noise.
  • Figure 7 is a microphone array for use under an embodiment of the PSAD system.
  • Figure 8 is a plot of ΔM versus d1 for several Δd values, under an embodiment.
  • Figure 9 shows a plot of the gain parameter as the sum of the absolute values of H1(z) and the acoustic data or audio from microphone 1.
  • Figure 10 is an alternative plot of acoustic data presented in Figure 9.
  • FIG. 1 is a block diagram of a NAVSAD system 100, under an embodiment.
  • the NAVSAD system couples microphones 10 and sensors 20 to at least one processor 30.
  • the sensors 20 of an embodiment include voicing activity detectors or non-acoustic sensors.
  • the processor 30 controls subsystems including a detection subsystem 50, referred to herein as a detection algorithm, and a denoising subsystem 40. Operation of the denoising subsystem 40 is described in detail in the Related Applications.
  • the NAVSAD system works extremely well in any background acoustic noise environment.
  • FIG. 2 is a block diagram of a PSAD system 200, under an embodiment.
  • the PSAD system couples microphones 10 to at least one processor 30.
  • the processor 30 includes a detection subsystem 50, referred to herein as a detection algorithm, and a denoising subsystem 40.
  • the PSAD system is highly sensitive in low acoustic noise environments and relatively insensitive in high acoustic noise environments.
  • the PSAD can operate independently or as a backup to the NAVSAD, detecting voiced speech if the NAVSAD fails.
  • the detection subsystems 50 and denoising subsystems 40 of both the NAVSAD and PSAD systems of an embodiment are algorithms controlled by the processor 30, but are not so limited.
  • Alternative embodiments of the NAVSAD and PSAD systems can include detection subsystems 50 and/or denoising subsystems 40 that comprise additional hardware, firmware, software, and/or combinations of hardware, firmware, and software. Furthermore, functions of the detection subsystems 50 and denoising subsystems 40 may be distributed across numerous components of the NAVSAD and PSAD systems.
  • FIG. 3 is a block diagram of a denoising subsystem 300, referred to herein as the Pathfinder system, under an embodiment.
  • the Pathfinder system is briefly described below, and is described in detail in the Related Applications. Two microphones Mic 1 and Mic 2 are used in the Pathfinder system, and Mic 1 is considered the "signal" microphone.
  • the Pathfinder system 300 is equivalent to the NAVSAD system 100 when the voicing activity detector (VAD) 320 is a non-acoustic voicing sensor 20 and the noise removal subsystem 340 includes the detection subsystem 50 and the denoising subsystem 40.
  • the Pathfinder system 300 is equivalent to the PSAD system 200 in the absence of the VAD 320, and when the noise removal subsystem 340 includes the detection subsystem 50 and the denoising subsystem 40.
  • the NAVSAD and PSAD systems support a two-level commercial approach in which (i) a relatively less expensive PSAD system supports an acoustic approach that functions in most low- to medium-noise environments, and (ii) a NAVSAD system adds a non-acoustic sensor to enable detection of voiced speech in any environment. Unvoiced speech is normally not detected using the sensor, as it normally does not sufficiently vibrate human tissue.
  • the NAVSAD and PSAD systems include an array algorithm for speech detection that uses the difference in frequency content between two microphones to calculate a relationship between the signals of the two microphones. This is in contrast to conventional arrays that attempt to use the time/phase difference of each microphone to remove the noise outside of an "area of sensitivity".
  • the methods described herein provide a significant advantage, as they do not require a specific orientation of the array with respect to the signal.
  • the systems described herein are sensitive to noise of every type and every orientation, unlike conventional arrays that depend on specific noise orientations. Consequently, the frequency-based arrays presented herein are unique as they depend only on the relative orientation of the two microphones themselves with no dependence on the orientation of the noise and signal with respect to the microphones. This results in a robust signal processing system with respect to the type of noise, microphones, and orientation between the noise/signal source and the microphones.
  • the systems described herein use the information derived from the Pathfinder noise suppression system and/or the non-acoustic sensor to determine the voicing state of an input signal.
  • the voicing state includes silent, voiced, and unvoiced states.
  • the NAVSAD system for example, includes a non-acoustic sensor to detect the vibration of human tissue associated with speech.
  • the non-acoustic sensor of an embodiment is a General Electromagnetic Movement Sensor (GEMS) as described briefly below and in detail in the Related Applications, but is not so limited. Alternative embodiments, however, may use any sensor that is able to detect human tissue motion associated with speech and is unaffected by environmental acoustic noise.
  • the GEMS is a radio frequency device (2.4 GHz) that allows the detection of moving human tissue dielectric interfaces.
  • the GEMS includes an RF interferometer that uses homodyne mixing to detect small phase shifts associated with target motion. In essence, the sensor sends out weak electromagnetic waves (less than 1 milliwatt) that reflect off of whatever is around the sensor. The reflected waves are mixed with the original transmitted waves and the results analyzed for any change in position of the targets. Anything that moves near the sensor will cause a change in phase of the reflected wave that will be amplified and displayed as a change in voltage output from the sensor.
  • a similar sensor is described by Gregory C. Burnett (1999) in "The physiological basis of glottal electromagnetic micropower sensors (GEMS) and their use in defining an excitation function for the human vocal tract"; Ph.D. Thesis, University of California at Davis.
  • FIG. 4 is a flow diagram of a detection algorithm 50 for use in detecting voiced and unvoiced speech, under an embodiment.
  • both the NAVSAD and PSAD systems of an embodiment include the detection algorithm 50 as the detection subsystem 50.
  • This detection algorithm 50 operates in real-time and, in an embodiment, operates on 20 millisecond windows and steps 10 milliseconds at a time, but is not so limited.
  • the voice activity determination is recorded for the first 10 milliseconds, and the second 10 milliseconds functions as a "look-ahead" buffer. While an embodiment uses the 20/10 windows, alternative embodiments may use numerous other combinations of window values.
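As an illustration only, a minimal Python sketch of this 20 ms window / 10 ms step framing follows; the function name and the 8000 Hz sample rate are assumptions for the example (the rate is consistent with the correlation discussion below), not details mandated by the embodiment.

```python
import numpy as np

FS = 8000                    # assumed sample rate (Hz), as in the XCORR discussion below
WINDOW = int(0.020 * FS)     # 20 ms analysis window (160 samples)
STEP = int(0.010 * FS)       # 10 ms step; the trailing 10 ms acts as the look-ahead buffer

def frames(signal: np.ndarray):
    """Yield (start index, 20 ms window), stepping 10 ms at a time."""
    for start in range(0, len(signal) - WINDOW + 1, STEP):
        yield start, signal[start:start + WINDOW]
```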
  • Pathfinder performance can be compromised if the adaptive filter training is conducted on speech rather than on noise. It is therefore important not to exclude any significant amount of speech from the VAD to keep such disturbances to a minimum.
  • the systems using the detection algorithm of an embodiment function in environments containing varying amounts of background acoustic noise. If the non-acoustic sensor is available, this external noise is not a problem for voiced speech. However, for unvoiced speech (and voiced if the non-acoustic sensor is not available or has malfunctioned) reliance is placed on acoustic data alone to separate noise from unvoiced speech.
  • An advantage inheres in the use of two microphones in an embodiment of the Pathfinder noise suppression system, and the spatial relationship between the microphones is exploited to assist in the detection of unvoiced speech. However, there may occasionally be noise levels high enough that the speech will be nearly undetectable and the acoustic-only method will fail.
  • the non-acoustic sensor (or hereafter just the sensor) will be required to ensure good performance.
  • the speech source should be relatively louder in one designated microphone when compared to the other microphone. Tests have shown that this requirement is easily met with conventional microphones when the microphones are placed on the head, as any noise should result in an H1 with a gain near unity.
  • the NAVSAD relies on two parameters to detect voiced speech. These two parameters include the energy of the sensor in the window of interest, determined in an embodiment by the standard deviation (SD), and optionally the cross-correlation (XCORR) between the acoustic signal from microphone 1 and the sensor data.
  • the energy of the sensor can be determined in any one of a number of ways, and the SD is just one convenient way to determine the energy.
  • the SD is akin to the energy of the signal, which normally corresponds quite accurately to the voicing state, but may be susceptible to movement noise (relative motion of the sensor with respect to the human user) and/or electromagnetic noise.
  • the XCORR can be used. The XCORR is only calculated to 15 delays, which corresponds to just under 2 milliseconds at 8000 Hz.
  • the XCORR can also be useful when the sensor signal is distorted or modulated in some fashion. For example, there are sensor locations (such as the jaw or back of the neck) where speech production can be detected but where the signal may have incorrect or distorted time-based information. That is, they may not have well defined features in time that will match with the acoustic waveform. However, XCORR is more susceptible to errors from acoustic noise, and in high-noise (less than 0 dB SNR) environments is almost useless. Therefore it should not be the sole source of voicing information.
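A minimal sketch of such a windowed cross-correlation, assuming 8000 Hz data and the 15-delay limit described above; the normalization and the choice of returning the peak value are illustrative assumptions, not the patented formulation.

```python
import numpy as np

def xcorr_peak(sensor: np.ndarray, mic1: np.ndarray, max_delay: int = 15) -> float:
    """Cross-correlate the sensor window with the Mic 1 window over
    0..14 sample lags (just under 2 ms at 8000 Hz), with the acoustic
    data lagging the sensor data as described above."""
    s = (sensor - sensor.mean()) / (sensor.std() + 1e-12)
    m = (mic1 - mic1.mean()) / (mic1.std() + 1e-12)
    n = len(s)
    corrs = [np.dot(s[:n - d], m[d:]) / (n - d) for d in range(max_delay)]
    return max(corrs)
```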
  • the sensor detects human tissue motion associated with the closure of the vocal folds, so the acoustic signal produced by the closure of the folds is highly correlated with the closures. Therefore, sensor data that correlates highly with the acoustic signal is declared as speech, and sensor data that does not correlate well is termed noise.
  • the acoustic data is expected to lag behind the sensor data by about 0.1 to 0.8 milliseconds (or about 1-7 samples) as a result of the delay time due to the relatively slower speed of sound (around 330 m/s).
  • an embodiment uses a 15-sample correlation, as the acoustic wave shape varies significantly depending on the sound produced, and a larger correlation width is needed to ensure detection.
  • the SD and XCORR signals are related, but are sufficiently different so that the voiced speech detection is more reliable. For simplicity, though, either parameter may be used.
  • the values for the SD and XCORR are compared to empirical thresholds, and if both are above their threshold, voiced speech is declared. Example data is presented and described below.
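In sketch form, this decision rule might read as below; the threshold parameters mirror T1 and T2 from Figures 5A and 5B, and using the window standard deviation as the energy proxy follows the SD discussion above. The xcorr value would come from a correlation such as the earlier sketch.

```python
import numpy as np

def is_voiced(gems_window: np.ndarray, xcorr: float,
              sd_thresh: float, xcorr_thresh: float) -> bool:
    """Declare voiced speech only when both parameters clear their
    empirical thresholds (T2 for the sensor SD, T1 for the correlation)."""
    sd = float(np.std(gems_window))   # sensor energy proxy (SD)
    return sd > sd_thresh and xcorr > xcorr_thresh
```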
  • Figures 5A, 5B, and 6 show data plots for an example in which a subject twice speaks the phrase "pop pan", under an embodiment.
  • Figure 5A plots the received GEMS signal 502 for this utterance along with the mean correlation 504 between the GEMS signal and the Mic 1 signal and the threshold T1 used for voiced speech detection.
  • Figure 5B plots the received GEMS signal 502 for this utterance along with the standard deviation 506 of the GEMS signal and the threshold T2 used for voiced speech detection.
  • Figure 6 plots voiced speech 602 detected from the acoustic or audio signal 608, along with the GEMS signal 604 and the acoustic noise 606; no unvoiced speech is detected in this example because of the heavy background babble noise 606.
  • the thresholds have been set so that there are virtually no false negatives, and only occasional false positives.
  • a voiced speech activity detection accuracy of greater than 99% has been attained under any acoustic background noise conditions.
  • the NAVSAD can determine when voiced speech is occurring with high degrees of accuracy due to the non-acoustic sensor data. However, the sensor offers little assistance in separating unvoiced speech from noise, as unvoiced speech normally causes no detectable signal in most non-acoustic sensors. If there is a detectable signal, the NAVSAD can be used, although use of the SD method is dictated as unvoiced speech is normally poorly correlated. In the absence of a detectable signal use is made of the system and methods of the Pathfinder noise removal algorithm in determining when unvoiced speech is occurring. A brief review of the Pathfinder algorithm is described below, while a detailed description is provided in the Related Applications.
  • the acoustic information coming into Microphone 1 is denoted by m1(n)
  • the information coming into Microphone 2 is similarly labeled m2(n)
  • the GEMS sensor is assumed available to determine voiced speech areas.
  • these signals are represented in the z-domain as M1(z) and M2(z). With the noise reaching Mic 1 written as N2(z) = N(z)H1(z) and the speech reaching Mic 2 as S2(z) = S(z)H2(z), the microphone outputs are M1(z) = S(z) + N(z)H1(z) and M2(z) = N(z) + S(z)H2(z) (Equation 1).
  • Equation 1 has four unknowns and only two relationships and cannot be solved explicitly. However, there is another way to solve for some of the unknowns in Equation 1: examine the case where no speech is being produced, so that the GEMS signal is absent. The microphones then receive only noise, and Equation 1 reduces to M1n(z) = N(z)H1(z) and M2n(z) = N(z), so that H1(z) = M1n(z)/M2n(z) (Equation 2).
  • H1(z) can be calculated using any of the available system identification algorithms and the microphone outputs when only noise is being received. The calculation can be done adaptively, so that if the noise changes significantly H1(z) can be recalculated quickly.
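The passage above leaves the choice of system-identification algorithm open; one common choice is normalized LMS (NLMS), sketched below under the assumption that updates run only on frames the VAD marks as noise-only. The function and parameter names are illustrative, not taken from the patent.

```python
import numpy as np

def nlms_update(h: np.ndarray, m2_taps: np.ndarray, m1_sample: float,
                mu: float = 0.1, eps: float = 1e-8) -> np.ndarray:
    """One NLMS step refining an FIR estimate of H1 (the Mic 2 -> Mic 1
    noise path). m2_taps holds the len(h) most recent Mic 2 samples,
    newest first, so h[k] weights a delay of k samples."""
    pred = float(np.dot(h, m2_taps))        # predicted noise at Mic 1
    err = m1_sample - pred
    h = h + mu * err * m2_taps / (float(np.dot(m2_taps, m2_taps)) + eps)
    return h
```

Running such an update continuously over noise-only frames lets the H1(z) estimate track a changing noise field, as the passage above requires.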
  • With a solution for one of the unknowns in Equation 1, a solution can be found for another, H2(z), by using the amplitude of the GEMS or similar device along with the amplitudes of the two microphones.
  • Turning to the PSAD system: as sound waves propagate, they normally lose energy as they travel due to diffraction and dispersion. Assuming the sound waves originate from a point source and radiate isotropically, their amplitude will decrease as a function of 1/r, where r is the distance from the originating point. This 1/r dependence of amplitude is the worst case; if the propagation is confined to a smaller area, the reduction will be less. It is nonetheless an adequate model for the configurations of interest, specifically the propagation of noise and speech to microphones located somewhere on the user's head.
  • Figure 7 is a microphone array for use under an embodiment of the PSAD system. Placing the microphones Mic 1 and Mic 2 in a linear array with the mouth on the array midline, the difference in signal strength between Mic 1 and Mic 2 (assuming the microphones have identical frequency responses) will be proportional to both d1 and Δd. Assuming a 1/r (or in this case 1/d) relationship, it is seen that ΔM = |Mic 1|/|Mic 2| = (d1 + Δd)/d1, where
  • ΔM is the difference in gain between Mic 1 and Mic 2 and therefore H1(z), as above in Equation 2.
  • d1 is the distance from Mic 1 to the speech or noise source.
  • Figure 8 is a plot 800 of ΔM versus d1 for several Δd values, under an embodiment. It is clear that as Δd becomes larger and the noise source is closer, ΔM becomes larger. The variable Δd will change depending on the orientation to the speech/noise source, from the maximum value on the array midline to zero perpendicular to the array midline. From the plot 800 it is clear that for small Δd and for distances over approximately 30 centimeters (cm), ΔM is close to unity. Since most noise sources are farther away than 30 cm and are unlikely to be on the midline of the array, it is probable that when calculating H1(z) as above in Equation 2, ΔM (or equivalently the gain of H1(z)) will be close to unity. Conversely, for noise sources that are close (within a few centimeters), there could be a substantial difference in gain depending on which microphone is closer to the noise.
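A short numerical check of this behavior, using the ΔM = (d1 + Δd)/d1 relation stated above; the specific distances printed are illustrative only.

```python
def delta_m(d1: float, delta_d: float) -> float:
    """Mic 1 / Mic 2 gain ratio under the 1/d amplitude model."""
    return (d1 + delta_d) / d1

for dd in (0.01, 0.05, 0.10):            # a few Delta-d values, in meters
    print(f"delta_d={dd:.2f} m: dM at 5 cm = {delta_m(0.05, dd):.2f}, "
          f"at 30 cm = {delta_m(0.30, dd):.2f}")
```

For small Δd the ratio at 30 cm stays within a few percent of unity, while at 5 cm it can approach 3, matching the behavior described for plot 800.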
  • FIG. 9 shows a plot 900 of the gain parameter 902 as the sum of the absolute values of H1(z) and the acoustic data 904 or audio from microphone 1.
  • the speech signal was an utterance of the phrase "pop pan", repeated twice.
  • the evaluated bandwidth included the frequency range from 2500 Hz to 3500 Hz, although 1500 Hz to 2500 Hz was additionally used in practice. Note the rapid increase in the gain when the unvoiced speech is first encountered, then the rapid return to normal when the speech ends.
  • the large changes in gain that result from transitions between noise and speech can be detected by any standard signal processing techniques.
  • the standard deviation of the last few gain calculations is used, with thresholds being defined by a running average of the standard deviations and the standard deviation noise floor. The later changes in gain for the voiced speech are suppressed in this plot 900 for clarity.
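A sketch of such a detector follows; the window length, multiplier k, and noise-floor constant are assumptions chosen for illustration, standing in for the empirically tuned values an embodiment would use.

```python
from collections import deque
import numpy as np

def unvoiced_flags(gains, window=5, k=1.5, sd_floor=0.01, history=50):
    """Flag frames where the SD of the last few |H1| gain values rises
    above a running average of past SDs plus an assumed noise floor."""
    recent = deque(maxlen=window)      # last few gain calculations
    past_sds = deque(maxlen=history)   # running history of SD values
    flags = []
    for g in gains:
        recent.append(g)
        sd = float(np.std(recent)) if len(recent) > 1 else 0.0
        baseline = float(np.mean(past_sds)) if past_sds else 0.0
        flags.append(sd > k * baseline + sd_floor)
        past_sds.append(sd)
    return flags
```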
  • Figure 10 is an alternative plot 1000 of acoustic data presented in Figure 9.
  • the data used to form plot 900 is presented again in this plot 1000, along with audio data 1004 and GEMS data 1006 without noise to make the unvoiced speech apparent.
  • this automatic backup of the NAVSAD system functions best in an environment with low noise (approximately 10+ dB SNR), as high amounts (10 dB of SNR or less) of acoustic noise can quickly overwhelm any acoustic-only unvoiced detector, including the PSAD.
  • This is evident in the difference in the voiced signal data 602 and 1002 shown in plots 600 and 1000 of Figures 6 and 10, respectively, where the same utterance is spoken, but the data of plot 600 shows no unvoiced speech because the unvoiced speech is undetectable. This is the desired behavior when performing denoising, since if the unvoiced speech is not detectable then it will not significantly affect the denoising process.
  • the configuration of the microphones can have an effect on the change in gain associated with speech and the thresholds needed to detect speech.
  • each configuration will require testing to determine the proper thresholds, but tests with two very different microphone configurations showed the same thresholds and other parameters to work well.
  • the first microphone set had the signal microphone near the mouth and the noise microphone several centimeters away at the ear, while the second configuration placed the noise and signal microphones back-to-back within a few centimeters of the mouth.
  • the results presented herein were derived using the first microphone configuration, but the results using the other set are virtually identical, so the detection algorithm is relatively robust with respect to microphone placement.
  • NAVSAD and PSAD systems detect voiced and unvoiced speech.
  • One configuration uses the NAVSAD system (non-acoustic only) to detect voiced speech along with the PSAD system to detect unvoiced speech; the PSAD also functions as a backup to the NAVSAD system for detecting voiced speech.
  • An alternative configuration uses the NAVSAD system (non-acoustic correlated with acoustic) to detect voiced speech along with the PSAD system to detect unvoiced speech; the PSAD also functions as a backup to the NAVSAD system for detecting voiced speech.
  • Another alternative configuration uses the PSAD system to detect both voiced and unvoiced speech.
  • the “k” in “kick” has significant frequency content from 500 Hz to 4000 Hz, but a “sh” in “she” only contains significant energy from 1700 Hz to 4000 Hz.
  • Voiced speech could be classified in a similar manner. For instance, an /i/ (“ee”) has significant energy around 300 Hz and 2500 Hz, and an /a/ (“ah”) has energy at around 900 Hz and 1200 Hz. This ability to discriminate unvoiced and voiced speech in the presence of noise is, thus, very useful.
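As a sketch of how such band energies could be measured (the band edges below come from the examples above; the FFT-based approach is an assumption for illustration, not the patented method):

```python
import numpy as np

def band_energy(frame: np.ndarray, fs: int, lo: float, hi: float) -> float:
    """Energy of a frame within [lo, hi] Hz, via the magnitude spectrum."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    return float(power[(freqs >= lo) & (freqs <= hi)].sum())

# Comparing energy in 500-1700 Hz against 1700-4000 Hz hints at a "k"
# (broadband burst) versus a "sh" (high band only), per the examples above.
```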
  • routines described herein can be provided in one or more of the following ways, or combinations thereof: stored in non-volatile memory (not shown) that forms part of an associated processor or processors; implemented using conventional programmed logic arrays or circuit elements; stored in removable media such as disks; downloaded from a server and stored locally at a client; or hardwired or preprogrammed in chips such as EEPROM semiconductor chips, application-specific integrated circuits (ASICs), or digital signal processing (DSP) integrated circuits.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
EP02739572A 2001-05-30 2002-05-30 Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors Withdrawn EP1415505A1 (en)

Applications Claiming Priority (27)

Application Number Priority Date Filing Date Title
US905361 1986-09-09
US29438301P 2001-05-30 2001-05-30
US294383P 2001-05-30
US09/905,361 US20020039425A1 (en) 2000-07-19 2001-07-12 Method and apparatus for removing noise from electronic signals
US33510001P 2001-10-30 2001-10-30
US335100P 2001-10-30
US33220201P 2001-11-21 2001-11-21
US990847 2001-11-21
US332202P 2001-11-21
US09/990,847 US20020099541A1 (en) 2000-11-21 2001-11-21 Method and apparatus for voiced speech excitation function determination and non-acoustic assisted feature extraction
US36216102P 2002-03-05 2002-03-05
US36198102P 2002-03-05 2002-03-05
US36216202P 2002-03-05 2002-03-05
US36217002P 2002-03-05 2002-03-05
US36210302P 2002-03-05 2002-03-05
US362161P 2002-03-05
US362103P 2002-03-05
US362170P 2002-03-05
US361981P 2002-03-05
US362162P 2002-03-05
US36820802P 2002-03-27 2002-03-27
US36834302P 2002-03-27 2002-03-27
US36820902P 2002-03-27 2002-03-27
US368208P 2002-03-27
US368209P 2002-03-27
US368343P 2002-03-27
PCT/US2002/017251 WO2002098169A1 (en) 2001-05-30 2002-05-30 Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors

Publications (1)

Publication Number Publication Date
EP1415505A1 2004-05-06

Family

ID=31499757

Family Applications (1)

Application Number Title Priority Date Filing Date
EP02739572A Withdrawn EP1415505A1 (en) 2001-05-30 2002-05-30 Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors

Country Status (5)

Country Link
EP (1) EP1415505A1 (ko)
JP (1) JP2005503579A (ko)
KR (1) KR100992656B1 (ko)
CN (1) CN1513278A (ko)
CA (1) CA2448669A1 (ko)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101320559B (zh) 2007-06-07 2011-05-18 华为技术有限公司 Voice activation detection apparatus and method
CN101527756B (zh) * 2008-03-04 2012-03-07 联想(北京)有限公司 Teleconference method and system
WO2010002676A2 (en) * 2008-06-30 2010-01-07 Dolby Laboratories Licensing Corporation Multi-microphone voice activity detector
KR101451844B1 (ko) * 2013-03-27 2014-10-16 주식회사 시그테크 Voice activity detection method and communication device adopting the method
KR101396873B1 (ko) 2013-04-03 2014-05-19 주식회사 크린컴 Noise cancellation method and apparatus in a communication device including two microphones
CN107371079B (zh) * 2017-04-17 2019-10-11 恒玄科技(上海)有限公司 Dual-microphone noise reduction system and noise reduction method for an earphone
WO2019030898A1 (ja) * 2017-08-10 2019-02-14 三菱電機株式会社 Noise removal device and noise removal method
CN109192209A (zh) * 2018-10-23 2019-01-11 珠海格力电器股份有限公司 Speech recognition method and apparatus
CN113724694B (zh) * 2021-11-01 2022-03-08 深圳市北科瑞声科技股份有限公司 Voice conversion model training method and apparatus, electronic device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO02098169A1 *

Also Published As

Publication number Publication date
JP2005503579A (ja) 2005-02-03
KR20040030638A (ko) 2004-04-09
CA2448669A1 (en) 2002-12-05
KR100992656B1 (ko) 2010-11-05
CN1513278A (zh) 2004-07-14

Similar Documents

Publication Publication Date Title
US7246058B2 (en) Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors
US20070233479A1 (en) Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors
US8321213B2 (en) Acoustic voice activity detection (AVAD) for electronic systems
US8326611B2 (en) Acoustic voice activity detection (AVAD) for electronic systems
US9263062B2 (en) Vibration sensor and acoustic voice activity detection systems (VADS) for use with electronic systems
US10230346B2 (en) Acoustic voice activity detection
EP2633519B1 (en) Method and apparatus for voice activity detection
US20140126743A1 (en) Acoustic voice activity detection (avad) for electronic systems
US8942383B2 (en) Wind suppression/replacement component for use with electronic systems
US8488803B2 (en) Wind suppression/replacement component for use with electronic systems
JP3812887B2 (ja) 信号処理システムおよび方法
WO2002098169A1 (en) Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors
US11627413B2 (en) Acoustic voice activity detection (AVAD) for electronic systems
AU2016202314A1 (en) Acoustic Voice Activity Detection (AVAD) for electronic systems
EP1415505A1 (en) Detecting voiced and unvoiced speech using both acoustic and nonacoustic sensors
US12063487B2 (en) Acoustic voice activity detection (AVAD) for electronic systems

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20031224

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1061142

Country of ref document: HK

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20051201

REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1061142

Country of ref document: HK