ATE421136T1 - Audiovisual speech activity detection for a speech recognition system

Audiovisual speech activity detection for a speech recognition system

Info

Publication number
ATE421136T1
Authority
AT
Austria
Prior art keywords
voice
audiovisual
recognition system
activity detection
voice recognition
Prior art date
Application number
AT03702812T
Other languages
English (en)
Inventor
Antonio Colmenarez
Andreas Kellner
Original Assignee
Koninkl Philips Electronics Nv
Priority date
Filing date
Publication date
Application filed by Koninkl Philips Electronics Nv
Application granted
Publication of ATE421136T1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L17/00 Speaker identification or verification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
AT03702812T 2002-01-30 2003-01-29 Audiovisual speech activity detection for a speech recognition system ATE421136T1 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/058,730 US7219062B2 (en) 2002-01-30 2002-01-30 Speech activity detection using acoustic and facial characteristics in an automatic speech recognition system

Publications (1)

Publication Number Publication Date
ATE421136T1 (de) 2009-01-15

Family

ID=27609658

Family Applications (1)

Application Number Title Priority Date Filing Date
AT03702812T ATE421136T1 (de) 2002-01-30 2003-01-29 Audiovisual speech activity detection for a speech recognition system

Country Status (7)

Country Link
US (1) US7219062B2 (de)
EP (1) EP1472679B1 (de)
JP (1) JP4681810B2 (de)
CN (1) CN1291372C (de)
AT (1) ATE421136T1 (de)
DE (1) DE60325826D1 (de)
WO (1) WO2003065350A1 (de)

Families Citing this family (79)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6675027B1 (en) * 1999-11-22 2004-01-06 Microsoft Corp Personal mobile computing device having antenna microphone for improved speech recognition
US7274800B2 (en) * 2001-07-18 2007-09-25 Intel Corporation Dynamic gesture recognition from stereo sequences
US7165029B2 (en) * 2002-05-09 2007-01-16 Intel Corporation Coupled hidden Markov model for audiovisual speech recognition
US20030212552A1 (en) * 2002-05-09 2003-11-13 Liang Lu Hong Face recognition procedure useful for audiovisual speech recognition
US7209883B2 (en) * 2002-05-09 2007-04-24 Intel Corporation Factorial hidden markov model for audiovisual speech recognition
US7587318B2 (en) * 2002-09-12 2009-09-08 Broadcom Corporation Correlating video images of lip movements with audio signals to improve speech recognition
WO2004029905A1 (ja) * 2002-09-27 2004-04-08 Ginganet Corporation Remote education system, attendance confirmation method, and attendance confirmation program
US7171043B2 (en) * 2002-10-11 2007-01-30 Intel Corporation Image recognition using hidden markov models and coupled hidden markov models
US7472063B2 (en) * 2002-12-19 2008-12-30 Intel Corporation Audio-visual feature fusion and support vector machine useful for continuous speech recognition
US7203368B2 (en) * 2003-01-06 2007-04-10 Intel Corporation Embedded bayesian network for pattern recognition
CA2473195C (en) * 2003-07-29 2014-02-04 Microsoft Corporation Head mounted multi-sensory audio input system
US7383181B2 (en) * 2003-07-29 2008-06-03 Microsoft Corporation Multi-sensory speech detection system
US20050033571A1 (en) * 2003-08-07 2005-02-10 Microsoft Corporation Head mounted multi-sensory audio input system
US7447630B2 (en) * 2003-11-26 2008-11-04 Microsoft Corporation Method and apparatus for multi-sensory speech enhancement
US20050154593A1 (en) * 2004-01-14 2005-07-14 International Business Machines Corporation Method and apparatus employing electromyographic sensors to initiate oral communications with a voice-based device
US7499686B2 (en) * 2004-02-24 2009-03-03 Microsoft Corporation Method and apparatus for multi-sensory speech enhancement on a mobile device
US20050228673A1 (en) * 2004-03-30 2005-10-13 Nefian Ara V Techniques for separating and evaluating audio and video source data
US8244542B2 (en) * 2004-07-01 2012-08-14 Emc Corporation Video surveillance
US20060046845A1 (en) * 2004-08-26 2006-03-02 Alexandre Armand Device for the acoustic control of a game system and application
US7574008B2 (en) * 2004-09-17 2009-08-11 Microsoft Corporation Method and apparatus for multi-sensory speech enhancement
US7283850B2 (en) * 2004-10-12 2007-10-16 Microsoft Corporation Method and apparatus for multi-sensory speech enhancement on a mobile device
US7346504B2 (en) * 2005-06-20 2008-03-18 Microsoft Corporation Multi-sensory speech enhancement using a clean speech prior
US7680656B2 (en) * 2005-06-28 2010-03-16 Microsoft Corporation Multi-sensory speech enhancement using a speech-state model
US7406303B2 (en) 2005-07-05 2008-07-29 Microsoft Corporation Multi-sensory speech enhancement using synthesized sensor signal
US20070033042A1 (en) * 2005-08-03 2007-02-08 International Business Machines Corporation Speech detection fusing multi-class acoustic-phonetic, and energy features
US7962340B2 (en) * 2005-08-22 2011-06-14 Nuance Communications, Inc. Methods and apparatus for buffering data for use in accordance with a speech recognition system
WO2007026280A1 (en) * 2005-08-31 2007-03-08 Philips Intellectual Property & Standards Gmbh A dialogue system for interacting with a person by making use of both visual and speech-based recognition
US7697827B2 (en) 2005-10-17 2010-04-13 Konicek Jeffrey C User-friendlier interfaces for a camera
US7860718B2 (en) * 2005-12-08 2010-12-28 Electronics And Telecommunications Research Institute Apparatus and method for speech segment detection and system for speech recognition
US7930178B2 (en) * 2005-12-23 2011-04-19 Microsoft Corporation Speech modeling and enhancement based on magnitude-normalized spectra
US8326636B2 (en) 2008-01-16 2012-12-04 Canyon Ip Holdings Llc Using a physical phenomenon detector to control operation of a speech recognition engine
JP3139277U (ja) * 2007-11-26 2008-02-07 株式会社Srj Virtual school system and school city system
JP2011186351A (ja) * 2010-03-11 2011-09-22 Sony Corp Information processing apparatus, information processing method, and program
US8635066B2 (en) * 2010-04-14 2014-01-21 T-Mobile Usa, Inc. Camera-assisted noise cancellation and speech recognition
US20110311144A1 (en) * 2010-06-17 2011-12-22 Microsoft Corporation Rgb/depth camera for improving speech recognition
US8751565B1 (en) 2011-02-08 2014-06-10 Google Inc. Components for web-based configurable pipeline media processing
US8681866B1 (en) 2011-04-28 2014-03-25 Google Inc. Method and apparatus for encoding video by downsampling frame resolution
US9106787B1 (en) 2011-05-09 2015-08-11 Google Inc. Apparatus and method for media transmission bandwidth control using bandwidth estimation
US8863042B2 (en) * 2012-01-24 2014-10-14 Charles J. Kulas Handheld device with touch controls that reconfigure in response to the way a user operates the device
US8913103B1 (en) 2012-02-01 2014-12-16 Google Inc. Method and apparatus for focus-of-attention control
US8782271B1 (en) 2012-03-19 2014-07-15 Google, Inc. Video mixing using video speech detection
US9185429B1 (en) 2012-04-30 2015-11-10 Google Inc. Video encoding and decoding using un-equal error protection
KR101992676B1 (ko) 2012-07-26 2019-06-25 삼성전자주식회사 Method and apparatus for speech recognition using image recognition
WO2014025012A1 (ja) * 2012-08-10 2014-02-13 株式会社ホンダアクセス Speech recognition method and speech recognition device
US9704486B2 (en) 2012-12-11 2017-07-11 Amazon Technologies, Inc. Speech recognition power management
US9172740B1 (en) 2013-01-15 2015-10-27 Google Inc. Adjustable buffer remote access
US9311692B1 (en) 2013-01-25 2016-04-12 Google Inc. Scalable buffer remote access
US9225979B1 (en) 2013-01-30 2015-12-29 Google Inc. Remote access encoding
WO2014189931A1 (en) 2013-05-23 2014-11-27 Knowles Electronics, Llc Vad detection microphone and method of operating the same
US9711166B2 (en) 2013-05-23 2017-07-18 Knowles Electronics, Llc Decimation synchronization in a microphone
US10020008B2 (en) 2013-05-23 2018-07-10 Knowles Electronics, Llc Microphone and corresponding digital interface
US9165182B2 (en) * 2013-08-19 2015-10-20 Cisco Technology, Inc. Method and apparatus for using face detection information to improve speaker segmentation
JP6221535B2 (ja) * 2013-09-11 2017-11-01 ソニー株式会社 Information processing apparatus, information processing method, and program
US9502028B2 (en) * 2013-10-18 2016-11-22 Knowles Electronics, Llc Acoustic activity detection apparatus and method
US9147397B2 (en) 2013-10-29 2015-09-29 Knowles Electronics, Llc VAD detection apparatus and method of operating the same
EP3084760A4 (de) * 2013-12-20 2017-08-16 Intel Corporation Transition from a low-power always-on mode to a high-power speech recognition mode
US10304458B1 (en) 2014-03-06 2019-05-28 Board of Trustees of the University of Alabama and the University of Alabama in Huntsville Systems and methods for transcribing videos using speaker identification
US9966079B2 (en) * 2014-03-24 2018-05-08 Lenovo (Singapore) Pte. Ltd. Directing voice input based on eye tracking
US9830080B2 (en) 2015-01-21 2017-11-28 Knowles Electronics, Llc Low power voice trigger for acoustic apparatus and method
US10121472B2 (en) 2015-02-13 2018-11-06 Knowles Electronics, Llc Audio buffer catch-up apparatus and method with two microphones
CN105991851A (zh) 2015-02-17 2016-10-05 杜比实验室特许公司 Handling nuisance in a teleconference system
DE102015206566A1 (de) * 2015-04-13 2016-10-13 BSH Hausgeräte GmbH Household appliance and method for operating a household appliance
US9478234B1 (en) 2015-07-13 2016-10-25 Knowles Electronics, Llc Microphone apparatus and method with catch-up buffer
EP3185244B1 (de) * 2015-12-22 2019-02-20 Nxp B.V. Voice activation system
EP3460791A4 (de) * 2016-05-16 2019-05-22 Sony Corporation Information processing device
CN107437420A (zh) * 2016-05-27 2017-12-05 富泰华工业(深圳)有限公司 Method, system, and device for receiving voice information
JP6744025B2 (ja) * 2016-06-21 2020-08-19 日本電気株式会社 Work support system, management server, mobile terminal, work support method, and program
US10621992B2 (en) * 2016-07-22 2020-04-14 Lenovo (Singapore) Pte. Ltd. Activating voice assistant based on at least one of user proximity and context
CN106373568A (zh) * 2016-08-30 2017-02-01 深圳市元征科技股份有限公司 Intelligent vehicle-mounted unit control method and device
KR102591413B1 (ko) * 2016-11-16 2023-10-19 엘지전자 주식회사 Mobile terminal and control method thereof
US10332515B2 (en) * 2017-03-14 2019-06-25 Google Llc Query endpointing based on lip detection
US10664533B2 (en) 2017-05-24 2020-05-26 Lenovo (Singapore) Pte. Ltd. Systems and methods to determine response cue for digital assistant based on context
CN109102801A (zh) * 2017-06-20 2018-12-28 京东方科技集团股份有限公司 Speech recognition method and speech recognition device
DE112018006597B4 (de) 2018-03-13 2022-10-06 Mitsubishi Electric Corporation Speech processing device and speech processing method
CN109147779A (zh) * 2018-08-14 2019-01-04 苏州思必驰信息科技有限公司 Voice data processing method and device
US11151993B2 (en) * 2018-12-28 2021-10-19 Baidu Usa Llc Activating voice commands of a smart display device based on a vision-based mechanism
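KR20210042520A (ko) * 2019-10-10 2021-04-20 삼성전자주식회사 Electronic device and control method thereof
CN111768760B (zh) * 2020-05-26 2023-04-18 云知声智能科技股份有限公司 Multimodal voice endpoint detection method and device
CN113345472B (zh) * 2021-05-08 2022-03-25 北京百度网讯科技有限公司 Voice endpoint detection method and apparatus, electronic device, and storage medium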

Family Cites Families (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5781300A (en) * 1980-11-10 1982-05-21 Matsushita Electric Ind Co Ltd Voice recognition apparatus
US4449189A (en) * 1981-11-20 1984-05-15 Siemens Corporation Personal access control system using speech and face recognition
SE450325B (sv) * 1983-02-23 1987-06-22 Tricum Ab Dietary fibre product based on husk parts from the seed of cereals
US4975960A (en) 1985-06-03 1990-12-04 Petajan Eric D Electronic facial tracking and detection system and method and apparatus for automated speech recognition
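JPS6338993A (ja) * 1986-08-04 1988-02-19 松下電器産業株式会社 Speech segment detection device
JP3273781B2 (ja) * 1989-09-21 2002-04-15 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴイ Record carrier, method and apparatus for obtaining a record carrier, and information recording apparatus having copy protection means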
US5412378A (en) * 1990-06-13 1995-05-02 Clemens; Jon K. Antitheft protection of devices
US5621858A (en) * 1992-05-26 1997-04-15 Ricoh Corporation Neural network acoustic and visual speech recognition system training method and apparatus
JPH06230799A (ja) * 1993-02-04 1994-08-19 Nippon Telegr & Teleph Corp <Ntt> Signal recording device
JPH06301393A (ja) * 1993-04-13 1994-10-28 Matsushita Electric Ind Co Ltd Speech segment detection device and speech recognition device
US5473726A (en) * 1993-07-06 1995-12-05 The United States Of America As Represented By The Secretary Of The Air Force Audio and amplitude modulated photo data collection for speech recognition
US6471420B1 (en) * 1994-05-13 2002-10-29 Matsushita Electric Industrial Co., Ltd. Voice selection apparatus voice response apparatus, and game apparatus using word tables from which selected words are output as voice selections
JPH1011087A (ja) * 1996-06-27 1998-01-16 Kokusai Denshin Denwa Co Ltd <Kdd> Unregistered word detection method and apparatus, and speech recognition apparatus
US5915027A (en) * 1996-11-05 1999-06-22 Nec Research Institute Digital watermarking
NL1007123C2 (nl) 1997-09-26 1999-03-29 Od & Me Bv Record carriers, method for checking such record carriers, methods for manufacturing such record carriers, and apparatus suitable for carrying out such methods.
US6216103B1 (en) * 1997-10-20 2001-04-10 Sony Corporation Method for implementing a speech recognition system to determine speech endpoints during conditions with background noise
JP3865924B2 (ja) * 1998-03-26 2007-01-10 松下電器産業株式会社 Speech recognition device
US6219639B1 (en) * 1998-04-28 2001-04-17 International Business Machines Corporation Method and apparatus for recognizing identity of individuals employing synchronized biometrics
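JP2000338987A (ja) * 1999-05-28 2000-12-08 Mitsubishi Electric Corp Utterance start monitoring device, speaker identification device, voice input system, speaker identification system, and communication system
JP3983421B2 (ja) * 1999-06-11 2007-09-26 三菱電機株式会社 Speech recognition device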
US6594629B1 (en) * 1999-08-06 2003-07-15 International Business Machines Corporation Methods and apparatus for audio-visual speech detection and recognition
US6219640B1 (en) * 1999-08-06 2001-04-17 International Business Machines Corporation Methods and apparatus for audio-visual speaker recognition and utterance verification
JP2001051772A (ja) * 1999-08-11 2001-02-23 Fujitsu Ltd Speaker face position detection device and speaker lip opening/closing detection device
ES2231448T3 (es) * 2000-01-27 2005-05-16 Siemens Aktiengesellschaft System and method for vision-focused voice processing.
US6754373B1 (en) * 2000-07-14 2004-06-22 International Business Machines Corporation System and method for microphone activation using visual speech cues
US6721706B1 (en) * 2000-10-30 2004-04-13 Koninklijke Philips Electronics N.V. Environment-responsive user interface/entertainment device that simulates personal interaction

Also Published As

Publication number Publication date
CN1291372C (zh) 2006-12-20
EP1472679B1 (de) 2009-01-14
JP2005516263A (ja) 2005-06-02
WO2003065350A1 (en) 2003-08-07
EP1472679A1 (de) 2004-11-03
DE60325826D1 (de) 2009-03-05
CN1623182A (zh) 2005-06-01
JP4681810B2 (ja) 2011-05-11
US20030144844A1 (en) 2003-07-31
US7219062B2 (en) 2007-05-15

Similar Documents

Publication Publication Date Title
ATE421136T1 (de) Audiovisual speech activity detection for a speech recognition system
US11683632B2 (en) Automatic speech recognition triggering system
ES2806204T3 (es) Voice recognition techniques for activation and related systems and methods
US9775113B2 (en) Voice wakeup detecting device with digital microphone and associated method
BR0113725A (pt) Combination of DTW and HMM in speaker-dependent and speaker-independent speech recognition modes
EP1901282A3 (de) Voice communication system for a vehicle and method for operating a voice communication system for a vehicle
KR950015199A (ko) Speech recognition method and apparatus
ATE335195T1 (de) Background learning of speaker voices
WO2007117814A3 (en) Voice signal perturbation for speech recognition
US10477294B1 (en) Multi-device audio capture
US11694685B2 (en) Hotphrase triggering based on a sequence of detections
CN110268471B (zh) Method and apparatus for ASR with embedded noise reduction
WO2003098373A3 (en) Voice authentication
WO2004081916A3 (en) Human machine interface with speech recognition
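CN204242252U (zh) Driving recorder with voiceprint recognition function
CN108337620A (zh) Voice-controlled loudspeaker and control method thereof
CN208337877U (zh) Voice-controlled loudspeaker
CN111294475B (zh) Electronic device and mode switching method of electronic device
CN112908310A (zh) Voice command recognition method and recognition system in a smart appliance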
Tatman Speaker dialect is a necessary feature to model perceptual accent adaptation in humans
Mishra et al. Automatic speech recognition using template model for man-machine interface
Rahman et al. Speech recognition front-end for segmenting and clustering continuous bangla speech
Heracleous et al. Audible (normal) speech and inaudible murmur recognition using NAM microphone
Fan et al. Power-normalized PLP (PNPLP) feature for robust speech recognition
JP2018084700A (ja) Control method for dialogue assistance system, dialogue assistance system, and program

Legal Events

Date Code Title Description
RER Ceased as to paragraph 5 lit. 3 law introducing patent treaties