CN104541324B - A speech recognition system and a method of using dynamic Bayesian network models

Publication number
CN104541324B
Authority
CN
China
Prior art keywords
signal
observed
analysis module
word
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201380031695.3A
Other languages
English (en)
Chinese (zh)
Other versions
CN104541324A (zh)
Inventor
Bartosz Ziółko (巴尔托什·焦尔科)
Tomasz Jadczyk (托马什·贾奇克)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Akademia Górniczo-Hutnicza
Original Assignee
Akademia Górniczo-Hutnicza
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-05-01
Filing date
2013-06-26
Publication date
2019-09-13
Application filed by Akademia Górniczo-Hutnicza
Publication of CN104541324A
Application granted
Publication of CN104541324B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)
  • Telephonic Communication Services (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
CN201380031695.3A 2013-05-01 2013-06-26 A speech recognition system and a method of using dynamic Bayesian network models Expired - Fee Related CN104541324B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
PL403724A PL403724A1 (pl) 2013-05-01 2013-05-01 Speech recognition system and method of using dynamic models and Bayesian networks
PLP.403724 2013-05-01
PCT/EP2013/063330 WO2014177232A1 (en) 2013-05-01 2013-06-26 A speech recognition system and a method of using dynamic bayesian network models

Publications (2)

Publication Number Publication Date
CN104541324A (zh) 2015-04-22
CN104541324B (zh) 2019-09-13

Family

ID=48699782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201380031695.3A Expired - Fee Related CN104541324B (zh) 2013-05-01 2013-06-26 A speech recognition system and a method of using dynamic Bayesian network models

Country Status (9)

Country Link
US (1) US9552811B2 (en)
EP (1) EP2959475B1 (en)
JP (1) JP2016517047A (en)
CN (1) CN104541324B (en)
AU (1) AU2013388411A1 (en)
CA (1) CA2875727A1 (en)
IN (1) IN2014DN10400A (en)
PL (2) PL403724A1 (en)
WO (1) WO2014177232A1 (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017532082A (ja) 2014-08-22 2017-11-02 SRI International System for speech-based assessment of a patient's mental state
US10706873B2 (en) * 2015-09-18 2020-07-07 Sri International Real-time speaker state analytics platform
US9792907B2 (en) 2015-11-24 2017-10-17 Intel IP Corporation Low resource key phrase detection for wake on voice
CN105654944B (zh) * 2015-12-30 2019-11-01 中国科学院自动化研究所 Environmental sound recognition method and device fusing short-term and long-term feature modeling
US9972313B2 (en) * 2016-03-01 2018-05-15 Intel Corporation Intermediate scoring and rejection loopback for improved key phrase detection
US10043521B2 (en) 2016-07-01 2018-08-07 Intel IP Corporation User defined key phrase detection by user dependent sequence modeling
CN106297828B (zh) * 2016-08-12 2020-03-24 苏州驰声信息科技有限公司 Deep-learning-based mispronunciation detection method and device
US10083689B2 (en) * 2016-12-23 2018-09-25 Intel Corporation Linear scoring for low power wake on voice
WO2018209608A1 (en) * 2017-05-17 2018-11-22 Beijing Didi Infinity Technology And Development Co., Ltd. Method and system for robust language identification
US10902738B2 (en) * 2017-08-03 2021-01-26 Microsoft Technology Licensing, Llc Neural models for key phrase detection and question generation
CN107729381B (zh) * 2017-09-15 2020-05-08 广州嘉影软件有限公司 Interactive multimedia resource aggregation method and system based on multi-dimensional feature recognition
US10714122B2 (en) 2018-06-06 2020-07-14 Intel Corporation Speech classification of audio for wake on voice
US10650807B2 (en) 2018-09-18 2020-05-12 Intel Corporation Method and system of neural network keyphrase detection
US11127394B2 (en) 2019-03-29 2021-09-21 Intel Corporation Method and system of high accuracy keyphrase detection for low resource devices
CN110838306B (zh) * 2019-11-12 2022-05-13 广州视源电子科技股份有限公司 Speech signal detection method, computer storage medium, and related device
US11640713B2 (en) * 2020-07-29 2023-05-02 Optima Sports Systems S.L. Computing system and a computer-implemented method for sensing gameplay events and augmentation of video feed with overlay
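CN114612810B (zh) * 2020-11-23 2023-04-07 山东大卫国际建筑设计有限公司 Dynamic adaptive abnormal posture recognition method and device
CN115718536B (zh) * 2023-01-09 2023-04-18 苏州浪潮智能科技有限公司 Frequency adjustment method and apparatus, electronic device, and readable storage medium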

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6292776B1 (en) * 1999-03-12 2001-09-18 Lucent Technologies Inc. Hierarchial subband linear predictive cepstral features for HMM-based speech recognition
US6542866B1 (en) * 1999-09-22 2003-04-01 Microsoft Corporation Speech recognition method and apparatus utilizing multiple feature streams
US20030212552A1 (en) * 2002-05-09 2003-11-13 Liang Lu Hong Face recognition procedure useful for audiovisual speech recognition
US7346510B2 (en) * 2002-03-19 2008-03-18 Microsoft Corporation Method of speech recognition using variables representing dynamic aspects of speech
CN102411931A (zh) * 2010-09-15 2012-04-11 微软公司 Deep belief network for large vocabulary continuous speech recognition

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6256046B1 (en) 1997-04-18 2001-07-03 Compaq Computer Corporation Method and apparatus for visual sensing of humans for active public interfaces
WO2004027685A2 (en) * 2002-09-19 2004-04-01 The Penn State Research Foundation Prosody based audio/visual co-analysis for co-verbal gesture recognition
US7203368B2 (en) 2003-01-06 2007-04-10 Intel Corporation Embedded bayesian network for pattern recognition
US7454342B2 (en) * 2003-03-19 2008-11-18 Intel Corporation Coupled hidden Markov model (CHMM) for continuous audiovisual speech recognition
US7454336B2 (en) * 2003-06-20 2008-11-18 Microsoft Corporation Variational inference and learning for segmental switching state space models of hidden speech dynamics
JP4479191B2 (ja) * 2003-08-25 2010-06-09 カシオ計算機株式会社 Speech recognition device, speech recognition method, and speech recognition processing program
US20050228673A1 (en) * 2004-03-30 2005-10-13 Nefian Ara V Techniques for separating and evaluating audio and video source data
JP4843987B2 (ja) * 2005-04-05 2011-12-21 ソニー株式会社 Information processing apparatus, information processing method, and program
US8200648B2 (en) * 2006-08-07 2012-06-12 Yeda Research & Development Co. Ltd. At The Weizmann Institute Of Science Data similarity and importance using local and global evidence scores
US9589380B2 (en) 2007-02-27 2017-03-07 International Business Machines Corporation Avatar-based unsolicited advertisements in a virtual universe
US9183843B2 (en) * 2011-01-07 2015-11-10 Nuance Communications, Inc. Configurable speech recognition system using multiple recognizers

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"An asynchronous DBN for audio-visual speech recognition"; Kate Saenko and Karen Livescu; 2006 IEEE Spoken Language Technology Workshop; 2006-12-13; full text *
"Automatic Speech Recognition using Dynamic Bayesian Networks with both Acoustic and Articulatory Variables"; Todd A. Stephenson et al.; International Conference on Spoken Language Processing, 2000; 2000-10-31; vol. 2; full text *
"DBN based multi-stream models for audio-visual speech recognition"; John N. Gowdy et al.; IEEE International Conference on Acoustics, 2004; 2004-05-21; full text *
"Hierarchical spectro-temporal features for robust speech recognition"; Xavier Domont et al.; IEEE International Conference on Acoustics, 2008; 2008-04-04; full text *
"Visual model structures and synchrony constraints for audio-visual speech recognition"; Timothy J. Hazen; IEEE Transactions on Audio Speech & Language Processing; 2006-05-31; vol. 14, no. 3; full text *

Also Published As

Publication number Publication date
IN2014DN10400A (en) 2015-08-14
JP2016517047A (ja) 2016-06-09
PL403724A1 (pl) 2014-11-10
CA2875727A1 (en) 2014-11-06
CN104541324A (zh) 2015-04-22
US9552811B2 (en) 2017-01-24
EP2959475A1 (en) 2015-12-30
PL2959475T3 (pl) 2018-04-30
WO2014177232A1 (en) 2014-11-06
AU2013388411A1 (en) 2015-01-22
EP2959475B1 (en) 2017-02-08
US20160111086A1 (en) 2016-04-21

Similar Documents

Publication Publication Date Title
CN104541324B (zh) A speech recognition system and a method of using dynamic Bayesian network models
Chernykh et al. Emotion recognition from speech with recurrent neural networks
Hayashi et al. Duration-controlled LSTM for polyphonic sound event detection
JP6671020B2 (ja) Dialogue act estimation method, dialogue act estimation device, and program
Mariooryad et al. Compensating for speaker or lexical variabilities in speech for emotion recognition
KR20190125463A (ko) Speech emotion detection method and apparatus, computer device, and storage medium
US12148417B1 (en) Label confidence scoring
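JP6910002B2 (ja) Dialogue act estimation method, dialogue act estimation device, and program
CN112397053B (zh) Speech recognition method and apparatus, electronic device, and readable storage medium
CN113763992B (zh) Speech evaluation method and apparatus, computer device, and storage medium
JP2024502946A6 (ja) Punctuation and capitalization of speech recognition transcripts
JP2024502946A (ja) Punctuation and capitalization of speech recognition transcripts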
Schmitt et al. Towards adaptive spoken dialog systems
Moyal et al. Phonetic search methods for large speech databases
Vegesna et al. Dnn-hmm acoustic modeling for large vocabulary telugu speech recognition
Zhang et al. Cacnet: Cube attentional cnn for automatic speech recognition
CN115273862A (zh) Speech processing method and apparatus, electronic device, and medium
Palzer et al. Improving neural diarization through speaker attribute attractors and local dependency modeling
CN112150103A (zh) Schedule setting method and apparatus, and storage medium
Rana et al. Multi-task semisupervised adversarial autoencoding for speech emotion
Röpke et al. Training a speech-to-text model for Dutch on the Corpus gesproken Nederlands
Tripathi et al. Cyclegan-based speech mode transformation model for robust multilingual ASR
Ohta et al. Response type selection for chat-like spoken dialog systems based on LSTM and multi-task learning
Bayer et al. Towards End-to-End Spoken Dialogue Systems with Turn Embeddings.
Yoshida et al. Audio-visual voice activity detection based on an utterance state transition model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190913

Termination date: 20210626