WO2009025356A1 - Speech recognition device and speech recognition method - Google Patents

Speech recognition device and speech recognition method

Info

Publication number
WO2009025356A1
WO2009025356A1 PCT/JP2008/065008 JP2008065008W WO2009025356A1 WO 2009025356 A1 WO2009025356 A1 WO 2009025356A1 JP 2008065008 W JP2008065008 W JP 2008065008W WO 2009025356 A1 WO2009025356 A1 WO 2009025356A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
unit
score
tone
search unit
Prior art date
Application number
PCT/JP2008/065008
Other languages
English (en)
French (fr)
Inventor
Ken Hanazawa
Original Assignee
Nec Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nec Corporation
Priority to JP2009529074A (JP5282737B2, ja)
Priority to US12/672,015 (US8315870B2, en)
Priority to CN2008801035918A (CN101785051B, zh)
Publication of WO2009025356A1 (ja)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/18: Speech classification or search using natural language modelling
    • G10L15/1807: Speech classification or search using natural language modelling using prosody or stress
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/18: Speech classification or search using natural language modelling
    • G10L15/183: Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187: Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025: Phonemes, fenemes or fenones being the recognition units
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/15: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information

Abstract

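A distance calculation unit (16) obtains the acoustic distance between the feature quantities of the input speech and each phoneme model. A word search unit (17) performs a word search based on the acoustic distances and on a language model that includes the phonemes and tone labels of words, and outputs word hypotheses together with a first score indicating the likelihood of each word hypothesis. The word search unit (17) also outputs the vowel sections in the input speech, and their tone labels, under the assumption that the recognition result for the input speech is the word hypothesis. A tone recognition unit (21) outputs a second score indicating the likelihood of the tone labels output from the word search unit (17), based on the feature quantities corresponding to the vowel sections output from the word search unit (17). A rescore unit (22) uses the second score output from the tone recognition unit (21) to correct the first score of the word hypothesis output from the word search unit (17). This improves speech recognition accuracy for tonal speech.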
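The rescoring flow described in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `WordHypothesis`, `toy_tone_score`, and `rescore` are illustrative names, the tone scorer is a toy pitch-slope comparison standing in for the tone recognition unit (21), and the weighted sum used to correct the first score is an assumption, since the abstract does not state the correction formula.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# (start_frame, end_frame_exclusive, tone_label) for one vowel section
VowelSection = Tuple[int, int, int]


@dataclass
class WordHypothesis:
    word: str
    first_score: float                  # score from the word search (higher is better)
    vowel_sections: List[VowelSection]  # vowel sections implied by this hypothesis


def toy_tone_score(pitch: Sequence[float], tone: int) -> float:
    """Hypothetical stand-in for the tone recognition unit (21).

    Scores a tone label by how closely the pitch slope over a non-empty vowel
    section matches an assumed slope for that tone (0 is a perfect match).
    """
    expected_slope = {1: 0.0, 2: 1.0, 4: -1.0}  # assumed: flat, rising, falling
    slope = (pitch[-1] - pitch[0]) / max(len(pitch) - 1, 1)
    return -abs(slope - expected_slope[tone])


def rescore(
    hypotheses: Sequence[WordHypothesis],
    pitch_track: Sequence[float],
    tone_scorer: Callable[[Sequence[float], int], float] = toy_tone_score,
    weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Correct each first score with the summed second (tone) scores.

    The weighted sum is an assumption; the abstract does not give the formula.
    """
    rescored = []
    for hyp in hypotheses:
        second_score = sum(
            tone_scorer(pitch_track[start:end], tone)
            for start, end, tone in hyp.vowel_sections
        )
        rescored.append((hyp.word, hyp.first_score + weight * second_score))
    # Best hypothesis first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# A rising pitch contour: the word search slightly prefers "ma4",
# but the tone evidence favours "ma2".
pitch_track = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]
hypotheses = [
    WordHypothesis("ma4", -10.0, [(0, 6, 4)]),
    WordHypothesis("ma2", -10.5, [(0, 6, 2)]),
]
ranked = rescore(hypotheses, pitch_track)
print(ranked[0][0])
```

With these toy inputs the tone evidence moves "ma2" ahead of "ma4". Setting `weight=0.0` reproduces the ranking of the word search alone, which makes the contribution of the second score easy to isolate.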
PCT/JP2008/065008 2007-08-22 2008-08-22 Speech recognition device and speech recognition method WO2009025356A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2009529074A JP5282737B2 (ja) 2007-08-22 2008-08-22 Speech recognition device and speech recognition method
US12/672,015 US8315870B2 (en) 2007-08-22 2008-08-22 Rescoring speech recognition hypothesis using prosodic likelihood
CN2008801035918A CN101785051B (zh) 2007-08-22 2008-08-22 Speech recognition device and speech recognition method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2007-215958 2007-08-22
JP2007215958 2007-08-22

Publications (1)

Publication Number Publication Date
WO2009025356A1 (ja) 2009-02-26

Family

ID=40378256

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2008/065008 WO2009025356A1 (ja) 2007-08-22 2008-08-22 Speech recognition device and speech recognition method

Country Status (4)

Country Link
US (1) US8315870B2 (ja)
JP (1) JP5282737B2 (ja)
CN (1) CN101785051B (ja)
WO (1) WO2009025356A1 (ja)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2233110A1 (en) 2009-03-24 2010-09-29 orangedental GmbH & Co. KG Methods and apparatus to determine distances for use in dentistry
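CN102254556A (zh) * 2010-05-17 2011-11-23 阿瓦雅公司 Estimating a listener's ability to understand a speaker based on a comparison of the listener's and speaker's speaking styles
CN102938252A (zh) * 2012-11-23 2013-02-20 中国科学院自动化研究所 Chinese tone recognition system and method combining prosodic and articulatory features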

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102237081B (zh) * 2010-04-30 2013-04-24 国际商业机器公司 Speech prosody evaluation method and system
US10002608B2 (en) * 2010-09-17 2018-06-19 Nuance Communications, Inc. System and method for using prosody for voice-enabled search
US8401853B2 (en) 2010-09-22 2013-03-19 At&T Intellectual Property I, L.P. System and method for enhancing voice-enabled search based on automated demographic identification
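JP5179559B2 (ja) * 2010-11-12 2013-04-10 シャープ株式会社 Control device for controlling an image processing system, image forming device, image reading device, control method, image processing program, and computer-readable recording medium
JP5716595B2 (ja) * 2011-01-28 2015-05-13 富士通株式会社 Speech correction device, speech correction method, and speech correction program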
US9317605B1 (en) 2012-03-21 2016-04-19 Google Inc. Presenting forked auto-completions
TWI557722B (zh) * 2012-11-15 2016-11-11 緯創資通股份有限公司 Method and system for filtering out voice interference, and computer-readable recording medium
WO2014167570A1 (en) * 2013-04-10 2014-10-16 Technologies For Voice Interface System and method for extracting and using prosody features
US9251202B1 (en) * 2013-06-25 2016-02-02 Google Inc. Corpus specific queries for corpora from search query
US9646606B2 (en) 2013-07-03 2017-05-09 Google Inc. Speech recognition using domain knowledge
CN103474061A (zh) * 2013-09-12 2013-12-25 河海大学 Automatic Chinese dialect identification method based on classifier fusion
CN105632499B (zh) * 2014-10-31 2019-12-10 株式会社东芝 Method and device for optimizing speech recognition results
US9824684B2 (en) * 2014-11-13 2017-11-21 Microsoft Technology Licensing, Llc Prediction-based sequence recognition
CN104464751B (zh) * 2014-11-21 2018-01-16 科大讯飞股份有限公司 Method and device for detecting pronunciation prosody problems
US9953644B2 (en) 2014-12-01 2018-04-24 At&T Intellectual Property I, L.P. Targeted clarification questions in speech recognition with concept presence score and concept correctness score
WO2016103358A1 (ja) * 2014-12-24 2016-06-30 三菱電機株式会社 Speech recognition device and speech recognition method
US9754580B2 (en) 2015-10-12 2017-09-05 Technologies For Voice Interface System and method for extracting and using prosody features
CN105869624B (zh) 2016-03-29 2019-05-10 腾讯科技(深圳)有限公司 Method and device for constructing a speech decoding network in digit speech recognition
US10607601B2 (en) * 2017-05-11 2020-03-31 International Business Machines Corporation Speech recognition by selecting and refining hot words
TW201921336A (zh) * 2017-06-15 2019-06-01 大陸商北京嘀嘀無限科技發展有限公司 System and method for speech recognition
CN109145281B (zh) * 2017-06-15 2020-12-25 北京嘀嘀无限科技发展有限公司 Speech recognition method, device, and storage medium
EP3823306B1 (en) * 2019-11-15 2022-08-24 Sivantos Pte. Ltd. A hearing system comprising a hearing instrument and a method for operating the hearing instrument
CN111862954B (zh) * 2020-05-29 2024-03-01 北京捷通华声科技股份有限公司 Method and device for obtaining a speech recognition model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
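JPS63165900A (ja) * 1986-12-27 1988-07-09 沖電気工業株式会社 Conversational speech recognition system
JPH04128899A (ja) * 1990-09-20 1992-04-30 Fujitsu Ltd Speech recognition device
JPH07261778A (ja) * 1994-03-22 1995-10-13 Canon Inc Speech information processing method and device
JP2001282282A (ja) * 2000-03-31 2001-10-12 Canon Inc Speech information processing method and device, and storage medium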

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0758839B2 (ja) 1987-09-05 1995-06-21 ティーディーケイ株式会社 Electronic component insertion head
JP2946219B2 (ja) 1989-11-22 1999-09-06 九州日立マクセル株式会社 Printing plate for screen printing
SE514684C2 (sv) * 1995-06-16 2001-04-02 Telia Ab Method for speech-to-text conversion
US5806031A (en) * 1996-04-25 1998-09-08 Motorola Method and recognizer for recognizing tonal acoustic sound signals
JP3006677B2 (ja) * 1996-10-28 2000-02-07 日本電気株式会社 Speech recognition device
US6253178B1 (en) * 1997-09-22 2001-06-26 Nortel Networks Limited Search and rescoring method for a speech recognition system
JP2003514260A (ja) * 1999-11-11 2003-04-15 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Tone features for speech recognition
US7043430B1 (en) * 1999-11-23 2006-05-09 Infotalk Corporation Limited System and method for speech recognition using tonal modeling
CN1180398C (zh) * 2000-05-26 2004-12-15 封家麒 Speech recognition method and system
US6510410B1 (en) * 2000-07-28 2003-01-21 International Business Machines Corporation Method and apparatus for recognizing tone languages using pitch information
AU2000276402A1 (en) * 2000-09-30 2002-04-15 Intel Corporation Method, apparatus, and system for bottom-up tone integration to chinese continuous speech recognition system
JP4353202B2 (ja) * 2006-05-25 2009-10-28 ソニー株式会社 Prosody identification device and method, and speech recognition device and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ONODERA S. ET AL: "Multipath hoshiki o mochiita zatsuon kankyoka deno tango onsei ninshiki -accent joho no riyo-", THE ACOUSTICAL SOCIETY OF JAPAN 2004 SHUNKI KENKYU HAPPYOKAI KOEN RONBUNSHU, 17 March 2004 (2004-03-17), pages 161 - 162 *
ZHAO LI ET AL: "3 Jigen viterbi-ho o mochiita onso joho to oncho joho no togo ni yoru chugokugo renzoku onsei ninshiki", THE JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN, vol. 54, no. 7, 1 July 1998 (1998-07-01), pages 497 - 505 *

Also Published As

Publication number Publication date
CN101785051A (zh) 2010-07-21
JP5282737B2 (ja) 2013-09-04
US20110196678A1 (en) 2011-08-11
US8315870B2 (en) 2012-11-20
JPWO2009025356A1 (ja) 2010-11-25
CN101785051B (zh) 2012-09-05

Similar Documents

Publication Publication Date Title
WO2009025356A1 (ja) Speech recognition device and speech recognition method
TW200638337A (en) Using a spoken utterance for disambiguation of spelling inputs into a speech recognition system
ATE524777T1 (de) Automatic updating of a language model
ATE395685T1 (de) Speech recognition by word-in-phrase command
WO2006023631A3 (en) Document transcription system training
WO2008073850A3 (en) Method and apparatus for reading education
WO2008087934A1 (ja) Extended recognition dictionary learning device and speech recognition system
ATE362633T1 (de) Learning the pronunciation of new words using a pronunciation graph
WO2007118020A3 (en) Method and system for managing pronunciation dictionaries in a speech application
ATE404967T1 (de) Text-to-speech system and method, and computer program therefor
TW200601263A (en) Apparatus and method for synthesized audible response to an utterance in speaker-independent voice recognition
EP4318463A3 (en) Multi-modal input on an electronic device
WO2007034478A3 (en) System and method for correcting speech
WO2009006081A3 (en) Pronunciation correction of text-to-speech systems between different spoken languages
WO2008142836A1 (ja) Voice quality conversion device and voice quality conversion method
WO2007117814A3 (en) Voice signal perturbation for speech recognition
AU2001250579A1 (en) Discriminatively trained mixture models in continuous speech recognition
ATE405920T1 (de) Generating a speech recognition grammar for alphanumeric expressions
WO2009035825A3 (en) Automatic reading tutoring
ATE514162T1 (de) Dynamic generation of contexts for speech recognition
DE602004024172D1 (de) Automatic generation of a word pronunciation for speech recognition
Jang Speech rhythm metrics for automatic scoring of English speech by Korean EFL learners
JP2004271895A (ja) Multilingual speech recognition system and pronunciation learning system
JP2012255867A (ja) Speech recognition device
Elmahdy et al. A baseline speech recognition system for levantine colloquial arabic

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 200880103591.8

Country of ref document: CN

121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 08827744

Country of ref document: EP

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: 2009529074

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

WWE WIPO information: entry into national phase

Ref document number: 12672015

Country of ref document: US

122 EP: PCT application non-entry in European phase

Ref document number: 08827744

Country of ref document: EP

Kind code of ref document: A1