JP4322815B2 - Speech recognition system and method - Google Patents
Speech recognition system and method
- Publication number
- JP4322815B2
- Authority
- JP
- Japan
- Prior art keywords
- token
- node
- tokens
- likelihood
- list
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
Description
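When sub-word acoustic models are used, a dictionary is needed that gives the exact sequence of models representing any word in the current language domain. For phone-unit acoustic models, this lexicon is a pronunciation dictionary, which gives a phonetic transcription for each word.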
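As a rough illustration of the pronunciation dictionary described above, the following Python sketch expands a word sequence into the sequence of phone-unit models to be concatenated. The words, the phone symbols, and the names `LEXICON` and `words_to_phone_models` are hypothetical and are not taken from the patent.

```python
# Hypothetical pronunciation lexicon: each word maps to one phonetic transcription.
# The entries below are illustrative only.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def words_to_phone_models(words, lexicon=LEXICON):
    """Expand a word sequence into the phone-unit model sequence it implies."""
    phones = []
    for word in words:
        if word not in lexicon:
            # A word outside the lexicon has no model sequence to look up.
            raise KeyError(f"word not in lexicon: {word!r}")
        phones.extend(lexicon[word])
    return phones
```

A real lexicon would typically allow several pronunciations per word; a single transcription per word is assumed here to keep the sketch short.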
Claims (2)
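- A decoding device of an automatic speech recognition system for determining, according to a predetermined criterion, one or more concatenations of candidate text units corresponding to a speech segment, the device comprising:
means for receiving a sequence of feature vectors corresponding to the speech segment;
means for mapping the feature vectors to node sequences of a decoding network representing sequences of text units, using likelihood values that indicate how well the feature vectors correspond to each node sequence; and
means for determining one or more candidate node sequences in the decoding network that correspond to the concatenations of candidate text units, by executing a dynamic programming algorithm in which each token corresponds to one node and is associated with a plurality of concatenations of text units and the likelihood values of those concatenations,
wherein one token associated with one node in the decoding network is derived from a plurality of tokens associated with a plurality of preceding nodes in the network, and
wherein, in the decoding network, a plurality of the tokens from different nodes (source nodes) that are to move to a common node are merged, thereby generating a new token comprising:
(A) pointers to the lists of candidate text histories held by the respective source nodes;
(B) offsets, each being the likelihood difference, between the respective lists, of the likelihoods corresponding to the candidate text histories; and
(C) the highest likelihood among the likelihoods corresponding to each candidate text history contained in the lists with the candidate text of the common node appended.
- A decoding method for determining, in an automatic speech recognition system and according to a predetermined criterion, a plurality of concatenations of candidate text units corresponding to a speech segment, the method comprising the steps of:
receiving a sequence of feature vectors corresponding to the speech segment;
mapping the feature vectors to node sequences of a decoding network representing sequences of text units, using likelihood values that indicate how well the feature vectors correspond to each node sequence; and
determining one or more candidate node sequences in the decoding network that correspond to the concatenations of candidate text units, by executing a dynamic programming algorithm in which each token corresponds to one node and is associated with a plurality of concatenations of text units and the likelihood values of those concatenations,
wherein one token associated with one node in the decoding network is derived from a plurality of tokens associated with a plurality of preceding nodes in the network, and
wherein, in the decoding network, a plurality of the tokens from different nodes (source nodes) that are to move to a common node are merged, thereby generating a new token comprising:
(A) pointers to the lists of candidate text histories held by the respective source nodes;
(B) offsets, each being the likelihood difference, between the respective lists, of the likelihoods corresponding to the candidate text histories; and
(C) the highest likelihood among the likelihoods corresponding to each candidate text history contained in the lists with the candidate text of the common node appended.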
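The token merge recited in elements (A) to (C) of the claims can be pictured with a small Python sketch. This is one possible reading under stated assumptions, not the patented implementation: likelihoods are treated as log-likelihoods, each history list stores scores relative to its own best entry, and the names `HistoryList`, `Token`, `merge_tokens`, and `expand` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

History = Tuple[str, ...]


@dataclass
class HistoryList:
    # Each entry: (candidate text history, log-likelihood relative to this list's best entry).
    entries: List[Tuple[History, float]]


@dataclass
class Token:
    best: float                               # (C) highest log-likelihood carried by the token
    sources: List[Tuple[HistoryList, float]]  # (A) pointer to a list, (B) its offset from `best`


def merge_tokens(tokens: List[Token]) -> Token:
    """Merge tokens arriving at a common node without copying their history lists."""
    best = max(t.best for t in tokens)
    sources = []
    for t in tokens:
        for hist_list, offset in t.sources:
            # Re-express each list's offset relative to the merged token's best score.
            sources.append((hist_list, t.best + offset - best))
    return Token(best=best, sources=sources)


def expand(token: Token, word: str, n: int) -> List[Tuple[History, float]]:
    """Materialise the n best histories, appending the common node's word (assumed label)."""
    scored = [
        (hist + (word,), token.best + offset + rel)
        for hist_list, offset in token.sources
        for hist, rel in hist_list.entries
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:n]
```

In this sketch the merge only stores references and offsets, so its cost does not depend on how long the history lists are; the lists are walked only when `expand` is called, for example at a word boundary or at the end of the utterance.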
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0400101A GB2409750B (en) | 2004-01-05 | 2004-01-05 | Speech recognition system and technique |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2005215672A (ja) | 2005-08-11 |
JP4322815B2 (ja) | 2009-09-02 |
Family
ID=31503420
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2005000506A Expired - Fee Related JP4322815B2 (ja) | 2004-01-05 | 2005-01-05 | Speech recognition system and method |
Country Status (3)
Country | Link |
---|---|
US (1) | US7711561B2 (ja) |
JP (1) | JP4322815B2 (ja) |
GB (1) | GB2409750B (ja) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9786272B2 (en) | 2013-12-24 | 2017-10-10 | Kabushiki Kaisha Toshiba | Decoder for searching a digraph and generating a lattice, decoding method, and computer program product |
US10008200B2 (en) | 2013-12-24 | 2018-06-26 | Kabushiki Kaisha Toshiba | Decoder for searching a path according to a signal sequence, decoding method, and computer program product |
US10042345B2 (en) | 2014-01-31 | 2018-08-07 | Kabushiki Kaisha Toshiba | Conversion device, pattern recognition system, conversion method, and computer program product |
US10055511B2 (en) | 2013-12-24 | 2018-08-21 | Kabushiki Kaisha Toshiba | Search device, search method, and computer program product |
US10109274B2 (en) | 2014-11-28 | 2018-10-23 | Kabushiki Kaisha Toshiba | Generation device, recognition device, generation method, and computer program product |
US10452355B2 (en) | 2014-09-18 | 2019-10-22 | Kabushiki Kaisha Toshiba | Automaton deforming device, automaton deforming method, and computer program product |
US10572538B2 (en) | 2015-04-28 | 2020-02-25 | Kabushiki Kaisha Toshiba | Lattice finalization device, pattern recognition device, lattice finalization method, and computer program product |
Families Citing this family (56)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7277850B1 (en) * | 2003-04-02 | 2007-10-02 | At&T Corp. | System and method of word graph matrix decomposition |
US9117460B2 (en) * | 2004-05-12 | 2015-08-25 | Core Wireless Licensing S.A.R.L. | Detection of end of utterance in speech recognition system |
US7409332B2 (en) * | 2004-07-14 | 2008-08-05 | Microsoft Corporation | Method and apparatus for initializing iterative training of translation probabilities |
US7877256B2 (en) * | 2006-02-17 | 2011-01-25 | Microsoft Corporation | Time synchronous decoding for long-span hidden trajectory model |
US8510109B2 (en) | 2007-08-22 | 2013-08-13 | Canyon Ip Holdings Llc | Continuous speech transcription performance indication |
US9973450B2 (en) | 2007-09-17 | 2018-05-15 | Amazon Technologies, Inc. | Methods and systems for dynamically updating web service profile information by parsing transcribed message strings |
US8639509B2 (en) * | 2007-07-27 | 2014-01-28 | Robert Bosch Gmbh | Method and system for computing or determining confidence scores for parse trees at all levels |
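JP5327054B2 (ja) * | 2007-12-18 | 2013-10-30 | 日本電気株式会社 | Pronunciation variation rule extraction device, pronunciation variation rule extraction method, and pronunciation variation rule extraction program |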
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
KR101056511B1 (ko) * | 2008-05-28 | 2011-08-11 | (주)파워보이스 | Speech segment detection in noisy environments using real-time call command recognition, and continuous speech recognition system |
US20090307274A1 (en) * | 2008-06-06 | 2009-12-10 | Microsoft Corporation | Delayed merge |
US8131545B1 (en) * | 2008-09-25 | 2012-03-06 | Google Inc. | Aligning a transcript to audio data |
US11487347B1 (en) * | 2008-11-10 | 2022-11-01 | Verint Americas Inc. | Enhanced multi-modal communication |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
GB2482874B (en) * | 2010-08-16 | 2013-06-12 | Toshiba Res Europ Ltd | A speech processing system and method |
US8812321B2 (en) * | 2010-09-30 | 2014-08-19 | At&T Intellectual Property I, L.P. | System and method for combining speech recognition outputs from a plurality of domain-specific speech recognizers via machine learning |
US8914286B1 (en) * | 2011-04-14 | 2014-12-16 | Canyon IP Holdings, LLC | Speech recognition with hierarchical networks |
US8676580B2 (en) * | 2011-08-16 | 2014-03-18 | International Business Machines Corporation | Automatic speech and concept recognition |
US8972263B2 (en) * | 2011-11-18 | 2015-03-03 | Soundhound, Inc. | System and method for performing dual mode speech recognition |
US9324323B1 (en) * | 2012-01-13 | 2016-04-26 | Google Inc. | Speech recognition using topic-specific language models |
US8775177B1 (en) | 2012-03-08 | 2014-07-08 | Google Inc. | Speech recognition process |
US9483459B1 (en) * | 2012-03-31 | 2016-11-01 | Google Inc. | Natural language correction for speech input |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
KR101970041B1 (ko) * | 2012-09-07 | 2019-04-18 | 카네기 멜론 유니버시티 | Hybrid GPU/CPU data processing method |
KR20140089871A (ko) * | 2013-01-07 | 2014-07-16 | 삼성전자주식회사 | Interactive server, control method thereof, and interactive system |
CN103971686B (zh) * | 2013-01-30 | 2015-06-10 | 腾讯科技(深圳)有限公司 | Automatic speech recognition method and system |
US9153231B1 (en) * | 2013-03-15 | 2015-10-06 | Amazon Technologies, Inc. | Adaptive neural network speech recognition models |
WO2014197334A2 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US10102851B1 (en) * | 2013-08-28 | 2018-10-16 | Amazon Technologies, Inc. | Incremental utterance processing and semantic stability determination |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
KR102267405B1 (ko) * | 2014-11-21 | 2021-06-22 | 삼성전자주식회사 | Speech recognition apparatus and method of controlling the speech recognition apparatus |
US9552808B1 (en) * | 2014-11-25 | 2017-01-24 | Google Inc. | Decoding parameters for Viterbi search |
CN105700389B (zh) * | 2014-11-27 | 2020-08-11 | 青岛海尔智能技术研发有限公司 | Smart home natural language control method |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9578173B2 (en) | 2015-06-05 | 2017-02-21 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US20170092278A1 (en) * | 2015-09-30 | 2017-03-30 | Apple Inc. | Speaker recognition |
US10176802B1 (en) * | 2016-03-21 | 2019-01-08 | Amazon Technologies, Inc. | Lattice encoding using recurrent neural networks |
US10199037B1 (en) * | 2016-06-29 | 2019-02-05 | Amazon Technologies, Inc. | Adaptive beam pruning for automatic speech recognition |
JP2018013590A (ja) | 2016-07-20 | 2018-01-25 | 株式会社東芝 | Generation device, recognition system, method for generating a finite state transducer, and data |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10157607B2 (en) | 2016-10-20 | 2018-12-18 | International Business Machines Corporation | Real time speech output speed adjustment |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
DK201770439A1 (en) | 2017-05-11 | 2018-12-13 | Apple Inc. | Offline personal assistant |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
DK201770432A1 (en) | 2017-05-15 | 2018-12-21 | Apple Inc. | Hierarchical belief states for digital assistants |
DK179560B1 (en) | 2017-05-16 | 2019-02-18 | Apple Inc. | FAR-FIELD EXTENSION FOR DIGITAL ASSISTANT SERVICES |
US11011155B2 (en) * | 2017-08-01 | 2021-05-18 | Texas Instruments Incorporated | Multi-phrase difference confidence scoring |
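CN109036381A (zh) * | 2018-08-08 | 2018-12-18 | 平安科技(深圳)有限公司 | Speech processing method and device, computer device, and readable storage medium |
KR20200056001A (ko) | 2018-11-14 | 2020-05-22 | 삼성전자주식회사 | Decoding method in an artificial neural network and apparatus therefor |
CN111583910B (zh) * | 2019-01-30 | 2023-09-26 | 北京猎户星空科技有限公司 | Model updating method and device, electronic device, and storage medium |
CN110046276B (zh) * | 2019-04-19 | 2021-04-20 | 北京搜狗科技发展有限公司 | Method and device for retrieving keywords in speech |
CN110970031B (zh) * | 2019-12-16 | 2022-06-24 | 思必驰科技股份有限公司 | Speech recognition system and method |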
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1997008686A2 (en) * | 1995-08-28 | 1997-03-06 | Philips Electronics N.V. | Method and system for pattern recognition based on tree organised probability densities |
EP1133766B1 (en) * | 1998-11-25 | 2004-01-21 | Entropic Limited | Network and language models for use in a speech recognition system |
US6574595B1 (en) * | 2000-07-11 | 2003-06-03 | Lucent Technologies Inc. | Method and apparatus for recognition-based barge-in detection in the context of subword-based automatic speech recognition |
AU2000276400A1 (en) * | 2000-09-30 | 2002-04-15 | Intel Corporation | Search method based on single triphone tree for large vocabulary continuous speech recognizer |
US20020178004A1 (en) * | 2001-05-23 | 2002-11-28 | Chienchung Chang | Method and apparatus for voice recognition |
JP4048741B2 (ja) * | 2001-07-24 | 2008-02-20 | セイコーエプソン株式会社 | HMM output probability calculation method and speech recognition device |
US20030061046A1 (en) * | 2001-09-27 | 2003-03-27 | Qingwei Zhao | Method and system for integrating long-span language model into speech recognition system |
-
2004
- 2004-01-05 GB GB0400101A patent/GB2409750B/en not_active Expired - Fee Related
- 2004-04-15 US US10/824,517 patent/US7711561B2/en not_active Expired - Fee Related
-
2005
- 2005-01-05 JP JP2005000506A patent/JP4322815B2/ja not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
JP2005215672A (ja) | 2005-08-11 |
US7711561B2 (en) | 2010-05-04 |
GB0400101D0 (en) | 2004-02-04 |
US20050149326A1 (en) | 2005-07-07 |
GB2409750A (en) | 2005-07-06 |
GB2409750B (en) | 2006-03-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4322815B2 (ja) | Speech recognition system and method | |
US5621859A (en) | Single tree method for grammar directed, very large vocabulary speech recognizer | |
US9002705B2 (en) | Interactive device that recognizes input voice of a user and contents of an utterance of the user, and performs a response corresponding to the recognized contents | |
EP1128361B1 (en) | Language models for speech recognition | |
US6178401B1 (en) | Method for reducing search complexity in a speech recognition system | |
Lee et al. | Improved acoustic modeling for large vocabulary continuous speech recognition | |
Schwartz et al. | Multiple-pass search strategies | |
EP0664535A2 (en) | Large vocabulary connected speech recognition system and method of language representation using evolutional grammar to represent context free grammars | |
US20040172247A1 (en) | Continuous speech recognition method and system using inter-word phonetic information | |
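JP2005227758A (ja) | Automatic identification of telephone callers based on voice characteristics | |
JP2001249684A (ja) | Speech recognition device, speech recognition method, and recording medium | |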
US6253178B1 (en) | Search and rescoring method for a speech recognition system | |
Renals et al. | Start-synchronous search for large vocabulary continuous speech recognition | |
US6980954B1 (en) | Search method based on single triphone tree for large vocabulary continuous speech recognizer | |
Renals et al. | Decoder technology for connectionist large vocabulary speech recognition | |
Lee et al. | Improved acoustic modeling for continuous speech recognition | |
Lee et al. | Acoustic modeling of subword units for speech recognition | |
JP3873418B2 (ja) | Speech spotting device | |
JP3559479B2 (ja) | Continuous speech recognition method | |
Steinbiss | A search organization for large-vocabulary recognition based on n-best decoding. | |
JP2005091504A (ja) | Speech recognition device | |
JP2731133B2 (ja) | Continuous speech recognition device | |
Bansal et al. | A joint decoding algorithm for multiple-example-based addition of words to a pronunciation lexicon | |
Kam et al. | Modeling pronunciation variation for Cantonese speech recognition | |
Holter et al. | Combined Optimisation of Baseforms and Subword Models for an Hmm Based Speech Recogniser. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A621 | Written request for application examination |
Free format text: JAPANESE INTERMEDIATE CODE: A621 Effective date: 20060206 |
|
A977 | Report on retrieval |
Free format text: JAPANESE INTERMEDIATE CODE: A971007 Effective date: 20081021 |
|
A131 | Notification of reasons for refusal |
Free format text: JAPANESE INTERMEDIATE CODE: A131 Effective date: 20081104 |
|
A521 | Request for written amendment filed |
Free format text: JAPANESE INTERMEDIATE CODE: A523 Effective date: 20090105 |
|
A131 | Notification of reasons for refusal |
Free format text: JAPANESE INTERMEDIATE CODE: A131 Effective date: 20090203 |
|
A521 | Request for written amendment filed |
Free format text: JAPANESE INTERMEDIATE CODE: A523 Effective date: 20090406 |
|
TRDD | Decision of grant or rejection written | ||
A01 | Written decision to grant a patent or to grant a registration (utility model) |
Free format text: JAPANESE INTERMEDIATE CODE: A01 Effective date: 20090512 |
|
A01 | Written decision to grant a patent or to grant a registration (utility model) |
Free format text: JAPANESE INTERMEDIATE CODE: A01 |
|
A61 | First payment of annual fees (during grant procedure) |
Free format text: JAPANESE INTERMEDIATE CODE: A61 Effective date: 20090603 |
|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20120612 Year of fee payment: 3 |
|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20130612 Year of fee payment: 4 |
|
LAPS | Cancellation because of no payment of annual fees |