JP4195428B2 - Speech recognition utilizing a multitude of speech features - Google Patents
- Publication number: JP4195428B2
- Application number: JP2004270823A
- Authority: JP (Japan)
- Prior art keywords: speech, features, model, log, training
- Legal status: Expired - Fee Related (Google's assumed status, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G10L15/063: Training (under G10L15/06, Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
- G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
- G10L15/14: Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs] (under G10L15/08, Speech classification or search)
- G10L2015/085: Methods for reducing search complexity, pruning (under G10L15/08, Speech classification or search)
All four fall under G10L15/00 (Speech recognition) in section G (Physics), class G10 (Musical instruments; acoustics), subclass G10L (Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding).
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
- Machine Translation (AREA)
Description
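where λ_j are the parameters of the log-linear model,
f_j are the multitude of extracted features, and
Z is a normalization factor that ensures Equation 2 is a true probability (i.e., it sums to 1). The normalization factor is a function of the conditioning variables.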
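Equation 2 itself does not survive in this text. Assuming it has the standard log-linear (maximum-entropy) form P(w | x) = exp(Σ_j λ_j f_j(w, x)) / Z(x), where Z(x) sums the numerator over all competing hypotheses w, the following Python sketch computes the posteriors. The function name `log_linear_posteriors` and the dictionary layout of the features are illustrative assumptions, not taken from the patent.

```python
import math
from typing import Dict, List


def log_linear_posteriors(features: Dict[str, List[float]],
                          lambdas: List[float]) -> Dict[str, float]:
    """Posterior P(w | x) for each hypothesis w under an assumed log-linear model.

    features[w][j] holds the value of f_j(w, x); lambdas[j] holds lambda_j.
    """
    for w, fs in features.items():
        if len(fs) != len(lambdas):
            raise ValueError(f"hypothesis {w!r} has {len(fs)} features, expected {len(lambdas)}")
    # Unnormalized log-score: sum_j lambda_j * f_j(w, x)
    scores = {w: sum(l * f for l, f in zip(lambdas, fs)) for w, fs in features.items()}
    # Shift by the maximum before exponentiating to avoid overflow;
    # the shift cancels when dividing by the normalizer.
    top = max(scores.values())
    unnorm = {w: math.exp(s - top) for w, s in scores.items()}
    z = sum(unnorm.values())  # Z(x), up to the common factor exp(-top)
    return {w: u / z for w, u in unnorm.items()}


# Two competing words with two features each.
post = log_linear_posteriors({"yes": [1.0, 0.0], "no": [0.0, 1.0]}, [2.0, 1.0])
print(post)  # posteriors sum to 1; "yes" gets 1 / (1 + e**-1), about 0.731
```

Because Z depends only on the conditioning input x, it is computed once per input by summing over the hypothesis set, which is why the sketch normalizes over the dictionary keys.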
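where b is a binary function representing some property of the conditioning event, w is the target (or predicted) state/unit, such as a word, and
α is the weight of the function.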
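The equation these definitions belong to is also missing. One common way to build such a feature in maximum-entropy modeling, assumed here rather than confirmed by the text, is to let it take the value α when the property b holds for the conditioning event and the predicted unit equals a chosen target w, and 0 otherwise. The helper `make_binary_feature` and the dictionary used to represent the conditioning event are hypothetical.

```python
from typing import Callable, Dict, Hashable

# Hypothetical representation of the conditioning event x: named properties.
Context = Dict[str, str]


def make_binary_feature(b: Callable[[Context], bool], target: Hashable,
                        alpha: float = 1.0) -> Callable[[Context, Hashable], float]:
    """Assumed form: f(x, w) = alpha if b(x) is true and w == target, else 0."""
    def f(x: Context, w: Hashable) -> float:
        return alpha if (b(x) and w == target) else 0.0
    return f


# Fires only when the previous word is "new" and the predicted word is "york".
f = make_binary_feature(lambda x: x.get("prev_word") == "new", "york", alpha=0.5)
print(f({"prev_word": "new"}, "york"))    # 0.5
print(f({"prev_word": "new"}, "jersey"))  # 0.0
print(f({"prev_word": "old"}, "york"))    # 0.0
```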
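F_wj is the j-th score feature for the segment filled by word w. For example, if the top 10 dynamic time warping scores and a hidden Markov model score, obtained with various well-known dynamic time warping and hidden Markov model techniques (not explicitly shown in the figures), are returned, there will be 11 score features for each word in the lattice.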
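The 11-feature example can be made concrete with a small sketch: compute a classic dynamic time warping (DTW) distance between an input segment and each reference segment, keep the 10 best, and append one HMM score. The HMM score is passed in as a plain number because the HMM technique is not described here; the function names, the Euclidean frame distance, and the convention that a lower DTW distance means a better match are all assumptions.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]      # one acoustic feature vector
Segment = Sequence[Frame]    # a time sequence of frames


def dtw_distance(a: Segment, b: Segment) -> float:
    """Classic DTW: minimum cumulative Euclidean frame distance over all alignments."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def score_features(segment: Segment, references: List[Segment],
                   hmm_score: float, top_n: int = 10) -> List[float]:
    """The top_n smallest DTW distances to the references, followed by one HMM score.

    With at least top_n references this yields top_n + 1 features (11 by default).
    """
    dtw_scores = sorted(dtw_distance(segment, ref) for ref in references)[:top_n]
    return dtw_scores + [hmm_score]


# Twelve one-frame reference segments at distances 0..11 from the input.
refs = [[[float(k)]] for k in range(12)]
feats = score_features([[0.0]], refs, hmm_score=-42.0)
print(len(feats))  # 11
```

If fewer than `top_n` references exist for a word, this sketch simply returns fewer features; a real system would need a policy for missing scores.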
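120 Terminal
140 Terminal
150 Terminal
210 Telephone system
215 Network
225 Network
220 Speech transfer system
230 Speech input device
300 Speech recognition system
310 Speech database
320 Speech processor
340 Storage device
360 Input device
380 Output device
395 Bus
1000 Speech processing system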
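Claims (2)
- A speech recognition system comprising:
a feature extractor capable of extracting a multitude of speech features from input data;
a log-linear function that determines the posterior probability of a hypothesized linguistic unit using the plurality of speech features the feature extractor was able to extract; and
a search device that consults the log-linear function to determine the recognized output for an unknown utterance,
wherein the speech features are dynamic time warping scores computed against reference syllable segments in a database.
- A speech recognition method comprising the steps of:
attempting to extract a multitude of speech features;
determining the posterior probability of a hypothesized linguistic unit using the plurality of extracted speech features; and
determining the recognized output for an unknown utterance using a log-linear function,
wherein the speech features are dynamic time warping scores computed against reference syllable segments in a database.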
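The claims chain three components: a feature extractor producing DTW scores against reference syllable segments, a log-linear function turning those scores into posteriors, and a search device choosing the output. The Python sketch below strings them together on toy data under several assumptions. The features are the negated n smallest DTW distances to each candidate syllable's reference segments. The weights λ are fixed by hand rather than trained. The search device is reduced to an exhaustive argmax over a small syllable inventory instead of a lattice or beam search. All names and the toy database are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]
Segment = Sequence[Frame]


def dtw_distance(a: Segment, b: Segment) -> float:
    """Classic DTW with Euclidean frame distance (lower means more similar)."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def extract_features(segment: Segment, refs: List[Segment], n: int) -> List[float]:
    """Feature extractor: the n best DTW distances to one unit's reference segments,
    negated so that a larger feature value means a closer match."""
    if len(refs) < n:
        raise ValueError("each unit needs at least n reference segments")
    return [-d for d in sorted(dtw_distance(segment, r) for r in refs)[:n]]


def recognize(segment: Segment, references: Dict[str, List[Segment]],
              lambdas: List[float]) -> Tuple[str, Dict[str, float]]:
    """Score every hypothesized unit log-linearly, normalize, and pick the best."""
    n = len(lambdas)
    scores = {unit: sum(l * f for l, f in zip(lambdas, extract_features(segment, refs, n)))
              for unit, refs in references.items()}
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    posteriors = {unit: math.exp(s - top) / z for unit, s in scores.items()}
    # "Search" here is an exhaustive argmax over a small unit inventory.
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Toy database: two syllables, two one-dimensional reference segments each.
db = {
    "ka": [[[1.0], [1.0]], [[1.1], [1.0]]],
    "sa": [[[5.0], [5.0]], [[5.2], [5.0]]],
}
best, post = recognize([[1.0], [1.0]], db, lambdas=[1.0, 0.5])
print(best)  # ka
```

Negating the distances keeps the usual reading of a log-linear model, where a positive weight on a feature raises the probability of the hypothesis as the feature grows.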
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/724,536 US7464031B2 (en) | 2003-11-28 | 2003-11-28 | Speech recognition utilizing multitude of speech features |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2005165272A (ja) | 2005-06-23 |
JP4195428B2 (ja) | 2008-12-10 |
Family ID: 34620090
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2004270823A (JP4195428B2, Expired - Fee Related) | Speech recognition utilizing a multitude of speech features | 2003-11-28 | 2004-09-17 |
Country Status (3)
Country | Link |
---|---|
US (2) | US7464031B2 (ja) |
JP (1) | JP4195428B2 (ja) |
CN (1) | CN1296886C (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11074926B1 (en) | 2020-01-07 | 2021-07-27 | International Business Machines Corporation | Trending and context fatigue compensation in a voice signal |
Families Citing this family (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7899671B2 (en) * | 2004-02-05 | 2011-03-01 | Avaya, Inc. | Recognition results postprocessor for use in voice recognition systems |
US7392187B2 (en) * | 2004-09-20 | 2008-06-24 | Educational Testing Service | Method and system for the automatic generation of speech features for scoring high entropy speech |
US7840404B2 (en) * | 2004-09-20 | 2010-11-23 | Educational Testing Service | Method and system for using automatic generation of speech features to provide diagnostic feedback |
US7809568B2 (en) * | 2005-11-08 | 2010-10-05 | Microsoft Corporation | Indexing and searching speech with text meta-data |
US7831428B2 (en) * | 2005-11-09 | 2010-11-09 | Microsoft Corporation | Speech index pruning |
US7831425B2 (en) * | 2005-12-15 | 2010-11-09 | Microsoft Corporation | Time-anchored posterior indexing of speech |
JP5062171B2 (ja) * | 2006-03-23 | 2012-10-31 | NEC Corporation | Speech recognition system, speech recognition method, and speech recognition program |
US8214213B1 (en) * | 2006-04-27 | 2012-07-03 | At&T Intellectual Property Ii, L.P. | Speech recognition based on pronunciation modeling |
US8214208B2 (en) * | 2006-09-28 | 2012-07-03 | Reqall, Inc. | Method and system for sharing portable voice profiles |
US7788094B2 (en) * | 2007-01-29 | 2010-08-31 | Robert Bosch Gmbh | Apparatus, method and system for maximum entropy modeling for uncertain observations |
US7813929B2 (en) * | 2007-03-30 | 2010-10-12 | Nuance Communications, Inc. | Automatic editing using probabilistic word substitution models |
GB2453366B (en) * | 2007-10-04 | 2011-04-06 | Toshiba Res Europ Ltd | Automatic speech recognition method and apparatus |
US20090099847A1 (en) * | 2007-10-10 | 2009-04-16 | Microsoft Corporation | Template constrained posterior probability |
US7933847B2 (en) * | 2007-10-17 | 2011-04-26 | Microsoft Corporation | Limited-memory quasi-newton optimization algorithm for L1-regularized objectives |
US8296141B2 (en) * | 2008-11-19 | 2012-10-23 | At&T Intellectual Property I, L.P. | System and method for discriminative pronunciation modeling for voice search |
US9484019B2 (en) | 2008-11-19 | 2016-11-01 | At&T Intellectual Property I, L.P. | System and method for discriminative pronunciation modeling for voice search |
US8401852B2 (en) * | 2009-11-30 | 2013-03-19 | Microsoft Corporation | Utilizing features generated from phonic units in speech recognition |
WO2012023450A1 (ja) * | 2010-08-19 | 2012-02-23 | NEC Corporation | Text processing system, text processing method, and text processing program |
US8484023B2 (en) * | 2010-09-24 | 2013-07-09 | Nuance Communications, Inc. | Sparse representation features for speech recognition |
US8630860B1 (en) * | 2011-03-03 | 2014-01-14 | Nuance Communications, Inc. | Speaker and call characteristic sensitive open voice search |
US8727991B2 (en) | 2011-08-29 | 2014-05-20 | Salutron, Inc. | Probabilistic segmental model for doppler ultrasound heart rate monitoring |
US8909512B2 (en) * | 2011-11-01 | 2014-12-09 | Google Inc. | Enhanced stability prediction for incrementally generated speech recognition hypotheses based on an age of a hypothesis |
CN102376305B (zh) * | 2011-11-29 | 2013-06-19 | Anhui USTC iFlytek Information Technology Co., Ltd. | Speech recognition method and system |
US9324323B1 (en) | 2012-01-13 | 2016-04-26 | Google Inc. | Speech recognition using topic-specific language models |
US8775177B1 (en) * | 2012-03-08 | 2014-07-08 | Google Inc. | Speech recognition process |
CN102810135B (zh) * | 2012-09-17 | 2015-12-16 | Gu Tailai | Auxiliary processing system for drug prescriptions |
US9697827B1 (en) * | 2012-12-11 | 2017-07-04 | Amazon Technologies, Inc. | Error reduction in speech processing |
US9653070B2 (en) | 2012-12-31 | 2017-05-16 | Intel Corporation | Flexible architecture for acoustic signal processing engine |
WO2014191054A1 (en) * | 2013-05-31 | 2014-12-04 | Longsand Limited | Processing of audio data |
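CN103337241B (zh) * | 2013-06-09 | 2015-06-24 | Beijing Unisound Information Technology Co., Ltd. | Speech recognition method and device |
JP2015060095A (ja) * | 2013-09-19 | 2015-03-30 | Toshiba Corporation | Speech translation device, speech translation method, and program |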
US9529901B2 (en) * | 2013-11-18 | 2016-12-27 | Oracle International Corporation | Hierarchical linguistic tags for documents |
US9842592B2 (en) * | 2014-02-12 | 2017-12-12 | Google Inc. | Language models using non-linguistic context |
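KR20170034227A (ko) * | 2015-09-18 | 2017-03-28 | Samsung Electronics Co., Ltd. | Speech recognition apparatus and method, and apparatus and method for learning transformation parameters for speech recognition |
CN106683677B (zh) | 2015-11-06 | 2021-11-12 | Alibaba Group Holding Limited | Speech recognition method and device |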
US10832664B2 (en) * | 2016-08-19 | 2020-11-10 | Google Llc | Automated speech recognition using language models that selectively use domain-specific model components |
JP6585022B2 (ja) | 2016-11-11 | 2019-10-02 | Toshiba Corporation | Speech recognition device, speech recognition method, and program |
US10347245B2 (en) * | 2016-12-23 | 2019-07-09 | Soundhound, Inc. | Natural language grammar enablement by speech characterization |
US20180330718A1 (en) * | 2017-05-11 | 2018-11-15 | Mitsubishi Electric Research Laboratories, Inc. | System and Method for End-to-End speech recognition |
US10607601B2 (en) * | 2017-05-11 | 2020-03-31 | International Business Machines Corporation | Speech recognition by selecting and refining hot words |
US10672388B2 (en) * | 2017-12-15 | 2020-06-02 | Mitsubishi Electric Research Laboratories, Inc. | Method and apparatus for open-vocabulary end-to-end speech recognition |
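CN108415898B (zh) * | 2018-01-19 | 2021-09-24 | AISpeech Co., Ltd. | Word lattice rescoring method and system for deep learning language models |
JP7137694B2 (ja) | 2018-09-12 | 2022-09-14 | Shenzhen Shokz Co., Ltd. | Signal processing device having multiple acoustic-electric transducers |
JP7120064B2 (ja) * | 2019-02-08 | 2022-08-17 | Nippon Telegraph and Telephone Corporation | Language model score calculation device, language model creation device, methods therefor, program, and recording medium |
CN110853669B (zh) * | 2019-11-08 | 2023-05-16 | Tencent Technology (Shenzhen) Co., Ltd. | Audio recognition method, apparatus, and device |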
US11250872B2 (en) * | 2019-12-14 | 2022-02-15 | International Business Machines Corporation | Using closed captions as parallel training data for customization of closed captioning systems |
US11705111B2 (en) | 2020-11-12 | 2023-07-18 | Samsung Electronics Co., Ltd. | Methods and systems for predicting non-default actions against unstructured utterances |
CN113657461A (zh) * | 2021-07-28 | 2021-11-16 | Beijing Baolande Software Co., Ltd. | Log anomaly detection method, system, device, and medium based on text classification |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0756595A (ja) | 1993-08-19 | 1995-03-03 | Hitachi Ltd | Speech recognition device |
US6304841B1 (en) * | 1993-10-28 | 2001-10-16 | International Business Machines Corporation | Automatic construction of conditional exponential models from elementary features |
US5790754A (en) * | 1994-10-21 | 1998-08-04 | Sensory Circuits, Inc. | Speech recognition apparatus for consumer electronic applications |
WO1999031654A2 (en) * | 1997-12-12 | 1999-06-24 | Koninklijke Philips Electronics N.V. | Method of determining model-specific factors for pattern recognition, in particular for speech patterns |
CN1141696C (zh) * | 2000-03-31 | 2004-03-10 | Tsinghua University | Speaker-independent speech recognition and voice prompting method based on a dedicated speech recognition chip |
US7054810B2 (en) * | 2000-10-06 | 2006-05-30 | International Business Machines Corporation | Feature vector-based apparatus and method for robust pattern recognition |
DE10106581A1 (de) * | 2001-02-13 | 2002-08-22 | Philips Corp Intellectual Pty | Speech recognition system, training device, and method for iteratively computing free parameters of a maximum-entropy language model |
JP2002251592A (ja) * | 2001-02-22 | 2002-09-06 | Toshiba Corp | Pattern recognition dictionary learning method |
DE10119284A1 (de) * | 2001-04-20 | 2002-10-24 | Philips Corp Intellectual Pty | Method and system for training parameters of a pattern recognition system, each assigned to exactly one realization variant of an inventory pattern |
US6687690B2 (en) * | 2001-06-14 | 2004-02-03 | International Business Machines Corporation | Employing a combined function for exception exploration in multidimensional data |
JP3919475B2 (ja) | 2001-07-10 | 2007-05-23 | Sharp Corporation | Speaker feature extraction device and speaker feature extraction method, speech recognition device, and program recording medium |
US7324927B2 (en) * | 2003-07-03 | 2008-01-29 | Robert Bosch Gmbh | Fast feature selection method and system for maximum entropy modeling |
- 2003-11-28 US: US10/724,536, patent US7464031B2 (not active; Expired - Fee Related)
- 2004-07-28 CN: CNB2004100586870A, patent CN1296886C (not active; Expired - Fee Related)
- 2004-09-17 JP: JP2004270823A, patent JP4195428B2 (not active; Expired - Fee Related)
- 2008-08-20 US: US12/195,123, publication US20080312921A1 (not active; Abandoned)
Also Published As
Publication number | Publication date |
---|---|
US7464031B2 (en) | 2008-12-09 |
JP2005165272A (ja) | 2005-06-23 |
US20080312921A1 (en) | 2008-12-18 |
CN1296886C (zh) | 2007-01-24 |
CN1622196A (zh) | 2005-06-01 |
US20050119885A1 (en) | 2005-06-02 |
Similar Documents
Publication | Title |
---|---|
JP4195428B2 (ja) | Speech recognition utilizing a multitude of speech features |
JP6705008B2 (ja) | Speaker verification method and system |
CN107810529B (zh) | Language model speech endpointing |
US10210862B1 (en) | Lattice decoding and result confirmation using recurrent neural networks |
US9911413B1 (en) | Neural latent variable model for spoken language understanding |
US9318103B2 (en) | System and method for recognizing a user voice command in noisy environment |
KR100755677B1 (ko) | Apparatus and method for conversational speech recognition using topic domain detection |
US6542866B1 (en) | Speech recognition method and apparatus utilizing multiple feature streams |
US10490182B1 (en) | Initializing and learning rate adjustment for rectifier linear unit based artificial neural networks |
JP3933750B2 (ja) | Speech recognition method and apparatus using continuous-density hidden Markov models |
JP4274962B2 (ja) | Speech recognition system |
US10170107B1 (en) | Extendable label recognition of linguistic input |
KR101153078B1 (ko) | Hidden conditional random field models for phonetic classification and speech recognition |
KR20180038707A (ko) | Speech recognition method using dynamic weight values and topic information |
JP4836076B2 (ja) | Speech recognition system and computer program |
Williams | Knowing what you don't know: roles for confidence measures in automatic speech recognition |
US20040006469A1 (en) | Apparatus and method for updating lexicon |
Wang | Mandarin spoken document retrieval based on syllable lattice matching |
Tabibian | A survey on structured discriminative spoken keyword spotting |
Benítez et al. | Different confidence measures for word verification in speech recognition |
Kurian et al. | Automated Transcription System for Malayalam Language |
Furui | Steps toward natural human-machine communication in the 21st century |
Khalifa et al. | Statistical modeling for speech recognition |
JP6199994B2 (ja) | False alarm reduction in speech recognition systems using contextual information |
KR101037801B1 (ko) | Keyword detection method using sub-word unit recognition |
Legal Events
Code | Title | Description |
---|---|---|
A131 | Notification of reasons for refusal | JAPANESE INTERMEDIATE CODE: A131; effective date: 20060620 |
A601 | Written request for extension of time | JAPANESE INTERMEDIATE CODE: A601; effective date: 20060913 |
A602 | Written permission of extension of time | JAPANESE INTERMEDIATE CODE: A602; effective date: 20060919 |
A521 | Written amendment | JAPANESE INTERMEDIATE CODE: A523; effective date: 20061013 |
A131 | Notification of reasons for refusal | JAPANESE INTERMEDIATE CODE: A131; effective date: 20070403 |
A521 | Written amendment | JAPANESE INTERMEDIATE CODE: A523; effective date: 20070615 |
TRDD | Decision of grant or rejection written | |
A01 | Written decision to grant a patent or to grant a registration (utility model) | JAPANESE INTERMEDIATE CODE: A01; effective date: 20080909 |
A01 | Written decision to grant a patent or to grant a registration (utility model) | JAPANESE INTERMEDIATE CODE: A01 |
A61 | First payment of annual fees (during grant procedure) | JAPANESE INTERMEDIATE CODE: A61; effective date: 20080925 |
R150 | Certificate of patent or registration of utility model | JAPANESE INTERMEDIATE CODE: R150 |
FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 20111003; year of fee payment: 3 |
LAPS | Cancellation because of no payment of annual fees | |