CN1139911C - Dynamically configurable acoustic model for speech recognition system - Google Patents
Dynamically configurable acoustic model for speech recognition system
- Publication number: CN1139911C (application CNB99806243XA)
- Authority: CN (China)
- Prior art keywords: morpheme, parameter, model, size, merged
- Prior art date
- Legal status: Expired - Fee Related (the listed status is an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (31)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/060,654 | 1998-04-15 | ||
US09/060,654 US6141641A (en) | 1998-04-15 | 1998-04-15 | Dynamically configurable acoustic model for speech recognition system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1301379A (zh) | 2001-06-27 |
CN1139911C (zh) | 2004-02-25 |
Family
ID=22030937
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB99806243XA (CN1139911C, zh; Expired - Fee Related) | 1998-04-15 | 1999-03-29 | Dynamically configurable acoustic model for speech recognition system |
Country Status (6)
Country | Link |
---|---|
US (1) | US6141641A (zh) |
EP (1) | EP1070314B1 (zh) |
JP (2) | JP4450991B2 (zh) |
CN (1) | CN1139911C (zh) |
DE (1) | DE69925479T2 (zh) |
WO (1) | WO1999053478A1 (zh) |
Families Citing this family (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6807537B1 (en) * | 1997-12-04 | 2004-10-19 | Microsoft Corporation | Mixtures of Bayesian networks |
US6418431B1 (en) * | 1998-03-30 | 2002-07-09 | Microsoft Corporation | Information retrieval and speech recognition based on language models |
US6141641A (en) * | 1998-04-15 | 2000-10-31 | Microsoft Corporation | Dynamically configurable acoustic model for speech recognition system |
DE59904741D1 (de) * | 1998-05-11 | 2003-04-30 | Siemens Ag | Arrangement and method for recognition of a predetermined vocabulary in spoken language by a computer |
US6684186B2 (en) * | 1999-01-26 | 2004-01-27 | International Business Machines Corporation | Speaker recognition using a hierarchical speaker model tree |
US6904402B1 (en) * | 1999-11-05 | 2005-06-07 | Microsoft Corporation | System and iterative method for lexicon, segmentation and language model joint optimization |
US6442519B1 (en) * | 1999-11-10 | 2002-08-27 | International Business Machines Corp. | Speaker model adaptation via network of similar users |
US6792405B2 (en) | 1999-12-10 | 2004-09-14 | At&T Corp. | Bitstream-based feature extraction method for a front-end speech recognizer |
US7110947B2 (en) | 1999-12-10 | 2006-09-19 | At&T Corp. | Frame erasure concealment technique for a bitstream-based feature extractor |
US6865528B1 (en) | 2000-06-01 | 2005-03-08 | Microsoft Corporation | Use of a unified language model |
US7031908B1 (en) * | 2000-06-01 | 2006-04-18 | Microsoft Corporation | Creating a language model for a language processing system |
US7020587B1 (en) * | 2000-06-30 | 2006-03-28 | Microsoft Corporation | Method and apparatus for generating and managing a language model data structure |
JP4336865B2 (ja) * | 2001-03-13 | 2009-09-30 | 日本電気株式会社 | Speech recognition device |
US7711570B2 (en) * | 2001-10-21 | 2010-05-04 | Microsoft Corporation | Application abstraction with dialog purpose |
US8229753B2 (en) | 2001-10-21 | 2012-07-24 | Microsoft Corporation | Web server controls for web enabled recognition and/or audible prompting |
WO2003043277A1 (fr) * | 2001-11-15 | 2003-05-22 | Matsushita Electric Industrial Co., Ltd. | Error concealment apparatus and method |
DE10220524B4 (de) | 2002-05-08 | 2006-08-10 | Sap Ag | Method and system for processing speech data and for recognizing a language |
DE10220520A1 (de) * | 2002-05-08 | 2003-11-20 | Sap Ag | Method for recognizing speech information |
EP1361740A1 (de) * | 2002-05-08 | 2003-11-12 | Sap Ag | Method and system for processing speech information of a dialog |
EP1363271A1 (de) | 2002-05-08 | 2003-11-19 | Sap Ag | Method and system for processing and storing speech information of a dialog |
US7940844B2 (en) | 2002-06-18 | 2011-05-10 | Qualcomm Incorporated | Video encoding and decoding techniques |
US7533023B2 (en) * | 2003-02-12 | 2009-05-12 | Panasonic Corporation | Intermediary speech processor in network environments transforming customized speech parameters |
US7529671B2 (en) | 2003-03-04 | 2009-05-05 | Microsoft Corporation | Block synchronous decoding |
US7571097B2 (en) * | 2003-03-13 | 2009-08-04 | Microsoft Corporation | Method for training of subspace coded gaussian models |
US8301436B2 (en) * | 2003-05-29 | 2012-10-30 | Microsoft Corporation | Semantic object synchronous understanding for highly interactive interface |
US7200559B2 (en) | 2003-05-29 | 2007-04-03 | Microsoft Corporation | Semantic object synchronous understanding implemented with speech application language tags |
US8160883B2 (en) | 2004-01-10 | 2012-04-17 | Microsoft Corporation | Focus tracking in dialogs |
US7231019B2 (en) * | 2004-02-12 | 2007-06-12 | Microsoft Corporation | Automatic identification of telephone callers based on voice characteristics |
KR100590561B1 (ko) * | 2004-10-12 | 2006-06-19 | 삼성전자주식회사 | Method and apparatus for estimating the pitch of a signal |
US20060136210A1 (en) * | 2004-12-16 | 2006-06-22 | Sony Corporation | System and method for tying variance vectors for speech recognition |
US20070088552A1 (en) * | 2005-10-17 | 2007-04-19 | Nokia Corporation | Method and a device for speech recognition |
US7970613B2 (en) * | 2005-11-12 | 2011-06-28 | Sony Computer Entertainment Inc. | Method and system for Gaussian probability data bit reduction and computation |
US7680664B2 (en) * | 2006-08-16 | 2010-03-16 | Microsoft Corporation | Parsimonious modeling by non-uniform kernel allocation |
US8234116B2 (en) * | 2006-08-22 | 2012-07-31 | Microsoft Corporation | Calculating cost measures between HMM acoustic models |
US8165877B2 (en) * | 2007-08-03 | 2012-04-24 | Microsoft Corporation | Confidence measure generation for speech related searching |
US8160878B2 (en) * | 2008-09-16 | 2012-04-17 | Microsoft Corporation | Piecewise-based variable-parameter Hidden Markov Models and the training thereof |
US8145488B2 (en) * | 2008-09-16 | 2012-03-27 | Microsoft Corporation | Parameter clustering and sharing for variable-parameter hidden markov models |
US9002713B2 (en) * | 2009-06-09 | 2015-04-07 | At&T Intellectual Property I, L.P. | System and method for speech personalization by need |
US8484023B2 (en) * | 2010-09-24 | 2013-07-09 | Nuance Communications, Inc. | Sparse representation features for speech recognition |
KR20120045582A (ko) * | 2010-10-29 | 2012-05-09 | 한국전자통신연구원 | Apparatus and method for generating an acoustic model |
EP2851895A3 (en) | 2011-06-30 | 2015-05-06 | Google, Inc. | Speech recognition using variable-length context |
US9153235B2 (en) | 2012-04-09 | 2015-10-06 | Sony Computer Entertainment Inc. | Text dependent speaker recognition with long-term feature based on functional data analysis |
US9514739B2 (en) * | 2012-06-06 | 2016-12-06 | Cypress Semiconductor Corporation | Phoneme score accelerator |
US9224384B2 (en) * | 2012-06-06 | 2015-12-29 | Cypress Semiconductor Corporation | Histogram based pre-pruning scheme for active HMMS |
JP5659203B2 (ja) * | 2012-09-06 | 2015-01-28 | 株式会社東芝 | Model learning device, model creation method, and model creation program |
US9336771B2 (en) * | 2012-11-01 | 2016-05-10 | Google Inc. | Speech recognition using non-parametric models |
US20140372118A1 (en) * | 2013-06-17 | 2014-12-18 | Speech Morphing Systems, Inc. | Method and apparatus for exemplary chip architecture |
CN104766608A (zh) * | 2014-01-07 | 2015-07-08 | 深圳市中兴微电子技术有限公司 | Voice control method and device |
US9858922B2 (en) | 2014-06-23 | 2018-01-02 | Google Inc. | Caching speech recognition scores |
US9953646B2 (en) | 2014-09-02 | 2018-04-24 | Belleau Technologies | Method and system for dynamic speech recognition and tracking of prewritten script |
US9299347B1 (en) | 2014-10-22 | 2016-03-29 | Google Inc. | Speech recognition using associative mapping |
CN112509567B (zh) * | 2020-12-25 | 2024-05-10 | 阿波罗智联(北京)科技有限公司 | Method, apparatus, device, storage medium, and program product for speech data processing |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE3584567D1 (de) * | 1984-12-27 | 1991-12-05 | Texas Instruments Inc | Method and device for speaker-independent speech recognition |
US4797929A (en) * | 1986-01-03 | 1989-01-10 | Motorola, Inc. | Word recognition in a speech recognition system using data reduced word templates |
US5033087A (en) * | 1989-03-14 | 1991-07-16 | International Business Machines Corp. | Method and apparatus for the automatic determination of phonological rules as for a continuous speech recognition system |
JP2662120B2 (ja) * | 1991-10-01 | 1997-10-08 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Speech recognition device and processing unit for speech recognition |
EP0590173A1 (de) * | 1992-09-28 | 1994-04-06 | International Business Machines Corporation | Computer system for speech recognition |
JPH0769711B2 (ja) * | 1993-03-09 | 1995-07-31 | 株式会社エイ・ティ・アール自動翻訳電話研究所 | Speech recognition method |
US5794197A (en) * | 1994-01-21 | 1998-08-11 | Microsoft Corporation | Senone tree representation and evaluation |
US5794198A (en) * | 1994-10-28 | 1998-08-11 | Nippon Telegraph And Telephone Corporation | Pattern recognition method |
JPH08248986A (ja) * | 1995-03-13 | 1996-09-27 | Nippon Telegr & Teleph Corp <Ntt> | Pattern recognition method |
US5710866A (en) * | 1995-05-26 | 1998-01-20 | Microsoft Corporation | System and method for speech recognition using dynamically adjusted confidence measure |
JP2852210B2 (ja) * | 1995-09-19 | 1999-01-27 | 株式会社エイ・ティ・アール音声翻訳通信研究所 | Speaker-independent model creation device and speech recognition device |
JP3126985B2 (ja) * | 1995-11-04 | 2001-01-22 | インターナシヨナル・ビジネス・マシーンズ・コーポレーション | Method and apparatus for adapting the size of a language model of a speech recognition system |
JPH09134193A (ja) * | 1995-11-08 | 1997-05-20 | Nippon Telegr & Teleph Corp <Ntt> | Speech recognition device |
US5806030A (en) * | 1996-05-06 | 1998-09-08 | Matsushita Electric Ind Co Ltd | Low complexity, high accuracy clustering method for speech recognizer |
US5822730A (en) * | 1996-08-22 | 1998-10-13 | Dragon Systems, Inc. | Lexical tree pre-filtering in speech recognition |
US5950158A (en) * | 1997-07-30 | 1999-09-07 | Nynex Science And Technology, Inc. | Methods and apparatus for decreasing the size of pattern recognition models by pruning low-scoring models from generated sets of models |
US5963902A (en) * | 1997-07-30 | 1999-10-05 | Nynex Science & Technology, Inc. | Methods and apparatus for decreasing the size of generated models trained for automatic pattern recognition |
US6141641A (en) * | 1998-04-15 | 2000-10-31 | Microsoft Corporation | Dynamically configurable acoustic model for speech recognition system |
1998
- 1998-04-15: US, application US09/060,654, patent US6141641A (en), not active (Expired - Lifetime)
1999
- 1999-03-29: JP, application JP2000543956A, patent JP4450991B2 (ja), not active (Expired - Fee Related)
- 1999-03-29: DE, application DE69925479T, patent DE69925479T2 (de), not active (Expired - Lifetime)
- 1999-03-29: EP, application EP99914257A, patent EP1070314B1 (en), not active (Expired - Lifetime)
- 1999-03-29: CN, application CNB99806243XA, patent CN1139911C (zh), not active (Expired - Fee Related)
- 1999-03-29: WO, application PCT/US1999/006837, publication WO1999053478A1 (en), active (IP Right Grant)
2009
- 2009-12-04: JP, application JP2009276794A, patent JP4913204B2 (ja), not active (Expired - Fee Related)
Also Published As
Publication number | Publication date |
---|---|
DE69925479T2 (de) | 2006-02-02 |
JP2002511609A (ja) | 2002-04-16 |
CN1301379A (zh) | 2001-06-27 |
JP4450991B2 (ja) | 2010-04-14 |
DE69925479D1 (de) | 2005-06-30 |
JP4913204B2 (ja) | 2012-04-11 |
JP2010049291A (ja) | 2010-03-04 |
EP1070314B1 (en) | 2005-05-25 |
WO1999053478A1 (en) | 1999-10-21 |
EP1070314A1 (en) | 2001-01-24 |
US6141641A (en) | 2000-10-31 |
Similar Documents
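Publication | Title | Publication Date |
---|---|---|
CN1139911C (zh) | Dynamically configurable acoustic model for speech recognition system | |
CN1256714C (zh) | Speech recognition method and method for generating a hierarchical structure of context models | |
CN1202512C (zh) | Speech recognition system for recognizing continuous and discrete speech | |
CN1152365C (zh) | Pitch tracking apparatus and method | |
CN1667699A (zh) | Generating large graphoneme units with a mutual information criterion for letter-to-sound conversion | |
CN1331467A (zh) | Method and apparatus for generating an acoustic model | |
CN1573926A (zh) | Discriminative language model training for text and speech classification | |
CN1677487A (zh) | Language model adaptation using semantic supervision | |
CN1571013A (zh) | Method and apparatus for predicting word error rate from text | |
CN1171592A (zh) | Speech recognition method and system using continuous density hidden Markov models | |
CN1538382A (zh) | Method for training subspace coded Gaussian models | |
CN1298172A (zh) | Context-dependent acoustic models for medium or large vocabulary speech recognition | |
CN1551101A (zh) | Adaptation of compressed acoustic models | |
CN105893389A (zh) | Voice information search method, apparatus, and server | |
CN1495641A (zh) | Adaptive context-sensitive analysis; limited copyright waiver statement | |
CN1924994A (zh) | Embedded speech synthesis method and system | |
CN111653270B (zh) | Speech processing method and apparatus, computer-readable storage medium, and electronic device | |
JP7193000B2 (ja) | Similar document search method, similar document search program, similar document search device, index information creation method, index information creation program, and index information creation device | |
CN1198261C (zh) | Decision-tree-based speech discrimination method | |
CN116189671B (zh) | Data mining method and system for language teaching | |
CN104199811A (zh) | Method and system for building a short-sentence parsing model | |
CN1214362C (zh) | Apparatus and method for determining the correlation coefficient between signals and signal pitch | |
CN1455388A (zh) | Speech recognition system and compression method for feature vector sets used in a speech recognition system | |
JP7376896B2 (ja) | Learning device, learning method, learning program, generation device, generation method, and generation program | |
CN104166837A (zh) | Visual speech recognition method using selection of groups of the most relevant points of interest | |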
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
ASS | Succession or assignment of patent right | Owner name: MICROSOFT TECHNOLOGY LICENSING LLC; Free format text: FORMER OWNER: MICROSOFT CORP.; Effective date: 2015-04-22 |
C41 | Transfer of patent application or patent right or utility model | |
TR01 | Transfer of patent right | Effective date of registration: 2015-04-22; Address after: Washington State; Patentee after: Microsoft Technology Licensing, LLC; Address before: Washington, USA; Patentee before: Microsoft Corp. |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2004-02-25; Termination date: 2015-03-29 |
EXPY | Termination of patent right or utility model | |