DE19982503T1 - Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains
- Publication number
- DE19982503T1
- Authority
- DE
- Germany
- Prior art keywords
- model
- adaptation
- speech recognition
- hierarchical organization
- acoustic model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Complex Calculations (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/187,902 US6324510B1 (en) | 1998-11-06 | 1998-11-06 | Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains |
PCT/US1999/025752 WO2000028526A1 (en) | 1998-11-06 | 1999-11-05 | Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains |
Publications (1)
Publication Number | Publication Date |
---|---|
DE19982503T1 (de) | 2001-03-08 |
Family
ID=22690959
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19982503T Ceased DE19982503T1 (de) | 1998-11-06 | 1999-11-05 | Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains |
Country Status (4)
Country | Link |
---|---|
US (1) | US6324510B1 (de) |
JP (1) | JP2002529800A (de) |
DE (1) | DE19982503T1 (de) |
WO (1) | WO2000028526A1 (de) |
Families Citing this family (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1116219B1 (de) * | 1999-07-01 | 2005-03-16 | Koninklijke Philips Electronics N.V. | Robust speech processing from noisy speech models |
KR100366057B1 (ko) * | 2000-06-26 | 2002-12-27 | 한국과학기술원 | Efficient speech recognition apparatus using a human auditory model |
US7295979B2 (en) * | 2000-09-29 | 2007-11-13 | International Business Machines Corporation | Language context dependent data labeling |
US7472064B1 (en) * | 2000-09-30 | 2008-12-30 | Intel Corporation | Method and system to scale down a decision tree-based hidden markov model (HMM) for speech recognition |
ATE297588T1 (de) * | 2000-11-14 | 2005-06-15 | Ibm | Adaptation of the phonetic context to improve speech recognition |
US7016887B2 (en) * | 2001-01-03 | 2006-03-21 | Accelrys Software Inc. | Methods and systems of classifying multiple properties simultaneously using a decision tree |
WO2002091357A1 (en) * | 2001-05-08 | 2002-11-14 | Intel Corporation | Method, apparatus, and system for building context dependent models for a large vocabulary continuous speech recognition (lvcsr) system |
US7809574B2 (en) | 2001-09-05 | 2010-10-05 | Voice Signal Technologies Inc. | Word recognition using choice lists |
US7467089B2 (en) | 2001-09-05 | 2008-12-16 | Roth Daniel L | Combined speech and handwriting recognition |
US7444286B2 (en) | 2001-09-05 | 2008-10-28 | Roth Daniel L | Speech recognition using re-utterance recognition |
WO2004023455A2 (en) * | 2002-09-06 | 2004-03-18 | Voice Signal Technologies, Inc. | Methods, systems, and programming for performing speech recognition |
US7313526B2 (en) | 2001-09-05 | 2007-12-25 | Voice Signal Technologies, Inc. | Speech recognition using selectable recognition modes |
US7526431B2 (en) | 2001-09-05 | 2009-04-28 | Voice Signal Technologies, Inc. | Speech recognition using ambiguous or phone key spelling and/or filtering |
US7505911B2 (en) | 2001-09-05 | 2009-03-17 | Roth Daniel L | Combined speech recognition and sound recording |
US7050668B2 (en) * | 2003-06-19 | 2006-05-23 | Lucent Technologies Inc. | Methods and apparatus for control of optical switching arrays that minimize bright state switching |
FR2857528B1 (fr) * | 2003-07-08 | 2006-01-06 | Telisma | Speech recognition for large dynamic vocabularies |
US7542949B2 (en) * | 2004-05-12 | 2009-06-02 | Mitsubishi Electric Research Laboratories, Inc. | Determining temporal patterns in sensed data sequences by hierarchical decomposition of hidden Markov models |
US20060080356A1 (en) * | 2004-10-13 | 2006-04-13 | Microsoft Corporation | System and method for inferring similarities between media objects |
US20060136210A1 (en) * | 2004-12-16 | 2006-06-22 | Sony Corporation | System and method for tying variance vectors for speech recognition |
US20060136215A1 (en) * | 2004-12-21 | 2006-06-22 | Jong Jin Kim | Method of speaking rate conversion in text-to-speech system |
EP1889255A1 (de) * | 2005-05-24 | 2008-02-20 | Loquendo S.p.A. | Automatic text-independent, language-independent speaker voice-print creation and speaker recognition |
US8126710B2 (en) * | 2005-06-01 | 2012-02-28 | Loquendo S.P.A. | Conservative training method for adapting a neural network of an automatic speech recognition device |
US7805301B2 (en) * | 2005-07-01 | 2010-09-28 | Microsoft Corporation | Covariance estimation for pattern recognition |
US20070081428A1 (en) * | 2005-09-29 | 2007-04-12 | Spryance, Inc. | Transcribing dictation containing private information |
KR100755677B1 (ko) * | 2005-11-02 | 2007-09-05 | 삼성전자주식회사 | Apparatus and method for conversational speech recognition using topic domain detection |
US20080004876A1 (en) * | 2006-06-30 | 2008-01-03 | Chuang He | Non-enrolled continuous dictation |
US20080162129A1 (en) * | 2006-12-29 | 2008-07-03 | Motorola, Inc. | Method and apparatus pertaining to the processing of sampled audio content using a multi-resolution speech recognition search process |
US20080243503A1 (en) * | 2007-03-30 | 2008-10-02 | Microsoft Corporation | Minimum divergence based discriminative training for pattern recognition |
WO2008137616A1 (en) * | 2007-05-04 | 2008-11-13 | Nuance Communications, Inc. | Multi-class constrained maximum likelihood linear regression |
US8289884B1 (en) * | 2008-01-14 | 2012-10-16 | Dulles Research LLC | System and method for identification of unknown illicit networks |
US8682660B1 (en) * | 2008-05-21 | 2014-03-25 | Resolvity, Inc. | Method and system for post-processing speech recognition results |
US9798653B1 (en) * | 2010-05-05 | 2017-10-24 | Nuance Communications, Inc. | Methods, apparatus and data structure for cross-language speech adaptation |
US8719023B2 (en) * | 2010-05-21 | 2014-05-06 | Sony Computer Entertainment Inc. | Robustness to environmental changes of a context dependent speech recognizer |
US8812321B2 (en) * | 2010-09-30 | 2014-08-19 | At&T Intellectual Property I, L.P. | System and method for combining speech recognition outputs from a plurality of domain-specific speech recognizers via machine learning |
KR20120045582A (ko) * | 2010-10-29 | 2012-05-09 | 한국전자통신연구원 | Apparatus and method for generating an acoustic model |
US9257115B2 (en) | 2012-03-08 | 2016-02-09 | Facebook, Inc. | Device for extracting information from a dialog |
US9514739B2 (en) * | 2012-06-06 | 2016-12-06 | Cypress Semiconductor Corporation | Phoneme score accelerator |
US9224386B1 (en) * | 2012-06-22 | 2015-12-29 | Amazon Technologies, Inc. | Discriminative language model training using a confusion matrix |
JP6234060B2 (ja) | 2013-05-09 | 2017-11-22 | インターナショナル・ビジネス・マシーンズ・コーポレーション (International Business Machines Corporation) | Method, apparatus, and program for generating training speech data for a target domain |
US10140981B1 (en) * | 2014-06-10 | 2018-11-27 | Amazon Technologies, Inc. | Dynamic arc weights in speech recognition models |
KR102405793B1 (ko) * | 2015-10-15 | 2022-06-08 | 삼성전자 주식회사 | Speech signal recognition method and electronic device providing the same |
US10235994B2 (en) * | 2016-03-04 | 2019-03-19 | Microsoft Technology Licensing, Llc | Modular deep learning model |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4803729A (en) * | 1987-04-03 | 1989-02-07 | Dragon Systems, Inc. | Speech recognition method |
US5345535A (en) * | 1990-04-04 | 1994-09-06 | Doddington George R | Speech analysis method and apparatus |
US5303299A (en) * | 1990-05-15 | 1994-04-12 | Vcs Industries, Inc. | Method for continuous recognition of alphanumeric strings spoken over a telephone network |
US5745649A (en) * | 1994-07-07 | 1998-04-28 | Nynex Science & Technology Corporation | Automated speech recognition using a plurality of different multilayer perception structures to model a plurality of distinct phoneme categories |
JP2980228B2 (ja) * | 1994-10-25 | 1999-11-22 | 日本ビクター株式会社 | Method for generating an acoustic model for speech recognition |
US5715367A (en) * | 1995-01-23 | 1998-02-03 | Dragon Systems, Inc. | Apparatuses and methods for developing and using models for speech recognition |
US6067517A (en) * | 1996-02-02 | 2000-05-23 | International Business Machines Corporation | Transcription of speech data with segments from acoustically dissimilar environments |
US5806030A (en) * | 1996-05-06 | 1998-09-08 | Matsushita Electric Ind Co Ltd | Low complexity, high accuracy clustering method for speech recognizer |
US5983180A (en) * | 1997-10-23 | 1999-11-09 | Softsound Limited | Recognition of sequential data using finite state sequence models organized in a tree structure |
-
1998
- 1998-11-06 US US09/187,902 patent/US6324510B1/en not_active Expired - Lifetime
-
1999
- 1999-11-05 WO PCT/US1999/025752 patent/WO2000028526A1/en active Application Filing
- 1999-11-05 JP JP2000581636A patent/JP2002529800A/ja active Pending
- 1999-11-05 DE DE19982503T patent/DE19982503T1/de not_active Ceased
Also Published As
Publication number | Publication date |
---|---|
JP2002529800A (ja) | 2002-09-10 |
WO2000028526A1 (en) | 2000-05-18 |
US6324510B1 (en) | 2001-11-27 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
DE19982503T1 (de) | Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains | |
DE69720087D1 (de) | Method and apparatus for suppressing background music or noise in the input signal of a speech recognizer | |
DE69732769D1 (de) | Apparatus and method for reducing the perplexity of a speech recognition vocabulary and for dynamically selecting acoustic models | |
DE69933627D1 (de) | Apparatus and method for adapting the phase and amplitude frequency response of a microphone | |
DE69628411D1 (de) | Apparatus and method for noise reduction of a speech signal | |
DE69531710D1 (de) | Method and apparatus for reducing noise in speech signals | |
DE69923253D1 (de) | Method and apparatus for speech recognition | |
DE69725106D1 (de) | Method and apparatus for speech recognition with noise adaptation | |
DE69518705D1 (de) | Method and apparatus for speech recognition | |
DE69524829D1 (de) | Method and apparatus for speech recognition | |
DE69632901D1 (de) | Apparatus and method for speech synthesis | |
DE69624624T2 (de) | Method and apparatus for reducing the data stream within a train | |
DE69519820D1 (de) | Method and apparatus for speech synthesis | |
DE69607913D1 (de) | Method and apparatus for speech recognition based on new word models | |
DE69619587D1 (de) | Method and apparatus for tone generation | |
DE69523998T2 (de) | Method and apparatus for speech synthesis | |
DE69710525T2 (de) | Method and apparatus for speech synthesis | |
DE60023736D1 (de) | Method and apparatus for speech recognition with different language models | |
DE69613950T2 (de) | Method and apparatus for tone generation | |
DE69612958D1 (de) | Method and apparatus for resynthesizing a speech signal | |
DE69613644D1 (de) | Method for generating a language model and speech recognition apparatus | |
DE50114446D1 (de) | Apparatus and method for noise-dependent adaptation of an acoustic useful signal | |
DE69906569D1 (de) | Method and apparatus for speech recognition of an acoustic signal affected by interference | |
DE69519818T2 (de) | Method and apparatus for speech synthesis | |
DE69517829D1 (de) | Apparatus and method for speech recognition | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 8127 | New person/name/address of the applicant | Owner name: LERNOUT & HAUSPIE SPEECH PRODUCTS N.V., IEPER, BE |
| | 8127 | New person/name/address of the applicant | Owner name: MULTIMODAL TECHNOLOGIES,INC., PITTSBURGH, PA., US |
| | 8110 | Request for examination paragraph 44 | |
| | R002 | Refusal decision in examination/registration proceedings | |
| | 8131 | Rejection | |
| | R003 | Refusal decision now final | Effective date: 20110322 |