DE19982503T1 - Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains - Google Patents

Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains

Info

Publication number
DE19982503T1
Authority
DE
Germany
Prior art keywords
model
adaptation
speech recognition
hierarchical organization
acoustic model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
DE19982503T
Other languages
English (en)
Inventor
Alex Waibel
Juergen Fritsch
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Multimodal Technologies, Inc., Pittsburgh, PA, US
Original Assignee
INTERACTIVE SYSTEMS Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1998-11-06
Filing date
1999-11-05
Publication date
2001-03-08
Application filed by INTERACTIVE SYSTEMS Inc
Publication of DE19982503T1
Legal status: Ceased

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Complex Calculations (AREA)
DE19982503T 1998-11-06 1999-11-05 Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains Ceased DE19982503T1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/187,902 US6324510B1 (en) 1998-11-06 1998-11-06 Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains
PCT/US1999/025752 WO2000028526A1 (en) 1998-11-06 1999-11-05 Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains

Publications (1)

Publication Number Publication Date
DE19982503T1 (de) 2001-03-08

Family

ID=22690959

Family Applications (1)

Application Number Title Priority Date Filing Date
DE19982503T Ceased DE19982503T1 (de) 1998-11-06 1999-11-05 Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains

Country Status (4)

Country Link
US (1) US6324510B1 (de)
JP (1) JP2002529800A (de)
DE (1) DE19982503T1 (de)
WO (1) WO2000028526A1 (de)

Families Citing this family (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1116219B1 (de) * 1999-07-01 2005-03-16 Koninklijke Philips Electronics N.V. Robust speech processing from noisy speech models
KR100366057B1 (ko) * 2000-06-26 2002-12-27 한국과학기술원 Efficient speech recognition apparatus using a human auditory model
US7295979B2 (en) * 2000-09-29 2007-11-13 International Business Machines Corporation Language context dependent data labeling
US7472064B1 (en) * 2000-09-30 2008-12-30 Intel Corporation Method and system to scale down a decision tree-based hidden markov model (HMM) for speech recognition
ATE297588T1 (de) * 2000-11-14 2005-06-15 Ibm Adaptation of the phonetic context to improve speech recognition
US7016887B2 (en) * 2001-01-03 2006-03-21 Accelrys Software Inc. Methods and systems of classifying multiple properties simultaneously using a decision tree
WO2002091357A1 (en) * 2001-05-08 2002-11-14 Intel Corporation Method, apparatus, and system for building context dependent models for a large vocabulary continuous speech recognition (lvcsr) system
US7809574B2 (en) 2001-09-05 2010-10-05 Voice Signal Technologies Inc. Word recognition using choice lists
US7467089B2 (en) 2001-09-05 2008-12-16 Roth Daniel L Combined speech and handwriting recognition
US7444286B2 (en) 2001-09-05 2008-10-28 Roth Daniel L Speech recognition using re-utterance recognition
WO2004023455A2 (en) * 2002-09-06 2004-03-18 Voice Signal Technologies, Inc. Methods, systems, and programming for performing speech recognition
US7313526B2 (en) 2001-09-05 2007-12-25 Voice Signal Technologies, Inc. Speech recognition using selectable recognition modes
US7526431B2 (en) 2001-09-05 2009-04-28 Voice Signal Technologies, Inc. Speech recognition using ambiguous or phone key spelling and/or filtering
US7505911B2 (en) 2001-09-05 2009-03-17 Roth Daniel L Combined speech recognition and sound recording
US7050668B2 (en) * 2003-06-19 2006-05-23 Lucent Technologies Inc. Methods and apparatus for control of optical switching arrays that minimize bright state switching
FR2857528B1 (fr) * 2003-07-08 2006-01-06 Telisma Speech recognition for large dynamic vocabularies
US7542949B2 (en) * 2004-05-12 2009-06-02 Mitsubishi Electric Research Laboratories, Inc. Determining temporal patterns in sensed data sequences by hierarchical decomposition of hidden Markov models
US20060080356A1 (en) * 2004-10-13 2006-04-13 Microsoft Corporation System and method for inferring similarities between media objects
US20060136210A1 (en) * 2004-12-16 2006-06-22 Sony Corporation System and method for tying variance vectors for speech recognition
US20060136215A1 (en) * 2004-12-21 2006-06-22 Jong Jin Kim Method of speaking rate conversion in text-to-speech system
EP1889255A1 (de) * 2005-05-24 2008-02-20 Loquendo S.p.A. Automatic text-independent, language-independent speaker voice-print creation and speaker recognition
US8126710B2 (en) * 2005-06-01 2012-02-28 Loquendo S.P.A. Conservative training method for adapting a neural network of an automatic speech recognition device
US7805301B2 (en) * 2005-07-01 2010-09-28 Microsoft Corporation Covariance estimation for pattern recognition
US20070081428A1 (en) * 2005-09-29 2007-04-12 Spryance, Inc. Transcribing dictation containing private information
KR100755677B1 (ko) * 2005-11-02 2007-09-05 삼성전자주식회사 Apparatus and method for conversational speech recognition using topic domain detection
US20080004876A1 (en) * 2006-06-30 2008-01-03 Chuang He Non-enrolled continuous dictation
US20080162129A1 (en) * 2006-12-29 2008-07-03 Motorola, Inc. Method and apparatus pertaining to the processing of sampled audio content using a multi-resolution speech recognition search process
US20080243503A1 (en) * 2007-03-30 2008-10-02 Microsoft Corporation Minimum divergence based discriminative training for pattern recognition
WO2008137616A1 (en) * 2007-05-04 2008-11-13 Nuance Communications, Inc. Multi-class constrained maximum likelihood linear regression
US8289884B1 (en) * 2008-01-14 2012-10-16 Dulles Research LLC System and method for identification of unknown illicit networks
US8682660B1 (en) * 2008-05-21 2014-03-25 Resolvity, Inc. Method and system for post-processing speech recognition results
US9798653B1 (en) * 2010-05-05 2017-10-24 Nuance Communications, Inc. Methods, apparatus and data structure for cross-language speech adaptation
US8719023B2 (en) * 2010-05-21 2014-05-06 Sony Computer Entertainment Inc. Robustness to environmental changes of a context dependent speech recognizer
US8812321B2 (en) * 2010-09-30 2014-08-19 At&T Intellectual Property I, L.P. System and method for combining speech recognition outputs from a plurality of domain-specific speech recognizers via machine learning
KR20120045582A (ko) * 2010-10-29 2012-05-09 한국전자통신연구원 Apparatus and method for generating an acoustic model
US9257115B2 (en) 2012-03-08 2016-02-09 Facebook, Inc. Device for extracting information from a dialog
US9514739B2 (en) * 2012-06-06 2016-12-06 Cypress Semiconductor Corporation Phoneme score accelerator
US9224386B1 (en) * 2012-06-22 2015-12-29 Amazon Technologies, Inc. Discriminative language model training using a confusion matrix
JP6234060B2 (ja) 2013-05-09 2017-11-22 International Business Machines Corporation Method, apparatus, and program for generating training speech data for a target domain
US10140981B1 (en) * 2014-06-10 2018-11-27 Amazon Technologies, Inc. Dynamic arc weights in speech recognition models
KR102405793B1 (ko) * 2015-10-15 2022-06-08 삼성전자 주식회사 Method for recognizing a voice signal and electronic device providing the same
US10235994B2 (en) * 2016-03-04 2019-03-19 Microsoft Technology Licensing, Llc Modular deep learning model

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4803729A (en) * 1987-04-03 1989-02-07 Dragon Systems, Inc. Speech recognition method
US5345535A (en) * 1990-04-04 1994-09-06 Doddington George R Speech analysis method and apparatus
US5303299A (en) * 1990-05-15 1994-04-12 Vcs Industries, Inc. Method for continuous recognition of alphanumeric strings spoken over a telephone network
US5745649A (en) * 1994-07-07 1998-04-28 Nynex Science & Technology Corporation Automated speech recognition using a plurality of different multilayer perception structures to model a plurality of distinct phoneme categories
JP2980228B2 (ja) * 1994-10-25 1999-11-22 日本ビクター株式会社 Method for generating an acoustic model for speech recognition
US5715367A (en) * 1995-01-23 1998-02-03 Dragon Systems, Inc. Apparatuses and methods for developing and using models for speech recognition
US6067517A (en) * 1996-02-02 2000-05-23 International Business Machines Corporation Transcription of speech data with segments from acoustically dissimilar environments
US5806030A (en) * 1996-05-06 1998-09-08 Matsushita Electric Ind Co Ltd Low complexity, high accuracy clustering method for speech recognizer
US5983180A (en) * 1997-10-23 1999-11-09 Softsound Limited Recognition of sequential data using finite state sequence models organized in a tree structure

Also Published As

Publication number Publication date
JP2002529800A (ja) 2002-09-10
WO2000028526A1 (en) 2000-05-18
US6324510B1 (en) 2001-11-27

Similar Documents

Publication Publication Date Title
DE19982503T1 (de) Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains
DE69720087D1 (de) Method and device for suppressing background music or noise in the input signal of a speech recognizer
DE69732769D1 (de) Device and method for reducing the perplexity of a speech recognition vocabulary and for dynamically selecting acoustic models
DE69933627D1 (de) Device and method for adjusting the phase and amplitude frequency response of a microphone
DE69628411D1 (de) Device and method for noise reduction of a speech signal
DE69531710D1 (de) Method and device for reducing noise in speech signals
DE69923253D1 (de) Method and device for speech recognition
DE69725106D1 (de) Method and device for speech recognition with noise adaptation
DE69518705D1 (de) Method and device for speech recognition
DE69524829D1 (de) Method and device for speech recognition
DE69632901D1 (de) Device and method for speech synthesis
DE69624624T2 (de) Method and device for reducing the data stream within a train
DE69519820D1 (de) Method and device for speech synthesis
DE69607913D1 (de) Method and device for speech recognition based on new word models
DE69619587D1 (de) Method and device for tone generation
DE69523998T2 (de) Method and device for speech synthesis
DE69710525T2 (de) Method and device for speech synthesis
DE60023736D1 (de) Method and device for speech recognition with different language models
DE69613950T2 (de) Method and device for tone generation
DE69612958D1 (de) Method and device for resynthesizing a speech signal
DE69613644D1 (de) Method for generating a language model and speech recognition device
DE50114446D1 (de) Device and method for noise-dependent adaptation of a wanted acoustic signal
DE69906569D1 (de) Method and device for speech recognition of an acoustic signal affected by interference
DE69519818T2 (de) Method and device for speech synthesis
DE69517829D1 (de) Device and method for speech recognition

Legal Events

Date Code Title Description
8127 New person/name/address of the applicant

Owner name: LERNOUT & HAUSPIE SPEECH PRODUCTS N.V., IEPER, BE

8127 New person/name/address of the applicant

Owner name: MULTIMODAL TECHNOLOGIES, INC., PITTSBURGH, PA., US

8110 Request for examination paragraph 44
R002 Refusal decision in examination/registration proceedings
8131 Rejection
R003 Refusal decision now final

Effective date: 2011-03-22