BR9712979A - Process for adapting a hidden Markov acoustic model in a speech recognition system - Google Patents

Process for adapting a hidden Markov acoustic model in a speech recognition system

Info

Publication number
BR9712979A
Authority
BR
Brazil
Prior art keywords
models
hidden markov
acoustic
markov
acoustic model
Prior art date
Application number
BR9712979-8A
Other languages
English (en)
Inventor
Udo Bub
Harald Hoege
Joachim Koehler
Original Assignee
Siemens Ag
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from DE19636739A (DE19636739C1)
Application filed by Siemens Ag
Publication of BR9712979A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0631 Creating reference templates; Clustering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0635 Training updating or merging of old and new templates; Mean values; Weighting

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
  • Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)
  • Measuring Volume Flow (AREA)
  • Character Discrimination (AREA)

Abstract

Patent of Invention: "PROCESS FOR ADAPTING A HIDDEN MARKOV ACOUSTIC MODEL IN A SPEECH RECOGNITION SYSTEM". With the invention, a generally available codebook (CB) of hidden Markov acoustic models in a speech recognition system is adapted to special application cases. These application cases are defined by a lexicon (LEX) of the application that has been modified by the user. The adaptation (ADAP) takes place during operation: the stored center-point (mean) vector of the probability density distributions of the hidden Markov models is shifted toward a recognized feature vector of the spoken utterances, with reference to the hidden Markov models specifically in use. Compared with conventional methods, the invention has the advantage that it runs online and ensures a very high recognition rate with little search effort. In addition, the effort of training special acoustic models for the corresponding application cases is eliminated. By applying special hidden Markov models of multilingual phonemes, which exploit the similarities of sounds across different languages, automatic adaptation to foreign languages becomes possible. The method used for acoustic-phonetic modeling takes both language-specific and language-independent properties into account when combining the probability densities of distinct hidden Markov acoustic models across different languages.
BR9712979-8A 1996-09-10 1997-09-10 Process for adapting a hidden Markov acoustic model in a speech recognition system BR9712979A (pt)
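
The mean-shifting step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the update rule, so the interpolation below, the adaptation factor `alpha`, the nearest-mean selection, and all names (`adapt_codebook`, `squared_distance`, the toy `codebook`) are assumptions made for illustration only, not the claimed method.

```python
from typing import Dict, List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adapt_codebook(codebook: Dict[str, List[Vector]], state: str,
                   feature: Vector, alpha: float = 0.1) -> int:
    """Shift one stored mean of the recognized state toward `feature`.

    Assumption: the mean closest to the feature vector is the one adapted,
    and the shift is a linear interpolation controlled by `alpha`.
    Returns the index of the mean that was shifted.
    """
    means = codebook[state]
    # Pick the stored mean nearest to the recognized feature vector.
    best = min(range(len(means)),
               key=lambda i: squared_distance(means[i], feature))
    # Move that mean a fraction alpha of the way toward the feature vector.
    means[best] = [(1.0 - alpha) * m + alpha * x
                   for m, x in zip(means[best], feature)]
    return best


# Toy codebook: state name -> list of mean vectors (hypothetical values).
codebook = {"a": [[0.0, 0.0], [4.0, 4.0]], "o": [[10.0, 10.0]]}
shifted = adapt_codebook(codebook, "a", [1.0, 1.0], alpha=0.5)
```

Only the means of states that appear in recognized utterances are touched, which is consistent with the abstract's point that adaptation is restricted to the models actually used by the application lexicon.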

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
DE19636739A DE19636739C1 (de) 1996-09-10 Method for multilingual use of a hidden Markov sound model in a speech recognition system
DE19640586 1996-10-01
PCT/DE1997/002016 WO1998011534A1 (de) 1996-09-10 1997-09-10 Method for adapting a hidden Markov sound model in a speech recognition system

Publications (1)

Publication Number Publication Date
BR9712979A (pt) 2000-10-31

Family

ID=26029219

Family Applications (1)

Application Number Title Priority Date Filing Date
BR9712979-8A BR9712979A (pt) 1996-09-10 1997-09-10 Process for adapting a hidden Markov acoustic model in a speech recognition system

Country Status (9)

Country Link
US (1) US6460017B1 (pt)
EP (1) EP0925579B1 (pt)
JP (1) JP2001503154A (pt)
CN (1) CN1237259A (pt)
AT (1) ATE209814T1 (pt)
BR (1) BR9712979A (pt)
DE (1) DE59705581D1 (pt)
ES (1) ES2169432T3 (pt)
WO (1) WO1998011534A1 (pt)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ATE235733T1 (de) * 1998-05-11 2003-04-15 Siemens Ag Arrangement and method for recognizing a predefined vocabulary in spoken language by a computer
US6085160A (en) * 1998-07-10 2000-07-04 Lernout & Hauspie Speech Products N.V. Language independent speech recognition
KR100415217B1 (ko) * 1998-09-09 2004-01-16 아사히 가세이 가부시키가이샤 Speech recognition device
IT1310154B1 (it) * 1999-09-02 2002-02-11 Cselt Centro Studi Lab Telecom Method for producing a speech recognizer, related recognizer, and method for speech recognition
KR100307623B1 (ko) * 1999-10-21 2001-11-02 윤종용 Method and apparatus for discriminative estimation of parameters under MAP speaker adaptation conditions, and speech recognition method and apparatus each including the same
US6665640B1 (en) 1999-11-12 2003-12-16 Phoenix Solutions, Inc. Interactive speech based learning/training system formulating search queries based on natural language parsing of recognized user queries
US6633846B1 (en) 1999-11-12 2003-10-14 Phoenix Solutions, Inc. Distributed realtime speech recognition system
US9076448B2 (en) 1999-11-12 2015-07-07 Nuance Communications, Inc. Distributed real time speech recognition system
US6615172B1 (en) 1999-11-12 2003-09-02 Phoenix Solutions, Inc. Intelligent query engine for processing voice based queries
US7725307B2 (en) 1999-11-12 2010-05-25 Phoenix Solutions, Inc. Query engine for processing voice based queries including semantic decoding
DE10040063A1 (de) * 2000-08-16 2002-02-28 Philips Corp Intellectual Pty Method for assigning phonemes
US6801656B1 (en) * 2000-11-06 2004-10-05 Koninklijke Philips Electronics N.V. Method and apparatus for determining a number of states for a hidden Markov model in a signal processing system
DE60029456T2 (de) * 2000-12-11 2007-07-12 Sony Deutschland Gmbh Method for online adaptation of pronunciation dictionaries
US7418386B2 (en) 2001-04-03 2008-08-26 Intel Corporation Method, apparatus and system for building a compact language model for large vocabulary continuous speech recognition (LVCSR) system
US7124080B2 (en) * 2001-11-13 2006-10-17 Microsoft Corporation Method and apparatus for adapting a class entity dictionary used with language models
US7974843B2 (en) * 2002-01-17 2011-07-05 Siemens Aktiengesellschaft Operating method for an automated language recognizer intended for the speaker-independent language recognition of words in different languages and automated language recognizer
DE10220520A1 (de) * 2002-05-08 2003-11-20 Sap Ag Method for recognizing speech information
EP1361740A1 (de) * 2002-05-08 2003-11-12 Sap Ag Method and system for processing speech information of a dialogue
DE10256935A1 (de) * 2002-12-05 2004-07-01 Siemens Ag Selection of the user language on a purely acoustically controlled telephone
US8285537B2 (en) * 2003-01-31 2012-10-09 Comverse, Inc. Recognition of proper nouns using native-language pronunciation
DE10334400A1 (de) * 2003-07-28 2005-02-24 Siemens Ag Method for speech recognition and communication device
US7596499B2 (en) * 2004-02-02 2009-09-29 Panasonic Corporation Multilingual text-to-speech system with limited resources
US8036893B2 (en) * 2004-07-22 2011-10-11 Nuance Communications, Inc. Method and system for identifying and correcting accent-induced speech recognition difficulties
KR101244232B1 (ko) * 2005-05-27 2013-03-18 오디언스 인코포레이티드 System and method for audio signal analysis and modification
US8781837B2 (en) * 2006-03-23 2014-07-15 Nec Corporation Speech recognition system and method for plural applications
US20080147579A1 (en) * 2006-12-14 2008-06-19 Microsoft Corporation Discriminative training using boosted lasso
US20090030676A1 (en) * 2007-07-26 2009-01-29 Creative Technology Ltd Method of deriving a compressed acoustic model for speech recognition
EP2192575B1 (en) * 2008-11-27 2014-04-30 Nuance Communications, Inc. Speech recognition based on a multilingual acoustic model
KR101217524B1 (ko) * 2008-12-22 2013-01-18 한국전자통신연구원 Utterance verification method and apparatus for isolated-word N-best recognition results
US20100198577A1 (en) * 2009-02-03 2010-08-05 Microsoft Corporation State mapping for cross-language speaker adaptation
US9798653B1 (en) * 2010-05-05 2017-10-24 Nuance Communications, Inc. Methods, apparatus and data structure for cross-language speech adaptation
US9672815B2 (en) * 2012-07-20 2017-06-06 Interactive Intelligence Group, Inc. Method and system for real-time keyword spotting for speech analytics
US9437208B2 (en) * 2013-06-03 2016-09-06 Adobe Systems Incorporated General sound decomposition models
US9183830B2 (en) * 2013-11-01 2015-11-10 Google Inc. Method and system for non-parametric voice conversion
US9177549B2 (en) * 2013-11-01 2015-11-03 Google Inc. Method and system for cross-lingual voice conversion
US9542927B2 (en) 2014-11-13 2017-01-10 Google Inc. Method and system for building text-to-speech voice from diverse recordings
CN105260775B (zh) * 2015-10-16 2017-11-21 清华大学 Method and neural circuit for implementing Markov random field probability coding

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4783803A (en) * 1985-11-12 1988-11-08 Dragon Systems, Inc. Speech recognition apparatus and method
JPH0636156B2 (ja) 1989-03-13 1994-05-11 インターナショナル・ビジネス・マシーンズ・コーポレーション Speech recognition device
EP0548460A3 (en) 1991-12-21 1993-11-03 Deutsche Aerospace Method for fast speaker adaptation in a speech recognizer for large vocabulary
FI97919C (fi) * 1992-06-05 1997-03-10 Nokia Mobile Phones Ltd Speech recognition method and system for a voice-controlled telephone
US5805771A (en) * 1994-06-22 1998-09-08 Texas Instruments Incorporated Automatic language identification method and system
JPH0896514A (ja) * 1994-07-28 1996-04-12 Sony Corp Audio signal processing device
US5864810A (en) * 1995-01-20 1999-01-26 Sri International Method and apparatus for speech recognition adapted to an individual speaker

Also Published As

Publication number Publication date
ES2169432T3 (es) 2002-07-01
JP2001503154A (ja) 2001-03-06
DE59705581D1 (de) 2002-01-10
CN1237259A (zh) 1999-12-01
WO1998011534A1 (de) 1998-03-19
EP0925579B1 (de) 2001-11-28
ATE209814T1 (de) 2001-12-15
US6460017B1 (en) 2002-10-01
EP0925579A1 (de) 1999-06-30

Similar Documents

Publication Publication Date Title
BR9712979A (pt) Process for adapting a hidden Markov acoustic model in a speech recognition system
Gauvain et al. Speaker-independent continuous speech dictation
AU640164B2 (en) Method of speech recognition
WO1998011537A3 (de) Method for multilingual use of a hidden Markov sound model in a speech recognition system
Lyu et al. Language identification on code-switching utterances using multiple cues.
Ostendorf et al. Parse scoring with prosodic information: an analysis/synthesis approach
Chollet et al. Toward ALISP: A proposal for automatic language independent speech processing
Vorstermans et al. Automatic segmentation and labelling of multi-lingual speech data
Veilleux et al. Prosody/parse scoring and its application in ATIS
Stahlberg et al. Towards automatic speech recognition without pronunciation dictionary, transcribed speech and text resources in the target language using cross-lingual word-to-phoneme alignment.
Schuller et al. Combining speech recognition and acoustic word emotion models for robust text-independent emotion recognition
Lamel et al. Continuous speech recognition at LIMSI
Kim et al. A combined punctuation generation and speech recognition system and its performance enhancement using prosody
Toledano et al. Trying to mimic human segmentation of speech using HMM and fuzzy logic post-correction rules
Heeman POS tagging versus classes in language modeling
Niu et al. A study on landmark detection based on CTC and its application to pronunciation error detection
Boroş A unified lexical processing framework based on the Margin Infused Relaxed Algorithm. A case study on the Romanian Language
Freij et al. Lexical stress estimation and phonological knowledge
Bakenecker et al. Improving parsing by incorporating "prosodic clause boundaries" into a grammar
Ostendorf Prosodic boundary detection
Mimer et al. Flexible decision trees for grapheme based speech recognition
Pandey et al. Fusion of spectral and prosodic information using combined error optimization for keyword spotting
Keri et al. Pause prediction from lexical and syntax information
Plannerer et al. A continuous speech recognition system using phonotactic constraints
Tanaka et al. A speech processing based on syllable identification by using phonological patterns

Legal Events

Date Code Title Description
B08F Application dismissed because of non-payment of annual fees [chapter 8.6 patent gazette]
B15K Others concerning applications: alteration of classification

Ipc: G10L 15/065 (2013.01), G10L 15/00 (2013.0