BR9712979A - Method for adapting a hidden Markov acoustic model in a speech recognition system - Google Patents
- Publication number
- BR9712979A
- Authority
- BR
- Brazil
- Prior art keywords
- models
- hidden markov
- acoustic
- markov
- acoustic model
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/005—Language recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Probability & Statistics with Applications (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
- Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)
- Measuring Volume Flow (AREA)
- Character Discrimination (AREA)
Abstract
Patent of Invention: "METHOD FOR ADAPTING A HIDDEN MARKOV ACOUSTIC MODEL IN A SPEECH RECOGNITION SYSTEM". The invention adapts a generally available codebook (CB) of hidden Markov acoustic models for a speech recognition system to special application cases. These application cases are defined by a lexicon (LEX) of the application that the user has modified. The adaptation (ADAP) takes place during operation: the stored midpoint vector of the probability density distributions of the hidden Markov models is shifted toward a recognized feature vector of spoken utterances, with reference to the hidden Markov models specifically in use. Compared with conventional methods, the invention has the advantage that it runs on-line and ensures a very high recognition rate with little search effort. In addition, the effort of training special acoustic models for the corresponding applications is eliminated. By applying special hidden Markov models of multilingual phonemes, which exploit the similarities of sounds across different languages, automatic adaptation to foreign languages becomes possible. The acoustic-phonetic modeling method used for this takes both language-specific and language-independent properties into account when merging the probability densities of distinct hidden Markov acoustic models in different languages.
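The abstract describes the adaptation step only in words: a stored midpoint (mean) vector is shifted toward a recognized feature vector. The following is a minimal sketch of one way such a shift could look, assuming a simple linear interpolation with a step size `alpha`. The update rule, the function names `shift_mean` and `adapt_codebook`, the dictionary layout of the codebook, and the value of `alpha` are all illustrative assumptions and are not taken from the patent text.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def shift_mean(mean: Sequence[float], feature: Sequence[float],
               alpha: float = 0.1) -> List[float]:
    """Move a stored mean vector a fraction `alpha` of the way toward `feature`.

    Assumed update rule (not from the source): new = (1 - alpha) * mean + alpha * feature.
    """
    if len(mean) != len(feature):
        raise ValueError("mean and feature vectors must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * m + alpha * x for m, x in zip(mean, feature)]


def adapt_codebook(codebook: Dict[str, Sequence[float]],
                   aligned_frames: Iterable[Tuple[str, Sequence[float]]],
                   alpha: float = 0.1) -> Dict[str, List[float]]:
    """Return an adapted copy of the codebook; the input is left unchanged.

    codebook:       hypothetical mapping from a density identifier to its mean vector.
    aligned_frames: (density_id, feature_vector) pairs, e.g. taken from the
                    alignment of a recognized utterance to the models in use.
    """
    adapted = {density_id: list(mean) for density_id, mean in codebook.items()}
    for density_id, feature in aligned_frames:
        adapted[density_id] = shift_mean(adapted[density_id], feature, alpha)
    return adapted


if __name__ == "__main__":
    codebook = {"a": [0.0, 2.0], "b": [1.0, 1.0]}
    frames = [("a", [1.0, 4.0])]
    print(adapt_codebook(codebook, frames, alpha=0.5))
```

Only densities that actually receive aligned frames move, which matches the idea of adapting with reference to the models in use. A small `alpha` keeps the general-purpose codebook stable, while a larger one adapts faster at the risk of drifting on misrecognized utterances.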
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19636739A DE19636739C1 (de) | 1996-09-10 | 1996-09-10 | Verfahren zur Mehrsprachenverwendung eines hidden Markov Lautmodelles in einem Spracherkennungssystem |
DE19640586 | 1996-10-01 | ||
PCT/DE1997/002016 WO1998011534A1 (de) | 1996-09-10 | 1997-09-10 | Verfahren zur anpassung eines hidden-markov-lautmodelles in einem spracherkennungssystem |
Publications (1)
Publication Number | Publication Date |
---|---|
BR9712979A (pt) | 2000-10-31 |
Family
ID=26029219
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BR9712979-8A BR9712979A (pt) | 1996-09-10 | 1997-09-10 | Method for adapting a hidden Markov acoustic model in a speech recognition system |
Country Status (9)
Country | Link |
---|---|
US (1) | US6460017B1 (pt) |
EP (1) | EP0925579B1 (pt) |
JP (1) | JP2001503154A (pt) |
CN (1) | CN1237259A (pt) |
AT (1) | ATE209814T1 (pt) |
BR (1) | BR9712979A (pt) |
DE (1) | DE59705581D1 (pt) |
ES (1) | ES2169432T3 (pt) |
WO (1) | WO1998011534A1 (pt) |
Families Citing this family (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ATE235733T1 (de) * | 1998-05-11 | 2003-04-15 | Siemens Ag | Anordnung und verfahren zur erkennung eines vorgegebenen wortschatzes in gesprochener sprache durch einen rechner |
US6085160A (en) * | 1998-07-10 | 2000-07-04 | Lernout & Hauspie Speech Products N.V. | Language independent speech recognition |
KR100415217B1 (ko) * | 1998-09-09 | 2004-01-16 | 아사히 가세이 가부시키가이샤 | 음성인식 장치 |
IT1310154B1 (it) * | 1999-09-02 | 2002-02-11 | Cselt Centro Studi Lab Telecom | Procedimento per realizzare un riconoscitore vocale, relativoriconoscitore e procedimento per il riconoscimento della voce |
KR100307623B1 (ko) * | 1999-10-21 | 2001-11-02 | 윤종용 | 엠.에이.피 화자 적응 조건에서 파라미터의 분별적 추정 방법 및 장치 및 이를 각각 포함한 음성 인식 방법 및 장치 |
US6665640B1 (en) | 1999-11-12 | 2003-12-16 | Phoenix Solutions, Inc. | Interactive speech based learning/training system formulating search queries based on natural language parsing of recognized user queries |
US6633846B1 (en) | 1999-11-12 | 2003-10-14 | Phoenix Solutions, Inc. | Distributed realtime speech recognition system |
US9076448B2 (en) | 1999-11-12 | 2015-07-07 | Nuance Communications, Inc. | Distributed real time speech recognition system |
US6615172B1 (en) | 1999-11-12 | 2003-09-02 | Phoenix Solutions, Inc. | Intelligent query engine for processing voice based queries |
US7725307B2 (en) | 1999-11-12 | 2010-05-25 | Phoenix Solutions, Inc. | Query engine for processing voice based queries including semantic decoding |
DE10040063A1 (de) * | 2000-08-16 | 2002-02-28 | Philips Corp Intellectual Pty | Verfahren zur Zuordnung von Phonemen |
US6801656B1 (en) * | 2000-11-06 | 2004-10-05 | Koninklijke Philips Electronics N.V. | Method and apparatus for determining a number of states for a hidden Markov model in a signal processing system |
DE60029456T2 (de) * | 2000-12-11 | 2007-07-12 | Sony Deutschland Gmbh | Verfahren zur Online-Anpassung von Aussprachewörterbüchern |
US7418386B2 (en) | 2001-04-03 | 2008-08-26 | Intel Corporation | Method, apparatus and system for building a compact language model for large vocabulary continuous speech recognition (LVCSR) system |
US7124080B2 (en) * | 2001-11-13 | 2006-10-17 | Microsoft Corporation | Method and apparatus for adapting a class entity dictionary used with language models |
US7974843B2 (en) * | 2002-01-17 | 2011-07-05 | Siemens Aktiengesellschaft | Operating method for an automated language recognizer intended for the speaker-independent language recognition of words in different languages and automated language recognizer |
DE10220520A1 (de) * | 2002-05-08 | 2003-11-20 | Sap Ag | Verfahren zur Erkennung von Sprachinformation |
EP1361740A1 (de) * | 2002-05-08 | 2003-11-12 | Sap Ag | Verfahren und System zur Verarbeitung von Sprachinformationen eines Dialogs |
DE10256935A1 (de) * | 2002-12-05 | 2004-07-01 | Siemens Ag | Auswahl der Benutzersprache an einem rein akustisch gesteuerten Telefon |
US8285537B2 (en) * | 2003-01-31 | 2012-10-09 | Comverse, Inc. | Recognition of proper nouns using native-language pronunciation |
DE10334400A1 (de) * | 2003-07-28 | 2005-02-24 | Siemens Ag | Verfahren zur Spracherkennung und Kommunikationsgerät |
US7596499B2 (en) * | 2004-02-02 | 2009-09-29 | Panasonic Corporation | Multilingual text-to-speech system with limited resources |
US8036893B2 (en) * | 2004-07-22 | 2011-10-11 | Nuance Communications, Inc. | Method and system for identifying and correcting accent-induced speech recognition difficulties |
KR101244232B1 (ko) * | 2005-05-27 | 2013-03-18 | 오디언스 인코포레이티드 | 오디오 신호 분석 및 변경을 위한 시스템 및 방법 |
US8781837B2 (en) * | 2006-03-23 | 2014-07-15 | Nec Corporation | Speech recognition system and method for plural applications |
US20080147579A1 (en) * | 2006-12-14 | 2008-06-19 | Microsoft Corporation | Discriminative training using boosted lasso |
US20090030676A1 (en) * | 2007-07-26 | 2009-01-29 | Creative Technology Ltd | Method of deriving a compressed acoustic model for speech recognition |
EP2192575B1 (en) * | 2008-11-27 | 2014-04-30 | Nuance Communications, Inc. | Speech recognition based on a multilingual acoustic model |
KR101217524B1 (ko) * | 2008-12-22 | 2013-01-18 | 한국전자통신연구원 | 고립어 엔베스트 인식결과를 위한 발화검증 방법 및 장치 |
US20100198577A1 (en) * | 2009-02-03 | 2010-08-05 | Microsoft Corporation | State mapping for cross-language speaker adaptation |
US9798653B1 (en) * | 2010-05-05 | 2017-10-24 | Nuance Communications, Inc. | Methods, apparatus and data structure for cross-language speech adaptation |
US9672815B2 (en) * | 2012-07-20 | 2017-06-06 | Interactive Intelligence Group, Inc. | Method and system for real-time keyword spotting for speech analytics |
US9437208B2 (en) * | 2013-06-03 | 2016-09-06 | Adobe Systems Incorporated | General sound decomposition models |
US9183830B2 (en) * | 2013-11-01 | 2015-11-10 | Google Inc. | Method and system for non-parametric voice conversion |
US9177549B2 (en) * | 2013-11-01 | 2015-11-03 | Google Inc. | Method and system for cross-lingual voice conversion |
US9542927B2 (en) | 2014-11-13 | 2017-01-10 | Google Inc. | Method and system for building text-to-speech voice from diverse recordings |
CN105260775B (zh) * | 2015-10-16 | 2017-11-21 | 清华大学 | 实现马尔科夫随机场概率编码的方法及神经电路 |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4783803A (en) * | 1985-11-12 | 1988-11-08 | Dragon Systems, Inc. | Speech recognition apparatus and method |
JPH0636156B2 (ja) | 1989-03-13 | 1994-05-11 | インターナショナル・ビジネス・マシーンズ・コーポレーション | 音声認識装置 |
EP0548460A3 (en) | 1991-12-21 | 1993-11-03 | Deutsche Aerospace | Method for fast speaker adaptation in a speech recognizer for large vocabulary |
FI97919C (fi) * | 1992-06-05 | 1997-03-10 | Nokia Mobile Phones Ltd | Puheentunnistusmenetelmä ja -järjestelmä puheella ohjattavaa puhelinta varten |
US5805771A (en) * | 1994-06-22 | 1998-09-08 | Texas Instruments Incorporated | Automatic language identification method and system |
JPH0896514A (ja) * | 1994-07-28 | 1996-04-12 | Sony Corp | オーディオ信号処理装置 |
US5864810A (en) * | 1995-01-20 | 1999-01-26 | Sri International | Method and apparatus for speech recognition adapted to an individual speaker |
Filing Date | Application | Publication | Status |
---|---|---|---|
1997-09-10 | EP97944692A | EP0925579B1 (de) | Not active; Expired - Lifetime |
1997-09-10 | AT97944692T | ATE209814T1 (de) | Active |
1997-09-10 | ES97944692T | ES2169432T3 (es) | Not active; Expired - Lifetime |
1997-09-10 | US09/254,785 | US6460017B1 (en) | Not active; Expired - Lifetime |
1997-09-10 | JP10513150A | JP2001503154A (ja) | Active; Pending |
1997-09-10 | BR9712979-8A | BR9712979A (pt) | Unknown |
1997-09-10 | CN97199583A | CN1237259A (zh) | Active; Pending |
1997-09-10 | DE59705581T | DE59705581D1 (de) | Not active; Expired - Lifetime |
1997-09-10 | PCT/DE1997/002016 | WO1998011534A1 (de) | Active; IP Right Grant |
Also Published As
Publication number | Publication date |
---|---|
ES2169432T3 (es) | 2002-07-01 |
JP2001503154A (ja) | 2001-03-06 |
DE59705581D1 (de) | 2002-01-10 |
CN1237259A (zh) | 1999-12-01 |
WO1998011534A1 (de) | 1998-03-19 |
EP0925579B1 (de) | 2001-11-28 |
ATE209814T1 (de) | 2001-12-15 |
US6460017B1 (en) | 2002-10-01 |
EP0925579A1 (de) | 1999-06-30 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
BR9712979A (pt) | Method for adapting a hidden Markov acoustic model in a speech recognition system | |
Gauvain et al. | Speaker-independent continuous speech dictation | |
AU640164B2 (en) | Method of speech recognition | |
WO1998011537A3 (de) | Verfahren zur mehrsprachenverwendung eines hidden markov lautmodelles in einem spracherkennungssystem | |
Lyu et al. | Language identification on code-switching utterances using multiple cues. | |
Ostendorf et al. | Parse scoring with prosodic information: an analysis/synthesis approach | |
Chollet et al. | Toward ALISP: A proposal for automatic language independent speech processing | |
Vorstermans et al. | Automatic segmentation and labelling of multi-lingual speech data | |
Veilleux et al. | Prosody/parse scoring and its application in ATIS | |
Stahlberg et al. | Towards automatic speech recognition without pronunciation dictionary, transcribed speech and text resources in the target language using cross-lingual word-to-phoneme alignment. | |
Schuller et al. | Combining speech recognition and acoustic word emotion models for robust text-independent emotion recognition | |
Lamel et al. | Continuous speech recognition at LIMSI | |
Kim et al. | A combined punctuation generation and speech recognition system and its performance enhancement using prosody | |
Toledano et al. | Trying to mimic human segmentation of speech using HMM and fuzzy logic post-correction rules | |
Heeman | POS tagging versus classes in language modeling | |
Niu et al. | A study on landmark detection based on CTC and its application to pronunciation error detection | |
Boroş | A unified lexical processing framework based on the Margin Infused Relaxed Algorithm. A case study on the Romanian Language | |
Freij et al. | Lexical stress estimation and phonological knowledge | |
Bakenecker et al. | Improving parsing by incorporating" prosodic clause boundaries" into a grammar | |
Ostendorf | Prosodic boundary detection | |
Mimer et al. | Flexible decision trees for grapheme based speech recognition | |
Pandey et al. | Fusion of spectral and prosodic information using combined error optimization for keyword spotting | |
Keri et al. | Pause prediction from lexical and syntax information | |
Plannerer et al. | A continuous speech recognition system using phonotactic constraints | |
Tanaka et al. | A speech processing based on syllable identification by using phonological patterns |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
B08F | Application dismissed because of non-payment of annual fees [chapter 8.6 patent gazette] | ||
B15K | Others concerning applications: alteration of classification |
Ipc: G10L 15/065 (2013.01), G10L 15/00 (2013.0 |