TWI484476B - Computer-implemented phonetic system and method - Google Patents

Computer-implemented phonetic system and method

Info

Publication number
TWI484476B
Authority
TW
Taiwan
Prior art keywords
term memory
word
probability
short
context
Prior art date
Application number
TW099105182A
Other languages
English (en)
Chinese (zh)
Other versions
TW201035968A (en)
Inventor
Katsutoshi Ohtsuki
Takashi Umeoka
Original Assignee
Microsoft Corp
Priority date
2009-03-30
Filing date
2010-02-23
Publication date
2015-05-11
Application filed by Microsoft Corp
Publication of TW201035968A
Application granted
Publication of TWI484476B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods
    • G06F3/0237 Character input methods using prediction or retrieval techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G06F40/129 Handling non-Latin characters, e.g. kana-to-kanji conversion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187 Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/197 Probabilistic grammars, e.g. word n-grams

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
TW099105182A 2009-03-30 2010-02-23 Computer-implemented phonetic system and method TWI484476B (zh)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/413,606 US8798983B2 (en) 2009-03-30 2009-03-30 Adaptation for statistical language model

Publications (2)

Publication Number Publication Date
TW201035968A (en) 2010-10-01
TWI484476B (zh) 2015-05-11

Family

ID=42785345

Family Applications (1)

Application Number Title Priority Date Filing Date
TW099105182A TWI484476B (zh) 2009-03-30 2010-02-23 Computer-implemented phonetic system and method

Country Status (6)

Country Link
US (1) US8798983B2
JP (1) JP2012522278A
KR (1) KR101679445B1
CN (1) CN102369567B
TW (1) TWI484476B
WO (1) WO2010117688A2

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8688454B2 (en) * 2011-07-06 2014-04-01 Sri International Method and apparatus for adapting a language model in response to error correction
KR101478146B1 (ko) * 2011-12-15 2015-01-02 한국전자통신연구원 Speaker group-based speech recognition apparatus and method
US8918408B2 (en) * 2012-08-24 2014-12-23 Microsoft Corporation Candidate generation for predictive input using input history
CN102968986B (zh) * 2012-11-07 2015-01-28 华南理工大学 Method for distinguishing overlapped speech from single-speaker speech based on long-term and short-term features
US10726831B2 (en) * 2014-05-20 2020-07-28 Amazon Technologies, Inc. Context interpretation in natural language processing using previous dialog acts
US9703394B2 (en) * 2015-03-24 2017-07-11 Google Inc. Unlearning techniques for adaptive language models in text entry
CN108241440B (zh) * 2016-12-27 2023-02-17 北京搜狗科技发展有限公司 Candidate word display method and apparatus
US10535342B2 (en) * 2017-04-10 2020-01-14 Microsoft Technology Licensing, Llc Automatic learning of language models
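CN109981328B (zh) * 2017-12-28 2022-02-25 中国移动通信集团陕西有限公司 Fault early-warning method and apparatus
CN112508197B (zh) * 2020-11-27 2024-02-20 高明昕 Control method and control apparatus for an artificial intelligence device, and artificial intelligence device
CN117313790A (zh) * 2023-09-26 2023-12-29 山东新一代信息产业技术研究院有限公司 Method and system for enhancing large-model context
CN119293191A (zh) * 2024-12-09 2025-01-10 北京罗克维尔斯科技有限公司 Memory-system-based interaction method, apparatus, device, storage medium, and vehicle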

Citations (5)

Publication number Priority date Publication date Assignee Title
TW283774B (en) * 1994-12-31 1996-08-21 Lin-Shan Lii Intelligently vocal Chinese input method and Chinese dictation machine
TW200301460A (en) * 2001-12-17 2003-07-01 Asahi Chemical Ind Voice recognition method, remote control, data terminal device, telephone communication terminal, and voice recognition device
TW200400488A (en) * 2002-06-28 2004-01-01 Samsung Electronics Co Ltd Voice recognition device, observation probability calculating device, complex fast Fourier transform calculation device and method, cache device, and method of controlling the cache device
WO2010033384A1 (en) * 2008-09-19 2010-03-25 Dolby Laboratories Licensing Corporation Upstream quality enhancement signal processing for resource constrained client devices
US20110144991A1 (en) * 2009-12-11 2011-06-16 International Business Machines Corporation Compressing Feature Space Transforms

Family Cites Families (30)

Publication number Priority date Publication date Assignee Title
DE19708183A1 (de) * 1997-02-28 1998-09-03 Philips Patentverwaltung Method for speech recognition with language model adaptation
CN1311881A (zh) 1998-06-04 2001-09-05 松下电器产业株式会社 Language conversion rule generation apparatus, language conversion apparatus, and program recording medium
US7403888B1 (en) * 1999-11-05 2008-07-22 Microsoft Corporation Language input user interface
US6848080B1 (en) * 1999-11-05 2005-01-25 Microsoft Corporation Language input architecture for converting one text form to another text form with tolerance to spelling, typographical, and conversion errors
US7107204B1 (en) * 2000-04-24 2006-09-12 Microsoft Corporation Computer-aided writing system and method with cross-language writing wizard
US7013258B1 (en) * 2001-03-07 2006-03-14 Lenovo (Singapore) Pte. Ltd. System and method for accelerating Chinese text input
US7103534B2 (en) 2001-03-31 2006-09-05 Microsoft Corporation Machine learning contextual approach to word determination for text input via reduced keypad keys
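JP4340024B2 (ja) 2001-06-07 2009-10-07 日本放送協会 Statistical language model generation apparatus and statistical language model generation program
JP4215418B2 (ja) * 2001-08-24 2009-01-28 インターナショナル・ビジネス・マシーンズ・コーポレーション Word prediction method, speech recognition method, and speech recognition apparatus and program using the methods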
US20040003392A1 (en) 2002-06-26 2004-01-01 Koninklijke Philips Electronics N.V. Method and apparatus for finding and updating user group preferences in an entertainment system
US20050027534A1 (en) 2003-07-30 2005-02-03 Meurs Pim Van Phonetic and stroke input methods of Chinese characters and phrases
US7542907B2 (en) * 2003-12-19 2009-06-02 International Business Machines Corporation Biasing a speech recognizer based on prompt context
US8019602B2 (en) * 2004-01-20 2011-09-13 Microsoft Corporation Automatic speech recognition learning using user corrections
US7478033B2 (en) 2004-03-16 2009-01-13 Google Inc. Systems and methods for translating Chinese pinyin to Chinese characters
US7406416B2 (en) * 2004-03-26 2008-07-29 Microsoft Corporation Representation of a deleted interpolation N-gram language model in ARPA standard format
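KR100718147B1 (ko) 2005-02-01 2007-05-14 삼성전자주식회사 Apparatus and method for generating a grammar network for speech recognition, and conversational speech recognition apparatus and method using the same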
US7379870B1 (en) 2005-02-03 2008-05-27 Hrl Laboratories, Llc Contextual filtering
US8117540B2 (en) 2005-05-18 2012-02-14 Neuer Wall Treuhand Gmbh Method and device incorporating improved text input mechanism
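JP4769031B2 (ja) * 2005-06-24 2011-09-07 マイクロソフト コーポレーション Method for creating a language model, kana-kanji conversion method, apparatus therefor, computer program, and computer-readable storage medium
JP4197344B2 (ja) 2006-02-20 2008-12-17 インターナショナル・ビジネス・マシーンズ・コーポレーション Spoken dialogue system
CN101034390A (zh) 2006-03-10 2007-09-12 日电(中国)有限公司 Apparatus and method for language model switching and adaptation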
US7912700B2 (en) 2007-02-08 2011-03-22 Microsoft Corporation Context based word prediction
US7809719B2 (en) 2007-02-08 2010-10-05 Microsoft Corporation Predicting textual candidates
US8028230B2 (en) 2007-02-12 2011-09-27 Google Inc. Contextual input method
JP4852448B2 (ja) * 2007-02-28 2012-01-11 日本放送協会 Error-tendency-learning speech recognition apparatus and computer program
US20090030687A1 (en) * 2007-03-07 2009-01-29 Cerra Joseph P Adapting an unstructured language model speech recognition system based on usage
CN101286094A (zh) * 2007-04-10 2008-10-15 谷歌股份有限公司 Multi-mode input method editor
KR101465770B1 (ko) 2007-06-25 2014-11-27 구글 인코포레이티드 Word probability determination
US8010465B2 (en) * 2008-02-26 2011-08-30 Microsoft Corporation Predicting candidates using input scopes
JP5054711B2 (ja) * 2009-01-29 2012-10-24 日本放送協会 Speech recognition apparatus and speech recognition program

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
TW283774B (en) * 1994-12-31 1996-08-21 Lin-Shan Lii Intelligently vocal Chinese input method and Chinese dictation machine
TW200301460A (en) * 2001-12-17 2003-07-01 Asahi Chemical Ind Voice recognition method, remote control, data terminal device, telephone communication terminal, and voice recognition device
TW200400488A (en) * 2002-06-28 2004-01-01 Samsung Electronics Co Ltd Voice recognition device, observation probability calculating device, complex fast Fourier transform calculation device and method, cache device, and method of controlling the cache device
WO2010033384A1 (en) * 2008-09-19 2010-03-25 Dolby Laboratories Licensing Corporation Upstream quality enhancement signal processing for resource constrained client devices
TW201028997A (en) * 2008-09-19 2010-08-01 Dolby Lab Licensing Corp Upstream quality enhancement signal processing for resource constrained client devices
US20110144991A1 (en) * 2009-12-11 2011-06-16 International Business Machines Corporation Compressing Feature Space Transforms
TW201133470A (en) * 2009-12-11 2011-10-01 Ibm Compressing feature space transforms

Also Published As

Publication number Publication date
US8798983B2 (en) 2014-08-05
WO2010117688A2 (en) 2010-10-14
KR20120018114A (ko) 2012-02-29
CN102369567B (zh) 2013-07-17
WO2010117688A3 (en) 2011-01-13
US20100250251A1 (en) 2010-09-30
TW201035968A (en) 2010-10-01
CN102369567A (zh) 2012-03-07
KR101679445B1 (ko) 2016-11-24
JP2012522278A (ja) 2012-09-20

Similar Documents

Publication Publication Date Title
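TWI484476B (zh) Computer-implemented phonetic system and method
CN111709248B (zh) Training method and apparatus for a text generation model, and electronic device
JP5901001B1 (ja) Method and device for acoustic language model training
EP3549069B1 (en) Neural network data entry system
CN111709234B (zh) Training method and apparatus for a text processing model, and electronic device
TWI475406B (zh) Context-dependent input method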
US9202461B2 (en) Sampling training data for an automatic speech recognition system based on a benchmark classification distribution
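JP5932869B2 (ja) Unsupervised learning method, learning apparatus, and learning program for an N-gram language model
CN112926306B (zh) Text error correction method, apparatus, device, and storage medium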
US20210233510A1 (en) Language-agnostic Multilingual Modeling Using Effective Script Normalization
WO2015169134A1 (en) Method and apparatus for phonetically annotating text
AU2010346493B2 (en) Speech correction for typed input
US20100235780A1 (en) System and Method for Identifying Words Based on a Sequence of Keyboard Events
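KR20100135819A (ko) Segmentation of words using scaled probabilities
JP2016024759A (ja) Method for selecting training text for a language model, method for training a language model using the training text, and computer and computer program for executing the methods
CN110457719A (zh) Method and apparatus for reranking translation model results
CN113408306B (zh) Translation method, and training method, apparatus, device, and storage medium for a classification model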
WO2016144988A1 (en) Token-level interpolation for class-based language models
US11170183B2 (en) Language entity identification
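JPWO2014073206A1 (ja) Information processing apparatus and information processing method
CN108874765A (zh) Word vector processing method and apparatus
CN117043858A (zh) Recurrent neural network-transducer model for performing speech recognition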
US11289095B2 (en) Method of and system for translating speech to text
CN108664141B (zh) Input method with document-context self-learning
US12374323B2 (en) 4-bit conformer with accurate quantization training for speech recognition

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees