CN1207664C - Error correcting method for voice identification result and voice identification system - Google Patents


Info

Publication number
CN1207664C
CN1207664C (application CN 99110695 / CN99110695A; publication CN1282072A)
Authority
CN
China
Prior art keywords
character
speech recognition
speech
recognition result
error
Prior art date
Application number
CN 99110695
Other languages
Chinese (zh)
Other versions
CN1282072A (en)
Inventor
唐道南
苏辉
王茜莺
沈丽琴
秦勇
Original Assignee
国际商业机器公司 (International Business Machines Corporation)
Priority date
Filing date
Publication date
Application filed by 国际商业机器公司 (International Business Machines Corporation)
Priority to CN 99110695 priority Critical patent/CN1207664C/en
Publication of CN1282072A publication Critical patent/CN1282072A/en
Application granted
Publication of CN1207664C publication Critical patent/CN1207664C/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6288 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • G06K9/6292 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion of classification results, e.g. of classification results related to same input data
    • G06K9/6293 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion of classification results relating to different input data, e.g. multimodal recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/72 Methods or arrangements for recognition using electronic means using context analysis based on the provisionally recognised identity of a number of successive patterns, e.g. a word
    • G06K9/726 Syntactic or semantic context, e.g. balancing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K2209/00 Indexing scheme relating to methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K2209/01 Character recognition

Abstract

The invention discloses a method and a speech recognition system capable of correcting errors in a speech recognition result. The error correction method of the invention comprises the steps of: marking an error in the output speech recognition result; entering the correct character corresponding to the marked error with a glyph-based input method; recognizing the glyph-based input; displaying candidate correct characters; letting the user select the desired character from the candidates; and replacing the erroneous character with the selected character. The method is characterized by the further step of using the phonetic information of the erroneous character to filter the candidate correct characters.

Description

Method for correcting errors in a speech recognition result, and speech recognition system

The present invention relates to speech recognition technology, and more particularly to a method of correcting errors in a speech recognition result using phonetic information, and to a speech recognition system using the method.

Speech recognition technology uses computers and digital signal processing to accurately recognize human speech (characters, words, clauses, sentences, and so on). Speech recognition rests on extracting effective features from the speech to be recognized, forming speech patterns, comparing them with sample patterns stored in computer memory, and then determining which character or word was spoken by pattern classification. The speech recognition process is a process of identifying linguistic units such as syllables and words. Speech recognition is undoubtedly a fast and effective way to enter text into a computer. Although speech recognition has been studied extensively, the complexity of language means that continuous, speaker-independent, large-vocabulary recognition is still at an exploratory stage. Recognition accuracy will never reach 100%, so correcting errors in the speech recognition result is an indispensable step.

The friendliness and effectiveness of the input methods used during error correction are very important, because error correction completes the speech recognition process, and these methods may be the decisive factor in whether users accept speech input at all. Errors in speech recognition results are usually corrected with handwriting input or various stroke-based input methods, because users of speech recognition systems are generally unwilling to use a keyboard or unfamiliar with it; such users prefer pen-based input that is close to natural writing habits, such as handwriting input or input based on strokes or stroke types. However, handwriting recognition technology is itself not very mature, which lowers the efficiency of correcting errors in the speech recognition result. Current methods for correcting such errors make no use of the useful phonetic information produced during the speech recognition process. The object of the present invention is to exploit that phonetic information to improve the error correction efficiency of speech recognition, that is, to improve the reliability and speed of error correction.

The present invention makes full use of the phonetic information obtained during speech recognition, so that correcting errors in the speech recognition result with various pen-based input methods becomes more efficient. The invention automatically saves and processes the useful phonetic information from the speech recognition process. This is achieved through internal data conversion and by adding evaluation steps involving several statistical models. The invention uses a confusion matrix to produce a phonetic model, and combines the phonetic model with character-level and word-level language models to optimize the error correction process.

According to one aspect of the present invention, a method of correcting errors in a speech recognition result is provided, comprising: marking an error in the output speech recognition result; entering the correct character corresponding to the marked error with a glyph-based input method; recognizing the glyph-based input; displaying candidate correct characters; letting the user select the desired character from the candidates; and replacing the erroneous character with the selected character; the method being characterized by the further step of using the phonetic information of the erroneous character to filter the candidate correct characters.

According to another aspect of the present invention, a speech recognition system is provided, comprising: a speech detection device that captures the user's speech; a pronunciation probability calculation device that computes, for every pronunciation in the phonetic model, an estimate of the probability that it matches the speech sample; a word probability calculation device that computes, from the language model, an estimate of the probability that a word appears in the current context; a word matching device that combines the results of the pronunciation probability calculation device and the word probability calculation device and outputs, as the speech recognition result, the word with the maximum joint probability; a context generation device that updates the context with the recognition result; and a word output device. The speech recognition system is characterized by further comprising an error correction device with which the user can mark an error in the speech recognition result output by the word output device and enter the correct character corresponding to the marked error with a glyph-based input method; the error correction device recognizes the glyph-based input, produces candidate correct characters, and uses the phonetic information of the erroneous character to filter the candidate correct characters.

Other objects and features of the present invention will become more apparent from the following detailed description of the preferred embodiments, taken in conjunction with the accompanying drawings.

Fig. 1 shows the detailed operation flow of correcting errors in a speech recognition result according to one embodiment of the present invention;
Fig. 2 is a general flowchart of a method of correcting errors in a speech recognition result according to one embodiment of the present invention;
Fig. 3 is a general flowchart of filtering candidate characters using phonetic information derived from a confusion matrix according to one embodiment of the present invention;
Fig. 4 is a schematic diagram of a prior-art speech recognition system;
Fig. 5 shows a speech recognition system capable of correcting errors in the recognition result according to one embodiment of the present invention; and
Fig. 6 shows an error correction screen according to one embodiment of the present invention.

Fig. 1 shows the operation flow of correcting errors in a speech recognition result by handwriting input according to one embodiment of the present invention. When an error is found in the speech recognition result, it can be corrected as follows:
Step 101: the user dictates, repeating as necessary to obtain a correct result;
Step 102: the speech recognition (SR) result is displayed on the screen;
Step 103: the user marks the error to be corrected;
Step 104: the system uses the phonetic transcription of the erroneous character to retrieve the phonetic information associated with it (in the form of a statistical model), and combines that phonetic information with the language model to rank and select candidates;
Step 105: the user enters the correct character corresponding to the marked erroneous character through an input method such as handwriting;
Step 106: whenever the recognition step of the input method completes, the system ranks the current candidate list using the models of step 104 to obtain higher accuracy and higher speed;
Step 107: part or all of the resulting candidate list is displayed on the screen;
Step 108: the user selects the correct character with the cursor or the like.

Fig. 2 shows the recovery process for errors in the speech recognition result when a stroke-based keyboard or handwriting input is used:
Step 201: the user completes the first pass of dictation;
Step 202: the speech recognition (SR) result is displayed;
Step 203: the user checks the result; if the recognition result contains no errors, no correction is needed and the input process ends. If the speech recognition result contains one or more errors, the user marks the error to be corrected; this may be a word composed of several characters. The user typically asks for a candidate list to be displayed. If the correct character is in the list, the user goes directly to step 209; otherwise the user goes to step 204. This step may be repeated for every error in the speech recognition result.

Step 204: the user speaks the correct character (or word) corresponding to the marked erroneous one. The speech recognition engine decodes it using the phonetic model only (i.e. with the language model disabled). If the correct character (or word) is shown on the screen as a candidate, the user goes to step 209;
Step 205: if the character (or word) shown on the screen is still incorrect, the user may repeat step 204;
Step 206: when the error persists, the user begins to enter the correct character, i.e. the stroke sequence of the character;
Step 207: from the pronunciation type of the erroneous character obtained in step 204, the system retrieves from the confusion matrix the statistical model associated with the erroneous character. This model captures the statistically most useful features of the erroneous character; it may consist of the distribution of the first initial or first pinyin letter of the erroneous character;
Step 208: the phonetic model obtained in step 207 is combined with character-level and word-level language models to derive, during the successive stroke inputs, probability estimates of the likelihood of the candidate characters (or words). These integrated models rank the candidates produced by stroke input, improving error correction efficiency;
Step 209: the user selects the desired correct character with the cursor or the like, or enters its index in the candidate list.

In the following, with reference to Fig. 3, a process of filtering candidate characters using phonetic information derived from a confusion matrix according to one specific embodiment of the present invention is described.

The purpose of using phonetic information from the speech recognition process is to rank the candidate characters (or words) effectively. The following describes in detail how, for a given erroneous character (or word), phonetic information is extracted from a pre-generated confusion matrix, and how candidate characters (or words) are filtered by combining this statistical model with a language model.

First, how the confusion matrix is produced. The confusion matrix is assumed to have been generated beforehand from speech input error data; it captures the error probabilities of all syllables in continuous speech input.

Define the set of Chinese syllables as:

SSet = {S1, S2, …, SN}. To obtain candidates for each error E in the recognition result, we need the probability of each candidate given the syllables of the recognized characters and their context, i.e. P(C|SHE, H), where C denotes a candidate and SHE is the syllable sequence of the recognized characters, comprising the syllable of the misrecognized character itself and its recent history: SHE = S(H) + S(E), where S(H) denotes the syllable sequence of H and S(E) the syllable sequence of E. H is the linguistic history of the context. The candidates are then ranked by this probability value.

Using Bayes' rule, we obtain

P(C|SHE, H) = P(C, SHE, H) / P(SHE, H) = P(SHE, H|C) P(C) / P(SHE, H).

Because SHE is a purely acoustic event and H a purely linguistic one, we can treat them as fully independent variables, and SHE and H are fixed for a given recognized character. For ranking purposes the expression therefore simplifies to

Rank P(SHE, H|C) P(C) / P(SHE, H) = Rank P(SHE|C) P(H|C) P(C) = Rank P(C|SHE) P(C|H) / P(C)    (1)

For practicality, we simplify P(C|SHE) to P(CS|SE), where CS denotes the syllable of C and SE the syllable of the misrecognized character. This simplification means that we ignore the phonetic context S(H) and group characters with the same syllable into one class.

For training, we employ M testers, each reading N test sentences. The testers' sentences are decoded syllable by syllable, without regard to the language model.

For each syllable ST in a test sentence, if it is recognized as SD (where SD may be ST itself), we add 1 to Count(ST-SD) in the confusion matrix. Then the probability that ST is recognized as SD is

P(SD|ST) = Count(ST-SD) / Σ Count(ST-SM), over all SM ∈ SSet,

where ST, SD ∈ SSet, Count(ST-SD) is the number of times ST was recognized as SD, and Σ Count(ST-SM) is the sum over the row for ST, i.e. the total number of times ST was recognized as any syllable SM ∈ SSet. We store P(SD|ST) in the final confusion matrix.

At the same time we obtain

P(ST) = Count(ST) / Σ Count(Sm), over all Sm in the training data    (2)

Using the confusion matrix: given the recognized syllable SD, we want the probability that SD came from a given ST, i.e. P(ST|SD). By Bayes' rule,

P(ST|SD) = P(SD|ST) P(ST) / P(SD).

When we compute P(CS|SE),

P(CS|SE) = P(ST = CS | SD = SE) = P(SD = SE | ST = CS) P(ST = CS) / P(SD = SE).

P(SD = SE) is the same for all candidates, so it plays no role in ranking them. P(SD = SE | ST = CS) is read off the confusion matrix, and P(ST = CS) comes from formula (2).
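As a sketch of how the counts and the Bayes inversion above fit together, here is a minimal Python illustration; the pinyin-style syllable labels and the toy training pairs are invented for the example and are not data from the patent:

```python
from collections import defaultdict

class ConfusionMatrix:
    """Toy syllable confusion matrix: counts how often true syllable ST
    is recognized as SD, then derives P(SD|ST), P(ST), and the
    unnormalized Bayes score proportional to P(ST|SD)."""

    def __init__(self):
        self.count = defaultdict(int)      # (ST, SD) -> Count(ST-SD)
        self.row_total = defaultdict(int)  # ST -> sum over SM of Count(ST-SM)
        self.total = 0                     # all observations

    def observe(self, st, sd):
        # One test syllable ST was decoded as SD (SD may equal ST).
        self.count[(st, sd)] += 1
        self.row_total[st] += 1
        self.total += 1

    def p_sd_given_st(self, sd, st):
        # P(SD|ST) = Count(ST-SD) / sum over SM of Count(ST-SM)
        return self.count[(st, sd)] / self.row_total[st]

    def p_st(self, st):
        # P(ST) = Count(ST) / sum Count(Sm)   -- formula (2)
        return self.row_total[st] / self.total

    def score_st_given_sd(self, st, sd):
        # Bayes: P(ST|SD) = P(SD|ST) P(ST) / P(SD); P(SD) is constant
        # across candidates, so the unnormalized product ranks them.
        return self.p_sd_given_st(sd, st) * self.p_st(st)

# Toy training data: (true syllable, decoded syllable) pairs.
cm = ConfusionMatrix()
for st, sd in [("shi4", "shi4"), ("shi4", "si4"), ("shi4", "shi4"),
               ("si4", "si4"), ("si4", "shi4")]:
    cm.observe(st, sd)

# "shi4" was decoded correctly 2 out of 3 times.
print(round(cm.p_sd_given_st("shi4", "shi4"), 3))  # 0.667
```

With the decoded syllable fixed at "shi4", the score ranks the true syllable "shi4" above "si4", which is exactly how the matrix is used to order candidates.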

The above method is suitable for listing candidates after the user first completes his or her speech input and for finding candidates for a particular erroneous character (step 203 of the error recovery process above). Here, however, we focus on the situation where the user has repeatedly re-spoken the erroneous character, still without success, and is about to resort to a pen-based input method (step 206). This means that the error persists even after the correct character has been read out in a deliberate manner.

A confusion matrix can therefore be trained on erroneous characters recorded in exactly this kind of speech input setting. Such a confusion matrix can be used together with the language model to rank the candidates produced during pen-based input.

To guard against inaccurate entries in the confusion matrix (due to insufficient training data, unfamiliar pronunciations, and so on), the confusion matrix can be modified as follows to supplement the original confusion matrix method. We first cluster the syllables into groups by their finals (including the group of syllables lacking an initial) and generate a final confusion matrix in the same way as above. We can also cluster the syllables into groups by their initials and generate an initial confusion matrix. These matrices give two independent phonetic models, which can be used together with character-level and word-level language models to estimate the overall likelihood that a valid candidate is the correct character or word.

Fig. 3 depicts a process of filtering candidates using phonetic information derived from the confusion matrix according to one specific embodiment of the present invention.

Step 301: the user's pen-based input is recognized, producing a candidate list C = {C1, C2, …, Cn};
Step 302: for each candidate Ci in C = {C1, C2, …, Cn}, the likelihood that its syllable SCi is confused with the syllable SE of the misrecognized character is obtained from the confusion matrix;
Step 303: this likelihood is compared with a threshold Slim; if it is below Slim, the candidate is removed from the candidate list and step 302 is executed for the next candidate;
Step 304: if the likelihood derived from the confusion matrix is greater than or equal to the threshold Slim, the candidate is kept as a member of the candidate list to be shown to the user, and step 302 is executed for the next candidate;
Step 305: after the above steps have been performed for all candidates in C = {C1, C2, …, Cn}, the candidates remaining in the list are displayed to the user.

For example, suppose that in some speech recognition pass "世" is recognized as "是". To correct the error, we enter the stroke "一" through a pen-based input method, at which point C = {一 厂 丁 二 七 十 才 寸 士 世}. Without phonetic information the correct "世" sits far down the list, but after the processing of steps 301-305 the candidate list shown to the user is C = {十 士 世}.
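Applied to this example, the filtering of steps 301-305 reduces to a simple threshold test over the glyph-recognition candidates. A minimal sketch follows; the likelihood values and the threshold Slim are invented for illustration, not taken from the patent:

```python
def filter_candidates(candidates, likelihood, s_lim):
    """Keep only candidates whose syllable is acoustically confusable
    with the misrecognized syllable SE (steps 301-305).

    candidates: candidate characters from pen/stroke input (step 301)
    likelihood: candidate -> confusion-matrix likelihood that its
                syllable SCi is confused with SE (step 302)
    s_lim:      threshold Slim below which a candidate is dropped
    """
    kept = []
    for ci in candidates:                      # step 302
        if likelihood.get(ci, 0.0) >= s_lim:   # steps 303-304
            kept.append(ci)
    return kept                                # step 305

# The patent's example: "世" was misrecognized as "是"; after the first
# stroke "一" the raw candidate list is long, but only characters whose
# syllable sounds like "shi" survive. Likelihood numbers are invented.
raw = ["一", "厂", "丁", "二", "七", "十", "才", "寸", "士", "世"]
conf = {"十": 0.30, "士": 0.55, "世": 0.80}
print(filter_candidates(raw, conf, s_lim=0.1))  # ['十', '士', '世']
```

Raising Slim prunes more aggressively; with s_lim=0.6 only "世" would remain in this toy setup.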

It can thus be seen that phonetic information improves the efficiency of error correction.

Furthermore, a language model (LM) can be used to narrow the candidate list. To prune the candidate list with the LM, note that, since the error is assumed to occur in a single-character input setting, only a unigram language model needs to be considered. In other words, when evaluating and ranking the valid candidates, we simply add the single-character frequency to the phonetic model. The phonetic and language models can also be weighted; for unfamiliar topics the weight of the language model can be reduced.
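One plausible form of this weighting is a log-linear combination of the confusion-matrix score and the unigram frequency; the patent does not specify the exact scheme, and the scores and weights below are invented for illustration:

```python
import math

def combined_score(acoustic_p, unigram_p, lm_weight=1.0):
    """Log-linear combination of the acoustic (confusion-matrix) score
    and the unigram language-model score. Lowering lm_weight reduces
    the influence of the language model, e.g. for unfamiliar topics."""
    return math.log(acoustic_p) + lm_weight * math.log(unigram_p)

def rank(candidates, acoustic, unigram, lm_weight=1.0):
    # Sort candidates by combined score, best first.
    return sorted(candidates,
                  key=lambda c: combined_score(acoustic[c], unigram[c],
                                               lm_weight),
                  reverse=True)

# Invented numbers: "世" is acoustically likely but rarer than "是".
acoustic = {"世": 0.6, "是": 0.3}
unigram = {"世": 0.001, "是": 0.02}
print(rank(["世", "是"], acoustic, unigram, lm_weight=1.0))  # ['是', '世']
print(rank(["世", "是"], acoustic, unigram, lm_weight=0.1))  # ['世', '世'][0:1] and '是' second
```

With full LM weight the frequent "是" wins; shrinking the weight lets the acoustic evidence for "世" dominate, which is the behavior the paragraph describes for unfamiliar topics.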

In addition, for stroke-based input, the entire covered character set can be organized into a tree structure according to stroke or stroke-type sequences. As strokes (or stroke types) are entered one by one, the system traverses the resulting tree, keeping only the valid branches. The combined phonetic model (confusion matrix) and language model can then rank the currently valid candidates by their overall likelihood values.
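The stroke-sequence tree can be sketched as a small trie over stroke-type codes. The single-letter stroke-type codes and the per-character stroke sequences below are simplified placeholders, not real stroke data for these characters:

```python
class StrokeTrie:
    """Trie keyed by stroke-type sequence; traversal after each input
    stroke keeps only the branch consistent with the strokes so far."""

    def __init__(self):
        self.children = {}  # stroke type code -> StrokeTrie
        self.chars = []     # characters whose sequence ends at this node

    def insert(self, strokes, char):
        node = self
        for s in strokes:
            node = node.children.setdefault(s, StrokeTrie())
        node.chars.append(char)

    def candidates(self, prefix):
        """All characters reachable from the branch selected by the
        stroke prefix entered so far (the 'valid branch')."""
        node = self
        for s in prefix:
            node = node.children.get(s)
            if node is None:
                return []
        out, stack = [], [node]
        while stack:
            n = stack.pop()
            out.extend(n.chars)
            stack.extend(n.children.values())
        return out

# Toy dictionary: h=horizontal, s=vertical, z=turning (codes and the
# stroke sequences are simplified for the example).
trie = StrokeTrie()
trie.insert("hs", "十")
trie.insert("hsh", "士")
trie.insert("hszhh", "世")
print(sorted(trie.candidates("hs")))  # ['世', '十', '士']
```

After each additional stroke, `candidates` would be re-run on the longer prefix and its output re-ranked by the combined phonetic and language models, as the paragraph describes.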

The effectiveness of this speech-assisted stroke input can be estimated as follows. When a stroke/stroke-type input system is used on its own, without any prior phonetic information, an average of 5.36 strokes is needed to narrow the set of 6763 common Chinese characters down to 10 or fewer candidates. When character-level and word-level language models are used to handle multi-character words, the effective number of strokes per character can be reduced to 3.12.

When strokes are entered with the help of prior phonetic information, if we assume that the first pinyin letter of the erroneous character is correct 90% of the time, then capturing the correct candidate in the top-10 candidate list requires on average no more than 2.8 (or 3.7) strokes. If 90% of the initials are assumed correct, capturing the correct candidate requires no more than 2.5 (or 3.4) strokes. With the initial and final models working together, the average number of strokes required drops to no more than 1.0027.

If the prior phonetic information is provided through a confusion matrix over confusion sets of length 100, and the speech recognition engine fails to offer the correct character among its top 10 candidates, stroke-based input is needed. If confusion sets of size 10-100 cover the correct candidates for 60% of the erroneous characters, our initial confusion matrix data yield a required input of 3.572 strokes. These figures are obtained by using the phonetic model together with the character and language models. With word-level prediction, the average effective number of strokes per character drops further, to an estimated 2.2-2.56.

If the erroneous word contains other erroneous characters, the confusion matrices of those characters are used together with the language model to provide the character candidates to be evaluated for each of them. The average effective number of strokes per character can be estimated in the same way.

It can thus be seen that using phonetic information greatly improves the efficiency of correcting speech recognition errors. A speech recognition system that performs error correction with this speech-assisted method is described below.

As shown in Fig. 4, a typical speech recognition system comprises a phonetic model 7 and a language model 8. The phonetic model 7 contains the pronunciations of the common words of the recognized language. These pronunciations are derived statistically from how most people pronounce a given word when reading, and represent its typical pronunciation characteristics. The language model 8 captures how the common words of the recognized language are used.

The continuous speech recognition system of Fig. 4 operates as follows. The speech detection device 1 captures the user's speech, for example representing it as speech samples, and passes the samples to the pronunciation probability calculation device 2. For each pronunciation in the phonetic model 7, the pronunciation probability calculation device 2 estimates the probability that it matches the speech sample. The word probability calculation device 5 uses linguistic regularities distilled from a large corpus to estimate, for each word in the language model 8, the probability that it should appear in the current context. The word matching device 3 combines the probability estimate computed by the pronunciation probability calculation device 2 with that computed by the word probability calculation device 5 into a joint probability (the joint probability expresses how likely it is that the speech sample should be recognized as that word), and the word with the maximum joint probability is output as the speech recognition result. The context generation device 4 updates the current context with this recognition result, for use in recognizing the next speech sample. The word output device 6 outputs the recognized words.
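As an informal sketch of this pipeline, with the device numbering of Fig. 4 in the comments and entirely invented toy probabilities:

```python
def recognize(sample, acoustic_model, language_model, context):
    """Fig. 4 pipeline sketch: device 2 scores pronunciations, device 5
    scores words in context, device 3 picks the word with the highest
    joint probability, device 4 extends the context with the result."""
    best_word, best_p = None, 0.0
    for word, p_acoustic in acoustic_model(sample).items():  # device 2
        p_language = language_model(word, context)           # device 5
        joint = p_acoustic * p_language                      # device 3
        if joint > best_p:
            best_word, best_p = word, joint
    context.append(best_word)                                # device 4
    return best_word                                         # device 6

# Toy stand-ins for the phonetic model 7 and the language model 8.
acoustic = lambda sample: {"是": 0.5, "世": 0.4}
language = lambda w, ctx: {"是": 0.3, "世": 0.1}[w]
ctx = []
print(recognize("audio-frame", acoustic, language, ctx))  # 是
```

The point of the sketch is only the data flow: both model scores are combined into one joint probability, and the winning word also feeds the context used for the next sample.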

Figure 5 shows a speech recognition system, according to a preferred embodiment of the present invention, that can correct errors in the speech recognition result. In this system, the user enters the correct character through a pen-based input device 9, and error correction means 10 screens the candidate list produced by candidate list generating means 11 according to speech model 7 and language model 8.
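One way the screening in error correction means 10 could work is to re-rank the glyph-input candidates by how plausibly each would have been misrecognized as the erroneous character. The sketch below is an assumption-laden illustration: `confusion` stands in for a confusion-matrix-derived speech model and `lm_prob` for a language-model score, neither of which is specified in this form by the patent.

```python
def screen_candidates(candidates, wrong_char, confusion, lm_prob, keep=5):
    """Screen glyph-input candidates using speech information of the wrong character.

    confusion[(truth, recognized)]: hypothetical estimate of how often `truth`
    is misrecognized as `recognized` (a confusion-matrix entry).
    lm_prob(ch): hypothetical language-model score for the candidate in context.
    """
    def score(ch):
        # Unseen confusion pairs get a small floor probability so they are
        # demoted rather than discarded outright.
        return confusion.get((ch, wrong_char), 1e-6) * lm_prob(ch)
    # Highest-scoring candidates first; keep only the top few for display.
    return sorted(candidates, key=score, reverse=True)[:keep]
```

With a two-entry toy confusion matrix, the candidate most often confused with the erroneous character rises to the top of the list.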

The pen-based input device may be a writing tablet, or a device for entering strokes or stroke types. Without adding hardware, stroke input can be implemented in the following ways: 1. Use a sub-region of a standard keyboard to enter Chinese character strokes or stroke types. Stroke types make stroke input simpler and more reliable.
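Such a keyboard sub-region amounts to a small key-to-stroke-type table. The key assignments below are purely illustrative assumptions (the patent does not fix them); the five stroke types, however, are the conventional classification of Chinese strokes.

```python
# Assumed key assignments for a keyboard sub-region (not from the patent).
STROKE_KEYS = {
    "h": "horizontal",    # 一 heng
    "s": "vertical",      # 丨 shu
    "p": "left-falling",  # 丿 pie
    "d": "dot",           # 丶 dian (also covers right-falling)
    "z": "turning",       # 𠃍 zhe
}

def read_stroke_sequence(keys):
    """Translate key presses into a stroke-type sequence, ignoring unmapped keys."""
    return [STROKE_KEYS[k] for k in keys if k in STROKE_KEYS]
```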

2. Design a virtual key set on the error correction screen.

3. The user can use the mouse to indicate the desired stroke. A recognition system can be developed to recognize the full set of strokes or stroke types.

4. Speech can also be used to enter strokes or stroke types.

In addition, during correction of errors in the speech recognition result, a candidate list can pop up on request after the user has marked an error. Here we describe one design for the error correction screen. As shown in Figure 6, the error correction screen consists of a virtual keyboard for entering the five stroke types and a candidate list to its right. When the user starts entering stroke types on the virtual keyboard, the candidate list on the right changes automatically: after each stroke type is entered, the new top candidates are displayed. A user interface that integrates the candidate list and the virtual stroke keyboard on the same screen makes it easier to speed up error correction.
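The incremental update of the candidate list can be sketched as prefix filtering over per-character stroke-type sequences. This is a simplified illustration under assumed stroke codes ("h", "s", ...); a real dictionary would hold the standard stroke order of each character.

```python
def top_candidates(dictionary, typed_strokes, n=10):
    """Keep only characters whose stroke-type sequence starts with what was typed.

    dictionary: maps each character to its full stroke-type sequence
                (codes like "h"/"s" are illustrative assumptions).
    typed_strokes: the stroke types entered so far, in order.
    """
    prefix = tuple(typed_strokes)
    # A character survives if its stroke sequence begins with the typed prefix;
    # each new stroke narrows the list, which is then truncated for display.
    matches = [ch for ch, seq in dictionary.items()
               if tuple(seq[:len(prefix)]) == prefix]
    return matches[:n]
```

Calling this after every keystroke reproduces the behavior described above: the displayed candidates shrink as stroke types accumulate.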

The method of correcting errors in a speech recognition result and the speech recognition system with an error correction function according to the present invention have been described above with reference to specific embodiments. It will be apparent to those skilled in the art that many modifications can be made to the present invention without departing from its spirit; the present invention is intended to cover all such modifications and variations, and the scope of the present invention is defined by the appended claims.

Claims (20)

1. A method of correcting errors in a speech recognition result, comprising the steps of: marking an error in the output speech recognition result; entering the correct character corresponding to the marked error by a glyph-based input method; recognizing the glyph-based input; displaying candidate correct characters; the user selecting the desired character from the candidate correct characters; and replacing the erroneous character with the selected character; the method being characterized by further comprising the step of: using speech information of the erroneous character to screen the candidate correct characters.
2. The method of correcting errors in a speech recognition result according to claim 1, characterized in that the speech recognition is Chinese speech recognition, and the characters are Chinese characters, words, or combinations of characters and words.
3. The method of correcting errors in a speech recognition result according to claim 1 or 2, characterized in that the speech information of the erroneous character comes from the user's dictation during the speech recognition stage.
4. The method of correcting errors in a speech recognition result according to claim 1 or 2, characterized in that the speech information of the erroneous character is obtained from the user's dictation during the error correction stage.
5. The method of correcting errors in a speech recognition result according to claim 1 or 2, characterized in that the speech information is a speech model derived using a confusion matrix.
6. The method of correcting errors in a speech recognition result according to claim 5, characterized in that the speech model is used together with a character- and word-level language model to screen the candidate characters.
7. The method of correcting errors in a speech recognition result according to claim 1 or 2, characterized in that a tree structure is used to organize the candidate characters, and the speech information is used to prune the tree structure.
8. The method of correcting errors in a speech recognition result according to claim 7, characterized in that the speech information is a speech model derived using a confusion matrix.
9. The method of correcting errors in a speech recognition result according to claim 8, characterized in that the speech model can be used together with a character- and word-level language model to prune the tree structure efficiently.
10. The method of correcting errors in a speech recognition result according to claim 1 or 2, characterized in that the candidate correct characters and a virtual stroke keyboard are integrated on the same screen.
11. A speech recognition system capable of correcting errors in a speech recognition result, the speech recognition system comprising: speech detection means for capturing the user's speech; pronunciation probability calculation means for computing, for each pronunciation in a speech model, a probability estimate that the pronunciation matches the speech samples; character probability calculation means for computing, according to a language model, a probability estimate that a character appears in the current context; character matching means for combining the results of the pronunciation probability calculation means and the character probability calculation means and taking the character with the largest joint probability as the speech recognition result; context generating means for updating the context with the recognition result; and character output means; the speech recognition system being characterized by further comprising error correction means, with which the user can mark an error in the speech recognition result output by the character output means and enter, by a glyph-based input method, the correct character corresponding to the marked error, and which recognizes the glyph-based input, generates candidate correct characters, and screens the candidate correct characters using speech information of the erroneous character.
12. The speech recognition system according to claim 11, characterized in that the speech recognition is Chinese speech recognition, and the characters are Chinese characters, words, or combinations of characters and words.
13. The speech recognition system according to claim 11 or 12, characterized in that the speech information of the erroneous character comes from the user's dictation during the speech recognition stage.
14. The speech recognition system according to claim 11 or 12, characterized in that the speech information of the erroneous character is obtained from the user's dictation during the error correction stage.
15. The speech recognition system according to claim 11 or 12, characterized in that the speech information is a speech model derived using a confusion matrix.
16. The speech recognition system according to claim 15, characterized in that the speech model is used together with a character- and word-level language model to screen the candidate characters.
17. The speech recognition system according to claim 11 or 12, characterized in that a tree structure is used to organize the candidate characters, and the speech information is used to prune the tree structure.
18. The speech recognition system according to claim 17, characterized in that the speech information is a speech model derived using a confusion matrix.
19. The speech recognition system according to claim 18, characterized in that the speech model can be used together with a character- and word-level language model to prune the tree structure efficiently.
20. The speech recognition system according to claim 11 or 12, characterized in that the candidate correct characters and a virtual stroke keyboard are integrated on the same screen.
CN 99110695 1999-07-27 1999-07-27 Error correcting method for voice identification result and voice identification system CN1207664C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 99110695 CN1207664C (en) 1999-07-27 1999-07-27 Error correcting method for voice identification result and voice identification system

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN 99110695 CN1207664C (en) 1999-07-27 1999-07-27 Error correcting method for voice identification result and voice identification system
TW88115493A TW449735B (en) 1999-07-27 1999-09-08 Error correction for Chinese speech recognition with alternative input methods
CA 2313968 CA2313968A1 (en) 1999-07-27 2000-07-17 A method for correcting the error characters in the result of speech recognition and the speech recognition system using the same
US09/624,962 US6513005B1 (en) 1999-07-27 2000-07-25 Method for correcting error characters in results of speech recognition and speech recognition system using the same

Publications (2)

Publication Number Publication Date
CN1282072A CN1282072A (en) 2001-01-31
CN1207664C true CN1207664C (en) 2005-06-22

Family

ID=5274644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 99110695 CN1207664C (en) 1999-07-27 1999-07-27 Error correcting method for voice identification result and voice identification system

Country Status (4)

Country Link
US (1) US6513005B1 (en)
CN (1) CN1207664C (en)
CA (1) CA2313968A1 (en)
TW (1) TW449735B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107293299A (en) * 2017-06-16 2017-10-24 Zhu Mingzeng Speech recognition positioning system for improving the efficiency with which dispatchers look up drawings

Also Published As

Publication number Publication date
TW449735B (en) 2001-08-11
US6513005B1 (en) 2003-01-28
CA2313968A1 (en) 2001-01-27
CN1282072A (en) 2001-01-31

Legal Events

Date Code Title Description
C06 Publication
C14 Grant of patent or utility model
ASS Succession or assignment of patent right

Owner name: NEW ANST COMMUNICATION CO.,LTD.

Free format text: FORMER OWNER: INTERNATIONAL BUSINESS MACHINE CORP.

Effective date: 20090911

C41 Transfer of patent application or patent right or utility model
CX01 Expiry of patent term