KR100639931B1

KR100639931B1 - Recognition error correction apparatus for interactive voice recognition system and method thereof

Info

Publication number: KR100639931B1
Application number: KR1020040097115A
Authority: KR
Inventors: 김정세; 김상훈
Original assignee: 한국전자통신연구원
Priority date: 2004-11-24
Filing date: 2004-11-24
Publication date: 2006-11-01
Also published as: KR20060057921A

Abstract

The present invention relates to a recognition error correction apparatus for an interactive speech recognition system, and a method thereof, that correct recognition errors in the words or sentences produced by speech recognition. To this end, the apparatus is provided in a speech recognition system having a speech signal input unit that receives speech, a speech analysis unit that processes and analyzes the input speech, and a speech recognition unit that performs speech recognition on the analysis result using a language model and an acoustic model. The apparatus comprises storage means that stores, in advance, probability values for speech recognition post-processing; speech recognition post-processing means that corrects errors in the result recognized by the speech recognition unit using the probability values stored in the storage means; and recognition result output means that outputs the speech recognition result corrected by the speech recognition post-processing means. The apparatus and method according to the present invention improve the recognition rate by reducing speech recognition errors.

Speech recognition, speech recognition post-processing, acoustic model, language model, recognition error correction

Description

Recognition error correction apparatus for interactive speech recognition system and method thereof {RECOGNITION ERROR CORRECTION APPARATUS FOR INTERACTIVE VOICE RECOGNITION SYSTEM AND METHOD THEREOF}

FIG. 1 is a functional block diagram showing the configuration of a recognition error correction apparatus for an interactive speech recognition system according to an embodiment of the present invention.

FIG. 2 is an operational flowchart showing a recognition error correction method for an interactive speech recognition system according to an embodiment of the present invention.

<Explanation of symbols for the main parts of the drawings>

10: speech signal input unit

20: speech analysis unit

30: speech recognition unit

40: acoustic model storage unit

50: language model storage unit

60: speech recognition post-processing unit

70: speech recognition post-processing dictionary unit

80: speech recognition result output unit

The present invention relates to a recognition error correction apparatus for an interactive speech recognition system and a method thereof, and more particularly to an apparatus and method that correct recognition errors in the words or sentences produced by speech recognition.

Most conventional techniques concern methods of improving recognition performance inside the speech recognition unit. They have the problem that the output of the speech recognizer cannot be corrected by combining the speech recognition unit with techniques used in natural language processing.

Accordingly, the present invention has been made to solve the above problem, and its object is to provide a recognition error correction apparatus for an interactive speech recognition system, and a method thereof, that improve speech recognition by applying post-processing to the speech recognition result.

To achieve the above object, the recognition error correction apparatus for an interactive speech recognition system according to the present invention is provided in a speech recognition system having a speech signal input unit that receives speech, a speech analysis unit that processes and analyzes the input speech, and a speech recognition unit that performs speech recognition on the analysis result using a language model and an acoustic model.

The apparatus comprises storage means that stores, in advance, probability values for speech recognition post-processing; speech recognition post-processing means that corrects errors in the result recognized by the speech recognition unit using the probability values stored in the storage means; and recognition result output means that outputs the speech recognition result corrected by the speech recognition post-processing means.

To achieve the above object, the recognition error correction method for an interactive speech recognition system according to the present invention comprises a first step of receiving a speaker's speech, processing and analyzing the input speech, and performing speech recognition on the analysis result using a language model and an acoustic model; a second step of correcting errors in the recognized result using previously stored probability values; and a third step of outputting the corrected speech recognition result.

Hereinafter, a recognition error correction apparatus for an interactive speech recognition system and a method thereof according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a functional block diagram of a recognition error correction apparatus for an interactive speech recognition system according to an embodiment of the present invention.

As shown in FIG. 1, the recognition error correction apparatus for an interactive speech recognition system according to an embodiment of the present invention comprises: a speech signal input unit (10) that receives speech; a speech analysis unit (20) that processes and analyzes the input speech; an acoustic model storage unit (40) that stores an acoustic model; a language model storage unit (50) that stores a general base language model; a speech recognition unit (30) that performs speech recognition on the analysis result using the acoustic model stored in the acoustic model storage unit (40) and the language model stored in the language model storage unit (50); a speech recognition post-processing dictionary unit (70) that stores probability values for speech recognition post-processing; a speech recognition post-processing unit (60) that corrects errors in the recognized result using the stored probability values; and a recognition result output unit (80) that outputs the speech recognition result processed by the speech recognition post-processing unit (60).

Next, the operation of the recognition error correction apparatus configured as above, and the corresponding method, will be described with reference to FIG. 2.

As shown in FIG. 2, when the speaker's speech is first input through the speech signal input unit (10) (S10), the speech analysis unit (20) analyzes the speech signal received from the speech signal input unit (10), for example by applying noise processing and amplifying the signal (S11). The speech signal input unit (10) is a device such as a microphone for speech input.

The speech recognition unit (30) recognizes the speaker's speech using the acoustic model stored in the acoustic model storage unit (40) and the language model stored in the language model storage unit (50) (S12).

The acoustic model stored in the acoustic model storage unit (40) is an acoustic model for speaker-independent speech recognition, for example a hidden Markov model (HMM).

The language model stored in the language model storage unit (50) is a general base language model dictionary.

Next, the speech recognition post-processing unit (60) receives the result from the speech recognition unit (30) and uses the speech recognition post-processing dictionary to determine whether the recognized sentence is plausible.

That is, the speech recognition post-processing unit (60) can check whether a sentence is appropriate by determining whether its pair of keywords is a combination that appears in ordinary text. For example, if "phone number" and "weather" are recognized together, the result can be judged to be a recognition error, because the two keywords are very unlikely to appear together in ordinary text.

In general, complete syntactic parsing is not feasible for conversational speech, because inversions and omissions of necessary keywords, which ordinary grammars handle poorly, occur frequently. Moreover, current parsing accuracy even for well-formed sentences is no more than 60-70%, which makes parsing impractical for now. The judgment therefore has to rely on the combination of keywords in the sentence, as described above.

The information that can be used to calculate the probability between two keywords for this judgment includes a thesaurus, an ontology, and mutual information (MI).

The following is an example using mutual information (MI).

The probability calculation based on mutual information (MI) relies on the correlation between the words contained in the keywords, as given by Equation 1.

$$\mathrm{MI}(x, y) = \frac{P(x, y)}{P(x)\,P(y)} \qquad \text{(Equation 1)}$$

Here, P denotes probability, and x and y each denote a keyword. That is, the mutual information is the probability that keywords x and y appear together, relative to the probabilities that x and y each appear on their own.

As Equation 1 shows, the keywords are statistically related to one another. These probability values are calculated in advance and stored in the speech recognition post-processing dictionary unit (70), and the speech recognition post-processing unit (60) uses them to judge the speech recognition result.
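The precomputation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes probabilities are estimated as document frequencies over a small keyword corpus, and the names `build_mi_table`, `lookup_mi`, and the toy `corpus` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_mi_table(documents):
    """Precompute MI = P(x, y) / (P(x) * P(y)) for every co-occurring keyword pair.

    `documents` is a list of keyword lists. Each probability is estimated as
    the fraction of documents that contain the keyword (or both keywords).
    This estimator is an assumption made for the sketch.
    """
    n_docs = len(documents)
    single = Counter()
    pair = Counter()
    for doc in documents:
        keywords = sorted(set(doc))
        single.update(keywords)
        # combinations() over a sorted list yields each pair in sorted order
        pair.update(combinations(keywords, 2))

    table = {}
    for (x, y), count_xy in pair.items():
        p_xy = count_xy / n_docs
        p_x = single[x] / n_docs
        p_y = single[y] / n_docs
        table[(x, y)] = p_xy / (p_x * p_y)
    return table


def lookup_mi(table, x, y):
    """Return the stored value for a pair, or 0.0 if the pair never co-occurred."""
    key = (x, y) if x <= y else (y, x)
    return table.get(key, 0.0)


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    ["weather", "tomorrow", "seoul"],
    ["weather", "seoul"],
    ["phone number", "restaurant"],
    ["phone number", "seoul"],
]
mi_table = build_mi_table(corpus)
```

In this toy corpus, "phone number" and "weather" never co-occur, so their lookup falls back to 0.0, which matches the kind of pair the description treats as a likely recognition error.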

The following is an example using a thesaurus or WordNet.

A thesaurus is a vocabulary of controlled index terms that formally organizes and specifies particular relationships between concepts, and it is structured differently depending on the application domain. In other words, it defines the relationships between words, so the number of words that must be traversed to get from word x to word y can be computed and used in place of a probability value such as mutual information.
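The word-traversal count described above can be sketched as a shortest-path search. This is an illustration under stated assumptions: the thesaurus is modeled as an undirected adjacency map, and the mapping from path length to a score, 1 / (1 + distance), is chosen here only for the example; the patent does not specify one. The names `path_length`, `relatedness`, and `toy_thesaurus` are hypothetical.

```python
from collections import deque


def path_length(graph, start, goal):
    """Breadth-first search: number of edges on the shortest path, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def relatedness(graph, x, y):
    """Map path length to a score in (0, 1]; unreachable pairs score 0.0.

    The 1 / (1 + distance) mapping is an assumption made for this sketch.
    """
    dist = path_length(graph, x, y)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


# Hypothetical toy thesaurus stored as an undirected adjacency map.
toy_thesaurus = {
    "weather": ["forecast", "climate"],
    "forecast": ["weather", "tomorrow"],
    "climate": ["weather"],
    "tomorrow": ["forecast"],
    "phone number": ["contact"],
    "contact": ["phone number"],
}
```

Closer words receive higher scores, so the result can be compared against a threshold in the same way as a mutual information value.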

The speech recognition post-processing unit (60) calculates a probability value for the result recognized by the speech recognition unit (30) in the manner described above (S13), and then compares the calculated probability value with a threshold (S14).

If the probability value is below the threshold, the speech recognition post-processing unit (60) requests from the speech recognition unit (30) the recognition candidate list, which is an intermediate result of speech recognition (S15). The speech recognition post-processing unit (60) calculates a probability value for each recognition candidate in the list (S16) and compares the calculated probability values with the threshold (S17).

The speech recognition post-processing unit (60) selects the maximum among the probability values that exceed the threshold (S18), and then outputs the recognition candidate having the selected maximum probability value as the recognition result (S19).

If none of the calculated candidate probability values exceeds the threshold, the speech recognition post-processing unit (60) may ask the speaker again what is being requested (S20).

If the probability value exceeds the threshold in step S14, the speech recognition post-processing unit (60) outputs the recognition result associated with that probability value directly as the speech recognition result (S19).
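The decision flow of steps S13 to S20 can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes that a hypothesis is a list of keywords, that a sentence is scored by its lowest pairwise score (the patent does not say how several pairs are combined), and that `request_candidates` is a hypothetical callable standing in for the request to the speech recognition unit (30). Single-keyword hypotheses cannot be scored here and are treated as not passing.

```python
from itertools import combinations


def sentence_score(keywords, pair_score):
    """Lowest pairwise score among the keywords (an assumed way to combine pairs)."""
    pairs = list(combinations(keywords, 2))
    if not pairs:
        return None  # fewer than two keywords: no pair to score
    return min(pair_score(x, y) for x, y in pairs)


def post_process(best, request_candidates, pair_score, threshold):
    """Return the accepted keyword list, or None to signal 'ask the speaker again'."""
    score = sentence_score(best, pair_score)               # S13
    if score is not None and score > threshold:            # S14
        return best                                        # S19
    scored = [(sentence_score(c, pair_score), c)
              for c in request_candidates()]               # S15, S16
    passing = [(s, c) for s, c in scored
               if s is not None and s > threshold]         # S17
    if not passing:
        return None                                        # S20
    return max(passing, key=lambda item: item[0])[1]       # S18, S19


# Hypothetical pairwise scores, for illustration only.
toy_scores = {
    frozenset(["weather", "tomorrow"]): 1.5,
    frozenset(["phone number", "weather"]): 0.1,
}


def toy_pair_score(x, y):
    return toy_scores.get(frozenset([x, y]), 0.0)


result = post_process(
    ["phone number", "weather"],
    lambda: [["phone number", "weather"], ["tomorrow", "weather"]],
    toy_pair_score,
    threshold=1.0,
)
```

Here the top hypothesis fails the threshold, so the candidate list is rescored and the "tomorrow" and "weather" candidate is selected. Returning `None` leaves the re-prompting of the speaker to the caller.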

If there is only one keyword, the previously recognized sentence can be used to determine what is being requested. For example, if the speaker previously asked "How is the weather in Seoul today?" and this was recognized, and the speaker then says "What about tomorrow?", it can be determined that the speaker is asking about tomorrow's weather.
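One possible reading of this single-keyword case is sketched below. The patent does not describe the mechanism, so everything here is an assumption: the previous sentence is kept as a keyword list, and a new keyword replaces the earlier keyword that plays the same role. The names `resolve_with_context`, `both_dates`, and `DATE_WORDS` are hypothetical.

```python
def resolve_with_context(current_keywords, previous_keywords, same_slot):
    """Complete a one-keyword utterance from the previously recognized sentence.

    `same_slot(a, b)` reports whether two keywords play the same role
    (for example, both are dates). The new keyword replaces the first old
    keyword in the same slot, or is appended if no slot matches.
    """
    if len(current_keywords) != 1 or not previous_keywords:
        return list(current_keywords)
    new = current_keywords[0]
    replaced = False
    merged = []
    for old in previous_keywords:
        if not replaced and same_slot(old, new):
            merged.append(new)
            replaced = True
        else:
            merged.append(old)
    if not replaced:
        merged.append(new)
    return merged


DATE_WORDS = {"today", "tomorrow", "yesterday"}  # hypothetical slot vocabulary


def both_dates(a, b):
    return a in DATE_WORDS and b in DATE_WORDS


merged = resolve_with_context(
    ["tomorrow"], ["today", "seoul", "weather"], both_dates
)
```

With the example from the description, the keyword "tomorrow" replaces "today" while "seoul" and "weather" carry over from the previous sentence.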

The speech recognition post-processing dictionary unit (70) stores the dictionary used by the speech recognition post-processing unit (60), which may consist of a thesaurus, an ontology, mutual information (MI), and the like.

The recognition result output unit (80) outputs the recognition result from the speech recognition unit (30) to a monitor or other output device.

Although the present invention has been described in detail above with reference to several embodiments, it is not necessarily limited to these embodiments and may be modified and practiced in various ways without departing from its technical spirit.

As described above, the recognition error correction apparatus for an interactive speech recognition system and the method thereof according to the present invention use speech recognition post-processing to minimize errors of the speech recognition unit during dialogue with a user, thereby realizing a high-performance interactive speech recognition system.

In addition, the recognition error correction apparatus for an interactive speech recognition system and the method thereof according to the present invention can improve the recognition rate by reducing speech recognition errors.

Claims

A recognition error correction apparatus for an interactive speech recognition system, in a speech recognition system having a speech signal input unit that receives speech, a speech analysis unit that processes and analyzes the input speech, and a speech recognition unit that performs speech recognition on the analysis result using a language model and an acoustic model, the apparatus comprising:

storage means for storing, in advance, probability values for speech recognition post-processing;

speech recognition post-processing means for correcting errors in the result recognized by the speech recognition unit using the probability values stored in the storage means, the post-processing means including probability calculating means for calculating, as a probability value, the probability between conversation keywords in the recognition result of the speech recognition unit, comparison means for comparing whether the probability value calculated by the probability calculating means exceeds a threshold, and recognition candidate list requesting means for requesting a recognition candidate list from the speech recognition unit if the comparison shows that the probability value does not exceed the threshold; and

recognition result output means for outputting the speech recognition result corrected by the speech recognition post-processing means.

The apparatus of claim 1,

wherein the probability values stored in the storage means are at least one of mutual information, a thesaurus, and an ontology.

delete

The apparatus of claim 1,

wherein the probability calculating means calculates a probability value for each recognition candidate in the recognition candidate list requested by the recognition candidate list requesting means, and the comparison means compares whether each of the calculated probability values exceeds the threshold.

The apparatus of claim 4,

further comprising maximum probability value selecting means for selecting the maximum probability value among the calculated probability values when the comparison by the comparison means shows that a probability value exceeds the threshold.

A first step of receiving a speaker's speech, processing and analyzing the input speech, and performing speech recognition on the analysis result using a language model and an acoustic model;

A second step of correcting errors in the recognized result using previously stored probability values, including a first process of calculating, as a probability value, the probability between conversation keywords in the speech recognition result, a second process of comparing the calculated probability value with a threshold, and requesting a recognition candidate list if the comparison shows that the probability value does not exceed the threshold;

And a third step of outputting the corrected speech recognition result.

The method of claim 6,

wherein the previously stored probability values are at least one of mutual information, a thesaurus, and an ontology.

delete

The method of claim 6, wherein the second step comprises:

a fourth step of calculating a probability value for each recognition candidate in the requested recognition candidate list;

a fifth step of comparing whether each of the calculated probability values exceeds the threshold; and

a sixth step of selecting the maximum probability value among the calculated probability values when the comparison shows that a probability value exceeds the threshold.