KR100766058B1

KR100766058B1 - method and apparatus for exceptional case handling in spoken dialog system

Info

Publication number: KR100766058B1
Application number: KR1020050119974A
Authority: KR
Inventors: 윤승; 김상훈
Original assignee: 한국전자통신연구원
Priority date: 2005-12-08
Filing date: 2005-12-08
Publication date: 2007-10-11
Also published as: KR20070060491A

Abstract

The present invention relates to a method and apparatus for handling exception situations in an interactive voice interface system. The invention extracts exceptional user utterances and speech recognition errors from a spoken dialog corpus, stores those that are to be handled as exceptions in databases (DBs), and then writes exception resolution information and rules for resolving them. When an exceptional utterance is input to the running system, the exception is handled using the pre-built DBs, information, and rules, with the aim of improving user satisfaction and system performance.

Interactive voice interface system, exception handling, dialog modeling

Description

Method and apparatus for exceptional case handling in spoken dialog system

FIG. 1 is a block diagram of an interactive voice interface device according to the present invention.

FIG. 2 is a detailed block diagram of the exception handling unit in the interactive voice interface system according to the present invention.

FIG. 3 is a flowchart of an exception handling method according to the present invention.

* Explanation of reference numerals for the main parts of the drawings *

100: interactive voice interface device
110: speech recognition unit
120: dialog processing unit
121: exception handling unit
123: input sentence analysis unit
125: dialog management unit
127: task management unit
129: output sentence generation unit
130: speech synthesis unit
200: spoken dialog corpus
210: exception corpus
220: exception handling target DB
221: unsupported-service expression DB
223: ambiguous expression DB
225: speech recognition error DB
227: utterance error determination DB
230: error resolution processing unit

The present invention relates to spoken dialog processing technology used in interactive voice interface systems, and in particular to a technique for coping with exception situations that arise during a dialog.

In an interactive voice interface system, exception situations arise from speech recognition errors and from errors in processing the recognition result that stem from the user's insufficient understanding of the system. They disrupt the normal flow of the dialog and prevent the system from achieving its intended purpose.

To resolve such exception situations, the prior art has focused mainly on resolving speech recognition errors. That is, it uses the language-model confidence, acoustic-model confidence, or parsing confidence of the recognizer's result and, when the value is at or below a threshold, asks the user to repeat the utterance. In some cases, additional attempts have been made to resolve errors, for example by using various kinds of information to repair the segment where an error is suspected, or by changing the current position in the dialog flow.

However, this approach cannot detect an error when the confidence of the recognition result is above the threshold. Moreover, error handling is carried out on the basic assumption that the user's utterance conforms to what the system expects. Consequently, where exceptions caused by the user that the system did not anticipate occur frequently, as in a user-initiative system in particular, errors still remain in the result even after error handling has been performed.
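The threshold-based approach and its limitation can be illustrated with a minimal sketch. The class name, the 0 to 1 score scale, and the threshold value of 0.5 are assumptions made for illustration; the patent does not specify them.

```python
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    text: str
    lm_confidence: float  # language-model confidence (0..1 scale is an assumption)
    am_confidence: float  # acoustic-model confidence (0..1 scale is an assumption)


def needs_reprompt(result: RecognitionResult, threshold: float = 0.5) -> bool:
    """Ask the user to repeat when the weakest score is at or below the threshold."""
    return min(result.lm_confidence, result.am_confidence) <= threshold


# A low-confidence result triggers a re-utterance request.
noisy = RecognitionResult("what about jeju", 0.9, 0.3)
# A confidently wrong result passes unchecked.
confident_but_wrong = RecognitionResult("snow in jeju", 0.9, 0.9)
```

Here `needs_reprompt(noisy)` is true, while `needs_reprompt(confident_but_wrong)` is false, so a misrecognition with high confidence is never caught by the threshold alone.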

Interactive voice interface systems have not yet been fully put to practical use for the general public, and they have many limitations arising from speech recognition performance and restricted application domains, so it is very difficult for ordinary users to use them naturally. Nevertheless, the conventional methods apply only countermeasures against speech recognition errors, without giving general consideration to the cases in which exception situations occur, and therefore cannot effectively handle the user-caused exceptions that frequently occur in real interactive voice interface systems.

Accordingly, the present invention has been devised to solve the above problems. Its object is to provide an exception handling method and apparatus that handle exception situations using pre-built DBs together with exception resolution information and rules, so that the dialog flow proceeds in a way that resolves the exception naturally and the system can achieve its goal.

To achieve the above object, an exception handling method is characterized by comprising the steps of: (a) building a plurality of DBs for exception handling; (b) writing exception resolution information and rules and storing them in the plurality of DBs; and (c) when an expression corresponding to an exception situation is uttered by the user, handling the exception using the exception information and rules thus written.

Preferably, step (a) comprises: building an exception corpus, which indicates exception situations and speech recognition errors, from a spoken dialog corpus made up of speech files and transcripts collected through simulated dialogs and the speech recognition result text for those speech files; and building, on the basis of the exception corpus, at least one of an unsupported-service expression DB, an ambiguous expression DB, a speech recognition error DB, and an utterance error determination DB needed for exception handling.

Preferably, step (b) generates and stores exception resolution information and rules for each exception situation stored in the unsupported-service expression DB, the ambiguous expression DB, the speech recognition error DB, and the utterance error determination DB.

Preferably, step (c) comprises: when an unsupported-service expression, that is, a word or sentence that the system cannot provide, is uttered, generating the words and characters that the system to be serviced can provide; and presenting the supportable services using the generated words and characters.

Preferably, step (c) comprises: when an utterance containing an ambiguous expression, that is, an ambiguous word or sentence, is made, determining whether the ambiguity can be resolved using the dialog history; if it can, resolving the ambiguity using the history; and if it cannot, resolving the ambiguity through an ambiguity-resolution subdialog.

Preferably, step (c) comprises: when a speech recognition error exception caused by a frequently misrecognized word, sentence, or pattern is found, determining whether the recognition error can be resolved using an error resolution table; if it can, resolving the recognition error through the table; and if it cannot, resolving the ambiguity through a recognition-error-resolution subdialog or switching to a multimodal input device.

Preferably, in step (c), when an utterance error is found, that is, when a sentence appears that has the same or similar words in a similar word order as the previous sentence so that the user's response is judged to be an error, the error is resolved through an utterance-error-resolution subdialog.

To achieve the above object, an exception handling apparatus is provided in an interactive voice interface device that comprises a speech recognition unit that converts the user's utterance into text, a dialog processing unit that receives the converted text and generates the output sentence needed for speech synthesis, and a speech synthesis unit that generates synthesized speech from the generated output sentence. The apparatus is characterized by comprising: a spoken dialog corpus composed of speech files and transcripts collected through simulated dialogs, supplemented with the speech recognition result text that the speech recognition unit produces from one or more of the collected speech files; an exception corpus, corresponding to the spoken dialog corpus, made up of exception situations and speech recognition errors; an exception handling target DB in which the specific information on the exception situations is organized into DBs using the exception corpus; and an error resolution processing unit that writes error resolution information and rules for each expression using the information in the exception handling target DB.

Preferably, the spoken dialog corpus is built through simulated dialogs using the Wizard-of-Oz (WOZ) method.

Preferably, the exception handling target DB comprises: an unsupported-service expression DB, which collects words and sentences that users frequently utter intuitively but that the system to be serviced cannot provide; an ambiguous expression DB, which collects ambiguous words and sentences; a speech recognition error DB, which collects frequently misrecognized words, sentences, and patterns; and an utterance error determination DB, which collects rules by which a user's response can be judged to be an error, for example when a sentence appears that has the same or similar words in a similar word order as the previous sentence.

Other objects, features, and advantages of the present invention will become apparent from the detailed description of the embodiments with reference to the accompanying drawings.

A preferred embodiment of the exception handling method in an interactive voice interface system according to the present invention is described below with reference to the accompanying drawings.

FIG. 1 is a block diagram of an interactive voice interface device according to the present invention.

As shown in FIG. 1, the interactive voice interface device 100 comprises: a speech recognition unit 110 that converts the user's utterance into text; a dialog processing unit 120 that receives the converted text and, using the pre-built exception handling target DB and the exception resolution information and rules for it, generates the output sentence needed for speech synthesis with the exception handled; and a speech synthesis unit 130 that generates synthesized speech from the output sentence that the dialog processing unit produces through exception handling.

The dialog processing unit 120 comprises: an exception handling unit 121 that extracts exceptional user utterances and speech recognition errors and handles them using the pre-built exception handling target DB and the exception resolution information and rules; an input sentence analysis unit 123 that determines the user's intent from the text output by the exception handling; a dialog management unit 125 and a task management unit 127 that determine where in the dialog flow the current utterance lies; and an output sentence generation unit 129 that generates the output sentence needed for speech synthesis.
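The order in which the units of FIG. 1 act on one user turn can be sketched as a simple function pipeline. This is only an illustration: the function name `run_turn` and the callable signatures are hypothetical, and each unit is passed in as a plain function rather than modelled in detail.

```python
from typing import Callable


def run_turn(audio: bytes,
             recognize: Callable[[bytes], str],
             handle_exceptions: Callable[[str], str],
             analyze: Callable[[str], dict],
             manage: Callable[[dict], dict],
             generate: Callable[[dict], str],
             synthesize: Callable[[str], bytes]) -> bytes:
    text = recognize(audio)          # speech recognition unit 110
    text = handle_exceptions(text)   # exception handling unit 121
    intent = analyze(text)           # input sentence analysis unit 123
    state = manage(intent)           # dialog/task management units 125 and 127
    sentence = generate(state)       # output sentence generation unit 129
    return synthesize(sentence)      # speech synthesis unit 130


# Toy stand-ins for the units, used only to show the data flow.
reply = run_turn(
    b" weather ",
    recognize=lambda a: a.decode(),
    handle_exceptions=lambda t: t.strip(),
    analyze=lambda t: {"query": t},
    manage=lambda i: {**i, "answer": "sunny"},
    generate=lambda s: f"{s['query']} -> {s['answer']}",
    synthesize=lambda s: s.encode(),
)
```

The point of the sketch is that exception handling runs on the recognized text before intent analysis, so later units see an already corrected input.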

FIG. 2 is a detailed block diagram of the exception handling unit in the interactive voice interface system according to the present invention.

As shown in FIG. 2, the exception handling unit comprises: a spoken dialog corpus 200 composed of the speech files and transcripts collected through simulated dialogs, supplemented with the speech recognition result text that the speech recognition unit 110 produces from the collected speech files; an exception corpus 210, corresponding to the spoken dialog corpus 200, made up of exception situations and speech recognition errors; an exception handling target DB 220 in which the specific information on the exception situations is organized into DBs using the exception corpus 210; and an error resolution processing unit 230 that writes error resolution information and rules for each expression using the information in the exception handling target DB 220.

The spoken dialog corpus 200 is preferably built through simulated dialogs, for example with the Wizard-of-Oz (WOZ) method.

The exception handling target DB 220 comprises: an unsupported-service expression DB 221, which collects words and sentences that users frequently utter intuitively but that the system to be serviced cannot provide; an ambiguous expression DB 223, which collects ambiguous words and sentences; a speech recognition error DB 225, which collects frequently misrecognized words, sentences, and patterns; and an utterance error determination DB 227, which collects rules by which a user's response can be judged to be an error, for example when a sentence appears that has the same or similar words in a similar word order as the previous sentence.

Other kinds of DBs may also be added to the exception handling target DB 220 as needed.
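One possible in-memory layout for the exception handling target DB 220 is sketched below. The class and field names are hypothetical, and exact string lookup is a simplifying assumption; the patent does not prescribe a storage format or a matching method.

```python
from dataclasses import dataclass, field


@dataclass
class ExceptionTargetDB:
    unsupported: set = field(default_factory=set)               # DB 221
    ambiguous: set = field(default_factory=set)                 # DB 223
    recognition_errors: set = field(default_factory=set)        # DB 225
    utterance_error_rules: list = field(default_factory=list)   # DB 227 (rules, not expressions)
    extra: dict = field(default_factory=dict)                   # further DBs: name -> set

    def matching_dbs(self, text: str) -> list:
        """Return the names of the expression DBs that contain the text."""
        tables = {
            "unsupported": self.unsupported,
            "ambiguous": self.ambiguous,
            "recognition_errors": self.recognition_errors,
            **self.extra,
        }
        return [name for name, entries in tables.items() if text in entries]


db = ExceptionTargetDB(
    unsupported={"can i wash my car"},
    ambiguous={"how about seoul"},
)
db.extra["greetings"] = {"hello"}  # an additional DB added as needed
```

The `extra` mapping reflects the statement that further DBs may be added without changing the four core ones.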

The operation of the exception handling apparatus configured as above, and of the interactive voice interface device using it, is described in detail below with reference to the accompanying drawings.

Referring to FIG. 2, the spoken dialog corpus 200 is first built from the speech files and transcripts collected through simulated dialogs. The speech recognition result text that the speech recognition unit 110 produces from the collected speech files is then added to the spoken dialog corpus 200, after which the exception corpus 210, made up of exception situations and speech recognition errors, is built from the spoken dialog corpus 200.

The dialogs for the spoken dialog corpus 200 are collected with the user given only the same information that will be available when using the system that is actually built later.

During collection, the dialog is conducted naturally at the performance level of the target system, while the user is at the same time steered so that exception situations can arise in the user's utterances.

For the user utterances collected in this way, the speech recognizer intended for the interactive voice interface system is used to obtain speech recognition results, which are stored together in the spoken dialog corpus 200. The spoken dialog corpus 200 is then annotated to distinguish normal cases, exceptions caused by the user, and recognition errors, and is rebuilt as the exception corpus 210.
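The annotation step can be pictured with the following sketch. It assumes, for illustration only, that a recognition error is flagged when the transcript and the recognizer output differ, and that user-caused exceptions are marked by a human annotator; the field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CorpusEntry:
    transcript: str        # human transcription of the recording
    asr_text: str          # recognizer output for the same recording
    user_exception: bool   # flag set by a human annotator (hypothetical field)


def label(entry: CorpusEntry) -> str:
    if entry.transcript != entry.asr_text:
        return "recognition_error"
    if entry.user_exception:
        return "user_exception"
    return "normal"


def build_exception_corpus(corpus: list) -> list:
    """Annotate every entry and keep only those that are not normal."""
    labelled = [(entry, label(entry)) for entry in corpus]
    return [(entry, tag) for entry, tag in labelled if tag != "normal"]


corpus = [
    CorpusEntry("tell me the weather in seoul", "tell me the weather in seoul", False),
    CorpusEntry("what about jeju", "snow in jeju", False),
    CorpusEntry("can i wash my car", "can i wash my car", True),
]
exception_corpus = build_exception_corpus(corpus)
```

Only the second and third entries end up in `exception_corpus`, tagged as a recognition error and a user exception respectively.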

The annotated exception corpus 210 is then used to build the various DBs for exception handling.

The DBs that can be built are: the unsupported-service expression DB 221, which collects words and sentences that users frequently utter intuitively but that the system to be serviced cannot provide; the ambiguous expression DB 223, which collects ambiguous words and sentences; the speech recognition error DB 225, which collects frequently misrecognized words, sentences, and patterns; and the utterance error determination DB 227, which collects rules by which a user's response can be judged to be an error, for example when a sentence appears that has the same or similar words in a similar word order as the previous sentence. Other kinds of DBs may be added as needed.

Once the exception handling target DB 220 has been built, exception resolution information and rules are written so that exceptions can be resolved using the DBs.

Specifically, with the unsupported-service expression DB 221, when a matching expression is input, the system is prepared to output a message that the service is not supported and to suggest related services that are supported, according to the type of the input word.
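A minimal sketch of this rule for a weather service follows. The DB entries, the word types, the suggested services, and the substring matching are all hypothetical choices made for illustration.

```python
# Hypothetical entries: unsupported expression -> word type.
UNSUPPORTED_DB = {
    "wash my car": "activity",
    "autumn leaves": "seasonal",
}
# Hypothetical mapping: word type -> related service the system does support.
RELATED_SERVICE = {
    "activity": "today's regional weather",
    "seasonal": "the weekly regional forecast",
}


def handle_unsupported(utterance: str):
    """Return a notice plus a suggestion, or None if nothing matches."""
    lowered = utterance.lower()
    for expression, word_type in UNSUPPORTED_DB.items():
        if expression in lowered:
            return ("That service is not supported. "
                    f"I can tell you {RELATED_SERVICE[word_type]} instead.")
    return None
```

Returning `None` for a non-matching utterance lets the caller pass the input on to the other exception checks or to normal dialog processing.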

With the ambiguous expression DB 223, where the ambiguity of an expression can be resolved using the dialog history, identifier information is attached to the expression so that the system can later resolve the ambiguity from the dialog history. Where this is not possible, a system-initiative error-resolution subdialog is set up for each such expression so that the ambiguity can be resolved.
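The history-first, subdialog-second rule can be sketched as follows. The request format (a dictionary with `region` and `info` keys) and the prompt wording are assumptions made for illustration.

```python
def resolve_ambiguity(request: dict, history: list):
    """Return (completed_request, None) or (None, subdialog_prompt)."""
    if request.get("info") is not None:
        return request, None  # nothing to resolve
    # Search the dialog history, most recent turn first.
    for past in reversed(history):
        if past.get("info") is not None:
            return {**request, "info": past["info"]}, None
    # No usable history: fall back to a system-initiative subdialog.
    return None, "Which information would you like for " + request["region"] + "?"


history = [{"region": "Busan", "info": "temperature"}]
ambiguous = {"region": "Seoul", "info": None}  # e.g. "Tell me about Seoul too"
```

With the history above, the Seoul request inherits `temperature` from the previous turn; with an empty history the function returns a clarification prompt instead.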

Next, with the speech recognition error DB 225, where a misrecognized result is a semantically impossible expression and the same misrecognition always occurs in the same situation, an error resolution table is written so that the result can be corrected automatically. Otherwise, as above, a subdialog is set up, or, where a multimodal input device is available, the system is prepared to use it to resolve the error. As an extension, when an expression in the speech recognition error DB is expected to be a likely answer, rules can also be written to ask the user in advance to speak in a way that improves recognition, or, if a multimodal input device is available, to switch to it beforehand, thereby preventing the recognition error at an early stage.
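A sketch of an error resolution table with its two fallbacks is given below. The table key (dialog context plus misrecognized text), the context label, and the preference for multimodal input over a subdialog when both are possible are assumptions; the text only says that either may be used.

```python
# Hypothetical table: (dialog context, misrecognized text) -> corrected text.
ERROR_RESOLUTION_TABLE = {
    ("after_weather_answer", "제주도의 눈"): "제주도는?",
}


def resolve_recognition_error(context, asr_text, multimodal_available=False):
    """Return (action, text) for a suspected recognition error."""
    corrected = ERROR_RESOLUTION_TABLE.get((context, asr_text))
    if corrected is not None:
        return "auto_correct", corrected
    if multimodal_available:
        return "switch_to_multimodal", asr_text
    return "subdialog", asr_text
```

Keying the table on the context as well as the text mirrors the condition that automatic correction applies only when the same misrecognition occurs in the same situation.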

With the utterance error determination DB 227, the system determines whether the user's utterance is an error under the utterance error determination rules and, where the utterance is likely to be an error, is prepared to open a subdialog that can resolve it.

The exception resolution information and rules described so far can be written in many ways depending on the system to which they are applied. To aid understanding, Table 1 shows representative examples of the errors that may appear in an interactive voice interface system providing weather information, and of how to resolve them.

Table 1 (for each DB: exception utterance, exception content, exception resolution method)

Unsupported-service expression DB
Exception utterance: "Is it OK to wash my car?" / "Will it be hot this summer?" / "When do the autumn leaves start on Mt. Seorak?"
Exception content: The user requests weather information that requires complex inference, whereas the system can provide only simple weather information.
Resolution method: For frequently occurring unsupported utterances, make them recognizable by the speech recognizer; when such an utterance is input, resolve the exception by announcing the regional weather and weather conditions that the system can provide.

Ambiguous expression DB (example 1)
Exception utterance: "Tell me about Seoul too." / "How about Seoul?"
Exception content: From the current sentence alone it is impossible to tell what about Seoul is being asked.
Resolution method: Such expressions are possible only when a preceding information request exists, so identifier information is attached in advance; when an expression carrying the identifier is input, the system resolves the ambiguity using the dialog history.

Ambiguous expression DB (example 2)
Exception utterance: "What is today's expected temperature?"
Exception content: It is impossible to tell whether the expected temperature means the expected low or the expected high.
Resolution method: When such an utterance is input without additional information that could resolve the ambiguity, open a subdialog that determines whether the expected low or the expected high is being asked for.

Speech recognition error DB
Exception utterance: "제주도의 눈" ("snow in Jeju Island")
Exception content: The speaker intended "제주도는?" ("What about Jeju Island?"), but the utterance was misrecognized because of phonological similarity.
Resolution method: When an expression that is frequently misrecognized is input: if "제주도 눈" is always misrecognized in the same way and the misrecognized utterance carries no meaning, automatically correct it to "제주도는?"; otherwise resolve the ambiguity by opening a subdialog.

Utterance error determination DB
Exception utterance: (after "Tell me today's temperature") "I said, tell me today's temperature."
Exception content: The user repeats the previous utterance in the same or similar wording even though the system has already responded.
Resolution method: Use the second-best recognition result to ask whether that is what the user meant, or open a subdialog such as restarting the dialog from the previous step. If a multimodal input device is available, switch to it; if not, resolve the exception by, for example, asking the user to rephrase the request.

FIG. 3 is a flowchart of an exception handling method according to the present invention. It is described below as applied to the exception handling target DB 220 and the error resolution information and rules prepared as in Table 1.

First, the speech recognition unit 110 recognizes the user's speech by converting the utterance into text (S10).

Next, before the user's utterance is processed further, the exception handling unit 121 checks whether the content input as the speech recognition result contains an expression that belongs to any of the pre-built exception handling target DBs 220 (S20).

If a matching expression is found, the error is resolved according to the error resolution information and rules written for that expression in the corresponding exception DB.

That is, if the check (S20) identifies the input as an unsupported-service utterance, caused by words and characters that the system to be serviced cannot provide (S30), the system generates the unsupported-service message and suggests the supportable services (S40).

If the check (S20) identifies the input as an utterance containing an ambiguous expression, caused by ambiguous words or sentences (S50), the system determines whether the ambiguity can be resolved using the dialog history (S60). If it can, the ambiguity is resolved using the history (S70); if it cannot, the ambiguity is resolved through an ambiguity-resolution subdialog (S80).

If the check (S20) identifies the input as an utterance in which a speech recognition error is expected, caused by a frequently misrecognized word, sentence, or pattern (S90), the system determines whether the recognition error can be resolved using the error resolution table (S100). If it can, the recognition error is resolved through the table (S110); if it cannot, the system resolves the ambiguity through a recognition-error-resolution subdialog or switches to a multimodal input device (S120).

In addition, if the check (S20) determines that the user's response is an utterance error, as when a sentence appears that has a word order similar to that of the previous sentence and is composed of the same or similar words (S130), the system resolves the error through a sub-dialogue for utterance error resolution (S140). The resolved case is then added to the utterance error determination DB so that, when the same utterance is later input in the same situation, the error can be resolved without starting a sub-dialogue (S150).
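Steps S130 to S150 can be sketched in Python as below. Measuring "same or similar words in a similar word order" with `difflib.SequenceMatcher` over word lists, the 0.8 threshold, and keying the stored cases by a (situation, utterance) pair are all assumptions chosen for illustration; the `ask_user` callback stands in for the sub-dialogue.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Optional, Tuple

SIMILARITY_THRESHOLD = 0.8  # assumed value, not taken from the patent

# Hypothetical utterance error determination DB: (situation, utterance) -> resolution.
resolved_cases: Dict[Tuple[str, str], str] = {}


def is_repeated(previous: str, current: str) -> bool:
    """S130: same or similar words in a similar order as the previous sentence."""
    ratio = SequenceMatcher(None, previous.lower().split(), current.lower().split()).ratio()
    return ratio >= SIMILARITY_THRESHOLD


def handle_utterance_error(situation: str, previous: str, current: str,
                           ask_user: Callable[[str], str]) -> Optional[str]:
    """Return a resolution, or None when the utterance is not judged an error."""
    if not is_repeated(previous, current):
        return None
    key = (situation, current.lower())
    if key in resolved_cases:
        return resolved_cases[key]   # reuse a stored case, no sub-dialogue needed
    resolution = ask_user(current)   # S140: resolve through a sub-dialogue
    resolved_cases[key] = resolution  # S150: store the case for later reuse
    return resolution
```

The second time the same utterance arrives in the same situation, the stored resolution is returned and `ask_user` is not called again.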

Dialogue processing then proceeds according to the result of the exception situation processing unit (121) to generate an output sentence (S160), and the speech synthesis unit (120) generates synthesized speech based on this output sentence (S170).
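The overall flow from the check (S20) to the output sentence (S160) could be organised as a simple dispatcher, sketched here in Python. The handler signature, the order in which handlers are tried, and the `normal_dialogue` callback are assumptions; the text does not state an evaluation order for the four exception types.

```python
from typing import Callable, List, Optional

# Each hypothetical handler returns an output sentence, or None if its
# exception type does not apply to the utterance.
Handler = Callable[[str], Optional[str]]


def process_utterance(text: str, handlers: List[Handler],
                      normal_dialogue: Callable[[str], str]) -> str:
    """Try each exception handler in turn, then fall back to normal processing."""
    for handler in handlers:
        result = handler(text)
        if result is not None:
            return result             # S160: output sentence from exception handling
    return normal_dialogue(text)      # no exception detected
```

The returned sentence is what would be passed to speech synthesis (S170).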

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to these embodiments. Anyone with ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the invention as claimed, and such modifications fall within the scope of the claims. The true scope of technical protection of the present invention should therefore be determined by the technical spirit of the appended claims.

As described above, the method and apparatus for handling exception situations in an interactive voice interface system according to the present invention handle exception situations more effectively than conventional methods, and thereby improve user satisfaction with the system and raise the system's task success rate.

Claims

delete

An exception situation handling method comprising:

(a) constructing an exception situation corpus, in which exception situations and speech recognition errors are marked, from a speech dialogue corpus consisting of speech files collected by a simulated dialogue method, transcription text of the speech files, and speech recognition result text for the speech files;

(b) constructing, based on the constructed exception situation corpus, at least one of a service-unsupported expression DB, an ambiguous expression DB, a speech recognition error DB, and an utterance error determination DB required for exception handling;

(c) generating and storing exception resolution information and rules in the constructed DBs; and

(d) when the user utters an expression corresponding to an exception situation, handling the exception situation using the generated exception resolution information and rules.

The method of claim 2, wherein step (c) comprises:

generating and storing the exception resolution information and rules for each exception situation stored in the service-unsupported expression DB, the ambiguous expression DB, the speech recognition error DB, and the utterance error determination DB, respectively.

The method of claim 2, wherein step (d) comprises:

generating words and characters that the system can provide, when a service-unsupported expression, caused by words and sentences that the system to be deployed cannot provide, is uttered; and

presenting a supportable service using the generated words and characters.

The method of claim 2, wherein step (d) comprises:

determining, when an ambiguous expression containing ambiguous words or sentences is uttered, whether the ambiguity can be resolved using the dialogue history;

resolving the ambiguity when the determination shows that it can be resolved using the history; and

resolving the ambiguity through a sub-dialogue for ambiguity resolution when the determination shows that it cannot be resolved using the history.

The method of claim 2, wherein step (d) comprises:

determining, when an expression in which a speech recognition error is expected, caused by frequently misrecognized words, sentences, and patterns, is uttered, whether the speech recognition error can be resolved using an error resolution table;

resolving the speech recognition error using the error resolution table when the determination shows that it can be resolved using the table; and

resolving the ambiguity through a sub-dialogue for recognition error resolution, or switching to a multimodal input device, when the determination shows that the speech recognition error cannot be resolved using the table.

The method of claim 2, wherein step (d) comprises:

resolving the error through a sub-dialogue for utterance error resolution, when the user's response is determined to be an utterance error because it is a sentence of similar word order composed of the same or similar words as the previous sentence.

An interactive voice interface apparatus including a speech recognition unit that converts a user's speech into text, a dialogue processing unit that receives the converted text and generates an output sentence for speech synthesis, and a speech synthesis unit that generates synthesized speech based on the generated output sentence, the apparatus comprising:

a speech dialogue corpus built from one or more of speech files collected by a simulated dialogue method and transcriptions of the speech, to which the speech recognition result text obtained by converting the collected speech files with the speech recognition unit is added;

an exception situation corpus in which exception situations and speech recognition errors are marked in correspondence with the speech dialogue corpus;

an exception situation processing target DB that holds, as a database, specific information corresponding to the exception situations, built using the exception situation corpus; and

an error resolution processing unit that creates error resolution information and rules for each expression using the information in the exception situation processing target DB.

The apparatus of claim 8,

wherein the speech dialogue corpus is constructed by a simulated dialogue method of the Wizard-of-Oz (WOZ) type.

The apparatus of claim 8, wherein the exception situation processing target DB comprises:

a service-unsupported expression DB consisting of words and sentences that users frequently utter but that the system to be deployed cannot provide;

an ambiguous expression DB consisting of ambiguous words or sentences;

a speech recognition error DB consisting of frequently misrecognized words, sentences, and patterns; and

an utterance error determination DB consisting of rules for judging an input sentence to be an error when the user's response is a sentence of similar word order composed of the same or similar words as the previous sentence.