KR100768731B1

KR100768731B1 - A VoiceXML Dialogue apparatus based on Speech Act for Controlling Dialogue Flow and method of the same

Info

Publication number: KR100768731B1
Application number: KR1020060059135A
Authority: KR
Inventors: 박경현; 김상훈
Original assignee: 한국전자통신연구원
Priority date: 2005-12-05
Filing date: 2006-06-29
Publication date: 2007-10-19
Also published as: KR20070058952A

Abstract

The present invention relates to the field of spoken dialogue interfaces, and more particularly to a speech-act-based VoiceXML (Voice Extensible Markup Language) dialogue apparatus and method for controlling dialogue flow. The apparatus comprises a dialogue manager that handles dialogue management according to the content of the speaker's input utterance, and a VoiceXML module that controls the dialogue flow of the dialogue manager based on speech act information extracted from the dialogue content through DDML (Dialogue Description Markup Language).

DDML, VoiceXML

Description

A VoiceXML Dialogue apparatus based on Speech Act for Controlling Dialogue Flow and method of the same

FIG. 1 is a diagram showing an example scenario for the weather search dialogue domain in a conventional VoiceXML dialogue system.

FIG. 2 is a diagram showing the configuration of a speech-act-based VoiceXML dialogue apparatus for dialogue flow control according to the present invention.

FIG. 3 shows an embodiment of part of a DDML document in the configuration of the VoiceXML dialogue apparatus according to the present invention.

FIG. 4 shows an embodiment of the DTD of DDML in the configuration of the VoiceXML dialogue apparatus according to the present invention.

FIG. 5 is a flowchart of a speech-act-based VoiceXML dialogue method for dialogue flow control according to the present invention.

FIG. 6 shows an embodiment of part of a DDML document reflecting multiple dialogue flows in the VoiceXML dialogue method according to the present invention.

FIG. 7 shows an embodiment of part of a speech-act-based VoiceXML document in the VoiceXML dialogue method according to the present invention.

FIG. 8 shows an embodiment of the speech-act-based VoiceXML dialogue method for dialogue flow control according to the present invention.

* Explanation of reference numerals for main parts of the drawings *

100: VoiceXML dialogue unit; 110: dialogue manager

112: speech recognizer; 114: dialogue management unit

116: speech synthesizer; 120: VoiceXML interpreter

200: offline unit; 210: web server

212: VoiceXML document; 220: DDML document

230: dialogue scenario; 240: DDML2VoiceXML

250: Scenario2DDML

The present invention relates to the field of spoken dialogue interfaces, and more particularly to a speech-act-based VoiceXML (Voice Extensible Markup Language) dialogue apparatus and method for dialogue flow control.

VoiceXML is a voice-based Internet standard language. Just as an Internet service implements a web-based service scenario in HTML and delivers it on a PC, a VoiceXML service implements the scenario in VoiceXML on a VoiceXML platform, which corresponds to an interactive voice response (IVR) system, and delivers it over telephone speech.

VoiceXML is thus essentially a markup language that allows data on the web to be controlled by telephone speech, and it is currently used in dialogue systems. From the standpoint of dialogue management, because VoiceXML manages the dialogue flow, it gives developers control over the dialogue flow, which earlier dialogue systems did not allow.

However, because VoiceXML describes a dialogue scenario in terms of the speaker's utterances, it is constrained in how it can describe the dialogue flow.

For example, if a VoiceXML document is written to advance to the next dialogue turn on the utterance "Hello", then when the speaker says "Hi", which carries the same intent in different words, VoiceXML does not advance the dialogue.

To aid understanding, an example is described below with reference to the drawings.

FIG. 1 is a diagram showing an example scenario for the weather search dialogue domain in a conventional VoiceXML dialogue system.

Referring to FIG. 1, the user first calls a robot named "Sunny", and the robot answers the call with "What can I do for you?" The user then requests a weather search by asking "How is the weather in Daejeon today?", and the robot looks up the weather and replies "The weather in the Daejeon area is clear today."

In this scenario, however, the VoiceXML system provides weather information only when the user asks exactly "How is the weather in Daejeon today?" It cannot provide weather information for other queries such as "Tell me today's weather in Daejeon" or "Is the weather in Daejeon nice today?"

The reason is that the VoiceXML document describes the dialogue content itself, "How is the weather in Daejeon today?" That is, if the user's input does not match the dialogue content stored in the VoiceXML document, the VoiceXML dialogue system does not advance the dialogue any further.
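The limitation described above can be illustrated with a minimal sketch. It assumes a conventional system reduced to a lookup table keyed on the literal wording of the user's turn; the table `SCRIPTED_REPLIES` and the function `reply_to` are hypothetical names chosen for this illustration and are not part of VoiceXML.

```python
from typing import Optional

# Hypothetical stand-in for a conventional, utterance-based VoiceXML document:
# the FIG. 1 scenario is stored as the literal wording of the user's turn.
SCRIPTED_REPLIES = {
    "How is the weather in Daejeon today?": "The weather in the Daejeon area is clear today.",
}


def reply_to(utterance: str) -> Optional[str]:
    """Return the scripted reply only if the utterance matches verbatim."""
    return SCRIPTED_REPLIES.get(utterance)
```

Under this assumption, only the exact scripted question receives a reply; a rephrased query with the same intent returns `None`, which corresponds to the dialogue not advancing.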

Existing VoiceXML systems have been applied in many commercial systems because a developer can write a dialogue scenario and easily apply it to a dialogue system without knowing the system's internal structure. However, because VoiceXML describes the dialogue content directly, it has the drawback of imposing many constraints on dialogue processing.

As a result, commercialization is currently limited to domains that can be handled by system-initiative dialogue, such as various guidance and reservation systems.

As described above, the VoiceXML (Voice Extensible Markup Language) dialogue system according to the related art has the following problems.

First, because the conventional VoiceXML dialogue system defines the dialogue flow in terms of the speaker's utterances, it cannot offer flexibility in conducting a dialogue.

Second, because the dialogue flow is defined in advance in the conventional VoiceXML dialogue system, it is not easy to change the dialogue domain or the dialogue flow within a domain.

Accordingly, the present invention has been devised to solve the above problems, and its object is to provide a VoiceXML dialogue apparatus and method that can control the dialogue flow freely by applying VoiceXML together with DDML (Dialogue Description Markup Language).

Another object of the present invention is to provide a VoiceXML dialogue apparatus and method in which the dialogue flow is easy to modify and change, through VoiceXML converted from DDML that is written in terms of speech acts.

Still another object of the present invention is to let developers manage the dialogue flow more flexibly by configuring dialogue management and dialogue flow management independently of each other.

To achieve the above objects, the speech-act-based VoiceXML dialogue apparatus for dialogue flow control according to the present invention comprises a dialogue manager that handles dialogue management according to the content of the speaker's input utterance, and a VoiceXML module that controls the dialogue flow of the dialogue manager based on speech act information extracted from the dialogue content through DDML (Dialogue Description Markup Language). The dialogue manager further comprises a speech recognizer that recognizes the speaker's speech and passes on the recognition result; a dialogue management unit that analyzes the speech recognition result to extract a speech act and generates a sentence for synthesis based on the extracted speech act and the VoiceXML document loaded from a web server through the VoiceXML module; and a speech synthesizer that synthesizes the sentence generated by the dialogue management unit and responds to the speaker.

(Deleted)

Preferably, the VoiceXML module comprises: Scenario2DDML, which, in correspondence with the speech act information extracted from the dialogue content by the dialogue manager, generates offline a DDML document from a dialogue scenario extracted from the dialogue DB of a specific domain; DDML2VoiceXML, which converts the generated DDML document into a speech-act-based VoiceXML document and stores it on a web server; and a VoiceXML interpreter that, based on the speech act passed from the dialogue manager, loads and processes the VoiceXML document stored on the web server and passes the resulting VoiceXML markup, in speech act form, to the dialogue manager.

Preferably, the apparatus further comprises a DDML editor for supplementing the generated DDML document when it deviates from the dialogue flow required by the developer or when finer control of the dialogue flow is needed.

Preferably, the DDML organizes each dialogue turn into state units, each including an utterance object (<object>), a speech act (<action>), and a target (<target>).

To achieve the above objects, the speech-act-based VoiceXML dialogue method for dialogue flow control according to the present invention comprises the steps of: (a) when the speaker speaks, recognizing the speaker's speech and outputting the recognition result; (b) analyzing the speech recognition result to extract speech act information from the dialogue content; (c) extracting, offline, a dialogue scenario from the dialogue DB of a specific domain in correspondence with the speech act information extracted from the dialogue content; (d) extracting speech acts from the extracted dialogue scenario and generating a DDML document, expressed in DDML, that reflects multiple dialogue flows; (e) converting the generated DDML document into a speech-act-based VoiceXML document and storing it on a web server; (f) processing the dialogue by loading the VoiceXML document stored on the web server and generating a response speech act based on the extracted speech act information; and (g) generating a response sentence based on the generated response speech act information, synthesizing it, and responding to the speaker.

(Deleted)

Preferably, step (d) includes the step of further supplementing the DDML when the expressed DDML is judged to deviate from the dialogue flow required by the developer or to need finer control of the dialogue flow.

Other objects, features, and advantages of the present invention will become apparent from the detailed description of the embodiments with reference to the accompanying drawings.

A preferred embodiment of the speech-act-based VoiceXML dialogue apparatus and method for dialogue flow control according to the present invention is described below with reference to the accompanying drawings.

FIG. 2 is a diagram showing the configuration of a speech-act-based VoiceXML dialogue apparatus for dialogue flow control according to the present invention.

As shown in FIG. 2, the VoiceXML dialogue apparatus is broadly divided into a VoiceXML dialogue unit (100) and an offline unit (200). The offline unit (200) is a block that supplies, offline, the information needed to control the dialogue flow; all run-time operation of the VoiceXML dialogue apparatus takes place in the VoiceXML dialogue unit (100).

The VoiceXML dialogue unit (100) consists of a dialogue manager (110), which handles dialogue management according to the dialogue content, including recognition of the speaker's input utterance, identification of intent, and output through synthesis, and a VoiceXML interpreter (120), which returns the response speech act corresponding to the speech act passed from the dialogue manager (110).

The dialogue manager (110) comprises a speech recognizer (112) that recognizes the speaker's speech; a dialogue management unit (114) that analyzes the speech recognition result to extract a speech act and generates a sentence for synthesis based on the extracted speech act and the VoiceXML document loaded from the web server through the VoiceXML interpreter (120); and a speech synthesizer (116) that synthesizes the sentence generated by the dialogue management unit (114) and responds to the speaker. Accordingly, the VoiceXML interpreter (120), based on the speech act passed from the dialogue management unit (114), loads and processes the VoiceXML document (212) stored on the web server (210) and passes the resulting VoiceXML markup, in speech act form, to the dialogue management unit (114).

The offline unit (200) consists of Scenario2DDML (240), which, in correspondence with the speech act information extracted from the dialogue content by the VoiceXML dialogue unit (100), generates offline a DDML document (220) from a dialogue scenario extracted from the dialogue DB of a specific domain, and DDML2VoiceXML (250), which converts the generated DDML document (220) into a speech-act-based VoiceXML document (212) and stores it on the web server (210).

If the generated DDML document (220) deviates from the dialogue flow required by the developer, or if finer control of the dialogue flow is needed, the DDML can be supplemented using a DDML editor (not shown).

FIG. 3 shows an embodiment of part of a DDML document in the configuration of the VoiceXML dialogue apparatus according to the present invention; it expresses a weather search scenario.

As shown in FIG. 3, DDML organizes each dialogue turn into state units, and each state in turn includes an utterance object (<object>), a speech act (<action>), and a target (<target>), together with other additional information.
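The state structure described above can be sketched as follows. This is only an illustration: FIG. 3 is not reproduced here, so the root element `ddml`, the `id` attribute, and the element values are assumptions; only the `<object>`, `<action>`, and `<target>` element names and the notion of a state come from the description.

```python
import xml.etree.ElementTree as ET

# Hypothetical DDML fragment. Only <object>, <action>, <target> and the idea of
# a state are taken from the description; everything else is assumed.
DDML_FRAGMENT = """
<ddml>
  <state id="s1">
    <object>user</object>
    <action>system_call</action>
    <target>system</target>
  </state>
  <state id="s2">
    <object>system</object>
    <action>call_response</action>
    <target>user</target>
  </state>
</ddml>
"""


def read_states(xml_text):
    """Parse the fragment and return one dictionary per <state> element."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": state.get("id"),
            "object": state.findtext("object"),
            "action": state.findtext("action"),
            "target": state.findtext("target"),
        }
        for state in root.iter("state")
    ]
```

Reading the fragment this way yields one record per state, each keyed by the three elements named in the description.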

DDML is thus a markup language for describing dialogue flow in terms of speech acts. Using DDML, a developer can automatically generate a DDML document (220) from a dialogue scenario (230) in accordance with the DDML DTD (document type definition) shown in FIG. 4. The developer can therefore easily change and modify the dialogue scenario using DDML in an offline manner.

The operation of the speech-act-based VoiceXML dialogue method for dialogue flow control according to the present invention, configured as above, is described in detail below with reference to the accompanying drawings.

FIG. 5 is a flowchart of the speech-act-based VoiceXML dialogue method for dialogue flow control according to the present invention.

Referring to FIG. 5, when the speaker speaks, the speech recognizer (112) first recognizes the speaker's speech and passes the recognition result to the dialogue management unit (114).

The dialogue management unit (114) then analyzes the speech recognition result to extract speech act information and passes the extracted speech act information to the VoiceXML interpreter (120).

The VoiceXML interpreter (120), which has loaded the VoiceXML document stored on the web server, then advances the dialogue based on the speech act received from the dialogue management unit (114) and the VoiceXML document, and passes the speech act information corresponding to the response sentence to the dialogue management unit (114).

The VoiceXML document (212) is generated in terms of speech acts through the following process.

First, a dialogue scenario is extracted from the dialogue DB of a specific domain.

Scenario2DDML (126) then extracts speech acts from the extracted dialogue scenario and, in accordance with the DDML DTD (Document Type Definition) shown in FIG. 4, generates a DDML document (220), expressed in DDML, that reflects multiple dialogue flows. If the expressed DDML deviates from the dialogue flow required by the developer, or if finer control of the dialogue flow is needed, the developer can supplement the DDML document.
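A minimal sketch of the scenario-to-DDML step might look like the following. It assumes the scenario is already available as (speaker, utterance) pairs and replaces speech act extraction with a hypothetical lookup table `ACT_OF`; the root element name `ddml` and the use of the speaker as the `<object>` value are also assumptions, since the DTD of FIG. 4 is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical scenario turns, assumed to have been pulled from a dialogue DB.
SCENARIO = [
    ("user", "Sunny"),
    ("system", "What can I do for you?"),
    ("user", "How is the weather in Daejeon today?"),
]

# Hypothetical utterance-to-speech-act table standing in for the analysis step.
# The act names appear in the description; how they are extracted does not.
ACT_OF = {
    "Sunny": "system_call",
    "What can I do for you?": "call_response",
    "How is the weather in Daejeon today?": "search_weather_date_place",
}


def scenario_to_ddml(scenario):
    """Sketch of Scenario2DDML: emit one <state> per scenario turn."""
    root = ET.Element("ddml")  # root element name is an assumption
    for speaker, utterance in scenario:
        state = ET.SubElement(root, "state")
        ET.SubElement(state, "object").text = speaker
        ET.SubElement(state, "action").text = ACT_OF[utterance]
    return ET.tostring(root, encoding="unicode")
```

The sketch stops at the DDML document; the subsequent DDML2VoiceXML conversion is not shown because the speech-act-based VoiceXML format of FIG. 7 is not available in the text.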

Modification of the DDML document is needed to make dialogue handling more flexible. Because a dialogue scenario is generally produced by extraction from a dialogue DB, it is likely to describe only the general dialogue flow. The DDML therefore has to be loaded and supplemented with exception handling, domain-dependent dialogue processing, and the like.

FIG. 6 shows an embodiment of part of a DDML document reflecting multiple dialogue flows.

The DDML document (220) generated in this way is converted by DDML2VoiceXML (124) into a speech-act-based VoiceXML document (212) and stored on the web server (210).

FIG. 7 shows an embodiment of part of the speech-act-based VoiceXML document.

DDML is a markup language for describing dialogue flow in terms of speech acts. Because VoiceXML is generated offline from the dialogue scenario by way of DDML, the developer can easily change and modify the dialogue scenario using DDML.

Continuing the description, the dialogue management unit (114) generates a response sentence based on the speech act information passed from the VoiceXML interpreter (120) and passes the generated response sentence to the speech synthesizer (116).

Finally, the speech synthesizer (116) synthesizes the response sentence and responds to the speaker.
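The run-time steps above can be summarized in a short sketch. All function and table names are hypothetical, speech recognition and synthesis are left out, and the toy analysis rule and the `inform_weather` response act are assumptions made for illustration; only `system_call`, `call_response`, and `search_weather_date_place` are named in the description.

```python
# Hypothetical speech-act-based "document": maps an incoming speech act to a
# response speech act. inform_weather is an assumed name for this sketch.
RESPONSE_ACTS = {
    "system_call": "call_response",
    "search_weather_date_place": "inform_weather",
}

# Hypothetical sentence templates used by the dialogue management unit (114).
SENTENCES = {
    "call_response": "What can I do for you?",
    "inform_weather": "The weather in the Daejeon area is clear today.",
}


def extract_speech_act(recognized_text):
    """Toy analysis step: any mention of weather counts as a weather search."""
    if "weather" in recognized_text.lower():
        return "search_weather_date_place"
    return "system_call"


def interpret(speech_act):
    """Stand-in for the VoiceXML interpreter (120)."""
    return RESPONSE_ACTS[speech_act]


def run_turn(recognized_text):
    """One pass through the steps of FIG. 5, ending just before synthesis."""
    act = extract_speech_act(recognized_text)
    response_act = interpret(act)
    return SENTENCES[response_act]
```

In this arrangement the interpreter only ever sees speech acts, never the wording of the utterance, which is the separation between dialogue management and dialogue flow management that the description aims for.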

FIG. 8 shows an embodiment of the speech-act-based VoiceXML dialogue method for dialogue flow control according to the present invention; it expresses, as speech acts, multiple dialogue flows on the topic of weather search.

Referring to FIG. 8, when the speaker first calls the robot, the dialogue manager (110) extracts the speech act information (system_call) from the dialogue content and passes it to the VoiceXML interpreter (120) (S100).

The VoiceXML interpreter (120) then returns the response speech act (call_response) to the dialogue manager (110) and waits for the speech act information of the next dialogue turn (S200).

When the speaker then says "Tell me today's weather in Daejeon", the dialogue manager again extracts the speech act corresponding to a weather search (search_weather_date_place) (S300).

In an actual dialogue, the flow branches into several dialogue flows depending on the user's response, so DDML must be able to describe multiple dialogue flows.

For example, in the weather search dialogue domain, the user may ask "How is the weather in Daejeon today?" as in FIG. 1, a weather query that includes both date and place (search_weather(date,place)). But the user may also ask "How is the weather today?", "How is the weather in Daejeon?", or "Can you tell me the weather?", which are, respectively, a weather query with the date only (search_weather(date)), with the place only (search_weather(place)), and with neither date nor place (search_weather).

Each of these utterances expresses the same intent, a weather search, but because the information contained in each utterance differs, each must follow a different dialogue flow.

Accordingly, to describe such multiple dialogue flows, DDML defines elements and attributes for dialogue branch control, such as if, switch, goto, and link, as shown in FIG. 8.
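The four query variants and the need to branch on them can be sketched as follows. The variant names follow the description; the branch targets returned by `next_step` (asking for a missing date or place) are a hypothetical policy chosen for illustration, since the description does not state what each flow does.

```python
def weather_act(date=None, place=None):
    """Name the search_weather variant by the slots the utterance filled."""
    slots = [name for name, value in (("date", date), ("place", place)) if value]
    return "search_weather" + ("(" + ",".join(slots) + ")" if slots else "")


def next_step(date=None, place=None):
    """Hypothetical branch: ask for whichever slot is still missing."""
    if date and place:
        return "answer"
    if date:
        return "ask_place"
    if place:
        return "ask_date"
    return "ask_date_and_place"
```

For instance, `weather_act(date="today")` gives `search_weather(date)`, and under the assumed policy `next_step(date="today")` routes to a flow that asks for the place.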

The VoiceXML interpreter (120) therefore loads the VoiceXML document converted from the DDML document in which the multiple dialogue flows are described, processes the corresponding dialogue, and returns the response speech act to the dialogue manager (110) (S400).

The dialogue management unit (114), having received the response speech act, generates the corresponding response sentence and passes it to the speech synthesizer (116).

Speech-act-based VoiceXML of this kind allows more flexible control of the dialogue flow than before. With existing VoiceXML, if the document contains "Tell me today's weather", the speaker must say exactly "Tell me today's weather" for the dialogue to advance. With speech acts, the speaker can use any of various utterances that share the same speech act, such as "How is the weather today?" or "Will the weather be nice again today?", so the dialogue flow can be controlled more flexibly and the system feels more approachable to the user.
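The effect can be illustrated with a small sketch in which differently worded utterances are reduced to one speech act before any flow decision is made. The regular-expression rule is a deliberately crude, hypothetical stand-in for the analysis performed by the dialogue management unit, which the description does not specify.

```python
import re

# Hypothetical rule: any utterance containing the word "weather" is treated as
# a weather search. A real analysis step would be far more elaborate.
WEATHER_PATTERN = re.compile(r"\bweather\b", re.IGNORECASE)

UTTERANCES = [
    "Tell me today's weather",
    "How is the weather today?",
    "Will the weather be nice again today?",
]


def speech_act_of(utterance):
    """Return the speech act for an utterance, or None if none is recognized."""
    return "search_weather" if WEATHER_PATTERN.search(utterance) else None
```

All three example utterances map to the same act, so a flow keyed on `search_weather` advances for each of them.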

The preferred embodiments of the present invention have been disclosed above through the detailed description and the drawings. The terms used are intended only to describe the invention, not to limit their meaning or to restrict the scope of the invention set forth in the claims. Those of ordinary skill in the art will therefore understand that various modifications and equivalent embodiments are possible. Accordingly, the true scope of technical protection of the invention should be determined by the technical spirit of the appended claims.

As described above, the speech-act-based VoiceXML dialogue apparatus and method for dialogue flow control according to the present invention have the following effects.

First, because the speech-act-based VoiceXML apparatus and method according to the present invention can handle dialogue domains that existing VoiceXML cannot, users feel more at ease in dialogue with the system, and the range of applications can extend well beyond the existing domains.

Second, because the present invention assigns only the management of the dialogue flow to VoiceXML, dialogue management and the management of the dialogue flow (dialogue scenario) are configured independently, which lets developers manage the dialogue flow more flexibly.

Claims

1. A speech-act-based VoiceXML dialogue apparatus comprising:

a dialogue manager that handles dialogue management according to the content of a speaker's input utterance; and

a VoiceXML module that controls the dialogue flow of the dialogue manager based on speech act information extracted from the dialogue content through DDML (Dialogue Description Markup Language),

wherein the dialogue manager further comprises a speech recognizer that recognizes the speaker's speech and passes on the recognition result, a dialogue management unit that analyzes the speech recognition result to extract a speech act and generates a sentence for synthesis based on the extracted speech act and a VoiceXML document loaded from a web server through the VoiceXML module, and a speech synthesizer that synthesizes the sentence generated by the dialogue management unit and responds to the speaker.

2. (Deleted)

3. The apparatus of claim 1, wherein the VoiceXML module comprises:

Scenario2DDML, which, in correspondence with the speech act information extracted from the dialogue content by the dialogue manager, generates offline a DDML document from a dialogue scenario extracted from the dialogue DB of a specific domain;

DDML2VoiceXML, which converts the generated DDML document into a speech-act-based VoiceXML document and stores it on a web server; and

a VoiceXML interpreter that, based on the speech act passed from the dialogue manager, loads and processes the VoiceXML document stored on the web server and passes the resulting VoiceXML markup, in speech act form, to the dialogue manager.

4. The apparatus of claim 3, further comprising a DDML editor for supplementing the generated DDML document when it deviates from the dialogue flow required by the developer or when finer control of the dialogue flow is needed.

5. The apparatus of claim 3, wherein the DDML organizes each dialogue turn into state units, each including an utterance object (<object>), a speech act (<action>), and a target (<target>).

6. A speech-act-based VoiceXML dialogue method comprising the steps of:

(a) when a speaker speaks, recognizing the speaker's speech and outputting the recognition result;

(b) analyzing the speech recognition result to extract speech act information from the dialogue content;

(c) extracting, offline, a dialogue scenario from the dialogue DB of a specific domain in correspondence with the speech act information extracted from the dialogue content;

(d) extracting speech acts from the extracted dialogue scenario and generating a DDML document, expressed in DDML, that reflects multiple dialogue flows;

(e) converting the generated DDML document into a speech-act-based VoiceXML document and storing it on a web server;

(f) processing the dialogue by loading the VoiceXML document stored on the web server and generating a response speech act based on the extracted speech act information; and

(g) generating a response sentence based on the generated response speech act information, synthesizing it, and responding to the speaker.

7. (Deleted)

8. The method of claim 6, wherein step (d) further comprises the step of supplementing the DDML when the expressed DDML is judged to deviate from the dialogue flow required by the developer or to need finer control of the dialogue flow.