KR102673867B1

KR102673867B1 - Apparatus and method for providing responsive conversation corpus

Info

Publication number: KR102673867B1
Application number: KR1020200017781A
Authority: KR
Inventors: 윤정민; 장두성; 최혜진
Original assignee: 주식회사 케이티
Priority date: 2020-02-13
Filing date: 2020-02-13
Publication date: 2024-06-07
Also published as: KR20210103257A

Abstract

An apparatus and method for providing a backchannel (responsive) conversation corpus are disclosed.
An apparatus for providing a backchannel conversation corpus according to an embodiment of the present invention operates in conjunction with an example-based conversation system that outputs a response to a user's utterance based on a conversation example corpus, and includes: a conversation example detection unit that detects, in a database of the example-based conversation system storing the conversation example corpus, a conversation example in which an utterance is mapped to its corresponding response; a backchannel conversation corpus generation unit that extracts a backchannel response from the response of the detected conversation example and generates a backchannel conversation corpus by mapping the utterance of the detected conversation example to the extracted backchannel response; and a backchannel conversation corpus storage unit that stores the backchannel conversation corpus.

Description

Apparatus and method for providing responsive conversation corpus

The present invention relates to an apparatus and method for providing a backchannel conversation corpus, and more specifically to an apparatus and method that provide a backchannel conversation corpus in conjunction with an example-based conversation system that outputs responses to user utterances based on a conversation example corpus.

Recently, with advances in speech recognition technology, conversation system technologies that recognize a user's utterance and provide a response have been introduced.

However, as disclosed in Korean Patent Application Publication No. 10-2019-0116041, existing technology responds to user utterances using a pre-built corpus. As a result, the range of utterances it can respond to is limited, and its responses are unnatural, which reduces the realism and completeness of the conversation.

In addition, building a separate, very large corpus database in order to raise the response accuracy and expand the response coverage of a conversation system requires considerable time and cost.

The technical problem to be solved by the present invention is to provide an apparatus and method for providing a conversation corpus that allow a conversation system to selectively output either an information-conveying response or a backchannel response depending on the content of the user's utterance, thereby improving both the response coverage and the response accuracy of the conversation system while reducing system construction cost.

An apparatus for providing a backchannel conversation corpus according to an embodiment of the present invention operates in conjunction with an example-based conversation system that outputs a response to a user's utterance based on a conversation example corpus, and includes: a conversation example detection unit that detects, in a database of the example-based conversation system storing the conversation example corpus, a conversation example in which an utterance is mapped to its corresponding response; a backchannel conversation corpus generation unit that extracts a backchannel response from the response of the detected conversation example and generates a backchannel conversation corpus by mapping the utterance of the detected conversation example to the extracted backchannel response; and a backchannel conversation corpus storage unit that stores the backchannel conversation corpus.

In one embodiment, the apparatus further includes a conversation log storage unit that stores, in a conversation log database, input utterance and output response pairs, each pair associating an input utterance entered into the example-based conversation system or another conversation system with the output response that the conversation system produced for that input utterance. The conversation example detection unit is configured to detect the input utterance and output response pairs stored in the conversation log database as conversation examples.

In one embodiment, the backchannel conversation corpus generation unit includes: a backchannel response extraction unit that separates, from the response of the detected conversation example, a response portion containing at least one eojeol (a space-delimited word unit) and, if the separated response portion is included in the conversation example corpus as an independent response, extracts the separated response portion as the backchannel response; and a backchannel response integration unit that groups the backchannel responses extracted by the backchannel response extraction unit and integrates the grouped backchannel responses by selecting a representative backchannel response for each group.

In one embodiment, the backchannel response extraction unit is configured to separate the response portion corresponding to the first sentence of the response when the response of the detected conversation example contains a plurality of sentences.

In one embodiment, the backchannel response extraction unit is configured to separate, from the response of the detected conversation example, a response portion that contains a plurality of consecutive eojeol but no more than a predetermined number of eojeol.

In one embodiment, the response portion containing the plurality of consecutive eojeol includes the first eojeol of the response of the detected conversation example and at least one eojeol that immediately follows the first eojeol.

In one embodiment, the backchannel conversation corpus generation unit further includes a backchannel conversation example generation unit that generates a backchannel conversation example by mapping the utterance of the detected conversation example to the representative backchannel response.

In one embodiment, the apparatus further includes a backchannel response output unit that, when no response corresponding to a user's utterance entered into the example-based conversation system exists in the conversation example corpus, outputs an appropriate backchannel response to the user's utterance using the backchannel conversation corpus stored by the backchannel conversation corpus storage unit.
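The fallback behaviour of the backchannel response output unit can be illustrated with a minimal Python sketch. The text does not say how an utterance is matched against either corpus, so this sketch assumes exact lookup in the conversation example corpus and a simple shared-eojeol count for the backchannel conversation corpus. The function name `respond`, the dictionary layout, and the sample utterances are all hypothetical.

```python
from typing import Dict, Optional


def respond(
    utterance: str,
    example_corpus: Dict[str, str],
    backchannel_corpus: Dict[str, str],
) -> Optional[str]:
    """Return the example-based response if one exists, else a backchannel response."""
    # 1) Normal path: the conversation example corpus has a response.
    #    (Assumption: exact string match stands in for the real retrieval step.)
    if utterance in example_corpus:
        return example_corpus[utterance]

    # 2) Fallback path: pick the backchannel entry whose utterance shares the
    #    most eojeol with the input. (Assumption: the source does not specify
    #    the matching method.)
    words = set(utterance.split())
    best_reply, best_score = None, 0
    for known_utterance, reply in backchannel_corpus.items():
        score = len(words & set(known_utterance.split()))
        if score > best_score:
            best_reply, best_score = reply, score
    return best_reply


# Hypothetical toy corpora.
example_corpus = {"오늘 날씨 어때": "오늘은 맑아요."}
backchannel_corpus = {
    "라면 먹었어": "정말 맛있겠네요",
    "크리스마스야": "메리 크리스마스",
}

print(respond("오늘 날씨 어때", example_corpus, backchannel_corpus))
print(respond("나 방금 라면 먹었어", example_corpus, backchannel_corpus))
```

A production system would likely replace both lookups with whatever similarity search the example-based conversation system already uses; the sketch only shows the order of the two steps.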

A method for providing a backchannel conversation corpus according to an embodiment of the present invention is a method in which a computer device provides a backchannel conversation corpus in conjunction with an example-based conversation system that outputs a response to a user's utterance based on a conversation example corpus, and includes: step (a), in which the computer device detects, in a database of the example-based conversation system storing the conversation example corpus, a conversation example in which an utterance is mapped to its corresponding response; step (b), in which the computer device extracts a backchannel response from the response of the detected conversation example and generates a backchannel conversation corpus by mapping the utterance of the detected conversation example to the extracted backchannel response; and step (c), in which the computer device stores the backchannel conversation corpus.

In one embodiment, the method further includes, before step (a), a step in which the computer device stores, in a conversation log database, input utterance and output response pairs, each pair associating an input utterance entered into the example-based conversation system or another conversation system with the output response that the conversation system produced for that input utterance. Step (a) includes a step in which the computer device detects the input utterance and output response pairs stored in the conversation log database as conversation examples.

In one embodiment, step (b) includes: step (b1) of separating, from the response of the detected conversation example, a response portion containing at least one eojeol (a space-delimited word unit) and, if the separated response portion is included in the conversation example corpus as an independent response, extracting the separated response portion as the backchannel response; and step (b2) of grouping the backchannel responses extracted by the backchannel response extraction unit and integrating the grouped backchannel responses by selecting a representative backchannel response for each group.

In one embodiment, step (b1) includes a step of separating the response portion corresponding to the first sentence of the response when the response of the detected conversation example contains a plurality of sentences.

In one embodiment, step (b1) includes a step of separating, from the response of the detected conversation example, a response portion that contains a plurality of consecutive eojeol but no more than a predetermined number of eojeol.

In one embodiment, step (b) further includes step (b3) of generating a backchannel conversation example by mapping the utterance of the detected conversation example to the representative backchannel response.

In one embodiment, the method further includes step (e), in which, when no response corresponding to a user's utterance entered into the example-based conversation system exists in the conversation example corpus, the computer device outputs an appropriate backchannel response to the user's utterance using the backchannel conversation corpus stored in step (c).

Embodiments according to the present invention may be implemented as a computer program that is recorded on a recording medium and that executes the above-described operations or methods on a computer system.

According to the present invention, an apparatus for providing a backchannel conversation corpus, which may be implemented as a computer device or an application, automatically generates and provides a backchannel conversation corpus from the conversation example corpus of an example-based conversation system. This allows the conversation system to selectively output either an information-conveying response or a backchannel response depending on the content of the user's utterance, improving both the response coverage and the response accuracy of the conversation system while reducing system construction cost.

In addition, because the apparatus also uses a conversation log that records the conversation history of the conversation system when generating the backchannel conversation corpus, it can adaptively extend the response coverage of the conversation system and increase the realism and completeness of conversations conducted through the conversation system.

Furthermore, those of ordinary skill in the art to which the present invention pertains will clearly understand from the following description that various embodiments of the present invention can solve technical problems not mentioned above.

FIG. 1 is a diagram showing an example of a conversation system to which the present invention is applied.
FIG. 2 is a block diagram showing an apparatus for providing a backchannel conversation corpus according to an embodiment of the present invention.
FIG. 3 is a flowchart showing a method for providing a backchannel conversation corpus according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of a backchannel response extraction algorithm applicable to the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings in order to clarify the solution to the technical problem of the present invention. Where a description of related known technology would obscure the gist of the present invention, that description is omitted. The terms used in this specification are defined in consideration of their functions in the present invention and may vary according to the intention or practice of designers, manufacturers, and the like. The definitions of the terms described below should therefore be based on the content of this specification as a whole.

FIG. 1 shows an example of a conversation system 2 to which the present invention is applied.

As shown in FIG. 1, the conversation system 2 to which the present invention is applied may be configured as an example-based conversation system that uses a conversation example corpus.

That is, the conversation system 2 may include a smart terminal 10, such as an AI speaker or a smart set-top box, that recognizes the utterance of a user U and outputs speech, and a conversation service server 20 that works with the smart terminal 10 over a communication network to output a response to the user's utterance. In this case, the conversation service server 20 may include a conversation example corpus database 22 that stores and manages the conversation example corpus.

The apparatus for providing a backchannel conversation corpus according to the present invention works in conjunction with such an example-based conversation system, which outputs responses to user utterances based on a conversation example corpus, and automatically generates a backchannel conversation corpus so that appropriate backchannel responses can be provided for the wide variety of user utterances entered into the conversation system.

In the present invention, a "backchannel response" (맞장구 응답) is not limited to a restricted set of words or eojeol such as simple fillers (chuimsae), interjections, or exclamations. It consists of sentences with varied content that contain multiple syllables.

FIG. 2 is a block diagram of an apparatus 100 for providing a backchannel conversation corpus according to an embodiment of the present invention.

As shown in FIG. 2, the apparatus 100 for providing a backchannel conversation corpus according to an embodiment of the present invention may be configured as a computer device or server that works in conjunction with the example-based conversation system 2, which outputs responses to user utterances based on a conversation example corpus. The apparatus 100 may include a communication unit 101, an input unit 102, a storage unit 104, an output unit 103, a control unit 105, and the like.

The communication unit 101 is configured to receive data transmitted over a wired or wireless communication network from the example-based conversation system 2, another conversation system, or another communication device and pass it to the control unit 105, and to transmit control signals or data processed by the control unit 105 to the example-based conversation system 2, another conversation system, or another communication device. To this end, the communication unit 101 may include a communication modem that performs wired and wireless communication. In this case, the communication unit 101 may include modules for long-range wireless communication, such as an LTE communication module or a 5G communication module, and modules for short-range wireless communication, such as a Bluetooth communication module, a Zigbee communication module, or a Wi-Fi communication module. The communication unit 101 may also include a USB port, a wired LAN port, or various other communication ports to which data transmission cables are connected.

The input unit 102 is configured to receive commands or data from a system operator or administrator. To this end, the input unit 102 may include an input device such as a keyboard, operation buttons, or a touch panel.

The output unit 103 is configured to output data or information processed by the control unit 105 in visual or audiovisual form. To this end, the output unit 103 may include a visual output device such as a monitor, a display panel, or a touch screen. The output unit 103 may further include a sound generating device such as a speaker.

The storage unit 104 is configured to store and manage the data required for the operation of the apparatus 100 for providing a backchannel conversation corpus. To this end, the storage unit 104 may selectively include various storage media such as ROM, RAM, EEPROM, registers, flash memory, CD-ROM, magnetic tape, hard disks, floppy disks, and optical data recording devices. The storage unit 104 may also include a backchannel conversation corpus database 104a and a conversation log database 104b. In this case, the backchannel conversation corpus database 104a stores and manages the backchannel conversation corpus generated according to the present invention. The conversation log database 104b stores and manages conversation logs that record the utterances entered into a conversation system working with the apparatus 100 and the responses that the conversation system output for those utterances.

The control unit 105 is configured to control the overall operation of the apparatus 100 for providing a backchannel conversation corpus. To this end, the control unit 105 may selectively include hardware for executing control logic, such as a general-purpose processor, an ASIC (application-specific integrated circuit), other chipsets, logic circuits, registers, and memory. The control unit 105 may also be configured as a combination of hardware and software. That is, the control logic of the control unit 105 may be implemented as a computer program stored in the internal memory of the control unit 105 or in the storage unit 104, and the stored computer program may be executed by the hardware of the control unit 105.

The control unit 105 includes, as functionally distinct components, a conversation example detection unit 110, a backchannel conversation corpus generation unit 120, and a backchannel conversation corpus storage unit 130. Depending on the embodiment, it may further include a backchannel response output unit 140, a conversation log storage unit 150, and the like.

The conversation example detection unit 110 is configured to detect, in the database 22 of the example-based conversation system 2 storing the conversation example corpus, conversation examples in which an utterance is mapped to its corresponding response. An example-based conversation system typically uses a conversation example corpus consisting of 100,000 to 10 million utterance (or query) and response pairs, that is, conversation examples. The conversation example detection unit 110 can therefore detect a large and highly varied set of conversation examples in the database 22 of the example-based conversation system 2.
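As an illustration of what a detected conversation example might look like in code, the following Python sketch models each example as an utterance and response pair and reads the pairs from an in-memory list standing in for the database 22. The class name `ConversationExample`, the function `detect_examples`, the row layout, and the sample utterances are assumptions made for illustration. Skipping rows with an empty field is also an assumption, not something the text requires.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class ConversationExample:
    """One utterance mapped to its corresponding response."""
    utterance: str
    response: str


def detect_examples(rows: Iterable[Tuple[str, str]]) -> Iterator[ConversationExample]:
    """Yield a conversation example for every usable (utterance, response) row."""
    for utterance, response in rows:
        utterance, response = utterance.strip(), response.strip()
        # Assumption: rows missing either side are not usable examples.
        if utterance and response:
            yield ConversationExample(utterance, response)


# Hypothetical rows; the utterances are invented, the responses echo the
# sample responses used elsewhere in the description.
rows = [
    ("나 방금 라면 먹었어", "정말 맛있겠네요. 라면은 언제나 맛있죠."),
    ("오늘 크리스마스야", "메리 크리스마스 오늘 뭐하실 건가요? 좋은 하루 보내세요."),
    ("", "발화가 없는 행"),
]
examples = list(detect_examples(rows))
print(len(examples))
```

The same function could be fed pairs read from a conversation log, since the description treats logged input utterance and output response pairs as conversation examples as well.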

The backchannel conversation corpus generation unit 120 is configured to extract a backchannel response from the response of a conversation example detected by the conversation example detection unit 110 and to generate a backchannel conversation corpus by mapping the utterance of the detected conversation example to the extracted backchannel response. To this end, the backchannel conversation corpus generation unit 120 may include a backchannel response extraction unit 122, a backchannel response integration unit 124, and a backchannel conversation example generation unit 126.

In this case, the backchannel response extraction unit 122 is configured to separate, from the response of the detected conversation example, a response portion containing at least one eojeol (a space-delimited word unit) and to extract that response portion as a backchannel response.

The conversation example corpus of the example-based conversation system 2 yields conversation examples made up of many types of utterances (questions) and many different responses to those utterances. To extract appropriate backchannel responses from these varied responses, an extraction method that exploits the linguistic characteristics of Korean should be applied. Specifically, in Korean conversation, the sentence that expresses agreement, empathy, or an additional opinion toward the other speaker usually appears at the beginning of the response. If the entire response were nevertheless extracted as the backchannel response, the backchannel response would become too long and its coverage would narrow.

Accordingly, in one embodiment, when the response of a conversation example detected by the conversation example detection unit 110 contains a plurality of sentences, the backchannel response extraction unit 122 may be configured to separate the response portion corresponding to the first sentence of the response and extract it as the backchannel response.

For example, when the response of a conversation example is divided into multiple sentences by punctuation, as in “메리 크리스마스 오늘 뭐하실 건가요? 좋은 하루 보내세요.” ("Merry Christmas, what are you doing today? Have a nice day.") or “정말 맛있겠네요. 라면은 언제나 맛있죠.” ("That sounds really delicious. Ramen is always good."), the backchannel response extraction unit 122 may separate the first sentence of each response, namely “메리 크리스마스 오늘 뭐하실 건가요?” and “정말 맛있겠네요.”, and extract it as the backchannel response to the corresponding utterance.

Likewise, when the response of a conversation example contains no punctuation but can be divided into multiple sentences through syntactic structure analysis, as in “여름에는 메밀로 만든 메밀국수가 딱 이에요 정말 맛있죠” ("In summer, buckwheat noodles are just right, they are really delicious"), the backchannel response extraction unit 122 may separate the first sentence, “여름에는 메밀로 만든 메밀국수가 딱 이에요”, and extract it as the backchannel response to the corresponding utterance.
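The punctuation-based case of first-sentence separation can be sketched in Python as follows. This is only a minimal illustration: it assumes sentences end in `.`, `?`, or `!` followed by whitespace, and it does not attempt the syntactic structure analysis needed for unpunctuated responses. The function name `first_sentence` is hypothetical.

```python
import re
from typing import Optional

# Split after sentence-final punctuation that is followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.?!])\s+")


def first_sentence(response: str) -> Optional[str]:
    """Return the first sentence if the response has several, else None."""
    sentences = [s for s in _SENTENCE_BOUNDARY.split(response.strip()) if s]
    if len(sentences) < 2:
        # A single sentence, or an unpunctuated response that would need
        # syntactic analysis, is not handled by this sketch.
        return None
    return sentences[0]


print(first_sentence("메리 크리스마스 오늘 뭐하실 건가요? 좋은 하루 보내세요."))
print(first_sentence("정말 맛있겠네요. 라면은 언제나 맛있죠."))
print(first_sentence("여름에는 메밀로 만든 메밀국수가 딱 이에요 정말 맛있죠"))
```

Handling the unpunctuated buckwheat-noodle response would require a Korean parser or sentence segmenter, which is outside the scope of this sketch.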

다른 일 실시예에 있어서, 상기 맞장구 응답 추출부(122)는 대화 예제 검출부(110)에 의해 검출된 대화 예제의 응답에서, 연속된 복수의 어절을 포함하되 미리 결정된 개수 이내의 어절을 포함하는 응답 부분을 분리하여 맞장구 응답으로 추출하도록 구성될 수 있다. 이 경우, 상기 연속된 복수의 어절을 포함하는 응답 부분은, 검출된 대화 예제의 응답을 이루는 첫 번째 어절과 상기 첫 번째 어절에서 연속되는 적어도 하나의 어절을 포함할 수 있다.In another embodiment, the matching response extractor 122 includes a plurality of consecutive words in the response of the conversation example detected by the conversation example detector 110, but includes words within a predetermined number. It can be configured to separate the parts and extract them as a matching response. In this case, the response portion including the plurality of consecutive words may include the first word constituting the response to the detected conversation example and at least one word consecutive from the first word.

예컨대, 맞장구 응답이 5개 이내의 어절들을 포함하도록 결정되고, 대화 예제의 응답이 “메리 크리스마스 오늘 뭐하실 건가요? 좋은 하루 보내세요.”, “정말 맛있겠네요. 라면은 언제나 맛있죠.”인 경우, 상기 맞장구 응답 추출부(122)는 상기 응답들에서 각각 2개의 어절을 포함하는 “메리 크리스마스”와 “정말 맛있겠네요”를 분리하여 해당 발화에 대한 맞장구 응답으로 추출할 수 있다.For example, the tit-for-tat response is determined to contain less than 5 words, and the response in the conversation example is “Merry Christmas, what are you going to do today?” Have a nice day.”, “It sounds really delicious. In the case of “Ramen is always delicious.”, the tit-for-tat response extractor 122 separates “Merry Christmas” and “It must be really delicious,” which each contain two phrases, from the responses and extracts them as a tit-for-tat response to the corresponding utterance. can do.

On the other hand, when the backchannel response is set to contain five or fewer words (eojeols), and the response of the dialogue example is "여름에는 메밀로 만든 메밀국수가 딱 이에요 정말 맛있죠", the backchannel response extraction unit 122 may decline to extract "여름에는 메밀로 만든 메밀국수가 딱 이에요" as the backchannel response: although it is the first sentence of the response, it contains six words.

In addition, the backchannel response extraction unit 122 may be configured to extract a response portion separated as described above as the backchannel response only when that portion is included as an independent response in the entire response set of the dialogue example corpus. This is because a separated response portion that appears as an independent response in the entire response set of the dialogue example corpus can be regarded as a complete response in itself.

The backchannel response integration unit 124 may be configured to group the backchannel responses extracted by the backchannel response extraction unit 122 and to integrate the grouped backchannel responses by selecting a representative backchannel response that represents them.

Because backchannel responses are natural language, they often have the same meaning but different surface forms. Accordingly, responses with the same meaning can be integrated by either of the two methods below, and the integrated response can be used as the backchannel response. The two methods have opposite strengths and weaknesses, so one of them may be selected as appropriate for the situation.

First, the clustering-based method uses a clustering algorithm to automatically generate clusters of responses and assigns an appropriate backchannel response to each generated cluster. Each response is represented as a semantic vector, and once an appropriate backchannel response has been assigned to a cluster generated by the clustering algorithm, all input utterances belonging to that cluster can be turned into corpus entries. This method can generate a large volume of entries at once, but the clusters may contain errors.
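The clustering-based integration can be illustrated with a minimal sketch. The description names neither a vector representation nor a clustering algorithm, so everything concrete here is an assumption made for illustration: `embed` uses character-bigram counts as a toy stand-in for a semantic vector, `cluster_responses` is a greedy single-pass clustering with a cosine-similarity threshold, and `representative` simply picks the most frequent member of a cluster.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic vector: character-bigram counts (assumption)."""
    compact = text.replace(" ", "")
    return Counter(compact[i:i + 2] for i in range(len(compact) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_responses(responses, threshold=0.5):
    """Greedy single-pass clustering (assumption, not the patent's algorithm).

    A response joins the first cluster whose seed (first member) is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []
    for response in responses:
        vec = embed(response)
        for cluster in clusters:
            if cosine(vec, embed(cluster[0])) >= threshold:
                cluster.append(response)
                break
        else:
            clusters.append([response])
    return clusters


def representative(cluster):
    """Pick the most frequent member as the representative response."""
    return Counter(cluster).most_common(1)[0][0]


responses = ["정말 맛있겠네요", "정말 맛있겠어요", "메리 크리스마스", "정말 맛있겠네요"]
clusters = cluster_responses(responses)
representatives = [representative(c) for c in clusters]
```

In a real system the toy `embed` would be replaced by an actual sentence-embedding model, and a wrong threshold is one way the cluster errors mentioned above can arise.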

Second, the method using high-frequency sorting of backchannel responses sorts the backchannel responses in descending order of frequency, outputs them, and integrates them according to the selection of a user or administrator. This method takes more time than the clustering-based method, but it reduces errors because the user or administrator directly selects and integrates the backchannel responses according to the purpose of the system.
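The frequency-sorted method can be sketched in a few lines. The function names `rank_by_frequency` and `merge_selected`, and the idea of passing the administrator's choices as a dictionary, are assumptions for illustration; the description only states that responses are listed by frequency and merged according to a human selection.

```python
from collections import Counter


def rank_by_frequency(responses):
    """Return (response, count) pairs, most frequent first."""
    normalized = [r.strip() for r in responses if r.strip()]
    return Counter(normalized).most_common()


def merge_selected(responses, selection):
    """Apply an administrator's choices.

    `selection` maps a variant response to the representative chosen for it;
    responses without an entry are kept unchanged.
    """
    return [selection.get(r.strip(), r.strip()) for r in responses]


extracted = ["정말 맛있겠네요", "맛있겠어요", "정말 맛있겠네요", "메리 크리스마스"]
ranking = rank_by_frequency(extracted)  # shown to the user or administrator
merged = merge_selected(extracted, {"맛있겠어요": "정말 맛있겠네요"})
```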

Meanwhile, the backchannel dialogue example generation unit 126 is configured to generate a backchannel dialogue example by mapping the utterance of the dialogue example detected by the dialogue example detection unit 110 to the backchannel response extracted by the backchannel response extraction unit 122. In this case, the backchannel dialogue example generation unit 126 may be configured to generate backchannel dialogue examples by mapping the utterance of the detected dialogue example to the representative backchannel response selected by the backchannel response integration unit 124. By composing the backchannel dialogue corpus from backchannel dialogue examples in which many types of utterances are mapped to the small number of backchannel responses selected through the integration process, the data size of the backchannel dialogue corpus can be reduced while the coverage of backchannel responses is expanded.
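The mapping step can be sketched as follows, under stated assumptions: `build_backchannel_corpus` is a hypothetical name, the extractor is passed in as a callable (here a lookup table standing in for the extraction unit 122), and the integration result is passed as a dictionary from each extracted response to its representative.

```python
def build_backchannel_corpus(examples, extract, representative_of):
    """Map each utterance to the representative of its extracted response.

    examples: iterable of (utterance, response) pairs (detected dialogue examples)
    extract: callable returning a backchannel response or None (hypothetical)
    representative_of: dict from an extracted response to its representative
    """
    corpus = {}
    for utterance, response in examples:
        backchannel = extract(response)
        if backchannel is None:
            continue  # nothing could be extracted from this response
        corpus[utterance] = representative_of.get(backchannel, backchannel)
    return corpus


# Toy data; the lookup table below stands in for a real extractor.
toy_extractions = {
    "정말 맛있겠네요. 라면은 언제나 맛있죠.": "정말 맛있겠네요",
    "맛있겠어요. 저도 먹고 싶어요.": "맛있겠어요",
}
examples = [
    ("라면 먹을 거야", "정말 맛있겠네요. 라면은 언제나 맛있죠."),
    ("떡볶이 먹으러 가", "맛있겠어요. 저도 먹고 싶어요."),
    ("지금 몇 시야", "세 시입니다"),
]
corpus = build_backchannel_corpus(
    examples, toy_extractions.get, {"맛있겠어요": "정말 맛있겠네요"}
)
```

Two different utterances end up sharing one representative response, which is the size reduction the paragraph above describes.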

The backchannel dialogue corpus storage unit 130 is configured to store the backchannel dialogue corpus, which includes the backchannel dialogue examples generated as described above, in the database 104a.

The backchannel response output unit 140 is configured to output a backchannel response in response to a user's utterance input to a dialogue system that operates in conjunction with the backchannel dialogue corpus providing device 100. For example, when a response corresponding to a user's utterance input to the example-based dialogue system 2 does not exist in the dialogue example corpus of that dialogue system, the backchannel response output unit 140 may be configured to use the backchannel dialogue corpus to output, from among the small number of backchannel responses included in the corpus, the backchannel response most appropriate to the user's utterance.
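The fallback behavior can be sketched as below. The description does not say how "a response does not exist" is decided or how the most appropriate backchannel response is chosen, so both are assumptions here: the example corpus is consulted by exact match, and the backchannel corpus is searched with `difflib.SequenceMatcher` as a simple string-similarity measure. `respond` is a hypothetical name.

```python
from difflib import SequenceMatcher


def respond(utterance, example_corpus, backchannel_corpus):
    """Answer from the example corpus, else fall back to a backchannel response.

    Both corpora are dicts mapping an utterance to a response.
    """
    if utterance in example_corpus:  # exact match is an assumption
        return example_corpus[utterance]
    if not backchannel_corpus:
        return None
    # Pick the stored utterance most similar to the input (assumed measure).
    best = max(
        backchannel_corpus,
        key=lambda known: SequenceMatcher(None, utterance, known).ratio(),
    )
    return backchannel_corpus[best]


example_corpus = {"오늘 날씨 어때?": "오늘은 맑아요."}
backchannel_corpus = {
    "라면 먹고 싶다": "정말 맛있겠네요",
    "오늘 크리스마스야": "메리 크리스마스",
}
direct = respond("오늘 날씨 어때?", example_corpus, backchannel_corpus)
fallback = respond("라면 먹고 싶어", example_corpus, backchannel_corpus)
```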

The dialogue log storage unit 150 is configured to store, in the dialogue log database 104b, input utterance and output response pairs, each of which associates an input utterance input to the example-based dialogue system 2 or another dialogue system with the output response that the dialogue system output for that input utterance. In this case, the dialogue example detection unit 110 may be configured to detect the input utterance and output response pairs stored in the dialogue log database 104b as dialogue examples for backchannel response extraction. As a result, the backchannel dialogue corpus generation unit 120 can extract a wider variety of backchannel responses by using the input utterance and output response pairs detected in the dialogue log database 104b.
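A minimal sketch of such a log store follows, assuming an SQLite table. The table name `dialogue_log`, its column names, and the function names are hypothetical; the description does not specify a storage engine or schema.

```python
import sqlite3


def open_log_db(path=":memory:"):
    """Open (or create) the dialogue log database; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dialogue_log ("
        "input_utterance TEXT NOT NULL, output_response TEXT NOT NULL)"
    )
    return conn


def log_pair(conn, utterance, response):
    """Store one input utterance and output response pair."""
    conn.execute("INSERT INTO dialogue_log VALUES (?, ?)", (utterance, response))
    conn.commit()


def detect_examples(conn):
    """Read the logged pairs back as dialogue examples, in insertion order."""
    cursor = conn.execute(
        "SELECT input_utterance, output_response FROM dialogue_log ORDER BY rowid"
    )
    return cursor.fetchall()


conn = open_log_db()
log_pair(conn, "라면 먹을 거야", "정말 맛있겠네요. 라면은 언제나 맛있죠.")
log_pair(conn, "오늘 크리스마스야", "메리 크리스마스 오늘 뭐하실 건가요?")
logged_examples = detect_examples(conn)
```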

As such, when an example-based dialogue system using a dialogue example corpus has already been built, the backchannel dialogue corpus providing device 100 according to the present invention makes it possible to generate and provide a backchannel dialogue corpus cost-effectively.

Any of the components of the control unit 105 described above may be integrated with another component or subdivided into sub-components for step-by-step execution of the control logic. Even if the components of the control unit 105 are integrated or subdivided, the integrated or subdivided components also fall within the scope of the present invention as long as functional identity is recognized.

In addition, although FIG. 2 describes the backchannel dialogue corpus providing device 100 according to the present invention as a device separate from the example-based dialogue system 2, the backchannel dialogue corpus providing device 100 may, depending on the embodiment, be configured integrally with the example-based dialogue system 2.

FIG. 3 is a flowchart showing a method of providing a backchannel dialogue corpus according to an embodiment of the present invention. The detailed operations of the backchannel dialogue corpus providing device 100 are described in time order with reference to FIG. 3.

As shown in FIG. 3, the dialogue example detection unit 110 of the device 100 first detects, in the database 22 of the example-based dialogue system 2 that stores the dialogue example corpus, a dialogue example in which an utterance is mapped to the response corresponding to that utterance (S310).

In this case, the dialogue log storage unit 150 of the device 100 may store in advance, in the dialogue log database 104b, input utterance and output response pairs, each of which associates an input utterance input to the example-based dialogue system 2 or another dialogue system with the output response that the dialogue system output for that input utterance, and the dialogue example detection unit 110 may detect the input utterance and output response pairs stored in the dialogue log database 104b as dialogue examples for backchannel response extraction.

Next, the backchannel dialogue corpus generation unit 120 of the device 100 extracts a backchannel response from the response of the dialogue example detected by the dialogue example detection unit 110, and generates the backchannel dialogue corpus by mapping the utterance of the detected dialogue example to the extracted backchannel response (S320 to S340).

In this case, the backchannel response extraction unit 122 of the backchannel dialogue corpus generation unit 120 may separate a response portion containing at least one word (eojeol) from the response of the dialogue example detected by the dialogue example detection unit 110 and extract that portion as the backchannel response (S320).

In one embodiment, when the response of the dialogue example detected by the dialogue example detection unit 110 includes a plurality of sentences, the backchannel response extraction unit 122 may separate the response portion corresponding to the first sentence of the response and extract it as the backchannel response.
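First-sentence extraction can be sketched for the punctuated case only. The sketch assumes that sentences end with `.`, `?`, or `!` followed by whitespace; responses without punctuation would need a syntactic parser, which is outside this sketch. `first_sentence` is a hypothetical name, and keeping the terminal punctuation in the result is a choice made here, not something the description specifies.

```python
import re

# Split on whitespace that follows sentence-final punctuation (assumption).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.?!])\s+")


def first_sentence(response):
    """Return the first sentence if the response has several, else None."""
    sentences = [s for s in _SENTENCE_BOUNDARY.split(response.strip()) if s]
    if len(sentences) < 2:
        return None  # a single sentence, or no punctuation to split on
    return sentences[0]


multi = first_sentence("정말 맛있겠네요. 라면은 언제나 맛있죠.")
unpunctuated = first_sentence("여름에는 메밀로 만든 메밀국수가 딱 이에요 정말 맛있죠")
```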

FIG. 4 shows an example of a backchannel response extraction algorithm applicable to the present invention. In FIG. 4, S is the entire response set of the dialogue example corpus, L is the maximum number of words (eojeols) in a backchannel response, and Answer[0:i] is the portion of the response up to and including the i-th word.

As shown in FIG. 4, the backchannel response extraction unit 122 may extract a backchannel response from a response by using a subset algorithm. That is, the backchannel response extraction unit 122 separates the response portion up to and including the i-th word of the response, and when the separated portion is included as an independent response in the entire response set (S) of the dialogue example corpus, it may extract that portion as the backchannel response.
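The prefix test described for FIG. 4 can be sketched as follows. FIG. 4 itself is not reproduced in the text, so several details are assumptions: only proper prefixes are considered (otherwise every short response would trivially match itself in S), the longest qualifying prefix is returned, and trailing sentence punctuation is ignored when comparing against S. The names `normalize` and `extract_backchannel` are hypothetical; `max_words` plays the role of L.

```python
def normalize(text):
    """Collapse whitespace and drop trailing sentence punctuation (assumption)."""
    return " ".join(text.split()).rstrip(".?!")


def extract_backchannel(answer, response_set, max_words=5):
    """Return the longest proper prefix Answer[0:i], i <= max_words, found in S.

    response_set: normalized forms of all responses in the example corpus (S)
    """
    words = answer.split()
    upper = min(max_words, len(words) - 1)  # proper prefixes only (assumption)
    for i in range(upper, 0, -1):  # longest first (assumption)
        candidate = normalize(" ".join(words[:i]))
        if candidate in response_set:
            return candidate
    return None


all_responses = [
    "메리 크리스마스",
    "정말 맛있겠네요.",
    "여름에는 메밀로 만든 메밀국수가 딱 이에요",
    "메리 크리스마스 오늘 뭐하실 건가요? 좋은 하루 보내세요.",
    "정말 맛있겠네요. 라면은 언제나 맛있죠.",
]
S = {normalize(r) for r in all_responses}

greeting = extract_backchannel(all_responses[3], S)
ramen = extract_backchannel(all_responses[4], S)
too_long = extract_backchannel("여름에는 메밀로 만든 메밀국수가 딱 이에요 정말 맛있죠", S)
```

With `max_words=5`, the six-word first sentence of the last input is never tested as a candidate, which matches the buckwheat-noodle example in the description.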

Referring again to FIG. 3, the backchannel response integration unit 124 of the backchannel dialogue corpus generation unit 120 may group the backchannel responses extracted by the backchannel response extraction unit 122 and integrate the grouped backchannel responses by selecting a representative backchannel response that represents them (S330).

As mentioned above, the backchannel response integration unit 124 may integrate the backchannel responses by using either the clustering-based method or the method using high-frequency sorting of backchannel responses. That is, the backchannel response integration unit 124 may use a clustering algorithm to automatically generate clusters of responses and select an appropriate backchannel response for each generated cluster, or it may sort the backchannel responses in descending order of frequency, output them, and integrate them according to the selection of a user or administrator.

Next, the backchannel dialogue example generation unit 126 of the backchannel dialogue corpus generation unit 120 may generate a backchannel dialogue example by mapping the utterance of the dialogue example detected by the dialogue example detection unit 110 to the backchannel response extracted by the backchannel response extraction unit 122 (S340). In this case, the backchannel dialogue example generation unit 126 may generate backchannel dialogue examples by mapping the utterance of the detected dialogue example to the representative backchannel response selected by the backchannel response integration unit 124.

Next, the backchannel dialogue corpus storage unit 130 of the device 100 stores the backchannel dialogue corpus, which includes the backchannel dialogue examples generated as described above, in the database 104a (S350).

The backchannel response output unit 140 of the device 100 may then output a backchannel response in response to a user's utterance input to a dialogue system that operates in conjunction with the backchannel dialogue corpus providing device 100 (S360). For example, when a response corresponding to a user's utterance input to the example-based dialogue system 2 does not exist in the dialogue example corpus of that dialogue system, the backchannel response output unit 140 may use the backchannel dialogue corpus to output, from among the small number of backchannel responses included in the corpus, the backchannel response most appropriate to the user's utterance.

Meanwhile, embodiments according to the present invention may be implemented as a computer system and a computer program that runs on such a computer system. When embodiments of the present invention are implemented as a computer program, the components of the present invention are program segments that execute the corresponding operations or tasks through the computer system. Such computer programs or program segments may be stored in various computer-readable recording media. Computer-readable recording media include all types of media that record data readable by a computer system. For example, computer-readable recording media may include ROM, RAM, EEPROM, registers, flash memory, CD-ROM, magnetic tape, hard disks, floppy disks, and optical data recording devices. Such recording media may also be distributed across computer systems connected by various networks so that program code is stored or executed in a distributed manner.

As described above, according to the present invention, a backchannel dialogue corpus providing device implementable as a computer device or an application automatically generates and provides a backchannel dialogue corpus by using the dialogue example corpus of an example-based dialogue system. This allows the dialogue system to selectively output either an information-conveying response or a backchannel response depending on the content of the user's utterance, and it improves both the response coverage and the response accuracy of the dialogue system while reducing system construction costs.

Furthermore, embodiments according to the present invention can of course solve various technical problems other than those mentioned in this specification, both in this technical field and in related technical fields.

The present invention has been described above with reference to specific embodiments. However, those skilled in the art will clearly understand that various modified embodiments can be implemented within the technical scope of the present invention. Therefore, the embodiments disclosed above should be considered in an illustrative rather than a restrictive sense. That is, the true scope of the technical idea of the present invention is set forth in the claims, and all differences within the scope of equivalents thereof should be construed as being included in the present invention.

110: Dialogue example detection unit 120: Backchannel dialogue corpus generation unit
122: Backchannel response extraction unit 124: Backchannel response integration unit
126: Backchannel dialogue example generation unit 130: Backchannel dialogue corpus storage unit
140: Backchannel response output unit 150: Dialogue log storage unit

Claims

A backchannel dialogue corpus providing device that provides a backchannel dialogue corpus in conjunction with an example-based dialogue system that outputs a system response to a user's utterance based on a dialogue example corpus, the device comprising:
a dialogue example detection unit that detects, from a database of the example-based dialogue system storing the dialogue example corpus, a dialogue example in which a user's utterance is mapped to a system response corresponding to the utterance;
a backchannel dialogue corpus generation unit that extracts a backchannel response from the system response of the detected dialogue example and generates a backchannel dialogue corpus by mapping the user's utterance of the detected dialogue example to the extracted backchannel response;
a backchannel dialogue corpus storage unit that stores the backchannel dialogue corpus; and
a backchannel response output unit that, when a system response corresponding to a user's utterance input to the example-based dialogue system does not exist in the dialogue example corpus, outputs a backchannel response corresponding to the user's utterance input to the example-based dialogue system by using the backchannel dialogue corpus stored by the backchannel dialogue corpus storage unit.

The device according to claim 1,
further comprising a dialogue log storage unit that stores, in a dialogue log database, input utterance and output response pairs each associating an input utterance input to the example-based dialogue system or another dialogue system with an output response output from that dialogue system in response to the input utterance,
wherein the dialogue example detection unit is configured to detect the input utterance and output response pairs stored in the dialogue log database as dialogue examples.

The device according to claim 1,
wherein the backchannel dialogue corpus generation unit comprises:
a backchannel response extraction unit that separates a response portion containing at least one word from the system response of the detected dialogue example and, when the separated response portion is included as an independent response in the dialogue example corpus, extracts the separated response portion as the backchannel response; and
a backchannel response integration unit that groups the backchannel responses extracted by the backchannel response extraction unit and integrates the grouped backchannel responses by selecting a representative backchannel response representing the grouped backchannel responses.

The device according to claim 3,
wherein the backchannel response extraction unit is configured to separate, when the system response of the detected dialogue example includes a plurality of sentences, a response portion corresponding to the first sentence of the response.

The device according to claim 3,
wherein the backchannel response extraction unit is configured to separate, from the system response of the detected dialogue example, a response portion that includes a plurality of consecutive words not exceeding a predetermined number.

The device according to claim 5,
wherein the response portion including the plurality of consecutive words includes the first word of the system response of the detected dialogue example and at least one word that follows the first word consecutively.

The device according to claim 3,
wherein the backchannel dialogue corpus generation unit further comprises a backchannel dialogue example generation unit that generates a backchannel dialogue example by mapping the user's utterance of the detected dialogue example to the representative backchannel response.

(Deleted)

A method in which a computer device provides a backchannel dialogue corpus in conjunction with an example-based dialogue system that outputs a system response to a user's utterance based on a dialogue example corpus, the method comprising:
step (a), in which the computer device detects, from a database of the example-based dialogue system storing the dialogue example corpus, a dialogue example in which a user's utterance is mapped to a system response corresponding to the utterance;
step (b), in which the computer device extracts a backchannel response from the system response of the detected dialogue example and generates a backchannel dialogue corpus by mapping the user's utterance of the detected dialogue example to the extracted backchannel response;
step (c), in which the computer device stores the backchannel dialogue corpus; and
a step in which, when a system response corresponding to a user's utterance input to the example-based dialogue system does not exist in the dialogue example corpus, the computer device outputs a backchannel response corresponding to the user's utterance input to the example-based dialogue system by using the backchannel dialogue corpus.

The method according to claim 9,
further comprising, before step (a), a step in which the computer device stores, in a dialogue log database, input utterance and output response pairs each associating an input utterance input to the example-based dialogue system or another dialogue system with an output response output from that dialogue system in response to the input utterance,
wherein step (a) includes the computer device detecting the input utterance and output response pairs stored in the dialogue log database as dialogue examples.

The method according to claim 9,
wherein step (b) comprises:
step (b1) of separating a response portion containing at least one word from the system response of the detected dialogue example and, when the separated response portion is included as an independent response in the dialogue example corpus, extracting the separated response portion as the backchannel response; and
step (b2) of grouping the backchannel responses extracted by the backchannel response extraction unit and integrating the grouped backchannel responses by selecting a representative backchannel response representing the grouped backchannel responses.

The method according to claim 11,
wherein step (b1) includes, when the system response of the detected dialogue example includes a plurality of sentences, separating a response portion corresponding to the first sentence of the response.

The method according to claim 11,
wherein step (b1) includes separating, from the system response of the detected dialogue example, a response portion that includes a plurality of consecutive words not exceeding a predetermined number.

The method according to claim 13,
wherein the response portion including the plurality of consecutive words includes the first word of the system response of the detected dialogue example and at least one word that follows the first word consecutively.

The method according to claim 11,
wherein step (b) further comprises step (b3) of generating a backchannel dialogue example by mapping the user's utterance of the detected dialogue example to the representative backchannel response.

(Deleted)

A computer program recorded on a computer-readable recording medium that executes, through a computer, the method according to any one of claims 9 to 15.