KR102586289B1

KR102586289B1 - Conversational agent system and method using predict conversation prediction

Info

Publication number: KR102586289B1
Application number: KR1020200180681A
Authority: KR
Inventors: 신홍식; 정원택; 이청안
Original assignee: 한국전자인증 주식회사
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2023-10-10
Also published as: KR20220089972A

Abstract

A conversation agent system and method using conversation prediction are disclosed. In a specific implementation, the system takes the current conversation step as a reference, accumulates the reward values of the response candidates for each of the query candidates in future conversation steps, reflects that cumulative reward value to derive one response candidate from the multiple response candidates in the current conversation step, and utters it. This allows a natural conversation to continue in real time in an open domain, and deleting malicious response candidates from the multiple response candidates prevents malicious conversation in advance.

Description

Conversation agent system and method using conversation prediction {CONVERSATIONAL AGENT SYSTEM AND METHOD USING PREDICT CONVERSATION PREDICTION}

The present invention relates to a conversation agent system and method using conversation prediction. More specifically, it relates to a technology that lets a conversation proceed naturally in real time in an open domain by uttering a response to the user's query in the current conversation step that reflects the cumulative reward value of multiple response candidates for the query of the next conversation step.

Earlier research on conversational systems concentrated on answering uttered sentences without considering the user's emotions, but research on conversational systems that take emotion into account has recently become active.

The deep learning encoder applied to such conversational systems represents a variable-length document as a fixed-length document vector using deep learning, and can perform well in emotion classification.

However, an LSTM (Long Short-Term Memory) encoder that takes the last output of the whole document sequence as the sentence vector has a problem: as the input grows longer, the recognition rate for patterns entered early drops sharply, which makes it unsuitable as an encoder for long sentences.

1. Korean Patent Publication No. 2013-013557 (Chain conversation pattern-based conversation system and method)

2. Jang, Youngsoo, Jongmin Lee, and Kee-Eung Kim. "Bayes-Adaptive Monte-Carlo Planning and Learning for Goal-Oriented Dialogues." AAAI. 2020.

Accordingly, the present invention aims to provide a conversation agent system and method using conversation prediction that estimates the conversation of the next conversation step from the cumulative reward value of multiple response candidates for the query of the next conversation step, and utters a response to the user's query in the current conversation step that reflects the estimated next-step query, so that a conversation can continue naturally in real time in an open domain.

The objects of the present invention are not limited to the one mentioned above. Other objects and advantages of the present invention that are not mentioned can be understood from the following description and will become clearer from the embodiments of the present invention. It will also be readily apparent that the objects and advantages of the present invention can be realized by the means indicated in the claims and combinations thereof.

A conversation agent system using conversation prediction according to an embodiment of the present invention includes:

a conversation learning model building unit that collects a plurality of conversation data and previous conversation data and builds a conversation learning model that estimates multiple response candidates for the query of the input current conversation step;

a search response candidate derivation unit that derives, from the multiple response candidates estimated in the current conversation step, one response candidate to be searched when the number of conversation steps for the query of the input current conversation step reaches a predetermined maximum number of conversation steps, or when the number of searches for the query in the input current conversation step is not greater than the number of estimated response candidates; and

a response utterance unit that, for the search response candidate, takes the current conversation step as a reference, reflects the cumulative reward value of the multiple response candidates for each of the multiple query candidates in the next conversation step to derive one response candidate from the multiple response candidates of the current conversation step, and utters the derived response candidate.

Preferably, the response utterance unit may include:

a current reward value calculation module that calculates the reward value of the search response candidate and outputs a current reward value;

a future cumulative reward value calculation module that, for the search response candidate, derives the cumulative reward value of the multiple response candidates for the multiple query candidates in the next conversation step, taking the current conversation step as a reference;

a search response candidate reward value calculation module that derives the reward value of the search response candidate as the sum of the future cumulative reward value and the current reward value;

an average reward value derivation module that derives the average of the search response candidate reward values for the multiple response candidates; and

an utterance module that derives one response from the multiple response candidates using the average of all search response candidate reward values for the multiple response candidates, and utters the derived response.

Preferably, the current reward value calculation module may further include:

a filter module that, when the calculated current reward value is smaller than a predetermined minimum allowable reward value, determines the search response candidate to be a malicious response candidate and deletes it from the multiple response candidates.

Preferably, the system may further include:

an additional response candidate generation unit that additionally estimates a new response candidate when the number of conversation steps for the query of the input current conversation step has not reached the predetermined maximum number of conversation steps and the number of searches for the query of the input current conversation step is greater than the number of estimated response candidates.

Preferably, the cumulative reward value may be provided by a GAE (Generalized Advantage Estimation) model that optimizes the cumulative reward value with a policy gradient technique.

According to another embodiment of the present invention, a conversation agent method using conversation prediction includes:

a conversation learning model building step of collecting a plurality of conversation data and previous conversation data and building a conversation learning model that estimates multiple response candidates for the query of the input current conversation step;

a search response candidate derivation step of deriving, from the multiple response candidates estimated in the current conversation step, one response candidate to be searched when the number of conversation steps for the query of the input current conversation step reaches a predetermined maximum number of conversation steps, or when the number of searches for the query in the input current conversation step is not greater than the number of estimated response candidates; and

a response utterance step of, for the search response candidate, taking the current conversation step as a reference, reflecting the cumulative reward value of the multiple response candidates for each of the multiple query candidates in the next conversation step to derive one response candidate from the multiple response candidates of the current conversation step, and uttering the derived response candidate.

Preferably, the response utterance step may include:

calculating the reward value of the search response candidate and outputting a current reward value;

deriving, for the search response candidate, the cumulative reward value of the multiple response candidates for the multiple query candidates in the next conversation step, taking the current conversation step as a reference;

deriving the reward value of the search response candidate as the sum of the future cumulative reward value and the current reward value;

deriving the average of the search response candidate reward values for the multiple response candidates; and

deriving one response from the multiple response candidates using the average of all search response candidate reward values for the multiple response candidates, and uttering the derived response.

Preferably, the step of calculating the current reward value may further include:

determining the search response candidate to be a malicious response candidate when the calculated current reward value is smaller than a predetermined minimum allowable reward value, and deleting it from the multiple response candidates.

Preferably, the method may further include:

an additional response candidate generation step of additionally estimating a new response candidate when the number of conversation steps for the query of the input current conversation step has not reached the predetermined maximum number of conversation steps and the number of searches for the query of the input current conversation step is greater than the number of estimated response candidates.

According to an embodiment, the system takes the current conversation step as a reference, reflects the cumulative reward value accumulated over the reward values of the response candidates for each of the query candidates in future conversation steps, derives one response candidate from the multiple response candidates in the current conversation step, and utters it, so that a natural conversation can continue in real time in an open domain.

In addition, according to an embodiment, deleting malicious response candidates from the multiple response candidates has the effect of preventing in advance the emotional harm caused by malicious conversation.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description below, serve to aid understanding of the technical idea of the present invention. The present invention should therefore not be interpreted as limited to the matters shown in the drawings.
Figure 1 is a configuration diagram of a conversation agent system using conversation prediction according to an embodiment.
Figure 2 is a detailed configuration diagram of the response utterance unit of the system of an embodiment.
Figure 3 is an overall flowchart of a conversation agent process according to another embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustration only and may be modified and implemented in various forms. Accordingly, the embodiments are not limited to the specific forms disclosed, and the scope of this specification includes changes, equivalents, and substitutes that fall within its technical idea.

Terms such as first and second may be used to describe various components, but these terms should be interpreted only as distinguishing one component from another. For example, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is said to be "connected" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to designate the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood as not excluding in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the relevant technical field. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related technology and, unless explicitly defined in this specification, are not to be interpreted in an idealized or excessively formal sense.

Hereinafter, the present invention is described in detail by explaining preferred embodiments with reference to the attached drawings.

Figure 1 is a configuration diagram of a conversation agent system using conversation prediction according to an embodiment, and Figure 2 is a detailed configuration diagram of the future conversation estimation unit of Figure 1.

Referring to Figures 1 and 2, a conversation agent system using conversation prediction according to an embodiment is configured to take the current conversation step as a reference, reflect the cumulative reward value accumulated over the reward values of each of the multiple response candidates for each of the multiple query candidates in future conversation steps, derive one response candidate from the multiple response candidates in the current conversation step, and utter the derived response candidate. To this end, the system may include at least one of a conversation learning model building unit (100), a search response candidate derivation unit (200), a response utterance unit (300), and an additional response candidate generation unit (400).

Here, the conversation learning model building unit (100) may collect a large amount of conversation data, such as dialogue from movie and drama subtitles and from social media. It also collects a plurality of previous conversation data between the user and the conversation agent system, and stores the collected conversation data in the form of training data.

For the user's query in the input current conversation step, the system performs learning based on the pre-built conversation learning model to derive multiple response candidates, and the derived response candidates are passed to the search response candidate derivation unit (200). Each response candidate is derived as the probability that response candidate x occurs for the query h_t. How multiple response sentences are generated for an input query sentence from a conversation learning model should be understood at the level of a person skilled in the art.

The search response candidate derivation unit (200) derives a search response candidate x_t from the multiple response candidates estimated in the current conversation step when the number of conversation steps, depth, for the user's query in the input current conversation step reaches the predetermined maximum number of conversation steps, Max_Depth, or when the number of searches N(h_t) for the query in the input current conversation step is not greater than the number of estimated response candidates, children(h_t). For example, in a conversation between the user and the system, one response sentence to one query sentence is defined as one conversation step, and the conversation between the user and the system proceeds over multiple conversation steps. The maximum number of conversation steps is predetermined.

Here, the search response candidate x_t can be expressed by Equation 1 below.

[Equation 1]

Here, Q(h_t, x) is the reward value for search response candidate x among the multiple response candidates estimated for the query h_t at the current step, N(h_t) is the number of searches among the multiple response candidates for the query h_t, and N(h_t, x) is the number of searches of response candidate x among the multiple response candidates in the current conversation step. The number of searches is the number of response candidates, among the multiple response candidates, that have been uttered based on their reward values. For example, if the query h_t in the current conversation step has 10 response candidates and 5 of them have been uttered based on the calculated reward values, N(h_t) is 5.
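Equation 1 itself is not reproduced in this text. The three quantities defined above, Q(h_t, x), N(h_t), and N(h_t, x), are the ones used by the UCB-style selection rules common in Monte Carlo tree search, so the following is only a minimal sketch under the assumption that Equation 1 has that form. The exploration constant `c`, the function name `select_search_candidate`, treating N(h_t) as the sum of the per-candidate counts, and the rule of trying unvisited candidates first are all illustrative assumptions, not taken from the specification.

```python
import math

def select_search_candidate(q_values, visit_counts, c=1.0):
    """Pick the index of the candidate to search next (assumed UCB-style rule).

    q_values[i]     -- Q(h_t, x_i), the reward estimate of candidate i
    visit_counts[i] -- N(h_t, x_i), how often candidate i has been searched
    c               -- hypothetical exploration constant
    """
    total = sum(visit_counts)  # stands in for N(h_t); an assumption
    best_index, best_score = None, -math.inf
    for i, (q, n) in enumerate(zip(q_values, visit_counts)):
        if n == 0:
            return i  # assumption: unvisited candidates are tried first
        # Reward estimate plus a bonus that shrinks as the candidate is searched more.
        score = q + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

With equal reward estimates, a rule of this kind favors the candidate that has been searched less often, which is what balances exploring new responses against reusing well-scored ones.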

The search response candidate x_t is then passed to the response utterance unit (300). For the search response candidate x_t, the response utterance unit (300) takes the current conversation step t as a reference, reflects the cumulative reward value of the multiple response candidates for the multiple query candidates in the next conversation step t+1, derives one response candidate from the multiple response candidates of the current conversation step, and utters the derived response candidate. To this end, the response utterance unit (300) may include a current reward value calculation module (310), a future cumulative reward value calculation module (320), a final reward value calculation module (330), an average reward value derivation module (340), and an utterance module (350).

Here, the current reward value calculation module (310) performs learning on the received search response candidate based on a pre-built reward calculation model to derive the current reward value, and the derived current reward value is passed to the final reward value calculation module (330). Deriving a reward value for a search response candidate from a reward calculation model is already done in existing conversation agent systems. This specification does not describe the process in detail, but it should be understood at the level of a person skilled in the art.

Meanwhile, for the search response candidate x_t, the future cumulative reward value calculation module (320) takes the current conversation step t as a reference and accumulates the reward values of the multiple response candidates for each of the multiple query candidates in the next conversation step t+1 to derive the cumulative reward value. The cumulative reward value is derived from a learning model built with the GAE (Generalized Advantage Estimation) algorithm, which optimizes the reward value of each response candidate with a policy gradient technique. That is, a model built with GAE is a policy gradient method that optimizes cumulative rewards: it reduces the variance of the policy gradient estimate by using an exponentially weighted estimator of the advantage function, uses a value function to adjust the bias-variance tradeoff, and performs trust region optimization. Based on the GAE algorithm, the reward values of the multiple response candidates for each query candidate at the next search time t+1 are estimated for the input search response candidate, and the estimated reward values are accumulated and output as the cumulative reward value.
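To make the exponentially weighted advantage estimator concrete, here is a minimal sketch of the standard GAE computation over one finite trajectory. It uses the usual definitions, a temporal-difference residual delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and an advantage that sums those residuals weighted by (gamma * lam)^l. The function name `gae_advantages`, the default values of `gamma` and `lam`, and the plain-list inputs are assumptions for illustration; the specification does not say how these are set or how rewards and values are produced.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one finite trajectory.

    rewards -- r_0 .. r_{T-1}
    values  -- V(s_0) .. V(s_T); one more entry than rewards (bootstrap value)
    gamma   -- discount factor (assumed default)
    lam     -- GAE weighting parameter (assumed default)
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step reuses the discounted sum of later residuals.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam` close to 0 relies mostly on the value function (lower variance, more bias), while `lam` close to 1 relies mostly on the observed rewards (less bias, higher variance), which is the tradeoff the paragraph above refers to.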

The output future cumulative reward value is then passed to the final reward value calculation module (330).

The final reward value calculation module (330) derives the final reward value as the sum of the future cumulative reward value and the current reward value of the multiple response candidates for one query, and the derived final reward value is passed to the average reward value derivation module (340).

The average reward value derivation module (340) derives the average reward value of the multiple response candidates for all query candidates of the next conversation step t+1 as the mean of the cumulative reward values of the multiple response candidates for one query, and passes the derived average reward value to the utterance module (350).

The utterance module (350) derives the response candidate for the current conversation step from the average reward value and utters the derived response candidate. The series of steps that derives one response candidate from the multiple response candidates using the average reward value is already used in existing conversation agent systems. This specification does not describe in detail how one response candidate is selected by reflecting the derived reward value and then uttered, but it should be understood at the level of a person skilled in the art.
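The chain of modules 330, 340, and 350 can be sketched as follows. The sum of the current and future cumulative reward comes from the description above; picking the candidate with the highest average final reward is an assumption, since the text leaves the selection rule to the skilled person. The names `final_reward` and `choose_response` and the dictionary layout of `samples` are hypothetical.

```python
from statistics import mean

def final_reward(current_reward, future_cumulative_reward):
    """Module 330: final reward = current reward + future cumulative reward."""
    return current_reward + future_cumulative_reward

def choose_response(samples):
    """Modules 340 and 350, sketched.

    samples maps each response candidate to a list of
    (current_reward, future_cumulative_reward) pairs, one pair per search.
    Returns (best_candidate, averages). Choosing the highest average is an
    assumed rule, not one stated in the specification.
    """
    averages = {
        candidate: mean(final_reward(r, f) for r, f in pairs)
        for candidate, pairs in samples.items()
        if pairs  # candidates that were never searched have no average
    }
    best = max(averages, key=averages.get)
    return best, averages
```

For example, `choose_response({"a": [(0.2, 0.4), (0.4, 0.2)], "b": [(0.9, 0.5)]})` selects `"b"`, because its average final reward is higher than that of `"a"`.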

Meanwhile, the response utterance unit (300) further includes a filter module (360). The filter module (360) compares the current reward value with a predetermined minimum allowable reward value and, if the current reward value is smaller than the minimum allowable reward value, determines the derived search response candidate to be a malicious response candidate and removes it from the multiple response candidates. Deleting malicious conversation in this way prevents emotional harm between the user and the system.
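The threshold test performed by the filter module can be sketched in a few lines. The function name `filter_malicious` and the parallel-list inputs are assumptions for illustration; the comparison itself (remove when the current reward is smaller than the minimum allowable reward) follows the description above.

```python
def filter_malicious(candidates, current_rewards, min_reward):
    """Keep only candidates whose current reward is at least min_reward.

    candidates      -- response candidates for the current conversation step
    current_rewards -- current reward value of each candidate, same order
    min_reward      -- predetermined minimum allowable reward value
    """
    kept = []
    for candidate, reward in zip(candidates, current_rewards):
        if reward < min_reward:
            continue  # judged malicious: removed from the candidate set
        kept.append(candidate)
    return kept
```

A candidate whose reward equals the threshold is kept, because the text removes a candidate only when its reward is strictly smaller than the minimum.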

The system of an embodiment further includes an additional response candidate generation unit (400). The additional response candidate generation unit (400) additionally estimates a new response candidate when, in the search response candidate derivation unit (200), the number of conversation steps for the query of the input current conversation step has not reached the predetermined maximum number of conversation steps and the number of searches for the query of the input current conversation step is greater than the number of estimated response candidates.
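The conditions that route control either to the search response candidate derivation unit (200) or to the additional response candidate generation unit (400) are complementary, which a small sketch makes explicit. The function name `next_action`, the string return values, and reading "reaches the maximum" as `depth >= max_depth` are assumptions for illustration.

```python
def next_action(depth, max_depth, search_count, num_candidates):
    """Decide between selecting an existing candidate and generating a new one.

    depth          -- number of conversation steps for the current query
    max_depth      -- Max_Depth, the predetermined maximum number of steps
    search_count   -- N(h_t), number of searches for the query
    num_candidates -- children(h_t), number of estimated response candidates
    """
    if depth >= max_depth or search_count <= num_candidates:
        return "select"  # unit 200: derive a search response candidate
    return "expand"      # unit 400: estimate an additional response candidate
```

Under this reading, a new candidate is generated only while the step limit has not been reached and the existing candidates have been searched more times than there are candidates.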

Accordingly, an embodiment takes the current conversation step as a reference, reflects the cumulative reward value accumulated over the reward values of the response candidates for each of the query candidates in future conversation steps, derives one response candidate from the multiple response candidates in the current conversation step, and utters it. This allows a natural conversation to continue in real time in an open domain, and deleting malicious response candidates from the multiple response candidates prevents malicious conversation in advance.

Figure 3 is an overall flowchart showing the operation of the conversation agent system using conversation prediction of an embodiment. A conversation agent method using conversation prediction according to another embodiment of the present invention is described with reference to Figures 1 to 3.

Assume that the conversation learning model building unit 100 of one embodiment has collected a plurality of conversation data, such as dialogue from movie and drama subtitles and social media, together with previous conversation data, and has built a conversation learning model that estimates a plurality of response candidates for an input query.

In step S11, the search response candidate derivation unit 200 of one embodiment derives a plurality of response candidates for the query input in the current dialogue step. When the number of dialogue steps depth for that query has reached the predetermined maximum number of dialogue steps Max_depth (step S13), or the number of searches N(h_t) over the response candidates for the query is not greater than the number of estimated response candidates children(h_t) (step S15), the unit derives a search response candidate from among the plurality of response candidates (step S17).

Then, in step S19, the response utterance unit 300 of one embodiment derives a current reward value for the derived search response candidate by learning based on a pre-built reward calculation model.

Meanwhile, in steps S21 and S23, the response utterance unit 300 of one embodiment compares the current reward value r with a predetermined minimum allowable reward value MIN_R. If r is smaller than MIN_R, the unit judges the exchange to be a malicious conversation and deletes the derived search response candidate.
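
The threshold check of steps S21 and S23 can be illustrated with a minimal Python sketch. The function name `filter_malicious`, the value chosen for `MIN_R`, and the toy reward table are assumptions made only for illustration; the source does not specify a threshold value or the form of the reward calculation model.

```python
from typing import Callable, List

MIN_R = 0.2  # hypothetical threshold; the source gives no concrete value


def filter_malicious(candidates: List[str],
                     reward_fn: Callable[[str], float],
                     min_r: float = MIN_R) -> List[str]:
    """Keep only candidates whose current reward r is not smaller than min_r (S21, S23)."""
    return [c for c in candidates if reward_fn(c) >= min_r]


# Toy reward table standing in for the pre-built reward calculation model.
toy_rewards = {"Nice to meet you.": 0.9, "Go away.": 0.05, "How was your day?": 0.7}
kept = filter_malicious(list(toy_rewards), toy_rewards.__getitem__)
```

In this sketch the candidate with reward 0.05 falls below `MIN_R` and is dropped, while the other two candidates proceed to step S25.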

If the current reward value r is not smaller than the predetermined minimum allowable reward value MIN_R, then in step S25 the response utterance unit 300 of one embodiment derives a future cumulative reward value for the search response candidate x_t. Taking the current dialogue step t as the reference, it accumulates the reward values of the plurality of response candidates for each of the plurality of query candidates at the next dialogue step t+1.

Then, in step S27, the response utterance unit 300 of one embodiment sums the current reward value and the future cumulative reward value to derive the final reward value of the plurality of response candidates for one query candidate, and stores the result.
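
A rough sketch of steps S25 and S27 follows. It assumes that "accumulating" means a plain undiscounted sum, which the source does not confirm; the function names and the sample rewards are likewise hypothetical.

```python
from typing import Dict, List


def future_cumulative_reward(next_step_rewards: Dict[str, List[float]]) -> float:
    """Sum the rewards of every response candidate under every query candidate at step t+1 (S25).

    Assumption: a plain sum with no discount factor.
    """
    return sum(sum(rewards) for rewards in next_step_rewards.values())


def final_reward(current_reward: float,
                 next_step_rewards: Dict[str, List[float]]) -> float:
    """Final reward = current reward + future cumulative reward (S27)."""
    return current_reward + future_cumulative_reward(next_step_rewards)


# Hypothetical rewards for two query candidates predicted at step t+1.
next_step = {"query A": [0.5, 0.25], "query B": [0.25]}
total = final_reward(0.5, next_step)
```

Here the future cumulative reward is 1.0, so a current reward of 0.5 gives a final reward of 1.5.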

In step S29, the response utterance unit 300 of one embodiment determines whether the one query candidate is the last of the estimated plurality of query candidates. If it is the last query candidate, then in step S31 the response utterance unit 300 derives an average reward value by computing the average of the final reward values of the plurality of response candidates derived for the one query candidate.

Then, in step S33, the response utterance unit 300 of one embodiment reflects the average reward value to derive a response candidate for the user's query in the current dialogue step, and utters the derived response candidate.
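
Steps S31 and S33 might be sketched as below. The source says only that the average reward value is "reflected" when deriving the response; choosing the candidate with the highest average is an assumption of this sketch, as are the function name `select_response` and the sample data.

```python
from statistics import mean
from typing import Dict, List


def select_response(final_rewards_by_candidate: Dict[str, List[float]]) -> str:
    """Average each candidate's final rewards (S31) and pick one candidate (S33).

    Assumption: the candidate with the highest average reward is uttered.
    """
    averages = {cand: mean(rewards)
                for cand, rewards in final_rewards_by_candidate.items()}
    return max(averages, key=averages.get)


# Hypothetical final rewards, one value per query candidate.
table = {"Sounds fun, tell me more.": [1.5, 0.5], "I see.": [0.5, 0.5]}
chosen = select_response(table)
```

With these values the averages are 1.0 and 0.5, so the first candidate is selected.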

Meanwhile, if it is determined in steps S13 and S15 that the number of dialogue steps for the input query of the current dialogue step has not reached the predetermined maximum number of dialogue steps and the number of searches for that query is greater than the number of estimated response candidates, then in step S35 the additional response candidate generation unit 400 of one embodiment adds a response candidate for the user's query in the current dialogue step.
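
The branch taken at steps S13 and S15 can be summarized in a short sketch. The function name `next_action` and the returned labels are hypothetical, and "reached" is read here as `depth >= max_depth`.

```python
def next_action(depth: int, max_depth: int, n_searches: int, n_children: int) -> str:
    """Decide between deriving a search candidate (S17) and adding a new one (S35).

    depth      -- number of dialogue steps for the current query
    max_depth  -- predetermined maximum number of dialogue steps (Max_depth)
    n_searches -- number of searches N(h_t) for the current query
    n_children -- number of estimated response candidates children(h_t)
    """
    if depth >= max_depth or n_searches <= n_children:
        return "derive_search_candidate"  # S17
    return "add_response_candidate"  # S35
```

For example, `next_action(1, 3, 5, 2)` returns `"add_response_candidate"` because the depth limit has not been reached and the query has been searched more often than it has candidates.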

Accordingly, one embodiment takes the current dialogue step as a reference and reflects the cumulative reward value, accumulated over the reward values of the response candidate groups for each of the plurality of query candidates in a future dialogue step, to derive one response candidate from among the plurality of response candidates in the current dialogue step. By uttering the derived response candidate, a natural conversation can be continued in real time in an open domain, and by deleting malicious response candidates from among the plurality of response candidates, malicious conversation can be prevented in advance.

The embodiments described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that execute on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but those skilled in the art will recognize that a processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may command the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine language code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, those skilled in the art can apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described system, structure, device, or circuit are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

100: Conversation learning model building unit
200: Search response candidate derivation unit
300: Response utterance unit
310: Current reward value calculation module
320: Future cumulative reward value calculation module
330: Final reward value calculation module
340: Average reward value derivation module
350: Utterance module
400: Additional response candidate generation unit

Claims

A conversation agent system using conversation prediction, comprising:
a conversation learning model building unit that collects a plurality of conversation data and previous conversation data and builds a conversation learning model that estimates a plurality of response candidates for an input query of a current dialogue step;
a search response candidate derivation unit that, when the number of dialogue steps for the input query of the current dialogue step reaches a predetermined maximum number of dialogue steps, or the number of searches for the input query of the current dialogue step is not greater than the number of estimated response candidates, derives one response candidate to be searched from among the estimated plurality of response candidates; and
a response utterance unit that, for the search response candidate and with the current dialogue step as a reference, reflects the cumulative reward value of the plurality of response candidates for each query candidate, obtained as a learning result of the constructed conversation learning model in the next dialogue step, to derive one response candidate from among the plurality of response candidates of the current dialogue step, and utters the derived response candidate.

The system of claim 1, wherein the response utterance unit comprises:
a current reward value calculation module that calculates the reward value of the search response candidate and outputs a current reward value;
a future cumulative reward value calculation module that derives, for the search response candidate and with the current dialogue step as a reference, a cumulative reward value of a plurality of response candidates for a plurality of query candidates in the next dialogue step;
a search response candidate reward value calculation module that derives a reward value of the search response candidate from the sum of the future cumulative reward value and the current reward value;
an average reward value derivation module that derives the average of the search response candidate reward values for the plurality of response candidates; and
an utterance module that derives one response from among the plurality of response candidates based on the average of the reward values of all search response candidates for the plurality of response candidates, and utters the derived response.

The system of claim 2, wherein the current reward value calculation module further comprises a filter module that, when the calculated current reward value is smaller than a predetermined minimum allowable reward value, determines the search response candidate to be a malicious response candidate and deletes the search response candidate from the plurality of response candidates.

The system of claim 1, further comprising an additional response candidate generation unit that additionally estimates a new response candidate when the number of dialogue steps for the input query of the current dialogue step has not reached the predetermined maximum number of dialogue steps and the number of searches for the input query of the current dialogue step is greater than the number of estimated response candidates.

The system of claim 2, wherein the cumulative reward value is derived by a GAE (Generalized Advantage Estimation) algorithm that optimizes the cumulative reward value using a policy gradient technique.

A conversation agent method using conversation prediction, performed by a conversation agent system using conversation prediction that includes a conversation learning model building unit, a search response candidate derivation unit, and a response utterance unit, the method comprising:
a conversation learning model building step of collecting a plurality of conversation data and previous conversation data to build a conversation learning model that estimates a plurality of response candidates for an input query of a current dialogue step;
a search response candidate derivation step of deriving one response candidate to be searched from among the estimated plurality of response candidates when the number of dialogue steps for the input query of the current dialogue step reaches a predetermined maximum number of dialogue steps, or the number of searches for the input query of the current dialogue step is not greater than the number of estimated response candidates; and
a response utterance step of, for the search response candidate and with the current dialogue step as a reference, reflecting the cumulative reward value of the plurality of response candidates for each query candidate, obtained as a learning result of the constructed conversation learning model in the next dialogue step, to derive one response candidate from among the plurality of response candidates of the current dialogue step, and uttering the derived response candidate.

The method of claim 6, wherein the response utterance step comprises:
calculating the reward value of the search response candidate to obtain a current reward value;
deriving, for the search response candidate and with the current dialogue step as a reference, a future cumulative reward value of a plurality of response candidates for a plurality of query candidates in the next dialogue step;
deriving a reward value of the search response candidate from the sum of the future cumulative reward value and the current reward value;
deriving the average of the search response candidate reward values for the plurality of response candidates; and
deriving one response from among the plurality of response candidates based on the average of the reward values of all search response candidates for the plurality of response candidates, and uttering the derived response.

The method of claim 7, wherein calculating the current reward value further comprises determining the search response candidate to be a malicious response candidate and deleting the search response candidate from the plurality of response candidates when the calculated current reward value is smaller than a predetermined minimum allowable reward value.

The method of claim 6, further comprising an additional response candidate generation step of additionally estimating a new response candidate when the number of dialogue steps for the input query of the current dialogue step has not reached the predetermined maximum number of dialogue steps and the number of searches for the input query of the current dialogue step is greater than the number of estimated response candidates.

A computer-readable recording medium on which a program for executing the conversation agent method using conversation prediction according to any one of claims 6 to 9 is recorded.