KR20200111595A

KR20200111595A - Conversation agent system and method using emotional history

Info

Publication number: KR20200111595A
Application number: KR1020190043129A
Authority: KR
Inventors: 신홍식; 홍은미; 이청안
Original assignee: 한국전자인증 주식회사
Priority date: 2019-03-19
Filing date: 2019-04-12
Publication date: 2020-09-29
Also published as: KR102323482B1

Abstract

The present technique relates to a conversation agent system and method using an utterance emotion history. According to a specific embodiment, training data is derived from a large number of collected original data by applying a discounted cumulative reward value intended to induce a change in emotion, and a conversation model is built by training on the derived data with a preset learning technique. The built conversation model generates a response sentence and a response emotion for each input utterance sentence and utterance emotion; the generated response emotion and response sentence are converted into natural-language form, combined, and uttered. This improves the accuracy with which the speaker's emotion is handled and enables an effective conversation that reflects that emotion.

Description

Conversation agent system and method using speech emotion history {CONVERSATION AGENT SYSTEM AND METHOD USING EMOTIONAL HISTORY}

The present invention relates to a conversation agent system and method using an utterance emotion history, and more particularly to a technique that generates response sentences reflecting a discounted cumulative reward value for inducing a change in the speaker's emotion, so that a conversation actively reflecting the speaker's emotion can be conducted.

Earlier research on conversational systems concentrated on answering the uttered sentence without considering the user's emotion, but research on conversational systems that take emotion into account has recently become active.

A deep learning encoder applied to such a conversational system uses deep learning to represent a variable-length document as a fixed-length document vector, and can perform well in emotion classification.

However, an LSTM (Long Short-Term Memory) encoder that takes the last output of the whole document sequence as the document vector has a problem: as the input grows longer, its recognition of patterns that were input early drops sharply, which makes it unsuitable as an encoder for long documents.

An object of the present invention is to provide a conversation agent system and method using an utterance emotion history that generates and utters response sentences and response emotions reflecting a discounted cumulative reward value by which a change in the speaker's emotion is induced, thereby improving the accuracy of the conversation and enabling an effective conversation that reflects emotion.

The objects of the present invention are not limited to the one mentioned above. Other objects and advantages not mentioned here can be understood from the following description and will become clearer from the embodiments of the present invention. It will also be readily appreciated that the objects and advantages of the present invention can be realized by the means indicated in the claims and combinations thereof.

A conversation agent system using an utterance emotion history according to an embodiment may comprise:

a conversation model building device that calculates, for a plurality of collected original data, a discounted cumulative reward value for inducing a change in the speaker's emotion, and builds a conversation model reflecting the calculated discounted cumulative reward value;

a receiving device that receives the sentence and the emotion contained in an utterance;

a preprocessing device that preprocesses the sentence and emotion extracted by the receiving device into a single piece of training data;

a response generating device that derives a response sentence and a response emotion for that single piece of training data based on the built conversation model; and

an output device that converts the derived response sentence and response emotion into natural-language form and then combines them, so that the response emotion to the utterance emotion and the response sentence to the utterance sentence are uttered together.

Preferably, the conversation model building device may comprise:

an original data collection module that collects a plurality of original data;

a discounted cumulative reward calculation module that calculates a discounted cumulative reward value for each original datum;

a learning module that generates training data by applying the calculated discounted cumulative reward value to each original datum, and then trains on the generated training data using a predetermined learning algorithm; and

a conversation model building module that builds the conversation model based on the training result.

Preferably, the discounted cumulative reward calculation module may be configured to:

initialize the discounted reward value to reward = 0;

if index + 2n + 2 < 2, determine whether the emotion of episode x[index + 2n + 2] is happiness and, if so, set current reward = previous reward + rⁿ; and

increment n to n + 1 and repeat the above for all original data.

Here, r is the discount rate; x[index + 2n + 2] is one episode of the original data, consisting of a sentence and an emotion; index is the identifier of the original datum; and n is the distance between the training datum corrected by the discounted cumulative reward value and the original datum.

Preferably, the discounted cumulative reward value increases relative to the previous discounted cumulative reward value when a sentence with the same emotion is uttered immediately in reaction to a given response sentence in the original data, and decreases relative to the previous discounted cumulative reward value when a sentence with the same emotion is uttered only after a predetermined number of turns have passed.

Preferably, the conversation agent system using the utterance emotion history may further comprise

a model updating device that updates the conversation model by training, through policy gradient training under a predetermined reinforcement learning policy, on the input utterance sentence and emotion and the output response sentence and emotion.

A conversation agent method using an utterance emotion history according to an embodiment comprises:

a conversation model building step of calculating, for a plurality of collected original data, a discounted cumulative reward value for inducing a change in the speaker's emotion, and building a conversation model reflecting the calculated discounted cumulative reward value;

a receiving step of receiving the sentence and the emotion contained in an utterance;

a preprocessing step of preprocessing the extracted sentence and emotion into a single piece of training data;

a response generating step of deriving a response sentence and a response emotion for that single piece of training data based on the built conversation model; and

an output step of converting the derived response sentence and response emotion into natural-language form and then combining them, so that the response emotion to the utterance emotion and the response sentence to the utterance sentence are uttered together.

Preferably, the conversation model building step may be configured to:

collect a plurality of original data;

calculate a discounted cumulative reward value for each original datum;

generate training data by applying the calculated discounted cumulative reward value to each original datum, and then train on the generated training data using a predetermined learning algorithm; and

build the conversation model based on the training result.

Preferably, the discounted cumulative reward value

is initialized to reward = 0, and

increases relative to the previous discounted cumulative reward value when a sentence with the same emotion is uttered immediately in reaction to a given response sentence in the original data, and decreases relative to the previous discounted cumulative reward value when a sentence with the same emotion is uttered only after a predetermined number of turns have passed.

Preferably, the method may further comprise, after the output step,

a model updating step of updating the conversation model by training, through policy gradient training under a predetermined reinforcement learning policy, on the input utterance sentence and emotion and the output response sentence and emotion.
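The policy gradient update can be illustrated with a minimal REINFORCE-style sketch. This is not the patent's implementation: the policy here is a softmax over three hypothetical candidate responses rather than an encoder-decoder model, and `toy_reward`, `train`, the learning rate, and the seed are all assumptions made for illustration.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits: List[float], action: int, reward: float, lr: float) -> List[float]:
    """One REINFORCE step: logits += lr * reward * grad log pi(action)."""
    probs = softmax(logits)
    # For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - p_i.
    return [
        z + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (z, p) in enumerate(zip(logits, probs))
    ]


def toy_reward(action: int) -> float:
    """Hypothetical reward: only response 0 leaves the speaker happy."""
    return 1.0 if action == 0 else 0.0


def train(steps: int = 200, lr: float = 0.5, seed: int = 0) -> List[float]:
    """Sample responses from the policy and reinforce the rewarded ones."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]  # three hypothetical candidate responses
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(list(range(len(logits))), weights=probs)[0]
        logits = reinforce_update(logits, action, toy_reward(action), lr)
    return softmax(logits)


if __name__ == "__main__":
    print(train())  # probability mass should shift toward response 0
```

Responses that earn no reward leave the policy unchanged in this sketch, so the probability of the rewarded response can only grow over time.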

According to an embodiment, training data is derived from a large number of collected original data by applying a discounted cumulative reward value for inducing a change in emotion, and a conversation model is built by training on the derived training data with a preset learning technique. Using the built conversation model, a response sentence and a response emotion are generated for each input utterance sentence and utterance emotion; the generated response emotion and response sentence are converted into natural-language form, combined, and uttered. This improves the accuracy with which the speaker's emotion is handled and enables an effective conversation that reflects emotion.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description below, serve to further the understanding of the technical idea of the invention; the present invention should therefore not be construed as limited to what is shown in the drawings.
FIG. 1 is a system configuration diagram according to an embodiment.
FIG. 2 is a detailed configuration diagram of the conversation model building device of the system according to an embodiment.
FIG. 3 shows examples of original data and training data according to an embodiment.
FIG. 4 is an overall flowchart showing the operation of the system according to an embodiment.
FIG. 5 is a detailed flowchart of the conversation model building process according to an embodiment.

The present invention is applied to a conversational system. However, it is not limited thereto and may be applied to any interactive communication system and method to which the technical idea of the invention is applicable.

Note that the technical terms used in this specification are used only to describe particular embodiments and are not intended to limit the present invention. Unless specifically defined otherwise in this specification, technical terms should be interpreted with the meaning generally understood by a person of ordinary skill in the art to which the present invention belongs, and should not be interpreted in an excessively broad or excessively narrow sense. If a technical term used in this specification is an incorrect term that fails to express the idea of the present invention accurately, it should be replaced by, and understood as, a technical term that a person skilled in the art can correctly understand. General terms used in the present invention should be interpreted as defined in a dictionary or according to context, and should not be interpreted in an excessively narrow sense.

Singular expressions used in this specification include the plural unless the context clearly indicates otherwise. In this application, terms such as "comprises" or "includes" should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may be absent, and additional components or steps may be present.

Terms containing ordinal numbers, such as "first" and "second", may be used in this specification to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be called a second component, and similarly a second component may be called a first component.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that component, or other components may be present in between. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component is present in between.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. Identical or similar components are given the same reference numerals regardless of the figure, and redundant descriptions of them are omitted. Detailed descriptions of related known technology are also omitted where they could obscure the gist of the present invention. Note that the accompanying drawings are provided only to make the idea of the present invention easier to understand and should not be construed as limiting it. The idea of the present invention should be construed as extending to all modifications, equivalents, and substitutes beyond the accompanying drawings.

In an embodiment, a conversation model is built from training data that reflects a discounted cumulative reward value derived from original dialogue data between a speaker and an interlocutor. The model is used to generate response words and a response emotion for the words and the emotion contained in the speaker's sentence; the generated response emotion and response words are each converted into natural language, combined into a response sentence, and uttered.

FIG. 1 shows the configuration of a conversation agent system using an emotion history according to an embodiment, FIG. 2 shows the detailed configuration of the conversation model building device (100) shown in FIG. 1, and FIG. 3 shows the original data and training data collected to build the conversation model shown in FIG. 2.

Referring to FIGS. 1 to 3, the system according to an embodiment may include a conversation model building device (100), a receiving device (200), a preprocessing device (300), a response generating device (400), and an output device (500).

The conversation model building device (100) may derive training data from a plurality of collected original data by applying a discounted cumulative reward value for inducing a change in the speaker's emotion, and may build a conversation model by training on the derived training data. To this end, as shown in FIG. 2, the conversation model building device (100) may include an original data collection module (111), a discounted cumulative reward calculation module (112), a learning module (113), and a conversation model building module (114).

The original data collection module (111) may collect a plurality of original data and pass them to the discounted cumulative reward calculation module (112). As shown in FIG. 3(a), one original datum consists of the sentence and the emotion contained in an utterance exchanged between the interlocutors.

The discounted cumulative reward calculation module (112) may calculate the discounted cumulative reward value (reward), based on a predetermined loss function, from at least one sentence and emotion that follow an utterance in the collected original data.

The discounted cumulative reward value (reward) can be calculated by solving the loss function with the following algorithm.

(1) Initialize the discounted reward value to reward = 0.

(2) If index + 2n + 2 < 2, determine whether the emotion of episode x[index + 2n + 2] is happiness and, if so, set current reward = previous reward + rⁿ.

(3) Increment n to n + 1 and return to step (2).

Here, r is the discount rate and has a value smaller than 1, and x[index + 2n + 2] is one original datum consisting of a sentence and an emotion. When the index of x is odd, it is an odd-numbered utterance, that is, original data uttered by interlocutor 1; when the index is even, it is original data uttered by interlocutor 2.

In addition, index is the index of the original data and indicates which utterance in the conversation episode x is; a discounted cumulative reward value (reward) can be derived for any two consecutive episodes. For example, since index and n are both 0 for the first utterance, the first episode x[index + 2n + 2] used to derive the discounted cumulative reward value is x[2]. If this episode x[2] contains the emotion happiness, the discounted cumulative reward value becomes the previous reward + rⁿ, where n = 0.

That is, n is a variable used to calculate the discounted cumulative reward value and represents the distance from the utterance to the emotion being rewarded. For example, for the first utterance and first response, the distance to the first sentence whose emotion is rewarded is n = 0; for the third utterance and third response, the distance to the second rewarded sentence is n = 1; and for the fifth utterance and fifth response, the distance to the second rewarded sentence is n = 2.
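Steps (1) to (3) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's code: the loop is assumed to continue while index + 2n + 2 is a valid position in the dialogue (the bound printed in the text is not used), and the names `Episode`, `discounted_cumulative_reward`, `target_emotion`, and the default discount rate r = 0.9 are hypothetical.

```python
from typing import List, Tuple

Episode = Tuple[str, str]  # (sentence, emotion label)


def discounted_cumulative_reward(
    episodes: List[Episode],
    index: int,
    r: float = 0.9,
    target_emotion: str = "happiness",
) -> float:
    """Sum r**n over every second episode after `index` whose emotion is the target.

    Assumption: the loop bound is the number of episodes in the dialogue.
    """
    reward = 0.0  # step (1): initialise the reward
    n = 0
    while index + 2 * n + 2 < len(episodes):
        _, emotion = episodes[index + 2 * n + 2]
        if emotion == target_emotion:
            reward += r ** n  # step (2): add the discounted term
        n += 1  # step (3): look one exchange further ahead
    return reward


if __name__ == "__main__":
    dialogue = [
        ("utterance 0", "sadness"),
        ("utterance 1", "neutral"),
        ("utterance 2", "happiness"),
        ("utterance 3", "neutral"),
        ("utterance 4", "happiness"),
    ]
    # x[2] contributes r**0 = 1.0 and x[4] contributes r**1 = 0.5.
    print(discounted_cumulative_reward(dialogue, index=0, r=0.5))  # 1.5
```

Because r is smaller than 1, happiness that appears immediately (n = 0) contributes the full term, while happiness that appears later contributes a smaller one.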

In addition, if the emotion contained in the sentence that immediately follows a given sentence in the original data, as a reaction to it, is happiness, the discounted cumulative reward value may be increased relative to the previous discounted cumulative reward value; if the emotion contained in a sentence that follows a predetermined number of turns later is happiness, the discounted cumulative reward value may likewise be increased relative to the previous discounted cumulative reward value.

The training data of FIG. 3(b), to which the calculated discounted cumulative reward value has been applied, is passed to the learning module (113), which may train on it. In an embodiment, the training may take various forms, such as machine learning or deep learning, but is not limited to these.

The training result of the learning module (113) is passed to the conversation model building module (114), which may build the conversation model from it. The conversation model is a recurrent neural network encoder-decoder model; the procedure for building such a model has been described in an earlier application by the present applicant. The conversation model building module (114) derives a discounted cumulative reward for each input sentence and output sentence, multiplies a predetermined loss function by this discounted cumulative reward as a weighting factor to obtain the final loss function, and trains on the input original data by gradient descent using the final loss function to build the final conversation model. Based on the final conversation model, response words and a response emotion can then be derived for each input utterance sentence and emotion.
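The reward-weighted loss and the gradient descent step can be sketched as below. The text does not specify the base loss, so cross-entropy over a softmax output is assumed here; the single-step model (one vector of logits instead of a recurrent encoder-decoder) and the names `weighted_loss_and_grad` and `gradient_descent_step` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_loss_and_grad(
    logits: List[float], target: int, reward: float
) -> Tuple[float, List[float]]:
    """Final loss = reward * cross-entropy, with its gradient w.r.t. the logits."""
    probs = softmax(logits)
    loss = -reward * math.log(probs[target])
    # d(cross-entropy)/d(logit_i) = p_i - 1[i == target]; the reward scales it.
    grad = [reward * (p - (1.0 if i == target else 0.0)) for i, p in enumerate(probs)]
    return loss, grad


def gradient_descent_step(
    logits: List[float], target: int, reward: float, lr: float = 0.1
) -> List[float]:
    """Move the logits one step against the gradient of the weighted loss."""
    _, grad = weighted_loss_and_grad(logits, target, reward)
    return [z - lr * g for z, g in zip(logits, grad)]


if __name__ == "__main__":
    logits = [0.0, 0.0]
    before, _ = weighted_loss_and_grad(logits, target=0, reward=2.0)
    logits = gradient_descent_step(logits, target=0, reward=2.0)
    after, _ = weighted_loss_and_grad(logits, target=0, reward=2.0)
    print(before, after)  # the weighted loss shrinks after the step
```

With this weighting, a training pair whose discounted cumulative reward is large pulls the model harder toward its response, and a pair with zero reward contributes no gradient at all.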

Meanwhile, the receiving device (200) receives the speaker's sentence. For example, when the speaker says "나는 너무 행복해." ("I am so happy."), the receiving device converts the speaker's voice into a sentence and passes it to the preprocessing device (300).

Here, for convenience of explanation, the receiving device (200) is described in this embodiment as a speech converter that simply converts the speaker's voice into a sentence, but it may also extract the emotion in the various ways described below and pass it to the preprocessing device (300).

The receiving device (200) may extract the speaker's emotion by recognizing the speaker's facial expression or behavior, or by recognizing the loudness and pitch of the speaker's voice, which reflect the emotion contained in the received words.

Specifically, a facial expression recognition algorithm extracts rapid signals such as the facial shape that changes with the movement of facial muscles, changes in the eyes, nose, and mouth, and temporary wrinkles, and the speaker's emotion is derived from these rapid signals. Here, emotion means surprise, fear, disgust, anger, happiness, or sadness. Surprise has the shortest duration; fear is felt before harm is suffered; disgust appears as a reaction of aversion to something. Anger is the most dangerous emotion and is triggered by frustration, threat, or provocation. Happiness, in contrast, is the most positive emotion, and sadness is caused by loss and lasts a long time. The emotion extracted on the basis of these characteristics is passed to the preprocessing device (300).

Meanwhile, the receiving device (200) may extract the speaker's emotion by estimating the prosodic boundaries of the speaker's voice using at least one of HMM (Hidden Markov Models), CART (Classification and Regression Trees), and SSL (Stacked Sequential Learning), or by analyzing the frequency range and amplitude characteristic of each emotion, and passes the extracted emotion to the preprocessing device (300).

In the remainder of this embodiment, for convenience of explanation, it is assumed as an example that a speech-to-text converter converts the speaker's voice into words, and that the converted words and the emotion contained in them are passed to the preprocessing device (300).

The preprocessing device 300 then separates the received sentence of the speaker into morphemes and outputs the morpheme-level words together with the emotion contained in the separated words. For example, the preprocessing device 300 outputs the words "나는" ("I"), "너무" ("so"), and "행복해" ("am happy") as the words (x₁ to x₄) and the user's emotion "happiness" as the emotion (e_x).

The emotion is derived using, for example, emotional ToBI (Tones and Break Indices, a prosodic transcription convention), and the derived emotion (e_x) is appended after the word that expresses that emotion and transmitted to the response generating device 400.
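The tagging step described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `attach_emotion`, the angle-bracket tag format, and the idea that the position of the emotion-bearing word is already known are all hypothetical and do not come from the patent text.

```python
from typing import List

# The six emotions named in the description.
EMOTIONS = {"surprise", "fear", "disgust", "anger", "happiness", "sadness"}


def attach_emotion(words: List[str], emotion: str, emotion_word_index: int) -> List[str]:
    """Return a copy of `words` with an emotion tag placed right after the
    word that expresses the emotion.

    Assumption: the index of the emotion-bearing word is supplied by an
    upstream analyzer; the patent does not say how it is located.
    """
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion}")
    if not 0 <= emotion_word_index < len(words):
        raise IndexError("emotion_word_index is outside the word list")
    tagged = list(words)
    tagged.insert(emotion_word_index + 1, f"<{emotion}>")
    return tagged


# English glosses of the example words; the last word carries the emotion.
print(attach_emotion(["I", "so", "happy"], "happiness", 2))
```

When the emotion-bearing word is the last word of the utterance, as in the example, the tag simply ends up at the tail of the sequence passed on to the response generating device 400.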

For example, the response generating device 400 generates a response sentence and a response emotion for the utterance sentence and emotion processed by the preprocessing device 300, based on the final conversation model described above. The response generating device 400 thus outputs a response emotion for each utterance emotion and a response sentence for each utterance sentence, and each of these is transmitted to the output device 500.

The output device 500 converts the received response emotion and response sentence into natural-language form, combines them to generate a response sentence, and utters the generated response sentence. For example, various response sentences and response emotions such as "I'm happy too" ("나도 행복해") are output and uttered by the interlocutor.

In addition, the output device 500 may output the emotion-reflecting response sentence and emotion in various forms. For example, the output device 500 may output and/or utter the response sentence through a character such as an avatar, with a facial expression and/or behavior matching the response emotion and with the voice intensity and pitch adjusted accordingly.

Accordingly, in one embodiment, learning material is derived from a large number of collected original data by reflecting a discounted cumulative reward value for inducing a change in emotion, and a conversation model is built by training on the derived learning material with a set learning technique. The built conversation model is used to generate a response sentence and a response emotion for the input utterance sentence and utterance emotion, respectively, and the generated response emotion and response sentence are converted into natural-language form, combined, and uttered. As a result, the accuracy with respect to the speaker's emotion can be improved, and an effective conversation that reflects emotion can be carried out.

Meanwhile, the conversation agent system using the emotion history according to one embodiment may further include a model update device 600. Having received the sentence and emotion from the preprocessing device 300, the model update device 600 may update the conversation model of the conversation model building device 100 by training on the input utterance sentence and emotion and the output response sentence and emotion with policy gradient training under a reinforcement learning policy.

The conversation model building device 100 thus trains on the sentence and emotion from the preprocessing device 300 by policy gradient training and updates the built conversation model with the training result, so that the discounted cumulative reward value can increase as the emotion becomes happier.

Although one embodiment describes happiness as an example emotion, the technique is applicable to a plurality of emotions and is not limited thereto.

FIG. 4 is a flowchart illustrating how the conversation agent system using the emotion history according to one embodiment converses in response to an utterance, and FIG. 5 is a detailed flowchart of the conversation model building device 100 of FIG. 4. A computer-readable recording medium on which a program for executing the conversation agent method using the emotion history according to one embodiment is recorded may be provided. The program may include at least one of an application program storing an item recommendation method, a device driver, firmware, middleware, a dynamic link library (DLL), and an applet. The conversation agent system using the emotion history includes a processor, and the processor may execute the conversation agent method by reading the recording medium on which the method is recorded.

Referring to FIGS. 4 and 5, in step 10, the conversation model building device 100 of the conversation agent system using the utterance emotion history may derive, for the collected original data, a discounted cumulative reward value for inducing a change in the speaker's emotion.

For example, in step 10 the conversation model building device 100 may collect a plurality of original data and then calculate the discounted cumulative reward value for the collected original data.

That is, the conversation model building device 100 collects a plurality of original data in step 11 and, in step 12, sets the initial discounted cumulative reward value to reward = 0 for each original data item. In step 13 it determines whether index+2n+2 < 2 holds, and if the condition is satisfied, it determines in step 14 whether the emotion of the episode x[index+2n+2] is happiness.

If the emotion of the episode x[index+2n+2] is happiness in step 14, the conversation model building device 100 sets, in step 15, current discounted reward value = previous discounted reward value + rⁿ, increments n to n = n + 1 in step 16, and then returns to step 13 to determine again whether index+2n+2 < 2 holds.
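The reward loop of steps 12 to 16 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the function name, the (sentence, emotion) tuple layout, the sample episode, and the value r = 0.5 are hypothetical. The loop bound, which the text prints as 2, is assumed here to be the number of turns in the episode, and n is assumed to advance even when the emotion is not happiness, a case the text does not describe.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (sentence, emotion); hypothetical layout of one turn


def discounted_cumulative_reward(
    x: List[Turn], index: int, r: float, target: str = "happiness"
) -> float:
    """Accumulate r**n for every turn x[index + 2n + 2] whose emotion is `target`."""
    reward = 0.0  # step 12: start from reward = 0
    n = 0
    # Step 13: the bound is ASSUMED to be len(x); the source prints "< 2".
    while index + 2 * n + 2 < len(x):
        _, emotion = x[index + 2 * n + 2]
        if emotion == target:  # step 14: is the emotion happiness?
            reward += r ** n   # step 15: add the discounted term r^n
        n += 1                 # step 16 (ASSUMED to run on both branches)
    return reward


# Hypothetical episode with alternating speaker and agent turns.
episode = [
    ("I failed my exam", "sadness"),
    ("That sounds hard", "sadness"),
    ("Thanks, I feel better", "happiness"),
    ("Glad to hear it", "happiness"),
    ("I'm happy now", "happiness"),
]
print(discounted_cumulative_reward(episode, index=0, r=0.5))  # 1.0 + 0.5 = 1.5
```

With r below 1, a happy utterance two turns after `index` (n = 0) contributes 1, while one that arrives later contributes only rⁿ, so earlier emotional change is rewarded more strongly.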

Then, in step 17, learning material is derived by reflecting the discounted cumulative reward value on the input original data; in step 18, training is performed on the derived learning material using a preset machine learning technique; and in step 19, the conversation model may be built based on the training result.

Meanwhile, in step 20, the receiving device 200 of the conversation agent system using the utterance emotion history may receive the sentence and emotion contained in an utterance. The sentence may be derived by a speech converter, and the emotion may be extracted by an emotion analyzer.

In step 30, the preprocessing device 300 may preprocess the extracted sentence and emotion into a single learning material.

In step 40, the response generating device 400 may derive a response sentence and a response emotion for the single learning material based on the built conversation model.

In step 50, the output device 500 may convert each of the derived response sentence and emotion into natural-language form and then output the response emotion for the utterance emotion combined with the response sentence for the utterance sentence.
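The data flow of steps 20 to 50 can be illustrated with a small Python sketch. Everything here is a stand-in chosen for illustration: the function names, the whitespace split used in place of morpheme analysis, the `toy_model` lookup used in place of the trained conversation model, and the output format are all assumptions and are not specified in the patent.

```python
from typing import Callable, List, Tuple

# A model maps (words, utterance emotion) to (response words, response emotion).
Model = Callable[[List[str], str], Tuple[List[str], str]]


def receive(utterance_text: str, detected_emotion: str) -> Tuple[str, str]:
    """Step 20: stand-in for the speech converter and emotion analyzer."""
    return utterance_text, detected_emotion


def preprocess(sentence: str, emotion: str) -> Tuple[List[str], str]:
    """Step 30: a whitespace split stands in for morpheme separation."""
    return sentence.split(), emotion


def toy_model(words: List[str], emotion: str) -> Tuple[List[str], str]:
    """Hypothetical stand-in for the conversation model used in step 40."""
    if emotion == "happiness":
        return ["I", "am", "happy", "too"], "happiness"
    return ["Tell", "me", "more"], "neutral"


def render(response_words: List[str], response_emotion: str) -> str:
    """Step 50: turn both parts into text and combine them."""
    return f"{' '.join(response_words)} ({response_emotion})"


def respond(text: str, emotion: str, model: Model = toy_model) -> str:
    sentence, emo = receive(text, emotion)
    words, emo = preprocess(sentence, emo)
    response_words, response_emotion = model(words, emo)  # step 40
    return render(response_words, response_emotion)


print(respond("I am so happy", "happiness"))  # I am happy too (happiness)
```

Passing the model in as a callable keeps the sketch independent of how the conversation model is trained, so a model built in step 19 could be substituted for `toy_model`.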

Meanwhile, in step 60, the model update device 600 may update the conversation model of the conversation model building device 100 by training on the input utterance sentence and emotion and the output response sentence and emotion with policy gradient training under a reinforcement learning policy.
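The patent names policy gradient training but gives no update rule. As one possible illustration, the sketch below uses a REINFORCE-style update, a standard policy gradient form, on a softmax policy over a small set of candidate responses. The candidate-set formulation, the function names, the learning rate, and the use of the discounted cumulative reward as the scalar reward are all assumptions made for this example.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(
    logits: List[float], action: int, reward: float, lr: float = 0.1
) -> List[float]:
    """One REINFORCE update: logits += lr * reward * grad log pi(action).

    For a softmax policy, the gradient of log pi(action) with respect to
    logit i is (1 if i == action else 0) - pi(i).
    """
    probs = softmax(logits)
    return [
        z + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (z, p) in enumerate(zip(logits, probs))
    ]


# Three hypothetical candidate responses, initially equally likely.
logits = [0.0, 0.0, 0.0]
# Suppose response 1 was uttered and earned a reward of 1.5.
logits = reinforce_step(logits, action=1, reward=1.5)
print(softmax(logits))
```

A positive reward raises the probability of the response that was actually uttered, and a negative reward lowers it, which is the mechanism by which responses followed by happier utterances would become more likely under this assumed formulation.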

Learning material is derived from a large number of collected original data by reflecting a discounted cumulative reward value for inducing a change in emotion, and a conversation model is built by training on the derived learning material with a set learning technique. The built conversation model is used to generate a response sentence and a response emotion for the input utterance sentence and utterance emotion, respectively, and the generated response emotion and response sentence are converted into natural-language form, combined, and uttered. As a result, the accuracy with respect to the speaker's emotion can be improved, and an effective conversation that reflects emotion can be carried out.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the invention may be modified and changed in various ways without departing from the spirit and scope of the invention set forth in the claims below.

100: conversation model building device
111: original data collection module
112: discounted cumulative reward value calculation module
113: learning module
114: conversation model building module
200: receiving device
300: preprocessing device
400: response generating device
500: output device
600: model update device

Claims

A conversation agent system using an utterance emotion history, comprising:
a conversation model building device that calculates, for collected original data, a discounted cumulative reward value for inducing a change in a speaker's emotion and builds a conversation model by reflecting the calculated discounted cumulative reward value;
a receiving device that receives a sentence and an emotion contained in an utterance;
a preprocessing device that preprocesses the sentence and emotion extracted by the receiving device into a single learning material;
a response generating device that derives a response sentence and a response emotion for the single learning material of the preprocessing device based on the built conversation model; and
an output device that converts each of the derived response sentence and emotion into natural-language form and then outputs the response emotion for the utterance emotion combined with the response sentence for the utterance sentence.

The system of claim 1, wherein the conversation model building device comprises:
an original data collection module that collects a plurality of original data;
a discounted cumulative reward value calculation module that calculates a discounted cumulative reward value for each of the original data;
a learning module that generates learning material by reflecting the calculated discounted cumulative reward value on each of the original data and then performs training on the generated learning material based on a predetermined learning algorithm; and
a conversation model building module that builds the conversation model based on the training result.

The system of claim 2, wherein the discounted cumulative reward value calculation module is configured to:
initially set the discounted reward value to reward = 0;
if index+2n+2 < 2, determine whether the emotion of the episode x[index+2n+2] is happiness and, if it is happiness, set current discounted reward value = previous discounted reward value + rⁿ; and
set n = n + 1 and repeat for all of the original data,
where r is the discount rate, x[index+2n+2] is an episode of one original data item having a sentence and an emotion, index is identification information of the original data, and n is distance information between the learning material corrected by the discounted cumulative reward value and the original data.

The system of claim 1, wherein the discounted cumulative reward value:
increases based on the previous discounted cumulative reward value when a sentence of the same emotion is uttered immediately after the response of an arbitrary response sentence of the original data; and
decreases based on the previous discounted cumulative reward value when the sentence of the same emotion is uttered after a predetermined number of response sentences following the arbitrary response sentence of the original data.

The system of claim 1, further comprising a model update device that updates the conversation model by training on the input utterance sentence and emotion and the output response sentence and emotion with policy gradient training under a predetermined reinforcement learning policy.

A conversation agent method using an utterance emotion history, comprising:
a conversation model building step of calculating, for collected original data, a discounted cumulative reward value for inducing a change in a speaker's emotion and building a conversation model by reflecting the calculated discounted cumulative reward value;
a receiving step of receiving a sentence and an emotion contained in an utterance;
a preprocessing step of preprocessing the extracted sentence and emotion into a single learning material;
a response generation step of deriving a response sentence and a response emotion for the single learning material based on the built conversation model; and
an output step of converting each of the derived response sentence and emotion into natural-language form and then outputting the response emotion for the utterance emotion combined with the response sentence for the utterance sentence.

The method of claim 6, wherein the conversation model building step comprises:
collecting a plurality of original data;
calculating a discounted cumulative reward value for each of the original data;
generating learning material by reflecting the calculated discounted cumulative reward value on each of the original data and then performing training on the generated learning material based on a predetermined learning algorithm; and
building the conversation model based on the training result.

The method of claim 1, wherein the discounted cumulative reward value is calculated by:
initially setting reward = 0;
if index+2n+2 < 2, determining whether the emotion of the episode x[index+2n+2] is happiness and, if it is happiness, setting current discounted reward value = previous discounted reward value + rⁿ; and
setting n = n + 1 and repeating for all of the original data,
where r is the discount rate, x[index+2n+2] is an episode of one original data item having a sentence and an emotion, index is identification information of the original data, and n is distance information between the learning material corrected by the discounted cumulative reward value and the original data.

The method of claim 8, wherein the discounted cumulative reward value:
increases based on the previous discounted cumulative reward value when a sentence of the same emotion is uttered immediately after the response of an arbitrary response sentence of the original data; and
decreases based on the previous discounted cumulative reward value when the sentence of the same emotion is uttered after a predetermined number of responses following the arbitrary response sentence of the original data.

The method of claim 6, further comprising, after the output step, a model update step of updating the conversation model by training on the input utterance sentence and emotion and the output response sentence and emotion with policy gradient training under a predetermined reinforcement learning policy.

A computer-readable medium for performing the conversation agent method using the utterance emotion history of claim 6.