KR20190074508A

KR20190074508A - Method for crowdsourcing data of chat model for chatbot

Info

Publication number: KR20190074508A
Application number: KR1020170175923A
Authority: KR
Inventors: 김주호; 정준영
Original assignee: 한국과학기술원
Priority date: 2017-12-20
Filing date: 2017-12-20
Publication date: 2019-06-28
Also published as: KR102102287B1

Abstract

The present invention relates to a conversation model for a chatbot and, more specifically, to a method for crowdsourcing data of a conversation model for a chatbot in which utterance data is collected and the user is guided into a learning situation based on the collected utterances, so that the utterance data, and with it the conversation model, is expanded. To this end, the method comprises: (a) receiving user data as input; (b) processing the user data received in step (a) with a natural language technique; (c) determining the recognition reliability of the conversation model for the user data processed in step (b); (d) when the recognition reliability determined in step (c) is below a reference value, presenting a question and identifying the cause of the low recognition reliability from the user's answer to the question; and (e) modifying the conversation model based on the cause identified in step (d).

Description

[0001] The present invention relates to a conversation model for a chatbot and, more specifically, to a method for crowdsourcing data of a conversation model for a chatbot in which utterance data is collected and the user is guided into a learning situation based on the collected utterances, so that the utterance data is expanded and the conversation model is expanded with it.

Around the world, people have begun to use messengers more than social media such as Facebook, Twitter, and Band. In messenger apps (for example, KakaoTalk or WeChat), almost every kind of service app (for example, Facebook, PayPal, Uber, and financial services) is offered inside a single messenger app. Messenger apps are thus already establishing themselves as lifestyle platforms.

A chatbot is a form of conversational interface. It can be thought of as a robot that converses in place of a person on a chat screen, or as a telephone ARS (automatic response system) brought into the messenger. In other words, the iPhone's Siri is a voice bot, and a chatbot is like Siri moved into the messenger, conversing with the user in text rather than by voice.

However, existing chatbots answer by recognizing predefined keywords through simple matching. When they fail to recognize the user's utterance accurately, they either cannot respond appropriately or respond in a way that makes further conversation impossible, which leaves the user bewildered. In addition, insufficient data prevents them from handling a variety of conversational situations.

US 20140122618 A1

The present invention was devised to solve these problems. Its object is to provide a method for crowdsourcing data of a conversation model for a chatbot that expands the conversation model by collecting utterance data and, based on the collected utterances, guiding the user into a learning situation so that the utterance data is expanded.

To achieve this object, a method for crowdsourcing data of a conversation model for a chatbot according to the present invention comprises: (a) receiving user data as input; (b) processing the user data received in step (a) with a natural language technique; (c) determining the recognition reliability of the conversation model for the user data processed in step (b); (d) when the recognition reliability determined in step (c) is below a reference value, presenting a question and identifying the reason for the low recognition reliability from the user's answer data for the question; and (e) modifying the conversation model based on the reason identified in step (d).

Preferably, when the recognition reliability determined in step (c) is at or above the reference value, the conversation proceeds along the current conversation model flow.

Preferably, the reference value is 0.8.

Preferably, the reason for the low recognition reliability is one of two cases: the current conversation model can handle the input but lacks data, or no flow in the current conversation model can process the user data.

Preferably, the modification of the conversation model in step (e) proceeds as follows. If the reason identified in step (d) is that the current conversation model can handle the input but lacks data, it is determined which of the conversation flows in the current conversation model the input belongs to, and the data is added to that flow. If no flow in the current conversation model can process the user data, a new conversation flow is added.

Preferably, adding the new conversation flow comprises: (e1) presenting a question about how the conversation would continue next and collecting the user's answer data for the question; and (e2) presenting a question about how the conversation could be extended after the question of step (e1) and collecting the user's answer data for that question.

Preferably, after step (e2), the reliability of the collected answer data is confirmed through a question asking whether the data can be trusted.

Preferably, adding the data is carried out by identifying the key words of the user data through a question that uses the key noun of the user data.

Preferably, the flow comprises conversation nodes and conversation edges.

According to another aspect, an apparatus for crowdsourcing data of a conversation model for a chatbot includes at least one memory storing computer-executable instructions, wherein the computer-executable instructions stored in the at least one memory, when executed by the at least one processor, cause the following method for crowdsourcing data of a conversation model for a chatbot to be performed: (a) receiving user data as input; (b) processing the user data received in step (a) with a natural language technique; (c) determining the recognition reliability of the conversation model for the user data processed in step (b); (d) when the recognition reliability determined in step (c) is below a reference value, presenting a question and identifying the reason for the low recognition reliability from the user's answer data for the question; and (e) modifying the conversation model based on the reason identified in step (d).

To achieve this object, another aspect of the present invention is a computer program product that is stored on a non-transitory storage medium and contains instructions which, when executed by a processor, cause the following method for crowdsourcing data of a conversation model for a chatbot to be performed: (a) receiving user data as input; (b) processing the user data received in step (a) with a natural language technique; (c) determining the recognition reliability of the conversation model for the user data processed in step (b); (d) when the recognition reliability determined in step (c) is below a reference value, presenting a question and identifying the reason for the low recognition reliability from the user's answer data for the question; and (e) modifying the conversation model based on the reason identified in step (d).

According to the present invention, a conversation model capable of responding to a wide variety of user utterances can be produced.

FIG. 1 is a flowchart of the method for crowdsourcing data of a conversation model for a chatbot according to the present invention.
FIG. 2 shows an example of a conversation model structure used to explain the method according to the present invention.
FIG. 3 illustrates one example of the method of FIG. 1.
FIG. 4 shows an example of the learning process corresponding to FIG. 3.
FIG. 5 illustrates another example of the method of FIG. 1.
FIG. 6 shows an example of the learning process corresponding to FIG. 5.
FIG. 7 shows a case in which the classifier performs poorly when the method according to the present invention is applied.
FIG. 8 shows a case in which the conversation model cannot respond when the method according to the present invention is applied.
FIG. 9 is a block diagram of an apparatus for crowdsourcing data of a conversation model for a chatbot according to the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Before that, note that the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. On the principle that an inventor may define terms appropriately in order to describe the invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments and do not represent the entire technical idea of the invention, so it should be understood that various equivalents and modifications capable of replacing them may exist as of the filing of this application.

FIG. 1 is a flowchart of the method for crowdsourcing data of a conversation model for a chatbot according to the present invention, and FIG. 2 shows one embodiment of a conversation model structure used to explain the method.

To aid understanding, the structure of the conversation model used in the present invention is first described with reference to FIG. 2. A conversation model starts from a single conversation node (drawn as a circle) and comprises multiple conversation nodes and conversation edges (drawn as solid arrows). A conversation node is a data structure that specifies the current state of the conversation and holds what the chatbot should say. The node ID of a conversation node indicates the position of the current conversation within the whole conversation model. The node content is the set of sentences the chatbot can output in the current conversational situation; parts that can be swapped to match the user's input, such as nouns, are marked separately (e.g., You went to {place/}). A conversation edge is a data structure that corresponds to the user's input; it connects conversation nodes and indicates which node to move to for a given input. That is, the edge ID indicates in which direction the conversation should move from the current node. In the present invention, conversation nodes and conversation edges together are also referred to as a conversation flow.
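The node and edge structures described above could be represented roughly as follows. This is a minimal Python sketch, not the patent's implementation: the class names `DialogueNode`, `DialogueEdge`, and `DialogueModel`, the `render` and `outgoing` helpers, and the `{place}` slot syntax (standing in for the `{place/}` notation) are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DialogueNode:
    """A conversation state plus the sentences the chatbot may say there."""
    node_id: str
    # Candidate chatbot sentences; "{place}" marks a replaceable noun slot.
    contents: List[str]

    def render(self, **slots: str) -> str:
        # Pick one candidate sentence and fill its slots from the user's input.
        return random.choice(self.contents).format(**slots)


@dataclass
class DialogueEdge:
    """Links two nodes and stores user utterances that trigger the move."""
    edge_id: str
    source: str
    target: str
    examples: List[str] = field(default_factory=list)


@dataclass
class DialogueModel:
    nodes: Dict[str, DialogueNode] = field(default_factory=dict)
    edges: List[DialogueEdge] = field(default_factory=list)

    def outgoing(self, node_id: str) -> List[DialogueEdge]:
        # Edges the conversation can follow from the given node.
        return [e for e in self.edges if e.source == node_id]
```

Keeping the example utterances on the edges mirrors the description, in which an edge corresponds to user input and a node to chatbot output.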

In the method for crowdsourcing data of a conversation model according to the present invention, user data is first input, as shown in FIG. 1 (S100). The user data may be voice data or text data; in the present invention, the term user data covers both.

Once the user data is input, it is processed with a natural language technique so that the chatbot can understand it (S110).

Next, the recognition reliability of the conversation model for the processed user data is determined (S120). With the recognition reliability expressed on a scale whose maximum is 1, it is then determined whether the recognition reliability is at or above a reference value (S130). Here, the reference value is 0.8.

If the determination (S130) finds that the recognition reliability is at or above the reference value, the conversation proceeds along the current conversation model flow (S140).

If, on the other hand, the recognition reliability is below the reference value, a question is presented to the user, and the user's answer data for that question is used to identify why the recognition reliability is low (S150). The reason is either that the current conversation model can handle the input but lacks data, or that no flow in the current conversation model can process the user data.

The conversation model is then modified based on the reason identified in step S150 (S160). How it is modified depends on that reason. If the current conversation model can handle the input but lacks data, it is determined which of the conversation flows in the current conversation model the input belongs to, and the data is added to that flow. If no flow in the current conversation model can process the user data, a new conversation flow is added.

To add a new conversation flow, a question about how the conversation would continue next is presented and the user's answer data for it is collected. A question about how the conversation could be extended further is then presented, and the user's answer data for that question is collected as well.

Adding data is carried out by identifying the key words of the user data through a question that uses the key noun of the user data.
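Steps S100 to S160 can be sketched as a single routing function. The following Python is an illustration under stated assumptions, not the patented implementation: `overlap_classifier` is a toy token-overlap scorer standing in for the real classifier, the yes/no question is one hypothetical way of telling the two causes apart, flows are reduced to a dictionary of example utterances, and a score exactly equal to 0.8 is treated as passing.

```python
from typing import Callable, Dict, List, Tuple

THRESHOLD = 0.8  # reference value given in the description

Flows = Dict[str, List[str]]  # hypothetical: flow name -> example utterances


def overlap_classifier(text: str, flows: Flows) -> Tuple[str, float]:
    """Toy stand-in for the classifier: best token overlap (Jaccard) per flow."""
    words = set(text.split())
    best_flow, best_score = "", -1.0
    for name, examples in flows.items():
        for example in examples:
            other = set(example.split())
            union = words | other
            score = len(words & other) / len(union) if union else 0.0
            if score > best_score:
                best_flow, best_score = name, score
    return best_flow, best_score


def handle_utterance(
    text: str,
    flows: Flows,
    ask_user: Callable[[str], str],
    classify: Callable[[str, Flows], Tuple[str, float]] = overlap_classifier,
) -> str:
    """Run one pass of S100 to S160 and return the flow the utterance ends in."""
    processed = text.strip().lower()                # S110: stand-in for NLP
    flow, reliability = classify(processed, flows)  # S120
    if reliability >= THRESHOLD:                    # S130
        return flow                                 # S140: stay on current flow
    # S150: ask a question and use the answer to find the cause.
    answer = ask_user(f"Is this about '{flow}'? (yes/no)")
    lacks_data = flow in flows and answer.strip().lower() == "yes"
    # S160: modify the model according to the cause.
    if lacks_data:
        flows[flow].append(processed)               # add data to existing flow
        return flow
    new_flow = f"flow_{len(flows) + 1}"             # hypothetical naming scheme
    flows[new_flow] = [processed]                   # add a new conversation flow
    return new_flow
```

Passing `ask_user` as a callable keeps the sketch independent of any particular chat interface; in a console test it could simply be `input`.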

In the present invention, data crowdsourcing for the conversation model addresses two main situations: first, a sentence is misrecognized because the classifier that determines recognition reliability (the dialogue management module) performs poorly; second, a sentence is input that the conversation model cannot handle. In both situations the chatbot's classifier reports low recognition reliability. FIG. 7 shows the case in which the classifier performs poorly, and FIG. 8 shows the case in which the conversation model cannot respond.

FIG. 3 shows one example of the method of FIG. 1. User data is input and processed with a natural language processing technique; when the resulting sentence is 'I went to Rome', its recognition reliability is determined. If the recognition reliability of 'I went to Rome' with respect to node_1 is at or above the reference value, the conversation moves on to node_2 of the conversation model, as shown in FIG. 3. When a sentence such as 'I ate pasta', 'I saw the Colosseum', or 'I bought clothes' is then input at node_2, it is determined whether its recognition reliability is at or above the reference value. Based on the reference value, the chatbot identifies which of the conversation flows in the current model (node_3, node_4, node_5) the sentence belongs to, and the conversation continues along that flow. If the classifier (the dialogue management module) performs poorly, the chatbot additionally gathers information such as which noun is the key one, asking the user once more about the use of the word in order to identify the key words in the sentence. That is, as at node_4 and node_5 in FIG. 3, for the key nouns 'Colosseum' and 'clothes' the chatbot uses sentences such as 'So you saw the Colosseum' and 'So you went shopping', steering the conversation so that it can proceed to a flow such as node_6. In this way, when the recognition reliability is low, the chatbot does not output an error message but instead guides the user into a natural learning situation in which conversation data is collected, smoothly getting past situations it could not otherwise handle.
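The key-word confirmation used when the classifier performs poorly might look like the sketch below. It assumes the candidate nouns have already been extracted from the utterance (for instance by a natural language processing step), and the `FlowData` class, the function name, and the wording of the confirmation question are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FlowData:
    """Hypothetical per-flow store of example utterances and confirmed key words."""
    examples: List[str] = field(default_factory=list)
    key_words: List[str] = field(default_factory=list)


def add_data_with_key_words(
    flow: FlowData,
    utterance: str,
    candidate_nouns: List[str],
    ask_user: Callable[[str], str],
) -> List[str]:
    """Store the utterance and the nouns the user confirms as key words."""
    confirmed = []
    for noun in candidate_nouns:
        # Reuse the noun in a follow-up question so the user can confirm it.
        answer = ask_user(f"So you mean the {noun}? (yes/no)")
        if answer.strip().lower() == "yes":
            confirmed.append(noun)
    flow.examples.append(utterance)
    # Record each confirmed key word once.
    flow.key_words.extend(w for w in confirmed if w not in flow.key_words)
    return confirmed
```

The stored examples and key words are the kind of newly labeled data that a later retraining step could consume.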

FIG. 4 shows an example of the learning process corresponding to FIG. 3. The conversation can follow a conversation edge of the current flow to the next flow only when the recognition reliability is at or above the reference value; otherwise, the learning steps shown in FIG. 4 are carried out. In that case, the chatbot's performance can be improved by retraining with the newly acquired data and labels.

FIG. 5 shows another example of the method of FIG. 1, and FIG. 6 shows an example of the learning process corresponding to FIG. 5. User data is input and processed with a natural language processing technique; when the resulting sentence is 'I went to Rome', its recognition reliability is determined. If the recognition reliability of 'I went to Rome' with respect to node_1 is at or above the reference value, the conversation moves on to node_2 of the conversation model, as shown in FIG. 5; if it is below the reference value, a new conversation model is added to the current one. As shown in FIG. 5, when a sentence input at node_1, such as 'I couldn't travel because I had no money', has recognition reliability below the reference value and the reason is that no conversation model exists that can process the user data, a new conversation model (node_N#) is added. Because the detected failure stems from an utterance the chatbot cannot respond to, the chatbot asks how the next utterance would follow and thereby collects data on how the conversation model can be extended. This process also gives learners a chance to think about utterances for conversational situations they had not considered, or that the conversation model simply does not provide, and can improve their language skills.
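Adding a new node when no existing flow fits can be sketched as follows. The dictionary layout, the `node_N…` naming, the `verified` flag, and the wording of the two questions are assumptions made for illustration; the description only states that the chatbot asks how the conversation would continue and how it could be extended, and collects the answers.

```python
from typing import Callable, Dict, List


def add_new_flow(
    nodes: Dict[str, dict],
    edges: List[dict],
    current_node: str,
    utterance: str,
    ask_user: Callable[[str], str],
) -> str:
    """Create a new node reachable from current_node via the unhandled utterance."""
    # First question: how would the conversation continue from here?
    next_line = ask_user("I have not learned how to answer that yet. "
                         "What would be a natural reply?")
    # Second question: how could the conversation be extended after that?
    follow_up = ask_user("And what might you say after that reply?")
    node_id = f"node_N{len(nodes) + 1}"  # hypothetical naming scheme
    nodes[node_id] = {
        "contents": [next_line],    # what the chatbot can say at the new node
        "expansions": [follow_up],  # seed data for a later extension
        "verified": False,          # not yet confirmed by other users
    }
    # The unhandled utterance becomes the example that triggers the new edge.
    edges.append({"source": current_node, "target": node_id,
                  "examples": [utterance]})
    return node_id
```

Marking the new node as unverified leaves room for a separate confirmation step before the data is trusted.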

When data that extends the conversation model has been collected in this way, it is hard to be sure that the data received from a learner is correct. Therefore, later users are asked whether the data of the node in question is reliable, and the data is added to the conversation model only after its reliability has been confirmed, as shown in FIG. 6.
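One way to realize this confirmation step is to count the answers of later users before promoting a candidate into the model. The sketch below is only an assumption about how that could work: the description does not specify a voting rule, so `min_votes` and `min_ratio` are hypothetical parameters, as are the class and function names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """Crowdsourced content waiting for confirmation by later users."""
    node_id: str
    content: str
    yes: int = 0
    no: int = 0

    def record_vote(self, reliable: bool) -> None:
        # Store one user's answer to "is this data reliable?".
        if reliable:
            self.yes += 1
        else:
            self.no += 1

    def is_trusted(self, min_votes: int = 3, min_ratio: float = 0.8) -> bool:
        # Hypothetical rule: enough votes, and a high enough share of "yes".
        total = self.yes + self.no
        return total >= min_votes and self.yes / total >= min_ratio


def promote_trusted(candidates: List[Candidate],
                    model: Dict[str, List[str]]) -> List[Candidate]:
    """Move trusted candidates into the model; return those still pending."""
    pending = []
    for cand in candidates:
        if cand.is_trusted():
            model.setdefault(cand.node_id, []).append(cand.content)
        else:
            pending.append(cand)
    return pending
```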

If the conversation model breaks off while the conversation is being extended, the chatbot moves the current conversation node to the point where its conversations converge, accompanying the move with a topic-changing utterance such as 'By the way...'. This is possible because the conversation model ultimately converges to a single point.
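The fallback to the convergence point amounts to a small routing rule. In this hedged sketch, `advance`, `TOPIC_SWITCH`, and the way a missing continuation is signalled (`None`) are illustrative assumptions, and "By the way..." is one possible rendering of the topic-changing utterance.

```python
from typing import Optional, Tuple

TOPIC_SWITCH = "By the way..."  # topic-changing utterance


def advance(matched_target: Optional[str],
            convergence_node: str) -> Tuple[str, Optional[str]]:
    """Return the next node and, if the flow broke off, a bridging utterance."""
    if matched_target is not None:
        # A continuation exists, so follow it without any bridge.
        return matched_target, None
    # No continuation: jump to the node where the conversation converges.
    return convergence_node, TOPIC_SWITCH
```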

FIG. 9 is a simplified block diagram of a chatbot for crowdsourcing data of a conversation model according to the present invention.

The chatbot may be implemented as a computer program and executed by a computing device. The chatbot of the present invention, implemented as a computer program, may be a general computer system.

As shown in FIG. 9, the chatbot 10 includes one or more processors 100, one or more memories 200, one or more input devices 300, one or more output devices 400, one or more communication devices 500, one or more storage devices 700, and a communication channel 600 for communication among them.

The processor 100 is configured to execute instructions within the chatbot 10. The memory 200 is a computer-readable temporary storage medium that does not retain stored data when power is not supplied. The memory 200 is used to temporarily store, for example, instructions for the processor 100 to execute, data needed to execute those instructions, and data generated by their execution.

The input device 300 is used to receive various forms of data from the user, and different kinds of input device may be used depending on the form of the data and on the computing device 10. For example, a keyboard and mouse may be used when the computing device 10 is a general computer system, and an on-screen keyboard shown on a touch display may be used when it is a tablet computer. A camera may be used when the input data is video, and a microphone when it is audio.

The output device 400 is used to provide various forms of data to the user, and different kinds of output device may be used depending on the form of the data and on the computing device 10.

The computing device 10 communicates with external devices or networks through the communication device 500. Communication through the communication device 500 may be wired, wireless, or of various other types.

The storage device 700 is a computer-readable storage medium for long-term storage, such as a hard disk. The computing device 10 stores an operating system 20 that can be executed by the processor 100, and the operating system 20 controls the operation of the computing device 10 and the execution of computer programs. The storage device 700 may store computer programs and/or data executed by the computing device 10.

The storage device 700 stores a plurality of modules containing the computer programs executed by the computing device 10, namely a chat management module 710, a natural language processing module 720, a dialogue management module 730, and a learning management module 740. The storage device 700 also holds the conversation model database needed to run the computer programs stored in these modules.

Some modules of the chatbot 10 may reside partly in a user terminal and partly in a server (not shown). The chat management module 710 handles the user interface, which is a physical or virtual medium created to give temporary or permanent access so that users can communicate with objects or systems, in particular machines and computer programs. The user interface includes physical hardware and logical components and provides input, by which the user operates the system, and output, by which the system displays the results of the user's actions.

The natural language processing module 720 splits the input text data into morphemes, the smallest units, and analyzes the part of speech of each morpheme. The part-of-speech information obtained by this morphological analysis (noun, verb, adjective, particle, and so on) is passed through syntactic analysis and used for semantic relation analysis. Syntactic analysis divides the text into chunks by specific criteria, such as noun phrases, verb phrases, and adjective phrases, and analyzes what relationships exist between the chunks. Through this syntactic analysis, subjects, objects, and modifiers can be identified. This information is meaningful in itself, but it becomes more useful with additional speech act analysis. Speech act analysis determines the intention behind a user utterance, distinguishing whether the user is asking a question, making a request, or simply expressing an emotion.

The dialogue management module 730 determines, according to the recognition reliability, what answer should continue the conversation for the input user data, or whether to respond with an acknowledgement, so that the conversation processing needed for an appropriate answer is carried out.
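The reliability-based decision made by the dialogue management module can be sketched as follows. This is only an illustration: the `Recognition` type, the `next_action` function, and the returned action strings are hypothetical names, and the threshold of 0.8 is the reference value stated in the claims.

```python
from dataclasses import dataclass

REFERENCE_VALUE = 0.8  # reference value given in the claims


@dataclass
class Recognition:
    flow: str           # name of the best-matching conversation flow (hypothetical)
    reliability: float  # recognition reliability, assumed to lie in [0, 1]


def next_action(recognition: Recognition) -> str:
    """Continue the current flow, or ask the user a clarifying question."""
    if recognition.reliability >= REFERENCE_VALUE:
        # Reliability at or above the reference value: stay in the current flow.
        return f"continue:{recognition.flow}"
    # Reliability below the reference value: ask why recognition was poor.
    return "ask_clarifying_question"


if __name__ == "__main__":
    print(next_action(Recognition("order_pizza", 0.92)))
    print(next_action(Recognition("order_pizza", 0.41)))
```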

The learning management module 740 improves the performance of the chatbot by retraining the model with the acquired data and its labels.
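One way to picture the data that the learning management module works with is a conversation model made of flows, each holding labelled utterances, dialog nodes, and dialog edges. The sketch below is an assumption about the data layout, not the patented implementation: the class names, the method names `add_data` and `add_flow`, and the representation of edges as pairs of consecutive turns are all hypothetical. The two methods correspond to the two causes of low recognition reliability named in the claims (insufficient data in an existing flow, or no suitable flow at all).

```python
from dataclasses import dataclass, field


@dataclass
class ConversationFlow:
    name: str
    utterances: list[str] = field(default_factory=list)         # labelled examples
    nodes: list[str] = field(default_factory=list)              # dialog nodes
    edges: list[tuple[str, str]] = field(default_factory=list)  # dialog edges


@dataclass
class ConversationModel:
    flows: dict[str, ConversationFlow] = field(default_factory=dict)

    def add_data(self, flow_name: str, utterance: str) -> None:
        """A suitable flow exists but lacks data: add a labelled utterance."""
        self.flows[flow_name].utterances.append(utterance)

    def add_flow(self, flow_name: str, utterance: str, turns: list[str]) -> None:
        """No flow can handle the utterance: build one from collected answers."""
        # Consecutive turns are linked by edges, giving a simple linear flow.
        edges = list(zip(turns, turns[1:]))
        self.flows[flow_name] = ConversationFlow(
            flow_name, [utterance], list(turns), edges
        )


if __name__ == "__main__":
    model = ConversationModel({"greeting": ConversationFlow("greeting", ["hello"])})
    model.add_data("greeting", "hi there")
    model.add_flow("weather", "will it rain tomorrow",
                   ["Which city?", "Here is the forecast."])
    print(model.flows["weather"].edges)
```

After either update, the enlarged set of labelled utterances would be used to retrain the recognizer; that retraining step is outside the scope of this sketch.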

Although the present invention has been described above with reference to limited embodiments and drawings, it is not limited to them. Those of ordinary skill in the art to which the invention pertains may make various modifications and variations within the technical idea of the invention and the scope of equivalents of the claims set out below.

10: Chatbot
20: Operating system
100: Processor
200: Memory
300: Input device
400: Output device
500: Communication device
600: Communication channel
700: Storage device
710: Chat management module
720: Natural language processing module
730: Dialogue management module
740: Learning management module

Claims

A method for crowdsourcing data of a conversation model for a chatbot, the method comprising:
(a) receiving input of user data;
(b) processing the user data input in step (a) by natural language processing;
(c) determining a recognition reliability of the conversation model for the user data processed in step (b);
(d) providing a question if the recognition reliability determined in step (c) is less than a reference value, and identifying a reason for the low recognition reliability using the user's answer data for the question; and
(e) modifying the conversation model based on the reason for the low recognition reliability identified in step (d).

The method according to claim 1,
wherein, if the recognition reliability determined in step (c) is equal to or greater than the reference value, the conversation proceeds along the current conversation model flow.

The method according to claim 1,
wherein the reference value is 0.8.

The method according to claim 1,
wherein the reason for the low recognition reliability is either that the current conversation model can respond but its data is insufficient, or that no conversation flow capable of processing the user data exists in the current conversation model.

The method according to claim 1,
wherein the modification of the conversation model in step (e) comprises:
if the reason identified in step (d) is that the current conversation model can respond but its data is insufficient, determining which of the conversation flows included in the current conversation model the user data corresponds to and adding the data; and
if the reason is that no conversation flow capable of processing the user data exists in the current conversation model, adding a new conversation flow.

The method according to claim 5,
wherein the process of adding the new conversation flow comprises:
(e1) providing a question asking how the conversation should proceed next, and collecting the user's answer data for the question; and
(e2) providing a question asking how the conversation should be extended after the question provided in step (e1), and collecting the user's answer data for that question.

The method according to claim 6,
wherein, after step (e2), reliability is verified by asking whether the collected answer data is reliable.

The method according to claim 6,
wherein the process of adding the data identifies the key words of the user data by using nouns.

The method according to claim 2,
wherein the flow includes a dialog node and a dialog edge.

An apparatus for crowdsourcing data of a conversation model for a chatbot, comprising at least one memory storing computer-executable instructions and at least one processor,
wherein the computer-executable instructions stored in the at least one memory cause the at least one processor to perform a method for crowdsourcing data of a conversation model for a chatbot, the method comprising:
(a) receiving input of user data;
(b) processing the user data input in step (a) by natural language processing;
(c) determining a recognition reliability of the conversation model for the user data processed in step (b);
(d) providing a question if the recognition reliability determined in step (c) is less than a reference value, and identifying a reason for the low recognition reliability using the user's answer data for the question; and
(e) modifying the conversation model based on the reason for the low recognition reliability identified in step (d).

A method for crowdsourcing data of a conversation model for a chatbot, stored in a non-transitory storage medium and executed by a processor, the method comprising:
(a) receiving input of user data;
(b) processing the user data input in step (a) by natural language processing;
(c) determining a recognition reliability of the conversation model for the user data processed in step (b);
(d) providing a question if the recognition reliability determined in step (c) is less than a reference value, and identifying a reason for the low recognition reliability using the user's answer data for the question; and
(e) modifying the conversation model based on the reason for the low recognition reliability identified in step (d).