KR20200119393A

KR20200119393A - Apparatus and method for recommending learning data for chatbots

Info

Publication number: KR20200119393A
Application number: KR1020190034927A
Authority: KR
Inventors: 서지암; 서문길; 전병훈
Original assignee: 주식회사 단비아이엔씨
Priority date: 2019-03-27
Filing date: 2019-03-27
Publication date: 2020-10-20
Also published as: KR102285142B1

Abstract

The present invention relates to a device and a method for recommending training data for a chatbot. The device for recommending training data comprises: a data preprocessing part that analyzes a training population, periodically collected and updated from a plurality of chatbots, to determine a similarity ranking for each of at least one classification criterion; a recommendation request receiving part that receives a recommendation request for training data from a user terminal; a recommendation data generation part that determines the training type for the recommendation request and generates recommendation data for the training type based on the similarity ranking; and a data recommendation part that provides the recommendation data to the user terminal as a response to the recommendation request. The device can therefore recommend, from data used by chatbots, the training data best suited to a user's input.

Description

Apparatus and method for recommending learning data for chatbots

The present invention relates to a technology for recommending learning data for a chatbot, and more particularly, to an apparatus and method for recommending learning data for a chatbot that can recommend, from the data used by chatbots, the learning data most suitable for a user's input.

A chatbot may correspond to a bot service that identifies a user's intention and handles answers or request intake through conversation. Chatbots can be classified into rule-based chatbots, which identify and process intentions explicitly in the manner of an ARS, and intelligent chatbots, which recognize intentions and respond through natural language processing. A chatbot may be implemented by converting linguistic input, consisting of chat text or voice, into a form that a computer can process through a natural language processor (NPL, Natural Language Processor), and then providing an appropriate answer according to the conversational intention of the converted natural language.

In addition, a chatbot can replace many existing elements by applying machine learning. In particular, a chatbot can identify the intention of text or voice entered by a user more accurately by applying a deep learning model to the natural language processor. However, for such a deep learning model to provide accuracy above a certain level, it must be trained on a large amount of data in advance, and a technology is needed that can effectively provide learning data in situations where such data is hard to obtain, such as the early stage of a chatbot's operation.

Korean Patent Registration No. 10-1840420 (2018.03.14) relates to a method and apparatus for providing a chatbot platform. It discloses a technology in which a plurality of chatbots are used through a single platform, so that instead of calling each chatbot separately to obtain individual responses, a user can call a plurality of chatbots on one chatbot platform and receive their services.

Korean Patent Publication No. 10-2018-0003417 (2018.01.09) relates to a method and apparatus for providing content using a chatbot. It discloses a technology in which content is provided through a chatbot, so that a user does not need to install a separate app or access a website to use the content.

Korean Patent Registration No. 10-1840420 (2018.03.14)
Korean Patent Publication No. 10-2018-0003417 (2018.01.09)

An embodiment of the present invention seeks to provide an apparatus and method for recommending learning data for a chatbot that can recommend, from the data used by chatbots, the learning data most suitable for a user's input.

An embodiment of the present invention seeks to provide an apparatus and method for recommending learning data for a chatbot that can analyze the dictionaries and intent-specific example sentences used by other chatbots and recommend words or sentences ranked by similarity as learning data.

An embodiment of the present invention seeks to provide an apparatus and method for recommending learning data for a chatbot that can recommend learning data suited to the learning type for user input consisting of sentences as well as words, by inferring intent through natural language understanding.

Among the embodiments, an apparatus for recommending learning data for a chatbot includes: a data preprocessing unit that analyzes a learning population, periodically collected and updated from a plurality of chatbots, to determine a similarity ranking for each of at least one classification criterion; a recommendation request receiving unit that receives a recommendation request for learning data from a user terminal; a recommendation data generation unit that determines a learning type for the recommendation request and generates recommendation data for the learning type based on the similarity ranking; and a data recommendation unit that provides the recommendation data to the user terminal as a response to the recommendation request.

The data preprocessing unit may classify the learning population into a chatbot dictionary set and a set of example sentences by intent, and may generate a learning model as a result of learning the word-word and word-sentence learning data, respectively.

The data preprocessing unit may calculate, based on the learning model, the similarity of every word-word and word-sentence combination derived from the chatbot dictionary set and the intent-specific example sentence set.

The recommendation data generation unit may determine, as the learning type, either intent-specific example sentence learning, which learns training example sentences based on an input intent and example sentence, or synonym learning, which learns synonyms based on an input representative word.

The recommendation data generation unit may perform intent inference on an input example sentence through natural language understanding (NLU) and generate, as recommendation data for intent-specific example sentence learning, a list of example sentences sorted according to the word-sentence similarity ranking for the inferred intent.

When no intent identical to the inferred intent exists in the learning population, the recommendation data generation unit may determine at least one intent similar to the inferred intent and generate recommendation data for intent-specific example sentence learning based on the example sentences associated with the at least one intent.

The data recommendation unit may update the learning model by receiving, from the user terminal, a user response to the recommendation data.

Among the embodiments, a method for recommending learning data for a chatbot includes: periodically collecting chatbot data managed by a plurality of chatbots to update a learning population; analyzing the learning population to determine a similarity ranking for each of at least one classification criterion; receiving a recommendation request for learning data from a user terminal; determining a learning type for the recommendation request; generating recommendation data for the learning type based on the similarity ranking; and providing the recommendation data to the user terminal as a response to the recommendation request.

The disclosed technology may have the following effects. However, this does not mean that a specific embodiment must include all of the following effects or only the following effects, so the scope of rights of the disclosed technology should not be understood as being limited thereby.

The apparatus and method for recommending learning data for a chatbot according to an embodiment of the present invention can analyze the dictionaries and intent-specific example sentences used by other chatbots and recommend words or sentences ranked by similarity as learning data.

The apparatus and method for recommending learning data for a chatbot according to an embodiment of the present invention can recommend learning data suited to the learning type for user input consisting of sentences as well as words, by inferring intent through natural language understanding.

FIG. 1 is a diagram illustrating a system for recommending learning data for a chatbot according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the physical configuration of the learning data recommendation apparatus of FIG. 1.
FIG. 3 is a block diagram illustrating the functional configuration of the learning data recommendation apparatus of FIG. 1.
FIG. 4 is a flowchart illustrating the learning data recommendation process performed by the learning data recommendation apparatus of FIG. 1.
FIG. 5 is a diagram illustrating the overall operation of the learning data recommendation apparatus according to an embodiment of the present invention.

The description of the present invention is merely an embodiment for structural or functional explanation, so the scope of rights of the present invention should not be construed as being limited by the embodiments described herein. That is, since the embodiments can be modified in various ways and can take various forms, the scope of rights of the present invention should be understood to include equivalents capable of realizing the technical idea. In addition, the objects or effects presented in the present invention do not mean that a specific embodiment must include all of them or only those effects, so the scope of rights of the present invention should not be understood as being limited thereby.

Meanwhile, the terms used in the present application should be understood as follows.

Terms such as "first" and "second" are used to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be named a second component, and similarly, a second component may be named a first component.

When a component is described as being "connected" to another component, it should be understood that it may be directly connected to that other component, but that other components may also exist in between. In contrast, when a component is described as being "directly connected" to another component, it should be understood that no other component exists in between. Other expressions describing the relationship between components, such as "between" and "directly between", or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

Singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" are intended to specify that a stated feature, number, step, operation, component, part, or combination thereof exists, and should be understood not to preclude the existence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The identification codes attached to steps (for example, a, b, c) are used for convenience of description and do not describe the order of the steps. Unless a specific order is clearly stated in context, the steps may occur in an order different from the one stated. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The present invention can be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium can also be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner.

Unless defined otherwise, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the field to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted as having ideal or excessively formal meanings unless explicitly defined in the present application.

FIG. 1 is a diagram illustrating a system for recommending learning data for a chatbot according to an embodiment of the present invention.

Referring to FIG. 1, the learning data recommendation system 100 for a chatbot may include a user terminal 110, a learning data recommendation apparatus 130, and a database 150.

The user terminal 110 may correspond to a computing device that can request a data recommendation and check the recommended learning data. It may be implemented as a smartphone, a laptop, or a computer, but is not necessarily limited thereto and may also be implemented as various other devices such as a tablet PC. The user terminal 110 may be connected to the learning data recommendation apparatus 130 through a network, and a plurality of user terminals 110 may be connected to the learning data recommendation apparatus 130 at the same time.

The learning data recommendation apparatus 130 may be implemented as a server, corresponding to a computer or a program, that uses learning data collected and managed by a plurality of chatbots to generate and provide learning data in response to a user's request. The learning data recommendation apparatus 130 may be wirelessly connected to the user terminal 110 through Bluetooth, WiFi, or the like, and may exchange data with the user terminal 110 through a network.

In one embodiment, the learning data recommendation apparatus 130 may work with the database 150 to store the information needed to recommend learning data. Unlike in FIG. 1, the learning data recommendation apparatus 130 may also be implemented with the database 150 included inside it. In addition, the learning data recommendation apparatus 130 may be implemented to include a processor, a memory, a user input/output unit, and a network input/output unit, which are described in more detail with reference to FIG. 2.

The database 150 may correspond to a storage device that stores the various information needed in the process of generating and providing recommendation data based on the learning data of chatbots. The database 150 may store information on the dictionaries and intent-specific example sentences collected by a plurality of chatbots, but is not necessarily limited thereto, and may also store information collected or processed in various forms while the learning data recommendation apparatus 130 generates learning data using similarity.

In one embodiment, each chatbot may be operated by an independent device, and the data collected by that chatbot may be stored in an internal storage device included in that device. In this case, the database 150 may be implemented as a logical database in which the independently managed storage devices associated with each of the plurality of chatbots are connected to one another and operate together.

FIG. 2 is a block diagram illustrating the physical configuration of the learning data recommendation apparatus of FIG. 1.

Referring to FIG. 2, the learning data recommendation apparatus 130 may be implemented to include a processor 210, a memory 230, a user input/output unit 250, and a network input/output unit 270.

The processor 210 may execute each procedure that handles the operations involved in generating and providing learning data at a user's request, may manage the memory 230 that is read or written throughout that process, and may schedule the synchronization time between the volatile memory and the non-volatile memory in the memory 230. The processor 210 may control the overall operation of the learning data recommendation apparatus 130, and may be electrically connected to the memory 230, the user input/output unit 250, and the network input/output unit 270 to control the data flow among them. The processor 210 may be implemented as the CPU (Central Processing Unit) of the learning data recommendation apparatus 130.

The memory 230 may include an auxiliary storage device, implemented as non-volatile memory such as an SSD (Solid State Disk) or HDD (Hard Disk Drive), that stores all of the data required by the learning data recommendation apparatus 130, and may include a main storage device implemented as volatile memory such as RAM (Random Access Memory).

The user input/output unit 250 may include an environment for receiving user input and an environment for outputting specific information to the user. For example, the user input/output unit 250 may include an input device including an adapter such as a touch pad, a touch screen, an on-screen keyboard, or a pointing device, and an output device including an adapter such as a monitor or a touch screen. In one embodiment, the user input/output unit 250 may correspond to a computing device accessed through a remote connection, in which case the learning data recommendation apparatus 130 may operate as a server.

The network input/output unit 270 includes an environment for connecting to an external device or system through a network, and may include, for example, an adapter for communication over a LAN (Local Area Network), MAN (Metropolitan Area Network), WAN (Wide Area Network), or VAN (Value Added Network).

FIG. 3 is a block diagram illustrating the functional configuration of the learning data recommendation apparatus of FIG. 1.

Referring to FIG. 3, the learning data recommendation apparatus 130 may include a data preprocessing unit 310, a recommendation request receiving unit 330, a recommendation data generation unit 350, a data recommendation unit 370, and a control unit 390.

The data preprocessing unit 310 may analyze a learning population that is periodically collected and updated from a plurality of chatbots to determine a similarity ranking for each of at least one classification criterion. Here, the learning population may correspond to the data set used to generate recommendation data, and may include the dictionary, the intent-specific example sentences, and the conversation records managed by each of the plurality of chatbots.

A dictionary may consist of information on the synonyms or equivalent words of each word, and in some cases may additionally include information on a representative word, which is the word that represents a group of synonyms. Synonyms may correspond to a set of similar words that are recognized as a specific word, and equivalent words may correspond to a set of identical words.
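The dictionary structure described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify a data format, so the `DictionaryEntry` type, its field names, and the `build_lookup` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DictionaryEntry:
    """Hypothetical entry: a representative word plus the words recognized as it."""
    representative: str
    synonyms: Set[str] = field(default_factory=set)

    def matches(self, word: str) -> bool:
        # A word matches if it is the representative word or one of its synonyms.
        return word == self.representative or word in self.synonyms


def build_lookup(entries: List[DictionaryEntry]) -> Dict[str, str]:
    """Map every known word to its representative word."""
    lookup: Dict[str, str] = {}
    for entry in entries:
        lookup[entry.representative] = entry.representative
        for synonym in entry.synonyms:
            lookup[synonym] = entry.representative
    return lookup
```

With such a lookup, any synonym typed by a user resolves to the representative word in a single dictionary access.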

The data preprocessing unit 310 may update the learning population at regular time intervals, for example every week or every 30 days, based on the data collected from the plurality of chatbots. The update of the learning population may include an analysis of the conversation records collected by the chatbots, and that analysis may include natural language processing of the conversation sentences contained in the records.

The natural language processing may include morphological analysis and pattern matching, and may perform intent inference and keyword extraction for each sentence in the conversation records. The data preprocessing unit 310 may perform the update by generating synonym sets, intent-specific example sentence sets, and the like through conversation record analysis and adding them to the learning population.
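The final step of the update, adding newly generated synonym sets and example sentence sets to the learning population, could look roughly like the sketch below. The `LearningPopulation` class and its `merge` method are assumptions made for illustration. The conversation analysis itself (morphological analysis, intent inference, keyword extraction) is not shown, and its output is assumed to arrive as plain dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LearningPopulation:
    """Hypothetical container for the learning population."""
    synonyms: Dict[str, Set[str]] = field(default_factory=dict)   # representative word -> synonyms
    examples: Dict[str, List[str]] = field(default_factory=dict)  # intent -> example sentences

    def merge(
        self,
        collected_synonyms: Dict[str, Set[str]],
        collected_examples: Dict[str, List[str]],
    ) -> None:
        """Add newly collected data without duplicating existing entries."""
        for representative, words in collected_synonyms.items():
            self.synonyms.setdefault(representative, set()).update(words)
        for intent, sentences in collected_examples.items():
            known = self.examples.setdefault(intent, [])
            for sentence in sentences:
                if sentence not in known:  # keep first-seen order, skip repeats
                    known.append(sentence)
```

A scheduler would call `merge` once per collection cycle, for instance weekly, with the results of that cycle's analysis.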

In addition, the data preprocessing unit 310 may rank data based on similarity according to a classification criterion. Here, a classification criterion may correspond to a parameter used for recommending learning data, and may serve as the link for the recommendation by being matched against the word or sentence that a user enters for learning. That is, the data preprocessing unit 310 may calculate the similarity between a classification criterion and the data belonging to the learning population, and determine a ranking according to that similarity.

In one embodiment, the data preprocessing unit 310 may build a similarity table by determining rankings according to similarity. That is, as a result of data preprocessing, the data preprocessing unit 310 may generate a word-word similarity table or a word-sentence similarity table. For example, the similarity table may correspond to a similarity matrix between the words belonging to the learning population, or to a similarity matrix between words and sentences.
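A word-word similarity table of this kind might be built as in the sketch below. The patent derives similarity from a trained learning model; here a character-bigram Jaccard score is swapped in purely as a placeholder so that the example is self-contained. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def char_bigrams(text: str) -> Set[str]:
    """Character bigrams of a string (placeholder feature, not the patent's model)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity over character bigrams, standing in for the learning model."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x and not y:
        return 1.0 if a == b else 0.0
    return len(x & y) / len(x | y)


def build_similarity_table(words: List[str]) -> Dict[str, List[Tuple[str, float]]]:
    """For each word, list every other word in descending order of similarity."""
    table: Dict[str, List[Tuple[str, float]]] = {}
    for word in words:
        scores = [(other, similarity(word, other)) for other in words if other != word]
        table[word] = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return table
```

Storing each row already sorted means a later recommendation request only needs to read the first few entries of the row for the requested word.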

In one embodiment, the data preprocessing unit 310 may classify the learning population into a chatbot dictionary set and a set of example sentences by intent, and may generate a learning model as a result of learning the word-word and word-sentence learning data, respectively. The chatbot dictionary set integrates the dictionaries collected from the chatbots and may consist of sets of equivalent words or synonyms. In some cases, a synonym set may additionally include, along with the synonyms that share a meaning, information on the representative word that stands for those synonyms. The set of example sentences by intent may consist of specific intents and the set of example sentences for each intent. That is, the example sentences in a set may be classified as belonging to that specific intent.

The data preprocessing unit 310 may learn word-word learning data from the chatbot dictionary set by analyzing equivalent words, synonyms, and representative words, and may learn word-sentence learning data, that is, intents and their example sentences, from the set of example sentences by intent. The data preprocessing unit 310 may maintain and manage the learning model produced by each kind of learning independently. Given an input consisting of a specific word-word or word-sentence pair, the resulting learning model may provide the similarity between that word and word, or that word and sentence, as its output.

In one embodiment, the data preprocessing unit 310 may calculate, based on the learning model, the similarity of every word-word and word-sentence combination derived from the chatbot dictionary set and the set of example sentences by intent. The data preprocessing unit 310 may use the learning model in the preprocessing stage to generate similarity ranking information for learning data recommendation. That is, the data preprocessing unit 310 may measure and rank, for each word in the chatbot dictionary set, its similarity to the other words, and may measure and rank, for each intent in the set of example sentences by intent, its similarity to the example sentences.
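The per-intent ranking of example sentences can be sketched as below. The learning model is abstracted as a callable that takes a word and a sentence and returns a similarity. The `Scorer` alias, `rank_examples_by_intent`, and the `toy_score` function are all hypothetical, and `toy_score` is a deliberately crude stand-in rather than anything the patent describes.

```python
from typing import Callable, Dict, List, Tuple

# (word, sentence) -> similarity; stands in for the trained word-sentence model.
Scorer = Callable[[str, str], float]


def rank_examples_by_intent(
    examples_by_intent: Dict[str, List[str]],
    score: Scorer,
) -> Dict[str, List[Tuple[str, float]]]:
    """Precompute, for each intent, its example sentences in descending order of similarity."""
    ranking: Dict[str, List[Tuple[str, float]]] = {}
    for intent, sentences in examples_by_intent.items():
        scored = [(sentence, score(intent, sentence)) for sentence in sentences]
        ranking[intent] = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranking


def toy_score(word: str, sentence: str) -> float:
    """Toy scorer: share of sentence tokens equal to the word (illustration only)."""
    tokens = sentence.lower().split()
    return tokens.count(word.lower()) / len(tokens) if tokens else 0.0
```

Passing the scorer in as a parameter keeps the ranking step independent of whichever model actually produces the similarities.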

The recommendation request receiving unit 330 may receive a recommendation request for learning data from the user terminal 110. Through the user terminal 110, the user may request learning data for training the chatbot the user intends to use. For example, in near-synonym learning, the user may enter a specific word to be learned, and the user terminal 110 may request from the learning data recommendation device 130 recommendation data that can be learned as near-synonyms of that word. In per-intent example sentence learning, the user may enter the intent to be learned, and the user terminal 110 may request from the learning data recommendation device 130 recommendation data that can be learned as example sentences of that intent.

The recommendation data generation unit 350 may determine the learning type of a recommendation request and generate recommendation data for that learning type based on the similarity ranking. The recommendation data generation unit 350 may determine the learning type from the information supplied with the recommendation request, that is, the word or sentence entered by the user.

In one embodiment, the recommendation data generation unit 350 may select as the learning type either per-intent example sentence learning, which learns training sentences based on an entered intent and example sentence, or near-synonym learning, which learns near-synonyms based on an entered representative word. That is, if the word entered by the user corresponds to an intent, or if the user enters a sentence, the request may fall under per-intent example sentence learning, whose purpose is to learn example sentences for a specific intent. If the word entered by the user does not correspond to an intent, the request may fall under near-synonym learning, whose purpose is to learn near-synonyms of that word. After classifying the learning type from the user's input, the recommendation data generation unit 350 may provide learning data suited to that learning type as the response.
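The decision rule in this paragraph can be sketched as follows. Treating any input with more than one whitespace-separated token as a sentence is a crude assumption made only for illustration (the specification does not say how words and sentences are told apart), and the function name and return labels are hypothetical.

```python
def determine_learning_type(user_input, known_intents):
    """Classify a request as per-intent example learning or near-synonym learning."""
    text = user_input.strip()
    # Assumption: more than one token means the input is a sentence.
    is_sentence = len(text.split()) > 1
    if is_sentence or text in known_intents:
        return "intent_example_learning"
    return "near_synonym_learning"

known = {"refund", "order_status"}
print(determine_learning_type("refund", known))
print(determine_learning_type("I want my money back", known))
print(determine_learning_type("reimbursement", known))
```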

In one embodiment, the recommendation data generation unit 350 may perform intent inference on the entered example sentence through natural language understanding (NLU), and may generate, as recommendation data for per-intent example sentence learning, a list of example sentences sorted by the word-sentence similarity ranking for the inferred intent.

When the user's input is a sentence rather than a word, the request may fall under per-intent example sentence learning, and the recommendation data generation unit 350 must determine the intent of that sentence in order to generate learning data. Accordingly, the recommendation data generation unit 350 may infer the intent of the entered sentence through natural language understanding (NLU), which combines natural language processing (NLP) with deep learning.

In addition, the recommendation data generation unit 350 may build an independent learning model for natural language understanding (NLU). In another embodiment, the recommendation data generation unit 350 may use a learning model trained by another chatbot for natural language understanding (NLU). To this end, the recommendation data generation unit 350 may include a separate module that selects, from among the plurality of chatbots, the chatbot to be used for natural language understanding (NLU).

Based on the intent inferred for the entered example sentence, the recommendation data generation unit 350 may select only the example sentences up to a specific rank in the word-sentence similarity ranking produced by the data preprocessing unit 310, and generate them as recommendation data. In another example, the recommendation data generation unit 350 may select candidate intents similar to the inferred intent and generate recommendation data from the example sentence set of the inferred intent together with the example sentence sets of the candidate intents; in some cases, it may recalculate the similarity over those example sentence sets.

In one embodiment, when no intent identical to the inferred intent exists in the learning population, the recommendation data generation unit 350 may determine at least one intent similar to the inferred intent, and may generate recommendation data for per-intent example sentence learning based on the example sentences associated with that at least one intent.

For example, the recommendation data generation unit 350 may determine the intent most similar to the inferred intent and generate recommendation data according to the similarity ranking over the example sentence set of that intent (that is, its per-intent example sentences). The recommendation data generation unit 350 may also determine at least one intent based on its similarity to the inferred intent, merge the example sentence sets of those intents, and then generate recommendation data according to the similarity ranking.
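The selection logic of the preceding paragraphs (top-ranked examples for a known intent, otherwise merged examples from similar intents) might look like the sketch below. The data layout, the `top_k` and `min_similarity` parameters, and the choice to reuse precomputed scores when merging (rather than recalculating them, which the specification also allows) are all assumptions for illustration.

```python
def recommend_examples(intent, ranked_examples, intent_similarity,
                       top_k=3, min_similarity=0.5):
    """Return up to top_k example sentences for an inferred intent.

    ranked_examples: {intent: [(sentence, similarity), ...]} from preprocessing.
    intent_similarity: {intent: {other_intent: similarity}} (hypothetical input).
    """
    if intent in ranked_examples:
        pool = list(ranked_examples[intent])
    else:
        # No identical intent: merge the example sets of sufficiently similar intents.
        pool = []
        for other, score in intent_similarity.get(intent, {}).items():
            if score >= min_similarity:
                pool.extend(ranked_examples.get(other, []))
    pool.sort(key=lambda pair: pair[1], reverse=True)
    return [sentence for sentence, _ in pool[:top_k]]

ranked = {
    "refund": [("I want my money back", 0.9), ("Please refund my order", 0.8)],
    "cancel_order": [("Cancel my order", 0.7)],
}
similar = {"return_item": {"refund": 0.8, "cancel_order": 0.6, "greeting": 0.1}}
print(recommend_examples("return_item", ranked, similar, top_k=2))
```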

The data recommendation unit 370 may provide the recommendation data to the user terminal 110 as a response to the recommendation request. The data recommendation unit 370 may provide the recommendation data generated by the recommendation data generation unit 350 to the user terminal 110 unchanged or, if necessary, provide only part of it.

In one embodiment, the data recommendation unit 370 may receive a user response to the recommendation data from the user terminal 110 and update the learning model accordingly. To this end, the data recommendation unit 370 may provide, together with the recommendation data, an interface for entering the user response, through which the user can choose to apply or exclude each recommended item.

The control unit 390 may control the overall operation of the learning data recommendation device 130 and manage the control flow or data flow among the data preprocessing unit 310, the recommendation request receiving unit 330, the recommendation data generation unit 350, and the data recommendation unit 370.

FIG. 4 is a flowchart illustrating the learning data recommendation process performed by the learning data recommendation device of FIG. 1.

Referring to FIG. 4, the learning data recommendation device 130 may, through the data preprocessing unit 310, analyze a learning population that is periodically collected and updated from a plurality of chatbots and determine a similarity ranking for at least one classification criterion (step S410). The learning data recommendation device 130 may, through the recommendation request receiving unit 330, receive a recommendation request for learning data from the user terminal 110 (step S430).

The learning data recommendation device 130 may, through the recommendation data generation unit 350, determine the learning type of the recommendation request and generate recommendation data for that learning type based on the similarity ranking (step S450). The learning data recommendation device 130 may, through the data recommendation unit 370, provide the recommendation data to the user terminal 110 as a response to the recommendation request (step S470).

FIG. 5 is a diagram illustrating the overall operation of a learning data recommendation device according to an embodiment of the present invention.

Referring to FIG. 5, the learning data recommendation device 130 may identify a recommendation request for learning data by receiving a user input. The learning data recommendation device 130 may determine the learning type of the user input, choosing between a near-synonym learning type, which learns near-synonyms of a specific word, and a per-intent example sentence learning type, which learns example sentences for a specific intent.

That is, the learning data recommendation device 130 may generate and recommend learning data through a separate process for each learning type. To this end, the learning data recommendation device 130 may carry out data preprocessing through the data preprocessing unit 310.

Data preprocessing may proceed by collecting data from the plurality of chatbots and then measuring and ranking word-word and word-sentence similarities. In one embodiment, the data preprocessing unit 310 may use a Word2Vec algorithm modified to suit Korean (Hangul). The data preprocessing unit 310 may also refine the similarity results with a weighted fuzzy algorithm based on the edit distance (Levenshtein distance).
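The specification does not define the weighted fuzzy algorithm, so the following is only one plausible reading: a normalized Levenshtein similarity blended with an embedding-based score by a fixed weight. The `weight` parameter and the function names are assumptions; only the Levenshtein distance itself is the standard dynamic-programming algorithm.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,         # deletion
                               current[j - 1] + 1,      # insertion
                               previous[j - 1] + cost)) # substitution
        previous = current
    return previous[-1]

def edit_similarity(a, b):
    """Map edit distance to a similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def blended_similarity(embedding_score, a, b, weight=0.7):
    """Hypothetical blend: weight on the embedding score, the rest on surface form."""
    return weight * embedding_score + (1.0 - weight) * edit_similarity(a, b)
```

A blend of this kind lets spelling variants of the same word score highly even when the embedding model has seen one of them too rarely to place it well.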

In FIG. 5, when the user input is a sentence, the learning data recommendation device 130 may infer the intent of that sentence and then generate recommendation data based on the inferred intent. Intent inference is the process of classifying which intent an input sentence expresses and may be performed through natural language understanding (NLU); the learning data recommendation device 130 may be implemented with an independently operating module for natural language understanding.

Here, natural language understanding (NLU) may be divided into morpheme-analysis-first inference and machine-learning-first inference, according to the order in which inference steps are performed to identify the intent of the input sentence.

In morpheme-analysis-first inference, if pattern matching inference (hereinafter, A-pattern) fails to find a suitable intent, the intent may be re-inferred by morpheme analysis inference (hereinafter, B-pattern). If B-pattern inference finds no intent, or finds a suitable intent but with an inference rate below the threshold of the configured success range for morpheme analysis inference, machine learning inference (hereinafter, C-pattern) may be performed. If the C-pattern result is at or above the configured success range for machine learning inference, the found intent is returned; otherwise, a default result (default fallback) may finally be returned. The inference success range is the criterion that determines how far the inferred data is to be trusted, and may be set separately for each pattern.

Machine-learning-first inference is the reverse of morpheme-analysis-first inference: if the A-pattern fails to find an intent, the intent may be inferred first by the C-pattern method. Apart from the order, each inference step may be performed in the same way as in morpheme-analysis-first inference.
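The two inference orders can be sketched as a single cascade with a switch. The callables `match_pattern`, `infer_morpheme`, and `infer_ml`, the threshold values, and the assumption that machine-learning-first inference falls through to B-pattern inference before returning the default fallback are all illustrative; the specification states only that the order of the steps differs.

```python
def infer_intent(sentence, match_pattern, infer_morpheme, infer_ml,
                 morpheme_threshold=0.6, ml_threshold=0.6,
                 morpheme_first=True, fallback="default_fallback"):
    """Try A-pattern, then B and C patterns in the chosen order, then fall back.

    match_pattern returns an intent or None; infer_morpheme and infer_ml
    return an (intent, score) tuple or None.
    """
    exact = match_pattern(sentence)  # A-pattern: exact pattern match
    if exact is not None:
        return exact
    stages = [(infer_morpheme, morpheme_threshold), (infer_ml, ml_threshold)]
    if not morpheme_first:
        stages.reverse()  # machine-learning-first: C-pattern before B-pattern
    for infer, threshold in stages:
        result = infer(sentence)
        if result is not None:
            intent, score = result
            if score >= threshold:  # per-pattern inference success range
                return intent
    return fallback

# Stub inference functions standing in for real B-pattern and C-pattern models.
no_match = lambda s: None
weak_morpheme = lambda s: ("refund", 0.4)
strong_ml = lambda s: ("cancel_order", 0.9)
print(infer_intent("stop my order", no_match, weak_morpheme, strong_ml))
```

Keeping a separate threshold per stage mirrors the statement that the inference success range may be set for each pattern.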

Pattern matching (A-pattern) inference may be a method of finding a sentence whose pattern is exactly identical to an example sentence registered for an intent. A-pattern inference can find sentences that are exactly identical or, when parameters are registered, sentences in which those parameters can be substituted; and when a standard sentence is defined for an example sentence, it can recognize an input that matches that standard sentence.
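A minimal sketch of exact matching with parameter substitution follows. The `{name}` placeholder syntax, the data layout, and the function names are hypothetical conventions chosen for this example; the specification does not describe how parameters are written in example sentences.

```python
import re

def compile_example(example, parameters):
    """Turn an example such as 'I want to order {menu}' into a compiled regex."""
    pattern = ""
    for part in re.split(r"(\{\w+\})", example):
        if re.fullmatch(r"\{\w+\}", part):
            name = part[1:-1]
            # A placeholder matches any registered value of that parameter.
            alternatives = "|".join(re.escape(v) for v in parameters[name])
            pattern += f"(?P<{name}>{alternatives})"
        else:
            pattern += re.escape(part)
    return re.compile(pattern)

def match_pattern(sentence, intents, parameters):
    """Return the first intent with an example that matches the whole sentence."""
    for intent, examples in intents.items():
        for example in examples:
            if compile_example(example, parameters).fullmatch(sentence):
                return intent
    return None

intents = {"order": ["I want to order {menu}"], "greeting": ["hello"]}
parameters = {"menu": ["pizza", "pasta"]}
print(match_pattern("I want to order pasta", intents, parameters))
```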

Morpheme analysis (B-pattern) inference may be a method of inferring the intent whose example sentence yields the morpheme analysis result most similar to that of the input sentence. B-pattern inference may evaluate similarity with a focus on nouns, verbs, and keywords. Machine learning (C-pattern) inference may be a method of inferring the intent based on results learned automatically through machine learning.
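As a rough illustration of B-pattern inference, the sketch below compares token sets with Jaccard similarity. Whitespace splitting stands in for a real morphological analyzer (Korean text would need a dedicated one), and neither Jaccard similarity nor the function names come from the specification; a fuller version would keep only nouns, verbs, and keywords before comparing.

```python
def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def infer_by_morphemes(sentence, intents, analyze=str.split):
    """Return (intent, score) for the most similar example, or None if nothing overlaps.

    analyze is a stand-in for a morphological analyzer that returns tokens.
    """
    query = analyze(sentence.lower())
    best_intent, best_score = None, 0.0
    for intent, examples in intents.items():
        for example in examples:
            score = jaccard(query, analyze(example.lower()))
            if score > best_score:
                best_intent, best_score = intent, score
    return (best_intent, best_score) if best_intent is not None else None

intents = {"refund": ["please refund my order"], "greeting": ["hello there"]}
print(infer_by_morphemes("refund my order now", intents))
```

The returned score is what a cascade would compare against the inference success range configured for morpheme analysis.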

In FIG. 5, to generate recommendation data for the per-intent example sentence learning type, the learning data recommendation device 130 may determine whether the user input is a word. If the user input is a word and an identical intent exists, the learning data recommendation device 130 may generate and recommend learning data based on the word-sentence similarity ranking.

However, if the user input is a word but no identical intent exists, the learning data recommendation device 130 may select, from among the intents included in the per-intent example sentence set, a plurality of similar intents whose similarity is at or above a specific level. The learning data recommendation device 130 may then generate learning data based on the word-sentence similarity ranking over the example sentence sets of those similar intents.

In one embodiment, the learning data recommendation device 130 may perform a learning data recommendation method comprising: updating a learning population by periodically collecting chatbot data managed by a plurality of chatbots; analyzing the learning population to determine a similarity ranking for at least one classification criterion; receiving a recommendation request for learning data from the user terminal 110; determining a learning type for the recommendation request; generating recommendation data for the learning type based on the similarity ranking; and providing the recommendation data to the user terminal 110 as a response to the recommendation request.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that it may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

100: learning data recommendation system for chatbots
110: user terminal 130: learning data recommendation device
150: database
210: processor 230: memory
250: user input/output unit 270: network input/output unit
310: data preprocessing unit 330: recommendation request receiving unit
350: recommendation data generation unit 370: data recommendation unit

Claims

A learning data recommendation device for a chatbot, comprising:
a data preprocessing unit that analyzes a learning population periodically collected and updated from a plurality of chatbots and determines a similarity ranking for at least one classification criterion;
a recommendation request receiving unit that receives a recommendation request for learning data from a user terminal;
a recommendation data generation unit that determines a learning type for the recommendation request and generates recommendation data for the learning type based on the similarity ranking; and
a data recommendation unit that provides the recommendation data to the user terminal as a response to the recommendation request.

The device of claim 1, wherein the data preprocessing unit
classifies the learning population into a chatbot dictionary set and a per-intent example sentence set, and generates a learning model as a result of training separately on learning data in word-word form and in word-sentence form.

The device of claim 2, wherein the data preprocessing unit
calculates, based on the learning model, the similarity of every word-word and word-sentence combination derived from the chatbot dictionary set and the per-intent example sentence set.

The device of claim 1, wherein the recommendation data generation unit
determines, as the learning type, either per-intent example sentence learning, which learns training sentences based on an entered intent and example sentence, or near-synonym learning, which learns near-synonyms based on an entered representative word.

The device of claim 3, wherein the recommendation data generation unit
performs intent inference on the entered example sentence through natural language understanding (NLU), and generates, as recommendation data for per-intent example sentence learning, a list of example sentences sorted by the word-sentence similarity ranking for the inferred intent.

The device of claim 5, wherein the recommendation data generation unit,
when no intent identical to the inferred intent exists in the learning population, determines at least one intent similar to the inferred intent and generates recommendation data for the per-intent example sentence learning based on the example sentences associated with the at least one intent.

The device of claim 2, wherein the data recommendation unit
receives a user response to the recommendation data from the user terminal and updates the learning model.

A learning data recommendation method for a chatbot, performed in a learning data recommendation device, the method comprising:
updating a learning population by periodically collecting chatbot data managed by a plurality of chatbots;
analyzing the learning population to determine a similarity ranking for at least one classification criterion;
receiving a recommendation request for learning data from a user terminal;
determining a learning type for the recommendation request;
generating recommendation data for the learning type based on the similarity ranking; and
providing the recommendation data to the user terminal as a response to the recommendation request.