KR102117281B1

KR102117281B1 - Method for generating chatbot utterance using frequency table

Info

Publication number: KR102117281B1
Application number: KR1020180117014A
Authority: KR
Inventors: 남후람; 프로코페브 알렉세이; 정명원
Original assignee: 주식회사 아카에이아이
Priority date: 2018-10-01
Filing date: 2018-10-01
Publication date: 2020-06-01
Also published as: KR20200037593A; WO2020071666A1

Abstract

A method for generating a chatbot utterance using a frequency table is disclosed. The method includes generating a frequency table using a user's past utterances, and generating, using the frequency table, a chatbot utterance for responding to a new utterance of the user. Generating the frequency table includes: generating a first dependency parse tree for one or more sentences of the user's past utterances; extracting, from the first dependency parse tree, one or more predetermined first data sets each defined by i) a first object corresponding to a predetermined first vertex and ii) a second object corresponding to a predetermined second vertex; generating a first table into which the extracted one or more first data sets are entered; calculating a frequency of the one or more first data sets in the first table; and generating the frequency table by extracting, from the one or more first data sets of the first table, one or more second data sets that have a predetermined frequency and are meaningful to the user. Generating the chatbot utterance includes: generating a second dependency parse tree for one or more sentences of the user's new utterance; extracting, from the second dependency parse tree, one or more predetermined third data sets each defined by i) a first object corresponding to a predetermined first vertex and ii) a second object corresponding to a predetermined second vertex; comparing the one or more second data sets of the frequency table with the one or more third data sets; extracting the one or more third data sets as an utterance candidate data set when the one or more second data sets of the frequency table and the one or more third data sets have a predetermined similarity; generating, using a natural language generation algorithm, a conversation candidate sentence corresponding to the extracted utterance candidate data set; and generating the chatbot utterance using the conversation candidate sentence.

Description

METHOD FOR GENERATING CHATBOT UTTERANCE USING FREQUENCY TABLE

The present invention relates to a method for generating a chatbot utterance using a frequency table.

Recently, demand for natural language processing that enables communication between humans and chatbots has been increasing in various fields such as personal assistants, messengers, finance, and distribution. For a chatbot that processes natural language in real time, storing the entire conversation log obtained through conversations between the user and the chatbot and processing the information upon the user's request is not suitable for rapid natural language processing, because the amount of data to be processed is large. Accordingly, there is a growing need for a technique that stores information systematically and allows it to be processed quickly upon the user's request.

Korean Laid-Open Patent Publication No. 10-2016-0082996 (2016.06.30)

The problem to be solved by the present invention is to provide a method for generating a chatbot utterance using a frequency table.

The problems to be solved by the present invention are not limited to the problem mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

According to one aspect of the present invention for solving the above problem, a method for generating a chatbot utterance using a frequency table includes generating a frequency table using a user's past utterances, and generating, using the frequency table, a chatbot utterance for responding to a new utterance of the user. Generating the frequency table includes: generating a first dependency parse tree for one or more sentences of the user's past utterances; extracting, from the first dependency parse tree, one or more predetermined first data sets each defined by i) a first object corresponding to a predetermined first vertex and ii) a second object corresponding to a predetermined second vertex; generating a first table into which the extracted one or more first data sets are entered; calculating a frequency of the one or more first data sets in the first table; and generating the frequency table by extracting, from the one or more first data sets of the first table, one or more second data sets that have a predetermined frequency and are meaningful to the user. Generating the chatbot utterance includes: generating a second dependency parse tree for one or more sentences of the user's new utterance; extracting, from the second dependency parse tree, one or more predetermined third data sets each defined by i) a first object corresponding to a predetermined first vertex and ii) a second object corresponding to a predetermined second vertex; comparing the one or more second data sets of the frequency table with the one or more third data sets; extracting the one or more third data sets as an utterance candidate data set when the one or more second data sets of the frequency table and the one or more third data sets have a predetermined similarity; generating, using a natural language generation algorithm, a conversation candidate sentence corresponding to the extracted utterance candidate data set; and generating the chatbot utterance using the conversation candidate sentence.

In some embodiments, generating the frequency table by extracting, from the one or more first data sets of the first table, one or more second data sets that have a predetermined frequency and are meaningful to the user includes: generating a third dependency parse tree for one or more sentences of past utterances of a plurality of other users; extracting, from the third dependency parse tree, one or more predetermined fourth data sets each defined by i) a third object corresponding to a predetermined third vertex and ii) a fourth object corresponding to a predetermined fourth vertex; generating a second table into which the one or more fourth data sets are entered; calculating a frequency of the one or more fourth data sets in the second table; and comparing the frequency calculation result of the fourth data sets with the frequency calculation result of the first data sets and, based on the comparison result, extracting the one or more first data sets as the one or more second data sets meaningful to the user.

In some embodiments, the first object or the third object is defined as a noun phrase, and the second object or the fourth object is defined as a verb phrase.

In some embodiments, the first object, the second object, the third object, or the fourth object is converted into a lemma from which derivational elements have been removed through morphological and lexical analysis.

In some embodiments, when a plurality of words combine to form a single meaning, the plurality of words are treated as a single object for the first object, the second object, the third object, or the fourth object.

In some embodiments, in the step of extracting the one or more third data sets as an utterance candidate data set when the one or more second data sets of the frequency table and the one or more third data sets have a predetermined similarity, the similarity between the one or more second data sets and the one or more third data sets is determined using the Word2Vec technique.
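As a minimal sketch of such a similarity check, the snippet below compares two (subject, verb) data sets by the cosine similarity of their averaged word vectors. The toy vectors, the `THRESHOLD` value, and the function names are hypothetical. In practice the vectors would come from a trained Word2Vec model, and the text does not specify how a data set is mapped to a vector, so the averaging step is an assumption.

```python
import math

# Toy 3-dimensional word vectors (hypothetical stand-ins for Word2Vec output).
VECTORS = {
    "I": [1.0, 0.0, 0.0],
    "like": [0.0, 1.0, 0.2],
    "love": [0.0, 0.9, 0.3],
    "run": [0.0, -0.2, 1.0],
}
THRESHOLD = 0.8  # assumed value for the "predetermined similarity"


def dataset_vector(dataset):
    """Average the word vectors of the objects in a data set (an assumption)."""
    vecs = [VECTORS[word] for word in dataset]
    return [sum(component) / len(vecs) for component in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def is_similar(second_set, third_set):
    """True when a stored data set and a new data set are similar enough."""
    return cosine(dataset_vector(second_set), dataset_vector(third_set)) >= THRESHOLD
```

With these toy vectors, `is_similar(("I", "like"), ("I", "love"))` holds while `is_similar(("I", "like"), ("I", "run"))` does not.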

Other specific details of the present invention are included in the detailed description and the drawings.

According to the method of the present invention for generating a chatbot utterance using a frequency table, the user's past utterances can be embedded into a frequency table and thereby stored and managed in a systematic and organized manner.

In addition, by building a frequency table from the user's utterances and sorting it in descending order of frequency, word groups that are meaningful to the user can be detected in accordance with Zipf's law.

In addition, before a natural language processing mechanism is applied, the frequency table can be used to determine whether a word is meaningful for natural language generation, and the similarity between past and new utterances can be determined to find out whether a chatbot utterance can be generated from the word groups in the frequency table.

Therefore, by using the frequency table to determine whether the user's new utterance is meaningful and whether it is similar to past utterances, a chatbot utterance can be generated quickly when a response is needed, enabling real-time conversation between the user and the chatbot.

The effects of the present invention are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a schematic flowchart of a method for generating a chatbot utterance using a frequency table according to an embodiment of the present invention.
FIG. 2 is a schematic flowchart of a method for generating a frequency table according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram of a dependency parse tree according to an embodiment of the present invention.
FIG. 4 is a schematic flowchart of a method for extracting a second data set meaningful to a user according to an embodiment of the present invention.
FIG. 5 is a schematic flowchart of a method for generating a chatbot utterance according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a computer system that generates a chatbot service program according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a chatbot system that generates a chatbot utterance according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that the scope of the invention is fully conveyed to those skilled in the art to which the present invention pertains; the present invention is defined only by the scope of the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural form unless specifically stated otherwise. As used herein, “comprises” and/or “comprising” does not exclude the presence or addition of one or more components other than those mentioned. Like reference numerals refer to like components throughout the specification, and “and/or” includes each of the mentioned components and every combination of one or more of them. Although “first”, “second”, and the like are used to describe various components, these components are of course not limited by these terms. These terms are used only to distinguish one component from another. Therefore, a first component mentioned below may of course be a second component within the technical spirit of the present invention.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are clearly and specifically defined.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic flowchart of a method for generating a chatbot utterance using a frequency table according to an embodiment of the present invention.

Referring to FIG. 1, the method for generating a chatbot utterance using a frequency table includes generating a frequency table (S100) and generating a chatbot utterance (S200).

In step S100, the computer generates a frequency table using the user's past utterances.

In step S200, the computer uses the frequency table to generate a chatbot utterance for responding to the user's new utterance.

FIG. 2 is a schematic flowchart of a method for generating a frequency table according to an embodiment of the present invention.

Referring to FIG. 2, the method for generating a frequency table includes generating a first dependency parse tree 300 (S110), extracting first data sets (S120), generating a first table (S130), calculating the frequency of the first data sets (S140), and extracting second data sets to generate the frequency table (S150).

In step S110, the computer generates a first dependency parse tree 300 for one or more sentences of the past utterances of the user (who may be referred to as the specific user).

The dependency parse tree 300 takes each of the words contained in one or more sentences of the user's past utterances as a vertex and connects the vertices according to the dependency relations between the words.

A dependency relation between words indicates how one word in a sentence is grammatically or semantically related to another, for example as shown in Table 1.

Table 1

Dependency  Description           Dependency  Description
nsubj       Sentence subject      det         Determiner
dobj        Direct object         poss        Possessive
iobj        Indirect object       conj        Conjunct
ccomp       Clausal complement    cc          Coordinating conjunction
nmod        Nominal modifier      compound    Compound
amod        Adjectival modifier   punct       Punctuation
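One way to hold such a tree in memory is as a list of labeled edges between word vertices. The sketch below is illustrative only: the `Edge` class, the `dependents` helper, and the hand-written parse of "Jane likes apple." are assumptions, not parser output and not a structure prescribed by the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    head: str       # governing word (vertex)
    relation: str   # dependency label from Table 1, e.g. "nsubj"
    dependent: str  # dependent word (vertex)


# Hand-written parse of "Jane likes apple." (not produced by a parser).
TREE = [
    Edge("likes", "nsubj", "Jane"),
    Edge("likes", "dobj", "apple"),
    Edge("likes", "punct", "."),
]


def dependents(tree, head, relation):
    """Return the words attached to `head` by the given relation."""
    return [e.dependent for e in tree if e.head == head and e.relation == relation]
```

For instance, `dependents(TREE, "likes", "nsubj")` picks out the subject vertex of the verb "likes".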

The dependency parse tree 300 is well known to those skilled in the art to which the present invention pertains, and a detailed description of it is omitted below because it may obscure the gist of the present invention.

In some embodiments, before generating the first dependency parse tree 300, if one or more sentences of the user's past utterances contain words that stand in for an entity, such as pronouns, the computer converts them so that they can be distinguished from pronouns referring to other entities. The computer performs the same operation not only for pronouns but also for demonstrative pronouns and expressions referring to third parties. For example, in “Jane likes apple. She is vegetarian.” and “Alice goes to mountain. She likes hiking.”, the “She” in the first passage means “Jane” and the “She” in the second passage means “Alice”. Because the two occurrences of “She” are pronouns referring to different entities, they are converted into separate expressions.
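The conversion can be pictured with a deliberately naive sketch that tags each pronoun with the most recently mentioned name. The name list, the pronoun list, the `Pronoun[Referent]` notation, and the "most recent name" heuristic are all assumptions made for illustration; the text does not say how referents are resolved, and real coreference resolution is considerably harder.

```python
import re

KNOWN_NAMES = {"Jane", "Alice"}  # hypothetical list of known referents
PRONOUNS = {"She", "He"}         # hypothetical list of pronouns to rewrite


def tag_pronouns(text):
    """Rewrite each pronoun as 'Pronoun[Referent]' using the most recent name."""
    last_name = None
    out = []
    # Split into word tokens and single punctuation marks.
    for token in re.findall(r"\w+|[^\w\s]", text):
        if token in KNOWN_NAMES:
            last_name = token
        elif token in PRONOUNS and last_name is not None:
            token = f"{token}[{last_name}]"
        out.append(token)
    return " ".join(out)
```

Applied to the two example passages, the two occurrences of “She” become the distinct expressions `She[Jane]` and `She[Alice]`.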

In some embodiments, the computer may also include words such as conjunctions, articles, adverbs, and determiners, for example “and”, “the”, “also”, and “which”, as vertices of the dependency parse tree 300.

In some embodiments, when a plurality of words corresponding to a plurality of vertices of the first dependency parse tree 300 combine to form a single meaning, the computer merges those vertices into one vertex. For example, the computer merges the vertices when two words combine to form a compound or when a possessive combines with a noun. For example, “우리” (our) and “할머니” (grandmother) may be merged into “우리 할머니” (our grandmother). Likewise, “보따리” (bundle) and “짐” (luggage) may be merged into “보따리 짐” (bundled luggage), and “오고” (coming) and “가다” (to go) may be merged into “오고 가다” (to come and go).

The rules for merging a plurality of predetermined vertices of the dependency parse tree 300 into one vertex are not limited to these examples, and any other rule not illustrated here may be applied.

In step S120, the computer extracts, from the first dependency parse tree 300, one or more predetermined first data sets each defined by i) a first object corresponding to a predetermined first vertex and ii) a second object corresponding to a predetermined second vertex.

The first object of the one or more first data sets may be the subject of a sentence, and the second object may be the verb of the sentence. Alternatively, the first object of the one or more first data sets may be a noun or noun phrase, and the second object may be a verb or verb phrase.
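For the subject and verb case, the extraction can be sketched as picking out every `nsubj` edge of the tree. The `(head, relation, dependent)` triple layout, the function name, and the hand-written edges are assumptions for illustration.

```python
def extract_datasets(edges):
    """Return (subject, verb) pairs from (head, relation, dependent) edges."""
    return [
        (dependent, head)
        for head, relation, dependent in edges
        if relation == "nsubj"
    ]


# Hand-written edges for "Jane likes apple." and "Alice goes to mountain."
EDGES = [
    ("likes", "nsubj", "Jane"),
    ("likes", "dobj", "apple"),
    ("goes", "nsubj", "Alice"),
    ("goes", "nmod", "mountain"),
]
```

Here `extract_datasets(EDGES)` yields one data set per sentence, pairing each subject (first object) with its verb (second object).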

The first object or the second object may be converted into a lemma, from which derivational elements have been removed through morphological and lexical analysis, before becoming part of a first data set. For example, “걷기” (walking), “걸어 다닙니다” (walks around), and “걷다” (to walk) may all be converted into “걷다” (to walk).
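The effect of this conversion can be illustrated with a plain lookup table. The table below is a hypothetical stand-in for the morphological and lexical analysis, which the text does not detail; it only reproduces the example above.

```python
# Hypothetical lookup table standing in for morphological and lexical analysis.
LEMMAS = {
    "걷기": "걷다",
    "걸어 다닙니다": "걷다",
    "걷다": "걷다",
}


def lemmatize(obj):
    """Return the lemma of an object, or the object itself if it is unknown."""
    return LEMMAS.get(obj, obj)
```

All three surface forms map to the same lemma, so they are counted as one entry when frequencies are computed later.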

The computer may generate a semantic graph database in which the first object and the second object of each extracted first data set are vertices and the relation between the first object and the second object is a directed edge.

In some embodiments, a directed edge contains not only information on the relation between the first object and the second object but also timestamp information of the sentence corresponding to the data set. For example, the directed edge may include the time at which the user uttered the sentence corresponding to the data set and thereby defined the relation between the first object and the second object.
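A minimal in-memory sketch of such a graph is shown below. The class names, the field names, and the use of a plain list instead of an actual graph database are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DirectedEdge:
    source: str          # first object (e.g. the subject)
    target: str          # second object (e.g. the verb)
    relation: str        # relation between the two objects
    timestamp: datetime  # when the user uttered the sentence


class SemanticGraph:
    """Vertices are objects; directed edges carry a relation and a timestamp."""

    def __init__(self):
        self.vertices = set()
        self.edges = []

    def add_dataset(self, first_obj, second_obj, relation, timestamp):
        self.vertices.update((first_obj, second_obj))
        self.edges.append(DirectedEdge(first_obj, second_obj, relation, timestamp))

    def edges_from(self, vertex):
        """Return all edges that start at the given vertex."""
        return [e for e in self.edges if e.source == vertex]
```

Storing the timestamp on the edge keeps the time of the utterance attached to the relation it defined, so the same pair uttered at different times yields separate edges.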

In step S130, the computer generates a first table into which the extracted one or more first data sets are entered.

The computer initializes the first table before entering the extracted one or more first data sets into it. Here, “initializing” means securing (or allocating) memory space for storing the data of the table.

In some embodiments, the computer may enter the first object, corresponding to the subject, and the second object, corresponding to the verb, into the subject column and the verb column of the first table, respectively, so that they form a pair.

In step S140, the computer calculates the frequency of the one or more first data sets in the first table.

The computer calculates the frequency of the first data sets entered in the first table; when the first data sets are listed in descending order of the calculated frequency, the distribution may follow Zipf's law. For example, “I-be”, “I-take”, and “I-get”, which occur frequently in sentences in general, are assigned a low rank.
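The counting and ordering step can be sketched with Python's `collections.Counter`. The contents of `FIRST_TABLE` are hypothetical, and the sketch covers only the raw frequency ranking, not the later filtering of generic pairs such as “I-be”.

```python
from collections import Counter


def rank_by_frequency(first_table):
    """Count each (subject, verb) row and list rows from most to least frequent."""
    return Counter(first_table).most_common()


# Hypothetical first table: one (subject, verb) row per extracted data set.
FIRST_TABLE = [
    ("I", "be"), ("I", "like"), ("I", "be"),
    ("brother", "like"), ("I", "be"), ("I", "like"),
]
```

Each entry of the result pairs a data set with its count, which gives the rank order that the Zipf's law discussion refers to.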

The simplest example of Zipf's law is the “1/f function”: when frequencies following a Zipf distribution are sorted by rank, the frequency of the 2nd-ranked item is 1/2 of the frequency of the 1st-ranked item, and the frequency of the 3rd-ranked item is 1/3 of it. In this way, the frequency of the n-th-ranked item is 1/n of the frequency of the 1st-ranked item. In other words, when words are sorted in descending order of frequency, frequency and rank are inversely proportional. That is, the word frequency can be expressed by the following formula.
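Consistent with the 1/n relationship described above, the relation can be written as follows, where $f(n)$ is an assumed notation for the frequency of the item ranked n-th:

```latex
f(n) = \frac{f(1)}{n}, \qquad n = 1, 2, 3, \dots
```

For example, if the top-ranked data set occurs 60 times, this idealized relation predicts about 30 occurrences for the 2nd-ranked data set and about 20 for the 3rd.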

In step S150, the computer generates the frequency table by extracting, from the one or more first data sets of the first table, one or more second data sets that have a predetermined frequency and are meaningful to the user.

The method of extracting data sets meaningful to the user in step S150 is described in detail below with reference to FIG. 4.

FIG. 3 is an exemplary diagram of a dependency parse tree according to an embodiment of the present invention.

Referring to FIG. 3, a dependency parse tree 300 is shown for an exemplary user utterance, “My brother and I like the new Star Wars and also Star Trek which are coming out this year.”

In the exemplary dependency parse tree 300, the sentence “My brother and I like the new Star Wars and also Star Trek which are coming out this year.” is divided into its individual words and punctuation mark: “My” (301), “brother” (302), “and” (309), “I” (305), “like” (303), “the” (310), “new” (311), “Star” (312), “Wars” (306), “and” (309), “also” (313), “Star” (312), “Trek” (307), “which” (314), “are” (315), “coming” (308), “out” (316), “this” (318), “year” (317), and “.” (304).

In the exemplary dependency parse tree 300, each of these words and the punctuation mark, “My” (301) through “.” (304), becomes a vertex, and the dependency relations between the words are represented by directed edges connecting the vertices. For example, “My” (301) and “brother” (302) are analyzed as having a possessive (poss) relation, “brother” (302) and “I” (305) as having a conjunct (conj) relation, and “Star” (312) and “Wars” (306) as having a compound relation, and each pair is connected by a directed edge.

As described above, a plurality of predetermined vertices of the dependency parse tree 300 may be merged into one vertex. In the exemplary dependency parse tree 300, “My” (301) and “brother” (302) may be merged into “My brother”. Likewise, “Star” (312) and “Trek” (307) may be merged into “Star Trek”, and “are” (315), “coming” (308), and “out” (316) may be merged into “are coming out”.
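The merging of possessive and compound vertices can be sketched as follows. The set of relations that trigger a merge, the `(head, relation, dependent)` triple layout, and the simplified hand-written edges are assumptions; the sketch handles only dependents that precede their head and does not cover the verb-group case (“are coming out”).

```python
MERGE_RELATIONS = {"poss", "compound"}  # assumed relations that trigger merging


def merge_vertices(edges):
    """Merge dependents linked by a merge relation into their head vertex.

    `edges` is a list of (head, relation, dependent) triples in which the
    dependent precedes the head in the sentence, as in "My brother".
    """
    renamed = {}
    for head, relation, dependent in edges:
        if relation in MERGE_RELATIONS:
            renamed[head] = f"{dependent} {renamed.get(head, head)}"
    merged = []
    for head, relation, dependent in edges:
        if relation in MERGE_RELATIONS:
            continue  # the dependent has been absorbed into the head
        merged.append(
            (renamed.get(head, head), relation, renamed.get(dependent, dependent))
        )
    return merged


# Simplified, hand-written edges loosely based on the example sentence.
EDGES = [
    ("brother", "poss", "My"),
    ("like", "nsubj", "brother"),
    ("Trek", "compound", "Star"),
    ("like", "dobj", "Trek"),
]
```

After merging, the remaining edges connect “like” to the single vertices “My brother” and “Star Trek”.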

The computer extracts data sets from the dependency parse tree 300 of the exemplary user utterance “My brother and I like the new Star Wars and also Star Trek which are coming out this year.” and generates a semantic graph database in which the extracted data sets form the vertices and directed edges.

FIG. 4 is a schematic flowchart of a method for extracting a second data set meaningful to a user according to an embodiment of the present invention.

도 4를 참조하면, 사용자에게 의미있는 제2 데이터 셋을 추출하는 방법은 제3 의존 구문 분석 트리(300)를 생성하는 단계(S151), 제4 데이터 셋을 추출하는 단계(S152), 제2 테이블을 생성하는 단계(S153), 제4 데이터 셋의 빈도를 계산하는 단계(S154) 및 사용자에게 의미있는 제2 데이터 셋을 추출하는 단계(S155)를 포함한다.Referring to FIG. 4, a method for extracting a second data set meaningful to a user includes generating a third dependent parsing tree 300 (S151), extracting a fourth data set (S152), and second It includes creating a table (S153), calculating a frequency of the fourth data set (S154), and extracting a second data set meaningful to the user (S155).

In step S151, the computer generates a third dependency parse tree 300 for one or more sentences of past utterances of a plurality of other users (who may be referred to as global users).

The computer generates the third dependency parse tree 300 for one or more sentences of the past utterances of the plurality of other users by the method described above with reference to FIG. 2. The computer stores the past utterances of various users in a database and uses them to generate the third dependency parse tree 300. The computer may also include in the third dependency parse tree 300 the user's past and current utterances that make up the first dependency parse tree 300 and the second dependency parse tree 300.

In step S152, the computer extracts, from the third dependency parse tree 300, one or more predetermined fourth data sets, each defined by i) a third object corresponding to a predetermined third vertex and ii) a fourth object corresponding to a predetermined fourth vertex.

The third object of the one or more fourth data sets may be the subject of a sentence, and the fourth object may be the verb of the sentence. Alternatively, the third object may be a noun or noun phrase, and the fourth object may be a verb or verb phrase.

The third object or the fourth object may be converted, through morphological and lexical analysis, into a headword (lemma) from which derivational elements have been removed, and then used in the fourth data set. For example, the Korean forms “걷기” (walking), “걸어 다닙니다” (walks around), and “걷다” (to walk) may all be converted to the headword “걷다” (to walk).

In step S153, the computer generates a second table into which the one or more fourth data sets are entered.

In some embodiments, the computer may enter the third object, corresponding to the subject, and the fourth object, corresponding to the verb, into a subject column and a verb column of the second table, respectively, so that they form a pair.

In step S154, the computer calculates the frequency of the one or more fourth data sets in the second table.

The computer calculates the frequency of the fourth data sets entered in the second table; when the fourth data sets are listed in descending order of calculated frequency, the distribution may follow Zipf's law. For example, “I-be”, “I-take”, and “I-get”, which occur frequently in sentences, are assigned a low rank.
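The table construction and frequency calculation of steps S153 and S154 can be sketched as below. This is a minimal illustration that assumes each data set is a (subject, verb) tuple of already lemmatized strings; the function name and row layout are hypothetical and not taken from the text.

```python
from collections import Counter


def build_frequency_rows(pairs):
    """Count (subject, verb) pairs and list them from most to least frequent.

    Each returned row is (subject, verb, count, relative_frequency).
    """
    counts = Counter(pairs)
    total = sum(counts.values())
    return [
        (subject, verb, n, n / total)
        for (subject, verb), n in counts.most_common()
    ]


if __name__ == "__main__":
    sample = [("I", "be"), ("I", "be"), ("I", "take"), ("my father", "eat")]
    for row in build_frequency_rows(sample):
        print(row)
```

Sorting by descending count is what makes a Zipf-like distribution visible: a handful of generic pairs such as (I, be) dominate the top of the list.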

In step S155, the computer compares the frequency calculation result of the fourth data sets with the frequency calculation result of the first data sets and, based on the comparison, extracts one or more first data sets as one or more second data sets meaningful to the user.

When a first data set is not found among the fourth data sets (that is, its frequency is 0), or when its frequency there is lower than a predetermined reference value, the computer may determine that the first data set is meaningful to the user and extract it as a second data set.

Among a specific user's data sets, only pairs with a low frequency are meaningful in a sentence. Accordingly, if a pair that has a high frequency among the global users' data sets is included in the specific user's data sets, that data set may be determined to be meaningless in the sentence. For example, suppose that the pair (my father, eat) accounts for x% of the specific user's pairs and y% of the global users' pairs. If the difference between y% and x% is equal to or greater than a predetermined reference value, the pair (my father, eat) may be determined to be meaningful to the specific user.
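The comparison in step S155 can be sketched as follows. The sketch assumes that frequencies are compared as relative frequencies (fractions rather than percentages) and that the "difference" is an absolute difference; the text does not fix either choice, and the function names are hypothetical.

```python
def relative_frequencies(counts):
    """Convert raw pair counts into fractions of the total."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def meaningful_pairs(user_counts, global_counts, threshold):
    """Return the user's pairs judged meaningful.

    A pair is kept when it never occurs among global users (frequency 0)
    or when |x - y| >= threshold, where x is the user's relative frequency
    and y is the global relative frequency (assumed interpretation).
    """
    user_freq = relative_frequencies(user_counts)
    global_freq = relative_frequencies(global_counts)
    result = []
    for pair, x in user_freq.items():
        y = global_freq.get(pair, 0.0)
        if y == 0.0 or abs(x - y) >= threshold:
            result.append(pair)
    return result
```

With a threshold of 0.1, a pair such as (my father, eat) that makes up 40% of one user's pairs but only 1% of the global pairs is kept, while (I, be), which is equally common in both, is dropped.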

FIG. 5 is a schematic flowchart of a method of generating a chatbot utterance according to an embodiment of the present invention.

Referring to FIG. 5, the method of generating a chatbot utterance includes generating a second dependency parse tree 300 (S210), extracting third data sets (S220), comparing the second data sets with the third data sets (S230), extracting a third data set as an utterance candidate data set (S240), generating a conversation candidate sentence (S250), and generating the chatbot utterance (S260).

In step S210, the computer generates a second dependency parse tree 300 for one or more sentences of a new utterance of the user (who may be referred to as the specific user).

The computer generates the second dependency parse tree 300 for one or more sentences of the user's new utterance by the method described above with reference to FIG. 2.

In step S220, the computer extracts, from the second dependency parse tree 300, one or more predetermined third data sets, each defined by i) a first object corresponding to a predetermined first vertex and ii) a second object corresponding to a predetermined second vertex.

The first object of the one or more third data sets may be the subject of a sentence, and the second object may be the verb of the sentence. Alternatively, the first object may be a noun or noun phrase, and the second object may be a verb or verb phrase.

The first object or the second object may be converted, through morphological and lexical analysis, into a headword (lemma) from which derivational elements have been removed, and then used in the third data set. For example, the Korean forms “걷기” (walking), “걸어 다닙니다” (walks around), and “걷다” (to walk) may all be converted to the headword “걷다” (to walk).

In step S230, the computer compares the one or more second data sets of the frequency table with the one or more third data sets.

The computer compares the one or more second data sets with the one or more third data sets to determine whether they have a predetermined similarity.

In step S240, when the one or more second data sets of the frequency table and the one or more third data sets have the predetermined similarity, the computer extracts the one or more third data sets as utterance candidate data sets.

The computer determines whether a third data set extracted from the specific user's new utterance is meaningful by comparing it with the fourth data sets extracted from the global users' utterances, then determines whether the third data set is similar to a meaningful second data set from the specific user's past utterances, and extracts the utterance candidate data set accordingly.

In some embodiments, if a specific user says “I cooked with my dad last weekend.”, the computer may extract the nouns “my dad” and “I” and the verb “cook” as objects of the dependency parse tree 300. The computer may then extract the data sets (I, cook) and (my dad, cook). Because the data sets (I, cook) and (my dad, cook) are meaningful when compared with the data sets of the global users' utterances, they are checked for similarity against the meaningful data sets from the user's past utterances. The data set (my dad, cook) from the specific user's new utterance is similar to the data set (my dad, eat) from the specific user's past utterances. Accordingly, the computer may extract (my dad, eat) as an utterance candidate data set. Through steps S250 to S260 below, the computer may then generate “Did your father eat a lot?” using a natural language generation algorithm that reflects (my dad, eat), the extracted utterance candidate data set.

In another embodiment, if the user says “I watched a movie with my father last night.”, the computer may extract the nouns “I”, “movie”, and “my father” and the verb “watch” as objects of the dependency parse tree 300. The computer may then extract the data sets (I, watch), (movie, watch), and (my father, watch). Because these data sets are meaningful when compared with the data sets of the global users' utterances, they are checked for similarity against the meaningful data sets from the user's past utterances. However, the data sets (I, watch), (movie, watch), and (my father, watch) from the specific user's new utterance are not similar to any data set from the specific user's past utterances. Accordingly, the computer may not extract (I, watch), (movie, watch), or (my father, watch) as an utterance candidate data set.

In another embodiment, if the user says “I am tired.”, the computer may extract the noun “I” and the verb “be” as objects of the dependency parse tree 300. The computer may then extract the data set (I, be). However, because the data set (I, be) is not meaningful when compared with the data sets of the global users' utterances, the computer may not extract (I, be) as an utterance candidate data set.

The computer may determine the similarity between a third data set, taken from the specific user's new utterance, and a second data set, taken from the user's past utterances, by using the Word2Vec technique or by determining whether the second data set and the third data set are similar to a degree equal to or greater than a predetermined threshold.
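One way to picture the similarity test of steps S230 and S240 is the sketch below. It is not the patented implementation: the three-dimensional vectors in `TOY_VECTORS` are invented for illustration (a real system would load embeddings from a trained Word2Vec model), and the rule that subjects must match exactly while verbs are compared by cosine similarity is an assumption made here.

```python
import math

# Toy vectors invented for illustration only; not real Word2Vec output.
TOY_VECTORS = {
    "cook": (0.9, 0.1, 0.0),
    "eat": (0.8, 0.3, 0.1),
    "watch": (0.0, 0.2, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_similarity(new_pair, past_pair, vectors):
    """Assumed rule: subjects must match; verbs are compared by cosine."""
    (new_subj, new_verb), (past_subj, past_verb) = new_pair, past_pair
    if new_subj != past_subj:
        return 0.0
    if new_verb not in vectors or past_verb not in vectors:
        return 0.0
    return cosine(vectors[new_verb], vectors[past_verb])


def find_candidates(new_pairs, meaningful_past_pairs, vectors, threshold):
    """Return (new_pair, past_pair) matches whose similarity meets the threshold."""
    matches = []
    for new_pair in new_pairs:
        for past_pair in meaningful_past_pairs:
            if pair_similarity(new_pair, past_pair, vectors) >= threshold:
                matches.append((new_pair, past_pair))
    return matches
```

Returning both halves of each match leaves it to the caller to decide whether the new pair or the matched past pair is used as the utterance candidate data set.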

In step S250, the computer uses a natural language generation algorithm to generate a conversation candidate sentence corresponding to the extracted utterance candidate data set. Any of various natural language generation algorithms well known in the technical field to which the present invention pertains may be used.

In step S260, the computer generates the chatbot utterance using the conversation candidate sentence.

In generating the chatbot utterance from one or more candidate sentences, the computer uses a deep learning algorithm to calculate an evaluation score for each candidate sentence with respect to the context of the conversation between the user and the chatbot and the user's new utterance. The computer generates the chatbot utterance using the one or more candidate sentences whose calculated evaluation score is equal to or greater than a predetermined reference value.

For the user's new utterance, the computer creates a context string of the entire conversation between the user and the chatbot. For example, the context string may be a concatenation of the words in the entire conversation, from the user's past utterances through the new utterance.

The computer calculates the evaluation scores of the candidate sentences generated in step S250 with a pre-trained scoring system based on deep learning. The scoring system receives input in the form [context string, user's new utterance, candidate sentence] and calculates the evaluation score of each candidate sentence generated in step S250 with reference to the context string and the user's new utterance.

In some embodiments, the computer may generate the chatbot utterance using a candidate sentence with a high evaluation score.
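The scoring and selection in step S260 can be sketched as follows. The text describes a pre-trained deep learning scorer; the `overlap_score` function below is only a stand-in based on word overlap, used so that the sketch runs without a model. All function names are hypothetical, and the [context string, new utterance, candidate sentence] input form follows the description above.

```python
def build_context(utterances):
    """Concatenate the conversation so far into a single context string."""
    return " ".join(utterances)


def overlap_score(context, new_utterance, candidate):
    """Stand-in scorer: fraction of candidate words seen in the conversation.

    A real system would call a pre-trained deep learning model here.
    """
    reference = set((context + " " + new_utterance).lower().split())
    words = set(candidate.lower().split())
    return len(reference & words) / len(words) if words else 0.0


def select_candidates(context, new_utterance, candidates, threshold,
                      score_fn=overlap_score):
    """Keep candidates scoring at or above the threshold, best first."""
    scored = [(score_fn(context, new_utterance, c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s >= threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [c for _, c in kept]
```

Passing the scorer in as `score_fn` keeps the thresholding and ranking logic independent of whichever model actually produces the evaluation score.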

In another embodiment, the computer may analyze the user's conversation pattern using the second data sets generated in the frequency table and present the result to the user in visual form. For example, the computer may visualize the second data sets as a word cloud or a time-dependent stacked bar chart and provide it to the user.
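The data behind a time-dependent stacked bar chart could be prepared as in the sketch below. It assumes, hypothetically, that each second data set is stored with a period label such as a "YYYY-MM" string; the text does not describe the storage format, and the plotting step itself is left to any charting library.

```python
from collections import Counter, defaultdict


def counts_by_period(records):
    """Group pair counts by period.

    `records` is an iterable of (period, pair) tuples. The result maps each
    period, in sorted order, to a dict of pair counts: one stacked bar per
    period, one segment per pair.
    """
    table = defaultdict(Counter)
    for period, pair in records:
        table[period][pair] += 1
    return {period: dict(counter) for period, counter in sorted(table.items())}
```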

FIG. 6 is a diagram illustrating a computer system that generates a chatbot service program according to an embodiment of the present invention.

Referring to FIG. 6, the computer system 400 that generates the chatbot service program includes a memory 401, a bus 402, and a processor 403.

The memory 401 stores instructions and data for executing the above-described method for generating a chatbot utterance using a frequency table.

The processor 403 interprets and processes the instructions and data stored in the memory 401 and outputs the result to the chatbot engine of the chatbot system 500.

The bus 402 connects the memory 401 and the processor 403 and carries the instructions and data between them.

FIG. 7 is a diagram illustrating a chatbot system that generates a chatbot utterance according to an embodiment of the present invention.

Referring to FIG. 7, the chatbot system 500 that generates the chatbot utterance includes a user terminal 501, a service server 502, and a chatbot server 503.

The service server 502 provides the chatbot service to the user and, when a user utterance is entered at the user terminal 501, transmits the user utterance to the chatbot server 503.

The chatbot server 503 includes a chatbot engine and a database. The chatbot engine generates a chatbot utterance from the user utterance by the above-described method for generating a chatbot utterance using a frequency table. The database stores the user utterances transmitted from the service server 502 and passes them to the chatbot engine when a chatbot utterance is generated.
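The flow of data between the service server 502 and the chatbot server 503 can be pictured with the sketch below. The class and method names are hypothetical, the database is modeled as an in-memory list, the user terminal 501 is not modeled, and the engine is any callable taking the stored history and the new utterance; none of these details come from the text.

```python
class ChatbotServer:
    """Holds the chatbot engine and a database of user utterances."""

    def __init__(self, engine):
        self.engine = engine    # callable: (history, new_utterance) -> reply
        self.database = []      # stored user utterances, oldest first

    def handle(self, utterance):
        history = list(self.database)   # past utterances passed to the engine
        self.database.append(utterance)
        return self.engine(history, utterance)


class ServiceServer:
    """Forwards user input from the terminal to the chatbot server."""

    def __init__(self, chatbot_server):
        self.chatbot_server = chatbot_server

    def on_user_input(self, utterance):
        return self.chatbot_server.handle(utterance)
```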

The steps of the method or algorithm described in connection with the embodiments of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. The software module may reside in random access memory (RAM), read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the technical field to which the present invention pertains.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, a person of ordinary skill in the art to which the present invention pertains will understand that the present invention may be practiced in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

300: dependency parse tree
400: computer system
500: chatbot system

Claims

A method for generating a chatbot utterance using a frequency table, executed by a computer, the method comprising:
generating a frequency table using a user's past utterances; and
generating a chatbot utterance for responding to a new utterance of the user using the frequency table,
wherein generating the frequency table comprises:
generating a first dependency parse tree for one or more sentences of the user's past utterances;
extracting, from the first dependency parse tree, one or more predetermined first data sets each defined by i) a first object corresponding to a predetermined first vertex and ii) a second object corresponding to a predetermined second vertex;
generating a first table into which the extracted one or more first data sets are entered;
calculating a frequency of the one or more first data sets in the first table; and
generating the frequency table by extracting, from the one or more first data sets of the first table, one or more second data sets that have a predetermined frequency and are meaningful to the user,
wherein generating the frequency table by extracting the one or more second data sets comprises:
generating a third dependency parse tree for one or more sentences of past utterances of a plurality of other users;
extracting, from the third dependency parse tree, one or more predetermined fourth data sets each defined by i) a third object corresponding to a predetermined third vertex and ii) a fourth object corresponding to a predetermined fourth vertex;
generating a second table into which the one or more fourth data sets are entered;
calculating a frequency of the one or more fourth data sets in the second table;
comparing the frequency calculation result of the fourth data sets with the frequency calculation result of the first data sets to determine whether the one or more first data sets are meaningful to the user; and
when the one or more first data sets are determined to be meaningful to the user, extracting the one or more first data sets as the one or more second data sets meaningful to the user,
wherein determining whether the one or more first data sets are meaningful to the user comprises
determining, as a meaningful data set, a first data set for which the difference between the frequency calculation result of the fourth data set and the frequency calculation result of the first data set is equal to or greater than a predetermined reference value, or for which the frequency of the corresponding fourth data set is 0, and
wherein generating the chatbot utterance comprises:
generating a second dependency parse tree for one or more sentences of the user's new utterance;
extracting, from the second dependency parse tree, one or more predetermined third data sets each defined by i) a first object corresponding to a predetermined first vertex and ii) a second object corresponding to a predetermined second vertex;
comparing the one or more second data sets of the frequency table with the one or more third data sets;
extracting the one or more third data sets as utterance candidate data sets when the one or more second data sets of the frequency table and the one or more third data sets have a predetermined similarity;
generating a conversation candidate sentence corresponding to the extracted utterance candidate data set using a natural language generation algorithm; and
generating the chatbot utterance using the conversation candidate sentence.

(Deleted)

The method for generating a chatbot utterance using a frequency table according to claim 1,
wherein the first object or the third object is defined as a noun phrase, and the second object or the fourth object is defined as a verb phrase.

The method for generating a chatbot utterance using a frequency table according to claim 1,
wherein the first object, the second object, the third object, or the fourth object
is converted, through morphological and lexical analysis, into a headword from which derivational elements have been removed.

The method for generating a chatbot utterance using a frequency table according to claim 1,
wherein, for the first object, the second object, the third object, or the fourth object,
when a plurality of words combine with each other to form a meaning, the plurality of words are processed as a single object.

The method for generating a chatbot utterance using a frequency table according to claim 1,
wherein extracting the one or more third data sets as utterance candidate data sets when the one or more second data sets of the frequency table and the one or more third data sets have a predetermined similarity comprises
determining the similarity between the one or more second data sets and the one or more third data sets using a Word2Vec technique.

A computer program stored in a computer-readable recording medium which, in combination with a computer, executes the method for generating a chatbot utterance using a frequency table according to any one of claims 1 and 3 to 6.