KR20210130976A

KR20210130976A - Device, method and computer program for deriving response based on knowledge graph

Info

Publication number: KR20210130976A
Application number: KR1020200049180A
Authority: KR
Inventors: 오경진; 임지희
Original assignee: 주식회사 케이티
Priority date: 2020-04-23
Filing date: 2020-04-23
Publication date: 2021-11-02
Also published as: KR102398832B1

Abstract

An apparatus for deriving a response on the basis of a knowledge graph comprises: a collection unit that collects, from an open domain, semantic information including subject information, object information, and predicate information between the subject information and the object information; a knowledge graph generation unit that generates a knowledge graph on the basis of the subject information, the predicate information, and the object information; a query analysis unit that receives query data and analyzes the input query data; a subgraph derivation unit that searches the generated knowledge graph for at least one subgraph on the basis of an analysis result of the query data, and derives a final subgraph from among the at least one searched subgraph; and a response derivation unit that selects at least one document from among a plurality of documents collected from the open domain on the basis of the derived final subgraph, and derives response data to the query data on the basis of the selected document. A final subgraph is thereby derived from among the at least one subgraph that is searched from the knowledge graph and corresponds to the analysis result of the query data.

Description

DEVICE, METHOD AND COMPUTER PROGRAM FOR DERIVING RESPONSE BASED ON KNOWLEDGE GRAPH

The present invention relates to an apparatus, a method, and a computer program for deriving a response based on a knowledge graph.

Existing question answering approaches include an approach using a knowledge-base algorithm (Knowledge Based Question Answering, KBQA) and an approach using an information-retrieval algorithm (Information Retrieval Question Answering, IRQA). More recently, research has been conducted on machine reading comprehension question answering (Machine Reading Comprehension Question Answering, MRCQA), which applies deep learning to information-retrieval-based question answering.

Question answering with a knowledge-base algorithm uses an ontology approach. Here, the ontology approach is a knowledge management technique that defines and creates relationships between words. The knowledge-base approach identifies the core meaning of a question and searches for an answer by querying the knowledge base. Because this approach can answer only questions that fall within the knowledge categories present in the knowledge base, it cannot derive answers for open domain or real-time (trending) data.

Question answering with an information-retrieval algorithm builds expected question and answer data in advance, searches for questions similar to the user's question, and provides the answer matched to the retrieved question. With this approach, it is difficult to answer a query for which no expected question and corresponding answer data have been built.

Machine reading comprehension question answering trains a machine to read documents and to find and present answers to questions, without a person building the knowledge by hand. Because this approach learns patterns of correct answers within documents, it is difficult to derive an accurate answer unless accurate document retrieval is ensured first. Deriving a response from such answer patterns also has the problem that a response unrelated to the context of the query may be derived.

Conventionally, when performing an indexing process for document search, document search rankings are re-ranked using indexing of entity names, semantic paragraph-level search, or paragraph re-ranking based on deep learning. However, search through an entity name index often returns results whose context does not semantically match the query, and the conventional entity name index search cannot be applied to documents, such as open domain news or trend documents, that have no metadata about the document structure.

Japanese Registered Patent Publication No. 4746439 (registered May 20, 2011)

The present invention is intended to solve the problems of the prior art described above. It derives a final subgraph from among at least one subgraph that is searched from a knowledge graph and corresponds to an analysis result of query data, and derives response data to the query data from a document selected on the basis of the derived final subgraph.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, an apparatus for deriving a response based on a knowledge graph according to a first aspect of the present invention may include: a collection unit that collects, from an open domain, semantic information including subject information, object information, and predicate information between the subject information and the object information; a knowledge graph generation unit that generates a knowledge graph based on the subject information, the predicate information, and the object information; a query analysis unit that receives query data and analyzes the input query data; a subgraph derivation unit that searches the generated knowledge graph for at least one subgraph based on an analysis result of the query data and derives a final subgraph from among the at least one searched subgraph; and a response derivation unit that selects at least one document from among a plurality of documents collected from the open domain based on the derived final subgraph and derives response data to the query data based on the selected document.

A method for deriving a response based on a knowledge graph according to a second aspect of the present invention may include: collecting, from an open domain, semantic information including subject information, object information, and predicate information between the subject information and the object information; generating a knowledge graph based on the subject information, the predicate information, and the object information; receiving query data; analyzing the input query data; searching the generated knowledge graph for at least one subgraph based on an analysis result of the query data; deriving a final subgraph from among the at least one searched subgraph; selecting at least one document from among a plurality of documents collected from the open domain based on the derived final subgraph; and deriving response data to the query data based on the selected document.

A computer program according to a third aspect of the present invention is stored in a computer-readable recording medium and includes a sequence of instructions for deriving a response based on a knowledge graph. When executed by a computing device, the sequence of instructions may cause the computing device to: collect, from an open domain, semantic information including subject information, object information, and predicate information between the subject information and the object information; generate a knowledge graph based on the subject information, the predicate information, and the object information; search the generated knowledge graph for at least one subgraph based on an analysis result of query data; derive a final subgraph from among the at least one searched subgraph; select at least one document from among a plurality of documents collected from the open domain based on the derived final subgraph; and derive response data to the query data based on the selected document.

The above means for solving the problem are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, there may be additional embodiments described in the drawings and in the detailed description of the invention.

According to any one of the above means for solving the problem, the present invention can derive a final subgraph from among at least one subgraph that is searched from a knowledge graph and corresponds to an analysis result of query data, and can derive response data to the query data from a document selected on the basis of the derived final subgraph.

In addition, the present invention can search for the document best suited to the context of the query data on the basis of the final subgraph information (entity names and relation names), and can derive response data to the query data from the retrieved document.

In addition, the present invention can use the final subgraph information to derive the type of the response data to the query data within a document, and can verify the response data according to that type.

In addition, because the present invention can restrict the domain of an entity name through semantic recognition of the relation name even when the entity name itself is not recognized, search accuracy can be improved through the domain of the entity name and the extracted relation name even in searches where entity name recognition fails.

FIG. 1 is a block diagram of a response derivation apparatus according to an embodiment of the present invention.
FIGS. 2A to 2G are diagrams for explaining a method of deriving a response based on a knowledge graph according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining a method of deriving a response based on a knowledge graph according to another embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method of deriving a response based on a knowledge graph according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily implement them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts irrelevant to the description are omitted from the drawings, and similar reference numerals are attached to similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed between them. In addition, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. In addition, one unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, specific details for carrying out the present invention are described with reference to the accompanying configuration diagrams and process flowcharts.

FIG. 1 is a block diagram of a response derivation apparatus 10 according to an embodiment of the present invention.

Referring to FIG. 1, the response derivation apparatus 10 may include a collection unit 100, a knowledge graph generation unit 110, a query analysis unit 120, a subgraph derivation unit 130, a response derivation unit 140, and an indexing unit 150. However, the response derivation apparatus 10 shown in FIG. 1 is only one implementation example of the present invention, and various modifications are possible based on the components shown in FIG. 1.

When an entity name (headword) is recognized through a pre-registered entity name dictionary and relation name dictionary, the indexing unit 150 may perform an indexing process on the sentences included in the plurality of collected documents (e.g., real-time/trending news, web documents) based on the entity name recognition result obtained with those dictionaries. According to the entity name recognition result, the indexing unit 150 may index an entity name and vocabulary information for each of the subject information and the object information of each sentence, and may index an entity name and vocabulary information for the predicate information of each sentence. Here, the entity name dictionary is a dictionary that defines entity names (e.g., nouns with unique meanings, such as names of persons, organizations, and places), and the relation name dictionary is a dictionary that defines relationships between entities (e.g., verbs). The entity name dictionary and the relation name dictionary may be built around categories such as persons, places, organizations, food, cultural content, terms, products, events, and artifacts.

For example, the entity name recognition result for a first sentence, '트럼프의 당선으로 멜라니아는 미국의 두 번째 이민자 출신이자 첫 공산권 국가 출신 퍼스트 레이디가 됐다' ("With Trump's election, Melania became the second First Lady of the United States of immigrant origin and the first from a communist-bloc country"), is [Donald Trump/Person, Melania Trump/Person, United States/Nation, immigrant/RQCWho, First Lady/PoliticalPerson, Melanie/Singer, second/Sequence]. In this case, based on the entity name recognition result, the indexing unit 150 may index an entity name (e.g., Person) and vocabulary information (e.g., Donald Trump) for each of the subject information and the object information of the first sentence. The indexing unit 150 may also index an entity name (e.g., IsA) and vocabulary information (e.g., 'became') for the predicate information of the first sentence.

The indexing unit 150 may perform the indexing process on the sentences included in the plurality of collected documents further based on a morphological analysis result and a tokenizer result. For example, the morphological analysis result for the first sentence is [트럼프/NNP+의/JKG+당선/NNG+으로/JKB+멜라니아/NNG+는/JX+미국/NNP+의/JKG+두/MM+번/NNB+째/XSN+이민자/NNG+출신/NNG+이/VCP+자/EC+첫/MM+공산/NNG+권/XSN+국가/NNG+출신/NNG+퍼스트/NNG+레이디/NNG+가/JKS+되/VV+었/EP+다/EF+./SF], and the tokenizer result is [트럼프 의 당선 으로 멜 라니 아는 미국 의 두 번째 이민 자 출신 이자 첫 공산 권 국가 출신 퍼 스트 레이 디가 됐다]. Here, '_' in a tokenizer result denotes a space in the input sentence.

When indexing the entity name corresponding to each of the subject information, object information, and predicate information of a sentence, the indexing unit 150 may perform the indexing process using keywords included in a pre-registered thesaurus. Here, the thesaurus is a dictionary that collects synonyms (i.e., words with similar meanings) for each entity name and relation name.

When a plurality of entity names are recognized among the subject information, object information, and predicate information of a single sentence, the indexing unit 150 may index distance information between the entity names at the position, among the subject information, object information, and predicate information, where the corresponding entity name is indexed. This distance information between entity names may later be used when calculating the ranking for document search.
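
The source does not define how the distance between entity names is measured. As a minimal sketch, assuming the distance is the absolute difference between token positions within a sentence, the pairwise distances could be computed as follows. The function name `entity_distances` and the positions in the sample data are hypothetical.

```python
from itertools import combinations


def entity_distances(mentions):
    """Return pairwise distances between entity mentions in one sentence.

    mentions: list of (surface_form, token_position) pairs.
    Assumption: distance = absolute difference of token positions.
    """
    distances = {}
    for (name_a, pos_a), (name_b, pos_b) in combinations(mentions, 2):
        distances[(name_a, name_b)] = abs(pos_a - pos_b)
    return distances


# Hypothetical token positions for three recognized entity names.
mentions = [("Donald Trump", 0), ("Melania Trump", 3), ("United States", 5)]
dist = entity_distances(mentions)
```

A ranking step could then, for instance, favor sentences in which the query's entity names appear closer together; the source only states that the distance is used for ranking, not how.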

The collection unit 100 may collect, from an open domain, semantic information including subject information, object information, and predicate information between the subject information and the object information. For example, the collection unit 100 may collect semantic information including 'Donald Trump (subject information) was born (predicate information) in Queens, New York (object information)'.

The knowledge graph generation unit 110 may generate a knowledge graph as a main graph based on the subject information, predicate information, and object information included in the collected semantic information. For example, from semantic information including 'Donald Trump married Melania Trump', the knowledge graph generation unit 110 may extract 'Donald Trump' as subject information, 'Melania Trump' as object information, and 'married' as predicate information.

The knowledge graph generation unit 110 may generate nodes corresponding to the subject information and the object information, generate edges representing the predicate information between the nodes, and generate the knowledge graph from the generated nodes and edges. Here, the knowledge graph has a structure in which each triple (subject information, object information, and predicate information) constituting one piece of knowledge is expressed as nodes and an edge: nodes represent subject information and object information, and edges represent predicate information. Nodes are divided into instances and literal values. For example, referring to FIG. 2B, an instance is an entity name such as 'Donald Trump', 'Slovenia', or 'United States', and a literal value is a constant such as '1946-06-14'.

For example, referring to FIG. 2A, the knowledge graph generation unit 110 may generate 'Donald Trump', which is subject information, and 'Melania Trump', which is object information, as nodes 201 and 205, respectively, and may generate 'isMarriedTo', the predicate information between the node 201 corresponding to 'Donald Trump' and the node 205 corresponding to 'Melania Trump', as an edge 203. The knowledge graph generation unit 110 may then generate the knowledge graph using the node 201, the node 205, and the edge 203.
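
The node and edge construction described above can be sketched as a small triple store. This is only an illustration under assumptions: the class `KnowledgeGraph`, its method names, and the edge label `birthDate` are hypothetical and do not appear in the source; only `isMarriedTo`, `hasCountry`, and the literal '1946-06-14' come from the description.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal triple store: nodes are subjects/objects, edges are predicates."""

    def __init__(self):
        self.nodes = set()
        # subject -> list of (predicate, object) pairs
        self.edges = defaultdict(list)

    def add_triple(self, subject, predicate, obj):
        # Subject and object each become a node; the predicate becomes an edge.
        self.nodes.update((subject, obj))
        self.edges[subject].append((predicate, obj))


kg = KnowledgeGraph()
kg.add_triple("Donald Trump", "isMarriedTo", "Melania Trump")  # instance -> instance
kg.add_triple("Melania Trump", "hasCountry", "Slovenia")       # instance -> instance
kg.add_triple("Donald Trump", "birthDate", "1946-06-14")       # instance -> literal (edge name assumed)
```

Instances and literal values are stored the same way here; a fuller model would presumably tag each node with its kind.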

The knowledge graph generation unit 110 may generate a knowledge graph based on semantic information included in news articles (news documents) containing real-time/trending information (see FIG. 2B). For example, for news, the knowledge graph generation unit 110 may classify articles into fields such as politics, economy, society, life, culture, world, IT/science, and sports, and generate a knowledge graph corresponding to each domain. For the sports domain, the knowledge graph generation unit 110 may further divide it into baseball, soccer, volleyball, basketball, golf, e-sports, and the like when generating the knowledge graph.

Based on the pre-registered entity name dictionary and relation name dictionary, the knowledge graph generation unit 110 may map an entity name or a relation name to the schema information of a node or an edge included in the knowledge graph. Specifically, the knowledge graph generation unit 110 may map an entity name to the class information of a node, and may map a relation name to the attribute/relationship information of an edge (i.e., the relationship between classes). For example, in the entity name recognition result 'Melania Trump/Person', Person is an entity name and, at the same time, the class information of the instance 'Melania Trump' in the knowledge graph.
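
A minimal sketch of this schema mapping, assuming the two dictionaries are plain lookup tables. The dictionary contents, the `(domain, range)` representation of a relation, and the function `annotate` are illustrative assumptions, not the patent's data model; only the 'Melania Trump/Person' pairing is taken from the description.

```python
# Hypothetical entity name dictionary: instance label -> class (entity name).
entity_dict = {
    "Donald Trump": "Person",
    "Melania Trump": "Person",
    "Slovenia": "Nation",  # assumed class for this instance
}

# Hypothetical relation name dictionary: edge label -> (subject class, object class).
relation_dict = {
    "isMarriedTo": ("Person", "Person"),
    "hasCountry": ("Person", "Nation"),
}


def annotate(triple):
    """Attach class information to the nodes and relation information to the edge."""
    subject, predicate, obj = triple
    return {
        "subject": (subject, entity_dict.get(subject)),
        "predicate": (predicate, relation_dict.get(predicate)),
        "object": (obj, entity_dict.get(obj)),
    }


annotated = annotate(("Melania Trump", "hasCountry", "Slovenia"))
```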

Based on the pre-registered thesaurus, the knowledge graph generation unit 110 may manage, through the knowledge graph, the synonyms related to each entity name mapped to a node and each relation name mapped to an edge. For example, a synonym for 'plot' may be 'summary', and a synonym for 'author' may be 'writer'.

The query analysis unit 120 may receive query data and analyze the input query data. The query analysis unit 120 may perform natural language processing on the input query data and then analyze the processed query data. Here, the natural language processing may be performed using an entity name recognition result obtained with the pre-registered entity name dictionary and relation name dictionary, a morphological analysis result, and a tokenizer result.

The query analysis unit 120 may analyze the query data based on the entity name recognition result, and may further analyze the query data based on the morphological analysis result and the tokenizer result.

For example, when the query data is '멜라니아 트럼프가 몇번째 이민자 출신 퍼스트 레이디야' ("What number First Lady of immigrant origin is Melania Trump?"), the entity name recognition result is [Melania Trump/Person, Melanie/Singer, Donald Trump/Person, immigrant/RQCWho, First Lady/PoliticalPerson, what number/Sequence], the morphological analysis result is [멜라니/NNP+아/JKV+트럼프/NNG+가/JKS+몇/MM+번/NNB+째/XSN+이민자/NNG+출신/NNG+퍼스트/NNG+레이디/NNG+이/VCP+야/EF], and the tokenizer result is [멜 라 니아 트럼프 가 몇번 째 이민 자 출신 퍼 스트 레이 디 야].

The query analysis unit 120 may extract a plurality of tokens from the analysis result of the natural-language-processed query data. Specifically, the query analysis unit 120 may extract the tokens from the entity name recognition result, the morphological analysis result, and the tokenizer result for the query data. These tokens are used when searching for nodes and edges of the knowledge graph.

For example, the query analysis unit 120 may extract a 'Melania Trump' token from [Melania Trump/Person] in the entity name recognition result, a 'Trump' token from [트럼프/NNG] in the morphological analysis result, and a 'Trump' token from the tokenizer result.

The query analysis unit 120 may also extract, from the thesaurus, synonyms related to the plurality of tokens. The extracted synonyms may likewise be used when searching for nodes and edges of the knowledge graph.
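
The token extraction and synonym expansion described above might look like the following sketch. It assumes the three analysis results are already available as Python lists (the analyzers themselves are out of scope), and the function names `extract_tokens` and `expand_with_synonyms` are hypothetical. The thesaurus entry reuses the 'author'/'writer' example from the description.

```python
def extract_tokens(ner_result, morph_result, tokenizer_result):
    """Collect candidate tokens from the three analysis results.

    ner_result and morph_result are lists of (surface, tag) pairs;
    tokenizer_result is a list of strings. Duplicates are dropped while
    keeping first-seen order.
    """
    candidates = [surface for surface, _tag in ner_result]
    candidates += [surface for surface, _tag in morph_result]
    candidates += list(tokenizer_result)
    return list(dict.fromkeys(candidates))


def expand_with_synonyms(tokens, thesaurus):
    """Append thesaurus synonyms of each token, again without duplicates."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(thesaurus.get(token, []))
    return list(dict.fromkeys(expanded))


tokens = extract_tokens(
    ner_result=[("Melania Trump", "Person")],
    morph_result=[("Trump", "NNG")],
    tokenizer_result=["Trump"],
)
expanded = expand_with_synonyms(["author"], {"author": ["writer"]})
```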

A search unit (not shown) may search for nodes and edges of the knowledge graph using the plurality of tokens extracted by the query analysis unit 120. For example, referring to FIG. 2C, using the tokens extracted from the analysis result of the query data '멜라니아 트럼프가 몇번째 이민자 출신 퍼스트 레이디야' ("What number First Lady of immigrant origin is Melania Trump?"), the search unit (not shown) may find the nodes {'Donald Trump', 'Trump', 'Melania Trump', 'woman', 'First Lady'} and the edges {'hasCountry', 'immigrator'}.
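
The source does not spell out how tokens are matched to edge labels such as 'immigrator'. The sketch below assumes exact matching of tokens against node labels and a hypothetical `relation_lexicon` that maps surface words to edge labels; both the lexicon and the function `search_graph` are illustrative assumptions. The triples mirror the subgraphs named in the description.

```python
TRIPLES = [
    ("Donald Trump", "FamilyName", "Trump"),
    ("Donald Trump", "isMarriedTo", "Melania Trump"),
    ("Melania Trump", "hasCountry", "Slovenia"),
    ("Melania Trump", "gender", "woman"),
    ("Melania Trump", "position", "First Lady"),
    ("Melania Trump", "immigrator", "yes"),
]


def search_graph(tokens, triples, relation_lexicon):
    """Return (nodes, edges) of the graph that the query tokens refer to."""
    node_labels = {s for s, _p, _o in triples} | {o for _s, _p, o in triples}
    edge_labels = {p for _s, p, _o in triples}
    # Assumption: a token hits a node only on an exact label match.
    nodes = [t for t in tokens if t in node_labels]
    # Assumption: a token hits an edge via the surface-word lexicon.
    edges = [relation_lexicon[t] for t in tokens
             if relation_lexicon.get(t) in edge_labels]
    return nodes, edges


nodes, edges = search_graph(
    ["Melania Trump", "Trump", "immigrant", "First Lady"],
    TRIPLES,
    relation_lexicon={"immigrant": "immigrator"},
)
```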

부그래프 도출부(130)는 질의 데이터에 대한 분석 결과에 기초하여 지식 그래프로부터 적어도 하나의 부그래프(Sub-graph)를 탐색하고, 탐색된 적어도 하나의 부그래프 중 최종 부그래프를 도출할 수 있다. 예를 들어, 도 2c 및 2d를 함께 참조하면, 부그래프 도출부(130)는 복수의 토큰을 통해 검색된 노드('도널드 트럼프', '트럼프', '멜라니아 트럼프', '여자', '퍼스트레이디') 및 엣지('hasCountry', 'immigrator')를 이용하여 지식 그래프로부터 {도널드 트럼프-FamilyName-트럼프}로 구성된 제 1 부그래프, {도널드 트럼프-isMarriedTo-멜라니아 트럼프}로 구성된 제 2 부그래프, {멜라니아 트럼프-hasCountry-슬로베니아}로 구성된 제 3 부그래프, {멜라니아 트럼프-gender-여자}로 구성된 제 4 부그래프, {멜라니아 트럼프-position-퍼스트레이디}로 구성된 제 5 부그래프, {멜라니아 트럼프-immigrator-yes}로 구성된 제 6 부그래프를 탐색할 수 있다. The subgraph derivation unit 130 may search for at least one subgraph from the knowledge graph based on the analysis result of the query data, and derive a final subgraph from among the at least one found subgraph. For example, referring to FIGS. 2C and 2D together, the subgraph derivation unit 130 may use the nodes ('Donald Trump', 'Trump', 'Melania Trump', 'woman', 'First Lady') and edges ('hasCountry', 'immigrator') found through the plurality of tokens to find, in the knowledge graph, a first subgraph composed of {Donald Trump-FamilyName-Trump}, a second subgraph composed of {Donald Trump-isMarriedTo-Melania Trump}, a third subgraph composed of {Melania Trump-hasCountry-Slovenia}, a fourth subgraph composed of {Melania Trump-gender-woman}, a fifth subgraph composed of {Melania Trump-position-First Lady}, and a sixth subgraph composed of {Melania Trump-immigrator-yes}.

예를 들어, 부그래프 도출부(130)는 질의 데이터의 분석 결과에 따른 지식 그래프의 적어도 하나의 노드 중 인스턴스에 해당하는 노드들을 알파벳 순서로 정렬할 수 있다. 또한, 부그래프 도출부(130)는 알파벳 순서로 정렬된 노드들 중 첫번째 노드를 기준으로 지식 그래프에서 부그래프를 탐색하는 과정에서 첫번째 노드와 연결된 엣지를 검색할 수 있다. 또한, 부그래프 도출부(130)는 질의 데이터의 분석 결과에 따른 엣지 리스트에 검색된 첫번째 노드와 연결된 엣지가 있는 경우, 첫번째 노드, 엣지, 엣지와 연결된 확장 노드를 연결하여 부그래프를 확장할 수 있다. 만일, 첫번째 노드에 연결된 엣지가 질의 데이터의 분석 결과에 따른 엣지 리스트에 없는 경우, 부그래프 도출부(130)는 첫번째 노드에 연결된 모든 엣지와 엣지에 연결된 노드를 임시 부그래프로 확장할 수 있다. 또한, 부그래프 도출부(130)는 확장 노드에 대하여 확장 노드와 연결된 다른 엣지를 검색하고, 검색된 다른 엣지가 질의 데이터의 분석 결과에 따른 엣지 리스트에 존재하는지 여부에 기초하여 확장 노드 및 다른 엣지와의 부그래프 확장을 결정할 수 있다. 또한, 부그래프 도출부(130)는 확장 노드가 현재 부그래프에 포함되는 경우, 부그래프 확장을 종료할 수 있다. 부그래프 도출부(130)는 확장된 부그래프에서 인스턴스에 해당하는 노드를 기준으로 지식 그래프에서 부그래프 탐색을 앞서 설명한 방식대로 수행할 수 있다. For example, the subgraph derivation unit 130 may sort, in alphabetical order, the nodes that correspond to instances among the knowledge graph nodes obtained from the analysis result of the query data. Taking the first of the sorted nodes as the starting point of the subgraph search, the subgraph derivation unit 130 may look up the edges connected to that first node. If an edge connected to the first node is in the edge list obtained from the analysis result of the query data, the subgraph derivation unit 130 may extend the subgraph by connecting the first node, that edge, and the extension node at the other end of the edge. If no edge connected to the first node is in the edge list, the subgraph derivation unit 130 may extend all edges connected to the first node, together with the nodes connected to those edges, into a temporary subgraph. The subgraph derivation unit 130 may then look up the other edges connected to each extension node and decide whether to extend the subgraph with the extension node and those edges based on whether they are present in the edge list. When an extension node is already included in the current subgraph, the subgraph derivation unit 130 may end the extension. The subgraph derivation unit 130 may repeat the subgraph search in the same manner, starting from the nodes in the extended subgraph that correspond to instances.
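The expansion procedure above can be sketched roughly as follows. This is a simplified reading, not the disclosed implementation: it follows edges only from subject to object, applies the "temporary subgraph" fallback only at the starting node, and the function name `expand_subgraph` is hypothetical.

```python
from collections import defaultdict


def expand_subgraph(triples, instance_nodes, edge_list):
    """Grow a subgraph outward from the alphabetically first instance node."""
    out_edges = defaultdict(list)
    for subj, pred, obj in triples:
        out_edges[subj].append((pred, obj))

    start = sorted(instance_nodes)[0]
    subgraph, visited, frontier = [], {start}, [start]
    while frontier:
        node = frontier.pop()
        candidates = out_edges[node]
        matched = [(p, o) for p, o in candidates if p in edge_list]
        # Assumed fallback: if the start node has no edge from the query's
        # edge list, take all of its edges as a temporary subgraph.
        chosen = matched or (candidates if node == start else [])
        for pred, obj in chosen:
            subgraph.append((node, pred, obj))
            if obj not in visited:  # a node already in the subgraph ends that branch
                visited.add(obj)
                frontier.append(obj)
    return subgraph


kg = [
    ("Donald Trump", "isMarriedTo", "Melania Trump"),
    ("Melania Trump", "hasCountry", "Slovenia"),
    ("Melania Trump", "gender", "woman"),
    ("Melania Trump", "immigrator", "yes"),
]
found = expand_subgraph(kg, ["Melania Trump", "Donald Trump"], {"hasCountry", "immigrator"})
```

In this example the search starts at 'Donald Trump', falls back to its only edge, and then keeps just the 'hasCountry' and 'immigrator' edges of 'Melania Trump' because only those are in the edge list.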

부그래프 도출부(130)는 탐색된 적어도 하나의 부그래프를 기학습된 딥러닝 네트워크에 입력하여 최종 부그래프를 도출할 수 있다. 예를 들어, 도 2e를 참조하면, 부그래프 도출부(130)는 질의 데이터에 대한 분석 결과에 따라 지식 그래프로부터 탐색된 복수의 부그래프를 딥러닝 네트워크에 입력하고, 딥러닝 네트워크를 통해 추출된 부그래프를 최종 부그래프로 결정할 수 있다. The subgraph derivation unit 130 may derive the final subgraph by inputting the at least one found subgraph into a pre-trained deep learning network. For example, referring to FIG. 2E, the subgraph derivation unit 130 may input the plurality of subgraphs found from the knowledge graph according to the analysis result of the query data into the deep learning network, and determine the subgraph extracted by the deep learning network as the final subgraph.

부그래프 도출부(130)는 추출된 복수의 부그래프 각각으로부터 트리플 집합(즉, 노드-엣지-노드)을 추출할 수 있다. 여기서, 트리플 집합은 각각의 부그래프에 포함된 적어도 둘의 노드에 대한 정보 및 적어도 둘의 노드 간을 연결시키는 엣지에 대한 정보를 포함할 수 있다. 예를 들어, 트리플 집합은 멜라니아 트럼프(노드)-position(엣지)-퍼스트레이디(노드)가 될 수 있다. The subgraph derivation unit 130 may extract a triple set (ie, node-edge-node) from each of the plurality of extracted subgraphs. Here, the triple set may include information on at least two nodes included in each subgraph and information on an edge connecting the at least two nodes. For example, the triple set may be Melania Trump (node)-position (edge)-first lady (node).

여기서, 트리플 집합을 구성하는 적어도 둘의 노드와, 적어도 둘의 노드 간을 연결하는 엣지는 지식 그래프에 대한 스키마 정보에 매핑되어 있다. 구체적으로, 트리플 집합의 구성 요소인 (노드, 엣지, 노드)는 스키마 정보의 (클래스 정보, 속성/관계 정보, 클래스 정보)에 매핑되어 있다. 다른 예를 들어, 트리플 집합인 (멜라니아 트럼프, position, 퍼스트레이디)는 스키마 정보의 (Person 클래스 정보, 속성/관계 정보, 리터럴 클래스 정보)에 매핑되어 있다. Here, at least two nodes constituting the triple set and an edge connecting the at least two nodes are mapped to schema information for the knowledge graph. Specifically, (node, edge, node), which are components of the triple set, is mapped to (class information, attribute/relationship information, class information) of schema information. As another example, a triple set (Melania Trump, position, first lady) is mapped to schema information (Person class information, attribute/relationship information, and literal class information).

부그래프 도출부(130)는 추출된 트리플 집합을 2차원 형태의 입력 벡터로 구성할 수 있다. 이러한 입력 벡터는 해당 입력 벡터를 구성하는 트리플 집합이 추출된 부그래프와 대응될 수 있다. The subgraph derivation unit 130 may configure the extracted triple set as an input vector in a two-dimensional form. Such an input vector may correspond to a subgraph from which a triple set constituting the corresponding input vector is extracted.

부그래프 도출부(130)는 각각의 부그래프에 대응하는 입력 벡터의 행 또는 열에 지식 그래프에 대한 스키마 정보를 매핑시킬 수 있다. 여기서, 입력 벡터의 행과 열은 클래스와 관계명에 대한 정보를 저장하고 있다. 예를 들어, 입력 벡터의 각 행에는 클래스 정보(예컨대, 1행은 정치인 클래스, 2행은 배우 클래스, 3행은 가수 클래스 등)가 저장되어 있고, 입력 벡터의 각 열에는 관계 정보(예컨대, 1열은 이름, 2열은 국적, 3열은 출생일 등)가 저장되어 있다. The subgraph derivation unit 130 may map schema information for the knowledge graph to a row or column of an input vector corresponding to each subgraph. Here, the row and column of the input vector store information on class and relationship names. For example, class information (eg, row 1 is a politician class, row 2 is an actor class, row 3 is a singer class, etc.) is stored in each row of the input vector, and relationship information (eg, Column 1 is the name, column 2 is the nationality, column 3 is the date of birth, etc.).

구체적으로, 부그래프 도출부(130)는 부그래프에 대응하는 입력 벡터의 행에 지식 그래프에 대한 스키마 정보 중 클래스 정보를 매핑시키고, 입력 벡터의 열에 지식 그래프에 대한 스키마 정보 중 속성/관계 정보를 매핑시킬 수 있다. 예를 들어, 도 2f를 참조하면, 부그래프 도출부(130)는 {멜라니아 트럼프-position-퍼스트레이디}로 구성된 제 1 부그래프에 대응하는 제 1 입력 벡터(207)의 1행에 Person 클래스 정보를 매핑하고, 1열에 position 속성/관계 정보를 매핑할 수 있다. 부그래프 도출부(130)는 {멜라니아 트럼프-hasCountry-슬로베니아}로 구성된 제 2 부그래프에 대응하는 제 2 입력 벡터(209)의 2열에 hasCountry 속성/관계 정보를 매핑하고, 3행에 Nation 클래스 정보를 매핑하고, 3열에 hasPeople 속성/관계 정보를 매핑할 수 있다. 이와 같이, 각 부그래프는 각 부그래프로부터 추출된 트리플 정보(또는 이로 구성되는 입력 벡터)에 기초하여 2차원 행렬로 표현될 수 있다. 여기서, 사용자의 질의에 대한 모든 부그래프는 입력벡터로 표현할 수 있다. Specifically, the subgraph derivation unit 130 may map class information from the schema information of the knowledge graph to the rows of the input vector corresponding to a subgraph, and attribute/relationship information from the schema information to the columns of the input vector. For example, referring to FIG. 2F, the subgraph derivation unit 130 may map Person class information to row 1 and position attribute/relationship information to column 1 of the first input vector 207 corresponding to the first subgraph composed of {Melania Trump-position-First Lady}. The subgraph derivation unit 130 may map hasCountry attribute/relationship information to column 2, Nation class information to row 3, and hasPeople attribute/relationship information to column 3 of the second input vector 209 corresponding to the second subgraph composed of {Melania Trump-hasCountry-Slovenia}. In this way, each subgraph may be expressed as a two-dimensional matrix based on the triple information extracted from it (or the input vector composed of that information). All subgraphs for a user's query can thus be expressed as input vectors.

예를 들어, '코로나가 언제 시작됐어?'를 포함하는 질의 데이터를 수신한 경우, '코로나', '시작' 등의 키워드, '코로나 19 바이러스' 개체명과 '발생일' 관계명을 이용하여 탐색된 복수의 부그래프 각각의 노드 및 엣지가 속해있는 클래스 및 관계명에 기초하여 2차원 행렬이 생성될 수 있다. 부그래프 도출부(130)는 도 2f와 같이 추출된 각 부그래프의 노드 및 엣지에 해당하는 클래스 및 관계명의 행렬 위치에 임베딩을 위한 초기값을 설정할 수 있다. For example, when query data including 'When did corona start?' is received, a plurality of subgraphs may be found using keywords such as 'corona' and 'start', the entity name 'COVID-19 virus', and the relation name 'occurrence date', and a two-dimensional matrix may be generated based on the classes and relation names to which the nodes and edges of each of those subgraphs belong. As shown in FIG. 2F, the subgraph derivation unit 130 may set an initial value for embedding at the matrix positions of the classes and relation names corresponding to the nodes and edges of each extracted subgraph.

또한, 부그래프 도출부(130)는 각 부그래프 각각에 대한 2차원 행렬을 합하여 2차원 벡터값을 계산할 수 있다. 부그래프 도출부(130)는 추출된 복수의 그래프가 N개인 경우, N개의 2차원 행렬이 표현되고, N개의 2차원 행렬에 대한 최종 합집합을 생성할 수 있다. 이 때, 각 2차원 행렬 내 2차원 벡터값은 트리플 집합이 중복되어도 별도의 값을 차지하지 않고, 예를 들어, 도 2f의 맨 우측 행렬과 같이 기설정된 값(예컨대, 1)로 유지될 수 있다. 이를 통해, 본 발명은 심층신경망(예컨대, CNN)의 입력으로 2차원 벡터값을 사용하기 때문에 사용자 질의에 가장 적합하게 매칭되는 부그래프를 추론할 수 있다. Also, the subgraph derivation unit 130 may calculate a two-dimensional vector value by summing the two-dimensional matrices of the individual subgraphs. When N subgraphs have been extracted, N two-dimensional matrices are produced, and the subgraph derivation unit 130 may generate a final union of the N two-dimensional matrices. In this case, a value in the two-dimensional matrix does not take a separate value even when triple sets overlap; it may be kept at a preset value (eg, 1), as in the rightmost matrix of FIG. 2F. Because the present invention uses this two-dimensional vector value as the input of a deep neural network (eg, a CNN), it can infer the subgraph that best matches the user query.
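As a rough illustration of the class-by-relation matrix and its union, the following sketch marks a cell for the subject class and the object class of each triple. The row and column orderings, the choice of which cells to mark, and the names `CLASSES`, `RELATIONS`, `encode`, and `union` are all assumptions; the disclosure fixes only that rows carry class information, columns carry relation information, and overlapping cells stay at a preset value such as 1.

```python
CLASSES = ["Person", "Nation", "Literal"]            # rows (hypothetical ordering)
RELATIONS = ["position", "hasCountry", "hasPeople"]  # columns (hypothetical ordering)


def encode(schema_triples):
    """Return a class x relation matrix with 1 in every cell a subgraph touches."""
    matrix = [[0] * len(RELATIONS) for _ in CLASSES]
    for subj_cls, rel, obj_cls in schema_triples:
        col = RELATIONS.index(rel)
        matrix[CLASSES.index(subj_cls)][col] = 1
        matrix[CLASSES.index(obj_cls)][col] = 1
    return matrix


def union(matrices):
    """Element-wise union: a cell shared by several subgraphs stays at 1."""
    return [[max(cells) for cells in zip(*rows)] for rows in zip(*matrices)]


m1 = encode([("Person", "position", "Literal")])   # {Melania Trump-position-First Lady}
m2 = encode([("Person", "hasCountry", "Nation")])  # {Melania Trump-hasCountry-Slovenia}
combined = union([m1, m2, m1])                     # the repeated m1 does not change the result
```

Taking the element-wise maximum rather than an arithmetic sum is one way to keep overlapping entries at the preset value.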

부그래프 도출부(130)는 각각의 부그래프로부터 추출된 트리플 집합을 각 부그래프에 대응하는 입력 벡터에 매핑시키고, 트리플 집합이 매핑된 입력 벡터를 딥러닝 네트워크에 입력할 수 있다. 또한, 부그래프 도출부(130)는 각 부그래프 각각에 대한 2차원 행렬를 합한 2차원 벡터값을 딥러닝 네트워크에 입력할 수 있다. 이 때, 딥러닝 네트워크는 각 부그래프 별 트리플 집합이 매핑된 입력 벡터에 기초하여 출력값을 도출할 수 있다. 이 때 도출된 출력값은 최종 부그래프(즉, 질의 데이터에 대한 최종 부그래프)에 대응하는 출력값일 수 있다. The subgraph derivation unit 130 may map a set of triples extracted from each subgraph to an input vector corresponding to each subgraph, and input the input vector to which the set of triples is mapped to the deep learning network. In addition, the subgraph derivation unit 130 may input a two-dimensional vector value obtained by summing the two-dimensional matrix for each subgraph to the deep learning network. In this case, the deep learning network may derive an output value based on the input vector to which the triple set for each subgraph is mapped. In this case, the derived output value may be an output value corresponding to the final subgraph (ie, the final subgraph for the query data).

부그래프 도출부(130)는 기학습된 딥러닝 네트워크에 기초하여 트리플 집합(복수의 부그래프 각각으로부터 추출된 트리플 집합) 및 질의 데이터에 대한 분석 결과 간의 유사도를 도출하고, 도출된 유사도에 기초하여 추출된 복수의 부그래프 중 유사도가 가장 높게 도출된 부그래프를 최종 부그래프로 도출할 수 있다. The subgraph derivation unit 130 may derive, based on the pre-trained deep learning network, a similarity between each triple set (the triple set extracted from each of the plurality of subgraphs) and the analysis result of the query data, and may derive, as the final subgraph, the subgraph with the highest similarity among the plurality of extracted subgraphs.
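The selection step reduces to taking the highest-scoring candidate. In the sketch below, `overlap_score` is only a stand-in for the similarity that the pre-trained deep learning network would produce; it and `select_final_subgraph` are hypothetical names.

```python
def overlap_score(triples, query_terms):
    """Stand-in similarity: fraction of query terms that occur in the triples."""
    covered = {item for triple in triples for item in triple}
    return len(covered & query_terms) / len(query_terms)


def select_final_subgraph(subgraphs, query_terms, score=overlap_score):
    """Return the subgraph whose triple set scores highest against the query."""
    return max(subgraphs, key=lambda sg: score(sg, query_terms))


candidates = [
    [("Donald Trump", "FamilyName", "Trump")],
    [("Melania Trump", "position", "First Lady"), ("Melania Trump", "immigrator", "yes")],
]
final = select_final_subgraph(candidates, {"Melania Trump", "First Lady", "immigrator"})
```

Passing the scoring function as a parameter keeps the selection logic unchanged if a learned model replaces the stand-in.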

부그래프 도출부(130)는 도출된 최종 부그래프로부터 문서 검색에 사용될 정보를 추출할 수 있다. 구체적으로, 부그래프 도출부(130)는 최종 부그래프로부터 최종 부그래프를 구성하는 적어도 둘 노드에 대한 개체명, 적어도 둘 노드를 연결하는 엣지에 대한 관계명, 유의어 사전으로부터 추출된 해당 개체명과 관계명 각각과 관련된 유의어, 최종 부그래프에 매핑된 스키마 정보(클래스 정보, 속성관계)를 추출할 수 있다. 예를 들어, 부그래프 도출부(130)는 '멜라니아 트럼프가 몇번째 이민자 출신 퍼스트 레이디야?'를 포함하는 질의 데이터에 대한 최종 부그래프를 도 2g와 같이 도출할 수 있다. 또한, 부그래프 도출부(130)는 해당 질의 데이터에 대한 최종 부그래프로부터 'Nation, Person, hasCountry, gender, position, immigrator'을 포함하는 스키마 정보를 추출할 수 있다. The subgraph derivation unit 130 may extract information to be used for document search from the derived final subgraph. Specifically, the sub-graph derivation unit 130 is an entity name for at least two nodes constituting the final sub-graph from the final sub-graph, a relation name for an edge connecting at least two nodes, and a corresponding entity name and relation extracted from the thesaurus. Synonyms related to each name and schema information (class information, attribute relationship) mapped to the final subgraph can be extracted. For example, the subgraph derivation unit 130 may derive a final subgraph for the query data including 'What immigrant first lady is Melania Trump?' as shown in FIG. 2G . Also, the subgraph derivation unit 130 may extract schema information including 'Nation, Person, hasCountry, gender, position, immigrator' from the final subgraph for the corresponding query data.

응답 도출부(140)는 도출된 최종 부그래프에 기초하여 오픈 도메인으로부터 수집된 복수의 문서 중 적어도 하나의 문서를 선택하고, 선택된 문서에 기초하여 질의 데이터에 대한 응답 데이터를 도출할 수 있다. The response derivation unit 140 may select at least one document from among a plurality of documents collected from the open domain based on the derived final subgraph, and may derive response data for the query data based on the selected document.

응답 도출부(140)는 최종 부그래프에 포함된 노드 또는 엣지에 대한 키워드, 개체명, 관계명, 스키마 정보 중 적어도 하나에 기초하여 복수의 문서 중 적어도 하나의 문서를 선택할 수 있다. 여기서, 복수의 문서는 각 문서에 포함된 문장에 대하여 색인 프로세스가 수행된 문서이다. 각 문서의 문장에는 개체명 인식 결과, 형태소 분석 결과 및 토크나이저 결과, 개체명 정보 및 관계명 정보가 색인되어 있다. The response derivation unit 140 may select at least one document from among the plurality of documents based on at least one of a keyword, an entity name, a relation name, and schema information for a node or edge included in the final subgraph. Here, the plurality of documents is a document on which an indexing process is performed on sentences included in each document. Entity name recognition result, morpheme analysis result, tokenizer result, entity name information, and relationship name information are indexed in the sentences of each document.

문서 랭킹부(미도시)는 복수의 문서 각각에 대하여 최종 부그래프로부터 추출된 정보(최종 부그래프의 노드 또는 엣지에 대한 키워드, 개체명, 관계명, 스키마 정보)가 차지하는 비율(비중)을 산출할 수 있다. 예를 들어, 문서 랭킹부(미도시)는 각 문서 별로 최종 부그래프의 노드 또는 엣지에 대한 키워드 및 이에 대한 유의어가 출현하는 제 1 빈도수를 산출하고, 최종 부그래프의 개체명 및 관계명이 출현하는 제 2 빈도수를 산출하고, 최종 부그래프의 스키마 정보가 출현하는 제 3 빈도수를 계산할 수 있다. 예를 들어, 문서 랭킹부(미도시)는 각 문서 별로, 최종 부그래프의 노드 및 엣지 리스트에 포함된 문자열이 문장 내 한 쌍(pair)으로 나타나는 비율을 계산할 수 있다. 또는, 문서 랭킹부(미도시)는 각 문서 별로, 최종 부그래프의 개체명 및 관계명이 한 쌍으로 나타나는 비율을 계산할 수 있다. The document ranking unit (not shown) calculates the ratio (specific gravity) of the information extracted from the final subgraph for each of the plurality of documents (keyword, entity name, relation name, schema information for the node or edge of the final subgraph) can do. For example, the document ranking unit (not shown) calculates the first frequency of occurrence of keywords and synonyms for nodes or edges of the final subgraph for each document, and the entity name and relationship name of the final subgraph appear. The second frequency may be calculated, and the third frequency at which the schema information of the final subgraph appears may be calculated. For example, the document ranking unit (not shown) may calculate a ratio in which character strings included in the node and edge lists of the final subgraph appear as a pair in the sentence for each document. Alternatively, the document ranking unit (not shown) may calculate a ratio in which the entity name and the relation name of the final subgraph appear as a pair for each document.

문서 랭킹부(미도시)는 각 문서 별로 계산된 최종 부그래프로부터 추출된 정보에 대한 출현 빈도수(비율)를 특징 정보로서 기계 학습 모델(예컨대, Random Forest 모델 등)에 입력하여 기계 학습 모델을 학습시키고, 학습된 기계 학습 모델을 통해 각 특징 정보 별 가중치를 산출할 수 있다. The document ranking unit (not shown) learns the machine learning model by inputting the frequency of appearance (ratio) of the information extracted from the final subgraph calculated for each document into the machine learning model (eg, random forest model, etc.) as feature information. and a weight for each feature information can be calculated through the learned machine learning model.

문서 랭킹부(미도시)는 질의 데이터에 대한 수치 정보 및 기계 학습 모델을 통해 산출된 각 특징 정보별 가중치를 이용하여 각 문서에 대한 랭킹 점수를 산출하고, 산출된 각 문서에 대한 랭킹 점수에 기초하여 복수의 문서를 랭킹화할 수 있다. The document ranking unit (not shown) may calculate a ranking score for each document using numerical information about the query data and the weight for each piece of feature information calculated through the machine learning model, and may rank the plurality of documents based on the calculated ranking scores.
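The ranking described above can be sketched as a weighted sum over frequency features. The weights below are fixed by hand purely for illustration, whereas the text derives them from a trained machine learning model; `frequency_features` and `rank_documents` are hypothetical names, and substring counting over whitespace-split text is a simplifying assumption.

```python
def frequency_features(text, keywords, names, schema_terms):
    """Relative frequency of three groups of final-subgraph terms in one document."""
    total = len(text.split()) or 1

    def ratio(terms):
        return sum(text.count(term) for term in terms) / total

    return [ratio(keywords), ratio(names), ratio(schema_terms)]


def rank_documents(docs, feature_fn, weights):
    """Sort documents by the weighted sum of their features, highest first."""
    scored = [(sum(w * f for w, f in zip(weights, feature_fn(doc))), doc) for doc in docs]
    return [doc for _score, doc in sorted(scored, key=lambda pair: pair[0], reverse=True)]


docs = ["weather report today", "Melania Trump First Lady immigrant"]
ranked = rank_documents(
    docs,
    lambda d: frequency_features(d, ["Melania"], ["First Lady"], ["immigrant"]),
    weights=[0.5, 0.3, 0.2],  # hypothetical weights
)
```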

응답 도출부(140)는 복수의 문서 중 랭킹 점수가 높은 문서를 선택하고, 선택된 문서에 기초하여 질의 데이터에 대한 응답 데이터를 도출할 수 있다. The response derivation unit 140 may select a document having a high ranking score among a plurality of documents, and may derive response data for the query data based on the selected document.

응답 도출부(140)는 도출된 질의 데이터에 대한 응답 데이터를 검증할 수 있다. 예를 들어, 응답 도출부(140)는 질의 데이터에 대한 최종 부그래프에 포함된 정보를 이용하여 응답 데이터를 검증할 수 있다. 최종 부그래프는 노드 또는 엣지에 대한 키워드, 개체명, 관계명, 스키마 정보를 가지고 있기 때문에 질의 데이터를 구성하는 주어 정보(개체명) 및 술어 정보(관계명)에 기초하여 응답 데이터(즉, 목적 정보)의 타입을 유추할 수 있고, 응답 데이터에 대한 타입 검증을 통해 응답 데이터를 검증할 수 있다. The response derivation unit 140 may verify response data for the derived query data. For example, the response derivation unit 140 may verify the response data by using information included in the final subgraph for the query data. Because the final subgraph has keyword, entity name, relation name, and schema information for a node or edge, response data (ie, purpose) based on the subject information (object name) and predicate information (relation name) constituting the query data information) can be inferred, and response data can be verified through type verification for response data.

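예를 들어, 도 3을 참조하면, 질의 데이터가 '멜라니아 트럼프가 몇번째 이민자 출신 퍼스트 레이디야?'이고, 질의 데이터에 대하여 검색된 문서 문장이 '트럼프의 당선으로 멜라니아는 미국의 두 번째 이민자 출신이자 첫 공산권 국가 출신 퍼스트 레이디가 됐다.'라고 가정하면, 질의 데이터에 대한 최종 부그래프의 스키마 정보는 '멜라니아 트럼프/Person, 멜라니/Singer, 도날드 트럼프/Person, 이민자/RQCWho, 퍼스트 레이디/PoliticalPerson, 몇번째/Sequence'이고, 검색된 문서 문장에서의 스키마 정보는 '도널드 트럼프/Person, 멜라니아 트럼프/Person, 미국/Nation, 이민자/RQCWho, 퍼스트 레이디/PoliticalPerson, 멜라니/Singer, 두번째/Sequence'이다. For example, referring to FIG. 3, assume that the query data is 'What number immigrant-born First Lady is Melania Trump?' and that the document sentence retrieved for the query data is 'With Trump's election, Melania became the second immigrant-born First Lady of the United States and the first from a communist-bloc country.' In this case, the schema information of the final subgraph for the query data is 'Melania Trump/Person, Melanie/Singer, Donald Trump/Person, immigrant/RQCWho, First Lady/PoliticalPerson, what number/Sequence', and the schema information in the retrieved document sentence is 'Donald Trump/Person, Melania Trump/Person, United States/Nation, immigrant/RQCWho, First Lady/PoliticalPerson, Melanie/Singer, second/Sequence'.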

질의 데이터에 대한 정답은 '두 번째'이다. 질의 데이터에서 나타난 '몇번째'에 해당하는 개체명과 응답 데이터의 '두 번째'에 해당하는 개체명이 동일한 시퀀스(Sequence) 타입이므로 응답 데이터의 타입이 검증될 수 있다. 또한, 시퀀스 타입이므로 한글 문자열이 아닌 수치형으로 표현되어 있더라도 응답 데이터의 검증이 가능하다. The correct answer to the query data is 'second'. Because the entity name corresponding to 'what number' in the query data and the entity name corresponding to 'second' in the response data have the same Sequence type, the type of the response data can be verified. In addition, because the type is Sequence, the response data can be verified even if it is expressed as a numeral rather than as a Korean character string.
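The type check in this example can be sketched as a comparison of schema types. The dictionaries below restate part of the schema information from the example in English; the function name `verify_answer_type` and the dictionary layout are assumptions.

```python
def verify_answer_type(answer, question_term, query_schema, document_schema):
    """Accept the answer only if its type equals the type of the question term."""
    expected = query_schema.get(question_term)
    return expected is not None and document_schema.get(answer) == expected


query_schema = {"what number": "Sequence", "First Lady": "PoliticalPerson"}
document_schema = {"second": "Sequence", "United States": "Nation", "2": "Sequence"}

ok = verify_answer_type("second", "what number", query_schema, document_schema)
rejected = verify_answer_type("United States", "what number", query_schema, document_schema)
```

Because the comparison is on the type rather than the surface string, a numeral such as '2' tagged as Sequence would pass the same check.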

한편, 개체명 사전을 통해 개체명이 인식되지 않은 경우, 색인부(150)는 관계명 사전을 이용한 관계명의 의미적 인식을 통해 개체명의 도메인을 한정하고, 해당 개체명의 도메인에 기초하여 수집된 복수의 문서에 포함된 문장에 대해 색인 프로세스를 수행할 수 있다. 관계명의 의미적 인식을 통해 한정된 개체명의 도메인을 이용함으로써 검색 정확도를 높일 수 있다. On the other hand, if the entity name is not recognized through the entity name dictionary, the indexing unit 150 limits the domain of the entity name through semantic recognition of the relation name using the relation name dictionary, and a plurality of An indexing process can be performed on the sentences included in the document. The search accuracy can be improved by using the domain of the limited entity name through semantic recognition of the relation name.

예를 들어, 특정 드라마의 약어(예컨대, 구르미 그린 달빛의 드라마 약어인 '구르미')나 부분 문자열이 포함된 질의 데이터의 경우, 해당 질의 데이터에 포함된 개체명이 개체명 사전을 통해 인식되지 않더라도 관계명의 의미적 인식을 통해 유추된 개체명의 도메인을 통해 질의에 대한 답변을 도출할 수 있다. For example, in the case of query data containing an abbreviation of a specific drama (eg, '구르미', the abbreviated title of the drama 구르미 그린 달빛, 'Moonlight Drawn by Clouds') or a partial string, even if the entity name included in the query data is not recognized through the entity name dictionary, an answer to the query can be derived through the entity name domain inferred by semantic recognition of the relation name.

다른 예로, '코로나가 언제 시작됐어?'를 포함하는 질의 데이터의 경우, '코로나' 개체명이 어떤 개체명(예컨대, 바이러스 개체명, 맥주 개체명, 플라즈마 대기 개체명 등)에 대응되는지 확인이 어렵다. 또한, '코로나' 키워드를 이용하여 검색하는 경우에도 해당 '코로나' 키워드 대응하는 정확한 개체명에 대해 기술된 문서를 검색할 확률이 높지 않다. As another example, in the case of query data including 'When did the corona start?', it is difficult to determine which entity name the 'corona' entity corresponds to (eg, virus entity name, beer entity name, plasma atmospheric entity name, etc.) . In addition, even when searching using the 'corona' keyword, the probability of searching for a document describing the exact entity name corresponding to the 'corona' keyword is not high.

따라서, 색인부(150)는 '코로나가 언제 시작됐어?'를 포함하는 질의 데이터를 구성하는 일부 표현인 '언제 시작됐어?' 질의 표현 정보에 기초하여 관계명 사전으로부터 '발생일' 관계명을 도출할 수 있다. 또한, 색인부(150)는 '발생일' 관계명이 관계명 사전 내 어느 클래스에 연결이 되는지를 확인하여 '발생일' 관계명과 관계된 스키마 정보를 관계명 사전으로부터 추출할 수 있다. Accordingly, the indexing unit 150 may derive the relation name 'occurrence date' from the relation name dictionary based on the query expression 'When did it start?', which is part of the query data including 'When did corona start?'. In addition, the indexing unit 150 may check which class in the relation name dictionary the 'occurrence date' relation name is connected to, and extract schema information related to the 'occurrence date' relation name from the relation name dictionary.
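A relation name dictionary lookup of this kind might look like the sketch below. The dictionary structure, the English cue phrases, and the connected classes are hypothetical; the disclosure states only that the query expression is matched against a relation name dictionary that also links the relation name to classes.

```python
RELATION_DICT = {
    # relation name -> cue phrases and connected classes (all hypothetical entries)
    "occurrence date": {"cues": ["when did", "start"], "classes": ["Virus", "Event"]},
    "nationality": {"cues": ["which country"], "classes": ["Person"]},
}


def derive_relation(query):
    """Return the first relation name whose cue phrases all occur in the query."""
    lowered = query.lower()
    for name, entry in RELATION_DICT.items():
        if all(cue in lowered for cue in entry["cues"]):
            return name, entry["classes"]
    return None, []


relation, classes = derive_relation("When did corona start?")
```

The returned classes narrow the possible domain of the unrecognized keyword 'corona', which is the purpose the text attributes to this step.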

이후, 질의 분석부(120)는 '코로나가 언제 시작됐어?'를 포함하는 질의 데이터의 분석 결과로부터 '코로나'(개체명이 아닌 키워드) 토큰, '발생일' 관계명 토큰(관계명 사전에 포함된 클래스 및 관계 의미 포함) 및 '언제 시작' 토큰을 추출할 수 있다. Thereafter, the query analysis unit 120 may extract, from the analysis result of the query data including 'When did corona start?', a 'corona' token (a keyword, not an entity name), an 'occurrence date' relation name token (including the class and relation semantics contained in the relation name dictionary), and a 'when start' token.

검색부(미도시)는 질의 분석부(120)에 의해 추출된 '코로나' 토큰, '발생일' 관계명 토큰 및 '언제 시작' 토큰을 이용하여 지식 그래프의 노드 및 엣지를 검색할 수 있다. The search unit (not shown) may search the nodes and edges of the knowledge graph using the 'corona' token, the 'occurrence date' relation name token, and the 'when start' token extracted by the query analysis unit 120 .

부그래프 도출부(130)는 질의 데이터에 대한 분석 결과에 기초하여 지식 그래프를 이용한 부그래프의 탐색을 통해 의미적으로 연결된 노드 및 엣지 집합인 복수의 부그래프를 생성할 수 있다. 이 때, 부그래프의 생성에 이용되지 않은 노드 및 엣지는 삭제된다. The subgraph derivation unit 130 may generate a plurality of subgraphs that are a set of nodes and edges that are semantically connected through the search of the subgraph using the knowledge graph based on the analysis result of the query data. At this time, nodes and edges that are not used to generate the subgraph are deleted.

질의 분석부(120)는 심층신경망을 통해 질의 데이터와 관련된 최적의 클래스, 개체명(인스턴스) 및 관계명을 도출함으로써 질의 데이터에 대한 의미적 질문 분석에 따른 결과를 도출할 수 있다. The query analysis unit 120 may derive a result according to the semantic question analysis of the query data by deriving an optimal class, entity name (instance), and relationship name related to the query data through the deep neural network.

또한, 문서 랭킹부(미도시)는 의미적 질의분석 결과와 키워드를 통해 문서 검색 및 검색된 문서별 랭킹을 수행할 수 있다. 이 과정에서 '코로나' 키워드와 '발생일' 관계명이 적정거리 내 존재하는 문서를 상위 랭킹으로 산출함으로써 '코로나' 키워드 검색시 나타날 수 있는 오검색 비율을 줄여 검색 정확도를 향상시킬 수 있다. In addition, the document ranking unit (not shown) may perform a document search and ranking for each searched document through the result of semantic query analysis and keywords. In this process, it is possible to improve the search accuracy by reducing the rate of false searches that may appear when searching for the 'corona' keyword by calculating the documents that exist within an appropriate distance between the 'corona' keyword and the 'date of occurrence' as a higher ranking.
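The proximity condition can be sketched as a word-distance check. The distance threshold `max_gap`, the English surface forms used for the relation name, and the function name `within_distance` are assumptions; the text says only that the keyword and the relation name should lie within an appropriate distance.

```python
def within_distance(text, keyword, relation_terms, max_gap=5):
    """True if the keyword occurs within max_gap words of any relation term."""
    words = text.lower().split()
    keyword_positions = [i for i, w in enumerate(words) if keyword in w]
    relation_positions = [
        i for i, w in enumerate(words) if any(term in w for term in relation_terms)
    ]
    return any(abs(i - j) <= max_gap for i in keyword_positions for j in relation_positions)


virus_doc = "The corona outbreak first occurred in December 2019"
beer_doc = "Corona beer is brewed in Mexico"
hits = [within_distance(d, "corona", ["occurred", "began"]) for d in (virus_doc, beer_doc)]
```

Documents for which the check succeeds would be ranked higher, which filters out pages about unrelated senses of 'corona'.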

응답 도출부(140)는 기계 학습 모델(예컨대, MRC 모델 등)에 질의 데이터와 상위 랭킹의 문서를 입력하여 질의 데이터에 대한 답변을 도출할 수 있다. The response derivation unit 140 may derive an answer to the query data by inputting the query data and a document having a higher ranking to the machine learning model (eg, an MRC model, etc.).

이처럼, 본 발명은 오픈 도메인 및 실시간 정보에 대한 질의 데이터에 대해서도 정확한 답변을 제공할 수 있다. As such, the present invention can provide an accurate answer to query data for open domain and real-time information.

한편, 당업자라면, 수집부(100), 지식 그래프 생성부(110), 질의 분석부(120), 부그래프 도출부(130), 응답 도출부(140) 및 색인부(150) 각각이 분리되어 구현되거나, 이 중 하나 이상이 통합되어 구현될 수 있음을 충분히 이해할 것이다. Meanwhile, for those skilled in the art, the collection unit 100, the knowledge graph generation unit 110, the query analysis unit 120, the subgraph derivation unit 130, the response derivation unit 140, and the index unit 150 are each separated. It will be fully understood that it may be implemented, or one or more of these may be integrated and implemented.

도 4는 본 발명의 일 실시예에 따른, 지식 그래프에 기초하여 응답을 도출하는 방법을 나타낸 흐름도이다. 4 is a flowchart illustrating a method of deriving a response based on a knowledge graph according to an embodiment of the present invention.

도 4를 참조하면, 단계 S401에서 응답 도출 장치(10)는 오픈 도메인으로부터 주어 정보, 목적 정보 및 주어 정보 및 목적 정보 간의 술어 정보를 포함하는 시맨틱 정보를 수집할 수 있다. Referring to FIG. 4 , in step S401 , the response derivation apparatus 10 may collect subject information, object information, and semantic information including predicate information between subject information and object information from the open domain.

단계 S403에서 응답 도출 장치(10)는 주어 정보, 술어 정보 및 목적 정보에 기초하여 지식 그래프를 생성할 수 있다. In step S403, the response derivation apparatus 10 may generate a knowledge graph based on subject information, predicate information, and object information.

단계 S405에서 응답 도출 장치(10)는 질의 데이터를 입력받을 수 있다. In step S405, the response derivation apparatus 10 may receive query data.

단계 S407에서 응답 도출 장치(10)는 입력된 질의 데이터를 분석할 수 있다. In step S407 , the response derivation apparatus 10 may analyze the input query data.

단계 S409에서 응답 도출 장치(10)는 질의 데이터에 대한 분석 결과에 기초하여 생성된 지식 그래프로부터 적어도 하나의 부그래프를 탐색할 수 있다. In step S409 , the response derivation apparatus 10 may search for at least one subgraph from the knowledge graph generated based on the analysis result of the query data.

단계 S411에서 응답 도출 장치(10)는 탐색된 적어도 하나의 부그래프 중 최종 부그래프를 도출할 수 있다. In step S411, the response deriving apparatus 10 may derive a final subgraph from among at least one searched subgraph.

단계 S413에서 응답 도출 장치(10)는 도출된 최종 부그래프에 기초하여 오픈 도메인으로부터 수집된 복수의 문서 중 적어도 하나의 문서를 선택할 수 있다. In step S413 , the response derivation apparatus 10 may select at least one document from among a plurality of documents collected from the open domain based on the derived final subgraph.

단계 S415에서 응답 도출 장치(10)는 선택된 문서에 기초하여 질의 데이터에 대한 응답 데이터를 도출할 수 있다. In operation S415, the response deriving apparatus 10 may derive response data for the query data based on the selected document.
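Steps S401 to S415 can be summarized as a simple pipeline. In the sketch below every stage is passed in as a callable, since the disclosure does not fix how each stage is implemented; `build_graph`, `derive_response`, and the stand-in stages in the usage example are hypothetical.

```python
def build_graph(triples):
    """S403: nodes come from subjects and objects, edges from predicates."""
    nodes = {node for subj, _pred, obj in triples for node in (subj, obj)}
    return {"nodes": nodes, "edges": list(triples)}


def derive_response(query, triples, documents, analyze, search, pick_final, select, answer):
    graph = build_graph(triples)              # S401-S403: collected triples -> knowledge graph
    analysis = analyze(query)                 # S405-S407: receive and analyze the query
    candidates = search(graph, analysis)      # S409: find candidate subgraphs
    final = pick_final(candidates, analysis)  # S411: derive the final subgraph
    chosen = select(documents, final)         # S413: select documents
    return answer(query, chosen)              # S415: derive the response


# Usage with trivial stand-in stages.
response = derive_response(
    "Where is Melania from?",
    [("melania", "hasCountry", "Slovenia")],
    ["Melania was born in Slovenia.", "Unrelated text."],
    analyze=lambda q: set(q.lower().rstrip("?").split()),
    search=lambda g, a: [[e] for e in g["edges"] if e[0] in a],
    pick_final=lambda c, a: c[0],
    select=lambda docs, f: [d for d in docs if f[0][2] in d],
    answer=lambda q, docs: docs[0],
)
```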

상술한 설명에서, 단계 S401 내지 S415는 본 발명의 구현예에 따라서, 추가적인 단계들로 더 분할되거나, 더 적은 단계들로 조합될 수 있다. 또한, 일부 단계는 필요에 따라 생략될 수도 있고, 단계 간의 순서가 변경될 수도 있다. In the above description, steps S401 to S415 may be further divided into additional steps or combined into fewer steps, according to an embodiment of the present invention. In addition, some steps may be omitted as necessary, and the order between steps may be changed.

본 발명의 일 실시예는 컴퓨터에 의해 실행되는 프로그램 모듈과 같은 컴퓨터에 의해 실행 가능한 명령어를 포함하는 기록 매체의 형태로도 구현될 수 있다. 컴퓨터 판독 가능 매체는 컴퓨터에 의해 액세스될 수 있는 임의의 가용 매체일 수 있고, 휘발성 및 비휘발성 매체, 분리형 및 비분리형 매체를 모두 포함한다. 또한, 컴퓨터 판독가능 매체는 컴퓨터 저장 매체를 모두 포함할 수 있다. 컴퓨터 저장 매체는 컴퓨터 판독가능 명령어, 데이터 구조, 프로그램 모듈 또는 기타 데이터와 같은 정보의 저장을 위한 임의의 방법 또는 기술로 구현된 휘발성 및 비휘발성, 분리형 및 비분리형 매체를 모두 포함한다. An embodiment of the present invention may also be implemented in the form of a recording medium including instructions executable by a computer, such as a program module executed by a computer. Computer-readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media. Also, computer-readable media may include all computer storage media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.

전술한 본 발명의 설명은 예시를 위한 것이며, 본 발명이 속하는 기술분야의 통상의 지식을 가진 자는 본 발명의 기술적 사상이나 필수적인 특징을 변경하지 않고서 다른 구체적인 형태로 쉽게 변형이 가능하다는 것을 이해할 수 있을 것이다. 그러므로 이상에서 기술한 실시예들은 모든 면에서 예시적인 것이며 한정적이 아닌 것으로 이해해야만 한다. 예를 들어, 단일형으로 설명되어 있는 각 구성 요소는 분산되어 실시될 수도 있으며, 마찬가지로 분산된 것으로 설명되어 있는 구성 요소들도 결합된 형태로 실시될 수 있다. The above description of the present invention is for illustration, and those of ordinary skill in the art to which the present invention pertains can understand that it can be easily modified into other specific forms without changing the technical spirit or essential features of the present invention. will be. Therefore, it should be understood that the embodiments described above are illustrative in all respects and not restrictive. For example, each component described as a single type may be implemented in a dispersed form, and likewise components described as distributed may be implemented in a combined form.

본 발명의 범위는 상세한 설명보다는 후술하는 특허청구범위에 의하여 나타내어지며, 특허청구범위의 의미 및 범위 그리고 그 균등 개념으로부터 도출되는 모든 변경 또는 변형된 형태가 본 발명의 범위에 포함되는 것으로 해석되어야 한다. The scope of the present invention is indicated by the following claims rather than the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalent concepts should be construed as being included in the scope of the present invention.

10: 응답 도출 장치
100: 수집부
110: 지식 그래프 생성부
120: 질의 분석부
130: 부그래프 도출부
140: 응답 도출부
150: 색인부

10: response derivation apparatus
100: collection unit
110: knowledge graph generation unit
120: query analysis unit
130: subgraph derivation unit
140: response derivation unit
150: indexing unit

Claims

An apparatus for deriving a response based on a knowledge graph, the apparatus comprising:
a collection unit that collects, from an open domain, semantic information including subject information, object information, and predicate information between the subject information and the object information;
a knowledge graph generation unit that generates a knowledge graph based on the subject information, the predicate information, and the object information;
a query analysis unit that receives query data and analyzes the input query data;
a subgraph derivation unit that searches for at least one subgraph from the generated knowledge graph based on an analysis result of the query data, and derives a final subgraph from among the at least one found subgraph; and
a response derivation unit that selects at least one document from among a plurality of documents collected from the open domain based on the derived final subgraph, and derives response data for the query data based on the selected document.

The method of claim 1,
The knowledge graph generating unit generates a node corresponding to the subject information and the object information, generates an edge indicating predicate information between each node, and generates a knowledge graph based on the node and the edge, Response derivation Device.

3. The apparatus of claim 2,
wherein the knowledge graph generating unit maps an entity name or a relation name to schema information for a node or an edge included in the knowledge graph based on a pre-registered entity name dictionary and a pre-registered relation name dictionary.

4. The apparatus of claim 3,
wherein the knowledge graph generating unit manages, through the knowledge graph, thesaurus entries related to the mapped entity name or relation name based on a pre-registered thesaurus.

5. The apparatus of claim 3, further comprising:
an indexing unit configured to perform an indexing process on sentences included in the plurality of collected documents based on an entity name recognition result obtained using the entity name dictionary and the relation name dictionary,
wherein the query analysis unit analyzes the query data based on the entity name recognition result.
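
As a non-limiting sketch, dictionary-based entity name recognition and sentence indexing of the kind recited above could look like the following. The substring matching, the naive sentence splitting on ". ", and the names `recognize_entities` and `build_index` are simplifying assumptions for this example; the specification does not prescribe them.

```python
from collections import defaultdict


def recognize_entities(text, entity_dict):
    """Return dictionary entries that occur in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return {name for name in entity_dict if name.lower() in lowered}


def build_index(documents, entity_dict):
    """Map each recognized entity name to the (document id, sentence id) pairs containing it."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(documents):
        # Naive sentence split, sufficient for this illustration only.
        for sent_id, sentence in enumerate(doc.split(". ")):
            for name in recognize_entities(sentence, entity_dict):
                index[name].append((doc_id, sent_id))
    return dict(index)


# Hypothetical pre-registered entity name dictionary (name -> schema type).
entity_dict = {"Seoul": "City", "South Korea": "Country", "Han River": "River"}
documents = ["Seoul is the capital of South Korea. The Han River flows through Seoul."]

index = build_index(documents, entity_dict)
query_entities = recognize_entities("Where is Seoul?", entity_dict)
```

Running the same recognizer over both the documents and the query is what lets the query analysis result be matched directly against the index.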

6. The apparatus of claim 5,
wherein the indexing unit performs the indexing process on the sentences included in the plurality of collected documents based on a morpheme analysis result and a tokenizer result, and
the query analysis unit further analyzes the query data based on the morpheme analysis result and the tokenizer result.

7. The apparatus of claim 5,
wherein the subgraph derivation unit searches the knowledge graph for at least one node or edge based on subject information or predicate information included in the analysis result of the query data, and extracts a plurality of subgraphs based on the found node or edge.
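
For illustration only, the node or edge search and the subgraph extraction recited above might be sketched as below. Treating each candidate subgraph as the list of triples touching a matched node or carrying a matched predicate is an assumption of this example, as is the function name `extract_subgraphs`.

```python
def extract_subgraphs(triples, subject=None, predicate=None):
    """Return one candidate subgraph (a list of triples) per matched node or edge."""
    subgraphs = []
    if subject is not None:
        # Subgraph around the node that matches the subject of the query.
        around_node = [t for t in triples if subject in (t[0], t[2])]
        if around_node:
            subgraphs.append(around_node)
    if predicate is not None:
        # Subgraph formed by edges that match the predicate of the query.
        along_edge = [t for t in triples if t[1] == predicate]
        if along_edge:
            subgraphs.append(along_edge)
    return subgraphs


# Hypothetical knowledge graph stored as triples.
triples = [
    ("Seoul", "capital_of", "South Korea"),
    ("Seoul", "located_on", "Han River"),
    ("Tokyo", "capital_of", "Japan"),
]
candidates = extract_subgraphs(triples, subject="Seoul", predicate="capital_of")
```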

8. The apparatus of claim 7,
wherein the subgraph derivation unit extracts a triple set from each of the plurality of extracted subgraphs, derives a similarity between the extracted triple set and the analysis result of the query data based on a pre-trained deep learning network, and derives the final subgraph from among the plurality of extracted subgraphs based on the derived similarity,
wherein the triple set includes information on at least two nodes included in each subgraph and information on an edge connecting the at least two nodes, and
wherein the at least two nodes and the edge connecting the at least two nodes are mapped to schema information for the knowledge graph.
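
The selection of a final subgraph by similarity can be illustrated with the following minimal sketch. The pre-trained deep learning network recited above is not reproduced here; a Jaccard token overlap is swapped in as a stand-in similarity purely to show the ranking step, and all function names and sample data are hypothetical.

```python
def triple_tokens(triple_set):
    """Flatten a triple set into a set of lowercase tokens."""
    tokens = set()
    for subj, pred, obj in triple_set:
        for part in (subj, pred, obj):
            tokens.update(part.lower().replace("_", " ").split())
    return tokens


def jaccard(a, b):
    """Stand-in similarity; the claim instead uses a pre-trained deep learning network."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_final_subgraph(subgraphs, query_terms):
    """Pick the subgraph whose triple set is most similar to the analyzed query."""
    query = {term.lower() for term in query_terms}
    return max(subgraphs, key=lambda sg: jaccard(triple_tokens(sg), query))


# Hypothetical candidate subgraphs and query analysis result.
subgraphs = [
    [("Seoul", "capital_of", "South Korea")],
    [("Seoul", "located_on", "Han River")],
]
final_subgraph = select_final_subgraph(subgraphs, ["Seoul", "capital"])
```

Only the scoring function would change if a learned model were used; the surrounding argmax over candidate subgraphs stays the same.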

9. The apparatus of claim 8,
wherein the subgraph derivation unit maps the schema information for the knowledge graph to a row or a column of an input vector corresponding to each subgraph, maps the triple set extracted from each subgraph to the input vector, and inputs the input vector to which the triple set is mapped into the deep learning network.
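
One possible reading of the schema-to-input mapping recited above is sketched below, assuming that relation types index the rows and entity types index the columns of a binary matrix. This layout, the schema lists, and the entity type dictionary are hypothetical choices made for the example; the specification may intend a different encoding.

```python
# Hypothetical schema: relation types index the rows, entity types index the columns.
SCHEMA_RELATIONS = ["capital_of", "located_on"]
SCHEMA_ENTITY_TYPES = ["City", "Country", "River"]

# Hypothetical entity name dictionary mapping names to schema entity types.
ENTITY_TYPES = {"Seoul": "City", "South Korea": "Country", "Han River": "River"}


def encode_subgraph(triple_set):
    """Map a triple set onto a relation-by-entity-type binary matrix."""
    matrix = [[0] * len(SCHEMA_ENTITY_TYPES) for _ in SCHEMA_RELATIONS]
    for subj, pred, obj in triple_set:
        row = SCHEMA_RELATIONS.index(pred)
        for entity in (subj, obj):
            col = SCHEMA_ENTITY_TYPES.index(ENTITY_TYPES[entity])
            matrix[row][col] = 1  # mark that this relation touches this entity type
    return matrix


encoded = encode_subgraph([("Seoul", "capital_of", "South Korea")])
```

Because every subgraph is projected onto the same schema axes, the resulting inputs have a fixed shape regardless of how many triples a subgraph contains.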

10. The apparatus of claim 1,
wherein the response derivation unit selects the at least one document from among the plurality of documents based on at least one of a keyword, an entity name, a relation name, and schema information for a node or an edge included in the final subgraph.
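
As an example only, keyword-based document selection from a final subgraph might be sketched as follows. Scoring by case-insensitive substring hits, the `top_k` parameter, and the function name `select_documents` are assumptions of this sketch rather than features stated in the specification.

```python
def select_documents(documents, final_subgraph, top_k=1):
    """Rank documents by how many subgraph keywords they contain and keep the best ones."""
    keywords = set()
    for subj, pred, obj in final_subgraph:
        keywords.update(name.lower() for name in (subj, obj))  # entity names
        keywords.update(pred.lower().split("_"))               # relation name words

    def score(doc):
        lowered = doc.lower()
        # Crude substring matching; a real system would match on indexed tokens.
        return sum(1 for keyword in keywords if keyword in lowered)

    ranked = sorted(documents, key=score, reverse=True)
    return [doc for doc in ranked[:top_k] if score(doc) > 0]


# Hypothetical documents collected from an open domain.
documents = [
    "The Han River flows through Seoul.",
    "Seoul is the capital of South Korea.",
]
selected = select_documents(documents, [("Seoul", "capital_of", "South Korea")])
```

The selected document would then be passed to a reading comprehension step to derive the response data.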

11. A method for deriving a response based on a knowledge graph, the method comprising:
collecting, from an open domain, semantic information including subject information, object information, and predicate information between the subject information and the object information;
generating a knowledge graph based on the subject information, the predicate information, and the object information;
receiving query data;
analyzing the received query data;
searching the generated knowledge graph for at least one subgraph based on an analysis result of the query data;
deriving a final subgraph from among the at least one found subgraph;
selecting at least one document from among a plurality of documents collected from the open domain based on the derived final subgraph; and
deriving response data for the query data based on the selected document.

12. The method of claim 11,
wherein generating the knowledge graph comprises:
generating nodes corresponding to the subject information and the object information;
generating an edge representing the predicate information between the nodes; and
generating the knowledge graph based on the nodes and the edge.

13. The method of claim 12,
wherein generating the knowledge graph further comprises:
mapping an entity name or a relation name to a node or an edge included in the knowledge graph based on a pre-registered entity name dictionary and a pre-registered relation name dictionary.

14. The method of claim 13, further comprising:
performing an indexing process on sentences included in the plurality of collected documents based on an entity name recognition result obtained using the entity name dictionary and the relation name dictionary,
wherein analyzing the received query data comprises analyzing the query data based on the entity name recognition result.

15. The method of claim 14,
wherein searching for the at least one subgraph comprises:
searching the knowledge graph for at least one node or edge based on subject information or predicate information included in the analysis result of the query data; and
extracting a plurality of subgraphs based on the found node or edge.

16. The method of claim 15,
wherein deriving the final subgraph comprises:
extracting a triple set from each of the plurality of extracted subgraphs;
deriving a similarity between the extracted triple set and the analysis result of the query data based on a pre-trained deep learning network; and
deriving the final subgraph from among the plurality of extracted subgraphs based on the derived similarity,
wherein the triple set includes information on at least two nodes included in each subgraph and information on an edge connecting the at least two nodes.

17. The method of claim 11,
wherein selecting the at least one document comprises:
selecting the at least one document from among the plurality of documents based on at least one of a keyword, an entity name, a relation name, and schema information for a node or an edge included in the final subgraph.

18. A computer program stored in a computer-readable medium and comprising a sequence of instructions for deriving a response based on a knowledge graph,
wherein the computer program, when executed by a computing device, causes the computing device to:
collect, from an open domain, semantic information including subject information, object information, and predicate information between the subject information and the object information;
generate a knowledge graph based on the subject information, the predicate information, and the object information;
search the generated knowledge graph for at least one subgraph based on an analysis result of query data;
derive a final subgraph from among the at least one found subgraph;
select at least one document from among a plurality of documents collected from the open domain based on the derived final subgraph; and
derive response data for the query data based on the selected document.