KR102582779B1

KR102582779B1 - Knowledge completion method and apparatus through neuro-symbolic-based relation embedding

Info

Publication number: KR102582779B1
Application number: KR1020200162934A
Authority: KR
Inventors: 박영택; 노재승; 박현규; 신원철
Original assignee: 숭실대학교산학협력단
Priority date: 2020-11-27
Filing date: 2020-11-27
Publication date: 2023-09-25
Also published as: KR20220074430A; WO2022114368A1

Abstract

The present invention discloses a knowledge completion method and apparatus using a neuro-symbolic approach. According to the invention, there is provided a knowledge completion apparatus comprising a processor and a memory connected to the processor, the memory storing program instructions executable by the processor to: embed, in a multidimensional space, the relations and entities of triple data included in an incomplete knowledge graph and the relations included in a parameterized rule; when a target triple for link connection is input, update the embedding values of the relations included in the parameterized rule through a backward-chaining-based neuro-symbolic unification process; generate, through the update, one or more paths each containing a combination of relations that satisfies the target triple; generate, using the one or more paths, the inference rule that best matches the target triple semantically; and connect missing links through the generated inference rule.

Description

Knowledge completion method and apparatus through neuro-symbolic-based relation embedding

The present invention relates to a knowledge completion method and apparatus through neuro-symbolic-based relation embedding.

A knowledge graph is a network that represents relationships between data, and it is used in many ways in combination with artificial intelligence technology. However, knowledge graphs suffer from incompleteness because entities, or links between entities, are missing.

Solving this problem calls for research on automatic knowledge completion, and various studies have been carried out, including approaches based on embeddings or deep learning and approaches that complete knowledge through symbolic rule inference over an ontology.

These approaches perform automatic knowledge completion efficiently, but deep learning is data-driven: it requires a large amount of training data, and its results cannot be explained.

Most studies that use symbolic reasoning, in turn, define the relationships between pieces of knowledge through an ontology and complete the knowledge by rule-based semantic inference.

Because such methods rely on rules defined by experts, missing knowledge can be completed through a knowledge graph that reflects those rules well. However, providing relation expressions and rules for a large knowledge graph costs experts a great deal of time and money, and the relation expressions and rules must be revised whenever new knowledge is added or existing knowledge changes.

Republic of Korea Patent No. 10-2140585

To solve the problems of the prior art described above, the present invention proposes a knowledge completion method and apparatus through neuro-symbolic-based relation embedding that can perform knowledge completion efficiently and accurately.

To achieve the above object, according to an embodiment of the present invention, there is provided a knowledge completion apparatus using a neuro-symbolic approach, comprising a processor and a memory connected to the processor, the memory storing program instructions executable by the processor to: embed, in a multidimensional space, the relations and entities of triple data included in an incomplete knowledge graph and the relations included in a parameterized rule; when a target triple for link connection is input, update the embedding values of the relations included in the parameterized rule through a backward-chaining-based neuro-symbolic unification process; generate, through the update, one or more paths each containing a combination of relations that satisfies the target triple; generate, using the one or more paths, the inference rule that best matches the target triple semantically; and connect missing links through the generated inference rule.

The parameterized rule consists of a conclusion term including a first relation and a plurality of variables, and a premise term including a second relation and a plurality of variables. The program instructions may compare the similarity between the relation of the target triple and the first relation to update the embedding value of the first relation, and may bind the entities of the target triple to the respective variables to obtain a substitution set.

The program instructions may use the obtained substitution set to determine which relation of the triple data in the incomplete knowledge graph is to be compared for similarity with the second relation.

The premise term may include first and second premise terms; the conclusion term may include the first relation, a first variable, and a second variable; the first premise term may include a 2-1 relation, the first variable, and a third variable; and the second premise term may include a 2-2 relation, the third variable, and the second variable.

The substitution set is obtained by binding the first variable to the subject entity of the target triple and the second variable to the object entity of the target triple. The program instructions may search the incomplete knowledge graph for triple data whose subject entity is the entity bound to the first variable, compare the similarity between the relation of the found triple data and the 2-1 relation, and update the substitution set by binding the third variable to the object entity of the found triple data.

The program instructions may calculate the average of the embedding values of the relation combination included in each of the one or more paths, and apply the average to a cost function to determine the single path used to generate the inference rule.

According to another aspect of the present invention, there is provided a knowledge completion method using a neuro-symbolic approach, performed by an apparatus including a processor and a memory, the method comprising: embedding, in a multidimensional space, the relations and entities of triple data included in an incomplete knowledge graph and the relations included in a parameterized rule; when a target triple for link connection is input, updating the embedding values of the relations included in the parameterized rule through a backward-chaining-based neuro-symbolic unification process; generating, through the update, one or more paths each containing a combination of relations that satisfies the target triple; generating, using the one or more paths, the inference rule that best matches the target triple semantically; and connecting missing links through the generated inference rule.

According to yet another aspect of the present invention, a computer-readable program for performing the method is provided.

According to yet another aspect of the present invention, there is provided a knowledge completion system using a neuro-symbolic approach, comprising: an incomplete knowledge graph; a neuro-symbolic unification module that embeds, in a multidimensional space, the relations and entities of the triple data included in the incomplete knowledge graph and the relations included in a parameterized rule, updates, when a target triple for link connection is input, the embedding values of the relations included in the parameterized rule through a backward-chaining-based neuro-symbolic unification process, and generates, through the update, one or more paths each containing a combination of relations that satisfies the target triple; and a knowledge completion module that connects missing links using an inference rule that is generated from the one or more paths and best matches the target triple semantically.

According to the present invention, relations are embedded selectively and knowledge completion can be performed efficiently and accurately through parameter passing.

FIG. 1 is a diagram showing the system configuration of a knowledge completion apparatus according to a preferred embodiment of the present invention.
FIG. 2 is a diagram showing a simple example of the unification process according to this embodiment.
FIG. 3 is a diagram showing the neuro-symbolic unification process using a parameterized rule according to this embodiment.
FIGS. 4 and 5 are diagrams showing positive data and negative data according to this embodiment.
FIG. 6 is a diagram for explaining the knowledge completion process according to this embodiment.
FIG. 7 is a diagram showing the configuration of a knowledge completion apparatus according to this embodiment.

Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail.

However, this is not intended to limit the present invention to specific embodiments; the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope.

The present invention uses a neuro-symbolic approach to extract, in explicit form, the rules that are implicit in the data of a knowledge graph, and thereby performs knowledge completion automatically.

FIG. 1 is a diagram showing the system configuration of a knowledge completion apparatus according to a preferred embodiment of the present invention.

As shown in FIG. 1, the knowledge completion apparatus according to this embodiment may include a neuro-symbolic unification module 100 and a knowledge completion module 110.

The neuro-symbolic unification module 100 receives the triple data of an incomplete knowledge graph (Incomplete KG, 102) and a parameterized rule, and embeds the relations and entities of the triple data, together with the relations included in the parameterized rule, in a multidimensional space.

In addition, when a target triple for link connection is input, the module updates the relations of the parameterized rule and the embedding values of the relations included in the parameterized rule through a backward-chaining-based neuro-symbolic unification process, generates through the update one or more paths each containing a combination of relations that satisfies the target triple, and uses the one or more paths to generate the inference rule that best matches the target triple semantically.

Here, triple data consists of entities, namely a subject and an object, and a relation corresponding to the predicate. The neuro-symbolic unification module 100 according to this embodiment compares the similarity between the relation of the target triple and the relations of the parameterized rule.

The neuro-symbolic unification module 100 according to this embodiment sets an arbitrary triple as the target triple for the parameterized rule and performs the neuro-symbolic unification process starting from that target triple.

The final output is the relation information of the rules from which the target triple can be derived.

The collected relation information first goes through an embedding step for the relations. The relations of the parameterized rule are then learned through relation-to-relation similarity computation, which yields a rule that satisfies the target triple.
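The relation-to-relation comparison can be illustrated with a small sketch. The text does not state which similarity function is used, so the Gaussian (RBF) kernel below is an assumption chosen only because it maps identical embeddings to 1 and distant ones toward 0; the embedding values are made up for illustration.

```python
import math

def similarity(u, v):
    """Assumed RBF-style similarity in (0, 1]; identical vectors score 1.0."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / 2.0)

# Hypothetical 2-dimensional embeddings for two KG relations and the rule parameter #1.
emb = {
    "grandFatherOf": [0.9, 0.1],
    "fatherOf": [0.1, 0.8],
    "#1": [0.8, 0.2],
}

# Compare the parameterized relation #1 against every real relation.
scores = {
    name: similarity(emb["#1"], vec)
    for name, vec in emb.items()
    if not name.startswith("#")
}
best = max(scores, key=scores.get)
```

With these illustrative vectors, `#1` lies closest to `grandFatherOf`, so that relation receives the highest score.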

The knowledge completion module 110 then completes the knowledge automatically by running an inference engine over the derived rules.

According to this embodiment, Prolog's backward chaining algorithm is used to prove that a given target triple (query) is true.

The backward chaining algorithm proceeds in two main steps, and the neuro-symbolic unification process described above is carried out together with them.

The neuro-symbolic unification module 100 performs an OR step, which uses all the rules and data (facts) of the knowledge graph to obtain a substitution set containing the entities of the target triple, where the target triple corresponds to the conclusion of a rule.

Here, the rule nationality(X,Y) :- BornIn(X,Y) consists of the conclusion term nationality(X,Y) and the premise term BornIn(X,Y).

If unification succeeds in the OR step, the neuro-symbolic unification module 100 performs an AND step, which updates the substitution set so that it satisfies the premise terms of the rule.

If the rule has several premise terms, the AND step is performed on the first premise term and the OR step is then called again, recursively.
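The OR/AND recursion can be sketched as a small, purely symbolic prover. This is a minimal illustration under stated assumptions, not the patented procedure: atoms are `(relation, arg1, arg2)` tuples, terms starting with an uppercase letter are treated as variables (the Prolog convention), and the names `or_step`, `and_step`, `FACTS`, `RULES`, and the depth limit are all hypothetical.

```python
import itertools

FACTS = [("BornIn", "kim", "korea"), ("BornIn", "lee", "japan")]
RULES = [(("nationality", "X", "Y"), [("BornIn", "X", "Y")])]
_fresh = itertools.count()

def is_var(term):
    return term[0].isupper()

def walk(term, subst):
    """Follow variable bindings until a constant or an unbound variable is reached."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(a, b, subst):
    """Return an extended substitution that makes atoms a and b equal, or None."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    subst = dict(subst)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None
    return subst

def rename(rule):
    """Give the rule fresh variable names so separate uses do not clash."""
    n = next(_fresh)
    fix = lambda atom: (atom[0],) + tuple(f"{t}_{n}" if is_var(t) else t for t in atom[1:])
    head, body = rule
    return fix(head), [fix(atom) for atom in body]

def or_step(goal, subst, depth):
    """OR: try every fact, then every rule whose conclusion unifies with the goal."""
    for fact in FACTS:
        s = unify(goal, fact, subst)
        if s is not None:
            yield s
    if depth > 0:
        for rule in RULES:
            head, body = rename(rule)
            s = unify(goal, head, subst)
            if s is not None:
                yield from and_step(body, s, depth - 1)

def and_step(body, subst, depth):
    """AND: prove the first premise via the OR step, then recurse on the rest."""
    if not body:
        yield subst
        return
    for s in or_step(body[0], subst, depth):
        yield from and_step(body[1:], s, depth)

proofs = list(or_step(("nationality", "kim", "korea"), {}, depth=2))
```

A non-empty `proofs` list means the target triple was proved; each element is the substitution set accumulated along one proof.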

FIG. 2 is a diagram showing a simple example of the unification process according to this embodiment.

For example, when the target triple is nationality(kim, korea) and the given rule is nationality(X,Y) :- BornIn(X,Y), the variable X in the conclusion term nationality(X,Y) is bound to kim and the variable Y to korea, which yields the substitution set {X/kim, Y/korea}.

When a variable that was bound in the conclusion term also appears in a premise term, the substitution set already obtained is applied as is. This parameter passing binds the entity to the variable and so converts the variable in the premise term into a constant (the embedding value of the bound entity).

In this way, the premise term BornIn(X,Y) is converted into triple data such as BornIn(kim, korea), and the rule for the target triple can be inferred by searching the knowledge graph for that triple.
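The binding and parameter-passing steps of this example can be written out directly. The helper names `bind` and `apply_subst` are hypothetical, and the one-triple knowledge graph is illustrative.

```python
KG = {("BornIn", "kim", "korea")}

rule_head = ("nationality", "X", "Y")   # conclusion term
rule_body = ("BornIn", "X", "Y")        # premise term
target = ("nationality", "kim", "korea")

def bind(head, goal):
    """Bind each variable of the conclusion term to the entity in the same position."""
    return dict(zip(head[1:], goal[1:]))

def apply_subst(atom, subst):
    """Parameter passing: replace bound variables in a term with their entities."""
    return (atom[0],) + tuple(subst.get(term, term) for term in atom[1:])

subst = bind(rule_head, target)          # {X/kim, Y/korea}
grounded = apply_subst(rule_body, subst)  # BornIn(kim, korea)
premise_holds = grounded in KG
```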

Through the unification process, a path can be derived that contains the combination of relations making up a rule that satisfies the target triple. However, unification in this form is possible only when the relations in the rule template match relations that actually exist in the knowledge graph, so it is unsuitable for learning to generate inference rules.

The rule template is therefore reconstructed by parameterizing the arbitrary relations p, q, and r as #1, #2, and #3.

FIG. 3 is a diagram showing the neuro-symbolic unification process using a parameterized rule according to this embodiment.

Referring to FIG. 3, in order to learn the relations of the finally derived rule grandFatherOf(X,Y) :- fatherOf(X,Z), parentOf(Z,Y) against arbitrary relation information, the rule template is changed to the parameterized form #1(X,Y) :- #2(X,Z), #3(Z,Y).

The parameterized rule may include the first relation of the conclusion term, such as #1, the relations of the premise terms, such as #2 and #3, and a plurality of variables X, Y, and Z.

When a target triple is input, the embedding values of the first to third relations of the parameterized rule are updated through the backward-chaining-based neuro-symbolic unification process.

Here, the embedding values may be updated through (i) a similarity comparison between the relation of the target triple and the first relation of the parameterized rule, (ii) parameter passing, in which the variables of the parameterized rule are compared with the entities of the triple data in the incomplete knowledge graph, and (iii) a similarity comparison between the relations of the knowledge-graph triples selected by that parameter passing and the second and third relations.

More specifically, in the embedding update, the similarity between the relation of the target triple and the first relation is compared to update the embedding value of the first relation, and the entities of the target triple are bound to the respective variables to obtain a substitution set.

Next, the obtained substitution set is used to determine which relation of the triple data in the incomplete knowledge graph is to be compared for similarity with the second relation.

As in FIG. 3, suppose the conclusion term includes the first relation (#1), the first variable (X), and the second variable (Y); the first premise term includes the 2-1 relation (#2), the first variable (X), and the third variable (Z); and the second premise term includes the 2-2 relation (#3), the third variable (Z), and the second variable (Y). The substitution set is then obtained by binding the first variable to the subject entity of the target triple and the second variable to the object entity of the target triple. Triple data whose subject entity is the entity bound to the first variable is searched for in the incomplete knowledge graph, the similarity between the relation of the found triple data and the 2-1 relation is compared, and the substitution set is updated by binding the third variable to the object entity of the found triple data.

Updating the embedding values yields path information that includes the order of the first to third relations. Using the derived relation path information, when the target triple is true, training drives the similarity between the first to third relations and the true relations toward 1, and the similarity with false relations toward 0.
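One way to picture "similarity toward 1 for true relations, toward 0 for false ones" is a single gradient step on a parameter embedding. The sketch below is an assumption throughout: it uses an RBF similarity and a negative log-likelihood objective with a hand-derived gradient, and the function name `update`, the learning rate, and the vectors are hypothetical. The text does not specify the optimizer or the similarity function.

```python
import math

def similarity(u, v):
    """Assumed RBF similarity: exp(-||u - v||^2 / 2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / 2.0)

def update(param, rel, label, lr=0.1, eps=1e-9):
    """One gradient-descent step on the parameter embedding.

    label 1 (true relation):  loss = -log s,       gradient = (u - v)
    label 0 (false relation): loss = -log(1 - s),  gradient = -s / (1 - s) * (u - v)
    """
    s = similarity(param, rel)
    coeff = 1.0 if label == 1 else -s / max(1.0 - s, eps)
    return [u - lr * coeff * (u - v) for u, v in zip(param, rel)]

p1 = [0.0, 0.0]                       # embedding of parameter #1 (illustrative)
true_rel = [1.0, 1.0]                 # e.g. grandFatherOf
false_rel = [0.2, 0.1]                # e.g. childOf

pulled = update(p1, true_rel, label=1)    # moves #1 toward the true relation
pushed = update(p1, false_rel, label=0)   # moves #1 away from the false relation
```

Repeating such steps over positive and negative paths is what would make each parameter settle near the relation it should stand for.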

The difference between the present invention and conventional unification is that the relations of the parameterized rule have been changed so that they can stand for arbitrary relations. It is therefore possible to embed grandFatherOf of the target triple and the relation parameter #1 of the first rule term and compute their similarity.

The variables of the parameterized rule are bound to the subject or object entity of the target triple.

Accordingly, for the conclusion term of the rule and the target triple, a similarity computation is performed between #1 and grandFatherOf, and for the entities, X is bound to ABE and Y to BART.

Next, for the premise terms, the previously obtained substitution set ({X/ABE, Y/BART}) is used, and parameter passing is applied to the shared variable X. Because X was bound to ABE in the conclusion term, unification is performed on #2(ABE, Z).

Performing unification over the triple data in the knowledge graph whose subject entity is ABE yields fatherOf(ABE, HOMER), the first triple of the knowledge graph. The relation similarity can then be computed using the triple obtained in this way.

Next, the similarity between #2 and the fatherOf relation is computed, and HOMER is stored in the substitution set as the value of the variable Z. This procedure is repeated recursively until the end of the rule, finally yielding the relation information for the conclusion term and the premise terms of the rule.
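The walk-through above can be traced with a short sketch that enumerates relation paths for the template #1(X,Y) :- #2(X,Z), #3(Z,Y). It only performs the symbolic part (binding and parameter passing) and leaves out the similarity scoring. The function name `derive_paths` is hypothetical, and apart from fatherOf(ABE, HOMER) the knowledge-graph triples are assumed for illustration.

```python
# Triples are (relation, subject, object).
KG = [
    ("fatherOf", "ABE", "HOMER"),
    ("parentOf", "HOMER", "BART"),
    ("grandFatherOf", "ABE", "BART"),
    ("motherOf", "MARGE", "BART"),
]

def derive_paths(target, kg):
    """Enumerate relation paths for #1(X,Y) :- #2(X,Z), #3(Z,Y) given a target triple."""
    rel, subj, obj = target
    subst = {"X": subj, "Y": obj}          # bind X and Y from the target triple
    results = []
    for r2, s2, z in kg:
        if s2 != subst["X"]:               # parameter passing: #2(ABE, Z)
            continue
        for r3, s3, o3 in kg:
            if s3 == z and o3 == subst["Y"]:   # #3(Z, BART) with Z bound
                path = (("#1", rel), ("#2", r2), ("#3", r3))
                results.append((path, {**subst, "Z": z}))
    return results

results = derive_paths(("grandFatherOf", "ABE", "BART"), KG)
```

For this toy graph the only path pairs #1 with grandFatherOf, #2 with fatherOf, and #3 with parentOf, with Z bound to HOMER.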

The relation information produced by neuro-symbolic unification becomes the terms that make up the rule, so this information must not change from the first term of the rule to the last. In addition, the cost of computing relation similarity for every term grows with the size of the knowledge graph.

To address this, each combination of relations derived during neuro-symbolic unification is treated as a group that preserves the order of the rule terms.

Accordingly, {(#1, grandFatherOf), (#2, fatherOf), (#3, parentOf)} is set as a single path from which a rule can be inferred, and training drives the similarity of #1 with grandFatherOf, of #2 with fatherOf, and of #3 with parentOf toward 1.

To learn rule inference for knowledge completion, a cost function that minimizes the loss must be defined. Because neuro-symbolic unification is performed, unlike in conventional approaches, the training data must also be defined.

Because embedding learning must be performed on relation paths, the training data is produced by selecting an arbitrary triple from the knowledge graph as the target triple and running neuro-symbolic unification on it.

The relations that result are used as positive data, and they are used group by group, where each group is a path containing an ordered combination of relations.

Assume that, for the relations #1, #2, and #3 of a parameterized rule over arbitrary relations p, q, and r, k rules can be derived for the target triple. The relations of the i-th path then constitute the relation information of one rule that satisfies the target triple. (The expression for the i-th path is missing from the source text.)

FIGS. 4 and 5 are diagrams showing positive data and negative data according to this embodiment.

Referring to FIG. 4, the i-th path derived from the target triple is {(#1, grandFatherOf), (#2, fatherOf), (#3, parentOf)}. Because every result obtained through neuro-symbolic unification is true, such paths are generated as positive data.

Conversely, as shown in FIG. 5, negative data is generated by adding relation combinations that do not exist in any of the path groups derived through neuro-symbolic unification.

For example, negative data is a relation combination that cannot be derived by unification, such as {(#1, grandFatherOf), (#2, grandFatherOf), (#3, childOf)}.
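Negative paths of this kind can be produced by corrupting the premise relations of a positive path. The sketch below is one possible sampling scheme, assumed for illustration: it keeps the conclusion pair fixed, enumerates every combination of premise relations, and discards combinations that appear among the positives. The function name `negative_paths` and the relation vocabulary are hypothetical.

```python
import itertools

RELATIONS = ["grandFatherOf", "fatherOf", "parentOf", "childOf"]

positive_paths = [
    (("#1", "grandFatherOf"), ("#2", "fatherOf"), ("#3", "parentOf")),
]

def negative_paths(positives, relations):
    """Build paths whose premise relations do not match any derived (positive) path."""
    known = set(positives)
    negatives = []
    for path in positives:
        head = path[0]                     # keep the conclusion pair unchanged
        for r2, r3 in itertools.product(relations, repeat=2):
            candidate = (head, ("#2", r2), ("#3", r3))
            if candidate not in known:
                negatives.append(candidate)
    return negatives

negatives = negative_paths(positive_paths, RELATIONS)
```

In practice one would likely sample a subset of these candidates rather than use all of them.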

According to this embodiment, a cost function that minimizes the loss is defined in order to learn rule inference for knowledge completion. The training data is built from the set L of relations derived through neuro-symbolic unification over the knowledge graph K. Positive data is formed from every relation in L and is trained so that the relation similarity of each term becomes 1; negative data is likewise formed for every relation in L and is trained so that the relation similarity of each term becomes 0. Negative log-likelihood is used for this purpose. (The formulas defining L and the positive and negative data are missing from the source text.)
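As a rough illustration of how negative log-likelihood pushes similarities toward 1 for positive data and toward 0 for negative data, the standard binary form can be sketched as follows. The exact cost function of the embodiment is not recoverable from the text, so this formulation, the function names, and the sample scores are assumptions.

```python
import math

def nll(score, label, eps=1e-12):
    """Binary negative log-likelihood for one similarity score in [0, 1]."""
    score = min(max(score, eps), 1.0 - eps)   # avoid log(0)
    return -(label * math.log(score) + (1 - label) * math.log(1.0 - score))

def total_loss(scored_examples):
    """Sum the loss over (similarity, label) pairs; label 1 = positive, 0 = negative."""
    return sum(nll(score, label) for score, label in scored_examples)

good_model = [(0.9, 1), (0.1, 0)]   # positives near 1, negatives near 0
poor_model = [(0.5, 1), (0.5, 0)]   # undecided similarities
```

Here `total_loss(good_model)` is lower than `total_loss(poor_model)`, which is the behavior the training objective relies on.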

Because the neuro-symbolic unification process yields a set of multiple paths, each composed of several relations, a few additional steps are needed to obtain a similarity value of 0 or 1. Depending on the training data, the following calculations are additionally performed.

First, after the neuro-symbolic unification process is performed, the average value of the relations in each derived path is calculated. Because one path contains the multiple relations used in a rule, their average value represents a feature of that rule.
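The per-path averaging can be sketched as below. This assumes that "average value of the relations" means the element-wise mean of the relation embedding vectors in a path; the function name `path_mean` and the two-dimensional toy embeddings are hypothetical.

```python
def path_mean(path, embeddings):
    """Element-wise mean of the embedding vectors of the relations in one path."""
    vectors = [embeddings[relation] for relation in path]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy embeddings chosen only for illustration.
embeddings = {
    "grandFatherOf": [1.0, 0.0],
    "fatherOf": [0.0, 1.0],
    "parentOf": [0.5, 0.5],
}
rule_feature = path_mean(("grandFatherOf", "fatherOf", "parentOf"), embeddings)
```

The resulting vector has the same dimensionality as a single relation embedding and can serve as one feature per path.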

The multiple paths obtained through the neuro-symbolic unification process are all true. To satisfy all of them with the given rule template, the relations of the parameterized rule are augmented before processing. Training takes the highest value within the path of a single rule and then selects the minimum of those values across the multiple paths.
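One possible reading of this max-then-min selection is sketched below. The source does not say exactly which values are compared, so the sketch assumes each path carries a list of similarity scores (one per augmented rule relation); `aggregate_paths` is a hypothetical name.

```python
def aggregate_paths(path_scores):
    """Take the highest score within each path, then the minimum across paths.

    path_scores: list of lists; path_scores[p] holds the similarity scores
    associated with path p (an assumed data layout).
    """
    best_per_path = [max(scores) for scores in path_scores]
    return min(best_per_path)


scores = [
    [0.2, 0.9],  # path 0: best score 0.9
    [0.7, 0.4],  # path 1: best score 0.7
]
training_signal = aggregate_paths(scores)
```

Under this reading, the selected value is the weakest of the per-path best matches, so improving it pushes the rule toward satisfying every true path rather than only the easiest one.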

The relation embedding vectors can be trained by using the paths derived through the neuro-symbolic unification process as training data and using the cost function defined above as the function of the neuro-symbolic integration module 100.

Figure 6 is a diagram for explaining the knowledge completion process according to this embodiment.

Referring to Figure 6, given the incomplete knowledge graph shown there, the neuro-symbolic integration module 100 learns the embedding vectors and extracts the embedding vectors that fit the given rule template. For a rule template of the form #1(X,Y) :- #2(X,Z), #3(Z,Y), this makes it possible to extract a rule such as grandFatherOf(X,Y) :- fatherOf(X,Z), parentOf(Z,Y).
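The extraction step can be pictured as mapping each learned template slot to its closest known relation. The sketch below assumes cosine similarity as the closeness measure and uses toy three-dimensional vectors; both are assumptions, as are the names `cosine` and `extract_rule`, since this passage does not specify the similarity function.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def extract_rule(slot_embeddings, relation_embeddings):
    """Assign to each template slot the relation whose embedding is most similar."""
    rule = {}
    for slot, vec in slot_embeddings.items():
        rule[slot] = max(
            relation_embeddings,
            key=lambda rel: cosine(vec, relation_embeddings[rel]),
        )
    return rule


# Toy values standing in for learned embeddings.
relation_embeddings = {
    "grandFatherOf": [1.0, 0.0, 0.0],
    "fatherOf": [0.0, 1.0, 0.0],
    "parentOf": [0.0, 0.0, 1.0],
}
slot_embeddings = {
    "#1": [0.9, 0.1, 0.0],
    "#2": [0.1, 0.8, 0.1],
    "#3": [0.0, 0.2, 0.9],
}
rule = extract_rule(slot_embeddings, relation_embeddings)
```

Reading the result back into the template #1(X,Y) :- #2(X,Z), #3(Z,Y) gives grandFatherOf(X,Y) :- fatherOf(X,Z), parentOf(Z,Y) for these toy vectors.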

When rule inference is performed on the incomplete knowledge graph using the extracted rule, an inference such as grandFatherOf(jim, edward) :- fatherOf(jim, roth), parentOf(roth, edward) becomes possible. This connects the link grandFatherOf(jim, edward) that was missing from the incomplete knowledge graph.
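Applying an extracted rule of this two-premise form to a set of triples can be sketched as follows. Triples are represented as (subject, relation, object) tuples, which is an assumed encoding, and `apply_rule` is a hypothetical helper name.

```python
def apply_rule(triples, head, body1, body2):
    """Infer head(X, Y) wherever body1(X, Z) and body2(Z, Y) both hold.

    Returns only triples that are not already present in the graph.
    """
    inferred = set()
    for (x, r1, z) in triples:
        if r1 != body1:
            continue
        for (z2, r2, y) in triples:
            if r2 == body2 and z2 == z:
                candidate = (x, head, y)
                if candidate not in triples:
                    inferred.add(candidate)
    return inferred


incomplete_kg = {
    ("jim", "fatherOf", "roth"),
    ("roth", "parentOf", "edward"),
}
missing_links = apply_rule(incomplete_kg, "grandFatherOf", "fatherOf", "parentOf")
completed_kg = incomplete_kg | missing_links
```

For the example graph, the only inferred triple is `("jim", "grandFatherOf", "edward")`, matching the link described above.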

By learning rule templates of various forms through the neuro-symbolic integration module 100, extracting rules from them, and performing inference with the extracted rules, knowledge completion can be performed effectively on an incomplete knowledge graph.

Figure 7 is a diagram showing the configuration of a knowledge completion device according to this embodiment.

As shown in Figure 7, the knowledge completion device according to this embodiment may include a processor 700 and a memory 702.

The processor 700 may include a central processing unit (CPU) capable of executing a computer program, or a virtual machine or the like.

The memory 702 may include a non-volatile storage device, such as a fixed hard drive or a removable storage device. The removable storage device may include a compact flash unit, a USB memory stick, and the like. The memory 702 may also include volatile memory, such as various types of random access memory.

The memory 702 stores program instructions executable by the processor 700 for the processes performed by the neuro-symbolic integration module 100 and the knowledge completion module 102 described above.

The program instructions according to this embodiment embed, in a multidimensional space, the relations and entities of the triple data included in the incomplete knowledge graph and the relations included in the parameterized rule. When a target triple for link connection is input, the instructions update the embedding values of the relations included in the parameterized rule through a neuro-symbolic unification process based on backward chaining, and through this update generate one or more paths, each containing a combination of relations that satisfies the target triple. Using the one or more paths, the instructions generate the inference rule that most closely matches the target triple semantically, and connect missing links through the generated inference rule.
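The backward-chaining step for a template of the form #1(X,Y) :- #2(X,Z), #3(Z,Y) can be sketched symbolically as follows. The sketch binds X and Y to the subject and object of the target triple, searches the graph for triples whose subject is X to bind Z, and records the resulting relation combination as a path. It deliberately leaves out the embedding similarity comparison and update, so it is a simplified illustration rather than the full neuro-symbolic procedure; `unify_target` and the sample graph are hypothetical.

```python
def unify_target(target, kg):
    """Return (substitution set, path) pairs that satisfy the target triple.

    target: (subject, relation, object) triple to be proven.
    kg: list of (subject, relation, object) triples.
    """
    subject, head_relation, obj = target
    base = {"X": subject, "Y": obj}  # bind the conclusion-term variables
    results = []
    for (s2, r2, z) in kg:
        if s2 != base["X"]:
            continue  # first premise must start at X
        for (s3, r3, o3) in kg:
            # second premise must link Z to Y
            if s3 == z and o3 == base["Y"]:
                substitution = {**base, "Z": z}
                results.append((substitution, (head_relation, r2, r3)))
    return results


kg = [
    ("jim", "fatherOf", "roth"),
    ("roth", "parentOf", "edward"),
    ("jim", "friendOf", "tom"),
]
proofs = unify_target(("jim", "grandFatherOf", "edward"), kg)
```

Here the `friendOf` triple binds Z to `tom`, but no triple links `tom` to `edward`, so only the path through `roth` survives.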

The embodiments of the present invention described above are disclosed for illustrative purposes. Those of ordinary skill in the art will be able to make various modifications, changes, and additions within the spirit and scope of the present invention, and such modifications, changes, and additions should be regarded as falling within the scope of the claims below.

Claims

A knowledge completion device using neuro-symbolic techniques, comprising:
a processor; and
a memory connected to the processor,
wherein the memory stores program instructions executable by the processor to:
embed, in a multidimensional space, the relations and entities of triple data included in an incomplete knowledge graph and the relations included in a parameterized rule;
when a target triple for link connection is input, update the embedding value of the relation included in the parameterized rule through a neuro-symbolic unification process based on backward chaining;
generate, through the update, one or more paths each including a combination of relations that satisfies the target triple;
generate, using the one or more paths, an inference rule that most closely matches the target triple semantically; and
connect missing links through the generated inference rule,
wherein the parameterized rule consists of a conclusion term including a first relation and a plurality of variables, and a premise term including a second relation and a plurality of variables,
wherein the program instructions:
compare the similarity between the relation of the target triple and the first relation to update the embedding value of the first relation;
obtain a substitution set by binding a plurality of entities of the target triple to the plurality of variables, respectively; and
determine, using the obtained substitution set, a relation of triple data included in the incomplete knowledge graph that is to be compared for similarity with the second relation,
wherein the premise term includes a first premise term and a second premise term,
the conclusion term includes the first relation, a first variable, and a second variable,
the first premise term includes a 2-1 relation, the first variable, and a third variable,
the second premise term includes a 2-2 relation, the third variable, and the second variable,
the substitution set is obtained by binding the first variable to the subject entity of the target triple and binding the second variable to the object entity of the target triple, and
wherein the program instructions:
search the incomplete knowledge graph for triple data having the same subject entity as the first variable, and compare the similarity between the relation of the retrieved triple data and the 2-1 relation; and
update the substitution set by binding the third variable to the object entity of the retrieved triple data.

(Deleted)

The knowledge completion device of claim 1,
wherein the program instructions:
calculate the average value of the embedding values of the relation combination included in each of the one or more paths; and
determine one path for generating the inference rule by applying the average value to a cost function.

A knowledge completion method using neuro-symbolic techniques, performed by a knowledge completion device that includes a processor and a memory, the method comprising:
embedding, in a multidimensional space, the relations and entities of triple data included in an incomplete knowledge graph and the relations included in a parameterized rule;
when a target triple for link connection is input, updating the embedding value of the relation included in the parameterized rule through a neuro-symbolic unification process based on backward chaining;
generating, through the update, one or more paths each including a combination of relations that satisfies the target triple;
generating, using the one or more paths, an inference rule that most closely matches the target triple semantically; and
connecting missing links through the generated inference rule,
wherein the parameterized rule consists of a conclusion term including a first relation and a plurality of variables, and a premise term including a second relation and a plurality of variables,
wherein the updating step comprises:
updating the embedding value of the first relation by comparing the similarity between the relation of the target triple and the first relation;
obtaining a substitution set by binding a plurality of entities of the target triple to the plurality of variables, respectively; and
determining, using the obtained substitution set, a relation of triple data included in the incomplete knowledge graph that is to be compared for similarity with the second relation,
wherein the premise term includes a first premise term and a second premise term,
the conclusion term includes the first relation, a first variable, and a second variable,
the first premise term includes a 2-1 relation, the first variable, and a third variable,
the second premise term includes a 2-2 relation, the third variable, and the second variable,
the substitution set is obtained by binding the first variable to the subject entity of the target triple and binding the second variable to the object entity of the target triple, and
wherein the updating step further comprises:
searching the incomplete knowledge graph for triple data having the same subject entity as the first variable, and comparing the similarity between the relation of the retrieved triple data and the 2-1 relation; and
updating the substitution set by binding the third variable to the object entity of the retrieved triple data.

(Deleted)

A computer program stored in a computer-readable storage medium that performs the method according to claim 7.

A knowledge completion system using neuro-symbolic techniques, comprising:
an incomplete knowledge graph;
a neuro-symbolic integration module that embeds, in a multidimensional space, the relations and entities of triple data included in the incomplete knowledge graph and the relations included in a parameterized rule, that, when a target triple for link connection is input, updates the embedding value of the relation included in the parameterized rule through a neuro-symbolic unification process based on backward chaining, and that generates, through the update, one or more paths each including a combination of relations that satisfies the target triple; and
a knowledge completion module that connects missing links using an inference rule that is generated using the one or more paths and that most closely matches the target triple semantically,
wherein the parameterized rule consists of a conclusion term including a first relation and a plurality of variables, and a premise term including a second relation and a plurality of variables,
wherein the neuro-symbolic integration module:
compares the similarity between the relation of the target triple and the first relation to update the embedding value of the first relation;
obtains a substitution set by binding a plurality of entities of the target triple to the plurality of variables, respectively; and
determines, using the obtained substitution set, a relation of triple data included in the incomplete knowledge graph that is to be compared for similarity with the second relation,
wherein the premise term includes a first premise term and a second premise term,
the conclusion term includes the first relation, a first variable, and a second variable,
the first premise term includes a 2-1 relation, the first variable, and a third variable,
the second premise term includes a 2-2 relation, the third variable, and the second variable,
the substitution set is obtained by binding the first variable to the subject entity of the target triple and binding the second variable to the object entity of the target triple, and
wherein the neuro-symbolic integration module:
searches the incomplete knowledge graph for triple data having the same subject entity as the first variable, and compares the similarity between the relation of the retrieved triple data and the 2-1 relation; and
updates the substitution set by binding the third variable to the object entity of the retrieved triple data.