KR20180092194A

KR20180092194A - Method and system for embedding knowledge graph reflecting logical property of relations, recording medium for performing the method

Info

Publication number: KR20180092194A
Application number: KR1020170017653A
Authority: KR
Inventors: 박성배; 박세영; 이상조; 윤희근
Original assignee: 경북대학교 산학협력단
Priority date: 2017-02-08
Filing date: 2017-02-08
Publication date: 2018-08-17
Also published as: KR101914853B1

Abstract

A method and system for embedding a knowledge graph reflecting logical properties, and a recording medium for performing the same, are disclosed. The method for embedding a knowledge graph reflecting logical properties according to the present invention includes the steps of: collecting knowledge triples and assigning at least one of a head role and a tail role to each entity constituting the knowledge triples; mapping a head entity, which is an entity to which the head role is assigned, with a first mapping matrix, and mapping a tail entity, which is an entity to which the tail role is assigned, with a second mapping matrix; calculating a score function using the first mapping matrix and the second mapping matrix; and learning the knowledge triples using the calculated score function. Accordingly, the present invention can provide a translation-based knowledge graph in which the meaning between entities is effectively reflected.

Description

The present invention relates to a knowledge graph embedding method and system reflecting logical properties, and a recording medium for performing the same. More particularly, it relates to a knowledge graph embedding method and system, and a recording medium for performing the same, capable of embedding a translation-based knowledge graph in which the logical properties between entities are preserved.

Representing knowledge as a graph is one of the most effective ways for a computing device to make use of human knowledge. However, the sparsity of knowledge graphs limits their use in real applications. Techniques for completing knowledge graphs are therefore being studied as a way to overcome this sparsity.

Among these, the most promising approach to knowledge graph completion is to embed the graph in a low-dimensional continuous vector space. This approach learns vector representations of the knowledge graph and measures the plausibility or relatedness of a specific piece of knowledge (entities) in the graph through algebraic operations in the vector space. That is, entities and relations are represented as vectors, and a relation is treated as an operator that translates an entity to another location in the embedding space. New knowledge can therefore be discovered from the vector space by finding knowledge instances with high plausibility.

Recently, techniques that complete a knowledge graph using translation-based models, among the various existing knowledge embedding models, have been studied. Representative examples of conventional translation-based knowledge graph embedding methods are TransE (Bordes et al., 2013), TransH (Wang et al., 2014), TransR (Lin et al., 2015), and TransD (Ji et al., 2015).

Given a knowledge triple (h, r, t) consisting of a relation r and two entities h and t, TransE finds vector representations of h, t, and r by forcing the vector of t to equal the sum of the vectors of h and r. Whereas TransE embeds all relations in a single vector space, TransH and TransR regard each relation as having its own embedding space. Ji et al. (2015) observed that a single relation or a single entity usually has several types, and accordingly proposed TransD, which allows mapping matrices that depend on both the entity and the relation.

However, the conventional techniques related to translation-based models ignore the logical properties of relations. As a result, transitive and symmetric relations between entities lose their transitivity and symmetricity in the vector space generated by a translation-based model, so the conventional techniques cannot complete new knowledge about such relations or new knowledge affected by them.

Transitive and symmetric relations are common in most knowledge graphs. In databases storing conventional knowledge triples, about 20% of the triples involve a transitive or symmetric relation. There is therefore a need for a technique that completes a knowledge graph while preserving the logical properties of relations.

Korean Patent Publication No. 10-2016-0064826
Korean Patent Publication No. 10-2015-0095577

One aspect of the present invention provides a knowledge graph embedding method and system reflecting logical properties, and a recording medium for performing the same, which can embed a translation-based knowledge graph that preserves logical properties such as transitivity and symmetry between entities by assigning one or more roles to each entity.

The technical problems addressed by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the following description.

A knowledge graph embedding method reflecting logical properties according to an embodiment of the present invention includes: collecting knowledge triples and assigning at least one of a head role and a tail role to each entity constituting the knowledge triples; mapping a head entity, which is an entity assigned the head role, with a first mapping matrix, and mapping a tail entity, which is an entity assigned the tail role, with a second mapping matrix; calculating a score function using the first mapping matrix and the second mapping matrix; and learning the knowledge triples using the calculated score function.

Each knowledge triple may consist of two entities and a relation indicating the association between the two entities.

The score function may be a function that computes the sum vector of the head entity vector and the relation vector and gives the absolute value of the difference between the sum vector and the tail entity vector.

The learning of the knowledge triples may include a link prediction step of predicting a missing entity when a collected knowledge triple has a missing entity, and a triple classification step of determining whether a collected knowledge triple is expressed without error by the relation.

The link prediction step may predict the missing entity using the score function, and the triple classification step may determine whether the knowledge triple is expressed without error by comparing the value of the score function with a reference value.
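The two uses of the score function described above can be illustrated with a minimal sketch. It assumes a plain translation score (the Euclidean norm of h + r - t) without the role-specific mapping matrices, toy two-dimensional embeddings, and a convention that a lower score means a more plausible triple. The names `score`, `predict_tail`, and `classify`, the entity labels, and the threshold value are hypothetical and do not come from the specification.

```python
import math


def score(h, r, t):
    """Translation-style score: L2 norm of (h + r - t); lower means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def predict_tail(h, r, entities):
    """Link prediction: return the name of the candidate tail with the lowest score."""
    return min(entities, key=lambda name: score(h, r, entities[name]))


def classify(h, r, t, threshold):
    """Triple classification: accept the triple when its score is below the threshold."""
    return score(h, r, t) < threshold


# Toy 2-D embeddings (hypothetical values chosen for illustration only).
entities = {"seoul": [1.0, 1.0], "korea": [1.0, 2.0], "tokyo": [3.0, 1.0]}
capital_of = [0.0, 1.0]

# Fill in the missing tail of (seoul, capital_of, ?).
best = predict_tail(entities["seoul"], capital_of, entities)

# Compare the score with a reference value to accept or reject a triple.
ok = classify(entities["seoul"], capital_of, entities["korea"], threshold=0.5)
bad = classify(entities["tokyo"], capital_of, entities["korea"], threshold=0.5)
```

In practice the reference value would presumably be tuned per relation on held-out data; the fixed 0.5 here is only a placeholder.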

The mapping may include, when a specific entity is assigned both the head role and the tail role, mapping the specific entity with both the first mapping matrix and the second mapping matrix.

A knowledge graph embedding system reflecting logical properties according to an embodiment of the present invention includes: a role assigning unit that collects knowledge triples and assigns at least one of a head role and a tail role to each entity constituting the knowledge triples; a mapping unit that maps a head entity, which is an entity assigned the head role, with a first mapping matrix, and maps a tail entity, which is an entity assigned the tail role, with a second mapping matrix; a calculating unit that calculates a score function using the first mapping matrix and the second mapping matrix; and a learning unit that learns the knowledge triples using the calculated score function.

The learning unit may predict a missing entity when a collected knowledge triple has a missing entity, or may determine whether a collected knowledge triple is expressed without error by the relation.

The mapping unit may map a specific entity with both the first mapping matrix and the second mapping matrix when the specific entity is assigned both the head role and the tail role.

In addition, a computer-readable recording medium may be provided on which a computer program for providing the knowledge graph embedding method and system reflecting logical properties is recorded.

According to the aspect of the present invention described above, a knowledge graph that preserves logical properties such as transitivity and symmetry between entities can be embedded by assigning at least one role to each entity, and a translation-based knowledge graph that effectively reflects the meaning between entities can thereby be provided.

FIG. 1 is a diagram showing a schematic configuration of a knowledge graph embedding system reflecting logical properties according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of translation-based embedding according to the prior art.
FIGS. 3 and 4 are diagrams showing examples of vector representations in which logical properties are preserved by the mapping unit of FIG. 1.
FIGS. 5 to 11 are diagrams showing performance results of the system of FIG. 1.
FIG. 12 is a diagram showing a schematic flow of a knowledge graph embedding method reflecting logical properties according to an embodiment of the present invention.

The following detailed description of the invention refers to the accompanying drawings, which show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention, although different from one another, are not necessarily mutually exclusive. For example, a particular shape, structure, or characteristic described herein in connection with one embodiment may be implemented in other embodiments without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the invention is limited only by the appended claims, along with the full scope of equivalents to which such claims are entitled. In the drawings, like reference numerals refer to the same or similar functions throughout the several aspects.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the drawings.

FIG. 1 is a block diagram showing a schematic configuration of a knowledge graph embedding system 1000 reflecting logical properties according to an embodiment of the present invention.

Specifically, the knowledge graph embedding system 1000 reflecting logical properties according to the present embodiment includes a role assigning unit 10, a mapping unit 20, a calculating unit 30, and a learning unit 40.

The role assigning unit 10 may assign a role to an entity.

An entity is a unit of meaningful information; in a file processing system, a record or piece of information constituting one item of data corresponds to one entity. The association between entities is called a relationship, and a knowledge graph may be defined as a map-like representation of the relationships among a plurality of entities.

The role assigning unit 10 may assign a head role or a tail role to an entity. Hereinafter, an entity assigned the head role is defined as a head entity, and an entity assigned the tail role is defined as a tail entity.

The role assigning unit 10 may collect knowledge triples from an external database and assign at least one role to each entity included in the collected knowledge triples. That is, a single entity may be assigned two roles. For example, the role assigning unit 10 may assign the head role to a first entity, the tail role to a second entity, and both the head and tail roles to a third entity. The first entity then acts as a head entity and the second entity as a tail entity. Whereas the first and second entities each perform a single role, the third entity may act as either a head entity or a tail entity depending on the situation.
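The role assignment described above can be sketched as follows. The sketch assumes that an entity's roles are derived from the positions in which it appears in the collected triples, which the text suggests but does not state as the only possibility. The function name `assign_roles` and the entity labels are hypothetical.

```python
from collections import defaultdict


def assign_roles(triples):
    """Return {entity: set of roles}, where each role is "head" and/or "tail"."""
    roles = defaultdict(set)
    for head, _relation, tail in triples:
        roles[head].add("head")  # the entity appears in the head position
        roles[tail].add("tail")  # the entity appears in the tail position
    return dict(roles)


# Three triples sharing one relation; e2 appears as both a tail and a head.
triples = [("e1", "r1", "e2"), ("e2", "r1", "e3"), ("e1", "r1", "e3")]
roles = assign_roles(triples)
```

Here `e1` receives only the head role, `e3` only the tail role, and `e2` both, so `e2` would later be mapped with both mapping matrices.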

The mapping unit 20 may map knowledge triples to a vector space. That is, the mapping unit 20 may express the association between a head entity (h) and a tail entity (t) using a relation r. The mapping unit 20 maps the elements (h, r, t) constituting a knowledge triple to the vector space, so that the knowledge triple can be expressed in vector form.

Here, the mapping unit 20 may map entity vectors with different mapping matrices according to the roles assigned to them. Specifically, the mapping unit 20 may map a head entity with the first mapping matrix and a tail entity with the second mapping matrix. That is, head entities and tail entities are mapped with different mapping matrices, and only entities assigned the same role are projected by the same mapping matrix.

The mapping unit 20 may embed a knowledge graph in which logical properties are preserved in the vector space, using entities to which at least one role has been assigned. That is, according to the present invention, which uses role-assigned entities, the entities constituting the knowledge graph can be expressed in a transitive relation or a symmetric relation. The specific function of the mapping unit 20 will be described later with reference to FIGS. 2 to 4.

The calculating unit 30 may calculate a score function for entity vectors mapped with different mapping matrices. The score function, also called an error function, is a measure for determining whether a knowledge triple projected into the vector space is correctly represented.

For example, the score function disclosed for the conventional TransE (Bordes et al., 2013) is as follows.

$$f_r(h, t) = \lVert h + r - t \rVert$$

In a single embedding space, $h$ denotes the head entity vector, $t$ denotes the tail entity vector, and $r$ denotes the relation, with $h, t \in E$. Here, $E = \{e_1, \ldots, e_n\}$ denotes an entity set consisting of $n$ entities.

As described above, given a knowledge triple (h, r, t) consisting of a relation r and two entities h and t, TransE finds vector representations of h, t, and r by forcing the vector of t to equal the sum of the vectors of h and r. That is, the score function is the magnitude of the vector obtained by subtracting the vector t from the sum of the vector h and the relation vector r, and the closer its value is to 0, the more correctly the two entities (h, t) are represented by the relation r.
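As a minimal sketch of the TransE score just described, the function below returns the L1 or L2 norm of h + r - t. The choice between the two norms is an assumption (the text only speaks of an absolute value), and the function name `transe_score` and the toy vectors are hypothetical.

```python
import math


def transe_score(h, r, t, norm=2):
    """Return the L1 or L2 norm of h + r - t; values near 0 mean a well-represented triple."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diff)
    return math.sqrt(sum(d * d for d in diff))


h = [1.0, 0.0]
r = [0.5, 1.0]
matching_tail = [1.5, 1.0]  # equals h + r, so the triple is represented exactly
other_tail = [0.0, 0.0]     # far from h + r, so the score is large

exact = transe_score(h, r, matching_tail)
mismatch = transe_score(h, r, other_tail, norm=1)
```

With these values `exact` is 0.0 and `mismatch` is 2.5, which matches the reading that smaller scores indicate better-represented triples.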

The calculating unit 30 may calculate a score function that preserves logical properties on the basis of a conventionally disclosed score function. According to the present invention, the role assigning unit 10 assigns at least one role (head or tail) to each entity, and the mapping unit 20 maps entities assigned different roles with different mapping matrices, so that a knowledge graph preserving logical properties such as transitivity and symmetricity can be embedded.

Accordingly, in the logical property preserving TransE (lppTransE), $h$ and $t$ may be mapped to different spaces. For this purpose, a head space mapping matrix $M_h$ and a tail space mapping matrix $M_t$ may be used. The calculating unit 30 may then calculate the score function of lppTransE as follows.

$$f_r(h, t) = \lVert M_h h + r - M_t t \rVert$$

The score function of lppTransE may be similar to that of TransR, described below. However, whereas TransR maps both $h$ and $t$ with a single mapping matrix, lppTransE according to the present invention maps $h$ with $M_h$ and $t$ with $M_t$.
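The effect of using separate head and tail matrices can be checked with a small sketch. It assumes the score is the L2 norm of M_h h + r - M_t t and uses hand-picked 2-D values (not learned ones) for a symmetric relation, where both (a, r, b) and (b, r, a) should hold. The helper names `matvec` and `lpp_transe_score` are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lpp_transe_score(h, r, t, m_head, m_tail):
    """L2 norm of (M_h h + r - M_t t): head and tail use different mapping matrices."""
    hp, tp = matvec(m_head, h), matvec(m_tail, t)
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(hp, r, tp)))


a, b = [1.0, 0.0], [0.0, 1.0]
r = [-1.0, 1.0]                      # non-zero relation vector
m_head = [[1.0, 1.0], [0.0, 0.0]]    # sends both a and b to (1, 0)
m_tail = [[0.0, 0.0], [1.0, 1.0]]    # sends both a and b to (0, 1)
identity = [[1.0, 0.0], [0.0, 1.0]]  # identity matrices reduce the score to TransE

forward = lpp_transe_score(a, r, b, m_head, m_tail)
backward = lpp_transe_score(b, r, a, m_head, m_tail)
plain_backward = lpp_transe_score(b, r, a, identity, identity)
```

Both `forward` and `backward` are 0.0 with the role-specific matrices, while with identity matrices the reversed triple has a non-zero score, since a single translation cannot map a to b and b to a unless r is zero.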

As another example, the calculating unit 30 may calculate the score function of the logical property preserving TransR (lppTransR) on the basis of the score function of the conventional TransR (Lin et al., 2015).

In TransR, entities may be mapped to vectors in different relation spaces depending on the relation. This can be expressed as follows.

$$f_r(h, t) = \lVert M_r h + r - M_r t \rVert$$

Here, $M_r$ is the mapping matrix for the relation $r$. In the logical property preserving TransR (lppTransR), the mapping matrix of each relation may therefore be split into a head mapping matrix $M_r^h$ and a tail mapping matrix $M_r^t$. The calculating unit 30 may then calculate the score function of lppTransR using the following equation.

$$f_r(h, t) = \lVert M_r^h h + r - M_r^t t \rVert$$

By using two separate mapping matrices, an entity can have two different vector representations in the same relation space.
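A short sketch of this point: with one head matrix and one tail matrix per relation, the same entity vector is projected to two different points of the relation space. The sketch assumes a 3-D entity space and a 2-D relation space (TransR-style models allow the two dimensions to differ). The dictionary layout, the helper `matvec`, and all numeric values are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Per-relation parameters: each relation owns a head matrix and a tail matrix
# that map 3-D entity vectors into that relation's 2-D space.
relation_params = {
    "r1": {
        "head": [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        "tail": [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    }
}

e2 = [1.0, 1.0, 5.0]  # a single entity vector in the entity space

e2_as_head = matvec(relation_params["r1"]["head"], e2)
e2_as_tail = matvec(relation_params["r1"]["tail"], e2)
```

With these values `e2_as_head` is `[0.0, 1.0]` and `e2_as_tail` is `[1.0, 0.0]`, so the entity occupies different positions depending on the role it plays in a triple.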

As yet another example, the calculating unit 30 may calculate the score function of the logical property preserving TransD (lppTransD) on the basis of the score function of the conventional TransD (Ji et al., 2015).

TransD may map an entity vector to different vectors in the relation space depending on the entity type and the relation type. That is, an entity vector may be mapped by an entity-relation-specific mapping matrix, which can be expressed as follows.

$$f_r(h, t) = \lVert M_{rh} h + r - M_{rt} t \rVert$$

Here, $M_{rh}$ and $M_{rt}$ are the entity-relation-specific mapping matrices. These mapping matrices may be computed from the projection vectors of the entities and the relation according to the following equation.

$$M_{rh} = r_p h_p^\top + I, \qquad M_{rt} = r_p t_p^\top + I$$

Here, $h_p$, $t_p$, and $r_p$ are the projection vectors for the head, the tail, and the relation, respectively, and $I$ is the identity matrix. In the logical property preserving TransD (lppTransD), $r_p$ may be split into two projection vectors, $r_p^h$ and $r_p^t$, so that the role of the entity is reflected. The mapping matrices may then be computed by the following equation.

$$M'_{rh} = r_p^h h_p^\top + I, \qquad M'_{rt} = r_p^t t_p^\top + I$$

The score function of lppTransD is as follows.

$$f_r(h, t) = \lVert M'_{rh} h + r - M'_{rt} t \rVert$$

In this way, the calculating unit 30 may calculate a score function reflecting logical properties on the basis of a conventionally disclosed score function.
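For the TransD variant, the role-specific mapping matrices can be built from projection vectors. The sketch below assumes the TransD construction from Ji et al. (2015), in which a mapping matrix is the outer product of a relation projection vector and an entity projection vector plus the identity, and applies it once with a head-role relation projection vector and once with a tail-role one. The function name `mapping_matrix` and the numeric values are hypothetical.

```python
def mapping_matrix(relation_proj, entity_proj):
    """Return relation_proj * entity_proj^T + I, with ones added on the main diagonal."""
    return [
        [rp * ep + (1.0 if i == j else 0.0) for j, ep in enumerate(entity_proj)]
        for i, rp in enumerate(relation_proj)
    ]


# One relation with two projection vectors, one per role (the lppTransD split of r_p).
r_p_head = [1.0, 0.0]
r_p_tail = [0.0, 1.0]

# Projection vectors of the head entity and the tail entity.
h_p = [1.0, 2.0]
t_p = [3.0, 4.0]

m_head = mapping_matrix(r_p_head, h_p)
m_tail = mapping_matrix(r_p_tail, t_p)
```

With these values `m_head` is `[[2.0, 2.0], [0.0, 1.0]]` and `m_tail` is `[[1.0, 0.0], [3.0, 5.0]]`. Because only vectors are stored per entity and relation, this construction needs fewer parameters than storing a full matrix per relation and role.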

The learning unit 40 may learn a knowledge graph reflecting logical properties using the score function reflecting logical properties. This can be expressed by the following equation.

$$L = \sum_{(h, r, t) \in P} \; \sum_{(h', r, t') \in N} \max\left(0,\; f_r(h, t) + \gamma - f_r(h', t')\right)$$

Here, $f_r$ denotes the score function corresponding to the logical property preserving embedding, $P$ is the set of correct triples, $N$ is the set of incorrect triples, and $\gamma$ is the margin. Because a knowledge graph contains only correct triples, $N$ may be generated by replacing the head or tail entity of existing knowledge triples.

That is, as shown in the above equation, the learning unit 40 may learn from the correct triples and the set of incorrect triples using a margin-based ranking loss.
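A minimal sketch of the margin-based ranking loss and of generating incorrect triples by corruption is given below. It assumes each correct triple is paired with one sampled corrupted triple, which is a common simplification rather than something the text specifies. The function names `margin_ranking_loss` and `corrupt`, the entity labels, and the example scores are hypothetical.

```python
import random


def margin_ranking_loss(positive_scores, negative_scores, margin):
    """Sum of max(0, f(pos) + margin - f(neg)) over paired positive/negative scores."""
    return sum(
        max(0.0, pos + margin - neg)
        for pos, neg in zip(positive_scores, negative_scores)
    )


def corrupt(triple, entity_names, rng):
    """Build an incorrect triple by replacing either the head or the tail entity."""
    head, relation, tail = triple
    if rng.random() < 0.5:
        return (rng.choice([e for e in entity_names if e != head]), relation, tail)
    return (head, relation, rng.choice([e for e in entity_names if e != tail]))


rng = random.Random(0)
entity_names = ["e1", "e2", "e3", "e4"]
positive = ("e1", "r1", "e2")
negative = corrupt(positive, entity_names, rng)

# A negative scored well above the positive contributes nothing to the loss.
loss_separated = margin_ranking_loss([0.0], [2.0], margin=1.0)
# A negative within the margin of the positive is penalised.
loss_violated = margin_ranking_loss([1.0], [1.5], margin=1.0)
```

Note that this simple `corrupt` does not check whether the corrupted triple happens to be a true triple of the graph; filtering such cases would be a refinement.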

The learning unit 40 may learn the knowledge triples with a stochastic gradient descent algorithm. On a three-dimensional space in which the score function is expressed in three dimensions, the learning unit 40 may learn the knowledge triples by descending from a high point to a low point. According to the above equation, the set of correct knowledge triples and the set of incorrect knowledge triples may be distributed separately in this space. The learning unit 40 may learn the correct knowledge triples step by step along the path from the high point of the distribution of correct knowledge triples to its low point, and may learn the incorrect knowledge triples in a similar manner. The knowledge graph embedding system reflecting logical properties according to the present invention may thus be optimized by the stochastic gradient descent algorithm.
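One stochastic gradient descent update on the margin-based ranking loss can be sketched as follows. To keep the gradients short, the sketch assumes a squared L2 translation score and omits the role-specific mapping matrices; both are simplifications and not the method of the specification. The function names `residual` and `sgd_step`, the parameter layout, the learning rate, and the toy values are hypothetical.

```python
def residual(h, r, t):
    """Return h + r - t element-wise."""
    return [a + b - c for a, b, c in zip(h, r, t)]


def sgd_step(params, positive, negative, margin, lr):
    """Apply one SGD update for max(0, f(pos) + margin - f(neg)), with f = squared L2
    norm of h + r - t, and return the loss value computed before the update."""
    (h, r, t), (h_neg, _, t_neg) = positive, negative
    keys_pos = (("ent", h), ("rel", r), ("ent", t))
    keys_neg = (("ent", h_neg), ("rel", r), ("ent", t_neg))
    res_pos = residual(*(params[k] for k in keys_pos))
    res_neg = residual(*(params[k] for k in keys_neg))
    loss = sum(x * x for x in res_pos) + margin - sum(x * x for x in res_neg)
    if loss <= 0.0:
        return 0.0  # margin already satisfied: zero gradient, no update
    grads = {}
    # df/dh = df/dr = 2 * residual and df/dt = -2 * residual;
    # the negative triple enters the loss with a minus sign.
    for keys, res, sign in ((keys_pos, res_pos, 1.0), (keys_neg, res_neg, -1.0)):
        for key, direction in zip(keys, (1.0, 1.0, -1.0)):
            acc = grads.setdefault(key, [0.0] * len(res))
            for i, x in enumerate(res):
                acc[i] += sign * direction * 2.0 * x
    for key, grad in grads.items():
        params[key] = [p - lr * g for p, g in zip(params[key], grad)]
    return loss


params = {
    ("ent", "e1"): [0.0, 0.0],
    ("ent", "e2"): [1.0, 0.0],
    ("ent", "e3"): [0.0, 1.0],
    ("rel", "r1"): [0.0, 0.0],
}
positive = ("e1", "r1", "e2")
negative = ("e1", "r1", "e3")  # tail replaced to form an incorrect triple

first_loss = sgd_step(params, positive, negative, margin=1.0, lr=0.1)
second_loss = sgd_step(params, positive, negative, margin=1.0, lr=0.1)
```

In this toy run the first call reports a loss of 1.0 and moves the embeddings so that the correct triple scores lower and the incorrect one higher; the second call then finds the margin satisfied and returns 0.0.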

FIGS. 2 to 4 illustrate the specific function of the mapping unit 20 of FIG. 1.

FIG. 2 is a diagram showing an example of conventional translation-based embedding that does not consider the roles of entities.

Translation-based embedding aims to find vector representations of knowledge graph entities in an embedding space by treating a relation as a translation of entities in that space. Conventionally, entities are mapped to the vector space without considering their roles, so the logical properties of relations, such as transitivity and symmetry, cannot be expressed. That is, the vectors of a transitive or symmetric relation lose their transitivity or symmetry in the embedding space.

For example, suppose three triples (e₁, r₁, e₂), (e₂, r₁, e₃), and (e₁, r₁, e₃) are given and r₁ is a transitive relation. If r₁ is not the zero vector, three kinds of entity vector arrangements are possible, as illustrated. In FIG. 2(a), e₁, e₂, and e₃ are arranged linearly; in this case the conventional techniques cannot represent the triple (e₁, r₁, e₃). As shown in FIG. 2(b), if e₁ and e₂ are placed at the same point, the triple (e₁, r₁, e₂) cannot be represented. As shown in FIG. 2(c), if e₂ and e₃ are placed at the same point, the triple (e₂, r₁, e₃) cannot be represented. Similarly, conventional translation-based embedding techniques cannot represent symmetric relations perfectly either.
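The limitation can also be seen algebraically. The short derivation below assumes the idealised case in which a translation model with a single vector per entity represents every triple exactly, that is, with a score of zero; the symbols follow the example above.

$$
\begin{aligned}
&\text{Transitive case:} && e_1 + r_1 = e_2, \quad e_2 + r_1 = e_3, \quad e_1 + r_1 = e_3 \\
&&& \Rightarrow\; e_2 = e_3 \;\Rightarrow\; r_1 = e_3 - e_2 = 0 \\
&\text{Symmetric case:} && h + r = t, \quad t + r = h \;\Rightarrow\; 2r = 0 \;\Rightarrow\; r = 0
\end{aligned}
$$

So all of the triples can hold exactly only when the relation vector is zero, which is why a non-zero relation vector forces at least one triple to be misrepresented.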

The incorrect representation of transitive and symmetric relations causes two problems.

First, relations with logical properties are common in knowledge-based file processing systems. For example, FB15K, one of the benchmark datasets, consists of 483,142 triples, of which 84,172 involve a transitive or symmetric relation. This means that conventional translation-based embedding techniques may fail to represent about 17% of the triples accurately. In WN18, another benchmark dataset, 22.4% of the triples involve a transitive or symmetric relation.

Second, although a transitive or symmetric relation does not affect the entities directly connected by the relation, it does affect other entities that are linked to those entities through non-transitive or asymmetric relations. It is therefore important for translation-based embedding techniques to represent transitive and symmetric relations.

도 3 내지 도 4를 참조하면, 도 1의 매핑부(20)에 의해 논리적 속성이 보존된 벡터 표현의 일 예가 도시된다.Referring to Figs. 3 to 4, an example of a vector expression in which logical attributes are preserved by the mapping unit 20 of Fig. 1 is shown.

도 3은 매핑부(20)에 의해 이행 관계가 반영된 벡터 표현의 일 예를 나타내는 도면이다.3 is a diagram showing an example of a vector expression in which the transition relation is reflected by the mapping unit 20.

상술한 바와 같이, 종래의 번역 기반 임베딩 기술들은 엔티티가 벡터 공간에 임베디드 될 때 엔티티의 역할을 무시하기 때문에, 이행 관계 또는 대칭 관계가 정확하게 표현되지 않는다.As described above, the conventional translation-based embedding techniques ignore the role of the entity when the entity is embedded in the vector space, so that the transition relation or the symmetric relation is not accurately represented.

In contrast, the mapping unit 20 according to the present invention can represent an entity according to the at least one role assigned to it.

In the figure, a solid line indicates mapping in the head role, and a dotted line indicates mapping in the tail role. For example, given three triples (e₁, r₁, e₂), (e₂, r₁, e₃), and (e₁, r₁, e₃), where r₁ is a transitive relation, the role assigning unit 10 may assign roles so that e₁ performs only the head role, e₃ performs only the tail role, and e₂ performs both roles. The mapping unit 20 may then map the entity vectors in the entity space into the space of r₁ using two different mapping matrices. That is, the mapping unit 20 may project head entities with the first mapping matrix and tail entities with the second mapping matrix.

The figure shows four projected vectors: the first entity (e₁) projected by the first mapping matrix, the second entity (e₂) projected by the first mapping matrix, the second entity (e₂) projected by the second mapping matrix, and the third entity (e₃) projected by the second mapping matrix. The head-role projections of e₁ and e₂ can be placed at the same position in the space of r₁ because of the two knowledge triples (e₂, r₁, e₃) and (e₁, r₁, e₃). Similarly, the tail-role projections of e₂ and e₃ can be placed at the same position because of (e₁, r₁, e₂) and (e₁, r₁, e₃). Because the role assigning unit 10 has assigned both the head role and the tail role to the second entity (e₂), its head-role projection and its tail-role projection can be mapped to different positions. As a result, as shown, the role assigning unit 10 assigns at least one role to each entity, and the mapping unit 20 can therefore represent all three triples in the space.
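The argument for the transitive case can be written out as a short derivation. The symbols M_h and M_t for the first and second mapping matrices are notation assumed here for illustration only, since the figure's own symbols are not reproduced in this text; the translation condition "projected head plus relation equals projected tail" is likewise an idealized reading of the score function reaching 0.

```latex
% Assumed notation: M_h = first (head-role) mapping matrix,
%                   M_t = second (tail-role) mapping matrix.
\begin{align*}
(e_1, r_1, e_2):\quad & M_h e_1 + r_1 = M_t e_2 \\
(e_2, r_1, e_3):\quad & M_h e_2 + r_1 = M_t e_3 \\
(e_1, r_1, e_3):\quad & M_h e_1 + r_1 = M_t e_3 \\[4pt]
\text{second and third:}\quad & M_h e_1 = M_h e_2 \\
\text{first and third:}\quad & M_t e_2 = M_t e_3 \\[4pt]
\text{without roles } (M_h = M_t = I):\quad & e_1 + r_1 = e_2,\; e_2 + r_1 = e_3,\; e_1 + r_1 = e_3
  \;\Longrightarrow\; r_1 = 0,\; e_1 = e_2 = e_3
\end{align*}
```

Under these assumptions, the three triples can hold together with a non-zero r₁ only when head-role and tail-role projections are kept separate, which matches the placement described for FIG. 3.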

FIG. 4 is a diagram showing an example of a vector representation in which a symmetric relation is reflected by the mapping unit 20.

Similarly to the embodiment described above with reference to FIG. 3, the mapping unit 20 can also represent a symmetric relation clearly. FIG. 4 shows the case in which two triples (e₄, r₂, e₅) and (e₅, r₂, e₄) with a symmetric relation r₂ are given. In the illustrated embodiment, solid lines indicate that an entity is mapped by the first mapping matrix (head role), and dotted lines indicate that it is mapped by the second mapping matrix (tail role). The mapping unit 20 may place the head-role projections of e₄ and e₅ at the same position, and may likewise place the tail-role projections of e₄ and e₅ at the same position. Accordingly, r₂ can represent the symmetric relation accurately in the embedding space.
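The symmetric case can be checked with the same kind of derivation. As before, M_h and M_t are assumed names for the first and second mapping matrices, not symbols taken from the figure, and exact equality stands in for a score of 0.

```latex
% Assumed notation: M_h = first (head-role) mapping matrix,
%                   M_t = second (tail-role) mapping matrix.
\begin{align*}
(e_4, r_2, e_5):\quad & M_h e_4 + r_2 = M_t e_5 \\
(e_5, r_2, e_4):\quad & M_h e_5 + r_2 = M_t e_4 \\[4pt]
\text{if } M_h e_4 = M_h e_5 \text{ and } M_t e_4 = M_t e_5:\quad
  & \text{both reduce to } r_2 = M_t e_4 - M_h e_4 \\[4pt]
\text{without roles } (M_h = M_t = I):\quad & e_4 + r_2 = e_5,\; e_5 + r_2 = e_4
  \;\Longrightarrow\; 2 r_2 = 0,\; e_4 = e_5
\end{align*}
```

With role-specific projections, r₂ may be any non-zero vector, whereas a single shared space forces the relation vector to vanish and the two entities to coincide.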

FIGS. 5 to 10 show performance results of the knowledge graph embedding method and system reflecting logical attributes according to the present invention.

Specifically, FIG. 5 is a table showing the complexity of knowledge graph embedding models, FIG. 6 is a table showing the characteristics of the datasets used for the performance evaluation, FIG. 7 is a table showing the parameter values used for link prediction, FIG. 8 is a table comparing link prediction performance with the prior art, FIG. 9 is a table showing the parameter values used for triple classification, and FIG. 10 is a table comparing triple classification performance with the prior art.

FIG. 5 shows the complexity of knowledge graph embedding models.

In the figure, Ne and Nr denote the number of entities and the number of relations, and n and m denote the embedding dimensions of entities and relations, respectively. Complexity tends to depend mainly on the number of relations. Therefore, compared with the prior-art TransE, TransR, and TransD, the increase in complexity of the embedding reflecting logical attributes according to the present invention is not significant.

FIG. 6 is a table showing the characteristics of the datasets used for the performance evaluation.

To evaluate the performance of the knowledge graph embedding system 1000 reflecting logical attributes according to the present invention, and of the knowledge graph embedding method described later, two well-known public knowledge graphs are used: WordNet (Miller, 1995) and Freebase (Bollacker et al., 2008).

WordNet provides semantic relations between words and has two sub-datasets, WN11 (Socher et al., 2013) and WN18 (Bordes et al., 2014). Freebase is a knowledge graph of general facts about the world and has two sub-datasets, FB13 (Socher et al., 2013) and FB15K (Bordes et al., 2014). FB15K is used for both the link prediction task and the triple classification task, whereas FB13 is used only for the triple classification task. The link prediction and triple classification tasks are described in detail below.

In the table, #Triples is the number of training triples in each benchmark dataset, #Transitive is the number of knowledge triples with a transitive relation, and #Symmetric is the number of knowledge triples with a symmetric relation. Ratio is the proportion of knowledge triples whose relation is transitive or symmetric. As shown, knowledge triples with a transitive or symmetric relation account for a considerable share of each dataset.

FIGS. 7 to 11 show performance results of the embedding system (or method) reflecting logical attributes according to the present invention.

The performance of the embedding system (or method) reflecting logical attributes according to the present invention can be demonstrated through two tasks: link prediction and triple classification.

First, link prediction is the task of predicting a missing relation when a collected knowledge triple has a missing relation.

To evaluate link prediction performance, the knowledge graph embedding preserving logical attributes according to the present invention is compared with the known techniques TransE (Bordes et al., 2013), TransH (Wang et al., 2014), TransR (Lin et al., 2015), and TransD (Ji et al., 2015).

FIG. 8 is a table comparing performance results for the link prediction task.

In FIG. 8, Mean Rank is the average rank of all correct entities, and Hits@10 is the proportion of correct triples ranked in the top 10. The labels unif and bern denote results for two sampling methods. Both sampling methods are disclosed in the prior art (Wang et al., 2014), so a detailed description is omitted.
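The two metrics can be sketched in a few lines of Python. This is a minimal illustration, assuming each test triple has already been given a 1-based rank for its correct entity; the function names `mean_rank` and `hits_at_k` and the sample ranks are hypothetical and do not come from the evaluation in FIG. 8.

```python
def mean_rank(ranks):
    """Average rank of the correct entities (lower is better)."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks, k=10):
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical 1-based ranks for five test triples (illustration only).
example_ranks = [1, 3, 12, 7, 250]

print(mean_rank(example_ranks))      # average of the five ranks
print(hits_at_k(example_ranks, 10))  # share of ranks that are <= 10
```

A single badly ranked triple (250 in this toy list) dominates Mean Rank but changes Hits@10 only slightly, which is one reason the two metrics are usually reported together.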

Referring also to FIG. 7, the knowledge graph embedding system reflecting logical attributes according to the present invention has five parameters: the learning rate α, the number B of training triples in each batch, the margin, the embedding dimensions n and m for entities and relations, and D.S, the dissimilarity measure of the embedding score function. Stochastic gradient descent is run for 1,000 iterations.

In FIG. 8, under the filter setting, Hits@10 of the prior-art TransD is 92.5% (unif sampling) and 92.2% (bern sampling), whereas Hits@10 of TransD preserving logical attributes according to the present invention (lppTransD) is 93.6% and 94.3%, respectively. That is, lppTransD performs 1.8% better than the prior-art TransD. Similarly, the embeddings preserving logical attributes according to the present invention (lppTransE, lppTransR, lppTransD) can achieve higher performance than the conventional embeddings.

FIG. 9 is a table showing Hits@10 according to the mapping properties of the relations in FB15K. Referring also to FIG. 9, the embedding system reflecting logical attributes according to the present invention shows higher Hits@10 than the base models for N-to-1 and N-to-N relations, and Hits@10 similar to the base models for 1-to-1 and 1-to-N relations.

In the table, TransD reflecting logical attributes (lppTransD) improves on the prior-art TransD by 7.3% for N-to-1 relations and by 3.7% for N-to-N relations. Because all transitive relations and some symmetric relations are generally N-to-N, good performance on N-to-N relations is important. That is, FIG. 9 indicates that the knowledge graph embedding system reflecting logical attributes according to the present invention can resolve the transitivity and symmetry problems of conventional embedding techniques.

FIGS. 10 and 11 relate to the comparison of performance results for the triple classification task.

Triple classification is the task of determining whether a given knowledge triple is correct. FIG. 10 is a table of the parameters used for triple classification; these parameters are the same as those shown in FIG. 7, so a repeated description is omitted.

As shown in FIG. 11, the knowledge graph embedding system 1000 preserving logical attributes according to the present invention can achieve better performance than the prior art in the triple classification task. This result indicates that the knowledge graph embedding system 1000 effectively resolves the problems of conventional translation-based embedding.

FIG. 12 is a flowchart showing the schematic flow of the knowledge graph embedding method reflecting logical attributes according to the present invention.

First, knowledge triples are collected, and at least one role is assigned to each entity constituting the knowledge triples (110). The knowledge triples may be collected from an external database, and at least one of a head role and a tail role may be assigned to each entity constituting the collected knowledge triples. When the knowledge triples are collected from a standard dataset, the step of assigning roles to entities may be omitted.
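Step 110 can be sketched as follows. The text does not specify how roles are decided, so this sketch assumes the simplest rule: an entity receives the head role if it appears in the head position of any triple and the tail role if it appears in any tail position. The function name `assign_roles` and the string labels are hypothetical.

```python
from collections import defaultdict


def assign_roles(triples):
    """Map each entity to the set of roles it plays across the given triples.

    Assumption: roles are read off triple positions (head, relation, tail).
    """
    roles = defaultdict(set)
    for head, _relation, tail in triples:
        roles[head].add("head")
        roles[tail].add("tail")
    return dict(roles)


# The three triples of the transitive example described for FIG. 3.
triples = [("e1", "r1", "e2"), ("e2", "r1", "e3"), ("e1", "r1", "e3")]
roles = assign_roles(triples)
print(roles)
```

Under this assumed rule, e1 receives only the head role, e3 only the tail role, and e2 both, which agrees with the role assignment described for the transitive example.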

Next, each entity to which a role has been assigned is mapped with a different mapping matrix according to its role (120). That is, head entities, to which the head role is assigned, are distinguished from tail entities, to which the tail role is assigned; head entities are projected with the first mapping matrix, and tail entities are projected with the second mapping matrix. In this step, an entity that can perform both the head role and the tail role may be mapped with both mapping matrices.

Subsequently, a score function is calculated using the two mapping matrices (130). The score function computes the sum vector of the head entity vector and the relation vector, and expresses the difference between the sum vector and the tail entity vector as an absolute value. The closer the value of the score function is to 0, the more correctly the knowledge triple is represented by the relation.
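The score computation of step 130 can be sketched in plain Python. This is an illustration under stated assumptions, not the patented implementation: the names `project`, `score`, `M_HEAD`, and `M_TAIL` are hypothetical, the absolute difference is taken as an L1 distance (the text also mentions a configurable dissimilarity measure D.S), and all numbers are toy values chosen by hand.

```python
def project(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def score(m_head, m_tail, head, relation, tail):
    """L1 distance between (projected head + relation) and projected tail."""
    projected_head = project(m_head, head)
    projected_tail = project(m_tail, tail)
    return sum(abs(h + r - t)
               for h, r, t in zip(projected_head, relation, projected_tail))


# Toy symmetric relation: two distinct 2-d entities, 1-d relation space.
E4, E5 = [1.0, 0.0], [0.0, 1.0]
M_HEAD = [[1.0, 1.0]]  # assumed head-role mapping matrix
M_TAIL = [[3.0, 3.0]]  # assumed tail-role mapping matrix
R2 = [2.0]             # non-zero relation vector

forward = score(M_HEAD, M_TAIL, E4, R2, E5)   # triple (e4, r2, e5)
backward = score(M_HEAD, M_TAIL, E5, R2, E4)  # triple (e5, r2, e4)

# Same entities with one shared identity mapping (no roles), r chosen so e4 + r = e5.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
R_PLAIN = [-1.0, 1.0]
plain_forward = score(IDENTITY, IDENTITY, E4, R_PLAIN, E5)
plain_backward = score(IDENTITY, IDENTITY, E5, R_PLAIN, E4)

print(forward, backward, plain_forward, plain_backward)
```

In this toy setting both directions of the symmetric relation score 0 with role-specific matrices, while the shared mapping can fit one direction only at the cost of the other.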

Thereafter, the knowledge triples are learned using the calculated score function (140). Learning the knowledge triples may include a link prediction step of predicting a missing entity when an entity is missing from a collected knowledge triple, and a triple classification step of determining whether a collected knowledge triple is represented without error by the relation.
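The two steps can be sketched on top of any score function in which lower is better. In this hedged sketch, `rank_candidates` and `classify_triple` are hypothetical helper names, the threshold plays the part of the reference value against which the score is compared, and `toy_score` with its one-dimensional embeddings exists only to make the example runnable.

```python
def rank_candidates(score_fn, head, relation, candidates):
    """Link prediction: order candidate tails from best (lowest score) to worst."""
    return sorted(candidates, key=lambda tail: score_fn(head, relation, tail))


def classify_triple(score_fn, triple, threshold):
    """Triple classification: accept the triple if its score is within the threshold."""
    head, relation, tail = triple
    return score_fn(head, relation, tail) <= threshold


# Toy 1-d embeddings (illustration only, not learned values).
ENTITY = {"e1": 1.0, "e2": 3.0, "e3": 5.0}
RELATION = {"r1": 2.0}


def toy_score(head, relation, tail):
    """|head + relation - tail| on the toy embeddings above."""
    return abs(ENTITY[head] + RELATION[relation] - ENTITY[tail])


ranking = rank_candidates(toy_score, "e1", "r1", ["e3", "e2"])
accepted = classify_triple(toy_score, ("e1", "r1", "e2"), threshold=0.5)
rejected = classify_triple(toy_score, ("e1", "r1", "e3"), threshold=0.5)
print(ranking, accepted, rejected)
```

How the threshold is chosen is not stated in this passage, so the value 0.5 is purely illustrative.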

The technique for providing the knowledge graph embedding method and system reflecting logical attributes described above may be implemented as an application, or in the form of program instructions executable by various computer components, and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules in order to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with reference to embodiments, those skilled in the art will understand that the invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

10: Role assigning unit
20: Mapping unit
30: Calculating unit
40: Learning unit

Claims

1. A method for embedding a knowledge graph in which logical attributes are reflected, the method comprising:
collecting knowledge triples and assigning at least one of a head role and a tail role to each entity constituting the knowledge triples;
mapping a head entity, which is an entity to which the head role is assigned, to a first mapping matrix, and mapping a tail entity, which is an entity to which the tail role is assigned, to a second mapping matrix;
calculating a score function using the first mapping matrix and the second mapping matrix; and
learning the knowledge triples using the calculated score function.

2. The method according to claim 1,
wherein each knowledge triple is composed of two entities and a relation indicating the relationship between the two entities.

3. The method of claim 2,
wherein the score function
is a function that calculates a sum vector of the head entity vector and the relation vector and represents a difference value between the sum vector and the tail entity vector.

4. The method of claim 2,
wherein learning the knowledge triples comprises:
a link prediction step of predicting a missing entity when an entity is missing from the collected knowledge triples; and
a triple classification step of determining whether the collected knowledge triples are represented without error by the relation.

5. The method of claim 3,
wherein the link prediction step comprises
predicting the missing entity using the score function, and
the triple classification step comprises
comparing a result value of the score function with a reference value to determine whether the knowledge triple is represented without error.

6. The method according to claim 1,
wherein the mapping comprises
mapping a specific entity to both the first mapping matrix and the second mapping matrix when the specific entity is assigned both the head role and the tail role.

7. A system for embedding a knowledge graph in which logical attributes are reflected, the system comprising:
a role assigning unit that collects knowledge triples and assigns at least one of a head role and a tail role to each entity constituting the knowledge triples;
a mapping unit that maps a head entity, which is an entity to which the head role is assigned, to a first mapping matrix, and maps a tail entity, which is an entity to which the tail role is assigned, to a second mapping matrix;
a calculating unit that calculates a score function using the first mapping matrix and the second mapping matrix; and
a learning unit that learns the knowledge triples using the calculated score function.

8. The system of claim 7,
wherein each knowledge triple is composed of two entities and a relation indicating the relationship between the two entities.

9. The system of claim 8,
wherein the score function
is a function that calculates a sum vector of the head entity vector and the relation vector and represents the absolute value of a difference value between the sum vector and the tail entity vector.

10. The system of claim 8,
wherein the learning unit
predicts a missing entity when an entity is missing from the collected knowledge triples, and
determines whether the collected knowledge triples are represented without error by the relation.

11. The system of claim 7,
wherein the mapping unit
maps a specific entity to both the first mapping matrix and the second mapping matrix when the specific entity is assigned both the head role and the tail role.

12. A computer-readable recording medium storing a computer program for providing a method and system for embedding knowledge graphs in which logical attributes are reflected according to any one of claims 1 to 11.