WO2022107989A1 - Method and device for completing knowledge by using relation learning between query and knowledge graph - Google Patents

Method and device for completing knowledge by using relation learning between query and knowledge graph Download PDF

Info

Publication number
WO2022107989A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
embedding
knowledge
value
knowledge graph
Prior art date
Application number
PCT/KR2020/018966
Other languages
French (fr)
Korean (ko)
Inventor
박영택 (Young-Tack Park)
이완곤 (Wan-Gon Lee)
김민성 (Min-Sung Kim)
이민호 (Min-Ho Lee)
Original Assignee
숭실대학교산학협력단 (Soongsil University Industry-Academic Cooperation Foundation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 숭실대학교산학협력단 (Soongsil University Industry-Academic Cooperation Foundation)
Publication of WO2022107989A1 publication Critical patent/WO2022107989A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3334Selection or weighting of terms from queries, including natural language queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Definitions

  • the present invention relates to a knowledge completion method and apparatus using a relationship learning between a query and a knowledge graph.
  • a knowledge graph refers to a network composed of relationships between entities.
  • knowledge graphs can be incomplete due to problems such as missing relationships for specific entities or incorrectly connected relationships.
  • the present invention intends to propose a knowledge completion method and apparatus using a relationship learning between a query sentence and a knowledge graph capable of inferring missing knowledge by using a specific query sentence and a knowledge graph.
  • a knowledge completion apparatus comprising: a query embedding module that outputs a query embedding value corresponding to an input query; a topic extraction module that extracts a topic from the input query; a knowledge graph embedding module that outputs embedding values for the predicates, subjects, and objects included in the knowledge graph; a similarity calculation module that determines the predicate most similar to the query by computing the similarity between the query embedding value and the embedding value of each predicate; an embedding concatenation module that concatenates the query embedding value with the embedding value of the most similar predicate; and a scoring module that infers a new triple using the extracted topic, the concatenated embedding value, and the subjects and objects of the knowledge graph.
  • the query embedding module may determine an embedding value corresponding to the input query using a BERT-based RoBERT model.
  • the topic extraction module may perform tokenization that separates the input query into individual words and may extract the topic by excluding words that appear in the knowledge graph more than a preset number of times.
  • the similarity calculation module may compute the dot product between the query embedding value and the embedding value of each predicate obtained through the knowledge graph embedding module, apply a sigmoid, and select the predicate with the highest value as the one most similar to the query.
  • the scoring module places the extracted topic in the subject position, places the embedding value obtained by concatenating the query embedding value and the embedding value of the most similar predicate in the predicate position, and places the subjects and objects of the knowledge graph in the candidate object position to infer a new triple.
  • the scoring module may sequentially place a plurality of candidate objects into the score calculation function and determine the entity with the highest score as the object related to the extracted topic and the concatenated embedding value of the query and its most similar predicate.
  • a method of completing knowledge using relation learning between a query and a knowledge graph in an apparatus including a processor and a memory, the method comprising: outputting a query embedding value corresponding to an input query; extracting a topic from the input query; outputting embedding values for the predicates, subjects, and objects included in the knowledge graph; determining the predicate most similar to the query by computing the similarity between the query embedding value and the embedding value of each predicate; concatenating the query embedding value and the embedding value of the most similar predicate; and inferring a new triple using the extracted topic, the concatenated embedding value, and the subjects and objects of the knowledge graph.
  • FIG. 1 is a diagram showing the configuration of a knowledge completion device according to an embodiment of the present invention.
  • FIG. 2 is a diagram illustrating a detailed configuration of a query embedding module according to the present embodiment.
  • FIG. 3 is a diagram illustrating a case in which the number of appearances is used for topic extraction according to the present embodiment.
  • FIG. 4 is a diagram for explaining a process of searching for a predicate similar to a query according to the present embodiment.
  • FIG. 5 is a diagram for explaining a process of inferring a new triple through the score calculation function according to the present embodiment.
  • the present invention provides a method for inferring missing knowledge by using a specific query and a knowledge graph.
  • a topic is automatically extracted from a question-type query to obtain the corresponding topic embedding value, and a new triple is created by learning the relationship between the topic and the query from the knowledge graph by using the query embedding and the knowledge graph embedding.
  • predicate embedding of a knowledge graph related to a specific query is used together.
  • FIG. 1 is a diagram showing the configuration of a knowledge completion device according to an embodiment of the present invention.
  • the knowledge completion device includes a query embedding module 100, a topic extraction module 102, a knowledge graph embedding module 104, a similarity calculation module 106, an embedding connection module 108, and a scoring module 110.
  • the query embedding module 100 outputs an embedding value corresponding to the query input by the user.
  • query embedding means embedding a query inputted through various algorithms in a vector form in a multidimensional space.
  • FIG. 2 is a diagram illustrating a detailed configuration of a query embedding module according to the present embodiment.
  • a BERT-based RoBERT model is used for query embedding.
  • the BERT model on which RoBERT is based is a context-dependent model.
  • for example, the word 'bank' can mean either 'bank deposit' or 'river bank', so the model has the advantage of producing an embedding value that captures a word's characteristics by considering its context, that is, the surrounding words.
  • each word constituting the query is split into tokens and fed to the model, and the first value of the output is used as the embedding value of the input query.
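This pooling step can be sketched as follows (a minimal illustration; `token_vectors` stands in for the per-token output matrix of the RoBERT encoder, which is not shown here):

```python
import numpy as np

def query_embedding(token_vectors: np.ndarray) -> np.ndarray:
    """Given the encoder's per-token output matrix (seq_len x dim),
    use the first position's vector as the whole-query embedding."""
    return token_vectors[0]

# Hypothetical encoder output for a 7-token query with 4-dim embeddings.
outputs = np.arange(28, dtype=float).reshape(7, 4)
q_emb = query_embedding(outputs)
print(q_emb.shape)  # (4,)
```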
  • the topic extraction module 102 extracts a topic in consideration of the number of appearances in the knowledge graph of each word included in the input query.
  • for example, when the input query is “What does Christian Bale star in?”, prior approaches mark the topic separately, as in “What does [Christian Bale] star in?”; therefore, extracting the topic requires an extra marking step.
  • FIG. 3 is a diagram illustrating a case in which the number of appearances is used for topic extraction according to the present embodiment.
  • a topic is extracted by counting how many times each word in the query appears in the knowledge graph and selecting the words that appear infrequently. For example, given the query “What does Christian Bale star in?”, the words 'What', 'does', 'star', 'in', and '?' are far more likely to appear in other queries than “Christian Bale”, so it is desirable to extract the infrequent words as the topic.
  • a topic may be extracted by performing tokenization that divides the query into words and excluding words that appear a preset number of times or more (e.g., 2,000 times or more).
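The appearance-count filter can be sketched as follows (a minimal illustration; the word counts are made up, the 2,000 threshold follows the example above, and tokenization is simplified to a regular expression):

```python
import re
from collections import Counter

def extract_topic(query: str, kg_word_counts: Counter, max_count: int = 2000):
    """Tokenize the query and keep only words that appear in the
    knowledge graph fewer than `max_count` times."""
    tokens = re.findall(r"\w+|\?", query)
    return [t for t in tokens if kg_word_counts[t] < max_count]

# Hypothetical counts: frequent function words vs. a rare entity name.
counts = Counter({"What": 9000, "does": 8500, "star": 3000, "in": 9500,
                  "?": 9999, "Christian": 12, "Bale": 9})
print(extract_topic("What does Christian Bale star in?", counts))
# ['Christian', 'Bale']
```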
  • the knowledge graph embedding module 104 outputs an embedding matrix that represents the knowledge graph (KG) well.
  • a triple is composed of a predicate corresponding to a relation and entities (a subject and an object); the knowledge graph embedding module 104 outputs embedding values for all predicates and entities included in the knowledge graph.
  • the ComplEx model, which represents embeddings with real and imaginary parts and can express both symmetric and asymmetric relations, can be used; all triples of the KG can be learned through its score function.
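The ComplEx score function mentioned above can be sketched as follows (a minimal illustration with random complex-valued embeddings, not trained values):

```python
import numpy as np

def complex_score(s: np.ndarray, p: np.ndarray, o: np.ndarray) -> float:
    """ComplEx score: real part of the trilinear product <s, p, conj(o)>."""
    return float(np.real(np.sum(s * p * np.conj(o))))

rng = np.random.default_rng(0)
dim = 4
s = rng.normal(size=dim) + 1j * rng.normal(size=dim)
p = rng.normal(size=dim) + 1j * rng.normal(size=dim)
o = rng.normal(size=dim) + 1j * rng.normal(size=dim)

# ComplEx can model asymmetric relations: score(s, p, o) != score(o, p, s)
# in general, because of the conjugation of the object embedding.
score = complex_score(s, p, o)
print(round(score, 4))
```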
  • an embedding value of a query sentence and a knowledge graph is used to find a predicate similar to the query sentence.
  • the similarity calculation module 106 calculates the similarity between the embedding values output through the query embedding module 100 and the embedding values of all predicates obtained through the knowledge graph embedding module 104 to determine the predicate most similar to the query.
  • the similarity calculation module 106 performs a dot product between the embedding value output by the query embedding module 100 and the embedding values of all predicates obtained through the knowledge graph embedding module 104, applies a sigmoid, and looks for the highest value to find the most similar predicate embedding.
  • FIG. 4 is a diagram for explaining a process of searching for a predicate similar to a query according to the present embodiment.
  • an embedding value is obtained through the query embedding module 100 for “What does Christian Bale star in?”. Taking the dot product with all predicate embedding values of the knowledge graph in matrix form and applying a sigmoid yields the similarity between the query and each predicate, as shown in the box on the right. Among these, the highest value, “starred_actors”, is the predicate embedding most similar to the query.
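This search can be sketched as follows (a minimal illustration with made-up embedding values; in the device they would come from the query embedding module 100 and the knowledge graph embedding module 104):

```python
import numpy as np

def most_similar_predicate(query_emb, predicate_embs, names):
    """Dot the query embedding against every predicate embedding,
    squash with a sigmoid, and return the best-matching predicate."""
    logits = predicate_embs @ query_emb       # one score per predicate
    sims = 1.0 / (1.0 + np.exp(-logits))      # sigmoid
    best = int(np.argmax(sims))
    return names[best], sims

names = ["starred_actors", "directed_by", "release_year"]
predicate_embs = np.array([[0.9, 0.8, 0.1],
                           [0.2, 0.1, 0.7],
                           [0.1, 0.3, 0.2]])
query_emb = np.array([1.0, 1.0, 0.0])         # hypothetical query embedding
best, sims = most_similar_predicate(query_emb, predicate_embs, names)
print(best)  # starred_actors
```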
  • the embedding concatenation module 108 concatenates the query embedding value output from the query embedding module 100 and the predicate embedding value most similar to the query sentence determined by the similarity calculation module 106 .
  • the scoring module 110 infers the missing triple by using the score calculation function (Equation 1) of the knowledge graph embedding module.
  • the scoring module 110 places the topic extracted from the query in the subject position of the score calculation function, places the embedding value obtained by concatenating the query embedding and the embedding of its most similar predicate in the predicate position, and places the entities (subjects and objects) of the knowledge graph as candidates in the object position; after the scores are computed, the object with the highest value is found and the corresponding triple is newly inferred.
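The inference loop can be sketched as follows (a minimal illustration; the score function shown is a ComplEx-style stand-in, all embedding values and entity names are made up, and dimensions are simplified, ignoring that a concatenated embedding would normally be projected back to the entity dimension):

```python
import numpy as np

def score(s, p, o):
    """Stand-in score calculation function (ComplEx-style trilinear form)."""
    return float(np.real(np.sum(s * p * np.conj(o))))

def infer_object(topic_emb, query_pred_emb, candidates):
    """Place the topic in the subject slot and the concatenated
    query+predicate embedding in the predicate slot, then score every
    candidate entity in the object slot and keep the highest-scoring one."""
    best_name, best_score = None, float("-inf")
    for name, o_emb in candidates.items():
        sc = score(topic_emb, query_pred_emb, o_emb)
        if sc > best_score:
            best_name, best_score = name, sc
    return best_name, best_score

topic_emb = np.array([1.0 + 0j, 0.5 + 0j])
query_pred_emb = np.array([1.0 + 0j, 1.0 + 0j])
candidates = {"Batman_Begins": np.array([1.0 + 0j, 1.0 + 0j]),
              "Some_Other_Film": np.array([-1.0 + 0j, 0.0 + 0j])}
best, _ = infer_object(topic_emb, query_pred_emb, candidates)
print(best)  # Batman_Begins
```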
  • FIG. 5 is a diagram for explaining a process of inferring a new triple through the score calculation function according to the present embodiment.
  • the knowledge completion process using the query and knowledge graph relationship learning may be performed in an apparatus including a processor and a memory.
  • the processor may include a central processing unit (CPU) or other virtual machine capable of executing a computer program.
  • the memory may include a non-volatile storage device such as a fixed hard drive or a removable storage device.
  • the removable storage device may include a compact flash unit, a USB memory stick, and the like.
  • the memory may also include volatile memory, such as various random access memories.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

Disclosed are a method and a device for completing knowledge by using relation learning between a query and a knowledge graph. According to the present invention, provided is a device for completing knowledge by using relation learning between a query and a knowledge graph, comprising: a query embedding module that outputs a query embedding value corresponding to an input query; a topic extraction module that extracts topics from the input query; a knowledge graph embedding module that outputs embedding values for a plurality of predicates, subjects, and objects included in the knowledge graph; a similarity calculation module that determines the predicate most similar to the query by calculating the similarity between the embedding value of the query and the embedding values of each of the plurality of predicates; an embedding connection module that connects the embedding value of the query with the embedding value of the most similar predicate; and a scoring module that infers a new triple by using the extracted topic, the embedding value connecting the query embedding value and the embedding value of the most similar predicate, and the subjects and objects of the knowledge graph.

Description

Method and device for knowledge completion using relation learning between a query and a knowledge graph
The present invention relates to a knowledge completion method and apparatus using relation learning between a query and a knowledge graph.
A knowledge graph is a network composed of relationships between entities. Such knowledge graphs suffer from incompleteness due to problems such as missing relationships for specific entities or incorrectly connected relationships.
Many studies addressing the problem of incomplete knowledge graphs have proposed learning methods that use artificial neural networks based on natural-language embeddings, and various knowledge graph completion systems are being studied with these methods.
In the prior art, the topic in a query such as “What does Christian Bale star in?” is marked explicitly, as in “What does [Christian Bale] star in?”. This has two drawbacks: extracting the topic requires an extra marking step, and the predicates of the knowledge graph are not utilized.
To solve these problems of the prior art, the present invention proposes a knowledge completion method and apparatus using relation learning between a query and a knowledge graph, capable of inferring missing knowledge by using a specific query and the knowledge graph.
To achieve the above object, according to an embodiment of the present invention, there is provided a knowledge completion apparatus using relation learning between a query and a knowledge graph, comprising: a query embedding module that outputs a query embedding value corresponding to an input query; a topic extraction module that extracts a topic from the input query; a knowledge graph embedding module that outputs embedding values for the predicates, subjects, and objects included in the knowledge graph; a similarity calculation module that determines the predicate most similar to the query by computing the similarity between the query embedding value and the embedding value of each predicate; an embedding concatenation module that concatenates the query embedding value with the embedding value of the most similar predicate; and a scoring module that infers a new triple using the extracted topic, the concatenated embedding value, and the subjects and objects of the knowledge graph.
The query embedding module may determine the embedding value corresponding to the input query using a BERT-based RoBERT model.
The topic extraction module may perform tokenization that separates the input query into words and may extract the topic by excluding words that appear in the knowledge graph more than a preset number of times.
The similarity calculation module may compute the dot product between the query embedding value and the embedding values of all predicates obtained through the knowledge graph embedding module, apply a sigmoid, and find the highest value to identify the predicate most similar to the query.
The scoring module may place the extracted topic in the subject position, place the embedding value obtained by concatenating the query embedding value and the embedding value of the most similar predicate in the predicate position, and place the subjects and objects of the knowledge graph in the candidate object position to infer a new triple.
The scoring module may sequentially place a plurality of candidate objects into the score calculation function and determine the entity with the highest score as the object related to the extracted topic and the concatenated embedding value.
According to another aspect of the present invention, there is provided a method of completing knowledge using relation learning between a query and a knowledge graph in an apparatus including a processor and a memory, the method comprising: outputting a query embedding value corresponding to an input query; extracting a topic from the input query; outputting embedding values for the predicates, subjects, and objects included in the knowledge graph; determining the predicate most similar to the query by computing the similarity between the query embedding value and the embedding value of each predicate; concatenating the query embedding value and the embedding value of the most similar predicate; and inferring a new triple using the extracted topic, the concatenated embedding value, and the subjects and objects of the knowledge graph.
According to yet another aspect of the present invention, there is provided a computer-readable program for performing the above method.
According to the present invention, topic selection is automated, which has the advantage of expanding the range of usable datasets.
In addition, according to the present invention, using the predicate embedding value of the knowledge graph together with the query embedding value fills in information that would be missing if only the query were used.
FIG. 1 is a diagram showing the configuration of a knowledge completion device according to a preferred embodiment of the present invention.
FIG. 2 is a diagram showing the detailed configuration of the query embedding module according to the present embodiment.
FIG. 3 is a diagram illustrating the use of appearance counts for topic extraction according to the present embodiment.
FIG. 4 is a diagram for explaining the process of searching for a predicate similar to the query according to the present embodiment.
FIG. 5 is a diagram for explaining the process of inferring a new triple through the score calculation function according to the present embodiment.
Since various changes can be made to the present invention and it can have various embodiments, specific embodiments are illustrated in the drawings and described in detail.
However, this is not intended to limit the present invention to specific embodiments; it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and scope of the present invention.
The present invention presents a method for inferring missing knowledge by using a specific query and a knowledge graph.
First, a topic is automatically extracted from a question-form query to obtain the corresponding topic embedding value; then, using the query embedding and the knowledge graph embedding, the relationship between the topic and the query is learned from the knowledge graph to infer a new triple.
In the present invention, to improve the performance of inferring missing knowledge, the predicate embedding of the knowledge graph related to the specific query is used together.
Hereinafter, the knowledge completion method according to the present embodiment will be described in detail with reference to the drawings.
FIG. 1 is a diagram showing the configuration of a knowledge completion device according to a preferred embodiment of the present invention.
As shown in FIG. 1, the knowledge completion device according to the present embodiment may include a query embedding module 100, a topic extraction module 102, a knowledge graph embedding module 104, a similarity calculation module 106, an embedding concatenation module 108, and a scoring module 110.
The query embedding module 100 outputs an embedding value corresponding to the query input by the user.
Here, query embedding means embedding the input query as a vector in a multidimensional space through various algorithms.
FIG. 2 is a diagram showing the detailed configuration of the query embedding module according to the present embodiment.
In this embodiment, a BERT-based RoBERT model is used for query embedding.
The BERT model on which RoBERT is based is a context-dependent model. For example, the word 'bank' can mean either 'bank deposit' or 'river bank', so the model has the advantage of producing embedding values that capture a word's characteristics by considering its context, that is, the surrounding words.
Referring to FIG. 2, when the input query is “What does Christian Bale star in?”, each word constituting the query is split into tokens and fed to the model, and the first value of the output is used as the embedding value of the input query.
The topic extraction module 102 according to the present embodiment extracts a topic by considering how many times each word in the input query appears in the knowledge graph.
For example, when the input query is “What does Christian Bale star in?”, prior approaches mark the topic separately, as in “What does [Christian Bale] star in?”, so extracting the topic requires an extra marking step.
FIG. 3 is a diagram illustrating the use of appearance counts for topic extraction according to the present embodiment.
Referring to FIG. 3, in the present embodiment, a topic is extracted by counting how many times each word in the query appears in the knowledge graph and selecting the infrequent words. For example, given the query “What does Christian Bale star in?”, the words 'What', 'does', 'star', 'in', and '?' are far more likely to appear in other queries than “Christian Bale”, so the infrequent words are extracted as the topic. The topic may thus be extracted by tokenizing the query into words and excluding words that appear a preset number of times or more (for example, 2,000 times or more).
지식 그래프 임베딩 모듈(104)은 지식 그래프(Knowledge Graph: KB)를 잘 표현하는 임베딩 매트릭스를 출력한다. The knowledge graph embedding module 104 outputs an embedding matrix that well represents a knowledge graph (KB).
트리플은 릴레이션에 해당하는 술어(predicate)와 주어 및 목적에 해당하는 엔티티(subject, object)로 구성되며, 지식 그래프 임베딩 모듈(104)은 지식 그래프에 포함되는 모든 술어 및 엔티티들에 대한 임베딩 값을 출력한다. A triple is composed of a predicate corresponding to a relation and entities corresponding to a subject and an object, and the knowledge graph embedding module 104 outputs embedding values for all predicates and entities included in the knowledge graph.
지식 그래프 임베딩을 위해 실수와 허수로 대칭, 비대칭 관계 표현이 가능한 ComplEx 모델을 사용할 수 있고, Score Function을 통해 KG의 모든 트리플 학습이 가능하다. For knowledge graph embedding, the ComplEx model, which uses real and imaginary components and can therefore represent both symmetric and asymmetric relations, may be used, and all triples in the KG can be learned through its score function.
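A minimal sketch of the ComplEx score function, Re(⟨e_s, w_p, conj(e_o)⟩). The variable names and example vectors are assumptions introduced for illustration, but the snippet shows the property the text relies on: a purely real relation embedding scores symmetrically, while an imaginary component makes the score direction-sensitive (asymmetric).

```python
import numpy as np

def complex_score(e_s, w_p, e_o):
    """ComplEx score: real part of the trilinear product of the subject,
    relation, and conjugated object embeddings (complex-valued vectors)."""
    return float(np.real(np.sum(e_s * w_p * np.conj(e_o))))

# Illustrative complex-valued entity embeddings.
e_s = np.array([1 + 1j, 2 - 1j])
e_o = np.array([0.5 - 2j, -1 + 0.5j])

w_sym = np.array([0.3 + 0.0j, -0.7 + 0.0j])   # purely real relation
w_asym = np.array([0.3 + 0.9j, -0.7 + 0.4j])  # relation with imaginary part

# Real relation: swapping subject and object leaves the score unchanged.
print(complex_score(e_s, w_sym, e_o), complex_score(e_o, w_sym, e_s))
# Imaginary component: the two directions score differently.
print(complex_score(e_s, w_asym, e_o), complex_score(e_o, w_asym, e_s))
```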
종래기술에서는 지식 그래프의 술어를 활용하지 못하였으나, 본 발명에서는 질의문과 지식 그래프의 주어를 관계 학습을 할 때, 질의문과 유사한 술어를 찾아 학습하는데 같이 사용하면 더 좋은 성능을 얻을 수 있다는 것을 확인하였다. The prior art did not utilize the predicates of the knowledge graph; in the present invention, however, it was confirmed that better performance is obtained when, in learning the relation between the query and the subject of the knowledge graph, a predicate similar to the query is found and used together in training.
본 발명의 바람직한 일 실시예에 따르면, 질의문과 유사한 술어를 찾기 위해 질의문과 지식 그래프의 임베딩 값을 이용한다. According to a preferred embodiment of the present invention, an embedding value of a query sentence and a knowledge graph is used to find a predicate similar to the query sentence.
유사도 계산 모듈(106)은 질의문 임베딩 모듈(100)을 통해 출력된 임베딩 값과 지식 그래프 임베딩 모듈(104)을 통해 얻은 모든 술어의 임베딩 값의 유사도를 계산하여 질의문과 가장 유사한 술어를 결정한다. The similarity calculation module 106 calculates the similarity between the embedding values output through the query embedding module 100 and the embedding values of all predicates obtained through the knowledge graph embedding module 104 to determine the predicate most similar to the query.
유사도 계산 모듈(106)은 질의문 임베딩 모듈(100)을 통해 출력된 임베딩 값과 지식 그래프 임베딩 모듈(104)을 통해 얻은 모든 술어의 임베딩 값(relation embeddings)을 dot product 연산을 하고 sigmoid를 취하여 가장 높은 값을 찾아 유사한 술어 임베딩 값을 찾는다. The similarity calculation module 106 performs a dot-product operation between the embedding value output from the query embedding module 100 and the embedding values (relation embeddings) of all predicates obtained through the knowledge graph embedding module 104, applies a sigmoid, and selects the highest value to find the most similar predicate embedding.
도 4는 본 실시예에 따른 질의문과 유사한 술어를 탐색하는 과정을 설명하기 위한 도면이다. 4 is a diagram for explaining a process of searching for a predicate similar to a query according to the present embodiment.
도 4를 참조하면, “What does Christian Bale star in?”에 대해 질의문 임베딩 모듈(100)을 통해 임베딩 값을 얻는 것을 볼 수 있다. 그리고 지식 그래프의 모든 술어 임베딩 값만 매트릭스 형태로 가져와 sigmoid를 취하면 우측 박스와 같이 질의문과 각 술어의 유사도 수치를 볼 수 있다. 이 중 가장 높은 값을 갖는 “starred_actors”가 해당 질의문과 가장 유사한 술어 임베딩 값이 된다. Referring to FIG. 4, an embedding value is obtained through the query embedding module 100 for “What does Christian Bale star in?”. Then, taking all predicate embedding values of the knowledge graph in matrix form and applying the sigmoid yields the similarity between the query and each predicate, as shown in the box on the right. Among them, “starred_actors”, which has the highest value, becomes the predicate embedding most similar to the query.
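The similarity search described above can be sketched as follows. The embedding values are illustrative assumptions: the first relation vector is deliberately aligned with the query vector so that “starred_actors” wins, mirroring the FIG. 4 example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def most_similar_predicate(query_emb, relation_embs, relation_names):
    """Dot product between the query embedding and every relation
    embedding, squashed by a sigmoid; the highest score wins."""
    scores = sigmoid(relation_embs @ query_emb)  # one score per relation
    best = int(np.argmax(scores))
    return relation_names[best], float(scores[best])

# Illustrative embeddings (not the trained values of the patent).
query_emb = np.array([1.0, 0.5, -0.2])
names = ["starred_actors", "directed_by", "release_year"]
rels = np.array([[0.9, 0.6, -0.1],   # close to the query direction
                 [-0.4, 0.1, 0.8],
                 [0.1, -0.7, 0.3]])

name, score = most_similar_predicate(query_emb, rels, names)
print(name)  # starred_actors
```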
임베딩 연결 모듈(108)은 질의문 임베딩 모듈(100)에서 출력하는 질의문 임베딩 값과, 유사도 계산 모듈(106)의 결정된 질의문과 가장 유사한 술어 임베딩 값을 연결(concatenate)한다. The embedding concatenation module 108 concatenates the query embedding value output from the query embedding module 100 and the predicate embedding value most similar to the query sentence determined by the similarity calculation module 106 .
스코어링 모듈(110)은 지식 그래프 임베딩 모듈의 스코어 계산 함수(수학식 1)를 이용하여 누락된 트리플을 추론한다. The scoring module 110 infers the missing triple by using the score calculation function (Equation 1) of the knowledge graph embedding module.
수학식 1 / Equation 1:

$\mathrm{score}(s, p, o) = \operatorname{Re}\left(\sum_{k=1}^{K} e_{s,k}\, w_{p,k}\, \overline{e_{o,k}}\right)$
스코어링 모듈(110)은 질의문으로부터 추출된 토픽을 스코어 계산 함수의 주어($e_s$)에 위치시키고, 질의문과 해당 질의문과 유사한 술어의 임베딩 값을 연결하여 스코어 계산 함수의 술어 자리($w_p$)에 위치시킨다. 마지막 목적어 자리에는 지식 그래프의 주어, 목적어와 같은 엔티티($e_o$)들이 후보가 되어 계산 후 가장 높은 값을 갖는 목적어를 찾아 해당 트리플을 새로 추론하게 된다. The scoring module 110 places the topic extracted from the query in the subject slot ($e_s$) of the score calculation function, and places the embedding obtained by concatenating the query embedding with the similar predicate embedding of that query in the predicate slot ($w_p$). For the last object slot, entities of the knowledge graph, i.e., its subjects and objects ($e_o$), become candidates; after the computation, the object with the highest value is found and the corresponding triple is newly inferred.
도 5는 본 실시예에 따른 스코어 계산 함수를 통해 새로운 트리플을 추론하는 과정을 설명하기 위한 도면이다. 5 is a diagram for explaining a process of inferring a new triple through the score calculation function according to the present embodiment.
도 5를 참조하면, “What does Christian Bale star in?” 질의문으로부터 추출한 토픽 “Christian Bale”의 임베딩 값을 $e_s$에 넣고, 질의문과 이와 유사한 술어인 “starred_actors”의 임베딩 값을 연결하여 스코어 계산 함수의 술어 $w_p$에 넣는다. 이후 후보로 올 수 있는 지식 그래프의 주어, 목적어 후보들인 “Christian Bale”, “Bruce Wayne”, “Batman Series”, “The Dark Knight”의 임베딩 값들이 목적어 $e_o$에 하나씩 대입되어 계산된다. 가장 높은 점수(0.87)를 갖는 “The Dark Knight”를 목적어로 선정하여 새로운 트리플로 <Christian Bale, starred_actors, The Dark Knight>를 추론하게 된다. Referring to FIG. 5, the embedding value of the topic “Christian Bale” extracted from the query “What does Christian Bale star in?” is placed in $e_s$, and the embedding obtained by concatenating the query with the similar predicate “starred_actors” is placed in the predicate slot $w_p$ of the score calculation function. The embedding values of the candidate subjects and objects of the knowledge graph, “Christian Bale”, “Bruce Wayne”, “Batman Series”, and “The Dark Knight”, are then substituted one by one into the object slot $e_o$ and scored. “The Dark Knight”, which has the highest score (0.87), is selected as the object, and <Christian Bale, starred_actors, The Dark Knight> is inferred as a new triple.
본 발명의 바람직한 일 실시예에 따른 질의문과 지식 그래프 관계 학습을 이용한 지식 완성 과정은 프로세서 및 메모리를 포함하는 장치에서 수행될 수 있다. The knowledge completion process using the query and knowledge graph relationship learning according to the preferred embodiment of the present invention may be performed in an apparatus including a processor and a memory.
프로세서는 컴퓨터 프로그램을 실행할 수 있는 CPU(central processing unit)나 그밖에 가상 머신 등을 포함할 수 있다. The processor may include a central processing unit (CPU) or other virtual machine capable of executing a computer program.
메모리는 고정식 하드 드라이브나 착탈식 저장 장치와 같은 불휘발성 저장 장치를 포함할 수 있다. 착탈식 저장 장치는 컴팩트 플래시 유닛, USB 메모리 스틱 등을 포함할 수 있다. 메모리는 각종 랜덤 액세스 메모리와 같은 휘발성 메모리도 포함할 수 있다.The memory may include a non-volatile storage device such as a fixed hard drive or a removable storage device. The removable storage device may include a compact flash unit, a USB memory stick, and the like. The memory may also include volatile memory, such as various random access memories.
상기한 본 발명의 실시예는 예시의 목적을 위해 개시된 것이고, 본 발명에 대한 통상의 지식을 가지는 당업자라면 본 발명의 사상과 범위 안에서 다양한 수정, 변경, 부가가 가능할 것이며, 이러한 수정, 변경 및 부가는 하기의 특허청구범위에 속하는 것으로 보아야 할 것이다.The above-described embodiments of the present invention have been disclosed for purposes of illustration, and various modifications, changes, and additions will be possible within the spirit and scope of the present invention by those skilled in the art having ordinary knowledge of the present invention, and such modifications, changes and additions should be regarded as belonging to the following claims.

Claims (10)

  1. 질의문과 지식 그래프 관계 학습을 이용한 지식 완성 장치로서, A knowledge completion apparatus using relation learning between a query and a knowledge graph, the apparatus comprising:
    입력된 질의문에 상응하는 질의문 임베딩 값을 출력하는 질의문 임베딩 모듈; a query embedding module that outputs a query embedding value corresponding to the input query;
    상기 입력된 질의문에서 토픽을 추출하는 토픽 추출 모듈; a topic extraction module for extracting topics from the input query;
    상기 지식 그래프에 포함된 복수의 술어, 주어 및 목적어들에 대한 임베딩 값을 출력하는 지식 그래프 임베딩 모듈; a knowledge graph embedding module for outputting embedding values for a plurality of predicates, subjects, and objects included in the knowledge graph;
    상기 질의문 임베딩 값과 상기 복수의 술어 각각의 임베딩 값의 유사도를 계산하여 질의문과 가장 유사한 술어를 결정하는 유사도 계산 모듈; a similarity calculation module for determining a predicate most similar to a query statement by calculating a similarity between the query embedding value and the embedding values of each of the plurality of predicates;
    상기 질의문 임베딩 값과 상기 가장 유사한 술어의 임베딩 값을 연결하는 임베딩 연결 모듈; 및an embedding connection module that connects the embedding value of the query with the embedding value of the most similar predicate; and
    상기 추출된 토픽, 상기 질의문 임베딩 값과 상기 가장 유사한 술어의 임베딩 값을 연결한 임베딩 값 및 상기 지식 그래프의 주어 및 목적어들을 이용하여 새로운 트리플을 추론하는 스코어링 모듈을 포함하는 질의문과 지식 그래프 관계 학습을 이용한 지식 완성 장치. and a scoring module for inferring a new triple using the extracted topic, the embedding value obtained by concatenating the query embedding value and the embedding value of the most similar predicate, and the subjects and objects of the knowledge graph.
  2. 제1항에 있어서, The apparatus of claim 1,
    상기 질의문 임베딩 모듈은, BERT 기반의 RoBERT 모델을 이용하여 상기 입력된 질의문에 상응하는 임베딩 값을 결정하는 질의문과 지식 그래프 관계 학습을 이용한 지식 완성 장치. wherein the query embedding module determines an embedding value corresponding to the input query using a BERT-based RoBERT model.
  3. 제1항에 있어서, According to claim 1,
    상기 질의문 임베딩 모듈은, 상기 입력된 질의문을 각 단어로 분리하는 토큰화를 수행하고 상기 지식 그래프 내에서 미리 설정된 등장 횟수 이상의 단어들을 제외시켜 토픽을 추출하는 질의문과 지식 그래프 관계 학습을 이용한 지식 완성 장치. wherein the query embedding module performs tokenization to split the input query into individual words and extracts the topic by excluding words whose number of appearances in the knowledge graph is equal to or greater than a preset number.
  4. 제1항에 있어서, According to claim 1,
    상기 유사도 계산 모듈은, 상기 그래프 임베딩 모듈을 통해 얻은 모든 술어의 임베딩 값을 dot product 연산을 하고 sigmoid를 취하여 가장 높은 값을 찾아 상기 질의문과 가장 유사한 술어를 탐색하는 질의문과 지식 그래프 관계 학습을 이용한 지식 완성 장치. wherein the similarity calculation module performs a dot-product operation on the embedding values of all predicates obtained through the graph embedding module, applies a sigmoid, and finds the highest value to search for the predicate most similar to the query.
  5. 제1항에 있어서, The apparatus of claim 1,
    상기 스코어링 모듈은 상기 추출된 토픽을 주어에 위치시키고, 상기 질의문 임베딩 값과 상기 가장 유사한 술어의 임베딩 값을 연결한 임베딩 값을 술어에 위치시키고, 상기 지식 그래프의 주어 및 목적어들을 후보 목적어에 위치시켜 새로운 트리플을 추론하는 질의문과 지식 그래프 관계 학습을 이용한 지식 완성 장치. wherein the scoring module infers a new triple by placing the extracted topic in the subject slot, placing the embedding value obtained by concatenating the query embedding value and the embedding value of the most similar predicate in the predicate slot, and placing the subjects and objects of the knowledge graph as candidate objects.
  6. 제5항에 있어서, The apparatus of claim 5,
    상기 스코어링 모듈은 스코어 계산 함수에 복수의 후보 목적어들을 순차적으로 위치시켜 스코어가 가장 높은 엔티티를 상기 추출된 토픽 및 상기 질의문 임베딩 값과 상기 가장 유사한 술어의 임베딩 값을 연결한 임베딩 값과 관련된 목적어로 결정하는 질의문과 지식 그래프 관계 학습을 이용한 지식 완성 장치. wherein the scoring module sequentially places a plurality of candidate objects in the score calculation function and determines the entity with the highest score as the object associated with the extracted topic and the embedding value obtained by concatenating the query embedding value and the embedding value of the most similar predicate.
  7. 프로세서 및 메모리를 포함하는 장치에서 질의문과 지식 그래프 관계 학습을 이용하여 지식을 완성하는 방법으로서, A method of completing knowledge using relation learning between a query and a knowledge graph in a device including a processor and a memory, the method comprising:
    입력된 질의문에 상응하는 질의문 임베딩 값을 출력하는 단계; outputting a query embedding value corresponding to the inputted query;
    상기 입력된 질의문에서 토픽을 추출하는 단계; extracting a topic from the input query;
    상기 지식 그래프에 포함된 복수의 술어, 주어 및 목적어들에 대한 임베딩 값을 출력하는 단계; outputting embedding values for a plurality of predicates, subjects, and objects included in the knowledge graph;
    상기 질의문 임베딩 값과 상기 복수의 술어 각각의 임베딩 값의 유사도를 계산하여 질의문과 가장 유사한 술어를 결정하는 단계; determining a predicate most similar to a query statement by calculating a similarity between the query embedding value and each of the embedding values of the plurality of predicates;
    상기 질의문 임베딩 값과 상기 가장 유사한 술어의 임베딩 값을 연결하는 단계; 및concatenating the embedding value of the query and the embedding value of the most similar predicate; and
    상기 추출된 토픽, 상기 질의문 임베딩 값과 상기 가장 유사한 술어의 임베딩 값을 연결한 임베딩 값 및 상기 지식 그래프의 주어 및 목적어들을 이용하여 새로운 트리플을 추론하는 단계를 포함하는 질의문과 지식 그래프 관계 학습을 이용한 지식 완성 방법. and inferring a new triple using the extracted topic, the embedding value obtained by concatenating the query embedding value and the embedding value of the most similar predicate, and the subjects and objects of the knowledge graph.
  8. 제7항에 있어서, 8. The method of claim 7,
    상기 질의문과 가장 유사한 술어를 결정하는 단계는, 상기 그래프 임베딩 모듈을 통해 얻은 모든 술어의 임베딩 값을 dot product 연산을 하고 sigmoid를 취하여 가장 높은 값을 찾아 상기 질의문과 가장 유사한 술어를 탐색하는 질의문과 지식 그래프 관계 학습을 이용한 지식 완성 방법. wherein the determining of the predicate most similar to the query comprises performing a dot-product operation on the embedding values of all predicates obtained through the graph embedding module, applying a sigmoid, and finding the highest value to search for the predicate most similar to the query.
  9. 제7항에 있어서, The method of claim 7,
    상기 새로운 트리플을 추론하는 단계는, 상기 추출된 토픽을 주어에 위치시키고, 상기 질의문 임베딩 값과 상기 가장 유사한 술어의 임베딩 값을 연결한 임베딩 값을 술어에 위치시키고, 상기 지식 그래프의 주어 및 목적어들을 후보 목적어에 위치시켜 새로운 트리플을 추론하는 질의문과 지식 그래프 관계 학습을 이용한 지식 완성 방법. wherein the inferring of the new triple comprises placing the extracted topic in the subject slot, placing the embedding value obtained by concatenating the query embedding value and the embedding value of the most similar predicate in the predicate slot, and placing the subjects and objects of the knowledge graph as candidate objects.
  10. 제7항에 따른 방법을 수행하는 컴퓨터 판독 가능한 프로그램.A computer readable program for performing the method according to claim 7.
PCT/KR2020/018966 2020-11-23 2020-12-23 Method and device for completing knowledge by using relation learning between query and knowledge graph WO2022107989A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2020-0157981 2020-11-23
KR1020200157981A KR102442422B1 (en) 2020-11-23 2020-11-23 Knowledge Completion Method and Apparatus Using Query and Knowledge Graph Relationship Learning

Publications (1)

Publication Number Publication Date
WO2022107989A1 true WO2022107989A1 (en) 2022-05-27

Family

ID=81709225

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/018966 WO2022107989A1 (en) 2020-11-23 2020-12-23 Method and device for completing knowledge by using relation learning between query and knowledge graph

Country Status (2)

Country Link
KR (1) KR102442422B1 (en)
WO (1) WO2022107989A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102583818B1 (en) * 2022-09-14 2023-10-04 주식회사 글로랑 Method for sampling process of personality test using question and answer network representing group of respondents based on bert

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101267038B1 (en) * 2011-02-25 2013-05-24 주식회사 솔트룩스 Method and apparatus for selecting RDF triple using vector space model
KR101662450B1 (en) * 2015-05-29 2016-10-05 포항공과대학교 산학협력단 Multi-source hybrid question answering method and system thereof
US20170357906A1 (en) * 2016-06-08 2017-12-14 International Business Machines Corporation Processing un-typed triple store data
KR20180108257A (en) * 2017-03-24 2018-10-04 (주)아크릴 Method for extending ontology using resources represented by the ontology
CN111639171A (en) * 2020-06-08 2020-09-08 吉林大学 Knowledge graph question-answering method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101228865B1 (en) 2011-11-23 2013-02-01 주식회사 한글과컴퓨터 Document display apparatus and method for extracting key word in document


Also Published As

Publication number Publication date
KR102442422B1 (en) 2022-09-08
KR20220070919A (en) 2022-05-31


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20962595

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20962595

Country of ref document: EP

Kind code of ref document: A1