CN113918730A

CN113918730A - Knowledge graph relation completion method

Info

Publication number: CN113918730A
Application number: CN202111188118.8A
Authority: CN
Inventors: 赵之晗; 陈晓云; 陆海; 张少泉; 张筱雨
Original assignee: Electric Power Research Institute of Yunnan Power Grid Co Ltd
Current assignee: Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority date: 2021-10-12
Filing date: 2021-10-12
Publication date: 2022-01-11

Abstract

The application provides a knowledge graph relation completion method. Knowledge base files are input into an internal knowledge reasoning module and an external knowledge reasoning module. In the internal module, a knowledge representation training module converts the entities and relations of the knowledge base files into corresponding space vectors, and a triple prediction module scores triples in the vector space and determines the confidence of each inferred triple from the scoring result; if a triple is judged to hold, it is added to a first triple set. The external knowledge reasoning module projects the triples of the knowledge base files and their corresponding texts into the same vector representation space based on a unified representation learning framework, adjusts the vectors in that space according to a preset relation that the vectors must satisfy, and infers missing relations to obtain a second triple set. The first and second triple sets are fused and de-duplicated, the result is added to the knowledge base file, and the knowledge base file is imported into a preset database to generate a new knowledge graph, thereby completing the knowledge graph.

Description

Knowledge graph relation completion method

Technical Field

The application relates to the technical field of electric power, and in particular to a knowledge graph relation completion method.

Background

In recent years, with the arrival of the big data era, power data has also grown explosively. At this data volume it is difficult to complete data processing and analysis tasks quickly and efficiently by manpower alone, so using computers to assist people in processing and analyzing data is a necessary way to solve the problem. Because computers cannot directly process data expressed in human language, a high-quality, computer-friendly knowledge representation is needed, and knowledge representation forms such as the knowledge graph emerged in response.

Since Google proposed the knowledge graph in 2012, knowledge graphs have become a research focus across many industries, and power researchers have begun to construct knowledge graphs for the power field. However, existing knowledge graph construction usually follows a 'quantity first, quality second' strategy, that is, a compromise of 'first building to a certain scale, then improving quality'. As a result, automatically constructed knowledge graphs inevitably have various quality problems, especially missing knowledge: because the knowledge sources used do not fully cover the domain, a preliminarily constructed knowledge graph often lacks a large amount of relevant knowledge.

Disclosure of Invention

The application provides a knowledge graph relation completion method, which aims to solve the problem of missing relations in power-field knowledge graphs and to improve their construction quality.

A knowledge graph relation completion method comprises the following steps:

acquiring a knowledge graph to be processed, inputting it into a knowledge graph conversion module, and converting it, by the knowledge graph conversion module, into a knowledge base file in triple storage form;

inputting the knowledge base file into an internal knowledge reasoning module and an external knowledge reasoning module of the knowledge graph conversion module, respectively;

the internal knowledge reasoning module comprises a knowledge representation training module and a triple prediction module; the knowledge representation training module trains on the knowledge base file and converts its entities and relations into corresponding space vectors; the triple prediction module scores triples in the vector space with a scoring function based on a neural network model, and determines the confidence of each inferred triple from the scoring result;

judging, from the confidence result, whether each triple holds, and if so, adding it to a first triple set;

the external knowledge reasoning module projects the triples of the knowledge base file and their corresponding texts into the same vector representation space based on a unified representation learning framework;

adjusting the vectors in that vector representation space according to a preset relation that the vectors must satisfy, and inferring missing relations from the adjusted vectors to obtain a second triple set;

fusing and de-duplicating the first triple set and the second triple set to obtain a third triple set;

and adding the third triple set to the knowledge base file, importing the knowledge base file into a preset database to generate a new knowledge graph, and completing the original knowledge graph with the new knowledge graph.

Further, the knowledge graph conversion module uses the Neo4j graph database to convert the knowledge graph into an RDF knowledge base file in triple storage form.

Further, each triple comprises a head entity, a relation and a tail entity.

Further, the triple prediction module scoring the triples in the vector space with a scoring function based on the neural network model, and determining the confidence of the inferred triples from the scoring result, comprises:

selecting, based on the space vectors output by the neural network model, the head entity vector and relation vector of a real triple, calculating the tail entity vector, and obtaining the top-n inferred tail entities and their scores, where n > 1;

selecting the tail entity vector and relation vector of the triple, calculating the head entity vector, and obtaining the top-n inferred head entities and their scores, where n > 1;

and inferring the confidence of the triple from the scores of the tail entities and the head entities.

Further, adding the triples to the first triple set comprises:

using the top-n triples by confidence to supplement the relation between the corresponding head and tail entities;

and adding the generated relations to the first triple set.

Further, the knowledge representation training module training on the knowledge base file comprises:

taking triple data with real relations in the knowledge base file as positive examples of the model training input;

taking, by the knowledge representation training module, triple data with non-real relations randomly generated by a generative adversarial network as negative examples of the model training input;

and inputting the positive and negative examples into the knowledge representation training module for training.

Further, the knowledge graph relation completion method also comprises:

if the marginal loss of the positive and negative example triples output by the knowledge representation training module is less than a preset threshold, stopping training on the knowledge base file.

Further, the knowledge graph relation completion method also comprises:

when training on the knowledge base file, calculating errors from the output of the knowledge representation training module and the positive and negative example labels, and updating the model parameters with the error back-propagation algorithm to optimize the model.

Further, the knowledge graph relation completion method also comprises: extracting the entities in the knowledge base file, and constructing an entity candidate set from all the extracted entities.

In the application, a knowledge representation model completes the internal knowledge of the knowledge graph, and a unified representation learning framework based on online encyclopedia triples and their corresponding texts brings in external knowledge to complete the power-field knowledge graph. This can greatly improve the accuracy of inferring missing relations and the construction quality of power-field knowledge graphs.

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. Obviously, the drawings described below are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flow chart of a knowledge-graph relationship completion method provided in an embodiment of the present application;

FIG. 2 is a logic diagram of converting the entities and relations of the knowledge base file into corresponding space vectors and scoring triple confidence;

FIG. 3 is a logical representation of the external knowledge inference module projecting triples of knowledge base files and corresponding text into the same vector representation space.

Detailed Description

To make the objects, technical solutions and advantages of the present application clearer, the technical solutions are described clearly and completely below with reference to specific embodiments and the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present application.

Because the knowledge sources used for a preliminarily constructed knowledge graph do not fully cover the domain, such a graph often lacks a large amount of relevant knowledge, so a completion technique is needed to complete the power knowledge graph. The application provides a knowledge graph relation completion method. FIG. 1 is a schematic flow chart of the method provided in an embodiment of the application; as shown in FIG. 1, the method comprises:

S1: acquiring a knowledge graph to be processed, inputting it into a knowledge graph conversion module, and converting it, by the knowledge graph conversion module, into a knowledge base file in triple storage form.

A knowledge graph needing completion is acquired and input into the knowledge graph conversion module, which can use the Neo4j graph database to convert it into an RDF knowledge base file in triple storage form. The triples carrying real relations are then used as positive examples for subsequently training the neural network.
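The conversion in S1 can be illustrated with a minimal Python sketch. It assumes the graph's edges have already been exported from Neo4j as (head, relation, tail) name tuples, for example with a Cypher query such as `MATCH (h)-[r]->(t) RETURN h.name, type(r), t.name` (the `name` property is an assumption), and writes them as N-Triples lines. The namespace `BASE_IRI` and the helper names are hypothetical; the patent does not specify an RDF vocabulary.

```python
from typing import Iterable, Tuple
from urllib.parse import quote

# Hypothetical namespace; the patent does not name one.
BASE_IRI = "http://example.org/power-kg/"

Triple = Tuple[str, str, str]


def to_iri(label: str) -> str:
    """Turn an entity or relation label into an N-Triples IRI term."""
    # Spaces become underscores; everything outside the unreserved set is percent-encoded.
    return "<" + BASE_IRI + quote(label.strip().replace(" ", "_"), safe="") + ">"


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize (head, relation, tail) tuples as one N-Triples statement per line."""
    return "".join(f"{to_iri(h)} {to_iri(r)} {to_iri(t)} .\n" for h, r, t in triples)


if __name__ == "__main__":
    edges = [("Engineer Li", "profession", "electrical engineer")]
    print(to_ntriples(edges), end="")
```

Percent-encoding keeps non-ASCII entity names (common in Chinese power-domain graphs) valid inside IRIs.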

S2: inputting the knowledge base file into the internal knowledge inference module and the external knowledge inference module of the knowledge graph conversion module, respectively.

The knowledge graph conversion module can be divided into an internal knowledge inference module and an external knowledge inference module; in this step, the knowledge base file in triple storage form is input into each of them.

S3: the internal knowledge reasoning module comprises a knowledge representation training module and a triple prediction module. The knowledge representation training module trains on the knowledge base file and converts its entities and relations into corresponding space vectors; the triple prediction module scores triples in the vector space with a scoring function based on a neural network model and determines the confidence of the inferred triples from the scoring results.

The internal knowledge reasoning module comprises a knowledge representation training module and a triple prediction module. The knowledge representation training module is mainly used to train on the knowledge base file: triple data with real relations in the knowledge base file serve as positive examples of the model training input, triple data with non-real relations randomly generated by a generative adversarial network serve as negative examples, and both are input into the knowledge representation training module for training. Each triple comprises a head entity, a relation and a tail entity. In one implementation, the knowledge representation training module inputs the knowledge base file in triple form into a knowledge representation model composed of a TransE model and a two-layer linear neural network model for training. TransE is an embedding model based on the idea of translation: it regards a triple <head entity, relation, tail entity> as a translation from the head entity to the tail entity by the relation. The trained model outputs an embedded representation of the knowledge base file; that is, the entities (a general term for head and tail entities) and relations of the knowledge base file are converted into corresponding space vector representations.
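The translation idea behind TransE can be shown in a few lines: a triple (h, r, t) is plausible when the head vector plus the relation vector lands near the tail vector, i.e. when ||h + r - t|| is small. The sketch below uses toy two-dimensional embeddings chosen by hand for illustration; in the described method they would come from training, and the two-layer linear network is omitted here.

```python
import math
from typing import Dict, List

Vector = List[float]


def transe_distance(h: Vector, r: Vector, t: Vector) -> float:
    """L2 norm of h + r - t; the smaller it is, the more plausible the triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """Score used for ranking: higher is better, so negate the distance."""
    return -transe_distance(h, r, t)


# Toy embeddings (hypothetical values, not learned).
entity_emb: Dict[str, Vector] = {
    "Engineer Li": [0.1, 0.2],
    "Wang": [0.9, -0.3],
    "electrical engineer": [0.6, 0.4],
}
relation_emb: Dict[str, Vector] = {"profession": [0.5, 0.2]}

if __name__ == "__main__":
    r = relation_emb["profession"]
    t = entity_emb["electrical engineer"]
    for head in ("Engineer Li", "Wang"):
        print(head, round(transe_score(entity_emb[head], r, t), 3))
```

With these toy values, (Engineer Li, profession, electrical engineer) scores higher than the same triple with Wang as head, which is how a TransE score separates plausible from implausible triples.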

FIG. 2 is a logic diagram of converting the entities and relations of the knowledge base file into corresponding space vectors and scoring triple confidence. It illustrates converting the entities in the knowledge base file into the corresponding vector space to generate word vector representations (the word vector space being the converted space vectors), and then assembling these word vector representations into triples.

Text triples are converted into vector form because a computer cannot interpret the natural language in which they are written: before triples are input into a neural network for relation inference, they must be in vector form, so the text triples are converted into word vector space representations. After conversion, the constructed triples are input into the neural network for relation inference and scored with a scoring function; the entity pair with the highest score is the entity pair with an implicit relation. FIG. 2 illustrates the whole process. As shown in FIG. 2, the knowledge base file contains entities such as Wang and Engineer Li, and relations such as profession, student and IS-A. After calculation, these entities and relations are converted into the corresponding space vector form. When scoring triple confidence, a question may be posed, for example "Is Engineer Li an electrical engineer?", and it is answered by relation recommendation in the vector space: e1 may represent the entity Engineer Li, e2 may represent a relation type or value such as profession or electrical engineer, the whole space vector may be regarded as a vector R, and the confidence of the triple is scored by the neural network.
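The neural-network scoring of triple confidence described above can be sketched as a small two-layer linear network that maps the concatenated head, relation and tail vectors to a confidence in (0, 1) through a sigmoid. The layer sizes, the seeded random initialization, the sigmoid output and the class name are assumptions; the weights here are untrained, so the numbers only show the data flow, not meaningful confidences.

```python
import math
import random
from typing import List, Sequence


class TwoLayerLinearScorer:
    """Hypothetical two-layer linear network: concat(h, r, t) -> hidden -> logit -> sigmoid."""

    def __init__(self, dim: int, hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded so the example is reproducible
        n_in = 3 * dim
        self.w1: List[List[float]] = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
        self.b1: List[float] = [0.0] * hidden
        self.w2: List[float] = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2: float = 0.0

    def confidence(self, h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
        x = list(h) + list(r) + list(t)
        # First linear layer (no activation, matching the "linear" description).
        z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.w1, self.b1)]
        # Second linear layer produces a single logit.
        logit = sum(w * zi for w, zi in zip(self.w2, z)) + self.b2
        # Sigmoid squashes the logit into a confidence in (0, 1).
        return 1.0 / (1.0 + math.exp(-logit))
```

Note that two stacked linear layers compose to a single linear map; a nonlinearity between them would be needed if more expressive scoring were wanted, but the source describes the network as linear.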

When training on the knowledge base file, errors are calculated from the output of the knowledge representation training module and the positive and negative example labels, and the model parameters are updated with the error back-propagation algorithm to optimize the model. If the marginal loss of the positive and negative example triples output by the knowledge representation training module is less than a preset threshold, training on the knowledge base file stops.

Specifically, in one implementation, the knowledge representation training module uses a generative adversarial network to randomly generate triples that do not belong to any real relation as negative training examples, and combines them with the real triples derived earlier, which serve as positive examples. At the same time, the entities in the knowledge base file are extracted and an entity candidate set is constructed from all of them; the positive and negative examples are then input into the knowledge representation model for training. The model is trained on the acquired real triples and the randomly generated negative triples until the marginal loss of the positive and negative triples it outputs is smaller than a preset threshold, yielding the trained knowledge representation model. Here the marginal loss is the loss value output by the model's loss function, described as the increase in total computational cost incurred each time the amount of training data is increased. During training, errors are calculated from the model output and the positive and negative example labels, and the model parameters are updated by error back-propagation. Optimizing the knowledge representation model in this way improves the accuracy with which it yields new correct triples, after which the triple prediction module scores the triples to judge whether they hold.
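A compact sketch of this training loop, in plain Python so it runs without dependencies. Two substitutions are made and should be read as assumptions: negative triples come from randomly corrupting the head or tail of a real triple (a common TransE practice) instead of from a generative adversarial network, and the "marginal loss" is taken to be the standard margin ranking loss max(0, margin + d(pos) - d(neg)) on squared TransE distances. Training stops once the average loss drops below `threshold`, mirroring the preset-threshold stopping rule; the hand-derived gradients stand in for back-propagation. The function name and all hyperparameters are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def train_transe(triples: List[Triple], dim: int = 8, margin: float = 1.0, lr: float = 0.05,
                 threshold: float = 0.05, max_epochs: int = 500, seed: int = 0):
    """Train TransE embeddings with a margin loss; stop when the average loss < threshold."""
    if not triples:
        raise ValueError("need at least one positive triple")
    rng = random.Random(seed)
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})  # entity candidate set
    relations = sorted({r for _, r, _ in triples})
    ent: Dict[str, List[float]] = {e: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for e in entities}
    rel: Dict[str, List[float]] = {r: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for r in relations}
    known = set(triples)

    def residual(h: str, r: str, t: str) -> List[float]:
        return [a + b - c for a, b, c in zip(ent[h], rel[r], ent[t])]

    avg_loss, epochs_run = float("inf"), 0
    for _ in range(max_epochs):
        epochs_run += 1
        total = 0.0
        for h, r, t in triples:
            # Negative example: corrupt the head or the tail (stand-in for the GAN generator).
            negatives = [(e, r, t) for e in entities if (e, r, t) not in known]
            negatives += [(h, r, e) for e in entities if (h, r, e) not in known]
            if not negatives:
                continue
            nh, _, nt = rng.choice(negatives)
            pos, neg = residual(h, r, t), residual(nh, r, nt)
            loss = margin + sum(x * x for x in pos) - sum(x * x for x in neg)
            if loss <= 0:
                continue  # margin already satisfied, no gradient
            total += loss
            # Gradient step on the squared distances (d/dh = 2*res, d/dt = -2*res, d/dr = 2*res).
            for i in range(dim):
                ent[h][i] -= lr * 2 * pos[i]
                ent[t][i] += lr * 2 * pos[i]
                rel[r][i] -= lr * 2 * (pos[i] - neg[i])
                ent[nh][i] += lr * 2 * neg[i]
                ent[nt][i] -= lr * 2 * neg[i]
        avg_loss = total / len(triples)
        if avg_loss < threshold:  # preset-threshold stopping rule
            break
    return ent, rel, avg_loss, epochs_run
```

Real implementations usually also renormalize entity vectors each step so that the loss cannot be lowered simply by inflating vector norms; that detail is left out to keep the sketch short.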

S4: judging, from the confidence result, whether the triple holds, and if so, adding it to the first triple set.

The triple prediction module scores triples with the scoring function and, from the results, infers how likely each triple is to hold; that is, it scores the triples in the vector space with a scoring function based on the neural network model and determines the confidence of the inferred triples from the scoring results. In one implementation, based on the space vectors output by the neural network model, the head entity vector and relation vector of a real triple are selected, the tail entity vector is calculated, and the top-n inferred tail entities and their scores are obtained, where n is a positive integer and n > 1. Then the tail entity vector and relation vector of the triple are selected, the head entity vector is calculated, and the top-n inferred head entities and their scores are obtained. Finally, the confidence of the triple is inferred from the scores of the tail entities and the head entities.

Specifically, first, using the space vectors of entities and relations output by the neural-network-based knowledge representation model, the head entity vector and relation vector of a real triple are selected, the tail entity vector is calculated, and the top-n inferred tail entities and their scores are obtained. Second, the tail entity and relation of the real triple are selected, the head entity vector is calculated, and the top-n inferred head entities and their scores are obtained. The inferred entities are then compared with the predetermined candidate set according to their concept types, and entities not in the candidate set are removed. The confidence of each inferred entity is determined from its score, and the top-n inferred entities by confidence are used to supplement the relation with the head or tail entity; that is, the top-n triples by confidence supplement the relation between head and tail entities, and the generated relations are added to the first triple set.
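The top-n head and tail prediction and candidate-set filtering above can be sketched as follows, assuming TransE-style embeddings. How the head-side and tail-side scores are combined into one confidence is not specified in the patent; here it is assumed to be the average of the two softmax probabilities, and `n`, `min_conf` and all function names are hypothetical. Restricting scoring to `cands` plays the role of removing entities outside the candidate set.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]
Emb = Dict[str, Sequence[float]]


def dist(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """TransE-style distance ||h + r - t||."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))


def tail_scores(h: str, r: str, ent: Emb, rel: Emb, cands: Iterable[str]) -> List[Tuple[str, float]]:
    """Score every candidate tail for (h, r, ?); higher is better."""
    return [(e, -dist(ent[h], rel[r], ent[e])) for e in cands]


def head_scores(r: str, t: str, ent: Emb, rel: Emb, cands: Iterable[str]) -> List[Tuple[str, float]]:
    """Score every candidate head for (?, r, t); higher is better."""
    return [(e, -dist(ent[e], rel[r], ent[t])) for e in cands]


def prob_of(target: str, scored: List[Tuple[str, float]]) -> float:
    """Softmax probability of target among the scored candidates (0.0 if absent)."""
    m = max(s for _, s in scored)
    z = sum(math.exp(s - m) for _, s in scored)
    return next((math.exp(s - m) / z for e, s in scored if e == target), 0.0)


def confidence(h: str, r: str, t: str, ent: Emb, rel: Emb, cands: List[str]) -> float:
    """Assumed rule: average the tail-side and head-side softmax probabilities."""
    return 0.5 * (prob_of(t, tail_scores(h, r, ent, rel, cands))
                  + prob_of(h, head_scores(r, t, ent, rel, cands)))


def first_triple_set(known: Set[Triple], ent: Emb, rel: Emb, cands: List[str],
                     n: int = 2, min_conf: float = 0.4) -> Set[Triple]:
    """Propose top-n heads/tails for each real triple and keep confident new triples."""
    inferred: Set[Triple] = set()
    for h, r, t in known:
        tails = sorted(tail_scores(h, r, ent, rel, cands), key=lambda x: x[1], reverse=True)[:n]
        heads = sorted(head_scores(r, t, ent, rel, cands), key=lambda x: x[1], reverse=True)[:n]
        for trip in [(h, r, e) for e, _ in tails] + [(e, r, t) for e, _ in heads]:
            if trip in known or trip[0] == trip[2]:
                continue  # already in the graph, or a self-loop
            if confidence(*trip, ent, rel, cands) >= min_conf:
                inferred.add(trip)
    return inferred
```

For example, if the graph already knows (Engineer Li, profession, electrical engineer) and Wang's vector sits where that relation would place a head, head-side prediction proposes (Wang, profession, electrical engineer) for the first triple set.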

S5: the external knowledge inference module projects the triples of knowledge base files and corresponding text into the same vector representation space based on a unified representation learning framework.

Using a unified representation learning framework based on online encyclopedia triples and their corresponding texts, the external knowledge inference module can project the triples of the knowledge base file and the corresponding texts into the same vector representation space. FIG. 3 is a logic diagram of this projection and shows an example of completing missing relations of the knowledge graph through the unified representation learning framework. The framework includes a number of relations defined on the basis of an online encyclopedia, which basically cover the various semantic relations, and it uniformly maps (projects) the existing entity words in the knowledge graph, the words in externally acquired Internet text that relate to those entities, and the relations predicted by the framework into the same vector representation space.

For example, take the text description "Yunnan × Electric Power Research Institute is located in Kunming, Yunnan, the Spring City of China". The left half of FIG. 3 shows entity words already in the knowledge graph, and the right half shows the prediction of a relation in the vector representation space. For each pair of entities in the left half, relations such as "belongs to" and "located in" can be predicted with the method shown in the right half to find the actual relation between them. That is, the external knowledge inference module projects the text description into the vector representation space based on the unified representation learning framework and predicts relations that may exist, thereby completing the knowledge graph.

Specifically, as shown in FIG. 3, "Kunming" and "Yunnan × Electric Power Research Institute" in the right half are entity vectors already in the knowledge graph, and the long-dashed "located in" is the predicted relation corresponding to "located in" in the left half; that is, the model predicts how likely it is that "located in" holds between the institute and "Kunming", which amounts to predicting the relation of the triple (Yunnan × Electric Power Research Institute, located in, Kunming). "Yunnan × Electric Power Research Institute", "Kunming", "located in" and the text description "located in Kunming, Yunnan, the Spring City of China" can all be represented by vectors; the short-dashed "located in", "Spring City" and "Yunnan" are related words from the external text, i.e. vectors of related words obtained from the text. In short, given the descriptive sentence "Yunnan × Electric Power Research Institute is located in Kunming, Yunnan, the Spring City of China" and the two knowledge graph entities Yunnan × Electric Power Research Institute and Kunming, the unified representation learning framework maps the related words, the entities and the predicted relation into one unified vector representation space.
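The patent does not spell out the internals of the unified representation learning framework. As one simple, assumed illustration of putting triples and text in the same space, the sketch below averages the vectors of the related words found in the external text (for example "located in", "Spring City", "Yunnan") and blends the result with an entity's structural vector, assuming word and entity vectors share the same dimension. The blend weight `alpha`, the function names and all vectors are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def project_text(words: Sequence[str], word_vecs: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of the known words; unknown words are skipped."""
    found = [word_vecs[w] for w in words if w in word_vecs]
    if not found:
        return [0.0] * dim  # no evidence from the text
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def text_enhanced(entity_vec: Vector, text_vec: Vector, alpha: float = 0.5) -> Vector:
    """Blend structure and text: alpha * entity + (1 - alpha) * text, in one shared space."""
    return [alpha * e + (1.0 - alpha) * x for e, x in zip(entity_vec, text_vec)]


if __name__ == "__main__":
    word_vecs = {"located in": [1.0, 1.0], "Spring City": [0.8, 1.2], "Yunnan": [1.2, 0.8]}
    text_vec = project_text(["located in", "Spring City", "Yunnan", "China"], word_vecs, dim=2)
    print(text_vec)  # average of the three known word vectors
    print(text_enhanced([1.0, 1.0], text_vec, alpha=0.5))
```

A learned framework would train these projections jointly rather than use a fixed average, but the shared dimension is what lets text evidence and graph structure be compared directly.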

S6: adjusting the vectors in the same vector representation space according to a preset relation that the vectors must satisfy, and inferring missing relations from the adjusted vectors to obtain a second triple set.

After the triples of the knowledge base file and the corresponding texts have been projected into the same vector representation space in step S5, the unified representation learning framework adjusts the vectors. If they can be adjusted until the word vector of the predicted relation connects the vectors of the two entities head to tail, as shown in FIG. 3, the entity pair is considered to have that relation and the knowledge graph is completed accordingly; otherwise the pair is considered to have no implicit relation and the next pair is predicted. Relations among the other entities are inferred in the same way until every entity pair in the knowledge graph has been predicted and mined once, uncovering implicit relations not yet in the knowledge graph. The missing relations inferred from the adjusted vectors form the second triple set.
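The head-to-tail criterion of S6 can be read as a TransE-style test: after adjustment, a relation r is accepted for the pair (h, t) when h + r lands within a small tolerance of t. The sketch below only implements that final decision over every ordered entity pair; the adjustment itself is left to the framework, and the tolerance `tol`, the function names and the example vectors are assumptions.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]


def connects(h: Sequence[float], r: Sequence[float], t: Sequence[float], tol: float) -> bool:
    """True when h + r ends (within tol) where t begins, i.e. the vectors join head to tail."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t))) <= tol


def second_triple_set(ent: Dict[str, Sequence[float]], rel: Dict[str, Sequence[float]],
                      known: Set[Triple], tol: float = 0.1) -> List[Triple]:
    """Check every ordered entity pair against every relation and collect missing triples."""
    found: List[Triple] = []
    for h in ent:
        for t in ent:
            if h == t:
                continue
            for r in rel:
                if (h, r, t) not in known and connects(ent[h], rel[r], ent[t], tol):
                    found.append((h, r, t))
    return found
```

The exhaustive pair loop matches the description of predicting every entity pair once; it is quadratic in the number of entities, so large graphs would typically use nearest-neighbour search over h + r instead.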

S7: fusing and de-duplicating the first triple set and the second triple set to obtain a third triple set.

The first and second triple sets are fused and de-duplicated, for example by removing duplicate and junk data to improve data processing efficiency, yielding the third triple set.
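Fusion and de-duplication can be as simple as the sketch below: both sets are concatenated, each field is whitespace-normalized so trivially different spellings collapse, and triples judged to be junk are dropped while first-seen order is kept. What counts as junk is not defined in the patent; here it is assumed to mean a triple with an empty field or a self-loop, and the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Collapse runs of whitespace and trim each field."""
    h, r, t = (" ".join(part.split()) for part in triple)
    return h, r, t


def is_junk(triple: Triple) -> bool:
    """Assumed junk rule: an empty field or a head equal to the tail."""
    h, r, t = triple
    return not (h and r and t) or h == t


def fuse(first: Iterable[Triple], second: Iterable[Triple]) -> List[Triple]:
    """Merge the first and second triple sets into a de-duplicated third set."""
    seen: Set[Triple] = set()
    third: List[Triple] = []
    for triple in list(first) + list(second):
        triple = normalize(triple)
        if is_junk(triple) or triple in seen:
            continue
        seen.add(triple)
        third.append(triple)
    return third
```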

S8: adding the third triple set to the knowledge base file, importing the knowledge base file into a preset database to generate a new knowledge graph, and completing the original knowledge graph with the new knowledge graph.

The fused and de-duplicated third triple set is added to the RDF knowledge base file from step S1, and the RDF knowledge base file is imported into a preset database such as Neo4j to generate a new knowledge graph, thereby completing the original knowledge graph. By completing knowledge graph relations, the application addresses missing relations in power-field knowledge graphs and helps improve their quality control. Rather than completing the graph solely from its internal knowledge, as traditional methods do, it uses a unified representation learning framework and rich external text data to enhance entity representations, improving the accuracy of inferring missing relations.
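Appending the third triple set to the RDF knowledge base file should not reintroduce statements that are already there. The sketch below, which assumes the file uses the N-Triples format with one statement per line, merges new statement lines into the existing content without duplicates; the merged file could then be loaded into Neo4j, for example through the neosemantics (n10s) RDF import procedures, which the patent does not name. The file path handling and function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def merge_ntriples(existing: str, new_lines: Iterable[str]) -> str:
    """Return N-Triples text with new statements appended and exact duplicates skipped."""
    lines: List[str] = [ln.strip() for ln in existing.splitlines() if ln.strip()]
    seen = set(lines)
    for ln in new_lines:
        ln = ln.strip()
        if ln and ln not in seen:
            seen.add(ln)
            lines.append(ln)
    return "".join(ln + "\n" for ln in lines)


def append_to_file(path: Path, new_lines: Iterable[str]) -> None:
    """Rewrite the knowledge base file in place with the merged statements."""
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    path.write_text(merge_ntriples(existing, new_lines), encoding="utf-8")
```

Exact-line comparison only catches identical statements; triples that differ in spelling should already have been normalized during the S7 fusion step.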

Completing the knowledge graph supports knowledge updating and supplementation of the power knowledge graph. Combining completion based on internal knowledge with completion based on external knowledge addresses the low accuracy of existing completion techniques and the limited knowledge that can be inferred from the graph's internal knowledge alone, and can greatly reduce the consumption of manpower and material resources. By completing knowledge graph relations, the application improves the accuracy of inferring missing triple relations and meets the industrial demand for informatized and intelligent power technology.

In summary, according to the technical solution, the knowledge graph relation completion method comprises: inputting a knowledge graph into a knowledge graph conversion module, which converts it into a knowledge base file in triple storage form; inputting the knowledge base file into the internal knowledge reasoning module and the external knowledge reasoning module of the knowledge graph conversion module, respectively, where the internal knowledge reasoning module comprises a knowledge representation training module and a triple prediction module; training, by the knowledge representation training module, on the knowledge base file and converting its entities and relations into corresponding space vectors; scoring, by the triple prediction module, the triples in the vector space with a scoring function based on a neural network model, and determining the confidence of the inferred triples from the scoring result; judging from the confidence result whether each triple holds, and if so, adding it to a first triple set; projecting, by the external knowledge reasoning module, the triples of the knowledge base file and their corresponding texts into the same vector representation space based on a unified representation learning framework; adjusting the vectors in that space according to a preset relation that the vectors must satisfy, and inferring missing relations from the adjusted vectors to obtain a second triple set; fusing and de-duplicating the first and second triple sets to obtain a third triple set; and adding the third triple set to the knowledge base file, importing the knowledge base file into a preset database to generate a new knowledge graph, and completing the original knowledge graph with the new knowledge graph.
The method completes the internal knowledge of the knowledge graph with a knowledge representation model, and completes the power knowledge graph with external knowledge by means of a unified representation learning framework based on online encyclopedia triples and their corresponding texts. Because the completion method draws on both the internal and external knowledge of the power knowledge graph and uses a neural network learning model for knowledge reasoning, it can greatly improve the accuracy of inferring missing relations and the construction quality of power-field knowledge graphs.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains.

Claims

1. A knowledge graph relation completion method, characterized by comprising the following steps:

acquiring a knowledge graph to be processed, inputting it into a knowledge graph conversion module, and converting it, by the knowledge graph conversion module, into a knowledge base file in triple storage form;

inputting the knowledge base file into an internal knowledge inference module and an external knowledge inference module of the knowledge graph conversion module, respectively;

the internal knowledge reasoning module comprises a knowledge representation training module and a triple prediction module; the knowledge representation training module trains on the knowledge base file and converts its entities and relations into corresponding space vectors; the triple prediction module scores the triples in the vector space with a scoring function based on a neural network model, and determines the confidence of the inferred triples from the scoring result;

judging, from the confidence result, whether each triple holds, and if so, adding it to a first triple set;

the external knowledge reasoning module projects the triples of the knowledge base file and their corresponding texts into the same vector representation space based on a unified representation learning framework;

adjusting the vectors in that vector representation space according to a preset relation that the vectors must satisfy, and inferring missing relations from the adjusted vectors to obtain a second triple set;

fusing and de-duplicating the first triple set and the second triple set to obtain a third triple set;

and adding the third triple set to the knowledge base file, importing the knowledge base file into a preset database to generate a new knowledge graph, and completing the knowledge graph according to the new knowledge graph.

2. The knowledge graph relation completion method according to claim 1, wherein the knowledge graph conversion module uses a Neo4j graph database to convert the knowledge graph into an RDF knowledge base file in triple storage form.

3. The knowledge graph relation completion method according to claim 1, wherein each triple comprises a head entity, a relation and a tail entity.

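4. The knowledge graph relation completion method according to claim 3, wherein the triple prediction module scoring the triples in the vector space with a scoring function based on a neural network model, and determining the confidence of the inferred triples from the scoring result, comprises:

selecting, based on the space vectors output by the neural network model, the head entity vector and relation vector of a real triple, calculating the tail entity vector, and obtaining the top-n inferred tail entities and their scores, where n > 1;

selecting the tail entity vector and relation vector of the triple, calculating the head entity vector, and obtaining the top-n inferred head entities and their scores, where n > 1;

and inferring the confidence of the triple from the scores of the tail entities and the head entities.

5. The knowledge graph relation completion method according to claim 4, wherein adding the triple to the first triple set comprises:

using the top-n triples by confidence to supplement the relation between the corresponding head and tail entities;

and adding the generated relations to the first triple set.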

6. The knowledge graph relation completion method according to claim 1, wherein the knowledge representation training module training on the knowledge base file comprises:

taking triple data with real relations in the knowledge base file as positive examples of the model training input;

taking, by the knowledge representation training module, triple data with non-real relations randomly generated by a generative adversarial network as negative examples of the model training input;

and inputting the positive and negative examples into the knowledge representation training module for training.

7. The knowledge graph relation completion method according to claim 6, further comprising:

if the marginal loss of the positive and negative example triples output by the knowledge representation training module is smaller than a preset threshold, stopping training on the knowledge base file.

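8. The knowledge graph relation completion method according to claim 6, further comprising:

when training on the knowledge base file, calculating errors from the output of the knowledge representation training module and the positive and negative example labels, and updating the model parameters with the error back-propagation algorithm to optimize the model.

9. The knowledge graph relation completion method according to claim 1, further comprising: extracting the entities in the knowledge base file, and constructing an entity candidate set from all the extracted entities.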