CN112348190B - Uncertain knowledge graph prediction method based on improved embedded model SUKE - Google Patents

Uncertain knowledge graph prediction method based on improved embedded model SUKE

Info

Publication number
CN112348190B
Authority
CN
China
Prior art keywords
confidence
uncertain
score
evaluator
model
Prior art date
Legal status
Active
Application number
CN202011159784.4A
Other languages
Chinese (zh)
Other versions
CN112348190A (en)
Inventor
汪璟玢
聂宽
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202011159784.4A priority Critical patent/CN112348190B/en
Publication of CN112348190A publication Critical patent/CN112348190A/en
Application granted granted Critical
Publication of CN112348190B publication Critical patent/CN112348190B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/027Frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to an uncertain knowledge graph prediction method based on an improved embedded model SUKE. The SUKE model is proposed on the basis of the existing deterministic embedding model DistMult. SUKE preserves the structural information and the uncertainty information of knowledge and internally comprises an evaluator and a confidence generator. The evaluator evaluates the rationality of a fact according to its structural and uncertainty features and screens out unreasonable facts, thereby obtaining candidate facts; the confidence generator then generates a confidence for the candidate facts, which indicates the probability that the entities hold the particular relation. For the fact rationality evaluation task, the evaluator defines a structure score and an uncertainty score for each triple; furthermore, the evaluator introduces unknown facts into training. For the confidence prediction task, the confidence generator generates a confidence for each triple. The invention can effectively complete the link prediction task of the uncertain knowledge graph.

Description

Uncertain knowledge graph prediction method based on improved embedded model SUKE
Technical Field
The invention relates to the technical field of knowledge representation and reasoning under a knowledge graph, in particular to an uncertain knowledge graph prediction method based on an improved SUKE embedded model.
Background
The uncertain knowledge graph provides a confidence score for each triple, and the confidence reflects the probability that the triple holds. In recent years, the development of relation extraction and crowdsourcing has promoted the construction of large-scale uncertain knowledge graphs such as ConceptNet, Probase and NELL. Hu et al. proposed URGE in 2017, a matrix-factorization-based approach for embedding uncertain networks; however, the model only considers node proximity in sparse networks and only learns node embeddings. Chen et al. proposed UKGE in 2019, which learns embeddings from the confidences of triples and introduces unseen facts inferred by probabilistic soft logic. However, UKGE uses only the confidence information to learn embeddings and ignores the structural information of the triples.
At present, little research has been devoted to embedding uncertain knowledge graphs. Miao et al. first proposed the uncertainty inference model IIKE for knowledge graphs in 2015. Although IIKE achieves good performance, it considers each triple in isolation when calculating its confidence probability score and does not exploit the interconnected nature of the knowledge graph. Among works on uncertain knowledge graph embedding, the models with the most prominent performance are URGE, proposed by Hu et al. in 2018, and UKGE, proposed by Chen et al. in 2019. The URGE model is designed for uncertain networks and generates node embeddings by considering node proximity; although it can be generalized to knowledge graphs, an uncertain network differs from a knowledge graph, so URGE cannot complete the embedding task of the uncertain knowledge graph well. UKGE performs better than URGE, but to some extent the UKGE model still does not make full use of the structural information of knowledge.
Disclosure of Invention
In view of this, the present invention provides an uncertain knowledge graph prediction method based on an embedded model SUKE, which can effectively complete a link prediction task of an uncertain knowledge graph.
The invention is realized by adopting the following scheme: an uncertain knowledge graph prediction method based on an embedded model SUKE comprises the following steps:
step S1: giving an uncertain knowledge graph which internally comprises a plurality of quadruples (h, r, t, w), where h denotes a head entity, t denotes a tail entity, r denotes the relation between the head and tail entities, and w denotes the probability that the triple (h, r, t) holds; expanding the number of original quadruples through the probabilistic soft logic reasoning method defined in the UKGE model; finally, dividing the expanded quadruples into a 60% training set, a 20% validation set and a 20% test set for training the SUKE model; the input of the SUKE model is the vector representations of h, r and t, and the vectors of entities and relations in the knowledge base are either pre-trained with the TransE algorithm or randomly initialized;
step S2: constructing the embedded model SUKE: designing two components, an evaluator and a confidence generator, wherein the evaluator is used to evaluate the rationality of triples (h, r, t); unreasonable triples are removed by the evaluator and the reasonable triples form the candidate set, and the confidence generator generates a confidence for the candidate set so as to obtain quadruples (h, r, t, w); training the evaluator and the confidence generator through loss functions for subsequent prediction;
step S3: adding the obtained quadruples (h, r, t, w) to the original uncertain knowledge graph; here link prediction is defined as: given an incomplete quadruple (h, r, ?, ?), predicting its missing tail entity and its confidence; adding the new quadruples obtained by link prediction to the original uncertain knowledge graph to make the knowledge graph more complete; during prediction, for a query (h, r, ?), the evaluator screens candidate tail entities and the confidence generator assigns a confidence to each retained candidate, yielding the new quadruples.
Further, the step S2 specifically includes the following steps:
step S21: calculating the triple energy score through the DistMult model, and obtaining through training the mapping functions from the energy score to the structure score and to the uncertainty score, respectively;
step S22: fusing the structure score and the uncertainty score linearly or multiplicatively to obtain an evaluator score used to evaluate the rationality of each triple, wherein reasonable triples obtain a high evaluator score and the evaluator score lies between 0 and 1;
step S23: giving a triple threshold θ with value range 0 to 1 to judge whether a triple is reasonable; if the evaluator score of a triple (h, r, t) is greater than or equal to θ, the triple is considered reasonable and added to the candidate set; otherwise it is considered unreasonable;
step S24: generating, through the confidence generator, a confidence with value range 0 to 1 for the triple candidate set obtained in step S23, forming new quadruples (h, r, t, w).
Further, the specific content of step S21 is:
firstly, calculating the energy score of the triplet by adopting a DistMult model;
the energy function of the DistMult model is shown in formula (1);
E(h, r, t) = h^T diag(r) t    (1)
where diag(r) is the diagonal matrix of the relation, h^T is the transpose of the head entity vector, and t is the tail entity vector representation.
Then Q_structure and Q_uncertain of the triple are obtained through different mapping functions and parameters.
The calculation of Q_structure is shown in formula (2):
Q_structure = 1/(1 + e^(-φ_structure(E(h, r, t))))    (2)
where E(h, r, t) is the energy score of the triple obtained by the DistMult model and φ_structure(·) is the mapping function from the triple energy score to the structure score; the mapping function is shown in formula (3):
φ_structure(E(h, r, t)) = P_structure · E(h, r, t) + b_structure    (3)
where P_structure is the mapping parameter from the energy score to the structure score and b_structure is a bias;
The calculation of Q_uncertain is shown in formula (4):
Q_uncertain = 1/(1 + e^(-φ_uncertain(E(h, r, t))))    (4)
where E(h, r, t) is the energy score of the triple obtained by the DistMult model and φ_uncertain(·) is the mapping function from the triple energy score to the uncertainty score; the mapping function is shown in formula (5):
φ_uncertain(E(h, r, t)) = P_uncertain · E(h, r, t) + b_uncertain    (5)
where P_uncertain is the mapping parameter from the energy score to the uncertainty score and b_uncertain is a bias;
During training, Q_structure of positive example triples tends towards 1 and Q_structure of negative example triples tends towards 0; Q_uncertain of positive example triples tends towards the true confidence w and Q_uncertain of negative example triples tends towards 0; for unknown facts inferred by the logic rules, the evaluator score is used to fit their confidence during training;
the loss function of the unknown facts participating in training is defined as formula (6)
Γ_UnFacts = Σ_{(h,r,t,w)∈RS} (λ·Q_structure + (1-λ)·Q_uncertain - w)²    (6)
Where RS is a set of unknown facts, QstructureAnd QuncertainStructure score and uncertainty score of unknown facts, respectively; lambda is a dynamic adjustment parameter;
the loss function of the evaluator consists of positive case loss, negative case loss and loss of unknown fact; is defined as formula (7)
ΓEvaluator=ΓposnegUnFacts (7)
Wherein
Γ_pos = Σ_{(h,r,t,w)∈S} [(Q_structure - 1)² + (Q_uncertain - w)²]    (8)
Γ_neg = Σ_{(h,r,t,0)∈S'} [(Q_structure)² + (Q_uncertain)²]    (9)
Γ_pos denotes the positive example loss, Γ_neg denotes the negative example loss, and Γ_UnFacts denotes the loss of the unknown facts; S is the positive example set and S' is the negative example set; the negative examples used in training are generated by randomly replacing head or tail entities, and the confidence of a negative example is set to 0; the negative example set is shown in formula (10):
S' = {(h1, r, t, 0) | h1 ∈ ε\h} ∪ {(h, r, t1, 0) | t1 ∈ ε\t}    (10).
further, the specific content of step S22 is:
the linear weighted fusion mode is shown as formula (11):
score_add = α·Q_structure + β·Q_uncertain    (11)
where α + β = 1;
the multiplicative fusion mode is shown in formula (12):
score_mul = ε + (1-ε)·(Q_structure·Q_uncertain)    (12)
where ε is a smoothing hyperparameter; the multiplicative fusion means that Q_uncertain adjusts Q_structure accordingly: when Q_uncertain is small it lowers Q_structure, and conversely it raises Q_structure; when a triple has both a high Q_structure and a high Q_uncertain the computed score is large, and conversely it is small.
Further, the specific content of step S24 is:
the confidence generator generates a confidence w for the candidate set using formula (4), and the confidence w together with the candidate set obtained in step S23 forms new quadruples (h, r, t, w); the mapping-function parameters of the confidence generator differ from those of the evaluator, and the main difference is that only positive examples (no negative examples) are used when training the confidence generator.
The loss function of the confidence generator is defined as equation (13)
Γ_confidence = Σ_{(h,r,t,w)∈S} (Q_uncertain - w)²    (13)
Where S is the positive case set and w is the true confidence of the triplet.
The total loss function of the model is composed of the evaluator loss function and the confidence generator loss function and is defined as formula (14)
Ψ = Γ_confidence + Γ_Evaluator    (14)
where Γ_Evaluator is the loss of the evaluator and Γ_confidence is the loss of the confidence generator.
Compared with the prior art, the invention has the following beneficial effects:
(1) Most existing representation learning models only consider the structural knowledge stored in a knowledge base, so their completion capability is limited; the algorithm provided by the invention can effectively fuse structural information and uncertainty information.
(2) The invention provides an uncertain knowledge graph embedding model SUKE, which introduces structural information, uncertainty information and unknown facts and can learn better vector representations.
(3) SUKE internally comprises two components, an evaluator and a confidence generator, wherein the evaluator can be used to evaluate the rationality of uncertain knowledge and the confidence generator is used to complete the confidence of the triples; through their cooperation the two components can effectively complete the link prediction task of the uncertain knowledge graph.
Drawings
FIG. 1 is a flowchart of a prediction method according to an embodiment of the present invention.
Fig. 2 is a diagram of a SUKE training process according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides an uncertain knowledge graph prediction method based on an embedded model SUKE, including the following steps:
step S1: giving an uncertain knowledge graph which internally comprises a plurality of quadruples (h, r, t, w), where h denotes a head entity, t denotes a tail entity, r denotes the relation between the head and tail entities, and w denotes the probability that the triple (h, r, t) holds; expanding the number of original quadruples through the probabilistic soft logic reasoning method defined in the UKGE model; finally, dividing the expanded quadruples into a 60% training set, a 20% validation set and a 20% test set for training the SUKE model; the input of the SUKE model is the vector representations of h, r and t, and the vectors of entities and relations in the knowledge base are either pre-trained with the TransE algorithm or randomly initialized;
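By way of illustration of the data preparation in step S1, the sketch below (with hypothetical names; it assumes the quadruples have already been expanded with the probabilistic soft logic rules defined in UKGE) splits the quadruples into 60% training, 20% validation and 20% test sets:

```python
import random

def split_quadruples(quads, seed=42):
    """Split a list of (h, r, t, w) quadruples into 60% train, 20% validation, 20% test."""
    quads = list(quads)
    random.Random(seed).shuffle(quads)
    n_train = int(0.6 * len(quads))
    n_valid = int(0.2 * len(quads))
    return (quads[:n_train],
            quads[n_train:n_train + n_valid],
            quads[n_train + n_valid:])

# toy quadruples (h, r, t, w); real data would come from the PSL-expanded graph
toy = [("e1", "r1", "e2", 0.9), ("e2", "r1", "e3", 0.7), ("e1", "r2", "e3", 0.4),
       ("e3", "r2", "e4", 0.8), ("e4", "r1", "e1", 0.6)]
train, valid, test = split_quadruples(toy)
```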
step S2: constructing the embedded model SUKE: designing two components, an evaluator and a confidence generator, wherein the evaluator is used to evaluate the rationality of triples (h, r, t); unreasonable triples are removed by the evaluator and the reasonable triples form the candidate set, and the confidence generator generates a confidence for the candidate set so as to obtain quadruples (h, r, t, w); training the evaluator and the confidence generator through loss functions for subsequent prediction;
step S3: adding the obtained quadruples (h, r, t, w) to the original uncertain knowledge graph; here link prediction is defined as: given an incomplete quadruple (h, r, ?, ?), predicting its missing tail entity and its confidence; adding the new quadruples obtained by link prediction to the original uncertain knowledge graph to make the knowledge graph more complete; during prediction, for a query (h, r, ?), the evaluator screens candidate tail entities and the confidence generator assigns a confidence to each retained candidate, yielding the new quadruples.
In this embodiment, the step S2 specifically includes the following steps:
step S21: calculating the triple energy score through the DistMult model, and obtaining through training the mapping functions from the energy score to the structure score and to the uncertainty score, respectively;
step S22: fusing the structure score and the uncertainty score linearly or multiplicatively to obtain an evaluator score used to evaluate the rationality of each triple, wherein reasonable triples obtain a high evaluator score and the evaluator score lies between 0 and 1;
step S23: giving a triple threshold θ with value range 0 to 1 to judge whether a triple is reasonable; if the evaluator score of a triple (h, r, t) is greater than or equal to θ, the triple is considered reasonable and added to the candidate set; otherwise it is considered unreasonable;
step S24: generating, through the confidence generator, a confidence with value range 0 to 1 for the triple candidate set obtained in step S23, forming new quadruples (h, r, t, w); a sketch of this screening-and-generation pipeline is given below.
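The following minimal sketch illustrates the screening-and-generation pipeline of steps S22-S24; evaluator_score and confidence_score stand for the evaluator score of steps S21-S22 and the confidence generator of step S24, and the function names are hypothetical:

```python
def predict_quadruples(triples, evaluator_score, confidence_score, theta=0.5):
    """Steps S22-S24: keep triples whose evaluator score reaches the threshold theta,
    then let the confidence generator assign a confidence w to each kept triple."""
    new_quads = []
    for (h, r, t) in triples:
        if evaluator_score(h, r, t) >= theta:      # step S23: rationality screening
            w = confidence_score(h, r, t)          # step S24: confidence generation
            new_quads.append((h, r, t, w))         # new quadruple (h, r, t, w)
    return new_quads

# usage with trivial stand-in scoring functions
quads = predict_quadruples([("e1", "r1", "e2"), ("e1", "r1", "e3")],
                           evaluator_score=lambda h, r, t: 0.8,
                           confidence_score=lambda h, r, t: 0.7)
```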
In this embodiment, the specific content of step S21 is:
first, the energy score of the triple is calculated with the DistMult model;
the energy function of the DistMult model is shown in formula (1);
E(h, r, t) = h^T diag(r) t    (1)
where diag(r) is the diagonal matrix of the relation, h^T is the transpose of the head entity vector, and t is the tail entity vector representation. Then Q_structure and Q_uncertain of the triple are obtained through different mapping functions and parameters.
The calculation of Q_structure is shown in formula (2):
Q_structure = 1/(1 + e^(-φ_structure(E(h, r, t))))    (2)
where E(h, r, t) is the energy score of the triple obtained by the DistMult model and φ_structure(·) is the mapping function from the triple energy score to the structure score; the mapping function is shown in formula (3):
φ_structure(E(h, r, t)) = P_structure · E(h, r, t) + b_structure    (3)
where P_structure is the mapping parameter from the energy score to the structure score and b_structure is a bias, both obtained by learning;
The calculation of Q_uncertain is shown in formula (4):
Q_uncertain = 1/(1 + e^(-φ_uncertain(E(h, r, t))))    (4)
where E(h, r, t) is the energy score of the triple obtained by the DistMult model and φ_uncertain(·) is the mapping function from the triple energy score to the uncertainty score; the mapping function is shown in formula (5):
φ_uncertain(E(h, r, t)) = P_uncertain · E(h, r, t) + b_uncertain    (5)
where P_uncertain is the mapping parameter from the energy score to the uncertainty score and b_uncertain is a bias, both obtained by learning;
During training, Q_structure of positive example triples tends towards 1 and Q_structure of negative example triples tends towards 0; Q_uncertain of positive example triples tends towards the true confidence w and Q_uncertain of negative example triples tends towards 0; for unknown facts inferred by the logic rules, the evaluator score is used to fit their confidence during training;
the loss function of the unknown facts participating in training is defined as formula (6)
Γ_UnFacts = Σ_{(h,r,t,w)∈RS} (λ·Q_structure + (1-λ)·Q_uncertain - w)²    (6)
Where RS is a set of unknown facts, QstructureAnd QuncertainStructure score and uncertainty score of unknown facts, respectively; lambda is a dynamic adjustment parameter and is obtained through learning;
the loss function of the evaluator consists of positive case loss, negative case loss and loss of unknown fact; is defined as formula (7)
ΓEvaluator=ΓposnegUnFacts (7)
Wherein
Γ_pos = Σ_{(h,r,t,w)∈S} [(Q_structure - 1)² + (Q_uncertain - w)²]    (8)
Γ_neg = Σ_{(h,r,t,0)∈S'} [(Q_structure)² + (Q_uncertain)²]    (9)
Γ_pos denotes the positive example loss, Γ_neg denotes the negative example loss, and Γ_UnFacts denotes the loss of the unknown facts; S is the positive example set and S' is the negative example set; the negative examples used in training are generated by randomly replacing head or tail entities, and the confidence of a negative example is set to 0; the negative example set is shown in formula (10):
S' = {(h1, r, t, 0) | h1 ∈ ε\h} ∪ {(h, r, t1, 0) | t1 ∈ ε\t}    (10).
in this embodiment, the specific content of step S22 is:
the linear weighted fusion mode is shown as formula (11):
score_add = α·Q_structure + β·Q_uncertain    (11)
where α + β = 1;
the multiplicative fusion mode is shown in formula (12):
score_mul = ε + (1-ε)·(Q_structure·Q_uncertain)    (12)
where ε is a smoothing hyperparameter; the multiplicative fusion means that Q_uncertain adjusts Q_structure accordingly: when Q_uncertain is small it lowers Q_structure, and conversely it raises Q_structure; when a triple has both a high Q_structure and a high Q_uncertain the computed score is large, and conversely it is small.
In this embodiment, the specific content of step S24 is:
the confidence generator generates a confidence w for the candidate set using formula (4), and the confidence w together with the candidate set obtained in step S23 forms new quadruples (h, r, t, w); the mapping-function parameters of the confidence generator differ from those of the evaluator, and the main difference is that only positive examples (no negative examples) are used when training the confidence generator.
The loss function of the confidence generator is defined as formula (13)
Γ_confidence = Σ_{(h,r,t,w)∈S} (Q_uncertain - w)²    (13)
Where S is the positive case set and w is the true confidence of the triplet.
The total loss function of the model is composed of the evaluator loss function and the confidence generator loss function and is defined as formula (14)
Ψ = Γ_confidence + Γ_Evaluator    (14)
where Γ_Evaluator is the loss of the evaluator and Γ_confidence is the loss of the confidence generator.
Preferably, in this embodiment, the prediction task of the uncertain knowledge graph needs to evaluate the reasonableness of the triples and give confidence of the reasonable triples. The embodiment designs two components of an evaluator and a confidence generator based on the existing embedded model. Wherein the evaluator is configured to evaluate the rationality of the triplet: unreasonable triples will be removed by the evaluator model and reasonable triples will be added to the candidate set. The confidence generator is used for generating confidence for the candidate set. Fig. 1 illustrates a process of SUKE for link prediction, and fig. 2 illustrates a training process of SUKE.
Preferably, in this embodiment, the prediction task on the uncertain knowledge graph requires evaluating the rationality of triples and predicting their confidence. The prediction task can therefore be decomposed into two subtasks: a fact rationality evaluation task and a confidence prediction task. The SUKE model is accordingly proposed on the basis of the existing deterministic embedding model DistMult. SUKE preserves the structural information and uncertainty information of knowledge and internally comprises an evaluator and a confidence generator. The evaluator evaluates the rationality of a fact according to its structural and uncertainty features and screens out unreasonable facts, thereby obtaining candidate facts; the confidence generator then generates a confidence for each candidate fact, which represents the probability that the entities hold the particular relation. For the fact rationality evaluation task, the evaluator defines a structure score and an uncertainty score for each triple. In addition, the evaluator further enhances its ability to distinguish positive from negative examples by using unknown facts inferred by probabilistic soft logic. For the confidence prediction task, the confidence generator generates a confidence for each triple.
Preferably, in order to fully utilize both the structural information and the uncertainty information of knowledge, this embodiment decomposes the prediction task on the uncertain knowledge graph into two subtasks, a fact rationality evaluation task and a confidence prediction task. The former evaluates the rationality of a fact according to its structural and uncertainty features and screens out unreasonable facts, thereby obtaining candidate facts; the latter generates a confidence for the candidate facts. To this end, this embodiment proposes the SUKE model on the basis of an existing deterministic embedding model. SUKE retains the structural information and uncertainty information of knowledge and internally includes two components, an evaluator and a confidence generator. For the fact rationality evaluation task, the evaluator defines a structure score and an uncertainty score for each triple; furthermore, the evaluator introduces unknown facts inferred by probabilistic soft logic into learning. The confidence generator generates a confidence for each triple for the confidence prediction task.
Preferably, in the present embodiment, the following definitions are performed:
Definition 1 (quadruple, T): let T = (h, r, t, w) denote a quadruple, where h denotes a head entity, r denotes a relation, t denotes a tail entity, and w denotes a confidence. A quadruple may also be referred to as a piece of knowledge or a fact.
Definition 2 (entity set, ε): an entity set ε = {e1, e2, ..., en} represents the set of all entities in a knowledge base.
Definition 3 (relation set, R): a relation set R = {r1, r2, ..., rn} represents the set of all relations in a knowledge base.
Definition 4 (confidence set, W): a confidence set W = {w1, w2, ..., wn} represents the set of all confidences in a knowledge base.
Definition 5 (uncertain knowledge graph, G): G = (ε, R, W), where ε represents the set of entities, R represents the set of relations, and W represents the set of confidences.
Definition 6 (structure score, Q_structure): Q_structure is obtained by learning the structure-score mapping.
Definition 7 (uncertainty score, Q_uncertain): Q_uncertain is obtained by learning the uncertainty-score mapping.
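For concreteness, Definitions 1 to 5 can be carried by a small data structure; the sketch below is illustrative only and the field names are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class UncertainKG:
    """Uncertain knowledge graph G per Definitions 1-5: an entity set, a relation set,
    and weighted facts (h, r, t, w) whose confidence w lies in [0, 1]."""
    entities: List[str]                        # entity set (Definition 2)
    relations: List[str]                       # relation set R (Definition 3)
    facts: List[Tuple[str, str, str, float]]   # quadruples T = (h, r, t, w) (Definition 1)

kg = UncertainKG(entities=["e1", "e2", "e3"],
                 relations=["r1", "r2"],
                 facts=[("e1", "r1", "e2", 0.9), ("e2", "r2", "e3", 0.6)])
```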
(I) Evaluator score
SUKE learns mappings from the energy score E(h, r, t) to Q_structure and to Q_uncertain, respectively. First, the energy score of the triple is calculated with the DistMult model, and then Q_structure and Q_uncertain of the triple are obtained through different mapping functions and parameters. During optimization, Q_structure of positive example triples is encouraged to tend towards 1 and Q_structure of negative example triples towards 0; Q_uncertain of positive example triples is encouraged to tend towards the true confidence w and Q_uncertain of negative example triples towards 0.
The calculation of Q_structure is shown in equation 1.
Q_structure = 1/(1 + e^(-φ_structure(E(h, r, t))))    (1)
where E(h, r, t) is the energy score of the triple obtained by the DistMult model and φ_structure(·) is the mapping function from the triple energy score to the structure score. The mapping function is shown in equation 2.
φ_structure(E(h, r, t)) = P_structure · E(h, r, t) + b_structure    (2)
where P_structure and b_structure are parameters.
The calculation of Q_uncertain is shown in equation 3.
Q_uncertain = 1/(1 + e^(-φ_uncertain(E(h, r, t))))    (3)
where E(h, r, t) is the energy score of the triple obtained by the DistMult model and φ_uncertain(·) is the mapping function from the triple energy score to the uncertainty score. The mapping function is shown in equation 4.
φ_uncertain(E(h, r, t)) = P_uncertain · E(h, r, t) + b_uncertain    (4)
where P_uncertain and b_uncertain are parameters.
The DistMult model is selected to calculate the energy scores of the triples for three reasons: (1) DistMult operates on matrices, so the computation is small and fast; (2) DistMult is less complex than other models; (3) DistMult achieves better performance on deterministic knowledge graphs than other models. The energy function of the DistMult model is shown in equation 5.
E(h, r, t) = h^T diag(r) t    (5)
where diag(r) is the diagonal matrix of the relation, h^T is the transpose of the head entity vector, and t is the tail entity vector representation.
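The scoring side of the evaluator can be sketched as follows. This is a minimal illustration: the logistic squashing used to keep Q_structure and Q_uncertain within [0, 1] is an assumption, while the linear mappings and the DistMult energy follow equations 2, 4 and 5; parameter values are toy values.

```python
import numpy as np

def distmult_energy(h, r, t):
    """Equation 5: E(h, r, t) = h^T diag(r) t, i.e. the sum of element-wise products."""
    return float(np.sum(h * r * t))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def q_structure(h, r, t, p_s=1.0, b_s=0.0):
    """Structure score: linear mapping of the energy score (equation 2),
    squashed to [0, 1] (the squashing itself is an assumption)."""
    return sigmoid(p_s * distmult_energy(h, r, t) + b_s)

def q_uncertain(h, r, t, p_u=1.0, b_u=0.0):
    """Uncertainty score: the analogous mapping with its own parameters (equation 4)."""
    return sigmoid(p_u * distmult_energy(h, r, t) + b_u)

rng = np.random.default_rng(0)                 # toy 4-dimensional embeddings
h, r, t = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)
print(q_structure(h, r, t), q_uncertain(h, r, t, p_u=0.8, b_u=0.1))
```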
(1) Fusion of the structure score and the uncertainty score
For fusing the structure score and the uncertainty score, this embodiment mainly considers two combination modes: linear weighted fusion and multiplicative fusion.
The linear weighted fusion method is shown in equation 6.
score_add = α·Q_structure + β·Q_uncertain    (6)
where α + β = 1, and the values of α and β determine the relative importance of the structure score and the uncertainty score.
When α = β = 0.5, the structure score and the uncertainty score are equally important; in the experiments the two scores were treated as equally important.
The multiplicative fusion method is shown in equation 7.
score_mul = ε + (1-ε)·(Q_structure·Q_uncertain)    (7)
where ε is the smoothing hyperparameter. The multiplicative fusion means that Q_uncertain adjusts Q_structure accordingly: when Q_uncertain is small it lowers Q_structure, and conversely it raises Q_structure. When a triple has both a high Q_structure and a high Q_uncertain the computed score is large, and otherwise it is small.
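A minimal sketch of the two fusion modes of equations 6 and 7 (argument names are illustrative):

```python
def score_add(q_s, q_u, alpha=0.5):
    """Linear weighted fusion (equation 6): alpha * Q_structure + beta * Q_uncertain
    with beta = 1 - alpha."""
    return alpha * q_s + (1.0 - alpha) * q_u

def score_mul(q_s, q_u, eps=0.1):
    """Multiplicative fusion (equation 7): a small Q_uncertain pulls the score down and
    a large one pushes it up; eps is the smoothing hyperparameter."""
    return eps + (1.0 - eps) * (q_s * q_u)

print(score_add(0.9, 0.3), score_mul(0.9, 0.3))
```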
(2) Probabilistic soft logic enhancement method
To enhance the learning capability of the evaluator, this embodiment lets unknown facts in the knowledge graph participate in training. By introducing heuristic rules, the probabilistic soft logic defined by UKGE is applied to obtain unknown facts. The rules mined from an uncertain knowledge graph are inherently uncertain, so the confidences of the inferred unknown fact triples are not very accurate; if these confidences were used directly, unnecessary noise would inevitably be introduced. Therefore, during training the inferred confidence of an unknown fact is used as the target that the evaluator score fits, and a dynamic adjustment parameter is introduced, which improves the generalization capability of the model. The loss function of the unknown facts is defined as equation 8.
Γ_UnFacts = Σ_{(h,r,t,w)∈RS} (λ·Q_structure + (1-λ)·Q_uncertain - w)²    (8)
where RS is the set of unknown facts, Q_structure and Q_uncertain are the structure score and uncertainty score of the unknown fact, respectively, w is its inferred confidence, and λ is a dynamic adjustment parameter obtained by learning.
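A minimal sketch of the unknown-fact loss of equation 8; the squared-error form below, in which a λ-weighted mix of the two scores is fitted to the inferred confidence, is an assumed reading consistent with the description above:

```python
def unknown_facts_loss(rs, q_structure_fn, q_uncertain_fn, lam=0.5):
    """Assumed form of equation 8: for each PSL-inferred fact (h, r, t, w) in RS,
    fit a lambda-weighted mix of the structure and uncertainty scores to w."""
    loss = 0.0
    for (h, r, t, w) in rs:
        score = lam * q_structure_fn(h, r, t) + (1.0 - lam) * q_uncertain_fn(h, r, t)
        loss += (score - w) ** 2
    return loss
```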
(3) Loss of the evaluator
The loss function of the evaluator consists of the positive example loss, the negative example loss, and the loss of the unknown facts.
It is defined as equation 9.
Γ_Evaluator = Γ_pos + Γ_neg + Γ_UnFacts    (9)
Wherein
Γ_pos = Σ_{(h,r,t,w)∈S} [(Q_structure - 1)² + (Q_uncertain - w)²]    (10)
Γ_neg = Σ_{(h,r,t,0)∈S'} [(Q_structure)² + (Q_uncertain)²]    (11)
Γ_pos denotes the positive example loss, Γ_neg denotes the negative example loss, and Γ_UnFacts denotes the loss of the unknown facts. S is the positive example set and S' is the negative example set. Negative examples are generated by randomly replacing head or tail entities, with the confidence of a negative example set to 0. The negative example set is shown in equation 12.
S' = {(h1, r, t, 0) | h1 ∈ ε\h} ∪ {(h, r, t1, 0) | t1 ∈ ε\t}    (12)
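A sketch of the evaluator loss of equation 9 with negative sampling as in equation 12; the squared-error terms for equations 10 and 11 are assumed forms that follow the stated training targets (positives: Q_structure toward 1 and Q_uncertain toward w; negatives: both toward 0):

```python
import random

def corrupt(h, r, t, entities):
    """Equation 12: replace the head or the tail with a different random entity;
    the confidence of the resulting negative example is fixed to 0."""
    if random.random() < 0.5:
        h = random.choice([e for e in entities if e != h])
    else:
        t = random.choice([e for e in entities if e != t])
    return h, r, t, 0.0

def evaluator_loss(positives, entities, q_s, q_u, unfacts_loss=0.0):
    """Equation 9: positive loss + negative loss + unknown-facts loss; the squared-error
    forms of the first two terms are assumptions consistent with the training targets."""
    loss = 0.0
    for (h, r, t, w) in positives:                       # assumed form of equation 10
        loss += (q_s(h, r, t) - 1.0) ** 2 + (q_u(h, r, t) - w) ** 2
    for (h, r, t, _) in positives:                       # assumed form of equation 11
        hn, rn, tn, _w = corrupt(h, r, t, entities)
        loss += q_s(hn, rn, tn) ** 2 + q_u(hn, rn, tn) ** 2
    return loss + unfacts_loss
```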
(II) Confidence generator
The confidence generator is intended to generate a confidence for the triples. To reduce the complexity and the number of parameters of the model, the confidence generator uses Q_uncertain to approximate the true confidence w of a triple. Unlike the evaluator, the confidence generator can be viewed as a confidence prediction model based on the triple energy score. The Q_uncertain of the confidence generator uses its own set of parameters, which is not shared with the evaluator. In addition, the confidence generator does not require negative example triples or unknown fact triples to participate in training.
The loss function of the confidence generator is defined as equation 13.
Γ_confidence = Σ_{(h,r,t,w)∈S} (Q_uncertain - w)²    (13)
Where S is the positive case set and w is the true confidence of the triplet.
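A sketch of the confidence generator loss of equation 13, assuming a squared-error fit of the generator's uncertainty score to the true confidence w over the positive set S only:

```python
def confidence_loss(positives, q_u_conf):
    """Assumed form of equation 13: sum of (Q_uncertain - w)^2 over the positive set S."""
    return sum((q_u_conf(h, r, t) - w) ** 2 for (h, r, t, w) in positives)
```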
(III) Loss function of the model
Since the overall model includes two components, the evaluator and the confidence generator, and both components are trained simultaneously, the loss function of the model is defined as equation (14).
Ψ = Γ_confidence + Γ_Evaluator    (14)
where Γ_Evaluator is the loss of the evaluator and Γ_confidence is the loss of the confidence generator.
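Reusing the loss sketches above, the joint objective of equation 14 is simply their sum; optimizer and batching details are not specified here and are left out:

```python
def total_loss(positives, entities, q_s, q_u_eval, q_u_conf, unfacts=0.0):
    """Equation 14: the evaluator loss and the confidence generator loss are summed,
    so both components are trained simultaneously."""
    return (evaluator_loss(positives, entities, q_s, q_u_eval, unfacts_loss=unfacts)
            + confidence_loss(positives, q_u_conf))
```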
Preferably, in the present embodiment, an embedded model SUKE for prediction in the uncertain knowledge graph is provided in the background of the uncertain knowledge graph.
The algorithm provided by the embodiment can fuse the structural information and the uncertain information of the knowledge to obtain knowledge representation.
The representation learning model provided by this embodiment can effectively complete the link prediction task through the evaluator and the confidence generator, and thereby realize knowledge graph completion.
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.

Claims (1)

1. An uncertain knowledge graph prediction method based on an improved embedded model SUKE is characterized by comprising the following steps: the method comprises the following steps:
step S1: giving an uncertain knowledge graph which internally comprises a plurality of quadruples (h, r, t, w), where h denotes a head entity, t denotes a tail entity, r denotes the relation between the head and tail entities, and w denotes the probability that the triple (h, r, t) holds; expanding the number of original quadruples through the probabilistic soft logic reasoning method defined in the UKGE model; finally, dividing the expanded quadruples into a 60% training set, a 20% validation set and a 20% test set for training the SUKE model; the input of the SUKE model is the vector representations of h, r and t, and the vectors of entities and relations in the knowledge base are either pre-trained with the TransE algorithm or randomly initialized;
step S2: constructing the embedded model SUKE: designing two components, an evaluator and a confidence generator, wherein the evaluator is used to evaluate the rationality of triples (h, r, t); unreasonable triples are removed by the evaluator and the reasonable triples form the candidate set, and the confidence generator generates a confidence for the candidate set so as to obtain quadruples (h, r, t, w); training the evaluator and the confidence generator through loss functions for subsequent prediction;
step S3: adding the obtained quadruples (h, r, t, w) to the original uncertain knowledge graph; here link prediction is defined as: given an incomplete quadruple (h, r, ?, ?), predicting its missing tail entity and its confidence; adding the new quadruples obtained by link prediction to the original uncertain knowledge graph to make the knowledge graph more complete; during prediction, for a query (h, r, ?), the evaluator screens candidate tail entities and the confidence generator assigns a confidence to each retained candidate, yielding the new quadruples;
the step S2 specifically includes the following steps:
step S21: calculating the triple energy score through the DistMult model, and obtaining through training the mapping functions from the energy score to the structure score and to the uncertainty score, respectively;
step S22: fusing the structure score and the uncertainty score linearly or multiplicatively to obtain an evaluator score used to evaluate the rationality of each triple, wherein reasonable triples obtain a high evaluator score and the evaluator score lies between 0 and 1;
step S23: giving a triple threshold θ with value range 0 to 1 to judge whether a triple is reasonable; if the evaluator score of a triple (h, r, t) is greater than or equal to θ, the triple is considered reasonable and added to the candidate set; otherwise it is considered unreasonable;
step S24: generating, through the confidence generator, a confidence with value range 0 to 1 for the triple candidate set obtained in step S23, forming new quadruples (h, r, t, w);
the specific content of step S21 is:
first, the energy score of the triple is calculated with the DistMult model;
the energy function of the DistMult model is shown in formula (1);
E(h, r, t) = h^T diag(r) t    (1)
where diag(r) is the diagonal matrix of the relation, h^T is the transpose of the head entity vector, and t is the tail entity vector representation;
then Q_structure and Q_uncertain of the triple are obtained through different mapping functions and parameters;
the calculation of Q_structure is shown in formula (2):
Q_structure = 1/(1 + e^(-φ_structure(E(h, r, t))))    (2)
where E(h, r, t) is the energy score of the triple obtained by the DistMult model and φ_structure(·) is the mapping function from the triple energy score to the structure score; the mapping function is shown in formula (3):
φ_structure(E(h, r, t)) = P_structure · E(h, r, t) + b_structure    (3)
where P_structure is the mapping parameter from the energy score to the structure score and b_structure is a bias; the calculation of Q_uncertain is shown in formula (4):
Q_uncertain = 1/(1 + e^(-φ_uncertain(E(h, r, t))))    (4)
where E(h, r, t) is the energy score of the triple obtained by the DistMult model and φ_uncertain(·) is the mapping function from the triple energy score to the uncertainty score; the mapping function is shown in formula (5):
φ_uncertain(E(h, r, t)) = P_uncertain · E(h, r, t) + b_uncertain    (5)
where P_uncertain is the mapping parameter from the energy score to the uncertainty score and b_uncertain is a bias;
during training, Q_structure of positive example triples tends towards 1 and Q_structure of negative example triples tends towards 0; Q_uncertain of positive example triples tends towards the true confidence w and Q_uncertain of negative example triples tends towards 0; for unknown facts inferred by the logic rules, the evaluator score is used to fit their confidence during training;
the loss function of the unknown facts participating in training is defined as formula (6)
Γ_UnFacts = Σ_{(h,r,t,w)∈RS} (λ·Q_structure + (1-λ)·Q_uncertain - w)²    (6)
where RS is the set of unknown facts, Q_structure and Q_uncertain are the structure score and uncertainty score of the unknown fact, respectively, w is its inferred confidence, and λ is a dynamic adjustment parameter;
the loss function of the evaluator consists of the positive example loss, the negative example loss and the loss of the unknown facts; it is defined as formula (7)
Γ_Evaluator = Γ_pos + Γ_neg + Γ_UnFacts    (7)
Wherein
Γ_pos = Σ_{(h,r,t,w)∈S} [(Q_structure - 1)² + (Q_uncertain - w)²]    (8)
Γ_neg = Σ_{(h,r,t,0)∈S'} [(Q_structure)² + (Q_uncertain)²]    (9)
Γ_pos denotes the positive example loss, Γ_neg denotes the negative example loss, and Γ_UnFacts denotes the loss of the unknown facts; S is the positive example set and S' is the negative example set; the negative examples used in training are generated by randomly replacing head or tail entities, and the confidence of a negative example is set to 0; the negative example set is shown in formula (10):
S' = {(h1, r, t, 0) | h1 ∈ ε\h} ∪ {(h, r, t1, 0) | t1 ∈ ε\t}    (10);
the specific content of step S22 is:
the linear weighted fusion mode is shown as formula (11):
score_add = α·Q_structure + β·Q_uncertain    (11)
where α + β = 1;
the multiplicative fusion mode is shown in formula (12):
score_mul = ε + (1-ε)·(Q_structure·Q_uncertain)    (12)
where ε is a smoothing hyperparameter; the multiplicative fusion means that Q_uncertain adjusts Q_structure accordingly: when Q_uncertain is small it lowers Q_structure, and conversely it raises Q_structure; when a triple has both a high Q_structure and a high Q_uncertain the computed score is large, and conversely it is small;
the specific content of step S24 is:
the confidence generator generates a confidence w for the candidate set using formula (4), and the confidence w together with the candidate set obtained in step S23 forms new quadruples (h, r, t, w); however, the mapping-function parameters used by the confidence generator differ from those used by the evaluator, and the main difference is that only positive examples (no negative examples) are used when training the confidence generator;
the loss function of the confidence generator is defined as formula (13)
Γ_confidence = Σ_{(h,r,t,w)∈S} (Q_uncertain - w)²    (13)
Wherein S is a positive example set, and w is the true confidence of the triple;
the total loss function of the model is composed of the evaluator loss function and the confidence generator loss function and is defined as formula (14)
Ψ = Γ_confidence + Γ_Evaluator    (14)
where Γ_Evaluator is the loss of the evaluator and Γ_confidence is the loss of the confidence generator.
CN202011159784.4A 2020-10-26 2020-10-26 Uncertain knowledge graph prediction method based on improved embedded model SUKE Active CN112348190B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011159784.4A CN112348190B (en) 2020-10-26 2020-10-26 Uncertain knowledge graph prediction method based on improved embedded model SUKE

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011159784.4A CN112348190B (en) 2020-10-26 2020-10-26 Uncertain knowledge graph prediction method based on improved embedded model SUKE

Publications (2)

Publication Number Publication Date
CN112348190A CN112348190A (en) 2021-02-09
CN112348190B true CN112348190B (en) 2022-06-21

Family

ID=74359024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011159784.4A Active CN112348190B (en) 2020-10-26 2020-10-26 Uncertain knowledge graph prediction method based on improved embedded model SUKE

Country Status (1)

Country Link
CN (1) CN112348190B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529184B (en) 2021-02-18 2021-07-02 中国科学院自动化研究所 Industrial process optimization decision method fusing domain knowledge and multi-source data
CN112836064B (en) * 2021-02-24 2023-05-16 吉林大学 Knowledge graph completion method and device, storage medium and electronic equipment
CN113205082B (en) * 2021-06-22 2021-10-15 中国科学院自动化研究所 Robust iris identification method based on acquisition uncertainty decoupling


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10423726B2 (en) * 2018-01-10 2019-09-24 International Business Machines Corporation Machine learning to integrate knowledge and natural language processing
US11789991B2 (en) * 2019-01-24 2023-10-17 Accenture Global Solutions Limited Compound discovery via information divergence with knowledge graphs
CN109933674B (en) * 2019-03-22 2021-06-04 中国电子科技集团公司信息科学研究院 Attribute aggregation-based knowledge graph embedding method and storage medium thereof
CN110147450B (en) * 2019-05-06 2021-08-03 北京科技大学 Knowledge complementing method and device for knowledge graph
CN110851613A (en) * 2019-09-09 2020-02-28 中国电子科技集团公司电子科学研究院 Method and device for complementing, deducing and storing knowledge graph based on entity concept
CN110795511A (en) * 2019-10-30 2020-02-14 南京工业大学 Knowledge graph representation method based on cloud model
CN111144570B (en) * 2019-12-27 2022-06-21 福州大学 Knowledge representation method combining logic rules and confidence degrees

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597358A (en) * 2020-07-22 2020-08-28 中国人民解放军国防科技大学 Knowledge graph reasoning method and device based on relational attention and computer equipment

Also Published As

Publication number Publication date
CN112348190A (en) 2021-02-09


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant