CN112507130A

CN112507130A - Triple credibility evaluation method based on multi-source knowledge graph

Info

Publication number: CN112507130A
Application number: CN202011438775.9A
Authority: CN
Inventors: 王萌; 秦旭; 陆保国; 漆桂林; 刘哲一; 姚茜
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2020-12-10
Filing date: 2020-12-10
Publication date: 2021-03-16
Anticipated expiration: 2040-12-10
Also published as: CN112507130B

Abstract

The invention discloses a triple credibility assessment method based on a multi-source knowledge graph, comprising the following steps: S1, for a given multi-source knowledge graph, obtaining vector representations of the entity names, entity types, attribute keys and relationship names, and obtaining vector representations of the attribute values in an embedding space; S2, evaluating the credibility of the triples within a single knowledge graph; S3, evaluating the credibility of each single knowledge-graph data source in the multi-source knowledge graph; S4, evaluating the credibility of the triples in a common space under the multi-source setting, where the credibility is jointly determined by the credibility obtained within a single knowledge graph and the mutual influence among the knowledge graphs, and this mutual influence is determined by triples in different data sources that influence each other, namely the interactive triples; and S5, training the model based on the multi-source knowledge-graph credibility evaluation. The method improves the accuracy of triple credibility evaluation.

Description

Triple credibility evaluation method based on multi-source knowledge graph

Technical Field

The invention belongs to the technical field of evaluating the credibility of triple data using knowledge graphs, and particularly relates to a credibility evaluation method based on a multi-source knowledge graph.

Background

With the wide application of knowledge graphs, knowledge graphs have been built for many different domains. Analyzing and selecting highly reliable data when multiple data sources are available is currently a difficult problem, for three reasons. First, the construction of a knowledge graph cannot completely avoid introducing erroneous information: the source texts used for information extraction contain errors that are not screened out by quality evaluation during construction, and the biases of automatic extraction techniques generate further errors during extraction. Such erroneous information can greatly degrade the performance of the knowledge graph in downstream services. Second, when multiple data sources exist, different knowledge graphs usually differ in credibility, so the credibility of multi-source data must be computed by comparing and analyzing the sources jointly. Finally, data sources change dynamically, and the changed content can affect existing evaluations of the graph. Credibility assessment of knowledge-graph data sources and of specific data is therefore an indispensable technology for ensuring the reliability of military knowledge graphs.

Chinese patent CN111460155A, "Method and apparatus for evaluating information credibility based on knowledge graph", discloses a knowledge-graph-based credibility evaluation method and apparatus. The method includes: extracting a target triple; replacing the relation in the target triple with relations from a pre-constructed knowledge graph to obtain replacement triples; computing, with a pre-trained vector representation model, the Manhattan distances between the vectors of the head entity, the relation and the tail entity for the target triple and for each replacement triple; and ranking the replacement triples and the target triple by the computed distances to calculate the credibility score of the target triple.

The limitations of the above patent are:

1) it does not make full use of semantic information and the topological properties of the graph;

2) it targets a single knowledge graph and does not address the multi-source knowledge-graph setting.

Disclosure of Invention

The technical problem to be solved by the invention is to overcome the above defects of the prior art by providing a triple credibility assessment method based on a multi-source knowledge graph, which takes multi-modal data as its data basis, fuses the features of different dimensions present in the multi-modal data, and provides a cross-modal learning-to-rank model.

The invention adopts the following technical scheme:

A triple credibility assessment method based on a multi-source knowledge graph comprises the following steps:

S1, for a given multi-source knowledge graph, placing the entity names, entity types, attribute keys and relationship names into a common space by continuous numbering, to obtain their respective vector representations; meanwhile, performing representation learning on the attribute values of the multi-source knowledge graph to obtain vector representations of the attribute values in an embedding space;

S2, evaluating the credibility of the triples within a single knowledge graph using the intrinsic characteristics of the knowledge graph, including semantic information and graph structure information; the vectors obtained in step S1 are passed through a scoring function to obtain a value between 0 and 1 as the credibility of each triple;

S3, evaluating the credibility of each single knowledge-graph data source in the multi-source knowledge graph; the credibility of a source is determined by the credibility of all triples in that source, and in turn the credibility of the knowledge graph influences the credibility of the triples it contains;

S4, evaluating the credibility of the triples in a common space under the multi-source setting; this credibility is jointly determined by the credibility obtained within the single knowledge graph in step S2 and the mutual influence among the knowledge graphs, and the mutual influence is determined by interactive triples, i.e., triples in different data sources that influence each other;

and S5, training the model based on the multi-source knowledge-graph credibility evaluation.

The specific process of step S1 is as follows:

S101, for a given multi-source knowledge graph, continuously indexing the entity names, entity types, attribute keys and relationship names in a unified numbering space, and at the same time generating a mapping from the indexes in the unified numbering space to the names of the data sources they come from;

S102, collecting the text set of all attribute values of the given multi-source knowledge graph in the common space; after word segmentation of the text set, obtaining vector representations of the attribute values in an attribute embedding space by a representation learning method, specifically doc2vec.
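Step S101 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the input layout and the names `build_unified_index` and `tokenize_attribute_values` are hypothetical, every name is keyed by its source graph (the source states this explicitly only for entity names), and whitespace splitting stands in for the word segmentation of S102. The doc2vec training itself is not shown; a library implementation such as gensim's `Doc2Vec` class could be fed the token lists.

```python
def build_unified_index(graphs):
    """Assign one contiguous id range to every name in every source graph.

    graphs: {source_name: {"entities": [...], "types": [...],
                           "attr_keys": [...], "relations": [...]}}
    The same surface name in two sources receives two ids, because each key
    includes its source. Returns (index, source_of), where
    index[(kind, source, name)] -> id and source_of[id] -> source name.
    """
    index = {}
    source_of = {}
    for kind in ("entities", "types", "attr_keys", "relations"):
        for source, graph in graphs.items():
            for name in graph.get(kind, ()):
                key = (kind, source, name)
                if key not in index:          # skip duplicates within a source
                    new_id = len(index)       # ids stay contiguous: 0, 1, 2, ...
                    index[key] = new_id
                    source_of[new_id] = source
    return index, source_of


def tokenize_attribute_values(values):
    """Whitespace word segmentation of attribute-value strings (assumed)."""
    return [value.lower().split() for value in values]
```

The `source_of` table is the index-to-data-source mapping that S101 asks for; later steps can use it to tell which graph G an entity or relationship id belongs to.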

In step S2, the credibility evaluation of the triples within a single knowledge graph proceeds as follows:

S201, four kinds of intrinsic knowledge-graph features are obtained (the entity and relationship embeddings are counted together as one kind):
1) the fused attribute information of an entity is generated by fusing the vector representation of an attribute key, obtained from its index, with the vector representation of the corresponding attribute value in the attribute embedding space; if an entity has several attribute key-value pairs, these representations are averaged to obtain the fused attribute information of the entity;
2) the type constraint of a relationship counts the entity types appearing on both sides of the relationship and uses these statistics to judge how well the entities and the relationship in a triple fit together;
3) the embedding information of an entity is a vector representation carrying deep semantic information, obtained by passing the entity index through an embedding layer;
4) the embedding information of a relationship is a vector representation carrying deep semantic information, obtained by passing the relationship index through an embedding layer;
5) the graph energy transfer information treats the whole knowledge graph as a heterogeneous neural network focused on graph features, analyzes the energy features of the graph through the in-degree and out-degree information of the nodes and the PageRank algorithm, and measures how well a triple fits the characteristics of the whole graph;

S202, the pieces of information belonging to the same entity or relationship are connected through a fully connected neural-network layer to obtain the internal information representation of that entity or relationship: 1) the internal information representation of an entity fuses its fused attribute information, embedding information and graph energy transfer information; 2) the internal information representation of a relationship fuses its type constraint and embedding information;

S203, the internal information representations of the head entity, the relationship and the tail entity are passed through a scoring function to obtain a value between 0 and 1, which is taken as the credibility of the triple.
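Two of the S201 features can be illustrated with a short Python sketch. It is hedged: element-wise addition is an assumed fusion operator (the source fuses through a neural network), and `type_constraint` scores a triple by the empirical frequency of its head type on the head side and its tail type on the tail side of the relationship, which is one plausible reading of "judging the coupling degree", not the patent's formula. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def fuse_attributes(pairs):
    """Average (key vector + value vector) over all attribute pairs of one entity.

    pairs: list of (key_vec, value_vec), equal-length float lists.
    Element-wise addition is an assumed fusion operator.
    """
    if not pairs:
        return None
    dim = len(pairs[0][0])
    total = [0.0] * dim
    for key_vec, value_vec in pairs:
        for i in range(dim):
            total[i] += key_vec[i] + value_vec[i]
    return [x / len(pairs) for x in total]


def relation_type_stats(triples, entity_type):
    """Count the entity types seen on the head side and tail side of each relation."""
    head_types = defaultdict(Counter)
    tail_types = defaultdict(Counter)
    for head, rel, tail in triples:
        head_types[rel][entity_type[head]] += 1
        tail_types[rel][entity_type[tail]] += 1
    return head_types, tail_types


def type_constraint(head, rel, tail, entity_type, head_types, tail_types):
    """Hypothetical coupling score in [0, 1]: product of the two side frequencies."""
    hc, tc = head_types[rel], tail_types[rel]
    if not hc or not tc:
        return 0.0
    p_head = hc[entity_type[head]] / sum(hc.values())
    p_tail = tc[entity_type[tail]] / sum(tc.values())
    return p_head * p_tail
```

A triple whose entity types have rarely or never appeared around its relationship receives a low constraint value, which S202 can then feed into the relationship's internal representation.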

In step S201, the graph energy transfer information is calculated as follows:

The graph energy transfer information is modeled with a PageRank method. Because the iteration of energy in the graph exhibits a long-tail phenomenon, i.e., most entities have low energy, the additional graph features of in-degree and out-degree must also be considered. The energy between the head entity e_s and the tail entity e_o of a triple is calculated as follows:

In the formula, the degree term represents the in-degree and out-degree information determined by the head entity e_s and the tail entity e_o; E(e_s, e_o) and E(e_s) respectively represent the energy transferred from the head entity e_s to the tail entity e_o and the energy of the head entity e_s in the PageRank algorithm; and a hyperparameter balances the in/out-degree information against the PageRank information.
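Because the exact formula is not reproduced here, the following Python sketch only illustrates the ingredients named above under explicit assumptions: a plain power-iteration PageRank supplies E(e_s); E(e_s, e_o) is taken as the energy the head sends along its edges to the tail in one PageRank step; the PageRank term is the ratio E(e_s, e_o)/E(e_s); the degree term is a hypothetical log-scaled sum of the head's out-degree and the tail's in-degree; and `lam` plays the role of the balancing hyperparameter.

```python
import math


def pagerank(nodes, edges, damping=0.85, iters=50):
    """Power-iteration PageRank over directed (src, dst) edges.

    Dangling nodes (no out-edges) spread their energy uniformly, so the
    total energy stays 1.
    """
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for dst in out[v]:
                    new[dst] += share
            else:
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank


def energy_feature(head, tail, nodes, edges, lam=0.5, damping=0.85):
    """Hypothetical mix of a degree term and a PageRank energy-ratio term."""
    rank = pagerank(nodes, edges, damping)
    out_deg = {v: 0 for v in nodes}
    in_deg = {v: 0 for v in nodes}
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    n_edges = sum(1 for src, dst in edges if src == head and dst == tail)
    # E(e_s, e_o): energy sent from head to tail in one PageRank step
    e_st = damping * rank[head] * n_edges / out_deg[head] if out_deg[head] else 0.0
    e_s = rank[head]  # E(e_s): energy of the head entity (always > 0 here)
    pr_term = e_st / e_s
    # degree term compensates for the long tail of low-energy entities
    deg_term = math.log1p(out_deg[head]) + math.log1p(in_deg[tail])
    return lam * deg_term + (1.0 - lam) * pr_term
```

On a directed 3-cycle every node keeps energy 1/3, so with `lam=0.0` an existing edge scores the damping factor 0.85 and a non-edge scores 0.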

In step S203, the credibility of a triple is calculated as follows:

The internal information representations of the entities and the relationship are fused, and the credibility of the triple is obtained through a scoring function that maps the result into the interval between 0 and 1: the more credible the triple, the closer the score is to 1, and otherwise the closer it is to 0. The specific function is as follows:

where A(·) denotes the credibility score of the triple within a single knowledge graph, G indicates that the triple belongs to knowledge graph G, r denotes the relationship between the head entity e_s and the tail entity e_o of the triple, the remaining terms are the internal information representations of the relationship r, the head entity e_s and the tail entity e_o, "·" denotes the dot product operation, and σ(·) denotes an activation function.
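Steps S202 and S203 can be sketched together in plain Python. The assumptions are labeled: the feature vectors of one entity or relationship are concatenated and passed through a single fully connected layer with a tanh activation, and the score is a DistMult-style trilinear dot product squashed by a sigmoid, which is one way to combine three internal representations with a dot product and σ(·); the source does not confirm this exact form. Function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(x, weight, bias):
    """Fully connected layer y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def internal_repr(features, weight, bias):
    """S202 sketch: concatenate the feature vectors of one entity or
    relationship, then apply one fully connected layer with tanh."""
    x = [v for f in features for v in f]
    return [math.tanh(y) for y in dense(x, weight, bias)]


def triple_score(h_int, r_int, t_int):
    """S203 sketch (DistMult-style assumption): sigma(sum_i h_i * r_i * t_i)."""
    return sigmoid(sum(h * r * t for h, r, t in zip(h_int, r_int, t_int)))
```

The sigmoid keeps every score strictly between 0 and 1, matching the requirement that credible triples approach 1 and non-credible ones approach 0.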

Step S3 is specifically as follows:

S301, the credibility of the triples in a single knowledge graph is influenced by the credibility of the data source, so the credibility of a triple under the multi-source setting is calculated as:

In the formula, one term denotes the credibility score of triple t in knowledge graph G, where t is shorthand for the triple (e_s, r, e_o); W_G denotes the credibility of knowledge graph G;

and the resulting score denotes the credibility of triple t under the influence of the credibility of knowledge graph G;

S302, the credibility of a single knowledge-graph data source is determined by the credibility of all triples in that source: the credibility of each knowledge graph is initialized to 1 and thereafter determined by its triples, so the credibility of a data source under the multi-source setting is calculated as:

In the formula, n_G denotes the number of triples in knowledge graph G, and t ∈ G ranges over all triples belonging to knowledge graph G.

In step S4, the triples that influence each other across data sources act as follows:

three interaction methods model how a triple is influenced, at the attribute-value level, the semantic-alignment level and the graph-neighbor level, respectively:

S401, attribute-value interactive triples: two entities referring to the same real-world object have highly similar attributes and influence each other. Attribute-value interaction is defined on the entities of the triples, so the attribute-value interaction score is calculated as:

where ITE_value(·) denotes the attribute-value interaction scoring function between triples, |·| denotes the number of elements in a set, p(·) denotes the set of attribute values of an entity, and c(·) denotes the set of attribute values shared between entities;
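An attribute-value interaction score can be sketched in Python with the three ingredients named above (|·|, p(·) and c(·)). The concrete form is an assumption: a Jaccard-style distance 1 − |c(a, b)| / |p(a) ∪ p(b)| per entity pair, averaged over the head pair and the tail pair, so that smaller values mean stronger interaction, consistent with the "below a threshold" test used later. Names are hypothetical.

```python
def attr_value_distance(vals_a, vals_b):
    """Hypothetical per-entity score: 1 - |c(a, b)| / |p(a) | p(b)|."""
    union = vals_a | vals_b
    if not union:
        return 1.0                      # no attributes: no evidence of interaction
    common = vals_a & vals_b            # c(a, b): shared attribute values
    return 1.0 - len(common) / len(union)


def ite_value(triple_i, triple_j, attr_values):
    """Average of the head-pair and tail-pair distances (assumed aggregation).

    triple_*: (head, relation, tail); attr_values: entity -> set of values (p).
    """
    (hi, _, ti), (hj, _, tj) = triple_i, triple_j
    d_head = attr_value_distance(attr_values.get(hi, set()), attr_values.get(hj, set()))
    d_tail = attr_value_distance(attr_values.get(ti, set()), attr_values.get(tj, set()))
    return (d_head + d_tail) / 2.0
```

For two film triples whose heads share all attribute values and whose tails share one of two values, the score is (0 + 0.5) / 2 = 0.25.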

S402, semantic-alignment interactive triples: in the embedding space, semantically aligned entities or relationships are closely related and influence each other. The distance between semantically aligned pairs in the embedding space is calculated as follows.

Consider a triple of knowledge graph G₁ and a triple of knowledge graph G₂.

If the two triples are semantically aligned and interactive, at least one entity or relationship of the two triples is close in the embedding space, and the interactive triples further require that the distance between the non-interacting parts does not exceed the threshold. The distance interaction score is therefore calculated as:

where ITE_alignment(·) denotes the semantic-alignment interaction function between triples, and ‖·‖ denotes the L2 distance between an entity pair or a relationship pair in the embedding space.

For a vector $\mathbf{x} = (x_1, \dots, x_n)$, the L2 norm is $\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$; for vectors $\mathbf{x}$ and $\mathbf{y}$, the L2 distance is $\|\mathbf{x} - \mathbf{y}\|_2 = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$.
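The alignment condition can be sketched in Python using the L2 distance defined above. The aggregation is an assumption: the closest of the three component pairs (head-head, relation-relation, tail-tail) is treated as the interacting part and its distance is the score, while the other two pairs must lie within the threshold `tau`; otherwise the pair is not an alignment interaction and infinity is returned. It also assumes both graphs share one embedding space, as set up in S1. Names are hypothetical.

```python
import math


def l2_distance(x, y):
    """||x - y||_2 for equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def ite_alignment(triple_i, triple_j, emb, tau):
    """Hypothetical ITE_alignment between (head, relation, tail) id triples.

    emb: id -> vector in the shared embedding space.
    """
    dists = sorted(l2_distance(emb[a], emb[b]) for a, b in zip(triple_i, triple_j))
    if dists[-1] > tau:          # a non-interacting part is too far apart
        return math.inf
    return dists[0]              # distance of the closest (interacting) part
```

Returning infinity for disqualified pairs means the later threshold test on this score simply fails for them.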

S403, graph-neighbor interactive triples: according to research on semantic alignment similarity, entities that share similar surroundings are more likely to be semantically similar and influence each other.

The distance between entities in similar surroundings is calculated as follows:

by evaluating the surroundings of the head entity and the tail entity, the neighbor interaction between a pair of triples can be analyzed, so the graph-neighbor interaction score is calculated as:

where ITE_neighbor(·) denotes the graph-neighbor interaction scoring function between triples, and N(·) denotes the mean of an entity's neighbors in the embedding space;
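A graph-neighbor score built from N(·) can be sketched in Python. The combination is an assumption: the score is the L2 distance between the neighbor means of the two head entities plus that between the two tail entities, so triples whose entities sit in similar surroundings get small scores. The adjacency layout and function names are hypothetical.

```python
import math


def neighbor_mean(entity, adjacency, emb):
    """N(e): mean embedding of e's neighbours (zero vector if it has none)."""
    nbrs = adjacency.get(entity, [])
    dim = len(next(iter(emb.values())))
    if not nbrs:
        return [0.0] * dim
    return [sum(emb[n][k] for n in nbrs) / len(nbrs) for k in range(dim)]


def ite_neighbor(triple_i, triple_j, adjacency, emb):
    """Hypothetical ITE_neighbor: head-pair plus tail-pair neighbour-mean distances."""
    def dist(a, b):
        na = neighbor_mean(a, adjacency, emb)
        nb = neighbor_mean(b, adjacency, emb)
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)))

    return dist(triple_i[0], triple_j[0]) + dist(triple_i[2], triple_j[2])
```

Using the neighbor mean rather than the entity's own vector is what lets two entities that have never been aligned still interact through shared context.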

S404, influence mode of interactive triples: triples in different knowledge graphs that are interactive influence each other.

A pair of triples is judged to be interactive if any of its interaction scores is below the corresponding threshold, calculated as:

where I(·,·) returns 1 if the two triples are interactive and 0 otherwise, 1(·) is the indicator function, equal to 1 when its condition holds and 0 otherwise, θ_value denotes the threshold of the attribute-value interaction score, θ_alignment the threshold of the semantic-alignment interaction score, and θ_neighbour the threshold of the graph-neighbor interaction score. Interactive triples influence each other as follows: if their credibility scores are close, i.e., the difference is below the critical value, the scores are drawn gradually closer; if the difference between the credibility scores of two interactive triples exceeds the critical value, the credibility scores of both triples are reduced. To use this information when training the representation learning model, it is modeled with the following custom loss:

where R(i, j) denotes the credibility loss of a pair of interactive triples, i belongs to G₁, j belongs to G₂, i and j are interactive triples, and int(th) is the difference-control threshold; if two triples are not interactive, this loss term is not considered.
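The decision rule and the influence rule can be sketched in Python. `is_interactive` reads "any interaction score below its threshold" as "at least one", which is an interpretation of the translated text. `interaction_loss` is a hypothetical stand-in for R(i, j), since the exact loss is not reproduced here: when the gap between the two scores is below `th_int` it penalizes the gap (pulling the scores together), otherwise it penalizes the sum (pushing both scores down), and it is zero for non-interactive pairs.

```python
def is_interactive(s_value, s_align, s_nbr, th_value, th_align, th_nbr):
    """I(i, j): 1 if at least one interaction score is below its threshold, else 0."""
    return int(s_value < th_value or s_align < th_align or s_nbr < th_nbr)


def interaction_loss(score_i, score_j, interactive, th_int):
    """Hypothetical R(i, j) for a triple i in G1 and a triple j in G2."""
    if not interactive:
        return 0.0                     # non-interactive pairs contribute nothing
    gap = abs(score_i - score_j)
    if gap < th_int:
        return gap                     # close scores: pull them together
    return score_i + score_j           # conflicting scores: lower both
```

Minimizing the sum in the conflicting case lowers both credibility scores, which mirrors the stated behavior that both triples become less trusted when interactive sources disagree.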

In step S5, the training based on the multi-source knowledge-graph credibility evaluation is as follows:

S501, within each graph of the multi-source knowledge graph, compute the margin loss between the credibility of the positive triples and that of the negative triples obtained from them by negative sampling. Let S denote the set of all knowledge graphs. For every knowledge graph G ∈ S and every positive triple t⁺ in G (credibility label 1), negative sampling randomly replaces its head entity or tail entity to obtain a negative triple t⁻ (expected credibility label 0); their credibility evaluation results are respectively

and

so the loss is calculated as:

where L_rel denotes the negative-sampling loss of the triples within each graph of the multi-source knowledge graph, max(·,·) returns the larger of its two arguments, γ denotes the margin between the credibility results of positive and negative triples, and c(t⁺) denotes the set of negative triples t⁻ obtained by negative sampling of the positive triple t⁺;
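Negative sampling and a margin loss built from max(·,·) and γ can be sketched in Python. This assumes the standard margin-ranking form, summing max(0, γ − A(t⁺) + A(t⁻)) over each positive triple and its negatives in c(t⁺); the source's exact formula is not reproduced here. The sampler does not filter out corrupted triples that happen to exist in the graph, which a real pipeline may need to do. Function names are hypothetical.

```python
import random


def corrupt(triple, entities, rng):
    """Replace either the head or the tail with a different random entity."""
    head, rel, tail = triple
    if rng.random() < 0.5:
        head = rng.choice([e for e in entities if e != head])
    else:
        tail = rng.choice([e for e in entities if e != tail])
    return (head, rel, tail)


def margin_loss(pos_scores, neg_scores_per_pos, gamma):
    """Assumed L_rel: sum over t+ and t- in c(t+) of max(0, gamma - A(t+) + A(t-))."""
    total = 0.0
    for a_pos, negs in zip(pos_scores, neg_scores_per_pos):
        for a_neg in negs:
            total += max(0.0, gamma - a_pos + a_neg)
    return total
```

A negative contributes nothing once the positive outscores it by at least γ; for example, with γ = 0.5 and A(t⁺) = 0.9, a negative scored 0.2 adds 0 and one scored 0.8 adds 0.4.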

S502, compute the interaction influence loss of all interactive triples across the graphs of the multi-source knowledge graph:

where L_int denotes the interaction influence loss of all interactive triples, and In denotes the set of all interactive triples;

S503, the two losses are fused into the total training loss L: a hyperparameter β balances the within-graph negative-sampling loss L_rel against the interaction loss L_int of all interactive triples, and a regularization term is added to avoid overfitting, so the calculation formula is:

where Ω denotes the set of all parameters, W denotes any parameter of the model, and ‖·‖₂ denotes the L2 regularization function; for a parameter vector $\mathbf{w}$ or a parameter matrix $\mathbf{W}$,

its value is

$\|\mathbf{w}\|_2 = \sqrt{\sum_i w_i^2}$ or $\|\mathbf{W}\|_2 = \sqrt{\sum_{i,j} W_{ij}^2}$, respectively;

S504, after the loss is calculated, back-propagate it to update the model parameters, and repeat the evaluation until the model converges.
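The loss fusion of S503 can be sketched in Python. The combination is an assumption: L = L_rel + β·L_int + λ·Σ_{W∈Ω} ‖W‖₂², with λ a hypothetical regularization weight that the source does not name, and the squared norm is likewise an assumed choice. `l2_norm` follows the vector and matrix definitions given above. The back-propagation of S504 would be handled by an autodiff framework and is not shown.

```python
import math


def l2_norm(w):
    """L2 norm of a parameter vector (list) or matrix (list of lists)."""
    flat = [x for row in w for x in row] if w and isinstance(w[0], list) else w
    return math.sqrt(sum(x * x for x in flat))


def total_loss(l_rel, l_int, params, beta, lam):
    """Assumed form: L = L_rel + beta * L_int + lam * sum over W of ||W||_2^2.

    params: iterable of parameter vectors/matrices (the set Omega).
    lam: hypothetical regularization weight, not named in the source.
    """
    reg = sum(l2_norm(w) ** 2 for w in params)
    return l_rel + beta * l_int + lam * reg
```

With L_rel = 1, L_int = 2, β = 0.5, λ = 0.01 and a single parameter vector (3, 4), the total is 1 + 1 + 0.01 × 25 = 2.25.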

Compared with the prior art, the invention has at least the following beneficial effects:

A triple credibility evaluation model for multi-source knowledge graphs is provided, which models the knowledge graphs, the triples and the relations among the knowledge graphs in the multi-source setting. Within a single knowledge graph, the credibility of triples is evaluated using the intrinsic features of the knowledge graph. In the multi-source setting, the interaction among knowledge graphs is represented by interactive triples. Finally, three interaction methods for interactive triples across knowledge graphs are proposed. The main advantages are:

1) Triple evaluation in the single-source setting and in the multi-source setting is distinguished and defined separately.

2) Semantic information and graph topological characteristics are used to evaluate triple credibility.

3) For the multi-source setting, the way different knowledge graphs influence each other is modeled.

In conclusion, the method solves the problem of triple credibility evaluation on multi-source knowledge graphs. The evaluation is divided into two aspects, data-source evaluation and triple evaluation; the single-source features make full use of the evidence contained in each knowledge graph; and the interactive triples model the mutual influence among different data sources, which improves the accuracy of triple credibility evaluation.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

FIG. 1 is a block diagram of the method in an embodiment of the invention;

FIG. 2 shows the test PR curves of the model comparison on noisy data sets, where FIG. 2(a) is for the noisy data set containing 10% negative examples, FIG. 2(b) for the noisy data set containing 20% negative examples, and FIG. 2(c) for the noisy data set containing 40% negative examples.

Detailed Description

Referring to fig. 1, the triple credibility assessment method based on the multi-source knowledge graph of the present invention includes the following steps:

S1, for a given multi-source knowledge graph, placing the entity, entity type, relationship and attribute information into a common space by continuous numbering; meanwhile, performing representation learning on the attributes to obtain their embedding information;

S101, for a given multi-source knowledge graph, uniformly numbering the entity names (together with their source graphs), entity types, attribute keys and relationship names in a unified numbering space, and generating a mapping from the indexes of entity names and the like to data-source names; in the common space, segmenting the attribute values into words to obtain the text set of all attributes;

S102, obtaining vector representations of the attribute values in an attribute embedding space by a representation learning method applied to the text set of all attributes;

S103, generating the type index and the attribute information of each entity from the obtained index information and attribute vector representations.

S2, evaluating the credibility of the triples in the single knowledge graph by utilizing the inherent characteristics of the knowledge graph, including semantic information and graphic information;

S201, four intrinsic knowledge-graph features are used: entity fused attribute information, entity type constraints, entity and relationship embeddings, and graph energy transfer information. For the attribute encoding information, the attribute value and the one-hot code of the attribute key name are fused through a neural network to represent the hidden information of the complete attribute; if an entity has multiple attributes, this hidden attribute information is averaged to obtain the fused attribute information of the entity. The entity types appearing around each relationship are collected as statistics. The entities and relationships are embedded and then fused into the scoring function. The graph energy transfer information is modeled with a PageRank-like approach; because the iteration of energy in the graph exhibits a long-tail phenomenon, i.e., most entities have low energy, the additional graph features of in-degree and out-degree must be considered.

The energy between the head entity e_s and the tail entity e_o of a triple is therefore calculated as follows:

S202, the information belonging to the same entity or relationship is passed through a fully connected neural-network layer to obtain the internal information representation of that entity or relationship;

S203, obtaining the credibility of the triple through a scoring function: the internal information representations of the entities and the relationship are fused, and the credibility score of the triple is obtained through the following scoring function over the internal information representations of the head entity, the relationship and the tail entity:

S301, attribute-value interactive triples: if two entities refer to the same real-world entity, they should have similar attributes and will influence each other; the attribute-value interaction of two triples is defined as the attribute-value interaction score between their entities.

In the formula, ITE_value(·) denotes the attribute-value interaction scoring function between triples, |·| denotes the number of elements in a set, p(·) denotes the set of attribute values of an entity, and c(·) denotes the set of attribute values shared between entities.

S302, aligned interactive triples: in the embedding space, aligned entities or relationships are closely related and will influence each other. If a triple of knowledge graph 1 and a triple of knowledge graph 2

are aligned and interactive, at least one entity or relationship of the two triples is close in the embedding space; in addition, the interactive triples require that the distance between the non-interacting parts does not exceed a threshold. The formula is as follows:

where ITE_alignment(·) denotes the distance interaction scoring function between triples, and ‖·‖ denotes the L2 distance between an entity pair or relationship pair in the embedding space;

for a vector $\mathbf{x} = (x_1, \dots, x_n)$, the L2 norm is $\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$; for vectors $\mathbf{x}$ and $\mathbf{y}$, the L2 distance is $\|\mathbf{x} - \mathbf{y}\|_2 = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$.

S303, neighbor interactive triples: according to research on semantic similarity, entities that share similar surroundings are likely to be semantically similar and to influence each other; by evaluating the surroundings of the head and tail entities, the neighbor interaction between a pair of triples can be analyzed.

In the formula, ITE_neighbor(·) denotes the neighbor interaction scoring function between triples, and N(·) denotes the mean of an entity's neighbors in the embedding space.

S304, a pair of triples is judged to be interactive if any of its interaction scores is below the corresponding threshold, calculated as:

where I(·,·) returns 1 if the two triples are interactive and 0 otherwise, 1(·) is the indicator function, equal to 1 when its condition holds and 0 otherwise, θ_value denotes the threshold of the attribute-value interaction score, θ_alignment the threshold of the distance interaction score, and θ_neighbour the threshold of the neighbor interaction score.

S305, influence mode of interactive triples: if the credibility scores of the triples are close, i.e., their difference is below the critical value, the scores are drawn gradually closer. If the difference between the credibility scores of two interactive triples exceeds the critical value, both scores should be reduced. To use this information when training the representation learning model, it is modeled with the following custom loss:

where R(i, j) denotes the credibility loss of a pair of interactive triples, i belongs to G₁, j belongs to G₂, i and j are interactive triples, and int(th) is the difference-control threshold. If two triples are not interactive, this loss term is not considered.

S4, evaluating the credibility of the triples in the common space under the multi-source setting; this is jointly determined by the intrinsic factors of each single knowledge graph and the mutual influence among the knowledge graphs. Triples that influence each other across data sources are called interactive triples. The vectors obtained in the preceding steps are passed through a scoring function to obtain a value between 0 and 1 as the credibility of each triple.

S401, the credibility of the triples in a single knowledge graph is influenced by the credibility of the data source, so the specific function is as follows:

S402, the credibility of a single knowledge-graph data source is determined by the credibility of all triples in that source. The credibility of each knowledge graph is initialized to 1 and thereafter determined by its triples,

so the credibility of a data source under the multi-source setting is calculated as:

S5, training based on multi-source knowledge graph credibility assessment:

S501, within each graph of the multi-source knowledge graph, compute the margin loss between the credibility of the positive triples and that of the negative triples obtained from them by negative sampling. Let S denote the set of all knowledge graphs. For every knowledge graph G ∈ S and every positive triple t⁺ in G (credibility label 1), negative sampling randomly replaces its head entity or tail entity to obtain a negative triple t⁻ (expected credibility label 0); their credibility evaluation results are respectively

and

so the loss is calculated as:

In the formula, L_rel denotes the negative-sampling loss of the triples within each graph, and c(t) denotes the set of negative triples t′ obtained by negative sampling of the positive triple t.

where L_int denotes the interaction-influence loss over all interactive triples, and In denotes the set of all interactive triples.

For a vector-valued or matrix-valued parameter, the L2 regularization value is defined as follows:
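A minimal sketch of how the training objective could be assembled, assuming that L_rel, L_int, and the L2 term are simply added, with a hypothetical weight `lam` on the regularizer. It also assumes that the L2 value is the Euclidean norm for vectors and the Frobenius norm for matrices; the patent's exact regularizer (for example a squared norm) may differ.

```python
import math
from typing import Iterable, Iterator, Sequence, Union

Param = Union[Sequence[float], Sequence[Sequence[float]]]


def _entries(param: Param) -> Iterator[float]:
    """Yield every scalar entry of a vector or (nested) matrix parameter."""
    for x in param:
        if isinstance(x, (list, tuple)):
            yield from _entries(x)
        else:
            yield float(x)


def l2_value(param: Param) -> float:
    """Assumed L2 value: Euclidean norm for vectors, Frobenius norm for matrices."""
    return math.sqrt(sum(x * x for x in _entries(param)))


def total_loss(
    l_rel: float,
    interaction_losses: Iterable[float],
    params: Sequence[Param],
    lam: float = 0.01,  # hypothetical regularization weight
) -> float:
    l_int = sum(interaction_losses)          # L_int: sum of per-pair losses over In
    reg = sum(l2_value(w) for w in params)   # sum of ||W||_2 over all W in Omega
    return l_rel + l_int + lam * reg
```

In a real model the parameters would be framework tensors and this scalar would be back-propagated; plain lists are used here only to keep the example self-contained.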

S504, after the loss is computed, it is back-propagated to update the model parameters, and the evaluation is repeated until the model converges.

1. Experimental data

Multi-knowledge-graph credibility assessment is similar to the knowledge-graph alignment problem, and entity alignment is an important interaction factor in the proposed model. A knowledge-graph alignment dataset, DBP-WD, is therefore used for evaluation. The dataset was published by Sun Zequn et al. at https://github.com/nju-websoft/BootEA. It is constructed by sampling from DBpedia and Wikidata, mainly movie-related data, and contains 100,000 aligned entities. Table 1 gives the statistics of the experimental dataset. The training, validation, and test sets contain 547,240, 182,414, and 182,414 triples, respectively.

Table 1 Statistical information of the dataset

To evaluate the performance of the proposed multi-source knowledge-graph credibility evaluation model on noisy data, the triples of the original dataset are treated as positive triples. Negative triples are generated and added to the original dataset, and the model's ability to distinguish positive from negative triples is evaluated. A negative triple is generated by randomly replacing the head or tail entity of an original triple; entity features such as type and neighborhood are not negatively sampled. By adding negative triples to the original dataset, which is assumed to be correct, noisy datasets containing 10%, 20%, and 40% negative triples are constructed. The training, validation, and test sets of each noisy dataset are split in the ratio 6:2:2. The numbers of negative triples are given in Table 2.
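The noisy-dataset construction described above can be sketched as follows. This is a hypothetical helper, not the authors' script. It assumes that "10%, 20%, 40% negative triples" means the share of negatives in the final noisy dataset, and it skips corruptions that reproduce an original triple. Since only bare triples are handled, entity features such as type and neighborhood are left untouched automatically.

```python
import random
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (e_s, r, e_o)
Labelled = Tuple[Triple, int]  # 1 = original (positive), 0 = injected negative


def build_noisy_splits(
    positives: Sequence[Triple],
    entities: Sequence[str],
    noise_ratio: float,
    seed: int = 0,
) -> Tuple[List[Labelled], List[Labelled], List[Labelled]]:
    rng = random.Random(seed)
    # Assumption: noise_ratio is the share of negatives in the final noisy dataset.
    n_neg = round(len(positives) * noise_ratio / (1.0 - noise_ratio))
    known = set(positives)
    negatives = set()
    attempts = 0
    while len(negatives) < n_neg:
        attempts += 1
        if attempts > 100 * n_neg + 1000:
            raise ValueError("could not generate enough distinct negative triples")
        h, r, o = rng.choice(positives)
        # Replace only the head or the tail entity of the original triple.
        if rng.random() < 0.5:
            cand = (rng.choice(entities), r, o)
        else:
            cand = (h, r, rng.choice(entities))
        if cand not in known:  # skip corruptions that reproduce an original triple
            negatives.add(cand)
    data = [(t, 1) for t in positives] + [(t, 0) for t in sorted(negatives)]
    rng.shuffle(data)
    n_train = len(data) * 6 // 10  # 6:2:2 split
    n_valid = len(data) * 2 // 10
    return data[:n_train], data[n_train:n_train + n_valid], data[n_train + n_valid:]
```

Sorting the negative set before shuffling keeps the output reproducible for a fixed `seed`, because Python's string hashing (and therefore set order) varies between runs.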

Table 2 Noise statistics of the noisy datasets

2. Experimental setup

The credibility evaluation method of this patent restricts the model output to [0, 1]: the model is expected to assign 1 to a correct triple and 0 to an incorrect triple. A threshold TC is set to judge the credibility of a triple t. When the credibility score that the model outputs for t is greater than TC, t is judged true and reliable; when it is less than TC, t is judged false and unreliable.

3. Results of the experiment

On the noisy test sets, varying the threshold changes the model's precision and recall for identifying positive triples, from which PR curves with precision and recall as variables are drawn. Fig. 2 shows the PR curves on the three noisy datasets: Fig. 2(a) compares the models on the noisy dataset containing 10% negative triples, Fig. 2(b) on the dataset containing 20% negative triples, and Fig. 2(c) on the dataset containing 40% negative triples. All three subfigures show that precision and recall trade off against each other: the higher the model's precision in identifying negative triples, the lower the corresponding recall. Compared with the baseline TransE model, the PR curve of the proposed model lies to its upper right. At the same precision, the proposed model therefore achieves a higher recall in identifying negative triples than TransE, and at the same recall it achieves a higher precision, indicating better overall performance.
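The thresholding rule with TC and the threshold sweep behind a PR curve can be sketched as follows. The function names are hypothetical, and `target` selects whether positive (label 1) or negative (label 0) triples are being identified. The convention that precision is 1.0 when nothing is predicted is an assumption, and a score exactly equal to TC is counted in neither class because the text leaves that case open.

```python
from typing import List, Sequence, Tuple


def precision_recall(
    scores: Sequence[float], labels: Sequence[int], tc: float, target: int = 1
) -> Tuple[float, float]:
    """Precision and recall for identifying triples whose label equals `target`.

    target=1: a triple is predicted credible when its score > TC.
    target=0: a triple is predicted not credible when its score < TC.
    """
    if target == 1:
        predicted = [s > tc for s in scores]
    else:
        predicted = [s < tc for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == target)
    n_pred = sum(predicted)
    n_true = sum(1 for y in labels if y == target)
    precision = tp / n_pred if n_pred else 1.0  # assumed convention for empty predictions
    recall = tp / n_true if n_true else 0.0
    return precision, recall


def pr_curve(
    scores: Sequence[float], labels: Sequence[int], thresholds: Sequence[float], target: int = 1
) -> List[Tuple[float, float]]:
    """One (precision, recall) point per threshold TC."""
    return [precision_recall(scores, labels, tc, target) for tc in thresholds]
```

Sweeping TC over a fine grid in [0, 1] and plotting recall against precision gives curves of the kind compared in Fig. 2.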

The above content only illustrates the technical idea of the present invention and does not limit its protection scope; any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims

1. A triple credibility assessment method based on a multi-source knowledge graph, characterized by comprising the following steps:

2. The triple credibility assessment method based on a multi-source knowledge graph according to claim 1, wherein the specific process of step S1 is as follows:

3. The triple credibility assessment method based on a multi-source knowledge graph according to claim 1, wherein in step S2, the credibility of a triple within a single knowledge graph is assessed as follows:

4. The triple credibility assessment method based on a multi-source knowledge graph according to claim 3, wherein in step S201, the graph energy transfer information is calculated as follows:

where one term denotes the in-degree and out-degree information determined by the head entity e_s and the tail entity e_o; E(e_s, e_o) and E(e_s) denote, respectively, the energy transferred from the head entity e_s to the tail entity e_o and the energy of the head entity e_s in the PageRank algorithm; and a hyperparameter balances the in/out-degree information against the PageRank information.
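To make E(e_s) and E(e_s, e_o) concrete, here is a minimal power-iteration PageRank sketch over (head, tail) edges. It assumes that E(e_s) is the standard PageRank score and that E(e_s, e_o) is the share d·E(e_s)/outdeg(e_s) passed along the edge. Both are assumptions, as is ignoring relation labels. How the patent combines these quantities with the degree information through its hyperparameter is not shown.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]  # (head entity e_s, tail entity e_o)


def pagerank_energy(
    edges: Sequence[Edge], damping: float = 0.85, iters: int = 200, tol: float = 1e-12
) -> Tuple[Dict[Hashable, float], Dict[Edge, float]]:
    """Return node energies E(e) and per-edge transferred energy E(e_s, e_o)."""
    nodes = list(dict.fromkeys(n for edge in edges for n in edge))
    out: Dict[Hashable, List[Hashable]] = {v: [] for v in nodes}
    for s, o in edges:
        out[s].append(o)
    n = len(nodes)
    energy = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        # Nodes without outgoing edges spread their energy uniformly.
        dangling = sum(energy[v] for v in nodes if not out[v])
        for v in nodes:
            for w in out[v]:
                new[w] += damping * energy[v] / len(out[v])
        for v in nodes:
            new[v] += damping * dangling / n
        delta = sum(abs(new[v] - energy[v]) for v in nodes)
        energy = new
        if delta < tol:
            break
    # Energy passed from e_s to e_o along one edge in the last iteration's distribution.
    flow = {(s, o): damping * energy[s] / len(out[s]) for s, o in edges}
    return energy, flow
```

Because dangling energy is redistributed, the node energies always sum to 1, which keeps E(e_s) comparable across graphs of different sizes.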

5. The triple credibility assessment method based on a multi-source knowledge graph according to claim 3, wherein in step S203, the credibility of a triple is calculated as follows:

denoting the internal information representation of the relation,

6. The triple credibility assessment method based on a multi-source knowledge graph according to claim 1, wherein step S3 specifically comprises the following steps:

where one term denotes the credibility score of the triple t in the knowledge graph G, t being shorthand for the triple (e_s, r, e_o); W_G denotes the credibility of the knowledge graph G; and the resulting term denotes the credibility of the triple t under the influence of the credibility of the knowledge graph;

7. The triple credibility assessment method based on a multi-source knowledge graph according to claim 1, wherein in step S4, triples that influence each other across different data sources interact as follows:

if a triple in knowledge graph G₁ and a triple in knowledge graph G₂ are semantically aligned and interactive, then in the embedding space at least one pair of corresponding entities or relations of the two triples lies close together. An interactive triple pair further requires that the distance between the corresponding non-interactive parts of the two triples is not greater than a threshold. The distance interaction score is therefore calculated as:

where ITE_alignment(·) denotes the semantic-alignment interaction function between triples, and ‖·‖ denotes the L2 distance of an entity pair or a relation pair in the embedding space;

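for a vector $\mathbf{x} = (x_1, \dots, x_n)$, the L2 value is

$$\|\mathbf{x}\| = \sqrt{\sum_{i=1}^{n} x_i^2},$$

and for vectors $\mathbf{x}$ and $\mathbf{y}$, the L2 distance is

$$\|\mathbf{x} - \mathbf{y}\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2};$$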

where R(i, j) denotes the function that computes the credibility loss of an interactive triple pair, in which i belongs to G₁, j belongs to G₂, and i and j are interactive triples; int(th) is the threshold that controls the difference. If two triples are not interactive triples, their loss is not considered.
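A hedged sketch of the interaction rule in claim 7, under these assumptions: each triple is embedded as three vectors (head, relation, tail); the semantically aligned positions are given as input; a pair counts as interactive when at least one position is aligned and every non-aligned position lies within a hypothetical distance threshold `dist_th`; and R(i, j) is taken, purely for illustration, to be a hinge on the credibility difference beyond int(th) (`diff_th`). Raw L2 distances stand in for ITE_alignment, whose exact form is not legible here. All function names are hypothetical.

```python
import math
from typing import AbstractSet, Sequence, Tuple

Vec = Sequence[float]
TripleEmb = Tuple[Vec, Vec, Vec]  # (head, relation, tail) embeddings


def l2_distance(x: Vec, y: Vec) -> float:
    """L2 distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def is_interactive(
    emb_i: TripleEmb, emb_j: TripleEmb, aligned: AbstractSet[int], dist_th: float
) -> bool:
    """Hypothetical rule: some position (0=head, 1=relation, 2=tail) is aligned,
    and every non-aligned position is within dist_th in the embedding space."""
    if not aligned:
        return False
    return all(
        l2_distance(emb_i[k], emb_j[k]) <= dist_th for k in range(3) if k not in aligned
    )


def interaction_loss(
    c_i: float, c_j: float, emb_i: TripleEmb, emb_j: TripleEmb,
    aligned: AbstractSet[int], dist_th: float, diff_th: float,
) -> float:
    """Hypothetical R(i, j): zero for non-interactive pairs, otherwise a hinge
    penalizing a credibility difference larger than diff_th (int(th))."""
    if not is_interactive(emb_i, emb_j, aligned, dist_th):
        return 0.0
    return max(0.0, abs(c_i - c_j) - diff_th)
```

Summing `interaction_loss` over all candidate pairs yields an L_int-style term in which non-interactive pairs contribute nothing, matching the claim's statement that their loss is not considered.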

8. The triple credibility assessment method based on a multi-source knowledge graph according to claim 1, wherein in step S5, the training based on multi-source knowledge-graph credibility assessment comprises the following steps:

and

so the loss is calculated as:

where Ω denotes the set of all parameters, W denotes any parameter in the model, and ‖·‖₂ denotes the L2 regularization function; for a vector-valued or matrix-valued parameter, the L2 regularization value is defined as follows: