CN114077676B - Knowledge graph noise detection method based on path confidence - Google Patents

Knowledge graph noise detection method based on path confidence

Info

Publication number
CN114077676B
Authority
CN
China
Prior art keywords
path
confidence
matrix
triples
gru
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111393836.9A
Other languages
Chinese (zh)
Other versions
CN114077676A (en)
Inventor
马江涛
周辰宇
王艳军
李端阳
贾泽臣
马宇科
李霆
卢威光
张蓓蕾
李清扬
赵一帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Tupu Information Technology Co ltd
Zhengzhou University of Light Industry
Original Assignee
Henan Tupu Information Technology Co ltd
Zhengzhou University of Light Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Tupu Information Technology Co ltd, Zhengzhou University of Light Industry
Priority to CN202111393836.9A
Publication of CN114077676A
Application granted
Publication of CN114077676B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Optimization (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Molecular Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Animal Behavior & Ethology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a knowledge graph noise detection method based on path confidence, comprising the following steps. First, the triples are initialized, all paths of all triples are found, each triple on each path is embedded with the translation model TransE, and all paths of the triples are represented as path embedding sequences, in which every two adjacent triples form a node. Second, the nodes are input in sequence into the CPLL to compute the confidence score of each node on each path. Third, the node confidence scores of each path are input into a Bi-GRU to obtain the score matrix of that path. Finally, the L2 norm of each path's score matrix is taken as the path confidence, and the score matrix of the path with the highest path confidence is taken as the optimal embedding matrix of the triple. By combining the path-based and rule-based approaches, the invention improves the efficiency of detecting noise in a knowledge graph and thereby improves the quality of the knowledge graph.

Description

Knowledge graph noise detection method based on path confidence
Technical Field
The invention relates to the technical field of knowledge graphs, in particular to a knowledge graph noise detection method based on path confidence.
Background
Knowledge graphs now play an important role in artificial intelligence tasks. However, manually or automatically constructed knowledge graphs have a number of quality problems and often contain erroneous or missing triples. Noise in a knowledge graph may be caused by human error or by errors in the data, and most noise appears as erroneous entities or relationships in triples. A growing number of researchers have turned their attention to knowledge graph noise and have proposed many solutions.
Noise detection methods for knowledge graphs can be broadly divided into path-based methods and rule-based methods. Path-based methods start from translation models such as TransE, TransH, and TransR, which are mostly used for knowledge graph embedding and completion but can also be used to detect noise in a knowledge graph. Melo et al. propose the PaTyBRED model, which incorporates type and path features into a local relationship classifier and keeps a specific path for each relationship to indicate whether a triple is erroneous. Xie et al. propose the CKRL model, which uses the local and global information of triples to represent the probability that a triple is erroneous. However, path-based methods are weak at finding noise and are not suitable for knowledge graphs containing complex relationships. Rule-based methods generally detect noise better than path-based methods. Brocheler et al. propose the PSL model, which extracts the most likely correct triples from ambiguous triples using first-order predicate logic and weighted rules. Abedini et al. propose Correction Tower, which identifies discrete, inconsistent, and erroneous relationships in triples in three steps. However, rule-based methods lack the ability to represent the knowledge graph: after a rule-based method detects and removes noise, the knowledge graph must still be mapped to a continuous vector space so that it can be manipulated conveniently in downstream tasks.
If the path-based approach and the rule-based approach are combined, noise can be found and a noise-free knowledge graph representation can be constructed at the same time. Specifically, rules are first defined on the paths of a triple to screen out effective features. These features must distinguish noise from correct information, where the correct information includes global triple information and local triple information. The features are then used to complete both noise detection and triple representation, which improves the quality of the knowledge graph and the user experience.
Disclosure of Invention
The invention provides a knowledge graph noise detection method based on path confidence, which addresses two technical problems: existing path-based methods are weak at finding noise and unsuitable for knowledge graphs containing complex relationships, and rule-based methods lack the capability of knowledge representation.
The technical scheme of the invention is realized as follows:
A knowledge graph noise detection method based on path confidence comprises the following steps:
Step one: initialize the number of triples, find all paths of all triples, embed each triple on each path with the translation model TransE, and represent all paths of the triples as path embedding sequences; every two adjacent triples in a path embedding sequence form a node, and the number of nodes is n;
Step two: input the nodes in sequence into the correlation-based probabilistic logic layer (CPLL), and compute the confidence score matrix of each node on each path;
Step three: input the confidence score matrices of all nodes on each path into the Bi-GRU to obtain the score matrix of each path;
Step four: take the L2 norm of the score matrix of each path as the path confidence, and take the score matrix of the path with the highest path confidence as the optimal embedding matrix of the triple.
Preferably, the specific method of step two is as follows:
S21, initialize the input node T:
$$T = N'_i \cdot (N'_{i+1})^{\mathsf{T}} \tag{1}$$
$$N'_i = (x'_i,\ r'_i,\ x'_{i+1}) \tag{2}$$
$$N'_{i+1} = (x'_{i+1},\ r'_{i+1},\ x'_{i+2}) \tag{3}$$
where $N'_i$ denotes the embedding matrix of the i-th triple on the path, $N'_{i+1}$ denotes the embedding matrix of the (i+1)-th triple on the path, $(N'_{i+1})^{\mathsf{T}}$ denotes the transpose of $N'_{i+1}$, $x'_i$, $x'_{i+1}$ and $x'_{i+2}$ all denote entities, and $r'_i$ and $r'_{i+1}$ both denote relationships;
S22, multiply the node T by the parameter matrix $W_0$ to obtain the global confidence between the triples, i.e., the global triple confidence:
$$GTT(i, i+1) = T \cdot W_0 \tag{4}$$
where GTT(i, i+1) is the global triple confidence;
S23, the node T enters the Separate & Pad layer, where the sub-matrix blocks $T_1$, $T_2$, $T_3$ on the diagonal of T are separated; $T_1$, $T_2$, $T_3$ are then multiplied by the parameter matrices $W_1$, $W_2$, $W_3$ respectively to obtain D, E and F; correlation-based logic operations are performed on D, E and F, and the results are added to obtain the local confidence between the triples, i.e., the local triple confidence:
$$T_1 = x'_i \cdot x'_{i+1},\quad T_2 = r'_i \cdot r'_{i+1},\quad T_3 = x'_{i+1} \cdot x'_{i+2} \tag{5}$$
$$D = T_1 \cdot W_1,\quad E = T_2 \cdot W_2,\quad F = T_3 \cdot W_3 \tag{6}$$
[Equations (7) to (11) appear only as images in the source.]
where MIN(·) denotes the minimum value of a matrix, MAX(·) denotes the maximum value of a matrix, 1 denotes a matrix whose elements are all 1, -1 denotes a matrix whose elements are all -1, the operator symbols in equations (7) to (11) denote different logic operations, and LTT(i, j) is the local triple confidence;
S24, multiply the global triple confidence by the local triple confidence to obtain the confidence score $G_i$ of the node T:
$$G_i = GTT(i, i+1) \cdot LTT(i, i+1) \tag{12}$$
Preferably, the specific method of step three is as follows:
S31, take the confidence score $G_i$ of each node and the confidence scores $G_{i+1}$ and $G_{i-1}$ of its neighboring nodes as the input of the bidirectional GRU; the i-th forward GRU and backward GRU are computed as follows:
[Equations (13) and (14) appear only as images in the source.]
where the two symbols defined in equations (13) and (14) denote the output of the forward GRU and the output of the backward GRU respectively, and GRU(·) denotes a gated recurrent unit network.
S32, apply concatenation, linear, and normalization operations to the final outputs of the forward GRU and the backward GRU to obtain the path score matrix:
[Equation (15) appears only as an image in the source.]
where h(p) denotes the output of the gated recurrent network, i.e., the path score matrix; the two arguments of equation (15) are the final output of the forward GRU and the final output of the backward GRU; concat(·) denotes the concatenation function, line(·) denotes the linear function, and softmax(·) denotes the normalization function.
Preferably, in step four, the path confidence and the optimal triple are computed as follows:
$$g(p_j) = \lVert h(p_j) \rVert_2 \tag{16}$$
$$h(f_k) = h(p_j) \quad \text{when } g(f_k) = \max_j g(p_j) \tag{17}$$
where g(p) denotes the path confidence, $h(p_j)$ denotes the path score matrix, $\lVert \cdot \rVert_2$ denotes the L2 norm of a matrix, $g(f_k)$ denotes the maximum path confidence, and $h(f_k)$ denotes the optimal path score matrix of the triple, which is also the optimal embedding matrix of the triple.
Preferably, the designed loss function is as follows:
$$L = \sum_{(h,r,t) \in T' \cup T''} \log\left[1 + \exp\left(l_{(h,r,t)} \cdot P(h,r,t)\right)\right] \tag{18}$$
[Equation (19), which defines the label $l_{(h,r,t)}$, appears only as an image in the source.]
where exp(·) denotes the exponential function with the natural constant e as its base, log(·) denotes the logarithmic function, L denotes the loss function, P(h, r, t) denotes a path from the head entity h to the tail entity t, r denotes a relationship, T' denotes the set of valid triples, and T'' denotes the set of invalid triples; an invalid triple is a triple formed by randomly replacing the head entity or the tail entity of an original triple, and a valid triple is an original triple.
Compared with the prior art, the invention has the following beneficial effects:
1) Building on the path-based internal structure information of the knowledge graph, the invention introduces a correlation-based probabilistic model and fuses it into a neural network structure to detect noise in the knowledge graph and to represent the knowledge graph.
2) The invention constructs a path confidence network to compute the global triple confidence and the local triple confidence, and combines it with a bidirectional gated recurrent network to obtain the path confidence and the path score matrix of a triple; the path confidence is used to determine whether the triple is correct, and the path score matrix is used to represent the triple.
3) The invention addresses the knowledge graph noise problem, completes the knowledge graph representation, and achieves good results in knowledge graph noise detection tests.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a sub-graph of all paths from the entity "champion" to the entity "team";
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a flow chart of the proposed model of the present invention;
FIG. 4 is a block diagram of a correlation-based probabilistic logic model of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art based on the embodiments of the present invention without inventive step, are within the scope of the present invention.
In general, a relationship between triples in a knowledge graph can be expressed in the form of a path. When a triple f is expressed as (h, r, t), the path P from the head entity h to the tail entity t, written (h, R, t), cannot be ignored. Here R includes at least one relationship and possibly several entities, and these entities and relationships may form several triples N, which the invention calls path triples. Every two adjacent triples constitute a node. R ≥ r, and when R = r the path P is equal to f, which indicates that f is the shortest path.
There may be multiple paths from the head entity to the tail entity, but some paths are incorrect, some are incomplete, and the information in some is not suitable for representing the triple. FIG. 1 shows the set of all paths of f_1 = ("champion", "joining", "team"), i.e., the set of paths from the entity "champion" to the entity "team". In FIG. 1, f_1 is the shortest path and also the triple itself, while f_2 and f_3 are erroneous triples; the correct triple is ("basketball game", "equals", "match"). Thus any path that combines f_2 or f_3 with other triples is noisy. These noisy paths must be processed before their path score matrices can be used to represent the triples.
However, most path-based knowledge graph representation methods do not exclude the noise contained in a path, whereas the rule-based approach is well suited to this problem. Specifically, each node on the path is given a confidence that indicates how likely the node is to be correct, and a path confidence is then obtained by probability combination; the path confidence indicates how likely the path is to be correct. If the only path from the head entity to the tail entity is the triple itself, the triple is the only node on the path, and the triple confidence, the node confidence, and the path confidence are equal. In practice there may be multiple paths, and the path with the highest path confidence is the most appropriate one to represent the triple. If the triples are represented as matrices, the path score matrix is obtained by probability combination of the node confidences, and the L2 norm of the path score matrix is used as the confidence of the path.
As shown in fig. 2, an embodiment of the present invention provides a method for detecting noise in a knowledge-graph based on path confidence, which includes the following specific steps:
Step one: for a set of E triples, initialize the number of triples as E and traverse all triples to find all of their paths; for each triple, traverse all of its paths, where the number of paths is P and the path count is initialized as P. Embed each triple on each path with the translation model TransE, and represent all paths of the triples as path embedding sequences; every two adjacent triples in a path embedding sequence form a node, the number of nodes is n, and the node count is initialized as N. The structure of the proposed model is shown in FIG. 3.
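The node construction in step one can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `path_to_nodes`, the tuple layout, and the placeholder entity names are all hypothetical, and the TransE embedding itself is left out.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relationship, tail entity)
Node = Tuple[Triple, Triple]   # two adjacent triples on a path


def path_to_nodes(path: List[Triple]) -> List[Node]:
    """Pair every two adjacent triples on a path into one node.

    Adjacent triples must chain: the tail of triple i is the head of
    triple i+1, matching N'_i = (x_i, r_i, x_{i+1}) and
    N'_{i+1} = (x_{i+1}, r_{i+1}, x_{i+2}).
    """
    nodes: List[Node] = []
    for current, following in zip(path, path[1:]):
        if current[2] != following[0]:
            raise ValueError(f"triples do not chain: {current} -> {following}")
        nodes.append((current, following))
    # A one-triple path has no adjacent pair and yields an empty list here;
    # the source treats that triple as the only node, which this sketch
    # leaves to the caller.
    return nodes


# Hypothetical three-triple path from entity "a" to entity "d".
example_path = [("a", "r1", "b"), ("b", "r2", "c"), ("c", "r3", "d")]
example_nodes = path_to_nodes(example_path)
```

A path with m chained triples yields m - 1 nodes, so the example above produces two nodes.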
Step two: as shown in FIG. 4, the nodes are input in sequence into the correlation-based probabilistic logic layer (CPLL), and the confidence score of each node on each path is computed.
The specific method of step two is as follows:
S21, initialize the input node T:
$$T = N'_i \cdot (N'_{i+1})^{\mathsf{T}} \tag{1}$$
$$N'_i = (x'_i,\ r'_i,\ x'_{i+1}) \tag{2}$$
$$N'_{i+1} = (x'_{i+1},\ r'_{i+1},\ x'_{i+2}) \tag{3}$$
where $N'_i$ and $N'_{i+1}$ denote the embedding matrices of the i-th and (i+1)-th triples on the path respectively, $(N'_{i+1})^{\mathsf{T}}$ denotes the transpose of $N'_{i+1}$, $x'_i$, $x'_{i+1}$ and $x'_{i+2}$ denote entities, and $r'_i$ and $r'_{i+1}$ denote relationships.
S22, multiply the node T by the parameter matrix $W_0$ to obtain the global confidence between the triples, i.e., the global triple confidence:
$$GTT(i, i+1) = T \cdot W_0 \tag{4}$$
where GTT(i, i+1) is the global triple confidence.
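Equations (1) and (4) can be illustrated with a small numeric sketch. It assumes that each triple embedding is a 3 × d matrix whose rows are the head, relationship, and tail vectors, so that T is 3 × 3, and that $W_0$ is a 3 × 3 matrix; the source does not state these shapes, and the helper names and toy values below are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def node_matrix(n_i: Matrix, n_next: Matrix) -> Matrix:
    """Equation (1): T = N'_i . (N'_{i+1})^T."""
    return matmul(n_i, transpose(n_next))


def global_triple_confidence(t: Matrix, w0: Matrix) -> Matrix:
    """Equation (4): GTT(i, i+1) = T . W_0."""
    return matmul(t, w0)


# Toy embeddings with d = 2 (assumed values, rows = head, relationship, tail).
n_i = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
n_next = [[1.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
w0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity, for illustration

t = node_matrix(n_i, n_next)
gtt = global_triple_confidence(t, w0)
```

With these toy values T is [[1, 0, 1], [1, 2, 3], [2, 2, 4]], and because $W_0$ is the identity here, GTT equals T.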
S23, the node T enters the Separate & Pad layer, where the sub-matrix blocks $T_1$, $T_2$, $T_3$ on the diagonal of T are separated; $T_1$, $T_2$, $T_3$ are then multiplied by the parameter matrices $W_1$, $W_2$, $W_3$ respectively to obtain D, E and F; correlation-based logic operations are performed on D, E and F, and the results are added to obtain the local confidence between the triples, i.e., the local triple confidence:
$$T_1 = x'_i \cdot x'_{i+1},\quad T_2 = r'_i \cdot r'_{i+1},\quad T_3 = x'_{i+1} \cdot x'_{i+2} \tag{5}$$
$$D = T_1 \cdot W_1,\quad E = T_2 \cdot W_2,\quad F = T_3 \cdot W_3 \tag{6}$$
[Equations (7) to (11) appear only as images in the source.]
where MIN(·) denotes the minimum value of a matrix, MAX(·) denotes the maximum value of a matrix, 1 denotes a matrix whose elements are all 1, -1 denotes a matrix whose elements are all -1, the operator symbols in equations (7) to (11) denote different logic operations, and LTT(i, j) is the local triple confidence.
S24, multiply the global triple confidence by the local triple confidence to obtain the confidence score $G_i$ of the node T:
$$G_i = GTT(i, i+1) \cdot LTT(i, i+1) \tag{12}$$
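Equations (5) and (6) can be checked numerically with a short sketch. It assumes that the rows of each triple embedding are the head, relationship, and tail vectors, in which case $T_1$, $T_2$, $T_3$ are exactly the diagonal entries of T, and it treats $W_1$, $W_2$, $W_3$ as scalars for brevity. Both choices are assumptions, and the logic operations of equations (7) to (11) are not implemented because the source gives them only as images.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def diagonal_terms(n_i: Matrix, n_next: Matrix) -> Tuple[float, float, float]:
    """Equation (5): row-wise inner products, i.e. the diagonal of T."""
    t1, t2, t3 = (dot(row, nxt) for row, nxt in zip(n_i, n_next))
    return t1, t2, t3


def weighted_terms(
    terms: Tuple[float, float, float],
    weights: Tuple[float, float, float],
) -> Tuple[float, float, float]:
    """Equation (6) with scalar stand-ins for W_1, W_2, W_3 (an assumption)."""
    d, e, f = (t * w for t, w in zip(terms, weights))
    return d, e, f


# Toy embeddings with d = 2 (assumed values, rows = head, relationship, tail).
n_i = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
n_next = [[1.0, 1.0], [0.0, 2.0], [1.0, 3.0]]

terms = diagonal_terms(n_i, n_next)
d_e_f = weighted_terms(terms, (0.5, 1.0, 2.0))
```

For these toy values the diagonal terms are (1, 2, 4) and the weighted terms are (0.5, 2, 8).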
Step three: the confidence scores of all nodes on each path are input, in path order, into a Bi-GRU (bidirectional gated recurrent network) to obtain the score matrix of each path.
The specific method of step three is as follows:
S31, take the confidence score $G_i$ of each node and the confidence scores $G_{i+1}$ and $G_{i-1}$ of its neighboring nodes as the input of the bidirectional GRU; the i-th forward GRU and backward GRU are computed as follows:
[Equations (13) and (14) appear only as images in the source.]
where the two symbols defined in equations (13) and (14) denote the output of the forward GRU and the output of the backward GRU respectively, and GRU(·) denotes a gated recurrent unit network.
S32, to retain as much effective information as possible, apply concatenation, linear, and normalization operations to the final outputs of the forward GRU and the backward GRU to obtain the path score matrix:
[Equation (15) appears only as an image in the source.]
where h(p) denotes the output of the gated recurrent network, i.e., the path score matrix; the two arguments of equation (15) are the final output of the forward GRU and the final output of the backward GRU; concat(·) denotes the concatenation function, line(·) denotes the linear function, and softmax(·) denotes the normalization function.
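The concat, linear, and softmax pipeline of step S32 can be sketched with a textbook GRU cell written in plain Python. This is only an illustration: the exact form of equations (13) to (15) is not recoverable from the source, so the sketch uses the standard GRU update, flattens each node score into a vector, omits biases, and uses random untrained weights. The names `GRUCell` and `path_score`, the hidden size, and the output size are all assumptions.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product (biases omitted to keep the sketch short)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def _rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU cell with random placeholder weights (not trained)."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random) -> None:
        cols = in_dim + hid_dim  # each gate acts on the concatenation [x, h]
        self.w_z = _rand_matrix(hid_dim, cols, rng)
        self.w_r = _rand_matrix(hid_dim, cols, rng)
        self.w_h = _rand_matrix(hid_dim, cols, rng)

    def step(self, x: Vector, h: Vector) -> Vector:
        xh = list(x) + h
        z = [_sigmoid(v) for v in _matvec(self.w_z, xh)]  # update gate
        r = [_sigmoid(v) for v in _matvec(self.w_r, xh)]  # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in _matvec(self.w_h, list(x) + gated)]
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def softmax(v: Vector) -> Vector:
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def path_score(node_scores: List[Vector], hid_dim: int = 4,
               out_dim: int = 3, seed: int = 0) -> Vector:
    """softmax(linear(concat(final forward state, final backward state)))."""
    rng = random.Random(seed)
    in_dim = len(node_scores[0])
    forward = GRUCell(in_dim, hid_dim, rng)
    backward = GRUCell(in_dim, hid_dim, rng)
    h_f = [0.0] * hid_dim
    for g in node_scores:  # first node to last node
        h_f = forward.step(g, h_f)
    h_b = [0.0] * hid_dim
    for g in reversed(node_scores):  # last node to first node
        h_b = backward.step(g, h_b)
    w_out = _rand_matrix(out_dim, 2 * hid_dim, rng)
    return softmax(_matvec(w_out, h_f + h_b))


# Three flattened node scores on one hypothetical path.
score = path_score([[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]])
```

Because the last operation is a softmax, the resulting score is a probability vector: its entries are positive and sum to 1.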
Step four: take the L2 norm of the score matrix of each path as the path confidence, and take the score matrix of the path with the highest path confidence as the optimal triple.
In step four, the path confidence and the optimal triple are computed as follows:
$$g(p_j) = \lVert h(p_j) \rVert_2 \tag{16}$$
$$h(f_k) = h(p_j) \quad \text{when } g(f_k) = \max_j g(p_j) \tag{17}$$
where g(p) denotes the path confidence, $h(p_j)$ denotes the path score matrix, $\lVert \cdot \rVert_2$ denotes the L2 norm of a matrix, $g(f_k)$ denotes the maximum path confidence, and $h(f_k)$ denotes the optimal path score matrix of the triple, which is also the optimal embedding matrix of the triple.
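The selection in step four can be sketched in a few lines. The sketch assumes that the "L2 norm of a matrix" means the entry-wise (Frobenius) norm, which the source does not specify, and the function names and sample matrices are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def l2_norm(matrix: Matrix) -> float:
    """Entry-wise L2 (Frobenius) norm; an assumed reading of the source."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def best_path(score_matrices: List[Matrix]) -> Tuple[int, float, Matrix]:
    """Return (index, confidence, score matrix) of the most confident path."""
    confidences = [l2_norm(m) for m in score_matrices]
    k = max(range(len(confidences)), key=confidences.__getitem__)
    return k, confidences[k], score_matrices[k]


# Hypothetical score matrices of three paths for one triple.
paths = [[[0.5, 0.5]], [[0.9, 0.1]], [[0.6, 0.4]]]
best_index, best_confidence, best_matrix = best_path(paths)
```

Here the second path wins, because a score distribution concentrated on one entry has a larger L2 norm than a flat one.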
To train the proposed model, the designed loss function is as follows:
$$L = \sum_{(h,r,t) \in T' \cup T''} \log\left[1 + \exp\left(l_{(h,r,t)} \cdot P(h,r,t)\right)\right] \tag{18}$$
[Equation (19), which defines the label $l_{(h,r,t)}$, appears only as an image in the source.]
where exp(·) denotes the exponential function with the natural constant e as its base, log(·) denotes the logarithmic function, L denotes the loss function, P(h, r, t) denotes a path from the head entity h to the tail entity t, r denotes a relationship, T' denotes the set of valid triples, and T'' denotes the set of invalid triples; an invalid triple is a triple formed by randomly replacing the head entity or the tail entity of an original triple, and a valid triple is an original triple.
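Equation (18) can be evaluated with a numerically stable softplus. Since equation (19) is not readable in the source, this sketch assumes the label is -1 for valid triples and +1 for invalid triples, and that P(h, r, t) is a scalar path score; under that assumed convention, minimizing the loss pushes valid scores up and invalid scores down. The function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def softplus(x: float) -> float:
    """Stable log(1 + exp(x)) that avoids overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def path_loss(samples: Iterable[Tuple[int, float]]) -> float:
    """Equation (18): sum of log(1 + exp(label * score)) over all samples.

    Each sample is (label, score); label = -1 for a valid triple and
    +1 for an invalid triple (an assumed convention, see the lead-in).
    """
    return sum(softplus(label * score) for label, score in samples)


# One valid triple with a high score and one invalid triple with a low score.
good_fit = path_loss([(-1, 2.0), (1, -2.0)])
# The same triples with the scores swapped, i.e. a badly fitted model.
bad_fit = path_loss([(-1, -2.0), (1, 2.0)])
```

With these values the well-fitted case gives a smaller loss than the badly fitted one, which is the behavior a training objective needs.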
The invention uses three benchmark datasets for knowledge graph noise detection, FB15K, WN18, and NELL995, which are constructed from information extracted from the Freebase, WordNet, and NELL knowledge bases respectively. Their statistics are listed in Table 1.
TABLE 1 Statistics of the benchmark datasets FB15K, WN18, and NELL995
[Table 1 appears only as an image in the source.]
To evaluate the model, noise must be added to these datasets. The basic method is as follows: for a given positive triple (h, r, t), either the head entity or the tail entity is randomly replaced to form a negative triple (h', r, t) or (h, r, t'), which serves as noise. In this way, datasets containing 10%, 20%, and 40% noise are constructed for each benchmark dataset. These noisy datasets share the same entities, relationships, validation sets, and test sets as the original datasets, and all generated noise is merged into the original training set.
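The noise construction described above can be sketched as follows. Several details are assumptions rather than facts from the source: the noise ratio is taken relative to the number of original training triples, corrupted triples that collide with existing ones are rejected, and the names `corrupt` and `add_noise` and the toy data are hypothetical.

```python
import random
from typing import List, Tuple

Triple = Tuple[str, str, str]


def corrupt(triple: Triple, entities: List[str], rng: random.Random) -> Triple:
    """Replace either the head or the tail entity with a different entity."""
    h, r, t = triple
    if rng.random() < 0.5:
        return (rng.choice([e for e in entities if e != h]), r, t)
    return (h, r, rng.choice([e for e in entities if e != t]))


def add_noise(triples: List[Triple], entities: List[str],
              ratio: float, seed: int = 0) -> List[Triple]:
    """Return the training set with round(len(triples) * ratio) noisy triples added."""
    rng = random.Random(seed)
    target = round(len(triples) * ratio)
    seen = set(triples)
    noise: List[Triple] = []
    attempts = 0
    # Bounded retries so the loop ends even if few new corruptions exist.
    while len(noise) < target and attempts < 100 * (target + 1):
        attempts += 1
        candidate = corrupt(rng.choice(triples), entities, rng)
        if candidate not in seen:  # keep only triples not already present
            seen.add(candidate)
            noise.append(candidate)
    return triples + noise


# Toy training set: a chain of ten triples over eleven placeholder entities.
entities = [f"e{i}" for i in range(11)]
training = [(f"e{i}", "r", f"e{i + 1}") for i in range(10)]
noisy_training = add_noise(training, entities, ratio=0.2)
```

With ten original triples and a ratio of 0.2, the call appends two corrupted triples and leaves the original ten in place.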
The invention takes the L2 norm of the path score matrix, $\lVert h(p) \rVert_2$, as the path confidence, and then ranks all triples in the training set according to these path confidences. The greater the path confidence of a triple, the more reliable the representation of that triple.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (4)

1. A knowledge graph noise detection method based on path confidence is characterized by comprising the following steps:
step one: initializing the number of triples, finding all paths of all triples, embedding each triple on each path with the translation model TransE, and representing all paths of the triples as path embedding sequences; every two adjacent triples in a path embedding sequence form a node, and the number of nodes is n;
step two: inputting the nodes in sequence into the correlation-based probabilistic logic layer (CPLL), and computing the confidence score matrix of each node on each path;
in step two, the specific method is as follows:
S21, initializing the input node T:
$$T = N'_i \cdot (N'_{i+1})^{\mathsf{T}} \tag{1}$$
$$N'_i = (x'_i,\ r'_i,\ x'_{i+1}) \tag{2}$$
$$N'_{i+1} = (x'_{i+1},\ r'_{i+1},\ x'_{i+2}) \tag{3}$$
wherein $N'_i$ denotes the embedding matrix of the i-th triple on the path, $N'_{i+1}$ denotes the embedding matrix of the (i+1)-th triple on the path, $(N'_{i+1})^{\mathsf{T}}$ denotes the transpose of $N'_{i+1}$, $x'_i$, $x'_{i+1}$ and $x'_{i+2}$ all denote entities, and $r'_i$ and $r'_{i+1}$ both denote relationships;
S22, multiplying the node T by the parameter matrix $W_0$ to obtain the global confidence between the triples, i.e., the global triple confidence:
$$GTT(i, i+1) = T \cdot W_0 \tag{4}$$
wherein GTT(i, i+1) is the global triple confidence;
S23, the node T enters the Separate & Pad layer, where the sub-matrix blocks $T_1$, $T_2$, $T_3$ on the diagonal of T are separated; $T_1$, $T_2$, $T_3$ are then multiplied by the parameter matrices $W_1$, $W_2$, $W_3$ respectively to obtain D, E and F; correlation-based logic operations are performed on D, E and F, and the results are added to obtain the local confidence between the triples, i.e., the local triple confidence:
$$T_1 = x'_i \cdot x'_{i+1},\quad T_2 = r'_i \cdot r'_{i+1},\quad T_3 = x'_{i+1} \cdot x'_{i+2} \tag{5}$$
$$D = T_1 \cdot W_1,\quad E = T_2 \cdot W_2,\quad F = T_3 \cdot W_3 \tag{6}$$
[Equations (7) to (11) appear only as images in the source.]
wherein MIN(·) denotes the minimum value of a matrix, MAX(·) denotes the maximum value of a matrix, 1 denotes a matrix whose elements are all 1, -1 denotes a matrix whose elements are all -1, the operator symbols in equations (7) to (11) denote different logic operations, and LTT(i, j) is the local triple confidence;
S24, multiplying the global triple confidence by the local triple confidence to obtain the confidence score $G_i$ of the node T:
$$G_i = GTT(i, i+1) \cdot LTT(i, i+1) \tag{12}$$
step three: inputting the confidence score matrices of all nodes on each path into the Bi-GRU to obtain the score matrix of each path;
step four: taking the L2 norm of the score matrix of each path as the path confidence, and taking the score matrix of the path with the highest path confidence as the optimal embedding matrix of the triple.
2. The method for detecting knowledge-graph noise based on path confidence as claimed in claim 1, wherein in step three, the specific method is:
S31, taking the confidence score $G_i$ of each node and the confidence scores $G_{i+1}$ and $G_{i-1}$ of its neighboring nodes as the input of the bidirectional GRU, the i-th forward GRU and backward GRU being computed as follows:
[Equations (13) and (14) appear only as images in the source.]
wherein the two symbols defined in equations (13) and (14) denote the output of the forward GRU and the output of the backward GRU respectively, and GRU(·) denotes a gated recurrent unit network;
S32, applying concatenation, linear, and normalization operations to the final outputs of the forward GRU and the backward GRU to obtain the path score matrix:
[Equation (15) appears only as an image in the source.]
wherein h(p) denotes the output of the gated recurrent network, i.e., the path score matrix; the two arguments of equation (15) are the final output of the forward GRU and the final output of the backward GRU; concat(·) denotes the concatenation function, line(·) denotes the linear function, and softmax(·) denotes the normalization function.
3. The method for knowledge-graph noise detection based on path confidence as claimed in claim 2, wherein in step four, the path confidence and the optimal triplet are calculated by:
$$g(p_j) = \lVert h(p_j) \rVert_2 \tag{16}$$
$$h(f_k) = h(p_j) \quad \text{when } g(f_k) = \max_j g(p_j) \tag{17}$$
wherein g(p) denotes the path confidence, $h(p_j)$ denotes the path score matrix, $\lVert \cdot \rVert_2$ denotes the L2 norm of a matrix, $g(f_k)$ denotes the maximum path confidence, and $h(f_k)$ denotes the optimal path score matrix of the triple, which is also the optimal embedding matrix of the triple.
4. The method of knowledge-graph noise detection based on path confidence of claim 3, wherein the designed loss function is as follows:
$$L = \sum_{(h,r,t) \in T' \cup T''} \log\left[1 + \exp\left(l_{(h,r,t)} \cdot P(h,r,t)\right)\right] \tag{18}$$
[Equation (19), which defines the label $l_{(h,r,t)}$, appears only as an image in the source.]
wherein exp(·) denotes the exponential function with the natural constant e as its base, log(·) denotes the logarithmic function, L denotes the loss function, P(h, r, t) denotes a path from the head entity h to the tail entity t, r denotes a relationship, T' denotes the set of valid triples, and T'' denotes the set of invalid triples; an invalid triple is a triple formed by randomly replacing the head entity or the tail entity of an original triple, and a valid triple is an original triple.
CN202111393836.9A 2021-11-23 2021-11-23 Knowledge graph noise detection method based on path confidence Active CN114077676B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111393836.9A CN114077676B (en) 2021-11-23 2021-11-23 Knowledge graph noise detection method based on path confidence

Publications (2)

Publication Number Publication Date
CN114077676A CN114077676A (en) 2022-02-22
CN114077676B CN114077676B (en) 2022-09-30

Family

ID=80284076

Country Status (1)

Country Link
CN (1) CN114077676B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114691896B (en) * 2022-05-31 2022-09-13 浙江大学 Knowledge graph data cleaning method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10606849B2 (en) * 2016-08-31 2020-03-31 International Business Machines Corporation Techniques for assigning confidence scores to relationship entries in a knowledge graph
CN112035672B (en) * 2020-07-23 2023-05-09 深圳技术大学 Knowledge graph completion method, device, equipment and storage medium
CN112732931A (en) * 2021-01-08 2021-04-30 中国人民解放军国防科技大学 Method and equipment for noise detection and knowledge completion of knowledge graph
CN112819162B (en) * 2021-02-02 2024-02-27 东北大学 Quality inspection method for knowledge-graph triples
CN112836064B (en) * 2021-02-24 2023-05-16 吉林大学 Knowledge graph completion method and device, storage medium and electronic equipment
CN113420163B (en) * 2021-06-25 2022-09-16 中国人民解放军国防科技大学 Heterogeneous information network knowledge graph completion method and device based on matrix fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant