CN114077676B - Knowledge graph noise detection method based on path confidence - Google Patents
Knowledge graph noise detection method based on path confidence
- Publication number
- CN114077676B (application CN202111393836.9A)
- Authority
- CN
- China
- Prior art keywords
- path
- confidence
- matrix
- triples
- gru
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Mathematical Optimization (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Molecular Biology (AREA)
- Pure & Applied Mathematics (AREA)
- Algebra (AREA)
- Probability & Statistics with Applications (AREA)
- Animal Behavior & Ethology (AREA)
- Complex Calculations (AREA)
Abstract
The invention provides a knowledge graph noise detection method based on path confidence, comprising the following steps: first, initialize the triples, find all paths of every triple, embed each triple on each path using the translation model TransE, and represent every path of a triple as a path embedding sequence, in which each pair of adjacent triples forms a node; second, input the nodes in sequence into the correlation-based probabilistic logic layer (CPLL) to calculate the confidence score of each node on each path; third, input the node confidence scores of each path into a Bi-GRU to obtain a score matrix for each path; finally, take the L2 norm of each path's score matrix as the path confidence, and take the score matrix with the highest path confidence as the optimal embedding matrix of the triple. By combining path-based and rule-based methods, the invention improves the efficiency of detecting noise in a knowledge graph and thereby improves the quality of the knowledge graph.
Description
Technical Field
The invention relates to the technical field of knowledge graphs, in particular to a knowledge graph noise detection method based on path confidence.
Background
Knowledge graphs play an important role in artificial intelligence tasks. However, manually or automatically constructed knowledge graphs have many quality issues and often contain erroneous or missing triples. Noise in a knowledge graph may be caused by human error or by errors in the data, and most noise appears as erroneous entities or relationships in triples. A growing number of researchers are focusing on the problem of knowledge graph noise and have proposed many solutions.
Noise detection methods for knowledge graphs can be broadly divided into path-based methods and rule-based methods. Path-based methods start from translation models such as TransE, TransH and TransR, which are mostly used for knowledge graph embedding and completion but can also be used to detect noise in a knowledge graph. Melo et al. proposed the PaTyBRED model, which incorporates type and path features into a local relation classifier and keeps a specific path for each relation to indicate whether a triple is erroneous. Xie et al. proposed the CKRL model, which uses the local and global information of triples to represent the probability that a triple is erroneous. However, path-based methods are weak at finding noise and are not suitable for knowledge graphs that contain complex relationships. Rule-based methods generally detect noise better than path-based methods. Brocheler et al. proposed the PSL model, which extracts the most likely correct triples from ambiguous triples using first-order predicate logic and weighted rules. Abedini et al. proposed Correction Tower, which identifies discrete, inconsistent and erroneous relationships in triples in three steps. However, rule-based methods lack the ability to represent the knowledge graph: after a rule-based method detects and removes noise, the knowledge graph must still be mapped to a continuous vector space so that it is convenient to use in downstream tasks.
If the path-based and rule-based approaches can be combined, noise can be found and a noise-free knowledge graph representation can be constructed at the same time. Specifically, rules are first formulated to screen out effective features on the paths of a triple. These features must distinguish noise from correct information, where the correct information includes global triple information and local triple information. The features are then used to complete noise detection and triple representation, which improves the quality of the knowledge graph and the user experience.
Disclosure of Invention
The invention provides a knowledge graph noise detection method based on path confidence, which addresses two technical problems: existing path-based methods are weak at finding noise and unsuitable for knowledge graphs containing complex relationships, and rule-based methods lack the ability to represent knowledge.
The technical scheme of the invention is realized as follows:
A knowledge graph noise detection method based on path confidence comprises the following steps:
Step one: initialize the number of triples, find all paths of every triple, embed each triple on each path using the translation model TransE, and represent every path of a triple as a path embedding sequence; each pair of adjacent triples in the path embedding sequence forms a node, and the number of nodes is n;
Step two: input the nodes in sequence into the correlation-based probabilistic logic layer (CPLL), and calculate the confidence score matrix of each node on each path;
Step three: input the confidence score matrices of all nodes on each path into the Bi-GRU to obtain a score matrix for each path;
Step four: take the L2 norm of each path's score matrix as the path confidence, and take the score matrix with the highest path confidence as the optimal embedding matrix of the triple.
Preferably, step two is carried out as follows:
S21. Initialize the input node T:
$$T = N'_i \cdot (N'_{i+1})^{\mathsf{T}} \tag{1}$$
$$N'_i = (x'_i, r'_i, x'_{i+1}) \tag{2}$$
$$N'_{i+1} = (x'_{i+1}, r'_{i+1}, x'_{i+2}) \tag{3}$$
where $N'_i$ denotes the embedding matrix of the i-th triple on the path, $N'_{i+1}$ denotes the embedding matrix of the (i+1)-th triple on the path, $(N'_{i+1})^{\mathsf{T}}$ denotes the transpose of $N'_{i+1}$, $x'_i$, $x'_{i+1}$ and $x'_{i+2}$ all denote entities, and $r'_i$ and $r'_{i+1}$ both denote relations;
S22. Multiply the node T by the parameter matrix $W_0$ to obtain the global confidence between the triples, i.e., the global triple confidence:
$$GTT(i, i+1) = T \cdot W_0 \tag{4}$$
where GTT(i, i+1) is the global triple confidence;
S23. The node T enters the Separate&Padding layer, where the sub-matrix blocks $T_1$, $T_2$, $T_3$ on the diagonal of T are separated; $T_1$, $T_2$, $T_3$ are then multiplied by the parameter matrices $W_1$, $W_2$, $W_3$, respectively, to obtain D, E and F; correlation-based logic operations are performed on D, E and F, and the results are added to obtain the local confidence between the triples, i.e., the local triple confidence:
$$T_1 = x'_i \cdot x'_{i+1}, \quad T_2 = r'_i \cdot r'_{i+1}, \quad T_3 = x'_{i+1} \cdot x'_{i+2} \tag{5}$$
$$D = T_1 \cdot W_1, \quad E = T_2 \cdot W_2, \quad F = T_3 \cdot W_3 \tag{6}$$
where MIN(·) denotes the minimum value of a matrix, MAX(·) denotes the maximum value of a matrix, 1 denotes a matrix whose elements are all 1, -1 denotes a matrix whose elements are all -1, the operator symbols in equations (7) to (11) respectively denote different logic operations, and LTT(i, j) is the local triple confidence;
S24. Multiply the global triple confidence by the local triple confidence to obtain the confidence score $G_i$ of the node T:
$$G_i = GTT(i, i+1) \cdot LTT(i, i+1) \tag{12}$$
Preferably, step three is carried out as follows:
S31. Take the confidence score $G_i$ of each node and the confidence scores $G_{i+1}$ and $G_{i-1}$ of its neighboring nodes as the input of the bidirectional GRU; the i-th forward GRU and backward GRU are calculated respectively as follows:
where the left-hand sides are the output of the forward GRU and the output of the backward GRU, respectively, and GRU(·) denotes the gated recurrent network.
S32. Concatenate the final outputs of the forward GRU and the backward GRU, then apply a linear operation and a normalization operation to obtain the path score matrix:
where h(p) denotes the output of the gated recurrent network, i.e., the path score matrix, computed from the final output of the forward GRU and the final output of the backward GRU; concat(·) denotes the concatenation function, line(·) denotes the linear function, and softmax(·) denotes the normalization function.
Preferably, in step four, the path confidence and the optimal triple are calculated as follows:
where g(p) denotes the path confidence, $h(p_j)$ denotes the path score matrix, $\|\cdot\|_2$ denotes the L2 norm of a matrix, $g(f_k)$ denotes the maximum path confidence, and $h(f_k)$ denotes the optimal path score matrix of the triple, which is also the optimal embedding matrix of the triple.
Preferably, the designed loss function is as follows:
$$L = \sum_{(h,r,t) \in \{T' \cup T''\}} \log\left[1 + \exp\left(l_{(h,r,t)} \cdot P(h,r,t)\right)\right] \tag{18}$$
where exp(·) denotes the exponential function with the natural constant e as its base, log(·) denotes the logarithmic function, L denotes the loss function, P(h, r, t) denotes the path from the head entity h to the tail entity t, r denotes the relation, T' denotes the set of valid triples, and T'' denotes the set of invalid triples. An invalid triple is formed by randomly replacing the head entity or the tail entity of an original triple, and a valid triple is an original triple.
Compared with the prior art, the invention has the following beneficial effects:
1) Building on the path-based internal structure information of the knowledge graph, the invention introduces a correlation-based probabilistic model and fuses it into a neural network structure to detect noise in the knowledge graph and to represent the knowledge graph.
2) The invention constructs a path confidence network to calculate the global triple confidence and the local triple confidence, and combines it with a bidirectional gated recurrent network to obtain the path confidence and the path score matrix of a triple; the path confidence is used to determine whether the triple is correct, and the path score matrix is used to represent the triple.
3) The invention addresses the problem of knowledge graph noise, completes the representation of the knowledge graph, and achieves good results in knowledge graph noise detection tests.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a sub-graph of all paths from the entity "champion" to the entity "team";
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a flow chart of the proposed model of the present invention;
FIG. 4 is a block diagram of a correlation-based probabilistic logic model of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art based on the embodiments of the present invention without inventive step, are within the scope of the present invention.
In general, relationships between triples in a knowledge graph can be expressed in the form of a path. When a triple f is expressed as (h, r, t), the path P = (h, R, t) from the head entity h to the tail entity t cannot be ignored. Here R contains at least one relationship and possibly several entities, and these entities and relationships may form several triples N, which the present invention refers to as path triples. Every two adjacent triples constitute a node. R ⊇ r, and when R = r the path P equals f, which means that f is the shortest path.
There may be multiple paths from the head entity to the tail entity, but some paths are incorrect, some are incomplete, and the information in some is not suitable for representing the triple. FIG. 1 shows the set of all paths of $f_1$ = ("champion", "joining", "team"), i.e., the set of paths from the entity "champion" to the entity "team". In FIG. 1, $f_1$ is the shortest path and is also the triple itself; $f_2$ and $f_3$ (…) are incorrect, and the correct triple is ("basketball game", "equals", "match"). Thus, any path that combines $f_2$ or $f_3$ with other triples is noisy. Such noisy paths must be processed before their path score matrices can be used to represent the triple.
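The notion of several paths between a head entity and a tail entity, with adjacent triples paired into nodes, can be sketched as follows. This is a minimal illustration rather than the procedure of the invention: the function names `find_paths` and `path_nodes`, the toy graph, and the `max_hops` cut-off are all assumptions introduced here.

```python
from collections import defaultdict

def find_paths(triples, head, tail, max_hops=3):
    """Enumerate simple paths (lists of triples) from head to tail."""
    out_edges = defaultdict(list)
    for h, r, t in triples:
        out_edges[h].append((h, r, t))
    paths = []

    def dfs(entity, visited, path):
        if entity == tail and path:
            paths.append(list(path))
            return
        if len(path) == max_hops:      # hypothetical length limit
            return
        for h, r, t in out_edges[entity]:
            if t not in visited:       # keep paths simple (no repeated entity)
                path.append((h, r, t))
                dfs(t, visited | {t}, path)
                path.pop()

    dfs(head, {head}, [])
    return paths

def path_nodes(path):
    """Pair adjacent triples; each pair is one node of the path."""
    return list(zip(path, path[1:]))

# Toy graph: a direct triple a -> c and a two-hop alternative via b.
toy = [("a", "r1", "b"), ("b", "r2", "c"), ("a", "r3", "c")]
paths = find_paths(toy, "a", "c")
for p in paths:
    print(len(p), "triple(s),", len(path_nodes(p)), "adjacent pair(s)")
```

A path that consists of the triple alone yields no adjacent pair in this sketch; the description treats that triple itself as the only node of such a path.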
However, most path-based knowledge graph representation methods do not exclude the noise contained in a path, whereas the rule-based approach is well suited to this problem. Specifically, each node on a path is given a confidence that indicates how likely the node is to be correct, and a path confidence, which indicates how likely the path is to be correct, is then obtained by probabilistic combination. If the path from the head entity to the tail entity consists only of the triple itself, the triple is the only node on the path, and the triple confidence, the node confidence and the path confidence are equal. In practice there may be multiple paths, and it is most appropriate to represent the triple by the path with the highest path confidence. If the triples are represented as matrices, the path score matrix is obtained by probabilistic combination of the node confidences, and the L2 norm of the path score matrix is used as the path confidence.
As shown in fig. 2, an embodiment of the present invention provides a method for detecting noise in a knowledge-graph based on path confidence, which includes the following specific steps:
Step one: for E triples, find all paths of every triple. Initialize the number of triples as E and traverse all triples; for each triple, traverse all of its paths, initializing the number of paths as P. Embed each triple on each path using the translation model TransE, and represent every path of a triple as a path embedding sequence. Each pair of adjacent triples in the path embedding sequence forms a node; the number of nodes is n, and the number of nodes is initialized as N. The structure of the model is shown in FIG. 3.
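As a rough sketch of step one, the snippet below turns a path into a path embedding sequence and evaluates the usual TransE energy, ||h + r - t||, for one triple. Several things here are assumptions: the L2 variant of the energy, the 2-dimensional untrained vectors, and the helper names `transe_energy` and `embed_path`. The text does not state how the embeddings are trained or shaped.

```python
import math

def transe_energy(h, r, t):
    """TransE energy ||h + r - t||_2; lower means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def embed_path(path, entity_vec, relation_vec):
    """Turn a path (list of triples) into a list of 3 x d matrices,
    one per triple, with rows (head, relation, tail)."""
    return [[entity_vec[h], relation_vec[r], entity_vec[t]] for h, r, t in path]

# Hypothetical 2-dimensional embeddings (untrained, for illustration only).
entity_vec = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [1.0, 1.0]}
relation_vec = {"r1": [1.0, 0.0], "r2": [0.0, 1.0]}

path = [("a", "r1", "b"), ("b", "r2", "c")]
sequence = embed_path(path, entity_vec, relation_vec)
energy = transe_energy(entity_vec["a"], relation_vec["r1"], entity_vec["b"])
print(len(sequence), energy)
```

The toy vectors are chosen so that a + r1 equals b exactly, which is why the energy of the first triple is zero.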
Step two: as shown in FIG. 4, input the nodes in sequence into the correlation-based probabilistic logic layer (CPLL), and calculate the confidence score of each node on each path.
Step two is carried out as follows:
S21. Initialize the input node T:
$$T = N'_i \cdot (N'_{i+1})^{\mathsf{T}} \tag{1}$$
$$N'_i = (x'_i, r'_i, x'_{i+1}) \tag{2}$$
$$N'_{i+1} = (x'_{i+1}, r'_{i+1}, x'_{i+2}) \tag{3}$$
where $N'_i$ and $N'_{i+1}$ denote the embedding matrices of the i-th and (i+1)-th triples on the path, respectively, $(N'_{i+1})^{\mathsf{T}}$ denotes the transpose of $N'_{i+1}$, $x'_i$, $x'_{i+1}$ and $x'_{i+2}$ denote entities, and $r'_i$ and $r'_{i+1}$ denote relations.
S22. Multiply the node T by the parameter matrix $W_0$ to obtain the global confidence between the triples, i.e., the global triple confidence:
$$GTT(i, i+1) = T \cdot W_0 \tag{4}$$
where GTT(i, i+1) is the global triple confidence.
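Equations (1) and (4) can be illustrated with a small worked example. The sketch assumes that each triple embedding is a 3 × d matrix whose rows are the head, relation and tail vectors; the source does not state the shapes. Under that assumption T is 3 × 3 and its diagonal holds the three products listed in equation (5). The helpers `matmul` and `transpose` and all numeric values are made up for illustration.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Rows are x'_i, r'_i, x'_{i+1} (hypothetical 2-dimensional embeddings).
N_i = [[1, 0], [0, 1], [1, 1]]
# Rows are x'_{i+1}, r'_{i+1}, x'_{i+2}.
N_next = [[1, 1], [1, 0], [0, 2]]

T = matmul(N_i, transpose(N_next))        # equation (1); 3 x 3 under the assumption
diagonal = [T[k][k] for k in range(3)]    # x'_i.x'_{i+1}, r'_i.r'_{i+1}, x'_{i+1}.x'_{i+2}

W0 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # placeholder for the learned W_0
GTT = matmul(T, W0)                       # equation (4)
print(T, diagonal)
```

With an identity placeholder for W_0 the global confidence simply equals T; a trained W_0 would reweight its columns.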
S23. The node T enters the Separate&Padding layer, where the sub-matrix blocks $T_1$, $T_2$, $T_3$ on the diagonal of T are separated; $T_1$, $T_2$, $T_3$ are then multiplied by the parameter matrices $W_1$, $W_2$, $W_3$, respectively, to obtain D, E and F; correlation-based logic operations are performed on D, E and F, and the results are added to obtain the local confidence between the triples, i.e., the local triple confidence:
$$T_1 = x'_i \cdot x'_{i+1}, \quad T_2 = r'_i \cdot r'_{i+1}, \quad T_3 = x'_{i+1} \cdot x'_{i+2} \tag{5}$$
$$D = T_1 \cdot W_1, \quad E = T_2 \cdot W_2, \quad F = T_3 \cdot W_3 \tag{6}$$
where MIN(·) denotes the minimum value of a matrix, MAX(·) denotes the maximum value of a matrix, 1 denotes a matrix whose elements are all 1, -1 denotes a matrix whose elements are all -1, the operator symbols in equations (7) to (11) respectively denote different logic operations, and LTT(i, j) is the local triple confidence.
S24. Multiply the global triple confidence by the local triple confidence to obtain the confidence score $G_i$ of the node T:
$$G_i = GTT(i, i+1) \cdot LTT(i, i+1) \tag{12}$$
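A small sketch of equations (6) and (12) follows. It makes several assumptions that the text does not confirm: the diagonal blocks are scalars, the product in equation (12) is a matrix product, and all shapes and values are chosen only for brevity. Because the logic operations that turn D, E and F into LTT are not reproduced in this text, `LTT` is passed in as a hypothetical input rather than computed.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def scale(s, m):
    """Multiply every element of matrix m by the scalar s."""
    return [[s * v for v in row] for row in m]

T1, T2, T3 = 1, 0, 2                       # hypothetical diagonal entries of T
W1 = W2 = W3 = [[1, 0], [0, 1]]            # placeholders for learned parameters
D, E, F = scale(T1, W1), scale(T2, W2), scale(T3, W3)   # equation (6)

def node_score(GTT, LTT):
    """Equation (12), read here as a matrix product (an assumption)."""
    return matmul(GTT, LTT)

GTT = [[1, 1], [0, 1]]                     # hypothetical global triple confidence
LTT = [[2, 0], [0, 2]]                     # hypothetical local triple confidence
G = node_score(GTT, LTT)
print(D, E, F, G)
```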
Step three: input the confidence scores of all nodes on each path, in order, into a Bi-GRU (bidirectional gated recurrent network) to obtain a score matrix for each path.
Step three is carried out as follows:
S31. Take the confidence score $G_i$ of each node and the confidence scores $G_{i+1}$ and $G_{i-1}$ of its neighboring nodes as the input of the bidirectional GRU; the i-th forward GRU and backward GRU are calculated respectively as follows:
where the left-hand sides are the output of the forward GRU and the output of the backward GRU, respectively, and GRU(·) denotes the gated recurrent network.
S32. To retain as much effective information as possible, concatenate the final outputs of the forward GRU and the backward GRU, then apply a linear operation and a normalization operation to obtain the path score matrix:
where h(p) denotes the output of the gated recurrent network, i.e., the path score matrix, computed from the final output of the forward GRU and the final output of the backward GRU; concat(·) denotes the concatenation function, line(·) denotes the linear function, and softmax(·) denotes the normalization function.
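The concatenate, linear, softmax chain described in S32 can be sketched as below. The GRU recurrences themselves are not shown, so the two final hidden states are hypothetical inputs, treated here as plain vectors; the weight matrix, the bias, and the function name `path_score` are likewise assumptions.

```python
import math

def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def linear(v, weight, bias):
    """Affine map: one output per row of weight."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weight, bias)]

def path_score(h_forward, h_backward, weight, bias):
    """softmax(line(concat(forward, backward))), following S32."""
    return softmax(linear(h_forward + h_backward, weight, bias))

# Hypothetical final hidden states and parameters.
h_fwd = [0.2, -0.1]
h_bwd = [0.4, 0.3]
weight = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 1.0]]
bias = [0.0, 0.0, 0.0]

h_p = path_score(h_fwd, h_bwd, weight, bias)
print(h_p)
```

Because softmax normalizes its input, the entries of `h_p` are positive and sum to one.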
Step four: take the L2 norm of each path's score matrix as the path confidence, and take the score matrix with the highest path confidence as the optimal embedding matrix of the triple.
In step four, the path confidence and the optimal triple are calculated as follows:
where g(p) denotes the path confidence, $h(p_j)$ denotes the path score matrix, $\|\cdot\|_2$ denotes the L2 norm of a matrix, $g(f_k)$ denotes the maximum path confidence, and $h(f_k)$ denotes the optimal path score matrix of the triple, which is also the optimal embedding matrix of the triple.
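Step four amounts to scoring every path of a triple by a matrix norm and keeping the best one, which can be sketched as follows. The sketch reads "L2 norm of a matrix" as the Frobenius norm (the square root of the sum of squared entries); that reading, the function names, and the example matrices are assumptions.

```python
import math

def l2_norm(matrix):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))

def best_path(score_matrices):
    """Return (index, confidence, matrix) of the most confident path."""
    confidences = [l2_norm(m) for m in score_matrices]
    k = max(range(len(confidences)), key=confidences.__getitem__)
    return k, confidences[k], score_matrices[k]

# Hypothetical score matrices h(p_1), h(p_2) for two paths of one triple.
scores = [[[0.5, 0.5]], [[0.9, 0.1]]]
k, confidence, embedding = best_path(scores)
print(k, round(confidence, 4))
```

For rows that sum to one, as softmax outputs do, this norm is larger when the mass is concentrated on few entries, so the more peaked score matrix is selected here.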
To train the proposed model, the loss function is designed as follows:
$$L = \sum_{(h,r,t) \in \{T' \cup T''\}} \log\left[1 + \exp\left(l_{(h,r,t)} \cdot P(h,r,t)\right)\right] \tag{18}$$
where exp(·) denotes the exponential function with the natural constant e as its base, log(·) denotes the logarithmic function, L denotes the loss function, P(h, r, t) denotes the path from the head entity h to the tail entity t, r denotes the relation, T' denotes the set of valid triples, and T'' denotes the set of invalid triples. An invalid triple is formed by randomly replacing the head entity or the tail entity of an original triple, and a valid triple is an original triple.
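The form of equation (18) is a sum of softplus terms, log(1 + exp(x)), which the sketch below evaluates in a numerically stable way. The text does not define the factor l_(h,r,t); the sketch assumes it is a label of +1 or -1 and that P(h, r, t) has been reduced to a scalar score, and it does not assume which sign belongs to valid triples. The sample values are made up.

```python
import math

def softplus(x):
    """log(1 + exp(x)) computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def loss(samples):
    """Sum of log(1 + exp(label * score)) over (label, score) pairs."""
    return sum(softplus(label * score) for label, score in samples)

# Hypothetical (label, score) pairs; labels are assumed to be +1 or -1.
samples = [(-1, 2.0), (1, -1.5)]
total = loss(samples)
print(round(total, 6))
```

Each term shrinks as the product of label and score becomes more negative, so minimizing the sum pushes scores toward the sign opposite to their label.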
The present invention uses three benchmark datasets for knowledge graph noise detection, FB15K, WN18 and NELL995, which are constructed from information extracted from the Freebase, WordNet and NELL knowledge bases, respectively. Their statistics are listed in Table 1.
Table 1. Statistics of the benchmark datasets FB15K, WN18 and NELL995
To evaluate the model, noise must be added to the datasets above. The basic method is as follows: for a given positive triple (h, r, t), either the head entity or the tail entity is randomly replaced to form a negative triple (h', r, t) or (h, r, t'), which serves as noise. In this way, datasets containing 10%, 20% and 40% noise are constructed for each benchmark dataset. These noisy datasets share the same entities, relations, validation sets and test sets as the original datasets, and all generated noise is merged into the original training set.
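The corruption procedure can be sketched in a few lines. Two details below are assumptions that the text does not spell out: the noise ratio is taken relative to the number of positive triples, and a corrupted triple is discarded if it already exists in the data. The function name `make_noise`, the fixed seed, and the toy triples are illustrative only.

```python
import random

def make_noise(triples, ratio, seed=0):
    """Corrupt the head or tail of randomly chosen triples to create noise."""
    rng = random.Random(seed)
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    known = set(triples)
    target = int(len(triples) * ratio)
    noise = []
    while len(noise) < target:
        h, r, t = rng.choice(triples)
        e = rng.choice(entities)
        # Replace the head or the tail with equal probability.
        corrupted = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if corrupted not in known:         # skip true or repeated triples
            known.add(corrupted)
            noise.append(corrupted)
    return noise

positives = [("a", "r1", "b"), ("b", "r2", "c"), ("c", "r1", "d"),
             ("d", "r3", "e"), ("e", "r2", "f")]
noise = make_noise(positives, ratio=0.4)
noisy_training_set = positives + noise
print(len(noise), len(noisy_training_set))
```

Changing `ratio` to 0.1 or 0.2 would give the other noise levels mentioned above on a dataset large enough for those fractions to be non-zero.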
The invention takes the L2 norm of the path score matrix as the path confidence, and all triples in the training set are then ranked by path confidence. The greater the path confidence of a triple, the more effectively the triple is represented.
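Ranking by path confidence is a one-line sort, shown here with made-up confidence values. Treating the lowest-ranked triples as noise candidates is an assumption of this sketch; the text states only that triples are ranked and that higher confidence means a better representation.

```python
def rank_by_confidence(confidence):
    """Return triples sorted from highest to lowest path confidence."""
    return sorted(confidence, key=confidence.get, reverse=True)

# Hypothetical path confidences; not experimental results.
confidence = {
    ("a", "r1", "b"): 0.91,
    ("a", "r1", "c"): 0.35,
    ("b", "r2", "c"): 0.78,
}
ranking = rank_by_confidence(confidence)
suspects = ranking[-1:]   # assumed cut-off: flag the single lowest-ranked triple
print(ranking, suspects)
```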
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (4)
1. A knowledge graph noise detection method based on path confidence is characterized by comprising the following steps:
step one: initializing the number of triples, finding all paths of every triple, embedding each triple on each path using the translation model TransE, and representing every path of a triple as a path embedding sequence; each pair of adjacent triples in the path embedding sequence forms a node, and the number of nodes is n;
step two: inputting the nodes in sequence into the correlation-based probabilistic logic layer (CPLL), and calculating the confidence score matrix of each node on each path;
step two is carried out as follows:
S21. initializing the input node T:
$$T = N'_i \cdot (N'_{i+1})^{\mathsf{T}} \tag{1}$$
$$N'_i = (x'_i, r'_i, x'_{i+1}) \tag{2}$$
$$N'_{i+1} = (x'_{i+1}, r'_{i+1}, x'_{i+2}) \tag{3}$$
where $N'_i$ denotes the embedding matrix of the i-th triple on the path, $N'_{i+1}$ denotes the embedding matrix of the (i+1)-th triple on the path, $(N'_{i+1})^{\mathsf{T}}$ denotes the transpose of $N'_{i+1}$, $x'_i$, $x'_{i+1}$ and $x'_{i+2}$ all denote entities, and $r'_i$ and $r'_{i+1}$ both denote relations;
S22. multiplying the node T by the parameter matrix $W_0$ to obtain the global confidence between the triples, i.e., the global triple confidence:
$$GTT(i, i+1) = T \cdot W_0 \tag{4}$$
where GTT(i, i+1) is the global triple confidence;
S23. the node T enters the Separate&Padding layer, where the sub-matrix blocks $T_1$, $T_2$, $T_3$ on the diagonal of T are separated; $T_1$, $T_2$, $T_3$ are then multiplied by the parameter matrices $W_1$, $W_2$, $W_3$, respectively, to obtain D, E and F; correlation-based logic operations are performed on D, E and F, and the results are added to obtain the local confidence between the triples, i.e., the local triple confidence:
$$T_1 = x'_i \cdot x'_{i+1}, \quad T_2 = r'_i \cdot r'_{i+1}, \quad T_3 = x'_{i+1} \cdot x'_{i+2} \tag{5}$$
$$D = T_1 \cdot W_1, \quad E = T_2 \cdot W_2, \quad F = T_3 \cdot W_3 \tag{6}$$
where MIN(·) denotes the minimum value of a matrix, MAX(·) denotes the maximum value of a matrix, 1 denotes a matrix whose elements are all 1, -1 denotes a matrix whose elements are all -1, the operator symbols in equations (7) to (11) respectively denote different logic operations, and LTT(i, j) is the local triple confidence;
S24. multiplying the global triple confidence by the local triple confidence to obtain the confidence score $G_i$ of the node T:
$$G_i = GTT(i, i+1) \cdot LTT(i, i+1) \tag{12}$$
step three: inputting the confidence score matrices of all nodes on each path into the Bi-GRU to obtain a score matrix for each path;
step four: taking the L2 norm of each path's score matrix as the path confidence, and taking the score matrix with the highest path confidence as the optimal embedding matrix of the triple.
2. The method for detecting knowledge-graph noise based on path confidence as claimed in claim 1, wherein in step three, the specific method is:
S31. taking the confidence score $G_i$ of each node and the confidence scores $G_{i+1}$ and $G_{i-1}$ of its neighboring nodes as the input of the bidirectional GRU, the i-th forward GRU and backward GRU being calculated respectively as follows:
where the left-hand sides are the output of the forward GRU and the output of the backward GRU, respectively, and GRU(·) denotes the gated recurrent network;
S32. concatenating the final outputs of the forward GRU and the backward GRU, then applying a linear operation and a normalization operation to obtain the path score matrix:
where h(p) denotes the output of the gated recurrent network, i.e., the path score matrix, computed from the final output of the forward GRU and the final output of the backward GRU; concat(·) denotes the concatenation function, line(·) denotes the linear function, and softmax(·) denotes the normalization function.
3. The method for knowledge-graph noise detection based on path confidence as claimed in claim 2, wherein in step four, the path confidence and the optimal triplet are calculated by:
4. The method of knowledge-graph noise detection based on path confidence of claim 3, wherein the designed loss function is as follows:
$$L = \sum_{(h,r,t) \in \{T' \cup T''\}} \log\left[1 + \exp\left(l_{(h,r,t)} \cdot P(h,r,t)\right)\right] \tag{18}$$
where exp(·) denotes the exponential function with the natural constant e as its base, log(·) denotes the logarithmic function, L denotes the loss function, P(h, r, t) denotes the path from the head entity h to the tail entity t, r denotes the relation, T' denotes the set of valid triples, and T'' denotes the set of invalid triples; an invalid triple is formed by randomly replacing the head entity or the tail entity of an original triple, and a valid triple is an original triple.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111393836.9A CN114077676B (en) | 2021-11-23 | 2021-11-23 | Knowledge graph noise detection method based on path confidence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111393836.9A CN114077676B (en) | 2021-11-23 | 2021-11-23 | Knowledge graph noise detection method based on path confidence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114077676A (en) | 2022-02-22 |
CN114077676B (en) | 2022-09-30 |
Family
ID=80284076
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111393836.9A Active CN114077676B (en) | 2021-11-23 | 2021-11-23 | Knowledge graph noise detection method based on path confidence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114077676B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114691896B (en) * | 2022-05-31 | 2022-09-13 | 浙江大学 | Knowledge graph data cleaning method and device |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10606849B2 (en) * | 2016-08-31 | 2020-03-31 | International Business Machines Corporation | Techniques for assigning confidence scores to relationship entries in a knowledge graph |
CN112035672B (en) * | 2020-07-23 | 2023-05-09 | 深圳技术大学 | Knowledge graph completion method, device, equipment and storage medium |
CN112732931A (en) * | 2021-01-08 | 2021-04-30 | 中国人民解放军国防科技大学 | Method and equipment for noise detection and knowledge completion of knowledge graph |
CN112819162B (en) * | 2021-02-02 | 2024-02-27 | 东北大学 | Quality inspection method for knowledge-graph triples |
CN112836064B (en) * | 2021-02-24 | 2023-05-16 | 吉林大学 | Knowledge graph completion method and device, storage medium and electronic equipment |
CN113420163B (en) * | 2021-06-25 | 2022-09-16 | 中国人民解放军国防科技大学 | Heterogeneous information network knowledge graph completion method and device based on matrix fusion |
Also Published As
Publication number | Publication date |
---|---|
CN114077676A (en) | 2022-02-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||