CN112417166A - Knowledge graph triple confidence evaluation method - Google Patents

Publication number
CN112417166A
Authority
CN
China
Prior art keywords
entity
confidence
node
knowledge
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011309998.5A
Other languages
Chinese (zh)
Other versions
CN112417166B (en)
Inventor
杨帅
王小红
赵志刚
窦方坤
曹皓伟
潘景山
魏志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Shandong Computer Science Center National Super Computing Center in Jinan
Application filed by Shandong Computer Science Center National Super Computing Center in Jinan
Priority to CN202011309998.5A
Publication of CN112417166A
Application granted
Publication of CN112417166B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioethics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Animal Behavior & Ethology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a knowledge graph triple confidence evaluation method comprising an evaluation stage, a fusion stage and a verification stage. The evaluation stage comprises: a) entity-level evaluation from the perspectives of a-1) data source, a-2) document co-occurrence, a-3) external-link scale, a-4) text description, a-5) entity importance and a-6) entity degree; b) relationship-level evaluation from the perspectives of b-1) data source, b-2) document co-occurrence, b-3) known relationships between entities and b-4) unknown relationships between entities; and c) evaluation at the global level of the knowledge graph. The method can discover errors in knowledge graph data efficiently, quickly and at scale, thereby improving the data quality of the whole knowledge graph system; it can also be used to check the reliability of the results of machine learning tasks such as link prediction and relationship inference.

Description

Knowledge graph triple confidence evaluation method
Technical Field
The invention relates to a knowledge graph triple confidence evaluation method, in particular to a knowledge graph triple confidence evaluation method comprising an evaluation stage, a fusion stage and a verification stage.
Background
With different targets and drugs as entities and the interactions between them as relationships, related knowledge is stored on the entities and relationships as attributes, and these are interwoven into a large graph that supports query, reasoning, intelligent analysis and other functions. This graph is called a Drug-Target Knowledge Graph (DT KG). The DT KG is an important direction of knowledge graph research in the biomedical field: it helps reveal the complex physical and biochemical interactions between drugs and targets, discover implicit drug-target relationships that have not yet been found, and thereby discover new drugs or new uses for existing drugs.
Errors are inevitable when a knowledge graph is constructed. To find these errors, improve the quality of the knowledge graph, and in turn improve the performance of knowledge-driven learning tasks, researchers introduced the concept of knowledge graph triple confidence. Knowledge graph triple confidence (KG triple trust) measures how true the knowledge expressed by a triple is. It takes values in [0,1]: the closer the value is to 0, the more likely the triple is wrong, and the closer it is to 1, the more likely the triple is true.
Existing knowledge graph triple confidence evaluation methods fall into 3 types, classified by the stage at which the method is applied, as indicated by labels 1, 2 and 3 in FIG. 1. The first type is used while triples are being extracted from text data; a typical case is the KnowLife knowledge base of the Max Planck Institute for Informatics, Germany. The second type is used in the Embedding process, which aims to encode all entities and relationships into a continuous vector space. Confidence evaluation and noise elimination during Embedding have been a research hotspot in recent years; typical methods include SCEF (a support-confidence-aware KG embedding framework), CKRL (a confidence-aware knowledge representation learning framework) and TransT (a translation-based embedding learning approach with triple trustiness). The third type evaluates triples directly; it can measure the reliability of triples obtained by knowledge inference and is also suitable for confidence evaluation of dynamic knowledge bases. Typical methods are KGTtm (a knowledge graph triple trustworthiness measurement model) and CTransE (knowledge graph embedding using an adaptive confidence-margin-based loss function for translation-based models).
Existing knowledge graph triple confidence evaluation methods are listed in Table 1 (7 methods):
TABLE 1
Method      Applicable stage                                   Year
KnowLife    Extracting entities and relationships from text    2015
SCEF        Embedding                                          2019
KGTtm       Triples                                            2019
TransT      Embedding                                          2019
CKRL        Embedding                                          2018
ConfGCN     Node attribute prediction                          2019
CTransE     Embedding                                          2019
(1) KnowLife provides a general, scalable method for automatically constructing a biomedical knowledge base. It extracts information automatically from scientific publications, health portals and online community resources, and introduces confidence evaluation rules into the extraction process to quantify the reliability of the extracted entity and relationship data, thereby improving the quality of the biomedical knowledge base.
(2) SCEF is a knowledge graph embedding framework that supports confidence awareness. Building on traditional translation-based models, it incorporates confidence into the energy function and completes and corrects the knowledge graph through knowledge representation learning with triple confidence (text, knowledge graph and triple).
(3) KGTtm is a measurement model of knowledge graph triple confidence that quantifies the semantic correctness of a triple and the truth of the fact it expresses at the entity level, the relationship level and the knowledge graph global level.
(4) TransT is a model that calculates triple confidence from information such as entity type and entity description, and is optimized with a cross-entropy-based loss function to improve the performance of knowledge embedding learning.
(5) CKRL is a confidence-based knowledge representation learning framework. It introduces a notion of confidence based on structural information and constructs an energy function from the vector representations of the triple's entities, its relation and the paths between the entities, improving both knowledge representation learning and noise detection in the knowledge graph.
(6) ConfGCN is a model for node attribute prediction; it can be used to estimate the label scores of nodes in the graph together with their confidence.
(7) CTransE is a translation-based model for handling errors introduced when a knowledge graph is updated automatically; it uses a confidence-based loss function to learn embedded representations of a dynamic knowledge graph.
However, the existing knowledge graph triple confidence evaluation methods have the following disadvantages:
1. The factors considered are incomplete and the confidence score is unreliable. Existing methods consider confidence factors at the knowledge graph global level, the entity level and the relationship level, but do not take scientific literature or data sources into account, so the resulting confidence score is unreliable.
2. Computational complexity is high and interpretability is poor. Existing methods evaluate triple confidence with machine learning models (for example, KGTtm evaluates confidence at the knowledge graph global level with an RNN, and SemaTyP evaluates confidence with a logistic regression model); such models are computationally expensive and hard to interpret.
3. Confidence evaluation is limited to the Embedding process. Most existing methods apply only to the Embedding process and cannot directly evaluate the quality of triples constructed by knowledge reasoning or by automated methods.
Disclosure of Invention
To overcome the above technical shortcomings, the invention provides a knowledge graph triple confidence evaluation method.
The knowledge graph triple confidence evaluation method comprises an evaluation stage, a fusion stage and a verification stage, and is characterized in that: the evaluation phase is realized by the following steps:
a) entity level assessment;
a-1) evaluating entities from the data source perspective: the entities to be evaluated comprise 11 types in total, namely compounds, diseases, proteins, genes, pathways, cell lines, drugs, products, targets, enzymes and protein-compounds; the data source confidence $N_r$ of each entity refers to the LOD scoring in The Linked Open Data Cloud, and the PubChem, RCSB PDB, DrugBank and DTO ontology data sources, which have no LOD score, are given scores of 5 stars, 5 stars and 4 stars respectively; the value of $N_r$ equals the number of stars of the LOD score, and if the same entity appears in two or more data sources, $N_r$ takes the highest score;
a-2) evaluating entities from the document co-occurrence perspective: documents related to the entity are queried in a document library, and the document co-occurrence confidence LCA of the entity is obtained by formula (1):
[Formula (1): equation image not reproduced in this text]
where LCA denotes the document co-occurrence confidence of the entity, N the number of documents related to the entity, F the impact factor of a document, L the number of citations of a document, T the score assigned to the document's category, i the index of the i-th document, and α, β and θ are weights;
a-3) evaluating entities from the external-link scale perspective: the external-link scale confidence $N_L$ of an entity is represented by the number of external links of the entity in the biomedical knowledge graph; the larger the external-link scale, the more reliable the entity data, so the credibility of the entity is measured by its number of external links, and $N_L$ equals that number;
a-4) evaluating entities from the text description perspective: the text description of an entity describes its concept, category and functional information, and an entity that has a text description is more reliable; if an entity in step a-1) has a corresponding text description, its text description confidence D is 1, otherwise D is 0;
a-5) evaluating entities from the entity importance perspective: in the biomedical knowledge graph, the importance of a node in the whole graph is directly determined by the number and quality of the entity nodes linked to it; the PageRank algorithm is used to measure the importance of an entity in the knowledge graph and thus to represent the entity importance confidence, as shown in formula (2):
$$PR(P_i) = \frac{1-q}{N} + q \sum_{P_j \in M(P_i)} \frac{PR(P_j)}{L(P_j)} \quad (2)$$
where $P_1, P_2, \ldots, P_i, \ldots, P_n$ denote nodes in the knowledge graph, $M(P_i)$ denotes the set of nodes linking into the node $P_i$ under consideration (its in-links), $L(P_j)$ denotes the out-degree of node $P_j$, $N$ denotes the number of nodes in the knowledge graph, and $PR(P_j)$ denotes the PageRank value of node $P_j$; the PageRank values of all nodes form the PageRank vector of the knowledge graph; $q$ denotes the probability that a node continues to expand in the knowledge graph and is set to 0.5;
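The importance score of step a-5) can be illustrated with a short power-iteration sketch. It assumes the standard PageRank update with q = 0.5 as the continuation probability; the function name, the edge-list input format, the example node names and the treatment of nodes without out-links are illustrative assumptions, not details given in the patent.

```python
def pagerank(edges, q=0.5, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Every node receives the "random restart" share (1 - q) / N.
        new_rank = {node: (1.0 - q) / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = q * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Assumption: a node without out-links spreads its rank evenly.
                share = q * rank[src] / n
                for dst in nodes:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: two drugs, one target, one disease.
edges = [
    ("drug_A", "target_X"),
    ("drug_B", "target_X"),
    ("target_X", "disease_Y"),
    ("disease_Y", "drug_A"),
]
ranks = pagerank(edges)
```

In this toy graph `target_X` has two incoming links and therefore ends up with the largest score, while `drug_B`, which nothing links to, keeps only the restart share.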
a-6) evaluating entities from the entity degree perspective: the in-degree and out-degree of an entity node reflect how rich the entity's information is in the knowledge graph and how strongly the entity is associated with other entities; the entity degree confidence $N_s$ is calculated by formula (3):
$$N_s = N_{in} + N_{out} \quad (3)$$
where $N_s$ denotes the entity degree confidence, $N_{in}$ the in-degree of the entity node, and $N_{out}$ the out-degree of the entity node;
b) relationship-level evaluation;
b-1) evaluating the relationship level from the data source perspective: a relationship between entities in the biomedical knowledge graph is generally represented by a triple (h, r, t), where h is the head entity, t is the tail entity, and r is the relationship between them; if the triple data comes from a high-quality data source, the association between the two entities is strong and the confidence of the triple information is high; the relationship-level data source confidence $N'_{in}$ refers to the LOD scoring in The Linked Open Data Cloud, and the PubChem, RCSB PDB, DrugBank and DTO ontology data sources, which have no LOD score, are given scores of 5 stars, 5 stars and 4 stars respectively; the value of $N'_{in}$ equals the number of stars of the LOD score, and if the same entity appears in two or more data sources, $N'_{in}$ takes the highest score;
b-2) evaluating the relationship level from the document co-occurrence perspective: documents related to the entity pair (h, t) are queried in a document library, and the document co-occurrence confidence LCA' of the entity pair (h, t) is obtained by formula (4):
[Formula (4): equation image not reproduced in this text]
where LCA' denotes the document co-occurrence confidence of the entity pair (h, t), N' the number of documents related to the entity pair (h, t), F the impact factor of a document, L the number of citations of a document, T the score assigned to the document's category, i the index of the i-th document, and α, β and θ are weights;
b-3) evaluating known relationships between entities: an entity relationship established during construction of the biomedical knowledge graph is called a known relationship; the ResourceRank algorithm is used to measure the confidence of a known relationship, yielding the known-relationship confidence ResourceRank;
b-4) evaluating unknown relationships between entities: an entity relationship that does not exist in the current knowledge graph and must be obtained through reasoning is called an unknown relationship; the KSP algorithm is used to measure the confidence of an unknown relationship, evaluating the relationship strength from the number of the first K shortest paths between the two entities in the graph, yielding the unknown-relationship confidence KSP;
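As a rough illustration of step b-4), the sketch below enumerates up to K simple paths between a head and a tail entity in order of increasing hop count. It assumes an unweighted directed graph stored as an adjacency dictionary; the function name, the `max_hops` cut-off and the example entities are hypothetical. The text does not state how the path count is converted into the KSP confidence, so the sketch stops at the path list.

```python
from collections import deque


def k_shortest_paths(adjacency, head, tail, k, max_hops=4):
    """Return up to k simple paths from head to tail, fewest hops first."""
    found = []
    queue = deque([[head]])
    # Breadth-first expansion of partial paths: paths are dequeued in
    # non-decreasing length, so the first k completed paths are the shortest.
    while queue and len(found) < k:
        path = queue.popleft()
        node = path[-1]
        if node == tail:
            found.append(path)
            continue
        if len(path) - 1 >= max_hops:  # assumed cut-off to bound the search
            continue
        for neighbor in adjacency.get(node, []):
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                queue.append(path + [neighbor])
    return found


# Hypothetical graph fragment around a drug and a candidate target.
adjacency = {
    "drug_A": ["protein_1", "protein_2"],
    "protein_1": ["target_X"],
    "protein_2": ["pathway_3"],
    "pathway_3": ["target_X"],
}
paths = k_shortest_paths(adjacency, "drug_A", "target_X", k=3)
path_count = len(paths)
```

Here only two simple paths exist, so asking for K = 3 returns both; a pair of entities connected by more and shorter paths would be treated as more strongly related.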
c) evaluation at the global level of the knowledge graph;
the global level of the knowledge graph is evaluated by $N_{total}/M$, which measures the information density of the knowledge graph as a whole and thereby the credibility of the data contained in the whole knowledge graph; here $N_{total}$ is the total degree of all entity nodes of the knowledge graph, i.e. the sum of the in-degrees and out-degrees of all entity nodes, and $M$ is the total number of entity nodes in the knowledge graph.
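The degree-based quantities of steps a-6) and c) are simple to compute from a triple list. The following sketch assumes triples are given as (head, relation, tail) tuples and that the global measure is the ratio of N_total to M; the function names and example triples are illustrative only.

```python
from collections import Counter


def degree_confidence(triples):
    """Return {entity: N_s}, where N_s = N_in + N_out as in formula (3)."""
    n_out = Counter(head for head, _, _ in triples)
    n_in = Counter(tail for _, _, tail in triples)
    entities = set(n_out) | set(n_in)
    return {entity: n_in[entity] + n_out[entity] for entity in entities}


def information_density(triples):
    """Global-level measure, assumed here to be the ratio N_total / M."""
    degrees = degree_confidence(triples)
    n_total = sum(degrees.values())  # total degree over all entity nodes
    m = len(degrees)                 # number of entity nodes
    return n_total / m if m else 0.0


# Hypothetical triples.
triples = [
    ("drug_A", "binds", "target_X"),
    ("drug_B", "inhibits", "target_X"),
    ("target_X", "associated_with", "disease_Y"),
]
degrees = degree_confidence(triples)
density = information_density(triples)
```

For these three triples `target_X` has degree 3, every other entity has degree 1, and the density is 6 / 4 = 1.5.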
In the knowledge graph triple confidence evaluation method, the fusion stage is realized as follows: taking into account the data quality of the biomedical knowledge graph and the requirements of the drug-target relationship prediction task, the triple confidence value of the biomedical knowledge graph is obtained by formula (5):
[Formula (5): equation image not reproduced in this text]
where Confidence denotes the triple confidence value, a positive number; the larger the value, the higher the confidence. Confidence is obtained by weighting the 11 confidence evaluators of the entity level, the relationship level and the knowledge graph global level, and is finally normalized to the interval [0,1]. If the confidence value of a triple in a given knowledge graph is below the threshold 0.6, the data of that triple is considered unreliable.
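The fusion step can be sketched as a weighted sum followed by normalization and thresholding. Formula (5) and its weights are not reproduced in this text, so the weighted-sum form, the divide-by-maximum normalization, the weight values and the three evaluator names used below (instead of all 11) are assumptions made purely for illustration; only the 0.6 threshold comes from the description.

```python
def fuse(evaluator_scores, weights):
    """Weighted sum of evaluator scores per triple (assumed form of the fusion)."""
    return [
        sum(weights[name] * value for name, value in scores.items())
        for scores in evaluator_scores
    ]


def normalize(values):
    """Scale positive scores into [0, 1] by dividing by the maximum (assumption)."""
    top = max(values)
    if top <= 0:
        return [0.0 for _ in values]
    return [value / top for value in values]


def flag_unreliable(triple_ids, confidences, threshold=0.6):
    """Return the ids of triples whose confidence is below the threshold."""
    return [tid for tid, conf in zip(triple_ids, confidences) if conf < threshold]


# Hypothetical weights and scores for three of the evaluators.
weights = {"N_r": 0.5, "LCA": 0.3, "N_s": 0.2}
triple_ids = ["t1", "t2"]
evaluator_scores = [
    {"N_r": 5, "LCA": 4.0, "N_s": 10},
    {"N_r": 3, "LCA": 1.0, "N_s": 2},
]
confidences = normalize(fuse(evaluator_scores, weights))
flagged = flag_unreliable(triple_ids, confidences)
```

With these made-up numbers `t1` is normalized to 1.0 and `t2` to roughly 0.39, so only `t2` falls below the 0.6 threshold.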
In the knowledge graph triple confidence evaluation method, the verification stage evaluates whether the final confidence value of a knowledge graph triple is reasonable, so that the design of the evaluator and the fuser can be optimized. The checker comprises two methods, expert sampling verification and automatic verification.
Expert sampling verification: manual checking is carried out by experts in the medical field. The scope of the expert check is triples whose confidence score is in the range [0.9,1] and which contain data on existing drugs or hot targets. The experts investigate the drugs and targets involved in each triple and, based on professional knowledge and experience, check whether the triple data with a high confidence value is reliable.
Automatic verification: the confidence value of a triple is verified by means of molecular docking. The scope of the automatic check is a random sample of 10% of the triples whose confidence value is in the range [0.6,0.9]. Molecular docking calculations are performed on the drug-target data involved in the triples using the LibDock and GOLD scoring functions in the Discovery Studio 2018 Client, and the final docking score is used to judge whether the confidence value is reliable.
The results of the verification stage are fed back to the evaluation stage and the fusion stage: the causes of data whose verification result is strongly negatively correlated with its confidence value are investigated in depth, and the weight of each method in the fusion stage is adjusted, thereby refining the whole knowledge graph triple confidence evaluation method.
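The selection of triples for the two verification routes can be sketched as follows. The confidence ranges and the 10% sampling rate come from the description above; assigning the boundary value 0.9 to the expert pool, the fixed random seed, the function name and the example data are assumptions. The additional expert-side filter (existing drugs or hot targets) and the docking calculation itself are outside the scope of this sketch.

```python
import random


def select_for_verification(confidences, auto_fraction=0.10, seed=0):
    """Split triples into an expert-review pool and a random automatic-check sample.

    confidences: dict mapping a triple identifier to its confidence in [0, 1].
    """
    # Assumption: a score of exactly 0.9 goes to the expert pool.
    expert_pool = [t for t, c in confidences.items() if 0.9 <= c <= 1.0]
    auto_pool = [t for t, c in confidences.items() if 0.6 <= c < 0.9]
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample_size = round(len(auto_pool) * auto_fraction)
    auto_sample = rng.sample(auto_pool, sample_size)
    return expert_pool, auto_sample


# Hypothetical confidence values: 20 mid-range, 3 high, 1 low.
confidences = {f"t{i}": 0.6 + 0.01 * i for i in range(20)}
confidences.update({"h1": 0.95, "h2": 0.9, "h3": 1.0, "l1": 0.2})
expert_pool, auto_sample = select_for_verification(confidences)
```

With 20 mid-range triples the automatic sample contains 2 of them, and the low-confidence triple `l1` is sent to neither route.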
In the knowledge graph triple confidence evaluation method, the document library in step a-2) and step b-2) comprises CAS, Patent, PubMed, Wikipedia and DOI, and α, β and θ take the values 0.7, 0.2 and 0.1 respectively; the score T for each document category is shown in Table 1:
TABLE 1
Document category    Score
CAS                  1.0
Patent               0.8
PubMed               1.0
Wikipedia            0.5
DOI                  1.0
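The weights and category scores above can be put into a small worked example. Because formula (1) is not reproduced in this text, the sketch assumes that LCA is the sum over the N related documents of α·F + β·L + θ·T; that summation form, the function name and the example documents are assumptions, while the weights and the category scores are taken from Table 1 and the surrounding text.

```python
# Category scores T from Table 1 and the weights stated in the text.
CATEGORY_SCORE = {
    "CAS": 1.0,
    "Patent": 0.8,
    "PubMed": 1.0,
    "Wikipedia": 0.5,
    "DOI": 1.0,
}
ALPHA, BETA, THETA = 0.7, 0.2, 0.1


def document_cooccurrence_confidence(documents):
    """Assumed form: sum over documents of ALPHA*F + BETA*L + THETA*T.

    documents: iterable of (impact_factor, citation_count, category) tuples.
    """
    return sum(
        ALPHA * impact_factor + BETA * citations + THETA * CATEGORY_SCORE[category]
        for impact_factor, citations, category in documents
    )


# Hypothetical documents mentioning an entity.
documents = [
    (4.0, 10, "PubMed"),  # journal article: impact factor 4.0, 10 citations
    (0.0, 2, "Patent"),   # patent: no impact factor, 2 citations
]
lca = document_cooccurrence_confidence(documents)
```

Under this assumed form the first document contributes 2.8 + 2.0 + 0.1 = 4.9 and the second 0.0 + 0.4 + 0.08 = 0.48, giving an LCA of 5.38.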
In the knowledge graph triple confidence evaluation method, the evaluation of known relationships at the relationship level in step b-3) measures the confidence of a known relationship with the ResourceRank algorithm. The ResourceRank algorithm describes the strength of association between two entities, and its idea is as follows: if the association between the entity pair (h, t) is strong, a large amount of resource will be passed from the head entity h to the tail entity t through all the association paths. It is realized by the following steps:
b-3-1) constructing a directed graph centered on the head entity h;
b-3-2) iteratively computing the resources in the graph with formula (6) until they converge, and computing the resource retained by the tail entity t:
$$R(t \mid h) = (1-\eta) \sum_{e_i \in M_t} \frac{R(e_i \mid h) \cdot BW_{e_i t}}{OD(e_i)} + \frac{\eta}{N} \quad (6)$$
where $M_t$ is the set of all nodes leading to the tail node t, $OD(e_i)$ is the out-degree of node $e_i$, and $BW_{e_i t}$ is the bandwidth from node $e_i$ to the tail node t, i.e. the number of paths; for each node $e_i$ in $M_t$, the amount of resource transferred from $e_i$ to the tail node t is
$$\frac{R(e_i \mid h) \cdot BW_{e_i t}}{OD(e_i)}$$
The resource flow of every node is assumed to have the same probability η of jumping directly to a random node, so the share of the randomly flowing resource that reaches the tail node t is 1/N, where N is the total number of nodes;
b-3-3) constructing a feature vector V from 6 features in total: R(t | h) from step b-3-2), the in-degree ID(h) and out-degree OD(h) of the head node h, the in-degree ID(t) and out-degree OD(t) of the tail node t, and the depth Dep from the head node to the tail node; V is converted by an activation function into a probability value RR(h, t), the ResourceRank confidence, which measures the likelihood that one or more relationships exist between the head node h and the tail node t, and is calculated by formula (7):
[Formula (7): equation image not reproduced in this text]
where φ is a non-linear activation function, and $W_i$ and $b_i$ are parameter matrices that can be adjusted during training; RR(h, t) takes values in [0,1], and the closer its value is to 1, the more likely it is that a relationship exists between h and t.
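The resource propagation of step b-3-2) can be sketched as a fixed-point iteration. The sketch assumes that each node passes its resource along its outgoing edges in proportion to bandwidth over out-degree, scaled by (1 - η), and that every node also receives η/N; the value η = 0.15, the initial allocation of all resource to the head node, the function name and the example graphs are assumptions. The feature vector and activation of step b-3-3) are not covered.

```python
def resource_rank(edges, head, tail, eta=0.15, tol=1e-10, max_iter=500):
    """Iterate the resource flow on a head-centred directed graph; return R(tail | head).

    edges: list of (source, target) pairs; repeated pairs count as extra bandwidth.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    bandwidth = {}                       # (source, target) -> number of parallel edges
    out_degree = {node: 0 for node in nodes}
    for src, dst in edges:
        bandwidth[(src, dst)] = bandwidth.get((src, dst), 0) + 1
        out_degree[src] += 1

    # Assumption: all resource starts at the head entity.
    resource = {node: 0.0 for node in nodes}
    resource[head] = 1.0
    for _ in range(max_iter):
        updated = {node: eta / n for node in nodes}  # random-jump share
        for (src, dst), bw in bandwidth.items():
            updated[dst] += (1.0 - eta) * resource[src] * bw / out_degree[src]
        delta = sum(abs(updated[node] - resource[node]) for node in nodes)
        resource = updated
        if delta < tol:
            break
    return resource[tail]


# Hypothetical graphs: t is reached from h by three routes, or by a single route.
strong_edges = [("h", "a"), ("h", "b"), ("a", "t"), ("b", "t"), ("h", "t")]
weak_edges = [("h", "a"), ("h", "b"), ("h", "c"), ("a", "t")]
strong_score = resource_rank(strong_edges, "h", "t")
weak_score = resource_rank(weak_edges, "h", "t")
```

The tail entity retains more resource in the graph where several association paths lead to it, which is the intuition the algorithm relies on.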
The beneficial effects of the invention are as follows. In the evaluation stage, the method evaluates triple confidence at three levels (entity, relationship and knowledge graph global) and from multiple perspectives (data source, document co-occurrence, external-link scale, text description, entity importance and entity degree), yielding 11 confidence values. In the fusion stage, the 11 confidence evaluators are weighted and fused to obtain a final confidence value. In the verification stage, the reasonableness of the final confidence value is checked, and the results are fed back to the evaluation stage and the fusion stage to optimize the design of the evaluation stage or adjust the weights of the fusion stage. The method can therefore discover errors in knowledge graph data efficiently, quickly and at scale, improving the data quality of the whole knowledge graph system; it can also be used to check the reliability of the results of machine learning tasks such as link prediction and relationship inference.
Drawings
FIG. 1 is a schematic diagram of the applicable stages of three types of confidence evaluation methods;
FIG. 2 is a schematic architecture diagram of the knowledge-graph triple confidence evaluation method of the present invention;
FIG. 3 is a schematic diagram of the ResourceRank algorithm in the present invention;
FIG. 4 is a diagram of an exemplary case of calculating confidence in the evaluation stage.
Detailed Description
The invention is further described with reference to the following figures and examples.
FIG. 2 shows the architecture of the knowledge graph triple confidence evaluation method of the invention, which is used to evaluate the reliability of triples in a biomedical knowledge graph. The method comprises an evaluator, a fuser and a checker. Triple data of the knowledge graph passes through the evaluator to produce several confidence scores, and the fuser combines these scores according to certain weights to produce a final confidence value. The checker verifies the reasonableness of the final confidence value and feeds the verification result back to the evaluator and the fuser, so that the design of the evaluator can be optimized or the weights of the fuser adjusted.
The evaluator evaluates triple confidence at three levels (entity, relationship and knowledge graph global) and from multiple perspectives (data source, document co-occurrence, external-link scale, text description, entity importance and entity degree); the specific methods are shown in Table 2:
TABLE 2
[Table 2: table image not reproduced in this text]
The knowledge graph triple confidence evaluation method comprises an evaluation stage, a fusion stage and a verification stage, and is characterized in that: the evaluation phase is realized by the following steps:
a) entity level assessment;
a-1) evaluating entities from the data source perspective: the entities to be evaluated comprise 11 types in total, namely compounds, diseases, proteins, genes, pathways, cell lines, drugs, products, targets, enzymes and protein-compounds; the data source confidence $N_r$ of each entity refers to the LOD scoring in The Linked Open Data Cloud, and the PubChem, RCSB PDB, DrugBank and DTO ontology data sources, which have no LOD score, are given scores of 5 stars, 5 stars and 4 stars respectively; the value of $N_r$ equals the number of stars of the LOD score, and if the same entity appears in two or more data sources, $N_r$ takes the highest score;
Table 3 gives the LOD data source quality evaluation:
[Table 3: table image not reproduced in this text]
a-2) evaluating entities from the document co-occurrence perspective: documents related to the entity are queried in a document library, and the document co-occurrence confidence LCA of the entity is obtained by formula (1):
[Formula (1): equation image not reproduced in this text]
where LCA denotes the document co-occurrence confidence of the entity, N the number of documents related to the entity, F the impact factor of a document, L the number of citations of a document, T the score assigned to the document's category, i the index of the i-th document, and α, β and θ are weights;
In this step, the document library comprises CAS, Patent, PubMed, Wikipedia and DOI, and α, β and θ take the values 0.7, 0.2 and 0.1 respectively; the score T for each document category is shown in Table 1:
TABLE 1
Document category    Score
CAS                  1.0
Patent               0.8
PubMed               1.0
Wikipedia            0.5
DOI                  1.0
a-3) evaluating entities from the external-link scale perspective: the external-link scale confidence $N_L$ of an entity is represented by the number of external links of the entity in the biomedical knowledge graph; the larger the external-link scale, the more reliable the entity data, so the credibility of the entity is measured by its number of external links, and $N_L$ equals that number;
a-4) evaluating the entity by the text description angle, wherein the entity text description is the description of the concept, category and functional information of the entity, and the data reliability of the entity with the text description is higher; if the entity in the step a-1) has the text description of the corresponding entity, the value of the text description confidence value D of the entity is 1, and if the text description confidence value D does not exist, the value of the text description confidence value D is 0;
a-5) evaluating the entity from the perspective of entity importance, wherein the importance of the node in the whole graph is directly determined by the quantity and quality of linked entity nodes in the biomedical knowledge graph; the importance of a certain entity in the knowledge graph is measured by adopting a PageRank algorithm to represent the confidence coefficient of the importance of the entity, wherein the PageRank algorithm is shown as a formula (2):
[Formula (2): equation image not reproduced in this text]
wherein $P_1, P_2, \ldots, P_i, \ldots, P_n$ denote the nodes in the knowledge graph; N denotes the number of nodes in the knowledge graph; q denotes the probability of continuing to expand from a node in the knowledge graph and is taken as 0.5; and three further symbols, shown only as images in this text, denote respectively the in-degree of the node $P_j$ under consideration, a second quantity of node $P_j$ whose name is missing from this text, and the PageRank value of node $P_j$, the PageRank values of all nodes forming the PageRank vector of the knowledge graph;
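Formula (2) appears only as an image in this text. As a minimal sketch, assuming it is the standard PageRank recurrence $PR(P_i) = \frac{1-q}{N} + q \sum_{P_j \in M(P_i)} \frac{PR(P_j)}{L(P_j)}$, where $M(P_i)$ is the set of nodes linking to $P_i$ and $L(P_j)$ is the out-degree of $P_j$, the entity importance confidence could be computed by power iteration as below. The symbol names $M$ and $L$, the function name `pagerank`, the edge-list input and the handling of nodes without outgoing edges are assumptions, not details from the patent; only q = 0.5 comes from the text.

```python
from collections import defaultdict


def pagerank(edges, q=0.5, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank over directed (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = defaultdict(list)
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Assumption: rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - q) / n + q * dangling / n for node in nodes}
        for source in nodes:
            targets = out_links[source]
            for target in targets:
                new_rank[target] += q * rank[source] / len(targets)
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-edge graph, for illustration only.
example_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
ranks = pagerank(example_edges)
```

In this small graph node C receives links from both A and B and therefore ends up with the largest value, while the values of all nodes sum to 1.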
a-6) evaluating the entity from the perspective of entity degree: the in-degree and out-degree of an entity node reflect how rich the information about the entity is in the knowledge graph and how strongly the entity is associated with other entities; the entity degree confidence $N_s$ is calculated by formula (3):
$N_s = N_{in} + N_{out}$ (3)
wherein $N_s$ denotes the entity degree confidence, $N_{in}$ denotes the in-degree of the entity node, and $N_{out}$ denotes the out-degree of the entity node;
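Formula (3) can be illustrated with a few lines of code. The sketch below counts in-degree and out-degree for one entity over a list of directed (head, tail) edges; the function name `degree_confidence` and the edge-list representation are assumptions made for illustration.

```python
def degree_confidence(edges, entity):
    """Return N_s = N_in + N_out for `entity` over directed (head, tail) edges."""
    n_in = sum(1 for _, tail in edges if tail == entity)
    n_out = sum(1 for head, _ in edges if head == entity)
    return n_in + n_out


# Hypothetical edges, for illustration only.
example_graph = [("a", "b"), ("b", "c"), ("c", "b")]
example_degree = degree_confidence(example_graph, "b")
```

Here entity "b" has two incoming edges and one outgoing edge, so its degree confidence is 3.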
b) relationship-level evaluation;
b-1) evaluating the relationship level from the perspective of data source: a relationship between entities in the biomedical knowledge graph is generally represented by a triple (h, r, t), wherein h is the head entity, t is the tail entity, and r is the relationship between them; if the triple data come from a high-quality data source, the association between the two entities is considered strong and the confidence of the triple information is high; the relationship-level data source confidence $N'_{in}$ refers to the LOD ratings in The Linked Open Data Cloud, and the PubChem, RCSB PDB, DrugBank and DTO ontology data sources, which have no LOD rating, are given ratings of 5 stars, 5 stars and 4 stars respectively; the value of $N'_{in}$ equals the number of stars of the LOD rating, and if the same entity appears in two or more data sources, $N'_{in}$ takes the highest rating;
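The "highest rating wins" rule for data source confidence can be sketched as follows. The function name `data_source_confidence` and the rating table are hypothetical; in particular, the example ratings use placeholder source names rather than the real databases, because the text lists four unrated sources but only three star values, so the intended mapping cannot be read off from this text.

```python
def data_source_confidence(sources, star_ratings):
    """Return the highest star rating among the data sources containing an item."""
    rated = [star_ratings[name] for name in sources if name in star_ratings]
    if not rated:
        raise ValueError("none of the given data sources has a rating")
    return max(rated)


# Placeholder source names and ratings, for illustration only.
example_ratings = {"SourceA": 5, "SourceB": 4, "SourceC": 3}
example_confidence = data_source_confidence(["SourceB", "SourceC"], example_ratings)
```

The same function would serve for the entity-level data source confidence, which follows the same rule.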
b-2) evaluating the relationship level from the perspective of literature co-occurrence: documents related to the entity pair (h, t) are queried in the literature databases, and the literature co-occurrence confidence LCA' of the entity pair (h, t) is obtained by formula (4):
[Formula (4): equation image not reproduced in this text]
wherein LCA' denotes the literature co-occurrence confidence of the entity pair (h, t), N' denotes the number of documents related to the entity pair (h, t), F denotes the impact factor of a document, L denotes the citation count of a document, T denotes the score assigned to the category of a document, i indexes the i-th document, and α, β and θ are weights;
b-3) evaluating known relationships between entities: an entity relationship established during construction of the biomedical knowledge graph is a known relationship, and its confidence is measured with the ResourceRank algorithm to obtain the known relationship confidence;
Fig. 3 is a schematic diagram of the principle of the ResourceRank algorithm in the present invention. The edges (relationships) from node (entity) A to node E are dense, which indicates a high association strength between the two entities (A, E) and hence that a relationship exists between entities A and E. By contrast, no edge directly connects node G and node F, which indicates that no relationship exists between entities G and F.
In this step, the confidence of the known relationship is measured with the ResourceRank algorithm. The ResourceRank algorithm describes the association strength between two entities, and its idea is as follows: if the association between the entity pair (h, t) is strong, a large amount of resource passes from the head entity h to the tail entity t over all the association paths; the algorithm is implemented by the following steps:
b-3-1) constructing a directed graph centered on the head entity h;
b-3-2) iteratively computing the resource flow in the graph by formula (6) until it converges, and computing the resource retention value of the tail entity t;
[Formula (6): equation image not reproduced in this text]
wherein $M_t$ is the set of all nodes leading to the tail node t, $OD(e_i)$ is the out-degree of node $e_i$, and a further symbol, shown only as an image in this text, is the bandwidth from node $e_i$ to the tail node t, i.e. the number of paths; for each node $e_i$ in $M_t$, the amount of resource transferred from node $e_i$ to the tail node t is given by an expression that is likewise shown only as an image in this text. It is assumed that the resource flow of every node has the same probability η of jumping directly to a random node, so that the share of this resource flowing randomly to the tail node t is 1/N, where N is the total number of nodes;
b-3-3) constructing a feature vector V from six features in total, namely R(t | h) obtained in step b-3-2), the in-degree ID(h) and out-degree OD(h) of the head node h, the in-degree ID(t) and out-degree OD(t) of the tail node t, and the depth Dep from the head node to the tail node, and converting V through an activation function into a probability value RR(h, t); RR(h, t) is the ResourceRank confidence, which measures the likelihood that one or more relationships exist between the head node h and the tail node t, and is calculated by formula (7):
[Formula (7): equation image not reproduced in this text]
wherein φ is a non-linear activation function, $W_i$ and $b_i$ are parameter matrices that can be adjusted during training, and RR(h, t) takes values in [0, 1]; the closer its value is to 1, the more likely it is that a relationship exists between h and t.
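Formulas (6) and (7) are shown only as images in this text. The sketch below therefore rests on two assumptions that are consistent with the surrounding definitions but are not confirmed by the patent: that formula (6) has the form $R(t \mid h) = (1-\eta) \sum_{e_i \in M_t} R(e_i \mid h) \cdot BW(e_i, t) / OD(e_i) + \eta / N$, and that formula (7) can be approximated by a single sigmoid layer over the six features. The names `resource_rank` and `rr_confidence`, the value η = 0.15, and the weights and bias in the example are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def resource_rank(edges, head, eta=0.15, tol=1e-10, max_iter=1000):
    """Iterate the assumed resource-flow recurrence, starting with all resource at `head`.

    `edges` is a list of directed (source, target) tuples; repeated tuples
    model parallel paths and give the bandwidth BW(source, target).
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    bandwidth = Counter(edges)                              # BW(e_i, t)
    out_degree = Counter(source for source, _ in edges)     # OD(e_i)
    incoming = defaultdict(set)                             # M_t
    for source, target in edges:
        incoming[target].add(source)

    resource = {node: 1.0 if node == head else 0.0 for node in nodes}
    for _ in range(max_iter):
        updated = {}
        for t in nodes:
            flow = sum(
                resource[e] * bandwidth[(e, t)] / out_degree[e] for e in incoming[t]
            )
            updated[t] = (1.0 - eta) * flow + eta / n
        change = sum(abs(updated[node] - resource[node]) for node in nodes)
        resource = updated
        if change < tol:
            break
    return resource


def rr_confidence(features, weights, bias):
    """Map the feature vector V to (0, 1) with a sigmoid of a linear score."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical graph centered on head "h" with tail "t".
example_edges = [("h", "t"), ("h", "t"), ("h", "a"), ("a", "t"), ("t", "h")]
resource = resource_rank(example_edges, "h")
# V = [R(t|h), ID(h), OD(h), ID(t), OD(t), Dep]
features = [resource["t"], 1, 3, 3, 1, 1]
confidence = rr_confidence(features, weights=[1.0, 0.1, 0.1, 0.1, 0.1, -0.2], bias=-0.5)
```

In a real system the weights and bias would be learned during training, as the text states, rather than fixed by hand as in this example.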
b-4) evaluating unknown relationships between entities: an entity relationship that does not exist in the current knowledge graph and must be obtained by inference is called an unknown relationship; its confidence is measured with the KSP algorithm, which evaluates the relationship strength from the number of the first K shortest paths between the two entities in the graph, giving the unknown relationship confidence KSP;
c) global-level evaluation of the knowledge graph;
the global level of the knowledge graph is evaluated by the ratio $N_{total}/M$, which measures the information density of the knowledge graph as a whole and thereby the credibility of the data contained in the entire knowledge graph; wherein $N_{total}$ is the total degree of all entity nodes of the knowledge graph, i.e. the sum of the in-degrees and out-degrees of all entity nodes, and M is the total number of entity nodes in the knowledge graph.
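The global-level measure can be sketched in a few lines. Reading the garbled expression as the ratio $N_{total}/M$ is an interpretation of the text, and the function name `information_density` is hypothetical; for simplicity the sketch also counts only nodes that appear in at least one edge.

```python
def information_density(edges):
    """Return N_total / M for a directed graph given as (head, tail) pairs."""
    nodes = {node for edge in edges for node in edge}
    if not nodes:
        raise ValueError("the graph has no nodes")
    # Each directed edge adds one out-degree and one in-degree.
    n_total = 2 * len(edges)
    return n_total / len(nodes)


# Hypothetical graph, for illustration only.
example_density = information_density([("a", "b"), ("b", "c")])
```

For the two-edge example the total degree is 4 over 3 nodes, so the density is 4/3.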
The fusion stage is implemented as follows: taking into account the data quality of the biomedical knowledge graph and the requirements of the drug-target relationship prediction task, the triple confidence value of the biomedical knowledge graph is obtained by formula (5):
[Formula (5): equation image not reproduced in this text]
wherein Confidence denotes the triple confidence value, which is a positive number; the larger the value, the higher the confidence; Confidence is obtained by weighting the 11 confidence evaluators of the entity level, the relationship level and the global level of the knowledge graph, and is finally normalized to the interval [0, 1]; in a given knowledge graph, a confidence value below the threshold 0.6 indicates that the data of the triple are unreliable.
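Formula (5) is not reproduced in this text. As a sketch under stated assumptions, the fusion step is modelled below as a weighted sum of the 11 evaluator outputs followed by min-max scaling to [0, 1]; both the linear form and the normalization method are assumptions, as are the function names and the equal weights in the example. Only the count of 11 evaluators and the 0.6 threshold come from the text.

```python
UNRELIABLE_THRESHOLD = 0.6  # threshold stated in the text


def fuse(evaluator_scores, weights):
    """Weighted sum of the evaluator outputs for one triple (assumed form)."""
    if len(evaluator_scores) != len(weights):
        raise ValueError("one weight is needed per evaluator score")
    return sum(w * s for w, s in zip(weights, evaluator_scores))


def normalize(raw_values):
    """Min-max scale raw fused values to [0, 1] (assumed normalization)."""
    low, high = min(raw_values), max(raw_values)
    if high == low:
        return [1.0] * len(raw_values)
    return [(value - low) / (high - low) for value in raw_values]


def is_unreliable(confidence):
    """A triple is flagged as unreliable when its confidence is below 0.6."""
    return confidence < UNRELIABLE_THRESHOLD


# Hypothetical equal weights and evaluator outputs for three triples.
example_weights = [1.0 / 11] * 11
raw = [fuse(scores, example_weights) for scores in ([1.0] * 11, [0.5] * 11, [0.0] * 11)]
confidences = normalize(raw)
flags = [is_unreliable(c) for c in confidences]
```

Min-max scaling always maps the lowest-scoring triple to 0, so a real implementation may prefer a normalization with fixed bounds; the text does not say which is used.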
The verification stage evaluates whether the final confidence values of the knowledge graph triples are reasonable, in order to optimize the design of the evaluators and the fuser; the verifier comprises two methods, expert sampling verification and automatic verification.
Expert sampling verification: manual verification is carried out by experts in the medical field; the scope of expert verification is triples whose confidence value lies in [0.9, 1] and which contain data on existing drugs or hot targets; the experts study the drugs and targets involved in each triple and judge, on the basis of professional knowledge and experience, whether the high-confidence triple data are reliable;
Automatic verification: the confidence values of triples are verified by means of molecular docking; the scope of automatic verification is a random sample of 10% of the triples whose confidence value lies in [0.6, 0.9]; molecular docking calculations are performed on the drug-target data involved in the triples using the LibDock and GOLD scoring functions in Discovery Studio 2018 Client, and whether the confidence value is reliable is judged from the final score;
the results of the verification stage are fed back to the evaluation stage and the fusion stage; for data whose verification result is strongly negatively correlated with the confidence value, the cause is investigated in depth and the weights of the fusion stage are adjusted, thereby refining the whole knowledge graph triple confidence evaluation method.
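The selection of triples for the two verification routes can be sketched as follows. The function name `select_for_verification`, the dictionary input, the fixed random seed and the optional `is_priority` predicate (standing in for the "existing drugs or hot targets" condition) are assumptions for illustration. Because the two stated ranges share the value 0.9, the sketch assigns a confidence of exactly 0.9 to the expert route, which is also an assumption.

```python
import random


def select_for_verification(confidences, sample_fraction=0.10, seed=0, is_priority=None):
    """Split triples into an expert-review list and a random automatic-check sample.

    `confidences` maps a triple identifier to its confidence in [0, 1].
    """
    if is_priority is None:
        is_priority = lambda triple: True  # no drug/target filter by default
    expert = [t for t, c in confidences.items() if c >= 0.9 and is_priority(t)]
    pool = [t for t, c in confidences.items() if 0.6 <= c < 0.9]
    sample_size = round(len(pool) * sample_fraction)
    automatic = random.Random(seed).sample(pool, sample_size)
    return expert, automatic


# Hypothetical confidence values, for illustration only.
example_confidences = {f"mid{i}": 0.7 for i in range(10)}
example_confidences.update({"high1": 0.95, "high2": 0.9, "low1": 0.3})
expert_list, automatic_list = select_for_verification(example_confidences)
```

With ten triples in the middle range, the automatic route receives one randomly chosen triple, and both high-confidence triples go to expert review.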
Fig. 4 shows a typical case of confidence calculation in the evaluation stage. Taking the triple (norepinephrine, binding molecule entity, β2 adrenergic receptor) as an example, the process by which the evaluators calculate confidence is briefly described as follows.
At the entity level, a translation-based energy function algorithm (TEF) is used to calculate the likelihood that a binding relationship exists between norepinephrine and the β2 adrenergic receptor. The energy function of the triple is first calculated to obtain a low-dimensional distributed representation of the entities and relationships. A sigmoid function then converts the energy function into the probability that the entity pair (norepinephrine, β2 adrenergic receptor) forms the binding molecule entity relationship, and this probability measures the likelihood that the two entities have a binding relationship.
At the relationship level, the ResourceRank algorithm is used to calculate the relationship type and association strength of the drug and the target. The algorithm creates a subgraph of depth 2 centered on norepinephrine and the β2 adrenergic receptor, and then calculates from this subgraph the amount of resource flowing from the head entity (norepinephrine) to the tail entity (β2 adrenergic receptor); if the association between the entity pair is strong, a large amount of resource passes from the head entity to the tail entity over all the association paths.
At the data source level, a DataSource algorithm comprehensively evaluates the quality of the data sources in which the triple is found, namely the Drug Target Ontology, the Protein Ontology and UniProt. First, the data of the triple are contained in the Drug Target Ontology, Protein Ontology and UniProt data sources. Second, the quality of the data in these three sources differs; the DataSource algorithm draws up an LOD data source quality evaluation table with reference to the ratings of data source quality in the Linked Open Data Cloud (LOD), and evaluates the data source level confidence according to set rules.
At the literature co-occurrence level, a literature co-occurrence algorithm (LCO) quantifies the association strength of an entity pair by the number of documents in which the pair co-occurs. The algorithm first retrieves the documents containing the triple. Then, based mainly on the number of documents and with reference to information such as their impact factors, citation counts and journal categories, a weighted calculation is performed to obtain the confidence value that characterizes the association strength of the entity pair.
At the graph structure level of the knowledge graph, a reachable path reasoning algorithm (RP) evaluates the semantic correlation between the head and tail entities in the directed graph and the complex reasoning patterns among triples. First, taking into account the semantic relevance between a path and the target triple, reachable paths are selected by a path selection algorithm based on semantic distance. The selected reachable paths are then mapped to low-dimensional vectors, and a recurrent neural network (RNN) produces a final output vector that represents the semantic information of each path. Finally, the vector is processed nonlinearly to obtain the value RP((h, r, t)), which represents the graph structure level confidence in the knowledge graph.

Claims (5)

1. A knowledge graph triple confidence evaluation method, comprising an evaluation stage, a fusion stage and a verification stage, characterized in that the evaluation stage is implemented by the following steps:
a) entity-level evaluation;
a-1) evaluating the entity from the perspective of data source: the entities to be evaluated comprise 11 types in total, namely compounds, diseases, proteins, genes, pathways, cell lines, drugs, products, targets, enzymes and protein-compounds; the data source confidence $N_r$ of each entity refers to the LOD ratings in The Linked Open Data Cloud, and the PubChem, RCSB PDB, DrugBank and DTO ontology data sources, which have no LOD rating, are given ratings of 5 stars, 5 stars and 4 stars respectively; the value of $N_r$ equals the number of stars of the LOD rating, and if the same entity appears in two or more data sources, $N_r$ takes the highest rating;
a-2) evaluating the entity from the perspective of literature co-occurrence: documents related to the entity are queried in the literature databases, and the literature co-occurrence confidence LCA of the entity is obtained by formula (1):
[Formula (1): equation image not reproduced in this text]
wherein LCA denotes the literature co-occurrence confidence of the entity, N denotes the number of documents related to the entity, F denotes the impact factor of a document, L denotes the citation count of a document, T denotes the score assigned to the category of a document, i indexes the i-th document, and α, β and θ are weights;
a-3) evaluating the entity from the perspective of external link scale: the external link scale confidence $N_L$ of an entity is expressed as the number of external links of the entity in the biomedical knowledge graph; the larger the external link scale of an entity, the more reliable its data, so the credibility of the entity is measured by its number of external links, and $N_L$ equals that number;
a-4) evaluating the entity from the perspective of text description: the text description of an entity describes its concept, category and functional information, and an entity that has a text description is considered more reliable; if the entity of step a-1) has a corresponding text description, its text description confidence D is 1, and otherwise D is 0;
a-5) evaluating the entity from the perspective of entity importance: in the biomedical knowledge graph, the importance of a node within the whole graph is determined directly by the number and quality of the entity nodes linked to it; the importance of an entity in the knowledge graph is measured with the PageRank algorithm, and the result serves as the entity importance confidence; the PageRank algorithm is given by formula (2):
[Formula (2): equation image not reproduced in this text]
wherein $P_1, P_2, \ldots, P_i, \ldots, P_n$ denote the nodes in the knowledge graph; N denotes the number of nodes in the knowledge graph; q denotes the probability of continuing to expand from a node in the knowledge graph and is taken as 0.5; and three further symbols, shown only as images in this text, denote respectively the in-degree of the node $P_j$ under consideration, a second quantity of node $P_j$ whose name is missing from this text, and the PageRank value of node $P_j$, the PageRank values of all nodes forming the PageRank vector of the knowledge graph;
a-6) evaluating the entity from the perspective of entity degree: the in-degree and out-degree of an entity node reflect how rich the information about the entity is in the knowledge graph and how strongly the entity is associated with other entities; the entity degree confidence $N_s$ is calculated by formula (3):
$N_s = N_{in} + N_{out}$ (3)
wherein $N_s$ denotes the entity degree confidence, $N_{in}$ denotes the in-degree of the entity node, and $N_{out}$ denotes the out-degree of the entity node;
b) relationship-level evaluation;
b-1) evaluating the relationship level from the perspective of data source: a relationship between entities in the biomedical knowledge graph is generally represented by a triple (h, r, t), wherein h is the head entity, t is the tail entity, and r is the relationship between them; if the triple data come from a high-quality data source, the association between the two entities is considered strong and the confidence of the triple information is high; the relationship-level data source confidence $N'_{in}$ refers to the LOD ratings in The Linked Open Data Cloud, and the PubChem, RCSB PDB, DrugBank and DTO ontology data sources, which have no LOD rating, are given ratings of 5 stars, 5 stars and 4 stars respectively; the value of $N'_{in}$ equals the number of stars of the LOD rating, and if the same entity appears in two or more data sources, $N'_{in}$ takes the highest rating;
b-2) evaluating the relationship level from the perspective of literature co-occurrence: documents related to the entity pair (h, t) are queried in the literature databases, and the literature co-occurrence confidence LCA' of the entity pair (h, t) is obtained by formula (4):
[Formula (4): equation image not reproduced in this text]
wherein LCA' denotes the literature co-occurrence confidence of the entity pair (h, t), N' denotes the number of documents related to the entity pair (h, t), F denotes the impact factor of a document, L denotes the citation count of a document, T denotes the score assigned to the category of a document, i indexes the i-th document, and α, β and θ are weights;
b-3) evaluating known relationships between entities: an entity relationship established during construction of the biomedical knowledge graph is a known relationship, and its confidence is measured with the ResourceRank algorithm to obtain the known relationship confidence;
b-4) evaluating unknown relationships between entities: an entity relationship that does not exist in the current knowledge graph and must be obtained by inference is called an unknown relationship; its confidence is measured with the KSP algorithm, which evaluates the relationship strength from the number of the first K shortest paths between the two entities in the graph, giving the unknown relationship confidence KSP;
c) global-level evaluation of the knowledge graph;
the global level of the knowledge graph is evaluated by the ratio $N_{total}/M$, which measures the information density of the knowledge graph as a whole and thereby the credibility of the data contained in the entire knowledge graph; wherein $N_{total}$ is the total degree of all entity nodes of the knowledge graph, i.e. the sum of the in-degrees and out-degrees of all entity nodes, and M is the total number of entity nodes in the knowledge graph.
2. The knowledge graph triple confidence evaluation method of claim 1, characterized in that the fusion stage is implemented as follows: taking into account the data quality of the biomedical knowledge graph and the requirements of the drug-target relationship prediction task, the triple confidence value of the biomedical knowledge graph is obtained by formula (5):
[Formula (5): equation image not reproduced in this text]
wherein Confidence denotes the triple confidence value, which is a positive number; the larger the value, the higher the confidence; Confidence is obtained by weighting the 11 confidence evaluators of the entity level, the relationship level and the global level of the knowledge graph, and is finally normalized to the interval [0, 1]; in a given knowledge graph, a confidence value below the threshold 0.6 indicates that the data of the triple are unreliable.
3. The knowledge graph triple confidence evaluation method of claim 2, characterized in that the verification stage evaluates whether the final confidence values of the knowledge graph triples are reasonable, in order to optimize the design of the evaluators and the fuser; the verifier comprises two methods, expert sampling verification and automatic verification; expert sampling verification: manual verification is carried out by experts in the medical field; the scope of expert verification is triples whose confidence value lies in [0.9, 1] and which contain data on existing drugs or hot targets; the experts study the drugs and targets involved in each triple and judge, on the basis of professional knowledge and experience, whether the high-confidence triple data are reliable;
automatic verification: the confidence values of triples are verified by means of molecular docking; the scope of automatic verification is a random sample of 10% of the triples whose confidence value lies in [0.6, 0.9]; molecular docking calculations are performed on the drug-target data involved in the triples using the LibDock and GOLD scoring functions in Discovery Studio 2018 Client, and whether the confidence value is reliable is judged from the final score;
the results of the verification stage are fed back to the evaluation stage and the fusion stage; for data whose verification result is strongly negatively correlated with the confidence value, the cause is investigated in depth and the weight of each method in the fusion stage is adjusted, thereby refining the whole knowledge graph triple confidence evaluation method.
4. The knowledge graph triple confidence evaluation method of claim 1 or 2, characterized in that the literature databases in step a-2) and step b-2) comprise CAS, Patent, PubMed, Wikipedia and DOI, and α, β and θ take the values 0.7, 0.2 and 0.1 respectively; the scores T for the different document categories are shown in Table 1:
TABLE 1
Document category    Score T
CAS                  1.0
Patent               0.8
PubMed               1.0
Wikipedia            0.5
DOI                  1.0
5. The knowledge graph triple confidence evaluation method of claim 1 or 2, characterized in that, in the evaluation of known relationships at the relationship level in step b-3), the confidence of the known relationship is measured with the ResourceRank algorithm; the ResourceRank algorithm describes the association strength between two entities, and its idea is as follows: if the association between the entity pair (h, t) is strong, a large amount of resource passes from the head entity h to the tail entity t over all the association paths; the algorithm is implemented by the following steps:
b-3-1) constructing a directed graph centered on the head entity h;
b-3-2) iteratively computing the resource flow in the graph by formula (6) until it converges, and computing the resource retention value of the tail entity t;
[Formula (6): equation image not reproduced in this text]
wherein $M_t$ is the set of all nodes leading to the tail node t, $OD(e_i)$ is the out-degree of node $e_i$, and a further symbol, shown only as an image in this text, is the bandwidth from node $e_i$ to the tail node t, i.e. the number of paths; for each node $e_i$ in $M_t$, the amount of resource transferred from node $e_i$ to the tail node t is given by an expression that is likewise shown only as an image in this text. It is assumed that the resource flow of every node has the same probability η of jumping directly to a random node, so that the share of this resource flowing randomly to the tail node t is 1/N, where N is the total number of nodes;
b-3-3) constructing a feature vector V from six features in total, namely R(t | h) obtained in step b-3-2), the in-degree ID(h) and out-degree OD(h) of the head node h, the in-degree ID(t) and out-degree OD(t) of the tail node t, and the depth Dep from the head node to the tail node, and converting V through an activation function into a probability value RR(h, t); RR(h, t) is the ResourceRank confidence, which measures the likelihood that one or more relationships exist between the head node h and the tail node t, and is calculated by formula (7):
[Formula (7): equation image not reproduced in this text]
wherein φ is a non-linear activation function, $W_i$ and $b_i$ are parameter matrices that can be adjusted during training, and RR(h, t) takes values in [0, 1]; the closer its value is to 1, the more likely it is that a relationship exists between h and t.
CN202011309998.5A 2020-11-20 2020-11-20 Knowledge graph triple confidence evaluation method Active CN112417166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011309998.5A CN112417166B (en) 2020-11-20 2020-11-20 Knowledge graph triple confidence evaluation method

Publications (2)

Publication Number Publication Date
CN112417166A true CN112417166A (en) 2021-02-26
CN112417166B CN112417166B (en) 2022-08-26

Family

ID=74774496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011309998.5A Active CN112417166B (en) 2020-11-20 2020-11-20 Knowledge graph triple confidence evaluation method

Country Status (1)

Country Link
CN (1) CN112417166B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160163311A1 (en) * 2014-12-09 2016-06-09 Microsoft Technology Licensing, Llc Communication system
CN106355627A (en) * 2015-07-16 2017-01-25 中国石油化工股份有限公司 Method and system used for generating knowledge graphs
US20180075359A1 (en) * 2016-09-15 2018-03-15 International Business Machines Corporation Expanding Knowledge Graphs Based on Candidate Missing Edges to Optimize Hypothesis Set Adjudication
US20180373989A1 (en) * 2017-06-22 2018-12-27 International Business Machines Corporation Relation extraction using co-training with distant supervision
CN110309310A (en) * 2018-02-12 2019-10-08 清华大学 Representation of knowledge learning method based on confidence level
US20190279104A1 (en) * 2018-03-07 2019-09-12 International Business Machines Corporation Unit conversion in a synonym-sensitive framework for question answering
CN109063021A (en) * 2018-07-12 2018-12-21 浙江大学 A kind of knowledge mapping distribution representation method for capableing of encoding relation semanteme Diversity structure
CN110069638A (en) * 2019-03-12 2019-07-30 北京航空航天大学 A kind of knowledge mapping combination table dendrography learning method of binding rule and path
CN111737481A (en) * 2019-10-10 2020-10-02 北京沃东天骏信息技术有限公司 Noise reduction method, device and equipment of knowledge graph and storage medium
CN111177322A (en) * 2019-12-30 2020-05-19 成都数之联科技有限公司 Ontology model construction method of domain knowledge graph
CN111625659A (en) * 2020-08-03 2020-09-04 腾讯科技(深圳)有限公司 Knowledge graph processing method, device, server and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JUN LIU ET AL.: "Topological analysis of knowledge maps", 《KNOWLEDGE-BASED SYSTEMS》 *
WEIGUO ZHENG,HONG CHENG,JEFFREY XU YU,LEI ZOU: "Interactive natural language question answering over knowledge graphs", 《INFORMATION SCIENCES》 *
徐增林,盛泳潘,贺丽荣,王雅芳: "A survey of knowledge graph techniques", 《电子科技大学学报》 *
李涛 et al.: "Development and construction of knowledge graphs", 《南京理工大学学报》 *

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN113204650A (en) * 2021-05-14 2021-08-03 深圳市曙光信息技术有限公司 Evaluation method and system based on domain knowledge graph
CN113204650B (en) * 2021-05-14 2022-03-11 深圳市曙光信息技术有限公司 Evaluation method and system based on domain knowledge graph
CN116110594A (en) * 2022-12-02 2023-05-12 北京交通大学 Knowledge evaluation method and system of medical knowledge graph based on associated literature
CN116110594B (en) * 2022-12-02 2024-05-07 北京交通大学 Knowledge evaluation method and system of medical knowledge graph based on associated literature
CN115860152A (en) * 2023-02-20 2023-03-28 南京星耀智能科技有限公司 Cross-modal joint learning method oriented to character military knowledge discovery
CN116187868A (en) * 2023-04-27 2023-05-30 深圳市迪博企业风险管理技术有限公司 Knowledge graph-based industrial chain development quality evaluation method and device
CN116501915A (en) * 2023-06-29 2023-07-28 长江三峡集团实业发展(北京)有限公司 Energy management end voice page retrieval method and system
CN116501915B (en) * 2023-06-29 2023-10-20 长江三峡集团实业发展(北京)有限公司 Energy management end voice page retrieval method and system
CN117725231A (en) * 2024-02-08 2024-03-19 中国电子科技集团公司第十五研究所 Content generation method and system based on semantic evidence prompt and confidence
CN117725231B (en) * 2024-02-08 2024-04-23 中国电子科技集团公司第十五研究所 Content generation method and system based on semantic evidence prompt and confidence

Also Published As

Publication number Publication date
CN112417166B (en) 2022-08-26

Similar Documents

Publication Publication Date Title
CN112417166B (en) Knowledge graph triple confidence evaluation method
US12079581B2 (en) Diagnosing sources of noise in an evaluation
US10635978B2 (en) Ensembling of neural network models
Tang et al. A pruning neural network model in credit classification analysis
JP6708847B1 (en) Machine learning apparatus and method
Cantarella et al. Multilayer feedforward networks for transportation mode choice analysis: An analysis and a comparison with random utility models
CN112364880A (en) Omics data processing method, device, equipment and medium based on graph neural network
CN113140254A (en) Meta-learning drug-target interaction prediction system and prediction method
Wang et al. Editorial behaviors in peer review
CN115512785A (en) Attention mechanism-based three-dimensional protein-ligand activity prediction method
CN117668622B (en) Training method of equipment fault diagnosis model, fault diagnosis method and device
Min et al. Poverty prediction using machine learning approach
Cao Evaluating the vocal music teaching using backpropagation neural network
Sharifi et al. Banks credit risk prediction with optimized ANN based on improved owl search algorithm
KR20220155785A (en) Method and apparatus for operating chatbot
Guo [Retracted] Safety Risk Assessment of Tourism Management System Based on PSO‐BP Neural Network
Aswani et al. Identifying popular online news: An approach using chaotic cuckoo search algorithm
CN113392958A (en) Parameter optimization and application method and system of fuzzy neural network FNN
CN115516473A (en) Hybrid human-machine learning system
Mao et al. QoS trust rate prediction for Web services using PSO-based neural network
Yao et al. Application of Neural Network and Structural Model in AI Educational Performance Analysis.
Ma et al. Social network group decision-making model considering interactions between trust relationships and opinion evolution
Wang et al. Proxy Forecasting to Avoid Stochastic Decision Rules in Decision Markets
US20240095553A1 (en) Systems and methods for evaluating counterfactual samples for explaining machine learning models
Samizadeh et al. Web mining based on word-centric search with clustering approach using MLP-PSO hybrid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant