CN112417166B - Knowledge graph triple confidence evaluation method - Google Patents

Publication number
CN112417166B
Authority
CN
China
Prior art keywords
entity
confidence
node
knowledge graph
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011309998.5A
Other languages
Chinese (zh)
Other versions
CN112417166A (en
Inventor
杨帅
王小红
赵志刚
窦方坤
曹皓伟
潘景山
魏志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Shandong Computer Science Center National Super Computing Center in Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Computer Science Center National Super Computing Center in Jinan filed Critical Shandong Computer Science Center National Super Computing Center in Jinan
Priority to CN202011309998.5A priority Critical patent/CN112417166B/en
Publication of CN112417166A publication Critical patent/CN112417166A/en
Application granted granted Critical
Publication of CN112417166B publication Critical patent/CN112417166B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30Data warehousing; Computing architectures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioethics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Animal Behavior & Ethology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph triple confidence evaluation method comprising an evaluation stage, a fusion stage and a verification stage. The evaluation stage comprises: a) entity-level evaluation, from a-1) the data source perspective, a-2) the document co-occurrence perspective, a-3) the external link scale perspective, a-4) the text description perspective, a-5) the entity importance perspective and a-6) the entity degree perspective; b) relationship-level evaluation, from b-1) the data source perspective and b-2) the document co-occurrence perspective, together with b-3) evaluation of known relationships between entities and b-4) evaluation of unknown relationships between entities; and c) global-level evaluation of the knowledge graph. The method can discover errors in knowledge graph data efficiently, quickly and at scale, thereby improving the data quality of the whole knowledge graph system; it can also check the data reliability of the results of machine learning tasks such as link prediction and relationship inference.

Description

Knowledge graph triple confidence evaluation method
Technical Field
The invention relates to a knowledge graph triple confidence evaluation method, and in particular to one comprising an evaluation stage, a fusion stage and a verification stage.
Background
With different targets and drugs as entities and the interactions between targets and drugs as relationships, the related knowledge is stored on the entities and relationships in the form of attributes, and these are interwoven into a very large graph that supports query, reasoning, intelligent analysis and similar functions. Such a graph is called a Drug-Target Knowledge Graph (DT KG). By effectively revealing the complex physical and biochemical interactions between drugs and targets, discovering implicit drug-target relationships that have not yet been found, and thereby supporting the discovery of new drugs or new uses for existing drugs, the DT KG has become an important direction of knowledge graph research in the biomedical field.
Errors are inevitable in the construction of a knowledge graph. To find errors in the knowledge graph, improve its quality, and in turn improve the performance of knowledge-driven learning tasks, academia has introduced the concept of knowledge graph triple confidence. Knowledge graph triple confidence (KG triple trustworthiness) measures how true the knowledge expressed by a triple is. It takes values in [0,1]: the closer the value is to 0, the more likely the triple is wrong, and the closer it is to 1, the more likely the triple is true.
Existing knowledge graph triple confidence evaluation methods fall into 3 classes according to the stage at which they are applied, shown as 1, 2 and 3 in FIG. 1. The first class is used while triples are being extracted from text data; a typical case is the KnowLife knowledge base of the Max Planck Institute for Informatics, Germany. The second class is used in the Embedding process, which aims to encode all entities and relationships into a continuous vector space. Confidence evaluation and the removal of data noise during Embedding have been a research focus in recent years; typical methods include SCEF (a support-confidence-aware KG embedding framework), CKRL (a confidence-aware knowledge representation learning framework) and TransT (translation-based embedding with triple trustworthiness). The third class evaluates the triples directly; it can measure the reliability of triples obtained by knowledge inference and is also suitable for confidence evaluation of dynamic knowledge bases. Typical methods include KGTtm (a knowledge graph triple trustworthiness measurement model) and CTransE (knowledge graph embedding on noisy knowledge graphs using an adaptive confidence-margin-based loss function for translation-based models).
Existing knowledge graph triple confidence evaluation methods are shown in Table 1, which lists 7 methods:
TABLE 1
Method      Application stage                                      Year
KnowLife    Extraction of entities and relationships from text     2015
SCEF        Embedding                                              2019
KGTtm       Triple                                                 2019
TransT      Embedding                                              2019
CKRL        Embedding                                              2018
ConfGCN     Node attribute prediction                              2019
CTransE     Embedding                                              2019
(1) KnowLife implements a general and extensible method for automatically constructing a biomedical knowledge base. It automatically extracts information from scientific publications, health portal websites and online community resources, and introduces confidence evaluation rules into the automatic extraction process to quantify the reliability of the extracted entity and relationship data, thereby improving the quality of the biomedical knowledge base.
(2) SCEF is a support-confidence-aware knowledge graph embedding framework. Building on traditional translation models, it constructs an energy function that incorporates confidence, and refines and corrects the knowledge graph through knowledge representation learning with triple confidence (text, knowledge graph and triples).
(3) KGTtm is a measurement model of knowledge graph triple confidence that quantifies the semantic correctness of triples and the truth of the facts they express at the entity level, the relationship level and the knowledge graph global level.
(4) TransT is a model that computes triple confidence from information such as entity types and entity descriptions, and optimizes the model with a cross-entropy-based loss function to improve the performance of knowledge embedding learning.
(5) CKRL is a confidence-based knowledge representation learning framework. It introduces a notion of confidence based on structural information, and improves knowledge representation learning and knowledge graph noise detection by constructing an energy function from the vector information of the triple's entities, its relationship and the paths between the entities.
(6) The ConfGCN model is used for node attribute prediction tasks and can estimate the scores of node labels in the graph together with their confidence.
(7) CTransE is a translation-based model for handling errors introduced when a knowledge graph is updated automatically; it uses a confidence-based loss function to learn embedded representations of a dynamic knowledge graph.
However, existing knowledge graph triple confidence evaluation methods have the following shortcomings:
1. The factors considered are incomplete, and the confidence score is unreliable. Existing methods consider confidence factors at the knowledge graph global level, the entity level and the relationship level, but do not take scientific literature and data sources into account, so the final confidence score is unreliable.
2. The computational complexity is high and the interpretability is poor. Existing methods evaluate triple confidence with machine learning models (for example, KGTtm evaluates confidence at the knowledge graph global level with an RNN, and SemaTyP evaluates confidence by building a logistic regression model); such models have high computational complexity and poor interpretability.
3. The confidence measure is limited to the Embedding process. Most existing methods apply to the Embedding process and cannot directly evaluate the quality of triples produced by knowledge reasoning or by automated construction methods.
Disclosure of Invention
To overcome the above technical shortcomings, the invention provides a knowledge graph triple confidence evaluation method.
The knowledge graph triple confidence evaluation method of the invention comprises an evaluation stage, a fusion stage and a verification stage, and is characterized in that the evaluation stage is realized by the following steps:
a) entity-level evaluation;
a-1) evaluating the entity from the data source perspective. The entities to be evaluated comprise 11 types in total: compounds, diseases, proteins, genes, pathways, cell lines, drugs, products, targets, enzymes and protein-compounds. The data source confidence N_r of each entity refers to the LOD scoring in The Linked Open Data Cloud; the PubChem, RCSB PDB, DrugBank and DTO ontology data sources, which have not been scored by LOD, are respectively given scores of 5 stars, 5 stars and 4 stars. The value of N_r equals the number of stars of the LOD score, and if the same entity appears in 2 or more data sources, N_r takes the highest score;
a-2) evaluating the entity from the document co-occurrence perspective: documents related to the entity are queried in a document library, and the document co-occurrence confidence LCA of the entity is obtained by formula (1):
[Formula (1): equation image in the original publication; not reproduced in this text]
where LCA denotes the document co-occurrence confidence of the entity, N the number of documents related to the entity, F the impact factor of a document, L the citation count of a document, T the score assigned to the document's category, i the i-th document, and α, β and θ weight values;
a-3) evaluating the entity from the external link scale perspective. The external link scale confidence N_L of an entity is represented by the number of external links of the entity in the biomedical knowledge graph; the larger the external link scale, the more reliable the entity data. The credibility of the entity is thus measured by its number of external links, and N_L equals the number of external links of the entity;
a-4) evaluating the entity from the text description perspective. The entity text description describes the concept, category and functional information of the entity, and an entity that has a text description is more reliable. If a text description of the entity exists in the data sources of step a-1), the text description confidence D of the entity is 1; otherwise D is 0;
a-5) evaluating the entity from the entity importance perspective. In the biomedical knowledge graph, the number and quality of the entity nodes linked to a node directly determine that node's importance in the whole graph. The PageRank algorithm is used to measure the importance of an entity in the knowledge graph, and this value represents the entity importance confidence. The PageRank algorithm is given by formula (2):
[Formula (2): equation image in the original publication; not reproduced in this text]
where P_1, P_2, …, P_i, …, P_n denote the nodes in the knowledge graph; the symbols shown as images in the original denote, respectively, the in-degree of the node P_j under study, the out-degree of P_j, and the PageRank value of P_j; N denotes the number of nodes in the knowledge graph; the PageRank values of all nodes form the PageRank vector of the knowledge graph; and q, the probability that a node continues to expand in the knowledge graph, is set to 0.5;
a-6) evaluating the entity from the entity degree perspective. The in-degree and out-degree of an entity node reflect how rich the entity's information is in the knowledge graph and how strongly the entity is associated with other entities. The entity degree confidence N_s is computed by formula (3):
N_s = N_in + N_out (3)
where N_s denotes the entity degree confidence, N_in the in-degree of the entity node, and N_out the out-degree of the entity node;
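As a rough illustration of steps a-5) and a-6), the sketch below computes PageRank by power iteration and the degree confidence N_s on a toy directed graph. Because formula (2) is not reproduced in this text, the update rule used here is the textbook PageRank form with q = 0.5 as the damping probability, which is an assumption; the uniform redistribution of rank from nodes without outgoing edges, the function names and the node labels are likewise hypothetical.

```python
from collections import defaultdict


def pagerank(edges, q=0.5, iters=100, tol=1e-10):
    """Power iteration for PR(i) = (1 - q)/N + q * sum_{j -> i} PR(j)/outdeg(j).

    This is the textbook form, assumed here because formula (2) is not
    available in the text. q = 0.5 follows the value stated in step a-5).
    """
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out_deg = defaultdict(int)
    in_links = defaultdict(list)
    for src, dst in edges:
        out_deg[src] += 1
        in_links[dst].append(src)

    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Assumption: rank held by nodes with no outgoing edge is spread uniformly.
        dangling = sum(pr[v] for v in nodes if out_deg[v] == 0)
        new = {}
        for v in nodes:
            inflow = sum(pr[u] / out_deg[u] for u in in_links[v])
            new[v] = (1 - q) / n + q * (inflow + dangling / n)
        converged = max(abs(new[v] - pr[v]) for v in nodes) < tol
        pr = new
        if converged:
            break
    return pr


def degree_confidence(edges, node):
    """N_s = N_in + N_out for one entity node, as in formula (3)."""
    n_in = sum(1 for _, dst in edges if dst == node)
    n_out = sum(1 for src, _ in edges if src == node)
    return n_in + n_out
```

With edges such as `[("drug_A", "target_X"), ("drug_A", "target_Y"), ("drug_B", "target_Y")]`, `target_Y` receives rank from two drugs and therefore scores higher than `target_X`.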
b) relationship-level evaluation;
b-1) evaluating the relationship level from the data source perspective. A relationship between entities in the biomedical knowledge graph is generally represented by a triple (h, r, t), where h is the head entity, t is the tail entity and r is the relationship between them. If the triple data comes from a high-quality data source, the association between the two entities is strong and the confidence of the triple information is high. The data source confidence N'_r of the relationship level refers to the LOD scoring in The Linked Open Data Cloud; the PubChem, RCSB PDB, DrugBank and DTO ontology data sources, which have not been scored by LOD, are respectively given scores of 5 stars, 5 stars and 4 stars. The value of N'_r equals the number of stars of the LOD score, and if the same entity appears in 2 or more data sources, N'_r takes the highest score;
b-2) evaluating the relationship level from the document co-occurrence perspective: documents related to the entity pair (h, t) are queried in a document library, and the document co-occurrence confidence LCA' of the entity pair (h, t) is obtained by formula (4):
[Formula (4): equation image in the original publication; not reproduced in this text]
where LCA' denotes the document co-occurrence confidence of the entity pair (h, t), N' the number of documents related to the entity pair (h, t), F the impact factor of a document, L the citation count of a document, T the score assigned to the document's category, i the i-th document, and α, β and θ weight values;
b-3) evaluating known relationships between entities. An entity relationship already established during construction of the biomedical knowledge graph is called a known relationship; the ResourceRank algorithm is used to measure it, yielding the known-relationship confidence;
b-4) evaluating unknown relationships between entities. An entity relationship that does not exist in the current knowledge graph and must be obtained by reasoning is called an unknown relationship. The KSP algorithm is used to measure it: the relationship strength is evaluated from the first K shortest paths between the two entities in the graph, yielding the unknown-relationship confidence KSP;
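Step b-4) relies on the first K shortest paths between two entities. A minimal sketch of enumerating them on an unweighted directed graph is given below; it expands paths breadth-first so that paths with fewer hops are returned first. The text does not say how the paths are turned into the KSP score, so the sketch stops at returning the paths; the function name and the adjacency-dict input format are assumptions.

```python
from collections import deque


def k_shortest_paths(adj, head, tail, k):
    """Return up to k shortest simple paths from head to tail (fewest hops first).

    adj maps each node to a list of its successor nodes. Paths are expanded
    breadth-first, so they are discovered in order of non-decreasing length.
    """
    found = []
    queue = deque([[head]])
    while queue and len(found) < k:
        path = queue.popleft()
        last = path[-1]
        if last == tail:
            found.append(path)
            continue
        for nxt in adj.get(last, ()):
            if nxt not in path:  # keep paths simple (no repeated node)
                queue.append(path + [nxt])
    return found
```

Enumerating every simple path is exponential in the worst case, so on a large graph a dedicated algorithm such as Yen's would be the more usual choice; the breadth-first version is kept here only because it is short and easy to check.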
c) global-level evaluation of the knowledge graph;
the global level of the knowledge graph is evaluated by N_total/M, which measures the information density of the knowledge graph as a whole and thereby the credibility of the data it contains; N_total is the total degree of all entity nodes of the knowledge graph, i.e. the sum of the in-degrees and out-degrees of all entity nodes, and M is the total number of entity nodes in the knowledge graph.
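The global-level measure in step c) can be sketched in a few lines. Each directed edge adds one to some node's out-degree and one to some node's in-degree, so the total degree is twice the edge count. The function name is hypothetical, and the sketch assumes that every entity node appears in at least one edge; isolated nodes would have to be counted in M separately.

```python
def global_density(edges):
    """Return N_total / M for a directed graph given as (source, target) pairs."""
    nodes = {n for edge in edges for n in edge}
    n_total = 2 * len(edges)  # sum of in-degrees plus sum of out-degrees
    return n_total / len(nodes)
```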
In the knowledge graph triple confidence evaluation method of the invention, the fusion stage is realized as follows: taking into account the data quality of the biomedical knowledge graph and the requirements of the drug-target relationship prediction task, the triple confidence value of the biomedical knowledge graph is obtained by formula (5):
[Formula (5): equation image in the original publication; not reproduced in this text]
where Confidence denotes the triple confidence value, a positive number; the larger the value, the higher the confidence. Confidence is obtained by weighting the 11 confidence evaluators of the entity level, the relationship level and the knowledge graph global level, and is finally normalized to the interval [0,1]. In the given knowledge graph, if the confidence value is below the threshold of 0.6, the data of the triple is considered unreliable. N'_r denotes the data source confidence at the relationship level.
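The fusion step can be illustrated with a weighted average of the 11 evaluator scores followed by the 0.6 threshold. Formula (5) and its weights are not reproduced in this text, so everything about the combination rule here is an assumption: the sketch presumes each evaluator score has already been scaled to [0,1], uses caller-supplied weights, and divides by the weight sum to keep the result in [0,1]. The dictionary keys are hypothetical labels for the evaluators named in the evaluation stage.

```python
# Hypothetical labels for the 11 evaluators: six at the entity level, four at
# the relationship level (N'_r, LCA', ResourceRank, KSP) and one global density.
EVALUATORS = [
    "N_r", "LCA", "N_L", "D", "PR", "N_s",
    "N_r_rel", "LCA_rel", "RR", "KSP",
    "density",
]


def fuse(scores, weights):
    """Weighted average of evaluator scores (assumed to be pre-scaled to [0, 1])."""
    total_weight = sum(weights[name] for name in EVALUATORS)
    weighted_sum = sum(weights[name] * scores[name] for name in EVALUATORS)
    return weighted_sum / total_weight


def is_reliable(confidence, threshold=0.6):
    """A triple is treated as unreliable when its confidence is below the threshold."""
    return confidence >= threshold
```

Dividing by the weight sum is one simple way to guarantee a value in [0,1]; the verification stage described in the text would then be the place where such weights are tuned.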
In the knowledge graph triple confidence evaluation method of the invention, the verification stage evaluates whether the final confidence value of a knowledge graph triple is reasonable, so that the design of the evaluator and the fuser can be optimized. The checker comprises two methods, expert sampling verification and automatic verification. Expert sampling verification is manual verification by experts in the medical field. Its scope is data whose confidence score lies in [0.9,1] and whose triples involve existing drugs or hot targets. The expert investigates the drugs and targets involved in the triple and, drawing on professional knowledge and experience, verifies whether triple data with a high confidence value is reliable;
automatic verification checks the confidence value of a triple by means of molecular docking. Its scope is triples whose confidence value lies in [0.6,0.9], of which 10% are randomly sampled. Molecular docking is computed for the drug-target data involved in the triple using the LibDock and GOLD scoring functions in the Discovery Studio 2018 Client, and the final docking score is used to judge whether the confidence value is reliable;
the results of the verification stage are fed back to the evaluation stage and the fusion stage: the causes of data whose verification result is strongly negatively correlated with its confidence value are investigated in depth, and the weight of each method in the fusion stage is adjusted, thereby refining the whole knowledge graph triple confidence evaluation method.
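The selection of triples for the two verification routes can be sketched as follows. The confidence ranges and the 10% sampling rate come from the text; the record layout, the `is_priority` callback (standing in for "involves an existing drug or hot target"), the fixed random seed, and the choice to send a score of exactly 0.9 to the expert route are assumptions made only for this illustration.

```python
import random


def select_for_verification(triples, is_priority, sample_rate=0.10, seed=0):
    """Split scored triples into an expert-review list and an automatic-check sample.

    triples: list of dicts, each with a "confidence" value in [0, 1].
    is_priority: hypothetical predicate marking triples that involve an
    existing drug or a hot target.
    """
    # Expert route: confidence in [0.9, 1] and flagged as priority.
    expert = [t for t in triples if t["confidence"] >= 0.9 and is_priority(t)]
    # Automatic route: confidence in [0.6, 0.9); 0.9 itself goes to experts (assumption).
    candidates = [t for t in triples if 0.6 <= t["confidence"] < 0.9]
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample_size = round(len(candidates) * sample_rate)
    automatic = rng.sample(candidates, sample_size)
    return expert, automatic
```

The molecular docking itself is outside the scope of this sketch; the `automatic` list would be what is handed to the docking tool.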
In the knowledge graph triple confidence evaluation method of the invention, the document library in step a-2) and step b-2) comprises CAS, Patent, PubMed, Wikipedia and DOI, and the values of α, β and θ are 0.7, 0.2 and 0.1 respectively; the score values T for the different document categories are shown in Table 1:
TABLE 1
Document category    Score value
CAS                  1.0
Patent               0.8
PubMed               1.0
Wikipedia            0.5
DOI                  1.0
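To show how the weights and the category scores above might enter the document co-occurrence confidence, the sketch below sums a weighted combination of impact factor F, citation count L and category score T over the N related documents. This additive form is only one plausible reading, since formula (1) is not reproduced in this text; any scaling or normalization the original formula applies is unknown. The field names of the document records are hypothetical.

```python
# Category scores T and weights taken from the text above.
CATEGORY_SCORE = {"CAS": 1.0, "Patent": 0.8, "PubMed": 1.0, "Wikipedia": 0.5, "DOI": 1.0}
ALPHA, BETA, THETA = 0.7, 0.2, 0.1


def lca(documents):
    """Assumed form: sum over documents of ALPHA*F_i + BETA*L_i + THETA*T_i.

    documents: list of dicts with hypothetical keys "impact_factor",
    "citations" and "category".
    """
    return sum(
        ALPHA * doc["impact_factor"]
        + BETA * doc["citations"]
        + THETA * CATEGORY_SCORE[doc["category"]]
        for doc in documents
    )
```

Under this reading an entity with no related documents scores 0, and the result is unbounded above, so it would need rescaling before being combined with other evaluators.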
In the knowledge graph triple confidence evaluation method of the invention, when known relationships are evaluated at the relationship level in step b-3), the confidence of the known relationship is measured with the ResourceRank algorithm. The ResourceRank algorithm describes the association strength between two entities. Its idea is that if the association between the entity pair (h, t) is strong, a large amount of resource will flow from the head entity h to the tail entity t through all the association paths. It is realized by the following steps:
b-3-1) constructing a directed graph centered on the head entity h;
b-3-2) iteratively computing the resources in the graph with formula (6) until they converge, and computing the resource retention value of the tail entity t;
[Formula (6): equation image in the original publication; not reproduced in this text]
where M_t is the set of all nodes that lead to the tail node t, OD(e_i) is the out-degree of node e_i, and the symbol shown as an image in the original denotes the bandwidth from node e_i to the tail node t, i.e. the number of paths. For each node e_i in M_t, the amount of resource transferred from e_i to the tail node t is given by an expression that also appears only as an image in the original. Assume that the resource flowing out of each node has the same probability η of jumping directly to a random node; the share of such resource that randomly flows to the tail node t is 1/N, where N is the total number of nodes;
b-3-3) a feature vector V is constructed from 6 features in total: R(t|h) from step b-3-2), the in-degree ID(h) of the head node h, the out-degree OD(h) of the head node h, the in-degree ID(t) of the tail node t, the out-degree OD(t) of the tail node t, and the depth Dep from the head node to the tail node. V is converted into a probability value RR(h, t) by an activation function; RR(h, t) is the ResourceRank confidence, which measures how likely it is that one or more relationships exist between the head node h and the tail node t, and is computed by formula (7):
[Formula (7): equation image in the original publication; not reproduced in this text]
where φ is a non-linear activation function, W_i and b_i are parameter matrices that can be adjusted during training, and RR(h, t) takes values in [0,1]; the closer its value is to 1, the more likely it is that a relationship exists between h and t.
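The resource iteration of step b-3-2) can be sketched as below. Formula (6) is not reproduced in this text, so the update rule is reconstructed from the prose as an assumption: each node passes a (1 - η) share of its resource along its outgoing edges in proportion to bandwidth over out-degree, and every node additionally receives η/N from random jumps. The value η = 0.15, the uniform starting allocation, and the representation of bandwidth as the number of parallel edges are all hypothetical choices; the neural scoring of formula (7) is not covered.

```python
from collections import Counter


def resource_rank(edges, tail, eta=0.15, iters=200, tol=1e-12):
    """Resource retained at `tail` after iterating the assumed update to convergence.

    edges: list of (source, target) pairs of the directed graph centered on the
    head entity; a pair listed more than once counts as extra bandwidth.
    Assumed update: R(v) = eta/N + (1 - eta) * sum_{u -> v} R(u) * BW(u, v) / OD(u).
    """
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    bandwidth = Counter(edges)                  # BW(u, v): parallel edges u -> v
    out_deg = Counter(src for src, _ in edges)  # OD(u): total outgoing edges of u

    resource = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: eta / n for v in nodes}       # share arriving by random jump
        for (u, v), bw in bandwidth.items():
            new[v] += (1 - eta) * resource[u] * bw / out_deg[u]
        delta = max(abs(new[v] - resource[v]) for v in nodes)
        resource = new
        if delta < tol:
            break
    return resource[tail]
```

On a graph where h reaches t both directly (with two parallel edges) and through an intermediate node, t ends up holding more resource than the intermediate node, which matches the intuition stated in the text that strongly associated pairs accumulate more resource at the tail.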
The beneficial effects of the invention are as follows. In the evaluation stage, the method evaluates triple confidence at three levels (entity, relationship and knowledge graph global) and from multiple perspectives (data source, document co-occurrence, external link scale, text description, entity importance and entity degree), yielding 11 confidence values. In the fusion stage, the 11 confidence evaluators are weighted and fused into a final confidence value. In the verification stage, the rationality of the final confidence value is checked, and the result is fed back to the evaluation stage and the fusion stage to optimize the design of the evaluation stage or adjust the weights of the fusion stage. The method can therefore discover errors in knowledge graph data efficiently, quickly and at scale, thereby improving the data quality of the whole knowledge graph system; it can also check the data reliability of the results of machine learning tasks such as link prediction and relationship inference.
Drawings
FIG. 1 is a schematic diagram of the applicable stages of three types of confidence evaluation methods;
FIG. 2 is a schematic architecture diagram of the knowledge-graph triple confidence evaluation method of the present invention;
FIG. 3 is a schematic diagram of the ResourceRank algorithm in the present invention;
FIG. 4 is a diagram of an example case of computing confidence in the evaluation stage.
Detailed Description
The invention is further described with reference to the following figures and examples.
FIG. 2 shows the architecture of the knowledge graph triple confidence evaluation method of the invention, which is used to evaluate the reliability of triples in a biomedical knowledge graph. The method comprises an evaluator, a fuser and a checker. After passing through the evaluator, the triple data of the knowledge graph yields several confidence scores, and the fuser combines these scores according to certain weights to produce a final confidence value. The checker verifies the rationality of the final confidence value and feeds the result back to the evaluator and the fuser, so that the design of the evaluator can be optimized or the weights of the fuser adjusted.
The evaluator evaluates triple confidence at three levels (entity, relationship and knowledge graph global) and from multiple perspectives (data source, document co-occurrence, external link scale, text description, entity importance and entity degree); the specific methods are shown in Table 2:
TABLE 2
[Table 2: image in the original publication; not reproduced in this text]
The knowledge graph triple confidence evaluation method of the invention comprises an evaluation stage, a fusion stage and a verification stage, and is characterized in that the evaluation stage is realized by the following steps:
a) entity-level evaluation;
a-1) evaluating the entity from the data source perspective. The entities to be evaluated comprise 11 types in total: compounds, diseases, proteins, genes, pathways, cell lines, drugs, products, targets, enzymes and protein-compounds. The data source confidence N_r of each entity refers to the LOD scoring in The Linked Open Data Cloud; the PubChem, RCSB PDB, DrugBank and DTO ontology data sources, which have not been scored by LOD, are respectively given scores of 5 stars, 5 stars and 4 stars. The value of N_r equals the number of stars of the LOD score, and if the same entity appears in 2 or more data sources, N_r takes the highest score;
Table 3 gives the LOD data source quality evaluation:
[Table 3: image in the original publication; not reproduced in this text]
a-2) evaluating the entity from the document co-occurrence perspective: documents related to the entity are queried in a document library, and the document co-occurrence confidence LCA of the entity is obtained by formula (1):
[Formula (1): equation image in the original publication; not reproduced in this text]
where LCA denotes the document co-occurrence confidence of the entity, N the number of documents related to the entity, F the impact factor of a document, L the citation count of a document, T the score assigned to the document's category, i the i-th document, and α, β and θ weight values;
in this step, the document library comprises CAS, Patent, PubMed, Wikipedia and DOI, and the values of α, β and θ are 0.7, 0.2 and 0.1 respectively; the score values T for the different document categories are shown in Table 1:
TABLE 1
Document category    Score value
CAS                  1.0
Patent               0.8
PubMed               1.0
Wikipedia            0.5
DOI                  1.0
a-3) evaluation of entity by external chain scale angle, and external chain scale confidence N of entity L The reliability of the entity data is higher when the external chain scale of the entity is larger, the credibility of the entity is measured by the external chain number of the entity, and the external chain scale confidence coefficient N of the entity is expressed by the number of the external links of the entity in the biomedical knowledge map L Equal to the number of outer chains of the entity;
a-4) evaluating the entity by the text description angle, wherein the entity text description is the description of the concept, category and functional information of the entity, and the data reliability of the entity with the text description is higher; if the text description of the corresponding entity exists in the data source in the step a-1), the value of the text description confidence value D of the entity is 1, and if the text description confidence value D does not exist, the value of the text description confidence value D is 0;
a-5) evaluating the entity from the perspective of entity importance, wherein the importance of the node in the whole graph is directly determined by the quantity and quality of linked entity nodes in the biomedicine knowledge graph; the importance of a certain entity in the knowledge graph is measured by adopting a PageRank algorithm to represent the confidence coefficient of the importance of the entity, wherein the PageRank algorithm is shown as a formula (2):
Figure GDA0003671111330000122
where P_1, P_2, …, P_i, …, P_n denote the nodes in the knowledge graph,
Figure GDA0003671111330000123
denotes the in-degree of the node P_j under consideration,
Figure GDA0003671111330000124
denotes the out-degree of the node P_j under consideration, N denotes the number of nodes in the knowledge graph,
Figure GDA0003671111330000125
denotes the PageRank value of node P_j; the PageRank values of all nodes form the PageRank vector of the knowledge graph; and q denotes the probability that expansion continues from a node in the knowledge graph, with a value of 0.5;
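Formula (2) appears only as an image here. The sketch below assumes it is the standard PageRank recurrence PR(P_i) = (1 - q)/N + q · Σ PR(P_j)/outdeg(P_j), summed over the nodes P_j that link to P_i, with q = 0.5 as stated. The edge-list input, the handling of nodes without outgoing edges (their mass is spread uniformly) and the stopping rule are illustrative assumptions.

```python
def pagerank(edges, q=0.5, tol=1e-10, max_iter=1000):
    """Power iteration for standard PageRank on a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Mass held by nodes with no outgoing edge is redistributed uniformly (assumption).
        dangling = sum(pr[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - q) / n + q * dangling / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new[dst] += q * pr[src] / len(out_links[src])
        delta = sum(abs(new[v] - pr[v]) for v in nodes)
        pr = new
        if delta < tol:
            break
    return pr


scores = pagerank([("A", "B"), ("B", "A"), ("C", "A")])
print(scores)
```

In this small example node C has no incoming links, so its score stays at the baseline (1 - q)/N, while A, which is linked from both B and C, ranks highest.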
a-6) evaluating the entity from the node degree perspective: the in-degree and out-degree of an entity node reflect how rich the entity's information is in the knowledge graph and how strongly the entity is associated with other entities; the degree confidence N_s of the entity is calculated by formula (3):
N_s = N_in + N_out    (3)
where N_s denotes the degree confidence of the entity, N_in denotes the in-degree of the entity node, and N_out denotes the out-degree of the entity node;
b) evaluating the relation level;
b-1) evaluating the relation level from the data source perspective: relations between entities in the biomedical knowledge graph are generally represented as triples (h, r, t), where h is the head entity, t is the tail entity and r is the relation between them; if the triple data come from a high-quality data source, the association between the two entities is strong and the confidence of the triple information is high; the data source confidence N'_in of the relation level refers to the LOD star ratings in The Linked Open Data Cloud, and the PubChem, RCSB PDB, DrugBank and DTO ontology data sources, which have no LOD rating, are assigned 5, 5, 5 and 4 stars, respectively; the value of N'_in equals the number of LOD stars, and if the same entity appears in two or more data sources, N'_in takes the highest score;
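The "take the highest star rating" rule of the data source evaluator can be sketched as follows. The four star values follow the assignment given in claim 1, step a-1) (PubChem, RCSB PDB and DrugBank 5 stars, DTO 4 stars); ratings for other LOD sources would have to be looked up and are not included. The function and dictionary names are hypothetical.

```python
# Star ratings for the four sources without an LOD rating, as read from claim 1 step a-1).
STAR_RATING = {"PubChem": 5, "RCSB PDB": 5, "DrugBank": 5, "DTO": 4}


def data_source_confidence(sources, ratings=STAR_RATING):
    """Return the highest star rating among the data sources an item appears in."""
    stars = [ratings[s] for s in sources if s in ratings]
    if not stars:
        raise ValueError("no rated data source for this item")
    return max(stars)


print(data_source_confidence(["DTO", "PubChem"]))
```

Taking the maximum means that a single high-quality source is enough to give the item the top data source score, regardless of how many lower-rated sources also contain it.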
b-2) evaluating the relation level from the literature co-occurrence perspective: documents related to the entity pair (h, t) are queried in the literature library, and the literature co-occurrence confidence LCA' of the entity pair (h, t) is obtained by formula (4):
Figure GDA0003671111330000131
LCA' denotes the literature co-occurrence confidence of the entity pair (h, t), N' denotes the number of documents related to the entity pair (h, t), F denotes the impact factor of a document, L denotes the citation count of a document, T denotes the score value assigned to the document's category, i indexes the i-th document, and α, β and θ are weights;
b-3) evaluating the known-relation level between entities: an entity relation established during construction of the biomedical knowledge graph is a known relation; the ResourceRank algorithm is used to measure the confidence of known relations, yielding the known-relation confidence ResourceRank;
Fig. 3 is a schematic diagram of the principle of the ResourceRank algorithm in the present invention. The edges (relations) from node (entity) A to node E are very dense, which indicates a high association strength between the two entities (A, E) and that a relation exists between entities A and E. By contrast, there is no directly connecting edge between node G and node F, which indicates that no relation exists between entities G and F.
In this step, the confidence of known relations is measured with the ResourceRank algorithm, which describes the association strength between two entities. The idea of the algorithm is that if the association between the entity pair (h, t) is strong, a large amount of resources will flow from the head entity h to the tail entity t through all the association paths. It is realized by the following steps:
b-3-1), constructing a directed graph taking a head entity h as a center;
b-3-2) iteratively calculating the resources in the graph by formula (6) until they converge, and obtaining the resource retention value R(t|h) of the tail entity t;
Figure GDA0003671111330000141
where M_t is the set of all nodes that have an edge leading to the tail node t, OD(e_i) is the out-degree of node e_i, and BW_{e_i t} is the bandwidth from node e_i to the tail node t, i.e. the number of paths; for each node e_i in M_t, the amount of resources transferred from node e_i to the tail node t is
Figure GDA0003671111330000143
The resource flow of every node is assumed to have the same probability η of jumping directly to a random node, so that the share of resources flowing randomly to the tail node t is 1/N, where N is the total number of nodes;
b-3-3) a feature vector V is constructed from 6 features in total: R(t|h) obtained in step b-3-2), the in-degree ID(h) of the head node h, the out-degree OD(h) of the head node h, the in-degree ID(t) of the tail node t, the out-degree OD(t) of the tail node t, and the depth Dep from the head node to the tail node; V is converted by an activation function into a probability value RR(h, t), where RR(h, t) is the ResourceRank confidence and measures the likelihood that one or more relations exist between the head node h and the tail node t; it is calculated by formula (7):
Figure GDA0003671111330000144
where φ is a non-linear activation function, W_i and b_i are parameter matrices that can be adjusted during training, and RR(h, t) takes values in [0, 1]; the closer its value is to 1, the more likely it is that a relation exists between h and t.
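The resource iteration of step b-3-2) can be illustrated with a short Python sketch. Formula (6) is only available as an image, so the update used here, R(t|h) = (1 - η) · Σ R(e_i|h) · BW_{e_i t} / OD(e_i) + η/N with the sum taken over e_i in M_t, is an assumption pieced together from the symbol definitions in the text. The value η = 0.15, the initial resource of 1 placed on the head node and the function name are likewise hypothetical. Parallel edges in the edge list stand for bandwidth.

```python
from collections import Counter


def resource_rank(edges, head, tail, eta=0.15, tol=1e-10, max_iter=1000):
    """Iterate the ASSUMED ResourceRank update and return the resource held by `tail`."""
    bandwidth = Counter(edges)                 # bandwidth[(u, v)] = number of parallel edges u -> v
    out_degree = Counter(u for u, _ in edges)  # OD(u), counting parallel edges
    nodes = sorted({n for edge in edges for n in edge} | {head, tail})
    n = len(nodes)

    resource = {v: 1.0 if v == head else 0.0 for v in nodes}
    for _ in range(max_iter):
        new = {v: eta / n for v in nodes}      # random-jump share eta/N for every node
        for (u, v), bw in bandwidth.items():
            new[v] += (1.0 - eta) * resource[u] * bw / out_degree[u]
        delta = sum(abs(new[v] - resource[v]) for v in nodes)
        resource = new
        if delta < tol:
            break
    return resource[tail]


dense = [("h", "t"), ("h", "t"), ("h", "a"), ("a", "t")]
sparse = [("h", "a"), ("b", "t")]
print(resource_rank(dense, "h", "t"), resource_rank(sparse, "h", "t"))
```

With these assumptions the tail node of the densely connected pair retains more resource than the tail node that is not reachable from the head, which is the qualitative behaviour described for Fig. 3. The feature vector and activation of step b-3-3) are not sketched because their trained parameters are not given in the text.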
b-4) evaluating the unknown-relation level between entities: an entity relation that does not exist in the existing knowledge graph and must be obtained by reasoning is called an unknown relation; the KSP algorithm is used to measure the confidence of unknown relations, evaluating relation strength through the number of the top-K shortest paths between the two entities in the graph, to obtain the unknown-relation confidence KSP;
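The text does not say which K-shortest-paths procedure is used or how the paths are turned into a score. As a rough illustration only, the sketch below enumerates up to K shortest simple paths in an unweighted directed graph by breadth-first expansion of partial paths; the adjacency-dictionary input and the function name are hypothetical, and the enumeration can be exponential on large graphs.

```python
from collections import deque


def k_shortest_paths(adjacency, source, target, k):
    """Return up to k simple paths from source to target, shortest (fewest hops) first."""
    paths = []
    queue = deque([[source]])
    while queue and len(paths) < k:
        path = queue.popleft()
        last = path[-1]
        if last == target:
            paths.append(path)
            continue
        for nxt in adjacency.get(last, ()):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                queue.append(path + [nxt])
    return paths


graph = {"h": ["a", "b", "t"], "a": ["t"], "b": ["a"]}
print(k_shortest_paths(graph, "h", "t", 2))
```

One possible use is to take the number of paths returned (at most K) as the raw strength of an unknown relation between h and t, but that mapping is an assumption and not something the text specifies.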
c) evaluating the global level of the knowledge graph;
the global level of the knowledge graph is evaluated by N_total/M, which measures the information density of the knowledge graph as a whole and thereby evaluates the credibility of the data contained in the whole knowledge graph; where N_total is the total degree of all entity nodes of the knowledge graph, i.e. the sum of the in-degrees and out-degrees of all entity nodes, and M is the total number of entity nodes in the knowledge graph.
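The degree-based quantities are simple enough to compute directly. The sketch below derives the per-node degree N_s = N_in + N_out of formula (3) and the global information density N_total/M from a list of (head, tail) edges; the function names and the optional list of isolated nodes are illustrative assumptions.

```python
from collections import Counter


def node_degrees(edges):
    """N_s per node: in-degree plus out-degree, from directed (head, tail) edges."""
    degree = Counter()
    for head, tail in edges:
        degree[head] += 1  # contributes to the out-degree of head
        degree[tail] += 1  # contributes to the in-degree of tail
    return dict(degree)


def information_density(edges, extra_nodes=()):
    """N_total / M, where M also counts any isolated nodes passed in extra_nodes."""
    degree = node_degrees(edges)
    nodes = set(degree) | set(extra_nodes)
    if not nodes:
        raise ValueError("the graph has no nodes")
    return sum(degree.values()) / len(nodes)


edges = [("A", "B"), ("A", "C"), ("B", "C")]
print(node_degrees(edges), information_density(edges))
```

Because every edge adds one out-degree and one in-degree, N_total is twice the number of edges, so the density is effectively twice the average number of relations per entity.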
The fusion stage is realized by the following step: combining the data quality of the biomedical knowledge graph with the factors of the drug-target relation prediction task, the triple confidence value of the biomedical knowledge graph is obtained by formula (5):
Figure GDA0003671111330000151
where Confidence denotes the triple confidence value, which is a positive number; the larger the value, the higher the confidence; Confidence is obtained by weighting the 11 confidence evaluators of the entity level, the relation level and the knowledge graph global level, and is finally normalized to the interval [0, 1]; if the confidence value of a triple in the given knowledge graph is less than the threshold 0.6, the data of the triple are considered unreliable; N'_r denotes the data source confidence of the relation level.
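Formula (5) is only available as an image, and neither the weights nor the normalization method are given in the text. The sketch below therefore assumes a plain weighted sum over the 11 evaluator outputs followed by min-max normalization across all triples, and then applies the stated 0.6 threshold. The equal weights, the min-max choice and all function names are hypothetical.

```python
# The 11 evaluators named in the description; equal weights are placeholders only.
EVALUATORS = ["N_r", "LCA", "N_L", "D", "PageRank", "N_s",
              "N'_r", "LCA'", "ResourceRank", "KSP", "N_total/M"]
DEFAULT_WEIGHTS = {name: 1.0 / len(EVALUATORS) for name in EVALUATORS}


def fuse(scores, weights=DEFAULT_WEIGHTS):
    """ASSUMED fusion: weighted sum of the evaluator scores of one triple."""
    return sum(w * scores[name] for name, w in weights.items())


def normalize(raw):
    """ASSUMED normalization: min-max scaling of raw confidences to [0, 1]."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {key: 1.0 for key in raw}  # arbitrary choice when all values are equal
    return {key: (value - lo) / (hi - lo) for key, value in raw.items()}


def unreliable(confidence, threshold=0.6):
    """Triples whose normalized confidence falls below the threshold stated in the text."""
    return [key for key, value in confidence.items() if value < threshold]


raw = {"t1": 2.0, "t2": 4.0, "t3": 3.0}
print(unreliable(normalize(raw)))
```

Since the evaluators have very different scales (star counts, link counts, probabilities), a practical implementation would probably rescale each evaluator before weighting; the text leaves this open.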
The verification stage evaluates whether the final confidence values of the knowledge graph triples are reasonable, so as to optimize the design of the evaluators and the fuser; the checker comprises two methods, expert sampling verification and automatic verification. Expert sampling verification: manual verification is carried out by experts in the medical field; the scope of expert verification is triples whose confidence value lies in [0.9, 1] and which contain data on existing drugs or hot targets; the expert verification method is to study the drugs and targets involved in the triple and, based on professional knowledge and experience, check whether triple data with high confidence values are reliable;
automatic verification: the confidence value of a triple is verified by means of molecular docking; the scope of automatic verification is a random sample of 10% of the triples whose confidence value lies in [0.6, 0.9]; the automatic verification method is to perform molecular docking calculations on the drug-target data involved in the triple using the LibDock and GOLD scoring functions in the Discovery Studio 2018 Client, and to judge from the final score whether the confidence value is reliable;
the results of the verification stage are fed back to the evaluation stage and the fusion stage: the causes of data for which the verification result and the confidence value are strongly negatively correlated are investigated in depth, and the weights of the fusion stage are adjusted, thereby improving the overall knowledge graph triple confidence evaluation method.
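The selection of triples for the two verification routes can be sketched as below. The confidence ranges and the 10% sampling rate come from the text; the additional expert-side filter on existing drugs or hot targets is omitted, both ranges are kept inclusive as written (so a value of exactly 0.9 falls in both), and the fixed random seed and function name are assumptions made for reproducibility.

```python
import random


def select_for_verification(confidence, rate=0.10, seed=0):
    """Split triples into an expert-review list and a random automatic-check sample."""
    rng = random.Random(seed)
    expert = [t for t, c in confidence.items() if 0.9 <= c <= 1.0]
    pool = sorted(t for t, c in confidence.items() if 0.6 <= c <= 0.9)
    sample_size = round(rate * len(pool))
    automatic = rng.sample(pool, sample_size)
    return expert, automatic


confidence = {f"triple_{i}": 0.7 for i in range(20)}
confidence["triple_high"] = 0.95
confidence["triple_low"] = 0.3
expert, automatic = select_for_verification(confidence)
print(expert, automatic)
```

Triples below 0.6 are not sampled at all here, because the text already treats them as unreliable at the fusion stage.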
Fig. 4 shows a typical case of confidence calculation in the evaluation stage. Taking the triple (noradrenaline, binding molecule entity, β2 adrenergic receptor) as an example, the process by which the evaluators calculate confidence is briefly described as follows.
At the entity level, a translation-based energy function algorithm (TEF) is used to calculate the likelihood that a binding relation exists between noradrenaline and the β2 adrenergic receptor. The energy function of the triple (noradrenaline, binding molecule entity, β2 adrenergic receptor) is first calculated to obtain a low-dimensional distributed representation of the entities and relations. A sigmoid function then converts the energy function into the probability that the entity pair (noradrenaline, β2 adrenergic receptor) forms the binding molecule entity relation, and this probability measures the likelihood that the two entities have a binding relation.
At the relation level, the ResourceRank algorithm is used to calculate the relation type and association strength of the drug and the target. The ResourceRank algorithm creates a subgraph of depth 2 centered on noradrenaline and the β2 adrenergic receptor, and then calculates, on the generated subgraph, the amount of resources flowing from the head entity (noradrenaline) to the tail entity (β2 adrenergic receptor); if the association between the entity pair (noradrenaline, β2 adrenergic receptor) is strong, a large amount of resources will flow from the head entity to the tail entity through all the association paths.
At the data source level, a DataSource algorithm is used to comprehensively evaluate the quality of the data sources in which the triple is located, namely the Drug Target Ontology, the PRotein Ontology and UniProt. First, the data of the triple (noradrenaline, binding molecule entity, β2 adrenergic receptor) are contained in the Drug Target Ontology, PRotein Ontology and UniProt data sources. Second, the quality of the data in the three data sources differs; the DataSource algorithm builds an LOD data source quality evaluation table with reference to the quality ratings of different data sources in the Linked Open Data Cloud (LOD), and evaluates the confidence of the data source level according to the set rules.
At the literature co-occurrence level, a literature co-occurrence algorithm (LCO) quantifies the association strength of an entity pair by the number of documents in which the pair co-occurs. First, the algorithm screens the literature for documents containing the triple (noradrenaline, binding molecule entity, β2 adrenergic receptor). Then, with the number of documents as the main factor, a weighted calculation is carried out that also takes into account the impact factor, citation count, journal category and similar information of the documents, finally yielding a confidence value that indicates the association strength of the entity pair.
At the knowledge graph structure level, a reachable path reasoning algorithm (RP) is used to evaluate the semantic correlation between the head and tail entities in the directed graph and the complex reasoning patterns contained among triples. First, the semantic relevance between a path and the target triple is considered, and reachable paths are selected by a path selection algorithm based on semantic distance. The selected reachable paths are then mapped to low-dimensional vectors, and a Recurrent Neural Network (RNN) is used to obtain a final output vector that represents the semantic information of each path.
Finally, the vector is processed non-linearly to obtain a value RP((h, r, t)), which represents the confidence of the graph structure level in the knowledge graph.

Claims (5)

1. A knowledge graph triple confidence evaluation method comprises an evaluation stage, a fusion stage and a verification stage, and is characterized in that: the evaluation phase is realized by the following steps:
a) entity level assessment;
a-1) evaluating the entity from the data source perspective: the entities to be evaluated comprise 11 types in total, namely compounds, diseases, proteins, genes, pathways, cell lines, drugs, products, targets, enzymes and protein-compounds; the data source confidence N_r of each entity refers to the LOD star ratings in the Linked Open Data Cloud, and the PubChem, RCSB PDB, DrugBank and DTO ontology data sources, which have no LOD rating, are assigned 5, 5, 5 and 4 stars, respectively; the value of the data source confidence N_r of an entity equals the number of LOD stars, and if the same entity appears in two or more data sources, N_r takes the highest score;
a-2) evaluating the entity from the literature co-occurrence perspective: documents related to the entity are queried in the literature library, and the literature co-occurrence confidence LCA of the entity is obtained by formula (1):
Figure FDA0003754823780000011
LCA denotes the literature co-occurrence confidence of an entity, N denotes the number of documents related to the entity, F denotes the impact factor of a document, L denotes the citation count of a document, T denotes the score value assigned to the document's category, i indexes the i-th document, and α, β and θ are weights;
a-3) evaluating the entity from the external-link scale perspective: the external-link scale confidence N_L of an entity is expressed by the number of external links of the entity in the biomedical knowledge graph; the larger the external-link scale of an entity, the more reliable the entity data, so the credibility of the entity is measured by its number of external links, and N_L equals the number of external links of the entity;
a-4) evaluating the entity from the text description perspective: the text description of an entity describes its concept, category and functional information, and an entity that has a text description is more reliable; if a text description of the entity exists in the data sources of step a-1), the text description confidence D of the entity is 1; otherwise D is 0;
a-5) evaluating the entity from the entity importance perspective: in the biomedical knowledge graph, the importance of a node in the whole graph is determined directly by the quantity and quality of the entity nodes linked to it; the PageRank algorithm is used to measure the importance of an entity in the knowledge graph, and the result serves as the entity importance confidence; the PageRank algorithm is given by formula (2):
Figure FDA0003754823780000021
where P_1, P_2, …, P_i, …, P_n denote the nodes in the knowledge graph,
Figure FDA0003754823780000022
denotes the in-degree of the node P_j under consideration,
Figure FDA0003754823780000023
denotes the out-degree of the node P_j under consideration, N denotes the number of nodes in the knowledge graph,
Figure FDA0003754823780000024
denotes the PageRank value of node P_j; the PageRank values of all nodes form the PageRank vector of the knowledge graph; and q denotes the probability that expansion continues from a node in the knowledge graph, with a value of 0.5;
a-6) evaluating the entity from the node degree perspective: the in-degree and out-degree of an entity node reflect how rich the entity's information is in the knowledge graph and how strongly the entity is associated with other entities; the degree confidence N_s of the entity is calculated by formula (3):
N_s = N_in + N_out    (3)
where N_s denotes the degree confidence of the entity, N_in denotes the in-degree of the entity node, and N_out denotes the out-degree of the entity node;
b) evaluating a relationship level;
b-1) evaluating the relation level from the data source perspective: relations between entities in the biomedical knowledge graph are generally represented as triples (h, r, t), where h is the head entity, t is the tail entity and r is the relation between them; if the triple data come from a high-quality data source, the association between the two entities is strong and the confidence of the triple information is high; the data source confidence N'_in of the relation level refers to the LOD star ratings in the Linked Open Data Cloud, and the PubChem, RCSB PDB, DrugBank and DTO ontology data sources, which have no LOD rating, are assigned 5, 5, 5 and 4 stars, respectively; the value of N'_in equals the number of LOD stars, and if the same entity appears in two or more data sources, N'_in takes the highest score;
b-2) evaluating the relation level from the literature co-occurrence perspective: documents related to the entity pair (h, t) are queried in the literature library, and the literature co-occurrence confidence LCA' of the entity pair (h, t) is obtained by formula (4):
Figure FDA0003754823780000031
LCA' denotes the literature co-occurrence confidence of the entity pair (h, t), N' denotes the number of documents related to the entity pair (h, t), F denotes the impact factor of a document, L denotes the citation count of a document, T denotes the score value assigned to the document's category, i indexes the i-th document, and α, β and θ are weights;
b-3) evaluating the known-relation level between entities: an entity relation established during construction of the biomedical knowledge graph is a known relation; the ResourceRank algorithm is used to measure the confidence of known relations, yielding the known-relation confidence ResourceRank;
b-4) evaluating the unknown-relation level between entities: an entity relation that does not exist in the existing knowledge graph and must be obtained by reasoning is called an unknown relation; the KSP algorithm is used to measure the confidence of unknown relations, evaluating relation strength through the number of the top-K shortest paths between the two entities in the graph, to obtain the unknown-relation confidence KSP;
c) evaluating the global level of the knowledge graph;
the global level of the knowledge graph is evaluated by N_total/M, which measures the information density of the knowledge graph as a whole and thereby evaluates the credibility of the data contained in the whole knowledge graph; where N_total is the total degree of all entity nodes of the knowledge graph, i.e. the sum of the in-degrees and out-degrees of all entity nodes, and M is the total number of entity nodes in the knowledge graph.
2. The knowledge-graph triplet confidence assessment method of claim 1, characterized in that: the fusion stage is realized by the following step: combining the data quality of the biomedical knowledge graph with the factors of the drug-target relation prediction task, the triple confidence value of the biomedical knowledge graph is obtained by formula (5):
Figure FDA0003754823780000041
where Confidence denotes the triple confidence value, which is a positive number; the larger the value, the higher the confidence; Confidence is obtained by weighting the 11 confidence evaluators of the entity level, the relation level and the knowledge graph global level, and is finally normalized to the interval [0, 1]; if the confidence value of a triple in the given knowledge graph is less than the threshold 0.6, the data of the triple are considered unreliable.
3. The knowledge-graph triplet confidence assessment method of claim 2, characterized in that: the verification stage evaluates whether the final confidence values of the knowledge graph triples are reasonable, so as to optimize the design of the evaluators and the fuser; the checker comprises two methods, expert sampling verification and automatic verification; expert sampling verification: manual verification is carried out by experts in the medical field; the scope of expert verification is triples whose confidence value lies in [0.9, 1] and which contain data on existing drugs or hot targets; the expert verification method is to study the drugs and targets involved in the triple and, based on professional knowledge and experience, check whether triple data with high confidence values are reliable;
automatic verification: the confidence value of a triple is verified by means of molecular docking; the scope of automatic verification is a random sample of 10% of the triples whose confidence value lies in [0.6, 0.9]; the automatic verification method is to perform molecular docking calculations on the drug-target data involved in the triple using the LibDock and GOLD scoring functions in the Discovery Studio 2018 Client, and to judge from the final score whether the confidence value is reliable;
the results of the verification stage are fed back to the evaluation stage and the fusion stage: the causes of data for which the verification result and the confidence value are strongly negatively correlated are investigated in depth, and the weight of each method in the fusion stage is adjusted, thereby improving the overall knowledge graph triple confidence evaluation method.
4. The knowledge-graph triplet confidence assessment method of claim 1 or 2, characterized in that: the literature base in the step a-2) and the step b-2) comprises CAS, Patent, PubMed, Wikipedia and DOI, and the values of the alpha, the beta and the theta are respectively 0.7, 0.2 and 0.1; the scoring values T corresponding to different document categories are: scoring values corresponding to the document categories CAS, Patent, PubMed, Wikipedia, and DOI are 1.0, 0.8, 1.0, 0.5, and 1.0, respectively.
5. The knowledge-graph triplet confidence assessment method of claim 1 or 2, characterized in that: in the evaluation of known relations at the relation level in step b-3), the confidence of known relations is measured with the ResourceRank algorithm; the ResourceRank algorithm describes the association strength between two entities, and its idea is that if the association between the entity pair (h, t) is strong, a large amount of resources will flow from the head entity h to the tail entity t through all the association paths; it is realized by the following steps:
b-3-1) constructing a directed graph centered on the head entity h;
b-3-2) iteratively calculating the resources in the graph by formula (6) until they converge, and obtaining the resource retention value R(t|h) of the tail entity t;
Figure FDA0003754823780000051
wherein M is t Is the set of all the nodes leading to the tail node t, OD (e) i ) Is node e i Out of degree, BW eit Is node e i Bandwidth to tail node t, i.e. the number of paths; for M t In each node e i From node e i The amount of resources transferred to the tail node t is
Figure FDA0003754823780000052
Setting that the resource flow of each node has the same eta probability and can directly jump to a random node, wherein the part of resources flowing to a tail node t randomly is 1/N, and N is the total number of the nodes;
b-3-3), using R (t | h), the degree of entry ID (h) of the head node h, the degree of exit OD (h) of the head node h, the degree of entry ID (t) of the tail node t, the degree of exit OD (t) of the tail node t, and the depth Dep from the head node to the tail node in the step b-3-2), totaling 6 characteristics to construct a characteristic vector V, converting the V into a probability value RR (h, t) through an activation function, wherein the RR (h, t) is a confidence Re sourceRank, and is used for measuring the possibility that one or more relations exist between the head node h and the tail node t, and the calculation is carried out through a formula (7):
Figure FDA0003754823780000053
where φ is a non-linear activation function, W_i and b_i are parameter matrices that can be adjusted during training, and RR(h, t) takes values in [0, 1]; the closer its value is to 1, the more likely it is that a relation exists between h and t.
CN202011309998.5A 2020-11-20 2020-11-20 Knowledge graph triple confidence evaluation method Active CN112417166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011309998.5A CN112417166B (en) 2020-11-20 2020-11-20 Knowledge graph triple confidence evaluation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011309998.5A CN112417166B (en) 2020-11-20 2020-11-20 Knowledge graph triple confidence evaluation method

Publications (2)

Publication Number Publication Date
CN112417166A CN112417166A (en) 2021-02-26
CN112417166B true CN112417166B (en) 2022-08-26

Family

ID=74774496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011309998.5A Active CN112417166B (en) 2020-11-20 2020-11-20 Knowledge graph triple confidence evaluation method

Country Status (1)

Country Link
CN (1) CN112417166B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204650B (en) * 2021-05-14 2022-03-11 深圳市曙光信息技术有限公司 Evaluation method and system based on domain knowledge graph
CN116110594B (en) * 2022-12-02 2024-05-07 北京交通大学 Knowledge evaluation method and system of medical knowledge graph based on associated literature
CN115860152B (en) * 2023-02-20 2023-06-27 南京星耀智能科技有限公司 Cross-modal joint learning method for character military knowledge discovery
CN116187868B (en) * 2023-04-27 2023-07-21 深圳市迪博企业风险管理技术有限公司 Knowledge graph-based industrial chain development quality evaluation method and device
CN116501915B (en) * 2023-06-29 2023-10-20 长江三峡集团实业发展(北京)有限公司 Energy management end voice page retrieval method and system
CN117725231B (en) * 2024-02-08 2024-04-23 中国电子科技集团公司第十五研究所 Content generation method and system based on semantic evidence prompt and confidence

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355627A (en) * 2015-07-16 2017-01-25 中国石油化工股份有限公司 Method and system used for generating knowledge graphs
CN109063021A (en) * 2018-07-12 2018-12-21 浙江大学 A kind of knowledge mapping distribution representation method for capableing of encoding relation semanteme Diversity structure
CN111177322A (en) * 2019-12-30 2020-05-19 成都数之联科技有限公司 Ontology model construction method of domain knowledge graph

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9466297B2 (en) * 2014-12-09 2016-10-11 Microsoft Technology Licensing, Llc Communication system
US10606893B2 (en) * 2016-09-15 2020-03-31 International Business Machines Corporation Expanding knowledge graphs based on candidate missing edges to optimize hypothesis set adjudication
US10223639B2 (en) * 2017-06-22 2019-03-05 International Business Machines Corporation Relation extraction using co-training with distant supervision
CN110309310A (en) * 2018-02-12 2019-10-08 清华大学 Representation of knowledge learning method based on confidence level
US11544605B2 (en) * 2018-03-07 2023-01-03 International Business Machines Corporation Unit conversion in a synonym-sensitive framework for question answering
CN110069638B (en) * 2019-03-12 2021-01-05 北京航空航天大学 Knowledge graph combined representation learning method combining rules and paths
CN111737481B (en) * 2019-10-10 2024-03-01 北京沃东天骏信息技术有限公司 Method, device, equipment and storage medium for noise reduction of knowledge graph
CN111625659B (en) * 2020-08-03 2020-11-13 腾讯科技(深圳)有限公司 Knowledge graph processing method, device, server and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
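Topological analysis of knowledge maps; Jun Liu et al.; "Knowledge-Based Systems"; 20121231; full text *
Development and construction of knowledge graphs; Li Tao et al.; "Journal of Nanjing University of Science and Technology"; 20170309; full text *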

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant