CN116110594B - Knowledge evaluation method and system of medical knowledge graph based on associated literature - Google Patents

Info

Publication number
CN116110594B
Authority
CN
China
Prior art keywords
knowledge
medical
literature
document
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211541515.3A
Other languages
Chinese (zh)
Other versions
CN116110594A (en
Inventor
花睿
周雪忠
舒梓心
杨扩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN202211541515.3A priority Critical patent/CN116110594B/en
Publication of CN116110594A publication Critical patent/CN116110594A/en
Application granted granted Critical
Publication of CN116110594B publication Critical patent/CN116110594B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a knowledge evaluation method and system for a medical knowledge graph based on associated literature, belonging to the technical field of clinical medicine. The method acquires associated evidence information between medical entities and literature based on remote supervision; constructs associated evidence information between medical relations and literature based on a significance test; fuses remote supervision and the significance test to construct a knowledge-literature association knowledge graph and a literature-literature association graph from these two kinds of associated evidence information; and evaluates the reliability of the acquired medical knowledge based on the constructed graphs. By constructing a knowledge graph that contains literature evidence and evaluating knowledge reliability in combination with a knowledge-literature association network, the invention addresses the reliability evaluation problem of medical knowledge and provides more accurate knowledge information for clinical analysis.

Description

Knowledge evaluation method and system of medical knowledge graph based on associated literature
Technical Field
The invention relates to the technical field of clinical medicine, in particular to a knowledge evaluation method and a knowledge evaluation system of a medical knowledge graph based on associated literature.
Background
The medical knowledge graph (Knowledge Graph, KG) is a cornerstone of intelligent medicine and is expected to enable more efficient and accurate medical services. A knowledge graph connects fragmented and scattered medical information to support comprehensive auxiliary diagnosis and treatment, knowledge retrieval and question answering, and intelligent medical care. In recent years, with the wide application of big data and artificial intelligence in the medical field, research topics have diversified, and the construction of specialized disease knowledge bases drawing on electronic medical records, clinical pathways and the like has become a research hotspot. However, on the one hand, most medical knowledge graphs record only the type or origin of knowledge sources and lack more accurate and reliable evidence for the knowledge. On the other hand, existing knowledge graphs still contain a large amount of unreliable information, which poses great challenges for further knowledge reasoning and discovery. Therefore, a reliable method for constructing and evaluating knowledge graphs is needed.
The medical knowledge graph is a vertical-domain knowledge graph: a graph-based semantic network that represents relationships between biomedical entities and aims to improve the quality of search results and retrieval efficiency. However, owing to the individuality and complexity of medical research environments and of the human body, studies conducted without controlled conditions and environments produce a large number of false positive results. This is a normal and expected phenomenon, because biomedical knowledge is usually generated under specific conditions. It is therefore very important to identify, by means of associated literature, the environments and conditions under which existing medical knowledge holds. Such a method makes full use of document features such as authors, context, keywords and titles, establishes associations between medical knowledge and documents, and determines the meaning of the medical knowledge in different situations.
With the arrival of the era of big data and artificial intelligence, intelligent medicine centered on digitalization, informatization and intelligence has developed rapidly and been widely applied in the biomedical field, and a large amount of electronic medical record and scientific literature data has accumulated. How to store and effectively use large-scale medical data is a problem to be solved. Knowledge graphs are widely used as an effective way to store structured knowledge. Because medical knowledge graphs are constructed in many different ways, knowledge from different sources may have different degrees of reliability in different usage scenarios; moreover, because new medical knowledge is continually discovered, medical knowledge graphs are always being refined and updated. Existing medical knowledge graphs each have their own evaluation criteria, depending on their data integration methods, data sources, data analysis methods and so on. For example, FORUM uses existing semantic networks to infer new relationships and uses methods such as enrichment analysis to construct a weighted knowledge graph. The MalaCards disease database measures the reliability of disease-gene relationships through custom scoring criteria such as MIFTS, MSRS and MCRS. Most medical knowledge graphs are constructed by integrating databases from different sources, without considering the reliability of knowledge in different situations. Most existing medical knowledge graphs record only the source and origin of knowledge, but medical knowledge must be considered together with its context; the reliability of knowledge from different sources often differs, and direct integration can make the overall knowledge unreliable. In addition, no effective knowledge reliability evaluation method exists for existing knowledge graphs.
Hu Manman et al. propose a method and system for constructing a medical knowledge graph based on neural networks and remote supervision (also known as distant supervision). A medical text set and a medical entity set are acquired, named entity recognition and relation extraction models are trained by a remote supervision method, and a medical knowledge graph containing candidate entities and their relations is automatically extracted and constructed from large-scale unstructured data. However, because the medical text set is used as the corpus, the reliability of the corpus cannot be verified, and remotely labeled data containing a large amount of noise is fed directly into the model, so the expected effect is difficult to achieve.
Wang Yalin et al. propose a reliability assessment method for knowledge graph triples. Using the embedding vectors of a pre-trained model, the method constructs positive and negative samples by randomly replacing entities or relations, trains a binary classification neural network together with other graph information, and evaluates the reliability of triple knowledge. However, it considers only the information in the graph itself and has no associated external evidence information.
Disclosure of Invention
The invention aims to provide a knowledge evaluation method and a knowledge evaluation system of a medical knowledge graph based on related literature, which are used for solving at least one technical problem in the background technology.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In one aspect, the invention provides a knowledge evaluation method of a medical knowledge graph based on associated literature, which comprises the following steps:
acquiring associated evidence information of medical entities and literature based on remote supervision;
constructing associated evidence information of medical relations and literature based on a significance test;
fusing remote supervision and the significance test, and constructing a knowledge-literature association knowledge graph and a literature-literature association graph based on the associated evidence information of the medical entities and literature and the associated evidence information of the medical relations and literature;
and evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-literature association knowledge graph and literature-literature association graph.
Preferably, acquiring the associated evidence information of medical entities and literature based on remote supervision includes: collecting five types of entities, namely drugs, components, genes, diseases and symptoms, by integrating medical knowledge bases; mapping the entities into literature context sentences through remote labeling; for disease and gene entities, fine-tuning and testing a pre-trained model; and for drug, component and symptom entities, training with a remotely supervised named entity recognition method, thereby forming the associated evidence information of medical entities and literature.
Preferably, constructing the associated evidence information of medical relations and literature based on the significance test includes: acquiring all sentences containing related medical entities using the named entity recognition model; distinguishing positive and negative samples by judging whether the entities in a sentence appear in a triple of the knowledge graph; screening the data set by comparing the difference between positive and negative samples through a T test, to form association data of medical relations and literature; and establishing a remote supervision algorithm for the specific knowledge graph relation, training a remotely supervised relation extraction model, and extracting medical relation-literature association evidence information with the trained model.
Preferably, for the positive samples, the mean similarity between each positive sample and the remaining positive samples is calculated, the samples are sorted by this value, and the top K samples are taken as true positive examples. For each negative sample s, whether it is a true negative or a false negative example is determined as follows: first, the pairwise similarities among the K true positive examples are calculated to obtain the distribution P of the true positives; second, the similarities between the negative sample s and the K true positive examples are calculated to obtain the distribution N of the negative sample; a T test is then performed on the two distributions to obtain a p-value. If the p-value is smaller than a set threshold, the negative sample is considered a true negative example; otherwise, the means of the two distributions are compared, and if the mean of N is smaller than the mean of P, the negative sample is considered a true negative example.
Preferably, evaluating the reliability of the acquired medical knowledge includes: obtaining sentences containing a specific medical knowledge relation by the remote supervision method, and obtaining the related information of the documents through the PubMed ID associated with each sample, thereby forming a knowledge-document association network that contains two types of nodes (triple knowledge and documents) and two types of edges (knowledge-document association and document-document citation); and evaluating the reliability of the knowledge using the node features and edge features together with the multidimensional features of the documents.
Preferably, a specific piece of knowledge is evaluated from multiple perspectives: the remote supervision model, the knowledge-document association network, and the features of the related documents.
In a second aspect, the present invention provides a knowledge assessment system based on a medical knowledge graph of an associated document, including:
an acquisition module, used for acquiring associated evidence information of medical entities and literature based on remote supervision, and for constructing associated evidence information of medical relations and literature based on a significance test;
a construction module, used for fusing remote supervision and the significance test and constructing a knowledge-literature association knowledge graph and a literature-literature association graph based on the associated evidence information of the medical entities and literature and the associated evidence information of the medical relations and literature;
and an evaluation module, used for evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-literature association knowledge graph and literature-literature association graph.
In a third aspect, the present invention provides a non-transitory computer readable storage medium for storing computer instructions which, when executed by a processor, implement the knowledge evaluation method of a medical knowledge graph based on associated literature as described above.
In a fourth aspect, the present invention provides a computer program product comprising a computer program which, when run on one or more processors, implements the knowledge evaluation method of a medical knowledge graph based on associated literature as described above.
In a fifth aspect, the present invention provides an electronic device, comprising: a processor, a memory, and a computer program; wherein the processor is connected to the memory, the computer program is stored in the memory, and when the electronic device is running, the processor executes the computer program stored in the memory so that the electronic device executes instructions implementing the knowledge evaluation method of a medical knowledge graph based on associated literature as described above.
The beneficial effects of the invention are as follows: a method is provided for constructing a knowledge graph that associates medical entities, relations and literature by fusing remote supervision and a significance test; a knowledge graph containing literature evidence is constructed; knowledge reliability is evaluated in combination with a knowledge-literature association network; the reliability evaluation problem of medical knowledge is addressed; and more accurate knowledge information is provided for clinical analysis.
The advantages of additional aspects of the invention will be set forth in part in the description which follows, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a knowledge assessment method based on a medical knowledge graph of an associated document according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements throughout or elements having like or similar functionality. The embodiments described below by way of the drawings are exemplary only and should not be construed as limiting the invention.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or groups thereof.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, the different embodiments or examples described in this specification, and the features of the different embodiments or examples, may be combined by those skilled in the art without contradiction.
In order that the invention may be readily understood, a further description of the invention will be rendered by reference to specific embodiments that are illustrated in the appended drawings and are not to be construed as limiting embodiments of the invention.
It will be appreciated by those skilled in the art that the drawings are merely schematic representations of examples and that the elements of the drawings are not necessarily required to practice the invention.
Example 1
In this embodiment 1, a knowledge evaluation system of a medical knowledge graph based on associated literature is first provided, including:
an acquisition module, used for acquiring associated evidence information of medical entities and literature based on remote supervision, and for constructing associated evidence information of medical relations and literature based on a significance test;
a construction module, used for fusing remote supervision and the significance test and constructing a knowledge-literature association knowledge graph and a literature-literature association graph based on the associated evidence information of the medical entities and literature and the associated evidence information of the medical relations and literature;
and an evaluation module, used for evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-literature association knowledge graph and literature-literature association graph.
In this embodiment 1, a knowledge evaluation method of a medical knowledge graph based on associated literature is implemented using the system described above, including:
using the acquisition module, acquiring associated evidence information of medical entities and literature based on remote supervision, and constructing associated evidence information of medical relations and literature based on a significance test;
using the construction module, fusing remote supervision and the significance test, and constructing a knowledge-literature association knowledge graph and a literature-literature association graph based on the associated evidence information of the medical entities and literature and the associated evidence information of the medical relations and literature;
and using the evaluation module, evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-literature association knowledge graph and literature-literature association graph.
Acquiring the associated evidence information of medical entities and literature based on remote supervision includes: collecting five types of entities, namely drugs, components, genes, diseases and symptoms, by integrating medical knowledge bases; mapping the entities into literature context sentences through remote labeling; for disease and gene entities, fine-tuning and testing a pre-trained model; and for drug, component and symptom entities, training with a remotely supervised named entity recognition method, thereby forming the associated evidence information of medical entities and literature.
Constructing the associated evidence information of medical relations and literature based on the significance test includes: acquiring all sentences containing related medical entities using the named entity recognition model; distinguishing positive and negative samples by judging whether the entities in a sentence appear in a triple of the knowledge graph; screening the data set by comparing the difference between positive and negative samples through a T test, to form association data of medical relations and literature; and establishing a remote supervision algorithm for the specific knowledge graph relation, training a remotely supervised relation extraction model, and extracting medical relation-literature association evidence information with the trained model. For the positive samples, the mean similarity between each positive sample and the remaining positive samples is calculated, the samples are sorted by this value, and the top K samples are taken as true positive examples. For each negative sample s, whether it is a true negative or a false negative example is determined as follows: first, the pairwise similarities among the K true positive examples are calculated to obtain the distribution P of the true positives; second, the similarities between the negative sample s and the K true positive examples are calculated to obtain the distribution N of the negative sample; a T test is then performed on the two distributions to obtain a p-value. If the p-value is smaller than a set threshold, the negative sample is considered a true negative example; otherwise, the means of the two distributions are compared, and if the mean of N is smaller than the mean of P, the negative sample is considered a true negative example.
Evaluating the reliability of the acquired medical knowledge includes: obtaining all sentences containing a specific medical knowledge relation by the remote supervision method, and obtaining the related information of the documents through the PubMed ID associated with each sample, thereby forming a knowledge-document association network that contains two types of nodes (triple knowledge and documents) and two types of edges (knowledge-document association and document-document citation); and evaluating the reliability of the knowledge using the node features and edge features together with the multidimensional features of the documents.
A specific piece of knowledge is evaluated from multiple perspectives: the remote supervision model, the knowledge-document association network, and the features of the related documents.
Example 2
As shown in Fig. 1, this embodiment 2 provides a knowledge graph construction and reliability evaluation method that fuses literature evidence and remote supervision. Specifically, it includes a method for constructing a knowledge graph that associates medical entities, relations and literature by fusing remote supervision and a significance test, and a method for evaluating knowledge reliability with a knowledge-document association network.
In this embodiment 2, the knowledge graph associating medical entities, relations and literature is constructed first. The construction consists mainly of two parts, medical entity-document association and medical relation-document association. Finally, through knowledge integration, the knowledge graph associating medical entities, relations and literature is formed.
Constructing the medical entity-document association based on remote supervision includes the following. First, five types of entities, namely drugs, components, genes, diseases and symptoms, are collected by integrating existing medical knowledge bases (such as UMLS and DrugBank). The entities are then mapped into literature context sentences by remote labeling. On this basis, 1000 samples are randomly selected for manual labeling to form a standard test set. For disease and gene entities, the existing pre-trained models biomedical-disease-ner and biomedical-gene-ner are fine-tuned and tested. For drug, component and symptom entities, named entity recognition has been less studied, so an existing remotely supervised named entity recognition method (such as BOND) is used for training. Finally, large-scale entity-document association evidence information is formed.
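The remote labeling step above, which maps knowledge-base entities into literature sentences, can be illustrated with a minimal sketch. It assumes simple case-insensitive dictionary matching; the function name `annotate`, the dictionary layout and the example sentence are hypothetical and are not taken from the patent, whose actual labeling procedure may differ.

```python
import re


def annotate(sentence, entity_dict):
    """Return (start, end, surface text, entity type) for each dictionary hit.

    entity_dict maps an entity name to its type (drug, component, gene,
    disease or symptom). Matching is case-insensitive on word boundaries,
    which is an assumption of this sketch.
    """
    mentions = []
    for name, entity_type in entity_dict.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            mentions.append((match.start(), match.end(), match.group(0), entity_type))
    return sorted(mentions)  # ordered by position in the sentence


# Hypothetical dictionary entries and sentence, for illustration only.
entities = {"metformin": "drug", "type 2 diabetes": "disease"}
hits = annotate("Metformin is used to treat type 2 diabetes.", entities)
```

Labels produced this way are noisy, which is why the text evaluates the resulting recognizers against a manually labeled test set.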
Constructing the medical relation-document association based on remote supervision and the significance test includes the following. First, all sentences in PubMed abstracts that contain related entities are obtained using the named entity recognition model. To form association data between medical relations and documents, positive and negative samples are distinguished by checking whether the two entities in a sentence appear together in a triple of the knowledge graph (that is, if both entities appear in a triple of the knowledge graph, the sentence is labeled a positive sample; otherwise it is labeled a negative sample). The difference between positive and negative samples is then compared by a T test to screen a high-quality data set. For the positive samples, the mean similarity between each positive sample and the remaining positive samples is calculated, and the top K samples are taken as true positive examples, where n is the number of positive samples:
$S = \mathrm{TopK}\{\mathrm{Sim}(s_i, s_j) \mid 1 \le i \le j \le n\}$;
Here $\mathrm{Sim}(s_i, s_j)$ is the similarity between two sentences $s_i$ and $s_j$, for example the cosine similarity:
$\mathrm{Sim}(s_i, s_j) = \dfrac{\mathbf{v}_i \cdot \mathbf{v}_j}{\lVert \mathbf{v}_i \rVert \, \lVert \mathbf{v}_j \rVert}$,
where $\mathbf{v}_i$ is the vector representation of sentence $s_i$ and $\mathbf{v}_j$ is the vector representation of sentence $s_j$, both obtained from an existing pre-trained model in the biomedical field (e.g. PubMedBERT). For each negative sample s, whether it is a true negative or a false negative example is determined as follows. First, the pairwise similarities among the K true positive examples are calculated to obtain the distribution P of the true positives:
$P = \{\mathrm{Sim}(s_i, s_j) \mid 1 \le i \le j \le K,\ s_i \in S,\ s_j \in S\}$;
Second, the similarities between the negative sample s and the K true positive examples are calculated to obtain the distribution N of the negative sample:
$N = \{\mathrm{Sim}(s, s_i) \mid 1 \le i \le K,\ s_i \in S\}$;
A T test is performed on the two distributions to calculate the p-value:
$p = T(P, N)$;
If p is less than 0.05, the negative sample is considered a true negative example; otherwise, the means of the two distributions are compared, and if the mean of N is smaller than the mean of P, the negative sample is considered a true negative example.
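The screening rule above can be sketched in plain Python. Several choices here are assumptions rather than details from the patent: sentence embeddings are passed in as plain lists of floats, the "T test" is taken to be Welch's two-sample t-test, the distribution P is built from pairs with i < j (excluding trivial self-similarities), and K must be at least 3 so that both samples have a variance. The p-value is obtained by numerically integrating the t density; in practice a statistics library routine such as `scipy.stats.ttest_ind` would normally be used instead.

```python
import math
from itertools import combinations
from statistics import mean, variance


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def _t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def welch_p_value(a, b):
    """Two-sided p-value of Welch's t-test (assumed variant of the T test).

    Raises ZeroDivisionError if both samples have zero variance.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = abs(mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Simpson's rule for the integral of the t density over [0, t].
    steps = 2 * min(100_000, max(1_000, int(50 * t)))
    h = t / steps
    area = _t_pdf(0.0, df) + _t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * area * h / 3)


def select_true_positives(pos_vecs, k):
    """Rank positives by mean similarity to the other positives; keep the top k."""
    def mean_sim(i):
        return mean(cosine(pos_vecs[i], v) for j, v in enumerate(pos_vecs) if j != i)
    ranked = sorted(range(len(pos_vecs)), key=mean_sim, reverse=True)
    return [pos_vecs[i] for i in ranked[:k]]


def is_true_negative(neg_vec, true_pos, alpha=0.05):
    """Apply the rule from the text: p < alpha, else compare the two means."""
    P = [cosine(u, v) for u, v in combinations(true_pos, 2)]  # i < j (assumption)
    N = [cosine(neg_vec, v) for v in true_pos]
    if welch_p_value(P, N) < alpha:
        return True
    return mean(N) < mean(P)
```

A negative sample for which `is_true_negative` returns False resembles the true positives closely and is a candidate for removal from the negative set when screening the training data.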
After the training set is constructed through the above steps, 1000 samples are randomly selected and manually labeled to serve as a test set, finally yielding a relation extraction data set that covers multiple entity types and multiple relations. Remotely supervised relation extraction models such as PCNNs, CIL and BGWA are then trained on this data set to establish a remote supervision algorithm for the specific knowledge graph relations. Finally, the trained remotely supervised relation extraction method is used to extract medical relation-document association evidence information on a large scale.
In this embodiment 2, knowledge reliability is evaluated in combination with the knowledge-document association network. First, all sentences containing a specific medical knowledge relation are obtained by the remote supervision method. Since all sentences come from the PubMed literature database, information such as the context, bibliographic information, references, author institutions, publication date and citing documents of each document can be obtained through the PubMed ID associated with each sample. This forms a knowledge-document association network that contains two types of nodes (triple knowledge and documents) and two types of edges (knowledge-document association and document-document citation). In addition, knowledge reliability evaluation needs to take other features of the knowledge-related documents into account. Further document information is obtained from iCite, such as the document authors (authors), whether the document is a research article (is_research), the Relative Citation Ratio (RCR), the experimental or research subject (human), the clinical translation potential (APT), whether the document is clinical research (is_clinical), and the number of citations (citation_count). Finally, the reliability of the knowledge is evaluated using the node features of the complex network (node degree, centrality, etc.) and its edge features (edge weight, edge betweenness centrality, etc.), combined with the multidimensional features of the documents.
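One way to hold the knowledge-document association network in memory is sketched below, using only the standard library. The layout is an assumption: nodes are tagged tuples (`("K", triple)` for knowledge and `("D", pmid)` for documents), citation edges are treated as undirected, and the PubMed IDs in the example are placeholders rather than real records.

```python
from collections import defaultdict


def build_network(evidence, citations):
    """Build an undirected adjacency map for the knowledge-document network.

    evidence:  iterable of (triple, pmid) pairs, one per supporting sentence.
    citations: iterable of (citing_pmid, cited_pmid) pairs.
    """
    adjacency = defaultdict(set)
    node_type = {}
    for triple, pmid in evidence:  # knowledge-document edges
        k_node, d_node = ("K", triple), ("D", pmid)
        node_type[k_node], node_type[d_node] = "knowledge", "document"
        adjacency[k_node].add(d_node)
        adjacency[d_node].add(k_node)
    for citing, cited in citations:  # document-document citation edges
        a, b = ("D", citing), ("D", cited)
        node_type.setdefault(a, "document")
        node_type.setdefault(b, "document")
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency), node_type


def degree(adjacency, node):
    """Node degree, one of the node features mentioned in the text."""
    return len(adjacency[node])


# Placeholder triple and PubMed IDs, for illustration only.
triple = ("metformin", "treats", "type 2 diabetes")
adjacency, node_type = build_network(
    evidence=[(triple, "111"), (triple, "222")],
    citations=[("222", "111"), ("333", "111")],
)
```

Here `degree(adjacency, ("K", triple))` counts the documents that support the triple, while the degree of a document node also counts its citation links.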
For a particular piece of knowledge, the evaluation is performed from three angles: the remote supervision model (DS), the complex network (NET), and the related literature (LIT) features:
$$\mathrm{Score} = \mathrm{Score}_{DS} + \mathrm{Score}_{NET} + \mathrm{Score}_{LIT}$$
where $\mathrm{Score}_{DS}$ is the prediction score obtained by inputting the context of the knowledge into the remote supervision model trained above. $\mathrm{Score}_{NET}$ is the importance score of the knowledge in the network; the importance of a knowledge node $i$ in the network is expressed by its normalized betweenness centrality, which in its standard form for an undirected network of $N$ nodes is:
$$BC_i = \frac{1}{(N-1)(N-2)/2} \sum_{s \neq i \neq t} \frac{n_{st}^{i}}{g_{st}}$$
where $g_{st}$ is the number of shortest paths from node $s$ to node $t$, and $n_{st}^{i}$ is the number of those $g_{st}$ shortest paths that pass through the knowledge node $i$. $\mathrm{Score}_{LIT}$ is the literature feature score. First, review articles are excluded using the is_research field. Second, using the authors field, the article diversity (ArticleDiversity) of the knowledge is calculated over the author sets of its associated documents, where authors denotes the set of authors of a document and $i$ denotes the index of the knowledge-related document.
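A minimal sketch of how the network importance score could be computed is given below. It uses Brandes' algorithm for betweenness centrality on an unweighted, undirected graph and normalizes by the number of node pairs, (N-1)(N-2)/2. Both the choice of algorithm and the normalization constant are assumptions; the source names only "normalized" centrality and does not specify how it is computed.

```python
from collections import deque


def normalized_betweenness(adj):
    """Normalized betweenness centrality for an undirected, unweighted graph.

    adj: dict mapping each node to an iterable of its neighbors; every
    neighbor must itself be a key of `adj`.
    """
    nodes = list(adj)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s: count shortest paths (sigma)
        # and record predecessors on those paths.
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    pairs = (n - 1) * (n - 2) / 2
    # Each unordered pair (s, t) was counted twice, once from each end.
    return {v: (bc[v] / 2) / pairs if pairs > 0 else 0.0 for v in nodes}
```

On a three-node path a-b-c, the middle node lies on the only shortest path between the two end nodes, so it receives the maximum score of 1 and the end nodes receive 0.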
Since one piece of knowledge is related to multiple documents, the sum, mean and median of indexes such as RCR, APT and citation_count are calculated over those documents. These three statistics of each of the three indexes, together with ArticleDiversity, are used to construct a neural network binary classification model: the labeled dataset from step 1 is used as the training set, the indexes of each piece of knowledge are used as features, and whether the knowledge is reliable (0/1) is used as the label. Each piece of knowledge is then scored with the trained model to obtain $\mathrm{Score}_{LIT}$.
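The feature construction described above can be sketched as follows. The dictionary keys (`RCR`, `APT`, `citation_count`, `is_research`) follow the field names used in this text and are assumptions about the record layout, not the actual iCite field names. ArticleDiversity is passed in as a precomputed value, and the neural network classifier itself is not shown.

```python
from statistics import mean, median

# Index names as written in the text; the actual record keys are an assumption.
INDEXES = ("RCR", "APT", "citation_count")


def literature_features(docs, article_diversity):
    """Build the feature vector for one piece of knowledge.

    docs: list of dicts, one per associated document.
    Returns [sum, mean, median] for each index, followed by ArticleDiversity
    (10 values in total). Review articles (is_research false) are excluded.
    """
    research = [d for d in docs if d["is_research"]]
    if not research:
        raise ValueError("no research articles associated with this knowledge")
    features = []
    for key in INDEXES:
        values = [float(d[key]) for d in research]
        features += [sum(values), mean(values), median(values)]
    features.append(float(article_diversity))
    return features


def total_score(score_ds, score_net, score_lit):
    """Score = Score_DS + Score_NET + Score_LIT, as defined in the text."""
    return score_ds + score_net + score_lit
```

The resulting vector would be the input to the binary classifier, whose predicted reliability serves as the literature feature score in the sum above.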
Example 3
Embodiment 3 of the present invention provides an electronic device including a memory and a processor that communicate with each other. The memory stores program instructions executable by the processor, and the processor invokes the program instructions to execute a knowledge evaluation method for a medical knowledge graph based on associated literature, the method including the following steps:
acquiring associated evidence information of medical entities and documents based on remote supervision;
constructing associated evidence information of medical relations and documents based on the significance test;
fusing remote supervision and the significance test, and constructing a knowledge-document association knowledge graph and a document-document association graph based on the associated evidence information of medical entities and documents and the associated evidence information of medical relations and documents;
evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-document association knowledge graph and document-document association graph.
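The significance-test step listed above is detailed in claim 1: the K most representative positive samples are selected, and each negative sample is then screened with a T test. The sketch below is one possible reading of that procedure, not the patented implementation. Several points are assumptions: cosine similarity over sentence embedding vectors as the similarity measure, descending order when sorting, Welch's t statistic with a normal approximation for the p-value (to stay within the standard library), the threshold 0.05, and the interpretation of the screening outcome as "true negative example". At least three selected positives are needed so that both distributions have a variance.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def top_k_true_positives(pos_vecs, k):
    """Keep the K positives with the highest mean similarity to the others."""
    scored = []
    for i, u in enumerate(pos_vecs):
        others = [cosine(u, v) for j, v in enumerate(pos_vecs) if j != i]
        scored.append((mean(others), i))
    scored.sort(reverse=True)
    return [pos_vecs[i] for _, i in scored[:k]]


def welch_p_value(x, y):
    """Two-sided p-value for Welch's t statistic (normal approximation)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        return 1.0 if mean(x) == mean(y) else 0.0
    t = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(t)))


def is_true_negative(neg_vec, true_pos, alpha=0.05):
    """Screen one negative sample against the K selected positives."""
    # Distribution P: pairwise similarities inside the selected positives.
    p_dist = [cosine(true_pos[i], true_pos[j])
              for i in range(len(true_pos))
              for j in range(i + 1, len(true_pos))]
    # Distribution N: similarities between the negative and each positive.
    n_dist = [cosine(neg_vec, v) for v in true_pos]
    if welch_p_value(p_dist, n_dist) < alpha:
        return True
    # Otherwise fall back to comparing the means of the two distributions.
    return mean(n_dist) < mean(p_dist)
```

Negative samples for which `is_true_negative` returns False resemble the positives closely and would be candidates for removal or relabeling when the dataset is screened.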
Example 4
Embodiment 4 of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a knowledge evaluation method for a medical knowledge graph based on associated literature, the method including the following steps:
acquiring associated evidence information of medical entities and documents based on remote supervision;
constructing associated evidence information of medical relations and documents based on the significance test;
fusing remote supervision and the significance test, and constructing a knowledge-document association knowledge graph and a document-document association graph based on the associated evidence information of medical entities and documents and the associated evidence information of medical relations and documents;
evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-document association knowledge graph and document-document association graph.
Example 5
Embodiment 5 of the present invention provides a computer device including a memory and a processor that communicate with each other. The memory stores program instructions executable by the processor, and the processor invokes the program instructions to execute a knowledge evaluation method for a medical knowledge graph based on associated literature, the method including the following steps:
acquiring associated evidence information of medical entities and documents based on remote supervision;
constructing associated evidence information of medical relations and documents based on the significance test;
fusing remote supervision and the significance test, and constructing a knowledge-document association knowledge graph and a document-document association graph based on the associated evidence information of medical entities and documents and the associated evidence information of medical relations and documents;
evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-document association knowledge graph and document-document association graph.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the embodiments of the present invention have been described above with reference to the drawings, this description is not intended to limit the scope of the invention. Various changes and modifications that a person skilled in the art could make without inventive effort on the basis of the technical solutions disclosed herein fall within the scope of the invention.

Claims (6)

1. A knowledge evaluation method for a medical knowledge graph based on associated literature, comprising:
acquiring associated evidence information of medical entities and documents based on remote supervision, including: collecting five types of entities (medicines, components, genes, diseases and symptoms) through integrated medical knowledge bases; mapping the entities into literature context sentences through remote labeling; for disease and gene entities, performing fine-tuning and testing in combination with a pre-trained model, and for medicine, component and symptom entities, training with a remote supervision named entity recognition method, to form the associated evidence information of medical entities and documents;
constructing associated evidence information of medical relations and documents based on the significance test, including: acquiring all sentences containing related medical entities based on the named entity recognition model; distinguishing positive and negative samples by judging whether the entities in a sentence exist in triples of the knowledge graph; screening the dataset by comparing the difference between positive and negative samples through a T test, to form association data of medical relations and documents; establishing a remote supervision algorithm for specific knowledge graph relations, training a remote supervision relation extraction model, and extracting medical relation-literature association evidence information using the trained remote supervision relation extraction model; for the positive samples, calculating the mean similarity between each positive sample and the remaining positive samples, sorting, and taking the first K samples as true positive examples; for each negative sample, judging whether it is a true negative example or a false negative example: first, calculating the internal similarity of the K true positive examples to obtain the distribution P of the true positive examples; second, calculating the similarity between the negative sample s and the K true positive examples to obtain the distribution N of the negative sample; performing a T test on the two distributions and calculating a p value; if the p value is smaller than the set threshold, the negative sample is considered a true negative example; otherwise, the means of the two distributions are further compared, and if the mean of N is smaller than the mean of P, the negative sample is considered a true negative example;
fusing remote supervision and the significance test, and constructing a knowledge-document association knowledge graph and a document-document association graph based on the associated evidence information of medical entities and documents and the associated evidence information of medical relations and documents;
evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-document association knowledge graph and document-document association graph, including: obtaining sentences containing specific medical knowledge relations using the remote supervision method, and obtaining the relevant information of the documents through the PubMed ID associated with each sample, thereby forming a knowledge-document association network that includes two types of nodes (triplet knowledge and documents) and two types of edge relations (knowledge-document association and document citation); and evaluating the reliability of the knowledge using node features and edge features in combination with the multidimensional features of the documents.
2. The knowledge evaluation method for a medical knowledge graph based on associated literature according to claim 1, wherein, for a specific piece of knowledge, the evaluation is performed from multiple angles: the remote supervision model, the knowledge-document association network, and the features of the related literature.
3. A knowledge evaluation system for a medical knowledge graph based on associated literature, comprising:
an acquisition module, configured to acquire associated evidence information of medical entities and documents based on remote supervision, including: collecting five types of entities (medicines, components, genes, diseases and symptoms) through integrated medical knowledge bases; mapping the entities into literature context sentences through remote labeling; for disease and gene entities, performing fine-tuning and testing in combination with a pre-trained model, and for medicine, component and symptom entities, training with a remote supervision named entity recognition method, to form the associated evidence information of medical entities and documents; and configured to construct associated evidence information of medical relations and documents based on the significance test, including: acquiring all sentences containing related medical entities based on the named entity recognition model; distinguishing positive and negative samples by judging whether the entities in a sentence exist in triples of the knowledge graph; screening the dataset by comparing the difference between positive and negative samples through a T test, to form association data of medical relations and documents; establishing a remote supervision algorithm for specific knowledge graph relations, training a remote supervision relation extraction model, and extracting medical relation-literature association evidence information using the trained remote supervision relation extraction model; for the positive samples, calculating the mean similarity between each positive sample and the remaining positive samples, sorting, and taking the first K samples as true positive examples; for each negative sample, judging whether it is a true negative example or a false negative example: first, calculating the internal similarity of the K true positive examples to obtain the distribution P of the true positive examples; second, calculating the similarity between the negative sample s and the K true positive examples to obtain the distribution N of the negative sample; performing a T test on the two distributions and calculating a p value; if the p value is smaller than the set threshold, the negative sample is considered a true negative example; otherwise, the means of the two distributions are further compared, and if the mean of N is smaller than the mean of P, the negative sample is considered a true negative example;
a building module, configured to fuse remote supervision and the significance test, and to construct a knowledge-document association knowledge graph and a document-document association graph based on the associated evidence information of medical entities and documents and the associated evidence information of medical relations and documents;
an evaluation module, configured to evaluate the reliability of the acquired medical knowledge based on the constructed knowledge-document association knowledge graph and document-document association graph, including: obtaining sentences containing specific medical knowledge relations using the remote supervision method, and obtaining the relevant information of the documents through the PubMed ID associated with each sample, thereby forming a knowledge-document association network that includes two types of nodes (triplet knowledge and documents) and two types of edge relations (knowledge-document association and document citation); and evaluating the reliability of the knowledge using node features and edge features in combination with the multidimensional features of the documents.
4. A non-transitory computer-readable storage medium storing computer instructions which, when executed by a processor, implement the knowledge evaluation method for a medical knowledge graph based on associated literature as claimed in claim 1 or 2.
5. A computer program product comprising a computer program which, when run on one or more processors, implements the knowledge evaluation method for a medical knowledge graph based on associated literature as claimed in claim 1 or 2.
6. An electronic device, comprising: a processor, a memory, and a computer program; wherein the processor is connected to the memory, the computer program is stored in the memory, and when the electronic device is running, the processor executes the computer program stored in the memory, so that the electronic device executes instructions implementing the knowledge evaluation method for a medical knowledge graph based on associated literature as claimed in claim 1 or 2.
CN202211541515.3A 2022-12-02 2022-12-02 Knowledge evaluation method and system of medical knowledge graph based on associated literature Active CN116110594B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211541515.3A CN116110594B (en) 2022-12-02 2022-12-02 Knowledge evaluation method and system of medical knowledge graph based on associated literature

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211541515.3A CN116110594B (en) 2022-12-02 2022-12-02 Knowledge evaluation method and system of medical knowledge graph based on associated literature

Publications (2)

Publication Number Publication Date
CN116110594A CN116110594A (en) 2023-05-12
CN116110594B true CN116110594B (en) 2024-05-07

Family

ID=86255135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211541515.3A Active CN116110594B (en) 2022-12-02 2022-12-02 Knowledge evaluation method and system of medical knowledge graph based on associated literature

Country Status (1)

Country Link
CN (1) CN116110594B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256828A (en) * 2020-10-20 2021-01-22 平安科技(深圳)有限公司 Medical entity relationship extraction method and device, computer equipment and readable storage medium
CN112417166A (en) * 2020-11-20 2021-02-26 山东省计算中心(国家超级计算济南中心) Knowledge graph triple confidence evaluation method
CN112885478A (en) * 2021-01-28 2021-06-01 平安科技(深圳)有限公司 Medical document retrieval method, medical document retrieval device, electronic device, and storage medium
CN113505244A (en) * 2021-09-10 2021-10-15 中国人民解放军总医院 Knowledge graph construction method, system, equipment and medium based on deep learning
CN113707297A (en) * 2021-08-26 2021-11-26 平安国际智慧城市科技股份有限公司 Medical data processing method, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200151560A1 (en) * 2018-11-09 2020-05-14 Silot Pte. Ltd. Apparatus and methods for using bayesian program learning for efficient and reliable generation of knowledge graph data structures

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256828A (en) * 2020-10-20 2021-01-22 平安科技(深圳)有限公司 Medical entity relationship extraction method and device, computer equipment and readable storage medium
WO2021151353A1 (en) * 2020-10-20 2021-08-05 平安科技(深圳)有限公司 Medical entity relationship extraction method and apparatus, and computer device and readable storage medium
CN112417166A (en) * 2020-11-20 2021-02-26 山东省计算中心(国家超级计算济南中心) Knowledge graph triple confidence evaluation method
CN112885478A (en) * 2021-01-28 2021-06-01 平安科技(深圳)有限公司 Medical document retrieval method, medical document retrieval device, electronic device, and storage medium
CN113707297A (en) * 2021-08-26 2021-11-26 平安国际智慧城市科技股份有限公司 Medical data processing method, device, equipment and storage medium
CN113505244A (en) * 2021-09-10 2021-10-15 中国人民解放军总医院 Knowledge graph construction method, system, equipment and medium based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Predicting Treatment Relations with Semantic Patterns over Biomedical Knowledge Graphs;Bakal;《2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)》;20151231;第9468卷;586-596 *
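Research on entity relation extraction from Chinese medical literature and its application in diabetes medical literature; 范智渊; 《生物医学工程学杂志》 (Journal of Biomedical Engineering); 20211231; Vol. 38 (No. 03); 563-573 *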

Also Published As

Publication number Publication date
CN116110594A (en) 2023-05-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant