CN116110594B - Knowledge evaluation method and system of medical knowledge graph based on associated literature - Google Patents
- Publication number
- CN116110594B CN202211541515.3A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a knowledge evaluation method and system for a medical knowledge graph based on associated literature, belonging to the technical field of clinical medicine. The method comprises: acquiring associated evidence information of medical entities and documents based on remote supervision; constructing associated evidence information of medical relations and documents based on a significance test; fusing remote supervision and the significance test to construct a knowledge-document association knowledge graph and a document-document association graph from these two kinds of evidence information; and evaluating the reliability of the acquired medical knowledge based on the constructed graphs. By constructing a knowledge graph that contains literature evidence and evaluating knowledge reliability with a knowledge-document association network, the invention addresses the problem of evaluating the reliability of medical knowledge and provides more accurate knowledge information for clinical analysis.
Description
Technical Field
The invention relates to the technical field of clinical medicine, and in particular to a knowledge evaluation method and system for a medical knowledge graph based on associated literature.
Background
The medical knowledge graph (KG) is a cornerstone of intelligent medicine and is expected to bring more efficient and accurate medical services. A knowledge graph connects fragmented, scattered medical information to support comprehensive assisted diagnosis and treatment, knowledge retrieval and question answering, and other intelligent medical applications. In recent years, with the wide application of big data and artificial intelligence in the medical field, research topics have become increasingly diverse, and the construction of specialized disease knowledge bases drawing on electronic medical records, clinical pathways, and the like has become a research hotspot. However, on the one hand, most medical knowledge graphs record only the type or origin of a knowledge source and lack more precise and reliable evidence for each piece of knowledge. On the other hand, existing knowledge graphs still contain a large amount of unreliable information, which poses great challenges for further knowledge reasoning and discovery. A method for constructing and evaluating reliable knowledge graphs is therefore needed.
The medical knowledge graph is a vertical-domain knowledge graph: a graph-based semantic network that represents the relationships between biomedical entities and aims to improve the quality of search results and the efficiency of retrieval. However, because medical research settings and the human body are individual and complex, studies carried out without controlled conditions and environments produce a large number of false-positive findings. This is a normal and expected phenomenon, since biomedical knowledge is usually generated under specific conditions. It is therefore important to identify, through the associated literature, the environments and conditions under which existing medical knowledge holds. The method makes full use of document features such as authors, context, keywords, and titles to establish associations between medical knowledge and documents and to determine the meaning of medical knowledge in different contexts.
With the arrival of the era of big data and artificial intelligence, intelligent medicine centered on digitization, informatization, and intelligence has developed rapidly and been widely applied in the biomedical field, and a large amount of electronic medical record and scientific literature data has accumulated. How to store and effectively use large-scale medical data is a problem to be solved. Knowledge graphs are widely used as an effective way to store structured knowledge. Because medical knowledge graphs are constructed in many different ways, knowledge from different sources can have different degrees of reliability in different usage scenarios; in addition, because new medical knowledge is continually discovered, medical knowledge graphs are always being refined and updated. Existing medical knowledge graphs each have their own evaluation criteria, depending on their data integration methods, data sources, data analysis methods, and so on. For example, FORUM uses existing semantic networks to infer new relations and uses methods such as enrichment analysis to construct a weighted knowledge graph. The MalaCards disease database measures the reliability of disease-gene relations with custom scoring criteria such as MIFTS, MSRS, and MCRS. Most medical knowledge graphs are built by integrating databases from different sources without considering how reliable the knowledge is in different contexts. Most existing medical knowledge graphs record only the source of the knowledge, but medical knowledge must be interpreted together with its context; knowledge from different sources often differs in reliability, and direct integration can make the overall knowledge unreliable. Moreover, there is currently no effective method for evaluating the reliability of knowledge in existing knowledge graphs.
Hu Manman et al. proposed a method and system for constructing a medical knowledge graph based on neural networks and remote supervision. A medical text set and a medical entity set are acquired, named entity recognition and relation extraction models are trained with remote supervision, and a medical knowledge graph containing candidate entities and their relations is extracted automatically from large-scale unstructured data. However, the reliability of the medical text set used as the corpus cannot be verified, and remotely labeled data containing a large amount of noise is fed directly into the model, so the expected effect is difficult to achieve.
Wang Yalin et al. proposed a reliability assessment method for knowledge graph triples. Positive and negative samples are constructed by randomly replacing entities or relations, and a binary classification neural network is trained on embedding vectors from a pre-trained model together with other graph information to evaluate the reliability of triple knowledge. However, the method considers only information from the graph itself and does not associate any external evidence.
Disclosure of Invention
The invention aims to provide a knowledge evaluation method and system for a medical knowledge graph based on associated literature, so as to solve at least one of the technical problems described in the background above.
To achieve the above purpose, the invention adopts the following technical solutions.
In a first aspect, the invention provides a knowledge evaluation method for a medical knowledge graph based on associated literature, comprising:
acquiring associated evidence information of medical entities and documents based on remote supervision;
constructing associated evidence information of medical relations and documents based on a significance test;
fusing remote supervision and the significance test, and constructing a knowledge-document association knowledge graph and a document-document association graph based on the associated evidence information of medical entities and documents and the associated evidence information of medical relations and documents; and
evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-document association knowledge graph and document-document association graph.
Preferably, acquiring the associated evidence information of medical entities and documents based on remote supervision includes: collecting five types of entities (drugs, components, genes, diseases, and symptoms) by integrating medical knowledge bases; mapping the entities into context sentences of the literature through remote labeling; for disease and gene entities, fine-tuning and testing pre-trained models; and for drug, component, and symptom entities, training with a remote supervision named entity recognition method, thereby forming the associated evidence information of medical entities and documents.
Preferably, constructing the associated evidence information of medical relations and documents based on the significance test includes: obtaining all sentences that contain related medical entities with the named entity recognition model; distinguishing positive and negative samples by checking whether the entities in a sentence appear in a triple of the knowledge graph; screening the data set by comparing the difference between positive and negative samples with a t-test to form association data of medical relations and documents; and establishing a remote supervision algorithm for the specific knowledge graph relations, training a remote supervision relation extraction model, and using the trained model to extract medical relation-document association evidence information.
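The positive/negative labeling step can be illustrated with a minimal Python sketch. It assumes that each sentence already carries the entity names recognized in it, that the knowledge graph is available as (head, relation, tail) triples, and that the direction of a triple is ignored when matching an entity pair. The function name `label_sentences` and the sample data are hypothetical and are not part of the described method.

```python
from itertools import combinations


def label_sentences(sentences, kg_triples):
    """Label a sentence positive if any pair of its entities is linked in the KG.

    sentences: list of (text, [entity names found by NER]) pairs.
    kg_triples: iterable of (head, relation, tail) tuples.
    """
    # Assumption: pair order is ignored, so each linked pair is stored as a frozenset.
    linked = {frozenset((head, tail)) for head, _relation, tail in kg_triples}
    labeled = []
    for text, entities in sentences:
        pairs = combinations(sorted(set(entities)), 2)
        is_positive = any(frozenset(pair) in linked for pair in pairs)
        labeled.append((text, "positive" if is_positive else "negative"))
    return labeled


# Hypothetical toy data for illustration only.
kg = [("aspirin", "treats", "fever")]
corpus = [
    ("Aspirin lowered fever in the cohort.", ["aspirin", "fever"]),
    ("Aspirin was given to patients with asthma.", ["aspirin", "asthma"]),
]
for text, label in label_sentences(corpus, kg):
    print(label, "|", text)
```

Labels produced this way are noisy, which is why the samples are subsequently screened with the t-test.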
Preferably, for the positive samples, the mean similarity between each positive sample and the remaining positive samples is calculated and ranked, and the top K samples are taken as true positives. For each negative sample, it is judged whether the sample is a true negative or a false negative: first, the pairwise similarities among the K true positives are calculated to obtain the true-positive distribution P; second, the similarities between the negative sample s and the K true positives are calculated to obtain the negative-sample distribution N; a t-test is then performed on the two distributions to obtain a p-value. If the p-value is less than a set threshold, the negative sample is considered a true negative; otherwise, the means of the two distributions are compared, and if the mean of N is less than the mean of P, the negative sample is considered a true negative.
Preferably, evaluating the reliability of the acquired medical knowledge includes: obtaining sentences that contain a specific medical knowledge relation with the remote supervision method, and obtaining the related information of the documents through the PubMed ID associated with each sample, thereby forming a knowledge-document association network that contains two types of nodes (knowledge triples and documents) and two types of edges (knowledge-document associations and document-document citations); and evaluating the reliability of the knowledge by using the node features and edge features of the network in combination with multidimensional features of the documents.
Preferably, a specific piece of knowledge is evaluated from multiple angles: the remote supervision model, the knowledge-document association network, and the features of the related documents.
In a second aspect, the invention provides a knowledge evaluation system for a medical knowledge graph based on associated literature, comprising:
an acquisition module, configured to acquire associated evidence information of medical entities and documents based on remote supervision, and to construct associated evidence information of medical relations and documents based on a significance test;
a construction module, configured to fuse remote supervision and the significance test and to construct a knowledge-document association knowledge graph and a document-document association graph based on the associated evidence information of medical entities and documents and the associated evidence information of medical relations and documents; and
an evaluation module, configured to evaluate the reliability of the acquired medical knowledge based on the constructed knowledge-document association knowledge graph and document-document association graph.
In a third aspect, the invention provides a non-transitory computer-readable storage medium storing computer instructions which, when executed by a processor, implement the knowledge evaluation method for a medical knowledge graph based on associated literature described above.
In a fourth aspect, the invention provides a computer program product comprising a computer program which, when run on one or more processors, implements the knowledge evaluation method for a medical knowledge graph based on associated literature described above.
In a fifth aspect, the invention provides an electronic device comprising a processor, a memory, and a computer program, wherein the processor is connected to the memory and the computer program is stored in the memory; when the electronic device runs, the processor executes the computer program stored in the memory, so that the electronic device carries out the knowledge evaluation method for a medical knowledge graph based on associated literature described above.
The beneficial effects of the invention are as follows: a method is provided for constructing a knowledge graph that associates medical entities, relations, and documents by fusing remote supervision and a significance test; a knowledge graph containing literature evidence is constructed; knowledge reliability is evaluated with a knowledge-document association network; the problem of evaluating the reliability of medical knowledge is addressed; and more accurate knowledge information is provided for clinical analysis.
The advantages of additional aspects of the invention will be set forth in part in the description which follows, or may be learned by practice of the invention.
Drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of the knowledge evaluation method for a medical knowledge graph based on associated literature according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements throughout or elements having like or similar functionality. The embodiments described below by way of the drawings are exemplary only and should not be construed as limiting the invention.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or groups thereof.
In this specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art may combine the different embodiments or examples described in this specification, and their features, provided they do not contradict one another.
In order that the invention may be readily understood, a further description of the invention will be rendered by reference to specific embodiments that are illustrated in the appended drawings and are not to be construed as limiting embodiments of the invention.
It will be appreciated by those skilled in the art that the drawings are merely schematic representations of examples and that the elements of the drawings are not necessarily required to practice the invention.
Example 1
In Example 1, a knowledge evaluation system for a medical knowledge graph based on associated literature is first provided, comprising:
an acquisition module, configured to acquire associated evidence information of medical entities and documents based on remote supervision, and to construct associated evidence information of medical relations and documents based on a significance test;
a construction module, configured to fuse remote supervision and the significance test and to construct a knowledge-document association knowledge graph and a document-document association graph based on the associated evidence information of medical entities and documents and the associated evidence information of medical relations and documents; and
an evaluation module, configured to evaluate the reliability of the acquired medical knowledge based on the constructed knowledge-document association knowledge graph and document-document association graph.
In Example 1, the knowledge evaluation method for a medical knowledge graph based on associated literature is implemented with the above system and comprises:
using the acquisition module to acquire associated evidence information of medical entities and documents based on remote supervision, and to construct associated evidence information of medical relations and documents based on a significance test;
using the construction module to fuse remote supervision and the significance test and to construct a knowledge-document association knowledge graph and a document-document association graph based on the associated evidence information of medical entities and documents and the associated evidence information of medical relations and documents; and
using the evaluation module to evaluate the reliability of the acquired medical knowledge based on the constructed knowledge-document association knowledge graph and document-document association graph.
Acquiring the associated evidence information of medical entities and documents based on remote supervision includes: collecting five types of entities (drugs, components, genes, diseases, and symptoms) by integrating medical knowledge bases; mapping the entities into context sentences of the literature through remote labeling; for disease and gene entities, fine-tuning and testing pre-trained models; and for drug, component, and symptom entities, training with a remote supervision named entity recognition method, thereby forming the associated evidence information of medical entities and documents.
Constructing the associated evidence information of medical relations and documents based on the significance test includes: obtaining all sentences that contain related medical entities with the named entity recognition model; distinguishing positive and negative samples by checking whether the entities in a sentence appear in a triple of the knowledge graph; screening the data set by comparing the difference between positive and negative samples with a t-test to form association data of medical relations and documents; and establishing a remote supervision algorithm for the specific knowledge graph relations, training a remote supervision relation extraction model, and using the trained model to extract medical relation-document association evidence information. For the positive samples, the mean similarity between each positive sample and the remaining positive samples is calculated and ranked, and the top K samples are taken as true positives. For each negative sample, it is judged whether the sample is a true negative or a false negative: first, the pairwise similarities among the K true positives are calculated to obtain the true-positive distribution P; second, the similarities between the negative sample s and the K true positives are calculated to obtain the negative-sample distribution N; a t-test is then performed on the two distributions to obtain a p-value. If the p-value is less than a set threshold, the negative sample is considered a true negative; otherwise, the means of the two distributions are compared, and if the mean of N is less than the mean of P, the negative sample is considered a true negative.
Evaluating the reliability of the acquired medical knowledge includes: obtaining all sentences that contain a specific medical knowledge relation with the remote supervision method, and obtaining the related information of the documents through the PubMed ID associated with each sample, thereby forming a knowledge-document association network that contains two types of nodes (knowledge triples and documents) and two types of edges (knowledge-document associations and document-document citations); and evaluating the reliability of the knowledge by using the node features and edge features of the network in combination with multidimensional features of the documents.
A specific piece of knowledge is evaluated from multiple angles: the remote supervision model, the knowledge-document association network, and the features of the related documents.
Example 2
As shown in Fig. 1, Example 2 provides a method for constructing a knowledge graph and evaluating its reliability by fusing literature evidence and remote supervision. It comprises two parts: constructing a knowledge graph that associates medical entities, relations, and documents by fusing remote supervision and a significance test, and evaluating knowledge reliability with a knowledge-document association network.
In Example 2, the knowledge graph associating medical entities, relations, and documents is constructed first by fusing remote supervision and the significance test. The construction consists mainly of two parts, medical entity-document association and medical relation-document association. Finally, through knowledge integration, a knowledge graph associating medical entities, relations, and documents is formed.
Constructing the medical entity-document association based on remote supervision proceeds as follows. First, five types of entities (drugs, components, genes, diseases, and symptoms) are collected by integrating existing medical knowledge bases (such as UMLS and DrugBank). The entities are then mapped into context sentences of the literature by remote labeling. From the labeled data, 1000 samples are randomly selected and manually labeled to form a standard test set. For disease and gene entities, the existing pre-trained models biomedical-disease-ner and biomedical-gene-ner are fine-tuned and tested. For drug, component, and symptom entities, named entity recognition has been studied less, so an existing remote supervision named entity recognition method (such as BOND) is used for training. Finally, large-scale entity-document association evidence information is formed.
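The remote labeling step, in which knowledge-base entities are mapped into literature sentences, can be sketched as simple dictionary matching. This is a minimal illustration under stated assumptions: the lexicon is a hypothetical stand-in for entity names that would be exported from knowledge bases such as UMLS or DrugBank, matching is case-insensitive on whole words, and the function name `remote_annotate` is invented for this sketch. The labels it produces are noisy and serve only as training signal for the named entity recognition models.

```python
import re


def remote_annotate(sentence, lexicon):
    """Return (start, end, entity_type, surface_text) spans found by dictionary lookup.

    lexicon: dict mapping an entity name to its type, e.g. {"aspirin": "drug"}.
    """
    spans = []
    for name, entity_type in lexicon.items():
        # Whole-word, case-insensitive match; re.escape guards special characters.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), entity_type, match.group(0)))
    return sorted(spans)


# Hypothetical lexicon and sentence for illustration only.
lexicon = {"aspirin": "drug", "fever": "symptom"}
print(remote_annotate("Aspirin reduces fever in patients.", lexicon))
```

Plain string matching cannot resolve ambiguous or nested mentions, which is one reason a learned recognizer is trained on top of these labels rather than using them directly.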
Constructing a medical relationship-literature association based on remote supervision and significance testing includes: first, all sentences containing related entities in the PubMed abstract are obtained based on a named entity recognition model. In order to form association data of the medical relation and the document, whether two entities exist in the triplet of the knowledge graph or not is judged, so that positive and negative samples are distinguished (namely, if the two entities exist in the triplet of the knowledge graph, the sentence is marked as the positive sample, and vice versa). The difference between positive and negative samples is compared through T test, and a high-quality data set is screened. For positive samples, the mean value of the similarity of each positive sample and the rest of positive samples is calculated, and the first K samples are taken as true examples. The formula is as follows:
S=Topk{Sim(si,sj)1≤i≤j≤n};
The similarity between two sentences s i、sj, such as cosine similarity, is calculated using Sim (s i,sj) as follows: wherein/> Vector representation representing sentence s i,/>The vector representation representing sentence s j is obtained by using an existing pre-trained model in the biomedical field (e.g. PubMedBERT). For each negative sample, judging whether the negative sample is a true example or false example, firstly calculating the internal similarity of K real examples to obtain the distribution P of the real examples: p= { Sim (S i,sj)|1≤i≤j≤K,si∈S,sj e S }).
Second, the similarity between the negative sample $s$ and each of the K true positive examples is calculated to obtain the distribution N of the negative sample:
$N = \{\mathrm{Sim}(s, s_i) \mid 1 \le i \le K,\ s_i \in S\}$;
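The selection of the K positive examples and the two similarity distributions can be sketched as follows. This is a minimal illustration under stated assumptions: sentence vectors are plain lists of floats (in practice they would come from a model such as PubMedBERT), self-pairs are left out of P even though the formula is written with $i \le j$, and all function names are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_positives(vectors, k):
    """Indices of the k positives with the highest mean similarity to the rest."""
    n = len(vectors)
    mean_sim = []
    for i in range(n):
        sims = [cosine(vectors[i], vectors[j]) for j in range(n) if j != i]
        mean_sim.append(sum(sims) / len(sims))
    ranked = sorted(range(n), key=lambda i: mean_sim[i], reverse=True)
    return ranked[:k]

def distribution_p(vectors, selected):
    """Pairwise similarities inside the selected positives (assumption: i < j only)."""
    return [cosine(vectors[i], vectors[j]) for i, j in combinations(selected, 2)]

def distribution_n(negative_vector, vectors, selected):
    """Similarities between one negative sample and each selected positive."""
    return [cosine(negative_vector, vectors[i]) for i in selected]

# Toy 2-dimensional vectors, for illustration only.
positives = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
selected = top_k_positives(positives, k=3)
P = distribution_p(positives, selected)
N = distribution_n([0.1, 0.9], positives, selected)
```

In this toy input the fourth vector points in a different direction from the others, so it has the lowest mean similarity and is not selected.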
A t-test is carried out on the two distributions, and the p value is calculated: $p = T(P, N)$;
If $p$ is less than 0.05, the negative sample is considered to be a true negative example. Otherwise, the means of the two distributions are compared, and if the mean of N is less than the mean of P, the negative sample is also considered to be a true negative example.
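The decision rule above can be sketched with the standard library alone. The description does not say which variant of the t-test is used, so this sketch assumes Welch's unequal-variance test and obtains the two-sided p value by numerically integrating the Student t density. The function names are hypothetical, and each distribution needs at least two values and a non-zero combined variance.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p value via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)

def welch_t_test(P, N):
    """Welch's t statistic and two-sided p value (assumed variant of the t-test)."""
    vp, vn = variance(P) / len(P), variance(N) / len(N)
    t = (mean(P) - mean(N)) / math.sqrt(vp + vn)
    df = (vp + vn) ** 2 / (vp ** 2 / (len(P) - 1) + vn ** 2 / (len(N) - 1))
    return t, two_sided_p(t, df)

def is_true_negative(P, N, alpha=0.05):
    """Apply the rule: significant difference, or else a lower mean similarity."""
    _, p = welch_t_test(P, N)
    if p < alpha:
        return True
    return mean(N) < mean(P)

clearly_different = is_true_negative([0.90, 0.92, 0.88, 0.91, 0.90, 0.89], [0.20, 0.25, 0.22])
looks_positive = is_true_negative([0.90, 0.80, 0.85, 0.95], [0.91, 0.86, 0.89, 0.93])
```

Where SciPy is available, `scipy.stats.ttest_ind` can replace the hand-written p value computation.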
After the training set is constructed through the above steps, 1000 samples are randomly selected and manually labeled as a test set, finally yielding a relation extraction data set that covers multiple entity types and multiple relations. Remote supervision relation extraction models such as PCNNs, CIL and BGWA are then trained on this data set, establishing a remote supervision algorithm for specific knowledge graph relations. Finally, the trained remote supervision relation extraction model is used to extract medical relation-literature association evidence information on a large scale.
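The initial positive and negative labeling that feeds this training set can be sketched as a lookup of each sentence's entity pair in the knowledge graph triples. The data layout (tuples of text and two entity names) and the function name are hypothetical. The sketch also assumes the pair is matched regardless of order, which the description does not specify.

```python
def label_sentences(sentences, triples):
    """Label each (text, entity_a, entity_b) sentence as positive (1) or negative (0).

    A sentence is positive when its two entities appear together in some
    (head, relation, tail) triple of the knowledge graph.
    """
    # Assumption: head/tail order is ignored when matching entity pairs.
    kg_pairs = {frozenset((head, tail)) for head, _relation, tail in triples}
    labeled = []
    for text, entity_a, entity_b in sentences:
        label = 1 if frozenset((entity_a, entity_b)) in kg_pairs else 0
        labeled.append((text, entity_a, entity_b, label))
    return labeled

# Toy data, for illustration only.
triples = [("metformin", "treats", "type 2 diabetes")]
sentences = [
    ("Metformin is used to treat type 2 diabetes.", "metformin", "type 2 diabetes"),
    ("Metformin was compared with aspirin.", "metformin", "aspirin"),
]
labeled = label_sentences(sentences, triples)
```

Labels produced this way are noisy, which is what the similarity ranking and t-test screening are meant to correct.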
In this Embodiment 2, knowledge reliability evaluation is performed in combination with a knowledge-literature association network. First, all sentences containing a specific medical knowledge relation are obtained by using the remote supervision method. Since all sentences are derived from the PubMed literature database, the context, bibliographic information, references, author affiliations, publication date, cited documents and other information of each document can be obtained through the PubMed ID associated with each sample. This forms a knowledge-literature association network that contains two types of nodes (triple knowledge and documents) and two types of relations (knowledge-document and document-cites-document). In addition, knowledge reliability evaluation needs to take other features of the knowledge-related documents into account, so further information about the documents is obtained from iCite, such as the document authors (authors), whether the document is a research article (is_research), the Relative Citation Ratio (RCR), the experimental or research subjects (human), the clinical translation potential (Approximate Potential to Translate, APT), whether the document is a clinical study (is_clinical), and the number of citations (citation_count). Finally, the reliability of the knowledge is evaluated by using the node features (node degree, centrality, etc.) and edge features (edge weight, edge betweenness centrality, etc.) of the complex network, combined with the multidimensional features of the literature.
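The two node types and two relation types of the network can be sketched with a plain adjacency structure. The input layout (a mapping from triples to PubMed IDs plus a list of citation pairs) and the function name are hypothetical. Citation edges are stored as undirected here, which is a simplifying assumption.

```python
from collections import defaultdict

def build_network(knowledge_docs, citations):
    """Build an undirected adjacency map for the knowledge-literature network.

    knowledge_docs: {triple: [pmid, ...]} gives knowledge-document edges.
    citations: [(citing_pmid, cited_pmid), ...] gives document-document edges.
    Nodes are tagged tuples so the two node types never collide.
    """
    adjacency = defaultdict(set)
    for triple, pmids in knowledge_docs.items():
        knowledge_node = ("knowledge", triple)
        adjacency[knowledge_node]  # register the node even if it has no documents
        for pmid in pmids:
            document_node = ("document", pmid)
            adjacency[knowledge_node].add(document_node)
            adjacency[document_node].add(knowledge_node)
    for citing, cited in citations:
        a, b = ("document", citing), ("document", cited)
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)

# Toy identifiers, for illustration only (not real PubMed IDs).
knowledge_docs = {("metformin", "treats", "type 2 diabetes"): ["111", "222"]}
citations = [("222", "111"), ("333", "111")]
network = build_network(knowledge_docs, citations)
degree = {node: len(neighbors) for node, neighbors in network.items()}
```

Node degree, one of the node features mentioned above, then falls out directly as the size of each neighbor set.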
For a particular piece of knowledge, the evaluation is performed from multiple angles: the remote supervision model (DS), the complex network (NET) and the related literature (LIT) features:
$\mathrm{Score} = \mathrm{Score}_{DS} + \mathrm{Score}_{NET} + \mathrm{Score}_{LIT}$;
Here $\mathrm{Score}_{DS}$ is the prediction score obtained by inputting the context of the knowledge into the remote supervision model built above. $\mathrm{Score}_{NET}$ is the importance score of the knowledge in the network. The importance of a knowledge node $i$ in the network is expressed by its normalized betweenness centrality, which sums the ratio $g_{st}^{i} / g_{st}$ over pairs of nodes $s$ and $t$ and scales the result by a normalization factor, where $g_{st}$ is the number of shortest paths from node $s$ to node $t$, and $g_{st}^{i}$ is the number of those $g_{st}$ shortest paths that pass through knowledge node $i$.
$\mathrm{Score}_{LIT}$ is the literature feature score. First, review articles are excluded through the is_research field. Second, the authors field is used to calculate the article diversity (ArticleDiversity) of the knowledge, where authors denotes the set of authors of a document and $i$ denotes the index of the knowledge-related document.
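The network importance score relies on counting shortest paths through a node, which can be sketched with Brandes' algorithm for unweighted, undirected graphs. This is an illustrative sketch rather than the patent's implementation. The normalization by the number of node pairs excluding the node itself, $(n-1)(n-2)/2$, is an assumption, since the exact factor is not given in the text. The input must list every node as a key with a symmetric neighbor list.

```python
from collections import deque

def normalized_betweenness(adjacency):
    """Normalized betweenness centrality for an unweighted, undirected graph."""
    centrality = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {v: [] for v in adjacency}
        path_count = dict.fromkeys(adjacency, 0)
        path_count[source] = 1
        distance = dict.fromkeys(adjacency, -1)
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to the source.
        dependency = dict.fromkeys(adjacency, 0.0)
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    n = len(adjacency)
    # Each unordered pair was counted twice; divide by 2 and by (n-1)(n-2)/2.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: value * scale for v, value in centrality.items()}

# Toy graphs: a three-node path and a star with three leaves.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
path_scores = normalized_betweenness(path)
star_scores = normalized_betweenness(star)
```

In both toy graphs every shortest path between the outer nodes runs through the middle node, so that node receives the maximum score and the outer nodes receive zero.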
Since one piece of knowledge is related to a plurality of documents, the sum, mean and median of indexes such as RCR, APT and citation_count are calculated over those documents. The three statistics of each of the three indexes, together with ArticleDiversity, are used to build a neural network binary classification model: the labeled data set from step 1 serves as the training set, the indexes of each piece of knowledge serve as features, and whether the knowledge is reliable (0/1) serves as the label. The trained model is then used to predict each piece of knowledge, yielding $\mathrm{Score}_{LIT}$.
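The feature vector fed to the classifier can be sketched as follows. The record layout (dictionaries keyed by `rcr`, `apt`, `citation_count` and `is_research`) and the function names are hypothetical. ArticleDiversity is passed in as a precomputed value because its formula is not reproduced here, and the classifier itself is left out.

```python
from statistics import mean, median

INDEXES = ("rcr", "apt", "citation_count")

def literature_features(documents, article_diversity):
    """Sum, mean and median of each index over research articles, plus diversity.

    Review articles (is_research is False) are excluded first, as described.
    Returns a list of 10 numbers: 3 statistics x 3 indexes + ArticleDiversity.
    """
    research = [doc for doc in documents if doc["is_research"]]
    features = []
    for name in INDEXES:
        values = [doc[name] for doc in research]
        features.extend([sum(values), mean(values), median(values)])
    features.append(article_diversity)
    return features

def total_score(score_ds, score_net, score_lit):
    """Combine the three partial scores by simple addition."""
    return score_ds + score_net + score_lit

# Toy records, for illustration only.
documents = [
    {"rcr": 1.0, "apt": 0.5, "citation_count": 10, "is_research": True},
    {"rcr": 3.0, "apt": 0.25, "citation_count": 30, "is_research": True},
    {"rcr": 9.0, "apt": 0.9, "citation_count": 500, "is_research": False},
]
features = literature_features(documents, article_diversity=0.8)
```

Any binary classifier that outputs a probability could consume this vector. The description specifies a neural network trained on the labeled data set.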
Example 3
Embodiment 3 of the present invention provides an electronic device including a memory and a processor, where the processor and the memory communicate with each other, the memory stores program instructions executable by the processor, and the processor invokes the program instructions to execute a knowledge evaluation method of a medical knowledge graph based on associated literature, the method including the following steps:
Acquiring associated evidence information of a medical entity and a literature based on remote supervision;
based on the significance test, constructing associated evidence information of the medical relation and the literature;
integrating remote supervision and significance test, and constructing a knowledge-associated document knowledge graph and a document-document associated graph based on the associated evidence information of the medical entity and the document and the associated evidence information of the medical relationship and the document;
And evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-associated literature knowledge graph and the literature-associated literature graph.
Example 4
Embodiment 4 of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a knowledge evaluation method of a medical knowledge graph based on associated literature, the method including the following steps:
Acquiring associated evidence information of a medical entity and a literature based on remote supervision;
based on the significance test, constructing associated evidence information of the medical relation and the literature;
integrating remote supervision and significance test, and constructing a knowledge-associated document knowledge graph and a document-document associated graph based on the associated evidence information of the medical entity and the document and the associated evidence information of the medical relationship and the document;
And evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-associated literature knowledge graph and the literature-associated literature graph.
Example 5
Embodiment 5 of the present invention provides a computer device including a memory and a processor, where the processor and the memory communicate with each other, the memory stores program instructions executable by the processor, and the processor invokes the program instructions to execute a knowledge evaluation method of a medical knowledge graph based on associated literature, the method including the following steps:
Acquiring associated evidence information of a medical entity and a literature based on remote supervision;
based on the significance test, constructing associated evidence information of the medical relation and the literature;
integrating remote supervision and significance test, and constructing a knowledge-associated document knowledge graph and a document-document associated graph based on the associated evidence information of the medical entity and the document and the associated evidence information of the medical relationship and the document;
And evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-associated literature knowledge graph and the literature-associated literature graph.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although specific embodiments of the present invention have been described above with reference to the drawings, they are not intended to limit the scope of the invention. It should be understood that various changes and modifications that a person skilled in the art could make on the basis of the technical solution of the invention, without inventive effort, still fall within the scope of the invention.
Claims (6)
1. A knowledge evaluation method of a medical knowledge graph based on associated literature, comprising:
Based on remote supervision, acquiring associated evidence information of medical entities and literature, including: collecting five types of entities, namely drugs, components, genes, diseases and symptoms, by integrating medical knowledge bases; mapping the entities into literature context sentences by remote annotation; for disease and gene entities, performing fine-tuning and testing in combination with a pre-trained model, and for drug, component and symptom entities, training with a remote supervision named entity recognition method, to form the associated evidence information of medical entities and literature;
Based on the significance test, constructing associated evidence information of medical relations and literature, including: acquiring all sentences containing related medical entities based on the named entity recognition model; distinguishing positive and negative samples by judging whether the entities in a sentence exist in a triple of the knowledge graph; screening the data set by comparing the difference between positive and negative samples through a t-test, to form association data of medical relations and literature; establishing a remote supervision algorithm for a specific knowledge graph relation, training a remote supervision relation extraction model, and extracting medical relation-literature association evidence information by using the trained remote supervision relation extraction model; for positive samples, calculating the mean similarity between each positive sample and the remaining positive samples, sorting, and taking the top K samples as true positive examples; for each negative sample, judging whether the negative sample is a true negative example by first calculating the internal similarity of the K true positive examples to obtain the distribution P of the true positive examples, then calculating the similarity between the negative sample s and the K true positive examples to obtain the distribution N of the negative sample, and performing a t-test on the two distributions to obtain a p value; if the p value is smaller than a set threshold, the negative sample is considered to be a true negative example; otherwise, the means of the two distributions are compared, and if the mean of N is smaller than the mean of P, the negative sample is considered to be a true negative example;
Integrating remote supervision and the significance test, and constructing a knowledge-associated literature knowledge graph and a literature-literature association graph based on the associated evidence information of medical entities and literature and the associated evidence information of medical relations and literature;
Based on the constructed knowledge-associated literature knowledge graph and literature-literature association graph, evaluating the reliability of the acquired medical knowledge, including: obtaining sentences containing a specific medical knowledge relation by using the remote supervision method; obtaining related information of the documents through the PubMed ID associated with each sample, thereby forming a knowledge-literature association network that includes two types of nodes, namely triple knowledge and documents, and two types of edge relations, namely knowledge-document and document-cites-document; and evaluating the reliability of the knowledge by using node features and edge relation features in combination with the multidimensional features of the documents.
2. The knowledge evaluation method of a medical knowledge graph based on associated literature according to claim 1, wherein, for a specific piece of knowledge, the evaluation is performed from multiple angles: the remote supervision model, the knowledge-literature association network and the related literature features.
3. A knowledge evaluation system of a medical knowledge graph based on associated literature, comprising:
the acquisition module, which is used for acquiring associated evidence information of medical entities and literature based on remote supervision, including: collecting five types of entities, namely drugs, components, genes, diseases and symptoms, by integrating medical knowledge bases; mapping the entities into literature context sentences by remote annotation; for disease and gene entities, performing fine-tuning and testing in combination with a pre-trained model, and for drug, component and symptom entities, training with a remote supervision named entity recognition method, to form the associated evidence information of medical entities and literature; and for constructing associated evidence information of medical relations and literature based on the significance test, including: acquiring all sentences containing related medical entities based on the named entity recognition model; distinguishing positive and negative samples by judging whether the entities in a sentence exist in a triple of the knowledge graph; screening the data set by comparing the difference between positive and negative samples through a t-test, to form association data of medical relations and literature; establishing a remote supervision algorithm for a specific knowledge graph relation, training a remote supervision relation extraction model, and extracting medical relation-literature association evidence information by using the trained remote supervision relation extraction model; for positive samples, calculating the mean similarity between each positive sample and the remaining positive samples, sorting, and taking the top K samples as true positive examples; for each negative sample, judging whether the negative sample is a true negative example by first calculating the internal similarity of the K true positive examples to obtain the distribution P of the true positive examples, then calculating the similarity between the negative sample s and the K true positive examples to obtain the distribution N of the negative sample, and performing a t-test on the two distributions to obtain a p value; if the p value is smaller than a set threshold, the negative sample is considered to be a true negative example; otherwise, the means of the two distributions are compared, and if the mean of N is smaller than the mean of P, the negative sample is considered to be a true negative example;
the building module, which is used for integrating remote supervision and the significance test, and constructing a knowledge-associated literature knowledge graph and a literature-literature association graph based on the associated evidence information of medical entities and literature and the associated evidence information of medical relations and literature;
the evaluation module, which is used for evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-associated literature knowledge graph and literature-literature association graph, including: obtaining sentences containing a specific medical knowledge relation by using the remote supervision method; obtaining related information of the documents through the PubMed ID associated with each sample, thereby forming a knowledge-literature association network that includes two types of nodes, namely triple knowledge and documents, and two types of edge relations, namely knowledge-document and document-cites-document; and evaluating the reliability of the knowledge by using node features and edge relation features in combination with the multidimensional features of the documents.
4. A non-transitory computer-readable storage medium storing computer instructions which, when executed by a processor, implement the knowledge evaluation method of a medical knowledge graph based on associated literature as claimed in claim 1 or 2.
5. A computer program product comprising a computer program which, when run on one or more processors, implements the knowledge evaluation method of a medical knowledge graph based on associated literature as claimed in claim 1 or 2.
6. An electronic device, comprising: a processor, a memory, and a computer program; wherein the processor is connected to the memory, the computer program is stored in the memory, and when the electronic device is running, the processor executes the computer program stored in the memory, so that the electronic device executes instructions for implementing the knowledge evaluation method of a medical knowledge graph based on associated literature as claimed in claim 1 or 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211541515.3A CN116110594B (en) | 2022-12-02 | 2022-12-02 | Knowledge evaluation method and system of medical knowledge graph based on associated literature |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116110594A (en) | 2023-05-12 |
CN116110594B (en) | 2024-05-07 |
Family
ID=86255135
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211541515.3A Active CN116110594B (en) | 2022-12-02 | 2022-12-02 | Knowledge evaluation method and system of medical knowledge graph based on associated literature |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116110594B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||