CN116110594B

CN116110594B - Knowledge evaluation method and system of medical knowledge graph based on associated literature

Info

Publication number: CN116110594B
Application number: CN202211541515.3A
Authority: CN
Inventors: 花睿; 周雪忠; 舒梓心; 杨扩
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2022-12-02
Filing date: 2022-12-02
Publication date: 2024-05-07
Anticipated expiration: 2042-12-02
Also published as: CN116110594A

Abstract

The invention provides a knowledge evaluation method and system for a medical knowledge graph based on associated literature, belonging to the technical field of clinical medicine. The method comprises: acquiring associated evidence information of medical entities and literature based on remote supervision; constructing associated evidence information of medical relations and literature based on a significance test; fusing remote supervision and the significance test, and constructing a knowledge-document association knowledge graph and a document-document association graph based on the entity-literature evidence and the relation-literature evidence; and evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-document association knowledge graph and document-document association graph. The invention constructs a knowledge graph containing literature evidence and evaluates knowledge reliability in combination with a knowledge-literature association network, thereby addressing the problem of evaluating the reliability of medical knowledge and providing more accurate knowledge information for clinical analysis.

Description

Knowledge evaluation method and system of medical knowledge graph based on associated literature

Technical Field

The invention relates to the technical field of clinical medicine, in particular to a knowledge evaluation method and a knowledge evaluation system of a medical knowledge graph based on associated literature.

Background

The medical knowledge graph (Knowledge Graph, KG) is a cornerstone of intelligent healthcare and is expected to bring more efficient and accurate medical services. A knowledge graph connects fragmented and scattered medical information to support comprehensive assisted diagnosis and treatment, knowledge retrieval and question answering, and intelligent healthcare. In recent years, with the wide application of big data and artificial intelligence in the medical field, research topics have gradually diversified, and the construction of specialized disease knowledge bases, including those built from electronic medical records and clinical pathways, has become a research hotspot. However, on the one hand, most medical knowledge graphs record only the type or origin of their knowledge sources and lack more precise and reliable evidence information. On the other hand, existing knowledge graphs still contain a large amount of unreliable information, which poses great challenges for further knowledge reasoning and discovery. Therefore, a method for constructing and evaluating a reliable knowledge graph is needed.

A medical knowledge graph is a vertical-domain knowledge graph: a graph-based semantic network that represents the relationships between biomedical entities and aims to improve the quality of search results and retrieval efficiency. However, owing to the individuality and complexity of medical research environments and of the human body, research conducted under uncontrolled conditions and environments yields a large number of false-positive results. This is a normal and expected phenomenon, because biomedical knowledge is often generated under specific conditions. It is therefore very important to identify, through associated literature, the environments and conditions under which existing medical knowledge holds. Such a method makes full use of document features such as authors, context, keywords and titles, establishes associations between medical knowledge and documents, and determines the meaning of the medical knowledge in different contexts.

With the arrival of the era of big data and artificial intelligence, intelligent healthcare centered on digitalization, informatization and intelligence has developed rapidly and been widely applied in the biomedical field, and a large amount of electronic medical record and scientific literature data has accumulated. How to store and effectively use large-scale medical data is a problem to be solved. Knowledge graphs are widely used as an effective way to store structured knowledge. Because medical knowledge graphs are constructed in many different ways, knowledge from different sources has different degrees of reliability in different usage scenarios; meanwhile, because new medical knowledge is continually discovered, medical knowledge graphs are always being refined and updated. Existing medical knowledge graphs each have their own evaluation criteria, depending on their data integration methods, data sources, data analysis methods and so on. For example, FORUM uses existing semantic networks to infer new relationships and uses methods such as enrichment analysis to construct a weighted knowledge graph. The MalaCards disease database measures the reliability of disease-gene relationships with custom scoring criteria such as MIFTS, MSRS and MCRS. Most medical knowledge graphs are constructed by integrating databases from different sources, without considering the reliability of knowledge in different contexts. Most existing medical knowledge graphs record only the source and provenance of knowledge, but medical knowledge must be interpreted together with its context; the reliability of knowledge from different sources often differs, and direct integration can make the overall knowledge unreliable. In addition, no effective knowledge reliability evaluation method exists for existing knowledge graphs.

Hu Manman et al. propose a method and system for constructing a medical knowledge graph based on neural networks and remote supervision. The method acquires a medical text set and a medical entity set, trains named entity recognition and relation extraction models using remote supervision, and automatically extracts candidate entities and their relations from large-scale unstructured data to construct a medical knowledge graph. However, because the medical text set is used directly as the corpus, the reliability of the corpus cannot be verified, and remotely labeled data containing a large amount of noise is fed directly into the model, so the expected effect is difficult to achieve.

Wang Yalin et al. propose a reliability assessment method for knowledge graph triples. Using embedding vectors from a pre-trained model, the method constructs positive and negative samples by randomly replacing entities or relations, combines them with other graph information to train a binary-classification neural network, and evaluates the reliability of triple knowledge. However, it considers only the information of the graph itself and has no associated external evidence information.

Disclosure of Invention

The invention aims to provide a knowledge evaluation method and a knowledge evaluation system of a medical knowledge graph based on related literature, which are used for solving at least one technical problem in the background technology.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

In one aspect, the invention provides a knowledge assessment method of a medical knowledge graph based on associated literature, comprising the following steps:

acquiring associated evidence information of medical entities and literature based on remote supervision;

constructing associated evidence information of medical relations and literature based on a significance test;

fusing remote supervision and the significance test, and constructing a knowledge-document association knowledge graph and a document-document association graph based on the associated evidence information of the medical entities and literature and the associated evidence information of the medical relations and literature;

and evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-document association knowledge graph and document-document association graph.

Preferably, acquiring the associated evidence information of medical entities and literature based on remote supervision includes: collecting five types of entities (drugs, components, genes, diseases and symptoms) by integrating medical knowledge bases; mapping the entities into literature context sentences through remote annotation; for disease and gene entities, performing fine-tuning and testing with pre-trained models; and for drug, component and symptom entities, training with a remote-supervision named entity recognition method, thereby forming the associated evidence information of medical entities and literature.

Preferably, constructing the associated evidence information of medical relations and literature based on the significance test includes: acquiring all sentences containing the relevant medical entities with the named entity recognition model; distinguishing positive and negative samples by judging whether the entities in a sentence appear in a triple of the knowledge graph; screening the data set by comparing the difference between positive and negative samples through a T test, to form association data of medical relations and literature; and establishing a remote supervision algorithm for the specific knowledge graph relations, training a remote supervision relation extraction model, and extracting medical relation-literature association evidence information with the trained model.

Preferably, for the positive samples, the mean similarity between each positive sample and the remaining positive samples is calculated and sorted, and the top K samples are taken as true positive examples. For each negative sample, whether it is a true negative example or a false negative example is judged as follows: first, the pairwise similarity among the K true positive examples is calculated to obtain the distribution P of the true positive examples; second, the similarity between the negative sample s and the K true positive examples is calculated to obtain the distribution N of the negative sample; a T test is performed on the two distributions to obtain a p-value; if the p-value is smaller than the set threshold, the negative sample is considered a true negative example; otherwise the means of the two distributions are compared, and if the mean of N is smaller than the mean of P, the negative sample is considered a true negative example.

Preferably, evaluating the reliability of the acquired medical knowledge includes: obtaining sentences containing a specific medical knowledge relation by the remote supervision method; obtaining the relevant information of the documents through the PubMed ID associated with each sample, thereby forming a knowledge-document association network that contains two types of nodes (triple knowledge and documents) and two types of edges (knowledge-document and document-cites-document); and evaluating the reliability of the knowledge using the node features and edge features in combination with the multidimensional features of the documents.

Preferably, the evaluation is performed from multiple angles of remote supervision model, knowledge-document correlation network, and relevant document features for a specific piece of knowledge.

In a second aspect, the present invention provides a knowledge assessment system based on a medical knowledge graph of an associated document, including:

the acquisition module is used for acquiring the associated evidence information of the medical entity and the literature based on remote supervision; based on the significance test, constructing associated evidence information of the medical relation and the literature;

the building module is used for fusing remote supervision and significance test and building knowledge-associated document knowledge graph and document-document associated graph based on the associated evidence information of the medical entity and the document and the associated evidence information of the medical relation and the document;

And the evaluation module is used for evaluating the acquired medical knowledge reliability based on the constructed knowledge-associated literature knowledge graph and the literature-associated literature graph.

In a third aspect, the present invention provides a non-transitory computer-readable storage medium for storing computer instructions which, when executed by a processor, implement the knowledge assessment method of a medical knowledge graph based on associated literature as described above.

In a fourth aspect, the present invention provides a computer program product comprising a computer program which, when run on one or more processors, implements the knowledge assessment method of a medical knowledge graph based on associated literature as described above.

In a fifth aspect, the present invention provides an electronic device, comprising: a processor, a memory, and a computer program; wherein the processor is connected to the memory, and the computer program is stored in the memory, and when the electronic device is running, the processor executes the computer program stored in the memory, so that the electronic device executes the instructions for implementing the knowledge assessment method based on the medical knowledge graph of the associated document as described above.

The beneficial effects of the invention are as follows: a method is provided for constructing a knowledge graph that associates medical entities, relations and literature by fusing remote supervision and significance testing; a knowledge graph containing literature evidence is constructed; knowledge reliability is evaluated in combination with a knowledge-literature association network; the problem of evaluating the reliability of medical knowledge is addressed; and more accurate knowledge information is provided for clinical analysis.

The advantages of additional aspects of the invention will be set forth in part in the description which follows, or may be learned by practice of the invention.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a knowledge assessment method based on a medical knowledge graph of an associated document according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements throughout or elements having like or similar functionality. The embodiments described below by way of the drawings are exemplary only and should not be construed as limiting the invention.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or groups thereof.

In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, the different embodiments or examples described in this specification, and the features of the different embodiments or examples, may be combined by those skilled in the art provided they do not contradict one another.

In order that the invention may be readily understood, a further description of the invention will be rendered by reference to specific embodiments that are illustrated in the appended drawings and are not to be construed as limiting embodiments of the invention.

It will be appreciated by those skilled in the art that the drawings are merely schematic representations of examples and that the elements of the drawings are not necessarily required to practice the invention.

Example 1

In this embodiment 1, a knowledge assessment system of a medical knowledge graph based on associated literature is first provided, comprising an acquisition module, a construction module and an evaluation module.

In this embodiment 1, a knowledge evaluation method of a medical knowledge graph based on a related document is implemented by using the system described above, including:

acquiring associated evidence information of a medical entity and a literature based on remote supervision and constructing associated evidence information of a medical relationship and the literature based on significance test by utilizing an acquisition module;

utilizing a construction module, fusing remote supervision and significance test, and constructing a knowledge-associated document knowledge graph and a document-document associated graph based on the associated evidence information of the medical entity and the document and the associated evidence information of the medical relation and the document;

And evaluating the acquired medical knowledge reliability based on the constructed knowledge-associated literature knowledge graph and the literature-associated literature graph by using an evaluation module.

Acquiring the associated evidence information of medical entities and literature based on remote supervision comprises: collecting five types of entities (drugs, components, genes, diseases and symptoms) by integrating medical knowledge bases; mapping the entities into literature context sentences through remote annotation; for disease and gene entities, performing fine-tuning and testing with pre-trained models; and for drug, component and symptom entities, training with a remote-supervision named entity recognition method, thereby forming the associated evidence information of medical entities and literature.

Constructing the associated evidence information of medical relations and literature based on the significance test comprises: acquiring all sentences containing the relevant medical entities with the named entity recognition model; distinguishing positive and negative samples by judging whether the entities in a sentence appear in a triple of the knowledge graph; screening the data set by comparing the difference between positive and negative samples through a T test, to form association data of medical relations and literature; and establishing a remote supervision algorithm for the specific knowledge graph relations, training a remote supervision relation extraction model, and extracting medical relation-literature association evidence information with the trained model. For the positive samples, the mean similarity between each positive sample and the remaining positive samples is calculated and sorted, and the top K samples are taken as true positive examples. For each negative sample, whether it is a true negative example or a false negative example is judged as follows: first, the pairwise similarity among the K true positive examples is calculated to obtain the distribution P of the true positive examples; second, the similarity between the negative sample s and the K true positive examples is calculated to obtain the distribution N of the negative sample; a T test is performed on the two distributions to obtain a p-value; if the p-value is smaller than the set threshold, the negative sample is considered a true negative example; otherwise the means of the two distributions are compared, and if the mean of N is smaller than the mean of P, the negative sample is considered a true negative example.

Evaluating the reliability of the acquired medical knowledge comprises: obtaining all sentences containing a specific medical knowledge relation by the remote supervision method; obtaining the relevant information of the documents through the PubMed ID associated with each sample, thereby forming a knowledge-document association network that contains two types of nodes (triple knowledge and documents) and two types of edges (knowledge-document and document-cites-document); and evaluating the reliability of the knowledge using the node features and edge features in combination with the multidimensional features of the documents.

For a particular piece of knowledge, the evaluation is performed from multiple angles of a remote supervision model, a knowledge-document correlation network and related document features.

Example 2

As shown in Fig. 1, this embodiment 2 provides a method for constructing a knowledge graph and evaluating reliability by fusing literature evidence and remote supervision. Specifically, it comprises constructing a knowledge graph that associates medical entities, relations and literature by fusing remote supervision and significance testing, and evaluating knowledge reliability in combination with a knowledge-document association network.

In this embodiment 2, a knowledge graph associating medical entities, relations and literature is first constructed by fusing remote supervision and significance testing. The construction mainly comprises two parts: medical entity-document association and medical relation-document association. Finally, through knowledge integration, a knowledge graph associating medical entities, relations and literature is formed.

Constructing the medical entity-document association based on remote supervision comprises the following. First, five types of entities (drugs, components, genes, diseases and symptoms) are collected by integrating existing medical knowledge bases (such as UMLS and DrugBank). The entities are then mapped into literature context sentences by remote annotation. On this basis, 1000 samples are randomly selected for manual labeling to form a standard test set. For disease and gene entities, fine-tuning and testing are performed with the existing pre-trained models biomedical-disease-ner and biomedical-gene-ner. For drug, component and symptom entities, named entity recognition has been less studied, so an existing remote-supervision named entity recognition method (such as BOND) is used for training. Finally, large-scale entity-document associated evidence information is formed.
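The remote annotation step above can be illustrated with a minimal dictionary-matching sketch. It is only an assumption about how entity names might be mapped onto sentences; the dictionary contents, the `annotate` function and the longest-match rule are hypothetical and are not taken from the patent.

```python
import re

# Hypothetical mini-dictionary; a real one would be compiled from knowledge
# bases such as UMLS or DrugBank.
ENTITY_DICT = {
    "aspirin": "drug",
    "headache": "symptom",
    "tp53": "gene",
    "breast cancer": "disease",
}

def annotate(sentence, entity_dict=ENTITY_DICT):
    """Return (start, end, surface, type) spans for dictionary terms in the sentence."""
    spans = []
    # Try longer terms first so multi-word names win over their substrings.
    for term in sorted(entity_dict, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            # Skip matches overlapping a span already claimed by a longer term.
            if any(m.start() < end and start < m.end() for start, end, _, _ in spans):
                continue
            spans.append((m.start(), m.end(), m.group(0), entity_dict[term]))
    return sorted(spans)

sentence = "Aspirin relieved headache in patients with breast cancer."
for span in annotate(sentence):
    print(span)
```

Labels produced this way are noisy, which is why the text evaluates the recognizers on a manually labeled test set of 1000 samples.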

Constructing the medical relation-document association based on remote supervision and significance testing comprises the following. First, all sentences in PubMed abstracts that contain the relevant entities are obtained with the named entity recognition model. To form association data between medical relations and documents, positive and negative samples are distinguished by judging whether the two entities in a sentence appear together in a triple of the knowledge graph: if they do, the sentence is marked as a positive sample; otherwise it is marked as a negative sample. The difference between positive and negative samples is then compared through a T test to screen a high-quality data set. For the positive samples, the mean similarity between each positive sample and the remaining positive samples is calculated and sorted, and the top K samples are taken as true positive examples. The formula is as follows:

$$S = \mathrm{Top}_K\{\mathrm{Sim}(s_i, s_j) \mid 1 \le i \le j \le n\};$$

The similarity between two sentences $s_i$ and $s_j$ is calculated using $\mathrm{Sim}(s_i, s_j)$, for example cosine similarity:

$$\mathrm{Sim}(s_i, s_j) = \frac{\mathbf{v}_{s_i} \cdot \mathbf{v}_{s_j}}{\lVert \mathbf{v}_{s_i} \rVert \, \lVert \mathbf{v}_{s_j} \rVert}$$

where $\mathbf{v}_{s_i}$ is the vector representation of sentence $s_i$ and $\mathbf{v}_{s_j}$ is the vector representation of sentence $s_j$, both obtained with an existing pre-trained model in the biomedical field (e.g. PubMedBERT). For each negative sample, whether it is a true negative example or a false negative example is judged as follows. First, the pairwise similarity among the K true positive examples is calculated to obtain the distribution P of the true positive examples:

$$P = \{\mathrm{Sim}(s_i, s_j) \mid 1 \le i \le j \le K,\ s_i \in S,\ s_j \in S\};$$

Second, the similarity between the negative sample s and the K true positive examples is calculated to obtain the distribution N of the negative sample:

$$N = \{\mathrm{Sim}(s, s_i) \mid 1 \le i \le K,\ s_i \in S\};$$

A T test is performed on the two distributions to calculate the p-value:

$$p = T(P, N);$$

If $p < 0.05$, the negative sample is considered a true negative example; otherwise the means of the two distributions are compared, and if the mean of N is less than the mean of P, the negative sample is considered a true negative example.
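The screening rule above can be sketched in plain Python as follows. Several points are assumptions rather than details from the patent: the T test is taken to be Welch's two-sample test, its p-value is computed by numerical (Simpson) integration of the Student t density, the distribution P uses only pairs with i < j (self-similarities are left out), and all function names are hypothetical. Sentence vectors are assumed to be already available, for example from PubMedBERT.

```python
import math
from statistics import mean, variance

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_positives(pos_vecs, k):
    """Rank positives by mean similarity to the other positives; keep the first k."""
    ranked = sorted(
        pos_vecs,
        key=lambda u: mean(cosine(u, v) for v in pos_vecs if v is not u),
        reverse=True,
    )
    return ranked[:k]

def welch_p_value(x, y):
    """Two-sided p-value of Welch's t-test (assumed variant of the T test)."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    if vx + vy == 0.0:
        return 1.0 if mean(x) == mean(y) else 0.0
    t = abs(mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    # With u = sqrt(df) * tan(theta), the t density becomes coef * cos(theta)^(df-1),
    # so the upper tail is an integral over the bounded range [theta0, pi/2].
    theta0 = math.atan(t / math.sqrt(df))
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    power = max(df - 1.0, 0.0)

    def f(theta):
        return max(math.cos(theta), 0.0) ** power

    n = 2000  # even number of Simpson intervals
    h = (math.pi / 2 - theta0) / n
    total = f(theta0) + f(math.pi / 2)
    total += sum((4 if i % 2 else 2) * f(theta0 + i * h) for i in range(1, n))
    return min(1.0, 2 * coef * total * h / 3)

def is_true_negative(neg_vec, true_pos, alpha=0.05):
    """Apply the decision rule: significant difference, or mean(N) < mean(P)."""
    P = [cosine(true_pos[i], true_pos[j])
         for i in range(len(true_pos)) for j in range(i + 1, len(true_pos))]
    N = [cosine(neg_vec, v) for v in true_pos]
    if welch_p_value(P, N) < alpha:
        return True
    return mean(N) < mean(P)
```

At least three true positive examples are needed so that both P and N contain two or more values and a sample variance can be computed.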

After the training set is constructed through the above steps, 1000 samples are randomly selected and manually labeled as a test set, finally yielding a relation extraction data set containing multiple entity types and multiple relations. Remote supervision relation extraction models such as PCNNs, CIL and BGWA are then trained on it, establishing a remote supervision algorithm for the specific knowledge graph relations. Finally, the trained remote supervision relation extraction model is used to extract medical relation-document association evidence information on a large scale.
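The positive/negative labeling that seeds this training set can be sketched as a lookup against the knowledge graph triples. The triples, the `label_sentence` function and the choice to ignore entity order are hypothetical illustrations, not details given in the patent.

```python
# Hypothetical knowledge-graph triples: (head entity, relation, tail entity).
KG_TRIPLES = {
    ("aspirin", "treats", "headache"),
    ("TP53", "associated_with", "breast cancer"),
}

def label_sentence(entity_a, entity_b, triples=KG_TRIPLES):
    """Return (1, relation) if some triple links the two entities, else (0, None).

    Entity order is ignored here, which is an assumption of this sketch.
    """
    for head, relation, tail in triples:
        if {head, tail} == {entity_a, entity_b}:
            return 1, relation
    return 0, None

# A sentence mentioning both entities of a known triple becomes a positive sample.
print(label_sentence("headache", "aspirin"))
# A sentence whose entity pair is not in the graph becomes a negative sample.
print(label_sentence("aspirin", "breast cancer"))
```

Negative samples labeled this way may still express a real relation that is missing from the graph, which is the motivation for the T test screening described earlier.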

In this embodiment 2, knowledge reliability is evaluated in combination with a knowledge-document association network. First, all sentences containing a specific medical knowledge relation are obtained by the remote supervision method. Since all sentences come from the PubMed literature database, information such as the context, bibliographic information, references, author institutions, publication dates and citing documents of each document can be obtained through the PubMed ID associated with each sample. This forms a knowledge-document association network containing two types of nodes (triple knowledge and documents) and two types of edges (knowledge-document and document-cites-document). In addition, knowledge reliability evaluation needs to take other features of the knowledge-related documents into account. Further information about the documents is obtained from iCite, such as the document authors (authors), whether the document is a research article (is_research), the Relative Citation Ratio (RCR), the experimental or research subjects (human), the clinical translation potential (Approximate Potential to Translate, APT), whether the document is a clinical study (is_clinical) and the number of citations (citation_count). Finally, the reliability of the knowledge is evaluated using the node features of the complex network (node degree, centrality, etc.) and its edge features (edge weight, edge betweenness centrality, etc.), in combination with the multidimensional features of the documents.
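A minimal sketch of such a knowledge-document association network is given below, using only the standard library. The class name, the edge-type labels, the placeholder PubMed IDs and the decision to treat citation edges as undirected are all assumptions made for illustration.

```python
from collections import defaultdict

class KnowledgeDocNetwork:
    """Undirected network with 'knowledge' (triple) and 'document' nodes."""

    def __init__(self):
        self.node_type = {}                 # node -> "knowledge" or "document"
        self.neighbors = defaultdict(dict)  # node -> {neighbor: edge type}

    def _link(self, a, b, kind):
        self.neighbors[a][b] = kind
        self.neighbors[b][a] = kind

    def add_evidence(self, triple, pmid):
        """Connect a knowledge triple to a document that mentions it."""
        self.node_type[triple] = "knowledge"
        self.node_type[pmid] = "document"
        self._link(triple, pmid, "knowledge-document")

    def add_citation(self, citing_pmid, cited_pmid):
        """Connect two documents where one cites the other."""
        self.node_type.setdefault(citing_pmid, "document")
        self.node_type.setdefault(cited_pmid, "document")
        self._link(citing_pmid, cited_pmid, "document-cites-document")

    def degree(self, node):
        """Node degree, one of the node features mentioned in the text."""
        return len(self.neighbors.get(node, {}))

net = KnowledgeDocNetwork()
triple = ("aspirin", "treats", "headache")   # hypothetical triple
net.add_evidence(triple, "PMID:0000001")     # placeholder identifiers
net.add_evidence(triple, "PMID:0000002")
net.add_citation("PMID:0000002", "PMID:0000001")
print(net.degree(triple), net.degree("PMID:0000001"))
```

Keeping the edge type on each link makes it possible to compute features separately for evidence edges and citation edges later on.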

For a specific piece of knowledge, the evaluation is performed from three angles: the remote supervision model (DS), the complex network (NET) and the related literature (LIT) features:

Score = Score_DS + Score_NET + Score_LIT;

Here Score_DS is the prediction score, obtained by inputting the context of the knowledge into the remote supervision model trained above. Score_NET is the importance score of the knowledge in the network: the importance of a knowledge node i is expressed by its normalized betweenness centrality, computed from g_st, the number of shortest paths from node s to node t, and the number of those g_st shortest paths that pass through knowledge node i. Score_LIT is the literature feature score. First, review articles are excluded through the is_research field. Second, the article diversity of the knowledge is calculated from the authors field, where authors denotes the set of authors of a document and i denotes the sequence number of the knowledge-related document.
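As an illustration of the network importance score, the sketch below computes betweenness centrality with Brandes' algorithm on an unweighted, undirected graph. The normalization by (n-1)(n-2)/2, the adjacency-dict input format and the function name are assumptions; the patent's exact normalization is not reproduced in the text.

```python
from collections import deque

def normalized_betweenness(adj):
    """Betweenness centrality of every node, scaled to [0, 1].

    adj maps each node to an iterable of neighbors (undirected, unweighted).
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adj)
    # Each unordered pair was counted twice, hence (n-1)(n-2) instead of half of it.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: score[v] * scale for v in score}

# Hypothetical toy network: knowledge nodes k1, k2 and documents d1, d2, d3.
toy = {
    "k1": ["d1", "d2"],
    "k2": ["d3"],
    "d1": ["k1", "d3"],
    "d2": ["k1"],
    "d3": ["d1", "k2"],
}
print(normalized_betweenness(toy))
```

A knowledge node that sits on many shortest paths between documents and other knowledge nodes receives a higher value, which then serves as Score_NET in the sum above.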

Since one piece of knowledge is related to multiple documents, the sum, mean and median of indexes such as RCR, APT and citation_count are calculated over those documents. These three statistics of each of the three indexes, together with the article diversity, are used as features to build a neural network binary classification model, with the labeled data set from step 1 as the training set and whether the knowledge is reliable (0/1) as the label. Each piece of knowledge is then scored by the trained model to obtain Score_LIT.
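The feature aggregation described above can be sketched as follows. The dictionary keys (`rcr`, `apt`, `citation_count`, `is_research`) and the function name are hypothetical, the records are made-up examples, and the article diversity feature is omitted because its formula is not reproduced in the text.

```python
from statistics import mean, median

METRICS = ("rcr", "apt", "citation_count")  # hypothetical key names

def literature_features(docs):
    """Sum, mean and median of each metric over the research articles of one knowledge item."""
    # Exclude review articles via the is_research flag, as described in the text.
    research = [d for d in docs if d["is_research"]]
    features = {}
    for metric in METRICS:
        values = [d[metric] for d in research]
        features[metric + "_sum"] = sum(values)
        features[metric + "_mean"] = mean(values)
        features[metric + "_median"] = median(values)
    return features

# Made-up document records for a single piece of knowledge.
docs = [
    {"rcr": 1.0, "apt": 0.5, "citation_count": 10, "is_research": True},
    {"rcr": 2.0, "apt": 0.25, "citation_count": 30, "is_research": True},
    {"rcr": 9.0, "apt": 0.9, "citation_count": 100, "is_research": False},
]
print(literature_features(docs))
```

The resulting nine values, together with the article diversity, would form the input vector of the binary classifier that outputs Score_LIT.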

Example 3

An embodiment 3 of the present invention provides an electronic device, including a memory and a processor, where the processor and the memory are in communication with each other, the memory stores program instructions executable by the processor, and the processor invokes the program instructions to execute a knowledge assessment method based on a medical knowledge graph of an associated document, where the method includes the following steps:

Example 4

Embodiment 4 of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a knowledge assessment method based on a medical knowledge graph of an associated document, the method comprising the steps of:

Example 5

Embodiment 5 of the present invention provides a computer device including a memory and a processor, the processor and the memory being in communication with each other, the memory storing program instructions executable by the processor, the processor invoking the program instructions to execute a knowledge assessment method based on a medical knowledge graph of an associated document, the method including the steps of:

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Although the embodiments of the present invention have been described above with reference to the drawings, this description is not intended to limit the scope of the invention. It should be understood that various changes and modifications that a person skilled in the art could make without inventive effort still fall within the scope of the invention.

Claims

1. A knowledge evaluation method of a medical knowledge graph based on associated literature, comprising:

Based on remote supervision, acquiring associated evidence information of medical entities and documents, including: collecting five types of entities, namely medicines, components, genes, diseases and symptoms, through an integrated medical knowledge base; mapping the entities into literature context sentences through remote labeling; for disease and gene entities, performing fine-tuning and testing with a pre-trained model, and for medicine, component and symptom entities, training with a remote-supervision named entity recognition method, to form the associated evidence information of medical entities and documents;

Based on the significance test, constructing associated evidence information of medical relationships and documents, including: acquiring all sentences containing related medical entities based on the named entity recognition model; distinguishing positive and negative samples by judging whether the entities in a sentence exist in a triple of the knowledge graph; screening the dataset by comparing the difference between the positive samples and the negative samples through a t-test, to form association data of medical relationships and documents; establishing a remote supervision algorithm for a specific knowledge graph relationship, training a remote-supervision relation extraction model, and extracting medical relationship-document association evidence information with the trained model; for the positive samples, calculating the mean similarity between each positive sample and the remaining positive samples, sorting, and taking the top K samples as real examples; for each negative sample, judging whether it is a true negative example or a false negative example: first, calculating the internal similarity of the K real examples to obtain the distribution P of the real examples; second, calculating the similarity between the negative sample s and the K real examples to obtain the distribution N of the negative sample; performing a t-test on the two distributions to obtain a p-value; if the p-value is smaller than the set threshold, the negative sample is considered a true negative example; otherwise, the means of the two distributions are further compared, and if the mean of N is smaller than the mean of P, the negative sample is considered a true negative example;

Based on the constructed knowledge-associated literature knowledge graph and literature-associated literature graph, evaluating the reliability of the acquired medical knowledge, including: obtaining sentences containing a specific medical knowledge relationship by the remote supervision method, and obtaining the relevant document information through the PubMed ID associated with each sample, thereby forming a knowledge-document association network that includes two types of nodes, namely knowledge triples and documents, and two types of edge relations, namely knowledge-document association and document-document citation; and evaluating the reliability of the knowledge using the node features and edge relation features in combination with the multidimensional features of the documents.

2. The knowledge evaluation method of a medical knowledge graph based on associated literature according to claim 1, wherein, for a specific piece of knowledge, the evaluation is performed from multiple perspectives, namely the remote supervision model, the knowledge-document association network and the features of the associated documents.

3. A knowledge evaluation system of a medical knowledge graph based on associated literature, comprising:

the acquisition module is used for acquiring associated evidence information of medical entities and documents based on remote supervision, including: collecting five types of entities, namely medicines, components, genes, diseases and symptoms, through an integrated medical knowledge base; mapping the entities into literature context sentences through remote labeling; for disease and gene entities, performing fine-tuning and testing with a pre-trained model, and for medicine, component and symptom entities, training with a remote-supervision named entity recognition method, to form the associated evidence information of medical entities and documents; and, based on the significance test, constructing associated evidence information of medical relationships and documents, including: acquiring all sentences containing related medical entities based on the named entity recognition model; distinguishing positive and negative samples by judging whether the entities in a sentence exist in a triple of the knowledge graph; screening the dataset by comparing the difference between the positive samples and the negative samples through a t-test, to form association data of medical relationships and documents; establishing a remote supervision algorithm for a specific knowledge graph relationship, training a remote-supervision relation extraction model, and extracting medical relationship-document association evidence information with the trained model; for the positive samples, calculating the mean similarity between each positive sample and the remaining positive samples, sorting, and taking the top K samples as real examples; for each negative sample, judging whether it is a true negative example or a false negative example: first, calculating the internal similarity of the K real examples to obtain the distribution P of the real examples; second, calculating the similarity between the negative sample s and the K real examples to obtain the distribution N of the negative sample; performing a t-test on the two distributions to obtain a p-value; if the p-value is smaller than the set threshold, the negative sample is considered a true negative example; otherwise, the means of the two distributions are further compared, and if the mean of N is smaller than the mean of P, the negative sample is considered a true negative example;

the evaluation module is used for evaluating the reliability of the acquired medical knowledge based on the constructed knowledge-associated literature knowledge graph and literature-associated literature graph, including: obtaining sentences containing a specific medical knowledge relationship by the remote supervision method, and obtaining the relevant document information through the PubMed ID associated with each sample, thereby forming a knowledge-document association network that includes two types of nodes, namely knowledge triples and documents, and two types of edge relations, namely knowledge-document association and document-document citation; and evaluating the reliability of the knowledge using the node features and edge relation features in combination with the multidimensional features of the documents.

4. A non-transitory computer-readable storage medium storing computer instructions which, when executed by a processor, implement the knowledge evaluation method of a medical knowledge graph based on associated literature according to claim 1 or 2.

5. A computer program product comprising a computer program which, when run on one or more processors, implements the knowledge evaluation method of a medical knowledge graph based on associated literature according to claim 1 or 2.

6. An electronic device, comprising: a processor, a memory, and a computer program; wherein the processor is connected to the memory, the computer program is stored in the memory, and when the electronic device is running, the processor executes the computer program stored in the memory, so that the electronic device executes instructions implementing the knowledge evaluation method of a medical knowledge graph based on associated literature according to claim 1 or 2.