CN114005507A

CN114005507A - Clinical medication risk assessment method and system based on knowledge graph

Info

Publication number: CN114005507A
Application number: CN202111113981.7A
Authority: CN
Inventors: 陈龙彪; 林志铭; 蔡晓海; 陈思耀; 王程
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2021-09-23
Filing date: 2021-09-23
Publication date: 2022-02-01

Abstract

A clinical medication risk assessment method and system based on a knowledge graph. A medication risk knowledge graph is constructed from real prescription data and drug instruction (package insert) data; off-label medication risks in the medical data are mined using machine learning and used to complete the medication risks in the medical knowledge graph, finally yielding a comprehensive and complete medical knowledge graph; and clinical prescription risks are detected using this medical knowledge graph. Features extracted from the medical knowledge graph are used to judge off-label medication risk, and the method has excellent extraction efficiency and accuracy.

Description

Clinical medication risk assessment method and system based on knowledge graph

Technical Field

The invention relates to the field of rational and safe medication, and in particular to a clinical medication risk assessment method and system based on a knowledge graph.

Background

Rational medication means preventing, treating and curing disease effectively, safely and economically. The World Health Organization (WHO) estimates that one third of patient deaths worldwide each year are due to irrational medication. In China, irrational clinical medication accounts for 12-32% of cases. Medication safety is an important component of patient safety and is at the core of medical quality. Unsafe prescriptions are one of the main threats to medication safety. Many risk factors affect prescription safety, mainly adverse drug interactions, drug incompatibility, improper or excessive drug selection, repeated drug use and the like.

The traditional way to detect prescription risk is manual prescription review, in which an expert pharmacist examines prescriptions for risk. Its drawbacks are obvious: the workload is large, and usually only spot checks are carried out. With the spread of information technology, medication monitoring software has been deployed in many hospitals. Such software is a rule base built on basic knowledge of medicine and related disciplines. A rule base alone cannot fully cover medication logic, and off-label medication risks that the rule base does not cover cannot be detected accurately. As a result, existing medication monitoring systems are error-prone when applied directly to prescriptions, and false alarms and missed alarms are common in clinical practice.

Disclosure of Invention

The main purpose of the invention is to overcome the above defects in the prior art by providing a clinical medication risk assessment method and system based on a knowledge graph, in which features are extracted from a medical knowledge graph containing medical text knowledge and clinical medication experience in order to judge off-label medication risk, with excellent extraction efficiency and accuracy.

The invention adopts the following technical scheme:

A clinical medication risk assessment method based on a knowledge graph, characterized by comprising the following steps:

S1, acquiring and processing drug instruction (package insert) data and prescription data, and extracting drug attribute information and prescription relation information;

S2, constructing a medical knowledge graph containing medical text knowledge and clinical medication experience from the drug attribute information and the prescription relation information;

S3, extracting the medication risk relations in the medical knowledge graph, and marking the medication risk levels on the medical knowledge graph;

S4, extracting a plurality of features from the medical knowledge graph to train three xgboost classification models, namely a drug interaction risk model, a drug-indication mismatch risk model and an individual medication risk model; detecting the off-label medication risks in the medical knowledge graph with the three trained models, which respectively detect drug combination risk, drug-indication mismatch risk and individual medication risk; and marking the detected risks in the medical knowledge graph;

S5, detecting the medication risk of a clinical prescription using the medical knowledge graph obtained in step S4.

Preferably, the step S1 includes:

S11, processing the drug instruction data and extracting drug attribute information, which comprises:

S111, extracting important attribute information from the drug instruction data by regular-expression matching and by named entity recognition from natural language processing, the attribute information including the drug name, indications, ingredients, contraindications, drug interactions and usage and dosage, and storing it in a csv file;

S112, removing redundant text from the information extracted from the drug instruction data by regular-expression matching and manual correction, converting each piece of information into the format (drug, attribute name, attribute value), and storing it in a txt file;

S12, processing the prescription data and extracting prescription relation information, which comprises:

S121, data cleaning: de-branding the drugs, namely deleting the manufacturer brand given in brackets in the data; deleting all prescription records whose diagnosed disease appears fewer than twice in the database; and deleting prescription records that contain null values;

S122, converting each pair of relations in each record of the database table into the format (object 1, relation name, object 2) and storing it in a txt file.
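The cleaning and conversion in S121 and S122 can be sketched as follows. This is a minimal illustration, not the patented implementation: the record field names (diagnosis, drug, patient_group) and the relation names (treated_with, uses) are hypothetical, and the sketch assumes that null-valued records are dropped before diagnosis frequencies are counted.

```python
import re
from collections import Counter


def strip_brand(drug_name):
    """Remove a bracketed manufacturer brand, e.g. 'Aspirin (BrandX)' -> 'Aspirin'."""
    # Handles both ASCII and full-width brackets.
    return re.sub(r"\s*[(（][^)）]*[)）]", "", drug_name).strip()


def clean_prescriptions(records, min_diagnosis_count=2):
    """Apply the S121 cleaning rules to a list of dict records (hypothetical schema)."""
    # Drop records that contain a null or empty value.
    complete = [r for r in records if all(v not in (None, "") for v in r.values())]
    # Drop records whose diagnosis appears fewer than `min_diagnosis_count` times.
    counts = Counter(r["diagnosis"] for r in complete)
    kept = [r for r in complete if counts[r["diagnosis"]] >= min_diagnosis_count]
    # De-brand the drug names of the remaining records.
    return [{**r, "drug": strip_brand(r["drug"])} for r in kept]


def to_triples(record):
    """Convert one cleaned record into (object 1, relation name, object 2) triples."""
    return [
        (record["diagnosis"], "treated_with", record["drug"]),
        (record["patient_group"], "uses", record["drug"]),
    ]
```

Each resulting triple could then be written as one line of the txt file described in S122.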

Preferably, step S2 specifically includes the following steps:

S21, downloading the neo4j graph database, and placing the files containing the drug attribute information and the prescription relation information obtained in step S1 into the import folder of the database;

S22, writing import instructions using neo4j import statements;

S23, executing the statements on the neo4j web console to store the files containing the drug attribute information and the prescription relation information in the database.

Preferably, step S3 specifically includes the following steps:

S31, marking the combined-use risk between drugs;

S32, marking the drug-indication mismatch risk between drugs and disease diagnoses;

S33, marking the individual medication risk of drugs used by different patient groups.

Preferably, step S31 specifically includes the following steps:

S311, extracting the drug pairs whose relation in the medical knowledge graph is a drug interaction with an interaction risk level of 3, selecting those pairs whose two drugs never appear in the same prescription, and adding a risk relation edge with a weight of 3 to each such drug pair on the medical knowledge graph;

S312, extracting the drug pairs whose risk level equals 0 and which occur more than 10 times, or have a combination frequency of more than 30%, in the historical prescriptions, and adding a risk relation edge with a weight of 0 to each such drug pair on the medical knowledge graph.
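The two labeling rules in S311 and S312 can be sketched as below. The source does not define the combination frequency, so this sketch assumes it is the number of prescriptions containing both drugs divided by the number containing at least one of them; the function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def label_known_pair_risks(instruction_levels, prescriptions,
                           min_count=10, min_freq=0.30):
    """Return {drug pair: edge weight} where the instructions agree with practice.

    instruction_levels: {frozenset({a, b}): risk level taken from the drug instructions}
    prescriptions:      list of sets of drug names, one set per historical prescription
    """
    drug_count = Counter()
    pair_count = Counter()
    for drugs in prescriptions:
        drug_count.update(drugs)
        pair_count.update(frozenset(p) for p in combinations(sorted(drugs), 2))

    edges = {}
    for pair, level in instruction_levels.items():
        a, b = tuple(pair)
        together = pair_count[pair]
        either = drug_count[a] + drug_count[b] - together
        # Assumed definition of the combination frequency (not given in the source).
        freq = together / either if either else 0.0
        if level == 3 and together == 0:
            edges[pair] = 3  # S311: prohibited in the instructions and never co-prescribed
        elif level == 0 and (together > min_count or freq > min_freq):
            edges[pair] = 0  # S312: allowed in the instructions and common in practice
    return edges
```

Pairs for which the instructions and the prescription history disagree receive no edge here; under this reading they are the candidates left for the models trained in S4.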

Preferably, step S32 specifically includes the following steps:

S321, extracting the drug-diagnosis pairs whose relation is a contraindication, selecting those pairs that never appear in the same prescription, and adding a risk relation edge with a weight of 3 to each such drug-diagnosis pair on the medical knowledge graph;

S322, selecting the drug-diagnosis pairs that occur more than 10 times, or have a combination frequency of more than 30%, in the historical prescriptions, and adding a risk relation edge with a weight of 0 to each such drug-diagnosis pair on the medical knowledge graph.

Preferably, step S33 specifically includes the following steps:

S331, extracting the drug-group pairs whose relation between the drug and the patient group is a contraindication, selecting those pairs in which the drug is never prescribed for that group, and adding a risk relation edge with a weight of 3 to each such drug-group pair on the medical knowledge graph;

S332, extracting the drug-group pairs whose relation between the drug and the group is "applicable" and which have more than 10 use cases, or a combination frequency of more than 30%, in the historical prescriptions, and adding a risk relation edge with a weight of 0 to each such drug-group pair on the medical knowledge graph.

Preferably, in S4, training the drug interaction risk model specifically includes the following steps:

S411, extracting 300 drug pairs whose combined use does not conform to the description in the drug instructions;

S412, taking the level D(drug1, drug2) as a feature, according to the medication risk level marked on the medical knowledge graph;

S413, taking the frequency f(drug1, drug2) with which the drug pair is connected on the medical knowledge graph as a feature, wherein the frequency is the ratio of the number of edges N(drug1, drug2) between the drug pair to the numbers of edges N(drug1) and N(drug2) connecting each of the two drugs with all drugs,

the specific formula is as follows:

S414, taking the usage frequency of the drug pair as a feature;

S415, calculating the Preferential Attachment (PA) of the two drugs in the drug pair, the feature being obtained with the Preferential Attachment algorithm of the knowledge graph,

the specific formula is as follows: PA(drug1, drug2) = |N(drug1)| × |N(drug2)|;

S416, taking whether the two drugs of the pair are in the same community, co-group(drug1, drug2), as a feature;

S417, taking the five drug-pair features extracted from the medical knowledge graph in steps S412 to S416 as the model input, cross-evaluating the combined-use risk level of each drug pair according to the medication risk level standard, taking the labeled combined-use risk level as the label, and training the drug interaction risk model with the xgboost algorithm.

Preferably, in S4, training the drug-indication mismatch risk model specifically includes the following steps:

S421, extracting 300 diagnosis-drug pairs that do not conform to the description in the drug instructions;

S422, obtaining the feature D(drug) of the relation between the drug and the diagnosis according to the medication risk level marked on the medical knowledge graph;

S423, calculating the connection frequency f(drug) between the drug and the diagnosed disease as a feature, the specific formula is:

wherein N(drug, diagnosis) represents the number of times the drug and the diagnosis appear in the same prescription, N(drug) represents the number of times the drug appears in prescriptions, and N(diagnosis) represents the number of times the diagnosis appears in prescriptions;

S424, calculating the number of times the drug occurs in prescriptions for the diagnosis as a feature, the specific formula is: f(drug, diagnosis) = N(drug, diagnosis);

S425, calculating the number of different drugs connected to the diagnosis node, C(drug, diagnosis), as a feature;

S426, taking whether the drug and the diagnosis are in the same community, co-group(drug1, drug2), as a feature;

S427, taking the five features extracted from the medical knowledge graph in steps S422 to S426 as the model input, cross-evaluating the drug-indication mismatch risk level according to the medication risk level standard, taking the labeled drug-indication mismatch risk level as the label, and training the drug-indication mismatch risk model with the xgboost algorithm.

Preferably, in S4, training the individual medication risk model specifically includes the following steps:

S431, extracting the drug-indication mismatch risk features from the knowledge graph, and performing a machine learning task to detect the drug combination risk;

S432, obtaining the feature D(drug) of the relation between the drug and the patient group according to the medication risk level marked on the medical knowledge graph;

S433, taking the connection frequency f(drug) between the drug and the diagnosed disease as a feature, the specific formula is as follows:

S434, taking the frequency with which the drug is used in prescriptions for the population group to which the patient belongs as the feature f(role, drug), the specific formula is:

wherein N(role, diagnosis, drug) indicates the number of prescriptions in which the drug and the diagnosis appear together and the patient belongs to the population group role, and N(role, diagnosis) indicates the number of prescriptions with the diagnosis in which the patient belongs to the population group role;

S435, taking the number of different population groups connected to the drug as a feature;

S436, calculating the Gaussian mixture distribution of the drug dose;

S437, taking the patient's weight and age range as features;

S438, taking the above 6 features extracted from the prescription as the model input, cross-evaluating the individual medication risk level according to the medication risk level standard, taking the labeled individual medication risk level as the label, and training the individual medication risk model with the xgboost algorithm.
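The population usage feature f(role, drug) of S434 can be illustrated with a short sketch. The formula itself is not reproduced in the text, so the ratio N(role, diagnosis, drug) / N(role, diagnosis) used here is an assumption inferred from the two counts that the text defines; the record layout and function name are hypothetical.

```python
def population_usage_frequency(records, role, diagnosis, drug):
    """Share of prescriptions for (role, diagnosis) that contain the drug.

    records: iterable of (role, diagnosis, set of drugs), one per prescription.
    Assumes f = N(role, diagnosis, drug) / N(role, diagnosis).
    """
    matching = [drugs for r, d, drugs in records if r == role and d == diagnosis]
    if not matching:
        return 0.0  # no prescriptions for this group and diagnosis
    return sum(drug in drugs for drugs in matching) / len(matching)
```

A value near 1 would mean the drug is routinely prescribed to that group for that diagnosis, which the description associates with lower risk.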

A clinical medication risk assessment system based on a knowledge graph, characterized by comprising:

a data processing module, used for acquiring and processing the drug instruction data and the prescription data and extracting drug attribute information and prescription relation information;

a medical knowledge graph module, used for constructing a medical knowledge graph containing medical text knowledge and clinical medication experience from the drug attribute information and the prescription relation information;

a marking module, used for extracting the medication risk relations in the medical knowledge graph and marking the medication risk levels on the medical knowledge graph;

an off-label medication risk module, which extracts a plurality of features from the medical knowledge graph and trains three classification models with the xgboost algorithm, namely a drug interaction risk model, a drug-indication mismatch risk model and an individual medication risk model, and uses the trained models to detect off-label medication risks in prescriptions, the three models respectively detecting drug combination risk, drug-indication mismatch risk and individual medication risk;

and a prescription medication risk evaluation module, which detects the medication risk of a clinical prescription using the medical knowledge graph.

As can be seen from the above description of the present invention, compared with the prior art, the present invention has the following advantages:

1. The invention constructs a medication risk knowledge graph from real prescription data and drug instruction data, mines off-label medication risks in the medical data using machine learning, completes the medication risks in the medical knowledge graph to obtain a comprehensive and complete medication risk graph, and detects clinical prescription risks using this medication risk graph, with excellent extraction efficiency and accuracy.

2. The invention builds the knowledge graph from drug instructions and real prescription data, obtaining a knowledge graph that holds both the textual knowledge of the drug instructions and clinical medication experience. It combines medical text knowledge with analysis of clinical practice experience and detects prescription risks from two angles, known-risk marking and off-label medication risk detection, which makes it more comprehensive and flexible and avoids conflicts.

3. Through known-risk marking, the invention finds in the knowledge graph the medication patterns that agree with both the drug instructions and clinical medication practice in the data, and marks them with the risk level described in the instructions.

4. The method extracts medication risk features from the knowledge graph and detects off-label medication risks in the knowledge graph with machine learning classification, using the risk values that professional pharmacists assign to medication patterns as labels for the classification task, which improves detection precision and safety.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is an example of drug instruction data;

FIG. 3 is an example of prescription data;

FIG. 4 is a graph database import statement;

FIG. 5 is the graph database after importing data;

FIG. 6 is a flow chart of off-label medication risk detection;

FIG. 7 is a medication risk level table;

FIG. 8 is an example table of off-label medication risk features.

The invention is described in further detail below with reference to the figures and specific examples.

Detailed Description

The invention is further described below by means of specific embodiments.

A clinical medication risk assessment method based on a knowledge graph comprises the following steps:

S1, acquiring and processing the drug instruction data and the prescription data, and extracting drug attribute information and prescription relation information. In this step, the drug instruction data is in docx format, see FIG. 2, and the prescription data is in db format, see FIG. 3. The step specifically comprises the following:

S112, removing redundant text from the information extracted from the drug instruction data by regular-expression matching and manual correction, converting each piece of information into the format (drug, attribute name, attribute value), and storing it in a txt file.

S2, constructing a medical knowledge graph containing medical text knowledge and clinical medication experience from the drug attribute information and the prescription relation information.

S21, downloading the neo4j graph database, and placing the files containing the drug attribute information and the prescription relation information obtained in step S1, namely the txt files obtained in the previous step, into the import folder of the database.

S22, writing import instructions using neo4j import statements, see FIG. 4;

S23, executing the statements on the neo4j web console to store the files containing the drug attribute information and the prescription relation information in the database, see FIG. 5.

S3, extracting the medication risk relations in the medical knowledge graph, and marking the medication risk levels on the medical knowledge graph.

In this step, the medication risk relations that agree with both the medical text rules and historical prescription experience, covering both permitted and prohibited use, are extracted, and the medication risk levels are marked on the graph; the risk levels are described in Table 1. The step comprises the following three sub-steps:

S31, marking the combined-use risk between drugs. This specifically comprises the following:

S312, extracting the drug pairs whose risk level equals 0 and which occur more than 10 times, or have a combination frequency of more than 30%, in the historical prescriptions, and adding a risk relation edge with a weight of 0 to each such drug pair on the medical knowledge graph.

S32, marking the drug-indication mismatch risk between drugs and disease diagnoses. This specifically comprises the following:

S33, marking the individual medication risk of drugs used by different patient groups. This specifically comprises the following:

S4, extracting a plurality of features from the medical knowledge graph to train three xgboost classification models, which respectively detect drug combination risk, drug-indication mismatch risk and individual medication risk, the detected risks being marked in the medical knowledge graph.

A prescription mainly involves three kinds of entity: drugs, disease diagnoses and patients. Medication risk is therefore detected along the three relations drug-drug, drug-diagnosis and drug-patient, namely the drug combination risk between drugs, the drug-indication mismatch risk between drugs and disease diagnoses, and the individual medication risk between drugs and patients.

Training the drug interaction risk model specifically comprises the following steps:

S411, extracting 300 drug pairs whose combined use does not conform to the drug instructions, that is, pairs for which the instructions describe the combination as contraindicated yet cases of combined use still exist in prescriptions. Features are then extracted from the knowledge graph (S412-S416).

S412, taking the level D(drug1, drug2) as a feature, according to the medication risk level marked on the medical knowledge graph; the medication risk level includes the combined-use risk level between drugs and is an important reference for judging risk;

S413, taking the frequency f(drug1, drug2) with which the drug pair is connected on the medical knowledge graph as a feature, wherein the frequency is the ratio of the number of edges N(drug1, drug2) between the drug pair to the numbers of edges N(drug1) and N(drug2) connecting each of the two drugs with all drugs, and the specific formula is as follows:

The higher the frequency, the more often the two drugs are used together, which to some extent reflects a lower risk for the two connected nodes; conversely, the risk is higher.

S414, taking the usage frequency of the drug pair as a feature. Drugs that are commonly used in prescriptions carry a lower risk than uncommon ones, and the risk is higher when the drugs are rarely used.

S415, calculating the Preferential Attachment (PA) of the two drugs in the drug pair. This feature is the product of the connection numbers of the two drugs and reflects how well connected the pair is: if both drug nodes have many connections, the probability that the pair is connected is high and the risk is low. The feature is obtained with the Preferential Attachment algorithm of the knowledge graph,

the specific formula is as follows: PA(drug1, drug2) = |N(drug1)| × |N(drug2)|.

S416, taking whether the two drugs of the pair are in the same community, co-group(drug1, drug2), as a feature. When the medication risk knowledge graph (MRKG) is constructed, the graph is divided into communities with the Louvain algorithm; if the two drugs are in the same community, the risk of combined use is low, and otherwise it is high.
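The drug-pair features of S412 to S416 can be sketched as follows. The formula for f(drug1, drug2) is not reproduced in the text, so the normalization N(drug1, drug2) / (N(drug1) + N(drug2)) used below is an assumption; the data layout and names are hypothetical, and the community assignment is taken as given (for example from a Louvain partition computed elsewhere).

```python
def pair_features(edge_counts, level, community, drug1, drug2):
    """Compute illustrative S412-S416 features for one drug pair.

    edge_counts: {frozenset({a, b}): number of edges (co-prescriptions) between a and b}
    level:       {frozenset({a, b}): risk level marked on the graph}
    community:   {drug: community id}
    """
    pair = frozenset((drug1, drug2))

    def total_edges(d):
        # Number of edges connecting drug d with all other drugs.
        return sum(n for p, n in edge_counts.items() if d in p)

    def neighbours(d):
        # Set of distinct drugs connected to drug d.
        return {x for p in edge_counts if d in p for x in p if x != d}

    n12 = edge_counts.get(pair, 0)
    n1, n2 = total_edges(drug1), total_edges(drug2)
    return {
        "D": level.get(pair),                               # S412, None if unmarked
        "f": n12 / (n1 + n2) if (n1 + n2) else 0.0,         # S413, assumed normalization
        "count": n12,                                       # S414
        "PA": len(neighbours(drug1)) * len(neighbours(drug2)),  # S415
        "co_group": int(community[drug1] == community[drug2]),  # S416
    }
```

The resulting dictionary would form one row of the training matrix, with the pharmacist-assigned risk level as the label.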

Drug-indication mismatch risk features are extracted from the knowledge graph and a machine learning task is performed to detect the drug-indication mismatch risk. Training the drug-indication mismatch risk model specifically comprises the following steps:

S421, extracting 300 diagnosis-drug pairs that do not conform to the description in the drug instructions, that is, pairs in which the indications given in the instructions for the drug do not include the disease diagnosed in the prescription;

S422, obtaining the feature D(drug) of the relation between the drug and the diagnosis according to the medication risk level marked on the medical knowledge graph; the feature value is -1, 0 or 1 for a contraindication relation, no relation and an indication relation respectively. This is based on the description in the instructions of the risk between drug and diagnosis, and is an important reference for judging risk.

wherein N(drug, diagnosis) represents the number of times the drug and the diagnosis appear in the same prescription, N(drug) represents the number of times the drug appears in prescriptions, and N(diagnosis) represents the number of times the diagnosis appears in prescriptions.

This frequency reflects how common the drug is for the diagnosis. Common use for a disease usually carries lower risk than uncommon use, so a higher frequency reflects to some extent a lower risk for the two connected nodes, and vice versa.

S424, calculating the number of times the drug occurs in prescriptions for the diagnosis as a feature; the more often the drug is used for the disease in prescriptions, the lower the risk generally is, and otherwise the higher. The specific formula is as follows:

f(drug, diagnosis) = N(drug, diagnosis);

S425, calculating the number C(drug) of different drugs connected to the diagnosis node as a feature; if there are multiple drug choices for the diagnosis, substitutability is higher and the risk is lower.

S426, taking whether the drug and the diagnosis are in the same community, co-group(drug1, drug2), as a feature; if the drug is in the same community as the diagnosis, the risk is lower, and otherwise it is higher. For example, if the diagnosis in the prescription is a gynecological disease and the drugs in the prescription are gynecological drugs, the prescription is more reasonable and safer; if a drug from the pediatric community appears, a risk is more likely.
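The count-based drug-diagnosis features (S423 to S425) can be sketched as follows. The formula for the connection frequency in S423 is not reproduced in the text, so the ratio N(drug, diagnosis) / (N(drug) + N(diagnosis)) is an assumption; the prescription layout and function names are hypothetical.

```python
from collections import Counter, defaultdict


def drug_diagnosis_counts(prescriptions):
    """Count N(drug), N(diagnosis), N(drug, diagnosis) and the drugs seen per diagnosis.

    prescriptions: iterable of (diagnosis, iterable of drug names).
    """
    n_drug, n_diag, n_both = Counter(), Counter(), Counter()
    drugs_of = defaultdict(set)
    for diagnosis, drugs in prescriptions:
        n_diag[diagnosis] += 1
        for drug in set(drugs):
            n_drug[drug] += 1
            n_both[(drug, diagnosis)] += 1
            drugs_of[diagnosis].add(drug)
    return n_drug, n_diag, n_both, drugs_of


def drug_diagnosis_features(prescriptions, drug, diagnosis):
    """Illustrative S423-S425 features for one drug-diagnosis pair."""
    n_drug, n_diag, n_both, drugs_of = drug_diagnosis_counts(prescriptions)
    together = n_both[(drug, diagnosis)]
    denom = n_drug[drug] + n_diag[diagnosis]
    return {
        "f_connection": together / denom if denom else 0.0,  # S423, assumed formula
        "f_count": together,                                 # S424: N(drug, diagnosis)
        "C": len(drugs_of[diagnosis]),                       # S425: distinct drugs
    }
```

For a large prescription table, the counts would be computed once and reused across pairs rather than recomputed per call as in this sketch.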

Further, individual medication risk features are extracted from the medical knowledge graph and a machine learning task is performed to detect individual medication risk. Training the individual medication risk model specifically comprises the following steps:

S431, extracting the drug-indication mismatch risk features from the knowledge graph, and performing a machine learning task to detect the drug combination risk.

S432, obtaining the feature D(drug) of the relation between the drug and the patient group according to the medication risk level marked on the medical knowledge graph; the feature value is -1, 0 or 1 for a contraindication relation, no relation and an applicable relation respectively. This is based on the description in the drug instructions of the risk of use for a patient group, and is an important reference for judging risk.

wherein N(role, diagnosis, drug) indicates the number of prescriptions in which the drug and the diagnosis appear together and the patient belongs to the population group role, and N(role, diagnosis) indicates the number of prescriptions with the diagnosis in which the patient belongs to the population group role.

If a patient group often uses a certain drug, the risk of that group taking the drug is generally lower, and vice versa.

S435, taking the number of different population groups connected to the drug as a feature; if a drug is suitable for many groups, this reflects to some extent a lower use risk, and otherwise a higher one.

S436, calculating the Gaussian mixture distribution of the drug dose; if the dose in the prescription lies in a high-density region of the historical prescription doses of the drug, the dose is in a reasonable range, and otherwise it is at risk.

S437, taking the patient's weight and age range as features; weight and age are important physical characteristics of the patient, and in clinical practice the dosage of most drugs is prescribed according to weight and age.
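The dose check described in S436 can be illustrated with a one-dimensional Gaussian mixture density. This sketch assumes the mixture parameters (weight, mean, standard deviation per component) have already been fitted to the historical doses of the drug, for example by expectation-maximization; the cutoff ratio is a hypothetical choice, since the source does not say how a high-density region is delimited.

```python
import math


def mixture_density(x, components):
    """Density at x of a 1-D Gaussian mixture; components is [(weight, mean, std), ...]."""
    return sum(
        w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        for w, mu, sd in components
    )


def dose_in_high_density_region(dose, components, ratio=0.1):
    """True if the density at `dose` is at least `ratio` times the highest density
    found at any component mean (a hypothetical definition of high density)."""
    peak = max(mixture_density(mu, components) for _, mu, _ in components)
    return mixture_density(dose, components) >= ratio * peak
```

For a drug whose historical doses cluster around 5 and 10 units, a prescribed dose of 20 units would fall outside the high-density region and could be flagged, or the raw density could be passed to the model as a continuous feature instead.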

Finally, the trained models are used to detect the off-label medication risks in the medical knowledge graph, and the detected risks are marked in the graph, giving a more complete knowledge graph of medication risks that can be used for risk queries.
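A risk query against the completed graph (step S5) might look like the sketch below. It assumes the marked risk edges have been exported to a dictionary keyed by entity pair and that entity names are unique across drugs, diagnoses and patient groups; the function name and the reporting threshold are hypothetical.

```python
from itertools import combinations


def assess_prescription(risk_edges, drugs, diagnosis, patient_group, threshold=1):
    """Return (entity 1, entity 2, level) for every marked risk at or above `threshold`.

    risk_edges: {frozenset({entity a, entity b}): risk level marked on the graph}
    """
    ordered = sorted(drugs)
    # The three relation types of a prescription: drug-drug, drug-diagnosis, drug-group.
    candidates = [frozenset(p) for p in combinations(ordered, 2)]
    candidates += [frozenset((d, diagnosis)) for d in ordered]
    candidates += [frozenset((d, patient_group)) for d in ordered]

    findings = []
    for pair in candidates:
        level = risk_edges.get(pair)
        if level is not None and level >= threshold:
            a, b = sorted(pair)
            findings.append((a, b, level))
    return findings
```

Pairs with no marked edge are silently skipped here; a production system would need a policy for relations the graph has never seen.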

The invention also provides a clinical medication risk assessment system based on a knowledge graph, comprising:

an off-label medication risk module, which extracts a plurality of features from the medical knowledge graph and trains three classification models with the xgboost algorithm, the three models respectively detecting drug combination risk, drug-indication mismatch risk and individual medication risk.

The system of the invention applies the knowledge-graph-based clinical medication risk assessment method to assess the medication risk of prescriptions.

The above is only one embodiment of the present invention, but the design concept of the present invention is not limited thereto; any insubstantial modification made to the invention using this design concept shall be regarded as an act infringing the protection scope of the present invention.

Claims

1. A clinical medication risk assessment method based on a knowledge graph is characterized by comprising the following steps:

2. The method of claim 1, wherein the step S1 comprises:

3. The method for clinical medication risk assessment based on a knowledge-graph of claim 2, wherein the S2 specifically comprises the following:

S22, writing import instructions using neo4j import statements;

4. The method for clinical medication risk assessment based on a knowledge-graph of claim 2, wherein the S3 specifically comprises the following:

S31, marking the combined-use risk between drugs;

S32, marking the drug-indication mismatch risk between drugs and disease diagnoses;

S33, marking the individual medication risk of drugs used by different patient groups.

5. The method for clinical medication risk assessment based on a knowledge-graph of claim 4, wherein the S31 specifically comprises the following steps:

6. The method for clinical medication risk assessment based on a knowledge-graph of claim 4, wherein the S32 specifically comprises the following steps:

7. The method for clinical medication risk assessment based on a knowledge-graph of claim 4, wherein the S33 specifically comprises the following steps:

8. The method of claim 1, wherein the training of the drug interaction risk model in S4 specifically comprises the following steps:

the concrete formula is as follows:

S414, taking the usage frequency of the drug pair as a feature;

9. The method for clinical medication risk assessment based on knowledge-graph of claim 1, wherein in S4, training the drug-indication mismatch risk model specifically comprises the following steps:

10. The method for clinical medication risk assessment based on knowledge-graph of claim 1, wherein in S4, the training of the individual medication risk model specifically comprises the following steps:

S437, taking the patient's weight and age range as features;

11. A clinical medication risk assessment system based on knowledge-graph is characterized by comprising the following components: