CN110880362B - Large-scale medical data knowledge mining and treatment scheme recommending system - Google Patents
Large-scale medical data knowledge mining and treatment scheme recommending system
- Publication number
- CN110880362B (application CN201911117826.5A)
- Authority
- CN
- China
- Prior art keywords
- treatment
- patient
- module
- information
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/10—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Abstract
The invention discloses a large-scale medical data knowledge mining and treatment scheme recommending system, which comprises: a data set preprocessing module for acquiring real electronic medical record data and preprocessing the electronic medical record data, which consist of a plurality of heterogeneous data sources; a disease severity prediction module for obtaining a disease severity score for each patient during treatment; a treatment effectiveness measurement module for obtaining effective treatment measure information; a patient similarity measurement module for constructing a similarity measurement relation among patients; and a drug treatment scheme recommendation module for obtaining the next-stage drug treatment scheme recommendation. The invention judges and predicts the severity of each patient's condition with a multitask bidirectional heterogeneous LSTM and, on that basis, defines a measure of treatment effectiveness. It then calculates fine-grained patient similarity and recommends the next-stage treatment scheme according to the patient's historical treatment record and the effective treatment schemes of other patients with high pathological similarity.
Description
Technical Field
The invention discloses a system that applies deep learning and introduced domain knowledge to discover and recommend effective drug treatment schemes, and belongs to the field of medical data mining.
Background
Electronic health record (EHR) data come from millions of patients and are currently collected and stored on a regular basis at various medical institutions. These EHR data consist of heterogeneous data elements, typically including demographics, diagnoses, physical examinations, sensor measurements, laboratory test results, prescribed or administered medications, and clinical notes, among others. With the rapid development of information technology and the rapid popularization of electronic medical record (EMR) systems, the amount of digital information stored in electronic medical records in China has increased dramatically over the last decade. It is widely believed that a great deal of hidden knowledge is contained in these massive data, and the various types of data in an EMR system provide a way to acquire medical knowledge, which in turn provides a basis for improving medical quality and efficiency. Specifically, EMR data have played an important role in many medical applications, especially in providing effective medication recommendations for physicians and patients, increasing the cure rate of disease, reducing the risk of death for clinical patients, reducing decision costs during treatment, and avoiding the increased medical costs caused by ineffective or harmful treatments.
While there is tremendous interest in using EMR data to improve medical performance, the gains obtained from analyzing EMR data are far smaller than what the data could provide. One reason is that the prognosis of a patient is influenced by many factors, such as the age and sex of the patient, the severity of the disease, and the treatment being administered. Although EMR data contain comprehensive information about patients, diagnosis, and treatment, there is no unified framework that integrates all relevant factors for advanced data modeling. Furthermore, EMR data are heterogeneous and longitudinal in nature. For example, a treatment record is a series of orders, where each order typically consists of a medication name, a route of administration, a dose, a start time, and an end time. In general, analyzing large-scale complex EMR data, extracting medical knowledge, and supporting decision making in treatment practice is no small challenge.
Researchers have made many useful explorations in electronic medical record data mining in order to analyze large-scale complex EMR data. According to the surveys of data mining applied to EMR [1] [2], the recurrent neural network (RNN) and its variants (LSTM, GRU), which are designed for sequence modeling, can capture the complex temporal dynamics in longitudinal EMR data and are the first choice for EMR modeling tasks. Chen, W., Wang, S. et al. [3] dynamically predicted the severity of the condition of intensive care unit (ICU) patients using a multitask RNN that integrates laboratory test results for different organs of the patient. However, the method in [3] does not make full use of the heterogeneous data in EMR; for example, the diagnosis results and the description of the disease are also meaningful for the task. Jin B., Yang H. et al. [4] developed a treatment engine based on historical EMR data to provide patients with next-stage prescriptions based on their condition, laboratory results, treatment records, and demographic information. [4] mainly proposed three different LSTM variants to address the problem of data heterogeneity, but did not propose an overall framework for treatment recommendation. Because its next-stage prescription is derived from historical treatment, it does not address the "cold start" problem, i.e., the treatment recommendation for a patient hospitalized for the first time, whereas the present invention recognizes that the first 24 hours of treatment are critical for critically ill patients. Leilei Sun, Chuanren Liu et al. [5] proposed a method for developing and recommending data-driven automatic treatment regimens that mainly uses the important information in medical orders; the clustering method it uses ultimately yields only a few types of drug treatment combinations, which cannot satisfy more refined treatment recommendations.
Meanwhile, none of the above schemes takes into account interactions between drugs or the drug allergy history of patients.
References:
[1] Shickel B, Tighe P J, Bihorac A, et al. Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis [J]. IEEE Journal of Biomedical and Health Informatics, 2017: 1-1.
[2] Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review [J]. Journal of the American Medical Informatics Association, 2018.
[3] Chen W, Wang S, Long G, Yao L, Sheng Q Z, Li X. Dynamic illness severity prediction via multi-task RNNs for intensive care unit. In: ICDM (2018).
[4] Jin B, Yang H, Sun L, Liu C, Qu Y, Tong J. A treatment engine by predicting next-period prescriptions. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM (2018), pp. 1608-1616.
[5] Sun L, Liu C, Guo C, Xiong H, Xie Y. Data-driven Automatic Treatment Regimen Development and Recommendation. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA (2016), pp. 1865-1874.
Disclosure of Invention
The invention aims to provide a large-scale medical data knowledge mining and treatment scheme recommending system that applies a heterogeneous recurrent neural network and introduced domain knowledge to find effective treatment segments in large-scale electronic medical records, and that recommends the patient's next-stage drug treatment in an explainable way based on fine-grained patient similarity, so as to meet the modeling requirements and achieve a good effect.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A large-scale medical data knowledge mining and treatment scheme recommendation system comprises a data set preprocessing module, a disease severity prediction module, a treatment effectiveness measurement module, a patient similarity measurement module, and a drug treatment scheme recommendation module, wherein:
the data set preprocessing module is used for acquiring real electronic medical record data and preprocessing the electronic medical record data, which consist of a plurality of heterogeneous data sources; the preprocessed electronic medical record comprises five types of patient information, namely demographic information, diagnostic description information, laboratory indicators, drug prescriptions, and discharge results;
the disease severity prediction module is used for training a bidirectional heterogeneous LSTM network on the demographic information, diagnostic description information, and laboratory indicator data obtained by the data set preprocessing module, to obtain a disease severity score for each patient during treatment;
the treatment effectiveness measurement module is used for obtaining effective treatment measure information from the disease severity score obtained by the disease severity prediction module, the degree of influence of the current treatment on the next stage, and the discharge result obtained by the data set preprocessing module;
the patient similarity measurement module is used for constructing a similarity measurement relation among patients and calculating the similarity between patients from the information deposited in the bidirectional heterogeneous LSTM network and the static demographic information of the patients;
the drug treatment scheme recommendation module is used for introducing the time sequence of the drug prescription information, on the basis of the effective treatment measure information obtained by the treatment effectiveness measurement module and the patient similarity measurement relation obtained by the patient similarity measurement module, to obtain the next-stage drug treatment scheme recommendation; filtering out treatment combinations that contain drugs with adverse interactions or drugs to which the current patient is allergic; and providing the doctor with the top s effective treatments of highly similar patients together with examples of treatments that were ineffective or had negative effects.
The electronic medical record data come from the critical care database MIMIC-III v1.4.
The disease severity prediction module is a bidirectional heterogeneous LSTM network, and the overall structure of the bidirectional heterogeneous LSTM network is as follows:
where Input is the input of the heterogeneous LSTM network, comprising the laboratory physiological indicators, demographic information, and diagnostic description information, and $\hat{y}$ is the predicted disease severity score;
the bi-directional heterogeneous LSTM for each time step t is defined as follows:
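$$
\begin{aligned}
f_t &= \sigma(W_f[\mathit{Checkup}_t, h_{t-1}] + b_f), & f'_t &= \sigma(W'_f[\mathit{Checkup}_t, h'_{t+1}] + b'_f) \\
i_t &= \sigma(W_i[\mathit{Checkup}_t, h_{t-1}] + b_i), & i'_t &= \sigma(W'_i[\mathit{Checkup}_t, h'_{t+1}] + b'_i) \\
o_t &= \sigma(W_o[\mathit{Checkup}_t, h_{t-1}] + b_o), & o'_t &= \sigma(W'_o[\mathit{Checkup}_t, h'_{t+1}] + b'_o) \\
d_t &= \sigma(W_d C_{t-1} + b_d), & d'_t &= \sigma(W'_d C'_{t+1} + b'_d) \\
h_t &= o_t \tanh(C_t), & h'_t &= o'_t \tanh(C'_t)
\end{aligned}
$$
$$
D = \mathrm{relu}(W_{dense}[h_t, h'_t] + W_{static} P_{Static} + b_{dense})
$$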
where $\sigma$ is the sigmoid function $\sigma(x) = 1/(1+e^{-x})$, $\tanh$ is the hyperbolic tangent function $\tanh(x) = (e^{x}-e^{-x})/(e^{x}+e^{-x})$, and relu is the ReLU function $f(x) = \max(0, x)$; $W$ denotes the weight matrices and $b$ the bias terms, and both are parameters to be learned by the network. $\mathit{Diagnosis}_t$ and $\mathit{Checkup}_t$ are the diagnostic description information and the laboratory indicators at time $t$, respectively. $i$, $f$, $o$, $C$, and $h$ are the input gate, forget gate, output gate, memory cell, and hidden state, respectively. The decomposition gate $d_t$ is constructed from the cell state $C_{t-1}$ and controls the amount of added information. Under the control of the forget gate $f_t$, the additional candidate value and the cell state at the previous time, $C_{t-1}$, are added to the current cell state $C_t$; the input gate $i_t$ controls the degree to which the new state information is added to the current cell state $C_t$.
The forward LSTM and the backward LSTM have the same structure; symbols without a prime denote the forward LSTM network, and symbols with a prime denote the backward LSTM network. A fully connected layer $D$ is added to combine the static demographic information $P_{Static}$ with the outputs of the forward and backward LSTMs: $W_{dense}$ is the weight applied to the outputs of the forward and backward LSTMs, $W_{static}$ is the weight applied to the static information, and $b_{dense}$ is the bias term of this layer. The result is then fed into a sigmoid layer (the subscript out denotes the output layer), which finally yields the predicted disease severity score $\hat{y}$. The model uses the SOFA score as the true value $y$ of the cross-entropy loss, and the bidirectional heterogeneous LSTM model is trained by minimizing the cross-entropy, finally giving a disease severity score curve for each patient. The structure and parameters of the trained bidirectional heterogeneous LSTM network are then frozen, and when a new patient is admitted, the network is used to obtain the real-time disease severity score of the new patient.
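As a minimal illustration of the fully connected layer described above, the following sketch evaluates $D = \mathrm{relu}(W_{dense}[h_t, h'_t] + W_{static} P_{Static} + b_{dense})$ in plain Python. The function names, the list-based vector representation, and the toy weights are assumptions made for this example; the sigmoid output layer is omitted because its formula is not reproduced in the text.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x):
    """Element-wise ReLU, f(x) = max(0, x)."""
    return [max(0.0, v) for v in x]


def dense_layer(h_fwd, h_bwd, p_static, W_dense, W_static, b_dense):
    """D = relu(W_dense [h_t, h'_t] + W_static P_Static + b_dense)."""
    concat = list(h_fwd) + list(h_bwd)      # [h_t, h'_t]
    dyn = matvec(W_dense, concat)           # contribution of the LSTM outputs
    sta = matvec(W_static, p_static)        # contribution of static demographics
    return relu([a + s + b for a, s, b in zip(dyn, sta, b_dense)])


# Toy values (hypothetical): 2 hidden units per direction, 1 static feature.
h_fwd, h_bwd, p_static = [0.5, -0.2], [0.1, 0.3], [1.0]
W_dense = [[1.0, 0.0, 0.0, 1.0], [0.0, -1.0, -1.0, 0.0]]
W_static = [[0.2], [-0.5]]
b_dense = [0.0, 0.1]
D = dense_layer(h_fwd, h_bwd, p_static, W_dense, W_static, b_dense)
print(D)
```

In the toy call the second unit receives a negative pre-activation, so ReLU clamps it to zero.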
The treatment effectiveness measurement module obtains effective treatment measure information from three aspects: the disease severity score, the degree of influence $K$ of the current treatment on the next stage, and the discharge result $R = \{0001, 0010, 0100, 1000\}$;
wherein the degree of influence $K$ of the current treatment on the next stage is represented by the slope of the disease severity score curve, $K$ being defined as:
where $T$ is the length of the time window and $y_T$ is each disease severity score within the $t$-th time window;
the effective treatment measure information is $M = Q[y_T; K; R]$.
The patient similarity measurement relation constructed by the patient similarity measurement module is as follows:
patient $z$ is represented as:
where two components come from the forward LSTM network, two come from the backward LSTM network, and the remaining component is the static demographic information;
the similarity between patients is defined as the 2-norm of the difference between two patient representations:
$$\mathrm{Similar}\langle P_z, P_j \rangle = \lVert P_z - P_j \rVert_2$$
where $j$ denotes the $j$-th patient.
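The 2-norm measure above can be sketched in a few lines of Python. The helper names and the flat-list patient representation are assumptions for illustration; the exact components of the patient vector are not reproduced in the text. Because the measure is a norm of a difference, a smaller value is read here as a more similar patient.

```python
import math


def patient_distance(p_z, p_j):
    """Similar<P_z, P_j> = ||P_z - P_j||_2 for two equal-length vectors."""
    if len(p_z) != len(p_j):
        raise ValueError("patient representations must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_z, p_j)))


def most_similar(target, others, s):
    """Return the ids of the s patients closest to the target (smallest 2-norm)."""
    ranked = sorted(others.items(), key=lambda kv: patient_distance(target, kv[1]))
    return [patient_id for patient_id, _ in ranked[:s]]


# Hypothetical patient vectors.
target = [0.0, 0.0]
others = {"p1": [3.0, 4.0], "p2": [1.0, 0.0], "p3": [0.0, 2.0]}
print(most_similar(target, others, s=2))
```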
The drug treatment scheme recommendation module takes the effective treatment measure information obtained by the treatment effectiveness measurement module and the inter-patient similarity obtained by the patient similarity measurement module, introduces the time sequence of the drug prescription information, and constructs a similarity measure-treatment effectiveness measure-medication-time tensor table.
Compared with the prior art, the technical scheme adopted by the invention has the following beneficial effects:
(1) The system of the invention mines effective treatment patterns from large-scale real electronic medical records. These patterns are fine-grained and short-term, unlike those of existing treatment recommendation engines, which cover only generally coarse-grained treatment regimens. Doctors can therefore be guided toward more refined treatment.
(2) The system of the invention recommends medication individually according to the patient's physiological condition, treatment history, medication history, and so on, and updates the recommendation dynamically.
(3) The invention introduces drug interaction knowledge and patient allergy history, which reduces interactions between drugs and allergic reactions and can increase the reliability and effectiveness of treatment. Through fine-grained patient similarity comparison, complete treatment courses with both positive and negative outcomes are extracted and provided to doctors, which enhances the interpretability and reliability of the drug recommendation. Doctors can judge the predicted effectiveness of a recommended treatment scheme from the treatment cases of similar patients and the different effects produced by different schemes, and then decide whether to adopt it or to use their own improved treatment scheme.
Drawings
FIG. 1 is a schematic diagram of the large-scale medical data knowledge mining and treatment scheme recommendation system according to the present invention.
Detailed Description
The present invention is further explained below.
FIG. 1 shows the large-scale medical data knowledge mining and treatment scheme recommendation system according to the present invention, which includes a data set preprocessing module, a disease severity prediction module, a treatment effectiveness measurement module, a patient similarity measurement module, and a drug treatment scheme recommendation module, wherein:
the data set preprocessing module is used for acquiring real electronic medical record data and preprocessing the electronic medical record data, which consist of a plurality of heterogeneous data sources; the preprocessed electronic medical record comprises five types of patient information, namely demographic information, diagnostic description information, laboratory indicators, drug prescriptions, and discharge results;
the disease severity prediction module is used for training the bidirectional heterogeneous LSTM network on the demographic information, diagnostic description information, and laboratory indicator data obtained by the data set preprocessing module, to obtain a disease severity score for each patient during treatment;
the treatment effectiveness measurement module is used for obtaining effective treatment measure information from the disease severity score obtained by the disease severity prediction module, the degree of influence of the current treatment on the next stage, and the discharge result obtained by the data set preprocessing module;
the patient similarity measurement module is used for constructing a similarity measurement relation among patients and calculating the similarity between patients from the information deposited in the bidirectional heterogeneous LSTM network and the static demographic information of the patients;
the drug treatment scheme recommendation module is used for introducing the time sequence of the drug prescription information, on the basis of the effective treatment measure information obtained by the treatment effectiveness measurement module and the patient similarity measurement relation obtained by the patient similarity measurement module, to obtain the next-stage drug treatment scheme recommendation; filtering out treatment combinations that contain drugs with adverse interactions or drugs to which the current patient is allergic; and providing the doctor with the top s effective treatments of highly similar patients together with examples of treatments that were ineffective or had negative effects.
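The filtering and ranking behaviour of the recommendation module can be sketched roughly as follows. This is only an illustration under stated assumptions: the candidate record layout (`drugs`, `distance`, `effectiveness`), the threshold of zero separating effective from ineffective treatments, and the drug names are all hypothetical, and the tensor table construction is not modelled.

```python
def is_safe(drugs, allergies, interactions):
    """A combination is unsafe if it contains an allergy drug or an interacting pair."""
    drug_set = set(drugs)
    if drug_set & set(allergies):
        return False
    return not any(frozenset(pair) <= drug_set for pair in interactions)


def recommend(candidates, allergies, interactions, s):
    """Return (top-s effective treatments of the closest patients, cautionary examples)."""
    safe = [c for c in candidates if is_safe(c["drugs"], allergies, interactions)]
    # Assumption: effectiveness > 0 counts as effective; smaller distance = more similar.
    effective = sorted((c for c in safe if c["effectiveness"] > 0),
                       key=lambda c: c["distance"])
    cautionary = [c for c in candidates if c["effectiveness"] <= 0]
    return effective[:s], cautionary


candidates = [
    {"drugs": ["drug_a", "drug_b"], "distance": 0.40, "effectiveness": 1.8},
    {"drugs": ["drug_c"], "distance": 0.20, "effectiveness": 1.1},
    {"drugs": ["drug_a", "drug_d"], "distance": 0.10, "effectiveness": 2.5},
    {"drugs": ["drug_e"], "distance": 0.30, "effectiveness": -0.6},
    {"drugs": ["drug_p"], "distance": 0.05, "effectiveness": 2.0},
]
allergies = {"drug_p"}                    # current patient's allergy history
interactions = [("drug_a", "drug_d")]     # known adverse drug pair
top, cautionary = recommend(candidates, allergies, interactions, s=2)
print([c["drugs"] for c in top], [c["drugs"] for c in cautionary])
```

Keeping the ineffective cases in a separate list mirrors the idea of showing the doctor both positive and negative examples from similar patients.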
The realization process of the large-scale medical data knowledge mining and treatment scheme recommendation system provided by the invention is as follows:
Step 1: The data set preprocessing module preprocesses the electronic medical record (EMR) data. EMR databases are typically composed of a variety of heterogeneous data sources, and the data retrieved from them are diverse, incomplete, and redundant, which greatly affects the final mining results. Accordingly, the EMR data must be preprocessed to ensure that they are accurate, complete, and consistent. First, the EMR data are improved by filling in missing values, smoothing noise, and correcting data inconsistencies. Second, EMR data may come from multiple EMR systems, and different data sources naturally lead to heterogeneity problems, which mainly appear as inconsistent data attributes, such as attribute names and measurement units. For example, the specific gravity of urine may be recorded as SG or as specific gravity, and triglyceride may be measured in mmol/L or sometimes in mg/dL. Redundant data are also processed; redundancy mainly appears as repeated records of data attributes or as inconsistent ways of expressing the same attribute.
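The preprocessing steps described above (harmonizing attribute names and units, removing repeated records, filling in missing values) can be illustrated with a small sketch. The alias table, the forward-fill strategy, and the function names are assumptions chosen for this example rather than details taken from the patent; the triglyceride factor of about 88.57 mg/dL per mmol/L is the commonly used conversion.

```python
# Hypothetical alias table mapping raw attribute names to one canonical name.
ATTRIBUTE_ALIASES = {
    "sg": "specific_gravity",
    "specific gravity": "specific_gravity",
    "tg": "triglyceride",
    "triglyceride": "triglyceride",
}

# Commonly used factor for triglycerides: 1 mmol/L is about 88.57 mg/dL.
TRIGLYCERIDE_MGDL_PER_MMOLL = 88.57


def normalize_measurement(name, value, unit):
    """Return (canonical_name, value, unit), with triglyceride expressed in mmol/L."""
    key = name.strip().lower()
    canonical = ATTRIBUTE_ALIASES.get(key, key)
    unit = unit.strip().lower()
    if canonical == "triglyceride" and unit == "mg/dl":
        value = value / TRIGLYCERIDE_MGDL_PER_MMOLL
        unit = "mmol/l"
    return canonical, value, unit


def drop_duplicates(records):
    """Remove repeated records while preserving their original order."""
    seen, unique = set(), []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def forward_fill(values):
    """Fill missing values (None) with the last observed value (one possible strategy)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


raw = [("SG", 1.020, ""), ("Specific Gravity", 1.020, ""), ("Triglyceride", 177.14, "mg/dL")]
clean = drop_duplicates([normalize_measurement(*r) for r in raw])
print(clean)
```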
The preprocessed electronic medical records typically contain five categories of patient information: demographic information, diagnostic description information, laboratory indicators (physical examination results), drug prescriptions (medical orders), and discharge results (mortality).
Demographic information includes the patient's age, gender, address of residence, educational background, religion, race, marital status, weight, height, and other information. This information is important in clinical decision-making; for example, it influences the design of the overall treatment regimen and the dosage of drugs. Demographic information can be regarded as static during hospitalization and is denoted $P_{Static}$, formalized as:
$$P_{Static} = \{P_{Age}, P_{Gender}, P_{Site}, P_{Education}, \ldots\}$$
The diagnostic description information is given by the doctor and includes the type of disease, a qualitative description of its severity, complications, and so on. A patient may suffer from several diseases, and during treatment a disease may gradually heal or may progress, with new diseases or additional complications appearing. It can therefore be viewed as a dynamic process, with $\mathit{Diagnosis}_t$ representing the diagnostic description information at time $t$. The diagnostic description information is formalized as:
Laboratory physiological indicators (physical examination results): during the course of treatment, multiple examinations are performed while the patient is hospitalized in order to assess the efficacy of the treatment accurately. The invention uses $\mathit{Checkup}_t$
to denote the results of the physical examination at time $t$, where the $j$-th component is the value of the $j$-th laboratory physiological indicator at time $t$.
The drug prescription (medical order) includes the drug name, route of administration, daily dosage, start time, and end time. The invention uses $\mathit{Treatment}_t$ to represent a drug prescription as a combination of a series of drugs, and the prescription is formalized as:
where each drug entry records the name of the drug used; the route of administration, such as intravenous (IV), intramuscular (IM), or oral (per os, PO); the dose per administration; the number of administrations per day; and the time of administration on day $d$. $dr$ indicates that the prescription contains $dr$ different drugs in total. In the present invention, a time window of a specific size is regarded as one complete treatment, so the medication is rewritten as:
Discharge result (mortality): when a patient is discharged, the doctor gives a discharge evaluation according to the patient's actual condition. The result can be cure, improvement, ineffective, or death, and the four results are expressed by a one-hot code $R = \{0001, 0010, 0100, 1000\}$.
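A one-hot encoding of the four discharge results might look like the sketch below. The text gives only the code set {0001, 0010, 0100, 1000}; which code corresponds to which outcome is an assumption made here for illustration, as are the outcome labels used as dictionary keys.

```python
# Assumed ordering of outcomes; the patent does not state which code maps to which result.
OUTCOMES = ("cure", "improvement", "ineffective", "death")


def one_hot(outcome):
    """Encode a discharge outcome as a 4-character one-hot string."""
    index = OUTCOMES.index(outcome)  # raises ValueError for unknown outcomes
    return "".join("1" if i == index else "0" for i in range(len(OUTCOMES)))


codes = {outcome: one_hot(outcome) for outcome in OUTCOMES}
print(codes)
```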
Step 2: The disease severity prediction module predicts the criticality of ICU patients at dense time intervals by building a bidirectional heterogeneous LSTM network W1.
In the ICU, the SOFA scoring system reflects the severity of a patient's condition. However, SOFA assessments are performed at long intervals, such as every 24 hours, which leads to a slow response for critically ill patients. Predicting the disease severity score at denser intervals is an effective way to monitor ICU patients rapidly.
The overall structure of the bidirectional heterogeneous LSTM network W1 is as follows:
where Input comprises the laboratory physiological indicators (physical examination results), demographic information, and diagnostic description information; the heterogeneous LSTM can take these three types of heterogeneous data as input. $\hat{y}$ is the predicted disease severity score.
The LSTM at each time step $t$ comprises $i$, $f$, $o$, $C$, and $h$, which are the input gate, forget gate, output gate, memory cell, and hidden state, respectively. The forget gate controls how much memory is forgotten, the input gate controls the update of each cell, and the output gate controls the exposure of the cell state. If the laboratory physiological indicators (physical examination results), demographic information, diagnostic description information, and so on were each constructed as a separate input sequence with its own sequential hidden states, the fully connected hidden neurons of the different time series could confound the intrinsic dynamics of each series. To allow flexible interaction among multi-faceted time series, the invention retains only the memory related to the physiological indicators. Under the control of the previous memory, the additional time series of diagnostic description information affects the cell state only through a dedicated structure called the decomposition gate. The decomposition gate $d_t$ is constructed from the cell state $C_{t-1}$ and controls the amount of added information. Under the control of the decomposition gate, the additional candidate value is added to the cell state $C_t$.
The forward LSTM and the backward LSTM have the same structure; symbols without a prime denote the forward LSTM network, and symbols with a prime denote the backward LSTM network. A fully connected layer $D$ is added to combine the static demographic information $P_{Static}$ with the outputs of the forward and backward LSTMs: $W_{dense}$ is the weight applied to the outputs of the forward and backward LSTMs, $W_{static}$ is the weight applied to the static information, and $b_{dense}$ is the bias term of this layer. The result is then fed into a sigmoid layer (the subscript out denotes the output layer), which finally yields the predicted disease severity score $\hat{y}$. The model uses the SOFA score as the true value $y$ of the cross-entropy loss, and the bidirectional heterogeneous LSTM model is trained by minimizing the cross-entropy, finally giving a disease severity score curve for each patient. The structure and parameters of the trained bidirectional heterogeneous LSTM network are then frozen, and when a new patient is admitted, the network is used to obtain the real-time disease severity score of the new patient.
The bidirectional heterogeneous LSTM is defined as follows:
$$
\begin{aligned}
f_t &= \sigma(W_f[\mathit{Checkup}_t, h_{t-1}] + b_f) \\
i_t &= \sigma(W_i[\mathit{Checkup}_t, h_{t-1}] + b_i) \\
o_t &= \sigma(W_o[\mathit{Checkup}_t, h_{t-1}] + b_o) \\
d_t &= \sigma(W_d C_{t-1} + b_d) \\
h_t &= o_t \tanh(C_t) \\
f'_t &= \sigma(W'_f[\mathit{Checkup}_t, h'_{t+1}] + b'_f) \\
i'_t &= \sigma(W'_i[\mathit{Checkup}_t, h'_{t+1}] + b'_i) \\
o'_t &= \sigma(W'_o[\mathit{Checkup}_t, h'_{t+1}] + b'_o) \\
d'_t &= \sigma(W'_d C'_{t+1} + b'_d) \\
h'_t &= o'_t \tanh(C'_t) \\
D &= \mathrm{relu}(W_{dense}[h_t, h'_t] + W_{static} P_{Static} + b_{dense})
\end{aligned}
$$
where $\sigma$ is the sigmoid function $\sigma(x) = 1/(1+e^{-x})$, $\tanh$ is the hyperbolic tangent function $\tanh(x) = (e^{x}-e^{-x})/(e^{x}+e^{-x})$, and relu is the ReLU function $f(x) = \max(0, x)$; $W$ denotes the weight matrices and $b$ the bias terms, and both are parameters to be learned by the network. $\mathit{Diagnosis}_t$ and $\mathit{Checkup}_t$ are the diagnostic description information and the laboratory indicators at time $t$, respectively (abbreviated as Di and Ch). $i$, $f$, $o$, $C$, and $h$ are the input gate, forget gate, output gate, memory cell, and hidden state, respectively. The decomposition gate $d_t$ is constructed from the cell state $C_{t-1}$ and controls the amount of added information. Under the control of the forget gate $f_t$, the additional candidate value and the cell state at the previous time, $C_{t-1}$, are added to the current cell state $C_t$; the input gate $i_t$ controls the degree to which the new state information is added to the current cell state $C_t$.
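To make the gate equations concrete, here is a small sketch of the forward-direction gate activations and the hidden-state formula $h_t = o_t \tanh(C_t)$. It covers only the equations that appear in the text: the candidate value and the cell-state update are not reproduced there, so the cell state is passed in rather than computed. The parameter dictionary keys, function names, and toy weights are assumptions for this example.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, b):
    """Compute W x + b for a matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def forward_gates(checkup_t, h_prev, c_prev, p):
    """Gate activations of the forward direction for one time step."""
    z = list(checkup_t) + list(h_prev)                       # [Checkup_t, h_{t-1}]
    return {
        "f": [sigmoid(v) for v in affine(p["W_f"], z, p["b_f"])],       # forget gate
        "i": [sigmoid(v) for v in affine(p["W_i"], z, p["b_i"])],       # input gate
        "o": [sigmoid(v) for v in affine(p["W_o"], z, p["b_o"])],       # output gate
        "d": [sigmoid(v) for v in affine(p["W_d"], c_prev, p["b_d"])],  # decomposition gate
    }


def hidden_state(o_t, c_t):
    """h_t = o_t * tanh(C_t), element-wise."""
    return [o * math.tanh(c) for o, c in zip(o_t, c_t)]


# Toy parameters (hypothetical): 2 laboratory indicators, 1 hidden unit.
params = {
    "W_f": [[0.1, -0.2, 0.3]], "b_f": [0.0],
    "W_i": [[0.0, 0.5, -0.1]], "b_i": [0.1],
    "W_o": [[0.2, 0.2, 0.2]], "b_o": [0.0],
    "W_d": [[0.4]], "b_d": [0.0],
}
gates = forward_gates([1.0, 0.5], [0.0], [0.2], params)
h_t = hidden_state(gates["o"], [0.3])   # C_t supplied externally for illustration
print(gates, h_t)
```

The backward direction would use the primed parameters with $h'_{t+1}$ and $C'_{t+1}$ in place of $h_{t-1}$ and $C_{t-1}$.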
Step 3: the treatment effectiveness measurement module defines which treatments are effective, using the disease severity score of each patient during treatment obtained in Step 2. The structure and parameters of the trained network W1 are fixed, and when a new patient is admitted, a real-time disease severity score can be obtained for that patient by inputting the current laboratory physiological characteristic indices (physical examination results), demographic information and diagnosis description information.
For treatments in the EMR data, a measure of treatment effectiveness is defined. The evaluation is based on three considerations: the current disease severity of the patient, the degree of influence of the current treatment on the next stage (time window), and the treatment outcome at final discharge. The degree of influence K of the current treatment on the next stage is represented by the slope of the disease severity score curve; for ease of calculation, and because the score curve is not smooth, K is defined as:
where T is the length of the time window and $y_T$ denotes the disease severity scores within the t-th time window.
Treatment effectiveness is $M = Q\,[y_T; K; R]$. In this embodiment, after $y_T$, K and R are normalized, $Q = [1, 2, 1]$.
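The weighted combination M = Q [y_T; K; R] with Q = [1, 2, 1] can be sketched as follows. The sketch takes the three components as already-normalized scalars and does not compute K. Min-max scaling is shown only as one possible normalization; the text says the components are normalized but not how, and it does not fix their sign convention, so `min_max` and both function names are assumptions.

```python
def min_max(values):
    """Scale a list of numbers to [0, 1] (one possible normalization; assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def treatment_effectiveness(y_T, K, R, Q=(1.0, 2.0, 1.0)):
    """M = Q . [y_T; K; R] for normalized scalar inputs."""
    return Q[0] * y_T + Q[1] * K + Q[2] * R


scaled = min_max([2.0, 4.0, 6.0])
M = treatment_effectiveness(y_T=0.5, K=0.25, R=1.0)
```

With Q = [1, 2, 1] the slope term K carries twice the weight of the current severity and of the discharge outcome, so the short-term effect of a treatment dominates the measure.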
Step 4: the patient similarity measurement module constructs a similarity measurement relation between patients. Information such as the laboratory physiological characteristic indices, demographic information, diagnosis description information and disease severity of a patient is important for constructing a similarity measure between patients. When the network W1 is used to compute the disease severity score, this patient information is already embedded in the network.
Each patient is represented as:
wherein two of the components are taken from the forward LSTM network, two are taken from the backward LSTM network, and $P_{Static}$ is the static demographic information.
Inter-patient similarity is defined as the 2-norm of the subtraction of two patient representations:
\[
\mathrm{Similar}\langle P_z, P_j \rangle = \lVert P_z - P_j \rVert_2 ;
\]
wherein j represents the jth patient.
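The similarity definition above is a Euclidean distance between patient representation vectors, so a smaller value means two patients are more alike. A minimal sketch follows; the patient identifiers and the two-dimensional vectors are made up for illustration, and `most_similar` is a hypothetical helper.

```python
import math


def patient_distance(p_z, p_j):
    """Similar<P_z, P_j> = ||P_z - P_j||_2 for equal-length vectors."""
    if len(p_z) != len(p_j):
        raise ValueError("patient representations must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_z, p_j)))


def most_similar(query, cohort, k=1):
    """Return the ids of the k cohort patients closest to the query vector."""
    ranked = sorted(cohort.items(), key=lambda kv: patient_distance(query, kv[1]))
    return [pid for pid, _ in ranked[:k]]


cohort = {"p1": [0.0, 0.0], "p2": [3.0, 4.0], "p3": [1.0, 1.0]}
d = patient_distance([0.0, 0.0], cohort["p2"])
nearest = most_similar([0.0, 0.0], cohort, k=2)
```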
Step 5: the drug treatment scheme recommendation module searches for and recommends the drug treatment scheme for the next stage, and provides interpretability through similar positive and negative treatment samples. It uses the effective treatment measure information obtained by the treatment effectiveness measurement module and the patient similarity measurement relation obtained by the patient similarity measurement module. A time sequence of drug prescription information is introduced, and a tensor table indexed by similarity measure, treatment effectiveness measure, prescription and time is constructed. When a new patient is hospitalized, the prescription that has the highest treatment effectiveness at the current stage and comes from the patients most similar to the new patient is recommended. Note that as the recommended treatment proceeds, the patient's state changes, and so does the similarity between the current patient and the patients in the EMR data; the recommendation therefore changes dynamically with the patient's state.
Considering adverse reactions between drugs and the patient's allergy history, the invention, when making a recommendation, filters out drug combinations with severe adverse reactions and prescriptions containing drugs to which the patient is currently allergic, and selects the next-best option instead, so that the recommendation is more reliable for the current patient. In addition, the top s effective treatment examples and the top s ineffective or negative-effect treatment examples with high similarity to the patient are provided to the doctor to support a better decision.
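The filtering and fallback behaviour described above can be sketched as follows. The text does not say how similarity and effectiveness are combined into a single ranking, so this sketch assumes samples are ordered by distance first and by effectiveness second; the `TreatmentSample` type, the effectiveness threshold of 0, and the drug names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreatmentSample:
    patient_id: str
    drugs: frozenset
    distance: float       # Similar<P_z, P_j>; smaller means more similar
    effectiveness: float  # treatment effectiveness M


def is_safe(drugs, allergies, adverse_pairs):
    """Reject prescriptions with an allergen or a known adverse drug pair."""
    if drugs & allergies:
        return False
    return not any(pair <= drugs for pair in adverse_pairs)


def rank(samples):
    # Assumed ordering: closest patients first, ties broken by higher M.
    return sorted(samples, key=lambda x: (x.distance, -x.effectiveness))


def recommend(samples, allergies, adverse_pairs, s=2, threshold=0.0):
    effective = rank(x for x in samples if x.effectiveness > threshold)
    negative = rank(x for x in samples if x.effectiveness <= threshold)
    safe = [x for x in effective if is_safe(x.drugs, allergies, adverse_pairs)]
    best = safe[0] if safe else None  # next-best option after filtering
    return best, effective[:s], negative[:s]


samples = [
    TreatmentSample("a", frozenset({"drugA", "drugB"}), 0.10, 0.9),
    TreatmentSample("b", frozenset({"drugC"}), 0.20, 0.8),
    TreatmentSample("c", frozenset({"drugD"}), 0.30, 0.7),
    TreatmentSample("d", frozenset({"drugE"}), 0.15, -0.4),
]
best, positives, negatives = recommend(
    samples,
    allergies=frozenset({"drugD"}),
    adverse_pairs={frozenset({"drugA", "drugB"})},
)
```

Here the closest effective sample is skipped because it contains an adverse drug pair, so the next-best safe prescription is returned, while the unfiltered positive and negative examples remain available for the doctor to inspect.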
In this embodiment, the electronic medical record data come from the critical care database MIMIC-III. MIMIC-III is a real clinical database containing health data for more than 40,000 patients admitted to the ICU of the Beth Israel Deaconess Medical Center over an 11-year period. The invention uses the latest version, MIMIC-III v1.4, which includes 50206 medical treatment records covering 6695 different diseases and 4127 drugs. The embodiment excludes patients under 15 years of age and patients who stayed in the ICU for less than 48 hours. Children are excluded because the normal ranges of medical indices differ between adults and children, and the 48-hour ICU requirement ensures sufficient data for analysis. Patients with large amounts of missing data are also excluded, because over-estimating missing values may introduce bias with negative effects. Finally, 3255 patients were selected for modeling and analysis.
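The cohort selection rules above can be expressed as a simple filter. This is a sketch only: the `IcuStay` fields are hypothetical and are not MIMIC-III column names, and the `max_missing` cut-off is an assumed placeholder because the text does not quantify "large amounts of missing data".

```python
from dataclasses import dataclass


@dataclass
class IcuStay:
    patient_id: str
    age_years: float
    icu_hours: float
    missing_fraction: float  # share of expected measurements that are absent


def include(stay, min_age=15, min_hours=48, max_missing=0.5):
    """Keep stays of patients aged 15+ with at least 48 ICU hours and enough data."""
    return (
        stay.age_years >= min_age
        and stay.icu_hours >= min_hours
        and stay.missing_fraction <= max_missing  # assumed threshold
    )


stays = [
    IcuStay("p1", 62, 120, 0.10),
    IcuStay("p2", 9, 200, 0.05),   # excluded: under 15
    IcuStay("p3", 45, 30, 0.02),   # excluded: under 48 hours
    IcuStay("p4", 70, 96, 0.80),   # excluded: too much missing data
]
cohort_ids = [s.patient_id for s in stays if include(s)]
```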
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.
Claims (5)
1. A large-scale medical data knowledge mining and treatment scheme recommendation system is characterized in that: the system comprises a data set preprocessing module, a disease severity predicting module, a treatment effectiveness measuring module, a patient similarity measuring module and a drug treatment scheme recommending module, wherein:
the data set preprocessing module is used for acquiring real electronic medical record data and preprocessing the electronic medical record data consisting of various heterogeneous data sources, and the preprocessed electronic medical record comprises five types of patient information, namely static demographic information $P_{Static}$, diagnosis description information Diagnosines, laboratory indices Chechup, drug prescriptions Treatment and discharge result R;
the disease severity prediction module is used for training the bidirectional heterogeneous LSTM network through the demographic information, the diagnosis description information and the laboratory index data which are obtained by the data set preprocessing module to obtain a disease severity score of each patient in the treatment process; the disease severity prediction module is a bidirectional heterogeneous LSTM network, and the overall structure of the bidirectional heterogeneous LSTM network is as follows:
wherein Input is the input of the heterogeneous LSTM network, comprising the laboratory physiological characteristic indices, demographic information and diagnosis description information, and the output is the disease severity score;
the bi-directional heterogeneous LSTM for each time step t is defined as follows:
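\[
\begin{aligned}
f_t &= \sigma(W_f[\mathrm{Chechup}_t, h_{t-1}] + b_f) & f'_t &= \sigma(W'_f[\mathrm{Chechup}_t, h'_{t+1}] + b'_f) \\
i_t &= \sigma(W_i[\mathrm{Chechup}_t, h_{t-1}] + b_i) & i'_t &= \sigma(W'_i[\mathrm{Chechup}_t, h'_{t+1}] + b'_i) \\
o_t &= \sigma(W_o[\mathrm{Chechup}_t, h_{t-1}] + b_o) & o'_t &= \sigma(W'_o[\mathrm{Chechup}_t, h'_{t+1}] + b'_o) \\
d_t &= \sigma(W_d C_{t-1} + b_d) & d'_t &= \sigma(W'_d C'_{t+1} + b'_d) \\
h_t &= o_t \tanh(C_t) & h'_t &= o'_t \tanh(C'_t) \\
D &= \mathrm{relu}(W_{dense}[h_t, h'_t] + W_{static} P_{Static} + b_{dense})
\end{aligned}
\]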
wherein σ is the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$, tanh is the hyperbolic tangent function $\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$, and ReLU is the function f(x) = max(0, x); W denotes the weight matrices and b the bias terms, and W and b are the parameters to be learned by the network; $\mathrm{Diagnosines}_t$ and $\mathrm{Chechup}_t$ are the diagnosis description information and the laboratory indices at time t, respectively; i, f, o, C and h are the input gate, forget gate, output gate, memory cell and hidden state, respectively; the cell state $C_{t-1}$ is used to construct the decomposition gate $d_t$, which controls the amount of added information; the forget gate $f_t$ controls how the additional candidate value and the cell state $C_{t-1}$ of the previous time step are added to the current cell state $C_t$; the input gate $i_t$ controls the degree to which the new state information is added to the current cell state $C_t$;
the forward LSTM and the backward LSTM have the same structure, the forward LSTM network is denoted by symbols without a prime, and the backward LSTM network is denoted by symbols with a prime; a fully connected layer D is added to combine the static demographic information $P_{Static}$ with the outputs of the forward and backward LSTMs through a dense connection, wherein $W_{dense}$ is the weight applied to the outputs of the forward and backward LSTMs, $W_{static}$ is the weight of the static information, and $b_{dense}$ is the bias term of this layer; the result is then input into a sigmoid layer, wherein out denotes that the layer is the output layer, finally obtaining the predicted disease severity score; the model uses the SOFA score as the true value y of the cross entropy for training the bidirectional heterogeneous LSTM model, minimizes the cross entropy, and finally obtains a disease severity score curve of each patient; the structure and parameters of the trained bidirectional heterogeneous LSTM network are fixed, and when a new patient is admitted, a real-time disease severity score of the new patient is obtained by using the bidirectional heterogeneous LSTM network;
the treatment effectiveness measurement module is used for obtaining effective treatment measurement information through the disease severity score obtained by the disease severity prediction module, the influence degree of the current treatment on the next stage and the discharge result obtained by the data set preprocessing module;
the patient similarity measurement module is used for constructing a similarity measurement relation of patients and calculating the similarity between the patients through information deposited in the bidirectional heterogeneous LSTM network and static demographic information of the patients;
the drug treatment scheme recommendation module is used for introducing the time sequence of the drug prescription information according to the effective treatment measure information obtained by the treatment effectiveness measurement module and the patient similarity measurement relation obtained by the patient similarity measurement module, so as to obtain the recommended drug treatment scheme for the next stage, filtering out treatment combinations of drugs with adverse reactions and prescriptions containing drugs to which the current patient is allergic, and providing the doctor with the top s effective treatment examples and the ineffective or negative-effect treatment examples that have high similarity to the patient.
2. The large-scale medical data knowledge mining and treatment scheme recommendation system of claim 1, wherein: the electronic medical record data is from the critical care database MIMIC-III.
3. The large-scale medical data knowledge mining and treatment scheme recommendation system of claim 1, wherein: the treatment effectiveness measurement module obtains effective treatment measure information from three aspects, namely the disease severity score, the degree of influence K of the current treatment on the next stage, and the discharge result R = {0001, 0010, 0100, 1000};
wherein the degree of effect K of the current treatment on the next stage is represented using the slope of the disease severity score curve, K being defined as:
where T is the length of the time window and $y_T$ denotes the disease severity scores within the t-th time window;
the effective treatment measure information is $M = Q\,[y_T; K; R]$.
4. The large-scale medical data knowledge mining and treatment scheme recommendation system of claim 1, wherein: the patient similarity measurement relation constructed by the patient similarity measurement module is as follows:
the patient z is expressed as:
wherein two of the components are taken from the forward LSTM network, two are taken from the backward LSTM network, and $P_{Static}$ is the static demographic information;
inter-patient similarity is defined as the 2-norm of the subtraction of two patient representations:
\[
\mathrm{Similar}\langle P_z, P_j \rangle = \lVert P_z - P_j \rVert_2 ;
\]
wherein j represents the jth patient.
5. The large-scale medical data knowledge mining and treatment scheme recommendation system of claim 1, wherein: the drug treatment scheme recommendation module uses the effective treatment measure information obtained by the treatment effectiveness measurement module and the similarity between patients obtained by the patient similarity measurement module, introduces a time sequence of drug prescription information, and constructs a tensor table indexed by similarity measure, treatment effectiveness measure, prescription and time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911117826.5A CN110880362B (en) | 2019-11-12 | 2019-11-12 | Large-scale medical data knowledge mining and treatment scheme recommending system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911117826.5A CN110880362B (en) | 2019-11-12 | 2019-11-12 | Large-scale medical data knowledge mining and treatment scheme recommending system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110880362A (en) | 2020-03-13 |
CN110880362B (en) | 2022-10-11 |
Family
ID=69728839
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911117826.5A Active CN110880362B (en) | 2019-11-12 | 2019-11-12 | Large-scale medical data knowledge mining and treatment scheme recommending system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110880362B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111430032B (en) * | 2020-03-20 | 2022-03-18 | 山东科技大学 | Old people disease modeling method based on APC model and genetic clustering algorithm |
CN111462897B (en) * | 2020-04-01 | 2021-05-11 | 山东大学 | Patient similarity analysis method and system based on improved heterogeneous information network |
CN111696666A (en) * | 2020-06-10 | 2020-09-22 | 杭州联众医疗科技股份有限公司 | Intelligent chronic disease management system based on time coding |
CN111681767B (en) * | 2020-06-12 | 2022-07-05 | 电子科技大学 | Electronic medical record data processing method and system |
CN111863281B (en) * | 2020-07-29 | 2021-08-06 | 山东大学 | Personalized medicine adverse reaction prediction system, equipment and medium |
CN112712435A (en) * | 2020-12-28 | 2021-04-27 | 天津幸福生命科技有限公司 | Service management system, computer-readable storage medium, and electronic device |
CN113436727B (en) * | 2021-06-30 | 2022-07-12 | 华中科技大学 | Method for scoring cure probability of potential treatment plan based on patient detection information |
CN113593670A (en) * | 2021-08-05 | 2021-11-02 | 江西省科学院应用物理研究所 | Prescription generation method and system for household direct current stimulation medical equipment |
CN113628716A (en) * | 2021-08-05 | 2021-11-09 | 翼健(上海)信息科技有限公司 | Prescription recommendation system |
CN116580797B (en) * | 2023-05-15 | 2023-10-31 | 北京利久医药科技有限公司 | Rapid comparison method of clinical test data |
CN116504354B (en) * | 2023-06-28 | 2024-01-09 | 合肥工业大学 | Intelligent service recommendation method and system based on intelligent medical treatment |
CN117012375B (en) * | 2023-10-07 | 2024-03-26 | 之江实验室 | Clinical decision support method and system based on patient topological feature similarity |
CN117373657B (en) * | 2023-12-07 | 2024-02-20 | 深圳问止中医健康科技有限公司 | Personalized medical auxiliary inquiry system based on big data analysis |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105793852A (en) * | 2013-12-04 | 2016-07-20 | M·奥利尼克 | Computational medical treatment plan method and system with mass medical analysis |
CN109637669A (en) * | 2018-11-22 | 2019-04-16 | 中山大学 | Generation method, device and the storage medium of therapeutic scheme based on deep learning |
CN109994215A (en) * | 2019-04-25 | 2019-07-09 | 清华大学 | Disease automatic coding system, method, equipment and storage medium |
CN110024044A (en) * | 2016-09-28 | 2019-07-16 | 曼迪奥研究有限公司 | For excavating the system and method for medical data |
CN110310740A (en) * | 2019-04-15 | 2019-10-08 | 山东大学 | Based on see a doctor again information forecasting method and the system for intersecting attention neural network |
CN110347837A (en) * | 2019-07-17 | 2019-10-18 | 电子科技大学 | A kind of unplanned Risk Forecast Method of being hospitalized again of cardiovascular disease |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11410756B2 (en) * | 2017-07-28 | 2022-08-09 | Google Llc | System and method for predicting and summarizing medical events from electronic health records |
Non-Patent Citations (2)
Title |
---|
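《Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis》; Shickel, B. et al.; 《IEEE Journal of Biomedical and Health Informatics》; 2018-05-31; Vol. 22, No. 5; full text * |
Research on Data-Driven Health Monitoring Methods for Critically Ill Patients; Ding Yangyang; 《China Master's and Doctoral Theses Full-text Database (Master's), Information Science and Technology》; 2019-01-15; full text * |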
Also Published As
Publication number | Publication date |
---|---|
CN110880362A (en) | 2020-03-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110880362B (en) | Large-scale medical data knowledge mining and treatment scheme recommending system | |
US11468998B2 (en) | Methods and systems for software clinical guidance | |
Yadav et al. | Mining electronic health records (EHRs) A survey | |
Shortliffe et al. | Knowledge engineering for medical decision making: A review of computer-based clinical decision aids | |
WO2023078025A1 (en) | Task decomposition strategy-based auxiliary differential diagnosis system for fever of unknown origin | |
US20170249434A1 (en) | Multi-format, multi-domain and multi-algorithm metalearner system and method for monitoring human health, and deriving health status and trajectory | |
Huddar et al. | Predicting complications in critical care using heterogeneous clinical data | |
Afsaneh et al. | Recent applications of machine learning and deep learning models in the prediction, diagnosis, and management of diabetes: a comprehensive review | |
Gautier et al. | Artificial intelligence and diabetes technology: a review | |
Robinson et al. | Defining phenotypes from clinical data to drive genomic research | |
CN111863238A (en) | Parallel intelligence based chronic disease diagnosis and treatment system and diagnosis and treatment method | |
Luo et al. | Applying interpretable deep learning models to identify chronic cough patients using EHR data | |
US11322250B1 (en) | Intelligent medical care path systems and methods | |
Moazemi et al. | Artificial intelligence for clinical decision support for monitoring patients in cardiovascular ICUs: a systematic review | |
CN112908452A (en) | Event data modeling | |
Shickel et al. | Deep multi-modal transfer learning for augmented patient acuity assessment in the intelligent ICU | |
Yang et al. | Disease prediction model based on bilstm and attention mechanism | |
Kamra et al. | Diagnosis support system for general diseases by implementing a novel machine learning based classifier | |
Rasubala et al. | Digital twin roles in public healthcare | |
Cheng et al. | Combining knowledge extension with convolution neural network for diabetes prediction | |
Gupta et al. | An overview of clinical decision support system (cdss) as a computational tool and its applications in public health | |
CN112567473B (en) | Predicting hypoglycemia ratio by machine learning system | |
Zhang et al. | A time-sensitive hybrid learning model for patient subgrouping | |
Soguero-Ruiz et al. | An interoperable system toward cardiac risk stratification from ECG monitoring | |
Basha et al. | Deep learning neural network (DLNN)-based classification and optimization algorithm for organ inflammation disease diagnosis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||