CN110880362B - Large-scale medical data knowledge mining and treatment scheme recommending system - Google Patents


Info

Publication number
CN110880362B
Authority
CN
China
Prior art keywords
treatment
patient
module
information
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911117826.5A
Other languages
Chinese (zh)
Other versions
CN110880362A (en
Inventor
张立言
黄兆孟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN201911117826.5A priority Critical patent/CN110880362B/en
Publication of CN110880362A publication Critical patent/CN110880362A/en
Application granted granted Critical
Publication of CN110880362B publication Critical patent/CN110880362B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medicinal Chemistry (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses a large-scale medical data knowledge mining and treatment scheme recommending system, which comprises: a data set preprocessing module for acquiring real electronic medical record data and preprocessing the electronic medical record data, which consist of a plurality of heterogeneous data sources; a disease severity prediction module for obtaining a disease severity score for each patient during treatment; a treatment effectiveness measurement module for obtaining effective treatment measure information; a patient similarity measurement module for constructing a similarity measurement relation between patients; and a drug treatment scheme recommendation module for obtaining the next-stage drug treatment scheme recommendation. The invention uses a multitask bidirectional heterogeneous LSTM to judge and predict the severity of the patient's condition and to define the treatment effectiveness measure. It then calculates fine-grained patient similarity and recommends the next-stage treatment scheme according to the patient's historical treatment record and the effective treatment schemes of other patients with high pathological similarity.

Description

Large-scale medical data knowledge mining and treatment scheme recommending system
Technical Field
The invention discloses a system for realizing discovery and recommendation of an effective drug treatment scheme by applying deep learning and knowledge introduction, belonging to the field of medical data mining.
Background
Electronic health record (EHR) data come from millions of patients and are currently collected and stored periodically at various medical institutions. These EHR data consist of heterogeneous data elements, typically including demographics, diagnoses, physical examinations, sensor measurements, laboratory test results, prescribed or administered medications, and clinical notes, among others. With the rapid development of information technology and the rapid popularization of electronic medical record (EMR) systems, the amount of digital information stored in electronic medical records in China has increased dramatically over the last decade. It is widely believed that a great deal of hidden knowledge is contained in these massive data, and that the various types of data in an EMR system provide a way to acquire medical knowledge, which in turn provides a basis for improving medical quality and efficiency. Specifically, EMR data have played an important role in many medical applications, especially in providing effective medication recommendations for physicians and patients, increasing the cure rate of disease, reducing the risk of death for clinical patients, reducing decision costs during physician treatment, and avoiding increased medical costs due to ineffective or harmful treatments.
While there is tremendous interest in using EMR data to improve medical performance, the gains from analyzing EMR data are far smaller than what EMRs could provide. One reason is that the prognosis of a patient is influenced by many factors, such as the age and sex of the patient, the severity of the disease, and the treatment being administered. While EMR data contain comprehensive information about patients, diagnosis and treatment, there is no unified framework to integrate all relevant factors for advanced data modeling. Furthermore, EMR data are heterogeneous and longitudinal in nature. For example, a treatment record is a series of orders, where each order typically consists of a medication name, a route of administration, a dose, a start time, and an end time. In general, analyzing large-scale complex EMR data, extracting medical knowledge, and supporting decision making in treatment practice is a considerable challenge.
To analyze large-scale, complex EMR data, researchers have made many useful explorations in electronic medical record data mining. According to reviews of data mining applied to EMRs [1][2], the recurrent neural network (RNN) and its variants (LSTM, GRU), which are designed for sequence modeling, can capture the complex temporal dynamics in longitudinal EMR data and are the first choice for EMR modeling tasks. Chen, W., Wang, S. et al. [3] dynamically predicted the severity of intensive care unit (ICU) patients' conditions using a multi-task RNN that integrates laboratory test results for different organs of the patient. However, the method in [3] does not make full use of the heterogeneous data in EMRs; for example, the diagnosis results and the description of the disease are also meaningful for the task. Jin, B., Yang, H. et al. [4] developed a treatment engine based on historical EMR data that provides patients with next-stage prescriptions based on their condition, laboratory results, treatment records, and demographic information. [4] mainly proposed three different LSTM variants to address the problem of data heterogeneity, but did not propose an overall framework for treatment recommendation. Because the next-stage prescription in that method is derived from historical treatment, the "cold start" problem, i.e. treatment recommendation for a newly admitted patient, is not addressed, whereas the present invention recognizes that the first 24 hours of treatment are critical for critically ill patients. Leilei Sun, Chuanren Liu et al. [5] proposed a data-driven method for automatic treatment regimen development and recommendation that mainly uses the important information in medical orders; its clustering step ultimately yields only a few types of drug treatment combinations, which cannot satisfy the need for more fine-grained treatment recommendations.
Moreover, none of the above schemes takes into account interactions between drugs or the patient's history of drug allergies.
References:
[1] Shickel B, Tighe P J, Bihorac A, et al. Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis [J]. IEEE Journal of Biomedical and Health Informatics, 2017: 1-1.
[2] Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review [J]. Journal of the American Medical Informatics Association, 2018.
[3] Chen W, Wang S, Long G, Yao L, Sheng Q Z, Li X. Dynamic illness severity prediction via multi-task RNNs for intensive care unit. In: ICDM (2018).
[4] Jin B, Yang H, Sun L, Liu C, Qu Y, Tong J. A treatment engine by predicting next-period prescriptions. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM (2018), pp. 1608-1616.
[5] Sun L, Liu C, Guo C, Xiong H, Xie Y. Data-driven Automatic Treatment Regimen Development and Recommendation. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA (2016), pp. 1865–1874.
Disclosure of Invention
The invention aims to provide a large-scale medical data knowledge mining and treatment scheme recommending system that applies a heterogeneous recurrent neural network and introduced knowledge to find effective treatment segments in large-scale electronic medical records, and that can recommend a patient's next-stage drug treatment in an interpretable way based on fine-grained patient similarity, thereby meeting the modeling requirements with good results.
In order to achieve the purpose, the invention adopts the technical scheme that:
a large-scale medical data knowledge mining and treatment scheme recommendation system comprises: the system comprises a data set preprocessing module, a disease severity prediction module, a treatment effectiveness measurement module, a patient similarity measurement module and a drug treatment scheme recommendation module, wherein:
the data set preprocessing module is used for acquiring real electronic medical record data and preprocessing the electronic medical record data consisting of a plurality of heterogeneous data sources, and the preprocessed electronic medical record comprises five types of patient information, namely demographic information, diagnosis description information, laboratory indexes, medicine prescriptions and discharge results;
the disease severity prediction module is used for training a bidirectional heterogeneous LSTM network through demographic information, diagnosis description information and laboratory index data obtained by the data set preprocessing module to obtain a disease severity score of each patient in the treatment process;
the treatment effectiveness measurement module is used for obtaining effective treatment measure information from the disease severity score obtained by the disease severity prediction module, the degree of influence of the current treatment on the next stage, and the discharge result obtained by the data set preprocessing module;
the patient similarity measurement module is used for constructing a similarity measurement relation of patients and calculating the similarity between the patients through information deposited in the bidirectional heterogeneous LSTM network and static demographic information of the patients;
the drug treatment scheme recommendation module is used for introducing the time sequence of the drug prescription information, based on the effective treatment measure information obtained by the treatment effectiveness measurement module and the patient similarity measurement relation obtained by the patient similarity measurement module, to obtain the next-stage drug treatment scheme recommendation; for filtering out treatment combinations that contain drugs with adverse interactions or drugs to which the current patient is allergic; and for providing the doctor with the top s effective treatments from patients highly similar to the current patient, together with examples of ineffective or negatively effective treatments.
The electronic medical record data come from the critical care database MIMIC-III v1.4.
The disease severity prediction module is a bidirectional heterogeneous LSTM network, and the overall structure of the bidirectional heterogeneous LSTM network is as follows:
Figure GDA0003765161680000031
where input is the input of the heterogeneous LSTM network, comprising the laboratory physiological characteristic indices, the demographic information and the diagnosis description information, and
Figure GDA0003765161680000032
is the disease severity score;
the bi-directional heterogeneous LSTM for each time step t is defined as follows:
$$f_t = \sigma(W_f[\mathrm{Checkup}_t, h_{t-1}] + b_f), \qquad f'_t = \sigma(W'_f[\mathrm{Checkup}_t, h'_{t+1}] + b'_f)$$
$$i_t = \sigma(W_i[\mathrm{Checkup}_t, h_{t-1}] + b_i), \qquad i'_t = \sigma(W'_i[\mathrm{Checkup}_t, h'_{t+1}] + b'_i)$$
$$o_t = \sigma(W_o[\mathrm{Checkup}_t, h_{t-1}] + b_o), \qquad o'_t = \sigma(W'_o[\mathrm{Checkup}_t, h'_{t+1}] + b'_o)$$
$$d_t = \sigma(W_d C_{t-1} + b_d), \qquad d'_t = \sigma(W'_d C'_{t+1} + b'_d)$$
Figure GDA0003765161680000041
Figure GDA0003765161680000042
Figure GDA0003765161680000043
$$h_t = o_t \tanh(C_t), \qquad h'_t = o'_t \tanh(C'_t)$$
$$D = \mathrm{relu}(W_{dense}[h_t, h'_t] + W_{static} P_{Static} + b_{dense})$$
Figure GDA0003765161680000044
where σ is the sigmoid function
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
tanh is the hyperbolic tangent function
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
relu is the ReLU function $f(x) = \max(0, x)$; W denotes the weight matrices and b the bias terms, and W and b are the parameters to be learned by the network. $\mathrm{Diagnosis}_t$ and $\mathrm{Checkup}_t$ are the diagnosis description information and the laboratory indices at time t, respectively; i, f, o, C and h are the input gate, forget gate, output gate, memory cell and hidden state, respectively. The decomposition gate $d_t$ is constructed from the cell state $C_{t-1}$ and controls the amount of added information. Under the control of the forget gate $f_t$, the additional candidate value
Figure GDA0003765161680000047
and the cell state at the previous time step, $C_{t-1}$, are added to the current cell state $C_t$. The input gate $i_t$ controls the degree to which the new state information
Figure GDA0003765161680000048
is updated, and
Figure GDA0003765161680000049
is added to the current cell state $C_t$.
The forward LSTM and the backward LSTM have the same structure; symbols without a prime denote the forward LSTM network, and symbols with a prime denote the backward LSTM network. A fully connected layer D is added to combine the static demographic information $P_{Static}$ with the outputs of the forward and backward LSTMs: $W_{dense}$ is the weight applied to the outputs of the forward and backward LSTMs, $W_{static}$ is the weight applied to the static information, and $b_{dense}$ is the bias term of this layer. The result is then fed into a sigmoid layer, where the subscript out indicates that this is the output layer, finally yielding the predicted disease severity score
Figure GDA00037651616800000410
The model uses the SOFA score as the true value y in the cross-entropy loss used to train the bidirectional heterogeneous LSTM model, minimizes the cross entropy, and finally obtains a disease severity score curve for each patient. The structure and parameters of the trained bidirectional heterogeneous LSTM network are then frozen, and when a new patient is admitted, the network is used to obtain the new patient's real-time disease severity score.
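As a rough illustration of the gate equations above, the following NumPy sketch computes one forward-direction step. The gates f, i, o and d and the hidden state follow the formulas given in the text. The candidate values and the cell-state update are only shown as figures in the source, so the form used here (`c_tilde`, `c_diag` and the update line) is an assumption, as are all parameter names such as `W_c` and `W_g`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_checkup, n_diag, n_hidden, seed=0):
    """Random weights for illustration; every name here is hypothetical."""
    rng = np.random.default_rng(seed)
    p = {}
    for name in ("f", "i", "o", "c"):
        p[f"W_{name}"] = rng.normal(0, 0.1, (n_hidden, n_checkup + n_hidden))
        p[f"b_{name}"] = np.zeros(n_hidden)
    p["W_d"] = rng.normal(0, 0.1, (n_hidden, n_hidden))
    p["b_d"] = np.zeros(n_hidden)
    p["W_g"] = rng.normal(0, 0.1, (n_hidden, n_diag))
    p["b_g"] = np.zeros(n_hidden)
    return p

def hetero_lstm_step(checkup_t, diagnosis_t, h_prev, c_prev, p):
    """One forward step of a heterogeneous LSTM cell (sketch)."""
    x = np.concatenate([checkup_t, h_prev])        # [Checkup_t, h_{t-1}]
    f = sigmoid(p["W_f"] @ x + p["b_f"])           # forget gate
    i = sigmoid(p["W_i"] @ x + p["b_i"])           # input gate
    o = sigmoid(p["W_o"] @ x + p["b_o"])           # output gate
    d = sigmoid(p["W_d"] @ c_prev + p["b_d"])      # decomposition gate from C_{t-1}
    # ASSUMPTION: the source gives these three equations only as images.
    c_tilde = np.tanh(p["W_c"] @ x + p["b_c"])             # candidate from lab indices
    c_diag = np.tanh(p["W_g"] @ diagnosis_t + p["b_g"])    # candidate from diagnosis info
    c = f * (c_prev + d * c_diag) + i * c_tilde
    h = o * np.tanh(c)                             # h_t = o_t tanh(C_t)
    return h, c
```

The backward direction would apply the same step to the sequence in reverse order with its own (primed) parameters.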
The treatment effectiveness measurement module obtains the effective treatment measure information from three aspects: the disease severity score, the degree of influence K of the current treatment on the next stage, and the discharge result R = {0001, 0010, 0100, 1000};
wherein the degree of effect K of the current treatment on the next stage is represented using the slope of the disease severity score curve, K being defined as:
Figure GDA0003765161680000051
where T is the length of the time window and $y_T$ denotes the disease severity scores within the T-th time window;
the effective treatment measure information is $M = Q[y_T; K; R]$.
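The exact formula for K appears only as an image in the source. As a hedged sketch, the slope of the severity score curve within one time window can be estimated with an ordinary least-squares line fit; the function name `severity_slope` and the assumption of equally spaced scores are illustrative and not taken from the patent.

```python
import numpy as np

def severity_slope(scores):
    """Least-squares slope of equally spaced severity scores in one window."""
    scores = np.asarray(scores, dtype=float)
    if scores.size < 2:
        return 0.0  # a single point has no slope
    t = np.arange(scores.size)
    slope, _intercept = np.polyfit(t, scores, 1)
    return float(slope)
```

Because a higher SOFA score indicates a more severe condition, a negative slope under this sketch corresponds to a window in which the patient is improving.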
The patient similarity measurement relationship constructed by the patient similarity measurement module is as follows:
the patient z is represented as:
Figure GDA0003765161680000052
wherein
Figure GDA0003765161680000053
And
Figure GDA0003765161680000054
come from the forward LSTM network,
Figure GDA0003765161680000055
and
Figure GDA0003765161680000056
come from the backward LSTM network, and
Figure GDA0003765161680000057
is the static demographic information;
inter-patient similarity is defined as the 2-norm of the difference between two patient representations:
$$\mathrm{Similar}\langle P_z, P_j \rangle = \lVert P_z - P_j \rVert_2$$
where j denotes the j-th patient.
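A minimal sketch of this measure, assuming each patient representation has already been flattened into a fixed-length vector: the function below computes the 2-norm of the difference to every other patient and returns the s closest ones. The name `top_s_similar` is hypothetical.

```python
import numpy as np

def top_s_similar(query, others, s):
    """Return indices and distances of the s patients closest to `query`."""
    others = np.asarray(others, dtype=float)
    dists = np.linalg.norm(others - np.asarray(query, dtype=float), axis=1)
    order = np.argsort(dists)[:s]
    return order, dists[order]
```

Note that under this definition a smaller value means a higher similarity, so the most similar patients are those with the smallest norm.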
The drug treatment scheme recommendation module takes the effective treatment measure information obtained by the treatment effectiveness measurement module and the inter-patient similarity obtained by the patient similarity measurement module, introduces the time sequence of the drug prescription information, and constructs a tensor table indexed by similarity measure, treatment effectiveness measure, drug and time.
Compared with the prior art, the technical scheme adopted by the invention has the following beneficial effects:
(1) The system of the invention mines effective treatment patterns from large-scale real electronic medical records. These patterns are fine-grained and short-term, unlike existing treatment recommendation engines, which provide only generally coarse-grained treatment regimens. Doctors can therefore be guided toward more fine-grained treatment.
(2) The system of the invention recommends medication individually according to the patient's physiological condition, treatment history, medication history and the like, and updates the recommendation dynamically.
(3) The invention introduces drug interaction knowledge and the patient's allergy history, reducing interactions between drugs and allergic reactions, which can increase the reliability and effectiveness of treatment. Through fine-grained patient similarity comparison, complete treatment courses with positive and negative treatment effects are extracted and provided to doctors, which enhances the interpretability and reliability of the drug recommendation. Doctors can judge the expected effectiveness of a recommended treatment scheme from the treatment cases of similar patients and the different effects produced by different schemes, and decide whether to adopt it or to adopt their own improved treatment scheme.
Drawings
FIG. 1 is a schematic diagram of a large-scale medical data knowledge mining and treatment planning recommendation system according to the present invention.
Detailed Description
The present invention is further explained below.
Fig. 1 shows a large-scale medical data knowledge mining and treatment scheme recommendation system according to the present invention, which includes a data set preprocessing module, a disease severity prediction module, a treatment effectiveness measurement module, a patient similarity measurement module, and a medication scheme recommendation module, wherein:
the data set preprocessing module is used for acquiring real electronic medical record data and preprocessing the electronic medical record data consisting of a plurality of heterogeneous data sources, wherein the preprocessed electronic medical record comprises five types of patient information which are demographic information, diagnosis description information, laboratory indexes, medicine prescriptions and discharge results respectively;
the disease severity prediction module is used for training the bidirectional heterogeneous LSTM network through the demographic information, the diagnosis description information and the laboratory index data which are obtained by the data set preprocessing module to obtain a disease severity score of each patient in the treatment process;
the treatment effectiveness measurement module is used for obtaining effective treatment measure information from the disease severity score obtained by the disease severity prediction module, the degree of influence of the current treatment on the next stage, and the discharge result obtained by the data set preprocessing module;
the patient similarity measurement module is used for constructing a similarity measurement relation of patients and calculating the similarity between the patients through information deposited in the bidirectional heterogeneous LSTM network and static demographic information of the patients;
the drug treatment scheme recommendation module is used for introducing the time sequence of the drug prescription information, based on the effective treatment measure information obtained by the treatment effectiveness measurement module and the patient similarity measurement relation obtained by the patient similarity measurement module, to obtain the next-stage drug treatment scheme recommendation; for filtering out treatment combinations that contain drugs with adverse interactions or drugs to which the current patient is allergic; and for providing the doctor with the top s effective treatments from patients highly similar to the current patient, together with examples of ineffective or negatively effective treatments.
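The filtering and ranking behaviour of the recommendation module might be sketched as follows. The candidate record layout (`drugs`, `distance`, `effective`), the function name, and the decision to apply the safety filter to the ineffective examples as well are all assumptions made for illustration; the patent does not specify a data format.

```python
from itertools import combinations

def filter_and_rank(candidates, allergies, interacting_pairs, s):
    """Drop unsafe drug combinations, then return the s nearest effective
    treatments and the s nearest ineffective examples.

    candidates: list of dicts with keys 'drugs' (list of names),
                'distance' (similarity distance, smaller is closer)
                and 'effective' (bool).
    """
    bad_pairs = {frozenset(pair) for pair in interacting_pairs}
    allergy_set = set(allergies)
    safe = []
    for cand in candidates:
        drugs = set(cand["drugs"])
        if drugs & allergy_set:
            continue  # contains a drug the patient is allergic to
        if any(frozenset(pair) in bad_pairs
               for pair in combinations(sorted(drugs), 2)):
            continue  # contains a pair of drugs known to interact
        safe.append(cand)
    safe.sort(key=lambda c: c["distance"])
    effective = [c for c in safe if c["effective"]][:s]
    ineffective = [c for c in safe if not c["effective"]][:s]
    return effective, ineffective
```

Returning the ineffective examples alongside the effective ones mirrors the stated goal of letting the doctor compare outcomes of different schemes in similar patients.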
The realization process of the large-scale medical data knowledge mining and treatment scheme recommendation system provided by the invention is as follows:
Step 1: The data set preprocessing module preprocesses the electronic medical record (EMR) data. EMR databases are typically composed of a variety of heterogeneous data sources, and the data retrieved from them are diverse, incomplete and redundant, which greatly affects the final mining results. Accordingly, the EMR data must be preprocessed to ensure that they are accurate, complete and consistent. First, the EMR data are improved by filling in missing values, smoothing noise, and correcting data inconsistencies. Second, EMR data may come from multiple EMR systems, and different data sources naturally lead to heterogeneity problems. Heterogeneity mainly manifests as inconsistent data attributes, such as attribute names and measurement units. For example, the specific gravity of urine may be recorded as "SG" or as "specific gravity", and the unit of measurement for triglyceride may be mmol/L in some records and mg/dl in others. Redundant data are also processed; redundancy mainly appears as repeated records of data attributes or inconsistent ways of expressing the same attribute.
The preprocessed electronic medical records typically contain five categories of patient information: demographic information, diagnosis description information, laboratory indices (physical examination results), drug prescriptions (medical orders), and discharge results (mortality).
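The attribute-name and unit harmonization described in this step can be sketched as below. The record layout, the alias table and the function names are hypothetical; the triglyceride factor of 88.57 mg/dl per mmol/L is the commonly cited conversion value. For brevity the sketch drops records with missing values instead of imputing them, which is a simplification of the "filling in" described above.

```python
# Hypothetical alias table: maps source attribute names to one canonical name.
NAME_ALIASES = {
    "sg": "urine_specific_gravity",
    "specific gravity": "urine_specific_gravity",
}
MGDL_PER_MMOLL_TRIGLYCERIDE = 88.57  # commonly cited conversion factor

def normalize_record(rec):
    """Unify attribute name and measurement unit of one lab record."""
    name = rec["name"].strip().lower()
    name = NAME_ALIASES.get(name, name)
    value, unit = rec["value"], rec["unit"].strip().lower()
    if name == "triglyceride" and unit == "mg/dl":
        value, unit = value / MGDL_PER_MMOLL_TRIGLYCERIDE, "mmol/l"
    return {"patient": rec["patient"], "time": rec["time"],
            "name": name, "value": value, "unit": unit}

def preprocess(records):
    """Drop missing values, normalize, and remove duplicate measurements."""
    seen, out = set(), []
    for rec in records:
        if rec["value"] is None:
            continue  # simplification: skip rather than impute
        rec = normalize_record(rec)
        key = (rec["patient"], rec["time"], rec["name"])
        if key in seen:
            continue  # same attribute recorded twice under different names
        seen.add(key)
        out.append(rec)
    return out
```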
Demographic information includes the patient's age, gender, address of residence, educational background, religion, race, marital status, weight, height, and other information. This information is important in clinical decision making, for example in influencing the design of the overall treatment regimen and the dosage of drugs. Demographic information can be considered static during the patient's hospitalization and is denoted $P_{Static}$. It is formalized as:
$$P_{Static} = \{P_{Age}, P_{Gender}, P_{Site}, P_{Education}, \ldots\}$$
The diagnosis description information is given by the doctor and comprises the type of disease, a qualitative description of its severity, complications and the like. A patient may suffer from several diseases, and during treatment a disease may gradually heal, or it may progress, with new diseases or additional complications appearing. Diagnosis can therefore be viewed as a dynamic process, and $\mathrm{Diagnosis}_t$ denotes the diagnosis description information at time t. The diagnosis description information is formalized as:
Figure GDA0003765161680000071
Laboratory physiological characteristic indices (physical examination results): during the course of treatment, multiple examinations are performed while the patient is hospitalized in order to accurately assess the efficacy of the treatment. The invention uses
Figure GDA0003765161680000081
to denote the physical examination results at time t, where
Figure GDA0003765161680000082
is the value of the j-th laboratory physiological characteristic index
Figure GDA0003765161680000083
at time t.
The drug prescription (medical order) includes the drug name, route of administration, daily dosage, start time and end time. The invention uses $\mathrm{Treatment}_t$ to denote a drug prescription, which is a combination of a series of drugs. The drug prescription is formalized as:
Figure GDA0003765161680000084
where
Figure GDA0003765161680000085
is the name of the drug used;
Figure GDA0003765161680000086
is the route of administration, such as intravenous (IV), intramuscular (IM) or oral (per os, PO);
Figure GDA0003765161680000087
is the dose per administration;
Figure GDA0003765161680000088
indicates the number of administrations per day;
Figure GDA0003765161680000089
indicates the time of administration, with
Figure GDA00037651616800000810
denoting day d; and dr indicates that the prescription contains dr different drugs in total. In the present invention, a time window of a specific size is regarded as one complete treatment, so the drug prescription is rewritten as:
Figure GDA00037651616800000811
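One possible in-memory representation of a medical order and of the window-based grouping is sketched below. The formal definitions are only available as images in the source, so the field names, the `Order` class and the rule of assigning an order to window `day // window_days` are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Order:
    name: str           # drug name
    route: str          # route of administration, e.g. "IV", "IM", "PO"
    dose: float         # dose per administration
    times_per_day: int  # number of administrations per day
    day: int            # day of the hospital stay on which the drug is given

def treatments_by_window(orders, window_days):
    """Group drug names into consecutive time windows of `window_days` days;
    each window is treated as one complete treatment."""
    windows = defaultdict(set)
    for order in orders:
        windows[order.day // window_days].add(order.name)
    return {w: sorted(names) for w, names in sorted(windows.items())}
```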
Discharge result (mortality): when a patient is discharged, the doctor gives a discharge evaluation according to the patient's actual condition. The result can be cure, improvement, no effect, or death, and these four results are represented by a one-hot code R = {0001, 0010, 0100, 1000}.
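A small sketch of the one-hot coding of the discharge result follows. The text lists the four codes but does not state which outcome maps to which code, so the ordering in `OUTCOMES` is an assumption.

```python
# ASSUMED ordering; the source does not say which outcome gets which code.
OUTCOMES = ("cure", "improvement", "no_effect", "death")

def one_hot_code(outcome):
    """Return the 4-bit one-hot string for a discharge outcome."""
    idx = OUTCOMES.index(outcome)  # raises ValueError for unknown outcomes
    return format(1 << idx, "04b")
```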
Step 2: The disease severity prediction module predicts the severity of ICU patients' conditions at short intervals by building a bidirectional heterogeneous LSTM network W1.
In the ICU, the SOFA scoring system reflects the severity of the patient's condition. SOFA assessments are performed at long intervals, such as every 24 hours, which results in a slow response to critically ill patients. Predicting the disease severity score at shorter intervals is an effective solution for rapidly monitoring patients in the ICU.
The overall structure of the bidirectional heterogeneous LSTM network W1 is as follows:
Figure GDA00037651616800000812
where input comprises the laboratory physiological characteristic indices (physical examination results), the demographic information and the diagnosis description information; the heterogeneous LSTM can take these three types of heterogeneous data as input.
Figure GDA0003765161680000091
is the predicted disease severity score.
The LSTM at each time step t comprises i, f, o, c and h, which are the input gate, forget gate, output gate, memory cell and hidden state, respectively. The forget gate controls how much of the memory is forgotten, the input gate controls the update of each cell, and the output gate controls the exposure of the cell state. If the laboratory physiological characteristic indices (physical examination results), the demographic information, the diagnosis description information and other heterogeneous sequences were all used as input and a sequential hidden state were built for each time series, the fully connected hidden neurons of the different time series could confound the intrinsic dynamics of each series. To allow flexible interaction among multi-faceted time series, the invention retains only the memory related to the physiological characteristic indices. Under the control of the previous memory, the additional time series of diagnosis description information affects the cell state only through a dedicated structure called the decomposition gate. The decomposition gate $d_t$ is constructed from the cell state $C_{t-1}$ and controls the amount of added information. Under the control of the decomposition gate, the additional candidate value
Figure GDA0003765161680000092
is added to the cell state $C_t$. The forward LSTM and the backward LSTM have the same structure; symbols without a prime denote the forward LSTM network, and symbols with a prime denote the backward LSTM network. A fully connected layer D is added to combine the static demographic information $P_{Static}$ with the outputs of the forward and backward LSTMs: $W_{dense}$ is the weight applied to the outputs of the forward and backward LSTMs, $W_{static}$ is the weight applied to the static information, and $b_{dense}$ is the bias term of this layer. The result is then fed into a sigmoid layer, where the subscript out indicates that this is the output layer, finally yielding the predicted disease severity score
Figure GDA0003765161680000093
The model is trained with cross-entropy as the loss, using the SOFA score as the ground-truth value y; minimizing the cross-entropy finally yields a disease severity score curve for each patient. The structure and parameters of the trained bidirectional heterogeneous LSTM network are then frozen, and when a new patient is admitted, the network is used to obtain that patient's real-time disease severity score.
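As a rough illustration of the training objective described above, the following sketch computes a binary cross-entropy between predicted scores and SOFA-derived targets. It assumes that the raw SOFA score (which ranges from 0 to 24) is rescaled to [0, 1] so that it is comparable with a sigmoid output; this rescaling, the binary form of the loss and the function names `normalize_sofa` and `cross_entropy` are assumptions for illustration and are not stated in the source.

```python
import math

SOFA_MAX = 24  # the SOFA score ranges from 0 to 24


def normalize_sofa(sofa):
    """Rescale a raw SOFA score to [0, 1] (assumed preprocessing step)."""
    return sofa / SOFA_MAX


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between targets and predictions in [0, 1]."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Three illustrative time windows: raw SOFA targets and model predictions.
targets = [normalize_sofa(s) for s in (6, 12, 18)]
predictions = [0.25, 0.5, 0.75]
loss = cross_entropy(targets, predictions)
```

With soft targets such as these, the loss does not reach zero even for perfect predictions; it is bounded below by the entropy of the targets, which does not affect the minimizer.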
Bi-directional heterogeneous LSTM is defined as follows:
$f_t = \sigma(W_f [\mathrm{Chechup}_t, h_{t-1}] + b_f)$
$i_t = \sigma(W_i [\mathrm{Chechup}_t, h_{t-1}] + b_i)$
$o_t = \sigma(W_o [\mathrm{Chechup}_t, h_{t-1}] + b_o)$
$d_t = \sigma(W_d C_{t-1} + b_d)$
Figure GDA0003765161680000101
Figure GDA0003765161680000102
Figure GDA0003765161680000103
$h_t = o_t \tanh(C_t)$
$f'_t = \sigma(W'_f [\mathrm{Chechup}_t, h'_{t+1}] + b'_f)$
$i'_t = \sigma(W'_i [\mathrm{Chechup}_t, h'_{t+1}] + b'_i)$
$o'_t = \sigma(W'_o [\mathrm{Chechup}_t, h'_{t+1}] + b'_o)$
$d'_t = \sigma(W'_d C'_{t+1} + b'_d)$
Figure GDA0003765161680000104
Figure GDA0003765161680000105
Figure GDA0003765161680000106
$h'_t = o'_t \tanh(C'_t)$
$D = \mathrm{relu}(W_{dense} [h_t, h'_t] + W_{static} P_{Static} + b_{dense})$
Figure GDA0003765161680000111
where σ is the sigmoid function
$\sigma(x) = \frac{1}{1 + e^{-x}}$,
tanh is the hyperbolic tangent function
$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$,
ReLU is the function $f(x) = \max(0, x)$; W denotes the weight matrices and b the bias terms, and W and b are the parameters to be learned by the network; $\mathrm{Diagnosines}_t$ and $\mathrm{Chechup}_t$ are the diagnosis description information and the laboratory indices at time t, respectively (abbreviated as Di and Ch); i, f, o, C and h are the input gate, forget gate, output gate, memory cell and hidden state, respectively. The decomposition gate $d_t$ is constructed from the cell state $C_{t-1}$ and controls the amount of added information. Under the control of the forget gate $f_t$, the additional candidate value
Figure GDA0003765161680000114
and the cell state $C_{t-1}$ at the previous time are added to the current cell state $C_t$; the input gate $i_t$ controls the degree to which the new state information
Figure GDA0003765161680000115
is updated, and
Figure GDA0003765161680000116
is added to the current cell state $C_t$.
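To make the gate structure more concrete, the sketch below runs one forward time step of such a cell in plain Python. The gates f, i, o and d and the hidden state h follow the formulas given in the text. The candidate values and the cell-state update appear only as images in the source, so the update used here (forget-gated previous state, plus input-gated laboratory candidate, plus decomposition-gated diagnosis candidate) is an assumption, as are the parameter keys `"c"` and `"a"` and all function names.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, x):
    """Return W @ x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def zeros(rows, cols):
    """Return an all-zero (W, b) pair, used here only for the demo."""
    return [[0.0] * cols for _ in range(rows)], [0.0] * rows


def hetero_lstm_step(chechup_t, diag_t, h_prev, c_prev, p):
    """One forward step; `p` maps a gate name to its (W, b) pair."""
    x = chechup_t + h_prev  # concatenation [Chechup_t, h_{t-1}]
    f = [sigmoid(v) for v in affine(*p["f"], x)]       # forget gate
    i = [sigmoid(v) for v in affine(*p["i"], x)]       # input gate
    o = [sigmoid(v) for v in affine(*p["o"], x)]       # output gate
    d = [sigmoid(v) for v in affine(*p["d"], c_prev)]  # decomposition gate
    # ASSUMPTION: the candidates and the cell update are not given as text
    # in the source; this is one plausible form, not the patented formula.
    cand = [math.tanh(v) for v in affine(*p["c"], x)]        # lab candidate
    extra = [math.tanh(v) for v in affine(*p["a"], diag_t)]  # diagnosis candidate
    c = [ft * cp + it * ct + dt * et
         for ft, cp, it, ct, dt, et in zip(f, c_prev, i, cand, d, extra)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]  # h_t = o_t tanh(C_t)
    return h, c


# Demo: 3 laboratory features, 2 diagnosis features, hidden size 2.
params = {
    "f": zeros(2, 5), "i": zeros(2, 5), "o": zeros(2, 5), "c": zeros(2, 5),
    "d": zeros(2, 2), "a": zeros(2, 2),
}
h, c = hetero_lstm_step([0.1, 0.2, 0.3], [1.0, 0.0], [0.0, 0.0], [1.0, -1.0], params)
```

With all-zero parameters every gate equals 0.5 and both candidates are 0, so the cell state is simply halved; a backward cell would use the same function on the reversed sequence with its own (primed) parameters.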
Step 3: The treatment effectiveness measurement module defines which treatments are effective, based on the disease severity score of each patient during treatment obtained in step 2. The structure and parameters of the trained network W1 are frozen, and when a new patient is admitted, a real-time disease severity score is obtained by inputting the patient's current laboratory physiological characteristic indices (physical examination results), demographic information and diagnosis description information.
For treatments in the EMR data, a measure of treatment effectiveness is defined. The evaluation is based on three considerations: the patient's current disease severity, the degree of influence of the current treatment on the next stage (time window), and the treatment outcome at final discharge. The degree of influence K of the current treatment on the next stage is represented by the slope of the disease severity score curve. For ease of calculation, and because the score curve is not smooth, K is defined as:
Figure GDA0003765161680000117
where T is the length of the time window and $y_T$ is the disease severity score within the T-th time window.
The treatment effectiveness is $M = Q\,[y_T; K; R]$. In this embodiment, after $y_T$, K and R are normalized, $Q = [1, 2, 1]$.
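The weighted combination above can be sketched as follows. The source does not say how $y_T$, K and R are normalized or how the discharge result R is turned into a number, so the min-max scaling and the scalar encoding of R used here are assumptions; K is taken as a precomputed input because its defining formula is not available as text. The names `min_max` and `effectiveness` are illustrative.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def effectiveness(y_T, K, R, Q=(1, 2, 1)):
    """Weighted sum M = Q . [y_T; K; R] of already-normalized inputs."""
    return Q[0] * y_T + Q[1] * K + Q[2] * R


# Three illustrative treatment windows (values are made up for the demo).
severity = min_max([2.0, 5.0, 8.0])   # disease severity scores y_T
slope = min_max([-1.0, 0.0, 3.0])     # precomputed slopes K
outcome = min_max([0.0, 1.0, 2.0])    # ASSUMED scalar encoding of R
M = [effectiveness(y, k, r) for y, k, r in zip(severity, slope, outcome)]
```

Because Q = [1, 2, 1], the slope term K carries twice the weight of the other two terms. How the sign and direction of each term should be oriented so that a larger M means a more effective treatment is not specified in the source and is left open here.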
Step 4: The patient similarity measurement module constructs the similarity measure between patients. Information such as the laboratory physiological characteristic indices, demographic information, diagnosis description information and disease severity of patients is important for constructing a measure of similarity between patients. When the network W1 is used to measure the disease severity score, the patient's information is already encoded in the network.
Each patient is represented as:
Figure GDA0003765161680000121
wherein
Figure GDA0003765161680000122
And
Figure GDA0003765161680000123
come from the forward LSTM network,
Figure GDA0003765161680000124
and
Figure GDA0003765161680000125
come from the backward LSTM network, and
Figure GDA0003765161680000126
is the static demographic information.
Inter-patient similarity is defined as the 2-norm of the subtraction of two patient representations:
$\mathrm{Similar}\langle P_z, P_j \rangle = \lVert P_z - P_j \rVert_2$
wherein j represents the jth patient.
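A minimal sketch of this measure follows, assuming each patient representation has already been flattened into a list of numbers of equal length. Since the measure is the 2-norm of a difference, a smaller value is read here as a closer (more similar) patient; that reading, the helper `most_similar` and the example vectors are assumptions for illustration.

```python
import math


def similar(p_z, p_j):
    """2-norm of the difference between two patient representations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_z, p_j)))


def most_similar(query, cohort, k=1):
    """Return the ids of the k patients whose representation is closest."""
    ranked = sorted(cohort, key=lambda pid: similar(query, cohort[pid]))
    return ranked[:k]


# Illustrative two-dimensional representations keyed by patient id.
cohort = {"a": [3.0, 4.0], "b": [1.0, 0.0], "c": [10.0, 10.0]}
nearest = most_similar([0.0, 0.0], cohort, k=2)
```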
Step 5: The drug treatment scheme recommendation module searches for and recommends the drug treatment scheme for the next stage, and provides interpretability through similar positive and negative treatment samples. It takes as input the effective treatment measure information obtained by the treatment effectiveness measurement module and the patient similarity measure obtained by the patient similarity measurement module. The time series of drug prescription information is introduced, and a tensor table indexed by similarity measure, treatment effectiveness measure, prescription and time is constructed. When a new patient is hospitalized, the prescription with the highest treatment effectiveness at the current stage and the highest similarity to the patient is recommended. It should be noted that as the recommended treatment is applied, the patient's state changes, and so does the similarity between the current patient and the patients in the EMR data; the recommendation of the invention therefore changes dynamically with the patient's state.
Considering adverse reactions between drugs and the patient's allergy history, when making a recommendation the invention filters out combinations with severe adverse reactions and prescriptions containing drugs to which the patient is currently allergic, and selects the next-best option instead, so that the recommendation is more reliable for the current patient. In addition, the top s effective treatment examples and the top s ineffective or negative-effect treatment examples with high similarity to the patient are provided to the doctor to support a better decision.
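The filtering and example-selection logic described above might be sketched as follows. The record layout (`drugs`, `distance`, `effectiveness`), the rule that ranks safe candidates by effectiveness and breaks ties by distance, and the threshold of 0 separating effective from ineffective treatments are all assumptions made for illustration; the source does not specify how similarity and effectiveness are combined.

```python
def recommend(candidates, allergies, bad_pairs, s=2):
    """Pick a safe prescription plus positive and negative examples.

    candidates: dicts with 'drugs', 'distance' (smaller means more similar)
                and 'effectiveness' (ASSUMED: above 0 means effective).
    allergies:  set of drugs the current patient is allergic to.
    bad_pairs:  set of frozensets of drugs with severe adverse reactions.
    """
    def safe(c):
        drugs = set(c["drugs"])
        if drugs & allergies:
            return False
        return not any(pair <= drugs for pair in bad_pairs)

    # Recommendation: best remaining option after filtering unsafe ones.
    ranked = sorted((c for c in candidates if safe(c)),
                    key=lambda c: (-c["effectiveness"], c["distance"]))
    best = ranked[0] if ranked else None

    # Examples for the doctor: the s most similar cases of each kind.
    by_distance = sorted(candidates, key=lambda c: c["distance"])
    positive = [c for c in by_distance if c["effectiveness"] > 0][:s]
    negative = [c for c in by_distance if c["effectiveness"] <= 0][:s]
    return best, positive, negative


candidates = [
    {"drugs": ["A", "B"], "distance": 0.2, "effectiveness": 0.9},
    {"drugs": ["C"], "distance": 0.3, "effectiveness": 0.7},
    {"drugs": ["D"], "distance": 0.1, "effectiveness": -0.4},
    {"drugs": ["E", "F"], "distance": 0.5, "effectiveness": 0.8},
]
best, positive, negative = recommend(
    candidates, allergies={"A"}, bad_pairs={frozenset({"E", "F"})})
```

In this example the two most effective prescriptions are rejected (one contains an allergen, the other a drug pair with a severe adverse reaction), so the next-best option is returned, which mirrors the fallback behaviour described in the text.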
In this embodiment, the electronic medical record data come from the critical care database MIMIC-III. MIMIC-III is a real clinical database containing health data for more than 40,000 patients admitted to the ICU of the Beth Israel Deaconess Medical Center over a period of 11 years. The invention uses the latest version, MIMIC-III v1.4, which includes 50206 medical treatment records involving 6695 different diseases and 4127 drugs. The embodiment excludes patients under 15 years of age and patients who stayed in the ICU for less than 48 hours. Children are excluded because the normal ranges of medical indices differ between adults and children, and the 48-hour ICU requirement ensures sufficient data for analysis. Patients with large amounts of missing data are also excluded, because excessive estimation of missing values may introduce bias with negative effects. Finally, 3255 patients were selected for modeling and analysis.
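The cohort selection rules above can be expressed as a simple filter. The record fields and the `max_missing_ratio` threshold are hypothetical: the source states only that patients with large amounts of missing data are excluded, without giving a cutoff, and the sketch does not reflect the actual MIMIC-III table layout.

```python
def eligible(patient, max_missing_ratio=0.5):
    """Apply the age, ICU-stay and missing-data exclusion rules.

    max_missing_ratio is a HYPOTHETICAL cutoff; the source gives none.
    """
    return (patient["age_years"] >= 15
            and patient["icu_hours"] >= 48
            and patient["missing_ratio"] <= max_missing_ratio)


# Made-up records for the demo; field names are illustrative only.
patients = [
    {"id": 1, "age_years": 67, "icu_hours": 120, "missing_ratio": 0.1},
    {"id": 2, "age_years": 9, "icu_hours": 96, "missing_ratio": 0.0},
    {"id": 3, "age_years": 54, "icu_hours": 20, "missing_ratio": 0.2},
    {"id": 4, "age_years": 71, "icu_hours": 72, "missing_ratio": 0.8},
]
cohort_ids = [p["id"] for p in patients if eligible(p)]
```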
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.

Claims (5)

1. A large-scale medical data knowledge mining and treatment scheme recommendation system is characterized in that: the system comprises a data set preprocessing module, a disease severity predicting module, a treatment effectiveness measuring module, a patient similarity measuring module and a drug treatment scheme recommending module, wherein:
the data set preprocessing module is used for acquiring real electronic medical record data and preprocessing the electronic medical record data, which consist of various heterogeneous data sources; the preprocessed electronic medical record comprises five types of patient information, namely static demographic information $P_{Static}$, diagnosis description information Diagnosines, laboratory indices Chechup, drug prescriptions Treatment and discharge result R;
the disease severity prediction module is used for training the bidirectional heterogeneous LSTM network through the demographic information, the diagnosis description information and the laboratory index data which are obtained by the data set preprocessing module to obtain a disease severity score of each patient in the treatment process; the disease severity prediction module is a bidirectional heterogeneous LSTM network, and the overall structure of the bidirectional heterogeneous LSTM network is as follows:
Figure FDA0003781899210000011
wherein, input is the input of the heterogeneous LSTM network, and comprises physiological characteristic indexes, demographic information and diagnosis description information of a laboratory,
Figure FDA0003781899210000012
is the disease severity score;
the bi-directional heterogeneous LSTM for each time step t is defined as follows:
$f_t = \sigma(W_f [\mathrm{Chechup}_t, h_{t-1}] + b_f)$    $f'_t = \sigma(W'_f [\mathrm{Chechup}_t, h'_{t+1}] + b'_f)$
$i_t = \sigma(W_i [\mathrm{Chechup}_t, h_{t-1}] + b_i)$    $i'_t = \sigma(W'_i [\mathrm{Chechup}_t, h'_{t+1}] + b'_i)$
$o_t = \sigma(W_o [\mathrm{Chechup}_t, h_{t-1}] + b_o)$    $o'_t = \sigma(W'_o [\mathrm{Chechup}_t, h'_{t+1}] + b'_o)$
$d_t = \sigma(W_d C_{t-1} + b_d)$    $d'_t = \sigma(W'_d C'_{t+1} + b'_d)$
Figure FDA0003781899210000013
Figure FDA0003781899210000018
Figure FDA0003781899210000014
$h_t = o_t \tanh(C_t)$    $h'_t = o'_t \tanh(C'_t)$
$D = \mathrm{relu}(W_{dense} [h_t, h'_t] + W_{static} P_{Static} + b_{dense})$
Figure FDA0003781899210000015
where σ is the sigmoid function
$\sigma(x) = \frac{1}{1 + e^{-x}}$,
tanh is the hyperbolic tangent function
$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$,
ReLU is the function $f(x) = \max(0, x)$; W denotes the weight matrices and b the bias terms, and W and b are the parameters to be learned by the network; $\mathrm{Diagnosines}_t$ and $\mathrm{Chechup}_t$ are the diagnosis description information and the laboratory indices at time t, respectively; i, f, o, C and h are the input gate, forget gate, output gate, memory cell and hidden state, respectively; the decomposition gate $d_t$ is constructed from the cell state $C_{t-1}$ and controls the amount of added information; under the control of the forget gate $f_t$, the additional candidate value
Figure FDA0003781899210000021
and the cell state $C_{t-1}$ at the previous time are added to the current cell state $C_t$; the input gate $i_t$ controls the degree to which the new state information
Figure FDA0003781899210000022
is updated, and
Figure FDA0003781899210000023
is added to the current cell state $C_t$;
the forward LSTM and the backward LSTM have the same structure; the forward LSTM network is denoted by symbols without a prime, and the backward LSTM network by symbols with a prime; a fully connected layer D is added to combine the static demographic information $P_{Static}$ with the outputs of the forward and backward LSTMs, where $W_{dense}$ is the weight of the dense connection to the outputs of the forward and backward LSTMs, $W_{static}$ is the weight of the static information, and $b_{dense}$ is the bias term of this layer; the result is then fed into a sigmoid layer, where the subscript out denotes the output layer, finally giving the predicted disease severity score
Figure FDA0003781899210000024
the model is trained with cross-entropy as the loss, using the SOFA score as the ground-truth value y; minimizing the cross-entropy finally yields a disease severity score curve for each patient; the structure and parameters of the trained bidirectional heterogeneous LSTM network are frozen, and when a new patient is admitted, the bidirectional heterogeneous LSTM network is used to obtain the real-time disease severity score of the new patient;
the treatment effectiveness measurement module is used for obtaining effective treatment measurement information through the disease severity score obtained by the disease severity prediction module, the influence degree of the current treatment on the next stage and the discharge result obtained by the data set preprocessing module;
the patient similarity measurement module is used for constructing a similarity measurement relation of patients and calculating the similarity between the patients through information deposited in the bidirectional heterogeneous LSTM network and static demographic information of the patients;
the drug treatment scheme recommendation module is used for introducing the time series of drug prescription information, according to the effective treatment measure information obtained by the treatment effectiveness measurement module and the patient similarity measure obtained by the patient similarity measurement module, to obtain the drug treatment scheme recommendation for the next stage, filtering out combinations of drugs with adverse reactions and prescriptions containing drugs to which the current patient is allergic, and providing the doctor with the top s effective treatment examples and the top s ineffective or negative-effect treatment examples with high similarity to the patient.
2. The large-scale medical data knowledge mining and treatment scheme recommendation system of claim 1, wherein: the electronic medical record data come from the critical care database MIMIC-III.
3. The large-scale medical data knowledge mining and treatment scheme recommendation system of claim 1, wherein: the treatment effectiveness measurement module obtains the effective treatment measure information from three aspects: the disease severity score, the degree of influence K of the current treatment on the next stage, and the discharge result R = {0001, 0010, 0100, 1000};
wherein the degree of effect K of the current treatment on the next stage is represented using the slope of the disease severity score curve, K being defined as:
Figure FDA0003781899210000031
where T is the length of the time window and $y_T$ is the disease severity score within the T-th time window;
the effective treatment measure information is $M = Q\,[y_T; K; R]$.
4. The large-scale medical data knowledge mining and treatment scheme recommendation system of claim 1, wherein: the patient similarity measure constructed by the patient similarity measurement module is as follows:
the patient z is expressed as:
Figure FDA0003781899210000032
wherein
Figure FDA0003781899210000033
And
Figure FDA0003781899210000034
come from the forward LSTM network,
Figure FDA0003781899210000035
and
Figure FDA0003781899210000036
come from the backward LSTM network, and $P_{Static}$ is the static demographic information;
inter-patient similarity is defined as the 2-norm of the subtraction of two patient representations:
$\mathrm{Similar}\langle P_z, P_j \rangle = \lVert P_z - P_j \rVert_2$
wherein j represents the jth patient.
5. The large-scale medical data knowledge mining and treatment scheme recommendation system of claim 1, wherein: the drug treatment scheme recommendation module takes the effective treatment measure information obtained by the treatment effectiveness measurement module and the inter-patient similarity obtained by the patient similarity measurement module, introduces the time series of drug prescription information, and constructs a similarity measure-treatment effectiveness measure-prescription-time tensor table.
CN201911117826.5A 2019-11-12 2019-11-12 Large-scale medical data knowledge mining and treatment scheme recommending system Active CN110880362B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911117826.5A CN110880362B (en) 2019-11-12 2019-11-12 Large-scale medical data knowledge mining and treatment scheme recommending system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911117826.5A CN110880362B (en) 2019-11-12 2019-11-12 Large-scale medical data knowledge mining and treatment scheme recommending system

Publications (2)

Publication Number Publication Date
CN110880362A CN110880362A (en) 2020-03-13
CN110880362B true CN110880362B (en) 2022-10-11

Family

ID=69728839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911117826.5A Active CN110880362B (en) 2019-11-12 2019-11-12 Large-scale medical data knowledge mining and treatment scheme recommending system

Country Status (1)

Country Link
CN (1) CN110880362B (en)



Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant