CN114694841B - Adverse event risk prediction method based on patient electronic health record

Info

Publication number: CN114694841B
Application number: CN202210322129.9A
Authority: CN
Inventors: 郑恒杰; 刘勇国; 张云; 朱嘉静; 李巧勤; 傅翀
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2022-03-30
Filing date: 2022-03-30
Publication date: 2023-04-07
Anticipated expiration: 2042-03-30
Also published as: CN114694841A; ZA202208574B

Abstract

The invention discloses an adverse event risk prediction method based on a patient's electronic health record, comprising the following steps: S1, preprocessing the data; S2, performing K-means clustering and sampling, dividing the data into 3 clusters to obtain 3 cluster centers; S3, sorting the 3 cluster centers from small to large by the maximum value in $P_*$ and taking the three corresponding subsets as an uncommon-code subset, a more-common-code subset and a common-code subset respectively, inputting the three subsets into the three base classifiers GRAM+, Dipole+ and RNN+ respectively for pre-training, and then fusing the three base classifiers. The method uses a clustering algorithm to sample suitable training samples for each base learner and designs an adaptive combination strategy that generates the integration weights of the different base classifiers according to the distance from a training sample to the centers of the pre-training sets, so that the model is more adaptive. In addition, sampling after clustering significantly reduces the amount of computation when training the basic embeddings.

Description

Adverse event risk prediction method based on patient electronic health record

Technical Field

The invention relates to an adverse event risk prediction method based on an electronic health record of a patient.

Background

AIDS is a highly harmful infectious disease caused by infection with the human immunodeficiency virus (HIV). The virus mainly attacks CD4 T lymphocytes, the most important cells of the human immune system, so that the body loses its immune function, becomes susceptible to a variety of diseases, and has a high fatality rate. If a patient with AIDS is treated actively, a relatively good therapeutic effect can be obtained, but the effect is compromised if adverse events such as serious complications occur. Predicting possible future adverse events such as complications by combining conventional risk factors with factors specific to AIDS patients can provide strong support for guiding their medical care. The electronic health records (EHRs) of AIDS patients contain not only the medical codes of each visit (diagnosis, medication and procedure codes; for example, the diagnosis code 585.9 denotes chronic kidney disease, and procedure codes denote procedures such as interventions and treatments; each code represents a symptom, disease, abnormal finding, intervention, treatment or the like) but also personalized data such as demographic data and vital signs. These data can be used to predict possible future adverse events and so help doctors make more reasonable decisions about the patients' medical care.

Chinese patent application CN109887606A, a diagnosis prediction method based on an attention-based bidirectional recurrent neural network, first embeds the high-dimensional medical codes (i.e., clinical variables) into a low-dimensional code-level space, then inputs the code representations into an attention-based bidirectional recurrent neural network to generate hidden-state representations, and finally predicts the medical codes of future visits through a softmax layer.

Edward Choi et al. (E. Choi, M. T. Bahadori, L. Song, et al. UA-CRNN: GRAM: Graph-based Attention Model for Healthcare Representation Learning [C]. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, 2018, pp. 249-256) propose a representation learning method based on a knowledge-graph attention mechanism, which mainly uses the hierarchical information inherent in a medical ontology to learn more informative embedded representations of medical codes and then performs prediction with a deep learning method. However, the above technical methods have the following problems: (1) the model depends on the amount of training data: it predicts well when training data are sufficient and poorly when they are not; (2) the medical ontology knowledge contained in the medical codes is ignored, so prediction performance is poor for medical codes that occur infrequently and for rare cases.

The knowledge-graph-based representation learning method incurs a higher computational cost and is harder to train, because it must learn embedded representations of medical codes that contain richer information. In addition, the above methods ignore individual differences between patients, which affects the accuracy of the prediction.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides an adverse event risk prediction method based on a patient's electronic health record. A clustering algorithm is used to sample suitable training samples for each base learner, and an adaptive combination strategy is designed that generates the integration weights of the different base classifiers according to the distance between a training sample and the centers of the pre-training sets, so that the model is more adaptive.

The purpose of the invention is realized by the following technical scheme: a method for adverse event risk prediction based on an electronic patient health record, comprising the steps of:

S1, data preprocessing: in the electronic health record data, the data of each patient is treated as a time-ordered diagnostic sequence; the diagnostic sequence is processed as follows:

S11, let $C = \{c_1, c_2, \ldots, c_N\}$ denote the set of all diagnosis codes, where $c_i$ is the $i$-th diagnosis code, $1 \le i \le N$, and $N$ is the total number of diagnosis codes; let $X = [x_1, x_2, \ldots, x_T]$ denote the visit information of a patient, where the $t$-th visit $x_t \in \{0,1\}^N$ is a vector of $N$ elements, each equal to 0 or 1, i.e. $x_t = \{x_{t1}, x_{t2}, \ldots, x_{ti}, \ldots, x_{tN}\}$; if the diagnosis code $c_i \in \{c_1, c_2, \ldots, c_N\}$ with index $i$ appears in the $t$-th visit, then $x_{ti} = 1$, otherwise $x_{ti} = 0$;

S12, let $L = [l_1, l_2, \ldots, l_T]$ denote the personalized data of all visits of a patient, where $l_i$ is the vector representation of the personalized data recorded at the $i$-th visit; for each patient, the data are averaged over the $T$ visits to obtain the mean $l_*$ of each kind of data across visits; missing numerical values are filled with the mean, and missing non-numerical values are filled, following the statistical mode principle, with the value that occurs most frequently in that patient's data;

S13, each diagnosis code in $X$ is summed to obtain the frequency $s_{*i}$ of each unique diagnosis code in all visit information of each patient, i.e. $s_{*i} = \sum_{t=1}^{T} x_{ti}$; all $s_{*i}$ are then summed to obtain the frequency $S_*$ of the unique diagnosis codes in all data, and $P_* = s_*/S_*$ represents the proportion of the occurrence frequency of each diagnosis code in each patient's data relative to all data;

after this processing, the data of the $j$-th patient consists of three parts $X_j$, $L_j$, $F_j$, where $1 \le j \le M$ and $M$ is the number of patients whose data were collected; $F_j = [l_{*j}, P_{*j}]$, where $l_{*j}$ is the mean $l_*$ of each kind of data across the visits of the $j$-th patient, and $P_{*j}$ is the proportion of the occurrence frequency of each diagnosis code in the $j$-th patient's data relative to all data;

S2, K-means clustering and sampling: K-means clustering is performed with the data $F_j = [l_{*j}, P_{*j}]$ of each patient as sample points, dividing the data into 3 clusters and yielding 3 cluster centers $\theta_1, \theta_2, \theta_3$; the distance between $F_j$ of each patient and $F'$ of each cluster center is then calculated, and, at the same sampling rate for every cluster center, the corresponding sub-datasets are selected from the data of all patients in order of increasing distance, giving $D' = D_1' \cup D_2' \cup D_3'$; the resulting sub-datasets are used to train the base classifiers;

S3, the 3 cluster centers $\theta_1, \theta_2, \theta_3$ are sorted from small to large by the maximum value in $P_*$, and the three corresponding subsets are taken as a rare-code subset, a more-common-code subset and a common-code subset respectively; the three subsets are then input into the three base classifiers GRAM+, Dipole+ and RNN+ respectively for pre-training, after which the three base classifiers are fused.
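The frequency computation of step S13 can be illustrated with a minimal Python sketch. It assumes that $S_*$ is taken per code over all patients, which is one possible reading of the text; the function names and the toy data are hypothetical.

```python
from typing import List

Visit = List[int]      # multi-hot vector x_t of length N
Patient = List[Visit]  # the T visits of one patient


def code_counts(patient: Patient) -> List[int]:
    """s_{*i}: number of visits of one patient in which code i appears."""
    n_codes = len(patient[0])
    return [sum(visit[i] for visit in patient) for i in range(n_codes)]


def code_proportions(patients: List[Patient]) -> List[List[float]]:
    """P_* for every patient.

    Assumption: S_* is the total count of each code over all patients.
    Codes that never occur get proportion 0.0 to avoid division by zero.
    """
    per_patient = [code_counts(p) for p in patients]
    n_codes = len(per_patient[0])
    totals = [sum(s[i] for s in per_patient) for i in range(n_codes)]
    return [
        [s[i] / totals[i] if totals[i] else 0.0 for i in range(n_codes)]
        for s in per_patient
    ]


patients = [
    [[1, 0, 1], [1, 0, 0]],  # patient 0: two visits, N = 3 codes
    [[1, 1, 0]],             # patient 1: one visit
]
proportions = code_proportions(patients)
```

Here patient 0 accounts for two of the three occurrences of the first code, so the first entry of its proportion vector is 2/3.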

Further, GRAM+ adds, on the basis of GRAM, a global attention mechanism guided by the patient's personalized data; the specific design is as follows:

in the knowledge directed acyclic graph formed by the medical ontology, each leaf node is an element of the diagnosis code set in S11, and an ancestor node of a leaf node represents an ontology concept from which the concept represented by the leaf node is derived; every node $c$ is assigned a basic embedding vector $e$, and the final representation of a leaf node is expressed as a convex combination of the basic embeddings of itself and its ancestor nodes:

$$g_i = \sum_{j \in A(i)} \alpha_{ij} e_j, \qquad \sum_{j \in A(i)} \alpha_{ij} = 1, \quad \alpha_{ij} \ge 0$$

where $g_i$ is the final representation of medical code $c_i$, $A(i)$ is the set of indices of code $c_i$ and its ancestor nodes, and $\alpha_{ij}$ is the local attention weight, calculated by the following Softmax function:

$$\alpha_{ij} = \frac{\exp(f(e_i, e_j))}{\sum_{k \in A(i)} \exp(f(e_i, e_k))}$$

$f(e_i, e_j)$ is a scalar value representing the compatibility between the two basic embeddings $e_i$ and $e_j$, obtained by a multilayer perceptron;

by concatenating the final representations $g_1, g_2, \ldots, g_N$ of all medical codes, an embedding matrix $G$ is obtained, and the diagnosis vector $v_t$ is expressed as the vector $x_t$ multiplied by the embedding matrix $G$ and passed through the nonlinear $\tanh$ function:

$$v_1, v_2, \ldots, v_T = \tanh(G[x_1, x_2, \ldots, x_T])$$

the patient's personalized data $L = [l_1, l_2, \ldots, l_T]$ is then used to add a global attention weight $\beta_t$, giving a global representation $u_t$ that contains the patient's personalized data information:

$$u_t = \beta_t v_t, \quad t = 1, 2, \ldots, T$$

$\beta_t$ is calculated by the following Softmax function:

$$\beta_t = \frac{\exp(f(l_t, l_*))}{\sum_{i=1}^{T} \exp(f(l_i, l_*))}$$

$f(l_i, l_*)$ is a scalar value representing the compatibility between $l_i$ and $l_*$, obtained by a multilayer perceptron;

$u_1, u_2, \ldots, u_T$ are input into a GRU network to obtain the hidden-state representations $h_1, h_2, \ldots, h_T$, and the prediction information $\hat{y}_t$ is generated by a Softmax layer, defined as:

$$h_1, h_2, \ldots, h_T = \mathrm{GRU}(u_1, u_2, \ldots, u_T, \theta_r)$$

$$\hat{y}_t = \mathrm{Softmax}(W h_t + b)$$

where $\theta_r$ denotes the hyperparameters of the GRU network, and $W$ and $b$ are the weights and biases to be learned;

the loss is calculated from the true diagnostic information $y_t$ and the prediction information $\hat{y}_t$, where the superscript $\top$ in the loss formula denotes transposition; the loss is back-propagated: the error between the prediction and the ground truth is computed, and the parameters are learned and corrected by back-propagation until the GRAM+ model converges.
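The local attention step of GRAM+ (a softmax-weighted convex combination over a code and its ancestors) can be sketched in plain Python as follows. The compatibility function $f$ is a multilayer perceptron in the text; the dot product used here is only a stand-in assumption, and the node names and embeddings are hypothetical.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def final_representation(
    code: str,
    ancestors: Dict[str, List[str]],
    emb: Dict[str, List[float]],
) -> List[float]:
    """g_i = sum over j in A(i) of alpha_ij * e_j."""
    nodes = [code] + ancestors[code]  # A(i): the code itself plus its ancestors
    # Assumption: dot product replaces the MLP compatibility f(e_i, e_j).
    alpha = softmax([dot(emb[code], emb[j]) for j in nodes])
    dim = len(emb[code])
    return [sum(a * emb[j][d] for a, j in zip(alpha, nodes)) for d in range(dim)]


# Hypothetical three-level ontology: leaf -> parent -> root.
ancestors = {"leaf": ["parent", "root"]}
emb = {"leaf": [1.0, 0.0], "parent": [0.0, 1.0], "root": [0.0, 0.0]}
g = final_representation("leaf", ancestors, emb)
```

Because the attention weights are non-negative and sum to 1, `g` always lies inside the convex hull of the embeddings of the code and its ancestors.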

Further, Dipole+ uses the patient's personalized data $L = [l_1, l_2, \ldots, l_T]$ as a guide and combines a bidirectional recurrent neural network with an attention mechanism to predict the patient's visit information; first, the visit information $X$ is embedded into representation vectors $v_t$ by a multilayer perceptron, and the patient's personalized data $L = [l_1, l_2, \ldots, l_T]$ is then used to add a global attention weight $\beta_t$, giving a global representation $u_t$ that contains the patient's personalized data information:

$$u_t = \beta_t v_t, \quad t = 1, 2, \ldots, T$$

$\beta_t$ is calculated by the following Softmax function:

$$\beta_t = \frac{\exp(f(l_t, l_*))}{\sum_{i=1}^{T} \exp(f(l_i, l_*))}$$

$f(l_i, l_*)$ is a scalar value;

the vectors $u_t$ are then input into a bidirectional recurrent neural network; finally, the outputs of the two directions are concatenated, and an attention mechanism based on the position in the data sequence is used to generate the latent vector used for prediction.
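The global attention weighting shared by the three classifiers, $u_t = \beta_t v_t$, can be sketched as follows. The text obtains $f(l_i, l_*)$ from a multilayer perceptron; the negative squared Euclidean distance used here is only a stand-in assumption, and all names and values are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]


def visit_mean(personal: List[Vector]) -> Vector:
    """l_*: mean of each personalized feature over the T visits."""
    return [sum(col) / len(personal) for col in zip(*personal)]


def global_attention(
    visit_vectors: List[Vector], personal: List[Vector]
) -> Tuple[List[float], List[Vector]]:
    """Return (beta, u) with u_t = beta_t * v_t."""
    l_star = visit_mean(personal)
    # Assumption: f(l_t, l_*) = -||l_t - l_*||^2 replaces the MLP score.
    scores = [-sum((a - b) ** 2 for a, b in zip(l_t, l_star)) for l_t in personal]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    beta = [e / z for e in exps]
    u = [[b * x for x in v] for b, v in zip(beta, visit_vectors)]
    return beta, u


v = [[2.0, 4.0], [6.0, 8.0]]  # visit embeddings v_1, v_2
l = [[1.0], [3.0]]            # personalized data l_1, l_2 (one feature)
beta, u = global_attention(v, l)
```

In this toy case both visits are equally far from the mean, so each receives weight 0.5.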

Further, RNN+ uses, on the basis of an RNN, the patient's personalized data $L = [l_1, l_2, \ldots, l_T]$ to guide the representation vectors of the patient's visit information $X$ and generate global representation vectors $u_t$ that contain the patient's personalized data information; $u_t$ is computed in the same way as in Dipole+; the global representation vectors $u_t$ are input into an attention model based on the position in the data sequence, and prediction is then made using a unidirectional GRU.

Further, an adaptive weighted integration strategy is adopted in the model fusion stage: for each sample $X_i = [x_1, x_2, \ldots, x_T]$, its distance to each cluster center, $d_i = [\delta_{i1}, \delta_{i2}, \delta_{i3}]$, is calculated; an integration weight $w_i$ is generated for each sample from these distances, and the final integrated output is obtained by combining the outputs of the three base classifiers according to $w_i$.

The invention has the following beneficial effects: compared with the prior art, the technical scheme takes the differences between individual patients into account and uses the patient's personalized data to guide the model in establishing reasonable attention, which improves the accuracy of model prediction. It also takes into account the influence of different sample scales on model performance: a clustering algorithm samples suitable training samples for each base learner, and an adaptive combination strategy generates the integration weights of the different base classifiers according to the distance from a training sample to the centers of the pre-training sets, so that the model is more adaptive. In addition, sampling after clustering significantly reduces the amount of computation when training the basic embeddings.

Drawings

FIG. 1 is a flow chart of an adverse event risk prediction method based on an electronic patient health record of the present invention;

FIG. 2 is a schematic diagram of the structure of the GRAM+ classifier of the present invention;

FIG. 3 is a schematic diagram of the integration strategy of the present invention.

Detailed Description

The invention discloses an effective method for predicting adverse event risk from the electronic health records of AIDS patients. A clustering method is first used to sample suitable pre-training sets for the different base learners, so as to combine the prediction performance of the different classifiers on codes of different frequencies; an adaptive combination strategy is designed that generates the integration weights of the different base classifiers according to the distance between a training sample and the centers of the pre-training sets, balancing the differences in a single model's prediction performance on medical codes of different frequencies. In addition, an attention mechanism guided by personalized data is added to the model to account for individual differences and increase model accuracy. The technical scheme of the invention is further explained below with reference to the drawings.

As shown in FIG. 1, the adverse event risk prediction method based on a patient's electronic health record of the present invention comprises the following steps:

S1, data preprocessing: in the electronic health record data, the data of each patient is treated as a time-ordered diagnostic sequence, and each visit contains several medical codes (diagnosis, medication, procedure codes and the like); the diagnostic sequence is processed as follows:

S11, let $C = \{c_1, c_2, \ldots, c_N\}$ denote the set of all diagnosis codes, where $c_i$ is the $i$-th diagnosis code, $1 \le i \le N$, and $N$ is the total number of diagnosis codes; let $X = [x_1, x_2, \ldots, x_T]$ denote the visit information of a patient, where the $t$-th visit $x_t \in \{0,1\}^N$ is a binary vector of $N$ elements, each equal to 0 or 1, i.e. $x_t = \{x_{t1}, x_{t2}, \ldots, x_{ti}, \ldots, x_{tN}\}$; if the diagnosis code $c_i \in \{c_1, c_2, \ldots, c_N\}$ with index $i$ appears in the $t$-th visit, then $x_{ti} = 1$, otherwise $x_{ti} = 0$;

S12, let $L = [l_1, l_2, \ldots, l_T]$ denote the personalized data of all visits of a patient, where $l_i$ is the vector representation of the personalized data recorded at the $i$-th visit; for each patient, the data are averaged over the $T$ visits to obtain the mean $l_*$ of each kind of data across visits; missing numerical values are filled with the mean, and missing non-numerical values are filled, following the statistical mode principle, with the value that occurs most frequently in that patient's data;

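S13, each diagnosis code in $X$ is summed to obtain the frequency $s_{*i}$ of each unique diagnosis code in all visit information of each patient, i.e. $s_{*i} = \sum_{t=1}^{T} x_{ti}$; all $s_{*i}$ are then summed to obtain the frequency $S_*$ of the unique diagnosis codes in all data, and $P_* = s_*/S_*$ represents the proportion of the occurrence frequency of each diagnosis code in each patient's data relative to all data;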

after this processing, the data of the $j$-th patient consists of three parts $X_j$, $L_j$, $F_j$, where $1 \le j \le M$ and $M$ is the number of patients whose data were collected; $X_j$ is used to train the classifiers, $L_j$ is used to guide the attention mechanism of the classifiers, and $F_j = [l_{*j}, P_{*j}]$ is used to cluster the patients, where $l_{*j}$ is the mean $l_*$ of each kind of data across the visits of the $j$-th patient, and $P_{*j}$ is the proportion of the occurrence frequency of each diagnosis code in the $j$-th patient's data relative to all data.
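The missing-value rule of S12 (mean for numerical data, mode for non-numerical data) can be sketched per feature column as follows. This is a minimal illustration that assumes missing entries are represented by `None`; the helper name `impute_column` is hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import List, Optional, Union

Value = Union[float, str]


def impute_column(values: List[Optional[Value]]) -> List[Value]:
    """Fill missing entries of one feature across a patient's visits.

    Numerical columns are filled with the mean of the observed values;
    non-numerical columns are filled with the most frequent observed value.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    if all(isinstance(v, (int, float)) for v in observed):
        fill: Value = mean(observed)
    else:
        fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


temperature = impute_column([36.5, None, 37.5])
regimen = impute_column(["A", None, "B", "A"])
```

The same per-column mean also yields the entries of $l_*$ once the missing values have been filled.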

S2, K-means clustering and sampling: K-means clustering is performed with the data $F = [l_*, P_*]$ of each patient as sample points, dividing the data into 3 clusters (the number of clusters is generally the same as the number of base classifiers used later), so that patients with similar personalized data and similar diagnosis-code frequencies in their records are more likely to fall into the same cluster; given the number of clusters 3, the data $D$ of all patients is divided by the K-means algorithm into $D = D_1 \cup D_2 \cup D_3$, yielding 3 cluster centers $\theta_1, \theta_2, \theta_3$; the distance between $F_j$ of each patient and $F'$ of each cluster center is then calculated, and, at the same sampling rate for every cluster center, the corresponding sub-datasets are selected from the data of all patients in order of increasing distance, giving $D' = D_1' \cup D_2' \cup D_3'$; the resulting sub-datasets are used to train the base classifiers;
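Step S2 can be sketched with a small pure-Python K-means followed by nearest-to-center sampling. This is an illustration under assumptions, not the patented implementation: the initial centers are supplied by hand, Euclidean distance is used, and the sampling rate is read as the fraction of all patients kept for each center.

```python
import math
from typing import List

Point = List[float]


def dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], init_centers: List[Point], iters: int = 20) -> List[Point]:
    """Plain Lloyd iterations; returns the final cluster centers."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def sample_nearest(points: List[Point], centers: List[Point], rate: float) -> List[List[int]]:
    """For each center, keep the indices of the nearest fraction of patients."""
    keep = max(1, int(rate * len(points)))
    subsets = []
    for center in centers:
        ranked = sorted(range(len(points)), key=lambda i: dist(points[i], center))
        subsets.append(ranked[:keep])
    return subsets


# Toy F_j vectors (one feature each) forming three obvious groups.
F = [[0.0], [0.2], [5.0], [5.2], [10.0], [10.2]]
centers = kmeans(F, init_centers=[[0.0], [5.0], [10.0]])
subsets = sample_nearest(F, centers, rate=0.4)
```

With a rate of 0.4 and six patients, each center keeps its two nearest patients, which become the pre-training subset of one base classifier.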

S3, the 3 cluster centers $\theta_1, \theta_2, \theta_3$ are sorted from small to large by the maximum value in $P_*$, and the three corresponding subsets are taken as an uncommon-code subset, a more-common-code subset and a common-code subset respectively; the three subsets are then input into the three base classifiers GRAM+, Dipole+ and RNN+ respectively for pre-training so that each learns its decision boundary, after which the three base classifiers are fused;
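The ordering in S3 can be sketched as follows: each cluster is ranked by the largest entry of the $P_*$ part of its center, and the ranked clusters are paired with GRAM+, Dipole+ and RNN+ in that order. The function name and the toy center values are hypothetical.

```python
from typing import Dict, List


def assign_classifiers(center_p_parts: List[List[float]]) -> Dict[int, str]:
    """Map cluster index to base classifier, ordered by max(P_*) ascending."""
    order = sorted(range(len(center_p_parts)), key=lambda k: max(center_p_parts[k]))
    names = ["GRAM+", "Dipole+", "RNN+"]  # uncommon, more common, common codes
    return {cluster: name for cluster, name in zip(order, names)}


# P_* components of three hypothetical cluster centers.
center_p_parts = [[0.30, 0.10], [0.02, 0.01], [0.08, 0.12]]
assignment = assign_classifiers(center_p_parts)
```

In this example cluster 1 has the smallest maximum proportion, so it is treated as the uncommon-code subset and paired with GRAM+.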

GRAM+ adds, on the basis of GRAM, a global attention mechanism guided by the patient's personalized data; as shown in FIG. 2, the specific design is as follows:

in the knowledge directed acyclic graph formed by the medical ontology, nodes are distinguished as leaf nodes and ancestor nodes; because the naming and coding of the medical ontology follow a tree structure, all leaf nodes are elements of the diagnosis code set in S11, and an ancestor node represents an ontology concept from which the concept represented by a leaf node is derived; every node $c$ is assigned a basic embedding vector $e$, and the final representation of a leaf node is expressed as a convex combination of the basic embeddings of itself and its ancestor nodes:

$$g_i = \sum_{j \in A(i)} \alpha_{ij} e_j, \qquad \sum_{j \in A(i)} \alpha_{ij} = 1, \quad \alpha_{ij} \ge 0$$

where $g_i$ is the embedded representation of medical code $c_i$ (i.e., a leaf node), $A(i)$ is the set of indices of code $c_i$ and its ancestor nodes, and $\alpha_{ij}$ is the local attention weight, calculated by the following Softmax function:

$$\alpha_{ij} = \frac{\exp(f(e_i, e_j))}{\sum_{k \in A(i)} \exp(f(e_i, e_k))}$$

$f(e_i, e_j)$ is a scalar value representing the compatibility between the two basic embeddings $e_i$ and $e_j$, obtained by a multilayer perceptron (MLP); the basic embeddings are trained with GloVe, which learns the code representations from the global co-occurrence matrix of all nodes $c$;

by concatenating the final representations $g_1, g_2, \ldots, g_N$ of all medical codes, an embedding matrix $G$ is obtained, and the diagnosis vector $v_t$ is expressed as the vector $x_t$ multiplied by the embedding matrix $G$ and passed through the nonlinear $\tanh$ function:

$$v_1, v_2, \ldots, v_T = \tanh(G[x_1, x_2, \ldots, x_T])$$

the patient's personalized data $L = [l_1, l_2, \ldots, l_T]$ is then used to add a global attention weight $\beta_t$, giving a global representation $u_t$ that contains the patient's personalized data information:

$$u_t = \beta_t v_t, \quad t = 1, 2, \ldots, T$$

$\beta_t$ is calculated by the following Softmax function:

$$\beta_t = \frac{\exp(f(l_t, l_*))}{\sum_{i=1}^{T} \exp(f(l_i, l_*))}$$

$f(l_i, l_*)$ is a scalar value representing the compatibility between $l_i$ and $l_*$, obtained by a multilayer perceptron;

$u_1, u_2, \ldots, u_T$ are input into a GRU network to obtain the hidden-state representations $h_1, h_2, \ldots, h_T$, and the prediction information $\hat{y}_t$ is generated by a Softmax layer, defined as follows:

$$h_1, h_2, \ldots, h_T = \mathrm{GRU}(u_1, u_2, \ldots, u_T, \theta_r)$$

$$\hat{y}_t = \mathrm{Softmax}(W h_t + b)$$

where $\theta_r$ denotes the hyperparameters of the GRU network, and $W$ and $b$ are the weights and biases to be learned;

the loss is calculated from the true diagnostic information $y_t$ and the prediction information $\hat{y}_t$, where the superscript $\top$ in the loss formula denotes transposition; the loss is back-propagated: the error between the prediction and the ground truth is computed, and the parameters are learned and corrected by back-propagation until the GRAM+ model converges.
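The visit embedding $v_t = \tanh(G x_t)$ used by GRAM+ can be sketched without any numerical library. The layout of $G$ (one row per embedding dimension, one column per code, so that column $i$ holds $g_i$) is an assumption made for this illustration, and the values are arbitrary.

```python
import math
from typing import List


def embed_visit(G: List[List[float]], x_t: List[int]) -> List[float]:
    """v_t = tanh(G x_t) for a multi-hot visit vector x_t.

    Assumption: G has d rows and N columns, column i being g_i.
    """
    return [
        math.tanh(sum(row[i] * x_t[i] for i in range(len(x_t))))
        for row in G
    ]


G = [
    [0.5, -0.5, 1.0],  # first embedding dimension of g_1, g_2, g_3
    [0.0, 2.0, 0.0],   # second embedding dimension
]
x_t = [1, 0, 1]        # codes 1 and 3 appear in this visit
v_t = embed_visit(G, x_t)
```

Because $x_t$ is multi-hot, the product simply adds up the final representations of the codes present in the visit before the nonlinearity is applied.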

Dipole+ uses the patient's personalized data $L = [l_1, l_2, \ldots, l_T]$ as a guide and combines a bidirectional recurrent neural network with an attention mechanism to predict the patient's visit information; first, the visit information $X$ is embedded into representation vectors $v_t$ by a multilayer perceptron, and the patient's personalized data $L = [l_1, l_2, \ldots, l_T]$ is then used to add a global attention weight $\beta_t$, giving a global representation $u_t$ that contains the patient's personalized data information:

$$u_t = \beta_t v_t, \quad t = 1, 2, \ldots, T$$

$\beta_t$ is calculated by the following Softmax function:

$$\beta_t = \frac{\exp(f(l_t, l_*))}{\sum_{i=1}^{T} \exp(f(l_i, l_*))}$$

$f(l_i, l_*)$ is a scalar value;

RNN+ uses, on the basis of an RNN, the patient's personalized data $L = [l_1, l_2, \ldots, l_T]$ to guide the representation vectors of the patient's visit information $X$ and generate global representation vectors $u_t$ that contain the patient's personalized data information; $u_t$ is computed in the same way as in Dipole+; the global representation vectors $u_t$ are input into an attention model based on the position in the data sequence, and prediction is then made using a unidirectional GRU.

An adaptive weighted integration strategy is adopted in the model fusion stage. As shown in FIG. 3, in the fusion stage, for each sample $X_i = [x_1, x_2, \ldots, x_T]$, its distance to each cluster center, $d_i = [\delta_{i1}, \delta_{i2}, \delta_{i3}]$, is calculated. The distance measures the degree to which the sample belongs to a given cluster: the closer a sample is to the center of a base classifier's pre-training subset, the better that classifier is adapted to the sample, so the distance indirectly measures how well the classifier trained on that cluster can predict the new sample $X_i$. For each sample, an integration weight $w_i$ is generated from these distances, and the final integrated output is obtained by combining the outputs of the three base classifiers according to $w_i$. The integrated output is the prediction output: it gives the probability that the medical code corresponding to each index occurs in a future visit, i.e. the risk of the various adverse events that the patient may experience in the future, thereby helping the doctor make more reasonable decisions about the patient's medical care.
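The fusion stage can be sketched as follows. The weight formula is not reproduced in this text, so the sketch assumes a softmax over negative distances and a weighted sum of the classifier outputs; both choices, and all names and numbers, are illustrative assumptions only.

```python
import math
from typing import List


def fusion_weights(distances: List[float]) -> List[float]:
    """Assumed weighting: softmax of the negative distances d_i."""
    m = min(distances)
    exps = [math.exp(-(d - m)) for d in distances]  # shift for stability
    z = sum(exps)
    return [e / z for e in exps]


def fuse(distances: List[float], outputs: List[List[float]]) -> List[float]:
    """Weighted sum of the base-classifier outputs (assumed combination)."""
    w = fusion_weights(distances)
    n_codes = len(outputs[0])
    return [
        sum(w[k] * outputs[k][i] for k in range(len(outputs)))
        for i in range(n_codes)
    ]


# Outputs of GRAM+, Dipole+ and RNN+ for two codes (hypothetical values).
outputs = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
equal = fuse([1.0, 1.0, 1.0], outputs)        # sample equidistant from all centers
near_first = fusion_weights([0.0, 100.0, 100.0])  # sample very close to center 1
```

A sample equidistant from all three centers receives the plain average of the three outputs, while a sample very close to one center is dominated by the classifier pre-trained on that cluster.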

It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims

1. A method for predicting risk of an adverse event based on an electronic health record of a patient, comprising the steps of:

S1, data preprocessing: in the electronic health record data, the data of each patient is treated as a time-ordered diagnostic sequence; the diagnostic sequence is processed as follows:

S11, let $C = \{c_1, c_2, \ldots, c_N\}$ denote the set of all diagnosis codes, where $c_i$ is the $i$-th diagnosis code, $1 \le i \le N$, and $N$ is the total number of diagnosis codes; let $X = [x_1, x_2, \ldots, x_T]$ denote the visit information of a patient, where the $t$-th visit $x_t \in \{0,1\}^N$ is a vector of $N$ elements, each equal to 0 or 1, i.e. $x_t = \{x_{t1}, x_{t2}, \ldots, x_{ti}, \ldots, x_{tN}\}$; if the diagnosis code $c_i \in \{c_1, c_2, \ldots, c_N\}$ with index $i$ appears in the $t$-th visit, then $x_{ti} = 1$, otherwise $x_{ti} = 0$;

S13, each diagnosis code in $X$ is summed to obtain the frequency $s_{*i}$ of each unique diagnosis code in all visit information of each patient, i.e. $s_{*i} = \sum_{t=1}^{T} x_{ti}$; all $s_{*i}$ are summed to obtain the frequency $S_*$ of the unique diagnosis codes in all data, and $P_* = s_*/S_*$ represents the proportion of the occurrence frequency of each diagnosis code in each patient's data relative to all data;

after this processing, the data of the $j$-th patient consists of three parts $X_j$, $L_j$, $F_j$, where $1 \le j \le M$ and $M$ is the number of patients whose data were collected; $F_j = [l_{*j}, P_{*j}]$, where $l_{*j}$ is the mean $l_*$ of each kind of data across the visits of the $j$-th patient, and $P_{*j}$ is the proportion of the occurrence frequency of each diagnosis code in the $j$-th patient's data relative to all data;

S2, K-means clustering and sampling: K-means clustering is performed with the data $F_j = [l_{*j}, P_{*j}]$ of each patient as sample points, dividing the data into 3 clusters and yielding 3 cluster centers $\theta_1, \theta_2, \theta_3$; the distance between $F_j$ of each patient and $F'$ of each cluster center is then calculated, and, at the same sampling rate for every cluster center, the corresponding sub-datasets are selected from the data of all patients in order of increasing distance, giving $D' = D_1' \cup D_2' \cup D_3'$; the resulting sub-datasets are used to train the base classifiers;

S3, the 3 cluster centers $\theta_1, \theta_2, \theta_3$ are sorted from small to large by the maximum value in $P_*$, and the three corresponding subsets are taken as an uncommon-code subset, a more-common-code subset and a common-code subset respectively; the three subsets are then input into the three base classifiers GRAM+, Dipole+ and RNN+ respectively for pre-training, after which the three base classifiers are fused.

2. The method as claimed in claim 1, wherein GRAM+ adds, on the basis of GRAM, a global attention mechanism guided by the patient's personalized data, and is specifically designed as follows:

by concatenating the final representations $g_1, g_2, \ldots, g_N$ of all medical codes, an embedding matrix $G$ is obtained, and the diagnosis vector $v_t$ is expressed as the vector $x_t$ multiplied by the embedding matrix $G$ and passed through the nonlinear hyperbolic tangent activation function $\tanh$:

$$v_1, v_2, \ldots, v_T = \tanh(G[x_1, x_2, \ldots, x_T])$$

$$u_t = \beta_t v_t, \quad t = 1, 2, \ldots, T$$

$\beta_t$ is calculated by the following Softmax function:

$$\beta_t = \frac{\exp(f(l_t, l_*))}{\sum_{i=1}^{T} \exp(f(l_i, l_*))}$$

$f(l_i, l_*)$ is a scalar value representing the compatibility between $l_i$ and $l_*$, obtained by a multilayer perceptron;

$u_1, u_2, \ldots, u_T$ are input into a GRU network to obtain the hidden-state representations $h_1, h_2, \ldots, h_T$, and the prediction information $\hat{y}_t$ is generated by a Softmax layer, defined as:

$$h_1, h_2, \ldots, h_T = \mathrm{GRU}(u_1, u_2, \ldots, u_T, \theta_r)$$

$$\hat{y}_t = \mathrm{Softmax}(W h_t + b)$$

where $\theta_r$ denotes the hyperparameters of the GRU network, and $W$ and $b$ are the weights and biases to be learned;

the loss is calculated from the true diagnostic information $y_t$ and the prediction information $\hat{y}_t$, where the superscript $\top$ in the loss formula denotes transposition.

3. The method as claimed in claim 1, wherein Dipole+ uses the patient's personalized data $L = [l_1, l_2, \ldots, l_T]$ as a guide and combines a bidirectional recurrent neural network with an attention mechanism to predict the patient's visit information; first, the visit information $X$ is embedded into representation vectors $v_t$ by a multilayer perceptron, and the patient's personalized data $L = [l_1, l_2, \ldots, l_T]$ is then used to add a global attention weight $\beta_t$, giving a global representation $u_t$ that contains the patient's personalized data information:

$$u_t = \beta_t v_t, \quad t = 1, 2, \ldots, T$$

$\beta_t$ is calculated by the following Softmax function:

$$\beta_t = \frac{\exp(f(l_t, l_*))}{\sum_{i=1}^{T} \exp(f(l_i, l_*))}$$

$f(l_i, l_*)$ is a scalar value;

4. The method of claim 3, wherein RNN+ uses, on the basis of an RNN, the patient's personalized data $L = [l_1, l_2, \ldots, l_T]$ to guide the representation vectors of the patient's visit information $X$ and generate global representation vectors $u_t$ that contain the patient's personalized data information; $u_t$ is computed in the same way as in Dipole+; the global representation vectors $u_t$ are input into an attention model based on the position in the data sequence, and prediction is then made using a unidirectional GRU.

5. The method of claim 1, wherein an adaptive weighted integration strategy is adopted in the model fusion stage: for each sample $X_i = [x_1, x_2, \ldots, x_T]$, its distance to each cluster center, $d_i = [\delta_{i1}, \delta_{i2}, \delta_{i3}]$, is calculated; the final integrated output is expressed in terms of the outputs of the three base classifiers.