CN114783608B - Method for constructing a disease risk prediction model for chronic-disease patient populations based on a graph autoencoder - Google Patents

Method for constructing a disease risk prediction model for chronic-disease patient populations based on a graph autoencoder

Info

Publication number
CN114783608B
CN114783608B (application CN202210507317.9A)
Authority
CN
China
Prior art keywords
disease
patient
encoder
graph
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210507317.9A
Other languages
Chinese (zh)
Other versions
CN114783608A (en
Inventor
邱航
胡智栩
杨萍
王利亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202210507317.9A priority Critical patent/CN114783608B/en
Publication of CN114783608A publication Critical patent/CN114783608A/en
Application granted granted Critical
Publication of CN114783608B publication Critical patent/CN114783608B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Primary Health Care (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention relates to the technical field of medical information, and in particular to a method for constructing a disease risk prediction model for chronic-disease patient populations based on a graph autoencoder. The method constructs a patient-disease bipartite graph from patients' hospitalization records and historical disease information, then extracts feature vectors for patients and diseases respectively, and finally builds a disease risk prediction model based on a graph attention mechanism within a graph autoencoder framework to predict the future disease risk of chronic-disease patients. Because the decoder of the disease risk prediction model uses an attention mechanism and also takes the weight information of edges into account, the model can consider both the topological information of the bipartite graphs and the individual differences between patients, and learns the complex influence relations among diseases, thereby improving prediction performance.

Description

Method for constructing a disease risk prediction model for chronic-disease patient populations based on a graph autoencoder
Technical Field
The invention relates to the technical field of medical information, and in particular to a method for constructing a disease risk prediction model for chronic-disease patient populations based on a graph autoencoder.
Background
The aggravation of population aging and the rapid rise in the incidence of chronic diseases impose a serious social and economic burden worldwide. It is estimated that more than 75% of the elderly have at least one chronic disease, and multimorbidity in the elderly (two or more chronic diseases) has become a prominent global problem, leading to greater medical needs and higher medical service usage and costs. There are complex correlations between chronic diseases: some diseases may lead to the occurrence of other chronic diseases, further increasing the treatment burden on the patient. Preventing and treating chronic diseases and their related complications has therefore become a pressing problem. Effectively predicting the future disease risk of chronic-disease patients allows doctors to intervene in advance and reduce the risk of related diseases, thereby preventing disease, which is of great practical significance. The existing disease risk prediction methods mainly have the following problems:
(1) Some prediction methods model the disease prediction problem as a series of binary classification models, each of which predicts whether one disease occurs; this modeling approach causes the number of models to grow with the number of predicted diseases, limiting practicality.
(2) Some prediction methods use the patient's historical disease information, abstract it into a patient-disease bipartite graph, model the problem as a link prediction problem, and predict disease risk with heuristic methods such as the Common Neighbors (CN) index and the Adamic-Adar (AA) index; these methods consider only the topological information of the bipartite graph and ignore individual differences between patients, such as sex and age.
(3) Most existing prediction methods do not consider the complex influence relations among diseases, so their prediction performance is poor.
Disclosure of Invention
In order to solve the above technical problems in the prior art, the invention provides a method for constructing a disease risk prediction model for chronic-disease patient populations based on a graph autoencoder, aimed at the technical problems mentioned in the background that existing prediction methods do not consider the complex influence relations among diseases and therefore predict poorly.
The technical scheme adopted by the invention is as follows:
the method for constructing the slow patient group disease risk prediction model based on the graph self-encoder comprises the following steps:
step 1: acquiring a data set of a first page of a historical medical record, preprocessing data in the data set, and storing the preprocessed historical case data into a storage space established by a storage medium;
step 2: dividing the preprocessed historical case data into a disease which the patient has historically and a disease which the patient has in the future based on a time sequence, constructing the disease which the patient has historically into a patient-disease coding bipartite graph, and constructing the disease which the patient has in the future N years into a patient-disease decoding bipartite graph;
step 3: invoking historical case data in the storage space, and extracting a patient feature vector and a disease feature vector based on the historical case data;
step 4: establishing an encoder and a decoder based on the patient-disease encoding bipartite graph and the patient-disease decoding bipartite graph respectively, wherein the encoder is a graph attention network, and establishing a disease risk prediction model based on the encoder and the decoder;
step 4.1: establishing a heuristic feature extraction model;
step 4.2: establishing a neighbor sampling strategy;
step 4.3: using a graph attention network as an encoder, wherein the encoder comprises at least one graph convolution module, and a graph convolution layer of each graph convolution module learns weights of different neighbors by adopting a graph attention mechanism to obtain a final embedded vector expression;
step 4.4: constructing a bilinear decoder based on the patient-disease decoding bipartite graph, wherein the bilinear decoder, given the embedded vector expressions and heuristic features of the edges between known patients and diseases, predicts the existence probability of edges in the patient-disease decoding bipartite graph;
step 5: disease risk prediction models are trained based on the dataset of the historic medical records top page.
For a new hospitalization record, the invention can likewise obtain the patient's individual information, hospitalization hospital information and historical disease diagnoses, and extract the corresponding patient and disease feature vectors; the new hospitalization record data are added to both the patient-disease encoding bipartite graph and the patient-disease decoding bipartite graph. Finally, the trained disease risk prediction model outputs the patient's risks for the other diseases, arranged in descending order of risk.
Because the decoder of the disease risk prediction model uses an attention mechanism and also takes the weight information of edges into account, the model can consider both the topological information of the bipartite graphs and the individual differences between patients, and learns the complex influence relations among diseases, thereby improving the prediction performance.
Preferably, the preprocessing in step 1 removes variables whose missing rate in the data set is greater than 30%, and, for the remaining variables that still have missing values, fills each missing value with the mean of the non-missing portion.
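As a minimal sketch of the claimed preprocessing rule (field names and record layout are hypothetical, not from the patent), the 30% missing-rate filter and mean imputation might look like this:

```python
def preprocess(records, max_missing_rate=0.30):
    """records: list of dicts; None marks a missing value.
    Drops variables missing in more than 30% of records, then fills
    remaining missing values with the mean of the non-missing portion."""
    n = len(records)
    fields = {f for r in records for f in r}
    kept = [f for f in fields
            if sum(r.get(f) is None for r in records) / n <= max_missing_rate]
    cleaned = []
    for r in records:
        row = {}
        for f in kept:
            v = r.get(f)
            if v is None:  # mean-impute from the non-missing part
                vals = [x[f] for x in records if x.get(f) is not None]
                v = sum(vals) / len(vals)
            row[f] = v
        cleaned.append(row)
    return cleaned

rows = [{"age": 70, "bmi": None}, {"age": 80, "bmi": 24.0},
        {"age": 75, "bmi": 26.0}, {"age": None, "bmi": 25.0}]
clean = preprocess(rows)
# "bmi" is missing in 25% of records (<= 30%), so it is kept and imputed
```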
Preferably, the edges in the patient-disease encoding bipartite graph represent diseases that the patient has had historically, and the weights represent the number of occurrences of each disease; the patient-disease decoding bipartite graph comprises positive samples and negative samples, where a positive sample is a disease newly developed by the patient in the next N years, and a negative sample is a disease the patient will not newly develop in the next N years; the edges of the patient-disease decoding bipartite graph are obtained by subtracting the patient-disease encoding bipartite graph from the complete bipartite graph; the patient-disease encoding bipartite graph is used by the encoder to automatically learn the embedded vector expressions of the patient nodes and disease nodes, and the patient-disease decoding bipartite graph is used by the decoder to learn the occurrence probability of each edge.
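The construction of the two bipartite graphs can be sketched as follows (record tuples and the string split date are illustrative assumptions; the patent works with full first-page records):

```python
from collections import defaultdict

def build_bipartite_graphs(records, split_date):
    """records: (patient_id, icd_code, date) tuples.
    Diagnoses before split_date form the weighted encoding graph;
    diseases newly appearing on/after split_date are positive decoding
    edges, and every other (patient, disease) pair not already in the
    encoding graph is a negative edge (complete graph minus encoding graph)."""
    enc = defaultdict(int)        # (patient, disease) -> occurrence count
    future = defaultdict(set)     # patient -> diseases in the label window
    patients, diseases = set(), set()
    for pid, icd, date in records:
        patients.add(pid); diseases.add(icd)
        if date < split_date:
            enc[(pid, icd)] += 1  # edge weight = number of occurrences
        else:
            future[pid].add(icd)
    pos, neg = [], []
    for pid in patients:
        for icd in diseases:
            if (pid, icd) in enc:
                continue          # already had it historically: not a label
            (pos if icd in future[pid] else neg).append((pid, icd))
    return dict(enc), pos, neg

recs = [("p1", "E11", "2015"), ("p1", "E11", "2016"),
        ("p1", "I10", "2020"), ("p2", "I10", "2016")]
enc, pos, neg = build_bipartite_graphs(recs, "2019")
# p1's repeated E11 gives an encoding edge of weight 2; p1's new I10 is a
# positive decoding edge; p2-E11 is a negative decoding edge
```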
Preferably, the extracted patient feature vector includes individual information, hospitalization hospital information, the number of historical diseases, and the ECI comorbidity index of the historical diseases; data whose feature type is discrete are one-hot encoded and converted into 0-1 binary variables; data whose feature type is numerical are treated as continuous features taking real values; and discrete data with an ordinal relationship are encoded as numerical features.
Preferably, the disease feature vector is extracted by arranging the ICD-10 codes of the disease nodes in ascending order to obtain a serial number for each disease node, and then generating a vector for each disease node by one-hot encoding; in addition, the prevalence of each disease (the number of patients with the disease divided by the total number of patients) is calculated as a feature characterizing how common the disease is.
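A small sketch of this disease-feature construction (toy ICD-10 codes; the real vocabulary comes from the dataset):

```python
def disease_features(patient_diseases):
    """patient_diseases: patient_id -> set of ICD-10 codes.
    Each disease gets a one-hot vector indexed by ascending ICD-10 order,
    plus its prevalence = patients with the disease / total patients."""
    codes = sorted({c for s in patient_diseases.values() for c in s})
    n_patients = len(patient_diseases)
    feats = {}
    for idx, code in enumerate(codes):
        onehot = [0.0] * len(codes)
        onehot[idx] = 1.0
        prev = sum(code in s for s in patient_diseases.values()) / n_patients
        feats[code] = onehot + [prev]
    return feats

feats = disease_features({"p1": {"E11", "I10"}, "p2": {"I10"}})
# ascending ICD-10 order is ["E11", "I10"];
# E11 prevalence = 1/2, I10 prevalence = 2/2
```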
Preferably, the step 4 includes the steps of:
step 4.1: establishing a heuristic feature extraction model:
$$\mathrm{CN}_{ij}=\left|\mathcal{N}_i\cap\mathcal{N}_j^{(2)}\right|$$

$$\mathrm{AA}_{ij}=\sum_{z\in\mathcal{N}_i\cap\mathcal{N}_j^{(2)}}\frac{1}{\log\left|\mathcal{N}_z\right|}$$

$$\mathrm{JC}_{ij}=\frac{\left|\mathcal{N}_i\cap\mathcal{N}_j^{(2)}\right|}{\left|\mathcal{N}_i\cup\mathcal{N}_j^{(2)}\right|}$$

$$\mathrm{PA}_{ij}=\left|\mathcal{N}_i\right|\cdot\left|\mathcal{N}_j\right|$$

wherein \(\mathcal{N}_i\), \(\mathcal{N}_j\) and \(\mathcal{N}_z\) are the neighbor-node sets of nodes i, j and z respectively, node i being the central node; \(|\cdot|\) denotes the size of a set; \(\mathcal{N}_j^{(2)}\) is the second-order neighbor set of node j; \(\mathrm{CN}_{ij}\) is the Common Neighbors index of edge (i, j) of the patient-disease encoding bipartite graph, \(\mathrm{AA}_{ij}\) the Adamic-Adar index, \(\mathrm{JC}_{ij}\) the Jaccard coefficient index, and \(\mathrm{PA}_{ij}\) the Preferential Attachment index; the larger the value of an index, the higher the probability that the edge exists;
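These four indices can be computed directly from the adjacency sets of the encoding bipartite graph; a toy sketch (node names hypothetical, with a guard excluding degree-1 common neighbors from AA to avoid log 1 = 0):

```python
import math

def heuristic_features(adj, i, j):
    """adj: node -> set of neighbor nodes in the patient-disease encoding
    bipartite graph. For a candidate edge (patient i, disease j), the
    common-neighbor-style indices compare N(i) with the second-order
    neighborhood of j, since both lie on the disease side."""
    n_i = adj[i]
    n2_j = set().union(*(adj[p] for p in adj[j])) if adj[j] else set()
    common = n_i & n2_j
    cn = len(common)
    aa = sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1)
    union = n_i | n2_j
    jc = cn / len(union) if union else 0.0
    pa = len(adj[i]) * len(adj[j])
    return cn, aa, jc, pa

# toy encoding graph: patients p1, p2; diseases d1, d2, d3
adj = {"p1": {"d1", "d2"}, "p2": {"d2", "d3"},
       "d1": {"p1"}, "d2": {"p1", "p2"}, "d3": {"p2"}}
cn, aa, jc, pa = heuristic_features(adj, "p1", "d3")
```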
step 4.2: establishing a neighbor sampling strategy:
$$\hat{p}_{ij}=\frac{w_{ij}}{\sum_{u\in\mathcal{N}_i}w_{iu}}$$

wherein \(w_{ij}\) and \(\hat{p}_{ij}\) are respectively the weight and the sampling probability of edge (i, j) of the patient-disease encoding bipartite graph, and \(w_{iu}\) is the weight of edge (i, u); based on the sampling probability \(\hat{p}_{ij}\), the neighbors of the central node are sampled with replacement to obtain a fixed number of neighbor samples;
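The weighted sampling-with-replacement step maps directly onto the standard library; a short sketch (adjacency layout hypothetical):

```python
import random

def sample_neighbors(adj_w, node, k, rng):
    """adj_w: node -> {neighbor: edge weight}. Samples k neighbors of
    `node` with replacement, with probability proportional to edge weight
    (p_ij = w_ij / sum_u w_iu), yielding a fixed-size neighbor sample."""
    neigh = list(adj_w[node])
    weights = [adj_w[node][u] for u in neigh]
    return rng.choices(neigh, weights=weights, k=k)

rng = random.Random(0)
adj_w = {"p1": {"d1": 3, "d2": 1}}   # d1 is sampled 3x as often as d2
samples = sample_neighbors(adj_w, "p1", 8, rng)
```

`random.choices` normalizes the weights internally, so the explicit division by the weight sum is not needed in code.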
step 4.3: using a graph attention network as an encoder, wherein the encoder comprises at least one graph convolution module, and the graph convolution layer of each graph convolution module learns the weights of different neighbors by a graph attention mechanism to obtain the final embedded vector expression; let the feature of node i at the l-th layer of the encoder be \(h_i^{(l)}\); the multi-head attention weight \(\alpha_{ij}^{(c,l)}\) from node j to node i is calculated by the following formulas:

$$q_i^{(c,l)}=W_q^{(c,l)}h_i^{(l)}+b_q^{(c,l)}$$

$$k_{ij}^{(c,l)}=W_k^{(c,l)}\left(w_{ij}\,h_j^{(l)}\right)+b_k^{(c,l)}$$

$$\alpha_{ij}^{(c,l)}=\frac{\exp\!\left(\langle q_i^{(c,l)},k_{ij}^{(c,l)}\rangle/\sqrt{d}\right)}{\sum_{u\in\mathcal{N}_i}\exp\!\left(\langle q_i^{(c,l)},k_{iu}^{(c,l)}\rangle/\sqrt{d}\right)}$$

wherein \(q_i^{(c,l)}\) is the query vector of the attention of central node i at the c-th head in the l-th layer network of the encoder; \(W_q^{(c,l)}\) and \(b_q^{(c,l)}\) are the weight matrix and bias term of the attention of the query vector q at the c-th head in the l-th layer; \(h_i^{(l)}\) is the embedded vector of central node i at the l-th layer; \(k_{ij}^{(c,l)}\) is the key vector of the attention of node j at the c-th head in the l-th layer, with \(W_k^{(c,l)}\) and \(b_k^{(c,l)}\) its weight matrix and bias term, \(h_j^{(l)}\) the embedded vector of node j, and \(w_{ij}\) the weight of edge (i, j); \(\alpha_{ij}^{(c,l)}\) is the attention weight of edge (i, j) at the c-th head in the l-th layer, \(k_{iu}^{(c,l)}\) the key vector of node u, \(\langle\cdot,\cdot\rangle/\sqrt{d}\) the exponentially scaled vector dot product, and d the dimension of the vectors;

after obtaining the multi-head attention weights, a message aggregation operation is carried out on the embedded vectors of the different neighbors:

$$v_j^{(c,l)}=W_v^{(c,l)}h_j^{(l)}+b_v^{(c,l)}$$

$$\hat{h}_i^{(l)}=\Big\Vert_{c=1}^{C}\sum_{j\in\mathcal{N}_i}\alpha_{ij}^{(c,l)}\,v_j^{(c,l)}$$

wherein \(v_j^{(c,l)}\) is the value vector of the attention of node j at the c-th head in the l-th layer network of the encoder; \(W_v^{(c,l)}\) and \(b_v^{(c,l)}\) are the weight matrix and bias term of the attention of the value vector v at the c-th head in the l-th layer; \(\hat{h}_i^{(l)}\) is the attention vector of central node i at the l-th layer; \(\Vert\) denotes the concatenation of vectors;

the embedded vector \(h_i^{(l)}\) of central node i is then combined with \(\hat{h}_i^{(l)}\), a gated residual mechanism being used to selectively control the inflow of information, so as to calculate the embedded vector expression of the next layer \(h_i^{(l+1)}\); the specific calculation formulas are as follows:

$$r_i^{(l)}=W_r^{(l)}h_i^{(l)}+b_r^{(l)}$$

$$\beta_i^{(l)}=\mathrm{sigmoid}\!\left(W_g^{(l)}\left[\hat{h}_i^{(l)}\,;\,r_i^{(l)}\,;\,\hat{h}_i^{(l)}-r_i^{(l)}\right]\right)$$

$$h_i^{(l+1)}=\mathrm{ReLU}\!\left(\mathrm{LayerNorm}\!\left(\left(1-\beta_i^{(l)}\right)\hat{h}_i^{(l)}+\beta_i^{(l)}\,r_i^{(l)}\right)\right)$$

wherein \(r_i^{(l)}\) is the residual information of central node i in the l-th layer network of the encoder, with \(W_r^{(l)}\) and \(b_r^{(l)}\) its weight matrix and bias term; \(\beta_i^{(l)}\) is the weight of the gated residual of central node i at the l-th layer: \(\hat{h}_i^{(l)}\), \(r_i^{(l)}\) and \(\hat{h}_i^{(l)}-r_i^{(l)}\) are concatenated in sequence, linearly transformed by the weight matrix \(W_g^{(l)}\), and mapped to the interval from 0 to 1 by a sigmoid function, thereby controlling the inflow of information from \(r_i^{(l)}\) and \(\hat{h}_i^{(l)}\); finally, the LayerNorm and ReLU activation functions yield the embedded vector representation \(h_i^{(l+1)}\) of central node i for the next layer;
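One encoder layer can be sketched numerically as follows. This is a single-head toy version under the reconstructed formulas; the parameter shapes, the way the edge weight is folded into the key, and the parameter-free LayerNorm are all illustrative assumptions, not the patent's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                           # toy embedding dimension
# hypothetical parameters of one single-head encoder layer
Wq, Wk, Wv, Wr = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))
bq = bk = bv = br = np.zeros(d)
Wg = rng.standard_normal((1, 3 * d)) * 0.1      # gate over [h_hat; r; h_hat-r]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encoder_layer(h_i, h_neigh, w_edges):
    """h_i: (d,) central-node embedding; h_neigh: (n, d) sampled neighbor
    embeddings; w_edges: (n,) edge weights, folded into the keys."""
    q = Wq @ h_i + bq                                    # query vector
    k = (w_edges[:, None] * h_neigh) @ Wk.T + bk         # per-edge keys
    logits = k @ q / np.sqrt(d)                          # scaled dot product
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()                                 # attention weights
    v = h_neigh @ Wv.T + bv                              # value vectors
    h_hat = alpha @ v                                    # message aggregation
    r = Wr @ h_i + br                                    # residual branch
    beta = sigmoid(Wg @ np.concatenate([h_hat, r, h_hat - r]))[0]
    out = (1 - beta) * h_hat + beta * r                  # gated residual
    out = (out - out.mean()) / (out.std() + 1e-6)        # LayerNorm (no affine)
    return np.maximum(out, 0.0)                          # ReLU

h_i = rng.standard_normal(d)
h_neigh = rng.standard_normal((3, d))
h_next = encoder_layer(h_i, h_neigh, np.array([2.0, 1.0, 1.0]))
```

A multi-head version would run C such heads and concatenate their `h_hat` vectors before the gated residual.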
Step 4.4: constructing a bilinear decoder, wherein the bilinear decoder, given the embedded vector expressions of a known patient and disease, predicts the existence probability of the corresponding edge in the patient-disease decoding bipartite graph; the calculation formulas are as follows:

$$g_{ij}=\left[\mathrm{CN}_{ij}\,;\,\mathrm{AA}_{ij}\,;\,\mathrm{JC}_{ij}\,;\,\mathrm{PA}_{ij}\right]$$

$$s_{ij}=\Big\Vert_{c=1}^{C}\,h_i^{\mathrm{T}}W_b^{(c)}h_j$$

wherein \(g_{ij}\) collects the indices corresponding to edge (i, j) and serves as its heuristic features; \(h_i^{\mathrm{T}}\) is the transpose of the embedded vector of node i and \(h_j\) the embedded vector of node j; with reference to the multi-head attention mechanism, multiple weight matrices \(W_b^{(c)}\) are used to learn different bilinear interactions between \(h_i\) and \(h_j\), and the learned results are concatenated to obtain \(s_{ij}\); \(s_{ij}\) is spliced with the heuristic features to form the hidden-layer feature expression \([s_{ij}\,;\,g_{ij}]\) of the edge, which is finally linearly transformed by the weight matrix \(W_o\), with the bias term \(b_o\) added, to obtain the output-layer result, and a sigmoid activation function gives the prediction probability \(p_{ij}\) of edge (i, j):

$$p_{ij}=\mathrm{sigmoid}\!\left(W_o\left[s_{ij}\,;\,g_{ij}\right]+b_o\right)$$

The loss function uses cross entropy and is calculated as follows:

$$\mathrm{Loss}=-\frac{1}{\left|G_{dec}\right|}\sum_{e_{ij}\in G_{dec}}\left[y_{ij}\log p_{ij}+\left(1-y_{ij}\right)\log\left(1-p_{ij}\right)\right]$$

wherein \(G_{dec}\) denotes the decoding graph, \(e_{ij}\) the edge (i, j), and \(y_{ij}\) the label of the edge; the Loss of the model is optimized with a gradient descent algorithm to train the disease risk prediction model.
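The decoder and loss above can be sketched as follows (toy dimensions and random parameters; training would update `Wb`, `Wo`, `bo` by gradient descent rather than fix them as here):

```python
import numpy as np

rng = np.random.default_rng(1)
d, C = 4, 2                                  # toy dimension, 2 bilinear "heads"
Wb = rng.standard_normal((C, d, d)) * 0.1    # hypothetical bilinear weights
Wo = rng.standard_normal(C + 4) * 0.1        # output weights over [s ; g]
bo = 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def edge_probability(h_i, h_j, g_ij):
    """h_i, h_j: node embeddings from the encoder; g_ij: the 4 heuristic
    features [CN, AA, JC, PA] of edge (i, j)."""
    s = np.array([h_i @ Wb[c] @ h_j for c in range(C)])  # bilinear scores
    z = np.concatenate([s, g_ij])                        # hidden feature of edge
    return sigmoid(Wo @ z + bo)

def cross_entropy(p, y):
    """Mean cross entropy over labelled decoding-graph edges."""
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

h_i, h_j = rng.standard_normal(d), rng.standard_normal(d)
p = edge_probability(h_i, h_j, np.array([1.0, 1.44, 0.33, 2.0]))
loss = cross_entropy([p, 0.9, 0.1], [1, 1, 0])
```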
Preferably, the preprocessed data set is divided into a training set, a verification set and a test set according to the proportion of 7:1:2; the training set is used for training the disease risk prediction model, the verification set is used for optimizing parameters of the disease risk prediction model, and the test set is used for evaluating the generalization effect of the disease risk prediction model.
Preferably, all negative samples in the data set are acquired to form a negative sample set, the negative sample set is sampled to generate a negative sample for training a disease risk prediction model, and the ratio of the positive sample to the negative sample is set to be 1:10.
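The 1:10 negative sampling and the 7:1:2 split described in the two preceding claims can be sketched together (edge identifiers hypothetical):

```python
import random

def make_training_data(pos, neg_all, rng, ratio=10, split=(0.7, 0.1, 0.2)):
    """Samples `ratio` negatives per positive from the full negative set,
    then shuffles the labelled edges and splits them 7:1:2 into
    train / validation / test."""
    neg = rng.sample(neg_all, min(len(neg_all), ratio * len(pos)))
    data = [(e, 1) for e in pos] + [(e, 0) for e in neg]
    rng.shuffle(data)
    n = len(data)
    n_tr, n_va = int(split[0] * n), int(split[1] * n)
    return data[:n_tr], data[n_tr:n_tr + n_va], data[n_tr + n_va:]

rng = random.Random(42)
pos = [("p%d" % i, "dA") for i in range(10)]
neg_all = [("p%d" % i, "d%d" % j) for i in range(10) for j in range(20)]
train, val, test = make_training_data(pos, neg_all, rng)
# 10 positives + 100 sampled negatives -> 110 edges split 77 / 11 / 22
```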
The beneficial effects of the invention include:
1. For a new hospitalization record, the invention can likewise obtain the patient's individual information, hospitalization hospital information and historical disease diagnoses, and extract the corresponding patient and disease feature vectors; the new hospitalization record data are added to both the patient-disease encoding bipartite graph and the patient-disease decoding bipartite graph. Finally, the trained disease risk prediction model outputs the patient's risks for the other diseases, arranged in descending order of risk.
Because the encoder of the disease risk prediction model uses an attention mechanism and also takes the weight information of edges into account, the model can consider both the topological information of the bipartite graphs and the individual differences between patients, and learns the complex influence relations among diseases, thereby improving the prediction performance.
2. The invention arranges the output disease probabilities in descending order, realizing risk prediction for all diseases at once, which has wide practical value.
3. The invention can complete the modeling using only the first-page data of patients' medical records, extracting patient feature vectors and disease feature vectors and mining the available information in all respects, which strengthens the predictive capability of the model.
4. Besides the node embedded vectors learned by the encoder, the decoder of the disease risk prediction model extracts heuristic features such as CN and AA for each edge; these heuristic features supplement additional information, so the model converges faster and performs better.
Drawings
FIG. 1 shows the construction of the patient-disease bipartite graphs of the present invention.
Fig. 2 is a diagram showing a disease risk prediction model structure according to the present invention.
FIG. 3 is a training flow chart of the disease risk prediction model of the present invention.
Fig. 4 is a prediction flow chart of the disease risk prediction model of the present invention.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
Embodiments of the present invention are described in further detail below with reference to fig. 1 and 4:
The method for constructing a disease risk prediction model for chronic-disease patient populations based on a graph autoencoder comprises the following steps:
step 1: acquiring a data set of a first page of a historical medical record, preprocessing data in the data set, and storing the preprocessed historical case data into a storage space established by a storage medium;
the history first page data is a record item generated by the patient after the hospitalization is completed, and each record contains individual information (encrypted identification card number, gender, age, hospitalization time, discharge time and the like) of the patient, information of hospitalization hospitals (information of hospital grade, hospital address and the like) and hospitalization disease diagnosis (main diagnosis and 15 secondary diagnoses at most) of the patient, and is coded by international disease classification 10 th edition (International Classification of Disease-Revision 10, ICD-10); based on the above, the data needs to be preprocessed, that is, variables with the missing rate greater than 30% in the data set are removed, and the remaining data with the missing rate is filled with the missing value by using the average value of the non-missing part; the data without missing values is obtained and stored in a memory space established in a storage medium, such as a database.
The objective of the present invention is to predict a patient's disease risk over the next N years based on the patient's historical diseases and individual information. Therefore, before the data without missing values are formed, patients whose hospitalization records span more than N years must be selected; the chronic-disease diagnoses of the last N years of hospitalizations are treated as prediction labels, and the earlier hospitalization history as known information. By setting the value of N, the invention can predict future disease risk at different temporal granularities.
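This screening and time-based labelling step can be sketched as follows (record layout and year granularity are illustrative assumptions):

```python
def split_by_time(admissions, n_years):
    """admissions: patient -> list of (year, icd_code) hospitalizations.
    Keeps only patients whose records span more than n_years; the last
    n_years of diagnoses become prediction labels, earlier diagnoses the
    known history (diseases already in the history are not labels)."""
    history, labels = {}, {}
    for pid, recs in admissions.items():
        years = [y for y, _ in recs]
        if max(years) - min(years) <= n_years:
            continue                      # span too short: screened out
        cutoff = max(years) - n_years
        history[pid] = {c for y, c in recs if y <= cutoff}
        labels[pid] = {c for y, c in recs if y > cutoff} - history[pid]
    return history, labels

adm = {"p1": [(2012, "E11"), (2014, "E11"), (2020, "I10")],
       "p2": [(2019, "I10"), (2020, "E11")]}
hist, lab = split_by_time(adm, n_years=3)
# p2's records span only 1 year, so p2 is screened out;
# p1's history is {E11} and the new disease {I10} becomes the label
```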
Step 2: dividing the preprocessed historical case data into a disease which the patient has historically and a disease which the patient has in the future based on a time sequence, constructing the disease which the patient has historically into a patient-disease coding bipartite graph, and constructing the disease which the patient has in the future N years into a patient-disease decoding bipartite graph;
Referring to FIG. 1, in order to predict the diseases a patient may develop in the next N years, the present invention abstracts the task scenario into a link prediction problem on bipartite graphs, whose left nodes represent different patients and right nodes represent different diseases, with edges existing only from patients to diseases; the patient-disease encoding bipartite graph is used by the encoder to automatically learn the embedded vector expressions of the patient and disease nodes, and the patient-disease decoding bipartite graph is used by the decoder to learn the occurrence probability of each edge.
The edges in the patient-disease encoding bipartite graph represent diseases that the patient has had historically, and the weights represent the number of occurrences of each disease; the solid lines in the patient-disease decoding bipartite graph represent positive samples, i.e., diseases newly developed by the patient in the next N years; the dashed lines represent negative samples, i.e., diseases the patient will not newly develop in the next N years; the edges of the patient-disease decoding bipartite graph are obtained by subtracting the patient-disease encoding bipartite graph from the complete bipartite graph.
The patient-disease encoding bipartite graph constructed by the invention is used by the disease risk prediction model (named the GADP model, for Graph Attention Disease Prediction; referred to herein simply as the disease risk prediction model for convenience of description) to automatically learn the embedded vector expressions of the patient nodes and disease nodes, while the patient-disease decoding bipartite graph is used to predict the patient's future disease risk.
Step 3: invoking historical case data in the storage space, and extracting a patient feature vector and a disease feature vector based on the historical case data;
the extraction of the patient feature vector includes individual information, hospital information, number of historic diseases, ECI co-morbid index (Elixhauser Comorbidity Index, ECI) of historic diseases, ECI co-morbid index being capable of quantifying the physical condition of the patient to some extent; carrying out single-heat coding on data with discrete characteristic types, and converting the data into binary variables of 0-1; taking the data with the characteristic type of numerical value as continuous characteristics and taking the value as real number; and encoding the characteristic type as discrete data and the data with sequential relation as numerical characteristic.
See in particular table 1 below:
TABLE 1 extraction of feature vectors for patient nodes
[Table 1 is provided as an image in the original publication.]
Referring to table 1 above, the third column of table 1 gives the data type of each feature. If a feature is numerical, it is treated as a continuous feature with real values. If it is discrete, it must be converted into a 0-1 binary variable by one-hot encoding. However, for a field such as "hospitalization condition", whose values are critical, urgent and general, the values, although discrete, have a sequential relationship; to reduce the data dimension, they are encoded as the numerical features 1, 2 and 3. In this way, the feature dimension is reduced while the order information within it is preserved.
The disease feature vector is extracted by arranging the ICD-10 codes of the disease nodes in ascending order to obtain the serial number of each disease node, and then generating a one-hot vector for each disease node; in addition, the prevalence of each disease is calculated as a feature used to characterize how common the disease is.
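As a minimal sketch of this feature extraction (the ICD-10 codes, diagnosis counts and patient total below are hypothetical, not from the patent), the one-hot-plus-prevalence disease features can be built as follows:

```python
import numpy as np

def disease_features(icd_codes, diagnosis_counts, n_patients):
    """Build disease node features: a one-hot vector per disease (ordered by
    ascending ICD-10 code) concatenated with a prevalence scalar
    (patients diagnosed with the disease / total patients)."""
    ordered = sorted(icd_codes)                      # ascending ICD-10 order
    index = {code: i for i, code in enumerate(ordered)}
    features = {}
    for code in ordered:
        one_hot = np.zeros(len(ordered))
        one_hot[index[code]] = 1.0
        prevalence = diagnosis_counts.get(code, 0) / n_patients
        features[code] = np.concatenate([one_hot, [prevalence]])
    return features

# Hypothetical example: three diseases, 100 patients
feats = disease_features(["I10", "E11", "J44"],
                         {"I10": 40, "E11": 25, "J44": 10}, 100)
print(feats["E11"])   # E11 sorts first: one-hot [1, 0, 0] plus prevalence 0.25
```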
Step 4: establishing an encoder and a decoder based on the patient-disease encoding bipartite graph and the patient-disease decoding bipartite graph respectively, wherein the encoder is a graph attention network, and establishing a disease risk prediction model based on the encoder and the decoder;
the present invention uses a graph self-encoder (Graph auto encoder, GAE) as the link prediction base prediction architecture. The graph is used as an end-to-end model from the encoder, embedded vector expression of each node in the encoded graph can be automatically learned, and then the probability of each edge in the decoded graph is predicted by the decoder. The core components of the self-encoder are the encoder and decoder. The present invention uses a graph attention network (Graph Attention Networks, GAT) as an encoder and a Bilinear layer (Bilinear layer) as a decoder, and this model is named a graph attention disease prediction (Graph Attention Disease Prediction, GADP) model, the network structure of which is shown in fig. 2.
The step 4 comprises the following steps:
step 4.1: establishing a heuristic feature extraction model:

$$CN_{ij}=\left|\mathcal{N}(i)\cap\mathcal{N}_2(j)\right|$$

$$AA_{ij}=\sum_{z\in\mathcal{N}(i)\cap\mathcal{N}_2(j)}\frac{1}{\log\left|\mathcal{N}(z)\right|}$$

$$JC_{ij}=\frac{\left|\mathcal{N}(i)\cap\mathcal{N}_2(j)\right|}{\left|\mathcal{N}(i)\cup\mathcal{N}_2(j)\right|}$$

$$PA_{ij}=\left|\mathcal{N}(i)\right|\cdot\left|\mathcal{N}(j)\right|$$

in the formulas, $\mathcal{N}(i)$, $\mathcal{N}(j)$ and $\mathcal{N}(z)$ are the neighbor node sets of nodes i, j and z respectively, where node i represents the central node; $|\cdot|$ is the size of a set; $\mathcal{N}_2(j)$ is the second-order neighbor set of node j; $CN_{ij}$ is the Common Neighbors index of edge (i, j) of the patient-disease encoding bipartite graph, $AA_{ij}$ is the Adamic-Adar index, $JC_{ij}$ is the Jaccard coefficient index, and $PA_{ij}$ is the Preferential Attachment index; the larger the value of an index, the higher the probability that the edge exists;
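The four heuristic indices can be sketched in plain Python as below. This is an illustrative implementation, not the patent's code: the graph is a toy adjacency dict, and the guard skipping nodes with a single neighbor (whose log-degree would be zero) is an assumption of this sketch.

```python
import math

def heuristic_features(adj, i, j):
    """CN / AA / JC / PA link-prediction indices for a candidate
    patient-disease edge (i, j).  `adj` maps each node to its neighbor set;
    because the graph is bipartite, the patient's neighbors are compared
    against the *second-order* neighbors of disease node j."""
    n_i = adj[i]
    n2_j = set().union(*(adj[v] for v in adj[j])) if adj[j] else set()
    common = n_i & n2_j
    cn = len(common)
    # skip degree-1 nodes to avoid dividing by log(1) = 0 (sketch assumption)
    aa = sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1)
    union = n_i | n2_j
    jc = len(common) / len(union) if union else 0.0
    pa = len(n_i) * len(adj[j])
    return cn, aa, jc, pa

# Toy bipartite graph: patients p1, p2; diseases d1, d2
adj = {"p1": {"d1", "d2"}, "p2": {"d1"},
       "d1": {"p1", "p2"}, "d2": {"p1"}}
print(heuristic_features(adj, "p2", "d2"))
```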
step 4.2: in the neighbor sampling strategies of graph neural networks, a certain number of neighbors are usually sampled from a uniform random distribution; however, because different diseases affect patients to different degrees, the edge weights of the patient-disease encoding bipartite graph are taken into account and a non-uniform neighbor sampling strategy is designed, so that the larger the weight, the higher the sampling probability; the specific neighbor sampling strategy is as follows:

$$p_{ij}=\frac{w_{ij}}{\sum_{u\in\mathcal{N}(i)}w_{iu}}$$

wherein $w_{ij}$ and $p_{ij}$ respectively represent the weight and the sampling probability of edge (i, j) of the patient-disease encoding bipartite graph, and $w_{iu}$ represents the weight of edge (i, u) of the patient-disease encoding bipartite graph; based on the sampling probabilities $p_{ij}$, the neighbors of the central node are sampled with replacement to obtain a fixed number of neighbor samples;
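The weighted sampling-with-replacement strategy can be sketched with the standard library (the neighbor names and weights below are illustrative):

```python
import random

def sample_neighbors(neighbors, weights, k, seed=None):
    """Sample k neighbors of a central node *with replacement*, with
    probability proportional to edge weight, so diseases that occurred more
    often are drawn more often."""
    rng = random.Random(seed)
    total = sum(weights)
    probs = [w / total for w in weights]          # p_ij = w_ij / sum_u w_iu
    return rng.choices(neighbors, weights=probs, k=k)

sampled = sample_neighbors(["d1", "d2", "d3"], [5, 1, 1], k=10, seed=0)
print(sampled)   # "d1" dominates because its edge weight is largest
```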
step 4.3: referring to fig. 2, a graph attention network is used as the encoder; the encoder comprises two identical graph convolution modules, and the graph convolution layer of each module uses a graph attention mechanism to learn the weights of different neighbors and obtain the final embedded vector expression. Let $h_i^{(l)}$ denote the features of layer $l$ of the encoder; the multi-head attention weight $\alpha_{c,ij}^{(l)}$ from node j to node i is calculated by the following formulas:

$$q_{c,i}^{(l)}=W_{c,q}^{(l)}h_i^{(l)}+b_{c,q}^{(l)}$$

$$k_{c,j}^{(l)}=W_{c,k}^{(l)}\left[h_j^{(l)}\,\|\,w_{ij}\right]+b_{c,k}^{(l)}$$

$$\alpha_{c,ij}^{(l)}=\frac{\langle q_{c,i}^{(l)},k_{c,j}^{(l)}\rangle}{\sum_{u\in\mathcal{N}(i)}\langle q_{c,i}^{(l)},k_{c,u}^{(l)}\rangle}$$

in the formulas, $q_{c,i}^{(l)}$ is the query vector of the attention of central node i at the c-th head in layer $l$ of the encoder; $W_{c,q}^{(l)}$ is the weight matrix of the query vector q at the c-th head in layer $l$; $h_i^{(l)}$ is the embedded vector of central node i in layer $l$; $b_{c,q}^{(l)}$ is the bias term of the query vector q at the c-th head; $k_{c,j}^{(l)}$ is the key vector of the attention of node j at the c-th head; $W_{c,k}^{(l)}$ is the weight matrix of the key vector k; $h_j^{(l)}$ is the embedded vector of node j; $w_{ij}$ is the weight of edge (i, j); $b_{c,k}^{(l)}$ is the bias term of the key vector k; $\alpha_{c,ij}^{(l)}$ is the attention weight of edge (i, j) at the c-th head; $k_{c,u}^{(l)}$ is the key vector of the attention of node u at the c-th head; $\langle q,k\rangle=\exp\!\left(q^{T}k/\sqrt{d}\right)$ is the exponentially scaled dot product of the vectors, and d is the dimension of the vectors.

First, the embedded vector $h_i^{(l)}$ of the central node of layer $l$ is linearly transformed by $W_{c,q}^{(l)}$ into the query vector $q_{c,i}^{(l)}$; the neighbor node embedded vector $h_j^{(l)}$ and the edge weight $w_{ij}$ are concatenated and linearly transformed by $W_{c,k}^{(l)}$ into the key vector $k_{c,j}^{(l)}$; the attention weight of the edge is then calculated with $\langle q,k\rangle$; finally, the normalized attention weight $\alpha_{c,ij}^{(l)}$ is obtained through the normalization operation.
After obtaining the multi-head attention weight of the graph, carrying out message aggregation operation on embedded vectors of different neighbors:
Figure GDA00041335822500001025
Figure GDA00041335822500001026
wherein C is the total number of heads of attention, and is the vector splicing operation; first by
Figure GDA00041335822500001027
Obtain->
Figure GDA00041335822500001028
Value vector after linear transformation ∈ ->
Figure GDA0004133582250000111
Then, the weighted sum +.>
Figure GDA0004133582250000112
Then splice the multi-head attention results together to form the multi-head attention vector of neighbor aggregation>
Figure GDA0004133582250000113
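A minimal single-head numpy sketch of the scaled dot-product attention and message aggregation described above (all weight matrices are random stand-ins for learned parameters; the dimensions and neighbor count are illustrative, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # embedding dimension
h_i = rng.normal(size=d)                # central node embedding h_i
h_nbrs = rng.normal(size=(3, d))        # three sampled neighbor embeddings
w_edges = np.array([2.0, 1.0, 1.0])     # edge weights (disease counts)

W_q = rng.normal(size=(d, d)); b_q = np.zeros(d)
W_k = rng.normal(size=(d, d + 1)); b_k = np.zeros(d)   # +1 for edge weight
W_v = rng.normal(size=(d, d)); b_v = np.zeros(d)

q = W_q @ h_i + b_q
# key vector: neighbor embedding concatenated with the edge weight
k = np.stack([W_k @ np.concatenate([h, [w]]) + b_k
              for h, w in zip(h_nbrs, w_edges)])
scores = (k @ q) / np.sqrt(d)                  # scaled dot products
alpha = np.exp(scores) / np.exp(scores).sum()  # normalized attention weights
v = h_nbrs @ W_v.T + b_v                       # value vectors
h_agg = alpha @ v                              # weighted message aggregation
print(alpha.sum())                             # normalization check
```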
The embedded vector $h_i^{(l)}$ of the central node is combined with $\hat{h}_i^{(l)}$, and a gated residual mechanism is taken into account to selectively control the inflow of information, thereby calculating the embedded vector expression $h_i^{(l+1)}$ of the next layer; the specific calculation formulas are as follows:

$$r_i^{(l)}=W_r^{(l)}h_i^{(l)}+b_r^{(l)}$$

$$\beta_i^{(l)}=\mathrm{sigmoid}\!\left(W_g^{(l)}\left[\hat{h}_i^{(l)}\,\|\,r_i^{(l)}\,\|\,\hat{h}_i^{(l)}-r_i^{(l)}\right]\right)$$

$$h_i^{(l+1)}=\mathrm{ReLU}\!\left(\mathrm{LayerNorm}\!\left(\left(1-\beta_i^{(l)}\right)\hat{h}_i^{(l)}+\beta_i^{(l)}r_i^{(l)}\right)\right)$$

wherein $r_i^{(l)}$ is the embedded vector $h_i^{(l)}$ of the central node after linear transformation by $W_r^{(l)}$; $\beta_i^{(l)}$ is the weight of the gating residual: $\hat{h}_i^{(l)}$, $r_i^{(l)}$ and $\hat{h}_i^{(l)}-r_i^{(l)}$ are concatenated in turn, linearly transformed by $W_g^{(l)}$, and mapped into the interval from 0 to 1 by the sigmoid function, thereby controlling the inflow of information from $r_i^{(l)}$ and $\hat{h}_i^{(l)}$; finally, the embedded vector expression $h_i^{(l+1)}$ of central node i at layer $l+1$ is obtained through the LayerNorm and ReLU activation functions.
Step 4.4: constructing a bilinear decoder, wherein one side corresponds to a unique patient and disease in a patient-decoding bipartite graph, the bilinear decoder is the embedded vector expression of the known patient and disease, the existence probability of the side in the patient-decoding graph is predicted, and the calculation formula is as follows:
Figure GDA00041335822500001117
Figure GDA00041335822500001118
in the method, in the process of the invention,
Figure GDA00041335822500001119
representing the index corresponding to the edge i, j and taking the index as heuristic characteristics; />
Figure GDA00041335822500001120
Transpose of embedded vector representing node i, h j A vector representing node j; the above uses multiple weight matrices to reference the multi-head attention mechanism>
Figure GDA00041335822500001121
Learning +.>
Figure GDA00041335822500001122
And h j The learned results are spliced together with heuristic features to form hidden layer features of edges ++>
Figure GDA00041335822500001123
Figure GDA00041335822500001124
The subscript b of (2) is used only to distinguish between different weight matrices;
finally through W o The weight matrix is subjected to linear transformation, and the bias term b is added o Obtaining the result of the output layer, and obtaining the prediction probability p of the edges i and j by using a sigmoid activation function ij
Figure GDA00041335822500001125
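A minimal numpy sketch of the bilinear decoder described above (random stand-in weights; the heuristic feature values are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bilinear_decode(h_i, h_j, heuristics, W_list, W_o, b_o):
    """Bilinear decoder: B weight matrices W_b capture the interactions
    h_i^T W_b h_j; the B scores are concatenated with the heuristic edge
    features and linearly mapped to an edge probability."""
    scores = np.array([h_i @ W_b @ h_j for W_b in W_list])
    z = np.concatenate([scores, heuristics])      # hidden feature of the edge
    return sigmoid(W_o @ z + b_o)

d, B = 4, 2
rng = np.random.default_rng(2)
p = bilinear_decode(rng.normal(size=d), rng.normal(size=d),
                    np.array([1.0, 1.44, 0.5, 1.0]),      # CN, AA, JC, PA
                    [rng.normal(size=(d, d)) for _ in range(B)],
                    rng.normal(size=B + 4), 0.0)
print(p)   # predicted edge probability, strictly between 0 and 1
```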
The loss function uses cross entropy and is calculated as follows:
Figure GDA0004133582250000121
wherein G is dec Representing a decoding diagram e ij Representing edges ii, j, y ij Labels representing edges; and optimizing the Loss of the model by using a gradient descent algorithm, and training a disease risk prediction model.
Step 5: disease risk prediction models are trained based on the dataset of the historic medical records top page.
To train the disease risk prediction model quickly, the negative samples used for training must be drawn from the full negative sample set. The invention sets the sampling ratio of positive to negative samples to 1:10; for example, if a patient has 3 positive samples, 30 negative samples need to be sampled.
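The 1:10 negative sampling can be sketched as below (the ICD-style disease identifiers are hypothetical):

```python
import random

def sample_negatives(positive_diseases, all_diseases, ratio=10, seed=0):
    """Draw `ratio` negatives per positive from the diseases the patient
    did not newly develop (full disease set minus the positives)."""
    rng = random.Random(seed)
    candidates = sorted(set(all_diseases) - set(positive_diseases))
    n = ratio * len(positive_diseases)
    return rng.sample(candidates, min(n, len(candidates)))

all_icd = [f"D{i:03d}" for i in range(200)]      # hypothetical disease codes
negs = sample_negatives(["D001", "D050", "D120"], all_icd)
print(len(negs))   # 3 positives -> 30 negatives
```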
In the data set dividing stage, the data set is divided, taking the patient as the unit, into a training set, a validation set and a test set in the ratio 7:1:2. The training set is used for training the disease risk prediction model; the validation set is used for tuning the parameters of the model; the test set is used for evaluating the generalization effect of the model. During model inference, the full set of samples is tested to obtain the prediction probability of each disease, and the diseases are ranked by prediction probability to obtain the risk ordering of the different diseases.
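Because the split is made per patient rather than per record, no patient's records leak across sets. A minimal sketch (hypothetical patient IDs):

```python
import random

def split_by_patient(patient_ids, ratios=(0.7, 0.1, 0.2), seed=0):
    """Split *patients* (not individual records) into train/val/test
    so that all records of one patient land in the same set."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train, val, test = split_by_patient(range(100))
print(len(train), len(val), len(test))   # 70 10 20
```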
The disease risk prediction model adopts a mini-batch training mode: each time, a subset of nodes and their neighbors is sampled to train the network, so the network can be trained on large-scale graph data. The model is both effective and highly scalable. When new data need to be predicted, the whole graph does not have to be retrained as with other graph neural network models; a prediction can be made using only the neighbor information of the nodes. The number of neighbor samples per layer of the disease risk prediction model is 10. To optimize the model parameters, back propagation with gradient descent is used to optimize the parameters of the weight matrices, yielding a well-trained disease risk prediction model.
And carrying out disease risk prediction on the new hospitalization record by adopting a trained disease risk prediction model:
referring to fig. 4, for a new hospitalization record, the patient's individual information, hospitalization hospital information and historical disease diagnosis information can likewise be obtained, and the corresponding patient feature vector and disease feature vectors can be extracted. The patient is then added to both the patient-disease encoding bipartite graph and the patient-disease decoding bipartite graph. Finally, the GADP model is used to obtain the patient's risk for the other diseases, and the risks are sorted in descending order to return the top-N diseases.
The foregoing examples merely represent specific embodiments of the present application, which are described in detail but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art can make several variations and modifications without departing from the technical solution of the present application, and these fall within the protection scope of the present application.

Claims (8)

1. The method for constructing the slow patient group disease risk prediction model based on the graph self-encoder is characterized by comprising the following steps of:
step 1: acquiring a data set of a first page of a historical medical record, preprocessing data in the data set, and storing the preprocessed historical case data into a storage space established by a storage medium;
step 2: dividing the preprocessed historical case data, based on the time sequence, into diseases that the patient has had historically and diseases that the patient develops in the future, constructing the diseases that the patient has had historically into a patient-disease encoding bipartite graph, and constructing the diseases that the patient develops within the next N years into a patient-disease decoding bipartite graph;
step 3: invoking historical case data in the storage space, and extracting a patient feature vector and a disease feature vector based on the historical case data;
step 4: an encoder and a decoder are respectively established based on the patient-disease encoding bipartite graph and the patient-disease decoding bipartite graph, the encoder is a graph attention network, and a disease risk prediction model is established based on the encoder and the decoder, specifically comprising the following steps:
step 4.1: establishing a heuristic feature extraction model;
step 4.2: establishing a neighbor sampling strategy;
step 4.3: using a graph attention network as an encoder, wherein the encoder comprises at least one graph convolution module, and a graph convolution layer of each graph convolution module learns weights of different neighbors by adopting a graph attention mechanism to obtain a final embedded vector expression;
step 4.4: constructing a bilinear decoder based on the patient-disease decoding bipartite graph, wherein the bilinear decoder predicts the existence probability of edges in the patient-disease decoding graph from the embedded vector expressions of known patients and diseases and the heuristic features of the edges;
step 5: the disease risk prediction model is trained based on the data set of the first page of the historical medical records.
2. The method according to claim 1, wherein the preprocessing in step 1 is to discard variables with a missing rate greater than 30% in the data set, and to fill the missing values of the remaining variables with the mean of their non-missing portions.
3. The method of constructing a model for predicting disease risk in a slow patient group based on a graph self-encoder as claimed in claim 1, wherein the edges in the patient-disease encoding bipartite graph represent diseases that the patient has had historically, and the weights represent the number of occurrences of each disease; the patient-disease decoding bipartite graph comprises positive samples and negative samples, wherein a positive sample is a disease newly developed by the patient within the next N years, and a negative sample is a disease that the patient will not newly develop within the next N years; the edges of the patient-disease decoding bipartite graph are obtained by subtracting the patient-disease encoding bipartite graph from the full bipartite graph; the patient-disease encoding bipartite graph is used by the encoder to automatically learn the embedded vector expressions of the patient nodes and disease nodes and to extract the heuristic features, and the patient-disease decoding bipartite graph is used by the decoder to learn the occurrence probability of each edge.
4. The method for constructing a model for predicting disease risk of a slow patient group based on a graph self-encoder as claimed in claim 1, wherein the extracted patient feature vector includes individual information, hospitalization hospital information, the number of historical diseases and the ECI comorbidity index of the historical diseases; data whose feature type is discrete are one-hot encoded and converted into 0-1 binary variables; data whose feature type is numerical are taken as continuous features with real values; and discrete data that have a sequential relationship are encoded as numerical features.
5. The method for constructing a model for predicting disease risk of a slow patient group based on a graph self-encoder as claimed in claim 1, wherein the disease feature vector is extracted by arranging the ICD-10 codes of the disease nodes in ascending order to obtain the serial number of each disease node, and then generating a one-hot vector for each disease node; and the prevalence of each disease is calculated as a feature used to characterize how common the disease is.
6. The method for constructing a model for predicting disease risk of a group of slow patients based on a graph-based self-encoder as claimed in claim 1, wherein the step 4 comprises the steps of:
step 4.1: establishing a heuristic feature extraction model:

$$CN_{ij}=\left|\mathcal{N}(i)\cap\mathcal{N}_2(j)\right|$$

$$AA_{ij}=\sum_{z\in\mathcal{N}(i)\cap\mathcal{N}_2(j)}\frac{1}{\log\left|\mathcal{N}(z)\right|}$$

$$JC_{ij}=\frac{\left|\mathcal{N}(i)\cap\mathcal{N}_2(j)\right|}{\left|\mathcal{N}(i)\cup\mathcal{N}_2(j)\right|}$$

$$PA_{ij}=\left|\mathcal{N}(i)\right|\cdot\left|\mathcal{N}(j)\right|$$

in the formulas, $\mathcal{N}(i)$, $\mathcal{N}(j)$ and $\mathcal{N}(z)$ are the neighbor node sets of nodes i, j and z respectively, where node i represents the central node; $|\cdot|$ is the size of a set; $\mathcal{N}_2(j)$ is the second-order neighbor set of node j; $CN_{ij}$ is the Common Neighbors index of edge (i, j) of the patient-disease encoding bipartite graph, $AA_{ij}$ is the Adamic-Adar index, $JC_{ij}$ is the Jaccard coefficient index, and $PA_{ij}$ is the Preferential Attachment index; the larger the value of an index, the higher the probability that the edge exists;
step 4.2: establishing a neighbor sampling strategy:

$$p_{ij}=\frac{w_{ij}}{\sum_{u\in\mathcal{N}(i)}w_{iu}}$$

wherein $w_{ij}$ and $p_{ij}$ respectively represent the weight and the sampling probability of edge (i, j) of the patient-disease encoding bipartite graph, and $w_{iu}$ represents the weight of edge (i, u) of the patient-disease encoding bipartite graph; based on the sampling probabilities $p_{ij}$, the neighbors of the central node are sampled with replacement to obtain a fixed number of neighbor samples;
step 4.3: using a graph attention network as the encoder, wherein the encoder comprises at least one graph convolution module, and the graph convolution layer of each module uses a graph attention mechanism to learn the weights of different neighbors and obtain the final embedded vector expression; let $h_i^{(l)}$ denote the features of layer $l$ of the encoder; the multi-head attention weight $\alpha_{c,ij}^{(l)}$ from node j to node i is calculated by the following formulas:

$$q_{c,i}^{(l)}=W_{c,q}^{(l)}h_i^{(l)}+b_{c,q}^{(l)}$$

$$k_{c,j}^{(l)}=W_{c,k}^{(l)}\left[h_j^{(l)}\,\|\,w_{ij}\right]+b_{c,k}^{(l)}$$

$$\alpha_{c,ij}^{(l)}=\frac{\langle q_{c,i}^{(l)},k_{c,j}^{(l)}\rangle}{\sum_{u\in\mathcal{N}(i)}\langle q_{c,i}^{(l)},k_{c,u}^{(l)}\rangle}$$

in the formulas, $q_{c,i}^{(l)}$ is the query vector of the attention of central node i at the c-th head in layer $l$ of the encoder; $W_{c,q}^{(l)}$ is the weight matrix of the query vector q at the c-th head in layer $l$; $h_i^{(l)}$ is the embedded vector of central node i in layer $l$; $b_{c,q}^{(l)}$ is the bias term of the query vector q at the c-th head; $k_{c,j}^{(l)}$ is the key vector of the attention of node j at the c-th head; $W_{c,k}^{(l)}$ is the weight matrix of the key vector k; $h_j^{(l)}$ is the embedded vector of node j; $w_{ij}$ is the weight of edge (i, j); $b_{c,k}^{(l)}$ is the bias term of the key vector k; $\alpha_{c,ij}^{(l)}$ is the attention weight of edge (i, j) at the c-th head; $k_{c,u}^{(l)}$ is the key vector of the attention of node u at the c-th head; $\langle q,k\rangle=\exp\!\left(q^{T}k/\sqrt{d}\right)$ is the exponentially scaled dot product of the vectors, and d is the dimension of the vectors;
after the multi-head attention weights are obtained, a message aggregation operation is performed on the embedded vectors of the different neighbors:

$$v_{c,j}^{(l)}=W_{c,v}^{(l)}h_j^{(l)}+b_{c,v}^{(l)}$$

$$\hat{h}_i^{(l)}=\big\Vert_{c=1}^{C}\left[\sum_{j\in\mathcal{N}(i)}\alpha_{c,ij}^{(l)}v_{c,j}^{(l)}\right]$$

in the formulas, $v_{c,j}^{(l)}$ is the value vector of the attention of node j at the c-th head in layer $l$ of the encoder; $W_{c,v}^{(l)}$ is the weight matrix of the value vector v at the c-th head; $b_{c,v}^{(l)}$ is the bias term of the value vector v at the c-th head; $\hat{h}_i^{(l)}$ is the attention vector of central node i in layer $l$; C is the total number of attention heads; and $\Vert$ denotes the vector concatenation operation;
the embedded vector $h_i^{(l)}$ of central node i is combined with $\hat{h}_i^{(l)}$, and a gated residual mechanism is taken into account to selectively control the inflow of information, thereby calculating the embedded vector expression $h_i^{(l+1)}$ of the next layer; the specific calculation formulas are as follows:

$$r_i^{(l)}=W_r^{(l)}h_i^{(l)}+b_r^{(l)}$$

$$\beta_i^{(l)}=\mathrm{sigmoid}\!\left(W_g^{(l)}\left[\hat{h}_i^{(l)}\,\|\,r_i^{(l)}\,\|\,\hat{h}_i^{(l)}-r_i^{(l)}\right]\right)$$

$$h_i^{(l+1)}=\mathrm{ReLU}\!\left(\mathrm{LayerNorm}\!\left(\left(1-\beta_i^{(l)}\right)\hat{h}_i^{(l)}+\beta_i^{(l)}r_i^{(l)}\right)\right)$$

wherein $r_i^{(l)}$ represents the information of central node i in layer $l$ of the encoder; $W_r^{(l)}$ represents the corresponding weight matrix and $b_r^{(l)}$ the corresponding bias term; $\beta_i^{(l)}$ represents the weight of the gating residual of central node i in layer $l$: $\hat{h}_i^{(l)}$, $r_i^{(l)}$ and $\hat{h}_i^{(l)}-r_i^{(l)}$ are concatenated in turn, linearly transformed by the weight matrix $W_g^{(l)}$, and mapped into the interval from 0 to 1 by the sigmoid function, thereby controlling the inflow of information from $r_i^{(l)}$ and $\hat{h}_i^{(l)}$; finally, the embedded vector expression $h_i^{(l+1)}$ of central node i at layer $l+1$ is obtained through the LayerNorm and ReLU activation functions;
Step 4.4: constructing a bilinear decoder, wherein the bilinear decoder is an embedded vector expression of known patients and diseases, predicts the existence probability of edges in a patient-decoding diagram, and calculates the following formula:
Figure FDA0004133582220000049
Figure FDA00041335822200000410
in the method, in the process of the invention,
Figure FDA00041335822200000411
representing the index corresponding to the side i, j of the patient-disease encoding bipartite graph and taking the index as heuristic characteristics; />
Figure FDA00041335822200000412
Transpose of embedded vector representing node i, h j A vector representing node j; the above uses multiple weight matrices to reference the multi-head attention mechanism>
Figure FDA00041335822200000413
Learning +.>
Figure FDA00041335822200000414
And h j And then the learned results are spliced to obtain +.>
Figure FDA00041335822200000415
Will->
Figure FDA00041335822200000416
Splicing with heuristic features to form hidden layer feature expression of edge ++>
Figure FDA00041335822200000417
Finally through W o The weight matrix is subjected to linear transformation, and the bias term b is added o Obtaining the result of the output layer, and obtaining the prediction probability p of the edges i and j by using a sigmoid activation function ij
Figure FDA00041335822200000418
The loss function uses cross entropy and is calculated as follows:
Figure FDA00041335822200000419
wherein G is dec Representing a decoding diagram e ij Representing edges i, j, y ij Labels representing edges; and optimizing the Loss of the model by using a gradient descent algorithm, and training a disease risk prediction model.
7. The method for constructing a slow patient group disease risk prediction model based on a graph-based self-encoder according to claim 1, wherein the preprocessed data set is divided into a training set, a validation set and a test set according to a ratio of 7:1:2; the training set is used for training the disease risk prediction model, the verification set is used for optimizing parameters of the disease risk prediction model, and the test set is used for evaluating the generalization effect of the disease risk prediction model.
8. The method of constructing a disease risk prediction model for a slow patient group based on a graph-based self-encoder according to claim 1, wherein all negative samples in the dataset are acquired, a negative sample set is formed, the negative sample set is sampled, negative samples for training the disease risk prediction model are generated, and the ratio of the positive samples to the negative samples is set to 1:10.
CN202210507317.9A 2022-05-10 2022-05-10 Construction method of slow patient group disease risk prediction model based on graph self-encoder Active CN114783608B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210507317.9A CN114783608B (en) 2022-05-10 2022-05-10 Construction method of slow patient group disease risk prediction model based on graph self-encoder


Publications (2)

Publication Number Publication Date
CN114783608A CN114783608A (en) 2022-07-22
CN114783608B true CN114783608B (en) 2023-05-05

Family

ID=82436498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210507317.9A Active CN114783608B (en) 2022-05-10 2022-05-10 Construction method of slow patient group disease risk prediction model based on graph self-encoder

Country Status (1)

Country Link
CN (1) CN114783608B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115713986B (en) * 2022-11-11 2023-07-11 中南大学 Attention mechanism-based material crystal attribute prediction method
CN116072298B (en) * 2023-04-06 2023-08-15 之江实验室 Disease prediction system based on hierarchical marker distribution learning
CN116825360A (en) * 2023-07-24 2023-09-29 湖南工商大学 Method and device for predicting chronic disease co-morbid based on graph neural network and related equipment
CN117438023B (en) * 2023-10-31 2024-04-26 灌云县南岗镇卫生院 Hospital information management method and system based on big data
CN117476240B (en) * 2023-12-28 2024-04-05 中国科学院自动化研究所 Disease prediction method and device with few samples

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013108122A1 (en) * 2012-01-20 2013-07-25 Mueller-Wolf Martin "indima apparatus" system, method and computer program product for individualized and collaborative health care
CN109036553A (en) * 2018-08-01 2018-12-18 北京理工大学 A kind of disease forecasting method based on automatic extraction Medical Technologist's knowledge
CN111462896A (en) * 2020-03-31 2020-07-28 重庆大学 Real-time intelligent auxiliary ICD coding system and method based on medical record
CN113689954A (en) * 2021-08-24 2021-11-23 平安科技(深圳)有限公司 Hypertension risk prediction method, device, equipment and medium
CN114023449A (en) * 2021-11-05 2022-02-08 中山大学 Diabetes risk early warning method and system based on depth self-encoder

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7693728B2 (en) * 2004-03-31 2010-04-06 Aetna Inc. System and method for administering health care cost reduction


Also Published As

Publication number Publication date
CN114783608A (en) 2022-07-22

Similar Documents

Publication Publication Date Title
CN114783608B (en) Construction method of slow patient group disease risk prediction model based on graph self-encoder
CN112131673B (en) Engine surge fault prediction system and method based on fusion neural network model
CN114169330B (en) Chinese named entity recognition method integrating temporal convolution and Transformer encoder
CN109086805B (en) Clustering method based on deep neural network and pairwise constraints
CN112508085B (en) Social network link prediction method based on perceptual neural network
CN112086195B (en) Admission risk prediction method based on self-adaptive ensemble learning model
WO2023116111A1 (en) Disk fault prediction method and apparatus
CN112291098B (en) Network security risk prediction method and related device thereof
Mustika et al. Analysis accuracy of xgboost model for multiclass classification-a case study of applicant level risk prediction for life insurance
CN114898121B (en) Automatic generation method for concrete dam defect image description based on graph attention network
CN113328755B (en) Compressed data transmission method for edge computing
CN114528835A (en) Semi-supervised specialized term extraction method, medium and equipment based on interval discrimination
CN113345564B (en) Early prediction method and device for patient hospitalization duration based on graph neural network
CN116579447A (en) Time sequence prediction method based on decomposition mechanism and attention mechanism
CN114880538A (en) Attribute graph community detection method based on self-supervision
CN112201348B (en) Knowledge-aware-based multi-center clinical data set adaptation device
CN114418158A (en) Cell network load index prediction method based on attention mechanism learning network
CN115762783A (en) Acute kidney injury prediction system
CN115794548A (en) Method and device for detecting log abnormity
CN115035455A (en) Cross-category video temporal localization method, system and storage medium based on multi-modal adversarial domain adaptation
Zhang et al. Compressing knowledge graph embedding with relational graph auto-encoder
CN112989048A (en) Network security domain relation extraction method based on dense connection convolution
CN112862070A (en) Link prediction system using graph neural network and capsule network
CN115906768B (en) Enterprise informatization data compliance assessment method, system and readable storage medium
CN117811843B (en) Network intrusion detection method and system based on big data analysis and autonomous learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant