CN114783608A - Construction method of a chronic-disease patient population disease risk prediction model based on a graph auto-encoder - Google Patents

Construction method of a chronic-disease patient population disease risk prediction model based on a graph auto-encoder

Info

Publication number
CN114783608A
CN114783608A (application no. CN202210507317.9A)
Authority
CN
China
Prior art keywords
disease
patient
encoder
vector
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210507317.9A
Other languages
Chinese (zh)
Other versions
CN114783608B (en)
Inventor
邱航
胡智栩
杨萍
王利亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202210507317.9A priority Critical patent/CN114783608B/en
Publication of CN114783608A publication Critical patent/CN114783608A/en
Application granted granted Critical
Publication of CN114783608B publication Critical patent/CN114783608B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 - ICT for calculating health indices; for individual health risk assessment
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 - ICT for simulation or modelling of medical disorders
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention relates to the technical field of medical informatics, and in particular to a method for constructing a disease risk prediction model for chronic-disease patient populations based on a graph auto-encoder. A patient-disease bipartite graph is constructed from patients' hospitalization records and historical disease information, and feature vectors are then extracted for patients and diseases respectively. Finally, a disease risk prediction model based on a graph attention mechanism is built on the graph auto-encoder architecture to predict the future disease risk of chronic-disease patients. Because the decoder part of the disease risk prediction model uses the attention mechanism and also takes edge-weight information into account, the model can simultaneously consider the topological information of the bipartite graph and the individual differences between patients, and can learn the complex influence relationships among diseases, thereby improving the prediction effect.

Description

Construction method of a chronic-disease patient population disease risk prediction model based on a graph auto-encoder
Technical Field
The invention relates to the technical field of medical informatics, and in particular to a method for constructing a disease risk prediction model for chronic-disease patient populations based on a graph auto-encoder.
Background
The accelerating aging of the population and the steep rise in the incidence of chronic diseases impose a severe social and economic burden worldwide. It is estimated that over 75% of the elderly have more than one chronic disease, and multimorbidity (having two or more chronic diseases at the same time) among the elderly has become a prominent global problem, resulting in greater medical needs and higher use of medical services and costs. Complex interrelationships exist between chronic diseases, and some chronic diseases may trigger others, further increasing the treatment burden on patients. The prevention and treatment of chronic diseases and their related complications has become an unavoidable problem. Effectively predicting the future disease risk of chronic-disease patients allows doctors to intervene in advance and reduce the risk that related diseases occur, thereby nipping disease in the bud; this is of great practical significance. The existing disease risk prediction methods mainly have the following problems:
(1) Some prediction methods model the disease prediction problem as a series of binary classification models, each of which predicts whether one disease occurs. This modeling approach causes the number of models to grow as the number of predicted diseases increases, limiting the utility of the models.
(2) Some prediction methods use the historical disease information of a patient, abstract it into a patient-disease bipartite graph, model the problem as a link prediction problem, and predict disease risk with heuristic methods such as the Common Neighbors (CN) index and the Adamic-Adar (AA) index. These methods only consider the topological information of the bipartite graph and ignore individual differences between patients, such as sex and age.
(3) Most existing prediction methods do not consider the complex influence relationships among diseases, so their prediction effect is poor.
Disclosure of Invention
To solve the above technical problems in the prior art, the invention provides a method for constructing a disease risk prediction model for chronic-disease patient populations based on a graph auto-encoder, aiming at the technical problem, described in the background art, that existing prediction methods achieve a poor prediction effect because they do not consider the complex influence relationships among diseases.
The technical scheme adopted by the invention is as follows:
the construction method of the slow patient group disease risk prediction model based on the graph self-encoder comprises the following steps:
step 1: acquiring a data set of a historical case homepage, preprocessing data in the data set, and storing the preprocessed historical case data into a storage space established by a storage medium;
and 2, step: dividing the historical case data obtained by preprocessing into diseases which the patient has historically suffered and diseases which the patient has in the future based on the time sequence, constructing the diseases which the patient has historically suffered as a patient-disease coding bipartite graph, and constructing the diseases which the patient has in the future N years as a patient-disease decoding bipartite graph;
and 3, step 3: calling historical case data in the storage space, and extracting a patient characteristic vector and a disease characteristic vector based on the historical case data;
and 4, step 4: establishing an encoder and a decoder respectively based on the patient-disease encoding bipartite graph and the patient-decoding bipartite graph, wherein the encoder is a graph attention network and establishes a disease risk prediction model based on the encoder and the decoder;
and 5: and training a disease risk prediction model based on the data set of the historical case home page.
For a new hospitalization record, the invention can likewise obtain the patient's individual information, hospital information and historical disease diagnoses, and extract the corresponding patient feature vector and disease feature vectors; the new hospitalization record data are added to both the patient-disease encoding bipartite graph and the patient-disease decoding bipartite graph. Finally, the trained disease risk prediction model yields the patient's risk for the other diseases, which are then sorted in descending order of risk.
In addition, because the decoder part of the disease risk prediction model uses the attention mechanism and also takes edge-weight information into account, the model can simultaneously consider the topological information of the bipartite graph and the individual differences between patients, and can learn the complex influence relationships among diseases, thereby improving the prediction effect.
Preferably, the preprocessing in step 1 removes from the data set the variables whose missing rate is greater than 30%, and, for the remaining variables that still contain missing values, fills the missing values with the mean of the non-missing part.
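A minimal pandas sketch of this preprocessing rule, assuming the front-page data sit in a DataFrame (the column names in the usage example are invented for illustration):

```python
import pandas as pd

def preprocess(df: pd.DataFrame, max_missing_rate: float = 0.30) -> pd.DataFrame:
    """Drop variables whose missing rate exceeds the threshold (30% in the
    text), then fill remaining numeric missing values with the mean of the
    non-missing part."""
    missing_rate = df.isna().mean()                      # per-column missing fraction
    kept = df.loc[:, missing_rate <= max_missing_rate].copy()
    numeric = kept.select_dtypes("number").columns
    kept[numeric] = kept[numeric].fillna(kept[numeric].mean())
    return kept
```

For example, a column that is 75% missing is dropped, while a column with one missing value in four rows is kept and mean-filled.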
Preferably, the edges in the patient-disease encoding bipartite graph represent the diseases the patient has had historically, and the edge weights represent the number of occurrences of each disease; the patient-disease decoding bipartite graph comprises positive samples and negative samples, where the positive samples are the patient's new diseases in the next N years and the negative samples are diseases that will not newly occur in the next N years; the edges of the patient-disease decoding bipartite graph are obtained by subtracting the patient-disease encoding bipartite graph from the complete bipartite graph; the patient-disease encoding bipartite graph is used by the encoder to automatically learn the embedded vector expressions of the patient nodes and disease nodes, and the patient-disease decoding bipartite graph is used by the decoder to learn the occurrence probability of each edge.
Preferably, the extracted patient feature vector comprises individual information, hospital information, the number of historical diseases and the ECI comorbidity index of the historical diseases; data whose feature type is discrete are one-hot encoded into 0-1 binary variables; data whose feature type is numerical are treated as continuous features taking real values; and discrete data whose values have an ordinal relationship are encoded as numerical features.
Preferably, the disease feature vectors are extracted by sorting the ICD-10 codes of the disease nodes in ascending order to obtain a serial number for each disease node, and then generating a vector for each disease node through one-hot encoding; in addition, the prevalence of each disease (the number of patients with the disease divided by the total number of patients) is calculated as a feature characterizing how common the disease is.
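This disease-node feature extraction can be sketched as follows; the edge-list input format and the function name are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def disease_features(icd_codes, patient_disease_edges, n_patients):
    """Sort ICD-10 codes ascending to assign each disease a serial number,
    one-hot encode that number, and append prevalence = (#patients with the
    disease) / (total patients)."""
    order = sorted(icd_codes)                        # ascending ICD-10 order
    index = {code: i for i, code in enumerate(order)}
    onehot = np.eye(len(order))
    patients_per_disease = {c: set() for c in icd_codes}
    for patient, code in patient_disease_edges:      # count distinct patients
        patients_per_disease[code].add(patient)
    prevalence = np.array([len(patients_per_disease[c]) / n_patients
                           for c in order])
    return index, np.hstack([onehot, prevalence[:, None]])
```

Each disease node thus gets an identity part (one-hot) plus one scalar describing how common the disease is.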
Preferably, the step 4 comprises the following steps:
Step 4.1: establishing a heuristic feature extraction model:

$$CN_{ij} = \left| \Gamma(i) \cap \Gamma_{2}(j) \right|$$

$$AA_{ij} = \sum_{z \in \Gamma(i) \cap \Gamma_{2}(j)} \frac{1}{\log \left| \Gamma(z) \right|}$$

$$JC_{ij} = \frac{\left| \Gamma(i) \cap \Gamma_{2}(j) \right|}{\left| \Gamma(i) \cup \Gamma_{2}(j) \right|}$$

$$PA_{ij} = \left| \Gamma(i) \right| \cdot \left| \Gamma_{2}(j) \right|$$

where Γ(i), Γ(j) and Γ(z) are the neighbor-node sets of nodes i, j and z respectively, node i being the central node; |·| is the size of a set; Γ₂(j) is the second-order neighbor set of node j; CN_ij is the Common Neighbors index of edge (i, j) of the patient-disease encoding bipartite graph, AA_ij is the Adamic-Adar index of edge (i, j), JC_ij is the Jaccard coefficient index of edge (i, j), and PA_ij is the Preferential Attachment index of edge (i, j). The larger the value of an index, the higher the occurrence probability of the edge.
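As a sketch of these indices on the patient-disease bipartite graph, with adjacency stored as plain dicts of sets; skipping degree-1 common neighbors in AA (to avoid dividing by log 1 = 0) is an assumption of this sketch, not stated in the text:

```python
import math

def heuristic_features(patient_neighbors, disease_neighbors, i, j):
    """CN / AA / JC / PA for edge (i, j), comparing the patient's neighbor
    set Γ(i) with the disease's second-order neighbor set Γ2(j)."""
    g_i = patient_neighbors[i]                        # Γ(i): diseases of patient i
    g2_j = set()                                      # Γ2(j): second-order neighbors of j
    for p in disease_neighbors[j]:
        g2_j |= patient_neighbors[p]
    common = g_i & g2_j
    cn = len(common)
    aa = sum(1.0 / math.log(len(disease_neighbors[z]))
             for z in common if len(disease_neighbors[z]) > 1)
    union = g_i | g2_j
    jc = cn / len(union) if union else 0.0
    pa = len(g_i) * len(g2_j)
    return cn, aa, jc, pa
```

On a toy graph where patient p1 has diseases {d1, d2} and the candidate disease d3 is carried only by a patient with diseases {d2, d3}, the shared neighborhood is {d2}.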
Step 4.2: establishing a neighbor sampling strategy:

$$p_{ij} = \frac{w_{ij}}{\sum_{u \in \Gamma(i)} w_{iu}}$$

where w_ij and p_ij are respectively the weight and the sampling probability of edge (i, j) of the patient-disease encoding bipartite graph, and w_iu is the weight of edge (i, u). Based on the sampling probability p_ij, sampling with replacement is performed on the neighbors of the central node to obtain a fixed number of neighbor samples.
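The weighted sampling-with-replacement strategy can be sketched with the standard library; the function name and the fixed sample size k are illustrative:

```python
import random

def sample_neighbors(neighbors, weights, k, rng=None):
    """Sample k neighbors of the central node with replacement, with
    probability proportional to edge weight: p_ij = w_ij / sum_u w_iu."""
    rng = rng or random.Random()
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(neighbors, weights=probs, k=k)
```

High-weight edges (e.g. diseases diagnosed repeatedly) are drawn proportionally more often, while every neighbor remains reachable.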
Step 4.3: the graph attention network is used as the encoder. The encoder comprises at least one graph convolution module, and the graph convolution layer of each graph convolution module learns the weights of the different neighbors with a graph attention mechanism to obtain the final embedded vector expression. In layer l of the encoder, let h_i^{(l)} denote the embedded vector of central node i; the multi-head attention weight α_ij^{c,(l)} from node j to node i at the c-th head is calculated by the following formulas:

$$q_{c}^{(l)}(i) = W_{c}^{q,(l)} h_{i}^{(l)} + b_{c}^{q,(l)}$$

$$k_{c}^{(l)}(j) = W_{c}^{k,(l)} \left[ h_{j}^{(l)} \,\Vert\, w_{ij} \right] + b_{c}^{k,(l)}$$

$$\alpha_{ij}^{c,(l)} = \frac{\exp\left( q_{c}^{(l)}(i)^{\top} k_{c}^{(l)}(j) / \sqrt{d} \right)}{\sum_{u \in \Gamma(i)} \exp\left( q_{c}^{(l)}(i)^{\top} k_{c}^{(l)}(u) / \sqrt{d} \right)}$$

where q_c^{(l)}(i) is the query vector of central node i at the c-th attention head in layer l of the encoder; W_c^{q,(l)} is the weight matrix of the query vector q at the c-th head in layer l; h_i^{(l)} is the embedded vector of central node i in layer l; b_c^{q,(l)} is the bias term of the query vector q at the c-th head in layer l; k_c^{(l)}(j) is the key vector of node j at the c-th head in layer l; W_c^{k,(l)} is the weight matrix of the key vector k at the c-th head in layer l; h_j^{(l)} is the embedded vector of node j in layer l; w_ij is the weight of edge (i, j), concatenated to the embedding inside the key; b_c^{k,(l)} is the bias term of the key vector k at the c-th head in layer l; α_ij^{c,(l)} is the attention weight of edge (i, j) at the c-th head in layer l; k_c^{(l)}(u) is the key vector of node u at the c-th head in layer l; and the vector dot products are exponentially scaled by √d, where d is the dimension of the vectors.

After the multi-head attention weights of the graph are obtained, a message aggregation operation is performed on the embedded vectors of the different neighbors:

$$v_{c}^{(l)}(j) = W_{c}^{v,(l)} h_{j}^{(l)} + b_{c}^{v,(l)}$$

$$\hat{h}_{i}^{(l+1)} = \Big\Vert_{c=1}^{C} \sum_{j \in \Gamma(i)} \alpha_{ij}^{c,(l)} \, v_{c}^{(l)}(j)$$

where v_c^{(l)}(j) is the value vector of node j at the c-th head in layer l of the encoder; W_c^{v,(l)} and b_c^{v,(l)} are the weight matrix and the bias term of the value vector v at the c-th head in layer l; and \hat{h}_i^{(l+1)} is the attention vector of central node i for layer l+1, obtained by concatenating the C heads.

The embedded vector h_i^{(l)} of central node i is then combined with \hat{h}_i^{(l+1)} through a gated residual mechanism that selectively controls the inflow of information, to compute the embedded vector representation h_i^{(l+1)} of the next layer; the specific calculation formulas are as follows:

$$r_{i}^{(l)} = W_{r}^{(l)} h_{i}^{(l)}$$

$$\beta_{i}^{(l)} = \sigma\left( W_{g}^{(l)} \left[ \hat{h}_{i}^{(l+1)} \,\Vert\, r_{i}^{(l)} \,\Vert\, \hat{h}_{i}^{(l+1)} - r_{i}^{(l)} \right] \right)$$

$$h_{i}^{(l+1)} = \mathrm{ReLU}\left( \mathrm{LayerNorm}\left( (1 - \beta_{i}^{(l)}) \, \hat{h}_{i}^{(l+1)} + \beta_{i}^{(l)} \, r_{i}^{(l)} \right) \right)$$

where r_i^{(l)} is the residual information of central node i in layer l of the encoder; W_r^{(l)} is the weight matrix of central node i in layer l; and β_i^{(l)} is the gated-residual weight of central node i in layer l. \hat{h}_i^{(l+1)}, r_i^{(l)} and \hat{h}_i^{(l+1)} - r_i^{(l)} are spliced in order and linearly transformed by the weight matrix W_g^{(l)}, and the sigmoid function σ maps the value range to between 0 and 1, thereby controlling the inflow of information from r_i^{(l)} and \hat{h}_i^{(l+1)}. Finally, the embedded vector representation h_i^{(l+1)} of central node i for layer l+1 is obtained through LayerNorm and the ReLU activation function.
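The attention-plus-gated-residual computation of step 4.3 can be sketched for a single head in NumPy. This is an illustrative sketch, not the patented implementation: the weight names (Wq, Wk, ...) stand in for learned parameters, LayerNorm is the plain non-affine variant, and the gate β is taken as a scalar:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_layer(h_i, h_nbrs, w_edges, Wq, bq, Wk, bk, Wv, bv, Wr, Wg):
    """One single-head graph-attention step: edge weight concatenated into
    the key, sqrt(d)-scaled dot-product attention, message aggregation,
    then a gated residual followed by LayerNorm and ReLU."""
    d = h_i.shape[0]
    q = Wq @ h_i + bq                                     # query of central node i
    keys = np.stack([Wk @ np.concatenate([h, [w]]) + bk   # key of each neighbor j
                     for h, w in zip(h_nbrs, w_edges)])
    alpha = softmax(keys @ q / np.sqrt(d))                # attention over neighbors
    values = np.stack([Wv @ h + bv for h in h_nbrs])
    h_hat = alpha @ values                                # aggregated message
    r = Wr @ h_i                                          # residual branch
    gate_in = np.concatenate([h_hat, r, h_hat - r])
    beta = 1 / (1 + np.exp(-(Wg @ gate_in)))              # scalar gate
    out = (1 - beta) * h_hat + beta * r                   # gated mix
    out = (out - out.mean()) / (out.std() + 1e-5)         # non-affine LayerNorm
    return np.maximum(out, 0.0)                           # ReLU
```

Randomly initialised parameters suffice to check the shapes and the non-negativity of the ReLU output.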
Step 4.4: constructing a bilinear decoder, which predicts the existence probability of the edges in the patient-disease decoding graph from the known embedded vector representations of patients and diseases; the calculation formulas are as follows:

$$s_{ij}^{m} = h_{i}^{\top} W_{m} h_{j}, \quad m = 1, \dots, M$$

$$e_{ij} = \left[ s_{ij}^{1} \,\Vert\, \dots \,\Vert\, s_{ij}^{M} \,\Vert\, CN_{ij} \,\Vert\, AA_{ij} \,\Vert\, JC_{ij} \,\Vert\, PA_{ij} \right]$$

$$p_{ij} = \sigma\left( W_{o} \, e_{ij} + b_{o} \right)$$

where CN_ij, AA_ij, JC_ij and PA_ij are the indices corresponding to edge (i, j), used as heuristic features; h_i^⊤ is the transpose of the embedded vector of node i, and h_j is the vector of node j. Borrowing from the multi-head attention mechanism, the first formula uses multiple weight matrices W_m to learn the combination of h_i and h_j from different angles; the learned results are then spliced with the heuristic features to form the hidden-layer feature e_ij of the edge. Finally, e_ij is linearly transformed by the weight matrix W_o, the bias term b_o is added to obtain the result of the output layer, and the sigmoid activation function yields the prediction probability p_ij of edge (i, j).

The loss function uses cross entropy and is calculated as follows:

$$Loss = -\frac{1}{\left| G_{dec} \right|} \sum_{e_{ij} \in G_{dec}} \left[ y_{ij} \log p_{ij} + (1 - y_{ij}) \log \left( 1 - p_{ij} \right) \right]$$

where G_dec represents the decoding graph, e_ij represents edge (i, j), and y_ij represents the label of the edge. The Loss of the model is optimized with a gradient descent algorithm to train the disease risk prediction model.
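The bilinear decoder and cross-entropy loss of step 4.4 can be sketched in NumPy; the matrix names and the 4-element heuristic vector (CN, AA, JC, PA) follow the notation above, but this is an illustrative sketch under random parameters rather than the patented implementation:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def bilinear_decode(h_i, h_j, Ws, heuristics, Wo, bo):
    """M bilinear scores h_i^T W_m h_j are spliced with the heuristic
    features into the edge's hidden feature e_ij, then a linear output
    layer plus sigmoid gives the edge probability p_ij."""
    scores = [h_i @ W_m @ h_j for W_m in Ws]       # one score per weight matrix
    e_ij = np.concatenate([scores, heuristics])    # hidden-layer feature of the edge
    return sigmoid(Wo @ e_ij + bo)

def bce_loss(p, y):
    """Cross entropy averaged over the edges of the decoding graph."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
```

At p = 0.5 for every edge, the loss reduces to log 2 regardless of the labels, which makes a convenient sanity check.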
Preferably, the preprocessed data set is divided into a training set, a validation set and a test set in the ratio 7:1:2; the training set is used to train the disease risk prediction model, the validation set is used to tune the parameters of the model, and the test set is used to evaluate the generalization of the model.
Preferably, all negative samples in the data set are collected to form a negative sample set, which is sampled to generate the negative samples used for training the disease risk prediction model, with the ratio of positive to negative samples set to 1:10.
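A sketch of this negative-sampling rule under the stated 1:10 ratio; the dict-of-sets history format and the function name are assumptions of the sketch:

```python
import random

def sample_negatives(positives, all_diseases, history, ratio=10, rng=None):
    """For each patient, negatives are diseases that are neither in the
    patient's history nor among the future positives; the pool is sampled
    so that positives:negatives = 1:ratio (1:10 in the text)."""
    rng = rng or random.Random()
    pos = set(positives)
    patients = {p for p, _ in positives}
    pool = [(p, d) for p in patients for d in all_diseases
            if (p, d) not in pos and d not in history.get(p, set())]
    k = min(len(pool), ratio * len(positives))
    return rng.sample(pool, k)
```

When the pool is smaller than ten times the positive count, the whole pool is used.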
The beneficial effects of the invention include:
1. For a new hospitalization record, the invention can likewise obtain the patient's individual information, hospital information and historical disease diagnoses, and extract the corresponding patient feature vector and disease feature vectors; the new hospitalization record data are added to both the patient-disease encoding bipartite graph and the patient-disease decoding bipartite graph. Finally, the trained disease risk prediction model yields the patient's risk for the other diseases, which are then sorted in descending order of risk.
In addition, because the encoder part of the disease risk prediction model uses the attention mechanism and also takes edge-weight information into account, the model can simultaneously consider the topological information of the bipartite graph and the individual differences between patients, and can learn the complex influence relationships among diseases, thereby improving the prediction effect.
2. The final output is sorted in descending order of the predicted probability of each disease, realizing risk prediction for all diseases, which has wide practical value.
3. According to the invention, modeling can be completed with only the front-page data of the patient's medical records; by extracting the patient feature vectors and disease feature vectors, the available information is comprehensively mined and the prediction capability of the model is enhanced.
4. The decoder part of the disease risk prediction model not only considers the node embedded vectors learned by the encoder, but also extracts heuristic features such as CN and AA for each edge; these heuristic features supplement additional information, so the model converges faster and performs better.
Drawings
FIG. 1 shows the construction of the patient-disease bipartite graphs according to the invention.
FIG. 2 is a diagram of the disease risk prediction model architecture of the present invention.
Fig. 3 is a training flowchart of the disease risk prediction model of the present invention.
Fig. 4 is a prediction flowchart of the disease risk prediction model of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of embodiments of the present application, generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Embodiments of the invention are described in detail below with reference to accompanying figures 1 to 4:
The construction method of the chronic-disease patient population disease risk prediction model based on the graph auto-encoder comprises the following steps:
Step 1: acquiring a data set of front pages of historical medical records, preprocessing the data in the data set, and storing the preprocessed historical case data into a storage space established on a storage medium;
the historical case first page data is a record item generated after the patient is completely hospitalized, each record comprises individual information (encrypted information such as identification number, sex, age, hospitalization time and discharge time) of the patient, information (information such as hospital grade and hospital address) of the hospitalization hospital and hospitalization Disease diagnosis (main diagnosis and at most 15 secondary diagnoses) of the patient, and the 10 th edition code of International Classification of Disease-review 10, ICD-10 is adopted; based on the above, data needs to be preprocessed, that is, variables with a deletion rate greater than 30% in a data set are removed, and the residual data with the deletion rate are filled with the missing values by using the mean value of the non-missing parts; data without missing values are obtained and stored in a storage space established in the storage medium, such as a database.
The goal of the invention is to predict a patient's disease risk N years into the future based on the patient's historical diseases and individual information. Therefore, before forming the data without missing values, patients whose hospitalization records span more than N years need to be screened; the chronic-disease diagnoses in the hospitalization records of the patient's last N years are taken as the prediction labels, and the historical diseases are taken as the known information. By setting the value of N, the invention can predict the patient's future disease risk at different time granularities.
Step 2: dividing the preprocessed historical case data, based on the time sequence, into the diseases the patient has had historically and the diseases the patient will have in the future; constructing the historical diseases as a patient-disease encoding bipartite graph, and constructing the diseases of the next N years as a patient-disease decoding bipartite graph;
Referring to fig. 1, in order to predict the diseases that may occur in a patient N years in the future, the invention abstracts the task scenario into a link prediction problem on a bipartite graph, whose left nodes represent different patients and whose right nodes represent different diseases, and in which only edges from patients to diseases exist. The patient-disease encoding bipartite graph is used by the encoder to automatically learn the embedded vector expressions of the patient nodes and disease nodes, and the patient-disease decoding bipartite graph is used by the decoder to learn the occurrence probability of each edge.
The edges in the patient-disease encoding bipartite graph represent the diseases the patient has had historically, and the weights represent the number of occurrences of each disease. A solid line in the patient-disease decoding bipartite graph represents a positive sample, i.e., a new disease of the patient in the next N years; a dashed line represents a negative sample, i.e., a disease the patient will not newly develop in the next N years. The edges of the patient-disease decoding bipartite graph are obtained by subtracting the patient-disease encoding bipartite graph from the complete bipartite graph.
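The complement construction of the decoding-graph edges can be sketched as follows; the list-based data shapes are illustrative:

```python
def decoding_edges(patients, diseases, encoding_edges, future_edges):
    """Decoding-graph edges = complete bipartite graph minus the encoding
    edges; each kept edge is labeled 1 if it is a future (positive) disease
    of the patient and 0 otherwise (negative)."""
    enc = set(encoding_edges)
    fut = set(future_edges)
    return [((p, d), 1 if (p, d) in fut else 0)
            for p in patients for d in diseases if (p, d) not in enc]
```

Every patient-disease pair not already present in the encoding graph thus receives a label for the decoder to learn.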
The patient-disease encoding bipartite graph constructed by the method is used by the disease risk prediction model (the model is named the GADP model, i.e., Graph Attention Disease risk Prediction model; for convenience of expression it is referred to herein as the disease risk prediction model) to automatically learn the embedded vector expressions of the patient nodes and disease nodes, and the patient-disease decoding bipartite graph is used to solve for the patient's future disease risk.
Step 3: retrieving the historical case data from the storage space, and extracting patient feature vectors and disease feature vectors from it;
the extraction of the patient feature vector comprises individual information, hospital information, historical disease number and ECI (ECI) co-morbidity Index of historical diseases, wherein the ECI co-morbidity Index can quantify the physical condition of the patient to a certain extent; carrying out one-hot coding on the data with the characteristic type of discrete type, and converting the data into a binary variable of 0-1; taking the data with the characteristic type of numerical value as continuous characteristic, and taking the value as real number; and (3) encoding the data with the characteristic type of discrete type and the data with the value having the sequence relation into the numerical characteristic.
See in particular table 1 below:
TABLE 1 extraction of feature vectors for patient nodes
Referring to table 1 above, the third column of table 1 is the data type of each feature. If a feature is numerical, it is regarded as a continuous feature and takes real values. If it is discrete, one-hot encoding is required to convert it into 0-1 binary variables. However, for the "hospitalization condition" field, whose values are critical, urgent and general, the values are discrete but have an ordinal relationship between them; to reduce the data dimension, this feature is encoded as a numerical feature, i.e., 1, 2 and 3. This both reduces the feature dimension and preserves the ordinal information within it.
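The two encodings discussed here can be sketched as follows; the English value names stand in for the original field values:

```python
def encode_admission_condition(value):
    """Ordinal encoding for the 'hospitalization condition' field: the
    discrete values have an order (critical, urgent, general), so a single
    numeric feature 1/2/3 preserves that order while keeping the
    dimension low, versus a 3-dimensional one-hot."""
    order = {"critical": 1, "urgent": 2, "general": 3}
    return order[value]

def one_hot(value, categories):
    """Plain one-hot for discrete features with no order (e.g. sex)."""
    return [1 if value == c else 0 for c in categories]
```

The ordinal variant uses one dimension instead of three, at the cost of imposing equal spacing between the levels.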
The disease feature vectors are extracted by sorting the ICD-10 codes of the disease nodes in ascending order to obtain a serial number for each disease node, and generating a vector for each disease node by one-hot encoding; in addition, the prevalence rate of each disease is computed as a feature characterizing how common the disease is.
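A sketch of that disease-feature construction, with hypothetical ICD-10 codes and case counts (the helper name and inputs are ours, not from the patent):

```python
def disease_features(code, all_codes, case_counts, n_patients):
    """One-hot the disease by its rank in ascending ICD-10 order,
    then append its prevalence rate as an extra feature."""
    codes = sorted(all_codes)                       # ascending ICD-10 order
    onehot = [1.0 if c == code else 0.0 for c in codes]
    prevalence = case_counts[code] / n_patients     # fraction of patients affected
    return onehot + [prevalence]

feat = disease_features("E11", ["I10", "E11", "J44"],
                        {"E11": 25, "I10": 40, "J44": 10}, n_patients=100)
# sorted codes are ["E11", "I10", "J44"], so feat == [1.0, 0.0, 0.0, 0.25]
```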
Step 4: establish an encoder and a decoder based on the patient-disease encoding bipartite graph and the patient-disease decoding bipartite graph respectively, where the encoder is a graph attention network, and build the disease risk prediction model from the encoder and decoder.
The invention uses a graph auto-encoder (GAE) as the basic architecture for link prediction. As an end-to-end model, the graph auto-encoder automatically learns an embedded vector expression for each node in the encoding graph, and a decoder then predicts the existence probability of each edge in the decoding graph. Its core components are the encoder and the decoder. The invention uses a Graph Attention Network (GAT) as the encoder and a bilinear layer as the decoder; the model is named the Graph Attention Disease Prediction (GADP) model, and its network structure is shown in figure 2.
The step 4 comprises the following steps:
Step 4.1: establish a heuristic feature extraction model:

CN_ij = |N(i) ∩ N2(j)|

AA_ij = Σ_{z ∈ N(i) ∩ N2(j)} 1 / log|N(z)|

JC_ij = |N(i) ∩ N2(j)| / |N(i) ∪ N2(j)|

PA_ij = |N(i)| · |N(j)|

in the formulas, N(i), N(j) and N(z) are the neighbor-node sets of nodes i, j and z respectively, node i being the central node; |·| is the size of the set; N2(j) is the second-order neighbor set of node j; CN_ij is the common-neighbors index of edge (i, j) of the patient-disease encoding bipartite graph, AA_ij its Adamic-Adar index, JC_ij its Jaccard coefficient index, and PA_ij its preferential attachment index. The larger the value of an index, the higher the occurrence probability of the edge.
Step 4.2: in graph neural networks, neighbor sampling usually draws a fixed number of neighbors uniformly at random. However, because different diseases affect a patient to different degrees, a non-uniform neighbor sampling strategy is designed that takes the edge weights of the patient-disease encoding bipartite graph into account, so that edges with larger weights are sampled with higher probability. The sampling strategy is:

p̂_ij = w_ij / Σ_{u ∈ N(i)} w_iu

in the formula, w_ij and p̂_ij are respectively the weight and the sampling probability of edge (i, j) of the patient-disease encoding bipartite graph, and w_iu is the weight of edge (i, u). Based on the sampling probability p̂_ij, the neighbors of the central node are sampled with replacement to obtain a fixed number of neighbor samples.
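A sketch of this weight-proportional, with-replacement sampler (names are ours):

```python
import random

def sample_neighbors(nbrs, weights, k, rng=None):
    """Sample k neighbors with replacement, probability proportional to edge weight."""
    rng = rng or random.Random(0)
    total = sum(weights)
    probs = [w / total for w in weights]           # p_ij = w_ij / sum_u w_iu
    return rng.choices(nbrs, weights=probs, k=k)   # with replacement

# Disease d1 was recorded 5 times for this patient, so it dominates the sample.
sampled = sample_neighbors(["d1", "d2", "d3"], [5.0, 1.0, 1.0], k=10)
```

Sampling with replacement guarantees the fixed sample size k even when a node has fewer than k neighbors.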
Step 4.3: referring to fig. 2, a graph attention network is used as the encoder. The encoder comprises two identical graph convolution modules, and the graph convolution layer of each module learns the weights of different neighbors with an attention mechanism to obtain the final embedded vector expression. In layer l of the encoder, the multi-head attention weight α_{c,ij}^(l) from node j to node i is calculated by the following formulas:

q_{c,i}^(l) = W_{c,q}^(l) h_i^(l) + b_{c,q}^(l)

k_{c,j}^(l) = W_{c,k}^(l) [h_j^(l) ‖ w_ij] + b_{c,k}^(l)

α_{c,ij}^(l) = exp(⟨q_{c,i}^(l), k_{c,j}^(l)⟩ / √d) / Σ_{u ∈ N(i)} exp(⟨q_{c,i}^(l), k_{c,u}^(l)⟩ / √d)

in the formulas, q_{c,i}^(l) is the query vector of central node i at the c-th attention head in the l-th layer network of the encoder; W_{c,q}^(l) is the weight matrix of the query vector q at the c-th head; h_i^(l) is the embedded vector of central node i in the l-th layer; b_{c,q}^(l) is the bias term of the query vector q at the c-th head; k_{c,j}^(l) is the key vector of node j at the c-th head; W_{c,k}^(l) is the weight matrix of the key vector k at the c-th head; h_j^(l) is the embedded vector of node j in the l-th layer; w_ij is the weight of edge (i, j); b_{c,k}^(l) is the bias term of the key vector k at the c-th head; α_{c,ij}^(l) is the attention weight of edge (i, j) at the c-th head in the l-th layer; k_{c,u}^(l) is the key vector of node u at the c-th head; the dot product is scaled by √d before the exponential, d being the dimension of the vectors.

In other words, the layer-l embedding h_i^(l) of the central node is first linearly transformed by W_{c,q}^(l) into the query vector q_{c,i}^(l); the neighbor embedding h_j^(l) is concatenated with the edge weight w_ij and linearly transformed by W_{c,k}^(l) into the key vector k_{c,j}^(l); the attention score of the edge is then the scaled dot product ⟨q, k⟩ / √d; finally, a softmax normalization yields the normalized attention weight α_{c,ij}^(l).
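A NumPy sketch of one attention head at one layer, with small assumed dimensions and random weights (the patent's trained parameters are of course not available):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                    # embedding dimension
W_q, b_q = rng.normal(size=(d, d)), np.zeros(d)
W_k, b_k = rng.normal(size=(d, d + 1)), np.zeros(d)   # +1 column for the edge weight

h_i = rng.normal(size=d)                 # central patient-node embedding
h_nb = rng.normal(size=(3, d))           # three sampled disease neighbors
w_edge = np.array([2.0, 1.0, 1.0])       # edge weights (occurrence counts)

q = W_q @ h_i + b_q                                      # query vector
keys_in = np.concatenate([h_nb, w_edge[:, None]], axis=1)  # [h_j || w_ij]
k = (W_k @ keys_in.T).T + b_k                            # key vectors, one per neighbor
scores = k @ q / np.sqrt(d)                              # <q, k> / sqrt(d)
alpha = np.exp(scores - scores.max())
alpha = alpha / alpha.sum()                              # softmax over neighbors
```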
After obtaining the attention weight of the multiple points of the graph, performing message aggregation operation on the embedded vectors of different neighbors:
Figure BDA00036365681300001020
Figure BDA00036365681300001021
wherein, C is the total number of heads of attention, | | | is the splicing operation of the vector; first by
Figure BDA00036365681300001022
Obtaining
Figure BDA00036365681300001023
Vector of values after linear transformation
Figure BDA00036365681300001024
Then, the weighted sum is obtained by the previously calculated attention weight
Figure BDA00036365681300001025
Then the multi-head attention results are spliced together to form a multi-head attention vector of neighbor aggregation
Figure BDA00036365681300001026
Embedding vectors of central nodes
Figure BDA00036365681300001027
And
Figure BDA00036365681300001028
combining and considering a gated residual mechanism, selectively controlling the inflow of information to compute the embedded vector representation of the next layer
Figure BDA00036365681300001029
The specific calculation formula is as follows:
Figure BDA00036365681300001030
Figure BDA00036365681300001031
Figure BDA0003636568130000111
wherein r isi (l)Embedded vector being a central node
Figure BDA0003636568130000112
By passing
Figure BDA0003636568130000113
Is linearly transformed intoIn the end of the process, the raw materials are mixed,
Figure BDA0003636568130000114
is the weight of the gated residual, will
Figure BDA0003636568130000115
ri (l)And
Figure BDA0003636568130000116
-ri (l)are spliced in sequence and pass
Figure BDA0003636568130000117
Linear transformation is carried out, and the value range is mapped to the interval from 0 to 1 through a sigmoid function, thereby realizing the control of ri (l)And
Figure BDA0003636568130000118
the function of information inflow of (2); finally, obtaining the embedded vector representation of the l +1 layer central node i through LayerNorm and ReLU activation function
Figure BDA0003636568130000119
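A NumPy sketch of the aggregation and gated residual update, again with small assumed dimensions and random weights; uniform attention weights stand in for the α computed above, and the LayerNorm here has no learned affine parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
d, C, n = 4, 2, 3                          # head dim, number of heads, neighbors
alpha = np.full((C, n), 1.0 / n)           # attention weights per head (placeholder)
v = rng.normal(size=(C, n, d))             # value vectors per head and neighbor
h_i = rng.normal(size=C * d)               # central-node embedding

# Weighted sum per head, then concatenate the C heads.
h_hat = np.concatenate([(alpha[c][:, None] * v[c]).sum(axis=0) for c in range(C)])

W_r = rng.normal(size=(C * d, C * d))
W_g = rng.normal(size=(1, 3 * C * d))

r = W_r @ h_i                              # transformed central embedding
z = W_g @ np.concatenate([h_hat, r, h_hat - r])
beta = 1.0 / (1.0 + np.exp(-z[0]))         # sigmoid gate in (0, 1)
mixed = (1.0 - beta) * h_hat + beta * r    # gated blend of message and residual
normed = (mixed - mixed.mean()) / (mixed.std() + 1e-5)   # LayerNorm, no affine
h_next = np.maximum(normed, 0.0)           # ReLU
```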
Step 4.4: construct a bilinear decoder. In the patient-disease decoding bipartite graph, each edge corresponds to a unique patient and a unique disease; given the embedded vector expressions of the patient and the disease, the bilinear decoder predicts the existence probability of the edge in the decoding graph by the following formulas:

z_ij = (‖_{b=1..B} h_i^T W_b h_j) ‖ s_ij

p_ij = sigmoid(W_o z_ij + b_o)

in the formulas, s_ij denotes the heuristic-feature vector of edge (i, j) built from the indices of step 4.1; h_i^T is the transpose of the embedded vector of node i and h_j is the vector of node j. The formula uses multiple weight matrices W_b, in the manner of a multi-head attention mechanism, to learn the combination of h_i and h_j from different angles; the subscript b serves only to distinguish the different weight matrices. The learned results are concatenated together with the heuristic features to form the hidden-layer feature z_ij of the edge. Finally, a linear transformation by the weight matrix W_o, with a bias term b_o added, gives the output-layer result, and the sigmoid activation function yields the prediction probability p_ij of edge (i, j).

The loss function uses cross entropy and is calculated as follows:

Loss = − Σ_{e_ij ∈ G_dec} [ y_ij log p_ij + (1 − y_ij) log(1 − p_ij) ]

where G_dec denotes the decoding graph, e_ij denotes edge (i, j), and y_ij is the label of the edge. The loss of the model is optimized with a gradient descent algorithm to train the disease risk prediction model.
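A NumPy sketch of the bilinear decoder and the per-edge cross-entropy term, with assumed sizes (B bilinear heads, a 4-dimensional heuristic vector s_ij) and random weights:

```python
import numpy as np

rng = np.random.default_rng(2)
d, B = 4, 2
h_i, h_j = rng.normal(size=d), rng.normal(size=d)   # patient / disease embeddings
W_b = rng.normal(size=(B, d, d))                    # one bilinear matrix per head
s_ij = np.array([1.0, 1.44, 0.33, 2.0])             # heuristic features (CN, AA, JC, PA)

# Bilinear scores h_i^T W_b h_j, one per head, concatenated with s_ij.
bilinear = np.array([h_i @ W_b[b] @ h_j for b in range(B)])
z = np.concatenate([bilinear, s_ij])                # hidden feature of the edge

W_o, b_o = rng.normal(size=z.shape[0]), 0.0
p = 1.0 / (1.0 + np.exp(-(W_o @ z + b_o)))          # predicted edge probability

y = 1.0                                             # edge label (positive sample)
loss = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))   # cross entropy, one edge
```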
Step 5: train the disease risk prediction model based on the data set of historical case home pages.
To train the disease risk prediction model quickly, training negative samples are generated by sampling from the full negative-sample set. The invention sets the positive-to-negative sampling ratio to 1:10: if a patient has 3 positive samples, 30 negative samples are drawn.
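A sketch of that 1:10 negative sampling (disease codes and helper names are illustrative):

```python
import random

def sample_negatives(positives, all_diseases, ratio=10, rng=None):
    """Draw ratio * |positives| negatives from the diseases the patient
    does not newly develop, without replacement."""
    rng = rng or random.Random(0)
    pool = [d for d in all_diseases if d not in positives]
    k = min(len(pool), ratio * len(positives))
    return rng.sample(pool, k)

# A patient with 3 positive samples gets 30 training negatives.
negs = sample_negatives({"I10", "E11", "J44"},
                        [f"D{i}" for i in range(100)], ratio=10)
```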
In the data-set division stage, the data set is split into a training set, a validation set and a test set in the ratio 7:1:2, taking the patient as the unit of division. The training set is used to train the disease risk prediction model; the validation set is used to tune the model's parameters; the test set is used to evaluate the model's generalization. At inference time, the full sample set is tested to obtain a prediction probability for each disease, and sorting these probabilities yields the risk ranking of the different diseases.
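A sketch of the patient-level 7:1:2 split — splitting by patient ID guarantees that all records of one patient land in exactly one partition (the function name and seed are ours):

```python
import random

def split_patients(patient_ids, seed=42):
    """Shuffle unique patient IDs and split them 70% / 10% / 20%."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train, n_val = int(n * 0.7), int(n * 0.1)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train, val, test = split_patients([f"p{i}" for i in range(100)])
```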
The disease risk prediction model adopts mini-batch training: each step samples a subset of nodes and their neighbors to train the network, which makes training on large-scale graph data feasible. The model is both effective and highly scalable: when predictions are needed on new data, only the neighbor information of the new nodes is required, without retraining on the whole graph as many other graph neural network models do. The number of neighbor samples per layer of the disease risk prediction model is 10. To optimize the model parameters, back propagation with gradient descent is used to optimize the weight matrices, yielding a trained disease risk prediction model.
The trained disease risk prediction model is then used to predict disease risk for a new hospitalization record.
Referring to fig. 4, for a new hospitalization record, the patient's individual information, hospital information and historical disease diagnosis information are obtained in the same way, and the corresponding patient feature vector and disease feature vectors are extracted. The patient is added to both the patient-disease encoding bipartite graph and the patient-disease decoding bipartite graph. Finally, the GADP model computes the patient's risk for the other diseases, and sorting the risks in descending order returns the top-N diseases.
The above embodiments describe specific implementations of the present application in relative detail, but they are not to be construed as limiting its scope. It should be noted that those skilled in the art can make several changes and improvements without departing from the technical idea of the present application, all of which fall within its protection scope.

Claims (8)

1. The construction method of the slow patient group disease risk prediction model based on the graph self-encoder is characterized by comprising the following steps of:
step 1: acquiring a data set of a historical case homepage, preprocessing data in the data set, and storing the preprocessed historical case data into a storage space established by a storage medium;
step 2: dividing the preprocessed historical case data, based on time order, into diseases the patient has had historically and diseases the patient newly develops in the future; constructing the diseases the patient has had historically as a patient-disease encoding bipartite graph, and constructing the diseases newly developed in the next N years as a patient-disease decoding bipartite graph;
step 3: calling the historical case data in the storage space, and extracting patient feature vectors and disease feature vectors based on the historical case data;
step 4: establishing an encoder and a decoder based on the patient-disease encoding bipartite graph and the patient-disease decoding bipartite graph respectively, wherein the encoder is a graph attention network, and establishing a disease risk prediction model based on the encoder and the decoder;
step 5: training the disease risk prediction model based on the data set of the historical case home page.
2. The method of claim 1, wherein the preprocessing in step 1 eliminates variables with a missing rate of more than 30% in the data set, and fills the missing values of the remaining variables with the mean of their non-missing part.
3. The method for constructing a slow patient group disease risk prediction model based on graph self-encoder according to claim 1, wherein the edges in the patient-disease encoding bipartite graph represent diseases the patient has had in history, with weights representing the number of occurrences of the disease; the patient-disease decoding bipartite graph comprises positive samples and negative samples, the positive samples being diseases the patient newly develops in the next N years and the negative samples being diseases that do not newly occur in the next N years; the edges of the patient-disease decoding bipartite graph are obtained by subtracting the patient-disease encoding bipartite graph from the complete bipartite graph; the patient-disease encoding bipartite graph is used by the encoder to automatically learn patient-node and disease-node embedding vector expressions and to extract heuristic features, and the patient-disease decoding bipartite graph is used by the decoder to learn the occurrence probability of each edge.
4. The method of constructing a slow patient group disease risk prediction model based on graph self-encoder as claimed in claim 1, wherein the extraction of the patient feature vector comprises individual information, hospital information, the number of historical diseases, and the ECI comorbidity index of the historical diseases; data whose feature type is discrete are one-hot encoded into 0-1 binary variables; data whose feature type is numerical are treated as continuous features taking real values; and discrete data whose values have an ordinal relation are encoded as numerical features.
5. The method for constructing a slow patient group disease risk prediction model based on a graph self-encoder according to claim 1, wherein the disease feature vectors are extracted by sorting the ICD-10 codes of the disease nodes in ascending order to obtain a serial number for each disease node, and generating a vector for each disease node by one-hot encoding; and the prevalence rate of each disease is computed as a feature characterizing how common the disease is.
6. The method for constructing a model for predicting the risk of a disease in a chronic patient group based on a graph self-encoder as claimed in claim 1, wherein the step 4 comprises the steps of:
step 4.1: establishing a heuristic feature extraction model:

CN_ij = |N(i) ∩ N2(j)|

AA_ij = Σ_{z ∈ N(i) ∩ N2(j)} 1 / log|N(z)|

JC_ij = |N(i) ∩ N2(j)| / |N(i) ∪ N2(j)|

PA_ij = |N(i)| · |N(j)|

in the formulas, N(i), N(j) and N(z) are the neighbor-node sets of nodes i, j and z respectively, node i being the central node; |·| is the size of the set; N2(j) is the second-order neighbor set of node j; CN_ij is the common-neighbors index of edge (i, j) of the patient-disease encoding bipartite graph, AA_ij its Adamic-Adar index, JC_ij its Jaccard coefficient index, and PA_ij its preferential attachment index; the larger the value of an index, the higher the occurrence probability of the edge;
step 4.2: establishing a neighbor sampling strategy:

p̂_ij = w_ij / Σ_{u ∈ N(i)} w_iu

in the formula, w_ij and p̂_ij are respectively the weight and the sampling probability of edge (i, j) of the patient-disease encoding bipartite graph, and w_iu is the weight of edge (i, u); based on the sampling probability p̂_ij, the neighbors of the central node are sampled with replacement to obtain a fixed number of neighbor samples;
step 4.3: the graph attention network is used as the encoder; the encoder comprises at least one graph convolution module, and the graph convolution layer of each module learns the weights of different neighbors with a graph attention mechanism to obtain the final embedded vector expression; in layer l of the encoder, the multi-head attention weight α_{c,ij}^(l) from node j to node i is calculated by the following formulas:

q_{c,i}^(l) = W_{c,q}^(l) h_i^(l) + b_{c,q}^(l)

k_{c,j}^(l) = W_{c,k}^(l) [h_j^(l) ‖ w_ij] + b_{c,k}^(l)

α_{c,ij}^(l) = exp(⟨q_{c,i}^(l), k_{c,j}^(l)⟩ / √d) / Σ_{u ∈ N(i)} exp(⟨q_{c,i}^(l), k_{c,u}^(l)⟩ / √d)

in the formulas, q_{c,i}^(l) is the query vector of central node i at the c-th attention head in the l-th layer network of the encoder; W_{c,q}^(l) is the weight matrix of the query vector q at the c-th head; h_i^(l) is the embedded vector of central node i in the l-th layer; b_{c,q}^(l) is the bias term of the query vector q at the c-th head; k_{c,j}^(l) is the key vector of node j at the c-th head; W_{c,k}^(l) is the weight matrix of the key vector k at the c-th head; h_j^(l) is the embedded vector of node j in the l-th layer; w_ij is the weight of edge (i, j); b_{c,k}^(l) is the bias term of the key vector k at the c-th head; α_{c,ij}^(l) is the attention weight of edge (i, j) at the c-th head in the l-th layer; k_{c,u}^(l) is the key vector of node u at the c-th head; the dot product is scaled by √d before the exponential, d being the dimension of the vectors;
after the multi-head attention weights are obtained, a message aggregation operation is performed on the embedded vectors of the different neighbors:

v_{c,j}^(l) = W_{c,v}^(l) h_j^(l) + b_{c,v}^(l)

ĥ_i^(l+1) = ‖_{c=1..C} Σ_{j ∈ N(i)} α_{c,ij}^(l) v_{c,j}^(l)

in the formulas, v_{c,j}^(l) is the value vector of node j at the c-th head in the l-th layer network of the encoder; W_{c,v}^(l) is the weight matrix of the value vector v at the c-th head; b_{c,v}^(l) is the bias term of the value vector v at the c-th head; ĥ_i^(l+1) is the aggregated attention vector of central node i, with C the total number of attention heads and ‖ the vector concatenation operation;

the embedded vector h_i^(l) of central node i is combined with ĥ_i^(l+1) through a gated residual mechanism that selectively controls the inflow of information, to compute the embedded vector representation h_i^(l+1) of the next layer; the specific calculation formulas are:

r_i^(l) = W_r^(l) h_i^(l)

β_i^(l) = sigmoid(W_g^(l) [ĥ_i^(l+1) ‖ r_i^(l) ‖ (ĥ_i^(l+1) − r_i^(l))])

h_i^(l+1) = ReLU(LayerNorm((1 − β_i^(l)) ĥ_i^(l+1) + β_i^(l) r_i^(l)))

wherein r_i^(l) is the information of central node i in the l-th layer network, obtained by linearly transforming h_i^(l) with the weight matrix W_r^(l); β_i^(l) is the gated residual weight of central node i: ĥ_i^(l+1), r_i^(l) and ĥ_i^(l+1) − r_i^(l) are concatenated in sequence, linearly transformed by the weight matrix W_g^(l), and mapped to the interval from 0 to 1 by a sigmoid function, thereby controlling the information inflow of r_i^(l) and ĥ_i^(l+1); finally, the embedded vector representation h_i^(l+1) of central node i in the (l+1)-th layer network is obtained through LayerNorm and the ReLU activation function;
Step 4.4: constructing a bilinear decoder which predicts the existence probability of edges in a patient-decoding image for the embedded vector expression of known patients and diseases, and the calculation formula is as follows:
Figure FDA0003636568120000043
Figure FDA0003636568120000044
in the formula (I), the compound is shown in the specification,
Figure FDA0003636568120000045
representing the index corresponding to the edge i, j of the patient-disease encoding bipartite graph as a heuristic characteristic;
Figure FDA0003636568120000046
transpose of the embedded vector representing node i, hjA vector representing node j; the above formula uses multiple weight matrices to use a multi-head attention mechanism
Figure FDA0003636568120000047
Learning from different angles
Figure FDA0003636568120000048
And hjThe combination method of (3) and then splicing the learned results with heuristic features to form hidden layer features of edges
Figure FDA0003636568120000049
Finally pass through WoThe weight matrix is linearly transformed by adding an offset term boObtaining the result of the output layer, and obtaining the prediction probability p of the edge i, j by using the sigmoid activation functionij
Figure FDA00036365681200000410
The loss function uses cross entropy and is calculated as follows:
Figure FDA00036365681200000411
wherein G isdecRepresents a decoding graph, eijRepresents the side ii, j, yijA label representing an edge; and (5) optimizing the Loss of the model by using a gradient descent algorithm, and training a disease risk prediction model.
7. The method for constructing a slow patient population disease risk prediction model based on a graph self-encoder according to claim 1, wherein the preprocessed data set is divided into a training set, a validation set and a test set according to a ratio of 7:1: 2; the training set is used for training the disease risk prediction model, the verification set is used for optimizing parameters of the disease risk prediction model, and the test set is used for evaluating the generalization effect of the disease risk prediction model.
8. The method of claim 1, wherein all negative samples in the data set are collected to form a negative-sample set, the negative-sample set is sampled to generate the negative samples for training the disease risk prediction model, and the positive-to-negative sample ratio is set to 1:10.
CN202210507317.9A 2022-05-10 2022-05-10 Construction method of slow patient group disease risk prediction model based on graph self-encoder Active CN114783608B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210507317.9A CN114783608B (en) 2022-05-10 2022-05-10 Construction method of slow patient group disease risk prediction model based on graph self-encoder

Publications (2)

Publication Number Publication Date
CN114783608A true CN114783608A (en) 2022-07-22
CN114783608B CN114783608B (en) 2023-05-05

Family

ID=82436498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210507317.9A Active CN114783608B (en) 2022-05-10 2022-05-10 Construction method of slow patient group disease risk prediction model based on graph self-encoder

Country Status (1)

Country Link
CN (1) CN114783608B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115713986A (en) * 2022-11-11 2023-02-24 中南大学 Attention mechanism-based material crystal property prediction method
CN116072298A (en) * 2023-04-06 2023-05-05 之江实验室 Disease prediction system based on hierarchical marker distribution learning
CN116825360A (en) * 2023-07-24 2023-09-29 湖南工商大学 Method and device for predicting chronic disease co-morbid based on graph neural network and related equipment
CN117438023A (en) * 2023-10-31 2024-01-23 灌云县南岗镇卫生院 Hospital information management method and system based on big data
CN117476240A (en) * 2023-12-28 2024-01-30 中国科学院自动化研究所 Disease prediction method and device with few samples

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050222867A1 (en) * 2004-03-31 2005-10-06 Aetna, Inc. System and method for administering health care cost reduction
WO2013108122A1 (en) * 2012-01-20 2013-07-25 Mueller-Wolf Martin "indima apparatus" system, method and computer program product for individualized and collaborative health care
CN109036553A (en) * 2018-08-01 2018-12-18 北京理工大学 A kind of disease forecasting method based on automatic extraction Medical Technologist's knowledge
CN111462896A (en) * 2020-03-31 2020-07-28 重庆大学 Real-time intelligent auxiliary ICD coding system and method based on medical record
CN113689954A (en) * 2021-08-24 2021-11-23 平安科技(深圳)有限公司 Hypertension risk prediction method, device, equipment and medium
CN114023449A (en) * 2021-11-05 2022-02-08 中山大学 Diabetes risk early warning method and system based on depth self-encoder



Also Published As

Publication number Publication date
CN114783608B (en) 2023-05-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant