CN113345564A - Early prediction method and device for patient hospitalization duration based on graph neural network - Google Patents


Info

Publication number
CN113345564A
Authority
CN
China
Prior art keywords
patient
disease
network
stay
hospitalization
Prior art date
Legal status
Granted
Application number
CN202110601268.0A
Other languages
Chinese (zh)
Other versions
CN113345564B (en)
Inventor
邱航
胡智栩
王利亚
周德嘉
丁舒涵
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202110601268.0A
Publication of CN113345564A
Application granted
Publication of CN113345564B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a method and a device for early prediction of a patient's length of hospital stay based on a graph neural network, and belongs to the technical field of data processing. The method comprises the following steps:
- acquiring and preprocessing a medical record front-page data set;
- extracting the patients' basic features and historical features;
- extracting disease vectors from the preprocessed data set;
- constructing a patient similarity network;
- constructing an early prediction model of the length of stay with a GraphSAGE graph neural network, based on the patients' length-of-stay labels, basic features, historical features, disease vectors and the patient similarity network;
- predicting the length of stay of a sample to be predicted with this model, to obtain an early prediction of the patient's length of stay.

The method can predict a patient's length of stay early, at the point of admission, and therefore has high application value.

Description

Early prediction method and device for patient hospitalization duration based on graph neural network
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a method and a device for early prediction of hospitalization duration of a patient based on a graph neural network.
Background
Accurate prediction of a patient's length of hospital stay can help hospital managers allocate limited medical resources effectively, control patients' medical costs and improve the quality of medical services. However, current length-of-stay prediction models are built mainly on data such as clinical examinations and demographics. They ignore the influence of a patient's comorbidities on the length of stay. Moreover, few studies predict the length of stay early, at the point of admission. Accurately predicting a patient's length of stay early in the admission therefore has important practical significance.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention provides a method, an apparatus, an electronic device and a storage medium for early prediction of a patient's length of hospital stay based on a graph neural network, so as to better predict the patient's length of stay.
In order to achieve the above purpose, the invention adopts the technical scheme that:
According to an embodiment of the application, a graph-neural-network-based method for early prediction of a patient's length of hospital stay is provided, comprising the following steps:
S1, preprocessing the acquired medical record front-page data set and extracting the patient's length-of-stay label;
S2, extracting the basic features and historical features of the patient;
S3, extracting disease vectors from the preprocessed medical record front-page data set;
S4, constructing a patient similarity network;
S5, constructing an early prediction model of the length of stay with a GraphSAGE graph neural network, based on the patients' length-of-stay labels, basic features, historical features, disease vectors and the patient similarity network;
S6, predicting the length of stay of a sample to be predicted with the early prediction model, to obtain the early prediction result of the patient's length of stay.
Further, the patient's length-of-stay label is obtained as the difference between the patient's discharge date and admission date.
The basic features comprise the patient's individual information and hospital information. The historical features comprise statistics of the patient's lengths of stay in previous hospitalizations.
Further, the step S3 includes the following steps:
S301, constructing a disease co-occurrence network based on the preprocessed medical record front-page data set;
S302, generating a low-dimensional embedding vector for each disease from the disease co-occurrence network with the Node2Vec algorithm, and combining all embedding vectors of the same patient with an aggregation method to complete the extraction of the disease vector:
$\mathrm{AGGREGATE}_{\mathrm{sum}} = \sum_{i \in D} d_i$
where $\mathrm{AGGREGATE}_{\mathrm{sum}}$ denotes the sum aggregation function, $d_i$ denotes the low-dimensional vector representation of disease $i$, and $D$ denotes the patient's set of diseases.
Further, the step S301 includes the steps of:
S3011, aggregating the preprocessed medical record front-page data set by patient identity number, and grouping together the diagnoses from each patient's different hospitalization records;
S3012, constructing a patient-disease two-dimensional table from the grouped result;
S3013, measuring the risk between each pair of diseases with the relative risk (RR) value computed from the patient-disease two-dimensional table, and using it as the network edge weight $RR_{ij}$;
S3014, calculating $RR_{ij}$ and its confidence interval for all disease pairs, and retaining the network edges for which $RR_{ij} > 1$ and the confidence interval lies entirely above 1;
S3015, generating the disease co-occurrence network from the retained network edges.
Further, the step S4 includes the following steps:
S401, generating a patient-disease bipartite graph from the diseases each patient has when hospitalized. The graph is represented by an adjacency matrix $A$ whose rows represent patients and whose columns represent diseases;
S402, compressing the patient-disease bipartite graph and counting the diseases shared between patients, so as to construct the patient similarity network. The weight $e_{ij}$ of the edge between patient $i$ and patient $j$ is:
$e_{ij} = A_{i,:} \cdot A_{j,:}$
where $A_{i,:}$ is the $i$-th row of the adjacency matrix $A$, i.e. the disease characterization vector of the $i$-th patient. $A_{j,:}$ is the $j$-th row, i.e. the disease characterization vector of the $j$-th patient. The operator $\cdot$ denotes the dot product of vectors.
Further, the sampling strategy $p_{ij}$ of the GraphSAGE graph neural network in step S5 is:
$p_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N(i)} \exp(e_{ik})}$
where $p_{ij}$ denotes the sampling probability of neighbor node $j$ of node $i$, $e_{ij}$ denotes the weight of the edge between node $i$ and neighbor node $j$, $N(i)$ denotes the neighbor set of node $i$, and $e_{ik}$ denotes the weight of the edge between node $i$ and neighbor node $k$.
Further, the step S6 includes the following steps:
S601, inputting the information of the sample to be predicted;
S602, inputting the sample to be predicted into the patient similarity network, and obtaining the second-order sampled neighbors of its node in the network;
S603, extracting a feature vector containing the basic features, historical features and disease vector from the information of the sample to be predicted;
S604, outputting the length of stay of the sample to be predicted with the early prediction model, based on the second-order sampled neighbors and the feature vector, to obtain the early prediction result of the patient's length of stay.
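The sketch below shows one way to gather the two-hop neighborhood in S602. It assumes the new sample has already been linked into the patient similarity network through its shared diagnoses. Several details are assumptions for illustration:
- the nested-dict graph layout;
- the function name `sample_two_hop`;
- the fan-outs of 10 at the first hop and 5 at the second, taken from the four-layer network described in the embodiment.

Neighbors are drawn with replacement using the softmax probabilities of the custom sampling strategy above.

```python
import math
import random


def sample_two_hop(graph, node, fanouts=(10, 5), rng=random):
    """Collect the second-order sampled neighborhood of `node` (hypothetical helper).

    graph: node -> {neighbor: e_ij}, the patient similarity network incl. the new node.
    Returns (hop1, hop2): hop1 is a list of sampled neighbors of `node`; hop2 maps each
    distinct hop-1 neighbor to its own sampled neighbors.
    """
    def draw(u, k):
        nbrs = graph.get(u, {})
        if not nbrs:
            return []  # isolated node: nothing to sample
        names = list(nbrs)
        top = max(nbrs.values())
        # Unnormalised softmax of edge weights; random.choices normalises internally
        weights = [math.exp(nbrs[v] - top) for v in names]
        return rng.choices(names, weights=weights, k=k)  # sampling with replacement

    hop1 = draw(node, fanouts[0])
    hop2 = {u: draw(u, fanouts[1]) for u in set(hop1)}
    return hop1, hop2
```

Because sampling is with replacement, every node with at least one neighbor yields exactly the requested number of samples. This keeps the tensor shapes fixed when the same idea is batched in a deep learning framework.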
According to another aspect of the embodiments of the present application, there is provided a device for early prediction of a patient's length of hospital stay based on a graph neural network, comprising:
a preprocessing module, configured to acquire a medical record front-page data set and preprocess it;
a feature extraction module, configured to extract the basic features and historical features of the patient;
a disease vector extraction module, configured to extract disease vectors from the preprocessed medical record front-page data set, comprising:
a disease co-occurrence network unit, configured to construct a disease co-occurrence network based on the preprocessed medical record front-page data set;
a feature extraction unit, configured to generate low-dimensional embedding vectors of the diseases from the disease co-occurrence network with the Node2Vec algorithm, and to combine all embedding vectors of the same patient with an aggregation method to complete the extraction of the disease vector;
a patient similarity network module, configured to construct the patient similarity network, comprising:
a patient-disease bipartite graph unit, configured to generate a patient-disease bipartite graph represented by an adjacency matrix A from the diseases of each patient at hospitalization, where rows represent patients and columns represent diseases;
a patient similarity network unit, configured to compress the patient-disease bipartite graph and count the diseases shared between patients, so as to construct the patient similarity network;
a training module, configured to construct an early prediction model of the length of stay with a GraphSAGE graph neural network, based on the patients' length-of-stay labels, basic features, historical features, disease vectors and the patient similarity network, comprising:
a custom sampling strategy unit, configured to obtain a fixed number of neighbor samples according to the custom sampling probability;
a prediction module, configured to predict the length of stay of a sample to be predicted with the early prediction model, to obtain the early prediction result of the patient's length of stay, comprising:
an input unit, configured to input the information of the sample to be predicted into the patient similarity network and to obtain the second-order sampled neighbors of its node in the network;
a feature vector extraction unit, configured to extract a feature vector containing the basic features, historical features and disease vector from the information of the sample to be predicted;
and a prediction unit, configured to output the length of stay of the sample to be predicted with the early prediction model, based on the second-order sampled neighbors and the feature vector, to obtain the early prediction result of the patient's length of stay.
According to another aspect of the embodiments of the present application, there is provided an electronic device. It comprises a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor executes the program to implement any one of the above methods for early prediction of a patient's length of stay.
According to another aspect of the embodiments of the present application, there is provided a computer-readable storage medium storing a computer program. The program is executed by a processor to implement any one of the above methods for early prediction of a patient's length of stay.
The technical scheme provided by one aspect of the embodiment of the application can have the following beneficial effects:
(1) Very little information is available early in an admission, so few studies have attempted to predict the length of stay at the point of admission. The invention extracts the patient's basic features and historical features and builds a prediction model based on a graph neural network. It can therefore predict the length of stay early, at the point of admission, which gives it greater application value.
(2) The invention constructs multiple feature sets, such as individual basic features and historical features, to mine the available information comprehensively. This can enhance the predictive ability of the model.
(3) The invention constructs a disease co-occurrence network and uses the Node2Vec algorithm to mine the information within it, obtaining an embedded vector representation of each disease.
(4) The invention constructs a patient similarity network to measure the similarity between patients, and models the length-of-stay prediction task with a GraphSAGE graph neural network.
(5) The invention models the task with GraphSAGE. Previous studies built a set of features and modeled them with traditional machine learning models. In contrast to previous length-of-stay (LOS) prediction, the invention abstracts the length-of-stay problem as learning on a graph. One of its innovations is this conversion of the problem into a graph representation.
Drawings
Fig. 1 shows a flowchart of a length-of-stay prediction method according to an embodiment of the present application.
Fig. 2 shows a schematic diagram of generating a disease vector from a hospitalization record sample according to an embodiment of the present application.
Fig. 3 shows a schematic diagram of the transformation from a patient-disease bipartite graph (left) to a patient similarity network (right) according to an embodiment of the present application.
Fig. 4 shows a flowchart of the prediction phase according to an embodiment of the present application.
Fig. 5 shows the structure of a length-of-stay prediction apparatus according to an embodiment of the present application.
Fig. 6 shows a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments is provided to help those skilled in the art understand the present invention. It should be understood that the present invention is not limited to the scope of these embodiments. Those skilled in the art may make various changes without departing from the spirit and scope of the invention as defined in the appended claims. All inventions that make use of the inventive concept are protected.
Based on the above background, existing methods for predicting a patient's length of stay have the following main problems:
(1) Studies that predict the length of stay early, at the point of admission, perform poorly. Very little data is available, and these studies extract only a few common features.
(2) Disease diagnosis information is important for length-of-stay prediction, yet existing studies mainly characterize diseases with one-hot encoding or the Charlson comorbidity index. These methods can require thousands of feature dimensions to represent diseases, and they ignore the complex intrinsic relationships between diseases.
(3) Existing studies treat each hospitalization record independently. They neither mine the relationships between records nor use this additional information to improve model performance.
(4) Most studies predict the length of stay with traditional machine learning models, such as random forests or support vector machines, and the predictions are poor.

In view of this, embodiments of the present application provide a method and an apparatus for early prediction of a patient's length of stay, as well as an electronic device and a storage medium based on the method. They apply to early prediction of a patient's length of stay: the length of stay can be predicted at the point of admission, and patients are not restricted to any age group or disease.
In the scheme of the embodiments, a large number of basic features and historical features are extracted from the patient's hospitalization records to represent the patient's health condition. Next, a disease co-occurrence network is constructed from disease co-occurrence information. The Node2Vec algorithm then produces an embedded vector representation of each disease, which reduces the dimension needed to represent diseases while preserving the complex relationships among them. Finally, a patient similarity network is constructed to measure the similarity between patients. The GraphSAGE (Graph SAmple and aggreGatE) neural network model convolves the structural information of this network with the patients' feature vectors, yielding the final early length-of-stay prediction model.
As shown in Fig. 1, one embodiment of the present application provides a method for early prediction of a patient's length of hospital stay, comprising the following steps:
S1, preprocessing the acquired medical record front-page data set and extracting the patient's length-of-stay label.
In the embodiment disclosed in the present application, the medical record front page is a record generated for the patient after the hospital stay is completed. Each record contains:
- the patient's individual information (encrypted identity number, sex, age, admission time, discharge time, and so on);
- hospital information (hospital grade, hospital address, and so on);
- the diagnoses of the hospitalization (one principal diagnosis and up to 15 secondary diagnoses).

The data therefore need preprocessing. Variables with a missing rate greater than 30% are removed. The missing values of the remaining variables are filled with the mean of their observed values, giving a modeling database without missing values. The task is a regression task, i.e. the length of stay is treated as a numerical variable. The patient's length-of-stay label is therefore obtained by subtracting the admission date from the discharge date.
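The preprocessing in S1 can be sketched in plain Python as follows. This is a minimal illustration under assumed conditions, not the patent's implementation:
- records are hypothetical dicts with `None` marking a missing value;
- only the numeric fields passed in are checked for missingness and imputed;
- the field names `admit_date` and `discharge_date` are placeholders.

```python
from datetime import date
from statistics import mean

MISSING_RATE_THRESHOLD = 0.30  # variables missing in more than 30% of records are dropped


def preprocess(records, numeric_fields):
    """Drop sparse variables, mean-impute the rest, and attach the length-of-stay label.

    `records` is a list of dicts (hypothetical layout); None marks a missing value.
    Returns the cleaned rows and the names of the variables that were kept.
    """
    n = len(records)
    kept = [f for f in numeric_fields
            if sum(r.get(f) is None for r in records) / n <= MISSING_RATE_THRESHOLD]
    # Mean of the observed values of each kept variable, used for imputation
    means = {f: mean(r[f] for r in records if r.get(f) is not None) for f in kept}
    cleaned = []
    for r in records:
        row = {f: r[f] if r.get(f) is not None else means[f] for f in kept}
        # Regression label: discharge date minus admission date, in days
        row["los"] = (r["discharge_date"] - r["admit_date"]).days
        cleaned.append(row)
    return cleaned, kept


if __name__ == "__main__":
    records = [
        {"age": 60, "weight": None, "admit_date": date(2020, 1, 1), "discharge_date": date(2020, 1, 8)},
        {"age": None, "weight": None, "admit_date": date(2020, 2, 1), "discharge_date": date(2020, 2, 3)},
        {"age": 40, "weight": 70.0, "admit_date": date(2020, 3, 1), "discharge_date": date(2020, 3, 11)},
        {"age": 50, "weight": None, "admit_date": date(2020, 4, 1), "discharge_date": date(2020, 4, 5)},
    ]
    rows, kept = preprocess(records, ["age", "weight"])
    print(kept)                      # ['age']  ("weight" is 75% missing)
    print([r["los"] for r in rows])  # [7, 2, 10, 4]
```

Categorical fields would be encoded before or after this step rather than mean-imputed. That choice is left open here.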
S2, extracting basic features and historical features of the patient;
In the embodiment disclosed in the present application, the patient's individual information, such as sex and age, is an important reference in the length-of-stay prediction task. The patient's previous hospitalization records are also useful: patients who have often had long stays are more likely to have a long stay again. The application therefore extracts basic features and historical features to enrich the feature set. The detailed extraction rules are shown in Table 1.
TABLE 1
[Table 1 image not reproduced: extraction rules for the basic and historical features; the third column gives each feature's data type]
The basic features mainly quantify the patient's individual information and hospital information. They convert this information into numerical data that a machine learning model can use. The historical features are statistics of the lengths of the patient's previous stays, such as their mean and maximum. All of these features have proven useful in previous studies.

The third column of Table 1 gives the data type of each feature. Numerical features are treated as continuous and take real values. Discrete features are one-hot encoded into 0-1 binary variables. Some discrete fields are handled differently, however. For example, the admission condition field takes the values critical, urgent and general. Although these values are discrete, they are ordered, so the field is encoded as a single numerical feature with values 1, 2 and 3. This reduces the feature dimension while keeping the ordering information.

The historical features are extracted from the patient's previous hospitalizations. Some patients have no previous hospitalization to draw on, so these features would be null. Such missing values are uniformly filled with -999.
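A minimal sketch of the historical features and the ordinal encoding described above. The following are assumptions for illustration; Table 1 holds the actual rules:
- the key names;
- the choice of mean and maximum as the statistics;
- using only stays admitted earlier than the current one, so that a stay's own length never leaks into its features.

```python
from statistics import mean

MISSING_HISTORY = -999  # fill value when a patient has no previous stay

# Ordinal codes keep the order of the admission-condition values in a single column
ADMISSION_CONDITION_CODE = {"critical": 1, "urgent": 2, "general": 3}


def add_history_features(stays):
    """Return new rows with historical length-of-stay statistics and encoded fields.

    `stays` is a list of dicts with hypothetical keys "patient_id", "admit_date"
    (ISO date string, so string order is date order), "los" and "admission_condition".
    """
    history = {}  # patient_id -> lengths of the stays seen so far
    rows = []
    for stay in sorted(stays, key=lambda s: s["admit_date"]):
        past = history.setdefault(stay["patient_id"], [])
        row = dict(stay)
        row["hist_los_mean"] = mean(past) if past else MISSING_HISTORY
        row["hist_los_max"] = max(past) if past else MISSING_HISTORY
        row["admission_condition"] = ADMISSION_CONDITION_CODE[stay["admission_condition"]]
        rows.append(row)
        past.append(stay["los"])  # becomes history only for this patient's later stays
    return rows
```

Processing stays in admission order means each row sees only information that existed when that patient was admitted. This matches the early-prediction setting.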
S3, extracting disease vectors from the preprocessed medical record front-page data set, comprising the following steps:
S301, constructing a disease co-occurrence network based on the preprocessed medical record front-page data set, comprising the following steps:
A disease co-occurrence network is constructed from the medical record front-page data to measure the intrinsic relationships among diseases. The nodes of the network are diseases and the edges are relationships between diseases. The network is constructed as follows:
S3011, aggregating the preprocessed medical record front-page data set by patient identity number, and grouping together the diagnoses from each patient's different hospitalization records;
S3012, constructing a patient-disease two-dimensional table from the grouped result, with rows representing patients and columns representing diseases. Each value is 1 or 0: 1 means the patient has the disease and 0 means the patient does not. Because the data volume may be very large, the table is extremely sparse and would consume a great deal of memory, so it can be stored as a sparse matrix;
S3013, measuring the risk between each pair of diseases with the relative risk (RR) value computed from the patient-disease two-dimensional table, and using it as the network edge weight $RR_{ij}$:
$RR_{ij} = \frac{C_{ij} N}{C_i C_j}$
where:
- $RR_{ij}$ is the relative risk of diseases $i$ and $j$;
- $C_{ij}$ is the number of patients in whom disease $i$ and disease $j$ co-occur;
- $N$ is the total number of patients, i.e. the number of rows of the two-dimensional table;
- $C_i$ and $C_j$ are the numbers of occurrences of disease $i$ and disease $j$.

All of these quantities are counted from the patient-disease two-dimensional table.
S3014, calculating the 99% confidence interval of $RR_{ij}$ for all disease pairs, and retaining the network edges for which $RR_{ij} > 1$ and the whole confidence interval lies above 1;
S3015, generating the disease co-occurrence network from the retained network edges. Once the edges are known, the network's nodes are the diseases at their endpoints, which yields the whole disease co-occurrence network.
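Steps S3011 to S3015 can be sketched in plain Python as follows. The input is assumed to be a hypothetical list of (patient ID, diagnosis list) pairs. The patient-disease table is kept sparse as one set of diseases per patient, so only its 1-entries are stored. RR follows the definition above.

The patent does not state how the 99% confidence interval is computed. The log-scale approximation used here is an assumption of the kind used in comorbidity-network studies:
- sigma = 1/C_ij + 1/(C_i C_j) - 1/N - 1/N^2;
- interval RR * exp(±2.576 sigma).

```python
import math
from collections import defaultdict
from itertools import combinations

Z_99 = 2.576  # two-sided 99% standard normal quantile


def patient_disease_table(records):
    """S3011-S3012: merge diagnoses by patient ID into a sparse 0/1 table (patient -> set)."""
    table = defaultdict(set)
    for patient_id, diagnoses in records:
        table[patient_id].update(diagnoses)  # a disease seen in several stays counts once
    return dict(table)


def comorbidity_edges(table):
    """S3013-S3015: return {(i, j): RR_ij} for pairs with RR_ij > 1 and CI above 1."""
    n = len(table)               # N: number of rows of the table
    count = defaultdict(int)     # C_i
    co_count = defaultdict(int)  # C_ij
    for diseases in table.values():
        for d in diseases:
            count[d] += 1
        for pair in combinations(sorted(diseases), 2):
            co_count[pair] += 1
    edges = {}
    for (i, j), c_ij in co_count.items():
        rr = c_ij * n / (count[i] * count[j])
        # Assumed approximation for the spread of log(RR); not given in the source
        sigma = 1 / c_ij + 1 / (count[i] * count[j]) - 1 / n - 1 / n ** 2
        lower = rr * math.exp(-Z_99 * sigma)
        if rr > 1 and lower > 1:  # lower bound > 1 means the whole interval is above 1
            edges[(i, j)] = rr
    return edges
```

The retained pairs are the edges of the disease co-occurrence network, weighted by RR_ij. The diseases that appear in them are its nodes.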
S302, generating a low-dimensional embedding vector for each disease from the disease co-occurrence network with the Node2Vec algorithm, and combining all embedding vectors of the same patient with an aggregation method to complete the extraction of the disease vector:
After the disease co-occurrence network is obtained, the Node2Vec algorithm embeds the disease nodes as low-dimensional vectors, whose dimension can be between 100 and 300. Each patient may have several diseases diagnosed during the stay, and the number of diseases differs between patients. An aggregation strategy is therefore needed to combine a patient's embedding vectors into a single vector of fixed length. The present application proposes a sum aggregation strategy:
$\mathrm{AGGREGATE}_{\mathrm{sum}} = \sum_{i \in D} d_i$
where $D$ denotes the set of diagnoses of the patient's hospitalization and $d_i$ denotes the low-dimensional vector representation of disease $i$. The strategy adds up the vectors of the patient's hospitalization diagnoses to characterize the patient's disease condition. After aggregation, each patient's disease information becomes a vector of uniform length, i.e. the disease vector. Fig. 2 shows a schematic diagram of generating a disease vector from a sample.
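Step S302 can be sketched in two pieces.

The first function generates one biased second-order random walk, the kind from which Node2Vec learns embeddings, over the co-occurrence network. Several parts are assumptions or left out:
- the graph is stored as nested dicts (a hypothetical layout with RR_ij as edge weights);
- the return parameter p and in-out parameter q are illustrative;
- the skip-gram training that turns walks into 100 to 300 dimensional vectors is not shown.

The second function is the sum aggregation. It skips diagnoses that have no embedding, for example diseases with no retained edge. This is an assumption, since the text does not say how they are handled.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """One Node2Vec-style biased random walk; graph maps node -> {neighbor: weight}."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(graph[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            weights = [graph[cur][x] for x in nbrs]  # first step: edge weights only
        else:
            prev = walk[-2]
            weights = []
            for x in nbrs:
                w = graph[cur][x]
                if x == prev:
                    weights.append(w / p)  # step back to the previous node
                elif x in graph[prev]:
                    weights.append(w)      # stay at distance 1 from the previous node
                else:
                    weights.append(w / q)  # move further away from the previous node
        walk.append(rng.choices(nbrs, weights=weights)[0])
    return walk


def disease_vector(diagnoses, embeddings, dim):
    """AGGREGATE_sum: element-wise sum of the embeddings d_i for i in D."""
    vec = [0.0] * dim
    for d in diagnoses:
        emb = embeddings.get(d)
        if emb is None:
            continue  # no embedding for this diagnosis (assumed to be skipped)
        for k, value in enumerate(emb):
            vec[k] += value
    return vec
```

A small p makes walks backtrack more often, and a small q pushes them outward. Summation, unlike averaging, keeps the number of diagnoses visible in the vector's magnitude.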
S4, constructing a patient similarity network:
S401, generating a patient-disease bipartite graph, represented by an adjacency matrix A, from the diseases of each patient at hospitalization, where rows represent patients and columns represent diseases;
S402, compressing the patient-disease bipartite graph and counting the diseases shared between patients, thereby constructing the patient similarity network.
In the embodiments disclosed in this application, the patient's basic features, historical features and disease vector together form the patient's feature vector. In addition, a patient similarity network is constructed from the co-occurrence of diseases among patients, adding information for length-of-stay prediction.

Each patient may have several diseases during a hospitalization, so a patient-disease bipartite graph can be generated, as shown in the left panel of Fig. 3. Different hospitalizations of the same person are treated here as different individuals, because the subject of the length-of-stay prediction task is a hospitalization record, not a person. The bipartite graph is represented by an adjacency matrix $A$ whose rows represent patients (admission records) and whose columns represent diseases.

The bipartite graph is then compressed to abstract the similarity between patients, giving the form shown in the right panel of Fig. 3. Two patients are connected by an edge if they share at least one disease, and the weight of the edge is the number of shared diseases. A larger weight therefore indicates greater disease similarity. In the resulting patient similarity network, a node is an admission record (patient). An edge represents the similarity between two admission records, which can also be called the similarity between two patients. The weight of the edge between patient $i$ and patient $j$ is calculated as follows, where the operator is the dot product of vectors:
$e_{ij} = A_{i,:} \cdot A_{j,:}$
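For a 0/1 admission-by-disease matrix A, all pairwise weights $e_{ij}$ can be obtained at once as the matrix product $A A^\top$. The sketch below assumes A is a dense NumPy array and that the diagonal (a record compared with itself) is zeroed so the network has no self-loops; both are assumptions, and a real data set would more likely use a sparse matrix.

```python
import numpy as np

def patient_similarity(A: np.ndarray) -> np.ndarray:
    """Edge weights e_ij = A[i,:] . A[j,:] for every pair of admission records.

    A is a binary incidence matrix (rows: admission records, columns: diseases).
    A @ A.T counts the diseases shared by each pair of records; the diagonal is
    zeroed here (an assumption) so that no record is linked to itself.
    """
    E = A @ A.T
    np.fill_diagonal(E, 0)
    return E

# Toy data (hypothetical): three admission records, four disease codes.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
])
E = patient_similarity(A)
print(E)
```

Records 0 and 1 share two diseases and are joined by an edge of weight 2; record 2 shares nothing and stays isolated.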
S5, constructing an early prediction model of the length of stay by using a GraphSAGE graph neural network based on the patient's length-of-stay label, basic features, historical features, disease vector and the patient similarity network.
In the embodiments disclosed in this application, the patient's feature vector is used as the feature vector of the corresponding node in the patient similarity network. The GraphSAGE graph neural network takes both the structure of the patient similarity network and the node feature vectors into account when modeling the length of stay, which yields the early prediction model of the patient's length of stay. GraphSAGE is trained in mini-batches: each step samples a subset of nodes and their neighbors, so the network can be trained on large-scale graph data. When a prediction is needed for new data, only the neighbor information of the new node is required, rather than retraining on the whole graph as some other graph neural network models require. In GraphSAGE's standard neighbor sampling, a fixed number of neighbors is sampled uniformly at random. The invention designs a custom sampling strategy in which neighbors connected by edges with larger weights have a higher sampling probability, as given by the following formula:
$p_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N(i)} \exp(e_{ik})}$
The above formula uses the softmax function to obtain the sampling probability of each neighbor node j of node i, where N(i) denotes the neighbor set of node i. Based on these probabilities, the neighbors of each node are sampled with replacement to obtain a fixed number of neighbor samples.
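The weighted sampling step can be sketched as follows, assuming each node's neighbors are held in a dict mapping neighbor id to edge weight $e_{ij}$ (a hypothetical data layout). The maximum weight is subtracted before exponentiating, which leaves the softmax probabilities unchanged but avoids overflow for large shared-disease counts.

```python
import numpy as np

def softmax_probs(weights: dict) -> dict:
    """p_ij = exp(e_ij) / sum_{k in N(i)} exp(e_ik) for one node i."""
    ids = list(weights)
    e = np.array([weights[j] for j in ids], dtype=float)
    p = np.exp(e - e.max())          # shift for numerical stability
    p /= p.sum()
    return dict(zip(ids, p))

def sample_neighbors(weights: dict, num_samples: int, rng: np.random.Generator) -> list:
    """Draw a fixed number of neighbours with replacement, using p_ij."""
    if not weights:                  # isolated node: nothing to sample
        return []
    probs = softmax_probs(weights)
    ids = list(probs)
    picks = rng.choice(len(ids), size=num_samples, replace=True,
                       p=[probs[j] for j in ids])
    return [ids[k] for k in picks]

rng = np.random.default_rng(0)
nbrs = {5: 2, 7: 1, 9: 1}            # neighbour id -> shared-disease count (toy data)
print(softmax_probs(nbrs))
print(sample_neighbors(nbrs, 10, rng))
```

Because the weights are raw counts, a neighbor sharing two more diseases is about $e^2 \approx 7.4$ times as likely to be drawn; sampling with replacement guarantees a fixed sample size even when a node has fewer neighbors than requested.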
The invention designs a four-layer GraphSAGE graph neural network: the first layer is the input layer; the second and third layers are graph convolution layers; the fourth layer is the output layer, which contains a single neuron and directly outputs the predicted length of stay of the node. The mean squared error is used as the loss function to measure the difference between the predicted and true values. The second layer samples 10 neighbors and the third layer samples 5 neighbors. Each graph convolution layer uses the mean aggregation function with ReLU as the nonlinear activation function. The parameters of the weight matrices are optimized by gradient descent with backpropagation, giving a trained early prediction model of the length of stay.
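A forward pass through this architecture for one target node can be sketched in NumPy. The sketch assumes the common GraphSAGE mean-aggregator form $h' = \mathrm{ReLU}(W[h_{self} \,\|\, \mathrm{mean}(h_{neigh})])$, a linear single-neuron output layer, and that the first-hop fan-out is 10 and the second-hop fan-out is 5; the exact layer formula, embedding normalization and the training loop (gradient descent on the MSE loss, usually done with an autodiff framework) are not specified in the text and are omitted or assumed here.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def sage_mean_layer(h_self, h_neigh, W):
    """GraphSAGE layer with a mean aggregator: ReLU(W [h_self || mean(h_neigh)])."""
    agg = h_neigh.mean(axis=0)                     # mean over sampled neighbours
    return relu(W @ np.concatenate([h_self, agg]))

def predict_los(x_t, x_hop1, x_hop2, W1, W2, w_out, b_out):
    """Forward pass for one target node.

    x_t:    (d,)         features of the target admission record
    x_hop1: (s1, d)      features of its s1 sampled neighbours
    x_hop2: (s1, s2, d)  features of s2 sampled neighbours of each hop-1 node
    """
    # First graph-convolution layer: update the target and every hop-1 node.
    h_t = sage_mean_layer(x_t, x_hop1, W1)
    h_1 = np.stack([sage_mean_layer(x_hop1[k], x_hop2[k], W1)
                    for k in range(x_hop1.shape[0])])
    # Second graph-convolution layer: update the target from its updated neighbours.
    h_t = sage_mean_layer(h_t, h_1, W2)
    # Output layer: one linear neuron giving the predicted length of stay.
    return float(w_out @ h_t + b_out)

def mse(pred, true) -> float:
    """Mean squared error loss."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.mean((pred - true) ** 2))

rng = np.random.default_rng(0)
d, hdim, s1, s2 = 4, 8, 10, 5        # hypothetical sizes; s1/s2 assumed to be 10/5
x_t = rng.normal(size=d)
x_hop1 = rng.normal(size=(s1, d))
x_hop2 = rng.normal(size=(s1, s2, d))
W1 = rng.normal(size=(hdim, 2 * d))
W2 = rng.normal(size=(hdim, 2 * hdim))
w_out = rng.normal(size=hdim)
y_hat = predict_los(x_t, x_hop1, x_hop2, W1, W2, w_out, 0.0)
print(y_hat, mse([y_hat], [7.0]))
```

The first layer must also be evaluated on the hop-1 nodes, because the second layer aggregates their updated representations; this is why a two-layer model needs second-order sampled neighbors.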
S6, predicting the length of stay of the sample to be predicted by using the early prediction model of the length of stay to obtain the early prediction result of the patient's length of stay, comprising the following steps:
S601, inputting the information of the sample to be predicted;
S602, adding the sample to be predicted to the patient similarity network, and acquiring the second-order sampled neighbors of its node in the patient similarity network;
S603, extracting a feature vector containing the basic features, historical features and disease vector from the information of the sample to be predicted;
S604, outputting the length of stay of the sample to be predicted by using the early prediction model of the length of stay according to the second-order sampled neighbors and the feature vector, to obtain the early prediction result of the patient's length of stay.
In the embodiments disclosed in this application, for a new hospitalization record, the patient's individual information, hospital information, disease diagnosis information and historical hospitalization records are likewise obtained, and the corresponding feature vectors (basic features, historical features and disease vector) are extracted. The admission record is added to the patient similarity network, and the second-order sampled neighbors of its node are obtained. The trained GraphSAGE model then makes the prediction and outputs the predicted length of stay.
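Adding a new admission to the network and collecting its second-order sampled neighbors can be sketched as below. The sketch assumes the similarity network is kept as a dense weight matrix E (rows and columns indexed by admission record), that A holds the existing records' 0/1 disease vectors, and that the fan-outs are (10, 5); the helper names `add_record`, `sample` and `two_hop` are hypothetical.

```python
import numpy as np

def add_record(E, A, a_new):
    """Append a new admission (0/1 disease vector a_new) to the similarity matrix E."""
    w = A @ a_new                        # shared-disease counts with every existing record
    n = E.shape[0]
    E2 = np.zeros((n + 1, n + 1), dtype=E.dtype)
    E2[:n, :n] = E
    E2[n, :n] = w
    E2[:n, n] = w
    return E2

def sample(E, i, k, rng):
    """Softmax-weighted sampling of k neighbours of node i, with replacement."""
    nbrs = np.flatnonzero(E[i])
    if nbrs.size == 0:
        return np.array([], dtype=int)
    e = E[i, nbrs].astype(float)
    p = np.exp(e - e.max())
    p /= p.sum()
    return rng.choice(nbrs, size=k, replace=True, p=p)

def two_hop(E, i, fanouts, rng):
    """Second-order sampled neighbours: a hop-1 sample, then a sample per hop-1 node."""
    hop1 = sample(E, i, fanouts[0], rng)
    hop2 = [sample(E, j, fanouts[1], rng) for j in hop1]
    return hop1, hop2

A = np.array([[1, 1, 0], [0, 1, 1], [0, 0, 1]])    # toy existing records
E = A @ A.T
np.fill_diagonal(E, 0)
E2 = add_record(E, A, np.array([1, 1, 0]))          # new record becomes node 3
hop1, hop2 = two_hop(E2, 3, (10, 5), np.random.default_rng(0))
```

A new record that shares no disease with any existing record gets an empty neighbor set; the text does not say how this case is handled, so a fallback such as aggregating only the node's own features would be an additional assumption.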
It should be noted that the embodiments disclosed in this application may be applied to multiple scenarios, and the application scenarios are not specifically limited. For example, in the scenario of early prediction of a patient's length of stay, the length of stay can be predicted at the time of admission; the method is not restricted to particular age groups or diseases, and a prediction can be made even for a record with no recorded disease.
As shown in Fig. 5, another embodiment of the present application provides a device for early prediction of a patient's length of stay, comprising:
a preprocessing module, configured to preprocess the acquired medical record homepage data set and extract the patient's length-of-stay label;
a feature extraction module, configured to extract the patient's basic features and historical features;
a disease vector extraction module, configured to extract disease vectors from the preprocessed medical record homepage data set and comprising: a disease co-occurrence network unit, configured to construct a disease co-occurrence network based on the preprocessed medical record homepage data set; and a feature extraction unit, configured to generate low-dimensional vector embeddings of the diseases with the Node2Vec algorithm according to the disease co-occurrence network, and to combine all the embeddings of the same patient with an aggregation method to complete the extraction of the disease vector;
a patient similarity network module, configured to construct a patient similarity network and comprising: a patient-disease bipartite graph construction unit, configured to generate a patient-disease bipartite graph represented by an adjacency matrix A according to the patient's diseases during hospitalization, wherein rows represent patients and columns represent diseases; and a patient similarity network construction unit, configured to compress the patient-disease bipartite graph and count the diseases shared between patients so as to construct the patient similarity network;
a training module, configured to construct an early prediction model of the length of stay by using a GraphSAGE graph neural network based on the patient's length-of-stay label, basic features, historical features, disease vector and the patient similarity network, and comprising:
a custom sampling strategy unit, configured to acquire a fixed number of neighbor samples according to the custom sampling probability;
a prediction module, configured to predict the length of stay of the sample to be predicted by using the early prediction model of the length of stay to obtain the early prediction result of the patient's length of stay, and comprising: an input unit, configured to input the information of the sample to be predicted, add it to the patient similarity network and acquire the second-order sampled neighbors of its node in the patient similarity network; a feature vector extraction unit, configured to extract a feature vector containing the basic features, historical features and disease vector from the information of the sample to be predicted; and a prediction unit, configured to output the length of stay of the sample to be predicted by using the early prediction model of the length of stay according to the second-order sampled neighbors and the feature vector, to obtain the early prediction result of the patient's length of stay.
The device for predicting a patient's length of stay provided in the embodiment shown in Fig. 5 can implement the technical solution of the above method embodiment; its implementation principle and beneficial effects are similar and are not repeated here.
In another embodiment of the present application, an electronic device is provided, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to perform any one of the above methods for predicting a patient's length of stay.
As shown in Fig. 6, the electronic device may include a processor, a memory, a bus and a communication interface. The processor, the communication interface and the memory are connected through the bus; the memory stores a computer program executable on the processor, and the processor executes the computer program to perform the method for predicting a patient's length of stay provided by any of the foregoing embodiments of the application.
Another embodiment of the present application provides a computer-readable storage medium storing a computer program, which is executed by a processor to implement the method for predicting a patient's length of stay as described in any of the foregoing embodiments.
The computer-readable storage medium described above may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk, and may be any available medium that can be accessed by a general-purpose or special-purpose computer. A readable storage medium is coupled to the processor so that the processor can read information from, and write information to, the readable storage medium; the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC), or they may reside as discrete components in a device for predicting a patient's length of stay.
Embodiments of the present application may be provided as a method, an apparatus or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM and optical storage) containing computer-usable program code. The methods, apparatus (systems) and computer program products according to embodiments of the invention are described with reference to flowcharts and/or block diagrams. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These instructions may be stored in a computer-readable memory that can direct a general-purpose computer, special-purpose computer, embedded computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on it to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Claims (10)

1. A method for early prediction of a patient's length of stay based on a graph neural network, characterized by comprising the following steps:
S1, preprocessing the acquired medical record homepage data set and extracting the patient's length-of-stay label;
S2, extracting the patient's basic features and historical features respectively;
S3, extracting disease vectors from the preprocessed medical record homepage data set;
S4, constructing a patient similarity network;
S5, constructing an early prediction model of the length of stay by using a GraphSAGE graph neural network based on the patient's length-of-stay label, basic features, historical features, disease vector and the patient similarity network;
S6, predicting the length of stay of the sample to be predicted by using the early prediction model of the length of stay to obtain the early prediction result of the patient's length of stay.
2. The method for early prediction of a patient's length of stay based on a graph neural network according to claim 1, wherein the patient's length-of-stay label is obtained as the difference between the patient's discharge date and admission date;
the basic features comprise the patient's individual information and hospital information; the historical features comprise statistics of the patient's historical lengths of stay.
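The label and the historical features can be illustrated with Python's standard library. Which statistics make up the "statistical information" of historical stays is not specified, so the count, mean, maximum and total below are hypothetical choices; the sketch also assumes that only stays completed before the current admission are used, so the features are available at admission time.

```python
from datetime import date
from statistics import mean

def los_days(admission: date, discharge: date) -> int:
    """Length-of-stay label: discharge date minus admission date, in days."""
    return (discharge - admission).days

def history_features(prior_los: list) -> dict:
    """Hypothetical summary statistics of a patient's earlier stays.

    prior_los must contain only stays that ended before the current
    admission, so the features can be computed at admission time.
    """
    if not prior_los:
        return {"n_prior": 0, "mean_los": 0.0, "max_los": 0, "total_los": 0}
    return {
        "n_prior": len(prior_los),
        "mean_los": float(mean(prior_los)),
        "max_los": max(prior_los),
        "total_los": sum(prior_los),
    }

print(los_days(date(2021, 5, 1), date(2021, 5, 8)))   # 7
print(history_features([3, 5, 10]))
```

With this definition a same-day discharge yields a label of 0; some institutions count such a stay as one day, which would require adjusting the label.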
3. The method for early prediction of a patient's length of stay based on a graph neural network according to claim 1, wherein step S3 comprises the following steps:
S301, constructing a disease co-occurrence network based on the preprocessed medical record homepage data set;
S302, generating low-dimensional vector embeddings of the diseases with the Node2Vec algorithm according to the disease co-occurrence network, and combining all the embeddings of the same patient with an aggregation method to complete the extraction of the disease vector:
$\mathrm{AGGREGATE}_{sum} = \sum_{i \in D} d_i$
wherein $\mathrm{AGGREGATE}_{sum}$ denotes the sum aggregation function, $d_i$ denotes the low-dimensional vector representation of disease i, and D denotes the patient's disease set.
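Given the Node2Vec embeddings, the sum aggregation is a plain vector sum over the patient's disease set. The sketch assumes the embeddings are already available as a dict from disease code to NumPy vector; the codes and 2-dimensional vectors are toy values, and skipping codes that have no embedding (for example, diseases absent from the co-occurrence network) is an assumption.

```python
import numpy as np

def disease_vector(codes, embeddings, dim):
    """AGGREGATE_sum: sum of the embeddings d_i over the patient's disease set D."""
    v = np.zeros(dim)
    for code in set(codes):          # D is a set, so a repeated code is counted once
        if code in embeddings:       # assumption: codes without an embedding are skipped
            v += embeddings[code]
    return v

emb = {"I10": np.array([1.0, 0.0]), "E11": np.array([0.0, 2.0])}   # toy embeddings
print(disease_vector(["I10", "E11"], emb, 2))   # [1. 2.]
```

An empty disease set yields the zero vector, which keeps records without any recorded diagnosis usable as model input.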
4. The method for early prediction of a patient's length of stay based on a graph neural network according to claim 3, wherein step S301 comprises the following steps:
S3011, aggregating the preprocessed medical record homepage data set by patient identity number, and combining the diagnoses from each patient's different hospitalization records;
S3012, constructing a patient-disease two-dimensional table from the result;
S3013, measuring the risk between each pair of diseases with the relative risk (RR) value according to the patient-disease two-dimensional table, and using it as the weight $RR_{ij}$ of the network edge;
S3014, calculating the confidence interval of $RR_{ij}$ and, over the $RR_{ij}$ values of all disease pairs and their confidence intervals, retaining the network edges with $RR_{ij} > 1$ and a confidence interval lying entirely above 1;
S3015, generating the disease co-occurrence network from the retained network edges.
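The claim does not give the RR or confidence-interval formulas. The sketch below assumes the definition often used for comorbidity networks, $RR_{ij} = C_{ij} N / (P_i P_j)$, where $C_{ij}$ is the number of patients with both diseases, $P_i$ and $P_j$ the numbers of patients with each disease and $N$ the number of patients, together with the approximate interval $RR_{ij} \exp(\pm z\sigma_{ij})$, $\sigma_{ij} = 1/C_{ij} + 1/(P_i P_j) - 1/N - 1/N^2$ and $z = 2.56$ (about 99%). Both formulas are assumptions, not the patent's stated method.

```python
import math
from collections import Counter
from itertools import combinations

def relative_risk(c_ij, p_i, p_j, n, z=2.56):
    """RR_ij = C_ij * N / (P_i * P_j) with an approximate confidence interval.

    Assumed formulas (not given in the patent): sigma as below and the
    interval RR * exp(+/- z * sigma); z = 2.56 gives roughly 99% coverage.
    """
    rr = c_ij * n / (p_i * p_j)
    sigma = 1 / c_ij + 1 / (p_i * p_j) - 1 / n - 1 / n ** 2
    return rr, rr * math.exp(-z * sigma), rr * math.exp(z * sigma)

def co_occurrence_counts(patient_diseases):
    """P_i (patients with disease i), C_ij (patients with both) and N.

    patient_diseases maps a patient identity number to all diagnoses from
    that patient's admissions, i.e. the table built in S3011-S3012.
    """
    prevalence, pairs = Counter(), Counter()
    for codes in patient_diseases.values():
        codes = sorted(set(codes))
        prevalence.update(codes)
        pairs.update(combinations(codes, 2))
    return prevalence, pairs, len(patient_diseases)

def comorbidity_edges(patient_diseases):
    """Keep edges with RR_ij > 1 whose whole interval lies above 1 (S3014)."""
    prevalence, pairs, n = co_occurrence_counts(patient_diseases)
    edges = {}
    for (i, j), c in pairs.items():          # c >= 1 for every stored pair
        rr, lo, _ = relative_risk(c, prevalence[i], prevalence[j], n)
        if rr > 1 and lo > 1:
            edges[(i, j)] = rr
    return edges
```

On very small samples the interval is wide, so even pairs with $RR_{ij} > 1$ are usually discarded; the filter only retains pairs whose co-occurrence is clearly above chance.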
5. The method for early prediction of a patient's length of stay based on a graph neural network according to claim 1, wherein step S4 comprises the following steps:
S401, generating a patient-disease bipartite graph represented by an adjacency matrix A according to the patient's diseases during hospitalization, wherein rows represent patients and columns represent diseases;
S402, compressing the patient-disease bipartite graph and counting the diseases shared between patients so as to construct a patient similarity network, wherein the weight $e_{ij}$ of the edge between patient i and patient j in the patient similarity network is expressed as follows:
$e_{ij} = A_{i,:} \cdot A_{j,:}$
wherein $A_{i,:}$ denotes the i-th row of the adjacency matrix A, i.e., the disease characterization vector of the i-th patient; $A_{j,:}$ denotes the j-th row of A, i.e., the disease characterization vector of the j-th patient; and $\cdot$ denotes the dot product of vectors.
6. The method for early prediction of a patient's length of stay based on a graph neural network according to claim 1, wherein the sampling strategy $p_{ij}$ of the graph neural network in step S5 is expressed as follows:
$p_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N(i)} \exp(e_{ik})}$
wherein $p_{ij}$ denotes the sampling probability of neighbor node j of node i, $e_{ij}$ denotes the weight of the edge between node i and neighbor node j, N(i) denotes the neighbor set of node i, and $e_{ik}$ denotes the weight of the edge between node i and neighbor node k.
7. The method for early prediction of a patient's length of stay based on a graph neural network according to claim 1, wherein step S6 comprises the following steps:
S601, inputting the information of the sample to be predicted;
S602, adding the sample to be predicted to the patient similarity network, and acquiring the second-order sampled neighbors of its node in the patient similarity network;
S603, extracting a feature vector containing the basic features, historical features and disease vector from the information of the sample to be predicted;
S604, outputting the length of stay of the sample to be predicted by using the early prediction model of the length of stay according to the second-order sampled neighbors and the feature vector, to obtain the early prediction result of the patient's length of stay.
8. A device for early prediction of a patient's length of stay based on a graph neural network, characterized by comprising:
a preprocessing module, configured to preprocess the acquired medical record homepage data set and extract the patient's length-of-stay label;
a feature extraction module, configured to extract the patient's basic features and historical features;
a disease vector extraction module, configured to extract disease vectors from the preprocessed medical record homepage data set, comprising:
a disease co-occurrence network unit, configured to construct a disease co-occurrence network based on the preprocessed medical record homepage data set;
a feature extraction unit, configured to generate low-dimensional vector embeddings of the diseases with the Node2Vec algorithm according to the disease co-occurrence network, and to combine all the embeddings of the same patient with an aggregation method to complete the extraction of the disease vector;
a patient similarity network module, configured to construct a patient similarity network, comprising:
a patient-disease bipartite graph construction unit, configured to generate a patient-disease bipartite graph represented by an adjacency matrix A according to the patient's diseases during hospitalization, wherein rows represent patients and columns represent diseases;
a patient similarity network construction unit, configured to compress the patient-disease bipartite graph and count the diseases shared between patients so as to construct the patient similarity network;
a training module, configured to construct an early prediction model of the length of stay by using a GraphSAGE graph neural network based on the patient's length-of-stay label, basic features, historical features, disease vector and the patient similarity network, comprising:
a custom sampling strategy unit, configured to acquire a fixed number of neighbor samples according to the custom sampling probability;
a prediction module, configured to predict the length of stay of the sample to be predicted by using the early prediction model of the length of stay to obtain the early prediction result of the patient's length of stay, comprising:
an input unit, configured to input the information of the sample to be predicted, add it to the patient similarity network and acquire the second-order sampled neighbors of its node in the patient similarity network;
a feature vector extraction unit, configured to extract a feature vector containing the basic features, historical features and disease vector from the information of the sample to be predicted;
a prediction unit, configured to output the length of stay of the sample to be predicted by using the early prediction model of the length of stay according to the second-order sampled neighbors and the feature vector, to obtain the early prediction result of the patient's length of stay.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the method for predicting a patient's length of stay as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program executable by a processor to implement the method for predicting a patient's length of stay as claimed in any one of claims 1 to 7.
CN202110601268.0A 2021-05-31 2021-05-31 Early prediction method and device for patient hospitalization duration based on graph neural network Active CN113345564B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110601268.0A CN113345564B (en) 2021-05-31 2021-05-31 Early prediction method and device for patient hospitalization duration based on graph neural network

Publications (2)

Publication Number Publication Date
CN113345564A true CN113345564A (en) 2021-09-03
CN113345564B CN113345564B (en) 2022-08-05

Family

ID=77472853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110601268.0A Active CN113345564B (en) 2021-05-31 2021-05-31 Early prediction method and device for patient hospitalization duration based on graph neural network

Country Status (1)

Country Link
CN (1) CN113345564B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898879A (en) * 2022-05-10 2022-08-12 University of Electronic Science and Technology of China Chronic disease risk prediction method based on graph representation learning
WO2023027107A1 (en) * 2021-08-25 2023-03-02 FUJIFILM Corporation Prediction device for predicting information about patient, operation method for prediction device, and program

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070150307A1 (en) * 2005-12-22 2007-06-28 Cerner Innovation, Inc. Displaying clinical predicted length of stay of patients for workload balancing in a healthcare environment
US20080015891A1 (en) * 2006-07-12 2008-01-17 Medai, Inc. Method and System to Assess an Acute and Chronic Disease Impact Index
CN106778014A (en) * 2016-12-29 2017-05-31 Zhejiang University Risk prediction method based on a recurrent neural network
CN107180283A (en) * 2017-07-05 2017-09-19 Shandong University Readmission behavior prediction system and method based on optimal feature combination
CN109523067A (en) * 2018-10-30 2019-03-26 Ping An Medical and Healthcare Management Co., Ltd. Cost prediction method, apparatus, server and storage medium based on a prediction model
CN109642258A (en) * 2018-10-17 2019-04-16 Shanghai Yunying Medical Technology Co., Ltd. Method and system for tumor prognosis prediction
CN109949936A (en) * 2019-03-13 2019-06-28 Chengdu Shulian Yikang Technology Co., Ltd. Readmission risk prediction method based on a deep-learning hybrid model
CN110504021A (en) * 2018-05-17 2019-11-26 Alibaba Group Holding Limited Inpatient bed scheduling method, device, system and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
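Thanos Gentimis et al.: "Predicting Hospital Length of Stay Using Neural Networks on MIMIC III Data", IEEE International Symposium on Dependable, Autonomic and Secure Computing *
Chen Chaofeng et al.: "Predicting length of hospital stay after total knee arthroplasty based on machine learning algorithms", Chinese Journal of Tissue Engineering Research *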

Also Published As

Publication number Publication date
CN113345564B (en) 2022-08-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant