CN113345564A

CN113345564A - Early prediction method and device for patient hospitalization duration based on graph neural network

Info

Publication number: CN113345564A
Application number: CN202110601268.0A
Authority: CN
Inventors: 邱航; 胡智栩; 王利亚; 周德嘉; 丁舒涵
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2021-09-03
Anticipated expiration: 2041-05-31
Also published as: CN113345564B

Abstract

The invention provides a method and a device for early prediction of a patient's length of hospital stay based on a graph neural network, belonging to the technical field of data processing. The method comprises the following steps: acquiring a medical record home page data set and preprocessing it; extracting basic features and historical features of the patient; extracting disease vectors from the preprocessed medical record home page data set; constructing a patient similarity network; constructing an early prediction model of length of stay with a GraphSAGE graph neural network, based on the patient's length-of-stay label, basic features, historical features, disease vectors and the patient similarity network; and predicting the length of stay of a sample to be predicted with the early prediction model to obtain the early prediction result for the patient. The method can predict a patient's length of stay at the point of admission and therefore has high application value.

Description

Early prediction method and device for patient hospitalization duration based on graph neural network

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to a method and a device for early prediction of hospitalization duration of a patient based on a graph neural network.

Background

Accurate prediction of a patient's length of hospital stay helps hospital managers allocate limited medical resources effectively, control the patient's medical costs and improve the quality of medical services. However, current length-of-stay prediction is modeled on data such as clinical examinations and demographics, and ignores the influence of the patient's comorbidities on the length of stay. In addition, few studies have predicted the length of stay early, at the time of admission. Accurately predicting a patient's length of stay at the early stage of admission is therefore of considerable practical significance.

Disclosure of Invention

In view of the above disadvantages of the prior art, the present invention provides a method, an apparatus, an electronic device and a storage medium for early prediction of a patient's length of hospital stay based on a graph neural network, so as to better predict the patient's length of stay.

In order to achieve the above purpose, the invention adopts the technical scheme that:

According to an embodiment of the application, a graph neural network-based method for early prediction of a patient's length of hospital stay is provided, comprising the following steps:

S1, preprocessing the acquired medical record home page data set and extracting the patient's length-of-stay label;

S2, extracting basic features and historical features of the patient;

S3, extracting disease vectors from the preprocessed medical record home page data set;

S4, constructing a patient similarity network;

S5, constructing an early prediction model of length of stay with a GraphSAGE graph neural network, based on the patient's length-of-stay label, basic features, historical features, disease vectors and the patient similarity network;

S6, predicting the length of stay of a sample to be predicted with the early prediction model to obtain the early prediction result for the patient.

Further, the patient's length-of-stay label is obtained as the difference between the patient's admission date and discharge date;

the basic features comprise the patient's individual information and hospital information; the historical features comprise statistics of the patient's historical lengths of stay.

Further, the step S3 includes the following steps:

S301, constructing a disease co-occurrence network based on the preprocessed medical record home page data set;

S302, according to the disease co-occurrence network, generating a low-dimensional embedded vector representation of each disease with the Node2Vec algorithm, and integrating all low-dimensional embedded vectors of the same patient with an aggregation method to complete the extraction of the disease vector:

$$\mathrm{AGGREGATE}_{sum} = \sum_{i \in D} d_i$$

wherein $\mathrm{AGGREGATE}_{sum}$ denotes the sum aggregation function, $d_i$ denotes the low-dimensional vector representation of disease $i$, and $D$ denotes the patient's disease set.

Further, the step S301 includes the following steps:

S3011, aggregating the preprocessed medical record home page data set by patient identity number, and classifying the diagnoses of each patient's different hospitalization records;

S3012, constructing a patient-disease two-dimensional table according to the classification result;

S3013, according to the patient-disease two-dimensional table, measuring the risk between each disease pair by the relative risk (RR) value and using it as the weight $RR_{ij}$ of the network edge;

S3014, calculating $RR_{ij}$ and its corresponding confidence interval for all disease pairs, and retaining the network edges for which $RR_{ij} > 1$ and the confidence interval lies above 1;

S3015, generating the disease co-occurrence network from the retained network edges.

Further, the step S4 includes the following steps:

S401, generating a patient-disease bipartite graph, represented by an adjacency matrix $A$, from the diseases each patient has at admission, wherein rows represent patients and columns represent diseases;

S402, compressing the patient-disease bipartite graph and counting the disease co-occurrences between patients so as to construct the patient similarity network, wherein the weight $e_{ij}$ of the edge between patient $i$ and patient $j$ in the patient similarity network is:

$$e_{ij} = A_{i,:} \cdot A_{j,:}$$

wherein $A_{i,:}$ denotes the $i$-th row of the adjacency matrix $A$, i.e. the disease characterization vector of the $i$-th patient, $A_{j,:}$ denotes the $j$-th row of $A$, i.e. the disease characterization vector of the $j$-th patient, and the operator $\cdot$ is the dot product of vectors.

Further, the sampling strategy $p_{ij}$ of the GraphSAGE graph neural network in the step S5 is expressed as follows:

$$p_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N(i)} \exp(e_{ik})}$$

wherein $p_{ij}$ denotes the sampling probability of neighbor node $j$ of node $i$, $e_{ij}$ denotes the weight of the edge between node $i$ and neighbor node $j$, $N(i)$ denotes the neighbor set of node $i$, and $e_{ik}$ denotes the weight of the edge between node $i$ and neighbor node $k$.

Further, the step S6 includes the following steps:

S601, inputting the information of the sample to be predicted;

S602, adding the sample to be predicted to the patient similarity network and acquiring the second-order sampled neighbors of its node in the patient similarity network;

S603, extracting a feature vector containing the basic features, historical features and disease vector from the information of the sample to be predicted;

S604, outputting the length of stay of the sample to be predicted with the early prediction model, according to the second-order sampled neighbors and the feature vector, to obtain the early prediction result for the patient.

According to another aspect of the embodiments of the present application, there is provided a device for early prediction of a patient's length of hospital stay based on a graph neural network, including:

a preprocessing module, used for acquiring the medical record home page data set and preprocessing it;

a feature extraction module, used for extracting basic features and historical features of the patient;

a disease vector extraction module, used for extracting disease vectors from the preprocessed medical record home page data set, comprising:

a disease co-occurrence network unit, used for constructing a disease co-occurrence network based on the preprocessed medical record home page data set;

a feature extraction unit, used for generating a low-dimensional embedded vector representation of each disease with the Node2Vec algorithm according to the disease co-occurrence network, and integrating all low-dimensional embedded vectors of the same patient with an aggregation method to complete the extraction of the disease vector;

a patient similarity network module, used for constructing a patient similarity network, comprising:

a patient-disease bipartite graph construction unit, used for generating a patient-disease bipartite graph, represented by an adjacency matrix A, from the diseases each patient has at admission, wherein rows represent patients and columns represent diseases;

a patient similarity network construction unit, used for compressing the patient-disease bipartite graph and counting the disease co-occurrences between patients so as to construct the patient similarity network;

a training module, used for constructing an early prediction model of length of stay with a GraphSAGE graph neural network, based on the patient's length-of-stay label, basic features, historical features, disease vectors and the patient similarity network, comprising:

a user-defined sampling strategy unit, used for acquiring a fixed number of neighbor samples according to the user-defined sampling probability;

a prediction module, used for predicting the length of stay of the sample to be predicted with the early prediction model to obtain the early prediction result for the patient, comprising:

an input unit, used for inputting the information of the sample to be predicted, adding the sample to the patient similarity network and acquiring the second-order sampled neighbors of its node in the patient similarity network;

a feature vector extraction unit, used for extracting a feature vector containing the basic features, historical features and disease vector from the information of the sample to be predicted;

a prediction unit, used for outputting the length of stay of the sample to be predicted with the early prediction model, according to the second-order sampled neighbors and the feature vector, to obtain the early prediction result for the patient.

According to another aspect of the embodiments of the present application, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor executes the program to implement any of the above methods for predicting a patient's length of hospital stay.

According to another aspect of the embodiments of the present application, there is provided a computer-readable storage medium storing a computer program, which is executed by a processor to implement any of the above methods for predicting a patient's length of hospital stay.

The technical solution provided by one aspect of the embodiments of the application can have the following beneficial effects:

(1) Because very little information is available at the early stage of admission, few studies have addressed early prediction of length of stay at the point of admission. By extracting the basic features and historical features of the patient and building a prediction model based on a graph neural network, the invention makes such early prediction possible, which has high application value.

(2) The invention constructs several feature sets, such as individual basic features and historical features, and mines the available information comprehensively, which can enhance the predictive ability of the model.

(3) The invention constructs a disease co-occurrence network and uses the Node2Vec algorithm to mine the internal information of the network, so as to obtain an embedded vector representation of each disease.

(4) The invention constructs a patient similarity network to measure the similarity between patients, and uses the GraphSAGE graph neural network model to model the length-of-stay prediction task.

(5) The invention models the task with GraphSAGE. Previous studies used a set of features with traditional machine learning models; in contrast to such earlier length-of-stay (LOS) prediction, the invention abstracts the length-of-stay problem as a graph learning problem. Expressing the problem as a graph is one of the innovations of the invention.

Drawings

Fig. 1 shows a flowchart of a length-of-stay prediction method according to an embodiment of the present application.

Fig. 2 shows a schematic diagram of generating a disease vector from a hospitalization record sample according to an embodiment of the present application.

Fig. 3 shows a schematic diagram of the transition from a patient-disease bipartite graph (left) to a patient similarity network (right) according to an embodiment of the present application.

Fig. 4 shows a flowchart of the prediction phase according to an embodiment of the present application.

Fig. 5 shows the structure of a length-of-stay prediction apparatus according to an embodiment of the present application.

Fig. 6 shows a block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of embodiments of the invention is provided to help those skilled in the art understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined in the appended claims, and everything created by using the inventive concept is protected.

From the above background it can be seen that existing methods for predicting a patient's length of hospital stay have the following main problems: (1) In studies that predict the length of stay early, at the point of admission, the prediction is poor because very little data is available and because these studies extract only a few common features. (2) Disease diagnosis information is important for predicting the length of stay, and existing studies mainly characterize diseases with one-hot encoding or the Charlson comorbidity index. These methods can require thousands of feature dimensions to represent the diseases, and they do not take into account the complex intrinsic relationships between diseases. (3) Existing studies treat each hospitalization record independently; they neither mine the relationships between records nor use this additional information to improve model performance. (4) Most studies predict the length of stay with traditional machine learning models, such as random forests or support vector machines, and the prediction performance is poor. In view of this, the embodiments of the present application provide a method and an apparatus for early prediction of a patient's length of stay, together with an electronic device and a storage medium based on the prediction method. They are applied to the scenario of early length-of-stay prediction, in which the length of stay can be predicted at the point of admission, without restriction on the patient's age group or disease.

In the solution of the embodiments of the application, a large number of basic features and historical features are first extracted from the patient's hospitalization records to represent the patient's health condition. A disease co-occurrence network is then constructed from the co-occurrence information of diseases, and an embedded vector representation of each disease is obtained with the Node2Vec algorithm, which reduces the dimension needed to represent diseases while preserving the complex intrinsic relationships among them. Finally, a patient similarity network is constructed to measure the similarity between patients, and a GraphSAGE (Graph SAmple and aggreGatE) graph neural network model convolves the structural information of the patient similarity network with the patients' feature vectors to obtain the final early length-of-stay prediction model.

As shown in fig. 1, in one embodiment of the present application, a method for early prediction of length of a patient's stay in a hospital is provided, comprising the steps of:

S1, preprocessing the acquired medical record home page data set and extracting the patient's length-of-stay label.

In the embodiment disclosed in the present application, the medical record home page data are the record entries generated after a patient completes a hospital stay. Each record contains the patient's individual information (encrypted identity number, sex, age, admission time, discharge time and the like), information on the hospital (hospital grade, hospital address and the like), and the diagnoses of the patient's hospitalization (a main diagnosis and up to 15 secondary diagnoses). On this basis the data must be preprocessed: variables with a missing rate greater than 30% are removed, and the missing values of the remaining variables are filled with the mean of the corresponding variable, yielding data without missing values that form the modeling database. Since the task is a regression task, i.e. the length of stay is treated as a numerical variable, the patient's admission date is subtracted from the discharge date to obtain the patient's length-of-stay label.
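The preprocessing described above can be sketched roughly as follows. This is a minimal illustration with pandas, not the patent's implementation; the column names (`admission_date`, `discharge_date`, `age`, `rare_lab`) and the function name `preprocess` are hypothetical, and the label is assumed to be measured in whole days.

```python
import pandas as pd

def preprocess(records: pd.DataFrame, max_missing: float = 0.30) -> pd.DataFrame:
    """Derive the length-of-stay label, drop sparse variables, mean-fill the rest."""
    df = records.copy()
    # Length-of-stay label: discharge date minus admission date (assumed unit: days).
    df["los_days"] = (
        pd.to_datetime(df["discharge_date"]) - pd.to_datetime(df["admission_date"])
    ).dt.days
    # Remove variables whose missing rate exceeds the threshold (30% in the text).
    missing_rate = df.isna().mean()
    df = df.loc[:, missing_rate <= max_missing].copy()
    # Fill the remaining gaps in numeric variables with the column mean.
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())
    return df

# Hypothetical toy records.
records = pd.DataFrame({
    "admission_date": ["2021-01-01", "2021-01-03", "2021-01-10", "2021-02-01"],
    "discharge_date": ["2021-01-08", "2021-01-05", "2021-01-20", "2021-02-04"],
    "age": [60.0, None, 40.0, 50.0],      # 25% missing: kept and mean-filled
    "rare_lab": [None, None, 1.0, None],  # 75% missing: removed
})
clean = preprocess(records)
```

Only numeric variables are mean-filled here; how categorical variables with missing values are handled is not stated in the text and is left out of the sketch.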

S2, extracting basic features and historical features of the patient.

In the embodiment disclosed in the present application, the patient's individual information, such as sex and age, is an important reference in the length-of-stay prediction task. In addition, the patient's historical hospitalization records are useful for predicting the length of stay (patients whose previous stays were frequently long are also more likely to have a long stay). The application therefore extracts basic features and historical features to enrich the feature set. The detailed extraction rules are shown in Table 1.

TABLE 1

The basic features mainly quantify the patient's individual information and hospital information, converting them into a data format (numerical data) that a machine learning model can process. The historical features are statistics of the patient's historical lengths of stay, such as their mean and maximum. These features have all proven useful in previous studies. The third column of Table 1 gives the data type of each feature. Numerical features are treated as continuous features taking real values. Discrete features are converted by one-hot encoding into binary 0-1 variables. Some discrete fields are ordered, however. The "admission condition" field, for example, takes the values critical, urgent and general. Although it is discrete, its values have an order, so the feature is encoded as the numerical values 1, 2 and 3, which both reduces the feature dimension and preserves the order information. Historical features are extracted from the patient's previous hospitalizations. Some patients have no history to refer to, so these features are null; such missing values are uniformly filled with -999.
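A rough sketch of these encoding rules is given below. Because Table 1 is not reproduced in this text, every column name is hypothetical, and the mapping of critical, urgent and general to 1, 2 and 3 simply follows the order in which the values are listed, which is an assumption.

```python
import pandas as pd

# Assumed ordinal mapping (listing order of the values in the text).
ADMISSION_ORDER = {"critical": 1, "urgent": 2, "general": 3}

def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Encode basic and historical features into a purely numerical table."""
    out = pd.DataFrame(index=df.index)
    # Numerical feature: used as a continuous value.
    out["age"] = df["age"].astype(float)
    # Ordered discrete feature: a single ordinal column instead of one-hot columns.
    out["admission_condition"] = df["admission_condition"].map(ADMISSION_ORDER)
    # Unordered discrete feature: one-hot encoded into 0-1 variables.
    out = out.join(pd.get_dummies(df["sex"], prefix="sex", dtype=int))
    # Historical features: records without any history are filled with -999.
    for col in ("hist_los_mean", "hist_los_max"):
        out[col] = df[col].fillna(-999)
    return out

# Hypothetical toy records; the first patient has no previous hospitalization.
raw = pd.DataFrame({
    "sex": ["F", "M", "F"],
    "age": [30, 65, 50],
    "admission_condition": ["general", "critical", "urgent"],
    "hist_los_mean": [None, 6.5, 3.0],
    "hist_los_max": [None, 9.0, 3.0],
})
features = encode_features(raw)
```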

S3, extracting disease vectors from the preprocessed medical record home page data set, comprising the following steps:

S301, constructing a disease co-occurrence network based on the preprocessed medical record home page data set.

A disease co-occurrence network is constructed from the medical record home page data to measure the intrinsic relationships among diseases. The nodes of the network are diseases and the edges are the relationships between diseases. The network is constructed as follows:

S3011, aggregating the preprocessed medical record home page data set by patient identity number, and classifying the diagnoses of each patient's different hospitalization records;

S3012, constructing a patient-disease two-dimensional table according to the classification result, in which rows represent patients, columns represent diseases, and each entry is 1 or 0: 1 indicates that the patient has the disease and 0 that the patient does not. Because the data volume may be very large, the table is extremely sparse and would consume a great deal of memory if stored densely, so it can be stored as a sparse matrix;

S3013, according to the patient-disease two-dimensional table, measuring the risk between each disease pair by the relative risk (RR) value and using it as the weight $RR_{ij}$ of the network edge:

$$RR_{ij} = \frac{C_{ij} \cdot N}{C_i \cdot C_j}$$

wherein $RR_{ij}$ denotes the relative risk of disease $i$ and disease $j$, $C_{ij}$ denotes the number of co-occurrences of disease $i$ and disease $j$, $N$ denotes the total number of patients, i.e. the number of rows of the two-dimensional table, and $C_i$ and $C_j$ denote the numbers of occurrences of disease $i$ and disease $j$, all of which are counted from the patient-disease two-dimensional table;

S3014, calculating $RR_{ij}$ and its corresponding 99% confidence interval for all disease pairs, and retaining the network edges for which $RR_{ij} > 1$ and the confidence interval lies above 1;

S3015, generating the disease co-occurrence network from the retained network edges; that is, once the edges are known, the nodes of the network follow from the edges, which yields the whole disease co-occurrence network.
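The edge construction can be illustrated with a small NumPy sketch. It assumes the commonly used relative-risk definition $RR_{ij} = C_{ij} N / (C_i C_j)$, which matches the symbols defined in the text, and it applies only the $RR_{ij} > 1$ filter: the text does not say how the 99% confidence interval is computed, so that second filter is deliberately left out. A dense array is used for brevity; at realistic scale a sparse matrix would take its place. The function name `relative_risk_edges` is hypothetical.

```python
import numpy as np

def relative_risk_edges(table: np.ndarray) -> dict:
    """Return {(i, j): RR_ij} for disease pairs with RR_ij > 1.

    `table` is the patient-disease table: rows are patients, columns are
    diseases, entries are 0 or 1.
    """
    n_patients, n_diseases = table.shape
    cooc = table.T @ table    # cooc[i, j] = C_ij; the diagonal holds C_i
    counts = np.diag(cooc)
    edges = {}
    for i in range(n_diseases):
        for j in range(i + 1, n_diseases):
            if cooc[i, j] == 0:
                continue      # diseases that never co-occur get no edge
            rr = cooc[i, j] * n_patients / (counts[i] * counts[j])
            if rr > 1:        # confidence-interval filter omitted (see lead-in)
                edges[(i, j)] = float(rr)
    return edges

# Toy table: 6 patients, 3 diseases.
table = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 0],
])
edges = relative_risk_edges(table)
```

In the toy table, diseases 0 and 1 co-occur twice among 6 patients with 3 and 2 occurrences respectively, so their relative risk is 2·6/(3·2) = 2 and the edge is kept, while diseases 0 and 2 have a relative risk below 1 and are dropped.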

After the disease co-occurrence network is obtained, the Node2Vec algorithm is used to embed the disease nodes as low-dimensional vectors, whose dimension can be between 100 and 300. Since each patient may have several diseases diagnosed at the point of admission, and the number of diseases differs from patient to patient, an aggregation strategy is needed to integrate the embedded vectors of the same patient into one vector of standard length. The present application proposes a summation aggregation strategy:

$$\mathrm{AGGREGATE}_{sum} = \sum_{i \in D} d_i$$

wherein $D$ denotes the patient's set of admission diagnoses and $d_i$ denotes the low-dimensional vector representation of disease $i$. The aggregation strategy adds the vectors corresponding to the patient's admission diagnoses so as to characterize the patient's disease condition. After aggregation, the disease information of each patient is quantified as a vector of uniform length, i.e. the disease vector. A schematic diagram of generating a disease vector from a sample is shown in Fig. 2.
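As a small illustration of the summation aggregation, the sketch below adds up the embeddings of one patient's diagnoses. The embedding table, its three-dimensional vectors and the diagnosis codes are made up for the example (real Node2Vec embeddings would have 100 to 300 dimensions), and returning a zero vector for a patient without diagnoses is an assumption, not something stated in the text.

```python
import numpy as np

def disease_vector(diagnoses, embeddings, dim):
    """Sum the embedding vectors d_i over the patient's diagnosis set D."""
    vec = np.zeros(dim)
    for code in diagnoses:
        vec += embeddings[code]
    return vec

# Hypothetical embeddings keyed by hypothetical diagnosis codes.
embeddings = {
    "I10": np.array([1.0, 0.0, 2.0]),
    "E11": np.array([0.5, 1.0, 0.0]),
    "J18": np.array([0.0, 3.0, 1.0]),
}

two_diseases = disease_vector(["I10", "E11"], embeddings, dim=3)
no_disease = disease_vector([], embeddings, dim=3)
```

Whatever the number of diagnoses, the result has the same length as a single embedding, which is what allows it to be concatenated with the basic and historical features.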

S4, constructing a patient similarity network:

S401, generating a patient-disease bipartite graph, represented by an adjacency matrix $A$, from the diseases each patient has at admission, wherein rows represent patients and columns represent diseases;

S402, compressing the patient-disease bipartite graph and counting the disease co-occurrences between patients, thereby constructing the patient similarity network.

In the embodiments disclosed in this application, the patient's basic features, historical features and disease vector are taken together as the patient's feature vector. In addition, a patient similarity network is constructed from the co-occurrence of diseases among patients, bringing additional information into the length-of-stay prediction. Since each patient may have several diseases at admission, a patient-disease bipartite graph can be generated, as shown in the left panel of Fig. 3. Note that different hospitalizations of the same person are treated here as different individuals, since the subject of the length-of-stay prediction task is a hospitalization record, not a person. The bipartite graph is represented by an adjacency matrix $A$, whose rows represent patients (admission records) and whose columns represent diseases. The bipartite graph is then compressed to abstract the patient-to-patient similarity, resulting in the form shown in the right panel of Fig. 3. If two patients share a disease, there is an edge between them whose weight is the number of shared diseases; the larger the weight, the greater the disease similarity between the two patients. This establishes the patient similarity network. In the patient similarity network, a node is an admission record (patient), and an edge represents the similarity between two admission records, which can also be called the similarity between two patients. The weight of the edge between patient $i$ and patient $j$ is calculated as shown below, where the operator $\cdot$ is the dot product of vectors:

$$e_{ij} = A_{i,:} \cdot A_{j,:}$$
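Since every edge weight is a dot product of two rows of the adjacency matrix, all weights can be obtained at once as the matrix product of A with its transpose. The NumPy sketch below shows this on a made-up 4-record, 4-disease matrix; zeroing the diagonal (so that a record is not its own neighbor) is an assumption of the sketch.

```python
import numpy as np

# Hypothetical patient-disease adjacency matrix A:
# rows are admission records, columns are diseases.
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
])

# E[i, j] = A[i, :] . A[j, :] = number of diseases shared by records i and j.
E = A @ A.T
np.fill_diagonal(E, 0)  # assumed: no self-loops in the similarity network
```

Records 0 and 3 share two diseases and get an edge of weight 2, while record 2 shares no disease with any other record and stays isolated.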

S5, constructing an early prediction model of length of stay with a GraphSAGE graph neural network, based on the patient's length-of-stay label, basic features, historical features, disease vectors and the patient similarity network.

In the embodiment disclosed in the application, the patient's feature vector is taken as the feature vector of the corresponding node in the patient similarity network. The GraphSAGE graph neural network considers the structural information of the patient similarity network and the feature vectors at the same time and models the length of stay, yielding the early prediction model of the patient's length of stay. The GraphSAGE graph neural network is trained in mini-batches: each time, a subset of nodes and their neighbors is sampled and used to train the network, so that the network can be trained on large-scale graph data. When a prediction is needed for new data, it can be made from the neighbor information of the node alone, without retraining on the whole graph as other graph neural network models require. In the standard neighbor sampling strategy of GraphSAGE, a certain number of neighbors are sampled at random, typically from a uniform distribution. The invention designs a custom sampling strategy in which the larger the weight of an edge, the higher the sampling probability, as shown in the following formula:

$$p_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N(i)} \exp(e_{ik})}$$

The above equation uses the softmax function to obtain the sampling probability of each neighbor node $j$ of node $i$, where $N(i)$ denotes the neighbor set of node $i$. Based on these sampling probabilities, the neighbors of the node are sampled with replacement to obtain a fixed number of neighbor samples.
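The weighted sampling can be sketched as follows, assuming the softmax is taken directly over the raw edge weights of a node's neighbors. The function names are hypothetical; subtracting the maximum weight before exponentiating is a standard numerical-stability step that leaves the probabilities unchanged.

```python
import numpy as np

def sampling_probabilities(weights):
    """Softmax over the edge weights e_ik of one node's neighbors."""
    w = np.asarray(weights, dtype=float)
    z = np.exp(w - w.max())  # shift by the maximum for numerical stability
    return z / z.sum()

def sample_neighbors(neighbors, weights, num_samples, rng):
    """Draw a fixed number of neighbors with replacement, favoring heavy edges."""
    p = sampling_probabilities(weights)
    return rng.choice(neighbors, size=num_samples, replace=True, p=p)

rng = np.random.default_rng(0)
neighbors = [4, 7, 9]  # hypothetical neighbor node ids of node i
weights = [1, 2, 3]    # shared-disease counts e_ij for those neighbors
probs = sampling_probabilities(weights)
sampled = sample_neighbors(neighbors, weights, num_samples=10, rng=rng)
```

Because sampling is with replacement, a node with fewer neighbors than the requested sample size still yields a sample of the fixed size.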

The invention designs a four-layer GraphSAGE graph neural network. The first layer is the input layer, the second and third layers are graph convolution layers, and the fourth layer is the output layer, which contains only one neuron and directly outputs the predicted length of stay of a node. The mean squared error is used as the loss function to evaluate the difference between the predicted value and the true value. The number of neighbor samples is 10 for the second layer and 5 for the third layer. Each graph convolution layer uses the mean aggregation function and ReLU as the nonlinear activation function. To optimize the model parameters, gradient descent with back propagation is used to optimize the parameters of the weight matrices, yielding the trained early prediction model of length of stay.
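The forward pass of such a network can be sketched in plain NumPy as below. This is an illustration under several assumptions rather than the patent's implementation: the self and neighbor representations are combined through two separate weight matrices (equivalent to concatenating them before a single matrix), the sample sizes 10 and 5 are assigned to the first and second hop respectively, the feature and hidden dimensions are arbitrary, and training by gradient descent is not shown.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sage_mean_layer(h_self, h_neigh, w_self, w_neigh):
    """Mean-aggregator graph convolution followed by ReLU.

    h_self: (n, d_in) node features; h_neigh: (n, s, d_in) sampled neighbor features.
    """
    return relu(h_self @ w_self + h_neigh.mean(axis=1) @ w_neigh)

def predict_los(x_target, x_hop1, x_hop2, params):
    """x_target: (d,); x_hop1: (s1, d); x_hop2: (s1, s2, d)."""
    # First graph convolution: update the target node and each first-hop neighbor.
    h_hop1 = sage_mean_layer(x_hop1, x_hop2, params["w1_self"], params["w1_neigh"])
    h_target = sage_mean_layer(x_target[None, :], x_hop1[None, :, :],
                               params["w1_self"], params["w1_neigh"])
    # Second graph convolution: aggregate the updated first-hop neighbors.
    h_target = sage_mean_layer(h_target, h_hop1[None, :, :],
                               params["w2_self"], params["w2_neigh"])
    # Output layer: a single neuron giving the predicted length of stay.
    return float((h_target @ params["w_out"] + params["b_out"])[0])

def mse(pred, true):
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.mean((pred - true) ** 2))

rng = np.random.default_rng(0)
d, h, s1, s2 = 8, 16, 10, 5  # assumed feature dim, hidden dim, sample sizes
params = {
    "w1_self": rng.normal(scale=0.1, size=(d, h)),
    "w1_neigh": rng.normal(scale=0.1, size=(d, h)),
    "w2_self": rng.normal(scale=0.1, size=(h, h)),
    "w2_neigh": rng.normal(scale=0.1, size=(h, h)),
    "w_out": rng.normal(scale=0.1, size=h),
    "b_out": 0.0,
}
x_target = rng.normal(size=d)
x_hop1 = rng.normal(size=(s1, d))
x_hop2 = rng.normal(size=(s1, s2, d))
prediction = predict_los(x_target, x_hop1, x_hop2, params)
```

In practice a graph learning library would supply the layers, the sampler and the optimizer; the sketch only makes the data flow through the two convolution layers and the single output neuron explicit.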

S6, predicting the length of stay of the sample to be predicted with the early prediction model to obtain the early prediction result for the patient, comprising the following steps:

S601, inputting the information of the sample to be predicted.

In the embodiment of the present disclosure, for a new hospitalization record, the individual information, the hospital information, the patient's disease diagnosis information and the patient's historical hospitalization records can likewise be obtained, and the corresponding feature vector (basic features, historical features and disease vector) can be extracted. The admission record is added to the patient similarity network, and the second-order sampled neighbors of its node are obtained. The trained GraphSAGE model then makes a prediction for the node and outputs the predicted length of stay.
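Attaching a new admission record to the network and collecting its second-order sampled neighbors might look like the sketch below. It assumes that the new record's edge weights are its shared-disease counts with the existing records, that neighbors are drawn with replacement using softmax-weighted probabilities, and that 10 first-hop and 5 second-hop neighbors are sampled; the helper names and the toy matrix are hypothetical.

```python
import numpy as np

def softmax(w):
    z = np.exp(w - w.max())
    return z / z.sum()

def sample(weights_row, size, rng):
    """Sample `size` neighbor indices with replacement, favoring heavier edges."""
    neighbors = np.flatnonzero(weights_row)
    if neighbors.size == 0:
        return np.array([], dtype=int)  # isolated node: nothing to sample
    p = softmax(weights_row[neighbors].astype(float))
    return rng.choice(neighbors, size=size, replace=True, p=p)

def second_order_neighbors(A, new_row, sizes=(10, 5), seed=0):
    """Return sampled first-hop neighbors of a new record and their sampled neighbors."""
    rng = np.random.default_rng(seed)
    E = A @ A.T                # similarity network among the existing records
    np.fill_diagonal(E, 0)
    new_edges = A @ new_row    # shared-disease counts with each existing record
    hop1 = sample(new_edges, sizes[0], rng)
    hop2 = [sample(E[j], sizes[1], rng) for j in hop1]
    return hop1, hop2

# Existing records (rows) and their diseases (columns); a new record with disease 0.
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
])
new_row = np.array([1, 0, 0, 0])
hop1, hop2 = second_order_neighbors(A, new_row)
```

The features of the sampled nodes, together with the new record's own feature vector, are what the trained model consumes; no retraining is involved.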

It should be noted that the embodiments disclosed in the present application may be applied to multiple scenarios, and the application scenarios are not specifically limited here. For example, the application can be applied to early prediction of a patient's length of stay: the length of stay can be predicted at the point of admission, without restriction on or distinction by age group or disease, and a prediction can be made even for a patient with no diagnosed disease.

As shown in fig. 5, another embodiment of the present application provides an apparatus for early prediction of length of stay of a patient, comprising:

the pretreatment module is used for pretreating the acquired medical record homepage data set and extracting the hospitalization duration label of the patient;

the system comprises a disease vector extraction module and a feature extraction unit, wherein the disease vector extraction module is used for extracting disease vectors according to a preprocessed medical record home page data set and comprises a disease co-occurrence network unit which is used for constructing a disease co-occurrence network based on the preprocessed medical record home page data set, the feature extraction unit is used for generating low-dimensional vector embedded representation of diseases by using a Node2Vec algorithm according to the disease co-occurrence network and integrating all the low-dimensional vector embedded representations of the same patient by using a polymerization method to finish the extraction of the disease vectors.

a patient similarity network module for constructing a patient similarity network, comprising: a patient-disease bipartite graph construction unit for generating, according to the diseases of each patient at the time of hospitalization, a patient-disease bipartite graph represented by an adjacency matrix A, in which rows represent patients and columns represent diseases; and a patient similarity network construction unit for compressing the patient-disease bipartite graph and counting the diseases shared between patients, so as to construct the patient similarity network;

a prediction module for predicting the hospitalization duration of the sample to be predicted by using the hospitalization duration early prediction model, to obtain the early prediction result of the patient's hospitalization duration, comprising: an input unit for receiving the information of the sample to be predicted, adding the sample to the patient similarity network, and acquiring the second-order sampled neighbors of its node in the patient similarity network; a feature vector extraction unit for extracting, from the information of the sample to be predicted, a feature vector containing the basic features, the historical features, and the disease vector; and a prediction unit for outputting the hospitalization duration of the sample to be predicted by using the hospitalization duration early prediction model, according to the second-order sampled neighbors and the feature vector, to obtain the early prediction result of the patient's hospitalization duration.
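As an illustration of the disease vector extraction module described above, the following Python sketch shows one possible aggregation step, assuming that the Node2Vec embeddings have already been computed and are available as a plain dictionary. The function name `aggregate_sum`, the two-dimensional toy embeddings, and the choice to return a zero vector for a patient with no recorded disease are assumptions made for illustration only. Summation is used as the aggregation method.

```python
def aggregate_sum(disease_ids, embeddings):
    """Sum the embedding vectors of all diseases recorded for one patient.

    `embeddings` maps a disease identifier to its low-dimensional vector.
    A patient with no recorded disease receives the zero vector (an assumption).
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for disease in disease_ids:
        for k, value in enumerate(embeddings[disease]):
            total[k] += value
    return total


# Hypothetical 2-dimensional embeddings for three diseases.
embeddings = {
    "d1": [1.0, 0.5],
    "d2": [0.25, -0.5],
    "d3": [2.0, 2.0],
}

print(aggregate_sum(["d1", "d2"], embeddings))  # -> [1.25, 0.0]
print(aggregate_sum([], embeddings))            # -> [0.0, 0.0]
```

Because the sum does not depend on the order of the diseases, the same patient always receives the same disease vector regardless of how the diagnoses are listed.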

The device for predicting length of stay of a patient provided in the embodiment shown in fig. 5 can implement the technical solution shown in the above method embodiment, and the implementation principle and the beneficial effects are similar, and are not described herein again.
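As an illustration of the patient similarity network module described above, the following Python sketch compresses a patient-disease bipartite graph into patient-patient edges by taking the dot product of the rows of the adjacency matrix A. The function name `patient_similarity`, the toy matrix, and the rule of keeping only edges with a positive weight are assumptions for illustration; the sketch is not a definitive implementation.

```python
def patient_similarity(A):
    """Return {(i, j): weight} for patient pairs sharing at least one disease.

    A is a binary patient-by-disease adjacency matrix (rows: patients,
    columns: diseases). The weight of edge (i, j) is the dot product of
    rows i and j, i.e. the number of diseases the two patients share.
    """
    edges = {}
    for i in range(len(A)):
        for j in range(i + 1, len(A)):
            weight = sum(a * b for a, b in zip(A[i], A[j]))
            if weight > 0:  # assumed rule: no shared disease, no edge
                edges[(i, j)] = weight
    return edges


# Hypothetical toy matrix: 3 patients, 3 diseases.
A = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]

print(patient_similarity(A))  # -> {(0, 1): 2, (1, 2): 1}
```

The pairwise loop is quadratic in the number of patients; for a large data set a sparse matrix product would be the more practical choice, which is left out here to keep the sketch dependency-free.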

In another embodiment of the present application, an electronic device is provided, which includes a memory, a processor and a computer program stored in the memory and executed on the processor, wherein the processor executes the program to perform any one of the above methods for predicting length of patient stay.

As shown in fig. 6, the electronic device may include a processor, a memory, a bus, and a communication interface, wherein the processor, the communication interface, and the memory are connected through the bus, the memory stores a computer program capable of running on the processor, and the processor executes the computer program to perform the method for predicting a patient's length of hospital stay provided by any one of the previous embodiments of the application.

Another embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for predicting a patient's length of hospital stay described in any one of the foregoing embodiments.

The computer-readable storage medium described above may be implemented in any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, and magnetic or optical disks, and may be any available medium that can be accessed by a general-purpose or special-purpose computer. A readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. The readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC), or may reside as discrete components in a device for predicting a patient's length of hospital stay.

Embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein. The methods, apparatus (systems), and computer program products according to embodiments of the invention have been described with reference to flowchart illustrations and/or block diagrams. It is to be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These instructions may be stored in a computer-readable memory that can direct a general-purpose computer, special-purpose computer, embedded computer, or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Claims

1. A method for early prediction of a patient's length of hospital stay based on a graph neural network, characterized by comprising the following steps:

S1, preprocessing the acquired medical record home page data set and extracting the patient's length-of-stay label;

S2, respectively extracting basic features and historical features of the patient;

S4, constructing a patient similarity network;

2. The method for early prediction of patient stay duration based on graph neural network as claimed in claim 1, wherein the patient's length-of-stay label is obtained specifically by: computing the length-of-stay label from the difference between the patient's date of admission and date of discharge;

3. The method for early predicting the length of stay of a patient in a hospital based on a neural network of claim 1, wherein the step S3 comprises the following steps:

$\mathrm{AGGREGATE}_{sum} = \sum_{i \in D} d_i$

4. The method for early predicting the length of stay of a patient in hospital based on a neural network of claim 3, wherein said step S301 comprises the steps of:

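S3014, calculating the confidence interval of $RR_{ij}$ at a given confidence level, and, based on the $RR_{ij}$ of every disease pair and its corresponding confidence interval, retaining as network edges the disease pairs with $RR_{ij} > 1$ and a confidence interval greater than 1;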

5. The method for early predicting the length of stay of a patient in a hospital based on a neural network of claim 1, wherein the step S4 comprises the following steps:

$e_{ij} = A_{i,:} * A_{j,:}$

wherein $A_{i,:}$ represents the i-th row of the adjacency matrix A, i.e., the disease characterization vector of the i-th patient; $A_{j,:}$ represents the j-th row of the adjacency matrix A, i.e., the disease characterization vector of the j-th patient; and $*$ is the operator denoting the dot product of two vectors.

6. The method for early predicting the hospitalization duration of patient based on graph neural network as claimed in claim 1, wherein the sampling strategy $p_{ij}$ of the graph neural network in step S5 is expressed as follows:

7. The method for early predicting the length of stay of a patient in a hospital based on a neural network of claim 1, wherein the step S6 comprises the following steps:

S601, inputting sample information to be predicted;

8. A device for early prediction of patient hospitalization duration based on a graph neural network is characterized by comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executed on the processor, the processor executing the program to implement the method of predicting length of a patient's stay in a hospital as claimed in any one of claims 1 to 7.

10. A computer-readable storage medium, in which a computer program is stored which is executable by a processor for implementing a method for predicting the length of a patient's stay in a hospital as claimed in any one of claims 1 to 7.