CN112530594B - Hemodialysis complication long-term risk prediction system based on convolution survival network

Info

Publication number: CN112530594B
Application number: CN202110179779.8A
Authority: CN
Inventors: 李劲松; 王丰; 朱世强; 田雨; 周天舒
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2021-02-08
Filing date: 2021-02-08
Publication date: 2021-05-11
Anticipated expiration: 2041-02-08
Also published as: CN112530594A; WO2022166158A1

Abstract

The invention discloses a long-term risk prediction system for hemodialysis complications based on a convolutional survival network. The system comprises a data acquisition module, a data preprocessing module, a learning prediction module and a result display module, the data acquisition module being used to acquire the data set. The invention uses a convolutional neural network to process multidimensional hemodialysis time-series features; it combines the convolutional neural network with the Cox proportional risk assumption to form a convolutional survival network; and, on top of the convolutional survival network, it uses the Breslow method to estimate the reference risk function and calculate how the patient's risk changes over the long term. The invention can make full use of the truncated (censored) data that are common in medical research; because the main framework is a convolutional neural network, visual analysis is straightforward and the results are interpretable and heuristic; and the long-term risk change of the patient can be predicted.

Description

Hemodialysis complication long-term risk prediction system based on convolution survival network

Technical Field

The invention belongs to the technical field of medical treatment and machine learning, and particularly relates to a hemodialysis complication long-term risk prediction system based on a convolutional survival network.

Background

The worldwide incidence of end-stage renal disease is rising, causing a huge disease burden. Most patients rely on hemodialysis to sustain life. Complications that can occur during long-term hemodialysis, such as vascular access infection, hypertension and coronary heart disease, seriously affect patient survival and place a heavy burden on patients, their families and society. Long-term risk prediction and early prevention and treatment of hemodialysis complications are therefore of great importance for improving the quality of life of patients with end-stage renal disease. A large amount of time-series data accumulates during long-term hemodialysis, which brings both challenges and opportunities for related research. In recent years, with the rapid development of information technology, many time-series analysis methods based on machine learning have been developed. Deep learning methods, including recurrent neural networks and convolutional neural networks, are widely applicable and perform strongly, and both kinds of network have been widely applied to time-series analysis in the medical field.

A recurrent neural network takes sequence data as input, recurses along the direction in which the sequence evolves, and connects all of its nodes in a chain. A convolutional neural network is a feedforward neural network with a deep structure that includes convolution operations; it has representation learning capability and can classify input information in a translation-invariant manner according to its hierarchical structure, so it is also called a translation-invariant artificial neural network. However, the features identified by a recurrent neural network are complex and abstract, which makes them difficult to understand visually and to turn into heuristic results. A conventional convolutional neural network can only predict the risk value at a single cut-off time point; it cannot make long-term, continuous predictions of the risk that a complication occurs, and it is difficult for it to provide accurate and effective decision support for clinicians. In addition, conventional recurrent and convolutional neural networks can only analyze simple data structures and cannot process the truncated (censored) data that are common in clinical analysis.

Disclosure of Invention

Deep learning is a popular research direction in the field of artificial intelligence and has achieved many results in search, machine translation, natural language processing, multimedia learning, recommendation and personalization technologies, and other related fields; at present, however, deep learning is rarely applied to the prognosis analysis of hemodialysis complications.

The invention aims to provide a hemodialysis complication long-term risk prediction system based on a convolutional survival network that addresses the shortcomings of traditional deep learning methods. The invention uses a convolutional neural network algorithm from the deep learning field and obtains interpretable and instructive results based on its representation learning capability and on visualization of the convolution kernels. The invention also improves the structure of the convolutional neural network so that truncated (censored) data can be fully used to predict the relative risk of a patient, and long-term, continuous risk prediction can then be carried out.

The purpose of the invention is realized by the following technical scheme: a system for long-term risk prediction of hemodialysis complications based on a convolutional survival network, the system comprising: the data acquisition module is used for acquiring blood pressure information of the hemodialysis patient; the data preprocessing module is used for carrying out missing value processing and normalization processing on the original data and normalizing the original data into a two-dimensional matrix; a learning prediction module for deep learning modeling; the result display module is used for visually outputting and presenting long-term risk change conditions;

the processing of the data preprocessing module specifically comprises the following steps: the continuous hemodialysis blood pressure data are arranged into a two-dimensional matrix in the order of the hemodialysis treatments, each row of data corresponding to one hemodialysis treatment, and the two-dimensional matrix is normalized using Min-Max so that the blood pressure waveform is preserved;

the processing procedure of the learning prediction module comprises two parts:

(1) predicting the relative risk of complications based on the convolutional survival network: a convolutional survival network is trained using a convolutional neural network architecture combined with a Cox proportional risk loss function; the convolutional survival network is formed by stacking a plurality of convolutional layers and a fully connected layer; its input is the two-dimensional matrix output by the data preprocessing module; the convolutional layers abstract the features layer by layer and finally abstract them into a plurality of mode features, and the mode features are output as one node through the fully connected layer to represent the relative risk of an event; the network parameters are optimized through the Cox proportional risk loss function;

the risk function $h(t \mid x)$ represents the risk that an individual experiences an event at a given time, and is given by:

$$h(t \mid x) = h_0(t)\,\exp(\theta^{\mathsf T} x) \tag{1}$$

where $t$ denotes time; $x$ is the $p$-dimensional covariate vector used to characterize the patient, $p$ being the number of neurons of the last fully connected layer of the convolutional survival network; $\theta$ is the regression parameter; $h_0(t)$ is the reference risk function; and $\exp(\theta^{\mathsf T} x)$ is the relative risk, i.e., the (exponentiated) output of the convolutional survival network;

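the Cox proportional risk loss function is formulated as follows:

$$L(\theta) = -\sum_{i=1}^{N} \delta_i \left[ \theta^{\mathsf T} x_i - \log \sum_{j:\, T_j \ge T_i} \exp(\theta^{\mathsf T} x_j) \right] \tag{2}$$

where $N$ is the number of patients; $\delta_i$ is the indicator of the outcome event E, with $\delta_i = 1$ if an event occurs for individual patient $i$ and $\delta_i = 0$ if no event occurs; and $T_i$ and $T_j$ denote the survival times of individual patients $i$ and $j$, respectively;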

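(2) calculating the long-term risk change based on the Breslow method and the relative risk, specifically: the Breslow method gives the reference cumulative risk function $H_0(t)$ at time $t$, and the reference risk function $h_0(t)$ is obtained from $H_0(t)$; the estimate of $H_0(t)$ is as follows:

$$\hat{H}_0(t) = \sum_{i:\, T_i \le t} \frac{\delta_i}{\sum_{j \in R(T_i)} \exp(\theta^{\mathsf T} x_j)} \tag{3}$$

where $\delta_i$ and $T_i$ are the event indicator and survival time of patient $i$, and $R(T_i)$ is the set of samples at risk at time $T_i$;

according to the reference risk function $h_0(t)$, combined with the relative risk $\exp(\theta^{\mathsf T} x)$ calculated by the convolutional survival network, the risk function $h(t \mid x)$ describing the long-term risk is calculated according to equation (1).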

Further, the data preprocessing module first removes data with a systolic pressure below 60 mmHg or above 250 mmHg, then selects the systolic pressure data of 36 consecutive hemodialysis sessions and arranges them into a two-dimensional matrix in the order of the hemodialysis treatments; each row of data corresponds to one hemodialysis treatment with a duration of 5 hours, adjacent columns are 10 minutes apart, and the matrix has 36 rows with 30 points in each row.

Furthermore, in the learning prediction module, the convolutional survival network receives the two-dimensional matrix output by the data preprocessing module, first identifies one-dimensional blood pressure patterns, then completes the receptive field into a square, abstracts the features layer by layer, and then connects a global mean pooling layer to obtain a plurality of mode features; the mode features are output as one node through a fully connected layer to represent the relative risk of an event; the activation function of the last layer is Linear, and the activation functions of the other layers are ReLU.

Furthermore, in the learning prediction module, the input of the convolutional survival network is the 1-channel two-dimensional matrix of size 36 × 30 output by the data preprocessing module; one-dimensional blood pressure patterns are identified by 16 convolution kernels of size 1 × 5; the receptive field is then completed into a square by 32 convolution kernels of size 5 × 1 in the longitudinal direction; 2 layers of 16 convolution kernels of size 5 × 5 and 1 layer of 9 convolution kernels of size 3 × 3 follow to abstract the features layer by layer; a global mean pooling layer then yields 9 mode feature values, and the 9 mode feature values are output as one node through a fully connected layer to represent the relative risk of an event.

Further, in the learning prediction module, the Cox proportional risk loss function is calculated as follows:

(a) the patients' feature data X and survival outcome events E are sorted together in descending order of survival time T to form a matrix M;

(b) $\theta^{\mathsf T} x$ is the output of the convolutional survival network and corresponds to y_pred; since M is sorted in descending order of T, $\sum_{j:\, T_j \ge T_i} \exp(\theta^{\mathsf T} x_j)$ is the cumulative sum, over the first $i$ rows of M, of the exponentiated outputs of the convolutional survival network, i.e., the $i$-th entry of cumsum(exp(y_pred)), where cumsum denotes the cumulative sum function;

(c) $\delta_i$, i.e., the value of the survival outcome event E of the $i$-th individual in matrix M, is recorded as E;

(d) let the summation function be sum, whereby the loss function is expressed as:

$$\mathrm{loss} = -\,\mathrm{sum}\Big(\big(y_{\mathrm{pred}} - \log(\mathrm{cumsum}(\exp(y_{\mathrm{pred}})))\big) \cdot E\Big).$$

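further, in the learning prediction module, so that the loss functions of the training set and the test set are of the same magnitude, the summation function sum is replaced by the averaging function mean, and the loss function is expressed as:

$$\mathrm{loss} = -\,\mathrm{mean}\Big(\big(y_{\mathrm{pred}} - \log(\mathrm{cumsum}(\exp(y_{\mathrm{pred}})))\big) \cdot E\Big),$$

where y_pred is the output of the convolutional survival network, cumsum is the cumulative sum over the rows sorted in descending order of survival time T, and E is the survival outcome event.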

further, in the learning prediction module, when the network parameters are optimized through the Cox proportional risk loss function, the training set is randomly divided, with stratification, into 10 batches such that the proportion of survival outcome events E is equal in every batch; the survival data of each batch are sorted in descending order of survival time T and used to calculate the loss function, so that the network parameters are updated 10 times in one traversal of the data set.

The invention has the following beneficial effects: it uses a convolutional neural network to process multidimensional hemodialysis time-series features; it combines the convolutional neural network with the Cox proportional risk assumption to form a convolutional survival network; and, on top of the convolutional survival network, it uses the Breslow method to estimate the reference risk function and calculate how the patient's risk changes over the long term. The invention can make full use of the truncated (censored) data that are common in medical research; because the main framework is a convolutional neural network, visual analysis is straightforward and the results are interpretable and heuristic; and the long-term risk change of the patient can be predicted.

Drawings

FIG. 1 is a block diagram of the long-term risk prediction system for hemodialysis complications based on a convolutional survival network according to the present invention;

FIG. 2 is a flow chart of a long-term risk prediction for hemodialysis complications based on a convolutional survival network;

fig. 3 is a schematic diagram of a convolution survival network model.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.

Terms used in the invention are defined as follows. Convolutional survival network: a convolutional neural network applied to survival analysis, which can process time-series and image data and perform survival analysis and risk prediction. Long-term risk prediction: unlike risk prediction at a single time point, long-term risk prediction predicts the continuous change of risk over a longer period of time. Truncated data (also called censored data): data for which no outcome event has occurred by the specified end time are called truncated data, and the time from the start point to the end point is called the truncation time.

As shown in fig. 1, the present invention provides a system for predicting long-term risk of hemodialysis complications based on a convolutional survival network, which includes: the data acquisition module is used for acquiring blood pressure information of the hemodialysis patient; the data preprocessing module is used for carrying out missing value processing and normalization processing on the original data and normalizing the original data into a two-dimensional matrix; a learning prediction module for deep learning modeling; and the result display module is used for visually outputting and presenting long-term risk change conditions.

The data preprocessing module preprocesses hemodialysis blood pressure data, and specifically comprises the following steps:

The blood pressure data recorded during hemodialysis are correlated with adverse clinical events such as cardiovascular and cerebrovascular complications. The invention first removes data with a systolic pressure below 60 mmHg or above 250 mmHg, then selects the systolic pressure data of 36 consecutive hemodialysis sessions (about 3 months) and arranges them into a two-dimensional matrix in the order of the hemodialysis treatments. Each row of data corresponds to one hemodialysis treatment with a duration of 5 hours (sessions shorter than 5 hours are padded with 0), adjacent columns are 10 minutes apart (obtained by linear interpolation), and the matrix has 36 rows with 30 points in each row. The two-dimensional matrix is normalized using Min-Max so that the blood pressure waveform is preserved. The two-dimensional matrix serves as the input to the convolutional survival network and is shown as the leftmost input matrix in FIG. 3.
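The preprocessing described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `resample_session` and `min_max_normalize`, the input format (per-session lists of minutes since session start and systolic readings), holding the first reading before its timestamp, and taking the Min-Max bounds over the whole matrix including the zero padding are all hypothetical choices that the text does not specify.

```python
def resample_session(minutes, sbp, step=10, n_points=30, lo=60, hi=250):
    """Resample one session's systolic readings onto a 10-minute grid."""
    # Remove readings outside the 60-250 mmHg range given in the text.
    pts = sorted((m, v) for m, v in zip(minutes, sbp) if lo <= v <= hi)
    row = []
    for k in range(n_points):
        t = k * step
        if not pts or t > pts[-1][0]:
            row.append(0.0)               # past the last reading: pad with 0
        elif t <= pts[0][0]:
            row.append(float(pts[0][1]))  # assumption: hold the first reading
        else:
            # Linear interpolation between the two neighbouring readings.
            for (t0, v0), (t1, v1) in zip(pts, pts[1:]):
                if t0 <= t <= t1:
                    row.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
                    break
    return row


def min_max_normalize(matrix):
    """Scale all entries to [0, 1] using one global minimum and maximum."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


# Toy data: 36 identical sessions, each with one out-of-range reading (300).
sessions = [([0, 20, 40], [120, 140, 300])] * 36
matrix = min_max_normalize([resample_session(m, v) for m, v in sessions])
```

A single global scale, rather than per-row scaling, is one plausible reading of "the blood pressure waveform is preserved", since it keeps the relative amplitude of different sessions comparable.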

The process of learning the prediction module comprises two parts: predicting the relative risk of the complications based on the convolutional survival network; calculating the long-term risk change condition based on the Breslow method and the relative risk, as shown in FIG. 2, the specific steps are as follows:

(1) convolutional survival network-based relative risk prediction of complications

The invention trains a convolutional survival network using a convolutional neural network architecture combined with a Cox proportional risk loss function. The convolutional survival network is formed by stacking a plurality of convolutional layers and a fully connected layer. Its input is a 1-channel two-dimensional matrix of size 36 × 30. One-dimensional blood pressure patterns are identified by 16 convolution kernels of size 1 × 5; the receptive field is then completed into a square by 32 convolution kernels of size 5 × 1 in the longitudinal direction; 2 layers of 16 convolution kernels of size 5 × 5 and 1 layer of 9 convolution kernels of size 3 × 3 follow to abstract the features layer by layer; a global mean pooling layer then yields 9 mode feature values. The 9 mode feature values are fully connected to one output node that represents the relative risk of an event (such as a cardiovascular complication); that is, the relative risk $\exp(\theta^{\mathsf T} x)$ is represented through the fully connected output of the 9 mode feature values, where $x$ denotes the 9 mode feature values and $\theta$ the parameters of the fully connected layer. The activation function of the last layer is Linear, and the activation functions of the other layers are ReLU. The network parameters are optimized through the Cox proportional risk loss function, and the network output is the relative risk. The structure of the convolutional survival network model is shown in FIG. 3.
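The layer sizes listed above can be checked with a short shape and parameter trace. The text does not state padding or stride, so the sketch below assumes unpadded ("valid") convolutions with stride 1 and one bias term per kernel; the names `LAYERS` and `trace_network` are hypothetical, and the resulting numbers hold only under those assumptions.

```python
# (name, number of kernels, kernel height, kernel width) as listed in the text.
LAYERS = [
    ("conv1", 16, 1, 5),  # one-dimensional blood pressure patterns
    ("conv2", 32, 5, 1),  # completes the receptive field to 5 x 5
    ("conv3", 16, 5, 5),
    ("conv4", 16, 5, 5),
    ("conv5", 9, 3, 3),
]


def trace_network(height=36, width=30, channels=1):
    """Return per-layer output shapes and the total trainable parameter count."""
    shapes, params = [], 0
    for name, n_kernels, kh, kw in LAYERS:
        # Assumption: 'valid' convolution with stride 1 shrinks each side
        # by (kernel size - 1).
        height, width = height - kh + 1, width - kw + 1
        params += n_kernels * (kh * kw * channels + 1)  # weights + bias
        channels = n_kernels
        shapes.append((name, channels, height, width))
    # Global mean pooling leaves one value per channel (9 mode features);
    # the fully connected layer maps them to a single linear output node.
    params += channels + 1
    return shapes, params


shapes, params = trace_network()
for name, c, h, w in shapes:
    print(f"{name}: {c} x {h} x {w}")
print("trainable parameters:", params)
```

With stride 1, a 1 × 5 kernel followed by a 5 × 1 kernel makes each unit depend on a 5 × 5 patch of the input matrix (5 time points in each of 5 consecutive sessions), which is what "completing the receptive field into a square" refers to.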

The key to the convolutional survival network is that it can process truncated (censored) data and optimize the network parameters by using the Cox proportional risk loss function. The risk function $h(t \mid x)$ represents the risk that an individual experiences an event at a given time; as shown in equation (1), it represents the risk that patient $x$ experiences an event at time $t$:

$$h(t \mid x) = h_0(t)\,\exp(\theta^{\mathsf T} x) \tag{1}$$

where $t$ denotes time; $x$ is the $p$-dimensional covariate vector used to characterize the patient, $p$ being the number of neurons of the last fully connected layer of the convolutional survival network (in this embodiment these are the 9 mode feature values obtained after global mean pooling, i.e., $p = 9$); $\theta$ is the regression parameter; $h_0(t)$ is the reference risk function; and $\exp(\theta^{\mathsf T} x)$ is the relative risk (i.e., the exponentiated output of the convolutional survival network).

The Cox proportional risk loss function is:

$$L(\theta) = -\sum_{i=1}^{N} \delta_i \left[ \theta^{\mathsf T} x_i - \log \sum_{j:\, T_j \ge T_i} \exp(\theta^{\mathsf T} x_j) \right] \tag{2}$$

where $N$ is the number of ESRD patients; $\delta_i$ is the indicator of the outcome event E, with $\delta_i = 1$ if an event occurs for individual patient $i$ and $\delta_i = 0$ if no event occurs; and $T_i$ and $T_j$ denote the survival times of individual patients $i$ and $j$, respectively. The smaller the value of equation (2), the better the parameter $\theta$ (corresponding to the parameters of the last fully connected layer) fits the relative risk of the patients.

A loss function is usually calculated from y_pred and y_true, where y_pred is the predicted value output by the model and y_true is the corresponding true value. For example, the mean of the squared differences between y_pred and y_true is the mean squared error loss function. In equation (2), however, only $\theta^{\mathsf T} x_i$ and $\theta^{\mathsf T} x_j$, which represent the relative risk predicted by the model, can be represented by y_pred, while the patient's true risk of developing a complication is unknown. To calculate the loss function of equation (2), the invention therefore uses the following procedure:

(1.1) the feature data X (the feature data of all patients) and the survival outcome events E (the survival outcomes of all patients) are sorted together in descending order of survival time T (the survival times of all patients) to form a matrix M;

(1.2) $\theta^{\mathsf T} x$ is the output of the convolutional survival network, which represents the relative risk and corresponds to y_pred. Since M is sorted in descending order of T, $\sum_{j:\, T_j \ge T_i} \exp(\theta^{\mathsf T} x_j)$ is the cumulative sum, over the first $i$ rows of M, of the exponentiated outputs of the convolutional survival network, i.e., the $i$-th entry of cumsum(exp(y_pred)), where cumsum denotes the cumulative sum function;

(1.3) $\delta_i$, i.e., the value of the survival outcome event E of the $i$-th individual in matrix M, can be recorded as E;

(1.4) let the summation function be sum, whereby the loss function can be expressed as:

$$\mathrm{loss} = -\,\mathrm{sum}\Big(\big(y_{\mathrm{pred}} - \log(\mathrm{cumsum}(\exp(y_{\mathrm{pred}})))\big) \cdot E\Big);$$

(1.5) because the sample sizes of the training set and the test set are usually different, the summation in step (1.4) is changed to averaging so that the loss functions of the training set and the test set are of the same magnitude; with the averaging function denoted mean, the loss function can be expressed as:

$$\mathrm{loss} = -\,\mathrm{mean}\Big(\big(y_{\mathrm{pred}} - \log(\mathrm{cumsum}(\exp(y_{\mathrm{pred}})))\big) \cdot E\Big).$$

It can be observed that, unlike ordinary loss functions computed from y_pred and y_true, the Cox proportional risk loss function is computed from the model output y_pred and the survival outcome event E after the feature data X and the survival outcome event E have been sorted in descending order of survival time T. Because the cumulative sum function cumsum must be evaluated, the loss function can only be calculated over a whole batch of samples rather than sample by sample.
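The sort, cumsum and mean procedure described above can be sketched in plain Python. The function name `cox_ph_loss` and its argument names are hypothetical; the sketch follows the cumsum-based calculation in the text and, like it, does not treat tied survival times specially or guard against overflow in `exp`.

```python
import math


def cox_ph_loss(y_pred, time, event, reduction="mean"):
    """Cox proportional risk loss computed with a running cumulative sum.

    y_pred: network outputs, one per patient
    time:   survival times T
    event:  survival outcome events E (1 = event occurred, 0 = truncated)
    """
    # Sort patient indices by descending survival time, as in the matrix M.
    order = sorted(range(len(time)), key=lambda i: time[i], reverse=True)
    running = 0.0  # cumsum(exp(y_pred)) over the rows seen so far
    terms = []
    for i in order:
        running += math.exp(y_pred[i])
        # (y_pred - log(cumsum(exp(y_pred)))) * E for this row
        terms.append((y_pred[i] - math.log(running)) * event[i])
    total = -sum(terms)
    return total / len(terms) if reduction == "mean" else total


# Three patients with equal predicted risk; the second one is truncated.
print(cox_ph_loss([0.0, 0.0, 0.0], [3, 2, 1], [1, 0, 1], reduction="sum"))
```

Truncated patients contribute no term of their own (their row is multiplied by E = 0), but their `exp(y_pred)` still enters the running sum of every patient with a shorter survival time, which is how the loss makes use of truncated data.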

When the Cox proportional risk loss function is used to optimize the network parameters, the calculation is usually performed on the entire data set. In that case, however, the network parameters are updated only once per epoch (one traversal of the data set), and the time cost is high. In the invention, the training set is randomly divided, with stratification, into 10 batches such that the proportion of survival outcome events E is equal in every batch, and the survival data of each batch are sorted in descending order of survival time T and used to calculate the loss function. The 10 batches of one epoch are thus each used to update the network parameters, so the parameters are updated 10 times per traversal of the data set and training efficiency is significantly improved.
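One way to realize this stratified batching is sketched below in plain Python. The function name `stratified_batches`, the round-robin dealing and the fixed random seed are assumptions made for illustration; the text only requires that the split be random, that every batch have the same proportion of events E, and that each batch be sorted by descending survival time T.

```python
import random


def stratified_batches(time, event, n_batches=10, seed=0):
    """Split patient indices into batches with near-equal event proportions."""
    rng = random.Random(seed)
    batches = [[] for _ in range(n_batches)]
    offset = 0
    for flag in (1, 0):  # first patients with an event, then truncated ones
        idx = [i for i, e in enumerate(event) if e == flag]
        rng.shuffle(idx)
        # Deal the shuffled indices round-robin so that every batch receives
        # the same number of this stratum (up to a difference of one).
        for k, i in enumerate(idx):
            batches[(k + offset) % n_batches].append(i)
        offset += len(idx)
    # Each batch is sorted by descending survival time for the cumsum loss.
    return [sorted(b, key=lambda i: time[i], reverse=True) for b in batches]


# Toy cohort: 100 patients, the first 30 of whom experienced the event.
time = [(i * 7) % 50 + 1 for i in range(100)]
event = [1 if i < 30 else 0 for i in range(100)]
batches = stratified_batches(time, event)
print([sum(event[i] for i in b) for b in batches])
```

Each of the 10 index lists can then be passed to the loss calculation separately, giving 10 parameter updates per traversal of the training set.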

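(2) Calculating the long-term risk change based on the Breslow method combined with the relative risk

In equation (1), the term $h(t \mid x)$ on the left of the equal sign is the risk function describing the long-term risk, and the term $\exp(\theta^{\mathsf T} x)$ on the right of the equal sign is the relative risk calculated by the convolutional survival network. Therefore, only the reference risk function $h_0(t)$ needs to be estimated; it can then be combined with $\exp(\theta^{\mathsf T} x)$ to calculate the risk function $h(t \mid x)$ describing the long-term risk.

In the prior art, when $\exp(\theta^{\mathsf T} x)$ is known, the most commonly used method for estimating the reference risk function $h_0(t)$ is the Breslow method. The Breslow method gives the following estimate of the reference cumulative risk function $H_0(t)$ at time $t$:

$$\hat{H}_0(t) = \sum_{i:\, T_i \le t} \frac{\delta_i}{\sum_{j \in R(T_i)} \exp(\theta^{\mathsf T} x_j)} \tag{3}$$

where $\delta_i$ and $T_i$ are the event indicator and survival time of patient $i$, and $R(T_i)$ is the set of samples that are at risk at time $T_i$.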
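The Breslow estimate of equation (3) can be sketched in plain Python as follows. The function names `breslow_cumulative_hazard` and `patient_cumulative_risk` are hypothetical, and `y_pred` is assumed to hold the network output for each training patient, so that the relative risk is `exp(y_pred)`. Scaling the reference cumulative curve by a patient's relative risk is the cumulative counterpart of equation (1); it is included here as an illustrative way to obtain a per-patient long-term risk curve, not as the exact procedure of the invention.

```python
import math


def breslow_cumulative_hazard(y_pred, time, event):
    """Return [(t, H0(t))] at each distinct event time, in ascending order."""
    risk = [math.exp(y) for y in y_pred]
    curve, h0_cum = [], 0.0
    for t in sorted({ti for ti, e in zip(time, event) if e == 1}):
        # Number of events observed exactly at time t.
        d = sum(1 for ti, e in zip(time, event) if e == 1 and ti == t)
        # Sum of relative risks over the risk set R(t): patients with T >= t.
        at_risk = sum(r for r, ti in zip(risk, time) if ti >= t)
        h0_cum += d / at_risk
        curve.append((t, h0_cum))
    return curve


def patient_cumulative_risk(curve, y_patient):
    """Scale the reference curve by one patient's relative risk exp(y)."""
    rr = math.exp(y_patient)
    return [(t, h0 * rr) for t, h0 in curve]


# Three training patients with equal predicted risk; the second is truncated.
curve = breslow_cumulative_hazard([0.0, 0.0, 0.0], [1, 2, 3], [1, 0, 1])
print(curve)
print(patient_cumulative_risk(curve, math.log(2)))
```

The increments of the curve between consecutive event times play the role of the reference risk function $h_0(t)$ obtained from $H_0(t)$.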

This example trained the model on the systolic blood pressure records of 36 consecutive hemodialysis sessions of patients with end-stage renal disease at a hospital, and evaluated model accuracy (C-Index) using stratified ten-fold cross-validation. For comparison, the maximum, minimum, mean and blood pressure variability were computed from the blood pressure records of the 36 hemodialysis sessions and used as inputs to DeepSurv and to a traditional Cox proportional risk regression model, which were likewise evaluated by ten-fold cross-validation. The C-Index was 0.646 ± 0.065 (95% CI) for the traditional Cox proportional risk regression model, 0.658 ± 0.038 (95% CI) for the DeepSurv model, and 0.839 ± 0.039 (95% CI) for the model of the invention. The DeepSurv model is thus more stable than the traditional Cox proportional risk regression model while its C-Index is almost the same, whereas the C-Index of the model of the invention is clearly higher than that of both comparison models, so that long-term risk prediction of hemodialysis complications can be realized.
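For reference, a C-Index of the kind reported above can be computed as in the sketch below. The text does not say which variant was used; the sketch assumes Harrell's concordance index, the usual meaning of the term, and the function name `concordance_index` is hypothetical.

```python
def concordance_index(risk, time, event):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had an event and a shorter
    survival time than patient j; it is concordant when i has higher risk.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if event[i] != 1:
            continue  # truncated patients cannot be the earlier member
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Higher predicted risk for shorter survival gives perfect concordance.
print(concordance_index([3.0, 2.0, 1.0], [1, 2, 3], [1, 1, 0]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported values of 0.646, 0.658 and 0.839.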

The foregoing is only a preferred embodiment of the present invention, and although the present invention has been disclosed in the preferred embodiments, it is not intended to limit the present invention. Those skilled in the art can make numerous possible variations and modifications to the present teachings, or modify equivalent embodiments to equivalent variations, without departing from the scope of the present teachings, using the methods and techniques disclosed above. Therefore, any simple modification, equivalent change and modification made to the above embodiments according to the technical essence of the present invention are still within the scope of the protection of the technical solution of the present invention, unless the contents of the technical solution of the present invention are departed.

Claims

1. A system for predicting long-term risk of hemodialysis complications based on a convolutional survival network, the system comprising: the data acquisition module is used for acquiring blood pressure information of the hemodialysis patient; the data preprocessing module is used for carrying out missing value processing and normalization processing on the original data and normalizing the original data into a two-dimensional matrix; a learning prediction module for deep learning modeling; the result display module is used for visually outputting and presenting long-term risk change conditions;

the processing procedure of the learning prediction module comprises two parts:

(1) predicting the relative risk of complications based on the convolutional survival network: a convolutional survival network is trained using a convolutional neural network architecture combined with a Cox proportional risk loss function; the convolutional survival network is formed by stacking a plurality of convolutional layers and a fully connected layer; its input is the two-dimensional matrix output by the data preprocessing module; the convolutional layers abstract the features layer by layer and finally abstract them into a plurality of mode features, and the mode features are output as one node through the fully connected layer to represent the relative risk of an event; the network parameters are optimized through the Cox proportional risk loss function;

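the risk function $h(t \mid x)$ represents the risk that an individual experiences an event at a given time, and is given by:

$$h(t \mid x) = h_0(t)\,\exp(\theta^{\mathsf T} x) \tag{1}$$

where $t$ denotes time; $x$ is the $p$-dimensional covariate vector used to characterize the patient; $\theta$ is the regression parameter; $h_0(t)$ is the reference risk function; and $\exp(\theta^{\mathsf T} x)$ is the relative risk, i.e., the (exponentiated) output of the convolutional survival network;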

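the Cox proportional risk loss function is formulated as follows:

$$L(\theta) = -\sum_{i=1}^{N} \delta_i \left[ \theta^{\mathsf T} x_i - \log \sum_{j:\, T_j \ge T_i} \exp(\theta^{\mathsf T} x_j) \right] \tag{2}$$

where $N$ is the number of patients; $\delta_i$ is the indicator of the outcome event E, with $\delta_i = 1$ if an event occurs for individual patient $i$ and $\delta_i = 0$ if no event occurs; and $T_i$ and $T_j$ denote the survival times of individual patients $i$ and $j$, respectively;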

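(2) calculating the long-term risk change based on the Breslow method and the relative risk, specifically: the Breslow method gives the reference cumulative risk function $H_0(t)$ at time $t$, and the reference risk function $h_0(t)$ is obtained from $H_0(t)$; the estimate of $H_0(t)$ is as follows:

$$\hat{H}_0(t) = \sum_{i:\, T_i \le t} \frac{\delta_i}{\sum_{j \in R(T_i)} \exp(\theta^{\mathsf T} x_j)} \tag{3}$$

where $\delta_i$ and $T_i$ are the event indicator and survival time of patient $i$, and $R(T_i)$ is the set of samples at risk at time $T_i$;

according to the reference risk function $h_0(t)$, combined with the relative risk $\exp(\theta^{\mathsf T} x)$ calculated by the convolutional survival network, the risk function $h(t \mid x)$ describing the long-term risk is calculated according to equation (1).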

2. The convolutional survival network-based long-term risk prediction system for hemodialysis complications according to claim 1, wherein the data preprocessing module first removes data with a systolic blood pressure below 60 mmHg or above 250 mmHg, then selects the systolic blood pressure data of 36 consecutive hemodialysis sessions and arranges them into a two-dimensional matrix in the order of the hemodialysis treatments; each row of data corresponds to one hemodialysis treatment with a duration of 5 hours, adjacent columns are 10 minutes apart, and the matrix has 36 rows with 30 points in each row.

3. The system of claim 1, wherein in the learning prediction module, the convolutional survival network receives the two-dimensional matrix output by the data preprocessing module, first identifies one-dimensional blood pressure patterns, then completes the receptive field into a square, abstracts the features layer by layer, and then connects a global mean pooling layer to obtain a plurality of mode features; the mode features are output as one node through a fully connected layer to represent the relative risk of an event; the activation function of the last layer is Linear, and the activation functions of the other layers are ReLU.

4. The system according to claim 2, wherein in the learning prediction module, the input of the convolutional survival network is the 1-channel two-dimensional matrix of size 36 × 30 output by the data preprocessing module; one-dimensional blood pressure patterns are identified by 16 convolution kernels of size 1 × 5; the receptive field is then completed into a square by 32 convolution kernels of size 5 × 1 in the longitudinal direction; 2 layers of 16 convolution kernels of size 5 × 5 and 1 layer of 9 convolution kernels of size 3 × 3 follow to abstract the features layer by layer; a global mean pooling layer then yields 9 mode feature values, and the 9 mode feature values are output as one node through a fully connected layer to represent the relative risk of an event.

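5. The system of claim 1, wherein in the learning prediction module, the Cox proportional risk loss function is calculated as follows:

(a) the patients' feature data X and survival outcome events E are sorted together in descending order of survival time T to form a matrix M;

(b) $\theta^{\mathsf T} x$ is the output of the convolutional survival network and corresponds to y_pred; since M is sorted in descending order of T, $\sum_{j:\, T_j \ge T_i} \exp(\theta^{\mathsf T} x_j)$ is the cumulative sum, over the first $i$ rows of M, of the exponentiated outputs of the convolutional survival network, i.e., the $i$-th entry of cumsum(exp(y_pred)), where cumsum denotes the cumulative sum function;

(c) $\delta_i$, i.e., the value of the survival outcome event E of the $i$-th individual in matrix M, is recorded as E;

(d) let the summation function be sum, whereby the loss function is expressed as:

$$\mathrm{loss} = -\,\mathrm{sum}\Big(\big(y_{\mathrm{pred}} - \log(\mathrm{cumsum}(\exp(y_{\mathrm{pred}})))\big) \cdot E\Big).$$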

6. The system according to claim 5, wherein in the learning prediction module, so that the loss functions of the training set and the test set are of the same magnitude, the summation function sum is replaced by the averaging function mean, and the loss function is expressed as:

$$\mathrm{loss} = -\,\mathrm{mean}\Big(\big(y_{\mathrm{pred}} - \log(\mathrm{cumsum}(\exp(y_{\mathrm{pred}})))\big) \cdot E\Big).$$

7. The system according to claim 1, wherein in the learning prediction module, when the network parameters are optimized through the Cox proportional risk loss function, the training set is randomly divided, with stratification, into 10 batches such that the proportion of survival outcome events E is equal in every batch; the survival data of each batch are sorted in descending order of survival time T and used to calculate the loss function, so that the network parameters are updated 10 times in one traversal of the data set.