WO2021203796A1

WO2021203796A1 - Disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis

Info

Publication number: WO2021203796A1
Application number: PCT/CN2021/073136
Authority: WO
Inventors: 李劲松; 池胜强; 田雨; 周天舒
Original assignee: 之江实验室
Priority date: 2020-04-09
Filing date: 2021-01-21
Publication date: 2021-10-14
Also published as: CN111640510A

Abstract

A disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis, comprising a data acquisition module, a data preprocessing module, and a prediction model construction module. The system, by using a deep neural network model as a basis, converts a survival analysis problem into a multi-task learning model composed of a semi-supervised learning problem of multi-time-sequence-point survival probability prediction; the model directly models a survival probability, does not depend on proportional risk hypothesis, can fit a time-dependent effect, and has better interpretability; it is proposed that a semi-supervised loss function and a sorting loss function are utilized to fit data, complete data and censored data are fully utilized, and traditional survival analysis problems and survival analysis problems considering competition risks can be solved; according to the model, by means of multi-task learning of multiple time sequence points, data sharing among multiple prediction tasks is achieved, mutual constraint among the multiple prediction tasks is achieved, and the generalization ability of the model is improved.

Description

A disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis

Technical field

The invention belongs to the technical field of medical treatment and machine learning, and in particular relates to a disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis.

Background technique

Disease prognosis prediction analysis can provide clinicians with prognostic information for disease treatment, help formulate treatment plans, increase disease cure rate, improve patient prognostic quality of life, and effectively reduce disease burden, which is of great significance for disease control and treatment. Survival analysis is a commonly used data analysis method in the prediction of disease prognosis, which is used to analyze and predict the time of occurrence of an event. In medicine, it plays a key role in determining the course of treatment, developing new drugs, preventing adverse drug reactions and improving hospital procedures. Recently, with the rise of deep learning models and the improvement of training technology, deep neural networks, convolutional neural networks, long- and short-term memory networks and other deep learning network structures have begun to increase in the application of disease prognosis prediction. In addition, some advanced machine learning strategies are gradually being applied to survival analysis methods based on deep learning, including active learning, transfer learning, and multi-task learning to improve the performance of disease prognosis prediction.

Censored data are common in disease prognosis data. Censored data are not missing data, but incomplete data that can only provide prognostic information from the beginning to the censored time, and cannot provide complete information from the beginning to the occurrence of the event. Existing deep learning-based methods may not make full use of censored data; or in the case of making full use of censored data, they cannot effectively solve the time-dependent phenomenon of features; or the generalization ability of the model is insufficient; or the model is interpretable Poor sex. The existing methods based on multi-task learning cannot make full use of censored data.

Summary of the invention

The purpose of the present invention is to provide a disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis in view of the deficiencies of the prior art.

The present invention is based on a deep neural network model, transforms the survival analysis problem into a multi-task learning model composed of a semi-supervised learning problem of multi-time sequence point survival probability prediction; considering the censored data and the non-increasing trend of survival probability in survival analysis, it is proposed to use The semi-supervised loss function and the ranking loss function fit the data, and can deal with traditional survival analysis problems and survival analysis problems considering competitive risks. At the same time, it provides a method for evaluating the importance of features, and visualizes the time dependence and nonlinear effects of features.

The deep neural network structure in the model contains multiple layers of nonlinear transformation units, which can fit the nonlinear effects of features. The model directly models survival probability, does not rely on proportional hazards assumptions, can fit time-dependent effects, and has better explanatory properties. The model makes full use of complete data and censored data through logarithmic loss function and semi-supervised loss function; utilizes the non-increasing trend of survival probability through sorting loss function; realizes automatic feature selection and prevents model overfitting through L1 and L2 loss functions . The model realizes data sharing between multiple prediction tasks through multi-task learning at multiple time sequence points, and realizes mutual constraints between multiple prediction tasks at the same time, and improves the generalization ability of the model.

The purpose of the present invention is achieved through the following technical solutions: a disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis, including: a data acquisition module for acquiring disease prognosis data; A data preprocessing module for missing value processing and normalization processing; a prediction model building module for modeling disease prognosis data; a prediction result display module for displaying data prediction results; in the prediction model building module Using a survival analysis method based on deep semi-supervised multi-task learning, the specific steps are as follows:

(1) In the prognostic data survival analysis, the given data set is recorded as: D={(X ₁ ,T ₁ ,δ ₁ ),(X ₂ ,T ₂ ,δ ₂ ),…,(X _i ,T _i ,δ _i ),…,(X _N ,T _N ,δ _N )}. _{_{(X i, T i, δ}} i) represents an instance of data, where X _i is the i th data feature vector; [delta] _i is censored indicator variable i-th data, when δ _i = 1, it indicates that the data is non-censored data, i.e. observed occurrence of an event, when δ _i = 0, indicates that the data is censored data, i.e., the occurrence of an event not observed; survival time T _i represents the i-th data. For uncensored data, T _i is equal to the observed survival time O _i ; for censored data, T _i is equal to the censoring time C _i .

The characteristics of the data set can be expressed as:

Among them, N is the number of samples and M is the number of features.

The label of the data set can be expressed as:

Y={(T ₁ , δ ₁ ),(T ₂ , δ ₂ ),…,(T _i , δ _i ),…,(T _N , δ _N )}

(2) Regard the survival time as multiple time points, and convert the original label information of each sample into a K-dimensional survival state vector, where K=max(T _i ), i=1, 2,...,N, Is the maximum survival time in all samples. Each element in the survival state vector represents the occurrence, non-occurrence, or unknown of the sample at this point in time. The label of the converted data set can be expressed as:

(3) Construct a deep neural network with one input layer and multiple output layers. The input of the deep neural network is the feature X of the data set, the output label is Y, and each output layer corresponds to each of Y y, that is, each output layer corresponds to event prediction tasks at different times. The deep neural network can make predictions for the same task at K different times.

(4) Construct a prediction model. The objective function of the prediction model is composed of five parts: log loss, L1 loss, L2 loss, semi-supervised loss and ranking loss:

1) Log loss

For the labeled data, for the binary classification problem that does not consider the competition risk, the model uses the logarithmic loss to punish the wrong classification to measure the accuracy of the classifier. Let the label be y, y∈{0,1}. The parameter θ is estimated by the maximum likelihood estimation method, and the likelihood function is:

Where, l is the sample number label, p (X _i; θ) is the posterior probability of the sample X _i. Take the logarithm of the likelihood function to obtain the log likelihood function, that is, the log loss function:

That is, the greater the probability that each sample belongs to its true label, the better.

For the survival analysis problem considering the competitive risk, the event prediction at each time point is regarded as a multi-classification problem. Suppose that when X _i is given, the conditional probability distribution of y is p(y _i =k|X _i ; θ), where k=1, 2,...,C, and C is the number of all possible outcomes. The parameter θ is estimated by the maximum likelihood estimation method, and the corresponding log loss function is:

Among them, I{y _i =k} is an indicator function. When y _i =k, I{y _i =k}=1; otherwise, I{y _i =k}=0.

2) L1 loss:

L1(θ)=‖θ‖

3) L2 loss

L2(θ)=‖θ‖ ²

4) Semi-supervised loss

For unlabeled data, the use of unlabeled data is realized by adding an entropy-constrained regularization term to the objective function.

For the binary classification problem that does not consider the competition risk, the event state is a random variable that obeys the Bernoulli distribution and the parameter is p. Its entropy is defined as follows:

H(p)=-plog p-(1-p)log(1-p)

For unlabeled data, the entropy constraint regularization is defined as follows:

Among them, u is the number of unlabeled samples, and p is the probability of occurrence of the event. If the category of unlabeled data is determined, the entropy constraint regularization term will be small.

For multi-classification problems considering competitive risks, the entropy constraint regularization of unlabeled data is defined as follows:

5) Sorting loss

The non-increasing trend of survival probability is constrained by adding a ranking loss to the objective function. The ranking loss is defined as follows:

Among them, p _i,p (y _i = 1|X _i ; θ) represents the probability of a death event in the i-th sample at time p. That is, when time p<q, the probability of occurrence of the i-th sample event should satisfy p _i,p (y _i =1|X _i ;θ)<pi _,q (y _i =1|X _i ;θ), Otherwise, a penalty will be imposed on the probability of occurrence of this pair of events; I(pi _,p (y _i = 1|X _i ; θ)>pi _,q (y _i =1|X _i ; θ)) is the indicator function, When p _i,p (y _i =1|X _i ;θ)>pi _,q (y _i =1|X _i ;θ), I=1; otherwise, I=0.

In summary, the semi-supervised multi-task survival analysis model based on deep learning, that is, the objective function of the prediction model is:

L _total (θ)=l(θ)+λ ₁ L1(θ)+λ ₂ L2(θ)+λ ₃ Ω(θ)+λ ₄ R(θ)

Among them, l(θ) is the logarithmic loss, L1(θ) is the L1 loss, L2(θ) is the L2 loss, Ω(θ) is the semi-supervised loss, R(θ) is the ranking loss, λ ₁ ,λ ₂ , λ ₃ ,λ ₄ are the parameters that control the strength of the regular term.

Use disease data for model training to obtain model parameters θ to determine the prediction model. For new disease data, predictive models are used to predict the prognosis of the disease.

Further, the step (2) transforms the original survival analysis problem into a multi-task learning problem through the process of converting the label information into a vector.

Further, in the step (3), the hidden layer parameters in the deep neural network adopt a hard sharing mechanism, thereby reducing the risk of overfitting.

Further, in the step (4), for the deep semi-supervised multi-task learning problem of survival analysis problem transition, there are two important features: unlabeled data caused by censoring and a non-increasing trend of survival probability. For unlabeled data caused by censoring, semi-supervised learning is performed by using entropy-constrained regularization. Aiming at the non-increasing trend of survival probabilities at different time points, sorting loss is introduced to constrain the survival probabilities of different output layers. At the same time, L1 loss is introduced into the objective function to realize automatic feature selection, and L2 loss is introduced to avoid overfitting.

Further, the prediction result display module is used for feature importance evaluation, and visually displays the time dependence and nonlinear effects of features. The specific steps for calculating the importance of a feature F are as follows:

1) Select the corresponding test data to calculate the model prediction error and record it as error1.

2) Randomly add noise interference to the feature F of all samples in the test data, calculate the model prediction error again, and record it as error2. For continuous variables, randomly add a noise disturbance that obeys the normal distribution N(0,σ∈), where σ is the standard deviation of the feature F, and ∈ is a small constant. For discrete variables, x _F → x _F *(1-s)+(1-x _F )*s, where s is the noise disturbance that obeys the Bernoulli distribution, and x _F is the value of the characteristic F.

3) Calculate the difference e between the two prediction errors: e=error2-error1.

4) Repeat steps 1-3 n times.

5) The calculation formula for the importance of feature F is as follows:

If random noise is added, the accuracy of the test data drops drastically, indicating that this feature has a great influence on the prediction results of the sample, and further indicating that the importance is relatively high.

Further, the prediction result display module visually displays the influence of the characteristics on the prognosis by drawing the predicted cumulative incidence curves corresponding to different characteristics. To draw the predicted cumulative incidence curve corresponding to a certain feature F, the specific steps are as follows:

1) All possible values of feature F are: x _F,1 ,x _F,2 ,...,x _F,v ,...,x _F,V , where V is the number of all possible values of feature F.

2) Let the value of feature F be x _F = x _{F, v} , v = 1, 2, ..., V, keep the values of other features unchanged, and calculate the average value of the cumulative occurrence rate predicted by the model:

in,

Is the average value of model prediction output for all data,

Is the model prediction output of the i-th data, and x _i,o are the values of all the features except the feature F in the i-th data.

3) Combine what you got in step 2)

Draw as a curve.

Further, in the process of drawing the predictive cumulative incidence curve, for continuous variables, the variable value range is divided into R equal divisions, and the values of all the cut points are used for cumulative incidence estimation and curve drawing, reducing the amount of calculation, R Determine according to the specific characteristic value range.

The beneficial effects of the present invention are:

The present invention is based on a deep neural network model, and converts the survival analysis problem into a multi-task learning model composed of a semi-supervised learning problem of survival probability prediction at multiple time series points. The deep neural network structure can fit the nonlinear effects of features. The model directly models survival probability, does not rely on proportional hazards assumptions, can fit time-dependent effects, and has better explanatory properties.

Considering the non-increasing trend of censored data and survival probability in survival analysis, it is proposed to use semi-supervised loss function and ranking loss function to fit the data, making full use of complete data and censored data, which can deal with traditional survival analysis problems and consider competition The question of risk survival analysis. The model realizes data sharing between multiple prediction tasks through multi-task learning at multiple time sequence points, and realizes mutual constraints between multiple prediction tasks at the same time, and improves the generalization ability of the model. At the same time, it provides a method for evaluating the importance of features, and visualizes the time dependence and nonlinear effects of features.

Description of the drawings

Figure 1 is a structural diagram of the disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis of the present invention;

Figure 2 is a schematic diagram of data set label conversion;

Figure 3 is a diagram of the neural network structure.

Detailed ways

In order to make the above objectives, features and advantages of the present invention more obvious and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

In the following description, many specific details are explained in order to fully understand the present invention, but the present invention can also be implemented in other ways different from those described here, and those skilled in the art can do so without departing from the connotation of the present invention. Similar promotion, therefore, the present invention is not limited by the specific embodiments disclosed below.

The censored data in this application is: If at the specified end time, the data without a result event is called censored data, and the time from the starting point to the censored is called censoring time. The time-dependent phenomenon is: Regardless of the baseline risk, at any point in time, the risk of an event in an exposed individual relative to an exposed individual is constant; the phenomenon that does not meet the above assumptions is considered a characteristic pair The prognosis of the disease is time-dependent. The risk of competition is: during the follow-up of the disease prognosis, the patient’s events other than the event of concern did not occur, that is, other events "competed" for the occurrence of the event of concern, and these events are called competitive risks; the competition risk is only Exist in the survival analysis problem where there are multiple end-point events, but only one end-point event will occur at any given time.

As shown in Figure 1, a disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis proposed by the present application includes: a data acquisition module for acquiring disease prognosis data; and processing missing values for disease prognosis data And a normalized data preprocessing module; a prediction model building module for modeling disease prognosis data; a prediction result display module for visually displaying data prediction results; the prediction model building module uses The survival analysis method of deep semi-supervised multi-task learning, its realization principle is as follows:

The characteristics of the data set can be expressed as:

Among them, N is the number of samples and M is the number of features.

The label of the data set can be expressed as:

Y={(T ₁ , δ ₁ ),(T ₂ , δ ₂ ),…,(T _i , δ _i ),…,(T _N , δ _N )}

(2) The present invention regards survival time as multiple time points, rather than continuous variables. Therefore, we can transform the original label information of each sample into a K-dimensional survival state vector, where K=max(T _i ), i=1, 2,...,N is the maximum survival time in all samples . Each element in the survival state vector represents the occurrence of the event (value 1), non-occurrence (value 0) or unknown (value 2) of the sample at this point in time. An example of the transformation of data set labels is shown in Figure 2. The label of the converted data set can be expressed as:

Through the process of transforming label information into vectors, the original survival analysis problem is transformed into a multi-task learning problem.

(3) Use a deep neural network with one input layer and multiple output layers. The input of the deep neural network is the feature X of the data set, the output label is Y, and each output layer corresponds to each y in Y, namely Each output layer corresponds to event prediction tasks at different times. Figure 3 shows a deep neural network with K output layers. If the output k refers to _{the prediction of the task at time T k} , then the network can make predictions for the same task at K different times. The hidden layer parameters in the network adopt a hard sharing mechanism. The hard sharing mechanism reduces the risk of overfitting. Intuitively speaking, the more tasks learn at the same time, the more common features of the model can be captured by the model, so that the risk of overfitting on each task is smaller.

(4) Objective function definition

For the deep semi-supervised multi-task learning problem of survival analysis problem transformation, there are two important characteristics: unlabeled data caused by censoring and the non-increasing trend of survival probability. For these two problems, it is necessary to design appropriate constraints to deal with. For unlabeled data caused by censoring, we use entropy-constrained regularization for semi-supervised learning. If the category of unlabeled data is determined, the entropy constraint regularization term will be small. Taking into account the non-increasing trend of survival probabilities at different time points, we introduce a sorting loss to constrain the survival probabilities of different output layers. At the same time, L1 loss is introduced into the objective function to realize automatic feature selection, and L2 loss is introduced to avoid overfitting. The objective function of the model consists of five parts: log loss, L1 loss, L2 loss, semi-supervised loss and ranking loss.

1) Log loss

For the survival analysis problem considering the competitive risk, we regard the event prediction at each time point as a multi-classification problem. Suppose that when X _i is given, the conditional probability distribution of y is p(y _i =k|X _i ; θ), where k=1, 2,...,C, and C is the number of all possible outcomes. This model used to solve the classification problem of y∈{1,2,...,C} is an extension of the two-class model, and its parameters can also be solved by the maximum likelihood estimation method, and the corresponding log loss function for:

2) L1 loss

The definition of L1 loss is as follows:

L1(θ)=‖θ‖

L1 loss, that is, adding the sum of the absolute values of all the weight parameters θ to the objective function can make more θ zero and realize automatic feature selection.

3) L2 loss

The definition of L2 loss is as follows:

L2(θ)=‖θ‖ ²

L2 loss, that is, adding the sum of the squares of all weight parameters θ in the objective function, so that all θ tends to zero as much as possible to avoid over-fitting.

4) Semi-supervised loss

For unlabeled data, the use of unlabeled data can be realized by adding an entropy-constrained regularization term to the objective function. For the binary classification problem that does not consider the competition risk, the event state is a random variable that obeys the Bernoulli distribution and the parameter is p. Its entropy is defined as follows:

H(p)=-plog p-(1-p)log(1-p)

5) Sorting loss

L _total (θ)=l(θ)+λ ₁ L1(θ)+λ ₂ L2(θ)+λ ₃ Ω(θ)+λ ₄ R(θ)

(5) Feature importance

To calculate the importance of a feature F, the specific steps are as follows:

2) Randomly add noise interference to the feature F of all samples in the test data (you can randomly change the value of the sample at the feature F), calculate the model prediction error again, and record it as error2. For continuous variables, randomly add a noise disturbance that obeys the normal distribution N(0,σ∈), where σ is the standard deviation of the feature F, and ∈ is a small constant. For discrete variables, x _F → x _F *(1-s)+(1-x _F )*s, where s is the noise disturbance that obeys the Bernoulli distribution, and x _F is the value of the characteristic F.

4) Repeat steps 1-3 n times, n usually takes more than 500 times.

5) The calculation formula for the importance of feature F is as follows:

The reason why this value can explain the importance of the feature is that if random noise is added, the accuracy of the test data will be greatly reduced (that is, the error2 will increase), indicating that this feature has a great impact on the prediction results of the sample, and thus the degree of importance Relatively high.

(6) Visualization of the influence of features on prognosis

By drawing the predicted cumulative incidence curve corresponding to different characteristics, the visual display of the influence of the characteristics on the prognosis. To draw the predicted cumulative incidence curve corresponding to a certain feature F, the specific steps are as follows:

in,

Is the average value of model prediction output for all data,

3) Combine what you got in step 2)

Draw as a curve. For continuous variables, the value range of the variable can be divided into R equal parts, and the values of all cut points are used for cumulative incidence estimation and curve drawing to reduce the amount of calculation. R is usually determined according to the specific characteristic value range.

This application uses the deep neural network structure to fit the nonlinear effect of the data; according to the dimensionality of the input data, the length of the survival time, and the accuracy of the model, the deep neural network structure can be flexibly expanded; the model directly models the survival probability without relying on The proportional hazard hypothesis can fit the time-dependent effects of features and has better interpretability; through the logarithmic loss function and semi-supervised loss function, it makes full use of the complete data and censored data; through the ranking loss function, the survival probability is not used. Incremental law; through the L1 and L2 loss functions, automatic feature selection and prevention of model overfitting are realized; the model can realize data sharing between multiple prediction tasks through multi-task learning at multiple time series points, and between multiple prediction tasks at the same time The model can deal with traditional survival analysis problems and survival analysis problems considering competitive risks; it provides a method for evaluating the importance of features based on deep learning models; and visualizes the prognostic effects of features Time dependence and non-linear effects.

The above are only the preferred embodiments of the present invention. Although the present invention has been disclosed as above in preferred embodiments, it is not intended to limit the present invention. Anyone familiar with the art, without departing from the scope of the technical solution of the present invention, can use the methods and technical content disclosed above to make many possible changes and modifications to the technical solution of the present invention, or modify it into equivalent changes. Examples. Therefore, all simple modifications, equivalent changes and modifications made to the above embodiments based on the technical essence of the present invention without departing from the technical solution of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims

A disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis, which is characterized in that it includes: a data acquisition module for acquiring disease prognosis data; Data preprocessing module; prediction model building module for modeling disease prognosis data; prediction result display module for displaying data prediction results; the prediction model building module adopts deep semi-supervised multi-task learning Survival analysis method, the specific steps are as follows:

(1) In the prognostic data survival analysis, the given data set is recorded as: D={(X 1 ,T 1 ,δ 1 ),(X 2 ,T 2 ,δ 2 ),…,(X i ,T i ,δ i ),…,(X N ,T N ,δ N )}. (X i, T i, δ i) represents an instance of data, where X i is the i th data feature vector; [delta] i is censored indicator variable i-th data, when δ i = 1, it indicates that the data is non-censored data, i.e. observed occurrence of an event, when δ i = 0, indicates that the data is censored data, i.e., the occurrence of an event not observed; survival time T i represents the i-th data. For uncensored data, T i is equal to the observed survival time O i ; for censored data, T i is equal to the censoring time C i .

The characteristics of the data set can be expressed as:

Among them, N is the number of samples and M is the number of features.

The label of the data set can be expressed as:

Y={(T 1 ,δ 1 ),(T 2 ,δ 2 ),…,(T i ,δ i ),…,(T N ,δ N )}

(2) Regard the survival time as multiple time points, and convert the original label information of each sample into a K-dimensional survival state vector, where K=max(T i ), i=1, 2,...,N, Is the maximum survival time in all samples. Each element in the survival state vector represents the occurrence, non-occurrence, or unknown of the sample at this point in time. The label of the converted data set can be expressed as:

(3) Construct a deep neural network with one input layer and multiple output layers. The input of the deep neural network is the feature X of the data set, the output label is Y, and each output layer corresponds to each of Y y, that is, each output layer corresponds to event prediction tasks at different times. The deep neural network can make predictions for the same task at K different times.

(4) Construct a prediction model. The objective function of the prediction model is composed of five parts: logarithmic loss, L1 loss, L2 loss, semi-supervised loss and ranking loss:

1) Log loss

For the labeled data, for the binary classification problem that does not consider the competition risk, the model uses the logarithmic loss to punish the wrong classification to measure the accuracy of the classifier. Let the label be y, y∈{0,1}. The parameter θ is estimated by the maximum likelihood estimation method, and the likelihood function is:

Where, l is the sample number label, p (X i; θ) is the posterior probability of the sample X i. Take the logarithm of the likelihood function to obtain the log likelihood function, that is, the log loss function:

That is, the greater the probability that each sample belongs to its true label, the better.

For the survival analysis problem considering the competitive risk, the event prediction at each time point is regarded as a multi-classification problem. Suppose that when X i is given, the conditional probability distribution of y is p(y i =k|X i ; θ), where k=1, 2,...,C, and C is the number of all possible outcomes. The parameter θ is estimated by the maximum likelihood estimation method, and the corresponding log loss function is:

Among them, I{y i =k} is an indicator function. When y i =k, I{y i =k}=1; otherwise, I{y i =k}=0.

2) L1 loss:

L1(θ)=‖θ‖

3) L2 loss

L2(θ)=‖θ‖ 2

4) Semi-supervised loss

For unlabeled data, the use of unlabeled data is realized by adding an entropy-constrained regularization term to the objective function.

For the binary classification problem that does not consider the competition risk, the event state is a random variable that obeys the Bernoulli distribution and the parameter is p. Its entropy is defined as follows:

H(p)=-plog p-(1-p)log(1-p)

For unlabeled data, the entropy constraint regularization is defined as follows:

Among them, u is the number of unlabeled samples, and p is the probability of occurrence of the event. If the category of unlabeled data is determined, the entropy constraint regularization term will be small.

For multi-classification problems considering competitive risks, the entropy constraint regularization of unlabeled data is defined as follows:

5) Sorting loss

The non-increasing trend of survival probability is constrained by adding a ranking loss to the objective function. The ranking loss is defined as follows:

Among them, p i,p (y i = 1|X i ; θ) represents the probability of a death event in the i-th sample at time p. That is, when time p<q, the probability of occurrence of the i-th sample event should satisfy p i,p (y i =1|X i ;θ)<pi ,q (y i =1|X i ;θ), Otherwise, a penalty will be imposed on the probability of occurrence of this pair of events; I(pi ,p (y i = 1|X i ; θ)>pi ,q (y i =1|X i ; θ)) is the indicator function, When p i,p (y i =1|X i ;θ)>pi ,q (y i =1|X i ;θ), I=1; otherwise, I=0.

In summary, the semi-supervised multi-task survival analysis model based on deep learning, that is, the objective function of the prediction model is:

L total (θ)=l(θ)+λ 1 L1(θ)+λ 2 L2(θ)+λ 3 Ω(θ)+λ 4 R(θ)

Among them, l(θ) is the logarithmic loss, L1(θ) is the L1 loss, L2(θ) is the L2 loss, Ω(θ) is the semi-supervised loss, R(θ) is the ranking loss, λ 1 ,λ 2 , λ 3 ,λ 4 are the parameters that control the strength of the regular term.

Use disease data for model training to obtain model parameters θ to determine the prediction model. For new disease data, predictive models are used to predict the prognosis of the disease.
A disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis according to claim 1, wherein the step (2) transforms the original survival analysis problem through the process of converting label information into a vector Transform into a multi-task learning problem.
The disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis according to claim 1, characterized in that, in the step (3), the hidden layer parameters in the deep neural network adopt a hard sharing mechanism, thereby Reduce the risk of overfitting.
The disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis according to claim 1, characterized in that, in the step (4), for the deep semi-supervised multi-task learning problem transformed by the survival analysis problem, There are two important features: unlabeled data caused by censoring and non-increasing trends in survival probabilities. For unlabeled data caused by censoring, semi-supervised learning is performed by using entropy-constrained regularization. Aiming at the non-increasing trend of survival probabilities at different time points, sorting loss is introduced to constrain the survival probabilities of different output layers. At the same time, L1 loss is introduced into the objective function to realize automatic feature selection, and L2 loss is introduced to avoid overfitting.
The disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis according to claim 1, wherein the prediction result display module is used for feature importance evaluation, and the time dependence and time dependence of features are displayed in a visual manner. Non-linear effects. The specific steps for calculating the importance of a feature F are as follows:

1) Select the corresponding test data to calculate the model prediction error and record it as error1.

2) Randomly add noise interference to the feature F of all samples in the test data, calculate the model prediction error again, and record it as error2. For continuous variables, randomly add a noise disturbance that obeys the normal distribution N(0,σε), where σ is the standard deviation of the feature F, and ε is a small constant. For discrete variables, x F → x F *(1-s)+(1-x F )*s, where s is the noise disturbance that obeys the Bernoulli distribution, and x F is the value of the characteristic F.

3) Calculate the difference e between the two prediction errors: e=error2-error1.

4) Repeat steps 1-3 n times.

5) The calculation formula for the importance of feature F is as follows:

If random noise is added, the accuracy of the test data drops significantly, indicating that this feature has a great influence on the prediction results of the sample, and further indicating that the importance is relatively high.
The disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis according to claim 5, wherein the prediction result display module visually displays the characteristic pairs by drawing predicted cumulative incidence curves corresponding to different characteristics The impact of prognosis. To draw the predicted cumulative incidence curve corresponding to a certain feature F, the specific steps are as follows:

1) All possible values of feature F are: x F,1 ,x F,2 ,...,x F,v ,...,x F,V , where V is the number of all possible values of feature F.

2) Let the value of feature F be x F = x F, v , v = 1, 2, ..., V, keep the values of other features unchanged, and calculate the average value of the cumulative occurrence rate predicted by the model:

in,
Is the average value of model prediction output for all data,
Is the model prediction output of the i-th data, and x i,o are the values of all the features except the feature F in the i-th data.

3) Combine what you got in step 2)
Draw as a curve.
The disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis according to claim 6, characterized in that, in the process of drawing the predictive cumulative incidence curve, for continuous variables, the variable value range is equally divided into R Divide equally, take the values of all the cut points to estimate the cumulative incidence and draw the curve to reduce the amount of calculation. R is determined according to the specific characteristic value range.