CN111640510A - Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis

Info

Publication number: CN111640510A
Application number: CN202010273957.9A
Authority: CN
Inventors: 李劲松; 池胜强; 田雨; 周天舒; 叶前呈
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2020-04-09
Filing date: 2020-04-09
Publication date: 2020-09-08
Also published as: WO2021203796A1

Abstract

The invention discloses a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, comprising a data acquisition module, a data preprocessing module, a prediction model construction module, and other modules. Built on a deep neural network model, the invention converts the survival analysis problem into a multi-task learning model composed of semi-supervised learning problems, each predicting the survival probability at one of multiple time points. The model directly models the survival probability, does not depend on the proportional hazards assumption, can fit time-dependent effects, and has better interpretability. The data are fitted with a semi-supervised loss function and a ranking loss function, making full use of both complete and censored data, so that both the traditional survival analysis problem and the survival analysis problem with competing risks can be handled. Through multi-task learning over multiple time points, the model shares data among the prediction tasks and imposes mutual constraints among them, which improves its generalization ability.

Description

Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis

Technical Field

The invention belongs to the technical field of medical treatment and machine learning, and particularly relates to a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis.

Background

Disease prognosis prediction provides clinicians with prognostic information for treating disease, supports the formulation of treatment plans, improves cure rates and patients' quality of life after treatment, and effectively reduces the burden of disease; it is therefore of great significance for the control and treatment of disease. Survival analysis is a data analysis method commonly used in disease prognosis prediction to analyze and predict the time at which an event occurs. In medicine, it plays a key role in determining the course of treatment, developing new drugs, preventing adverse drug reactions, and improving hospital procedures. Recently, with the rise of deep learning models and improvements in training techniques, research applying deep learning architectures such as deep neural networks, convolutional neural networks, and long short-term memory networks to disease prognosis prediction has increased. In addition, advanced machine learning strategies, including active learning, transfer learning, and multitask learning, are gradually being applied to deep learning based survival analysis methods, improving disease prognosis prediction performance.

Censored data are ubiquitous in disease prognosis data. Censored data are not missing data; they are incomplete data that provide prognostic information only from the starting point to the censoring time and cannot provide complete information from the starting point to the occurrence of the event. Existing deep learning based methods either cannot make full use of censored data; or, when they do make full use of censored data, cannot effectively handle the time dependence of features; or have insufficient generalization ability; or have poor interpretability. Existing methods based on multi-task learning cannot make full use of censored data.

Disclosure of Invention

The invention aims to provide a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, aiming at the defects of the prior art.

Built on a deep neural network model, the invention converts the survival analysis problem into a multi-task learning model composed of semi-supervised learning problems, each predicting the survival probability at one of multiple time points. Taking into account both the censored data and the non-increasing trend of the survival probability in survival analysis, the data are fitted with a semi-supervised loss function and a ranking loss function, so that both the traditional survival analysis problem and the survival analysis problem with competing risks can be handled. A feature importance evaluation method is also provided, and the time-dependent and nonlinear effects of features are displayed visually.

The deep neural network in the model contains multiple layers of nonlinear transformation units and can therefore fit the nonlinear effects of features. The model directly models the survival probability, does not depend on the proportional hazards assumption, can fit time-dependent effects, and has better interpretability. The model makes full use of both complete and censored data through the logarithmic loss function and the semi-supervised loss function, exploits the non-increasing trend of the survival probability through the ranking loss function, and achieves automatic feature selection and prevents overfitting through the L1 and L2 loss functions. Through multi-task learning over multiple time points, the model shares data among the prediction tasks and imposes mutual constraints among them, which improves its generalization ability.

The purpose of the invention is achieved by the following technical scheme: a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, comprising: a data acquisition module for acquiring disease prognosis data; a data preprocessing module for performing missing value processing and normalization on the disease prognosis data; a prediction model construction module for modeling the disease prognosis data; and a prediction result display module for displaying the prediction results. The prediction model construction module adopts a survival analysis method based on deep semi-supervised multitask learning, with the following specific steps:

(1) In survival analysis of prognostic data, a given data set is denoted as $D = \{(X_1, T_1, \delta_1), (X_2, T_2, \delta_2), \ldots, (X_i, T_i, \delta_i), \ldots, (X_N, T_N, \delta_N)\}$. Each triple $(X_i, T_i, \delta_i)$ represents one data instance, where $X_i$ is the feature vector of the $i$-th instance and $\delta_i$ is its censoring indicator: when $\delta_i = 1$ the instance is uncensored, that is, the event was observed, and when $\delta_i = 0$ the instance is censored, that is, no event was observed. $T_i$ denotes the survival time of the $i$-th instance. For uncensored data, $T_i$ equals the observed survival time $O_i$; for censored data, $T_i$ equals the censoring time $C_i$.

The features of the data set may be expressed as:

$X = [X_1, X_2, \ldots, X_N]^{\top} \in \mathbb{R}^{N \times M}$

where N is the number of samples and M is the number of features.

The labels of the data set may be expressed as:

$Y = \{(T_1, \delta_1), (T_2, \delta_2), \ldots, (T_i, \delta_i), \ldots, (T_N, \delta_N)\}$

(2) Treating the survival time as a set of discrete time points, the original label information of each sample is converted into a K-dimensional survival state vector, where $K = \max(T_i)$, $i = 1, 2, \ldots, N$, is the maximum survival time over all samples. Each element of the survival state vector indicates whether the event has occurred, has not occurred, or is unknown for the sample at that time point. The converted data set labels may be expressed as:

(3) A deep neural network with one input layer and multiple output layers is constructed. Its input is the feature matrix X of the data set and its output labels are Y; each output layer corresponds to one element y of Y, that is, each output layer corresponds to the event prediction task at a different time. The deep neural network can thus make predictions for the same task at K different times.

(4) A prediction model is constructed, whose objective function consists of five parts: logarithmic loss, L1 loss, L2 loss, semi-supervised loss, and ranking loss:

1) Logarithmic loss

For labeled data, in the binary classification problem that does not consider competing risks, the model uses the logarithmic loss to measure classifier accuracy by penalizing misclassification. The label is $y \in \{0, 1\}$. The parameter θ is estimated by maximum likelihood estimation, with the likelihood function:

where $l$ is the number of labeled samples and $p(X_i; \theta)$ is the posterior probability of sample $X_i$. Taking the logarithm of the likelihood function gives the log-likelihood function, that is, the log loss function:

That is, the greater the probability that each sample belongs to its true label, the better.
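
For reference, the standard Bernoulli likelihood and the matching log loss can be written as below. This is a sketch under stated assumptions rather than the formulas of the original filing: it assumes that $p(X_i; \theta)$ is the predicted probability of $y_i = 1$, and that the log loss entering the objective is the negative log-likelihood, so that minimizing it maximizes the likelihood. Whether the original also sums over the K time points is not stated here.

```latex
\begin{aligned}
L(\theta) &= \prod_{i=1}^{l} p(X_i;\theta)^{y_i}\,\bigl(1 - p(X_i;\theta)\bigr)^{1 - y_i} \\
l(\theta) &= -\sum_{i=1}^{l} \Bigl[\, y_i \log p(X_i;\theta) + (1 - y_i)\,\log\bigl(1 - p(X_i;\theta)\bigr) \Bigr]
\end{aligned}
```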

For the survival analysis problem with competing risks, event prediction at each time point is treated as a multi-class classification problem. Suppose that, given $X_i$, the conditional probability distribution of $y$ is $p(y_i = k \mid X_i; \theta)$, where $k = 1, 2, \ldots, C$ and $C$ is the number of all possible outcomes. The parameter θ is estimated by maximum likelihood estimation, and the corresponding log loss function is:

where $I\{y_i = k\}$ is an indicator function: when $y_i = k$, $I\{y_i = k\} = 1$; otherwise, $I\{y_i = k\} = 0$.
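
A multi-class log loss consistent with the indicator function just defined would take the usual categorical cross-entropy form. The expression below is an assumed reconstruction (negative log-likelihood convention, summation over the $l$ labeled samples only), not a quotation of the original formula.

```latex
l(\theta) = -\sum_{i=1}^{l} \sum_{k=1}^{C} I\{y_i = k\}\, \log p(y_i = k \mid X_i;\theta)
```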

2) L1 loss

L1(θ)＝||θ||₁

3) L2 loss

L2(θ)＝||θ||₂²

4) Semi-supervised loss

For unlabeled data, the unlabeled data are exploited by adding an entropy-constrained regularization term to the objective function.

For the binary classification problem that does not consider competing risks, the event state is a random variable following a Bernoulli distribution with parameter p, and its entropy is defined as:

H(p)＝-plog p-(1-p)log(1-p)

Then, for unlabeled data, the entropy-constrained regularization is defined as follows:

where u is the number of unlabeled samples and p is the probability that the event occurs. If the predicted class of the unlabeled data is nearly certain, the entropy-constrained regularization term is small.
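
The behavior of the entropy term can be illustrated with a short NumPy sketch. It assumes the regularizer is the plain sum of H(p) over the u unlabeled predictions (the original may normalize differently); the function names and the clipping constant `eps` are illustrative choices, not part of the source.

```python
import numpy as np

def binary_entropy(p, eps=1e-12):
    """H(p) = -p log p - (1 - p) log(1 - p), clipped to avoid log(0)."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return -p * np.log(p) - (1.0 - p) * np.log(1.0 - p)

def entropy_regularizer(p_unlabeled):
    """Assumed form: sum of H(p) over the predictions for unlabeled entries."""
    return float(np.sum(binary_entropy(p_unlabeled)))

# Confident predictions give a small term, uncertain ones a large term.
confident = entropy_regularizer([0.01, 0.99, 0.02])
uncertain = entropy_regularizer([0.5, 0.5, 0.5])
```

Minimizing this term pushes predictions for unlabeled entries away from 0.5, which is the sense in which the regularizer is small when the class is nearly certain.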

For the multi-class classification problem with competing risks, the entropy-constrained regularization of the unlabeled data is defined as follows:
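
A form consistent with the binary case would replace the Bernoulli entropy with the entropy of the categorical distribution over the C outcomes. The expression below is an assumption made by analogy with H(p), summed over the u unlabeled samples:

```latex
\Omega(\theta) = -\sum_{i=1}^{u} \sum_{k=1}^{C} p(y_i = k \mid X_i;\theta)\, \log p(y_i = k \mid X_i;\theta)
```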

5) Ranking loss

The non-increasing trend of the survival probability is enforced by adding a ranking loss to the objective function. The ranking loss is defined as follows:

where $p_{i,p}(y_i = 1 \mid X_i; \theta)$ denotes the probability that the death event has occurred for the $i$-th sample at time $p$. That is, when $p < q$, the event probability of the $i$-th sample should satisfy $p_{i,p}(y_i = 1 \mid X_i; \theta) < p_{i,q}(y_i = 1 \mid X_i; \theta)$; otherwise, a penalty is applied to the event probability. $I(p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta))$ is an indicator function: when $p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta)$, $I = 1$; otherwise, $I = 0$.
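
One way to realize such a penalty is sketched below. The indicator comes from the text; the size of the penalty (here the amount by which the earlier probability exceeds the later one, summed over all pairs p < q) is an assumption, as are the function name and the array layout.

```python
import numpy as np

def ranking_loss(event_prob):
    """event_prob[i, k]: predicted probability that the event has occurred
    for sample i at time point k (columns ordered by increasing time)."""
    P = np.asarray(event_prob, dtype=float)
    n_times = P.shape[1]
    total = 0.0
    for p in range(n_times):
        for q in range(p + 1, n_times):
            diff = P[:, p] - P[:, q]      # positive where the ordering is violated
            violated = diff > 0           # indicator I(p_{i,p} > p_{i,q})
            total += float(np.sum(diff[violated]))
    return total

monotone = np.array([[0.1, 0.2, 0.4]])    # event probability never decreases
violating = np.array([[0.1, 0.5, 0.3]])   # drops from 0.5 to 0.3
loss_ok = ranking_loss(monotone)
loss_bad = ranking_loss(violating)
```

A non-decreasing event probability is equivalent to a non-increasing survival probability, so a zero penalty corresponds to a valid predicted survival curve.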

In summary, the semi-supervised multitask survival analysis model based on deep learning, namely the objective function of the prediction model, is as follows:

L_total(θ)＝l(θ)+λ₁L1(θ)+λ₂L2(θ)+λ₃Ω(θ)+λ₄R(θ)

where l(θ) is the log loss, L1(θ) is the L1 loss, L2(θ) is the L2 loss, Ω(θ) is the semi-supervised loss, R(θ) is the ranking loss, and λ₁, λ₂, λ₃, λ₄ are parameters that control the strength of the regularization terms.
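
The weighted sum can be written directly as a small helper. This is a minimal sketch: the default values of the four λ weights are placeholders chosen for illustration, and the log loss, semi-supervised loss, and ranking loss are assumed to have been computed elsewhere and passed in as numbers.

```python
import numpy as np

def total_objective(log_loss, theta, semi_loss, rank_loss,
                    lam1=1e-4, lam2=1e-4, lam3=0.1, lam4=0.1):
    """L_total = l + lam1 * L1 + lam2 * L2 + lam3 * Omega + lam4 * R."""
    theta = np.asarray(theta, dtype=float)
    l1 = np.sum(np.abs(theta))   # L1(theta): sum of absolute values
    l2 = np.sum(theta ** 2)      # L2(theta): sum of squares
    return float(log_loss + lam1 * l1 + lam2 * l2
                 + lam3 * semi_loss + lam4 * rank_loss)

value = total_objective(log_loss=1.0, theta=[1.0, -2.0],
                        semi_loss=0.5, rank_loss=0.2,
                        lam1=0.1, lam2=0.01, lam3=1.0, lam4=2.0)
```

With these inputs the terms are 1.0 + 0.3 + 0.05 + 0.5 + 0.4.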

(5) The model is trained on the disease data to obtain the model parameters θ, thereby determining the prediction model. The prediction model is then applied to new disease data to obtain the disease prognosis prediction result.

Further, the step (2) converts the original survival analysis problem into a multi-task learning problem through a process of converting the label information into a vector.

Further, in the step (3), a hard sharing mechanism is adopted for hidden layer parameters in the deep neural network, so that the risk of overfitting is reduced.

Further, in step (4), the deep semi-supervised multitask learning problem obtained by transforming the survival analysis problem has two important characteristics: unlabeled data caused by censoring, and the non-increasing trend of the survival probability. For the unlabeled data caused by censoring, semi-supervised learning is performed using entropy-constrained regularization. For the non-increasing trend of the survival probability across time points, a ranking loss is introduced to constrain the survival probabilities of the different output layers. In addition, an L1 loss is introduced into the objective function to achieve automatic feature selection, and an L2 loss is introduced to avoid overfitting.

Further, the prediction result display module evaluates feature importance and visually displays the time-dependent and nonlinear effects of features. The importance of a given feature F is calculated as follows:

1) Select the corresponding test data and calculate the model prediction error, denoted error1.

2) Randomly add noise to feature F of all samples in the test data, recalculate the model prediction error, and denote it error2. For a continuous variable, add random noise drawn from the normal distribution $N(0, \sigma\epsilon)$, where $\sigma$ is the standard deviation of feature F and $\epsilon$ is a small constant. For a discrete variable, $x_F \to x_F (1 - s) + (1 - x_F) s$, where $s$ is a noise term following a Bernoulli distribution and $x_F$ is the value of feature F.

3) Calculate the difference e between the two prediction errors: e = error2 - error1.

4) Repeat steps 1) to 3) n times.

5) The importance of feature F is calculated as:

If adding random noise greatly reduces the accuracy on the test data, the feature has a large influence on the prediction results of the samples, and its importance is therefore higher.
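
The steps above can be sketched as follows. Several points are assumptions: the importance is taken to be the mean of e over the n repetitions, the noise scale of the continuous case is read as a standard deviation σ·ε, the discrete case is treated as a binary feature flipped with probability `flip_prob`, and `predict` and `error_fn` are hypothetical callables standing in for the trained model and its error metric. Because error1 does not change between repetitions, it is computed once.

```python
import numpy as np

def feature_importance(predict, error_fn, X, y, feature, n_repeats=500,
                       epsilon=0.1, flip_prob=0.1, discrete=False, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    error1 = error_fn(y, predict(X))                      # step 1
    sigma = X[:, feature].std()
    diffs = []
    for _ in range(n_repeats):                            # step 4
        X_noisy = X.copy()
        col = X_noisy[:, feature]
        if discrete:
            s = rng.binomial(1, flip_prob, size=col.shape)        # Bernoulli noise
            X_noisy[:, feature] = col * (1 - s) + (1 - col) * s   # flip where s == 1
        else:
            noise = rng.normal(0.0, sigma * epsilon, size=col.shape)
            X_noisy[:, feature] = col + noise
        error2 = error_fn(y, predict(X_noisy))            # step 2
        diffs.append(error2 - error1)                     # step 3
    return float(np.mean(diffs))                          # step 5 (assumed: mean of e)

# Toy check: a "model" that only looks at feature 0.
rng_demo = np.random.default_rng(1)
X_demo = rng_demo.normal(size=(200, 2))
y_demo = X_demo[:, 0]
toy_predict = lambda X: X[:, 0]
mse = lambda y_true, y_pred: float(np.mean((y_true - y_pred) ** 2))
imp_used = feature_importance(toy_predict, mse, X_demo, y_demo, feature=0, n_repeats=20)
imp_unused = feature_importance(toy_predict, mse, X_demo, y_demo, feature=1, n_repeats=20)
```

In the toy check, perturbing the feature the model relies on raises the error, while perturbing the ignored feature leaves it unchanged.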

Furthermore, the prediction result display module visually displays the influence of features on prognosis by plotting the predicted cumulative incidence curves corresponding to different features. The predicted cumulative incidence curve for a given feature F is plotted as follows:

1) All possible values of feature F are $x_{F,1}, x_{F,2}, \ldots, x_{F,v}, \ldots, x_{F,V}$, where $V$ is the number of possible values of feature F.

2) Set the value of feature F to $x_F = x_{F,v}$, $v = 1, 2, \ldots, V$, keep the values of the other features unchanged, and calculate the average of the model-predicted cumulative incidence:

where

is the average of the model prediction outputs over all data,

is the model prediction output for the $i$-th instance, and $x_{i,o}$ denotes the values of all features of the $i$-th instance other than feature F.

3) Plot the averages obtained in step 2) as a curve.

Further, when plotting the predicted cumulative incidence curve for a continuous variable, the value range of the variable is divided into R equal parts, and only the values at the division points are used for cumulative incidence estimation and curve plotting, which reduces the amount of computation. R is chosen according to the value range of the specific feature.
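
The curve computation resembles a partial dependence calculation and can be sketched as below. The assumptions are that `predict` (a hypothetical stand-in for the trained model) returns an array of shape (N, K) holding the predicted cumulative incidence at each of the K time points, and that dividing the range into R equal parts yields R + 1 division points including both endpoints.

```python
import numpy as np

def predicted_incidence_curves(predict, X, feature, n_bins=None):
    """Return {value of feature F: mean predicted cumulative incidence per time point}."""
    X = np.asarray(X, dtype=float)
    col = X[:, feature]
    if n_bins is None:
        values = np.unique(col)                       # discrete: every observed value
    else:
        values = np.linspace(col.min(), col.max(), n_bins + 1)   # continuous: R parts
    curves = {}
    for v in values:
        X_mod = X.copy()
        X_mod[:, feature] = v                         # fix F, keep other features
        curves[float(v)] = predict(X_mod).mean(axis=0)   # average over all samples
    return curves

def toy_predict(X):
    # Toy model: incidence grows with time and with feature 0.
    times = np.array([1.0, 2.0, 3.0])
    return np.clip(0.1 * X[:, [0]] * times, 0.0, 1.0)

X_demo = np.array([[0.0, 5.0], [1.0, 7.0], [1.0, 9.0]])
curves = predicted_incidence_curves(toy_predict, X_demo, feature=0)
```

Each entry of `curves` is one line to plot against the time points; comparing the lines for different values of F shows how the feature shifts the predicted cumulative incidence over time.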

The invention has the beneficial effects that:

Built on a deep neural network model, the invention converts the survival analysis problem into a multi-task learning model composed of semi-supervised learning problems, each predicting the survival probability at one of multiple time points. The deep neural network structure can fit the nonlinear effects of features. The model directly models the survival probability, does not depend on the proportional hazards assumption, can fit time-dependent effects, and has better interpretability.

Taking into account both the censored data and the non-increasing trend of the survival probability in survival analysis, the invention fits the data with a semi-supervised loss function and a ranking loss function, makes full use of both complete and censored data, and can handle both the traditional survival analysis problem and the survival analysis problem with competing risks. Through multi-task learning over multiple time points, the model shares data among the prediction tasks and imposes mutual constraints among them, which improves its generalization ability. A feature importance evaluation method is also provided, and the time-dependent and nonlinear effects of features are displayed visually.

Drawings

FIG. 1 is a diagram of a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to the present invention;

FIG. 2 is a schematic diagram of a dataset tag transformation;

FIG. 3 is a diagram of the neural network architecture.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.

In this application, censored data are defined as follows: data for which the outcome event has not occurred by the specified end time are called censored data, and the time from the starting point to censoring is called the censoring time. The time-dependence phenomenon is defined relative to the proportional hazards assumption: regardless of the baseline hazard, at any time point the ratio of the hazard of the event for an individual with a given exposure to that for an individual without the exposure is constant; features that do not satisfy this assumption are considered to have a time-dependent effect on disease prognosis. Competing risks are defined as follows: during the follow-up period for disease prognosis, a patient may fail to experience the event of interest because some other event occurs, that is, other events "compete" with the event of interest; such events are called competing risks. Competing risks arise only in survival analysis problems that have multiple endpoint events, of which only one can occur at any given time.

As shown in FIG. 1, the present application provides a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, which includes: a data acquisition module for acquiring disease prognosis data; a data preprocessing module for performing missing value processing and normalization on the disease prognosis data; a prediction model construction module for modeling the disease prognosis data; and a prediction result display module for visually displaying the prediction results. The prediction model construction module adopts a survival analysis method based on deep semi-supervised multitask learning, whose implementation principle is as follows:

(1) In survival analysis of prognostic data, a given data set is denoted as $D = \{(X_1, T_1, \delta_1), (X_2, T_2, \delta_2), \ldots, (X_i, T_i, \delta_i), \ldots, (X_N, T_N, \delta_N)\}$. Each triple $(X_i, T_i, \delta_i)$ represents one data instance, where $X_i$ is the feature vector of the $i$-th instance and $\delta_i$ is its censoring indicator: when $\delta_i = 1$ the instance is uncensored, that is, the event was observed, and when $\delta_i = 0$ the instance is censored, that is, no event was observed. $T_i$ denotes the survival time of the $i$-th instance. For uncensored data, $T_i$ equals the observed survival time $O_i$; for censored data, $T_i$ equals the censoring time $C_i$.

The features of the data set may be expressed as:

$X = [X_1, X_2, \ldots, X_N]^{\top} \in \mathbb{R}^{N \times M}$

where N is the number of samples and M is the number of features.

The labels of the data set may be expressed as:

$Y = \{(T_1, \delta_1), (T_2, \delta_2), \ldots, (T_i, \delta_i), \ldots, (T_N, \delta_N)\}$

(2) The invention treats the survival time as a set of discrete time points rather than as a continuous variable. The original label information of each sample can therefore be converted into a survival state vector of dimension K, where $K = \max(T_i)$, $i = 1, 2, \ldots, N$, is the maximum survival time over all samples. Each element of the survival state vector indicates, for the sample at that time point, whether the event has occurred (value 1), has not occurred (value 0), or is unknown (value 2). An example of the data set label conversion is shown in FIG. 2. The converted data set labels may be expressed as:

By converting the label information into vectors, the original survival analysis problem is converted into a multi-task learning problem.
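
The label conversion can be illustrated with a short NumPy sketch. It relies on assumptions not spelled out in the text: survival times are positive integers, the time points are 1, 2, ..., K, an uncensored sample is marked 1 from its event time onward, and a censored sample is marked 0 up to and including its censoring time and unknown (2) afterwards. The function name is illustrative.

```python
import numpy as np

UNKNOWN = 2   # state after censoring, when the event status cannot be known

def to_state_vectors(T, delta):
    """Convert (T_i, delta_i) labels into an N x K matrix of survival states."""
    T = np.asarray(T, dtype=int)
    delta = np.asarray(delta, dtype=int)
    K = int(T.max())                        # K = max(T_i)
    times = np.arange(1, K + 1)             # assumed time points 1..K
    Y = np.zeros((len(T), K), dtype=int)    # 0: event has not occurred
    for i, (t_i, d_i) in enumerate(zip(T, delta)):
        if d_i == 1:
            Y[i, times >= t_i] = 1          # event observed at t_i
        else:
            Y[i, times > t_i] = UNKNOWN     # censored: later states unknown
    return Y

Y_demo = to_state_vectors(T=[2, 3, 4], delta=[1, 0, 1])
```

Each column of the resulting matrix is the label vector of one prediction task, and the entries equal to 2 are the unlabeled data that the semi-supervised loss is meant to exploit.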

(3) A deep neural network with one input layer and multiple output layers is used. Its input is the feature matrix X of the data set and its output labels are Y; each output layer corresponds to one element y of Y, that is, each output layer corresponds to the event prediction task at a different time. FIG. 3 shows a deep neural network with K output layers. If output $k$ corresponds to the task at time $T_k$, the network can make predictions for the same task at K different times. The hidden layer parameters of the network use a hard sharing mechanism, which reduces the risk of overfitting. Intuitively, the more tasks are learned simultaneously, the more the model must capture a feature representation common to all of them, and the lower the risk of overfitting on any single task.
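
Hard parameter sharing can be sketched as a forward pass in NumPy. The architecture details here are assumptions for illustration only: a single shared hidden layer with ReLU activation, one sigmoid output unit per time point, and small random initial weights. The source does not fix the depth, width, or activation functions.

```python
import numpy as np

def init_params(n_features, n_hidden, n_tasks, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W_shared": rng.normal(0.0, 0.1, size=(n_features, n_hidden)),  # shared by all tasks
        "b_shared": np.zeros(n_hidden),
        "W_heads": rng.normal(0.0, 0.1, size=(n_hidden, n_tasks)),      # column k: output layer k
        "b_heads": np.zeros(n_tasks),
    }

def forward(params, X):
    """Return an (N, K) array of event probabilities, one column per time point."""
    hidden = np.maximum(0.0, X @ params["W_shared"] + params["b_shared"])  # ReLU
    logits = hidden @ params["W_heads"] + params["b_heads"]
    return 1.0 / (1.0 + np.exp(-logits))                                   # sigmoid per task

params = init_params(n_features=5, n_hidden=8, n_tasks=4)
probs = forward(params, np.ones((3, 5)))
```

Every task's gradient flows through the same `W_shared`, which is what makes the K prediction tasks constrain one another.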

(4) Objective function definition

The deep semi-supervised multitask learning problem obtained by transforming the survival analysis problem has two important characteristics: unlabeled data caused by censoring, and the non-increasing trend of the survival probability. Appropriate constraints must be designed to handle both. For the unlabeled data caused by censoring, entropy-constrained regularization is used for semi-supervised learning; if the predicted class of the unlabeled data is nearly certain, the entropy-constrained regularization term is small. To account for the non-increasing trend of the survival probability across time points, a ranking loss is introduced to constrain the survival probabilities of the different output layers. In addition, an L1 loss is introduced into the objective function to achieve automatic feature selection, and an L2 loss is introduced to avoid overfitting. The objective function of the model consists of five parts: logarithmic loss, L1 loss, L2 loss, semi-supervised loss, and ranking loss.

1) Logarithmic loss

For the survival analysis problem with competing risks, event prediction at each time point is treated as a multi-class classification problem. Suppose that, given $X_i$, the conditional probability distribution of $y$ is $p(y_i = k \mid X_i; \theta)$, where $k = 1, 2, \ldots, C$ and $C$ is the number of all possible outcomes. The model for the multi-class problem $y \in \{1, 2, \ldots, C\}$ is an extension of the binary model; its parameters can likewise be estimated by maximum likelihood estimation, and the corresponding log loss function is:

where $I\{y_i = k\}$ is an indicator function: when $y_i = k$, $I\{y_i = k\} = 1$; otherwise, $I\{y_i = k\} = 0$.

2) L1 loss

The L1 loss is defined as follows:

L1(θ)＝||θ||₁

The L1 loss adds the sum of the absolute values of all weight parameters θ to the objective function, which drives more of the θ to zero and thereby achieves automatic feature selection.

3) L2 loss

The L2 loss is defined as follows:

L2(θ)＝||θ||₂²

The L2 loss adds the sum of the squares of all weight parameters θ to the objective function, which keeps all θ as close to zero as possible and avoids overfitting.

4) Semi-supervised loss

For unlabeled data, the unlabeled data can be exploited by adding an entropy-constrained regularization term to the objective function. For the binary classification problem that does not consider competing risks, the event state is a random variable following a Bernoulli distribution with parameter p, and its entropy is defined as:

H(p)＝-plog p-(1-p)log(1-p)

5) Ranking loss

where $p_{i,p}(y_i = 1 \mid X_i; \theta)$ denotes the probability that the death event has occurred for the $i$-th sample at time $p$. That is, when $p < q$, the event probability of the $i$-th sample should satisfy $p_{i,p}(y_i = 1 \mid X_i; \theta) < p_{i,q}(y_i = 1 \mid X_i; \theta)$; otherwise, a penalty is applied to the event probability. $I(p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta))$ is an indicator function: when $p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta)$, $I = 1$; otherwise, $I = 0$.

L_total(θ)＝l(θ)+λ₁L1(θ)+λ₂L2(θ)+λ₃Ω(θ)+λ₄R(θ)

(5) Importance of features

The importance of a given feature F is calculated as follows:

1) Select the corresponding test data and calculate the model prediction error, denoted error1.

2) Randomly add noise to feature F of all samples in the test data (that is, randomly perturb the value of each sample on feature F), recalculate the model prediction error, and denote it error2. For a continuous variable, add random noise drawn from the normal distribution $N(0, \sigma\epsilon)$, where $\sigma$ is the standard deviation of feature F and $\epsilon$ is a small constant. For a discrete variable, $x_F \to x_F (1 - s) + (1 - x_F) s$, where $s$ is a noise term following a Bernoulli distribution and $x_F$ is the value of feature F.

3) Calculate the difference e between the two prediction errors: e = error2 - error1.

4) Repeat steps 1) to 3) n times, where n is usually greater than 500.

5) The importance of feature F is calculated as:

This quantity describes the importance of the feature because, if adding random noise greatly reduces the accuracy on the test data (that is, error2 increases), the feature has a large influence on the prediction results of the samples, and its importance is therefore higher.

(6) Visualization of feature impact on prognosis

The influence of features on prognosis is displayed visually by plotting the predicted cumulative incidence curves corresponding to different features. The predicted cumulative incidence curve for a given feature F is plotted as follows:

1) All possible values of feature F are $x_{F,1}, x_{F,2}, \ldots, x_{F,v}, \ldots, x_{F,V}$, where $V$ is the number of possible values of feature F.

2) Set the value of feature F to $x_F = x_{F,v}$, $v = 1, 2, \ldots, V$, keep the values of the other features unchanged, and calculate the average of the model-predicted cumulative incidence:

where

is the average of the model prediction outputs over all data.

3) Plot the averages obtained in step 2) as a curve. For continuous variables, the value range of the variable can be divided into R equal parts, and only the values at the division points are used for cumulative incidence estimation and curve plotting, which reduces the amount of computation. R is usually chosen according to the value range of the specific feature.

The method uses a deep neural network structure to fit the nonlinear structure of the data, and the network can be flexibly extended according to the dimension of the input data, the length of the survival time, and the required model accuracy. The model directly models the survival probability, does not depend on the proportional hazards assumption, can fit the time-dependent effects of features, and has better interpretability. Complete and censored data are both fully used through the logarithmic loss function and the semi-supervised loss function; the non-increasing behavior of the survival probability is exploited through the ranking loss function; and automatic feature selection and the prevention of overfitting are achieved through the L1 and L2 loss functions. Through multi-task learning over multiple time points, the model shares data among the prediction tasks and imposes mutual constraints among them, which improves its generalization ability. The model can handle both the traditional survival analysis problem and the survival analysis problem with competing risks. A feature importance evaluation method based on the deep learning model is provided, and the time-dependent and nonlinear effects of features on prognosis are displayed visually.

The foregoing is only a preferred embodiment of the present invention. Although the present invention has been disclosed through preferred embodiments, they are not intended to limit it. Using the methods and technical content disclosed above, those skilled in the art can make many possible variations and modifications to the technical solution of the present invention, or modify it into equivalent embodiments, without departing from its scope. Therefore, any simple modification, equivalent change, or adaptation made to the above embodiments according to the technical essence of the present invention, provided it does not depart from the content of the technical solution of the present invention, still falls within the scope of protection of that technical solution.

Claims

1. A disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, comprising: a data acquisition module for acquiring disease prognosis data; a data preprocessing module for performing missing value processing and normalization on the disease prognosis data; a prediction model construction module for modeling the disease prognosis data; and a prediction result display module for displaying the prediction results; wherein the prediction model construction module adopts a survival analysis method based on deep semi-supervised multitask learning, with the following specific steps:

(1) In survival analysis of prognostic data, a given data set is denoted as $D = \{(X_1, T_1, \delta_1), (X_2, T_2, \delta_2), \ldots, (X_i, T_i, \delta_i), \ldots, (X_N, T_N, \delta_N)\}$. Each triple $(X_i, T_i, \delta_i)$ represents one data instance, where $X_i$ is the feature vector of the $i$-th instance and $\delta_i$ is its censoring indicator: when $\delta_i = 1$ the instance is uncensored, that is, the event was observed, and when $\delta_i = 0$ the instance is censored, that is, no event was observed. $T_i$ denotes the survival time of the $i$-th instance. For uncensored data, $T_i$ equals the observed survival time $O_i$; for censored data, $T_i$ equals the censoring time $C_i$.

The features of the data set may be expressed as:

$X = [X_1, X_2, \ldots, X_N]^{\top} \in \mathbb{R}^{N \times M}$

where N is the number of samples and M is the number of features.

The labels of the data set may be expressed as:

$Y = \{(T_1, \delta_1), (T_2, \delta_2), \ldots, (T_i, \delta_i), \ldots, (T_N, \delta_N)\}$

(2) Treating the survival time as a set of discrete time points, the original label information of each sample is converted into a K-dimensional survival state vector, where $K = \max(T_i)$, $i = 1, 2, \ldots, N$, is the maximum survival time over all samples. Each element of the survival state vector indicates whether the event has occurred, has not occurred, or is unknown for the sample at that time point. The converted data set labels may be expressed as:

1) Logarithmic loss

For the survival analysis problem with competing risks, event prediction at each time point is treated as a multi-class classification problem. Suppose that, given $X_i$, the conditional probability distribution of $y$ is $p(y_i = k \mid X_i; \theta)$, where $k = 1, 2, \ldots, C$ and $C$ is the number of all possible outcomes. The parameter θ is estimated by maximum likelihood estimation, and the corresponding log loss function is:

2) L1 loss

L1(θ) = ||θ||₁

3) L2 loss

L2(θ) = ||θ||₂²

4) Semi-supervised loss

H(p) = -p log p - (1 - p) log(1 - p)

5) Ordering loss

where p_{i,p}(y_i = 1 | X_i; θ) represents the probability of a death event occurring at time point p for the i-th sample. That is, when p < q, the event probabilities of the i-th sample should satisfy p_{i,p}(y_i = 1 | X_i; θ) < p_{i,q}(y_i = 1 | X_i; θ); otherwise a penalty is applied to the event probabilities. I(p_{i,p}(y_i = 1 | X_i; θ) > p_{i,q}(y_i = 1 | X_i; θ)) is an indicator function: I = 1 when p_{i,p}(y_i = 1 | X_i; θ) > p_{i,q}(y_i = 1 | X_i; θ), and I = 0 otherwise.

L_total(θ)＝l(θ)+λ₁L1(θ)+λ₂L2(θ)+λ₃Ω(θ)+λ₄R(θ)

where l(θ) is the logarithmic loss, L1(θ) is the L1 loss, L2(θ) is the L2 loss, Ω(θ) is the semi-supervised loss, R(θ) is the ordering loss, and λ₁, λ₂, λ₃, λ₄ are parameters that control the strength of the regularization terms.
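The way the five terms combine into L_total(θ) can be sketched for a single sample as follows. This is an illustration under stated assumptions, not the exact formulas of the claim: the binary (single-event) form of the logarithmic loss is used, unlabeled time points are marked with a hypothetical code -1, Ω(θ) is taken as the sum of H(p) over unlabeled time points, and R(θ) simply counts the pairs of time points flagged by the indicator function I. All function names are invented for the example.

```python
import math

UNKNOWN = -1  # hypothetical label code for time points after censoring


def log_loss(probs, labels):
    """l(theta): binary log loss summed over labeled time points only."""
    total = 0.0
    for p, y in zip(probs, labels):
        if y == UNKNOWN:
            continue  # unlabeled time points carry no supervised signal
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def entropy(p):
    """H(p) = -p log p - (1 - p) log(1 - p)."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def semi_supervised_loss(probs, labels):
    """Omega(theta): entropy of the predictions at unlabeled time points."""
    return sum(entropy(p) for p, y in zip(probs, labels) if y == UNKNOWN)


def ordering_loss(probs):
    """R(theta): number of pairs p < q whose event probability decreases."""
    K = len(probs)
    return sum(1 for a in range(K) for b in range(a + 1, K)
               if probs[a] > probs[b])


def total_loss(probs, labels, weights, lambdas):
    """L_total = l + lam1 * L1 + lam2 * L2 + lam3 * Omega + lam4 * R."""
    lam1, lam2, lam3, lam4 = lambdas
    l1 = sum(abs(w) for w in weights)  # L1 loss on the parameters
    l2 = sum(w * w for w in weights)   # squared L2 loss on the parameters
    return (log_loss(probs, labels) + lam1 * l1 + lam2 * l2
            + lam3 * semi_supervised_loss(probs, labels)
            + lam4 * ordering_loss(probs))


# One censored sample: two labeled time points, two unlabeled ones.
probs = [0.1, 0.3, 0.2, 0.5]   # predicted event probabilities per time point
labels = [0, 0, UNKNOWN, UNKNOWN]
print(total_loss(probs, labels, weights=[0.5, -1.0],
                 lambdas=(0.01, 0.01, 0.1, 1.0)))
```

In a trainable model the counting form of R(θ) would normally be replaced by a differentiable penalty, for example one proportional to the size of each violation; that choice is outside what the text above specifies.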

2. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis as claimed in claim 1, wherein step (2) converts the original survival analysis problem into a multitask learning problem by converting the label information into survival state vectors.

3. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis as claimed in claim 1, wherein in the step (3), the hidden layer parameters in the deep neural network adopt a hard sharing mechanism, so as to reduce the risk of overfitting.
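A hard sharing mechanism of this kind can be illustrated with a forward pass in which every time-point task reads the same hidden representation and only the output heads are task-specific. The sketch below is an assumption-laden toy, not the claimed network: it uses a single shared hidden layer, ReLU hidden units, sigmoid outputs, and hypothetical parameter names (`shared_w`, `head_w`), none of which are specified in the claim.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, shared_w, shared_b, head_w, head_b):
    """Forward pass with one shared hidden layer and K output heads.

    shared_w -- H x M weights of the hidden layer used by every task
    head_w   -- K x H weights, one row per time-point prediction task
    """
    # Shared hidden layer (ReLU assumed): the same parameters serve all tasks.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(shared_w, shared_b)]
    # Task-specific heads: one event probability per time point.
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(head_w, head_b)]


x = [1.0, 2.0]                                # M = 2 features
shared_w = [[0.5, -0.25], [0.1, 0.2]]         # H = 2 shared hidden units
shared_b = [0.0, 0.0]
head_w = [[1.0, 2.0], [0.0, -2.0], [0.0, 0.0]]  # K = 3 time-point tasks
head_b = [0.0, 0.0, 0.0]
print(forward(x, shared_w, shared_b, head_w, head_b))
```

Because the hidden-layer parameters receive gradient signal from all K tasks at once, they are constrained by every time point, which is the usual argument for why hard sharing lowers the risk of overfitting any single task.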

4. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis as claimed in claim 1, wherein in step (4), the deep semi-supervised multitask learning problem obtained by transforming the survival analysis problem has two important characteristics: unlabeled data caused by censoring (deletion), and a non-increasing trend of the survival probability over time. For the unlabeled data caused by censoring, semi-supervised learning is performed using entropy-constraint regularization. For the non-increasing trend of the survival probability across time points, an ordering loss is introduced to constrain the survival probabilities of the different output layers. In addition, introducing the L1 loss into the objective function achieves automatic feature selection, and introducing the L2 loss avoids overfitting.

5. The disease prognosis prediction system based on the deep semi-supervised multitask learning survival analysis as claimed in claim 1, wherein the prediction result display module is used for feature importance evaluation and visually displaying the time dependence and the nonlinear effect of features. The specific steps for calculating the importance of a certain feature F are as follows:

4) Repeat steps 1) to 3) n times.

5) The importance of feature F is calculated as follows:

6. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis as claimed in claim 5, wherein the prediction result display module visually displays the influence of features on the prognosis by plotting the predicted cumulative incidence curves corresponding to different features. The specific steps for plotting the predicted cumulative incidence curve corresponding to a feature F are as follows:

1) Let all possible values of feature F be x_{F,1}, x_{F,2}, ..., x_{F,v}, ..., x_{F,V}, where V is the number of possible values of feature F.

2) Set the value of feature F to x_F = x_{F,v}, v = 1, 2, ..., V, keep the values of the other features unchanged, and calculate the average of the model-predicted cumulative incidence. The average is taken over the model prediction outputs for all pieces of data, where the model prediction output of the i-th piece of data is computed with feature F set to x_{F,v} and with x_{i,o}, the values of all features in the i-th piece of data other than feature F, left unchanged.

3) Plot the averages obtained in step 2) against the corresponding values of feature F as a curve.
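Steps 1) to 3) can be sketched as follows. The code is an illustration only: `toy_model` is a hypothetical stand-in for the trained network and returns a single cumulative incidence value, whereas the real model would return one value per time point, so the averaging would be repeated for each time point. The function name `average_prediction_curve` is likewise invented for the example.

```python
def average_prediction_curve(model, data, feature_index, values):
    """Average the model output over all rows for each value of feature F.

    For every candidate value, feature F is overwritten in each row while
    the other features keep their observed values.
    """
    curve = []
    for v in values:
        total = 0.0
        for row in data:
            modified = list(row)         # copy, so the data set is unchanged
            modified[feature_index] = v  # set x_F = x_{F,v}
            total += model(modified)
        curve.append((v, total / len(data)))
    return curve


def toy_model(x):
    """Hypothetical predictor standing in for the trained network."""
    return min(1.0, 0.1 * x[0] + 0.05 * x[1])


data = [[0, 2], [1, 4], [2, 6]]  # three samples, two features
for value, mean_incidence in average_prediction_curve(toy_model, data, 0, [0, 1, 2]):
    print(value, round(mean_incidence, 3))
```

The list of (value, average) pairs is what step 3) plots; any plotting library can draw it as a line.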

7. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to claim 6, wherein, in the process of plotting the predicted cumulative incidence curve, for a continuous variable the value range of the variable is evenly divided into R equal parts, and the values at all division points are used for cumulative incidence estimation and curve plotting, which reduces the amount of computation; R is determined according to the value range of the specific feature.
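The division of a continuous variable's range can be sketched as follows. Whether the two end points of the range count as division points is not stated in the claim; this example assumes they do, so R equal parts yield R + 1 evaluation values. The function name `division_points` is hypothetical.

```python
def division_points(low, high, r):
    """Split [low, high] into r equal parts and return the r + 1 end points."""
    if r < 1:
        raise ValueError("r must be at least 1")
    step = (high - low) / r
    return [low + k * step for k in range(r + 1)]


# A feature ranging from 0 to 10, divided into R = 5 equal parts.
print(division_points(0.0, 10.0, 5))
```

Evaluating the curve only at these points replaces one model pass per distinct observed value with R + 1 passes, which is where the reduction in computation comes from.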