CN111640510A - Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis - Google Patents

Info

Publication number
CN111640510A
Authority
CN
China
Prior art keywords
data
prediction
loss
model
survival
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010273957.9A
Other languages
Chinese (zh)
Inventor
李劲松
池胜强
田雨
周天舒
叶前呈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202010273957.9A priority Critical patent/CN111640510A/en
Publication of CN111640510A publication Critical patent/CN111640510A/en
Priority to PCT/CN2021/073136 priority patent/WO2021203796A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/20 ICT specially adapted for the handling or processing of medical references relating to practices or guidelines

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Pathology (AREA)
  • Bioethics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, comprising a data acquisition module, a data preprocessing module, a prediction model construction module, and other modules. Built on a deep neural network, the invention converts the survival analysis problem into a multitask learning model in which each task is a semi-supervised problem of predicting the survival probability at one of multiple time points. The model directly models the survival probability, does not depend on the proportional hazards assumption, can fit time-dependent effects, and has better interpretability. The data are fitted with a semi-supervised loss function and a ranking loss function, making full use of both complete data and censored data, and the model handles both the traditional survival analysis problem and the survival analysis problem with competing risks. Through multitask learning over multiple time points, the model shares data among the prediction tasks and lets them constrain one another, which improves its generalization ability.

Description

Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis
Technical Field
The invention belongs to the technical field of medical treatment and machine learning, and particularly relates to a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis.
Background
Disease prognosis prediction gives clinicians prognostic information for treatment, supports the formulation of treatment plans, improves cure rates and patients' quality of life after treatment, and effectively reduces the disease burden; it is therefore of great significance for the control and treatment of disease. Survival analysis is a data analysis method commonly used in disease prognosis prediction to analyze and predict the time until an event occurs. In medicine, it plays a key role in determining the course of treatment, developing new drugs, preventing adverse drug reactions, and improving hospital procedures. Recently, with the rise of deep learning models and improvements in training techniques, research applying deep learning architectures such as deep neural networks, convolutional neural networks, and long short-term memory networks to disease prognosis prediction has increased. In addition, advanced machine learning strategies, including active learning, transfer learning, and multitask learning, are gradually being applied to deep-learning-based survival analysis to improve prognosis prediction performance.
Censored data are ubiquitous in disease prognosis data. Censored data are not missing data; they are incomplete data that provide prognosis information only from the starting point to the censoring time, and not the complete information from the starting point to the occurrence of the event. Existing deep-learning-based methods either cannot make full use of censored data, or cannot effectively handle the time dependence of features while making full use of censored data, or generalize insufficiently, or are poorly interpretable. Existing methods based on multitask learning cannot make full use of censored data.
Disclosure of Invention
The invention aims to provide a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, aiming at the defects of the prior art.
Built on a deep neural network, the invention converts the survival analysis problem into a multitask learning model in which each task is a semi-supervised problem of predicting the survival probability at one of multiple time points. Taking into account the censored data and the non-increasing trend of the survival probability in survival analysis, the data are fitted with a semi-supervised loss function and a ranking loss function, so that both the traditional survival analysis problem and the survival analysis problem with competing risks can be handled. A method for evaluating feature importance is also provided, and the time dependence and nonlinear effects of features are displayed visually.
The deep neural network in the model contains multiple layers of nonlinear transformation units and can therefore fit the nonlinear effects of features. The model directly models the survival probability, does not depend on the proportional hazards assumption, can fit time-dependent effects, and has better interpretability. The model makes full use of complete data and censored data through the log loss function and the semi-supervised loss function, exploits the non-increasing trend of the survival probability through the ranking loss function, and achieves automatic feature selection and prevents overfitting through the L1 and L2 loss functions. Through multitask learning over multiple time points, the model shares data among the prediction tasks and lets them constrain one another, which improves its generalization ability.
The purpose of the invention is achieved by the following technical scheme: a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, comprising: a data acquisition module for acquiring disease prognosis data; a data preprocessing module for performing missing value processing and normalization on the disease prognosis data; a prediction model construction module for modeling the disease prognosis data; and a prediction result display module for displaying the prediction results. The prediction model construction module adopts a survival analysis method based on deep semi-supervised multitask learning, with the following specific steps:
(1) In survival analysis of prognostic data, a given data set is denoted $D = \{(X_1, T_1, \delta_1), (X_2, T_2, \delta_2), \ldots, (X_i, T_i, \delta_i), \ldots, (X_N, T_N, \delta_N)\}$. Each triple $(X_i, T_i, \delta_i)$ represents one data instance, where $X_i$ is the feature vector of the i-th instance and $\delta_i$ is its censoring indicator: $\delta_i = 1$ means the instance is uncensored, that is, the event was observed, and $\delta_i = 0$ means the instance is censored, that is, no event was observed. $T_i$ is the survival time of the i-th instance. For uncensored data, $T_i$ equals the observed survival time $O_i$; for censored data, $T_i$ equals the censoring time $C_i$:
$$T_i = \begin{cases} O_i, & \delta_i = 1 \\ C_i, & \delta_i = 0 \end{cases}$$
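The features of the data set can be expressed as:
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,M} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,M} \end{bmatrix}$$
where N is the number of samples and M is the number of features.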
The labels of the data set may be expressed as:
$Y = \{(T_1, \delta_1), (T_2, \delta_2), \ldots, (T_i, \delta_i), \ldots, (T_N, \delta_N)\}$
(2) The survival time is treated as a set of discrete time points, and the original label information of each sample is converted into a K-dimensional survival state vector, where $K = \max(T_i)$, $i = 1, 2, \ldots, N$, is the maximum survival time over all samples. Each element of the survival state vector indicates whether, at the corresponding time point, the event has occurred, has not occurred, or is unknown for that sample. The converted data set labels can be expressed as:
$$Y = \begin{bmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,K} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,K} \\ \vdots & \vdots & \ddots & \vdots \\ y_{N,1} & y_{N,2} & \cdots & y_{N,K} \end{bmatrix}$$
(3) Construct a deep neural network with one input layer and a plurality of output layers. The input of the network is the feature matrix X of the data set and the output label is Y; each output layer corresponds to one y in Y, that is, to the event prediction task at a different time. The deep neural network can thus make predictions for the same task at K different times.
(4) Construct the prediction model, whose objective function consists of five parts: log loss, L1 loss, L2 loss, semi-supervised loss, and ranking loss:
1) Logarithmic loss
For labeled data in the binary classification problem without competing risks, the model uses the log loss, which penalizes misclassification, to measure the accuracy of the classifier. The label is $y \in \{0, 1\}$. The parameter $\theta$ is estimated by maximum likelihood estimation, with the likelihood function:
$$L(\theta) = \prod_{i=1}^{l} p(X_i; \theta)^{y_i} \, \bigl(1 - p(X_i; \theta)\bigr)^{1 - y_i}$$
where $l$ is the number of labeled samples and $p(X_i; \theta)$ is the posterior probability of sample $X_i$. Taking the logarithm of the likelihood function and negating it gives the negative log-likelihood, which is the log loss function:
$$l(\theta) = -\sum_{i=1}^{l} \Bigl[ y_i \log p(X_i; \theta) + (1 - y_i) \log\bigl(1 - p(X_i; \theta)\bigr) \Bigr]$$
That is, the higher the probability assigned to each sample's true label, the better.
For the survival analysis problem with competing risks, the event prediction at each time point is treated as a multi-class classification problem. Suppose that, given $X_i$, the conditional probability distribution of $y$ is $p(y_i = k \mid X_i; \theta)$, where $k = 1, 2, \ldots, C$ and $C$ is the number of all possible outcomes. The parameter $\theta$ is estimated by maximum likelihood estimation, and the corresponding log loss function is:
$$l(\theta) = -\sum_{i=1}^{l} \sum_{k=1}^{C} I\{y_i = k\} \log p(y_i = k \mid X_i; \theta)$$
where $I\{y_i = k\}$ is an indicator function: $I\{y_i = k\} = 1$ when $y_i = k$, and $I\{y_i = k\} = 0$ otherwise.
2) L1 loss
$L_1(\theta) = \|\theta\|_1$
3) L2 loss
$L_2(\theta) = \|\theta\|_2^2$
4) Semi-supervised loss
Unlabeled data are exploited by adding an entropy-constrained regularization term to the objective function.
For the binary classification problem without competing risks, the event state is a random variable following a Bernoulli distribution with parameter $p$, and its entropy is defined as:
$H(p) = -p \log p - (1 - p) \log(1 - p)$
For unlabeled data, the entropy-constrained regularization term is then defined as:
$$\Omega(\theta) = -\sum_{i=1}^{u} \Bigl[ p(X_i; \theta) \log p(X_i; \theta) + \bigl(1 - p(X_i; \theta)\bigr) \log\bigl(1 - p(X_i; \theta)\bigr) \Bigr]$$
where $u$ is the number of unlabeled samples and $p(X_i; \theta)$ is the predicted probability that the event occurs. If the class of an unlabeled sample is predicted with certainty, the entropy-constrained regularization term is small.
For the multi-class classification problem with competing risks, the entropy-constrained regularization term for unlabeled data is defined as:
$$\Omega(\theta) = -\sum_{i=1}^{u} \sum_{k=1}^{C} p(y_i = k \mid X_i; \theta) \log p(y_i = k \mid X_i; \theta)$$
5) Ranking loss
A ranking loss is added to the objective function to enforce the non-increasing trend of the survival probability. The ranking loss is defined as follows:
Figure BDA0002444124610000042
where $p_{i,p}(y_i = 1 \mid X_i; \theta)$ is the probability of a death event occurring at time $p$ for the i-th sample. That is, when $p < q$, the event probabilities of the i-th sample should satisfy $p_{i,p}(y_i = 1 \mid X_i; \theta) < p_{i,q}(y_i = 1 \mid X_i; \theta)$; otherwise, a penalty is applied to the event probability. $I(p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta))$ is an indicator function: $I = 1$ when $p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta)$, and $I = 0$ otherwise.
In summary, the objective function of the prediction model, that is, of the deep-learning-based semi-supervised multitask survival analysis model, is:
$L_{total}(\theta) = l(\theta) + \lambda_1 L_1(\theta) + \lambda_2 L_2(\theta) + \lambda_3 \Omega(\theta) + \lambda_4 R(\theta)$
where $l(\theta)$ is the log loss, $L_1(\theta)$ is the L1 loss, $L_2(\theta)$ is the L2 loss, $\Omega(\theta)$ is the semi-supervised loss, $R(\theta)$ is the ranking loss, and $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are parameters that control the strength of the regularization terms.
(5) Train the model on the disease data to obtain the model parameters $\theta$, which determines the prediction model. The prediction model is then applied to new disease data to obtain the disease prognosis prediction.
Further, step (2) converts the original survival analysis problem into a multitask learning problem by converting the label information into vectors.
Further, in step (3), the hidden layer parameters of the deep neural network use a hard sharing mechanism, which reduces the risk of overfitting.
Further, in step (4), the deep semi-supervised multitask learning problem obtained by transforming the survival analysis problem has two important characteristics: unlabeled data caused by censoring, and the non-increasing trend of the survival probability. For the unlabeled data caused by censoring, semi-supervised learning is performed with entropy-constrained regularization. For the non-increasing trend of the survival probability across time points, a ranking loss is introduced to constrain the survival probabilities of the different output layers. In addition, the L1 loss in the objective function provides automatic feature selection, and the L2 loss prevents overfitting.
Further, the prediction result display module evaluates feature importance and visually displays the time dependence and nonlinear effects of features. The importance of a feature F is calculated in the following steps:
1) Select the corresponding test data and calculate the model prediction error, denoted error1.
2) Randomly add noise to feature F of all samples in the test data, calculate the model prediction error again, and denote it error2. For a continuous variable, add random noise drawn from the normal distribution $N(0, \sigma\epsilon)$, where $\sigma$ is the standard deviation of feature F and $\epsilon$ is a small constant. For a discrete variable, apply $x_F \to x_F (1 - s) + (1 - x_F) s$, where $s$ is a noise term following a Bernoulli distribution and $x_F$ is the value of feature F.
3) Calculate the difference $e$ between the two prediction errors: $e = \mathrm{error2} - \mathrm{error1}$.
4) Repeat steps 1) to 3) n times.
5) The importance of feature F is calculated as follows:
Figure BDA0002444124610000051
If adding random noise greatly reduces the accuracy on the test data, the feature has a large influence on the prediction results and is therefore of high importance.
Furthermore, the prediction result display module visually displays the influence of features on prognosis by plotting the predicted cumulative incidence curves corresponding to different feature values. The predicted cumulative incidence curves for a feature F are plotted in the following steps:
1) Let all possible values of feature F be $x_{F,1}, x_{F,2}, \ldots, x_{F,v}, \ldots, x_{F,V}$, where V is the number of possible values of feature F.
2) Set the value of feature F to $x_F = x_{F,v}$, $v = 1, 2, \ldots, V$, keep the values of the other features unchanged, and calculate the average of the cumulative incidence predicted by the model:
$$\bar{p}(x_{F,v}) = \frac{1}{N} \sum_{i=1}^{N} p(x_{F,v}, x_{i,o})$$
where $\bar{p}(x_{F,v})$ is the average of the model prediction outputs over all data, $p(x_{F,v}, x_{i,o})$ is the model prediction output for the i-th piece of data, and $x_{i,o}$ denotes the values of all features other than F in the i-th piece of data.
3) Plot the averages $\bar{p}(x_{F,v})$ obtained in step 2) as a curve.
Further, when plotting the predicted cumulative incidence curve for a continuous variable, the value range of the variable is divided into R equal parts, and the cumulative incidence is estimated and plotted only at the dividing points, which reduces the amount of computation; R is chosen according to the value range of the specific feature.
The invention has the beneficial effects that:
Built on a deep neural network, the invention converts the survival analysis problem into a multitask learning model in which each task is a semi-supervised problem of predicting the survival probability at one of multiple time points. The deep neural network can fit the nonlinear effects of features. The model directly models the survival probability, does not depend on the proportional hazards assumption, can fit time-dependent effects, and has better interpretability.
Taking into account the censored data and the non-increasing trend of the survival probability in survival analysis, the method fits the data with a semi-supervised loss function and a ranking loss function, makes full use of complete data and censored data, and can handle both the traditional survival analysis problem and the survival analysis problem with competing risks. Through multitask learning over multiple time points, the model shares data among the prediction tasks and lets them constrain one another, which improves its generalization ability. A method for evaluating feature importance is also provided, and the time dependence and nonlinear effects of features are displayed visually.
Drawings
FIG. 1 is a diagram of a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to the present invention;
FIG. 2 is a schematic diagram of a dataset tag transformation;
FIG. 3 is a diagram of the neural network architecture.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
In this application, censored data are defined as follows: data for which no outcome event has occurred by the specified end time are called censored data, and the time from the starting point to censoring is called the censoring time. The time-dependence phenomenon is defined relative to the following assumption: regardless of the baseline risk, the ratio between the risk of the event for an individual with a given exposure and the risk for an individual without that exposure is constant at every time point. A feature that does not satisfy this assumption is considered to have a time-dependent effect on disease prognosis. Competing risks are defined as follows: during prognosis follow-up, a patient may not experience the event of interest because some other event occurs, that is, the other event "competes" with the event of interest; such events are called competing risks. Competing risks arise only in survival analysis problems with multiple endpoint events, where only one endpoint event occurs at any given time.
As shown in FIG. 1, the present application provides a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, comprising: a data acquisition module for acquiring disease prognosis data; a data preprocessing module for performing missing value processing and normalization on the disease prognosis data; a prediction model construction module for modeling the disease prognosis data; and a prediction result display module for visually displaying the prediction results. The prediction model construction module adopts a survival analysis method based on deep semi-supervised multitask learning, whose implementation principle is as follows:
(1) In survival analysis of prognostic data, a given data set is denoted $D = \{(X_1, T_1, \delta_1), (X_2, T_2, \delta_2), \ldots, (X_i, T_i, \delta_i), \ldots, (X_N, T_N, \delta_N)\}$. Each triple $(X_i, T_i, \delta_i)$ represents one data instance, where $X_i$ is the feature vector of the i-th instance and $\delta_i$ is its censoring indicator: $\delta_i = 1$ means the instance is uncensored, that is, the event was observed, and $\delta_i = 0$ means the instance is censored, that is, no event was observed. $T_i$ is the survival time of the i-th instance. For uncensored data, $T_i$ equals the observed survival time $O_i$; for censored data, $T_i$ equals the censoring time $C_i$:
$$T_i = \begin{cases} O_i, & \delta_i = 1 \\ C_i, & \delta_i = 0 \end{cases}$$
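The features of the data set can be expressed as:
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,M} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,M} \end{bmatrix}$$
where N is the number of samples and M is the number of features.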
The labels of the data set may be expressed as:
$Y = \{(T_1, \delta_1), (T_2, \delta_2), \ldots, (T_i, \delta_i), \ldots, (T_N, \delta_N)\}$
(2) The invention treats the survival time as a set of discrete time points rather than as a continuous variable. The original label information of each sample can therefore be converted into a K-dimensional survival state vector, where $K = \max(T_i)$, $i = 1, 2, \ldots, N$, is the maximum survival time over all samples. Each element of the survival state vector indicates whether, at the corresponding time point, the event has occurred (value 1), has not occurred (value 0), or is unknown (value 2) for that sample. An example of the data set label transformation is shown in FIG. 2. The converted data set labels can be expressed as:
$$Y = \begin{bmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,K} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,K} \\ \vdots & \vdots & \ddots & \vdots \\ y_{N,1} & y_{N,2} & \cdots & y_{N,K} \end{bmatrix}$$
Converting the label information into vectors turns the original survival analysis problem into a multitask learning problem.
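The label transformation of step (2) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the sketch assumes that time points are the integers 1..K, that an observed event stays "occurred" at every later time point, and that a censored sample counts as "not occurred" up to and including its censoring time. Only the values 1, 0, and 2 come from the text.

```python
from typing import List

EVENT, NO_EVENT, UNKNOWN = 1, 0, 2  # state values given in the text


def to_state_vector(t: int, delta: int, k_max: int) -> List[int]:
    """Convert one (T_i, delta_i) label into a K-dimensional state vector.

    Assumed convention (not stated in the source): time points are 1..K,
    an event observed at time t is marked from t onward, and a sample
    censored at time t is unknown only after t.
    """
    vector = []
    for k in range(1, k_max + 1):
        if delta == 1:  # uncensored: event observed at time t
            vector.append(EVENT if k >= t else NO_EVENT)
        else:           # censored at time t
            vector.append(NO_EVENT if k <= t else UNKNOWN)
    return vector


def transform_labels(times: List[int], deltas: List[int]) -> List[List[int]]:
    """Build the N x K label matrix Y, with K = max(T_i)."""
    k_max = max(times)
    return [to_state_vector(t, d, k_max) for t, d in zip(times, deltas)]


if __name__ == "__main__":
    # Three samples: event at t=2, censored at t=4, censored at t=3.
    for row in transform_labels([2, 4, 3], [1, 0, 0]):
        print(row)
```

Each column of the resulting matrix is the label vector of one prediction task, which is what makes the problem a multitask one.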
(3) A deep neural network with one input layer and a plurality of output layers is used. The input of the network is the feature matrix X of the data set and the output label is Y; each output layer corresponds to one y in Y, that is, to the event prediction task at a different time. FIG. 3 shows a deep neural network with K output layers: if output $k$ refers to the task at time $T_k$, the network can make predictions for the same task at K different times. The hidden layer parameters of the network use a hard sharing mechanism, which reduces the risk of overfitting. Intuitively, the more tasks are learned simultaneously, the more the model must capture a feature representation common to all of them, and the lower the risk of overfitting on any single task.
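The hard sharing mechanism can be illustrated with a small forward pass in which all K tasks reuse the same hidden layer and differ only in their output units. This is a sketch under stated assumptions: a single hidden layer, ReLU activation, sigmoid outputs, random initial weights, and all names (`init_network`, `forward`, `W_shared`, `W_heads`) are hypothetical and not taken from the patent.

```python
import math
import random
from typing import Dict, List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_features: int, n_hidden: int, n_tasks: int, seed: int = 0) -> Dict:
    """Create one shared hidden layer and one output unit per time point."""
    rng = random.Random(seed)

    def matrix(rows: int, cols: int) -> List[List[float]]:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_shared": matrix(n_hidden, n_features),  # used by every task
        "b_shared": [0.0] * n_hidden,
        "W_heads": matrix(n_tasks, n_hidden),      # one row per time point
        "b_heads": [0.0] * n_tasks,
    }


def forward(net: Dict, x: List[float]) -> List[float]:
    """Return K event probabilities, one per time point, for one sample."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU (assumed)
        for row, b in zip(net["W_shared"], net["b_shared"])
    ]
    return [
        sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
        for row, b in zip(net["W_heads"], net["b_heads"])
    ]


if __name__ == "__main__":
    net = init_network(n_features=3, n_hidden=4, n_tasks=5)
    print(forward(net, [0.1, -0.2, 0.3]))
```

Because every gradient from every output unit flows through `W_shared`, the hidden representation is constrained by all K tasks at once.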
(4) Objective function definition
The deep semi-supervised multitask learning problem obtained by transforming the survival analysis problem has two important characteristics: unlabeled data caused by censoring, and the non-increasing trend of the survival probability. Appropriate constraints must be designed to handle both. For the unlabeled data caused by censoring, entropy-constrained regularization is used for semi-supervised learning; if the class of an unlabeled sample is predicted with certainty, the entropy-constrained regularization term is small. For the non-increasing trend of the survival probability across time points, a ranking loss is introduced to constrain the survival probabilities of the different output layers. In addition, the L1 loss in the objective function provides automatic feature selection, and the L2 loss prevents overfitting. The objective function of the model consists of five parts: log loss, L1 loss, L2 loss, semi-supervised loss, and ranking loss.
1) Logarithmic loss
For labeled data in the binary classification problem without competing risks, the model uses the log loss, which penalizes misclassification, to measure the accuracy of the classifier. The label is $y \in \{0, 1\}$. The parameter $\theta$ is estimated by maximum likelihood estimation, with the likelihood function:
$$L(\theta) = \prod_{i=1}^{l} p(X_i; \theta)^{y_i} \, \bigl(1 - p(X_i; \theta)\bigr)^{1 - y_i}$$
where $l$ is the number of labeled samples and $p(X_i; \theta)$ is the posterior probability of sample $X_i$. Taking the logarithm of the likelihood function and negating it gives the negative log-likelihood, which is the log loss function:
$$l(\theta) = -\sum_{i=1}^{l} \Bigl[ y_i \log p(X_i; \theta) + (1 - y_i) \log\bigl(1 - p(X_i; \theta)\bigr) \Bigr]$$
That is, the higher the probability assigned to each sample's true label, the better.
For the survival analysis problem with competing risks, the event prediction at each time point is treated as a multi-class classification problem. Suppose that, given $X_i$, the conditional probability distribution of $y$ is $p(y_i = k \mid X_i; \theta)$. The model for the classification problem with $y \in \{1, 2, \ldots, C\}$ is an extension of the binary model; its parameters can likewise be estimated by maximum likelihood, and the corresponding log loss function is:
$$l(\theta) = -\sum_{i=1}^{l} \sum_{k=1}^{C} I\{y_i = k\} \log p(y_i = k \mid X_i; \theta)$$
where $I\{y_i = k\}$ is an indicator function: $I\{y_i = k\} = 1$ when $y_i = k$, and $I\{y_i = k\} = 0$ otherwise.
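For the binary case, the log loss over labeled entries can be sketched as below. The sketch assumes the loss is written as a negative log-likelihood so that smaller values are better, that entries marked unknown (value 2, as in the label transformation of step (2)) are skipped, and that the sum is averaged over the labeled entries; the averaging, the clipping constant `eps`, and the function name are illustrative choices, not details from the patent.

```python
import math
from typing import List

UNKNOWN = 2  # state value for entries without a label


def masked_log_loss(probs: List[List[float]], labels: List[List[int]],
                    eps: float = 1e-12) -> float:
    """Average negative log-likelihood over labeled (sample, time) entries.

    probs[i][k] is the predicted event probability of sample i at time k;
    labels[i][k] is 1 (event), 0 (no event), or 2 (unknown, skipped).
    """
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            if y == UNKNOWN:
                continue  # unlabeled entries go to the semi-supervised term
            p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    good = masked_log_loss([[0.1, 0.9, 0.5]], [[0, 1, 2]])
    bad = masked_log_loss([[0.9, 0.1, 0.5]], [[0, 1, 2]])
    print(good, bad)  # the better-calibrated prediction has the lower loss
```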
2) L1 loss
The L1 loss is defined as follows:
$L_1(\theta) = \|\theta\|_1$
The L1 loss adds the sum of the absolute values of all weight parameters $\theta$ to the objective function, which drives more of the $\theta$ to zero and thereby provides automatic feature selection.
3) L2 loss
The L2 loss is defined as follows:
$L_2(\theta) = \|\theta\|_2^2$
The L2 loss adds the sum of the squares of all weight parameters $\theta$ to the objective function, which keeps all $\theta$ as close to zero as possible and prevents overfitting.
4) Semi-supervised loss
Unlabeled data can be exploited by adding an entropy-constrained regularization term to the objective function. For the binary classification problem without competing risks, the event state is a random variable following a Bernoulli distribution with parameter $p$, and its entropy is defined as:
$H(p) = -p \log p - (1 - p) \log(1 - p)$
For unlabeled data, the entropy-constrained regularization term is then defined as:
$$\Omega(\theta) = -\sum_{i=1}^{u} \Bigl[ p(X_i; \theta) \log p(X_i; \theta) + \bigl(1 - p(X_i; \theta)\bigr) \log\bigl(1 - p(X_i; \theta)\bigr) \Bigr]$$
where $u$ is the number of unlabeled samples and $p(X_i; \theta)$ is the predicted probability that the event occurs. If the class of an unlabeled sample is predicted with certainty, the entropy-constrained regularization term is small.
For the multi-class classification problem with competing risks, the entropy-constrained regularization term for unlabeled data is defined as:
$$\Omega(\theta) = -\sum_{i=1}^{u} \sum_{k=1}^{C} p(y_i = k \mid X_i; \theta) \log p(y_i = k \mid X_i; \theta)$$
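The entropy-constrained regularization can be sketched as a sum of prediction entropies over the entries whose label is unknown. The sketch assumes an unnormalized sum and the value 2 as the unknown marker; `binary_entropy`, `categorical_entropy`, and `entropy_regularizer` are hypothetical helper names.

```python
import math
from typing import List

UNKNOWN = 2  # state value for entries without a label


def binary_entropy(p: float, eps: float = 1e-12) -> float:
    """H(p) = -p log p - (1 - p) log(1 - p) for a Bernoulli variable."""
    p = min(max(p, eps), 1.0 - eps)
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def categorical_entropy(dist: List[float]) -> float:
    """Entropy of a C-class distribution (competing-risks case)."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def entropy_regularizer(probs: List[List[float]], labels: List[List[int]]) -> float:
    """Sum the binary entropies of predictions at unlabeled entries only."""
    return sum(
        binary_entropy(p)
        for p_row, y_row in zip(probs, labels)
        for p, y in zip(p_row, y_row)
        if y == UNKNOWN
    )


if __name__ == "__main__":
    labels = [[0, 0, 2, 2]]
    print(entropy_regularizer([[0.1, 0.2, 0.5, 0.5]], labels))    # uncertain
    print(entropy_regularizer([[0.1, 0.2, 0.99, 0.99]], labels))  # confident
```

Minimizing this term pushes predictions on unlabeled entries toward 0 or 1, which matches the statement that a deterministic class yields a small regularization term.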
5) Ranking loss
A ranking loss is added to the objective function to enforce the non-increasing trend of the survival probability. The ranking loss is defined as follows:
Figure BDA0002444124610000093
where $p_{i,p}(y_i = 1 \mid X_i; \theta)$ is the probability of a death event occurring at time $p$ for the i-th sample. That is, when $p < q$, the event probabilities of the i-th sample should satisfy $p_{i,p}(y_i = 1 \mid X_i; \theta) < p_{i,q}(y_i = 1 \mid X_i; \theta)$; otherwise, a penalty is applied to the event probability. $I(p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta))$ is an indicator function: $I = 1$ when $p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta)$, and $I = 0$ otherwise.
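A ranking penalty of this kind can be sketched as follows. The exact form of the penalty is an assumption here: the sketch multiplies the indicator described above by the size of the violation, $p_{i,p} - p_{i,q}$, and sums over all samples and all pairs of time points with $p < q$. Other choices, such as counting violations, would fit the description equally well.

```python
from typing import List


def ranking_loss(probs: List[List[float]]) -> float:
    """Penalize event probabilities that decrease over time.

    probs[i][k] is the predicted event probability of sample i at time k.
    For every pair of time points p < q, a penalty is added only when
    probs[i][p] > probs[i][q] (the indicator in the text); the penalty
    magnitude used here, the difference itself, is an assumed choice.
    """
    total = 0.0
    for row in probs:
        for p in range(len(row)):
            for q in range(p + 1, len(row)):
                if row[p] > row[q]:          # indicator I(...) = 1
                    total += row[p] - row[q]
    return total


if __name__ == "__main__":
    print(ranking_loss([[0.1, 0.2, 0.3]]))  # monotone: no penalty
    print(ranking_loss([[0.3, 0.1, 0.2]]))  # two violated pairs
```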
In summary, the objective function of the prediction model, that is, of the deep-learning-based semi-supervised multitask survival analysis model, is:
$L_{total}(\theta) = l(\theta) + \lambda_1 L_1(\theta) + \lambda_2 L_2(\theta) + \lambda_3 \Omega(\theta) + \lambda_4 R(\theta)$
where $l(\theta)$ is the log loss, $L_1(\theta)$ is the L1 loss, $L_2(\theta)$ is the L2 loss, $\Omega(\theta)$ is the semi-supervised loss, $R(\theta)$ is the ranking loss, and $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are parameters that control the strength of the regularization terms.
Train the model on the disease data to obtain the model parameters $\theta$, which determines the prediction model. The prediction model is then applied to new disease data to obtain the disease prognosis prediction.
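How the five parts combine into one scalar can be sketched as below. The sketch takes the log loss, semi-supervised term, and ranking term as already-computed numbers and treats $\theta$ as a flat list of weights; the function names and the example $\lambda$ values are hypothetical, and the objective is assumed to be minimized during training.

```python
from typing import List, Sequence


def l1_penalty(theta: List[float]) -> float:
    """Sum of absolute values of the weights."""
    return sum(abs(w) for w in theta)


def l2_penalty(theta: List[float]) -> float:
    """Sum of squared weights."""
    return sum(w * w for w in theta)


def total_objective(log_loss: float, theta: List[float], entropy_term: float,
                    ranking_term: float, lambdas: Sequence[float]) -> float:
    """L_total = l + lam1*L1 + lam2*L2 + lam3*Omega + lam4*R."""
    lam1, lam2, lam3, lam4 = lambdas
    return (log_loss
            + lam1 * l1_penalty(theta)
            + lam2 * l2_penalty(theta)
            + lam3 * entropy_term
            + lam4 * ranking_term)


if __name__ == "__main__":
    value = total_objective(
        log_loss=1.0, theta=[1.0, -2.0], entropy_term=0.5, ranking_term=0.25,
        lambdas=(0.1, 0.01, 2.0, 4.0),  # illustrative strengths only
    )
    print(value)
```

In practice the four $\lambda$ values would be tuned, for example on validation data; the text does not specify how they are chosen.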
(5) Importance of features
The importance of a feature F is calculated in the following steps:
1) Select the corresponding test data and calculate the model prediction error, denoted error1.
2) Randomly add noise to feature F of all samples in the test data (that is, randomly change each sample's value of feature F), calculate the model prediction error again, and denote it error2. For a continuous variable, add random noise drawn from the normal distribution $N(0, \sigma\epsilon)$, where $\sigma$ is the standard deviation of feature F and $\epsilon$ is a small constant. For a discrete variable, apply $x_F \to x_F (1 - s) + (1 - x_F) s$, where $s$ is a noise term following a Bernoulli distribution and $x_F$ is the value of feature F.
3) Calculate the difference $e$ between the two prediction errors: $e = \mathrm{error2} - \mathrm{error1}$.
4) Repeat steps 1) to 3) n times, where n is usually greater than 500.
5) The importance of feature F is calculated as follows:
Figure BDA0002444124610000101
This quantity describes the importance of the feature because, if adding random noise greatly reduces the accuracy on the test data (i.e. error2 increases), the feature has a large influence on the prediction for the sample and is therefore more important.
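The perturbation procedure above can be sketched as follows. Several details are assumptions rather than statements from the source: the importance formula is an image not reproduced here, so the mean of the differences e is used as a stand-in; the prediction error is taken to be the mean squared error; $\sigma\epsilon$ is used as the standard deviation of the noise; and the Bernoulli parameter `flip_prob`, the callable `predict`, and all function names are hypothetical.

```python
import random
import statistics


def prediction_error(predict, rows, labels):
    """Mean squared error between predictions and 0/1 labels (assumed metric)."""
    return sum((predict(r) - y) ** 2 for r, y in zip(rows, labels)) / len(rows)


def perturb_continuous(rows, f, eps, rng):
    """Add N(0, sigma * eps) noise to column f, with sigma * eps as the std. dev."""
    sigma = statistics.pstdev([r[f] for r in rows])
    noisy = []
    for r in rows:
        r2 = list(r)
        r2[f] = r[f] + rng.gauss(0.0, sigma * eps)
        noisy.append(r2)
    return noisy


def perturb_binary(rows, f, flip_prob, rng):
    """Apply x -> x*(1-s) + (1-x)*s with s ~ Bernoulli(flip_prob) to a 0/1 column f."""
    noisy = []
    for r in rows:
        s = 1 if rng.random() < flip_prob else 0
        r2 = list(r)
        r2[f] = r[f] * (1 - s) + (1 - r[f]) * s
        noisy.append(r2)
    return noisy


def feature_importance(predict, rows, labels, perturb, n=500, seed=0):
    rng = random.Random(seed)
    # Step 1: baseline error (deterministic, so it is computed once).
    error1 = prediction_error(predict, rows, labels)
    diffs = []
    for _ in range(n):
        # Step 2: error after perturbing the feature.
        error2 = prediction_error(predict, perturb(rows, rng), labels)
        # Step 3: difference of the two errors.
        diffs.append(error2 - error1)
    # Assumed aggregation over the n repetitions: the mean of e.
    return sum(diffs) / n


# Toy model that looks only at feature 0, so feature 1 should be unimportant.
rows = [[0, 1], [1, 0], [1, 1], [0, 0]]
labels = [0, 1, 1, 0]
toy_predict = lambda r: float(r[0])
imp_used = feature_importance(
    toy_predict, rows, labels, lambda rs, g: perturb_binary(rs, 0, 0.5, g))
imp_unused = feature_importance(
    toy_predict, rows, labels, lambda rs, g: perturb_binary(rs, 1, 0.5, g))
```

In this toy setup, perturbing the feature the model ignores leaves the error unchanged, while perturbing the feature it depends on raises the error, which is the behaviour the description relies on.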
(6) Visualization of feature impact on prognosis
The influence of features on prognosis is displayed visually by plotting the predicted cumulative incidence curves corresponding to different features. The predicted cumulative incidence curve for a feature F is drawn as follows:
1) Let all possible values of feature F be $x_{F,1}, x_{F,2}, \ldots, x_{F,v}, \ldots, x_{F,V}$, where V is the number of possible values of feature F.
2) Set the value of feature F to $x_F = x_{F,v}$ for $v = 1, 2, \ldots, V$, keep the values of the other features unchanged, and compute the mean of the model's predicted cumulative incidence:
Figure BDA0002444124610000102
wherein,
Figure BDA0002444124610000103
is the average of the model predicted outputs for all data,
Figure BDA0002444124610000104
is the model's predicted output for the i-th record, and $x_{i,o}$ denotes the values of all features other than F in the i-th record.
3) Plot the values obtained in step 2),
Figure BDA0002444124610000105
as a curve. For a continuous variable, the value range of the variable can be divided into R equal parts, and the values at all division points are used for cumulative incidence estimation and curve drawing, which reduces the amount of computation. R is usually chosen according to the value range of the specific feature.
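The curve computation above can be sketched as follows, leaving the actual plotting aside. The sketch assumes a hypothetical `predict(row)` that returns a list of cumulative event probabilities, one per time point; the names `incidence_curves` and `grid` are also hypothetical, and including both end points of the range among the division points is an assumption.

```python
def incidence_curves(predict, rows, f, values):
    """For each candidate value of feature f, average the predicted curves.

    Every record keeps its other features; only column f is overwritten.
    Returns a dict mapping each value to its mean predicted curve.
    """
    curves = {}
    for v in values:
        preds = []
        for r in rows:
            r2 = list(r)
            r2[f] = v
            preds.append(predict(r2))
        k = len(preds[0])
        curves[v] = [sum(p[t] for p in preds) / len(preds) for t in range(k)]
    return curves


def grid(lo, hi, r):
    """Division points when the range [lo, hi] is split into r equal parts."""
    return [lo + (hi - lo) * j / r for j in range(r + 1)]


# Toy model: the event probability grows with feature 0 and with time.
toy_predict = lambda row: [min(1.0, 0.1 * row[0] * (t + 1)) for t in range(3)]
rows = [[0, 5], [0, 7]]
curves = incidence_curves(toy_predict, rows, 0, [1, 2])
points = grid(0.0, 1.0, 4)
```

Each entry of `curves` can then be drawn as one line on a shared time axis, so that the curves for different values of the feature can be compared.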
The method uses a deep neural network to fit nonlinear functions of the data. The network structure can be flexibly extended according to the dimension of the input data, the length of the survival time, and the required model accuracy. The model models the survival probability directly, does not rely on the proportional hazards assumption, can fit time-dependent effects of features, and has better interpretability. Complete data and censored data are both fully used through the log loss function and the semi-supervised loss function. The non-increasing property of the survival probability is exploited through the ranking loss function. The L1 and L2 loss functions provide automatic feature selection and prevent model overfitting. Through multitask learning over multiple time points, the model shares data among the prediction tasks and lets them constrain one another, which improves its generalization ability. The model can handle both the conventional survival analysis problem and the survival analysis problem with competing risks. A feature importance evaluation method based on the deep learning model is provided, and the time-dependent and nonlinear effects of features on prognosis are displayed visually.
The foregoing is only a preferred embodiment of the present invention. Although the invention has been disclosed through preferred embodiments, they are not intended to limit it. Using the methods and technical content disclosed above, those skilled in the art can make many possible variations and modifications to the technical solution of the invention, or modify it into equivalent embodiments, without departing from its scope. Therefore, any simple modification, equivalent change, or adaptation made to the above embodiments according to the technical essence of the invention, without departing from the content of its technical solution, still falls within the scope of protection of the technical solution of the invention.

Claims (7)

1. A disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, comprising: a data acquisition module for acquiring disease prognosis data; a data preprocessing module for missing-value handling and normalization of the disease prognosis data; a prediction model construction module for modeling the disease prognosis data; and a prediction result display module for displaying the prediction results; wherein the prediction model construction module uses a survival analysis method based on deep semi-supervised multitask learning, with the following specific steps:
(1) In survival analysis of prognostic data, a given data set is denoted $D = \{(X_1, T_1, \delta_1), (X_2, T_2, \delta_2), \ldots, (X_i, T_i, \delta_i), \ldots, (X_N, T_N, \delta_N)\}$. $(X_i, T_i, \delta_i)$ represents one data instance, where $X_i$ is the feature vector of the i-th record and $\delta_i$ is its censoring indicator: $\delta_i = 1$ means the record is uncensored, i.e. the event was observed, and $\delta_i = 0$ means the record is censored, i.e. no event was observed. $T_i$ is the survival time of the i-th record. For uncensored data, $T_i$ equals the observed survival time $O_i$; for censored data, $T_i$ equals the censoring time $C_i$:
Figure FDA0002444124600000011
The features of the data set may be expressed as:
Figure FDA0002444124600000012
where N is the number of samples and M is the number of features.
The labels of the data set may be expressed as:
$Y = \{(T_1, \delta_1), (T_2, \delta_2), \ldots, (T_i, \delta_i), \ldots, (T_N, \delta_N)\}$
(2) The survival time is treated as a sequence of time points, and the original label of each sample is converted into a K-dimensional survival state vector, where $K = \max(T_i)$, $i = 1, \ldots, N$, is the maximum survival time over all samples. Each element of the survival state vector indicates whether, at that time point, the event has occurred, has not occurred, or is unknown for the sample. The converted data set labels can be expressed as:
Figure FDA0002444124600000013
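The label conversion in step (2) can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes integer time points 1..K; the encoding (1 for event occurred, 0 for event not occurred, -1 for unknown), the handling of the boundary time point, and the names `to_state_vector` and `convert_labels` are hypothetical choices that the text does not specify.

```python
UNKNOWN = -1  # assumed marker for time points after censoring


def to_state_vector(t, delta, k):
    """Convert one label (T_i, delta_i) into a K-dimensional survival state vector."""
    vec = []
    for time in range(1, k + 1):
        if delta == 1:
            # Uncensored: the event is absent before T_i and present from T_i on.
            vec.append(1 if time >= t else 0)
        else:
            # Censored: no event up to T_i, state unknown afterwards.
            vec.append(0 if time <= t else UNKNOWN)
    return vec


def convert_labels(labels):
    """labels is a list of (T_i, delta_i); K is the maximum survival time."""
    k = max(t for t, _ in labels)
    return [to_state_vector(t, d, k) for t, d in labels]


state_vectors = convert_labels([(2, 1), (3, 0), (4, 1)])
```

The unknown entries are the ones that the semi-supervised loss treats as unlabeled data, while the 0 and 1 entries feed the log loss.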
(3) A deep neural network is constructed with one input layer and multiple output layers. Its input is the feature matrix X of the data set and its output label is Y; each output layer corresponds to one y in Y, i.e. to an event prediction task at a different time. The deep neural network can thus make predictions for the same task at K different times.
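A forward pass through a network of this shape can be sketched in plain Python. This sketch makes several assumptions that are not in the text: a single hidden layer shared by all tasks, a tanh activation, one sigmoid unit per output layer, random untrained weights, and the names `init_network` and `forward`.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(m, hidden, k, seed=0):
    """Random weights for M inputs, one shared hidden layer, and K output heads."""
    rng = random.Random(seed)

    def matrix(n_rows, n_cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_cols)] for _ in range(n_rows)]

    return {
        "W_shared": matrix(hidden, m), "b_shared": [0.0] * hidden,
        "W_heads": matrix(k, hidden), "b_heads": [0.0] * k,
    }


def forward(net, x):
    """Return K event probabilities, one per time point, for a feature vector x."""
    # The hidden representation is computed once and reused by every task.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W_shared"], net["b_shared"])]
    # Each output head is a separate event prediction task at one time point.
    return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
            for row, b in zip(net["W_heads"], net["b_heads"])]


net = init_network(m=3, hidden=4, k=5)
outputs = forward(net, [0.2, -1.0, 0.7])
```

Because the weights are untrained, the outputs are not yet ordered in time; ordering is what the ranking term of the objective is meant to encourage during training.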
(4) A prediction model is constructed whose objective function consists of five parts: log loss, L1 loss, L2 loss, semi-supervised loss, and ranking loss:
1) Log loss
For labeled data in the binary classification problem without competing risks, the model uses the log loss to measure the accuracy of the classifier by penalizing misclassification. The label is $y \in \{0, 1\}$. The parameter $\theta$ is estimated by maximum likelihood estimation, with the likelihood function:
Figure FDA0002444124600000021
where $l$ is the number of labeled samples and $p(X_i; \theta)$ is the posterior probability of sample $X_i$. Taking the logarithm of the likelihood function gives the log-likelihood function, i.e. the log loss function:
Figure FDA0002444124600000022
That is, the higher the probability that each sample belongs to its true label, the better.
For the survival analysis problem with competing risks, the event prediction at each time point is treated as a multi-class classification problem. Suppose that, given $X_i$, the conditional probability distribution of $y$ is $p(y_i = k \mid X_i; \theta)$, where $k = 1, 2, \ldots, C$ and $C$ is the number of all possible outcomes. The parameter $\theta$ is estimated by maximum likelihood estimation, and the corresponding log loss function is:
Figure FDA0002444124600000023
where $I\{y_i = k\}$ is an indicator function: $I\{y_i = k\} = 1$ when $y_i = k$, and $I\{y_i = k\} = 0$ otherwise.
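The two log-likelihood expressions can be sketched as below. The formulas themselves are images that are not reproduced in this text, so the standard Bernoulli and categorical log-likelihoods are assumed here; the clipping constant `EPS` and the function names are hypothetical. The sketch returns the log-likelihood, which is larger for better predictions; a training procedure that minimizes an objective would use its negative.

```python
import math

EPS = 1e-12  # assumed clipping constant to avoid log(0)


def log_likelihood_binary(probs, labels):
    """Sum over labeled samples of y*log(p) + (1-y)*log(1-p)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, EPS), 1.0 - EPS)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def log_likelihood_multiclass(prob_rows, labels):
    """Sum over samples of log p(y_i = k), with classes numbered 1..C."""
    total = 0.0
    for row, y in zip(prob_rows, labels):
        # The indicator I{y_i = k} selects the probability of the true class.
        total += math.log(max(row[y - 1], EPS))
    return total


ll_uninformed = log_likelihood_binary([0.5, 0.5], [1, 0])
ll_confident = log_likelihood_binary([0.9, 0.1], [1, 0])
ll_multi = log_likelihood_multiclass([[0.2, 0.3, 0.5]], [3])
```

In the toy call, the confident and correct predictions give a higher log-likelihood than the uninformed ones, which matches the remark that a higher probability for the true label is better.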
2) L1 loss
L1(θ)=||θ||
3) L2 loss
L2(θ)=||θ||2
4) Semi-supervised loss
For unlabeled data, an entropy-constrained regularization term is added to the objective function so that the unlabeled data can be exploited.
For the binary classification problem without competing risks, the event state is a random variable following a Bernoulli distribution with parameter $p$, whose entropy is defined as:
$H(p) = -p \log p - (1 - p) \log(1 - p)$
Then, for unlabeled data, the entropy-constrained regularization is defined as:
Figure FDA0002444124600000024
where $u$ is the number of unlabeled samples and $p$ is the probability that the event occurs. If the class of the unlabeled data is certain, the entropy-constrained regularization term is small.
For the multi-class problem with competing risks, the entropy-constrained regularization for unlabeled data is defined as:
Figure FDA0002444124600000031
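The entropy regularization can be sketched as follows. The binary entropy $H(p)$ is taken from the text; summing it over the unlabeled samples without further normalization is an assumption, because the defining formulas are images that are not reproduced here. The function names are hypothetical.

```python
import math


def binary_entropy(p):
    """H(p) = -p*log(p) - (1-p)*log(1-p), with 0*log(0) taken as 0."""
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            h -= q * math.log(q)
    return h


def entropy_reg_binary(probs):
    """Assumed form: sum of H(p) over the predictions for unlabeled samples."""
    return sum(binary_entropy(p) for p in probs)


def entropy_reg_multiclass(prob_rows):
    """Assumed form: sum of categorical entropies over unlabeled samples."""
    return sum(-sum(p * math.log(p) for p in row if p > 0.0) for row in prob_rows)


uncertain = entropy_reg_binary([0.5])
confident = entropy_reg_binary([0.99])
multi = entropy_reg_multiclass([[0.25, 0.25, 0.25, 0.25]])
```

A prediction close to 0 or 1 contributes almost nothing, while a prediction of 0.5 contributes the maximum, so minimizing this term pushes the model toward confident predictions on unlabeled time points.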
5) Ranking loss
The non-increasing trend of the survival probability is enforced by adding a ranking loss to the objective function. The ranking loss is defined as:
Figure FDA0002444124600000032
where $p_{i,p}(y_i = 1 \mid X_i; \theta)$ is the probability that a death event has occurred for the i-th sample at time $p$. That is, for times $p < q$, the event probabilities of the i-th sample should satisfy $p_{i,p}(y_i = 1 \mid X_i; \theta) < p_{i,q}(y_i = 1 \mid X_i; \theta)$; otherwise a penalty is applied to the event probability. $I(p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta))$ is an indicator function: $I = 1$ when $p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta)$, and $I = 0$ otherwise.
In summary, the objective function of the deep-learning-based semi-supervised multitask survival analysis model, i.e. the prediction model, is:
$L_{total}(\theta) = l(\theta) + \lambda_1 L_1(\theta) + \lambda_2 L_2(\theta) + \lambda_3 \Omega(\theta) + \lambda_4 R(\theta)$
where $l(\theta)$ is the log loss, $L_1(\theta)$ is the L1 loss, $L_2(\theta)$ is the L2 loss, $\Omega(\theta)$ is the semi-supervised loss, $R(\theta)$ is the ranking loss, and $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are parameters that control the strength of the regularization terms.
The model is trained on the disease data to obtain the model parameters $\theta$, which determines the prediction model. The prediction model is then applied to new disease data to obtain the disease prognosis prediction.
2. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to claim 1, wherein step (2) converts the original survival analysis problem into a multitask learning problem by converting the label information into vectors.
3. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to claim 1, wherein in step (3) the hidden layer parameters of the deep neural network use a hard sharing mechanism, which reduces the risk of overfitting.
4. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to claim 1, wherein in step (4) the deep semi-supervised multitask learning problem obtained by transforming the survival analysis problem has two important characteristics: unlabeled data caused by censoring, and the non-increasing trend of the survival probability. For the unlabeled data caused by censoring, semi-supervised learning is performed using entropy-constrained regularization. For the non-increasing trend of the survival probability across time points, a ranking loss is introduced to constrain the survival probabilities of the different output layers. In addition, an L1 loss is introduced into the objective function for automatic feature selection, and an L2 loss is introduced to avoid overfitting.
5. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to claim 1, wherein the prediction result display module is used for feature importance evaluation and for visually displaying the time-dependent and nonlinear effects of features. The importance of a feature F is calculated as follows:
1) Select the corresponding test data and compute the model's prediction error, denoted error1.
2) Randomly add noise to feature F for all samples in the test data, recompute the model's prediction error, and denote it error2. For a continuous variable, add random noise drawn from the normal distribution $N(0, \sigma\epsilon)$, where $\sigma$ is the standard deviation of feature F and $\epsilon$ is a small constant. For a discrete variable, $x_F \to x_F (1 - s) + (1 - x_F) s$, where $s$ is a noise term following a Bernoulli distribution and $x_F$ is the value of feature F.
3) Compute the difference $e$ between the two prediction errors: $e = \text{error2} - \text{error1}$.
4) Repeat steps 1) to 3) n times.
5) The importance of feature F is computed as:
Figure FDA0002444124600000041
If adding random noise greatly reduces the accuracy on the test data, the feature has a large influence on the prediction for the sample and is therefore more important.
6. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to claim 5, wherein the prediction result display module visually displays the influence of features on prognosis by plotting the predicted cumulative incidence curves corresponding to different features. The predicted cumulative incidence curve for a feature F is drawn as follows:
1) Let all possible values of feature F be $x_{F,1}, x_{F,2}, \ldots, x_{F,v}, \ldots, x_{F,V}$, where V is the number of possible values of feature F.
2) Set the value of feature F to $x_F = x_{F,v}$ for $v = 1, 2, \ldots, V$, keep the values of the other features unchanged, and compute the mean of the model's predicted cumulative incidence:
Figure FDA0002444124600000042
wherein,
Figure FDA0002444124600000043
is the average of the model predicted outputs for all data,
Figure FDA0002444124600000044
is the model's predicted output for the i-th record, and $x_{i,o}$ denotes the values of all features other than F in the i-th record.
3) Plot the values obtained in step 2),
Figure FDA0002444124600000045
as a curve.
7. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to claim 6, wherein, when drawing the predicted cumulative incidence curve for a continuous variable, the value range of the variable is divided into R equal parts and the values at all division points are used for cumulative incidence estimation and curve drawing, which reduces the amount of computation; R is determined according to the value range of the specific feature.
CN202010273957.9A 2020-04-09 2020-04-09 Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis Pending CN111640510A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010273957.9A CN111640510A (en) 2020-04-09 2020-04-09 Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis
PCT/CN2021/073136 WO2021203796A1 (en) 2020-04-09 2021-01-21 Disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010273957.9A CN111640510A (en) 2020-04-09 2020-04-09 Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis

Publications (1)

Publication Number Publication Date
CN111640510A true CN111640510A (en) 2020-09-08

Family

ID=72331086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010273957.9A Pending CN111640510A (en) 2020-04-09 2020-04-09 Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis

Country Status (2)

Country Link
CN (1) CN111640510A (en)
WO (1) WO2021203796A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944479A (en) * 2017-11-16 2018-04-20 哈尔滨工业大学 Disease forecasting method for establishing model and device based on semi-supervised learning
CN110556178A (en) * 2018-05-30 2019-12-10 西门子医疗有限公司 decision support system for medical therapy planning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897545B (en) * 2017-01-05 2019-04-30 浙江大学 A kind of tumor prognosis forecasting system based on depth confidence network
CN108053398A (en) * 2017-12-19 2018-05-18 南京信息工程大学 A kind of melanoma automatic testing method of semi-supervised feature learning
CN108564039A (en) * 2018-04-16 2018-09-21 北京工业大学 A kind of epileptic seizure prediction method generating confrontation network based on semi-supervised deep layer
US10559386B1 (en) * 2019-04-02 2020-02-11 Kpn Innovations, Llc Methods and systems for an artificial intelligence support network for vibrant constituional guidance
CN110580695B (en) * 2019-08-07 2022-06-21 深圳先进技术研究院 Multi-mode three-dimensional medical image fusion method and system and electronic equipment
CN111640510A (en) * 2020-04-09 2020-09-08 之江实验室 Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
池胜强: "基于机器学习的结直肠癌预后模型及其泛化能力研究", 《中国博士学位论文全文数据库 医药卫生科技辑》 *

Also Published As

Publication number Publication date
WO2021203796A1 (en) 2021-10-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200908