CN111640510A - Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis - Google Patents
- Publication number: CN111640510A (application CN202010273957.9A)
- Authority: CN (China)
- Prior art keywords: data, prediction, loss, model, survival
- Legal status: Pending (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
- G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
- G06N3/08: Computing arrangements based on biological models; Neural networks; Learning methods
- G16H70/20: ICT specially adapted for the handling or processing of medical references relating to practices or guidelines
Abstract
The invention discloses a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, comprising a data acquisition module, a data preprocessing module, a prediction model construction module, and other modules. Built on a deep neural network, the invention recasts the survival analysis problem as a multitask learning model composed of semi-supervised learning problems, each predicting the survival probability at one of multiple time points. The model directly models the survival probability, does not rely on the proportional hazards assumption, can fit time-dependent effects, and offers better interpretability. It fits the data with a semi-supervised loss function and a ranking loss function, makes full use of both complete and censored data, and handles both the conventional survival analysis problem and survival analysis with competing risks. Through multitask learning across multiple time points, the model shares data among the prediction tasks, lets the tasks constrain one another, and improves its generalization ability.
Description
Technical Field
The invention belongs to the technical fields of medicine and machine learning, and particularly relates to a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis.
Background
Disease prognosis prediction gives clinicians prognostic information for treating a disease. It helps in formulating treatment plans, raises cure rates, improves patients' post-treatment quality of life, and effectively reduces the disease burden, so it is of great significance for disease control and treatment. Survival analysis is a data analysis method commonly used in disease prognosis prediction to analyze and predict the time at which an event occurs. In medicine, it plays a key role in deciding the course of treatment, developing new drugs, preventing adverse drug reactions, and improving hospital procedures. Recently, with the rise of deep learning models and improved training techniques, research applying deep learning architectures such as deep neural networks, convolutional neural networks, and long short-term memory networks to disease prognosis prediction has grown. In addition, advanced machine learning strategies, including active learning, transfer learning, and multitask learning, are gradually being applied to deep-learning-based survival analysis methods to improve prognosis prediction performance.
Censored data is ubiquitous in disease prognosis data. Censored data is not missing data: it is incomplete data that provides prognostic information only from the starting point up to the censoring time, and cannot provide complete information from the starting point to the occurrence of the event. Existing deep-learning-based methods either cannot make full use of censored data; or, while making full use of censored data, cannot effectively handle time-dependent feature effects; or generalize poorly; or are poorly interpretable. Existing multitask-learning-based methods cannot make full use of censored data.
Disclosure of Invention
The object of the invention is to overcome the shortcomings of the prior art by providing a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis.
Built on a deep neural network, the invention recasts the survival analysis problem as a multitask learning model composed of semi-supervised learning problems, each predicting the survival probability at one of multiple time points. Taking into account the censored data in survival analysis and the non-increasing trend of the survival probability, the model fits the data with a semi-supervised loss function and a ranking loss function, and can handle both the conventional survival analysis problem and survival analysis with competing risks. The invention also provides a method for evaluating feature importance and visualizes the time-dependent and nonlinear effects of features.
The deep neural network in the model contains multiple layers of nonlinear transformation units and can therefore fit nonlinear feature effects. The model directly models the survival probability, does not rely on the proportional hazards assumption, can fit time-dependent effects, and offers better interpretability. It makes full use of complete and censored data through a log loss function and a semi-supervised loss function; exploits the non-increasing trend of the survival probability through a ranking loss function; and achieves automatic feature selection and prevents overfitting through L1 and L2 loss functions. Through multitask learning across multiple time points, the model shares data among the prediction tasks, lets the tasks constrain one another, and improves its generalization ability.
The object of the invention is achieved by the following technical solution: a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, comprising: a data acquisition module for acquiring disease prognosis data; a data preprocessing module for handling missing values in, and normalizing, the disease prognosis data; a prediction model construction module for modeling the disease prognosis data; and a prediction result display module for displaying the prediction results. The prediction model construction module uses a survival analysis method based on deep semi-supervised multitask learning, with the following specific steps:
(1) In survival analysis of prognostic data, a given data set is denoted $D = \{(X_1, T_1, \delta_1), (X_2, T_2, \delta_2), \dots, (X_i, T_i, \delta_i), \dots, (X_N, T_N, \delta_N)\}$. Each triple $(X_i, T_i, \delta_i)$ is one data instance, where $X_i$ is the feature vector of the $i$-th record and $\delta_i$ is its censoring indicator: $\delta_i = 1$ means the record is uncensored (the event was observed), and $\delta_i = 0$ means it is censored (no event was observed). $T_i$ is the survival time of the $i$-th record: for uncensored data, $T_i$ equals the observed survival time $O_i$; for censored data, $T_i$ equals the censoring time $C_i$.
The features of the data set can be expressed as:
$$X = \{X_1, X_2, \dots, X_N\} \in \mathbb{R}^{N \times M}$$
where N is the number of samples and M is the number of features.
The labels of the data set may be expressed as:
$$Y = \{(T_1, \delta_1), (T_2, \delta_2), \dots, (T_i, \delta_i), \dots, (T_N, \delta_N)\}$$
(2) Treating the survival time as a set of discrete time points, convert the original label of each sample into a K-dimensional survival state vector, where $K = \max_i T_i$, $i = 1, 2, \dots, N$, is the maximum survival time over all samples. Each element of the survival state vector indicates whether, at that time point, the event has occurred for the sample, has not occurred, or is unknown. The converted data set labels can be expressed as:
$$Y = \{y_1, y_2, \dots, y_K\}$$
where $y_k$ collects the states of all N samples at the $k$-th time point.
(3) Construct a deep neural network with one input layer and multiple output layers. Its input is the feature matrix X of the data set and its output label is Y; each output layer corresponds to one $y_k$ in Y, that is, to the event prediction task at one time point. The network thus makes predictions for the same task at K different times.
(4) Construct the prediction model. Its objective function consists of five parts: log loss, L1 loss, L2 loss, semi-supervised loss, and ranking loss:
1) Log loss
For labeled data in the binary classification problem without competing risks, the model uses log loss, which penalizes misclassification, to measure the accuracy of the classifier. The label is $y \in \{0, 1\}$. The parameters θ are estimated by maximum likelihood, with likelihood function:
$$\mathcal{L}(\theta) = \prod_{i=1}^{l} p(X_i;\theta)^{y_i}\,\big(1 - p(X_i;\theta)\big)^{1-y_i}$$
where l is the number of labeled samples and $p(X_i;\theta)$ is the posterior probability $p(y_i = 1 \mid X_i;\theta)$ of sample $X_i$. Taking the logarithm of the likelihood and negating it gives the log loss function (the negative log-likelihood), which is minimized:
$$l(\theta) = -\sum_{i=1}^{l}\Big[y_i \log p(X_i;\theta) + (1 - y_i)\log\big(1 - p(X_i;\theta)\big)\Big]$$
That is, the greater the probability the model assigns to each sample's true label, the better.
For survival analysis with competing risks, the event prediction at each time point is treated as a multiclass classification problem. Suppose that, given $X_i$, the conditional probability distribution of y is $p(y_i = k \mid X_i;\theta)$, where $k = 1, 2, \dots, C$ and C is the number of all possible outcomes. The parameters θ are again estimated by maximum likelihood, and the corresponding log loss function is:
$$l(\theta) = -\sum_{i=1}^{l}\sum_{k=1}^{C} I\{y_i = k\}\log p(y_i = k \mid X_i;\theta)$$
where $I\{y_i = k\}$ is an indicator function: $I\{y_i = k\} = 1$ when $y_i = k$, and $I\{y_i = k\} = 0$ otherwise.
2) L1 loss
$$L_1(\theta) = \|\theta\|_1$$
3) L2 loss
$$L_2(\theta) = \|\theta\|_2^2$$
4) Semi-supervised loss
For unlabeled data, the model exploits the unlabeled samples by adding an entropy-constrained regularization term to the objective function.
For the binary problem without competing risks, the event state is a Bernoulli random variable with parameter p, whose entropy is:
$$H(p) = -p\log p - (1 - p)\log(1 - p)$$
The entropy-constrained regularization for unlabeled data is then defined as:
$$\Omega(\theta) = \sum_{j=1}^{u} H\big(p(X_j;\theta)\big)$$
where u is the number of unlabeled samples and p is the probability that the event occurs. If the model assigns unlabeled data confidently to one class, the entropy-constrained regularization term is small.
For the multiclass problem with competing risks, the entropy-constrained regularization of the unlabeled data is defined as:
$$\Omega(\theta) = -\sum_{j=1}^{u}\sum_{k=1}^{C} p(y_j = k \mid X_j;\theta)\log p(y_j = k \mid X_j;\theta)$$
5) Ranking loss
A ranking loss is added to the objective function to enforce the non-increasing trend of the survival probability. The ranking loss is defined as follows:
where $p_{i,p}(y_i = 1 \mid X_i;\theta)$ is the probability of a death event for the $i$-th sample at time point p. That is, for times $p < q$, the event probabilities of the $i$-th sample should satisfy $p_{i,p}(y_i = 1 \mid X_i;\theta) < p_{i,q}(y_i = 1 \mid X_i;\theta)$; otherwise a penalty is applied to the event probabilities. $I\big(p_{i,p}(y_i = 1 \mid X_i;\theta) > p_{i,q}(y_i = 1 \mid X_i;\theta)\big)$ is an indicator function that equals 1 when $p_{i,p}(y_i = 1 \mid X_i;\theta) > p_{i,q}(y_i = 1 \mid X_i;\theta)$ and 0 otherwise.
In summary, the objective function of the deep-learning-based semi-supervised multitask survival analysis model, i.e., of the prediction model, is:
$$L_{total}(\theta) = l(\theta) + \lambda_1 L_1(\theta) + \lambda_2 L_2(\theta) + \lambda_3 \Omega(\theta) + \lambda_4 R(\theta)$$
where $l(\theta)$ is the log loss, $L_1(\theta)$ is the L1 loss, $L_2(\theta)$ is the L2 loss, $\Omega(\theta)$ is the semi-supervised loss, $R(\theta)$ is the ranking loss, and $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are parameters that control the strength of the regularization terms.
(5) Train the model on the disease data to obtain the model parameters θ, which determines the prediction model. Apply the prediction model to new disease data to obtain the disease prognosis prediction.
Further, step (2) converts the original survival analysis problem into a multitask learning problem by converting the label information into vectors.
Further, in step (3), the hidden layer parameters of the deep neural network use a hard parameter sharing mechanism, which reduces the risk of overfitting.
Further, in step (4), the deep semi-supervised multitask learning problem obtained by transforming the survival analysis problem has two important characteristics: unlabeled data caused by censoring, and the non-increasing trend of the survival probability. For the unlabeled data caused by censoring, semi-supervised learning is performed with entropy-constrained regularization. For the non-increasing trend of the survival probability across time points, a ranking loss is introduced to constrain the survival probabilities of the different output layers. In addition, an L1 loss in the objective function performs automatic feature selection, and an L2 loss prevents overfitting.
Further, the prediction result display module performs feature importance evaluation and visualizes the time-dependent and nonlinear effects of features. The importance of a feature F is calculated as follows:
1) Compute the model prediction error on the corresponding test data, denoted $\text{error}_1$.
2) Randomly add noise to feature F of all samples in the test data, recompute the model prediction error, and denote it $\text{error}_2$. For a continuous variable, the added noise follows the normal distribution $\mathcal{N}(0, \sigma\varepsilon)$, where σ is the standard deviation of feature F and ε is a small constant. For a discrete variable, $x_F \to x_F(1 - s) + (1 - x_F)s$, where s is a Bernoulli-distributed noise variable and $x_F$ is the value of feature F.
3) Compute the difference between the two prediction errors: $e = \text{error}_2 - \text{error}_1$.
4) Repeat steps 1) to 3) n times.
5) The importance of feature F is calculated as:
If adding random noise greatly reduces accuracy on the test data, the feature has a large influence on the samples' predictions, which indicates that it is more important.
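The steps above can be sketched as a noise-perturbation importance score. This is a minimal sketch under stated assumptions, not the patent's implementation: the model is any callable `predict(row)`, the prediction error is mean squared error, σε is used as the standard deviation of the Gaussian noise, s flips a binary feature with probability `flip_prob`, and, because the formula for step 5) is not reproduced in the text, the score is assumed to be the mean of the n differences e. All function and parameter names are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence


def mse(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error, a stand-in for the unspecified prediction error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def noise_importance(
    predict: Callable[[List[float]], float],
    X: List[List[float]],
    y: List[float],
    feature: int,
    binary: bool = False,
    n_repeats: int = 10,
    eps: float = 0.1,
    flip_prob: float = 0.1,
    seed: int = 0,
) -> float:
    """Hypothetical importance of one feature: mean of e = error2 - error1."""
    rng = random.Random(seed)
    sigma = statistics.pstdev([row[feature] for row in X])  # std of feature F
    diffs = []
    for _ in range(n_repeats):  # step 4): repeat n times
        # Step 1): error on the unperturbed test data.
        error1 = mse([predict(row) for row in X], y)
        perturbed = []
        for row in X:
            new_row = list(row)
            x_f = new_row[feature]
            if binary:
                # Discrete case: x_F -> x_F*(1-s) + (1-x_F)*s, s ~ Bernoulli(flip_prob).
                s = 1 if rng.random() < flip_prob else 0
                new_row[feature] = x_f * (1 - s) + (1 - x_f) * s
            else:
                # Continuous case: add N(0, sigma*eps) noise (sigma*eps taken as the std).
                new_row[feature] = x_f + rng.gauss(0.0, sigma * eps)
            perturbed.append(new_row)
        # Step 2): error after perturbing feature F.
        error2 = mse([predict(row) for row in perturbed], y)
        diffs.append(error2 - error1)  # step 3)
    return sum(diffs) / len(diffs)  # assumed aggregation for step 5)
```

With a toy model that only reads feature 0, perturbing feature 1 leaves the error unchanged, so its importance is exactly zero, while feature 0 gets a positive score.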
Furthermore, the prediction result display module visualizes the influence of features on prognosis by plotting the predicted cumulative incidence curves for different features. Plotting the predicted cumulative incidence curve for a feature F involves the following steps:
1) Let all possible values of feature F be $x_{F,1}, x_{F,2}, \dots, x_{F,v}, \dots, x_{F,V}$, where V is the number of possible values of F.
2) Set feature F to $x_F = x_{F,v}$, $v = 1, 2, \dots, V$, keep the values of all other features unchanged, and compute the average cumulative incidence predicted by the model:
$$\bar{\hat{y}}(x_{F,v}) = \frac{1}{N}\sum_{i=1}^{N} \hat{y}_i(x_{F,v}, x_{i,o})$$
where $\bar{\hat{y}}$ is the average of the model's predicted outputs over all data, $\hat{y}_i$ is the model's predicted output for the $i$-th record, and $x_{i,o}$ denotes the values of all features of the $i$-th record other than F.
Further, when plotting the predicted cumulative incidence curve for a continuous variable, the value range of the variable is divided evenly into R equal parts, and the cumulative incidence is estimated and plotted at all division points, which reduces computation; R is chosen according to the value range of the specific feature.
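Step 2) is essentially a partial-dependence average over the data set. A minimal sketch, assuming a hypothetical `predict_curve(row)` that returns the K predicted cumulative incidences for one record, and reading the division into R equal parts as R + 1 evenly spaced points including both ends of the value range:

```python
from typing import Callable, Dict, List, Sequence


def continuous_grid(values: Sequence[float], r: int) -> List[float]:
    """Split [min, max] of a continuous feature into r equal parts; return the r + 1 division points."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / r
    return [lo + j * step for j in range(r + 1)]


def average_incidence_curves(
    predict_curve: Callable[[List[float]], List[float]],
    X: List[List[float]],
    feature: int,
    candidate_values: Sequence[float],
) -> Dict[float, List[float]]:
    """For each value x_{F,v}: set feature F to it in every record, keep the other
    features x_{i,o} unchanged, and average the predicted curves over all N records."""
    curves: Dict[float, List[float]] = {}
    for v in candidate_values:
        total: List[float] = []
        for row in X:
            modified = list(row)
            modified[feature] = v
            pred = predict_curve(modified)
            if not total:
                total = [0.0] * len(pred)
            total = [a + b for a, b in zip(total, pred)]
        curves[v] = [t / len(X) for t in total]
    return curves
```

Each entry of the returned dictionary is one curve to plot against time; for a discrete feature the candidate values are simply its V observed levels.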
The invention has the beneficial effects that:
The invention builds on a deep neural network and recasts the survival analysis problem as a multitask learning model composed of semi-supervised learning problems, each predicting the survival probability at one of multiple time points. The deep neural network can fit nonlinear feature effects. The model directly models the survival probability, does not rely on the proportional hazards assumption, can fit time-dependent effects, and offers better interpretability.
Taking into account censored data and the non-increasing trend of the survival probability in survival analysis, the invention fits the data with a semi-supervised loss function and a ranking loss function, makes full use of both complete and censored data, and can handle both the conventional survival analysis problem and survival analysis with competing risks. Through multitask learning across multiple time points, the model shares data among the prediction tasks, lets the tasks constrain one another, and improves its generalization ability. The invention also provides a feature importance evaluation method and visualizes the time-dependent and nonlinear effects of features.
Drawings
FIG. 1 is a diagram of a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to the present invention;
FIG. 2 is a schematic diagram of the data set label transformation;
FIG. 3 is a diagram of the neural network architecture.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
The terms used in this application are defined as follows. Censored data: data for which no outcome event has occurred by a specified end time is called censored data, and the time from the starting point to censoring is called the censoring time. Time-dependent effect: the proportional hazards assumption holds that, regardless of the baseline risk, the ratio of the risk of the event in an exposed individual to that in an unexposed individual is constant at every time point; a feature that does not satisfy this assumption is considered to have a time-dependent effect on disease prognosis. Competing risks: during disease prognosis follow-up, a patient may fail to experience the event of interest because some other event occurs first, i.e., the other events "compete" with the event of interest; these are called competing risks. Competing risks arise only in survival analysis problems with multiple endpoint events, of which only one can occur at any given time.
As shown in FIG. 1, the present application provides a disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, comprising: a data acquisition module for acquiring disease prognosis data; a data preprocessing module for handling missing values in, and normalizing, the disease prognosis data; a prediction model construction module for modeling the disease prognosis data; and a prediction result display module for visually displaying the prediction results. The prediction model construction module uses a survival analysis method based on deep semi-supervised multitask learning, which works as follows:
(1) In survival analysis of prognostic data, a given data set is denoted $D = \{(X_1, T_1, \delta_1), (X_2, T_2, \delta_2), \dots, (X_i, T_i, \delta_i), \dots, (X_N, T_N, \delta_N)\}$. Each triple $(X_i, T_i, \delta_i)$ is one data instance, where $X_i$ is the feature vector of the $i$-th record and $\delta_i$ is its censoring indicator: $\delta_i = 1$ means the record is uncensored (the event was observed), and $\delta_i = 0$ means it is censored (no event was observed). $T_i$ is the survival time of the $i$-th record: for uncensored data, $T_i$ equals the observed survival time $O_i$; for censored data, $T_i$ equals the censoring time $C_i$.
The features of the data set can be expressed as:
$$X = \{X_1, X_2, \dots, X_N\} \in \mathbb{R}^{N \times M}$$
where N is the number of samples and M is the number of features.
The labels of the data set may be expressed as:
$$Y = \{(T_1, \delta_1), (T_2, \delta_2), \dots, (T_i, \delta_i), \dots, (T_N, \delta_N)\}$$
(2) The invention treats survival time as a set of discrete time points rather than as a continuous variable. The original label of each sample can therefore be converted into a K-dimensional survival state vector, where $K = \max_i T_i$, $i = 1, 2, \dots, N$, is the maximum survival time over all samples. Each element of the survival state vector indicates that, at that time point, the event has occurred for the sample (value 1), has not occurred (value 0), or is unknown (value 2). An example of the data set label transformation is shown in FIG. 2. The converted data set labels can be expressed as:
$$Y = \{y_1, y_2, \dots, y_K\}$$
Converting the label information into vectors in this way turns the original survival analysis problem into a multitask learning problem.
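The label conversion can be sketched as follows. FIG. 2 is not reproduced here, so the exact convention is an assumption: time points are the integers 1..K; an observed event at $T_i$ is coded 1 from $T_i$ onward (the event state is read cumulatively, consistent with the non-increasing survival probability); a censored record is coded 0 up to and including its censoring time and 2 (unknown) afterwards. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

EVENT, NO_EVENT, UNKNOWN = 1, 0, 2


def to_state_vectors(labels: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Convert (T_i, delta_i) pairs into K-dimensional survival state vectors.

    K = max(T_i); element k - 1 of each vector is the state at time point k.
    """
    k_max = max(t for t, _ in labels)
    vectors = []
    for t_i, delta_i in labels:
        vec = []
        for k in range(1, k_max + 1):
            if k < t_i:
                vec.append(NO_EVENT)  # known to be event-free before T_i
            elif delta_i == 1:
                vec.append(EVENT)  # event observed at T_i and stays "occurred"
            elif k == t_i:
                vec.append(NO_EVENT)  # censored: still event-free at C_i
            else:
                vec.append(UNKNOWN)  # censored: state after C_i is unknown
        vectors.append(vec)
    return vectors


def to_task_labels(vectors: List[List[int]]) -> List[List[int]]:
    """Transpose so that entry k - 1 holds y_k, the labels of all N samples at time point k."""
    return [list(col) for col in zip(*vectors)]
```

Each column returned by `to_task_labels` becomes the target of one output layer; the entries coded 2 are exactly the unlabeled data handled by the semi-supervised loss.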
(3) A deep neural network with one input layer and multiple output layers is used. Its input is the feature matrix X of the data set and its output label is Y; each output layer corresponds to one $y_k$ in Y, that is, to the event prediction task at one time point. FIG. 3 shows a deep neural network with K output layers: if output k corresponds to the task at time $T_k$, the network makes predictions for the same task at K different times. The hidden layer parameters of the network use a hard parameter sharing mechanism, which reduces the risk of overfitting. Intuitively, the more tasks are learned simultaneously, the more the model must find a representation that captures what the tasks have in common, and the lower the risk of overfitting any single task.
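A forward pass with hard parameter sharing can be sketched without any framework. This is a minimal sketch under assumptions: one shared ReLU hidden layer and one sigmoid output per time point (the text does not fix the number of hidden layers, widths, or activations); the class and its parameters are hypothetical, and training is omitted.

```python
import math
import random
from typing import List


class SharedMultiTaskNet:
    """Hard parameter sharing: one hidden layer shared by all K time-point heads."""

    def __init__(self, n_features: int, n_hidden: int, n_tasks: int, seed: int = 0):
        rng = random.Random(seed)
        # Shared hidden-layer parameters, used by every task.
        self.w_hidden = [[rng.gauss(0.0, 0.1) for _ in range(n_features)] for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        # Task-specific output heads: one weight vector and bias per time point.
        self.w_heads = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_tasks)]
        self.b_heads = [0.0] * n_tasks

    def forward(self, x: List[float]) -> List[float]:
        """Return K event probabilities, one per output layer (time point)."""
        hidden = [
            max(0.0, sum(w * xi for w, xi in zip(ws, x)) + b)  # ReLU
            for ws, b in zip(self.w_hidden, self.b_hidden)
        ]
        return [
            1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(ws, hidden)) + b)))  # sigmoid
            for ws, b in zip(self.w_heads, self.b_heads)
        ]
```

Only the heads differ between tasks, so every gradient step on any time point also updates the shared layer; for the competing-risk case each head would instead output a softmax over the C outcomes.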
(4) Objective function definition
The deep semi-supervised multitask learning problem obtained by transforming the survival analysis problem has two important characteristics: unlabeled data caused by censoring, and the non-increasing trend of the survival probability. Appropriate constraints must be designed to handle each. For the unlabeled data caused by censoring, semi-supervised learning is performed with entropy-constrained regularization: if the model assigns unlabeled data confidently to one class, the entropy-constrained regularization term is small. For the non-increasing trend of the survival probability across time points, a ranking loss is introduced to constrain the survival probabilities of the different output layers. In addition, an L1 loss in the objective function performs automatic feature selection, and an L2 loss prevents overfitting. The objective function of the model therefore consists of five parts: log loss, L1 loss, L2 loss, semi-supervised loss, and ranking loss.
1) Log loss
For labeled data in the binary classification problem without competing risks, the model uses log loss, which penalizes misclassification, to measure the accuracy of the classifier. The label is $y \in \{0, 1\}$. The parameters θ are estimated by maximum likelihood, with likelihood function:
$$\mathcal{L}(\theta) = \prod_{i=1}^{l} p(X_i;\theta)^{y_i}\,\big(1 - p(X_i;\theta)\big)^{1-y_i}$$
where l is the number of labeled samples and $p(X_i;\theta)$ is the posterior probability $p(y_i = 1 \mid X_i;\theta)$ of sample $X_i$. Taking the logarithm of the likelihood and negating it gives the log loss function (the negative log-likelihood), which is minimized:
$$l(\theta) = -\sum_{i=1}^{l}\Big[y_i \log p(X_i;\theta) + (1 - y_i)\log\big(1 - p(X_i;\theta)\big)\Big]$$
That is, the greater the probability the model assigns to each sample's true label, the better.
For survival analysis with competing risks, we treat the event prediction at each time point as a multiclass classification problem. Suppose that, given $X_i$, the conditional probability distribution of y is $p(y_i = k \mid X_i;\theta)$. The model for the classification problem with $y \in \{1, 2, \dots, C\}$ is an extension of the binary model; its parameters can likewise be estimated by maximum likelihood, and the corresponding log loss function is:
$$l(\theta) = -\sum_{i=1}^{l}\sum_{k=1}^{C} I\{y_i = k\}\log p(y_i = k \mid X_i;\theta)$$
where $I\{y_i = k\}$ is an indicator function: $I\{y_i = k\} = 1$ when $y_i = k$, and $I\{y_i = k\} = 0$ otherwise.
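Both log losses can be written directly from their definitions. A minimal sketch, assuming the losses are plain sums over labeled entries, that entries whose state is unknown (coded 2 in step (2)) are skipped in the binary case, and that probabilities are clipped away from 0 and 1 for numerical safety; the clipping constant and function names are illustrative.

```python
import math
from typing import Sequence

EPS = 1e-12  # clip probabilities to avoid log(0)


def binary_log_loss(labels: Sequence[int], probs: Sequence[float]) -> float:
    """-sum[y log p + (1 - y) log(1 - p)] over labeled entries (label 0 or 1)."""
    total = 0.0
    for y, p in zip(labels, probs):
        if y not in (0, 1):
            continue  # unknown state (e.g. 2) is left to the semi-supervised loss
        p = min(max(p, EPS), 1.0 - EPS)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def multiclass_log_loss(labels: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """-sum_i sum_k I{y_i = k} log p(y_i = k | X_i); classes are numbered 1..C."""
    total = 0.0
    for y, dist in zip(labels, probs):
        # Only the true class survives the indicator I{y_i = k}.
        total -= math.log(max(dist[y - 1], EPS))
    return total
```

For example, two labeled entries predicted at 0.5 each cost 2 log 2, and the unknown entry in the same list contributes nothing.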
2) L1 loss
The L1 loss is defined as follows:
$$L_1(\theta) = \|\theta\|_1$$
The L1 loss adds the sum of the absolute values of all weight parameters θ to the objective function; it drives more of the θ values to exactly zero, which performs automatic feature selection.
3) L2 loss
The L2 loss is defined as follows:
$$L_2(\theta) = \|\theta\|_2^2$$
The L2 loss adds the sum of the squares of all weight parameters θ to the objective function; it pushes all θ values toward zero, which helps avoid overfitting.
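Both penalties are simple sums over the weight parameters. A minimal sketch, assuming the weights have been flattened into one sequence (the text does not say whether bias terms are penalized, so the caller decides what to pass in); the function names are hypothetical.

```python
from typing import Iterable


def l1_penalty(weights: Iterable[float]) -> float:
    """L1(theta) = ||theta||_1: sum of absolute values; tends to produce exact zeros."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights: Iterable[float]) -> float:
    """L2(theta) = ||theta||_2^2: sum of squares; shrinks all weights toward zero."""
    return sum(w * w for w in weights)
```

When the L1 penalty drives all first-layer weights attached to one input feature to zero, that feature no longer influences any output, which is the sense in which L1 performs feature selection.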
4) Semi-supervised loss
For unlabeled data, the model exploits the unlabeled samples by adding an entropy-constrained regularization term to the objective function. For the binary problem without competing risks, the event state is a Bernoulli random variable with parameter p, whose entropy is:
$$H(p) = -p\log p - (1 - p)\log(1 - p)$$
The entropy-constrained regularization for unlabeled data is then defined as:
$$\Omega(\theta) = \sum_{j=1}^{u} H\big(p(X_j;\theta)\big)$$
where u is the number of unlabeled samples and p is the probability that the event occurs. If the model assigns unlabeled data confidently to one class, the entropy-constrained regularization term is small.
For the multiclass problem with competing risks, the entropy-constrained regularization of the unlabeled data is defined as:
$$\Omega(\theta) = -\sum_{j=1}^{u}\sum_{k=1}^{C} p(y_j = k \mid X_j;\theta)\log p(y_j = k \mid X_j;\theta)$$
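The entropy regularizer can be sketched as follows, assuming Ω(θ) is the plain sum of per-sample entropies over the u unlabeled samples (the normalization is not stated in the text) and using the convention 0 log 0 = 0. Function names are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-12


def binary_entropy(p: float) -> float:
    """H(p) = -p log p - (1 - p) log(1 - p), with 0 log 0 taken as 0."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > EPS)


def entropy_regularizer(probs: Sequence[float]) -> float:
    """Binary case: sum of H(p) over the predicted event probabilities of unlabeled samples."""
    return sum(binary_entropy(p) for p in probs)


def multiclass_entropy_regularizer(dists: Sequence[Sequence[float]]) -> float:
    """Competing-risk case: -sum_j sum_k p_jk log p_jk over unlabeled samples."""
    return -sum(q * math.log(q) for dist in dists for q in dist if q > EPS)
```

The term is largest at p = 0.5 and zero at p = 0 or 1, so minimizing it pushes the predictions for censored time points toward confident values.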
5) Ranking loss
The non-increasing trend of the survival probability is enforced by adding a ranking penalty to the objective function. The ranking penalty is defined as follows:
where $p_{i,p}(y_i = 1 \mid X_i; \theta)$ denotes the probability that the death event of the $i$-th sample has occurred by time $p$. That is, for times $p < q$, the event probability of the $i$-th sample should satisfy $p_{i,p}(y_i = 1 \mid X_i; \theta) \le p_{i,q}(y_i = 1 \mid X_i; \theta)$; otherwise a penalty is applied to the event probabilities. $I\big(p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta)\big)$ is an indicator function: $I = 1$ when $p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta)$, and $I = 0$ otherwise.
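The ranking constraint can be illustrated with a short sketch that finds every pair of time points $p < q$ where the predicted event probability decreases. Because the exact penalty formula is not reproduced in the text, the sketch assumes each violation is weighted by its size $p_{i,p} - p_{i,q}$ (so the penalty is differentiable almost everywhere); treat that weighting, and the function name, as assumptions.

```python
import numpy as np

def ranking_penalty(event_probs):
    """Penalize decreases in predicted event probability over time.

    event_probs: array (n_samples, K); column t holds p_{i,t}(y_i = 1 | X_i; theta)
    for the t-th time point, in increasing time order.
    """
    P = np.asarray(event_probs, dtype=float)
    K = P.shape[1]
    total = 0.0
    for p in range(K):
        for q in range(p + 1, K):               # all pairs of times p < q
            diff = P[:, p] - P[:, q]            # > 0 means the ordering is violated
            violated = diff > 0                 # indicator I(p_{i,p} > p_{i,q})
            total += np.sum(diff * violated)    # assumption: weight by violation size
    return total

monotone = [[0.1, 0.2, 0.4]]
broken = [[0.1, 0.3, 0.2]]
print(ranking_penalty(monotone), ranking_penalty(broken))
```

A monotone row gives zero penalty; in the second row only the pair (time 2, time 3) is out of order, contributing about 0.1.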
In summary, the objective function of the prediction model, i.e., of the deep-learning-based semi-supervised multitask survival analysis model, is:
$$L_{total}(\theta) = l(\theta) + \lambda_1 L_1(\theta) + \lambda_2 L_2(\theta) + \lambda_3 \Omega(\theta) + \lambda_4 R(\theta)$$
where $l(\theta)$ is the log loss, $L_1(\theta)$ is the L1 loss, $L_2(\theta)$ is the L2 loss, $\Omega(\theta)$ is the semi-supervised loss, $R(\theta)$ is the ranking loss, and $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are parameters that control the strength of the regularization terms.
The model is trained on the disease data to obtain its parameters $\theta$, which determines the prediction model. The prediction model is then applied to new disease data to obtain a disease prognosis prediction.
(5) Feature importance
The importance of a feature F is calculated as follows:
1) Compute the model's prediction error on the corresponding test data and denote it error1.
2) Add random noise to feature F for all samples in the test data (i.e., randomly perturb each sample's value of F), recompute the model's prediction error, and denote it error2. For a continuous variable, add random noise drawn from the normal distribution $N(0, \sigma\epsilon)$, where $\sigma$ is the standard deviation of feature F and $\epsilon$ is a small constant. For a discrete variable, apply $x_F \to x_F(1-s) + (1-x_F)s$, where $s$ is Bernoulli-distributed noise and $x_F$ is the value of feature F (for a binary feature this flips $x_F$ whenever $s = 1$).
3) Compute the difference between the two prediction errors: $e = error2 - error1$.
4) Repeat steps 1-3 n times; n is usually greater than 500.
5) The importance of feature F is calculated as follows:
This describes the importance of the feature because, if adding random noise greatly reduces accuracy on the test data (i.e., error2 increases), the feature has a large influence on the model's predictions, and its importance is therefore high.
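The noise-perturbation procedure above can be sketched in NumPy. Several details are assumptions rather than the patent's specification: `predict` and `error_fn` are hypothetical callables supplied by the user, the second parameter of $N(0, \sigma\epsilon)$ is read as a standard deviation, `flip_prob` is an arbitrary Bernoulli parameter, and the final importance is taken as the mean of the n differences $e$ because the exact formula is not given.

```python
import numpy as np

def noise_importance(predict, error_fn, X, y, feature, discrete=False,
                     eps=0.01, flip_prob=0.1, n_repeats=500, seed=0):
    """Average increase in prediction error after perturbing one feature.

    predict(X) -> predictions; error_fn(y, pred) -> scalar error.
    Both callables, eps, flip_prob and the averaging step are assumptions.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    error1 = error_fn(y, predict(X))                    # step 1: baseline error
    sigma = X[:, feature].std()                         # std of feature F
    diffs = []
    for _ in range(n_repeats):                          # step 4: repeat n times
        Xn = X.copy()
        x = X[:, feature]
        if discrete:
            s = rng.binomial(1, flip_prob, size=x.shape)    # Bernoulli noise s
            Xn[:, feature] = x * (1 - s) + (1 - x) * s      # flips x_F where s = 1
        else:
            Xn[:, feature] = x + rng.normal(0.0, sigma * eps, size=x.shape)
        error2 = error_fn(y, predict(Xn))               # step 2: error after noise
        diffs.append(error2 - error1)                   # step 3: e = error2 - error1
    return float(np.mean(diffs))                        # assumption: average of e

def predict(A):           # toy model that only uses feature 0
    return 3.0 * A[:, 0]

def mse(t, p):
    return float(np.mean((t - p) ** 2))

X = np.random.default_rng(1).normal(size=(200, 2))
y = 3.0 * X[:, 0]
imp0 = noise_importance(predict, mse, X, y, feature=0, eps=0.5, n_repeats=50)
imp1 = noise_importance(predict, mse, X, y, feature=1, eps=0.5, n_repeats=50)
print(imp0 > imp1)  # True: the unused feature 1 has zero importance
```

The idea is close to permutation importance, except that the feature is perturbed with small random noise instead of being shuffled.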
(6) Visualization of feature impact on prognosis
The influence of a feature on prognosis is displayed visually by plotting the predicted cumulative incidence curves corresponding to different values of the feature. The predicted cumulative incidence curve for a feature F is drawn as follows:
1) Let all possible values of feature F be $x_{F,1}, x_{F,2}, \dots, x_{F,v}, \dots, x_{F,V}$, where $V$ is the number of possible values of F.
2) Set the value of feature F to $x_F = x_{F,v}$, $v = 1, 2, \dots, V$, keep the values of all other features unchanged, and compute the average of the model-predicted cumulative incidence:
$$\bar{F}(x_{F,v}) = \frac{1}{N}\sum_{i=1}^{N} \hat{F}(x_{F,v}, x_{i,o})$$
where $\bar{F}(x_{F,v})$ is the average of the model's predicted outputs over all data, $\hat{F}(x_{F,v}, x_{i,o})$ is the model's predicted output for the $i$-th piece of data, and $x_{i,o}$ denotes the values of all features other than F in the $i$-th piece of data.
3) Plot the averages $\bar{F}(x_{F,v})$ obtained in step 2) as curves. For a continuous variable, the value range can be divided evenly into R parts and the cumulative incidence estimated and plotted only at the division points, which reduces computation; R is usually chosen according to the value range of the specific feature.
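Steps 1) to 3) amount to a partial-dependence computation over time. The sketch below assumes a hypothetical `predict_cif` callable that maps an $(N, M)$ feature matrix to $(N, K)$ predicted cumulative incidences at the K time points, and it includes both endpoints of the range among the R + 1 grid points for continuous features; both choices are assumptions for illustration.

```python
import numpy as np

def average_cif_curve(predict_cif, X, feature, values):
    """For each value v, set `feature` to v for every sample, keep the other
    features unchanged, and average the predicted cumulative incidence.

    predict_cif: hypothetical callable, (N, M) features -> (N, K) predictions.
    Returns an array of shape (len(values), K): one curve per value.
    """
    curves = []
    for v in values:
        Xv = X.copy()
        Xv[:, feature] = v                      # x_F = x_{F,v}; x_{i,o} unchanged
        curves.append(predict_cif(Xv).mean(axis=0))
    return np.array(curves)

def grid_values(X, feature, R):
    """Split the range of a continuous feature into R equal parts (R + 1 points)."""
    return np.linspace(X[:, feature].min(), X[:, feature].max(), R + 1)

# Toy predictor: cumulative incidence 1 - exp(-rate * t), rate grows with feature 0
t = np.arange(1, 6)
def toy_cif(A):
    rate = 0.1 * np.exp(A[:, [0]])              # shape (N, 1)
    return 1.0 - np.exp(-rate * t)              # shape (N, K)

X = np.random.default_rng(0).normal(size=(100, 3))
curves = average_cif_curve(toy_cif, X, 0, grid_values(X, 0, R=4))
print(curves.shape)  # (5, 5)
```

Each row of `curves` can then be drawn against the time points with any plotting library.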
The method fits nonlinear functions of the data with a deep neural network, whose structure can be flexibly extended according to the dimension of the input data, the length of the survival time, and the required model accuracy. The model directly models the survival probability and does not rely on the proportional hazards assumption, so it can fit time-dependent effects of the features and offers better interpretability. Both complete (uncensored) and censored data are fully used through the log loss and the semi-supervised loss. The ranking loss exploits the rule that the survival probability is non-increasing. The L1 and L2 losses achieve automatic feature selection and prevent overfitting. Through multitask learning over multiple time points, the model shares data among the prediction tasks and lets the tasks constrain one another, which improves its generalization ability. The model can handle both the traditional survival analysis problem and survival analysis with competing risks. In addition, a feature importance evaluation method based on the deep learning model is provided, and the time-dependent and nonlinear effects of the features on prognosis are displayed visually.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Although the present invention has been disclosed through preferred embodiments, those skilled in the art may, using the methods and technical content disclosed above, make many possible variations and modifications to the technical solution of the present invention, or modify it into equivalent embodiments, without departing from the scope of that technical solution. Therefore, any simple modification, equivalent change, or modification made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, still falls within the protection scope of the technical solution of the present invention.
Claims (7)
1. A disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis, comprising: a data acquisition module for acquiring disease prognosis data; a data preprocessing module for performing missing-value processing and normalization on the disease prognosis data; a prediction model construction module for modeling the disease prognosis data; and a prediction result display module for displaying the prediction results; wherein the prediction model construction module adopts a survival analysis method based on deep semi-supervised multitask learning, comprising the following steps:
(1) In survival analysis of prognostic data, a given data set is denoted $D = \{(X_1, T_1, \delta_1), (X_2, T_2, \delta_2), \dots, (X_i, T_i, \delta_i), \dots, (X_N, T_N, \delta_N)\}$. Each triple $(X_i, T_i, \delta_i)$ is one data instance, where $X_i$ is the feature vector of the $i$-th sample and $\delta_i$ is its censoring indicator: $\delta_i = 1$ means the sample is uncensored, i.e., the event was observed, and $\delta_i = 0$ means the sample is censored, i.e., no event was observed. $T_i$ is the survival time of the $i$-th sample: for uncensored data, $T_i$ equals the observed survival time $O_i$; for censored data, $T_i$ equals the censoring time $C_i$.
The features of the data set can be expressed as:
$$X = (X_1, X_2, \dots, X_N)^{\top} \in \mathbb{R}^{N \times M}$$
where N is the number of samples and M is the number of features.
The labels of the data set can be expressed as:
$$Y = \{(T_1, \delta_1), (T_2, \delta_2), \dots, (T_i, \delta_i), \dots, (T_N, \delta_N)\}$$
(2) Treat the survival time as a sequence of discrete time points and convert each sample's original label into a K-dimensional survival-status vector, where $K = \max(T_i)$, $i = 1, \dots, N$, is the maximum survival time over all samples. Each element of the vector indicates whether, at that time point, the event has occurred, has not occurred, or is unknown for the sample. The converted data set labels can be expressed as:
$$Y = \{\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_K\}$$
where $\mathbf{y}_k$ holds the status of all N samples at the $k$-th time point.
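The label conversion can be sketched as follows for the binary case without competing risks. The integer codes (0 = not occurred, 1 = occurred, -1 = unknown), the integer time grid 1..K, and the convention that a censored sample is known to be event-free up to and including its censoring time are assumptions made for illustration.

```python
import numpy as np

UNKNOWN = -1  # hypothetical code for "status unknown"

def to_status_vectors(T, delta):
    """Convert (T_i, delta_i) labels to K-dimensional survival-status vectors.

    Time points are 1..K with K = max(T_i). Encoding (an assumption):
      0 = event not yet occurred, 1 = event occurred, -1 = unknown.
    """
    T = np.asarray(T, dtype=int)
    delta = np.asarray(delta, dtype=int)
    K = T.max()
    t = np.arange(1, K + 1)            # time points 1..K
    Y = np.zeros((len(T), K), dtype=int)
    for i, (Ti, di) in enumerate(zip(T, delta)):
        if di == 1:                     # uncensored: event observed at T_i
            Y[i, t >= Ti] = 1
        else:                           # censored at T_i: status after T_i unknown
            Y[i, t > Ti] = UNKNOWN
    return Y

print(to_status_vectors([3, 2], [1, 0]).tolist())  # [[0, 0, 1], [0, 0, -1]]
```

Column k of the result corresponds to $\mathbf{y}_k$, the label vector for the output layer of time point k; the unknown entries are what the semi-supervised loss handles.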
(3) Construct a deep neural network with one input layer and multiple output layers. The input of the network is the feature matrix X of the data set and the output labels are Y; each output layer corresponds to one element $\mathbf{y}$ of Y, i.e., to the event prediction task at one time point. The network thus makes predictions for the same task at K different times.
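A forward pass of such a network can be sketched in NumPy. Following the hard parameter sharing described in claim 3, all tasks share the hidden layers and each time point gets its own output head. The ReLU hidden activation, sigmoid outputs (binary case only; a competing-risks model would use a softmax over C outcomes per head), layer sizes, and initialization are assumptions for illustration.

```python
import numpy as np

def init_params(n_features, hidden_sizes, K, seed=0):
    """Shared hidden layers plus one output head per time point."""
    rng = np.random.default_rng(seed)
    sizes = [n_features] + list(hidden_sizes)
    shared = [(rng.normal(0, 0.1, (a, b)), np.zeros(b))
              for a, b in zip(sizes[:-1], sizes[1:])]
    # one output head per time point (K prediction tasks)
    heads = [(rng.normal(0, 0.1, (sizes[-1], 1)), np.zeros(1)) for _ in range(K)]
    return shared, heads

def forward(params, X):
    shared, heads = params
    h = X
    for W, b in shared:                     # hard sharing: common hidden layers
        h = np.maximum(0.0, h @ W + b)      # ReLU
    logits = np.hstack([h @ W + b for W, b in heads])   # (N, K)
    return 1.0 / (1.0 + np.exp(-logits))    # event probability at each time point

X = np.random.default_rng(1).normal(size=(4, 6))
P = forward(init_params(6, [16, 8], K=5), X)
print(P.shape)  # (4, 5)
```

Because every head reads the same shared representation, gradients from all K time points update the hidden layers together, which is how the tasks share information and constrain one another.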
(4) Construct the prediction model. Its objective function consists of five parts: log loss, L1 loss, L2 loss, semi-supervised loss, and ranking loss:
1) Log loss
For labeled data in the binary classification problem without competing risks, the model uses the log loss, which penalizes misclassification, to measure the accuracy of the classifier. The label is $y \in \{0, 1\}$. The parameters $\theta$ are estimated by maximum likelihood, with the likelihood function:
$$L(\theta) = \prod_{i=1}^{l} p(X_i;\theta)^{y_i}\,\big(1 - p(X_i;\theta)\big)^{1-y_i}$$
where $l$ is the number of labeled samples and $p(X_i;\theta)$ is the posterior probability of the event for sample $X_i$. Taking the logarithm of the likelihood gives the log-likelihood function; its negative is the log loss function:
$$l(\theta) = -\sum_{i=1}^{l}\Big[y_i \log p(X_i;\theta) + (1-y_i)\log\big(1-p(X_i;\theta)\big)\Big]$$
That is, the higher the probability the model assigns to each sample's true label, the better.
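One way to apply this binary log loss across the K output layers is to sum it over all (sample, time point) entries whose status is known and skip the unknown ones, which are left to the semi-supervised term. The sketch below assumes the -1 "unknown" code used for illustration elsewhere; the masking scheme and the function name are assumptions.

```python
import numpy as np

def masked_log_loss(P, Y, unknown=-1):
    """Negative log-likelihood over labeled entries only.

    P: (N, K) predicted event probabilities; Y: (N, K) status vectors with
    entries 0/1 and `unknown` for censored time points (excluded here).
    """
    P = np.clip(np.asarray(P, dtype=float), 1e-12, 1 - 1e-12)
    Y = np.asarray(Y)
    mask = Y != unknown                 # keep only entries with a known label
    y = Y[mask]
    p = P[mask]
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

print(round(masked_log_loss([[0.2, 0.9]], [[0, 1]]), 4))   # 0.3285
print(round(masked_log_loss([[0.2, 0.5]], [[0, -1]]), 4))  # 0.2231
```

In the second call the unknown entry is ignored, so only $-\log(1 - 0.2)$ remains.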
For the survival analysis problem with competing risks, the event prediction at each time point is treated as a multi-class classification problem. Suppose that, given $X_i$, the conditional probability distribution of $y$ is $p(y_i = k \mid X_i; \theta)$, where $k = 1, 2, \dots, C$ and $C$ is the number of possible outcomes. The parameters $\theta$ are estimated by maximum likelihood, and the corresponding log loss function is:
$$l(\theta) = -\sum_{i=1}^{l}\sum_{k=1}^{C} I\{y_i = k\}\,\log p(y_i = k \mid X_i; \theta)$$
where $I\{y_i = k\}$ is an indicator function: $I\{y_i = k\} = 1$ when $y_i = k$, and $I\{y_i = k\} = 0$ otherwise.
2) L1 loss
$$L_1(\theta) = \|\theta\|_1$$
3) L2 loss
$$L_2(\theta) = \|\theta\|_2^2$$
4) Semi-supervised loss
Unlabeled data are exploited by adding an entropy-constrained regularization term to the objective function.
For the binary problem without competing risks, the event status is a Bernoulli random variable with parameter $p$, whose entropy is:
$$H(p) = -p\log p - (1-p)\log(1-p)$$
The entropy-constrained regularization for the unlabeled data is then:
$$\Omega(\theta) = \sum_{j=1}^{u} H\big(p(X_j; \theta)\big)$$
where $u$ is the number of unlabeled samples and $p$ is the probability that the event occurs. If the predicted class of an unlabeled sample is nearly certain, the entropy-constrained regularization term is small.
For the multi-class problem with competing risks, the entropy-constrained regularization of the unlabeled data is defined as follows:
$$\Omega(\theta) = -\sum_{j=1}^{u}\sum_{k=1}^{C} p(y_j = k \mid X_j; \theta)\,\log p(y_j = k \mid X_j; \theta)$$
5) Ranking loss
The non-increasing trend of the survival probability is enforced by adding a ranking penalty to the objective function. The ranking penalty is defined as follows:
where $p_{i,p}(y_i = 1 \mid X_i; \theta)$ denotes the probability that the death event of the $i$-th sample has occurred by time $p$. That is, for times $p < q$, the event probability of the $i$-th sample should satisfy $p_{i,p}(y_i = 1 \mid X_i; \theta) \le p_{i,q}(y_i = 1 \mid X_i; \theta)$; otherwise a penalty is applied to the event probabilities. $I\big(p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta)\big)$ is an indicator function: $I = 1$ when $p_{i,p}(y_i = 1 \mid X_i; \theta) > p_{i,q}(y_i = 1 \mid X_i; \theta)$, and $I = 0$ otherwise.
In summary, the objective function of the prediction model, i.e., of the deep-learning-based semi-supervised multitask survival analysis model, is:
$$L_{total}(\theta) = l(\theta) + \lambda_1 L_1(\theta) + \lambda_2 L_2(\theta) + \lambda_3 \Omega(\theta) + \lambda_4 R(\theta)$$
where $l(\theta)$ is the log loss, $L_1(\theta)$ is the L1 loss, $L_2(\theta)$ is the L2 loss, $\Omega(\theta)$ is the semi-supervised loss, $R(\theta)$ is the ranking loss, and $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are parameters that control the strength of the regularization terms.
The model is trained on the disease data to obtain its parameters $\theta$, which determines the prediction model. The prediction model is then applied to new disease data to obtain a disease prognosis prediction.
2. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to claim 1, wherein step (2) converts the original survival analysis problem into a multitask learning problem by converting the label information into vectors.
3. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to claim 1, wherein in step (3), the hidden-layer parameters of the deep neural network use a hard parameter-sharing mechanism to reduce the risk of overfitting.
4. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to claim 1, wherein in step (4), the deep semi-supervised multitask learning problem obtained by transforming the survival analysis problem has two important characteristics: unlabeled data caused by censoring, and the non-increasing trend of the survival probability. For the unlabeled data caused by censoring, semi-supervised learning is performed using entropy-constrained regularization. For the non-increasing trend of the survival probability across time points, a ranking loss is introduced to constrain the survival probabilities of the different output layers. In addition, an L1 loss is introduced into the objective function to achieve automatic feature selection, and an L2 loss is introduced to avoid overfitting.
5. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to claim 1, wherein the prediction result display module is used for feature importance evaluation and for visually displaying the time-dependent and nonlinear effects of the features. The importance of a feature F is calculated as follows:
1) Compute the model's prediction error on the corresponding test data and denote it error1.
2) Add random noise to feature F for all samples in the test data, recompute the model's prediction error, and denote it error2. For a continuous variable, add random noise drawn from the normal distribution $N(0, \sigma\epsilon)$, where $\sigma$ is the standard deviation of feature F and $\epsilon$ is a small constant. For a discrete variable, apply $x_F \to x_F(1-s) + (1-x_F)s$, where $s$ is Bernoulli-distributed noise and $x_F$ is the value of feature F.
3) Compute the difference between the two prediction errors: $e = error2 - error1$.
4) Repeat steps 1) to 3) n times.
5) The importance of feature F is calculated as follows:
If adding random noise greatly reduces accuracy on the test data, the feature has a large influence on the predictions, indicating that its importance is higher.
6. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to claim 5, wherein the prediction result display module visually displays the influence of features on prognosis by plotting the predicted cumulative incidence curves corresponding to different values of the features. The predicted cumulative incidence curve for a feature F is drawn as follows:
1) Let all possible values of feature F be $x_{F,1}, x_{F,2}, \dots, x_{F,v}, \dots, x_{F,V}$, where $V$ is the number of possible values of F.
2) Set the value of feature F to $x_F = x_{F,v}$, $v = 1, 2, \dots, V$, keep the values of all other features unchanged, and compute the average of the model-predicted cumulative incidence:
$$\bar{F}(x_{F,v}) = \frac{1}{N}\sum_{i=1}^{N} \hat{F}(x_{F,v}, x_{i,o})$$
where $\bar{F}(x_{F,v})$ is the average of the model's predicted outputs over all data, $\hat{F}(x_{F,v}, x_{i,o})$ is the model's predicted output for the $i$-th piece of data, and $x_{i,o}$ denotes the values of all features other than F in the $i$-th piece of data.
7. The disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis according to claim 6, wherein, when drawing the predicted cumulative incidence curve for a continuous variable, the value range of the variable is divided evenly into R parts and the cumulative incidence is estimated and plotted only at the division points, which reduces computation; R is determined according to the value range of the specific feature.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010273957.9A CN111640510A (en) | 2020-04-09 | 2020-04-09 | Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis |
PCT/CN2021/073136 WO2021203796A1 (en) | 2020-04-09 | 2021-01-21 | Disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010273957.9A CN111640510A (en) | 2020-04-09 | 2020-04-09 | Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111640510A true CN111640510A (en) | 2020-09-08 |
Family ID: 72331086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010273957.9A Pending CN111640510A (en) | 2020-04-09 | 2020-04-09 | Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111640510A (en) |
WO (1) | WO2021203796A1 (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114141366B (en) * | 2021-12-31 | 2024-03-26 | 杭州电子科技大学 | Auxiliary analysis method for cerebral apoplexy rehabilitation evaluation based on voice multitasking learning |
CN114566289B (en) * | 2022-04-26 | 2022-08-09 | 之江实验室 | Disease prediction system based on multi-center clinical data anti-cheating analysis |
CN114927237B (en) * | 2022-04-26 | 2024-07-02 | 东南大学 | Disease prevention and control disease control facility configuration method with capacity limitation |
CN114821337B (en) * | 2022-05-20 | 2024-04-16 | 武汉大学 | Semi-supervised SAR image building area extraction method based on phase consistency pseudo tag |
CN115184054B (en) * | 2022-05-30 | 2022-12-27 | 深圳技术大学 | Mechanical equipment semi-supervised fault detection and analysis method, device, terminal and medium |
CN115458158B (en) * | 2022-09-23 | 2023-09-15 | 深圳大学 | Acute kidney injury prediction system for sepsis patient |
CN116072298B (en) * | 2023-04-06 | 2023-08-15 | 之江实验室 | Disease prediction system based on hierarchical marker distribution learning |
CN116206755B (en) * | 2023-05-06 | 2023-08-22 | 之江实验室 | Disease detection and knowledge discovery device based on neural topic model |
CN116504423B (en) * | 2023-06-26 | 2023-09-26 | 北京大学 | Drug effectiveness evaluation method |
CN117059270A (en) * | 2023-08-14 | 2023-11-14 | 北京理工大学 | Acute altitude disease risk assessment system combining medical priori knowledge pseudo tags |
CN116832285B (en) * | 2023-09-01 | 2023-11-07 | 吉林大学 | Breathing machine operation abnormity monitoring and early warning system based on cloud platform |
CN116959715B (en) * | 2023-09-18 | 2024-01-09 | 之江实验室 | Disease prognosis prediction system based on time sequence evolution process explanation |
CN117558414B (en) * | 2023-11-23 | 2024-05-24 | 之江实验室 | System, electronic device and medium for predicting early recurrence of multi-tasking hepatocellular carcinoma |
CN117971356B (en) * | 2024-03-29 | 2024-06-14 | 苏州元脑智能科技有限公司 | Heterogeneous acceleration method, device, equipment and storage medium based on semi-supervised learning |
CN118280601A (en) * | 2024-04-07 | 2024-07-02 | 佛山科学技术学院 | Anticancer drug sensitivity assessment method and system based on semi-supervised learning |
CN118522468B (en) * | 2024-07-22 | 2024-09-27 | 武汉市第三医院 | Blood concentration monitoring system based on machine learning |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107944479A (en) * | 2017-11-16 | 2018-04-20 | 哈尔滨工业大学 | Disease forecasting method for establishing model and device based on semi-supervised learning |
CN110556178A (en) * | 2018-05-30 | 2019-12-10 | 西门子医疗有限公司 | decision support system for medical therapy planning |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106897545B (en) * | 2017-01-05 | 2019-04-30 | 浙江大学 | A kind of tumor prognosis forecasting system based on depth confidence network |
CN108053398A (en) * | 2017-12-19 | 2018-05-18 | 南京信息工程大学 | A kind of melanoma automatic testing method of semi-supervised feature learning |
CN108564039A (en) * | 2018-04-16 | 2018-09-21 | 北京工业大学 | A kind of epileptic seizure prediction method generating confrontation network based on semi-supervised deep layer |
US10559386B1 (en) * | 2019-04-02 | 2020-02-11 | Kpn Innovations, Llc | Methods and systems for an artificial intelligence support network for vibrant constituional guidance |
CN110580695B (en) * | 2019-08-07 | 2022-06-21 | 深圳先进技术研究院 | Multi-mode three-dimensional medical image fusion method and system and electronic equipment |
CN111640510A (en) * | 2020-04-09 | 2020-09-08 | 之江实验室 | Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis |
- 2020-04-09: CN CN202010273957.9A, patent CN111640510A/en, active, Pending
- 2021-01-21: WO PCT/CN2021/073136, patent WO2021203796A1/en, active, Application Filing
Non-Patent Citations (1)
Title |
---|
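Chi Shengqiang (池胜强): "Research on a machine-learning-based colorectal cancer prognosis model and its generalization ability", China Doctoral Dissertations Full-text Database, Medicine and Health Sciences * |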
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021203796A1 (en) * | 2020-04-09 | 2021-10-14 | 之江实验室 | Disease prognosis prediction system based on deep semi-supervised multi-task learning survival analysis |
TWI810510B (en) * | 2021-01-04 | 2023-08-01 | 鴻海精密工業股份有限公司 | Method and device for processing multi-modal data, electronic device, and storage medium |
CN112819768B (en) * | 2021-01-26 | 2022-06-17 | 复旦大学 | DCNN-based survival analysis method for cancer full-field digital pathological section |
CN112819768A (en) * | 2021-01-26 | 2021-05-18 | 复旦大学 | DCNN-based cancer full-field digital pathological section survival analysis method |
CN112906994A (en) * | 2021-04-19 | 2021-06-04 | 拉扎斯网络科技(上海)有限公司 | Order meal delivery time prediction method and device, electronic equipment and storage medium |
CN113314218A (en) * | 2021-06-22 | 2021-08-27 | 浙江大学 | Dynamic survival analysis equipment containing competition risk based on comparison |
WO2023284321A1 (en) * | 2021-07-15 | 2023-01-19 | 华为云计算技术有限公司 | Method and device for predicting survival hazard ratio |
CN113903466A (en) * | 2021-09-26 | 2022-01-07 | 新乡医学院第一附属医院 | Data processing device and system for auxiliary evaluation of risk degree of cardiovascular diseases of population and application of data processing device and system |
CN115188470A (en) * | 2022-06-29 | 2022-10-14 | 山东大学 | Multi-chronic disease prediction system based on multitask Cox learning model |
CN115565669A (en) * | 2022-10-11 | 2023-01-03 | 电子科技大学 | Cancer survival analysis method based on GAN and multitask learning |
CN116403714A (en) * | 2023-04-07 | 2023-07-07 | 大连市中心医院 | Cerebral apoplexy END risk prediction model building method and device, END risk prediction system, electronic equipment and medium |
CN116403714B (en) * | 2023-04-07 | 2024-01-26 | 大连市中心医院 | Cerebral apoplexy END risk prediction model building method and device, END risk prediction system, electronic equipment and medium |
CN116564524A (en) * | 2023-06-30 | 2023-08-08 | 之江实验室 | Pseudo tag evolution trend regular prognosis prediction device |
CN116564524B (en) * | 2023-06-30 | 2023-10-03 | 之江实验室 | Pseudo tag evolution trend regular prognosis prediction device |
CN118053047A (en) * | 2024-04-11 | 2024-05-17 | 浙江公路水运工程咨询集团有限公司 | Method and system for detecting unsupervised reconstruction network abnormality based on pseudo tag |
Also Published As
Publication number | Publication date |
---|---|
WO2021203796A1 (en) | 2021-10-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111640510A (en) | Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis | |
Callaway et al. | Fixation patterns in simple choice reflect optimal information sampling | |
CN109659033B (en) | Chronic disease state of an illness change event prediction device based on recurrent neural network | |
EP3620983B1 (en) | Computer-implemented method, computer program product and system for data analysis | |
CN109599177B (en) | Method for predicting medical treatment track through deep learning based on medical history | |
CN110119540B (en) | Multi-output gradient lifting tree modeling method for survival risk analysis | |
CN116340796B (en) | Time sequence data analysis method, device, equipment and storage medium | |
Li et al. | Multi-task spatio-temporal augmented net for industry equipment remaining useful life prediction | |
Salerno et al. | High-dimensional survival analysis: Methods and applications | |
Yue et al. | Bayesian Tobit quantile regression model for medical expenditure panel survey data | |
CN115903741A (en) | Data anomaly detection method for industrial control system | |
Enguehard | Learning perturbations to explain time series predictions | |
Li et al. | Life-cycle modeling driven by coupling competition degradation for remaining useful life prediction | |
Wu et al. | Imaging feature-based clustering of financial time series | |
Nayebi et al. | WindowSHAP: An efficient framework for explaining time-series classifiers based on Shapley values | |
Yeganeh et al. | Monitoring multistage healthcare processes using state space models and a machine learning based framework | |
Subhash et al. | Nonparametric estimation of quantile-based entropy function | |
Ferdous et al. | Cdans: Temporal causal discovery from autocorrelated and non-stationary time series data | |
Groha et al. | Neural odes for multi-state survival analysis | |
CN114580791B (en) | Method and device for identifying working state of bulking machine, computer equipment and storage medium | |
Budhathoki et al. | Accurate causal inference on discrete data | |
Zhang et al. | Hurdle modeling for defect data with excess zeros in steel manufacturing process | |
Pachal et al. | Sequence prediction under missing data: An RNN approach without imputation | |
CN115565669A (en) | Cancer survival analysis method based on GAN and multitask learning | |
Chown et al. | The nonparametric location-scale mixture cure model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200908 |