CN114418420A - Competitive risk survival analysis method based on causal inference - Google Patents
Competitive risk survival analysis method based on causal inference
- Publication number
- CN114418420A (application number CN202210085862.3A)
- Authority
- CN
- China
- Prior art keywords
- survival analysis
- causal
- model
- risk
- competition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
Abstract
The invention discloses a competing-risk survival analysis method based on causal inference, comprising the following steps: building a structural causal model from the competing-risk survival analysis model; identifying, according to the structural causal model, the confounders present in the competing-risk survival analysis model and the back-door paths they create; performing causal intervention on the competing-risk survival analysis model through back-door adjustment to remove the confounders from the model; defining a loss function for the competing-risk survival analysis model and correcting it to obtain the loss function after causal intervention; and minimizing the loss function after causal intervention to train and optimize the competing-risk survival analysis model. The method uses a structural causal model to correct existing competing-risk survival analysis models from a causal perspective, and learns a debiased survival model by means of causal inference and the back-door adjustment formula.
Description
Technical Field
The invention belongs to the field of data processing, and particularly relates to a competing-risk survival analysis method based on causal inference.
Background
Survival analysis is a collection of data analysis techniques for analyzing the relationship between covariates and the time until an event of interest occurs. Survival analysis methods range from statistical methods to machine learning and, in recent years, deep learning. They are now widely used in many fields, including medicine, recommender systems, and economics.
Traditional statistical survival analysis methods, such as the Cox proportional hazards (CPH) model, have been very successful but cannot handle competing risks, that is, settings in which there are multiple events of interest. A competing risk is an event that either prevents the event of interest from being observed or changes the probability that it occurs, and it therefore plays a crucial role in estimating survival time. To address competing risks in survival analysis, Fine and Gray proposed a statistical competing-risk survival model, the Fine-Gray model, in 1999. Deep learning models have been proposed more recently, such as DeepHit and its dynamic survival analysis variant, Dynamic-DeepHit, and the related literature has demonstrated the feasibility of these models in downstream tasks.
Despite these advances, existing competing-risk survival analysis models share a significant drawback: the competing risk acts as a confounder. When the model tries to capture the causal relationship between the covariates and the event of interest, the confounder misleads it into learning a spurious correlation between the covariate X and the event Y, which degrades model performance. Although computing P(Y | X) lets the model capture the basic association between X and Y, in the presence of the confounder the model may wrongly extract covariates that are unrelated to the event along with those that are related to it.
Disclosure of Invention
The invention provides a competing-risk survival analysis method based on causal inference that solves the above technical problem, and specifically adopts the following technical scheme:
A competing-risk survival analysis method based on causal inference comprises the following steps:
building a structural causal model from the competing-risk survival analysis model;
identifying, according to the structural causal model, the confounders present in the competing-risk survival analysis model and the back-door paths they create;
performing causal intervention on the competing-risk survival analysis model through back-door adjustment to remove the confounders from the model;
defining a loss function for the competing-risk survival analysis model, and correcting it to obtain the loss function after causal intervention;
and minimizing the loss function after causal intervention to train and optimize the competing-risk survival analysis model.
Further, the specific method for building the structural causal model comprises the following steps:
taking the covariate X, the competing risk R, the latent representation C, the occurring event Y, and the occurrence time T as nodes, and connecting the nodes with directed edges (arrows), where the direction of each arrow represents the causal relationship between the nodes it connects.
Further, in the structural causal model:
X → C denotes the representation learning process;
X → Y ← C denotes the survival analysis process;
X ← R → Y denotes the back-door path;
R → X indicates that the competing risk R causes a change in the covariate X of the corresponding subject;
R → Y indicates that the event that ultimately occurs comes from the competing risk R.
Further, the confounder is the competing risk R, a set of competing events that represents the competing risks of subject i; each element of the set represents one competing risk of subject i.
Further, the specific method for performing causal intervention on the competing-risk survival analysis model through back-door adjustment to remove the confounders from the model comprises the following steps:
the causal intervention P(Y | do(X)) is used to eliminate the effect of the confounder R on the occurring event Y, where do(X) denotes an intervention on the covariate X that removes the causal edge from the confounder R to the covariate X and thereby cuts the back-door path through which R links X and Y.
Further, P(Y | do(X)) is calculated with the back-door adjustment formula to eliminate the influence of the confounder R in the competing-risk survival analysis model:
$$P(Y_i \mid do(X_i)) = \sum_{r \in R} P(Y_i \mid X_i, r)\, P(r)$$
where $P(Y_i \mid X_i, r)$ is the probability that event Y occurs given the covariate $X_i$ and the event r, and $P(r)$ is the prior distribution of r in the sample data set.
Further, after causal intervention, cause-specific sub-networks based on causal intervention are designed for the competing-risk survival analysis model. These comprise K feedforward networks composed of fully connected layers, and each feedforward network produces its output through a softmax layer, namely the probability distribution of event $y_i$ occurring at times $t_1, t_2, \ldots, t_{max}$. $P(y_i, T = t \mid do(X_i))$ can be calculated as:
At time $\tau^*$, the cumulative incidence function of event $y_i$ after causal intervention is defined as:
where $t^*$ denotes the time of the last measurement among all longitudinal observations of the subject, and a dedicated symbol indicates that censoring occurred.
where I(·) is an indicator function.
The invention has the advantage of providing a competing-risk survival analysis method based on causal inference in which a structural causal model corrects existing competing-risk survival analysis models from a causal perspective, and the covariates that truly cause the event to occur are captured by using P(Y | do(X)) instead of P(Y | X). A debiased survival model is learned by means of causal inference and the back-door adjustment formula. That is, the optimized model obtained through causal inference is not affected by spurious correlations.
Drawings
FIG. 1 is a schematic diagram of the competing-risk survival analysis method based on causal inference of the present invention;
FIG. 2 is a schematic diagram of the structural causal model of the present invention;
FIG. 3 is a schematic diagram of the model structure of the present invention after the competing-risk survival analysis model is improved with causal intervention.
Detailed Description
The invention is described in detail below with reference to the figures and the embodiments.
Fig. 1 shows a competing-risk survival analysis method based on causal inference, which includes the following steps. S1: build a structural causal model (SCM) from the competing-risk survival analysis model. S2: identify, according to the structural causal model, the confounders present in the competing-risk survival analysis model and the back-door paths they create. S3: perform causal intervention on the competing-risk survival analysis model through back-door adjustment to remove the confounders from the model. S4: define a loss function for the competing-risk survival analysis model, and correct it to obtain the loss function after causal intervention. S5: minimize the loss function after causal intervention to train and optimize the competing-risk survival analysis model. Through these steps, the structural causal model corrects existing competing-risk survival analysis models from a causal perspective, and a debiased survival model is learned by means of causal inference and the back-door adjustment formula. That is, the optimized model obtained through causal inference is not affected by spurious correlations. The steps are described in detail below.
Step S1: build a structural causal model from the competing-risk survival analysis model.
A structural causal model is usually represented as a structural causal graph that shows the causal relationships between variables. The specific method for building the structural causal model is as follows:
The covariate X, the competing risk R, the latent representation C, the occurring event Y, and the occurrence time T are taken as nodes, and the nodes are connected with directed edges (arrows) whose directions represent the causal relationships between the nodes, as shown in FIG. 2(a).
The structural causal model expresses the survival analysis process as two steps: a representation learning step and a competing-risk survival analysis step.
Specifically, in the structural causal model, X → C denotes the representation learning process, where C is the latent representation captured from the subject's covariate X. A competing-risk survival analysis model feeds the covariate X into a shared sub-network (usually composed of multiple RNN layers). The shared sub-network extracts the latent representation C from the subject's longitudinal data X, and the model then takes the residual connection of the latent representation C and the subject's observed data X as input to estimate the joint distribution of the first hitting time and the competing risks. X → Y ← C denotes the survival analysis process, in which the subject's survival status depends on the learned latent representation C and the subject's covariate X. X ← R → Y denotes the back-door path. R → X indicates that the competing risk R causes a change in the covariate X of the corresponding subject. R → Y indicates that the event that ultimately occurs comes from the competing risk R. The cumulative incidence function (CIF) is used in competing-risk survival analysis models to estimate the probability that each competing event occurs; the CIF gives the probability that a particular event occurs at or before time T.
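The edges listed above can be encoded directly as a small directed graph. The following is a minimal sketch, not the patent's implementation: the helper names (`parents`, `descendants`, `common_causes`, `intervene`) are hypothetical, and the node T is left out because the text does not list its edges. It checks which nodes are common causes of X and Y, and shows the graph surgery that an intervention on X corresponds to.

```python
# Edges of the structural causal graph as listed in the text.
# The occurrence time T is a node in the source, but its edges are not
# listed there, so it is left out of this sketch (assumption).
EDGES = [("X", "C"), ("X", "Y"), ("C", "Y"), ("R", "X"), ("R", "Y")]


def parents(edges, node):
    """Return the set of direct causes of `node`."""
    return {src for src, dst in edges if dst == node}


def descendants(edges, node):
    """Return every node reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for src, dst in edges:
            if src == current and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def common_causes(edges, treatment, outcome):
    """Nodes that point into the treatment and also reach the outcome
    without passing through the treatment (a simple confounder check)."""
    pruned = [(s, d) for s, d in edges if s != treatment]
    return {p for p in parents(edges, treatment)
            if outcome in descendants(pruned, p)}


def intervene(edges, treatment):
    """Graph surgery for do(treatment): drop all edges into the treatment."""
    return [(s, d) for s, d in edges if d != treatment]


confounders = common_causes(EDGES, "X", "Y")
print(confounders)                                      # {'R'}
print(common_causes(intervene(EDGES, "X"), "X", "Y"))   # set()
```

In this encoding R is the only common cause of X and Y, and it stops being one once the edge R → X is removed.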
Step S2: identify, according to the structural causal model, the confounders present in the competing-risk survival analysis model and the back-door paths they create.
Specifically, the confounder is the competing risk R, a set of competing events that represents the competing risks of subject i; each element of the set is one competing risk of subject i. R is a confounder that prevents the competing-risk survival analysis model from capturing the true causal relationship between the covariate X and the occurring event Y. The structural causal model makes it possible to show visually how the confounder affects the prediction process, and a back-door adjustment strategy can then be used to eliminate its effect on survival analysis with competing risks.
According to the structural causal model, the causal relationship between X and Y is disturbed by the confounder R through the back-door path X ← R → Y, which can introduce spurious correlations and lead to competing-risk bias. That is, when maximizing $P(Y = r^*, T \le t \mid X, C, R)$, where $r^*$ ($r^* \in R$) is the event that occurred for the subject, we want the model to rely on the actual causal path X → Y ← C. However, the back-door path X ← R → Y introduces a spurious correlation between Y and X, which misleads the model into overfitting the event of interest R to the covariate X and results in an underfitted survival analysis model. Thus, because of the confounder R, the observed P(Y | X), that is, the probability that event Y occurs given the covariate X, cannot reflect the true causal relationship between X and Y.
Step S3: perform causal intervention on the competing-risk survival analysis model through back-door adjustment to remove the confounders from the model.
Specifically, the method for performing causal intervention on the competing-risk survival analysis model through back-door adjustment to remove the confounders from the model is as follows:
The causal intervention P(Y | do(X)) is used to eliminate the effect of the confounder R on the occurring event Y, where do(X) denotes an intervention on the covariate X. The intervention removes the causal edge from R to X and thereby cuts the back-door path through which R links X and Y, which removes the confounding. In contrast to P(Y | X), P(Y | do(X)) gives the probability that event Y occurs after a causal intervention on the covariate X. P(Y | do(X)) can therefore be calculated with the back-door adjustment formula to eliminate the effect of the confounder R in the competing-risk survival analysis model, as shown in FIG. 2(b). Specifically,
$$P(Y_i \mid do(X_i)) = \sum_{r \in R} P(Y_i \mid X_i, r)\, P(r)$$
where $P(Y_i \mid X_i, r)$ is the probability that event Y occurs given the covariate $X_i$ and the event r, and $P(r)$ is the prior distribution of r in the sample data set. According to the rules of the do-operator, this formula is derived as follows:
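The derivation itself is not reproduced in this text. A standard derivation of the back-door adjustment, assuming that R is the only confounder of X and Y as in the graph described above, can be sketched as:

```latex
\begin{aligned}
P(Y \mid do(X)) &= \sum_{r \in R} P(Y \mid do(X), r)\, P(r \mid do(X)) \\
                &= \sum_{r \in R} P(Y \mid do(X), r)\, P(r) \\
                &= \sum_{r \in R} P(Y \mid X, r)\, P(r)
\end{aligned}
```

The first line is the law of total probability. The second holds because the intervention removes the edge R → X, so intervening on X does not change the distribution of R. The third holds because, once R is conditioned on, the back-door path is blocked and intervening on X is equivalent to observing X.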
With this formula, once $P(Y_i \mid X_i, r)$ and $P(r)$ have been calculated, the effect of the confounder R can be removed according to the back-door adjustment formula.
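A small numerical sketch may help show what the adjustment changes. The records below are invented for illustration only (two hypothetical risk values `r1` and `r2`, a binary covariate, and a binary outcome), and the function names are hypothetical. The code compares the observed contrast in P(Y | X) with the contrast obtained from the back-door sum of P(Y | X, r) P(r).

```python
from collections import Counter

# Hypothetical toy records (r, x, y): competing risk r, binary covariate x,
# binary outcome y. The counts are invented purely for illustration.
RECORDS = (
    [("r1", 1, 1)] * 36 + [("r1", 1, 0)] * 4 +
    [("r1", 0, 1)] * 8 + [("r1", 0, 0)] * 2 +
    [("r2", 1, 1)] * 3 + [("r2", 1, 0)] * 7 +
    [("r2", 0, 1)] * 8 + [("r2", 0, 0)] * 32
)


def conditional(records, x, y, r=None):
    """Empirical P(Y = y | X = x) or, if r is given, P(Y = y | X = x, R = r)."""
    rows = [rec for rec in records
            if rec[1] == x and (r is None or rec[0] == r)]
    return sum(1 for rec in rows if rec[2] == y) / len(rows)


def backdoor(records, x, y):
    """P(Y = y | do(X = x)) = sum_r P(Y = y | X = x, r) * P(r)."""
    prior = Counter(rec[0] for rec in records)
    total = len(records)
    return sum(conditional(records, x, y, r) * count / total
               for r, count in prior.items())


# Difference in outcome probability between x = 1 and x = 0.
observed = conditional(RECORDS, 1, 1) - conditional(RECORDS, 0, 1)
adjusted = backdoor(RECORDS, 1, 1) - backdoor(RECORDS, 0, 1)
print(round(observed, 2), round(adjusted, 2))  # 0.46 0.1
```

In this toy data the risk value strongly influences both the covariate and the outcome, so the unadjusted contrast overstates the effect of X; weighting each stratum by P(r) removes that part.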
Step S4: define a loss function for the competing-risk survival analysis model, and correct it to obtain the loss function after causal intervention.
After causal intervention, cause-specific sub-networks based on causal intervention are designed for the competing-risk survival analysis model. FIG. 3 shows the structure of the competing-risk survival analysis model after it is improved with causal intervention. The model includes a shared sub-network and cause-specific sub-networks based on causal intervention. Specifically, the shared sub-network performs representation extraction and consists of a multi-layer RNN that encodes the input covariate X to obtain the representation C of the covariate. The cause-specific sub-networks based on causal intervention comprise K feedforward networks composed of fully connected layers, and each feedforward network produces its output through a softmax layer, namely the probability distribution of event $y_i$ occurring at times $t_1, t_2, \ldots, t_{max}$. $P(y_i, T = t \mid do(X_i))$ can be calculated as:
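The formula is not reproduced in this text, so the following is only a sketch under stated assumptions. It assumes that the K sub-network outputs are normalized with one joint softmax over all (event, time) pairs, as in DeepHit-style models, and that the model can be evaluated once per value r of the confounder so that the back-door sum can be applied to the outputs. The logits, the risk labels, and the function names are hypothetical.

```python
import math


def joint_softmax(logits):
    """Normalize a K x T_max table of logits into one joint distribution
    over (event k, time t), so that all entries sum to 1."""
    peak = max(max(row) for row in logits)
    exps = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def adjusted_distribution(logits_by_risk, prior):
    """Average the per-risk distributions with weights P(r):
    P(y, T=t | do(X)) = sum_r P(y, T=t | X, r) * P(r)."""
    tables = {r: joint_softmax(lg) for r, lg in logits_by_risk.items()}
    any_table = next(iter(tables.values()))
    n_events, n_times = len(any_table), len(any_table[0])
    return [[sum(prior[r] * tables[r][k][t] for r in tables)
             for t in range(n_times)]
            for k in range(n_events)]


# Hypothetical outputs of K = 2 cause-specific networks on a grid of
# 3 time steps, evaluated once per assumed value r of the confounder.
LOGITS_BY_RISK = {
    "r1": [[2.0, 1.0, 0.0], [0.0, 0.5, 1.0]],
    "r2": [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
}
PRIOR = {"r1": 0.25, "r2": 0.75}  # assumed empirical P(r)

pmf = adjusted_distribution(LOGITS_BY_RISK, PRIOR)
print(round(sum(sum(row) for row in pmf), 6))  # 1.0
```

Because each per-risk table is a probability distribution and the weights P(r) sum to 1, the adjusted table is again a distribution over (event, time) pairs.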
At time $\tau^*$, the cumulative incidence function (CIF) of event $y_i$ after causal intervention is defined as:
where $t^*$ denotes the time of the last measurement among all longitudinal observations of the subject, and a dedicated symbol indicates that censoring occurred. The formula holds under the condition that the subject has survived up to the last measurement time.
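The CIF formula is not reproduced in this text. One common way to condition a discrete-time CIF on survival up to the last measurement, assumed here and not confirmed by the source, is to sum the event's probability mass after the last measurement and divide by the probability of having had no event up to that point. The function name, the time-index convention, and the example table are hypothetical.

```python
def cumulative_incidence(pmf, event, tau, last_seen):
    """CIF of `event` at time index `tau`, conditional on the subject
    having had no event up to and including time index `last_seen`.

    pmf[k][t] is the (adjusted) probability of event k at time index t.
    """
    # Probability that no event of any type occurred up to last_seen.
    survived = 1.0 - sum(row[t] for row in pmf for t in range(last_seen + 1))
    # Probability mass of the chosen event in the window (last_seen, tau].
    mass = sum(pmf[event][t] for t in range(last_seen + 1, tau + 1))
    return mass / survived


# Hypothetical joint distribution for K = 2 events on 4 time steps.
PMF = [
    [0.10, 0.20, 0.10, 0.10],
    [0.05, 0.15, 0.20, 0.10],
]

print(round(cumulative_incidence(PMF, event=0, tau=2, last_seen=0), 4))  # 0.3529
```

With this normalization, the CIFs of all events at the final time step sum to 1 for a subject known to be event-free at the last measurement.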
The log-likelihood loss function of the competing-risk survival analysis model improved with causal intervention is:
where I(·) is an indicator function. The first term of the formula captures the information provided by uncensored subjects, and the second term holds under the condition that censored subjects have survived up to the last measurement time.
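The loss formula is not reproduced in this text. The sketch below assumes a two-term negative log-likelihood of the kind used in DeepHit-style models, which matches the description of the two terms above: uncensored subjects contribute the log-probability of their observed event and time, and censored subjects contribute the log-probability of having had no event by the censoring time, both conditional on survival up to the last measurement. The sample layout and names are hypothetical.

```python
import math


def log_likelihood_loss(samples):
    """Negative log-likelihood over (pmf, event, time, last_seen) samples.

    pmf[k][t] is the (adjusted) probability of event k at time index t.
    `event` is an index into pmf for an observed event, or None if censored.
    """
    loss = 0.0
    for pmf, event, time, last_seen in samples:
        survived = 1.0 - sum(row[t] for row in pmf
                             for t in range(last_seen + 1))
        if event is not None:
            # Uncensored: probability of the observed event at its time,
            # given survival up to the last measurement.
            loss -= math.log(pmf[event][time] / survived)
        else:
            # Censored: probability that no event has occurred by `time`,
            # given survival up to the last measurement.
            any_event = sum(row[t] for row in pmf
                            for t in range(last_seen + 1, time + 1))
            loss -= math.log(1.0 - any_event / survived)
    return loss


# Hypothetical joint distribution for K = 2 events on 4 time steps.
PMF = [
    [0.10, 0.20, 0.10, 0.10],
    [0.05, 0.15, 0.20, 0.10],
]
SAMPLES = [
    (PMF, 0, 2, 0),     # event 0 observed at time index 2
    (PMF, None, 2, 0),  # censored at time index 2
]

total_loss = log_likelihood_loss(SAMPLES)
print(total_loss)
```

The branch on `event is None` plays the role of the indicator function I(·): each subject contributes to exactly one of the two terms.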
Step S5: minimize the loss function after causal intervention to train and optimize the competing-risk survival analysis model.
The competing-risk survival analysis model based on causal inference is trained on a survival analysis data set, and the results are validated.
The competing-risk survival analysis method based on causal inference further comprises: inputting the data to be analyzed into the trained competing-risk survival analysis model to obtain a prediction result.
It can be understood that, with the trained competing-risk survival analysis model, more accurate prediction results can be obtained once data is input.
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It should be understood by those skilled in the art that the above embodiments do not limit the present invention in any way, and all technical solutions obtained by using equivalent alternatives or equivalent variations fall within the scope of the present invention.
Claims (8)
1. A competing-risk survival analysis method based on causal inference, comprising:
building a structural causal model from the competing-risk survival analysis model;
identifying, according to the structural causal model, the confounders present in the competing-risk survival analysis model and the back-door paths they create;
performing causal intervention on the competing-risk survival analysis model through back-door adjustment to remove the confounders from the model;
defining a loss function for the competing-risk survival analysis model, and correcting it to obtain the loss function after causal intervention;
and minimizing the loss function after causal intervention to train and optimize the competing-risk survival analysis model.
2. The competing-risk survival analysis method based on causal inference of claim 1, wherein
the specific method for building the structural causal model comprises the following steps:
taking the covariate X, the competing risk R, the latent representation C, the occurring event Y, and the occurrence time T as nodes, and connecting the nodes with directed edges (arrows), where the direction of each arrow represents the causal relationship between the nodes it connects.
3. The competing-risk survival analysis method based on causal inference of claim 2, wherein
in the structural causal model:
X → C denotes the representation learning process;
X → Y ← C denotes the survival analysis process;
X ← R → Y denotes the back-door path;
R → X indicates that the competing risk R causes a change in the covariate X of the corresponding subject;
R → Y indicates that the event that ultimately occurs comes from the competing risk R.
5. The competing-risk survival analysis method based on causal inference of claim 4, wherein
the specific method for performing causal intervention on the competing-risk survival analysis model through back-door adjustment to remove the confounders from the model comprises the following steps:
the causal intervention P(Y | do(X)) is used to eliminate the effect of the confounder R on the occurring event Y, where do(X) denotes an intervention on the covariate X that removes the causal edge from the confounder R to the covariate X and thereby cuts the back-door path through which R links X and Y.
6. The competing-risk survival analysis method based on causal inference of claim 5, wherein
P(Y | do(X)) is calculated with the back-door adjustment formula to eliminate the influence of the confounder R in the competing-risk survival analysis model:
$$P(Y_i \mid do(X_i)) = \sum_{r \in R} P(Y_i \mid X_i, r)\, P(r)$$
where $P(Y_i \mid X_i, r)$ is the probability that event Y occurs given the covariate $X_i$ and the event r, and $P(r)$ is the prior distribution of r in the sample data set.
7. The competing-risk survival analysis method based on causal inference of claim 6, wherein
after causal intervention, cause-specific sub-networks based on causal intervention are designed for the competing-risk survival analysis model; these comprise K feedforward networks composed of fully connected layers, and each feedforward network produces its output through a softmax layer, namely the probability distribution of event $y_i$ occurring at times $t_1, t_2, \ldots, t_{max}$. $P(y_i, T = t \mid do(X_i))$ can be calculated as:
at time $\tau^*$, the cumulative incidence function of event $y_i$ after causal intervention is defined as:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210085862.3A CN114418420A (en) | 2022-01-25 | 2022-01-25 | Competitive risk survival analysis method based on causal inference |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210085862.3A CN114418420A (en) | 2022-01-25 | 2022-01-25 | Competitive risk survival analysis method based on causal inference |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114418420A true CN114418420A (en) | 2022-04-29 |
Family
ID=81277359
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210085862.3A Pending CN114418420A (en) | 2022-01-25 | 2022-01-25 | Competitive risk survival analysis method based on causal inference |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114418420A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116307274A (en) * | 2023-05-18 | 2023-06-23 | 北京航空航天大学 | Urban area energy consumption prediction method considering causal intervention |
CN116307274B (en) * | 2023-05-18 | 2023-08-18 | 北京航空航天大学 | Urban area energy consumption prediction method considering causal intervention |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109146246B (en) | Fault detection method based on automatic encoder and Bayesian network | |
CN111275288A (en) | XGboost-based multi-dimensional data anomaly detection method and device | |
CN116738868B (en) | Rolling bearing residual life prediction method | |
CN115205689A (en) | Improved unsupervised remote sensing image anomaly detection method | |
CN112132430A (en) | Reliability evaluation method and system for distributed state sensor of power distribution main equipment | |
CN115062272A (en) | Water quality monitoring data abnormity identification and early warning method | |
CN115051929B (en) | Network fault prediction method and device based on self-supervision target perception neural network | |
CN114418420A (en) | Competitive risk survival analysis method based on causal inference | |
CN110795599B (en) | Video emergency monitoring method and system based on multi-scale graph | |
Ibrahim et al. | Fractional calculus-based slime mould algorithm for feature selection using rough set | |
Zhu et al. | RGCNU: Recurrent Graph Convolutional Network With Uncertainty Estimation for Remaining Useful Life Prediction | |
CN116468174A (en) | Flight parameter prediction and confidence evaluation method | |
JP6398991B2 (en) | Model estimation apparatus, method and program | |
CN111340196A (en) | Countermeasure network data generation method and abnormal event detection method | |
CN110990383A (en) | Similarity calculation method based on industrial big data set | |
CN113328881B (en) | Topology sensing method, device and system for non-cooperative wireless network | |
CN115578325A (en) | Image anomaly detection method based on channel attention registration network | |
CN115374931A (en) | Deep neural network robustness enhancing method based on meta-countermeasure training | |
CN114169433A (en) | Industrial fault prediction method based on federal learning + image learning + CNN | |
CN112286169B (en) | Industrial robot fault detection method | |
CN109886292B (en) | Abnormal reason diagnosis method based on abnormal association graph | |
CN115965823B (en) | Online difficult sample mining method and system based on Focal loss function | |
CN115174421B (en) | Network fault prediction method and device based on self-supervision unwrapping hypergraph attention | |
CN117333726B (en) | Quartz crystal cutting abnormality monitoring method, system and device based on deep learning | |
CN117688496B (en) | Abnormality diagnosis method, system and equipment for satellite telemetry multidimensional time sequence data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||