CN116386881A - Method and system for predicting low-frequency poor prognosis outcome of early colorectal cancer patient - Google Patents


Info

Publication number
CN116386881A
Authority
CN
China
Prior art keywords
patient
model
risk
colorectal cancer
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310211362.4A
Other languages
Chinese (zh)
Inventor
何亚舟
罗志鹏
王自强
许川
舒驰
吴清彬
周燕虹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202310211362.4A
Publication of CN116386881A
Legal status: Pending

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a system for predicting low-frequency poor prognosis outcomes of early colorectal cancer patients, and belongs to the technical field of neural networks. The method comprises the following steps: first, selecting risk features for predicting patient survival; then building a neural-network-based model M, and training and optimizing the model M using observed patient survival data and a suitably defined loss function; finally, inputting the risk features of the patient under test into the model M to obtain a prediction of that patient's survival time. Targeting the low frequency of poor outcomes among early-stage tumor patients, the method fills a technical gap in the field: through the proposed neural network model, a suitable loss function, and stochastic gradient descent, it addresses the technical difficulties of small samples, low-probability outcome events, and prediction of the two-dimensional compound "time-event" outcome. Compared with the traditional linear COX regression method, the prediction accuracy of the model is further improved.

Description

Method and system for predicting low-frequency poor prognosis outcome of early colorectal cancer patient
Technical Field
The invention belongs to the field of neural networks, and particularly relates to a method and a system for predicting low-frequency poor prognosis outcomes of early colorectal cancer patients by using a neural network.
Background
The burden of malignant tumors in China is increasing year by year, mainly reflected in yearly increases in new cases and tumor-related deaths. Colorectal cancer has the second-highest cancer-related mortality worldwide, and its incidence and mortality in China are also rising year by year. Screening the risk factors that influence the prognosis outcome of tumor patients on the basis of prior knowledge, stratifying patients with different characteristics by risk, and accurately predicting their probability of survival over a specific period can help clinicians formulate individualized disease monitoring and treatment strategies, and therefore has important clinical value.
In recent years, with the wide implementation at home and abroad of early colorectal cancer screening programs such as fecal occult blood testing and colonoscopy, and the emergence of new noninvasive early screening technologies such as circulating tumor DNA (ctDNA), more and more colorectal cancer cases are found at an early stage (stage I). Statistics indicate that about 30% of colorectal cancer cases are currently diagnosed at stage I. Although stage I patients generally have good survival outcomes after radical surgery, some patients still have a poor prognosis. Literature data show that about 5%-10% of patients develop adverse outcomes such as tumor recurrence, metastasis, or death within five years. How to accurately identify, at the time of early tumor diagnosis, the patients at risk of a poor outcome is therefore an important problem to be solved.
Disclosure of Invention
In view of the above, the present invention provides a method and a system for predicting low-frequency poor prognosis outcome for early colorectal cancer patients, which can accurately identify patient population at risk of poor outcome in early tumor diagnosis.
In order to solve the technical problems, the technical scheme of the invention is to adopt a method for predicting the low-frequency bad prognosis outcome of early colorectal cancer patients, which comprises the following steps:
selecting risk features for prediction; the risk features include basic features and clinicopathological features; the basic features include age, sex, and tumor size; the clinicopathological features include tumor T stage, tumor grade, perineural invasion, number of lymph nodes examined, and preoperative carcinoembryonic antigen (CEA);
building a model M based on a neural network, and training and optimizing the model M by utilizing a training data set and a properly defined loss function;
and inputting the risk characteristics of the patient into the model M to obtain a prediction result.
As an improvement, the model M is a fully connected neural network comprising H hidden layers; the input of the model M is the risk feature vector $x$ of the patient, and the output is the probability distribution $y$ of the occurrence of the patient's adverse event;
let the input vector of hidden layer $k$ be $z_{k-1}$ and its output vector be $z_k$, where $1 \le k \le H$ and, for $k = 1$, the input is $z_0 = x$; with model parameters $W_k$ and activation function $\sigma$,
$$z_k = \sigma\left(z_{k-1}^{T} W_k\right);$$
when $k = H$, a Softmax probability transformation is applied to the output vector $z_H$ to obtain the probability distribution $y$ of the occurrence of the adverse event.
As a further improvement, the formula
$$y_r = \frac{\exp\left(z_{H,r}\right)}{\sum_{j=1}^{m} \exp\left(z_{H,j}\right)}$$
is used to perform the Softmax probability transformation, where $z_{H,r}$ is the $r$-th component of $z_H$, $y = [y_1, y_2, \ldots, y_m]$, and $1 \le r \le m$.
As an improvement, training the model M by stochastic gradient descent includes:
let the loss function be $L = L(\theta; D)$, where $\theta \in \Theta$ denotes the model parameters in the parameter space $\Theta$, and $D$ is the training data set, i.e. the observed patient survival time data;
the trained optimal model is $M^* = M_{\theta^*}$, where
$$\theta^* = \arg\min_{\theta \in \Theta} L(\theta; D);$$
the optimal parameter $\theta^*$ is obtained iteratively by stochastic gradient descent.
As a further improvement, the loss function is
$$L = \alpha L_1 + (1 - \alpha) L_2$$
where
Figure SMS_3
Figure SMS_4
where $n$ is the total number of patients in the training data set; the subscript $i$ denotes the $i$-th patient; $x_i$ is the patient's risk feature vector; $k_i = 1$ indicates that the patient died and $k_i = 0$ that the patient was still alive; $y_i[s_i]$ is the predicted probability that the survival time equals $s_i$; $S(\cdot \mid \cdot)$ is the survival function, so that $S(s_i \mid x_i)$ is the model-predicted probability that the survival time of patient $i$ exceeds $s_i$; $\rho_1$ and $\rho_2$ are the weights of the samples in which the event did and did not occur, respectively, with $\rho_1, \rho_2 > 0$; and the hyperparameter $\alpha \in [0, 1]$ balances the influence of $L_1$ and $L_2$ on model optimization.
As an improvement, the formula
$$C_{td} = P\{\, S(s_i \mid x_i) < S(s_i \mid x_j) \mid s_i < s_j,\ k_i = 1 \,\}$$
is used to evaluate the output accuracy of the model M, where $P(\cdot \mid \cdot)$ is the conditional probability, $x_i$ is the patient's risk feature vector, $s_i$ is the patient's survival time, $S(\cdot \mid \cdot)$ is the survival function, $S(s_i \mid x_i)$ is the model-predicted probability that the survival time of patient $i$ exceeds $s_i$, and $k_i$ is the label indicating whether the patient died.
The invention also provides a system for predicting low-frequency poor prognosis outcomes of early colorectal cancer patients, comprising:
a risk feature selection module for selecting risk features; the risk features include basic features and clinicopathological features; the basic features include age, sex, and tumor size; the clinicopathological features include tumor T stage, tumor grade, perineural invasion, number of lymph nodes examined, and preoperative carcinoembryonic antigen (CEA);
and a prediction module for predicting the probability distribution of the occurrence of the patient's adverse event from the input patient risk features.
As an improvement, the prediction module includes:
a model building module for building the model M; the model M is a fully connected neural network comprising H hidden layers; the input of the model M is the risk feature vector $x$ of the patient, and the output is the probability distribution $y$ of the occurrence of the patient's adverse event;
let the input vector of hidden layer $k$ be $z_{k-1}$ and its output vector be $z_k$, where $1 \le k \le H$ and, for $k = 1$, the input is $z_0 = x$; with model parameters $W_k$ and activation function $\sigma$,
$$z_k = \sigma\left(z_{k-1}^{T} W_k\right);$$
when $k = H$, a Softmax probability transformation is applied to the output vector $z_H$ to obtain the probability distribution $y$ of the occurrence of the adverse event;
a training optimization module for training and optimizing the model M, comprising a loss function $L = L(\theta; D)$, where $\theta \in \Theta$ denotes the model parameters in the parameter space $\Theta$, and $D$ is the training data set, i.e. the observed patient survival time data;
the trained optimal model is $M^* = M_{\theta^*}$, where
$$\theta^* = \arg\min_{\theta \in \Theta} L(\theta; D);$$
the optimal parameter $\theta^*$ is obtained iteratively by stochastic gradient descent.
As an improvement, the training optimization module includes a loss function definition module for defining the loss function $L$, the loss function being
$$L = \alpha L_1 + (1 - \alpha) L_2$$
where
Figure SMS_6
Figure SMS_7
where $n$ is the total number of patients in the training data set; the subscript $i$ denotes the $i$-th patient; $x_i$ is the patient's risk feature vector; $k_i = 1$ indicates that the patient died and $k_i = 0$ that the patient was still alive; $y_i[s_i]$ is the predicted probability that the survival time equals $s_i$; $S(\cdot \mid \cdot)$ is the survival function, so that $S(s_i \mid x_i)$ is the model-predicted probability that the survival time of patient $i$ exceeds $s_i$; $\rho_1$ and $\rho_2$ are the weights of the samples in which the event did and did not occur, respectively, with $\rho_1, \rho_2 > 0$; and the hyperparameter $\alpha \in [0, 1]$ balances the influence of $L_1$ and $L_2$ on model optimization.
As an improvement, the prediction module further includes:
a model evaluation module for evaluating the output accuracy of the model M using the formula
$$C_{td} = P\{\, S(s_i \mid x_i) < S(s_i \mid x_j) \mid s_i < s_j,\ k_i = 1 \,\}$$
where $P(\cdot \mid \cdot)$ is the conditional probability, $x_i$ is the patient's risk feature vector, $s_i$ is the patient's survival time, $S(\cdot \mid \cdot)$ is the survival function, $S(s_i \mid x_i)$ is the model-predicted probability that the survival time of patient $i$ exceeds $s_i$, and $k_i$ is the label indicating whether the patient died.
The invention has the advantages that:
First, targeting the low frequency of poor outcomes among early-stage tumor patients, the invention fills a technical gap in the field: through the proposed neural network model and stochastic gradient descent, it addresses the technical difficulties of small samples, low-probability outcome events, and prediction of the two-dimensional compound "time-event" outcome. Compared with the traditional linear COX regression method, the prediction accuracy of the model is further improved.
Second, the invention is based on a neural network method, which has the advantage of nonlinear fitting; compared with the traditional generalized linear model (such as the Cox model), it achieves a better fit and thereby improves the prediction accuracy of the model.
Third, the time cost is low. Given the input risk factor parameters, the patient's survival probability at different time nodes can be calculated rapidly.
Fourth, operation is simple. Risk features are screened on the basis of prior knowledge, and those finally included are patient characteristics that clinicians encounter routinely. The feature set can be updated promptly as new research evidence emerges, and the calculation results are easy for medical workers to read and use.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic structural diagram of the present invention.
FIG. 3 is a schematic diagram of the prediction result.
Detailed Description
In order to make the technical scheme of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the following specific embodiments.
The main characteristic of early (stage I) colorectal tumors is that most patients have a generally good prognosis after standardized surgical treatment; that is, adverse outcomes such as recurrence, metastasis, and death after treatment are relatively low-probability events. Statistical data indicate that the probability of these adverse outcomes occurring within five years after surgery in patients with stage I colorectal cancer is approximately 5%-10%. Moreover, in terms of variable type, the prognosis outcome is a two-dimensional compound time-to-event variable: the outcome of a given patient consists of both whether the adverse outcome occurs and the specific time at which it occurs. This further increases the technical difficulty of predicting it accurately.
Current prediction tools for tumor prognosis focus mainly on tumors of a particular anatomical site across all stages, including early, middle, and late diagnoses. For example, publication No. CN112011616A describes a prognostic model that predicts post-operative survival in hepatocellular carcinoma of all stages from immune-related gene expression markers. Similar methods have also been applied to colorectal cancer prognosis: publication No. CN111778337A describes a method for calculating a prognosis risk score for colorectal cancer of all stages, using the expression levels of 12 known tumor-stemness-related genes as risk factors. Since tumor stage is one of the main factors affecting prognosis, failing to use it as a main predictor in the design of such tools can impair prediction accuracy. Other tools do include tumor stage as a predictor, but they still target tumors of all stages and thus lack specificity for the characteristics of tumors at different stages. For example, a study from Japan reports a method for predicting overall survival in colorectal cancer patients using clinical information such as tumor stage, body mass index, and history of diabetes as risk factors. Because these tools do not exploit the specific characteristics of patients at different stages, and because tumor stage cannot serve as an effective predictor among patients of the same stage, their predictive performance is often poor when applied to patients of one specific stage.
In order to solve the above technical problems, as shown in fig. 1, the present invention provides a method for predicting low-frequency poor prognosis outcome of early colorectal cancer patients, specifically comprising:
S1, selecting risk features for prediction; the risk features include basic features and clinicopathological features; the basic features include age, sex, and tumor size; the clinicopathological features include tumor T stage, tumor grade, perineural invasion, number of lymph nodes examined, and preoperative carcinoembryonic antigen (CEA).
Given that poor outcome events are rare in early tumor patients (often less than 10%), and that too many risk features may seriously degrade prediction accuracy, the present invention adopts a two-layer screening strategy of "basic variables + prior knowledge" and selects 8 risk features in total for predicting the probability of a poor outcome in stage I colorectal cancer. First, the colorectal cancer prognostic risk factors recommended by the National Comprehensive Cancer Network (NCCN) and the European Society for Medical Oncology (ESMO) are selected: tumor T stage (T1 vs. T2); tumor grade (G1 or G2 vs. G3 or G4); perineural invasion (PNI); number of lymph nodes examined (≥12 vs. <12); and preoperative carcinoembryonic antigen (CEA <5 ng/ml vs. ≥5 ng/ml). The basic variables are age at tumor diagnosis, sex, and tumor diameter.
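The eight risk features can be turned into a numeric input vector for the model. The sketch below is only an illustration: the function name `encode_risk_features`, the ordering of the features, and the 0/1 coding of the categorical items are assumptions and are not specified in this document; the lymph node cut-off is read as "12 or more vs. fewer than 12".

```python
def encode_risk_features(age, sex, tumor_size_cm, t_stage, grade,
                         pni, nodes_examined, cea_ng_ml):
    """Map the 8 risk features to a numeric vector (hypothetical encoding)."""
    return [
        float(age),                                # age at diagnosis, years
        1.0 if sex == "male" else 0.0,             # sex
        float(tumor_size_cm),                      # tumor diameter, cm
        1.0 if t_stage == "T2" else 0.0,           # T1 vs. T2
        1.0 if grade in ("G3", "G4") else 0.0,     # G1/G2 vs. G3/G4
        1.0 if pni else 0.0,                       # perineural invasion
        1.0 if nodes_examined >= 12 else 0.0,      # assumed: >=12 vs. <12
        1.0 if cea_ng_ml >= 5.0 else 0.0,          # CEA <5 vs. >=5 ng/ml
    ]

# Example patient: 58-year-old male, 3 cm tumor, T1, G3, PNI present,
# 14 lymph nodes examined, CEA 10 ng/ml.
x = encode_risk_features(58, "male", 3.0, "T1", "G3", True, 14, 10.0)
print(x)  # [58.0, 1.0, 3.0, 0.0, 1.0, 1.0, 1.0, 1.0]
```

In practice the continuous features (age, tumor diameter) would usually be rescaled before training; that step is omitted here.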
In addition, the adverse outcomes predicted by the present invention are all-cause death (overall death) and colorectal cancer-related death (CRC-specific death). All-cause death is defined as death from any cause during the observation period, whereas colorectal cancer-related death is death directly attributable to colorectal cancer. The corresponding overall survival probability (overall survival) and colorectal cancer-specific survival probability (CRC-specific survival) are defined as "1 - mortality". The prediction probability generated by the present invention is the overall survival rate and the colorectal cancer-specific survival rate, at a given time node (e.g. 5 years after tumor diagnosis), of individuals with a given risk profile.
S2, building a model M, and training and optimizing the model M by using a training data set and a properly defined loss function.
In this embodiment, the model M is a fully connected neural network comprising H hidden layers; the input of the model M is the risk feature vector $x$ of the patient, and the output is the probability distribution $y$ of the occurrence of the patient's adverse event;
let the input vector of hidden layer $k$ be $z_{k-1}$ and its output vector be $z_k$, where $1 \le k \le H$ and, for $k = 1$, the input is $z_0 = x$; with model parameters $W_k$ and activation function $\sigma$,
$$z_k = \sigma\left(z_{k-1}^{T} W_k\right);$$
when $k = H$, a Softmax probability transformation is applied to the output vector $z_H$ to obtain the probability distribution $y$ of the occurrence of the adverse event.
Specifically, the formula
$$y_r = \frac{\exp\left(z_{H,r}\right)}{\sum_{j=1}^{m} \exp\left(z_{H,j}\right)}$$
is used to perform the Softmax probability transformation, where $z_{H,r}$ is the $r$-th component of $z_H$, $y = [y_1, y_2, \ldots, y_m]$, and $1 \le r \le m$.
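The layer recursion and the Softmax step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the patented implementation: the ReLU activation, the layer widths, the number of time bins m = 5, and the random weights are all assumptions chosen for the example.

```python
import math
import random

def relu(v):
    # Activation function sigma; ReLU is an assumption, the text only says "sigma".
    return [max(0.0, a) for a in v]

def affine(z_prev, W):
    # Computes z_prev^T W, where W has len(z_prev) rows.
    return [sum(z_prev[i] * W[i][j] for i in range(len(z_prev)))
            for j in range(len(W[0]))]

def softmax(v):
    # Subtract the maximum for numerical stability; the result is unchanged.
    top = max(v)
    exps = [math.exp(a - top) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, weights):
    # Apply z_k = sigma(z_{k-1}^T W_k) for every layer, then Softmax on z_H.
    z = x
    for W in weights:
        z = relu(affine(z, W))
    return softmax(z)

rng = random.Random(0)
sizes = [8, 16, 16, 5]  # 8 input features, H = 3 layers, m = 5 time bins (assumed)
weights = [[[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = [0.58, 1.0, 0.3, 0.0, 1.0, 1.0, 1.0, 1.0]  # hypothetical scaled features
y = forward(x, weights)
print(len(y), round(sum(y), 6))
```

Each entry of `y` is the predicted probability that the adverse event occurs in the corresponding time bin, so the entries are non-negative and sum to one.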
After the prediction model is built, it must be trained and optimized to improve prediction accuracy. The invention trains the model M by stochastic gradient descent, as follows:
let the loss function be $L = L(\theta; D)$, where $\theta \in \Theta$ denotes the model parameters in the parameter space $\Theta$, and $D$ is the training data set, i.e. the observed patient survival time data;
the trained optimal model is $M^* = M_{\theta^*}$, where
$$\theta^* = \arg\min_{\theta \in \Theta} L(\theta; D).$$
The optimal parameter $\theta^*$ is obtained iteratively by stochastic gradient descent. Specifically, let $\theta_0$ be randomly initialized model parameters; after a finite but sufficient number of iterations, $\theta_t$ approaches $\theta^*$. The iteration rule is:
$$\theta_{t+1} = \theta_t - \beta \nabla_{\theta} L(\theta_t; D)$$
where $\beta$ is the learning step size and $\nabla_{\theta} L(\theta_t; D)$ is the gradient of the loss function with respect to the model parameters.
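The iteration rule can be illustrated on a toy problem. The sketch below is not the patent's training code: it applies stochastic gradient descent to a one-parameter squared-error loss on made-up data, and the names `sgd` and `grad_sample`, the step size, and the number of steps are assumptions chosen so that the behavior is easy to check.

```python
import random

def grad_sample(theta, x, y):
    # Gradient of the single-sample loss (theta * x - y)^2 with respect to theta.
    return 2.0 * (theta * x - y) * x

def sgd(data, beta=0.05, steps=500, seed=0):
    rng = random.Random(seed)
    theta = rng.uniform(-1.0, 1.0)      # random initial parameter theta_0
    for _ in range(steps):
        x, y = rng.choice(data)         # draw one training sample at random
        theta = theta - beta * grad_sample(theta, x, y)  # update step
    return theta

# Toy data generated from y = 2x, so the minimizer is theta* = 2.
data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
theta_hat = sgd(data)
print(round(theta_hat, 6))  # 2.0
```

For a neural network, `theta` becomes the collection of weight matrices and the gradient is obtained by backpropagation, but the update step has the same form.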
In the present invention, the loss function $L$ consists of two parts, $L_1$ and $L_2$. The first part, $L_1$, is based on a negative log-likelihood function and is required to fit more accurately the low-probability adverse events that have occurred, namely:
Figure SMS_12
where $n$ is the total number of patients in the training data set; the subscript $i$ denotes the $i$-th patient; $x_i$ is the patient's risk feature vector; $k_i = 1$ indicates that the patient died and $k_i = 0$ that the patient was still alive; $y_i[s_i]$ is the predicted probability that the survival time equals $s_i$; $S(\cdot \mid \cdot)$ is the survival function, so that $S(s_i \mid x_i)$ is the model-predicted probability that the survival time of patient $i$ exceeds $s_i$; $\rho_1$ and $\rho_2$ are the weights of the samples in which the event did and did not occur, respectively, with $\rho_1, \rho_2 > 0$; and the hyperparameter $\alpha \in [0, 1]$ balances the influence of $L_1$ and $L_2$ on model optimization.
The second part, $L_2$, is a contrastive ranking loss function for known, determined survival times: for two different patients $(x_i, s_i, k_i)$ and $(x_j, s_j, k_j)$, if $k_i = 1$ (i.e. $s_i$ is a determined survival time), then regardless of whether $k_j$ is 1, as long as $s_i < s_j$, the survival probability $S(s_i \mid x_i)$ should be less than $S(s_i \mid x_j)$. Thus:
Figure SMS_13
The final loss function combines the two parts, weighted by the hyperparameter $\alpha \in [0, 1]$, namely:
$$L = \alpha L_1 + (1 - \alpha) L_2$$
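The exact expressions for $L_1$ and $L_2$ are given only as equation images that are not reproduced in this text, so the following sketch fills them in with assumed forms: a weighted negative log-likelihood for $L_1$ and an exponential pairwise ranking penalty, compared at time $s_i$, for $L_2$. It also assumes survival times are discretized into integer time bins. Only the combination $L = \alpha L_1 + (1 - \alpha) L_2$ and the roles of $\rho_1$, $\rho_2$, $k_i$, $y_i[s_i]$, and $S(s_i \mid x_i)$ are taken from the description.

```python
import math

EPS = 1e-12  # guards against log(0)

def survival(y, s):
    # S(s | x): predicted probability that the survival time exceeds bin s.
    return sum(y[s + 1:])

def loss_l1(Y, S_obs, K, rho1=1.0, rho2=1.0):
    # Assumed form: weighted negative log-likelihood, averaged over patients.
    total = 0.0
    for y, s, k in zip(Y, S_obs, K):
        if k == 1:   # event observed: likelihood of the event at time s
            total -= rho1 * math.log(y[s] + EPS)
        else:        # no event observed: likelihood of surviving past s
            total -= rho2 * math.log(survival(y, s) + EPS)
    return total / len(Y)

def loss_l2(Y, S_obs, K):
    # Assumed form: exponential penalty over comparable pairs (k_i = 1, s_i < s_j).
    total, pairs = 0.0, 0
    for i in range(len(Y)):
        if K[i] != 1:
            continue
        for j in range(len(Y)):
            if S_obs[i] < S_obs[j]:
                total += math.exp(survival(Y[i], S_obs[i]) - survival(Y[j], S_obs[i]))
                pairs += 1
    return total / pairs if pairs else 0.0

def combined_loss(Y, S_obs, K, alpha=0.5, rho1=1.0, rho2=1.0):
    return alpha * loss_l1(Y, S_obs, K, rho1, rho2) + (1 - alpha) * loss_l2(Y, S_obs, K)

# Two hypothetical patients: patient 0 died in bin 0, patient 1 was alive at bin 1.
S_obs, K = [0, 1], [1, 0]
Y_good = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]   # ranks patient 0 as higher risk
Y_bad = [Y_good[1], Y_good[0]]                # predictions swapped
print(combined_loss(Y_good, S_obs, K) < combined_loss(Y_bad, S_obs, K))  # True
```

Raising `rho1` relative to `rho2` puts more weight on the rare patients in whom the event occurred, which is the stated purpose of the two weights.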
Finally, the output accuracy of the trained and optimized model M is evaluated. The invention adopts the time-dependent C-index ($C_{td}$) as the main measure of model performance. For any two comparable samples, $C_{td}$ gives the probability that the ordering of the survival predicted by the model agrees with the ordering of the true survival times, namely:
$$C_{td} = P\{\, S(s_i \mid x_i) < S(s_i \mid x_j) \mid s_i < s_j,\ k_i = 1 \,\}$$
where $P(\cdot \mid \cdot)$ is the conditional probability, $x_i$ is the patient's risk feature vector, $s_i$ is the patient's survival time, $S(\cdot \mid \cdot)$ is the survival function, $S(s_i \mid x_i)$ is the model-predicted probability that the survival time of patient $i$ exceeds $s_i$, and $k_i$ is the label indicating whether the patient died.
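The definition of $C_{td}$ translates directly into a count over comparable pairs. The sketch below is a plain empirical estimate under the assumption that survival times are discretized into integer time bins; the function names and the three toy patients are hypothetical.

```python
def survival(y, s):
    # S(s | x): predicted probability that the survival time exceeds bin s.
    return sum(y[s + 1:])

def c_td(Y, S_obs, K):
    """Fraction of comparable pairs (k_i = 1, s_i < s_j) that the model ranks
    correctly, i.e. with S(s_i | x_i) < S(s_i | x_j)."""
    concordant, comparable = 0, 0
    for i in range(len(Y)):
        if K[i] != 1:
            continue  # only patients with an observed event anchor a pair
        for j in range(len(Y)):
            if S_obs[i] < S_obs[j]:
                comparable += 1
                if survival(Y[i], S_obs[i]) < survival(Y[j], S_obs[i]):
                    concordant += 1
    return concordant / comparable if comparable else None

# Three hypothetical patients with events in bins 0 and 1, and one alive at bin 2.
Y = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]
S_obs = [0, 1, 2]
K = [1, 1, 0]
print(c_td(Y, S_obs, K))  # 1.0
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why higher values in the results table indicate better performance.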
Data from 9015 patients with stage I colorectal cancer in the US Surveillance, Epidemiology, and End Results (SEER) cancer registry program were randomly divided into a training data set and a test set at a ratio of 3:1. The risk predictors were the 8 risk factors mentioned above (age, sex, tumor diameter, tumor grade, T stage, CEA, PNI, and number of lymph nodes examined). The predicted outcome events were all-cause death (OS) and colorectal cancer-related death (CRCD). The comparison method was the classical COX proportional hazards regression model. The table below shows the $C_{td}$ index (higher is better) for the different event rates.

Type of adverse event | Event incidence | COX model | Model M
OS | 17.6% | 0.7382 | 0.7475
CRCD | 4.2% | 0.6725 | 0.6846
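A random 3:1 split of 9015 patients can be sketched as follows. The seed and the rounding rule (integer division, so the training set takes the floor of three quarters) are assumptions; the document does not state how the split was rounded.

```python
import random

def split_3_to_1(n_patients, seed=0):
    # Shuffle patient indices, then assign the first three quarters to training.
    idx = list(range(n_patients))
    random.Random(seed).shuffle(idx)
    n_train = (3 * n_patients) // 4
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = split_3_to_1(9015)
print(len(train_idx), len(test_idx))  # 6761 2254
```

With an event rate of only 4.2% for CRCD, a stratified split that preserves the event rate in both sets would be a natural alternative, though the document does not say whether one was used.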
From this it can be seen that the model M provided by the present invention achieves a higher $C_{td}$ than the COX model on both low-frequency events (0.7475 vs. 0.7382 for OS and 0.6846 vs. 0.6725 for CRCD). Notably, in the prediction of CRCD, where the adverse event rate is only 4.2%, the proposed model still performs normally and outperforms the COX model (at such low event rates, even a 0.1% improvement in accuracy is very difficult to achieve). The proposed model therefore has an advantage in predicting low-probability poor outcomes.
S3, inputting the risk characteristics of the patient into the model M to obtain a prediction result.
Suppose the risk features of a newly diagnosed colon cancer patient are x = [age = 58 years, sex = male, tumor grade = G3 or G4, tumor size = 3 cm, PNI = yes, lymph nodes examined = 14, CEA = 10 ng/ml, T stage = T1]. Feeding $x$ into the model $M^*$ yields $y = M^*(x)$, whose distribution is shown in FIG. 3; from it one can infer that the patient's most likely survival time is around five years.
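Reading a prediction off the output distribution amounts to locating its peak and accumulating the remaining probability mass. The sketch below uses made-up probabilities over six yearly bins; these are not the values plotted in FIG. 3, and the yearly binning is an assumption.

```python
def most_likely_time(y, bin_years):
    # Time bin with the highest predicted event probability.
    r = max(range(len(y)), key=lambda t: y[t])
    return bin_years[r]

def survival_curve(y):
    # S at the end of each bin: 1 minus the cumulative event probability.
    curve, remaining = [], 1.0
    for p in y:
        remaining -= p
        curve.append(max(0.0, remaining))
    return curve

y = [0.05, 0.05, 0.10, 0.15, 0.40, 0.25]   # hypothetical output of the model
bin_years = [1, 2, 3, 4, 5, 6]

print(most_likely_time(y, bin_years))  # 5
print([round(s, 2) for s in survival_curve(y)])
```

The survival curve gives the survival probability at each time node, which corresponds to the "1 - mortality" quantities described earlier.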
As shown in FIG. 2, the present invention further provides a system for predicting low-frequency poor prognosis outcomes of early colorectal cancer patients, comprising:
a risk feature selection module for selecting risk features; the risk features include basic features and clinicopathological features; the basic features include age, sex, and tumor size; the clinicopathological features include tumor T stage, tumor grade, perineural invasion, number of lymph nodes examined, and preoperative carcinoembryonic antigen (CEA);
a prediction module for predicting a probability distribution of occurrence of a patient adverse event using an input patient risk feature, comprising:
a model building module for building the model M; the model M is a fully connected neural network comprising H hidden layers; the input of the model M is the risk feature vector $x$ of the patient, and the output is the probability distribution $y$ of the occurrence of the patient's adverse event;
let the input vector of hidden layer $k$ be $z_{k-1}$ and its output vector be $z_k$, where $1 \le k \le H$ and, for $k = 1$, the input is $z_0 = x$; with model parameters $W_k$ and activation function $\sigma$,
$$z_k = \sigma\left(z_{k-1}^{T} W_k\right);$$
when $k = H$, a Softmax probability transformation is applied to the output vector $z_H$ to obtain the probability distribution $y$ of the occurrence of the adverse event.
A training optimization module for training and optimizing the model M, comprising a loss function $L = L(\theta; D)$, where $\theta \in \Theta$ denotes the model parameters in the parameter space $\Theta$, and $D$ is the training data set, i.e. the observed patient survival time data;
the trained optimal model is $M^* = M_{\theta^*}$, where
$$\theta^* = \arg\min_{\theta \in \Theta} L(\theta; D).$$
The optimal parameter $\theta^*$ is obtained iteratively by stochastic gradient descent. The training optimization module comprises a loss function definition module for defining the loss function $L$, the loss function being
$$L = \alpha L_1 + (1 - \alpha) L_2$$
where
Figure SMS_15
Figure SMS_16
where $n$ is the total number of patients in the training data set; the subscript $i$ denotes the $i$-th patient; $x_i$ is the patient's risk feature vector; $k_i = 1$ indicates that the patient died and $k_i = 0$ that the patient was still alive; $y_i[s_i]$ is the predicted probability that the survival time equals $s_i$; $S(\cdot \mid \cdot)$ is the survival function, so that $S(s_i \mid x_i)$ is the model-predicted probability that the survival time of patient $i$ exceeds $s_i$; $\rho_1$ and $\rho_2$ are the weights of the samples in which the event did and did not occur, respectively, with $\rho_1, \rho_2 > 0$; and the hyperparameter $\alpha \in [0, 1]$ balances the influence of $L_1$ and $L_2$ on model optimization.
A model evaluation module for evaluating the output accuracy of the model M using the formula
$$C_{td} = P\{\, S(s_i \mid x_i) < S(s_i \mid x_j) \mid s_i < s_j,\ k_i = 1 \,\}$$
where $P(\cdot \mid \cdot)$ is the conditional probability, $x_i$ is the patient's risk feature vector, $s_i$ is the patient's survival time, $S(\cdot \mid \cdot)$ is the survival function, $S(s_i \mid x_i)$ is the model-predicted probability that the survival time of patient $i$ exceeds $s_i$, and $k_i$ is the label indicating whether the patient died.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that the above-mentioned preferred embodiment should not be construed as limiting the invention, and the scope of the invention should be defined by the appended claims. It will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the spirit and scope of the invention, and such modifications and adaptations are intended to be comprehended within the scope of the invention.

Claims (10)

1. A method for predicting low frequency poor prognosis outcome in an early stage colorectal cancer patient, comprising:
selecting risk features for prediction; the risk features include basic features and clinicopathological features; the basic features include age, sex, and tumor size; the clinicopathological features include tumor T stage, tumor grade, perineural invasion, number of lymph nodes examined, and preoperative carcinoembryonic antigen (CEA);
building a model M based on a neural network, and training and optimizing the model M by utilizing a training data set and a loss function;
and inputting the risk characteristics of the patient into the model M to obtain a prediction result.
2. A method of predicting low frequency poor prognosis outcome for patients with early colorectal cancer according to claim 1, wherein:
the model M is a fully connected neural network comprising H hidden layers; the input of the model M is the risk feature vector $x$ of the patient, and the output is the probability distribution $y$ of the occurrence time of the patient's adverse event;
let the input vector of hidden layer $k$ be $z_{k-1}$ and its output vector be $z_k$, where $1 \le k \le H$ and, for $k = 1$, the input is $z_0 = x$; with model parameters $W_k$ and activation function $\sigma$,
$$z_k = \sigma\left(z_{k-1}^{T} W_k\right);$$
when $k = H$, a Softmax probability transformation is applied to the output vector $z_H$ to obtain the probability distribution $y$ of the occurrence of the adverse event.
3. A method of predicting low frequency poor prognosis outcome for patients with early colorectal cancer according to claim 2, wherein:
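the formula
$$y_r = \frac{\exp\left(z_{H,r}\right)}{\sum_{j=1}^{m} \exp\left(z_{H,j}\right)}$$
is used to perform the Softmax probability transformation, where $z_{H,r}$ is the $r$-th component of $z_H$, $y = [y_1, y_2, \ldots, y_m]$, and $1 \le r \le m$.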
4. A method for predicting low frequency poor prognosis outcome for patients with early colorectal cancer according to claim 1, characterized in that training the model M by stochastic gradient descent comprises:
letting the loss function be $L = L(\theta; D)$, where $\theta \in \Theta$ denotes the model parameters in the parameter space $\Theta$, and $D$ is the training data set, i.e. the observed patient survival time data;
the trained optimal model is $M^* = M_{\theta^*}$, where
$$\theta^* = \arg\min_{\theta \in \Theta} L(\theta; D);$$
the optimal parameter $\theta^*$ is obtained iteratively by stochastic gradient descent.
5. A method of predicting low frequency poor prognosis outcome for patients with early colorectal cancer according to claim 1, wherein:
the loss function is
$$L = \alpha L_1 + (1 - \alpha) L_2$$
where
Figure FDA0004112819640000022
Figure FDA0004112819640000023
where $n$ is the total number of patients in the training data set; the subscript $i$ denotes the $i$-th patient; $x_i$ is the patient's risk feature vector; $k_i = 1$ indicates that the patient died and $k_i = 0$ that the patient was still alive; $y_i[s_i]$ is the predicted probability that the survival time equals $s_i$; $S(\cdot \mid \cdot)$ is the survival function, so that $S(s_i \mid x_i)$ is the model-predicted probability that the survival time of patient $i$ exceeds $s_i$; $\rho_1$ and $\rho_2$ are the weights of the samples in which the event did and did not occur, respectively, with $\rho_1, \rho_2 > 0$; and the hyperparameter $\alpha \in [0, 1]$ balances the influence of $L_1$ and $L_2$ on model optimization.
6. A method of predicting low frequency poor prognosis outcome for patients with early colorectal cancer according to claim 1, wherein:
the formula
$$C_{td} = P\{\, S(s_i \mid x_i) < S(s_i \mid x_j) \mid s_i < s_j,\ k_i = 1 \,\}$$
is used to evaluate the output accuracy of the model M, where $P(\cdot \mid \cdot)$ is the conditional probability, $x_i$ is the patient's risk feature vector, $s_i$ is the patient's survival time, $S(\cdot \mid \cdot)$ is the survival function, $S(s_i \mid x_i)$ is the model-predicted probability that the survival time of patient $i$ exceeds $s_i$, and $k_i$ is the label indicating whether the patient died.
7. A system for predicting low-frequency poor prognosis outcomes in patients with early colorectal cancer, comprising:
a risk feature selection module for selecting risk features; the risk features include basic features and clinicopathological features; the basic features include age, sex, and tumor size; the clinicopathological features include tumor T stage, tumor grade, perineural invasion, number of lymph nodes examined, and preoperative carcinoembryonic antigen;
and a prediction module for predicting, from the patient's input risk features, the probability distribution of the occurrence of adverse events for the patient.
8. The system for predicting low-frequency poor prognosis outcomes in patients with early colorectal cancer according to claim 7, characterized in that the prediction module comprises:
a model building module for building a model M; the model M is a fully connected neural network comprising H hidden layers, its input is the patient's risk features x, and its output is the probability distribution y of the occurrence of adverse events for the patient;
let the input vector of hidden layer k be $z_{k-1}$ and its output vector be $z_k$, where $1 \le k \le H$, and when $k = 1$ the input is the risk feature vector, $z_0 = x$; with model parameters $w_k$ and activation function $\sigma$, then
$z_k = \sigma(z_{k-1}^{T} w_k)$;
when $k = H$, a Softmax probability transformation is applied to the output vector $z_H$ to obtain the probability distribution y of the occurrence of adverse events;
a training optimization module for training and optimizing the model M, comprising a loss function $L = L(\theta; D)$, where $\theta \in \Theta$ is the model parameter, $\Theta$ is the parameter space, and D is the training dataset, i.e. the observed patient survival time data;
the trained optimal model being $M^{*} = M_{\theta^{*}}$, where
Figure FDA0004112819640000041
and the optimal parameter $\theta^{*}$ is obtained iteratively by stochastic gradient descent.
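The forward computation recited for the model building module in claim 8 can be sketched in plain Python. The sketch assumes tanh as the activation function σ (the claim does not name one), random untrained weights, and arbitrary layer sizes; the eight inputs correspond to the count of risk features listed in claim 7, and the five outputs stand for five hypothetical time bins. All function names are illustrative.

```python
import math
import random

def matvec_t(z, w):
    """Compute z^T w for a vector z (length m) and a matrix w (m rows, n columns)."""
    return [sum(z[r] * w[r][c] for r in range(len(z))) for c in range(len(w[0]))]

def softmax(v):
    """Turn a real vector into a probability distribution."""
    m = max(v)                              # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, weights, sigma=math.tanh):
    """Apply z_k = sigma(z_{k-1}^T w_k) for each layer, then Softmax on z_H."""
    z = list(x)                             # z_0 = x
    for w in weights:
        z = [sigma(a) for a in matvec_t(z, w)]
    return softmax(z)

rng = random.Random(0)
dims = [8, 16, 16, 5]                       # assumed sizes: 8 features, H = 3, 5 time bins
weights = [[[rng.gauss(0.0, 0.5) for _ in range(dims[k + 1])]
            for _ in range(dims[k])]
           for k in range(len(dims) - 1)]
x = [rng.random() for _ in range(dims[0])]  # hypothetical normalised risk features
y = forward(x, weights)                     # probability distribution over time bins
```

Because the claimed recursion contains no bias term, none is included here; in training, the entries of `weights` would play the role of the parameter θ optimised by stochastic gradient descent.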
9. The system for predicting low-frequency poor prognosis outcomes in patients with early colorectal cancer according to claim 8, wherein the training optimization module comprises a loss function definition module for defining the loss function L, the loss function being
$L = \alpha L_1 + (1-\alpha) L_2$
where
Figure FDA0004112819640000042
Figure FDA0004112819640000043
where n is the total number of patients in the training dataset; subscript i denotes the i-th patient; $x_i$ is the patient's risk features; $k_i = 1$ indicates that the patient has died and $k_i = 0$ indicates that the patient is still alive; $y_i[s_i]$ is the predicted probability that the survival time equals $s_i$; $S(\cdot \mid \cdot)$ is the survival function, and $S(s_i \mid x_i)$ is the model-predicted probability that the survival time of patient i is greater than $s_i$; $\rho_1$ and $\rho_2$ are the weights of samples in which the event has occurred and has not occurred, respectively, and satisfy $\rho_1, \rho_2 > 0$; and the hyperparameter $\alpha \in [0,1]$ balances the influence of $L_1$ and $L_2$ on model optimization.
10. The system for predicting low-frequency poor prognosis outcomes in patients with early colorectal cancer according to claim 8, wherein the prediction module further comprises:
a model evaluation module for using the formula
$C^{td} = P\{\, S(s_i \mid x_i) < S(s_i \mid x_j) \mid s_i < s_j,\ k_i = 1 \,\}$
to evaluate the output accuracy of the model M, where $P(\cdot \mid \cdot)$ is the conditional probability, $x_i$ is the patient's risk features, $s_i$ is the predicted patient survival time, $S(\cdot \mid \cdot)$ is the survival function, $S(s_i \mid x_i)$ is the model-predicted probability that the survival time of patient i is greater than $s_i$, and $k_i$ is the label indicating whether the patient has died.
CN202310211362.4A 2023-03-07 2023-03-07 Method and system for predicting low-frequency poor prognosis outcome of early colorectal cancer patient Pending CN116386881A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310211362.4A CN116386881A (en) 2023-03-07 2023-03-07 Method and system for predicting low-frequency poor prognosis outcome of early colorectal cancer patient

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310211362.4A CN116386881A (en) 2023-03-07 2023-03-07 Method and system for predicting low-frequency poor prognosis outcome of early colorectal cancer patient

Publications (1)

Publication Number Publication Date
CN116386881A true CN116386881A (en) 2023-07-04

Family

ID=86960608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310211362.4A Pending CN116386881A (en) 2023-03-07 2023-03-07 Method and system for predicting low-frequency poor prognosis outcome of early colorectal cancer patient

Country Status (1)

Country Link
CN (1) CN116386881A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination