CN113517073A - Method and system for predicting survival rate after lung cancer surgery

Info

Publication number
CN113517073A
CN113517073A CN202111071269.5A CN202111071269A CN113517073A CN 113517073 A CN113517073 A CN 113517073A CN 202111071269 A CN202111071269 A CN 202111071269A CN 113517073 A CN113517073 A CN 113517073A
Authority
CN
China
Prior art keywords
data
lung cancer
risk factor
regression analysis
clinical data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111071269.5A
Other languages
Chinese (zh)
Other versions
CN113517073B (en)
Inventor
何建行
梁文华
李坚福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Institute Of Respiratory Health
Bioisland Laboratory
Original Assignee
Guangzhou Institute Of Respiratory Health
Bioisland Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Institute Of Respiratory Health and Bioisland Laboratory
Priority to CN202111071269.5A
Publication of CN113517073A
Application granted
Publication of CN113517073B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B5/72: Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7271: Specific aspects of physiological measurement analysis
    • A61B5/7275: Determining trends in physiological measurement data; Predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50: Mutagenesis

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computing Systems (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Physiology (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Analytical Chemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Mathematical Physics (AREA)
  • Veterinary Medicine (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)

Abstract

The embodiments of the present disclosure disclose a method and a system for predicting survival rate after lung cancer surgery. The method, which predicts survival rate after lung cancer surgery by measuring clinical data including gene mutation typing, comprises the following steps: a data acquisition step of acquiring clinical data after lung cancer surgery; a preprocessing step of classifying and grouping the clinical data after lung cancer surgery to obtain modeling group clinical data and verification group clinical data; a risk factor screening step of screening the modeling group clinical data for risk factors to obtain risk factor data and overall survival data; and a regression analysis step of performing regression analysis on the risk factor data and the overall survival data to obtain regression-analyzed data. The clinical data after lung cancer surgery comprise gene mutation typing, age, tumor size, lymph node metastasis, and surgical mode.

Description

Method and system for predicting survival rate after lung cancer surgery
Technical Field
The present disclosure relates to the field of surgery, and in particular to methods and systems for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing.
Background
Early stage lung cancer comprises stage I, stage II, and a subset of stage III disease. The standard treatment for non-small cell lung cancer is radical resection. After lung cancer surgery, there is a need to predict the postoperative survival rate of patients.
In the prior art, TNM staging is used to predict the disease-free survival rate after lung cancer surgery. The seventh edition of TNM staging is the most widely used staging system; it stratifies non-metastatic non-small cell lung cancer (NSCLC) patients according to tumor size and infiltration and the degree of lymph node involvement. However, prediction of the postoperative disease-free survival rate by TNM stage is inaccurate, because the disease-free survival rates of different patients in the same stage differ widely.
Patent document CN111640518A discloses a method for predicting the post-operative disease-free survival rate of a cervical cancer patient using a cervical cancer post-operative survival prediction model. The parameter selection, the nomogram and the like are suitable for the postoperative disease-free survival rate of cervical cancer, the pregnancy history, the HPV typing and the FIGO staging are related indexes of the cervical cancer, but not related indexes of lung cancer, and the cervical cancer and the lung cancer are two completely different diseases, so the nomogram in CN111640518A obtained by the indexes is not suitable for the postoperative disease-free survival rate prediction of early lung cancer patients.
Therefore, there is a need for more accurate prediction of postoperative disease-free survival for patients with early stage lung cancer in other, more effective ways. In the prediction process, the selection of prediction parameters and the like is very important for the accuracy of the prediction result.
Disclosure of Invention
To solve the problems in the related art, embodiments of the present disclosure provide methods and systems for predicting survival rate after lung cancer surgery by measuring clinical data including gene mutation typing.
In a first aspect, the disclosed embodiments provide a method for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing, comprising:
a data acquisition step, in which clinical data after lung cancer surgery are acquired;
a preprocessing step, namely classifying and grouping the clinical data after the lung cancer operation to obtain modeling group clinical data and verification group clinical data;
a risk factor screening step, namely screening the modeling group clinical data for risk factors to obtain risk factor data and overall survival data;
a regression analysis step of performing regression analysis on the risk factor data and the overall survival data to obtain regression-analyzed data,
wherein the clinical data after lung cancer surgery comprise gene mutation typing, age, tumor size, lymph node metastasis, and surgical mode,
the regression analysis is calculated by the following formula:
$$\ln\left[\frac{h(t, X)}{h_0(t)}\right] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{surgical mode} + \beta_5 \times \text{gene mutation typing}$$
where h(t, X) is the regression-analyzed data, h0(t) is the baseline risk rate, and β1, β2, β3, β4 and β5 are coefficients whose values are given in the following table:
[Table of coefficient values for β1 to β5; provided as an image in the original publication and not reproduced here]
With reference to the first aspect, the present disclosure provides, in a first implementation form of the first aspect,
the risk factor screening step comprises: screening the modeling group clinical data for risk factors by using a LASSO analysis method to obtain risk factor data and overall survival data.
With reference to the first aspect, the present disclosure provides, in a second implementation form of the first aspect,
the regression analysis step comprises: performing regression analysis on the risk factor data and the overall survival data by using a multifactor Cox analysis method to obtain regression-analyzed data.
With reference to the first aspect, the present disclosure provides, in a third implementation form of the first aspect,
the post-operative clinical data for lung cancer further comprises at least one of: the type of pathology; and/or
The post-regression analysis data includes: disease-free survival rate after operation; and/or
The risk factor data includes gene mutation typing, and further includes at least one of: age, tumor size, lymph node metastasis, surgical mode.
With reference to the first aspect, the present disclosure provides, in a fourth implementation form of the first aspect,
the lung cancer comprises: stage I-IIIA lung cancer; and/or
The gene mutation typing comprises: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, KRAS mutation, RET fusion, BRAF mutation.
With reference to the first aspect, the present disclosure provides, in a fifth implementation form of the first aspect,
the preprocessing step comprises: for continuous data in the lung cancer postoperative clinical data, obtaining an optimal critical point by an optimal approximation method of the receiver operating characteristic curve, and grouping the classified lung cancer postoperative clinical data by using the optimal critical point to obtain the modeling group clinical data and the verification group clinical data.
With reference to the first aspect, the present disclosure provides, in a sixth implementation form of the first aspect,
the method further comprises a verification step of verifying the risk factor screening step and the regression analysis step.
With reference to the sixth implementation manner of the first aspect, in a seventh implementation manner of the first aspect, the verifying step includes:
calculating, by a machine learning method and based on the risk factor screening step, the regression analysis step and the verification group clinical data, the area under the receiver operating characteristic curve, the sensitivity and the specificity;
and judging the processing accuracy of the risk factor screening step and the regression analysis step according to the area under the receiver operating characteristic curve, the sensitivity and the specificity.
With reference to the seventh implementation manner of the first aspect, in an eighth implementation manner of the first aspect, the machine learning method includes at least one of:
a logistic regression method, a support vector machine method, a random forest method, a decision tree method, a k-nearest neighbor method, a naive Bayes method, an AdaBoost method.
With reference to the eighth implementation manner of the first aspect, in a ninth implementation manner of the first aspect,
the processing of the risk factor screening step and the regression analysis step is judged to be accurate when the area under the receiver operating characteristic curve is greater than 0.65, the sensitivity is greater than 0.5 and the specificity is greater than 0.5.
With reference to the first aspect, in a tenth implementation manner of the first aspect, the present disclosure further includes:
a display step of displaying the relationship between the risk factor data and the regression-analyzed data graphically.
With reference to the tenth implementation manner of the first aspect, in an eleventh implementation manner of the first aspect,
the display step comprises: displaying the relationship between the risk factor data and the regression-analyzed data by using a nomogram.
In a second aspect, embodiments of the present disclosure provide a system for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing, comprising:
the data acquisition module is used for acquiring clinical data after lung cancer surgery;
the preprocessing module is used for classifying and grouping the clinical data after the lung cancer operation to obtain modeling group clinical data and verification group clinical data;
the risk factor screening module is used for screening the modeling group clinical data for risk factors to obtain risk factor data and overall survival data;
a regression analysis module for performing regression analysis on the risk factor data and the overall survival data to obtain regression-analyzed data,
wherein the clinical data after lung cancer surgery comprise gene mutation typing, age, tumor size, lymph node metastasis, and surgical mode,
the regression analysis is calculated by the following formula:
$$\ln\left[\frac{h(t, X)}{h_0(t)}\right] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{surgical mode} + \beta_5 \times \text{gene mutation typing}$$
where h(t, X) is the regression-analyzed data, h0(t) is the baseline risk rate, and β1, β2, β3, β4 and β5 are coefficients whose values are given in the following table:
[Table of coefficient values for β1 to β5; provided as an image in the original publication and not reproduced here]
With reference to the second aspect, the present disclosure provides, in a first implementation form of the second aspect,
the risk factor screening module is used for: screening the modeling group clinical data for risk factors by using a LASSO analysis method to obtain risk factor data and overall survival data.
With reference to the second aspect, the present disclosure provides, in a second implementation form of the second aspect,
the regression analysis module is used for: performing regression analysis on the risk factor data and the overall survival data by using a multifactor Cox analysis method to obtain regression-analyzed data.
With reference to the second aspect, the present disclosure, in a third implementation form of the second aspect,
the post-operative clinical data for lung cancer further comprises at least one of: the type of pathology; and/or
The post-regression analysis data includes: disease-free survival rate after operation; and/or
The risk factor data includes gene mutation typing, and further includes at least one of: age, tumor size, lymph node metastasis, surgical mode.
With reference to the second aspect, the present disclosure, in a fourth implementation form of the second aspect,
the lung cancer comprises: stage I-IIIA lung cancer; and/or
The gene mutation typing comprises: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, KRAS mutation, RET fusion, BRAF mutation.
With reference to the second aspect, the present disclosure provides, in a fifth implementation form of the second aspect,
the preprocessing module is used for: and for continuous data in the lung cancer postoperative clinical data, acquiring an optimal critical point by adopting an optimal approximation method of a receiver operating characteristic curve, and grouping a plurality of classified lung cancer postoperative clinical data by adopting the optimal critical point to obtain the modeling group clinical data and the verification group clinical data.
With reference to the second aspect, in a sixth implementation manner of the second aspect, the present disclosure further includes:
a verification module for verifying the risk factor screening module and the regression analysis module.
With reference to the sixth implementation manner of the second aspect, in a seventh implementation manner of the second aspect, the verification module is configured to:
calculating, by a machine learning method and based on the risk factor screening module, the regression analysis module and the verification group clinical data, the area under the receiver operating characteristic curve, the sensitivity and the specificity;
and judging the processing accuracy of the risk factor screening module and the regression analysis module according to the area under the receiver operating characteristic curve, the sensitivity and the specificity.
With reference to the seventh implementation manner of the second aspect, in an eighth implementation manner of the second aspect, the machine learning method includes at least one of:
a logistic regression method, a support vector machine method, a random forest method, a decision tree method, a k-nearest neighbor method, a naive Bayes method, an AdaBoost method.
With reference to the eighth implementation manner of the second aspect, in a ninth implementation manner of the second aspect,
the processing of the risk factor screening module and the regression analysis module is judged to be accurate when the area under the receiver operating characteristic curve is greater than 0.65, the sensitivity is greater than 0.5 and the specificity is greater than 0.5.
With reference to the second aspect, in a tenth implementation manner of the second aspect, the present disclosure further includes:
a display module for displaying the relationship between the risk factor data and the regression-analyzed data graphically.
With reference to the tenth implementation manner of the second aspect, in an eleventh implementation manner of the second aspect,
the display module is used for: displaying the relationship between the risk factor data and the regression-analyzed data by using a nomogram.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
According to the technical solution provided by the embodiments of the present disclosure, the method for predicting survival rate after lung cancer surgery comprises: a data acquisition step of acquiring clinical data after lung cancer surgery; a preprocessing step of classifying and grouping the clinical data after lung cancer surgery to obtain modeling group clinical data and verification group clinical data; a risk factor screening step of screening the modeling group clinical data for risk factors to obtain risk factor data and overall survival data; and a regression analysis step of performing regression analysis on the risk factor data and the overall survival data to obtain regression-analyzed data. The clinical data after lung cancer surgery comprise gene mutation typing, age, tumor size, lymph node metastasis, and surgical mode. The regression analysis is calculated by the following formula:
$$\ln\left[\frac{h(t, X)}{h_0(t)}\right] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{surgical mode} + \beta_5 \times \text{gene mutation typing}$$
where h(t, X) is the regression-analyzed data, h0(t) is the baseline risk rate, and β1, β2, β3, β4 and β5 are coefficients whose values are given in the following table:
[Table of coefficient values for β1 to β5; provided as an image in the original publication and not reproduced here]
Therefore, the accuracy of the patient survival prediction model is improved, and the postoperative disease-free survival rate is accurately estimated.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. The following is a description of the drawings.
Fig. 1a shows an exemplary schematic diagram of an implementation scenario for grouping lung cancer patient data according to an embodiment of the present disclosure.
Fig. 1b illustrates an exemplary schematic diagram of an implementation scenario of a patient survival prediction model according to an embodiment of the present disclosure.
Fig. 1c illustrates an exemplary schematic diagram of an implementation scenario of a validated patient survival prediction model according to an embodiment of the present disclosure.
Fig. 1d shows an exemplary schematic of a nomogram for predicting disease-free survival of a patient, according to an embodiment of the present disclosure.
Fig. 1e shows an exemplary schematic of a receiver operating characteristic (ROC) curve according to an embodiment of the present disclosure.
Fig. 2 illustrates a flow chart of a method for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing according to an embodiment of the present disclosure.
Fig. 3 shows a flowchart of a method for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing according to yet another embodiment of the present disclosure.
Fig. 4 illustrates a flowchart of a method for predicting survival rate after lung cancer surgery by measuring clinical data including gene mutation typing according to still another embodiment of the present disclosure.
Fig. 5 illustrates a block diagram of a system for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing according to another embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of features, numbers, steps, actions, components, parts, or combinations thereof disclosed in the present specification, and are not intended to preclude the possibility that one or more other features, numbers, steps, actions, components, parts, or combinations thereof are present or added.
It should be further noted that the embodiments and the features in the embodiments of the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
For early stage lung cancer, which includes stage I, stage II, and a subset of stage III disease, the standard treatment is radical resection. The seventh edition of TNM staging is the most widely used tumor staging system; it stratifies non-metastatic NSCLC patients according to tumor size and infiltration and the degree of lymph node involvement. However, TNM staging is not accurate enough, and the postoperative disease-free survival rates of patients in the same stage differ widely. Therefore, there is a need to predict the postoperative disease-free survival of patients with early stage lung cancer more accurately in other, more effective ways. In the prediction process, the selection of prediction parameters, coefficients and the like is very important for the accuracy of the result.
In order to solve the above problems, the present disclosure provides a method and a system for predicting survival rate after lung cancer surgery.
Fig. 1a shows an exemplary schematic diagram of an implementation scenario for grouping lung cancer patient data according to an embodiment of the present disclosure.
FIG. 1a specifically illustrates the process of grouping patient data by a method of predicting post-lung cancer survival by measuring clinical data including gene mutation typing.
It will be appreciated by those of ordinary skill in the art that fig. 1a illustrates an implementation scenario for grouping lung cancer patient data, and does not constitute a limitation of the present disclosure.
As shown in fig. 1a, for the acquired lung cancer patient data 101, step S101 performs 3:1 random grouping, resulting in modeling group data 102 and verification group data 103. Wherein the data volume ratio of the modeling group data 102 to the verification group data 103 is 3: 1. The modeling group data 102 is used to train the patient survival prediction model, and the validation group data 103 is used to validate the accuracy of the results of the patient survival prediction model.
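The 3:1 random grouping of step S101 can be sketched as follows. This is only a minimal illustration: the function name `split_modeling_validation`, the fixed seed, and the use of integer patient identifiers are assumptions made for the example and are not taken from the disclosure.

```python
import random


def split_modeling_validation(records, ratio=3, seed=0):
    """Randomly split records into a modeling group and a validation group (ratio:1)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)       # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_model = len(shuffled) * ratio // (ratio + 1)
    return shuffled[:n_model], shuffled[n_model:]


patients = list(range(500))        # stand-in patient identifiers (hypothetical)
modeling, validation = split_modeling_validation(patients)
print(len(modeling), len(validation))
```

With 500 records the modeling group receives 375 records and the validation group 125, and every record lands in exactly one group.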
In an embodiment of the present disclosure, lung cancer patient data 101 includes gene mutation typing, age, tumor size, lymph node metastasis, and surgical mode. Gene mutation typing includes all eight of the following types: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, KRAS mutation, RET fusion, BRAF mutation.
In the embodiment of the disclosure, the disease-free survival rate can be accurately predicted by using these eight gene mutation types. For example, EGFR plays an important role in the proliferation, growth, repair, and survival of tumor cells. EGFR mutations can be overexpressed in tumors of epithelial origin, such as non-small cell lung cancer. In addition, EGFR mutations are closely associated with neovascularization, tumor invasion and metastasis, resistance to chemotherapy, and prognosis. Tumors with high expression of mutant HER2 show stronger metastatic and infiltrative capacity, are less sensitive to chemotherapy, and are prone to relapse. The c-Met protein encoded by the MET gene is a tyrosine kinase receptor for hepatocyte growth factor (HGF); binding of HGF to c-Met activates downstream signaling pathways and promotes cell proliferation, growth, migration, and angiogenesis. When the MET gene is amplified, the related signaling pathways are continuously activated, so that lung cancer cells keep proliferating and metastasizing.
In an embodiment of the present disclosure, lung cancer patient data 101 also includes a pathology type.
In embodiments of the present disclosure, lung cancer patient data 101 may be obtained by measurement in a variety of ways, such as CT examination, chest puncture biopsy, gene testing kits, and the like.
In embodiments of the present disclosure, lung cancer patient data 101 may be stored in a database to facilitate extraction of lung cancer patient data 101 at any time and for comprehensive analysis.
In an embodiment of the present disclosure, for continuous data in the lung cancer patient data 101, such as age, an optimal critical point may be found by an optimal approximation method of the receiver operating characteristic (ROC) curve, and the data may be divided into categories based on the optimal critical point. Categorical data, such as tumor size, lymph node metastasis, surgical mode, pathology type, and adjuvant treatment plan, may all be treated as grouped data.
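One possible way to find such a critical point for a continuous variable is sketched below. The disclosure does not say which criterion its "optimal approximation" uses; this sketch assumes the cutoff that maximizes Youden's index (sensitivity + specificity - 1), a common choice on ROC curves. The function name, the sample ages, and the outcome labels are all hypothetical.

```python
def optimal_cutoff(values, events):
    """Return (cutoff, J): the cutoff on `values` that maximizes Youden's J.

    A patient is counted as predicted-positive when value >= cutoff.
    Assumes `events` contains at least one 1 and at least one 0.
    """
    positives = sum(events)
    negatives = len(events) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if v >= cutoff and e == 1)
        tn = sum(1 for v, e in zip(values, events) if v < cutoff and e == 0)
        j = tp / positives + tn / negatives - 1.0   # sensitivity + specificity - 1
        if j > best_j:                              # ties keep the lowest cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


ages = [45, 52, 58, 61, 63, 66, 70, 74]      # hypothetical ages
recurred = [0, 0, 0, 1, 0, 1, 1, 1]          # hypothetical outcome labels
cutoff, j = optimal_cutoff(ages, recurred)
age_group = [1 if a >= cutoff else 0 for a in ages]   # age converted to grouped data
```

For this toy data the selected cutoff is 61 with J = 0.75, so the continuous age variable becomes a two-level grouped variable.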
In embodiments of the disclosure, the inclusion criteria for lung cancer patients entering the statistical analysis are:
1, early stage lung cancer of TNM stage I-IIIA;
2, surgical treatment is the first choice, and no neoadjuvant chemotherapy or radiotherapy is performed before the operation;
3, the operation mode is radical resection of lung cancer plus lymph node dissection;
4, the postoperative follow-up time is at least 3 years.
Criteria for excluding lung cancer patients were:
1, any clinical information is missing;
2, other primary malignant tumors are present at the same time.
In embodiments of the present disclosure, the population of lung cancer patients may be more than 500.
One of ordinary skill in the art will appreciate that the number of lung cancer patients may also take other values above 500, such as 1000, and the present disclosure is not limited thereto.
Fig. 1b illustrates an exemplary schematic diagram of an implementation scenario of a patient survival prediction model according to an embodiment of the present disclosure.
FIG. 1b specifically illustrates the workflow of a patient survival prediction model in a method for predicting survival after lung cancer surgery by measuring clinical data including gene mutation typing.
It will be understood by those of ordinary skill in the art that fig. 1b illustrates an implementation scenario of a patient survival prediction model, and does not constitute a limitation of the present disclosure.
As shown in Fig. 1b, for the modeling group data 102, step S102 performs risk factor screening, for example LASSO (least absolute shrinkage and selection operator) analysis, to obtain risk factor data and overall survival data 104.
LASSO analysis adds a penalty term to the least-squares objective in order to shrink the estimated parameters; a parameter that shrinks below a threshold is set to 0. In this way the independent variables with a larger influence on the dependent variable are selected and the corresponding regression coefficients are calculated. LASSO analysis has significant advantages in processing sample data with multicollinearity. The objective function of LASSO analysis is
$$F_{\mathrm{LASSO}} = \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1$$
where y is the dependent variable, X is the matrix of independent variables, w is the vector of regression coefficients, F_LASSO is the loss function to be minimized, and λ is the penalty coefficient.
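The shrink-to-zero behaviour described above can be illustrated with a small coordinate-descent solver for the least-squares LASSO objective, treating w as the coefficient vector. This is a sketch under assumptions: the disclosure does not state which solver is used, and the function names and toy data are hypothetical. The second column of X has only a weak effect on y, so its coefficient is driven exactly to 0.

```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t; values within [-t, t] become exactly 0."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            z = 0.0     # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # Setting the subgradient of the objective to zero gives this update.
            w[j] = soft_threshold(rho, lam / 2.0) / z
    return w


# Toy data: y = 2 * x1 + 0.1 * x2, with orthogonal columns.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]
w = lasso_coordinate_descent(X, y, lam=1.0)
```

Here the first coefficient is shrunk from 2 to 1.875 and the second is set to 0, which is how the screening step discards weak candidate risk factors.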
In embodiments of the disclosure, the risk factor data comprises gene mutation typing. The gene mutation typing includes: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, KRAS mutation, RET fusion, BRAF mutation. The risk factor data further includes at least one of: age, tumor size, lymph node metastasis, surgical mode. For the same lung cancer patient, the risk factor data and the overall survival data correspond to each other.
The overall survival data is the time from random grouping to death from any cause. For subjects lost to follow-up before death, the last follow-up time can be used as the time of death.
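A minimal sketch of how such a survival time could be derived from dates is given below. The function name, the use of days as the unit, and the sample dates are assumptions. The returned event flag, which records whether death was actually observed, is an addition for illustration and is not described in the disclosure.

```python
from datetime import date


def survival_record(start, death=None, last_follow_up=None):
    """Return (survival time in days, death_observed) for one patient.

    If no death date is known, the last follow-up date is used as the end point.
    """
    end = death if death is not None else last_follow_up
    return (end - start).days, death is not None


# Hypothetical patients: one with an observed death, one lost to follow-up.
observed = survival_record(date(2015, 1, 1), death=date(2017, 1, 1))
lost = survival_record(date(2015, 1, 1), last_follow_up=date(2018, 1, 1))
```

The first record gives (731, True) and the second (1096, False).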
In embodiments of the present disclosure, a regression analysis step S103, such as multifactor Cox analysis, performs regression analysis on the risk factor data and overall survival data 104 to obtain post-regression analysis data, such as the postoperative disease-free survival rate. The multifactor Cox analysis calculates the postoperative disease-free survival rate in the following manner:
$$\ln\left[\frac{h(t,X)}{h_0(t)}\right] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{surgical modality} + \beta_5 \times \text{gene mutation typing}$$
(for the β values, see the table below), where ln denotes the natural logarithm, h0(t) denotes the baseline risk rate, and β1, β2, β3, β4, and β5 are coefficients.
In the embodiment of the present disclosure, in the above formula, the values of tumor size, lymph node metastasis, operation mode, age, and gene mutation typing may be all 1.
[Table image not reproduced: values of the coefficients β1 to β5]
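The right-hand side of the Cox formula above is a weighted sum of the risk factor values, and exponentiating it gives the risk rate relative to the baseline. The sketch below illustrates that arithmetic. The coefficient values are hypothetical placeholders, since the actual β values appear only in the table image, and the function names are likewise assumptions.

```python
import math

# Hypothetical coefficients for illustration only; they are NOT the values
# from the disclosure's table, which is not reproduced in this text.
EXAMPLE_BETAS = {
    "age": 0.30,
    "tumor_size": 0.45,
    "lymph_node_metastasis": 0.60,
    "surgical_modality": 0.20,
    "gene_mutation_typing": 0.25,
}


def log_relative_hazard(covariates, betas):
    """Return ln[h(t, X) / h0(t)] as the sum of beta_i * x_i."""
    return sum(betas[name] * covariates[name] for name in betas)


def relative_hazard(covariates, betas):
    """Return h(t, X) / h0(t), the risk rate relative to the baseline."""
    return math.exp(log_relative_hazard(covariates, betas))


# Every risk factor coded as 1, as in the example in the text.
all_ones = {name: 1 for name in EXAMPLE_BETAS}
lp = log_relative_hazard(all_ones, EXAMPLE_BETAS)
```

With every covariate equal to 0 the relative risk rate is 1, that is, the subject sits at the baseline.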
Based on the risk factor data and the postoperative disease-free survival rate, in step S104, a nomogram is established to predict a patient survival score and the corresponding probability. The nomogram is shown in Fig. 1d.
In an embodiment of the present disclosure, LASSO analysis S102 and multifactor Cox analysis S103 together comprise a patient survival prediction model 105.
Fig. 1c illustrates an exemplary schematic diagram of an implementation scenario of a validated patient survival prediction model according to an embodiment of the present disclosure.
FIG. 1c specifically illustrates a validation procedure in a method for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing.
It will be appreciated by those of ordinary skill in the art that fig. 1c illustrates an implementation scenario of a predictive model for validating patient survival, and does not constitute a limitation of the present disclosure.
In the embodiment of the present disclosure, based on the patient survival prediction model 105 and the validation group data 103, the area under the ROC curve, the sensitivity, and the specificity 106 are calculated by the artificial intelligence model in step S105, and the accuracy of the patient survival prediction model is assessed in step S106.
In the embodiment of the disclosure, the artificial intelligence model may use at least one of a Logistic Regression (LR) method, a Support Vector Machine (SVM) method, a Random Forest (RF) method, a Decision Tree (DT) method, a K-nearest neighbor (KNN) method, a Naive Bayes (NB) method, and an AdaBoost (Ada) method, with 10-fold cross-validation, to obtain the ROC curve and calculate the sensitivity and specificity. When the area under the ROC curve is greater than 0.65, the patient survival prediction model is considered to have good discrimination; when the sensitivity and the specificity are both greater than 0.5, the model is considered to have a good prediction effect. Taken together, when the area under the ROC curve is greater than 0.65 and both the sensitivity and the specificity are greater than 0.5, the patient survival prediction model is considered to have high accuracy.
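The three quantities and the acceptance thresholds described above can be illustrated with a short sketch. The function names, the 0.5 decision threshold in the example, and the toy labels and scores are assumptions for illustration; the classifier and the cross-validation loop themselves are omitted.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when score >= threshold predicts an event."""
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted_event = score >= threshold
        if label == 1:
            if predicted_event:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_event:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of (event, non-event)
    pairs in which the event case has the higher score; ties count one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def meets_accuracy_criteria(auc, sensitivity, specificity):
    """Apply the thresholds stated in the text: AUC > 0.65, both rates > 0.5."""
    return auc > 0.65 and sensitivity > 0.5 and specificity > 0.5


# Hypothetical validation labels (1 = event) and model scores.
demo_labels = [1, 1, 1, 0, 0, 0]
demo_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
demo_auc = roc_auc(demo_labels, demo_scores)
demo_sens, demo_spec = sensitivity_specificity(demo_labels, demo_scores, 0.5)
```

Applied to the figures reported for Fig. 1e (area 0.71, sensitivity 0.67, specificity 0.68), `meets_accuracy_criteria` returns `True`.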
Fig. 1d shows an exemplary schematic of a nomogram for predicting disease-free survival of a patient, according to an embodiment of the present disclosure.
FIG. 1d specifically shows a nomogram for predicting disease-free survival of a patient in a method for predicting survival after lung cancer surgery by measuring clinical data including gene mutation typing.
It will be appreciated by those of ordinary skill in the art that fig. 1d illustrates a nomogram for predicting disease-free survival of a patient, and does not constitute a limitation of the present disclosure.
To build the nomogram, a multifactor Cox regression model is constructed; each level of each influencing factor is assigned a score according to the factor's degree of influence on the outcome variable in the model (the magnitude of its regression coefficient); the scores are summed to give a total score; and the predicted probability of the outcome event for the individual is calculated from the functional relationship between the total score and the probability of the outcome event. Based on the nomogram shown in Fig. 1d, the number of points corresponding to each risk factor can be read off from the values of tumor size, lymph node metastasis, age, type of surgery, and gene mutation typing. The points for the risk factors are added to give the total points, from which the corresponding 1-year, 3-year, and 5-year disease-free survival rates are obtained.
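For a Cox model, the conversion from a linear score to an event-free probability takes the standard proportional-hazards form S(t | X) = S0(t) raised to the power exp(score). The sketch below illustrates that relationship only. The baseline survival values and the function name are hypothetical, and the disclosure does not state the baseline values underlying Fig. 1d.

```python
import math


def survival_probability(baseline_survival, linear_predictor):
    """Return S(t | X) = S0(t) ** exp(linear_predictor) under proportional hazards."""
    return baseline_survival ** math.exp(linear_predictor)


# Hypothetical baseline disease-free survival at 1, 3 and 5 years.
EXAMPLE_BASELINE = {1: 0.95, 3: 0.85, 5: 0.80}

# A subject whose linear predictor is 0.5 (hypothetical) has lower predicted
# survival than the baseline at every time point.
predicted = {year: survival_probability(s0, 0.5) for year, s0 in EXAMPLE_BASELINE.items()}
```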
In the embodiment of the present disclosure, as shown in the nomogram of Fig. 1d, the correspondence between the value of each optimized risk factor and its number of points is obtained through the Cox regression model. For example: when the tumor size is 1, the corresponding number of points is 0; when the tumor size is 2, the corresponding number of points is 33; when the tumor size is 3, the corresponding number of points is 66; when the tumor size is 4, the corresponding number of points is 100. When the gene mutation typing is Pure EGFR mutation/AE Function, the corresponding number of points is 0; when the gene mutation typing is other, the corresponding number of points is 24. With this correspondence between risk factor values and points, the area under the receiver operating characteristic curve 107 in Fig. 1e is greater than 0.65, and the corresponding sensitivity and specificity are both greater than 0.5. Specifically, the area under the receiver operating characteristic curve is 0.71, the sensitivity is 0.67, and the specificity is 0.68, so that accurate 1-year, 3-year, and 5-year disease-free survival rates are obtained.
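The point lookup for the two risk factors whose values are quoted above can be sketched as a pair of tables. Only the tumor size and gene mutation typing points come from the text; the dictionary keys and function name are assumptions, and the points for age, lymph node metastasis, and surgical modality are not given in the text, so the result is a partial total only.

```python
# Points quoted in the text for Fig. 1d.
TUMOR_SIZE_POINTS = {1: 0, 2: 33, 3: 66, 4: 100}
GENE_TYPING_POINTS = {"Pure EGFR mutation/AE Function": 0, "other": 24}


def partial_points(tumor_size, gene_typing):
    """Sum the nomogram points for the two factors whose points are quoted.

    The remaining risk factors would add further points that are not
    listed in the text, so this is not the full total score.
    """
    return TUMOR_SIZE_POINTS[tumor_size] + GENE_TYPING_POINTS[gene_typing]


example_points = partial_points(3, "other")
```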
Fig. 1e shows an exemplary schematic of a receiver operating characteristic (ROC) curve according to an embodiment of the present disclosure.
Fig. 1e specifically illustrates the ROC curve for a method of predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing.
It will be understood by those of ordinary skill in the art that Fig. 1e illustrates an ROC curve and does not constitute a limitation of the present disclosure.
As shown in Fig. 1e, the area under the ROC curve 107 is 0.71, greater than 0.65. The corresponding sensitivity is 0.67 and the specificity is 0.68, both greater than 0.5. Therefore, the patient survival prediction model has good discrimination and prediction effect, and high accuracy.
Fig. 2 illustrates a flow chart of a method for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing according to an embodiment of the present disclosure.
As shown in fig. 2, the method for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing comprises: steps S201, S202, S203, S204.
In step S201, post-lung cancer surgery clinical data is acquired.
In step S202, the clinical data after lung cancer surgery are classified and grouped to obtain modeling group clinical data and verification group clinical data.
In step S203, risk factor screening is performed on the modeling group clinical data to obtain risk factor data and overall survival data.
In step S204, regression analysis is performed on the risk factor data and the overall survival data to obtain post-regression analysis data.
Step S201 is a data acquisition step, step S202 is a preprocessing step, step S203 is a risk factor screening step, and step S204 is a regression analysis step.
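The grouping performed in the preprocessing step could be sketched as follows, assuming a seeded random allocation of patient records. The 70/30 proportion, the seed, and the function name are hypothetical; the text does not state how records are allocated between the modeling group and the verification group.

```python
import random


def split_groups(records, modeling_fraction=0.7, seed=0):
    """Randomly split records into (modeling_group, verification_group)."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(records)    # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * modeling_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical patient identifiers.
patient_ids = list(range(10))
modeling_group, verification_group = split_groups(patient_ids)
```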
The clinical data after lung cancer surgery include gene mutation typing, age, tumor size, lymph node metastasis, and surgical modality.
The regression analysis is calculated by the following formula:
$$\ln\left[\frac{h(t,X)}{h_0(t)}\right] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{surgical modality} + \beta_5 \times \text{gene mutation typing}$$
where h(t, X) is the post-regression analysis data, h0(t) is the reference risk rate, and β1, β2, β3, β4, and β5 are coefficients with the following values:
[Table image not reproduced: values of the coefficients β1 to β5]
According to an embodiment of the present disclosure, a data acquisition step acquires post-operative clinical data of lung cancer; a preprocessing step classifies and groups the post-operative clinical data to obtain modeling group clinical data and verification group clinical data; a risk factor screening step screens risk factors in the modeling group clinical data to obtain risk factor data and overall survival data; and a regression analysis step performs regression analysis on the risk factor data and the overall survival data to obtain post-regression analysis data. The post-operative clinical data include gene mutation typing, age, tumor size, lymph node metastasis, and surgical modality, and the regression analysis is calculated by the formula $\ln[h(t,X)/h_0(t)] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{surgical modality} + \beta_5 \times \text{gene mutation typing}$, where h(t, X) is the post-regression analysis data, h0(t) is the reference risk rate, and β1, β2, β3, β4, and β5 are coefficients with the following values:
[Table image not reproduced: values of the coefficients β1 to β5]
Therefore, the risk factors more relevant to the postoperative disease-free survival rate are screened out, the accuracy of the survival prediction model of the patient is improved, and the postoperative disease-free survival rate is accurately estimated.
According to an embodiment of the present disclosure, the risk factor screening step includes: screening risk factors in the modeling group clinical data by LASSO analysis to obtain risk factor data and overall survival data, thereby obtaining the risk factors more closely related to the postoperative disease-free survival rate and improving the estimation accuracy of that rate.
According to an embodiment of the present disclosure, the regression analysis step includes: performing regression analysis on the risk factor data and the overall survival data by multifactor Cox analysis to obtain post-regression analysis data, thereby obtaining accurate post-regression analysis data and improving both the accuracy of the patient survival prediction model and the estimation accuracy of the postoperative disease-free survival rate.
According to embodiments of the present disclosure, the post-operative clinical data of lung cancer further include a pathology type; and/or the post-regression analysis data include the postoperative disease-free survival rate; and/or the risk factor data include gene mutation typing and further include at least one of: age, tumor size, lymph node metastasis, and surgical modality. Selecting appropriate post-operative clinical data and risk factor data in this way improves the accuracy of the patient survival prediction model and the estimation accuracy of the postoperative disease-free survival rate.
According to embodiments of the present disclosure, the lung cancer comprises stage I-IIIA lung cancer; and/or the gene mutation typing comprises: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, Kras mutation, RET fusion, and Braf mutation. Selecting a reasonable applicable type of lung cancer in this way improves the accuracy of the patient survival prediction model and the estimation accuracy of the postoperative disease-free survival rate.
According to an embodiment of the present disclosure, the preprocessing step includes: for continuous data in the post-operative clinical data of lung cancer, obtaining an optimal critical point (cut-off) by an optimal approximation method based on the receiver operating characteristic curve, and grouping the classified post-operative clinical data using the optimal critical point to obtain the modeling group clinical data and the verification group clinical data. The data are thereby grouped reasonably, which improves the accuracy of the patient survival prediction model and the estimation accuracy of the postoperative disease-free survival rate.
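The text does not define the optimal approximation method. One common way to pick a cut-off from a receiver operating characteristic curve is to maximise Youden's J (sensitivity + specificity - 1), and the sketch below uses that criterion purely as an assumption for illustration. The function name and the toy ages and outcomes are hypothetical.

```python
def best_cutoff(values, labels):
    """Return (cutoff, J) maximising Youden's J = sensitivity + specificity - 1.

    A value >= cutoff is classified as positive. Assumes labels contain
    at least one 1 and at least one 0.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_j, best_c = -1.0, None
    for c in sorted(set(values)):
        tp = sum(1 for v, l in zip(values, labels) if l == 1 and v >= c)
        tn = sum(1 for v, l in zip(values, labels) if l == 0 and v < c)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_j, best_c = j, c
    return best_c, best_j


# Hypothetical continuous variable (age in years) and outcomes (1 = event).
ages = [45, 50, 55, 60, 65, 70]
events = [0, 0, 0, 1, 1, 1]
cutoff, youden_j = best_cutoff(ages, events)
```

The continuous variable can then be recoded as a category (below the cut-off, or at or above it) before the data are grouped.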
Fig. 3 shows a flowchart of a method for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing according to yet another embodiment of the present disclosure.
As shown in fig. 3, the method for predicting the survival rate after lung cancer surgery by measuring clinical data including gene mutation typing includes, in addition to the steps S201, S202, S203, S204 identical to fig. 2: step S301.
In step S301, the risk factor screening step and the regression analysis step are verified.
Step S301 is a verification step.
According to an embodiment of the present disclosure, the method further comprises a verification step of verifying the risk factor screening step and the regression analysis step, so as to verify the accuracy of the patient survival prediction model.
According to an embodiment of the present disclosure, the verification step includes: calculating the area under the receiver operating characteristic curve, the sensitivity, and the specificity by a machine learning method, based on the risk factor screening step, the regression analysis step, and the verification group clinical data; and judging the processing accuracy of the risk factor screening step and the regression analysis step according to the area under the receiver operating characteristic curve, the sensitivity, and the specificity, thereby verifying the accuracy of the patient survival prediction model.
According to an embodiment of the present disclosure, the machine learning method includes at least one of: a logistic regression method, a support vector machine method, a random forest method, a decision tree method, a k-nearest neighbor method, a naive Bayes method, and an AdaBoost method, so that the area under the receiver operating characteristic curve, the sensitivity, and the specificity can be accurately calculated and the patient survival prediction model can be accurately verified.
According to the embodiment of the disclosure, the processing of the risk factor screening step and the regression analysis step is judged to be accurate when the area under the receiver operating characteristic curve is greater than 0.65, the sensitivity is greater than 0.5, and the specificity is greater than 0.5, which provides a quantitative standard for the accuracy of the patient survival prediction model.
Fig. 4 illustrates a flowchart of a method for predicting survival rate after lung cancer surgery by measuring clinical data including gene mutation typing according to still another embodiment of the present disclosure.
As shown in fig. 4, the method for predicting the survival rate after lung cancer surgery by measuring clinical data including gene mutation typing includes a step S401 in addition to the steps S201, S202, S203, S204, S301 identical to fig. 3.
In step S401, the relationship between the risk factor data and the regression-analyzed data is graphically displayed.
According to the embodiment of the disclosure, the relationship between the risk factor data and the regression-analyzed data is graphically displayed through the displaying step, so that the relationship between the risk factor data and the regression-analyzed data, such as postoperative disease-free survival rate, is visually and vividly displayed, and the use convenience is improved.
According to an embodiment of the present disclosure, the displaying step includes: the nomogram is used for displaying the relationship between the risk factor data and the regression-analyzed data, so that the regression-analyzed data such as postoperative disease-free survival rate can be intuitively and conveniently calculated from the risk factor data.
Fig. 5 illustrates a block diagram of a system for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing according to an embodiment of the present disclosure.
As shown in Fig. 5, a system 500 for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing comprises: a data acquisition module 501, a preprocessing module 502, a risk factor screening module 503, and a regression analysis module 504.
In an embodiment of the present disclosure, the data acquisition module 501 is used to acquire post-operative clinical data of lung cancer; the preprocessing module 502 is used to classify and group the post-operative clinical data to obtain modeling group clinical data and verification group clinical data; the risk factor screening module 503 is used to screen risk factors in the modeling group clinical data to obtain risk factor data and overall survival data; and the regression analysis module 504 is used to perform regression analysis on the risk factor data and the overall survival data to obtain post-regression analysis data. The post-operative clinical data include gene mutation typing, age, tumor size, lymph node metastasis, and surgical modality, and the regression analysis is calculated by the formula $\ln[h(t,X)/h_0(t)] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{surgical modality} + \beta_5 \times \text{gene mutation typing}$, where h(t, X) is the post-regression analysis data, h0(t) is the reference risk rate, and β1, β2, β3, β4, and β5 are coefficients with the following values:
[Table image not reproduced: values of the coefficients β1 to β5]
According to an embodiment of the present disclosure, the data acquisition module acquires post-operative clinical data of lung cancer; the preprocessing module classifies and groups the post-operative clinical data to obtain modeling group clinical data and verification group clinical data; the risk factor screening module screens risk factors in the modeling group clinical data to obtain risk factor data and overall survival data; and the regression analysis module performs regression analysis on the risk factor data and the overall survival data to obtain post-regression analysis data. The post-operative clinical data include gene mutation typing, age, tumor size, lymph node metastasis, and surgical modality, and the regression analysis is calculated by the formula $\ln[h(t,X)/h_0(t)] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{surgical modality} + \beta_5 \times \text{gene mutation typing}$, where h(t, X) is the post-regression analysis data, h0(t) is the reference risk rate, and β1, β2, β3, β4, and β5 are coefficients with the following values:
[Table image not reproduced: values of the coefficients β1 to β5]
Therefore, the risk factors more relevant to the postoperative disease-free survival rate are screened out, the accuracy of the survival prediction model of the patient is improved, and the postoperative disease-free survival rate is accurately estimated.
According to an embodiment of the present disclosure, the risk factor screening module is configured to: screen risk factors in the modeling group clinical data by LASSO analysis to obtain the risk factor data and the overall survival data, thereby obtaining the risk factors more closely related to the postoperative disease-free survival rate and improving the estimation accuracy of that rate.
According to an embodiment of the present disclosure, the regression analysis module is configured to: perform regression analysis on the risk factor data and the overall survival data by multifactor Cox analysis to obtain post-regression analysis data, thereby obtaining accurate post-regression analysis data and improving both the accuracy of the patient survival prediction model and the estimation accuracy of the postoperative disease-free survival rate.
According to embodiments of the present disclosure, the post-operative clinical data of lung cancer further include a pathology type; and/or the post-regression analysis data include the postoperative disease-free survival rate; and/or the risk factor data include gene mutation typing and further include at least one of: age, tumor size, lymph node metastasis, and surgical modality. Selecting appropriate post-operative clinical data and risk factor data in this way improves the accuracy of the patient survival prediction model and the estimation accuracy of the postoperative disease-free survival rate.
According to embodiments of the present disclosure, the lung cancer comprises stage I-IIIA lung cancer; and/or the gene mutation typing comprises at least one of: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, Kras mutation, RET fusion, and Braf mutation. Selecting a reasonable applicable type of lung cancer in this way improves the accuracy of the patient survival prediction model and the estimation accuracy of the postoperative disease-free survival rate.
According to an embodiment of the present disclosure, the preprocessing module is configured to: for continuous data in the post-operative clinical data of lung cancer, obtain an optimal critical point (cut-off) by an optimal approximation method based on the receiver operating characteristic curve, and group the classified post-operative clinical data using the optimal critical point to obtain the modeling group clinical data and the verification group clinical data. The data are thereby grouped reasonably, which improves the accuracy of the patient survival prediction model and the estimation accuracy of the postoperative disease-free survival rate.
According to an embodiment of the present disclosure, the system for predicting post-operative survival rate of lung cancer by measuring clinical data including gene mutation typing may further include, in addition to the data acquisition module 501, the preprocessing module 502, the risk factor screening module 503, and the regression analysis module 504 in Fig. 5, a verification module.
The verification module is used for verifying the risk factor screening module and the regression analysis module.
According to an embodiment of the present disclosure, the system further comprises a verification module for verifying the risk factor screening module and the regression analysis module, so as to verify the accuracy of the patient survival prediction model.
According to an embodiment of the present disclosure, the verification module is configured to: calculate the area under the receiver operating characteristic curve, the sensitivity, and the specificity by a machine learning method, based on the risk factor screening module, the regression analysis module, and the verification group clinical data; and judge the processing accuracy of the risk factor screening module and the regression analysis module according to the area under the receiver operating characteristic curve, the sensitivity, and the specificity, so as to verify the accuracy of the patient survival prediction model.
According to an embodiment of the present disclosure, the machine learning method includes at least one of: a logistic regression method, a support vector machine method, a random forest method, a decision tree method, a k-nearest neighbor method, a naive Bayes method, and an AdaBoost method, so that the area under the receiver operating characteristic curve, the sensitivity, and the specificity can be accurately calculated and the patient survival prediction model can be accurately verified.
According to the embodiment of the disclosure, the processing of the risk factor screening module and the regression analysis module is judged to be accurate when the area under the receiver operating characteristic curve is greater than 0.65, the sensitivity is greater than 0.5, and the specificity is greater than 0.5, which provides a quantitative standard for the accuracy of the patient survival prediction model.
In the embodiment of the present disclosure, the system for predicting the post-operative survival rate of lung cancer by measuring clinical data including gene mutation typing may further include, in addition to the data acquisition module 501, the preprocessing module 502, the risk factor screening module 503, the regression analysis module 504, and the verification module, a display module.
The display module is used for displaying the relationship between the risk factor data and the post-regression analysis data in graphical form.
According to an embodiment of the present disclosure, the system further comprises a display module for displaying the relationship between the risk factor data and the post-regression analysis data in graphical form, so that the relationship between the risk factor data and the post-regression analysis data, such as the postoperative disease-free survival rate, is displayed visually and vividly, and ease of use is improved.
According to an embodiment of the present disclosure, the display module is used for: displaying the relationship between the risk factor data and the post-regression analysis data by using a nomogram, so that post-regression analysis data such as the postoperative disease-free survival rate can be intuitively and conveniently calculated from the risk factor data.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (24)

1. A method for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing comprising:
a data acquisition step, in which clinical data after lung cancer surgery are acquired;
a preprocessing step, namely classifying and grouping the clinical data after the lung cancer operation to obtain modeling group clinical data and verification group clinical data;
a risk factor screening step of screening risk factors in the modeling group clinical data to obtain risk factor data and overall survival data;
a regression analysis step of performing regression analysis on the risk factor data and the overall survival data to obtain post-regression analysis data,
wherein the post-operative clinical data of lung cancer comprise gene mutation typing, age, tumor size, lymph node metastasis, and surgical modality,
the regression analysis is calculated by the following formula
$$\ln\left[\frac{h(t,X)}{h_0(t)}\right] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{surgical modality} + \beta_5 \times \text{gene mutation typing},$$
h(t, X) is the post-regression analysis data, h0(t) is a reference risk rate, and β1, β2, β3, β4 and β5 are coefficients with values of
[Table image not reproduced: values of the coefficients β1 to β5]
2. The method of claim 1,
the risk factor screening step comprises: screening risk factors in the modeling group clinical data by LASSO analysis to obtain risk factor data and overall survival data.
3. The method of claim 1,
the regression analysis step comprises: performing regression analysis on the risk factor data and the overall survival data by multifactor Cox analysis to obtain post-regression analysis data.
4. The method of claim 1,
the post-operative clinical data for lung cancer further comprises: the type of pathology; and/or
The post-regression analysis data includes: disease-free survival rate after operation; and/or
the risk factor data includes gene mutation typing, and further includes at least one of: age, tumor size, lymph node metastasis, surgical modality.
5. The method of claim 1,
the lung cancer comprises: stage I-IIIA lung cancer; and/or
the gene mutation typing comprises: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, Kras mutation, RET fusion, Braf mutation.
6. The method of claim 1,
the preprocessing step comprises: for continuous data in the post-operative clinical data of lung cancer, obtaining an optimal critical point by an optimal approximation method based on the receiver operating characteristic curve, and grouping the classified post-operative clinical data of lung cancer using the optimal critical point to obtain the modeling group clinical data and the verification group clinical data.
7. The method of claim 1, further comprising:
and a verification step, which is used for verifying the risk factor screening step and the regression analysis step.
8. The method of claim 7, wherein the step of verifying comprises:
calculating the area under the line, the sensitivity and the specificity of the operating characteristic curve of the receiver by adopting a machine learning method based on the risk factor screening step, the regression analysis step and the clinical data of the verification group;
and judging the processing accuracy of the risk factor screening step and the regression analysis step according to the area, the sensitivity and the specificity under the operating characteristic curve of the receiver.
9. The method of claim 8, wherein the machine learning method comprises at least one of:
a logistic regression method, a support vector machine method, a random forest method, a decision tree method, a k-nearest neighbor method, a naive Bayes method, an AdaboDFSt method.
10. The method of claim 9,
and judging the processing accuracy of the risk factor screening step and the regression analysis step under the conditions that the area under the receiver operating characteristic curve is more than 0.65, the sensitivity is more than 0.5 and the specificity is more than 0.5.
11. The method of claim 1, further comprising:
a display step of graphically displaying the relationship between the risk factor data and the regression-analyzed data.
12. The method of claim 11,
the display step comprises: displaying the relationship between the risk factor data and the regression-analyzed data by using a nomogram.
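A nomogram usually maps each risk factor's contribution to a point scale. The sketch below shows one conventional scaling, in which the factor with the widest contribution range spans 0 to 100 points. The claims do not specify the scaling, and the coefficients and value ranges here are hypothetical.

```python
def make_nomogram_scale(coefs, ranges):
    """Build a function that converts a factor value into nomogram points.

    coefs:  factor name -> regression coefficient (hypothetical values below)
    ranges: factor name -> (lowest value, highest value) shown on the axis
    """
    # Contribution span of each factor on the linear-predictor scale.
    spans = {name: abs(coefs[name]) * (hi - lo) for name, (lo, hi) in ranges.items()}
    max_span = max(spans.values())

    def points(name, value):
        lo, hi = ranges[name]
        beta = coefs[name]
        # Zero points are assigned to the lowest-risk end of the axis.
        reference = lo if beta >= 0 else hi
        return 100.0 * beta * (value - reference) / max_span

    return points


# Hypothetical coefficients and axis ranges, for illustration only.
coefs = {"age": 0.02, "tumor_size": 0.3, "lymph_node_metastasis": 0.6}
ranges = {"age": (40, 80), "tumor_size": (0, 5), "lymph_node_metastasis": (0, 1)}

points = make_nomogram_scale(coefs, ranges)
total_points = (
    points("age", 60)
    + points("tumor_size", 2.5)
    + points("lymph_node_metastasis", 1)
)
```

The total point score is then read against a survival-probability axis, which the regression model supplies.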
13. A system for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing, comprising:
a data acquisition module for acquiring lung cancer postoperative clinical data;
a preprocessing module for classifying and grouping the lung cancer postoperative clinical data to obtain modeling group clinical data and verification group clinical data;
a risk factor screening module for screening risk factors from the modeling group clinical data to obtain risk factor data and total life cycle data;
a regression analysis module for performing regression analysis on the risk factor data and the total life cycle data to obtain regression-analyzed data,
wherein the lung cancer postoperative clinical data comprises gene mutation typing, age, tumor size, lymph node metastasis and operation mode,
the regression analysis is calculated by the following formula:
ln[h(t, X)/h0(t)] = β1 × age + β2 × tumor size + β3 × lymph node metastasis + β4 × operation mode + β5 × gene mutation typing,
where h(t, X) is the regression-analyzed data, h0(t) is the baseline hazard rate, and β1, β2, β3, β4 and β5 are coefficients whose values are given in the following figure:
[Figure: values of the coefficients β1 to β5]
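The regression formula of claim 13 can be evaluated as in the sketch below. Two assumptions apply: β4 is taken to multiply the operation mode, and every coefficient value, covariate coding and baseline survival figure is hypothetical, because the patent's coefficient values appear only in a figure.

```python
import math


def log_relative_hazard(patient, betas):
    """ln[h(t, X) / h0(t)]: the weighted sum of the coded risk factors."""
    return sum(betas[name] * patient[name] for name in betas)


def survival_probability(baseline_survival, linear_predictor):
    """Cox model relation S(t | X) = S0(t) ** exp(linear predictor)."""
    return baseline_survival ** math.exp(linear_predictor)


# Hypothetical coefficients (NOT the patent's values).
betas = {
    "age": 0.5,
    "tumor_size": 0.4,
    "lymph_node_metastasis": 0.8,
    "operation_mode": 0.3,
    "gene_mutation_type": 0.2,
}

# Hypothetical patient, each factor coded 0/1 after preprocessing.
patient = {
    "age": 1,
    "tumor_size": 1,
    "lymph_node_metastasis": 0,
    "operation_mode": 1,
    "gene_mutation_type": 1,
}

lp = log_relative_hazard(patient, betas)
hazard_ratio = math.exp(lp)  # h(t, X) / h0(t)
# Hypothetical baseline survival S0(t) = 0.9 at some time t.
survival = survival_probability(0.9, lp)
```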
14. The system of claim 13,
the risk factor screening module is used for: screening risk factors from the modeling group clinical data by using a lasso analysis method to obtain risk factor data and total life cycle data.
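How a lasso penalty screens factors can be shown with a simplified sketch. It applies coordinate descent to a squared-error loss rather than to the Cox partial likelihood that a survival analysis would use, so it illustrates the shrink-to-zero selection mechanism only and is not the patented procedure. The data and penalty value are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Hypothetical data: the outcome depends on the first factor only.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]

beta = lasso_coordinate_descent(X, y, lam=0.5)
# Factors whose coefficient survives the penalty are kept as risk factors.
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

The uninformative second factor is shrunk exactly to zero, which is the property that makes the lasso usable for screening.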
15. The system of claim 13,
the regression analysis module is used for: performing regression analysis on the risk factor data and the total life cycle data by using a multi-factor Cox analysis method to obtain regression-analyzed data.
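A multi-factor Cox analysis estimates its coefficients by maximizing the partial log-likelihood. The sketch below evaluates that objective with the Breslow convention for the risk set and leaves the optimizer out. The function name and the four-patient data set are hypothetical.

```python
import math


def cox_partial_log_likelihood(beta, X, times, events):
    """Cox partial log-likelihood (Breslow form).

    beta:   list of coefficients, one per risk factor
    X:      list of covariate rows, one per patient
    times:  follow-up time per patient
    events: 1 if the event was observed, 0 if censored
    """
    linear = [sum(b * x for b, x in zip(beta, row)) for row in X]
    total = 0.0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at this event time.
        risk = sum(
            math.exp(linear[j]) for j in range(len(times)) if times[j] >= times[i]
        )
        total += linear[i] - math.log(risk)
    return total


# Hypothetical data: one binary risk factor, four patients.
X = [[1], [0], [1], [0]]
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]

ll_zero = cox_partial_log_likelihood([0.0], X, times, events)
ll_half = cox_partial_log_likelihood([0.5], X, times, events)
```

In this toy data the patient carrying the factor has the earliest event, so a positive coefficient gives a higher partial likelihood than zero.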
16. The system of claim 13,
the lung cancer postoperative clinical data further comprises: the pathological type; and/or
the regression-analyzed data includes: postoperative disease-free survival rate; and/or
the risk factor data includes gene mutation typing, and further includes at least one of: age, tumor size, lymph node metastasis, operation mode.
17. The system of claim 13,
the lung cancer comprises: stage I-IIIA lung cancer; and/or
the gene mutation typing comprises: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, KRAS mutation, RET fusion, BRAF mutation.
18. The system of claim 13,
the preprocessing module is used for: for continuous data in the lung cancer postoperative clinical data, obtaining an optimal cutoff point by the optimal approximation method of the receiver operating characteristic curve, and grouping the classified lung cancer postoperative clinical data by the optimal cutoff point to obtain the modeling group clinical data and the verification group clinical data.
19. The system of claim 13, further comprising:
a verification module for verifying the risk factor screening module and the regression analysis module.
20. The system of claim 19, wherein the verification module is configured to:
calculating, by a machine learning method, the area under the receiver operating characteristic curve, the sensitivity and the specificity, based on the risk factor screening module, the regression analysis module and the verification group clinical data;
and judging the processing accuracy of the risk factor screening module and the regression analysis module according to the area under the receiver operating characteristic curve, the sensitivity and the specificity.
21. The system of claim 20, wherein the machine learning method comprises at least one of:
a logistic regression method, a support vector machine method, a random forest method, a decision tree method, a k-nearest neighbor method, a naive Bayes method, an AdaBoost method.
22. The system of claim 21,
the processing of the risk factor screening module and the regression analysis module is judged to be accurate when the area under the receiver operating characteristic curve is greater than 0.65, the sensitivity is greater than 0.5 and the specificity is greater than 0.5.
23. The system of claim 13, further comprising:
a display module for graphically displaying the relationship between the risk factor data and the regression-analyzed data.
24. The system of claim 23,
the display module is used for: displaying the relationship between the risk factor data and the regression-analyzed data by using a nomogram.
CN202111071269.5A 2021-09-13 2021-09-13 Method for constructing survival rate prediction model after lung cancer surgery and prediction model system Active CN113517073B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111071269.5A CN113517073B (en) 2021-09-13 2021-09-13 Method for constructing survival rate prediction model after lung cancer surgery and prediction model system

Publications (2)

Publication Number Publication Date
CN113517073A (en) 2021-10-19
CN113517073B (en) 2022-04-12

Family

ID=78063081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111071269.5A Active CN113517073B (en) 2021-09-13 2021-09-13 Method for constructing survival rate prediction model after lung cancer surgery and prediction model system

Country Status (1)

Country Link
CN (1) CN113517073B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111892A (en) * 2019-04-29 2019-08-09 杭州电子科技大学 A kind of postoperative short-term relapse and metastasis risk evaluating system of NSCLC patient
CN111640509A (en) * 2020-06-02 2020-09-08 山东大学齐鲁医院 Cervical cancer postoperative recurrence risk prediction method and system
CN111640518A (en) * 2020-06-02 2020-09-08 山东大学齐鲁医院 Cervical cancer postoperative survival prediction method, system, equipment and medium
CN112582028A (en) * 2020-12-30 2021-03-30 华南理工大学 Lung cancer prognosis prediction model, construction method and device
CN113174439A (en) * 2021-03-30 2021-07-27 中国医学科学院肿瘤医院 Application of immune gene pair-based scoring system in predicting immunotherapy effect of non-small cell lung cancer patient
CN113327679A (en) * 2021-05-27 2021-08-31 上海市闵行区中心医院 Pulmonary embolism clinical risk and prognosis scoring method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
吴琪燕等: "肺腺癌预后因素及血清肿瘤标记物的诊断效能", 《昆明医科大学学报》 *
涂文婷: ""非小细胞肺癌的EGFR基因突变状态预测及预后评估的放射组学研究"", 《中国优秀硕士学位论文全文数据库 医药卫生科技辑》 *
赵方超等: "非小细胞肺癌术后复发转移的风险模型构建及预测能力的验证", 《肿瘤防治研究》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114093512A (en) * 2021-10-21 2022-02-25 杭州电子科技大学 Survival prediction method based on multi-mode data and deep learning model
CN114093512B (en) * 2021-10-21 2023-04-18 杭州电子科技大学 Survival prediction method based on multi-mode data and deep learning model
CN114582517A (en) * 2022-03-04 2022-06-03 四川大学 Construction method and application of periodontitis early-stage prejudgment scoring table
CN114974598A (en) * 2022-06-29 2022-08-30 山东大学 Lung cancer prognosis prediction model construction method and lung cancer prognosis prediction system
CN114974598B (en) * 2022-06-29 2024-04-16 山东大学 Method for constructing lung cancer prognosis prediction model and lung cancer prognosis prediction system

Also Published As

Publication number Publication date
CN113517073B (en) 2022-04-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant