CN114639482A - IDPC and LASSO-based esophageal squamous carcinoma prognosis survival risk assessment method - Google Patents
- Publication number
- CN114639482A (application CN202210276812.3A)
- Authority
- CN
- China
- Prior art keywords
- esophageal squamous
- lasso
- idpc
- patients
- survival
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention provides an IDPC and LASSO-based esophageal squamous carcinoma prognosis survival risk assessment method, which comprises the following steps: first, pathological data of esophageal squamous carcinoma patients are acquired, a decision tree is constructed from the important pathological factors determined by the chi-square test and information gain, and the patients are divided into an early group and a middle-late group; second, the preoperative routine blood and biochemical indices of the esophageal squamous carcinoma patients in the early group and the middle-late group are acquired respectively, and the indices significantly related to postoperative survival risk are selected using LASSO; then, IDPC is used to cluster the patients of the early group and of the middle-late group into different clusters, and for each cluster a nomogram based on logistic regression (LR) is constructed to predict the survival risk of the patients; finally, the performance of the nomograms is evaluated using the confusion matrix and the area under the receiver operating characteristic curve (AUC). The invention can accurately judge the prognostic survival risk of esophageal squamous carcinoma patients and can help doctors make diagnostic decisions so as to provide effective treatment for patients.
Description
Technical Field
The invention relates to the technical field of esophageal squamous cell carcinoma risk assessment, in particular to an IDPC and LASSO based esophageal squamous cell carcinoma prognosis survival risk assessment method.
Background
The TNM staging system proposed by the American Joint Committee on Cancer (AJCC) has been widely applied to the prognosis prediction of patients with esophageal squamous carcinoma. However, the pathogenesis of esophageal squamous carcinoma is complex, and assessing the survival risk of patients with the TNM staging system alone has limitations. Endoscopic examination can also be used to determine the survival risk of patients with esophageal squamous carcinoma, but it is expensive for the patient. Classifying survival risk based on clinicopathological and routine blood examination information remains a challenge for computer-aided systems. In recent years, a number of machine learning methods, such as neural networks, support vector machines and random forests, have been used to predict the prognostic survival of patients with esophageal squamous carcinoma. However, it is difficult for users to inspect the internal structure of the nonlinear models created by these methods, and the importance of individual indices cannot be determined. Meanwhile, feature extraction and clustering of biological information remain difficult problems for researchers at home and abroad. The medical field therefore needs a method that can conveniently and intuitively identify the index factors influencing the prognostic survival risk of esophageal squamous carcinoma and accurately judge the prognostic risk.
Disclosure of Invention
In view of the defects described in the background, the invention provides an IDPC and LASSO-based esophageal squamous carcinoma prognosis survival risk assessment method, which solves the technical problems of unclear internal structure, incomplete screening of index variables and low predictive ability in existing prediction models.
The technical scheme of the invention is realized as follows:
An IDPC and LASSO-based esophageal squamous carcinoma prognosis survival risk assessment method comprises the following steps:
Step one: acquiring pathological data of esophageal squamous carcinoma patients;
Step two: using the pathological data of the esophageal squamous carcinoma patients, constructing a decision tree from the important pathological factors determined by the chi-square test and information gain, and dividing the patients into an early group and a middle-late group;
Step three: respectively obtaining the preoperative routine blood and biochemical indices of the esophageal squamous carcinoma patients in the early group and the middle-late group, and selecting the indices significantly related to postoperative survival risk using the least absolute shrinkage and selection operator (LASSO);
Step four: respectively clustering the early-group and middle-late-group esophageal squamous carcinoma patients into different clusters using an improved density peak clustering algorithm based on cosine distance and K nearest neighbors (IDPC);
Step five: for each cluster, constructing a nomogram based on a logistic regression model to predict the survival risk of the esophageal squamous carcinoma patients;
Step six: evaluating the performance of the nomograms of step five using the confusion matrix and the area under the receiver operating characteristic curve (AUC).
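The evaluation in step six can be illustrated with a minimal sketch. The labels, scores and the 0.5 decision threshold below are hypothetical; the AUC is computed with the pairwise-ranking definition (the fraction of positive/negative pairs ranked correctly, ties counted as one half), which is one standard way to obtain it and not necessarily the procedure used in the embodiment.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = high risk) and predicted probabilities.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

cm = confusion_matrix(y_true, y_pred)
area = auc(y_true, scores)
print("tp, fp, fn, tn =", cm)
print("AUC =", round(area, 4))
```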
Preferably, the pathological data of the esophageal squamous carcinoma patient comprise sex, age, tumor size, differentiation degree, infiltration degree and lymph node metastasis.
Preferably, the chi-square test is computed as follows:
$$\chi^2=\sum_{i=1}^{m_i}\sum_{j=1}^{m_j}\frac{(A_{ij}-T_{ij})^2}{T_{ij}}$$
wherein $m_i$ and $m_j$ respectively represent the number of variables and the number of samples, $i$ represents the value of the variable, $j$ represents the value of the esophageal squamous carcinoma patient sample, $A_{ij}$ represents the actual value of a sample whose variable value is $i$ and which belongs to the $j$-th esophageal squamous carcinoma patient, and $T_{ij}$ represents the expected value of a sample whose variable value is $i$ and which belongs to the $j$-th esophageal squamous carcinoma patient, wherein $T_{ij}$ is defined as follows:
Preferably, the information gain is calculated as follows:
$$g_r=\frac{\Delta H}{\mathrm{InfoBefore}(H)}$$
wherein $g_r$ represents the information gain rate, $\Delta H$ is the information gain of the attribute, and $\mathrm{InfoBefore}(H)$ is the information entropy before attribute classification; the information gain $\Delta H$ is calculated as:
$$\Delta H=\mathrm{InfoBefore}(H)-\mathrm{InfoAfter}(H)$$
wherein $\mathrm{InfoAfter}(H)$ is the information entropy after attribute classification;
the information entropy before attribute classification, $\mathrm{InfoBefore}(H)$, and the information entropy after attribute classification, $\mathrm{InfoAfter}(H)$, are respectively calculated as follows:
wherein $P(x_1)$ is the probability that event $x_1$ occurs, $P(x_2)$ is the probability that event $x_2$ occurs, and […] is the probability that event […] occurs.
Preferably, the routine blood and biochemical indices of the esophageal squamous carcinoma patient include white blood cell count (WBCC), lymphocyte count (LYC), monocyte count (MOC), neutrophil count (NEC), eosinophil count (EOS), basophil count (BAC), erythrocyte count (ERY), hemoglobin (HGB), platelet count (THC), total protein (TP), albumin (ALB), globulin (GLO), prothrombin time (PT), activated partial thromboplastin time (APTT), thrombin time (TT) and fibrinogen (FIB).
Preferably, the indices significantly related to postoperative survival risk are selected using the least absolute shrinkage and selection operator as follows:
$$\hat{\beta}=\arg\min_{\beta}\left\{\lVert Y-X\beta\rVert_2^2+\lambda\lVert\beta\rVert_1\right\}$$
wherein $Y$ is an $n\times 1$ vector representing the actual values corresponding to the samples $X$, $X$ is an $n\times p$ matrix representing the input samples of the LASSO regression, $\beta=(\beta_1,\beta_2,\ldots,\beta_p)^T$ is the $p\times 1$ vector of regression coefficients, $\lambda\lVert\beta\rVert_1$ is the penalty term, and $\lambda>0$ is a tuning parameter that balances the penalty term against the empirical risk.
Preferably, in step four, the minimum distance $\delta_i$ between a data point and the cluster center in the improved density peak clustering algorithm based on cosine distance and K nearest neighbors is calculated as follows:
wherein $\rho_{i'}$ is the local density, $d_{i'j''}$ is the cosine distance between $x_{i'}$ and $x_{j''}$, $x_{i'}$ denotes the $i'$-th patient sample, $x_{j''}$ denotes the $j''$-th patient sample, and $N$ is the number of patient samples;
the cosine distance $d_{i'j''}$ between $x_{i'}$ and $x_{j''}$ is calculated as follows:
wherein $x_{i'a}$ represents the value of feature $a$ in sample $x_{i'}$, $x_{j''a}$ represents the value of feature $a$ in sample $x_{j''}$, and $L$ is the number of features;
the local density $\rho_{i'}$ is calculated as follows:
wherein $\mathrm{kNN}(x_{i'})$ is the K-nearest-neighbor set of $x_{i'}$;
the K-nearest-neighbor set $\mathrm{kNN}(x_{i'})$ of $x_{i'}$ is calculated as follows:
$$\mathrm{kNN}(x_{i'})=\{x_{j''}\in X\mid d(x_{i'},x_{j''})\le d(x_{i'},\mathrm{NN}_k(x_{i'}))\}$$
wherein $d(x_{i'},x_{j''})$ is the cosine distance between $x_{i'}$ and $x_{j''}$, and $\mathrm{NN}_k(x_{i'})$ is the $k$-th nearest neighbor of $x_{i'}$.
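The quantities named above (cosine distance, K-nearest-neighbor set, local density and minimum distance) can be sketched as follows. Several choices here are assumptions rather than the patent's formulas: the cosine distance is taken as one minus the cosine similarity, the local density is taken as the sum of `exp(-d)` over the K nearest neighbors, the neighbor set excludes the point itself and keeps exactly `k` members, and the minimum distance follows the usual density peak clustering rule (distance to the nearest point of higher density, or the largest distance for the densest point). The toy samples are hypothetical.

```python
import math


def cosine_distance(u, v):
    """ASSUMPTION: cosine distance = 1 - cosine similarity (vectors must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def knn_indices(dist, i, k):
    """Indices of the k samples closest to sample i (the sample itself is excluded)."""
    others = [j for j in range(len(dist)) if j != i]
    return sorted(others, key=lambda j: dist[i][j])[:k]


def idpc_quantities(samples, k):
    """Return (rho, delta): local densities and minimum distances for a decision graph."""
    n = len(samples)
    dist = [[cosine_distance(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    # ASSUMPTION: density is the sum of exp(-distance) over the k nearest neighbours.
    rho = [sum(math.exp(-dist[i][j]) for j in knn_indices(dist, i, k)) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # The densest point has no denser neighbour, so it takes its largest distance.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical two-feature samples forming two directional groups.
samples = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
rho, delta = idpc_quantities(samples, k=2)
for idx, (r, d) in enumerate(zip(rho, delta)):
    print(idx, round(r, 4), round(d, 4))
```

Points with both a large density and a large minimum distance are the candidates for cluster centers; in this toy data they are the middle sample of each group.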
Preferably, the logistic regression model is:
$$p(y'=1\mid x')=\frac{e^{\alpha_0+\alpha_1 x'}}{1+e^{\alpha_0+\alpha_1 x'}},\qquad p(y'=0\mid x')=\frac{1}{1+e^{\alpha_0+\alpha_1 x'}}$$
wherein $p(y'=1\mid x')$ is the probability that the input variable $x'$ belongs to the positive class, $p(y'=0\mid x')$ is the probability that the input variable $x'$ belongs to the negative class, $\alpha_0$ is a constant, and $\alpha_1$ is the regression coefficient of the input variable $x'$.
Preferably, the parameter $\alpha_0$ and the regression coefficient $\alpha_1$ in the logistic regression model are calculated as follows:
wherein $L(\alpha_0,\alpha_1)$ represents the likelihood function used to estimate $\alpha_0$ and $\alpha_1$, $t'=1,2,\ldots,n$, $n$ represents the number of samples, and $y_{t'}$ represents the predicted value for the input variable $x_{t'}$.
Preferably, the nomogram based on the logistic regression model is constructed as follows: a score is assigned to each value level of each influencing factor according to the size of its regression coefficient in the logistic regression model, and the scores are then added to obtain a total score; the 3-year survival probability and the 5-year survival probability are read off at the position of the total score on the total-score scale.
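The scoring idea can be sketched as follows, under stated assumptions: the intercept, coefficients and value ranges are hypothetical (they are not taken from the embodiment), the factor with the widest effect on the linear predictor is scaled to 100 points, and the total score is mapped back to a probability through the logistic function. The sketch returns the probability of the positive class; how that class relates to 3-year or 5-year survival depends on how the outcome was coded.

```python
import math

# Hypothetical model: intercept plus (coefficient, minimum value, maximum value) per factor.
INTERCEPT = -1.0
FACTORS = {"ALB": (-0.08, 30.0, 50.0), "PT": (0.40, 9.0, 14.0)}


def points_per_unit(factors):
    """Scale so that the factor with the widest effect spans exactly 100 points."""
    widest = max(abs(coef) * (hi - lo) for coef, lo, hi in factors.values())
    return 100.0 / widest


def reference_value(coef, lo, hi):
    """End of the range that contributes zero points (the lowest-risk end)."""
    return lo if coef >= 0 else hi


def factor_points(name, value, factors, scale):
    coef, lo, hi = factors[name]
    return abs(coef) * abs(value - reference_value(coef, lo, hi)) * scale


def probability_from_total(total, factors, scale, intercept):
    """Map a total score back to the logistic-regression probability."""
    baseline = intercept + sum(c * reference_value(c, lo, hi) for c, lo, hi in factors.values())
    linear_predictor = baseline + total / scale
    return 1.0 / (1.0 + math.exp(-linear_predictor))


scale = points_per_unit(FACTORS)
patient = {"ALB": 40.0, "PT": 12.0}  # hypothetical patient values
total_points = sum(factor_points(k, v, FACTORS, scale) for k, v in patient.items())
prob = probability_from_total(total_points, FACTORS, scale, INTERCEPT)

# Cross-check against evaluating the logistic model directly.
direct_lp = INTERCEPT + sum(FACTORS[k][0] * v for k, v in patient.items())
direct_prob = 1.0 / (1.0 + math.exp(-direct_lp))
print(round(total_points, 1), round(prob, 4), round(direct_prob, 4))
```

Because the points are a linear rescaling of each factor's contribution, reading the probability from the total score gives the same value as evaluating the model directly, which is what makes the nomogram a faithful graphical form of the regression.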
Compared with the prior art, the invention has the following beneficial effects:
1) The pathological indices significantly related to the survival risk of esophageal squamous carcinoma are screened out by the chi-square test and information entropy, and a decision tree is constructed, so that patients with esophageal squamous carcinoma are effectively divided into an early group and a middle-late group.
2) The early group and the middle-late group are each divided into different clusters by combining LASSO regression analysis with IDPC, which provides a basis for constructing a high-accuracy survival risk prediction model for esophageal squamous carcinoma patients.
3) For the different patient clusters, a nomogram model based on logistic regression is constructed, providing users with an accurate, intuitive and easy-to-use prognosis survival risk assessment system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a decision tree that considers the degree of lymph node metastasis and infiltration;
FIG. 3 shows the selection of the tuning parameter λ in LASSO by the minimum criterion, wherein (a) is the LASSO regression analysis for patients with early esophageal squamous carcinoma and (b) is the LASSO regression analysis for patients with middle and late esophageal squamous carcinoma;
FIG. 4 is a decision graph of DPC algorithm for patients with early esophageal squamous carcinoma, (a) is a decision graph of DPC-LASSO, and (b) is an IDPC-LASSO decision graph with cosine distance and KNN;
FIG. 5 is a decision map of the DPC algorithm for patients with middle and advanced esophageal cancer, (a) is a decision map of DPC-LASSO, (b) is an IDPC-LASSO decision map with cosine distance and KNN;
FIG. 6 is a nomogram model of patients with esophageal squamous carcinoma, wherein (a) is a nomogram for predicting 5-year survival probability of patients with early esophageal squamous carcinoma cluster 1, (b) is a nomogram for predicting 5-year survival probability of patients with early esophageal squamous carcinoma cluster 2, (c) is a nomogram for predicting 3-year survival probability of patients with intermediate and late esophageal squamous carcinoma cluster 1, and (d) is a nomogram for predicting 3-year survival probability of patients with intermediate and late esophageal squamous carcinoma cluster 2;
FIG. 7 shows the results of model comparisons of different clustering algorithms, where (a) is the ROC curve based on the survival risk model of early esophageal squamous carcinoma patients, and (b) is the ROC curve of the survival risk model of middle and late esophageal squamous carcinoma patients;
FIG. 8 is a graph of the results of different model tests based on patients with early esophageal squamous carcinoma;
FIG. 9 is a graph of the results of different model tests based on patients with middle and advanced esophageal squamous carcinoma.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides an IDPC and LASSO-based esophageal squamous cell carcinoma prognosis survival risk assessment method, which includes the following specific steps:
Step one: patients who do not meet the inclusion criteria are removed, and pathological data of the esophageal squamous carcinoma patients are acquired; the data set of this embodiment includes 418 samples in total. According to the Union for International Cancer Control (UICC) standard, the following samples were excluded: (a) patients who also had other malignancies; (b) patients whose surgery was not successfully performed; (c) patients who died from heart disease, lung cancer, liver cancer or acute infection; (d) patients with incomplete follow-up data and unknown prognosis. The acquired pathological data of the esophageal squamous carcinoma patients include sex, age, tumor size, differentiation degree, infiltration degree and lymph node metastasis.
Step two: using pathological data of esophageal squamous carcinoma patients, constructing a decision tree by using a chi-square test method and important pathological factors determined by information gain, and dividing the patients into an early group and a middle-late group;
The chi-square test is computed as follows:
$$\chi^2=\sum_{i=1}^{m_i}\sum_{j=1}^{m_j}\frac{(A_{ij}-T_{ij})^2}{T_{ij}}$$
wherein $m_i$ and $m_j$ respectively represent the number of variables and the number of samples, $i$ represents the value of the variable, $j$ represents the value of the esophageal squamous carcinoma patient sample, $A_{ij}$ represents the actual value of a sample whose variable value is $i$ and which belongs to the $j$-th esophageal squamous carcinoma patient, and $T_{ij}$ represents the expected value of a sample whose variable value is $i$ and which belongs to the $j$-th esophageal squamous carcinoma patient, wherein $T_{ij}$ is defined as follows:
The information gain is calculated as follows:
$$g_r=\frac{\Delta H}{\mathrm{InfoBefore}(H)}$$
wherein $g_r$ represents the information gain rate, $\Delta H$ is the information gain of the attribute, and $\mathrm{InfoBefore}(H)$ is the information entropy before attribute classification;
the information gain $\Delta H$ is calculated as:
$$\Delta H=\mathrm{InfoBefore}(H)-\mathrm{InfoAfter}(H)$$
wherein $\mathrm{InfoAfter}(H)$ is the information entropy after attribute classification;
the information entropy before attribute classification, $\mathrm{InfoBefore}(H)$, and the information entropy after attribute classification, $\mathrm{InfoAfter}(H)$, are respectively calculated as follows:
wherein $P(x_1)$ is the probability that event $x_1$ occurs, $P(x_2)$ is the probability that event $x_2$ occurs, and […] is the probability that event […] occurs.
Clinical records were collected for 418 patients with esophageal squamous carcinoma, of whom 115 (27.5%) were in the early stage and 303 (72.5%) were in the middle-late stage. These diagnostic records are saved in a text format. 260 (62.2%) patients were male and 158 (37.8%) were female. The tumor was located in the upper thoracic region in 79 cases (18.9%), in the mid-thoracic region in 279 cases (66.7%), and in the lower thoracic region in 60 cases (14.4%). There were 26 cases (6.2%) of highly differentiated tumors, 224 cases (53.6%) of moderately differentiated tumors, and 168 cases (40.2%) of poorly differentiated tumors. The tumor had infiltrated the mucosal layer in 14 (3.3%) patients, the submucosa in 34 (8.1%) patients, the muscular layer in 111 (26.6%) patients, and the fibrous layer in 259 (62.0%) patients. 204 cases (48.8%) were negative for lymph node metastasis, and 214 cases (51.2%) were positive. Table 1 shows the chi-square test results for the pathological features of patients with esophageal squamous carcinoma. As can be seen from Table 1, the early versus middle-late stage grouping was significantly associated with sex (P < 0.001), tumor size (P < 0.001), degree of differentiation (P < 0.001), degree of infiltration (P < 0.001) and lymph node metastasis (P < 0.001). It can also be seen from Table 1 that the risk of esophageal squamous carcinoma is independent of tumor site (P = 0.227) and age (P = 0.642).
TABLE 1 Chi-square test results of pathological characteristics of patients with esophageal squamous carcinoma
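As an illustration of the test behind Table 1, the following minimal sketch computes the Pearson chi-square statistic for a contingency table, taking the expected count of each cell as (row total × column total) / grand total. That expected-count rule is the standard one and is assumed here. The row and column totals in the example match the counts reported above (204 and 214 patients by lymph node status, 115 and 303 by stage), but the individual cell counts are hypothetical.

```python
def chi_square(observed):
    """Return (statistic, degrees of freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, a_ij in enumerate(row):
            # Expected count under independence of the row and column variables.
            t_ij = row_totals[i] * col_totals[j] / n
            stat += (a_ij - t_ij) ** 2 / t_ij
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Rows: lymph node metastasis (negative, positive); columns: (early, middle-late).
# Cell counts are hypothetical; only the margins follow the reported cohort.
table = [[90, 114], [25, 189]]
stat, dof = chi_square(table)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

With one degree of freedom, a statistic above 3.841 corresponds to P < 0.05.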
Table 2 shows the information entropy analysis of the significant pathological factors of early and middle-late esophageal squamous carcinoma patients. As can be seen from Table 2, lymph node metastasis (H = 0.6036, ΔH = 0.2451, g_r = 28.88%) and degree of infiltration (H = 0.5099, ΔH = 0.3388, g_r = 39.92%) are the important factors for patients with early and middle-late esophageal squamous carcinoma and are used for constructing the decision tree.
TABLE 2 entropy analysis of information of significant pathological factors of early and middle-late esophageal squamous carcinoma patients
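The reported values can be checked with a short sketch. It assumes that the entropy before classification is the base-2 Shannon entropy of the 115 / 303 split between the early and middle-late groups, that the H values quoted for each factor are the entropies after classification, and that the gain rate is the information gain divided by the entropy before classification. Under these assumptions the quoted ΔH and g_r values are reproduced to the stated precision.

```python
import math


def entropy(counts):
    """Base-2 Shannon entropy of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


# Entropy before classification: 115 early versus 303 middle-late patients.
info_before = entropy([115, 303])


def gain_and_rate(info_after):
    """Return (information gain, gain rate) for a given entropy after classification."""
    delta_h = info_before - info_after
    return delta_h, delta_h / info_before


print(f"InfoBefore = {info_before:.4f}")
for name, h_after in [("lymph node metastasis", 0.6036), ("degree of infiltration", 0.5099)]:
    delta_h, rate = gain_and_rate(h_after)
    print(f"{name}: gain = {delta_h:.4f}, gain rate = {rate:.2%}")
```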
The decision tree for distinguishing early from middle-late stage esophageal squamous carcinoma patients is shown in fig. 2. The model was evaluated by 10-fold cross validation. The entire cohort was randomly divided into 10 sub-cohorts, with the predictive model first fitted on 80% of the population (training set) and the remaining 20% of the population (test set) used to evaluate the performance of the decision tree. The accuracy of the decision model reaches 95.2%, which helps to distinguish early-stage ESCC patients from middle-late-stage ESCC patients, so that the survival risk of the different populations can be calculated further.
Step three: respectively obtaining the preoperative routine blood and biochemical indices of the esophageal squamous carcinoma patients in the early group and the middle-late group, and selecting the indices significantly related to postoperative survival risk using the least absolute shrinkage and selection operator (LASSO); the routine blood and biochemical indices of esophageal squamous carcinoma patients include white blood cell count (WBCC), lymphocyte count (LYC), monocyte count (MOC), neutrophil count (NEC), eosinophil count (EOS), basophil count (BAC), erythrocyte count (ERY), hemoglobin (HGB), platelet count (THC), total protein (TP), albumin (ALB), globulin (GLO), prothrombin time (PT), activated partial thromboplastin time (APTT), thrombin time (TT) and fibrinogen (FIB).
Based on the constructed decision tree, the 418 patients were divided into two groups, an early group and a middle-late group. Each group was divided into two independent cohorts at an 8:2 ratio, with 80% used as the training set and 20% used as the test set. In all statistical analyses, a two-tailed P < 0.1 was considered significant. The study of early esophageal squamous carcinoma included 115 patients. According to the follow-up survival time and the 5-year survival probability, patients with early esophageal squamous carcinoma were divided into two classes, high-risk patients and low-risk patients. The study of middle-late esophageal squamous carcinoma included 303 patients. Patients in the middle-late stage were likewise divided into high-risk and low-risk classes according to follow-up survival time and 3-year survival probability.
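The 8:2 split and the two-class labelling can be sketched as follows. The labelling rule is an assumption made for illustration (a patient counts as high risk if the follow-up survival time is shorter than the horizon of 5 years for the early group or 3 years for the middle-late group); the exact rule, the handling of censored follow-up, and the random seed are not given in the text.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle reproducibly and return (training set, test set) at an 8:2 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def risk_label(survival_months, horizon_years):
    """ASSUMPTION: high risk means survival shorter than the horizon."""
    return "high" if survival_months < 12 * horizon_years else "low"


early_train, early_test = split_80_20(range(115))  # early group, 115 patients
late_train, late_test = split_80_20(range(303))    # middle-late group, 303 patients
print(len(early_train), len(early_test), len(late_train), len(late_test))
print(risk_label(40, horizon_years=5), risk_label(40, horizon_years=3))
```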
The least absolute shrinkage and selection operator selects the indices that are significantly related to postoperative survival risk as follows:
$$\hat{\beta} = \arg\min_{\beta}\left\{ \|Y - X\beta\|_2^2 + \lambda\|\beta\|_1 \right\}$$
where $Y$ is an $n \times 1$ vector of the actual values corresponding to the samples $X$, $X$ is the $n \times p$ matrix of input samples of the LASSO regression, $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the $p \times 1$ vector of regression coefficients, $\|\beta\|_1 = \sum_{j=1}^{p}|\beta_j|$ is the penalty term, and $\lambda > 0$ is a tuning parameter that balances the penalty term against the empirical risk.
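As an illustration of how an L1 penalty drives some coefficients exactly to zero, the following is a minimal sketch of LASSO fitted by cyclic coordinate descent. It is not the implementation used in the patent: the function names are hypothetical, the loss is scaled by 1/(2n) (a common convention that the source does not specify), the features are assumed to be standardised, and no intercept is fitted.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p features each; y is a list of n targets.
    ASSUMPTION: features are already standardised and no intercept is needed.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                # Prediction that leaves feature j out of the current fit.
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            if norm > 0:
                beta[j] = soft_threshold(rho / n, lam) / (norm / n)
            else:
                beta[j] = 0.0
    return beta
```

The indices retained by the model are those with non-zero coefficients, for example `[j for j, b in enumerate(beta) if b != 0.0]`; a larger `lam` shrinks more coefficients to zero.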
FIG. 3(a) shows the LASSO regression analysis of patients with early esophageal squamous carcinoma. At the optimal tuning parameter λ = 0.1256, the non-zero coefficients retained in the LASSO analysis show that the 5 most important indices in the final model are MOC, ALB, PT, NEC and ERY.
FIG. 3(b) shows the LASSO regression analysis of patients with middle and late esophageal squamous carcinoma. At the optimal tuning parameter λ = 0.1249, the non-zero coefficients retained in the LASSO analysis show that the 3 most important indices in the final model are PT, WBCC and ALB.
Step four: cluster the esophageal squamous carcinoma patients of the early group and of the middle and late group into different clusters, respectively, using the improved density peak clustering (DPC) algorithm based on cosine distance and K-nearest neighbors (IDPC);
In step four, the minimum distance $\delta_{i'}$ between a data point and the cluster center in the improved density peak clustering algorithm based on cosine distance and K-nearest neighbors is calculated as follows:
$$\delta_{i'} = \begin{cases} \min\limits_{j'':\,\rho_{j''} > \rho_{i'}} d_{i'j''}, & \text{if some } \rho_{j''} > \rho_{i'} \\ \max\limits_{j''} d_{i'j''}, & \text{otherwise} \end{cases} \qquad i', j'' = 1, \ldots, N$$
where $\rho_{i'}$ is the local density, $d_{i'j''}$ is the cosine distance between $x_{i'}$ and $x_{j''}$, $x_{i'}$ denotes the $i'$-th patient sample, $x_{j''}$ denotes the $j''$-th patient sample, and $N$ is the number of patient samples;
the cosine distance $d_{i'j''}$ between $x_{i'}$ and $x_{j''}$ is calculated as follows:
$$d_{i'j''} = 1 - \frac{\sum_{a=1}^{L} x_{i'a}\,x_{j''a}}{\sqrt{\sum_{a=1}^{L} x_{i'a}^2}\,\sqrt{\sum_{a=1}^{L} x_{j''a}^2}}$$
where $x_{i'a}$ is the value of feature $a$ in sample $x_{i'}$, $x_{j''a}$ is the value of feature $a$ in sample $x_{j''}$, and $L$ is the number of features;
the local density $\rho_{i'}$ is calculated as follows:
where $kNN(x_{i'})$ is the K-nearest-neighbor set of $x_{i'}$;
the K-nearest-neighbor set $kNN(x_{i'})$ of $x_{i'}$ is calculated as follows:
$$kNN(x_{i'}) = \{\, x_{j''} \in X \mid d(x_{i'}, x_{j''}) \le d(x_{i'}, NN_k(x_{i'})) \,\}$$
where $d(x_{i'}, x_{j''})$ is the cosine distance between $x_{i'}$ and $x_{j''}$, and $NN_k(x_{i'})$ is the $k$-th nearest neighbor of $x_{i'}$.
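The quantities used in step four can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the local density used here (the sum of exp(-d) over the k nearest neighbors) is an assumed stand-in, because the density formula itself is not recoverable from the text. The cosine distance and the rule for δ follow the usual density peak clustering definitions.

```python
import math


def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)  # zero vectors are not handled


def idpc_decision_values(samples, k):
    """Return (rho, delta) for every sample, i.e. the axes of the decision graph.

    ASSUMPTION: rho_i = sum of exp(-d_ij) over the k nearest neighbours of i.
    This kNN density is a placeholder, not the formula from the source.
    """
    n = len(samples)
    dist = [[cosine_distance(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]
    rho = []
    for i in range(n):
        # Distances to the k nearest neighbours, excluding the point itself.
        neighbours = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(sum(math.exp(-d) for d in neighbours))
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # The densest point has no denser neighbour; it takes the largest distance.
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta
```

Samples whose ρ and δ are both large stand out in the upper right corner of the decision graph and are taken as cluster centers; the remaining samples have a small δ because a denser sample lies close to them.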
Early-stage patients: based on the 5 important indices determined by the LASSO algorithm, early esophageal squamous carcinoma patients were clustered using the DPC and IDPC algorithms, respectively. The clustering results of DPC and IDPC are shown in Fig. 4. Fig. 4(a) shows the decision graph of the DPC algorithm based on Euclidean distance. Only one cluster center appears in the upper right corner, so the patient samples cannot be clustered effectively. The decision graph of the IDPC algorithm with cosine distance and KNN is shown in Fig. 4(b). Two samples have significantly larger ρ and δ, indicating that patients with early esophageal squamous carcinoma are divided into two clusters. These two cluster centers lie in the upper right corner: the center of cluster 1 is MOC = 0.4, NEC = 2.2, ERY = 4.13, ALB = 40 and PT = 7.1, and the center of cluster 2 is MOC = 0.5, NEC = 5.9, ERY = 4.35, ALB = 49 and PT = 12.9.
Middle- and late-stage patients: based on the 3 important indices determined by the LASSO algorithm, middle and late esophageal squamous carcinoma patients were clustered using the DPC and IDPC algorithms, respectively. Again, the DPC algorithm cannot cluster the patient samples effectively (Fig. 5(a)). Two samples in Fig. 5(b) have significantly larger ρ and δ, indicating that the IDPC algorithm can divide the patients into two clusters. The center of cluster 1 is PT = 10.9, WBCC = 5 and ALB = 37, and the center of cluster 2 is PT = 11.1, WBCC = 8 and ALB = 50.
Comparing Figs. 4 and 5 shows that the distance δ and the density ρ obtained with cosine distance and KNN are large enough for the IDPC algorithm to identify two cluster centers for the early and for the middle and late esophageal squamous carcinoma patients, respectively. This result indicates that the proposed IDPC algorithm based on cosine distance and KNN improves the clustering capability of the DPC algorithm.
Step five: for each cluster, constructing a nomogram based on a Logistic Regression (LR) model to predict survival risk of esophageal squamous cell carcinoma patients;
The logistic regression model is as follows:
$$\ln\frac{p(y'=1 \mid x')}{p(y'=0 \mid x')} = \alpha_0 + \alpha_1 x'$$
where $p(y'=1 \mid x')$ is the probability that the input variable $x'$ belongs to the positive class, $p(y'=0 \mid x')$ is the probability that the input variable $x'$ belongs to the negative class, $\alpha_0$ is a constant, and $\alpha_1$ is the regression coefficient of the input variable $x'$.
The parameter $\alpha_0$ and the regression coefficient $\alpha_1$ in the logistic regression model are calculated as follows:
$$L(\alpha_0, \alpha_1) = \prod_{t'=1}^{n} p(y'=1 \mid x'_{t'})^{\,y_{t'}}\; p(y'=0 \mid x'_{t'})^{\,1-y_{t'}}$$
where $L(\alpha_0, \alpha_1)$ is the likelihood function used to estimate $\alpha_0$ and $\alpha_1$, $t' = 1, 2, \ldots, n$, $n$ is the number of samples, and $y_{t'}$ is the predicted value of the input variable $x'_{t'}$.
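A minimal sketch of estimating $\alpha_0$ and $\alpha_1$ by maximising the log-likelihood of a one-variable logistic model is given below. The function names, the learning rate and the iteration count are assumptions made for illustration; the source does not state which optimiser was used, and plain gradient ascent is chosen here only because it is short.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(a0, a1, xs, ys):
    """Log of the Bernoulli likelihood L(a0, a1) for labels ys in {0, 1}."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(a0 + a1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, lr=0.1, n_iter=2000):
    """Estimate (a0, a1) by gradient ascent on the average log-likelihood.

    ASSUMPTION: lr and n_iter are illustrative defaults, not tuned values.
    """
    a0, a1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        # Gradient of the log-likelihood: sum of (label - predicted probability).
        g0 = sum(y - sigmoid(a0 + a1 * x) for x, y in zip(xs, ys)) / n
        g1 = sum((y - sigmoid(a0 + a1 * x)) * x for x, y in zip(xs, ys)) / n
        a0 += lr * g0
        a1 += lr * g1
    return a0, a1
```

Calling `fit_logistic(xs, ys)` on a list of index values `xs` and binary risk labels `ys` returns the fitted pair, and `sigmoid(a0 + a1 * x)` then gives the predicted probability of the positive class for a new value `x`.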
Based on the routine blood and biochemical indices that are significantly related to the survival risk of early and of middle and late esophageal squamous carcinoma patients, an LR model was established for each cluster of early and of middle and late esophageal squamous carcinoma patients. The LR models for esophageal squamous carcinoma patients are shown in Table 3.
TABLE 3 LR model of esophageal squamous carcinoma patients
The nomogram method is as follows: a score is assigned to each value level of each influencing factor according to the size of its regression coefficient in the multi-factor LR model, and the scores are then added to obtain a total score. The 3-year and 5-year survival probabilities are read off by locating the total score on the total score scale. Since most early esophageal squamous carcinoma patients survive for more than 5 years, whereas middle and late esophageal squamous carcinoma patients survive for less than 5 years, the 5-year survival risk is predicted for early patients and the 3-year survival risk for middle and late patients. FIG. 6 shows the nomogram models of patients with esophageal squamous carcinoma, wherein (a) is the nomogram predicting the 5-year survival probability of early esophageal squamous carcinoma patients in cluster 1, (b) is the nomogram predicting the 5-year survival probability of early esophageal squamous carcinoma patients in cluster 2, (c) is the nomogram predicting the 3-year survival probability of middle and late esophageal squamous carcinoma patients in cluster 1, and (d) is the nomogram predicting the 3-year survival probability of middle and late esophageal squamous carcinoma patients in cluster 2.
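The scoring idea behind a nomogram can be sketched as follows. This is a hedged illustration rather than the construction used for Fig. 6: the function names are hypothetical, the 0 to 100 point scale anchored on the predictor with the largest effect is a common nomogram convention assumed here, and any coefficients or ranges passed in are the caller's own.

```python
import math


def max_single_effect(coefs, ranges):
    """Largest contribution any one predictor can make to the linear predictor."""
    return max(abs(b) * (hi - lo) for b, (lo, hi) in zip(coefs, ranges))


def nomogram_points(coefs, ranges, values):
    """Map each predictor value to points; the strongest predictor spans 0-100.

    coefs: regression coefficient per predictor.
    ranges: (low, high) observed range per predictor.
    values: one patient's predictor values.
    """
    scale = max_single_effect(coefs, ranges)
    points = []
    for b, (lo, hi), x in zip(coefs, ranges, values):
        # Zero points sit at the end of the range with the lowest contribution.
        start = lo if b >= 0 else hi
        points.append(100.0 * b * (x - start) / scale)
    return points


def total_points_to_probability(total, intercept, coefs, ranges):
    """Convert a total score back to the logistic probability it encodes."""
    scale = max_single_effect(coefs, ranges)
    # Linear predictor of a patient who scores zero points on every axis.
    baseline = intercept + sum(b * (lo if b >= 0 else hi)
                               for b, (lo, hi) in zip(coefs, ranges))
    linear = baseline + total * scale / 100.0
    return 1.0 / (1.0 + math.exp(-linear))
```

Because the points are a rescaling of each term of the linear predictor, the probability recovered from the total score equals the probability given directly by the logistic model; the nomogram only replaces the arithmetic with scales that can be read by eye.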
Step six: the performance of the nomograms in step five was evaluated using the confusion matrix and the area under the receiver operating characteristic (ROC) curve (AUC).
The confusion matrix is shown in Table 4, where true positive (TP) means that the predicted result is positive and the actual result is positive; false positive (FP) means that the predicted result is positive while the actual result is negative; true negative (TN) means that the predicted result is negative and the actual result is negative; and false negative (FN) means that the predicted result is negative while the actual result is positive.
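TABLE 4 Confusion matrix
Predicted result | Actual positive | Actual negative |
---|---|---|
Positive | TP | FP |
Negative | FN | TN |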
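Four classification indices are defined from the entries of the confusion matrix to evaluate the performance of the classification model, namely accuracy (Acc), positive predictive value (PPV), recall (R) and F1-score, defined as follows:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$
$$PPV = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$F1\text{-}score = \frac{2 \times PPV \times R}{PPV + R}$$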
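The four indices follow directly from the counts in the confusion matrix, as the short sketch below shows. The function name and the dictionary keys are illustrative choices, and the counts in the usage note are invented for the example and are not results from the patent.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute Acc, PPV, R and F1-score from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp)        # positive predictive value (precision)
    recall = tp / (tp + fn)     # recall (sensitivity)
    f1 = 2 * ppv * recall / (ppv + recall)
    return {"Acc": acc, "PPV": ppv, "R": recall, "F1": f1}
```

For example, `classification_metrics(tp=40, fp=10, tn=45, fn=5)` gives an accuracy of 0.85 and a PPV of 0.8. The sketch does not guard against empty classes, where a denominator would be zero.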
The ROC curves of the models on the test set are compared in Fig. 7. As can be seen from Fig. 7(a), for cluster 1 of patients with early esophageal squamous carcinoma the AUC of the LASSO-IDPC-LR model was 0.881 (95% CI: 0.779-0.983), which is 0.014 higher than the AUC of the LASSO-Kmeans-LR model; for cluster 2 the AUC of the LASSO-IDPC-LR model was 0.873 (95% CI: 0.776-0.970), which is 0.067 higher than the AUC of the LASSO-Kmeans-LR model. In Fig. 7(b), for cluster 1 of patients with middle and late esophageal squamous carcinoma the AUC of the LASSO-IDPC-LR model was 0.802 (95% CI: 0.722-0.883), which is 0.104 higher than the AUC of the LASSO-Kmeans-LR model; for cluster 2 the AUC of the LASSO-IDPC-LR model was 0.774 (95% CI: 0.680-0.869), which is 0.095 higher than the AUC of the LASSO-Kmeans-LR model. In all four clusters, the IDPC algorithm therefore performs better than the Kmeans algorithm in assessing the prognostic survival risk of esophageal squamous carcinoma patients.
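For reference, the AUC can be computed without drawing the curve, since it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below uses this pairwise definition; the function name is hypothetical and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs that are ranked correctly.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted probabilities or risk scores; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the values reported for Fig. 7 should be read.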
The test-set results based on the confusion-matrix evaluation indices are shown in Figs. 8 and 9. Early esophageal squamous carcinoma patients: the performance of the different models is shown in Fig. 8. The TNM staging system performed worst, with Acc, R, PPV and F1-score of 75.3%, 50.8%, 60.8% and 0.554, respectively. The LASSO-IDPC-LR model performed best among the LASSO-Kmeans-LR model, the LASSO-LR model and the TNM staging system, with Acc, R, PPV and F1-score of 81.4%, 83.7%, 80.4% and 0.82, respectively; compared with the LASSO-Kmeans-LR model, its Acc, R, PPV and F1-score are improved by 8.9%, 14.9%, 9% and 0.118, respectively. Middle and late esophageal squamous carcinoma patients: the performance of the different models is shown in Fig. 9. As can be seen from Fig. 9, the predictive performance of the proposed LASSO-IDPC-LR model is again superior to that of the other models, namely the LASSO-Kmeans-LR model, the LASSO-LR model and the TNM staging system. The Acc, R, PPV and F1-score of the LASSO-IDPC-LR model were 75.1%, 67.6%, 75.3% and 0.712, respectively.
As can be seen from Figs. 8 and 9, the conventional TNM method has difficulty accurately predicting the survival risk of patients with esophageal squamous carcinoma. The results show that combining the LASSO algorithm with the LR model improves the prediction performance of the LR model, and that introducing the IDPC algorithm improves it further. In addition, the clustering capability of the IDPC algorithm is superior to that of the Kmeans algorithm. The method first selects the important indices with the LASSO algorithm, then clusters the patients with the IDPC algorithm, and finally establishes multiple linear prognosis evaluation models with LR.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (10)
1. An IDPC and LASSO based esophageal squamous carcinoma prognosis survival risk assessment method is characterized by comprising the following steps:
step one: acquiring pathological data of esophageal squamous carcinoma patients;
step two: using the pathological data of the esophageal squamous carcinoma patients, constructing a decision tree from the important pathological factors determined by the chi-square test and information gain, and dividing the patients into an early group and a middle and late group;
step three: obtaining the preoperative routine blood and biochemical indices of the esophageal squamous carcinoma patients in the early group and in the middle and late group, respectively, and selecting the indices that are significantly related to postoperative survival risk by using the least absolute shrinkage and selection operator;
step four: respectively clustering early-stage group and middle-stage and late-stage group esophageal squamous carcinoma patients into different clusters by using an improved density peak clustering algorithm based on cosine distance and K nearest neighbor;
step five: for each cluster, constructing a nomogram based on a logistic regression model to predict the survival risk of the esophageal squamous cell carcinoma patient;
step six: evaluating the performance of the nomogram in step five by using the confusion matrix and the area under the receiver operating characteristic curve.
2. The IDPC and LASSO based prognostic survival risk assessment method of claim 1 wherein the pathological data of the patients with esophageal squamous carcinoma include sex, age, tumor size, degree of differentiation, degree of infiltration and lymph node metastasis.
3. The IDPC and LASSO based esophageal squamous cancer prognostic survival risk assessment method according to claim 1, wherein said chi-square test method is:
$$\chi^2 = \sum_{i=1}^{m_i}\sum_{j=1}^{m_j}\frac{(A_{ij} - T_{ij})^2}{T_{ij}}$$
where $m_i$ and $m_j$ respectively represent the number of variables and the number of samples, $i$ represents the value of the variable, $j$ represents the value of the esophageal squamous carcinoma patient sample, $A_{ij}$ represents the actual value of the samples whose variable value is $i$ and which belong to the $j$-th esophageal squamous carcinoma patient, and $T_{ij}$ represents the expected value of the samples whose variable value is $i$ and which belong to the $j$-th esophageal squamous carcinoma patient, wherein $T_{ij}$ is defined as follows:
4. The IDPC and LASSO based esophageal squamous cancer prognostic survival risk assessment method according to claim 1, wherein said information gain is calculated by:
$$g_r = \frac{\Delta H}{\mathrm{InfoBefore}(H)}$$
where $g_r$ represents the information gain rate, $\Delta H$ is the information gain of the attribute, and $\mathrm{InfoBefore}(H)$ is the information entropy before attribute classification;
the information gain $\Delta H$ is calculated as follows:
$$\Delta H = \mathrm{InfoBefore}(H) - \mathrm{InfoAfter}(H)$$
where $\mathrm{InfoAfter}(H)$ is the information entropy after attribute classification;
the information entropy $\mathrm{InfoBefore}(H)$ before attribute classification and the information entropy $\mathrm{InfoAfter}(H)$ after attribute classification are respectively calculated as follows:
5. The IDPC and LASSO based esophageal squamous cancer prognostic survival risk assessment method according to claim 1, wherein the routine blood and biochemical indices of the esophageal squamous carcinoma patients include white blood cell count (WBCC), lymphocyte count (LYC), monocyte count (MOC), neutrophil count (NEC), eosinophil count (EOS), basophil count (BAC), erythrocyte count (ERY), hemoglobin (HGB), platelet count (THC), total protein (TP), albumin (ALB), globulin (GLO), prothrombin time (PT), activated partial thromboplastin time (APTT), thrombin time (TT) and fibrinogen (FIB).
6. The IDPC and LASSO based esophageal squamous cancer prognostic survival risk assessment method according to claim 1, wherein the least absolute shrinkage and selection operator selects the indices that are significantly related to postoperative survival risk as follows:
$$\hat{\beta} = \arg\min_{\beta}\left\{ \|Y - X\beta\|_2^2 + \lambda\|\beta\|_1 \right\}$$
where $Y$ is an $n \times 1$ vector of the actual values corresponding to the samples $X$, $X$ is the $n \times p$ matrix of input samples of the LASSO regression, $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the $p \times 1$ vector of regression coefficients, $\|\beta\|_1 = \sum_{j=1}^{p}|\beta_j|$ is the penalty term, and $\lambda > 0$ is a tuning parameter that balances the penalty term against the empirical risk.
7. The IDPC and LASSO based esophageal squamous cancer prognostic survival risk assessment method according to claim 1, wherein in step four the minimum distance $\delta_{i'}$ between a data point and the cluster center in the improved density peak clustering algorithm based on cosine distance and K-nearest neighbors is calculated as follows:
$$\delta_{i'} = \begin{cases} \min\limits_{j'':\,\rho_{j''} > \rho_{i'}} d_{i'j''}, & \text{if some } \rho_{j''} > \rho_{i'} \\ \max\limits_{j''} d_{i'j''}, & \text{otherwise} \end{cases} \qquad i', j'' = 1, \ldots, N$$
where $\rho_{i'}$ is the local density, $d_{i'j''}$ is the cosine distance between $x_{i'}$ and $x_{j''}$, $x_{i'}$ denotes the $i'$-th patient sample, $x_{j''}$ denotes the $j''$-th patient sample, and $N$ is the number of patient samples;
the cosine distance $d_{i'j''}$ between $x_{i'}$ and $x_{j''}$ is calculated as follows:
$$d_{i'j''} = 1 - \frac{\sum_{a=1}^{L} x_{i'a}\,x_{j''a}}{\sqrt{\sum_{a=1}^{L} x_{i'a}^2}\,\sqrt{\sum_{a=1}^{L} x_{j''a}^2}}$$
where $x_{i'a}$ is the value of feature $a$ in sample $x_{i'}$, $x_{j''a}$ is the value of feature $a$ in sample $x_{j''}$, and $L$ is the number of features;
the local density $\rho_{i'}$ is calculated as follows:
where $kNN(x_{i'})$ is the K-nearest-neighbor set of $x_{i'}$;
the K-nearest-neighbor set $kNN(x_{i'})$ of $x_{i'}$ is calculated as follows:
$$kNN(x_{i'}) = \{\, x_{j''} \in X \mid d(x_{i'}, x_{j''}) \le d(x_{i'}, NN_k(x_{i'})) \,\}$$
where $d(x_{i'}, x_{j''})$ is the cosine distance between $x_{i'}$ and $x_{j''}$, and $NN_k(x_{i'})$ is the $k$-th nearest neighbor of $x_{i'}$.
8. The IDPC and LASSO based esophageal squamous cancer prognostic survival risk assessment method according to claim 1, wherein said logistic regression model is:
$$\ln\frac{p(y'=1 \mid x')}{p(y'=0 \mid x')} = \alpha_0 + \alpha_1 x'$$
9. The IDPC and LASSO based esophageal squamous cancer prognostic survival risk assessment method according to claim 8, wherein the parameter $\alpha_0$ and the regression coefficient $\alpha_1$ in the logistic regression model are calculated as follows:
$$L(\alpha_0, \alpha_1) = \prod_{t'=1}^{n} p(y'=1 \mid x'_{t'})^{\,y_{t'}}\; p(y'=0 \mid x'_{t'})^{\,1-y_{t'}}$$
where $L(\alpha_0, \alpha_1)$ is the likelihood function used to estimate $\alpha_0$ and $\alpha_1$, $t' = 1, 2, \ldots, n$, $n$ is the number of samples, and $y_{t'}$ is the predicted value of the input variable $x'_{t'}$.
10. The IDPC and LASSO based esophageal squamous cancer prognostic survival risk assessment method according to claim 8 or 9, wherein said logistic regression model based nomogram is constructed by the method comprising: assigning scores to each value level of the influence factors according to the size of the regression coefficient in the logistic regression model, and then adding the scores to obtain a total score; the 3-year probability of survival and the 5-year probability of survival are obtained by the total score located on the total score scale.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210276812.3A CN114639482A (en) | 2022-03-21 | 2022-03-21 | IDPC and LASSO-based esophageal squamous carcinoma prognosis survival risk assessment method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114639482A true CN114639482A (en) | 2022-06-17 |
Family
ID=81949717
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210276812.3A Pending CN114639482A (en) | 2022-03-21 | 2022-03-21 | IDPC and LASSO-based esophageal squamous carcinoma prognosis survival risk assessment method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114639482A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117524486A (en) * | 2024-01-04 | 2024-02-06 | 北京市肿瘤防治研究所 | TTE model establishment method for predicting non-progressive survival probability of postoperative patient |
CN117524486B (en) * | 2024-01-04 | 2024-04-05 | 北京市肿瘤防治研究所 | TTE model establishment method for predicting non-progressive survival probability of postoperative patient |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||