CN112768060A - Liver cancer postoperative recurrence prediction method based on random survival forest and storage medium - Google Patents
- Publication number
- CN112768060A (application CN202110098484.8A)
- Authority
- CN
- China
- Prior art keywords
- recurrence
- liver cancer
- case
- postoperative
- risk
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
- G16H50/80—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
Abstract
The invention provides a method for predicting postoperative recurrence of liver cancer based on a random survival forest, and a storage medium. The method comprises the following steps: acquiring clinical data and the recurrence time of each case; presetting grouping dimensions, which comprise basic patient factors, preoperative test factors and postoperative pathological factors; acquiring a data set from the clinical data, wherein the data set consists of the preset grouping dimensions of each case; and constructing a corresponding liver cancer postoperative early recurrence prediction model with a random survival forest algorithm according to the data set and the recurrence time of each case. The method can accurately predict the probability of postoperative recurrence for an individual liver cancer patient, clarifies which matters need attention after surgery, and supports active prevention. For medical institutions in particular, it helps medical staff accurately screen out patients at high risk of recurrence after liver cancer surgery, supports early intervention against recurrence, and guides postoperative follow-up and treatment.
Description
Technical Field
The invention relates to the field of bioinformatics, and in particular to a method for predicting postoperative recurrence of liver cancer based on a random survival forest, and a storage medium.
Background
Primary liver cancer (hereinafter, liver cancer) is one of the most common malignant tumors in China, ranking fourth in incidence and third in mortality among tumors in China, and it seriously threatens the life and health of the Chinese population. At present, surgical resection is the main radical treatment for liver cancer, but postoperative recurrence remains an important cause of death after liver cancer surgery. Clinical data indicate that the recurrence rate after liver cancer surgery is about 50%. Recurrence is generally divided into early and late recurrence using a 2-year cut-off, with early recurrence accounting for about 70% of all recurrences. Accurately predicting early recurrence after liver cancer surgery, screening out patients at high risk of early recurrence, and monitoring them appropriately during clinical diagnosis and treatment, so that recurrent tumors are found early and radical treatment can be given again, therefore has very high clinical value.
In recent years, disease risk prediction with machine learning algorithms has become a research hotspot in medical big data. Complex algorithms can mine the interrelationships among disease variables in depth, but mainstream machine learning algorithms have difficulty handling censored medical data, so their predictions remain biased and their accuracy is limited.
Random survival forest (RSF) is a random forest method for analyzing right-censored survival data. It introduces new survival splitting rules for growing survival trees and a new missing-data algorithm for imputing missing data, which makes it suitable for survival analysis. The present application aims to provide a method and a storage medium for establishing a liver cancer postoperative early recurrence prediction model based on a random survival forest, so as to capture the relationships among disease variables more accurately.
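To make the notion of right-censored survival data concrete, the following minimal sketch computes a Nelson-Aalen cumulative hazard estimate, which is the estimator the published RSF algorithm applies inside each terminal node before averaging over trees. It is an illustration in plain Python, not the patent's implementation; the function name `nelson_aalen` and the five-case data set are hypothetical.

```python
from collections import Counter

def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard for right-censored data.

    times  -- follow-up time of each case (e.g. months)
    events -- 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, cumulative_hazard) at each observed event time.
    """
    observed = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # cases leaving the risk set at each time
    at_risk = len(times)          # everyone is at risk at time zero
    hazard = 0.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            hazard += d / at_risk  # increment: events / number still at risk
            curve.append((t, hazard))
        at_risk -= leaving[t]      # censored cases also leave the risk set
    return curve

# Hypothetical data: five cases; the case followed to 9 months is censored.
times = [3, 6, 6, 9, 12]
events = [1, 1, 1, 0, 1]
curve = nelson_aalen(times, events)
```

The censored case contributes to the risk set up to month 9 but never counts as an event, which is how censoring is used rather than discarded.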
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method and a storage medium for predicting postoperative recurrence of liver cancer based on a random survival forest, which can accurately predict the probability of postoperative recurrence for an individual liver cancer patient and provide a reference for postoperative care.
To solve this technical problem, the invention adopts the following technical scheme:
a method for predicting postoperative recurrence of liver cancer based on a random survival forest, comprising the following steps:
acquiring clinical data and the recurrence time of each case;
presetting grouping dimensions, which comprise basic patient factors, preoperative test factors and postoperative pathological factors;
acquiring a data set from the clinical data, wherein the data set consists of the preset grouping dimensions of each case;
and constructing a corresponding liver cancer postoperative early recurrence prediction model with a random survival forest algorithm according to the data set and the recurrence time of each case.
The invention provides another technical scheme as follows:
a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above method for predicting postoperative recurrence of liver cancer based on a random survival forest.
The beneficial effects of the invention are as follows: a liver cancer postoperative early recurrence prediction model is established from a random survival forest and the clinical data of a certain number of historical recurrence cases, so that individual patients can be assessed with the model and their recurrence condition obtained, which facilitates active prevention. For medical institutions in particular, the method helps medical staff accurately screen out patients at high risk of recurrence after liver cancer surgery, supports early intervention against recurrence, guides postoperative follow-up and treatment, and improves the cure rate.
Drawings
FIG. 1 is a schematic flow chart of a method for predicting postoperative recurrence of liver cancer based on a random survival forest according to the first embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for predicting postoperative recurrence of liver cancer based on a random survival forest according to the second embodiment of the present invention;
FIG. 3 is an example of the interface showing a prediction result in the fourth embodiment of the present invention.
Detailed Description
To explain the technical content, objects and effects of the present invention in detail, the following description is given with reference to the embodiments and the accompanying drawings.
Referring to fig. 1, the present invention provides a method for predicting postoperative recurrence of liver cancer based on a random survival forest, comprising:
acquiring clinical data and the recurrence time of each case;
presetting grouping dimensions, which comprise basic patient factors, preoperative test factors and postoperative pathological factors;
acquiring a data set from the clinical data, wherein the data set consists of the preset grouping dimensions of each case;
and constructing a corresponding liver cancer postoperative early recurrence prediction model with a random survival forest algorithm according to the data set and the recurrence time of each case.
From the above description, the beneficial effects of the present invention are: individual patients can be assessed with the model and their recurrence condition obtained, which facilitates active prevention. For medical institutions in particular, the method helps medical staff accurately screen out patients at high risk of recurrence after liver cancer surgery, supports early intervention against recurrence, guides postoperative follow-up and treatment, and improves the cure rate.
Further, acquiring the clinical data and recurrence time of each case further comprises:
acquiring the enrolled cases, each of which had normal liver function on preoperative assessment, no history of malignant tumor, and no invasion of adjacent organs or distant metastasis, underwent liver cancer resection with postoperative pathology confirming hepatocellular carcinoma, and experienced recurrence after surgery.
From the above description, selecting qualified cases for enrollment according to these conditions can significantly improve the accuracy of the model.
Further, constructing the corresponding liver cancer postoperative early recurrence prediction model with a random survival forest algorithm according to the data set and the recurrence time of each case comprises the following steps:
dividing the cases according to a preset proportion to obtain training group cases and test group cases;
dividing the data set according to the training group cases and the test group cases to obtain a training group data set and a test group data set;
constructing, with a random survival forest algorithm, the corresponding liver cancer postoperative early recurrence prediction model and its cumulative risk function according to the training group data set and the recurrence time of each training group case;
predicting each case in the training group cases with the cumulative risk function to obtain a risk score set;
and dividing the risk score set according to a preset proportion to obtain the risk score ranges corresponding to a low-risk recurrence group, a medium-risk recurrence group and a high-risk recurrence group respectively.
From the above description, a test group is set aside and the risk score of each test case is obtained from the model; the score ranges corresponding to the different risk groups are then delimited according to medical experience and rules, so that the risk level of any risk score calculated by the model can be determined quickly and unambiguously.
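The division into training and test group cases can be sketched as a seeded shuffle followed by a cut at the preset proportion. The patent does not state the proportion or the randomization scheme, so the 70/30 ratio, the seed, the case identifiers and the function name `split_cases` below are all assumptions made for illustration.

```python
import random

def split_cases(case_ids, train_fraction=0.7, seed=0):
    """Shuffle case identifiers and split them into training and test groups."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is reproducible
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers for ten enrolled cases.
case_ids = [f"case-{i:03d}" for i in range(1, 11)]
train_ids, test_ids = split_cases(case_ids, train_fraction=0.7, seed=42)
```

Splitting by case identifier first, and only then selecting the matching rows of the data set, keeps every case entirely in one group.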
Further, the method also comprises:
acquiring the grouping dimensions of a case;
calculating the risk score of the case with the liver cancer postoperative early recurrence prediction model according to the grouping dimensions of the case;
determining the corresponding risk group according to the risk score range into which the calculated risk score falls;
and outputting the determined risk group.
From the above description, the risk group to which the case belongs can be output directly, giving a more intuitive and understandable prediction result.
Further, the method also comprises:
deploying the liver cancer postoperative early recurrence prediction model on a server and generating a corresponding prediction web page.
From the above description, the prediction function can be provided as a web page, which is simpler to operate, uses less data traffic, and occupies less memory and fewer resources.
Further, the method also comprises:
acquiring the grouping dimensions of a case;
and calculating the recurrence condition of the case with the liver cancer postoperative early recurrence prediction model according to the grouping dimensions of the case.
From the above description, the recurrence condition of a case can be obtained quickly by directly entering its grouping dimension information, providing the user with a more accurate prediction function.
Further, the recurrence condition includes a risk score, a recurrence-free probability and the corresponding recurrence-free curve.
As can be seen from the above description, the data calculated by the model are intuitive, comprehensive and fine-grained.
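A recurrence-free curve of this kind is typically a step function of time, so the recurrence-free probability at any time point can be read off by carrying the last step forward. The sketch below assumes the curve is stored as sorted (month, probability) pairs; the representation, the function name and the curve values are hypothetical and are not taken from the patent.

```python
from bisect import bisect_right

def recurrence_free_probability(curve, month):
    """Evaluate a right-continuous step curve at a given time.

    curve -- list of (month, probability) pairs sorted by month
    Returns 1.0 before the first step (no recurrence estimated yet).
    """
    months = [m for m, _ in curve]
    idx = bisect_right(months, month)   # number of steps at or before `month`
    return 1.0 if idx == 0 else curve[idx - 1][1]

# Hypothetical curve values, for illustration only.
curve = [(3, 0.90), (6, 0.75), (12, 0.60), (24, 0.45)]
p_at_10 = recurrence_free_probability(curve, 10)   # carried forward from month 6
```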
Further, the basic patient factors include age and sex; the preoperative test factors include platelets, albumin, total bilirubin, etiological examination results and alpha-fetoprotein; the postoperative pathological factors include maximum tumor diameter, tumor number, macrovascular invasion, microvascular invasion, satellite nodules, tumor capsule, liver cancer differentiation grade and liver cirrhosis type.
As can be seen from the above description, building the liver cancer postoperative early recurrence prediction model on clinical data that are sufficiently comprehensive and clinically important for each case helps ensure the accuracy of the prediction result.
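One way to picture a row of the data set is as a record with one field per grouping dimension, organized by the three factor categories listed above. The sketch below is only an assumed layout: the class name, field names and field types are hypothetical, and the patent does not prescribe any particular data structure.

```python
from dataclasses import dataclass, fields

@dataclass
class CaseRecord:
    # Basic patient factors
    age: float
    sex: str
    # Preoperative test factors
    platelets: float
    albumin: float
    total_bilirubin: float
    etiology: str                  # etiological examination result
    alpha_fetoprotein: float
    # Postoperative pathological factors
    tumor_max_diameter: float
    tumor_number: int
    macrovascular_invasion: bool
    microvascular_invasion: bool
    satellite_nodules: bool
    tumor_capsule: bool
    differentiation_grade: str
    liver_cirrhosis_type: str

# Two basic, five preoperative and eight pathological factors.
N_DIMENSIONS = len(fields(CaseRecord))
```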
The invention provides another technical scheme as follows:
a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the following method for predicting postoperative recurrence of liver cancer based on a random survival forest, comprising the steps of:
acquiring clinical data and the recurrence time of each case;
presetting grouping dimensions, which comprise basic patient factors, preoperative test factors and postoperative pathological factors;
acquiring a data set from the clinical data, wherein the data set consists of the preset grouping dimensions of each case;
and constructing a corresponding liver cancer postoperative early recurrence prediction model with a random survival forest algorithm according to the data set and the recurrence time of each case.
Further, acquiring the clinical data and recurrence time of each case further comprises:
acquiring the enrolled cases, each of which had normal liver function on preoperative assessment, no history of malignant tumor, and no invasion of adjacent organs or distant metastasis, underwent liver cancer resection with postoperative pathology confirming hepatocellular carcinoma, and experienced recurrence after surgery.
Further, constructing the corresponding liver cancer postoperative early recurrence prediction model with a random survival forest algorithm according to the data set and the recurrence time of each case comprises the following steps:
dividing the cases according to a preset proportion to obtain training group cases and test group cases;
dividing the data set according to the training group cases and the test group cases to obtain a training group data set and a test group data set;
constructing, with a random survival forest algorithm, the corresponding liver cancer postoperative early recurrence prediction model and its cumulative risk function according to the training group data set and the recurrence time of each training group case;
predicting each case in the training group cases with the cumulative risk function to obtain a risk score set;
and dividing the risk score set according to a preset proportion to obtain the risk score ranges corresponding to a low-risk recurrence group, a medium-risk recurrence group and a high-risk recurrence group respectively.
Further, the method also comprises:
acquiring the grouping dimensions of a case;
calculating the risk score of the case with the liver cancer postoperative early recurrence prediction model according to the grouping dimensions of the case;
determining the corresponding risk group according to the risk score range into which the calculated risk score falls;
and outputting the determined risk group.
Further, the method also comprises:
deploying the liver cancer postoperative early recurrence prediction model on a server and generating a corresponding prediction web page.
Further, the method also comprises:
acquiring the grouping dimensions of a case;
and calculating the recurrence condition of the case with the liver cancer postoperative early recurrence prediction model according to the grouping dimensions of the case.
Further, the recurrence condition includes a risk score, a recurrence-free probability and the corresponding recurrence-free curve.
Further, the basic patient factors include age and sex; the preoperative test factors include platelets, albumin, total bilirubin, etiological examination results and alpha-fetoprotein; the postoperative pathological factors include maximum tumor diameter, tumor number, macrovascular invasion, microvascular invasion, satellite nodules, tumor capsule, liver cancer differentiation grade and liver cirrhosis type.
Those skilled in the art will understand from the above description that all or part of the processes in the above technical solutions can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above methods. When executed by a processor, the program also achieves the beneficial effects of the corresponding methods.
The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Example one
This embodiment provides a method for predicting postoperative recurrence of liver cancer based on a random survival forest, comprising the following steps:
S1: acquiring cases that had normal liver function on preoperative assessment, no history of malignant tumor, and no invasion of adjacent organs or distant metastasis, underwent liver cancer resection with postoperative pathology confirming hepatocellular carcinoma, and experienced recurrence after surgery;
S2: acquiring clinical data and the recurrence time of each case;
S3: presetting grouping dimensions, which comprise basic patient factors, preoperative test factors and postoperative pathological factors;
specifically, the basic patient factors include age and sex; the preoperative test factors include platelets, albumin, total bilirubin, etiological examination results and alpha-fetoprotein; the postoperative pathological factors include maximum tumor diameter, tumor number, macrovascular invasion, microvascular invasion, satellite nodules, tumor capsule, liver cancer differentiation grade and liver cirrhosis type.
S4: acquiring a data set from the clinical data, wherein the data set consists of the preset grouping dimensions of each case;
S5: constructing a corresponding liver cancer postoperative early recurrence prediction model with a random survival forest algorithm according to the data set and the recurrence time of each case. Preferably, the prediction model is constructed with the randomForestSRC package of the R language.
S6: acquiring the grouping dimensions of a case;
S7: calculating the recurrence condition of the case with the liver cancer postoperative early recurrence prediction model according to the grouping dimensions of the case.
Preferably, the recurrence condition includes a risk score, a recurrence-free probability and the corresponding recurrence-free curve.
In one embodiment, the method further comprises the following step:
S8: deploying the liver cancer postoperative early recurrence prediction model on a server and generating a corresponding prediction web page or prediction application.
Example two
Referring to fig. 2, this embodiment further refines the first embodiment:
S5 specifically includes:
S51: dividing the cases according to a preset proportion to obtain training group cases and test group cases;
S52: dividing the data set according to the training group cases and the test group cases to obtain a training group data set and a test group data set;
S53: constructing, with a random survival forest algorithm, the corresponding liver cancer postoperative early recurrence prediction model and its cumulative risk function according to the training group data set and the recurrence time of each training group case;
S54: predicting each case in the training group cases with the cumulative risk function to obtain a risk score set formed by the risk scores of the cases;
S55: dividing the risk score set according to a preset proportion to obtain the risk score ranges corresponding to a low-risk recurrence group, a medium-risk recurrence group and a high-risk recurrence group respectively.
Specifically, the risk scores of all cases in the test group are sorted from low to high. According to medical experience and rules, high-risk patients are a minority and low-risk patients account for about half, so the sorted cases are divided at 50% and 85% of the number of patients: the risk score range covering the lowest 0-50% of cases defines the low-risk recurrence group; the range covering 50%-85% of cases defines the medium-risk recurrence group; and the range covering the cases above 85% defines the high-risk recurrence group. For example, if a patient's risk score is 25 and that score falls within the range of the low-risk recurrence group, the patient is at low risk of recurrence.
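The division at 50% and 85% of the sorted cases can be sketched as a nearest-rank percentile calculation. The patent does not say which percentile convention is used, so nearest-rank is an assumption here, and the function name `cut_points` and the twenty scores are hypothetical.

```python
def cut_points(scores, percents=(50, 85)):
    """Return the score at each cumulative percentage of cases (nearest-rank)."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("scores must not be empty")
    cuts = []
    for p in percents:
        rank = max(1, (p * n + 99) // 100)   # ceil(p * n / 100), 1-based rank
        cuts.append(ordered[rank - 1])
    return tuple(cuts)

# Hypothetical risk scores for twenty cases.
scores = [12.0, 18.5, 21.0, 25.0, 30.0, 33.0, 41.0, 52.0, 60.0, 70.0,
          8.0, 15.0, 27.5, 36.0, 44.0, 48.0, 55.0, 63.0, 75.0, 90.0]
low_cut, high_cut = cut_points(scores)

# Scores <= low_cut are low risk, scores > high_cut are high risk.
n_low = sum(s <= low_cut for s in scores)
n_high = sum(s > high_cut for s in scores)
```

With these twenty scores the lower half of the cases falls at or below the first cut-off and the top 15% above the second, matching the proportions described.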
The method further comprises:
acquiring the grouping dimensions of a case;
calculating the risk score of the case with the liver cancer postoperative early recurrence prediction model according to the grouping dimensions of the case;
determining the corresponding risk group according to the risk score range into which the calculated risk score falls;
and outputting the determined risk group.
Because the prediction result of this embodiment also includes the risk level of the individual case, it is more intuitive and understandable and easier to extend to non-medical users, and is therefore more practical.
EXAMPLE III
This embodiment corresponds to the second embodiment, and the whole of the scheme is further limited, and also refer to fig. 2, the method includes:
s1: acquiring grouped cases, wherein each case meets the following requirements: liver function assessment before operation is normal, the history of malignant tumor is absent, adjacent organ invasion and distant metastasis are absent, liver cancer resection operation is performed, and postoperative pathology is proved to be hepatocellular carcinoma, and postoperative recurrence is caused;
s2: acquiring the recurrence time of each case, relevant clinical data and follow-up data, and eliminating patients with incomplete data;
s3: determining an entry dimension, comprising at least:
1. basic factors of cases: sex, age;
2. preoperative test factors: platelets, albumin, total bilirubin, etiological tests (hepatitis b, hepatitis c, others), alpha fetoprotein;
3. pathological factors after operation: tumor maximum diameter, tumor number, macroscopic vascular invasion, microvascular invasion, satellite foci, tumor envelope, liver cancer differentiation grade, and liver cirrhosis type;
acquiring a data set according to the respective corresponding dimensionalities of the cases determined by S2;
s4: dividing the data set into a training group and a testing group according to a proportion by taking case corresponding data as a unit;
s5: based on the training set data set, adopting a random survival forest algorithm, constructing a model by using a randomForestSRC program package of an R language, and selecting default parameters to form a liver cancer postoperative early relapse prediction model;
s6: predicting each patient in the test group according to the accumulated risk function of the model to obtain a corresponding risk score; wherein a greater risk score indicates a greater probability of early relapse;
s7: sorting the risk scores of all patients in the test group from low to high, wherein according to medical experience and rules, the high-risk patients account for a few, the low-risk patients account for about half, and the patients are segmented according to 50% and 85% of the number of the patients, and if two segmentation points of risk scores 32.524 and 66.511 are obtained, 0-50% of the patients are divided into a low-risk recurrence group (the corresponding risk score is less than or equal to 32.524), 50% -85% of the patients are divided into a medium-risk recurrence group (32.524< risk score is less than or equal to 66.511), and more than 85% of the patients are divided into a high-risk recurrence group (risk score is more than 66.511);
for example, if a patient has a risk score of 25, the patient is a low risk recurrence; if the risk score of one patient is 50 points, the patient is in medium risk relapse; one patient had a risk score of 71, and was a high risk recurrence.
S8: building a web page and a server with the Shiny package for the R language, and deploying the liver cancer postoperative early recurrence prediction model on the server to form a web prediction page;
S9: for a patient who meets the inclusion conditions, collecting the grouping dimensions of the patient and entering the following 15 indicators into the model prediction page through selectors and sliders: age (numeric), sex (male, female), etiology (hepatitis B, hepatitis C, other), platelets (numeric), albumin (numeric), total bilirubin (numeric), alpha-fetoprotein (numeric), tumor size (numeric), number of tumors (1, 2, 3, 4, 5 and above), microvascular invasion (present or absent), macrovascular invasion (present or absent), differentiation grade (I-II, III-IV), tumor capsule (present or absent), satellite nodules (present or absent) and liver cirrhosis (present or absent);
S10: clicking the prediction button, whereupon the server receives the web page data and applies the trained model to produce the model score, the risk group, the probability of no recurrence within 2 years and the recurrence-free curve; for example, if the risk score is greater than 66.511, the patient is in the high-risk group and the physician needs to pay special attention to optimizing postoperative treatment and follow-up.
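The grouping rule of S7 can be illustrated with a minimal sketch. It is written in Python rather than the R packages named above, and the function names `cut_points` and `risk_group` are hypothetical, introduced only for illustration. It assumes the cut point is the score of the last patient inside the lowest 50% (or 85%) of the sorted list; the embodiment does not state how ties or fractional positions are handled.

```python
def cut_points(scores, percents=(50, 85)):
    """Return the risk scores found at the given percentages of the patient count.

    Assumption: the cut point is the score of the last patient that still
    falls inside the lowest `pct` percent of the sorted scores.
    """
    ordered = sorted(scores)
    n = len(ordered)
    cuts = []
    for pct in percents:
        # Integer ceiling of pct * n / 100, at least 1, to avoid float rounding.
        k = max(1, (pct * n + 99) // 100)
        cuts.append(ordered[k - 1])
    return tuple(cuts)


def risk_group(score, low_cut, high_cut):
    """Map a risk score to a recurrence group using two cut points."""
    if score <= low_cut:
        return "low"
    if score <= high_cut:
        return "medium"
    return "high"


# Toy scores 1..20 (illustrative only, not patient data).
toy_scores = list(range(1, 21))
toy_low, toy_high = cut_points(toy_scores)

# Cut points quoted in S7 of this embodiment.
LOW_CUT, HIGH_CUT = 32.524, 66.511
for s in (25, 50, 71):
    print(s, risk_group(s, LOW_CUT, HIGH_CUT))
```

With the cut points quoted in S7, the three example scores of 25, 50 and 71 fall into the low, medium and high groups respectively, matching the example given after S7.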
Example four
This embodiment corresponds to the first to third embodiments, and provides a specific application scenario:
as shown in Fig. 3, the patient data is entered as: age 60 (Age), male (Male), HBV infection, platelets 57 × 10^9/L (plt), albumin 30 g/L (alb), total bilirubin 10 μmol/L (tbil), alpha-fetoprotein 388 ng/mL (afp), tumor size 12 cm (Tumor size), number of tumors 1 (Tumor number), microvascular invasion present (Microvascular invasion), macrovascular invasion present (Macrovascular invasion), differentiation grade I-II (Edmondson), no tumor capsule (Tumor capsule), no satellite nodules (Satellite nodules), and a background of liver cirrhosis (Liver cirrhosis);
prediction with the random survival forest algorithm of the above embodiments yields a model score of 71.39, which places the patient in the high-risk group, together with the 2-year recurrence-free curve (the curve in Fig. 3); the probabilities of no recurrence at 3, 6, 9, 12, 18 and 24 months are calculated as 66%, 44%, 33%, 26%, 18% and 14% respectively (shown above the curve in the figure, each at its corresponding time point).
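The figures reported for this patient can be read in complementary form with a short Python sketch. The helper names are hypothetical. The complement 1 − p gives the probability that recurrence has occurred by each time point. The implied cumulative hazard uses the textbook identity S(t) = exp(−H(t)), which is an assumption made here for illustration; the model's own cumulative hazard estimate may be computed differently and need not match these values.

```python
import math

# Recurrence-free probabilities reported for the example patient
# (months after surgery -> probability of no recurrence).
recurrence_free = {3: 0.66, 6: 0.44, 9: 0.33, 12: 0.26, 18: 0.18, 24: 0.14}


def recurrence_probability(p_free):
    """Probability that recurrence has occurred by the given time point."""
    return 1.0 - p_free


def implied_cumulative_hazard(p_free):
    """Cumulative hazard implied by S(t) = exp(-H(t)).

    Assumption: textbook identity, not necessarily the model's own estimator.
    """
    return -math.log(p_free)


for month, p in sorted(recurrence_free.items()):
    print(
        f"{month:>2} months: recurrence {recurrence_probability(p):.0%}, "
        f"implied cumulative hazard {implied_cumulative_hazard(p):.2f}"
    )
```

Because the recurrence-free probability falls at every reported time point, the implied cumulative hazard rises monotonically, which is consistent with a model score above the high-risk cut point of 66.511.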
Example five
In this embodiment, corresponding to the first to fourth embodiments, a computer-readable storage medium is provided, on which a computer program is stored, and the program, when executed by a processor, can implement the steps included in the method for predicting recurrence after liver cancer operation based on random survival forest of any one of the first to fourth embodiments. The detailed steps are not repeated here, and refer to the descriptions of the first to fourth embodiments for details.
In conclusion, the liver cancer postoperative recurrence prediction method based on random survival forest and the storage medium provided by the invention can accurately predict the probability of postoperative recurrence of liver cancer for an individual patient, which helps to determine postoperative precautions and supports active prevention. For medical institutions in particular, the method can help medical staff accurately screen out patients at high risk of recurrence after liver cancer surgery, which supports intervention in early recurrence and guides postoperative follow-up and treatment, thereby improving the cure rate. The prediction result is intuitive and easy to understand, the application range is wide, and the practicability is strong. The method is therefore easy to implement, convenient and fast to operate, low in cost, highly accurate, practical and easy to popularize.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.
Claims (9)
1. A method for predicting postoperative recurrence of liver cancer based on random survival forest, characterized by comprising the following steps:
acquiring clinical data and recurrence time of each case;
the preset grouping dimension comprises basic factors of a patient, preoperative inspection factors and postoperative pathological factors;
acquiring a data set according to the clinical data, wherein the data set is composed of preset grouping dimensions corresponding to each case;
and constructing a corresponding liver cancer postoperative early recurrence prediction model by adopting a random survival forest algorithm according to the data set and the recurrence time of each case.
2. The method of predicting post-operative recurrence of liver cancer based on random survival forests as claimed in claim 1, wherein the obtaining of clinical data and time of recurrence for each case further comprises:
obtaining the enrolled patients, namely patients who had a normal preoperative liver function assessment, no history of malignant tumor, no invasion of adjacent organs and no distant metastasis, who underwent liver cancer resection surgery with postoperative pathology confirming hepatocellular carcinoma, and who relapsed after surgery.
3. The method for predicting postoperative recurrence of liver cancer based on random survival forest as claimed in claim 1, wherein the step of constructing a corresponding postoperative early recurrence prediction model of liver cancer by using a random survival forest algorithm according to the data set and the recurrence time of each case comprises:
dividing each case according to a preset proportion to obtain a training group case and a testing group case;
dividing the data set according to the training group cases and the test group cases to obtain a training group data set and a test group data set;
according to the training group data set and the recurrence time of each case in the training group cases, adopting a random survival forest algorithm to construct a corresponding liver cancer postoperative early recurrence prediction model and its cumulative hazard function;
predicting each case in the training group cases by using the cumulative hazard function to obtain a risk score set;
and dividing the risk score set according to a preset proportion to obtain risk score ranges respectively corresponding to the low-risk recurrence group, the medium-risk recurrence group and the high-risk recurrence group.
4. The method of predicting post-operative recurrence of liver cancer based on random survival forests as claimed in claim 3 further comprising:
acquiring the grouping dimensions of a case;
calculating a risk score corresponding to the case through the liver cancer postoperative early recurrence prediction model according to the grouping dimensions of the case;
determining corresponding risk groups according to the risk score range to which the calculated risk score belongs;
outputting the determined risk group.
5. The method of predicting post-operative recurrence of liver cancer based on random survival forests as claimed in claim 1 further comprising:
deploying the liver cancer postoperative early recurrence prediction model on a server, and generating a corresponding prediction web page.
6. The method of predicting post-operative recurrence of liver cancer based on random survival forests as claimed in claim 1 further comprising:
acquiring the grouping dimensions of a case;
and according to the grouping dimensions of the case, calculating the recurrence condition corresponding to the case through the liver cancer postoperative early recurrence prediction model.
7. The method of predicting post-operative recurrence of liver cancer based on random survival forests as claimed in claim 6, wherein the recurrence condition comprises the risk score, the probability of no recurrence and the corresponding recurrence-free curve.
8. The method of predicting post-operative recurrence of liver cancer based on random survival forests as claimed in claim 6, wherein the basic factors of the patient include age and sex; the preoperative inspection factors include platelets, albumin, total bilirubin, etiological examination results and alpha-fetoprotein; and the postoperative pathological factors include maximum tumor diameter, number of tumors, macrovascular invasion, microvascular invasion, satellite nodules, tumor capsule, liver cancer differentiation grade and liver cirrhosis type.
9. A computer-readable storage medium, having a computer program stored thereon, wherein the program, when executed by a processor, is capable of implementing the steps included in the method for predicting post-operative recurrence of liver cancer based on random survival forest according to any one of claims 1 to 8.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010671934 | 2020-07-14 | ||
CN2020106719343 | 2020-07-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112768060A (en) | 2021-05-07 |
Family
ID=75707197
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110098484.8A Pending CN112768060A (en) | 2020-07-14 | 2021-01-25 | Liver cancer postoperative recurrence prediction method based on random survival forest and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112768060A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113517023A (en) * | 2021-05-18 | 2021-10-19 | 柳州市人民医院 | Sex-related liver cancer prognosis marker factor and screening method thereof |
CN113571194A (en) * | 2021-07-09 | 2021-10-29 | 清华大学 | Modeling method and device for hepatocellular carcinoma long-term prognosis prediction |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120034235A1 (en) * | 2009-01-22 | 2012-02-09 | Korea Institute Of Radiological & Medical Sciences | Marker for Liver-Cancer Diagnosis and Recurrence and Survival Prediction, a Kit Comprising the Same, and Prognosis Prediction in Liver-Cancer Patients Using the Marker |
US20180251851A1 (en) * | 2015-09-10 | 2018-09-06 | Mathias HEIKENWALDER | Ectopic lymphoid structures as targets for liver cancer detection, risk prediction and therapy |
CN110660481A (en) * | 2019-09-27 | 2020-01-07 | 颐保医疗科技(上海)有限公司 | Artificial intelligence technology-based primary liver cancer recurrence prediction method |
CN110791565A (en) * | 2019-09-29 | 2020-02-14 | 浙江大学 | Prognostic marker gene for colorectal cancer recurrence prediction in stage II and random survival forest model |
CN110993106A (en) * | 2019-12-11 | 2020-04-10 | 深圳市华嘉生物智能科技有限公司 | Liver cancer postoperative recurrence risk prediction method combining pathological image and clinical information |
Non-Patent Citations (1)
Title |
---|
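Chen Kai et al.: "Analysis of risk factors for early recurrence after radical resection of liver cancer and construction of a prediction model", Chinese Journal of Cancer Prevention and Treatment * |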
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210507 |