CN117116357A - Bragg treatment immune response prediction method and device
- Publication number
- CN117116357A
- Authority
- CN
- China
- Prior art keywords
- immune response
- data
- sample
- patient
- treatment
- Legal status
- Pending
Classifications
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
- G06F18/24323—Tree-organised classifiers
- G06F18/27—Regression, e.g. linear or logistic regression
- G16B40/30—Unsupervised data analysis
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a Bragg treatment immune response prediction method and device. The method comprises: acquiring characteristic data of a target patient, the characteristic data comprising peripheral blood index data and image data; and inputting the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result. The immune response prediction model is obtained by training a random forest model on an optimal feature subset of patient samples, and the optimal feature subset is obtained by processing the feature data of the patient samples with a logistic regression algorithm. By using a pre-trained prediction model to predict the immune response and survival of patients receiving Bragg treatment, the invention can noninvasively identify, with relatively accurate predictions, tumor patients likely to benefit from the Bragg regimen, and provides data support for early, precise, individualized intervention for different patients at different stages.
Description
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a Bragg treatment immune response prediction method and device.
Background
Tumor immunotherapy (anti-PD-1 or anti-PD-L1) has made breakthrough progress in recent years and is widely used to treat various advanced solid tumors, but the response rate of immunotherapy alone is only 15-25%. Patent CN111951897A therefore proposed a method of predicting the responsiveness of cancer patients to anti-PD-1/PD-L1 immunotherapy: the treatment effect is predicted before the first treatment, helping the patient decide whether to receive immunotherapy, or the effect of a previous treatment is assessed before the next one, helping the patient decide whether to continue immunotherapy. The method comprises: obtaining a peripheral blood sample from the cancer patient before immunotherapy; detecting the number of immune cells in the peripheral blood sample; and comparing the immune cell number with a first threshold to predict whether the patient will benefit from immunotherapy. The first threshold is determined by statistically analyzing the correlation between the number of immune cells in a group of cancer patients and the expected risk of disease progression in that group, and taking the statistically significant values that define this correlation. The immune cells exhibit at least one of the following markers: PD1, CD8, CD4, IFN-gamma, TIM3, LAG3, CD25, TGF-beta.
This approach has several problems. First, the peripheral blood indices are not screened: some immune cells are selected by hand, and no index reflecting the condition of the tumor or the patient is included, even though the immune response is a contest between the host and the tumor and can only be estimated accurately by examining both sides. Second, when immune indices are used for prediction, only the performance of each index on its own is considered; the influence of other indices is ignored, and it is unknown whether the indices are correlated or how such correlations affect the predictors. Third, different tumor types require different predictive indices, which complicates clinical use.
A search further found that, to predict the response of cancer patients to anticancer therapy, F. Hoffmann-La Roche used data from the Flatiron Health database to conduct survival analyses of 99,249 people in 12 cohorts (RoPro 1) and 110,538 people in 15 cohorts (RoPro 2), the cohorts being defined by tumor type, and validated the results in two independent clinical studies. Cancer patient information is input into a model to generate a score indicating the patient's risk of mortality, where the patient information includes data for each of the following parameters: (i) serum or plasma albumin level; (ii) Eastern Cooperative Oncology Group (ECOG) performance status; (iii) ratio of lymphocytes to leukocytes in blood; (iv) smoking status; (v) age; (vi) TNM stage of the malignancy; (vii) heart rate; (viii) serum or plasma chloride or sodium level; (ix) serum or plasma urea nitrogen level; (x) sex; (xi) blood hemoglobin or hematocrit level; (xii) serum or plasma aspartate aminotransferase activity; and (xiii) serum or plasma alanine aminotransferase activity. However, in the preliminary data screening each parameter is analyzed separately, without considering how correlations between parameters affect the predicted variable. Moreover, the parameters cover not only blood tests but also basic patient information, including factors that are subjective or vary greatly with the patient's activity state, such as the ECOG performance score, blood pressure and heart rate, so the model's predictions may deviate considerably.
In addition, for a given tumor type the data pool patients receiving different treatments, so the response to immunotherapy cannot be predicted accurately; and the distinctive hematological response that may characterize immunotherapy, especially dynamic changes in lymphocytes and their subtypes, is not captured.
Combining immunotherapy with other modalities, such as radiotherapy, chemotherapy, targeted therapy and cytokine therapy, is currently a means of increasing its efficacy. Bragg treatment is an important immune combination approach that combines a PD-1 inhibitor, radiotherapy and granulocyte-macrophage colony-stimulating factor (PRaG: PD-1 inhibitor, Radiotherapy and GM-CSF) and has shown good efficacy in treating advanced refractory tumors. Some of the related research results have been filed as patent applications, such as a colon cancer peritoneal metastasis mouse model for evaluating the efficacy of immunotherapy (patent application CN202110311772.7) and a Bragg decision scheme evaluation method and device (patent application CN202210774618.8).
However, not all patients benefit significantly from this regimen: the PRaG 1.0 study showed a median progression-free survival (mPFS) of 4.0 months, a disease control rate (DCR) of 46.3% and an objective response rate (ORR) of only 16.7%.
Therefore, those skilled in the art need a method and device for predicting the immune response and survival of patients receiving Bragg treatment, so that tumor patients likely to benefit from Bragg treatment can be identified with relatively accurate predictions, and data support can be provided for early, precise, individualized intervention for different patients at different stages.
Disclosure of Invention
The invention provides a tumor immune response prediction method and device with which the immune response and survival of patients receiving Bragg treatment can be predicted by a pre-trained prediction model, so that tumor patients likely to benefit from the Bragg regimen can be identified noninvasively with relatively accurate predictions, and data support is provided for early, precise, individualized intervention for different patients at different stages.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
The present invention provides a method for predicting the immune response of a patient receiving Bragg treatment, the method comprising:
acquiring characteristic data of a target patient, wherein the characteristic data comprises peripheral blood index data and image data;
inputting the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result;
the immune response prediction model is obtained by training an optimal feature subset of a patient sample based on a random forest model, and the optimal feature subset is obtained by processing feature data of the patient sample through a regression algorithm.
In some embodiments, obtaining the immune response prediction model by training a random forest model on an optimal feature subset of patient samples specifically comprises:
collecting characteristic data of a patient sample, wherein the characteristic data comprises peripheral blood index data and image data;
performing noise reduction on the feature data, and screening the denoised feature data by regression analysis to obtain an optimal feature subset;
building, based on the optimal feature subset, a training set $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_m,y_m)\}$ containing $m$ samples;
constructing a random forest (a plurality of decision trees) based on the training set.
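A minimal sketch of the last two steps, assuming the selected features are identified by column indices and that each decision tree is trained on a bootstrap resample of $D$ (standard bagging; the text does not say how the per-tree training data are drawn). The names `build_training_set` and `bootstrap_samples` are hypothetical, and training the individual trees is omitted.

```python
import random
from typing import List, Sequence, Tuple

# One training example: (feature vector restricted to the selected features, label)
Sample = Tuple[List[float], int]


def build_training_set(X: Sequence[Sequence[float]], y: Sequence[int],
                       selected: Sequence[int]) -> List[Sample]:
    """Build D = {(x_1, y_1), ..., (x_m, y_m)} using only the selected feature columns."""
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    return [([row[j] for j in selected], label) for row, label in zip(X, y)]


def bootstrap_samples(D: Sequence[Sample], n_trees: int,
                      seed: int = 0) -> List[List[Sample]]:
    """Draw one bootstrap resample (m draws with replacement) per decision tree."""
    rng = random.Random(seed)
    m = len(D)
    return [[D[rng.randrange(m)] for _ in range(m)] for _ in range(n_trees)]


# Usage: keep features 0 and 2 of a three-feature table, then resample for three trees
D = build_training_set([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [0, 1], selected=[0, 2])
print(D)  # [([1.0, 3.0], 0), ([4.0, 6.0], 1)]
resamples = bootstrap_samples(D, n_trees=3)
```

Bootstrap resampling gives each tree a slightly different view of the same $m$ samples, which is what decorrelates the trees in a bagged ensemble.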
In some embodiments, the output result of the random forest is calculated from the outputs of its decision trees using a first expression;
the first expression is:
$$H(x)=\arg\max_{y\in Y}\sum_{t=1}^{T}\mathbb{I}\left(h_t(x)=y\right)$$
wherein $H(x)$ represents the output result of the random forest, $y$ represents a sample label, $Y$ represents the set of label classes, $T$ represents the number of decision trees in the random forest, $t$ indexes the decision trees in the forest, $h_t(x)$ represents the prediction of the $t$-th decision tree for the sample label, $\mathbb{I}(\cdot)$ is the indicator function, and $x$ represents the feature vector corresponding to each sample.
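The plurality vote over trees, $H(x)=\arg\max_{y\in Y}\sum_{t=1}^{T}\mathbb{I}(h_t(x)=y)$, can be sketched as follows. Each tree is modeled as any callable that maps a feature vector to a label; `forest_predict` is a hypothetical name, and breaking ties by the order of the label set is an assumption the text does not address.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

Tree = Callable[[Sequence[float]], Hashable]


def forest_predict(trees: Sequence[Tree], x: Sequence[float],
                   labels: Sequence[Hashable]) -> Hashable:
    """Return argmax over y in Y of the number of trees t with h_t(x) == y."""
    votes = Counter(tree(x) for tree in trees)  # sum_t I(h_t(x) = y) for each y
    # max() keeps the first label with the highest count, so ties follow the order of `labels`
    return max(labels, key=lambda label: votes[label])


# Usage with three toy "trees" that threshold different features
trees = [
    lambda x: 1 if x[0] > 0.5 else 0,
    lambda x: 1 if x[1] > 0.5 else 0,
    lambda x: 1 if x[0] + x[1] > 1.5 else 0,
]
print(forest_predict(trees, [0.9, 0.8], labels=[0, 1]))  # 1 (all three trees vote 1)
```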
In some embodiments, screening the denoised feature data with the regression analysis algorithm to obtain the optimal feature subset specifically comprises:
applying LASSO-style L1 regularization to logistic regression;
solving for the parameters of the objective optimization function by gradient descent, and screening the features according to the weight assigned to each feature to obtain the optimal feature subset.
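A sketch of this screening step under stated assumptions: binary 0/1 labels, the mean logistic loss plus $\lambda\sum_j |w_j|$ as the objective, and an unpenalized bias. Because the L1 term is not differentiable at zero, plain gradient descent rarely drives weights exactly to zero, so this sketch swaps in proximal gradient descent (ISTA): a gradient step on the logistic loss followed by soft-thresholding. That is one common way to solve the problem described, not necessarily the authors' solver. All names, the step size `lr` and the iteration count are illustrative assumptions. Features whose weight ends at exactly zero are discarded; the rest form the feature subset.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink v toward 0 and zero it if |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int], lam: float,
                   lr: float = 0.1, n_iter: int = 2000) -> Tuple[List[float], float]:
    """Fit L1-regularized logistic regression by proximal gradient descent (ISTA)."""
    m, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(n_iter):
        # Residuals h_theta(x_i) - y_i of the current model
        r = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
             for xi, yi in zip(X, y)]
        # Gradients of the mean logistic loss with respect to w and b
        grad_w = [sum(r[i] * X[i][j] for i in range(m)) / m for j in range(k)]
        grad_b = sum(r) / m
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b  # the bias is not penalized
    return w, b


def select_features(w: Sequence[float]) -> List[int]:
    """Indices of features whose weight survived the L1 shrinkage."""
    return [j for j, wj in enumerate(w) if wj != 0.0]


# Usage: feature 0 separates the classes, feature 1 is balanced noise
X = [[-2, 1], [-2, -1], [-1, 1], [-1, -1], [1, 1], [1, -1], [2, 1], [2, -1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = lasso_logistic(X, y, lam=0.1)
print(select_features(w))  # [0]
```

Raising `lam` shrinks more weights to zero; with a large enough value every feature is dropped, so in practice $\lambda$ would be tuned, for example by cross-validation.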
In some embodiments, all sample labels $y$ in the training set are predicted using a second expression;
the second expression is:
$$h_\theta(x)=\frac{1}{1+e^{-(w^{\top}x+b)}}$$
wherein $\theta=(w,b)$ represents the target parameters, $h_\theta(x)$ represents the predicted probability that the sample label $y$ equals 1 (for classification, the label is assigned 1 or 0 according to this probability), $x$ represents the feature vector corresponding to each sample, $w$ represents the weight coefficients, and $b$ represents the bias coefficient.
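A minimal sketch of the logistic prediction: $h_\theta(x)$ is the logistic function of $w^{\top}x+b$, and a 0/1 label is assigned by thresholding that probability. The 0.5 cut-off and the function names are assumptions, not values taken from the text.

```python
import math
from typing import Sequence


def h_theta(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Probability that the label is 1: 1 / (1 + exp(-(w.x + b)))."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    if z >= 0:  # numerically stable form of the logistic function
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_label(x: Sequence[float], w: Sequence[float], b: float,
                  threshold: float = 0.5) -> int:
    """Assign label 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if h_theta(x, w, b) >= threshold else 0


print(h_theta([0.0, 0.0], [1.0, -1.0], 0.0))  # 0.5
print(predict_label([2.0, 0.5], [1.0, -1.0], 0.0))  # 1, since w.x + b = 1.5 > 0
```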
In some embodiments, with the goal of minimizing the sum of the logistic regression loss and the regularization term, the optimization function is constructed as:
$$\min_{\theta}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y_i\log h_\theta(x_i)+(1-y_i)\log\left(1-h_\theta(x_i)\right)\right]+\lambda\sum_{j=1}^{k}\lvert w_j\rvert$$
wherein $m$ represents the number of samples in the training set, $\theta=(w,b)$ represents the target parameters, $h_\theta(x_i)$ represents the conditional probability of the label value given the $i$-th sample, $\lambda$ represents the regularization parameter, $x_i$ represents the $i$-th sample, $y_i$ represents the label of the $i$-th sample, $k$ represents the number of features, and $w_j$ represents the weight coefficient of the $j$-th feature.
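The objective can be evaluated directly, which is useful for checking that an optimizer actually decreases it. This sketch assumes the mean ($1/m$) logistic loss, binary 0/1 labels and an unpenalized bias $b$; the clipping constant `eps`, which guards against $\log 0$, and the function name are illustrative assumptions.

```python
import math
from typing import Sequence


def lasso_logistic_objective(X: Sequence[Sequence[float]], y: Sequence[int],
                             w: Sequence[float], b: float, lam: float,
                             eps: float = 1e-12) -> float:
    """Mean logistic loss over the m samples plus lam * sum_j |w_j|."""
    m = len(X)
    loss = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        # h_theta(x_i), computed in a numerically stable way
        if z >= 0:
            p = 1.0 / (1.0 + math.exp(-z))
        else:
            e = math.exp(z)
            p = e / (1.0 + e)
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return loss / m + lam * sum(abs(wj) for wj in w)


# With all weights zero every prediction is 0.5, so the objective is log 2 for any lam
print(lasso_logistic_objective([[1.0, 2.0], [3.0, 4.0]], [0, 1], [0.0, 0.0], 0.0, lam=0.5))  # ≈ 0.6931
```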
In some embodiments, collecting the characteristic data of the patient samples includes collecting characteristic data of case samples that did not receive the targeted therapy and characteristic data of case samples that did receive it.
The present invention also provides a Bragg treatment immune response prediction device, the device comprising:
the data acquisition unit is used for acquiring characteristic data of a target patient, wherein the characteristic data comprises peripheral blood index data and image data;
the result output unit is used for inputting the characteristic data into a pre-trained immune response prediction model so as to obtain an immune response prediction result;
the immune response prediction model is obtained by training an optimal feature subset of a patient sample based on a random forest model, and the optimal feature subset is obtained by processing feature data of the patient sample through a regression algorithm.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as described above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as described above.
According to the tumor immune response prediction method provided by the invention, the characteristic data of a target patient are obtained, wherein the characteristic data comprise peripheral blood index data and image data; inputting the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result; the immune response prediction model is obtained by training an optimal feature subset of a patient sample based on a random forest model, and the optimal feature subset is obtained by processing feature data of the patient sample through a regression algorithm.
The beneficial technical effects of the invention are as follows:
The biomarkers currently used to predict immunotherapy response are mainly tumor PD-L1 expression, tumor mutational burden (TMB) and microsatellite instability (MSI), but the predictive power of all three is unsatisfactory. Multi-parameter immune prediction models that combine various genetic features improve prediction to some extent, but they depend heavily on tumor tissue biopsy, which is invasive and limited by tumor accessibility, the patient's condition and the patient's willingness. In addition, tumor tissue is heterogeneous, so the immune response within a single lesion does not represent the systemic anti-tumor immune state, and blood-based sequencing is too costly for wide clinical application. Peripheral blood, by contrast, is the most common clinical sample: it is easy to obtain, causes little harm to the patient and can be regarded as essentially noninvasive, allows dynamic monitoring, and is therefore highly applicable in the clinic. The invention uses peripheral blood to predict the patient's immune response or survival, providing an important reference for clinical treatment decisions so that patients benefit as much as possible.
There are currently no studies or reports that model the prediction of immune response or survival using peripheral blood indices alone. Research exploring immune-related factors mostly applies basic statistical methods such as the t-test, the rank-sum test, the chi-square test and correlation coefficients to find features associated with immune response. These methods have unavoidable shortcomings: the t-test, rank-sum test and chi-square test ignore correlations among features, and correlation-coefficient selection considers only linear relationships between a feature and the target variable. Moreover, such methods can only show that certain features may be associated with immune response; they cannot predict immune response from a single feature or from a combination of features. The invention uses only peripheral blood indices, and the selected Lasso regression method, when screening features, automatically discards features with little influence on the predicted variable and retains those with relatively large influence, so that the influence of all features on the predicted variable is assessed as a whole. In addition, the random forest is not only a classification model: when constructing each decision tree it uses a random subset of the screened features, which amounts to a second round of feature screening and helps further mine the underlying relationship between the predicted variable and the selected features.
The Bragg series of studies are original studies by Professor Zhang Liyuan's research group. At present, no model is used to predict the immune response of patients receiving Bragg treatment, and no established clinical index can predict this response. The invention is the first to screen peripheral blood indices and build a corresponding prediction model to predict tumor immune response and survival.
In the invention, a random forest model is used to analyze the hematology data of cases that received Bragg treatment and to build a model; once the relevant hematology data of a new patient are input, the likely effect of the treatment on that patient can be predicted. Because the treatment response is predicted from nearly noninvasive hematology tests, the frequency of medical imaging examinations and the associated potential radiation damage are greatly reduced, lowering the burden on patients and society and reducing the waste of medical resources.
By predicting the immune response and survival of Bragg-treated patients with a pre-trained prediction model, the invention can noninvasively identify, from accurate prediction results, tumor patients likely to benefit from Bragg regimen treatment, and provides data support for early, precise, individualized intervention for different patients at different stages.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only and that other implementations can be obtained from the extensions of the drawings provided without inventive effort.
The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the ambit of the technical disclosure.
FIG. 1 is a first flowchart of the method for predicting the immune response to Bragg treatment according to the present invention;
FIG. 2 is a second flowchart of the method for predicting the immune response to Bragg treatment according to the present invention;
FIG. 3 is a third flowchart of the method for predicting the immune response to Bragg treatment according to the present invention;
FIG. 4 is a graph showing the effect of the method for predicting the immune response to Bragg treatment according to the present invention;
FIG. 5 is a schematic diagram of the method for predicting the immune response to Bragg treatment according to the present invention;
FIG. 6 is a block diagram of the Bragg treatment immune response prediction device according to the present invention;
FIG. 7 is a block diagram of a computer device according to the present invention.
Detailed Description
Other advantages and features of the present invention will become apparent to those skilled in the art from the following detailed description, which describes some, but not all, embodiments of the invention by way of illustration. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
The invention provides a Bragg treatment immune response prediction method, which is used for predicting the immune response of a new patient by using a pre-trained model and providing data support for the selection of a subsequent treatment scheme according to the prediction result.
Referring to FIG. 1, FIG. 1 is a flowchart of the method for predicting the immune response to Bragg treatment according to the present invention.
In one embodiment, the present invention provides a method of predicting an immune response to Bragg treatment comprising the steps of:
S110: acquiring characteristic data of a target patient, wherein the characteristic data comprises peripheral blood index data and image data;
Patient and data requirements for feature data acquisition: 1) the patient received at least one course of treatment with a Bragg regimen (1.0-3.0); 2) hematology test data are available at baseline (within 28 days before the first treatment) and within 28 days before the cycle 2 and cycle 3 treatments, attributable to the preceding treatment cycle; 3) if only one Bragg regimen treatment was given, baseline data and hematology data within 8 weeks after treatment are required; 4) at least one imaging evaluation result is available; 5) if several data records exist at baseline or within 28 days before the cycle 2 or cycle 3 treatment, only the first record detected at admission (i.e., before any treatment) is used; 6) missing data are not imputed: if any index in a data record is missing, that record is not used for model construction.
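Rules 2), 5) and 6) above amount to a small record-selection procedure. The Python sketch below shows one possible reading of them; the record layout (a dict with "date" and "values"), the function name `pick_record`, and the exact window boundaries are assumptions. When several records fall in the window it keeps the earliest one and then discards it if it is incomplete, rather than falling back to a later complete record.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=28)  # "within 28 days before" a treatment (rules 2 and 5)


def pick_record(records, treatment_day):
    """Return the record used for one time point, or None if it must be dropped.

    Assumptions (not specified in the source): each record is a dict with a
    "date" and a "values" mapping, and the window is
    [treatment_day - 28 days, treatment_day), i.e. draws taken on the
    treatment day itself are excluded.
    """
    in_window = [r for r in records
                 if treatment_day - WINDOW <= r["date"] < treatment_day]
    if not in_window:
        return None
    # Rule 5: if several records fall in the window, keep only the first one.
    first = min(in_window, key=lambda r: r["date"])
    # Rule 6: no imputation; a record with any missing index is not used.
    if any(v is None for v in first["values"].values()):
        return None
    return first
```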
The specific Bragg regimens (1.0-3.0) are as follows:
Bragg 1.0 (PRaG 1.0): enrolled patients receive hypofractionated radiotherapy (5 or 8 Gy × 2-3 fractions) to a suitable lesion; starting on the day after radiotherapy ends, GM-CSF (200 µg/d) is injected subcutaneously for 14 days, and a PD-1 inhibitor is given within one week after radiotherapy ends. Each treatment cycle lasts three weeks, and later cycles may irradiate different target lesions. The triple therapy is given for at least 2 cycles (until no lesion is suitable for irradiation or the tolerance dose of normal tissue is reached); after 6 cycles of PD-1 inhibitor followed sequentially by GM-CSF and IL-2, the PD-1 inhibitor may be continued alone as maintenance (one cycle every 21 days) until disease progression or intolerable toxicity.
Bragg 2.0 (PRaG 2.0): enrolled patients receive hypofractionated radiotherapy (10-24 Gy/5-8 Gy/2-3 fractions) to a suitable lesion; GM-CSF 200 µg is injected subcutaneously once daily (QD) for 7 days starting on the first day of radiotherapy; a PD-1 inhibitor is given within one week after radiotherapy ends; and 24 hours after GM-CSF is completed, IL-2 2,000,000 IU is injected subcutaneously QD for 7 days. Each cycle lasts 21 days. After at least 2 cycles of treatment, and a total of 6 cycles of PD-1 inhibitor combined with GM-CSF and IL-2, the PD-1 inhibitor may be continued alone as maintenance (one cycle every 21 days) until disease progression or intolerable toxicity.
Bragg 3.0 (PRaG 3.0): enrolled patients receive an intravenous infusion of RC-48 ADC 2.0 mg/kg on day 1; hypofractionated radiotherapy (10-24 Gy/5-8 Gy/2-3 fractions) to a suitable lesion begins on day 3; a PD-1/PD-L1 inhibitor is given within one week after radiotherapy ends; GM-CSF 200 µg is injected subcutaneously QD for 5 days starting on the first day of radiotherapy; and IL-2 2,000,000 IU is injected subcutaneously QD for 5 days starting on the day after GM-CSF ends. RC-48 ADC combined with radiotherapy and a PD-1/PD-L1 inhibitor, followed sequentially by GM-CSF and IL-2, is given for at least 2 cycles; after 6 weeks of subsequent RC-48 ADC and PD-1/PD-L1 inhibitor followed sequentially by GM-CSF and IL-2, the PD-1/PD-L1 inhibitor may be continued alone as maintenance until disease progression or intolerable toxicity.
Peripheral blood index data and their acquisition:
In the invention, 70 peripheral blood indices are used as initial features, and the test results come from the corresponding reports of the clinical laboratory of the Second Affiliated Hospital of Soochow University. Of these, 35 indices were selected from the peripheral blood tests (complete blood count, biochemistry, coagulation, tumor markers, lymphocyte subset analysis and cytokines) routinely performed before treatment on patients in the three studies Bragg 1.0, Bragg 2.0 and Bragg 3.0. They were selected because previous literature reported them as possibly related to tumor immunity, and specifically include: T lymphocyte ratio, T helper/inducer lymphocyte ratio, B lymphocyte ratio, NK cell ratio, T lymphocyte absolute count, T helper/inducer lymphocyte absolute count, T killer/suppressor lymphocyte absolute count, B lymphocyte absolute count, NK cell absolute count, interleukin-2, interleukin-4, interleukin-6, interleukin-10, interleukin-17A, tumor necrosis factor, interferon-γ, carcinoembryonic antigen, albumin, lactate dehydrogenase, white blood cell count, lymphocyte ratio, neutrophil ratio, lymphocyte count, neutrophil count, monocyte ratio, NLR, eosinophil count, eosinophil ratio, basophil count, basophil ratio, international normalized ratio, D-dimer and fibrinogen. The other 35 indices are the dynamic change values of these 35 indices, i.e., the ratio of an index's value after Bragg treatment to its value before Bragg treatment (baseline).
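The 70 initial features are therefore the 35 measured indices plus one ratio per index. A minimal Python sketch, assuming the static part is taken from the baseline draw and that both draws are given as dicts keyed by index name; the function name and the `_change` suffix are hypothetical. Because the ratio is undefined for a zero baseline, the sketch raises an error in that case instead of guessing a value.

```python
def build_features(baseline, before_cycle):
    """Combine the static indices with their dynamic change values.

    Assumes `baseline` and `before_cycle` map the same index names to numbers;
    the static part is taken from the baseline draw (an assumption).
    """
    features = dict(baseline)
    for name, base_value in baseline.items():
        if base_value == 0:
            raise ValueError(f"baseline value of {name!r} is 0; ratio undefined")
        # Dynamic change value = value after treatment / baseline value.
        features[f"{name}_change"] = before_cycle[name] / base_value
    return features
```

With 35 indices in each dict, the result has 70 entries.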
Acquisition method: laboratory data are collected manually by team members and research assistants, according to the hospital numbers and related information registered for the clinical trials, from the laboratory information system and imaging system of the Second Affiliated Hospital of Soochow University; imaging evaluation results are assessed by imaging specialists and clinical trial physicians.
S120, inputting the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result;
the immune response prediction model is obtained by training an optimal feature subset of a patient sample based on a random forest model, and the optimal feature subset is obtained by processing feature data of the patient sample through a regression algorithm.
In some embodiments, the immune response prediction model is obtained by training a random forest model on an optimal feature subset of patient samples, as shown in FIG. 2, specifically including the following steps:
s210: characteristic data of a patient sample is collected, wherein the characteristic data comprises peripheral blood index data and image data.
In some embodiments, collecting the characteristic data of the patient samples includes collecting characteristic data of case samples that did not receive the target therapy and of case samples that did. During data collection and preprocessing, for example, one can collect from the database of a target hospital, over a given period, data for patients with advanced refractory solid tumors enrolled in three clinical trials of regimens based on "PD-1 inhibitor combined with hypofractionated radiotherapy and cytokines (GM-CSF ± IL-2)": 70 indices, namely 35 peripheral blood indices measured at baseline and before cycle 2-3 treatment plus their dynamic change values (the ratio of the value before cycle 2-3 treatment to the baseline value), together with each patient's first imaging evaluation result (classified as SD, PR, CR or PD according to the Response Evaluation Criteria in Solid Tumors, RECIST 1.1). Missing data are not imputed: if any index in a data record is missing, that record is not used for model construction.
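The first imaging evaluation serves as the training label. Following the grouping stated later in this description (SD/PR/CR as response, PD as non-response), a minimal Python mapping might look like this; the dictionary and function names are illustrative only.

```python
# RECIST 1.1 category -> binary training label (1 = response, 0 = non-response)
RECIST_TO_LABEL = {"CR": 1, "PR": 1, "SD": 1, "PD": 0}


def to_label(recist_category):
    """Map a first imaging evaluation result to the binary response label."""
    key = recist_category.strip().upper()
    if key not in RECIST_TO_LABEL:
        raise ValueError(f"unknown RECIST 1.1 category: {recist_category!r}")
    return RECIST_TO_LABEL[key]
```

For example, `[to_label(c) for c in ["PR", "sd", "PD"]]` gives `[1, 1, 0]`.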
The 35 peripheral blood indices may be blood indices reported in previous literature as possibly related to immune response and measured in the clinical trials, for example: T lymphocyte ratio, T helper/inducer lymphocyte ratio, B lymphocyte ratio, NK cell ratio, T lymphocyte absolute count, T helper/inducer lymphocyte absolute count, T killer/suppressor lymphocyte absolute count, B lymphocyte absolute count, NK cell absolute count, interleukin-2, interleukin-4, interleukin-6, interleukin-10, interleukin-17A, tumor necrosis factor, interferon-γ, carcinoembryonic antigen, albumin, lactate dehydrogenase, white blood cell count, lymphocyte ratio, neutrophil ratio, lymphocyte count, neutrophil count, monocyte ratio, NLR, eosinophil count, eosinophil ratio, basophil count, basophil ratio, international normalized ratio, D-dimer and fibrinogen.
S220: performing noise reduction on the feature data, and screening the denoised feature data by regression analysis to obtain an optimal feature subset;
S230: building a training set $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_m,y_m)\}$ containing m samples based on the optimal feature subset;
S240: based on the training set, a random forest is constructed.
S250: calculating the output result of the random forest from the outputs of the individual decision trees using a first expression, the first expression being:

$$H(x)=\underset{y\in\mathcal{Y}}{\arg\max}\sum_{t=1}^{T}\mathbb{I}\big(h_t(x)=y\big)$$

where $H(x)$ denotes the output result, $y$ a sample label, $\mathcal{Y}$ the set of label classes, $T$ the number of decision trees in the random forest, $t$ the index of a decision tree in the forest, $h_t(x)$ the prediction of the $t$-th decision tree for the sample label $y$, $\mathbb{I}(\cdot)$ the indicator function, and $x$ the feature vector of each sample.
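The first expression is plain majority voting over the T trees. A minimal Python sketch for a single sample, assuming the per-tree predictions $h_1(x),\ldots,h_T(x)$ are already available as a list; the function name is hypothetical and the tie-breaking rule is a choice made here, not one stated in the source.

```python
from collections import Counter


def forest_vote(tree_predictions):
    """First expression: H(x) = argmax_y sum_t I(h_t(x) = y) for one sample.

    `tree_predictions` holds h_1(x), ..., h_T(x). Ties go to the label that
    appears first in the list (Counter keeps insertion order and max returns
    the first maximal key).
    """
    counts = Counter(tree_predictions)  # counts[y] = sum_t I(h_t(x) = y)
    return max(counts, key=counts.get)
```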
Random forest (RF) is an extension of the Bagging algorithm: on top of building a Bagging ensemble with decision trees as base learners, it further introduces random attribute selection into the training of each decision tree. Using random forests for prediction has several benefits: random forests can handle missing values while maintaining high accuracy, and important features can be identified from the training data set while the forest is being built. When choosing a splitting attribute, a traditional decision tree selects the optimal attribute from the full attribute set of the current node according to some criterion; in a random forest, for each node of a decision tree, a subset containing several attributes is first drawn at random from the node's attribute set, and the optimal attribute is then selected from this subset for splitting.
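The two sources of randomness in a random forest, bootstrap sampling of the training samples (detailed in the next paragraph) and a random attribute subset at each node, can be sketched in plain Python as below. The Gini criterion mirrors what CART uses; the subset size k = max(1, floor(log2 d)) is a commonly recommended value assumed here, since the source does not state its k (scikit-learn's default is the square root of d). All function names are hypothetical.

```python
import math
import random


def bootstrap_indices(m, rng):
    """Bootstrap sampling: draw m indices with replacement from range(m)."""
    return [rng.randrange(m) for _ in range(m)]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(X, y, j):
    """Lowest weighted Gini impurity over splits of the form x[j] <= t."""
    best = (float("inf"), None)
    for t in sorted({row[j] for row in X}):
        left = [yi for row, yi in zip(X, y) if row[j] <= t]
        right = [yi for row, yi in zip(X, y) if row[j] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[0]:
            best = (score, t)
    return best


def choose_split(X, y, rng):
    """Random-forest node split: search only k random attributes, not all d."""
    d = len(X[0])
    k = max(1, int(math.log2(d)))
    best = None
    for j in rng.sample(range(d), k):
        score, t = best_threshold(X, y, j)
        if best is None or score < best[0]:
            best = (score, j, t)
    return best  # (weighted Gini, attribute index, threshold)
```

On large samples, roughly 1/e ≈ 36.8% of the original samples never appear in a given bootstrap sample.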
Specifically, given the training set $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_m,y_m)\}$, bootstrap sampling is used to obtain a sample set containing m samples, in which some samples of the initial training set appear multiple times while others never appear. The Bagging algorithm constructs multiple decision trees as follows:

$$h_t=F\left(D,\mathcal{D}_{bs}\right),\quad t=1,2,\ldots,T$$

where $F$ denotes the decision tree learning algorithm (CART decision trees by default), $T$ the number of decision trees in the random forest, and $\mathcal{D}_{bs}$ the sample distribution generated by bootstrap sampling. When combining the predictions of multiple decision trees, simple majority voting is usually used to determine the output result:

$$H(x)=\underset{y\in\mathcal{Y}}{\arg\max}\sum_{t=1}^{T}\mathbb{I}\big(h_t(x)=y\big)$$
When constructing the random forest in the invention, the feature subset obtained by Lasso feature selection is used and the number of decision trees in the forest is set to T = 100. S-fold cross-validation is used to generate 10 different random forest models, and the average results of the cross-validation models on the test set are shown in Table 1:
Table 1 Random forest test results based on cross-validation with the optimal features
Metric | ROC AUC | Accuracy | Precision | Recall | F1 |
Value | 0.868 | 0.800 | 0.845 | 0.842 | 0.838 |
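Table 1 reports five standard binary-classification metrics averaged over the cross-validation models. As a reference for how such numbers are typically computed, here is a plain-Python sketch; the 0.5 decision threshold and treating the response class as positive (label 1) are assumptions, and the AUC formula needs both classes present in the fold. Because each metric is averaged over folds separately, the averaged F1 need not equal the harmonic mean of the averaged precision and recall (0.845 and 0.842 would give about 0.843), which may account for the slightly lower 0.838 in the table.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """ROC AUC, accuracy, precision, recall and F1 for one test fold (1 = response)."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # ROC AUC = probability that a random positive scores above a random negative
    # (ties count 1/2); this is the Mann-Whitney form of the area under the ROC curve.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return {"roc_auc": auc, "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


def average_over_folds(fold_results):
    """Average each metric separately over the cross-validation folds."""
    return {key: sum(r[key] for r in fold_results) / len(fold_results)
            for key in fold_results[0]}
```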
In step S220, the denoised feature data are screened by regression analysis to obtain an optimal feature subset, as shown in FIG. 3, specifically including the following steps:
S310: predicting all sample labels y in the training set to obtain a prediction result for each sample label, using a second expression:

$$h_\theta(x)=\frac{1}{1+e^{-(w^{\top}x+b)}}$$

where $\theta=(w,b)$ denotes the target parameters, $h_\theta(x)$ the prediction result for the sample label $y$ (the estimated probability that $y=1$), $x$ the feature vector of each sample, $w$ the weight coefficients, and $b$ the bias coefficient.
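The second expression is the logistic (sigmoid) model. A small Python sketch, written so that large negative scores do not overflow; the function names and the 0.5 cut-off for turning probabilities into labels are illustrative assumptions.

```python
import math


def predict_proba(w, b, x):
    """h_theta(x) = 1 / (1 + exp(-(w.x + b))), the estimated P(y = 1 | x)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # equivalent form for z < 0 that avoids overflow in exp(-z)
    return e / (1.0 + e)


def predict_label(w, b, x, threshold=0.5):
    """Turn the probability into a response (1) / non-response (0) label."""
    return 1 if predict_proba(w, b, x) >= threshold else 0
```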
S320: according to the prediction results, constructing an optimization function whose objective is to minimize the sum of the logistic regression loss and the regularization term:

$$\min_{\theta}\ J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\Big[y_i\log h_\theta(x_i)+(1-y_i)\log\big(1-h_\theta(x_i)\big)\Big]+\lambda\sum_{j=1}^{k}\lvert w_j\rvert$$

where $m$ denotes the number of samples in the training set, $\theta=(w,b)$ the target parameters, $\lambda$ the regularization parameter, $x_i$ the $i$-th sample, $y_i$ the label of the $i$-th sample, $k$ the number of features, and $w_j$ the weight coefficient of the $j$-th feature.
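The objective $J(\theta)$ can be evaluated directly. The sketch below uses the identity $-[y\log h+(1-y)\log(1-h)]=\log(1+e^{z})-yz$ with $z=w^{\top}x+b$ and $h=\mathrm{sigmoid}(z)$, which avoids taking the logarithm of probabilities that round to 0 or 1. As in the formula, the bias $b$ is not penalized; function names are hypothetical.

```python
import math


def softplus(z):
    """log(1 + exp(z)), computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def lasso_logistic_objective(w, b, X, y, lam):
    """J(theta) = mean cross-entropy + lam * sum_j |w_j| (bias b not penalized)."""
    m = len(y)
    loss = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xij for wj, xij in zip(w, xi)) + b
        loss += softplus(z) - yi * z  # cross-entropy of sigmoid(z) against yi
    return loss / m + lam * sum(abs(wj) for wj in w)
```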
S330: and solving parameters of the target optimization function through a gradient descent method, and screening according to weights corresponding to each feature to obtain the optimal feature subset.
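The $L_1$ term is not differentiable at zero, so plain gradient descent does not by itself produce weights that are exactly 0. One common way to realize S330, shown here as a substitute technique rather than the document's stated procedure, is proximal gradient descent (ISTA): a gradient step on the logistic loss followed by soft-thresholding, after which features whose weight is exactly 0 are screened out. The learning rate, iteration count and tolerance are illustrative choices, and the features are assumed to be standardized beforehand.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink toward 0, exactly 0 if |v| <= t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_l1_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient (ISTA)."""
    m, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(n_iter):
        # r_i = h_theta(x_i) - y_i is the derivative of the loss w.r.t. z_i
        r = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
             for xi, yi in zip(X, y)]
        grad_w = [sum(ri * xi[j] for ri, xi in zip(r, X)) / m for j in range(k)]
        grad_b = sum(r) / m
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b  # the bias is not penalized
    return w, b


def select_features(names, w, tol=1e-12):
    """Keep the features whose Lasso weight is non-zero."""
    return [name for name, wj in zip(names, w) if abs(wj) > tol]
```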
In the biomedical field, the original data set usually contains many features, but not all of them matter for the problem at hand; too many features may introduce noise, increase computational complexity and even cause the model to overfit. To simplify the model, reduce computational complexity, and improve its interpretability and generalization ability, feature selection is needed to choose the most relevant or representative subset of features from the original data set. In the invention, Lasso regression analysis is used for feature selection. Lasso (Least Absolute Shrinkage and Selection Operator) regression is a linear regression method commonly used for feature selection and sparse modeling: by adding an $L_1$ regularization term to the loss function, it drives the model coefficients toward sparsity, i.e., it shrinks the coefficients of some features and may set them exactly to 0.
Specifically, given a training data set $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_m,y_m)\}$, each sample has a feature vector $x=(x_1,x_2,\ldots,x_k)$ composed of k different features, where $x_j$ is the value of $x$ on the j-th feature. Each feature corresponds to an index of the case sample, such as various lymphocyte counts, cytokines, carcinoembryonic antigen or lactate dehydrogenase. The sample label $y$ indicates the tumor's immune response after treatment and is divided into response and non-response (according to RECIST 1.1, SD/PR/CR is classified as response and PD as non-response), which turns the problem into a binary classification problem. Lasso regression analysis predicts the true sample label $y$ as:

$$h_\theta(x)=\frac{1}{1+e^{-(w^{\top}x+b)}}$$
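where $\theta=(w,b)$ denotes the model parameters. The goal of Lasso regression is to minimize the sum of the logistic regression loss and the regularization term:

$$\min_{\theta}\ J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\Big[y_i\log h_\theta(x_i)+(1-y_i)\log\big(1-h_\theta(x_i)\big)\Big]+\lambda\sum_{j=1}^{k}\lvert w_j\rvert$$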
where $m$ denotes the number of samples in the training set and $\lambda$ the regularization parameter. Solving this minimization problem by gradient descent yields the model parameters $\theta=(w,b)$, and the feature subset is obtained by screening according to the weight $w_j$ of each feature.
For feature selection in the invention, Lasso regression analysis is performed separately on the cases without Bragg treatment and the cases with Bragg treatment, and the union of the selected features is taken as the feature selection result. The number of features in the Lasso regression analysis is chosen by cross-validation: specifically, the training data set $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_m,y_m)\}$ is divided into S mutually exclusive subsets:

$$D=D_1\cup D_2\cup\cdots\cup D_S,\qquad D_i\cap D_j=\varnothing\ (i\neq j)$$
To keep the data distribution consistent, each subset is obtained from D by stratified sampling. Each time, S-1 subsets are used as the training set and the remaining subset as the test set, and the mean of the S test results is taken as the result for the corresponding number of features. Several evaluation criteria are provided in the invention, including accuracy, precision, recall and "roc_auc". For example, FIG. 4 shows how the "roc_auc" metric on the training set varies with the number of features when the random seed for the cross-validation data split is set to 42, with the training and test sets containing 119 and 30 samples respectively. As FIG. 4 shows, as the number of features increases, "roc_auc" on the training set first rises and then falls, reaching a maximum of 0.84 at 18 features. The optimal feature subset obtained by cross-validation comprises the following blood indices: T killer/suppressor lymphocyte ratio, change in absolute T lymphocyte count, absolute T helper/inducer lymphocyte count, IL-4 change, IL-6 change, IL-10 change, IFN-γ change, carcinoembryonic antigen (CEA), CEA change, lactate dehydrogenase (LDH), LDH change, lymphocyte ratio, neutrophil count, monocyte ratio change, basophil ratio change and D-dimer (18 blood indices in total). In one embodiment, different random seeds were also tried for the cross-validation data split, and the resulting curves followed a trend similar to FIG. 4. The feature subset obtained by Lasso feature selection was evaluated on the test set with a logistic regression classifier, and the results are shown in Table 2:
Table 2 Logistic regression classifier test results based on the optimal features obtained by cross-validation
Metric | ROC AUC | Accuracy | Precision | Recall | F1 |
Value | 0.765 | 0.693 | 0.772 | 0.731 | 0.750 |
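The choice of feature count described above can be sketched as follows: stratified assignment of samples to S folds, averaging a score over the S train/test rotations for each candidate feature count, and taking the union of the features selected in the two case groups. The `evaluate` callback is a hypothetical placeholder for "fit Lasso plus a classifier on the training folds using the top n features and return ROC AUC on the held-out fold"; the seed 42 mirrors the text, but this simple round-robin splitter will not reproduce the document's actual folds.

```python
import random
from statistics import mean


def stratified_folds(y, S, seed=42):
    """Assign each sample index to one of S folds, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(S)]
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % S].append(i)  # deal each class round-robin over the folds
    return folds


def choose_feature_count(candidates, y, S, evaluate, seed=42):
    """Return the feature count whose mean score over the S rotations is highest."""
    folds = stratified_folds(y, S, seed)

    def cv_score(n):
        scores = []
        for s in range(S):
            test_idx = folds[s]
            train_idx = [i for f in folds[:s] + folds[s + 1:] for i in f]
            scores.append(evaluate(n, train_idx, test_idx))
        return mean(scores)

    return max(candidates, key=cv_score)


def union_of_selections(selected_a, selected_b):
    """Union of the features selected in the two case groups."""
    return sorted(set(selected_a) | set(selected_b))
```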
Like ridge regression, which uses an $L_2$ regularization term, Lasso regression can also mitigate model overfitting, as illustrated in FIG. 5, where $\hat{\beta}$ denotes the optimal solution of the model. Assume that the objective function of the linear regression model without a regularization term is:

$$\min_{\beta}\ \sum_{i=1}^{m}\left(y_i-\beta^{\top}x_i\right)^2$$
The ellipses in FIG. 5 are contour lines of this objective function. Adding an $L_1$ or $L_2$ regularization term is equivalent to constraining the model parameter $\beta$ to the gray region: ridge regression confines $\beta$ to a disc, whereas Lasso regression confines it to a square whose corners lie on the coordinate axes. The point where a contour of the objective function touches the square is therefore more likely to lie on a coordinate axis, which is why Lasso regression tends to set some of the model weights exactly to 0.
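The geometric argument has a simple algebraic counterpart. In the special case of an orthonormal design ($X^{\top}X=I$), both problems have closed-form solutions in terms of the ordinary least-squares estimate: ridge rescales every coefficient, while Lasso soft-thresholds it and sets small coefficients exactly to 0. The sketch below states the objective scalings it assumes in its docstrings; it is a textbook special case for illustration, not part of the described method.

```python
import math


def ridge_orthonormal(beta_ols, lam):
    """Minimizer of ||y - X b||^2 + lam * ||b||_2^2 when X^T X = I."""
    return [b / (1 + lam) for b in beta_ols]  # every coefficient shrinks, none vanishes


def lasso_orthonormal(beta_ols, lam):
    """Minimizer of 0.5 * ||y - X b||^2 + lam * ||b||_1 when X^T X = I."""
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in beta_ols]
```

For example, with OLS estimates `[3.0, 0.4, -0.2]` and `lam = 0.5`, ridge keeps all three coefficients non-zero, while Lasso returns `[2.5, 0.0, 0.0]`.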
In the above embodiment, the Bragg treatment immune response prediction method provided by the invention obtains characteristic data of a target patient, comprising peripheral blood index data and image data, and inputs the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result. The immune response prediction model is obtained by training a random forest model on an optimal feature subset of patient samples, and the optimal feature subset is obtained by processing the feature data of the patient samples with a regression algorithm.
In the invention, a random forest is used to analyze the hematology data of existing cases and to build a model; once the relevant hematology data of a new patient are input, the likely effect of the treatment on that patient can be predicted. Because the treatment response is predicted from nearly noninvasive hematology tests, the frequency of medical imaging examinations and the associated potential radiation damage are greatly reduced, lowering the burden on patients and society and reducing the waste of medical resources. In this way, the immune response and survival of Bragg-treated patients are predicted with a pre-trained prediction model, tumor patients likely to benefit from Bragg regimen treatment are identified noninvasively from relatively accurate prediction results, and data support is provided for early, precise, individualized intervention for different patients at different stages.
In addition to the above method, the present invention also provides a Bragg treatment immune response predicting device, as shown in FIG. 6, comprising:
a data acquisition unit 610 for acquiring feature data of a target patient, the feature data including peripheral blood index data and image data;
A result output unit 620 for inputting the feature data into a pre-trained immune response prediction model to obtain an immune response prediction result;
the immune response prediction model is obtained by training an optimal feature subset of a patient sample based on a random forest model, and the optimal feature subset is obtained by processing feature data of the patient sample through a regression algorithm.
In some embodiments, training with an optimal feature subset of a patient sample based on a random forest model results in the immune response prediction model, specifically comprising:
collecting characteristic data of a patient sample, wherein the characteristic data comprises peripheral blood index data and image data;
performing noise reduction on the feature data, and screening the denoised feature data by regression analysis to obtain an optimal feature subset;
building a training set $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_m,y_m)\}$ containing m samples based on the optimal feature subset; and constructing a random forest based on the training set. In some embodiments, the output result is calculated from the decision trees using a first expression;
the first expression is:

$$H(x)=\underset{y\in\mathcal{Y}}{\arg\max}\sum_{t=1}^{T}\mathbb{I}\big(h_t(x)=y\big)$$

where $H(x)$ denotes the output result, $y$ a sample label, $\mathcal{Y}$ the set of label classes, $T$ the number of decision trees in the random forest, $t$ the index of a decision tree in the forest, $h_t(x)$ the prediction of the $t$-th decision tree for the sample label $y$, $\mathbb{I}(\cdot)$ the indicator function, and $x$ the feature vector of each sample.
In some embodiments, screening the denoised feature data by a regression analysis algorithm to obtain an optimal feature subset specifically includes:
using LASSO regression, i.e., introducing L1 regularization into logistic regression;
and solving parameters of the target optimization function through a gradient descent method, and screening according to weights corresponding to each feature to obtain the optimal feature subset.
In some embodiments, all sample tags y in the training set are predicted using the second expression;
the second expression is:

$$h_\theta(x)=\frac{1}{1+e^{-(w^{\top}x+b)}}$$

where $\theta=(w,b)$ denotes the target parameters, $h_\theta(x)$ expresses the prediction result for the sample label $y$ as a probability, $x$ the feature vector of each sample, $w$ the weight coefficients, and $b$ the bias coefficient.
In some embodiments, with the goal of minimizing the sum of the logistic regression loss and the regularization term, the optimization function is constructed as:

$$\min_{\theta}\ J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\Big[y_i\log h_\theta(x_i)+(1-y_i)\log\big(1-h_\theta(x_i)\big)\Big]+\lambda\sum_{j=1}^{k}\lvert w_j\rvert$$

where $m$ denotes the number of samples in the training set, $\theta=(w,b)$ the target parameters, $h_\theta(x_i)$ the conditional probability of the label given the $i$-th sample, $\lambda$ the regularization parameter, $x_i$ the $i$-th sample, $y_i$ the label of the $i$-th sample, $k$ the number of features, and $w_j$ the weight coefficient of the $j$-th feature.
In some embodiments, collecting the characteristic data of the patient sample includes collecting the characteristic data of a case sample that did not employ the target therapy, and collecting the characteristic data of a case sample that employed the target therapy.
In the above embodiment, the Bragg treatment immune response prediction device provided by the invention obtains characteristic data of a target patient, comprising peripheral blood index data and image data, and inputs the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result. The immune response prediction model is obtained by training a random forest model on an optimal feature subset of patient samples, and the optimal feature subset is obtained by processing the feature data of the patient samples with a regression algorithm.
According to the invention, the random forest is used for analyzing the hematology data of the existing cases and constructing a model, and the possible effect of a new patient after receiving the treatment can be predicted after the relevant hematology data of the new patient is input; meanwhile, the treatment response of the patient is predicted through the almost noninvasive detection of the hematology index, so that the frequency of medical image examination and the potential radiation damage caused by the image examination are greatly reduced, the burden of the patient and the society is reduced, and the medical resource waste is reduced. In this way, the immune response and survival of the Bragg treatment patient are predicted by using a pre-trained prediction model, so that a tumor patient possibly benefiting from Bragg scheme treatment is identified noninvasively by using a relatively accurate prediction result, and data support is provided for early and accurate individualized intervention on different patients in different periods.
In one embodiment, a computer device is provided, which may be a server and whose internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store static information and dynamic information data. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements the steps of the above method embodiments.
It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
Corresponding to the above embodiments, the present invention further provides a computer storage medium containing one or more program instructions, which are used to execute the method described above.
The present invention also provides a computer program product comprising a computer program storable on a non-transitory computer readable storage medium, the computer program being capable of performing the above method when being executed by a processor.
In the embodiment of the invention, the processor may be an integrated circuit chip with signal processing capability. The processor may be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field programmable gate array (Field Programmable Gate Array, FPGA for short), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components.
The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may reside in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The processor reads the information in the storage medium and performs the steps of the above method in combination with its hardware.
The storage medium may be a memory, which may be volatile memory, non-volatile memory, or both.
The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory.
The volatile memory may be random access memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in embodiments of the present invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that, in one or more of the examples described above, the functions described in the present invention may be implemented by a combination of hardware and software. When implemented in software, the functions may be stored on, or transmitted as one or more instructions or code over, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The foregoing detailed description of the invention is presented for purposes of illustration and description only and is not intended to limit the scope of the invention.
Claims (10)
1. A method of predicting an immune response to Bragg treatment, the method comprising:
acquiring characteristic data of a target patient, wherein the characteristic data comprises peripheral blood index data and image data;
inputting the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result;
the immune response prediction model is obtained by training an optimal feature subset of a patient sample based on a random forest model, and the optimal feature subset is obtained by processing feature data of the patient sample through a regression algorithm.
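Claim 1 separates feature selection from prediction: only the features in the optimal subset are passed to the trained model. A minimal sketch of that inference step follows, assuming the trained model is any callable that maps a feature vector to a predicted label; the function name `predict_immune_response`, the feature names, and the toy stand-in model are all hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence

# Hypothetical interface: a trained model is any callable mapping a feature vector to a label.
Model = Callable[[List[float]], Hashable]


def predict_immune_response(patient: Dict[str, float], subset: Sequence[str],
                            model: Model) -> Hashable:
    """Restrict a patient's feature data to the optimal subset, then predict."""
    missing = [name for name in subset if name not in patient]
    if missing:
        raise KeyError(f"missing features: {missing}")
    # Keep only the selected features, in the same order used during training.
    x = [patient[name] for name in subset]
    return model(x)


def toy_model(x: List[float]) -> int:
    # Hypothetical stand-in for the trained random forest.
    return 1 if x[0] > 2.0 else 0


# Usage with hypothetical feature names.
patient = {"blood_marker_a": 3.2, "blood_marker_b": 0.7, "image_feature_c": 12.0}
subset = ["blood_marker_a", "image_feature_c"]
print(predict_immune_response(patient, subset, toy_model))  # 1
```

Failing loudly on a missing feature is a design choice of this sketch: silently imputing a value would change the input distribution the model was trained on.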
2. The method of claim 1, wherein training with an optimal feature subset of patient samples based on a random forest model results in the immune response prediction model, comprising:
collecting characteristic data of a patient sample, wherein the characteristic data comprises peripheral blood index data and image data;
performing noise reduction on the feature data, and screening the denoised feature data with a regression analysis algorithm to obtain an optimal feature subset;
building, based on the optimal feature subset, a training set containing $m$ samples, $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$;
and constructing, based on the training set, a random forest comprising a plurality of decision trees.
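Claim 2 builds the training set $D$ from the optimal feature subset and then grows the forest. A minimal sketch of the first two steps follows, assuming each patient sample is a dict of named features with a 0/1 response label; the helper names and feature names are hypothetical, and bootstrap sampling with replacement is the standard random-forest (bagging) procedure rather than a detail stated in the claims.

```python
import random
from typing import Dict, List, Sequence, Tuple

# One training pair (x_i, y_i): feature vector restricted to the optimal subset, plus its label.
Sample = Tuple[List[float], int]


def build_training_set(records: Sequence[Dict[str, float]], labels: Sequence[int],
                       subset: Sequence[str]) -> List[Sample]:
    """Return D = [(x_1, y_1), ..., (x_m, y_m)] using only the selected features."""
    if len(records) != len(labels):
        raise ValueError("each record needs exactly one label")
    return [([rec[name] for name in subset], y) for rec, y in zip(records, labels)]


def bootstrap_samples(D: Sequence[Sample], n_trees: int, seed: int = 0) -> List[List[Sample]]:
    """Draw one bootstrap sample (m draws with replacement from D) per decision tree."""
    rng = random.Random(seed)  # fixed seed so the forest is reproducible
    m = len(D)
    return [[D[rng.randrange(m)] for _ in range(m)] for _ in range(n_trees)]


# Usage with hypothetical feature names.
records = [{"a": 1.0, "b": 5.0, "c": 0.1},
           {"a": 2.0, "b": 4.0, "c": 0.3},
           {"a": 3.0, "b": 3.0, "c": 0.2}]
D = build_training_set(records, [0, 1, 1], subset=["c", "a"])
print(D)  # [([0.1, 1.0], 0), ([0.3, 2.0], 1), ([0.2, 3.0], 1)]
bags = bootstrap_samples(D, n_trees=4)
```

Each bag would then train one decision tree, usually choosing every split from a random subset of the features; the tree-growing step itself is omitted here.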
3. The method of claim 2, wherein the output of each of the decision trees is calculated using a first expression;
the first expression is:
where $H(x)$ represents the output result of the decision trees, $y$ represents a true sample label, $\mathcal{Y}$ represents the set of label classes, $T$ represents the number of decision trees in the random forest, $t$ denotes the $t$-th decision tree in the forest, $h_t(x)$ represents the prediction of the $t$-th tree for the sample label $y$, and $x$ represents the feature vector corresponding to each sample.
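Assuming the first expression is the standard random-forest majority vote, $H(x) = \arg\max_{y \in \mathcal{Y}} \sum_{t=1}^{T} \mathbb{I}(h_t(x) = y)$ with $\mathbb{I}(\cdot)$ the indicator function, it can be sketched as follows. Each fitted tree is modeled as a hypothetical callable, and the three hand-written stumps in the usage are illustrative only.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

# Hypothetical representation: each fitted decision tree h_t maps a feature vector to a label.
Tree = Callable[[Sequence[float]], Hashable]


def forest_predict(trees: Sequence[Tree], x: Sequence[float]) -> Hashable:
    """Majority vote H(x) = argmax_y sum_t I(h_t(x) = y) over the T trees."""
    votes = Counter(tree(x) for tree in trees)  # votes[y] = number of trees predicting y
    # most_common(1) yields the label with the most votes; ties go to the label seen first.
    label, _count = votes.most_common(1)[0]
    return label


# Usage with three hand-written decision stumps standing in for fitted trees.
trees: List[Tree] = [
    lambda x: 1 if x[0] > 0.5 else 0,
    lambda x: 1 if x[1] > 2.0 else 0,
    lambda x: 1 if x[0] + x[1] > 3.0 else 0,
]
print(forest_predict(trees, [0.8, 1.0]))  # 0 (votes: 1, 0, 0)
print(forest_predict(trees, [0.8, 2.5]))  # 1 (votes: 1, 1, 1)
```

Using an odd number of trees for a binary label avoids ties entirely.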
4. The method for predicting the immune response to Bragg treatment according to claim 2, wherein screening the denoised feature data with the regression analysis algorithm, selecting the optimal number of features through cross-validation, and obtaining the optimal feature subset specifically comprise:
introducing, based on LASSO regression, L1 regularization into logistic regression;
and solving the parameters of the target optimization function by gradient descent, then screening the features by their corresponding weights to obtain the optimal feature subset.
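Because the L1 term $\lambda \sum_j |w_j|$ is not differentiable at zero, plain gradient descent is commonly replaced by proximal gradient descent (ISTA): a gradient step on the logistic loss followed by soft-thresholding of the weights. The pure-Python sketch below uses that swapped-in variant. The learning rate, $\lambda$, iteration count, zero-weight tolerance, and toy data are assumptions; in practice $\lambda$ (and hence the number of retained features) would be chosen by cross-validation as claim 4 describes.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v: float, t: float) -> float:
    # Proximal operator of t * |w|: shrink v toward zero by t, clipping at zero.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 0.05,
                   lr: float = 0.5, n_iter: int = 2000) -> Tuple[List[float], float]:
    """Minimize mean logistic loss + lam * sum_j |w_j| by proximal gradient descent."""
    m, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(n_iter):
        # Gradient of the mean logistic loss with respect to w and b.
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j] / m
            grad_b += err / m
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b  # the bias is not penalized
    return w, b


def select_features(w: Sequence[float], names: Sequence[str], tol: float = 1e-8) -> List[str]:
    # The optimal subset keeps the features whose weights were not shrunk to zero.
    return [name for name, wj in zip(names, w) if abs(wj) > tol]


# Toy data (hypothetical): feature "a" separates the classes, feature "b" is unrelated to the label.
X = [[-1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
y = [0, 0, 1, 1]
w, b = lasso_logistic(X, y)
print(select_features(w, ["a", "b"]))  # ['a']
```

Soft-thresholding sets weak weights exactly to zero, which is what makes screening "according to weights" a simple nonzero check.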
5. The method of claim 4, wherein all sample labels y in the training set are predicted using the second expression;
The second expression is:
where $\theta = (w, b)$ represents the target parameters, $h_\theta(x)$ represents the prediction result for the sample label according to the magnitude of the probability (assigned 1 or 0 in the case of classification), $x$ represents the feature vector corresponding to each sample, $w$ represents the weight coefficients, and $b$ represents the bias coefficient.
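Assuming the second expression is the standard logistic hypothesis $h_\theta(x) = 1/(1 + e^{-(w^{\top}x + b)})$, the assignment of 1 or 0 can be sketched as follows. The 0.5 decision threshold is a conventional choice rather than one stated in the claim, and the parameter values in the usage are hypothetical.

```python
import math
from typing import List, Sequence


def h_theta(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Logistic hypothesis: estimated probability that the label of x is 1."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    # Numerically stable sigmoid (avoids overflow in math.exp for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_labels(X: Sequence[Sequence[float]], w: Sequence[float], b: float,
                   threshold: float = 0.5) -> List[int]:
    """Assign 1 when h_theta(x) reaches the threshold, otherwise 0."""
    return [1 if h_theta(x, w, b) >= threshold else 0 for x in X]


# Usage with hypothetical parameters: z = 2 * x_1 - 1 (the second feature has weight 0).
w, b = [2.0, 0.0], -1.0
print(predict_labels([[0.0, 5.0], [1.0, -3.0], [0.5, 0.0]], w, b))  # [0, 1, 1]
```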
6. The method of claim 2, wherein the optimization function, constructed with the goal of minimizing the sum of the logistic regression loss and the regularization term, is:
where $m$ represents the number of samples in the training set, $\theta = (w, b)$ represents the target parameters, $h_\theta(x_i)$ represents the conditional probability of the label value given the sample $x_i$, $\lambda$ represents the regularization parameter, $x_i$ represents the $i$-th sample, $y_i$ represents the label of the $i$-th sample, $k$ represents the number of features, and $w_j$ represents the weight coefficient of the $j$-th feature.
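Assuming the optimization function is the mean logistic (cross-entropy) loss plus the L1 penalty, $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}[y_i \ln h_\theta(x_i) + (1-y_i)\ln(1-h_\theta(x_i))] + \lambda\sum_{j=1}^{k}|w_j|$, it can be evaluated as below. The per-sample loss is rewritten in the algebraically equivalent form $\ln(1+e^{z_i}) - y_i z_i$ with $z_i = w^{\top}x_i + b$, which avoids taking the logarithm of 0; the function names are hypothetical.

```python
import math
from typing import Sequence


def softplus(z: float) -> float:
    # ln(1 + e^z), computed stably for large |z|.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def objective(X: Sequence[Sequence[float]], y: Sequence[int],
              w: Sequence[float], b: float, lam: float) -> float:
    """Mean logistic loss over the m samples plus lam * sum_j |w_j| (bias not penalized)."""
    m = len(X)
    data_loss = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        data_loss += softplus(z) - yi * z  # equals -[y ln h + (1 - y) ln(1 - h)]
    penalty = lam * sum(abs(wj) for wj in w)
    return data_loss / m + penalty


# Usage: with all parameters at zero, h = 0.5 for every sample, so J equals ln 2.
print(objective([[1.0, 2.0], [3.0, -1.0]], [1, 0], [0.0, 0.0], 0.0, lam=0.1))
```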
7. The method of claim 2, wherein collecting the characteristic data of the patient samples comprises collecting characteristic data of case samples that did not receive the target therapy and characteristic data of case samples that did receive the target therapy.
8. A Bragg treatment immune response prediction device, the device comprising:
a data acquisition unit, configured to acquire characteristic data of a target patient, wherein the characteristic data comprises peripheral blood index data and image data;
a result output unit, configured to input the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result;
the immune response prediction model is obtained by training an optimal feature subset of a patient sample based on a random forest model, and the optimal feature subset is obtained by processing feature data of the patient sample through a regression algorithm.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-7 when the program is executed.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311132990.XA CN117116357A (en) | 2023-09-04 | 2023-09-04 | Bragg treatment immune response prediction method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311132990.XA CN117116357A (en) | 2023-09-04 | 2023-09-04 | Bragg treatment immune response prediction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117116357A (en) | 2023-11-24 |
Family
ID=88808911
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311132990.XA Pending CN117116357A (en) | 2023-09-04 | 2023-09-04 | Bragg treatment immune response prediction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117116357A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115125211A (en) * | 2021-03-24 | 2022-09-30 | 核工业总医院 | Colon cancer peritoneal metastasis mouse model for evaluating curative effect of immunotherapy |
2023-09-04 CN CN202311132990.XA: CN117116357A (en), active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||