CN117116357A

CN117116357A - Bragg treatment immune response prediction method and device

Info

Publication number: CN117116357A
Application number: CN202311132990.XA
Authority: CN
Inventors: 张力元; 徐红; 王邦军
Original assignee: Nuclear Industry General Hospital
Current assignee: Nuclear Industry General Hospital
Priority date: 2023-09-04
Filing date: 2023-09-04
Publication date: 2023-11-24

Abstract

The invention discloses a Bragg treatment immune response prediction method and a Bragg treatment immune response prediction device, wherein the method comprises the following steps: acquiring characteristic data of a target patient, wherein the characteristic data comprises peripheral blood index data and image data; inputting the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result; the immune response prediction model is obtained by training an optimal feature subset of a patient sample based on a random forest model, and the optimal feature subset is obtained by processing feature data of the patient sample through a logistic regression algorithm. The invention predicts the immune response and survival of the Bragg treatment patient by using a pre-trained prediction model, thereby noninvasively identifying the tumor patient possibly benefiting from the Bragg scheme treatment by using a relatively accurate prediction result and providing data support for early and accurate individualized intervention on different patients in different periods.

Description

Bragg treatment immune response prediction method and device

Technical Field

The invention relates to the field of artificial intelligence, in particular to a Bragg treatment immune response prediction method and device.

Background

Tumor immunotherapy (anti-PD-1 or anti-PD-L1) has made breakthrough progress in recent years and is widely used in the treatment of various advanced solid tumors, but the response rate of immunotherapy alone is only 15-25%. Patent CN111951897A therefore proposed a method for predicting the responsiveness of cancer patients to anti-PD-1/PD-L1 immunotherapy, either predicting the effect before the first treatment to help the patient decide whether to receive immunotherapy, or assessing the effect of a previous treatment before the next one to help the patient decide whether to continue. The method comprises: obtaining a peripheral blood sample from the cancer patient before immunotherapy; detecting the number of immune cells in the sample; and comparing the immune cell number with a first threshold to predict whether the patient will benefit from the immunotherapy. The first threshold is determined by statistically analyzing the correlation between the number of immune cells in a group of cancer patients and the expected risk of disease progression in that group, and taking the statistically significant value that defines the correlation. The immune cells express at least one of the following markers: PD1, CD8, CD4, IFN-gamma, TIM3, LAG3, CD25, TGF-beta.
That patent has the following problems. The peripheral blood indexes were not screened: a subset of immune cells was selected manually, and no index reflecting the tumor or the patient's condition was included, although the immune response is itself a contest between the body and the tumor and can only be estimated accurately by examining both. When the immune indexes are used for prediction, only the performance of each single index is considered; the influence of other indexes is ignored, and it is unknown whether the indexes are correlated or whether such correlations affect the predictors, which calls the predictive performance of the indexes into question. In addition, different tumor types have different predictive indexes, which complicates clinical use.

A search of the prior art found that, in order to predict the response of cancer patients to anticancer therapy, F. Hoffmann-La Roche used data from the Flatiron Health database to conduct survival analyses of 99,249 people from 12 different cohorts (RoPro 1) and 110,538 people from 15 different cohorts (RoPro 2), the cohorts being defined by tumor type, and validated the results in two independent clinical studies. Cancer patient information is input into a model to generate a score indicating the patient's risk of mortality, wherein the patient information includes data corresponding to each of the following parameters: (i) albumin level in serum or plasma; (ii) Eastern Cooperative Oncology Group (ECOG) performance status; (iii) lymphocyte to leukocyte ratio in blood; (iv) smoking status; (v) age; (vi) TNM classification of malignant tumors stage; (vii) heart rate; (viii) chloride or sodium level in serum or plasma; (ix) urea nitrogen level in serum or plasma; (x) sex; (xi) hemoglobin or hematocrit level in blood; (xii) aspartate aminotransferase activity level in serum or plasma; and (xiii) alanine aminotransferase activity level in serum or plasma. However, in the preliminary data screening each parameter is analyzed separately, and the influence of correlations between parameters on the predicted variable is not considered. The parameters cover not only blood tests but also basic patient information, including factors that are subjective or vary greatly with the patient's activity state, such as the performance status score (ECOG score), blood pressure and heart rate, so the model's predictions may deviate considerably.
Moreover, data from different treatment methods for the same tumor are pooled, so the response to immunotherapy cannot be predicted accurately; the hematological response that may be unique to immunotherapy, especially the dynamic changes in lymphocytes and their subtypes, is not reflected.

Combining immunotherapy with other treatment modalities, such as radiotherapy, chemotherapy, targeted therapy and cytokine therapy, is currently a means of increasing the efficacy of immunotherapy. Bragg treatment (PRaG: PD-1 inhibitor, Radiotherapy and GM-CSF) is an important immune combination approach that combines a PD-1 inhibitor, radiotherapy and granulocyte-macrophage colony stimulating factor, and it has shown good efficacy in the treatment of advanced refractory tumors. Some of the research results have been filed as related patent applications, such as "Colon cancer peritoneal metastasis mouse model for evaluating the curative effect of immunotherapy" (patent application CN202110311772.7) and "Bragg decision scheme evaluation method and device" (patent application CN202210774618.8).

However, not all patients would benefit significantly from this regimen, and PRaG 1.0 treatment studies showed a median progression free survival (mPFS) of 4.0 months, a Disease Control Rate (DCR) of 46.3% and an objective tumor remission rate (ORR) of only 16.7%.

Therefore, it is desirable to provide a method and apparatus for predicting the immune response and survival of a patient undergoing bragg treatment, so as to identify a tumor patient who may benefit from bragg treatment using a relatively accurate prediction result, and provide data support for early and accurate individualized intervention for different patients in different periods, which is a problem to be solved by those skilled in the art.

Disclosure of Invention

The invention provides a tumor immune response prediction method and a device, so that an immune response and survival of a Bragg treatment patient can be predicted by using a pre-trained prediction model, and thus a tumor patient who is likely to benefit from Bragg scheme treatment can be identified noninvasively by using a relatively accurate prediction result, and data support is provided for early and accurate individualized intervention on different patients in different periods.

In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:

The present invention provides a method for predicting the immune response of a patient receiving Bragg treatment, the method comprising:

acquiring characteristic data of a target patient, wherein the characteristic data comprises peripheral blood index data and image data;

inputting the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result;

the immune response prediction model is obtained by training an optimal feature subset of a patient sample based on a random forest model, and the optimal feature subset is obtained by processing feature data of the patient sample through a regression algorithm.

In some embodiments, training with an optimal feature subset of a patient sample based on a random forest model results in the immune response prediction model, specifically comprising:

collecting characteristic data of a patient sample, wherein the characteristic data comprises peripheral blood index data and image data;

carrying out noise reduction treatment on the feature data, screening the feature data subjected to the noise reduction treatment by using regression analysis calculation, and obtaining an optimal feature subset;

building, based on the optimal feature subset, a training set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$ containing m samples;

Based on the training set, a random forest (a plurality of decision trees) is constructed.

In some embodiments, an output result of the decision trees is calculated using a first expression;

the first expression is:

$$H(x) = \arg\max_{y \in \mathcal{Y}} \sum_{t=1}^{T} \mathbb{I}\big(h_t(x) = y\big)$$

wherein $H(x)$ represents the output result of the decision trees, $y$ represents a real sample label, $\mathcal{Y}$ represents the label class set, $T$ represents the number of decision trees in the random forest, $t$ represents the index of a decision tree in the forest, $h_t(x)$ represents the prediction result of the t-th decision tree for the sample label y, $\mathbb{I}(\cdot)$ is the indicator function, and $x$ represents the feature vector corresponding to each sample.

In some embodiments, the regression analysis algorithm is used to screen the feature data after the noise reduction processing, and obtain an optimal feature subset, which specifically includes:

Based on LASSO regression, L1 regularization is introduced on the basis of logistic regression;

and solving parameters of the target optimization function through a gradient descent method, and screening according to weights corresponding to each feature to obtain the optimal feature subset.

In some embodiments, all sample labels y in the training set are predicted using a second expression;

the second expression is:

$$h_\theta(x) = \frac{1}{1 + e^{-(w^{\mathsf{T}} x + b)}}$$

wherein $\theta = (w, b)$ represents the target parameter; $h_\theta(x)$ represents the prediction result for the sample label y, which in the classification case is assigned 1 or 0 according to the size of the probability; $x$ represents the feature vector corresponding to each sample; $w$ represents the weight coefficient; and $b$ represents the bias coefficient.

In some embodiments, with the goal of minimizing the sum of the logistic regression loss and the regularization term, the optimization function is constructed as:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta\big(x^{(i)}\big)\big) \Big] + \lambda \sum_{j=1}^{k} |w_j|$$

wherein $m$ represents the number of samples in the training set, $\theta = (w, b)$ represents the target parameter, $h_\theta(x^{(i)})$ represents the conditional probability of the label value given the sample, $\lambda$ represents the parameter of the regularization term, $x^{(i)}$ represents the i-th sample, $y^{(i)}$ represents the label corresponding to the i-th sample, $k$ represents the number of feature elements, and $w_j$ represents the weight coefficient of the j-th feature.

In some embodiments, collecting the characteristic data of the patient sample includes collecting the characteristic data of a case sample that did not employ the target therapy, and collecting the characteristic data of a case sample that employed the target therapy.

The present invention also provides a bragg treatment immune response prediction device, the device comprising:

the data acquisition unit is used for acquiring characteristic data of a target patient, wherein the characteristic data comprises peripheral blood index data and image data;

the result output unit is used for inputting the characteristic data into a pre-trained immune response prediction model so as to obtain an immune response prediction result;

The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as described above when executing the program.

The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as described above.

According to the tumor immune response prediction method provided by the invention, the characteristic data of a target patient are obtained, wherein the characteristic data comprise peripheral blood index data and image data; inputting the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result; the immune response prediction model is obtained by training an optimal feature subset of a patient sample based on a random forest model, and the optimal feature subset is obtained by processing feature data of the patient sample through a regression algorithm.

The beneficial technical effects of the invention are as follows:

The biomarkers currently used to predict immunotherapy response are mainly tumor PD-L1 expression, tumor mutation burden (TMB) and microsatellite instability (MSI), but the predictive performance of all three is unsatisfactory. Multi-parameter immune prediction models that combine several gene signatures improve predictive performance to some extent, but they depend heavily on tumor tissue biopsy, which is invasive and limited by tumor accessibility and by the patient's condition and willingness. In addition, tumor tissue is heterogeneous, so the immune response within a single lesion does not represent the systemic anti-tumor immune state, and the high cost of blood sequencing limits its wide clinical application. Peripheral blood is the most common clinical sample; it is easy to obtain, causes little harm to the patient, can essentially be regarded as noninvasive, allows dynamic monitoring, and has strong clinical applicability. The invention uses peripheral blood to predict the patient's immune response or survival and provides an important reference for clinical treatment decisions, so that patients benefit to the greatest extent.

There are currently no studies or reports that build a model to predict immune response or survival using only peripheral blood indexes. Most research exploring immune-related factors uses basic statistical methods such as the t-test, rank-sum test, chi-square test and correlation coefficients to find features related to immune response. These methods have unavoidable shortcomings: the t-test, rank-sum test and chi-square test ignore correlations among features, and correlation coefficients consider only linear relationships between a feature and the target variable. They can only show that certain features may be related to immune response and cannot predict the immune response from a single feature or from a combination of features. The invention uses only peripheral blood indexes. When screening features, the selected Lasso regression method automatically discards features that have little effect on the predicted variable and retains those with a relatively large effect, so that the influence of all features on the predicted variable is judged as a whole. In addition, the random forest is not merely a classification model: when constructing each decision tree it uses only part of the screened features, which amounts to a further round of feature screening, so the essential relationship between the predicted variable and the selected features can be mined further.

The Bragg series of studies are original studies by the research group of Professor Zhang Liyuan. At present there is no model for predicting the immune response of patients receiving Bragg treatment, and no definite clinical index can predict that response. The invention is the first to screen peripheral blood indexes and build a corresponding prediction model to realize tumor immune response prediction and survival prediction.

According to the invention, a random forest model is constructed by analyzing the hematology data of cases that received Bragg treatment, so that the likely effect of the treatment on a new patient can be predicted once that patient's relevant hematology data are input. Because the treatment response is predicted from hematology indexes obtained by almost noninvasive testing, the frequency of medical imaging examinations and the potential radiation damage they cause are greatly reduced, which lowers the burden on patients and society and reduces the waste of medical resources.

The invention predicts the immune response and survival of Bragg treatment patients by using the pre-trained prediction model, so that tumor patients who may benefit from Bragg regimen treatment can be identified noninvasively from a relatively accurate prediction result, and data support is provided for early and accurate individualized intervention for different patients in different periods.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only and that other implementations can be obtained from the extensions of the drawings provided without inventive effort.

The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the ambit of the technical disclosure.

FIG. 1 is a flowchart of a method for predicting an immune response to Bragg treatment according to the present invention;

FIG. 2 is a second flowchart of a method for predicting an immune response of Bragg treatment according to the present invention;

FIG. 3 is a third flowchart of a method for predicting an immune response of Bragg treatment according to the present invention;

FIG. 4 is a graph showing the effect of the method for predicting the immune response of Bragg treatment according to the present invention;

FIG. 5 is a schematic diagram of a method for predicting the immune response of Bragg treatment according to the present invention;

FIG. 6 is a block diagram of a Bragg treatment immune response predicting device according to the present invention;

FIG. 7 is a block diagram of a computer device according to the present invention.

Detailed Description

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following detailed description. The embodiments described are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.

The invention provides a Bragg treatment immune response prediction method, which is used for predicting the immune response of a new patient by using a pre-trained model and providing data support for the selection of a subsequent treatment scheme according to the prediction result.

Referring to fig. 1, fig. 1 is a flowchart of a method for predicting an immune response of bragg treatment according to the present invention.

In one embodiment, the present invention provides a method of predicting an immune response to Bragg treatment comprising the steps of:

S110, acquiring characteristic data of a target patient, wherein the characteristic data comprises peripheral blood index data and image data;

Patient and data requirements for the feature data acquisition: 1) the patient received at least one treatment with a Bragg regimen (1.0-3.0); 2) hematology test data are available at baseline (within 28 days before the first treatment) and within 28 days before the 2nd and 3rd treatment cycles, after the preceding cycle; 3) if only one Bragg regimen treatment was given, baseline hematology data and hematology data within 8 weeks after treatment are required; 4) at least one imaging evaluation result is available; 5) if there are multiple records at baseline or within 28 days before the 2nd or 3rd cycle, only the first record measured at admission (i.e., before any treatment) is used; 6) missing data are not imputed, and a record in which any included index is missing is not used in model construction.
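Requirements 5) and 6) above can be sketched in Python as follows. This is a minimal illustration rather than the inventors' implementation: the function names, the record layout (a sampling date paired with a dictionary of index values) and the choice to count the treatment day itself as inside the 28-day window are all assumptions.

```python
from datetime import date, timedelta


def select_measurement(measurements, treatment_start, window_days=28):
    """Return the earliest record taken within window_days before treatment.

    measurements: list of (sample_date, values) tuples, where values maps an
    index name to a number or None (hypothetical layout).
    Returns None when no record falls inside the window.
    """
    earliest_allowed = treatment_start - timedelta(days=window_days)
    in_window = [
        record for record in measurements
        if earliest_allowed <= record[0] <= treatment_start
    ]
    if not in_window:
        return None
    # Requirement 5: keep only the first record in the window.
    return min(in_window, key=lambda record: record[0])[1]


def is_complete(values, required_indexes):
    """Requirement 6: a record is usable only if no required index is missing."""
    return all(values.get(name) is not None for name in required_indexes)
```

A record that fails `is_complete` is simply skipped; nothing is imputed, in line with requirement 6).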

The specific Bragg regimens (1.0-3.0) are as follows:

PRaG 1.0: enrolled patients receive hypofractionated radiotherapy (5 or 8 Gy × 2-3 fractions) to a suitable lesion; GM-CSF (200 μg/d) is injected subcutaneously for 14 days starting on the second day after radiotherapy ends; a PD-1 inhibitor is given within one week after radiotherapy ends. One treatment cycle lasts three weeks, and a different target lesion may be irradiated in the next cycle. The triple therapy is given for at least 2 cycles (until no lesion suitable for irradiation remains or the tolerance dose of normal tissue is reached), followed by PD-1 inhibitor with sequential GM-CSF and IL-2 for 6 cycles, after which PD-1 inhibitor monotherapy (one cycle every 21 days) may be used for maintenance until disease progression or intolerable toxicity.

PRaG 2.0: enrolled patients receive hypofractionated radiotherapy to a suitable lesion (10-24 Gy in total, 5-8 Gy per fraction, 2-3 fractions); GM-CSF 200 μg is injected subcutaneously once daily (Qd) for 7 days starting on the day of radiotherapy; a PD-1 inhibitor is given within one week after radiotherapy ends; IL-2 2 million IU is injected subcutaneously Qd for 7 days starting 24 hours after GM-CSF ends; one cycle lasts 21 days. After at least 2 cycles of this combined treatment, followed by 6 cycles of PD-1 inhibitor with GM-CSF and IL-2, PD-1 inhibitor monotherapy (one cycle every 21 days) may be used for maintenance until disease progression or intolerable toxicity.

PRaG 3.0: enrolled patients start treatment with an intravenous injection of RC48-ADC 2.0 mg/kg on day 1 (d1); hypofractionated radiotherapy to a suitable lesion (10-24 Gy in total, 5-8 Gy per fraction, 2-3 fractions) is given on the third day of treatment; a PD-1/PD-L1 inhibitor is given within one week after radiotherapy ends; GM-CSF 200 μg is injected subcutaneously Qd for 5 days starting on the day of radiotherapy; IL-2 2 million IU is injected subcutaneously Qd for 5 days starting the day after GM-CSF ends. RC48-ADC combined with radiotherapy and a PD-1/PD-L1 inhibitor, with sequential GM-CSF and IL-2, is given for at least 2 cycles; after a subsequent 6 weeks of RC48-ADC and PD-1/PD-L1 inhibitor with sequential GM-CSF and IL-2, the PD-1/PD-L1 inhibitor may be used alone for maintenance until disease progression or intolerable toxicity.

Peripheral blood index data and their acquisition:

In the invention, 70 peripheral blood indexes are taken as initial features, and the test results are taken from the corresponding reports of the clinical laboratory of the Second Affiliated Hospital of Soochow University. Of these, 35 indexes are selected from the peripheral blood tests (routine blood test, biochemistry, coagulation, tumor markers, lymphocyte subpopulation analysis, cytokines) routinely performed before treatment on patients in the three studies Bragg 1.0, Bragg 2.0 and Bragg 3.0. They are selected because they have been reported in the literature as possibly related to tumor immunity, and include: T lymphocyte ratio, T helper/suppressor lymphocyte ratio, B lymphocyte ratio, NK cell ratio, T lymphocyte absolute, T helper/suppressor lymphocyte absolute, T killer/suppressor lymphocyte absolute, B lymphocyte absolute, NK cell absolute, interleukin-2, interleukin-4, interleukin-6, interleukin-10, interleukin-17A, tumor necrosis factor, gamma interferon, carcinoembryonic antigen, albumin, lactate dehydrogenase, white blood cell count, lymphocyte ratio, neutrophil ratio, lymphocyte number, neutrophil number, monocyte ratio, NLR, eosinophil number, eosinophil ratio, basophil number, basophil ratio, international normalized ratio, D-dimer, fibrinogen. The other 35 indexes are the dynamic change values of the selected 35 indexes, namely the ratio of the value of an index after Bragg regimen treatment to its value before Bragg regimen treatment (baseline).

Acquisition method: the test data are collected manually by team members and research assistants from the laboratory system and imaging system of the Second Affiliated Hospital of Soochow University, according to hospital numbers and the information registered for the clinical trials; the imaging results are evaluated by imaging specialists and clinical trial physicians.
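The dynamic change values described above can be illustrated with a short Python sketch. The function name `build_features`, the `_ratio` suffix, and the decision to return `None` for an incomplete record or a zero baseline are assumptions made for illustration; the source states only that each dynamic value is the ratio of the post-treatment value to the baseline value.

```python
def build_features(baseline, follow_up):
    """Combine baseline values with follow-up/baseline ratios.

    baseline, follow_up: dicts mapping an index name to a numeric value
    (hypothetical layout). With 35 indexes this yields 70 features.
    Returns None if any value is missing or a baseline value is zero
    (assumed handling of records that cannot be used).
    """
    features = {}
    for name, base_value in baseline.items():
        later_value = follow_up.get(name)
        if base_value is None or later_value is None or base_value == 0:
            return None
        features[name] = base_value
        # Dynamic change value: value after treatment divided by baseline.
        features[name + "_ratio"] = later_value / base_value
    return features
```

For example, an NLR of 2.0 at baseline and 3.0 before the next cycle gives a dynamic change value of 1.5.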

S120, inputting the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result;

In some embodiments, training is performed by using an optimal feature subset of a patient sample based on a random forest model to obtain the immune response prediction model, as shown in fig. 2, specifically including the following steps:

S210: characteristic data of a patient sample is collected, wherein the characteristic data comprises peripheral blood index data and image data.

In some embodiments, collecting the characteristic data of the patient sample includes collecting the characteristic data of case samples that did not receive the target therapy and of case samples that received the target therapy. In data collection and preprocessing, for example, the following may be collected from the database of a target hospital for a given period, for patients with advanced refractory solid tumors in three clinical trials of regimens based on "PD-1 inhibitor combined with hypofractionated radiotherapy and cytokines (GM-CSF ± IL-2)": the baseline values, and 70 indexes consisting of 35 peripheral blood indexes before the 2nd-3rd treatment cycle and their dynamic change values (the ratio of the value before the 2nd-3rd treatment cycle to the baseline value), together with the patient's first imaging evaluation result (classified as SD, PR, CR or PD according to the Response Evaluation Criteria in Solid Tumors, RECIST 1.1). Missing data are not imputed; a record in which any included index is missing is not used in model construction.

The 35 peripheral blood indexes are blood indexes that have been reported in the literature as possibly related to immune response and that were measured in the clinical trials; they may include, for example: T lymphocyte ratio, T helper/suppressor lymphocyte ratio, B lymphocyte ratio, NK cell ratio, T lymphocyte absolute, T helper/suppressor lymphocyte absolute, T killer/suppressor lymphocyte absolute, B lymphocyte absolute, NK cell absolute, interleukin-2, interleukin-4, interleukin-6, interleukin-10, interleukin-17A, tumor necrosis factor, gamma interferon, carcinoembryonic antigen, albumin, lactate dehydrogenase, white blood cell count, lymphocyte ratio, neutrophil ratio, lymphocyte number, neutrophil number, monocyte ratio, NLR, eosinophil number, eosinophil ratio, basophil number, basophil ratio, international normalized ratio, D-dimer, fibrinogen.

S220: carrying out noise reduction treatment on the feature data, screening the feature data subjected to the noise reduction treatment by using regression analysis calculation, and obtaining an optimal feature subset;

S230: building, based on the optimal feature subset, a training set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$ containing m samples;

S240: based on the training set, a random forest is constructed.

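S250: calculating an output result of the decision trees by using a first expression, wherein the first expression is as follows:

$$H(x) = \arg\max_{y \in \mathcal{Y}} \sum_{t=1}^{T} \mathbb{I}\big(h_t(x) = y\big)$$

wherein $H(x)$ represents the output result, $\mathcal{Y}$ represents the label class set, $T$ represents the number of decision trees in the random forest, $h_t(x)$ represents the prediction result of the t-th decision tree, $\mathbb{I}(\cdot)$ is the indicator function, and $x$ represents the feature vector corresponding to each sample.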

Random forest (RF) is a Bagging algorithm: it builds a Bagging ensemble with decision trees as base learners and further introduces random attribute selection into the training of each decision tree. Using random forests for prediction has several benefits. First, random forests can handle missing values while maintaining high accuracy. Second, important features can be identified from the training dataset while the forest is being constructed. When selecting a partition attribute, a traditional decision tree selects the optimal attribute from the attribute set of the current node according to some rule; in a random forest, for each node of a decision tree, a subset containing several attributes is first selected at random from the attribute set of that node, and the optimal attribute for partitioning is then selected from this subset according to some rule.

Specifically, given the training set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$, a sampling set containing m samples is obtained by bootstrap sampling (the self-help sampling method); some samples of the initial training set appear several times in the sampling set and some never appear. The Bagging algorithm constructs a plurality of decision trees as follows:

$$h_t = F(D, \mathcal{D}_{bs}), \quad t = 1, 2, \dots, T$$

wherein $F$ represents the decision tree algorithm (CART decision trees are used by default), $T$ represents the number of decision trees in the random forest, and $\mathcal{D}_{bs}$ represents the sample distribution generated by bootstrap sampling. When the prediction results of the decision trees are combined, simple voting is typically used to determine the output result:

$$H(x) = \arg\max_{y \in \mathcal{Y}} \sum_{t=1}^{T} \mathbb{I}\big(h_t(x) = y\big)$$

where $\mathcal{Y}$ is the label class set and $\mathbb{I}(\cdot)$ is the indicator function.
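The bootstrap sampling and simple voting just described can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the patented implementation: the base learner `train_stump` is a toy one-feature threshold rule standing in for a CART decision tree, the random attribute selection of a full random forest is omitted, and all function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) items with replacement (bootstrap sampling)."""
    return [rng.choice(dataset) for _ in dataset]


def most_common_label(labels, default):
    """Most frequent label in a list, or default when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def train_stump(sample):
    """Toy base learner (stand-in for CART): split feature 0 at the sample mean."""
    threshold = sum(x[0] for x, _ in sample) / len(sample)
    overall = most_common_label([y for _, y in sample], None)
    above = most_common_label([y for x, y in sample if x[0] > threshold], overall)
    below = most_common_label([y for x, y in sample if x[0] <= threshold], overall)
    return lambda x: above if x[0] > threshold else below


def bagging(dataset, n_trees, seed=0):
    """Train n_trees base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(dataset, rng)) for _ in range(n_trees)]


def majority_vote(trees, x):
    """Return the label predicted by the largest number of base learners."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]
```

Each base learner sees a different resampled dataset, so individual learners can disagree; the vote is what turns them into a single prediction.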

when constructing a random forest in the invention, setting the number T=100 of decision trees in the random forest by using a feature subset obtained after feature selection by Lasso, generating 10 different random forest models by using S-fold cross validation, and finally obtaining average results of the cross validation models on a test set as shown in table 1:

Table 1. Random forest test results based on the optimal features obtained by cross-validation

Index	roc_auc	Accuracy	Precision	Recall	F1
Value	0.868	0.800	0.845	0.842	0.838
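The bootstrap sampling, random attribute selection, and simple voting described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: one-level decision stumps stand in for CART trees, the data are synthetic, and the helper names (`fit_stump`, `fit_forest`, `predict_forest`) are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Choose the (feature, threshold, polarity) with the fewest errors,
    searching only the randomly selected attributes in feature_ids."""
    best = None
    for j in feature_ids:
        for thr in sorted({row[j] for row in X}):
            for polarity in (0, 1):
                # polarity 1: predict 1 when the value exceeds thr; polarity 0: the opposite
                preds = [polarity if row[j] > thr else 1 - polarity for row in X]
                errors = sum(p != t for p, t in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, thr, polarity)
    return best[1:]


def predict_stump(stump, row):
    j, thr, polarity = stump
    return polarity if row[j] > thr else 1 - polarity


def fit_forest(X, y, n_trees=100, n_sub_features=2, seed=42):
    """Bagging: each base learner sees a bootstrap sample and a random attribute subset."""
    rng = random.Random(seed)
    m, k = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(m) for _ in range(m)]      # sample m rows with replacement
        feats = rng.sample(range(k), n_sub_features)    # random attribute subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Simple voting: return the label predicted by the most base learners."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Synthetic, deterministic toy data (not patient data): the label is 1 when i >= 10.
X = [[i / 20, i / 10, 1 - i / 20] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]

forest = fit_forest(X, y, n_trees=100)
print(predict_forest(forest, [0.9, 1.8, 0.1]), predict_forest(forest, [0.1, 0.2, 0.9]))
```

Because each base learner is trained on a different bootstrap sample and attribute subset, the individual stumps disagree near the class boundary, and the vote averages those disagreements out.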

In step S220, the feature data after the noise reduction processing are screened by regression analysis to obtain an optimal feature subset. As shown in Fig. 3, this step specifically includes the following steps:

S310: predicting the sample labels y of all samples in the training set with a second expression to obtain a prediction result for each sample label; the second expression is:

$$h_\theta(x) = \frac{1}{1 + e^{-(w^{\top} x + b)}}$$

wherein θ = (w, b) represents the target parameter, $h_\theta(x)$ represents the prediction result of the sample label y, x represents the feature vector corresponding to each sample, w represents the weight coefficient, and b represents the bias coefficient.

S320: according to the prediction result, constructing an optimization function with the objective of minimizing the sum of the logistic regression loss and the regularization term; the constructed optimization function is:

$$\min_{\theta} \; -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta\big(x^{(i)}\big)\big) \Big] + \lambda \sum_{j=1}^{k} |w_j|$$

where m represents the number of samples in the training set, θ = (w, b) represents the target parameter, λ represents the parameter of the regularization term, $x^{(i)}$ represents the i-th sample, $y^{(i)}$ represents the label corresponding to the i-th sample, k represents the number of features, and $w_j$ represents the weight coefficient of the j-th feature.

S330: solving for the parameters of the optimization function by gradient descent, and screening the features according to the weight corresponding to each feature to obtain the optimal feature subset.
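Steps S310 to S330 can be sketched in a few lines. The sketch below is an illustration under assumptions, not the implementation of the invention: it uses proximal gradient descent (a gradient step on the logistic loss followed by soft-thresholding) rather than plain gradient descent, so that the weights of uninformative features become exactly 0; the data are synthetic; and the function names and the values of `lam`, `lr`, and `n_iter` are hypothetical.

```python
import math
from itertools import product


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward 0 and clip it at 0."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_l1_logistic(X, y, lam=0.05, lr=0.1, n_iter=2000):
    """Minimize mean logistic loss + lam * sum(|w_j|); the bias b is not penalized."""
    m, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(n_iter):
        # Residuals h_theta(x_i) - y_i of the current logistic model (step S310).
        r = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - yi
             for x, yi in zip(X, y)]
        grad_w = [sum(ri * x[j] for ri, x in zip(r, X)) / m for j in range(k)]
        grad_b = sum(r) / m
        # Gradient step on the loss, then soft-thresholding for the L1 term (step S320).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Synthetic toy data (not patient data): only feature 0 determines the label;
# features 1 and 2 are balanced +/-1 values that carry no information.
X = [list(row) for row in product([-2, -1, 1, 2], [-1, 1], [-1, 1])]
y = [1 if row[0] > 0 else 0 for row in X]

w, b = fit_l1_logistic(X, y)
# Screening by weight (step S330): keep the features whose weight is non-zero.
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(selected)
```

A larger `lam` removes more features; in the text above, the number of retained features is chosen by cross-validation rather than fixed in advance.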

In the biomedical field, the original data set usually contains a large number of features, but not all of them matter for the problem at hand; too many features may introduce noise, increase computational complexity, and even cause the model to overfit. To simplify the model, reduce computational complexity, and improve the interpretability and generalization ability of the model, feature selection (Feature Selection) is needed to select the most relevant or representative feature subset from the original data set. In the invention, Lasso regression analysis is used for feature selection. Lasso (Least Absolute Shrinkage and Selection Operator) regression analysis is a linear regression method commonly used for feature selection and sparse modeling: by introducing an $L_1$ regularization term into the loss function, it promotes sparsity of the model coefficients, that is, it shrinks the coefficients of certain features or even sets them to 0.

Specifically, given a training data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, the feature vector of each sample, $x = (x_1, x_2, \ldots, x_k)$, is composed of k different features, and $x_j$ is the value of x on the j-th feature. Each feature corresponds to one index of the case sample, such as various lymphocytes and cytokines, carcinoembryonic antigen, lactate dehydrogenase, etc. The label y of a sample indicates the immune response of the tumor after treatment and is classified as response or non-response (according to the solid tumor efficacy evaluation standard RECIST 1.1, SD/PR/CR is classified as response and PD as non-response), so that the regression problem is converted into a classification problem. Lasso regression analysis requires predicting the true sample label y:

$$h_\theta(x) = \frac{1}{1 + e^{-(w^{\top} x + b)}}$$

where θ = (w, b) represents the parameters of the model. The goal of Lasso regression is to minimize the sum of the logistic regression loss and the regularization term:

$$\min_{\theta} \; -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta\big(x^{(i)}\big)\big) \Big] + \lambda \sum_{j=1}^{k} |w_j|$$

where m represents the number of samples in the training set and λ represents the parameter of the regularization term. Solving the above minimization problem by gradient descent gives the model parameters θ = (w, b), and the features are screened according to the weight $w_j$ corresponding to each feature to obtain a feature subset.
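The conversion of RECIST 1.1 categories into binary labels described above can be written as a small helper. This is an illustrative sketch: the function name `recist_to_label` and the encoding of response as 1 and non-response as 0 are assumptions made for the example.

```python
# RECIST 1.1 categories grouped as in the text: SD/PR/CR count as response, PD as non-response.
RESPONSE = {"CR", "PR", "SD"}
NON_RESPONSE = {"PD"}


def recist_to_label(category: str) -> int:
    """Map a RECIST 1.1 category to a binary label (1 = response, 0 = non-response)."""
    code = category.strip().upper()
    if code in RESPONSE:
        return 1
    if code in NON_RESPONSE:
        return 0
    raise ValueError(f"unknown RECIST category: {category!r}")


labels = [recist_to_label(c) for c in ["CR", "PR", "SD", "PD"]]
print(labels)
```

Raising an error on an unrecognized category, rather than defaulting to one class, keeps unevaluable cases from silently entering the training set.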

For feature selection in the invention, Lasso regression analysis is carried out separately on the cases without Bragg treatment and the cases with Bragg treatment, and the union of the two selected feature sets is taken as the result of feature selection. The number of features in the Lasso regression analysis is chosen by cross-validation. Specifically, the training data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$ is divided into S mutually exclusive subsets:

$$D = D_1 \cup D_2 \cup \cdots \cup D_S, \quad D_i \cap D_j = \varnothing \ (i \neq j)$$

To keep the data distribution consistent, each subset is obtained from the training set D by stratified sampling. S-1 subsets are used as the training set and the remaining subset as the test set, and the mean of the S test results is taken as the result for the corresponding number of features. Several evaluation criteria are provided in the invention, including "accuracy", "precision", "recall", and the like. For example, Fig. 4 shows how the "roc_auc" index on the training set (non-time line) varies with the number of features when the random seed for partitioning the cross-validation data is set to 42, where the training set and the test set contain 119 and 30 samples, respectively. As can be seen from Fig. 4, as the number of features increases, the "roc_auc" index on the training set first rises and then falls, reaching a maximum of 0.84 at 18 features. The optimal feature subset obtained through cross-validation comprises 18 blood indices, including: T killer/suppressor lymphocyte ratio, T absolute change, T helper/inducer lymphocyte absolute value, IL-4 change, IL-6 change, IL-10 change, IFN-gamma change, CEA carcinoembryonic antigen, CEA change, lactate dehydrogenase, LDH change, lymphocyte ratio, neutrophil count, monocyte ratio change, basophil ratio change, and D-dimer. In one embodiment, different random seeds were also tried for partitioning the cross-validation data, and the resulting curves follow a trend similar to that of Fig. 4. The feature subset obtained after feature selection using Lasso was evaluated on the test set, and the results are shown in Table 2:

Table 2. Logistic regression classifier test results based on the optimal features obtained by cross-validation

Index	roc_auc	Accuracy	Precision	Recall	F1
Value	0.765	0.693	0.772	0.731	0.750
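The stratified division of the training set into S mutually exclusive subsets can be sketched as follows. The sketch is illustrative only: the helper name `stratified_folds` is hypothetical, and although the total of 149 samples matches the 119 + 30 split mentioned above, the class balance (90 responders, 59 non-responders) is invented for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=42):
    """Split sample indices into n_folds disjoint subsets with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin over the folds.
        for pos, i in enumerate(indices):
            folds[pos % n_folds].append(i)
    return folds


# Hypothetical labels: 149 samples in total, class balance assumed for illustration.
labels = [1] * 90 + [0] * 59
folds = stratified_folds(labels, n_folds=5)

for s, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    print(s, len(train_idx), len(test_idx))
```

Each fold serves once as the test set while the remaining S-1 folds form the training set; averaging the S test scores gives the value reported for a given number of features.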

Compared with ridge regression (Ridge Regression), which uses an $L_2$ regularization term, Lasso regression can likewise address model overfitting, as shown in Fig. 5, where $\hat{\beta}$ represents the optimal solution of the model. Assume that the objective function of the linear regression model without a regularization term is:

$$J(\beta) = \sum_{i=1}^{m} \big(y^{(i)} - \beta^{\top} x^{(i)}\big)^2$$

The ellipses in Fig. 5 represent the contour lines of the objective function. Using an $L_1$ or $L_2$ regularization term is equivalent to restricting the model parameter β to the gray region. Lasso regression restricts β to a square region, so the point at which a contour line of the objective function touches the square region is more likely to lie on a coordinate axis; Lasso regression therefore readily drives some of the model weights to exactly 0.
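A one-dimensional worked case makes the geometric argument concrete. Assume, purely for illustration, a single parameter β with a quadratic loss whose unregularized minimizer is $\hat{\beta}$, and a penalty weight λ > 0. The two penalties then have closed-form solutions:

$$\beta_{L_1} = \arg\min_{\beta} \; \tfrac{1}{2}(\beta - \hat{\beta})^2 + \lambda |\beta| = \operatorname{sign}(\hat{\beta}) \, \max\big(|\hat{\beta}| - \lambda, \; 0\big)$$

$$\beta_{L_2} = \arg\min_{\beta} \; \tfrac{1}{2}(\beta - \hat{\beta})^2 + \tfrac{\lambda}{2} \beta^2 = \frac{\hat{\beta}}{1 + \lambda}$$

With the $L_1$ penalty the solution is exactly 0 whenever $|\hat{\beta}| \le \lambda$, whereas the $L_2$ penalty only rescales $\hat{\beta}$ and yields 0 only if $\hat{\beta}$ itself is 0. This is the one-dimensional counterpart of the contour line touching the square region on a coordinate axis.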

In the above specific embodiment, the Bragg treatment immune response prediction method provided by the present invention obtains characteristic data of a target patient, where the characteristic data includes peripheral blood index data and image data, and inputs the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result; the immune response prediction model is obtained by training an optimal feature subset of a patient sample based on a random forest model, and the optimal feature subset is obtained by processing feature data of the patient sample through a regression algorithm.

According to the invention, the random forest is used for analyzing the hematology data of the existing cases and constructing a model, and the possible effect of a new patient after receiving the treatment can be predicted after the relevant hematology data of the new patient is input; meanwhile, the treatment response of the patient is predicted through the almost noninvasive detection of the hematology index, so that the frequency of medical image examination and the potential radiation damage caused by the image examination are greatly reduced, the burden of the patient and the society is reduced, and the medical resource waste is reduced. In this way, the immune response and survival of the Bragg treatment patient are predicted by using a pre-trained prediction model, so that a tumor patient possibly benefiting from Bragg scheme treatment is identified noninvasively by using a relatively accurate prediction result, and data support is provided for early and accurate individualized intervention on different patients in different periods.

In addition to the above method, the present invention also provides a Bragg treatment immune response predicting device, as shown in FIG. 6, comprising:

a data acquisition unit 610 for acquiring feature data of a target patient, the feature data including peripheral blood index data and image data;

a result output unit 620 for inputting the feature data into a pre-trained immune response prediction model to obtain an immune response prediction result;

building a training set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$ containing m samples based on an optimal feature subset; constructing a random forest based on the training set. In some embodiments, the output result of each of the decision trees is calculated using a first expression;

the first expression is:

$$H(x) = \arg\max_{y \in \mathcal{Y}} \sum_{t=1}^{T} \mathbb{I}\big(h_t(x) = y\big)$$

In some embodiments, screening the feature data after the noise reduction processing by using a regression analysis algorithm to obtain an optimal feature subset specifically includes:

based on LASSO regression, that is, $L_1$ regularization introduced on the basis of logistic regression;

the second expression is:

$$h_\theta(x) = \frac{1}{1 + e^{-(w^{\top} x + b)}}$$

wherein θ = (w, b) represents the target parameter, $h_\theta(x)$ represents the prediction result of the sample label y according to the magnitude of the probability, x represents the feature vector corresponding to each sample, w represents the weight coefficient, and b represents the bias coefficient.

In the above specific embodiment, the Bragg treatment immune response prediction device provided by the present invention obtains the characteristic data of the target patient, where the characteristic data includes peripheral blood index data and image data, and inputs the characteristic data into a pre-trained immune response prediction model to obtain an immune response prediction result; the immune response prediction model is obtained by training an optimal feature subset of a patient sample based on a random forest model, and the optimal feature subset is obtained by processing feature data of the patient sample through a regression algorithm.
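The division of the device into a data acquisition unit and a result output unit can be pictured with a short sketch. Everything in it is hypothetical and for illustration only: the class names, the feature names, and in particular the stand-in model, which is an arbitrary function and not the trained random forest or any clinical rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class DataAcquisitionUnit:
    """Collects the feature values of one patient in a fixed order."""
    feature_names: Sequence[str]

    def acquire(self, record: Dict[str, float]) -> List[float]:
        missing = [name for name in self.feature_names if name not in record]
        if missing:
            raise KeyError(f"missing features: {missing}")
        return [float(record[name]) for name in self.feature_names]


@dataclass
class ResultOutputUnit:
    """Feeds the feature vector to a pre-trained model and reports the prediction."""
    model: Callable[[Sequence[float]], int]

    def predict(self, features: Sequence[float]) -> str:
        return "response" if self.model(features) == 1 else "non-response"


acquisition = DataAcquisitionUnit(feature_names=["feature_a", "feature_b"])
# Hypothetical stand-in for the pre-trained model; it has no clinical meaning.
output = ResultOutputUnit(model=lambda features: 1 if sum(features) > 0 else 0)

features = acquisition.acquire({"feature_a": 0.4, "feature_b": -0.1})
print(output.predict(features))
```

Keeping the model behind a plain callable lets the same two units wrap any trained classifier without changing the acquisition code.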

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in Fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and model predictions. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The model predictions of the computer device are used to store static information and dynamic information data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, carries out the steps of the above method embodiments.

It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

Corresponding to the above embodiments, the present invention further provides a computer storage medium, which contains one or more program instructions. The one or more program instructions are used to execute the method described above.

The present invention also provides a computer program product comprising a computer program storable on a non-transitory computer readable storage medium, the computer program being capable of performing the above method when being executed by a processor.

In the embodiment of the invention, the processor may be an integrated circuit chip with signal processing capability. The processor may be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field programmable gate array (Field Programmable Gate Array, FPGA for short), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components.

The disclosed methods, steps, and logic blocks in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The processor reads the information in the storage medium and, in combination with its hardware, performs the steps of the above method.

The storage medium may be memory, for example, may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.

The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (Programmable ROM, PROM), an erasable PROM (Erasable PROM, EPROM), an electrically erasable PROM (Electrically EPROM, EEPROM), or a flash memory.

The volatile memory may be a random access memory (Random Access Memory, RAM for short), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (Double Data Rate SDRAM, DDRSDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (Direct Rambus RAM, DRRAM).

The storage media described in embodiments of the present invention are intended to comprise, without being limited to, these and any other suitable types of memory.

Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in a combination of hardware and software. When implemented in software, the corresponding functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on the computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.

The foregoing detailed description of the invention has been presented for purposes of illustration and description, and it should be understood that the foregoing is by way of illustration and description only, and is not intended to limit the scope of the invention.

Claims

1. A method of predicting an immune response to Bragg treatment, the method comprising:

2. The method of claim 1, wherein training with an optimal feature subset of patient samples based on a random forest model results in the immune response prediction model, comprising:

carrying out noise reduction treatment on the feature data, screening the feature data subjected to the noise reduction treatment by using a regression analysis algorithm, and obtaining an optimal feature subset;

3. The method of claim 2, wherein the output of each of the decision trees is calculated using a first expression;

the first expression is:

$$H(x) = \arg\max_{y \in \mathcal{Y}} \sum_{t=1}^{T} \mathbb{I}\big(h_t(x) = y\big)$$

wherein H(x) represents the output result of the decision tree, y represents a real sample label, $\mathcal{Y}$ represents the label class set, T represents the number of decision trees in the random forest, t represents the t-th decision tree in the forest, $h_t(x)$ represents the prediction result of the sample label y, and x represents the feature vector corresponding to each sample.

4. The method for predicting the immune response of the Bragg treatment according to claim 2, wherein the regression analysis algorithm is used to screen the feature data after the noise reduction treatment, and the optimal feature quantity is selected through cross-validation to obtain the optimal feature subset, and the method specifically comprises the following steps:

5. The method of claim 4, wherein all sample tags y in the training set are predicted using the second expression;

the second expression is:

$$h_\theta(x) = \frac{1}{1 + e^{-(w^{\top} x + b)}}$$

wherein θ = (w, b) represents the target parameter, $h_\theta(x)$ represents, according to the magnitude of the probability, the prediction result of the sample label (assigned as 1 or 0 in the case of classification), x represents the feature vector corresponding to each sample, w represents the weight coefficient, and b represents the bias coefficient.

6. The method of claim 2, wherein the optimization function constructed with the objective of minimizing the sum of the logistic regression loss and the regularization term is:

$$\min_{\theta} \; -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta\big(x^{(i)}\big)\big) \Big] + \lambda \sum_{j=1}^{k} |w_j|$$

wherein m represents the number of samples in the training set, θ = (w, b) represents the target parameter, $h_\theta(x^{(i)})$ represents the conditional probability of the label value given the sample, λ represents the parameter of the regularization term, $x^{(i)}$ represents the i-th sample, $y^{(i)}$ represents the label corresponding to the i-th sample, k represents the number of features, and $w_j$ represents the weight coefficient of the j-th feature.

7. The method of claim 2, wherein collecting characteristic data of the patient sample comprises collecting characteristic data of a case sample not taking the target therapy and characteristic data of a case sample taking the target therapy.

8. A Bragg treatment immune response prediction device, the device comprising:

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-7 when the program is executed.

10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1-7.