CN114266729A

CN114266729A - Chest tumor radiotherapy-based radiation pneumonitis prediction method and system based on machine learning

Info

Publication number: CN114266729A
Application number: CN202111434814.2A
Authority: CN
Inventors: 陈思嘉; 石丽婉; 李涛; 高翔; 康峥; 李夷民; 林勤
Original assignee: First Affiliated Hospital of Xiamen University
Current assignee: First Affiliated Hospital of Xiamen University
Priority date: 2021-11-29
Filing date: 2021-11-29
Publication date: 2022-04-01

Abstract

The invention provides a machine-learning-based method for predicting radiation pneumonitis after chest tumor radiotherapy, comprising the following steps: acquiring patient data; converting non-numeric features in the patient data to numeric values, predicting missing feature values, and removing redundant features to obtain preprocessed patient data; dividing the preprocessed patient data into a training set and a test set, and training to generate an improved support vector machine model; using the test set to calculate the area under the receiver operating characteristic (ROC) curve (AUC) in order to evaluate the accuracy of the model; and inputting the relevant data of the patient to be predicted into the validated model to predict the occurrence of radiation pneumonitis. The method can combine a large medical case database with multiple factors such as the patient's clinical information, dose information, and CT radiomics, and can predict the occurrence of radiation pneumonitis in chest tumor patients after radiotherapy more quickly and intuitively.

Description

Chest tumor radiotherapy-based radiation pneumonitis prediction method and system based on machine learning

Technical Field

The invention relates to the field of machine learning, in particular to prediction of radiation pneumonitis after chest tumor radiotherapy based on machine learning.

Background

Radiation pneumonitis is a form of radiation-induced lung injury and is the principal and most serious complication of chest tumor radiotherapy; it usually appears within 1-3 months after radiotherapy starts. Between 10% and 30% of patients receiving thoracic radiotherapy develop radiation pneumonitis, which not only limits the delivery of treatment and impairs its effect but also reduces patients' quality of life and survival rate, so reducing its incidence is of great clinical significance. Accurate prediction of radiation pneumonitis enables timely clinical intervention and reduces the radiotherapy risk to patients.

Current research has found that the occurrence of acute radiation pneumonitis is associated with many factors, such as: patient age, sex, KPS (Karnofsky Performance Status) score, presence of hypertension, whether chemotherapy was given, number of chemotherapy cycles before radiotherapy, tumor target volume, tumor location, lymph node location, mean lung dose, lung volume receiving 5 Gy, 10 Gy, 20 Gy, and 30 Gy, daily irradiation dose, and total irradiation dose over the course of treatment. In routine clinical radiotherapy planning, the risk of radiation pneumonitis is assessed from the lung volumes receiving 5 Gy and 20 Gy (V5Gy, V20Gy) and the mean lung dose. However, no simple and effective model that predicts the occurrence of radiation pneumonitis by combining these factors has yet been established.

Disclosure of Invention

The main purpose of the invention is to overcome the above defects in the prior art by providing a machine-learning-based method for predicting radiation pneumonitis after chest tumor radiotherapy. The method can combine a large medical case database with multiple factors such as the patient's clinical information, dose information, and CT radiomics to predict the occurrence of radiation pneumonitis in chest tumor patients after radiotherapy more quickly and intuitively.

The invention adopts the following technical scheme:

a method for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning comprises the following steps:

acquiring patient data, wherein the patient data comprises the patient's lung CT (computed tomography) radiomics data, clinical data, and radiotherapy plan dosimetry data;

converting non-numeric features in the patient data to numeric values, predicting missing feature values, and removing redundant features to obtain preprocessed patient data;

dividing the preprocessed patient data into a training set and a test set, and training to generate an improved support vector machine model;

using the test set to calculate the area under the receiver operating characteristic (ROC) curve (AUC) in order to evaluate the accuracy of the model; if the AUC is greater than or equal to 0.9, the validation is passed and the validated model is output; if the AUC is less than 0.9, the support vector machine model is regenerated;

and inputting the relevant data of the patient to be predicted into the validated model to predict the occurrence of radiation pneumonitis.

Specifically, the patient data comprises the patient's lung CT radiomics data, clinical data, and radiotherapy plan dosimetry data, as follows:

the lung CT radiomics data comprise: the first-order minimum under the LLH wavelet transform; the first-order minimum and the gray-level co-occurrence matrix (GLCM) autocorrelation under the HHL wavelet transform; and the GLCM autocorrelation under the HHH wavelet transform;

the clinical data comprise: the age, sex, KPS score, hypertension status, number of chemotherapy cycles before radiotherapy, tumor target volume, tumor location, and lymph node location of the chest tumor radiotherapy patient;

the radiotherapy plan dosimetry data comprise: the mean lung dose, the lung volume receiving 5 Gy, 10 Gy, 20 Gy, and 30 Gy, the daily irradiation dose, and the total irradiation dose over the course of treatment.

Specifically, the missing feature values are predicted and the redundant features are removed as follows:

fitting the non-missing patient data with the K-nearest-neighbor algorithm and predicting the missing feature values;

according to the variance-maximization principle, representing the rows/columns of the original data matrix by a new set of linearly independent, mutually orthogonal vectors, thereby compressing the number of features and eliminating redundant features.
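The two preprocessing operations above can be sketched as follows. This is only an illustration under stated assumptions: the variance-maximizing orthogonal transform is read here as principal component analysis (PCA), scikit-learn's `KNNImputer` and `PCA` are used as stand-ins, and the neighbor count, the retained-variance share, and the small data matrix are made up rather than taken from the source.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import KNNImputer
from sklearn.pipeline import make_pipeline


def build_preprocessor(n_neighbors=5, variance_kept=0.95):
    """KNN imputation followed by PCA (both settings are assumptions)."""
    return make_pipeline(
        # Fill each missing value from the rows that are nearest on the observed features.
        KNNImputer(n_neighbors=n_neighbors),
        # Keep the orthogonal components that together explain `variance_kept` of the variance.
        PCA(n_components=variance_kept, svd_solver="full"),
    )


# Made-up matrix: rows are patients, columns are features, NaN marks a missing value.
X_demo = np.array([
    [61.0, 80.0, 14.2, 28.4],
    [55.0, 90.0, np.nan, 25.0],
    [70.0, 70.0, 18.9, 37.8],
    [48.0, np.nan, 11.0, 22.0],
    [66.0, 80.0, 16.5, 33.0],
])
X_reduced = build_preprocessor(n_neighbors=2).fit_transform(X_demo)
```

Because PCA is sensitive to feature scale, features measured in different units are commonly standardized before this step; whether the source does so is not stated.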

Specifically, training to generate the improved support vector machine model comprises:

a training sample set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, $y_i \in \{-1, +1\}$, where m is the number of training samples;

introducing a "soft margin" that allows certain samples not to satisfy the constraint

$$y_i(\omega^T x_i + b) \ge 1$$

The optimization objective can be written as

$$\min_{\omega, b, \xi_i} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{m} \xi_i$$

$$\text{s.t.}\quad y_i(\omega^T x_i + b) \ge 1 - \xi_i$$

$$\xi_i \ge 0,\quad i = 1, 2, \ldots, m.$$

The model is the improved support vector machine model;

the Gaussian kernel function in the model is:

$$\kappa(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right)$$

where $\omega$ and $b$ are model parameters, $C$ is the penalty coefficient, $\gamma$ is the width coefficient, and $\xi_i$ is the slack variable, with $\xi_i \ge 0$.

An embodiment of the invention also provides a machine-learning-based system for predicting radiation pneumonitis after chest tumor radiotherapy, comprising:

a patient data acquisition module: acquiring patient data, wherein the patient data comprises the patient's lung CT (computed tomography) radiomics data, clinical data, and radiotherapy plan dosimetry data;

a data preprocessing module: converting non-numeric features in the patient data to numeric values, predicting missing feature values, and removing redundant features to obtain preprocessed patient data;

a model training module: dividing the preprocessed patient data into a training set and a test set, and training to generate an improved support vector machine model;

a model evaluation module: using the test set to calculate the area under the receiver operating characteristic (ROC) curve (AUC) in order to evaluate the accuracy of the model; if the AUC is greater than or equal to 0.9, the validation is passed and the validated model is output; if the AUC is less than 0.9, the support vector machine model is regenerated;

a prediction module: inputting the relevant data of the patient to be predicted into the validated model to predict the occurrence of radiation pneumonitis.

Specifically, in the data preprocessing module, the feature missing value prediction and the redundant feature removal are performed, specifically:

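Specifically, in the model training module, training to generate the improved support vector machine model comprises:

a training sample set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, $y_i \in \{-1, +1\}$, where m is the number of training samples;

introducing a "soft margin" that allows certain samples not to satisfy the constraint

$$y_i(\omega^T x_i + b) \ge 1$$

The optimization objective can be written as

$$\min_{\omega, b, \xi_i} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{m} \xi_i$$

$$\text{s.t.}\quad y_i(\omega^T x_i + b) \ge 1 - \xi_i$$

$$\xi_i \ge 0,\quad i = 1, 2, \ldots, m.$$

The model is the improved support vector machine model;

the Gaussian kernel function in the model is:

$$\kappa(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right)$$

where $\omega$ and $b$ are model parameters, $C$ is the penalty coefficient, $\gamma$ is the width coefficient, and $\xi_i$ is the slack variable, with $\xi_i \ge 0$.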

As can be seen from the above description of the present invention, compared with the prior art, the present invention has the following advantages:

(1) The invention provides a machine-learning-based method for predicting radiation pneumonitis after chest tumor radiotherapy. Patient data are first acquired, comprising the patient's lung CT (computed tomography) radiomics data, clinical data, and radiotherapy plan dosimetry data. Non-numeric features in the patient data are converted to numeric values, missing feature values are predicted, and redundant features are removed to obtain preprocessed patient data. The preprocessed patient data are divided into a training set and a test set, and an improved support vector machine model is trained. The test set is used to calculate the area under the receiver operating characteristic (ROC) curve (AUC) in order to evaluate the accuracy of the model; if the AUC is greater than or equal to 0.9, the validation is passed and the validated model is output; if the AUC is less than 0.9, the support vector machine model is regenerated. The relevant data of the patient to be predicted are input into the validated model to predict the occurrence of radiation pneumonitis. The method can combine a large medical case database with multiple factors such as the patient's clinical information, dose information, and CT radiomics, and can predict the occurrence of radiation pneumonitis in chest tumor patients after radiotherapy more quickly and intuitively. Compared with traditional single-factor prediction and traditional dosimetric prediction, the machine learning approach takes many parameters as input simultaneously and predicts radiation pneumonitis more accurately.

(2) The method adopts an improved support vector machine model that introduces a soft margin into the model, enabling fast and effective prediction.

Drawings

Fig. 1 is a flowchart of a method for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning according to an embodiment of the present invention;

Fig. 2 is a structural diagram of a system for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning according to an embodiment of the present invention.

The invention is described in further detail below with reference to the figures and specific examples.

Detailed Description

Radiomics features combine the shape, intensity, and texture features of the original lesion image with those of images transformed by various filters (e.g., wavelet and Laplacian of Gaussian). By combining a feature selection method with a machine learning algorithm, a prediction model can be built on a training data set and then evaluated on a test data set.

Machine learning is the ability of a machine to learn from large data sets and to predict future events and outcomes. In healthcare, machine learning aims to improve the interpretation of medical data, thereby speeding up workflows, reducing errors, eliminating unnecessary expense, and improving human health. The risk factors for radiation pneumonitis, namely the patient's imaging features, clinical and treatment parameters, and dosimetry parameters, are intertwined, so radiation pneumonitis cannot be linked to any single parameter. Predicting radiation pneumonitis with machine learning methods can help reduce toxicity to patients, improve their quality of life, and lower their medical costs.

As shown in Fig. 1, the method for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning according to an embodiment of the present invention specifically includes:

S1: acquiring patient data, wherein the patient data comprises the patient's lung CT (computed tomography) radiomics data, clinical data, and radiotherapy plan dosimetry data;

In addition, when a new patient is treated, the corresponding data can be added to the patient data, which increases the number of samples and improves prediction accuracy.

S2: converting non-numeric features in the patient data to numeric values, predicting missing feature values, and removing redundant features to obtain preprocessed patient data;

Non-numeric features in the patient data are converted to numeric values. For example, lymph node location is classified as mediastinum, lung, or clavicle, and during preprocessing it is encoded as a number: 1 = mediastinum, 2 = lung, 3 = clavicle. Other non-numeric features are treated in the same way.
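The encoding step can be illustrated with a small lookup-table sketch. Only the lymph node codes come from the text; the field names, the sex and yes/no code tables, and the sample record are hypothetical.

```python
# Code table from the text: 1 = mediastinum, 2 = lung, 3 = clavicle.
LYMPH_NODE_CODES = {"mediastinum": 1, "lung": 2, "clavicle": 3}
# Hypothetical code tables for other non-numeric fields (not given in the source).
SEX_CODES = {"female": 0, "male": 1}
YES_NO_CODES = {"no": 0, "yes": 1}


def encode_patient(record):
    """Return a copy of `record` with its categorical fields replaced by numeric codes."""
    encoded = dict(record)
    encoded["lymph_node_location"] = LYMPH_NODE_CODES[record["lymph_node_location"]]
    encoded["sex"] = SEX_CODES[record["sex"]]
    encoded["hypertension"] = YES_NO_CODES[record["hypertension"]]
    return encoded


patient = {"age": 63, "sex": "male", "hypertension": "no", "lymph_node_location": "lung"}
encoded_patient = encode_patient(patient)
```

An unknown category raises `KeyError` here, which surfaces data-entry problems early instead of silently assigning a code.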

S3: dividing the preprocessed patient data into a training set and a testing set, and training to generate an improved support vector machine model;

70% of the data are randomly selected as model training data and the remaining 30% are used as model test data; the training data are used to generate the support vector machine model.
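A 70/30 random split of this kind might look as follows with scikit-learn's `train_test_split`. The stratification by label and the fixed random seed are assumptions added for reproducibility and class balance; the source specifies only the random 70/30 division. The data are made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_70_30(X, y, seed=0):
    """Randomly assign 70% of the samples to training and 30% to testing."""
    # stratify=y keeps the class proportions similar in both parts (an assumption).
    return train_test_split(X, y, test_size=0.30, random_state=seed, stratify=y)


# Made-up data: 20 patients, 3 features, labels +1 (pneumonitis) / -1 (none).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = np.array([1] * 6 + [-1] * 14)
X_train, X_test, y_train, y_test = split_70_30(X_demo, y_demo)
```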

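Given a training sample set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, $y_i \in \{-1, +1\}$, a separating hyperplane is sought in the sample space, based on the training set D, to separate the different classes. For training samples that are not linearly separable, the samples can be mapped from the original space to a higher-dimensional feature space in which they are linearly separable.

Let $\phi(x)$ denote the feature vector obtained by mapping x. The model corresponding to the separating hyperplane in the feature space can then be expressed as

$$f(x) = \omega^T \phi(x) + b$$

where $\omega$ and $b$ are model parameters. To maximize the margin between the sample points and the hyperplane $(\omega, b)$, i.e., to find the separating hyperplane with the "maximum margin", one solves

$$\min_{\omega, b} \ \frac{1}{2}\|\omega\|^2$$

$$\text{s.t.}\quad y_i(\omega^T \phi(x_i) + b) \ge 1,\quad i = 1, 2, \ldots, m.$$

The dual problem is

$$\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \phi(x_i)^T \phi(x_j)$$

$$\text{s.t.}\quad \sum_{i=1}^{m} \alpha_i y_i = 0,$$

$$\alpha_i \ge 0,\quad i = 1, 2, \ldots, m.$$

Here $\phi(x_i)^T \phi(x_j)$ is the inner product of samples $x_i$ and $x_j$ after mapping to the feature space. Since the dimension of the feature space may be very high, computing $\phi(x_i)^T \phi(x_j)$ directly is usually difficult. A kernel function can therefore be defined:

$$\kappa(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle = \phi(x_i)^T \phi(x_j)$$

The dual problem can be rewritten as

$$\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j)$$

$$\text{s.t.}\quad \sum_{i=1}^{m} \alpha_i y_i = 0,$$

$$\alpha_i \ge 0,\quad i = 1, 2, \ldots, m.$$

After solving, one obtains

$$f(x) = \omega^T \phi(x) + b = \sum_{i=1}^{m} \alpha_i y_i \kappa(x, x_i) + b$$

$\kappa(\cdot, \cdot)$ is the "kernel function", and the above expression is called the "support vector expansion".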
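To make the kernel form of the decision function concrete, the sketch below evaluates $f(x) = \sum_i \alpha_i y_i \kappa(x, x_i) + b$ with NumPy. It assumes the Gaussian kernel in the form $\exp(-\gamma\|x_i - x_j\|^2)$; the support vectors, multipliers, and bias are hand-picked hypothetical values, not the output of any training run.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negative round-off


def decision_function(X_new, support_vectors, alpha, y_sv, b, gamma):
    """Support vector expansion f(x) = sum_i alpha_i * y_i * kappa(x, x_i) + b."""
    return rbf_kernel(X_new, support_vectors, gamma) @ (alpha * y_sv) + b


# Hypothetical support vectors and multipliers; note sum(alpha * y_sv) == 0,
# as the equality constraint of the dual problem requires.
sv = np.array([[0.0, 0.0], [2.0, 0.0]])
alpha = np.array([1.0, 1.0])
y_sv = np.array([-1.0, 1.0])
queries = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
scores = decision_function(queries, sv, alpha, y_sv, b=0.0, gamma=1.0)
```

The mapping $\phi$ never appears in the code: only kernel values between the query points and the support vectors are needed, which is the practical benefit of the kernel formulation.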

In real-world problems it is often difficult to find a kernel function under which the training samples are linearly separable in the feature space. To avoid overfitting, the support vector machine is allowed to make errors on some samples: a "soft margin" is introduced, which allows some samples not to satisfy the constraint

$$y_i(\omega^T x_i + b) \ge 1$$

While the margin is maximized, the samples that violate the constraint should be as few as possible. The optimization objective can therefore be written as

$$\min_{\omega, b} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{m} \ell_{0/1}\left(y_i(\omega^T x_i + b) - 1\right)$$

where C > 0 is a constant called the penalty coefficient; when C takes a finite value, the above expression allows some samples not to satisfy the constraint. $\ell_{0/1}$ is the "0/1 loss function":

$$\ell_{0/1}(z) = \begin{cases} 1, & z < 0 \\ 0, & \text{otherwise} \end{cases}$$

However, because $\ell_{0/1}$ is non-convex and discontinuous, its mathematical properties are poor, so it is usually replaced by another function, called a "surrogate loss". The model uses the hinge loss $\ell_{hinge}(z) = \max(0, 1 - z)$, so the optimization objective can be written as

$$\min_{\omega, b} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{m} \max\left(0,\ 1 - y_i(\omega^T x_i + b)\right)$$

"Slack variables" $\xi_i \ge 0$ are introduced, one for each sample, representing the degree to which that sample fails to satisfy the constraint. The above expression can then be rewritten as

$$\min_{\omega, b, \xi_i} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{m} \xi_i$$

$$\text{s.t.}\quad y_i(\omega^T x_i + b) \ge 1 - \xi_i$$

$$\xi_i \ge 0,\quad i = 1, 2, \ldots, m.$$

This is the "improved support vector machine". Its dual problem is obtained by the method of Lagrange multipliers:

$$\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j$$

$$\text{s.t.}\quad \sum_{i=1}^{m} \alpha_i y_i = 0,$$

$$0 \le \alpha_i \le C,\quad i = 1, 2, \ldots, m.$$
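The link between the hinge loss and the slack variables can be checked numerically. The sketch below, for a linear model with made-up weights and data, computes $\xi_i = \max(0, 1 - y_i(\omega^T x_i + b))$ for each sample and the resulting soft-margin objective; all numbers are illustrative assumptions.

```python
import numpy as np


def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks) for the soft-margin SVM with a linear model.

    slacks[i] = max(0, 1 - y_i * (w^T x_i + b)) is the hinge loss of sample i,
    which is also the smallest slack xi_i that satisfies both constraints.
    """
    margins = y * (X @ w + b)
    slacks = np.maximum(0.0, 1.0 - margins)
    objective = 0.5 * np.dot(w, w) + C * np.sum(slacks)
    return objective, slacks


# Made-up 1-D example: the third sample lies on the wrong side of the boundary.
X = np.array([[2.0], [-2.0], [-0.5]])
y = np.array([1.0, -1.0, 1.0])
objective, slacks = soft_margin_objective(w=np.array([1.0]), b=0.0, X=X, y=y, C=10.0)
```

Samples that satisfy the original constraint get zero slack, so only margin violations are penalized, each in proportion to C.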

The choice of the kernel function mentioned above depends on the type of hyperplane. The model uses the Gaussian kernel function (RBF kernel), which is suitable for the linearly inseparable case:

$$\kappa(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right)$$

The width coefficient $\gamma$ in the formula defines the range of influence of a single sample; the larger $\gamma$ is, the more support vectors there are.

The penalty coefficient (regularization parameter) C mentioned above defines how tolerant the model is of samples that violate the constraint.

Using the data in the training set, a grid search with cross-validation (GridSearchCV) is performed, in which C takes 15 values in the range from $10^{-4}$ to $10^{10}$ and $\gamma$ takes 24 values in the range from $10^{-4}$ to $10^{3}$. All given combinations of $\gamma$ and C are traversed and trained on the data to find the combination that gives the highest prediction accuracy, and the support vector machine model is generated with it.
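A search of this kind could be set up with scikit-learn roughly as follows. The ranges and counts follow the text; the logarithmic spacing of the grid points, the number of cross-validation folds, and the function name `tune_svm` are assumptions, since the source gives only the endpoints and the number of values.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Grids from the text: 15 values of C in [1e-4, 1e10], 24 values of gamma in [1e-4, 1e3].
# Logarithmic spacing is an assumption; the source gives only the ranges and counts.
C_GRID = np.logspace(-4, 10, 15)
GAMMA_GRID = np.logspace(-4, 3, 24)


def tune_svm(X_train, y_train, c_grid=C_GRID, gamma_grid=GAMMA_GRID, folds=5):
    """Exhaustively search (C, gamma) for an RBF-kernel SVC by cross-validated accuracy."""
    search = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid={"C": c_grid, "gamma": gamma_grid},
        scoring="accuracy",  # the text selects the pair with the highest prediction accuracy
        cv=folds,            # the number of folds is an assumption
    )
    search.fit(X_train, y_train)
    # best_estimator_ is refit on the whole training set with the best pair.
    return search.best_estimator_, search.best_params_
```

The full grid has 15 × 24 = 360 combinations, each fitted once per fold, so on larger data sets the search can be slow, particularly at very large C.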

S4: using the test set to calculate the area under the receiver operating characteristic (ROC) curve (AUC) in order to evaluate the accuracy of the model; if the AUC is greater than or equal to 0.9, the validation is passed and the validated model is output; if the AUC is less than 0.9, the support vector machine model is regenerated;
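The acceptance rule in S4 can be sketched in plain Python. The AUC is computed here through its rank interpretation, the fraction of (positive, negative) pairs in which the positive sample receives the higher score; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead. The labels and scores below are made up.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def passes_validation(y_true, scores, threshold=0.9):
    """Apply the acceptance rule from the text: the model passes when AUC >= 0.9."""
    return roc_auc(y_true, scores) >= threshold


# Made-up test-set labels (+1 = pneumonitis, -1 = none) and model scores.
y_test = [1, -1, 1, -1, -1]
scores = [0.9, 0.2, 0.4, 0.5, 0.1]
auc = roc_auc(y_test, scores)
```

With a support vector machine, the scores would typically be the signed distances returned by the decision function rather than hard class labels.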

S5: inputting the relevant data of the patient to be predicted into the validated model to predict the occurrence of radiation pneumonitis.

Based on the prediction result, for patients predicted to develop radiation pneumonitis, the feature parameters can be adjusted, for example by lowering the prescribed radiotherapy dose, modifying the radiotherapy plan, strictly controlling the dose to the lungs, or adjusting the number of chemotherapy cycles. The new feature parameters are then input into the output model to predict the occurrence of radiation pneumonitis again.
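The adjust-and-repredict loop described above might be organized as in the following sketch. Everything here is hypothetical: the feature names, the `ThresholdModel` stand-in (used only so the example runs without a trained model), and its 15 Gy cut-off, which is not a clinical recommendation and does not come from the source.

```python
def predict_with_adjustment(model, features, adjustments):
    """Re-run the model on a copy of `features` with some values replaced.

    `model` is any object with a scikit-learn style predict(); `features` maps
    feature names to numeric values in the order the model was trained on.
    """
    adjusted = {**features, **adjustments}
    label = model.predict([list(adjusted.values())])[0]
    return adjusted, label


class ThresholdModel:
    """Stand-in for the trained model (hypothetical): flags mean lung dose above 15 Gy."""

    def predict(self, rows):
        return [1 if row[0] > 15.0 else -1 for row in rows]


plan = {"mean_lung_dose_gy": 17.0, "v20_percent": 30.0}
new_plan, label = predict_with_adjustment(ThresholdModel(), plan, {"mean_lung_dose_gy": 13.5})
```

With a real model, the adjusted features would have to pass through the same preprocessing as the training data before prediction.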

As shown in Fig. 2, another embodiment of the present invention further provides a machine-learning-based system for predicting radiation pneumonitis after chest tumor radiotherapy, including:

The patient data acquisition module 201: acquiring patient data, wherein the patient data comprises the patient's lung CT (computed tomography) radiomics data, clinical data, and radiotherapy plan dosimetry data;

The data preprocessing module 202: converting non-numeric features in the patient data to numeric values, predicting missing feature values, and removing redundant features to obtain preprocessed patient data;

The model training module 203: dividing the preprocessed patient data into a training set and a test set, and training to generate an improved support vector machine model;

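Given a training sample set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, $y_i \in \{-1, +1\}$, a separating hyperplane is sought in the sample space, based on the training set D, to separate the different classes. For training samples that are not linearly separable, the samples can be mapped from the original space to a higher-dimensional feature space in which they are linearly separable. With $\phi(x)$ denoting the mapped feature vector, the maximum-margin hyperplane is found by solving

$$\min_{\omega, b} \ \frac{1}{2}\|\omega\|^2$$

$$\text{s.t.}\quad y_i(\omega^T \phi(x_i) + b) \ge 1,\quad i = 1, 2, \ldots, m.$$

The dual problem is

$$\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \phi(x_i)^T \phi(x_j)$$

$$\text{s.t.}\quad \sum_{i=1}^{m} \alpha_i y_i = 0,$$

$$\alpha_i \ge 0,\quad i = 1, 2, \ldots, m.$$

A kernel function is defined:

$$\kappa(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle = \phi(x_i)^T \phi(x_j)$$

The dual problem can be rewritten as

$$\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j)$$

$$\text{s.t.}\quad \sum_{i=1}^{m} \alpha_i y_i = 0,$$

$$\alpha_i \ge 0,\quad i = 1, 2, \ldots, m.$$

After solving, one obtains

$$f(x) = \omega^T \phi(x) + b = \sum_{i=1}^{m} \alpha_i y_i \kappa(x, x_i) + b$$

A "soft margin" is introduced, which allows some samples not to satisfy the constraint

$$y_i(\omega^T x_i + b) \ge 1$$

However, because the 0/1 loss $\ell_{0/1}$ is non-convex and discontinuous, its mathematical properties are poor, so it is usually replaced by another function, called a "surrogate loss". The model uses the hinge loss $\ell_{hinge}(z) = \max(0, 1 - z)$, so that, with slack variables $\xi_i$, the optimization objective can be written as

$$\min_{\omega, b, \xi_i} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{m} \xi_i$$

$$\text{s.t.}\quad y_i(\omega^T x_i + b) \ge 1 - \xi_i$$

$$\xi_i \ge 0,\quad i = 1, 2, \ldots, m.$$

The multipliers of the corresponding dual problem satisfy

$$0 \le \alpha_i \le C,\quad i = 1, 2, \ldots, m.$$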

The model evaluation module 204: using the test set to calculate the area under the receiver operating characteristic (ROC) curve (AUC) in order to evaluate the accuracy of the model; if the AUC is greater than or equal to 0.9, the validation is passed and the validated model is output; if the AUC is less than 0.9, the support vector machine model is regenerated;

The prediction module 205: inputting the relevant data of the patient to be predicted into the validated model to predict the occurrence of radiation pneumonitis.

The invention provides a machine-learning-based method for predicting radiation pneumonitis after chest tumor radiotherapy. Patient data are first acquired, comprising the patient's lung CT (computed tomography) radiomics data, clinical data, and radiotherapy plan dosimetry data. Non-numeric features in the patient data are converted to numeric values, missing feature values are predicted, and redundant features are removed to obtain preprocessed patient data. The preprocessed patient data are divided into a training set and a test set, and an improved support vector machine model is trained. The test set is used to calculate the area under the receiver operating characteristic (ROC) curve (AUC) in order to evaluate the accuracy of the model; if the AUC is greater than or equal to 0.9, the validation is passed and the validated model is output; if the AUC is less than 0.9, the support vector machine model is regenerated. The relevant data of the patient to be predicted are input into the validated model to predict the occurrence of radiation pneumonitis. The method can combine a large medical case database with multiple factors such as the patient's clinical information, dose information, and CT radiomics, and can predict the occurrence of radiation pneumonitis in chest tumor patients after radiotherapy more quickly and intuitively. Compared with traditional single-factor prediction and traditional dosimetric prediction, the machine learning approach takes many parameters as input simultaneously and predicts radiation pneumonitis more accurately.

The method adopts an improved support vector machine model that introduces a soft margin into the model, enabling fast and effective prediction.

The above is only one specific embodiment of the present invention, but the design concept of the present invention is not limited thereto; any insubstantial modification of the present invention made using this design concept shall be regarded as infringing the scope of protection of the present invention.

Claims

1. A method for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning is characterized by comprising the following steps:

2. The method for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning according to claim 1, wherein the patient data comprises the patient's lung CT radiomics data, clinical data, and radiotherapy plan dosimetry data, specifically:

3. The method for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning according to claim 1, wherein the feature missing value prediction and redundant feature removal are performed, specifically:

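4. The method for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning according to claim 1, wherein training to generate the improved support vector machine model specifically comprises:

a training sample set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, $y_i \in \{-1, +1\}$, where m is the number of training samples;

introducing a "soft margin" that allows certain samples not to satisfy the constraint

$$y_i(\omega^T x_i + b) \ge 1$$

The optimization objective can be written as

$$\min_{\omega, b, \xi_i} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{m} \xi_i$$

$$\text{s.t.}\quad y_i(\omega^T x_i + b) \ge 1 - \xi_i$$

$$\xi_i \ge 0,\quad i = 1, 2, \ldots, m.$$

The model is the improved support vector machine model;

the Gaussian kernel function in the model is:

$$\kappa(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right)$$

where $\omega$ and $b$ are model parameters, $C$ is the penalty coefficient, $\gamma$ is the width coefficient, and $\xi_i$ is the slack variable, with $\xi_i \ge 0$.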

5. A system for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning, comprising:

6. The system of claim 5, wherein the patient data comprises the patient's lung CT radiomics data, clinical data, and radiotherapy plan dosimetry data, specifically:

7. The system for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning according to claim 5, wherein the data preprocessing module performs feature missing value prediction and redundant feature removal, specifically:

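8. The system of claim 5, wherein, in the model training module, training to generate the improved support vector machine model comprises:

a training sample set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, $y_i \in \{-1, +1\}$, where m is the number of training samples;

introducing a "soft margin" that allows certain samples not to satisfy the constraint

$$y_i(\omega^T x_i + b) \ge 1$$

The optimization objective can be written as

$$\min_{\omega, b, \xi_i} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{m} \xi_i$$

$$\text{s.t.}\quad y_i(\omega^T x_i + b) \ge 1 - \xi_i$$

$$\xi_i \ge 0,\quad i = 1, 2, \ldots, m.$$

The model is the improved support vector machine model;

the Gaussian kernel function in the model is:

$$\kappa(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right)$$

where $\omega$ and $b$ are model parameters, $C$ is the penalty coefficient, $\gamma$ is the width coefficient, and $\xi_i$ is the slack variable, with $\xi_i \ge 0$.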