CN114266729A - Chest tumor radiotherapy-based radiation pneumonitis prediction method and system based on machine learning - Google Patents


Info

Publication number
CN114266729A
Authority
CN
China
Legal status
Pending
Application number
CN202111434814.2A
Other languages
Chinese (zh)
Inventor
陈思嘉
石丽婉
李涛
高翔
康峥
李夷民
林勤
Current Assignee
First Affiliated Hospital of Xiamen University
Original Assignee
First Affiliated Hospital of Xiamen University
Application filed by First Affiliated Hospital of Xiamen University
Priority to CN202111434814.2A
Publication of CN114266729A
Legal status: Pending

Landscapes

  • Radiation-Therapy Devices (AREA)

Abstract

The invention provides a machine-learning-based method for predicting radiation pneumonitis after chest tumor radiotherapy. The method comprises the following steps:

- acquiring patient data;
- digitizing the non-numeric features in the patient data, then imputing missing feature values and removing redundant features to obtain preprocessed patient data;
- dividing the preprocessed patient data into a training set and a test set, and training an improved support vector machine model;
- evaluating the model's predictive performance on the test set by the area under the receiver operating characteristic (ROC) curve (AUC);
- inputting the data of a patient to be assessed into the validated model to predict whether radiation pneumonitis will occur.

The method can draw on large medical case databases. It combines multiple factors, such as the patient's clinical information, dosimetric information, and CT radiomics. This lets it predict the occurrence of radiation pneumonitis in chest tumor patients after radiotherapy more quickly and intuitively.

Description

Chest tumor radiotherapy-based radiation pneumonitis prediction method and system based on machine learning
Technical Field
The invention relates to the field of machine learning, and in particular to machine-learning-based prediction of radiation pneumonitis after chest tumor radiotherapy.
Background
Radiation pneumonitis is a form of radiation-induced lung injury and the most common and serious complication of chest tumor radiotherapy. It usually appears within 1 to 3 months after the start of radiotherapy. Between 10% and 30% of patients receiving thoracic radiotherapy develop radiation pneumonitis. It limits the delivery of treatment and compromises its effect, and it also reduces patients' quality of life and survival, so lowering its incidence is of great clinical importance. Accurately predicting radiation pneumonitis enables timely clinical intervention and lowers the patient's radiotherapy risk.
Current research has found that acute radiation pneumonitis is associated with many factors, including:

- patient age and sex;
- Karnofsky performance status (KPS) score;
- presence of hypertension;
- whether chemotherapy was given, and the number of chemotherapy cycles before radiotherapy;
- tumor target volume, tumor location, and lymph node location;
- mean lung dose, and the lung volumes receiving 5, 10, 20, and 30 Gy;
- the daily dose delivered to the patient, and the total dose of the treatment course.

In routine clinical radiotherapy planning, the risk of radiation pneumonitis is assessed using the lung volumes receiving 5 Gy and 20 Gy (V5Gy, V20Gy) and the mean lung dose. However, no simple and effective model that combines these factors to predict radiation pneumonitis has yet been established.
Disclosure of Invention
The main purpose of the invention is to overcome these shortcomings of the prior art. To that end, it provides a machine-learning-based method for predicting radiation pneumonitis after chest tumor radiotherapy. The method can draw on large medical case databases and combine multiple factors, such as the patient's clinical information, dosimetric information, and CT radiomics. In this way it predicts the occurrence of radiation pneumonitis in chest tumor patients after radiotherapy more quickly and intuitively.
The invention adopts the following technical scheme:
A machine-learning-based method for predicting radiation pneumonitis after chest tumor radiotherapy comprises the following steps:
acquiring patient data, wherein the patient data comprise lung CT (computed tomography) radiomics data, clinical data, and radiotherapy-plan dosimetry data of the patient;
digitizing the non-numeric features in the patient data, then imputing missing feature values and removing redundant features to obtain preprocessed patient data;
dividing the preprocessed patient data into a training set and a test set, and training an improved support vector machine model;
evaluating the model's predictive performance on the test set by the area under the receiver operating characteristic (ROC) curve (AUC): if the AUC is greater than or equal to 0.9, validation passes and the validated model is output; if the AUC is less than 0.9, a new support vector machine model is generated;
and inputting the data of a patient to be assessed into the validated model to predict whether radiation pneumonitis will occur.
Specifically, the patient data comprise lung CT radiomics data, clinical data, and radiotherapy-plan dosimetry data, as follows:
the lung CT radiomics data comprise: the first-order minimum of the LLH wavelet-filtered image; the first-order minimum and the gray-level co-occurrence matrix (GLCM) autocorrelation of the HHL wavelet-filtered image; and the GLCM autocorrelation of the HHH wavelet-filtered image;
the clinical data comprise: the age, sex, KPS score, presence of hypertension, number of chemotherapy cycles before radiotherapy, tumor target volume, tumor location, and lymph node location of the chest tumor radiotherapy patient;
the radiotherapy-plan dosimetry data comprise: the mean lung dose, the lung volumes receiving 5 Gy, 10 Gy, 20 Gy, and 30 Gy, the daily dose delivered to the patient, and the total dose of the treatment course.
Specifically, missing feature values are imputed and redundant features are removed as follows:
a K-nearest-neighbor algorithm is fitted to the non-missing patient data to predict the missing feature values;
following the principle of variance maximization (i.e. principal component analysis), the rows/columns of the original data matrix are represented by a new set of linearly independent, mutually orthogonal vectors, which compresses the number of features and eliminates redundant ones.
Specifically, training the improved support vector machine model comprises:
given the training sample set $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_m,y_m)\}$, $y_i\in\{-1,+1\}$, where $m$ is the number of samples;
introducing a "soft margin", which allows some samples not to satisfy the constraint
$$y_i(\omega^T x_i+b)\ge 1;$$
the optimization objective can then be written as
$$\min_{\omega,b,\xi_i}\ \frac{1}{2}\|\omega\|^2+C\sum_{i=1}^{m}\xi_i$$
$$\text{s.t.}\quad y_i(\omega^T x_i+b)\ge 1-\xi_i,\quad \xi_i\ge 0,\quad i=1,2,\dots,m.$$
This model is the improved support vector machine model.
The Gaussian kernel function used in the model is
$$\kappa(x_i,x_j)=\exp\left(-\gamma\|x_i-x_j\|^2\right),$$
where $\omega$ and $b$ are model parameters, $C$ is the penalty coefficient, $\gamma$ is the width coefficient, and $\xi_i\ge 0$ is the slack variable.
An embodiment of the invention also provides a machine-learning-based system for predicting radiation pneumonitis after chest tumor radiotherapy, comprising:
a patient data acquisition module: acquires patient data, wherein the patient data comprise lung CT (computed tomography) radiomics data, clinical data, and radiotherapy-plan dosimetry data of the patient;
a data preprocessing module: digitizes the non-numeric features in the patient data, then imputes missing feature values and removes redundant features to obtain preprocessed patient data;
a model training module: divides the preprocessed patient data into a training set and a test set, and trains an improved support vector machine model;
a model evaluation module: evaluates the model's predictive performance on the test set by the area under the ROC curve (AUC); if the AUC is greater than or equal to 0.9, validation passes and the validated model is output; if the AUC is less than 0.9, a new support vector machine model is generated;
a prediction module: inputs the data of a patient to be assessed into the validated model to predict whether radiation pneumonitis will occur.
Specifically, the patient data comprise lung CT radiomics data, clinical data, and radiotherapy-plan dosimetry data, as follows:
the lung CT radiomics data comprise: the first-order minimum of the LLH wavelet-filtered image; the first-order minimum and the gray-level co-occurrence matrix (GLCM) autocorrelation of the HHL wavelet-filtered image; and the GLCM autocorrelation of the HHH wavelet-filtered image;
the clinical data comprise: the age, sex, KPS score, presence of hypertension, number of chemotherapy cycles before radiotherapy, tumor target volume, tumor location, and lymph node location of the chest tumor radiotherapy patient;
the radiotherapy-plan dosimetry data comprise: the mean lung dose, the lung volumes receiving 5 Gy, 10 Gy, 20 Gy, and 30 Gy, the daily dose delivered to the patient, and the total dose of the treatment course.
Specifically, in the data preprocessing module, missing feature values are imputed and redundant features are removed as follows:
a K-nearest-neighbor algorithm is fitted to the non-missing patient data to predict the missing feature values;
following the principle of variance maximization (i.e. principal component analysis), the rows/columns of the original data matrix are represented by a new set of linearly independent, mutually orthogonal vectors, which compresses the number of features and eliminates redundant ones.
Specifically, in the model training module, training the improved support vector machine model comprises:
given the training sample set $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_m,y_m)\}$, $y_i\in\{-1,+1\}$, where $m$ is the number of samples;
introducing a "soft margin", which allows some samples not to satisfy the constraint
$$y_i(\omega^T x_i+b)\ge 1;$$
the optimization objective can then be written as
$$\min_{\omega,b,\xi_i}\ \frac{1}{2}\|\omega\|^2+C\sum_{i=1}^{m}\xi_i$$
$$\text{s.t.}\quad y_i(\omega^T x_i+b)\ge 1-\xi_i,\quad \xi_i\ge 0,\quad i=1,2,\dots,m.$$
This model is the improved support vector machine model.
The Gaussian kernel function used in the model is
$$\kappa(x_i,x_j)=\exp\left(-\gamma\|x_i-x_j\|^2\right),$$
where $\omega$ and $b$ are model parameters, $C$ is the penalty coefficient, $\gamma$ is the width coefficient, and $\xi_i\ge 0$ is the slack variable.
As the above description shows, the invention has the following advantages over the prior art:
(1) The invention provides a machine-learning-based method for predicting radiation pneumonitis after chest tumor radiotherapy, which works as follows:

- Patient data are first acquired, comprising lung CT radiomics data, clinical data, and radiotherapy-plan dosimetry data.
- The non-numeric features are digitized, missing feature values are imputed, and redundant features are removed to obtain preprocessed patient data.
- The preprocessed data are divided into a training set and a test set, and an improved support vector machine model is trained.
- The model is evaluated on the test set by the ROC AUC. It passes validation and is output if the AUC is at least 0.9; otherwise a new model is generated.
- Finally, the data of a patient to be assessed are input into the validated model to predict whether radiation pneumonitis will occur.

The method can draw on large medical case databases. It combines multiple factors, such as the patient's clinical information, dosimetric information, and CT radiomics, to predict radiation pneumonitis in chest tumor patients after radiotherapy more quickly and intuitively. Traditional single-factor and dosimetric prediction each rely on limited inputs, whereas the machine learning approach takes many parameters as input simultaneously, which makes its prediction of radiation pneumonitis more accurate.
(2) The method adopts an improved support vector machine model that introduces a soft margin, enabling fast and effective prediction.
Drawings
Fig. 1 is a flowchart of the machine-learning-based method for predicting radiation pneumonitis after chest tumor radiotherapy according to an embodiment of the invention;
Fig. 2 is a structural diagram of the machine-learning-based system for predicting radiation pneumonitis after chest tumor radiotherapy according to an embodiment of the invention.
The invention is described in further detail below with reference to the figures and specific examples.
Detailed Description
Radiomics features combine the shape, intensity, and texture features of the original lesion image with those of images transformed by various filters (e.g. wavelet and Laplacian of Gaussian). By combining a feature selection method with a machine learning algorithm, a prediction model can be built on a training data set and then evaluated on a test data set.
Machine learning enables a machine to learn from large data sets and to predict future events and outcomes. In healthcare, machine learning aims to improve the interpretation of medical data, thereby speeding up workflows, reducing errors, eliminating unnecessary costs, and improving human health. The risk factors for radiation pneumonitis interact with one another: patient imaging features, clinical and treatment parameters, and dosimetric parameters. For this reason, radiation pneumonitis cannot be linked to any single parameter. Predicting radiation pneumonitis with machine learning can help reduce patient toxicity, improve patients' quality of life, and lower medical costs.
As shown in Fig. 1, the machine-learning-based method for predicting radiation pneumonitis after chest tumor radiotherapy according to an embodiment of the invention specifically comprises:
S1: acquiring patient data, wherein the patient data comprise lung CT (computed tomography) radiomics data, clinical data, and radiotherapy-plan dosimetry data of the patient;
Specifically, the patient data comprise lung CT radiomics data, clinical data, and radiotherapy-plan dosimetry data, as follows:
the lung CT radiomics data comprise: the first-order minimum of the LLH wavelet-filtered image; the first-order minimum and the gray-level co-occurrence matrix (GLCM) autocorrelation of the HHL wavelet-filtered image; and the GLCM autocorrelation of the HHH wavelet-filtered image;
the clinical data comprise: the age, sex, KPS score, presence of hypertension, number of chemotherapy cycles before radiotherapy, tumor target volume, tumor location, and lymph node location of the chest tumor radiotherapy patient;
the radiotherapy-plan dosimetry data comprise: the mean lung dose, the lung volumes receiving 5 Gy, 10 Gy, 20 Gy, and 30 Gy, the daily dose delivered to the patient, and the total dose of the treatment course.
In addition, when new patients become available, their data can be added to the patient data. This increases the number of samples and improves prediction accuracy.
S2: digitizing the non-numeric features in the patient data, then imputing missing feature values and removing redundant features to obtain preprocessed patient data;
The non-numeric features in the patient data are digitized. For example, lymph node location is classified as mediastinal, pulmonary, or clavicular, and during preprocessing it is encoded as a number: 1 for mediastinum, 2 for lung, 3 for clavicle. Other non-numeric features are treated in the same way.
Specifically, missing feature values are imputed and redundant features are removed as follows:
a K-nearest-neighbor algorithm is fitted to the non-missing patient data to predict the missing feature values;
following the principle of variance maximization (i.e. principal component analysis), the rows/columns of the original data matrix are represented by a new set of linearly independent, mutually orthogonal vectors, which compresses the number of features and eliminates redundant ones.
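The S2 preprocessing can be sketched with scikit-learn as below. This illustrates the steps under several assumptions and is not the patent's code.

- The four toy features and their values are hypothetical.
- `StandardScaler` is an added step: PCA is scale-sensitive, and the text does not say whether features are scaled.
- `n_neighbors=2` and the 95% retained-variance cut-off are arbitrary choices.

`KNNImputer` fills each missing entry from the nearest rows that have that feature. `PCA` performs the variance-maximizing orthogonal projection described above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy records: [age, lymph node location, mean lung dose (Gy), V20 (%)]
records = [
    [62, "mediastinum", 14.2, 24.0],
    [55, "lung", 11.8, np.nan],         # V20 missing
    [70, "clavicle", 16.5, 29.5],
    [48, "mediastinum", np.nan, 21.0],  # mean lung dose missing
    [66, "lung", 13.1, 25.5],
]

# Digitize the non-numeric feature with the coding given in the text
LYMPH_NODE_CODES = {"mediastinum": 1, "lung": 2, "clavicle": 3}
X = np.array(
    [[age, LYMPH_NODE_CODES[node], mld, v20] for age, node, mld, v20 in records],
    dtype=float,
)

# Assumption: standardize first; StandardScaler ignores NaNs when fitting and keeps them
X_scaled = StandardScaler().fit_transform(X)

# KNN imputation: each missing value becomes the mean of that feature
# over the 2 nearest rows in which the feature is present
X_imputed = KNNImputer(n_neighbors=2).fit_transform(X_scaled)

# PCA: keep the fewest orthogonal components explaining at least 95% of the variance
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_imputed)
print(X_reduced.shape, pca.explained_variance_ratio_.round(3))
```

In a real pipeline, the scaler, imputer, and PCA would be fitted on the training split only and then applied to the test split. That way no test information leaks into preprocessing.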
S3: dividing the preprocessed patient data into a training set and a test set, and training an improved support vector machine model;
70% of the data are randomly selected as training data and the remaining 30% are used as test data. The training data are used to generate the support vector machine model.
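A random 70/30 split as described in S3 might look like the sketch below. The data are random placeholders. `stratify=y` is an added assumption, since the text only says the split is random. With an imbalanced outcome such as radiation pneumonitis, stratifying keeps the minority class represented in the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))       # hypothetical preprocessed features
y = rng.choice([-1, 1], size=100)   # hypothetical labels: +1 = pneumonitis, -1 = none

# 70% training / 30% test; stratify=y keeps the class ratio similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(len(X_train), len(X_test))
```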
Given a training sample set $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_m,y_m)\}$, $y_i\in\{-1,+1\}$, the basic idea is to find a separating hyperplane in the sample space, based on $D$, that separates samples of different classes. When the training samples are not linearly separable, they can be mapped from the original space into a higher-dimensional feature space in which they become linearly separable.
Let $\phi(x)$ denote the feature vector obtained by mapping $x$. The model corresponding to a separating hyperplane in the feature space can then be expressed as
$$f(x)=\omega^T\phi(x)+b,$$
where $\omega$ and $b$ are model parameters. The goal is the separating hyperplane with the "maximum margin", that is, the one whose distance to the nearest training samples is largest. To find it, one solves
$$\min_{\omega,b}\ \frac{1}{2}\|\omega\|^2$$
$$\text{s.t.}\quad y_i\big(\omega^T\phi(x_i)+b\big)\ge 1,\quad i=1,2,\dots,m.$$
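The dual problem is
$$\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j\,\phi(x_i)^T\phi(x_j)$$
$$\text{s.t.}\quad \sum_{i=1}^{m}\alpha_i y_i=0,\quad \alpha_i\ge 0,\quad i=1,2,\dots,m.$$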
Here $\phi(x_i)^T\phi(x_j)$ is the inner product of samples $x_i$ and $x_j$ after mapping into the feature space. The feature space may be very high-dimensional, so computing $\phi(x_i)^T\phi(x_j)$ directly is difficult. A kernel function can instead be defined:
$$\kappa(x_i,x_j)=\langle\phi(x_i),\phi(x_j)\rangle=\phi(x_i)^T\phi(x_j).$$
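The dual problem can then be rewritten as
$$\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j\,\kappa(x_i,x_j)$$
$$\text{s.t.}\quad \sum_{i=1}^{m}\alpha_i y_i=0,\quad \alpha_i\ge 0,\quad i=1,2,\dots,m.$$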
Solving it yields
$$f(x)=\omega^T\phi(x)+b=\sum_{i=1}^{m}\alpha_i y_i\,\phi(x_i)^T\phi(x)+b=\sum_{i=1}^{m}\alpha_i y_i\,\kappa(x,x_i)+b.$$
Here $\kappa(\cdot,\cdot)$ is the "kernel function", and this expression is called the "support vector expansion".
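The point of a kernel function is that $\kappa(x_i,x_j)$ returns $\phi(x_i)^T\phi(x_j)$ without ever constructing $\phi$. The Gaussian kernel used by the model has an infinite-dimensional feature map, so this check uses a different kernel. It uses the homogeneous quadratic kernel $\kappa(x,z)=(x^Tz)^2$, whose explicit map for 2-D inputs is $\phi(x)=(x_1^2,\sqrt{2}x_1x_2,x_2^2)$. This is a textbook illustration, not part of the patented model.

```python
import numpy as np


def phi(x):
    """Explicit feature map of the quadratic kernel for 2-D input."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])


def quadratic_kernel(x, z):
    """kappa(x, z) = (x^T z)^2, computed without building phi."""
    return float(np.dot(x, z)) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
explicit = float(phi(x) @ phi(z))  # inner product in the 3-D feature space
implicit = quadratic_kernel(x, z)  # same value computed from the 2-D inputs
print(explicit, implicit)
```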
In real-world problems it is often difficult to find a kernel function that makes the training samples linearly separable in the feature space. To avoid overfitting, the support vector machine is allowed to make errors on some samples. A "soft margin" is introduced, which allows some samples not to satisfy the constraint
$$y_i(\omega^T x_i+b)\ge 1.$$
While maximizing the margin, the number of samples that violate the constraint should be kept as small as possible. The optimization objective can therefore be written as
$$\min_{\omega,b}\ \frac{1}{2}\|\omega\|^2+C\sum_{i=1}^{m}\ell_{0/1}\big(y_i(\omega^T x_i+b)-1\big)$$
where $C>0$ is a constant called the penalty factor. When $C$ takes a finite value, the objective allows some samples not to satisfy the constraint. $\ell_{0/1}$ is the "0/1 loss function":
$$\ell_{0/1}(z)=\begin{cases}1, & z<0,\\ 0, & \text{otherwise.}\end{cases}$$
However, $\ell_{0/1}$ is non-convex and discontinuous, which makes it mathematically inconvenient. Other functions, called "surrogate losses", are therefore usually used in its place. The model uses the hinge loss $\ell_{\text{hinge}}(z)=\max(0,1-z)$, so the optimization objective becomes
$$\min_{\omega,b}\ \frac{1}{2}\|\omega\|^2+C\sum_{i=1}^{m}\max\big(0,\,1-y_i(\omega^T x_i+b)\big).$$
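To make the two losses and the resulting objective concrete, the functions below evaluate three quantities for a linear model: $\ell_{0/1}$, $\ell_{\text{hinge}}$, and the hinge-loss form of the soft-margin objective. The weights, bias, and samples are made-up numbers chosen so that the result can be checked by hand.

```python
import numpy as np


def zero_one_loss(z):
    """l_0/1(z) = 1 if z < 0, else 0."""
    return (np.asarray(z, dtype=float) < 0).astype(float)


def hinge_loss(z):
    """l_hinge(z) = max(0, 1 - z)."""
    return np.maximum(0.0, 1.0 - np.asarray(z, dtype=float))


def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b)) for a linear model."""
    margins = y * (X @ w + b)
    return 0.5 * float(w @ w) + C * float(hinge_loss(margins).sum())


z = np.array([-0.5, 0.0, 0.5, 1.0, 2.0])
print(zero_one_loss(z))  # values 1, 0, 0, 0, 0
print(hinge_loss(z))     # values 1.5, 1.0, 0.5, 0.0, 0.0

# Hypothetical linear model and three labelled samples
w, b = np.array([1.0, 0.0]), 0.0
X = np.array([[2.0, 0.0], [0.5, 1.0], [-1.0, 3.0]])
y = np.array([1.0, 1.0, -1.0])
# margins are 2.0, 0.5, 1.0, so only the second sample is penalized (by 0.5)
print(soft_margin_objective(w, b, X, y, C=1.0))  # 0.5 + 1.0 * 0.5 = 1.0
```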
A "slack variable" $\xi_i\ge 0$ is introduced for each sample to represent the degree to which that sample fails to satisfy the constraint. The objective can then be rewritten as
$$\min_{\omega,b,\xi_i}\ \frac{1}{2}\|\omega\|^2+C\sum_{i=1}^{m}\xi_i$$
$$\text{s.t.}\quad y_i(\omega^T x_i+b)\ge 1-\xi_i,\quad \xi_i\ge 0,\quad i=1,2,\dots,m.$$
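This is the "improved support vector machine". Its dual problem, obtained by the Lagrange multiplier method, is
$$\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j\,x_i^T x_j$$
$$\text{s.t.}\quad \sum_{i=1}^{m}\alpha_i y_i=0,\quad 0\le\alpha_i\le C,\quad i=1,2,\dots,m.$$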
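scikit-learn's `SVC` (built on LIBSVM) solves a kernelized version of this dual, with $x_i^Tx_j$ replaced by $\kappa(x_i,x_j)$. Its `dual_coef_` attribute stores $\alpha_i y_i$ for the support vectors. On a fitted model you can therefore check both the box constraint $0\le\alpha_i\le C$ and the support vector expansion $f(x)=\sum_i\alpha_iy_i\kappa(x,x_i)+b$. The synthetic data and the values of `C` and `gamma` are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y01 = make_classification(n_samples=80, n_features=5, random_state=0)
y = np.where(y01 == 1, 1, -1)  # relabel to {-1, +1} as in D

C, gamma = 1.0, 0.2
clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)

# dual_coef_ holds alpha_i * y_i for each support vector, so |dual_coef_| = alpha_i
alpha = np.abs(clf.dual_coef_.ravel())

# Support vector expansion: f(x) = sum_i alpha_i y_i kappa(x, x_i) + b over support vectors
K = rbf_kernel(X, clf.support_vectors_, gamma=gamma)
f_manual = K @ clf.dual_coef_.ravel() + clf.intercept_[0]

print("support vectors:", len(alpha), "largest alpha:", alpha.max())
```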
The kernel function mentioned above is chosen according to the form of separating hyperplane required. The model uses the Gaussian (RBF) kernel, which suits data that are not linearly separable:
$$\kappa(x_i,x_j)=\exp\left(-\gamma\|x_i-x_j\|^2\right).$$
The width coefficient $\gamma$ defines the range of influence of a single sample: the larger $\gamma$ is, the more support vectors there typically are.
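The Gaussian kernel can be computed directly. The sketch below checks a vectorized NumPy version against scikit-learn's `rbf_kernel`, which uses the same parameterization $\exp(-\gamma\|x_i-x_j\|^2)$. It also shows how kernel values shrink as $\gamma$ grows, meaning each sample influences a smaller neighbourhood. The random inputs are placeholders.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def gaussian_kernel(A, B, gamma):
    """kappa(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) for every row pair of A and B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(6, 3))

kernels = {g: gaussian_kernel(A, B, g) for g in (0.01, 1.0, 100.0)}
for g, K in kernels.items():
    # A larger gamma makes each sample's influence decay faster with distance
    print(f"gamma={g:>6}: mean kernel value {K.mean():.4f}")
```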
The penalty factor (regularization parameter) $C$ mentioned above defines how tolerant the model is of samples that violate the margin or are misclassified.
Using the training-set data, a grid search with cross-validation (GridSearchCV) is performed. $C$ takes 15 values between $10^{-4}$ and $10^{10}$, and $\gamma$ takes 24 values between $10^{-4}$ and $10^{3}$. Every given combination of $\gamma$ and $C$ is trained on the data to find the combination with the highest prediction accuracy. The support vector machine model is then generated with it.
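The grid search described above can be sketched with scikit-learn's `GridSearchCV`. This rests on several assumptions:

- The "15 values" and "24 values" are taken to be logarithmically spaced (`np.logspace`).
- 3-fold cross-validation is used, because the text does not give the number of folds.
- The toy data stand in for the training set.
- `max_iter` and the warning filter are not from the source. They only keep runtime bounded at the extreme values of `C`.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumed log spacing: 15 values of C in [1e-4, 1e10], 24 values of gamma in [1e-4, 1e3]
param_grid = {
    "C": np.logspace(-4, 10, 15),
    "gamma": np.logspace(-4, 3, 24),
}

# Hypothetical stand-in for the 70% training split
X_train, y_train = make_classification(
    n_samples=60, n_features=5, class_sep=2.0, random_state=0
)

search = GridSearchCV(
    SVC(kernel="rbf", max_iter=20_000),  # iteration cap keeps huge-C fits from stalling
    param_grid,
    scoring="accuracy",                  # "highest prediction accuracy" in the text
    cv=3,
)
with warnings.catch_warnings():
    # Capped fits may stop early; silence those warnings for this demo only
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    search.fit(X_train, y_train)

best_model = search.best_estimator_  # refitted on all training data with the best (C, gamma)
print(search.best_params_, round(search.best_score_, 3))
```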
S4: evaluating the model's predictive performance on the test set by the area under the ROC curve (AUC): if the AUC is greater than or equal to 0.9, validation passes and the validated model is output; if the AUC is less than 0.9, a new support vector machine model is generated;
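The text does not say how the model is "regenerated" when the AUC falls below 0.9. The sketch below assumes a hypothetical retry policy: re-split and retrain, for at most a fixed number of attempts. The function name and `max_attempts` are illustrative. Note that retrying until the test AUC clears the threshold makes the reported AUC optimistically biased. A held-out set not used for the retries would give an unbiased estimate.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

AUC_THRESHOLD = 0.9  # acceptance threshold from step S4


def train_until_validated(X, y, max_attempts=10):
    """Hypothetical retry policy: re-split and retrain until the test AUC >= 0.9."""
    auc = float("nan")
    for attempt in range(max_attempts):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=attempt, stratify=y
        )
        model = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
        # ROC AUC is computed from continuous decision scores, not hard labels
        auc = roc_auc_score(y_te, model.decision_function(X_te))
        if auc >= AUC_THRESHOLD:
            return model, auc  # validation passed: output this model
    return None, auc           # no attempt passed validation


X, y = make_classification(
    n_samples=200, n_features=6, class_sep=2.0, flip_y=0.0, random_state=0
)
model, auc = train_until_validated(X, y)
print("validated" if model is not None else "rejected", round(auc, 3))
```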
S5: inputting the data of a patient to be assessed into the validated model to predict whether radiation pneumonitis will occur.
Based on the prediction, the feature parameters of a patient predicted to develop radiation pneumonitis can be adjusted. Examples include lowering the prescribed radiotherapy dose, modifying the radiotherapy plan, tightly limiting the lung dose, or adjusting the number of chemotherapy cycles. The new feature parameters are then input into the output model to predict again whether radiation pneumonitis will occur.
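The adjust-and-re-predict loop can be illustrated with a toy model. Everything here is hypothetical: the two features, the synthetic labelling rule (which is not clinical), and the helper `what_if`. The sketch only shows the mechanics of changing one planning parameter and querying the trained model again.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
mean_lung_dose = rng.uniform(5, 25, n)  # Gy, hypothetical
v20 = rng.uniform(10, 40, n)            # %, hypothetical
X = np.column_stack([mean_lung_dose, v20])
y = np.where(mean_lung_dose + 0.2 * v20 > 20, 1, -1)  # toy labelling rule, not clinical

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale")).fit(X, y)


def what_if(model, patient, feature_index, new_value):
    """Hypothetical helper: predict before and after changing one feature."""
    adjusted = patient.copy()
    adjusted[feature_index] = new_value
    return model.predict(patient[None, :])[0], model.predict(adjusted[None, :])[0]


patient = np.array([22.0, 35.0])  # toy score 22 + 7 = 29, above the threshold of 20
before, after = what_if(model, patient, feature_index=0, new_value=8.0)  # score 8 + 7 = 15
print("before:", before, "after lowering mean lung dose:", after)
```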
As shown in Fig. 2, another embodiment of the invention further provides a machine-learning-based system for predicting radiation pneumonitis after chest tumor radiotherapy, comprising:
Patient data acquisition module 201: acquires patient data, wherein the patient data comprise lung CT (computed tomography) radiomics data, clinical data, and radiotherapy-plan dosimetry data of the patient;
Specifically, the patient data comprise lung CT radiomics data, clinical data, and radiotherapy-plan dosimetry data, as follows:
the lung CT radiomics data comprise: the first-order minimum of the LLH wavelet-filtered image; the first-order minimum and the gray-level co-occurrence matrix (GLCM) autocorrelation of the HHL wavelet-filtered image; and the GLCM autocorrelation of the HHH wavelet-filtered image;
the clinical data comprise: the age, sex, KPS score, presence of hypertension, number of chemotherapy cycles before radiotherapy, tumor target volume, tumor location, and lymph node location of the chest tumor radiotherapy patient;
the radiotherapy-plan dosimetry data comprise: the mean lung dose, the lung volumes receiving 5 Gy, 10 Gy, 20 Gy, and 30 Gy, the daily dose delivered to the patient, and the total dose of the treatment course.
In addition, when new patients become available, their data can be added to the patient data. This increases the number of samples and improves prediction accuracy.
Data preprocessing module 202: digitizes the non-numeric features in the patient data, then imputes missing feature values and removes redundant features to obtain preprocessed patient data;
The non-numeric features in the patient data are digitized. For example, lymph node location is classified as mediastinal, pulmonary, or clavicular, and during preprocessing it is encoded as a number: 1 for mediastinum, 2 for lung, 3 for clavicle. Other non-numeric features are treated in the same way.
Specifically, missing feature values are imputed and redundant features are removed as follows:
a K-nearest-neighbor algorithm is fitted to the non-missing patient data to predict the missing feature values;
following the principle of variance maximization (i.e. principal component analysis), the rows/columns of the original data matrix are represented by a new set of linearly independent, mutually orthogonal vectors, which compresses the number of features and eliminates redundant ones.
Model training module 203: divides the preprocessed patient data into a training set and a test set, and trains an improved support vector machine model;
70% of the data are randomly selected as training data and the remaining 30% are used as test data. The training data are used to generate the support vector machine model.
Given a training sample set $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_m,y_m)\}$, $y_i\in\{-1,+1\}$, the basic idea is to find a separating hyperplane in the sample space, based on $D$, that separates samples of different classes. When the training samples are not linearly separable, they can be mapped from the original space into a higher-dimensional feature space in which they become linearly separable.
Let $\phi(x)$ denote the feature vector obtained by mapping $x$. The model corresponding to a separating hyperplane in the feature space can then be expressed as
$$f(x)=\omega^T\phi(x)+b,$$
where $\omega$ and $b$ are model parameters. The goal is the separating hyperplane with the "maximum margin", that is, the one whose distance to the nearest training samples is largest. To find it, one solves
$$\min_{\omega,b}\ \frac{1}{2}\|\omega\|^2$$
$$\text{s.t.}\quad y_i\big(\omega^T\phi(x_i)+b\big)\ge 1,\quad i=1,2,\dots,m.$$
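The dual problem is
$$\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j\,\phi(x_i)^T\phi(x_j)$$
$$\text{s.t.}\quad \sum_{i=1}^{m}\alpha_i y_i=0,\quad \alpha_i\ge 0,\quad i=1,2,\dots,m.$$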
Here $\phi(x_i)^T\phi(x_j)$ is the inner product of samples $x_i$ and $x_j$ after mapping into the feature space. The feature space may be very high-dimensional, so computing $\phi(x_i)^T\phi(x_j)$ directly is difficult. A kernel function can instead be defined:
$$\kappa(x_i,x_j)=\langle\phi(x_i),\phi(x_j)\rangle=\phi(x_i)^T\phi(x_j).$$
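The dual problem can then be rewritten as
$$\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j\,\kappa(x_i,x_j)$$
$$\text{s.t.}\quad \sum_{i=1}^{m}\alpha_i y_i=0,\quad \alpha_i\ge 0,\quad i=1,2,\dots,m.$$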
Solving it yields
$$f(x)=\omega^T\phi(x)+b=\sum_{i=1}^{m}\alpha_i y_i\,\phi(x_i)^T\phi(x)+b=\sum_{i=1}^{m}\alpha_i y_i\,\kappa(x,x_i)+b.$$
Here $\kappa(\cdot,\cdot)$ is the "kernel function", and this expression is called the "support vector expansion".
In real-world problems it is often difficult to find a kernel function that makes the training samples linearly separable in the feature space. To avoid overfitting, the support vector machine is allowed to make errors on some samples. A "soft margin" is introduced, which allows some samples not to satisfy the constraint
$$y_i(\omega^T x_i+b)\ge 1.$$
While maximizing the margin, the number of samples that violate the constraint should be kept as small as possible. The optimization objective can therefore be written as
$$\min_{\omega,b}\ \frac{1}{2}\|\omega\|^2+C\sum_{i=1}^{m}\ell_{0/1}\big(y_i(\omega^T x_i+b)-1\big)$$
where $C>0$ is a constant called the penalty factor. When $C$ takes a finite value, the objective allows some samples not to satisfy the constraint. $\ell_{0/1}$ is the "0/1 loss function":
$$\ell_{0/1}(z)=\begin{cases}1, & z<0,\\ 0, & \text{otherwise.}\end{cases}$$
However, $\ell_{0/1}$ is non-convex and discontinuous, which makes it mathematically inconvenient. Other functions, called "surrogate losses", are therefore usually used in its place. The model uses the hinge loss $\ell_{\text{hinge}}(z)=\max(0,1-z)$, so the optimization objective becomes
$$\min_{\omega,b}\ \frac{1}{2}\|\omega\|^2+C\sum_{i=1}^{m}\max\big(0,\,1-y_i(\omega^T x_i+b)\big).$$
A "slack variable" $\xi_i\ge 0$ is introduced for each sample to represent the degree to which that sample fails to satisfy the constraint. The objective can then be rewritten as
$$\min_{\omega,b,\xi_i}\ \frac{1}{2}\|\omega\|^2+C\sum_{i=1}^{m}\xi_i$$
$$\text{s.t.}\quad y_i(\omega^T x_i+b)\ge 1-\xi_i,\quad \xi_i\ge 0,\quad i=1,2,\dots,m.$$
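This is the "improved support vector machine". Its dual problem, obtained by the Lagrange multiplier method, is
$$\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j\,x_i^T x_j$$
$$\text{s.t.}\quad \sum_{i=1}^{m}\alpha_i y_i=0,\quad 0\le\alpha_i\le C,\quad i=1,2,\dots,m.$$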
The kernel function mentioned above is chosen according to the form of separating hyperplane required. The model uses the Gaussian (RBF) kernel, which suits data that are not linearly separable:
$$\kappa(x_i,x_j)=\exp\left(-\gamma\|x_i-x_j\|^2\right).$$
The width coefficient $\gamma$ defines the range of influence of a single sample: the larger $\gamma$ is, the more support vectors there typically are.
The penalty factor (regularization parameter) $C$ mentioned above defines how tolerant the model is of samples that violate the margin or are misclassified.
Using the training-set data, a grid search with cross-validation (GridSearchCV) is performed. $C$ takes 15 values between $10^{-4}$ and $10^{10}$, and $\gamma$ takes 24 values between $10^{-4}$ and $10^{3}$. Every given combination of $\gamma$ and $C$ is trained on the data to find the combination with the highest prediction accuracy. The support vector machine model is then generated with it.
Model evaluation module 204: evaluates the model's predictive performance on the test set by the area under the ROC curve (AUC). If the AUC is greater than or equal to 0.9, validation passes and the validated model is output; if the AUC is less than 0.9, a new support vector machine model is generated;
Prediction module 205: inputs the data of a patient to be assessed into the validated model to predict whether radiation pneumonitis will occur.
Based on the prediction, the feature parameters of a patient predicted to develop radiation pneumonitis can be adjusted. Examples include lowering the prescribed radiotherapy dose, modifying the radiotherapy plan, tightly limiting the lung dose, or adjusting the number of chemotherapy cycles. The new feature parameters are then input into the output model to predict again whether radiation pneumonitis will occur.
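The five modules can be organized as a single class, sketched below under the same assumptions as the method steps. The class and method names are hypothetical. The small hyperparameter grid replaces the patent's 15 × 24 grid only to keep the example fast. Putting the scaler, imputer, and PCA inside one scikit-learn pipeline is a design choice: it refits them within each cross-validation fold, so preprocessing never sees held-out data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.impute import KNNImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class RadiationPneumonitisSystem:
    """Hypothetical mapping of modules 201-205 onto one class (not the patent's code)."""

    def __init__(self, auc_threshold=0.9):
        self.auc_threshold = auc_threshold
        self.model = None  # set only after validation passes

    def acquire(self, X, y):  # module 201: patient data acquisition
        self.X, self.y = np.asarray(X, dtype=float), np.asarray(y)

    @staticmethod
    def _pipeline():  # module 202: preprocessing, fitted together with the SVM
        return make_pipeline(
            StandardScaler(),
            KNNImputer(n_neighbors=5),
            PCA(n_components=0.95, svd_solver="full"),
            SVC(kernel="rbf"),
        )

    def train(self, seed=0):  # module 203: 70/30 split plus grid search
        X_tr, self.X_te, y_tr, self.y_te = train_test_split(
            self.X, self.y, test_size=0.3, random_state=seed, stratify=self.y
        )
        grid = {"svc__C": np.logspace(-2, 3, 6), "svc__gamma": np.logspace(-3, 1, 5)}
        search = GridSearchCV(self._pipeline(), grid, cv=3).fit(X_tr, y_tr)
        self.candidate = search.best_estimator_

    def evaluate(self):  # module 204: AUC acceptance test
        auc = roc_auc_score(self.y_te, self.candidate.decision_function(self.X_te))
        if auc >= self.auc_threshold:
            self.model = self.candidate
        return auc

    def predict(self, X_new):  # module 205: prediction with the validated model
        if self.model is None:
            raise RuntimeError("no model has passed validation")
        return self.model.predict(np.asarray(X_new, dtype=float))


# Usage on synthetic data with about 5% of entries missing
X, y = make_classification(n_samples=200, n_features=8, class_sep=2.0, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

system = RadiationPneumonitisSystem()
system.acquire(X, y)
system.train()
auc = system.evaluate()
print("test AUC:", round(auc, 3), "validated:", system.model is not None)
```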
The invention provides a method for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning. First, patient data are acquired, comprising lung CT (computed tomography) radiomics data, clinical data and radiotherapy plan dosimetry data of the patient. Non-numeric features in the patient data are digitized, missing feature values are predicted and redundant features are removed to obtain preprocessed patient data. The preprocessed patient data are divided into a training set and a test set, and an improved support vector machine model is trained. The test set is used to calculate the area under the ROC curve (AUC) to evaluate the accuracy of the model: if the AUC is greater than or equal to 0.9, the validation passes and a validated model is output; if the AUC is less than 0.9, the support vector machine model is regenerated. Finally, the relevant data of the patient to be predicted are input into the validated model to predict whether radiation pneumonitis will occur. By drawing on large medical case databases and combining multiple factors such as the patient's clinical information, dosimetry information and CT radiomics, the method can predict the occurrence of radiation pneumonitis in chest tumor patients after radiotherapy more quickly and intuitively. Compared with traditional single-factor prediction and conventional dosimetric prediction, the machine learning approach takes many parameters as input simultaneously, which makes the prediction of radiation pneumonitis more accurate.
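One possible way to digitize the non-numeric clinical features before imputation is sketched below. The encoding scheme is an assumption, since the text does not specify it: two-level features such as sex are mapped to 0/1, multi-level features such as tumor location are one-hot encoded, and missing entries stay NaN so that the later K-nearest-neighbor imputation can still fill them. The column names and the helper `digitize_features` are hypothetical.

```python
import numpy as np
import pandas as pd


def digitize_features(df, binary_maps, nominal_columns):
    """Turn non-numeric clinical features into numbers, keeping missing values as NaN.

    binary_maps: {column: {category: code}} for two-level features such as sex.
    nominal_columns: multi-level features (e.g. tumor location) to one-hot encode.
    """
    out = df.copy()
    for col, mapping in binary_maps.items():
        # Categories not in the mapping (including missing values) become NaN
        out[col] = out[col].map(mapping)
    for col in nominal_columns:
        dummies = pd.get_dummies(out[col], prefix=col, dtype=float)
        # get_dummies gives all-zero rows for missing values; mark them as NaN instead
        dummies.loc[out[col].isna(), :] = np.nan
        out = pd.concat([out.drop(columns=col), dummies], axis=1)
    return out
```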
The method for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning adopts an improved support vector machine model that introduces a soft margin, enabling fast and effective prediction.
The above description is only an embodiment of the present invention, but the design concept of the present invention is not limited thereto; any insubstantial modification made using this design concept shall be regarded as infringing the scope of protection of the present invention.

Claims (8)

1. A method for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning is characterized by comprising the following steps:
acquiring patient data, wherein the patient data comprise lung CT (computed tomography) radiomics data, clinical data and radiotherapy plan dosimetry data of the patient;
digitizing non-numeric features in the patient data, predicting missing feature values and removing redundant features to obtain preprocessed patient data;
dividing the preprocessed patient data into a training set and a test set, and training to generate an improved support vector machine model;
using the test set to calculate the area under the receiver operating characteristic (ROC) curve (AUC) to evaluate the accuracy of the model; if the AUC is greater than or equal to 0.9, the validation passes and a validated model is output; if the AUC is less than 0.9, the support vector machine model is regenerated;
inputting the relevant data of the patient to be predicted into the validated model to predict the occurrence of radiation pneumonitis.
2. The method for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning according to claim 1, wherein the lung CT radiomics data, clinical data and radiotherapy plan dosimetry data of the patient are specifically:
the lung CT radiomics data of the patient comprise: the first-order minimum under the LLH wavelet transform; the first-order minimum and the gray-level co-occurrence matrix (GLCM) autocorrelation under the HHL wavelet transform; and the GLCM autocorrelation under the HHH wavelet transform;
the clinical data of the patient comprise: age, sex, KPS (Karnofsky performance status) score, hypertension, number of chemotherapy cycles before radiotherapy, tumor target volume, tumor location and lymph node location of the chest tumor radiotherapy patient;
the radiotherapy plan dosimetry data of the patient comprise: mean lung dose, the lung volumes receiving 5 Gy, 10 Gy, 20 Gy and 30 Gy, the daily dose delivered to the patient, and the total dose of the treatment course.
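The lung dose metrics listed above can be derived from the planned dose distribution. The sketch below assumes the per-voxel doses inside the lung contour are already available as an array, that all voxels have equal volume, and that V_x is reported as the percentage of lung volume receiving at least x Gy (multiplying by the total lung volume would give an absolute volume instead). The helper `lung_dose_metrics` and its output keys are hypothetical.

```python
import numpy as np


def lung_dose_metrics(lung_dose_gy, thresholds=(5, 10, 20, 30)):
    """Mean lung dose and V_x values from per-voxel lung doses (Gy).

    Assumes equal voxel volumes; V_x is the percentage of lung voxels with dose >= x Gy.
    """
    d = np.asarray(lung_dose_gy, dtype=float).ravel()
    metrics = {"mean_lung_dose_gy": float(d.mean())}
    for x in thresholds:
        metrics[f"V{x}_percent"] = float(100.0 * np.mean(d >= x))
    return metrics
```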
3. The method for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning according to claim 1, wherein predicting missing feature values and removing redundant features specifically comprise:
fitting the non-missing patient data with a K-nearest-neighbor algorithm to predict missing feature values;
according to the variance-maximization principle, representing the rows/columns of the original data matrix with a set of new, linearly independent and mutually orthogonal vectors (principal component analysis), thereby compressing the number of features and eliminating redundant features.
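These two preprocessing steps map naturally onto scikit-learn's `KNNImputer` and `PCA`, as sketched below. The choices of k = 5 neighbors, standardizing the features before PCA, and keeping enough components to explain 95 % of the variance are assumptions not given in the text; the helper name `impute_and_reduce` is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler


def impute_and_reduce(X, n_neighbors=5, variance_to_keep=0.95):
    """Fill missing values by K-nearest-neighbor imputation, then compress with PCA.

    Each missing entry is replaced by the mean of that feature over the k nearest
    samples (nan-aware Euclidean distance on the unscaled features).
    """
    X_filled = KNNImputer(n_neighbors=n_neighbors).fit_transform(np.asarray(X, dtype=float))
    # PCA maximizes variance, so features are put on a common scale first
    X_scaled = StandardScaler().fit_transform(X_filled)
    # A float n_components keeps the fewest orthogonal components reaching that variance
    pca = PCA(n_components=variance_to_keep, svd_solver="full")
    X_reduced = pca.fit_transform(X_scaled)
    return X_reduced, pca
```

In practice the imputer, scaler and PCA would be fitted on the training set only and then applied unchanged to the test set, so that no test information leaks into preprocessing.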
4. The method for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning according to claim 1, wherein training to generate the improved support vector machine model specifically comprises:
the training sample set is D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}, y_i \in \{-1, +1\}, where m is the number of samples;
a "soft margin" is introduced, which allows some samples not to satisfy the constraint
y_i(\omega^T x_i + b) \ge 1
The optimization objective can be written as
\min_{\omega, b, \xi} \frac{1}{2}\lVert\omega\rVert^2 + C\sum_{i=1}^{m}\xi_i
s.t. y_i(\omega^T x_i + b) \ge 1 - \xi_i,
\xi_i \ge 0, i = 1, 2, \ldots, m.
The model is the improved support vector machine model;
the Gaussian kernel function in the model is:
K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)
where ω and b are model parameters, C is the penalty coefficient, γ is the width coefficient, and ξ_i ≥ 0 is the slack variable.
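For a fixed (ω, b), the smallest slack satisfying both constraints is ξ_i = max(0, 1 − y_i(ω^T x_i + b)), i.e. the hinge loss, so the soft-margin objective can be evaluated directly. The NumPy sketch below does this in the linear (input-space) form for illustration only; with the Gaussian kernel, SVM libraries such as libsvm (behind scikit-learn's `SVC`) solve the equivalent dual problem instead. The helper name `soft_margin_objective` is hypothetical.

```python
import numpy as np


def soft_margin_objective(w, b, X, y, C):
    """Return (0.5 * ||w||^2 + C * sum(xi), xi) with the minimal feasible slacks xi.

    xi_i = max(0, 1 - y_i (w^T x_i + b)) satisfies y_i (w^T x_i + b) >= 1 - xi_i
    and xi_i >= 0, and is the smallest such value.
    """
    w = np.asarray(w, dtype=float)
    margins = np.asarray(y, dtype=float) * (np.asarray(X, dtype=float) @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)  # zero for samples outside the margin
    return 0.5 * float(w @ w) + C * float(slack.sum()), slack
```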
5. A system for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning, comprising:
a patient data acquisition module, configured to acquire patient data, wherein the patient data comprise lung CT (computed tomography) radiomics data, clinical data and radiotherapy plan dosimetry data of the patient;
a data preprocessing module, configured to digitize non-numeric features in the patient data, predict missing feature values and remove redundant features to obtain preprocessed patient data;
a model training module, configured to divide the preprocessed patient data into a training set and a test set and train to generate an improved support vector machine model;
a model evaluation module, configured to use the test set to calculate the area under the receiver operating characteristic (ROC) curve (AUC) to evaluate the accuracy of the model; if the AUC is greater than or equal to 0.9, the validation passes and a validated model is output; if the AUC is less than 0.9, the support vector machine model is regenerated;
a prediction module, configured to input the relevant data of the patient to be predicted into the validated model to predict the occurrence of radiation pneumonitis.
6. The system according to claim 5, wherein the lung CT radiomics data, clinical data and radiotherapy plan dosimetry data of the patient are specifically:
the lung CT radiomics data of the patient comprise: the first-order minimum under the LLH wavelet transform; the first-order minimum and the gray-level co-occurrence matrix (GLCM) autocorrelation under the HHL wavelet transform; and the GLCM autocorrelation under the HHH wavelet transform;
the clinical data of the patient comprise: age, sex, KPS (Karnofsky performance status) score, hypertension, number of chemotherapy cycles before radiotherapy, tumor target volume, tumor location and lymph node location of the chest tumor radiotherapy patient;
the radiotherapy plan dosimetry data of the patient comprise: mean lung dose, the lung volumes receiving 5 Gy, 10 Gy, 20 Gy and 30 Gy, the daily dose delivered to the patient, and the total dose of the treatment course.
7. The system for predicting radiation pneumonitis after chest tumor radiotherapy based on machine learning according to claim 5, wherein the data preprocessing module predicts missing feature values and removes redundant features specifically by:
fitting the non-missing patient data with a K-nearest-neighbor algorithm to predict missing feature values;
according to the variance-maximization principle, representing the rows/columns of the original data matrix with a set of new, linearly independent and mutually orthogonal vectors (principal component analysis), thereby compressing the number of features and eliminating redundant features.
8. The system according to claim 5, wherein the model training module trains to generate the improved support vector machine model specifically by:
the training sample set is D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}, y_i \in \{-1, +1\}, where m is the number of samples;
a "soft margin" is introduced, which allows some samples not to satisfy the constraint
y_i(\omega^T x_i + b) \ge 1
The optimization objective can be written as
\min_{\omega, b, \xi} \frac{1}{2}\lVert\omega\rVert^2 + C\sum_{i=1}^{m}\xi_i
s.t. y_i(\omega^T x_i + b) \ge 1 - \xi_i,
\xi_i \ge 0, i = 1, 2, \ldots, m.
The model is the improved support vector machine model;
the Gaussian kernel function in the model is:
K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)
where ω and b are model parameters, C is the penalty coefficient, γ is the width coefficient, and ξ_i ≥ 0 is the slack variable.
CN202111434814.2A 2021-11-29 2021-11-29 Chest tumor radiotherapy-based radiation pneumonitis prediction method and system based on machine learning Pending CN114266729A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111434814.2A CN114266729A (en) 2021-11-29 2021-11-29 Chest tumor radiotherapy-based radiation pneumonitis prediction method and system based on machine learning

Publications (1)

Publication Number Publication Date
CN114266729A 2022-04-01

Family

ID=80825801

Country Status (1)

Country Link
CN (1) CN114266729A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114707742A (en) * 2022-04-15 2022-07-05 中国医学科学院肿瘤医院 Artificial intelligence prediction method and system for adaptive radiotherapy strategy
CN115100155A (en) * 2022-07-01 2022-09-23 广州医科大学附属肿瘤医院 Method and system for establishing radiation pneumonitis prediction model
CN116311389A (en) * 2022-08-18 2023-06-23 荣耀终端有限公司 Fingerprint identification method and device
CN116311389B (en) * 2022-08-18 2023-12-12 荣耀终端有限公司 Fingerprint identification method and device
CN117745717A (en) * 2024-02-08 2024-03-22 江南大学附属医院 Method and system for predicting radiation pneumonitis by using dosimetry and deep learning characteristics
CN117745717B (en) * 2024-02-08 2024-04-26 江南大学附属医院 Method and system for predicting radiation pneumonitis by using dosimetry and deep learning characteristics

Similar Documents

Publication Publication Date Title
US11491350B2 (en) Decision support system for individualizing radiotherapy dose
Prajapati et al. Classification of dental diseases using CNN and transfer learning
CN114266729A (en) Chest tumor radiotherapy-based radiation pneumonitis prediction method and system based on machine learning
AU2016338923B2 (en) Pseudo-CT generation from MR data using a feature regression model
AU2016339009B2 (en) Pseudo-CT generation from MR data using tissue parameter estimation
US20210350179A1 (en) Method for detecting adverse cardiac events
Li et al. DenseX-net: an end-to-end model for lymphoma segmentation in whole-body PET/CT images
US8423596B2 (en) Methods of multivariate data cluster separation and visualization
CN113610845B (en) Construction method and prediction method of tumor local control prediction model and electronic equipment
Shakeri et al. Deep spectral-based shape features for Alzheimer’s disease classification
CN115131642B (en) Multi-modal medical data fusion system based on multi-view subspace clustering
CN110444294B (en) Auxiliary analysis method and equipment for prostate cancer based on perception neural network
Rios et al. Population model of bladder motion and deformation based on dominant eigenmodes and mixed-effects models in prostate cancer radiotherapy
CN116864109B (en) Medical image artificial intelligence auxiliary diagnosis system
CN113989551A (en) Alzheimer disease classification method based on improved ResNet network
Yao et al. Enhanced deep residual network for bone classification and abnormality detection
Landgren et al. An automated system for the detection and diagnosis of kidney lesions in children from scintigraphy images
CN108346471A (en) A kind of analysis method and device of pathological data
Liu et al. Combining ExtremeNet with Shape Constraints and Re-Discrimination to Detect Cells from CD56 Images
Chen et al. Low-dose CT image denoising and pulmonary nodule identification
Yan Gingivitis detection by Fractional Fourier Entropy and Biogeography-based Optimization
CN118365970B (en) Medical data classification method and device based on mutual correction and information fusion
Thilakavathy et al. Intelligent quotient estimation from MRI images using optimal light gradient boosting machine
Ma et al. Learning with distribution of optimized features for recognizing common CT imaging signs of lung diseases
Lu et al. An Approach to Classifying X-Ray Images of Scoliosis and Spondylolisthesis Based on Fine-Tuned Xception Model.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination