CN114332577A - Colorectal cancer image classification method and system combining deep learning and radiomics - Google Patents

Colorectal cancer image classification method and system combining deep learning and radiomics

Info

Publication number
CN114332577A
CN114332577A (application number CN202111648121.3A)
Authority
CN
China
Prior art keywords
features
deep learning
radiomics
data
colorectal cancer
Prior art date
Legal status
Pending
Application number
CN202111648121.3A
Other languages
Chinese (zh)
Inventor
黄立勤
何甜
潘林
郑绍华
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202111648121.3A
Publication of CN114332577A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention proposes a colorectal cancer image classification method and system combining deep learning and radiomics. To address the small-sample problem in training deep learning models, existing data are fully exploited through data augmentation (rotation, translation, image transformation, etc.). To relieve the time-consuming and labor-intensive manual annotation by physicians, a deep-learning automatic segmentation network is introduced to label regions of interest in the images automatically. To address the poor interpretability of features extracted by deep learning models and the limited coverage of any single feature source, radiomics features, deep learning features and clinicopathological information are fused to obtain richer and more comprehensive feature information, further improving the accuracy and reliability of radiomics-based classification.

Description

Colorectal cancer image classification method and system combining deep learning and radiomics

Technical Field

The invention belongs to the technical field of medical image processing and image classification, and in particular relates to a colorectal cancer image classification method and system combining deep learning and radiomics.

Background Art

1. The radiomics approach: Radiomics mainly relies on computer software to extract large numbers of high-dimensional quantitative image features from CT, MRI and PET images at high throughput. Its workflow includes data acquisition, image segmentation, feature extraction, feature selection and model building, and it assists physicians in making more accurate image classifications through deeper mining, prediction and analysis of massive amounts of imaging data. In practice, radiomics lesions must be annotated manually by physicians, which is not only time-consuming but also subject to subjective bias. In addition, the lack of a standardized workflow and quality control system for computing quantitative target features limits the performance of such methods.

2. The deep learning approach: Deep learning is a technique that combines low-level features into more abstract high-level features or categories, learns effective features from large amounts of input data, and uses those features for classification, regression and information retrieval. The models can automatically learn to extract and select image features and make predictions, thereby mining the information contained in images more comprehensively and deeply. Among the many model types, convolutional neural networks (CNNs) are currently the most widely used in medical imaging. Training and inference with convolutional networks consume large amounts of computing resources. Once training is complete, deep learning enables fully automatic image analysis, but this advantage comes at a higher data acquisition cost: deep learning requires collecting and annotating far more data, possibly ten to a hundred times as much as radiomics.

As described above, the main shortcomings of existing colorectal cancer medical image classification are:

1. Radiomics requires physicians to annotate data manually, which is time-consuming and subject to subjective bias; imaging equipment from different manufacturers still lacks unified standards for scanning parameters and reconstruction algorithms; and the extracted features are phenotypic features that are not deep enough.

2. Deep learning models require large amounts of training data, consume substantial computing resources, place high demands on hardware, and the extracted deep learning features are poorly interpretable.

3. The extracted features are not comprehensive enough: only radiomics features or only deep learning features are used, without combining the two.

Summary of the Invention

To fill the gaps and remedy the deficiencies of the prior art, the present invention proposes a colorectal cancer image classification method and system combining deep learning and radiomics. Its main design points include:

1. To address the small-sample problem in training deep learning models, existing data are fully exploited through data augmentation (rotation, translation, image transformation, etc.).

2. To relieve the time-consuming and labor-intensive manual annotation by physicians, a deep-learning automatic segmentation network is introduced to label regions of interest in the images automatically.

3. To address the poor interpretability of features extracted by deep learning models and the limited coverage of any single feature source, radiomics features, deep learning features and clinicopathological information are fused to obtain richer and more comprehensive feature information, further improving the accuracy and reliability of radiomics-based classification.

Specifically, the present invention adopts the following technical solutions:

A colorectal cancer image classification method combining deep learning and radiomics, characterized by comprising the following steps:

Step S1: Data preprocessing: a data augmentation method is used to expand the colorectal cancer image data; the segmentation network is trained with manually annotated data, and the trained model is used to automatically segment regions of interest from the images, thereby obtaining more annotated data;

Step S2: Feature extraction: radiomics features are extracted from abdominal CT with the open-source Python package Pyradiomics; a ResNet is trained, the best-performing model is selected, and this model is used to extract deep learning features;

Step S3: Feature selection: correlated features are removed by computing the Pearson correlation matrix and eliminating highly correlated features (P > 0.90); recursive feature elimination is then used to rank the remaining features by predictive power;

Step S4: Feature fusion: natural language processing techniques are used to convert the clinicopathological information provided by physicians; radiomics features, deep learning features and clinicopathological information are fused to obtain more comprehensive feature information;
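
The patent does not fix a particular NLP technique for this conversion. As one hedged possibility, a simple TF-IDF vectorization of the pathology report text would turn the clinicopathological information into a numeric block that can be concatenated with the other features; the report snippets, the vocabulary cap and the variable names below are illustrative assumptions, not details taken from this patent (Chinese-language reports would additionally need word segmentation, e.g. with jieba).

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical pathology report snippets; real text would come from the clinical records.
reports = [
    "moderately differentiated adenocarcinoma, suspected serosal invasion",
    "well differentiated adenocarcinoma, no lymph node metastasis",
]

vectorizer = TfidfVectorizer(max_features=64)  # cap the text-feature dimension
clinical_text_features = vectorizer.fit_transform(reports).toarray()
# clinical_text_features: (n_patients, n_terms) matrix, ready to concatenate with
# the radiomics and deep-learning feature blocks in step S4.
```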

Step S5: Ensemble learning is adopted, with classifiers including a support vector machine (SVM), a Bayesian discriminator, a logistic regression discriminator and Lasso regression; at prediction time the classification results of the individual classifiers are voted on, and the best model is selected to perform the final image classification.
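
A compact way to realize the ensemble of step S5 with off-the-shelf components is a soft-voting classifier, sketched below. Standing in an L1-penalized logistic regression for the Lasso-style classifier, as well as the placeholder arrays X_train, y_train and X_test, are illustrative assumptions rather than details specified in this patent.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Ensemble of the classifiers listed in step S5; predictions are combined by voting.
ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),                  # support vector machine
        ("bayes", GaussianNB()),                         # Bayesian discriminator
        ("logreg", LogisticRegression(max_iter=1000)),   # logistic regression discriminator
        # L1-penalised logistic regression as a Lasso-style classifier (assumption).
        ("lasso", LogisticRegression(penalty="l1", solver="liblinear", C=0.5)),
    ],
    voting="soft",  # vote with predicted class probabilities
)

ensemble.fit(X_train, y_train)     # X_train, y_train: fused feature matrix and labels (placeholders)
y_pred = ensemble.predict(X_test)  # final image classification
```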

Further, in step S1: data augmentation is first applied to the existing colorectal cancer image data to obtain more abdominal images, including image rotation, image translation and image transformation; the augmented data are fed into the deep learning network U-Net for training; after training, the model can automatically segment regions of interest, and the lesion regions produced by the segmentation model are manually fine-tuned, thereby obtaining more annotated data.

Further, in step S3: feature selection first traverses all features and computes the Pearson correlation coefficient for every pair; when P > 0.90, one of the two features is removed at random so that the dimension-reduced features are no longer highly similar; recursive feature elimination is then used to rank the remaining features by predictive power.

Further, in step S4: clinicopathological feature selection uses stepwise discriminant regression, introducing all features one by one and testing each in turn; when a previously introduced feature variable becomes no longer significant owing to the introduction of later variables, it is removed; this is repeated until no significant variable remains to be entered into the equation and no non-significant independent variable remains to be removed from the regression equation.

And a colorectal cancer image classification system combining deep learning and radiomics, characterized in that, based on a computer system, it comprises:

a data preprocessing module, which uses a data augmentation method to expand the colorectal cancer image data, trains the segmentation network with manually annotated data, and uses the trained model to automatically segment regions of interest from the images, thereby obtaining more annotated data;

a feature extraction module, which extracts radiomics features from abdominal CT with the open-source Python package Pyradiomics, trains a ResNet, selects the best-performing model, and uses it to extract deep learning features;

a feature selection module, which removes correlated features by computing the Pearson correlation matrix and eliminating highly correlated features (P > 0.90), and uses recursive feature elimination to rank the remaining features by predictive power;

a feature fusion module, which uses natural language processing techniques to convert the clinicopathological information provided by physicians and fuses radiomics features, deep learning features and clinicopathological information to obtain more comprehensive feature information;

an ensemble learning module, which uses classifiers including a support vector machine (SVM), a Bayesian discriminator, a logistic regression discriminator and Lasso regression, votes on the classification results of the individual classifiers at prediction time, and selects the best model to perform the final image classification.

Further, in the data preprocessing module, data augmentation uses image rotation, image translation and image transformation, and the augmented data are fed into the deep learning network U-Net for training.

Further, in the feature selection module, all features are first traversed and the Pearson correlation coefficient is computed for every pair; when P > 0.90, one of the two features is removed at random so that the dimension-reduced features are no longer highly similar; recursive feature elimination is then used to rank the remaining features by predictive power.

Further, in the feature fusion module, clinicopathological features are selected by stepwise discriminant regression, introducing all features one by one and testing each in turn; when a previously introduced feature variable becomes no longer significant owing to the introduction of later variables, it is removed; this is repeated until no significant variable remains to be entered into the equation and no non-significant independent variable remains to be removed from the regression equation.

Compared with the prior art, the main design points of the present invention and its preferred solutions include:

1. For the task of colorectal cancer medical image classification, a network is designed that fuses radiomics features, deep learning features and clinical features. It combines multiple feature types, incorporating not only interpretable radiomics features but also deeper abstract deep learning features and information provided by doctors and patients.

2. Radiomics and deep learning are fully combined. Deep learning is applied in the data preprocessing stage, including data augmentation and automatic segmentation of regions of interest, effectively relieving the limitations of scarce data and time-consuming expert annotation. The deep learning model is used to extract high-dimensional effective features, natural language processing techniques are used to analyze clinicopathological information, and radiomics extracts rich phenotypic features.

3. Ensemble learning is used for modeling; the use of multiple classifiers makes the final result more comprehensive.

Its advantages over the prior art include:

1. Existing methods use radiomics features or deep learning features alone, so the feature information is not exploited comprehensively. Radiomics requires physicians to annotate feature regions manually, which is time-consuming and labor-intensive, and the extracted phenotypic features are not comprehensive enough; deep learning requires large amounts of data and the deep abstract features it extracts are poorly interpretable.

The present invention and its preferred solutions introduce interpretable radiomics features, deep abstract features and clinicopathological features, which not only remedies incomplete feature extraction and poor interpretability, but also relieves the limitations of large data requirements and time-consuming, labor-intensive manual annotation.

2. The clinical information added in existing work mostly consists of quantitative characteristics such as sex and age, which is rather limited.

The present invention and its preferred solutions introduce natural language processing (NLP) to process clinicopathological information, making the imported information more comprehensive. NLP techniques can handle large volumes of text data, allowing machines to understand and exploit richer textual information.

Brief Description of the Drawings

Fig. 1 is a schematic diagram of the classification network framework designed in an embodiment of the present invention;

Fig. 2 is a schematic diagram of the U-Net network structure used in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the ResNet network structure used in an embodiment of the present invention.

Detailed Description of the Embodiments

To make the features and advantages of this patent more apparent and understandable, specific embodiments are described in detail below:

As shown in Fig. 1, the colorectal cancer image classification method and system combining deep learning and radiomics proposed in this embodiment specifically comprise the following solutions:

(1) Data preprocessing: data augmentation methods (rotation, translation, image transformation, etc.) are used to fully expand the existing data; the segmentation network is trained with the existing manually annotated data, and the trained model is used to automatically segment regions of interest from the images, thereby obtaining more annotated data.

(2) Feature extraction: radiomics features are extracted from abdominal CT with the open-source Python package Pyradiomics; a ResNet is trained, the best-performing model is selected, and this model is used to extract deep learning features.

(3) Feature fusion: natural language processing (NLP) techniques are used to convert the clinicopathological information provided by physicians; radiomics features, deep learning features and clinicopathological information are fused to obtain more comprehensive feature information.

(4) Feature selection: correlated features are removed by computing the Pearson correlation matrix and eliminating highly correlated features (P > 0.90); recursive feature elimination is then used to rank the remaining features by predictive power.

(5) Classifier: ensemble learning is adopted, comprising classifiers such as a support vector machine (SVM), a Bayesian discriminator, a logistic regression discriminator and Lasso regression; at prediction time, the classification results of the individual classifiers are voted on and the best model is selected.

2. Detailed design

(1) Classification network combining deep learning and radiomics

The classification network framework designed in this embodiment is shown in Fig. 1. First, data augmentation is applied to the original data to obtain more abdominal images, mainly through image rotation, image translation and image transformation. The augmented data are fed into the deep learning network U-Net for training; after training, the model can automatically segment regions of interest, and the lesion regions produced by the segmentation model are manually fine-tuned by professional physicians, thereby obtaining more annotated data. This approach greatly reduces the time physicians spend on manual annotation.
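
As a concrete illustration of the augmentation described above, the following is a minimal sketch using torchvision; the rotation angle, translation fraction, flip probability and the variable ct_slice are illustrative assumptions, not values fixed by this patent.

```python
from torchvision import transforms

# Minimal augmentation pipeline: rotation, translation and a simple image transformation.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                     # random rotation
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # random translation
    transforms.RandomHorizontalFlip(p=0.5),                    # simple image transformation
])

augmented_slice = augment(ct_slice)  # ct_slice: a PIL image (or tensor) of one CT slice
```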

Radiomics features are extracted from abdominal CT with the open-source Python package Pyradiomics and then passed through feature selection; they are combined with the deep features extracted by the deep learning network and the clinical features processed by NLP, and modeled with ensemble learning. Ensemble learning comprises classifiers such as a support vector machine (SVM), a Bayesian discriminator, a logistic regression discriminator and Lasso regression; at prediction time, the classification results of the individual classifiers are voted on and the best model is selected.
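
A minimal sketch of the Pyradiomics extraction mentioned above is given below, using the package's public API; the file names and the choice to enable all feature classes are illustrative assumptions.

```python
from radiomics import featureextractor

# Extract handcrafted radiomics features from a CT volume and its ROI mask.
extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.enableAllFeatures()  # shape, first-order, GLCM, GLRLM, GLSZM, ...

# "abdomen_ct.nii.gz" and "roi_mask.nii.gz" are hypothetical file names.
result = extractor.execute("abdomen_ct.nii.gz", "roi_mask.nii.gz")

# Keep the numeric feature values and drop the diagnostic metadata entries.
radiomics_features = {k: v for k, v in result.items() if not k.startswith("diagnostics")}
```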

The number of extracted radiomics features may range from hundreds to tens of thousands, and not every feature is relevant to the clinical problem to be solved. Moreover, in practice the relatively large number of features compared with the small number of samples easily leads to overfitting of the subsequent model, degrading its accuracy. Feature selection therefore first traverses all features and computes the Pearson correlation coefficient for every pair; when P > 0.90, one of the two features is removed at random, so that the dimension-reduced features are no longer highly similar. Recursive feature elimination is then used to rank the remaining features by predictive power.
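
A minimal sketch of this two-stage selection, assuming the features sit in a pandas DataFrame and using scikit-learn's recursive feature elimination with a logistic-regression base estimator; the base estimator, the number of retained features, and the variables features_df and y are illustrative assumptions. (The sketch drops the later column of each highly correlated pair, whereas the text above describes removing one of the two at random.)

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

def drop_highly_correlated(features: pd.DataFrame, threshold: float = 0.90) -> pd.DataFrame:
    """Remove one feature from every pair whose |Pearson r| exceeds the threshold."""
    corr = features.corr(method="pearson").abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # upper triangle only
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)

# features_df: samples x features (radiomics + deep features); y: class labels (placeholders).
X_reduced = drop_highly_correlated(features_df)

# Rank the remaining features by predictive power with recursive feature elimination.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=20)
rfe.fit(X_reduced, y)
feature_ranking = pd.Series(rfe.ranking_, index=X_reduced.columns).sort_values()
```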

Clinicopathological feature selection uses stepwise discriminant regression, introducing all features one by one and testing each in turn. When a previously introduced feature variable becomes no longer significant owing to the introduction of later variables, it is removed. This is repeated until no significant variable remains to be entered into the equation and no non-significant independent variable remains to be removed from the regression equation.
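
Stepwise selection is not built into scikit-learn; the sketch below approximates the enter/remove loop described above using p-values from a statsmodels logistic model. The significance thresholds, the use of a logistic model in place of a discriminant model proper, and the variables clinical_df and y are illustrative assumptions.

```python
import statsmodels.api as sm

def stepwise_select(X, y, p_enter=0.05, p_remove=0.10):
    """Forward-entry / backward-removal variable selection driven by p-values."""
    selected = []
    while True:
        changed = False

        # Forward step: try to enter the remaining variable with the smallest p-value.
        remaining = [c for c in X.columns if c not in selected]
        entry_pvals = {}
        for c in remaining:
            fit = sm.Logit(y, sm.add_constant(X[selected + [c]])).fit(disp=0)
            entry_pvals[c] = fit.pvalues[c]
        if entry_pvals:
            best = min(entry_pvals, key=entry_pvals.get)
            if entry_pvals[best] < p_enter:
                selected.append(best)
                changed = True

        # Backward step: drop a previously entered variable that is no longer significant.
        if selected:
            fit = sm.Logit(y, sm.add_constant(X[selected])).fit(disp=0)
            pvals = fit.pvalues.drop("const")
            worst = pvals.idxmax()
            if pvals[worst] > p_remove:
                selected.remove(worst)
                changed = True

        if not changed:
            return selected

# clinical_df: encoded clinicopathological variables; y: class labels (placeholders).
selected_clinical = stepwise_select(clinical_df, y)
```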

(2) Automatic segmentation network U-Net

The U-Net network consists of an encoding path that captures contextual information and a decoding path. As shown in Fig. 2, the encoder feature maps are concatenated with the upsampled decoder feature maps at each stage through skip connections, forming a U-shaped structure. These per-stage skip connections allow the decoder to recover features lost to pooling in the encoder.
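
For concreteness, a compact PyTorch version of such an encoder-decoder with skip connections is sketched below; the number of stages and the channel widths are illustrative assumptions and are much smaller than a full U-Net.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    """Two 3x3 convolutions with BatchNorm and ReLU, the basic U-Net block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    """Two-stage U-Net: encoder, bottleneck, decoder with skip connections."""
    def __init__(self, in_channels=1, n_classes=1):
        super().__init__()
        self.enc1 = double_conv(in_channels, 32)
        self.enc2 = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec2 = double_conv(128, 64)   # 64 upsampled + 64 skip channels in
        self.up1 = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec1 = double_conv(64, 32)    # 32 upsampled + 32 skip channels in
        self.head = nn.Conv2d(32, n_classes, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)                  # skip connection from stage 1
        e2 = self.enc2(self.pool(e1))      # skip connection from stage 2
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # concatenate encoder features
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)               # per-pixel ROI logits

# One single-channel 256x256 CT slice in, one segmentation logit map out.
mask_logits = MiniUNet()(torch.randn(1, 1, 256, 256))
```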

(3) Feature extraction with the deep learning network

A ResNet-based architecture is used to train and extract high-dimensional deep learning features. Unlike a plain convolutional neural network, each residual block in the ResNet structure also receives the output of an earlier layer through a shortcut connection, which makes the network more accurate and efficient. Deep learning features are extracted just before the output layer of the network: the output layer is removed, and the high-dimensional features produced by the last hidden layer are taken as the output deep learning features, as shown in Fig. 3.
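
One way to realize the "remove the output layer and keep the last hidden layer's output" step with a standard torchvision ResNet is sketched below; the choice of ResNet-18 (and hence a 512-dimensional feature vector) is an assumption, and in practice the backbone would first be fine-tuned on the colorectal data as described above.

```python
import torch
import torch.nn as nn
from torchvision import models

# ResNet backbone; in practice the fine-tuned checkpoint selected above would be loaded here.
resnet = models.resnet18()
resnet.fc = nn.Identity()   # drop the classification output layer

resnet.eval()
with torch.no_grad():
    batch = torch.randn(4, 3, 224, 224)   # four ROI crops replicated to three channels
    deep_features = resnet(batch)         # shape (4, 512): the deep learning features
```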

The above program design provided in this embodiment can be stored in coded form in a computer-readable storage medium and implemented as a computer program; the basic parameter information required for the computation is input through computer hardware and the computation result is output.

As will be appreciated by those skilled in the art, embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

The present invention is described with reference to flowcharts of methods, devices (apparatus) and computer program products according to embodiments of the invention. It should be understood that each flow in the flowcharts, and combinations of flows in the flowcharts, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts.

The above are only preferred embodiments of the present invention and do not limit the present invention in other forms. Any person skilled in the art may use the technical content disclosed above to make changes or modifications into equivalent embodiments. However, any simple modification, equivalent change or adaptation made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solutions of the present invention, still falls within the protection scope of the technical solutions of the present invention.

This patent is not limited to the above best embodiment; anyone, inspired by this patent, may derive other colorectal cancer image classification methods and systems combining deep learning and radiomics in various forms. All equal changes and modifications made within the scope of the claims of the present invention shall fall within the coverage of this patent.

Claims (8)

1. A colorectal cancer image classification method combining deep learning and radiomics, characterized by comprising the following steps:

Step S1: data preprocessing: a data augmentation method is used to expand the colorectal cancer image data; the segmentation network is trained with manually annotated data, and the trained model is used to automatically segment regions of interest from the images, thereby obtaining more annotated data;

Step S2: feature extraction: radiomics features are extracted from abdominal CT with the open-source Python package Pyradiomics; a ResNet is trained, the best-performing model is selected, and this model is used to extract deep learning features;

Step S3: feature selection: correlated features are removed by computing the Pearson correlation matrix and eliminating highly correlated features (P > 0.90); recursive feature elimination is then used to rank the remaining features by predictive power;

Step S4: feature fusion: natural language processing techniques are used to convert the clinicopathological information provided by physicians; radiomics features, deep learning features and clinicopathological information are fused to obtain more comprehensive feature information;

Step S5: ensemble learning is adopted, with classifiers including a support vector machine (SVM), a Bayesian discriminator, a logistic regression discriminator and Lasso regression; at prediction time the classification results of the individual classifiers are voted on, and the best model is selected to perform the final image classification.

2. The colorectal cancer image classification method combining deep learning and radiomics according to claim 1, characterized in that in step S1: data augmentation is first applied to the existing colorectal cancer image data to obtain more abdominal images, including image rotation, image translation and image transformation; the augmented data are fed into the deep learning network U-Net for training; after training, the model can automatically segment regions of interest, and the lesion regions produced by the segmentation model are manually fine-tuned, thereby obtaining more annotated data.

3. The colorectal cancer image classification method combining deep learning and radiomics according to claim 1, characterized in that in step S3: feature selection first traverses all features and computes the Pearson correlation coefficient for every pair; when P > 0.90, one of the two features is removed at random so that the dimension-reduced features are no longer highly similar; recursive feature elimination is then used to rank the remaining features by predictive power.

4. The colorectal cancer image classification method combining deep learning and radiomics according to claim 1, characterized in that in step S4: clinicopathological feature selection uses stepwise discriminant regression, introducing all features one by one and testing each in turn; when a previously introduced feature variable becomes no longer significant owing to the introduction of later variables, it is removed; this is repeated until no significant variable remains to be entered into the equation and no non-significant independent variable remains to be removed from the regression equation.

5. A colorectal cancer image classification system combining deep learning and radiomics, characterized in that, based on a computer system, it comprises:

a data preprocessing module, which uses a data augmentation method to expand the colorectal cancer image data, trains the segmentation network with manually annotated data, and uses the trained model to automatically segment regions of interest from the images, thereby obtaining more annotated data;

a feature extraction module, which extracts radiomics features from abdominal CT with the open-source Python package Pyradiomics, trains a ResNet, selects the best-performing model, and uses it to extract deep learning features;

a feature selection module, which removes correlated features by computing the Pearson correlation matrix and eliminating highly correlated features (P > 0.90), and uses recursive feature elimination to rank the remaining features by predictive power;

a feature fusion module, which uses natural language processing techniques to convert the clinicopathological information provided by physicians and fuses radiomics features, deep learning features and clinicopathological information to obtain more comprehensive feature information;

an ensemble learning module, which uses classifiers including a support vector machine (SVM), a Bayesian discriminator, a logistic regression discriminator and Lasso regression, votes on the classification results of the individual classifiers at prediction time, and selects the best model to perform the final image classification.

6. The colorectal cancer image classification system combining deep learning and radiomics according to claim 5, characterized in that in the data preprocessing module, data augmentation uses image rotation, image translation and image transformation, and the augmented data are fed into the deep learning network U-Net for training.

7. The colorectal cancer image classification system combining deep learning and radiomics according to claim 5, characterized in that in the feature selection module, all features are first traversed and the Pearson correlation coefficient is computed for every pair; when P > 0.90, one of the two features is removed at random so that the dimension-reduced features are no longer highly similar; recursive feature elimination is then used to rank the remaining features by predictive power.

8. The colorectal cancer image classification system combining deep learning and radiomics according to claim 5, characterized in that in the feature fusion module, clinicopathological features are selected by stepwise discriminant regression, introducing all features one by one and testing each in turn; when a previously introduced feature variable becomes no longer significant owing to the introduction of later variables, it is removed; this is repeated until no significant variable remains to be entered into the equation and no non-significant independent variable remains to be removed from the regression equation.
CN202111648121.3A 2021-12-31 2021-12-31 Colorectal cancer image classification method and system combining deep learning and radiomics Pending CN114332577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111648121.3A CN114332577A (en) 2021-12-31 2021-12-31 Colorectal cancer image classification method and system combining deep learning and radiomics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111648121.3A CN114332577A (en) 2021-12-31 2021-12-31 Colorectal cancer image classification method and system combining deep learning and radiomics

Publications (1)

Publication Number Publication Date
CN114332577A true CN114332577A (en) 2022-04-12

Family

ID=81016270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111648121.3A Pending CN114332577A (en) 2021-12-31 2021-12-31 Colorectal cancer image classification method and system combining deep learning and radiomics

Country Status (1)

Country Link
CN (1) CN114332577A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021022752A1 (en) * 2019-08-07 2021-02-11 深圳先进技术研究院 Multimodal three-dimensional medical image fusion method and system, and electronic device
US20210097682A1 (en) * 2019-09-30 2021-04-01 Case Western Reserve University Disease characterization and response estimation through spatially-invoked radiomics and deep learning fusion
CN111915596A (en) * 2020-08-07 2020-11-10 杭州深睿博联科技有限公司 Method and device for predicting benign and malignant pulmonary nodules
CN113570627A (en) * 2021-07-02 2021-10-29 上海健康医学院 Training method of deep learning segmentation network and medical image segmentation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
何甜: "Radiomics approach with deep learning for predicting T4 obstructive colorectal cancer using CT image", Abdominal Radiology, 20 March 2023 (2023-03-20) *
郭恩特: "Camera localization and 3D object position estimation combining images and inertial sensors", Journal of Fuzhou University (Natural Science Edition), 28 February 2018 (2018-02-28) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115132327A (en) * 2022-05-25 2022-09-30 中国医学科学院肿瘤医院 Microsatellite instability prediction system and its construction method, terminal equipment and medium
WO2023226217A1 (en) * 2022-05-25 2023-11-30 中国医学科学院肿瘤医院 Microsatellite instability prediction system and construction method therefor, terminal device, and medium
US12027255B2 (en) 2022-05-25 2024-07-02 Cancer Hospital, Chinese Academy Of Medical Sciences System for predicting microsatellite instability and construction method thereof, terminal device and medium
CN115311302A (en) * 2022-10-12 2022-11-08 四川大学华西医院 Method for constructing staging features, diagnostic system and storage medium for avascular necrosis of the femoral head
CN115311302B (en) * 2022-10-12 2022-12-23 四川大学华西医院 Femoral head avascular necrosis staging diagnosis system and storage medium
CN115984193A (en) * 2022-12-15 2023-04-18 东北林业大学 PDL1 expression level detection method fusing histopathology image and CT image
CN116452898A (en) * 2023-06-16 2023-07-18 中国人民大学 Method and device for subtype identification of lung adenocarcinoma based on radiomics and deep learning
CN116452898B (en) * 2023-06-16 2023-10-17 中国人民大学 Lung adenocarcinoma subtype identification method and device based on image histology and deep learning
CN117496277A (en) * 2024-01-02 2024-02-02 达州市中心医院(达州市人民医院) Rectal cancer image data modeling processing method and system based on artificial intelligence
CN117496277B (en) * 2024-01-02 2024-03-12 达州市中心医院(达州市人民医院) Rectal cancer image data modeling processing method and system based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN114332577A (en) Colorectal cancer image classification method and system combining deep learning and radiomics
CN111882560B (en) Lung parenchyma CT image segmentation method based on weighted full convolution neural network
Khan et al. GLNET: global–local CNN's-based informed model for detection of breast cancer categories from histopathological slides
CN110335668A (en) Thyroid cancer cell pathological map auxiliary analysis method and system based on deep learning
Li et al. Vispi: Automatic visual perception and interpretation of chest x-rays
Deshmukh et al. Faster region-convolutional neural network oriented feature learning with optimal trained recurrent neural network for bone age assessment for pediatrics
CN118154969A (en) Endoscopic image classification method and system for cholangiocarcinoma based on deep learning and feature fusion
CN114066804A (en) Curved surface fault layer tooth position identification method based on deep learning
Zhao et al. Deeply supervised active learning for finger bones segmentation
CN115762721A (en) Medical image quality control method and system based on computer vision technology
CN113516097A (en) A plant leaf disease identification method based on improved EfficentNet-V2
CN111783796A (en) A PET/CT Image Recognition System Based on Depth Feature Fusion
CN117976185A (en) A breast cancer risk assessment method and system based on deep learning
CN116543154A (en) Medical image segmentation method based on multi-level semantic features
CN115908923A (en) Brain magnetic resonance image classification system based on attention-guided 3D neural network
CN113409293A (en) Pathology image automatic segmentation system based on deep learning
Bonciog et al. Automation of Decellularization Process Using Artificial Neural Networks
CN119046893B (en) Myocarditis recognition method and system based on multimodal images combined with machine learning
Subha et al. Analysis of deep learning based optimization techniques for oral cancer detection
Kucarov et al. Teaching Machine Learning for Oncogenicity Prediction Based on NGS Genomic Metadata
Tanimu et al. A regularized CNN approach for detecting cervical cancer
Helen et al. Prediction of osteosarcoma using binary convolutional neural network: A machine learning approach
Song et al. Temporal Graphormer and its interpretability: A novel framework for diagnostic decoding of brain disorders using fMRI data
CN118762841B (en) A risk prediction method for gestational diabetes based on BLS feature extractor
Li et al. A computer-aided diagnosis system based on feature extraction enhanced multiple instance learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220412