CN111583271A - Method for automatically predicting gene expression categories based on cancer CT images - Google Patents

Method for automatically predicting gene expression categories based on cancer CT images

Info

Publication number
CN111583271A
Authority
CN
China
Prior art keywords
data
layer
size
training
gene expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010285446.9A
Other languages
Chinese (zh)
Inventor
胡文心
张绪坤
李新星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University
Priority to CN202010285446.9A
Publication of CN111583271A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00ICT specially adapted for the handling or processing of medical images
    • G16H30/40ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30092Stomach; Gastric
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Genetics & Genomics (AREA)
  • Radiology & Medical Imaging (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for automatically predicting gene expression categories based on cancer CT images, comprising the following steps: a) obtaining ROI slices and expanding their number by 48 times; b) constructing a neural network based on DenseNet-12 and a spatial pyramid module; c) training with a focal loss function; d) combining the per-image model predictions to obtain a final prediction result. The data expansion technique adopted by the invention greatly increases the data volume without changing the properties of the CT images. A spatial pyramid pooling module with 4 scales extracts multi-level image features, capturing both global semantics and fine detail. Focal-Loss guides the network to pay more attention to the tumor margins, i.e. the slices at the head and tail ends from which effective features are difficult to extract, and a training strategy in which the input precision is increased step by step finally achieves accurate and efficient prediction of gene mutations from CT images.

Description

Method for automatically predicting gene expression categories based on cancer CT images
Technical Field
The invention relates to the fields of image processing, computer vision, deep learning, and medical image computing and computer-assisted intervention (Medical Image Computing and Computer-Assisted Intervention), and in particular to a method for automatically predicting gene expression categories based on cancer CT images.
Background
Recent studies in China and abroad show that features extracted from cancer CT images are related to certain gene expression patterns. For example, Shinagare et al. verified in 2015 the association between tumor margins, nodule enhancement and intratumoral vessels and VHL mutations, and Karlo et al. suggested in 2014 that PBRM1 and SETD2 gene mutations were mainly found in solid (non-cystic) renal clear cell carcinoma cases. In the last two years, more and more researchers have begun to explore this direction. In 2018, Mohammad et al. used a CNN trained with multiple-instance learning to detect the 4 most common gene mutations in renal clear cell carcinoma; in 2019, a university group used a 3D neural network to predict EGFR mutation in lung cancer, exceeding the performance of traditional radiomics; Nicolas Coudray et al. used neural networks to predict multiple genes (STK11, EGFR, SETBP1, TP53, FAT1, KRAS, KEAP1, LRP1B, FAT4, NF1) in non-small cell lung cancer and analyzed the experimental results, demonstrating the feasibility of applying neural network techniques to more tumor types and genotypes.
However, the existing methods rely on large amounts of medical data, and in practice it is often difficult to collect a tumor CT data set of a specific type with a gold-standard gene mutation status. Moreover, because tumors differ in size, position and shape, the existing methods re-sample each tumor to a fixed size for training, which inevitably reduces image accuracy and ignores the differences between individual tumors. In addition, the tumor margins of a CT sequence (i.e., the axial beginning and end) generally contain only a small portion of the tumor; these slice levels are difficult to identify and have not been addressed by existing methods.
Disclosure of Invention
The invention aims to provide an auxiliary diagnosis method for automatically predicting gene expression categories based on cancer CT images, in view of the defects of the prior art. On the one hand, a small amount of data can play a larger role after expansion; in addition, a pyramid module removes the restriction that the input size must be fixed, a Focal-Loss function is used to focus on slices that are difficult to predict, and accurate and efficient prediction results are finally obtained through training.
The specific technical scheme for realizing the purpose of the invention is as follows:
a method for automatically predicting gene expression classes based on cancer CT images comprises the following specific steps:
step 1: obtaining ROI slices and expanding the number of images by 48 times;
step 2: constructing a neural network based on DenseNet-12 and a spatial pyramid module;
step 3: taking the images expanded in step 1 as input, training the neural network constructed in step 2, wherein the training loss function is the focal loss, i.e. Focal-Loss;
step 4: predicting with the network model trained in step 3 to obtain a gene expression category prediction for each input image, i.e. over-expression (positive) or no expression (negative), and summarizing the predictions of all input images belonging to the same CT sequence to obtain the overall prediction result for that CT sequence.
The step 1 specifically comprises:
step A1: extracting a slice containing the tumor from the complete CT sequence, and cutting the slice according to the position and the size of the tumor on the slice to obtain an ROI cube, wherein the ROI cube is a slice sequence containing the complete tumor;
step A2: the ROI slice sequence obtained by cropping has size n × w × h, where n is the number of sequence layers, w the width and h the height; 3 adjacent slices are stacked to form a group of data with 3 channels, of size 3 × w × h; the 3 ROI slices in each group of 3-channel data are shuffled to form 6 stacking orders, and the resulting data is denoted A, of size n' × 3 × w × h, where n' = 6 × n;
step A3: transposing the data A in the step A2 to obtain transposed data B with the size of n' × 3 × h × w;
step A4: turning the data A in the step A2 up and down to form data C with the size of n' × 3 × w × h;
step A5: turning the data A in the step A2 left and right to form data D with the size of n' × 3 × w × h;
step A6: performing 90-degree rotation on the data A obtained in the step A2 for 1 time to form data E with the size of n' × 3 × h × w;
step A7: performing 90-degree rotation on the data A obtained in the step A2 for 2 times to form data F with the size of n' × 3 × w × h;
step A8: performing 90-degree rotation on the data B in the step A3 for 1 time to form data G with the size of n' × 3 × w × h;
step A9: flipping the data B of step A3 left-right to form data H, of size n' × 3 × h × w; together, A + B + C + D + E + F + G + H expand the number of images to 48 times the original while leaving the image properties of the ROI slice sequence unchanged (a sketch of this expansion is given below).
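A minimal NumPy sketch of steps A2-A9 follows. The function name, the sliding-window grouping of adjacent slices (which yields n - 2 ≈ n groups of 3) and the variable names are illustrative assumptions and are not taken from the patent.

```python
import itertools
import numpy as np

def expand_roi_sequence(roi):
    """48x expansion of a cropped ROI slice sequence (sketch of steps A2-A9).

    roi: ndarray of shape (n, w, h), the ROI slice sequence.
    Returns a list of eight arrays (A..H); each contains 3-channel samples.
    """
    n, w, h = roi.shape

    # Step A2: stack 3 adjacent slices into one 3-channel sample and
    # enumerate the 6 possible channel orderings (3! permutations).
    groups = [roi[i:i + 3] for i in range(n - 2)]                  # (3, w, h) each
    a = np.stack([grp[list(p)] for grp in groups
                  for p in itertools.permutations(range(3))])      # A: (n', 3, w, h)

    # Steps A3-A9: eight geometric variants of A.
    b = a.transpose(0, 1, 3, 2)             # B: transpose width/height
    c = a[:, :, ::-1, :]                    # C: flip up-down
    d = a[:, :, :, ::-1]                    # D: flip left-right
    e = np.rot90(a, k=1, axes=(2, 3))       # E: rotate 90 degrees once
    f = np.rot90(a, k=2, axes=(2, 3))       # F: rotate 90 degrees twice
    g = np.rot90(b, k=1, axes=(2, 3))       # G: B rotated 90 degrees once
    h = b[:, :, :, ::-1]                    # H: B flipped left-right
    return [a, b, c, d, e, f, g, h]         # 6 orderings x 8 variants = 48x
```

Because the transposed and rotated variants swap width and height, the eight variants are returned as a list rather than concatenated into a single array.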
The step 2 specifically comprises:
step B1: adjusting the convolution kernel of the first convolutional layer of DenseNet-12 to 5 × 5 and the stride to 1;
step B2: removing the first pooling layer of DenseNet-12 and directly connecting the convolutional layer of step B1 to the first Dense Block;
step B3: the first Dense Block contains 6 dense layers, each formed by a convolutional layer (Conv), a batch normalization layer (BatchNorm) and an activation layer (ReLU) connected in sequence; the convolution kernels of all convolutional layers are adjusted to 3 × 3 and the stride to 1;
step B4: adjusting the transition layer after the first Dense Block to a maximum pooling of 2 x 2;
step B5: a second Dense Block is connected after the transition layer, with the same settings as the Dense Block in step B3;
step B6: a spatial pyramid pooling (SPP) module with 4 pooling kernels is connected after the second Dense Block; the SPP extracts multi-level image features and outputs feature maps of 4 sizes: 1 × 1, 2 × 2, 3 × 3 and 4 × 4;
step B7: 3 fully connected layers are connected in sequence after the SPP, with a Dropout layer (drop rate 0.5) between adjacent fully connected layers, so that the features most relevant to predicting the gene expression category are gradually screened out; the first fully connected layer has 4200 input units and 4200 output units, the second has 4200 input units and 1000 output units, and the third has 1000 input units and 2 output units (a sketch of the resulting network follows).
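A possible PyTorch sketch of the network of steps B1-B7 is given below. The channel counts (20 stem channels, growth rate 10) and the ReLU activations between the fully connected layers are illustrative assumptions; they are chosen so that the SPP output, (1 + 4 + 9 + 16) = 30 pooled bins per channel × 140 channels = 4200 features, matches the 4200-unit first fully connected layer, but the patent does not state these values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseLayer(nn.Module):
    """One dense layer of step B3: 3x3 Conv (stride 1) -> BatchNorm -> ReLU,
    with its output concatenated to its input (dense connectivity)."""
    def __init__(self, in_ch, growth):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, growth, kernel_size=3, stride=1, padding=1)
        self.bn = nn.BatchNorm2d(growth)

    def forward(self, x):
        return torch.cat([x, F.relu(self.bn(self.conv(x)))], dim=1)

class DenseBlock(nn.Module):
    """A Dense Block with 6 dense layers (steps B3 and B5)."""
    def __init__(self, in_ch, growth, n_layers=6):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(n_layers):
            layers.append(DenseLayer(ch, growth))
            ch += growth
        self.block = nn.Sequential(*layers)
        self.out_channels = ch

    def forward(self, x):
        return self.block(x)

class SpatialPyramidPooling(nn.Module):
    """Step B6: pool the feature map to 1x1, 2x2, 3x3 and 4x4 grids and
    concatenate the flattened results (1 + 4 + 9 + 16 = 30 bins per channel)."""
    def __init__(self, bins=(1, 2, 3, 4)):
        super().__init__()
        self.bins = bins

    def forward(self, x):
        return torch.cat([F.adaptive_max_pool2d(x, b).flatten(1) for b in self.bins], dim=1)

class GeneExpressionNet(nn.Module):
    """Steps B1-B7. Channel counts (20 stem channels, growth rate 10) are
    illustrative: 20 + 12*10 = 140 channels, and 140 * 30 bins = 4200."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 20, kernel_size=5, stride=1, padding=2)  # B1, B2: 5x5 conv, no pooling
        self.block1 = DenseBlock(20, growth=10)        # B3: 20 + 6*10 = 80 channels
        self.transition = nn.MaxPool2d(kernel_size=2)  # B4: 2x2 max pooling
        self.block2 = DenseBlock(80, growth=10)        # B5: 80 + 6*10 = 140 channels
        self.spp = SpatialPyramidPooling()             # B6
        self.classifier = nn.Sequential(               # B7: 4200 -> 4200 -> 1000 -> 2
            nn.Linear(4200, 4200), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4200, 1000), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(1000, 2),
        )

    def forward(self, x):
        x = self.block2(self.transition(self.block1(self.stem(x))))
        return self.classifier(self.spp(x))
```

Because the SPP always produces 30 bins per channel regardless of the spatial size of its input, the network accepts ROI crops of any size, which is the property exploited by the multi-size training of step 3.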
The step 3 specifically includes:
step C1: the data A + B + C + D + E + F + G + H obtained in step 1 is denoted data-1; data-1 is center-cropped to obtain data with slice size 64 × 64, denoted data-2;
step C2: data-2 is fed into the network of step 2 and trained for 50 rounds with stochastic gradient descent, using a batch size of 64; the loss function used for training is the focal loss, i.e. Focal-Loss, computed as follows:
FL(y, y') = -[α·y + (1 - α)·(1 - y)] · (1 - y')^γ · log(y')
wherein y is the ground-truth gene expression class label of the data, taking the value 1 or 0: y = 1 indicates that the gene expression class of the data is "over-expression" or "positive", and y = 0 indicates that the gene expression class of the data is "no expression" or "negative";
y' in the formula is the probability assigned by the model to the correct class of each input image, a decimal between 0 and 1; the closer y' is to 1, the more likely the model is to predict the input image correctly;
since in practice the probability that the gene expression class is "over-expression" is smaller than the probability that it is "no expression", i.e. in the training data the number of samples with y = 1 is smaller than the number with y = 0, the two classes are imbalanced during training; such imbalance makes it difficult for the network to learn rules from the data. α is an adjustable parameter with a value between 0 and 1 that addresses this imbalance: when α is set greater than 0.5 and less than 1, 1 - α is correspondingly greater than 0 and less than 0.5, so in the above formula the samples with y = 1 have a greater influence and the samples with y = 0 a smaller influence, making the network pay more attention to the "over-expression" class;
in addition, it is often difficult to mine effective image features at the axial beginning or end of the tumor slices, because the images at both ends contain only the margin region of the tumor with little tumor tissue information, so these images are easily predicted incorrectly by the model. The parameter γ in the formula addresses this problem: with γ set to 2, the modulating factor (1 - y')^γ is a squared term, so a poorly predicted image generates a relatively larger loss, guiding the network to pay more attention to it during training and strengthening the feature learning ability of the model.
After 50 rounds of training in this step, model M1 is obtained.
step C3: data-1 is center-cropped to obtain data with slice size 100 × 100, denoted data-3; data-3 is fed into the model M1 obtained in step C2 for training, with the same training settings as step C2, and after 50 rounds of training model M2 is obtained;
step C4: data-1 is fed into the model M2 obtained in step C3 for training, yielding the final model M3 with the highest accuracy; the batch size is 1, and training again uses stochastic gradient descent with Focal-Loss.
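A possible PyTorch sketch of the focal loss of step C2 is shown below; the value α = 0.75 and the softmax/clamping details are illustrative assumptions rather than values given in the patent.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.75, gamma=2.0):
    """Binary focal loss as described in step C2 (a sketch).

    logits:  (N, 2) raw outputs of the network for the two classes.
    targets: (N,) ground-truth labels, 1 = over-expression/positive,
             0 = no expression/negative.
    alpha in (0.5, 1) up-weights the rarer positive class; gamma = 2
    down-weights well-predicted slices so that the hard slices near the
    tumor margins contribute more to the loss.
    """
    prob = F.softmax(logits, dim=1)[:, 1]                      # predicted P(y = 1)
    p_t = torch.where(targets == 1, prob, 1.0 - prob)          # probability of the true class (y')
    alpha_t = torch.where(targets == 1,
                          torch.full_like(prob, alpha),
                          torch.full_like(prob, 1.0 - alpha))
    loss = -alpha_t * (1.0 - p_t) ** gamma * torch.log(p_t.clamp_min(1e-7))
    return loss.mean()
```

Training would then follow the three stages of steps C2-C4, each stage initialised from the previous model: 50 rounds on 64 × 64 center crops with batch size 64, 50 rounds on 100 × 100 crops, and a final stage on the uncropped data with batch size 1.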
The step 4 specifically includes:
step D1: extracting an ROI cube of the CT sequence, combining 3 adjacent slices of the ROI slice sequence to form 3-channel Input data, and recording the 3-channel Input data as Input;
step D2: Input is fed into the trained model M3, and the gene expression category of each group of 3-channel data is predicted, the category being over-expression (positive) or no expression (negative);
step D3: with the threshold set to 0.5, the gene expression category predictions of all 3-channel data in Input are judged together, and the category predicted for more than 50% of them is taken as the final prediction result (a sketch of this aggregation follows).
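The sequence-level decision of steps D1-D3 can be sketched as follows; the function and variable names are illustrative.

```python
import torch

def predict_sequence(model, inputs, threshold=0.5):
    """Steps D1-D3: aggregate per-slice predictions into one result for a CT sequence.

    inputs: iterable of (1, 3, w, h) tensors built from adjacent ROI slices.
    Returns the positive label if more than `threshold` of the 3-channel
    inputs are predicted positive, otherwise the negative label.
    """
    model.eval()
    votes = []
    with torch.no_grad():
        for x in inputs:
            votes.append(int(model(x).argmax(dim=1).item()))   # 1 = over-expression/positive
    positive_ratio = sum(votes) / max(len(votes), 1)
    return ("over-expression/positive" if positive_ratio > threshold
            else "no expression/negative")
```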
The invention has the beneficial effects that:
the method has the advantages of easy implementation, only need of manually extracting the tumor region from the CT image, no need of fixed size, and capability of training the obtained slices of any size. The present invention is non-invasive, and conventional genotyping requires biopsy and sequence detection, which is invasive and may be affected by difficulty in obtaining tissue samples, and increased risk to the patient. In the deep learning method proposed herein, the expression state of genes in a tumor is predicted by non-invasive Computed Tomography (CT). The method has high efficiency, on one hand, a small amount of data can play a larger role after being expanded, and on the other hand, the difference among tumor individuals is considered, so that the limitation that the input size needs to be fixed is eliminated by adopting the 4-dimensional pyramid module, and meanwhile, the multi-level image characteristics can be grasped. Finally, Focal-loss function is used for focusing on the slices with the characteristics difficult to identify at the edge of the tumor, and accurate and efficient prediction results are obtained through a training mode of 3 lots with different sizes.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic view of ROI slice acquisition of the present invention;
FIG. 3 is a schematic diagram of data expansion according to the present invention;
FIG. 4 is a diagram of a network framework according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following specific examples and the accompanying drawings. The procedures, conditions, experimental methods and the like for carrying out the present invention are general knowledge and common general knowledge in the art except for the contents specifically mentioned below, and the present invention is not particularly limited.
Examples
Referring to Figs. 1, 2, 3 and 4, the invention takes the CT images of a cancer patient, extracts ROI slices containing the tumor by delineation, and applies a series of data expansion operations to greatly increase the data volume. A network model is then designed in which a pyramid pooling module frees the model from the restriction of a fixed input size and a Focal-Loss function adjusts the loss so that the model trains better. Finally, a predicted gene expression category is obtained for each slice, and the prediction for the whole tumor is obtained by fusing the predictions of the individual slices. The specific operation is carried out according to the following steps:
1) Table 1 lists CT data of 20 gastric cancer cases with HER-2 gene mutation detection results collected from a hospital. The training set and test set are partitioned 3:1. As shown in Fig. 2, the CT sequence of each sample in the training set is first processed to extract the slices containing the tumor, and an ROI (region of interest) cube is obtained by cropping according to the position and size of the tumor on each slice; the ROI cube is a slice sequence containing the complete tumor. The specific data information and the size of the extracted ROI cubes are shown in Table 1;
TABLE 1 CT data information used in the implementation and data size after ROI extraction
2) Fig. 3 depicts the formation of the 3-channel data and the data expansion. The ROI slice sequence obtained by cropping has size n × w × h, where n is the number of sequence layers, w the width and h the height (for case-1, for example, the size is given in Table 1). Adjacent groups of 3 slices are stacked to form 3-channel data of size 3 × w × h, and the 3 ROI slices in each group are shuffled into 6 stacking orders. The 15 samples of the training set in Table 1 are processed in this way, and the resulting data is denoted A, of size n' × 3 × w × h, where n' = 6 × n;
3) transposing the data A to obtain transposed data B with the size of n' × 3 × h × w;
4) turning the data A up and down to form data C with the size of n' × 3 × w × h;
5) turning the data A left and right to form data D with the size of n' × 3 × w × h;
6) performing 90-degree rotation on the data A for 1 time to form data E with the size of n' × 3 × h × w;
7) performing 90-degree rotation on the data A for 2 times to form data F, wherein the size of the data F is n' × 3 × w × h;
8) performing 90-degree rotation on the data B for 1 time to form data G with the size of n' × 3 × w × h;
9) flipping the data B left-right forms data H, of size n' × 3 × h × w; together these operations expand the number of images of the ROI slice sequence to 48 times the original, i.e. the sum A + B + C + D + E + F + G + H, without changing the image properties. Fig. 3 shows the 8 transformed versions A-H, where the change in image shape can be seen; this in effect mimics the morphological differences between different tumors in the real world;
10) designing the network structure: first, the convolution kernel of the first convolutional layer of DenseNet-12 is adjusted to 5 × 5 and the stride to 1;
11) removing the first pooling layer of DenseNet-12, and directly connecting the first convolution layer introduced in step 10) with the first Dense Block;
12) each Dense Block in the network comprises 6 dense layers, each formed by a convolutional layer (Conv), a batch normalization layer (BatchNorm) and an activation layer (ReLU) connected in sequence; the convolution kernels of all convolutional layers are adjusted to 3 × 3 and the stride to 1;
13) adjusting the transition layer after the first Dense Block to a maximum pooling of 2 x 2;
14) connecting a second Dense Block behind the transition layer, wherein the setting is the same as that of the first Dense Block;
15) a spatial pyramid pooling module (SPP) with 4 pooling cores is connected behind the second Dense Block; the SPP is used for extracting multi-level image features and outputting feature maps with the sizes of 1 × 1, 2 × 2, 3 × 3 and 4 × 4;
16) 3 fully connected layers are connected in sequence after the SPP, with a Dropout layer (drop rate 0.5) between adjacent fully connected layers, so that the features most relevant to predicting the gene expression category are gradually screened out; the first fully connected layer has 4200 input units and 4200 output units, the second has 4200 input units and 1000 output units, and the third has 1000 input units and 2 output units. Fig. 4 is a schematic diagram of the network framework.
17) the training process starts by denoting the obtained data A + B + C + D + E + F + G + H as data-1 and center-cropping data-1 to obtain data with slice size 64 × 64, denoted data-2;
18) data-2 is sent into the designed network and trained for 50 rounds with stochastic gradient descent to obtain model M1; the batch size is 64, and the loss function used for training is the focal loss, i.e. Focal-Loss;
19) data-1 is center-cropped to obtain data with slice size 100 × 100, denoted data-3; data-3 is fed into model M1 for training with the same settings as step 18), and after 50 rounds of training model M2 is obtained;
20) data-1 is fed into model M2 and, after 50 rounds of training, the final model M3 is obtained; the batch size is 1, and training again uses stochastic gradient descent with Focal-Loss;
21) for the 5 test samples to be predicted (test-1, test-2, test-3, test-4, test-5), an ROI slice sequence is likewise extracted from the CT sequence of each sample, and adjacent groups of 3 ROI slices are combined into 3-channel input data, denoted Input;
22) Input is fed into the trained final model M3, and the gene expression category of each group of 3-channel data is predicted (the category being "over-expression/positive" or "no expression/negative");
23) with the threshold set to 0.5, the gene expression category predictions of all 3-channel data in Input are judged together, and the category predicted for more than 50% of them is the final prediction result. Table 2 shows the prediction results of the method in this example (the number of images with correct HER-2 prediction and its proportion of the ROI sequence layers). The image prediction accuracy is high for all 5 test samples, and the gene mutation prediction result for each tumor (the final prediction result) is correct under the 50% threshold, as the short calculation after Table 2 illustrates.
TABLE 2. Prediction results of the invention on the 5 test samples (number and proportion of correctly predicted images)

Test set | HER-2 mutation status | ROI sequence layers | Correctly predicted images | Correct proportion
test-1 | Over-expression/positive | 50 | 47 | 94% (>50%)
test-2 | Over-expression/positive | 102 | 91 | 89% (>50%)
test-3 | No expression/negative | 138 | 120 | 87% (>50%)
test-4 | No expression/negative | 45 | 44 | 98% (>50%)
test-5 | No expression/negative | 74 | 69 | 93% (>50%)
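As a quick check of the 50% rule of step D3 (step 23 of the example), the per-slice counts reported in Table 2 can be converted directly into the sequence-level decision; the snippet below simply recomputes the ratios from the table.

```python
# Re-deriving the sequence-level decision of step D3 from the counts in Table 2.
table2 = {
    "test-1": (50, 47),    # (ROI sequence layers, correctly predicted slices)
    "test-2": (102, 91),
    "test-3": (138, 120),
    "test-4": (45, 44),
    "test-5": (74, 69),
}
for name, (layers, correct) in table2.items():
    ratio = correct / layers
    verdict = "correct" if ratio > 0.5 else "incorrect"
    print(f"{name}: {ratio:.0%} of slices predicted correctly -> sequence prediction {verdict}")
```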
The protection of the present invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art may be incorporated into the invention without departing from the spirit and scope of the inventive concept, which is set forth in the following claims.

Claims (5)

1. A method for automatically predicting gene expression categories based on cancer CT images is characterized by comprising the following specific steps:
step 1: obtaining ROI slices and expanding the number of images by 48 times;
step 2: constructing a neural network based on the DenseNet-12 and the spatial pyramid module;
step 3: taking the images expanded in step 1 as input, training the neural network constructed in step 2, wherein the training loss function is the focal loss, i.e. Focal-Loss;
step 4: predicting with the network model trained in step 3 to obtain a gene expression category prediction for each input image, i.e. over-expression (positive) or no expression (negative), and summarizing the predictions of all input images belonging to the same CT sequence to obtain the overall prediction result for that CT sequence.
2. The method for automatically predicting gene expression categories based on cancer CT images according to claim 1, wherein the step 1 specifically comprises:
step A1: extracting a slice containing the tumor from the complete CT sequence, and cutting the slice according to the position and the size of the tumor on the slice to obtain an ROI cube, wherein the ROI cube is a slice sequence containing the complete tumor;
step A2: the ROI slice sequence obtained by cropping has size n × w × h, where n is the number of sequence layers, w the width and h the height; 3 adjacent slices are stacked to form a group of data with 3 channels, of size 3 × w × h; the 3 ROI slices in each group of 3-channel data are shuffled to form 6 stacking orders, and the resulting data is denoted A, of size n' × 3 × w × h, where n' = 6 × n;
step A3: transposing the data A in the step A2 to obtain transposed data B with the size of n' × 3 × h × w;
step A4: turning the data A in the step A2 up and down to form data C with the size of n' × 3 × w × h;
step A5: turning the data A in the step A2 left and right to form data D with the size of n' × 3 × w × h;
step A6: performing 90-degree rotation on the data A obtained in the step A2 for 1 time to form data E with the size of n' × 3 × h × w;
step A7: performing 90-degree rotation on the data A obtained in the step A2 for 2 times to form data F with the size of n' × 3 × w × h;
step A8: performing 90-degree rotation on the data B in the step A3 for 1 time to form data G with the size of n' × 3 × w × h;
step A9: flipping the data B of step A3 left-right to form data H, of size n' × 3 × h × w; together, A + B + C + D + E + F + G + H expand the number of images to 48 times the original while leaving the image properties of the ROI slice sequence unchanged.
3. The method of claim 1, wherein the step 2 comprises:
step B1: adjusting the convolution kernel of the first convolutional layer of DenseNet-12 to 5 × 5 and the stride to 1;
step B2: removing the first pooling layer of DenseNet-12 and directly connecting the convolutional layer of step B1 to the first Dense Block;
step B3: the first Dense Block comprises 6 dense layers, each formed by a convolutional layer (Conv), a batch normalization layer (BatchNorm) and an activation layer (ReLU) connected in sequence; the convolution kernels of all convolutional layers are adjusted to 3 × 3 and the stride to 1;
step B4: adjusting the transition layer after the first Dense Block to a maximum pooling of 2 x 2;
step B5: connecting a second Dense Block after the transition layer, and setting the same as the Dense Block in the step B3;
step B6: a spatial pyramid pooling module (SPP) with 4 pooling cores is connected behind the second Dense Block; the SPP is used for extracting multi-level image features and outputting feature maps with the sizes of 1 × 1, 2 × 2, 3 × 3 and 4 × 4;
step B7: 3 fully connected layers are connected in sequence after the SPP, with a Dropout layer (drop rate 0.5) between adjacent fully connected layers, so that the features most relevant to predicting the gene expression category are gradually screened out; the first fully connected layer has 4200 input units and 4200 output units, the second has 4200 input units and 1000 output units, and the third has 1000 input units and 2 output units.
4. The method for automatically predicting gene expression classes based on cancer CT images as claimed in claim 1, wherein said step 3 specifically comprises:
step C1: the data A + B + C + D + E + F + G + H obtained in step 1 is denoted data-1; data-1 is center-cropped to obtain data with slice size 64 × 64, denoted data-2;
step C2: data-2 is fed into the network of step 2 and trained for 50 rounds with stochastic gradient descent to obtain model M1; the batch size is 64, and the loss function used for training is the focal loss, i.e. Focal-Loss, computed as follows:
FL(y, y') = -[α·y + (1 - α)·(1 - y)] · (1 - y')^γ · log(y')
wherein y is the ground-truth gene expression class label of the data, taking the value 1 or 0: y = 1 indicates that the gene expression class of the data is "over-expression" or "positive", and y = 0 indicates that the gene expression class of the data is "no expression" or "negative"; y' is the probability assigned by the model to the correct class of each input image, a decimal between 0 and 1, and the closer y' is to 1, the more likely the model is to predict the input image correctly; α is an adjustable parameter with a value between 0 and 1 used to address the imbalance in the data volumes; γ is a parameter that adjusts the prediction loss and is set to 2;
step C3: data-1 is center-cropped to obtain data with slice size 100 × 100, denoted data-3; data-3 is fed into the model M1 obtained in step C2 for training, with the same training settings as step C2, and after 50 rounds of training model M2 is obtained;
step C4: data-1 is fed into the model M2 obtained in step C3 for training, yielding the final model M3 with the highest accuracy; the batch size is 1, and training again uses stochastic gradient descent with Focal-Loss.
5. The method for automatically predicting gene expression classes based on cancer CT images as claimed in claim 1, wherein the step 4 specifically comprises:
step D1: extracting an ROI cube of the CT sequence, combining 3 adjacent slices of the ROI slice sequence to form 3-channel Input data, and recording the 3-channel Input data as Input;
step D2: Input is fed into the trained model M3, and the gene expression category of each group of 3-channel data is predicted, the category being over-expression (positive) or no expression (negative);
step D3: with the threshold set to 0.5, the gene expression category predictions of all 3-channel data in Input are judged together, and the category predicted for more than 50% of them is taken as the final prediction result.
CN202010285446.9A 2020-04-13 2020-04-13 Method for automatically predicting gene expression categories based on cancer CT images Pending CN111583271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010285446.9A CN111583271A (en) 2020-04-13 2020-04-13 Method for automatically predicting gene expression categories based on cancer CT images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010285446.9A CN111583271A (en) 2020-04-13 2020-04-13 Method for automatically predicting gene expression categories based on cancer CT images

Publications (1)

Publication Number Publication Date
CN111583271A true CN111583271A (en) 2020-08-25

Family

ID=72112486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010285446.9A Pending CN111583271A (en) 2020-04-13 2020-04-13 Method for automatically predicting gene expression categories based on cancer CT images

Country Status (1)

Country Link
CN (1) CN111583271A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183578A (en) * 2020-09-01 2021-01-05 国网宁夏电力有限公司检修公司 Target detection method, medium and system
CN112598024A (en) * 2020-12-03 2021-04-02 天津理工大学 Medical image classification method based on depth multi-instance learning and self-attention
CN113077875A (en) * 2021-03-23 2021-07-06 零氪智慧医疗科技(天津)有限公司 CT image processing method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107016665A (en) * 2017-02-16 2017-08-04 浙江大学 A kind of CT pulmonary nodule detection methods based on depth convolutional neural networks
CN108830826A (en) * 2018-04-28 2018-11-16 四川大学 A kind of system and method detecting Lung neoplasm
CN108961253A (en) * 2018-06-19 2018-12-07 深动科技(北京)有限公司 A kind of image partition method and device
CN109272048A (en) * 2018-09-30 2019-01-25 北京工业大学 A kind of mode identification method based on depth convolutional neural networks
CN109754393A (en) * 2018-12-19 2019-05-14 众安信息技术服务有限公司 A kind of tampered image identification method and device based on deep learning
CN110120051A (en) * 2019-05-10 2019-08-13 上海理工大学 A kind of right ventricle automatic division method based on deep learning
CN110503626A (en) * 2019-07-09 2019-11-26 上海交通大学 Based on space-semantic significance constraint CT image modalities alignment schemes
WO2020037960A1 (en) * 2018-08-21 2020-02-27 深圳大学 Sar target recognition method and apparatus, computer device, and storage medium
CN110866931A (en) * 2019-11-18 2020-03-06 东声(苏州)智能科技有限公司 Image segmentation model training method and classification-based enhanced image segmentation method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107016665A (en) * 2017-02-16 2017-08-04 浙江大学 A kind of CT pulmonary nodule detection methods based on depth convolutional neural networks
CN108830826A (en) * 2018-04-28 2018-11-16 四川大学 A kind of system and method detecting Lung neoplasm
CN108961253A (en) * 2018-06-19 2018-12-07 深动科技(北京)有限公司 A kind of image partition method and device
WO2020037960A1 (en) * 2018-08-21 2020-02-27 深圳大学 Sar target recognition method and apparatus, computer device, and storage medium
CN109272048A (en) * 2018-09-30 2019-01-25 北京工业大学 A kind of mode identification method based on depth convolutional neural networks
CN109754393A (en) * 2018-12-19 2019-05-14 众安信息技术服务有限公司 A kind of tampered image identification method and device based on deep learning
CN110120051A (en) * 2019-05-10 2019-08-13 上海理工大学 A kind of right ventricle automatic division method based on deep learning
CN110503626A (en) * 2019-07-09 2019-11-26 上海交通大学 Based on space-semantic significance constraint CT image modalities alignment schemes
CN110866931A (en) * 2019-11-18 2020-03-06 东声(苏州)智能科技有限公司 Image segmentation model training method and classification-based enhanced image segmentation method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183578A (en) * 2020-09-01 2021-01-05 国网宁夏电力有限公司检修公司 Target detection method, medium and system
CN112183578B (en) * 2020-09-01 2023-05-23 国网宁夏电力有限公司检修公司 Target detection method, medium and system
CN112598024A (en) * 2020-12-03 2021-04-02 天津理工大学 Medical image classification method based on depth multi-instance learning and self-attention
CN113077875A (en) * 2021-03-23 2021-07-06 零氪智慧医疗科技(天津)有限公司 CT image processing method and device

Similar Documents

Publication Publication Date Title
Pezeshk et al. 3-D convolutional neural networks for automatic detection of pulmonary nodules in chest CT
Gu et al. Automatic lung nodule detection using a 3D deep convolutional neural network combined with a multi-scale prediction strategy in chest CTs
CN110544264B (en) Temporal bone key anatomical structure small target segmentation method based on 3D deep supervision mechanism
EP3432263B1 (en) Semantic segmentation for cancer detection in digital breast tomosynthesis
CN111583271A (en) Method for automatically predicting gene expression categories based on cancer CT images
Wang et al. Pulmonary nodule detection in volumetric chest CT scans using CNNs-based nodule-size-adaptive detection and classification
CN110930416B (en) MRI image prostate segmentation method based on U-shaped network
Hossain et al. A pipeline for lung tumor detection and segmentation from CT scans using dilated convolutional neural networks
RU2449365C2 (en) Methods and apparatus for integrating systematic data scaling into genetic algorithm-based feature subset selection
CN110705440B (en) Capsule endoscopy image recognition model based on neural network feature fusion
CN113362295A (en) Liver tumor identification method based on self-supervision dense convolutional neural network
Shahangian et al. Automatic brain hemorrhage segmentation and classification in CT scan images
CN112990344B (en) Multi-view classification method for pulmonary nodules
Tan et al. GLCM-CNN: gray level co-occurrence matrix based CNN model for polyp diagnosis
Zuo et al. Automatic classification of lung nodule candidates based on a novel 3D convolution network and knowledge transferred from a 2D network
Tong et al. A lung cancer lesions dectection scheme based on CT image
CN113743463A (en) Tumor benign and malignant identification method and system based on image data and deep learning
Retico et al. Pleural nodule identification in low-dose and thin-slice lung computed tomography
Chen et al. The effect of kernel size of CNNs for lung nodule classification
Wang et al. Gastric polyps detection by improved faster R-CNN
Wu et al. Automatic lung segmentation in CT images using dilated convolution based weighted fully convolutional network
CN114581698A (en) Target classification method based on space cross attention mechanism feature fusion
Zhang et al. Classification of benign and malignant pulmonary nodules based on deep learning
CN115440386B (en) Method and equipment for predicting immune treatment effect of advanced cancer patient based on weighted multi-focus image histology characteristics
CN115274119B (en) Construction method of immunotherapy prediction model fusing multi-image mathematical characteristics

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200825