CN116030025A - Hepatocellular carcinoma prediction method based on a modality-aware distillation network - Google Patents


Publication number
CN116030025A
CN116030025A (application CN202310058590.2A)
Authority
CN
China
Prior art keywords: network, distillation, data, MRI, feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310058590.2A
Other languages
Chinese (zh)
Inventor
王连生
张英豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202310058590.2A priority Critical patent/CN116030025A/en
Publication of CN116030025A publication Critical patent/CN116030025A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A 90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Magnetic Resonance Imaging Apparatus (AREA)

Abstract

The invention discloses a hepatocellular carcinoma prediction method based on a modality-aware distillation network, comprising the following steps: S1, acquiring a dataset of hepatocellular carcinoma patients and dividing the whole dataset into five folds according to a five-fold cross-validation scheme, where in each round of cross-validation one fold of data serves as the test set and the remaining four folds serve as the training set; S2, preprocessing the data by finding the largest tumor bounding cuboid across all patients and removing the non-tumor regions outside it; S3, establishing a modality-aware distillation network and training it, the network being used to transfer the knowledge that a teacher network learns jointly from the clinical-data modality and the image modality to a student network that has only the image modality; S4, predicting hepatocellular carcinoma with the trained modality-aware distillation network.

Description

Hepatocellular carcinoma prediction method based on a modality-aware distillation network
Technical Field
The invention relates to the technical field of biology, and in particular to a hepatocellular carcinoma prediction method based on a modality-aware distillation network.
Background
Hepatocellular carcinoma (HCC) is a malignant tumor arising from liver cells and is a common pathological type of primary liver cancer. The following methods currently exist for predicting microvascular invasion (MVI) of hepatocellular carcinoma: 1. predicting preoperative MVI from CT images using extreme gradient boosting and deep learning; 2. fusing features from multiple MR sequences with a 3D CNN prediction model; 3. embedding a long short-term memory (LSTM) network into a CNN to fuse multi-modal MR volumes and predict the MVI of HCC patients. These three methods predict MVI status from images alone and achieve low accuracy. In addition, predictions have been made with the following knowledge-distillation methods: 1. using knowledge distillation (KD) to effectively segment neuronal structures in microscopy images from 3D optical images; 2. following the idea of KD, using soft labels produced by expanding mask boundaries to segment brain lesions; 3. using KD for multi-source transfer learning in lung pattern analysis tasks; 4. formulating a category-guided contrastive distillation module that pulls positive image pairs from the same class in the teacher and student models together while pushing negative image pairs from different classes apart. The distillation networks adopted by these four methods consider only different image data and transfer information only from the input image data, so their classification precision is poor and their prediction accuracy is low.
Disclosure of Invention
The invention aims to provide a hepatocellular carcinoma prediction method based on a modality-aware distillation network, which migrates the knowledge of a teacher network that has both image modalities and non-image clinical data to a student network with only image modalities, and provides a modality-aware distillation network (MD-Net) for HCC MVI prediction, thereby effectively improving classification precision and prediction accuracy.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a hepatocellular carcinoma prediction method based on a modal sensing distillation network comprises the following steps:
s1, acquiring a data set of a hepatocellular carcinoma patient, dividing the whole data set into five folds according to a five-fold cross-validation scheme, wherein in each round of cross-validation, one fold data is used as a test set, and the other four fold data are used as training sets;
s2, preprocessing the data, finding the largest external cube for the tumors of all patients, and removing other non-tumor areas except the cube;
s3, establishing a modal sensing distillation network, and training the modal sensing distillation network, wherein the modal sensing distillation network is used for transferring the knowledge learned by combining a teacher network through clinical data modalities and image modalities to a student network only provided with the image modalities;
s4, predicting the hepatocellular carcinoma through the trained modal sensing distillation network.
Preferably, the dataset in step S1 consists of data from 270 pathologically confirmed HCC patients, comprising 128 M0 patients, 93 M1 patients and 49 M2 patients; where M0 indicates no microvascular invasion, M1 indicates invasion of no more than 5 blood vessels, all located within 1 cm of the tumor surface, and M2 indicates invasion of more than 5 blood vessels or invasion more than 1 cm from the tumor surface.
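The five-fold split of step S1 over the 270 patients can be sketched as below. The class counts (128 M0 / 93 M1 / 49 M2) come from the patent; splitting each class evenly across the folds (a stratified split) is an assumption, since the patent only states that the dataset is divided into five folds.

```python
# Sketch of the five-fold cross-validation split of step S1.
# Stratification by MVI grade is an assumption, not stated in the patent.
import numpy as np

def stratified_five_folds(labels, seed=0):
    """Return five index arrays, each class divided evenly across folds."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(5)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for k, chunk in enumerate(np.array_split(idx, 5)):
            folds[k].extend(chunk.tolist())
    return [np.array(sorted(f)) for f in folds]

labels = np.array([0] * 128 + [1] * 93 + [2] * 49)   # M0 / M1 / M2
folds = stratified_five_folds(labels)
# Round r: fold r is the test set, the other four folds the training set.
test = folds[0]
train = np.concatenate([folds[k] for k in range(1, 5)])
```

Each of the five rounds then trains on four folds and evaluates on the held-out fold.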
Preferably, in step S2, the size of the cuboid is set to 80×80×20 pixels.
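The step-S2 preprocessing can be sketched as a fixed-size crop around the tumor. Centring the 80×80×20 box on the tumor mask and zero-padding scans smaller than the box are assumptions; the patent fixes only the box size.

```python
# Minimal sketch of step S2: crop an 80x80x20 box around the tumour and
# discard everything outside it.  Centring and padding are assumptions.
import numpy as np

CUBE = (80, 80, 20)

def crop_tumor_cube(volume, mask):
    """Crop `volume` to a CUBE-sized box centred on the tumour mask."""
    coords = np.argwhere(mask)                       # tumour voxel coords
    centre = (coords.min(axis=0) + coords.max(axis=0)) // 2
    start = [int(np.clip(c - s // 2, 0, max(d - s, 0)))
             for c, s, d in zip(centre, CUBE, volume.shape)]
    crop = volume[start[0]:start[0] + CUBE[0],
                  start[1]:start[1] + CUBE[1],
                  start[2]:start[2] + CUBE[2]]
    out = np.zeros(CUBE, dtype=volume.dtype)         # zero-pad small scans
    out[:crop.shape[0], :crop.shape[1], :crop.shape[2]] = crop
    return out

volume = np.zeros((160, 160, 40))
volume[55, 55, 12] = 7.0                             # a marked tumour voxel
mask = np.zeros(volume.shape, dtype=bool)
mask[50:60, 50:60, 10:15] = True                     # toy tumour mask
cropped = crop_tumor_cube(volume, mask)
```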
Preferably, the training process of the modality-aware distillation network in step S3 is specifically:
S31, the teacher network passes the HBP image together with the clinical data into an MRI-clinical fusion module to extract a 512-dimensional feature vector, and passes the PRE image together with the clinical data into another MRI-clinical fusion module to obtain a second 512-dimensional feature vector; the two 512-dimensional feature vectors are fed into an SA module to obtain new features $\hat{F}^t_{hbp}$ and $\hat{F}^t_{pre}$ that fuse each other's information; finally, $\hat{F}^t_{hbp}$ and $\hat{F}^t_{pre}$ are concatenated to generate $Z_t$, and $Z_t$ is passed into two fully connected layers to predict the classification result $P_t$;
S32, the student network takes the 3D HBP MRI image and the 3D PRE MRI image as input; the HBP data is passed into an MRI-only module to obtain a feature, and the PRE data is passed into another MRI-only module to obtain a feature; the two features are fed into an SA module to obtain new features $\hat{F}^s_{hbp}$ and $\hat{F}^s_{pre}$ that fuse each other's information, each a 512-dimensional feature vector; finally, $\hat{F}^s_{hbp}$ and $\hat{F}^s_{pre}$ are concatenated to generate $Z_s$, and $Z_s$ is input into the fully connected layer to predict the MVI classification result $P_s$;
S33, a regression task is introduced into the student network: the concatenated feature $Z_s$ of the input HBP image and the input PRE image is passed into two fully connected layers to predict a 52-dimensional vector $P_c$ that estimates the latent clinical information, and the input clinical data is reused as the ground-truth label for the prediction $P_c$;
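The self-supervision head of step S33 can be sketched as two fully connected layers mapping the student feature to a 52-dimensional clinical estimate. The hidden size (256), the 1024-dimensional input (the concatenation of two 512-dimensional features), and the mean-squared-error surrogate for the regression loss are illustrative assumptions.

```python
# Hypothetical sketch of the step-S33 head: two FC layers map Z_s to a
# 52-dim estimate P_c; the true clinical vector serves as its label.
# Layer sizes and the MSE surrogate are assumptions.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.01, (1024, 256)), np.zeros(256)
W2, b2 = rng.normal(0, 0.01, (256, 52)), np.zeros(52)

def predict_clinical(z_s):
    h = np.maximum(z_s @ W1 + b1, 0.0)   # first FC layer + ReLU
    return h @ W2 + b2                   # second FC layer -> 52-dim P_c

z_s = rng.normal(size=1024)              # concat of two 512-dim features
p_c = predict_clinical(z_s)
clinical_true = rng.normal(size=52)      # stand-in for the real record
l_clinical = np.mean((p_c - clinical_true) ** 2)
```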
s34, distilling the fused features in the teacher network clinical data and the MRI images to the features extracted from the MRI images by using the classification level distillation loss and the feature level distillation loss, and converting the clinical information of the teacher network into the student network by using a knowledge distillation strategy.
Preferably, in step S31, the MRI-clinical fusion module integrates MRI data and non-imaging clinical data; it takes 3D MRI data and the vectorized clinical data as input and applies four fully connected layers to the input clinical data to obtain four feature maps whose feature channels are 64, 128, 256 and 256 respectively; four convolution blocks, each consisting of two 3×3 convolution layers, are applied to the input MRI image to obtain another set of 3D feature maps whose feature channels are likewise set to 64, 128, 256 and 256; the four feature maps of the clinical data and the corresponding four feature maps of the MRI data are multiplied channel-wise to integrate them, after which a 3×3 convolution layer and a fully connected layer are applied to output a feature vector with 512 dimensions.
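The channel-wise multiplication at the heart of the fusion module can be sketched as a clinical feature vector gating the channels of a 3D MRI feature map. The spatial dimensions below are illustrative; the patent fixes only the channel counts (64, 128, 256, 256).

```python
# Sketch of the "channel-wise multiplication" in the MRI-clinical fusion
# module: a C-dim clinical feature gates the C channels of a 3D feature map.
import numpy as np

def channel_wise_multiply(mri_feat, clin_feat):
    """mri_feat: (C, D, H, W) feature map; clin_feat: (C,) vector."""
    return mri_feat * clin_feat[:, None, None, None]

mri_feat = np.ones((64, 10, 10, 5))          # toy 64-channel 3D feature map
clin_feat = np.arange(64, dtype=float)       # toy 64-dim clinical feature
fused = channel_wise_multiply(mri_feat, clin_feat)
```

Each MRI channel is scaled by the corresponding clinical-feature entry, so the clinical modality modulates the imaging features before the final convolution and fully connected layers.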
Preferably, in step S32 the MRI-only module extracts a 512-dimensional feature vector from the 3D MRI image; the MRI-only module consists of nine convolution blocks and a fully connected layer, where each convolution block comprises a batch normalization layer, a ReLU activation layer and a 3×3 convolution layer, which improves the robustness of the network; the numbers of output channels of the nine convolution blocks differ: the feature channels of the first five layers are 32, 64 and 128, and those of the last four layers are 256, 128, 256 and 256, to balance efficiency against computational load.
Preferably, the SA module in steps S31 and S32 is a symmetric attention module. Let X and Y denote the two input feature maps of the SA module. The SA module applies a linear transformation layer to X to obtain three feature maps: a Query vector $Q_x$, a Key vector $K_x$ and a Value vector $V_x$; it applies an adaptive transformation layer to Y to generate a Key feature map $K_y$ and a Value feature map $V_y$. Multiplying $Q_x$ by the transpose of $K_x$ generates the score feature vector $S_x$, and multiplying $Q_x$ by the transpose of $K_y$ generates another score feature vector $S_y$. The score vector $S_x$ is multiplied by the Value vector $V_x$, $S_y$ is multiplied by $V_y$, and the two resulting feature vectors are added to produce the output refined feature:

$$\hat{X} = S_x V_x + S_y V_y = \left(Q_x K_x^{\top}\right) V_x + \left(Q_x K_y^{\top}\right) V_y$$

The SA module applies another linear transformation layer to Y to obtain a Query vector $Q_y$; multiplying $Q_y$ by the transpose of $K_y$ and $Q_y$ by the transpose of $K_x$ yields two further score feature vectors, from which the second refined feature is computed as:

$$\hat{Y} = \left(Q_y K_y^{\top}\right) V_y + \left(Q_y K_x^{\top}\right) V_x$$
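The symmetric attention computation described above can be sketched in numpy: each input attends both to itself and to the other input. Two simplifications are assumptions: a single shared projection per role (Query/Key/Value) stands in for the separate linear and adaptive transformation layers of the patent, and the score vectors are softmax-normalised, which the patent text does not state explicitly.

```python
# Numpy sketch of a symmetric attention (SA) module: self-attention plus
# cross-attention for each input.  Shared projections and the softmax over
# scores are simplifying assumptions.
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sa_module(X, Y, d=64, seed=0):
    rng = np.random.default_rng(seed)
    n_in = X.shape[-1]
    # one random projection per role, shared by both inputs (illustrative)
    Wq, Wk, Wv = (rng.normal(0.0, 0.1, (n_in, d)) for _ in range(3))
    Qx, Kx, Vx = X @ Wq, X @ Wk, X @ Wv
    Qy, Ky, Vy = Y @ Wq, Y @ Wk, Y @ Wv
    X_ref = softmax(Qx @ Kx.T) @ Vx + softmax(Qx @ Ky.T) @ Vy
    Y_ref = softmax(Qy @ Ky.T) @ Vy + softmax(Qy @ Kx.T) @ Vx
    return X_ref, Y_ref

X = np.random.default_rng(1).normal(size=(8, 512))   # e.g. HBP features
Y = np.random.default_rng(2).normal(size=(8, 512))   # e.g. PRE features
X_ref, Y_ref = sa_module(X, Y)
```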
Preferably, the modality-aware distillation in step S3 comprises classification-level distillation and feature-level distillation;
in the classification-level distillation, let $p^s_i$ denote the class probabilities of the class to which the MRI image data $x_i$ belongs as produced by the student network, and $p^t_i$ the class probabilities for $x_i$ produced by the teacher network; the classification-level distillation loss $\mathcal{L}_{cd}$ is defined so that the class probabilities from the teacher network become the targets for training the student network, and the difference between the two distributions is measured with the Kullback-Leibler divergence:

$$\mathcal{L}_{cd} = \frac{1}{N}\sum_{i=1}^{N} D_{KL}\left(p^t_i \,\big\|\, p^s_i\right)$$

where N and M denote the number of training samples and the number of total categories respectively, $D_{KL}(\cdot)$ denotes the Kullback-Leibler divergence between the two probability distributions (summed over the M classes), $p^s_i$ denotes the student network's prediction for a sample, and $p^t_i$ denotes the teacher network's prediction;
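The classification-level distillation loss described in this step, the mean KL divergence from the teacher's class distribution to the student's, can be sketched in numpy; the epsilon clipping is an implementation detail added for numerical safety.

```python
# Sketch of the classification-level distillation loss: mean KL divergence
# from teacher class probabilities to student class probabilities.
import numpy as np

def kl_div(p, q, eps=1e-12):
    """Row-wise D_KL(p || q) for matrices of probabilities."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def classification_distillation_loss(p_teacher, p_student):
    """p_*: (N, M) class-probability matrices; returns the mean KL."""
    return float(np.mean(kl_div(p_teacher, p_student)))

p_t = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])   # teacher soft targets
p_s = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])   # student predictions
loss = classification_distillation_loss(p_t, p_s)
```

The loss is zero exactly when the student reproduces the teacher's distribution, which is what makes the teacher's soft predictions usable as training targets.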
in the feature-level distillation, the feature-level distillation loss $\mathcal{L}_{fd}$ is calculated as the combination of the Kullback-Leibler divergence between $\hat{F}^t_{hbp}$ and $\hat{F}^s_{hbp}$ and the Kullback-Leibler divergence between $\hat{F}^t_{pre}$ and $\hat{F}^s_{pre}$:

$$\mathcal{L}_{fd} = \beta_1 D_{KL}\left(\hat{F}^t_{hbp} \,\big\|\, \hat{F}^s_{hbp}\right) + \beta_2 D_{KL}\left(\hat{F}^t_{pre} \,\big\|\, \hat{F}^s_{pre}\right)$$

where the $\beta$ coefficients weight the Kullback-Leibler divergence terms, with $\beta_1 = 1$; the first term is the divergence between the two HBP features and the second that between the two PRE features, with superscript s marking the student network's features and superscript t the teacher network's;
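The feature-level distillation loss described in this step can be sketched as beta-weighted KL terms over the refined HBP and PRE features. Converting the 512-dimensional feature vectors into probability distributions with a softmax before the KL is an assumption; the patent does not state how the features are normalised.

```python
# Sketch of the feature-level distillation loss: beta-weighted KL terms
# between teacher and student refined features.  The softmax normalisation
# of the raw features is an assumption.
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def feature_kl(f_t, f_s, eps=1e-12):
    p = np.clip(softmax(f_t), eps, 1.0)
    q = np.clip(softmax(f_s), eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def feature_distillation_loss(f_hbp_t, f_hbp_s, f_pre_t, f_pre_s,
                              beta1=1.0, beta2=1.0):
    return (beta1 * feature_kl(f_hbp_t, f_hbp_s)
            + beta2 * feature_kl(f_pre_t, f_pre_s))

rng = np.random.default_rng(0)
f_hbp_t, f_hbp_s = rng.normal(size=512), rng.normal(size=512)
f_pre_t, f_pre_s = rng.normal(size=512), rng.normal(size=512)
loss = feature_distillation_loss(f_hbp_t, f_hbp_s, f_pre_t, f_pre_s)
```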
the final loss function comprises the two supervised losses on the teacher network and the student network, the self-supervised loss of the clinical-data prediction, and the distillation losses between the student network and the teacher network; it is defined as follows:

$$L_{total} = \mathcal{L}^t_{sup} + \mathcal{L}^s_{sup} + L_{clinical} + \mathcal{L}_{cd} + \mathcal{L}_{fd}$$

where $\mathcal{L}^t_{sup}$ and $\mathcal{L}^s_{sup}$ denote the supervised loss of the teacher network's prediction and that of the student network's prediction respectively, computed with the focal loss on the predictions $P_t$ and $P_s$; $L_{clinical}$ denotes the self-supervised loss of the clinical-data prediction, computed as a cross-entropy loss between the prediction $P_C$ and the ground truth given by the clinical data; $\mathcal{L}_{cd}$ denotes the classification-level distillation loss and $\mathcal{L}_{fd}$ the feature-level distillation loss between the teacher and student networks; the modality-aware distillation network for MVI prediction is trained with $L_{total}$.
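The total objective described in this step can be sketched as the sum of a focal-loss supervised term per network, the clinical self-supervision term, and the two distillation terms. The focal-loss focusing parameter (gamma = 2) and the equal weighting of the five terms are assumptions not fixed by the patent.

```python
# Sketch of the total training objective.  Gamma and the equal term
# weighting are assumptions.
import numpy as np

def focal_loss(probs, labels, gamma=2.0, eps=1e-12):
    """Mean focal loss; probs: (N, M) probabilities, labels: (N,) ints."""
    pt = np.clip(probs[np.arange(len(labels)), labels], eps, 1.0)
    return float(np.mean(-((1.0 - pt) ** gamma) * np.log(pt)))

def total_loss(l_sup_t, l_sup_s, l_clinical, l_cd, l_fd):
    # equal weighting of the five terms is an assumption
    return l_sup_t + l_sup_s + l_clinical + l_cd + l_fd

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])  # toy predictions
labels = np.array([0, 1])                             # toy MVI grades
l_sup = focal_loss(probs, labels)
l_tot = total_loss(l_sup, l_sup, 0.1, 0.05, 0.02)     # placeholder terms
```

The focal loss down-weights well-classified samples, which suits the imbalanced M0/M1/M2 distribution of the dataset.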
After the technical scheme is adopted, the invention has the following beneficial effects:
1. The invention migrates the knowledge of a teacher network that has both image modalities and non-image clinical data to a student network with only image modalities, and provides a modality-aware distillation network (MD-Net) for HCC MVI prediction, which effectively improves classification precision and prediction accuracy.
2. The student network of the modality-aware distillation network (MD-Net) includes two MRI-only modules for extracting MRI features alone and one symmetric attention (SA) module for refining the features from the two MRI images, while the teacher network includes two MRI-clinical fusion modules for fusing the MRI data with the 52-dimensional clinical data vector and one SA module for refining the two fused features.
3. In addition to the original classification-level distillation of the outputs, the invention designs a feature-level distillation for the modality-aware distillation network (MD-Net) to better transfer the clinical data from the teacher network to the student network. Moreover, a new self-supervised task is devised to predict the clinical data from the image data, further enhancing MVI prediction.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is an exemplary view of a tumor in an HCC MRI image of the present invention;
FIG. 3 is a block diagram of the modality-aware distillation network according to the present invention;
FIG. 4 is an exemplary diagram of the MRI-clinical fusion module, the MRI-only module, and the channel-wise multiplication of the present invention, wherein (a) the MRI-clinical fusion module fuses MRI data with non-imaging clinical data; (b) the MRI-only module uses only MRI image data; (c) shows one example of channel-wise multiplication;
FIG. 5 is a framework flowchart of the SA module of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Examples
As shown in figs. 1 to 5, a hepatocellular carcinoma prediction method based on a modality-aware distillation network includes the following steps:
S1, acquiring a dataset of hepatocellular carcinoma patients and dividing the whole dataset into five folds according to a five-fold cross-validation scheme, where in each round of cross-validation one fold of data serves as the test set and the remaining four folds serve as the training set;
the dataset in step S1 consists of data from 270 pathologically confirmed HCC patients, comprising 128 M0 patients, 93 M1 patients and 49 M2 patients; where M0 indicates no microvascular invasion, M1 indicates invasion of no more than 5 blood vessels, all located within 1 cm of the tumor surface, and M2 indicates invasion of more than 5 blood vessels or invasion more than 1 cm from the tumor surface;
S2, preprocessing the data: find the largest tumor bounding cuboid across all patients and remove the non-tumor regions outside this cuboid;
in step S2 the size of the cuboid is set to 80×80×20 pixels;
S3, establishing a modality-aware distillation network and training it; the network transfers the knowledge learned jointly by a teacher network from the clinical-data modality and the image modality to a student network that has only the image modality;
the training process of the modal sensing distillation network in the step S3 specifically comprises the following steps:
s31, transmitting the HBP image and clinical data to an MRI-clinical fusion module by a teacher network, and extracting 512-dimensional vector features; inputting the PRE image and clinical data into another MRI-clinical fusion module to obtain another 512-dimensional vector feature; inputting the obtained two 512-dimensional vector features into an SA module to obtain new features fused with each other information
Figure BDA0004060886490000071
and />
Figure BDA0004060886490000072
Finally, new features are added>
Figure BDA0004060886490000073
and />
Figure BDA0004060886490000074
Spliced to generate Z t And Z is to t Passed into two fully connected layers to predict classification result P t
S32, the student network takes the 3DHBPMRI image and the 3DPREMRI image as input, transmits HBP data to the MRI-only module to obtain the characteristics, and transmits PRE data to the other MRI-only module to obtain the characteristics; the two obtainedThe features are input into an SA module to obtain new features fused with each other information
Figure BDA0004060886490000075
and />
Figure BDA0004060886490000076
Wherein the new feature->
Figure BDA0004060886490000077
and />
Figure BDA0004060886490000078
Is two feature vectors containing 512 dimensions; finally, new features are added>
Figure BDA0004060886490000079
and />
Figure BDA00040608864900000710
Are connected to generate Z s And Z is to s Input into the full connection layer to predict MVI classification result P s
S33, introducing a regression task into the student network, and inputting the connection characteristic Z of the HBP image and the PRE image s Delivered to two fully connected layers, the 52-dimensional vector P is predicted c For estimating potential clinical information, reusing input clinical data as prediction P c Is a real tag of (1);
s34, distilling the characteristics fused in the teacher network clinical data and the MRI images to the characteristics extracted from the MRI images by adopting the classification level distillation loss and the characteristic level distillation loss, and converting the clinical information of the teacher network into the student network by utilizing a knowledge distillation strategy;
in step S31, the MRI-clinical fusion module integrates MRI data and non-image clinical data, takes 3DMRI data and vectorized clinical data as input, and applies four full connection layers to the input clinical data to obtain four feature graphs, wherein the feature channels are 64, 128, 256 and 256 respectively; another 3D feature map is obtained with four convolution blocks on the input MRI image, each consisting of two 3 x 3 convolution layers, and the feature channels are also set to 64, 128, 256, and 256; multiplying the four feature maps in the clinical data and the corresponding four features in the MRI data by channel-wise to integrate the four feature maps and the corresponding four features together, and then applying a 3X 3 convolution layer and a full connection layer to output feature vectors with 512 dimensions;
in step S32, the MRI-only module extracts 512-dimensional feature vectors from the 3DMRI image; the MRI-only module consists of nine convolution blocks and a full connection layer, wherein each convolution block comprises a batch processing standard layer, a ReLU activation layer and a 3X 3 convolution layer and is used for improving the robustness of a network; setting the channel numbers of the output characteristics of the nine convolution blocks to be different, wherein the characteristic channels of the first five layers are 32, 64 and 128, and the characteristic channels of the last four layers are 256, 128, 256 and 256, so as to balance efficiency and calculation burden;
the SA modules in steps S31 and S32 are symmetric attention modules, X and Y represent two input feature maps of the SA module, and the SA module applies a linear transformation layer on X to obtain three feature maps including Query vector Q x Key vector K x And Value vector V x The method comprises the steps of carrying out a first treatment on the surface of the SA module applies an adaptive transformation layer on Y to generate Key feature map K y And Value feature map V y The method comprises the steps of carrying out a first treatment on the surface of the By multiplying by Q x and Kx Generates score feature vector S by transpose of (1) x By multiplying by Q y and Ky To generate another score feature vector S y The method comprises the steps of carrying out a first treatment on the surface of the The obtained score feature vector S x And Value feature vector V x Multiply and add S y And V is equal to y Multiplying to generate two result eigenvectors, adding them, and finally generating output refined eigenvector
Figure BDA0004060886490000081
Figure BDA0004060886490000082
SA module application on YAnother linear transformation layer, obtaining a characteristic vector Query vector Q y By combining Q y and Ky Is multiplied by the transpose of (2) and Q y and Ky To calculate two score eigenvectors by transposed multiplication, and then calculate a refinement eigenvector by the following formula
Figure BDA0004060886490000083
Figure BDA0004060886490000084
/>
the modality-aware distillation in step S3 comprises classification-level distillation and feature-level distillation;
in the classification-level distillation, let $p^s_i$ denote the class probabilities of the class to which the MRI image data $x_i$ belongs as produced by the student network, and $p^t_i$ the class probabilities for $x_i$ produced by the teacher network; the classification-level distillation loss $\mathcal{L}_{cd}$ is defined so that the class probabilities from the teacher network become the targets for training the student network, and the difference between the two distributions is measured with the Kullback-Leibler divergence:

$$\mathcal{L}_{cd} = \frac{1}{N}\sum_{i=1}^{N} D_{KL}\left(p^t_i \,\big\|\, p^s_i\right)$$

where N and M denote the number of training samples and the number of total categories respectively, $D_{KL}(\cdot)$ denotes the Kullback-Leibler divergence between the two probability distributions (summed over the M classes), $p^s_i$ denotes the student network's prediction for a sample, and $p^t_i$ denotes the teacher network's prediction;
in the feature-level distillation, the feature-level distillation loss $\mathcal{L}_{fd}$ is calculated as the combination of the Kullback-Leibler divergence between $\hat{F}^t_{hbp}$ and $\hat{F}^s_{hbp}$ and the Kullback-Leibler divergence between $\hat{F}^t_{pre}$ and $\hat{F}^s_{pre}$:

$$\mathcal{L}_{fd} = \beta_1 D_{KL}\left(\hat{F}^t_{hbp} \,\big\|\, \hat{F}^s_{hbp}\right) + \beta_2 D_{KL}\left(\hat{F}^t_{pre} \,\big\|\, \hat{F}^s_{pre}\right)$$

where the $\beta$ coefficients weight the Kullback-Leibler divergence terms, with $\beta_1 = 1$; the first term is the divergence between the two HBP features and the second that between the two PRE features, with superscript s marking the student network's features and superscript t the teacher network's;
the final loss function comprises the two supervised losses on the teacher network and the student network, the self-supervised loss of the clinical-data prediction, and the distillation losses between the student network and the teacher network, and is defined as follows:

$$L_{total} = \mathcal{L}^t_{sup} + \mathcal{L}^s_{sup} + L_{clinical} + \mathcal{L}_{cd} + \mathcal{L}_{fd}$$

where $\mathcal{L}^t_{sup}$ and $\mathcal{L}^s_{sup}$ denote the supervised loss of the teacher network's prediction and that of the student network's prediction respectively, computed with the focal loss on the predictions $P_t$ and $P_s$; $L_{clinical}$ denotes the self-supervised loss of the clinical-data prediction, computed as a cross-entropy loss between the prediction $P_C$ and the ground truth given by the clinical data; $\mathcal{L}_{cd}$ denotes the classification-level distillation loss and $\mathcal{L}_{fd}$ the feature-level distillation loss between the teacher and student networks; the modality-aware distillation network for MVI prediction is trained with $L_{total}$;
s4, predicting the hepatocellular carcinoma through the trained modal sensing distillation network.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (8)

1. A hepatocellular carcinoma prediction method based on a modality-aware distillation network, characterized by comprising the following steps:
S1, acquiring a dataset of hepatocellular carcinoma patients, dividing the whole dataset into five folds according to a five-fold cross-validation scheme, taking one fold of data as the test set and the other four folds as the training set in each round of cross-validation, and computing the mean and variance of the evaluation indices over the five rounds;
S2, preprocessing the data: find the largest tumor bounding cuboid across all patients and remove the non-tumor regions outside this cuboid;
S3, establishing a modality-aware distillation network and training it, the network being used to transfer the knowledge learned jointly by a teacher network from the clinical-data modality and the image modality to a student network that has only the image modality;
S4, predicting hepatocellular carcinoma through the trained modality-aware distillation network.
2. The hepatocellular carcinoma prediction method based on a modality-aware distillation network according to claim 1, characterized in that the dataset in step S1 consists of data from 270 pathologically confirmed HCC patients, comprising 128 M0 patients, 93 M1 patients and 49 M2 patients; where M0 indicates no microvascular invasion, M1 indicates invasion of no more than 5 blood vessels, all located within 1 cm of the tumor surface, and M2 indicates invasion of more than 5 blood vessels or invasion more than 1 cm from the tumor surface.
3. The hepatocellular carcinoma prediction method based on a modality-aware distillation network according to claim 1, characterized in that in step S2 the size of the cuboid is set to 80×80×20 pixels.
4. The hepatocellular carcinoma prediction method based on a modal sensing distillation network as set forth in claim 1, wherein the training process of the modal sensing distillation network in step S3 is specifically as follows:
s31, transmitting the HBP image and clinical data to an MRI-clinical fusion module by a teacher network, and extracting 512-dimensional vector features; inputting the PRE image and clinical data into another MRI-clinical fusion module to obtain another 512-dimensional vector feature; inputting the obtained two 512-dimensional vector features into an SA module to obtain new features fused with each other information
Figure FDA0004060886480000011
and />
Figure FDA0004060886480000012
Finally, new features are added>
Figure FDA0004060886480000013
and />
Figure FDA0004060886480000014
Spliced to generate Z t And Z is to t Passed into two fully connected layers to predict classification result P t
S32, student network uses 3D HBP MRI image and 3D PRE MRI image is taken as input, HBP data is transmitted to an MRI-only module to obtain characteristics, PRE data is transmitted to another MRI-only module to obtain characteristics; inputting the obtained two features into an SA module to obtain new features fused with each other information
Figure FDA0004060886480000021
and />
Figure FDA0004060886480000022
Wherein the new feature->
Figure FDA0004060886480000023
and />
Figure FDA0004060886480000024
Is two feature vectors containing 512 dimensions; finally, new features are added>
Figure FDA0004060886480000025
and />
Figure FDA0004060886480000026
Are connected to generate Z s And Z is to s Input into the full connection layer to predict MVI classification result P s
S33, introducing a regression task into the student network, and inputting the connection characteristic Z of the HBP image and the PRE image s Delivered to two fully connected layers, the 52-dimensional vector P is predicted c For estimating potential clinical information, reusing input clinical data as prediction P c Is a real tag of (1);
S34. The features fused from clinical data and MRI images in the teacher network are distilled to the features extracted from the MRI images alone using a classification-level distillation loss and a feature-level distillation loss, transferring the clinical knowledge of the teacher network to the student network through a knowledge distillation strategy.
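The teacher/student forward passes of steps S31-S34 can be sketched at the shape level as follows. This is an illustrative sketch only: the `fc` stand-in uses random weights, the hidden width 256 and the 2-way MVI output are assumptions, and the real fusion, MRI-only and SA modules are replaced by placeholder 512-dimensional features.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, out_dim):
    """Stand-in for a fully connected layer (random weights, illustration only)."""
    w = rng.standard_normal((x.shape[-1], out_dim)) * 0.01
    return x @ w

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical 512-d features produced by the fusion / MRI-only modules (S31/S32)
f_hbp_t = rng.standard_normal((1, 512))   # teacher, HBP branch
f_pre_t = rng.standard_normal((1, 512))   # teacher, PRE branch
f_hbp_s = rng.standard_normal((1, 512))   # student, HBP branch
f_pre_s = rng.standard_normal((1, 512))   # student, PRE branch

# Concatenate the refined features and predict through two FC layers
z_t = np.concatenate([f_hbp_t, f_pre_t], axis=-1)   # (1, 1024)
p_t = softmax(fc(fc(z_t, 256), 2))                  # teacher prediction P_t
z_s = np.concatenate([f_hbp_s, f_pre_s], axis=-1)
p_s = softmax(fc(fc(z_s, 256), 2))                  # student prediction P_s

# S33: regression head on Z_s predicting the 52-d clinical vector P_c
p_c = fc(fc(z_s, 256), 52)
```

Concatenating the two 512-dimensional branch features yields the 1024-dimensional Z_t/Z_s that the classification and regression heads consume.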
5. The method for predicting hepatocellular carcinoma based on a modal-aware distillation network as set forth in claim 4, wherein in step S31 the MRI-clinical fusion module integrates MRI data and non-imaging clinical data, taking 3D MRI data and vectorized clinical data as input; four fully connected layers are applied to the input clinical data to obtain four feature maps with 64, 128, 256 and 256 feature channels, respectively; another set of 3D feature maps is obtained by applying four convolution blocks to the input MRI image, each consisting of two 3 × 3 convolution layers, with the feature channels also set to 64, 128, 256 and 256; the four feature maps from the clinical data and the four corresponding features from the MRI data are multiplied channel-wise to integrate them, and a 3 × 3 convolution layer and a fully connected layer are then applied to output a feature vector with 512 dimensions.
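The channel-wise integration in claim 5 can be sketched as follows: each scalar of a clinical feature vector scales one channel of the matching MRI feature map. The spatial size 8 × 8 × 8 and the choice of the 64-channel stage are hypothetical; only the channel-wise multiplication itself comes from the claim.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a clinical feature vector from one FC layer and a 3D MRI
# feature map from the corresponding conv block (64-channel stage of claim 5).
clinical_feat = rng.standard_normal(64)           # (C,)
mri_feat = rng.standard_normal((64, 8, 8, 8))     # (C, D, H, W)

# Channel-wise multiplication: each clinical scalar scales one MRI channel,
# integrating the two modalities stage by stage.
fused = mri_feat * clinical_feat[:, None, None, None]
```

Broadcasting the (C,) vector against the (C, D, H, W) map applies one clinical weight per channel without copying data.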
6. The method for predicting hepatocellular carcinoma based on a modal-aware distillation network as set forth in claim 4, wherein in step S32 the MRI-only module extracts a 512-dimensional feature vector from the 3D MRI image; the MRI-only module consists of nine convolution blocks and a fully connected layer, each convolution block comprising a batch normalization layer, a ReLU activation layer and a 3 × 3 convolution layer, to improve the robustness of the network; the numbers of output channels of the nine convolution blocks are set differently: the feature channels of the first five layers are 32, 64 and 128, and those of the last four layers are 256, 128, 256 and 256, to balance efficiency and computational load.
7. The method for predicting hepatocellular carcinoma based on a modal-aware distillation network as set forth in claim 4, wherein the SA module in steps S31 and S32 is a symmetric attention module; let X and Y denote the two input feature maps of the SA module; the SA module applies a linear transformation layer on X to obtain three feature maps, comprising a Query vector Q_x, a Key vector K_x and a Value vector V_x; the SA module applies an adaptive transformation layer on Y to generate a Key feature map K_y and a Value feature map V_y; a score feature vector S_x is generated by multiplying Q_x with the transpose of K_x, and another score feature vector S_y is generated by multiplying Q_x with the transpose of K_y; the obtained score feature vector S_x is multiplied with the Value feature vector V_x, S_y is multiplied with V_y, and the two resulting feature vectors are added to produce the output refined feature:

X_hat = S_x · V_x + S_y · V_y

The SA module applies another linear transformation layer on Y to obtain a Query vector Q_y; two score feature vectors are calculated by multiplying Q_y with the transpose of K_y and Q_y with the transpose of K_x, and the refined feature is then calculated by the following formula:

Y_hat = (Q_y · K_y^T) · V_y + (Q_y · K_x^T) · V_x
8. The method for predicting hepatocellular carcinoma based on a modal-aware distillation network as set forth in claim 4, wherein the modal-aware distillation in step S3 comprises classification-level distillation and feature-level distillation;
In classification-level distillation, let p_i^s denote the class probability, generated by the student network, of the class to which MRI image data x_i belongs, and p_i^t denote the class probability generated by the teacher network; the classification-level distillation loss L_cls is defined so that the class probabilities from the teacher network serve as the targets for training the student network, with the difference between the two distributions measured by the Kullback-Leibler divergence:

L_cls = (1/N) Σ_{i=1}^{N} D_KL(p_i^t ‖ p_i^s) = (1/N) Σ_{i=1}^{N} Σ_{j=1}^{M} p_{i,j}^t · log(p_{i,j}^t / p_{i,j}^s)

wherein N and M represent the number of training samples and the number of total categories, respectively, D_KL(·) denotes the Kullback-Leibler divergence between two probability distributions, p_i^s denotes the student network prediction for a sample, and p_i^t denotes the teacher network prediction;
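The classification-level distillation loss above can be written directly in numpy. This is a sketch of the stated formula: mean over samples of the KL divergence from teacher to student class probabilities (the small epsilon clip is added for numerical safety and is not part of the claim).

```python
import numpy as np

def kl_div(p, q, eps=1e-8):
    """D_KL(p || q) per sample, summed over the M classes."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def cls_distill_loss(p_teacher, p_student):
    """L_cls: mean KL divergence from teacher class probabilities to
    student class probabilities over the N training samples."""
    return float(np.mean(kl_div(p_teacher, p_student)))

p_t = np.array([[0.9, 0.1], [0.2, 0.8]])  # teacher probs, N=2 samples, M=2 classes
p_s = np.array([[0.7, 0.3], [0.4, 0.6]])  # student probs
loss = cls_distill_loss(p_t, p_s)
```

The loss is zero exactly when the student reproduces the teacher's distribution and grows as the two diverge, which is what makes the teacher probabilities a usable training target.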
In feature-level distillation, the feature-level distillation loss L_feat is calculated as the combination of the Kullback-Leibler divergence between F^s_HBP and F^t_HBP and the Kullback-Leibler divergence between F^s_PRE and F^t_PRE:

L_feat = β_1 · D_KL(F^t_HBP ‖ F^s_HBP) + β_2 · D_KL(F^t_PRE ‖ F^s_PRE)

wherein β is used to weight the Kullback-Leibler divergence terms, with the weight β_1 = 1; D_KL(F^t_HBP ‖ F^s_HBP) denotes the Kullback-Leibler divergence between the two features F^t_HBP and F^s_HBP, and D_KL(F^t_PRE ‖ F^s_PRE) denotes the Kullback-Leibler divergence between the two features F^t_PRE and F^s_PRE; the superscript s marks student network features and the superscript t marks teacher network features;
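A sketch of the feature-level distillation loss. The claim only names KL divergence between teacher and student features; converting the raw 512-dimensional feature vectors into probability distributions with a softmax before taking KL is an assumption of this sketch, as is β_2 = 1.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-8):
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def feat_distill_loss(f_hbp_t, f_hbp_s, f_pre_t, f_pre_s, beta1=1.0, beta2=1.0):
    """L_feat: weighted sum of KL divergences between teacher and student
    features for the HBP and PRE branches (softmax normalization assumed)."""
    return float(beta1 * np.mean(kl_div(softmax(f_hbp_t), softmax(f_hbp_s)))
                 + beta2 * np.mean(kl_div(softmax(f_pre_t), softmax(f_pre_s))))

rng = np.random.default_rng(0)
ft_h, fs_h = rng.standard_normal((1, 512)), rng.standard_normal((1, 512))
ft_p, fs_p = rng.standard_normal((1, 512)), rng.standard_normal((1, 512))
loss = feat_distill_loss(ft_h, fs_h, ft_p, fs_p)
```

Pulling the student's branch features toward the teacher's is what transfers the clinically informed representation into the image-only student.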
The final loss function comprises the two supervised losses on the teacher network and the student network, the self-supervised loss of clinical-data prediction, and the distillation losses between the student network and the teacher network; the loss function is defined as follows:

L_total = L_t + L_s + L_clinical + L_cls + L_feat

wherein L_t and L_s represent the supervised losses of the teacher network prediction and the student network prediction, respectively, calculated with the cross-entropy loss on the predictions P_t and P_s; L_clinical represents the self-supervised loss of clinical-data prediction, the error between the prediction P_c and the ground truth of the clinical data; L_cls represents the classification-level distillation loss, and L_feat represents the feature-level distillation loss between the teacher and student networks; the modal-aware distillation network for MVI prediction is trained with the total loss L_total.
CN202310058590.2A 2023-01-18 2023-01-18 Hepatocellular carcinoma prediction method based on modal sensing distillation network Pending CN116030025A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117253611A (en) * 2023-09-25 2023-12-19 Sichuan University Intelligent early cancer screening method and system based on multi-modal knowledge distillation
CN117253611B (en) * 2023-09-25 2024-04-30 Sichuan University Intelligent early cancer screening method and system based on multi-modal knowledge distillation
CN117173165A (en) * 2023-11-02 2023-12-05 Anhui University Contrast agent-free liver tumor detection method, system and medium based on reinforcement learning
CN118366654A (en) * 2024-04-09 2024-07-19 Chongqing University of Posts and Telecommunications Cross-modal knowledge distillation-based lung cancer risk prediction method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination