CN114596318A - Breast cancer magnetic resonance imaging focus segmentation method based on Transformer

Breast cancer magnetic resonance imaging focus segmentation method based on Transformer

Info

Publication number
CN114596318A
CN114596318A CN202210277852.XA
Authority
CN
China
Prior art keywords
transformer
breast cancer
encoder
transbc
model
Prior art date
Legal status
Withdrawn
Application number
CN202210277852.XA
Other languages
Chinese (zh)
Inventor
邵叶秦
许昌炎
桑子江
盛美红
Current Assignee
Nantong University
Original Assignee
Nantong University
Priority date
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN202210277852.XA priority Critical patent/CN114596318A/en
Publication of CN114596318A publication Critical patent/CN114596318A/en
Withdrawn legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10072 Tomographic images
    • G06T 2207/10088 Magnetic resonance imaging [MRI]
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30068 Mammography; Breast

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a Transformer-based breast cancer magnetic resonance imaging lesion segmentation method, relating to the fields of intelligent healthcare and deep learning. The technical key points are: constructing TransBC, an MRI lesion segmentation model that combines a Transformer with 3D convolution, whose network is an encoder-decoder structure; the encoder-decoder structure is divided into a down-sampling stage and an up-sampling stage, where the down-sampling stage is a CNN encoder used to extract feature representations at different levels, and the up-sampling stage uses a Transformer encoder to repeatedly extract long-range dependencies from the high-resolution feature maps, supplementing and correcting the low-resolution CNN features. The core of the model is to encode the high-resolution feature maps with a Transformer and extract long-range dependencies to supplement and correct the low-resolution CNN features. The model handles lesion edges more accurately and also segments difficult samples with uneven gray values inside the lesion more effectively.

Description

Breast cancer magnetic resonance imaging focus segmentation method based on Transformer
Technical Field
The invention relates to the technical fields of intelligent healthcare and deep learning, and in particular to a Transformer-based breast cancer magnetic resonance imaging lesion segmentation method.
Background
At present, breast cancer has become a leading threat to women's health. According to the latest 2020 data from the International Agency for Research on Cancer (IARC), the number of new breast cancer cases reached 2.26 million, compared with 2.2 million for lung cancer; breast cancer has thus officially overtaken lung cancer as the most common cancer worldwide. Various imaging examinations can be used to effectively assess and diagnose the condition. In practice, dynamic contrast-enhanced MRI (DCE-MRI) of the breast has the best diagnostic efficacy, with particular advantages in finding microscopic lesions, multicentric and multifocal disease, and in evaluating the extent of lesions. With the rise and development of deep learning, researchers hope to segment medical images automatically with AI to assist physicians in diagnosis. Compared with traditional machine learning methods, convolutional networks have clear advantages in extracting deep features: the weight sharing of the convolution operation brings translation invariance of the features, and the nature of the convolution operator brings good local sensitivity.
At present, convolutional neural networks (CNNs) have become the standard method for medical image segmentation tasks. The U-Net model made fully convolutional networks and encoder-decoder architectures a new paradigm. However, the advantages of the convolution operation also bring inherent drawbacks, namely a limited receptive field and a fixed spatial inductive bias, so that it cannot capture global-context relationships. The self-attention mechanism in the Transformer can dynamically adjust the receptive field according to the input content and is better suited than convolution to modeling long-range dependencies. However, recently proposed Transformer-based medical image segmentation methods simply treat the Transformer as an auxiliary module and do not effectively combine the self-attention mechanism with convolution.
Disclosure of Invention
The invention aims to solve the problems and provides a method for segmenting a breast cancer magnetic resonance imaging focus based on a Transformer.
In order to achieve the purpose, the invention adopts the following technical scheme:
a Transformer-based breast cancer magnetic resonance imaging lesion segmentation method, characterized in that a TransBC is constructed, wherein TransBC is an MRI lesion segmentation model based on a Transformer combined with 3D convolution, and the network of TransBC is an encoder-decoder structure; the encoder-decoder structure is divided into a down-sampling stage and an up-sampling stage,
the down-sampling stage is a CNN encoder used to extract feature representations at different levels;
and the up-sampling stage uses a Transformer encoder to repeatedly extract long-range dependencies from the high-resolution feature maps, supplementing and correcting the low-resolution CNN features.
Preferably, it comprises the following steps:
s1: collecting dynamic contrast-enhanced breast MRI (DCE-MRI) data and preprocessing the data;
s2: constructing a TransBC network;
s3: constructing an encoder of a TransBC network, wherein the encoder of the TransBC network comprises a bottleneck module and a down-sampling module;
s4: constructing a decoder of a TransBC network, wherein the decoder of the TransBC network comprises a Transformer module, a feature fusion module and an up-sampling module;
s5: and training and testing the TransBC network by using the training set and the testing set obtained in the step S1.
Preferably, the preprocessing in S1 includes the following steps: collecting the patient breast cancer MRI data provided by a hospital, resampling the MRI images so that the spatial spacing is 1 mm, then cropping the MRI images so that the cropped images have a uniform size of (64, 64, 64); after the data preprocessing in S1 is completed, the collected data are divided into a training set and a testing set.
Preferably, the S3 includes the following steps:
3-1: using a CNN encoder F_CNN(·);
3-1-1: constructing a bottleneck block, wherein the bottleneck block is designed using the classical residual structure from ResNet;
3-1-2: constructing a down-sampling block, wherein the down-sampling block is composed of 3D convolutional layers;
3-1-3: setting the activation function of the convolution operations in the encoder F_CNN(·) to the ReLU function, defined as out(in) = max(0, in); the convolution kernel size is set to 2 × 2 and the stride to 2.
3-2: an input image x, with height H, width W, depth D and C channels, is passed through F_CNN(·); the resulting feature maps are f_l = F_CNN(x), where l indexes the encoder level.
preferably, the S4 includes the following steps:
4-1: constructing a Transformer module;
4-2: designing a feature fusion module by referring to the CBAM;
4-3: constructing an up-sampling module that progressively restores the low-resolution feature maps to the original size.
Preferably, said 4-1 comprises the steps of:
4-1-1: determining the input of a Transformer module;
4-1-2: the input of the Transformer module is a 3D image block x ∈ R^(H×W×D×C), where H, W, D and C respectively denote its height, width, depth and number of channels;
4-1-3: adding position codes and using learnable position codes;
4-1-4: the Transformer encoder comprises a multi-head self-attention block and a multi-layer perceptron block, wherein the self-attention block is responsible for completing the computation of query-key-value attention.
Preferably, the 4-1-2 comprises the following steps: partitioning the picture, partitioning the feature map x along three dimensions of width, height and depth, and stacking the blocks; a scaling strategy for the block side lengths is then used.
Preferably, the step S5 includes the steps of:
5-1: determining the basic architecture of the TransBC network, and initializing the connection weights of each network component, the number of residual units, the number of convolutional layers, the learning rate, the training step size, the optimizer, the number of iterations and the training batch size;
5-2: encoder F for inputting training set divided by S1 into TransBC networkCNN(. to obtain a down-sampled output XS
5-3: decoding the down-sampling result by using a Transformer module, a characteristic fusion module and an up-sampling module of a decoder part to obtain a model output value XU
5-4: evaluating the accuracy of model segmentation by adopting Dice, IoU and accuracy;
5-5: training the model according to the number of iterations set in step 5-1, and verifying the segmentation effect of the model with the test set.
Preferably, in the 5-4:
the formula of Dice is:
Dice = 2|GT ∩ Pred| / (|GT| + |Pred|)
wherein GT represents the gold-standard binary image manually labeled by an expert, and Pred is the model prediction result. The value of Dice lies in [0, 1]; the closer Dice is to 1, the higher the overlap with the gold standard;
the formula of IoU is:
IoU = |GT ∩ Pred| / |GT ∪ Pred|
like Dice, IoU measures the overlap between the network prediction image and the gold standard;
the formula of accuracy is:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
wherein TP represents true positives; TN represents true negatives; FP and FN represent false positives and false negatives.
The application also provides an MRI lesion segmentation model constructed using the above Transformer-based breast cancer magnetic resonance imaging lesion segmentation method, wherein the MRI lesion segmentation model is a 3D medical image segmentation model based on a Transformer combined with 3D convolution, and the network of the MRI lesion segmentation model adopts an encoder-decoder structure; the encoder-decoder is divided into a down-sampling stage and an up-sampling stage.
The Transformer-based breast cancer magnetic resonance imaging lesion segmentation method differs from prior-art Transformer-based medical image segmentation methods, which simply treat the Transformer as an auxiliary module: the TransBC in this method effectively exploits the information extracted by the CNN and the Transformer, respectively. The network continues to use an encoder-decoder structure, uses a CNN to extract feature representations at different levels, and uses a Transformer encoder to repeatedly extract long-range dependencies from the feature maps. Meanwhile, a fusion module that can make full use of the Transformer features and the CNN features is designed to extend the skip connections in the classical encoder-decoder structure. The core of the method is to encode the high-resolution feature maps with a Transformer and extract long-range dependencies to supplement and correct the low-resolution CNN features, so that complex lesions and lesion edges are handled more accurately.
Drawings
FIG. 1 is a schematic diagram illustrating steps of a method for segmenting a breast cancer magnetic resonance imaging lesion based on a Transformer according to the present invention;
FIG. 2 is a flowchart of a method for segmenting a lesion of breast cancer by magnetic resonance imaging based on a Transformer according to the present invention;
FIG. 3 is a model structure diagram of a method for segmenting a breast cancer magnetic resonance imaging lesion based on a Transformer according to the present invention;
FIG. 4 is a block diagram of a Transformer-based magnetic resonance imaging lesion segmentation method for breast cancer according to the present invention;
FIG. 5 is a structural diagram of the feature fusion module of the Transformer-based breast cancer MRI lesion segmentation method according to the present invention;
FIG. 6 shows the lesion-edge segmentation results of the Transformer-based breast cancer MRI lesion segmentation method according to the present invention. FIG. 6 shows the predictive power of the model for the edge slices of a malignant mass. The first row corresponds to the 1st, 2nd, 3rd, 4th, 5th, 6th, 39th, 40th, 41st, 42nd and 43rd coronal slices of the lesion, the second row shows the model segmentation results, and the last row shows the doctor-annotated labels (ground truth) corresponding to the slices.
FIG. 7 shows the segmentation results of the Transformer-based breast cancer MRI lesion segmentation method of the present invention for a complex lesion. FIG. 7 illustrates the predictive power of the model for lesion slices with non-uniform internal gray values and alternating bright and dark regions. The structure of FIG. 7 is the same as that of FIG. 6. The first row corresponds to the 28th to 37th coronal slices of the mass, which are its central part.
FIG. 8 shows the experimental results of the Transformer-based breast cancer MRI lesion segmentation method of the present invention on the test set.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and thus the present invention is not limited to the specific embodiments of the present disclosure.
A Transformer-based breast cancer magnetic resonance imaging lesion segmentation method, referring to FIG. 1 and FIG. 2, comprises the following steps:
S1: collecting dynamic contrast-enhanced breast MRI (DCE-MRI) data and preprocessing the data;
specifically, in one embodiment, each patient contains 4 MRI images, dynamic first, second, fourth and sixth phases, respectively, wherein the first phase of the dynamic scan is a mask before contrast enhancement is effective, contrast is effective from the second phase of the scan sequence, and then the next phase of the dynamic scan is performed at intervals (90 seconds to 120 seconds). And then preprocessing the data to reduce the redundancy and complexity of the data.
After preprocessing, the MRI data are divided into a training set and a testing set at a fixed ratio. In one embodiment, the data are dynamic contrast-enhanced breast MRI data of 200 patients provided by a hospital, of which 95 are patients with benign tumors and 105 are patients with malignant tumors.
The preprocessing step collects the patient breast cancer MRI data provided by the hospital, D_BC = {id_i, img_i1, img_i2, img_i4, img_i6, seg_i}, where id_i is the identifier of the i-th patient, img_i1, img_i2, img_i4 and img_i6 correspond to the first-, second-, fourth- and sixth-phase MR images of the dynamic scan of the i-th patient, and seg_i is the labeled binary image of the lesion in the MR images. Meanwhile, to reduce errors caused by measurement conversion, in one embodiment the MRI images are resampled so that the spatial spacing is 1 mm. Considering the differences between patients and the computational cost of Transformer encoding of 3D medical images, in one embodiment a cropping operation is performed on the MRI images, and the cropped images are uniformly sized (64, 64, 64). After data preprocessing, D_BC is divided into a training set and a testing set at a fixed ratio.
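For illustration only, the following is a minimal sketch of the resampling and cropping step described above. It assumes SimpleITK and NumPy as the processing libraries and a hypothetical file path; neither the libraries nor the exact cropping strategy are specified in the present application.

```python
# Illustrative sketch (assumed libraries: SimpleITK, NumPy; hypothetical file path).
# Resample an MR volume to 1 mm spacing and crop/pad it to (64, 64, 64).
import SimpleITK as sitk
import numpy as np

def resample_to_1mm(image: sitk.Image) -> sitk.Image:
    """Resample a 3D MR image so that its voxel spacing is 1 mm in every direction."""
    old_spacing = image.GetSpacing()
    old_size = image.GetSize()
    new_spacing = (1.0, 1.0, 1.0)
    new_size = [int(round(sz * sp)) for sz, sp in zip(old_size, old_spacing)]
    return sitk.Resample(image, new_size, sitk.Transform(), sitk.sitkLinear,
                         image.GetOrigin(), new_spacing, image.GetDirection(),
                         0.0, image.GetPixelID())

def crop_to_64(volume: np.ndarray) -> np.ndarray:
    """Crop (or zero-pad) a 3D array to a (64, 64, 64) region around its centre."""
    out = np.zeros((64, 64, 64), dtype=volume.dtype)
    starts = [max((s - 64) // 2, 0) for s in volume.shape]
    crop = volume[starts[0]:starts[0] + 64,
                  starts[1]:starts[1] + 64,
                  starts[2]:starts[2] + 64]
    out[:crop.shape[0], :crop.shape[1], :crop.shape[2]] = crop
    return out

img = resample_to_1mm(sitk.ReadImage("patient_001_phase2.nii.gz"))  # hypothetical path
vol = crop_to_64(sitk.GetArrayFromImage(img))
```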
S2: constructing an MRI focus segmentation model-TransBC network based on Transformer combined with 3D convolution, wherein the network adopts an encoder-decoder structure; a classical encoder-decoder is divided into a downsampling stage and an upsampling stage.
S3: constructing an encoder of a TransBC network;
the encoder of the transBC network comprises a bottleneck module and a down-sampling module.
The specific steps are as follows:
3-1: using CNN encoders FCNN(·),
Specifically, in one embodiment, the step 3-1 comprises the following steps:
3-1-1: constructing a bottleneck block:
the bottleneck block is designed using the classical residual structure in ResNet.
Specifically, the bottleneck block comprises three repeated stages of a 3D convolutional layer, batch normalization and a ReLU layer, with a short skip-connection design adopted in the last stage.
Compared with 2D convolution, 3D convolution performs feature extraction on 3-dimensional spatial data. The convolution formula is generally defined as X_l = f(W_l * X_{l-1}; b_l), where X_{l-1} and X_l are the input and output of the l-th convolutional layer, and W_l and b_l are the convolution kernel parameters and the bias term of the l-th layer;
the residual structure is introduced to avoid the reduction of the precision of the model along with the increase of the convolution layer number, and the residual unit operation is defined as Zl=Zl-1+F(Zl-1;θl) Wherein Z isl-1And ZlInput and output of the l-th layer residual operation layer, thetalIs the set of parameters in the layer i residual operation.
3-1-2: a downsample block is constructed.
The down-sampling block is composed of 3D convolutional layers; the convolution operation formula is as given in step 3-1-1;
3-1-3: setting an encoder FCNNThe activation function of the convolution operation in (-) is the ReLU function, which is defined as: out (in) max (0, in); the convolution kernel size is set to 2 x 2, step size is 2.
3-2: inputting pictures
Figure BDA0003556742570000091
Through FCNNThe formula of the characteristic diagram after the operation is as follows:
Figure BDA0003556742570000092
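For illustration only, the following sketch assembles the bottleneck and down-sampling blocks into a CNN encoder F_CNN(·). The number of encoder levels, the channel widths and the use of 4 input channels (one per dynamic phase) are assumptions of this sketch; it reuses the BottleneckBlock from the sketch above.

```python
# Illustrative CNN encoder F_CNN(.): at each level a bottleneck block is followed by a
# strided Conv3d (kernel 2, stride 2) for down-sampling. Channel widths, level count and
# the 4 input channels (one per dynamic phase) are assumptions of this sketch.
import torch
import torch.nn as nn

class DownsampleBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=2, stride=2),
            nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(x)

class CNNEncoder(nn.Module):
    def __init__(self, in_ch: int = 4, widths=(32, 64, 128, 256)):
        super().__init__()
        self.stem = nn.Conv3d(in_ch, widths[0], kernel_size=3, padding=1)
        self.levels = nn.ModuleList()
        for i, w in enumerate(widths):
            nxt = widths[min(i + 1, len(widths) - 1)]
            self.levels.append(nn.ModuleDict({
                "bottleneck": BottleneckBlock(w),     # sketched above
                "down": DownsampleBlock(w, nxt)}))

    def forward(self, x: torch.Tensor):
        feats = []                                    # multi-level features f_l for the decoder
        x = self.stem(x)
        for level in self.levels:
            x = level["bottleneck"](x)
            feats.append(x)
            x = level["down"](x)
        return feats, x
```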
s4: a decoder of the TransBC network is constructed.
The decoder of the TransBC network comprises a Transformer module, a feature fusion module and an upsampling module, and the S4 comprises the following steps:
4-1: constructing a Transformer module:
referring to fig. 4, fig. 4 is a structure of a Transformer module for capturing long-distance dependency of features and performing global information correction and spatial information complementation on a feature map extracted from CNN encoder branches.
The specific steps of the step 4-1 are as follows:
4-1-1: determining the input of a Transformer module:
TransBC uses a Transformer module at each skip-connection stage.
Assume the Transformer encoder is F_Tran(·) and the feature fusion module is F_Fusion(·); each fused feature f_fus is then given by:
f_fus = F_Fusion(f_{l+1}, F_Tran(f_l))
where f_l and f_{l+1} are the feature maps of adjacent levels. In one embodiment, the input of the Transformer module is not f_l but the preceding layer f_{l-1}; compared with f_l, f_{l-1} has undergone one fewer convolution operation and therefore contains more spatial detail information;
4-1-2: the input of the Transformer module is a 3D image block x ∈ R^(H×W×D×C), where H, W, D and C respectively denote the height, width, depth and number of channels.
The size of the 3D image is (H, W, D, C). In one embodiment, to satisfy the input requirement of the Transformer encoder, the image needs to be partitioned into blocks, i.e. serialized. The feature map x is partitioned along the width, height and depth dimensions, and the blocks are stacked. Compared with using convolutional layers for block partitioning, this processing reduces information loss.
In one embodiment, to reduce the computational load of the Transformer encoder, a scaling strategy for the block side length is used. Through experimental comparison, and considering the balance between time and space complexity, the side length of a small cube in the present application is 1/8 of the side length of the original cube, so the number of blocks is 8 × 8 × 8 = 512.
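For illustration only, the following PyTorch sketch partitions a feature map into non-overlapping cubes whose side length is 1/8 of the original side length and stacks them as a token sequence of 8 × 8 × 8 = 512 blocks, as described above. The (B, C, H, W, D) tensor layout is an assumption.

```python
# Illustrative 3D block partition (PyTorch assumed, (B, C, H, W, D) layout assumed):
# split the feature map into cubes whose side length is 1/8 of the original side length
# and stack them as a sequence of 8*8*8 = 512 tokens.
import torch

def partition_blocks(x: torch.Tensor, splits: int = 8) -> torch.Tensor:
    b, c, h, w, d = x.shape
    ah, aw, ad = h // splits, w // splits, d // splits             # block side lengths
    x = x.reshape(b, c, splits, ah, splits, aw, splits, ad)
    x = x.permute(0, 2, 4, 6, 3, 5, 7, 1)                          # (B, 8, 8, 8, ah, aw, ad, C)
    return x.reshape(b, splits ** 3, ah * aw * ad * c)             # (B, 512, a^3 * C)

tokens = partition_blocks(torch.randn(2, 32, 16, 16, 16))          # toy example: (2, 512, 256)
```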
4-1-3: in order to preserve the spatial information of the image sequence, a position code is added, a learnable position code is used, the dimension of which is consistent with the dimension in 4-1-2:
Figure BDA0003556742570000111
wherein the content of the first and second substances,
Figure BDA0003556742570000112
for the purpose of embedding the projection in a block,
Figure BDA0003556742570000113
for position coding, a represents the side length of the block, n is the number of the blocks, and b is the length of the embedded vector;
4-1-4: the transform encoder includes a multi-headed self-attention block and a multi-layered perceptron block.
The multi-head self-attention block is responsible for completing the calculation of query-key-value attention, and vectors Q, K and V come from the same input. The calculation formula is as follows,
Figure BDA0003556742570000114
the specific calculation process can be disassembled into the following steps:
(1) Calculate the similarity between Q and K by dot product; K is transposed to satisfy the matrix multiplication rule. The result is then normalized, i.e. divided by √d_k, where d_k denotes the length of the vector K.
(2) The similarity is converted into a probability distribution by the softmax function.
(3) The probability distribution is multiplied with V, so that, through the attention mechanism, the encoded output of each input image sequence incorporates the encoded information of the other image sequences.
The multi-head self-attention block and the multi-layer perceptron block form one encoding layer of the Transformer encoder, and the Transformer encoder generally comprises N such encoding layers. The encoding formulas are:
z'_i = MSA(z_{i-1}) + z_{i-1}
z_i = MLP(z'_i) + z'_i,  i = 1, …, N
where MSA denotes the multi-head self-attention block and MLP the multi-layer perceptron block.
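For illustration only, the following PyTorch sketch stacks N encoding layers, each consisting of a multi-head self-attention block and a multi-layer perceptron block with residual connections. The use of layer normalization, the number of heads and the MLP expansion ratio are assumptions not specified in the present application.

```python
# Illustrative Transformer encoder (PyTorch assumed): N encoding layers, each a multi-head
# self-attention block followed by a multi-layer perceptron block, with residual connections.
# Layer normalisation, head count and MLP ratio are assumptions of this sketch.
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, dim: int, heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        q = self.norm1(z)
        attn_out, _ = self.attn(q, q, q)      # Q, K, V all come from the same input
        z = z + attn_out                      # z'_i = MSA(z_{i-1}) + z_{i-1}
        z = z + self.mlp(self.norm2(z))       # z_i  = MLP(z'_i) + z'_i
        return z

class TransformerEncoder(nn.Module):
    def __init__(self, dim: int, depth: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(EncoderLayer(dim) for _ in range(depth))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            z = layer(z)
        return z
```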
4-1-5: the dimension of the output characteristic diagram of the Transformer encoder is (N, D), and the dimension can be changed into a dimension by using a related dimension adjusting function in a deep learning framework
Figure BDA0003556742570000122
4-2: the feature fusion module is designed by referring to CBAM, and the structure is shown in FIG. 5.
The design of the feature fusion module mainly considers two aspects: the difference between the CNN and Transformer encoding modes, and the difference in resolution between the CNN feature map and the Transformer feature map.
The inputs of the fusion module come from the CNN encoder module and the Transformer module, respectively. The structure of the fusion module is shown in FIG. 5. The local feature l and the global feature g are added element-wise and fed into a bottleneck structure block; from the output of the bottleneck block, the importance of the features is computed from the channel and the spatial-position perspectives. The operation of the feature fusion module is as follows:
f = bottleneck(l + g)
ch_cof = avg_pooling(f) + max_pooling(f)
sp_cof = softmax(f)
ff = f · ch_cof + f · sp_cof.
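For illustration only, the following PyTorch sketch follows the four operations listed above. The choice of pooling operators, the axis of the softmax and the reuse of the bottleneck block sketched earlier are assumptions.

```python
# Illustrative feature fusion module (PyTorch assumed): f = bottleneck(l + g), followed by
# channel weighting (avg + max pooling) and spatial weighting (softmax over positions),
# ff = f*ch_cof + f*sp_cof. The softmax axis and the reuse of BottleneckBlock are assumptions.
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.bottleneck = BottleneckBlock(channels)   # sketched above
        self.avg_pool = nn.AdaptiveAvgPool3d(1)
        self.max_pool = nn.AdaptiveMaxPool3d(1)

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        f = self.bottleneck(local_feat + global_feat)              # f = bottleneck(l + g)
        ch_cof = self.avg_pool(f) + self.max_pool(f)               # channel importance (B, C, 1, 1, 1)
        b, c = f.shape[:2]
        sp_cof = torch.softmax(f.reshape(b, c, -1), dim=-1).reshape_as(f)  # spatial importance
        return f * ch_cof + f * sp_cof                             # ff = f*ch_cof + f*sp_cof
```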
4-3: constructing an up-sampling module that progressively restores the low-resolution feature maps to the original size.
The up-sampling process comprises several repeated up-sampling blocks, each consisting of a 2× up-sampler, a 3D convolutional layer with kernel size 3 × 3 and a ReLU layer. The up-sampler uses interpolation: on the basis of the original image voxels, new voxel values are inserted between voxel points using a trilinear interpolation algorithm. For a feature map x of size (W, H, D), after 2× up-sampling the up-sampled x' has size (2W, 2H, 2D).
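For illustration only, the following PyTorch sketch of one up-sampling block combines a 2× trilinear up-sampler, a 3D convolution and a ReLU layer, as described above; padding and channel choices are assumptions of this sketch.

```python
# Illustrative up-sampling block (PyTorch assumed): 2x trilinear up-sampler, a 3D convolution
# and a ReLU layer; kernel padding and channel choices are assumptions of this sketch.
import torch
import torch.nn as nn

class UpsampleBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)   # (B, C, W, H, D) -> (B, out_ch, 2W, 2H, 2D)
```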
S5: and training and testing the TransBC network by using the training set and the testing set obtained in the step S1.
5-1: determining a basic architecture of a TransBC network, and initializing connection weight, residual error unit quantity, convolution layer quantity, learning rate, training step length, an optimizer, iteration times and training batches of each component of the network;
5-2: inputting the training set divided in the step 1) into an encoder F of a TransBC networkCNN(. C) to obtain a down-sampled output XS
5-3: decoding the down-sampling result by using a Transformer module, a feature fusion module and an up-sampling module of a decoder part to obtain a model output value XU
5-4: evaluating the accuracy of model segmentation by adopting Dice, IoU and accuracy;
The formula of Dice is:
Dice = 2|GT ∩ Pred| / (|GT| + |Pred|)
wherein GT represents the gold-standard binary image manually labeled by an expert, and Pred is the model prediction result. The value of Dice lies in [0, 1]; the closer Dice is to 1, the higher the overlap with the gold standard.
The formula of IoU is:
IoU = |GT ∩ Pred| / |GT ∪ Pred|
Like Dice, IoU measures the overlap between the network prediction image and the gold standard.
The formula of accuracy is:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
wherein TP represents true positives; TN represents true negatives; FP and FN represent false positives and false negatives. A higher accuracy indicates that correctly predicted voxels account for a larger proportion of all voxels.
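For illustration only, the following NumPy sketch computes Dice, IoU and accuracy for binary prediction and gold-standard volumes according to the formulas above; the epsilon terms are added only to avoid division by zero.

```python
# Illustrative metric computation (NumPy assumed) for binary volumes `pred` and `gt`
# of identical shape, following the Dice, IoU and accuracy formulas above.
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    inter = np.logical_and(pred == 1, gt == 1).sum()
    return float(2.0 * inter / ((pred == 1).sum() + (gt == 1).sum() + eps))

def iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    inter = np.logical_and(pred == 1, gt == 1).sum()
    union = np.logical_or(pred == 1, gt == 1).sum()
    return float(inter / (union + eps))

def accuracy(pred: np.ndarray, gt: np.ndarray) -> float:
    tp = np.logical_and(pred == 1, gt == 1).sum()
    tn = np.logical_and(pred == 0, gt == 0).sum()
    return float((tp + tn) / pred.size)
```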
5-5: and (5) training the model according to the iteration times set in the step (5-1), and verifying the segmentation effect of the model by using the test set.
The Transformer-based breast cancer magnetic resonance imaging lesion segmentation method addresses the facts that breast cancer magnetic resonance imaging (MRI) often exhibits blurred lesion boundaries and uneven gray values inside the lesion, and that conventional convolutional networks suffer from spatial inductive bias and a limited receptive field during processing. It therefore proposes TransBC, a 3D medical image segmentation model combining a Transformer with convolution operations. The network follows the classical encoder-decoder architecture: a CNN encoder extracts feature representations at different levels in the down-sampling stage, and in the up-sampling stage a Transformer encoder repeatedly extracts long-range dependencies from the high-resolution feature maps to supplement and correct the low-resolution CNN features. The core of the model is to encode the high-resolution feature maps with a Transformer and extract long-range dependencies to supplement and correct the low-resolution CNN features. Experimental results on the breast cancer dataset also show that the model handles lesion edges more accurately and achieves better segmentation on difficult samples with uneven gray values inside the lesion.
The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (11)

1. A Transformer-based lesion segmentation method for breast cancer magnetic resonance imaging, characterized by: constructing a TransBC, wherein TransBC is an MRI lesion segmentation model based on a Transformer combined with 3D convolution, and the network of TransBC is an encoder-decoder structure; the encoder-decoder structure is divided into a down-sampling stage and an up-sampling stage,
the down-sampling stage is a CNN encoder used to extract feature representations at different levels;
and the up-sampling stage uses a Transformer encoder to repeatedly extract long-range dependencies from high-resolution feature maps, supplementing and correcting the low-resolution CNN features; in the up-sampling stage, the resolution of the feature maps is progressively restored to the original size by applying the up-sampler multiple times, and the output of the network is the label map of the medical image.
2. The Transformer-based lesion segmentation method for breast cancer magnetic resonance imaging according to claim 1, wherein: which comprises the following steps:
s1: collecting dynamic contrast-enhanced breast MRI (DCE-MRI) data and preprocessing the data;
s2: constructing a TransBC network;
s3: constructing an encoder of a TransBC network, wherein the encoder of the TransBC network comprises a bottleneck module and a down-sampling module;
s4: constructing a decoder of a TransBC network, wherein the decoder of the TransBC network comprises a Transformer module, a feature fusion module and an up-sampling module;
s5: and training and testing the TransBC network by using the training set and the testing set obtained in the step S1.
3. The Transformer-based lesion segmentation method for breast cancer magnetic resonance imaging according to claim 2, wherein: the preprocessing in S1 includes the following steps: collecting the patient breast cancer MRI data provided by a hospital, resampling the MRI images so that the spatial spacing is 1 mm, then cropping the MRI images so that the cropped images have a uniform size of (64, 64, 64); after the data preprocessing in S1 is completed, the collected data are divided into a training set and a testing set.
4. The Transformer-based lesion segmentation method for breast cancer magnetic resonance imaging according to claim 2, wherein: the S3 includes the following steps:
3-1: using a CNN encoder F_CNN(·);
3-1-1: constructing a bottleneck block, wherein the bottleneck block is designed using the classical residual structure from ResNet;
3-1-2: constructing a down-sampling block, wherein the down-sampling block is composed of 3D convolution layers;
3-1-3: setting the activation function of the convolution operations in the encoder F_CNN(·) to the ReLU function, defined as out(in) = max(0, in); the convolution kernel size is set to 2 × 2 and the stride to 2;
3-2: an input image x, with height H, width W, depth D and C channels, is passed through F_CNN(·); the resulting feature maps are f_l = F_CNN(x).
5. the Transformer-based lesion segmentation method for breast cancer magnetic resonance imaging according to claim 2, wherein: the S4 includes the following steps:
4-1: constructing a Transformer module;
4-2: designing a feature fusion module by referring to the CBAM;
4-3: constructing an up-sampling module that progressively restores the low-resolution feature maps to the original size.
6. The Transformer-based lesion segmentation method for breast cancer magnetic resonance imaging according to claim 5, wherein: the 4-1 comprises the following steps:
4-1-1: determining the input of a Transformer module;
4-1-2: the input of the transform module is a 3D picture block
Figure FDA0003556742560000024
Wherein H, W, D and C respectively represent the height, width, depth and channel number of the optical fiber;
4-1-3: adding position codes and using learnable position codes;
4-1-4: the Transformer encoder comprises a multi-head self-attention block and a multi-layer perceptron block, wherein the self-attention block is responsible for completing the calculation of query-key-value attention.
7. The Transformer-based lesion segmentation method for breast cancer magnetic resonance imaging according to claim 2, wherein: the 4-1-2 comprises the following steps: partitioning the picture, partitioning the feature map x along three dimensions of width, height and depth, and stacking the blocks; a scaling strategy for the block side lengths is then used.
8. The Transformer-based lesion segmentation method for breast cancer magnetic resonance imaging according to claim 2, wherein: the step of S5 comprises the following steps:
5-1: determining the basic architecture of the TransBC network, and initializing the connection weights of each network component, the number of residual units, the number of convolutional layers, the learning rate, the training step size, the optimizer, the number of iterations and the training batch size;
5-2: encoder F for inputting training set divided by S1 into TransBC networkCNN(. to obtain a down-sampled output XS
5-3: decoding the down-sampling result by using a Transformer module, a feature fusion module and an up-sampling module of a decoder part to obtain a model output value XU
5-4: evaluating the accuracy of model segmentation by adopting Dice, IoU and accuracy;
5-5: training the model according to the number of iterations set in step 5-1, and verifying the segmentation effect of the model with the test set.
9. The Transformer-based lesion segmentation method for breast cancer magnetic resonance imaging according to claim 2, wherein: in the 5-4:
the formula of Dice is:
Dice = 2|GT ∩ Pred| / (|GT| + |Pred|)
wherein GT represents the gold-standard binary image manually labeled by an expert, and Pred is the model prediction result;
IoU is given by the following formula:
IoU = |GT ∩ Pred| / |GT ∪ Pred|
like Dice, IoU measures the overlap between the network prediction image and the gold standard;
the accuracy is formulated as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
wherein TP represents true positives; TN represents true negatives; FP and FN represent false positives and false negatives.
10. The Transformer-based lesion segmentation method for breast cancer magnetic resonance imaging according to claim 2, wherein: in the 5-5:
to verify the segmentation performance of the model on breast cancer MRI, the images need to be processed to satisfy the input requirements of the model. The softmax function is applied to the output label map of the model, and the threshold is set to 0.5: if a value in the label map is greater than the threshold it is set to 1, and if it is less than 0.5 it is set to 0. After this processing, the label map corresponds voxel-by-voxel to the MRI image, where a voxel value of 0 represents non-lesion and a voxel value of 1 represents lesion.
11. An MRI lesion segmentation model, characterized in that: it is constructed using the Transformer-based breast cancer magnetic resonance imaging lesion segmentation method of any one of claims 1-9, wherein the MRI lesion segmentation model is a 3D medical image segmentation model based on a Transformer combined with 3D convolution, and the network of the MRI lesion segmentation model adopts an encoder-decoder structure; the encoder-decoder is divided into a down-sampling stage and an up-sampling stage.
CN202210277852.XA 2022-03-21 2022-03-21 Breast cancer magnetic resonance imaging focus segmentation method based on Transformer Withdrawn CN114596318A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210277852.XA CN114596318A (en) 2022-03-21 2022-03-21 Breast cancer magnetic resonance imaging focus segmentation method based on Transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210277852.XA CN114596318A (en) 2022-03-21 2022-03-21 Breast cancer magnetic resonance imaging focus segmentation method based on Transformer

Publications (1)

Publication Number Publication Date
CN114596318A true CN114596318A (en) 2022-06-07

Family

ID=81819682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210277852.XA Withdrawn CN114596318A (en) 2022-03-21 2022-03-21 Breast cancer magnetic resonance imaging focus segmentation method based on Transformer

Country Status (1)

Country Link
CN (1) CN114596318A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115489127A (en) * 2022-10-28 2022-12-20 四川大学华西医院 Bone space microstructure prosthesis construction method based on deep learning
CN115953781A (en) * 2023-03-14 2023-04-11 武汉昊博科技有限公司 Mammary gland artificial intelligence analysis system and method based on thermal chromatography image
CN115953781B (en) * 2023-03-14 2023-06-13 武汉昊博科技有限公司 Mammary gland artificial intelligence analysis system and method based on thermal tomography
CN116309650A (en) * 2023-05-22 2023-06-23 湖南大学 Medical image segmentation method and system based on double-branch embedded attention mechanism
CN116664590A (en) * 2023-08-02 2023-08-29 中日友好医院(中日友好临床医学研究所) Automatic segmentation method and device based on dynamic contrast enhancement magnetic resonance image
CN116664590B (en) * 2023-08-02 2023-10-13 中日友好医院(中日友好临床医学研究所) Automatic segmentation method and device based on dynamic contrast enhancement magnetic resonance image

Similar Documents

Publication Publication Date Title
CN114596318A (en) Breast cancer magnetic resonance imaging focus segmentation method based on Transformer
CN107610194B (en) Magnetic resonance image super-resolution reconstruction method based on multi-scale fusion CNN
CN111784671B (en) Pathological image focus region detection method based on multi-scale deep learning
CN112258526B (en) CT kidney region cascade segmentation method based on dual attention mechanism
CN114494296A (en) Brain glioma segmentation method and system based on fusion of Unet and Transformer
CN114037714B (en) 3D MR and TRUS image segmentation method for prostate system puncture
CN112862805B (en) Automatic auditory neuroma image segmentation method and system
CN116739985A (en) Pulmonary CT image segmentation method based on transducer and convolutional neural network
CN114663440A (en) Fundus image focus segmentation method based on deep learning
CN112132878A (en) End-to-end brain nuclear magnetic resonance image registration method based on convolutional neural network
CN115471470A (en) Esophageal cancer CT image segmentation method
CN115908800A (en) Medical image segmentation method
CN117274599A (en) Brain magnetic resonance segmentation method and system based on combined double-task self-encoder
Shan et al. SCA-Net: A spatial and channel attention network for medical image segmentation
CN116823850A (en) Cardiac MRI segmentation method and system based on U-Net and transducer fusion improvement
CN115809998A (en) Based on E 2 Glioma MRI data segmentation method based on C-Transformer network
CN117078941A (en) Cardiac MRI segmentation method based on context cascade attention
CN113256657B (en) Efficient medical image segmentation method and system, terminal and medium
CN112990359B (en) Image data processing method, device, computer and storage medium
CN117333750A (en) Spatial registration and local global multi-scale multi-modal medical image fusion method
CN116051609B (en) Unsupervised medical image registration method based on band-limited deformation Fourier network
CN116433654A (en) Improved U-Net network spine integral segmentation method
CN115984560A (en) Image segmentation method based on CNN and Transformer
CN116309679A (en) MLP-like medical image segmentation method suitable for multiple modes
Wang et al. Multi-scale hierarchical transformer structure for 3d medical image segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220607