CN114677403A - Liver tumor image segmentation method based on deep learning attention mechanism - Google Patents


Info

Publication number
CN114677403A
CN114677403A (application CN202111359719.0A)
Authority
CN
China
Prior art keywords
network
attention
scale
attention mechanism
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111359719.0A
Other languages
Chinese (zh)
Inventor
李春国
陆敬奔
张翅
冷天然
高振
孙希茜
杨绿溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202111359719.0A
Publication of CN114677403A
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T7/194 — Image analysis; segmentation; edge detection; involving foreground-background segmentation
    • G06N3/045 — Computing arrangements based on biological models; neural networks; combinations of networks
    • G06N3/08 — Neural networks; learning methods
    • G06N3/084 — Learning methods; backpropagation, e.g. using gradient descent
    • G06T2207/10081 — Image acquisition modality; tomographic images; computed x-ray tomography [CT]
    • G06T2207/20081 — Special algorithmic details; training; learning
    • G06T2207/20084 — Special algorithmic details; artificial neural networks [ANN]
    • G06T2207/30056 — Subject of image; biomedical image processing; liver; hepatic
    • G06T2207/30112 — Subject of image; industrial image inspection; baggage; luggage; suitcase


Abstract

The invention discloses a liver tumor image segmentation method based on a deep learning attention mechanism, belonging to the field of image processing. The invention integrates axial attention and multi-scale attention into a deep learning liver tumor segmentation network. Axial attention can effectively extract global information features from the liver tumor CT image while occupying only a small amount of computing resources, and multi-scale attention can perform adaptive feature extraction for multi-scale targets. The whole network adopts a U-shaped structure in which the backbone is a convolutional feature-extraction path and the auxiliary branch is an attention mechanism, which effectively improves liver tumor segmentation performance.

Description

Liver tumor image segmentation method based on deep learning attention mechanism
Technical Field
The invention relates to the technical field of image processing, in particular to a liver tumor image segmentation method based on a deep learning attention mechanism.
Background
Liver cancer seriously threatens human health. Among all cancers, liver cancer ranks sixth in incidence but fourth in mortality. Surgical resection of liver cancer lesions is an important curative treatment; it requires accurate excision of the tumor lesion region while preserving the integrity of non-lesion tissue. Only with accurate segmentation of the liver lesion region can doctors accurately diagnose, analyze and treat liver cancer patients. Accurate segmentation of liver tumors is a difficult research problem: in CT images of different patients, and even in different scanning slices of the same patient's CT volume, the position of the liver tumor is not fixed, and its size, number and shape change continuously from one scanning layer to the next; worse still, image noise and fuzzy tissue boundaries also seriously hinder segmentation of the lesion region.
Manual segmentation of lesions by experienced physicians or specialists is often used as the gold standard for medical image segmentation, but it is very time- and labor-consuming. More importantly, manual segmentation is not repeatable: it is limited by the physician's level, experience and even working condition. Because experience and skill differ, segmentation results for the same lesion vary from person to person; even two segmentations of the same tissue by the same physician may differ because of fatigue.
To overcome the shortcomings of manual segmentation, researchers have proposed semi-automatic segmentation methods, such as traditional medical image segmentation algorithms and machine learning algorithms; compared with manual segmentation, their efficiency is greatly improved. Traditional medical image segmentation algorithms, such as the Snakes algorithm, require an operator to manually place contour points along the liver tumor edge to form an initial contour, after which the algorithm automatically fits the final contour; this operation depends on the physician's subjective experience and knowledge, and its precision is not high enough. Machine learning methods depend heavily on hand-crafted features, and the quality of feature extraction directly determines the final liver tumor segmentation result, so their precision is clearly lower than that of manual segmentation, and their performance is still limited to some extent by the physician's level and experience. Modern medicine needs an efficient, convenient, practical, highly repeatable and accurate fully automatic segmentation method.
The advent of deep learning has made it a research hotspot in computer vision; in particular, CNNs have driven progress in the image field. Deep learning uses neural networks to extract features automatically, overcoming the limitations of the hand-crafted features used by traditional algorithms, and has been widely applied to medical image processing.
The earliest deep learning liver tumor segmentation methods were based on image blocks (patches): each pixel of the whole image is taken as a center point, an image block is cropped around it, the block is fed into a CNN (convolutional neural network) to extract features and predict a category, and the predicted category is assigned to the block's center pixel. Patch-based image segmentation has two major drawbacks. First, it is inefficient: every pixel must be classified and an image block cropped around every pixel, so adjacent pixels inevitably share many repeated computations; when the image resolution is high, this wastes a large amount of computing resources. Second, global information is lost: because an image block centered on a single pixel is used as the training sample, image information outside the block is discarded and global context is ignored, which is detrimental to high-precision liver tumor segmentation.
Long et al. proposed the fully convolutional network (FCN), built on the convolutional neural network, which advanced the field of image segmentation. The FCN has no fully connected layers and contains only convolutional and pooling layers. It overcomes the defects of the patch-based method and truly realizes end-to-end, pixel-to-pixel training and prediction; no global information is lost during segmentation, which improves accuracy to a certain extent. Compared with patch-based segmentation, the FCN has no computational redundancy, so efficiency is also improved. A further advantage of the FCN is that it does not constrain the spatial size of the input image, whereas in conventional convolutional neural networks the input size is fixed by the number of neurons in the fully connected layer.
On the basis of the FCN, researchers have proposed many new network models; the most famous is the U-Net, and many later networks are improvements on U-Net, from which numerous U-shaped variant networks have been derived.
Conventional convolutional neural networks do not perform well at multi-scale feature extraction or global information modeling. In the liver tumor segmentation task, the size, shape, position and number of tumors differ between patients, and even within the same patient, which poses great challenges for automatic fine segmentation. In addition, some lesion boundaries are unclear and noisy, which limits edge segmentation performance. Clearly, the final segmentation result is strongly challenged by the multi-scale and global spatial modeling problems.
Attention mechanisms were originally proposed, and successfully applied, in natural language processing and machine translation. In computer vision, some scholars have likewise explored using attention mechanisms in convolutional neural networks to improve performance; spatial attention, channel attention and self-attention have all proven beneficial. In deep learning image segmentation there have also been attempts at attention mechanisms, such as integrating channel attention, or Transformers containing self-attention, into U-Net.
The image segmentation task can be viewed as a pixel-level classification problem, which makes it suitable for cross-entropy supervised learning with backpropagation. In the liver lesion segmentation task, however, there is a severe class imbalance between lesion and background, which sets it apart from conventional segmentation tasks. The background loss computed by an ordinary cross-entropy function is far larger than the foreground (object-to-be-segmented) loss, which drives the network parameters toward optimizing background segmentation. To solve this problem, the background and foreground (the liver nodules to be segmented) are given weights in the cross entropy, assigning the nodules a larger loss weight so as to emphasize their importance to the network's classification.
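The weighting described above can be sketched as a small numpy implementation; the class weights `w_fg` and `w_bg` below are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def weighted_cross_entropy(probs, labels, w_fg=0.9, w_bg=0.1, eps=1e-7):
    """Pixel-wise weighted binary cross entropy.
    probs:  predicted foreground probabilities, shape (H, W)
    labels: ground-truth mask in {0, 1}, shape (H, W)
    w_fg / w_bg: assumed class weights; giving the foreground (nodule)
    a larger weight counteracts the lesion/background imbalance."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    loss = -(w_fg * labels * np.log(probs)
             + w_bg * (1.0 - labels) * np.log(1.0 - probs))
    return loss.mean()
```

With these weights an uncertain prediction on a foreground pixel costs nine times as much as the same prediction on a background pixel, so gradient updates no longer favor background-only solutions.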
Disclosure of Invention
The technical problem to be solved by the invention is to provide a liver tumor image segmentation method based on a deep learning attention mechanism, realized as a semantic segmentation image processing network. Two attention mechanisms are designed according to the characteristics of liver tumors, which improve the semantic segmentation of liver tumor images by the original U-shaped network.
For multi-scale feature extraction, a dynamic multi-scale attention mechanism is provided that assigns adaptive weights to multi-scale convolutions. This scale attention fuses receptive fields of multiple scales, which benefits the segmentation of multi-scale targets. For global modeling of spatial information, an axial attention mechanism is provided that can effectively model long-range spatial dependencies. The two attention mechanisms are organically combined through adaptive global pooling and embedded into a U-shaped network to improve liver tumor segmentation performance.
The invention relates to a liver tumor image segmentation method based on a deep learning attention mechanism, which specifically comprises the following steps of:
step 1: image acquisition and image pre-processing
Slicing the 3D CT image to obtain two-dimensional CT slice images; setting an appropriate window width for the CT slices to remove regions of the image irrelevant to the liver tumor segmentation task; and processing the liver CT with operations such as normalization and standardization;
Step 2: building a U-shaped attention mechanism network
Firstly, a U-shaped frame is constructed, an axial attention mechanism and a multi-scale attention mechanism are added on the basis of the U-shaped network, and the axial attention mechanism and the multi-scale attention mechanism are combined in a self-adaptive pooling mode;
Step 3: training the network
Selecting images from the data processed in step 1 and dividing them into a training set, a test set and a validation set, then training the network model with this data set to obtain a converged network model;
Step 4: liver tumor image segmentation
Inputting the liver tumor CT slices to be processed into the network for fully automatic liver tumor segmentation.
Advantageous effects:
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. For multi-scale feature extraction, the invention provides a dynamic multi-scale attention mechanism that assigns adaptive weights to multi-scale convolutions. This scale attention fuses receptive fields of multiple scales, benefiting the segmentation of multi-scale targets.
2. For global modeling of spatial information, the invention provides an axial attention mechanism that can effectively model long-range spatial dependencies. The two attention mechanisms are organically combined through adaptive global pooling and embedded into a U-shaped network to improve liver tumor segmentation performance.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
fig. 2 is a schematic diagram of a general U-type network;
FIG. 3 is a schematic diagram of a multi-scale attention mechanism;
FIG. 4 is a schematic diagram of an axial attention mechanism;
FIG. 5 is a schematic diagram of a structure that organically combines two kinds of attention in an adaptive pooling manner;
FIG. 6 is a result of a comparison of the present invention and a prior art method on a LiTS dataset;
FIG. 7 is the test results of the present invention on 3D-IRCADB and hospital provided test sets.
Detailed Description
The liver tumor image segmentation method based on the deep learning attention mechanism of the invention has an implementation flow chart shown in fig. 1, and specifically comprises the following steps:
step 1: image acquisition and image pre-processing
First, the 3D CT image is sliced to obtain two-dimensional CT slice images; the spatial resolution of the original CT slices is 512 × 512. Then an appropriate window level and window width are set for the CT slices, and the slices are windowed to remove regions irrelevant to the liver tumor segmentation task: the reference Hounsfield value of the liver is taken as the window level, and the window width is set to 200 according to an empirical value. Next, the CT images and their corresponding segmentation labels are peripherally cropped to remove excess completely black background and so ease network training; the spatial resolution of the cropped slices is 400 × 400. Finally, the liver CT is processed with operations such as normalization and standardization;
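A minimal sketch of this preprocessing step in numpy; the window width of 200 HU is the patent's empirical value, while the window level of 50 HU is an assumption (the patent only says the liver's reference Hounsfield value is used):

```python
import numpy as np

def window_and_normalize(ct_slice, level=50, width=200):
    """Clip Hounsfield values to a liver window and rescale to [0, 1].
    level=50 HU is an assumed liver window level; width=200 HU follows
    the patent's empirical setting."""
    lo, hi = level - width / 2.0, level + width / 2.0
    clipped = np.clip(ct_slice.astype(float), lo, hi)
    return (clipped - lo) / (hi - lo)

def center_crop(img, size=400):
    """Peripheral cut of a 512x512 slice down to 400x400, discarding
    the completely black border around the body."""
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]
```

The same crop must of course be applied to the segmentation label so that image and mask stay aligned.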
Step 2: building U-shaped attention mechanism network
First, a U-shaped framework is constructed; an axial attention mechanism and a multi-scale attention mechanism are added to the U-shaped network and combined through adaptive pooling. The U-shaped attention network comprises an encoding network and a decoding network. The encoding network contains the axial attention mechanism and the multi-scale attention mechanism and is responsible for feature extraction: the global modeling of axial attention is used to build global dependencies and fully extract information, while the multi-scale feature extraction capability of multi-scale attention performs adaptive feature extraction on multi-scale input; finally the two are organically combined through adaptive pooling to fully extract semantic information. The features of the encoding network are also fed to the decoding network to assist its semantic segmentation.
The encoding network comprises four layers; each layer contains a multi-scale feature-extraction trunk based on the multi-scale attention mechanism and an attention bypass based on axial attention. The bypass extracts effective global information and assigns adaptive selection weights to the multi-scale features extracted by the trunk;
the decoding network also comprises four layers, and each layer corresponds to each layer of the coding network; each layer of the decoding network is formed by deconvolution;
Information is transmitted between the encoding and decoding networks through skip connections: the low-level semantic information of each encoder layer flows into the corresponding decoder layer to help the high-level semantic information achieve accurate segmentation.
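A toy numpy sketch of the skip-connection idea in a U-shaped pass; average pooling and nearest-neighbour upsampling stand in for the learned convolution and deconvolution layers, and a channel-wise stack stands in for concatenation:

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling: one downsampling step of the encoder."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour upsampling: a stand-in for the learned
    deconvolution used in each decoder layer."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def u_shape_forward(x):
    """Toy one-level U-shaped pass: the encoder feature e1 is saved and
    rejoined with the upsampled decoder feature via a skip connection."""
    e1 = x                       # low-level encoder feature
    e2 = avg_pool2(e1)           # bottleneck feature
    d1 = upsample2(e2)           # decoder output at the e1 resolution
    return np.stack([e1, d1])    # skip connection joins both
```

The point of the stack in the last line is that the decoder sees both the coarse, semantically rich feature `d1` and the spatially precise low-level feature `e1`, which is what restores sharp tumor boundaries.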
The U-shaped attention mechanism network comprises a multi-scale attention mechanism, and can realize self-adaptive multi-scale feature extraction.
The multi-scale attention mechanism comprises two parts, multi-scale feature extraction and adaptive weighting; first, the multi-scale features are extracted:
F(p) = concat(k_1(p), k_2(p), …, k_n(p))
where {k_1(·), k_2(·), …, k_n(·)} is the set of n multi-scale convolutions with different receptive fields, concat(·) denotes feature-map splicing, F(·) denotes a feature map, k(·) a convolution kernel, and p an element of the feature map.
Then the adaptive weights of the multi-scale features are generated; the weights are produced from the feature map itself and pass through global average pooling, global maximum pooling and weight-generation layers:
Z_avg(c) = (1 / (H × W)) Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} X(c, i, j)
Z_max(c) = max_{0≤i≤W−1, 0≤j≤H−1} X(c, i, j)
Weights = σ(f(δ(f(Z_avg + Z_max))))
Z_avg(c) denotes global average pooling, where H and W are the two spatial dimensions of the input feature map X; Z_max(c) denotes global maximum pooling, with max the maximum function; Weights is the generated multi-scale weight vector, f a fully connected layer, δ the ReLU activation function, and σ the Sigmoid activation function.
Finally, the feature maps of different scales extracted by the multi-scale convolutions are weighted with the generated channel-level weights to obtain the adaptively weighted multi-scale attention result.
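The weight-generation path and channel-level weighting above can be sketched in numpy as follows; the two fully connected layers are randomly initialised here for illustration, whereas in the patent's network they are learned:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def scale_attention(x, W1, W2):
    """Adaptive channel weights for a stacked multi-scale feature map.
    x:  (C, H, W) features, one channel group per convolution scale.
    W1, W2: the two fully connected layers f of the weight-generation
    path. Implements Weights = sigma(f(ReLU(f(Z_avg + Z_max)))) and
    applies the weights channel-wise."""
    z_avg = x.mean(axis=(1, 2))                     # Z_avg(c)
    z_max = x.max(axis=(1, 2))                      # Z_max(c)
    hidden = np.maximum(0.0, W1 @ (z_avg + z_max))  # ReLU(f(...))
    weights = sigmoid(W2 @ hidden)                  # one weight per channel
    return x * weights[:, None, None], weights
```

Because the weights come from the feature map itself, a slice dominated by small tumors can up-weight the small-receptive-field paths while a slice with a large tumor favors the larger ones.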
The U-shaped attention mechanism network comprises an axial attention mechanism, and effective global dependency modeling can be achieved.
The axial attention mechanism comprises three parts: transverse-axis attention, longitudinal-axis attention, and an output module. First, the input passes through the horizontal-axis and vertical-axis attention to capture global dependencies in the transverse and longitudinal directions:
X_h = Matmul(Softmax(Matmul(Q_h, K_h^T) / √D_h), V_h)
X_w = permute(Matmul(Softmax(Matmul(Q_w, K_w^T) / √D_w), V_w))
Matmul(·) denotes matrix multiplication and Softmax(·) the Softmax function; Q_h, K_h and V_h are the inputs in the transverse direction, and Q_w, K_w and V_w the inputs in the longitudinal direction; D_h and D_w are the dimensions of the transverse and longitudinal inputs; permute is a dimension-permutation function that restores the dimension order of the longitudinal direction.
The lateral attention and the longitudinal attention can model the global dependency in the lateral direction and the longitudinal direction respectively, and then the final spatial global attention is obtained through an output module:
Weights = Softmax(conv(concat(X_h, X_w)))
X_h and X_w are the global dependencies output in the transverse and longitudinal directions; Weights is the heat map of the final spatial global-attention weights; Softmax(·) denotes the Softmax function, conv(·) a 3 × 3 convolution, and concat(·) feature-map splicing;
The spatial dimensions of the feature map are then weighted by the generated spatial global-attention heat map, finally yielding the weighted result of the axial attention mechanism.
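A numpy sketch of axis-wise attention, assuming the standard scaled dot-product form and using the input itself as Q, K and V (the network would use learned projections). It shows why the mechanism is cheap: attention runs independently per row or per column, so the cost per axis is O(L²) rather than the O((H·W)²) of full 2-D self-attention:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_rows(q, k, v):
    """Scaled dot-product attention computed independently per row.
    q, k, v: (rows, length, dim)."""
    d = q.shape[-1]
    scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)
    return scores @ v

def axial_attention(x):
    """Transverse- and longitudinal-axis attention on a feature map x
    of shape (H, W, D); x serves as Q, K and V in this illustration."""
    xh = attend_rows(x, x, x)                        # along the width
    xt = x.transpose(1, 0, 2)                        # swap H and W
    xw = attend_rows(xt, xt, xt).transpose(1, 0, 2)  # along the height, permuted back
    return xh, xw
```

The final `transpose` plays the role of the permute(·) function in the formulas, restoring the vertical branch to the original dimension order before the two outputs are concatenated.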
Finally, the axial attention mechanism is organically combined with the multi-scale attention mechanism: specifically, axial attention is combined with global average pooling and global maximum pooling to construct adaptive global pooling. Because axial attention effectively models global spatial information, it can compensate for the information lost by plain global pooling, especially the detail information that is important for segmentation. A block diagram combining axial attention and multi-scale attention is shown in fig. 5.
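One way to read "adaptive global pooling" is pooling whose spatial weights come from the attention heat map rather than being uniform; the sketch below is an assumption about that combination, not the patent's exact module:

```python
import numpy as np

def adaptive_global_pool(x, attn_logits):
    """Pooling weighted by a spatial attention heat map.
    x: (C, H, W) feature map; attn_logits: (H, W) logits assumed to
    come from the axial-attention bypass. With uniform logits this
    reduces to plain global average pooling."""
    e = np.exp(attn_logits - attn_logits.max())
    w = e / e.sum()                                # heat map, sums to 1
    return (x * w[None, :, :]).sum(axis=(1, 2))    # one value per channel
```

Unlike plain average pooling, positions that the attention map marks as lesion-relevant contribute more to the pooled descriptor, which is how detail information survives the pooling step.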
In the liver tumor image segmentation task, weighted cross entropy is used to address the severe sample imbalance between background and tumor. Since image segmentation can be viewed as pixel-level classification, it is suitable for cross-entropy supervised learning with backpropagation. In liver lesion segmentation there is a severe imbalance between lesion and background, unlike conventional segmentation tasks: the background loss computed by an ordinary cross-entropy function is far larger than the foreground (object-to-be-segmented) loss, driving the network parameters toward optimizing background segmentation. To solve this, the background and foreground (the liver nodules to be segmented) are weighted in the cross entropy, giving the nodules a larger loss weight to emphasize their importance to the network's classification.
Step 3: training the network
Images are selected from the data processed in step 1 and divided into a training set, a test set and a validation set, and the network model is trained with this data set until it converges. During model training, a transfer learning strategy is used to reduce optimization difficulty and accelerate training. Specifically, a U-shaped base network is first trained on the processed data set; the U-shaped network is a basic image semantic segmentation network with a simple structure that nevertheless contains the components necessary for semantic segmentation: an encoder, a decoder and skip connections. Then, starting from the trained U-shaped network parameters, the multi-scale attention mechanism and axial attention mechanism proposed by the invention are added, and the parameters of the whole network are further updated by training. The network is trained on an Nvidia GTX 1080 Ti GPU, and training stops when the network converges.
Step 4: liver tumor image segmentation
The same preprocessing as used for the data set is applied to the liver tumor CT slices to be processed, and the processed slices are then input into the network for fully automatic liver tumor segmentation.
As shown in fig. 6, the invention (the attention network, second column in the figure) achieves the best performance among the methods tested on the partitioned LiTS dataset. For CT slices with complex backgrounds and severe noise, the method outperforms the other algorithms. In particular, although the CT slice noise is significantly higher in (a) and (e), the method still accurately captures the small lesions, whereas the other networks produce more or fewer false positive and false negative predictions.
As shown in FIG. 7, to demonstrate generalization, the invention was tested on 3D-IRCADB data not involved in training and on a liver tumor data set provided by Shandong Provincial Hospital. The first and third rows are original CT slices; the second and fourth rows are the corresponding segmentation results, where the red area is the tumor predicted by the invention. The results show that the invention has a strong ability to capture small target tumors and can effectively capture liver tumors of different sizes with good segmentation results, as in the segmentation examples (e), (f), (g) and (h) of fig. 7, which contain multiple tumors at multiple scales.
The technical means disclosed by the invention are not limited to those disclosed in the above embodiments, but also include technical solutions formed by any combination of the above technical features. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the invention, and these improvements and modifications are also considered within the scope of the invention.

Claims (7)

1. The liver tumor image segmentation method based on the deep learning attention mechanism is characterized by comprising the following steps:
step 1: image acquisition and image preprocessing;
firstly, slicing a 3D CT image to obtain two-dimensional CT slice images; then setting appropriate window level and window width values for the CT slices to remove regions of the image irrelevant to the liver tumor segmentation task; then peripherally cropping the CT images and their corresponding segmentation labels to remove excess completely black background regions and so ease network training; and finally processing the liver CT with normalization, standardization and similar operations;
step 2: constructing a U-shaped attention mechanism network;
firstly, a U-shaped frame is constructed, an axial attention mechanism and a multi-scale attention mechanism are added on the basis of the U-shaped network, and the axial attention mechanism and the multi-scale attention mechanism are combined in a self-adaptive pooling mode; extracting multi-scale features as main roads of the network, combining axial attention and global pooling in the multi-scale attention to form self-adaptive global pooling, and generating more effective multi-scale self-adaptive weight;
step 3: training a network;
selecting images from the data processed in step 1 and dividing them into a training set, a test set and a validation set, and training the network model with this data set to obtain a converged network model;
step 4: segmenting a liver tumor image;
and (3) implementing a preprocessing method which is the same as the data set on the liver tumor CT slice to be processed, and then inputting the processed slice into a network to perform full-automatic liver tumor segmentation.
2. The liver tumor image segmentation method based on the deep learning attention mechanism as claimed in claim 1, wherein the U-shaped attention mechanism network comprises an encoding network and a decoding network; the encoding network comprises an axial attention mechanism and a multi-scale attention mechanism and is responsible for feature extraction: the global modeling of axial attention constructs global dependencies so that information is fully extracted, the multi-scale feature extraction capability of multi-scale attention performs adaptive feature extraction on multi-scale input, and the two are organically combined through adaptive pooling to fully extract semantic information; meanwhile, the features of the encoding network are also fed into the decoding network to assist its semantic segmentation;
the encoding network comprises four layers; each layer comprises a multi-scale feature extraction trunk based on the multi-scale attention mechanism and an attention bypass based on axial attention; the bypass extracts effective global information and assigns adaptive selection weights to the multi-scale features extracted by the main path;
the decoding network also comprises four layers, each corresponding to a layer of the encoding network; each layer of the decoding network is formed by deconvolution;
information is transmitted between the encoding network and the decoding network through skip connections: the low-level semantic information of each encoding layer flows into the corresponding decoding layer to assist the high-level semantic information in achieving accurate segmentation.
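As an illustrative sketch (not the patented implementation), the encoder/decoder pairing and skip connection described above can be mimicked in NumPy: max pooling stands in for the encoder downsampling, nearest-neighbour upsampling stands in for the decoder's deconvolution layers, and the shapes are arbitrary assumptions:

```python
import numpy as np

def down(x):
    """One encoder downsampling step: 2x2 max pooling, (C, H, W) -> (C, H//2, W//2)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def up(x):
    """Nearest-neighbour 2x upsampling, a stand-in for the decoder's deconvolution."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# One encoding/decoding level joined by a skip connection:
enc = np.random.default_rng(1).standard_normal((8, 16, 16))  # low-level encoder features
dec = up(down(enc))                                          # decoder restores the spatial size
fused = np.concatenate([enc, dec], axis=0)                   # skip connection: channel-wise splicing
```

The channel-wise concatenation is what lets the low-level semantic information of the encoder assist the high-level decoder features, as the claim describes.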
3. The liver tumor image segmentation method based on the deep learning attention mechanism as claimed in claim 1, wherein the U-shaped attention mechanism network in step 3 comprises a multi-scale attention mechanism that realizes adaptive multi-scale feature extraction;
the multi-scale attention mechanism comprises a multi-scale feature extraction part and an adaptive weight part; the multi-scale feature extraction is shown as formula (I):
F(p) = Concat(Fk1(p), Fk2(p), …, Fkn(p)) (I)
in formula (I), Fk1(·), …, Fkn(·) are the n branches of multi-scale convolution with different receptive fields, Concat(·) denotes feature map splicing, F(·) denotes a feature map, k denotes a convolution kernel, and p is an element of the feature map;
the adaptive weights of the multi-scale feature map are generated from the feature map itself through global average pooling, global maximum pooling and a weight generation layer; the three processes are shown in formulas (II), (III) and (IV):
Zavg(c) = (1/(W·H)) Σ0≤i≤W-1 Σ0≤j≤H-1 X(c,i,j) (II)
Zmax(c) = Max(X(c,i,j)), 0≤i≤W-1, 0≤j≤H-1 (III)
Weights = σ(f(δ(f(Zavg + Zmax)))) (IV)
in formulas (II), (III) and (IV), Zavg(c) denotes global average pooling, H and W denote the two spatial dimensions of the input feature map, and X denotes the input feature map; Zmax(c) denotes global maximum pooling, and Max denotes the maximum function; Weights denotes the generated multi-scale weights, f is a fully connected layer, δ is the ReLU activation function, and σ denotes the Sigmoid activation function;
the generated weights are applied to the feature maps of different scales extracted by the multi-scale convolution, yielding the adaptively weighted multi-scale attention result.
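The weight-generation chain of formulas (II)-(IV) can be sketched in NumPy as follows; the channel count, hidden width and the random fully connected weights w1 and w2 are illustrative assumptions, not values from the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multiscale_weights(x, w1, w2):
    """Adaptive channel weights per formulas (II)-(IV).

    x  : feature map of shape (C, H, W)
    w1 : first fully connected layer, shape (C_mid, C)
    w2 : second fully connected layer, shape (C, C_mid)
    """
    z_avg = x.mean(axis=(1, 2))                     # formula (II): global average pooling -> (C,)
    z_max = x.max(axis=(1, 2))                      # formula (III): global maximum pooling -> (C,)
    hidden = np.maximum(w1 @ (z_avg + z_max), 0.0)  # inner f then ReLU (delta)
    return sigmoid(w2 @ hidden)                     # formula (IV): per-channel weights in (0, 1)

# Usage: weight a 4-channel feature map with its own adaptive weights
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w1 = rng.standard_normal((2, 4)) * 0.1
w2 = rng.standard_normal((4, 2)) * 0.1
w = multiscale_weights(x, w1, w2)
weighted = x * w[:, None, None]   # broadcast weights over the spatial dimensions
```

This is the SE-style squeeze-and-excitation pattern that the claim describes: the feature map generates its own channel weights from pooled statistics.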
4. The liver tumor image segmentation method based on the deep learning attention mechanism as claimed in claim 1, wherein the U-shaped attention mechanism network in step 3 comprises an axial attention mechanism that realizes effective global dependency modeling;
the axial attention mechanism comprises three parts: horizontal-axis attention, vertical-axis attention and an output module; the horizontal-axis and vertical-axis attention are shown in formulas (V) and (VI):
Xh = Matmul(Softmax(Matmul(Qh, Kh^T)/√Dh), Vh) (V)
Xw = permute(Matmul(Softmax(Matmul(Qw, Kw^T)/√Dw), Vw)) (VI)
in formulas (V) and (VI), Matmul(·) denotes matrix multiplication, Softmax(·) denotes the Softmax function, Qh, Kh and Vh denote the horizontal-direction inputs, Qw, Kw and Vw denote the vertical-direction inputs, Dh and Dw denote the dimensionality of the horizontal and vertical inputs respectively, and permute denotes a dimension permutation function that restores the dimension order of the vertical input;
the horizontal attention and the vertical attention model global dependencies in the horizontal and vertical directions respectively; the final spatial global attention is then obtained through the output module, as shown in formula (VII):
Weights=Softmax(conv(concat(Xh,Xw))) (VII)
in formula (VII), Xh and Xw are the horizontal and vertical global dependencies computed by formulas (V) and (VI), Weights denotes the heat map of the finally output spatial global attention weights, Softmax(·) denotes the Softmax function, conv(·) denotes a 3×3 convolution, and concat(·) denotes feature map splicing;
the generated spatial global attention weight heat map then weights the feature map along its spatial dimensions, giving the final weighting result of the axial attention mechanism.
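A minimal NumPy sketch of the axial attention of formulas (V) and (VI), treating each row (horizontal axis) and each column (vertical axis) as an attention sequence; the (H, W, D) layout and the head-free single-matrix form are simplifying assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention_h(q, k, v):
    """Horizontal-axis attention, formula (V): each of the H rows attends over its W positions.
    q, k, v: arrays of shape (H, W, D)."""
    d = q.shape[-1]
    scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)   # (H, W, W)
    return scores @ v                                                   # (H, W, D)

def axial_attention_w(q, k, v):
    """Vertical-axis attention, formula (VI): each of the W columns attends over its H positions;
    the final transpose plays the role of permute, restoring the (H, W, D) layout."""
    qt, kt, vt = (t.transpose(1, 0, 2) for t in (q, k, v))              # (W, H, D)
    d = qt.shape[-1]
    scores = softmax(qt @ kt.transpose(0, 2, 1) / np.sqrt(d), axis=-1)  # (W, H, H)
    return (scores @ vt).transpose(1, 0, 2)                             # back to (H, W, D)

# Usage on a toy 4x5 feature map with depth 3:
rng = np.random.default_rng(2)
q, k, v = (rng.standard_normal((4, 5, 3)) for _ in range(3))
xh = axial_attention_h(q, k, v)   # horizontal global dependency, input to formula (VII)
xw = axial_attention_w(q, k, v)   # vertical global dependency, input to formula (VII)
```

Factoring full 2D self-attention into these two 1D passes is what keeps the global modeling affordable: each position attends along its row and its column rather than over all H×W positions at once.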
5. The liver tumor image segmentation method based on the deep learning attention mechanism as claimed in claim 1, wherein in constructing the U-shaped attention mechanism network in step 3, the axial attention mechanism and the multi-scale attention mechanism are organically combined: the axial attention mechanism is combined with global average pooling and global maximum pooling to construct adaptive global pooling, and the adaptive global pooling result is then used to generate the adaptive weights of the multi-scale feature map.
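The claim does not spell out the pooling formula, so the following NumPy sketch is one plausible reading: the axial-attention heat map (normalized to sum to 1 over space) replaces the uniform averaging kernel of plain global pooling, making the pooled statistic content-adaptive:

```python
import numpy as np

def adaptive_global_pool(x, attn):
    """Attention-weighted spatial pooling (one plausible reading of claim 5).

    x    : feature map of shape (C, H, W)
    attn : spatial attention heat map of shape (H, W), summing to 1 over space
    Returns a (C,) vector, like global pooling, but weighted by attention.
    """
    return (x * attn[None]).sum(axis=(1, 2))

# With a uniform heat map this reduces exactly to global average pooling:
x = np.arange(24, dtype=float).reshape(2, 3, 4)
uniform = np.full((3, 4), 1.0 / 12)
pooled = adaptive_global_pool(x, uniform)
```

Under this reading, a non-uniform heat map from the axial attention biases the pooled statistic toward salient regions before the multi-scale weights of formula (IV) are generated.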
6. The liver tumor image segmentation method based on the deep learning attention mechanism as claimed in claim 1, wherein weighted cross entropy is used to address the severe sample imbalance between background and tumor in the liver tumor segmentation task.
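Weighted cross entropy counters the imbalance by penalizing errors on the rare tumor pixels more heavily than errors on the abundant background; a NumPy sketch with an illustrative foreground weight (the patent does not specify the value):

```python
import numpy as np

def weighted_cross_entropy(probs, labels, w_fg=10.0, w_bg=1.0):
    """Pixel-wise weighted binary cross entropy.

    probs  : predicted tumor probabilities, values in (0, 1)
    labels : binary ground truth of the same shape (1 = tumor, 0 = background)
    w_fg   : weight on tumor pixels (illustrative assumption, not from the patent)
    """
    eps = 1e-7
    probs = np.clip(probs, eps, 1.0 - eps)  # guard against log(0)
    loss = -(w_fg * labels * np.log(probs) + w_bg * (1 - labels) * np.log(1 - probs))
    return loss.mean()

# Missing a tumor pixel now costs far more than an equally confident false positive:
miss_tumor = weighted_cross_entropy(np.array([0.1]), np.array([1.0]))
false_pos = weighted_cross_entropy(np.array([0.9]), np.array([0.0]))
```

With w_fg = 10, predicting 0.1 on a tumor pixel incurs roughly ten times the loss of predicting 0.9 on a background pixel, which pushes the network away from the trivial all-background solution.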
7. The liver tumor image segmentation method based on the deep learning attention mechanism as claimed in claim 1, wherein in the network training process of step 4, the network is trained using an Nvidia GTX 1080 Ti GPU, and training stops when the network converges.
CN202111359719.0A 2021-11-17 2021-11-17 Liver tumor image segmentation method based on deep learning attention mechanism Pending CN114677403A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111359719.0A CN114677403A (en) 2021-11-17 2021-11-17 Liver tumor image segmentation method based on deep learning attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111359719.0A CN114677403A (en) 2021-11-17 2021-11-17 Liver tumor image segmentation method based on deep learning attention mechanism

Publications (1)

Publication Number Publication Date
CN114677403A 2022-06-28

Family

ID=82070370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111359719.0A Pending CN114677403A (en) 2021-11-17 2021-11-17 Liver tumor image segmentation method based on deep learning attention mechanism

Country Status (1)

Country Link
CN (1) CN114677403A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115578404A (en) * 2022-11-14 2023-01-06 南昌航空大学 Liver tumor image enhancement and segmentation method based on deep learning
CN115937423A (en) * 2022-12-13 2023-04-07 西安电子科技大学 Three-dimensional intelligent reconstruction method for liver tumor medical image
CN115937423B (en) * 2022-12-13 2023-08-15 西安电子科技大学 Three-dimensional intelligent reconstruction method for liver tumor medical image
CN115984296A (en) * 2023-03-21 2023-04-18 译企科技(成都)有限公司 Medical image segmentation method and system applying multi-attention mechanism
CN115984296B (en) * 2023-03-21 2023-06-13 译企科技(成都)有限公司 Medical image segmentation method and system applying multi-attention mechanism
CN116309543A (en) * 2023-05-10 2023-06-23 北京航空航天大学杭州创新研究院 Image-based circulating tumor cell detection equipment
CN116309543B (en) * 2023-05-10 2023-08-11 北京航空航天大学杭州创新研究院 Image-based circulating tumor cell detection equipment
CN116758048A (en) * 2023-07-06 2023-09-15 河北大学 PET/CT tumor periphery feature extraction system and extraction method based on Transformer
CN116758048B (en) * 2023-07-06 2024-02-27 河北大学 PET/CT tumor periphery feature extraction system and extraction method based on Transformer
CN117058468A (en) * 2023-10-11 2023-11-14 青岛金诺德科技有限公司 Image recognition and classification system for recycling lithium batteries of new energy automobiles
CN117058468B (en) * 2023-10-11 2023-12-19 青岛金诺德科技有限公司 Image recognition and classification system for recycling lithium batteries of new energy automobiles

Similar Documents

Publication Publication Date Title
CN114677403A (en) Liver tumor image segmentation method based on deep learning attention mechanism
CN107154043B (en) Pulmonary nodule false positive sample inhibition method based on 3DCNN
CN110675406A (en) CT image kidney segmentation algorithm based on residual double-attention depth network
CN109886986A (en) A kind of skin lens image dividing method based on multiple-limb convolutional neural networks
CN107203989A (en) End-to-end chest CT image dividing method based on full convolutional neural networks
CN115661144B (en) Adaptive medical image segmentation method based on deformable U-Net
CN110930416A (en) MRI image prostate segmentation method based on U-shaped network
CN110570432A (en) CT image liver tumor segmentation method based on deep learning
CN112927255A (en) Three-dimensional liver image semantic segmentation method based on context attention strategy
CN111445474B (en) Kidney CT image segmentation method based on bidirectional re-attention depth network
CN110675411A (en) Cervical squamous intraepithelial lesion recognition algorithm based on deep learning
CN110782427B (en) Magnetic resonance brain tumor automatic segmentation method based on separable cavity convolution
CN114972248A (en) Attention mechanism-based improved U-net liver tumor segmentation method
CN112288749A (en) Skull image segmentation method based on depth iterative fusion depth learning model
CN115409832A (en) Triple negative breast cancer classification method based on ultrasound image and omics big data
CN111784713A (en) Attention mechanism-introduced U-shaped heart segmentation method
Akkar et al. Diagnosis of lung cancer disease based on back-propagation artificial neural network algorithm
CN114943721A (en) Neck ultrasonic image segmentation method based on improved U-Net network
CN115661165A (en) Glioma fusion segmentation system and method based on attention enhancement coding and decoding network
CN113344933B (en) Glandular cell segmentation method based on multi-level feature fusion network
Bhuvaneswari et al. Contrast enhancement of retinal images using green plan masking and whale optimization algorithm
CN115018864B (en) Three-stage liver tumor image segmentation method based on self-adaptive preprocessing
CN113192076B (en) MRI brain tumor image segmentation method combining classification prediction and multi-scale feature extraction
CN115661170A (en) Method, device and medium for automatically segmenting abdomen three-dimensional CT image
Krishna et al. MLRNet: Skin lesion segmentation using hybrid Gaussian guided filter with CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination