CN114581459A

CN114581459A - Improved 3D U-Net model-based segmentation method for image region of interest of preschool child lung

Info

Publication number: CN114581459A
Application number: CN202210117033.9A
Authority: CN
Inventors: 俞刚; 李哲明; 黄坚; 沈忱; 李竞; 杨丽; 柴象飞; 左盼莉; 钱宝鑫; 余卓
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2022-02-08
Filing date: 2022-02-08
Publication date: 2022-06-03

Abstract

The invention provides a method for segmenting a region of interest in lung images of preschool children based on an improved 3D U-Net model, which comprises the following steps: (1) collecting CT image data of preschool child patients and preprocessing it; (2) dividing the preprocessed images into a training set, a validation set and a test set; (3) constructing a segmentation model, wherein the segmentation model adopts an improved 3D U-Net network model, a channelized Transformer module is designed between the encoder and the decoder of the 3D U-Net network model, and a UCTransNet framework is constructed to replace the skip connections in U-Net so as to better fuse the encoder features; (4) feeding the preprocessed training set into the constructed segmentation model for training; and (5) inputting the preschool child lung image to be segmented into the trained segmentation model to obtain the region of interest of the lung image. The method addresses the problem that CT images of preschool children are of poorer quality than those of adults because of excessive motion during scanning, so that the lung region of interest can be segmented automatically and accurately from the CT images.

Description

Improved 3D U-Net model-based segmentation method for image region of interest of preschool child lung

Technical Field

The invention belongs to the field of medical artificial intelligence, and particularly relates to a method for segmenting a region of interest of a preschool child lung image based on an improved 3D U-Net model.

Background

Computer vision technology is commonly used for rapid, intelligent image processing tasks such as image classification, object detection and object retrieval. It simulates the human visual mechanism and offers high detection speed and low cost.

In recent years, the application of deep learning to computer vision, and to medical imaging in particular, has produced breakthrough progress. By combining imaging and medical image processing techniques with computer analysis and calculation, data-driven deep learning enables a computer to segment target regions fully automatically. The segmentation of organs, tissues or lesions in medical images can be used for quantitative analysis of medical information and diagnostic assistance for doctors, and also for three-dimensional reconstruction in computer-guided surgery.

For example, Chinese patent publication No. CN110097550A discloses a medical image segmentation method based on deep learning, which includes: acquiring historical magnetic resonance imaging (MRI) modality images; dividing the historical MRI modality images into a training set and a test set; and, in the down-sampling process, inputting two adjacent feature layers with different resolutions from any one of the historical MRI modality images in the training set into a neural network model for multi-level feature re-extraction and aggregation, and determining a segmented MRI modality image. The two adjacent feature layers with different resolutions comprise a low-resolution feature layer and a high-resolution feature layer, and they pass sequentially through a residual convolution unit, a resolution fusion unit and an aggregation unit to determine the segmented MRI modality image.

Chinese patent publication No. CN112950582A discloses a 3D lung lesion segmentation method based on deep learning, which includes: acquiring lung nodule DICOM images and preprocessing them; stacking the preprocessed DICOM images in three dimensions to obtain 3D image blocks, and cropping the 3D image blocks; performing feature extraction on the cropped 3D images with a pre-trained spherical segmentation model to obtain a regression sub-map; and calculating the product of the centrality and the probability of the regression sub-map to obtain a plurality of center point coordinates, and obtaining the coordinates of the regression points from the center point coordinates to obtain a segmentation result.

Automatically and accurately segmenting the lung region of interest from CT images, so that a three-dimensional lung model can be obtained from the images, is very important for doctors when designing a treatment plan and for further lesion detection. However, for children, the image quality is poorer than that of adults because of excessive motion during CT scanning, and existing segmentation methods achieve low segmentation accuracy on the region of interest of children's lung images.

Disclosure of Invention

The invention provides a method for segmenting a region of interest in lung images of preschool children based on an improved 3D U-Net model. The method can mitigate the problem that CT images of preschool children are of poorer quality than those of adults because of excessive motion during scanning, so that the lung region of interest can be segmented automatically and accurately from the CT images.

A method for segmenting a region of interest of a preschool child lung image based on an improved 3D U-Net model comprises the following steps:

(1) collecting CT image data of preschool child patients, and preprocessing the image data;

(2) dividing the preprocessed image data into a training set, a validation set and a test set;

(3) constructing a segmentation model, wherein the segmentation model adopts an improved 3D U-Net network model, a channelized Transformer module is designed between the encoder and the decoder of the 3D U-Net network model, and a UCTransNet framework is constructed to replace the skip connections in U-Net so as to better fuse the encoder features;

the channelized Transformer module consists of a multi-scale channel fusion submodule and a multi-scale channel attention submodule;

(4) feeding the preprocessed training set into the constructed segmentation model for training, and adjusting the hyper-parameters of the segmentation model using the preprocessed validation set to obtain the trained segmentation model; and

(5) inputting the preschool child lung image to be segmented into the trained segmentation model to obtain the region of interest of the lung image.

In the step (1), the preprocessing includes cropping the image, cutting off irrelevant regions, resampling the image, normalizing the image scale, and filtering out noise with a median filter.

In the step (2), the preprocessed image data is divided into a training set, a validation set and a test set in a ratio of 7:2:1.

In the step (3), the 3D U-Net network model is provided with 5 encoders E₁ to E₅ and 4 decoders D₁ to D₄.

The multi-scale channel fusion submodule includes 2 steps, first multi-scale feature embedding, and then using a multi-layer perceptron.

The structure of the multi-scale channel fusion submodule is as follows:

The outputs of the first 4 encoders E₁ to E₄ of the 3D U-Net network model are T₁ to T₄. First, layer normalization is used and the features are reshaped into flattened 2D patch sequences of different sizes, so that the patches map to the same regions of the encoder features at the 4 scales. Three elements are introduced in the self-attention of the Transformer: Query, Key and Value. In this process the original channel sizes are kept, and the outputs of the 4 layers are then combined to serve as Key and Value, which are unified into one feature space through a spatial transformation. Query, Key and Value are mapped to a uniform scale of the encoder features through layer normalization, and the features are then input into a multilayer perceptron (MLP) with a residual structure to obtain the outputs O₁ to O₄.

The structure of the multi-scale channel attention submodule is as follows:

The i-th level Transformer output O_i and the i-th level decoder feature map D_i are taken as the inputs of channel cross attention. Spatial compression is performed by a global average pooling layer, producing the vector g(x) and its k-th channel; this operation embeds global spatial information, and an attention mask is then generated:

M_i = L1 * σ(O_i) + L2 * σ(D_i)

wherein L1 and L2 are the weights of the 2 linear layers and the ReLU operator. The channel attention map is constructed using a single linear layer and a Sigmoid function, and the resulting vector is used to recalibrate or excite O_i to

Ô_i = σ(M_i) * O_i

wherein the activation σ(M_i) represents the importance of each channel. Finally, the masked Ô_i is concatenated with the up-sampled feature D_i of the i-th level decoder.

In the step (4), the segmentation model is trained using a supervised training method.

Compared with the prior art, the invention has the following beneficial effects:

According to the invention, a channelized Transformer module (CTrans) is designed between the 3D U-Net encoder (E1-E5) and decoder (D1-D4) to construct a UCTransNet framework that replaces the skip connections in U-Net, so as to better fuse the encoder features and reduce the semantic gap, thereby achieving accurate automatic segmentation of medical images.

Drawings

FIG. 1 is a flowchart of a method for segmenting a region of interest of a preschool child lung image based on an improved 3D U-Net model according to the present invention;

FIG. 2 is a general block diagram of a segmentation model according to the present invention;

FIG. 3 is a schematic diagram of a network structure of a multi-scale channel fusion submodule in a segmentation model;

FIG. 4 is a network structure diagram of a multi-scale channel attention submodule in a segmentation model.

Detailed Description

The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.

As shown in FIG. 1, a method for segmenting a region of interest in lung images of preschool children based on an improved 3D U-Net model includes the following steps:

1. Image pre-processing

CT image data of preschool children are collected. Each image is cropped to cut off irrelevant regions, resampled and scale-normalized, and noise is filtered out with a median filter.
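The preprocessing chain above can be illustrated with a minimal Python sketch. It is not the implementation used in the invention: the function names, the nearest-neighbour resampling, the intensity window of -1000 to 400 (assumed here to be Hounsfield units) and the 3x3x3 filter size are all assumptions chosen for illustration, and scale normalization is interpreted as rescaling intensities to [0, 1]. The volume is represented as nested lists indexed [z][y][x].

```python
from statistics import median

def crop(volume, z_range, y_range, x_range):
    """Keep only the sub-volume inside the given (start, stop) ranges."""
    (z0, z1), (y0, y1), (x0, x1) = z_range, y_range, x_range
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in volume[z0:z1]]

def resample_nearest(volume, new_shape):
    """Resample to new_shape = (nz, ny, nx) by nearest-neighbour lookup (assumed scheme)."""
    old = (len(volume), len(volume[0]), len(volume[0][0]))
    nz, ny, nx = new_shape

    def src(i, n_new, n_old):
        # Index in the original grid that corresponds to output index i.
        return min(n_old - 1, int(i * n_old / n_new))

    return [[[volume[src(z, nz, old[0])][src(y, ny, old[1])][src(x, nx, old[2])]
              for x in range(nx)] for y in range(ny)] for z in range(nz)]

def normalize(volume, lo=-1000.0, hi=400.0):
    """Clip to the assumed window [lo, hi] and rescale to [0, 1]."""
    span = hi - lo
    return [[[(min(max(v, lo), hi) - lo) / span for v in row]
             for row in plane] for plane in volume]

def median_filter(volume, radius=1):
    """Median filter over a cubic neighbourhood, replicating edge voxels."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    offsets = range(-radius, radius + 1)

    def clamp(i, n):
        return max(0, min(n - 1, i))

    return [[[median(volume[clamp(z + dz, nz)][clamp(y + dy, ny)][clamp(x + dx, nx)]
                     for dz in offsets for dy in offsets for dx in offsets)
              for x in range(nx)] for y in range(ny)] for z in range(nz)]
```

In practice the order of the steps and the resampling target would depend on the scanner spacing, which the text does not specify.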

2. Data grouping

70% of the data set was used as the training set, 20% of the data set was used as the validation set, and 10% of the data set was used as the test set.
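A 7:2:1 split of this kind can be sketched as follows. The helper name `split_dataset`, the fixed random seed and the choice to split by case identifier (so that all slices of one patient stay in the same subset) are assumptions for illustration; the text only states the proportions.

```python
import random

def split_dataset(case_ids, ratios=(7, 2, 1), seed=0):
    """Shuffle case identifiers and split them into train/validation/test subsets."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]      # remainder goes to the test set
    return train, val, test

train_ids, val_ids, test_ids = split_dataset(range(100))
print(len(train_ids), len(val_ids), len(test_ids))
```

Because of integer rounding, any leftover cases fall into the test subset when the number of cases is not a multiple of 10.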

3. Model construction

As shown in FIG. 2, a segmentation model is constructed, and the segmentation model adopts an improved 3D U-Net network model. In the figure, parts E1-E5 and D1-D4 constitute the encoder and the decoder of the 3D U-Net, respectively. In the invention, a channelized Transformer (CTrans) module is designed between the 3D U-Net encoder (E₁ to E₅) and decoder (D₁ to D₄) to construct a UCTransNet framework that replaces the skip connections in U-Net, so as to better fuse the encoder features and reduce the semantic gap.

CTrans consists of two submodules: a multi-scale channel fusion submodule, which performs multi-scale channel fusion with a Transformer, and a multi-scale channel attention submodule, which resolves the semantic inconsistency between the features of the channelized Transformer and those of the U-Net decoder so as to eliminate ambiguity. The connection composed of multi-scale channel fusion and multi-scale channel attention proposed by the invention can therefore replace the original skip connections, close the semantic gap, and achieve accurate automatic segmentation of medical images.

To address the problem of skip connections, the invention proposes a new channel cross-fusion Transformer, namely multi-scale channel fusion, which fuses the multi-scale encoder features by exploiting the Transformer's strength in modeling long-range dependencies. The multi-scale channel fusion submodule involves two steps: first multi-scale feature embedding, and then a multilayer perceptron (MLP).

The network structure of the multi-scale channel fusion submodule is shown in FIG. 3. Given the outputs T₁ to T₄ of the first 4 encoders, layer normalization is first used and the features are reshaped into flattened 2D patch sequences of different sizes, so that the patches map to the same regions of the encoder features at the 4 scales. A key contribution of the Transformer is self-attention, which builds an attention model from the relations among the input samples themselves. Three important elements are introduced in self-attention: Query, Key and Value. In this process the original channel sizes are maintained, and the outputs of the 4 layers are then merged to serve as Key and Value. In self-attention, Key and Value are both transformations of the input sequence itself, that is, the sequence itself serves as both Key and Value. Key and Value can be unified into one feature space through a suitable spatial transformation. Query, Key and Value are mapped to a uniform scale of the encoder features through layer normalization, and the features are then input into a multilayer perceptron (MLP) with a residual structure to encode channel relationships and dependencies, using the multi-scale features to refine the features from each U-Net encoder level.
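As a rough illustration of this fusion step, the following Python sketch treats the token sequence of each scale as the Query and the channel-wise concatenation of all scales as Key and Value, then computes attention across channels and adds the result back to the Query. This is only one plausible reading of the description: the learned projections, the layer normalization and the residual MLP are omitted, the scaling by the square root of the total channel count is an assumption, and every name in the code is hypothetical.

```python
import math

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def softmax(row):
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def channel_fusion(tokens_per_scale):
    """tokens_per_scale[i] has shape d x C_i: d tokens, C_i channels at scale i.

    All scales must share the same token count d. Returns one fused d x C_i
    matrix per scale.
    """
    d = len(tokens_per_scale[0])
    # Key/Value: concatenate the channels of all scales, token by token (d x C_sum).
    kv = [sum((t[n] for t in tokens_per_scale), []) for n in range(d)]
    c_sum = len(kv[0])
    outputs = []
    for q in tokens_per_scale:
        scores = matmul(transpose(q), kv)                  # C_i x C_sum
        weights = [softmax([s / math.sqrt(c_sum) for s in row]) for row in scores]
        attended = transpose(matmul(weights, transpose(kv)))  # back to d x C_i
        # Residual connection: add the attended features to the Query tokens.
        outputs.append([[a + b for a, b in zip(ar, qr)]
                        for ar, qr in zip(attended, q)])
    return outputs
```

Each output keeps the token count and channel size of its input scale, which is consistent with the statement that the original channel size is maintained.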

The network structure of the multi-scale channel attention submodule is shown in FIG. 4. Mathematically, the i-th level Transformer output O_i and the i-th level decoder feature map D_i are taken as the inputs of channel cross attention. Spatial compression is performed by a global average pooling (GAP) layer, producing the vector g(x) and its k-th channel; this operation embeds global spatial information, and an attention mask is then generated:

M_i = L1 * σ(O_i) + L2 * σ(D_i)

where L1 and L2 are the weights of the 2 linear layers and the ReLU operator. Experience with ECA-Net has shown that avoiding dimensionality reduction is important for learning channel attention, so a single linear layer and a Sigmoid function are used to construct the channel attention map. The resulting vector is used to recalibrate or excite O_i to

Ô_i = σ(M_i) * O_i

where the activation σ(M_i) indicates the importance of each channel. Finally, the masked Ô_i is concatenated with the up-sampled feature D_i of the i-th level decoder.
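The channel cross attention step can be sketched in a few lines of Python. The sketch assumes that the two linear layers act on the globally average-pooled vectors of O_i and D_i, as the sentence on spatial compression suggests, that the Sigmoid of the mask scales each channel of O_i, and that "connected together" means concatenation along the channel axis. Where the ReLU operator is applied is not stated in the text, so it is left out. Feature maps are stored as C x N lists (C channels, N flattened voxels), and all names are hypothetical.

```python
import math

def global_average_pool(feature):
    """Compress each channel of a C x N feature map to its mean value."""
    return [sum(channel) / len(channel) for channel in feature]

def linear(weights, vector):
    """Apply a bias-free linear layer given as a C x C weight matrix."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_cross_attention(o_i, d_i, l1, l2):
    """Recalibrate o_i channel by channel and concatenate it with d_i."""
    pooled_o = global_average_pool(o_i)
    pooled_d = global_average_pool(d_i)
    # Attention mask M_i built from both pooled vectors (assumed reading).
    mask = [a + b for a, b in zip(linear(l1, pooled_o), linear(l2, pooled_d))]
    gate = [sigmoid(m) for m in mask]  # per-channel importance in (0, 1)
    o_hat = [[g * v for v in channel] for g, channel in zip(gate, o_i)]
    return o_hat + d_i                 # channel-wise concatenation

o = [[1.0, 3.0], [2.0, 2.0]]
d = [[0.0, 0.0], [4.0, 4.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(channel_cross_attention(o, d, zeros, zeros))
```

With all-zero weights the mask is zero, every gate equals 0.5, and each channel of O_i is simply halved before concatenation, which makes the recalibration easy to check by hand.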

The invention offers a new perspective for improving semantic segmentation performance: the semantic and resolution gap between low-level and high-level features is bridged through more effective feature fusion and multi-scale channel cross attention, so as to capture more complex channel dependencies. UCTransNet rethinks the Transformer self-attention mechanism from the channel perspective.

4. Model training and segmentation testing

When the segmentation model is trained, the training set is fed into the improved 3D U-Net segmentation network and the validation set is used to adjust the hyper-parameters of the model. An optimizer updates the parameters and optimizes the network, and the learning rate is adjusted automatically, yielding a trained segmentation network. After the learning process is completed, the test set is used to estimate the generalization ability of the model.

The model is trained using a supervised training method. To obtain the corresponding image labels, the lung field region in each CT image is delineated by an experienced sonographer, and the labels are verified by another expert to ensure their accuracy.

5. Evaluation phase

The lung field segmentation in each patient image is evaluated. Evaluating a segmentation task is essentially a pixel-level classification problem. The segmentation accuracy (ACC) is the number of correctly classified pixels (TP) divided by the total number of pixels. When the model outputs a classification in a region that does not contain lung fields, it is counted as a false positive (FP). Finally, the segmentation performance of the intelligent lung field segmentation model on the test set is evaluated using the Dice coefficient, the Jaccard coefficient, the average surface distance (ASD) and the Hausdorff distance (HD).
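The overlap-based metrics named above follow standard definitions and can be sketched as below. The sketch assumes flattened binary masks (1 for lung field, 0 for background), counts both correctly labelled lung and background pixels toward accuracy, and computes the Hausdorff distance on explicit point sets. The average surface distance is omitted because it requires surface extraction, and the function names are illustrative only.

```python
import math

def confusion(pred, truth):
    """Return (TP, TN, FP, FN) for two flattened binary masks of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, tn, fp, fn

def accuracy(pred, truth):
    tp, tn, _, _ = confusion(pred, truth)
    return (tp + tn) / len(pred)

def dice(pred, truth):
    tp, _, fp, fn = confusion(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom  # two empty masks agree fully

def jaccard(pred, truth):
    tp, _, fp, fn = confusion(pred, truth)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom

def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))

pred = [1, 1, 0, 0]
truth = [1, 0, 0, 0]
print(accuracy(pred, truth), dice(pred, truth), jaccard(pred, truth))
```

Dice and Jaccard ignore true negatives, so they are less dominated by the large background region than plain accuracy.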

The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only specific embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions and equivalents made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims

1. A method for segmenting a region of interest of a preschool child lung image based on an improved 3D U-Net model is characterized by comprising the following steps:

2. The method for segmenting the region of interest of the lung image of the preschool child based on the improved 3D U-Net model as claimed in claim 1, wherein in the step (1), the preprocessing comprises cropping the image, cutting off irrelevant regions, resampling the image, normalizing the image scale, and filtering out noise with a median filter.

3. The method for segmenting the lung image region of interest of the preschool child based on the improved 3D U-Net model is characterized in that in the step (2), the preprocessed image data is divided into a training set, a validation set and a test set in a ratio of 7:2:1.

4. The method for segmenting the region of interest of the lung image of the preschool child based on the improved 3D U-Net model of claim 1, wherein in the step (3), the 3D U-Net network model is provided with 5 encoders E₁ to E₅ and 4 decoders D₁ to D₄.

5. The method for segmenting the region of interest of the pulmonary image of the preschool child based on the improved 3D U-Net model is characterized in that in the step (3), the multi-scale channel fusion submodule comprises 2 steps of firstly embedding the multi-scale features and then using a multi-layer perceptron.

6. The method for segmenting the region of interest of the pulmonary image of the preschool child based on the improved 3D U-Net model according to claim 5, wherein the multi-scale channel fusion submodule has the following structure:

the outputs of the first 4 encoders E₁ to E₄ of the 3D U-Net network model are T₁ to T₄; first, layer normalization is used and the features are reshaped into flattened 2D patch sequences of different sizes, such that the patches map to the same regions of the encoder features at the 4 scales; three elements are introduced in the self-attention of the Transformer: Query, Key and Value; in this process, the original channel sizes are kept, the outputs of the 4 layers are then combined to serve as Key and Value, and the Key and the Value are unified into one feature space through a spatial transformation; Query, Key and Value are mapped to a uniform scale of the encoder features through layer normalization, and the features are then input into a multilayer perceptron (MLP) with a residual structure to obtain the outputs O₁ to O₄.

7. The method for segmenting the region of interest of the pulmonary image of the preschool child based on the improved 3D U-Net model according to claim 6, wherein in the step (3), the structure of the multi-scale channel attention submodule is as follows:

the i-th level Transformer output O_i and the i-th level decoder feature map D_i are taken as the inputs of channel cross attention; spatial compression is performed by a global average pooling layer, producing the vector g(x) and its k-th channel; this operation embeds global spatial information, and an attention mask is then generated:

M_i = L1 * σ(O_i) + L2 * σ(D_i)

and finally, the masked output is concatenated with the up-sampled feature D_i of the i-th level decoder.

8. The method for segmenting the lung image region of interest of the preschool child based on the improved 3D U-Net model according to claim 1, wherein in the step (4), a supervised training method is used for training the segmentation model.