CN116205936A - Image segmentation method introducing spatial information and attention mechanism - Google Patents

Image segmentation method introducing spatial information and attention mechanism Download PDF

Info

Publication number
CN116205936A
CN116205936A
Authority
CN
China
Prior art keywords
image
feature
stage
attention mechanism
image segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310350733.7A
Other languages
Chinese (zh)
Inventor
栾晓
薛加望
刘玲慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202310350733.7A priority Critical patent/CN116205936A/en
Publication of CN116205936A publication Critical patent/CN116205936A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)

Abstract

The invention relates to an image segmentation method introducing spatial information and an attention mechanism, belonging to the field of image segmentation. The method comprises the following steps: S1: data preparation stage: preprocessing a brain medical image and cropping image patches from the preprocessed image; S2: feature encoding stage: extracting image features by pre-activated 3D convolution; S3: feature decoding stage: restoring the feature map obtained in the encoding stage to the original image size through deconvolution and an attention mechanism with position encoding, completing the image segmentation process. By using the attention mechanism to attend to spatial information, the invention improves the segmentation performance of the network.

Description

Image segmentation method introducing spatial information and attention mechanism
Technical Field
The invention belongs to the field of image segmentation, and relates to an image segmentation method introducing spatial information and an attention mechanism.
Background
With the rapid development of medical imaging technologies such as computed tomography (Computed Tomography, CT) and magnetic resonance imaging (Magnetic Resonance Imaging, MRI), medical images play an increasingly important role in clinical diagnosis. Medical image segmentation provides a scientific reference when medical staff judge and diagnose the etiology of a disease, greatly reducing the misdiagnosis rate caused by the limited visual resolution of human observers or by insufficient clinical experience, and thereby improving the utilization of medical images. U-Net has proven to be very effective in medical image processing tasks; however, for 3D images with complex anatomical structures, a purely convolution-based encoder-decoder cannot fully exploit the spatial information of the 3D image.
Disclosure of Invention
In view of the above, an object of the present invention is to provide an image segmentation method introducing spatial information and an attention mechanism. By using 3D relative position coding and an attention mechanism to fully mine the spatial information of a 3D image, the semantic information of the image is recovered more accurately on the decoding path, thereby improving the segmentation accuracy of the model.
In order to achieve the above purpose, the present invention provides the following technical solutions:
an image segmentation method introducing spatial information and an attention mechanism. The method designs a self-attention segmentation network assisted by relative position coding: image features are extracted by pre-activated 3D convolution in the encoding stage, the image size is gradually restored by deconvolution in the decoding stage, and the image features are restored by a non-local self-attention module embedded with relative position coding. The method comprises the following steps:
S1: data preparation stage: preprocessing a brain medical image and cropping image patches from the preprocessed image;
S2: feature encoding stage: extracting image features by pre-activated 3D convolution;
S3: feature decoding stage: restoring the feature map obtained in the encoding stage to the original image size through deconvolution and an attention mechanism with position encoding, completing the image segmentation process.
Further, the step S1 includes the following steps:
S11: cropping the three-dimensional medical image, cutting off background regions with a gray value of 0 along the plane formed by any two axes;
S12: normalizing the cropped image by Z-score so that the gray-value distribution of the image has a mean of 0 and a standard deviation of 1, i.e. follows a standard normal distribution;
S13: cutting the cropped image into image patches of size 32×32×32 and randomly selecting one patch as the input of the feature encoding stage in step S2; if the data are multi-modal, the data of all modalities are concatenated along the channel dimension to form a multi-channel image, which is used as the input of the feature encoding stage in step S2.
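The preprocessing in S12-S13 can be sketched as follows with NumPy. The function names, the reading of the patch size as 32×32×32, and the modality stacking at the end are illustrative assumptions, not taken verbatim from the patent:

```python
import numpy as np

def zscore_normalize(volume):
    """Z-score normalization: gray values get mean 0 and standard deviation 1."""
    return (volume - volume.mean()) / volume.std()

def random_patch(volume, size=32, rng=None):
    """Randomly crop a cubic patch of side `size` from a 3D volume."""
    if rng is None:
        rng = np.random.default_rng(0)
    d, h, w = volume.shape
    z, y, x = (rng.integers(0, s - size + 1) for s in (d, h, w))
    return volume[z:z + size, y:y + size, x:x + size]

vol = np.random.default_rng(1).random((64, 64, 64))
patch = random_patch(zscore_normalize(vol), size=32)

# Multi-modal case: stack per-modality patches along a new channel axis.
multi_channel = np.stack([patch, patch], axis=0)   # shape (2, 32, 32, 32)
```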
Further, the step S2 includes the following steps:
S21: performing feature extraction on the cropped three-dimensional image using standard 3D convolution to obtain a 32×32×32 feature map;
S22: downsampling the feature map of S21 using a 3D convolution with a stride of 2;
S23: repeating operations S21 and S22 until a 4×4×4 feature map x4 is finally obtained.
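Assuming "same" padding, each stride-2 convolution halves the spatial size, so three downsampling stages take a 32×32×32 patch to a 4×4×4 map. A quick sanity check of this size progression (illustrative, not from the patent):

```python
def downsampled_size(size, stride=2):
    """Spatial side length after a stride-2 convolution with 'same' padding."""
    return (size + stride - 1) // stride

sizes = [32]
while sizes[-1] > 4:
    sizes.append(downsampled_size(sizes[-1]))
# sizes records the per-stage side lengths: 32 -> 16 -> 8 -> 4
```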
Further, the step S3 specifically includes the following steps:
S31: for the 3D feature map obtained in the encoding stage, taking each pixel in turn as the origin and calculating the relative positions of all other pixels;
S32: embedding the position coding generated in step S31 into a non-local self-attention mechanism, and performing feature fusion with the feature map of the encoding stage;
S33: upsampling the feature map using deconvolution and then repeating S32;
S34: repeating step S33 twice to complete the image segmentation.
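The relative-position computation of S31 can be sketched in NumPy as follows. The grid size and the (dz, dy, dx) layout are illustrative assumptions:

```python
import numpy as np

def relative_positions(d, h, w):
    """For each voxel of a d*h*w grid taken as origin, compute the relative
    (dz, dy, dx) offsets of every other voxel."""
    axes = np.meshgrid(np.arange(d), np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack(axes, axis=-1).reshape(-1, 3)   # (N, 3), N = d*h*w
    # rel[i, j] is the offset of voxel j relative to voxel i
    return coords[None, :, :] - coords[:, None, :]    # (N, N, 3)

rel = relative_positions(2, 2, 2)  # tiny 2x2x2 grid for illustration
```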
The invention has the beneficial effects that spatial information is introduced by relative position coding and embedded into the self-attention mechanism, so that the learning of weights in the attention mechanism depends not only on gray information but also on position information. Using the attention mechanism to attend to spatial information improves the segmentation performance of the network.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is a network structure diagram of an image segmentation model in the present invention;
FIG. 2 is a schematic diagram of a relative position encoding structure according to the present invention;
fig. 3 is an upsampling module of the network according to the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the invention with reference to specific examples. The invention may also be practiced or carried out in other embodiments, and the details of the present description may be modified or varied without departing from the spirit and scope of the invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the invention, and the following embodiments and the features in the embodiments may be combined with each other without conflict.
The drawings are for illustrative purposes only, are schematic rather than physical representations, and are not intended to limit the invention; for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced, and do not represent the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numbers in the drawings of the embodiments correspond to the same or similar components. In the description of the invention, terms such as "upper", "lower", "left", "right", "front" and "rear" indicate an orientation or positional relationship based on that shown in the drawings; they are used only for convenience and simplification of description, do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and should not be construed as limiting the invention. The specific meaning of these terms can be understood by those of ordinary skill in the art according to the circumstances.
Referring to fig. 1 to 3, the present invention provides an image segmentation method introducing spatial information and an attention mechanism. In this embodiment, the method is assumed to be used for segmenting brain tissue images, and a self-attention segmentation network assisted by relative position coding is designed; the network structure diagram is shown in fig. 1. In the encoding stage, image features are extracted by pre-activated 3D convolution; in the decoding stage, the image size is gradually restored by deconvolution, and the image features are restored by a non-local self-attention module (shown in fig. 3) embedded with relative position coding. The method comprises the following steps:
Step 1: data preparation stage: the brain medical image is preprocessed, and 32×32×32 image patches are cropped from the preprocessed image;
Step 101: the three-dimensional medical image is cropped, cutting off background regions with a gray value of 0 along the plane formed by any two axes;
Step 102: the cropped image is normalized by Z-score so that the gray-value distribution of the image has a mean of 0 and a standard deviation of 1, i.e. follows a standard normal distribution;
Step 103: the cropped image is cut into image patches of size 32×32×32, and one patch is randomly selected as the input of the model; if the data are multi-modal, the data of all modalities are concatenated along the channel dimension to form a multi-channel image used as the network input.
Step 2: feature encoding stage: image features are extracted by pre-activated 3D convolution;
Step 201: feature extraction is performed on the three-dimensional image using standard 3D convolution to obtain a 32×32×32 feature map;
Step 202: the feature map of step 201 is downsampled using a 3D convolution with a stride of 2;
Step 203: steps 201 and 202 are repeated until a 4×4×4 feature map x4 is finally obtained.
Step 3: feature decoding stage: the feature map obtained in the encoding stage is restored to the original image size through deconvolution and an attention mechanism with position encoding, completing the image segmentation process.
Step 301: for the 3D feature map obtained in the encoding stage, each pixel is taken in turn as the origin and the relative positions of all other pixels are calculated, as shown in FIG. 2;
Step 302: as shown in fig. 3, the position coding generated in step 301 is embedded into a non-local self-attention mechanism, and feature fusion is performed with the feature map of the encoding stage.
Step 303: the feature map is upsampled using deconvolution, and then step 302 is repeated.
Step 304: step 303 is repeated twice to complete the image segmentation.
Spatial information is introduced by relative position coding and embedded into the self-attention mechanism, so that the learning of weights in the attention mechanism depends not only on gray information but also on position information. Using the attention mechanism to attend to spatial information improves the segmentation performance of the network.
In order to verify the effect of the present invention, the following experiments were performed:
based on this image segmentation method, which introduces spatial information and attention mechanisms, tests were performed on the IBSR18 dataset. The IBSR18 dataset contained 18 training samples, the test objective was to segment brain tissue nmr images into Grey Matter (GM), white Matter (WM), cerebrospinal fluid (CSF) and background. The 14 data samples are used as training set, the remaining one as validation set. At the same time, method 1 using spatial attention and channel attention at the same time, method 2 using self-attention mechanism, method 3 using axial attention, and method 4 of the present invention are compared. The Dice coefficient is used as an evaluation index, and the formula is as follows:
Dice(A, B) = 2|A ∩ B| / (|A| + |B|)
where A represents the segmentation result of the neural network and B represents the gold standard provided by the dataset.
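For binary masks, the Dice coefficient above can be computed as follows (NumPy sketch; the function name is illustrative):

```python
import numpy as np

def dice_coefficient(a, b):
    """Dice(A, B) = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    return 2.0 * intersection / (a.sum() + b.sum())

pred = np.array([1, 1, 0, 0])
gold = np.array([1, 0, 0, 0])
# intersection = 1, |A| = 2, |B| = 1, so Dice = 2/3
```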
Table 1 gives the test results on the dataset. It can be seen that, in terms of the Dice coefficient, the neural network of the present invention performs better on each segmented tissue.
TABLE 1
           CSF      GM       WM       AVG
Method 1   85.88    95.30    95.05    92.08
Method 2   86.12    95.38    95.04    92.17
Method 3   86.20    95.37    95.01    92.19
Method 4   86.77    95.48    95.06    92.44
Finally, it is noted that the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the invention, all of which are intended to be covered by the claims of the present invention.

Claims (4)

1. An image segmentation method introducing spatial information and an attention mechanism, characterized in that it comprises the following steps:
S1: data preparation stage: preprocessing a brain medical image and cropping image patches from the preprocessed image;
S2: feature encoding stage: extracting image features by pre-activated 3D convolution;
S3: feature decoding stage: restoring the feature map obtained in the encoding stage to the original image size through deconvolution and an attention mechanism with position encoding, completing the image segmentation process.
2. The image segmentation method introducing spatial information and an attention mechanism according to claim 1, characterized in that the step S1 comprises the following steps:
S11: cropping the three-dimensional medical image, cutting off background regions with a gray value of 0 along the plane formed by any two axes;
S12: normalizing the cropped image by Z-score so that the gray-value distribution of the image has a mean of 0 and a standard deviation of 1, i.e. follows a standard normal distribution;
S13: cutting the cropped image into a plurality of image patches of size 32×32×32 and randomly selecting one patch as the input of the feature encoding stage in step S2; if the data are multi-modal, the data of all modalities are concatenated along the channel dimension to form a multi-channel image, which is used as the input of the feature encoding stage in step S2.
3. The image segmentation method introducing spatial information and an attention mechanism according to claim 1, characterized in that the step S2 comprises the following steps:
S21: performing feature extraction on the cropped three-dimensional image using standard 3D convolution to obtain a 32×32×32 feature map;
S22: downsampling the feature map of S21 using a 3D convolution with a stride of 2;
S23: repeating operations S21 and S22 until a 4×4×4 feature map x4 is finally obtained.
4. The image segmentation method introducing spatial information and an attention mechanism according to claim 1, characterized in that the step S3 specifically comprises the following steps:
S31: for the 3D feature map obtained in the encoding stage, taking each pixel in turn as the origin and calculating the relative positions of all other pixels;
S32: embedding the position coding generated in step S31 into a non-local self-attention mechanism, and performing feature fusion with the feature map of the encoding stage;
S33: upsampling the feature map using deconvolution and then repeating S32;
S34: repeating step S33 twice to complete the image segmentation.
CN202310350733.7A 2023-04-04 2023-04-04 Image segmentation method introducing spatial information and attention mechanism Pending CN116205936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310350733.7A CN116205936A (en) 2023-04-04 2023-04-04 Image segmentation method introducing spatial information and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310350733.7A CN116205936A (en) 2023-04-04 2023-04-04 Image segmentation method introducing spatial information and attention mechanism

Publications (1)

Publication Number Publication Date
CN116205936A true CN116205936A (en) 2023-06-02

Family

ID=86514874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310350733.7A Pending CN116205936A (en) 2023-04-04 2023-04-04 Image segmentation method introducing spatial information and attention mechanism

Country Status (1)

Country Link
CN (1) CN116205936A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118297941A (en) * 2024-06-03 2024-07-05 中国科学院自动化研究所 Three-dimensional abdominal aortic aneurysm and visceral vessel lumen extraction method and device
CN118297941B (en) * 2024-06-03 2024-10-25 中国科学院自动化研究所 Three-dimensional abdominal aortic aneurysm and visceral vessel lumen extraction method and device


Similar Documents

Publication Publication Date Title
CN112150428B (en) Medical image segmentation method based on deep learning
CN114926477B (en) Brain tumor multi-mode MRI image segmentation method based on deep learning
JP2023540910A (en) Connected Machine Learning Model with Collaborative Training for Lesion Detection
CN117218453B (en) Incomplete multi-mode medical image learning method
CN113393469A (en) Medical image segmentation method and device based on cyclic residual convolutional neural network
CN112396605B (en) Network training method and device, image recognition method and electronic equipment
CN114494296A (en) Brain glioma segmentation method and system based on fusion of Unet and Transformer
CN110910335B (en) Image processing method, image processing device and computer readable storage medium
CN115809998A (en) Based on E 2 Glioma MRI data segmentation method based on C-Transformer network
CN116433586A (en) Mammary gland ultrasonic tomography image segmentation model establishment method and segmentation method
CN116309615A (en) Multi-mode MRI brain tumor image segmentation method
Wu et al. Continuous refinement-based digital pathology image assistance scheme in medical decision-making systems
CN115311193A (en) Abnormal brain image segmentation method and system based on double attention mechanism
CN110992309A (en) Fundus image segmentation method based on deep information transfer network
Kumaraswamy et al. Automatic prostate segmentation of magnetic resonance imaging using Res-Net
CN116862930B (en) Cerebral vessel segmentation method, device, equipment and storage medium suitable for multiple modes
CN115690409A (en) SEResu-Net model-based MRI brain tumor image segmentation method and system
CN116228732A (en) Breast cancer molecular typing prediction method, system, medium, equipment and terminal
CN113379770B (en) Construction method of nasopharyngeal carcinoma MR image segmentation network, image segmentation method and device
CN116205936A (en) Image segmentation method introducing spatial information and attention mechanism
CN113177938B (en) Method and device for segmenting brain glioma based on circular convolution kernel and related components
CN113409324B (en) Brain segmentation method fusing differential geometric information
Fei et al. Deep Learning-Based Auto-Segmentation of Spinal Cord Internal Structure of Diffusion Tensor Imaging in Cervical Spondylotic Myelopathy
CN112634279A (en) Medical image semantic segmentation method based on attention Unet model
CN116524285A (en) Brain tissue image segmentation method introducing prior information and feature fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination