CN116542924A - Prostate focus area detection method, device and storage medium - Google Patents

Prostate focus area detection method, device and storage medium

Info

Publication number
CN116542924A
CN116542924A (Application CN202310486436.5A)
Authority
CN
China
Prior art keywords
prostate
feature
attention
information
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310486436.5A
Other languages
Chinese (zh)
Inventor
朴永日
李智玮
张淼
吴岚虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202310486436.5A priority Critical patent/CN116542924A/en
Publication of CN116542924A publication Critical patent/CN116542924A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30081Prostate
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, a device and a storage medium for detecting a prostate focus area, relating to the field of computer vision. Accurate automatic segmentation of prostate lesion areas is an important requirement for computer-aided diagnosis and treatment of prostate diseases. However, the lack of a clear boundary and of high contrast between the prostate and surrounding tissue makes it difficult to accurately extract the prostate from the background. Because medical images carry scarce information, multi-dimensional information must be extracted with a variety of methods, yet current methods do not adequately learn and mine the hidden information. To address these challenges, the invention proposes a new method, device and storage medium for detecting prostate lesion areas, suitable for the field of computer vision. To improve the quality of the information extracted at different scales, the proposed network uses different convolution combinations and several attention mechanisms, so that the method performs well across different imaging modalities.

Description

Prostate focus area detection method, device and storage medium
Technical Field
The invention relates to the field of computer vision, in particular to a prostate lesion area detection method and device based on a dynamic multi-scale perception self-adaptive integration network, and a storage medium.
Background
Prostate segmentation refers to detecting the prostate region in an input prostate MR image; prostate lesion segmentation likewise extracts the lesion region from the original image. Since medical images differ greatly from conventional natural images, directly applying conventional segmentation algorithms to medical image segmentation inevitably degrades performance, and this is especially true for segmentation of prostate lesion areas. Accurate segmentation of the prostate and lesion areas from Magnetic Resonance (MR) images is critical for the diagnosis and treatment planning of prostate diseases, especially prostate cancer, prostatitis and prostatic hyperplasia. In the routine diagnosis of such common prostate diseases, medical images are usually segmented manually by several specialists, which is a time-consuming process. With the development of deep learning, convolutional neural networks have made great progress in the field of prostate and lesion segmentation, and a growing body of work addresses segmentation of the prostate and its lesion regions. These studies play a very important role both in theoretical research on the prostate and its diseases and in front-line medical practice.
Depending on the form of the input, prostate and prostate lesion segmentation can be divided into two major categories: 2D and 3D detection of the prostate and its lesion areas. 2D detection takes a single MR image as input, a gray-scale image with one channel. 3D detection takes consecutive MR images as input: for one patient, the imaging of the prostate and lesion area consists of several consecutive slices along the axial direction, and because the slices of the same patient are correlated, they can be processed jointly as 3D detection. With advances in technology and hardware, detection with video input has recently emerged as well, using even more input information to achieve more accurate segmentation results.
However, current state-of-the-art automatic MR image segmentation methods face several challenges. The lack of a clear boundary and of high contrast between the prostate and surrounding tissue makes it difficult to accurately extract the prostate from the background. Furthermore, the complexity of the background texture and the large variation in the size, shape and intensity distribution of the prostate itself make segmentation more difficult. In addition, the lesion region of the prostate is visually even harder to distinguish than the prostate region, and even a professional physician needs to examine data from multiple modalities to determine where a lesion lies. Meanwhile, because medical images carry scarce information, multi-dimensional information must be extracted with several methods, so that the rich information a network extracts across multiple modes compensates for the weak image priors; however, existing methods cannot fully learn and mine the information hidden in the images.
Disclosure of Invention
Existing prostate lesion area detection methods still use conventional fixed-parameter layers to process the input prostate image and therefore struggle to adapt to changes such as prostate size variation and boundary blurring in the input. To address this, the invention provides a prostate lesion area detection method based on a dynamic multi-scale perception self-adaptive integration network, which detects lesion areas from the input prostate MR image and is optimized and updated through dynamic local pooling and a global efficient attention network, thereby achieving high-quality lesion area detection for a given prostate image.
For this purpose, the invention provides the following technical scheme:
a prostate focus area detection method based on a dynamic multi-scale perception self-adaptive integration network comprises the following steps:
A. acquiring a prostate input image according to a data set of the prostate and obtaining tensors;
B. inputting the tensor into a feature encoder, and obtaining multi-scale coding features based on each image through the feature encoder;
C. For the coding features, richer feature representations are obtained through corresponding feature enhancement layers;
D. Performing feature decoding on the richer feature representation through a decoder to obtain a final prostate segmentation prediction result, wherein the feature decoding comprises the following steps:
D1. A multi-level feature pyramid is established through a multi-level integration module with convolution, using the complementary characteristics of the levels, and the multi-scale information of the image is adaptively encoded into the current pyramid feature vector, yielding integrated feature information that contains both global and local levels;
The mechanism of hierarchical complementation is as follows: the features extracted by the network comprise local features and global features. Local features come from the shallow convolutional layers and mainly contain information about details and textures of the image, without distinguishing whether it belongs to the foreground or the background; global features are mainly extracted by the deep convolution modules and the attention mechanism and mainly cover high-level semantic information, such as position and the difference between foreground and background. Because the two kinds of features focus on local and global information respectively, and both position and detail information matter for prostate lesion segmentation, the complementarity of global and local features across levels can be exploited during integration to obtain a better prediction result.
The integration module works as follows: the integrated feature map of two consecutive levels is obtained through convolution and fusion operations; channel transformation and normalization are then applied to the integrated feature map to produce the output of that level's integration module. The convolutions may use several kernels of different sizes to fuse neighborhood information or transform the channel number, preparing for the next fusion.
D2. The feature layers of different levels are dynamically fused in a progressive manner over several stages, yielding a more accurate and richer fused feature representation. The fusion weights are learned automatically by convolution. The final prostate segmentation prediction map is obtained by self-adaptive dynamic weighted fusion of the features across levels; a sketch of this fusion follows below.
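As an illustration, the following PyTorch sketch shows one way the self-adaptive dynamic weighted fusion of step D2 could be realized. The module name, the softmax normalization and the 3×3 weight-prediction kernel are assumptions; the patent only states that the fusion weights are learned by convolution.

```python
# A minimal sketch of the adaptive weighted fusion of step D2 (assumptions:
# module name, softmax normalization, 3x3 weight-prediction kernel).
import torch
import torch.nn as nn

class AdaptiveLevelFusion(nn.Module):
    def __init__(self, channels: int, num_levels: int):
        super().__init__()
        # A single convolution learns one per-pixel fusion weight per level.
        self.weight_pred = nn.Conv2d(channels * num_levels, num_levels, 3, padding=1)

    def forward(self, feats):
        # feats: list of (B, C, H, W) maps already resized to a common resolution
        weights = torch.softmax(self.weight_pred(torch.cat(feats, dim=1)), dim=1)
        return sum(weights[:, i:i + 1] * f for i, f in enumerate(feats))
```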
Further, step A includes:
dividing the prostate dataset into a training set and a test set of fixed sizes, each containing two subsets: the input images and the ground-truth values;
first, performing data enhancement on the prostate training set, including but not limited to resizing the input prostate image to H×W, where the target size can be chosen in use to best match the network model; second, applying random flipping and random cropping with random probability; then converting the enhanced image into tensors the network can process, obtaining tensors of the batch size. Note that the same operations are also performed on the ground truth of the prostate training set.
The data in the prostate test set is handled differently: the input image is only resized, then converted directly to a tensor, and the resulting tensor of the batch size is sent straight to the network for testing.
Further, the batch size is 8 and H×W is 224×224.
Further, the feature encoder is a ResNet architecture, and discards the last two layers to preserve the spatial structure, and then adds a global-local complementary module after the output of each layer to extract multi-scale context information; and features of all layers except the first layer are stored in the feature pyramid to facilitate operation of subsequent modules. That is, the feature encoder generates 1 feature pyramid for each image, which includes 4 feature maps with different spatial resolutions and channel numbers.
Further, the ResNet architecture is a ResNet-50 architecture in which the number of input channels of the first convolution module is modified to 1 to match the single channel of the input image. The global-local complementary module comprises two branches, namely a dynamic local pooling branch and a global high-efficiency attention branch.
Dynamic local pooling dynamically assigns to each layer of the network a combination of pooling layers of appropriate sizes, according to the position of that layer in the whole network, so as to better extract local information. That is, many convolution kernel sizes are combined at the lower layers of the network to fit the large low-level feature maps, while at the higher layers a small number of kernel sizes fit the small high-level feature maps and prevent information loss.
Global high-efficiency attention directly compares the similarity of the input feature maps to obtain a similarity weight map; if the dimensionality of that layer's feature maps is high, it can first be reduced to save computational resources. The weight map is then combined with the input image to compute which regions the network should attend to at the global scale.
Further, the dynamic local pooling convolution kernels are, in turn, 1, 3, 5, 7 and 9. Where the size is greater than 3, depthwise separable convolution replaces ordinary convolution to reduce the number of parameters.
Further, in step D1, the dynamic kernel K_t has size 3×3 when used for neighborhood fusion, and size 1×1 when used for channel-number integration.
Further, step C includes:
in the corresponding feature fusion layer, the features at each scale output by the feature encoder are respectively taken as input;
for the features of each scale, a dual-stream attention mechanism and a residual-attention mechanism are respectively used to further optimize the feature expression after the feature encoder;
for the feature expressions of the same spatial resolution obtained after the different attention transforms, pixel-level summation is used to obtain the richer fused feature expression.
Further, for the features of each scale, different attention mechanisms make the network focus on different information, improving the richness and certainty of the feature representation, including:
for dual stream attention, attention is used to emphasize the characteristics of the different regions. The foreground region which is important to be focused by the network under the conventional condition is calculated by using the spatial attention and the channel attention, and then the weighting graph of the foreground attention is normalized and inverted to focus on the background, so that the foreground information hidden in the background feature is better found, and then the fused feature representation is obtained by using the pixel-level addition and convolution operation.
The residual-attention mechanism combines residual modules with self-attention and mutual attention: a multi-path residual convolution operation is applied to the input feature representation, and information flow between the branches is realized by combining self-attention weights with mutual-attention weights, yielding a richer feature expression.
Further, the number of branches in the residual-attention mechanism is set to 4, and the weights of the self-attention weight map and the mutual-attention weight map are both 1.
The technical scheme provided by the invention has the following beneficial effects:
The invention provides a prostate lesion area detection method based on a dynamic multi-scale perception self-adaptive integration network that accounts for the coherence of multi-scale information in the input image. First, multi-scale coding features are obtained for each image through a feature encoder: dynamic pooling convolution attends to the local information of the image, a global efficient attention mechanism attends to its global information, and the two are integrated by pixel-wise addition and convolution, so that the effective information of the input image is extracted at different scales to the greatest extent. Second, to keep noise arising during information extraction from degrading the prediction, information enhancement is performed on the feature map at every scale with several combinations of attention. On the one hand, a dual-stream attention mechanism makes the network attend more to foreground information and suppresses noise; on the other hand, residual attention strengthens the connections among the branches while extracting integrated information, and the complementarity among branches supports more accurate prediction. Experimental results show that the proposed method yields accurate predictions for both prostate lesion area segmentation and prostate segmentation.
For these reasons, the invention can be widely applied in the field of prostate and lesion area detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic representation of an input prostate MR in an embodiment of the present invention;
FIG. 2 is a flow chart of a method for detecting a prostate lesion area based on a dynamic multiscale-aware adaptive integration network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a dynamic adaptive pooling module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a dual stream attention module in accordance with an embodiment of the present invention;
fig. 5 is a schematic diagram of the structure of an internal attention module of the residual-attention module in an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 2, a flowchart of a method for detecting a prostate lesion area based on a dynamic multiscale-aware adaptive integration network according to an embodiment of the present invention is shown, and the method includes the following steps:
A. acquiring a prostate input image according to a data set of the prostate and obtaining tensors;
in a specific implementation, the step a specifically includes:
a1, acquiring a prostate image:
An input prostate MR image is shown in FIG. 1. The prostate dataset is divided into a training set and a test set in a certain proportion, each containing two sub-datasets: the input images and the ground-truth values.
A2. For the input prostate images, obtain a tensor T whose first dimension equals the batch size:
Data enhancement is performed on the input images and corresponding ground truth in the prostate training set. First, random cropping with scale s and ratio r is applied to the input MR original image (a single-channel gray image) and the GT image, and the size is adjusted to H×W (this method uses a resolution of 224×224); random flipping with random probability is then applied. The enhanced gray image is converted into a tensor the network can process, and a data loader groups the output tensors into batches of the batch size so the network converges faster; the batch size is set to 8.
For the input prostate images and corresponding ground truth in the prostate test set, the input image is resized to H×W (224×224), the resized prostate gray image is converted into a tensor the network can process, and the data loader batches the output tensors; here the batch size is set to 1. A preprocessing sketch follows below.
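For concreteness, a minimal preprocessing sketch using torchvision follows, matching the stated pipeline (random crop with scale and ratio, resize to 224×224, random flip, conversion to tensor, batch sizes 8 and 1); the transform composition and loader names are illustrative assumptions.

```python
# A minimal preprocessing sketch for step A (torchvision transforms assumed).
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop((224, 224)),  # random crop with scale s / ratio r, resized to H x W
    transforms.RandomHorizontalFlip(p=0.5),    # random flip with random probability
    transforms.ToTensor(),                     # single-channel gray image -> (1, 224, 224) tensor
])
test_tf = transforms.Compose([
    transforms.Resize((224, 224)),             # the test set is only resized
    transforms.ToTensor(),
])
# The geometric transforms must also be applied to the ground-truth masks; in
# practice a paired transform is used so crops and flips stay aligned.
# train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
# test_loader  = DataLoader(test_set,  batch_size=1, shuffle=False)
```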
B. Inputting tensors into a feature encoder, and obtaining multi-scale coding features based on each image through the feature encoder
In a specific implementation, the step B specifically includes:
B1. Input the obtained tensor I_t to the feature encoder:
The feature encoder employed is a ResNet-50 architecture in which the number of input channels of the first convolution module is modified to 1 to match the number of channels of the input image, while the final pooling and fully connected layers are removed to fit the subsequent modules.
B2. Obtain multi-scale coding features:
The feature encoder generates 5 multi-scale feature maps with different spatial resolutions and channel numbers for each image; with a 224×224 input, their resolutions and channel numbers (W×H×C) are 112×112×64, 56×56×256, 28×28×512, 14×14×1024 and 7×7×2048 respectively. Since the first layer is too noisy and carries little useful information, only the features of layers 2-5 are used in the subsequent information enhancement. A sketch of the truncated encoder follows below.
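A minimal sketch of the modified ResNet-50 encoder described in step B1 is given below, using the torchvision backbone; collecting the stage outputs into a list is an implementation assumption.

```python
# A minimal sketch of the truncated, single-channel ResNet-50 encoder of
# step B1 (the residual/global-local complementary modules of each layer are
# described separately and omitted here).
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ProstateEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        net = resnet50(weights=None)
        # First convolution accepts a 1-channel gray image instead of RGB.
        net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu)
        self.pool = net.maxpool
        # The average-pooling and fully connected head are discarded.
        self.layers = nn.ModuleList([net.layer1, net.layer2, net.layer3, net.layer4])

    def forward(self, x):                 # x: (B, 1, 224, 224)
        feats = [self.stem(x)]            # (B, 64, 112, 112)
        x = self.pool(feats[0])
        for layer in self.layers:
            x = layer(x)
            feats.append(x)               # 256 / 512 / 1024 / 2048 channels
        return feats[1:]                  # only layers 2-5 feed the feature pyramid
```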
The step B2 specifically includes:
Each layer comprises a residual convolution module and a dual-branch global-local complementary module, combined in series.
The residual convolution module is a convolution module with a residual connection added. The dual-branch global-local complementary module contains one path of dynamic self-adaptive local pooling and one path of global high-efficiency attention. As shown in FIG. 3, dynamic self-adaptive local pooling convolves with kernels of different sizes: for low-level features, the dynamic self-adaptive pooling module contains more convolutions covering a larger range, to make up for the lack of information in the low layers; for high-level features, the information is already sufficient and mostly needs integration and transformation, so the kernels are smaller and fewer. The whole flow can be expressed as:
layers(i) ∈ {RL(j) | j = 0, 1} ∪ {OL(k) | 1 ≤ k ≤ 5−i, k ∈ ℕ⁺}    (1)
where layers(i) denotes the feature map of the i-th layer. RL is the part shared by all feature layers of the dynamic self-adaptive local pooling module; it comprises one path of global average pooling with up-sampling and one path of convolution with kernel size 1. OL differs across feature layers in the number of its convolutions and their kernel sizes (i.e., the range of k), and the kernel size equals 2k−1. A sketch of this pooling module follows below.
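The following sketch illustrates one possible reading of equation (1): RL as a global-average-pooling path plus a 1×1 convolution, and OL as 5−i extra convolutions with kernel sizes 2k−1, where sizes above 3 are replaced by depthwise separable convolutions as stated earlier. The exact layer composition and the level indexing (i = 1..4 for the four pyramid levels) are assumptions.

```python
# A minimal sketch of the dynamic self-adaptive local pooling branch,
# reconstructed from equation (1); composition details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def _conv(channels: int, k: int) -> nn.Module:
    if k <= 3:
        return nn.Conv2d(channels, channels, k, padding=k // 2)
    # Depthwise + pointwise convolution cuts parameters for large kernels.
    return nn.Sequential(
        nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels),
        nn.Conv2d(channels, channels, 1),
    )

class DynamicLocalPooling(nn.Module):
    def __init__(self, channels: int, level: int):   # level i = 1..4
        super().__init__()
        self.point = nn.Conv2d(channels, channels, 1)  # RL: 1x1 convolution path
        self.ol = nn.ModuleList(
            _conv(channels, 2 * k - 1) for k in range(1, max(5 - level, 1) + 1)
        )

    def forward(self, x):
        # RL: global average pooling, up-sampled back to the input size.
        gap = F.interpolate(x.mean((2, 3), keepdim=True), size=x.shape[2:])
        out = self.point(x) + gap
        for conv in self.ol:               # OL: more kernels at lower levels
            out = out + conv(x)
        return out
```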
The global high-efficiency attention module uses the notion of similarity combined with a self-attention mechanism to strengthen the network's ability to attend to global regions, letting it learn more information from a global view and make more accurate predictions. A residual connection is used to prevent information loss during network operation. The whole flow can be expressed as:
where x denotes the input feature map and normal(·) denotes the normalization operation.
At the end of each layer, the feature maps of the two branches are integrated by pixel-level addition and convolution operations, as sketched below together with the attention branch.
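Since the formula itself is not reproduced above, the following sketch reconstructs the global high-efficiency attention branch from the description alone (channel reduction, similarity weight map, residual connection); the query/key projections and the softmax normalization are assumptions, not the exact formula of the filing.

```python
# A minimal sketch of the global high-efficiency attention branch and the
# end-of-layer fusion; the similarity computation is an assumption.
import torch
import torch.nn as nn

class GlobalEfficientAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        inner = max(channels // reduction, 1)   # channel reduction saves compute
        self.query = nn.Conv2d(channels, inner, 1)
        self.key = nn.Conv2d(channels, inner, 1)

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.query(x).flatten(2)            # (B, C', HW)
        k = self.key(x).flatten(2)
        sim = torch.softmax(q.transpose(1, 2) @ k, dim=-1)  # (B, HW, HW) similarity weights
        v = x.flatten(2)                        # (B, C, HW)
        out = (v @ sim.transpose(1, 2)).view(b, c, h, w)
        return x + out                          # residual connection against information loss

# End-of-layer fusion of the two branches (pixel-level addition + convolution):
# fused = fuse_conv(local_branch(x) + global_branch(x))
```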
C. For coding features, richer feature representations are obtained through corresponding feature enhancement layers
In a specific implementation, the step C specifically includes:
C1. The features of each scale are fed to a dual-stream attention module to enhance the foreground and suppress background noise:
Conventional attention can only focus on the foreground region and ignores foreground information hidden in the background region. The dual-stream attention module of the embodiment of the invention therefore combines a spatial attention mechanism and a channel attention mechanism so that the network attends to the foreground region while simultaneously attending to foreground information hidden in the background region, as shown in FIG. 4. At the end of the dual-stream attention module, the features of the foreground branch and the background branch are integrated by convolution, and the information within them is learned to enable more accurate segmentation. A sketch of this module follows below.
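A minimal sketch of the dual-stream attention module, as read from the description of FIG. 4, follows; the specific spatial/channel attention blocks and the per-stream convolutions are assumptions (the per-stream convolutions keep the pixel-level addition of foreground and background from collapsing back to the input).

```python
# A minimal sketch of dual-stream attention: a foreground weight map is
# computed from channel + spatial attention, its inversion attends to the
# background, and the streams are fused by addition and convolution.
import torch
import torch.nn as nn

class DualStreamAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())
        self.fg_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.bg_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        attn = self.spatial(x * self.channel_fc(x))   # normalized foreground weight map
        foreground = self.fg_conv(x * attn)
        background = self.bg_conv(x * (1.0 - attn))   # inverted map mines cues hidden in the background
        return self.fuse(foreground + background)     # pixel-level addition, then convolution
```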
C2. For the features of each scale, the information is further integrated in the form of multiple branches, and an attention mechanism strengthens the correlation among the branches in residual form so as to jointly highlight the foreground object.
As shown in FIG. 5, the residual-attention module has 4 branches in total, each containing 2 residual convolution modules. Between branches, an attention module strengthens the association. The attention module has two parts: one centers on self-attention and generates a weight matrix for the branch itself; the other is mutual attention, which uses the similarity of two branches to generate the weight matrix for their information exchange. The output of each branch is the weighted sum, via the computed matrices, of the feature map of the adjacent branch and the branch's own feature map. Finally, the results of the 4 branches are integrated to produce the output of the residual-attention module. A sketch follows below.
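The following sketch illustrates the residual-attention module under the stated configuration (4 branches, 2 residual convolution modules each, self- and mutual-attention weights both weighted 1); the sigmoid-based attention forms are assumptions.

```python
# A minimal sketch of the residual-attention module of step C2.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)

class ResidualAttention(nn.Module):
    def __init__(self, channels: int, num_branches: int = 4):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(ResidualBlock(channels), ResidualBlock(channels))
            for _ in range(num_branches)
        )
        self.self_attn = nn.Conv2d(channels, 1, 1)   # per-branch self-attention weight map
        self.merge = nn.Conv2d(channels * num_branches, channels, 1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        outs = []
        for i, f in enumerate(feats):
            out = f * torch.sigmoid(self.self_attn(f))       # own-branch weighting
            for j in (i - 1, i + 1):                         # adjacent branches only
                if 0 <= j < len(feats):
                    mutual = torch.sigmoid((f * feats[j]).mean(1, keepdim=True))
                    out = out + feats[j] * mutual            # mutual-attention exchange
            outs.append(out)
        return self.merge(torch.cat(outs, dim=1))            # integrate the 4 branches
```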
D. Performing feature decoding on the richer feature representation through the decoder to obtain the final prostate lesion region segmentation prediction result:
The decoder comprises an integration module and a prediction module and receives a feature pyramid consisting of 4 feature maps of different scales from the previous module. The decoder's integration module starts from the feature map with the smallest size and the deepest level (i.e. 7×7×2048). The final integrated feature map is obtained by stepwise integration of the feature maps of consecutive scale pairs. The integration operation consists mainly of two steps, feature map stacking and feature transformation along the channel direction; the final integrated feature map is sent to the prediction module, which predicts according to the number of classes the task requires.
The method comprises the following specific steps:
and D1, establishing a multi-level feature pyramid by using the characteristics of level complementation through a multi-level integration module with convolution, and adaptively encoding multi-scale information of an image into a current pyramid feature vector to obtain feature information which contains global and local different levels after integration. The mechanisms of hierarchical complementation include: the features extracted by the network comprise local features and global features. The local features are extracted from a shallow convolutional network, and mainly comprise information about details, textures and the like of the image, and the information does not distinguish whether the information belongs to the foreground or the background; the global features are mainly extracted through a deep convolution module and an attention mechanism, and mainly comprise parts related to high-level semantic information, such as positions, differences of foreground and background and the like. Because the focus of the two features is local and global respectively, and the position and detail information are important for the prostate focus segmentation, the complementary characteristics of global and local features among different levels can be utilized during integration to obtain a better prediction result.
The integration module works as follows: the integrated feature map of two consecutive levels is obtained through convolution and fusion operations; channel transformation and normalization are then applied to produce the output of that level's integration module. The convolutions may use several kernels of different sizes to fuse neighborhood information or transform the channel number, preparing for the next fusion. A sketch of one integration step follows below.
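A sketch of one integration step follows; the channel numbers, bilinear upsampling and BatchNorm choice are assumptions consistent with the 3×3 and 1×1 dynamic kernels K_t described above.

```python
# A minimal sketch of one decoder integration step: the deeper map is
# upsampled, stacked with the shallower map, fused (3x3, neighborhood fusion),
# then channel-transformed and normalized (1x1).
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntegrationModule(nn.Module):
    def __init__(self, ch_deep: int, ch_shallow: int, ch_out: int):
        super().__init__()
        self.fuse = nn.Conv2d(ch_deep + ch_shallow, ch_out, 3, padding=1)  # K_t = 3x3
        self.transform = nn.Sequential(
            nn.Conv2d(ch_out, ch_out, 1),   # channel transformation, K_t = 1x1
            nn.BatchNorm2d(ch_out),         # normalization operation
            nn.ReLU(inplace=True),
        )

    def forward(self, deep, shallow):
        deep = F.interpolate(deep, size=shallow.shape[2:],
                             mode="bilinear", align_corners=False)
        return self.transform(self.fuse(torch.cat([deep, shallow], dim=1)))
```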
D2. The feature layers of different levels are dynamically fused in a progressive manner over several stages, yielding a more accurate and richer fused feature representation; the fusion weights are learned automatically by convolution. The final prostate segmentation prediction map is obtained by self-adaptive dynamic weighted fusion of the features across levels.
E. Training and optimization of the dynamic multi-scale perception self-adaptive integration network:
The whole method is divided into a training stage and an inference stage. During training, tensors from the training set serve as input to obtain the trained network parameters; during inference, testing uses the parameters saved in the training stage to obtain the final segmentation prediction result.
The embodiment of the invention is implemented under the PyTorch framework. An AdamW optimizer is used in the training stage, with a learning rate of 1e-3, a weight decay factor of 5e-2 and a batch size of 8. During training, the spatial resolution of the images is 224×224, but since the model is fully convolutional it may be applied to any resolution at test time. A minimal training sketch follows below.
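For completeness, a training-loop sketch matching the stated setup follows; the loss function, epoch count and checkpoint path are placeholders, not choices stated in the filing.

```python
# A minimal training sketch for stage E (AdamW, lr 1e-3, weight decay 5e-2,
# batch size 8); model, loader and loss are placeholders.
import torch

def train(model, train_loader, epochs: int = 100, device: str = "cuda"):
    model.to(device).train()
    optim = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=5e-2)
    criterion = torch.nn.BCEWithLogitsLoss()       # assumed binary segmentation loss
    for _ in range(epochs):
        for image, mask in train_loader:           # image: (8, 1, 224, 224)
            optim.zero_grad()
            pred = model(image.to(device))
            loss = criterion(pred, mask.float().to(device))
            loss.backward()
            optim.step()
    # Parameters saved here are reloaded for the inference stage.
    torch.save(model.state_dict(), "checkpoint.pth")
```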
The prostate lesion area detection method based on the dynamic multi-scale perception self-adaptive integration network provided by the embodiment of the invention adopts dynamic local pooling matched with a global attention mechanism and encodes the context information of the input prostate MR image into the current feature matrix, obtaining feature vectors that contain both global information and local detail information and adapting to scale changes of the target. Second, to keep noise from misleading the final result, the invention adopts multiple complementary attention-based fusion modes: the feature map generated at each scale uses several kinds of attention to enhance the foreground and suppress background noise. Experimental results show that the method obtains accurate predictions across large prostate size variations and scenes where the input image quality is low.
Corresponding to the method for detecting a prostate focus area in the above embodiment, the embodiment of the present invention further provides a device for detecting a prostate focus area based on a dynamic multiscale sensing adaptive integration network, including:
a tensor unit for acquiring a prostate input image from a dataset of prostate cancer and obtaining a tensor;
the coding unit is used for inputting the tensor obtained by the tensor unit into the feature encoder, using dynamic pooling convolution to attend to the local information of the image, combining a global efficient attention mechanism to attend to its global information, and integrating the local and global information through pixel addition and convolution operations to obtain the multi-scale coding features of each image;
the enhancement unit is used for obtaining a feature representation through the corresponding feature enhancement layer for the coding features obtained by the coding unit;
and the prediction unit is used for performing feature decoding on the feature representation obtained by the enhancement unit through a decoder to obtain a final prostate focus region segmentation prediction result.
Since the prostate lesion area detection device of the embodiment of the present invention corresponds to the prostate lesion area detection method of the above embodiment, its description is relatively simple; for the relevant parts, refer to the description of the method above, which is not repeated in detail here.
The embodiment of the invention also provides a computer readable storage medium, which stores computer instructions for causing a computer to execute the method for detecting the prostate focus area based on the dynamic multiscale sensing adaptive integration network.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. The prostate focus area detection method based on the dynamic multiscale sensing self-adaptive integration network is characterized by comprising the following steps:
acquiring a prostate input image according to a data set of the prostate cancer and obtaining tensors;
inputting the tensor into a feature encoder, using dynamic pooling convolution to attend to the local information of the image, combining a global efficient attention mechanism to attend to its global information, and integrating the local and global information through pixel addition and convolution operations to obtain a multi-scale coding feature for each image;
aiming at the coding features, obtaining feature representation through corresponding feature enhancement layers;
and performing feature decoding on the feature representation through a decoder to obtain a final prostate focus region segmentation prediction result.
2. The method for detecting a prostate lesion area based on a dynamic multiscale perceptual adaptive integration network according to claim 1, wherein acquiring a prostate input image and obtaining a tensor from a dataset of prostate cancer comprises:
dividing a set of data according to the prostate into a fixed number of training and test sets, wherein both the training and test sets have two subsets, namely an input image and a true value;
data enhancement is carried out on the data in the prostate training set;
converting the format of the enhanced image into tensors which can be processed by a network to obtain tensors with the size of the batch size;
and (3) carrying out size adjustment on the input image on the data in the prostate test set, and directly carrying out tensor processing on the image to obtain the tensor of the batch size.
3. The method for detecting the focal region of the prostate based on the dynamic multi-scale perception adaptive integration network according to claim 1, wherein the method comprises the following steps: the feature encoder is of a ResNet architecture, discards the last two layers to reserve a space structure, and adds a global-local complementary module after the output of each layer to extract multi-scale context information; and features of all layers except the first layer will be stored in the feature pyramid.
4. The prostate focus area detection method based on the dynamic multiscale sensing self-adaptive integration network according to claim 3, wherein: the global-local complementary module comprises a dynamic local pooling branch and a global high-efficiency attention branch;
the dynamic local pooling comprises dynamically distributing pooling layer combinations with proper sizes to each layer of a network according to the position of the layer in the whole network so as to better extract local information;
the global high-efficiency attention comprises the operation of directly comparing the similarity of the input feature images, so as to obtain a similarity weight image; by combining the weight map with the input image, a region that needs to be focused on in the global range is calculated.
5. The method for detecting the focal region of the prostate based on the dynamic multi-scale perception adaptive integration network according to claim 1, wherein the method comprises the following steps: performing feature decoding on the feature representation through a decoder to obtain a final prostate focus region segmentation prediction result, wherein the feature decoding comprises the following steps:
establishing a multi-level feature pyramid by using the characteristics of level complementation through a multi-level integration module with convolution, and adaptively encoding multi-scale information of an image into a current pyramid feature vector to obtain feature information which contains global and local and is integrated in different levels;
and carrying out self-adaptive dynamic weighted fusion on the feature layers among different levels in a progressive manner in a plurality of stages to obtain a final prostate segmentation prediction result graph, wherein the fusion weight is obtained by automatic learning through convolution operation.
6. The method for detecting a prostate lesion area based on a dynamic multiscale perceptual adaptive integration network according to claim 1, wherein, for the coding feature, a feature representation is obtained through a corresponding feature enhancement layer, comprising:
in the corresponding feature fusion layer, the features at each scale output by the feature encoder are respectively taken as input;
for the features of each scale, a dual-stream attention mechanism and a residual-attention mechanism are respectively used to further optimize the feature expression after the feature encoder;
for the feature expressions of the same spatial resolution obtained after the different attention transforms, pixel-level summation is used to obtain the richer fused feature expression.
7. The method for detecting prostate focus area based on dynamic multi-scale perception adaptive integration network according to claim 6, wherein the dual-flow attention mechanism uses spatial attention and channel attention to calculate the foreground area which should be focused on by the network under normal conditions, then normalizes and inverts the weighted graph of foreground attention to focus on the background, and then obtains the fused characteristic representation by using pixel-level addition and convolution operation;
the residual-attention mechanism carries out multipath residual convolution operation on the input characteristic representation by utilizing a residual module and self-attention and mutual attention, and information flow among the branches is realized by combining self-attention weight and mutual attention weight among the branches.
8. The method for detecting the focal region of the prostate based on the dynamic multi-scale perception adaptive integration network according to claim 7, wherein the method comprises the following steps: the number of multi-branches in the residual-attention mechanism is set to 4, and the weights between the self-attention weight graph and the mutual-attention weight graph are both 1.
9. A prostate focus area detection device based on a dynamic multiscale perception self-adaptive integration network, which is characterized by comprising:
a tensor unit for acquiring a prostate input image from a dataset of prostate cancer and obtaining a tensor;
the coding unit is used for inputting the tensor obtained by the tensor unit into a feature encoder, using dynamic pooling convolution to attend to the local information of the image, combining a global efficient attention mechanism to attend to its global information, and integrating the local and global information through pixel addition and convolution operations to obtain the multi-scale coding features of each image;
the enhancement unit is used for obtaining feature representation through a corresponding feature enhancement layer aiming at the coding features obtained by the coding unit;
and the prediction unit is used for performing feature decoding on the feature representation obtained by the enhancement unit through a decoder to obtain a final prostate focus region segmentation prediction result.
10. A computer readable storage medium, wherein the computer readable storage medium stores computer instructions for causing a computer to perform a method for detecting a prostate lesion area based on a dynamic multiscale perceptual adaptive integration network according to any one of claims 1-8.
CN202310486436.5A 2023-04-28 2023-04-28 Prostate focus area detection method, device and storage medium Pending CN116542924A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310486436.5A CN116542924A (en) 2023-04-28 2023-04-28 Prostate focus area detection method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310486436.5A CN116542924A (en) 2023-04-28 2023-04-28 Prostate focus area detection method, device and storage medium

Publications (1)

Publication Number Publication Date
CN116542924A true CN116542924A (en) 2023-08-04

Family

ID=87446403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310486436.5A Pending CN116542924A (en) 2023-04-28 2023-04-28 Prostate focus area detection method, device and storage medium

Country Status (1)

Country Link
CN (1) CN116542924A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593517A (en) * 2024-01-19 2024-02-23 南京信息工程大学 Camouflage target detection method based on complementary perception cross-view fusion network
CN117593517B (en) * 2024-01-19 2024-04-16 南京信息工程大学 Camouflage target detection method based on complementary perception cross-view fusion network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination