WO2022198808A1 - Medical image data classification method and system based on a bilinear attention network - Google Patents

Medical image data classification method and system based on a bilinear attention network Download PDF

Info

Publication number
WO2022198808A1
Authority
WO
WIPO (PCT)
Prior art keywords
attention
feature map
network
bilinear
spatial
Prior art date
Application number
PCT/CN2021/099784
Other languages
English (en)
French (fr)
Inventor
马凤英
纪鹏
曹茂永
姚辉
薛景瑜
Original Assignee
齐鲁工业大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 齐鲁工业大学 filed Critical 齐鲁工业大学
Publication of WO2022198808A1 publication Critical patent/WO2022198808A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Definitions

  • the present disclosure belongs to the technical field of image data processing, and in particular, relates to a method and system for classifying medical image data based on a bilinear attention network.
  • For Alzheimer's disease, most diagnoses are based mainly on clinical indicators and medical images.
  • The morphology of the relevant brain regions is observed in medical images, and clinical indicators are determined by measuring biomarkers in the cerebrospinal fluid.
  • The evaluation generally needs to be combined with a corresponding neuropsychological assessment. Since many factors must be considered together, accurately and effectively diagnosing Alzheimer's disease remains challenging.
  • In computer-aided diagnosis, magnetic resonance imaging is most commonly used as the basis for diagnosing Alzheimer's disease, with diagnostic conclusions drawn by measuring medical images of specific regions.
  • However, most traditional methods for processing Alzheimer's medical imaging data have certain deficiencies and cannot accurately analyze and utilize the corresponding data.
  • the present disclosure provides a method for classifying medical image data based on a bilinear attention network, which can effectively utilize and process image data and improve the accuracy of data classification.
  • a method for classifying medical image data based on a bilinear attention network, including: extracting the spatial feature map of a single static image;
  • the above spatial feature map is used as the first input feature map, and the channel attention mechanism is used to process the first input feature map to generate the final channel attention feature map; an element-wise multiplication of the channel attention feature map and the first input feature map generates the second input feature map;
  • a two-dimensional spatial attention map is output after the spatial attention mechanism is applied to the second input feature map;
  • the channel attention mechanism and the spatial attention mechanism realize data processing based on a bilinear attention network and obtain fused features, which are used for classifier classification.
  • the first input feature map is first passed through the global max pooling layer and the global average pooling layer respectively, then through the multi-layer perceptron respectively, and is activated by the sigmoid function to generate the final channel attention feature map.
  • when the spatial attention mechanism is applied to the second input feature map, max pooling and average pooling layers are applied to it to generate two-dimensional channel information maps of the two feature mappings; the results are aggregated into effective feature descriptors, which are then concatenated and convolved through a standard convolutional layer, finally outputting a two-dimensional spatial attention map.
  • the bilinear attention network includes a residual learning module, which transmits the original input information directly to the next network layer through skip connections; during backpropagation, the gradient is likewise passed directly to the previous layer through skip connections.
  • the bilinear attention network is a bilinear network constructed with ResNet50 as the skeleton network; ResNet50 is built on the Bottleneck structure, each layer is assembled from several blocks, and the layers make up the entire network.
  • a further technical solution is to add a bilinear pooling layer to the skeleton network of ResNet50.
  • for two features extracted by the convolutional network at a given position of the input image, the output matrix is obtained through matrix operations and then passed through a pooling layer to obtain a linear vector over the two features;
  • the fused feature is finally obtained through a normalization operation and is used for classifier classification.
  • a medical image data classification system based on a bilinear attention network, including:
  • a two-dimensional convolutional CNN module for extracting the spatial feature map of a single static image;
  • an attention mechanism module, including a channel attention module and a spatial attention module;
  • the channel attention module is used to take the above spatial feature map as the first input feature map and process it with the channel attention mechanism to generate the final channel attention feature map; an element-wise multiplication of the channel attention feature map and the first input feature map generates the second input feature map;
  • the spatial attention module applies the spatial attention mechanism to the second input feature map and outputs a two-dimensional spatial attention map;
  • the channel attention mechanism and the spatial attention mechanism realize data processing based on the bilinear attention network, and obtain fusion features, which are used for classifier classification.
  • a further technical solution also includes: a bilinear residual network structure module, which, based on the bilinear attention network, transmits the original input information directly to the next network layer through skip connections; during backpropagation, the gradient is likewise passed directly to the previous layer through skip connections.
  • the attention mechanism module is embedded behind the last convolutional layer of each block of the bilinear attention network
  • when embedding, each block is disconnected so that its convolutional layers are not directly connected; the output of the first convolutional layer of each block is used as the input of the channel attention module, ensuring that the backbone network fully extracts the features of the input image while removing redundant information.
  • the present disclosure solves the problem that most traditional fine-grained visual recognition methods ignore: that partial feature interactions between layers and feature learning are interrelated and can reinforce each other. Co-attention is extended to bilinear attention, which can perform multi-attention-distribution analysis on data of different modalities and exploit the diversity of the data better than analysis with a single compressed attention distribution.
  • it can fully extract the features of multi-modal input image data while removing redundant information, improving the accuracy of Alzheimer's disease medical image data classification while maintaining convergence speed.
  • FIG. 1 is a two-dimensional convolution structure diagram of an example of the present disclosure;
  • FIG. 2 is a schematic diagram of a hybrid attention module according to an example of the present disclosure;
  • FIG. 3 is a schematic diagram of a channel attention module according to an example of the present disclosure;
  • FIG. 4 is a schematic diagram of a spatial attention module according to an example of the present disclosure;
  • FIG. 5 is a schematic structural diagram of a convolution block (Conv Block) of an example of the present disclosure;
  • FIG. 6 is a schematic structural diagram of an identity block (Identity Block) of an example of the present disclosure;
  • FIG. 7 is a schematic diagram of the structure of the ResNet50 skeleton network adopted in an example of the present disclosure;
  • FIG. 8 is a schematic diagram of the improved bilinear residual network structure module of an example of the present disclosure;
  • FIG. 9 is a schematic diagram of embedding the hybrid attention module into the network structure according to an example of the present disclosure;
  • FIG. 10 is an overall structural composition diagram of an example of the present disclosure;
  • FIG. 11 is a feature map extracted in an example of the present disclosure.
  • This embodiment discloses a method for classifying medical image data based on a bilinear attention network, including:
  • the above-mentioned spatial feature map is used as the first input feature map, and the channel attention mechanism is used to process the first input feature map to generate the final channel attention feature map;
  • a two-dimensional spatial attention map is output after applying the spatial attention mechanism to the second input feature map
  • the channel attention mechanism and the spatial attention mechanism realize data processing based on a bilinear attention network, and obtain fusion features, which are used for classifier classification.
  • the first input feature map is first passed through the global max pooling layer and the global average pooling layer, then through the multi-layer perceptron respectively, and is activated by the sigmoid function to generate the final channel attention feature map.
  • max pooling and average pooling layers are applied to the second input feature map to generate two-dimensional channel information maps of the two feature mappings; the results are aggregated into effective feature descriptors, which are then concatenated and convolved through a standard convolution layer, finally outputting a two-dimensional spatial attention map.
  • This embodiment discloses a medical image data classification system based on bilinear attention network, including:
  • Two-dimensional convolutional CNN module for extracting spatial features of a single static image
  • the attention mechanism module is used to link the independent attention distribution originally established for each modal data of the input, focusing on the interaction between the multi-modal input data;
  • the bilinear residual network structure module directly transmits the original input information to the next layer of the network through skip connections, and the gradient is also directly transmitted to the previous layer through skip connections during backpropagation.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a subsampling layer.
  • a neuron is only connected to some of its neighbors.
  • a convolutional layer of a CNN usually contains several feature planes (feature maps); each feature plane is composed of neurons arranged in a rectangle, and neurons in the same feature plane share weights. The shared weights here are the convolution kernels.
  • a convolution kernel is generally initialized as a matrix of small random numbers.
  • during network training, the convolution kernel learns reasonable weights.
  • the immediate benefit of sharing weights is reducing the connections between network layers while also reducing the risk of overfitting.
  • Subsampling, also called pooling, usually takes two forms: mean pooling and max pooling. Subsampling can be seen as a special kind of convolution. Convolution and subsampling greatly simplify model complexity and reduce model parameters. Convolution is divided into one-dimensional, two-dimensional, and three-dimensional convolution; according to the requirements of the experimental data, the embodiment of the present disclosure uses the most common two-dimensional convolutional CNN to extract the spatial features of a single static image, as shown in Figure 1, a simulation diagram of two-dimensional convolution feature extraction: the pixel data of the original image are transformed according to the template matrix arrangement to obtain a feature map of the corresponding size.
  • the visual attention mechanism is a brain signal processing mechanism unique to human vision. Human vision obtains the target area that needs to be focused on by quickly scanning the global image, which is generally referred to as the focus of attention, and then invests more attention resources in this area to obtain more detailed information about the target that needs attention. And suppress other useless information.
  • the attention mechanism in deep learning is essentially similar to the selective visual attention mechanism of human beings, and the core goal is to select information that is more critical to the current task goal from a large number of information.
  • according to its role in neural networks, the attention mechanism can be divided into two types: the channel attention mechanism and the spatial attention mechanism.
  • the channel attention mechanism makes the network focus on different filters, while the spatial attention mechanism focuses on the key areas of the image information.
  • the bilinear idea is added to form a new attention mechanism, as shown in Figure 2.
  • the mixed attention mechanism links the independent attention distributions originally established for each input modality, focusing on the interactions between multi-modal input data.
  • regarding the channel attention module: channel attention focuses on which parts of the input image are meaningful. The input feature map is first passed through a global max pooling layer and a global average pooling layer respectively, then through a multi-layer perceptron (MLP) respectively, and is activated by the sigmoid function to generate the final channel attention feature map, as shown in Figure 3.
  • finally, an element-wise multiplication of the channel attention feature map and the input feature map generates the input features required by the spatial attention module. The specific calculations are shown in Equations 2.1.1 and 2.1.2.
  • W_0 and W_1 are the shared weights of the MLP, W_0 ∈ R^(C/r × C), W_1 ∈ R^(C × C/r), σ is the sigmoid function, and r is the reduction ratio.
  • the spatial attention module is generated by exploiting the spatial relationship between features. Different from the channel attention module, the spatial attention focuses on the position and spatial information between features, which is a supplement to the channel attention.
  • the feature map output by the channel attention module is used as the input feature map of this module.
  • first, max pooling and average pooling operations are applied to the feature map to generate two-dimensional channel information maps F_max and F_avg of the two feature mappings, and the results are aggregated into an effective feature descriptor.
  • a standard convolutional layer then concatenates this information and performs convolution, finally outputting a two-dimensional spatial attention map, as shown in Figure 4; the specific calculations are shown in Equations 2.2.1 and 2.2.2.
  • where σ is the sigmoid function and n is the convolution kernel size; following prior experience, n = 7 is chosen here.
  • the two are spliced to form a mixed attention module, as shown in Figure 2.
  • ResNet50 is built on the Bottleneck structure: each layer is assembled from several blocks, and the layers then make up the entire network. The basic network structure is shown in Table 1.
  • Convolution 2_x corresponds to layer 1
  • convolution 3_x corresponds to layer 2
  • convolution 4_x corresponds to layer 3
  • convolution 5_x corresponds to layer 4.
  • ResNet makes the nonlinear layer fit H(x, w_h) and adopts a shortcut structure that directly introduces a short connection from the input to the output of the nonlinear layer; the entire mapping becomes Equation 3.1, which is also the core formula of ResNet.
  • ResNet has two basic blocks. One is the identity block, whose input and output dimensions are the same and of which multiple can be connected in series; the other is the convolution block (Conv Block), whose input and output dimensions differ, so it cannot be connected in series consecutively; its function is to change the dimension of the feature vector. See the convolution block structure in Figure 5 and the identity block structure in Figure 6.
  • the overall ResNet50 skeleton structure is shown in Figure 7.
  • the improved network structure is shown in Figure 8.
  • the present invention embeds the mixed attention mechanism module in the built and improved ResNet50 network structure.
  • the attention mechanism module can improve the performance of the network model and can be applied to various networks, but its embedding position and method in different networks are different, and the performance of the model is also different.
  • the mixed attention module needs to be added after the last convolutional layer of each block, as shown in Figure 9.
  • each block is disconnected so that its convolutional layers are not directly connected.
  • the output of the first convolutional layer of each block is used as the input of the mixed attention module.
  • this connection method ensures that, in the backbone network, the model can fully extract the features of the input image while removing redundant information.
  • it improves the accuracy of model training and speeds up convergence; the overall structure is shown in Figure 10.
  • the idea of the fine-grained classification method in image visual recognition is applied to the classification of medical image data of Alzheimer's disease.
  • a new hierarchical bilinear pooling framework is proposed to obtain the partial feature relationship between layers, and integrate multiple cross-layer bilinear features to enhance its representation ability, which can significantly improve the classification effect.
  • the present disclosure designs a bilinear pooling network with ResNet50 as the skeleton network.
  • the purpose of this embodiment is to provide a computing device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, the steps of the method in Embodiment 1 above are implemented.
  • the purpose of this embodiment is to provide a computer-readable storage medium.
  • the steps involved in the apparatuses of Embodiments 2, 3, and 4 above correspond to method Embodiment 1; for specific implementation, refer to the relevant description of Embodiment 1.
  • the term "computer-readable storage medium” should be understood to include a single medium or multiple media including one or more sets of instructions; it should also be understood to include any medium capable of storing, encoding or carrying for use by a processor
  • the executed set of instructions causes the processor to perform any of the methods in this disclosure.
  • the modules or steps of the present disclosure can be implemented by a general-purpose computing device; alternatively, they can be implemented by program code executable by the computing device, so that they can be stored in a storage device and executed by the computing device, or they can be fabricated as individual integrated-circuit modules, or multiple modules or steps among them can be fabricated as a single integrated-circuit module.
  • the present disclosure is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The present disclosure proposes a medical image data classification method and system based on a bilinear attention network, including: extracting the spatial feature map of a single static image; using the above spatial feature map as the first input feature map and processing it with the channel attention mechanism to generate the final channel attention feature map; performing an element-wise multiplication of the channel attention feature map and the first input feature map to generate the second input feature map; and applying the spatial attention mechanism to the second input feature map to output a two-dimensional spatial attention map. The channel attention mechanism and the spatial attention mechanism realize data processing based on a bilinear attention network and obtain fused features for classifier classification. The accuracy of Alzheimer's disease medical image data classification is improved while the convergence speed is maintained.

Description

Medical image data classification method and system based on a bilinear attention network
Technical Field
The present disclosure belongs to the technical field of image data processing, and in particular relates to a medical image data classification method and system based on a bilinear attention network.
Background Art
The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.
In the field of medical image data, the data are multi-modal. Most traditional fine-grained visual recognition methods used in current data processing ignore the fact that partial feature interactions between layers and feature learning are interrelated and can reinforce each other; as a result, the classification of medical image data is not accurate enough, and in computer-aided diagnosis the imprecise classification results affect subsequent judgments.
For example, in Alzheimer's-related diagnosis, most diagnoses are based mainly on clinical indicators and medical images: the morphology of the relevant brain regions is observed in medical images, and clinical indicators are evaluated by measuring biomarkers in the cerebrospinal fluid, among other means. A corresponding neuropsychological assessment is generally also required. Because many factors must be considered together, accurately and effectively diagnosing Alzheimer's disease remains challenging to date.
In computer-aided diagnosis, magnetic resonance imaging is most commonly used as the basis for diagnosing Alzheimer's disease, with diagnostic conclusions drawn by measuring medical images of specific regions. However, most traditional ways of processing Alzheimer's medical imaging data have certain deficiencies and cannot accurately analyze and utilize the corresponding data.
Summary of the Invention
To overcome the above deficiencies of the prior art, the present disclosure provides a medical image data classification method based on a bilinear attention network, which can effectively utilize and process image data and improve the accuracy of data classification.
To achieve the above purpose, one or more embodiments of the present disclosure provide the following technical solutions:
In a first aspect, a medical image data classification method based on a bilinear attention network is disclosed, including:
extracting the spatial feature map of a single static image;
using the above spatial feature map as the first input feature map, and processing the first input feature map with the channel attention mechanism to generate the final channel attention feature map;
performing an element-wise multiplication of the channel attention feature map and the first input feature map to generate the second input feature map;
applying the spatial attention mechanism to the second input feature map and outputting a two-dimensional spatial attention map;
wherein the channel attention mechanism and the spatial attention mechanism realize data processing based on a bilinear attention network and obtain fused features for classifier classification.
In a further technical solution, when the channel attention mechanism processes the first input feature map, the first input feature map is first passed through a global max pooling layer and a global average pooling layer respectively, then through a multi-layer perceptron respectively, and is activated by a function to generate the final channel attention feature map.
In a further technical solution, when the spatial attention mechanism is applied to the second input feature map, max pooling and average pooling operations are applied to it to generate two-dimensional channel information maps of the two feature mappings; the results are aggregated into effective feature descriptors, which are then concatenated and convolved through a standard convolutional layer, finally outputting a two-dimensional spatial attention map.
In a further technical solution, the bilinear attention network includes a residual learning module, which transmits the original input information directly to the next network layer through skip connections; during backpropagation, the gradient is likewise passed directly to the previous layer through skip connections.
In a further technical solution, the bilinear attention network is a bilinear network constructed with ResNet50 as the skeleton network; ResNet50 is built on the Bottleneck structure, each layer is assembled from several blocks, and the layers make up the entire network.
In a further technical solution, a bilinear pooling layer is added to the ResNet50 skeleton network; for two features extracted by the convolutional network at a given position of the input image, the output matrix is obtained through matrix operations and then passed through a pooling layer to obtain a linear vector over the two features; finally, a normalization operation yields the fused feature for classifier classification.
In a second aspect, a medical image data classification system based on a bilinear attention network is disclosed, including:
a two-dimensional convolutional CNN module for extracting the spatial feature map of a single static image;
an attention mechanism module, including a channel attention module and a spatial attention module;
the channel attention module is used to take the above spatial feature map as the first input feature map and process it with the channel attention mechanism to generate the final channel attention feature map;
an element-wise multiplication of the channel attention feature map and the first input feature map generates the second input feature map;
the spatial attention module applies the spatial attention mechanism to the second input feature map and outputs a two-dimensional spatial attention map;
the channel attention mechanism and the spatial attention mechanism realize data processing based on the bilinear attention network and obtain fused features for classifier classification.
In a further technical solution, the system further includes: a bilinear residual network structure module, which, based on the bilinear attention network, transmits the original input information directly to the next network layer through skip connections; during backpropagation, the gradient is likewise passed directly to the previous layer through skip connections.
In a further technical solution, the attention mechanism module is embedded after the last convolutional layer of each block of the bilinear attention network;
when embedding, each block is disconnected so that its convolutional layers are not directly connected; the output of the first convolutional layer of each block is used as the input of the channel attention module, ensuring that the backbone network fully extracts the features of the input image while removing redundant information.
One or more of the above technical solutions have the following beneficial effects:
For the multi-modal data of medical images, the present disclosure solves the problem that most traditional fine-grained visual recognition methods ignore: that partial feature interactions between layers and feature learning are interrelated and can reinforce each other. Co-attention is extended to bilinear attention, which can perform multi-attention-distribution analysis on data of different modalities and exploit the diversity of the data better than analysis with a single compressed attention distribution. On the basis of accepting multi-modal input image data, it can fully extract the features of the multi-modal input images while removing redundant information, improving the accuracy of Alzheimer's disease medical image data classification while maintaining convergence speed.
Advantages of additional aspects of the present invention will be given in part in the following description; some will become apparent from the description, or will be learned through practice of the present invention.
Brief Description of the Drawings
The accompanying drawings, which form a part of the present disclosure, are used to provide further understanding of the present disclosure; the illustrative embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute an undue limitation of it.
Figure 1 is a two-dimensional convolution structure diagram of an example of the present disclosure;
Figure 2 is a schematic diagram of the hybrid attention module of an example of the present disclosure;
Figure 3 is a schematic diagram of the channel attention module of an example of the present disclosure;
Figure 4 is a schematic diagram of the spatial attention module of an example of the present disclosure;
Figure 5 is a schematic structural diagram of the convolution block (Conv Block) of an example of the present disclosure;
Figure 6 is a schematic structural diagram of the identity block (Identity Block) of an example of the present disclosure;
Figure 7 is a schematic diagram of the ResNet50 skeleton network structure adopted in an example of the present disclosure;
Figure 8 is a schematic diagram of the improved bilinear residual network structure module of an example of the present disclosure;
Figure 9 is a schematic diagram of embedding the hybrid attention module into the network structure in an example of the present disclosure;
Figure 10 is the overall structural composition diagram of an example of the present disclosure.
Figure 11 is a feature map extracted in an example of the present disclosure.
Detailed Description
It should be noted that the following detailed descriptions are all exemplary and are intended to provide further explanation of the present disclosure. Unless otherwise specified, all technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the art to which the present disclosure belongs.
Note that the terms used here are intended only to describe specific implementations and are not intended to limit the exemplary implementations of the present disclosure. As used here, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms; in addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
Without conflict, the embodiments of the present disclosure and the features in the embodiments can be combined with each other.
Embodiment 1
This embodiment discloses a medical image data classification method based on a bilinear attention network, including:
extracting the spatial feature map of a single static image;
using the above spatial feature map as the first input feature map, and processing the first input feature map with the channel attention mechanism to generate the final channel attention feature map;
performing an element-wise multiplication of the channel attention feature map and the first input feature map to generate the second input feature map;
applying the spatial attention mechanism to the second input feature map and outputting a two-dimensional spatial attention map;
wherein the channel attention mechanism and the spatial attention mechanism realize data processing based on a bilinear attention network and obtain fused features for classifier classification.
When the channel attention mechanism processes the first input feature map, the first input feature map is first passed through a global max pooling layer and a global average pooling layer respectively, then through a multi-layer perceptron respectively, and is activated by a function to generate the final channel attention feature map.
When the spatial attention mechanism is applied to the second input feature map, max pooling and average pooling operations are applied to it to generate two-dimensional channel information maps of the two feature mappings; the results are aggregated into effective feature descriptors, which are then concatenated and convolved through a standard convolutional layer, finally outputting a two-dimensional spatial attention map.
Embodiment 2
This embodiment discloses a medical image data classification system based on a bilinear attention network, including:
a two-dimensional convolutional CNN module for extracting the spatial features of a single static image;
an attention mechanism module for linking the independent attention distributions originally established for each input modality, focusing on the interactions between multi-modal input data;
a bilinear residual network structure module, which transmits the original input information directly to the next network layer through skip connections; during backpropagation, the gradient is likewise passed directly to the previous layer through skip connections.
In specific implementation examples, regarding the two-dimensional convolutional CNN module: the wide application of deep learning in image classification and its achievements are evident. A convolutional neural network differs from an ordinary neural network in that it contains a feature extractor composed of convolutional layers and subsampling layers. In a convolutional layer of a convolutional neural network, a neuron is connected only to some neurons of the neighboring layer. A convolutional layer of a CNN usually contains several feature planes (feature maps); each feature plane is composed of neurons arranged in a rectangle, and neurons in the same feature plane share weights; the shared weights here are the convolution kernels. A convolution kernel is generally initialized as a matrix of small random numbers, and during network training it learns reasonable weights. The direct benefit of sharing weights (convolution kernels) is reducing the connections between network layers while also reducing the risk of overfitting. Subsampling, also called pooling, usually takes two forms: mean pooling and max pooling. Subsampling can be seen as a special kind of convolution. Convolution and subsampling greatly simplify model complexity and reduce model parameters. Convolution is divided into one-dimensional, two-dimensional, and three-dimensional convolution; according to the requirements of the experimental data, the example of the present disclosure uses the currently most common two-dimensional convolutional CNN to extract the spatial features of a single static image, as shown in Figure 1, a simulation diagram of two-dimensional convolution feature extraction: the pixel data of the original image are transformed according to the template matrix arrangement, and a feature map of the corresponding size is obtained.
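The template-matrix transformation described above can be sketched in a few lines of NumPy. This is only an illustrative, minimal single-channel 2-D convolution (stride 1, no padding), not the patent's implementation; the 5×5 image and 3×3 mean kernel are assumed values chosen for the example:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a template matrix (kernel) over the image and sum the
    element-wise products at each position ('valid' mode, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh, ow = ih - kh + 1, iw - kw + 1    # feature map of the corresponding size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0           # 3x3 mean filter as the template matrix
feat = conv2d(image, kernel)
print(feat.shape)                        # (3, 3)
```

A 5×5 input with a 3×3 kernel yields a 3×3 feature map, matching the "corresponding size" relationship the paragraph describes.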
The visual attention mechanism is a brain signal processing mechanism unique to human vision. Human vision obtains the target area that needs attention by quickly scanning the global image, generally called the focus of attention, and then invests more attention resources in this area to obtain more detailed information about the target of interest while suppressing other useless information. The attention mechanism in deep learning is essentially similar to the human selective visual attention mechanism; its core goal is also to select, from a large amount of information, the information that is more critical to the current task. According to its role in neural networks, the attention mechanism can be divided into two types: the channel attention mechanism and the spatial attention mechanism. The channel attention mechanism makes the network focus on different filters, while the spatial attention mechanism focuses on the key areas of the image information. On the basis of combining the two, the present invention adds the bilinear idea to form a new attention mechanism, as shown in Figure 2; this hybrid attention mechanism links the independent attention distributions originally established for each input modality, focusing on the interactions between multi-modal input data.
Regarding the channel attention module: channel attention focuses on which parts of the input image are meaningful. The input feature map is first passed through a global max pooling layer and a global average pooling layer respectively, then through a multi-layer perceptron (MLP) respectively, and is activated by the sigmoid function to generate the final channel attention feature map, as shown in Figure 3. Finally, an element-wise multiplication of the channel attention feature map and the input feature map generates the input features required by the spatial attention module; the specific calculations are shown in Equations 2.1.1 and 2.1.2.
X_c(F) = σ(MLP(gavgP(F)) + MLP(gmaxP(F)))                (2.1.1)
       = σ(W_1(W_0(F_gavg)) + W_1(W_0(F_gmax)))          (2.1.2)
where W_0 and W_1 are the shared weights of the MLP, W_0 ∈ R^(C/r × C), W_1 ∈ R^(C × C/r), σ is the sigmoid function, and r is the reduction ratio.
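The channel attention computation of Equations 2.1.1 and 2.1.2 can be sketched in NumPy as follows. The channel count, reduction ratio, and random weights are illustrative assumptions, not values from the disclosure; a real network would learn W_0 and W_1 by backpropagation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W0, W1):
    """Eq. 2.1.1/2.1.2: a shared MLP (W0, W1) is applied to the
    global-average- and global-max-pooled descriptors, the two results
    are summed, and a sigmoid yields per-channel weights.
    F has shape (C, H, W)."""
    f_avg = F.mean(axis=(1, 2))          # global average pooling -> (C,)
    f_max = F.max(axis=(1, 2))           # global max pooling     -> (C,)
    mlp = lambda v: W1 @ (W0 @ v)        # shared two-layer perceptron
    return sigmoid(mlp(f_avg) + mlp(f_max))

rng = np.random.default_rng(0)
C, r = 8, 2                              # r is the reduction ratio
F = rng.standard_normal((C, 6, 6))
W0 = rng.standard_normal((C // r, C))    # W0 in R^(C/r x C)
W1 = rng.standard_normal((C, C // r))    # W1 in R^(C x C/r)
Xc = channel_attention(F, W0, W1)
F2 = F * Xc[:, None, None]               # element-wise multiplication -> second input
print(Xc.shape, F2.shape)
```

The final line performs the element-wise multiplication that generates the input features for the spatial attention module.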
Regarding the spatial attention module: it is generated by exploiting the spatial relationships between features. Unlike the channel attention module, spatial attention focuses on the positions of features and spatial information, complementing channel attention. The feature map output by the channel attention module is used as the input feature map of this module. First, max pooling and average pooling operations are applied to the feature map to generate two-dimensional channel information maps F_max and F_avg of the two feature mappings, and the results are aggregated into an effective feature descriptor; then a standard convolutional layer connects this information and performs convolution, finally outputting a two-dimensional spatial attention map, as shown in Figure 4; the specific calculations are shown in Equations 2.2.1 and 2.2.2.
X_s(F) = σ(f^(n×n)([avgP(F); maxP(F)]))                  (2.2.1)
       = σ(f^(n×n)([F_avg; F_max]))                      (2.2.2)
where σ is the sigmoid function and n is the convolution kernel size; based on prior experience, n = 7 is chosen here.
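Equations 2.2.1 and 2.2.2 can likewise be sketched in NumPy. The feature shapes and the random 7×7 kernel are illustrative assumptions; "same" padding is used here so the attention map keeps the spatial size, which the disclosure does not spell out:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(F, kernel):
    """Eq. 2.2.1/2.2.2: channel-wise average and max pooling give F_avg
    and F_max (each H x W); they are stacked into a 2-channel descriptor,
    convolved with an n x n kernel, and a sigmoid yields the 2-D spatial
    attention map.  F has shape (C, H, W); kernel has shape (2, n, n)."""
    F_avg = F.mean(axis=0)               # (H, W) channel information map
    F_max = F.max(axis=0)                # (H, W) channel information map
    stacked = np.stack([F_avg, F_max])   # aggregated feature descriptor
    n = kernel.shape[-1]
    p = n // 2
    padded = np.pad(stacked, ((0, 0), (p, p), (p, p)))
    H, W = F.shape[1:]
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + n, j:j + n] * kernel)
    return sigmoid(out)

rng = np.random.default_rng(1)
F = rng.standard_normal((8, 6, 6))
kernel = rng.standard_normal((2, 7, 7)) * 0.1   # n = 7 as in the text
Xs = spatial_attention(F, kernel)
print(Xs.shape)                          # (6, 6) two-dimensional attention map
```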
The two modules are spliced together to form the hybrid attention module, as shown in Figure 2.
Regarding the construction of the bilinear residual network structure: this example adopts ResNet50 as the skeleton network to build the bilinear network. ResNet adds a residual learning module to the traditional convolutional neural network: the original input information is transmitted directly to the next network layer through skip connections, and during backpropagation the gradient is likewise passed directly to the previous layer through skip connections. ResNet solves the problem that, as deep networks get deeper, accuracy on both the training and test sets drops, i.e., the vanishing-gradient and degradation problems, allowing neural networks to become deeper and deeper while maintaining accuracy and computation speed. ResNet50 is built on the Bottleneck structure: each layer is assembled from several blocks, and the layers then make up the entire network; the basic network structure is shown in Table 1.
Table 1 Basic network structure
Figure PCTCN2021099784-appb-000001
Convolution 2_x corresponds to layer 1, convolution 3_x to layer 2, convolution 4_x to layer 3, and convolution 5_x to layer 4. The "×2", "×3", etc. in the boxes indicate how many identical structures the layer consists of.
ResNet makes the nonlinear layer fit H(x, w_h) and adopts a shortcut structure that directly introduces a short connection from the input to the output of the nonlinear layer; the entire mapping becomes Equation 3.1, which is also the core formula of ResNet.
y = H(x, w_h) + x                                        (3.1)
ResNet has two basic blocks. One is the identity block (Identity Block), whose input and output dimensions are the same and of which multiple can be connected in series; the other is the convolution block (Conv Block), whose input and output dimensions differ, so it cannot be connected in series consecutively; its function is to change the dimension of the feature vector. See the convolution block structure in Figure 5 and the identity block structure in Figure 6. The overall ResNet50 skeleton structure is built as shown in Figure 7.
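Equation 3.1 and the two block types can be illustrated with a toy fully-connected analogue. This is only a sketch of the skip-connection idea under assumed shapes, not the actual convolutional Bottleneck blocks of ResNet50:

```python
import numpy as np

def identity_block(x, weights):
    """Identity block: stacked nonlinear transforms H(x) plus a skip
    connection, y = H(x, w_h) + x (Eq. 3.1).  Input and output
    dimensions match, so several blocks can be chained."""
    h = x
    for W in weights:                    # square matrices preserve the dimension
        h = np.maximum(0.0, W @ h)       # linear transform + ReLU
    return h + x                         # shortcut adds the original input

def conv_block(x, W_main, W_short):
    """Conv block: the main path changes the feature dimension, so the
    shortcut also needs a projection W_short before the addition."""
    return np.maximum(0.0, W_main @ x) + W_short @ x

rng = np.random.default_rng(2)
x = rng.standard_normal(16)
ws = [rng.standard_normal((16, 16)) * 0.1 for _ in range(2)]
y = identity_block(x, ws)                # same dimension in and out: (16,)
z = conv_block(x, rng.standard_normal((32, 16)) * 0.1,
                  rng.standard_normal((32, 16)) * 0.1)   # changes dim to (32,)
print(y.shape, z.shape)
```

Note that if the residual path contributes nothing (all-zero weights), the identity block simply passes x through unchanged; this is the property that lets gradients flow directly to earlier layers.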
Building on the idea proposed in previous work, in which two features at the same position are bilinearly fused to obtain a feature matrix b, the b of all positions are sum-pooled to obtain a matrix ξ, ξ is flattened into a vector x, and after moment normalization and L2 normalization of x the fused feature z is obtained and used for fine-grained classification, the present invention adds a bilinear pooling layer to the ResNet50 skeleton network. For two features f_m(a, A) and f_n(a, A) extracted by the convolutional network from input image A at position a, the operations of Equations 3.2 and 3.3 are performed; the size of the extracted features is not specified here, and only a theoretical derivation is given.
b(a, A) = f_m(a, A)^T f_n(a, A)                          (3.2)
ξ(A) = Σ_a b(a, A)                                       (3.3)
The output matrix b is passed through matrix operations to obtain the matrix ξ, and then through a pooling layer to obtain the linear vector x = vec(ξ(A)). Finally, a normalization operation yields the fused feature z, which is used for classifier classification. The improved network structure is shown in Figure 8.
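The bilinear pooling and normalization steps just described can be sketched as follows. The feature shapes are illustrative assumptions, and signed square root is used as the moment normalization, a common choice for bilinear pooling that the disclosure does not specify:

```python
import numpy as np

def bilinear_pool(fm, fn):
    """At each position a, the outer product of the two feature vectors
    gives b(a); summing over all positions gives xi; xi is flattened to
    x = vec(xi); moment (signed-sqrt) and L2 normalization then yield
    the fused feature z.  fm, fn: (num_positions, dim_m/dim_n)."""
    xi = fm.T @ fn                       # sum over positions of outer products
    x = xi.reshape(-1)                   # vec(xi)
    x = np.sign(x) * np.sqrt(np.abs(x))  # moment normalization (assumed form)
    return x / (np.linalg.norm(x) + 1e-12)   # L2 normalization

rng = np.random.default_rng(3)
fm = rng.standard_normal((36, 8))        # features from one branch, 6x6 positions
fn = rng.standard_normal((36, 8))        # features from the other branch
z = bilinear_pool(fm, fn)
print(z.shape)                           # (64,) fused feature for the classifier
```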
Embedding of the attention mechanism module in the bilinear attention network
The present invention embeds the hybrid attention mechanism module into the improved ResNet50 network structure built above. The attention mechanism module can improve the performance of a network model and can be applied in various networks, but its embedding position and manner differ across networks, and the resulting model performance also differs. Based on the improved ResNet50 network built in the previous subsection, the hybrid attention module needs to be added after the last convolutional layer of each block, as shown in Figure 9. First, each block is disconnected so that its convolutional layers are not directly connected, and the output of the first convolutional layer of each block is used as the input of the hybrid attention module. This connection method ensures that, in the backbone network, the model can fully extract the features of the input image while removing redundant information, improving the accuracy of model training and accelerating convergence; the overall structure is shown in Figure 10.
In a specific implementation example, the idea of fine-grained classification methods in image visual recognition is applied to the classification of Alzheimer's disease medical image data. A new hierarchical bilinear pooling framework is proposed to capture the partial feature relationships between layers and integrate multiple cross-layer bilinear features to enhance representation ability, which clearly improves the classification effect. Based on this idea, the present disclosure designs a bilinear pooling network with ResNet50 as the skeleton network; to further improve classification accuracy and remove redundant information, a hybrid attention mechanism is added, and the bilinear idea is applied to the attention mechanism. For the multi-modal data of Alzheimer's disease medical images, this solves the problem that most traditional fine-grained visual recognition methods ignore: that partial feature interactions between layers and feature learning are interrelated and can reinforce each other. Co-attention is extended to bilinear attention, which can perform multi-attention-distribution analysis on data of different modalities and exploit the diversity of the data better than analysis with a single compressed attention distribution. An example feature map extracted in this example is shown in Figure 11.
Embodiment 3
The purpose of this embodiment is to provide a computing device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, the steps of the method in Embodiment 1 above are implemented.
Embodiment 4
The purpose of this embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method in Embodiment 1 above are performed.
The steps involved in the apparatuses of Embodiments 2, 3, and 4 above correspond to method Embodiment 1; for specific implementation, refer to the relevant description of Embodiment 1. The term "computer-readable storage medium" should be understood to include a single medium or multiple media containing one or more instruction sets; it should also be understood to include any medium capable of storing, encoding, or carrying a set of instructions for execution by a processor that causes the processor to perform any of the methods of the present disclosure.
Those skilled in the art should understand that the modules or steps of the present disclosure described above can be implemented by a general-purpose computing device; alternatively, they can be implemented by program code executable by the computing device, so that they can be stored in a storage device and executed by the computing device, or they can be fabricated as individual integrated-circuit modules, or multiple modules or steps among them can be fabricated as a single integrated-circuit module. The present disclosure is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present disclosure and are not intended to limit it; for those skilled in the art, the present disclosure may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure shall be included in the protection scope of the present disclosure.
Although the specific implementations of the present disclosure have been described above with reference to the accompanying drawings, they do not limit the protection scope of the present disclosure. Those skilled in the art should understand that, on the basis of the technical solutions of the present disclosure, various modifications or variations that can be made without creative effort are still within the protection scope of the present disclosure.

Claims (10)

  1. A medical image data classification method based on a bilinear attention network, characterized by comprising:
    extracting the spatial feature map of a single static image;
    using the above spatial feature map as the first input feature map, and processing the first input feature map with the channel attention mechanism to generate the final channel attention feature map;
    performing an element-wise multiplication of the channel attention feature map and the first input feature map to generate the second input feature map;
    applying the spatial attention mechanism to the second input feature map and outputting a two-dimensional spatial attention map;
    wherein the channel attention mechanism and the spatial attention mechanism realize data processing based on a bilinear attention network and obtain fused features for classifier classification.
  2. The medical image data classification method based on a bilinear attention network according to claim 1, characterized in that, when the channel attention mechanism processes the first input feature map, the first input feature map is first passed through a global max pooling layer and a global average pooling layer respectively, then through a multi-layer perceptron respectively, and is activated by a function to generate the final channel attention feature map.
  3. The medical image data classification method based on a bilinear attention network according to claim 1, characterized in that, when the spatial attention mechanism is applied to the second input feature map, max pooling and average pooling operations are applied to it to generate two-dimensional channel information maps of the two feature mappings; the results are aggregated into effective feature descriptors, which are then concatenated and convolved through a standard convolutional layer, finally outputting a two-dimensional spatial attention map.
  4. The medical image data classification method based on a bilinear attention network according to claim 1, characterized in that the bilinear attention network comprises a residual learning module for transmitting the original input information directly to the next network layer through skip connections; during backpropagation, the gradient is likewise passed directly to the previous layer through skip connections.
  5. The medical image data classification method based on a bilinear attention network according to claim 4, characterized in that the bilinear attention network is a bilinear network constructed with ResNet50 as the skeleton network; ResNet50 is built on the Bottleneck structure, each layer is assembled from several blocks, and the layers make up the entire network.
  6. The medical image data classification method based on a bilinear attention network according to claim 5, characterized in that a bilinear pooling layer is added to the ResNet50 skeleton network; for two features extracted by the convolutional network at a given position of the input image, the output matrix is obtained through matrix operations and then passed through a pooling layer to obtain a linear vector over the two features; finally, a normalization operation yields the fused feature for classifier classification.
  7. A medical image data classification system based on a bilinear attention network, characterized by comprising:
    a two-dimensional convolutional CNN module for extracting the spatial feature map of a single static image;
    an attention mechanism module, including a channel attention module and a spatial attention module;
    the channel attention module is used to take the above spatial feature map as the first input feature map and process it with the channel attention mechanism to generate the final channel attention feature map;
    an element-wise multiplication of the channel attention feature map and the first input feature map generates the second input feature map;
    the spatial attention module applies the spatial attention mechanism to the second input feature map and outputs a two-dimensional spatial attention map;
    the channel attention mechanism and the spatial attention mechanism realize data processing based on the bilinear attention network and obtain fused features for classifier classification.
  8. The medical image data classification system based on a bilinear attention network according to claim 7, characterized by further comprising: a bilinear residual network structure module, which, based on the bilinear attention network, transmits the original input information directly to the next network layer through skip connections; during backpropagation, the gradient is likewise passed directly to the previous layer through skip connections;
    preferably, the attention mechanism module is embedded after the last convolutional layer of each block of the bilinear attention network;
    when embedding, each block is disconnected so that its convolutional layers are not directly connected; the output of the first convolutional layer of each block is used as the input of the channel attention module, ensuring that the backbone network fully extracts the features of the input image while removing redundant information.
  9. A computing device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that, when the processor executes the program, the steps of the method of any one of claims 1-7 are implemented.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps of the method of any one of claims 1-7 are performed.
PCT/CN2021/099784 2021-03-24 2021-06-11 基于双线性注意力网络的医学影像数据分类方法及系统 WO2022198808A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110315001.5A CN113065588A (zh) 2021-03-24 2021-03-24 Medical image data classification method and system based on a bilinear attention network
CN202110315001.5 2021-03-24

Publications (1)

Publication Number Publication Date
WO2022198808A1 true WO2022198808A1 (zh) 2022-09-29

Family

ID=76561627

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099784 WO2022198808A1 (zh) 2021-03-24 2021-06-11 Medical image data classification method and system based on a bilinear attention network

Country Status (2)

Country Link
CN (1) CN113065588A (zh)
WO (1) WO2022198808A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612791B (zh) * 2022-05-11 2022-07-29 西南民族大学 Object detection method and apparatus based on an improved attention mechanism
CN117636074B (zh) * 2024-01-25 2024-04-26 山东建筑大学 Multimodal image classification method and system based on feature interaction fusion

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871777A (zh) * 2019-01-23 2019-06-11 广州智慧城市发展研究院 Behavior recognition system based on an attention mechanism
CN110084794A (zh) * 2019-04-22 2019-08-02 华南理工大学 Skin cancer image recognition method based on an attention convolutional neural network
CN110197208A (zh) * 2019-05-14 2019-09-03 江苏理工学院 Intelligent textile defect detection and classification method and apparatus
WO2020244774A1 (en) * 2019-06-07 2020-12-10 Leica Microsystems Cms Gmbh A system and method for training machine-learning algorithms for processing biology-related data, a microscope and a trained machine learning algorithm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110189334B (zh) * 2019-05-28 2022-08-09 南京邮电大学 Medical image segmentation method using an attention-based residual fully convolutional neural network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sanghyun Woo; Jongchan Park; Joon-Young Lee; In So Kweon: "CBAM: Convolutional Block Attention Module", arXiv.org, Cornell University Library, Ithaca, NY, 17 July 2018 (2018-07-17), XP081113447 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375999A (zh) * 2022-10-25 2022-11-22 城云科技(中国)有限公司 Object detection model, method and apparatus for hazardous chemical vehicle detection
CN115375999B (zh) * 2022-10-25 2023-02-14 城云科技(中国)有限公司 Object detection model, method and apparatus for hazardous chemical vehicle detection
CN115375691A (zh) * 2022-10-26 2022-11-22 济宁九德半导体科技有限公司 Image-based semiconductor diffusion paper source defect detection system and method
CN115546879A (zh) * 2022-11-29 2022-12-30 城云科技(中国)有限公司 Fine-grained recognition model and method for facial expression recognition
CN115546879B (zh) * 2022-11-29 2023-02-17 城云科技(中国)有限公司 Fine-grained recognition model and method for facial expression recognition
CN116030048A (zh) * 2023-03-27 2023-04-28 山东鹰眼机械科技有限公司 Lamp inspection machine and method thereof

Also Published As

Publication number Publication date
CN113065588A (zh) 2021-07-02

Similar Documents

Publication Publication Date Title
WO2022198808A1 (zh) Medical image data classification method and system based on a bilinear attention network
WO2021018163A1 (zh) Neural network search method and apparatus
CN109685819B (zh) Three-dimensional medical image segmentation method based on feature enhancement
CN110188795A (zh) Image classification method, data processing method and apparatus
WO2022042713A1 (zh) Deep learning training method and apparatus for a computing device
CN109902548B (zh) Object attribute recognition method and apparatus, computing device and system
Tang et al. DFFNet: An IoT-perceptive dual feature fusion network for general real-time semantic segmentation
CN110378381A (zh) Object detection method and apparatus, and computer storage medium
WO2021155792A1 (zh) Processing apparatus, method and storage medium
WO2022052601A1 (zh) Neural network model training method, and image processing method and apparatus
WO2022001805A1 (zh) Neural network distillation method and apparatus
CN106339984B (zh) Distributed image super-resolution method based on a k-means-driven convolutional neural network
WO2022001372A1 (zh) Neural network training method, and image processing method and apparatus
US20220157041A1 (en) Image classification method and apparatus
WO2020098257A1 (zh) Image classification method and apparatus, and computer-readable storage medium
CN110222717A (zh) Image processing method and apparatus
CN112529146B (zh) Neural network model training method and apparatus
CN112489050A (zh) Semi-supervised instance segmentation algorithm based on feature transfer
CN110222718A (zh) Image processing method and apparatus
CN109190683A (zh) Classification method based on an attention mechanism and bimodal images
CN110222556A (zh) Human action recognition system and method
Zhang et al. Channel-wise and feature-points reweights densenet for image classification
Zheng et al. Interactive multi-scale feature representation enhancement for small object detection
Yuan et al. Low-res MobileNet: An efficient lightweight network for low-resolution image classification in resource-constrained scenarios
CN116432736A (zh) Neural network model optimization method, apparatus and computing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932424

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE