NL2032936B1 - Brain tumor image region segmentation method and device, neural network and electronic equipment - Google Patents
Classifications
- G06T7/11 — Region-based segmentation
- G06T7/0012 — Biomedical image inspection
- G06N3/045 — Combinations of networks
- G06N3/0455 — Auto-encoder networks; Encoder-decoder networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06T2207/10088 — Magnetic resonance imaging [MRI]
- G06T2207/20076 — Probabilistic image processing
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30016 — Brain
- G06T2207/30096 — Tumor; Lesion
- Y02T10/40 — Engine management systems
Abstract
Disclosed is a brain tumor image region segmentation method and device, a neural network and electronic equipment. The method includes: acquiring a brain MRI image; building coding modules, wherein each of the coding modules includes coding blocks and an attention model; the MRI image is inputted into the coding blocks to obtain a first feature map; the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel is configured to input the second feature map into the first network and the second network respectively; inputting the brain MRI image into the coding modules, and extracting to obtain a third feature map of the brain MRI image; and decoding the third feature map to obtain a segmented region of the brain tumor.
Description
P1584 /NL
BRAIN TUMOR IMAGE REGION SEGMENTATION METHOD AND DEVICE, NEURAL
NETWORK AND ELECTRONIC EQUIPMENT
The application relates to the technical field of deep neural networks, and in particular to a brain tumor image region segmentation method and device, a neural network and electronic equipment.
Glioma is one of the most common intracranial tumors. The clinical prognosis of glioma varies widely, and the degree of malignancy is high. It can lead to various symptoms, such as headache, epilepsy and intracranial nerve diseases. According to the classification of the World Health Organization, there are four grades of glioma. Grades I and II are low-grade glioma, and Grades III and IV are high-grade glioma (HGG). A glioma can be divided into three subregions: enhancing tumor (ET), tumor core (TC), and whole tumor (WT). Magnetic resonance imaging (MRI) has been widely applied in the imaging diagnosis of various systems of the whole body. Craniocerebral MRI is more sensitive than CT in the diagnosis of brain tumors, and can detect early lesions with more accurate localization. Using MRI scanning to segment glioma can obtain better segmentation results.
However, the appearance, shape and location of glioma vary greatly among patients, so naive U-Net and some variants (such as 3D U-Net, Res-UNet and UNet++) cannot achieve good segmentation results. Many recent studies have used attention mechanisms to address this problem, such as Attention U-Net, TransUNet and TransBTS. These three models essentially use spatial attention mechanisms, which help the models focus on the spatial location of the segmentation target.
In the realization process of the present invention, the inventor has found at least the following problems in the prior art: these spatial attention mechanisms are all deficient. Due to the limitation of computing resources, TransUNet and TransBTS can learn the global spatial location dependence of low-resolution high-level semantic feature maps only, but ignore low-level semantic feature maps, which contain a great deal of geometric information that is important for tumor localization. Although the attention gate mechanism of Attention U-Net and the spatial attention mechanism in CBAM can be applied to both high-level and low-level semantic feature maps, they can only learn the local spatial location relationship, whereas the global spatial location relationship is also critical for tumor localization.
The embodiments of the application are intended to provide a brain tumor image region segmentation method and device, a neural network and electronic equipment, so as to solve the technical problem in the related art that the existing spatial attention mechanisms fail to learn the global spatial location relationship in high-level and low-level semantic feature maps at the same time.
According to the first aspect of the embodiments of the application, a brain tumor image region segmentation method is provided, including the following steps: acquiring a brain MRI image; building coding modules, wherein each of the coding modules includes coding blocks and an attention model; the MRI image is inputted into the coding blocks to obtain a first feature map; the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel includes a first network, a second network and an activation (Sigmoid) layer; the second feature map is inputted into the first network and the second network respectively, and outputs of the first network and the second network are added and inputted into the Sigmoid layer to obtain a spatial attention map; a third feature map is obtained according to the spatial attention map and the second feature map; wherein the first network includes a maximum pooling (MaxPool) layer and a hierarchical fully connected layer along a feature dimension, and the second network includes an average pooling (AvgPool) layer and a hierarchical fully connected layer along the feature dimension; inputting the brain MRI image into the coding modules, and extracting to obtain a third feature map of the brain MRI image; and decoding the third feature map to obtain a segmented region of the brain tumor.
Further, the feature attention submodel includes a third network, a fourth network and a Sigmoid layer; the first feature map is inputted into the third network and the fourth network respectively, and outputs of the third network and the fourth network are added and inputted into the Sigmoid layer to obtain a feature attention map; the feature attention map is multiplied by the first feature map to obtain the second feature map.
Further, the third network includes a MaxPool layer, a fully connected layer, a nonlinear activation (ReLu) layer and a fully connected layer along the spatial dimension; the fourth network includes an AvgPool layer, a fully connected layer, a ReLu layer and a fully connected layer along the spatial dimension.
Further, the hierarchical fully connected layer includes several sequentially connected submodules, and all second outputs of the submodules are added as an output of the hierarchical fully connected layer; wherein each of the submodules is configured to perform the following operation until a certain spatial dimension of an input is smaller than a region size, and the input of a first submodule is the second feature map after max pooling or avg pooling: dividing an input according to a region size; inputting the divided input into a feed forward network to learn the spatial location relationship of local regions, and then restoring a shape to the same shape as the input to obtain a first output; upsampling the first output to obtain a second output with the same size as an input of the hierarchical fully connected layer; and downsampling the first output and taking the first output as an input of a next submodule after the first output passes through a batch normalization layer.
According to the second aspect of the embodiments of the application, a neural network used for brain tumor image region segmentation is provided, including the following paths: a coding path, wherein the coding path includes coding blocks and an attention model, configured to extract features of the brain MRI image; the brain MRI image is inputted into the coding blocks to obtain a first feature map, and the first feature map is inputted into the attention model to obtain a third feature map; wherein the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel includes a first network, a second network and a Sigmoid layer; the second feature map is inputted into the first network and the second network respectively, and outputs of the first network and the second network are added and inputted into the Sigmoid layer to obtain a spatial attention map; a third feature map is obtained according to the spatial attention map and the second feature map; wherein the first network includes a MaxPool layer and a hierarchical fully connected layer along a feature dimension, and the second network includes an AvgPool layer and a hierarchical fully connected layer along the feature dimension; and a decoding path, wherein the decoding path includes decoding blocks, configured to decode the third feature map to obtain a region of the brain tumor.
According to the third aspect of the embodiments of the application, a brain tumor image region segmentation device is provided, including the following modules: an acquisition module, configured to acquire a brain MRI image; a building module, configured to build coding modules, wherein each of the coding modules includes coding blocks and an attention model; the MRI image is inputted into the coding blocks to obtain a first feature map; the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel includes a first network, a second network and a Sigmoid layer; the second feature map is inputted into the first network and the second network respectively, and outputs of the first network and the second network are added and inputted into the Sigmoid layer to obtain a spatial attention map; a third feature map is obtained according to the spatial attention map and the second feature map; wherein the first network includes a MaxPool layer and a hierarchical fully connected layer along a feature dimension, and the second network includes an AvgPool layer and a hierarchical fully connected layer along the feature dimension; a feature extraction module, configured to input the brain MRI image into the coding modules, and extract to obtain a third feature map of the brain MRI image; and a decoding module, configured to decode the third feature map to obtain a segmented region of the brain tumor.
According to the fourth aspect of the embodiments of the application, electronic equipment is provided, including the following components: one or more processors; and a memory, configured to store one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the method as described in the first aspect.
According to the fifth aspect of the embodiments of the application, a computer readable storage medium is provided, storing a computer instruction, wherein the instruction, when executed by the processor(s), implements the steps of the method as described in the first aspect.
The technical solutions provided by the embodiments of the application can include the following beneficial effects:
According to the above-mentioned embodiments, the application adopts a spatial attention mechanism realized based on hierarchical full connection, which overcomes the problem that the existing spatial attention mechanisms cannot be applied to the learning of the global spatial location relationship in high-level and low-level semantic feature maps at the same time. According to the application, the global attention mechanism, which is formed by sequentially combining the feature attention mechanism based on the SE module and the spatial attention mechanism realized based on hierarchical full connection, is inserted into the coding path of a six-layer 3D U-Net. The attention mechanism can greatly improve the performance of 3D U-Net in the glioma subregion segmentation task.
The above-mentioned general description and the following detailed description should be understood as exemplary and explanatory only but not as limiting the application.
The accompanying drawings herein are incorporated as a part of the specification, show embodiments conforming to the application, and are used in conjunction with the specification to explain the rationale for the application.
FIG. 1 is a schematic diagram of a neural network used for brain tumor image region segmentation as shown in an exemplary embodiment.
FIG. 2 is a structural diagram of a coding subblock/decoding subblock as shown in an exemplary embodiment.
FIG. 3 is a structural diagram of an attention model as shown in an exemplary embodiment.
FIG. 4 is a structural diagram of a feature attention submodel as shown in an exemplary embodiment.
FIG. 5 is a structural diagram of a spatial attention submodel realized based on hierarchical full connection as shown in an exemplary embodiment.
FIG. 6 is a structural diagram of a hierarchical fully connected layer as shown in an exemplary embodiment.
FIG. 7 is a flow chart of a brain tumor image region segmentation method as shown in an exemplary embodiment.
FIG. 8 is a structural diagram of a brain tumor image region segmentation device as shown in an exemplary embodiment.
The exemplary embodiments will be described in detail herein and the examples are represented in the accompanying drawings.
Where the accompanying drawings are referred to in the following description, the same numbers in different accompanying drawings indicate the same or similar elements, unless otherwise stated.
The implementation modes described in the following exemplary embodiments do not represent all implementation modes consistent with the application. Rather, they are only examples of devices and methods consistent with some aspects of the application as detailed in the claims.
The terms used in the application are intended only to describe specific embodiments but not to limit the application. The singular forms “a”, “the” and “that” used in the application and the claims are also intended to include the plural form, unless otherwise indicated clearly in the context. The term "and/or" used herein should also be understood as referring to and containing any or all possible combinations of one or more related listed items.
It should be understood that although the terms first, second, third, etc. may be used in the application to describe various information, such information should not be limited to these terms. These terms are used only to distinguish the same type of information from one another. For example, without departing from the scope of the application, the first information may also be called second information, and similarly the second information may also be called the first information. Depending on the context, the word “if” used herein may be interpreted as “at the moment of...” or “when...” or “in response to determining”.
FIG. 1 is a schematic diagram of a neural network used for brain tumor image region segmentation as shown in an exemplary embodiment. As shown in FIG. 1, the neural network includes a coding path 11 and a decoding path 12; wherein the coding path 11 includes coding blocks and an attention model; the brain MRI image is inputted into the coding blocks to obtain a first feature map, and the first feature map is inputted into the attention model to obtain a third feature map; wherein the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel includes a first network, a second network and a Sigmoid layer; the second feature map is inputted into the first network and the second network respectively, and outputs of the first network and the second network are added and inputted into the Sigmoid layer to obtain a spatial attention map; a third feature map is obtained according to the spatial attention map and the second feature map; wherein the first network includes a MaxPool layer and a hierarchical fully connected layer along a feature dimension, and the second network includes an AvgPool layer and a hierarchical fully connected layer along the feature dimension.
Specifically, the coding path includes six coding modules, respectively called the first coding module, the second coding module, the third coding module, the fourth coding module, the fifth coding module and the sixth coding module. Each of the coding modules includes two coding blocks and an attention model, wherein both coding blocks of the first coding module, as well as the second coding block of each of the other coding modules, include a 3D convolution layer with a kernel size of 3x3x3, a stride size of 1x1x1 and a padding size of 1x1x1, a Batch Normalization layer and a LeakyReLu layer; as shown in FIG. 2, the first coding block of each of the second coding module to the sixth coding module includes a 3D convolution layer with a kernel size of 3x3x3, a stride size of 2x2x2 and a padding size of 1x1x1, a Batch Normalization layer and a LeakyReLu layer, which serves as a downsampling layer.
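As a rough sanity check on the six-module coding path described above, the spatial size at each module can be computed from the stated kernel, stride and padding sizes (a pure-Python sketch; the function names are illustrative, not from the patent):

```python
def conv3d_out(n, k=3, s=2, p=1):
    # output length of one spatial dimension for a 3D convolution
    # with kernel size k, stride s and padding p
    return (n + 2 * p - k) // s + 1

def encoder_shapes(n=128, n_modules=6):
    # module 1 keeps the resolution; modules 2..6 each start
    # with a stride-2 convolution that halves every spatial dimension
    shapes = [n]
    for _ in range(n_modules - 1):
        n = conv3d_out(n)
        shapes.append(n)
    return shapes

print(encoder_shapes())  # [128, 64, 32, 16, 8, 4]
```

With a 128x128x128 input, the feature maps thus shrink to 4x4x4 at the sixth coding module.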
Specifically, firstly, a four-modality brain MRI image is read from the brain tumor segmentation dataset BraTS2019, and the input I ∈ R^{C×H×W×D} of the coding path is obtained after background clipping and image enhancement on the image, wherein C represents the number of modalities or channels (4 in practice), and H, W and D represent the length, width and depth (128 in practice) respectively.
The output I' is obtained from I through feature extraction and downsampling by each coding module in the coding path. The number of coding modules contained in the coding path, the structure of the coding modules and the structure of the coding blocks all refer to the classical network structure in the field of brain tumor segmentation, such as 3D U-Net.
FIG. 3 is a structural diagram of an attention model as shown in an exemplary embodiment. As shown in FIG. 3, the output X ∈ R^{C×H×W×D} of the previous coding block is inputted into two parallel paths, respectively called the residual connection and path 1; wherein the residual connection includes a 3D convolution layer with a kernel size of 1x1x1 and a stride size of 1x1x1; path 1 includes a feature attention module and a spatial attention module in sequence; the outputs of the residual connection and path 1 are added as the output X' ∈ R^{C×H×W×D} of the attention mechanism.
Specifically, X^res and X^C are obtained respectively from the output X of the previous coding block through the residual connection and the feature attention module; then X^S is obtained from X^C through the spatial attention module; finally X^S and X^res are added element-wise to obtain the output X' of the attention model. The process can be represented as follows:
X' = SAM(CAM(X)) ⊕ RES(X) = SAM(X^C) ⊕ X^res = X^S ⊕ X^res
Wherein SAM, CAM, RES and ⊕ represent the spatial attention module, the feature attention module, the residual connection and element-wise addition respectively. The role of the residual connection is to allow a deeper neural network to be trained more stably. The feature attention module and the spatial attention module are used simultaneously, and the sequential connection of the feature attention module followed by the spatial attention module has a better effect.
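The composition above can be sketched in a few lines (a NumPy illustration, not the patent's implementation; CAM, SAM and the residual branch are passed in as hypothetical callables, with the 1x1x1 residual convolution stubbed as an identity mapping):

```python
import numpy as np

def attention_model(x, cam, sam, res):
    # path 1: feature attention then spatial attention, in sequence;
    # the residual branch is added element-wise to the path-1 output
    return sam(cam(x)) + res(x)

x = np.ones((2, 4, 4, 4))
# hypothetical stand-ins: CAM and SAM scale their input, RES is identity
out = attention_model(x, cam=lambda t: 0.5 * t, sam=lambda t: 2.0 * t, res=lambda t: t)
print(out[0, 0, 0, 0])  # 2.0  (2.0 * 0.5 * 1 + 1)
```

The element-wise addition ⊕ of the two branches is what makes the block residual: if both attention modules degenerate to identity scalings, the block still passes its input through.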
The feature attention submodel includes a third network, a fourth network and a Sigmoid layer; the first feature map is inputted into the third network and the fourth network respectively, and outputs of the third network and the fourth network are added and inputted into the Sigmoid layer to obtain a feature attention map; the feature attention map is multiplied by the first feature map to obtain the second feature map.
Specifically, the third network includes a MaxPool layer, a fully connected layer, a ReLu layer and a fully connected layer along the spatial dimension; the fourth network includes an AvgPool layer, a fully connected layer, a ReLu layer and a fully connected layer along the spatial dimension.
In one embodiment, as shown in FIG. 4, the output X ∈ R^{C×H×W×D} of the previous coding block is inputted into two parallel paths, respectively called path 2 and path 3; wherein path 2 includes the MaxPool layer, the fully connected layer, the ReLu layer and the fully connected layer along the spatial dimension; path 3 includes the AvgPool layer, the fully connected layer, the ReLu layer and the fully connected layer along the spatial dimension; outputs of the two paths are added and pass through the Sigmoid layer to obtain the feature attention map; the feature attention map is multiplied along each spatial dimension by the output of the previous coding block, and the result is taken as the output X^C ∈ R^{C×H×W×D} of the feature attention mechanism.
Specifically, the output X of the previous coding block passes through the AvgPool layer along the spatial dimension and the MaxPool layer along the spatial dimension respectively to obtain X^avg ∈ R^{C×1×1×1} and X^max ∈ R^{C×1×1×1}. Then, X^avg passes through the first fully connected layer to obtain X^avg2 ∈ R^{(C/r)×1×1×1}, wherein r is the reduction ratio. Next, X^avg2 passes through the ReLu layer and the second fully connected layer to obtain X^avg3. X^max goes through the same operations to obtain X^max3, which is then added element-wise with X^avg3 and passed through the Sigmoid layer to obtain the feature attention map X_c. Finally, X_c is multiplied element-wise by X to obtain the output X^C of the feature attention module. The process can be represented as follows:
X^C = Sigmoid(FC(ReLu(FC(Avg_s(X)))) ⊕ FC(ReLu(FC(Max_s(X))))) ⊗ X = Sigmoid(X^avg3 ⊕ X^max3) ⊗ X = X_c ⊗ X
Wherein Avg_s, Max_s, FC, ReLu, Sigmoid, ⊕ and ⊗ represent the AvgPool layer along the spatial dimension, the MaxPool layer along the spatial dimension, the fully connected layer, the ReLu layer, the Sigmoid layer, element-wise addition and element-wise multiplication respectively. In the design of the feature attention module, a MaxPool path is added on the basis of the SE module. The reason for using the MaxPool layer and the AvgPool layer simultaneously is that the two operations extract effective and complementary information, and the combination of both can further improve the performance of the feature attention mechanism. The role of the fully connected layers in each path is to learn the dependencies between the channels.
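The feature attention computation above can be sketched as follows (a simplified NumPy illustration; the FC weights `w1`/`w2` are random placeholders, and sharing one MLP between the two pooling paths is an assumption made here for brevity, consistent with the SE/CBAM design but not spelled out in the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_attention(x, w1, w2):
    # x: (C, H, W, D); pool over all spatial dimensions -> (C,)
    avg = x.mean(axis=(1, 2, 3))          # Avg_s(X)
    mx = x.max(axis=(1, 2, 3))            # Max_s(X)
    # two-layer MLP: FC (C -> C/r), ReLu, FC (C/r -> C)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    x_c = sigmoid(mlp(avg) + mlp(mx))     # feature attention map, one weight per channel
    return x_c[:, None, None, None] * x   # broadcast multiply over spatial dims

C, r = 8, 2
rng = np.random.default_rng(0)
x = rng.standard_normal((C, 4, 4, 4))
w1 = 0.1 * rng.standard_normal((C // r, C))   # hypothetical FC weights
w2 = 0.1 * rng.standard_normal((C, C // r))
out = feature_attention(x, w1, w2)
print(out.shape)  # (8, 4, 4, 4)
```

Because the Sigmoid output lies strictly in (0, 1), each channel of the output is a damped copy of the corresponding input channel.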
FIG. 5 is a structural diagram of a spatial attention submodel as shown in an exemplary embodiment. As shown in FIG. 5, the output X^C ∈ R^{C×H×W×D} of the previous feature attention submodel is inputted into two parallel paths, respectively called path 4 and path 5; wherein path 4 includes the MaxPool layer and the hierarchical fully connected layer along the feature dimension; path 5 includes the AvgPool layer and the hierarchical fully connected layer along the feature dimension; outputs of the two paths are added and pass through the Sigmoid layer to obtain the spatial attention map; the spatial attention map is multiplied by each channel of the output of the previous feature attention module, and the result is taken as the output X^S ∈ R^{C×H×W×D} of the spatial attention module.
Specifically, the output X^C of the previous feature attention submodel passes through the AvgPool layer along the feature dimension and the MaxPool layer along the feature dimension respectively to obtain X^avg ∈ R^{1×H×W×D} and X^max ∈ R^{1×H×W×D}. X^avg passes through the hierarchical fully connected layer to obtain X^avg', learning the global spatial location dependence. X^max goes through the same operation to obtain X^max', which is then added element-wise with X^avg' and passed through the Sigmoid layer to obtain the spatial attention map X_s ∈ R^{1×H×W×D}. Finally, X_s is multiplied element-wise by X^C to obtain the output X^S of the spatial attention module. The process can be represented as follows:
X^S = Sigmoid(HFC(Avg_c(X^C)) ⊕ HFC(Max_c(X^C))) ⊗ X^C = Sigmoid(X^avg' ⊕ X^max') ⊗ X^C = X_s ⊗ X^C
Wherein Avg_c, Max_c, HFC, Sigmoid, ⊕ and ⊗ represent the AvgPool layer along the feature dimension, the MaxPool layer along the feature dimension, the hierarchical fully connected layer, the Sigmoid layer, element-wise addition and element-wise multiplication respectively. The reason for using the MaxPool layer and the AvgPool layer simultaneously is that the two operations extract effective and complementary information, and the combination of both can further improve the performance of the spatial attention mechanism.
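The spatial attention computation can be sketched in the same style (a simplified NumPy illustration; the hierarchical fully connected layer is stubbed here as an identity mapping purely to show the channel-pooling, Sigmoid and broadcast structure, so `hfc` is a hypothetical placeholder):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(xc, hfc):
    # xc: (C, H, W, D); pool along the feature (channel) dimension
    avg = xc.mean(axis=0, keepdims=True)   # Avg_c(X^C), shape (1, H, W, D)
    mx = xc.max(axis=0, keepdims=True)     # Max_c(X^C), shape (1, H, W, D)
    x_s = sigmoid(hfc(avg) + hfc(mx))      # spatial attention map, values in (0, 1)
    return x_s * xc                        # broadcast multiply over the channel dim

rng = np.random.default_rng(1)
xc = rng.standard_normal((8, 4, 4, 4))
out = spatial_attention(xc, hfc=lambda t: t)  # identity stand-in for HFC
print(out.shape)  # (8, 4, 4, 4)
```

The single-channel attention map rescales every voxel identically across all channels, which is what lets the mechanism emphasize spatial locations rather than features.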
The hierarchical fully connected layer includes several sequentially connected submodules, and all second outputs of the submodules are added as an output of the hierarchical fully connected layer; wherein each of the submodules is configured to perform the following operation until a certain spatial dimension of an input is smaller than a region size, and the input of a first submodule is the second feature map after max pooling or avg pooling: dividing an input according to a region size; inputting the divided input into a feed forward network to learn the spatial location relationship of local regions, and then restoring a shape to the same shape as the input to obtain a first output; upsampling the first output to obtain a second output with the same size as an input of the hierarchical fully connected layer; and downsampling the first output and taking the first output as an input of a next submodule after the first output passes through a batch normalization layer.
In one embodiment, as shown in FIG. 6, the hierarchical fully connected layer includes several submodules, wherein each of the submodules performs the following operation until a certain spatial dimension H, W or D of the input Y ∈ R^{C×H×W×D} is smaller than the region size G:
the input Y ∈ R^{C×H×W×D} is divided according to the appropriate region size G to obtain Y' ∈ R^{C×(HWD/G^3)×G^3}; then Y' is inputted into the feed forward network to learn the spatial location relationship within each local region of size G^3; and then the shape of Y' is restored to the same shape as the input Y to obtain the first output Y1 ∈ R^{C×H×W×D}.
The first output Y1 is upsampled to obtain a second output with the same size as an input of the hierarchical fully connected layer; the first output Y1 is inputted into a 3D convolution layer with a kernel size of 3x3x3, a stride size of 2x2x2 and a padding size of 1x1x1 for downsampling, and used as an input of a next submodule after it passes through a batch normalization layer.
All second outputs of the submodules, which represent the local spatial location relationships at different levels, are added as the output of the hierarchical fully connected layer to represent the global spatial location relationship. In this way, the computational load of learning the global spatial location relationship is very small, which solves the problem that the existing spatial attention submodels cannot learn the global spatial location relationship in high-level and low-level semantic feature maps at the same time.
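To make the low computational load concrete, here is a back-of-the-envelope cost model (not taken from the application): the per-level cost is counted as the number of pairwise terms the feed forward network touches within each G^3 region, versus (S^3)^2 for a naive global spatial attention on a cubic S×S×S map, with stride-2 downsampling between levels:

```python
def hierarchical_ffn_cost(size, region):
    """Illustrative cost model for the hierarchical fully connected layer
    on a cubic size x size x size feature map with region size `region`.
    Returns a list with one pairwise-term count per level."""
    levels, s = [], size
    while s >= region:                       # stop once a dimension drops below G
        n_regions = (s // region) ** 3       # regions per level
        levels.append(n_regions * (region ** 3) ** 2)
        s //= 2                              # stride-2 downsampling between levels
    return levels

per_level = hierarchical_ffn_cost(64, 4)     # 5 levels: 64, 32, 16, 8, 4
total = sum(per_level)                       # hierarchical cost
naive = (64 ** 3) ** 2                       # naive global spatial attention
```

Under this model the hierarchical scheme is several orders of magnitude cheaper than the naive global variant, which is consistent with the memory claims made below.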
Specifically, for the input Y, first the torch.Tensor.unfold function is used to divide each spatial dimension H, W and D of Y; the unfold function's slice size and stride size are set to the region size G to obtain the output Y0 ∈ R^(C×(H/G)×(W/G)×(D/G)×G×G×G), and Y0 is reshaped to obtain Y' ∈ R^(C×(H/G)(W/G)(D/G)×G^3), thus completing the region division of Y. Then, Y' is inputted into the feed forward network, which has the same structure as that in Transformer, to learn the spatial location relationship of local regions. Then, the shape of Y' is restored to the same shape as Y to obtain the first output Y1 ∈ R^(C×H×W×D). Next, Y1 is upsampled through the torch.nn.functional.upsample function to obtain the second output with the same size as an input of the hierarchical fully connected layer. The first output Y1 is inputted into the 3D convolution layer with a kernel size of 3x3x3, a stride size of 2x2x2 and a padding size of 1x1x1 for downsampling, and used as an input of a next submodule after it passes through a Batch Normalization layer, until a certain spatial dimension H, W or D of the input is smaller than the region size G. The hierarchical fully connected layer occupies very little GPU memory. It approximates the global spatial location relationship by learning the local and global spatial location relationships at different levels, overcoming the problem that the existing spatial attention mechanism cannot be applied to the learning of the global spatial location relationship in high-level and low-level semantic feature maps at the same time.
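The region division, feed forward network, shape restoration and downsampling described above can be sketched as one submodule in PyTorch. This is a sketch under assumptions: the feed forward network width (hidden_ratio) and the nearest-neighbour interpolation mode are not specified by the application, which only states that the feed forward network has the same structure as that in Transformer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalFCSubmodule(nn.Module):
    """Sketch of one submodule of the hierarchical fully connected layer."""
    def __init__(self, channels, region=4, hidden_ratio=2):
        super().__init__()
        g3 = region ** 3
        self.region = region
        # Transformer-style feed forward network over the G^3 positions of a region.
        self.ffn = nn.Sequential(
            nn.Linear(g3, hidden_ratio * g3), nn.ReLU(), nn.Linear(hidden_ratio * g3, g3))
        # 3D convolution for downsampling: kernel 3x3x3, stride 2x2x2, padding 1x1x1.
        self.down = nn.Conv3d(channels, channels, kernel_size=3, stride=2, padding=1)
        self.bn = nn.BatchNorm3d(channels)

    def forward(self, y):
        n, c, h, w, d = y.shape
        g = self.region
        # Region division: unfold each spatial dim with slice size and stride G.
        y0 = y.unfold(2, g, g).unfold(3, g, g).unfold(4, g, g)  # (N,C,H/G,W/G,D/G,G,G,G)
        yp = y0.reshape(n, c, -1, g ** 3)                       # (N,C,(H/G)(W/G)(D/G),G^3)
        y1 = self.ffn(yp)                                       # learn local spatial relations
        # Restore the shape of Y' to the same shape as Y: first output Y1.
        y1 = y1.reshape(*y0.shape).permute(0, 1, 2, 5, 3, 6, 4, 7).reshape(n, c, h, w, d)
        # Second output: upsampled to the hierarchical layer's input size
        # (identity at the first level in this sketch).
        y2 = F.interpolate(y1, size=(h, w, d), mode='nearest')
        nxt = self.bn(self.down(y1))                            # input of the next submodule
        return y1, y2, nxt
```

The unfold/reshape pair partitions the volume into non-overlapping G×G×G regions, and the permute/reshape pair is its exact inverse, so no voxels are duplicated or lost.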
The decoding path 12 includes decoding modules, configured to decode the third feature map to obtain a region of the brain tu- mor.
In this embodiment, the decoding path includes five decoding modules and a channel adjustment layer, wherein the decoding modules are respectively called the first decoding module, the second decoding module, the third decoding module, the fourth decoding module and the fifth decoding module. Each decoding module includes an upsampling layer and two decoding blocks, and the structure of each decoding block is the same as that of the coding blocks of the first coding module. The upsampling layer of the fifth decoding module receives the output of the coding path, and the upsampling layers of the other decoding modules receive the output of the previous decoding module (for example, the upsampling layer of the fourth decoding module receives the output of the fifth decoding module). The first decoding block of each decoding module simultaneously receives the output of the upsampling layer in its own decoding module and the output of the coding module with a corresponding number (for example, the first decoding block of the fifth decoding module also receives the output of the fifth coding module). The channel adjustment layer adjusts the number of channels outputted by the first decoding module to be consistent with the number of glioma subregions, and the prediction results of glioma subregions are obtained after processing.
Specifically, the upsampling layer is realized by a 3D deconvolution layer with a kernel size of 3x3x3, a stride size of 2x2x2 and a padding size of 1x1x1. The channel adjustment layer includes a 3D convolution layer with a kernel size of 3x3x3, a stride size of 1x1x1 and a padding size of 1x1x1 and a Softmax layer, wherein the number of channels outputted by the 3D convolution layer is consistent with the number of glioma subregions, and the Softmax layer is used to normalize the probability of each channel. The number of decoding modules contained in the decoding path, the structure of the decoding modules and the structure of the decoding blocks all follow the classical network structure in the field of brain tumor segmentation, such as 3D U-Net.
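The upsampling and channel adjustment layers can be sketched as follows (assuming PyTorch; output_padding=1 is an added assumption so that the deconvolution exactly doubles each spatial dimension, and the number of glioma subregions, here 3, is illustrative):

```python
import torch
import torch.nn as nn

num_subregions = 3  # illustrative; set to the actual number of glioma subregions

# 3D deconvolution: kernel 3x3x3, stride 2x2x2, padding 1x1x1.
upsample = nn.ConvTranspose3d(32, 16, kernel_size=3, stride=2,
                              padding=1, output_padding=1)

# Channel adjustment: 3D convolution (kernel 3x3x3, stride 1x1x1, padding 1x1x1)
# to the subregion channel count, then Softmax over channels per voxel.
channel_adjust = nn.Sequential(
    nn.Conv3d(16, num_subregions, kernel_size=3, stride=1, padding=1),
    nn.Softmax(dim=1))

x = torch.randn(1, 32, 8, 8, 8)       # output of the previous decoding stage
probs = channel_adjust(upsample(x))   # (1, num_subregions, 16, 16, 16)
```

Without output_padding, a stride-2 transposed convolution with kernel 3 and padding 1 would map a size-S input to 2S-1; the extra output padding restores the exact factor-of-2 upsampling that mirrors the stride-2 downsampling in the coding path.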
Corresponding to the embodiment of the neural network used for brain tumor image region segmentation, the application further provides an embodiment of a brain tumor image region segmentation method.
FIG. 7 is a flow chart of a brain tumor image region segmen- tation method as shown in an exemplary embodiment. As shown in
FIG. 7, the method can include the following steps: S101, acquiring a brain MRI image; S102, building coding modules, wherein each of the coding modules includes coding blocks and an attention model; the MRI image is inputted into the coding blocks to obtain a first feature map; the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel includes a first network, a second network and a Sigmoid layer; the second feature map is inputted into the first network and the second network respectively, and outputs of the first network and the second network are added and inputted into the Sigmoid layer to obtain a spatial attention map; a third feature map is obtained according to the spatial attention map and the second feature map; wherein the first network includes a MaxPool layer and a hierarchical fully connected layer along a feature dimension, and the second network includes an AvgPool layer and a hierarchical fully connected layer along the feature dimension;
S103, inputting the brain MRI image into the coding modules, and extracting a third feature map of the brain MRI image; and S104, decoding the third feature map to obtain a segmented region of the brain tumor.
According to the above-mentioned embodiments, the application adopts a spatial attention mechanism realized based on hierarchical full connection, which overcomes the problem that the existing spatial attention mechanisms cannot be applied to the learning of the global spatial location relationship in high-level and low-level semantic feature maps at the same time. According to the application, the global attention mechanism, which is formed by sequentially combining the feature attention mechanism based on the SE module and the spatial attention mechanism realized based on hierarchical full connection, is inserted into a coding path of a six-layer 3D U-Net. The attention mechanism can greatly improve the performance of 3D U-Net in the glioma subregion segmentation task.
FIG. 8 is a block diagram of a brain tumor image region seg- mentation device as shown in an exemplary embodiment. As shown in
FIG. 8, the device includes the following modules: an acquisition module 21, configured to acquire a brain MRI image; a building module 22, configured to build coding modules, wherein each of the coding modules includes coding blocks and an attention model; the MRI image is inputted into the coding blocks to obtain a first feature map; the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel includes a first network, a second network and a Sigmoid layer; the second feature map is inputted into the first network and the second network respectively, and outputs of the first network and the second network are added and inputted into the Sigmoid layer to obtain a spatial attention map; a third feature map is obtained according to the spatial attention map and the second feature map; wherein the first network includes a MaxPool layer and a hierarchical fully connected layer along a feature dimension, and the second network includes an AvgPool layer and a hierarchical fully connected layer along the feature dimension; a feature extraction module 23, configured to input the brain MRI image into the coding modules, and extract a third feature map of the brain MRI image; and a decoding module 24, configured to decode the third feature map to obtain a segmented region of the brain tumor.
With respect to the device in the above-mentioned embodiment, the specific manner in which each module performs operations is described in detail in the embodiment related to the method and will not be elaborated here.
For the device embodiment, because it basically corresponds to the method embodiment, reference may be made to the relevant part of the description of the method embodiment. The device embodiment described above is only schematic, in that the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or may be distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the application's solutions. The embodiment can be understood and implemented by those of ordinary skill in the art without creative effort.
Correspondingly, the application further provides electronic equipment, including the following components: a memory, configured to store one or more programs; and one or more processors; wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the brain tumor image region segmentation method.
Correspondingly, the application further provides a computer readable storage medium, storing a computer instruction, wherein the instruction, when executed by the processor(s), implements the brain tumor image region segmentation method.
Other implementation solutions of the application will readily occur to those skilled in the art after considering the specification and practicing the contents disclosed herein. The application is intended to cover any modifications, uses or adaptive changes of the application, and these modifications, uses or adaptive changes follow the general principles of the application and include common general knowledge or customary technical means in the art which are not disclosed by the application. The specification and embodiments are considered exemplary only, and the true scope and spirit of the application are indicated by the claims below.
It should be understood that the application is not limited to the precise structure described above and shown in the accompa- nying drawings, and various modifications and changes may be made without deviating from its scope. The scope of the application is limited only by the attached claims.
Claims (10)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111038003.0A CN113744284B (en) | 2021-09-06 | 2021-09-06 | Brain tumor image region segmentation method and device, neural network and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
NL2032936A NL2032936A (en) | 2023-03-10 |
NL2032936B1 true NL2032936B1 (en) | 2023-10-11 |
Family
ID=78735960
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
NL2032936A NL2032936B1 (en) | 2021-09-06 | 2022-09-01 | Brain tumor image region segmentation method and device, neural network and electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113744284B (en) |
NL (1) | NL2032936B1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115330813A (en) * | 2022-07-15 | 2022-11-11 | 深圳先进技术研究院 | Image processing method, device and equipment and readable storage medium |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108364023A (en) * | 2018-02-11 | 2018-08-03 | 北京达佳互联信息技术有限公司 | Image-recognizing method based on attention model and system |
CN110533045B (en) * | 2019-07-31 | 2023-01-17 | 中国民航大学 | Luggage X-ray contraband image semantic segmentation method combined with attention mechanism |
CN111028242A (en) * | 2019-11-27 | 2020-04-17 | 中国科学院深圳先进技术研究院 | Automatic tumor segmentation system and method and electronic equipment |
CN111046939B (en) * | 2019-12-06 | 2023-08-04 | 中国人民解放军战略支援部队信息工程大学 | Attention-based CNN class activation graph generation method |
US11270447B2 (en) * | 2020-02-10 | 2022-03-08 | Hong Kong Applied Science And Technology Institute Company Limited | Method for image segmentation using CNN |
CN111626300B (en) * | 2020-05-07 | 2022-08-26 | 南京邮电大学 | Image segmentation method and modeling method of image semantic segmentation model based on context perception |
CN112102324B (en) * | 2020-09-17 | 2021-06-18 | 中国科学院海洋研究所 | Remote sensing image sea ice identification method based on depth U-Net model |
CN112308835A (en) * | 2020-10-27 | 2021-02-02 | 南京工业大学 | Intracranial hemorrhage segmentation method integrating dense connection and attention mechanism |
CN112418027A (en) * | 2020-11-11 | 2021-02-26 | 青岛科技大学 | Remote sensing image road extraction method for improving U-Net network |
CN112381897B (en) * | 2020-11-16 | 2023-04-07 | 西安电子科技大学 | Low-illumination image enhancement method based on self-coding network structure |
CN112365496B (en) * | 2020-12-02 | 2022-03-29 | 中北大学 | Multi-modal MR image brain tumor segmentation method based on deep learning and multi-guidance |
CN112651978B (en) * | 2020-12-16 | 2024-06-07 | 广州医软智能科技有限公司 | Sublingual microcirculation image segmentation method and device, electronic equipment and storage medium |
CN112818904A (en) * | 2021-02-22 | 2021-05-18 | 复旦大学 | Crowd density estimation method and device based on attention mechanism |
CN113344951B (en) * | 2021-05-21 | 2024-05-28 | 北京工业大学 | Boundary-aware dual-attention-guided liver segment segmentation method |
- 2021-09-06: CN application CN202111038003.0A, patent CN113744284B (active)
- 2022-09-01: NL application NL2032936A, patent NL2032936B1 (active)
Also Published As
Publication number | Publication date |
---|---|
CN113744284A (en) | 2021-12-03 |
NL2032936A (en) | 2023-03-10 |
CN113744284B (en) | 2023-08-29 |