NL2032936B1 - Brain tumor image region segmentation method and device, neural network and electronic equipment - Google Patents
Classifications
- G06T7/11 — Region-based segmentation
- G06T7/0012 — Biomedical image inspection
- G06N3/045 — Combinations of networks
- G06N3/0455 — Auto-encoder networks; Encoder-decoder networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06T2207/10088 — Magnetic resonance imaging [MRI]
- G06T2207/20076 — Probabilistic image processing
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30016 — Brain
- G06T2207/30096 — Tumor; Lesion
- Y02T10/40 — Engine management systems
Abstract
Disclosed is a brain tumor image region segmentation method and device, a neural network and electronic equipment. The method includes: acquiring a brain MRI image; building coding modules, wherein each of the coding modules includes coding blocks and an attention model; the MRI image is inputted into the coding blocks to obtain a first feature map; the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel is configured to input the second feature map into the first network and the second network respectively; inputting the brain MRI image into the coding modules, and extracting to obtain a third feature map of the brain MRI image; and decoding the third feature map to obtain a segmented region of the brain tumor.
Description
P1584 /NL
BRAIN TUMOR IMAGE REGION SEGMENTATION METHOD AND DEVICE, NEURAL
NETWORK AND ELECTRONIC EQUIPMENT
The application relates to the technical field of deep neural networks, and in particular to a brain tumor image region segmentation method and device, a neural network and electronic equipment.
Glioma is one of the most common intracranial tumors. The clinical prognosis of glioma varies widely, and the degree of malignancy is high. It can lead to various symptoms, such as headache, epilepsy and intracranial nerve diseases. According to the classification of the World Health Organization, there are four grades of glioma. Grades I and II are low-grade glioma, and Grades III and IV are high-grade glioma (HGG). A glioma can be divided into three subregions: enhancing tumor (ET), tumor core (TC), and whole tumor (WT). Magnetic resonance imaging (MRI) has been widely applied in the imaging diagnosis of various systems of the whole body. Craniocerebral MRI is more sensitive than CT in the diagnosis of brain tumors, and can detect early lesions with more accurate localization. Using MRI scanning to segment glioma can obtain better segmentation results.
However, the appearance, shape and location of glioma vary greatly among patients, so naive U-Net and some variants (such as 3D U-Net, Res-UNet and UNet++) cannot achieve good segmentation results. Many recent studies have used attention mechanisms to address this problem, such as Attention U-Net, TransUNet and TransBTS. These three models essentially use spatial attention mechanisms, which help the models focus on the spatial location of the segmentation target.
In the realization process of the present invention, the inventor has found at least the following problems in the prior art: these spatial attention mechanisms are all deficient. Due to the limitation of computing resources, TransUNet and TransBTS can learn the global spatial location dependence of low-resolution high-level semantic feature maps only, but ignore low-level semantic feature maps, which contain a great deal of geometric information that is important for tumor localization. Although the attention gate mechanism of Attention U-Net and the spatial attention mechanism in CBAM can be applied to both high-level and low-level semantic feature maps, they can only learn the local spatial location relationship, whereas the global spatial location relationship is also critical for tumor localization.
The embodiments of the application are intended to provide a brain tumor image region segmentation method and device, a neural network and electronic equipment, so as to solve the technical problem in the related art that the existing spatial attention mechanisms fail to learn the global spatial location relationship in high-level and low-level semantic feature maps at the same time.
According to the first aspect of the embodiments of the application, a brain tumor image region segmentation method is provided, including the following steps: acquiring a brain MRI image; building coding modules, wherein each of the coding modules includes coding blocks and an attention model; the MRI image is inputted into the coding blocks to obtain a first feature map; the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel includes a first network, a second network and an activation (Sigmoid) layer; the second feature map is inputted into the first network and the second network respectively, and outputs of the first network and the second network are added and inputted into the Sigmoid layer to obtain a spatial attention map; a third feature map is obtained according to the spatial attention map and the second feature map; wherein the first network includes a maximum pooling (MaxPool) layer and a hierarchical fully connected layer along a feature dimension, and the second network includes an average pooling (AvgPool) layer and a hierarchical fully connected layer along the feature dimension; inputting the brain MRI image into the coding modules, and extracting to obtain a third feature map of the brain MRI image; and decoding the third feature map to obtain a segmented region of the brain tumor.
Further, the feature attention submodel includes a third network, a fourth network and a Sigmoid layer; the first feature map is inputted into the third network and the fourth network respectively, and outputs of the third network and the fourth network are added and inputted into the Sigmoid layer to obtain a feature attention map; the feature attention map is multiplied by the first feature map to obtain the second feature map.
Further, the third network includes a MaxPool layer, a fully connected layer, a nonlinear activation (ReLu) layer and a fully connected layer along the spatial dimension; the fourth network includes an AvgPool layer, a fully connected layer, a ReLu layer and a fully connected layer along the spatial dimension.
Further, the hierarchical fully connected layer includes several sequentially connected submodules, and all second outputs of the submodules are added as an output of the hierarchical fully connected layer; wherein each of the submodules is configured to perform the following operation until a certain spatial dimension of an input is smaller than a region size, and the input of a first submodule is the second feature map after max pooling or avg pooling: dividing an input according to a region size; inputting the divided input into a feed forward network to learn the spatial location relationship of local regions, and then restoring a shape to the same shape as the input to obtain a first output; upsampling the first output to obtain a second output with the same size as an input of the hierarchical fully connected layer; and downsampling the first output and taking the first output as an input of a next submodule after the first output passes through a batch normalization layer.
According to the second aspect of the embodiments of the application, a neural network used for brain tumor image region segmentation is provided, including the following paths: a coding path, wherein the coding path includes coding blocks and an attention model, configured to extract features of the brain MRI image; the brain MRI image is inputted into the coding blocks to obtain a first feature map, and the first feature map is inputted into the attention model to obtain a third feature map; wherein the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel includes a first network, a second network and a Sigmoid layer; the second feature map is inputted into the first network and the second network respectively, and outputs of the first network and the second network are added and inputted into the Sigmoid layer to obtain a spatial attention map; a third feature map is obtained according to the spatial attention map and the second feature map; wherein the first network includes a MaxPool layer and a hierarchical fully connected layer along a feature dimension, and the second network includes an AvgPool layer and a hierarchical fully connected layer along the feature dimension; and a decoding path, wherein the decoding path includes decoding blocks, configured to decode the third feature map to obtain a region of the brain tumor.
According to the third aspect of the embodiments of the application, a brain tumor image region segmentation device is provided, including the following modules: an acquisition module, configured to acquire a brain MRI image; a building module, configured to build coding modules, wherein each of the coding modules includes coding blocks and an attention model; the MRI image is inputted into the coding blocks to obtain a first feature map; the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel includes a first network, a second network and a Sigmoid layer; the second feature map is inputted into the first network and the second network respectively, and outputs of the first network and the second network are added and inputted into the Sigmoid layer to obtain a spatial attention map; a third feature map is obtained according to the spatial attention map and the second feature map; wherein the first network includes a MaxPool layer and a hierarchical fully connected layer along a feature dimension, and the second network includes an AvgPool layer and a hierarchical fully connected layer along the feature dimension; a feature extraction module, configured to input the brain MRI image into the coding modules, and extract to obtain a third feature map of the brain MRI image; and a decoding module, configured to decode the third feature map to obtain a segmented region of the brain tumor.
According to the fourth aspect of the embodiments of the application, electronic equipment is provided, including the following components: one or more processors; and a memory, configured to store one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the method as described in the first aspect.
According to the fifth aspect of the embodiments of the application, a computer readable storage medium is provided, storing a computer instruction, wherein the instruction, when executed by the processor(s), implements the steps of the method as described in the first aspect.
The technical solutions provided by the embodiments of the application can include the following beneficial effects:
According to the above-mentioned embodiments, the application adopts a spatial attention mechanism realized based on hierarchical full connection, which overcomes the problem that the existing spatial attention mechanisms cannot be applied to the learning of the global spatial location relationship in high-level and low-level semantic feature maps at the same time. According to the application, the global attention mechanism, which is formed by sequentially combining the feature attention mechanism based on the SE module and the spatial attention mechanism realized based on hierarchical full connection, is inserted into the coding path of a six-layer 3D U-Net. The attention mechanism can greatly improve the performance of 3D U-Net in the glioma subregion segmentation task.
The above-mentioned general description and the following detailed description should be understood as exemplary and explanatory only but not as limiting the application.
The accompanying drawings herein are incorporated as a part of the specification, show embodiments conforming to the application, and are used in conjunction with the specification to explain the rationale for the application.
FIG. 1 is a schematic diagram of a neural network used for brain tumor image region segmentation as shown in an exemplary embodiment.
FIG. 2 is a structural diagram of a coding subblock/decoding subblock as shown in an exemplary embodiment.
FIG. 3 is a structural diagram of an attention model as shown in an exemplary embodiment.
FIG. 4 is a structural diagram of a feature attention submodel as shown in an exemplary embodiment.
FIG. 5 is a structural diagram of a spatial attention submodel realized based on hierarchical full connection as shown in an exemplary embodiment.
FIG. 6 is a structural diagram of a hierarchical fully connected layer as shown in an exemplary embodiment.
FIG. 7 is a flow chart of a brain tumor image region segmentation method as shown in an exemplary embodiment.
FIG. 8 is a structural diagram of a brain tumor image region segmentation device as shown in an exemplary embodiment.
The exemplary embodiments will be described in detail herein and the examples are represented in the accompanying drawings.
Where the accompanying drawings are referred to in the following description, the same numbers in different accompanying drawings indicate the same or similar elements, unless otherwise stated.
The implementation modes described in the following exemplary embodiments do not represent all implementation modes consistent with the application. Rather, they are only examples of devices and methods consistent with some aspects of the application as detailed in the claims.
The terms used in the application are intended only to describe specific embodiments but not to limit the application. The singular forms “a”, “the” and “that” used in the application and the claims are also intended to include the plural form, unless otherwise indicated clearly in the context. The term "and/or" used herein should also be understood as referring to and containing any or all possible combinations of one or more related listed items.
It should be understood that although the terms first, second, third, etc. may be used in the application to describe various information, such information should not be limited to these terms. These terms are used only to distinguish the same type of information from one another. For example, without departing from the scope of the application, the first information may also be called second information, and similarly the second information may also be called the first information. Depending on the context, the word “if” used herein may be interpreted as “at the moment of...” or “when...” or “in response to determining”.
FIG. 1 is a schematic diagram of a neural network used for brain tumor image region segmentation as shown in an exemplary embodiment. As shown in FIG. 1, the neural network includes a coding path 11 and a decoding path 12; wherein the coding path 11 includes coding blocks and an attention model; the brain MRI image is inputted into the coding blocks to obtain a first feature map, and the first feature map is inputted into the attention model to obtain a third feature map; wherein the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel includes a first network, a second network and a Sigmoid layer; the second feature map is inputted into the first network and the second network respectively, and outputs of the first network and the second network are added and inputted into the Sigmoid layer to obtain a spatial attention map; a third feature map is obtained according to the spatial attention map and the second feature map; wherein the first network includes a MaxPool layer and a hierarchical fully connected layer along a feature dimension, and the second network includes an AvgPool layer and a hierarchical fully connected layer along the feature dimension.
Specifically, the coding path includes six coding modules, respectively called the first coding module, the second coding module, the third coding module, the fourth coding module, the fifth coding module and the sixth coding module. Each of the coding modules includes two coding blocks and an attention model, wherein both coding blocks of the first coding module, as well as the second coding block of each of the other coding modules, include a 3D convolution layer with a kernel size of 3x3x3, a stride size of 1x1x1 and a padding size of 1x1x1, a Batch Normalization layer and a LeakyReLu layer; as shown in FIG. 2, the first coding block of each of the second coding module to the sixth coding module includes a 3D convolution layer with a kernel size of 3x3x3, a stride size of 2x2x2 and a padding size of 1x1x1, a Batch Normalization layer and a LeakyReLu layer, which serves as a downsampling layer.
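As a rough sanity check on the six-module coding path described above, the spatial size at each module can be computed from the stated kernel, stride and padding sizes (a pure-Python sketch; the function names are illustrative, not from the patent):

```python
def conv3d_out(n, k=3, s=2, p=1):
    # output length of one spatial dimension for a 3D convolution
    # with kernel size k, stride s and padding p
    return (n + 2 * p - k) // s + 1

def encoder_shapes(n=128, n_modules=6):
    # module 1 keeps the resolution; modules 2..6 each start
    # with a stride-2 convolution that halves every spatial dimension
    shapes = [n]
    for _ in range(n_modules - 1):
        n = conv3d_out(n)
        shapes.append(n)
    return shapes

print(encoder_shapes())  # [128, 64, 32, 16, 8, 4]
```

With a 128x128x128 input, the feature maps thus shrink to 4x4x4 at the sixth coding module.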
Specifically, firstly, a four-modality brain MRI image is read from the brain tumor segmentation dataset BraTS2019, and the input I ∈ R^{C×H×W×D} of the coding path is obtained after background clipping and image enhancement on the image, wherein C represents the number of modalities or channels (4 in practice), and H, W and D represent the length, width and depth (128 in practice) respectively.
The output I' is obtained from I through feature extraction and downsampling by each coding module in the coding path. The number of coding modules contained in the coding path, the structure of the coding modules and the structure of the coding blocks all refer to the classical network structure in the field of brain tumor segmentation, such as 3D U-Net.
FIG. 3 is a structural diagram of an attention model as shown in an exemplary embodiment. As shown in FIG. 3, the output X ∈ R^{C×H×W×D} of the previous coding block is inputted into two parallel paths, respectively called the residual connection and path 1; wherein the residual connection includes a 3D convolution layer with a kernel size of 1x1x1 and a stride size of 1x1x1; path 1 includes a feature attention module and a spatial attention module in sequence; the outputs of the residual connection and path 1 are added as the output X' ∈ R^{C×H×W×D} of the attention mechanism.
Specifically, X^res and X^C are obtained respectively from the output X of the previous coding block through the residual connection and the feature attention module; then X^S is obtained from X^C through the spatial attention module; finally X^S and X^res are added element-wise to obtain the output X' of the attention model. The process can be represented as follows:
X' = SAM(CAM(X)) ⊕ RES(X) = SAM(X^C) ⊕ X^res = X^S ⊕ X^res
Wherein SAM, CAM, RES and ⊕ represent the spatial attention module, the feature attention module, the residual connection and element-wise addition respectively. The role of the residual connection is to allow a deeper neural network to be trained more stably. The feature attention module and the spatial attention module are used simultaneously, and the sequential connection of the feature attention module followed by the spatial attention module has a better effect.
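The composition above can be sketched in a few lines (a NumPy illustration, not the patent's implementation; CAM, SAM and the residual branch are passed in as hypothetical callables, with the 1x1x1 residual convolution stubbed as an identity mapping):

```python
import numpy as np

def attention_model(x, cam, sam, res):
    # path 1: feature attention then spatial attention, in sequence;
    # the residual branch is added element-wise to the path-1 output
    return sam(cam(x)) + res(x)

x = np.ones((2, 4, 4, 4))
# hypothetical stand-ins: CAM and SAM scale their input, RES is identity
out = attention_model(x, cam=lambda t: 0.5 * t, sam=lambda t: 2.0 * t, res=lambda t: t)
print(out[0, 0, 0, 0])  # 2.0  (2.0 * 0.5 * 1 + 1)
```

The element-wise addition ⊕ of the two branches is what makes the block residual: if both attention modules degenerate to identity scalings, the block still passes its input through.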
The feature attention submodel includes a third network, a fourth network and a Sigmoid layer; the first feature map is inputted into the third network and the fourth network respectively, and outputs of the third network and the fourth network are added and inputted into the Sigmoid layer to obtain a feature attention map; the feature attention map is multiplied by the first feature map to obtain the second feature map.
Specifically, the third network includes a MaxPool layer, a fully connected layer, a ReLu layer and a fully connected layer along the spatial dimension; the fourth network includes an AvgPool layer, a fully connected layer, a ReLu layer and a fully connected layer along the spatial dimension.
In one embodiment, as shown in FIG. 4, the output X ∈ R^{C×H×W×D} of the previous coding block is inputted into two parallel paths, respectively called path 2 and path 3; wherein path 2 includes the MaxPool layer, the fully connected layer, the ReLu layer and the fully connected layer along the spatial dimension; path 3 includes the AvgPool layer, the fully connected layer, the ReLu layer and the fully connected layer along the spatial dimension; outputs of the two paths are added and pass through the Sigmoid layer to obtain the feature attention map; the feature attention map is multiplied along each spatial dimension by the output of the previous coding block, and the result is taken as the output X^C ∈ R^{C×H×W×D} of the feature attention mechanism.
Specifically, the output X of the previous coding block passes through the AvgPool layer along the spatial dimension and the MaxPool layer along the spatial dimension respectively to obtain X^avg ∈ R^{C×1×1×1} and X^max ∈ R^{C×1×1×1}. Then, X^avg passes through the first fully connected layer to obtain X^avg2 ∈ R^{(C/r)×1×1×1}, wherein r is the reduction ratio. Next, X^avg2 passes through the ReLu layer and the second fully connected layer to obtain X^avg3. X^max goes through the same operations to obtain X^max3, which is then added element-wise with X^avg3 and passed through the Sigmoid layer to obtain the feature attention map X_c. Finally, X_c is multiplied element-wise by X to obtain the output X^C of the feature attention module. The process can be represented as follows:
X^C = Sigmoid(FC(ReLu(FC(Avg_s(X)))) ⊕ FC(ReLu(FC(Max_s(X))))) ⊗ X = Sigmoid(X^avg3 ⊕ X^max3) ⊗ X = X_c ⊗ X
Wherein Avg_s, Max_s, FC, ReLu, Sigmoid, ⊕ and ⊗ represent the AvgPool layer along the spatial dimension, the MaxPool layer along the spatial dimension, the fully connected layer, the ReLu layer, the Sigmoid layer, element-wise addition and element-wise multiplication respectively. In the design of the feature attention module, a MaxPool path is added on the basis of the SE module. The reason for using the MaxPool layer and the AvgPool layer simultaneously is that the two operations extract effective and complementary information, and the combination of both can further improve the performance of the feature attention mechanism. The role of the fully connected layers in each path is to learn the dependencies between the channels.
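The feature attention computation above can be sketched as follows (a simplified NumPy illustration; the FC weights `w1`/`w2` are random placeholders, and sharing one MLP between the two pooling paths is an assumption made here for brevity, consistent with the SE/CBAM design but not spelled out in the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_attention(x, w1, w2):
    # x: (C, H, W, D); pool over all spatial dimensions -> (C,)
    avg = x.mean(axis=(1, 2, 3))          # Avg_s(X)
    mx = x.max(axis=(1, 2, 3))            # Max_s(X)
    # two-layer MLP: FC (C -> C/r), ReLu, FC (C/r -> C)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    x_c = sigmoid(mlp(avg) + mlp(mx))     # feature attention map, one weight per channel
    return x_c[:, None, None, None] * x   # broadcast multiply over spatial dims

C, r = 8, 2
rng = np.random.default_rng(0)
x = rng.standard_normal((C, 4, 4, 4))
w1 = 0.1 * rng.standard_normal((C // r, C))   # hypothetical FC weights
w2 = 0.1 * rng.standard_normal((C, C // r))
out = feature_attention(x, w1, w2)
print(out.shape)  # (8, 4, 4, 4)
```

Because the Sigmoid output lies strictly in (0, 1), each channel of the output is a damped copy of the corresponding input channel.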
FIG. 5 is a structural diagram of a spatial attention submodel as shown in an exemplary embodiment. As shown in FIG. 5, the output X^C ∈ R^{C×H×W×D} of the previous feature attention submodel is inputted into two parallel paths, respectively called path 4 and path 5; wherein path 4 includes the MaxPool layer and the hierarchical fully connected layer along the feature dimension; path 5 includes the AvgPool layer and the hierarchical fully connected layer along the feature dimension; outputs of the two paths are added and pass through the Sigmoid layer to obtain the spatial attention map; the spatial attention map is multiplied by each channel of the output of the previous feature attention module, and the result is taken as the output X^S ∈ R^{C×H×W×D} of the spatial attention module.
Specifically, the output X^C of the previous feature attention submodel passes through the AvgPool layer along the feature dimension and the MaxPool layer along the feature dimension respectively to obtain X^avg ∈ R^{1×H×W×D} and X^max ∈ R^{1×H×W×D}. X^avg passes through the hierarchical fully connected layer to obtain X^avg', learning the global spatial location dependence. X^max goes through the same operation to obtain X^max', which is then added element-wise with X^avg' and passed through the Sigmoid layer to obtain the spatial attention map X_s ∈ R^{1×H×W×D}. Finally, X_s is multiplied element-wise by X^C to obtain the output X^S of the spatial attention module. The process can be represented as follows:
X^S = Sigmoid(HFC(Avg_c(X^C)) ⊕ HFC(Max_c(X^C))) ⊗ X^C = Sigmoid(X^avg' ⊕ X^max') ⊗ X^C = X_s ⊗ X^C
Wherein Avg_c, Max_c, HFC, Sigmoid, ⊕ and ⊗ represent the AvgPool layer along the feature dimension, the MaxPool layer along the feature dimension, the hierarchical fully connected layer, the Sigmoid layer, element-wise addition and element-wise multiplication respectively. The reason for using the MaxPool layer and the AvgPool layer simultaneously is that the two operations extract effective and complementary information, and the combination of both can further improve the performance of the spatial attention mechanism.
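The spatial attention computation can be sketched in the same style (a simplified NumPy illustration; the hierarchical fully connected layer is stubbed here as an identity mapping purely to show the channel-pooling, Sigmoid and broadcast structure, so `hfc` is a hypothetical placeholder):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(xc, hfc):
    # xc: (C, H, W, D); pool along the feature (channel) dimension
    avg = xc.mean(axis=0, keepdims=True)   # Avg_c(X^C), shape (1, H, W, D)
    mx = xc.max(axis=0, keepdims=True)     # Max_c(X^C), shape (1, H, W, D)
    x_s = sigmoid(hfc(avg) + hfc(mx))      # spatial attention map, values in (0, 1)
    return x_s * xc                        # broadcast multiply over the channel dim

rng = np.random.default_rng(1)
xc = rng.standard_normal((8, 4, 4, 4))
out = spatial_attention(xc, hfc=lambda t: t)  # identity stand-in for HFC
print(out.shape)  # (8, 4, 4, 4)
```

The single-channel attention map rescales every voxel identically across all channels, which is what lets the mechanism emphasize spatial locations rather than features.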
The hierarchical fully connected layer includes several sequentially connected submodules, and all second outputs of the submodules are added as an output of the hierarchical fully connected layer; wherein each of the submodules is configured to perform the following operation until a certain spatial dimension of an input is smaller than a region size, and the input of a first submodule is the second feature map after max pooling or avg pooling: dividing an input according to a region size; inputting the divided input into a feed forward network to learn the spatial location relationship of local regions, and then restoring a shape to the same shape as the input to obtain a first output; upsampling the first output to obtain a second output with the same size as an input of the hierarchical fully connected layer; and downsampling the first output and taking the first output as an input of a next submodule after the first output passes through a batch normalization layer.
In one embodiment, as shown in FIG. 6, the hierarchical fully connected layer includes several submodules, wherein each of the submodules performs the following operation until a certain spatial dimension H, W or D of the input Y ∈ R^{C×H×W×D} is smaller than the region size G:
the input Y ∈ R^{C×H×W×D} is divided according to the appropriate region size G to obtain Y' ∈ R^{C×(HWD/G^3)×G^3}; then Y' is inputted into the feed forward network to learn the spatial location relationship within each local region of size G^3; and then the shape of Y' is restored to the same shape as the input Y to obtain the first output Y1 ∈ R^{C×H×W×D}.
The first output Y1 is upsampled to obtain a second output with the same size as an input of the hierarchical fully connected layer; the first output Y1 is inputted into a 3D convolution layer with a kernel size of 3x3x3, a stride size of 2x2x2 and a padding size of 1x1x1 for downsampling, and used as an input of a next submodule after it passes through a batch normalization layer.
All second outputs of the submodules, which represent the local spatial location relationships at different levels, are added as the output of the hierarchical fully connected layer to represent the global spatial location relationship. In this way, the computational load of learning the global spatial location relationship is very small, which solves the problem that the existing spatial attention submodels cannot learn the global spatial location relationship in high-level and low-level semantic feature maps at the same time.
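To make the low computational load concrete, here is a back-of-the-envelope cost model (not taken from the application): the per-level cost is counted as the number of pairwise terms the feed forward network touches within each G^3 region, versus (S^3)^2 for a naive global spatial attention on a cubic S×S×S map, with stride-2 downsampling between levels:

```python
def hierarchical_ffn_cost(size, region):
    """Illustrative cost model for the hierarchical fully connected layer
    on a cubic size x size x size feature map with region size `region`.
    Returns a list with one pairwise-term count per level."""
    levels, s = [], size
    while s >= region:                       # stop once a dimension drops below G
        n_regions = (s // region) ** 3       # regions per level
        levels.append(n_regions * (region ** 3) ** 2)
        s //= 2                              # stride-2 downsampling between levels
    return levels

per_level = hierarchical_ffn_cost(64, 4)     # 5 levels: 64, 32, 16, 8, 4
total = sum(per_level)                       # hierarchical cost
naive = (64 ** 3) ** 2                       # naive global spatial attention
```

Under this model the hierarchical scheme is several orders of magnitude cheaper than the naive global variant, which is consistent with the memory claims made below.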
Specifically, for the input Y, first the torch.Tensor.unfold function is used to divide each spatial dimension H, W and D of Y; the unfold function's slice size and stride size are set to the region size G to obtain the output Y0 ∈ R^(C×(H/G)×(W/G)×(D/G)×G×G×G), and Y0 is reshaped to obtain Y' ∈ R^(C×(H/G)(W/G)(D/G)×G^3), thus completing the region division of Y. Then, Y' is inputted into the feed forward network, which has the same structure as that in Transformer, to learn the spatial location relationship of local regions. Then, the shape of Y' is restored to the same shape as Y to obtain the first output Y1 ∈ R^(C×H×W×D). Next, Y1 is upsampled through the torch.nn.functional.upsample function to obtain the second output with the same size as an input of the hierarchical fully connected layer. The first output Y1 is inputted into the 3D convolution layer with a kernel size of 3x3x3, a stride size of 2x2x2 and a padding size of 1x1x1 for downsampling, and used as an input of a next submodule after it passes through a Batch Normalization layer, until a certain spatial dimension H, W or D of the input is smaller than the region size G. The hierarchical fully connected layer occupies very little GPU memory. It approximates the global spatial location relationship by learning the local and global spatial location relationships at different levels, overcoming the problem that the existing spatial attention mechanism cannot be applied to the learning of the global spatial location relationship in high-level and low-level semantic feature maps at the same time.
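The region division, feed forward network, shape restoration and downsampling described above can be sketched as one submodule in PyTorch. This is a sketch under assumptions: the feed forward network width (hidden_ratio) and the nearest-neighbour interpolation mode are not specified by the application, which only states that the feed forward network has the same structure as that in Transformer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalFCSubmodule(nn.Module):
    """Sketch of one submodule of the hierarchical fully connected layer."""
    def __init__(self, channels, region=4, hidden_ratio=2):
        super().__init__()
        g3 = region ** 3
        self.region = region
        # Transformer-style feed forward network over the G^3 positions of a region.
        self.ffn = nn.Sequential(
            nn.Linear(g3, hidden_ratio * g3), nn.ReLU(), nn.Linear(hidden_ratio * g3, g3))
        # 3D convolution for downsampling: kernel 3x3x3, stride 2x2x2, padding 1x1x1.
        self.down = nn.Conv3d(channels, channels, kernel_size=3, stride=2, padding=1)
        self.bn = nn.BatchNorm3d(channels)

    def forward(self, y):
        n, c, h, w, d = y.shape
        g = self.region
        # Region division: unfold each spatial dim with slice size and stride G.
        y0 = y.unfold(2, g, g).unfold(3, g, g).unfold(4, g, g)  # (N,C,H/G,W/G,D/G,G,G,G)
        yp = y0.reshape(n, c, -1, g ** 3)                       # (N,C,(H/G)(W/G)(D/G),G^3)
        y1 = self.ffn(yp)                                       # learn local spatial relations
        # Restore the shape of Y' to the same shape as Y: first output Y1.
        y1 = y1.reshape(*y0.shape).permute(0, 1, 2, 5, 3, 6, 4, 7).reshape(n, c, h, w, d)
        # Second output: upsampled to the hierarchical layer's input size
        # (identity at the first level in this sketch).
        y2 = F.interpolate(y1, size=(h, w, d), mode='nearest')
        nxt = self.bn(self.down(y1))                            # input of the next submodule
        return y1, y2, nxt
```

The unfold/reshape pair partitions the volume into non-overlapping G×G×G regions, and the permute/reshape pair is its exact inverse, so no voxels are duplicated or lost.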
The decoding path 12 includes decoding modules, configured to decode the third feature map to obtain a region of the brain tu- mor.
In this embodiment, the decoding path includes five decoding modules and a channel adjustment layer, wherein the decoding modules are respectively called the first decoding module, the second decoding module, the third decoding module, the fourth decoding module and the fifth decoding module. Each decoding module includes an upsampling layer and two decoding blocks, and the structure of each decoding block is the same as that of the coding blocks of the first coding module. The upsampling layer of the fifth decoding module receives the output of the coding path, and the upsampling layers of the other decoding modules receive the output of the previous decoding module (for example, the upsampling layer of the fourth decoding module receives the output of the fifth decoding module). The first decoding block of each decoding module simultaneously receives the output of the upsampling layer in its own decoding module and the output of the coding module with a corresponding number (for example, the first decoding block of the fifth decoding module also receives the output of the fifth coding module). The channel adjustment layer adjusts the number of channels outputted by the first decoding module to be consistent with the number of glioma subregions, and the prediction results of glioma subregions are obtained after processing.
Specifically, the upsampling layer is realized by a 3D deconvolution layer with a kernel size of 3x3x3, a stride size of 2x2x2 and a padding size of 1x1x1. The channel adjustment layer includes a 3D convolution layer with a kernel size of 3x3x3, a stride size of 1x1x1 and a padding size of 1x1x1 and a Softmax layer, wherein the number of channels outputted by the 3D convolution layer is consistent with the number of glioma subregions, and the Softmax layer is used to normalize the probability of each channel. The number of decoding modules contained in the decoding path, the structure of the decoding modules and the structure of the decoding blocks all follow the classical network structure in the field of brain tumor segmentation, such as 3D U-Net.
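The upsampling and channel adjustment layers can be sketched as follows (assuming PyTorch; output_padding=1 is an added assumption so that the deconvolution exactly doubles each spatial dimension, and the number of glioma subregions, here 3, is illustrative):

```python
import torch
import torch.nn as nn

num_subregions = 3  # illustrative; set to the actual number of glioma subregions

# 3D deconvolution: kernel 3x3x3, stride 2x2x2, padding 1x1x1.
upsample = nn.ConvTranspose3d(32, 16, kernel_size=3, stride=2,
                              padding=1, output_padding=1)

# Channel adjustment: 3D convolution (kernel 3x3x3, stride 1x1x1, padding 1x1x1)
# to the subregion channel count, then Softmax over channels per voxel.
channel_adjust = nn.Sequential(
    nn.Conv3d(16, num_subregions, kernel_size=3, stride=1, padding=1),
    nn.Softmax(dim=1))

x = torch.randn(1, 32, 8, 8, 8)       # output of the previous decoding stage
probs = channel_adjust(upsample(x))   # (1, num_subregions, 16, 16, 16)
```

Without output_padding, a stride-2 transposed convolution with kernel 3 and padding 1 would map a size-S input to 2S-1; the extra output padding restores the exact factor-of-2 upsampling that mirrors the stride-2 downsampling in the coding path.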
Corresponding to the embodiment of the neural network used for brain tumor image region segmentation, the application further provides an embodiment of a brain tumor image region segmentation method.
FIG. 7 is a flow chart of a brain tumor image region segmen- tation method as shown in an exemplary embodiment. As shown in
FIG. 7, the method can include the following steps: S101, acquiring a brain MRI image; S102, building coding modules, wherein each of the coding modules includes coding blocks and an attention model; the MRI image is inputted into the coding blocks to obtain a first feature map; the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel includes a first network, a second network and a Sigmoid layer; the second feature map is inputted into the first network and the second network respectively, and outputs of the first network and the second network are added and inputted into the Sigmoid layer to obtain a spatial attention map; a third feature map is obtained according to the spatial attention map and the second feature map; wherein the first network includes a MaxPool layer and a hierarchical fully connected layer along a feature dimension, and the second network includes an AvgPool layer and a hierarchical fully connected layer along the feature dimension;
S103, inputting the brain MRI image into the coding modules, and extracting a third feature map of the brain MRI image; and S104, decoding the third feature map to obtain a segmented region of the brain tumor.
According to the above-mentioned embodiments, the application adopts a spatial attention mechanism realized based on hierarchical full connection, which overcomes the problem that the existing spatial attention mechanisms cannot be applied to the learning of the global spatial location relationship in high-level and low-level semantic feature maps at the same time. According to the application, the global attention mechanism, which is formed by sequentially combining the feature attention mechanism based on the SE module and the spatial attention mechanism realized based on hierarchical full connection, is inserted into a coding path of a six-layer 3D U-Net. The attention mechanism can greatly improve the performance of 3D U-Net in the glioma subregion segmentation task.
FIG. 8 is a block diagram of a brain tumor image region seg- mentation device as shown in an exemplary embodiment. As shown in
FIG. 8, the device includes the following modules: an acquisition module 21, configured to acquire a brain MRI image; a building module 22, configured to build coding modules, wherein each of the coding modules includes coding blocks and an attention model; the MRI image is inputted into the coding blocks to obtain a first feature map; the attention model includes a feature attention submodel and a spatial attention submodel which are connected in sequence; the first feature map is inputted into the feature attention submodel to obtain a second feature map; the spatial attention submodel includes a first network, a second network and a Sigmoid layer; the second feature map is inputted into the first network and the second network respectively, and outputs of the first network and the second network are added and inputted into the Sigmoid layer to obtain a spatial attention map; a third feature map is obtained according to the spatial attention map and the second feature map; wherein the first network includes a MaxPool layer and a hierarchical fully connected layer along a feature dimension, and the second network includes an AvgPool layer and a hierarchical fully connected layer along the feature dimension; a feature extraction module 23, configured to input the brain MRI image into the coding modules, and extract a third feature map of the brain MRI image; and a decoding module 24, configured to decode the third feature map to obtain a segmented region of the brain tumor.
With respect to the device in the above-mentioned embodiment, the specific manner in which each module performs operations is described in detail in the embodiment related to the method and will not be elaborated here.
For the device embodiment, because it basically corresponds to the method embodiment, reference may be made to the relevant part of the description of the method embodiment. The device embodiment described above is only schematic, in that the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or may be distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the application's solutions. The embodiment can be understood and implemented by those of ordinary skill in the art without creative effort.
Correspondingly, the application further provides electronic equipment, including the following components: a memory, configured to store one or more programs; and one or more processors; wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the brain tumor image region segmentation method.
Correspondingly, the application further provides a computer readable storage medium, storing a computer instruction, wherein the instruction, when executed by the processor(s), implements the brain tumor image region segmentation method.
Other implementation solutions of the application will readily occur to those skilled in the art after considering the specification and practicing the contents disclosed herein. The application is intended to cover any modifications, uses or adaptive changes of the application, and these modifications, uses or adaptive changes follow the general principles of the application and include common general knowledge or customary technical means in the art which are not disclosed by the application. The specification and embodiments are considered exemplary only, and the true scope and spirit of the application are indicated by the claims below.
It should be understood that the application is not limited to the precise structure described above and shown in the accompa- nying drawings, and various modifications and changes may be made without deviating from its scope. The scope of the application is limited only by the attached claims.
Claims (10)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111038003.0A CN113744284B (en) | 2021-09-06 | 2021-09-06 | Brain tumor image region segmentation method and device, neural network and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
NL2032936A NL2032936A (en) | 2023-03-10 |
NL2032936B1 true NL2032936B1 (en) | 2023-10-11 |
Family
ID=78735960
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
NL2032936A NL2032936B1 (en) | 2021-09-06 | 2022-09-01 | Brain tumor image region segmentation method and device, neural network and electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113744284B (en) |
NL (1) | NL2032936B1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115330813A (en) * | 2022-07-15 | 2022-11-11 | 深圳先进技术研究院 | Image processing method, device and equipment and readable storage medium |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108364023A (en) * | 2018-02-11 | 2018-08-03 | 北京达佳互联信息技术有限公司 | Image-recognizing method based on attention model and system |
CN110533045B (en) * | 2019-07-31 | 2023-01-17 | 中国民航大学 | Luggage X-ray contraband image semantic segmentation method combined with attention mechanism |
CN111028242A (en) * | 2019-11-27 | 2020-04-17 | 中国科学院深圳先进技术研究院 | Automatic tumor segmentation system and method and electronic equipment |
CN111046939B (en) * | 2019-12-06 | 2023-08-04 | 中国人民解放军战略支援部队信息工程大学 | Attention-based CNN class activation graph generation method |
US11270447B2 (en) * | 2020-02-10 | 2022-03-08 | Hong Kong Applied Science And Technology Institute Company Limited | Method for image segmentation using CNN |
CN111626300B (en) * | 2020-05-07 | 2022-08-26 | 南京邮电大学 | Image segmentation method and modeling method of image semantic segmentation model based on context perception |
CN112102324B (en) * | 2020-09-17 | 2021-06-18 | 中国科学院海洋研究所 | Remote sensing image sea ice identification method based on depth U-Net model |
CN112308835A (en) * | 2020-10-27 | 2021-02-02 | 南京工业大学 | Intracranial hemorrhage segmentation method integrating dense connection and attention mechanism |
CN112418027A (en) * | 2020-11-11 | 2021-02-26 | 青岛科技大学 | Remote sensing image road extraction method for improving U-Net network |
CN112381897B (en) * | 2020-11-16 | 2023-04-07 | 西安电子科技大学 | Low-illumination image enhancement method based on self-coding network structure |
CN112365496B (en) * | 2020-12-02 | 2022-03-29 | 中北大学 | Multi-modal MR image brain tumor segmentation method based on deep learning and multi-guidance |
CN112651978B (en) * | 2020-12-16 | 2024-06-07 | 广州医软智能科技有限公司 | Sublingual microcirculation image segmentation method and device, electronic equipment and storage medium |
CN112818904A (en) * | 2021-02-22 | 2021-05-18 | 复旦大学 | Crowd density estimation method and device based on attention mechanism |
CN113344951B (en) * | 2021-05-21 | 2024-05-28 | 北京工业大学 | Boundary-aware dual-attention-guided liver segment segmentation method |
- 2021-09-06: CN application CN202111038003.0A, patent CN113744284B (active)
- 2022-09-01: NL application NL2032936A, patent NL2032936B1 (active)
Also Published As
Publication number | Publication date |
---|---|
CN113744284A (en) | 2021-12-03 |
NL2032936A (en) | 2023-03-10 |
CN113744284B (en) | 2023-08-29 |