CN111325751B - CT image segmentation system based on attention convolution neural network - Google Patents


Info

Publication number
CN111325751B
CN111325751B (application CN202010190946.4A)
Authority
CN
China
Prior art keywords
module
attention
convolution
feature
pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN202010190946.4A
Other languages
Chinese (zh)
Other versions
CN111325751A (en
Inventor
龙建武
宋鑫磊
安勇
鄢泽然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Technology
Original Assignee
Chongqing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Technology filed Critical Chongqing University of Technology
Priority to CN202010190946.4A priority Critical patent/CN111325751B/en
Publication of CN111325751A publication Critical patent/CN111325751A/en
Application granted granted Critical
Publication of CN111325751B publication Critical patent/CN111325751B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20172 Image enhancement details
    • G06T2207/20192 Edge enhancement; Edge preservation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing

Abstract

The invention provides a CT image segmentation system based on an attention convolution neural network, comprising a feature coding module, a semantic information extraction attention module, a feature fusion pooling attention module, and a feature map decoding module. The feature coding module gradually reduces the feature-map size of an input image using a parallel convolutional neural network, and extracts image semantic information and spatial information simultaneously through network-layer multiplexing and the interception and fusion of features at each layer. The semantic information extraction attention module generates attention features using pooling and further refines the semantic information features extracted by the feature coding module. The feature fusion pooling attention module combines the refined semantic information features with the semantic and spatial information features spliced in parallel by the feature coding module to form an attention feature map. The feature map decoding module uses convolution and upsampling modules to gradually and finely restore the attention feature map to the size of the input image. By fusing the attention modules, the invention achieves efficient and accurate image segmentation.

Description

CT image segmentation system based on attention convolution neural network
Technical Field
The invention relates to the technical field of image understanding, in particular to a CT image segmentation system based on an attention convolution neural network.
Background
Image segmentation is an important fundamental research problem in the field of computer vision, and medical image segmentation is an application of it that can accurately and rapidly locate large numbers of patient lesions in a short time. How to apply image segmentation techniques effectively to medical images has therefore become a major task for researchers.
Medical image segmentation classifies the semantic content of an image pixel by pixel by extracting medical image features. It must accurately locate each object, determine the class to which it belongs and its position, and clearly delineate object boundaries so that objects of different classes are distinguished.
At present, many medical image segmentation methods are widely used at home and abroad; the traditional methods mainly include the following. Threshold-based segmentation is relatively simple to implement, but it is unsuitable for multi-channel images and for images whose feature values differ little, and it is difficult to obtain accurate results when the image lacks obvious gray-level differences or when the gray-value ranges of different objects overlap heavily. Edge-based segmentation offers fast search and detection and good edge detection, but it cannot recover good region structure, and noise resistance conflicts with detection precision during edge detection. The method based on the active contour model, also called the Snake model, gradually deforms and moves an initial curve carrying an energy function toward the contour of the target via energy minimization until it converges to the target boundary, yielding a smooth and continuous contour; however, the original Snake model has difficulty capturing concave target boundaries and is sensitive to the initial contour line, so many improved methods followed.
In addition, neural-network-based segmentation has flourished since Long et al. proposed the FCN algorithm (Fully Convolutional Networks) in 2014, which popularized end-to-end convolutional networks for semantic segmentation. It reuses an ImageNet-pretrained network for the segmentation problem, upsamples with deconvolution layers, and introduces skip connections to reduce the coarseness of the upsampling; even so, FCN results fall somewhat short of practical requirements. Although the skip structure improves accuracy, the model still cannot separate the edge information of the image well, and in classifying pixels one by one FCN does not fully consider the relations between pixels and lacks spatial consistency. Vijay et al. proposed the SegNet semantic segmentation algorithm in 2015, which transfers the max-pooling indices to the decoder and improves segmentation resolution. In an FCN network, a coarse segmentation map is generated by convolutional layers and some skip connections, and more skip connections are introduced to improve the effect; FCN, however, copies the encoder features, whereas SegNet copies the max-pooling indices, which makes SegNet more memory-efficient than FCN.
The U-Net proposed by Ronneberger et al. combines shallow semantic information with deep semantic information and segments medical images using an encoder-decoder architecture, but its feature extraction is limited. Yu et al. proposed dilated convolutions in 2016, which enlarge the receptive field exponentially without reducing the spatial dimensions. In DeepLab, discussed next, dilated convolution is called atrous convolution: the last two pooling layers are removed from the pretrained classification network (here VGG, the Visual Geometry Group network) and the subsequent convolutional layers are replaced with dilated convolutions. DeepLabV2 and V3 use dilated convolution, implement pyramid-shaped atrous pooling (ASPP, Atrous Spatial Pyramid Pooling) in the spatial dimension, and use a fully connected conditional random field; dilated convolution enlarges the receptive field without increasing the number of parameters.
Zhao et al. proposed PSPNet (Pyramid Scene Parsing Network) in 2017. The algorithm introduces a pyramid pooling module to aggregate background information and uses an auxiliary loss. Global scene classification matters because it provides clues to the distribution of the segmentation classes, and the pyramid pooling module captures this information with large-kernel pooling layers. Like the dilated-convolution systems mentioned above, PSPNet also improves the ResNet structure with dilated convolution and adds a pyramid pooling module that concatenates the ResNet feature map with the upsampled outputs of parallel pooling layers whose kernels cover the entire area, half the area, and small areas of the image, respectively.
Chen et al. proposed the DeepLabV3+ model in 2018, applying a spatial pyramid pooling module and an encoder-decoder structure to deep neural networks for the semantic segmentation task. The former encodes multi-scale context information by probing the input features with filtering or pooling operations at multiple rates and multiple effective fields of view, while the latter captures sharper object boundaries by gradually recovering spatial information. The algorithm combines the advantages of both, extending DeepLabV3 with a simple and efficient decoder module to refine the segmentation results, especially along object boundaries. By further exploring the Xception model and applying depthwise separable convolution to the ASPP (Atrous Spatial Pyramid Pooling) and decoder modules, a faster and stronger encoder-decoder network is constructed, though at a large cost in computational resources. As a module for semantic segmentation, the pyramid structure integrates well, can easily be added to any neural network structure, and achieves excellent results in extracting context information. It has shortcomings, however: for instance, it does not explain well which of the extracted information the network should actually value.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a CT image segmentation system based on an attention convolution neural network, which uses a deep learning method and fused attention modules to design an accurate and efficient segmentation model, improving the execution efficiency of existing CT image segmentation methods and obtaining more accurate segmentation results.
In order to solve the technical problems, the invention adopts the following technical scheme:
a CT image segmentation system based on an attention convolution neural network comprises a feature coding module, a semantic information extraction attention module, a feature fusion pooling attention module and a feature graph code module; the feature coding module gradually reduces the size of a feature map of an input image by using a parallel convolution neural network, and realizes the simultaneous extraction of semantic information features and spatial information features of the image through network layer multiplexing and interception and fusion of features of each layer; the semantic information extraction attention module generates attention features by using pooling, and further refines and refines the semantic information features extracted by the feature coding module; the feature fusion pooling attention module is connected in parallel with the average pooling by using maximum pooling and average pooling, and combines semantic information features refined by the semantic information extraction attention module with semantic information and spatial information features spliced by the feature coding module to form an attention feature map; and the feature map decoding module gradually and finely restores the attention feature map fused by the feature fusion pooling attention module into the size of the input image by using a convolution module and an up-sampling module.
Compared with the prior art, the CT image segmentation system based on an attention convolution neural network first gradually reduces the feature-map size of the input image with a convolutional neural network, extracting rich semantic information features to optimize the classification task, while the network design reduces the loss from compressing spatial information features during semantic feature extraction. Next, the semantic information extraction attention module optimizes the extraction of semantic information. Then the feature fusion pooling attention module combines the refined semantic information features with the semantic and spatial information features spliced by the feature coding module, fusing them through pooling attention to obtain an attention feature map. Finally, the feature map decoding module performs upsampling and convolution operations, gradually and finely restoring the attention feature map to the size of the input image. In addition, compared with current typical segmentation networks, the proposed segmentation system model adapts better to CT image dataset segmentation.
Further, the feature coding module comprises a first convolution module, a second convolution module, first to fourth bottleneck paths, and a first splicing operation module which are arranged in sequence. The first convolution module comprises a convolutional layer and batch normalization arranged in sequence; the second convolution module comprises a convolutional layer, batch normalization, and a ReLU activation function arranged in sequence. The first to fourth bottleneck paths are arranged in parallel; the number of bottleneck layers in each path decreases from the first to the fourth bottleneck path, the feature maps output by the second to fourth bottleneck paths shrink in size relative to that output by the first bottleneck path, and the number of channels of the feature map finally output by each bottleneck path increases with the number of layers. The first splicing operation module splices the semantic information features and spatial information features extracted by the four bottleneck paths.
Further, the convolution kernel size of the convolutional layer is 3 × 3 with a stride of 2.
Further, the numbers of bottleneck layers in the first to fourth bottleneck paths are 4, 3, 2, and 1 respectively; the feature maps output by the second to fourth bottleneck paths are 1/2, 1/4, and 1/8 the size of the first path's; and the numbers of output feature-map channels of the first to fourth bottleneck paths are 128, 256, 512, and 1024 respectively.
Further, each bottleneck layer comprises three convolution units, an addition unit, and a ReLU activation function unit arranged in sequence; each convolution unit comprises a convolution kernel, batch normalization, and a ReLU activation function arranged in sequence, and the addition unit is also skip-connected to the feature map input to the convolution kernel of the first convolution unit.
Further, the semantic information extraction attention module comprises a first channel attention module, a second channel attention module, a global pooling module, a multiplication operation module, and a second splicing operation module. The first and second channel attention modules are arranged in parallel, and each comprises, in sequence, a global average pooling that captures the context semantic feature information in the input feature map, a convolution that computes the semantic information weights, batch normalization and a Sigmoid activation function that refine the semantic information extraction, and a multiplication operation that multiplies the refined semantic information with the input feature map. The multiplication operation module multiplies the feature map output by the second channel attention module with the output feature map processed by the global pooling module; the second splicing operation module splices the feature map output by the first channel attention module with the output feature map of the multiplication operation module. The input feature maps of the two channel attention modules are obtained by connecting semantic information features extracted by the feature coding module.
Further, the feature fusion pooling attention module comprises a third convolution module, an average pooling path, a maximum pooling path, and a two-path pooling multiplication operation module. The third convolution module extracts the mixed information features of the fused semantic and spatial information features while converting the channels of the information; the average and maximum pooling paths are arranged in parallel and each processes the features extracted by the third convolution module; and the two-path pooling multiplication operation module multiplies the two processed features from the average and maximum pooling paths to form an attention feature map.
Further, the average pooling path processes features with two serially connected average pooling modules as the first path of feature extraction, and the maximum pooling path processes features with two serially connected maximum pooling modules as the second path of feature extraction.
Further, the feature map decoding module comprises a first upsampling module, a fourth convolution module, a second upsampling module, a fifth convolution module, and a sixth convolution module arranged in sequence; the feature maps output by the first upsampling module and the fourth convolution module have the same size, and the feature maps output by the second upsampling module, the fifth convolution module, and the sixth convolution module are all the same size as the input image.
Further, the sampling coefficients of the first and second upsampling modules are 2.
Drawings
FIG. 1 is a schematic block diagram of a CT image segmentation system based on an attention convolution neural network according to the present invention.
Fig. 2 is a schematic structural diagram of the feature encoding module of fig. 1.
Fig. 3 is a schematic diagram of the structure of each bottleneck layer in the feature encoding module of fig. 2.
FIG. 4 is a block diagram of a channel attention module of the semantic information extraction attention module of FIG. 1.
FIG. 5 is a schematic diagram of the structure of the feature fusion pooling attention module of FIG. 1.
Fig. 6 is a schematic structural diagram of the feature map decoding module of fig. 1.
FIG. 7 is a graph illustrating the FCN and FEM training process.
FIG. 8 is a schematic diagram of an image comparison of pancreas segmentation test results provided by the present invention.
Detailed Description
In order to make the technical means, the creation characteristics, the achievement purposes and the effects of the invention easy to understand, the invention is further explained below by combining the specific drawings.
Referring to fig. 1, the present invention provides a CT image segmentation system based on an attention convolution neural network, which includes a feature coding module, a semantic information extraction attention module, a feature fusion pooling attention module, and a feature map decoding module. The feature coding module gradually reduces the feature-map size of an input image using a parallel convolutional neural network, and extracts the semantic information features and spatial information features of the image simultaneously through network-layer multiplexing and the interception and fusion of features at each layer. The semantic information extraction attention module generates attention features using pooling and further refines the semantic information features extracted by the feature coding module. The feature fusion pooling attention module uses maximum pooling and average pooling connected in parallel, and fuses the semantic information features refined by the semantic information extraction attention module with the semantic and spatial information features spliced by the feature coding module to form an attention feature map. The feature map decoding module uses convolution and upsampling modules to gradually and finely restore the attention feature map fused by the feature fusion pooling attention module to the size of the input image.
Compared with the prior art, the system first gradually reduces the feature-map size of the input image with a convolutional neural network, extracting rich semantic information features to optimize the classification task, while the network design reduces the loss from compressing spatial information features during semantic feature extraction. Next, the semantic information extraction attention module optimizes the extraction of semantic information. Then the feature fusion pooling attention module combines the refined semantic information features with the semantic and spatial information features spliced by the feature coding module, fusing them through pooling attention to obtain an attention feature map. Finally, the feature map decoding module performs upsampling and convolution operations, gradually and finely restoring the attention feature map to the size of the input image. In addition, compared with current typical segmentation networks, the proposed segmentation system model adapts better to CT image dataset segmentation.
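For orientation, the following is a minimal PyTorch-style sketch, not the patent's own code, of how the four modules might be wired end to end. The class name AttentionSegNet and the assumption that the feature coding module returns both the spliced features and the deep semantic features are hypothetical; the concrete modules are sketched in the sections that follow.

    import torch.nn as nn

    class AttentionSegNet(nn.Module):
        """Sketch of the pipeline: FEM -> SIEAM -> FFPAM -> FDM."""
        def __init__(self, encoder, sieam, ffpam, decoder):
            super().__init__()
            self.encoder = encoder  # feature coding module (FEM)
            self.sieam = sieam      # semantic information extraction attention module
            self.ffpam = ffpam      # feature fusion pooling attention module
            self.decoder = decoder  # feature map decoding module (FDM)

        def forward(self, x):
            fused, semantic = self.encoder(x)      # spliced features + deep semantic features
            refined = self.sieam(semantic)         # refine the semantic information
            attn_map = self.ffpam(fused, refined)  # fuse into an attention feature map
            return self.decoder(attn_map)          # restore to the input-image size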
Specifically, the design background of the feature coding module is as follows. As is well known, for a semantic segmentation task spatial information is as important as semantic information. Traditional deep learning methods use serial convolution, reducing the feature-map size step by step through convolution and pooling in order to extract semantic and spatial information, as in FCN, SegNet, U-Net, and DeepLab. Spatial information is inevitably lost as the feature map shrinks, so many models improve on this point: DeepLabV3 and PSPNet extract spatial information with pyramid pooling and dilated convolution; BiSeNet extracts spatial features by adding an extra shallow path; DenseASPP minimizes the loss of feature space with a densely connected structure; and PAN adds attention modules at the tail and middle of the backbone network to strengthen spatial feature extraction. However, over-emphasizing spatial information prevents very accurate semantic information from being obtained, creating a dilemma. The present invention designs the network so that the two complex tasks of semantic information extraction and spatial information extraction proceed simultaneously: with only a small increase in network parameters, spatial and semantic features are extracted at the same time through network-layer multiplexing and the interception and fusion of features at each layer, without additional loss.
As a specific embodiment, please refer to fig. 2. The feature coding module includes a first convolution module, a second convolution module, first to fourth bottleneck paths, and a first splicing operation module arranged in sequence. The first convolution module comprises a convolutional layer (Conv) and batch normalization (BN) arranged in sequence; the second convolution module comprises a convolutional layer (Conv), batch normalization (BN), and a ReLU activation function arranged in sequence. The first to fourth bottleneck paths are arranged in parallel: from the first to the fourth path the number of Bottleneck layers decreases, the feature maps output by the second to fourth paths shrink in size relative to that of the first path, and the number of channels of the feature map finally output by each path increases with the number of layers. The semantic information features and spatial information features extracted by the four bottleneck paths are spliced by the first splicing (concat) operation module. This design departs from the traditional serial convolution mode and extracts semantic and spatial information features simultaneously in parallel: the Bottleneck layers are arranged as four parallel paths, spatial information features are retained because the feature-map size on each path does not change, and multi-scale feature maps are combined because the feature-map sizes of the paths differ. Since the feature-map size decreases from path to path, semantic information features are extracted at the top layer of each path.
As a preferred embodiment, please refer to fig. 2. The convolution kernel size of the convolutional layer is 3 × 3 with a stride of 2, so the first and second convolution modules shrink the feature map of the input image and reduce the amount of computation.
As a preferred embodiment, please refer to fig. 2. The numbers of bottleneck layers in the first to fourth bottleneck paths are 4, 3, 2, and 1 respectively, the feature maps output by the second to fourth paths are 1/2, 1/4, and 1/8 the size of the first path's, and the numbers of output feature-map channels of the first to fourth paths are 128, 256, 512, and 1024 respectively, so that semantic information features and spatial information features are better extracted simultaneously.
As a specific embodiment, please refer to fig. 3. Each bottleneck layer includes three convolution units, an addition unit (Add), and a ReLU activation function unit arranged in sequence; each convolution unit includes a convolution kernel (Conv2D), batch normalization (BN), and a ReLU activation function arranged in sequence. The addition unit is also skip-connected to the feature map input to the convolution kernel of the first convolution unit; adding this skip connection and ReLU activation to the convolutional layers lets the convolutional neural network select its own path through network learning, further improving accuracy.
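As an illustration, a minimal PyTorch sketch of one bottleneck layer following fig. 3 is given below; the 1-3-1 kernel pattern and the constant channel width are assumptions, since the text fixes only the Conv-BN-ReLU ordering, the addition unit, and the final ReLU.

    import torch.nn as nn

    class Bottleneck(nn.Module):
        """Fig. 3 bottleneck: three Conv-BN-ReLU units, Add with the input, then ReLU."""
        def __init__(self, channels):
            super().__init__()
            def unit(kernel):
                return nn.Sequential(
                    nn.Conv2d(channels, channels, kernel, padding=kernel // 2, bias=False),
                    nn.BatchNorm2d(channels),
                    nn.ReLU(inplace=True),
                )
            self.body = nn.Sequential(unit(1), unit(3), unit(1))
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            # The addition unit is skip-connected to the feature map that entered
            # the first convolution unit; the sum then passes through the final ReLU.
            return self.relu(self.body(x) + x)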
Specifically, for the semantic information features, the invention designs a dedicated Semantic Information Extraction Attention Module (SIEAM) for this task. As a specific embodiment, please refer to fig. 1 and 4. The SIEAM includes a first channel attention module, a second channel attention module, a global pooling module, a multiplication operation module, and a second splicing operation module, with the first and second channel attention modules arranged in parallel. Each channel attention module comprises a global average pooling that captures the context semantic feature information in the input feature map, a convolution (Conv2D) that computes the semantic information weights, batch normalization (BN) and a Sigmoid activation function that refine the semantic information extraction after the convolution, and a multiplication (Mul) operation that multiplies the refined semantic information with the input feature map. The multiplication (Mul) operation module multiplies the feature map output by the second channel attention module with the output feature map processed by the global pooling module, and the second splicing (concat) operation module splices the feature map output by the first channel attention module with the output feature map of the multiplication operation module; multiplying the feature map in this way makes it act as a weight on the input feature map, achieving the task of refining the semantic information. The input feature maps of the two channel attention modules are obtained by connecting semantic information features extracted by the feature coding module. Specifically, as shown in fig. 2, the leftmost Bottleneck layer and the second-from-left upper Bottleneck layer are rich in semantic information features, so the two channel attention modules of the SIEAM are connected to these two Bottleneck layers in one-to-one correspondence: the leftmost Bottleneck layer feeds the second channel attention module, and the second-from-left upper Bottleneck layer feeds the first channel attention module. The semantic information features extracted by the two Bottleneck layers thus serve as the input feature maps of the two channel attention modules; after refinement by the semantic information extraction attention module, they are sent to the feature fusion pooling attention module for integration. In this way the SIEAM integrates a large amount of global context semantic information features at only a small additional computational cost.
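One channel attention branch of the SIEAM could be sketched as follows; the 1 × 1 convolution kernel is an assumption, since the text specifies only "convolution" (Conv2D):

    import torch.nn as nn

    class ChannelAttention(nn.Module):
        """Fig. 4 branch: global average pooling -> Conv2D -> BN -> Sigmoid -> Mul."""
        def __init__(self, channels):
            super().__init__()
            self.attn = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),           # capture global context per channel
                nn.Conv2d(channels, channels, 1),  # compute semantic information weights
                nn.BatchNorm2d(channels),
                nn.Sigmoid(),                      # refine the weights into (0, 1)
            )

        def forward(self, x):
            return x * self.attn(x)  # multiply the refined weights onto the input feature map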
Specifically, the design background of the feature fusion pooling attention module is as follows. Although the feature coding module fully extracts the spatial information of the image features, and the semantic information extraction attention module extracts more detailed semantic information, the spatial and semantic information are not yet matched; a module is needed to integrate the two rather than fusing them crudely. The invention therefore proposes a Feature Fusion Pooling Attention Module (FFPAM): semantic information features and spatial information features are fused by the FFPAM and applied to the feature map as attention information, so that context semantic information and spatial information are fully fused and segmentation precision is improved.
As a specific embodiment, please refer to fig. 5. The feature fusion pooling attention module includes a third convolution module (a convolution Conv2D-BN-ReLU activation function) that extracts the mixed information features of the fused semantic and spatial information features while converting the channels of the information, an average pooling path, a maximum pooling path, and a two-path pooling multiplication operation module. The average and maximum pooling paths are arranged in parallel and each processes the features extracted by the third convolution module, and the two-path pooling multiplication module multiplies the two processed features to form an attention feature map. Fusing the spatial and semantic information features through the two parallel paths enlarges the model's receptive field and strengthens its feature extraction ability; the attention feature map formed by multiplying the two paths carries the characteristics of both the average and the maximum pooling path. The attention feature is multiplied with the input feature map and superimposed on the input features as a weight, and finally a ResNet-style skip connection is used, which reduces the negative influence of the attention module on the input feature map and yields the final output feature map. By multiplying the two routes, this module successfully combines context semantic information with image spatial information, bringing higher precision. To verify the effectiveness of average and maximum pooling, five configurations were tested: single-path maximum pooling, single-path average pooling, two-path pooling addition, two-path pooling concatenation, and two-path pooling multiplication; experiments confirm that two-path multiplication indeed gives the best accuracy, and this design improves the Dice similarity by 2.71%.
As a preferred embodiment, please refer to fig. 5. The average pooling path processes features with two serially connected average pooling modules (average pooling AvgPool-convolution Conv2D-ReLU activation function) as the first path of feature extraction; the output of the ReLU activation function in the second average pooling module is multiplied with the path's input feature map, and the resulting feature map is added to the path's input feature map to give the path's final output. The maximum pooling path likewise processes features with two serially connected maximum pooling modules (maximum pooling MaxPool-convolution Conv2D-ReLU activation function) as the second path of feature extraction; the output of the ReLU in the second maximum pooling module is multiplied with the path's input feature map, and the product is added to the path's input to give the path's final output. Finally, the outputs of the two paths are multiplied with the features extracted by the third convolution module (i.e., the output of its ReLU activation function), the product is added (Add) to those same features, and the result passes through a ReLU activation function to form the attention feature map.
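A minimal PyTorch sketch of the two pooling paths and their fusion is given below; the size-preserving pooling parameters (3 × 3 kernel, stride 1, padding 1) and the 1 × 1 convolutions inside the paths are assumptions, since the patent does not state them:

    import torch.nn as nn

    class PoolPath(nn.Module):
        """One FFPAM path (fig. 5): two serial pool-conv-ReLU blocks whose
        output re-weights, then is added back to, the path input."""
        def __init__(self, channels, pool_cls):
            super().__init__()
            def block():
                return nn.Sequential(
                    pool_cls(kernel_size=3, stride=1, padding=1),  # size-preserving (assumed)
                    nn.Conv2d(channels, channels, 1),
                    nn.ReLU(inplace=True),
                )
            self.blocks = nn.Sequential(block(), block())

        def forward(self, x):
            return x * self.blocks(x) + x  # multiply as a weight, then residual add

    class FFPAM(nn.Module):
        """Conv-BN-ReLU mixing, parallel average/max pooling paths, two-path
        multiplication, re-weighting of the mixed features, skip add, ReLU."""
        def __init__(self, in_channels, channels):
            super().__init__()
            self.mix = nn.Sequential(  # third convolution module
                nn.Conv2d(in_channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            self.avg_path = PoolPath(channels, nn.AvgPool2d)
            self.max_path = PoolPath(channels, nn.MaxPool2d)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            mixed = self.mix(x)
            attn = self.avg_path(mixed) * self.max_path(mixed)  # two-path multiplication
            return self.relu(mixed * attn + mixed)              # weight, skip add, ReLU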
As a specific example, please refer to fig. 6. The feature map decoding module includes a first upsampling module (Upsample), a fourth convolution module (convolution Conv-BN-ReLU activation function), a second upsampling module (Upsample), a fifth convolution module (convolution Conv-BN-ReLU), and a sixth convolution module (convolution Conv-BN-ReLU) arranged in sequence. The feature maps output by the first upsampling module and the fourth convolution module have the same size (e.g., 96, 128), and the feature maps output by the second upsampling module, the fifth convolution module, and the sixth convolution module (e.g., 192, 256) are all the same size as the input image. The three convolution modules refine the upsampled information, further refining the segmentation result and ultimately improving precision.
As a specific embodiment, the sampling coefficients of the first and second upsampling modules are both 2. Specifically, existing bilinear interpolation can be used, i.e., 2× upsampling by bilinear interpolation, with a convolution module refining the spatial information loss that bilinear upsampling causes, thereby reducing the spatial information lost to sampling.
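For illustration, the feature map decoding module might be sketched as below; the channel widths and the final 1-channel prediction head are assumptions, as the text fixes only the Upsample/Conv-BN-ReLU ordering and the 2× bilinear upsampling:

    import torch.nn as nn

    def conv_bn_relu(in_c, out_c):
        # Convolution module used to refine the upsampled features.
        return nn.Sequential(
            nn.Conv2d(in_c, out_c, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_c),
            nn.ReLU(inplace=True),
        )

    class Decoder(nn.Module):
        """Fig. 6 decoding module: two bilinear 2x upsamplings, each followed
        by convolution to repair interpolation losses, then a prediction head."""
        def __init__(self, in_c=1024, mid_c=256, num_classes=1):
            super().__init__()
            self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
            self.refine1 = conv_bn_relu(in_c, mid_c)      # fourth convolution module
            self.refine2 = conv_bn_relu(mid_c, mid_c)     # fifth convolution module
            self.head = nn.Conv2d(mid_c, num_classes, 1)  # sixth module's role, assumed

        def forward(self, x):
            x = self.refine1(self.up(x))  # first upsampling + refinement
            x = self.refine2(self.up(x))  # second upsampling + refinement
            return self.head(x)           # per-pixel prediction at input-image size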
When designing the CT image (e.g., pancreas image) segmentation system model provided by the invention, a dataset must first be prepared and preprocessed into the input the model requires, which improves the model's robustness. Specifically, the data preprocessing includes processing each slice, clipping all pixel values greater than 240 to 240 and all pixel values less than -100 to -100, with the calculation formulas as follows:
imagePixel[Pixel < low_range] = low_range
imagePixel[Pixel > high_range] = high_range
wherein imagePixel is the image pixel array, low_range is -100, and high_range is 240. Each slice is then normalized so that its pixel intensities lie within (-1, 1).
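A short NumPy sketch of this preprocessing is given below; the clipping bounds follow the formulas above, while the exact linear rescaling into (-1, 1) is an assumption, since the patent only states the target range:

    import numpy as np

    def preprocess_slice(slice_hu: np.ndarray) -> np.ndarray:
        low_range, high_range = -100.0, 240.0
        # imagePixel[Pixel < low_range] = low_range; imagePixel[Pixel > high_range] = high_range
        clipped = np.clip(slice_hu, low_range, high_range)
        # Assumed linear rescale: low_range maps to -1, high_range maps to +1.
        return 2.0 * (clipped - low_range) / (high_range - low_range) - 1.0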
The dataset preparation includes: the NIH pancreas segmentation dataset is adopted and divided into three parts, a training set, a validation set, and a test set, using 4-fold cross-validation. The training and validation sets total 62 samples, and the test set totals 20 samples. During training, an Adam optimizer is used with an initial learning rate of 10⁻⁵, decayed by a factor of 0.2 every 10 epochs (an epoch being one pass over all samples in the training set); training runs for 100 epochs in total in the experiments. The results show that training on medical images from scratch achieves better performance and shorter training time than fine-tuning a model pretrained on natural images.
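This schedule maps onto a standard PyTorch training loop; the sketch below assumes model, train_loader, and loss_fn are defined elsewhere, and reads "decayed by a factor of 0.2" as a multiplicative step decay:

    import torch

    def train(model, train_loader, loss_fn, device="cuda"):
        model.to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # initial rate 10^-5
        # Multiply the learning rate by 0.2 every 10 epochs.
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)
        for epoch in range(100):  # 100 epochs in total
            for images, masks in train_loader:
                images, masks = images.to(device), masks.to(device)
                optimizer.zero_grad()
                loss = loss_fn(model(images), masks)
                loss.backward()
                optimizer.step()
            scheduler.step()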
Compared with the prior art, the CT image segmentation system based on the attention convolution neural network has the following advantages:
First, for the feature coding module, an experiment was run against FCN as the baseline backbone. With strategies such as learning-rate decay, parameter initialization, input regularization, and overfitting prevention, and with the training process repeated for 100 epochs, object segmentation in the images produced by the proposed system attains a high Dice value; because semantic information and spatial information are considered simultaneously, convergence is very fast and the loss value is lower than the FCN baseline's, which is likewise reflected in a Dice value higher than FCN's.
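The Dice value used throughout the experiments is the standard Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|); a small sketch of how it is typically computed on binary masks follows (the epsilon guard against empty masks is an implementation convention, not from the patent):

    import torch

    def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
        pred = pred.float().flatten()
        target = target.float().flatten()
        intersection = (pred * target).sum()
        return ((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)).item()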
Second, in extracting image information the interleaved parallel network used in the invention learns more features than FCN. As shown in Table 1 below, although its parameter count is far smaller than that of an FCN built on VGG16, the proposed network scores far higher than FCN in precision, recall, and Dice, demonstrating the effectiveness of the feature coding module.
TABLE 1

Model | Average Dice % | Maximum Dice % | Minimum Dice % | Precision | Recall | Parameters
FCN | 69.02±6.3 | 76.14 | 49.48 | 0.7092 | 0.6754 | 134.3M
FEM | 78.93±5.6 | 86.54 | 65.15 | 0.8339 | 0.7543 | 16.15M
Third, for the feature fusion pooling attention module, as shown in Table 2 below, single-path configurations were compared with multiplying the average pooling path by the maximum pooling path; the multiplication greatly improves the indexes in all respects, each exceeding the previous configuration, and multiplying the results of the two paths successfully combines context semantic information with image spatial information, bringing high precision.
TABLE 2
[Table 2 appears only as images (BDA0002415879340000131/141) in the original publication; its entries are not recoverable from the text.]
Fourth, as shown in Table 3 below, the framework used in the invention achieves a large increase in Dice value while its parameter count is far smaller than those of the current typical networks FCN and U-Net.
TABLE 3

Model | Backbone | Dice % | Parameters
FCN | VGG16 | 80.3 | 134.3M
U-Net | VGG16 | 79.7 | 23.3M
BiSeNet | XceptionV1 | 82.8 | 44.8M
Framework used in this system | FEM | 86.6 | 18.9M
Fifth, as shown in Table 4 below, the invention is compared with current typical networks to observe each model's adaptability to the pancreas CT dataset. Of the 82 samples, most models use a 62/20 training/test split, with #Folds being the fold number for cross-validation; it can be seen that the proposed system model scores higher than these current typical models.
TABLE 4
[Table 4 appears only as an image (BDA0002415879340000142) in the original publication; its entries are not recoverable from the text.]
Sixth, the module fusion experiments use the same 20-sample test set as the previous experiments, and the precision, recall, and Dice values of the 20 samples are tested respectively. As shown in Table 5 below, FEM + FDM + SIEAM + FFPAM is far higher than the others in recall and Dice value, with only its precision slightly lower than that of FEM + FDM + SIEAM, which also verifies the effectiveness of stacking all the modules.
TABLE 5

Model | Average Dice % | Maximum Dice % | Minimum Dice % | Precision | Recall | Parameters
FCN (baseline) | 69.02±6.3 | 76.14 | 49.48 | 0.7092 | 0.6754 | 134.3M
FEM+FDM | 82.81±4.2 | 88.54 | 74.07 | 0.8477 | 0.8115 | 16.15M
FEM+FDM+SIEAM | 83.91±4.4 | 89.70 | 73.89 | 0.8726 | 0.8106 | 18.96M
FEM+FDM+SIEAM+FFPAM | 86.62±3.6 | 91.31 | 78.91 | 0.8607 | 0.8737 | 19.8M
Referring to fig. 8, the 1st row shows the images before segmentation, the 2nd row the ground-truth labels (GT), the 3rd row the FCN segmentation test results, the 4th row the U-Net results, the 5th row the FEM + FDM results, and the 6th row the final algorithm, FEM + FDM + SIEAM + FFPAM. As the figure shows, because FCN directly upsamples the small segmented feature map with transposed convolution, its results lack edge smoothness and present mosaic-like segmentations. U-Net, whose upsampling is gentler, smooths FCN's hard edges well, but it generates many extra small fragments in the detail segmentation; such fragments appear in the 2nd, 3rd, and 4th prediction maps of the 4th row. In the 5th row, FEM + FDM effectively preserves the spatial and semantic information of the image, so the fragments that U-Net produced are largely eliminated and the whole picture becomes clean; shortcomings remain in the details, however: the folds of the pancreas are not segmented effectively in some columns, the pancreas is missed entirely in one, and the pancreatic area is over-segmented in another. On this basis, the invention adds the two attention modules, which focus on resolving these detail defects. In the 6th row, the final model effectively resolves the fragmentation around the segmented target, is more complete in the detail regions than FEM + FDM, and is closer overall to GT.
Finally, the above embodiments are intended only to illustrate, not to limit, the technical solutions of the invention. Although the invention has been described in detail with reference to preferred embodiments, those skilled in the art will understand that modifications or equivalent substitutions may be made without departing from the spirit and scope of the technical solutions of the invention, all of which should be covered by the claims of the invention.

Claims (9)

1. The CT image segmentation system based on the attention convolutional neural network is characterized by comprising a feature coding module, a semantic information extraction attention module, a feature fusion pooling attention module, and a feature map decoding module; the feature coding module gradually reduces the feature-map size of an input image by using a parallel convolutional neural network, and extracts semantic information features and spatial information features of the image simultaneously through network-layer multiplexing and the interception and fusion of features at each layer; the semantic information extraction attention module generates attention features by using pooling, and further refines the semantic information features extracted by the feature coding module; the feature fusion pooling attention module uses maximum pooling and average pooling connected in parallel, and combines the semantic information features refined by the semantic information extraction attention module with the semantic and spatial information features spliced by the feature coding module to form an attention feature map; the feature map decoding module gradually and finely restores the attention feature map fused by the feature fusion pooling attention module to the size of the input image by using a convolution module and an upsampling module; the feature coding module comprises a first convolution module, a second convolution module, first to fourth bottleneck paths, and a first splicing operation module which are sequentially arranged, the first convolution module comprising a convolutional layer and batch normalization which are sequentially arranged, the second convolution module comprising a convolutional layer, batch normalization, and a ReLU activation function which are sequentially arranged; the first to fourth bottleneck paths are arranged in parallel, the number of bottleneck layers in each path decreases from the first to the fourth bottleneck path, the feature maps output by the second to fourth bottleneck paths decrease in size relative to that output by the first bottleneck path, and the number of channels of the feature map finally output by each bottleneck path increases with the number of layers; the first splicing operation module splices the semantic information features and spatial information features extracted by the four bottleneck paths.
2. The attention convolution neural network-based CT image segmentation system of claim 1, wherein the convolutional layer has a convolution kernel size of 3 × 3 and a stride of 2.
3. The attention convolution neural network-based CT image segmentation system according to claim 1, wherein the number of bottleneck layers in the first to fourth bottleneck paths is 4, 3, 2, 1, respectively, the sizes of feature maps output by the second to fourth bottleneck paths compared with the first bottleneck path are 1/2, 1/4, 1/8, respectively, and the number of channels of output feature maps in the first to fourth bottleneck paths is 128, 256, 512 and 1024, respectively.
4. The attention convolution neural network-based CT image segmentation system according to claim 1, wherein each bottleneck layer comprises three convolution units, an addition unit, and a ReLU activation function unit which are sequentially arranged, each convolution unit comprises a convolution kernel, batch normalization, and a ReLU activation function which are sequentially arranged, and the addition unit is also skip-connected to the feature map input to the convolution kernel of the first convolution unit.
5. The CT image segmentation system based on attention convolution neural network of claim 1, wherein the semantic information extraction attention module comprises a first channel attention module, a second channel attention module, a global pooling module, a multiplication operation module, and a second splicing operation module; the first and second channel attention modules are arranged in parallel, and each channel attention module comprises a global average pooling for capturing context semantic feature information in the input feature map, a convolution for calculating semantic information weights, batch normalization and a Sigmoid activation function for refining the semantic information extraction, and a multiplication operation for multiplying the refined semantic information with the input feature map; the multiplication operation module is used for multiplying the feature map output by the second channel attention module with the output feature map processed by the global pooling module; the second splicing operation module is used for splicing the feature map output by the first channel attention module with the output feature map of the multiplication operation module; and the input feature maps of the two channel attention modules are obtained by connecting semantic information features extracted by the feature coding module.
6. The CT image segmentation system based on the attention convolutional neural network of claim 1, wherein the feature fusion pooling attention module comprises a third convolution module, an average pooling path, a maximum pooling path and a two-way pooling multiplication operation module, the third convolution module is used for extracting mixed information features of the fused semantic information features and spatial information features and simultaneously converting channels of information, the average pooling path and the maximum pooling path are arranged in parallel and are respectively used for processing the features extracted by the third convolution module, and the two-way pooling multiplication operation module is used for multiplying the two paths of features processed by the average pooling path and the maximum pooling path to form an attention feature map.
7. The attention convolution neural network-based CT image segmentation system of claim 6, wherein the average pooling path uses two serially connected average pooling modules to process features as the first path of feature extraction, and the maximum pooling path uses two serially connected maximum pooling modules to process features as the second path of feature extraction.
8. The CT image segmentation system based on the attention convolution neural network of claim 1, wherein the feature map decoding module comprises a first up-sampling module, a fourth convolution module, a second up-sampling module, a fifth convolution module, and a sixth convolution module which are sequentially arranged, the feature maps output by the first up-sampling module and the fourth convolution module are the same in size, and the feature maps output by the second up-sampling module, the fifth convolution module, and the sixth convolution module are the same in size as the input image.
9. The attention convolution neural network-based CT image segmentation system of claim 8, wherein a sampling coefficient of the first and second upsampling modules is 2.
CN202010190946.4A 2020-03-18 2020-03-18 CT image segmentation system based on attention convolution neural network Expired - Fee Related CN111325751B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010190946.4A CN111325751B (en) 2020-03-18 2020-03-18 CT image segmentation system based on attention convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010190946.4A CN111325751B (en) 2020-03-18 2020-03-18 CT image segmentation system based on attention convolution neural network

Publications (2)

Publication Number Publication Date
CN111325751A CN111325751A (en) 2020-06-23
CN111325751B true CN111325751B (en) 2022-05-27

Family

ID=71171544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010190946.4A Expired - Fee Related CN111325751B (en) 2020-03-18 2020-03-18 CT image segmentation system based on attention convolution neural network

Country Status (1)

Country Link
CN (1) CN111325751B (en)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798428B (en) * 2020-07-03 2023-05-30 南京信息工程大学 Automatic segmentation method for multiple tissues of skin pathology image
CN111914947B (en) * 2020-08-20 2024-04-16 华侨大学 Image instance segmentation method, device, equipment and storage medium based on feature fusion
CN112084911B (en) * 2020-08-28 2023-03-07 安徽清新互联信息科技有限公司 Human face feature point positioning method and system based on global attention
CN112085741B (en) * 2020-09-04 2024-03-26 厦门大学 Gastric cancer pathological section segmentation algorithm based on deep learning
CN112446891B (en) * 2020-10-23 2024-04-02 浙江工业大学 Medical image segmentation method based on U-Net network brain glioma
CN112287940A (en) * 2020-10-30 2021-01-29 西安工程大学 Semantic segmentation method of attention mechanism based on deep learning
CN112365480B (en) * 2020-11-13 2021-07-16 哈尔滨市科佳通用机电股份有限公司 Brake pad loss fault identification method for brake clamp device
CN112509052B (en) * 2020-12-22 2024-04-23 苏州超云生命智能产业研究院有限公司 Method, device, computer equipment and storage medium for detecting macula fovea
CN112598650A (en) * 2020-12-24 2021-04-02 苏州大学 Combined segmentation method for optic cup optic disk in fundus medical image
CN112580654A (en) * 2020-12-25 2021-03-30 西南电子技术研究所(中国电子科技集团公司第十研究所) Semantic segmentation method for ground objects of remote sensing image
CN112767502B (en) * 2021-01-08 2023-04-07 广东中科天机医疗装备有限公司 Image processing method and device based on medical image model
CN112927255B (en) * 2021-02-22 2022-06-21 武汉科技大学 Three-dimensional liver image semantic segmentation method based on context attention strategy
CN113065412A (en) * 2021-03-12 2021-07-02 武汉大学 Improved Deeplabv3+ based aerial image electromagnetic medium semantic recognition method and device
CN113158802A (en) * 2021-03-22 2021-07-23 安徽理工大学 Smart scene segmentation technique
CN113112465B (en) * 2021-03-31 2022-10-18 上海深至信息科技有限公司 System and method for generating carotid intima-media segmentation model
CN113129321A (en) * 2021-04-20 2021-07-16 重庆邮电大学 Turbine blade CT image segmentation method based on full convolution neural network
CN113033572B (en) * 2021-04-23 2024-04-05 上海海事大学 Obstacle segmentation network based on USV and generation method thereof
CN113744279B (en) * 2021-06-09 2023-11-14 东北大学 Image segmentation method based on FAF-Net network
CN113298825B (en) * 2021-06-09 2023-11-14 东北大学 Image segmentation method based on MSF-Net network
CN113298174B (en) * 2021-06-10 2022-04-29 东南大学 Semantic segmentation model improvement method based on progressive feature fusion
CN113436094B (en) * 2021-06-24 2022-05-31 湖南大学 Gray level image automatic coloring method based on multi-view attention mechanism
CN113378791B (en) * 2021-07-09 2022-08-05 合肥工业大学 Cervical cell classification method based on double-attention mechanism and multi-scale feature fusion
CN113689434B (en) * 2021-07-14 2022-05-27 淮阴工学院 Image semantic segmentation method based on strip pooling
CN113361537B (en) * 2021-07-23 2022-05-10 人民网股份有限公司 Image semantic segmentation method and device based on channel attention
CN113689326B (en) * 2021-08-06 2023-08-04 西南科技大学 Three-dimensional positioning method based on two-dimensional image segmentation guidance
CN113610164B (en) * 2021-08-10 2023-12-22 北京邮电大学 Fine granularity image recognition method and system based on attention balance
CN114049315B (en) * 2021-10-29 2023-04-18 北京长木谷医疗科技有限公司 Joint recognition method, electronic device, storage medium, and computer program product
CN114038037B (en) * 2021-11-09 2024-02-13 合肥工业大学 Expression label correction and identification method based on separable residual error attention network
CN114638256A (en) * 2022-02-22 2022-06-17 合肥华威自动化有限公司 Transformer fault detection method and system based on sound wave signals and attention network
CN116229065B (en) * 2023-02-14 2023-12-01 湖南大学 Multi-branch fusion-based robotic surgical instrument segmentation method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9965705B2 (en) * 2015-11-03 2018-05-08 Baidu Usa Llc Systems and methods for attention-based configurable convolutional neural networks (ABC-CNN) for visual question answering
US10558750B2 (en) * 2016-11-18 2020-02-11 Salesforce.Com, Inc. Spatial attention model for image captioning
CN107506774A (en) * 2017-10-09 2017-12-22 深圳市唯特视科技有限公司 A kind of segmentation layered perception neural networks method based on local attention mask
US10878570B2 (en) * 2018-07-17 2020-12-29 International Business Machines Corporation Knockout autoencoder for detecting anomalies in biomedical images
US10922816B2 (en) * 2018-08-27 2021-02-16 Siemens Healthcare Gmbh Medical image segmentation from raw data using a deep attention neural network
US10482603B1 (en) * 2019-06-25 2019-11-19 Artificial Intelligence, Ltd. Medical image segmentation using an integrated edge guidance module and object segmentation network
CN110490813B (en) * 2019-07-05 2021-12-17 特斯联(北京)科技有限公司 Feature map enhancement method, device, equipment and medium for convolutional neural network
CN110211127B (en) * 2019-08-01 2019-11-26 成都考拉悠然科技有限公司 Image partition method based on bicoherence network
CN110490891A (en) * 2019-08-23 2019-11-22 杭州依图医疗技术有限公司 The method, equipment and computer readable storage medium of perpetual object in segmented image
CN110532955B (en) * 2019-08-30 2022-03-08 中国科学院宁波材料技术与工程研究所 Example segmentation method and device based on feature attention and sub-upsampling

Also Published As

Publication number Publication date
CN111325751A (en) 2020-06-23

Similar Documents

Publication Publication Date Title
CN111325751B (en) CT image segmentation system based on attention convolution neural network
Zhou et al. GMNet: Graded-feature multilabel-learning network for RGB-thermal urban scene semantic segmentation
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
CN111612008A (en) Image segmentation method based on convolution network
CN113642390B (en) Street view image semantic segmentation method based on local attention network
CN110223304B (en) Image segmentation method and device based on multipath aggregation and computer-readable storage medium
CN111340814A (en) Multi-mode adaptive convolution-based RGB-D image semantic segmentation method
CN113807355A (en) Image semantic segmentation method based on coding and decoding structure
CN114943963A (en) Remote sensing image cloud and cloud shadow segmentation method based on double-branch fusion network
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
CN110866938B (en) Full-automatic video moving object segmentation method
CN113240683B (en) Attention mechanism-based lightweight semantic segmentation model construction method
CN114119975A (en) Language-guided cross-modal instance segmentation method
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN115620010A (en) Semantic segmentation method for RGB-T bimodal feature fusion
CN113076957A (en) RGB-D image saliency target detection method based on cross-modal feature fusion
CN116129289A (en) Attention edge interaction optical remote sensing image saliency target detection method
CN114219824A (en) Visible light-infrared target tracking method and system based on deep network
CN116486080A (en) Lightweight image semantic segmentation method based on deep learning
CN110599495B (en) Image segmentation method based on semantic information mining
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
Zhang et al. CSNet: a ConvNeXt-based Siamese network for RGB-D salient object detection
CN113870286A (en) Foreground segmentation method based on multi-level feature and mask fusion
CN115995002B (en) Network construction method and urban scene real-time semantic segmentation method
CN117152438A (en) Lightweight street view image semantic segmentation method based on improved deep LabV3+ network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220527