CN114565770B - Image segmentation method and system based on edge auxiliary calculation and mask attention - Google Patents


Info

Publication number
CN114565770B
CN114565770B (application CN202210288277.3A)
Authority
CN
China
Prior art keywords
feature
edge
prediction
image
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210288277.3A
Other languages
Chinese (zh)
Other versions
CN114565770A (en)
Inventor
王勇
钟立科
黄伟红
胡建中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Xiangya Hospital of Central South University
Original Assignee
Central South University
Xiangya Hospital of Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University, Xiangya Hospital of Central South University filed Critical Central South University
Priority to CN202210288277.3A
Publication of CN114565770A
Application granted
Publication of CN114565770B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses an image segmentation method and system based on edge-assisted computation and mask attention. A feature encoder is built from multi-stage cascaded residual modules, and an edge feature map is obtained by fusing three shallow feature maps; an edge prediction image is obtained after feature dimensionality reduction, which strengthens the representation capability of the first three encoder stages. The output of the last-stage residual module passes in turn through several feature decoders and mask attention modules; the mask attention modules sharpen each decoder stage's focus on key local regions, and each stage outputs a predicted segmentation image at the corresponding scale. The output feature map of the feature decoder is fused with the edge feature map from the first three residual modules, and the final segmentation image is predicted after feature dimensionality reduction. Compared with existing image segmentation methods, the invention provides more accurate segmentation-edge prediction, is suitable for image segmentation in a variety of complex scenes, and offers stronger generalization and better segmentation results.

Description

Image segmentation method and system based on edge auxiliary calculation and mask attention
Technical Field
The invention belongs to the field of computer vision, and relates to an image segmentation method and system based on edge auxiliary calculation and mask attention.
Background
Image segmentation based on deep learning is an important research direction in the field of computer vision and has been widely applied. Current image segmentation methods use a deep learning model to classify each pixel in an image and thereby obtain each pixel's semantic category. However, existing methods still have the following problems: the model segments target edges in the image inaccurately; multi-scale context information is not fully exploited; excessive information is lost during prediction; and the objective loss function used for model optimization is too simple to model the task effectively. These shortcomings ultimately degrade the model's segmentation results.
Definitions of terms:
BatchNorm layer: keeps the inputs of each layer of a deep neural network on the same distribution during training. It first computes the overall mean and variance of the input data, then normalizes the data, and finally scales and shifts the result using the configured scale and shift factors.
ReLu layer: implements the linear rectification (ReLU) activation function. For each element x of the input vector, the ReLu layer outputs the larger of 0 and x.
MaxPool layer: partitions the feature map into non-overlapping blocks of equal size and keeps only the maximum value in each block, discarding the other values. The MaxPool layers in this patent use a 2 × 2 window with stride 2, so the output feature map has half the width and height of the input.
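The 2 × 2, stride-2 MaxPool just described can be sketched as follows; this is an illustrative NumPy rendering, not code from the patent:

```python
import numpy as np

def maxpool2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2: keep the largest value in each
    non-overlapping 2x2 block, halving both width and height."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 3, 2],
                 [2, 6, 1, 1]], dtype=float)
pooled = maxpool2x2(fmap)  # shape (2, 2)
```

Here `pooled` is `[[4, 5], [6, 3]]`: each entry is the maximum of one 2 × 2 block, and the 4 × 4 input shrinks to 2 × 2, as stated above.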
Conv Block: for simplicity, the description below omits the BatchNorm and ReLu layers. The input follows two branches: (1) a 1 × 1 convolution layer, a 3 × 3 convolution layer, and a 1 × 1 convolution layer in sequence; (2) a 1 × 1 convolution layer that changes the number of feature-map channels. The feature maps output by the two branches are then fused by an addition operation to produce the output.
Identity Block: for simplicity, the description below omits the BatchNorm and ReLu layers. The input follows two branches: (1) a 1 × 1 convolution layer, a 3 × 3 convolution layer, and a 1 × 1 convolution layer in sequence; (2) a skip connection, whose output equals the input. The feature maps output by the two branches are then fused by an addition operation to produce the output.
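As a rough sketch of the two blocks just defined (a hypothetical PyTorch rendering; the channel counts are illustrative assumptions, and the BatchNorm and ReLu layers are omitted exactly as in the text above):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    # Residual block whose shortcut is a 1x1 convolution, so the input and
    # output channel counts may differ (used to change the network dimension).
    def __init__(self, c_in, c_mid, c_out):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 1),
            nn.Conv2d(c_mid, c_mid, 3, padding=1),
            nn.Conv2d(c_mid, c_out, 1),
        )
        self.shortcut = nn.Conv2d(c_in, c_out, 1)  # changes channel count

    def forward(self, x):
        return self.branch(x) + self.shortcut(x)  # fuse by addition

class IdentityBlock(nn.Module):
    # Residual block with a skip connection; input and output dimensions are
    # equal, so several such blocks can be chained to deepen the network.
    def __init__(self, c, c_mid):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(c, c_mid, 1),
            nn.Conv2d(c_mid, c_mid, 3, padding=1),
            nn.Conv2d(c_mid, c, 1),
        )

    def forward(self, x):
        return self.branch(x) + x  # skip connection: branch output + input

x = torch.randn(1, 64, 32, 32)
y = ConvBlock(64, 32, 128)(x)   # channel count changes: 64 -> 128
z = IdentityBlock(128, 32)(y)   # channel count preserved: 128 -> 128
```

A Conv Block followed by chained Identity Blocks is the pattern each encoder stage repeats, per the description below.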
Disclosure of Invention
In order to solve the technical problems, the invention discloses an image segmentation method and system based on edge auxiliary calculation and mask attention, and the invention improves the segmentation effect and the segmentation accuracy of the image.
The technical scheme adopted by the invention for solving the technical problems is as follows:
an image segmentation system based on boundary-aware attention comprises a feature encoder built from n stages of cascaded residual modules conv, a feature decoder block for each stage of residual module, and a mask attention module for each stage of feature decoder block; the feature map output by the a-th-stage residual module is the input of the (a+1)-th-stage residual module; the feature map output by the a-th-stage residual module is also input to the a-th-stage feature decoder block, and the output feature map of the a-th-stage feature decoder block is input to the mask attention block of the a-th-stage mask attention module; the enhanced feature map output by the a-th-stage mask attention block is input to the (a-1)-th-stage feature decoder;
the outputs of the first three stages of residual modules are respectively reduced in dimension and upsampled to obtain three shallow feature maps, which are fused to obtain a final edge feature map efeature; dimensionality reduction of efeature yields an edge prediction image edge_prediction; the final edge feature map is spliced and fused with the last enhanced feature map last_feature output by the first-stage mask attention module to obtain the final predicted segmentation image prediction_512; n ≥ 3.
In a further improvement, n is 5; the first-stage residual module conv1 comprises a convolution layer with a 7 × 7 kernel, a BatchNorm layer, a ReLu layer, and a MaxPool layer; the remaining residual modules comprise Conv Blocks and Identity Blocks; the Conv Block has different input and output dimensions and is used to change the dimension of the network; the Identity Block has identical input and output dimensions and is used to deepen the network. An input image passes through the five residual modules to obtain feature maps feature_1, feature_2, feature_3, feature_4, and feature_5; feature maps feature_1, feature_2, and feature_3 each undergo channel dimensionality reduction by a convolution layer with a 1 × 1 kernel, feature extraction by a convolution layer with a 3 × 3 kernel, and linear-interpolation upsampling with factors 2, 4, and 8 respectively, yielding three edge feature maps efeature_1, efeature_2, and efeature_3 at the same scale; the three edge feature maps efeature_1, efeature_2, and efeature_3 are fused by splicing to obtain the final edge feature map, and channel dimensionality reduction of this edge feature map by a convolution layer with a 1 × 1 kernel yields the edge prediction image edge_prediction.
In a further improvement, the feature map output by the a-th-stage residual module and the enhanced feature map output by the (a+1)-th-stage mask attention block are input to the a-th-stage feature decoder block; in the a-th-stage feature decoder block, the enhanced feature map undergoes linear-interpolation upsampling with factor 2, is then fused with the feature map by a splicing operation, and passes through two convolution layers with 3 × 3 kernels to give the output feature map;
the output feature map of the a-th-stage feature decoder block is input to the mask attention block of the a-th-stage mask attention module; the output feature map first passes through a convolution layer with a 3 × 3 kernel and a convolution layer with a 1 × 1 kernel to obtain a predicted segmentation image at the corresponding scale; this predicted segmentation image, serving as the mask attention map mask_attention, is multiplied with the output feature map to obtain the attention feature map att_feature, which is directly added to the output feature map to obtain the enhanced feature map; the mask attention modules of the fifth to second stages output predicted segmentation images prediction_x at scales 32 × 32, 64 × 64, 128 × 128, and 256 × 256, respectively, with x = 2, 3, 4, 5.
In a further improvement, the last enhanced feature map last_feature output by the first-stage mask attention module is spliced with the final edge feature map efeature; the spliced result then passes through a convolution layer with a 1 × 1 kernel, and the result of the convolution operation is input to a Sigmoid activation function to obtain the final predicted segmentation image prediction_512;
an aggregation loss function Loss is computed separately for the edge prediction image edge_prediction, each multi-scale predicted segmentation image prediction_x, and the final predicted segmentation image prediction_512:
Loss=BCELoss+DiceLoss+JaccardLoss
BCELoss computes the binary cross-entropy loss in a single-label binary classification setting, where one input sample picture corresponds to one output segmentation picture; for a batch dataset D(p, y) containing N sample pictures, p is the prediction with values in [0, 1], and y is the label with value 0 or 1. BCELoss is computed as:
$$\mathrm{BCELoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$

where $p_i$ denotes the prediction for the i-th sample picture and $y_i$ the label of the i-th sample picture;
$$\mathrm{Dice}(P, Y) = \frac{2\,|P \cap Y|}{|P| + |Y|},\qquad \mathrm{DiceLoss} = 1 - \mathrm{Dice}(P, Y)$$

where Dice(P, Y) denotes the Dice coefficient, P is the prediction with values in [0, 1], and Y is the label with value 0 or 1;
$$\mathrm{Jaccard}(P, Y) = \frac{|P \cap Y|}{|P \cup Y|},\qquad \mathrm{JaccardLoss} = 1 - \mathrm{Jaccard}(P, Y)$$

where Jaccard(P, Y) denotes the Jaccard coefficient;
obtaining a total aggregation Loss function Loss _ sum of an edge prediction image edge _ prediction, a multi-scale prediction segmentation image prediction _ x and a final prediction segmentation image prediction _ 512:
Loss_sum = Loss(32) + Loss(64) + Loss(128) + Loss(256) + Loss(512) + Loss(edge), where Loss(32), Loss(64), Loss(128), and Loss(256) are the aggregation loss functions of the predicted segmentation images output by the fifth- to second-stage mask attention modules; Loss(512) is the aggregation loss function of the final predicted segmentation image prediction_512; and Loss(edge) is the aggregation loss function of the edge prediction image edge_prediction;
the total aggregation loss function Loss_sum is optimized so that Loss_sum is minimized, yielding the optimized image segmentation system.
In a further improvement, the total aggregation loss function Loss_sum is optimized with the Adam gradient descent algorithm.
An initial image is input into the above image segmentation system based on boundary-aware attention to obtain the final edge feature map efeature and the final predicted segmentation image prediction_512.
The invention has the advantages that:
1. Aimed at the inaccurate target-edge segmentation of prior-art models, the invention provides edge-assisted computation. The structure uses a cascade of deep residual modules as the feature encoding path and propagates semantic information layer by layer. By fusing shallow, low-dimensional, high-fine-granularity detail features, it strengthens the representation and feature-extraction capability of the first three encoder stages; an edge loss function assists the optimization of the model parameters and guides image segmentation along the feature decoding path, so the edges of segmented targets become more accurate and distinct.
2. Aimed at the prior-art problems that multi-scale context information is not fully exploited and the optimization objective is too simple, the invention provides a mask attention structure and a multi-scale aggregation loss function. The structure uses double convolution layers and mask attention modules as the feature decoding path, focuses on the positions in feature space that carry important information, and supplements detailed local features layer by layer. Strongly supervised learning is applied to the multi-scale predicted segmentation images, and the scales are fused layer by layer, continuously enriching the global and local detail features needed for segmentation, raising the spatial resolution of the feature maps, and thereby improving the accuracy and quality of target segmentation in the image.
Drawings
FIG. 1 is a block diagram of a network model structure of an image segmentation algorithm according to an embodiment of the present invention;
FIG. 2 is a block diagram of an encoder path according to an embodiment of the present invention;
FIG. 3 is a block diagram of a mask attention module according to an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following examples.
Example 1
An image segmentation method based on edge-assisted computation and mask attention, whose framework is shown in FIG. 1, comprises the following steps:
s1, establishing a multi-order cascaded residual error module-constructed feature encoder, respectively performing dimensionality reduction and upsampling on the output of the first three-order residual error module, fusing three shallow feature maps to obtain an edge feature map, obtaining an edge predicted image after feature dimensionality reduction, and enhancing the characterization capability of the first three-layer feature encoder, wherein the specific implementation method comprises the following steps:
the feature encoder is composed of five levels including a Conv1 layer, a Conv2_ x layer, a Conv3_ x layer, a Conv4_ x layer and a Conv5_ x layer, wherein the Conv1 layer comprises a convolution layer with convolution kernel of 7 × 7, a BatchNorm layer, a ReLu layer and a MaxPool layer, all levels except the Conv1 layer are cascade residual blocks, the residual blocks mainly comprise a Conv Block and an Identity Block, the input dimension and the output dimension of the Conv Block are different, the dimension of the Identity Block is used for changing the dimension of the network, and the input dimension and the output dimension of the Identity Block are the same and can be connected in series for deepening the network. An input image respectively obtains a feature map feature _1, a feature map feature _2, a feature map feature _3, a feature map feature _4 and a feature map feature _5 through five levels of a feature encoder, the feature maps feature _1, feature _2 and feature _3 corresponding to the first three levels are taken out, channel dimensionality reduction is respectively carried out on convolutional layers with convolution kernels of 1 x 1, feature extraction is carried out on convolutional layers with convolution kernels of 3 x 3, linear interpolation up-sampling operation with factors of 2, 4 and 8 is respectively carried out, and three edge feature maps effect _1, effect _2 and effect _3 with the same scale are obtained. And performing feature fusion on the three edge feature maps in a splicing mode to obtain a final edge feature map efeature, and performing channel dimension reduction on the edge feature map efeature by adopting a convolution layer with a convolution kernel of 1 x 1 to obtain an edge prediction image edge _ prediction. 
This step fuses shallow, low-dimensional, high-fine-granularity detail features, strengthens the representation and feature-extraction capability of the first three encoder stages, assists the optimization of model parameters through an edge loss function, and guides image segmentation along the feature decoding path, so the edges of segmented targets become more accurate and distinct.
S2: Pass the output of the last residual module sequentially through several feature decoders, responsible for upsampling and skip connections, and mask attention modules; use the mask attention modules to sharpen each decoder stage's focus on key local regions, and output a predicted segmentation image at the corresponding scale at each stage. The concrete implementation is as follows:
the input of the mask attention module firstly passes through convolution layers with convolution kernels of 3 × 3 and convolution kernels of 1 × 1 to obtain a prediction segmentation image with a corresponding scale, the prediction segmentation image is used as a mask attention map mask _ attribute to be multiplied by the input to obtain an attention feature map att _ feature, and the attention feature map att _ feature is directly added with the input through jump connection to obtain an enhanced feature map. The feature decoder is composed of two convolution layers with convolution kernel of 3 x 3, the input is the feature graph of the corresponding hierarchical feature encoder and the feature graph of the previous mask attention module, wherein the feature graph of the previous mask attention module needs to be subjected to linear interpolation upsampling with factor of 2, and feature fusion of the two feature graphs is completed by utilizing a splicing operation under the condition that the feature graphs have the same scale. In this process, four different scales of the predictive segmented image predictive _ x are generated, with the scales being 32 × 32, 64 × 64, 128 × 128, 256 × 256, respectively. The step can be used for obtaining the mask attention map according to different training of importance of spatial position information on the image, namely extracting a mask attention moment array on an information path of a multi-scale decoder module to guide the segmentation of semantic information of the target, determining the spatial position needing important attention and finally improving the overall segmentation effect of the target.
S3: Fuse the output feature map of the feature decoder with the edge feature map from the first three residual modules, so as to introduce high-fine-granularity edge information and improve the edge-prediction accuracy of the segmentation result; predict the final segmentation image through feature dimensionality reduction; and compute the aggregation loss function over the six prediction maps output by the model for model-parameter optimization. The concrete implementation is as follows:
The last feature map of the last feature decoder and the edge feature map efeature computed from the first three residual modules are fused by a splicing operation; the result then passes through a convolution layer with a 1 × 1 kernel, and the result of the convolution operation is input to a Sigmoid activation function to obtain the final predicted segmentation image prediction_512. An aggregation loss function, the sum of BCELoss, DiceLoss, and JaccardLoss, is computed for each prediction result: the edge prediction image edge_prediction, the multi-scale predicted segmentation images prediction_x, and the final predicted segmentation image prediction_512; the total loss during model training is the sum of these aggregation losses. The Adam gradient descent algorithm designs an independent adaptive learning rate for each parameter by computing first- and second-moment estimates of the gradient and performs well on non-stationary and online problems; it is used here to optimize the model parameters. This step applies strongly supervised learning to the multi-scale outputs of the decoder path; under the constraint of multiple optimization objectives the model parameters are optimized faster and better, which shows excellent performance on the segmentation problem, and multi-scale information is aggregated more effectively, helping the final image segmentation.
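The aggregation loss of S3 can be sketched numerically as follows; the `eps` smoothing term and the soft (probability-valued) Dice/Jaccard formulation are common-practice assumptions rather than details stated in the patent:

```python
import numpy as np

def aggregation_loss(p, y, eps=1e-7):
    """Loss = BCELoss + DiceLoss + JaccardLoss for one prediction map.
    p: predicted probabilities in [0, 1]; y: binary labels in {0, 1}.
    eps is an assumed smoothing term to avoid log(0) / division by zero."""
    p, y = p.ravel(), y.ravel()
    # Binary cross-entropy averaged over all pixels.
    bce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    inter = np.sum(p * y)  # soft intersection |P ∩ Y|
    dice = 1 - (2 * inter + eps) / (np.sum(p) + np.sum(y) + eps)
    jaccard = 1 - (inter + eps) / (np.sum(p) + np.sum(y) - inter + eps)
    return bce + dice + jaccard

def total_loss(preds, labels):
    # Loss_sum: one aggregation loss per prediction map, summed over the
    # four multi-scale predictions, the final 512x512 prediction, and the
    # edge prediction (six terms in all).
    return sum(aggregation_loss(p, y) for p, y in zip(preds, labels))

y = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.1, 0.8])      # an imperfect prediction
loss_val = aggregation_loss(p, y)  # positive; shrinks toward 0 as p -> y
```

A perfect prediction (p equal to y) drives all three terms to roughly zero, which is why minimizing Loss_sum pushes every scale's prediction toward its label.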
The embodiment of the invention also provides an image segmentation system based on edge auxiliary calculation and mask attention, which comprises computer equipment; the computer device is configured or programmed for performing the steps of the above-described embodiment method.
In the invention, the computer equipment can be a microprocessor, an upper computer and other equipment.
The method was tested on the publicly available Montgomery CXR dataset and compared with experimental results from papers published in the last five years that use the same dataset (some evaluation-index results are missing because those papers adopt different evaluation indexes). The comparison shows that the proposed method has clear advantages on the three indexes acc, dice, and jaccard.
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention and not for limiting the protection scope of the present invention, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (5)

1. An image segmentation system based on boundary-aware attention, characterized by comprising a feature encoder built from n stages of cascaded residual modules conv, a feature decoder block for each stage of residual module, and a mask attention module for each stage of feature decoder block; the feature map output by the a-th-stage residual module is the input of the (a+1)-th-stage residual module; the feature map output by the a-th-stage residual module is also input to the a-th-stage feature decoder block, and the output feature map of the a-th-stage feature decoder block is input to the mask attention block of the a-th-stage mask attention module; the enhanced feature map output by the a-th-stage mask attention block is input to the (a-1)-th-stage feature decoder;
the outputs of the first three stages of residual modules are respectively reduced in dimension and upsampled to obtain three shallow feature maps, which are fused to obtain a final edge feature map efeature; dimensionality reduction of efeature yields an edge prediction image edge_prediction; the final edge feature map is spliced and fused with the last enhanced feature map last_feature output by the first-stage mask attention module to obtain the final predicted segmentation image prediction_512; n is 5; the first-stage residual module conv1 comprises a convolution layer with a 7 × 7 kernel, a BatchNorm layer, a ReLu layer, and a MaxPool layer; the remaining residual modules comprise Conv Blocks and Identity Blocks; the Conv Block has different input and output dimensions; the Identity Block has identical input and output dimensions and is used to deepen the network; an input image passes through the five residual modules to obtain feature maps feature_1, feature_2, feature_3, feature_4, and feature_5; feature maps feature_1, feature_2, and feature_3 each undergo channel dimensionality reduction by a convolution layer with a 1 × 1 kernel, feature extraction by a convolution layer with a 3 × 3 kernel, and linear-interpolation upsampling with factors 2, 4, and 8 respectively, yielding three edge feature maps efeature_1, efeature_2, and efeature_3 at the same scale; the three edge feature maps efeature_1, efeature_2, and efeature_3 are fused by splicing to obtain the final edge feature map, and channel dimensionality reduction of this edge feature map by a convolution layer with a 1 × 1 kernel yields the edge prediction image edge_prediction.
2. The boundary-aware attention-based image segmentation system of claim 1, wherein the feature map output by the a-th-stage residual module and the enhanced feature map output by the (a+1)-th-stage mask attention block are input to the a-th-stage feature decoder block; in the a-th-stage feature decoder block, the enhanced feature map undergoes linear-interpolation upsampling with factor 2, is then fused with the feature map by a splicing operation, and passes through two convolution layers with 3 × 3 kernels to give the output feature map;
the output feature map of the a-th-stage feature decoder block is input to the mask attention block of the a-th-stage mask attention module; the output feature map first passes through a convolution layer with a 3 × 3 kernel and a convolution layer with a 1 × 1 kernel to obtain a predicted segmentation image at the corresponding scale; this predicted segmentation image, serving as the mask attention map mask_attention, is multiplied with the output feature map to obtain the attention feature map att_feature, which is directly added to the output feature map to obtain the enhanced feature map; the mask attention modules of the fifth to second stages output predicted segmentation images prediction_x at scales 32 × 32, 64 × 64, 128 × 128, and 256 × 256, respectively, with x = 2, 3, 4, 5.
3. The boundary-aware attention-based image segmentation system of claim 2, wherein the last enhanced feature map last_feature output by the first-stage mask attention module is spliced with the final edge feature map efeature; the spliced result then passes through a convolution layer with a 1 × 1 kernel, and the result of the convolution operation is input to a Sigmoid activation function to obtain the final predicted segmentation image prediction_512;
calculating an aggregation Loss function Loss for the edge prediction image edge _ prediction, the multi-scale prediction segmentation image prediction _ x, and the final prediction segmentation image prediction _512 respectively:
Loss=BCELoss+DiceLoss+JaccardLoss
BCELoss computes the binary cross-entropy loss in a single-label binary classification setting, where one input sample picture corresponds to one output segmentation picture; for a batch dataset D(p, y) containing N sample pictures, p is the prediction with values in [0, 1], and y is the label with value 0 or 1. BCELoss is computed as:
$$\mathrm{BCELoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$

where $p_i$ denotes the prediction for the i-th sample picture and $y_i$ the label of the i-th sample picture;
$$\mathrm{Dice}(P, Y) = \frac{2\,|P \cap Y|}{|P| + |Y|},\qquad \mathrm{DiceLoss} = 1 - \mathrm{Dice}(P, Y)$$

where Dice(P, Y) denotes the Dice coefficient, P is the prediction with values in [0, 1], and Y is the label with value 0 or 1;
$$\mathrm{Jaccard}(P, Y) = \frac{|P \cap Y|}{|P \cup Y|},\qquad \mathrm{JaccardLoss} = 1 - \mathrm{Jaccard}(P, Y)$$

where Jaccard(P, Y) denotes the Jaccard coefficient;
obtaining a total aggregation Loss function Loss _ sum of an edge prediction image edge _ prediction, a multi-scale prediction segmentation image prediction _ x and a final prediction segmentation image prediction _ 512:
Loss_sum = Loss(32) + Loss(64) + Loss(128) + Loss(256) + Loss(512) + Loss(edge), where Loss(32), Loss(64), Loss(128), and Loss(256) are the aggregation loss functions of the predicted segmentation images output by the fifth- to second-stage mask attention modules; Loss(512) is the aggregation loss function of the final predicted segmentation image prediction_512; and Loss(edge) is the aggregation loss function of the edge prediction image edge_prediction;
and optimizing the total aggregation Loss function Loss_sum so as to minimize Loss_sum, thereby obtaining the optimized image segmentation system.
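Putting the pieces together, Loss_sum is simply the aggregation loss (BCE + Dice + Jaccard) summed over all supervised outputs. The sketch below is self-contained and uses random stand-in predictions and labels; in the actual system the six (prediction, target) pairs would be the four multi-scale outputs, the final output, and the edge output with their ground truths:

```python
import numpy as np

def agg_loss(p, y, eps=1e-7):
    """Aggregation loss for one output: BCELoss + DiceLoss + JaccardLoss."""
    pc = np.clip(p, eps, 1.0 - eps)
    bce = -np.mean(y * np.log(pc) + (1.0 - y) * np.log(1.0 - pc))
    inter = np.sum(p * y)
    dice = 1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(y) + eps)
    union = np.sum(p) + np.sum(y) - inter
    jac = 1.0 - (inter + eps) / (union + eps)
    return float(bce + dice + jac)

# Stand-in data: six supervised outputs, each a (prediction, binary target) pair
rng = np.random.default_rng(1)
outputs = [(rng.random(16), (rng.random(16) > 0.5).astype(float)) for _ in range(6)]

# Loss_sum = Loss(32) + Loss(64) + Loss(128) + Loss(256) + Loss(512) + Loss(edge)
loss_sum = sum(agg_loss(p, y) for p, y in outputs)
```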
4. The boundary-aware image segmentation system as claimed in claim 3, wherein the Adam gradient descent algorithm is adopted to optimize the total aggregation Loss function Loss_sum.
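For reference, a single Adam update in its textbook form (bias-corrected first and second moment estimates) can be sketched as follows; this is a minimal illustration on a toy quadratic, not the patent's training code, and the learning rate is an arbitrary choice:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for parameter theta given its gradient at step t (t >= 1)."""
    m = b1 * m + (1.0 - b1) * grad            # first moment (momentum)
    v = b2 * v + (1.0 - b2) * grad ** 2       # second moment (squared gradients)
    m_hat = m / (1.0 - b1 ** t)               # bias correction
    v_hat = v / (1.0 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem: minimize f(theta) = theta^2, whose gradient is 2 * theta
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

In training, `theta` would be the network weights and `grad` the gradient of Loss_sum with respect to them.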
5. A boundary-aware image segmentation method, characterized in that an initial image is input into the boundary-aware image segmentation system as claimed in any one of claims 1 to 4 to obtain the final edge feature map effect and the final predicted segmentation image predict_512.
CN202210288277.3A 2022-03-23 2022-03-23 Image segmentation method and system based on edge auxiliary calculation and mask attention Active CN114565770B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210288277.3A CN114565770B (en) 2022-03-23 2022-03-23 Image segmentation method and system based on edge auxiliary calculation and mask attention


Publications (2)

Publication Number Publication Date
CN114565770A CN114565770A (en) 2022-05-31
CN114565770B true CN114565770B (en) 2022-09-13

Family

ID=81719920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210288277.3A Active CN114565770B (en) 2022-03-23 2022-03-23 Image segmentation method and system based on edge auxiliary calculation and mask attention

Country Status (1)

Country Link
CN (1) CN114565770B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115760810B (en) * 2022-11-24 2024-04-12 江南大学 Medical image segmentation apparatus, method and computer-readable storage medium
CN115984293B (en) * 2023-02-09 2023-11-07 中国科学院空天信息创新研究院 Spatial target segmentation network and method based on edge perception attention mechanism
CN116188501B (en) * 2023-03-02 2024-02-13 江南大学 Medical image segmentation method based on multi-scale cross attention
CN116721351A (en) * 2023-07-06 2023-09-08 内蒙古电力(集团)有限责任公司内蒙古超高压供电分公司 Remote sensing intelligent extraction method for road environment characteristics in overhead line channel
CN116703950B (en) * 2023-08-07 2023-10-20 中南大学 Camouflage target image segmentation method and system based on multi-level feature fusion

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584246A (en) * 2018-11-16 2019-04-05 Chengdu University of Information Technology DCM myocardium diagnosis and treatment image segmentation method based on a multi-scale feature pyramid
CN112967300A (en) * 2021-02-23 2021-06-15 Ariemedi Medical Technology (Beijing) Co., Ltd. Three-dimensional ultrasonic thyroid segmentation method and device based on multi-scale fusion network
CN114004811A (en) * 2021-11-01 2022-02-01 The Second Affiliated Hospital of Xi'an Jiaotong University Image segmentation method and system based on multi-scale residual encoder-decoder network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10430946B1 (en) * 2019-03-14 2019-10-01 Inception Institute of Artificial Intelligence, Ltd. Medical image segmentation and severity grading using neural network architectures with semi-supervised learning techniques
CN111462126B (en) * 2020-04-08 2022-10-11 武汉大学 Semantic image segmentation method and system based on edge enhancement
CN111539435A (en) * 2020-04-15 2020-08-14 创新奇智(合肥)科技有限公司 Semantic segmentation model construction method, image segmentation equipment and storage medium
CN111583285B (en) * 2020-05-12 2023-08-04 武汉科技大学 Liver image semantic segmentation method based on edge attention strategy
CN111986181B (en) * 2020-08-24 2021-07-30 中国科学院自动化研究所 Intravascular stent image segmentation method and system based on double-attention machine system
CN113379771B (en) * 2021-07-02 2024-04-09 西安电子科技大学 Hierarchical human body analysis semantic segmentation method with edge constraint
CN114048822A (en) * 2021-11-19 2022-02-15 辽宁工程技术大学 Attention mechanism feature fusion segmentation method for image


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Remote sensing image segmentation method based on multi-level channel attention; Yu Shuai et al.; Laser & Optoelectronics Progress; 2019-08-19; Vol. 57, No. 04; pp. 142-151 *

Also Published As

Publication number Publication date
CN114565770A (en) 2022-05-31

Similar Documents

Publication Publication Date Title
CN114565770B (en) Image segmentation method and system based on edge auxiliary calculation and mask attention
CN111062395B (en) Real-time video semantic segmentation method
CN111612008A (en) Image segmentation method based on convolution network
CN111968150B (en) Weak surveillance video target segmentation method based on full convolution neural network
US20240029272A1 (en) Matting network training method and matting method
CN114266794B (en) Pathological section image cancer region segmentation system based on full convolution neural network
US20210056353A1 (en) Joint representation learning from images and text
CN115620010A (en) Semantic segmentation method for RGB-T bimodal feature fusion
CN111739037B (en) Semantic segmentation method for indoor scene RGB-D image
CN112906631A (en) Dangerous driving behavior detection method and detection system based on video
CN114419323A (en) Cross-modal learning and domain self-adaptive RGBD image semantic segmentation method
CN113139969A (en) Attention mechanism-based weak supervision image semantic segmentation method and system
CN110866938A (en) Full-automatic video moving object segmentation method
CN116630932A (en) Road shielding target detection method based on improved YOLOV5
CN115375692A (en) Workpiece surface defect segmentation method, device and equipment based on boundary guidance
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN114639101A (en) Emulsion droplet identification system, method, computer equipment and storage medium
CN111179272A (en) Rapid semantic segmentation method for road scene
CN114359293A (en) Three-dimensional MRI brain tumor segmentation method based on deep learning
CN113139502A (en) Unsupervised video segmentation method
CN117037119A (en) Road target detection method and system based on improved YOLOv8
CN110942463B (en) Video target segmentation method based on generation countermeasure network
Hu et al. LDPNet: A lightweight densely connected pyramid network for real-time semantic segmentation
CN114419078B (en) Surface defect region segmentation method and device based on convolutional neural network
CN116363361A (en) Automatic driving method based on real-time semantic segmentation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant