CN114170167B - Polyp segmentation method and computer device based on attention-guided context correction - Google Patents


Info

Publication number: CN114170167B
Application number: CN202111434451.2A
Authority: CN (China)
Other versions: CN114170167A (Chinese-language application publication)
Prior art keywords: feature, semantic information, polyp, information image, channel
Inventors: 施连焘, 李正国, 王玉峰, 郭玉宝
Applicant and assignee: Shenzhen Polytechnic
Legal status: Active (granted)

Classifications

    • G06T 7/0012 — Image analysis; biomedical image inspection
    • G06F 18/253 — Pattern recognition; fusion techniques of extracted features
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/048 — Neural networks; activation functions
    • G06N 3/08 — Neural networks; learning methods
    • G06T 7/10 — Image analysis; segmentation, edge detection
    • G06T 2207/10068 — Image acquisition modality; endoscopic image
    • G06T 2207/30004 — Subject of image; biomedical image processing
    • G06T 2207/30028 — Colon; small intestine
    • G06T 2207/30032 — Colon polyp


Abstract

The invention provides a polyp segmentation method based on attention-guided context correction, a computer-readable storage medium, and a computer device. The method comprises: inputting a polyp picture to be segmented into an enhanced context correction model for training and then downsampling, and repeating this training-and-downsampling step multiple times to obtain a final semantic information image; inputting the final semantic information image into a progressive context fusion model for training and outputting a feature-fused semantic information image; upsampling the feature-fused semantic information image and training the enhanced context correction model to obtain a feature map, and repeating this upsampling-and-training step multiple times to obtain a final feature map with the same channel size as the polyp picture to be segmented; and inputting the final feature map into a multi-level feature fusion model for training and outputting a polyp segmentation picture. The invention can thereby identify polyps more accurately.

Description

Polyp segmentation method and computer device based on attention-guided context correction
Technical Field
The present invention belongs to the medical field, and in particular, to a polyp segmentation method based on attention-guided context correction, a computer-readable storage medium, and a computer device.
Background
Colorectal cancer most often develops, over a long period and for many reasons, from polyps (raised masses in the gastrointestinal tract) that form in the intestine at the earliest stage. If polyps can be detected and resected early, colorectal cancer can be prevented; the most effective method for screening and diagnosing colorectal cancer is colorectal endoscopy, which is currently the mainstream diagnostic method.
However, although current diagnostic methods are advanced and accurate, problems remain. According to professional research reports, roughly one in four polyps is missed during endoscopy, leading to incomplete resection and leaving hidden dangers. In addition, polyps vary widely in shape, making fine judgments by the naked eye difficult, especially when a polyp is barely distinguishable from the gastrointestinal background. Finally, identification cannot be performed rapidly by humans alone; it requires considerable time and effort and, under the current medical system, adds a heavy workload for gastroenterologists.
Disclosure of Invention
The invention aims to provide a polyp segmentation method based on attention-guided context correction, a computer-readable storage medium, and a computer device, so as to solve the problem that missed endoscopic polyps lead to incomplete resection and leave hidden dangers.
In a first aspect, the present invention provides a polyp segmentation method based on attention-guided context correction, comprising:
acquiring a polyp picture to be segmented;
inputting the polyp picture to be segmented into an enhanced context correction model for training to obtain a semantic information image, downsampling the semantic information image to obtain a downsampled semantic information image, inputting the downsampled semantic information image into the enhanced context correction model again for training followed by downsampling, and repeating these steps multiple times to obtain a final semantic information image; the enhanced context correction model divides the input polyp picture to be segmented into two feature maps with an equal number of channels along the channel dimension, passes one feature map through an attention mechanism to obtain a first feature map, extracts features from the other feature map by depth-separable convolution to obtain a second feature map, splices the first and second feature maps, and then fuses them by residual connection to output a semantic information image;
inputting the final semantic information image into a progressive context fusion model for training, and outputting a feature-fused semantic information image; the progressive context fusion model extracts features from the final semantic information image by a hole (dilated) convolution and a conventional convolution respectively to obtain two feature maps, splices the two feature maps and inputs the result into a channel attention mechanism of context modeling to obtain channel weights, fuses the channel weights with the final semantic information image along the channel dimension, and outputs the feature-fused semantic information image;
upsampling the feature-fused semantic information image to obtain an upsampled semantic information image, training the enhanced context correction model on the upsampled semantic information image to obtain a feature map, then upsampling and training again, and repeating these steps multiple times to obtain a final feature map with the same channel size as the polyp picture to be segmented;
inputting the final feature map into a multi-level feature fusion model for training, and outputting a polyp segmentation picture; the multi-level feature fusion model upsamples the final feature maps one by one to the same resolution, splices the upsampled feature maps and inputs the result into a channel attention mechanism to obtain channel weights, and models the channel weights with the extracted feature map by pixel-level multiplication to obtain the polyp segmentation picture.
Further, after the upsampling is performed on the semantic information image after the feature fusion, the method further includes: and adding a jump link structure during the training of the enhanced context correction model, and complementing the spatial fine granularity of deep semantic information of a decoding layer by using the representation information of a shallow coding layer.
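The jump link described above can be sketched at the level of tensor shapes: a deep decoder map is upsampled and then spliced channel-wise with the shallow encoder map of the same resolution. A minimal Python sketch, with illustrative channel counts and resolutions that are assumptions rather than values stated in the patent:

```python
# Shape-level sketch of the jump (skip) link: shallow encoder detail
# complements upsampled deep decoder semantics via channel concatenation.
# Shapes are (channels, height, width); all numbers are illustrative.

def upsample2x(shape):
    """Bilinear 2x upsampling doubles the spatial size, keeps channels."""
    c, h, w = shape
    return (c, h * 2, w * 2)

def skip_concat(decoder_shape, encoder_shape):
    """Channel-dimension splice (Cat) of two equal-resolution feature maps."""
    c1, h1, w1 = decoder_shape
    c2, h2, w2 = encoder_shape
    assert (h1, w1) == (h2, w2), "jump link requires matching resolution"
    return (c1 + c2, h1, w1)

deep = (256, 22, 22)      # deep decoding-layer semantic information
shallow = (128, 44, 44)   # shallow coding-layer representation information
fused = skip_concat(upsample2x(deep), shallow)
print(fused)              # (384, 44, 44)
```

The assertion makes explicit why the decoder map must be upsampled before the splice: the jump link only fuses maps of equal spatial resolution.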
Further, the inputting of the polyp picture to be segmented into the enhanced context correction model training specifically includes:
defining the polyp picture to be segmented X_in as X_in ∈ R^{C×H×W}, extracting features from the polyp picture to be segmented by a 1×1 convolution, and outputting two feature maps X_1 and X_2 with an equal number of channels:
X_1 = F_{1×1}(X_in) ∈ R^{(C/2)×H×W} and X_2 = F_{1×1}(X_in) ∈ R^{(C/2)×H×W};
the two feature maps X_1 and X_2 are passed through the attention mechanism and the depth-separable convolution respectively to obtain the first feature map X_att and the second feature map X'_2, namely:
X_att = Att(X̃_1) and X'_2 = DWConv(X_2),
where Att(·) denotes the attention branch built from the downsampling, sigmoid-gating, upsampling and pixel-level multiplication operations listed below, and DWConv(·) denotes the depth-separable convolution;
the first feature map X_att and the second feature map X'_2 are spliced, and the semantic information image X_out is then obtained by residual connection and fusion:
X_out = F_{3×3}(Cat(X_att, X'_2)) ⊕ X_in;
wherein X̃_1 is the feature map obtained by sending X_1 through a 1×1 convolution, a batch regularization algorithm and a ReLU nonlinear activation function; R^{C×H×W} denotes a three-dimensional tensor in which C is the number of channels, H the height and W the width; σ and ⊕ denote the sigmoid activation function and pixel-level summation respectively; ⊗ denotes pixel-level multiplication; Up is conventional bilinear interpolation upsampling; Down is downsampling; Cat denotes splicing in the channel dimension; X_out is the output feature map; and F_{3×3} denotes a 3×3 convolution followed by batch normalization and a nonlinear activation function.
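The choice of a depth-separable convolution for the second ECC branch can be motivated by a simple parameter count: a standard k×k convolution is factorised into a per-channel k×k depthwise filter plus a 1×1 pointwise convolution. A short sketch with illustrative channel counts (the patent states no concrete sizes):

```python
# Parameter count of a standard conv versus a depth-separable conv.
# Channel counts and kernel size below are illustrative assumptions.

def standard_conv_params(c_in, c_out, k):
    """Every output channel filters all input channels with a k x k kernel."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """One k x k filter per input channel, then a 1 x 1 channel-mixing conv."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

c_in = c_out = 64
k = 3
print(standard_conv_params(c_in, c_out, k))        # 36864
print(depthwise_separable_params(c_in, c_out, k))  # 4672
```

For these sizes the separable form needs roughly one eighth of the parameters, which is why it is an attractive feature-extraction branch alongside the attention branch.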
Further, the training of the progressive context fusion model specifically includes:
defining the final semantic information image X' as X' ∈ R^{C×H×W}, extracting features by a 1×1 convolution, and then extracting two feature maps X_s and X_l by a conventional convolution and a hole (dilated) convolution respectively, namely:
X_s = F_{3×3}(F_{1×1}(X')) and X_l = F^d_{3×3}(F_{1×1}(X')),
where F^d_{3×3} denotes the 3×3 hole convolution;
the two feature maps X_s and X_l are spliced to obtain the spliced feature map X_cat:
X_cat = Cat(X_l, X_s);
the spliced feature map X_cat is input into the channel attention mechanism of global context modeling to obtain the channel weights; the channel weights are feature-fused with the final semantic information image along the channel dimension, and the feature-fused semantic information image y_i is output:
y_i = x_i ⊕ P( Σ_{j=1}^{T_p} β_j x_j ), with P = S_3 ReLU(LN(S_2(·)));
wherein T_p = H·W denotes the number of pixel positions of X_cat; j is the summation index; β_j denotes the global attention pooling weights for context modeling; P = S_3 ReLU(LN(S_2(·))) is a bottleneck layer that captures the dependencies between channels; ReLU is a nonlinear activation function; S_2 and S_3 realize information interaction across the channel dimension; and LN denotes layer normalization (LayerNorm).
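The point of pairing a conventional convolution with a hole convolution is receptive-field coverage: dilation widens the field of view without adding parameters, which is what lets the PCF branch capture large-scale polyps. The effective kernel size of a k×k convolution with dilation rate d is k + (k − 1)(d − 1); the rate used below is an illustrative assumption, as the patent does not specify one:

```python
# Effective kernel size of a dilated (hole) convolution.
# The dilation rate 3 is illustrative, not taken from the patent.

def effective_kernel(k, d):
    """k x k kernel with dilation d covers k + (k - 1) * (d - 1) pixels."""
    return k + (k - 1) * (d - 1)

print(effective_kernel(3, 1))  # 3 - conventional convolution, local detail
print(effective_kernel(3, 3))  # 7 - hole convolution, wider context
```

Splicing the two branches therefore fuses a small-field view (X_s) and a large-field view (X_l) of the same semantic information image.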
Further, the step of inputting the final feature map into the multi-level feature fusion model for training specifically comprises:
defining the final feature maps L as L ∈ R^{C×H×W}, upsampling the final feature maps one by one to the same resolution and then splicing them, and sending the spliced feature map into a 1×1 convolution W_1 for feature extraction to obtain the extracted feature map G:
G = W_1(Cat(B(l_1, l_2, l_3, l_4), l_0));
G is input into the channel attention mechanism to obtain the channel weights, and the channel weights are modeled with the extracted feature map by pixel-level multiplication to obtain the polyp segmentation picture Y:
Y = G ⊗ δ(ξ(g(G)));
wherein L = (l_0, l_1, l_2, l_3, l_4), and l_0 to l_4 respectively denote the decoded-layer feature maps from large resolution to small; B(l_1, l_2, l_3, l_4) denotes upsampling l_1 to l_4 to the same resolution as l_0 before splicing; ξ is a correlation coefficient related to G; g is global average pooling; and δ denotes an activation function.
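The channel-attention gating in the multi-level fusion model can be illustrated with a tiny numeric example: global average pooling g(·) squeezes each channel of G to a scalar, an activation δ maps it to a (0, 1) weight, and the weight rescales that channel by pixel-level multiplication. In this sketch the correlation transform ξ is reduced to the identity and a sigmoid stands in for δ; both simplifications and all numbers are illustrative assumptions:

```python
# Toy channel attention: squeeze (global average pool), gate (sigmoid),
# and rescale each channel by pixel-level multiplication.
import math

def global_avg_pool(channel):
    """g(.): reduce one channel (list of rows) to its mean."""
    flat = [v for row in channel for v in row]
    return sum(flat) / len(flat)

def sigmoid(x):
    """Stand-in for the activation delta, mapping to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature_map):
    """feature_map[c][h][w] -> per-channel weight times every pixel."""
    weights = [sigmoid(global_avg_pool(ch)) for ch in feature_map]
    return [[[w * v for v in row] for row in ch]
            for w, ch in zip(weights, feature_map)]

G = [[[1.0, 3.0], [2.0, 2.0]],    # channel 0, mean 2.0 -> strong weight
     [[-4.0, 0.0], [-2.0, -2.0]]]  # channel 1, mean -2.0 -> weak weight
Y = channel_attention(G)
print(round(Y[0][0][1], 3))        # 3.0 * sigmoid(2.0) ~= 2.642
```

Channels whose pooled response is high are passed through almost unchanged, while low-response channels are suppressed, which is how the fused map emphasises polyp-relevant decoder levels.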
In a second aspect, the invention provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when being executed by a processor, performs the steps of the method for polyp segmentation based on attention-guided context correction according to the first aspect.
In a third aspect, the invention provides a computer apparatus comprising: one or more processors, a memory, and one or more computer programs, said processors and said memory being connected by a bus, wherein said one or more computer programs are stored in said memory and configured to be executed by said one or more processors, characterized in that said processor, when executing said computer programs, implements the steps of the polyp segmentation method based on attention-directed context correction as described in the first aspect.
In the invention, repeatedly training the enhanced context correction model and then downsampling obtains deeper semantic information and effectively suppresses interference from background noise; the progressive context fusion model training addresses large-scale polyps in the polyp recognition process; and the multi-level feature fusion model training effectively obtains accurate segmentation results, improving the accuracy of polyp recognition.
Drawings
Fig. 1 is a flowchart of a polyp segmentation method based on attention-guided context correction according to an embodiment of the present invention.
Fig. 2 is a flowchart of another polyp segmentation method based on attention-guided context correction according to an embodiment of the present invention.
FIG. 3 is a flowchart of training an enhanced context correction model according to an embodiment of the present invention.
FIG. 4 is a flowchart of progressive context fusion model training according to an embodiment of the present invention.
Fig. 5 is a flowchart of multi-level feature fusion model training according to an embodiment of the present invention.
Fig. 6 is a block diagram of a specific structure of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clearly apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Referring to fig. 1, a polyp segmentation method based on attention-guided context correction according to an embodiment of the present invention includes the following steps. It should be noted that the method is not limited to the flow sequence shown in fig. 1, provided the results are substantially the same.
S1, obtaining a polyp picture to be segmented;
S2, inputting the polyp picture to be segmented into an enhanced context correction model for training to obtain a semantic information image, downsampling the semantic information image to obtain a downsampled semantic information image, inputting the downsampled semantic information image into the enhanced context correction model again for training followed by downsampling, and repeating these steps multiple times to obtain a final semantic information image; the enhanced context correction model divides the input polyp picture to be segmented into two feature maps with an equal number of channels along the channel dimension, passes one feature map through an attention mechanism to obtain a first feature map, extracts features from the other feature map by depth-separable convolution to obtain a second feature map, splices the first and second feature maps, and then fuses them by residual connection to output a semantic information image;
S3, inputting the final semantic information image into a progressive context fusion model for training, and outputting a feature-fused semantic information image; the progressive context fusion model extracts features from the final semantic information image by a hole (dilated) convolution and a conventional convolution respectively to obtain two feature maps, splices the two feature maps and inputs the result into a channel attention mechanism of context modeling to obtain channel weights, fuses the channel weights with the final semantic information image along the channel dimension, and outputs the feature-fused semantic information image;
S4, upsampling the feature-fused semantic information image to obtain an upsampled semantic information image, training the enhanced context correction model on the upsampled semantic information image to obtain a feature map, then upsampling and training again, and repeating these steps multiple times to obtain a final feature map with the same channel size as the polyp picture to be segmented;
S5, inputting the final feature map into a multi-level feature fusion model for training, and outputting a polyp segmentation picture; the multi-level feature fusion model upsamples the final feature maps one by one to the same resolution, splices the upsampled feature maps and inputs the result into a channel attention mechanism to obtain channel weights, and models the channel weights with the extracted feature map by pixel-level multiplication to obtain the polyp segmentation picture.
Fig. 2 is a flowchart illustrating a polyp segmentation method based on attention-guided context correction according to an embodiment of the present invention, where Input is the input picture to be segmented, ECC denotes the enhanced context correction model, Down ×2 denotes a downsampling operation (with a convolution kernel size of 2), PCF denotes the progressive context fusion model, Up ×2 denotes an upsampling operation (with a convolution kernel size of 2), MPA denotes the multi-level feature fusion model, Output is the output segmentation result, and skip connection denotes the residual connection.
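The Fig. 2 pipeline (Input → ECC/Down ×2 repeated → PCF → Up ×2/ECC repeated → MPA) can be traced at the level of spatial resolutions. The 352×352 input size and the four encoder/decoder stages in this sketch are illustrative assumptions, not values stated in the patent:

```python
# Resolution walk-through of the encoder-decoder pipeline in Fig. 2.
# Input size and stage count are illustrative assumptions.

def down2(hw):
    """Down x2: halve each spatial dimension."""
    h, w = hw
    return (h // 2, w // 2)

def up2(hw):
    """Up x2: double each spatial dimension."""
    h, w = hw
    return (h * 2, w * 2)

res = (352, 352)            # polyp picture to be segmented
encoder = [res]
for _ in range(4):          # ECC training then Down x2, repeated
    res = down2(res)
    encoder.append(res)
# the PCF model operates on the final semantic information image at `res`
decoder = []
for _ in range(4):          # Up x2 then ECC training, repeated
    res = up2(res)
    decoder.append(res)

print(encoder[-1])          # (22, 22)  - deepest semantic information image
print(decoder[-1])          # (352, 352) - final feature map, input-sized
```

The trace makes concrete why the number of upsampling steps must match the number of downsampling steps: only then does the final feature map recover the same size as the polyp picture to be segmented, as the method requires.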
In an embodiment of the present invention, after performing upsampling on the semantic information image after the feature fusion, the method further includes: and adding a jump link structure during the training of the enhanced context correction model, and complementing the spatial fine granularity of deep semantic information of a decoding layer by using the representation information of a shallow coding layer.
In an embodiment of the present invention, the inputting of the polyp picture to be segmented into the enhanced context correction model training specifically includes:
defining the polyp picture to be segmented X_in as X_in ∈ R^{C×H×W}, extracting features from the polyp picture to be segmented by a 1×1 convolution, and outputting two feature maps X_1 and X_2 with an equal number of channels:
X_1 = F_{1×1}(X_in) ∈ R^{(C/2)×H×W} and X_2 = F_{1×1}(X_in) ∈ R^{(C/2)×H×W};
the two feature maps X_1 and X_2 are passed through the attention mechanism and the depth-separable convolution respectively to obtain the first feature map X_att and the second feature map X'_2, namely:
X_att = Att(X̃_1) and X'_2 = DWConv(X_2),
where Att(·) denotes the attention branch built from the downsampling, sigmoid-gating, upsampling and pixel-level multiplication operations listed below, and DWConv(·) denotes the depth-separable convolution;
the first feature map X_att and the second feature map X'_2 are spliced, and the semantic information image X_out is then obtained by residual connection and fusion:
X_out = F_{3×3}(Cat(X_att, X'_2)) ⊕ X_in;
wherein X̃_1 is the feature map obtained by sending X_1 through a 1×1 convolution, a batch regularization algorithm and a ReLU nonlinear activation function; R^{C×H×W} denotes a three-dimensional tensor in which C is the number of channels, H the height and W the width; σ and ⊕ denote the sigmoid activation function and pixel-level summation respectively; ⊗ denotes pixel-level multiplication; Up is conventional bilinear interpolation upsampling; Down is downsampling; Cat denotes splicing in the channel dimension; X_out is the output feature map; and F_{3×3} denotes a 3×3 convolution followed by batch normalization and a nonlinear activation function.
Fig. 3 is a flow chart of enhanced context correction model training, wherein ⊕ denotes pixel-level summation, ⊗ denotes the pixel-level multiplication operation, σ is the sigmoid activation function, Down is downsampling, Cat is splicing in the channel dimension, and Up is conventional bilinear interpolation upsampling.
In an embodiment of the present invention, referring to fig. 4, the training of the progressive context fusion model specifically includes:
defining the final semantic information image X' as X' ∈ R^{C×H×W}, extracting features by a 1×1 convolution, and then extracting two feature maps X_s and X_l by a conventional convolution and a hole (dilated) convolution respectively, namely:
X_s = F_{3×3}(F_{1×1}(X')) and X_l = F^d_{3×3}(F_{1×1}(X')),
where F^d_{3×3} denotes the 3×3 hole convolution;
the two feature maps X_s and X_l are spliced to obtain the spliced feature map X_cat:
X_cat = Cat(X_l, X_s);
the spliced feature map X_cat is input into the channel attention mechanism of global context modeling to obtain the channel weights; the channel weights are feature-fused with the final semantic information image along the channel dimension, and the feature-fused semantic information image y_i is output:
y_i = x_i ⊕ P( Σ_{j=1}^{T_p} β_j x_j ), with P = S_3 ReLU(LN(S_2(·)));
wherein T_p = H·W denotes the number of pixel positions of X_cat; j is the summation index; β_j denotes the global attention pooling weights for context modeling; P = S_3 ReLU(LN(S_2(·))) is a bottleneck layer that captures the dependencies between channels; ReLU is a nonlinear activation function; S_2 and S_3 realize information interaction across the channel dimension; and LN denotes layer normalization (LayerNorm).
In an embodiment of the present invention, the step of inputting the final feature map into the multi-level feature fusion model for training specifically includes:
defining the final feature maps L as L ∈ R^{C×H×W}, upsampling the final feature maps one by one to the same resolution and then splicing them, and sending the spliced feature map into a 1×1 convolution W_1 for feature extraction to obtain the extracted feature map G:
G = W_1(Cat(B(l_1, l_2, l_3, l_4), l_0));
G is input into the channel attention mechanism to obtain the channel weights, and the channel weights are modeled with the extracted feature map by pixel-level multiplication to obtain the polyp segmentation picture Y:
Y = G ⊗ δ(ξ(g(G)));
wherein L = (l_0, l_1, l_2, l_3, l_4), and l_0 to l_4 respectively denote the decoded-layer feature maps from large resolution to small; B(l_1, l_2, l_3, l_4) denotes upsampling l_1 to l_4 to the same resolution as l_0 before splicing; ξ is a correlation coefficient related to G; g is global average pooling; and δ denotes an activation function.
Fig. 5 is a flow chart of multi-level feature fusion model training, wherein Up is bilinear interpolation upsampling and ⊗ is the pixel-level multiplication operation.
An embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of a polyp segmentation method based on attention-guided context correction as provided by an embodiment of the present invention.
Fig. 6 shows a specific block diagram of a computer device according to an embodiment of the present invention, where the computer device 100 includes: one or more processors 101, a memory 102, and one or more computer programs, wherein the processors 101 and the memory 102 are connected by a bus, the one or more computer programs being stored in the memory 102 and configured to be executed by the one or more processors 101, the processor 101 implementing the steps of the attention-directed context correction based polyp segmentation method as provided by an embodiment of the invention when executing the computer programs.
The computer equipment comprises a server, a terminal and the like. The computer device may be a desktop computer, a mobile terminal or a vehicle-mounted device, and the mobile terminal includes at least one of a mobile phone, a tablet computer, a personal digital assistant or a wearable device.
In the embodiment of the invention, repeatedly training the enhanced context correction model and then downsampling obtains deeper semantic information and effectively suppresses interference from background noise; the progressive context fusion model training addresses large-scale polyps in the polyp recognition process; and the multi-level feature fusion model training effectively obtains accurate segmentation results, improving the accuracy of polyp recognition.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by hardware instructed by a program, and the program may be stored in a computer-readable storage medium; the storage medium may include: read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (7)

1. A polyp segmentation method based on attention-directed context correction, comprising:
acquiring a polyp picture to be segmented;
inputting a polyp picture to be segmented into an enhanced context correction model for training to obtain a semantic information image, then performing down-sampling on the semantic information image to obtain a down-sampled semantic information image, inputting the down-sampled semantic information image into the enhanced context correction model again for training and then performing down-sampling, and repeating for multiple times to obtain a final semantic information image; the enhanced context correction model is characterized in that an input polyp picture to be segmented is divided into two feature maps with equal channel quantity according to channel dimensions, one feature map is subjected to attention mechanism to obtain a first feature map, the other feature map is subjected to depth separable convolution to extract features to obtain a second feature map, the first feature map and the second feature map are spliced to obtain a spliced feature map, and then the spliced feature map and pixel features of the polyp picture to be segmented are fused by residual connection and a semantic information image is output;
inputting the final semantic information image into a progressive context fusion model for training, and outputting a semantic information image with fused features; the progressive context fusion model is used for respectively extracting the characteristics of the final semantic information image through a cavity convolution and a conventional convolution to obtain two characteristic graphs, splicing the two characteristic graphs and inputting the spliced characteristic graphs into a channel attention mechanism of context modeling to obtain channel weights, performing characteristic fusion on the channel weights and the final semantic information image according to channel dimensions, and outputting a semantic information image after the characteristic fusion;
up-sampling the feature-fused semantic information image to obtain an up-sampled semantic information image, training the up-sampled semantic information image with the enhanced context correction model to obtain a feature map, then up-sampling the feature map again and training with the enhanced context correction model, and repeating this multiple times to obtain a final feature map with the same channel size as the polyp picture to be segmented;
inputting the final feature map into a multi-level feature fusion model for training, and outputting a polyp segmentation picture; wherein the multi-level feature fusion model up-samples the final feature maps with the same channel size one by one to the same resolution as the polyp picture to be segmented, splices the up-sampled feature maps of the same resolution and then performs feature extraction to obtain an extracted feature map, inputs the extracted feature map into a channel attention mechanism to obtain channel weights, and models the channel weights and the extracted feature map by pixel-level multiplication to obtain the polyp segmentation picture.
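The encode–fuse–decode pipeline of claim 1 can be sketched as follows. This is an illustrative PyTorch outline, not the patented implementation: the number of scales, the use of max-pooling for down-sampling, and all function names are assumptions, with the three claimed models passed in as callables.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def segment_polyp(image, ecc_blocks, pcf, dec_blocks, mlf):
    """Wire together the three models of claim 1 (passed in as callables):
    repeated ECC + down-sampling, progressive context fusion at the bottom,
    repeated up-sampling + ECC, then multi-level feature fusion."""
    x = image
    for ecc in ecc_blocks:                 # encoder: ECC training step, then down-sample
        x = ecc(x)
        x = F.max_pool2d(x, 2)
    x = pcf(x)                             # progressive context fusion at the lowest scale
    maps = []
    for dec in dec_blocks:                 # decoder: up-sample, then ECC again
        x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
        x = dec(x)
        maps.append(x)                     # keep every decoded feature map
    return mlf(maps[::-1])                 # multi-level fusion over the decoder maps
```

With identity stand-ins for the models, the output recovers the input resolution after matching numbers of down- and up-sampling steps.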
2. The polyp segmentation method according to claim 1, further comprising, after the up-sampling of the feature-fused semantic information image: adding a skip-connection structure during the training of the enhanced context correction model, and using the representation information of the shallow coding layers to supplement the spatial fine-grained detail of the deep semantic information of the decoding layers.
3. The polyp segmentation method as set forth in claim 1, wherein the inputting of the polyp picture to be segmented into the enhanced context correction model training is specifically:
defining the polyp picture to be segmented X_in as X_in ∈ R^(C×H×W), extracting features from the polyp picture to be segmented by a 1×1 convolution, and outputting two feature maps X_1 and X_2 with an equal number of channels:

X_1 = F_(1×1)(X_in), X_1 ∈ R^((C/2)×H×W)

and

X_2 = F_(1×1)(X_in), X_2 ∈ R^((C/2)×H×W);
passing the two feature maps X_1 and X_2 through the attention mechanism and the depth-separable convolution respectively to obtain the first feature map X_att and the second feature map X'_2, i.e.:

X_att = σ(Up(Down(X'_1))) ⊗ X'_1;

X'_2 = DWConv(X_2), DWConv denoting the depth-separable convolution;

splicing the first feature map X_att and the second feature map X'_2, and fusing by a residual connection to obtain the semantic information image X_out as:

X_out = F_(3×3)(Cat(X_att, X'_2)) ⊕ X_in;
wherein X'_1 is the feature map obtained by sending X_1 through a 1×1 convolution with batch normalization and a ReLU nonlinear activation function; R represents a three-dimensional array image, C is the number of channels, H is the height, and W is the width; σ and ⊕ denote the sigmoid activation function and pixel-level summation respectively, ⊗ denotes pixel-level multiplication, Up is conventional bilinear interpolation up-sampling, Down is down-sampling, Cat represents concatenation along the channel dimension, X_out is the output feature map, and F_(3×3) represents a 3×3 convolution with batch normalization and a nonlinear activation function.
4. The polyp segmentation method of claim 1, wherein the progressive context fusion model is trained to specifically:
defining the final semantic information image X' as X' ∈ R^(C×H×W), extracting features by a 1×1 convolution, and then extracting two feature maps X_s and X_l by a conventional convolution and a dilated convolution respectively, namely:

X_s = F_(3×3)(X');

X_l = F^d_(3×3)(X'), F^d_(3×3) denoting a 3×3 dilated convolution;
splicing the two feature maps X_s and X_l to obtain the spliced feature map X_cat as:

X_cat = Cat(X_l, X_s);
inputting the spliced feature map X_cat into the channel attention mechanism of global context modeling to obtain channel weights, performing feature fusion of the channel weights and the final semantic information image along the channel dimension, and outputting the feature-fused semantic information image y_i as:

y_i = x_i + P(Σ_(j=1)^(T_p) β_j · x_j);

wherein x_i denotes the feature at the i-th position, T_p = H·W denotes the number of positions in X_cat, j is the summation index, β_j represents the global attention pooling weight for context modeling, P = S_3·RELU(LN(S_2(·))) refers to a bottleneck layer that captures the dependency relationships between channels, RELU stands for the nonlinear activation function, S_2 and S_3 represent information interaction across the channel dimension, and LN denotes LayerNorm normalization.
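The symbols in claim 4 follow the global-context channel attention pattern: β_j is a softmax attention map over the T_p = H·W positions, and the pooled context passes through the bottleneck P = S_3·RELU(LN(S_2(·))) before being added back. A hedged PyTorch sketch; the dilation rate and bottleneck ratio are assumptions not fixed by the claim.

```python
import torch
import torch.nn as nn

class ProgressiveContextFusion(nn.Module):
    """Illustrative sketch of claim 4: parallel conventional and dilated 3x3
    convolutions, concatenation, then a global-context channel attention whose
    bottleneck mirrors P = S3(ReLU(LN(S2(.))))."""
    def __init__(self, channels, dilation=3, ratio=4):
        super().__init__()
        half = channels // 2
        self.conv_s = nn.Conv2d(channels, half, 3, padding=1)        # conventional conv -> X_s
        self.conv_l = nn.Conv2d(channels, half, 3, padding=dilation,
                                dilation=dilation)                   # dilated conv -> X_l
        self.context = nn.Conv2d(channels, 1, 1)                     # produces the beta_j map
        self.bottleneck = nn.Sequential(                             # P = S3(ReLU(LN(S2(.))))
            nn.Conv2d(channels, channels // ratio, 1),               # S2
            nn.LayerNorm([channels // ratio, 1, 1]),                 # LN
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // ratio, channels, 1))               # S3

    def forward(self, x):
        x_cat = torch.cat([self.conv_l(x), self.conv_s(x)], dim=1)   # X_cat = Cat(X_l, X_s)
        n, c, h, w = x_cat.shape
        beta = torch.softmax(self.context(x_cat).view(n, 1, h * w), dim=-1)  # beta_j, T_p = H*W
        ctx = torch.bmm(x_cat.view(n, c, h * w), beta.transpose(1, 2)).view(n, c, 1, 1)
        return x + self.bottleneck(ctx)          # y_i = x_i + P(sum_j beta_j x_j), broadcast add
```

The bottleneck output has shape (N, C, 1, 1), so the final addition broadcasts the same channel-wise correction over every spatial position, matching the claim's channel-dimension fusion.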
5. The polyp segmentation method as set forth in claim 1, wherein the inputting the final feature map into the multi-level feature fusion model for training is specifically:
defining the final feature map L as L ∈ R^(C×H×W), up-sampling the final feature maps one by one to the same resolution and splicing them, and sending the spliced feature map into a 1×1 convolution W_1 for feature extraction to obtain the extracted feature map G:

G = W_1(Cat(B(l_1, l_2, l_3, l_4), l_0));
inputting G into the channel attention mechanism to obtain channel weights, and modeling the channel weights and the extracted feature map G by pixel-level multiplication to obtain the polyp segmentation picture Y as:

Y = δ(ξ(g(G))) ⊗ G;

wherein L = (l_0, l_1, l_2, l_3, l_4), and l_0–l_4 respectively represent the decoded-layer feature maps from large resolution to small resolution; B(l_1, l_2, l_3, l_4) represents up-sampling l_1–l_4 to the same resolution as l_0 before splicing, ξ is a correlation coefficient related to G, g is global average pooling, and δ represents the activation function.
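An illustrative PyTorch sketch of the fusion in claim 5, assuming δ is a sigmoid and ξ a learned 1×1 convolution (the claim fixes neither); the per-level channel counts are also assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelFeatureFusion(nn.Module):
    """Illustrative sketch of claim 5: up-sample decoder maps l1..l4 to the
    resolution of l0, concatenate, reduce with a 1x1 conv W_1, then reweight
    the result with a pooled channel attention and pixel-level multiplication."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.w1 = nn.Conv2d(in_channels, out_channels, 1)    # W_1
        self.xi = nn.Conv2d(out_channels, out_channels, 1)   # xi, correlation w.r.t. G

    def forward(self, feats):
        l0, *rest = feats                                    # feats = (l0, l1, ..., l4)
        ups = [F.interpolate(l, size=l0.shape[2:], mode='bilinear',
                             align_corners=False) for l in rest]  # B(l1..l4)
        g_map = self.w1(torch.cat([l0] + ups, dim=1))        # G = W_1(Cat(B(l1..l4), l0))
        pooled = F.adaptive_avg_pool2d(g_map, 1)             # g(G): global average pooling
        weight = torch.sigmoid(self.xi(pooled))              # delta(xi(g(G)))
        return g_map * weight                                # Y = delta(xi(g(G))) ⊗ G
```

A usage example with three decoder scales: maps of 8 channels at 32×32, 16×16, and 8×8 concatenate into 24 channels, which W_1 reduces to the single-channel segmentation map.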
6. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of a polyp segmentation method based on attention-guided context correction according to any one of claims 1 to 5.
7. A computer device, comprising: one or more processors, a memory, and one or more computer programs, the processors and the memory being connected by a bus, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and wherein the steps of the polyp segmentation method based on attention-guided context correction according to any one of claims 1 to 5 are implemented when the computer programs are executed by the processors.
CN202111434451.2A 2021-11-29 2021-11-29 Polyp segmentation method and computer device based on attention-guided context correction Active CN114170167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111434451.2A CN114170167B (en) 2021-11-29 2021-11-29 Polyp segmentation method and computer device based on attention-guided context correction

Publications (2)

Publication Number Publication Date
CN114170167A CN114170167A (en) 2022-03-11
CN114170167B true CN114170167B (en) 2022-11-18

Family

ID=80481501

Country Status (1)

Country Link
CN (1) CN114170167B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913325B (en) * 2022-03-24 2024-05-10 北京百度网讯科技有限公司 Semantic segmentation method, semantic segmentation device and computer program product
CN114742848B (en) * 2022-05-20 2022-11-29 深圳大学 Polyp image segmentation method, device, equipment and medium based on residual double attention
CN115578341B (en) * 2022-09-30 2023-05-12 深圳大学 Method for segmenting large intestine polyps based on attention-directed pyramid context network
CN115439470B (en) * 2022-10-14 2023-05-26 深圳职业技术学院 Polyp image segmentation method, computer readable storage medium and computer device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197182A (en) * 2019-06-11 2019-09-03 中国电子科技集团公司第五十四研究所 Remote sensing image semantic segmentation method based on contextual information and attention mechanism
CN111462126A (en) * 2020-04-08 2020-07-28 武汉大学 Semantic image segmentation method and system based on edge enhancement
CN112927255A (en) * 2021-02-22 2021-06-08 武汉科技大学 Three-dimensional liver image semantic segmentation method based on context attention strategy

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021031066A1 (en) * 2019-08-19 2021-02-25 中国科学院深圳先进技术研究院 Cartilage image segmentation method and apparatus, readable storage medium, and terminal device
CN112541503B (en) * 2020-12-11 2022-08-26 南京邮电大学 Real-time semantic segmentation method based on context attention mechanism and information fusion
CN113486890A (en) * 2021-06-16 2021-10-08 湖北工业大学 Text detection method based on attention feature fusion and cavity residual error feature enhancement
CN113298818B (en) * 2021-07-09 2023-08-18 大连大学 Remote sensing image building segmentation method based on attention mechanism and multi-scale features
CN113538313B (en) * 2021-07-22 2022-03-25 深圳大学 Polyp segmentation method and device, computer equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220429

Address after: 518000 Guangdong city of Shenzhen province Nanshan District Xili Lake

Applicant after: SHENZHEN POLYTECHNIC

Address before: 518000 Guangdong city of Shenzhen province Nanshan District Xili Lake

Applicant before: SHENZHEN POLYTECHNIC

Applicant before: University of Science and Technology Liaoning

GR01 Patent grant