CN114170167A - Polyp segmentation method and computer device based on attention-guided context correction - Google Patents

Polyp segmentation method and computer device based on attention-guided context correction

Info

Publication number
CN114170167A
CN114170167A
Authority
CN
China
Prior art keywords
feature
semantic information
polyp
information image
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111434451.2A
Other languages
Chinese (zh)
Other versions
CN114170167B (en)
Inventor
施连焘
李正国
王玉峰
郭玉宝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Polytechnic
Original Assignee
Shenzhen Polytechnic
University of Science and Technology Liaoning USTL
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Polytechnic and University of Science and Technology Liaoning
Priority to CN202111434451.2A
Publication of CN114170167A
Application granted
Publication of CN114170167B
Legal status: Active
Anticipated expiration

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10068 Endoscopic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30028 Colon; Small intestine
    • G06T 2207/30032 Colon polyp

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a polyp segmentation method based on attention-guided context correction, a computer-readable storage medium, and a computer device. The method comprises: inputting a polyp picture to be segmented into an enhanced context correction model for training and then downsampling, repeating the model training and downsampling several times to obtain a final semantic information image; inputting the final semantic information image into a progressive context fusion model for training and outputting a feature-fused semantic information image; upsampling the feature-fused semantic information image and performing enhanced context correction model training to obtain a feature map, then upsampling and training again, repeating several times to obtain a final feature map with the same channel size as the polyp picture to be segmented; and inputting the final feature map into a multi-level feature fusion model for training and outputting a polyp segmentation picture. The invention can thereby identify polyps more accurately.

Description

Polyp segmentation method and computer device based on attention-guided context correction
Technical Field
The present invention belongs to the medical field, and in particular, to a polyp segmentation method based on attention-guided context correction, a computer-readable storage medium, and a computer device.
Background
Colorectal cancer most often develops from polyps (raised masses in the gastrointestinal tract) that form in the intestine over a long period and for many reasons. If polyps can be detected and resected at an early stage, colorectal cancer can be prevented. The most effective method for screening and diagnosing colorectal cancer is colorectal endoscopy, which is currently the mainstream method for diagnosing the disease.
However, although current diagnostic methods are advanced and accurate, problems remain. According to professional research reports, roughly one in four polyps is missed during endoscopy, so that resection is incomplete and hidden dangers remain. In addition, polyps vary widely in shape, making fine judgments by the naked eye difficult, especially when a polyp differs little from the background of the gastrointestinal tract. Finally, rapid identification cannot be achieved by humans alone; it requires a great deal of time and effort and, under the current medical system, substantially increases the workload of gastroenterologists.
Disclosure of Invention
The present invention aims to provide a polyp segmentation method based on attention-guided context correction, a computer-readable storage medium, and a computer device, so as to solve the problem that polyps missed during endoscopy lead to incomplete resection and leave hidden dangers.
In a first aspect, the present invention provides a polyp segmentation method based on attention-guided context correction, comprising:
acquiring a polyp picture to be segmented;
inputting the polyp picture to be segmented into an enhanced context correction model for training to obtain a semantic information image, downsampling the semantic information image to obtain a downsampled semantic information image, inputting the downsampled semantic information image into the enhanced context correction model again for training and then downsampling, and repeating multiple times to obtain a final semantic information image; the enhanced context correction model divides the input polyp picture to be segmented along the channel dimension into two feature maps with an equal number of channels, passes one feature map through an attention mechanism to obtain a first feature map, extracts features from the other feature map with a depthwise separable convolution to obtain a second feature map, concatenates the first and second feature maps, and then fuses them through a residual connection to output a semantic information image;
inputting the final semantic information image into a progressive context fusion model for training, and outputting a feature-fused semantic information image; the progressive context fusion model extracts features from the final semantic information image through a dilated convolution and a conventional convolution, respectively, to obtain two feature maps, concatenates the two feature maps and inputs the result into a context-modeling channel attention mechanism to obtain channel weights, fuses the channel weights with the final semantic information image along the channel dimension, and outputs the feature-fused semantic information image;
upsampling the feature-fused semantic information image to obtain an upsampled semantic information image, performing enhanced context correction model training on the upsampled semantic information image to obtain a feature map, upsampling the feature map again and performing enhanced context correction model training, and repeating multiple times to obtain a final feature map with the same channel size as the polyp picture to be segmented;
inputting the final feature map into a multi-level feature fusion model for training, and outputting a polyp segmentation picture; the multi-level feature fusion model upsamples the final feature maps one by one to the same resolution, concatenates the upsampled feature maps and inputs them into a channel attention mechanism to obtain channel weights, and models the channel weights and the final feature map output by pixel-level multiplication to obtain the polyp segmentation picture.
Further, after the upsampling of the feature-fused semantic information image, the method further includes: adding a skip connection structure during the enhanced context correction model training, so that the representation information of the shallow coding layers complements the spatial fine-grained detail of the deep semantic information in the decoding layers.
Further, the inputting of the polyp picture to be segmented into the enhanced context correction model training specifically includes:
defining the polyp picture to be segmented X_in as X_in ∈ R^(C×H×W), extracting features from the polyp picture to be segmented by a 1×1 convolution, and outputting two feature maps X_1 and X_2 with an equal number of channels, X_1 ∈ R^((C/2)×H×W) and X_2 ∈ R^((C/2)×H×W);
passing the two feature maps X_1 and X_2 through the attention mechanism and the depthwise separable convolution, respectively, to obtain the first feature map X_att and the second feature map X_2' (the corresponding equations are rendered as images in the original publication);
concatenating the first feature map X_att and the second feature map X_2', and then obtaining the semantic information image X_out by residual connection and fusion (equation rendered as an image in the original publication);
wherein X̃_1 is the feature map obtained by sending X_1 through a 1×1 convolution, a batch normalization algorithm, and a ReLU nonlinear activation function; R denotes a three-dimensional array image, C is the number of channels, H is the height, and W is the width; σ and ⊕ denote the sigmoid activation function and pixel-level summation, respectively; ⊗ denotes pixel-level multiplication; Up is conventional bilinear-interpolation upsampling and Down is downsampling; Cat denotes concatenation along the channel dimension; X_out is the output feature map; and F_3×3 denotes a 3×3 convolution followed by batch normalization and a nonlinear activation function.
Further, the training of the progressive context fusion model specifically includes:
defining the final semantic information image X' as X' ∈ R^(C×H×W), extracting features by a 1×1 convolution, and then extracting two feature maps X_s and X_l by a conventional convolution and a dilated convolution, respectively (equations rendered as images in the original publication);
concatenating the two feature maps X_s and X_l to obtain the concatenated feature map X_cat:
X_cat = Cat(X_l, X_s);
inputting the concatenated feature map X_cat into the channel attention mechanism with global context modeling to obtain channel weights, fusing the channel weights with the final semantic information image along the channel dimension, and outputting the feature-fused semantic information image y_i (equation rendered as an image in the original publication);
wherein T_p = H·W denotes the number of positions in X_cat, j denotes the index of the summation, β_j denotes the global attention pooling weight for context modeling, P = S_3(ReLU(LN(S_2(·)))) denotes a bottleneck layer for capturing the dependencies between channels, ReLU denotes a nonlinear activation function, S_2 and S_3 model the information interaction between channel dimensions, and LN denotes LayerNorm normalization.
Further, the inputting of the final feature maps into the multi-level feature fusion model for training specifically includes:
defining the final feature maps L as L ∈ R^(C×H×W), upsampling the final feature maps one by one to the same resolution and concatenating them, and sending the concatenated feature map into a 1×1 convolution W_1 for feature extraction to obtain the extracted feature map G:
G = W_1(Cat(B(l_1, l_2, l_3, l_4), l_0));
inputting G into a channel attention mechanism to obtain channel weights, and modeling the channel weights and the final feature maps by pixel-level multiplication to obtain the polyp segmentation picture Y (equation rendered as an image in the original publication);
wherein L = (l_0, l_1, l_2, l_3, l_4), l_0-l_4 respectively denote the decoder-layer feature maps from large to small resolution, B(l_1, l_2, l_3, l_4), l_0 denotes that l_1-l_4 are upsampled to the same resolution as l_0 and then concatenated, ξ is a correlation coefficient related to G, g denotes global average pooling, and δ denotes an activation function.
In a second aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the polyp segmentation method based on attention-guided context correction according to the first aspect.
In a third aspect, the present invention provides a computer device, comprising: one or more processors, a memory, and one or more computer programs, the processors and the memory being connected by a bus, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the processor, when executing the computer programs, implements the steps of the polyp segmentation method based on attention-guided context correction according to the first aspect.
In the present invention, repeatedly training the enhanced context correction model and then downsampling yields deeper semantic information and effectively suppresses the interference of background noise; the progressive context fusion model training addresses the problem of large-scale polyps in the polyp recognition process; and the multi-level feature fusion model training effectively yields accurate segmentation results, improving the accuracy of polyp recognition.
Drawings
Fig. 1 is a flowchart of a polyp segmentation method based on attention-guided context correction according to an embodiment of the present invention.
Fig. 2 is a flowchart of another polyp segmentation method based on attention-guided context correction according to an embodiment of the present invention.
FIG. 3 is a flowchart of enhanced context correction model training according to an embodiment of the present invention.
FIG. 4 is a flowchart of progressive context fusion model training according to an embodiment of the present invention.
Fig. 5 is a flowchart of multi-level feature fusion model training according to an embodiment of the present invention.
Fig. 6 is a block diagram of a specific structure of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Referring to fig. 1, a polyp segmentation method based on attention-guided context correction according to an embodiment of the present invention includes the following steps. It should be noted that the method is not limited to the flow sequence shown in fig. 1, as long as the results are substantially the same.
S1, obtaining a polyp picture to be segmented;
S2, inputting the polyp picture to be segmented into an enhanced context correction model for training to obtain a semantic information image, downsampling the semantic information image to obtain a downsampled semantic information image, inputting the downsampled semantic information image into the enhanced context correction model again for training and then downsampling, and repeating multiple times to obtain a final semantic information image; the enhanced context correction model divides the input polyp picture to be segmented along the channel dimension into two feature maps with an equal number of channels, passes one feature map through an attention mechanism to obtain a first feature map, extracts features from the other feature map with a depthwise separable convolution to obtain a second feature map, concatenates the first and second feature maps, and then fuses them through a residual connection to output a semantic information image;
S3, inputting the final semantic information image into a progressive context fusion model for training, and outputting a feature-fused semantic information image; the progressive context fusion model extracts features from the final semantic information image through a dilated convolution and a conventional convolution, respectively, to obtain two feature maps, concatenates the two feature maps and inputs the result into a context-modeling channel attention mechanism to obtain channel weights, fuses the channel weights with the final semantic information image along the channel dimension, and outputs the feature-fused semantic information image;
S4, upsampling the feature-fused semantic information image to obtain an upsampled semantic information image, performing enhanced context correction model training on the upsampled semantic information image to obtain a feature map, upsampling the feature map again and performing enhanced context correction model training, and repeating multiple times to obtain a final feature map with the same channel size as the polyp picture to be segmented;
S5, inputting the final feature map into a multi-level feature fusion model for training, and outputting a polyp segmentation picture; the multi-level feature fusion model upsamples the final feature maps one by one to the same resolution, concatenates the upsampled feature maps and inputs them into a channel attention mechanism to obtain channel weights, and models the channel weights and the final feature map output by pixel-level multiplication to obtain the polyp segmentation picture.
Fig. 2 is a flowchart illustrating a polyp segmentation method based on attention-guided context correction according to an embodiment of the present invention, where Input is the input picture to be segmented, ECC denotes the enhanced context correction model, Down ×2 denotes a downsampling operation (with a convolution kernel size of 2), PCF denotes the progressive context fusion model, Up ×2 denotes an upsampling operation (with a convolution kernel size of 2), MPA denotes the multi-level feature fusion model, Output is the output segmentation result, and skip connection denotes the skip (residual) connection.
In an embodiment of the present invention, after the upsampling of the feature-fused semantic information image, the method further includes: adding a skip connection structure during the enhanced context correction model training, so that the representation information of the shallow coding layers complements the spatial fine-grained detail of the deep semantic information in the decoding layers.
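To make the data flow of fig. 2 concrete, the following is a minimal PyTorch sketch of the encoder-decoder pipeline, assuming four encoder/decoder stages and using plain 3×3 convolution blocks as stand-ins for the internals of the ECC, PCF, and MPA modules (detailed further below). The class and parameter names are illustrative and do not come from the patent.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBNReLU(nn.Sequential):
    """3x3 convolution + batch normalization + ReLU, used as a stand-in block."""
    def __init__(self, cin, cout):
        super().__init__(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout),
                         nn.ReLU(inplace=True))

class PolypSegNet(nn.Module):
    """Fig. 2 data flow: [ECC -> Down x2] repeated, PCF bottleneck,
    [Up x2 -> skip concat -> ECC] repeated, then a 1x1 head standing in
    for the multi-level feature fusion (MPA) module."""
    def __init__(self, cin=3, width=32, depth=4):
        super().__init__()
        chs = [width * 2 ** i for i in range(depth + 1)]  # e.g. 32, 64, 128, 256, 512
        self.stem = ConvBNReLU(cin, chs[0])               # stand-in for the first ECC stage
        self.enc = nn.ModuleList(ConvBNReLU(chs[i], chs[i + 1]) for i in range(depth))
        self.pcf = ConvBNReLU(chs[-1], chs[-1])           # stand-in for the PCF bottleneck
        self.dec = nn.ModuleList(ConvBNReLU(chs[i + 1] + chs[i], chs[i])
                                 for i in reversed(range(depth)))
        self.head = nn.Conv2d(chs[0], 1, 1)               # stand-in for the MPA fusion head

    def forward(self, x):
        x = self.stem(x)
        skips = []
        for enc in self.enc:
            skips.append(x)                    # keep the pre-downsampling map for the skip link
            x = enc(F.max_pool2d(x, 2))        # Down x2 stage followed by the next ECC stage
        x = self.pcf(x)
        for dec, skip in zip(self.dec, reversed(skips)):
            x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
            x = dec(torch.cat([x, skip], dim=1))   # skip connection + decoder ECC stage
        return torch.sigmoid(self.head(x))         # per-pixel polyp probability

if __name__ == "__main__":
    net = PolypSegNet()
    print(net(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 1, 256, 256])

Max pooling stands in here for the patent's stride-2 downsampling convolution; substituting a strided convolution leaves the data flow unchanged.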
In an embodiment of the present invention, the inputting of the polyp picture to be segmented into the enhanced context correction model training specifically includes:
defining the polyp picture to be segmented X_in as X_in ∈ R^(C×H×W), extracting features from the polyp picture to be segmented by a 1×1 convolution, and outputting two feature maps X_1 and X_2 with an equal number of channels, X_1 ∈ R^((C/2)×H×W) and X_2 ∈ R^((C/2)×H×W);
passing the two feature maps X_1 and X_2 through the attention mechanism and the depthwise separable convolution, respectively, to obtain the first feature map X_att and the second feature map X_2' (the corresponding equations are rendered as images in the original publication);
concatenating the first feature map X_att and the second feature map X_2', and then obtaining the semantic information image X_out by residual connection and fusion (equation rendered as an image in the original publication);
wherein X̃_1 is the feature map obtained by sending X_1 through a 1×1 convolution, a batch normalization algorithm, and a ReLU nonlinear activation function; R denotes a three-dimensional array image, C is the number of channels, H is the height, and W is the width; σ and ⊕ denote the sigmoid activation function and pixel-level summation, respectively; ⊗ denotes pixel-level multiplication; Up is conventional bilinear-interpolation upsampling and Down is downsampling; Cat denotes concatenation along the channel dimension; X_out is the output feature map; and F_3×3 denotes a 3×3 convolution followed by batch normalization and a nonlinear activation function.
Fig. 3 is a flowchart of the enhanced context correction model training, in which ⊕ denotes pixel-level summation, ⊗ denotes the pixel-level multiplication operation, σ denotes the sigmoid activation function, Down denotes downsampling, Cat denotes concatenation along the channel dimension, and Up denotes conventional bilinear-interpolation upsampling.
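As a concrete reading of the ECC module, the following is a minimal PyTorch sketch. The channel split, the depthwise separable convolution branch, the concatenation, and the residual fusion follow the description above; because the exact attention formula is rendered only as an image in the original publication, the gate below (a sigmoid of an upsampled 3×3 convolution of the downsampled X_1, multiplied and then summed with X̃_1) is an assumed plausible form built from the listed symbols σ, Up, Down, F_3×3, ⊗, and ⊕.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ECC(nn.Module):
    """Enhanced context correction block (sketch). The input is split along
    the channel dimension into two halves: one half goes through an attention
    branch, the other through a depthwise separable convolution; the results
    are concatenated and fused with a residual connection."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.split = nn.Conv2d(channels, channels, 1)          # 1x1 feature extraction before the split
        self.pre = nn.Sequential(nn.Conv2d(half, half, 1),     # X1~: 1x1 conv + BN + ReLU
                                 nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        self.f3x3 = nn.Sequential(nn.Conv2d(half, half, 3, padding=1),  # F_3x3
                                  nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        self.dsc = nn.Sequential(                              # depthwise separable convolution branch
            nn.Conv2d(half, half, 3, padding=1, groups=half),  # depthwise
            nn.Conv2d(half, half, 1),                          # pointwise
            nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        self.fuse = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                  nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, x):
        x1, x2 = torch.chunk(self.split(x), 2, dim=1)          # two maps with equal channel count
        x1t = self.pre(x1)
        # Assumed attention form (the patent's exact formula is only an image):
        # a sigmoid gate computed at half resolution and upsampled back.
        gate = torch.sigmoid(F.interpolate(self.f3x3(F.avg_pool2d(x1, 2)),
                                           size=x1.shape[-2:], mode="bilinear",
                                           align_corners=False))
        x_att = x1t * gate + x1t                               # pixel-level multiply, then add
        x2p = self.dsc(x2)                                     # second feature map X_2'
        return self.fuse(torch.cat([x_att, x2p], dim=1)) + x   # concat, 3x3 fuse, residual link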
In an embodiment of the present invention, referring to fig. 4, the training of the progressive context fusion model specifically includes:
defining the final semantic information image X' as X' ∈ R^(C×H×W), extracting features by a 1×1 convolution, and then extracting two feature maps X_s and X_l by a conventional convolution and a dilated convolution, respectively (equations rendered as images in the original publication);
concatenating the two feature maps X_s and X_l to obtain the concatenated feature map X_cat:
X_cat = Cat(X_l, X_s);
inputting the concatenated feature map X_cat into the channel attention mechanism with global context modeling to obtain channel weights, fusing the channel weights with the final semantic information image along the channel dimension, and outputting the feature-fused semantic information image y_i (equation rendered as an image in the original publication);
wherein T_p = H·W denotes the number of positions in X_cat, j denotes the index of the summation, β_j denotes the global attention pooling weight for context modeling, P = S_3(ReLU(LN(S_2(·)))) denotes a bottleneck layer for capturing the dependencies between channels, ReLU denotes a nonlinear activation function, S_2 and S_3 model the information interaction between channel dimensions, and LN denotes LayerNorm normalization.
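A minimal PyTorch sketch of the PCF module follows. The parallel conventional and dilated convolutions and the concatenation X_cat = Cat(X_l, X_s) follow the text; because the y_i equation is rendered as an image, the channel attention below is written in the GCNet-style global-context form suggested by the listed symbols (β_j pooling weights over the T_p = H·W positions and the S_3(ReLU(LN(S_2(·)))) bottleneck). The dilation rate and reduction ratio are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PCF(nn.Module):
    """Progressive context fusion block (sketch). Parallel conventional and
    dilated convolutions capture small- and large-scale context; their
    concatenation feeds a global-context channel attention whose output is
    fused back into the input features."""
    def __init__(self, channels, dilation=3, reduction=4):
        super().__init__()
        half = channels // 2
        self.proj = nn.Conv2d(channels, channels, 1)                 # initial 1x1 feature extraction
        self.conv_s = nn.Sequential(nn.Conv2d(channels, half, 3, padding=1),
                                    nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        self.conv_l = nn.Sequential(nn.Conv2d(channels, half, 3, padding=dilation,
                                              dilation=dilation),   # dilated branch
                                    nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        self.attn = nn.Conv2d(channels, 1, 1)                       # beta_j: global attention pooling
        self.bottleneck = nn.Sequential(                            # P = S3(ReLU(LN(S2(.))))
            nn.Conv2d(channels, channels // reduction, 1),          # S2
            nn.LayerNorm([channels // reduction, 1, 1]),            # LN
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1))          # S3

    def forward(self, x):
        xp = self.proj(x)
        x_cat = torch.cat([self.conv_l(xp), self.conv_s(xp)], dim=1)   # Xcat = Cat(Xl, Xs)
        b, c, h, w = x_cat.shape
        beta = F.softmax(self.attn(x_cat).view(b, 1, h * w), dim=-1)   # weights over Tp = H*W positions
        ctx = torch.bmm(x_cat.view(b, c, h * w), beta.transpose(1, 2)) # (b, c, 1) global context
        ctx = self.bottleneck(ctx.view(b, c, 1, 1))                    # per-channel weights
        return x + ctx                                                 # fuse back into the input

Fusing the channel-wise context back by broadcast addition mirrors the GCNet fusion; the patent's exact fusion along the channel dimension may differ in detail.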
In an embodiment of the present invention, the inputting of the final feature maps into the multi-level feature fusion model for training specifically includes:
defining the final feature maps L as L ∈ R^(C×H×W), upsampling the final feature maps one by one to the same resolution and concatenating them, and sending the concatenated feature map into a 1×1 convolution W_1 for feature extraction to obtain the extracted feature map G:
G = W_1(Cat(B(l_1, l_2, l_3, l_4), l_0));
inputting G into a channel attention mechanism to obtain channel weights, and modeling the channel weights and the final feature maps by pixel-level multiplication to obtain the polyp segmentation picture Y (equation rendered as an image in the original publication);
wherein L = (l_0, l_1, l_2, l_3, l_4), l_0-l_4 respectively denote the decoder-layer feature maps from large to small resolution, B(l_1, l_2, l_3, l_4), l_0 denotes that l_1-l_4 are upsampled to the same resolution as l_0 and then concatenated, ξ is a correlation coefficient related to G, g denotes global average pooling, and δ denotes an activation function.
Fig. 5 is a flowchart of the multi-level feature fusion model training, in which Up denotes bilinear-interpolation upsampling and ⊗ denotes the pixel-level multiplication operation.
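To make the fusion head concrete, the following is a minimal PyTorch sketch. Upsampling l_1-l_4 to the resolution of l_0, the concatenation, and the 1×1 convolution W_1 follow the text; because the Y equation is rendered as an image, the channel attention is written in a standard squeeze-and-excitation form (global average pooling g followed by activations, the assumed reading of ξ and δ). The channel counts in the demo are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MFA(nn.Module):
    """Multi-level feature fusion head (sketch). Decoder maps l1..l4 are
    upsampled to the resolution of l0, concatenated with l0, and reduced by
    a 1x1 convolution W1 to G; a channel attention reweights G, and a final
    1x1 convolution produces the segmentation map."""
    def __init__(self, in_channels, mid_channels=64, reduction=4):
        super().__init__()
        self.w1 = nn.Conv2d(sum(in_channels), mid_channels, 1)      # W1
        self.fc = nn.Sequential(                                    # channel attention weights
            nn.Conv2d(mid_channels, mid_channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels // reduction, mid_channels, 1),
            nn.Sigmoid())
        self.out = nn.Conv2d(mid_channels, 1, 1)

    def forward(self, feats):
        l0, rest = feats[0], feats[1:]
        size = l0.shape[-2:]
        ups = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
               for f in rest]                                       # B(l1..l4): upsample to l0's size
        g = self.w1(torch.cat([*ups, l0], dim=1))                   # G = W1(Cat(B(l1..l4), l0))
        w = self.fc(F.adaptive_avg_pool2d(g, 1))                    # g: global average pooling
        return torch.sigmoid(self.out(g * w))                       # pixel-level reweighting -> mask Y

if __name__ == "__main__":
    feats = [torch.randn(1, c, 256 // 2 ** i, 256 // 2 ** i)
             for i, c in enumerate([32, 64, 128, 256, 512])]        # hypothetical decoder maps l0..l4
    print(MFA([32, 64, 128, 256, 512])(feats).shape)                # torch.Size([1, 1, 256, 256])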
An embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the polyp segmentation method based on attention-guided context correction as provided by the embodiments of the present invention.
Fig. 6 shows a specific block diagram of a computer device according to an embodiment of the present invention. The computer device 100 includes: one or more processors 101, a memory 102, and one or more computer programs, wherein the processors 101 and the memory 102 are connected by a bus, the one or more computer programs are stored in the memory 102 and configured to be executed by the one or more processors 101, and the processor 101, when executing the computer programs, implements the steps of the polyp segmentation method based on attention-guided context correction as provided by the embodiments of the present invention.
The computer device includes servers, terminals, and the like. The computer device may be a desktop computer, a mobile terminal, or a vehicle-mounted device, where the mobile terminal includes at least one of a mobile phone, a tablet computer, a personal digital assistant, or a wearable device.
In the embodiments of the present invention, repeatedly training the enhanced context correction model and then downsampling yields deeper semantic information and effectively suppresses the interference of background noise; the progressive context fusion model training addresses the problem of large-scale polyps in the polyp recognition process; and the multi-level feature fusion model training effectively yields accurate segmentation results, improving the accuracy of polyp recognition.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (7)

1. A polyp segmentation method based on attention-guided context correction, comprising:
acquiring a polyp picture to be segmented;
inputting the polyp picture to be segmented into an enhanced context correction model for training to obtain a semantic information image, downsampling the semantic information image to obtain a downsampled semantic information image, inputting the downsampled semantic information image into the enhanced context correction model again for training and then downsampling, and repeating multiple times to obtain a final semantic information image; the enhanced context correction model divides the input polyp picture to be segmented along the channel dimension into two feature maps with an equal number of channels, passes one feature map through an attention mechanism to obtain a first feature map, extracts features from the other feature map with a depthwise separable convolution to obtain a second feature map, concatenates the first and second feature maps, and then fuses them through a residual connection to output a semantic information image;
inputting the final semantic information image into a progressive context fusion model for training, and outputting a feature-fused semantic information image; the progressive context fusion model extracts features from the final semantic information image through a dilated convolution and a conventional convolution, respectively, to obtain two feature maps, concatenates the two feature maps and inputs the result into a context-modeling channel attention mechanism to obtain channel weights, fuses the channel weights with the final semantic information image along the channel dimension, and outputs the feature-fused semantic information image;
upsampling the feature-fused semantic information image to obtain an upsampled semantic information image, performing enhanced context correction model training on the upsampled semantic information image to obtain a feature map, upsampling the feature map again and performing enhanced context correction model training, and repeating multiple times to obtain a final feature map with the same channel size as the polyp picture to be segmented;
inputting the final feature map into a multi-level feature fusion model for training, and outputting a polyp segmentation picture; the multi-level feature fusion model upsamples the final feature maps one by one to the same resolution, concatenates the upsampled feature maps and inputs them into a channel attention mechanism to obtain channel weights, and models the channel weights and the final feature map output by pixel-level multiplication to obtain the polyp segmentation picture.
2. The polyp segmentation method according to claim 1, wherein after the upsampling of the feature-fused semantic information image, the method further comprises: adding a skip connection structure during the enhanced context correction model training, so that the representation information of the shallow coding layers complements the spatial fine-grained detail of the deep semantic information in the decoding layers.
3. The polyp segmentation method according to claim 1, wherein the inputting of the polyp picture to be segmented into the enhanced context correction model training specifically comprises:
defining the polyp picture to be segmented X_in as X_in ∈ R^(C×H×W), extracting features from the polyp picture to be segmented by a 1×1 convolution, and outputting two feature maps X_1 and X_2 with an equal number of channels, X_1 ∈ R^((C/2)×H×W) and X_2 ∈ R^((C/2)×H×W);
passing the two feature maps X_1 and X_2 through the attention mechanism and the depthwise separable convolution, respectively, to obtain the first feature map X_att and the second feature map X_2' (the corresponding equations are rendered as images in the original publication);
concatenating the first feature map X_att and the second feature map X_2', and then obtaining the semantic information image X_out by residual connection and fusion (equation rendered as an image in the original publication);
wherein X̃_1 is the feature map obtained by sending X_1 through a 1×1 convolution, a batch normalization algorithm, and a ReLU nonlinear activation function; R denotes a three-dimensional array image, C is the number of channels, H is the height, and W is the width; σ and ⊕ denote the sigmoid activation function and pixel-level summation, respectively; ⊗ denotes pixel-level multiplication; Up is conventional bilinear-interpolation upsampling and Down is downsampling; Cat denotes concatenation along the channel dimension; X_out is the output feature map; and F_3×3 denotes a 3×3 convolution followed by batch normalization and a nonlinear activation function.
4. The polyp segmentation method according to claim 1, wherein the training of the progressive context fusion model specifically comprises:
defining the final semantic information image X' as X' ∈ R^(C×H×W), extracting features by a 1×1 convolution, and then extracting two feature maps X_s and X_l by a conventional convolution and a dilated convolution, respectively (equations rendered as images in the original publication);
concatenating the two feature maps X_s and X_l to obtain the concatenated feature map X_cat:
X_cat = Cat(X_l, X_s);
inputting the concatenated feature map X_cat into the channel attention mechanism with global context modeling to obtain channel weights, fusing the channel weights with the final semantic information image along the channel dimension, and outputting the feature-fused semantic information image y_i (equation rendered as an image in the original publication);
wherein T_p = H·W denotes the number of positions in X_cat, j denotes the index of the summation, β_j denotes the global attention pooling weight for context modeling, P = S_3(ReLU(LN(S_2(·)))) denotes a bottleneck layer for capturing the dependencies between channels, ReLU denotes a nonlinear activation function, S_2 and S_3 model the information interaction between channel dimensions, and LN denotes LayerNorm normalization.
5. The polyp segmentation method according to claim 1, wherein the inputting of the final feature maps into the multi-level feature fusion model for training specifically comprises:
defining the final feature maps L as L ∈ R^(C×H×W), upsampling the final feature maps one by one to the same resolution and concatenating them, and sending the concatenated feature map into a 1×1 convolution W_1 for feature extraction to obtain the extracted feature map G:
G = W_1(Cat(B(l_1, l_2, l_3, l_4), l_0));
inputting G into a channel attention mechanism to obtain channel weights, and modeling the channel weights and the final feature maps by pixel-level multiplication to obtain the polyp segmentation picture Y (equation rendered as an image in the original publication);
wherein L = (l_0, l_1, l_2, l_3, l_4), l_0-l_4 respectively denote the decoder-layer feature maps from large to small resolution, B(l_1, l_2, l_3, l_4), l_0 denotes that l_1-l_4 are upsampled to the same resolution as l_0 and then concatenated, ξ is a correlation coefficient related to G, g denotes global average pooling, and δ denotes an activation function.
6. A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the polyp segmentation method based on attention-guided context correction according to any one of claims 1 to 5.
7. A computer device, comprising: one or more processors, a memory, and one or more computer programs, the processors and the memory being connected by a bus, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the steps of the polyp segmentation method based on attention-guided context correction according to any one of claims 1 to 5 are implemented when the computer programs are executed by the processors.
CN202111434451.2A 2021-11-29 2021-11-29 Polyp segmentation method and computer device based on attention-guided context correction Active CN114170167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111434451.2A CN114170167B (en) 2021-11-29 2021-11-29 Polyp segmentation method and computer device based on attention-guided context correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111434451.2A CN114170167B (en) 2021-11-29 2021-11-29 Polyp segmentation method and computer device based on attention-guided context correction

Publications (2)

Publication Number Publication Date
CN114170167A 2022-03-11
CN114170167B 2022-11-18

Family

ID=80481501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111434451.2A Active CN114170167B (en) 2021-11-29 2021-11-29 Polyp segmentation method and computer device based on attention-guided context correction

Country Status (1)

Country Link
CN (1) CN114170167B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114742848A (en) * 2022-05-20 2022-07-12 深圳大学 Method, device, equipment and medium for segmenting polyp image based on residual double attention
CN114913325A (en) * 2022-03-24 2022-08-16 北京百度网讯科技有限公司 Semantic segmentation method, device and computer program product
CN115439470A (en) * 2022-10-14 2022-12-06 深圳职业技术学院 Polyp image segmentation method, computer-readable storage medium, and computer device
CN115578341A (en) * 2022-09-30 2023-01-06 深圳大学 Large intestine polypus segmentation method based on attention-guided pyramid context network

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197182A (en) * 2019-06-11 2019-09-03 中国电子科技集团公司第五十四研究所 Remote sensing image semantic segmentation method based on contextual information and attention mechanism
CN111462126A (en) * 2020-04-08 2020-07-28 武汉大学 Semantic image segmentation method and system based on edge enhancement
WO2021031066A1 (en) * 2019-08-19 2021-02-25 中国科学院深圳先进技术研究院 Cartilage image segmentation method and apparatus, readable storage medium, and terminal device
CN112541503A (en) * 2020-12-11 2021-03-23 南京邮电大学 Real-time semantic segmentation method based on context attention mechanism and information fusion
CN112927255A (en) * 2021-02-22 2021-06-08 武汉科技大学 Three-dimensional liver image semantic segmentation method based on context attention strategy
CN113298818A (en) * 2021-07-09 2021-08-24 大连大学 Remote sensing image building segmentation method based on attention mechanism and multi-scale features
CN113486890A (en) * 2021-06-16 2021-10-08 湖北工业大学 Text detection method based on attention feature fusion and cavity residual error feature enhancement
CN113538313A (en) * 2021-07-22 2021-10-22 深圳大学 Polyp segmentation method and device, computer equipment and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197182A (en) * 2019-06-11 2019-09-03 中国电子科技集团公司第五十四研究所 Remote sensing image semantic segmentation method based on contextual information and attention mechanism
WO2021031066A1 (en) * 2019-08-19 2021-02-25 中国科学院深圳先进技术研究院 Cartilage image segmentation method and apparatus, readable storage medium, and terminal device
CN111462126A (en) * 2020-04-08 2020-07-28 武汉大学 Semantic image segmentation method and system based on edge enhancement
CN112541503A (en) * 2020-12-11 2021-03-23 南京邮电大学 Real-time semantic segmentation method based on context attention mechanism and information fusion
CN112927255A (en) * 2021-02-22 2021-06-08 武汉科技大学 Three-dimensional liver image semantic segmentation method based on context attention strategy
CN113486890A (en) * 2021-06-16 2021-10-08 湖北工业大学 Text detection method based on attention feature fusion and cavity residual error feature enhancement
CN113298818A (en) * 2021-07-09 2021-08-24 大连大学 Remote sensing image building segmentation method based on attention mechanism and multi-scale features
CN113538313A (en) * 2021-07-22 2021-10-22 深圳大学 Polyp segmentation method and device, computer equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
QUAN TANG 等: "Attention-guided Chained Context Aggregation for Semantic Segmentation", 《ARXIV》 *
文凯等: "基于多级上下文引导的实时语义分割网络", 《计算机应用研究》 *
肖建桥: "基于深度学习的道路场景语义分割", 《中国硕士学位论文全文数据库》 *
胡文俊等: "基于上下文的多路径空间编码图像语义分割方法", 《工业控制计算机》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913325A (en) * 2022-03-24 2022-08-16 北京百度网讯科技有限公司 Semantic segmentation method, device and computer program product
CN114913325B (en) * 2022-03-24 2024-05-10 北京百度网讯科技有限公司 Semantic segmentation method, semantic segmentation device and computer program product
CN114742848A (en) * 2022-05-20 2022-07-12 深圳大学 Method, device, equipment and medium for segmenting polyp image based on residual double attention
CN115578341A (en) * 2022-09-30 2023-01-06 深圳大学 Large intestine polypus segmentation method based on attention-guided pyramid context network
CN115578341B (en) * 2022-09-30 2023-05-12 深圳大学 Method for segmenting large intestine polyps based on attention-directed pyramid context network
CN115439470A (en) * 2022-10-14 2022-12-06 深圳职业技术学院 Polyp image segmentation method, computer-readable storage medium, and computer device
CN115439470B (en) * 2022-10-14 2023-05-26 深圳职业技术学院 Polyp image segmentation method, computer readable storage medium and computer device

Also Published As

Publication number Publication date
CN114170167B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
CN114170167B (en) Polyp segmentation method and computer device based on attention-guided context correction
CN113240580B (en) Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
US20200380695A1 (en) Methods, systems, and media for segmenting images
CN111898701B (en) Model training, frame image generation and frame insertion methods, devices, equipment and media
CN110211045B (en) Super-resolution face image reconstruction method based on SRGAN network
CN109493350B (en) Portrait segmentation method and device
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
CN112132959B (en) Digital rock core image processing method and device, computer equipment and storage medium
CN112465828A (en) Image semantic segmentation method and device, electronic equipment and storage medium
CN111476719B (en) Image processing method, device, computer equipment and storage medium
KR20200084434A (en) Machine Learning Method for Restoring Super-Resolution Image
CN115439470B (en) Polyp image segmentation method, computer readable storage medium and computer device
KR101977067B1 (en) Method for reconstructing diagnosis map by deep neural network-based feature extraction and apparatus using the same
CN106127689A (en) Image/video super-resolution method and device
CN116091313A (en) Image super-resolution network model and reconstruction method
CN112700460A (en) Image segmentation method and system
WO2024041235A1 (en) Image processing method and apparatus, device, storage medium and program product
CN113838047A (en) Large intestine polyp segmentation method and system based on endoscope image and related components
CN116757930A (en) Remote sensing image super-resolution method, system and medium based on residual separation attention mechanism
CN115358952A (en) Image enhancement method, system, equipment and storage medium based on meta-learning
CN112633260B (en) Video motion classification method and device, readable storage medium and equipment
CN116681888A (en) Intelligent image segmentation method and system
CN115293966A (en) Face image reconstruction method and device and storage medium
CN114418987A (en) Retinal vessel segmentation method and system based on multi-stage feature fusion
CN117522896A (en) Self-attention-based image segmentation method and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220429

Address after: 518000 Guangdong city of Shenzhen province Nanshan District Xili Lake

Applicant after: SHENZHEN POLYTECHNIC

Address before: 518000 Guangdong city of Shenzhen province Nanshan District Xili Lake

Applicant before: SHENZHEN POLYTECHNIC

Applicant before: University of Science and Technology Liaoning

GR01 Patent grant